PEDS Advance Access originally published online on March 30, 2006
Protein Engineering Design and Selection 2006 19(6):245-253; doi:10.1093/protein/gzl006
Improved mutants from directed evolution are biased to orthologous substitutions
1 Division of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; 2 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; 3 Present address: Department of Bioengineering, Stanford University, Stanford, CA 94305, USA; 4 Present address: Department of Molecular Science and Technology, Ajou University, Korea
5 To whom correspondence should be addressed. E-mail: wittrup{at}mit.edu
| Abstract |
|---|
We have engineered human epidermal growth factor (EGF) by directed evolution through yeast surface display for significantly enhanced affinity for the EGF receptor (EGFR). Statistical analysis of improved EGF mutants isolated from randomly mutated yeast-displayed libraries indicates that mutations are biased towards substitutions at positions exhibiting significant phylogenetic variation. In particular, mutations in high-affinity EGF mutants are statistically biased towards residues found in orthologous EGF species. This same trend was also observed with other proteins engineered through directed evolution in our laboratory (EGFR, interleukin-2) and in a meta-analysis of reported results for engineered subtilisin. By contrast, reported loss-of-function mutations in EGF were biased towards highly conserved positions. Based on these findings, orthologous mutations were introduced into a yeast-displayed EGF library by a process we term shotgun ortholog scanning mutagenesis (SOSM). EGF mutants with a high frequency of the introduced ortholog mutations were isolated through screening the library for enhanced binding affinity to soluble EGFR ectodomain. These mutants possess a 30-fold increase in binding affinity over wild-type EGF to EGFR-transfected fibroblasts and are among the highest affinity EGF proteins to be engineered to date. Collectively, our findings highlight a general approach for harnessing information present in phylogenetic variability to create useful genetic diversity for directed evolution. Our SOSM method exploits the benefits of library diversity obtained through complementary methods of error-prone PCR and DNA shuffling, while circumventing the need for acquisition of multiple genes for family or synthetic shuffling.
Keywords: epidermal growth factor/epidermal growth factor receptor/in vitro evolution/protein engineering/yeast display
| Introduction |
|---|
In protein directed evolution, a genetically diverse population of candidate mutants is screened for improvement in a phenotype of interest. A number of methods for creating the genetic diversity of the mutant pool have been explored. A common tactic is to randomly mutagenize by error-prone PCR, varying the error rate by the use of different polymerases, reaction conditions, or non-natural nucleotide analogs (Zaccolo et al., 1996).
In analyzing a panel of improved-affinity epidermal growth factor (EGF) mutants derived by error-prone PCR and yeast surface display screening, we observed a strong statistical bias towards orthologous substitutions. This observation was extended to two other engineered proteins, interleukin-2 (IL-2) (Rao et al., 2004, 2005) and the EGF receptor (EGFR) ectodomain (Kim et al., 2005), as well as a sample of 44 previously reported subtilisin mutants (Siezen and Leunissen, 1997; Bryan, 2000). We found that 40–80% of the selected mutations for each of these four proteins were present in at least one ortholog, which is a significantly higher frequency than expected by random chance. The statistically significant bias towards orthologous substitutions at variable sites in improved mutants from genes mutagenized by error-prone PCR indicates that even in the absence of an explicit attempt to shuffle family members, there is a substantial overlap between family shuffling and random mutagenesis in the sequence subspace that is selected. Based on these results, we applied a strategy to combinatorially sample phylogenetic diversity by a process we term shotgun ortholog scanning mutagenesis (SOSM), in which orthologous substitutions were shuffled with EGF mutations created by error-prone PCR. This technology is related to, but distinct from, previous strategies, including shotgun substitution of homologous residues (Murase et al., 2003; Sato et al., 2004) and substitution of the most phylogenetically common amino acids (Lehmann et al., 2000, 2002). SOSM bridges two alternative approaches to the generation of genetic diversity for directed evolution, error-prone PCR and family shuffling, to sample a range of possible side chains at variable positions. Collectively, these approaches allowed us to isolate dozens of EGF mutants with up to a 30-fold increase in binding affinity for the EGFR.
| Materials and methods |
|---|
Soluble protein production
Soluble EGFR ectodomain (residues 1–621) was expressed in Hi Five™ cells using the InsectSelect™ constitutive expression system (Invitrogen). EGFR protein was purified from the culture medium by immunoaffinity chromatography, using a Sepharose column covalently cross-linked to the EGFR-specific monoclonal antibody 225. Analysis of purified EGFR extracellular domain by size exclusion chromatography and SDS-PAGE demonstrated a single protein species at the expected molecular weight, indicating that the soluble receptor was not aggregated or hyperglycosylated. EGF wild-type and mutant DNA was subcloned into a yeast secretion vector containing N-terminal FLAG and C-terminal hexahistidine epitope tags. Secreted EGF proteins were isolated from YVH10 Saccharomyces cerevisiae supernatants by metal (Ni) chelating chromatography (Qiagen). EGF proteins were further purified by size exclusion chromatography, and purity was confirmed using SDS-PAGE (16% Tricine gels). For flow cytometry experiments, soluble EGFR or EGF was fluorescently labeled with Alexa-488 succinimide ester (Molecular Probes) through accessible lysine ε-amino groups.
Error-prone PCR using nucleotide analogs
The human EGF gene was subcloned into the pCT302 backbone at NheI and BamHI restriction sites. This vector is termed pCT EGF. The yeast display construct includes a C-terminal c-myc epitope tag for detection and quantitation of cell surface proteins. PCR primers with 50 bp of overlapping sequence in the forward and reverse direction were designed for homologous recombination in yeast (Swers et al., 2004). Primers were truncated at the NheI (5') and BamHI (3') restriction sites flanking the EGF insert. The pCT EGF plasmid was subjected to random mutagenesis by error-prone PCR with low fidelity Taq polymerase (Invitrogen) and 50 mM MgCl2. To tune the mutagenic frequency, varying amounts of the nucleotide analogs 8-oxo-dGTP (4 µM, 40 µM, or 400 µM) and dPTP (2 µM, 20 µM, or 200 µM) (TriLink Biotech) were used in separate PCRs consisting of 5, 10 and/or 20 cycles (Zaccolo et al., 1996; Zaccolo and Gherardi, 1999). PCR products were amplified in the absence of nucleotide analogs, and 90 µg of mutagenic DNA insert and 9 µg of restriction enzyme cut pCT vector backbone were transformed into EBY100 competent yeast cells (Boder and Wittrup, 1997) by electroporation (Meilhoc et al., 1990). A library of ~1.2 × 10^7 yeast transformants was obtained, as estimated by plating aliquots of the library and colony counting. The library was propagated and induced for protein expression at 30°C by the addition of galactose to the culture media.
DNA shuffling
Additional EGF mutant libraries were constructed by combining error-prone PCR using nucleotide analogs and DNA shuffling. DNA templates from the original error-prone PCR (45%), the enriched library pool from the first round of EGF affinity maturation (45%), and single clones of affinity enhanced EGF mutants (1% each of 10 mutants) were randomly fragmented with DNase I for 7 min at 15°C. DNA fragments were purified using a Qiaex II kit (Qiagen), reannealed, and amplified by PCR with primers as described above for homologous recombination. A total of 112 µg of shuffled, mutagenic DNA and 5.4 µg of restriction enzyme cut acceptor vector were transformed into the yeast strain EBY100 by homologous recombination to create a library of ~10^7 clones. For SOSM, DNA templates from error-prone PCR (percentages as described above) were shuffled with single-stranded oligonucleotides that were ~25 bases in length and corresponded to orthologous mutations indicated in Figure 4. Seven separate self-assembly reactions were performed, each combining two or three oligonucleotides (500 ng of each) whose DNA sequences did not overlap with 0.8 µg of DNase I treated fragments. One additional doping reaction was set up combining all 19 oligonucleotide sequences (2 µg total) with 1 µg of DNase I treated fragments. Re-assembled DNA was amplified by PCR as above, and 15 µg of each insert (120 µg total) was transformed with 0.65 µg of cut acceptor vector (5.2 µg total) into yeast by homologous recombination to create a library of ~7 × 10^6 clones.
Flow cytometric EGF library screening
For error-prone PCR library screening, 10^8 induced yeast cells were labeled with 200 nM of Alexa-488 labeled EGFR for 2 h at 37°C in phosphate-buffered saline containing 1 mg/ml BSA (FACS buffer). To detect expression of the C-terminal c-myc epitope tag, a 1:10 dilution of monoclonal antibody 9E10 (Covance) was added for the last 30 min of incubation. Yeast cells were washed with ice-cold FACS buffer and labeled with a 1:6 dilution of goat anti-mouse phycoerythrin secondary antibody (Sigma) for 15 min at 4°C. Cells were washed and screened by dual-color flow cytometric sorting for yeast cells which both displayed EGF mutant proteins and bound to Alexa-488 EGFR using a DakoCytomation (Carpinteria, CA) MoFlo FACS machine. Collected yeast cells were cultured, induced for expression, and subjected to two subsequent rounds of flow cytometric sorting with 50 nM of Alexa-488 EGFR. Plasmid DNA was recovered from yeast clones isolated after the second and third sorts using a Zymoprep™ kit (Zymo Research) and amplified in XL1-blue supercompetent Escherichia coli cells (Stratagene). DNA sequencing of EGF mutants was performed by the MIT Biopolymers facility. For the DNA shuffled and SOSM libraries, 10^8 yeast cells were labeled with 75 nM Alexa-488 EGFR and 9E10 monoclonal antibody as described above. For subsequent rounds of flow cytometric sorting, 15 and 3 nM Alexa-488 EGFR was used for yeast screening. Plasmid DNA was recovered from isolated clones after the second and third rounds of cell sorting and analyzed as above.
Statistical analysis
Information content (R) for each amino acid site in a collection of orthologs was determined by the following formula:
[Equation image not reproduced in this copy.]
EGF was screened for improved binding affinity for soluble EGFR; 121 mutations were analyzed, and variability was determined from 9 orthologs. Since some of the EGF orthologs are missing residues 49–53, only residues 1–48 were considered in the analysis. IL-2 was screened for improved affinity for the IL-2 receptor alpha subunit (Rao et al., 2004, 2005); 112 mutations were analyzed, and variability was determined from 43 orthologs. The EGFR was screened for improved display of conformational epitopes on the yeast surface (Kim et al., 2005); 6 mutations were analyzed, and variability was determined from 13 orthologs. Subtilisin mutants were selected for a variety of phenotypes, with results from previously published studies (Bryan, 2000) analyzed and compared to data for 36 orthologs.
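The per-site calculation can be illustrated with a minimal sketch. It rests on an assumption that is not confirmed by the text above: that R is a maximum value minus the Shannon entropy of the residue frequencies at a site, with the maximum set to 4.19 bits, the value quoted in the Results for a perfectly conserved site. The function name, the constant name and the example columns are hypothetical, and the sketch omits gap handling and any small-sample correction.

```python
import math
from collections import Counter

# Assumption: 4.19 bits is used as the maximum, matching the R-value quoted
# in the Results for a perfectly conserved site (Weiss et al., 2000).
R_MAX = 4.19


def site_information(column, r_max=R_MAX):
    """Return r_max minus the Shannon entropy (in bits) of one alignment column.

    `column` holds one residue per ortholog at a single aligned site.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return r_max - entropy


# Hypothetical columns for nine orthologs (not real EGF data).
conserved_site = "CCCCCCCCC"   # one residue only, so entropy is zero
variable_site = "DDNNYYSSE"    # five different residues

print(round(site_information(conserved_site), 2))
print(round(site_information(variable_site), 2))
```

Under this assumed definition a site that never varies scores the maximum, and the score falls as more residues share the site more evenly.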
EGF competition binding to EGFR-transfected fibroblasts
NR6, a murine 3T3-derived fibroblast cell line that lacks endogenous EGFR, was stably transfected with wild-type human EGFR to generate NR6 WT (Chen et al., 1994). Prior to performing binding assays, confluent NR6 WT cells were dislodged from tissue culture plates with Versene (Gibco). EGF competition binding was measured in two ways to ensure that equilibrium had been reached. In the first, 5 × 10^4 NR6 WT cells were incubated with Alexa-488 labeled EGF wild-type (Peprotech) for 30 min at 4°C. Increasing concentrations of unlabeled EGF wild-type or mutants were added, and samples were incubated for an additional 6 h at 4°C, with constant mixing. Alternatively, increasing concentrations of unlabeled EGF wild-type and mutants were first added to the cells for 30 min at 4°C, and Alexa-488 EGF wild-type was added for an additional 6 h at 4°C. Fluorescence intensity of cell surface Alexa-488 EGF wild-type labeling was measured by flow cytometry. Binding assays were performed in PBS supplemented with 1 mg/ml BSA (pH 7.4), under conditions where ligand depletion was negligible. Competition binding curves were fit using a four-point binding equation. Standard deviation represents replicate binding experiments performed in at least triplicate on different days using two different protein preparations.
| Results |
|---|
Isolation of EGF mutants with increased binding to the EGFR
Human EGF was expressed on the surface of yeast as a protein fusion to the Aga2p agglutinin subunit (Boder and Wittrup, 1997). Fluorescently labeled soluble EGFR extracellular domain was shown to specifically bind to the yeast-displayed EGF, demonstrating that the EGF and EGFR proteins were properly folded and functional (data not shown). A library of yeast-displayed EGF mutants was created by error-prone PCR of the EGF plasmid DNA using nucleotide analogs. Thirty clones from the starting library were sequenced, and demonstrated a diverse range of mutations, from 1 to 14 amino acid changes in the EGF protein. The yeast library was screened by dual-color flow cytometry for clones that both displayed EGF (as determined by indirect immunofluorescence of a C-terminal c-myc epitope tag) and bound to Alexa-488 labeled EGFR extracellular domain. Three rounds of flow cytometric sorting were used to obtain an enriched pool of yeast cells that exhibited enhanced binding to soluble EGFR protein over EGF wild-type. Twenty clones were isolated and sequenced, and 12 distinct mutant proteins were identified (Table I, Library 1). DNA shuffling (Stemmer, 1994a) was performed to increase library diversity, and to recombine favorable EGF mutations in an attempt to isolate clones with further increased binding affinity to the EGFR. After three rounds of flow cytometric sorting of this library, an enriched population of yeast with enhanced binding affinity to soluble EGFR extracellular domain was obtained, and eleven unique clones were identified (Table I, Library 2). In general, mutations isolated from DNA shuffling and/or nucleotide analog mutagenesis were encoded by identical codons, as one base change was required to attain the observed amino acid substitution. Interesting exceptions were E24K, which was coded for by both AAA and AAG; and I38A, which required two base changes to go from ATC to GCC (Table I).
Amino acid mutations are biased towards homologous substitutions in protein variable regions
The amino acid sequences of the affinity enhanced EGF mutants indicated that a number of mutations selected from the directed evolution library screens were present in the EGF sequences from other species (Figure 1). This observation prompted us to compare the level of conservation at sites mutated by directed evolution with sites that are found to be variable amongst EGF orthologs. The measure of variability in orthologs can be calculated as the information content (R), related to the Shannon entropy information content (Schneider et al., 1986). A perfectly conserved site has an R-value of 4.19 (Weiss et al., 2000), and a completely random site has an R-value of 0. We found that amino acid changes in populations of affinity enhanced EGF mutants were biased away from the most conserved sites in orthologous proteins (Supplementary data are available at PEDS online). Interestingly, this trend was also observed when analyzing sequence data from our laboratory on affinity matured IL-2 mutants (Rao et al., 2004, 2005), and on EGFR extracellular domain that was engineered for enhanced folding and expression (Kim et al., 2005). Previously published studies of subtilisin mutants selected for a variety of improved phenotypes (Bryan, 2000) also preferentially utilized homologous residues (Supplementary data are available at PEDS online). The averaged histogram of the information content for all four proteins indicates that mutated sites are less conserved than nonmutated sites (Figure 2). Sites selected for improved function by directed evolution were enriched at intermediate information content (2–3), and disfavored at highly conserved sites (3.5–4.5) (Figure 2). By contrast, in structure/function studies of EGF, sites found to ablate EGF function were enriched at highly conserved sites, and disfavored at intermediate information content (Figure 2B). These two difference histograms together demonstrate that mutations that improve protein function most often occur at sites that are not highly conserved but are of intermediate variability. Conversely, loss-of-function mutations are more likely to occur at highly conserved sites.
The dataset analyzed in Figure 2 was examined for the fraction of mutations changed to orthologous residues for each protein (Figure 3). This frequency was compared to two random negative control values, one theoretical and one experimental. The theoretical value was calculated by enumerating all possible nucleotide point substitutions and determining whether they coded for an orthologous mutation (Figure 3). This predicted fraction agrees well with sequences sampled at random from the prescreened libraries for EGF, IL-2, and EGFR (Figure 3). Data were not available for the prescreened subtilisin libraries from previous studies. The statistical significance was determined from an exact binomial distribution confidence limit that only includes the observed fraction of selected orthologous mutations and the fraction theoretically expected. The P-value is the probability that the two results are drawn from a binomial distribution with the same mean. These data demonstrate that amino acid changes in mutants selected by directed evolution are very often to residues present in at least one ortholog, at a frequency significantly greater than expected from the theoretical distribution of random point nucleotide mutations or the actual prescreened library. Note that the probability of a random mutation coding for an orthologous substitution is a function of the size of the ortholog database available (9 for EGF, 43 for IL-2, 13 for EGFR, 36 for subtilisin). Nevertheless, for each protein the probability of selecting an improved mutant with an orthologous substitution is significantly greater than expected by chance. Of course, the phylogenetic information available for any given protein is necessarily incomplete, and so the pairwise selected/random comparison is only valid within a given dataset.
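The theoretical control and the significance test described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: silent and nonsense substitutions are excluded from the denominator (the text does not say how they were treated), a one-sided exact binomial tail stands in for the confidence-limit procedure, and the function names, codons and ortholog residue sets are hypothetical.

```python
from math import comb

# Standard genetic code, codons ordered with T, C, A, G at each position.
ORDER = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(ORDER)
               for j, b in enumerate(ORDER)
               for k, c in enumerate(ORDER)}


def orthologous_fraction(codons, ortholog_residues):
    """Fraction of residue-changing point substitutions that yield a residue
    present at the same site in at least one ortholog."""
    changed = hits = 0
    for site, codon in enumerate(codons):
        wild_type = CODON_TABLE[codon]
        for pos in range(3):
            for base in "ACGT":
                if base == codon[pos]:
                    continue
                mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                # Assumption: silent and stop-codon changes are not counted.
                if mutant == wild_type or mutant == "*":
                    continue
                changed += 1
                if mutant in ortholog_residues[site]:
                    hits += 1
    return hits / changed


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Hypothetical two-codon gene (Glu, Ile) with made-up ortholog residues.
expected = orthologous_fraction(["GAA", "ATC"], [{"K", "D"}, {"V", "T"}])

# Hypothetical counts: 12 of 20 selected mutations are orthologous.
p_value = binomial_upper_tail(12, 20, expected)
print(expected, p_value)
```

The expected fraction depends on how many ortholog residues are listed per site, which is why the comparison between selected and random mutations is only meaningful within one ortholog dataset.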
Introduction of orthologous mutations into the yeast-displayed EGF library
To further test the significance of selection bias towards orthologous substitutions, specific amino acids corresponding to EGF residues found in a variety of species were introduced into a mutagenic EGF library (Figure 4). Synthetic oligonucleotides coding for amino acid sequences of orthologous EGF residues were shuffled with DNA created by error-prone PCR to be incorporated into the reassembled genes. In addition, synthetic oligonucleotides containing an NNK sequence (where N = A, C, G, or T; K = G, or T) flanked by 19 bases of homology to the EGF template were added to randomly mutagenize residues Ile 38, Glu 51 and Leu 52, since these sites showed much sequence diversity in the clones isolated from the original libraries. After two rounds of flow cytometric screening with soluble EGFR extracellular domain, an enriched population of yeast-displayed EGF mutants with increased binding to EGFR was obtained. Fourteen unique clones were isolated from this library (Table II). Interestingly, a number of the introduced orthologous mutations occurred 3 times or more in the isolated clones (Figure 4 and Table II) and were distinct from mutations identified from the first two libraries created by error-prone PCR and DNA shuffling (Figure 1 and Table I).
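As a short illustration of what the NNK randomization samples, the sketch below enumerates the codons defined by N = A, C, G or T and K = G or T and translates them with the standard genetic code. The variable names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
ORDER = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(ORDER)
               for j, b in enumerate(ORDER)
               for k, c in enumerate(ORDER)}

# N = A, C, G or T at the first two positions; K = G or T at the third.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]

amino_acids = sorted({CODON_TABLE[c] for c in nnk_codons} - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons), len(amino_acids), stop_codons)
```

Restricting the third base to G or T keeps all 20 amino acids accessible while leaving a single stop codon among the 32 possibilities.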
Soluble EGF mutants exhibit enhanced binding to EGFR-transfected fibroblasts
Dose-dependent binding curves of yeast-displayed EGF mutants could not be performed due to lack of sufficient quantities of recombinant EGFR extracellular domain. Therefore, yeast-displayed EGF mutants were analyzed by flow cytometry for relative binding using non-saturating concentrations of Alexa-488 labeled EGFR (data not shown). Alexa-488 fluorescence was normalized for yeast EGF expression levels through dual-color detection of a C-terminal c-myc epitope tag. Two or three mutants from each library that exhibited the highest levels of normalized fluorescence were chosen for soluble production in yeast (Figure 4).
Soluble EGF proteins were assayed for binding to the fibroblast cell line NR6 WT, which has been stably transfected to express human EGFR (Chen et al., 1994), to determine if decoupled EGF yeast fusion proteins would retain their increased EGFR binding affinity. Half-maximal values of binding (IC50) were determined by competition binding of EGF wild-type and mutant proteins with Alexa-488 labeled EGF wild-type (Figure 4). EGF clone 28 (IC50 = 0.29 ± 0.11 nM) and clone 30 (IC50 = 1.1 ± 0.4 nM) were shown to have approximately a 15- and 4-fold increase in binding over EGF wild-type (IC50 = 4.2 ± 0.9 nM), respectively. EGF clone 114, isolated from the shuffled DNA library, exhibited approximately a 30-fold increase (IC50 = 0.16 ± 0.02 nM) in binding to the NR6 WT fibroblast cells over EGF wild-type. EGF clone 96 was not able to be expressed in soluble form in yeast and could not be further characterized. EGF clones 121 and 123, isolated from SOSM, competed for binding to the fibroblast cells with an IC50 of 0.14 ± 0.07 nM and 0.49 ± 0.25 nM, respectively. When expressed on the yeast cell surface, EGF clone 107 exhibited the highest levels of normalized fluorescence upon the addition of soluble Alexa 488-EGFR ectodomain. However, this mutant protein precipitated when expressed in soluble form, and was not characterized further.
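The fold changes quoted above follow from dividing the wild-type IC50 by each mutant IC50. The sketch below does this with the mean values reported in this section; it ignores the reported standard deviations, so each ratio is only a point estimate, and the dictionary and variable names are illustrative.

```python
# Mean IC50 values (nM) as reported in the text; standard deviations omitted.
ic50_nm = {
    "wild-type": 4.2,
    "clone 28": 0.29,
    "clone 30": 1.1,
    "clone 114": 0.16,
    "clone 121": 0.14,
    "clone 123": 0.49,
}

# Fold increase in binding = wild-type IC50 / mutant IC50.
fold_increase = {
    name: ic50_nm["wild-type"] / value
    for name, value in ic50_nm.items()
    if name != "wild-type"
}

for name, fold in fold_increase.items():
    print(f"{name}: {fold:.1f}-fold")
```

Because the standard deviations are a substantial fraction of several of these means, differences between closely spaced ratios should be read with caution.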
| Discussion |
|---|
We report here two statistical features of mutations found in each of four different proteins improved by directed evolution: (i) improvement-of-function mutations occur disproportionately at sites of intermediate phylogenetic conservation and (ii) residue substitutions are orthologous at a significantly higher than expected frequency.
Phylogenetic information has been used in a number of previous mutagenesis strategies. Family shuffling can be used to sample phylogenetic variation but requires acquisition of the gene for each homolog (Crameri et al., 1998; Leong et al., 2003). Synthetic shuffling circumvents this requirement but instead requires synthesis of a full set of overlapping degenerate oligonucleotides (Ness et al., 2002). The approach used here requires only a small set of degenerate oligonucleotides to be doped into a standard DNase shuffling reaction, an approach also successful for CDR randomization in antibodies (Crameri et al., 1996). Synthetic shuffling of 15 subtilisin orthologs with 52 variable sites has demonstrated that phylogenetically represented substitutions are generally very well tolerated, either singly or in pairs (Govindarajan et al., 2003). It has been shown previously that shotgun substitution of homologous residues and screening by phage display of EF-Tu (Murase et al., 2003) or the engrailed homeodomain (Sato et al., 2004) can both identify functionally critical sites and slightly improve binding affinity. It should be emphasized that SOSM is distinct from, and in some ways the converse of, consensus engineering, in which residues in a given gene are mutated to match the most phylogenetically common amino acid, resulting in improved thermostability (Lehmann et al., 2000, 2002). By contrast, in SOSM, sites of intermediate variability are mutated to sample a nonconvergent range of side chains found at that position in orthologous proteins, as in family shuffling.
Voigt et al. (2001) have developed a computational method for identifying sites that are tolerant to substitution with respect to predicted stability. Phylogenetic variability can be considered an analogous but more restrictive sampling of sequence space, since stability is only one component of a protein's phenotypic fitness in nature, which would include attributes such as enzymatic activity, binding affinity, specificity, solubility and kinetic folding capability. In vitro evolution of TEM-1 β-lactamase resistance to a variety of antibiotics has been shown to recapitulate substitutions found in natural isolates (Barlow and Hall, 2002), a capability that can be applied predictively (Orencia et al., 2001). Phylogenetically conserved networks of side chain interactions have been shown to contribute to allosteric communication among distant sites in proteins (Suel et al., 2003), and disruption of such networks may partially account for the statistical bias against mutations at highly conserved sites observed here.
EGF has been considered for wound healing applications, and development and maintenance of the nervous system (Werner and Grose, 2003; Xian and Zhou, 2004). EGF superagonists exhibiting enhanced biological potencies or altered pharmacokinetic profiles would be useful in regenerative medicine applications. In addition, EGF antagonists able to block receptor dimerization and signaling could be useful in developing therapeutics against EGFR overexpressed on a variety of human malignancies (Ciardiello and Tortora, 2003). Receptor binding affinity is coupled to function, such that modulation of ligand/receptor interactions could produce molecules with altered biological properties. Heregulin, an EGF paralog, has been engineered by phage display for mutants that possess a 50-fold increase in receptor binding affinity (Ballinger et al., 1998). However, when these heregulin mutants were decoupled from the phage surface, they exhibited diminished receptor binding affinities (Ballinger et al., 1998). In contrast, when produced solubly in yeast, our EGF mutants retained their increased EGFR binding affinity over wild-type EGF (Figure 4).
Error-prone PCR alone identified only one EGF mutant with substantially improved binding affinity, while several candidates were identified by DNA shuffling and SOSM (Figure 4 and data not shown). EGF clone 28, isolated from nucleotide analog mutagenesis, exhibited a 15-fold improvement in binding to EGFR on fibroblast cells. An ~2-fold further increase in EGFR binding affinity was obtained from mutants isolated by DNA shuffling and SOSM libraries. The identity, and in some instances the location, of improved mutations isolated from SOSM varied from mutations identified from DNA shuffling and/or error-prone PCR. As the data show, several doped mutations were distinctly present 3 times or more in SOSM isolated mutants (D3N, D3Y, H16N, I38V). In contrast, several mutations that had been isolated 3 times or more from the DNA shuffled and/or error-prone PCR libraries were now absent in the SOSM clones (S2R, E5G, E24K, I38T, K48T). Furthermore, only two out of seven mutations from the best clone obtained by error-prone PCR (clone 28) were represented in clones identified from SOSM. More importantly, the percentage of orthologous mutations present in clones isolated from the SOSM library was 61%, compared with the error-prone PCR (55%) and DNA shuffled libraries (42%). Collectively, these findings appear to have only a modest impact on EGFR binding affinity; however, they could have important biological implications for developing EGF agonists or antagonists.
Our EGF mutagenesis data provide insight into previous studies of EGF/EGFR binding affinity and function probed by site-directed mutagenesis and phage display [(Mullenbach et al., 1998; Souriau et al., 1997, 1999); and Supplementary data are available at PEDS online]. Earlier NMR studies proposed that an EGF surface patch defined by residues Y13/L15/H16 and R41/Q43/L47 was involved in high affinity binding of EGF to the EGFR (Campbell et al., 1989). Site-directed mutagenesis of these positions and others identified residues important for receptor interaction or protein integrity [(Campion et al., 1990; Engler et al., 1992; Matsunami et al., 1991; Nandagopal et al., 1996); and Supplementary data are available at PEDS online]. Based on these findings, several groups investigated whether these key EGF residues could be modified for improvements in binding affinity and hence biological function. In one study, phage displayed libraries of EGF randomized at positions R41 or D46 failed to identify affinity improved EGF mutants (Souriau et al., 1997). Similarly, randomization at positions Y13, L15 and H16 demonstrated that these positions were already optimized for binding affinity and activity (Souriau et al., 1999). Marginal affinity improvements were obtained by site-directed mutagenesis of EGF residues G12Q, H16D and Y13W (Mullenbach et al., 1998). The mutant G12Q was reported to have a 5-fold increase in binding over wild-type EGF to paraformaldehyde-fixed A431 cells (Mullenbach et al., 1998). However, when tested in our experimental system the G12Q mutant exhibited a 2-fold decrease in binding relative to wild-type EGF on EGFR-transfected NR6 fibroblast cells (IC50 = 9.5 ± 0.9 nM versus IC50 = 4.2 ± 0.9 nM for EGF wild-type) (Figure 4). Previously, EGF mutants with increased receptor binding affinity were reported by degenerate homoduplex gene family recombination (Coco et al., 2002). Unfortunately, these studies could not be compared to the present results, as we were not successful in obtaining binding titrations or amino acid sequences for these mutants. Our combinatorial approach allowed for random incorporation of mutations and high mutagenic frequencies that appeared to be required for isolating EGF proteins with enhanced binding affinity to the EGFR. The large number of mutations found in the high affinity EGF mutants was surprising due to the small size of the protein, but is not uncommon for mutants isolated from affinity maturation through directed evolution (Ballinger et al., 1998; Zaccolo and Gherardi, 1999; Drummond et al., 2005), and is consistent with the percent homology amongst EGF orthologs (Figures 1 and 4).
Selection of mutants possessing amino acid changes only in regions of intermediate variability most likely reflects the requirement to maintain protein stability and folding. This is intuitively reasonable, since protein families composed of related amino acid sequences adopt a conserved structure and fold. In a previous study, amino acid residues that were critical to function were retained as wild-type in heregulin mutants engineered for enhanced receptor binding, while residues that tolerated alanine substitution were altered in the high-affinity mutants (Ballinger et al., 1998). In EGF, the conserved regions of the protein have been shown to be important for structural integrity. Glycine residues at positions 18 and 39 are strictly required because an amino acid side-chain at position 18 would protrude into the interior of the protein and interfere with disulfide bond formation, and a side-chain at position 39 would conflict with residues Gln 43 and Tyr 44 (Groenen et al., 1994). EGF contains two hydrophobic, aromatic clusters that are important for structural integrity, centered around residues His 10, Tyr 13 and Tyr 22 in the N-terminal domain, and Tyr 37 in the C-terminal domain (Groenen et al., 1994). All of these residues are unchanged in our affinity-enhanced EGF mutants, with the exception of His 10; however, a hydrophobic, aromatic amino acid is retained at this position by its mutation to Tyr. Leu 15 (Nandagopal et al., 1996), Leu 47 (Matsunami et al., 1991) and the guanidinium group of Arg 41 (Engler et al., 1992) have been shown to be important for binding to the EGFR and are conserved in all of the EGF mutants and orthologs presented here. In addition, a small hydrophobic residue at position 30 (Ala in hEGF) and large hydrophobic residues at positions 23 and 26 (Ile and Leu in hEGF, respectively) (Campion et al., 1990; Groenen et al., 1994) are retained in the EGF mutants, suggesting that these positions are structurally important. Interestingly, the EGF mutant clone 96 contained a Ser 9 to Pro mutation at a site that is conserved in EGF orthologs (Figures 1 and 4). We were unable to express this mutant as a soluble protein in yeast, which further suggests that alteration of conserved residues is detrimental to protein integrity and stability. The observation that mutations selected by directed evolution are biased towards orthologous residues implies that nature utilizes a select subset of residues among different species for proper folding and expression of proteins.
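The distinction drawn above, between substitutions to residues found in orthologs, substitutions at strictly conserved sites, and other substitutions, can be made concrete with a small sketch. Everything here is an assumption made for illustration: the function names, the three labels and the toy six-residue sequences are hypothetical and are not EGF data or the statistical analysis used in this work.

```python
from typing import Dict, List, Set, Tuple


def ortholog_residues(aligned_orthologs: List[str]) -> Dict[int, Set[str]]:
    """Map each 1-based alignment position to the residues seen in orthologs."""
    length = len(aligned_orthologs[0])
    if any(len(seq) != length for seq in aligned_orthologs):
        raise ValueError("ortholog sequences must be aligned to equal length")
    return {
        i + 1: {seq[i] for seq in aligned_orthologs if seq[i] != "-"}
        for i in range(length)
    }


def classify_mutations(
    wild_type: str, mutant: str, observed: Dict[int, Set[str]]
) -> List[Tuple[str, str]]:
    """Label every substitution in `mutant` relative to `wild_type`."""
    calls = []
    for pos, (wt, mut) in enumerate(zip(wild_type, mutant), start=1):
        if wt == mut:
            continue
        if mut in observed[pos]:
            label = "orthologous"        # residue occurs in some ortholog
        elif observed[pos] <= {wt}:
            label = "conserved-site"     # orthologs show only the wild-type residue
        else:
            label = "non-orthologous"    # variable site, but a novel residue
        calls.append((f"{wt}{pos}{mut}", label))
    return calls


# Toy sequences (hypothetical, not EGF).
wild_type = "ACDEFG"
orthologs = ["ACDQFG", "ACNEFG", "ACDEYG"]
mutant = "APDQLG"
for name, label in classify_mutations(wild_type, mutant, ortholog_residues(orthologs)):
    print(name, label)
```

In this toy case the substitution at the invariant position is flagged as a conserved-site change, which is the class that the Ser 9 to Pro mutation in clone 96 falls into according to the text.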
In summary, we have demonstrated that screening of unbiased, randomly mutagenized libraries frequently leads to isolation of a select set of mutations similar to those isolated from family shuffling libraries. SOSM provides an alternative means of extensively sampling phylogenetic diversity without full gene synthesis or physical acquisition of a bank of orthologous genes.
| Acknowledgements |
|---|
We thank Eric T. Boder (University of Pennsylvania) for insect cell culture facilities, Douglas Lauffenburger (MIT) for mammalian cell culture facilities, Alan Wells (University of Pittsburgh) for the EGF gene and the NR6 WT cell line, and the MIT Flow Cytometry Core Facility for assistance with flow cytometric sorting. This work was supported by National Institutes of Health Grants CA096504 and F32 CA94796-01 (to J.R.C.).
| References |
|---|
Ballinger M.D., Jones J.T., Lofgren J.A., Fairbrother W.J., Akita R.W., Sliwkowski M.X., Wells J.A. (1998) J. Biol. Chem. 273:11675-11684.
Barlow M. and Hall B.G. (2002) Genetics 160:823-832.
Boder E.T. and Wittrup K.D. (1997) Nat. Biotechnol. 15:553-557.
Bryan P.N. (2000) Biochim. Biophys. Acta 1543:203-222.
Campbell I.D., Cooke R.M., Baron M., Harvey T.S., Tappin M.J. (1989) Prog. Growth Factor Res. 1:13-22.
Campion S.R., Matsunami R.K., Engler D.A., Niyogi S.K. (1990) Biochemistry 29:9988-9993.
Chen P., Gupta K., Wells A. (1994) J. Cell Biol. 124:547-555.
Ciardiello F. and Tortora G. (2003) Eur. J. Cancer 39:1348-1354.
Coco W.M., Encell L.P., Levinson W.E., Crist M.J., Loomis A.K., Licato L.L., Arensdorf J.J., Sica N., Pienkos P.T., Monticello D.J. (2002) Nat. Biotechnol. 20:1246-1250.
Crameri A., Cwirla S., Stemmer W.P. (1996) Nat. Med. 2:100-102.
Crameri A., Raillard S.A., Bermudez E., Stemmer W.P. (1998) Nature 391:288-291.
Drummond D.A., Iverson B.L., Georgiou G., Arnold F.H. (2005) J. Mol. Biol. 350:806-816.
Engler D.A., Campion S.R., Hauser M.R., Cook J.S., Niyogi S.K. (1992) J. Biol. Chem. 267:2274-2281.
Govindarajan S., Ness J.E., Kim S., Mundorff E.C., Minshull J., Gustafsson C. (2003) J. Mol. Biol. 328:1061-1069.
Groenen L.C., Nice E.C., Burgess A.W. (1994) Growth Factors 11:235-257.
Hayes R.J., Bentzien J., Ary M.L., Hwang M.Y., Jacinto J.M., Vielmetter J., Kundu A., Dahiyat B.I. (2002) Proc. Natl Acad. Sci. USA 99:15926-15931.
Kim Y.S., Bhandari R., Cochran J.R., Kuriyan J., Wittrup K.D. (2005) Proteins 67:1026-1035.
Lehmann M., Kostrewa D., Wyss M., Brugger R., D'Arcy A., Pasamontes L., van Loon A.P. (2000) Protein Eng. 13:49-57.
Lehmann M., Loch C., Middendorf A., Studer D., Lassen S.F., Pasamontes L., van Loon A.P., Wyss M. (2002) Protein Eng. 15:403-411.
Leong S.R., Chang J.C., Ong R., Dawes G., Stemmer W.P., Punnonen J. (2003) Proc. Natl Acad. Sci. USA 100:1163-1168.
Matsunami R.K., Yette M.L., Stevens A., Niyogi S.K. (1991) J. Cell. Biochem. 46:242-249.
Meilhoc E., Masson J.M., Teissie J. (1990) Biotechnology (N. Y.) 8:223-227.
Mullenbach G.T., et al. (1998) Protein Eng. 11:473-480.
Murase K., Morrison K.L., Tam P.Y., Stafford R.L., Jurnak F., Weiss G.A. (2003) Chem. Biol. 10:161-168.
Nandagopal K., Tadaki D.K., Lamerdin J.A., Serpersu E.H., Niyogi S.K. (1996) Protein Eng. 9:781-788.
Ness J.E., Kim S., Gottman A., Pak R., Krebber A., Borchert T.V., Govindarajan S., Mundorff E.C., Minshull J. (2002) Nat. Biotechnol. 20:1251-1255.
Neylon C. (2004) Nucleic Acids Res. 32:1448-1459.
Orencia M.C., Yoon J.S., Ness J.E., Stemmer W.P., Stevens R.C. (2001) Nat. Struct. Biol. 8:238-242.
Rao B.M., Driver I., Lauffenburger D.A., Wittrup K.D. (2004) Mol. Pharmacol. 66:864-869.
Rao B.M., Driver I., Lauffenburger D.A., Wittrup K.D. (2005) Biochemistry 44:10696-10701.
Sato K., Simon M.D., Levin A.M., Shokat K.M., Weiss G.A. (2004) Chem. Biol. 11:1017-1023.
Schneider T.D., Stormo G.D., Gold L., Ehrenfeucht A. (1986) J. Mol. Biol. 188:415-431.
Siezen R.J. and Leunissen J.A. (1997) Protein Sci. 6:501-523.
Souriau C., Fort P., Roux P., Hartley O., Lefranc M.P., Weill M. (1997) Nucleic Acids Res. 25:1585-1590.
Souriau C., Gracy J., Chiche L., Weill M. (1999) Biol. Chem. 380:451-458.
Stemmer W.P. (1994a) Proc. Natl Acad. Sci. USA 91:10747-10751.
Stemmer W.P. (1994b) Nature 370:389-391.
Suel G.M., Lockless S.W., Wall M.A., Ranganathan R. (2003) Nat. Struct. Biol. 10:59-69.
Swers J.S., Kellogg B.A., Wittrup K.D. (2004) Nucleic Acids Res. 32:e36.
Voigt C.A., Mayo S.L., Arnold F.H., Wang Z.G. (2001) J. Cell. Biochem. Suppl. 37:58-63.
Weiss O., Jimenez-Montano M.A., Herzel H. (2000) J. Theor. Biol. 206:379-386.
Werner S. and Grose R. (2003) Physiol. Rev. 83:835-870.
Xian C.J. and Zhou X.F. (2004) Front. Biosci. 9:85-92.
Zaccolo M. and Gherardi E. (1999) J. Mol. Biol. 285:775-783.
Zaccolo M., Williams D.M., Brown D.M., Gherardi E. (1996) J. Mol. Biol. 255:589-603.
Received October 21, 2005; revised January 10, 2006; accepted February 16, 2006.
Edited by Andrew Bradbury