PEDS Advance Access originally published online on September 26, 2006
Protein Engineering Design and Selection 2006 19(11):517-524; doi:10.1093/protein/gzl039

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

Bioinformatics-driven, rational engineering of protein thermostability

Mary Kate DiTursi, Seok-Joon Kwon, Philippa J. Reeder and Jonathan S. Dordick1

Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute Troy, NY 12180-3590, USA

1To whom correspondence should be addressed. E-mail: dordick{at}rpi.edu


    Abstract
 
A longstanding goal in protein engineering is to identify specific sequence changes that endow proteins with desired functional properties. As opposed to traditional rational and random protein engineering techniques, we have employed a bioinformatic approach to identify specific sequence changes that influence key functional properties of a protein within a defined superfamily. Specifically, we have used the Bayesian sequence-based algorithms PROBE and Classifier to identify a strand–turn–strand motif that contributes to thermophilicity among members of the serine protease subtilase superfamily. By replacing a 16 amino acid sequence in the mesophilic subtilisin E (from Bacillus subtilis) with a bioinformatics-generated thermophilic model sequence, the melting temperature of subtilisin E was increased by 13°C. While wild-type subtilisin E was inactive at 90°C, the mutant retained a substantial fraction of its function, with ca. one-third of the activity that it has at 45°C.

Keywords: PROBE and Classifier/strand–turn–strand motif/subtilisin/thermostability


    Introduction
 
A major focus of applied biocatalysis is to develop enzymes with improved functional properties, such as tunable selectivity (DeSantis et al., 2002; Kaplan and DeGrado, 2004; Antikainen and Martin, 2005; Clark et al., 2006; Rubin-Pitel and Zhao, 2006) and increased activity and stability under harsh conditions (Chen and Arnold, 1993; Vieille and Zeikus, 2001; Haki and Rakshit, 2003; Eijsink et al., 2004, 2005; Li et al., 2005; Zhang et al., 2005). Two common approaches are rational design, through site-directed mutagenesis, and directed molecular evolution. The former requires extensive knowledge of protein structure, which is not always available (Kumar et al., 2000; Sriprapundh et al., 2000; Almog et al., 2002; Dwyer et al., 2004; Korkegian et al., 2005). The latter, which typically involves multiple rounds of random mutagenesis or gene shuffling followed in each round by screening, does not require knowledge of protein structure; however, it depends on the availability of a screen for the sought-after property (Zhao and Arnold, 1999; Petrounia and Arnold, 2000; Eijsink et al., 2004; Antikainen and Martin, 2005; Rubin-Pitel and Zhao, 2006).

An alternative to rational and random engineering of biocatalyst stability is the so-called consensus approach. This method has been widely applied to predicting protein thermal stabilization (Lehmann and Wyss, 2001; Eijsink et al., 2005; Watanabe et al., 2006). Based on multiple sequence alignments of a given set of related proteins with a common desired trait, such as thermophilicity, this approach identifies the most common amino acid at each position in conserved domains throughout the protein set. The consensus sequence of amino acids is then substituted into a protein of interest to introduce the desired trait (Lehmann and Wyss, 2001; Amin et al., 2004).
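The core step of the consensus approach, taking the most common residue at each aligned position, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `consensus`, the gap symbol and the toy alignment are assumptions made for the example and are not taken from the paper or from any of the cited tools.

```python
from collections import Counter


def consensus(aligned_seqs, gap="-"):
    """Return the most common residue at each column of an alignment."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned_seqs):
        # Count residues only; gaps do not vote for the consensus.
        counts = Counter(res for res in column if res != gap)
        # An all-gap column has no residue to report, so keep the gap.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


# Hypothetical toy alignment (illustrative only, not data from the paper).
toy_alignment = ["GKPDV", "GKPDI", "AKPDV", "GRPEV"]
print(consensus(toy_alignment))
```

Ties between equally frequent residues are broken arbitrarily here; a real application would need an explicit rule for such positions.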

Steipe et al. (1994) were the first to use the consensus approach to reliably predict stabilizing mutations in the immunoglobulin variable domain. They compared the variable domain of known immunoglobulin sequences and used thermodynamic statistical mechanics-based algorithms to identify conserved residues as well as likely stabilizing mutations at those positions. Lehmann et al. (2000, 2002) reported that select combinations of 38 individually stabilizing mutations in fungal phytases derived from consensus sequence analysis synergistically resulted in up to 10°C increases in thermal stabilization. Similarly, Amin et al. (2004) demonstrated the use of random mutagenesis of consensus domain DNA oligomers and subsequent statistical recombination to increase by nearly 10°C the thermal denaturation temperature of a β-lactamase from Enterobacter cloacae. Finally, in a phylogenetic approach, Watanabe et al. (2006) determined ancestral residues conferring thermostability and directly compared them with those identified by Steipe et al. (1994) using the consensus approach. Many of the residues identified by Steipe et al. were in fact ancestral; however, the degree of conservation was not directly related to the antiquity of the residue at that position. Thus, although a direct evolutionary context cannot be implied by consensus sequence information alone, such sequence information nevertheless provides a simple approach to predicting adaptive changes in proteins.

The value of phylogenetic methods lies in their sensitivity to short, sometimes referred to as ‘fuzzy’, sequence motifs found among species and remote homologs for the statistical definition of evolutionary distances and relationships (Sinsheimer et al., 2003; Cai et al., 2006; Ogden and Rosenberg, 2006; Watanabe et al., 2006; Whiting et al., 2006). Multiple sequence alignment for the purpose of phylogeny relies upon techniques that minimize penalties for sequence gaps and insertions, including use of Bayesian statistics and similar alignment scoring methods such as maximum likelihood, neighbor joining and maximum parsimony (Ogden and Rosenberg, 2006). Thermostability is an excellent candidate for such sensitive multiple alignment strategies given the tremendous variety of known point substitutions for the creation of thermal variants (Eijsink et al., 2004). These substitutions are often distinct among evolutionarily related protein families (Watanabe et al., 2006).

Along these lines, Neuwald et al. (1997) introduced PROBE as a Bayesian approach to multiple sequence alignment. The majority of existing alignment methods, including BLAST, function by recursively minimizing assigned penalties for mismatches, gap introduction and gap extension (Altschul et al., 1990). In PROBE, a transitive pairwise BLAST search identifies a set of remote sequence homologs, or protein superfamily, and determines a loose set of conserved domains within that grouping. Bayesian statistics then iteratively compare the collected domains for scoring purposes. Stronger domains are preserved and undergo individual pairwise alignment with all proteins in the complete database to identify additional remote superfamily members. Thus, a PROBE run returns not only an aligned sequence set, but also a probabilistic model describing each of the highly conserved regions. As a result, PROBE detects high-scoring alignments (or matches) from the database even in proteins with large insertions, N- or C-terminal extensions, and multiple domains (Liu and Lawrence, 1999; Lecompte et al., 2001). PROBE is thus a sensitive tool for identifying protein sequence superfamilies and their characteristic domains.

PROBE has been used to identify widely diverse members of the same superfamily of proteins whose highly conserved regions (and hence function) may be the same despite large portions of the protein sequence being widely different (Neuwald et al., 1997). Shaw and Dordick (2002) used PROBE and Classifier (another Bayesian algorithm that distinguishes subsets of pertinent protein sequences based on archetypal sequence motifs) to identify several residues responsible for binding specificity in nucleotide cyclases and two families of isoprenoid biosynthesis enzymes: FPPases and GGPPases. By analyzing the information content of each position in each motif within classified groups, specific residues were identified within those motifs that were most characteristic of the family identified by Classifier.
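One common way to quantify the information content of a motif position is the Shannon measure used in sequence logos: log2(20) minus the entropy of the residue distribution at that position. The Python sketch below illustrates that measure under simplifying assumptions (uniform background frequencies, no small-sample correction, gap-free columns). It is not necessarily the exact statistic computed by Classifier or used by Shaw and Dordick, and the function names and toy motif are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one column: log2(alphabet) - entropy."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def information_profile(aligned_seqs):
    """Information content of every column of a gap-free alignment."""
    return [column_information(col) for col in zip(*aligned_seqs)]


# Hypothetical four-residue motif instances (illustrative only).
toy_motif = ["KPDV", "KPDI", "KPEV", "RPDL"]
profile = information_profile(toy_motif)
for position, bits in enumerate(profile, start=1):
    print(f"position {position}: {bits:.2f} bits")
```

A fully conserved position reaches the maximum of log2(20), about 4.32 bits, while variable positions score lower; positions with high information content within one class are the candidates for class-specific residues.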

In the current work, PROBE and Classifier were used to identify a strand–turn–strand motif consensus sequence within the serine protease subtilase superfamily that appears to endow some thermophilic subtilisins with enhanced thermostability (‘subtilase’ was coined by Siezen et al. in 1991). The motif was located by comparison of thermophilic and mesophilic conserved domains within the superfamily. This sequence-based correlation was confirmed experimentally: insertion of the identified thermophilic motif conferred significantly enhanced thermostability on a nonthermophilic protease. Hence, we have demonstrated an example of the use of a consensus approach combined with Bayesian alignment strategies to introduce functional improvements into proteins.


    Materials and methods
 
Bioinformatic algorithms

PROBE was conducted using the sequences of subtilisin E (NCBI gi: 135022), subtilisin BPN' (NCBI gi: 67620) and pyrolysin (NCBI gi: 1556463) against the National Center for Biotechnology Information (NCBI) nonredundant protein databank. The database file from 9 August 2000 was acquired from NCBI and stored locally, and used as the target for all database scans. All bioinformatics work was carried out on a dual 750 MHz processor IBM® RS6000® Model 43P with 2 GB of RAM running AIX® 4.3.3.

PROBE was performed using the following parameters: transitive BLAST search maximum recursive depth: 20; database scan maximum E-value: 0.01; database scan single-block maximum E-value: 1; size of alignment population: 10; heapsize for database scan: 10 000; number of sequences in alignment (min–max): 5–300; minimum sequence length: 35; transitive BLAST maximum E-value: 0.01; maximum number of model refinement cycles: 10; purge maximum cutoff score: 300; number of Gibbs Sampling iterations: 2.

For Classifier runs on only the hypothetical-thermostable motif, modifications to the PROBE output files were made to produce a model consisting only of the data from the one motif. As classification could not be run to convergence using this model (the process requires all the motifs), a number of different runs were conducted and pattern analysis of the output was conducted to obtain consensus groups.

Block Maker, used for comparative analysis, is a web-based server for multiple sequence alignment and domain analysis supported by researchers at the Fred Hutchinson Cancer Research Center (blocks.fhcrc.org). Only higher scoring portions of the thermostable and mesostable subtilase superfamilies generated by PROBE using transitive BLAST were used as inputs to the server because of a 250 sequence limit.

Construction of subtilisin E mutant

The gene for subtilisin E was obtained as a gift from Frances Arnold (California Institute of Technology), along with a modified pBE3 expression vector containing both Bacillus subtilis and Escherichia coli promoters, an E. coli ampicillin-resistance site and a Bacillus kanamycin-resistance site (Zhao and Arnold, 1997). The subtilisin E gene (both wild-type and variant) was amplified using the following primers: 5' primer: GATCCGAGCGTTGCATATGTGGAAGAAGATCAT, 3' primer: AAAAGGATCCTTACTATTAATGATGATGATGATGATGATGTTGTGCAGCTGCTTGTACGTTGAT. The primer for amplifying the ~300 bp sequence containing the custom insert and the final 200 bases of the gene was 5' primer GATCGACGTCTCATCTCCAGCGCAA. The 3' primer used was the same as given above. The 884 bp mature fragment of subtilisin E was removed from pBE3 using the restriction enzymes BamH1 and Nde1. The mature gene product was then digested with Nco1 to remove the 200 bp 3' end of the gene, and then ligated to the custom-synthesized gene product using a high concentration of T4 DNA ligase. The ligation product was then amplified directly via PCR using the above primer and the downstream (3') whole-gene primer. The ~300 bp PCR product was obtained and digested with BsmB1. A second ligation was then carried out as described above between the amplified, digested fragment and the ~600 bp 5' end of the wild-type subtilisin E gene. Subtilisin E PCR primers (described above) were then used to amplify the result of this ligation, which consisted of a whole gene with modifications from 619 to 669 bp (see Supplementary Figure S1 available at PEDS online). Whole-gene sticky ends were then exposed using simultaneous digestion with BamH1 and Nde1, followed by PCR-cleanup and ligation into open pBE3 vector, all as described above. The pBE3 containing the subtilisin E variant gene was then transformed into HB101 E. coli.
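As a sanity check on the cloning scheme, the primers listed above can be scanned for the recognition sequences of the enzymes named in the text. The Python sketch below does this on both strands, using the standard recognition sequences for NdeI (CATATG), BamHI (GGATCC), BsmBI (CGTCTC) and NcoI (CCATGG). The dictionary keys and function names are labels chosen for this example; the primer sequences are copied from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard recognition sequences for the enzymes named in the text.
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "BsmBI": "CGTCTC",
    "NcoI": "CCATGG",
}

# Primer sequences as listed in the text; the keys are illustrative labels.
PRIMERS = {
    "whole_gene_5prime": "GATCCGAGCGTTGCATATGTGGAAGAAGATCAT",
    "whole_gene_3prime": (
        "AAAAGGATCCTTACTATTAATGATGATGATGATGATGATG"
        "TTGTGCAGCTGCTTGTACGTTGAT"
    ),
    "insert_5prime": "GATCGACGTCTCATCTCCAGCGCAA",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def sites_in(seq):
    """List enzymes whose recognition sequence occurs on either strand."""
    found = []
    for enzyme, site in SITES.items():
        if site in seq or reverse_complement(site) in seq:
            found.append(enzyme)
    return found


for name, primer in PRIMERS.items():
    print(name, sites_in(primer))
```

The scan finds an NdeI site in the 5' whole-gene primer, a BamHI site in the 3' whole-gene primer and a BsmBI site in the insert primer, which matches the digestion steps described in the text.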

Transformation, expression and purification of wild-type and mutant subtilisin E in B. subtilis DB428

B. subtilis DB428 cells (also obtained from Frances Arnold) containing the plasmids carrying wild-type and mutant subtilisin E were grown at 37°C in LB medium with kanamycin for 36 h. After centrifugation, three volumes of ice-cold ethanol were added to the supernatant. The resulting pellet was resuspended in 10 mM sodium phosphate buffer, pH 6.2, and purified by ion-exchange chromatography on CM Sepharose in phosphate buffer, pH 6.2, with a gradient of 0–0.4 M NaCl, followed by dialysis against 50 mM Tris–HCl (pH 8.0), 5 mM CaCl2 and 10 mM dithiothreitol. Protein concentrations were determined using the Bio-Rad protein assay.

Activity assay and melting temperature

Subtilisin E (wild-type and mutant) activity was assayed at 25°C by measuring the p-nitroaniline liberated from the substrate, succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (suc-AAPF-pNA), via spectrophotometry at 405 nm (Zhao and Arnold, 1997). The enzyme reaction was performed at 25°C in 50 mM Tris–HCl (pH 8.0) containing 10 mM CaCl2, except where otherwise noted. Melting temperatures of wild-type and mutant enzymes were determined by differential scanning calorimetry in 10 mM Tris–HCl/1 mM CaCl2/1 mM phenylmethylsulfonylfluoride, pH 8.0. The temperature was increased at a rate of 1°C/min from 20 to 90°C.


    Results
 
Through generations of natural selection in diverse environments, microbes and their proteins have adapted to extremes in temperature, pH, and salt/solute concentrations, among other stresses. The sequence information for many of these proteins is cataloged on the NCBI website and available for database searching. By comparing the sequence information of related proteins that have adapted to a variety of environments, we hypothesize that it is possible to observe specific sequence motifs that contribute to adaptive features. The subtilase superfamily contains a broad distribution of enzymes (e.g. thermophiles, acidophiles and alkalophiles, among others) and thus serves as an excellent candidate for characterizing natural adaptations based solely on sequence information. An adaptive feature of great commercial interest is thermophilicity (Bryan, 2000; Vieille and Zeikus, 2001; Haki and Rakshit, 2003; Eijsink et al., 2004, 2005).

PROBE runs with the subtilase superfamily

PROBE is a useful tool for sequence database queries that can identify widely diverse members of the same superfamily of proteins whose highly conserved regions (and hence function) may be the same despite large portions of the protein being nonhomologous (Neuwald et al., 1997) (see Supplementary Figure S2 available at PEDS online). Such information is particularly useful to identify core motifs that may correlate with specific structural and functional properties of some or all members of a protein superfamily. Querying the NCBI protein databank using PROBE with four different subtilases (subtilisins E and BPN', thermitase, and pyrolysin) resulted in the same superfamily of 497 sequences (see Supplementary Table S1 available at PEDS online) despite the fact that the sequence homologies of these four subtilases were far from identical (see Supplementary Table S2 available at PEDS online).

PROBE runs with mesophilic subtilisins BPN' and E returned a five-motif model consisting of the regions surrounding the active site and corresponding to the core of the subtilisin fold (Figure 1A). Runs with the thermophilic subtilisins, thermitase and pyrolysin, returned a model containing the same five motifs (Figure 1B, motif A–motif E) as well as an additional motif (Figure 1B, motif F). The log probability ratios (Table I), a measure of the total contribution of each individual motif to the whole model, were highest for the motifs that contain a member of the catalytic triad (motifs A, B and E in Figure 1B). This was expected, as removal of these motifs would most significantly damage the ability of the model to characterize the family of subtilases. Motif F was similar in log probability ratio to that of motifs C and D, indicating that this motif was a legitimate candidate to compare the thermophilic subtilases with the nonthermophilic superfamily members. Interestingly, while the individual log probability ratio of motif F was comparable with that of other conserved motifs in the model, the whole-model log probability ratios were highest without the sixth motif (4972.96 versus 4394.46 for the thermitase model and 8505.19 versus 7587.03 for the pyrolysin model). This indicates that while that motif is significantly conserved in some members of the superfamily (high individual log probability ratio), it does not contribute strongly to the overall characterization of the subtilases (low whole-model log probability ratios). Hence, examination of the motif may reveal a desirable function or characteristic distinct to the subclass containing the motif, while not general to all subtilases in the set.


 
Fig. 1. The five core motifs and one additional motif comprising the model for the subtilase superfamily. (A) Core motifs are shown in bold. The additional motif is underlined. The catalytic triad of Ser-Asp-His is highlighted in large font. All motifs are shown on the sequence of mature subtilisin E. (B) Sequence logos for the model (including the extra motif F) show residue frequencies across the entire subtilase superfamily (motif A residue 12, motif B residue 8 and motif E residue 4 are the catalytic triad residues D, H and S, respectively).

 

 
Table I. Single-motif, whole-model and model-less-motif log probability ratios for subtilisin BPN'/E, thermitase and pyrolysin PROBE runs

 
Because PROBE iteratively adds relevant sequences from the protein database into its model, such model evolution can be expected to differ depending on the start sequence. Although the strength of the similarity of the core motif sequences will eventually draw in all 497 superfamily members regardless of the query sequence, PROBE runs with thermitase and pyrolysin can reasonably be expected to draw in sequences most similar to them first. Thus, early phases of model development are heavily influenced by the start sequence. However, towards the end of the PROBE run, when motifs have been refined and the difference in log probabilities between the model with and without each motif (Table I) is higher, a new motif is much less likely to be detected (see Supplementary Figure S2 available at PEDS online). Since motif F remained in later PROBE runs, it represented a legitimate target for investigating its association with the function of its query sequences, in this case possible thermal stability.

To briefly determine how PROBE compares with more common multiple sequence techniques used in the consensus approach, we made use of Block Maker, which also creates multiple sequence alignments and identifies conserved regions based on Gibbs sampling methods, but lacks the genetic algorithms found in PROBE (Henikoff et al., 1996; Neuwald et al., 1997). Using the PROBE-generated output of subtilase superfamilies that was based on a transitive BLAST of thermitase (1THM) and subtilisin E (1SCJ), two sets of conserved domains were identified by Block Maker (see Supplementary Figure S3A and B available at PEDS online). The conserved domains identified using subtilisin E as the ‘seed’ did not encompass the three active site residues representative of the subtilase superfamily (Bryan, 2000); however, three of them did overlap closely with PROBE's model (Figure 1A). Four conserved domains based on thermitase as the ‘seed’ did match fairly closely with four of the five subtilisin E model motifs; however, the domain representing motif F was not identified as would be expected. Block Maker did not return a consistent set of conserved domains representative of the subtilase superfamily between two subtilisin ‘seeds’; this is a basic requirement for comparison of conserved domains between superfamily subclasses. We may conclude, therefore, that PROBE is more sensitive than Block Maker for identification of the low scoring domains observed with our approach. We hypothesize that methods similar to Block Maker commonly used in this type of consensus approach (Edgar and Batzoglou, 2006) would also lack the necessary sensitivity for this application.

Validation of thermostability hypothesis

The subtilase superfamily was organized into classes based only on the sixth motif using Classifier (Qu et al., 1998). Classification by motif F alone produces a consistent subset of protein sequences not seen in classification based on the entire model (Table II). Of the 11 members returned by Classifier, 6 proteins are known to be thermally stable (Kaneda and Tominaga, 1975; Yamagata et al., 1994; Peters et al., 1995; Voorhorst et al., 1996; Bevan et al., 1998; Choi et al., 1999). In addition to the five confirmed thermophiles, there was also one psychrophile, a subtilisin-like protease from the Norway spruce. The remaining five proteins identified by Classifier came from the Arabidopsis thaliana sequencing project, and have not been synthesized or characterized. Hence, of the 20 total thermophiles in the subtilase superfamily database collected by PROBE, 6 were identified by Classifier as containing motif F (a return of 30%). Conversely, out of 447 mesophilic protein sequences in the database, four were returned by Classifier as containing motif F (a return of <1%). Thus, the method returned a positive thermophilic signal at least 30 times stronger than its false positive rate. This supports a strong connection between motif F and enhanced thermostability in some representatives of the subtilase superfamily.
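The enrichment quoted above follows directly from the counts given in the text (6 of 20 thermophiles and 4 of 447 mesophiles returned by Classifier). A short Python check of that arithmetic, with variable and function names chosen only for this illustration:

```python
def hit_rate(hits, total):
    """Fraction of sequences in a group returned by Classifier."""
    return hits / total


# Counts as stated in the text.
thermo_rate = hit_rate(6, 20)    # thermophiles containing motif F
meso_rate = hit_rate(4, 447)     # mesophiles containing motif F
enrichment = thermo_rate / meso_rate

print(f"thermophile return: {thermo_rate:.1%}")
print(f"mesophile return:   {meso_rate:.2%}")
print(f"enrichment:         {enrichment:.1f}-fold")
```

The ratio comes out between 33 and 34, consistent with the statement that the thermophilic signal is at least 30 times stronger than the false positive rate.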


 
Table II. Classifier-extracted class based on motif Fa

 
To validate the involvement of motif F experimentally, the 16 amino acid consensus sequence for the motif (based on the model sequence from each member of the 11 member thermostable class, Figure 2) was engineered into the corresponding location in mesophilic subtilisin E, expressed in B. subtilis DB428 cells and purified (Figure 3). Because subtilisin E is a serine protease, it was necessary to confirm that any change in activity was a result of intrinsic differences and not changes in autocatalytic activity. The solid curves drawn in Figure 4A represent first-order deactivation fits. This first-order behavior, coupled with the lack of second-order decay, confirmed that deactivation was not influenced by the proteolytic activity of subtilisin E. Relative hydrolytic activity as a function of temperature (Figure 4B) demonstrates that the mutant is stable above 60°C, and at 75°C has 2-fold higher activity than the wild-type. Moreover, at 90°C, despite the wild-type displaying no activity, the mutant retains 30% of its maximal activity (as reflected by its observed activity at 45°C). Finally, as depicted in Figure 4C, the apparent melting temperature of the mutant was ca. 13°C higher than that of the wild-type. Although mutant expression and specific activity were reduced compared with the wild-type (15 and 28% of wild-type expression and activity, respectively), these results nonetheless clearly indicate a strong relationship between the introduction of the motif F consensus sequence and at least a 3-fold increase in activity and stability over wild-type subtilisin E at temperatures over 60°C.
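A first-order deactivation fit assumes residual activity decays as A(t) = A0·exp(−kt), so that ln A is linear in time with slope −k. The Python sketch below shows one simple way to estimate k by least squares on log-transformed activities. It is an illustration under stated assumptions, not the authors' fitting procedure: the data are synthetic and noise-free, generated from an arbitrarily chosen rate constant, and do not represent measurements from Figure 4A.

```python
import math


def fit_first_order(times, activities):
    """Least-squares line through ln(activity) vs time; returns (k, A0)."""
    logs = [math.log(a) for a in activities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # The decay constant is the negative slope; A0 is exp(intercept).
    return -slope, math.exp(intercept)


# Synthetic data from an assumed k of 0.05 per minute (illustrative only).
true_k = 0.05
times = [0, 10, 20, 30, 60]
activities = [100.0 * math.exp(-true_k * t) for t in times]

k, a0 = fit_first_order(times, activities)
half_life = math.log(2) / k
print(f"k = {k:.3f} per min, A0 = {a0:.1f}, half-life = {half_life:.1f} min")
```

If autolysis contributed significantly, deactivation would depend on enzyme concentration and ln A versus time would deviate from a straight line, which is why a good first-order fit argues against it.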


 
Fig. 2. Sequence logo for the consensus sequence from the thermostable class of the additional motif. As compared with the whole superfamily consensus sequence, the K-P-D in residues 3–5 is far more conserved. Residues conserved between the consensus and the wild-type sequences are in bold.

 

 
Fig. 3. SDS-PAGE analysis of subtilisin E and its variant. Lane 1, molecular size marker (SeaBlue Pre-Stained Standard from Invitrogen); lane 2, supernatant of B. subtilis DB428 cells; lane 3, supernatant of DB428 cells expressing wild-type subtilisin E; lane 4, supernatant of DB428 cells expressing subtilisin E variant; lane 5, purified wild-type subtilisin E; lane 6, purified subtilisin E variant.

 

 
Fig. 4. Activity and thermal stability of the subtilisin variant compared with wild-type subtilisin E: (A) Residual activity for both mutant (open triangles) and wild-type (open circles) at 60°C. The residual activity was assayed at 25°C by measuring the initial rate of hydrolysis of the substrate (suc-AAPF-pNA). The blue and red lines show the first-order deactivation curve fits. (B) Initial rate as a function of temperature for both mutant (open triangles) and wild-type subtilisin E (open circles). (C) Differential scanning calorimetry data for the wild-type (top line) and subtilisin E mutant (bottom line).

 

    Discussion
 
Bayesian bioinformatic tools were used to determine whether sequence-based information could be used to enhance a desirable functional property of a model superfamily, specifically thermostability, in some members of the subtilase superfamily. PROBE was used to identify a set of motifs (or model) common to all subtilases and also reveal an additional motif common to many of the thermostable subtilases. This additional motif, while not nearly as strongly conserved as the characteristic motifs of the superfamily, presented a direction for further investigation into the correlation between protein sequence information and thermostability.

Based on the identity of the additional motif, Classifier was used to classify a thermophilic subset of proteins from the superfamily and determine a consensus sequence representative of the additional motif. To confirm its involvement in conferring thermostability, the resulting 16 amino acid model peptide was inserted into subtilisin E, a mesophilic member of the subtilase superfamily, in the corresponding sequence location. As a result, a significant enhancement in the thermostability of subtilisin E was observed, along with a considerable increase in the melting temperature of the enzyme. These results strongly suggest that this 16 amino acid region found in some thermophilic subtilases is critical in imparting both structural and functional thermostability in the subtilase superfamily.

The consensus sequence of the motif (Figure 2) contains a higher number of hydrophobic residues than the sequence in the identical subtilisin E location. Of the 16 residues, 6 are conserved (of these 5 are weakly to strongly hydrophobic) between the mesophilic and thermophilic motifs, and the remaining 10 are more hydrophobic in the thermophilic consensus sequence. This shift to a substantially more hydrophobic motif in the thermophilic versus the mesophilic subtilisins is consistent with a more rigid surface structure in the former than in the latter, and this may translate into greater thermostability (Vieille and Zeikus, 2001; Almog et al., 2002; Rader et al., 2002; Eijsink et al., 2004, 2005). Eijsink et al. (1995) analyzed a thermolysin-like protease from Bacillus stearothermophilus for thermally stabilizing mutations by comparing it with the more autocatalytically stable thermolysin. They found that synergistic mutations at six residues in the thermolysin-like protease resulted in a 23°C increase in melting temperature, giving a melting temperature higher than that of thermolysin itself. Five out of six of those mutations increased the hydrophobicity of the protein, and all five of those mutations were in the same region (amino acids 56–69), indicating a trend similar to what we observed.

Zhao and Arnold (1997, 1999) developed a thermostable variant of subtilisin E using directed molecular evolution. Eight point mutations in this mutant were required to enhance thermostability by 17°C: P14L, N76D, N118S, S161C, G166R, N181D, S194P and N218S. All but two of those (14 and 118) are on residues that are structurally adjacent to where motif F, a strand–turn–strand loop (residues 193–207), would be in subtilisin E. One, S194P, is located in the motif region (Figure 5A). Similar to motif F, over half the mutations in the Zhao and Arnold mutant resulted in an increase in the hydrophobicity of this region of the protein. It is possible that this increased hydrophobicity helps to stabilize the strand–turn–strand region, resulting in a more rigid thermostable mutant structure, imitating the natural evolution of rigidity in that region in thermostable subtilases. Visual analysis of the subtilisin E structure reveals that the thermophilic sequence appears to adjoin two halves of the protein (Figure 5B). By increasing the hydrophobicity of the region of motif F, this region may be more strongly associated with the hydrophobic protein core, thereby resulting in increased rigidity and a reduced propensity to unfold at elevated temperatures (Rader et al., 2002).


 
Fig. 5. (A) Location of motif F (yellow spacefill) and thermally stabilizing point mutations from Zhao and Arnold (1999) (blue spacefill) on the structure of subtilisin E (PDB 1SCJ). Catalytic triad shown in ball-and-stick (red). (B) Location of motif F (yellow), calcium ions (green spacefill) and N76D (bright blue spacefill) from Zhao and Arnold (1999) on the structure of subtilisin E (PDB 1SCJ). The rest of the structure is shown in terminus coloring (red, N; blue, C). Figures were created in MOE (Chemical Computing Group Inc.).

 
It is important to note that motif F does not strongly interact with either of the bound calcium ions common to many subtilases. As defined by Alexander et al. (2001), the calcium ions occupy two locations, sites A and B, on opposite sides of the protein (Figure 5B). The calcium ion at site A is known to have a direct impact on protein inactivation, as removing the ion at this location results in rapid destabilization of the protein. The ion at site B, however, has been shown to be less important. Site A is adjacent to motif F but closer to a separate loop that contains one of Zhao and Arnold's (1999) point mutations (N76D). Thus, in our thermostable mutant, calcium binding is unlikely to be the source of the increased stability. Rather, we propose that enhanced thermostability is a direct consequence of increased rigidity in the region of motif F.

Berezovsky and Shakhnovich (2005) hypothesized that proteins evolved thermophilic characteristics in two ways: (i) nonspecific, structure-based adaptations developed by thermophilic ancestors that evolved in extreme environments, and (ii) specific, sequence-based adaptations by ancestral species that recolonized warmer environments. Our findings, together with others described above in the Introduction, further support this hypothesis by demonstrating that relatively small changes in a protein sequence can result in large changes in thermophilicity.

In conclusion, we have shown that it is advantageous to use highly sensitive multiple sequence alignment methods, such as Bayesian-based algorithms, to obtain sets of proteins with similar functions that have evolved in a specific environment to identify consensus sequences unique to specific adaptations. This method has been used to extract previously unknown members of superfamilies from the sequence database as well as to provide insight into families within protein superfamilies (Neuwald et al., 1997; Zhao and Arnold, 1997; Qu et al., 1998; Florczyk et al., 2001). In our case, we used this approach to identify thermophilic-related sequences in the subtilases, a superfamily that does not report a strong thermophilic domain using other, less sensitive multiple sequence alignment methods. As a result, a significant thermal stabilization of a mesophilic enzyme has been achieved using bioinformatic tools.


    Footnotes
 
Edited by Dick Janssen


    Acknowledgements
We are grateful to Timothy Cale at Rensselaer Polytechnic Institute for use of the IBM servers. This work was supported by grants from the Biotechnology Research and Development Corporation and the National Institutes of Health (GM66712).


    References
Alexander P.A., Ruan B., Bryan P.N. (2001) Biochemistry 40:10634–10639.

Almog O., Gallagher D.T., Ladner J.E., Strausberg S., Alexander P., et al. (2002) J. Biol. Chem. 277:27553–27558.

Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. (1990) J. Mol. Biol. 215:403–410.

Amin N., Liu A.D., Ramer S., Aehle W., Meijer D., Metin M., Wong S., Gualfetti P., Schellenberger V. (2004) Protein Eng. Des. Sel. 17:787–793.

Antikainen N.M. and Martin S.F. (2005) Bioorg. Med. Chem. 13:2701–2716.

Berezovsky I.N. and Shakhnovich E.I. (2005) Proc. Natl Acad. Sci. USA 102:12742–12747.

Bevan M., et al. (1998) Nature 391:485–488.

Bryan P.N. (2000) Biochim. Biophys. Acta 1543:203–222.

Cai L., Jeewon R., Hyde K.D. (2006) Mycol. Res. 110:137–150.

Chen K. and Arnold F.H. (1993) Proc. Natl Acad. Sci. USA 90:5618–5622.

Choi I.G., Bang W.G., Kim S.H., Yu Y.G. (1999) J. Biol. Chem. 274:881–888.

Clark L.A., Boriack-Sjodin P.A., Eldredge J., Fitch C., Friedman B., et al. (2006) Protein Sci. 15:949–960.

DeSantis G., et al. (2002) J. Am. Chem. Soc. 124:9024–9025.

Dwyer M.A., Looger L.L., Hellinga H.W. (2004) Science 304:1967–1971.

Edgar R.C. and Batzoglou S. (2006) Curr. Opin. Struct. Biol. 16:368–373.

Eijsink V.G., Bjork A., Gaseidnes S., Sirevag R., Synstad B., van den Burg B., Vriend G. (2004) J. Biotechnol. 113:105–120.

Eijsink V.G., Gaseidnes S., Borchert T.V., van den Burg B. (2005) Biomol. Eng. 22:21–30.

Eijsink V.G., Veltman O.R., Aukema W., Vriend G., Venema G. (1995) Nat. Struct. Biol. 2:374–379.

Florczyk M.A., McCue L.A., Stack R.F., Hauer C.R., McDonough K.A. (2001) Infect. Immun. 69:5777–5785.

Haki G.D. and Rakshit S.K. (2003) Bioresour. Technol. 89:17–34.

Henikoff J.G. and Henikoff S. (1996) Methods Enzymol. 266:88–105.

Kaneda M. and Tominaga N. (1975) J. Biochem. (Tokyo) 78:1287–1296.

Kaplan J. and DeGrado W.F. (2004) Proc. Natl Acad. Sci. USA 101:11566–11570.

Korkegian A., Black M.E., Baker D., Stoddard B.L. (2005) Science 308:857–860.

Kumar S., Tsai C.J., Nussinov R. (2000) Protein Eng. 13:179–191.

Lawrence C.E., Altschul S.F., Boguski M.S., Liu J.S., Neuwald A.F., Wootton J.C. (1993) Science 262:208–214.

Lecompte O., Thompson J.D., Plewniak F., Thierry J., Poch O. (2001) Gene 270:17–30.

Lehmann M., Kostrewa D., Wyss M., Brugger R., D'Arcy A., et al. (2000) Protein Eng. 13:49–57.

Lehmann M. and Wyss M. (2001) Curr. Opin. Biotechnol. 12:371–375.

Lehmann M., Loch C., Middendorf A., Studer D., Lassen S.F., Pasamontes L., van Loon A.P.G.M., Wyss M. (2002) Protein Eng. 15:403–411.

Li W.F., Zhou X.X., Lu P. (2005) Biotechnol. Adv. 23:271–281.

Liu J.S. and Lawrence C.E. (1999) Bioinformatics 15:38–52.

Neuwald A.F., Liu J.S., Lipman D.J., Lawrence C.E. (1997) Nucleic Acids Res. 25:1665–1677.

Ogden T.H. and Rosenberg M.S. (2006) Syst. Biol. 55:314–328.

Peters J., et al. (1995) J. Mol. Biol. 245:385–401.

Petrounia I.P. and Arnold F.H. (2000) Curr. Opin. Biotechnol. 11:325–330.

Qu K., McCue L.A., Lawrence C.E. (1998) Proc. Int. Conf. Intell Syst. Mol. Biol. 6:131–139.

Rader A.J., Hespenheide B.M., Kuhn L.A., Thorpe M.F. (2002) Proc. Natl Acad. Sci. USA 99:3540–3545.

Rubin-Pitel S.B. and Zhao H. (2006) Comb. Chem. High Throughput Screen 9:247–257.

Shaw E. and Dordick J.S. (2002) Biotechnol. Bioeng. 79:295–300.

Siezen R.J., de Vos W.M., Leunissen J.A., Dijkstra B.W. (1991) Protein Eng. 4:719–737.

Sinsheimer J.S., Suchard M.A., Dorman K.S., Fang F., Weiss R.E. (2003) Appl. Bioinformatics 2:131–144.

Socolich M., Lockless S.W., Russ W.P., Lee H., Gardner K.H., et al. (2005) Nature 437:512–518.

Sriprapundh D., Vieille C., Zeikus J.G. (2000) Protein Eng. 13:259–265.

Steipe B., Schiller B., Pluckthun A., Steinbacher S. (1994) J. Mol. Biol. 240:188–192.

Vieille C. and Zeikus G.J. (2001) Microbiol. Mol. Biol. Rev. 65:1–43.

Voorhorst W.G., Eggen R.I., Geerling A.C., Platteeuw C., Siezen R.J., et al. (1996) J. Biol. Chem. 271:20426–20431.

Watanabe K., Ohkuri T., Yokobori S., Yamagishi A. (2006) J. Mol. Biol. 355:664–674.

Whiting A.S., Sites J.W. Jr, Pellegrino K.C., Rodrigues M.T. (2006) Mol. Phylogenet. Evol. 38:719–730.

Yamagata H., Masuzawa T., Nagaoka Y., Ohnishi T., Iwasaki T. (1994) J. Biol. Chem. 269:32725–32731.

Zhang W., Liu Y., Zheng H., Yang S., Jiang W. (2005) Appl. Environ. Microbiol. 71:5290–5296.

Zhao H. and Arnold F.H. (1997) Proc. Natl Acad. Sci. USA 94:7997–8000.

Zhao H. and Arnold F.H. (1999) Protein Eng. 12:47–53.

Received April 28, 2006; revised July 24, 2006; accepted August 18, 2006.

