PEDS Advance Access published online on February 2, 2008
Protein Engineering Design and Selection, doi:10.1093/protein/gzn001
SHORT COMMUNICATION
Revisiting the correlation between proteins' thermoresistance and organisms' thermophilicity
Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. F. Roosevelt 50, CP 165/61, 1050 Brussels, Belgium
1 To whom correspondence should be addressed. E-mail: ydehouck{at}ulb.ac.be
| Abstract |
|---|
The ability to rationally design protein mutants that remain structured and active at high temperatures depends strongly on a better understanding of the mechanisms of protein thermostability. Studies devoted to this issue often rely on the living temperature (Tenv) of the host organism rather than on the melting temperature (Tm) of the analyzed protein. To investigate the scale of this approximation, we probed the relationship between Tm and Tenv on a dataset of 127 proteins, and found a much weaker correlation than previously expected: the correlation coefficient is equal to 0.59 and the regression line is Tm = 42.9°C + 0.62Tenv. To illustrate the effect of using Tenv rather than Tm to analyze protein thermoresistance, we derive statistical distance potentials, describing Glu–Arg and Asp–Arg salt bridges, from protein structure sets with high or low Tm or Tenv. The results show that the more favorable nature of salt bridges, relative to other interactions, at high temperatures is more clear-cut when thermoresistance is defined in terms of Tm. The Tenv-based sets nevertheless remain informative.
Keywords: living temperature/melting temperature/salt bridges/statistical potentials/thermal stability
| Introduction |
|---|
Thermophilic and hyperthermophilic organisms live and grow at temperatures (Tenv) beyond 45 and 80°C, respectively, and sometimes even beyond 100°C. Their adaptation to these environments requires that their proteins be able to adopt or maintain their native conformation and perform their activity under such extreme conditions (Jaenicke and Bohm, 1998).
The numerous studies conducted in the last decades indicate that protein thermostability, characterized by its Tm, likely results from a subtle combination of residue arrangements and interactions, rather than from a few decisive characteristics (Karshikoff and Ladenstein, 2001; Kumar and Nussinov, 2001). Moreover, it is becoming apparent that nature has found several different ways to achieve thermostability (Razvi and Scholtz, 2006). In particular, studies focussed on a family of homologous proteins may be successful in identifying factors responsible for thermostability, but their usefulness is usually confined to the targeted family. A number of attempts have been made to uncover more general rules by considering large sets of proteins of different thermostability (Chakravarty and Varadarajan, 2002; Suhre and Claverie, 2003). These approaches benefit from the growing amount of experimental data, in particular the recent sequencing of entire genomes of thermophilic organisms. Unfortunately, the number of structures of thermoresistant proteins, which make it possible to probe the temperature adaptation of specific interactions, has not increased as rapidly. Comparative modeling may be helpful in this regard, by providing approximate structures to be analyzed (Chakravarty and Varadarajan, 2002), but this approach has its own limitations.
Another issue is that very few proteins have their exact temperature resistance quantified through the measurement of their Tm. A common workaround, used in most in silico analyses, is to use the organism's living temperature, Tenv, instead of Tm to evaluate the thermal resistance of proteins. This may be a strong approximation. Indeed, thermophilic organisms need thermoresistant proteins, but the converse is not true: some mesophilic organisms contain proteins that maintain their structure and activity at extremely high temperatures. However, this approach is usually justified by a high correlation between Tm and Tenv previously derived from a few families of homologous proteins (Gromiha et al., 1999). Our goal here is to revisit this approximation, and to show and discuss its limitations by comparing melting and environmental temperatures with regard to their ability to help unravel the mechanisms of protein thermostability.
| Materials and methods |
|---|
Protein databases
We collected, from the literature and the ProTherm database (Bava et al., 2004), a dataset of 127 monomeric proteins of known X-ray structure, with a resolution of 2.5 Å or better, and whose Tm was measured in the absence of denaturant. When the same Tm measurement was performed at several pH values, the one with the pH closest to 7 was kept. The mean pH for the whole dataset is 6.97, with a standard deviation of 0.85. An average environmental temperature Tenv, corresponding to the optimal growth or normal living temperature of the species, was also assigned to each of these proteins from the literature and the PGTdb database (Huang et al., 2004). The list of the 127 proteins included in our dataset, with details about the host organism and its Tenv, the conditions in which Tm was measured (pH, buffer/ions in solution), the reversibility of the unfolding transition, and literature references, is given as supplementary material.
We split the 127 proteins of our dataset into two subsets, one consisting of the proteins with the lowest Tms, and the other, the proteins with the highest Tms. Both subsets were then refined to remove proteins presenting more than 25% sequence identity with another protein of the same subset. The mean temperature <Tm> for the subsets so obtained is equal to 53 and 81°C. This procedure was repeated for Tenvs instead of Tms, leading to sets with average <Tm> equal to 60 and 73°C. As expected, the difference between <Tm>s is significantly reduced as compared with that obtained for the Tm-based sets. Note that the average pH at which Tm measures were performed does not differ significantly among these subsets: it remains between 6.91 and 7.03. In addition, 1000 random pairs of subsets were generated, with the constraint that their <Tm> differs by no more than 3°C.
| Results and discussion |
|---|
We investigated the relationship between the Tms and Tenvs of the 127 proteins of our dataset. As apparent in Fig. 1, these two measures are found to be correlated, but quite imperfectly. Indeed, with a correlation coefficient of 0.59, these two temperatures can hardly be considered as equivalent. The corresponding regression line is Tm = 42.9°C + 0.62Tenv. If we restrict the dataset to the 68 proteins for which evidence that the thermal unfolding transition is reversible has been reported, the correlation coefficient drops slightly, to 0.56, and the regression line remains almost identical: Tm = 42.2°C + 0.65Tenv. We are quite far from the correlation coefficient (0.91) and regression line found earlier on a much smaller dataset (Gromiha et al., 1999).
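As a minimal sketch of how a correlation coefficient and regression line of this kind are computed, the following uses ordinary least squares on paired temperatures. The function name `pearson_and_fit` and the numerical values are hypothetical and chosen for illustration only; they are not the 127-protein dataset.

```python
import math

def pearson_and_fit(x, y):
    """Return (r, intercept, slope) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, intercept, slope

# Made-up (Tenv, Tm) values in °C, for illustration only; not the paper's data.
tenv = [15.0, 25.0, 37.0, 37.0, 37.0, 60.0, 75.0, 95.0]
tm = [45.0, 58.0, 42.0, 65.0, 88.0, 79.0, 90.0, 105.0]

r, intercept, slope = pearson_and_fit(tenv, tm)
print(f"r = {r:.2f}; Tm = {intercept:.1f} + {slope:.2f} * Tenv")
```

Note that several proteins sharing the same Tenv but very different Tm, as in the made-up values at 37°C, are exactly what lowers r without necessarily changing the fitted line much.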
The nonequivalence between Tm and Tenv is clearly illustrated by the many proteins of our dataset that have a Tenv close to 37°C. They are mostly proteins from mammalian species or the bacteria they host, such as Escherichia coli. Interestingly, these proteins span almost the entire range of Tm. In particular, the Tms of the human proteins included in our dataset range from 39.45 to 90°C. The most thermoresistant protein from a mesophilic organism in our set has a Tm beyond 120°C, and comes from the sulfate-reducing bacterium Desulfovibrio vulgaris, which can survive in contaminated environments (Santana and Crasnier-Mednansky, 2006).
This limited but significant correlation between Tm and Tenv (P-value on the order of 10⁻¹²) arises partially from the fact that the Tm of a protein must be larger than the corresponding Tenv. To analyze the impact of this constraint, we constructed 10⁶ random permutations of the Tm–Tenv pairs that respect the condition Tm > Tenv. The correlation coefficient between these random Tm–Tenv pairs equals 0.40 on average, and is larger than 0.59 in only 0.07% of the cases. The Tm > Tenv condition is thus not sufficient to explain the observed relationship. It might indeed be considered that, since there is probably no evolutionary pressure on a protein to have its Tm much larger than Tenv, the Tms are likely to remain on average relatively close to the corresponding Tenvs. In view of Fig. 1, this appears to be particularly true for proteins from organisms with a very high or very low Tenv, while a higher variability in Tm is observed for Tenv between 20 and 40°C. This observation cannot lead to any definitive statement, since the number of proteins from psychrophilic or hyperthermophilic organisms in our dataset is relatively limited. It can nevertheless be argued that higher, multicellular organisms are mostly mesophilic, and that they might require a larger range of protein Tms since they contain different tissues and thus much more differentiated environments, notably in terms of pH and ionic strength.
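The way the constrained permutations are drawn is not detailed here. The sketch below assumes one simple possibility: a random-swap scheme that exchanges the Tm values of two proteins only when both still satisfy Tm > Tenv afterwards. The names `constrained_shuffle` and `permutation_test`, the number of swaps, and the data are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def constrained_shuffle(tenv, tm, n_swaps, rng):
    """Shuffle Tm values among proteins while keeping Tm > Tenv for every protein."""
    tm = list(tm)
    n = len(tm)
    for _ in range(n_swaps):
        i, j = rng.randrange(n), rng.randrange(n)
        # Accept the swap only if both proteins still satisfy Tm > Tenv.
        if tm[j] > tenv[i] and tm[i] > tenv[j]:
            tm[i], tm[j] = tm[j], tm[i]
    return tm

def permutation_test(tenv, tm, n_perm=1000, seed=0):
    """Return (observed r, mean r over shuffles, fraction of shuffles with r >= observed)."""
    rng = random.Random(seed)
    observed = pearson(tenv, tm)
    null = [
        pearson(tenv, constrained_shuffle(tenv, tm, 20 * len(tm), rng))
        for _ in range(n_perm)
    ]
    mean_null = sum(null) / n_perm
    fraction = sum(1 for r in null if r >= observed) / n_perm
    return observed, mean_null, fraction

# Made-up values in °C satisfying Tm > Tenv; not the paper's data.
tenv = [15.0, 25.0, 37.0, 37.0, 37.0, 60.0, 75.0, 95.0]
tm = [45.0, 58.0, 42.0, 65.0, 88.0, 79.0, 90.0, 105.0]
observed, mean_null, fraction = permutation_test(tenv, tm)
print(f"observed r = {observed:.2f}, mean shuffled r = {mean_null:.2f}, fraction = {fraction:.3f}")
```

A positive mean correlation among the shuffled pairs reflects the part of the signal that the Tm > Tenv constraint alone can produce; the fraction of shuffles reaching the observed r estimates how often the constraint alone would suffice.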
It should be noted that the correlation described above is computed regardless of the type of proteins. A stronger relationship between Tm and Tenv may be expected when focussing on a single family of proteins, because the optimal range of (thermo-)stability of a protein depends on its function and on the specific environment in which the protein is active. However, even if good correlations may indeed be observed for related proteins with measured Tms, their restricted number often precludes the demonstration of a statistical significance. For example, five adenylate kinases from different organisms, with Tenv ranging from 15 to 60°C, are included in our dataset. The correlation coefficient between Tm and Tenv is equal to 0.81 within this family, but with a relatively large P-value (0.096). Our dataset also contains five α-amylases (r = 0.84, P = 0.073), four β-lactamases (r = 0.73, P = 0.27), and five cytochromes P450 (r = 0.99, P = 0.001).
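To see why such high correlation coefficients can still carry large P-values when only four or five proteins are available, the sketch below computes a two-sided P-value for a Pearson r from n points. It assumes the standard null hypothesis of no correlation under bivariate normality, for which the density of r is proportional to (1 − r²)^((n−4)/2); the test actually used is not specified here, and the function name `correlation_p_value` is hypothetical.

```python
import math

def correlation_p_value(r, n, steps=2000):
    """Two-sided P-value for a Pearson r from n points, under the null of no correlation.

    With r = sin(theta), the null density of r becomes proportional to
    cos(theta) ** (n - 3), which is integrated here with Simpson's rule.
    """
    if n < 3 or not -1.0 <= r <= 1.0:
        raise ValueError("need n >= 3 and -1 <= r <= 1")
    k = n - 3

    def simpson(a, b):
        # Composite Simpson's rule with an even number of intervals.
        h = (b - a) / steps
        total = math.cos(a) ** k + math.cos(b) ** k
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * math.cos(a + i * h) ** k
        return total * h / 3

    start = math.asin(abs(r))
    # Probability mass beyond |r| on one side, relative to one half of the
    # density, which equals the two-sided tail probability by symmetry.
    return simpson(start, math.pi / 2) / simpson(0.0, math.pi / 2)

# (r, n) pairs quoted in the text for the four protein families.
for name, r, n in [("adenylate kinases", 0.81, 5), ("alpha-amylases", 0.84, 5),
                   ("beta-lactamases", 0.73, 4), ("cytochromes P450", 0.99, 5)]:
    print(f"{name}: r = {r}, n = {n}, P = {correlation_p_value(r, n):.3f}")
```

For n = 4 the null density of r is uniform, so the P-value reduces to 1 − |r|, which gives 0.27 for r = 0.73. Small differences from the other quoted P-values are expected because the r values are rounded to two decimals.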
Even though it is relatively weak, the overall correlation between Tm and Tenv may be of practical use in the study of the determinants of protein thermostability. In view of analyzing and illustrating the differences between the use of Tm and Tenv for in silico studies, we investigated their effect on statistical potentials. Indeed, such potentials can be exploited to estimate quantitatively the relative weights of different types of interactions in determining thermostability, by comparing potentials derived from datasets of proteins with a low or high thermal resistance (Folch et al., 2008). We used the following potential function:
[Equation (1)]
The use of ΔW therefore amounts to assuming that differences in amino acid composition among the subsets are not random events that should be corrected for, but reflect a sequence adaptation necessary to face the thermal conditions.
We derived this potential from the 1002 subset pairs presented in Materials and methods, and focussed on two interactions that have been repeatedly pointed out as important with respect to thermostability, and should thus provide reliable test cases to compare the informative content of Tm and Tenv: the Asp–Arg and Glu–Arg salt bridges (Karshikoff and Ladenstein, 2001). The computed ΔW energy profiles are plotted in Fig. 2, for the subdivisions based on Tm and Tenv. The two minima that can be observed in these curves correspond to different salt bridge geometries (Folch et al., 2008): at short inter-Cµ distances the oxygen atoms of Glu or Asp interact with the Nε and Nη1 or Nη2 atoms of Arg, while at longer distances they interact with Nη1 and Nη2. These minima are shifted toward smaller inter-residue distances for proteins with a higher temperature resistance, suggesting a preference for more compact geometries.
As shown in Fig. 2, the minima of the Glu–Arg and Asp–Arg potentials are deeper for the sets with high Tm or Tenv than for those with low Tm or Tenv. These interactions thus appear more favorable at higher temperatures, relative to other interactions. This trend is nevertheless more pronounced for the Tm-based sets. The second minimum even vanishes in the Glu–Arg potential for proteins of high Tenv. This may be interpreted as being due to the imperfect segregation of Tm-values in the Tenv-based subsets.
The results obtained on the 1000 random subset series were exploited to evaluate the significance of the differences observed between potentials derived from proteins of high or low Tm or Tenv. The probabilities of a random occurrence of an equivalent, or larger, difference in depth of each minimum were computed. As reported in Table I, these probabilities are very low for the Tm-based sets, and higher for the Tenv-based sets. In particular, they are equal to 0.3 and 0.2% for Tm-based sets, when Glu–Arg and Asp–Arg are considered concomitantly, and 1.5 and 1.8% for Tenv-based sets. The observation of Arg-involving salt bridges being more favorable at higher temperatures may thus be considered as statistically significant. The situation is less clear-cut with Tenv, as the corresponding probabilities are two to nine times larger. It appears therefore that, in this case, Tm is a better choice than Tenv to investigate thermostability.
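The probability estimate described above can be sketched as an empirical tail fraction: the share of random subset pairs whose difference in minimum depth is at least as large as the one observed. The function name `empirical_probability` and the randomly generated depths are hypothetical and serve only to illustrate the counting.

```python
import random

def empirical_probability(observed_diff, random_diffs):
    """Fraction of random differences that are at least as large as the observed one."""
    if not random_diffs:
        raise ValueError("need at least one random difference")
    hits = sum(1 for d in random_diffs if d >= observed_diff)
    return hits / len(random_diffs)

# Illustration with made-up numbers: 1000 random depth differences drawn from a
# normal distribution, and a hypothetical observed difference of 0.25.
rng = random.Random(0)
random_diffs = [rng.gauss(0.0, 0.1) for _ in range(1000)]
probability = empirical_probability(0.25, random_diffs)
print(f"probability of an equal or larger random difference: {probability:.3f}")
```

With 1000 random pairs, the smallest nonzero probability that can be resolved is 0.1%, so a value such as 0.3% corresponds to three random pairs reaching the observed difference.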
This example illustrates that when proteins are divided according to their Tenv, the presence of proteins with very different Tm is likely to weaken the signal. This may be even more problematic for less pronounced sequence/structure determinants of thermostability, for example residue pair interactions that are less well correlated with protein thermostability than Glu–Arg and Asp–Arg salt bridges. However, though Tm is certainly a more accurate descriptor of a protein's thermal resistance, the limited number of proteins of known Tm makes the use of Tenv unavoidable in some investigations. Our study justifies this approximation by showing that Tenv is informative for sufficiently strong tendencies: even if the correlation between Tm and Tenv is rather poor, the same general trends were observed for the Glu–Arg and Asp–Arg pairs. Synergistic combinations of Tm- and Tenv-based in silico analyses should therefore be profitable.
| Funding |
|---|
We acknowledge support from the Belgian State Science Policy Office through an Interuniversity Attraction Poles Programme (DYSCO), from the Belgian Fund for Scientific Research (FRS) through an FRFC project, and from the BioXpr bioinformatics company. B.F. benefits from a FRIA grant of the FRS, and Y.D. from a First-Postdoc grant of the Walloon region (PROMeTHe project). M.R. is Research Director at the FRS.
| Footnotes |
|---|
Edited by Valerie Daggett
| References |
|---|
Bava K.A., Gromiha M.M., Uedaira H., Kitajima K., Sarai A. Nucleic Acids Res. (2004) 32:D120–D121.
Chakravarty S., Varadarajan R. Biochemistry (2002) 41:8152–8161.
Folch B., Rooman M., Dehouck Y. J. Chem. Info. Model. (2008) 2007 Dec 28; [Epub ahead of print].
Gromiha M.M., Oobatake M., Sarai A. Biophys. Chem. (1999) 82:51–67.
Huang S.L., Wu L.C., Laing H.K., Pan K.T., Horng J.T. Bioinformatics (2004) 20:276–278.
Jaenicke R., Bohm G. Curr. Opin. Struct. Biol. (1998) 8:738–748.
Karshikoff A., Ladenstein R. Trends Biochem. Sci. (2001) 26:550–556.
Kumar S., Nussinov R. Cell. Mol. Life Sci. (2001) 58:1216–1233.
Razvi A., Scholtz M. Protein Sci. (2006) 15:1569–1578.
Santana M., Crasnier-Mednansky M. FEMS Microbiol. Lett. (2006) 260:127–133.
Suhre K., Claverie J.M. J. Biol. Chem. (2003) 278:17198–17202.
Vieille C., Zeikus G. Microbiol. Mol. Biol. Rev. (2001) 65:1–43.
Received November 21, 2007; revised December 20, 2007; accepted December 31, 2007.