Protein Engineering, Vol. 13, No. 3, 179-191,
March 2000
© 2000 Oxford University Press
Factors enhancing protein thermostability
1 Intramural Research Support Program, SAIC Frederick, 2 Laboratory of Experimental and Computational Biology, National Cancer Institute, Frederick Cancer Research and Development Center, Bldg 469, Rm 151, Frederick, MD 21702, USA and 3 Sackler Institute of Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
| Abstract |
|---|
Several sequence and structural factors have been proposed to contribute toward greater stability of thermophilic proteins. Here we present a statistical examination of structural and sequence parameters in representatives of 18 non-redundant families of thermophilic and mesophilic proteins. Our aim was to look for systematic differences between thermophilic and mesophilic proteins across the families. We observe that both thermophilic and mesophilic proteins have similar hydrophobicities, compactness, oligomeric states, polar and non-polar contribution to surface areas, main-chain and side-chain hydrogen bonds. Insertions/deletions and proline substitutions do not show consistent trends between the thermophilic and mesophilic members of the families. On the other hand, salt bridges and side chain–side chain hydrogen bonds increase in the majority of the thermophilic proteins. Additionally, comparisons of the sequences of the thermophile–mesophile homologous protein pairs indicate that Arg and Tyr are significantly more frequent, while Cys and Ser are less frequent in thermophilic proteins. Thermophiles both have a larger fraction of their residues in the α-helical conformation, and they avoid Pro in their α-helices to a greater extent than the mesophiles. These results indicate that thermostable proteins adopt dual strategies to withstand high temperatures. Our intention has been to explore factors contributing to the stability of proteins from thermophiles with respect to the melting temperatures (Tm), the best descriptor of thermal stability. Unfortunately, Tm values are available only for a few proteins in our high resolution dataset. Currently, this limits our ability to examine correlations in a meaningful way.
Keywords: melting temperature/sequence/structure/thermophiles/thermostability
| Introduction |
|---|
Several organisms, mainly archaea, thrive under extreme environmental conditions, e.g. high pressure in deep sea vents; high temperature and non-physiological pH in submarine hydrothermal areas and continental solfataras; low temperatures in Antarctica; high salt concentration in the Dead Sea and in the Great Salt Lake; and the conditions in man-made geothermal power plants. There has been a growing interest in understanding the stabilization of proteins from these organisms. Such an understanding, especially of the thermophilic proteins, is not only essential for a theoretical description of the physico-chemical principles behind protein folding and stability, but is also critical for designing efficient enzymes that can work at high temperatures. Such enzymes may be useful for several industrial applications, such as detergent manufacturing, food and starch processing, production of high fructose corn syrup and PCR (Adams and Kelly, 1995).
Thermostable proteins maintain their activities and are stable at high temperatures. Identifying and understanding the factors contributing to the stability of proteins from organisms living under extreme conditions has been a long-standing problem. The first high resolution crystal structure of thermolysin was reported in 1974 (Matthews et al., 1974). Perutz and Raidt (1975) commented on the stereochemical basis of thermostability of ferredoxins and hemoglobin A2. Since these pioneering efforts, several investigators have focused on the problem of the molecular basis of protein thermostability. Several reasons have been attributed to the greater stability of the thermophilic proteins (Querol et al., 1996; Jaenicke and Bohm, 1998; Ladenstein and Antranikian, 1998). Among the most prominent ones are greater hydrophobicity (Haney et al., 1997), better packing, deletion or shortening of loops (Russell et al., 1997), smaller and less numerous cavities, increased surface area buried upon oligomerization (Salminen et al., 1996), amino acid substitutions within and outside the secondary structures (Zuber, 1988; Haney et al., 1997; Russell et al., 1998), increased occurrence of proline residues (Haney et al., 1997; Watanabe et al., 1997; Bogin et al., 1998), decreased occurrence of thermolabile residues (Russell et al., 1997), increased helical content, increased polar surface area (Haney et al., 1997; Vogt and Argos, 1997; Vogt et al., 1997), increased hydrogen bonding (Vogt and Argos, 1997; Vogt et al., 1997) and salt bridges (Yip et al., 1995, 1998; Haney et al., 1997; Russell et al., 1997, 1998; Elcock, 1998; Xiao and Honig, 1999; Kumar et al., 2000).
Here we present a statistical analysis of parameters thought to contribute toward protein thermostability. We have carried out structural comparisons to cluster the thermophile–mesophile protein families, creating a non-redundant dataset of 18 families from the Protein Data Bank (PDB) (Bernstein et al., 1977). These families span an entire spectrum, containing proteins from moderately thermophilic to hyperthermophilic organisms and their mesophilic homologs. Not all the differences observed between the thermophilic and mesophilic proteins are due to thermostability. Here we select one pair from each family. We choose the structurally most similar thermophile–mesophile pair having the best resolution, so that the observed differences can be expected to be mostly due to thermostability. In our dataset, no two thermophilic proteins from different families have similar three-dimensional structures, ensuring a bias-free sample. Between each thermophile–mesophile pair, we have compared several structural properties such as oligomeric state, insertion/deletion of residues, compactness, hydrophobicity, helical content, hydrogen bonds and salt bridges. We find that most of these do not show consistent trends across the families, indicating versatile protein stabilization strategies adopted by the individual families. However, there are a few global trends across a large number of families. Salt bridges and side-chain hydrogen bonds increase in most of the thermophilic proteins. Interestingly, the overall amino acid distributions in the thermophilic and the mesophilic proteins are significantly different, in spite of the high sequence homologies between the protein structural pairs. The proportions of the thermolabile residue Cys and of Ser decrease significantly, while those of Arg and Tyr increase significantly in the thermophilic proteins as compared with their mesophilic homologs. Pro is observed to occur less frequently in α-helices of the thermophilic proteins. On the whole, a higher proportion of amino acids in the thermophilic proteins adopt α-helical conformation. Our results indicate a two-pronged strategy adopted by the thermophiles. Thermophilic proteins appear to disfavor potentially destabilizing factors along with favoring the potentially stabilizing ones. Furthermore, here we compare our results with those obtained from an analysis of a database of 165 non-homologous proteins.
Our intention was to carry out the analysis with respect to the melting temperatures of the corresponding proteins, from both the thermophiles and the mesophiles. Melting temperatures (Tm) are the best descriptor of thermal stability. To be able to draw reliable conclusions, we wished to focus on cases where (i) high resolution crystal structures are available for both the thermophilic protein and its mesophilic homolog; and (ii) melting temperatures for the thermophilic and mesophilic proteins have been measured and reported. The more meaningful cases are those in which the difference between the melting temperatures of the thermophilic–mesophilic protein pair is not too small and the protein is large enough. Too small a difference in the melting temperatures corresponds to a small difference in energy between the pair of proteins, whereas if the protein is small, the differences in structural parameters might be difficult to gauge accurately. Unfortunately, only a few cases are currently available in the literature. In these cases, the difference in the number of salt bridges between the thermophile and its mesophilic homolog appears to correlate with the Tm of the thermophilic protein. While other structural factors, such as compactness and hydrophobicity, contribute to thermostability, no consistent correlation with the Tm is observed. However, we are unable to obtain statistically reliable results due to the sparse data. On the other hand, we point out that none of the structural factors correlates with the living temperatures of the thermophilic organisms.
| Materials and methods |
|---|
Construction of the families of thermophilic and mesophilic proteins
An index file, called source.idx, in the Protein Data Bank (PDB) (Bernstein et al., 1977) contains the names of the organisms for all protein crystal structures available in the PDB. The January 7, 1998 update of this file was searched for the keywords THERM and PYRO. This search yielded 167 (out of 6751) PDB entries containing different proteins from thermophilic organisms. The entries in which protein structures had been determined by using nuclear magnetic resonance (NMR) and/or theoretical modeling (R = 1.0 Å in the cmpd_res file) were discarded, leaving us with 145 PDB entries. From this set of entries containing proteins whose structures were determined by X-ray crystallography, 113 entries containing high resolution (R ≤ 2.5 Å) structures for 55 different thermophilic proteins were selected for further study. For each of the thermophilic proteins in the list, the PDB entry with the best resolution was picked. Three-dimensional structures of the thermophilic proteins were compared all against all using a sequence order independent structural comparison technique (Tsai et al., 1996). This computer vision-based technique superimposes spatially equivalent regions in two proteins without regard to their sequential connectivity, or to the number of residues in the protein. Since the mesophilic and thermophilic proteins have different sizes and may have different oligomeric states, this technique allows us to superimpose the conserved regions of the proteins independently of these factors. Two proteins are considered to be dissimilar if (i) the backbone Cα atom superposition for the two structures yields an r.m.s.d. ≥ 2.00 Å; and (ii) the sequence identity (ID) for the two proteins is ≤ 20%. Finally, thermophilic proteins were retained in the database if they have dissimilar structures and if there is at least one high resolution crystal structure for their corresponding mesophilic homologs. This step ensures non-redundancy in the database. Eighteen different thermophilic proteins were obtained.
The structure of each of the 18 proteins was compared with their corresponding homologous PDB entries. Two structures were considered to be similar if they did not satisfy both of the above conditions. At this stage, many families contain several mesophilic proteins. Application of a 2.5 Å resolution cut-off substantially decreases their number. Finally, the PDB entry which has the best resolution and contains the structure that is most similar to the thermophilic protein is selected. As far as possible, we have tried to select wild-type thermophile–mesophile pairs. Attention was also paid to the presence (absence) of substrates in the thermophilic and mesophilic proteins.
Choosing one thermophile–mesophile pair per family, such that the pair contains the best resolved structures along with the largest sequence and structure homology among the various available alternatives, has several advantages. First, since the two proteins are most similar, the observed differences can be correlated with thermostability with a greater degree of confidence. Second, the variability, or the consistency, of the results can be judged from the behavior of all 18 families. Third, the behavior of the parameters is a function of two factors: the extent of structural similarity between the two molecules and the sequence similarity. The non-polar buried surface area, compactness, etc. obtained in comparisons of members of the same family would need to be calibrated against the sequence differences, and it is unclear how best to do this in practice. In an extensive recent analysis, Vogt et al. (1997) used multiple mesophilic homologs for comparison with the thermophilic proteins. They calibrated specific protein structural properties per 10°C rise in living temperature of the organisms in a given family. The statistical trends obtained by Vogt et al. (1997) and by us are similar, indicating the equivalence of the two approaches.
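The dissimilarity criterion used to build the non-redundant set can be written as a small predicate. The sketch below is illustrative only: the function names are hypothetical, the r.m.s.d. and sequence identity are assumed to come from an external structural comparison, and treating both thresholds as inclusive is an assumption.

```python
def is_dissimilar(rmsd_angstrom: float, seq_identity_percent: float,
                  rmsd_cutoff: float = 2.0, identity_cutoff: float = 20.0) -> bool:
    """Both conditions must hold: large C-alpha r.m.s.d. AND low sequence identity.

    Inclusive comparisons are an assumption of this sketch.
    """
    return rmsd_angstrom >= rmsd_cutoff and seq_identity_percent <= identity_cutoff


def is_similar(rmsd_angstrom: float, seq_identity_percent: float) -> bool:
    """Two structures are similar when they do not satisfy both conditions."""
    return not is_dissimilar(rmsd_angstrom, seq_identity_percent)
```

Under this reading, a pair that superimposes poorly but shares high sequence identity still counts as similar, because only one of the two conditions is met.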
The properties of these 18 pairs of thermophilic and mesophilic proteins are summarized in Table I. The best matching protein chains in each family are indicated in the footnotes of Table I. One PDB entry for the mesophilic protein elongation factor EF-TU-EF-TS complex (PDB entry 1EFU) from Escherichia coli is an A2B2 type tetramer with chains of type A and B being highly dissimilar. This particular protein complex has two different homologs in the thermophilic proteins, namely, EF-TU (PDB entry 1EFT) and EF-TS (PDB entry 1TFE). Furthermore, 1TFE, a dimer, matches with a single chain, 1EFU-B. The asymmetric unit of lactate dehydrogenase crystals from Bacillus stearothermophilus (PDB entry 1LDN) contains two copies of the molecule. The first copy has been used in this analysis. In all the families, the spatially overlapping regions in the superposition of the thermophilic and mesophilic proteins are very extensive. For example, in the citrate synthase family, where the similarity between the thermophilic and mesophilic proteins is relatively poor as compared with most other families, 332 residues in each chain overlap spatially. A chain of thermophilic citrate synthase (1AJ8-B) has 370 residues while a chain of mesophilic citrate synthase (1CSH) contains 435 residues. A few of the PDB entries used in this analysis have missing atoms, residues or small fragments due to poor diffraction data. Additionally, the crystal structures in several cases may be determined at low temperatures to obtain better diffraction data. However, these factors do not substantially affect the overall three-dimensional structures of the proteins. No systematic errors are expected on this count.
Sequence composition analysis
Distributions (numbers, N) and frequencies (percent, %) of all 20 amino acids were computed for the thermophilic and mesophilic proteins. In addition, we have computed their distributions in the α-helices. The amino acid distributions were compared using the χ² test. Hamming distance was computed between percent (%) amino acid compositions. The change in proportion test was used to identify the amino acids whose proportions change significantly. These calculations follow Kumar and Bansal (1998a).
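As a rough illustration of these composition statistics, the sketch below computes amino acid counts and percentages, a Pearson χ² statistic and a Hamming distance. It assumes a standard Pearson χ² with one composition used as the expected distribution, and takes the Hamming distance to be the sum of absolute differences between percent compositions. The exact formulations of Kumar and Bansal (1998a) may differ, and all function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return (counts, percentages) over the 20 standard amino acids."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    n = {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}
    pct = {aa: 100.0 * n[aa] / total for aa in AMINO_ACIDS}
    return n, pct


def chi_square(observed_counts, expected_pct):
    """Pearson chi-square of observed counts against an expected % composition."""
    total = sum(observed_counts.values())
    chi2 = 0.0
    for aa in AMINO_ACIDS:
        expected = total * expected_pct[aa] / 100.0
        if expected > 0:  # skip residues absent from the expected distribution
            chi2 += (observed_counts[aa] - expected) ** 2 / expected
    return chi2


def hamming_distance(pct_a, pct_b):
    """Sum of absolute differences between two percent compositions (assumed form)."""
    return sum(abs(pct_a[aa] - pct_b[aa]) for aa in AMINO_ACIDS)
```

With thermophilic counts as `observed_counts` and the mesophilic percentages as `expected_pct`, the resulting statistic would be compared against the critical value for 19 degrees of freedom.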
Structural properties
Oligomeric state
For a given protein, the PDB files contain coordinates for the structure observed in a crystallographic asymmetric unit. This may not reflect the true biochemically relevant oligomeric state for the protein. In our data set these oligomeric states of the thermophilic and mesophilic proteins are tabulated by studying the biochemical data contained in the relevant literature on these proteins, indicators within the PDB files and the pointers in the PDB3DB browser.
Hydrophobicity
The hydrophobicity of a protein was calculated as the fraction of the buried non-polar area out of the total non-polar area, computed by using the methods described earlier (Tsai and Nussinov, 1997a,b; Tsai et al., 1997).
Compactness
The compactness (Zehfus and Rose, 1986) of a protein was defined as the ratio of the solvent accessible area (Lee and Richards, 1971; Tsai et al., 1997) of the protein to the surface area of a sphere with a volume equal to that of the protein (Tsai and Nussinov, 1997a,b).
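The compactness ratio follows directly from this definition. In the minimal sketch below, the accessible surface area and the molecular volume are assumed to have been computed elsewhere, and the function names are hypothetical.

```python
import math


def equivalent_sphere_area(volume: float) -> float:
    """Surface area of a sphere that has the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def compactness(accessible_area: float, volume: float) -> float:
    """Accessible surface area divided by the area of the equal-volume sphere."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return accessible_area / equivalent_sphere_area(volume)
```

A perfect sphere gives a ratio of 1. Any other shape of the same volume has a larger surface, so a smaller ratio indicates a more compact molecule.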
Hydrogen bonds and salt bridges
Whenever two heavy (non-hydrogen) atoms with opposite partial charges [donor (D)–acceptor (A) pairs] were found to be within a distance of 3.5 Å, a hydrogen bond has been inferred. The geometrical goodness of the hydrogen bond was assessed by computing the values of the following angles.
- Angle at D, between vectors B_D–D and D–A, where B_D is the atom covalently bonded to the donor (D) atom.
- Angle at A, between vectors D–A and A–B_A, where B_A is the atom covalently bonded to the acceptor (A) atom.
A hydrogen bond was taken to have good geometry if both these angles lie in the range 90–150°. Only those hydrogen bonds which have a good geometry were included in our studies.
The presence of salt bridges was inferred when Asp or Glu side-chain carbonyl oxygen atoms were found to be within 4.0 Å distance from the nitrogen atoms in Arg, Lys and His side chains.
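The geometric criteria above can be sketched as follows. This is a hedged illustration rather than the program used in the study. Atoms are plain (x, y, z) tuples in Å, and the function names are hypothetical. Each angle is assumed to be measured at the donor or acceptor atom between its bonded neighbour and its partner. The inclusive cut-offs are also an assumption. Donor and acceptor typing by partial charge is left to the caller.

```python
import math


def _vec(p, q):
    """Vector pointing from point p to point q."""
    return tuple(b - a for a, b in zip(p, q))


def distance(p, q):
    return math.sqrt(sum(c * c for c in _vec(p, q)))


def angle_deg(u, v):
    """Angle between two vectors, in degrees."""
    nu = math.sqrt(sum(c * c for c in u))
    nv = math.sqrt(sum(c * c for c in v))
    if nu == 0 or nv == 0:
        raise ValueError("zero-length vector")
    cos_t = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_good_hbond(b_d, d, a, b_a, max_dist=3.5, lo=90.0, hi=150.0):
    """Distance test plus both angle tests (angles taken at D and at A)."""
    if distance(d, a) > max_dist:
        return False
    theta_d = angle_deg(_vec(d, b_d), _vec(d, a))  # B_D - D - A
    theta_a = angle_deg(_vec(a, d), _vec(a, b_a))  # D - A - B_A
    return lo <= theta_d <= hi and lo <= theta_a <= hi


def is_salt_bridge(acid_oxygens, base_nitrogens, cutoff=4.0):
    """True if any Asp/Glu side-chain O lies within cutoff of any Arg/Lys/His side-chain N."""
    return any(distance(o, n) <= cutoff
               for o in acid_oxygens for n in base_nitrogens)
```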
Helical content
The helical content of a protein refers to the percentage (%) of residues that have α-helical conformation in the protein. The corresponding Dictionary of Protein Secondary Structure (DSSP) (Kabsch and Sander, 1983) file was used to identify the residues in α-helical conformation in each protein. Overall geometries of α-helices in the thermophilic and mesophilic protein chains were characterized using HELANAL (Kumar and Bansal, 1996; Kumar and Bansal, 1998b). This program is available at http://www-lecb.ncifcrf.gov/~kumarsan/
Buried and exposed surface areas
Buried and accessible surface areas (Lee and Richards, 1971; Tsai and Nussinov, 1997a,b) have been computed for thermophilic and mesophilic protein chains as well as for 165 dissimilar monomers. Four different fractions have been computed from these areas, in each case:
- Fraction of polar exposed surface area is the ratio of the exposed polar surface area to the total exposed surface area.
- Fraction of non-polar exposed surface area is the ratio of the exposed non-polar surface area to the total exposed surface area.
- Fraction of polar buried surface area is the ratio of the buried polar surface area to the total buried surface area.
- Fraction of non-polar buried surface area is the ratio of the buried non-polar surface area to the total buried surface area.
Here, the total exposed surface area is the sum of polar and non-polar exposed surface areas. Similarly, the total buried surface area is the sum of polar and non-polar buried surface areas.
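The four fractions, together with the hydrophobicity measure defined earlier, can be sketched as below. The class and function names are hypothetical, the four areas are assumed to be precomputed, and taking the total non-polar area to be the sum of the buried and exposed non-polar areas is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SurfaceAreas:
    """Precomputed areas (e.g. in square angstroms) for one chain or oligomer."""
    exposed_polar: float
    exposed_nonpolar: float
    buried_polar: float
    buried_nonpolar: float


def surface_fractions(s: SurfaceAreas) -> dict:
    """Polar and non-polar fractions of the exposed and of the buried surface."""
    total_exposed = s.exposed_polar + s.exposed_nonpolar
    total_buried = s.buried_polar + s.buried_nonpolar
    return {
        "polar_exposed": s.exposed_polar / total_exposed,
        "nonpolar_exposed": s.exposed_nonpolar / total_exposed,
        "polar_buried": s.buried_polar / total_buried,
        "nonpolar_buried": s.buried_nonpolar / total_buried,
    }


def hydrophobicity(s: SurfaceAreas) -> float:
    """Buried non-polar area as a fraction of the total non-polar area (assumed sum)."""
    return s.buried_nonpolar / (s.buried_nonpolar + s.exposed_nonpolar)
```

The two exposed fractions sum to 1, as do the two buried fractions, so each pair carries one independent number.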
Measurement of percent change in various properties
For the purpose of a comparison between a thermophilic–mesophilic pair, the numbers of hydrogen bonds and salt bridges in the two proteins were normalized by their respective number of residues. Percent changes were computed as the difference between the normalized values of hydrogen bonds and salt bridges in the two proteins in each family, divided by the corresponding normalized values for the mesophilic proteins.
Changes in protein size can occur due to insertion/deletion and/or oligomerization. Percent change in protein size in each family was computed by dividing the difference in the number of residues between the thermophilic and mesophilic proteins by the number of residues in the mesophilic protein.
Percent change in hydrophobicity in each family was computed by dividing the difference in hydrophobicity for the thermophilic and mesophilic proteins by the hydrophobicity for the mesophilic protein. Percent change in compactness was also computed in the same way.
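All of these percent changes share one form, with the mesophilic value as the reference. A minimal sketch follows; the function names are hypothetical and any counts passed in are placeholders, not values from this study.

```python
def percent_change(thermophilic: float, mesophilic: float) -> float:
    """100 * (thermophilic - mesophilic) / mesophilic."""
    if mesophilic == 0:
        raise ValueError("mesophilic reference value must be non-zero")
    return 100.0 * (thermophilic - mesophilic) / mesophilic


def normalized_percent_change(count_t: int, residues_t: int,
                              count_m: int, residues_m: int) -> float:
    """Percent change in a per-residue count, e.g. salt bridges or hydrogen bonds."""
    return percent_change(count_t / residues_t, count_m / residues_m)
```

For example, `percent_change(370, 435)` gives the relative size difference for two chains of 370 and 435 residues. Normalizing by residue number first keeps a larger protein from appearing to gain interactions simply because it has more residues.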
Database of 165 dissimilar monomers
A database of 165 proteins, which (i) have been solved to high resolution (R ≤ 2.5 Å) by X-ray crystallography and contain at least 50 amino acids, (ii) have dissimilar 3D structures, as determined by the sequence order independent structure comparison technique (Tsai et al., 1996), and (iii) exist as monomers in solution as indicated in their PDB files, relevant biochemical literature and pointers in PDB3DB browser to other databases such as SWISS-PROT, was generated from the PDB. This database was used as a control for studying structural features, such as compactness, hydrophobicity, polar and non-polar contribution to buried and exposed surfaces in thermophilic and mesophilic protein chains.
Cases of high resolution structural pairs where the melting temperatures are currently available
- (i) 3-Phosphoglycerate kinase (PGK) (Davies et al., 1993): Tm = 67°C for the thermophilic enzyme from Bacillus stearothermophilus and 53°C for its mesophilic counterpart from Saccharomyces cerevisiae. The thermophilic PGK is a monomer while the mesophilic PGK is a dimer. The energy difference between the two enzymes is ΔG = ~5 kcal/mol.
- (ii) Adenylate kinase (Glaser et al., 1992): Tm = 74.5°C for the thermophilic enzyme from Bacillus stearothermophilus and 48°C for the mesophilic enzyme from Saccharomyces cerevisiae. Both the thermophilic and the mesophilic enzymes are monomers.
- (iii) CheY, the bacterial chemotaxis protein (Usher et al., 1998): Tm = 95°C for the thermophilic protein from Thermotoga maritima. Both the thermophilic and the mesophilic proteins are monomers.
- (iv) Glutamate dehydrogenase (Yip et al., 1995): Tm = 113°C for the thermophilic protein from Pyrococcus furiosus and 55°C for Clostridium symbiosum glutamate dehydrogenase (Yip et al., 1995). Both the thermophilic and the mesophilic enzymes are hexamers.
- (v) Rubredoxin, a small redox protein (Day et al., 1992): there are several estimates of Tm for rubredoxin from Pyrococcus furiosus. The one used here is from Hiller et al. (1997), determined by the hydrogen exchange technique. Tm for thermophilic rubredoxin = 176–195°C. Both the thermophilic and the mesophilic rubredoxins are monomers.
For PGK the melting temperatures of the thermophilic and mesophilic proteins are close (ΔTm = 67 − 53 = 14°C). The energy difference between thermophilic and mesophilic enzymes is only 5 kcal/mol (ΔG = ~5 kcal/mol). Moreover, the oligomeric states of the two PGKs are also different. The thermophilic rubredoxin has a very high Tm. However, it is a very small protein, consisting of only about 50 amino acids. More than one estimate of Tm for rubredoxin further complicates the matter.
| Results |
|---|
We have selected a non-redundant dataset of 18 families consisting of thermophilic and mesophilic proteins whose high resolution (R ≤ 2.5 Å) structures are available in the PDB (Table I).
Packing
Reasons for higher stability of thermophilic proteins include better packing (Russell et al., 1997, 1998) and hence, smaller and less numerous cavities. To study packing in a protein one can compute its compactness (Zehfus and Rose, 1986). Compactness has been defined to be the ratio of accessible surface area (ASA) (Lee and Richards, 1971) of a given protein to the surface area of a sphere with the same volume as the protein. Assuming that most proteins are more or less globular in shape, a better packed protein will have a smaller ratio value. We have already used this formulation to study hydrophobic folding units (Tsai and Nussinov, 1997a,b). Figure 1 plots the compactness versus the number of residues in thermophilic and mesophilic protein chains (one chain per protein), along with the values calculated for the 165 structurally dissimilar monomeric protein chains selected from the PDB. The compactness values for the thermophilic protein chains are very similar to those calculated for the mesophilic protein chains. They are also within the range of the compactness values obtained for the 165 dissimilar monomers. However, the overall packing of an oligomeric protein may involve two components: (i) packing of atoms within individual subunits, and (ii) the association, or packing, of the subunits with respect to each other. Consequently, we have computed the compactness for the thermophilic and mesophilic proteins in their biochemically relevant oligomeric states. The results are presented in Table II. Again, the compactness values for thermophilic and mesophilic proteins are highly similar. Hence, there is no consistent pattern in the contribution of packing to the differences in stabilities between thermophilic and mesophilic protein pairs. Recently, Karshikoff and Ladenstein (1998) have also reached similar conclusions upon computing cavity volumes for a large number of thermophilic and mesophilic proteins.
Hydrophobicity
With the rapid increase in the structural information available for proteins, it is becoming increasingly clear that the hydrophobic effect is the dominant driving force in protein folding (Dill, 1990). Hence, it has been suggested that thermophilic proteins are substantially more hydrophobic (Haney et al., 1997) and have more surface area buried upon oligomerization (Salminen et al., 1996) as compared with their mesophilic counterparts. As with packing, the hydrophobic effect can manifest itself at two levels: (i) hydrophobicities of the individual protein chains, and (ii) hydrophobicity due to the association of the chains. We have computed the hydrophobicity as the fraction of buried non-polar surface area out of the total non-polar surface area (Tsai and Nussinov, 1997a,b), for the thermophilic and mesophilic protein chains as well as their biochemically relevant oligomeric forms. Figure 2 presents a plot of the hydrophobicity versus the number of residues in thermophilic and mesophilic protein chains, along with those for the 165 dissimilar monomeric chains. The figure illustrates that thermophilic and mesophilic protein chains have very similar hydrophobicities. The values lie within the same range as those for the hydrophobicities of 165 dissimilar monomers. The hydrophobicities computed for the thermophilic and mesophilic proteins in their biochemically relevant oligomeric states are presented in Table II. Again, the hydrophobicities of the thermophilic and mesophilic protein oligomers are very similar.
Polar and non-polar surface areas
It has been suggested that increased polar surface area contributes to the greater stability of the thermophilic proteins (Haney et al., 1997; Vogt and Argos, 1997; Vogt et al., 1997). Here, we have divided protein surfaces into buried and exposed parts and evaluated the contribution of polar and non-polar atoms. These calculations have been performed for all thermophilic and mesophilic protein chains (one polypeptide chain per protein) and compared with those for 165 dissimilar monomers. The calculations have been done in two different ways. In the first set all atoms including the backbone were considered. In the second set, the backbone atoms were excluded. Table III presents the results. The distributions of buried and exposed, polar and non-polar surface areas are quite uniform for the 165 dissimilar monomers as well as for the thermophilic and mesophilic protein chains.
The above observations on packing, hydrophobicity and surface areas indicate that the basic protein core is similar in thermophiles and mesophiles.
Salt bridges and hydrogen bonds
Along with oligomerization, chain length, hydrophobicity and compactness, hydrogen bonds and salt bridges have also been compared between the thermophilic and the mesophilic proteins. The hydrogen bonds were divided into three classes: main chain–main chain (MM H-bonds), main chain–side chain (MS H-bonds) and side chain–side chain hydrogen bonds (SS H-bonds). Figure 3 shows plots of SS H-bonds and salt bridge content changes in the families of thermophilic and mesophilic proteins in their biochemically relevant oligomeric states, and at their interfaces. As the figure shows, side chain–side chain H-bonds and salt bridge content increase in the monomers of most thermophilic proteins and at their interfaces.
The most significant change in the number of salt bridges was observed in the glutamate dehydrogenase family. This family contains glutamate dehydrogenase enzymes from the hyperthermophile Pyrococcus furiosus and the mesophile Clostridium symbiosum. Both thermophilic and mesophilic glutamate dehydrogenases are homohexamers and share good sequence and structural similarities (Table I).
Insertions, deletions and oligomerization
It has been suggested that deletion or shortening of loops may increase protein thermal stability (Russell et al., 1997, 1998). Oligomerization can be another contributing factor. These factors reflect a change in protein size, and its effect on thermal stability. Figure 4 shows changes in hydrogen bonds, salt bridges, compactness and hydrophobicity plotted against the change in the number of residues between thermophilic and mesophilic proteins in each family. Mostly there is no correlation with a change in protein size, either due to insertions/deletions or due to oligomerization. This is further corroborated by the observation that in 14 out of 18 families in our database, thermophilic and mesophilic proteins have the same oligomeric states. In two families the oligomeric states of thermophilic proteins are found to be higher than those of their mesophilic homologs. However, the oligomeric states of mesophilic proteins are higher than their thermophilic homologs in the other two families.
Living temperatures of the thermophilic organisms and structural factors involved in protein thermostability
In the literature, the stability of thermophilic proteins has been described in a number of ways, such as in terms of the temperature at which a protein is active (activity temperature) or stable (stability temperature), or by its half-life over a certain duration of time. Much less frequently a protein is described in terms of melting, or mid-point transition, temperature (Tm). Perhaps due to this heterogeneity in the available data, a recent database analysis study (Vogt and Argos, 1997; Vogt et al., 1997) used the living temperatures of the organisms from which the proteins were isolated as a parameter for studying thermostability. Figure 5 plots changes in the oligomeric state, chain length, hydrophobicity, compactness, main chain–main chain, main chain–side chain and side chain–side chain hydrogen bonds and salt bridges as a function of living and of melting temperatures. Figure 5a shows that structural factors involved in protein thermostability do not correlate with living temperatures of the thermophilic organisms. The trends observed in Figure 5b are clearer. However, there are only five data points, and two of these (the first and the last) are unreliable for the reasons summarized in the Materials and methods section. If we ignore these points, we observe that among the various factors, only the salt bridges tend to correlate with the melting temperature. Unfortunately, this observation is unreliable, as it is based only on three proteins. However, it is consistent with studies by Yip et al. (1998), who have observed a correlation between ion pairs and thermostability for glutamate dehydrogenases from different organisms. Clearly, this phenomenon needs to be investigated further before any conclusions are drawn.
Distribution of amino acids
The overall distributions of amino acids in the 18 non-redundant families of thermophilic and mesophilic protein chains are presented in Table IV. Figure 6 presents a comparison between the residue composition of the thermophilic and mesophilic proteins. Despite the high sequence homology, a χ² test (Kumar and Bansal, 1998a) indicates that the differences between the two distributions are highly significant (χ² = 86.2). For a 19 parameter system such as amino acid distribution (19 degrees of freedom), the χ² value must exceed 30.14 to reject, at the 95% level of confidence (P ≤ 0.05), the null hypothesis that the two distributions are similar. This evidence is further corroborated by the observation that the value of the Hamming distance in 20 dimensional amino acid composition (%) space (Kumar and Bansal, 1998a) between thermophilic and mesophilic chains is large (8.1 distance units).
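The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact formulas of Kumar and Bansal (1998a) are not reproduced in this section, so the sketch assumes a standard χ² homogeneity test on a 2 x 20 table of residue counts and takes the Hamming distance to be the sum of absolute differences between percentage compositions. The function names are hypothetical; the critical value 30.14 is the one quoted in the text.

```python
# Critical chi-square value for 19 degrees of freedom at P = 0.05, as quoted in the text.
CRITICAL_CHI2_DF19_P05 = 30.14


def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic for a 2 x k table of residue counts.

    Assumption: a standard homogeneity test; the paper's exact
    formula may differ.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both count lists must have the same length")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # residue type absent from both sets
        expected_a = total_a * column_total / grand_total
        expected_b = total_b * column_total / grand_total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic


def composition_distance(percent_a, percent_b):
    """Sum of absolute differences between two percentage compositions.

    Assumption: this is what 'Hamming distance' means in composition space.
    """
    return sum(abs(a - b) for a, b in zip(percent_a, percent_b))


def distributions_differ(chi2_value, critical=CRITICAL_CHI2_DF19_P05):
    """True when the null hypothesis of similar distributions is rejected."""
    return chi2_value > critical
```

With the reported value, `distributions_differ(86.2)` returns `True`, since 86.2 exceeds the critical value of 30.14.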
Proline substitutions
It has been suggested that Pro has an increased occurrence in thermophilic proteins, especially in loops (Haney et al., 1997; Watanabe et al., 1997; Bogin et al., 1998). A total of 75 Pro substitutions are observed in loop regions of thermophilic and mesophilic chains. In 39 cases, the thermophilic chains contain a Pro residue instead of other residues found in their mesophilic homologs at equivalent loop positions. However, in 36 cases, another residue is present in the thermophilic chains instead of Pro in the mesophilic homologs. Thus, there is no consistent pattern for Pro substitutions in loops. In our database, the frequency of occurrence of Pro is unchanged (4.2%) (Figure 6) in thermophilic and mesophilic proteins.
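One way to see why a 39 versus 36 split shows no consistent pattern is an exact two-sided sign test against an even split. This test is not part of the original analysis; it is added here only as an illustration, and the function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(successes, trials):
    """Exact two-sided binomial p-value for an even (p = 0.5) split."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Use the smaller tail and double it; valid because p = 0.5 is symmetric.
    tail = min(successes, trials - successes)
    lower_tail = sum(comb(trials, i) for i in range(tail + 1)) / 2 ** trials
    return min(1.0, 2 * lower_tail)


# 39 substitutions toward Pro out of 75 observed loop substitutions.
p_value = two_sided_sign_test(39, 75)
```

For these counts the p-value is far above 0.05, so the split is indistinguishable from an even one.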
Preferred and avoided residues in thermophilic proteins
A change in proportion test (Kumar and Bansal, 1998a) is used to identify amino acids whose proportions change significantly, that is, by >2 standard deviations, between thermophilic and mesophilic chains. Changes in the proportions of Cys (0.6% in thermophilic and 1.0% in mesophilic chains), Arg (4.6% in thermophilic and 3.6% in mesophilic chains), Ser (4.0% in thermophilic and 5.5% in mesophilic chains) and Tyr (4.5% in thermophilic and 3.7% in mesophilic chains) are found to be significant (Figure 6).
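A change in proportion test of this kind can be sketched as a pooled two-proportion z statistic. This is a minimal sketch, assuming that form of the test; the exact procedure of Kumar and Bansal (1998a) is not given in this section. The residue totals used in the example (5000 per set) are hypothetical, because the section reports percentages only.

```python
from math import sqrt


def proportion_change_z(p_thermo, n_thermo, p_meso, n_meso):
    """Pooled two-proportion z statistic (assumed form of the test).

    p_* are fractions between 0 and 1; n_* are total residue counts.
    """
    pooled = (p_thermo * n_thermo + p_meso * n_meso) / (n_thermo + n_meso)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_thermo + 1 / n_meso))
    return (p_thermo - p_meso) / standard_error


def is_significant(z_value, threshold=2.0):
    """Apply the >2 standard deviations criterion stated in the text."""
    return abs(z_value) > threshold


# Hypothetical totals of 5000 residues per set; percentages are from the text.
z_ser = proportion_change_z(0.040, 5000, 0.055, 5000)  # Ser
z_gln = proportion_change_z(0.028, 5000, 0.029, 5000)  # Gln
```

Whether a given change passes the threshold depends on the true residue totals, so the example values should not be read as a reproduction of the reported results.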
Of the 20 amino acids, Asn, Gln, Met and Cys can be classified as thermolabile due to their tendency to undergo deamidation or oxidation at high temperatures (Russell et al., 1997). Table IV and Figure 6 indicate that the frequencies of occurrence for Gln (2.8% in thermophiles and 2.9% in mesophiles) and Met (2.3% in thermophiles and 2.4% in mesophiles) are similar. Cys (0.6% in thermophilic chains and 1.0% in mesophilic) and Asn (4.4% in thermophilic and 5.1% in mesophilic) change by appreciable amounts. However, only the change in the frequency of Cys is significant.
The above observations raise questions about the possible roles of Arg, Tyr and Ser, whose proportions change significantly. It has been suggested that thermophilic proteins have increased hydrogen bonding and salt bridge formation (Yip et al., 1995; Querol et al., 1996; Vogt and Argos, 1997; Vogt et al., 1997; Russell et al., 1997, 1998). Due to their large side chains, Arg and Tyr may be useful both in short range local interactions and in long range interactions. The guanidinium group in Arg can form salt bridges. On the other hand, due to its short side chain, Ser forms mostly local interactions (Jeffrey and Saenger, 1991). Interestingly, it has recently been observed that hot spots for binding in protein interfaces are also rich in Arg, Tyr and Trp (Clackson and Wells, 1995; Bogan and Thorn, 1998). Hence, it appears that in both binding and folding at high temperatures, Arg and Tyr play a similar role, contributing toward protein stability. On the other hand, Trp occurs with a similar proportion in both thermophilic and mesophilic chains (Table IV and Figure 6). In contrast to Arg and Tyr, Trp is a hydrophobic residue with a bulky double ring side chain, usually occurring with low frequencies in proteins. Alternatively, it is possible that the absence of a noticeable trend for Trp, a rare residue, is due to its low counts in our sample.
Thermophilic and mesophilic α-helices
It has been suggested that thermophilic proteins have a higher helical content (Querol et al., 1996). In our database, we find that in nine out of the 18 families, thermophilic and mesophilic chains have similar values for the fraction of residues in helical conformation (fH), as identified using DSSP (Kabsch and Sander, 1983). However, on the whole, thermophilic proteins have a higher occurrence of residues in helical conformation. fH for thermophilic chains is 32.0% as compared with 25.4% in the mesophilic chains. α-Helices in the thermophilic and mesophilic proteins adopt similar overall geometries as characterized using HELANAL (Kumar and Bansal, 1996; Kumar and Bansal, 1998b).
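The fraction fH can be computed directly from a per-residue DSSP assignment. The sketch below assumes the assignment is available as a string of one-letter DSSP codes and that only the code H (α-helix) counts as helical; whether other helix codes were included in the original analysis is not stated here. The function name is hypothetical.

```python
def helical_fraction(dssp_string, helix_codes=("H",)):
    """Fraction of residues whose DSSP code is in helix_codes.

    Assumption: one DSSP character per residue, with 'H' marking alpha-helix.
    """
    if not dssp_string:
        raise ValueError("empty secondary-structure string")
    helical = sum(1 for code in dssp_string if code in helix_codes)
    return helical / len(dssp_string)


# Toy assignment: 4 helical residues out of 10.
f_h = helical_fraction("HHHHEEEE--")
```

Multiplying the result by 100 gives a percentage comparable with the fH values quoted in the text.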
Table V presents the amino acid distributions in α-helices of thermophilic and mesophilic chains. A χ² test shows that the amino acid distribution in α-helices of thermophilic proteins is significantly different from that of α-helices in mesophilic proteins. The Hamming distance (Kumar and Bansal, 1998a) between the two distributions is 15.1 distance units in the 20 dimensional amino acid composition space. The proportions of Cys (0.1% in thermophilic and 0.8% in mesophilic helices), His (2.0% in thermophilic and 3.3% in mesophilic helices) and Arg (5.5% in thermophilic and 3.9% in mesophilic helices) change significantly. Thermophilic helices favor Arg and avoid His and Cys as compared with mesophilic helices. A recent database analysis study on α-helices shows Arg to be a helix-favoring residue with its propensity to occur in the middle region of α-helices being 1.33, while Cys (propensity = 0.87 in the middle of α-helices) and His (propensity = 0.76 in the middle of α-helices) are helix-disfavoring residues (Kumar and Bansal, 1998a). Thermostability has also been attributed to enhanced secondary structure propensity (Querol et al., 1996). This might rationalize the increase in the proportion of Arg, a helix-favoring residue, in thermophilic protein helices, while the helix-disfavoring residues Cys and His decrease. A previous analysis of the composition of α-helices in thermophilic proteins (Warren and Petsko, 1995) has also noted a significant decrease in Cys and His. The proportion of Arg increases and that of Cys decreases significantly in the entire thermophilic proteins as well. Furthermore, Pro occurs with a frequency of 0.7% in α-helices of thermophilic proteins as compared with 1.3% in α-helices of mesophilic proteins. Pro is the most avoided residue in the middle of α-helices (Kumar and Bansal, 1998a), since it may cause kinks (Woolfson and Williams, 1990; Kumar and Bansal, 1996, 1998a,b).
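The per-helix composition figures quoted above (for example, the percentage of Arg or Pro among helical residues) can be tallied as follows. This is a minimal sketch, assuming a one-letter amino acid sequence aligned residue by residue with a DSSP string in which H marks α-helix; the function name and the toy input are hypothetical.

```python
from collections import Counter


def helix_composition(sequence, dssp_string, helix_code="H"):
    """Percentage of each residue type among residues in helical conformation."""
    if len(sequence) != len(dssp_string):
        raise ValueError("sequence and DSSP string must have equal length")
    helix_residues = [
        residue for residue, code in zip(sequence, dssp_string) if code == helix_code
    ]
    if not helix_residues:
        return {}  # no helical residues in this chain
    counts = Counter(helix_residues)
    return {
        residue: 100.0 * count / len(helix_residues)
        for residue, count in counts.items()
    }


# Toy example: four helical residues followed by two non-helical ones.
composition = helix_composition("ARRPGC", "HHHH--")
```

Pooling such counts over all thermophilic chains, and separately over all mesophilic chains, would give two distributions that can then be compared.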
The sequence composition comparison between thermophiles and mesophiles suggests that thermophiles favor those factors that can enhance their stability, and avoid those factors that can destabilize them. The lower occurrence of thermolabile residues in the thermophilic chains, along with the lower occurrence of Cys, His and Pro in thermophilic helices, illustrates a clear trend in this direction.
| Discussion and conclusions |
|---|
In this extensive study we have examined structural and sequence factors involved in protein thermostability. Thermophilic proteins optimize their stabilities via different mechanisms. Sequence and structural factors, such as packing, oligomerization, insertions and deletions, proline substitutions, helical content, helical propensities, polar surface area, hydrogen bonds and salt bridges, have been proposed to contribute to greater stability of thermophilic proteins. We have analyzed all these factors in a database of 18 thermophile-mesophile families. There are two major concerns in analyses such as the ones presented here. First, protein stabilization strategies that may be observed in the individual families may not show consistent trends across several families. Second, not all differences among the thermophiles and mesophiles may be attributable to protein thermostability. Some may be due to phylogenetic differences between the thermophiles and mesophiles. In the available data, we observe that no single factor proposed to contribute toward protein thermostability is 100% consistent in our set of proteins. It is particularly interesting to note that hydrophobicity, packing and fractional polar and non-polar surface areas show little quantitative difference between thermophiles and mesophiles. While insertions/deletions, oligomerization and proline substitutions can stabilize individual thermophilic proteins, they do not show consistent trends across the families. It is also possible that the observed differences are due to phylogenetic differences between thermophiles and mesophiles. It should also be mentioned that more than one factor may be responsible for greater stability of the thermophilic protein in a given family.
The most consistent trend is shown by salt bridges and side chain-side chain hydrogen bonds. These increase in the majority of the thermophilic proteins. In recent years, the role of salt bridges in protein stability has been controversial (Hendsch and Tidor, 1994; Kumar and Nussinov, 1999). However, in the case of the thermophilic proteins, salt bridges have been shown to be stabilizing (Elcock, 1998; Xiao and Honig, 1999; Kumar et al., 2000). Recently, we have calculated the electrostatic strengths of salt bridges in the glutamate dehydrogenase family (Kumar et al., 2000). Network formation stabilizes individual salt bridges in Pyrococcus furiosus glutamate dehydrogenase (Kumar et al., 2000). Salt bridges are major contributors toward thermostability of Pyrococcus furiosus glutamate dehydrogenase as compared with the mesophilic Clostridium symbiosum glutamate dehydrogenase (Yip et al., 1995). In a large database analysis study, we have observed that salt bridges with `good geometries', such as those in the present study, have mostly, but not always, made stabilizing electrostatic contributions toward protein stability (Kumar and Nussinov, 1999). Thermophilic proteins are not only stable, but are also optimally active at high temperatures. An increase in the number of salt bridges and hydrogen bonds may rigidify a thermophilic protein and expose it to the danger of becoming inactive. Still, while a thermophilic protein may be rigid at room temperature, it is likely to be flexible at high temperatures (Jaenicke and Bohm, 1998). Recently, we have also observed that Pyrococcus furiosus glutamate dehydrogenase contains a greater number of salt bridges and their networks around the active site as compared with the mesophilic Clostridium symbiosum glutamate dehydrogenase. The salt bridges around the active site may help to keep the active site region together by opposing disorder due to greater atomic mobility at high temperatures (Kumar et al., 2000).
Examination of the sequences shows that despite high sequence homology, the differences in amino acid distributions in the thermophilic and mesophilic proteins are highly significant. While some of the differences in the amino acid distributions are likely to be the outcome of phylogenetic differences between thermophiles and mesophiles, others correlate with protein thermostability. For example, the proportions of the thermolabile amino acid Cys, and of Ser, which usually forms local interactions, decrease significantly, while those of Arg and Tyr, which are capable of both short range and long range interactions, increase significantly in the thermophilic proteins. The stability of the constituent α-helices also appears to contribute to protein thermal stability. Thermophilic proteins have a higher proportion of residues in helical conformation. The helix-favoring residue Arg occurs more frequently in α-helices of thermophilic proteins, whereas the helix-disfavoring residues Cys, His and Pro have lower frequencies of occurrence in thermophilic helices. Refraining from using some residues, and opting for others, in sequences of thermophilic proteins suggests a dual strategy employed by these proteins to enhance their stability. On the one hand, thermophilic proteins prefer residues with larger side chains that can form salt bridges, long range or local electrostatic and hydrophobic interactions, and which stabilize secondary structure elements. However, concomitantly, thermophilic proteins avoid thermolabile residues and residues that can destabilize secondary structure elements.
Our analysis shows that the organisms' living temperatures are not good descriptors of protein thermostability. Melting temperatures may be more appropriate to measure protein thermostability. When explored with respect to the melting temperatures, salt bridges appear to show a correlation with the Tm's. We note, however, that while high quality crystal structures are available, unfortunately, the Tm's have been determined only for a few of these proteins. Hence, currently we are unable to examine a correlation of salt bridges and the respective melting temperatures of the thermophiles in a statistically meaningful way. However, we observe that structural factors involved in the stability of the thermophilic proteins do not correlate with the living temperatures of their source organisms.
From the point of view of designing a thermophilic protein, this study suggests inclusion of a larger proportion of salt bridges. Additionally, it suggests placing a larger fraction of residues in α-helical conformation, and using a higher frequency of Arg, both to form salt bridges and to stabilize α-helices. It would be preferable to avoid Pro, Cys and His in α-helices, and to avoid thermolabile residues, particularly Cys.
| Acknowledgments |
|---|
We thank Drs Buyong Ma and Neeti Sinha and, in particular, Dr Jacob V.Maizel for helpful discussions. The personnel at FCRDC are thanked for their assistance. The research of R.Nussinov in Israel has been supported in part by grant no. 95-00208 from BSF, Israel, by a grant from the Ministry of Science, by the Center of Excellence, administered by the Israel Academy of Sciences, by the Magnet grant, and by the Tel Aviv University Basic Research and Adams Brain Center grants. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract No. NO1-CO-56000. The content of this publication does not necessarily reflect the view or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organization imply endorsement by the U.S. Government.
| Notes |
|---|
4 To whom correspondence should be addressed. Email: ruthn@ncifcrf.gov
| References |
|---|
Adams,M.W.W. and Kelly,R.M. (1995) Chem. Engng News, 73, 32-42.
Auerbach,G., Jacob,U., Grottinger,M., Schurig,M. and Jaenicke,R. (1997) Biol. Chem., 378, 327-329.
Bernstein,F., Koetzle,T., Williams,G., Meyer,E.J., Brice,M., Rodgers,J., Kennard,O., Shimanuchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535-542.
Bogan,A.A. and Thorn,K.S. (1998) J. Mol. Biol., 280, 1-9.
Bogin,O., Peretz,M., Hacham,Y., Korkhin,Y., Frolow,F., Kalb(Gilboa),A.J. and Burstein,Y. (1998) Protein Sci., 7, 1156-1163.
Clackson,T. and Wells,J.A. (1995) Science, 267, 383-386.
Daniel,R.M., Cowan,D.A., Morgan,H.W. and Curran,M.P. (1982) Biochem. J., 207, 641-644.
Davies,G.J., Gamblin,S.J., Littlechild,J.A. and Watson,H.C. (1993) Proteins, 15, 283-289.
Day,M.W., Hsu,B.T., Joshua-Tor,L., Park,J.B., Zhou,Z.H., Adams,M.W.W. and Rees,D.C. (1992) Protein Sci., 1, 1494-1507.
Dill,K.A. (1990) Biochemistry, 31, 7134-7155.
Elcock,A.H. (1998) J. Mol. Biol., 284, 489-502.
Fukuyama,K., Nagahara,Y., Tsukihara,T., Katsube,Y., Hase,T. and Matsubara,H. (1988) J. Mol. Biol., 199, 183-193.
Glaser,P., Presecan,E., Delepierre,M., Surewicz,W.K., Mantsch,H.H., Barzu,O. and Giles,A.M. (1992) Biochemistry, 31, 3038-3043.
Gomes,J., Gomes,I., Kreiner,W., Esterbauer,H., Sinner,M. and Steiner,W. (1993) J. Biotech., 30, 283-297.
Haney,P., Konisky,J., Koretke,K.K., Luthey-Schulten,Z. and Wolynes,P.G. (1997) Proteins, 28, 117-130.
Hendsch,Z.S. and Tidor,B. (1994) Protein Sci., 3, 211-226.
Hiller,R., Zhou,Z.H., Adams,M.W.W. and Englander,S.W. (1997) Proc. Natl Acad. Sci. USA, 94, 11329-11332.
Holland,D.R., Hausrath,A.C., Juers,D. and Matthews,B.W. (1995) Protein Sci., 4, 1955-1965.
Jaenicke,R. and Bohm,G. (1998) Curr. Opin. Struct. Biol., 8, 738-748.
Jeffrey,G.A. and Saenger,W. (1991) Hydrogen Bonding in Biological Structures. Springer-Verlag, Berlin.
Jiang,Y., Nock,S., Nesper,M., Sprinzl,M. and Sigler,P.B. (1996) Biochemistry, 35, 10269-10278.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.
Karshikoff,A. and Ladenstein,R. (1998) Protein Engng, 1, 867-872.
Kelly,C.A., Nishiyama,M., Ohnishi,Y., Beppu,T. and Birktoft,J.J. (1993) Biochemistry, 32, 3913-3922.
Kjeldgaard,M., Nissen,P., Thirup,S. and Nyborg,J. (1993) Structure, 1, 35-50.
Klump,H.H., Dikuggiero,J., Kessel,M., Park,J.B., Adams,M.W.W. and Robb,F.T. (1992) J. Biol. Chem., 267, 22681-22685.
Knegtel,R.M.A., Wind,R.D., Rozeboom,H.J., Kalk,K.H., Buitelaar,R.M., Dijkhuizen,L. and Dijkstra,B.W. (1996) J. Mol. Biol., 256, 611-622.
Kumar,S. and Bansal,M. (1996) Biophys. J., 71, 1574-1586.
Kumar,S. and Bansal,M. (1998a) Proteins, 31, 460-476.
Kumar,S. and Bansal,M. (1998b) Biophys. J., 75, 1935-1944.
Kumar,S. and Nussinov,R. (1999) J. Mol. Biol., 293, 1241-1255.
Kumar,S., Ma,B., Tsai,C.J. and Nussinov,R. (2000) Proteins, 38, 368-383.
Ladenstein,R. and Antranikian,G. (1998) Adv. Biochem. Engng Biotechnol., 61, 37-85.
Lee,B.K. and Richards,F.M. (1971) J. Mol. Biol., 55, 379-400.
Matthews,B.W., Weaver,L.H. and Kester,W.H. (1974) J. Biol. Chem., 249, 8030-8044.
Obmolova,G., Kuranova,I. and Teplyakov,A. (1993) J. Mol. Biol., <

[Figure caption fragment: the x-axis denotes the number of residues (N) in the protein chains and the y-axis denotes compactness (Z). For comparison, 165 monomers with dissimilar structures obtained from the PDB are also depicted.]




