Protein Engineering, Vol. 15, No. 12, 967-977,
December 2002
© 2002 Oxford University Press
Identification of conserved residue patterns in small ß-barrel proteins
Centre for DNA Fingerprinting and Diagnostics, ECIL Road, Nacharam, Hyderabad 500 076, India
| Abstract |
|---|
Our ability to predict the three-dimensional conformation of a polypeptide from its amino acid sequence remains limited despite advances in structure analysis. Analysis of structures and sequences of protein families with similar secondary structural elements, but varying topologies, might help in addressing this problem. We have studied the small ß-barrel class of proteins characterized by four strands (n = 4) and a shear number of 8 (S = 8) to understand the principles of barrel formation. Multiple alignments of the various protein sequences were generated for the analysis. Positional entropy, as a measure of residue conservation, indicated conservation of non-polar residues at the core positions. The presence of a type II ß-turn among the various barrel proteins considered was another strikingly invariant feature. A conserved glycyl-aspartyl dipeptide at the ß-turn appeared to be important in guiding the protein sequence into the barrel fold. Molecular dynamics simulations of the type II ß-turn peptide suggested that aspartate is a key residue in the folding of the protein sequence into the barrel. Our study suggests that the conserved type II ß-turn and the non-polar residues in the barrel core are crucial for the folding of the protein's primary sequence into the ß-barrel conformation.
Keywords: ß-barrel/molecular dynamics/protein folding/SH3/type II ß-turn
| Introduction |
|---|
The landmark work of Anfinsen indicated for the first time that the primary structure of a protein dictates its tertiary structure (Anfinsen, 1973).
α-helices, ß-sheets and turns, the predominant secondary structural elements in proteins. By arranging these simple elements in precise patterns, complex protein structures assemble to achieve the diversity of protein functions. A major goal in understanding how the amino acid sequence of a protein specifies its structure is to understand how these elements of secondary structure are organized onto a tertiary scaffold. This requires learning how properties of individual amino acids are exploited in guiding an amino acid sequence into a particular fold. Much progress has been made in the last decade towards understanding the relationship between a protein's sequence and structure, yet the protein folding problem remains a captivating puzzle.
Researchers have learnt several rules governing the formation of helices and turns. However, the principles behind ß-sheet formation are much less understood (Serrano, 2000). It is therefore especially intriguing to speculate how ß-sheet proteins, having complex topologies and involving numerous contacts between residues distant in sequence, acquire their native structure. Several features of ß-sheet proteins have been suggested to be important for efficient folding and stability. The overall hydrophobic and polar pattern of amino acids may be a dominant driving force for defining a protein's topology (Eisenberg et al., 1984; Bowie et al., 1990; Kamtekar et al., 1993). Recognition between amino acid side chains on neighboring ß-strands may guide a correct strand register and hence stabilize the resulting ß-sheets (Merkel et al., 1999; Mandel-Gutfreund et al., 2001). Another possibility is the formation of turns at critical locations in the protein structure. Turns may be particularly important for anti-parallel sheet formation and hence for defining the protein topology. Supporting this hypothesis, recent studies indicate that the residues in the distal loop in the SH3 domain are important for nucleation of protein folding (Martinez and Serrano, 1999; Riddle et al., 1999). Different combinations of these possible interactions are the most likely determinant of ß-sheet topology and hence protein stability.
In the course of evolution, three-dimensional structures of proteins are conserved to a greater degree than their sequences, which determine their structure. Residue substitutions, which tend to destabilize a particular site, would probably be compensated by other substitutions that confer greater stability on the structure. For example, if volume conservation were important to structure and function, a substitution involving a reduction of volume in the protein core might result in a destabilizing pocket in the core. In this case, it might become necessary to substitute another residue at a position distant in the sequence but near in space. This second substitution should then have a larger side chain in order to conserve the overall volume of the core and therefore the overall folded structure. Thus, if structural compensation is a general phenomenon, neighbouring sites in the three-dimensional structure will tend to evolve in a correlated fashion owing to the compensation process. In the past decade there has been a great deal of progress in the development of methods for predicting interactions in protein structures by analysis of correlated changes in sequence evolution (Altschuh et al., 1988; Shindyalov et al., 1994; Pollock and Taylor, 1997).
In this study, we have undertaken a comprehensive analysis of the sequence and structural variation seen in the small ß-barrel proteins. A ß-barrel is essentially identified by two geometric characteristics: the number of ß-strands in the barrel (n) and the number of ß-bridge staggers across the ß-sheet (the shear number, S) (Murzin et al., 1994). Within the all-ß protein class in the Structural Classification of Proteins (SCOP) database there exist five folds which can be grouped together as small ß-barrels (Murzin et al., 1995). These barrels are characterized by the presence of four ß-strands (n = 4) and a shear number of 8 (S = 8) (Murzin et al., 1994). Although the five folds have similar secondary structure composition, each has a distinct topology. The goal of this study was to identify conserved features across these ß-barrel folds, which may also be important in the initial steps of the folding pathway and in guiding the protein's primary sequence into a ß-barrel with the specific topology. In the work described here, we constructed and analyzed multiple sequence alignments for protein sequences in each of these five barrel folds. We also aligned structures of the different proteins, within and across the folds. In order to determine structural features common to the barrel folds at both the sequence and structural level, we studied the conservation and covariation in the SH3-like barrel, GroES-like and the PDZ domain-like folds. Molecular dynamics (MD) simulations on a GroES peptide, derived from a conserved ß-turn, were also carried out in order to address its role as a possible nucleation site in the folding pathway. By combining sequence and structural analysis it was possible to interpret the pattern of conservation seen in the three protein folds.
| Materials and methods |
|---|
The SCOP database classifies all-ß proteins into 93 folds according to their topology and evolutionary relationships (Murzin et al., 1995).
Structure comparison
A total of 20 structures were considered for structural comparison of proteins (Table I). Of these, 13 belonged to the SH3-like barrel fold, three to the GroES-like barrel and two to the PDZ domain-like barrel fold. One structure was selected for each of the N-terminal domains of the minor coat protein g3p and the Sm motif of small nuclear ribonucleoproteins, SNRNP fold. Coordinates for each of these proteins were retrieved from the PDB (Bernstein et al., 1977). Superimpositions were done among the structures within the same fold and also across the various ß-barrel folds. For inter-fold superimpositions, one representative from each fold was taken. The representative structure was a protein in the fold for which complete sequence analysis was done, that is, one meeting the criterion of at least 30 sequences in the multiple sequence alignment (addressed later). Hence, the structures chosen for the inter-fold comparison were the α-spectrin SH3 domain protein, 1shg; the Escherichia coli GroES, 1aon and the rat neuronal nitric oxide synthase, 1qav from the SH3-like barrel, GroES-like and the PDZ domain-like fold, respectively. The structures were first superposed by visual inspection, followed by least-squares fitting using the lsq commands of O (Jones et al., 1991).
Residue conservation
To measure the level of conservation at each position in the alignment, the frequency of occurrence of each amino acid at that position was determined and used to calculate the positional entropy. A positional entropy of n is equivalent to the diversity of n residues occurring at the position with a frequency of 1/n. A position that is completely conserved will thus have a positional entropy of 1. For position i, with residues r = (A, C, D, ..., V, W, Y) occurring at frequencies p_i(r), the entropy H(i) is defined as
$$H(i) = -\sum_{r} p_i(r)\,\log_2 p_i(r)$$
This entropy is known as the Shannon informational entropy (Shenkin et al., 1991).
The positional entropy is expressed as
$$\text{positional entropy}(i) = 2^{H(i)}$$
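The entropy calculation described above can be sketched as follows. This is a minimal illustration rather than the program used in the study: the function names are hypothetical, gap characters are simply ignored (an assumption, since the text does not say how gaps were treated), and the entropy is taken in bits so that the positional entropy is 2 raised to H(i).

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def positional_entropy(column):
    """Return 2**H for one alignment column, with H the Shannon entropy in bits.

    Characters that are not one of the 20 standard residues (e.g. gaps)
    are skipped; this gap handling is an assumption of the sketch.
    """
    residues = [r for r in column.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no amino acid residues")
    total = len(residues)
    counts = Counter(residues)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** h

def alignment_entropies(sequences):
    """Positional entropy for every column of equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must all have the same length")
    return [positional_entropy("".join(s[i] for s in sequences))
            for i in range(length)]
```

A fully conserved column gives a value of 1, and a column with n equally frequent residues gives n, matching the interpretation stated in the text.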
Volume correlation
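The correlation coefficient at each residue position in the alignment was calculated as a measure of covariation in the volumes of the side chains. The side-chain volumes were taken from Harpaz et al. (1994). A pairwise correlation coefficient, r(x,y), determined the correlation between two residue positions x and y and was expressed as
$$r(x,y) = \frac{\sum_{k}(x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k}(x_k - \bar{x})^2\,\sum_{k}(y_k - \bar{y})^2}}$$
where x_k and y_k are the side-chain volumes at positions x and y in sequence k of the alignment, and x̄ and ȳ are the corresponding mean volumes over all sequences.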
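As an illustration of the volume covariation measure, the sketch below computes a Pearson correlation coefficient between the side-chain volumes found at two alignment columns. It assumes that the pairwise coefficient is the ordinary Pearson coefficient; the function names are hypothetical, and the volume table in the usage note holds rough placeholder numbers, not the values of Harpaz et al.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one column has constant volume")
    return sxy / math.sqrt(sxx * syy)

def volume_correlation(sequences, i, j, volumes):
    """Correlate side-chain volumes at columns i and j (0-based) of an alignment.

    Sequences with a gap or unknown residue at either column are skipped,
    which is an assumption of this sketch.
    """
    pairs = [(volumes[s[i]], volumes[s[j]])
             for s in sequences
             if s[i] in volumes and s[j] in volumes]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

For example, with placeholder volumes `{"G": 60.0, "A": 90.0, "V": 140.0, "L": 165.0}` and the toy alignment `["LG", "VA", "AV", "GL"]`, `volume_correlation(seqs, 0, 1, volumes)` is strongly negative, which is the signature of a large residue at one position being offset by a small residue at the other.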
Sequence alignment
A total of 20 initial target sequences corresponding to the representative protein in each family were considered for the analysis (Table I). The chosen target sequence was used for a BLAST search (E < 0.001) of the non-redundant database compilation (Altschul et al., 1997). Homologous sequences were retrieved and the stretch of residues aligned to the initial target domain was extracted from each protein sequence. Two or more domains within a protein sequence were considered as separate sequences. Thus, a sequence having two domains was split into two, each corresponding to a different domain within the protein. These sequences were aligned using the ClustalW program (Thompson et al., 1994). Once the initial alignment was constructed, sequences with a ClustalW score of >90 were removed in order to remove any bias in the sequence analysis due to a high degree of similarity. To avoid artifactual results arising out of inaccurate sequence alignments, sequences with a score of <25 were also removed. The remaining sequences were realigned such that, in the final alignment, no two sequences had a score of <25 or >90. Families where, after the editing, the number of sequences in the alignment was <30 were not considered for further analysis.
Only five of a total of 20 families in the n = 4, S = 8 ß-barrel protein folds fulfilled the criterion of at least 30 sequences in the multiple alignment. These included the SH3 domain and the C-terminal domain of ribosomal protein L2 in the SH3-like barrel fold, GroES and alcohol dehydrogenase-like, N-terminal domain in the GroES-like fold and PDZ-domain in the PDZ domain-like fold. Sequence alignment data corresponding to these families were considered for statistical analysis.
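The filtering described above can be sketched as a greedy pass over the aligned sequences. The text does not give the exact procedure, so several things here are assumptions: percent identity over ungapped columns stands in for the ClustalW pairwise score, sequences are accepted in input order, and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def filter_by_score(aligned, low=25.0, high=90.0, score=percent_identity):
    """Keep sequences so that every kept pair scores between low and high.

    Greedy and order-dependent: a sequence is kept only if its score
    against every previously kept sequence lies in [low, high].
    """
    kept = []
    for seq in aligned:
        if all(low <= score(seq, k) <= high for k in kept):
            kept.append(seq)
    return kept

def enough_sequences(kept, minimum=30):
    """Apply the family-size threshold: fewer than `minimum` means excluded."""
    return len(kept) >= minimum
```

Because the pass is greedy, a different input order can yield a different final set; placing the target sequence first guarantees that it is retained.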
Molecular dynamics simulations
MD simulations for a small peptide of the E. coli GroES were performed using the Discover module in the InsightII molecular modelling package (MSI/Biosys, San Diego, CA, 1997). The simulations were performed with a cubic periodic boundary condition (box dimensions 25 × 25 × 25 Å) and consisted of the peptide solvated with water molecules. The effective water density in the solvation box was 0.96 g/cm³. All atoms were considered explicitly and their interactions were computed using the CVFF force field. The time step in the MD simulations was 1 fs. All simulations began with 100 iterations of energy minimization of the peptide to relax the local forces. Subsequently, MD simulations were performed at 300 K for 500 ps. The starting structure was a seven-residue peptide in its original conformation in the protein, with an intact type II turn. Simulations were performed for the wild-type sequence of the peptide and also on two other peptides. In one of these, the aspartate was mutated to asparagine and in the second the aspartate was mutated to alanine. The native-like side chain–main chain hydrogen bond was retained in the aspartate to asparagine mutant.
| Results |
|---|
The study involved comparison of sequence and structure data for the different four-stranded ß-barrel folds. According to the number of strands forming a compact globular structure, these constitute the smallest barrels known. The difference among these different folds essentially lies in the manner in which the four ß-strands are connected, thereby generating a unique topology (Figure 1).
Comparisons within the SH3-like barrel fold
Superpositions were done among 13 structures in the SH3-like barrel fold (Table I). These structures superposed well on one another with a maximum r.m.s. deviation of 2.24 Å for 29 atoms between the DNA binding domain of HIV-1 integrase, 1ex4 and the diphtheria toxin repressor, 2dtr (Table IIa). The minimum r.m.s. deviation of 1.23 Å for 38 atoms was seen between the α-spectrin SH3 domain protein, 1shg and the CcdB protein, 2vub.
Of interest is the region at the type II ß-turn of SH3-like barrel fold proteins. The turn, referred to as the diverging turn in the SH3 domain (Yi et al., 1998)
α-spectrin SH3 domain, 1shg; photosystem I accessory protein, 1psf; diphtheria toxin repressor, 2dtr; nitrile hydratase ß-chain, 2ahj; the ribosomal protein L24, 1ffk and ferredoxin thioredoxin reductase, 1dj7. A stretch of >11 residues in the loop connecting strands 1 and 2 of the ß-barrel necessitated the presence of the type II turn, as observed in five of these structures. The presence of the turn in these structures appears to guide the polypeptide into the ß-barrel, helping in the formation of the folded barrel structure.
Multiple sequence alignments for each of the 13 proteins considered for structural comparisons were generated as described in Materials and methods. A BLAST search with the amino acid sequence of the α-spectrin SH3 domain (SH3 domain family) gave 219 hits with E < 0.001. Splitting of multi-domain sequences augmented this number to 302. Exclusion of sequences with ClustalW scores of <25 and >90 drastically reduced the number of sequences in the final alignment to 30. A BLAST search for the sequence of the C-terminal domain of ribosomal protein L2 (translation proteins SH3-like domain family) resulted in an initial number of 132 hits with E < 0.001. A total of 65 sequences homologous to the ribosomal protein were finally obtained by editing the sequences in a manner similar to that described above. For all the remaining sequences subjected to BLAST search, the number of sequences after editing was <30. These sequence alignments were hence not considered for further analysis for reasons described in the Materials and methods section.
The degree of conservation at each position in the multiple alignment generated was determined using the Shannon informational entropy calculation (Shenkin et al., 1991). Tables IIIa and b give the positional entropy values at the core residue positions for the SH3 domain and the C-terminal domain of ribosomal protein L2 families, respectively. Core residue positions for the representative proteins in each family were identified by calculating the percentage accessibility of side chains using the NACCESS program (Hubbard et al., 1991). Residues with side chain accessibilities of <7% were considered part of the core. A total of nine core positions were identified in the α-spectrin SH3 domain. The positional entropies were <3 at all nine core positions in the SH3 domain protein (Table IIIa). In the case of the C-terminal domain of ribosomal protein L2, 10 of the 12 core positions showed high residue conservation as indicated by a positional entropy of <3 at these positions (Table IIIb). Tables IIIa and b also indicate the prevalence of amino acid residues at the core residue positions for the SH3-like barrel fold families, SH3 domain and the C-terminal domain of ribosomal protein L2. These core positions, as seen from the data, are predominantly occupied by valine, leucine or isoleucine in both the families. The other residues occupying positions in the barrel core are the non-polar residues including phenylalanine, methionine, alanine and glycine. The presence of these residues contributes to the high hydrophobicity at the core of the barrel in this fold. High conservation of non-polar residues at the core residue positions suggests the importance of a hydrophobic interior in maintaining the integrity of the fold.
A covariance analysis of residue volumes indicated a significantly high correlation between residues at the core positions in the SH3-like barrel (Figure 2)
α-spectrin SH3 domain). An increase in volume of the core due to a larger side chain at residue position 23 (mostly leucine or valine) is compensated by a reduction in the side chain volume at the correlated position 44 (mostly valine or glycine) (Table IIIa)
α-spectrin SH3 domain and residue 53 in the fourth strand. A negative correlation of 0.84 is seen between these core residue positions. Positive correlations of 0.75 between residues 23 and 53 and 0.42 between residues 25 and 53 result in compensation of the overall core volume. This observation strongly supports the belief that maintenance of the total volume of the core would be important to keep the barrel structure intact.
Comparisons within the GroES-like fold
Three representative proteins in the GroES-like fold were considered for the comparative study (Table I). Superposition of proteins within the GroES-like fold was done as in the case of the SH3-like barrel fold. The representative proteins included for analysis superposed well with one another with a minimum r.m.s. deviation of 1.35 Å for 51 atoms between the horse alcohol dehydrogenase, 3bto and the E. coli GroES, 1aon (Table IIb). Different proteins within the alcohol dehydrogenase-like, N-terminal domain family also superimposed very well on one another (data not shown). Comparisons revealed an overall conservation of the ß-barrel core in the representative proteins.
A BLAST search yielded 170 hits for the E. coli GroES protein. Splitting of the multi-domain sequences increased this number to 174. Further editing as described earlier for the SH3-like barrel fold, however, reduced the number to 86. An initial number of 412 hits in a BLAST search for the alcohol dehydrogenase reduced to 52 sequences in the final alignment after appropriate editing. In the case of the SacY protein, the number of sequences in the final alignment was <30. This protein and the corresponding family were thus omitted from further sequence and structural comparisons.
Core residue positions were identified in the two GroES-like fold proteins, the E. coli GroES and the horse alcohol dehydrogenase. Positional entropy values for the corresponding families at these core positions are shown in Tables IIIc and d. Of the 11 core residue positions in the E. coli GroES, eight were highly conserved with positional entropies of <3. Two of the three high-entropy positions, 84 and 86, were largely occupied by non-polar residues, the most predominant being leucine. Another variable position in the core, 73, was mostly occupied by threonine. Ten core positions were identified in the alcohol dehydrogenase. High residue conservation is seen at nine of the 10 core positions, indicated by a positional entropy value of <3 (Table IIId). These positions were largely occupied by small hydrophobic amino acid residues. The predominant residue at position 152, the only high-entropy position in the alcohol dehydrogenase, was valine. Tables IIIc and d indicate the prevalence of non-polar amino acid residues, including valine, leucine and isoleucine, at the core residue positions for the GroES-like fold proteins. The presence of polar, uncharged residues at the core positions, however, is not uncommon. As reported earlier, valines at the core positions in these proteins are seen to be mutable into isoleucines but not to leucines (Table III) (Taneja and Mande, 1999). Unlike the SH3-like barrel fold proteins, proteins in the GroES-like fold did not show a high correlation between residue volumes in the barrel core.
Comparisons within the PDZ domain-like fold
The two representative structures of the PDZ domain-like fold, interleukin 16, 1i16 and the neuronal nitric oxide synthase, 1qav, superposed well on one another with an r.m.s. deviation of 1.90 Å for 77 atoms. Proteins within the PDZ domain family, when compared among themselves, superposed well on one another with an overall conservation of the ß-barrel (data not shown).
A BLAST search with the amino acid sequence of the neuronal nitric oxide synthase (representative of the PDZ-domain family) gave 230 hits. This initial number first rose to 386 owing to splitting of multi-domain sequences, but a final number of 35 sequences was obtained after editing. The number of sequences in the final alignment obtained from the protein interleukin 16 was <30. This protein and the corresponding family were thus not considered for further analysis.
A total of 16 core positions were identified in the neuronal nitric oxide synthase. Of these, 12 positions show high residue conservation with a positional entropy <3 at each of these positions (Table IIIe). These core positions are predominantly occupied by valine, leucine or isoleucine. Alanine seems to be the residue of choice for the remaining four positions. As in the GroES-like fold, the valines appear to be mutable to isoleucines rather than to leucines (Table IIIe). Core residue positions did not show a significant correlation among residues in proteins considered in this fold.
The final number of sequences in the multiple sequence alignments generated for the representative proteins in the N-terminal domains of the minor coat protein, g3p and Sm motif of small ribonucleoproteins, SNRNP families was <30. Since there were no sequence data for the two families owing to lack of fulfillment of the set criteria for sequence analysis, the two families and hence the corresponding folds were excluded from the study.
Comparisons across the ß-barrel folds
One of the objectives of the study was to identify similarities and dissimilarities across the ß-barrel folds. One representative structure from each of the three ß-barrel folds, viz. the α-spectrin SH3 domain (SH3-like barrel fold), E. coli GroES (GroES-like fold) and the neuronal nitric oxide synthase (PDZ domain-like fold), was therefore considered for the comparisons. The topologies of the three representative structures are different from one another as shown in Figure 1. Comparison of topologies of the SH3-like barrel and GroES-like fold shows the presence of a 3₁₀ helix interrupting the fourth strand in both the ß-barrel folds. The first three strands of the barrel form a similar anti-parallel ß-sheet in the two protein folds, yet the two have distinct topologies. The difference lies in the way in which the fourth ß-strand hydrogen bonds with the other strands forming the barrel. In the case of the SH3-like barrel fold, the fourth strand runs anti-parallel to the third ß-strand followed by the 3₁₀ helix. This short helix juxtaposes the fourth strand to the first, resulting in the formation of the barrel. The 3₁₀ helix in the GroES-like fold, however, juxtaposes the fourth strand to the third, for the formation of the complete barrel. Figure 1 also indicates the topology of the PDZ domain-like fold. This fold consists of two helices, one in the region connecting ß-strands two and three and the other between the third and the fourth strands.
The difference in topology of the three representative proteins thus makes it difficult to superpose the corresponding structures. The presence of a common ß-barrel structural core, however, may allow comparison of the secondary structural elements forming the barrel in these proteins. Hence, ignoring the topology of the three folds, ß-strands of the representative proteins were superimposed on one another. Structural comparisons yielded two alternative ways in which ß-strands of the three proteins could be superimposed on one another with minimal r.m.s. deviation values. In one of the superimpositions, the 3₁₀ helix of the α-spectrin domain superposes very well on that in the E. coli GroES (Figure 3a). The superposition is such that residues of strands 1, 2 and 3 of the α-spectrin domain align with those in strands 3, 2 and 1 of the E. coli GroES, respectively. The r.m.s. deviation data are as shown in Table IVa. In the case of the PDZ domain-like fold, the structural alignment superposes strands 2, 3 and 4 of the neuronal nitric oxide synthase onto strands 1, 2 and 3 of the α-spectrin SH3 domain, respectively. In an alternative superposition, ß-strands of the representative structures align such that strands 4, 1 and 2 of the α-spectrin SH3 domain align with strands 1, 2 and 3 of the E. coli GroES, respectively. The r.m.s. deviation data for this superimposition are given in Table IVb. Interestingly, this alternative superposition superimposes a type II ß-turn present in the three structures (Figure 3b).
The ß-turn, referred to as the diverging turn in the SH3 domain, occurs at the intervening loop connecting strands 1 and 2. The turn has previously been reported to play a role in protein folding (Riddle et al., 1999). The φ, ψ values at this turn correctly place this turn in the type II category (φ(i + 1) = −63°, ψ(i + 1) = 140°; φ(i + 2) = 98.6°, ψ(i + 2) = 8.6°). In the GroES-like fold the turn (φ(i + 1) = −58.8°, ψ(i + 1) = 134.6°; φ(i + 2) = 96.3°, ψ(i + 2) = 104°) is present at the initiation of the third ß-strand following the dome loop. The presence of the type II turn in the GroES-like and PDZ domain-like folds suggests that the turn in these protein folds may play a role similar to that observed in the case of the SH3 domain. We considered this turn to be a crucial folding nucleus in the ß-barrel folds treated in this study. Further analyses were therefore carried out in relation to the superimposition where the ß-turn of all the three representative structures superposed on one another as indicated in Figure 3b.
Residue conservation
In order to assess the variability of amino acid residues across the three ß-barrel folds, positional entropies were compared at the structurally aligned positions of the three folds. Upon alignment of the representative structures in the SH3-like barrel, GroES-like and PDZ domain-like folds, three ß-strands of each structure superimposed well on one another (Figure 3b). Sequences of the representative proteins were then aligned on the basis of the structural alignment. Residues spanning the first ß-strand of the α-spectrin SH3 domain (9–14) aligned with positions 38–43 (strand 2) in the E. coli GroES and with residues 106–111 (strand 2) in the neuronal nitric oxide synthase. Residues 25–32, spanning the diverging type II turn (26–29) of the α-spectrin SH3 domain, aligned with positions 59–66 and 123–130 of the E. coli GroES and neuronal nitric oxide synthase, respectively. Residues 28 and 29 (numbers correspond to the α-spectrin SH3 domain) form the i + 2 and i + 3 positions of the type II turn. Residues spanning strand 4 of the α-spectrin SH3 domain (57–61) structurally aligned with those in the first ß-strand of E. coli GroES (residues 11–15) as also in the neuronal nitric oxide synthase (residues 95–98). Figure 4 shows the positional entropies at the structurally aligned residues. Among these 19 structurally aligned positions, high conservation in each of the three proteins is seen at eight positions. The positional entropies at these positions are <3 in all the three protein sequence alignments. Remarkably, five of these eight highly conserved positions form the core of the barrel in all the three protein structures (Table V). The core positions are largely occupied by small non-polar residues. High conservation at core positions in the protein barrel suggests the importance of core residues in the formation and maintenance of the barrel structure.
Of the remaining three highly conserved residue positions, two correspond to residues in the type II ß-turn mentioned earlier. The i + 2 and i + 3 residue positions of the type II turn (corresponding to residues 28 and 29 in the α-spectrin SH3 domain) show a high residue conservation. The predominant residue at the i + 2 position is glycine and that at i + 3 is aspartate. A high residue conservation at the type II ß-turn has earlier been reported for the GroES-like fold at the corresponding positions 62 and 63 of E. coli GroES. H-bonding between the side-chain carboxylate of aspartate and main chain amide of the first residue of the turn has been suggested to be important in juxtaposing the ß-strands of the barrel, such that the barrel structure is maintained (Taneja and Mande, 1999).
Molecular dynamics simulations
Since the side chain–main chain interaction in the type II ß-turn appears to be important for the barrel structure formation and maintenance, disruption of this interaction should result in the disintegration of the type II turn. This would ultimately result in the loss of the barrel structure. To investigate the stability of the type II ß-turn upon alteration of the aspartate, we performed extensive MD simulations for the GroES peptide and its mutants. The sequence of the peptide taken for MD simulations was VKVGDIV (corresponding to residues 59–65 in the E. coli GroES). The starting conformation for the MD simulations was as observed in the crystal structure of the protein. Additional MD simulations were done with the aspartate mutated to asparagine in one case and to alanine in the other. Any alteration in the conformation in the type II turn would immediately be evident from changes in values of the dihedral angles in the type II turn.
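Monitoring the turn in a trajectory comes down to computing backbone dihedral angles and inter-atomic distances from atomic coordinates in each frame. The sketch below shows one standard way of doing this with plain tuples of (x, y, z) coordinates; it is not the InsightII/Discover implementation, and the function names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    For phi of residue i pass C(i-1), N(i), CA(i), C(i);
    for psi of residue i pass N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def distance(a, b):
    """Euclidean distance between two points, e.g. two C-beta atoms."""
    return math.dist(a, b)
```

Applying `dihedral` to the backbone atoms of the i + 1 and i + 2 residues in each saved frame gives the φ and ψ time series, and `distance` on the Cß atoms of residues i and i + 3 gives a simple measure of whether the turn remains closed.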
A comparison of the φ and ψ values was performed for the different residue positions in the ß-turn during the 500 ps simulation. The φ and ψ values of the i + 1 residue remain more or less similar in all the three peptides (data not shown). However, major deviations occur in the dihedral angles of the i + 2 residue, glycine, when the i + 3 residue is mutated from aspartate to asparagine or alanine. While in the native peptide the φ and ψ values at the i + 2 position fluctuate around +100° and −40°, respectively, φ(i + 2) changes to about +150° in the mutant peptides. The ψ(i + 2) value also drops from −2° to about −100° for both the mutant forms. This, however, occurs after an initial sudden rise of ψ(i + 2) from −2° to +70°. A large deviation is also seen at the fourth residue position in the ß-turn in the mutant peptide. In the case of the aspartate to asparagine mutation, the backbone dihedral angle at i + 3 drops from about −60° to −141° while it remains stabilized in the native peptide. Comparison of the distance between Cß atoms of the i and i + 3 residues further corroborates the disintegration of the type II turn upon mutation of the fourth residue in the turn (Figure 5). While this distance is maintained at around 5.6 Å in the native peptide, it increases to about 8 Å in the aspartate to asparagine mutant and to about 10 Å in the aspartate to alanine mutant peptide. These results indicate the importance of the side chain–main chain H-bond interaction in the type II turn maintenance. Alteration of aspartate to asparagine is hence sufficient to cause the disruption of the type II turn conformation. A high conservation of the turn, as also the residues in the turn, thus might be of evolutionary importance in maintaining the structure of the ß-barrel.
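Whether a set of dihedral angles still describes a type II turn can be checked against the commonly quoted ideal values for that turn type, φ(i + 1) = −60°, ψ(i + 1) = 120°, φ(i + 2) = 80°, ψ(i + 2) = 0°. The sketch below is an illustration only: the tolerance rule (each angle within 30° of ideal, with one angle allowed up to 45°) is an assumed convention rather than the criterion stated by the authors, and the function names are hypothetical.

```python
# Ideal type II values: phi(i+1), psi(i+1), phi(i+2), psi(i+2), in degrees.
IDEAL_TYPE_II = (-60.0, 120.0, 80.0, 0.0)

def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def is_type_ii(phi1, psi1, phi2, psi2, tol=30.0, loose_tol=45.0):
    """True if the four angles match the ideal type II turn within tolerance.

    All angles must lie within `tol` of ideal, except that at most one
    angle may deviate by up to `loose_tol` (assumed convention).
    """
    observed = (phi1, psi1, phi2, psi2)
    devs = [angle_diff(o, i) for o, i in zip(observed, IDEAL_TYPE_II)]
    over = [d for d in devs if d > tol]
    return not over or (len(over) == 1 and over[0] <= loose_tol)
```

Reading the SH3 diverging-turn angles as (−63°, 140°, 98.6°, 8.6°), `is_type_ii` returns True. Replacing the i + 2 angles with values of the kind reported for the mutant peptides, about +150° and −100° (signs assumed here), makes it return False.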
| Discussion |
|---|
Within various protein families such as serine proteases, cysteine proteases and globins, the three-dimensional structure is remarkably similar despite considerable variations in the amino acid sequences. To a certain extent, conserved residues or conservative changes account for the structural conservation. In addition, correlated pairs of residues have an important role in stabilizing the protein structure. Determination of these conserved features along with the compensatory substitution patterns helps in increasing our understanding of features that may determine the three-dimensional structure of a protein.
In this study, we have attempted to identify folding determinants in the small ß-barrel proteins. Conservation patterns across these ß-barrel folds reveal interesting similarities of residues at the core of the protein barrels. Irrespective of the topologies, these proteins show a high conservation of small non-polar amino acid residues at the core positions. The core residue positions are predominantly occupied by valine, leucine and isoleucine. Interestingly, valines at the core positions are seen to be mutable into isoleucine and not leucine, an observation reported earlier (Taneja and Mande, 1999). The higher frequency of substitution of isoleucine by valine has been attributed to a higher ß-sheet propensity of isoleucine and valine than of leucine (Wilmot and Thornton, 1988). Branching of side chains at the Cß positions in both valine and isoleucine, but not in leucine, has previously been suggested as a possible reason for such a mutation pattern (Taneja and Mande, 1999). The observed mutation pattern and a high conservation of non-polar side chains suggest that the overall hydrophobic pattern of amino acids may drive the protein sequence to collapse into the ß-barrel conformation.
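Conservation at individual alignment positions, such as the core positions discussed above, can be quantified with a positional entropy. The sketch below assumes plain Shannon entropy over the observed residue frequencies in each column, with gaps ignored; the exact formulation used in the study is not restated in this section and may differ. The function names and the three-sequence toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    counts = Counter(res for res in column if res != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p); 0 for a fully conserved column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def alignment_entropies(sequences):
    """Per-column entropies for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]

if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    toy = ["KVGDIV", "RVGDLV", "KIGDTV"]
    for pos, h in enumerate(alignment_entropies(toy), start=1):
        print(pos, round(h, 3))
```

Columns with entropy close to zero are the highly conserved ones; in the toy alignment these are the glycine and aspartate columns.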
Correlation analysis of the SH3-like barrel fold suggests that the total core volume is maintained within the SH3 domain family of proteins. Amino acid substitutions resulting in an increase or a decrease in the volume of the core are compensated by replacement of another amino acid residue. This residue is present at a position that might be distant in sequence, but near in space to the mutated residue, so as to conserve the total volume of the core and hence the overall folded structure. Interestingly, an earlier analysis of 266 SH3 sequences did not find evidence for correlated substitutions (Larson and Davidson, 2000). We suggest that our criterion of choosing sequence identities between 25% and 90% generates a more accurate multiple alignment for a robust statistical analysis. The accuracy of the alignment is reflected in the observation of the covarying mutations.
Of the common features, the presence of a type II ß-turn is the most intriguing. This turn seems to be important in the formation of the ß-barrel. Earlier studies have reported the importance of the ß-turn in the SH3 domain (Riddle et al., 1999; Larson and Davidson, 2000). Conservation of this turn, not only in proteins constituting one of the ß-barrel folds but also across the various ß-barrel folds considered in this study, suggests that this region might be an important nucleation site in the folding pathway of the ß-barrel proteins (Riddle et al., 1999; Larson and Davidson, 2000). This nucleation appears to be guided by the residues present within the turn. High residue conservation has been seen at the i + 2 and i + 3 residue positions. While the presence of glycine at i + 2 guides the protein sequence into a turn, aspartate at i + 3 is important for a unique side chain–main chain interaction. Our simulation studies corroborate the conclusions of independent work carried out on an SH3 peptide (Krueger and Kollman, 2001). Furthermore, our analysis suggests that alteration of aspartate to asparagine or alanine destabilizes the ß-turn conformation. The glycyl-aspartyl dipeptide hence appears to be a major factor in helping maintain the integrity of the barrel.
Our study shows interesting similarities among proteins in the different ß-barrel folds. Despite large differences in sequence and function, the occurrence of a conserved glycyl-aspartyl dipeptide, intriguingly at a conserved type II turn, suggests the importance of the turn and the residues forming it in the formation of the ß-barrel. In addition, a conserved hydrophobic core suggests its role in maintenance of the barrel structure. Further studies such as site-directed mutagenesis should confirm the importance of these conserved features in the formation and maintenance of protein structure.
| Notes |
|---|
1 Present address: Northwestern University, Chicago, IL, USA
2 To whom correspondence should be addressed. E-mail: shekhar{at}cdfd.org.in
| Acknowledgments |
|---|
We thank Debasis Mohanty and Sharmila Mande for useful comments and suggestions. Coordinates for Thermoplasma acidophilum glucose dehydrogenase were kindly supplied by Garry Taylor. B.T. and R.Q. are CSIR Senior and Junior Research Fellows, respectively. Financial support for the work was provided by the Department of Biotechnology and by the Council of Scientific and Industrial Research.
| References |
|---|
Altschuh,D., Vernet,T., Berti,P., Moras,D. and Nagai,K. (1988) Protein Eng., 2, 193–199.
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
Anfinsen,C.B. (1973) Science, 181, 223–230.
Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Bowie,J.U., Reidhaar-Olson,J.F., Lim,W.A. and Sauer,R.T. (1990) Science, 247, 1306–1310.
Eisenberg,D., Schwarz,E., Komaromy,M. and Wall,R. (1984) J. Mol. Biol., 179, 125–142.
Harpaz,Y., Gerstein,M. and Chothia,C. (1994) Structure, 2, 641–649.
Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) J. Mol. Biol., 220, 507–530.
Jones,T.A., Zou,J.Y., Cowan,S.W. and Kjeldgaard,M. (1991) Acta Crystallogr., A47, 110–119.
Kamtekar,S., Schiffer,J.M., Xiong,H., Babik,J.M. and Hecht,M.H. (1993) Science, 262, 1680–1685.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Krueger,B.P. and Kollman,P.A. (2001) Proteins: Struct. Funct. Genet., 45, 4–15.
Larson,S.M. and Davidson,A.R. (2000) Protein Sci., 9, 2170–2180.
Mandel-Gutfreund,Y., Zaremba,S.M. and Gregoret,L.M. (2001) J. Mol. Biol., 305, 1145–1159.
Martinez,J.C. and Serrano,L. (1999) Nature Struct. Biol., 6, 1010–1016.
Merkel,J.S., Sturtevant,J.M. and Regan,L. (1999) Structure, 7, 1333–1343.
Murzin,A.G., Lesk,A.M. and Chothia,C. (1994) J. Mol. Biol., 236, 1382–1400.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Pollock,D.D. and Taylor,W.R. (1997) Protein Eng., 10, 647–657.
Riddle,D.S., Grantcharova,V.P., Santiago,J.V., Alm,E., Ruczinski,I. and Baker,D. (1999) Nature Struct. Biol., 6, 1016–1024.
Serrano,L. (2000) Adv. Protein Chem., 53, 49–85.
Shenkin,P.S., Erman,B. and Mastrendrea,L.D. (1991) Proteins, 11, 297–313.
Shindyalov,I.N., Kolchanov,N.A. and Sander,C. (1994) Protein Eng., 7, 349–358.
Taneja,B. and Mande,S.C. (1999) Protein Eng., 12, 815–818.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.
Wilmot,C.M. and Thornton,J.M. (1988) J. Mol. Biol., 203, 221–232.
Yi,Q., Bystroff,C., Rajagopal,P., Klevit,R.E. and Baker,D. (1998) J. Mol. Biol., 283, 293–300.
Received April 26, 2002; revised October 3, 2002; accepted October 10, 2002.