

PEDS Advance Access originally published online on March 24, 2006
Protein Engineering Design and Selection 2006 19(6):265-275; doi:10.1093/protein/gzl009

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

A simple approach for protein structure discrimination based on the network pattern of conserved hydrophobic residues

Usha K. Muppirala¹ and Zhijun Li¹,²,³

¹Bioinformatics Program, University of the Sciences in Philadelphia, Philadelphia, PA 19104, USA; ²Department of Chemistry & Biochemistry, University of the Sciences in Philadelphia, Philadelphia, PA 19104, USA

³To whom correspondence should be addressed. E-mail: z.li{at}usip.edu


    Abstract
 
Evolutionarily conserved hydrophobic residues at the core of protein structures are generally assumed to play a structural role in protein folding and stability. Recent studies have suggested that their importance to protein structure is uneven: a few of them are crucial and the rest are secondary. In this work, we explored whether this feature of native structures can be used to discriminate non-native structures from native ones. First, we developed a network tool to quantitatively measure the structural contributions of individual amino acid residues. We systematically applied this method to native proteins from diverse fold-types and confirmed that it captures the essential structural features of native proteins. Next, we applied it to a number of decoy sets of proteins. The results indicate that this approach identified the non-native structures in most test cases. This finding should aid the investigation of the fundamental problem of protein structure prediction.

Keywords: connectivity pattern/conserved hydrophobic residue/hub-residue/network/protein structure


    Introduction
 
De novo protein structure prediction faces two fundamental challenges: the development of effective methods for conformation sampling and the development of an accurate function for structure discrimination (Bradley et al., 2005). Among the various approaches to these problems, the evolution-based approach has attracted intense interest over the years (Gobel et al., 1994; Casari et al., 1995; Lichtarge et al., 1996; Landgraf et al., 1999; Simon et al., 2002; Valdar, 2002; Gloor et al., 2005). This approach seeks to understand the functional roles of evolutionarily conserved amino acid residues, including identical residues and conserved substitutions, and subsequently to apply them to structure prediction (Shindyalov et al., 1994; Livingstone and Barton, 1996; Ortiz et al., 1999; Larson et al., 2000; Mihalek et al., 2003; Taylor et al., 2003). The elucidation of the functional roles of conserved residues is generally based on the classical physical model of protein structures, which divides a structure into the surface, including functional sites, and the interior core. The packing at the interior core of a globular protein is regarded as homogeneous and tight, dominated by densely interacting hydrophobic residues (Richards and Lim, 1993; Gerstein and Chothia, 1996). To study the role of conserved residues, one can follow whether they occur on the protein surface, in the interior of the protein or within a functional site (Figure 1A).


Figure 1
 
Fig. 1. Comparison of the three-dimensional structure and the two-dimensional network of the PDZ domain of protein PSD-95 (PDB ID: 1KWA). (A) The three-dimensional structure is depicted with the conserved residue I53 highlighted in yellow. (B) The two-dimensional network with the hub-residue I53 shaded is shown. The image for (A) was prepared using DINO (DINO: Visualizing Structural Biology 2002. http://www.dino3d.org).

 
What such a model lacks, however, is a more detailed description of the structural contributions of individual amino acid residues and of their interaction patterns. In the traditional evolutionary approach, the contributions of conserved residues, particularly conserved hydrophobic residues clustered together in space, are normally treated as equal (Yao et al., 2003). The insight gained through this approach is typically only descriptive, and as a result, implementing such knowledge in protein structure prediction is not straightforward (Mihalek et al., 2003; Taylor et al., 2003).

Recently, two new models have been put forward that provide in-depth insight into protein structure packing (Greene and Higman, 2003; Socolich et al., 2005). The first model, based on a network view of protein structures, revealed a scale-free network underlying protein structures (Greene and Higman, 2003). A scale-free network is one in which many nodes have few edges and a few nodes have many edges (Barabasi and Albert, 1999). The most connected nodes are crucial to the stability of the network (Albert et al., 2000). The second model, based on statistical coupling analysis, suggested heterogeneous packing at protein cores, with a few residues strongly coupled to each other and surrounded by many residues forming weaker interactions (Socolich et al., 2005). Although the two models use different approaches, both point to a common and unexpected feature of protein packing: proteins rely heavily on only a few members of the set of conserved residues. This observation prompted our hypothesis that native protein structures should typically display this characteristic, whereas non-native structures display it to a lesser extent or not at all. If so, it should be possible to use it to discriminate native structures from non-native ones.

We tested this hypothesis in two stages. In the first stage, we developed a computational method to quantitatively measure the individual contributions of all amino acid residues in a protein structure, so that the characteristic described above could be defined quantitatively. The method is similar to existing network analysis tools: a protein structure is transformed into a network by representing each amino acid residue as a node and each inter-residue contact as an edge (connectivity) linking two nodes (Greene and Higman, 2003; Gupta et al., 2005) (Figure 1B). For our purpose, we defined a network edge by one of four types of inter-residue interaction: hydrophobic interaction, hydrogen bond, ionic bond or disulfide bond. In this way, the contribution of a single residue is simply the number of edges connected to it. Using this approach, we performed a large-scale analysis of native proteins from 13 diverse fold-types and 26 superfamilies to verify that the method captures the essential structural features of native proteins and differentiates individual amino acid contributions semi-quantitatively. In the second stage, we applied the same approach to 18 decoy sets comprising 94 structures. As expected, in most cases the non-native structures displayed a connectivity distribution with a smaller maximum degree of connectivity, fewer residues with the maximum degree of connectivity, or both, compared with the native structures.


    Materials and methods
 
Native protein structures

The selection of native protein structures for this study was based on the structural classification in the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop, release 1.67, February 2005). First, within the four protein classes of this database (all-α, all-β, α/β and α + β), we took the 12 most highly populated fold-types (each with more than 10 superfamilies) and one additional, randomly selected fold-type (with one superfamily), and discarded structures that did not meet the following criteria: (i) the structure was determined by X-ray crystallography at a resolution of 3.0 Å or better, to rule out potential effects of the structure determination method itself (Garbuzynskiy et al., 2005); (ii) the structure does not contain modified residues, as indicated in the database; (iii) the structure was not determined as part of a complex, as indicated in the database. If more than one structure within a single superfamily satisfied these criteria, only one was taken as the representative member, because members of a SCOP superfamily are generally assumed to be homologous (Murzin et al., 1995). This gave a total of 129 structures for the initial dataset. Next, these structures were inspected visually. Protein domains with cofactors, with missing non-terminal residues, or judged to be over-extended without a well-packed core were discarded. The final trimmed dataset contained 89 structures, representing 26 superfamilies.
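The screening steps above amount to a filter followed by a one-per-superfamily deduplication. The Python sketch below is illustrative only: the `Entry` record and its field names are hypothetical (SCOP and PDB metadata would have to be parsed into it separately), and keeping the first passing structure in each superfamily is an assumption, since the text does not say how the representative was chosen.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical record for one SCOP domain; field names are illustrative."""
    pdb_id: str
    superfamily: str
    method: str                  # experimental method, e.g. "X-RAY"
    resolution: float            # in Å (not checked for non-X-ray entries)
    has_modified_residues: bool
    in_complex: bool


def select_representatives(entries, max_resolution=3.0):
    """Apply criteria (i)-(iii), then keep one structure per superfamily."""
    chosen = {}
    for e in entries:
        if e.method != "X-RAY" or e.resolution > max_resolution:
            continue  # (i) X-ray structure at 3.0 Å or better
        if e.has_modified_residues:
            continue  # (ii) no modified residues
        if e.in_complex:
            continue  # (iii) not determined as part of a complex
        # Assumption: the first passing entry represents its superfamily.
        chosen.setdefault(e.superfamily, e)
    return list(chosen.values())
```

The visual inspection step (cofactors, missing non-terminal residues, over-extended domains) is not automated here.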

Sequence alignment

Homologous sequences of the protein structures in the dataset were obtained by searching the SwissProt databank (Apweiler et al., 2004) with BLAST (Altschul et al., 1990) and accepting all hits with an E-value < 0.01 and a sequence identity of at least 50%, to ensure the accuracy of the sequence alignments (Abagyan and Batalov, 1997; Sauder et al., 2000). Multiple sequence alignments were generated with Clustal W using the full alignment option and the default parameter settings of the EBI website (Chenna et al., 2003). The generated alignments were used for further analysis without any adjustment.
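The homolog acceptance rule can be written as a simple filter. In this sketch, `Hit` is a hypothetical container for already-parsed BLAST results (it is not a BLAST output format), and identity is assumed to be expressed in percent.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical parsed BLAST hit."""
    accession: str
    evalue: float
    identity: float  # percent identity, 0-100


def accept_homologs(hits, max_evalue=0.01, min_identity=50.0):
    """Keep hits with E-value < 0.01 and sequence identity of at least 50%."""
    return [h for h in hits if h.evalue < max_evalue and h.identity >= min_identity]
```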

Definition of conservation

Identical positions and conserved substitutions were automatically identified in Clustal W and were accepted as conserved residues without further changes. In Clustal W, identical positions are marked with a ‘*’ symbol and conserved substitutions with a ‘:’ symbol. These positions were then further classified as conserved hydrophobic positions if and only if they involved the hydrophobic residues W, F, L, I, V, A or M.
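A sketch of reading conserved positions from a Clustal W conservation line follows ('*' for identical positions, ':' for conserved substitutions). How "involved" should be read is an assumption: here a conserved column counts as hydrophobic only if every non-gap residue in it is one of W, F, L, I, V, A or M. The function names are hypothetical.

```python
HYDROPHOBIC = set("WFLIVAM")


def conserved_columns(conservation_line):
    """0-based alignment columns marked '*' (identical) or ':' (conserved substitution)."""
    return [i for i, c in enumerate(conservation_line) if c in "*:"]


def conserved_hydrophobic_columns(alignment, conservation_line):
    """Conserved columns whose non-gap residues are all hydrophobic (assumed reading)."""
    result = []
    for i in conserved_columns(conservation_line):
        residues = {seq[i] for seq in alignment if seq[i] != "-"}
        if residues and residues <= HYDROPHOBIC:
            result.append(i)
    return result
```

For the three aligned sequences `MKLVE-A`, `MRLIEGA` and `MKLVDGS` with conservation line `*:*:: .`, columns 0 to 4 are conserved, and columns 0 (M), 2 (L) and 3 (V/I) are conserved hydrophobic.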

Refinement of structure dataset

The accuracy of identifying conserved residues from a multiple sequence alignment may depend on the sequences being aligned. Both the number of homologous sequences in the alignment and the extent of their evolutionary divergence play a role, so appropriate thresholds must be set. We only examined protein structures in the dataset that had at least five homologous sequences and fewer than 60% conserved positions in the considered alignment range (Supplementary Material I is available at PEDS online). Here, the considered alignment range was determined by the exact sequence of the protein structure as defined in SCOP. Applying these thresholds to the dataset left 26 unique structures, representing 13 fold-types and 26 superfamilies across all four protein classes. Among them, 20 structures have more than 10 sequences in their alignment and 21 structures have fewer than 50% conserved positions in their considered alignment range. As SCOP classifies protein fold-types by structural similarity, this refined dataset represents a diverse three-dimensional conformation space. The remaining proteins in the previous dataset had either only one or two homologous sequences or a significantly higher percentage of conserved positions.
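The two refinement thresholds can be expressed as one predicate. This is a minimal sketch under stated assumptions: the caller has already cut the Clustal W conservation line down to the considered alignment range (the SCOP domain sequence), and `n_homologs` counts homologous sequences excluding the structure's own; the function name is hypothetical.

```python
def passes_refinement(n_homologs, conservation_line,
                      min_homologs=5, max_conserved_fraction=0.60):
    """At least five homologs and fewer than 60% conserved ('*' or ':') positions."""
    conserved = sum(1 for c in conservation_line if c in "*:")
    fraction = conserved / len(conservation_line)
    return n_homologs >= min_homologs and fraction < max_conserved_fraction
```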

Derivation of networks

To transform a protein structure into a network, each residue was treated as a node. Two nodes may be connected by an edge, defined by one of four types of inter-residue interaction: hydrophobic interaction, hydrogen bond, ionic bond or disulfide bond. Hydrogen bonds required a distance cut-off of 3.1 or 3.2 Å between the electronegative heavy atoms and a geometric criterion that the donor-to-acceptor angle lie between 120° and 180° (Stickle et al., 1992); hydrophobic interactions and ionic bonds were based only on proximity, with a default cut-off of 4.5 Å for both (Rarey et al., 1996); and disulfide bonds were considered to exist between explicitly bonded sulfur pairs or non-bonded sulfur pairs within a distance cut-off of 2.5 Å. These interactions were determined using the Protein Contacts function in MOE (Chemical Computing Group Inc., version 2004.03).
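The edge definitions can be sketched as a pairwise classifier over atoms. This is not MOE's Protein Contacts implementation. The `Atom` record, its `kind` labels and the one-role-per-atom simplification are assumptions; 3.2 Å is taken as the hydrogen-bond cut-off; the angle is measured at the hydrogen (donor-H···acceptor); and explicitly bonded S-S pairs are covered only because their bond length falls under the 2.5 Å test. Several contacts between the same two residues collapse into a single edge, which is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

HBOND_MAX = 3.2          # donor-acceptor heavy-atom distance, Å
HBOND_MIN_ANGLE = 120.0  # degrees; acos never exceeds 180°
CONTACT_MAX = 4.5        # hydrophobic and ionic cut-off, Å
SS_MAX = 2.5             # sulfur-sulfur cut-off, Å


@dataclass
class Atom:
    """Hypothetical atom record with a single interaction role."""
    residue: int
    xyz: Vec
    kind: str                       # "hydrophobic", "cation", "anion", "donor", "acceptor" or "sulfur"
    hydrogen: Optional[Vec] = None  # attached H position, donors only


def angle_deg(a: Vec, vertex: Vec, c: Vec) -> float:
    """Angle a-vertex-c in degrees."""
    u = [p - q for p, q in zip(a, vertex)]
    v = [p - q for p, q in zip(c, vertex)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.dist(a, vertex) * math.dist(c, vertex))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def interaction(a: Atom, b: Atom) -> Optional[str]:
    """Classify an atom pair as one of the four edge types, or None."""
    d = math.dist(a.xyz, b.xyz)
    kinds = {a.kind, b.kind}
    if kinds == {"sulfur"} and d <= SS_MAX:
        return "disulfide"
    if kinds == {"hydrophobic"} and d <= CONTACT_MAX:
        return "hydrophobic"
    if kinds == {"cation", "anion"} and d <= CONTACT_MAX:
        return "ionic"
    if kinds == {"donor", "acceptor"} and d <= HBOND_MAX:
        donor, acceptor = (a, b) if a.kind == "donor" else (b, a)
        if donor.hydrogen is not None:
            # Angle measured at the hydrogen: donor-H...acceptor.
            if angle_deg(donor.xyz, donor.hydrogen, acceptor.xyz) >= HBOND_MIN_ANGLE:
                return "hbond"
    return None


def residue_edges(atoms):
    """Undirected residue-residue edges; repeated contacts between a pair give one edge."""
    edges = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a.residue != b.residue and interaction(a, b) is not None:
                edges.add((min(a.residue, b.residue), max(a.residue, b.residue)))
    return edges
```

The all-pairs loop is quadratic in the number of atoms; a real pipeline would typically use a spatial grid or k-d tree to find neighbors first.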

Protein test sets

The decoy sets were selected from both the single and the multiple decoy sets of the Decoys ‘R’ Us database (Park and Levitt, 1996; Simons et al., 1997; Samudrala et al., 1999; Xia et al., 2000). These decoy conformations were generated computationally; they possess some characteristics of native protein structures but were not experimentally determined. The primary purpose of decoy databases is to test scoring functions, such as energy functions, developed for protein structure discrimination, and the Decoys ‘R’ Us database is widely used for this purpose. The multiple decoy sets were taken from those listed under the names 4state_reduced, fisa, fisa_casp3, hg-structal, lattice_ssfit and lmds. Only small-rmsd decoys in these multiple decoy sets, with a Cα rmsd of no more than 1.5 Å from the native structure, were studied. This gave a total of 17 families and 93 structures.

Computational analysis

Statistical analysis was performed using Perl scripts developed in our lab, running on a Dell Precision 670 workstation. The degree of a node (residue) was defined as the number of edges (connectivities) emerging from it. The average degree of connectivity of a group of considered nodes (e.g. conserved nodes) was calculated by dividing the sum of the degrees of all considered nodes in the network by the number of considered nodes.
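The degree and average-degree definitions translate directly into code. The original analysis used Perl scripts; the Python below is only a sketch with hypothetical function names, assuming edges are undirected residue pairs with each pair counted once. For a considered node set S with degrees k_i, the average degree is (1/|S|) Σ_{i∈S} k_i.

```python
from collections import Counter


def node_degrees(edges, residues):
    """Degree of every residue; residues without any edge get degree 0."""
    counts = Counter()
    for i, j in edges:
        counts[i] += 1
        counts[j] += 1
    return {r: counts[r] for r in residues}


def average_degree(deg, nodes=None):
    """Sum of degrees over the considered nodes divided by their number."""
    nodes = list(deg) if nodes is None else list(nodes)
    return sum(deg[n] for n in nodes) / len(nodes)
```

Passing only the conserved (or conserved hydrophobic) residues as `nodes` gives the group averages compared in the Results.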

Two sets of criteria were used to score each decoy. The first set comprised the range of the average degree of connectivity per node for all nodes (0.60–1.62) and the increased contribution from conserved hydrophobic residues, both derived from native protein structures. The second set comprised the maximum degree of connectivity in the structure and the number of residues with that maximum degree. To apply the second set to a decoy structure, its maximum degree of connectivity was compared with that of the native structure: if it was lower, the decoy scored worse than the native structure; if it was higher, the decoy scored better; and if it was the same, the numbers of residues with that maximum degree were compared, with the higher number scoring higher. If these numbers were also the same, the comparison was repeated for the second-highest degree of connectivity and the number of residues with that degree, and so on, until the degree of connectivity being considered fell below three.
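The second set of criteria is essentially a lexicographic comparison of (degree, count) pairs, taken from the highest degree down to three. The sketch below assumes node degrees are available as a dict; the function names are hypothetical, and treating the structure whose list of hub degrees runs out first as the worse one is our reading of "continue until the degree being considered was less than three".

```python
from collections import Counter


def hub_profile(deg, min_degree=3):
    """(degree, count) pairs for degrees >= min_degree, highest degree first."""
    counts = Counter(d for d in deg.values() if d >= min_degree)
    return sorted(counts.items(), reverse=True)


def compare_to_native(decoy_deg, native_deg, min_degree=3):
    """Return -1 if the decoy scores worse than the native, 1 if better, 0 if tied."""
    decoy = hub_profile(decoy_deg, min_degree)
    native = hub_profile(native_deg, min_degree)
    for (d_degree, d_count), (n_degree, n_count) in zip(decoy, native):
        if d_degree != n_degree:
            return 1 if d_degree > n_degree else -1
        if d_count != n_count:
            return 1 if d_count > n_count else -1
    # One profile ran out first: its next degree would already be below three.
    if len(decoy) != len(native):
        return 1 if len(decoy) > len(native) else -1
    return 0
```

For example, a decoy whose highest degree is 4 scores worse than a native structure with a residue of degree 6, whatever the lower degrees look like.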


    Results
 
To test our hypothesis that proteins rely heavily on only a subset of conserved residues, our computational procedure comprised four steps: (i) transform an individual structure into a network; (ii) perform a multiple sequence alignment with homologous sequences and identify conserved and conserved hydrophobic amino acid residues; (iii) map those residues onto the corresponding network; and (iv) examine the network patterns of the conserved hydrophobic residues.
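Step (iii) requires translating alignment columns into residue numbers of the structure, i.e. into network nodes. A minimal sketch, assuming the structure's own sequence is one row of the alignment (`aligned_query`, with '-' for gaps) and that its residues are numbered consecutively from `first_residue`; both names are hypothetical.

```python
def column_to_residue(aligned_query, first_residue=1):
    """Map each non-gap alignment column to the residue number in the structure."""
    mapping = {}
    residue = first_residue
    for col, ch in enumerate(aligned_query):
        if ch != "-":
            mapping[col] = residue
            residue += 1
    return mapping


def map_conserved(aligned_query, conserved_columns, first_residue=1):
    """Residue numbers (network nodes) at conserved columns; gap columns are skipped."""
    mapping = column_to_residue(aligned_query, first_residue)
    return sorted(mapping[c] for c in conserved_columns if c in mapping)
```

Real PDB numbering can contain insertion codes or gaps in the coordinates, which this sketch ignores.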

Average degree of connectivity per node

The average degree of connectivity per node indicates the density of the connectivity. Comparing this value for all nodes, conserved nodes and conserved hydrophobic nodes in every structure of the refined dataset reveals a clear trend. For every fold, the average degree increases in the order: all nodes (0.60–1.62) < conserved nodes (0.70–2.67) < conserved hydrophobic nodes (1.33–5.27) (Figure 2). This trend suggests an increased contribution to the protein fold from conserved residues, particularly conserved hydrophobic residues, consistent with the widely accepted view that hydrophobic interaction is the major driving force for protein folding.


Figure 2
 
Fig. 2. Average degree of connectivity per node among considered nodes for each structure in the refined dataset. Open squares, all nodes; Filled circles, conserved nodes; Filled triangles, conserved hydrophobic nodes. The PDB ID for structures 1–26 is: (1) 1GRJ; (2) 1SFE; (3) 1EI7; (4) 1AUE; (5) 1B0X; (6) 1CMW; (7) 1QC7; (8) 1BOY; (9) 1F00; (10) 1MHN; (11) 2EIF; (12) 1BPO; (13) 2ALR; (14) 1KTN; (15) 6XIA; (16) 1CQJ; (17) 1VI5; (18) 1CS0; (19) 1UBI; (20) 1JAL; (21) 2PII; (22) 1L3K; (23) 1ELO; (24) 1B7Y; (25) 1FVA; (26) 1KWA.

 
Connectivity pattern of conserved hydrophobic residues

For every fold in the dataset, the degree distribution over all nodes displays the pattern of a scale-free network (Greene and Higman, 2003), in which a few residues have a large degree of connectivity while the majority have a small degree (Figure 3). In contrast, the connectivity pattern for conserved nodes, and particularly for conserved hydrophobic nodes, is quite different: it shows either a uniform or a normal-like distribution from the lowest to the highest degree (Figure 3). This pattern indicates unequal contributions by conserved hydrophobic residues to a protein structure, as measured by their degree of connectivity.
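The connectivity patterns compared here are degree distributions computed over different node groups. In the sketch below, the function name is hypothetical, and normalising counts to fractions is an assumption (the text does not say whether Figure 3 plots fractions or raw counts); fractions make groups of different size easier to overlay.

```python
from collections import Counter


def degree_distribution(deg, nodes=None):
    """Fraction of the considered nodes at each degree, lowest degree first."""
    nodes = list(deg) if nodes is None else list(nodes)
    counts = Counter(deg[n] for n in nodes)
    return {d: counts[d] / len(nodes) for d in sorted(counts)}
```

Calling it once with all residues and once with only the conserved hydrophobic residues gives the two kinds of curves contrasted above.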


Figure 3
 
Fig. 3. Connectivity patterns for all considered nodes of each fold in the refined dataset. Filled circles, all nodes; Filled diamonds, conserved nodes; Filled squares, conserved hydrophobic nodes.

 
Characterization of hub-nodes

An important concept related to scale-free networks is the hub-node, a node with a significantly higher degree of connectivity than the others. In a scale-free network, hub-nodes are crucial to the stability of the network (Albert et al., 2000), so they merit closer examination. Although the degree of a hub-node is assumed to be much higher than the average degree in the network, there is generally no quantitative threshold that can be adopted directly. In this study, we defined a hub-node (called a hub-residue in protein structure networks) as a residue with a degree of connectivity of at least three, for two reasons. First, for all native protein structures examined, the maximum degree of connectivity ranged from three to eight, so three is the lowest maximum observed. Second, as the average degree of all nodes for the structures in the refined dataset ranged from 0.60 to 1.62, a degree of three or higher is well above that range. Thus, a cut-off of three is consistent with the general definition of hub-nodes (Albert et al., 2000).
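With the cut-off fixed at three, hub-residues and the share of them that are conserved hydrophobic residues can be tallied as follows. This is a sketch: the function names are hypothetical, and the set of conserved hydrophobic residue numbers is assumed to come from the alignment step.

```python
def hub_residues(deg, min_degree=3):
    """Residues whose degree of connectivity is at least min_degree."""
    return sorted(r for r, d in deg.items() if d >= min_degree)


def conserved_hydrophobic_share(deg, conserved_hydrophobic, min_degree=3):
    """Fraction of hub positions (degree >= min_degree) that are conserved hydrophobic."""
    hubs = hub_residues(deg, min_degree)
    if not hubs:
        return 0.0
    return sum(1 for r in hubs if r in conserved_hydrophobic) / len(hubs)
```

Raising `min_degree` from 3 to 6 gives the kind of high-degree comparison reported for the native dataset.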

Examination of the hub-residues in the native structure dataset led to two important observations. First, the higher the degree of a hub-residue in a network, the more likely that position is occupied by a conserved hydrophobic amino acid residue. This is shown by the convergence of the three distribution curves for all nodes, conserved nodes and conserved hydrophobic nodes at the high-degree end (Figure 3). Of the 628 positions with a degree of connectivity of three or higher, 53% were conserved hydrophobic residues; of the 58 positions with a degree of six or higher, 66% were. Second, four hydrophobic amino acids, Ile, Leu, Phe and Val, accounted for 71% of the 628 hub-residue positions (Figure 4A). After weighting for amino acid abundance (Barnes and Gray, 2003), six hydrophobic amino acids, Ile, Leu, Phe, Val, Trp and Met, were the most frequent hub-residues, accounting for 80% of the 628 positions (Figure 4B).
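The text does not give the exact weighting formula, so the sketch below assumes one common choice: divide each amino acid's hub count by its background abundance and renormalise to percentages. The function name is hypothetical, and any background frequencies passed in would have to come from the cited abundance data; the values in a toy call are placeholders only.

```python
def abundance_weighted_percent(hub_counts, background):
    """Assumed weighting: hub count / background abundance, renormalised to percent."""
    weighted = {aa: n / background[aa] for aa, n in hub_counts.items()}
    total = sum(weighted.values())
    return {aa: 100.0 * w / total for aa, w in weighted.items()}
```

Under this scheme a rare residue such as Trp can rank higher after weighting than its raw count suggests, which is the kind of shift seen between Figure 4A and 4B.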


Figure 4
 
Fig. 4. (A) Percentage of individual amino acids serving as the hub-residue in the refined dataset. (B) Percentage of individual amino acids serving as the hub-residue in the refined dataset, adjusted for amino acid abundance.

 
An example: Fab R19.9 protein

The studies above demonstrate that the method developed here captures the essential structural features of native proteins, as expected (Richards and Lim, 1993; Gerstein and Chothia, 1996; Greene and Higman, 2003; Socolich et al., 2005). As an initial test of our hypothesis that native protein structures rely on a few residues to a significant extent and non-native ones to a lesser extent, an example was selected randomly from the single decoy set of the Decoys ‘R’ Us database. Both the correct (PDB ID: 2F19) and incorrect (PDB ID: 1F19) structures of Fab R19.9, from a monoclonal anti-arsonate antibody, were originally obtained from the PDB. Both structures contain two chains, each with two domains, for a total of four independent domains. All four domains are listed in the SCOP database and satisfied our screening criteria. The backbone rmsd between each correct domain and the corresponding incorrect domain ranged from 1.41 to 1.83 Å, and the all-atom rmsd from 2.24 to 3.23 Å. There is thus little difference between the two versions of the structure, which makes the discrimination problem quite challenging.

Indeed, the correct and incorrect domains display very similar trends in the average degree of connectivity and in the connectivity pattern. The average degree increased in the order: all residues < conserved residues < conserved hydrophobic residues (Figure 5). Conserved hydrophobic residues dominated the high-degree end of the distribution (Supplementary Material II is available at PEDS online). For domains I and III, the same residue had the maximum degree of connectivity in both the correct and incorrect structures, but the residues differed for domains II and IV, indicating the sensitivity of this measure to structural changes.


Figure 5
 
Fig. 5. Comparison of average degree of connectivity per node among considered nodes for both correct and incorrect structures in the application example. (A) Correct structure (PDB ID: 2F19). (B) Incorrect structure (PDB ID: 1F19). Solid bars, all nodes; Open bars, conserved nodes; Cross bars, conserved hydrophobic nodes.

 
Despite these similarities, two distinct observations allowed the correct structure to be selected over the incorrect one. First, for all incorrect domains, the average degree of connectivity for all nodes (0.51–0.68) was at or below the lower limit of the range derived from the refined dataset (0.60–1.62), and much lower than the average value of 1.04, whereas the values for the correct domains (0.94–1.07) all fell well within the expected range. Because we defined connectivity solely on the basis of favorable inter-residue interactions, this difference indicates that more favorable interactions formed in the correct domain structures than in the incorrect ones. Second, the correct domains showed an increase either in the maximum degree of connectivity or in the number of residues with that maximum degree. For three of the four domains (I, III and IV), the maximum degree of connectivity increased from four to six or seven (Figure 6). For domain II, although the maximum degree remained the same (4), the number of residues with that degree increased from one to seven. Hence, the results were fully consistent with our hypothesis.
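The first observation corresponds to the first set of criteria: a native-like average degree over all nodes plus the expected ordering of group averages. A minimal sketch with a hypothetical function name; the default range is the one derived from the refined native dataset.

```python
def passes_first_criteria(avg_all, avg_conserved, avg_conserved_hydrophobic,
                          native_range=(0.60, 1.62)):
    """Native-like average degree for all nodes and all < conserved < conserved hydrophobic."""
    low, high = native_range
    in_range = low <= avg_all <= high
    ordered = avg_all < avg_conserved < avg_conserved_hydrophobic
    return in_range and ordered
```

Because the incorrect domains span 0.51–0.68, this range check alone would flag only some of them, which is why the maximum-degree comparison is the more discriminating test.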


Figure 6
 
Fig. 6. Connectivity patterns for all nodes for both correct and incorrect structures in the application example. Filled diamonds, correct structure domain; Open circles, incorrect structure domain.

 
Application to a test set

To test the generality of this finding, the same approach was applied to the multiple decoy sets of the same database (see Materials and methods). Extensive studies have recently been carried out on the decoys in this database using both hydrophobic fitness (HF) (Huang et al., 1995, 1996) and evolutionary trace (ET)-based methods (Mihalek et al., 2003), which provides a good opportunity to compare the performance of our approach. For any discrimination method, small-rmsd decoys pose the main challenge and are particularly interesting to study. We therefore focused on decoys with a Cα rmsd of no more than 1.5 Å from the native structure.

The example above suggested two sets of criteria applicable for this purpose. The first set included the range of the average degree of connectivity per node for all nodes and the increased contribution from conserved hydrophobic residues, derived from studying native protein structures. The second set included the maximum degree of connectivity in the structure and the number of residues with that maximum degree. This set was derived from our hypothesis.

Application of the first set of criteria to the multiple decoy sets resulted in almost no positive discrimination. Among all 17 decoy sets, only one set with two decoy structures (PDB ID: 1SN3) had an average degree of connectivity outside the expected range (0.60–1.62) (Figure 7). Furthermore, all decoy structures and their corresponding native structures displayed very similar trends in average degree of connectivity and connectivity pattern. The average degree increased in the order: all residues < conserved residues < conserved hydrophobic residues, and conserved hydrophobic residues dominated the high-degree end of the distribution (data not shown). This is understandable, because the first set of criteria is based on native proteins from diverse fold-types. As a result, criteria such as the range of the average degree of connectivity are very broad and may not be useful for identifying decoy structures that are very similar to native ones.


Figure 7
 
Fig. 7. Range of average degree of connectivity per node among all nodes for each decoy set in the multiple decoy dataset.

 
In contrast, the second set of criteria worked quite well. Of the 93 decoy structures, 63 (68%) scored worse than their native structures (Table I). Overall, our approach performed somewhat better than the HF score and comparably to the ET-based score (Mihalek et al., 2003). Of special interest is the distribution of decoys that scored better than their corresponding native structures. Such decoys were not distributed evenly across the proteins examined, nor in proportion to the number of decoys for each protein; rather, the distribution depended on the proteins themselves. For approximately one-third of the proteins, all of their decoys scored better than the corresponding native structures, while for the remaining two-thirds, all scored worse. This may be related to the nature of packing at the cores of those proteins; for instance, the packing of those proteins might not be fully optimized to a scale-free network.


 
Table I. Number of decoys in each group scoring worse than the native structure based on the second set of criteria

 

    Discussion
 
The rapid growth of protein sequence databases has enabled the use of information extracted from evolutionarily conserved amino acid residues in the prediction of three-dimensional protein structures (Ortiz et al., 1999; Mihalek et al., 2003; Taylor et al., 2003). Transforming three-dimensional protein structures into two-dimensional networks provides a straightforward way to elucidate the structural contributions of conserved residues. Although this is a simplification of the protein structure, the transformation allows various aspects of protein folds to be analyzed with network and graph theory (Huang et al., 1995, 1996; Kannan and Vishveshwara, 1999; Jacobs et al., 2001; Bonneau et al., 2002; Dokholyan et al., 2002; Vendruscolo et al., 2002; Wangikar et al., 2003; Amitai et al., 2004; Atilgan et al., 2004; Gupta et al., 2005).

Previous graph and network analyses of protein structures used either a single distance cut-off (Greene and Higman, 2003) or inter-residue interactions (Amitai et al., 2004) as the basis for determining inter-residue contacts. The number of connections linked to a node (residue) depends on the number of neighbors or interacting partners the residue has in space, and reflects either the local environment around that residue or, semi-quantitatively, its structural contribution. We adopted the second approach in our studies. Specifically, four types of interaction (hydrophobic interactions, hydrogen bonds, ionic bonds and disulfide bonds) were determined and treated as connectivities. As described in the Materials and methods section, the rules adopted here are consistent with the current view of the non-bonded and disulfide interactions observed in native protein structures and their importance to protein folding. Therefore, the inter-residue connectivity derived from these rules should reflect the actual contributions of individual residues.

Mapping the conserved residues onto the network allows one to study their connectivity patterns and determine their distinct contributions. This approach was applied to protein structures from diverse fold-types and shown to capture the essential features of their native three-dimensional structures (Richards and Lim, 1993; Gerstein and Chothia, 1996; Greene and Higman, 2003; Socolich et al., 2005). In addition, it provides a transparent and explicit way to define the structural contributions of individual amino acid residues.

We sought to determine whether our analysis of highly connected residues could help discriminate non-native structures from the native structure of a protein. Using a diverse set of decoys, we showed that the method scored 68% of the small-rmsd decoys worse than their native structures, so this approach could be of help in protein structure discrimination. By applying the same approach to folds in question, and then comparing the connectivity patterns of their conserved hydrophobic residues (which dominate the high-degree end) against those of native protein folds, one can judge whether the packing of a particular fold is consistent with a native fold.

Correlating sequence-related information with protein three-dimensional structures is a powerful way to study protein sequence–structure relationships (Lichtarge et al., 1996; Landgraf et al., 1999; Lockless and Ranganathan, 1999; Gaucher et al., 2002; Kass and Horovitz, 2002; Mihalek et al., 2003). Networks provide a simplified representation of structural connections that highlights the importance of a few key residues. We propose that the existence of these hub residues is an important hallmark of native folds. We plan to apply the same approach to protein structures sharing a similar fold, and we envision that such analysis might help uncover the topological principles underlying homologous protein structures. Furthermore, applying as yet unexplored network properties to native protein structures may provide new insight.
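To make the notion of hub residues concrete, one could flag residues whose connectivity is unusually high relative to the rest of the structure. The sketch below uses a z-score threshold on the degree distribution; this hub definition and its default `z = 2.0` are assumptions for illustration and are not taken from the text.

```python
from statistics import mean, pstdev


def find_hubs(degrees, z=2.0):
    """Residues whose connectivity is at least `z` population standard
    deviations above the mean connectivity (a hypothetical hub definition)."""
    if not degrees:
        return []
    values = list(degrees.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []  # every residue equally connected: no hubs stand out
    return sorted(res for res, d in degrees.items() if (d - mu) / sd >= z)


if __name__ == "__main__":
    degrees = {"L10": 9, **{f"A{i}": 2 for i in range(1, 10)}}
    print(find_hubs(degrees))  # ['L10']
```

Because a few very highly connected residues also inflate the standard deviation, a percentile-based cutoff may be a more robust alternative for small proteins.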


    Acknowledgements
The authors thank Lifeng Tian for help with computer issues, and Dr Michael F. Bruist at the University of the Sciences in Philadelphia and Dr Terry P. Lybrand at Vanderbilt University for comments on the manuscript. This work was supported by a start-up fund from the University of the Sciences in Philadelphia.


    References
Abagyan R.A. and Batalov S. (1997) J. Mol. Biol. 273:355–368.

Albert R., Jeong H., Barabasi A.L. (2000) Nature 406:378–382.

Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. (1990) J. Mol. Biol. 215:403–410.

Amitai G., Shemesh A., Sitbon E., Shklar M., Netanely D., Venger I., Pietrokovski S. (2004) J. Mol. Biol. 344:1135–1146.

Apweiler R., et al. (2004) Nucleic Acids Res. 32:D115–D119.

Atilgan A.R., Akan P., Baysal C., Vendruscolo M., Dokholyan N.V., Paci E., Karplus M. (2004) Biophys. J. 86:85–91.

Barabasi A.L. and Albert R. (1999) Science 286:509–512.

Barnes M.R. and Gray I.C. (2003) Bioinformatics for Geneticists (John Wiley & Sons, England).

Bonneau R., Ruczinski I., Tsai J., Baker D. (2002) Protein Sci. 11:1937–1944.

Bradley P., Misura K.M.S., Baker D. (2005) Science 309:1868–1871.

Casari G., Sander C., Valencia A. (1995) Nat. Struct. Biol. 2:171–178.

Chenna R., Sugawara H., Koike T., Lopez R., Gibson T.J., Higgins D.G., Thompson J.D. (2003) Nucleic Acids Res. 31:3497–3500.

Dokholyan N.V., Li L., Ding F., Shakhnovich E.I. (2002) Proc. Natl Acad. Sci. USA 99:8637–8641.

Garbuzynskiy S.O., Melnik B.S., Lobanov M.Y., Finkelstein A.V., Galzitskaya O.V. (2005) Proteins 60:139–147.

Gaucher E.A., Gu X., Miyamoto M.M., Benner S.A. (2002) Trends Biochem. Sci. 27:315–321.

Gerstein M. and Chothia C. (1996) Proc. Natl Acad. Sci. USA 93:10167–10172.

Gloor G.B., Martin L.C., Wahl L.M., Dunn S.D. (2005) Biochemistry 44:7156–7165.

Gobel U., Sander C., Schneider R., Valencia A. (1994) Proteins 18:309–317.

Greene L.H. and Higman V.A. (2003) J. Mol. Biol. 334:781–791.

Gupta N., Mangal N., Biswas S. (2005) Proteins 59:196–204.

Huang E.S., Subbiah S., Levitt M. (1995) J. Mol. Biol. 252:709–720.

Huang E.S., Subbiah S., Tsai J., Levitt M. (1996) J. Mol. Biol. 257:716–725.

Jacobs D.J., Rader A.J., Kuhn L.A., Thorpe M.F. (2001) Proteins 44:150–165.

Kannan N. and Vishveshwara S. (1999) J. Mol. Biol. 292:441–464.

Kass I. and Horovitz A. (2002) Proteins 48:611–617.

Landgraf R., Fischer D., Eisenberg D. (1999) Protein Eng. 12:943–951.

Larson S.M., Di Nardo A.A., Davidson A.R. (2000) J. Mol. Biol. 303:433–446.

Lichtarge O., Bourne H.R., Cohen F.E. (1996) J. Mol. Biol. 257:342–358.

Livingstone C.D. and Barton G.J. (1996) Methods Enzymol. 266:497–512.

Lockless S.W. and Ranganathan R. (1999) Science 286:295–299.

Mihalek I., Res I., Yao H., Lichtarge O. (2003) J. Mol. Biol. 331:263–279.

Murzin A.G., Brenner S.E., Hubbard T., Chothia C. (1995) J. Mol. Biol. 247:536–540.

Ortiz A.R., Kolinski A., Rotkiewicz P., Ilkowski B., Skolnick J. (1999) Proteins 37:177–185.

Park B. and Levitt M. (1996) J. Mol. Biol. 258:367–392.

Rarey M., Kramer B., Lengauer T., Klebe G. (1996) J. Mol. Biol. 261:470–489.

Richards F.M. and Lim W.A. (1993) Q. Rev. Biophys. 26:423–498.

Samudrala R., Xia Y., Levitt M., Huang E. (1999) Pac. Symp. Biocomput. 1999:505–516.

Sauder J.M., Arthur J.W., Dunbrack R.L. Jr. (2000) Proteins 40:6–22.

Shindyalov I.N., Kolchanov N.A., Sander C. (1994) Protein Eng. 7:349–358.

Simon A.L., Stone E.A., Sidow A. (2002) Proc. Natl Acad. Sci. USA 99:2912–2917.

Simons K., Kooperberg C., Huang E., Baker D. (1997) J. Mol. Biol. 268:209–225.

Socolich M., Lockless S.W., Russ W.P., Lee H., Gardner K.H., Ranganathan R. (2005) Nature 437:512–518.

Stickle D.F., Presta L.G., Dill K.A., Rose G.D. (1992) J. Mol. Biol. 226:1143–1159.

Taylor W.R., Munro R.E.J., Petersen K., Bywater R.P. (2003) Comput. Biol. Chem. 27:103–114.

Valdar W.S. (2002) Proteins 48:227–241.

Vendruscolo M., Dokholyan N.V., Paci E., Karplus M. (2002) Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 65:061910.

Wangikar P.P., Tendulkar A.V., Ramya S., Mali D.N., Sarawagi S. (2003) J. Mol. Biol. 326:955–978.

Xia Y., Huang E., Levitt M., Samudrala R. (2000) J. Mol. Biol. 300:171–185.

Yao H., Kristensen D.M., Mihalek I., Sowa M.E., Shaw C., Kimmel M., Kavraki L., Lichtarge O. (2003) J. Mol. Biol. 326:255–261.

Received October 18, 2005; revised February 1, 2006; accepted February 21, 2006.

Edited by Dek Woolfson





