Skip Navigation


PEDS Advance Access originally published online on January 14, 2008
Protein Engineering Design and Selection 2008 21(2):91-100; doi:10.1093/protein/gzm083
This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
21/2/91    most recent
gzm083v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Spassov, V. Z.
Right arrow Articles by Yan, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Spassov, V. Z.
Right arrow Articles by Yan, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

LOOPER: a molecular mechanics-based algorithm for protein loop prediction

Velin Z. Spassov1, Paul K. Flook and Lisa Yan

Accelrys, 10188 Telesis Court, Suite 100, San Diego, CA 92121, USA

1 To whom correspondence should be addressed. E-mail: vss{at}accelrys.com


    Abstract
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
We describe a new ab initio method and corresponding program, LOOPER, for the prediction of protein loop conformations. The method is based on a multi-step algorithm (developed as a set of CHARMm scripts) and uses standard CHARMm force field parameters for energy minimization and scoring. One of the main obstacles to ab initio computational loop modeling is the exponential growth of the backbone conformational states with the number of residues in the loop fragment. In contrast to many ab initio algorithms that use Monte-Carlo schemes or exhaustive sampling, LOOPER adopts a systematic search strategy with minimal sampling of the backbone torsion angles. During the initial conformational sampling, two representative states are sampled for each alanine-like residue based on pairs of initial {varphi} and {psi} dihedral angles, except glycine, which is sampled by four representative conformations. The initial ({varphi}, {psi}) values are determined from the analysis of a novel iso-energy contour map which is proposed as an alternative structure validation method to the widely used Ramachandra plot. The efficient sampling strategy is combined with energy minimization at each step. The initial energy minimization and scoring of the loop include the interactions of the protein core with loop backbone atoms only. Construction and optimization of the side-chain conformations is followed by a final ranking stage based on the CHARMm energy with a generalized Born solvation term as a scoring function. The systematic and efficient sampling strategy in LOOPER consistently finds near native loop conformations in our validation study. At the same time, the computational overhead of our method is significantly lower than many alternative approaches that use exhaustive search strategies.

Keywords: conformational sampling/energy minimization/force field/protein loops/protein modeling


    Introduction
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
The analysis of protein structure–function relationships depends on the quality of 3D molecular models determined by either experimental or computational methods. Given the higher mobility of surface loop regions, many X-ray structures have low electron density at those sites, leading to poorly resolved loop fragment structures. This problem is exacerbated by the higher levels of variation within loop regions of homologous sequences, which increase the difficulty of predicting loop conformation using homology modeling methods.

The ability to resolve loop structures is of importance since these regions typically have a special role in understanding protein biology. Key regions of functional significance between homologous proteins, such as ion or ligand binding sites, are often concentrated in loop regions with variable structure (Fiser et al., 2000Go). For this reason, loop regions are attractive targets for mutation experiments. However, the backbone conformation of a loop region is more likely to change after amino acid substitutions compared with the core secondary structure domains which are stabilized by non-specific interactions such as networks of backbone hydrogen bonds in alpha helices and beta-sheets.

Other activities also drive the interest in protein loop structure prediction. For example, computational methods for predicting induced fit flexible protein ligand docking may be improved by determining variable loop conformations. The development of loop prediction algorithms is also of general theoretical interest since it contributes to the understanding of the structural determinants and the balance of molecular forces responsible for achieving the native loop conformation.

Collectively, these examples show the importance of accurate protein loop structure prediction and emphasize the need for ab initio computational approaches to loop conformational construction and refinement. The loop prediction problem can be seen as a ‘mini protein folding problem’ as characterized by Fiser et al. (2000Go). As with the general protein folding problem, one of the main obstacles to finding the optimal loop conformation is the combinatorial problem arising from the exponential growth of the possible conformational states of the peptide backbone with the length of the loop.

Many different approaches to loop modeling have been developed and have resulted in a gradual improvement of predictive accuracy (e.g. van Vlijmen and Karplus, 1997Go; Fiser et al., 2000Go; Deane and Blundell, 2001Go; Xiang et al., 2002Go; Jacobson et al., 2004Go). The loop predictions in Jacobson et al. (2004Go) are reported with relatively high accuracy. However, most of the results are obtained in crystal environment and it is difficult to judge if the accuracy level will remain the same in important application areas such as homology modeling and flexible ligand docking. A more recent work reported by Zhu et al. (2006Go) had better accuracy for prediction of long loops, but the loops are also modeled in crystal environment.

In general, the loop predicting methods can be divided into two classes: (i) methods based on structural database searches, as proposed by Greer (1980Go) and Chothia and Lesk (1987Go), and (ii) ab initio algorithms that use conformational sampling, structural optimization and energy scoring functions (Bruccoleri and Karplus, 1987Go; Moult and James, 1986Go). Procedures that combine elements of both approaches have also been developed (Chothia et al., 1986Go; van Vlijmen and Karplus, 1997Go). The literature on loop prediction algorithms has been comprehensively reviewed elsewhere (Fiser et al., 2000Go; Deane and Blundell, 2001Go; Jacobson et al., 2004Go) and is not covered further here.

In this report, we describe a method, LOOPER, that adopts a new computational approach to ab initio prediction of loop structure in proteins. One of the main goals behind the development of LOOPER was the creation of a computational program for loop structure prediction and refinement that could subsequently be deployed as a component in other different computationally demanding modeling procedures such as rigid or flexible protein-ligand docking, and protein–protein docking.

The method has some similarities to the approach of Bruccoleri and Karplus (1987Go) such as the use of systematic sampling of backbone conformers and the use of CHARMm (Brooks et al., 1983Go) energy minimization and scoring methods. However, LOOPER is unique in its use of an original multi-step algorithm. It includes an initial systematic search of loop conformers with optimal interactions of loop backbone with the rest of protein atoms, side-chain construction and refinement. It also employs a final ranking step that uses a CHARMm energy scoring function with a generalized Born solvation term. One of the most important features of the initial search algorithm is the efficient strategy used to sample the backbone dihedral angles of individual residues where the lack of a more comprehensive search is compensated by energy minimization. We also describe a novel iso-energy contour map used in the selection of backbone dihedral angle values which we propose as alternative to the widely used Ramachandran plot (Ramachandran et al., 1963Go).


    Theory
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
Sampling the backbone conformations

The initial phase of the algorithm includes a systematic search of loop structures by sampling the backbone dihedral angles {varphi} and {psi}. To reduce the search time for relatively long segments we adopted a minimalistic approach to sampling the sets of initial conformational states of individual residues. The initial conformations are chosen based on the pairs of ({varphi}, {psi}) values so that each conformation belongs to one of the low energy basins of local interactions. The resulting set of combinations gives good coverage of the principal changes in the direction of the peptide chain. The direction of peptide chain depends on the mutual orientation of adjacent peptide bonds and in a simplified way can be described by the dihedral angle {zeta} around the virtual bond connecting two C{alpha} atoms, as illustrated in Fig. 1.


Figure 1
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 1. N-acetyl-Ala-Ala-methylamide illustrates the use of a model dipeptide to map the local backbone interactions and the ‘virtual bond’ representation of peptide chain.

 
Because of the stereochemistry of the example dipeptide shown in Fig. 1, the angle of rotation {zeta} around Ca–1–Ca virtual bond for a given pair of residues X–1, X, depends mostly on the central {psi}–1 and {varphi} dihedral angles and much less on the terminal {varphi}–1 and {psi} angles. To illustrate the relationship between peptide chain orientation and local backbone interactions, we created an iso-energy contour map of backbone interactions, as shown in Fig. 2. This map is of interest since it can potentially be used as an alternative to the well-known Ramachandran plot (Ramachandran et al., 1963Go) for evaluating the robustness of a protein structure model.


Figure 2
View larger version (24K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 2. The iso-energy contours of the intramolecular interactions are calculated as a function of ({psi}–1, {varphi}) dihedral angles of the model dialanine peptide. The continuous contour lines represent the conformations with energy at 1 kcal/mol above the minimum; dashed line, 2 kcal/mol; dotted line, 6 kcal/mol. The straight lines show the mutual orientation of the peptide bonds presented by the angle of rotation {zeta} around the central virtual bond of the dipeptide according to Fig. 1.

 
The main difference between our map and the Ramachandran plot is that the energy values are represented as a function of ({psi}–1, {varphi}) dihedral angles instead of ({varphi}, {psi}). The mapped energy values were calculated for an atomic model of N-acetyl di-alanine methylamide using CHARMm and Momany and Rone (1992Go) force field parameters. The peptide conformations were generated using incremental 10° steps for central ({psi}–1, {varphi}) angles, whereas the coordinates of the atoms that are not included in central peptide were subject of structural optimization using the CHARMm adopted basis Newton–Raphson (ABNR) energy minimization routine. A similar ({psi}–1, {varphi}) plot has been used previously (Sudarsanam et al., 1995Go) to illustrate the occurrence of ({psi}–1, {varphi}) pair values in a large and diverse set of proteins. Comparison with the data reported by Sudarsanam et al. (1995)Go shows that the low energy areas on Fig. 2 are in an excellent agreement with the distribution of ({psi}–1, {varphi}) values in the corresponding X-ray structures.

In addition to the iso-energy contours in Fig. 2, we also include the idealized values of the virtual {zeta} angle (indicated by straight dashed lines). The actual computed values of the {zeta} angles show some minor fluctuations mostly because of variation in the value of the terminal {psi} angles. In the experimentally determined protein structures, the {psi} angle values vary more significantly and the actual values of {zeta} could differ by up to 20–30° from the plotted values, but these differences are relatively unimportant for the overall interpretation of the plot. Instead, the key characteristic of the peptide backbone conformation plot shown in Fig. 2 is the transition from a cis- configuration ({zeta} = 0°) of virtual Ca–Ca bonds through a ‘right {alpha}-helix like’ ({zeta} > 0°) and completely extended ({zeta} = 180°) configurations to the ‘left {alpha}-helix like’ ({zeta} < 0°), and even a full rotation around virtual Ca–Ca bond without exceeding high energy barriers. Similar to the corresponding Ramachandran plot of the same data, the left quadrants in Fig. 2 show much deeper energy basins than the right quadrants. This is consistent with the observations of Fiser et al. (2000Go), who reported low populations of conformational states with positive {varphi} angles in experimentally determined protein structures. Switching from the left to right quadrants, which corresponds to the transition of {varphi} angle from negative values to positive, is much more restrictive. The absence of significant energy barriers between different peptide orientations increases the chance of finding global minima from the limited set of initial conformations by energy minimization. This provides the theoretical explanation for our successful substitution of an extensive initial sampling step with a limited sampling combined with energy minimization. We would like to emphasize that the intrinsic internal flexibility of peptide chain, shown in Fig. 2, cannot be seen in the standard Ramachandran plot.

Analysis of Fig. 2 predicts that our initial sampling step will not only cover the low energy areas of each individual residue, but also a suitably large sample of the orientation of the virtual peptide bond. Taking this into account, the initial sampling in LOOPER for alanine-like residues was reduced to only two basic conformational states of individual residues: ({varphi} = –90, {psi} = 120) and ({varphi} = –60, {psi} = –40). As can be seen in Fig. 2, the combinations between selected values of ({varphi}, {psi}) pairs are sufficient to generate initial conformations that sample most of the basic orientations of the peptide chain, i.e. right-hand, left-hand, extended and even cis-like configurations of virtual Ca–Ca bonds. The number of points to sample for a loop of N residues is 2N. However, considerably less populated conformations with ‘left-helical-like’ values of {varphi}~60 are not unusual in protein structures and so we used a simplified approach to include the ‘left-helical’ states, extending the combinations of alanine-like residues by allowing any of them, but only one per chain, to be {varphi} = 60, {psi} = 60. Thus, the combinatorial growth becomes (N + 2)2N–1, resulting in acceptable computation times for even relatively long chains (e.g. greater than 12 residues).

Energy calculations

All energy calculations are carried out using the CHARMm package (Brooks et al., 1983Go) with Momany and Rone (1992Go) polar hydrogen force field parameters. The first two steps use the CHARMm energy function.


Formula 083M1 1

which includes an electrostatic term with a distance-dependent ‘dielectric constant’ {varepsilon}ri,j


Formula 083M2 2

where qi are the atomic charges and rij are the inter-atomic distances. In all calculations, the dielectric constant parameter {varepsilon} is set to 2. The non-polar parts of the energy function in Eq. (1) include all standard CHARMm non-covalent and covalent interaction terms, such as van der Waals energy, bond length, bond angle and dihedral angle energy terms. Equation (2) was chosen for all steps, except the final scoring. The distance-dependent electrostatics method is the fastest among all electrostatic models and more importantly, it affords the opportunity to use short cutoff distances, while still being able to take into account the presence of the solvent by absorbing the screening effect of the water environment. A more complex functional form of CHARMm energy is used for the final scoring, corresponding to a potential of mean force (Bashford and Case, 2000Go), where the electrostatic contribution to protein solvent interactions is represented in generalized Born approximation (Still, 1990Go).


Formula 083M3 3

Ecoul is a standard coulombic term


Formula 083M4 4

and {Delta}Gpol is the polar contribution to the solvation free energy in generalized Born approximation:


Formula 083M5 5

where {varepsilon}m and {varepsilon}slv are the intramolecular dielectric constant and the dielectric constant of the solvent and {alpha}i are the effective Born radii of protein atoms. Our method does not include a non-polar solvation term in its current state, since it has insignificant effect for the loop lengths reported here. However, for longer loops, it could be important, as discussed in the Loop prediction algorithm section below.

Loop definition

In both the prediction algorithm and RMSD calculations, we use a strict definition of the loop region. According to this definition, the coordinates of all atoms from the loop region are to be included in the prediction without exception, i.e. starting from the N atom of the first residue and ending with the O atom of the last residue of the loop. No restriction or anchoring is applied to any atoms within the specified loop region based on known crystallographic data. This is noteworthy since in much of the literature, loop regions subjected to optimization are not clearly defined and quite often, some of the atoms in the first and last residues of the loop are anchored according to their known input (X-ray) structure. As such, it is important to appreciate that the length of the loop is effectively shorter than discussed and, in turn, will make the reported accuracy of those methods slightly higher than their actual value on the validation data sets.

Accuracy tests

Evaluation of the accuracy of loop structure predictions were carried out by calculating the global root mean square deviation (RMSD) of the loop segment as defined by Fiser et al. (2000)Go. In this definition, the superimposition of the model to the X-ray structure is done globally using all the residues in the protein, excluding the loop, and the RMSD is then calculated using the backbone atoms N, C{alpha}, C and O of the loop region. The global RMSD is the most stringent criterion for the prediction accuracy comparing with the RMSD estimations based on N, C{alpha}, C atoms (van Vlijmen and Karplus, 1997Go) or to local RMSD values (Fiser et al., 2000Go) where the superimposition of the model to the X-ray structure is done using loop region only. Note, however, that the alternative RMSD criteria are based on reasonable assumptions and N, C{alpha}, C and O global RMSD can be approximately compared with N, C{alpha} and C global RMSD by multiplying a factor of 1.1–1.2.

Loop prediction algorithm

The method was implemented as a multi-step computational algorithm where each step is encoded as a separate program module written in the CHARMm scripting language. In the stand-alone variant, all CHARMm scripts are wrapped into a Perl program which runs the calculation in an automatic way. The Perl program has also been deployed as a Pipeline PilotTM component to improve automation. Pipeline Pilot is a data pipelining environment from AccelrysTM that enable rapid development and execution of custom pipelines. The LOOPER component as well as related protocols is available as part of the Discovery StudioTM software suite from Accelrys.

For the sake of simplicity, we will describe the algorithm in three basic steps. During each step of the algorithm, the loop fragments are modeled in the presence of all atoms of the non-loop region. The atomic coordinates of non-loop heavy atoms are taken from the input structures and remain fixed during loop prediction, whereas the hydrogen atoms are reconstructed using the CHARMm HBUILD routine (Brunger and Karplus, 1988Go). All tests in this study were conducted after excluding ions, ligands and other hetero-atoms, although in principle the programs can handle non-amino-acid residues in any non-loop part of the system, providing the appropriate topologies and parameters are defined in the CHARMm force field files. In the Discovery Studio implementation, the CHARMm Momany and Rone (1992)Go force field topology and atomic parameters are evaluated automatically for any molecule included in the system.

The results from a recent study of the intramolecular interactions in proteins (Spassov et al., 2007Go) suggest that, on average, the interactions of amino acid side chains with backbone atoms are significantly greater than the interactions between the side chain atomic groups. This suggests that a strategy for improving the performance of the initial conformational sampling stage is to replace residues from the loop segment with alanine. The exceptions to this replacement strategy are glycine, cysteine and proline residues. The side chains are subsequently added back on with an additional step of packing optimization.

At a high level, the three major steps of the computational procedure are as follows:

  1. Construction of a set of possible loop conformers with optimized interactions of loop backbone with the rest of the protein.
  2. Building and structural optimization of loop side chains and energy minimization applied to all loop atoms.
  3. Final scoring and ranking the retained variants of loop conformers.

A more detailed description of the individual steps is provided below.

Step 1: construction and optimization of the loop backbone. Initially the loop region is split into two independent halves, modeled as poly-alanine chains. With systematic conformational sampling, multiple initial loop conformers are generated from all possible combinations of a small number of individual conformational states where each residue in the loop is represented by a pair of ({varphi}, {psi}) dihedral angles. To reduce the number of combinations, we chose the minimal number of initial ({varphi}, {psi}) dihedral angles as described in ‘Sampling the backbone conformation’. The atomic coordinates of each half of the loop are generated in the presence of all non-loop atoms, but without the presence of atoms from the other half of the loop. The coordinates of the loop atoms are set by editing the CHARMm internal values of ({varphi}, {psi}) angles and then executing the CHARMm IC BUILD command.

Each reconstructed structure with a half loop is optimized by combined use of the steepest descent and ABNR energy minimization protocols with the non-loop atoms in fixed positions and 10 Å cutoff distance. For both loop halves, all generated structures are ranked using the total CHARMm energy as scoring function. No more than 50 top ranking variants for each half are selected based on an optional energy cutoff criterion and the corresponding atomic coordinates are saved for further modeling. For loops longer than eight residues, the maximal number of retained conformers is increased to 100.

The pairs of half loop structures are then connected, resulting in maximum of 2500 combinations (or 10 000 for longer loops). The resulting loops are optimized using the same energy minimization protocol. Tests show that the harmonic potential between bonded C and N atoms connecting the loop halves is strong enough to achieve correct closure of the loop in almost all of the cases. A new ranking is applied to the successfully closed loop conformations and the first 50 of them are retained for the second step of the protocol.

Step 2: construction and optimization of the loop side chain. This step involves construction of amino acid side chains on the loop residues and re-optimization of the loop segment. Similar to the first step, the atoms of the protein core are fixed. For each of the retained loop conformers from the first step, the side chains are added and optimized using a fast ab initio algorithm for side-chain refinement, ChiRotor (Spassov et al., 2007Go). In contrast to many recent side-chain prediction algorithms, ChiRotor does not rely on an exhaustive search strategy. Instead, the algorithm uses an efficient CHARMm minimization step, with limited sampling making it suitable for incorporation into the LOOPER method.

After the side chains are constructed, the constraints are removed from the loop backbone atoms and the structure of the loop segment is re-optimized using the same energy minimization protocols used in the first step but with non-bonded cutoff extended to 20 Å. The retained variants are re-ranked according to the energy after the final minimization and the coordinates of optimized structures are saved.

Step 3: scoring and ranking the loop conformers. In this final stage, the retained conformers are scored using a scoring function based on the potential of mean force calculated by Eq. (3) where the polar contribution to the free energy of solvation calculated by a CHARMm generalized Born method (Dominy and Brooks, 1999Go) is added to the intramolecular interaction terms. In principle, the CHARMm software allows the inclusion of a non-polar contribution of solvation energy and we tested this by including a surface area-dependent non-polar term as previously described (Spassov et al., 2002Go). However, even using different values of the surface-tension parameter, the inclusion of the non-polar term resulted in only noise-like fluctuations in the calculated energy with much smaller amplitude than the changes of polar solvation energy. Note that in the modeling of longer loops, the non-polar term might be useful, because the cumulative effect of atomic surface areas could increase significantly. The final energy calculations in Step 3 are carried out without the use of cutoff distances and with {varepsilon}m = 4 for the intramolecular dielectric constant in Eqs. 4 and 5.

Test sets

The test results reported below are obtained using the Fiser et al. (2000Go) collection of loops with lengths of between 2 and 12 residues. For each loop length, the Fiser et al. selection contains 40 loop segments each from 40 different protein structures. The Fiser test set has been used in a number of recent studies of loop prediction (Michalsky et al., 2003Go; Jacobson et al., 2004Go) and so it is convenient for the purposes of comparing accuracy of different methods. We also chose this set because its large size and diversity make it appropriate for obtaining statistically meaningful results.


    Results and discussion
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
The LOOPER method creates multiple loop conformations at each step of the prediction and selects the final loop conformation based on energy criteria. Our results establish a correlation between the conformational energy and the loop backbone RMSD to the native structure at each prediction step. In Figs 3 and 4, the energies for different loop conformations are plotted against the backbone RMSD for two test cases of nine-residue loops. Figure 3 represents the results of a nine-member loop (residues 127–135 of 1arp pdb structure) to illustrate a case where the loop structure is predicted with high accuracy (0.39 Å final RMSD). The energies in the plot are represented by the CHARMm energy differences between the predicted conformations and X-ray structures optimized by CHARMm (using the same method as the final minimization step in the loop prediction). The RMSD values are calculated relative to original X-ray structure. Figure 3A shows clear correlation between RMSD values and the loop energy. The conformation that leads to the structure with lowest energy after the final scoring is marked by the shaded rhomb. It is typical for a successful prediction that the conformation leading to the final predicted structure is among the top ranked conformations after the first phase, but is usually not the best one. Figure 3B illustrates the effect of adding the side chains and the optimization of the full loop segment in Step 2. In this particular test case, it shifts the highest ranking conformation to the top of best ranked structures and improves its RMSD from 1.2 to 0.39 Å. After adding the solvation term to the energy function, the top ranked conformation remains unchanged, as seen in Fig. 3C. This implies that stronger intramolecular rather than protein–solvent interactions steer the loop to the native conformation.


Figure 3
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 3. Example of a nine-member loop predicted with high accuracy (final RMSD = 0.39 Å). The loop is defined by residues 127–135 from 1arp pdb entry. The plot represents the energy versus backbone RMSD (in [Å]) of generated structures in different steps of the algorithm: (A) results for all successfully closed and optimized loops after Step 1. The filled circles represent the 100 best scored conformations, which are selected and advanced into next step; (B) The retained conformation after side chains are added; (C) The conformations after the final minimization and scoring. The shaded rhomb represents the conformation leading to predicted structure.

 

Figure 4
View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 4. The same as Fig. 3 but for a loop predicted with medium accuracy (RMSD = 1.25 Å). The data are for the nine-member loop of residues 6–14 from 1gky pdb entry.

 
The second example, 1gky (shown in Fig. 4) illustrates the case where the solvation term in final scoring function is critical for the prediction. In this test case, although the leading conformation is among the best ranked after the first step, addition of side chains introduced a number of false positives (Fig. 4B). However, including the solvation term in the final scoring corrects the prediction by increasing the energy of ‘false positives’ just enough to shift the leading conformation (with RMSD = 1.25 Å) back to first place. Figure 4 also shows that a small number of conformations with a lower RMSD than predicted structure are generated, but underscored in all three steps. The reasons for underscoring of some accurate structures are complex and reflect approximations in energy minimization step included in the side chain optimization as well as in the physical model and force field parameters.

Figure 5 shows the relationship between loop length and the corresponding RMSD of the lowest energy loop to the native structure. The results are shown for each of the three steps of the algorithm since the RMSD differences between different steps reflect the impact of the structural models and energy functions. Interestingly, for loops of length nine-residues and below, Step 1 is enough to achieve an average accuracy below 2 Å. Because in this phase, the side chains are not present in the loop structure, this result suggest for a major role of the interactions involving loop backbone that together with the steric constrains drives the loop structure to the correct conformation. It can also be seen that, in general, inclusion of the side chains in the molecular model improves the accuracy of the predictions, although in a few cases no improvement is seen. A more detailed analysis shows that minimizations with all side chains added improve the RMSD of near native conformations, but at the same time occasionally introduce false positives. The latter could be expected, because at this stage the self-terms of protein–solvent interactions are not included in the scoring function but are important for scoring the conformation of the loops which have polar side chain groups on the protein surface and interact strongly with solvent. Including the generalized Born solvation term in the final scoring not only improves the accuracy as shown in Fig. 5, but also makes the RMSD dependence on loop length smoother. It may also reduce the noise effect of some false positives when including the side chains. Technical obstacles prevented inclusion of the generalized Born solvation term in the second step of the algorithm but this may be a possible approach for future investigation, though this would be at the expense of performance.


Figure 5
View larger version (10K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 5. Accuracy of prediction at different steps of the algorithm as function of the loop length. The data are taken for all 40 loops for each loop length and the RMSD are calculated for the loop conformation of lowest energy at each step. (A) Average backbone RMSD in Angstroms; (B) Median backbone RMSD; (C) Average RMSD of all heavy atoms of the loop. The lines marked with 1, 2 and 3 correspond to the results of Step 1, Step 2 and Step 3, respectively.

 
Table I presents the prediction accuracy of the test results as a function of the loop length from two to 12 residues. The test calculations were carried out using the Fiser et al. (2000Go) test sets consists of 40 loops in 40 different protein structures for each loop length.


View this table:
[in this window]
[in a new window]

 
Table I. Accuracy of prediction as a function of the loop length

 
The very low mean values of backbone RMSD for loops up to three- or four-residues long is not unexpected if we take into account the increased constraints on loop termini for shorter loops. It is remarkable, however, that despite limited initial sampling the method demonstrates high levels of accuracy for loops from medium size up to nine residues. Note that according to a widely used classification (Fiser et al., 2000Go), an accurate loop prediction is a loop predicted with a global backbone RMSD less than 1.5 Å.

The comparison of the average RMSD data in Fig. 6 with corresponding global RMSD data from Fiser on the same set of proteins (Fig. 9B in Fiser et al., 2000Go) show similar accuracy for very short loops and better accuracy for longer loops. The RMSD values reported in Fig. 9B of Fiser's work are based on 50 independent sampling which is insufficient. Even if the sampling is increased to 500 independent initial loop conformations (Fig. 8 in Fiser et al., 2000Go), a computationally intense operation, our results are still better for long loops, e.g. loop length of 8 and 12 residues. We do not present data for loops longer than 12 residues, because although lower than in the Fiser study, the average RMSD values for 12 member loops are too high to attach any meaningful interpretation. The comparison with Fiser's results leads us to conclude that a simpler approach combining systematic sampling with energy minimization protocols is a better alternative to computationally more intense methods using molecular dynamics and simulated annealing steps.


Figure 6
View larger version (12K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 6. Comparison of the average RMSD of predicted loop structures for Fiser et al. (2000Go) data set. Triangles, LOOPER; circles, Fiser et al. (2000Go); rhombs, Jacobson et al. (2004Go).

 
To the best of our knowledge, the most accurate predictions achieved to date on the Fiser data set are reported in the study of Jacobson et al. (2004Go). However, most of the results in the Jacobson study were obtained by modeling the loop structures in a crystal environment in an attempt to demonstrate the full power of the method. In addition, the loop decoy structures published in Jacobson Web page suggest that the X-ray coordinates of N and C{alpha} atoms from the first and C{alpha}, C and O atoms from the last residues of loop region are used as fixed anchors which prevents a direct comparison of the results. Therefore, in Fig. 6, we shifted the Jacobson's loop lengths by half a residue to the left to correct for the reduced effective loop length. Note that Jacobson's results presented in Fig. 6 for 11- and 12-residue loops are based on the Xiang et al. (2002Go) data set. The authors excluded the Fiser test cases of loops longer than 10 residues from their study because of the ‘high content’ of secondary structure that eventually makes the prediction much more difficult. Even so, we believe including the more difficult cases in the test sets used here is desirable since it illustrates the applicability of our method. As a result, we did not discard the Fiser test structures for 11- and 12-residue loops.

It is worth noting, without using an exhaustive sampling based on large backbone dihedral angle libraries and without crystal packing, our algorithm achieves only slightly less accurate average RMSD on ~0.1–0.2 Å. Interestingly, according to the authors, in eight member loop case, the removal of the crystal packing environment increases the average backbone RMSD from 1.05 to 1.46 Å, but no effect is observed in 10-residue loop results. Even not taking into account the difference in the effective loop length, LOOPER achieves a better accuracy of 1.33 Å for eight-residue loops comparing with the results of Jacobson et al. at much lower computational cost. For example, for an eight-member loop the average time per protein entry is 23 min using a 3 GHz processor (Fig. 7), whereas Jacobson et al. reported 1186 min of CPU time on a 1.4 GHz machine. The 4, 8 and 10 member data sets were also used by Rohl et al. (2004)Go to test the loop prediction method based on the Rosetta algorithm. The RMSD values of 0.69 Å (four residues), 1.45 Å (eight residues) and 3.62 Å (10 residues) reported by the authors are very close to our results and those of Jacobson et al.


Figure 7
View larger version (7K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Fig. 7. The mean CPU time (in minutes) as a function of the loop length averaged on sets of 40 protein entries taken from Fiser et al. (2000Go). The data were obtained using a single CPU Intel Pentium 4, 3.0 GHz processor based machine.

 
Table II presents the test results for the individual residues from an eight-member loop data set obtained at the different stages of the algorithm. In general, the backbone RMSD decreases from the first to the last step. More significantly, after final scoring, as in Fig. 5, the results demonstrate the importance of the solvation term in the scoring function for successful re-ranking of the variants after re-constructing the side chains into the loop structures. However, a small number of cases, such as 2sga or 1hbq, show the opposite situation and the leading variants after the first step have significantly lower RMSD than after the final third step. The 2sga structure demonstrates the uncommon situation where accurate low energy backbone structures are generated successfully after the first step, but underscored in the later steps, most probably because of an unsuccessful side-chain refinement.


View this table:
[in this window]
[in a new window]

 
Table II. Accuracy of the predicted loops at different stages of the method

 
Figure 7 illustrates the scaling of computational efforts in LOOPER with respect to loop lengths. The data represent the mean CPU time per protein entry averaged on the proteins from the Fiser et al. (2000Go) data set. A more detailed inspection of the results (not shown) indicate that the time spent on the initial sampling in Step 1 increases faster than the time spent on other stages and for longer loops, Step 1 becomes the rate limiting step. This explains our observation that in short loop cases, computational time is more dependent on the overall size of the molecule, whereas for longer loops (i.e. longer than 12 residues), the computational time is more dependent on the length of the loop.


    Conclusions
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
We have shown that an ab initio loop prediction algorithm using a minimal conformational sampling combined with energy minimization can be an effective alternative to more time consuming computational approaches (e.g. those based on exhaustive searches of backbone dihedral angle space or molecular dynamics simulation). The combined accuracy and low computational cost make LOOPER as a valuable component for integration in a variety of protein modeling tasks. Another potential application of the software is to be used as generator of multiple energy minimized non-native loop structures, for example, for the purposes of flexible ligand docking. In its current implementation, one weakness of the method is the sharp loss of accuracy for loops of more than 10 residues. However, we believe that by extending the principles used in the development of this method, further improvement is possible by optimizing the sampling of initial structures during the first stages.

We have also described a novel iso-energy ({varphi}, {psi}) contour map when compared with the well-known Ramachandran plots to illustrate the relationship between the local backbone interactions and the local orientation of the peptide chain as presented by the angle of rotation around the virtual C{alpha}–C{alpha} bonds.


    Footnotes
 
Edited by Michael Deem


    Acknowledgements
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
The authors are thankful to Prof J. Andrew McCammon, Dr Frank Brown, Dr Hugues-Olivier Bertrand and Dr Marc Fasnacht for helpful scientific discussion.


    References
 Top
 Abstract
 Introduction
 Theory
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
Bashford D., Case D.A. Annual Review of Physical Chemistry (2000) 51:129–152.[CrossRef][Web of Science][Medline]

Brooks B., Bruccoleri R.E., Olafson B.D., States D.J., Swaminathan S., Karplus M. J. Comput. Chem. (1983) 4:187–217.[CrossRef][Web of Science]

Bruccoleri R.E., Karplus M. Biopolymers (1987) 26:137–168.[CrossRef][Web of Science][Medline]

Brunger A.T., Karplus M. Proteins (1988) 4:148–156.[CrossRef][Web of Science][Medline]

Chothia C., Lesk A.M. J. Mol. Biol. (1987) 196:901–917.[CrossRef][Web of Science][Medline]

Chothia C., Lesk A.M., Levitt M., Amit A.G., Mariuzza R.A., Phillips S.E., Poljak R.J. Science (1986) 233:755–758.[Abstract/Free Full Text]

Deane C.M., Blundell T.L. Protein Sci. (2001) 10:599–612.[CrossRef][Web of Science][Medline]

Dominy B.N., Brooks C.L. III. J. Phys. Chem. B (1999) 103:3765–3773.

Fiser A., Do R.K., Sali A. Protein Sci. (2000) 9:1753–1773.[Web of Science][Medline]

Greer J. Proc. Natl. Acad. Sci. USA (1980) 77:3393–3397.[Abstract/Free Full Text]

Jacobson M.P., Pincus D.L., Rapp C.S., Day T.J., Honig B., Shaw D.E., Friesner R.A. Proteins (2004) 55:351–367.[CrossRef][Web of Science][Medline]

Michalsky E., Goede A., Preissner R. Protein Eng. (2003) 16:979–985.[Abstract/Free Full Text]

Momany F., Rone R. J. Comp. Chem. (1992) 13:888–900.[CrossRef]

Moult J., James M.N. Proteins (1986) 1:146–163.[CrossRef][Medline]

Ramachandran G.E., Ramakrisnan C., Sasisekharan V. J. Mol. Biol. (1963) 7:95–99.[Web of Science][Medline]

Rohl C.A., Strauss C.E., Chivian D., Baker D. Proteins (2004) 55:656–677.[CrossRef][Web of Science][Medline]

Spassov V., Yan L., Szalma S.Z. J. Phys. Chem. B (2002) 106:8726–8738.

Spassov V.Z., Yan L., Flook P.K. Protein Sci. (2007) 16:494–506.[CrossRef][Web of Science][Medline]

Still W.C., Tempczyk A., Hawley R.C., Hendrickson T. J. Am. Chem. Soc. (1990) 112:6127–6129.[CrossRef][Web of Science]

Sudarsanam S., DuBose R.F., March C.J., Srinivasan S. Protein Sci. (1995) 4:1412–1420.[Web of Science][Medline]

van Vlijmen H.W., Karplus M. J. Mol. Biol. (1997) 267:975–1001.[CrossRef][Web of Science][Medline]

Xiang Z., Soto C.S., Honig B. Proc. Natl. Acad. Sci. USA (2002) 99:7432–7437.[Abstract/Free Full Text]

Zhu K., Pincus D.L., Zhao S., Friesner R.A. Proteins (2006) 65:438–452.[CrossRef][Web of Science][Medline]

Received July 25, 2007; revised November 13, 2007; accepted November 27, 2007.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
P. W. Hildebrand, A. Goede, R. A. Bauer, B. Gruening, J. Ismer, E. Michalsky, and R. Preissner
SuperLooper--a prediction server for the modeling of loops in globular and membrane proteins
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W571 - W574.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
M. Cui, M. Mezei, and R. Osman
Prediction of protein loop structures using a local move Monte Carlo approach and a grid-based force field
Protein Eng. Des. Sel., December 1, 2008; 21(12): 729 - 735.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow All Versions of this Article:
21/2/91    most recent
gzm083v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Spassov, V. Z.
Right arrow Articles by Yan, L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Spassov, V. Z.
Right arrow Articles by Yan, L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?