Protein Engineering, Vol. 15, No. 8, 677-681,
August 2002
© 2002 Oxford University Press
A fast empirical approach to binding free energy calculations based on protein interface information
Center for Biomedical Engineering, Beijing Polytechnic University, Beijing 100022, China
| Abstract |
|---|
Three useful variables from the interfaces of 20 protein–protein complexes were investigated. These variables are the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area (ΔASAapol). An empirical model based on the three variables was developed to describe the free energy of protein associations. As the results show, the side-chain accessible numbers characterize the loss of side-chain conformational entropy of protein interactions and the effective empirical function presented here has great capability for estimating the binding free energy. It was found that the variables of interface information capture most of the significant features of protein–protein association. Also, we applied the model based on the variables as a rescoring function to docking simulations and found that it has the potential to distinguish the true binding mode. It is clear that the simple and empirical scale developed here is an attractive target function for calculating binding free energy for various biological processes to rational protein design.
Keywords: docking/entropy/intermolecular interactions/protein association
| Introduction |
|---|
Protein–protein interactions play a central role in protein function. Because the free energy is the key criterion for protein–protein binding, studying it is important for a better understanding of protein interactions and for the subsequent application of this knowledge to protein engineering and drug design. Computer modeling makes it possible to perform direct simulations to study protein–protein associations. Accurate calculations of the free energy that drives protein–protein association are based on molecular dynamics or Monte Carlo simulations (Karplus and Petsko, 1990).
$$\Delta G = \Delta E_{el} + \Delta G_{d} - T\Delta S_{c} + \Delta G_{const} \qquad (1)$$

where ΔEel, ΔGd and ΔSc represent the electrostatic energy change, the desolvation free energy and the change in conformational entropy, respectively, and T is the absolute temperature. The last term, ΔGconst, includes all other free energy changes associated with translation, rotation, vibration and protonation/deprotonation effects. The results show that the average difference between calculated and measured free energies of proteases and their inhibitors was 1.3 kcal/mol, representing an error of about 10% (Vajda et al., 1995).
Subsequently, Zhang et al. put forward a binding free energy function based on the atomic contact energy (Zhang et al., 1997). The binding free energy is estimated by
![]() | (2) |
where ΔEc is the change in atomic contact energy and ΔEel is the direct electrostatic interaction between the protease and its inhibitor. The term ΔStrv denotes the entropy change associated with the six degrees of freedom of rotation/translation and vibration. The precision of ΔGcal compared with experimental data was between ±0.1 and ±2 kcal/mol.
In addition, Xu et al. (1997) devised a function based on the hydrophilic number and the molecular surface:
![]() | (3) |
In general, the entropy loss is an indispensable contribution to the binding free energy. Calculating the entropy is difficult, however, since it depends on the complete phase space of a molecular system and is sensitive to the inclusion of correlations between motions along the many degrees of freedom (Karplus and Kushick, 1981; Di Nola et al., 1984). Pickett and Sternberg developed an empirical scale to estimate the side-chain conformational entropy loss (Pickett and Sternberg, 1993). In this entropy scale the maximum conformational entropy, Sc, of each side chain was calculated by the classical expression
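$$S_{c} = -R \sum_{i} p_{i} \ln p_{i} \qquad (4)$$

where R is the gas constant and p_i is the probability of the side chain being in rotameric state i.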
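As a rough illustration of the classical expression mentioned above, the sketch below assumes the usual Boltzmann form S = −R Σ p_i ln p_i over side-chain rotamer populations. The function name, the unit choice for R and the example populations are illustrative assumptions, not values from Pickett and Sternberg's scale.

```python
import math

# Gas constant in kcal/(mol K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987e-3


def conformational_entropy(populations, gas_constant=R_KCAL):
    """Return S = -R * sum(p_i * ln p_i) for a set of rotamer populations.

    `populations` may be unnormalized weights; they are normalized here.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    entropy_sum = 0.0
    for weight in populations:
        p = weight / total
        if p > 0:  # states with zero population contribute nothing
            entropy_sum -= p * math.log(p)
    return gas_constant * entropy_sum


# Hypothetical example: three equally populated rotamers give S = R ln 3.
s_uniform = conformational_entropy([1.0, 1.0, 1.0])
```

For equally populated states the expression reduces to R ln N, which is the largest value it can take for N states; a single fully populated state gives zero.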
In order to avoid the complicated calculation of conformational entropy while still considering the effect of entropy on the binding free energy, we obtained a simple and effective empirical scale for the conformational entropy and the binding free energy through the analysis of protein interfaces. In this study, we analyzed the binding interfaces of 20 protein complexes and extracted three variables describing the interface, i.e. the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area of the complex interface (ΔASAapol). Then, the empirical scale in terms of the three variables was established by linear fitting to experimental data for the free energy. In addition, the scale was applied as a score function to the docking processes for 10 protein complexes. Finally, the feasibility and shortcomings of our empirical method are discussed.
| Systems and methods |
|---|
All X-ray structures of the 20 protein complexes were taken from the Protein Data Bank (Bernstein et al., 1977). The change in solvent-accessible surface area, ΔASA, was calculated from the difference in the buried surface area of each residue between the two monomers and the dimer. If the relative change rate of ΔASA was more than 20%, the residue was defined as an interface residue. For the apolar group, ΔASAapol was determined from the buried surface area of C atoms (the contribution of S atoms was omitted).
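The 20% criterion described above can be sketched as follows. This is a minimal illustration that assumes the relative change is the buried area divided by the residue's area in the free monomer; the function name, the dictionary layout and the numbers are hypothetical, and the per-residue areas would come from whatever surface-area program is used.

```python
def interface_residues(asa_free, asa_complex, threshold=0.20):
    """Return residue ids whose relative loss of accessible area exceeds `threshold`.

    asa_free:    dict mapping residue id -> ASA in the isolated monomer (A^2)
    asa_complex: dict mapping residue id -> ASA in the complex (A^2)
    """
    selected = []
    for res_id, free_area in asa_free.items():
        if free_area <= 0.0:
            continue  # fully buried in the monomer: no relative change defined
        buried = free_area - asa_complex.get(res_id, free_area)
        if buried / free_area > threshold:
            selected.append(res_id)
    return selected


# Hypothetical areas for three residues of one monomer.
free = {"A10": 100.0, "A11": 50.0, "A12": 0.0}
bound = {"A10": 70.0, "A11": 45.0, "A12": 0.0}
hits = interface_residues(free, bound)
```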
The side-chain accessible number, Nb, was taken as the number of contacted residues in the interface, where a contacted residue was defined by the effective accessibility (RA) of its side chain, calculated by
$$R_{A} = \frac{\Delta A_{t}}{0.6\,A_{t}^{*}} \qquad (5)$$

where ΔAt is the change of accessible surface area of the side chain and A*t is the standard side-chain surface area. If RA of a residue across the interface of a complex was greater than 1, the residue was taken as a side-chain accessible residue. The approximate value for 60% of the standard side-chain surface area in Equation 5
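A minimal sketch of counting Nb from the description above. It assumes that the effective accessibility is the change in side-chain area divided by 60% of the standard side-chain area, and that a residue is counted when this ratio exceeds 1 (the text elsewhere speaks of "more than 60%"); both the exact form and the strict inequality are assumptions. The standard areas in the example are placeholders, not a published table.

```python
def effective_accessibility(delta_area, standard_area, fraction=0.6):
    """Ratio of the side-chain area change to `fraction` of the standard area."""
    return delta_area / (fraction * standard_area)


def side_chain_accessible_number(residues, standard_areas, fraction=0.6):
    """Count interface residues whose effective accessibility exceeds 1.

    residues:       iterable of (residue_name, delta_side_chain_area) pairs
    standard_areas: dict mapping residue name -> standard side-chain area (A^2)
    """
    count = 0
    for name, delta_area in residues:
        ratio = effective_accessibility(delta_area, standard_areas[name], fraction)
        if ratio > 1.0:
            count += 1
    return count


# Placeholder standard areas and area changes, for illustration only.
standard = {"LEU": 100.0, "SER": 50.0}
interface = [("LEU", 70.0), ("LEU", 50.0), ("SER", 40.0)]
nb = side_chain_accessible_number(interface, standard)
```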
The number of hydrophilic pairs, Npair, was defined by the distance between the critical points of hydrophilic atoms, which lie approximately at the centers of their contact surfaces (Lin et al., 1994). If the distance between two hydrophilic atoms was <2.8 Å (the diameter of the solvent probe), the two atoms were treated as a hydrophilic pair.
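The distance criterion above can be sketched as a simple pair count. The sketch assumes that the critical points of the hydrophilic atoms on each protein are already available as 3D coordinates; how those points are generated is not shown, and the function name and coordinates are hypothetical.

```python
import math


def count_hydrophilic_pairs(points_a, points_b, cutoff=2.8):
    """Count cross-interface point pairs closer than `cutoff` (in angstroms).

    points_a, points_b: sequences of (x, y, z) critical-point coordinates for
    the hydrophilic atoms of the two proteins.
    """
    pairs = 0
    for pa in points_a:
        for pb in points_b:
            if math.dist(pa, pb) < cutoff:
                pairs += 1
    return pairs


# Hypothetical coordinates: only the first point of each list is within 2.8 A.
protein_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein_b = [(2.0, 0.0, 0.0), (10.0, 0.0, 3.0)]
npair = count_hydrophilic_pairs(protein_a, protein_b)
```

The double loop is quadratic in the number of points, which is acceptable for a single interface; a spatial grid would be the usual choice if many docked poses had to be scored.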
To examine the model described above, 10 complexes with experimentally determined structures were selected as a test set for molecular docking. The soft protein–protein docking algorithm (C.H.Li et al., in preparation) developed in our group was used for the test and was based on the simplified protein models of Janin's rigid-body protein–protein docking algorithm (Cherfils et al., 1991, 1994; Cherfils and Janin, 1993). The partial binding space, including the partial surface of the receptor and the complete surface of the ligand, was searched, and 3 × 10^4 different modes of contact between the two proteins were obtained for each case. After filtering and clustering analysis, about 300 binding modes were retained. The binding free energy was then used to score those retained binding modes.
| Results and discussion |
|---|
Correlation analysis of interface information
The conformational entropy affects the binding free energy of a protein and its ligand and also helps to drive protein folding. A major unfavorable entropy effect arises from the reduction in the number of accessible conformations available to the protein backbone and side chains. As an approximation, we assume that the backbone in all folded conformations has the same conformational entropy. Therefore, only the entropy loss from the side chain is taken into account when the accessibility of the side chain is more than 60% of the standard side-chain surface area. When the values of the side-chain accessible number, Nb, are used to fit the side-chain conformational entropy loss according to Pickett and Sternberg's empirical scale, the linear fitting function is given by
![]() | (6) |
Figure 1 shows a linear fitting of the side-chain conformational entropy (TΔS) versus Nb. Nb correlates very well with the TΔS values. Therefore, Nb can be used to represent the side-chain conformational entropy loss in the protein–protein binding process.
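A one-variable least-squares fit of the kind described above can be sketched in a few lines. The data points below are made up purely to exercise the function; they are not the values behind Figure 1, and the fitted coefficients of Equation 6 are not reproduced here.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation coefficient.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up (Nb, T*dS) points lying exactly on a line, for illustration only.
nb_values = [1.0, 2.0, 3.0, 4.0]
tds_values = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r = linear_fit(nb_values, tds_values)
```

A correlation coefficient close to 1 is what "correlates very well" would look like numerically for real data.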
Table I [...] ΔASAapol, the hydrophobic interaction energy ΔGd, the number of hydrophilic pairs Npair and the experimental binding free energies. Moreover, the electrostatic interaction energies ΔEel of 13 complexes are taken from Zhang et al. (Zhang et al., 1997) [...] ΔEel and between ΔGd and ΔASAapol. Similarly to Figure 1, Figure 2 [...] ΔGd versus ΔASAapol. It is found that the quantities Nb, Npair and ΔASAapol capture most of the significant features of the interactions involved in those complexes.
Fast empirical calculation of binding free energy
As mentioned above, Nb, Npair and ΔASAapol are related to the interface of protein complexes and correlate well with the conformational entropy change, the electrostatic interaction and the hydrophobic interaction, respectively. When the protein–protein binding free energy, ΔGcal, is written as a linear function of the three variables Nb, Npair and ΔASAapol, it can be expressed as
![]() | (7) |
[...] ΔASAapol deduced from the interface can describe well the binding free energy of protein–protein association.
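The linear form described above can be evaluated with a small helper such as the one below. This is only a sketch: the coefficient names are hypothetical, the presence of a constant term is an assumption, and the numerical coefficients in the example are placeholders rather than the fitted values of Equation 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearScale:
    """Coefficients of a linear scale in Nb, Npair and the buried apolar area.

    All four coefficients are user-supplied; none are taken from the paper.
    """
    coef_nb: float
    coef_npair: float
    coef_asa_apol: float
    constant: float = 0.0  # assumed optional constant term

    def binding_free_energy(self, nb, npair, delta_asa_apol):
        """Return the linear combination of the three interface variables."""
        return (self.coef_nb * nb
                + self.coef_npair * npair
                + self.coef_asa_apol * delta_asa_apol
                + self.constant)


# Placeholder coefficients and interface values, for illustration only.
scale = LinearScale(coef_nb=1.0, coef_npair=-0.5, coef_asa_apol=-0.01, constant=2.0)
dg = scale.binding_free_energy(nb=10, npair=4, delta_asa_apol=500.0)
```

Keeping the coefficients in one immutable object makes it easy to refit them on a different training set without touching the scoring code.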
Table III
Application of the score function in protein–protein docking
Currently, the approach of rescoring docked conformations has made some progress and has been used to rescore the lower root mean square deviation (r.m.s.d.) conformations (Norel et al., 2001; Smith and Sternberg, 2002). The main terms used in the rescoring are the statistics of residue–residue contacts across the interfaces of complexes and electrostatics. As discussed above, we presented an empirical method based on three variables extracted from the binding interface. With this method, the free energy of protein–protein association can be calculated quickly and, for the complexes studied, accurately. In particular, the conformational entropy is taken into account, and the analysis above supports the accuracy of this term. Therefore, we applied this approach as a scoring function to rank the putative docked structures in the protein–protein docking problem.
Table IV summarizes the docking results for the 10 protein–protein complexes, including the name of each complex, the ranking position of the first near-native structure using our scoring function and the corresponding r.m.s.d. from the X-ray crystallographic complex. For the first six cases, the complexes were reconstructed from the structures of the co-crystallized proteins. In these cases, the conformations of the two molecules are already adapted to each other. For this set of docking simulations, XX was added after the PDB code in the protein column. For the following two cases, the complexes were reconstructed from structures in which one protein is taken from the complex and the other from the free form. For this set of docking simulations, FX or XF was added after the PDB code, where F and X designate the free form and co-crystallized form, respectively. If the complexes were reconstructed from the free forms of both proteins, FF was added to the PDB code. A docked geometry is taken into account only if the r.m.s.d. of its backbone atoms from the X-ray structure is <4.0 Å. For the 10 tested complexes, native-like docked geometries are found in every case, and in six cases they lie within the 10 top-ranking solutions. This indicates that our scoring function is able to distinguish the true binding mode from the remaining false ones.
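The ranking statistic reported above (the position of the first near-native structure) can be computed as in the sketch below. It assumes that each retained binding mode carries a score and a backbone r.m.s.d. from the crystal structure, and that a lower (more negative) score ranks first; the function name and the example poses are hypothetical.

```python
def first_near_native_rank(poses, rmsd_cutoff=4.0):
    """Return (rank, rmsd) of the best-scored pose with rmsd below the cutoff.

    poses: iterable of (score, rmsd) pairs; lower score is assumed to be better.
    Ranks are 1-based. Returns None if no pose is near-native.
    """
    ranked = sorted(poses, key=lambda pose: pose[0])
    for rank, (_score, rmsd) in enumerate(ranked, start=1):
        if rmsd < rmsd_cutoff:
            return rank, rmsd
    return None


# Hypothetical (score, r.m.s.d. in angstroms) pairs for four retained modes.
modes = [(-8.0, 12.5), (-11.2, 9.1), (-10.4, 2.3), (-6.0, 1.1)]
result = first_near_native_rank(modes)
```

In this made-up example the best-scored mode is a false positive, so the first near-native structure appears at rank 2.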
Figure 4
A general form of rescoring function is required to distinguish reliably the true binding mode from the remaining false ones. Speed is also an important factor for rescoring functions. As the results show, the rescoring function presented here is relatively fast and effective for scoring the putative conformations. It is expected that the rescoring function is applicable to protein–protein docking.
Conclusions
The interface information for protein–protein complexes is important for understanding protein–protein interactions and recognition. In this work, we investigated useful variables from the interfaces and developed a simple scale to calculate the binding free energy of protein–protein association. The variables are used as a scoring function in the protein–protein docking calculation. As discussed above, the side-chain accessible number, Nb, is a reasonable descriptor of the loss of side-chain conformational entropy in the binding process. The interface information for complexes has great potential for describing protein–protein association and the corresponding three variables can be used to calculate the binding free energy. The model is advantageous in terms of saving calculation time and ease of use. However, the binding free energy function presented here is based on an approximate treatment in which the molecule is treated as a rigid body. Today it is necessary to develop both new docking methods for elucidating the details of specific interactions at the atomic level and computational tools for providing information on protein–protein association in various environments (Camacho and Vajda, 2002). The interface information for complexes may give some helpful hints on the subject and some ideas about specific associations. Work on improving the accuracy of the binding free energy and on molecular flexibility is currently under way.
| Notes |
|---|
1 To whom correspondence should be addressed. E-mail: cxwang{at}bjpu.edu.cn
| Acknowledgments |
|---|
We thank Professor J.Janin for providing the docking package. We also thank Dr Ben Zhuo Lu for helpful discussions. This work was supported in part by the Chinese Natural Science Foundation (Nos 299925902, 30170230 and 10174005).
| References |
|---|
Berendsen,H.J.C., van der Spoel,D. and van Drunen,R. (1995) Comput. Phys. Commun., 91, 43–56.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Camacho,C.J. and Vajda,S. (2002) Curr. Opin. Struct. Biol., 12, 36–40.
Camacho,C.J., Weng,Z., Vajda,S. and DeLisi,C. (1999) Biophys. J., 76, 1166–1178.
Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265–269.
Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280.
Cherfils,J., Bizebard,T., Knossow,M. and Janin,J. (1994) Proteins: Struct. Funct. Genet., 18, 8–18.
Di Nola,A., Berendsen,H.J.C. and Edholm,O. (1984) Macromolecules, 17, 2044–2050.
Goodsell,D.S. and Olson,A.J. (1990) Proteins: Struct. Funct. Genet., 8, 195–202.
Jackson,R.M. and Sternberg,M.J. (1995) J. Mol. Biol., 250, 258–275.
Karplus,M. and Kushick,J.N. (1981) Macromolecules, 14, 325–332.
Karplus,M. and Petsko,G.A. (1990) Nature, 347, 631–639.
King,B.L., Vajda,S. and DeLisi,C. (1996) FEBS Lett., 384, 87–91.
Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.
Lin,S.L., Nussinov,R., Fischer,D. and Wolfson,H.J. (1994) Proteins: Struct. Funct. Genet., 18, 94–101.
Mezei,M. and Beveridge,D.L. (1986) Ann. N. Y. Acad. Sci., 482, 1–23.
Miyamoto,S. and Kollman,P.A. (1993) Proteins: Struct. Funct. Genet., 16, 226–245.
Nauchitel,V., Villaverde,M.C. and Sussman,F. (1995) Protein Sci., 4, 1356–1364.
Norel,R., Sheinerman,F., Petrey,D. and Honig,B. (2001) Protein Sci., 10, 2147–2161.
Novotny,J., Bruccoleri,R.E. and Saul,F.A. (1989) Biochemistry, 28, 4735–4749.
Pickett,S.D. and Sternberg,M.J.E. (1993) J. Mol. Biol., 231, 825–839.
Reynolds,C.A., King,P.M. and Richards,W.G. (1992) Mol. Phys., 76, 251–275.
Sezerman,U., Vajda,S., Cornette,J. and DeLisi,C. (1993) Protein Sci., 2, 1827–1843.
Smith,G.R. and Sternberg,M.J.E. (2002) Curr. Opin. Struct. Biol., 12, 28–35.
Smith,K.C. and Honig,B. (1994) Proteins: Struct. Funct. Genet., 18, 119–132.
Stoddard,B.L. and Koshland,D.E.,Jr. (1993) Proc. Natl Acad. Sci. USA, 90, 1146–1153.
Takamatsu,Y. and Itai,A. (1998) Proteins: Struct. Funct. Genet., 33, 62–73.
Vajda,S., Weng,Z.P., Rosenfeld,R. and DeLisi,C. (1994) Biochemistry, 33, 13977–13988.
Vajda,S., Weng,Z.P. and DeLisi,C. (1995) Protein Sci., 8, 1081–1092.
Vajda,S., Sippl,M. and Novotny,J. (1997) Curr. Opin. Struct. Biol., 2, 222–228.
Weng,Z.P., DeLisi,C. and Vajda,S. (1997) Protein Sci., 6, 1976–1984.
Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84.
Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707–726.
Received January 30, 2002; revised April 26, 2002; accepted May 21, 2002.