Protein Engineering, Vol. 13, No. 4, 239-245,
April 2000
© 2000 Oxford University Press
Evaluation of proteinprotein association energies by free energy perturbation calculations
Department of Chemistry, University of Tromsø, N-9037 Tromsø, Norway. E-mail: arne.smalas{at}chem.uit.no
| Abstract |
|---|
|
|
|---|
The association energy upon binding of different amino acids in the specificity pocket of trypsin was evaluated by free energy perturbation calculations on complexes between bovine trypsin (BT) and bovine pancreatic trypsin inhibitor (BPTI). Three simulations of mutations of the primary binding residue (P1) were performed (P1-Ala to Gly, P1-Met to Gly and P1-Met to Ala) and the resulting differences in association energy (

Ga) are 2.28, 5.08 and 2.93 kcal/mol for P1-Ala to Gly, P1-Met to Gly and to Ala with experimental values of 1.71, 4.62 and 2.91 kcal/mol, respectively. The calculated binding free energy differences are hence in excellent agreement with the experimental binding free energies. The binding free energies, however, were shown to be highly dependent on water molecules at the proteinprotein interface and could only be quantitatively estimated if the correct number of such water molecules was included. Furthermore, the cavities that were formed when a large amino acid side-chain is perturbed to a smaller one seem to create instabilities in the systems and had to be refilled with water molecules in order to obtain reliable results. In addition, if the protein atoms that were perturbed away were not replaced by water molecules, the simulations dramatically overestimated the initial state of the free energy perturbations.
Keywords: free energy perturbation/molecular association/protein engineering/trypsin
| Introduction |
|---|
|
|
|---|
Serine proteinases and their natural inhibitors belong to the most comprehensively studied models of proteinprotein recognition (Bode and Huber, 1992

Ga° (
GX
GGly) as a quantitative measure of the interaction of the side-chain of the X residue with the S1 cavity, as previously suggested (Huang et al., 1995
Knowledge of structural and functional properties of enzymatic systems is of great importance if we are to design enzymes with specific properties as one may desire through protein engineering. Interplay between experimental and theoretical research has a central role in this context. Combining computer simulations based on empirical energy functions with site-directed mutagenesis and X-ray crystallography offers a unique opportunity to investigate the level of accuracy of theoretical modeling of highly complex processes, such as proteinprotein recognition. Molecular recognition in general and proteinprotein recognition in particular represent one of the most complex issues currently addressed by theoretical modeling techniques. Significant progress has been made in studies of proteinligand interactions (Kollman, 1993
; Åqvist et al., 1994
; Hansson and Åqvist, 1995
), whereas much less has been achieved in studies of proteinprotein recognition. Formation of proteinprotein complexes is expected to obey the same physical principles as the proteinligand interactions, but in the former case a more delicate balance between entropic and enthalpic contributions is anticipated, which may not be easily reproduced by computer simulation approaches (Muegge et al., 1998
).
Despite the success in providing rational explanations and predictions of free energies, most current work based on free energy perturbation (FEP) simulations suffers from the same sort of problems. First, FEP calculations reside on statistical mechanics and, in theory, as the number configurations sampled increases, the free energy should converge towards the experimental. Second, truncation of non-bonded potentials, particularly long-range electrostatics, tends to introduce instabilities in the simulated trajectories. Correct treatment of long-range potentials has received considerable attention during recent years. Hansson and Åqvist (1995) showed that using a too small cut-off suppressed some of the solvent's resistance to polarization by charged groups and as a result the simulations did not converge. Recently, McCarrick and Kollman (1999) used a dual cut-off (8 Å/12 Å) which seemed to reduce the overall drift in the average coordinates. Problems related to truncation of the non-bonded potentials can be resolved by using a local reaction field approach (LRA) (Lee and Warshel, 1992
), which uses a Taylor expansion that yields approximately the same results as infinte cut-offs at a modest cost. Still, the problem of sampling enough configurations to obtain a reliable, converged free energy is far more difficult. Therefore, in order to improve our calculated binding free energies, we use an average of several simulations of different length.
The present study was initiated in order to establish a protocol for free energy perturbation calculations that enable us to predict the energies and mechanisms of proteinprotein recognition. The model system is well suited since high-resolution structures of the proteinprotein complexes are available, along with experimental data for the association energies. The necessity of atomic resolved experimental structures of both the initial and final states was in particular addressed in the study. A series of perturbations, different protocols and simulation times were carried out for both a small (P1-Ala to Gly) and a large (P1-Met to Gly) perturbation of the complex. We find that the experimental association energies can be estimated in a quantitative manner only if the correct numbers and positions of water molecules at the proteinprotein interface are included. The quantitative estimates of binding free energy differences are dramatically improved when additional water molecules that are present in the crystal structure of the final state are taken into consideration. The protocol established here will therefore form the basis for our future calculations of binding free energies in a predictive manner on the model system.
| Materials and methods |
|---|
|
|
|---|
Generation of initial structures
Generation of initial structures was done by solvation of the recently reported X-ray structures (Helland et al., 1999
) of bovine trypsin (BT) in complex with bovine pancreatic trypsin inhibitor (BPTI) P1 variants. X-ray structures of complexes between BT and BPTI P1-Met and P1-Gly were available, the latter complex solved at both room temperature (295 K) and cryogenic temperature (120 K). The BTBPTI P1-Ala complex was modeled from the coordinates of BTBPTI P1-Gly. In order to put all the free energy calculations on a common basis, the initial starting structures were solvated using an identical protocol. This was done by placing a sphere of equilibrated TIP3P (Jorgensen et al., 1983
) water molecules with a radius of 18 Å centered at the P1 C
atom and water molecules closer than 2.8 Å to protein atoms were deleted. Water molecules with a distance <2.3 Å to other water molecules, as judged by the oxygenoxygen distance, were also removed. These structures were then subjected to energy minimization using 500 steps of steepest decent and 2000 steps of conjugate gradient.The coordinates for BPTI free in solution were generated by removal of BT from the respective complex. The BPTI structures were then subjected to the same solvation and minimization procedures as used for the BTBPTI complexes. After minimization the structures were equilibrated by performing a short simulation (15 ps) with a weak harmonic positional restraint of 0.5 kcal/mol on all heavy atoms. Structures after equilibration are now referred to as initial structures.
Computational procedure
All the calculations were carried out using the AMBER 5.0 software package (Pearlman et al., 1995
) and the Cornell et al. (1995) force field. Calculations of binding free energies for molecular systems were performed on the basis of statistical mechanics implemented into regular molecular mechanics force fields. A number of statistical mechanical procedures exist to determine free energy differences between two closely related states of a system based on molecular dynamics (MD) simulation techniques (Kollman, 1993
). The thermodynamic integration approach is now reasonably well established and the details of the methodology can be found in a number of excellent reviews (e.g. Kollman, 1993; Gilson et al., 1997); here we only briefly outline the method used in this work.
Free energy calculations applied to molecular association in solution resides on the equation depicted in Figure 1
, where E, I and I' denote enzyme, inhibitor and modified inhibitor, respectively. The free energy of association of I and I' to E can be measured experimentally, while theoretical methods can `mutate' I into I' free in solution and when bound to E. Considering that the free energy is a state function, the difference between the experimentally measured and calculated free energies should be equal:
![]() |
![]() |
and <
H/
>
is determined at various steps (
) of the integration.
|
The free energy calculations used a `frozen' atom procedure, where all residues (including water molecules) with at least one atom within 15 Å from the P1 residue were allowed to move freely. Residues not fulfilling this criterion were frozen to their initial positions as defined in their respective initial structures using the belly option in AMBER. The same moving set was used in all protein calculations, and also in all solvent simulations. A time step of 0.001 ps was used and the non-bonded pairlist was updated every 20 steps. The electrostatic interactions were treated with a group-based cut-off with a radius of 15 Å for the P1 residue and 10 Å for all other residues including water molecules. All other non-bonded interactions were treated with a 10 Å cut-off. In the first part of each simulation, only the charges were perturbed. The second simulations perturbed vdW radii and other bonded parameters. The decoupling prevents instabilities resulting from the presence of charges at atoms or groups which have small vdW radii. Dynamically modified windows were used in all calculations (Pearlman and Kollman, 1989
Average binding free energies were calculated from four free energy calculations of different lengths and to obtain an upper limit for the hysteresis, standard deviations for these four simulations were used. The hysteresis in the calculated binding free energies is normally obtained by taking the difference between forward and reverse runs (Miyamoto and Kollman, 1993
; Wang et al., 1993
), but several procedures for estimation of an upper limit for the hysteresis have been suggested. Using several simulations also allows the convergence of the simulated free energies to be monitored. Assuming an average error of ±40% in the individual
Gs (Krowarsch et al., 1999
), which corresponds to about 0.2 kcal/mol, the hysteresis in the experimental 
Gs becomes about 0.3 kcal/mol.
| Results and discussion |
|---|
|
|
|---|
The crystal structure analysis of trypsinBPTI complexes showed that the S1 pocket is packed with two to six water molecules dependent on the type of P1 side-chain (Helland et al., 1999
glycine, methionine
glycine and methionine
alanine.
|
One appealing feature of the free energy calculations is the opportunity to divide the total binding free energy into its contribution, for example electrostatic and vdW, known as decoupling. This was achieved by first running a free energy calculation that changed the charges of atoms from the initial to the final state and at the same time the vdW parameters are kept to those corresponding to the initial state. Second, the vdW parameters along with bond lengths and/or force constants were changed from initial to final state, while the charges of the atoms are kept to those of the final state. Perturbation of the electrostatic part is normally performed in a straightforward manner, but can be more difficult if it involves a creation or removal of a net charge. In such cases, one needs to ensure that the Born term does not enter into the free energies (Hansson and Åqvist, 1995
Alanine to glycine
The simplest substitution, from P1-Ala to Gly, was used to test the system and for deriving a general protocol for free energy calculations on the BTBPTI system, using P1-Gly as reference state. The BTBPTI P1-Ala complex was first modeled by using the room-temperature structure of BT P1-Gly. Coordinates for the sixth water molecule (Sol 6) were generated from superimposition of the cryogenic and the room-temperature structure. Both the initial (alanine) and final (glycine) states therefore had a water-mediated hydrogen bonding network formed by six water molecules in the S1 pocket. Individual
Gs and also total 
Gs are presented in Table Ia
. The total 
Gs range from 0.28 to 1.83 kcal/mol, with an average of 0.79 kcal/mol. Only the simulation using 1.0 ps for equilibration and data collection time is in quantitative agreement with the experimental value of 1.71 kcal/mol. It was therefore of interest to perform four new calculations based solely on the room-temperature structure and these calculations reside on both an initial and a final structure that has an S1 pocket filled with five water molecules. Table Ib
shows that three of the total 
Gs are in reasonable agreement with the experimental value of 1.71 kcal/mol, whereas the use of a 1.0 ps equilibration and data collection time estimates a free energy difference of 3.17 kcal/mol. The average binding free energy difference for mutation P1-Ala to Gly is 2.28 kcal/mol, with a standard deviation of 0.70 kcal/mol. From the individual
Gs it appears as the source of the variance in the total 
Gs arise from the vdW term from the simulations of free BPTI in solution. Furthermore, the total 
Gs is dominated by the vdW interaction term, which favours alanine by 2.57 kcal/mol (average value). Compared with the first four calculations (Table Ia
), the change in the calculated binding free energy arises from the vdW term for the simulations of the complex, as this term has on average increased by 1.55 kcal/mol. This indicates that the sixth water molecule creates unfavorable interactions within the BTBPTI P1-Ala binding pocket. In contrast, the electrostatic term is more or less unchanged from the calculations with six water molecules (Table Ia and b
).
|
Crystallographic studies of P1 variants of turkey ovomucoid third domain inhibitor (OMTKY3) in complex with Streptomyces griseus proteinase B (SGPB) revealed a unique solvent structure in the S1 pocket for each complex investigated (Huang et al., 1995

G is more or less unchanged regardless of simulation length or of the presence of five or six water molecules in the S1 pocket. The electrostatic contribution for mutation of P1-Ala to Gly therefore seems to converge rapidly. Again, the electrostatic part of the binding free energy turns out to be unfavorable for binding of BPTI P1-Ala. On the other hand, the vdW term favors binding of alanine and 
GvdW is dominated by the
GvdW from the complex. Compared with the results from the previous calculation with five water molecules (Table Ib
Gtotal is identical. Trypsin therefore seems to be able to accommodate six water molecules in the binding pocket when bound to BPTI P1-Ala, as also seen for the corresponding complex between SGPB and OMTKY3 (Huang et al., 1995Methionine to glycine
The structure of BT P1-Met showed that the P1 residue is bent, with the C
atom rotated slightly out of the specificity pocket (Figure 2
). When methionine is entering the pocket, two solvent molecules (Sol 5 and Sol 6) are expelled. Correct modeling of water molecules can be of significant importance, as seemed to be case for mutation of P1-Ala to Gly. In order to address this issue further, two estimates for the binding free energy difference between P1-Met and P1-Gly were calculated. First, the presence of both Sol 5 and Sol 6 was discarded (Table IIa
), which means that only four water molecules were present in the S1 pocket in both the initial and final states. The total binding free energy ranges from 4.25 to 11.52 kcal/mol and is dominated by the vdW interaction term favouring binding of BPTI P1-Met. The average binding free energy becomes 8.78 kcal/mol, with a standard deviation of 3.16 kcal/mol. This correctly predicts a far stronger binding for the P1-Met variant of BPTI compared with P1-Gly, but is only in qualitative agreement with the experimental value of 4.62 kcal/mol. The discrepancy can be explained from the fact that when performing a mutation as large as methionine to glycine, a cavity of unoccupied space will be generated and the resulting free energy will overestimate the binding of methionine. The calculations were therefore repeated, where the P1-Met side-chain was mutated to glycine plus a water molecule in accordance with the five water molecule arrangement in the active site of the P1-Gly complex.
|
In order to model the effects of the fifth water (Sol 5), the CH3 group at the end of the methionine side-chain was perturbed to a water molecule. A superimposition of the X-ray structures of BT P1-Met and BT P1-Gly (room temperature) showed that Sol 5 is located at approximately the C
position. The resulting average binding free energy difference becomes 5.08 kcal/mol, with a standard deviation of 1.08 kcal/mol. Again, the 
Gtotal is dominated by the vdW term, with an average of 6.47 kcal/mol. Creation of cavities within proteins is known to be destabilizing (Xu et al., 1998Methionine to alanine
The free energy calculations for mutation of P1-Ala to Gly indicated that the S1 pocket of BT in the complex with BPTI P1-Ala should contain five water molecules. To address the importance of an additional water molecule at the predicted position, we first carried out four calculations with a direct perturbation of P1-Met to Ala (Table IIIa
). Again, the binding free energy difference is far from the experimental value, but has the correct sign. In order to model the presence of Sol 5, the procedure described previously for the mutation of P1-Met to Gly was used. The binding free energy difference is 9.79 kcal/mol when the side-chain of P1-Met is mutated directly to Ala without considering the additional Sol 5 (Table IIIa
). In contrast, 
Gtotal is 2.93 kcal/mol when the effect of Sol 5 is considered (Table IIIb
). The standard deviation is 0.92 kcal/mol for mutation of P1-Met to Ala when Sol 5 is present in the final (Ala) state. Again, the total binding free energy is dominated by the vdW term which favors binding of BPTI-P1-Met by 4.33 kcal/mol (average). In terms of electrostatics, binding of the P1-Ala variant is favored by 1.40 kcal/mol (average).
|
Assuming that the difference between P1-X and P1-Gly is a reasonable measure of the interaction energy formed between the P1-X residue and the S1 pocket, other relative free energies of binding can also be estimated through additivity. Thus, if this simple additivity concept performs well, one can then predict and also explain binding free energies without performing all necessary calculations. By using the two extensive free energy calculations on the mutation of P1-Ala to Gly and P1-Met to Gly, the additivity concept predicts a binding free energy difference for mutation of P1-Met to Ala of 2.80 kcal/mol, which corresponds well with the directly calculated value of 2.93 kcal/mol.
Conclusion
The present study demonstrates that the water molecules at the proteinprotein interface are of crucial importance. In response to change in size of the P1 residue entering the specificity pocket, water molecules tend to redistribute within the S1 pocket in order to optimize the enzymeinhibitor interactions. A clear trend was observed for mutations of P1-Met to Gly/Ala, as the calculations performed by directly changing the P1 residue predicted the correct relative binding order, but are not in quantitative agreement with experiments. Only when the presence of the additional water molecules in the S1 pocket was modeled explicitly was the binding free energy difference in quantitative agreement with the experimental values. Inclusion of a fifth water molecule in the S1 pocket of the BTBPTI complex of the P1-Gly and Ala variants when mutated from P1-Met reduced the total 
G in the range 47 kcal/mol and become 5.08 and 2.93 kcal/mol for the two perturbations, respectively (Tables II and III![]()
). The corresponding experimental values are 4.62 and 2.91 kcal/mol. This can be explained by the observation that a near removal of a large side-chain such as methionine creates a large cavity. Empty spaces or cavities in the protein interior are known to be destabilizing and consequently the calculated binding free energy difference favors the methionine state too strongly. P450cam in complex with 2-phenylimidazole contains a water molecule at the proteinligand interface that stabilizes the complex by 2.8 kcal/mol (Helms and Wade, 1995
), which is comparable to the results found in this work. The present study therefore shows that relatively precise values for binding free energy differences involving two proteins can be obtained by the free energy pertubation method, but detailed knowledge about the structure of both the initial and final states is required. It is possible, however, that much longer simulations would have allowed bulk water molecules to fill the cavity formed by removal of methionine. This is not desirable, however, as one wants to obtain an estimate of the binding free energy within a reasonable amount of time.
| Notes |
|---|
1 To whom correspondence should be addressed
| Acknowledgments |
|---|
The work was supported by grants from the Norwegian Research Council.
| References |
|---|
|
|
|---|
Apostoluk,W. and Otlewski,J. (1998) Proteins, 32, 459474.[Web of Science][Medline]
Åqvist,J., Medina,C. and Samuelsson,J.E. (1994) Protein Engng, 7, 385391.
Bode,W. and Huber,R. (1992) Eur. J. Biochem., 204, 433451.[Web of Science][Medline]
Cornell,W.D. et al. (1995) J. Am. Chem. Soc., 117, 51795197.
Gilson,M.K., Given,J.A., Bush,B.L. and McCammon,J.A. (1997) Biophys. J., 72, 10471069.[Web of Science][Medline]
Hansson,T. and Åqvist,J. (1995) Protein Engng, 8, 11371144.
Helland,R., Otlewski,J., Sundheim,O., Dadlez,M. and Smalås,A.O. (1999) J. Mol. Biol., 287, 923942.[Web of Science][Medline]
Helms,V. and Wade,R.C. (1995) Biophys. J., 69, 810824.[Web of Science][Medline]
Huang,K., Lu,W., Anderson,S., Laskowski,M.,Jr and James,M.N. (1995) Protein Sci., 4, 19851997.[Web of Science][Medline]
Jorgensen,W.L., Chandrasekhar,J., Madura,J.D., Impey,R.W. and Klein,M.L. (1983) J. Chem. Phys., 79, 926935.
Kollman,P.A. (1993) Chem. Rev., 93, 23952417.[Web of Science]
Krowarsch,D., Dadlez,M., Buczek,O., Krokoszynska,I., Smalås,A.O. and Otlewski,J. (1999) J. Mol. Biol., 289, 175186.[Web of Science][Medline]
Lee,F.S. and Warshel,A. (1992) J. Chem. Phys., 97, 31003107.
Lu,W. et al. (1997) J. Mol. Biol., 266, 441461.[Web of Science][Medline]
McCarrick,M.A. and Kollman,P.A. (1999) J. Comput.-Aided Mol. Des., 13, 109121.
Miyamoto,S. and Kollman,P.A. (1993) Proteins, 16, 226245.[Web of Science][Medline]
Muegge,I., Schweins,T. and Warshel,A. (1998) Proteins, 30, 407423.[Web of Science][Medline]
Pearlman,D.A., Case,D.A., Caldwell,J.W., Ross,W.S., Cheatham,T.E., Debolt,S., Ferguson,D., Seibel,G. and Kollman,P.A. (1995) Comp. Phys. Commun., 91, 141.
Pearlman,D.A. and Kollman,P.A. (1989) J. Chem. Phys., 90, 24602470.
Qasim,M.A., Ganz,P.J., Saunders,C.W., Bateman,K.S., James,M.N. and Laskowski,M.,Jr (1997) Biochemistry, 36, 15981607.[Medline]
Schechter,I. and Berger,A. (1967) Biochem. Biophys. Res. Commun., 27, 157162.[Web of Science][Medline]
Vlassi,M., Cesareni,G. and Kokkinidis,M. (1999) J. Mol. Biol., 285, 817827.[Web of Science][Medline]
Wang,C.X., Shi,Y.Y., Zhou,F. and Wang,L. (1993) Proteins, 15, 59.[Web of Science][Medline]
Xu,J., Baase,W.A., Baldwin,E. and Matthews,B.W. (1998) Protein Sci., 7, 158177.[Web of Science][Medline]
Received November 8, 1999; revised January 24, 2000; accepted February 8, 2000.
![]()
CiteULike
Connotea
Del.icio.us What's this?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||



