PEDS Advance Access published online on August 5, 2007
Protein Engineering Design and Selection, doi:10.1093/protein/gzm036
Predicting the affinity of epitope-peptides with class I MHC molecule HLA-A*0201: an application of amino acid-based peptide prediction
1 Key Laboratory of Subtropical Bioresource Conservation and Utilization, Guangxi University, Nanning, Guangxi 530004, China 2 Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530004, China 3 Gordon Life Science Institute, San Diego, California 92130, USA
4 To whom correspondence should be addressed. E-mail: duqishi{at}yahoo.com
| Abstract |
|---|
A new peptide design strategy, the amino acid-based peptide prediction (AABPP) approach, is applied to predict the affinity of epitope-peptides with the class I MHC molecule HLA-A*0201. The AABPP approach uses two sets of predictive coefficients: the first consists of coefficients for the physicochemical properties of amino acids, and the second consists of weight factors for the residue positions in a peptide sequence. An iterative double least square technique is introduced to determine the two sets of coefficients alternately from a benchmark dataset. The coefficients obtained at convergence of this iterative process are then used to predict the bioactivities of query peptides. In the AABPP algorithm, the following eight physicochemical properties are used as amino acid descriptors: (i) lipophilic indices, (ii) hydrophilic indices, (iii) lipophilic surface area, (iv) hydrophilic surface area, (v) α-potency indices, (vi) β-potency indices, (vii) coil-potency indices and (viii) volume of amino acid side chains. Compared with existing methods in this area, a remarkable advantage of the current approach is that there is no need to know the exact conformation of a query peptide or its alignment with a template; these two steps are indispensable for the other methods but cannot always be carried out successfully. It is anticipated that the AABPP approach will become a powerful tool for peptide drug design, or at least play a complementary role to existing methods.
Keywords: computational vaccinology/Class I MHC/epitope-peptide/HLA-A*0201/peptide-based vaccines/physicochemical properties
| Introduction |
|---|
Recent developments in bioinformatics have provided various tools for designing peptide inhibitors for drug development (Chou, 1993a, b, 1994, 1996
The pathway from protein sequence to vaccine development is lengthy and costly (Buteau et al., 2002), entailing the development of binding assays to test the affinity of the selected peptides for the MHC molecules, the measurement of T-cell responses in in vitro assays, and the ultimate test of immunogenicity in vivo. It is therefore highly desirable to develop an automated method for screening candidate peptides before assay development, i.e. a computational approach for predicting immunogenicity.
MHC class I epitopes bind to a well-defined groove on the MHC molecule. The main sources of this binding specificity are the anchor sites, the pockets in the MHC molecule that accommodate particular peptide side chains. Peptides that bind to HLA-A*0201 have a restricted length of 9 ± 1 amino acids and require free N- and C-termini. In addition to a specific length, a combination of two main anchor residues is required. These anchors have been described as Leu at position 2 and Leu or Val at the C-terminal end (Falk et al., 1991). The presence of anchors is necessary, but not sufficient, for high-affinity binding. Prominent roles have also been demonstrated for several other sequence positions (1, 3 and 7), the so-called secondary anchor residues (Madden et al., 1993; Ruppert et al., 1993; Madden, 1995). Although a large number of peptides have been synthesized and tested, relatively little is known about the nature of the forces involved in the peptide-MHC interaction. In the current study, the newly developed peptide design method, amino acid-based peptide prediction (AABPP), is applied to predict the affinity of epitope-peptides with the class I MHC molecule HLA-A*0201.
| Theory and method |
|---|
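In the AABPP approach, the binding free energy, or bioactivity, between peptide ligand $P_i$ and its protein receptor is approximated as the sum of the contributions from all amino acid residues of the peptide ligand $P_i$, i.e.

$$\Delta G^{\circ}_i = \sum_{j=1}^{M} b_j\,\Delta g_{i,j} \tag{1}$$

where $\Delta g_{i,j}$ is the free energy contribution of the residue at position $j$ of peptide $P_i$ and $M$ is the total number of residues involved. The contributions $\Delta g_{i,j}$ of individual residues may carry different weights in the total free energy $\Delta G^{\circ}_i$ because of their different microenvironments and roles in bioactivity. We use the set of sensitive coefficients $\{b_j\}$ to describe these microenvironments and roles. The free energy contribution $\Delta g_{i,j}$ of residue $j$ of peptide $P_i$ is in turn described by a series of $L$ physical and chemical properties of amino acids,

$$\Delta g_{i,j} = \sum_{l=1}^{L} a_l\, v_{i,j,l} \tag{2}$$

where $v_{i,j,l}$ is the value of the $l$th property of the residue at position $j$ of peptide $P_i$ and $\{a_l\}$ are the corresponding sensitive coefficients. Substituting $\Delta g_{i,j}$ of Eq. (2) into Eq. (1) and converting the binding free energy $\Delta G^{\circ}_i$ into the bioactivity $pK_i = -\log K_i$ of the peptide $P_i$ (which is linearly related to $\Delta G^{\circ}_i$; the constant factor is absorbed into the coefficients), we obtain the following simultaneous equations,

$$pK_i = \sum_{j=1}^{M}\sum_{l=1}^{L} b_j\, a_l\, v_{i,j,l}, \qquad i = 1, 2, \ldots, N \tag{3}$$

where $N$ is the number of peptides in the training dataset.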
Although the transformation from Eq. (1) to Eq. (3) is not a rigorous theoretical derivation, it can be used to explain the physical meaning of the linear free energy equation and the roles of the two sets of coefficients, as well as some of the theoretical considerations in our model. As in all other QSAR approaches, the linear free energy equation is not unique, and a careful selection of the physicochemical properties of amino acids for a specific system may improve the predictive ability considerably. The binding free energy can be refined by using other linear free energy equations and by optimizing the physicochemical parameters of amino acids.
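To make the notation concrete, the sketch below builds a small 3D data matrix of property values $v_{i,j,l}$ from peptide sequences and evaluates the bilinear model of Eq. (3). The property table `PROPS`, its three columns and the helper names `encode` and `predict` are hypothetical placeholders chosen for illustration; they are not the published AABPP amino acid parameters.

```python
"""Sketch: encode equal-length peptides as a 3D data matrix V (N x M x L)
and evaluate Eq. (3), pK_i = sum_j sum_l b_j * a_l * v[i][j][l].

The property values in PROPS are HYPOTHETICAL placeholders, not the
published amino acid parameters used by AABPP."""

# Hypothetical table: residue -> [property_1, property_2, property_3].
# Only a handful of residues are listed to keep the sketch short.
PROPS = {
    "L": [1.0, 0.2, 0.9],
    "V": [0.8, 0.3, 0.7],
    "A": [0.4, 0.5, 0.3],
    "G": [0.0, 0.6, 0.1],
    "K": [-0.9, 1.0, 0.8],
}


def encode(peptides):
    """Return V as nested lists, V[i][j][l] = property l of residue j of peptide i."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("all peptides must have the same length M")
    return [[list(PROPS[res]) for res in pep] for pep in peptides]


def predict(V, a, b):
    """Eq. (3): weight property l by a[l] and sequence position j by b[j]."""
    return [
        sum(b[j] * sum(a[l] * v for l, v in enumerate(residue))
            for j, residue in enumerate(peptide))
        for peptide in V
    ]


if __name__ == "__main__":
    V = encode(["LLGV", "KAGL"])        # N = 2 peptides, M = 4 positions
    a = [1.0, 1.0, 1.0]                 # L = 3 property coefficients
    b = [0.5, 1.0, 0.2, 1.0]            # one weight per position
    print(predict(V, a, b))
```

In a real application each of the eight properties listed in the abstract would be one column of the table, so every 9-residue peptide would contribute a 9 × 8 slice of the data matrix.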
In Eq. (3) there are two sets of coefficients: $\{a_l\}$ are the sensitive coefficients of the physicochemical parameters considered, and $\{b_j\}$ are the sensitive coefficients of the amino acid residues (sequence positions) in the peptide concerned. An iterative double least square (IDLS) technique was developed to determine the coefficient sets $\{a_l\}$ and $\{b_j\}$ alternately by solving the three-dimensional simultaneous equations. Given a set of initial values $\{a_l^{(0)}\}$, the 3D data matrix $V_{N\times M\times L}$, whose elements $v_{i,j,l}$ are the property values, can be reduced to a 2D data matrix $D^{(1)}_{N\times M}$ with the elements given by

$$d^{(1)}_{i,j} = \sum_{l=1}^{L} a^{(0)}_l\, v_{i,j,l} \tag{4}$$

Through Eq. (4), the original set of 3D simultaneous equations [Eq. (3)] is reduced to a set of 2D equations, i.e.

$$pK_i = \sum_{j=1}^{M} b^{(1)}_j\, d^{(1)}_{i,j}, \qquad i = 1, 2, \ldots, N \tag{5}$$
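Eq. (5) can easily be solved by the least-squares method, yielding the first solution for the sensitive coefficients $\{b_j^{(1)}\}$. The values of $\{b_j^{(1)}\}$ are then used to reduce the 3D data matrix $V_{N\times M\times L}$ to a 2D data matrix $T^{(1)}_{N\times L}$ with the elements given by

$$t^{(1)}_{i,l} = \sum_{j=1}^{M} b^{(1)}_j\, v_{i,j,l} \tag{6}$$

$$pK_i = \sum_{l=1}^{L} a^{(1)}_l\, t^{(1)}_{i,l}, \qquad i = 1, 2, \ldots, N \tag{7}$$

Eq. (7) is solved by least squares for $\{a_l^{(1)}\}$, and the two reductions are repeated alternately until the coefficients converge. The bioactivity of a peptide is then calculated from the converged coefficients as

$$pK_i^{\mathrm{pred}} = \sum_{j=1}^{M} b_j\,\Delta g_{i,j} = \sum_{j=1}^{M} b_j \sum_{l=1}^{L} a_l\, v_{i,j,l} \tag{8}$$

where $\Delta g_{i,j}$, evaluated from Eq. (2) with the converged $\{a_l\}$, is the contribution of amino acid $j$ of the $i$th peptide reagent to the bioactivity. The predictive error $Q$ [Eqs. (9) and (10)] measures the average deviation of the calculated bioactivities from the experimental ones.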
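The correlation coefficient is defined by

$$R = \frac{\sum_{i=1}^{N}\left(pK_i^{\mathrm{exp}}-\overline{pK}^{\mathrm{exp}}\right)\left(pK_i^{\mathrm{pred}}-\overline{pK}^{\mathrm{pred}}\right)}{\sqrt{\sum_{i=1}^{N}\left(pK_i^{\mathrm{exp}}-\overline{pK}^{\mathrm{exp}}\right)^{2}\,\sum_{i=1}^{N}\left(pK_i^{\mathrm{pred}}-\overline{pK}^{\mathrm{pred}}\right)^{2}}} \tag{11}$$

$$\overline{pK}^{\mathrm{exp}} = \frac{1}{N}\sum_{i=1}^{N} pK_i^{\mathrm{exp}}, \qquad \overline{pK}^{\mathrm{pred}} = \frac{1}{N}\sum_{i=1}^{N} pK_i^{\mathrm{pred}} \tag{12}$$

where $pK_i^{\mathrm{exp}}$ and $pK_i^{\mathrm{pred}}$ are the experimental and calculated bioactivities of peptide $P_i$.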
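The two figures of merit can be computed with a few lines of code. In this sketch $R$ is the ordinary Pearson correlation between experimental and calculated bioactivities; because the exact form of Eqs. (9) and (10) is not shown, $Q$ is assumed here to be a root-mean-square deviation, which may differ from the paper's definition. The function names `pearson_r` and `rms_error` and the demo values are hypothetical.

```python
import math


def pearson_r(exp, pred):
    """Pearson correlation between experimental and predicted pK values."""
    n = len(exp)
    if n != len(pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_e = sum(exp) / n
    mean_p = sum(pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(exp, pred))
    var_e = sum((e - mean_e) ** 2 for e in exp)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_e * var_p)


def rms_error(exp, pred):
    """ASSUMPTION: Q taken as the root-mean-square deviation; the paper's
    Eqs. (9)-(10) may define the error differently."""
    if len(exp) != len(pred) or not exp:
        raise ValueError("need two non-empty equal-length sequences")
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(exp, pred)) / len(exp))


if __name__ == "__main__":
    exp = [6.1, 5.4, 7.2, 4.9]          # hypothetical pIC50 values
    pred = [5.8, 5.6, 6.9, 5.3]
    print(f"R = {pearson_r(exp, pred):.4f}, Q = {rms_error(exp, pred):.4f}")
```

Note that $R$ is insensitive to a constant offset between calculated and experimental values, whereas an error measure such as $Q$ is not, which is why the two are reported together.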
To provide a clear picture, a flowchart is given in Fig. 1 to illustrate how the IDLS procedure solves the 3D linear equations [Eq. (3)] for the two sets of coefficients $\{a_l^{(n)}\}$ and $\{b_j^{(n)}\}$.
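The alternating procedure of Eqs. (4) to (7) can be sketched in pure Python. This is a minimal illustration, assuming no intercept term, a fixed number of cycles in place of the convergence test of Fig. 1, and synthetic data in the demo; the helper names `solve_least_squares`, `rss` and `idls` are hypothetical and not the authors' code.

```python
"""Minimal IDLS sketch for Eq. (3): pK_i = sum_j sum_l b_j * a_l * v[i][j][l].

Assumptions (not from the paper): no intercept term, a fixed number of
cycles instead of a convergence test, and synthetic data in the demo."""
import random


def solve_least_squares(X, y):
    """Minimise ||X w - y||^2 through the normal equations (X^T X) w = X^T y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    rhs = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w


def rss(X, w, y):
    """Residual sum of squares of the linear model X w against y."""
    return sum((sum(x * wi for x, wi in zip(row, w)) - yi) ** 2 for row, yi in zip(X, y))


def idls(V, pk, n_iter=20):
    """Alternate the two reduced least-squares problems, starting from a_l = 1."""
    N, M, L = len(V), len(V[0]), len(V[0][0])
    a = [1.0] * L                                  # initial guess {a_l^(0)}
    b = [0.0] * M
    history = []                                   # RSS after every half-step
    for _ in range(n_iter):
        # Eq. (4): collapse the property axis with the current a, then Eq. (5) for b.
        D = [[sum(a[l] * V[i][j][l] for l in range(L)) for j in range(M)] for i in range(N)]
        b = solve_least_squares(D, pk)
        history.append(rss(D, b, pk))
        # Eq. (6): collapse the position axis with the new b, then Eq. (7) for a.
        T = [[sum(b[j] * V[i][j][l] for j in range(M)) for l in range(L)] for i in range(N)]
        a = solve_least_squares(T, pk)
        history.append(rss(T, a, pk))
    return a, b, history


if __name__ == "__main__":
    rng = random.Random(0)
    N, M, L = 40, 9, 3                             # 40 synthetic 9-mers, 3 properties
    V = [[[rng.uniform(-1, 1) for _ in range(L)] for _ in range(M)] for _ in range(N)]
    a_true = [rng.uniform(0.5, 1.5) for _ in range(L)]
    b_true = [rng.uniform(0.0, 2.0) for _ in range(M)]
    pk = [sum(b_true[j] * a_true[l] * V[i][j][l] for j in range(M) for l in range(L))
          for i in range(N)]
    a, b, hist = idls(V, pk)
    print(f"RSS after first half-step: {hist[0]:.3e}, after last: {hist[-1]:.3e}")
```

Each half-step is an exact least-squares solve with the other coefficient set held fixed, so the residual cannot increase from one half-step to the next. Because Eq. (3) depends on the coefficients only through the products $b_j a_l$, multiplying every $a_l$ by a nonzero factor $c$ and dividing every $b_j$ by $c$ leaves all $pK_i$ unchanged; the fitted bioactivities, not the individual coefficient scales, are what the procedure determines uniquely.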
| Results and discussion |
|---|
The allele HLA-A*0201 is one of the most frequent class I alleles and is found in many different populations (Imanishi et al., 1992a
50% of the Caucasian population (Peoples et al., 1995
α-potency, β-potency and coil-potency indices (Chou and Fasman, 1974
The sequences and experimental binding affinities of the 102 peptides in the training dataset (Table II) and of the 50 peptides in the testing dataset (Table IV) are taken from the paper (Doytchinova and Flower, 2001
The IDLS technique described in the Theory and method section is used to study the binding affinity of peptides with the class I MHC molecule HLA-A*0201, based on the experimental data listed in Table II. The initial values of the sensitive coefficients of the physicochemical parameters, $\{a_l^{(0)}\}$, are all set to 1. This is a reasonable initial guess, implying that all physicochemical properties are equally important. Under this condition, the first iteration of AABPP reduces to traditional 2D-QSAR, because only one set of coefficients, $\{b_j\}$, is being fitted. In other words, traditional 2D-QSAR is a special case of AABPP.
Among the 102 training peptides, four (nos 2, 3, 8 and 10) are outliers according to the reference (Doytchinova and Flower, 2001). Our calculations also give larger errors for these four peptides, so they are likely outliers here as well. Fig. 2 shows the correlation coefficient R versus iteration, where the curve $R_a$ is for the iteration of the coefficients $\{a_l^{(n)}\}$ and the curve $R_b$ is for the iteration of the coefficients $\{b_j^{(n)}\}$. The average fitted error Q between the calculated and experimental bioactivities of the peptides is shown in Fig. 3, where $Q_a$ is for the $\{a_l^{(n)}\}$ iteration and $Q_b$ for the $\{b_j^{(n)}\}$ iteration. The iteration converges smoothly after 10 to 12 cycles. The converged sensitive coefficient sets $\{a_l^{(n)}\}$ and $\{b_j^{(n)}\}$ are given in Table III.
The following four schemes were used to examine the prediction quality of AABPP. (1) Only the four heuristic molecular lipophilicity potential (HMLP) parameters were used, giving R = 0.6726 and Q = 0.6308 for the training dataset and R = 0.6895 and Q = 0.6411 for the testing dataset. (2) Five parameters (the four HMLP parameters plus the volume of amino acid side chains) were used, giving better results: R = 0.7052 and Q = 0.6044 for the training set and R = 0.7156 and Q = 0.6199 for the testing set. (3) All eight physicochemical properties (the four HMLP parameters, the volume of residue side chains and the three secondary structure-potency indices) were used. (4) Unlike the three schemes above, in which all 102 training peptides were used, the four outliers were excluded from the training dataset, while the eight physicochemical properties were used as in scheme 3. Although the values of Q and R for the training dataset improved further, the corresponding results for the testing dataset were not as good as those of scheme 3, indicating that the predictive power was reduced. Detailed results for the four schemes are given in Table V.
The best predicted pIC50 values for the 50 query peptides in the testing set are given in Table IV; they were obtained with scheme 3, using eight physicochemical parameters and all 102 training peptides, including the four outliers. In scheme 4, with the four outliers excluded from the training dataset, the correlation coefficient obtained for the training set was the best, but the predictive quality was not improved and was even worse, implying that the diversity of the peptides in the training set is very important for the predictive power of AABPP.
It can be seen from Table V that diversity of the peptides in the training set is an important condition for improving the predictive power of AABPP, especially at the residue positions for which predictions are to be made. In comparison with CoMFA and CoMSIA, which are widely used in the literature, a remarkable advantage of AABPP is that it requires neither the exact conformations of the peptides nor their alignment to a template. These two steps are necessary for CoMFA and CoMSIA but are often difficult, because peptides can adopt numerous conformations and an experimental crystal structure to serve as a template is often unavailable. The data in Table V indicate that the four HMLP parameters (Du et al., 2006) of amino acid residues form the main body of the descriptor set in AABPP. With more experimental data available, the predictive power of AABPP is expected to improve further. AABPP thus provides an alternative approach to peptide drug prediction.
| Conclusion |
|---|
Predicting the binding affinity of epitope-peptides is vital to the goal of developing peptide-based vaccines. Computational estimation of immunogenicity can be a very useful tool for assessing epitope, multi-epitope or subunit vaccines, whether delivered as peptides or as DNA. The ability to predict MHC binding will make it possible to analyze microbial genomes, identify the most immunogenic proteins and thus select a set of promising putative vaccines.
The theoretical model of AABPP is built on the biological functions and structural features of functional peptides and has a clear physical interpretation. In traditional QSAR, only one set of predictive coefficients, $\{b_j\}$, is used, describing the roles or microenvironments of the amino acids in the peptide. In the AABPP model, two sets of predictive coefficients, $\{a_l\}$ and $\{b_j\}$, are used for the physicochemical parameters and for the residue positions in the peptide, respectively. The IDLS procedure fits $\{a_l\}$ and $\{b_j\}$ alternately and iteratively, so that the predictive error Q decreases and the correlation coefficient R increases step by step. IDLS enhances the predictive ability of AABPP markedly. In the calculation example, the correlation coefficient $R^{(0)} = 0.6593$ and predictive error $Q^{(0)} = \pm 0.6048$ obtained in the first iteration are the results of traditional QSAR, because in the first iteration the coefficients $\{a_l^{(0)}\}$ are set to 1 and only the coefficients $\{b_j\}$ are fitted; in this case AABPP reduces to traditional 2D-QSAR. The converged correlation coefficient $R^{(n)} = 0.8251$ and predictive error $Q^{(n)} = \pm 0.4543$ are the improved results of the IDLS method. AABPP therefore enhances the predictive power of QSAR markedly.
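The size of this improvement can be expressed in relative terms with simple arithmetic on the reported first-iteration and converged values; the squared correlation $R^2$ is used here only as a convenient summary and is not a statistic reported in the analysis above.

$$\frac{Q^{(0)} - Q^{(n)}}{Q^{(0)}} = \frac{0.6048 - 0.4543}{0.6048} = \frac{0.1505}{0.6048} \approx 0.249$$

$$\left(R^{(0)}\right)^2 = 0.6593^2 \approx 0.435, \qquad \left(R^{(n)}\right)^2 = 0.8251^2 \approx 0.681$$

That is, the fitted error drops by roughly a quarter, and the squared correlation between calculated and experimental bioactivities rises from about 0.43 to about 0.68.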
In the AABPP approach, the binding free energy between peptide ligand $P_i$ and the target receptor is described, through the linear free energy equation, by the physicochemical parameters of the amino acids at every sequence position. This makes it possible not only to obtain better predictions of the bioactivities of new peptide reagents but also to describe the physical and chemical features of the amino acid at every sequence position. This is very helpful for designing peptide reagents, peptide analogues, peptide mimetics and modified peptides for drug development. The predictive ability of AABPP can be further improved by using more physicochemical properties and optimized values of the amino acid parameters. In comparison with existing methods in this area (such as CoMFA and CoMSIA), a promising advantage of the current approach is that there is no need to know the exact conformation of a query peptide or its alignment with a template; in many cases, the active conformation is not available. In AABPP, information on peptide conformation is embedded implicitly in the secondary structure-potency parameters. It is expected that AABPP will play an important role in the search for new peptide-based vaccines, as molecular modeling and QSAR do in the search for new drugs.
| Footnotes |
|---|
Edited by Bruce Tidor
| Acknowledgements |
|---|
This work is supported by the Chinese National Basic Research Program (973) under the project 2004CB719606 and by the Chinese National Science Foundation (NSFC).
| References |
|---|
Allsopp C.E., Harding R.M., Taylor C., Bunce M., Kwiatkowski D., Anstey N., Brewster D., McMichael A.J., Greenwood B.M., Hill A.V. Am. J. Hum. Genet. (1992) 50:411–421.
Bhasin M., Raghava G. Vaccine (2004) 22:3195–3201.
Bodmer J. Ciba Found. Symp. (1996) 197:233–253.
Brunak S., Buus S. Rev. Immunogenet. (2000) 2:477.
Brusic V., Rudy G., Harrison L.C. Nucleic Acids Res. (1998) 26:368–371.
Buteau C., Markovic S., Celis E. Mayo Clin. Proc. (2002) 77:339–349.
Buus S. Curr. Opin. Immunol. (1999) 11:209.
Chen J., Liu H., Yang J., Chou K.C. Amino Acids (2007) DOI 10.1007/s00726-00006-00485-00729.
Chou J.J. J. Protein Chem. (1993) 12:291–302.
Chou K.C. J. Biol. Chem. (1993) 268:16938–16948.
Chou K.C. Anal. Biochem. (1996) 233:1–14.
Chou K.C. Curr. Med. Chem. (2004) 11:2105–2134.
Chou K.C., Wei D.Q., Du Q.S., Sirois S., Zhong W.Z. Curr. Med. Chem. (2006) 13:3263–3270.
Chou K.C., Wei D.Q., Zhong W.Z. Biochem. Biophys. Res. Commun. (2003) 308:148–151.
Chou P.Y., Fasman G.D. Biochemistry (1974) 13:221–223.
Cramer R.D. III, Patterson D.E., Bunce J.D. J. Am. Chem. Soc. (1988) 110:5959–5967.
Del Guercio M.-F., Sidney J., Hermanson G., Perez C., Grey H.M., Kubo R.T., Sette A. J. Immunol. (1995) 154:685–693.
Doytchinova I.A., Flower D.R. J. Med. Chem. (2001) 44:3572–3581.
Du Q.S., Wang S.Q., Wei D.Q., Zhu Y., Guo H., Sirois S., Chou K.C. Peptides (2004) 25:1857–1864.
Du Q.S., Liu P.J., Mezey P. J. Chem. Inf. Model. (2005a) 45:347–353.
Du Q.S., Mezey P., Chou K.C. J. Comput. Chem. (2005b) 26:461–470.
Du Q.S., Wang S., Wei D.Q., Sirois S., Chou K.C. Anal. Biochem. (2005c) 337:262–270.
Du Q.S., Wang S.Q., Jiang Z.Q., Gao W.N., Li Y.D., Wei D.Q., Chou K.C. Med. Chem. (2005d) 1:209–213.
Du Q.S., Li D.P., He W.Z., Chou K.C. J. Comput. Chem. (2006) 27:685–692.
Falk K., Rotzschke O., Stefanovic S., Jung G., Rammensee H.-G. Nature (1991) 351:290–296.
Gan Y.R., Huang H., Huang Y.D., Rao C.M., Zhao Y., Liu J.S., Wu L., Wei D.Q. Peptides (2006) 27:622–625.
Imanishi T., Akaza T., Kimura A., Tokunaga K., Gojobori T. Estimation of allele and haplotype frequencies for HLA and complement loci. In: Tsuji K., Aizawa M., Sasazuki T., eds. HLA 1991: Proceedings of the 11th International Histocompatibility Workshop and Conference, Vol. I. Oxford University Press (1992a) 76–79.
Imanishi T., Akaza T., Kimura A., Tokunaga K., Gojobori T. Allele and haplotype frequencies for HLA and complement loci in various ethnic groups. In: Tsuji K., Aizawa M., Sasazuki T., eds. HLA 1991: Proceedings of the 11th International Histocompatibility Workshop and Conference, Vol. I. Oxford University Press (1992b) 1066–1077.
Kast W.M., Brandt R.M.P., Sidney J., Drijfhout J.-W., Kubo R.T., Grey H.M., Melief C.J.M., Sette A. J. Immunol. (1994) 152:3904–3911.
Kawakami Y., Eliyahu S., Jennings C., Sakaguchi K., Kang X., Southwood S., Robbins P.F., Sette A., Appella E., Rosenberg S.A. J. Immunol. (1995) 154:3961–3968.
Klebe G., Abraham U., Mietzner T. J. Med. Chem. (1994) 37:4130–4146.
Klebe G., Abraham U. J. Comput. Aided Mol. Design (1999) 13:1–10.
Lauemoller S.L., Kesmir C., Corbet S.L., Fomsgaard A., Holm A., Claesson M.H., Brunak S., Buus S. Rev. Immunogenet. (2000) 2:477–491.
Madden D.R. Annu. Rev. Immunol. (1995) 13:587–622.
Madden D.R., Garboczi D.N., Wiley D.C. Cell (1993) 75:693–708.
McMichael A.J., Parham P., Brodsky F.M., Pilch J.R. J. Exp. Med. (1980) 152(Suppl. 2):195–203.
Parkhurst M.R., Fitzgerald E.B., Southwood S., Sette A., Rosenberg S.A., Kawakami Y. Cancer Res. (1998) 58:4895–4901.
Parkhurst M.R., Salgaller M.L., Southwood S., Robbins P.F., Sette A., Rosenberg S.A., Kawakami Y. J. Immunol. (1996) 157:2539–2548.
Peoples G.E., Goedegebuure P.S., Smith R., Linehan D.C., Yoshino I., Eberlein T.J. Proc. Natl Acad. Sci. USA (1995) 92:432–436.
Rivoltini L., et al. J. Immunol. (1995) 154:2257–2265.
Rongcun Y., et al. J. Immunol. (1999) 163:1037–1044.
Ruppert J., Sidney J., Celis E., Kubo R.T., Grey H.M., Sette A. Cell (1993) 74:929–937.
Schendel D.J., Gansbacher B., Oberneder R., Kriegmair M., Hofstetter A., Riethmuller G., Segurado O.G. J. Immunol. (1993) 151:4209–4220.
Sette A., Sidney J., del Guercio M.-F., Southwood S., Ruppert J., Dalberg C., Grey H.M., Kubo R.T. Mol. Immunol. (1994a) 31:813–822.
Sette A., et al. J. Immunol. (1994b) 153:5586–5592.
Thibaut U. In: Kubinyi H., ed. 3D QSAR in Drug Design. Leiden: ESCOM (1993) 661–696.
Tsai V., Southwood S., Sidney J., Sakaguchi K., Kawakami Y., Appella E., Sette A., Celis E. J. Immunol. (1997) 158:1796–1802.
Vitiello A., Sette A., Yuan L., Farness P., Southwood S., Sidney J., Chesnut R.W., Grey H.M., Livingston B. Eur. J. Immunol. (1997) 27:671–678.
Zhang R., Wei D.Q., Du Q.S., Chou K.C. Med. Chem. (2006) 2:309–314.
Received April 17, 2007; revised May 31, 2007; accepted June 22, 2007.