PEDS Advance Access published online on February 21, 2007
Protein Engineering Design and Selection, doi:10.1093/protein/gzl054
Design of MHC I stabilizing peptides by agent-based exploration of sequence space
1 Center for Membrane Proteomics, Institute of Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe-Universität, and Institute of Cell Biology and Neuroscience, Siesmayerstr. 70, D-60323 Frankfurt am Main, Germany
2 Department of Dermatology, Clinical Research Group Tumor Immunology, Charité–Universitätsmedizin Berlin, Schumannstr. 20/21, D-10117 Berlin, Germany
3 Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Takustr. 3, D-14195 Berlin, Germany
4 Institute for Molecular Biology and Bioinformatics, Charité–Universitätsmedizin Berlin, Campus Benjamin Franklin, Arnimallee 22, D-14195 Berlin, Germany
5 To whom correspondence should be addressed. E-mail: hiss{at}bioinformatik.uni-frankfurt.de
| Abstract |
|---|
Identification of molecular features that determine peptide interaction with major histocompatibility complex I (MHC I) is essential for vaccine development. We have developed a concept for peptide design by combining an agent-based artificial ant system with artificial neural networks. A jury of feedforward networks classifies octapeptides that are recognized by mouse MHC I protein H-2Kb. Prediction accuracy yielded a correlation coefficient of 0.94. Peptides were designed in machina by the artificial ant system and tested in vitro for their MHC I stabilizing effect. The behavior of the search agents during the design process was controlled by the jury network. The experimentally determined prediction accuracy was 89% for the designed stabilizing and 95% for the non-stabilizing peptides. Novel H-2Kb stabilizing peptides were conceived that reveal extensions of known residue motifs. The combined network-agent system recognized context dependencies of residue positions. A diverse set of novel sequences exhibiting substantial activity was generated.
Keywords: ant colony optimization/artificial neural networks/MHC I/peptide design
| Introduction |
|---|
Rational peptide design strategies can assist in the development of peptide vaccines against virus-induced illnesses or other diseases. Predicting peptides that are presented by major histocompatibility complex (MHC) I molecules is a first step in this direction. The ultimate goal is the identification of epitopes among these peptides, that is, sequences that actually stimulate an immune response. Biochemical approaches like phage display are suited for finding such peptides but can be time-consuming and are not always applicable due to experimental limitations (Cohen et al., 2003).
MHC I proteins are integral cell-membrane proteins of approximately 45 kDa that present short peptides with a length of eight or nine amino acids, although longer peptides (15 residues) have also been observed (Rammensee et al., 1993). MHC I consists of three extracellular domains (α1, α2 and α3), each with about 90 amino acids, and the non-covalently associated β2-microglobulin (β2M, 12 kDa). The peptide-binding pocket is formed by the α1 and α2 domains. Known 3D structures of MHC I–peptide complexes reveal deep binding pockets in the α1 and α2 domains, which explains the occurrence of conserved amino acid properties (Fremont et al., 1995). Despite the many high-resolution structural complexes available and attempts to model MHC/peptide interaction (Rammensee et al., 1993), it is still not fully understood what renders an MHC I binding peptide an MHC I stabilizing peptide and in the end an epitope. Peptide binding to MHC I molecules stabilizes the MHC/peptide complex at the cell surface, which is a necessary condition for triggering an adaptive immune response. The stabilizing effect is an indirect indicator of the binding ability of the peptide. MHC I/peptide complex formation at the cell surface is predisposed by the residue sequence of the bound peptide (Rammensee et al., 1993; Su and Miller, 2001). Epitopes forming more stable MHC I/peptide complexes remain longer at the cell surface. As a consequence, the likelihood increases that the complex is detected by a suitable T-cell receptor. It is important to note that there is evidence that, in the case of autoimmunization, medium-stabilizing peptides more often provoke an immune response than highly stabilizing peptides (Andersen et al., 2003). Weakly stabilizing peptides can still provoke an immune response, although a higher amount of peptide is required to reach a lysis factor that is comparable to strongly stabilizing peptides (Bredenbeck et al., 2005; Uchiyama et al., 2005). In any case, binding to MHC I is a prerequisite for the immune response.
Rammensee and coworkers described allele-specific canonical residue motifs found in many known MHC I binding peptides (Rammensee et al., 1993). For H-2Kb stabilizing octapeptides the motif is defined for the so-called anchor positions 3 [Tyr], 5 [Tyr OR Phe] and 8 [aliphatic]. The term anchor is an interpretation of the conservation of residues at these positions in known MHC I-binding peptides (Rammensee et al., 1999).
We have analyzed the variability of known MHC I-stabilizing peptides with the aim of finding the limits of their variability and potentially new stabilizing and non-stabilizing sequences. To this end, we employed artificial neural networks (ANNs) for feature extraction and assessment of the MHC I-stabilizing capacity of novel peptides. An artificial ant system was implemented for actual peptide design and systematic navigation through the sequence space. The idea of ant colony optimization (ACO) (Dorigo et al., 1996) was introduced about a decade ago. ACO algorithms belong to the class of biologically motivated algorithms and mimic the foraging behavior of ants of different subfamilies, e.g. Dolichoderinae. Experiments have shown that these ants are capable of finding the shortest path connecting nest and food source (Deneubourg et al., 1990). This is achieved through utilization of a collective memory realized by pheromones (Bonabeau et al., 1999, 2000). With pheromones, ants are able to work as individuals yet coordinate and organize themselves towards a common aim (stigmergy). This concept is adapted by ACO. Since its introduction, ACO has been widely recognized and tailored to solve several technical optimization problems such as the quadratic assignment problem (Maniezzo and Colorni, 1999), the scheduling problem (T'kindt et al., 2002), the traveling salesman problem (Dorigo and Gambardella, 1997), the graph coloring problem (Costa and Hertz, 1997), the sequential ordering problem (Gambardella and Dorigo, 1997) and the vehicle routing problem (Bullnheimer et al., 1999).
Despite its wide application to technical optimization tasks, ACO has so far found little recognition in biotechnology. Recently, ACO has been used for the prediction of the binding core of known MHC II-binding peptides via multiple alignment (Karpenko et al., 2005). In our study, we applied the ACO concept to peptide design with the aim of generating new peptide sequences that bind to MHC I molecules. We present the concept of ACO-based peptide design, taking the de novo design of peptides that bind to MHC I H-2Kb molecules as an example (Fig. 1). The designs were synthesized, and their actual MHC I-stabilizing effect was tested in cellular assays. We demonstrate that a simplistic version of ACO can already be employed to perform the task of generating new functional peptide sequences. An emphasis of this work was to test whether the ACO design principle actually leads to peptides with user-defined stabilizing capabilities. Peptides were successfully tested in cell-based assays to prove the concept and the capacity of the algorithm. Furthermore, we developed a visualization of the peptide optimization process.
| Materials and methods |
|---|
Sequence compilation and encoding
One hundred and thirty-five octapeptides (72 positive examples, 63 negative examples) with known MHC I H-2Kb binding ability (data not shown) and their respective affinities were compiled from the following sources:
- MHCPEP database (Brusic et al., 1994) (URL: http://wehih.wehi.edu.au/mhcpep/),
- AntiJen database (JenPep) (Blythe et al., 2002) (URL: http://www.jenner.ac.uk/AntiJen/),
- Publications (Siijts et al., 1994a, 1994b; Cole et al., 1995; Blake et al., 1996; Brock et al., 1996; Gundlach et al., 1996; Vitiello et al., 1996; Hudrisier et al., 1997; Wizel et al., 1997; Beekman et al., 2000; Park et al., 2000).
For every MHCPEP entry used in this work, the original publication was checked to confirm the database entry and to judge the assay system used for the stabilizing measurements. To avoid annotation errors such as the origin of the peptide, all AntiJen entries were reviewed regarding the corresponding SwissProt (Bairoch et al., 2005) entry (if available) for their source proteins and the original publication of the epitope. In cases of contradicting entries, the sequence was discarded. Sequences from the SYFPEITHI database (Rammensee et al., 1999) (http://www.syfpeithi.de) were not included since nearly all entries fit the Rammensee motif or canonical motif for preferred residues at the anchor positions (Rammensee et al., 1993).
For computational analysis, each peptide was described by three different sets of descriptors as input to a specific ANN.
- ANN_5 (5 · 8 = 40-dimensional input): hydrophobicity (Engelman et al., 1986); hydrophilicity (Hopp and Woods, 1981); bulkiness (Jones, 1975); refractivity (Jones, 1975); library stabilization values (Udaka et al., 1995).
- ANN_19 (19 · 8 = 152-dimensional input): 19 score vectors from a principal component analysis of physicochemical amino acid properties (Schneider and Wrede, 1998).
- ANN_44 (44 · 8 = 352-dimensional input): 44 descriptors selected from a set of 146 descriptors of the software package MOE (Molecular Operating Environment, version 2004.03, Chemical Computing Group, Inc., Montreal, Quebec, Canada) using Kolmogorov–Smirnov statistics (Byvatov and Schneider, 2004): VDistEq, weinerPath, a_count, a_IC, b_1rotR, b_rotR, chi0v_C, chi1v_C, chi0_C, chi1_C, balabanJ, PEOE_PC+, PEOE_RPC+, PEOE_VSA-4, PEOE_VSA_FHYD, PEOE_VSA_FPNEG, PEOE_VSA_FPOL, PEOE_VSA_FPOS, PEOE_VSA_FPPOS, PEOE_VSA_HYD, Q_VSA_FHYD, Q_VSA_FNEG, Q_VSA_FPOL, Q_VSA_FPOS, Q_VSA_FPPOS, Q_VSA_POL, Q_VSA_POS, Q_VSA_PPOS, Kier3, KierA1, KierA3, KierFlex, a_don, vsa_pol, SlogP, density, vdw_area, b_count, VAdjEq, PEOE_VSA_FNEG, Q_VSA_HYD, KierA2, SMR_VSA6, PEOE_VSA_POS.
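The position-wise encoding described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `make_placeholder_table` and `encode_peptide` are hypothetical, and the descriptor table is filled with placeholder random numbers because the published scales are not reproduced here. Only the data layout (descriptors per residue times eight positions) is taken from the text.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def make_placeholder_table(n_descriptors, seed=0):
    """Return a dict residue -> descriptor vector filled with PLACEHOLDER values.

    Real use would load published scales (e.g. hydrophobicity, bulkiness);
    the random numbers here only demonstrate the data layout.
    """
    rng = np.random.default_rng(seed)
    return {aa: rng.normal(size=n_descriptors) for aa in AMINO_ACIDS}


def encode_peptide(sequence, table):
    """Concatenate the per-residue descriptor vectors in sequence order."""
    if len(sequence) != 8:
        raise ValueError("expected an octapeptide")
    return np.concatenate([table[aa] for aa in sequence])


table_5 = make_placeholder_table(5)       # stands in for the ANN_5 descriptor set
x = encode_peptide("SIINFEKL", table_5)   # SIINFEKL is the control epitope named in the text
print(x.shape)  # (40,)
```

With 19 or 44 descriptors per residue the same function yields the 152- and 352-dimensional inputs of ANN_19 and ANN_44.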
Fully-connected feedforward networks with a single hidden layer and one output neuron were implemented using Matlab version 7.0.1.15 [EC] and the Neural Network Toolbox 4.0.4 (The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, USA). The outputs of three such ANNs were combined, forming the input of a jury network (Baldi and Brunak, 2001) (Fig. 2). Each of the three ANNs received a different representation of an octapeptide sequence as input pattern x (ANN_5, ANN_19, ANN_44). The overall function modeled by such an ANN is given by Eq. 1.
[Eq. 1]
[...] the hidden neurons' bias values and [...] the bias of the output neuron. One to ten hidden layer neurons were used to determine a useful size of the hidden layer of each of the three initial networks and the jury network. The output of the jury output neuron (score) was used as fitness function for MHC-Ant. Note that each of the three ANNs preceding the jury receives a different input x. In this way, different representations of a peptide molecule were treated separately (Givehchi and Schneider, 2005).
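The two-stage architecture (three first-stage networks feeding a jury network) can be sketched as a forward pass. This is a hedged illustration only: the sigmoid activation, the hidden-layer sizes, the function names (`mlp_forward`, `random_mlp`, `jury_score`) and the untrained random weights are all assumptions made for the example; the text specifies only a single hidden layer, one output neuron, and three separate inputs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Single hidden layer, one output neuron (sigmoid units are an assumption)."""
    hidden = sigmoid(W_hidden @ x + b_hidden)
    return float(sigmoid(w_out @ hidden + b_out))


def random_mlp(n_in, n_hidden, rng):
    """Untrained placeholder weights, only to make the sketch executable."""
    return (rng.normal(size=(n_hidden, n_in)),  # input-to-hidden weights
            rng.normal(size=n_hidden),          # hidden neurons' bias values
            rng.normal(size=n_hidden),          # hidden-to-output weights
            float(rng.normal()))                # bias of the output neuron


def jury_score(inputs, first_stage, jury):
    """Feed the three first-stage outputs into the jury network."""
    stage_outputs = np.array(
        [mlp_forward(x, *net) for x, net in zip(inputs, first_stage)]
    )
    return mlp_forward(stage_outputs, *jury)


rng = np.random.default_rng(0)
dims = (40, 152, 352)  # input sizes of ANN_5, ANN_19 and ANN_44
first_stage = [random_mlp(d, 3, rng) for d in dims]
jury = random_mlp(3, 2, rng)
inputs = [rng.normal(size=d) for d in dims]  # placeholder encodings of one peptide
score = jury_score(inputs, first_stage, jury)
print(score)
```

Keeping the three descriptor representations in separate networks means the jury only has to learn how to weigh three scalar opinions, which keeps its own parameter count small.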
We employed standard online backpropagation-of-errors with momentum for network training (Rumelhart et al., 1986).
[Eq. 2]
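For orientation, online backpropagation with momentum is commonly written as the weight update below. The symbols are assumptions introduced for this illustration ($\eta$ for the learning rate, $\mu$ for the momentum term, $E$ for the error of the current training pattern, $w_{ij}$ for a network weight), and this generic textbook form is not necessarily the exact expression used in this study.

```latex
\Delta w_{ij}(t) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \mu \, \Delta w_{ij}(t-1),
\qquad
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)
```

In the online variant, the update is applied after each training pattern rather than after a full pass through the data.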
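Prior to network training, all input vectors were scaled to unit variance (Eq. 3).
$$\tilde{x}_{ik} = \frac{x_{ik} - \bar{x}_k}{\sigma_k} \qquad (3)$$
with $\sigma_k$ the standard deviation and $\bar{x}_k$ the mean of the descriptor component k, and i the running index over the data points.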
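The scaling step can be sketched in a few lines. This is a minimal example under assumptions: it centers each descriptor component on its mean and divides by the population standard deviation (`ddof=0`); the function name `scale_to_unit_variance` and the toy data are hypothetical, and the handling of constant columns is a choice made here, not one described in the text.

```python
import numpy as np


def scale_to_unit_variance(X):
    """Center each descriptor column and divide by its standard deviation.

    X has one row per data point i and one column per descriptor component k.
    Returns the scaled matrix plus the statistics, so that the same
    transformation can be reapplied to newly designed peptides.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std, mean, std


# Toy data: three data points, two descriptor components (made-up values)
X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
Z_demo, mean_demo, std_demo = scale_to_unit_variance(X_demo)
print(Z_demo.std(axis=0))
```

Any sequence scored later by the networks should be transformed with the training-set `mean` and `std`, not with statistics recomputed from the new data.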
The stabilization assay was performed as described in Brock et al. (1996) with TAP-deficient RMA-S cells (mutagenized Rauscher virus-induced T-lymphoma cells, murine origin) (Ljunggren and Kärre, 1985). The cells were cultured in DMEM (Gibco-BRL, Karlsruhe, Germany) with 10% FCS (Sigma-Aldrich, Steinheim, Germany) at 37°C with 8% CO2. Prior to the assay, the cells were cultured for 16 h at 26°C to allow accumulation of unloaded MHC at the cell surfaces. The cells were then incubated for 1 h at room temperature with the peptides at concentrations ranging from 100 to 5.6 × 10⁻⁴ µg/ml (in 10 equal steps) followed by 1 h at 37°C for denaturation of peptide-free H-2Kb. The remaining stable H-2Kb molecules at the surface of the cells were quantified by flow cytometry using the biotinylated H-2Kb specific monoclonal antibody B8.24.3 (G. Köhler, Basel Institute of Immunology, The Immune System 2 : 2002-8, 1981; purified in the laboratory from hybridoma culture supernatant by protein G affinity chromatography and coupled with NHS-biotin, Pierce, Darmstadt, Germany) and FITC-streptavidin as secondary reagents (BD Pharmingen, Heidelberg, Germany). The mean fluorescence intensity (MFI) was taken as a measure of the peptide stabilizing effect and therefore as an indirect indicator of peptide binding. SC50 values were calculated as the peptide concentration leading to half-maximal MFI. Measurements and data analysis were performed with a FACSCalibur (CellQuest™ Pro; BD Bioscience, Heidelberg, Germany). Peptides were synthesized by EMC Microcollections GmbH, Tübingen, Germany. To avoid problems with oxidation and oligomerization, peptides containing methionine and cysteine were excluded.
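One simple way to read an SC50 value off a titration series is to interpolate the concentration at which the MFI first reaches half of its maximum. The sketch below is an assumption-laden illustration, not the analysis procedure used in the study: the function name `estimate_sc50`, the optional `baseline` subtraction, the interpolation on a log10 concentration axis and the example numbers are all hypothetical.

```python
import numpy as np


def estimate_sc50(concentrations, mfi, baseline=0.0):
    """Estimate the concentration giving half-maximal MFI.

    Interpolates linearly in log10(concentration) between the two measured
    points that bracket the half-maximum. Returns None when the half-maximum
    is already reached at the lowest concentration (no bracketing pair).
    """
    conc = np.asarray(concentrations, dtype=float)
    signal = np.asarray(mfi, dtype=float) - baseline
    order = np.argsort(conc)
    conc, signal = conc[order], signal[order]

    half = 0.5 * signal.max()
    first = int(np.nonzero(signal >= half)[0][0])  # first point at or above half-maximum
    if first == 0:
        return None
    x0, x1 = np.log10(conc[first - 1]), np.log10(conc[first])
    y0, y1 = signal[first - 1], signal[first]
    frac = (half - y0) / (y1 - y0)
    return float(10 ** (x0 + frac * (x1 - x0)))


# Made-up titration: concentrations in arbitrary units, MFI in arbitrary units
demo_conc = [0.01, 0.1, 1.0, 10.0, 100.0]
demo_mfi = [0.0, 0.0, 0.0, 100.0, 100.0]
print(estimate_sc50(demo_conc, demo_mfi))
```

A fitted sigmoidal dose-response curve would be less sensitive to noise in individual points; the interpolation is shown only because it needs no additional assumptions about curve shape.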
The R-score (Rammensee et al., 1999) indicates to which extent a given peptide meets the terms of the canonical motif. The R-score was calculated using the public web interface at URL: http://www.syfpeithi.de/ (version of July 2005).
The Lib score (Udaka et al., 2000) predicts the binding abilities to a specified MHC molecule (URL: http://hypernig.nig.ac.jp/cgi-bin/Lib-score/request.rb, version 2006).
| Results |
|---|
The primary purpose of this work was to probe the ACO concept for its usefulness in peptide design. We trained three different ANNs on sequences known to bind to the chosen mouse MHC I molecule H-2Kb. Each ANN modeled a sequence-activity relationship for MHC/peptide interaction using a different set of molecular descriptors (descriptor space). The three descriptor sets represent diverse physicochemical amino acid properties. The trained networks were combined to form a jury network, which was employed as fitness function by an ACO-based algorithm (MHC-Ant) for the design of new sequences with desired stabilizing ability.
Four different categories of peptides were designed de novo:
- Category I: H-2Kb-stabilizing peptides (positive design).
- Category II: H-2Kb non-stabilizing peptides (negative design).
- Category III: H-2Kb stabilizing peptides that do not fulfill the canonical MHC binding motif of preferred residues at the anchor positions (Rammensee et al., 1993).
- Category IV: H-2Kb non-stabilizing peptides that fulfill the canonical motif.
Agent-based ant colony optimization algorithm (MHC-Ant)
MHC-Ant was used to systematically generate octapeptides that have a desired H-2Kb stabilizing capability. The overall number of theoretically possible octapeptides is 20^8 (about 2.56 × 10^10) for the 20 genetically encoded murine amino acids. MHC-Ant used the jury net as fitness function and the ACO process for the actual peptide design step. MHC-Ant performs three major tasks.
- Sequence design (path generation). The decision space through which the search agent (the artificial ant) moves to generate a new sequence was represented by an 8 × 20 matrix (pheromone matrix). This matrix contains the transition probabilities (pheromone concentrations) to move from a residue at the sequence position i to a residue at position i + 1. This means that a peptide was regarded as a path of the artificial ant in the decision space (Fig. 1). In the first iteration, the probabilities in the decision space were identical for each amino acid (P(i → i + 1) = 0.05), and as a consequence, a random sequence was generated. In subsequent optimization cycles, the pheromone matrix was updated, resulting in more specific peptides.
- Sequence evaluation (path evaluation). The actual path of a virtual ant through the pheromone matrix was evaluated by the jury network. This path represents the peptide sequence that is defined by maximal transition probabilities for each sequence position. The output of the jury lies in [0, 1], where 0 means non-stabilizing, values around 0.5 mean that no assessment is possible and 1 means strongly stabilizing.
- Pheromone update. The pheromone concentrations p at the residue positions along the path representing the evaluated peptide were updated according to the output of the jury net (score) (Eq. 4). The pheromone concentration was prevented from dropping below 0.1 or rising above 0.9 to reduce the risk of premature convergence.
[Eq. 4]
The loop of these three consecutive steps represents a single iteration of MHC-Ant, which is equivalent to one virtual ant (agent) or one generated sequence. Preliminary experiments showed that after 200 000 iterations the system reliably converged to one preferred sequence, that is, at each sequence position one of the 20 amino acids had a steady pheromone concentration of 0.9 for at least 10 000 iterations.
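The three-step loop can be sketched as follows. Several parts are assumptions made for illustration and should not be read as the MHC-Ant implementation: the update rule (an exponential move of the visited pheromone entries toward the score) stands in for the update equation, residues are sampled in proportion to their pheromone values, the starting pheromone value of 0.5 is arbitrary (any uniform value yields probability 0.05 per residue), and `toy_fitness` replaces the jury network with a count of matched canonical anchors, with "aliphatic" taken here as A, V, L or I. Only the 8 × 20 matrix and the bounds 0.1 and 0.9 come from the text.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
P_MIN, P_MAX = 0.1, 0.9  # pheromone bounds stated in the text


def toy_fitness(seq):
    """Stand-in for the jury network: fraction of canonical anchors matched."""
    hits = (seq[2] == "Y") + (seq[4] in "YF") + (seq[7] in "AVLI")
    return hits / 3.0


def run_ants(fitness, n_iterations=2000, rate=0.05, seed=0):
    """Run one artificial ant per iteration and return the preferred sequence."""
    rng = np.random.default_rng(seed)
    pheromone = np.full((8, 20), 0.5)  # uniform start: each residue has probability 0.05
    for _ in range(n_iterations):
        # 1) Path generation: sample one residue per sequence position
        probs = pheromone / pheromone.sum(axis=1, keepdims=True)
        path = [rng.choice(20, p=probs[pos]) for pos in range(8)]
        seq = "".join(AMINO_ACIDS[i] for i in path)
        # 2) Path evaluation: score in [0, 1]
        score = fitness(seq)
        # 3) Pheromone update along the path (ASSUMED rule), then clamp to the bounds
        for pos, idx in enumerate(path):
            pheromone[pos, idx] = (1.0 - rate) * pheromone[pos, idx] + rate * score
        np.clip(pheromone, P_MIN, P_MAX, out=pheromone)
    best = "".join(AMINO_ACIDS[i] for i in pheromone.argmax(axis=1))
    return best, pheromone


best_seq, final_pheromone = run_ants(toy_fitness)
print(best_seq)
```

Because the toy fitness ignores the non-anchor positions, only positions 3, 5 and 8 can be expected to develop a preference in this sketch; a trained jury network would shape all eight positions.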
First, a jury ANN was trained to serve as fitness function for designing peptides of Categories I, II and IV (Table I). The networks with the best cross-validated classification accuracy (quantified by Matthews' correlation coefficient, cc) were used for MHC-Ant. All three descriptor spaces led to predictive network solutions. Judging from the average cc values obtained in the cross-validation study, the set of 44 descriptors (ANN_44) was least suited for the classification task. It also suffered most from overtraining, which might be a consequence of the greater number of variables in this network. The jury net outperformed each individual ANN, yielding an approximately equally high performance with both training and test data. The three descriptor sets, and consequently the three ANNs feeding the jury net, might have captured different sequence features responsible for MHC-peptide interaction. Cross-validation results indicate that the jury net was not significantly affected by an overtraining effect, yielding correlation coefficients of 0.96 (training) and 0.94 (test). We concluded that this network could be used as fitness function for MHC-Ant.
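Matthews' correlation coefficient for a two-class problem can be computed directly from the confusion-matrix counts. The sketch below uses the standard definition; the function name `matthews_cc` and the example counts are illustrative assumptions and are not results from this study.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted stabilizing / non-stabilizing peptides
    fp/fn: wrongly predicted stabilizing / non-stabilizing peptides
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a data set with 72 positives and 63 negatives
cc_demo = matthews_cc(tp=70, tn=60, fp=3, fn=2)
print(round(cc_demo, 2))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect classification), and unlike plain accuracy it is not inflated by unequal class sizes.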
A second jury ANN was used for the design of Category III peptides only. It was trained with a pruned data set (not shown): all positive training examples completely fulfilling the canonical MHC binding motif (Rammensee et al., 1993) were excluded.
We first tested the convergence behavior of the artificial ant system for sequence design. The pheromone matrix was expected to reach stable probability values after sufficient optimization cycles. Figure 3 shows sample states of an MHC-Ant run after 12 000 (a), 49 500 (b) and 199 500 (c) cycles. An animation of an MHC-Ant run is presented in Supplementary data, Movie 1. The system found optima in search space, which correspond to preferred amino acid residues. In Fig. 3a, the algorithm has already converged at positions 1, 3, 4, 6 and 8 but still allows for higher residue variability at positions 2 and 7. No clear preference can be observed at position 5. In Fig. 3b, positions 2 and 7 have converged. Position 5 has just begun focussing on a single residue. Note that all amino acids (excluding Phe) are still under exploration while Arg is favored. Figure 3c shows the final stage of the optimization run with convergence at all positions. Notably, Tyr was chosen at position 5 despite the intermediate preference for Arg (compare Fig. 3b and c). This demonstrates the flexibility of MHC-Ant during optimization and makes the process of path development over time particularly interesting for further investigation. A productive peptide design run was terminated when the algorithm converged at all positions.
Overall, 108 octapeptide sequences were designed with MHC-Ant. These belong to four distinct Categories (I, II, III and IV) with different desired H-2Kb stabilizing effects and restrictions (Table II). Bold numbers refer to the peptides listed in Table II.
Category I: 27 sequences were designed to be H-2Kb-stabilizing peptides (sequences 1–27). Sequences 18–27 were generated with the restriction "no serine at position 1" in order to increase diversity, since 57% of the positive training examples contained serine at position 1.
Category II: 37 sequences were designed to be H-2Kb non-stabilizing peptides (sequences 28–64). Here, the optimal fitness function value (jury net score) was 0 instead of 1 for the designs.
Category III: 28 sequences were designed to be H-2Kb stabilizing peptides (sequences 65–92), which do not fulfill the canonical motif. The jury net was therefore trained on a pruned data set lacking canonical octapeptides. Our aim was to see whether additional H-2Kb stabilizing motifs could be found by our design approach.
Category IV: 11 sequences were designed to be H-2Kb non-stabilizing peptides while fulfilling the canonical motif for preferred residues at the anchor positions (sequences 93, 95, 97, 99, 101, 102, 104–108) and five sequences were designed to be H-2Kb-stabilizing (sequences 94, 96, 98, 100, 103). These five sequences were specifically designed so that they differ in only one position (either position 5 or 8) from the non-stabilizing examples of Category IV. They served for direct comparison of the influence of these positions.
SC50 values were determined for all 108 designed sequences, assessed through their MHC-stabilization ability (Table II). We use the term SC50 in this study in analogy to the concept of IC50 values. It gives the peptide concentration resulting in 50% of the maximal MHC stabilization.
Category I peptides: 21 of 27 sequences of this category exhibited an SC50 value well below 10 µM and can thus be regarded as H-2Kb stabilizing. This means that 78% of the sequences designed by MHC-Ant to be stabilizing were experimentally confirmed (positive correct predictions). Three sequences 9, 11 and 18 show borderline activity with an SC50 value around 10 µM. They can be regarded as medium stabilizing. When counted as positives, 24 of 27 (89%) sequences designed by MHC-Ant to be stabilizing were experimentally confirmed. The sequence with lowest SC50 value (best stabilizing effect) in the experiments is ITYQYIPL (24; SC50 = 0.0006 µM). For comparison, the natural epitope SIINFEKL used as positive control exhibited an SC50 value of 1 µM, the mimotope SIYRYYGL used as positive control an SC50 of 100 nM and the H-2Ld binding peptide LSPFPFDL used as negative control yielded an SC50 of 1 mM.
Category II peptides: 35 of 37 peptides designed to be non-stabilizing according to the requirements of this category have an SC50 value >10 µM and can thus be regarded as weakly or non-stabilizing. Thereby, 95% of the sequences designed by MHC-Ant to be non-stabilizing were experimentally confirmed (negative correct predictions).
Category III peptides: 27 of 28 sequences showed an SC50 value ≥10 µM, in that way not fulfilling the requirements for this category, which were: an SC50 <10 µM while not fulfilling the canonical motif. Sequence 83 with an SC50 value of 0.001 µM completely fulfills the canonical motif, thereby disobeying the second condition of this category. Sequence 75 (SAFKGLSY) has an SC50 value of 10 µM and can therefore be regarded as medium stabilizing, although it does not contain the canonical motif and thus fulfills the first condition of this category.
Category IV peptides: 15 of 16 sequences showed an SC50 ≤10 µM. The conditions for this category were to completely fulfill the canonical motif and an SC50 value >10 µM. Five out of 16 sequences were not designed under these conditions but as sequences completely fulfilling the canonical motif and showing an SC50 value ≤10 µM. They are marked with (+) in Table II and deviate in only one position from the sequences marked with (−), therefore serving as direct comparison. The remaining 10 sequences were designed to be non-stabilizing but did exhibit an SC50 ≤10 µM, thereby not fulfilling the second condition of this category. Sequence 93 (DKYKFRWI) fulfills the canonical motif but has an SC50 value of 100 µM (weakly stabilizing) and therefore fulfills the conditions of this category.
| Discussion |
|---|
MHC-Ant was able to design novel binding and non-binding peptide ligands for the MHC I allomorph H-2Kb (Categories I and II), proving the general functionality of the algorithm. The software was capable of generating previously unknown peptides: positive peptides that are in agreement with the known canonical motif (Category I) and positive peptides that are not, despite the lack of informative training sequences (Categories III and IV). MHC-Ant also successfully designed completely negative sequence examples (Category II), which provide a basis for further refinement of the fitness function, in our case a neural network system. We conclude that MHC-Ant, and thus the ACO concept, can be used for the design of peptide sequences with biological relevance.
Whereas the sequences of Categories I and II represent a general proof-of-concept, the sequences of Categories III and IV should be regarded as an outlook of possible MHC-Ant applications. Their restrictive design conditions represent an attempt to overcome the bias of the training data found in the literature, which are dominated by the canonical motif.
Special aspects of each category and selected sequences will now be discussed in more detail.
The experimentally validated results for Category I demonstrate that it is possible to rationally design novel epitopes with improved H-2Kb stabilizing capabilities compared to natural epitopes (24). MHC-Ant, if trained with informative examples, is capable of coming up with new peptide sequences with substantial biological activity. It should be noted that all positive-designed peptides of Category I fulfill the canonical motif in at least one sequence position. This supports the canonical motif concept and might be explained by the bias in our training data: all positive training examples fulfill the canonical motif in at least one position. Still, there are contradicting examples of both stabilizing (SGYDWRRL, 10; DSQWFNPP, 86) and non-stabilizing (KYYPNEDV, 31; LVLNYDKK, 40) peptides that obey the canonical motif in one or two positions, respectively. Recently, a larger set of MHC binding peptides was published by Peters et al. (2006). This data set might provide additional training data for an improved version of MHC-Ant.
A requirement of Category I was to choose an alternative N-terminal amino acid to Ser, which is found in 57% of the training sequences. Therefore, Ser was forbidden at the first position during the design of sequences 18–27. MHC-Ant decided for Gln, Trp, Lys and Ile at this residue position as alternatives. Our results show that other N-terminal residues besides Ser are tolerated in stabilizing peptides. Notably, sequence 2 also starts with Trp without the Ser-restriction in place. This suggests that even without residue restrictions, MHC-Ant detected alternative paths through sequence space besides the dominating ones.
An important result of our Category II design exercise is that 22 sequences (60%) of the 35 true-negative sequences produced no detectable stabilization of H-2Kb over the entire peptide concentration range tested. To our knowledge, this is the first report of de novo designed and in vitro tested sequences for H-2Kb that do not exhibit any detectable stabilization effect. Only sequences 32 (YEGARLKH) and 44 (IKIWRFYA) yielded an SC50 value of about 10 µM (medium stabilizing), and therefore represent false-negative designs. Notably, they both lack the canonical motif but still lead to a medium stabilizing effect, thereby fulfilling the requirements for Category III peptides.
Peptides 31, 32, 40 and 44 were designed for Category II. Surprisingly, 31 and 40 fulfill the requirements for Category IV, whereas 32 and 44 obey the rules for Category III. Sequences 31 and 40 were found by MHC-Ant to be non-stabilizing without the restriction for Category IV in place, that is, the obligation to fulfill the canonical motif completely. Surprisingly, canonical residues are present in one or two positions of these non-stabilizing sequences. Our attempt to create sequences that completely fulfill the canonical motif and are non-stabilizing (Category IV) failed with the exception of a single such sequence (93). The occurrence of sequences 31 and 40 in Category II demonstrates that it is technically possible to design non-stabilizing peptides that partly contain the canonical motif. In our study, MHC-Ant was forced to design non-stabilizing sequences for Category IV containing the complete canonical motif. As a consequence, examples which only partly fulfill this motif and are non-stabilizing, like sequences 31 and 40, were not allowed. Future design experiments might keep individual anchor positions fixed and systematically explore the degrees of variability at the remaining anchor positions.
The reasons for the non-stabilizing effect of sequences 31 and 40 might lie in the incompatibility of the combination of amino acids with the peptide-binding site of H-2Kb. The amino acids at the positions not belonging to the canonical motif (positions 1, 2, 4, 6 and 7) might be incompatible with the requirements of the MHC I binding pocket and thus affect binding. The variability of the stabilizing capabilities of peptides completely fulfilling the canonical motif supports this assumption. The stabilizing capability of these peptides varies up to 100 µM (3 and 24). This emphasizes the importance of the other sequence positions for receptor-ligand interaction.
A hierarchy of anchor positions as reported earlier (Deres et al., 1993) could not be observed in our study. MHC-Ant showed no preference for certain anchor positions while generating the sequences; therefore, the generated and tested sequences reveal no statistically relevant evidence for such a hierarchy.
We found one sequence (75) in 28 with medium stabilizing effect while not fulfilling the canonical motif, as required for Category III. Sequence 75 (SAFKGLSY) has an SC50 value of 10 µM. Note that two sequences in Category II (32, 44) meet the requirements of Category III. The sequences 32, 44 and 75 fulfilling the requirements of Category III represent a property pattern that complements the canonical motif. Further studies will be essential to substantiate our initial finding.
One sequence (93) among 11 experimentally validated peptide sequences fulfills the restrictive requirements of Category IV, which are complete fulfillment of the canonical motif while being non-stabilizing. Note that two sequences (31, 40) were found in Category II that satisfy the requirements of Category IV; their occurrence is discussed above for Category II.
A category-spanning observation concerns the accuracy of Rammensee's R-score: sequences 37, 40 and 53 partially fulfill the canonical motif with tyrosine at position 5. They therefore all have an R-score around 12, yet their measured SC50 values differ on the order of 100 µM. The same is true for sequences 30, 41 and 48. Peptide 48 yields the same R-score as sequence 30, but the measured SC50 values deviate by about 1 mM. The R-score thus fails to distinguish these peptides from stabilizing peptides. These shortcomings of the R-score may be caused by its restriction to position-specific features, which underestimates structural and context-dependent features of the MHC-peptide interaction. In contrast, MHC-Ant sequence design represents a more advanced approach that takes all residue positions and their potential interactions into account. The chosen residue descriptors include physicochemical as well as structural aspects of the amino acids at each position. Recently, Bui and coworkers introduced a concept for modeling the MHC-peptide interaction on a structural basis (Bui et al., 2006). It would be interesting to see our designed sequences simulated with regard to their structural capabilities, using an algorithm adapted for MHC I H-2Kb. Such simulations might reveal further aspects of the stabilizing behavior of the peptides presented here, as well as arguments for potential epitope functionality. A further possibility would be to use calculated free energies of the protein-peptide interaction as the fitness function for our sequence-based method. Free energy calculations were previously used for rational peptide modeling (Froloff et al., 1997). We also see potential for ACO algorithms to combine sequence- and structure-based approaches for artificial epitope design.
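The limitation of a purely position-specific score can be illustrated with a minimal sketch. The weights and the two peptides below are hypothetical placeholders chosen for illustration; they are not the published R-score matrix and not sequences from this study. The sketch only shows that an additive per-position score cannot separate peptides that agree at the scored positions, even though their residue context differs.

```python
# Hypothetical position-specific weights: 1-based position -> residue -> score.
# Illustrative values only; NOT the published R-score matrix.
TOY_WEIGHTS = {
    3: {"Y": 2},
    5: {"Y": 10, "F": 10},
    8: {"L": 10, "M": 8, "I": 8, "V": 8},
}


def position_specific_score(peptide, weights=TOY_WEIGHTS):
    """Sum independent per-position contributions; residue context is ignored."""
    return sum(
        weights.get(pos, {}).get(res, 0)
        for pos, res in enumerate(peptide, start=1)
    )


def pairwise_context(peptide):
    """Residue pairs (i, j, res_i, res_j): context that an additive score never sees."""
    n = len(peptide)
    return {
        (i + 1, j + 1, peptide[i], peptide[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


# Two made-up octapeptides sharing only tyrosine at position 5.
peptide_a = "AAGAYAAA"
peptide_b = "SKGEYTDA"

score_a = position_specific_score(peptide_a)
score_b = position_specific_score(peptide_b)
same_context = pairwise_context(peptide_a) == pairwise_context(peptide_b)
```

Both toy peptides receive the same additive score although their pairwise residue context is different, which is the kind of information a descriptor-based model over all positions can in principle exploit.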
The sequences fulfilling the requirements for Categories III and IV, combined with the complete data set of the 108 designed and experimentally validated sequences, represent a potential starting point for further investigations regarding the variability of H-2Kb binding patterns. They might help explore patterns beyond the canonical motif and may serve as a basis for the generation of more diverse training data.
An adaptation of MHC-Ant to peptides of different lengths is possible but would require a new in vitro test cycle for evaluation as well as a new data set of training sequences. Since the aim of the project was to test whether the general ACO concept can be adapted to the task of peptide design at all, and since this study included in vitro validation, we focussed on octamers only.
The variation of the ACO concept presented here demonstrates an attempt to connect the area of agent-based search algorithms with the complex area of epitope prediction. Future work will focus on special properties of ACO such as the possibility to visualize the optimization process and extract design rules for the optimal path in sequence space as well as the design of additional sequences complementing the canonical motif.
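The general idea of an agent-based search over octapeptide sequence space can be sketched as follows. This is a generic ant colony optimization loop written under stated assumptions, not the MHC-Ant implementation: `toy_fitness` is a hypothetical stand-in for the jury network (or, as suggested above, a free-energy estimate), and the colony size, evaporation rate and deposit rule are arbitrary illustrative choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 8  # octapeptides, as in this study


def toy_fitness(peptide):
    """Hypothetical scoring function standing in for a trained model.
    Rewards Y/F at position 5 and L/M/I/V at position 8 (illustrative values)."""
    score = 0.0
    if peptide[4] in "YF":
        score += 0.5
    if peptide[7] in "LMIV":
        score += 0.5
    return score


def run_aco(fitness, n_ants=20, n_iter=30, evaporation=0.1, seed=0):
    """Generic ACO over (position, residue) choices; parameters are assumptions."""
    rng = random.Random(seed)
    # One pheromone value per (position, residue), uniform at the start.
    pheromone = [{aa: 1.0 for aa in AMINO_ACIDS} for _ in range(LENGTH)]
    best_peptide, best_score = None, float("-inf")
    for _ in range(n_iter):
        colony = []
        for _ in range(n_ants):
            # Each ant picks one residue per position, proportionally to pheromone.
            peptide = "".join(
                rng.choices(
                    AMINO_ACIDS,
                    weights=[pheromone[pos][aa] for aa in AMINO_ACIDS],
                )[0]
                for pos in range(LENGTH)
            )
            colony.append((peptide, fitness(peptide)))
        # Evaporation: every trail decays by a constant factor.
        for pos in range(LENGTH):
            for aa in AMINO_ACIDS:
                pheromone[pos][aa] *= 1.0 - evaporation
        # Deposit: each ant reinforces the residues it used, scaled by its fitness.
        for peptide, score in colony:
            for pos, aa in enumerate(peptide):
                pheromone[pos][aa] += score
            if score > best_score:
                best_peptide, best_score = peptide, score
    return best_peptide, best_score, pheromone


best_peptide, best_score, pheromone = run_aco(toy_fitness)
```

Because the fitness function is passed in as an argument, the same loop could in principle be driven by a sequence-based classifier or by a structure-based energy estimate. The returned pheromone table is the quantity one would inspect to visualize the optimization process or to read off per-position residue preferences.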
| Supplementary data |
|---|
Supplementary data are available at PEDS online.
| Footnotes |
|---|
Edited by Robin Offord
| Acknowledgment |
|---|
We are grateful to Norbert Dichter for technical assistance. Equally, we thank Alireza Givehchi for the fruitful discussions on jury networks and Dr Jürgen Paetz for comments on ant algorithms. Matthias Heinig is thanked for compiling the MHCPEP sequences. This research was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt am Main, and the Center for Membrane Proteomics (CMP) at the Johann Wolfgang Goethe-University.
| References |
|---|
Andersen M.L.M., Ruhwald M., Nissen M.H., Buus S., Claesson M.H. (2003) Scand. J. Immunol. 57:21–27.
Bairoch A., et al. (2005) Nucleic Acids Res. 33:154–159.
Baldi P. and Brunak S. (2001) Bioinformatics: The Machine Learning Approach 2nd ed. (MIT Press, Cambridge, MA).
Beekman N.J., van Veelen N.J., van Hall P.A., Neisig T.A., Sijts A., Camps M., Kloetzel P.M., Neefjes J.J., Melief C.J., Ossendorp F. (2000) J. Immunol. 164:1898–1905.
Bishop C.M. (1995) Neural Networks for Pattern Recognition (Clarendon Press, Oxford).
Blake J., Johnston J.V., Hellström K.E., Marquardt H., Chen L. (1996) J. Exp. Med. 184:121–130.
Blythe M.J., Doytchinova I.A., Flower D.R. (2002) Bioinformatics 18:434–439.
Bonabeau E., Dorigo M., Theraulaz G. (1999) Swarm Intelligence: From Natural to Artificial Systems (Oxford University Press, New York).
Bonabeau E., Dorigo M., Theraulaz G. (2000) Nature 406:39–42.
Bredenbeck A., Losch F.O., Sharav T., Eichler-Mertens M., Filter M., Givehchi A., Sterry W., Wrede P., Walden P. (2005) J. Immunol. 174:6716–6724.
Brock R., Wiesmüller K.-H., Jung G., Walden P. (1996) Proc. Natl. Acad. Sci. USA 93:13108–13113.
Brusic V., Rudy G., Harrison L. (1994) Nucleic Acids Res. 22:3663–3665.
Bui H.-H., Schiewe A.J., von Grafenstein A.J., Haworth H.I.S. (2006) Proteins 63:43–52.
Bullnheimer B., Hartl R.F., Strauss C. (1999) Ann. Operat. Res. 89:319–328.
Byvatov E. and Schneider G. (2004) J. Chem. Inf. Comput. Sci. 44:993–999.
Cohen C.J., Denkberg G., Lev A., Epel M., Reiter Y. (2003) J. Mol. Recognit. 16:324–332.
Cole G.A., Hogg T.L., Woodland D.L. (1995) J. Immunol. 155:2841–2848.
Costa D. and Hertz A. (1997) J. Operat. Res. Soc. 48:295–305.
Davies M.N., Hattotuwagama C.K., Moss D.S., Drew M.G., Flower D.R. (2006) BMC Struct. Biol. 6:5.
Deneubourg J.-L., Aron S., Goss S., Pasteels J.M. (1990) J. Insect Behav. 2:159–168.
Deres K., Beck W., Faath S., Jung G., Rammensee H.-G. (1993) Cell. Immunol. 151:158–167.
Dorigo M. and Gambardella L.M. (1997) BioSystems 43:73–81.
Dorigo M., Maniezzo V., Colorni A. (1996) IEEE Trans. Syst. Man Cybern. Part B 26:29–41.
Engelman D.M., Steitz T.A., Goldman A. (1986) Annu. Rev. Biophys. Chem. 15:321–353.
Filter M., Eichler-Mertens M., Bredenbeck A., Losch F.O., Sharavb T., Givehchi A., Walden P., Wrede P. (2006) QSAR Comb. Sci. 25:350–358.
Fremont D.H., Stura E.A., Matsumura M., Peterson P.A., Wilson I.A. (1995) Proc. Natl. Acad. Sci. USA 92:2479–2483.
Froloff N., Windemuth A., Honig B. (1997) Prot. Sci. 6:1293–1302.
Gambardella L.M. and Dorigo M. (1997) HAS-SOP: a hybrid ant system for the sequential ordering problem. Technical Report IDSIA, No. IDSIA-11-97.
Givehchi A. and Schneider G. (2005) Mol. Div. 9:371–383.
Gundlach B.R., Wiesmuller K.H., Junt T., Kienle S., Jung G., Walden P. (1996) J. Immunol. Meth. 192:149–155.
Hopp T.P. and Woods K.R. (1981) Proc. Natl. Acad. Sci. USA 78:3824–3828.
Hudrisier D., Oldstone M.B.A., Gairin J. (1997) Virology 234:62–73.
Jones D.D. (1975) J. Theor. Biol. 50:167–184.
Karpenko O., Shi J., Dai Y. (2005) Artif. Intell. Med. 35:147–156.
Ljunggren H.G. and Kärre K. (1985) J. Exp. Med. 162:1745–1759.
Mamitsuka H. (1998) Proteins 33:460–474.
Maniezzo V. and Colorni A. (1999) IEEE Trans. Knowl. Data Eng. 11:769–778.
Matthews B.W. (1975) Biochim. Biophys. Acta 405:442–451.
Park J.M., Cho S.Y., Hwang Y.K., Um S.H., Kim W.J., Cheong H.S., Byun S.M. (2000) J. Med. Virol. 60:189–199.
Peters B., et al. (2006) PLoS Comput. Biol. 2:574–584.
Rammensee H.-G., Falk K., Rötzschke O. (1993) Annu. Rev. Immunol. 11:213–244.
Rammensee H.-G., Bachmann J., Emmerich N.N., Bachor O.A., Stevanovic S. (1999) Immunogenetics 50:213–219.
Rumelhart D.E. and McClelland J.L. (1986) The PDP Research Group. Parallel Distributed Processing (MIT Press, Cambridge, MA).
Schneider G. and Wrede P. (1998) Prog. Biophys. Mol. Biol. 70:175–222.
Sijts A.J., Ossendorp F., Mengede E.A., van den Elsen P.J., Melief C.J. (1994a) J. Immunol. 152:106–116.
Sijts A.J., De Bruijn M.L., Ressing M.E., Nieland J.D., Mengede E.A., Boog C.J., Ossendorp F., Uast W.M., Melief C.J. (1994b) J. Virol. 68:6038–6046.
Su R.-C. and Miller R.G. (2001) J. Immunol. 167:4869–4877.
T'kindt V., Monmarché N., Tercinet F., Laügt D. (2002) Eur. J. Operat. Res. 142:250–257.
Uchiyama F., Tanaka Y., Minari Y., Tokui N. (2005) J. Biosci. Bioeng. 99:448–456.
Udaka K., Wiesmüller K.-H., Kienle S., Jung G., Walden P. (1995) J. Biol. Chem. 270:24130–24134.
Udaka K., Wiesmüller K.-H., Kienle S., Jung G., Tamamura H., Yamagishi H., Okumura K., Walden P., Suto T., Kawasaki T. (2000) Immunogenetics 51:816–828.
Vitiello A., Yuan L., Chesnut R.W., Sidney J., Southwood S., Farness P., Jackson M.R., Peterson P.A., Sette A. (1996) J. Immunol. 157:5555–5562.
Wizel B., Nunes M., Tarleton R.L. (1997) J. Immunol. 159:6120–6130.
Received August 10, 2006; revised November 24, 2006; accepted November 25, 2006.