PEDS Advance Access originally published online on December 19, 2007
Protein Engineering Design and Selection 2008 21(1):37-44; doi:10.1093/protein/gzm084
Physicochemical feature-based classification of amino acid mutations
1 Institute of Medical Technology, FI-33014 University of Tampere, Finland
2 Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland
3 To whom correspondence should be addressed. E-mail: bairong.shen{at}uta.fi
| Abstract |
|---|
A huge quantity of gene and protein sequences have become available during the post-genomic era, and information about genetic variations, including amino acid substitutions and SNPs, is accumulating rapidly. To understand the effects of these changes, it is often essential to apply bioinformatics tools. Where there is a lack of homologous sequences or a three-dimensional structure, it becomes essential to predict the effects of mutations based solely on protein sequence information. Several computational methods utilizing machine learning techniques have been developed. These predictions generally use the 20-alphabet amino acid code to train the model. With limited available data, the 20-alphabet amino acid features may introduce so many parameters that the model becomes over-fitted. To decrease the number of parameters, we propose a physicochemical feature-based method to forecast the effects of amino acid substitutions on protein stability. Protein structure alterations caused by mutations can be classified as stabilizing or destabilizing. Based on experimental folding-unfolding free energy (ΔΔG) values, we trained a support vector machine with a cleaned data set. The physicochemical properties of the mutated residues, the number of neighboring residues in the primary sequence and the temperature and pH were used as input attributes. Different kernel functions, attributes and window sizes were optimized. An average accuracy of 80% was obtained in cross-validation experiments.
Keywords: amino acid/mutation/physicochemical properties/protein stability/support vector machine
| Introduction |
|---|
Mutated proteins are important and helpful for investigating protein functions, testing hypotheses and understanding genotype–phenotype relationships, as well as for rational protein design and engineering. It is well known that accumulation of autosomal mutations can lead to cancers. Hereditary diseases are caused by one or more germline mutations. All sorts of genetic variations, including single nucleotide polymorphisms (SNPs), are now being identified in large numbers (Consortium, 2005
The consequences of mutations can be estimated using numerous tools. We have applied more than 30 applications to investigate the effects of disease-causing mutations (Thusberg and Vihinen, 2006; Väliaho et al., 2006). These methods include, for example, general tests for mutation tolerance [programs like SIFT (Ng and Henikoff, 2001; Ng and Henikoff, 2003) and PolyPhen (Sunyaev et al., 2000; Sunyaev et al., 2001; Ramensky et al., 2002)], sequence conservation and covariance [e.g. ProCon (Shen and Vihinen, 2004), aaMI (Gloor et al., 2005) and MatrixPlot (Gorodkin et al., 1999)], and side chain packing, e.g. Probe (Word et al., 2000), etc. One of the most important features is protein stability, which can be estimated based on sequence and protein three-dimensional structural information. Here, we concentrate on methods and tools for the analysis of the effects on stability.
When the protein structure is known, the effects of amino acid substitutions are often investigated using methods based on force-field theory, such as the free energy perturbation (FEP) technique (Rao et al., 1987; Hirono and Kollman, 1991; Kato et al., 2006). The FEP technique calculates the free energy difference between normal and mutated structures by molecular dynamics simulation. This method is very computationally intensive, and yet the results may still be sensitive to the computational procedures and are sometimes unreliable (Shi et al., 1993). The empirical force field of FoldX (Guerois et al., 2002) allows significantly faster run times, which also facilitates its use in design for protein engineering. The force field has been optimized for point mutations. FoldX can also be run from the SNPeffect server (Reumers et al., 2005, 2006).
Several methods that require less computation have therefore been developed and applied to amino acid mutation analysis. We can group these methods into four categories. The methods of the first category make predictions based on structural information, especially that of amino acid side chain rotamers, packing quality and residue–residue contacts (Tuffery et al., 1997; Sobolev et al., 1999; Word et al., 2000; Wang and Moult, 2001; Wright and Lim, 2001; Shen and Vihinen, 2003; Cuff and Martin, 2004). These techniques either compare the side chain chi (χ) angle values to the backbone independent rotamer library and examine conformational space for the mutant side chain, or estimate the residue–residue contact properties. The second method category uses the information from both multiple sequence alignments and from protein three-dimensional structures. The amino acid variations in families of related proteins are converted into propensity and substitution tables. The existence of an amino acid in a structural environment and the probability of the substitutions are estimated quantitatively (Topham et al., 1997). With the information from both the sequence and structural levels, multiple regression equations can be fitted to predict the folding–unfolding free energy difference for the mutations (Gromiha et al., 1999a, 1999b, 2000; Huang et al., 2007). The methods in the third category calculate a position-specific amino acid distribution based on multiple sequence alignments (Ng and Henikoff, 2001; Sunyaev et al., 2001; Ferrer-Costa et al., 2004, 2005; Shen and Vihinen, 2004). According to the distribution, the amino acid substitutions are classified as tolerated or deleterious. The fourth category of methods uses solely protein sequence information and predicts the protein stability by machine learning algorithms (Capriotti et al., 2004, 2005a, 2005b; Cheng et al., 2006). Although useful information about the effects of mutations can be obtained with the above methods, there is still a need for improved and more accurate prediction methods.
The method we have developed belongs to the last category mentioned above and aims to predict the stability effects of amino acid substitutions based on single sequence information. These kinds of methods are useful and, in fact, are the only tools applicable in cases without close homologues and a known structure. Compared to the previous works (Capriotti et al., 2004, 2005a, 2005b; Cheng et al., 2006), our method uses a cleaned data set and takes the physicochemical similarity of amino acids into account during the training of the expert system. The new method uses a reasonable number of parameters, is more robust against over-fitting and generalizes better.
| Materials and methods |
|---|
Our work aims to classify mutations as stabilizing or destabilizing. For machine learning-based classification, we need to build a model, obtain a set of data and select the training attributes. A support vector machine (SVM) was used as the model and trained with data extracted from the Protherm database (Gromiha et al., 2002
The basic idea of the SVM is shown in Fig. 1. Data that are not linearly separable in the input space can be transformed by a suitable kernel function into a high-dimensional feature space, where they can then be separated linearly.
The SVM algorithm was implemented as follows. Given a dataset (X) to classify into two classes, the decision function for a data point $x$ is

$$f(x) = \mathrm{sgn}\left(\sum_{i} y_i \alpha_i K(x_i, x) + b\right) \tag{1}$$

where $x_i$ are the training points, $y_i \in \{-1, +1\}$ are their class labels, $K$ is the kernel function, $b$ is the bias, and $\alpha_i$ is a non-negative number which can be obtained by maximizing the margin shown in Fig. 1. The maximization is actually performed on the following quadratic function,

$$\max_{\alpha} \; \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{2}$$

$$\text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0 \tag{3}$$

Here, C is an adjustable parameter which controls the trade-off between training error and the margin. The points $x_j$ with non-zero $\alpha_j$ are known as support vectors, which are the points lying closest to the separating hyperplane. Equation (2) is a quadratic programming (QP) problem with a single optimum. Since the SVM algorithm can find this global optimum, it has an advantage over neural networks (NN). Another advantage is that the SVM algorithm does not require much computer power even when the feature space dimensions increase, since the data points only appear in the inner products of the vectors.
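As a minimal sketch of the decision rule described above, the following Python snippet evaluates sign(Σ y_i α_i K(x_i, x) + b) for a handful of support vectors. The function names, the toy support vectors, the α values and the bias are hypothetical and chosen only for illustration; in practice the α values and b come from solving the QP problem during training.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_decide(x, support_vectors, labels, alphas, bias, kernel=dot):
    """Return +1 or -1 from sign(sum_i y_i * alpha_i * K(x_i, x) + bias)."""
    score = bias
    for x_i, y_i, a_i in zip(support_vectors, labels, alphas):
        score += y_i * a_i * kernel(x_i, x)
    return 1 if score >= 0 else -1


# Hypothetical, hand-picked values (not trained on any data).
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
bias = -2.0

print(svm_decide((1.8, 1.9), svs, ys, alphas, bias))  # point near the +1 vector
print(svm_decide((0.1, 0.2), svs, ys, alphas, bias))  # point near the -1 vector
```

Only the training points with non-zero α contribute to the sum, which is why the classifier needs to store the support vectors alone rather than the whole training set.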
For training the model, physicochemical properties including hydropathy (Eisenberg et al., 1984), flexibility (Vihinen et al., 1994), electronic charge concentration and the isotropic surface area (ISA) (Collantes and Dunn, 1995) of amino acids were used as input attributes (Table I). The residue flexibility was calculated by considering their neighboring residues (Karplus and Schulz, 1985; Vihinen et al., 1994). The environment of the mutated residue was taken into account by including the neighboring residues with a sliding window technique. Different window sizes from 3 to 19 were tested to optimize the prediction performance.
Since the Gibbs free energy change depends on experimental conditions, such as temperature and pH, we included this information among the input attributes for SVM training and learning. The total number of input attributes for the SVM model was Np × (Winsize + 1) + 2, where Np is the number of physicochemical properties and Winsize is the size of the sliding window. The window residues contribute Np × Winsize attributes, the mutant residue contributes a further Np attributes, and temperature and pH contribute the remaining two.
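The attribute count can be illustrated with a short Python sketch. The encoding below is an assumption made for illustration: the window is taken to be centered on the mutated position, positions beyond the sequence ends are zero-padded, and the attribute order (window, mutant residue, temperature, pH) is arbitrary. The property values in `toy_props` are hypothetical placeholders, not the values of Table I.

```python
def n_input_attributes(n_props, win_size):
    """Np * (Winsize + 1) + 2, as given in the text."""
    return n_props * (win_size + 1) + 2


def encode_mutation(sequence, pos, mutant, props, win_size, temperature, ph):
    """Build one input vector for a substitution at 0-based index `pos`.

    props maps a one-letter residue code to a list of Np property values.
    """
    n_props = len(next(iter(props.values())))
    half = win_size // 2  # assumes an odd window centered on `pos`
    features = []
    for i in range(pos - half, pos + half + 1):
        if 0 <= i < len(sequence):
            features.extend(props[sequence[i]])
        else:
            features.extend([0.0] * n_props)  # assumption: zero padding at the ends
    features.extend(props[mutant])            # Np attributes for the mutant residue
    features.extend([temperature, ph])        # two experimental conditions
    return features


# Hypothetical two-property table for three residues.
toy_props = {"A": [0.1, 0.2], "G": [0.3, 0.4], "L": [0.5, 0.6]}
vec = encode_mutation("AGLAG", pos=2, mutant="A", props=toy_props,
                      win_size=3, temperature=25.0, ph=7.0)
print(len(vec), n_input_attributes(2, 3))
print(n_input_attributes(4, 13))  # four properties, window length 13
```

With all four properties and a window length of 13, the formula gives 58 attributes, compared with 20 attributes per position for a 20-letter encoding.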
Data set and SVM implementation
The dataset to train the SVM was extracted from the ProTherm database http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html (Gromiha et al., 2002). Since the experimental error for the measurement of ΔΔG could be ±0.4–0.5 kcal/mol (Khatun et al., 2004), the data with |ΔΔG| < 0.5 kcal/mol would be difficult to classify. Thus, we only extracted the cases with |ΔΔG| > 0.5 kcal/mol for the model training and testing. During further data cleaning, we removed double mutations and averaged the values when several ΔΔG values from different sources were available for a single mutation. Finally, we had data for 1448 mutations in 68 proteins. One thousand one hundred of these were destabilizing (ΔΔG < –0.5 kcal/mol) and 348 were stabilizing (ΔΔG > 0.5 kcal/mol) alterations. The proteins and numbers of mutations are listed in Table II. For the SVM training, the mutations with ΔΔG > 0.5 kcal/mol were labeled as positive cases, with $y_i = +1$, and the mutations with ΔΔG < –0.5 kcal/mol were labeled as negative cases, with $y_i = -1$.
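The cleaning and labeling steps described above could be sketched as follows. The record layout `(protein, mutation, ddg, n_substitutions)` is a hypothetical stand-in for a ProTherm export, and dropping an averaged value that falls back inside the ±0.5 kcal/mol band is an added assumption that the text does not state.

```python
from collections import defaultdict

THRESHOLD = 0.5  # kcal/mol, the cut-off quoted in the text


def clean_and_label(records, threshold=THRESHOLD):
    """Return {(protein, mutation): (mean_ddg, label)} with label +1 or -1."""
    grouped = defaultdict(list)
    for protein, mutation, ddg, n_substitutions in records:
        if abs(ddg) <= threshold:   # within experimental error: not extracted
            continue
        if n_substitutions != 1:    # remove double (multiple) mutations
            continue
        grouped[(protein, mutation)].append(ddg)

    labeled = {}
    for key, values in grouped.items():
        mean_ddg = sum(values) / len(values)  # average repeated measurements
        if abs(mean_ddg) <= threshold:
            continue  # assumption: ambiguous averages are dropped as well
        labeled[key] = (mean_ddg, 1 if mean_ddg > 0 else -1)
    return labeled


# Hypothetical records for illustration only.
records = [
    ("P1", "A10G", 1.2, 1),
    ("P1", "A10G", 0.8, 1),
    ("P1", "L5A", -0.3, 1),
    ("P2", "V7I", -2.0, 1),
    ("P2", "V7I/K9R", -3.0, 2),
]
for key, (mean_ddg, label) in sorted(clean_and_label(records).items()):
    print(key, round(mean_ddg, 2), label)
```

In this toy input, the L5A measurement is discarded as too close to zero, the double mutant is removed, and the two A10G measurements are averaged into a single stabilizing case.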
SVMlight (Joachims, 1999
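The accuracy of the predictions was measured by 10-fold cross-validation: the data were partitioned randomly into 10 sets, the model was trained on nine of the sets and tested on the remaining one, this was repeated for each of the 10 sets, and the results were averaged. The quality of the prediction is described by four parameters: accuracy, recall, precision and the Matthews correlation coefficient (MCC).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{7}$$

Here, TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.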
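The evaluation procedure can be sketched in Python as below, assuming the standard definitions of accuracy, recall, precision and MCC in terms of true/false positive and negative counts. The function names, the random seed and the example counts are hypothetical; the fold assignment shown is one simple way to partition the data randomly into 10 sets.

```python
import math
import random


def classification_scores(tp, tn, fp, fn):
    """Accuracy, recall, precision and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical counts, not results from the paper.
scores = classification_scores(tp=30, tn=50, fp=10, fn=10)
print({name: round(value, 3) for name, value in scores.items()})

folds = k_fold_indices(1448, k=10)
print([len(fold) for fold in folds])
```

Each fold serves once as the test set while the other nine are used for training, and the four scores are averaged over the 10 runs. A classifier that never predicts the minority (stabilizing) class can still reach a high accuracy on unbalanced data, which is why recall and MCC are reported alongside it.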
| Results and discussion |
|---|
Foundation of the prediction
According to Anfinsen's hypothesis (Anfinsen, 1973), the information of a protein's tertiary structure is encoded in its primary sequence. Therefore, we can assume that stability information can also be explained with the sequence information. Different machine learning approaches have commonly been used to study sequence-structure relationships to make predictions, e.g. for protein secondary structures (Hoang et al., 2002; Boden et al., 2006), the solvent accessibility of residues (Adamczak et al., 2004; Sim et al., 2005), residue–residue contacts (Grana et al., 2005; Yuan, 2005; Cheng and Baldi, 2007) and for protein three-dimensional structures (Klepeis et al., 2005; Zhou and Skolnick, 2007).
Several machine learning methods are available for extracting patterns from complex data. NN and SVMs are among the most widely used methods for biological problems. SVMs have been applied successfully in many biological data analyses, such as in microarray data classification (Chu and Wang, 2005; Huang and Chang, 2006; Doran et al., 2007), protein solvent accessibility prediction (Yuan et al., 2002; Kim and Park, 2004; Wang et al., 2007), protein secondary structure prediction (Hu et al., 2004; Wang et al., 2004) and in protein stability analysis (Capriotti et al., 2005a, 2005b; Yue et al., 2005; Cheng et al., 2006).
The performance of SVMs can be adjusted and optimized by changing the kernel. We tested linear, polynomial and radial basis function kernels, described in equations (8)–(10), for speed and learning performance.
Linear kernel:

$$K(x_i, x_j) = x_i \cdot x_j \tag{8}$$

Polynomial kernel:

$$K(x_i, x_j) = (x_i \cdot x_j + 1)^d \tag{9}$$

Radial basis function kernel:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right) \tag{10}$$

Here, $d$ and $\gamma$ are adjustable parameters, and $x_i$ and $x_j$ are the data for classification. The radial basis function kernel was found to be the best for the speed and quality of the prediction. The other kernels performed poorly, with accuracies below 75%. A similar result was obtained when an SVM was applied to the prediction of sub-cellular localization (Hua and Sun, 2001a, 2001b).
Four different physicochemical descriptors, namely hydropathy, flexibility, ISA and electronic charge concentration, were used to characterize the amino acids. The applicability of the parameter combinations was tested. The prediction accuracy is plotted for different parameters and window lengths (ranging from 3 to 19) in Fig. 2. The best prediction results for each parameter combination are listed in Table III. The adjustable parameters C in equation (3) and γ in equation (10) were optimized over the ranges C = 1–30 and γ = 0.1–0.4. With C = 2 and γ = 0.1, the accuracy of the best performance for the prediction is 80.45% and the corresponding MCC is 0.39 (see Table III). Table III lists the highest scores among the optimized conditions. The scores were calculated by averaging over ten cross-validations. The cross-validation accuracy (80.45%) for all four amino acid parameters with a window length of 13 is about 3 percentage points higher than the previously reported result (77%) (Capriotti et al., 2005a, 2005b).
Input attributes and window size
The choice of attributes is crucial for pattern recognition or classification. According to the results in Table III, the predictions based on all four physicochemical properties are better than the others. The performance is surprisingly even for the different predictors. In addition, the differences between the parameter combinations are relatively marginal. We also trained and predicted using just one or two physicochemical properties and found that the prediction performances were very poor (accuracy below 75% and recall close to 0). This indicates that the stabilizing/destabilizing classification is complex and cannot be obtained with too few features. If ISA is not included, the performance shows the most significant drop in accuracy. It is well known that the hydrophobic effect is one of the principal forces stabilizing protein structures (Kauzmann, 1959), but it is not the only factor, and the prediction based solely on hydropathy performed poorly (accuracy below 75% and recall close to 0). ISA is a measurement of both the size and the proportion of the residue which is accessible for non-specific interactions with the solvent (Dunn et al., 1987; Collantes and Dunn, 1995). It characterizes both residue size and hydropathy. A change in the size of a side-chain of an amino acid is essential for structural stability at buried sites (Wang et al., 1998; Liu et al., 2000; Ferrer-Costa et al., 2002).
The stability of a protein is determined by complex residue–residue contacts and interactions. Without structural information, the environment of a residue in many predictions is considered partially by taking into account the properties of neighbors with a sliding window average technique (Edelman and White, 1989; Hofmann and Stoffel, 1992; Vihinen et al., 1994; Fares, 2004). For the stability of a protein, it is difficult to account for the complex relationships between residues that are separated by a large distance in the sequence and the protein stability, since the structural neighboring residues may be located both within the local sequence neighborhood and far away in the sequence.
Support vectors are training examples that lie close to the decision boundary between the two classes. The SVM focuses upon the small subset of examples (SVs) that are critical and informative to the classification and discards the remaining examples. Removal of SVs changes the location of the separating hyperplane and alters the efficiency of the classification. The number of support vectors was more than one-third of the sample size (Table III), indicating the difficulty of the classification. This observation is easy to understand by comparing the size of sequence space and structure folding space. Even very different protein sequences can fold to a similar three-dimensional structure. On the other hand, relatively similar sequences can have different folds in proteins. Protein structures can be robust to point mutations (Taverna and Goldstein, 2002). Still, numerous diseases arise due to single mutations. Residues within a protein take part in complex and non-linear interaction networks, which makes single sequence-based predictions difficult, especially when accounting for residues located far away in the sequence but close together in the folded structure.
Comparison with other sequence-based methods
Two sequence-based approaches, using single or multiple sequences, can be utilized for analyzing amino acid mutations. The multiple sequence methods are generally based on protein score matrices, such as the widely used PAM (Percent of Point Accepted Mutation) and BLOSUM (Blocks Substitution Matrix) (Topham et al., 1997; Boland and Murphy, 2001; Ferrer-Costa et al., 2002). Amino acid mutations are constrained evolutionarily by two factors: structural and functional constraints. The score matrices reflect both of these constraints. Many of the functional residues are located on the surfaces of proteins, where they may have little effect on the protein structure and stability. For the classification of stabilizing and destabilizing mutations, it may not be necessary to account for the functional constraint. Therefore, the matrices which include both structural and functional information may in fact decrease the discriminative power of the classifier.
In Fig. 3, the relationships between protein stability (experimental ΔΔG) and the changes in physicochemical properties (ΔH, ΔI, ΔE, ΔF) are shown. It is clear that no simple tendencies or relations could be found. Therefore, a non-linear model is necessary to describe and predict the relationships.
Our method uses single sequence information. Previously, this kind of classification has been based upon 20-alphabet attributes and SVM machine learning (Capriotti et al., 2005a
The present model improves the prediction in two aspects. First, we use the cleaned data set for the training and testing of our model. We excluded data points with a ΔΔG between –0.5 and 0.5 kcal/mol, because in this range the true difference cannot be separated due to experimental uncertainty. Such data may mislead the classification, since the experimental error can be up to ±0.5 kcal/mol and thus the mutations could not be precisely grouped as stabilizing or destabilizing. We analyzed the classification result of the previous work by Capriotti et al. The data set was taken from the authors' webpage (http://gpcr.biocomp.unibo.it/~emidio/I-Mutant2.0/dbMutSeq.html). This site lists the data set used for training and testing their SVM models. The experimental ΔΔGs and their predicted ΔΔGs are also given. In total, there are 2048 cases in the list, 594 of which have experimental ΔΔG values between 0.5 and –0.5 kcal/mol; 492 (24.0%) of the 2048 cases are misclassified. Figure 4 shows the misclassified cases. The signs of the predicted and experimental ΔΔGs are different, thus the products of the experimental ΔΔG and the predicted ΔΔG are less than zero. The majority of the misclassifications (250 cases, i.e. 50.8% of 492 cases) have experimental ΔΔG values between 0.5 and –0.5 kcal/mol (the data between the two lines in Fig. 4).
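The sign-product criterion and the quoted percentages can be checked with a short Python sketch. The function name and the toy (experimental, predicted) pairs are hypothetical and are not taken from the I-Mutant2.0 list; only the counts 2048, 492 and 250 come from the text.

```python
def misclassification_summary(pairs, threshold=0.5):
    """Count sign mismatches among (experimental, predicted) pairs in kcal/mol.

    Returns (number misclassified, number of those with |experimental| < threshold).
    """
    misclassified = [(e, p) for e, p in pairs if e * p < 0]
    near_zero = [(e, p) for e, p in misclassified if abs(e) < threshold]
    return len(misclassified), len(near_zero)


# Hypothetical toy pairs for illustration.
toy = [(1.2, 0.8), (-0.3, 0.4), (0.2, -0.1), (-2.0, 0.5), (-1.1, -0.9)]
n_mis, n_near = misclassification_summary(toy)
print(n_mis, n_near)

# Percentages quoted in the text, recomputed from the reported counts.
print(round(100 * 492 / 2048, 1))  # share of all cases that are misclassified
print(round(100 * 250 / 492, 1))   # share of misclassified cases inside the band
```

A pair counts as misclassified when the product of the two values is negative, so cases whose experimental value sits inside the ±0.5 kcal/mol band can flip class through measurement error alone.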
For the other improvement, we use the numeric physicochemical properties of amino acids instead of the 20-alphabet. The properties of amino acids as attributes better explain the characteristics of residues. Protein folding and stability are mainly determined by the properties of amino acids in the primary sequence (Anfinsen, 1973
1500 cases, and only 1350 cases could be used for training when using the 10-fold cross-validation method). With fewer parameters, our model is more robust against over-fitting and has better generality.
| Conclusions |
|---|
The problem of protein structure availability in the whole genome era has been redefined as the determination of representative structures of conserved protein families (Redfern et al., 2005
Site-directed mutagenesis is a commonly used technique. However, the relationship between a mutation and protein stability is often still an unresolved and difficult problem. Many studies are based on limited data and empirical rules. As the amount of data is continuously increasing, our task is to improve the prediction accuracy and refine the model. Our study indicates a promising approach to predicting mutation effects based solely on single sequence information.
| Footnotes |
|---|
Edited by Jane Clarke
| Acknowledgement |
|---|
We gratefully acknowledge the financial support of the Medical Research Fund of Tampere University Hospital.
| References |
|---|
Adamczak R., Porollo A., Meller J. Proteins (2004) 56:753–767.
Anfinsen C.B. Science (1973) 181:223–230.
Boden M., Yuan Z., Bailey T.L. BMC Bioinformatics (2006) 7:68.
Boland M.V., Murphy R.F. Bioinformatics (2001) 17:1213–1223.
Bommarius A.S., Broering J.M., Chaparro-Riggers J.F., Polizzi K.M. Curr. Opin. Biotechnol. (2006) 17:606–610.
Capriotti E., Fariselli P., Casadio R. Bioinformatics (2004) 20(Suppl. 1):I63–I68.
Capriotti E., Fariselli P., Calabrese R., Casadio R. Bioinformatics (2005a) 21(Suppl. 2):ii54–ii58.
Capriotti E., Fariselli P., Casadio R. Nucleic Acids Res. (2005b) 33:W306–W310.
Chandonia J.M., Brenner S.E. Science (2006) 311:347–351.
Chanock S.J., et al. Nature (2007) 447:655–660.
Cheng J., Randall A., Baldi P. Proteins (2006) 62:1125–1132.
Cheng J., Baldi P. BMC Bioinformatics (2007) 8:113.
Chica R.A., Doucet N., Pelletier J.N. Curr. Opin. Biotechnol. (2005) 16:378–384.
Chu F., Wang L. Int. J. Neural. Syst. (2005) 15:475–484.
Collantes E.R., Dunn W.J. 3rd. J. Med. Chem. (1995) 38:2705–2713.
Consortium T.I.H. Nature (2005) 437:1299–1320.
Cuff A.L., Martin A.C. J. Mol. Biol. (2004) 344:1199–1209.
Doran M., Raicu D.S., Furst J.D., Settimi R., Schipma M., Chandler D.P. Bioinformatics (2007) 23:487–492.
Dunn W.J. 3rd, Koehler M.G., Grigoras S. J. Med. Chem. (1987) 30:1121–1126.
Edelman J., White S.H. J. Mol. Biol. (1989) 210:195–209.
Eisenberg D., Schwarz E., Komaromy M., Wall R. J. Mol. Biol. (1984) 179:125–142.
Fares M.A. Bioinformatics (2004) 20:2867–2868.
Ferrer-Costa C., Orozco M., de la Cruz X. J. Mol. Biol. (2002) 315:771–786.
Ferrer-Costa C., Orozco M., de la Cruz X. Proteins (2004) 57:811–819.
Ferrer-Costa C., Orozco M., de la Cruz X. Proteins (2005) 61:878–887.
Gloor G.B., Martin L.C., Wahl L.M., Dunn S.D. Biochemistry (2005) 44:7156–7165.
Gorodkin J., Staerfeldt H.H., Lund O., Brunak S. Bioinformatics (1999) 15:769–770.
Grana O., Baker D., MacCallum R.M., Meiler J., Punta M., Rost B., Tress M.L., Valencia A. Proteins (2005) 61(Suppl. 7):214–224.
Gromiha M.M., Oobatake M., Kono H., Uedaira H., Sarai A. Protein Eng. (1999a) 12:549–555.
Gromiha M.M., Oobatake M., Kono H., Uedaira H., Sarai A. J. Protein Chem. (1999b) 18:565–578.
Gromiha M.M., Oobatake M., Kono H., Uedaira H., Sarai A. J. Biomol. Struct. Dyn. (2000) 18:281–295.
Gromiha M.M., Uedaira H., An J., Selvaraj S., Prabakaran P., Sarai A. Nucleic Acids Res. (2002) 30:301–302.
Guerois R., Nielsen J.E., Serrano L. J. Mol. Biol. (2002) 320:369–387.
Hirono S., Kollman P.A. Protein Eng. (1991) 4:233–243.
Hoang T.X., Cieplak M., Banavar J.R., Maritan A. Proteins (2002) 48:558–565.
Hofmann K., Stoffel W. Comput. Appl. Biosci. (1992) 8:331–337.
Horaitis O., Talbot C.C. Jr, Phommarinh M., Phillips K.M., Cotton R.G. Nat. Genet. (2007) 39:425.
Hu H.J., Pan Y., Harrison R., Tai P.C. IEEE Trans. Nanobiosci. (2004) 3:265–271.
Hua S., Sun Z. Bioinformatics (2001a) 17:721–728.
Hua S., Sun Z. J. Mol. Biol. (2001b) 308:397–407.
Huang H.L., Chang F.L. Biosystems (2006) 90:516–528.
Huang L.T., Saraboji K., Ho S.Y., Hwang S.F., Ponnuswamy M.N., Gromiha M.M. Biophys. Chem. (2007) 125:462–470.
Joachims T. Making large-Scale SVM Learning Practical (1999) Cambridge, MA: MIT Press.
Kanal L., Chandrasekaran B. Pattern Recognit. (1971) 3:225–234.
Karplus P.A., Schulz G.E. Naturwissenschaften (1985) 72:212–213.
Kato M., Pisliakov A.V., Warshel A. Proteins (2006) 64:829–844.
Kauzmann W. Adv. Protein Chem. (1959) 14:1–63.
Kearns-Jonker M., Barteneva N., Mencel R., Hussain N., Shulkin I., Xu A., Yew M., Cramer D.V. BMC Immunol. (2007) 8:3.
Khatun J., Khare S.D., Dokholyan N.V. J. Mol. Biol. (2004) 336:1223–1238.
Kim H., Park H. Proteins (2004) 54:557–562.
Klepeis J.L., Wei Y., Hecht M.H., Floudas C.A. Proteins (2005) 58:560–570.
Liu R., Baase W.A., Matthews B.W. J. Mol. Biol. (2000) 295:127–145.
Ng P.C., Henikoff S. Genome Res. (2001) 11:863–874.
Ng P.C., Henikoff S. Nucleic Acids Res. (2003) 31:3812–3814.
Rajendhran J., Gunasekaran P. J. Biosci. Bioeng. (2007) 103:457–463.
Ramensky V., Bork P., Sunyaev S. Nucleic Acids Res. (2002) 30:3894–3900.
Rao S.N., Singh U.C., Bash P.A., Kollman P.A. Nature (1987) 328:551–554.
Redfern O., Grant A., Maibaum M., Orengo C. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. (2005) 815:97–107.
Reumers J., Schymkowitz J., Ferkinghoff-Borg J., Stricher F., Serrano L., Rousseau F. Nucleic Acids Res. (2005) 33:D527–D532.
Reumers J., Maurer-Stroh S., Schymkowitz J., Rousseau F. Bioinformatics (2006) 22:2183–2185.
Shen B., Vihinen M. Bioinformatics (2003) 19:2161–2162.
Shen B., Vihinen M. Protein Eng. Des. Sel. (2004) 17:267–276.
Shi Y.Y., Mark A.E., Wang C.X., Huang F., Berendsen H.J., van Gunsteren W.F. Protein Eng. (1993) 6:289–295.
Sim J., Kim S.Y., Lee J. Bioinformatics (2005) 21:2844–2849.
Sobolev V., Sorokine A., Prilusky J., Abola E.E., Edelman M. Bioinformatics (1999) 15:327–332.
Sobolev V., Eyal E., Gerzon S., Potapov V., Babor M., Prilusky J., Edelman M. Nucleic Acids Res. (2005) 33:W39–W43.
Stenson P.D., Ball E.V., Mort M., Phillips A.D., Shiel J.A., Thomas N.S., Abeysinghe S., Krawczak M., Cooper D.N. Hum. Mutat. (2003) 21:577–581.
Sunyaev S., Ramensky V., Bork P. Trends Genet. (2000) 16:198–200.
Sunyaev S., Ramensky V., Koch I., Lathe W. 3rd, Kondrashov A.S., Bork P. Hum. Mol. Genet. (2001) 10:591–597.
Taverna D.M., Goldstein R.A. J. Mol. Biol. (2002) 315:479–484.
Thusberg J., Vihinen M. Hum. Mutat. (2006) 27:1230–1243.
Topham C.M., Srinivasan N., Blundell T.L. Protein Eng. (1997) 10:7–21.
Tuffery P., Etchebest C., Hazout S. Protein Eng. (1997) 10:361–372.
Väliaho J., Smith C.I.E., Vihinen M. Hum. Mutat. (2006) 27:1209–1217.
Vihinen M., Torkkila E., Riikonen P. Proteins (1994) 19:141–149.
Wang J.Y., Lee H.M., Ahmad S. Proteins (2007) 68:82–91.
Wang L., Veenstra D.L., Radmer R.J., Kollman P.A. Proteins (1998) 32:438–458.
Wang L.H., Liu J., Li Y.F., Zhou H.B. Genome Inform (2004) 15:181–190.
Wang Z., Moult J. Hum. Mutat. (2001) 17:263–270.
Word J.M., Bateman R.C. Jr, Presley B.K., Lovell S.C., Richardson D.C. Protein Sci. (2000) 9:2251–2259.
Wright J.D., Lim C. Protein Eng. (2001) 14:479–486.
Yuan Z., Burrage K., Mattick J.S. Proteins (2002) 48:566–570.
Yuan Z. BMC Bioinformatics (2005) 6:248.
Yue P., Li Z., Moult J. J. Mol. Biol. (2005) 353:459–473.
Zhou H., Skolnick J. Biophys. J. (2007) 93:1510–1518.
Received August 28, 2007; revised October 22, 2007; accepted November 22, 2007.