

PEDS Advance Access originally published online on September 4, 2008
Protein Engineering Design and Selection 2008 21(11):659-664; doi:10.1093/protein/gzn045

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

A novel hierarchical ensemble classifier for protein fold recognition

Xia Guo and Xieping Gao1

Information Engineering College, Xiangtan University, Xiangtan 411105, Hunan, PR China

1 To whom correspondence should be addressed. E-mail: xpgao{at}xtu.edu.cn


    Abstract
Ensemble classifiers play a critical role in protein fold recognition. In this article, a novel hierarchical ensemble classifier named GAOEC (Genetic-Algorithm Optimized Ensemble Classifier) is presented. It is constructed in the following steps. First, a novel optimized classifier named GAET-KNN (Genetic-Algorithm Evidence-Theoretic K Nearest Neighbors) is proposed as the component classifier. Second, six component classifiers in the first layer are used to obtain a potential class index for every query protein. Third, according to the results of the first layer, every component classifier in the second layer generates a 27-dimensional vector whose elements represent the confidence degrees of the 27 folds. Finally, a genetic algorithm is used to generate weights for the outputs of the second layer to obtain the final classification result. The standard percentage accuracy of GAOEC is 64.7% on a widely used benchmark dataset, where the proteins in the testing set have less than 35% identity with those in the training set.

Keywords: ET-KNN/GAET-KNN/genetic algorithm/hierarchical ensemble classifier/protein fold recognition


    Introduction
With the rapid growth in the number of new sequences, novel structure prediction algorithms are urgently needed to determine the structures of these sequences. A number of methods have achieved some success by recognizing the protein fold, which is a common three-dimensional pattern with the same major secondary structure elements in the same arrangement and with the same topological connections (Craven et al., 1995). Protein fold recognition is very challenging when proteins have no significant sequence similarity. Although traditional similarity-based methods, such as pairwise sequence alignment and sequence-structure compatibility, can accurately recognize the protein fold when proteins have a close evolutionary relationship (Xu et al., 2003; Zhou and Zhou, 2004; Han et al., 2005; Söding, 2005), they are not effective when proteins have less than 20% sequence identity. Compared with the traditional similarity-based methods, machine learning methods that integrate multiple similarity features and recast protein fold recognition as a two-class problem have shown promising results when proteins have low sequence similarity. For example, the percentage of correct first hits for MANIFOLD is 74.93% on the dataset developed by Ding and Dubchak (Bindewald et al., 2003). The sensitivity of FOLDpro is about 27% on the large benchmark dataset developed by Lindahl and Elofsson (Cheng and Baldi, 2006).

In recent years, the taxonometric approach, which does not rely on similarity features, has played a critical role in protein fold recognition (Ding and Dubchak, 2001; Huang et al., 2003; Tan et al., 2003; Nanni, 2005; Nanni, 2006a; Nanni, 2006b; Shen and Chou, 2006). The taxonometric approach extracts features directly from query proteins and recasts protein fold recognition as a multi-class problem. For multi-class problems, classification becomes more difficult as the number of classes grows, and the number of protein folds exceeds 1000. Therefore, it may be impossible to recognize the fold of a query protein among all protein folds through the taxonometric approach. Most taxonometric approaches (Ding and Dubchak, 2001; Huang et al., 2003; Tan et al., 2003; Nanni, 2005; Nanni, 2006a; Nanni, 2006b; Shen and Chou, 2006) determine the fold of the query among 27 folds, each of which has no fewer than seven proteins and which together represent all major structural classes: α, β, α/β and α+β. Protein fold recognition among these 27 folds is itself a challenging task. Here, like previous researchers, we focus on the ensemble classifier for protein fold recognition among the 27 folds. In general, an ensemble classifier is constructed by training a number of component classifiers and then integrating the component predictions. Therefore, the performance of the component classifiers and the effectiveness of the ensemble strategy are the major factors in the classification accuracy of an ensemble classifier. Although PFP-Pred (Shen and Chou, 2006) has achieved better performance than previous work, it has two shortcomings. First, because the optimized parameters in OET-KNN (Optimized Evidence-Theoretic K Nearest Neighbors) are far from the global optimum, the component classifier OET-KNN is not very effective. Second, the weighted ensemble strategy, wherein the classification accuracies of the component classifiers serve as the corresponding weights, does not generate the optimum weight vector to maximize classification accuracy.

In this article, a novel two-layer ensemble classifier named GAOEC is presented. First, the component classifiers in the first layer are used to obtain a potential class index for every query protein among the 27 folds. Second, according to the potential class index, every component classifier in the second layer generates a 27-dimensional vector for every query protein, wherein each element represents the confidence degree of its corresponding fold. The coarse classification in the first layer helps the second layer by excluding the training samples whose class labels do not match any element of the potential class index. As in previous studies, six kinds of features are used to recognize the protein fold, and each layer includes six component classifiers. In practice, this reduction of noise in the training set increases the accuracies of the second layer by up to 10% compared with those of the first layer. The component classifiers in each layer are GAET-KNNs, which use a genetic algorithm (GA) to generate the parameter vector in ET-KNN that maximizes classification accuracy. In practice, GAET-KNN as the component classifier achieves much better accuracy than OET-KNN. Additionally, tuning the number of nearest neighbors in GAET-KNN has little influence on classification accuracy. Because of its powerful global optimization performance, GA is also used to generate the weights for the outputs of the second layer to maximize classification accuracy. Finally, GAOEC generates a 27-dimensional vector, and the index of the maximum element in the vector is the final classification result for the query. In this article, majority voting also serves as an ensemble strategy for assessing the performance of the classification system.


    Materials and methods
Materials

The dataset studied here is taken from Ding and Dubchak (2001). It contains 313 proteins in the training set and 385 proteins in the testing set. Proteins in the training set and the testing set are classified into the 27 most populated folds (Table A1), each of which has no fewer than seven proteins and which together represent all major structural classes: α, β, α/β and α+β (Ding and Dubchak, 2001). None of the proteins in the testing set has more than 35% sequence identity with those in the training set. Six kinds of features are extracted from every protein in the dataset (Table I).


Table I. List of six kinds of features extracted from proteins in the datasets (Ding and Dubchak, 2001)

 
Various accuracy measures are used to assess the performance of classifiers for protein fold recognition. True positive rate and false positive rate are usually used in two-way classifiers. Analogously, sensitivity and specificity are usually used in the similarity-based methods (Bindewald et al., 2003; Cheng and Baldi, 2006). The standard Q percentage accuracy (Rost and Sander, 1993; Baldi et al., 2000) is usually used in multi-way classifiers. For multi-class protein fold classification, the standard Q percentage accuracy is used to evaluate GAOEC and GAET-KNN. Additionally, the performance of GAOEC is also evaluated by sensitivity.

Suppose n_i denotes the number of testing proteins in the ith fold, of which c_i are correctly classified. The classification accuracy of the ith fold can then be denoted as Q_i = c_i/n_i, and the overall classification accuracy can be formulated as follows:

\[ N = \sum_{i=1}^{k} n_i \tag{1} \]

\[ C = \sum_{i=1}^{k} c_i \tag{2} \]

\[ Q = \frac{C}{N} = \sum_{i=1}^{k} \frac{n_i}{N}\, Q_i \tag{3} \]
where N is the total number of testing proteins, C is the total number of correctly classified proteins, Q is the overall classification accuracy and k is the number of classes.
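
The accuracy measure defined above can be illustrated with a short sketch. The function names and the per-fold counts below are hypothetical and are not taken from the paper; the sketch only shows that the overall accuracy equals the mean of the per-fold accuracies weighted by fold size.

```python
def fold_accuracies(n, c):
    """Per-fold accuracy Q_i = c_i / n_i (every fold must have n_i > 0)."""
    return [ci / ni for ni, ci in zip(n, c)]


def overall_accuracy(n, c):
    """Overall accuracy Q = C / N, with N = sum(n_i) and C = sum(c_i)."""
    return sum(c) / sum(n)


# Hypothetical counts for three folds (illustrative only).
n = [6, 9, 20]   # testing proteins per fold
c = [5, 6, 11]   # correctly classified proteins per fold

Q_i = fold_accuracies(n, c)
Q = overall_accuracy(n, c)

# Q is also the fold-size-weighted mean of the per-fold accuracies.
weighted = sum(ni / sum(n) * qi for ni, qi in zip(n, Q_i))
```

Because folds differ widely in size, the overall value is dominated by the large folds, which is why per-fold accuracies are worth inspecting alongside it.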

Methods

The two-layer ensemble classifier (GAOEC)

As reviewed in the Introduction, although PFP-Pred performs well compared with previous ensemble classifiers, neither its component classifiers nor its weighted ensemble strategy is efficient enough. For multi-class problems, classification becomes more difficult as the number of classes grows. Reducing the noise in the training set in a reasonable way might therefore be an effective means of improving classification performance.

In this section, a novel hierarchical ensemble classifier (GAOEC) is presented. It is constructed in the following steps (Fig. 1). First, a novel optimized classifier (GAET-KNN) is proposed as the component classifier. Second, two layers of GAET-KNNs are used to classify query proteins among the 27 folds. As in previous studies, six kinds of features are extracted from every protein in the dataset, and each layer includes six GAET-KNNs. When a query protein is presented to GAOEC, every GAET-KNN in the first layer generates a 27-dimensional vector, wherein each element represents the confidence degree of its corresponding fold. The index of the maximum element in the vector acts as that classifier's class label for the query. Because there are six first-layer classifiers, the number of potential folds for the query is at most six. The coarse classification in the first layer therefore helps the second layer by excluding the training samples whose class labels do not match any element of the potential class index obtained by the first layer. Third, GA is used to generate the weights for the outputs of the second layer to maximize classification accuracy. Finally, GAOEC generates a 27-dimensional vector, and the index of the maximum element in the vector is the final classification result for the query.


Fig. 1. GAOEC employs a two-layer structure to recognize the fold of query proteins. The neighbors in the second layer are drawn from the folds in the potential class index generated by the first layer. GAOEC employs GA to generate the weights for the second layer as a whole to maximize classification accuracy.
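
The hand-off between the two layers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `candidate_folds` and `filter_training_set` are hypothetical, and the confidence vectors are made-up numbers over four folds rather than 27.

```python
def candidate_folds(first_layer_vectors):
    """Potential class index: the top-ranked fold of each first-layer classifier.

    With six first-layer classifiers the returned set has at most six folds.
    """
    return {max(range(len(v)), key=v.__getitem__) for v in first_layer_vectors}


def filter_training_set(train_x, train_y, candidates):
    """Keep only the training samples whose fold label is a candidate fold."""
    kept = [(x, y) for x, y in zip(train_x, train_y) if y in candidates]
    return [x for x, _ in kept], [y for _, y in kept]


# Hypothetical confidence vectors from three first-layer classifiers.
vectors = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
]
candidates = candidate_folds(vectors)

# Hypothetical training samples (placeholders) with fold labels.
train_x = ["a", "b", "c", "d", "e"]
train_y = [0, 1, 2, 3, 1]
kept_x, kept_y = filter_training_set(train_x, train_y, candidates)
```

The second-layer classifiers would then search for nearest neighbors only within `kept_x`, which is what removes samples from folds that no first-layer classifier proposed.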

 
The component classifier (GAET-KNN)

In this article, rather than using the existing OET-KNN, we propose a novel optimized classifier, GAET-KNN, as the component classifier. It uses GA to generate the parameters in ET-KNN that maximize classification accuracy. For the reader's convenience, the ET-KNN rule used in this paper is recalled as follows.

There are 27 folds, which can be denoted by S = {C1, C2, ..., C27}. The training set, which contains N n-dimensional patterns x(i), can be denoted by Γ = {(x(1), c(1)), (x(2), c(2)), ..., (x(N), c(N))}. The class label c(i) takes values in the set S. The similarity between two patterns is measured by Euclidean distance. If the distance is small, the two patterns are deemed to belong to the same class; otherwise, their classes are treated as unrelated. Therefore, the training sample with the smallest distance is deemed the nearest neighbor of the query. According to the ET-KNN rule, every neighbor of the query x is considered as an item of evidence supporting certain hypotheses concerning the class membership of the query (Denoeux, 1995). So, the K nearest neighbors with the smallest distances are used to determine the fold of the query. A basic belief assignment (BBA) is assigned to every neighbor. The resulting BBA is obtained by aggregating the BBAs of the K nearest neighbors using Dempster's rule. Consequently, a BBA can be defined by (Denoeux, 1995):

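\[ m(\{C_u\} \mid x^{(i)}) = \alpha \exp\!\left(-\gamma_u\, d(x, x^{(i)})^2\right) \tag{4} \]

\[ m(S \mid x^{(i)}) = 1 - \alpha \exp\!\left(-\gamma_u\, d(x, x^{(i)})^2\right) \tag{5} \]

where d(x, x(i)) is the Euclidean distance between x and x(i), Cu is the class of x(i), α is a fixed parameter such that 0 < α < 1 and γu is a positive parameter associated with the class Cu.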

According to Dempster's rule, a resulting BBA m regarding the class of the query x can be formulated by (Denoeux, 1995):

\[ m = m(\cdot \mid x^{(i_1)}) \oplus \cdots \oplus m(\cdot \mid x^{(i_K)}) \tag{6} \]
where ⊕ denotes the orthogonal sum and IK = {i1, ..., iK} is the index set of the K nearest neighbors of x.

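Thus, m can be written as the following expressions (Denoeux, 1995):

\[ m(\{C_u\}) = \frac{1}{W}\left(1 - \prod_{i \in I_{K,u}} \bigl(1 - \alpha\,\phi_u^{(i)}\bigr)\right) \prod_{r \neq u}\ \prod_{i \in I_{K,r}} \bigl(1 - \alpha\,\phi_r^{(i)}\bigr) \tag{7} \]

\[ m(S) = \frac{1}{W} \prod_{r=1}^{27}\ \prod_{i \in I_{K,r}} \bigl(1 - \alpha\,\phi_r^{(i)}\bigr) \tag{8} \]
where W is a normalizing factor, I_{K,r} is the subset of IK whose neighbors belong to class Cr, and \phi_r^{(i)} = \exp(-\gamma_r\, d(x, x^{(i)})^2).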

Thus, the class of the query x is Cu, if

\[ m(\{C_u\}) = \mathrm{Max}\bigl\{ m(\{C_1\}), m(\{C_2\}), \ldots, m(\{C_{27}\}) \bigr\} \tag{9} \]
where Max means taking the maximum one among those in the brackets.
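
The ET-KNN rule recalled above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: it assumes the exponential form exp(-γu d²) from Denoeux (1995) for the distance-dependent term, and the function names, the toy two-class data and the value α = 0.95 are all hypothetical choices.

```python
import math


def etknn_bba(query, train_x, train_y, gamma, n_classes, k=3, alpha=0.95):
    """Combine the BBAs of the K nearest neighbours with Dempster's rule.

    Returns (masses, frame_mass): the mass on each singleton {C_u} and on S.
    Assumes phi_u(d) = exp(-gamma_u * d^2), following Denoeux (1995).
    """
    # K nearest neighbours as (squared Euclidean distance, class label).
    neighbours = sorted(
        (sum((a - b) ** 2 for a, b in zip(query, x)), y)
        for x, y in zip(train_x, train_y)
    )[:k]
    # prod[u]: product over neighbours of class u of (1 - alpha * phi_u).
    prod = [1.0] * n_classes
    for d2, y in neighbours:
        prod[y] *= 1.0 - alpha * math.exp(-gamma[y] * d2)
    frame = math.prod(prod)  # unnormalised mass left on the whole frame S
    masses = []
    for u in range(n_classes):
        others = math.prod(prod[r] for r in range(n_classes) if r != u)
        masses.append((1.0 - prod[u]) * others)
    w = sum(masses) + frame  # normalising factor W
    return [m / w for m in masses], frame / w


def etknn_predict(query, train_x, train_y, gamma, n_classes, k=3, alpha=0.95):
    """Assign the class whose singleton carries the largest combined mass."""
    masses, _ = etknn_bba(query, train_x, train_y, gamma, n_classes, k, alpha)
    return max(range(n_classes), key=masses.__getitem__)


# Toy data: two well-separated classes in two dimensions.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = [0, 0, 1, 1]
gamma = [1.0, 1.0]
masses, frame_mass = etknn_bba((0.2, 0.3), train_x, train_y, gamma, n_classes=2)
label = etknn_predict((0.2, 0.3), train_x, train_y, gamma, n_classes=2)
```

Distant neighbours contribute a factor close to 1 and therefore almost no evidence, which is how the rule discounts weak support without discarding it outright.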

Although the value of α has been shown not to be too critical, the tuning of the parameter vector γ = {γ1, γ2, γ3, ..., γu} has a significant influence on classification accuracy (Denoeux, 1995). A number of methods have been proposed to improve the performance of ET-KNN by generating more appropriate γu. For example, γu is set to the inverse of the mean distance between training patterns belonging to the class Cu (Denoeux, 1995) or is determined by optimizing a performance criterion (Zouhal and Denoeux, 1998). However, these methods cannot obtain a globally optimal parameter vector γ.

As a powerful global optimization strategy, GA has already been successfully applied in various areas. Here, we use it to generate the optimum parameter vector γ for ET-KNN and propose a novel optimized classifier named GAET-KNN. GAET-KNN generates a random parameter vector and then employs GA to evolve the parameter vector to achieve the highest classification accuracy. Thus, GAET-KNN can determine the parameters in ET-KNN as a whole and maximize classification accuracy when GA generates the optimum parameter vector γ.

In this article, GAET-KNN is realized by utilizing NSGA-II (Deb et al., 2002) and a floating-point coding scheme. The higher the classification accuracy, the better the parameter vector. Because NSGA-II ranks the individual with the smallest fitness value first, the classification error rate is used as the fitness function in NSGA-II. Of course, GAET-KNN can also be realized using other kinds of GAs and coding schemes. GAET-KNN is summarized in Fig. 2, where Γ is the training set, L is ET-KNN, A is the classification accuracy based on γ and N' is the number of generations in NSGA-II.


Fig. 2. Summary of GAET-KNN, where Γ denotes the training set, L denotes the ET-KNN classifier, A denotes the classification accuracy based on the parameter vector γ and N' denotes the number of generations in GA.
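
The idea of evolving a real-coded parameter vector to minimize an error rate can be sketched with a very small genetic algorithm. Note the swap: the paper uses NSGA-II, whereas the sketch below is a plain single-objective GA with truncation selection, arithmetic crossover and Gaussian mutation. The function `evolve`, its default settings and the stand-in fitness `toy_error` are all hypothetical; in GAET-KNN the fitness would be the ET-KNN classification error rate obtained with the candidate vector γ.

```python
import random


def evolve(error_rate, dim, pop_size=20, generations=30,
           low=0.01, high=5.0, seed=0):
    """Minimise error_rate(gamma) over real-coded vectors in [low, high]^dim."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        scored = sorted(pop, key=error_rate)
        history.append(error_rate(scored[0]))
        elite = scored[: pop_size // 2]  # truncation selection, elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            t = rng.random()
            # Arithmetic crossover, then Gaussian mutation clipped to bounds.
            child = [t * x + (1 - t) * y for x, y in zip(a, b)]
            child = [min(high, max(low, g + rng.gauss(0, 0.1))) for g in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=error_rate)
    return best, error_rate(best), history


def toy_error(gamma):
    """Stand-in fitness (not a real error rate): squared distance to 1.5."""
    return sum((g - 1.5) ** 2 for g in gamma) / len(gamma)


best_gamma, best_error, history = evolve(toy_error, dim=4)
```

Because the elite half of each generation is carried over unchanged, the best error never increases from one generation to the next, which mirrors the role of elitism in NSGA-II.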

 
The ensemble strategy

Here, GA and majority voting are used as ensemble strategies to integrate the outputs of the second-layer GAET-KNNs into the final classification result. The process of integrating the second-layer GAET-KNNs using GA can be defined as follows:

\[ Y_j = \sum_{i=1}^{M} W_{ij}\, m_i(\{C_j\}), \qquad j = 1, 2, \ldots, 27 \tag{10} \]
where mi({Cj}) is the BBA for the query x belonging to the class Cj obtained by the ith GAET-KNN, Yj is the resulting BBA of the jth fold, Wij is the weight for the confidence degree of the jth fold obtained by the ith GAET-KNN and M is the number of second-layer GAET-KNNs (M = 6 in this paper).

So the resulting class for the query x is the class Cu whose resulting BBA score Yu is the highest (Shen and Chou, 2006); i.e.

\[ Y_u = \mathrm{Max}\{ Y_1, Y_2, \ldots, Y_{27} \} \tag{11} \]
where Max means taking the maximum one among those in the brackets.
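
Both ensemble strategies can be sketched side by side. The helper names `fuse` and `majority_vote`, the three-classifier, three-fold vectors and the weight matrices are hypothetical illustrations; in GAOEC the weights Wij would come from GA and there would be six classifiers and 27 folds.

```python
from collections import Counter


def fuse(bbas, weights):
    """Weighted fusion: Y_j = sum_i W_ij * m_i({C_j}); returns (Y, argmax)."""
    n_folds = len(bbas[0])
    y = [sum(w[j] * m[j] for w, m in zip(weights, bbas))
         for j in range(n_folds)]
    return y, max(range(n_folds), key=y.__getitem__)


def majority_vote(bbas):
    """Each classifier votes for its top-ranked fold; the most common wins."""
    votes = [max(range(len(m)), key=m.__getitem__) for m in bbas]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical second-layer outputs from three classifiers over three folds.
bbas = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.5, 0.4],
]
uniform = [[1.0, 1.0, 1.0] for _ in bbas]
trusting_first = [[3.0, 3.0, 3.0], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

_, label_uniform = fuse(bbas, uniform)
_, label_weighted = fuse(bbas, trusting_first)
label_vote = majority_vote(bbas)
```

With uniform weights the fusion agrees with the vote here, while weights that favor the first classifier change the decision, which is the degree of freedom GA exploits when it searches for the weight matrix.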


    Results and discussion
To be comparable with previous studies, we test our method on the widely used dataset in which the identity between any two different protein sequences is below 35% and most sequences in the testing set have less than 25% sequence identity with those in the training set (Ding and Dubchak, 2001).

The comparisons of 15 methods for protein fold recognition are shown in Table II. The methods of Bindewald et al. (2003), Chinnasamy et al. (2003), Gewehr et al. (2004) and Altschul et al. (1997) are similarity-based methods in which the features are alignment features and the accuracy measure is the percentage of correct first hits. The methods of Ding and Dubchak (2001), Huang et al. (2003), Nanni (2006a, 2006b) and Shen and Chou (2006), together with GAOEC, are taxonometric approaches in which the features are biochemical features and the accuracy measure is the standard percentage accuracy. Thus, the comparisons are only meant to provide a broad, rough performance assessment rather than a precise ranking. Although the features in the similarity-based methods differ from one method to another, the features in the taxonometric approaches are the same. Therefore, the comparison between these taxonometric approaches and GAOEC can provide a relatively reasonable ranking. The standard percentage accuracy of GAOEC is 64.7% on the independent test when the number of nearest neighbors in GAET-KNN is 23. At 65.3% specificity, the sensitivity of GAOEC is 77.4% on the independent test when the number of nearest neighbors in GAET-KNN is 12. Sensitivity is defined as the percentage of query proteins whose true class labels are ranked first by at least one GAET-KNN in the first layer. Specificity is defined as the ratio between the number of proteins correctly classified by GAOEC and the number of proteins whose true class labels are ranked first by at least one GAET-KNN in the first layer. In this paper, majority voting is also used as the ensemble strategy to assess the classification system, and its best standard percentage accuracy is 63.7%.
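
The two definitions above can be made concrete with a short sketch. The function name and the four-protein example are hypothetical; the sketch assumes that "class label" in the definitions refers to the true fold of the query and that at least one query is covered by the first layer.

```python
def sensitivity_specificity(true_labels, candidate_sets, predictions):
    """Sensitivity and specificity as defined in the text.

    sensitivity: fraction of queries whose true fold is ranked first by at
                 least one first-layer classifier (i.e. is in the candidates).
    specificity: correctly classified queries divided by the number of
                 queries counted in the sensitivity numerator.
    """
    covered = sum(t in cands for t, cands in zip(true_labels, candidate_sets))
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return covered / len(true_labels), correct / covered


# Hypothetical example with four query proteins.
true_labels = [0, 1, 2, 1]
candidate_sets = [{0, 1}, {2}, {2, 0}, {1}]   # first-layer potential class index
predictions = [0, 2, 0, 1]                    # final GAOEC-style decisions

sens, spec = sensitivity_specificity(true_labels, candidate_sets, predictions)
```

A query whose true fold is missing from the candidate set cannot be recovered by the second layer, so sensitivity acts as an upper bound on what the full system can classify correctly.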


Table II. Accuracy of various methods for protein fold recognition

 
In detail, when the number of nearest neighbors in GAET-KNN is 23, the prediction accuracies of the first-layer classifiers C1, C2, C3, C4, C5 and C6 are 54.8%, 46.8%, 43.1%, 40.5%, 41.6% and 40.0%, respectively, which are much higher than the accuracies of SVM and ET-KNN (Table III). This shows in practice that GA greatly improves the performance of ET-KNN by generating a globally optimized parameter vector γ. The most effective feature is amino acid composition, consistent with earlier work (Ding and Dubchak, 2001). Additionally, the accuracies of the second-layer GAET-KNNs are about 10% higher than those of the first layer. This demonstrates that the two-layer structure can reasonably reduce the noise in the training set and is an effective means of handling multi-class problems. GAOEC was also tested on combinations of different kinds of features (Table IV) for comparison with other methods. In almost all situations, GAOEC performs better than the other methods.


Table III. Prediction accuracy of classifiers on different kinds of features

 

Table IV. Accuracy of various methods on combinations of different kinds of features

 
GAET-KNN is not only efficient but also stable. It proves in practice that tuning the number of nearest neighbors in GAET-KNN has little influence on classification accuracy. The accuracies of GAET-KNNs in the first and second layers are represented as a function of the number of nearest neighbors and are shown in Figs 3 and 4, respectively.


Fig. 3. Classification accuracies of the first-layer GAET-KNNs as a function of K. The performance of GAET-KNN is evaluated for different numbers of nearest neighbors. Tuning the number of nearest neighbors in GAET-KNN has little influence on classification accuracy.

 

Fig. 4. Classification accuracies of the second layer GAET-KNNs as a function of K.

 

    Conclusion
We have presented an optimized hierarchical ensemble classifier, GAOEC, for protein fold recognition, constructed in the following steps. First, we use GA to optimize the parameters in ET-KNN and propose a novel classifier (GAET-KNN) as the component classifier. It has shown high and robust classification performance. Second, we present a two-layer structure of GAET-KNNs to classify the query protein among the 27 folds. Guided by the coarse classification in the first layer, the second layer achieves higher performance by filtering out the irrelevant samples in the training set. Third, we use GA to generate weights for the outputs of the second layer to obtain the final classification result.

Our approach achieves a standard percentage accuracy of 64.7% on a widely used benchmark dataset. Although protein fold recognition remains a challenging problem, GAOEC offers useful insights into how performance can be improved.


    Funding
This work is supported by the National Natural Science Foundation of China (Grant No. 60375021) and the Hunan Provincial foundation for Distinguished Young Scholars (Grant No. 05JJ10011).


    Appendix A
For the reader's convenience, we list the 27 folds, the symbols and the logograms used in this article in Tables A1–A3.


Table A1. The 27 folds used in this article

Fold                                  Ntrain  Ntest

Globin-like                               13      6
Cytochrome c                               7      9
DNA-binding three-helical bundle          12     20
Four-helical up-and-down bundle            7      8
Four-helical cytokines                     9      9
EF-hand                                    7      9
Immunoglobulin-like β-sandwich            30     44
Cupredoxins                                9     12
Viral coat and capsid proteins            16     13
ConA-like lectins/glucanases               7      6
SH3-like barrel                            8      8
OB-fold                                   13     19
Trefoil                                    8      4
Trypsin-like serine proteases              9      4
Lipocalins                                 9      7
(TIM)-barrel                              29     48
FAD (also NAD)-binding motif              11     12
Flavodoxin-like                           11     13
NAD(P)-binding Rossmann-fold              13     27
P-loop                                    10     12
Thioredoxin-like                           9      8
Ribonuclease H-like motif                 10     14
Hydrolases                                11      7
Periplasmic binding protein-like          11      4
β-grasp                                    7      8
Ferredoxin-like                           13     27
Small inhibitors, toxins, lectins         12     27

Ntrain: number of proteins of the fold in the training set.

Ntest: number of proteins of the fold in the testing set.


Table A2. The logograms and their full names

Logogram   Full name

GAOEC      Genetic-algorithm optimized ensemble classifier
GAET-KNN   Genetic-algorithm evidence-theoretic K nearest neighbors
OET-KNN    Optimized evidence-theoretic K nearest neighbors
ET-KNN     Evidence-theoretic K nearest neighbors
SVM        Support vector machines
GA         Genetic algorithm
OvO        One-versus-others method
uOvO       Unique one-versus-others method
AvA        All-versus-all method
HLA        Hierarchical learning architecture
NN         Neural network
C          Amino acid composition
S          Predicted secondary structure
H          Hydrophobicity
V          Normalized van der Waals volume
P          Polarity
Z          Polarizability
BBA        Basic belief assignment
MLP        Multilayer perceptron
GRNN       General regression neural networks
RBFN       Radial basis function network


Table A3. The symbols and their meanings

Symbol      Meaning

K           The number of neighbors in the K nearest neighbors (KNN) classifier
n_i         The number of testing proteins in the ith fold
c_i         The number of proteins correctly classified in the ith fold
Q_i         The accuracy of the ith fold
N           The total number of testing proteins
C           The total number of correctly classified proteins
Q           The overall classification accuracy
k           The number of classes
S           The set of the 27 folds
Γ           The training set
c(i)        The class label of the ith training sample
x(i)        The ith n-dimensional training sample
d(x, x(i))  The Euclidean distance between x and x(i)
α           A fixed parameter
γu          A positive parameter associated with the class Cu
x           The query protein
IK          The index set of the K nearest neighbors of x
m           The basic belief assignment
W           A normalizing factor
γ           The parameter vector in ET-KNN
N'          The number of generations in GA
L           ET-KNN classifier
A           The classification accuracy based on γ
mi({Cj})    The BBA for the query x belonging to Cj obtained by the ith GAET-KNN
Yj          The resulting BBA of the jth fold
Wij         The weight for the jth fold of the ith GAET-KNN
M           The number of GAET-KNNs in the second layer


    Footnotes
 
Edited by Stefano Gianni


    Acknowledgements
The authors would like to thank Dr Bodong Li for his help with the experiments, and Dr Caiyan Jia and Shaoping Ling for their constructive advice. The authors also wish to thank C.H.Q. Ding at Lawrence Berkeley National Laboratory for sharing datasets.


    References
Altschul S.F., Madden T.L., Schaffer A.A., Zhang J.H., Zhang Z., Miller W., Lipman D.J. Nucleic Acids Res. (1997) 25:3389–3402.

Bindewald E., et al. Protein Eng. (2003) 16:785–789.

Baldi P., Brunak S., Chauvin Y., Andersen C., Nielsen H. Bioinformatics (2000) 16:412–424.

Cheng J.L., Baldi P. Bioinformatics (2006) 22:1456–1463.

Chinnasamy A., Sung W.K., Mittal A. Pacific Symposium on Biocomputing (Altman R., Keith A., Hunter L., Jung T., Klein T., eds) (2003) 9:387–398.

Craven M.W., Mural R.J., Hauser L.J., Uberbacher E.C. ISMB (1995) 3:98–106.

Deb K., Pratap A., Agarwal S., Meyarivan T. IEEE Trans. Evol. Comput. (2002) 6:182–197.

Denoeux T. IEEE Trans. Syst. Man Cybern. (1995) 25:804–813.

Ding C.H., Dubchak I. Bioinformatics (2001) 17:349–358.

Han S., Lee B.C., Yu S.T., Jeong C.S., Lee S., Kim D. Bioinformatics (2005) 21:2667–2673.

Huang C.D., Lin C.T., Pal N.R. IEEE Trans. Nanobiosci. (2003) 4:221–232.

Gewehr J.E., von Öhsen N., Zimmer R. German Conference on Bioinformatics. (2004) 141–148.

Nanni L. Neurocomputing (2005) 68:317–321.

Nanni L. Neurocomputing (2006a) 69:2434–2437.

Nanni L. Neurocomputing (2006b) 69:850–853.

Rost B., Sander C. J. Mol. Biol. (1993) 232:584–599.

Söding J. Bioinformatics (2005) 21:951–960.

Shen H.B., Chou K.C. Bioinformatics (2006) 22:1717–1722.

Tan A.C., Gilbert D., Deville Y. Genome Inform. (2003) 16:206–217.

Xu J., Xu Y., Lin G., Kim D., Li M. In Pac Symp Biocomput (2003).

Zhou H., Zhou Y. Proteins (2004) 55:1005–1013.

Zouhal L.M., Denoeux T. IEEE Trans. Syst. Man Cybern. (1998) 28:263–271.

Received November 14, 2007; revised July 29, 2008; accepted August 1, 2008.

