Protein Engineering, Vol. 12, No. 7, 527-534, July 1999
© 1999 Oxford University Press
Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs
1 ABS/MSCL/CIT, National Institutes of Health, Bethesda, MD 20892, 3 The Institute for Genomic Research, Rockville, MD 20850, USA and 4 Laboratoire de Biologie Cellulaire et Moléculaire, Biotechnologies, INRA, 78352 Jouy-en-Josas Cedex, France
We have compared a novel sequence-structure matching technique for detecting remote homologs, FORESST, to three existing sequence-based methods: local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, and HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of structural families of proteins, using HMMs. Altogether, 45 proteins from nine structural families in the CATH database were used in a cross-validated test of the fold assignment accuracy of each method. Local sequence similarity of a query sequence to a protein family is measured by the highest-scoring segment pair (HSP) score. Each of the HMM-based approaches (FORESST, meta-MEME and the amino acid sequence-based HMM) yielded a log-odds score for the query sequence. To make a fair comparison among these methods, the scores for each method were converted to Z-scores in a uniform way by comparing the raw scores of a query protein with the corresponding scores for a set of unrelated proteins. Z-scores were analyzed as a function of the maximum pairwise sequence identity (MPSID) of the query sequence to the sequences used in training the model. For MPSID above 20%, the Z-scores increase linearly with MPSID for the sequence-based methods but remain roughly constant for FORESST. For MPSID below 15%, average Z-scores are close to zero for the sequence-based methods, whereas FORESST yielded average Z-scores of 1.8 and 1.1 using observed and predicted secondary structures, respectively. This demonstrates the advantage of the sequence-structure method for detecting remote homologs.
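The two quantities that drive the comparison, the Z-score of a raw score and the MPSID of a query, can be illustrated with a short sketch. It assumes the conventional standardization Z = (S - mean of unrelated-protein scores) / (standard deviation of those scores); the abstract does not spell out the exact normalization. The helpers `z_score`, `percent_identity` and `mpsid` are hypothetical names, and `percent_identity` assumes the sequences are already aligned to equal length and excludes gap columns, which is only one of several identity conventions.

```python
import statistics
from typing import Sequence, Tuple


def z_score(query_score: float, null_scores: Sequence[float]) -> float:
    """Standardize a raw score (HSP or log-odds) against unrelated proteins.

    Assumption: Z = (S - mean(null)) / stdev(null), the usual definition.
    """
    if len(null_scores) < 2:
        raise ValueError("need at least two unrelated-protein scores")
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("unrelated-protein scores have zero spread")
    return (query_score - mu) / sigma


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Hypothetical convention: columns with a gap ('-') in either sequence
    are left out of the denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def mpsid(alignments: Sequence[Tuple[str, str]]) -> float:
    """Maximum pairwise identity of a query to the training sequences.

    Each tuple holds (aligned query, aligned training sequence).
    """
    return max(percent_identity(q, t) for q, t in alignments)


if __name__ == "__main__":
    print(z_score(18.0, [10.0, 12.0, 14.0]))  # 3.0
    print(mpsid([("ACDE-", "ACFEG"), ("ACDEF", "GHIKL")]))  # 75.0
```

Because every method's raw score is measured against the same kind of background (unrelated proteins), Z-scores from BLASTP, SAM, meta-MEME and FORESST land on a common scale, which is what allows them to be plotted against MPSID together.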
Keywords: hidden Markov models/motifs/remote homologs/secondary structures
2 To whom correspondence should be addressed