Protein Engineering, Vol. 15, No. 8, 643-649, August 2002
© 2002 Oxford University Press
Sequence clustering strategies improve remote homology recognitions while reducing search times
The Burnham Institute, La Jolla, CA 92037, USA
Sequence databases are growing rapidly, which increases the coverage of protein sequence space, but this coverage is uneven because most sequencing efforts have concentrated on a small number of organisms. The resulting granularity of sequence space creates many problems for profile-based sequence comparison programs. In this paper, we suggest several strategies that address these problems while speeding up searches for homologous proteins and improving the ability of profile methods to recognize distant homologies. One of our strategies combines database clustering, which removes highly redundant sequences, with a two-step PSI-BLAST (PDB-BLAST), which separates the sequence space used for profile construction from the space used for homology searching. The combination of these strategies improves distant homology recognition by more than 100%, while using only 10% of the CPU time of a standard PSI-BLAST search. Another method, intermediate profile searches, allows the exploration of additional search directions that are normally dominated by large protein subfamilies within very diverse families. All methods are evaluated with a large fold-recognition benchmark.
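The idea of removing highly redundant sequences by clustering can be illustrated with a minimal sketch. It is not the procedure used in the paper: the abstract does not name the clustering tool or the identity cutoff, so the greedy longest-first scheme, the 0.9 threshold, the function names, and the toy sequences below are all assumptions. The similarity measure is a crude string-matching ratio from Python's standard library, standing in for a real alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for pairwise identity (string matching, not an alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_representatives(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy clustering sketch (hypothetical helper, threshold is an assumption).

    Sequences are visited longest first. Each one joins the first existing
    representative it resembles at or above the threshold; otherwise it
    founds a new cluster and becomes that cluster's representative.
    """
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if similarity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Toy input (made-up sequences): a1 and a2 differ by one residue, b1 is unrelated.
toy = {
    "a1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "a2": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA",
    "b1": "GSHMLEDPVDAFWKKLQEG",
}
clusters = cluster_representatives(toy)
representatives = sorted(clusters)  # the reduced, non-redundant set
```

Only the representatives would be kept in the reduced database, so a profile built from it is less likely to be dominated by many near-identical copies of one sequence. A production pipeline would use a dedicated clustering program and a proper alignment-based identity rather than this quadratic toy loop.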