Protein Engineering, Vol. 12, No. 2, 95-100, February 1999
© 1999 Oxford University Press
Combining sensitive database searches with multiple intermediates to detect distant homologues
1 Helix Research Institute, 15323 Yana, Kisarazu-shi, Chiba, 292, Japan, 3 Biomolecular Structure and Modelling Unit, Department of Biochemistry, University College London, Gower Street, London, UK and 4 Tsukuba Advanced Research Alliance, University of Tsukuba, Tsukuba 305, Japan
Using data from the CATH structure classification, we have assessed the blastp, fasta, smithwaterman and gapped-blast algorithms, developed a portable normalization scheme and identified safe thresholds for database searching. Of the four methods assessed, fasta, smithwaterman and gapped-blast performed similarly, whereas the sensitivity of blastp was much lower. Introduction of an intermediate sequence search substantially improved the results. When tested on a set of relationships that could not be identified by blastp, intermediate sequences were able to find double the number of relationships identified by the smithwaterman algorithm alone. However, we found that the benefit of using intermediates varied considerably between families and depended not only on the number of available sequences, but also on their diversity. In an attempt to increase sensitivity further, a multiple intermediate sequence search (MISS) procedure was developed. When assessed on 1906 cases from a wide range of homologous families that could not be detected by the previous approaches, MISS was able to identify 241 additional relationships. MISS uses the full extent of sequence diversity to detect additional relationships, but does not consider any structure-specific information. For this reason, it is more generally applicable than fold recognition and threading methods, which require a library of known structures.
Keywords: CATH/intermediate searches/sequence analysis/protein structure
2 Present address: Sanger Centre, Wellcome Trust Genome Campus, Cambridge, UK
4 To whom correspondence should be addressed. Present address: Inpharmatica Ltd, 60 Charlotte Street, London W1P 2AX, UK
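The idea of linking a query to a distant homologue through intermediate sequences, as summarized in the abstract, can be illustrated with a minimal sketch. This is not the authors' MISS procedure, whose details are not given here; it is a generic breadth-first search over a precomputed table of pairwise search results. The function name `find_chain`, the toy sequence identifiers, the E-values, the threshold of 1e-3 and the assumption that hits are symmetric are all hypothetical choices made for illustration.

```python
from collections import deque


def find_chain(hits, query, target, threshold=1e-3, max_intermediates=1):
    """Return the shortest chain of sequence ids linking query to target.

    hits maps (a, b) -> E-value of a pairwise search (hypothetical toy data).
    A pair counts as related when its E-value is at or below `threshold`
    (an illustrative cut-off, not a value taken from the paper).
    Returns None if no chain with at most `max_intermediates` exists.
    """
    # Build an undirected graph of significant hits (symmetry is assumed).
    neighbours = {}
    for (a, b), evalue in hits.items():
        if evalue <= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    queue = deque([[query]])
    seen = {query}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # A path of n ids uses n - 1 links; allow at most max_intermediates + 1.
        if len(path) > max_intermediates + 1:
            continue
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy hit table: Q has no significant direct hit to T1 or T2.
hits = {
    ("Q", "A"): 1e-10,
    ("A", "T1"): 1e-6,
    ("A", "B"): 1e-8,
    ("B", "T2"): 1e-5,
    ("Q", "T1"): 5.0,
    ("Q", "T2"): 8.0,
}

direct = find_chain(hits, "Q", "T1", max_intermediates=0)
single = find_chain(hits, "Q", "T1", max_intermediates=1)
single_t2 = find_chain(hits, "Q", "T2", max_intermediates=1)
multiple_t2 = find_chain(hits, "Q", "T2", max_intermediates=2)
```

In this toy table, T1 is reachable only through one intermediate and T2 only through two, which mirrors the gain in sensitivity from single and multiple intermediates in spirit. In practice each link would also need a safe threshold so that false positives do not propagate along the chain.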