Protein Engineering, Vol. 12, No. 2, 85-94, February 1999
© 1999 Oxford University Press
Twilight zone of protein sequence alignments
1 EMBL, 69 012 Heidelberg, 2 LION Bioscience AG, Im Neuenheimer Feld 517, 69 120 Heidelberg, Germany and 3 Columbia University, Department of Biochemistry and Molecular Biophysics, 650 West 168 Street, New York, NY 10032, USA
Sequence alignments unambiguously distinguish between protein pairs of similar and non-similar structure when the pairwise sequence identity is high (>40% for long alignments). The signal gets blurred in the twilight zone of 20-35% sequence identity. Here, more than a million sequence alignments were analysed between protein pairs of known structures to re-define a line distinguishing between true and false positives for low levels of similarity. Four results stood out. (i) The transition from the safe zone of sequence alignment into the twilight zone is described by an explosion of false negatives. More than 95% of all pairs detected in the twilight zone had different structures. More precisely, above a cut-off roughly corresponding to 30% sequence identity, 90% of the pairs were homologous; below 25% less than 10% were. (ii) Whether or not sequence homology implied structural identity depended crucially on the alignment length. For example, if 10 residues were similar in an alignment of length 16 (>60%), structural similarity could not be inferred. (iii) The 'more similar than identical' rule (discarding all pairs for which percentage similarity was lower than percentage identity) reduced false positives significantly. (iv) Using intermediate sequences for finding links between more distant families was almost as successful: pairs were predicted to be homologous when the respective sequence families had proteins in common. All findings are applicable to automatic database searches.
Keywords: alignment quality analysis/evolutionary conservation/genome analysis/protein sequence alignment/sequence space hopping
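Results (iii) and (iv) of the abstract can be read as two simple filters, sketched below in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `AlignedPair`, `more_similar_than_identical` and `linked_by_intermediate` are hypothetical, and the abstract does not say how percentage similarity is computed, so both percentages are taken as given inputs. The length-dependent cut-off from result (ii) is not reproduced because its functional form is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One pairwise alignment (hypothetical record, not from the paper)."""
    query: str
    hit: str
    length: int            # number of aligned residues
    pct_identity: float    # percentage of identical residues
    pct_similarity: float  # percentage of similar residues (definition assumed given)


def more_similar_than_identical(pairs):
    """Discard pairs whose percentage similarity is lower than percentage identity."""
    return [p for p in pairs if p.pct_similarity >= p.pct_identity]


def linked_by_intermediate(family_a, family_b):
    """Predict homology when two sequence families have at least one protein in common."""
    return not set(family_a).isdisjoint(family_b)


# Illustrative, made-up inputs.
pairs = [
    AlignedPair("protA", "protB", 100, 30.0, 45.0),  # kept: similarity >= identity
    AlignedPair("protA", "protC", 80, 28.0, 20.0),   # discarded: similarity < identity
]
kept = more_similar_than_identical(pairs)
shared = linked_by_intermediate({"protA", "protX"}, {"protX", "protD"})
```

Here `kept` retains only the first pair, and `shared` is true because both families contain `protX`, the intermediate sequence.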