Skip Navigation

This Article
Right arrow Full Text Freely available
Right arrow FREE Full Text (PDF) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (18)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Rigden, D. J.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Rigden, D. J.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Protein Engineering, Vol. 15, No. 2, 65-77, February 2002
© 2002 Oxford University Press

Use of covariance analysis for the prediction of structural domain boundaries from multiple protein sequence alignments

Daniel J. Rigden,1

Embrapa Genetic Resources and Biotechnology, Cenargen/Embrapa, S.A.I.N. Parque Rural, Final W5, Asa Norte, 70770-900, Brasília, Brazil

Current methods for identification of domains within protein sequences require either structural information or the identification of homologous domain sequences in different sequence contexts. Knowledge of structural domain boundaries is important for fold recognition experiments and structural determination by X-ray crystallography or nuclear magnetic resonance spectroscopy using the divide-and-conquer approach. Here, a new and conceptually simple method for the identification of structural domain boundaries in multiple protein sequence alignments is presented. Analysis of covariance at positions within the alignment is first used to predict 3D contacts. By the nature of the domain as an independent folding unit, inter-domain predicted contacts are fewer than intra-domain predicted contacts. By analysing all possible domain boundaries and constructing a smoothed profile of predicted contact density (PCD), true structural domain boundaries are predicted as local profile minima associated with low PCD. A training data set is constructed from 52 non-homologous two-domain protein sequences of known 3D structure and used to determine optimal parameters for the profile analysis. The alignments in the training data set contained 48 ± 17 (mean ± SD) sequences and lengths of 257 ± 121 residues. Of the 47 alignments yielding predictions, 35% of true domain boundaries are predicted to within 15 amino acids by the local profile minimum with the lowest profile value. Including predictions from the second- and third-lowest local minima increases the correct domain boundary coverage to 60%, whereas the lowest five local minima cover 79% of correct domain boundaries. Through further profile analysis, criteria are presented which reliably identify subsets of more accurate predictions. Retrospective analysis of CASP3 targets shows predictions of sufficient accuracy to enable dramatically improved fold recognition results. Finally, a prediction is made for geminivirus AL1 protein which is in full agreement with biochemical data, yielding a plausible, novel threading result.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer: Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.