PEDS Advance Access published online on July 8, 2009
Protein Engineering Design and Selection, doi:10.1093/protein/gzp040
Alignment of multiple protein structures based on sequence and structure features
1 Department of Bioengineering and Therapeutic Sciences, 2 Department of Pharmaceutical Chemistry, 3 California Institute for Quantitative Biomedical Research, University of California at San Francisco, Byers Hall, Box 2552, 1700 4th Street, Suite 503B, San Francisco, CA 94158, USA
4 Present address: Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138 671, Singapore
5 Present address: Structural Genomics Unit, Bioinformatics and Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
6 Present address: E. I. DuPont India Pvt Ltd, DuPont Knowledge Center, Hyderabad 500 078, India
8 To whom correspondence should be addressed. E-mail: sali{at}salilab.org
Comparing the structures of proteins is crucial to gaining insight into protein evolution and function. Here, we align the sequences of multiple protein structures by a dynamic programming optimization of a scoring function that is a sum of an affine gap penalty and terms dependent on various sequence and structure features (SALIGN). The features include amino acid residue type, residue position, residue accessible surface area, residue secondary structure state and the conformation of a short segment centered on the residue. The multiple alignment is built by following the guide tree constructed from the matrix of all pairwise protein alignment scores. Importantly, the method does not depend on the exact values of various parameters, such as feature weights and gap penalties, because the optimal alignment across a range of parameter values is found. Using multiple structure alignments in the HOMSTRAD database, SALIGN was benchmarked against MUSTANG for multiple alignments as well as against TM-align and CE for pairwise alignments. On average, SALIGN produces a 15% improvement in structural overlap over HOMSTRAD and 14% over MUSTANG, and yields more equivalent structural positions than TM-align and CE in 90% and 95% of cases, respectively. The utility of accurate multiple structure alignment is illustrated by its application to comparative protein structure modeling.
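To make the pairwise step concrete, the following is a minimal sketch of a dynamic programming alignment whose score is a weighted sum of per-residue feature terms plus an affine gap penalty. It is not the SALIGN implementation: the `Residue` record, the three features used, the `WEIGHTS` values, the gap parameters and the choice to minimize a dissimilarity cost are all assumptions made for illustration, and the recurrence is the standard three-state (Gotoh-style) formulation.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass(frozen=True)
class Residue:
    aa: str      # one-letter residue type
    ss: str      # secondary structure state, e.g. "H", "E" or "C"
    acc: float   # fractional accessible surface area in [0, 1]

# Hypothetical feature weights; the abstract gives no numerical values.
WEIGHTS = {"aa": 1.0, "ss": 1.0, "acc": 1.0}

def dissimilarity(a, b, w=WEIGHTS):
    """Weighted sum of per-feature dissimilarities for one residue pair."""
    return (w["aa"] * (a.aa != b.aa)
            + w["ss"] * (a.ss != b.ss)
            + w["acc"] * abs(a.acc - b.acc))

def align(x, y, gap_open=2.0, gap_extend=0.5, w=WEIGHTS):
    """Global alignment minimizing feature cost plus affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Returns (total_cost, pairs); pairs holds (i, j) index tuples,
    with None marking a gap in the other sequence.
    """
    n, m = len(x), len(y)
    # M: x[i-1] paired with y[j-1]; X: x[i-1] against a gap; Y: y[j-1] against a gap.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    def into_x(i, j):  # candidate (cost, previous state) for entering X[i][j]
        return [(M[i-1][j] + gap_open, "M"), (X[i-1][j] + gap_extend, "X"),
                (Y[i-1][j] + gap_open, "Y")]

    def into_y(i, j):  # candidate (cost, previous state) for entering Y[i][j]
        return [(M[i][j-1] + gap_open, "M"), (Y[i][j-1] + gap_extend, "Y"),
                (X[i][j-1] + gap_open, "X")]

    def into_m(i, j):  # candidates before adding the pair cost of M[i][j]
        return [(M[i-1][j-1], "M"), (X[i-1][j-1], "X"), (Y[i-1][j-1], "Y")]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = dissimilarity(x[i-1], y[j-1], w) + min(into_m(i, j))[0]
            X[i][j] = min(into_x(i, j))[0]
            Y[i][j] = min(into_y(i, j))[0]

    # Traceback: re-derive the best predecessor state at each step.
    cost, state = min([(M[n][m], "M"), (X[n][m], "X"), (Y[n][m], "Y")])
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if state == "M":
            pairs.append((i - 1, j - 1))
            state = min(into_m(i, j))[1]
            i, j = i - 1, j - 1
        elif state == "X":
            pairs.append((i - 1, None))
            state = min(into_x(i, j))[1]
            i -= 1
        else:
            pairs.append((None, j - 1))
            state = min(into_y(i, j))[1]
            j -= 1
    return cost, pairs[::-1]

def make(seq):
    """Build toy residues that differ only in residue type (illustrative data)."""
    return [Residue(aa, "H", 0.5) for aa in seq]

cost, pairs = align(make("ACDE"), make("AE"))
print(cost, pairs)
```

In this toy call the two sequences differ only in residue type, so the cheapest solution pairs the shared residues and places one two-residue gap between them. Repeating such a call over a grid of weights and gap penalties would be one simple way to explore the parameter ranges the abstract mentions, although the abstract does not say how that search is performed.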
Keywords: multiple structure alignment/dynamic programming/guide tree/RMSD/structure overlap
Received June 18, 2009; revised June 18, 2009; accepted June 18, 2009.
7 M.S.M. and B.M.W. contributed equally to this work.