Protein Engineering vol. 16 no. 5 pp. 323-330, 2003
© 2003 Oxford University Press
Reduction of protein sequence complexity by residue grouping
National Laboratory of Solid State Microstructure, Institute of Biophysics and Department of Physics, Nanjing University, Nanjing 210093, China
¹ To whom correspondence should be addressed. e-mail: wangwei{at}nju.edu.cn
It is well known that there are similarities among the various naturally occurring amino acids. The complexity of protein systems could therefore be reduced by sorting similar amino acids into groups, so that protein sequences can be simplified by rewriting them in reduced alphabets. This paper discusses how to group similar amino acids and whether there is a minimal amino acid alphabet with which proteins can be folded. Various reduced alphabets are obtained by preserving the maximal information in the simplified protein sequence relative to its parent sequence, as measured by global sequence alignment. Using these reduced alphabets and the correspondingly simplified similarity matrices, we recognize protein folds from the similarity scores of sequence alignments. For various levels of reduction in the number of amino acid types, we obtain the coverage on the SCOP40 dataset, defined as the ratio of the number of homologous pairs detected by the program BLAST to the number of homologous pairs marked in SCOP40. For reduced alphabets containing 10 types of amino acids, the ability to detect distantly related folds remains at almost the same level as with the full alphabet of 20 types of amino acids. This implies that 10 amino acid types may correspond to the number of degrees of freedom needed to characterize the complexity of proteins.
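The pipeline in the abstract has four steps: rewrite sequences in a reduced alphabet, simplify the similarity matrix, score a global alignment, and measure coverage as detected over annotated homologous pairs. The following is a minimal Python sketch of those steps under stated assumptions. The 10-group partition `GROUPS_10`, the group-averaging rule in `reduced_matrix`, the placeholder scoring function `toy_score` (standing in for a real substitution matrix such as BLOSUM), and the linear gap penalty are all hypothetical choices for illustration. They are not the paper's alphabets, matrices, or parameters.

```python
from itertools import product

# Hypothetical 10-group partition of the 20 standard amino acids,
# for illustration only (not taken from the paper).
GROUPS_10 = ["C", "FYW", "ML", "IV", "G", "P", "ATS", "NH", "QED", "RK"]


def toy_score(a, b):
    """Placeholder substitution score (assumption): +4 for identity, -1 otherwise."""
    return 4 if a == b else -1


def make_reducer(groups):
    """Map each residue to its group's representative (the group's first letter)."""
    return {aa: g[0] for g in groups for aa in g}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq.upper())


def reduced_matrix(groups, score):
    """Assumed rule: group-vs-group score = mean of the original pairwise scores.
    With singleton groups this simply reproduces the original matrix."""
    m = {}
    for g1, g2 in product(groups, repeat=2):
        vals = [score(a, b) for a in g1 for b in g2]
        m[(g1[0], g2[0])] = sum(vals) / len(vals)
    return m


def global_align_score(s, t, sub, gap=-4.0):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(prev[j - 1] + sub[(s[i - 1], t[j - 1])],  # (mis)match
                         prev[j] + gap,                            # gap in t
                         cur[j - 1] + gap)                         # gap in s
        prev = cur
    return prev[-1]


def coverage(detected_pairs, marked_pairs):
    """Fraction of reference homologous pairs that were detected (pairs are unordered)."""
    marked = {frozenset(p) for p in marked_pairs}
    if not marked:
        return 0.0
    detected = {frozenset(p) for p in detected_pairs}
    return len(marked & detected) / len(marked)


if __name__ == "__main__":
    full = reduced_matrix(list("ACDEFGHIKLMNPQRSTVWY"), toy_score)
    reducer = make_reducer(GROUPS_10)
    red = reduced_matrix(GROUPS_10, toy_score)
    print(global_align_score("IV", "VI", full))  # -2.0
    print(global_align_score(reduce_sequence("IV", reducer),
                             reduce_sequence("VI", reducer), red))  # 3.0
```

On this tiny example, `IV` and `VI` score -2.0 under the 20-letter toy matrix. Once I and V fall into the same group, both reduce to `II` and score 3.0. Averaging the original scores over group members is only one plausible way to build a simplified matrix; a real study would start from an established substitution matrix and a tuned gap model.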
Received November 20, 2002; revised March 10, 2003; accepted April 4, 2003.