Protein Engineering vol. 16 no. 12 pp. 949-955, 2003
© 2003 Oxford University Press
Protein fold comparison by the alignment of topological strings
Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
1 Present address: Department of Biochemistry and Chemistry, University of Leicester, Leicester LE1 7RH, UK
2 To whom correspondence should be addressed. e-mail: wtaylor{at}nimr.mrc.ac.uk
Using definitions of protein folds encoded as text strings, a dynamic programming algorithm was devised to compare pairs of folds, identify their largest common substructure and calculate the distance, measured as the number of edit operations, that separates this substructure from each fold. This distance provided a metric on which the folds were clustered into a phylogenetic tree. The construction differs from previous automatic structure clustering algorithms in that it explicitly represents the structures at ancestral branching nodes, even when these correspond to no known structure. The resulting tree was compared with one compiled by an expert in the field; while there was broad agreement, differences arose from the differing degrees of emphasis placed on the types of operation that can be used to transform structures. Some concluding speculations are advanced on the relationship of such trees to the evolutionary history and folding of the proteins.
Received June 19, 2003; revised September 13, 2003; accepted October 21, 2003
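The comparison outlined in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each fold is a plain character string and that the only edit operation is deletion of a single character, so the largest common substructure reduces to a longest common subsequence. The function name `common_substructure` and the example strings are hypothetical; the actual string encoding and the set of edit operations used in the paper are not given in this section and may well differ.

```python
from typing import Tuple


def common_substructure(a: str, b: str) -> Tuple[str, int, int]:
    """Return (core, dist_a, dist_b) for two fold strings.

    core   : a longest common subsequence of a and b (assumed stand-in
             for the "largest common substructure")
    dist_a : number of single-character deletions turning a into core
    dist_b : number of single-character deletions turning b into core
    """
    n, m = len(a), len(b)
    # table[i][j] holds the length of the longest common subsequence
    # of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right cell to recover one optimal core.
    i, j = n, m
    common = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            common.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    core = "".join(reversed(common))
    return core, n - len(core), m - len(core)


if __name__ == "__main__":
    # Hypothetical fold strings, not taken from the paper.
    core, dist_a, dist_b = common_substructure("abcde", "abde")
    print(core, dist_a, dist_b)
```

Under these assumptions the sum `dist_a + dist_b` gives a pairwise distance that could feed a clustering step, and `core` plays the role of an explicit ancestral structure at the node joining the two folds.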