Protein Engineering, Vol. 13, No. 1, 15-19,
January 2000
© 2000 Oxford University Press
Is it better to combine predictions?
1 Department of Computer Science, University of Wales, Aberystwyth, Penglais, Aberystwyth, Ceredigion, SY23 3DB, Wales, UK, 2 Department of Engineering Mathematics and Computer Science, Speed Scientific School, University of Louisville, Louisville, KY 40292, USA and 3 Department of Biostatistics and Medical Informatics and Department of Computer Sciences, University of Wisconsin, 1300 University Avenue, Room 5795 Medical Sciences, Madison, WI 53706, USA. Email: page{at}biostac.wisc.edu
We have compared the accuracy of the individual protein secondary structure prediction methods PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of the methods. A range of ways of combining predictions were tested: voting, biased voting, linear discrimination, neural networks and decision trees. The combined methods that involve 'learning' (the non-voting methods) were trained using a set of 496 non-homologous domains; this dataset was biased, as some of the secondary structure prediction methods had used these domains for training. We used two independent test sets to compare predictions: the first consisted of 17 non-homologous domains from CASP3 (Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction); the second (the EBI test set) consisted of 405 domains that were selected in the same way as the training set, and were non-homologous to each other and to the training set. On both test datasets the most accurate individual method was NNSSP, followed by PHD and DSC, and the least accurate was Predator; however, it was not possible to show conclusively a significant difference between the individual methods. Comparing the accuracy of the single methods with that obtained by combining predictions, we found that it was better to use a combination of predictions. On both test datasets it was possible to obtain a ~3% improvement in accuracy by combining predictions. In most cases the combined methods were statistically significantly better (at P = 0.05 on the CASP3 test set, and P = 0.01 on the EBI test set). On the CASP3 test dataset there was no significant difference in accuracy between any of the combined methods of prediction; on the EBI test dataset, linear discrimination and neural networks significantly outperformed voting techniques. We conclude that it is better to combine predictions.
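The simplest combination scheme named above, voting, can be illustrated with a minimal sketch. It assumes each method's output is available as a string of per-residue states (here H, E and C) of equal length. The function name `majority_vote`, the example prediction strings, and the tie-breaking rule (prefer the state chosen by the earliest method in a preference list) are hypothetical choices made for illustration; the abstract does not specify how ties were resolved or how the biased voting variant was weighted.

```python
from collections import Counter


def majority_vote(predictions, tie_break_order=None):
    """Combine per-residue secondary structure predictions by simple voting.

    predictions: dict mapping a method name to a string of per-residue
        states (e.g. 'H', 'E', 'C'); all strings must have equal length.
    tie_break_order: method names in order of preference. On a tie, the
        state predicted by the earliest listed method among the tied
        states wins. This rule is an assumption made for illustration.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    if tie_break_order is None:
        tie_break_order = list(predictions)

    consensus = []
    for i in range(lengths.pop()):
        # Count how many methods predict each state at residue i.
        votes = Counter(p[i] for p in predictions.values())
        top = max(votes.values())
        tied = {state for state, count in votes.items() if count == top}
        # Walk the preference list until a method backs one of the tied states.
        for method in tie_break_order:
            if predictions[method][i] in tied:
                consensus.append(predictions[method][i])
                break
        else:
            raise ValueError("tie_break_order does not resolve the tie")
    return "".join(consensus)


# Made-up prediction strings, not data from the study.
example = {
    "NNSSP": "HHHEEC",
    "PHD": "HHHECC",
    "DSC": "HHEECC",
    "Predator": "CHHCCC",
}
consensus = majority_vote(example)
```

With four methods a 2-2 split is possible at any residue, so some explicit tie rule is needed; the learned combiners mentioned in the abstract (linear discrimination, neural networks, decision trees) would replace this counting step with a trained model over the same per-residue inputs.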