PEDS Advance Access originally published online on August 18, 2009
Protein Engineering Design and Selection 2009 22(11):665-671; doi:10.1093/protein/gzp050
Modeling the functional consequences of single residue replacements in bacteriophage f1 gene V protein
Laboratory for Structural Bioinformatics, Department of Bioinformatics and Computational Biology, George Mason University, 10900 University Blvd. MS 5B3, Manassas, VA 20110, USA
1 To whom correspondence should be addressed. E-mail: mmasso{at}gmu.edu
A computational mutagenesis methodology based on a four-body, knowledge-based statistical contact potential is applied to bacteriophage f1 gene V protein (GVP) to quantify the global relative environmental perturbation (residual score) caused by each single amino acid substitution. We show that residual scores correlate well with experimentally measured relative changes in protein function upon mutation. Residual scores also distinguish between GVP amino acid positions grouped by structural or functional role, or by similarity in physicochemical characteristics. For each mutant, the in silico mutagenesis additionally yields a residual profile: local measures of environmental change (EC scores) at every residue position relative to the native protein. A random forest (RF) classifier is trained on experimental GVP mutants, each represented by a feature vector whose components include the EC scores at the mutated position and at its six structurally nearest neighbors. The classifier assigns mutants to functional classes with up to 77% cross-validation accuracy and an area under the receiver operating characteristic curve of 0.82. A control experiment highlights the effectiveness of the mutant feature vector signals, and a variety of learning curves are generated to analyze the impact of GVP mutant data set size on performance measures. An optimally trained RF model is subsequently used to infer function for all remaining unexplored GVP mutants.
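As a rough illustration of the feature vector and performance measure described above, the following Python sketch assembles seven EC-score components for one mutant and computes an area under the ROC curve from predicted scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_feature_vector` and `roc_auc` are hypothetical, "structurally nearest" is assumed here to mean smallest Euclidean distance between one representative 3D point per residue, and the neighbor ordering (nearest first) is likewise an assumption. The abstract says the feature vector "includes" these EC scores, so the real vectors may carry further components.

```python
import math


def build_feature_vector(ec_profile, coords, mutated_index, n_neighbors=6):
    """Return EC scores at the mutated position and its nearest neighbors.

    ec_profile    -- EC score per residue position (the residual profile)
    coords        -- one (x, y, z) point per residue; ASSUMPTION: a single
                     representative point per residue defines "nearest"
    mutated_index -- 0-based index of the substituted position
    n_neighbors   -- number of neighbors to include (six in the abstract)
    """
    if len(ec_profile) != len(coords):
        raise ValueError("ec_profile and coords must have the same length")
    if len(coords) <= n_neighbors:
        raise ValueError("not enough residues for the requested neighbors")

    origin = coords[mutated_index]
    others = [i for i in range(len(coords)) if i != mutated_index]
    # Sort by distance to the mutated residue; ties broken by index
    # (ASSUMPTION: the source does not specify an ordering).
    others.sort(key=lambda i: (math.dist(origin, coords[i]), i))
    neighbors = others[:n_neighbors]
    return [ec_profile[mutated_index]] + [ec_profile[i] for i in neighbors]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the higher score, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data only: eight made-up residues on a line with made-up EC scores.
toy_ec = [0.0, -1.5, 2.0, 0.5, -0.25, 1.0, 3.0, -2.0]
toy_xyz = [(float(i), 0.0, 0.0) for i in range(8)]
toy_vector = build_feature_vector(toy_ec, toy_xyz, mutated_index=3)
```

Vectors built this way, one per experimental mutant and paired with a functional class label, could then be passed to any random forest implementation and evaluated by cross-validation; the classifier itself is left out of this sketch.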
Keywords: computational mutagenesis/Delaunay tessellation/knowledge-based statistical potential/random forest supervised classification/structure–function relationship
Received April 2, 2009; revised July 19, 2009; accepted July 20, 2009.