


PEDS Advance Access published online on January 23, 2007

Protein Engineering Design and Selection, doi:10.1093/protein/gzl053
All versions of this article: 20/1/39 (most recent); gzl053v1

© The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

Article

Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins

Hong-Bin Shen¹ and Kuo-Chen Chou¹,²,³

¹ Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 1954 Hua-Shan Road, Shanghai 200030, China
² Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA

³ To whom correspondence should be addressed. E-mail: kchou@san.rr.com

A statistical analysis indicated that, of the 35 016 Gram-positive bacterial proteins in a recent release of the Swiss-Prot database, ~57% lack subcellular location annotations. In the Gene Ontology database the corresponding figure is ~67%, i.e. the proportion of proteins without subcellular component annotations is even higher. With the avalanche of gene products generated in the post-genomic era, the number of such location-unknown entries will keep increasing. An automated method that can identify their subcellular localization quickly and accurately is therefore highly desirable, because this information is very useful for both basic research and drug discovery. In view of this, an ensemble classifier called ‘Gpos-PLoc’ was developed for predicting the subcellular localization of Gram-positive bacterial proteins. The new predictor fuses many basic classifiers, each engineered according to the optimized evidence-theoretic K-nearest neighbor (OET-KNN) rule. As a demonstration, tests were performed on Gram-positive proteins distributed among five subcellular locations: (1) cell wall, (2) cytoplasm, (3) extracell, (4) periplasm and (5) plasma membrane. To eliminate redundancy and homology bias, only proteins with <25% sequence identity to any other protein in the same subcellular location were included in the benchmark datasets. Gpos-PLoc achieved overall success rates of >80% in both the jackknife cross-validation test and the independent dataset test, suggesting that it may become a very useful tool for expediting the analysis of Gram-positive bacterial proteins. Gpos-PLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/Gpos/.
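The abstract names two ingredients of Gpos-PLoc, basic classifiers built on the OET-KNN rule and a fusion of those classifiers, plus the jackknife test used to evaluate it. The Python sketch below illustrates these ideas under stated assumptions and is not the authors' implementation. It uses a plain evidence-theoretic KNN rule in the style of Denœux, with fixed, hand-picked parameters `alpha` and `gamma` instead of the optimized ones that give OET-KNN its name. It also assumes that the basic classifiers differ only in the feature representation ("view") they see and that they are fused by simple majority voting. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def etknn_predict(train, labels, x, k=5, alpha=0.95, gamma=1.0):
    """Evidence-theoretic KNN with FIXED (not optimized) alpha and gamma.

    Each of the k nearest neighbours of class q supports {q} with mass
    alpha * exp(-gamma * d^2); the rest of its mass goes to the whole frame.
    """
    # Squared Euclidean distance to every training vector; keep the k nearest.
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(v, x)), lab)
        for v, lab in zip(train, labels)
    )[:k]
    classes = sorted(set(labels))
    # m_q(Omega): product over class-q neighbours of (1 - alpha*exp(-gamma*d^2)).
    ignorance = {q: 1.0 for q in classes}
    for d2, lab in nearest:
        ignorance[lab] *= 1.0 - alpha * math.exp(-gamma * d2)

    # Dempster's rule across classes; the normaliser is shared by all classes,
    # so it is dropped: score(q) = m_q({q}) * prod_{r != q} m_r(Omega).
    def score(q):
        others = math.prod(ignorance[r] for r in classes if r != q)
        return (1.0 - ignorance[q]) * others

    return max(classes, key=score)


def ensemble_predict(views, labels, query_views, **kw):
    """Hypothetical fusion: majority vote of one ET-KNN classifier per view."""
    votes = Counter(
        etknn_predict(train, labels, q, **kw)
        for train, q in zip(views, query_views)
    )
    return votes.most_common(1)[0][0]


def jackknife_accuracy(views, labels, **kw):
    """Leave-one-out: each protein is predicted from all the others."""
    n = len(labels)
    correct = 0
    for i in range(n):
        train_views = [[v for j, v in enumerate(view) if j != i] for view in views]
        train_labels = [lab for j, lab in enumerate(labels) if j != i]
        query = [view[i] for view in views]
        correct += ensemble_predict(train_views, train_labels, query, **kw) == labels[i]
    return correct / n


if __name__ == "__main__":
    # Toy 2-D features for six proteins under two hypothetical feature views.
    labels = ["cytoplasm"] * 3 + ["cell wall"] * 3
    view_a = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [1.0, 1.1], [1.1, 1.0], [1.05, 0.95]]
    view_b = [[0.2, 0.0], [0.0, 0.2], [0.1, 0.1], [0.9, 1.0], [1.0, 0.9], [0.95, 1.05]]
    print(jackknife_accuracy([view_a, view_b], labels, k=3))  # prints 1.0
```

The toy classes are well separated, so every jackknife prediction is correct; a real benchmark filtered to <25% sequence identity is much harder. The jackknife is used here because, for a given dataset, it produces a single deterministic estimate rather than one that depends on a random split.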
To meet the needs of investigators in the relevant areas, a downloadable file at the same website lists the Gpos-PLoc predictions for 31 898 Gram-positive bacterial protein entries in the Swiss-Prot database that either have no subcellular location annotation or are annotated with uncertain terms such as ‘probable’, ‘potential’, ‘perhaps’ and ‘by similarity’. These large-scale results will be updated once a year to include new Gram-positive bacterial protein entries and to reflect the continuing development of Gpos-PLoc.

Keywords: amphiphilic pseudo amino acid composition/fusion/gene ontology/Gram-positive/OET-KNN rule

Received October 11, 2006; revised November 20, 2006; accepted November 22, 2006.





