PEDS Advance Access published online on November 6, 2008
Protein Engineering Design and Selection, doi:10.1093/protein/gzn064
Prediction of signal peptides in archaea
1Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 15701 2Department of Informatics with Applications in Biomedicine, University of Central Greece, Papasiopoulou 2–4, Lamia 35100, Greece
3 To whom correspondence should be addressed. E-mail: pbagos{at}biol.uoa.gr, pbagos{at}ucg.gr
| Abstract |
|---|
Computational prediction of signal peptides (SPs) and their cleavage sites is of great importance in computational biology; however, there is currently no available method capable of reliably predicting the SPs of archaea, owing to the limited number of experimentally verified proteins with SPs. We performed an extensive literature search to identify archaeal proteins with experimentally verified SPs and found 69 such proteins, the largest number ever reported. A detailed analysis of these sequences revealed some unique features of archaeal SPs, such as the distinctive amino acid composition of the hydrophobic region, with a higher than expected occurrence of isoleucine, and a cleavage site that more closely resembles that of gram-positive bacteria, with almost equal amounts of alanine and valine at position -3 before the cleavage site and a dominant alanine at position -1, followed in abundance by serine and glycine. Using these proteins as a training set, we trained a hidden Markov model method that predicts the presence of SPs and their cleavage sites and also discriminates such proteins from cytoplasmic and transmembrane ones. The method performs satisfactorily, yielding, in a 35-fold cross-validation procedure, a sensitivity of 100% and a specificity of 98.41%, with a Matthews correlation coefficient of 0.964. This method is currently the only one available for the prediction of secretory SPs in archaea, and it performs consistently and significantly better than other available predictors that were trained on sequences of eukaryotic or bacterial origin. Searching 48 completely sequenced archaeal genomes, we identified 9437 putative SPs. The method, PRED-SIGNAL, and the results are freely available for academic users at http://bioinformatics.biol.uoa.gr/PRED-SIGNAL/ and we anticipate that it will be a valuable tool for the computational analysis of archaeal genomes.
Keywords: archaea/hidden Markov model/prediction/secreted proteins/signal peptide
| Introduction |
|---|
In all three domains of life (bacteria, eukarya and archaea), proteins destined to be exported from the cytoplasm are generally (but not exclusively) synthesized as precursor proteins bearing a cleavable N-terminal signal sequence. In all cases, the signal peptide (SP) is composed of a positively charged region at the N-terminus (n-region), a hydrophobic region (h-region) that spans the membrane and a c-region of mostly small and uncharged residues ending at the characteristic cleavage site (von Heijne, 1990).
In bacteria, a second signal peptidase (Spase II or Lsp), acting on membrane-bound lipoproteins, has been discovered (Sankaran and Wu, 1995); it cleaves shorter SPs carrying a distinctive c-region containing a conserved cysteine (von Heijne, 1989). The conserved cysteine is indispensable in both gram-positive and gram-negative bacteria, and is necessary for membrane anchoring. The post-translational lipid modification involves three enzymes that act sequentially: the prolipoprotein diacylglyceryl transferase (Lgt), which transfers a diacylglyceride to the cysteine sulfhydryl group; the signal peptidase II (Spase II or Lsp), which cleaves the SP at the residue before the cysteine, forming an apolipoprotein; and the apolipoprotein N-acyltransferase (Lnt), which acylates the α-amino group of the apolipoprotein N-terminal cysteine, forming the mature lipoprotein (Sankaran and Wu, 1994; Sankaran et al., 1995). Although dozens of putative lipoproteins have been identified in archaeal genomes, the absence of Spase II orthologues in archaea, as well as the different post-translational modification of cysteine, has resulted in limited knowledge of archaeal lipoproteins and a lack of experimentally verified proteins of that type. Translocation of lipoproteins through the Tat pathway has been postulated on the basis of sequence analysis, but has only recently been proven for the bacterium Desulfovibrio vulgaris (Valente et al., 2007) and the archaeon Haloferax volcanii (Gimenez et al., 2007). Interestingly, in halophilic archaea, the components of the Tat pathway are essential for viability (Dilks et al., 2005; Thomas and Bolhuis, 2006) and there is evidence that Tat-dependent translocation is widely used as part of a mechanism for adaptation to extreme saline environments (Rose et al., 2002).
Computational prediction of secretory SPs was initially performed using weight matrices (von Heijne, 1986). However, neural networks (Nielsen et al., 1997; Nielsen et al., 1999) and hidden Markov models (HMMs) (Nielsen and Krogh, 1998), introduced by the SignalP method, have proven to be the most successful methods currently available (Menne et al., 2000). Recently, SignalP was retrained and, mainly owing to better annotation and selection of the training set, yielded even better accuracy (Bendtsen et al., 2004), whereas the program TatP has been presented as offering the most accurate classification of Tat SPs (Bendtsen et al., 2005). A different approach was followed in the Phobius method (Kall et al., 2004; Kall et al., 2007), where an HMM was used to predict simultaneously the presence of a secretory SP and the transmembrane (TM) topology of a given protein. Following this approach, the authors showed that they could minimize the number of SPs predicted as TM segments and vice versa. Concerning lipoproteins, for years regular expression patterns based on the von Heijne rule (von Heijne, 1989) were used, with various modifications (Madan Babu and Sankaran, 2002; Sutcliffe and Harrington, 2002; Madan Babu et al., 2006; Setubal et al., 2006). Recently, a method called LipoP was developed, which is based on HMMs and was trained exclusively on lipoproteins from gram-negative bacteria (Juncker et al., 2003). However, the previously mentioned prediction methods have been trained on bacterial and/or eukaryal sequences, and in most cases there are different versions of the predictors aiming to capture the distinct sequence features of the SPs of particular groups of organisms. Since very few experimentally verified SPs from archaea have been characterized, little is known about the precise characteristics of these sequences, even though there is some evidence suggesting that archaeal SPs exhibit a mixture of characteristics found in eukarya and bacteria. The first computational work on archaea was performed by Nielsen et al. (1999), who applied SignalP to the genome of Methanococcus jannaschii (M. jannaschii). They used the three versions of SignalP (trained on gram-positive bacteria, gram-negative bacteria and eukarya) and identified 34 proteins for which the predictions concerning the existence of the SP coincided. A more systematic evaluation was performed later by Bardy et al. (2003), who applied a similar procedure to 15 completely sequenced archaeal genomes, requiring, however, that all three methods predict the same cleavage site. Although this procedure may be biased towards selecting only proteins that share common features with sequences found in other domains of life, the general conclusions of these studies suggested that archaeal SPs exhibit a more eukaryotic-like cleavage site (c-region) and a unique h-region resembling the bacterial ones, with a slight over-representation of leucine and isoleucine; leucine is by far the dominant residue in eukaryotes. Thus, it is now evident that SP predictors trained on eukaryal or bacterial proteins cannot reliably be applied to archaeal sequences. A dedicated prediction method, trained exclusively on archaeal SPs, is needed. The major problem in this respect is the lack of a large number of experimentally verified signal sequences of archaeal origin. In particular, the UniProt database (Wu et al., 2006) lists only 12 archaeal sequences with experimentally verified, precise locations of the cleavage site, and the specialized SP database SPDB (Choo et al., 2005) lists only nine such proteins.
| Materials and methods |
|---|
Hidden Markov model
The HMM that we used is similar to the one proposed for SignalP (Nielsen and Krogh, 1998). It consists of three sub-models: the SP sub-model, corresponding to secretory SPs; the N-terminal TM sub-model, corresponding to an N-terminal TM segment; and a globular sub-model, used to model the globular N-terminal domains of cytoplasmic or membrane proteins. The central core of the model is the SP sub-model (Fig. 1). It captures the modular nature of SPs, modeling the positively charged n-region, the hydrophobic h-region that spans the membrane and the c-region of mostly small and uncharged residues ending at the characteristic cleavage site (A-X-A) (von Heijne, 1990). The TM sub-model is identical to the one used by the HMM-TM predictor for alpha-helical membrane proteins (Bagos et al., 2006), whereas the globular sub-model consists simply of a self-transitioning state.
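To make the left-to-right layout of the SP sub-model more concrete, the following is a minimal toy sketch in Python, not the PRED-SIGNAL model itself. It assumes a single state per region (n, h, c, mature protein), whereas the actual architecture (Fig. 1) is richer and also includes the TM and globular sub-models. All transition and emission values, the residue classes and the example sequence are hypothetical and chosen only for illustration; Viterbi decoding is used here as one standard way to read a cleavage site off such a model.

```python
import math

STATES = ["n", "h", "c", "m"]  # n-region, h-region, c-region, mature protein

# Left-to-right transitions (hypothetical values, NOT trained parameters).
TRANS = {
    "n": {"n": 0.7, "h": 0.3},
    "h": {"h": 0.9, "c": 0.1},
    "c": {"c": 0.7, "m": 0.3},
    "m": {"m": 1.0},
}

# Crude residue classes, assumed for illustration only.
POSITIVE = set("KR")
HYDROPHOBIC = set("AILVFMW")
SMALL = set("AGS")


def emission(state, aa):
    """Toy emission scores by residue class (illustrative, not normalized)."""
    if state == "n":
        return 0.30 if aa in POSITIVE else 0.04
    if state == "h":
        return 0.12 if aa in HYDROPHOBIC else 0.01
    if state == "c":
        return 0.20 if aa in SMALL else 0.02
    return 0.05  # mature region: flat background


def viterbi(seq):
    """Return the best-scoring state path for seq, forced to start in state n."""
    neg_inf = float("-inf")
    score = {s: neg_inf for s in STATES}
    score["n"] = math.log(emission("n", seq[0]))
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for aa in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for p in STATES:
                t = TRANS[p].get(s, 0.0)
                if t > 0.0 and score[p] > neg_inf:
                    cand = score[p] + math.log(t)
                    if cand > best:
                        best_prev, best = p, cand
            pointers[s] = best_prev
            new_score[s] = best + math.log(emission(s, aa)) if best_prev else neg_inf
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical sequence; the first 'm' in the path marks the predicted
# start of the mature protein (path.find returns -1 if no 'm' is assigned).
example = "MKKRLLLVLLIAVLASASAQETEEDKDEE"
path = viterbi(example)
cleavage_after = path.find("m")
```

Because the transitions only allow staying in a region or moving to the next one, any decoded path is a run of n, then h, then c, then m labels, which is what lets a single pass over the sequence yield both the presence of an SP and a cleavage position.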
The model was trained using the Baum–Welch algorithm for labeled sequences (Krogh, 1994).
As we noted earlier, the publicly available databases, such as UniProt (Wu et al., 2006) and SPDB (Choo et al., 2005), currently contain annotated information for only a few archaeal sequences with experimentally verified, precise locations of the cleavage site. Thus, we decided to perform an extensive literature search to identify archaeal sequences with verified cleavage site locations, as well as proteins with verified SPs whose cleavage sites are not precisely known. The literature search was performed on PubMed using terms such as SP or signal sequence, combined with terms such as archaeon, archaea or archaebacteria. Since this strategy also yielded a limited number of archaeal peptides, and given that in many known cases the information concerning the presence of the SP was not available in the abstract or the title of the respective papers, we used additional search terms such as extracellular, extracytoplasmic or secreted. The full texts of the papers were downloaded and read, and the reference lists were also checked to identify additional studies that were missed by the initial search. The identified sequences were, in almost every case, retrieved from UniProt (Wu et al., 2006) and were classified according to two criteria: first, whether the protein has a verified SP cleavage site, and second, whether the protein is translocated using the Tat or the Sec system. Lipoprotein SPs were removed since there are only a few such examples (see Results and discussion).
Since the model is also capable of discriminating SPs from globular proteins as well as from proteins with an N-terminal TM helix, we used as negative examples 69 archaeal proteins with an annotated (proven or putative) TM segment within the first 70 amino acids and the N-terminus located in the cytoplasmic space, and 183 archaeal cytoplasmic proteins. The sequences were retrieved from UniProt and identical sequences were removed to produce a unique set. Training and testing were performed using a 35-fold cross-validation procedure. The data set was split into 35 parts having approximately the same number of SPs, TM and cytoplasmic proteins. In each round, one of the 35 subsets was removed, the model was trained with the remaining proteins and the test was performed on the proteins of the removed subset. This process was repeated in turn for all the subsets, and the final prediction accuracy summarized the outcome of all independent tests. Sequences belonging to different cross-validation subsets had no more than 18 identical residues within the SP, as advised by previous studies (Nielsen et al., 1997; Nielsen et al., 1999). Finally, the complete proteomes of archaea were downloaded from the NCBI ftp site at ftp://ftp.ncbi.nih.gov/.
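The 35-way split described above can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' procedure: the function name `stratified_folds`, the class labels and the random seed are hypothetical, and the sketch does not enforce the additional constraint that sequences in different subsets share no more than 18 identical residues within the SP.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=35, seed=0):
    """Assign item indices to k folds so that each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


# Class sizes taken from the text: 69 SPs, 69 N-terminal TM, 183 cytoplasmic.
labels = ["SP"] * 69 + ["TM"] * 69 + ["CYT"] * 183
folds = stratified_folds(labels)

for held_out in folds:
    # Everything outside the held-out fold forms the training set.
    train = [i for fold in folds if fold is not held_out for i in fold]
    # Train on `train`, test on `held_out`, and pool the test results (not shown).
```

With 69 positive examples and 35 folds, each held-out subset contains one or two SPs, so pooling the predictions over all folds is needed before the accuracy measures can be computed.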
As measures of accuracy in the binary classification problem (SPs versus non-SPs), we used the percentage of correctly classified positive examples (sensitivity), the percentage of correctly classified negative examples (specificity) and the Matthews correlation coefficient (MCC), which summarizes in a single measure the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) (Baldi et al., 2000).
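For reference, these three measures can be computed from the confusion counts as in the short Python sketch below. The function name `binary_metrics` is hypothetical. The example counts are those reported in Results and discussion: all 69 SPs predicted correctly and 248 of the 252 negative examples rejected correctly, which implies 4 false positives and no false negatives.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


sens, spec, mcc = binary_metrics(tp=69, fp=4, tn=248, fn=0)
print(f"{sens:.2%} {spec:.2%} {mcc:.3f}")  # prints "100.00% 98.41% 0.964"
```

Unlike sensitivity and specificity taken separately, the MCC is affected by both kinds of error at once, which is useful here because the negative set (252 proteins) is considerably larger than the positive set (69 proteins).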
| Results and discussion |
|---|
The extensive literature search that we performed identified a total of 69 archaeal proteins with a verified SP (Table I). Among them, 24 proteins have cleavage sites that were defined precisely by direct sequencing of the N-terminus of the mature protein. The 69 proteins listed in Table I include many extracellular secreted enzymes (proteases, chitinases, amylases, etc.), several surface (S-layer) proteins, a few extracellular components of ABC transporter systems, as well as some uncharacterized proteins from the two main kingdoms of archaea (Crenarchaeota and Euryarchaeota). A few sequences were discarded because their SP sequence was identical to that of another protein in the set (i.e. CSG_METSC, which is identical to CSG_METFE, and Q7LYT7_PYRWO, which is identical to O08452_PYRFU), as was one sequence (Q97X08_SULSO) for which there was evidence suggesting that it was membrane-anchored (Ferrer et al., 2005).
The alignment of the SPs at their respective cleavage sites (Fig. 2) is useful in order to obtain insight into the unique sequence features of the archaeal SPs. The sequence logos (Schneider and Stephens, 1990
The results obtained in the 35-fold cross-validation procedure are listed in Table II. Our method, PRED-SIGNAL, correctly predicts all 69 SPs and correctly rejects 248 of the 252 cytoplasmic and TM proteins. These results correspond to 100% sensitivity and 98.41% specificity, with an MCC of 0.964. Using the same data set, we also evaluated the various versions of the SignalP method (Nielsen et al., 1997
Furthermore, the results obtained by using a combination of different SP predictors (i.e. the SignalP modules trained on eukaryotes, gram-positive and gram-negative bacteria) illustrate the difficulties of such an approach. It is clear that although such an approach increases the specificity of the selection (i.e. fewer FPs), the sensitivity decreases (i.e. more FNs). Thus, this strategy (which was until now the only option) reliably predicts some SPs but at the same time overlooks a large number of true SPs. Some general conclusions can also be drawn from these results, verifying previous studies. As we noted earlier, methods trained on gram-positive bacteria (SignalPv2, SignalPv3 and PrediSi) perform slightly better than their gram-negative counterparts and clearly better than the eukaryote-based ones. Phobius, which was trained on a mixed set of proteins (gram-positive, gram-negative and eukaryotic), also performs well, but ranks below the methods trained on gram-positive bacteria and those trained on gram-negative bacteria. HMM methods that were trained to discriminate N-terminal TM regions from SPs (Phobius, SignalP-HMM) perform better in terms of specificity than neural network and PSSM methods (SignalP-NN, PrediSi). On the other hand, neural network-based methods (SignalP-NN) are better at predicting the precise cleavage site location (data not shown). Finally, the updated versions of SignalP (SignalPv3) generally perform better than the older versions (SignalPv2).
We also analyzed the 48 completely sequenced archaeal genomes currently available. The combined prediction of the three HMM predictors of SignalPv3 (gram-positive, gram-negative and eukaryotic) produced a total of 6145 proteins with an SP, of which 2306 have the same predicted cleavage site for all three methods. The combination of the NN predictors of SignalPv3 yielded 5473 predictions in total, of which 2037 have the same predicted cleavage site. In contrast, the method developed here predicts a much larger number of proteins with signal sequences, 9437 in all. According to their annotation, the largest group among these proteins consisted of 5351 hypothetical proteins (56.7%), followed by 1408 (14.92%) enzymes such as lipases, hydrolases, transferases, proteases, kinases, reductases, etc., of which 127 were probable, putative or predicted. There were also 832 (8.81%) membrane proteins such as permeases, transporters, etc., of which 82 were probable, putative or predicted, and 1024 (10.85%) extracellular proteins (mostly solute-binding components of ABC transport systems, as well as S-layer and flagellar proteins), of which 43 were probable, putative or predicted. Finally, there were 822 proteins that could not be classified (8.71%).
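The consensus filtering applied to the SignalP modules can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes that the "combined prediction" counts proteins for which all three modules predict an SP, the function name `consensus` and the module keys are hypothetical, the per-module output is reduced to a predicted cleavage position (or `None` when no SP is predicted), and the protein identifiers and positions are invented.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical per-module output: protein id -> cleavage position, or None (no SP).
ModuleOutput = Dict[str, Optional[int]]


def consensus(predictions: Dict[str, ModuleOutput]) -> Tuple[Set[str], Set[str]]:
    """Return (proteins with an SP in every module, subset with identical sites)."""
    modules = list(predictions)
    # Only consider proteins scored by every module.
    proteins = set.intersection(*(set(predictions[m]) for m in modules))
    all_sp, same_site = set(), set()
    for prot in proteins:
        sites = [predictions[m][prot] for m in modules]
        if all(site is not None for site in sites):
            all_sp.add(prot)
            if len(set(sites)) == 1:
                same_site.add(prot)
    return all_sp, same_site


# Invented toy data for three modules.
toy = {
    "gram_pos": {"P1": 25, "P2": 30, "P3": None},
    "gram_neg": {"P1": 25, "P2": 28, "P3": 22},
    "euk": {"P1": 25, "P2": 30, "P3": None},
}
all_sp, same_site = consensus(toy)
# all_sp == {"P1", "P2"}; same_site == {"P1"}
```

The second set is always contained in the first, which mirrors the relation between the two SignalP counts quoted above (2306 of 6145 for the HMM modules and 2037 of 5473 for the NN modules).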
The detailed results for each genome are available as Supplementary data on our web site (http://bioinformatics.biol.uoa.gr/PRED-SIGNAL/). The per-genome percentage of predicted proteins carrying an SP according to our method ranges from 5 to 14% (average = 8.92%), whereas the same percentage according to the combination of SignalP predictors ranges from 3 to 7%. According to our results, the 15 archaeal genomes belonging to Crenarchaeota do not differ significantly from the 32 genomes belonging to Euryarchaeota (8.54 versus 9.16%, P-value = 0.406 according to t-test) in the proportion of proteins predicted to contain an SP. The only representative of Nanoarchaeota (Nanoarchaeum equitans) contains a comparable proportion of secreted proteins (7.09%), although these are produced by a significantly smaller genome (38 out of the 536 total coding sequences). In an ANOVA analysis, psychrophiles, mesophiles, thermophiles and hyperthermophiles did not show any statistically significant difference in the proportion of proteins carrying an SP (range from 8.2 to 10.7%, P-value = 0.087). Only the six thermoacidophiles showed a smaller proportion (6.58%), whereas one haloalkalophile (13.8%) and the three halophiles (12.53%) showed larger proportions. Examining the amino acid distribution of the SPs of all the groups using sequence logos did not reveal any obvious discrepancies (data not shown). The only detectable difference was the over-representation of alanine and glycine and the under-representation of isoleucine in the h-region of SPs of halophiles and haloalkalophiles. These results need to be studied further, but clearly the large proportion of secreted proteins, as well as the abundance of glycine and alanine, which suggests a lower hydrophobicity in the h-region of SPs of halophiles, should be attributed to the extensive use of the Tat pathway. PRED-SIGNAL does not discriminate Tat from Sec SPs, and we expect many of the secreted proteins of halophiles to contain a Tat SP (Rose et al., 2002).
Among the proteins predicted by the combination of the HMM versions of SignalP, only 685 were not predicted by our predictor, and among the proteins predicted by the combination of the NN versions of SignalP, 749 were not predicted as having an SP by PRED-SIGNAL. Thus, the HMM method developed here is very specific in detecting putative SPs that are considered highly probable (as judged by the stringent criteria applied by the combination of the SignalP predictors). On the other hand, PRED-SIGNAL predicts a large additional number of proteins that were selected by only one or two modules of SignalP, and a remarkably large number of proteins that were not selected by any of the versions of SignalP (1039 for the HMM versions and 1139 for the NN versions). This highlights that, although the stringent criteria applied by combining the different predictors of SignalP can indeed select a large number of archaeal SPs sharing common features with bacterial and eukaryotic SPs, there is a large additional number of putative SPs possessing unique features not present in SPs of eukaryotic or bacterial origin. As expected from the analysis of the training set, among the individual SignalP-NN modules the gram-positive module agrees best with PRED-SIGNAL (correlation coefficient = 0.646), followed by the gram-negative and eukaryotic modules. Similar, although not identical, results also hold for the SignalP-HMM predictors (data not shown).
| Conclusions |
|---|
In this work, we present the first computational method that specifically predicts SPs of archaeal origin and their cleavage sites. We performed an extensive literature search to identify SPs with experimentally verified cleavage sites, as well as verified SPs in which the cleavage site is not precisely located. The analysis confirms previous results that suggested a unique composition of archaeal SPs and justifies our approach of modeling these sequences separately. We used an HMM approach and trained the model to discriminate secretory SPs from cytoplasmic proteins as well as from proteins with an N-terminal TM segment, as these segments are often confused by predictors. The prediction method was also applied to the currently available completely sequenced genomes of archaea, and the results were compared with those of SignalP, which is considered to be the most accurate predictor for non-archaeal sequences. The new prediction method, PRED-SIGNAL, and the secreted proteins identified in the genome analysis are available online at: http://bioinformatics.biol.uoa.gr/PRED-SIGNAL/. We anticipate that this method will be a useful tool for those studying secreted proteins of archaea, since it could be used in genome annotation, genome-wide analyses and various proteomics applications. Finally, we note that the modular nature of the HMM makes it easy to extend the model, e.g. to incorporate joint prediction of Tat SPs or lipoprotein SPs. In our data set we have included 18 Tat substrates, and we found no more than 10 archaeal lipoproteins. However, when further experimental data on these classes of SPs become available in the near future, the model's architecture could easily be expanded to include them and allow better discrimination capability.
| Funding |
|---|
P.G.B. was supported by a scholarship from the State Scholarships Foundation of Greece (SSF), for post-doctoral research in the Department of Cell Biology and Biophysics of the University of Athens (Machine Learning Algorithms for Bioinformatics).
| Footnotes |
|---|
Edited by Todd Yeates
| Acknowledgements |
|---|
The authors would like to thank the two anonymous reviewers and the editors for their very helpful comments and the constructive criticism that helped in the improvement of the manuscript.
| References |
|---|
Akca E., Claus H., Schultz N., Karbach G., Schlott B., Debaerdemaeker T., Declercq J.P., Konig H. Extremophiles (2002) 6:351–358.
Alber B.E., Ferry J.G. Proc. Natl Acad. Sci. USA (1994) 91:6909–6913.
Albers S.V., Driessen A.M. Arch. Microbiol. (2002) 177:209–216.
Bagos P.G., Liakopoulos T.D., Hamodrakas S.J. BMC Bioinformatics (2006) 7:189.
Baldi P., Brunak S., Chauvin Y., Andersen C.A., Nielsen H. Bioinformatics (2000) 16:412–424.
Bardy S.L., Eichler J., Jarrell K.F. Protein Sci. (2003) 12:1833–1843.
Bauer M.W., Driskill L.E., Callen W., Snead M.A., Mathur E.J., Kelly R.M. J. Bacteriol. (1999) 181:284–290.
Bendtsen J.D., Nielsen H., von Heijne G., Brunak S. J. Mol. Biol. (2004) 340:783–795.
Bendtsen J.D., Nielsen H., Widdick D., Palmer T., Brunak S. BMC Bioinformatics (2005) 6:167.
Berks B.C., Palmer T., Sargent F. Curr. Opin. Microbiol. (2005) 8:174–181.
Brockl G., Behr M., Fabry S., Hensel R., Kaudewitz H., Biendl E., Konig H. Eur. J. Biochem. (1991) 199:147–152.
Brown S.H., Kelly R.M. Appl. Environ. Microbiol. (1993) 59:2614–2621.
Bult C.J., et al. Science (1996) 273:1058–1073.
Catara G., Ruggiero G., La Cara F., Digilio F.A., Capasso A., Rossi M. Extremophiles (2003) 7:391–399.
Cheung J., Danna K.J., O'Connor E.M., Price L.B., Shand R.F. J. Bacteriol. (1997) 179:548–551.
Chong P.K., Wright P.C. J. Proteome Res. (2005) 4:1789–1798.
Choo K.H., Tan T.W., Ranganathan S. BMC Bioinformatics (2005) 6:249.
Cohen G.N., et al. Mol. Microbiol. (2003) 47:1495–1512.
Comfort D.A., Chou C.J., Conners S.B., VanFossen A.L., Kelly R.M. Appl. Environ. Microbiol. (2008) 74:1281–1283.
Crooks G.E., Hon G., Chandonia J.M., Brenner S.E. Genome Res. (2004) 14:1188–1190.
Dharmavaram R., Gillevet P., Konisky J. J. Bacteriol. (1991) 173:2131–2133.
Dilks K., Gimenez M.I., Pohlschroder M. J. Bacteriol. (2005) 187:8104–8113.
Driessen A.J., Nouwen N. Annu. Rev. Biochem. (2008) 77:643–667.
Duffner F., Bertoldo C., Andersen J.T., Wagner K., Antranikian G. J. Bacteriol. (2000) 182:6331–6338.
Durbin R., Eddy S.R., Krogh A., Mitchison G. Biological Sequence Analysis (1998) Cambridge University Press.
Erra-Pujada M., Debeire P., Duchiron F., O'Donohue M.J. J. Bacteriol. (1999) 181:3284–3287.
Fariselli P., Martelli P.L., Casadio R. BMC Bioinformatics (2005) 6(Suppl. 4):S12.
Ferrer M., Golyshina O.V., Plou F.J., Timmis K.N., Golyshin P.N. Biochem. J. (2005) 391:269–276.
Gimenez M.I., Dilks K., Pohlschroder M. Mol. Microbiol. (2007) 66:1597–1606.
Goldman S., Hecht K., Eisenberg H., Mevarech M. J. Bacteriol. (1990) 172:7065–7070.
Habib S.J., Neupert W., Rapaport D. Methods Cell Biol. (2007) 80:761–781.
Hashimoto Y., Yamamoto T., Fujiwara S., Takagi M., Imanaka T. J. Bacteriol. (2001) 183:5050–5057.
Hiller K., Grote A., Scheer M., Munch R., Jahn D. Nucleic Acids Res. (2004) 32:W375–W379.
Hutcheon G.W., Vasisht N., Bolhuis A. Extremophiles (2005) 9:487–495.
Izotova L.S., Strongin A.Y., Chekulaeva L.N., Sterkin V.E., Ostoslavskaya V.I., Lyublinskaya L.A., Timokhina E.A., Stepanov V.M. J. Bacteriol. (1983) 155:826–830.
Jones R.A., Jermiin L.S., Easteal S., Patel B.K., Beacham I.R. J. Appl. Microbiol. (1999) 86:93–107.
Juncker A.S., Willenbrock H., von Heijne G., Brunak S., Nielsen H., Krogh A. Protein Sci. (2003) 12:1652–1662.
Kall L., Krogh A., Sonnhammer E.L. J. Mol. Biol. (2004) 338:1027–1036.
Kall L., Krogh A., Sonnhammer E.L. Bioinformatics (2005) 21(Suppl. 1):i251–i257.
Kall L., Krogh A., Sonnhammer E.L. Nucleic Acids Res. (2007) 35:W429–W432.
Kamekura M., Seno Y., Holmes M.L., Dyall-Smith M.L. J. Bacteriol. (1992) 174:736–742.
Kamekura M., Seno Y., Dyall-Smith M. Biochim. Biophys. Acta (1996) 1294:159–167.
Kannan Y., Koga Y., Inoue Y., Haruki M., Takagi M., Imanaka T., Morikawa M., Kanaya S. Appl. Environ. Microbiol. (2001) 67:2445–2452.
Kashima Y., Mori K., Fukada H., Ishikawa K. Extremophiles (2005) 9:37–43.
Kawarabayasi Y., et al. DNA Res. (1998) 5:55–76.
Kawarabayasi Y., et al. DNA Res. (2001) 8:123–140.
Kim B.K., Pihl T.D., Reeve J.N., Daniels L. J. Bacteriol. (1995) 177:7178–7185.
Krogh A. Proceedings of the 12th IAPR International Conference on Pattern Recognition (1994) 140–144.
Krogh A., Larsson B., von Heijne G., Sonnhammer E.L. J. Mol. Biol. (2001) 305:567–580.
Lechner J., Sumper M. J. Biol. Chem. (1987) 262:9724–9729.
Lee P.A., Tullman-Ercek D., Georgiou G. Annu. Rev. Microbiol. (2006) 60:373–395.
Leveque E., Haye B., Belarbi A. FEMS Microbiol. Lett. (2000) 186:67–71.
Lim J.K., Lee H.S., Kim Y.J., Bae S.S., Jeon J.H., Kang S.G., Lee J.H. J. Microbiol. Biotechnol. (2007) 17:1242–1248.
Limauro D., Cannio R., Fiorentino G., Rossi M., Bartolucci S. Extremophiles (2001) 5:213–219.
Lin X., Tang J. J. Biol. Chem. (1990) 265:1490–1495.
Madan Babu M., Sankaran K. Bioinformatics (2002) 18:641–643.
Madan Babu M., Priya M.L., Selvan A.T., Madera M., Gough J., Aravind L., Sankaran K. J. Bacteriol. (2006) 188:2761–2773.
Mander G.J., Duin E.C., Linder D., Stetter K.O., Hedderich R. Eur. J. Biochem. (2002) 269:1895–1904.
Mattar S., Scharf B., Kent S.B., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. (1994) 269:14939–14945.
Melen K., Krogh A., von Heijne G. J. Mol. Biol. (2003) 327:735–744.
Menne K.M., Hermjakob H., Apweiler R. Bioinformatics (2000) 16:741–742.
Morikawa M., Izawa Y., Rashid N., Hoaki T., Imanaka T. Appl. Environ. Microbiol. (1994) 60:4559–4566.
Nielsen H., Krogh A. Proc. Int. Conf. Intell. Syst. Mol. Biol. (1998) 6:122–130.
Nielsen H., Engelbrecht J., Brunak S., von Heijne G. Protein Eng. (1997) 10:1–6.
Nielsen H., Brunak S., von Heijne G. Protein Eng. (1999) 12:3–9.
Palmieri G., Casbarra A., Fiume I., Catara G., Capasso A., Marino G., Onesti S., Rossi M. Extremophiles (2006) 10:393–402.
Perez-Pomares F., Bautista V., Ferrer J., Pire C., Marhuenda-Egea F.C., Bonete M.J. Extremophiles (2003) 7:299–306.
Pohlschroder M., Gimenez M.I., Jarrell K.F. Curr. Opin. Microbiol. (2005) 8:713–719.
Rapoport T.A., Matlack K.E., Plath K., Misselwitz B., Staeck O. Biol. Chem. (1999) 380:1143–1150.
Rose R.W., Bruser T., Kissinger J.C., Pohlschroder M. Mol. Microbiol. (2002) 45:943–950.
Ruiz D.M., De Castro R.E. J. Ind. Microbiol. Biotechnol. (2007) 34:111–115.
Sako Y., Croocker P.C., Ishida Y. FEBS Lett. (1997) 415:329–334.
Sankaran K., Wu H.C. J. Biol. Chem. (1994) 269:19701–19706.
Sankaran K., Wu H.C. Methods Enzymol. (1995) 248:169–180.
Sankaran K., Gupta S.D., Wu H.C. Methods Enzymol. (1995) 250:683–697.
Saunders N.F., Ng C., Raftery M., Guilhaus M., Goodchild A., Cavicchioli R. J. Proteome Res. (2006) 5:2457–2464.
Schneider T.D., Stephens R.M. Nucleic Acids Res. (1990) 18:6097–6100.
Serour E., Antranikian G. Antonie Van Leeuwenhoek (2002) 81:73–83.
Setubal J.C., Reis M., Matsunaga J., Haake D.A. Microbiology (2006) 152:113–121.
She Q., et al. Proc. Natl Acad. Sci. USA (2001) 98:7835–7840.
Shi W., Tang X.F., Huang Y., Gan F., Tang B., Shen P. Extremophiles (2006) 10:599–606.
Sumper M., Berg E., Mengele R., Strobel I. J. Bacteriol. (1990) 172:7111–7118.
Sun C., Li Y., Mei S., Lu Q., Zhou L., Xiang H. Mol. Microbiol. (2005) 57:537–549.
Sutcliffe I.C., Harrington D.J. Microbiology (2002) 148:2065–2077.
Tanaka T., Fujiwara S., Nishikori S., Fukui T., Takagi M., Imanaka T. Appl. Environ. Microbiol. (1999) 65:5338–5344.
Teter S.A., Klionsky D.J. Trends Cell Biol. (1999) 9:428–431.
Thomas J.R., Bolhuis A. FEMS Microbiol. Lett. (2006) 256:44–49.
Tuteja R. Arch. Biochem. Biophys. (2005) 441:107–111.
Valente F.M., Pereira P.M., Venceslau S.S., Regalla M., Coelho A.V., Pereira I.A. FEBS Lett. (2007) 581:3341–3344.
van Roosmalen M.L., Geukens N., Jongbloed J.D., Tjalsma H., Dubois J.Y., Bron S., van Dijl J.M., Anne J. Biochim. Biophys. Acta (2004) 1694:279–297.
von Heijne G. Nucleic Acids Res. (1986) 14:4683–4690.
von Heijne G. Protein Eng. (1989) 2:531–534.
von Heijne G. J. Membr. Biol. (1990) 115:195–201.
von Heijne G., Steppuhn J., Herrmann R.G. Eur. J. Biochem. (1989) 180:535–545.
Voorhorst W.G., Eggen R.I., Geerling A.C., Platteeuw C., Siezen R.J., Vos W.M. J. Biol. Chem. (1996) 271:20426–20431.
Voorhorst W.G., Warner A., de Vos W.M., Siezen R.J. Protein Eng. (1997) 10:905–914.
Wakai H., Nakamura S., Kawasaki H., Takada K., Mizutani S., Aono R., Horikoshi K. Extremophiles (1997) 1:29–35.
Wang L., Zhou Q., Chen H., Chu Z., Lu J., Zhang Y., Yang S. J. Ind. Microbiol. Biotechnol. (2007) 34:187–192.
Woodson J.D., Reynolds A.A., Escalante-Semerena J.C. J. Bacteriol. (2005) 187:5901–5909.
Wu C.H., et al. Nucleic Acids Res. (2006) 34:D187–D191.
Received May 17, 2008; revised September 30, 2008; accepted October 9, 2008.