Protein Engineering, Vol. 15, No. 9, 713-715, September 2002
© 2002 Oxford University Press
COMMUNICATION
A study on the correlation of G-protein-coupled receptor types with amino acid composition
Computer-Aided Drug Discovery, Pharmacia, MI 49007-4940, USA
Abstract
G-protein-coupled receptors have become an important target for drug discovery in psychiatric diseases, particularly for approaches that use bioinformatics and genomics technology. In this study the covariant-discriminant algorithm [Chou and Elrod (1999)
Keywords: acetylcholine/adrenoceptor/amine/covariant-discriminant algorithm/dopamine/rhodopsin-like receptor/serotonin
Introduction
A large number of hormones, neurotransmitters, chemokines and other chemical messengers act through G-protein-coupled receptors, which constitute one of the major signal transduction systems in eukaryotic cells and are therefore major targets for therapeutic intervention. Each G-protein-coupled receptor consists of a single polypeptide chain of variable length that traverses the lipid bilayer seven times, forming characteristic transmembrane helices connected by alternating extracellular and intracellular segments (Figure 1).
G-protein-coupled receptors interact with the guanine-nucleotide-binding signal-transducing proteins (G-proteins), which consist of three different subunits, α, β and γ. G-proteins mediate both the activation and the inhibition of adenylate cyclase. They may also stimulate the opening of K+ channels in heart cells and participate in the phosphoinositide signaling system. Through the various G-protein-coupled receptors, many signaling cascades convert external and internal stimuli into intracellular responses. A G-protein-coupled receptor characteristically activates one or more members of the G-protein family, and the G-proteins carry the information received by the receptor to cellular effectors such as enzymes and ion channels. These effectors influence the levels of second messengers that regulate a wide variety of cellular processes, including cell growth and differentiation.
Although all known G-protein-coupled receptors are seven-helix transmembrane proteins (Voet and Voet, 1995), they form a large and functionally diverse superfamily. On the basis of the ligand types they bind, G-protein-coupled receptors are classified into at least six families. In this short communication we report that the amine-binding classes of the rhodopsin-like family of G-protein-coupled receptors are substantially correlated with their amino acid composition.
Materials and methods
According to the GPCRDB (Horn et al., 1998
It is instructive to analyze the sequence identity among the proteins within the same subset. The sequence identity percentage between two protein sequences is defined as follows. Suppose one sequence is $N_1$ residues long and the other $N_2$ residues long ($N_1 \ge N_2$), and the maximum number of residues matched by sliding one sequence along the other is $M$. The sequence identity percentage between the two sequences is then $(M/N_1) \times 100\%$. Gaps are treated according to Thompson et al. (Thompson et al., 1994
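The identity measure defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: one sequence is slid along the other without gaps (the gap treatment of Thompson et al. is not reproduced), partial overhangs are allowed at either end, and $N_1$ is taken to be the length of the longer sequence. The function name `sliding_identity` is hypothetical.

```python
def sliding_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity 100 * M / N1 from ungapped sliding of one sequence along the other.

    Assumptions of this sketch: no gaps, overhangs allowed, and N1 is the
    length of the longer sequence (M is the best match count over all offsets).
    """
    if len(seq_a) >= len(seq_b):
        long_seq, short_seq = seq_a, seq_b
    else:
        long_seq, short_seq = seq_b, seq_a
    n1, n2 = len(long_seq), len(short_seq)
    if n2 == 0:
        return 0.0
    best = 0
    # offset = position in long_seq aligned with the first residue of short_seq
    for offset in range(-(n2 - 1), n1):
        matches = sum(
            1
            for i in range(n2)
            if 0 <= offset + i < n1 and short_seq[i] == long_seq[offset + i]
        )
        best = max(best, matches)
    return 100.0 * best / n1


print(sliding_identity("ACDEFG", "CDE"))  # 50.0
```

Here "CDE" matches three residues of "ACDEFG" at offset 1, so $M = 3$, $N_1 = 6$ and the identity is 50%.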
The amino acid composition of each of the 167 receptors is readily derived from its sequence. The covariant-discriminant algorithm (Chou and Elrod, 1999) was then used to analyze the 167 G-protein-coupled receptors on the basis of their amino acid compositions. The statistical analysis was performed with both the re-substitution test and the jackknife test.
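To make the two ingredients of the analysis concrete, the sketch below computes a normalized 20-component amino acid composition and classifies a vector with a covariant-discriminant-style rule: the squared Mahalanobis distance to each class mean plus the log-determinant of that class's covariance matrix, with the query assigned to the class of lowest score. This is our reading of the general idea, not a reproduction of Chou and Elrod (1999). Dropping the last composition component (the 20 fractions sum to 1, so the full covariance matrix is singular) and the small ridge `eps` added to the covariance diagonal are assumptions of this sketch, and all function names are hypothetical. Only the standard library is used.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes


def composition(seq: str) -> list[float]:
    """Normalized 20-component amino acid composition; non-standard letters are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def reduce_composition(x: list[float]) -> list[float]:
    """Drop the last component (assumption) so the class covariance can be non-singular."""
    return x[:-1]


def _mean_cov(vectors, eps):
    """Class mean and sample covariance (needs >= 2 members), plus eps on the diagonal."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            + (eps if i == j else 0.0) for j in range(d)] for i in range(d)]
    return mean, cov


def _solve_and_logdet(a, b):
    """Gaussian elimination with partial pivoting; returns (a^-1 b, ln|det a|)."""
    d = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    logdet = 0.0
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-300:
            raise ValueError("singular covariance matrix")
        m[col], m[piv] = m[piv], m[col]  # row swaps leave |det| unchanged
        logdet += math.log(abs(m[col][col]))
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x, logdet


def fit(vectors, labels, eps=1e-6):
    """Map each class label to its (mean, covariance)."""
    return {lab: _mean_cov([v for v, y in zip(vectors, labels) if y == lab], eps)
            for lab in sorted(set(labels))}


def predict(model, x):
    """Return the class minimizing squared Mahalanobis distance + ln det(covariance)."""
    best_lab, best_score = None, math.inf
    for lab, (mean, cov) in model.items():
        diff = [xi - mi for xi, mi in zip(x, mean)]
        sol, logdet = _solve_and_logdet(cov, diff)
        score = sum(di * si for di, si in zip(diff, sol)) + logdet
        if score < best_score:
            best_lab, best_score = lab, score
    return best_lab


# Toy 2-D demo (real inputs would be reduce_composition(composition(seq)) vectors).
toy_x = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
toy_y = ["a"] * 4 + ["b"] * 4
model = fit(toy_x, toy_y)
print(predict(model, [0.2, 0.3]), predict(model, [10.4, 10.9]))  # a b
```

For real receptors each class would need comfortably more members than the 19 retained dimensions for its covariance estimate to be well conditioned; the ridge term only papers over mild ill-conditioning.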
Results and discussion
Re-substitution test
The re-substitution test examines the self-consistency of an identification method. In the present study, the type of each G-protein-coupled receptor in the data set is identified in turn using rule parameters derived from that same data set, the so-called training data set. The success rates thus obtained for the 167 receptors in Table I are summarized in Table II, which shows an overall success rate of 100%, indicating perfect self-consistency. However, in the re-substitution test the rule parameters derived from the training data set include information from the very receptor that is later plugged back in as the query. This inevitably gives an optimistic estimate (an underestimate of the error rate), because the same receptors are used both to derive the rule parameters and to test them. Nevertheless, the re-substitution test is necessary because it reflects the self-consistency of an identification method, especially of its algorithmic part: an identification algorithm cannot be considered good if its self-consistency is poor. In other words, the re-substitution test is necessary but not sufficient for evaluating an identification method. As a complement, a cross-validation test on an independent testing data set is needed, because it reflects the effectiveness of the method in practical application. This is especially important for checking the validity of a training database, that is, whether it contains enough information to capture all the relevant features and so yield a high success rate in application.
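The re-substitution protocol can be written as a generic routine that fits a classifier on the full data set and then scores every member of that same set. The nearest-class-mean classifier included here is a hypothetical stand-in so that the example runs on its own; it is not the covariant-discriminant algorithm, and the names `resubstitution_rate`, `centroid_fit` and `centroid_predict` are illustrative.

```python
def resubstitution_rate(samples, labels, fit, predict):
    """Fit on the whole data set, then classify every sample of that same set."""
    model = fit(samples, labels)
    correct = sum(predict(model, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


# Hypothetical stand-in classifier (nearest class mean, squared Euclidean distance),
# included only so the example is self-contained; NOT the covariant-discriminant method.
def centroid_fit(samples, labels):
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(col) for col in zip(*xs)] for y, xs in groups.items()}


def centroid_predict(model, x):
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


samples = [[0, 0], [0, 1], [5, 5], [5, 6]]
labels = ["p", "p", "q", "q"]
print(resubstitution_rate(samples, labels, centroid_fit, centroid_predict))  # 1.0
```

Because every query has already contributed to the model it is tested against, a high rate from this routine says little on its own about performance on unseen receptors.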
Jackknife test
The independent data set test, the sub-sampling test and the jackknife test are the three methods commonly used for cross-validation in statistical prediction. Of these, the jackknife test is regarded as the most effective and objective [see Chou and Zhang (Chou and Zhang, 1995) for a comprehensive discussion, and Mardia et al. (Mardia et al., 1979) for the mathematical principle]. During jackknifing, each receptor in the data set is singled out in turn as the tested receptor, and all the rule parameters are calculated from the remaining receptors. In other words, the type of each receptor is identified by rule parameters derived from all the other receptors, excluding the one being identified. Both the training data set and the testing data set are thus open during jackknifing, and each receptor moves in turn from one to the other. The jackknife results for the 167 G-protein-coupled receptors are also given in Table II. As expected, the jackknife success rates are lower than the re-substitution rates. The decrease is more pronounced for small subsets, such as the acetylcholine and dopamine subsets, because the cluster-tolerant capacity (Chou, 1999) of small subsets is usually low; the information lost by jackknifing therefore has a greater impact on small subsets than on large ones. Nevertheless, the overall jackknife success rate for the 167 G-protein-coupled receptors is still as high as 83.23%. The success rate for identifying G-protein-coupled receptor types is expected to improve further if the training data for the small subsets are enlarged with newly found proteins that belong to the types these subsets define.
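A minimal sketch of the jackknife (leave-one-out) test follows. It is parameterized by `fit`/`predict` functions and demonstrated with a hypothetical nearest-class-mean stand-in on toy data, not with the covariant-discriminant algorithm. It also returns per-class success rates, which is where the disproportionate effect on small subsets described above would appear. All names are illustrative.

```python
from collections import defaultdict


def jackknife_rates(samples, labels, fit, predict):
    """Leave-one-out test: each sample is classified by a model fitted on all the others.

    Returns (overall success rate, {class label: success rate within that class}).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Refit without sample i, then classify sample i.
        model = fit(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        totals[y] += 1
        hits[y] += int(predict(model, x) == y)
    overall = sum(hits.values()) / len(samples)
    return overall, {y: hits[y] / totals[y] for y in totals}


# Hypothetical stand-in classifier (nearest class mean), used only to make the example
# runnable; it is NOT the covariant-discriminant algorithm.
def centroid_fit(samples, labels):
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(col) for col in zip(*xs)] for y, xs in groups.items()}


def centroid_predict(model, x):
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


samples = [[0, 0], [0, 1], [4, 4], [5, 5], [5, 6], [6, 5]]
labels = ["p", "p", "p", "q", "q", "q"]
overall, per_class = jackknife_rates(samples, labels, centroid_fit, centroid_predict)
print(overall, per_class)
```

If leaving a receptor out empties its class in the training set, the refitted model cannot predict that class at all, an extreme form of the information loss that makes small subsets more sensitive to jackknifing.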
Conclusion
If the receptors were distributed completely at random among the four possible subsets, the rate of correct identification by random assignment would be $1/4 = 25\%$. If the random assignment is instead weighted by subset size, the rate of correct identification would be

$$\left(\frac{31}{167}\right)^2 + \left(\frac{44}{167}\right)^2 + \left(\frac{38}{167}\right)^2 + \left(\frac{54}{167}\right)^2 \approx 26.02\%.$$

The rates of correct identification obtained from the amino acid composition in both the re-substitution and jackknife tests are therefore much higher than either the completely random or the weighted random rate, implying that the type of a G-protein-coupled receptor is substantially correlated with its amino acid composition. This suggests that G-protein-coupled receptor types can be predicted with considerable accuracy if a complete or quasi-complete training data set can be established for that purpose. Such a fast and accurate prediction method would speed up the identification of suitable G-protein-coupled receptors and so facilitate drug discovery for psychiatric diseases such as schizophrenia.
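The two baselines can be checked with exact rational arithmetic. The weighted rate is a sum of squared proportions because, if receptor types occur with frequencies $p_i = n_i/N$ and a random predictor guesses type $i$ with the same probability $p_i$, independently of the true type, the probability of a correct guess is $\sum_i p_i^2$. The subset sizes below are those given in the text; which size belongs to which receptor class is not needed for the calculation.

```python
from fractions import Fraction

sizes = [31, 44, 38, 54]  # subset sizes from the text
n = sum(sizes)            # 167 receptors in total

completely_random = Fraction(1, len(sizes))               # 1/4
weighted_random = sum(Fraction(s, n) ** 2 for s in sizes)  # sum of squared proportions

print(f"{float(completely_random):.2%}")  # 25.00%
print(f"{float(weighted_random):.2%}")    # 26.02%
```

Using `Fraction` keeps the sum exact (7257/27889) until the final rounding.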
Notes
1 To whom correspondence should be addressed.
References
Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48.
Chou,K.C. (1999) Biochem. Biophys. Res. Commun., 264, 216-224.
Chou,K.C. and Elrod,D.W. (1999) Protein Eng., 12, 107-118.
Chou,K.C. and Zhang,C.T. (1995) Crit. Rev. Biochem. Mol. Biol., 30, 275-349.
Gish,W. (1999) http://blast.wustl.edu/pub/nrdb/
Horn,F., Weare,J., Beukers,M.W., Hörsch,S., Bairoch,A., Chen,W., Edvardsen,Ø., Campagne,F. and Vriend,G. (1998) Nucleic Acids Res., 26, 277-281.
Schwartz,T.W. (1996) In Forman,J.C. and Johansen,T. (eds), Textbook of Receptor Pharmacology. CRC Press, Boca Raton, FL, pp. 65-84.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673-4680.
Voet,D. and Voet,J.G. (1995) Biochemistry, 2nd edn. John Wiley & Sons, New York, pp. 1276-1278.
Received March 5, 2002; accepted June 6, 2002.