PEDS Advance Access originally published online on July 12, 2006
Protein Engineering Design and Selection 2006 19(9):421-429; doi:10.1093/protein/gzl026
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Analyses of homo-oligomer interfaces of proteins from the complementarity of molecular surface, electrostatic potential and hydrophobicity

Yuko Tsuchiya1, Kengo Kinoshita2,3,4 and Haruki Nakamura1

1 Institute for Protein Research, Osaka University 3-2 Yamadaoka, Suita, Osaka 565-0871 2 Institute of Medical Science, University of Tokyo 4-6-1 Shirokanedai, Minatoku, Tokyo 108-8639 3 Structure and Function of Biomolecules, SORST, JST 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan

4To whom correspondence should be addressed. Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan E-mail: kino{at}ims.u-tokyo.ac.jp


    Abstract
To extract the general structural features of interacting protein pairs, the non-redundant homo-oligomer interfaces (393 interfaces) in the PDB were analyzed using the fine-grained molecular surface, electrostatic potentials and the hydrophobicity calculated as the solvation free energy using empirical parameters. For each property, statistical analyses of the degree of complementarity were carried out, and we developed a method to judge whether or not interfaces were shape-complementary, electrostatic-complementary and/or hydrophobic-complementary. To search for correlations between property complementarity and the structure of the interfaces, we first roughly classified all the interfaces into the following five groups according to the structure of the interface, and surveyed the correlation between the shape classification and the complementarity: cyclic-oligomer (69), twisted-dimer (27), dimer-parallel (14), dimer-perpendicular (109) and dimer-circular (174), where the number in parentheses is the number of interfaces in each group. As a result, we found new characteristic trends that are possible necessary conditions for the formation of homo-oligomer interfaces, especially from the viewpoint of electrostatic complementarity. In addition, we show that complementarity analyses can be used to discriminate the biological interface from the crystallographic interface in homo-oligomer proteins.

Keywords: complementary analysis/computational approach/homo-oligomer protein/protein three-dimensional structure/protein–protein interactions


    Introduction
Most biological processes occur through protein–protein interactions, and, therefore, some of the most important topics in molecular biology are to find interacting protein pairs and to construct protein interaction networks. For these purposes, high throughput experiments, such as yeast two-hybrid screens or affinity purification procedures, are carried out (Ito et al., 2001; Aebersold and Mann, 2003). Although these methods often generate comprehensive information, the ratio of false positives is not small (Sprinzak et al., 2003; Patil and Nakamura, 2005). On the other hand, low throughput but more reliable experiments, such as X-ray crystallography and NMR spectroscopy, are now producing vast amounts of information about the structures of protein complexes, owing to the recent progress of structural genomics projects (Todd et al., 2005). Thus, the structural information of protein complexes registered in the PDB (Berman et al., 2000) is becoming increasingly useful for constructing protein interaction networks (Sali et al., 2003; Aloy et al., 2004).

From the viewpoint of molecular interactions, protein–DNA interactions are also a major topic. Many aspects differ between protein–protein and protein–DNA interactions, but one of the most significant differences is the range of variety. Protein–DNA interaction sites show little variety when we ignore the precise differences in base recognition and regard the double-stranded DNA as a negatively charged rod with grooves. This is illustrated by the fact that even a simple approach, based on the complementarity of the electrostatic potential and the shape of the molecular surface, is sufficient to predict the DNA-binding sites on proteins with 80.0% or better accuracy (Tsuchiya et al., 2004). On the other hand, many types of protein–protein interactions have been observed (Larsen et al., 1998), and numerous studies have sought to detect differences between interacting and non-interacting sites in protein structures. However, partly owing to the variety of protein–protein interactions, common aspects that differentiate protein–protein interaction sites from non-interaction sites have not been found. For example, regarding the electrostatic interactions in protein–protein complexes, there have been many discussions on the contribution of salt-bridges and hydrophilic interactions (McCoy et al., 1997; Sheinerman et al., 2000). In addition, the differences in residue composition between interaction sites and non-interaction sites have been surveyed by several groups (Xu et al., 1997; Keskin et al., 1998, 2005; Glaser et al., 2001; Caffrey et al., 2004; Shanahan and Thornton, 2004), but the results are not always consistent among the different groups. Another topic is sequence conservation within interaction sites, namely whether the interaction site is more conserved than the other sites, which depends on the types of proteins (Lichtarge and Sowa, 2002; Caffrey et al., 2004).

These inconsistent observations can be partially resolved by classifying the interfaces into several groups according to the differences in the complexes, such as transient or permanent complexes (Jones and Thornton, 1996), domain–domain interfaces and protein–protein interfaces (that is, intra-molecular and inter-molecular interactions), homo-oligomers and hetero-oligomers (Goodsell and Olson, 2000), or the more comprehensive six types of classifications (Ofran and Rost, 2003). The classification of protein complexes is a simple idea, but it will provide some clues to enhance our understanding of protein–protein interactions.

These considerations prompted us, as a first step, to classify all the known complexes in the PDB and to analyze the interfaces to find rules for the formation of protein–protein interfaces. In this paper, we describe the development of a new method to analyze the complementarity of homo-oligomer interfaces, the classification of the interfaces according to their shape, and a search for correlations between complementarity and the classification. Goodsell and Olson (2000) have extensively studied and classified homo-oligomeric proteins, based on the structural symmetry of the protomers. In contrast, we focused on the local structures of the interaction sites in the classification step, and the analyses were done by focusing on the electrostatic, hydrophobic and shape complementarities of the molecular surfaces, which were considered as the major parameters to describe the physicochemical aspects of protein–protein interactions (Jones and Thornton, 1996; McCoy et al., 1997; Keskin et al., 1998; Sheinerman et al., 2000).


    Materials and methods
Dataset construction

We used 393 homo-interfaces selected from all the PDB entries as described below. These entries are assumed to make biological contacts (i.e. not crystal contacts), according to their annotations in the PDB, and the redundancies were eliminated by selecting one representative from each SCOP family.

In the first step, all the PDB entries with two or more chains in their coordinate files and with 2.5 Å or better resolution were selected from the February 2004 release of the PDB (8609 entries). Then, all pairs of chains with a nearest distance of 4.0 Å or less were considered as contacting chain pairs. For each pair of contacting chains, a sequence comparison was carried out with ALIGN (Pearson, 1994), and the pair was judged to be either a homo-pair or a hetero-pair with a sequence identity threshold of 85%. This generated 13021 homo-pairs and 5442 hetero-pairs. The homo-pairs were further classified according to the annotation in the SCOP database (Andreeva et al., 2004), and one representative pair with the highest resolution was selected for each SCOP family. As a result, 867 homo-protein pairs were obtained. The homo-pairs were classified into biological pairs or crystallographic pairs, according to the annotation appearing in the PDB (REMARK 350): 467 were judged as biological pairs and 270 were considered to be crystallographic contacts. The other 130 pairs were removed from the following analyses, because there was no annotation available in the PDB. All the 467 entries are shown in Supplementary Table S1 available at PEDS online. We further examined the 467 entries visually and excluded from the statistics 63 entries that lacked symmetry in the whole structure or in the arrangement of protomers, because there were discrepancies between the annotations in the original papers and those in the PDB descriptions. For the 737 (= 467 + 270) entries, we generated molecular surfaces for each protein chain in the interfaces with the program msroll (Connolly, 1983) with default parameters, where the molecular surface is represented as a set of triangle meshes and the vertices of every mesh. The molecular surface generated with msroll is very fine grained, with 1.4 vertices per 1.0 Å² on average in our calculation, and we used all the vertices as they are.
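
The chain-contact criterion described above can be sketched as follows. This is only an illustration and not the code used in the study: the function names and the input format (a dictionary mapping chain IDs to lists of atom coordinates) are assumptions, and only the 4.0 Å cutoff is taken from the text.

```python
import math
from itertools import combinations

CONTACT_CUTOFF = 4.0  # Å; nearest-distance threshold quoted in the text


def chains_in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF):
    """Return True if any atom pair across the two chains lies within `cutoff`."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def contacting_pairs(chains, cutoff=CONTACT_CUTOFF):
    """List chain-ID pairs in contact; `chains` maps chain ID -> atom coordinates."""
    return [
        (id_a, id_b)
        for id_a, id_b in combinations(sorted(chains), 2)
        if chains_in_contact(chains[id_a], chains[id_b], cutoff)
    ]


# Toy coordinates (hypothetical, in Å): only A and B approach within 4.0 Å.
toy_chains = {
    "A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "B": [(4.5, 0.0, 0.0)],
    "C": [(20.0, 0.0, 0.0)],
}
print(contacting_pairs(toy_chains))
```

The all-against-all loop is quadratic in the number of atoms; for real PDB entries a spatial grid or neighbour search would normally replace it.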

Multiple contacting interfaces are very rare, but they sometimes cause trouble in defining the shape of the interface as described later. For instance, when an interface consists of two circular patches that are apart from each other, it can be recognized as a single, long, stretched interface under the following definition, with certain values of the adjustable parameters. To keep such problematic interfaces out of the classification, we identified multiple contacting interfaces according to the distribution of the distances between the vertices in the interfaces. When two or more peaks exist in the distribution and the maximum distance between the vertices in the homo-interface exceeds 50.0 Å, the interface was judged to consist of multiple contacts (see Supplementary Figure S1 available at PEDS online for details). According to this definition, we found 11 interfaces consisting of multiple parts. They were not used in the analysis, but they are included in Supplementary Table S1 available at PEDS online.

Finally, 393 homo-interfaces with biological contact were used for the classification and the following analyses, and 737 (=867 – 130) interfaces with PDB annotations about the crystallographic interface were used to determine the threshold values of the complementarity as described below.

Definition of a homo-interface

For each chain in the interface, we generated a molecular surface as described above. We define corresponding vertices as pairs of vertices that belong to different molecular surfaces and are separated by <1.0 Å. The homo-interface is considered as the set of the corresponding vertices (see Supplementary Figure S2 available at PEDS online).
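
A minimal sketch of this definition is given below, assuming each molecular surface is available as a plain list of vertex coordinates in Å. The function name and the brute-force search are illustrative assumptions; only the <1.0 Å criterion comes from the text.

```python
import math


def corresponding_vertices(surface_a, surface_b, cutoff=1.0):
    """Return index pairs (i, j) of vertices from different surfaces closer than `cutoff` Å."""
    return [
        (i, j)
        for i, va in enumerate(surface_a)
        for j, vb in enumerate(surface_b)
        if math.dist(va, vb) < cutoff
    ]


# Toy surfaces (hypothetical): only the first vertex of each surface is close enough.
surf_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
surf_b = [(0.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(corresponding_vertices(surf_a, surf_b))
```

With roughly 1.4 vertices per Å², real surfaces contain many thousands of vertices, so a cell list or k-d tree would be the practical choice over this double loop.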

Calculation of electrostatic potential, hydrophobicity and shape descriptor

The electrostatic potential, hydrophobicity and curvatures (shape descriptor) at each vertex in the homo-interface were used to analyze the complementarities of the interface. A hydrogen bond, which is one of the most important factors in protein–protein interactions, is not examined directly here but is implicitly treated in the electrostatic potential. There are not many interfaces that are mainly made from inter-molecular hydrogen bonds, as sometimes seen in inter-protomer beta-sheets, and, thus, direct treatment of hydrogen bonds would not change our results much.

The electrostatic potential was taken from the eF-site database (Kinoshita and Nakamura, 2004), and the hydrophobicity indexes were evaluated by using the empirical relationship between the hydration free energy and the solvent accessible surface area (Richmond, 1984). Ooi-Oobatake's parameters were employed to obtain the hydration free energy (Ooi et al., 1987). To describe the shapes of the molecular surface, a curvature at each vertex was calculated. For each vertex, a set of vertices around the vertex in focus was first listed. Then, a curvature was calculated by considering a normal cross section and by fitting the cross section with a quadratic curve by least squares fitting. Because the cross section can be rotated freely around the normal vector at the vertex under consideration, a set of curvatures was obtained for each vertex. Here we considered 36 cross sections, corresponding to 5.0 degree intervals, with an 8.0 Å distance threshold. In order to reduce the number of descriptors, we took separate averages of the negative curvatures and the positive curvatures among the 36 curvatures, so that we had one positive and one negative curvature for each vertex. The positive and the negative curvatures are considered to reflect the concave and convex characteristics, respectively. The details of the method are described in Tsuchiya et al. (2004).
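
The averaging of signed curvatures can be illustrated with a simplified sketch. It assumes that each cross section has already been expressed as (t, h) samples, where t is the in-plane distance from the vertex and h is the height along the normal, and it fits the one-parameter curve h = a·t² rather than a general quadratic. Both the input format and this reduced fit are assumptions made for illustration; the actual procedure is the one described in Tsuchiya et al. (2004).

```python
def section_curvature(samples):
    """Least-squares fit of h = a * t**2 to (t, h) samples; the curvature at t = 0 is 2a.

    Assumes at least one sample has t != 0.
    """
    num = sum(t * t * h for t, h in samples)
    den = sum(t ** 4 for t, _ in samples)
    return 2.0 * num / den


def mean_signed_curvatures(sections):
    """Average the positive and the negative section curvatures separately."""
    kappas = [section_curvature(s) for s in sections]
    pos = [k for k in kappas if k > 0]
    neg = [k for k in kappas if k < 0]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(pos), mean(neg)


# Toy saddle point (hypothetical): one section bends up, the other bends down.
saddle = [
    [(-1.0, 0.5), (1.0, 0.5)],
    [(-1.0, -0.25), (1.0, -0.25)],
]
print(mean_signed_curvatures(saddle))
```

In the setting described above there would be 36 such sections per vertex, each restricted to points within 8.0 Å.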

Complementarities of electrostatic potential, hydrophobicity and shape descriptor

For the indexes mentioned above, we defined the complementarities as follows. For the electrostatic potentials, pairs of corresponding vertices with opposite signs of electrostatic potential are considered as pairs of complementary vertices. However, the sign of the electrostatic potential around 0.0 V easily changes owing to small differences in the environment, which can result in accidental complementary vertices. Therefore, to reduce this spurious complementarity, we focused on the vertices belonging to the continuous regions, which are defined as follows: for all vertices in the continuous regions, the average electrostatic potential within the range (r, r + 1.0 Å) should have the same sign, where r is the distance from the considered vertex (see Supplementary Figure S3 available at PEDS online). The range up to r = 12 Å was used in the analysis, because the average distance over which the potential keeps the same sign (persistent distance) among all the proteins in our dataset was 11.7 Å.

For the hydrophobicity indexes, the corresponding vertices with a positive sign of hydration free energy are defined as complementary vertices. In the same way as the electrostatic potentials, we used the vertices in the continuous regions, and we employed 3 Å as the threshold of continuity, since the average persistent distance was 2.4 Å (see Supplementary Figure S3 available at PEDS online).

For the shape complementarity, the corresponding vertices with different signs and absolute values of more than 0.1 for both curvatures are considered as complementary vertices. In the case of curvature, we did not consider the continuous region, because the continuity was implicitly evaluated in the calculation of the curvature itself.

Thus, for all properties and all interfaces, the numbers of complementary vertices were counted. To compare different interfaces regardless of their size, the numbers were converted into ratios by dividing them by the number of corresponding vertices in the interface. When a ratio exceeded the median ratio of the 737 interfaces for a given property, the interface was judged to have complementarity in that property. The median values of the ratios were 4.8, 1.6 and 8.8% for hydrophobicity, electrostatic potential and shape, respectively.
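
The judgement above amounts to a ratio followed by a comparison against a median, as sketched here. The function names and the reference ratios are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
from statistics import median


def complementarity_ratio(n_complementary, n_corresponding):
    """Fraction of corresponding vertices that are complementary for one property."""
    return n_complementary / n_corresponding


def has_complementarity(ratio, reference_ratios):
    """An interface is complementary if its ratio exceeds the median of the reference set."""
    return ratio > median(reference_ratios)


# Hypothetical reference ratios standing in for the 737 annotated interfaces.
reference = [0.01, 0.02, 0.05, 0.10, 0.20]
ratio = complementarity_ratio(30, 400)  # 30 complementary out of 400 vertices
print(ratio, has_complementarity(ratio, reference))
```

The comparison is strict, following the wording "exceeded" in the text; an interface sitting exactly at the median is not counted as complementary in this sketch.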

Classification of homo-interfaces

The shape of the homo-interfaces was classified according to the following four steps to search for relations between the complementarity of each property and the shape of the interface. Here we describe the details of the classification scheme, and the philosophy of the classification is mentioned in the Results and discussion section.

Step 1: Same or different surface

A sequence alignment was constructed for each pair of interacting chains, using the program ALIGN (Pearson, 1994), and the number of residues that belong to the homo-interface and are aligned to each other was counted. This number is considered as the number of the same residues participating in the homo-interface. Then, to normalize for protein size, the number was converted to a ratio by dividing it by the number of residues appearing on the interface in the larger of the interacting chains. A ratio of 1.0 means that both protomers use exactly the same residues in the interface, whereas a ratio of 0.0 means that they share none. When a homo-interface has a ratio of more than 0.6, it is considered to use the same part of each protomer to make the interface. The distribution of the ratio is shown in Figure 1A.
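
Step 1 can be sketched as below, assuming the alignment is given as a dictionary from residue indices in chain A to residue indices in chain B and the interface residues of each chain as sets. These data structures, the function name and the reading of "larger chain" as the chain with more residues are assumptions for illustration.

```python
def same_surface_ratio(iface_a, iface_b, alignment, len_a, len_b):
    """Ratio of interface residues in A whose aligned partner in B is also in the interface.

    The denominator is the interface residue count of the longer chain.
    """
    shared = sum(1 for i in iface_a if alignment.get(i) in iface_b)
    denom = len(iface_a) if len_a >= len_b else len(iface_b)
    return shared / denom


# Toy example (hypothetical): identical 10-residue chains aligned position by position.
identity = {i: i for i in range(1, 11)}
ratio_same = same_surface_ratio({1, 2, 3, 4, 5}, {1, 2, 3, 4, 10}, identity, 10, 10)
ratio_diff = same_surface_ratio({1, 2, 3, 4, 5}, {5, 6, 7, 8, 9}, identity, 10, 10)
print(ratio_same > 0.6, ratio_diff > 0.6)
```

With the 0.6 threshold from the text, the first toy interface would count as using the same surface and the second as using different surfaces.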


Fig. 1. (A) The distribution of the ratio of the same residues in the homo-interfaces. (B) The distribution of the minimum value of the ratio between the area on the back side and that on the front side. The inset graph is a close-up view of the same graph. (C) The distribution of the ratio between the 2nd and 3rd moments of inertia of the interfaces. (D) The distribution of the angle between the 2-fold axis and the principal axis of the 3rd (smallest) moment of inertia.

 
Step 2: Twisted or symmetrical

The twisted interfaces were defined as the interfaces that have two contacting areas on opposite sides of each protomer. This was judged as follows. First, we defined the area of interaction viewed from a certain direction, Afront(θ,φ), as the sum of the areas of the triangles that belong to the interface and are located on the side facing the viewing direction (Figure 1B). In the same way, we defined the area from the opposite view, Aback(θ,φ), as the area of interaction located on the opposite side of the viewing direction. If the minimum value of Aback(θ,φ)/Afront(θ,φ) is almost 1.0, the interface is located on the viewing side and the back side at the same time. On the other hand, if the ratio is almost 0.0, there is no interacting site behind the observed direction. We assign the minimum ratio, that is, min[Aback(θ,φ)/Afront(θ,φ)], to a protomer. If the value exceeds 0.15, the interface is a twisted interface. In the search for the direction giving the minimum, 36 x 18 possibilities for the azimuth (φ) and the zenith angle (θ), i.e. 10-degree intervals, around the center of gravity of the protomer were considered. The distribution of the ratio is shown in Figure 1B.
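
The directional scan of Step 2 might be sketched as follows. The sketch assumes that each interface triangle is summarized by its centroid and area, and that a triangle counts as "front" when its centroid lies in the half-space pointing along the viewing direction from the protomer's center of gravity. This half-space reading and all names are assumptions; only the 10-degree grid (36 x 18 directions) and the 0.15 threshold are taken from the text.

```python
import math


def min_back_front_ratio(triangles, center, step_deg=10):
    """Scan viewing directions and return the smallest A_back / A_front.

    `triangles` is a list of (centroid, area) pairs; `center` is the protomer's
    center of gravity. Directions with zero front area are skipped.
    """
    best = math.inf
    for i in range(360 // step_deg):          # azimuth phi: 36 values
        phi = math.radians(i * step_deg)
        for k in range(180 // step_deg):      # zenith theta: 18 values
            theta = math.radians(k * step_deg)
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            front = back = 0.0
            for centroid, area in triangles:
                side = sum((centroid[n] - center[n]) * d[n] for n in range(3))
                if side > 0:
                    front += area
                else:
                    back += area
            if front > 0:
                best = min(best, back / front)
    return best


origin = (0.0, 0.0, 0.0)
# Toy patches (hypothetical): one-sided contact versus contacts on opposite sides.
one_sided = [((5.0, 0.0, 0.0), 1.0), ((5.0, 1.0, 0.0), 1.0), ((5.0, 0.0, 1.0), 1.0)]
two_sided = [((5.0, 0.0, 0.0), 1.0), ((-5.0, 0.0, 0.0), 1.0)]
print(min_back_front_ratio(one_sided, origin), min_back_front_ratio(two_sided, origin))
```

Under these assumptions the one-sided patch gives a minimum ratio near 0 (not twisted), while the two opposite patches can never be placed on the same side of any plane through the center, so the ratio stays well above 0.15.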

Step 3: Circular or elliptic

To classify the shape of the symmetrical dimer interface, we used the moment of inertia of the interface, which was calculated from the coordinates of the vertices belonging to the interface. For each interface, we used the three principal moments, which were designated as I1, I2 and I3 (I1 > I2 > I3). For the ratio of I3 to I2, a value of 1.0 corresponds to a circular interface, and a small value indicates a long and narrow interface (elliptic interface) (Figure 1C). We used 0.4 as the threshold to judge the circularity of the interface. The distribution of the ratio is shown in Figure 1C.
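
As a rough illustration of Step 3, the principal moments can be computed from the interface vertices treated as unit point masses. The closed-form eigenvalue routine for a symmetric 3 x 3 matrix used here is a standard trigonometric method chosen to keep the sketch dependency-free; it is not necessarily how the authors computed the moments, and the function names are assumptions.

```python
import math


def principal_moments(points):
    """Principal moments of inertia (I1 >= I2 >= I3) of unit-mass points about their centroid."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    q = [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]
    # Entries of the symmetric inertia tensor.
    a00 = sum(y * y + z * z for x, y, z in q)
    a11 = sum(x * x + z * z for x, y, z in q)
    a22 = sum(x * x + y * y for x, y, z in q)
    a01 = -sum(x * y for x, y, z in q)
    a02 = -sum(x * z for x, y, z in q)
    a12 = -sum(y * z for x, y, z in q)
    p1 = a01 * a01 + a02 * a02 + a12 * a12
    if p1 == 0.0:  # tensor is already diagonal
        return tuple(sorted((a00, a11, a22), reverse=True))
    m = (a00 + a11 + a22) / 3.0
    p2 = (a00 - m) ** 2 + (a11 - m) ** 2 + (a22 - m) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b00, b11, b22 = (a00 - m) / p, (a11 - m) / p, (a22 - m) / p
    b01, b02, b12 = a01 / p, a02 / p, a12 / p
    det = (b00 * (b11 * b22 - b12 * b12)
           - b01 * (b01 * b22 - b12 * b02)
           + b02 * (b01 * b12 - b11 * b02))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    i1 = m + 2.0 * p * math.cos(phi)
    i3 = m + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return i1, 3.0 * m - i1 - i3, i3


def is_circular(points, threshold=0.4):
    """Circular if I3 / I2 exceeds the threshold (0.4 in the text)."""
    _, i2, i3 = principal_moments(points)
    return i3 / i2 > threshold


# Toy interfaces (hypothetical): a square patch and an elongated patch rotated by 45 degrees.
s = 1.0 / math.sqrt(2.0)
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
elongated = [(3 * s, 3 * s, 0.0), (-3 * s, -3 * s, 0.0), (-s, s, 0.0), (s, -s, 0.0)]
print(is_circular(square), is_circular(elongated))
```

For the elongated toy patch the moments are 20, 18 and 2, so I3/I2 is about 0.11 and the patch is classified as elliptic.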

Step 4: Parallel or perpendicular

An elliptic dimer-interface will have a 2-fold axis. The 2-fold axis for each homo-interface was calculated from the rotation matrix obtained by superimposing one surface of the interface on the other (Kinoshita et al., 1999), according to the alignment obtained in the first step. Then, the angle between the 2-fold axis and the principal axis of the minimum moment of inertia (I3), which is expected to correspond to the most elongated direction of the interface, was calculated. Finally, the interfaces with angles of 90 ± 45° and 0 ± 45° were judged as perpendicular and parallel, respectively. The distribution of the angle is shown in Figure 1D.
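
The final decision of Step 4 reduces to an angle between two undirected axes, as in the sketch below. The axes are assumed to be given as 3-vectors, and the handling of an angle of exactly 45° (assigned here to dimer-perpendicular) is an assumption, since the text does not specify it.

```python
import math


def axis_angle_deg(u, v):
    """Angle between two undirected axes, folded into the range [0, 90] degrees."""
    dot = abs(sum(a * b for a, b in zip(u, v)))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(min(1.0, dot / norm)))


def classify_elliptic(two_fold_axis, long_axis):
    """Label an elliptic interface by the angle between its 2-fold axis and its long axis."""
    if axis_angle_deg(two_fold_axis, long_axis) < 45.0:
        return "dimer-parallel"
    return "dimer-perpendicular"


print(classify_elliptic((0.0, 0.0, 1.0), (0.0, 0.1, 1.0)))
print(classify_elliptic((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))
```

Taking the absolute value of the dot product makes the result independent of the arbitrary sign of each axis.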


    Results and discussion
Classification of homo-interfaces

To find some rules governing the protein–protein interactions in homo-oligomeric proteins, we first classified the 393 non-redundant homo-interfaces. The results of the classification are summarized in Figure 2. The two numbers in parentheses outside the boxes are the numbers of interfaces belonging to each group, one for the interfaces coming from dimeric proteins (no. with d: in Figure 2) and the other for those from oligomeric proteins (no. with o: in Figure 2), where we refer to proteins consisting of just two protomers as dimeric proteins and those made from three or more protomers as oligomeric. As seen in the figure (d:2 of cyclic-oligomer), there are two dimeric proteins that use different surfaces to make complexes, that is, 1c94A-B (GCN4 leucine zipper) and 1ln0A-B (catalytic domain of homing endonuclease I-TevI) (the first four numbers and letters are the PDB-ID, followed by a pair of chain IDs that form the homo-interface). However, according to the descriptions in the original references for the entries (Mittl et al., 2000; van Roey et al., 2002), the oligomerization states that appeared in the PDB for both entries may be wrong: the former interface actually comes from a tetramer and the latter from a crystal contact. Therefore, all the interfaces using different parts of the protomers should be considered as interfaces coming from oligomeric proteins. Thus, we call a homo-interface that uses different parts of the molecular surfaces a cyclic-oligomer interface, and one that uses the same surface a dimer interface, even when it is taken from an oligomeric protein. From a historical viewpoint, this definition of the oligomer interfaces follows the intuitive definition of Monod et al. (1965), who referred to dimer and oligomer interfaces as isologous and heterologous interfaces, respectively. However, these names were coined in the pre-structure era and are not commonly used now, and, thus, we use different names in this paper.


Fig. 2. Flowchart of the classification scheme and summary of the classification. All of the homo-interfaces were classified by applying the following criteria sequentially: (i) for every homo-interface, it was judged whether or not the interface is made from the same part of each protomer, (ii) if the interface is derived from the same part of the molecule, it was checked whether or not the interface is twisted, (iii) when the interface was not twisted, the circularity of the interface was evaluated, (iv) when the surface had some direction of spread (i.e. not circular), the interface was classified according to the direction of the spread relative to the 2-fold symmetry axis of the interface. The number in parentheses just after the name of the classification group is the number of interfaces belonging to the group, and the number is divided into that from dimeric proteins (the number after ‘d:’) and that from oligomeric proteins (the number after ‘o:’). The average values of the contact areas (Å²) for each structural group are also shown, as discussed in the text. The pie chart indicates the relative frequency of the number of entries in each group.

 
The dimer interfaces were further classified into twisted-dimer interfaces and symmetrical-dimer interfaces, because these two types are expected to have different trends of interactions (Larsen et al., 1998). This is partly supported by the observation that the average area of the contacting surface is much larger in a twisted-dimer interface (1609.5 Å²) than in a symmetrical-dimer interface (775.8 Å²). Other differences between symmetrical-dimers and twisted-dimers will be described later. We identified 27 twisted-dimer interfaces and 297 symmetrical-dimer interfaces (Figure 2). Among the 27 twisted-dimer interfaces, 22 came from dimeric proteins, and only five interfaces were from oligomeric proteins [1a92A-C (4mer, oligomerization domain of hepatitis delta antigen), 1ekjF-E (8mer, Pisum sativum beta-carbonic anhydrase), 1fr3G-H (6mer, molybdate binding protein), 1jkvA-F (6mer, manganese catalase) and 1k1fC-D (4mer, oligomerization domain of Bcr-Abl oncoprotein)]. An oligomeric protein has two or more interfaces, and the twisted interface is usually the tight interface among them. In other words, twisted-dimer interfaces are used to make tight dimers, and the twisted interfaces are highly complementary as compared with the other interfaces, as shown later through the complementarity analyses.

In the following step, the symmetrical-dimer interfaces were classified according to the circular quality of the interface. When there was no special direction in the spread of the interfaces (i.e. the shape of the interface was almost circular), the interface was classified as a dimer-circular interface (the precise definition, based on the calculation of the moment of inertia of the interface, is given in the Materials and methods section). As shown in Figure 2, this group was the biggest (174/393 = 44.3%).

The remaining interfaces, elliptic (or non-circular) interfaces, should have particular directions of the spread and, at the same time, they have 2-fold axes, thus the directions of the spread can be classified according to the direction relative to the 2-fold axis of each interface. In the current study, our attention was focused on the interacting part of the molecular surfaces (not the entire fold), and, thus, the direction of the spread should be either parallel or perpendicular to the 2-fold axis, if the 2-fold symmetry is perfect. Thus, the interfaces should be classified into the dimer-parallel and dimer-perpendicular types, according to the angle between the principal axis of the smallest moment of inertia and the 2-fold axis. However, in the actual binding sites, the 2-fold axis is not perfect, and the angles between the 2-fold axis and the direction of the spread are distributed between parallel and perpendicular as seen in the distribution of the angles (Figure 1D).

As shown in Figure 2 and Supplementary Table S1 available at PEDS online, dimer-parallel interfaces were rarely observed (14/393 = 3.6%). In particular, only four dimer-parallel interfaces were taken from oligomeric proteins, that is, 1ik9A-B (3mer, DNA repair protein XRCC4), 1jthC-A (4mer, N-terminal region of SNAP25), 1c3cA-B (4mer, adenylosuccinate lyase) and 1gkpE-F (4mer, dihydropyrimidinase). The first example, 1ik9A-B, forms a trimer with a protomer (chain C: DNA ligase IV) that differs from A and B (DNA repair protein). Thus, 1ik9A-B can be regarded as a kind of dimer within an oligomer. The interface 1jthC-A is made from coiled-coil helices, and the contact area is quite small (29.7 Å²). Both of them can be considered as exceptional cases. According to the visual inspection of the 1c3cA-B interface, it consists of two parts: the main part from the C-terminal region, and three residues (Trp298, Glu293, Asp289) at intervals, and the main part seems to have a perpendicular type of interface. Therefore, the interface could be classified as dimer-perpendicular, although the automatic classification assigned it as dimer-parallel. The remaining interface, 1gkpE-F, comes from a tetramer of the dimer-of-dimers type. The contacting area of the interface in 1gkpE-F is 512.4 Å², which is larger than that of another interface in this protein, 1gkpC-E (380.7 Å², dimer-perpendicular). This may imply that this protein first forms the dimer using the dimer-parallel interface, and then forms the tetramer. In other words, the tetramer can be regarded as a special form of dimeric protein with a dimer-perpendicular interface in 1gkpC-E.

Complementarity of the interface

The complementarity of the protein–protein interface is one of the most important viewpoints in the analyses of protein interactions. There are many kinds of properties for which the complementarity can be considered. For example, hydrophobic and electrostatic interactions may play central roles in oligomer formation. In addition, shape complementarity is also well recognized as a necessary condition for oligomer formation (Lawrence and Colman, 1993). Thus, we decided to examine the complementarity of three properties, that is, the hydrophobicity, electrostatic potential and shape. Each interface was classified into one of eight property-categories according to the combination of having or lacking complementarity in the three properties, which were represented with three-dimensional binary vectors, that is, [111], [110], ... , [100] and [000], where the binary values 1 and 0 correspond to having and lacking the complementary interface, respectively, for the properties of hydrophobicity, electrostatic potential and shape, in this order. For example, an interface with complementarity in hydrophobicity and shape, but not in electrostatic potential, is represented as [101].
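
The encoding of the eight property-categories can be written compactly, as sketched here. The function name is hypothetical; the ordering (hydrophobicity, electrostatic potential, shape) follows the text.

```python
from itertools import product


def category_label(hydrophobic, electrostatic, shape):
    """Encode three complementarity flags as a label such as '[101]'."""
    flags = (hydrophobic, electrostatic, shape)
    return "[" + "".join(str(int(bool(f))) for f in flags) + "]"


# All eight property-categories, from [000] to [111].
all_categories = [category_label(*bits) for bits in product((0, 1), repeat=3)]
print(category_label(True, False, True), all_categories)
```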

In Table I, the numbers of entries for each property-category and structural classification are shown, where we denote the number for the i-th category (i = 1, ..., 8) and the j-th structural classification (j = 1, ..., 5) as $n_{i,j}$. In order to compare the distributions among the different structural classifications, they were converted into the relative frequencies over the categories, $f_{i,j} = n_{i,j} / \sum_{i'} n_{i',j}$, and then were further converted to the ratio of the relative frequency against the sum over all structural groups, $r_{i,j} = f_{i,j} / \sum_{j'} f_{i,j'}$, as plotted in the bar graph in Figure 3. In addition, the numbers of entries with each property for each structural group are summarized in Table II, and the ratios to the total number of each structural group are plotted in Figure 4. As clearly seen in Figure 3, three unique distributions can be observed: (i) higher frequency of [111] for twisted-dimer, (ii) high peak of [101] in dimer-parallel and (iii) preferable usage of [010] in cyclic-oligomer interfaces.
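
The two-step normalization described above (relative frequency within each structural group, then the share of each group within a category) can be sketched as follows. The reading of the two formulas is inferred from the surrounding wording and the Figure 3 caption, and the counts in the example are hypothetical, not the values of Table I.

```python
def normalised_ratios(counts):
    """counts[i][j] is the number of entries in property-category i and structural group j.

    Step 1: divide each entry by its column (structural group) total.
    Step 2: divide each resulting frequency by its row (category) sum.
    """
    n_cat, n_grp = len(counts), len(counts[0])
    col_total = [sum(counts[i][j] for i in range(n_cat)) for j in range(n_grp)]
    freq = [[counts[i][j] / col_total[j] for j in range(n_grp)] for i in range(n_cat)]
    ratios = []
    for row in freq:
        row_sum = sum(row)
        ratios.append([f / row_sum if row_sum else 0.0 for f in row])
    return ratios


# Hypothetical 2-category x 2-group table, for illustration only.
toy_counts = [[2, 1],
              [2, 3]]
for row in normalised_ratios(toy_counts):
    print([round(x, 3) for x in row])
```

Normalizing by the group total first prevents large groups such as dimer-circular from dominating every category simply because they contain more entries.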


Table I. The number of entries for all combinations of property categories and structural groups

 

Fig. 3. The ratios between the relative frequencies of the property-categories and those of the summation over the structural groups.

 

Table II. The number of entries for all combinations of properties and structural groups

 

Fig. 4. Relative frequency of the number of entries of which the interface is complementary in each property against the structural types.

 
Category [111] and the twisted-dimer interface

About half of the twisted-dimer interface entries belong to the category [111] (14/27 = 52%). This ratio is much higher than the relative frequency of [111] among all entries (102/393 = 26%, see Table I). At the same time, no twisted-dimer interface belongs to a category with two or more ‘0’ values (Figure 3), which indicates that every twisted-dimer interface is complementary in at least two properties simultaneously. In the category [111], all three properties are complementary above the thresholds, and an interface belonging to this category can thus be regarded as a well-designed interface. In addition, the number of entries in the twisted-dimer group is relatively small compared with the dimer-perpendicular and dimer-circular groups, which suggests that twisted-dimer interfaces are not easily formed. Such dimer interfaces could therefore provide functionally important sites in the twisted-dimers.

We therefore surveyed the locations of the functional sites of the 27 twisted-dimer entries, based on the original paper for each PDB entry. Here, we regarded the residues annotated as active sites or DNA-binding sites as the functional site of a protein. Among the 27 entries, five [1a92A-C, 1gk6A-B (vimentin coil 2B fragment), 1ic2A-B (tropomyosin molecule), 1no4A-B (pre-assembly scaffolding protein gp7) and 1nknA-B (N-terminal segment of the scallop myosin rod)] were coiled-coil proteins, which were not assumed to have functional sites; three [1ihrA-B (dimeric C-terminal domain of TonB), 1k1fC-D, 1l5bA-B (domain-swapped cyanovirin-N dimer)] lacked annotations about their functions; three [1b8zA-B (DNA binding protein Hu), 1fr3G-H, 1o94A-B (trimethylamine dehydrogenase)] had no annotations on the positions of their functional sites; and 16 were annotated with functional-site residues. For these 16 entries, we examined whether the functional sites lie on the dimer interfaces, and found that all 16 form their functional sites at the interfaces. That is, dimer formation is essential for the functions of these proteins. These observations lead us to conclude that twisted-dimer interfaces show a strong tendency to be well designed and that their formation is essential for function. It is also interesting that, among the 27 entries, five complexes (1k1f, 1l5b, 1mv8, 1ihr and 1p5h) result from domain swapping.

It may be noteworthy that four of the five twisted-dimer interfaces that lack electrostatic complementarity (1a92A-C, 1gk6A-B, 1ic2A-B and 1no4A-B) are coiled-coil proteins, according to their classification in the SCOP database (Andreeva et al., 2004). Coiled-coil proteins usually have narrow interfaces around the 2-fold axes, and there is a distinct relationship between the electrostatic potential and the narrowness of the interface, as discussed in the next section.

Category [101] and the dimer-parallel interface

Nearly half of the dimer-parallel interface entries are classified into the [101] category (6/14 = 43%). The number of entries in this group is far smaller than in the other groups. At the same time, it may be noteworthy that only three dimer-parallel entries [1jp3A-B (undecaprenyl pyrophosphate synthase), 1d0cA-B (endothelial nitric oxide synthase heme domain) and 1ik9A-B] belong to the [*1*] category, that is, they are electrostatically complementary regardless of the other two properties (Table II). As the most illustrative of these three exceptions, the electrostatic molecular surface of 1jp3A-B is shown in Figure 5A.


Fig. 5. (A) One exceptional case belonging to the dimer-parallel type and having electrostatic complementarity (1jp3A-B). From the left: a ribbon model with the side-chain of Asp26 in ball-and-stick representation, a surface model with the binding interface in purple, and a surface model of 1jp3A colored according to the electrostatic potential and the hydrophobic side-chains. The electrostatic potential is represented by a color gradation from red to blue for vertices with potentials from –0.1 V to +0.1 V. The molecular surface of the hydrophobic side-chains is colored yellow. The figures were prepared with Molscript (Kraulis, 1991). The protein is oriented so that the 2-fold axis points in the same direction as in (B). (B) Schematic drawing of the distribution of electrostatic potential around the 2-fold axis that is required to achieve electrostatic complementarity. The symbols ‘+’ and ‘–’ represent positive and negative electrostatic potentials, respectively.

 
The rarity of dimer-parallel interfaces, and the observation that the proteins in this group do not favor electrostatically complementary interfaces, may be explained as follows. Owing to the 2-fold symmetry, electrostatic complementarity in a homo-interface requires that electrostatic potentials of opposite sign be placed across the 2-fold axis, as illustrated in Figure 5B. However, it is not favorable for a small region to maintain potentials of opposite sign, because a rapid change in the electrostatic potential on the surface would produce large electrostatic fields. Such fields would be an additional constraint to be compensated for upon dimer formation, and they might also destabilize each protomer. The dimer-perpendicular and dimer-circular interfaces have abundant space to maintain potentials of opposite sign across the 2-fold axis, but the dimer-parallel interfaces lack sufficient space to do so (Figure 5B).

Some exceptional cases may be due to strong requirements imposed by the biochemical function. For example, 1jp3A-B catalyzes the reaction that generates undecaprenyl pyrophosphate (UPP) from isopentenyl pyrophosphate (IPP). UPP and IPP contain phosphate moieties that carry negative charges. The active-site residue of this protein is Asp26, and it is surrounded by three highly conserved basic residues, which may serve to accommodate the negative charge of the phosphate group (Figure 5A). The active site is located in the vicinity of the interface; thus, to avoid a charge collision, the counterpart region of the other protomer has to be negatively charged or neutral.

Category [010] and the difference between cyclic-oligomers and dimers

According to our definitions of the cyclic-oligomer and dimer interfaces, the former uses two different parts of the molecular surface and the latter uses the same part on both protomers. Thus, if a dimer interface has electrostatic complementarity, it must satisfy the constraint that potentials of opposite sign are arranged symmetrically across the 2-fold axis (Figure 5B). This constraint forces the ratio of vertices with positive electrostatic potential on one side of the dimer interface to be around 0.5 (half positive and half negative). The cyclic-oligomer interfaces, on the other hand, are free from this constraint, and the ratio of positive vertices can therefore vary from 0.0 (fully negative) to 1.0 (fully positive). To examine this idea, we calculated the distribution of the ratio of the number of vertices with positive potential to the number of electrostatically complementary vertices in the 324 dimer interfaces and 69 cyclic-oligomer interfaces (see Figure 2 for the number of interfaces). Unexpectedly, we found that the cyclic-oligomer interfaces have clear peaks around 0.0 and 1.0 (Figure 6). This observation means that the electrostatically complementary surfaces on the cyclic-oligomer interfaces have either positive or negative electrostatic potential, and do not have both at the same time. Such a distribution is strictly forbidden for dimer interfaces, owing to the constraint of the 2-fold symmetry, and it thus represents a large difference between the cyclic-oligomer and dimer interfaces.
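The quantity examined here can be sketched in a few lines of Python. The sketch assumes that the potentials at the electrostatically complementary vertices on one side of an interface are already available as a plain list of numbers; the function names and the ten-bin histogram are illustrative choices, not details taken from the paper.

```python
def positive_ratio(potentials):
    """Fraction of complementary vertices whose potential is positive.

    `potentials` holds the electrostatic potential at each complementary
    vertex on one side of the interface (units are irrelevant here).
    """
    if not potentials:
        raise ValueError("no complementary vertices supplied")
    positive = sum(1 for v in potentials if v > 0)
    return positive / len(potentials)


def histogram(ratios, n_bins=10):
    """Count ratios in equal-width bins on [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * n_bins
    for r in ratios:
        index = min(int(r * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical vertex potentials, for illustration only.
    symmetric_dimer_like = [0.05, -0.02, 0.01, -0.03]
    one_signed = [0.10, 0.20]
    ratios = [positive_ratio(symmetric_dimer_like), positive_ratio(one_signed)]
    print(ratios)
    print(histogram(ratios))
```

A dimer interface that satisfies the 2-fold constraint would be expected to fall near the middle bins, whereas an interface with a single sign of potential falls in the first or last bin.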


Fig. 6. Distribution of the ratio of the number of vertices with positive electrostatic potential against the total number of complementary electrostatic vertices in the interface.

 
In addition, electrostatic complementarity is used preferentially in the cyclic-oligomer interfaces compared with the dimer interfaces, as seen in the higher ratio of the [010] category in Figure 3 and in the higher ratio of [*1*] to [*0*] (cyclic-oligomer:others = 46/23:175/149 = 1.70:1, calculated from Table I). Notably, no dimer-parallel interfaces are observed in the categories [010] and [011].
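The arithmetic behind the 1.70:1 figure can be checked with the short Python sketch below, which uses only the four counts quoted in the text; the variable and function names are illustrative.

```python
def odds(with_property, without_property):
    """Ratio of [*1*] entries to [*0*] entries within one group."""
    return with_property / without_property


# Counts quoted in the text (from Table I).
CYCLIC_WITH, CYCLIC_WITHOUT = 46, 23
DIMER_WITH, DIMER_WITHOUT = 175, 149

cyclic_odds = odds(CYCLIC_WITH, CYCLIC_WITHOUT)   # 46/23
dimer_odds = odds(DIMER_WITH, DIMER_WITHOUT)      # 175/149
relative_preference = cyclic_odds / dimer_odds

if __name__ == "__main__":
    print(f"cyclic-oligomer odds: {cyclic_odds:.2f}")
    print(f"dimer odds:           {dimer_odds:.2f}")
    print(f"relative preference:  {relative_preference:.2f} : 1")
```

The group totals implied by these counts, 46 + 23 = 69 and 175 + 149 = 324, agree with the numbers of cyclic-oligomer and dimer interfaces given earlier.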

The other categories

No outstanding peaks were observed in the other categories (Figure 3). In particular, the dimer-circular and dimer-perpendicular interfaces show no special category preference. The rarity of the [010] category among dimer-circular interfaces, with only four interfaces [1ifvA-B (pathogenesis-related protein LLPR10.1B), 1jnrA-C (adenylylsulfate reductase), 1mx0E-F (topoisomerase VI-B subunit) and 1o04G-H (mitochondrial aldehyde dehydrogenase)], may be noteworthy, but we could not find an interpretation for this observation.

Discrimination between biological-contact and crystal-contact by electrostatic potential, hydrophobic and shape complementarity

In this study, we relied on the annotations in the PDB to identify the biological contacts. However, some errors, or discrepancies between the primary citation of an entry and its annotation in the PDB, were observed. These annotation errors could cause some crystal-contact interfaces to be classified as ‘biological contacts’ in our dataset. Many such interfaces can be found in the category [000], where no complementarity was observed in any of the properties. However, some crystallographic interfaces can also be found in the [*1*] type of complementarity, and it is therefore not easy to eliminate all the crystal-contact interfaces from the dataset.

One possible approach to avoid such contamination is to use a database such as PQS (Henrick and Thornton, 1998), which eliminates the crystal contacts appearing in the PDB by considering the solvation free energies calculated from the change in accessible surface upon oligomer formation. The area of the contact region is a good indicator for discriminating between biological and crystal contacts (Ponstingl et al., 2000). Thus, when two possible interfaces are observed in a crystal and no biological information is available to judge which one is biologically relevant, the usual practice is to select the larger interface as the biological one. However, this does not hold in some cases, such as TRF2, a key protein component of vertebrate telomeres (PDB: 1h6p). This protein is known to act as a dimer, and its crystal structure contains two protomers in each asymmetric unit, whose interface has an area of 938.1 Å² and is considered to be the biological contact (Fairall et al., 2001). On the other hand, the protein has three other interfaces when crystallographic symmetry is considered, with areas of 1276.5, 602.0 and 212.1 Å², respectively. Therefore, under the guiding principle that the largest interface is the biological interface, and when no biological information is available, a different interface could be regarded as the biological interface, or the protein might be taken to be a tetramer.
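The failure of the largest-area rule in this example can be reproduced with a few lines of Python. The four areas are those quoted in the text for 1h6p; the dictionary keys are illustrative labels introduced here, not identifiers from the PDB entry.

```python
# Interface areas in square angstroms, as quoted in the text for 1h6p.
INTERFACES_1H6P = {
    "asymmetric-unit dimer (reported biological contact)": 938.1,
    "crystallographic contact 1": 1276.5,
    "crystallographic contact 2": 602.0,
    "crystallographic contact 3": 212.1,
}


def largest_interface(areas):
    """Apply the usual heuristic: pick the interface with the largest area."""
    return max(areas, key=areas.get)


if __name__ == "__main__":
    choice = largest_interface(INTERFACES_1H6P)
    print(f"largest-area heuristic selects: {choice}")
    print(f"area: {INTERFACES_1H6P[choice]} A^2")
```

With these numbers the heuristic selects a crystallographic contact rather than the interface reported as biological, which is the discrepancy the paragraph describes.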

To overcome these difficulties, we examined whether the statistics obtained above can discriminate between crystal and biological contacts. As an index of the degree of complementarity, we used the sum, over all properties, of the ratio of complementary pairs divided by the corresponding median value. For 1h6p, the index for the biological interface, 7.29, is larger than that for the interface with the largest contact area, 3.04, which is considered to be a crystallographic interface. Indices of this kind, in addition to the contact area, would be necessary for developing prediction methods that discriminate biological complexes from crystal ones, given the vast accumulation of complex structures in the PDB.
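A minimal Python sketch of such an index is shown below, assuming it is computed as the sum over the three properties of the complementary-pair ratio divided by the median ratio for that property. The function name, the property keys and all numerical values are hypothetical; they do not reproduce the 7.29 and 3.04 values reported for 1h6p.

```python
def complementarity_index(ratios, medians):
    """Sum over properties of (complementary-pair ratio / median ratio).

    Both arguments map a property name to a number; the medians are
    assumed to come from a reference dataset of interfaces.
    """
    if set(ratios) != set(medians):
        raise ValueError("ratios and medians must cover the same properties")
    return sum(ratios[name] / medians[name] for name in ratios)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    medians = {"hydrophobicity": 0.3, "electrostatic": 0.3, "shape": 0.4}
    interface_a = {"hydrophobicity": 0.6, "electrostatic": 0.3, "shape": 0.8}
    interface_b = {"hydrophobicity": 0.3, "electrostatic": 0.15, "shape": 0.4}
    print(complementarity_index(interface_a, medians))
    print(complementarity_index(interface_b, medians))
```

Dividing by the median puts the three properties on a comparable scale, so that no single property dominates the sum simply because its raw ratios tend to be larger.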

Web server to carry out the complementarity analysis and the classification

To allow newly determined structures to be analyzed with our scheme, a web server was set up at http://pre-s.protein.osaka-u.ac.jp/~classppi. The server accepts a PDB-format file or a PDB ID and returns a URL that shows the assigned structural group and property-categories, along with an interactive view of the binding interface that uses the pdbjviewer in standard web browsers (Kinoshita and Nakamura, 2004). The server should be useful for comparing the interaction type with those of known structures, and may be used to validate homo-oligomer interfaces from a statistical viewpoint. The interactive figures of all the entries used in this study can also be seen at http://pre-s.protein.osaka-u.ac.jp/~classppi/supplementary_table.html.


    Footnotes
 
Edited by Kosuke Morikawa


    Acknowledgements
We would like to acknowledge the members of the Protein Data Bank Japan (PDBj) for technical support. This work was supported by grants from Research Fellowships of the Japan Society for the Promotion of Science for Young Scientists to Y.T. K.K. was supported by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (No. 15710150), and by a Grant-in-Aid for Scientific Research on Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology of Japan (No. 17081003). H.N. was supported by a Grant-in-Aid for Scientific Research on Priority Areas (No. 17017024) from the Ministry of Education, Culture, Sports, Science and Technology of Japan, and by the Strategic Japan–UK Cooperative Program of the Japan Science and Technology Agency. Computation time was provided by the supercomputer system in the Human Genome Center, Institute of Medical Science, University of Tokyo. Funding to pay the Open Access publication charges for this article was provided by a Grant-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology of Japan to K.K.


    References
Aebersold R. and Mann M. (2003) Nature 422:198–207.

Aloy P., et al. (2004) Science 303:2026–2029.

Andreeva A., Howorth D., Brenner S.E., Hubbard T.J.P., Chothia C., Murzin A.G. (2004) Nucleic Acids Res. 30:264–267.

Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. (2000) Nucleic Acids Res. 28:235–242.

Caffrey D.R., Somaroo S., Hughes J.D., Mintseris J., Huang E.S. (2004) Protein Sci. 13:190–202.

Connolly M.L. (1983) Science 221:709–713.

Fairall L., Chapman L., Moss H., de Lange T., Rhodes D. (2001) Mol. Cell 8:351–361.

Glaser F., Steinberg D.M., Vakser I.A., Ben-Tal N. (2001) Proteins 43:89–102.

Goodsell D.S. and Olson A.J. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:105–153.

Henrick K. and Thornton J.M. (1998) Trends Biochem. Sci. 23:358–361.

Ito T., Chiba T., Yoshida M. (2001) Trends Biotechnol. 19:S23–27.

Jones S. and Thornton J.M. (1996) Proc. Natl Acad. Sci. USA 93:13–20.

Keskin O., Bahar I., Badretdinov A.Y., Ptitsyn O.B., Jernigan R.L. (1998) Protein Sci. 7:2578–2586.

Keskin O., Ma B., Nussinov R. (2005) J. Mol. Biol. 345:1281–1294.

Kinoshita K. and Nakamura H. (2004) Bioinformatics 20:1329–1330.

Kinoshita K., Sadanami K., Kidera A., Go N. (1999) Protein Eng. 12:11–14.

Kraulis P.J. (1991) J. Appl. Cryst. 24:946–950.

Larsen T.A., Olson A.J., Goodsell D.S. (1998) Structure 6:421–427.

Lawrence M.C. and Colman P.M. (1993) J. Mol. Biol. 234:946–950.

Lichtarge O. and Sowa M.E. (2002) Curr. Opin. Struct. Biol. 12:21–27.

McCoy A.J., Chandana Epa V., Colman P.M. (1997) J. Mol. Biol. 268:570–584.

Mittl P.R.E., Deillon C., Sargent D., Liu N., Klauser S., Thomas R.M., Gutte B., Grutter M.G. (2000) Proc. Natl Acad. Sci. USA 97:2562–2566.

Monod J., Wyman J., Changeux J.P. (1965) J. Mol. Biol. 12:88–118.

Ofran Y. and Rost B. (2003) J. Mol. Biol. 325:377–387.

Ooi T., Oobatake M., Nemethy G., Scheraga H.A. (1987) Proc. Natl Acad. Sci. USA 84:3086–3090.

Patil A. and Nakamura H. (2005) BMC Bioinformatics 6:100.

Pearson W.R. (1994) Methods Mol. Biol. 24:307–331.

Ponstingl H., Henrick K., Thornton J.M. (2000) Proteins 41:47–57.

Richmond T.J. (1984) J. Mol. Biol. 178:63–89.

Sali A., Glaeser R., Earnest T., Baumeister W. (2003) Nature 422:216–225.

Shanahan H.P. and Thornton J.M. (2004) Bioinformatics 20:2197–2204.

Sheinerman F.B., Norel R., Honig B. (2000) Curr. Opin. Struct. Biol. 10:153–159.

Sprinzak E., Sattath S., Margalit H. (2003) J. Mol. Biol. 327:919–923.

Todd A.E., Marsden R.L., Thornton J.M., Orengo C.A. (2005) J. Mol. Biol. 348:1235–1260.

Tsuchiya Y., Kinoshita K., Nakamura H. (2004) Proteins 55:885–894.

van Roey P., Meehan L., Kowalski J.C., Belfort M., Derbyshire V. (2002) Nat. Struct. Biol. 9:806–811.

Xu D., Lin S.L., Nussinov R. (1997) J. Mol. Biol. 265:68–84.

Received April 23, 2006; accepted May 4, 2006.





