Protein Engineering, Vol. 16, No. 3, 161-167,
March 2003
© 2003 Oxford University Press
Discrete structure of van der Waals domains in globular proteins
Department of Structural Biology, The Weizmann Institute of Science, P.O.B. 26, Rehovot 76100, Israel. Present address: Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street M-105, Cambridge, MA 02138, USA. E-mail: inberez{at}fas.harvard.edu
| Abstract |
|---|
Most globular proteins are divisible into domains, distinct substructures of the globule. The notion of a hierarchy of domains was introduced earlier via van der Waals energy profiles that allow one to subdivide proteins into domains (subdomains). What structural features underlie these energy profiles has remained an open question. The recent discovery of the loop-n-lock elements in globular proteins suggests such a structural connection. A direct comparison of the segmentation by van der Waals energy criteria with the maps of the locked loops of nearly standard size reveals a striking correlation: domains in general appear to consist of one to several such loops. In addition, it is demonstrated that the variety of subdivisions of the same protein into domains is just a regrouping of the loop-n-lock elements.
Keywords: closed loops/formation and alteration of domains/hierarchy of domain structure/loop-n-lock structure/protein folding/protein structure
| Introduction |
|---|
Proteins consist of distinct, semi-independent, stable structural fragments (domains) that were elucidated from the results of limited proteolysis (Porter, 1959
It was recently discovered that globular proteins are universally built of nearly standard size closed loops (loop-n-lock elements), in other words, returns of the chain trajectory with tight end-to-end contacts (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a,b). In order to establish the possible connection between energy-derived domains and subdomains on the one hand and the structurally defined loop-n-lock elements on the other, we compared the results obtained by these two independent approaches. The detailed comparisons of the energy maps with positions of the primary closed loops, as described below, show that the domains are essentially made of the loops and that the hierarchy of domain structure is defined by interactions between the loops and by the loop regrouping.
| Materials and methods |
|---|
Structural data
Major representatives (Orengo et al., 1994) of the protein superfolds [Globin (1thb), Trefoil (1i1b), Up-Down (256b), Immunoglobulin (2rhe), α/β Sandwich (1aps), Jelly Roll (2stv), Doubly Wound (4fxn), UB αβ roll (1ubq), TIM-Barrel (7tim)] were analyzed. X-ray data from the Protein Data Bank were supplemented with coordinates of H-atoms (Berezovsky et al., 1999).
Hierarchy of protein domain structure by the van der Waals energy approach
The algorithm is based on the segmentation of the globule into parts with a high concentration of van der Waals energy and further detailed analysis of interactions between these segments. Van der Waals energies were calculated for all pairs of contacting atoms. Only contact distances between 2.5 and 5.0 Å were considered (Berezovsky et al., 1999). The Lennard-Jones 6-12 potential and the standard Scheraga parameters for different types of atoms were used (Dunfield et al., 1978; Nemethy et al., 1983). The van der Waals energies were calculated for atoms belonging to residues separated by at least two amino acids along the polypeptide chain (Berezovskii and Tumanyan, 1995). Figure 1A demonstrates an example of a van der Waals energy walk for the TIM-Barrel fold (7tim). Each point on the curve is the energy of interaction between the two parts of the globule separated at a given amino acid residue.
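The energy walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single hypothetical well depth `eps` and optimal distance `r_min` stand in for the atom-type-specific Scheraga parameters, "separated by at least two amino acids" is read here as a sequence separation of three or more, and the function names are invented for this sketch.

```python
import math

def lj_6_12(r, eps, r_min):
    """Lennard-Jones 6-12 energy with minimum -eps at r = r_min."""
    q = (r_min / r) ** 6
    return eps * (q * q - 2.0 * q)

def residue_pair_energies(atoms, eps=0.1, r_min=4.0):
    """atoms: list of (residue_index, (x, y, z)).

    eps and r_min are placeholder values (assumption), not the
    published atom-type parameters. Returns {(a, b): energy}, a < b.
    """
    energies = {}
    for i in range(len(atoms)):
        res_i, xyz_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            res_j, xyz_j = atoms[j]
            a, b = sorted((res_i, res_j))
            if b - a < 3:          # skip residues too close along the chain
                continue
            r = math.dist(xyz_i, xyz_j)
            if 2.5 <= r <= 5.0:    # contact-distance window from the text
                energies[(a, b)] = energies.get((a, b), 0.0) + lj_6_12(r, eps, r_min)
    return energies

def energy_walk(n_residues, energies):
    """Interaction energy between the chain parts before and after residue k."""
    return [
        sum(e for (a, b), e in energies.items() if a < k < b)
        for k in range(n_residues)
    ]
```

With this definition the walk is zero at both chain ends, because no residue pair can straddle the first or the last residue.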
The procedure of domain detection and setting of the levels of energy to establish the hierarchy of the domains consists of the following steps:
- Calculation of the interaction energy between parts of the native globule separated by each given residue. The minimal energy value (E0) is found on the curve of interaction energy between parts of the globule. Every local maximum on the curve of interaction energy between parts of the globule corresponds to a point of separation. The null value of the interaction energy means complete energy independence of the adjacent regions from each other.
- Setting of potential barriers, e.g. 0.3 E0, 0.25 E0, 0.2 E0, 0.15 E0, 0.1 E0, 0.05 E0, and analysis of the initial curve of interactions between parts of the globule at different levels of the potential barrier. Any maximum on the initial curve is considered to be a point of structural separation if the differences between this maximum and neighboring deep minima exceed the value of a chosen potential barrier. This generates sets of structural segments corresponding to the values of the barriers. Thus, alternative domains can be defined at different levels of the barrier. These segments are characterized as follows: internal energy of isolated segments eii, integral energy of external interaction of each segment with others
and interaction energies for each pair of segments eij (i,j = 1, ..., K, where K is the number of segments).
- Analysis of the interaction energy within the structural segments separated at the previous step, and between the segments. Any isolated segment is considered as a candidate domain if
(here and below, we compare absolute values of energy). Any candidate domain will be classified as a domain if
. Any two potential domains i and k will be combined into one independent domain if
and
simultaneously. Two potential domains will also be combined if more than 70% of the external energy of one domain pertains to the interaction with the other domain. Any isolated segment with
will be joined with the segment or potential domain for which the first segment has maximal external interaction energy.
The procedure is explained in further detail by the illustration in Figure 1, which contains a full set of van der Waals curves for triosephosphate isomerase (TIM-Barrel fold, 7tim). Graph A represents the initial curve of a van der Waals energy walk and graphs B–E show energy curves at different levels of hierarchy. In Figure 1, accordingly:
- Position 190 of the initial curve (see graph A) gives a minimal value E0 of interactions.
- Different types of structure splitting are observed for the following levels of a potential barrier: 0.25E0 (Figure 1B), 0.15E0 (Figure 1C), 0.1E0 (Figure 1D) and 0.05E0 (Figure 1E). Figure 1A demonstrates maxima at positions 64 and 124. Each of these maxima is accompanied by two minima with respective energy differences larger than 0.25E0. This suggests segmentation 1–63, 65–123 and 125–248 at the potential barrier 0.25E0 (see also Figure 1B). Figure 1C–E demonstrates interaction energies within the segments (see legend to Figure 1) corresponding to the barrier values 0.15E0, 0.1E0 and 0.05E0, respectively.
- This analysis therefore suggests four levels of hierarchy in triosephosphate isomerase: level 1 (0.3E0), single-domain structure; level 2 (0.25E0–0.15E0), domain 1 residues 1–63 and 125–248, domain 2 residues 65–123; level 3 (0.1E0), domain 1 residues 1–63 and 212–248, domain 2 residues 65–123, domain 3 residues 125–210; and level 4 (0.05E0), domain 1 residues 1–63 and 125–248, domain 2 residues 65–123.
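The barrier test used in this example can be illustrated with a short Python sketch. It is one possible reading of the rule, not the published implementation: the "neighboring deep minima" are taken here to be the lowest points reached on either side of a maximum before the curve rises above that maximum again, and both function names are hypothetical.

```python
def split_points(profile, barrier_fraction):
    """Return 0-based indices of local maxima accepted as separation points.

    profile: energy walk (negative values); barrier_fraction: e.g. 0.25
    for a barrier of 0.25*E0, where E0 is the minimum of the profile.
    """
    barrier = barrier_fraction * abs(min(profile))
    n = len(profile)
    points = []
    for i in range(1, n - 1):
        if not (profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]):
            continue  # not a local maximum
        # Deepest minimum to the left before the curve exceeds this maximum.
        left, j = profile[i], i - 1
        while j >= 0 and profile[j] <= profile[i]:
            left = min(left, profile[j])
            j -= 1
        # Deepest minimum to the right, same stopping rule.
        right, j = profile[i], i + 1
        while j < n and profile[j] <= profile[i]:
            right = min(right, profile[j])
            j += 1
        if profile[i] - left > barrier and profile[i] - right > barrier:
            points.append(i)
    return points

def segments_from_points(n_residues, points):
    """Turn 1-based boundary residues into (start, end) segments,
    leaving out the boundary residue itself."""
    bounds, start = [], 1
    for p in points:
        bounds.append((start, p - 1))
        start = p + 1
    bounds.append((start, n_residues))
    return bounds

# Boundaries at residues 64 and 124 in a 248-residue chain:
# segments_from_points(248, [64, 124]) -> [(1, 63), (65, 123), (125, 248)]
```

Lowering `barrier_fraction` admits shallower maxima and therefore yields a finer segmentation, which is how the successive levels of the hierarchy arise.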
Comparative analysis of the domains detected by different computational approaches
We compared the domain assignments by our method with other methods and authors' definitions. If the assigned domain boundary is located in the interval l ± 2 (where residue l is the domain boundary assigned by the other method), we consider the domain boundary assignments identical. This tolerance reflects the accuracy of the method, since we take into account only interactions between atoms belonging to residues separated by at least two residues (see above). The accuracy score is calculated as follows:
[Equation: accuracy score, expressed in terms of Nicor, Ntot, m and M]
where Nicor is the number of residues assigned to the same domain both by our program and another method or author definition, Ntot is the total number of residues in the protein chain, m is the number of domain boundaries that were similarly assigned both by the program and by the other methods and M is the number of domains under comparison. If the number of domains assigned by our method is not equal to the number of domains assigned by others, then M is the maximal number of domains in the compared assignments.
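The boundary-matching rule (l ± 2) is simple enough to show directly. The sketch below covers only the count m of similarly assigned boundaries; it does not attempt the accuracy score itself, and the function name is hypothetical.

```python
def matching_boundaries(ours, theirs, tolerance=2):
    """Count boundaries in `theirs` that have a boundary in `ours`
    within +/- tolerance residues."""
    return sum(
        1 for l in theirs
        if any(abs(b - l) <= tolerance for b in ours)
    )

# Example: boundaries 64 and 124 compared with 63 and 130.
# Only 63 lies within two residues of one of our boundaries.
m = matching_boundaries([64, 124], [63, 130])
```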
Detection of the closed loops and their characterization
Closed loops are defined as continuous sub-trajectories of the folded chains with a small Cα–Cα distance between their ends (up to 10 Å). These are not loops in the traditional definition as connectors between elements of secondary structure (Leszczynski and Rose, 1986; Martin et al., 1995; Kwasigroch et al., 1996; Oliva et al., 1997) or so-called U-turns (Kolinski et al., 1997), which do not include loop closure points. The closed loops (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a) connect points distantly positioned along the polypeptide chain, providing the formation of locally compact structural subunits. The Cα–Cα contacts with immediate neighbors along the sequence are not considered. Five residues are taken as the cut-off value. For the (anti)parallel α- and β-structures forming several short Cα–Cα contacts the shortest one is taken. According to the loop size distribution (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a), the loops accepted into the mapping procedure have sizes from 15 to 50 amino acid residues.
The mapping procedure sequentially selects the tightest loops, and at each step the sequence region corresponding to the mapped loop is excluded from further calculations. In the case of partial overlap, the tighter of the two loops is accepted. If the overlap is less than five amino acid residues, both loops are accepted.
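A minimal sketch of this loop detection and mapping, written in Python, may help to fix the idea. It assumes a list of Cα coordinates as input, uses a simple greedy tightest-first selection, and treats "less than five common residues" as an overlap of at most four; the function names and the greedy formulation are assumptions of this sketch rather than the published code.

```python
import math

def candidate_loops(ca, min_len=15, max_len=50, max_dist=10.0):
    """ca: list of (x, y, z) C-alpha coordinates, one per residue.

    Returns (distance, start, end) for every chain segment of 15-50
    residues whose end C-alpha atoms are within max_dist angstroms.
    """
    loops = []
    n = len(ca)
    for i in range(n):
        # end index j gives a loop of j - i + 1 residues
        for j in range(i + min_len - 1, min(n, i + max_len)):
            d = math.dist(ca[i], ca[j])
            if d <= max_dist:
                loops.append((d, i, j))
    return loops

def map_loops(loops, max_overlap=4):
    """Greedy mapping: take the tightest loops first and reject any loop
    sharing more than max_overlap residues with one already chosen."""
    chosen = []
    for d, i, j in sorted(loops):
        if all(min(j, b) - max(i, a) + 1 <= max_overlap for _, a, b in chosen):
            chosen.append((d, i, j))
    return sorted(chosen, key=lambda loop: loop[1])
```

For example, given candidate loops with end-to-end distances 5.0, 6.0 and 7.0 Å spanning residues 0–20, 10–30 and 18–40, the second is rejected because it shares 11 residues with the tighter first loop, while the third is kept because it shares only 3.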
Van der Waals walks, i.e. interaction energy between the parts of the native globule, are plotted in Figure 2 as described previously (Berezovskii et al., 1997; Berezovsky et al., 1999). The top curves on the plots in Figure 2 correspond to the smallest value of the potential barrier [for the Jelly Roll fold (2stv) only this curve is presented]. Values of the barrier are as follows: 0.01E0 for α/β Sandwich (1aps) and 0.05E0 for the other superfolds. A total of 54 sections of the van der Waals plots around the loop ends (left and right) of the nine superfolds were aligned and summed together in Figure 3.
| Results |
|---|
Van der Waals segmentation and loop structure of the major superfolds
As demonstrated earlier (Berezovskii et al., 1997; Berezovsky et al., 1999), calculation of the van der Waals segments of the protein globule allows one to define boundaries and hierarchy of the domains at different energy levels. This calculation generates curves with zero levels at the start and end points. The energy profiles (see Figures 1 and 2) are rather ragged, showing numerous maxima and minima. The maxima correspond to the borders between the energy-defined independent segments. For example, in Figure 1A the profile for the whole molecule shows several maxima that split the molecule into several segments. The energy-justified splitting can be as detailed as the number of maxima accepted as borders. The energy profiles are then calculated separately for each segment so that other parts of the molecule do not contribute to the neighboring segments (e.g. plots in Figure 1B–E). The subdivision starts with the highest maxima observed and the procedure allows one to reveal additional maxima, which appear in the original plot as changes in the slope (shoulders) rather than maxima. Selected structural units are characterized by substantially higher internal versus external interactions.
The top curves in Figure 2A–I demonstrate more detailed segmentation of the globules for the smallest values of the potential barriers (the notion of the potential barrier is explained in detail and exemplified in Materials and methods). Inspection of Figure 2A–I shows the typical size of these segments: 10–50 amino acid residues. Similar sizes are characteristic of closed loops (Berezovsky et al., 2000c; Berezovsky and Trifonov, 2001a), that is, returning pieces of trajectory with tight Cα–Cα contacts.
A comparison of maps of domain boundaries with borders between primary closed loops (indicated by bars above the energy curves in Figure 2) shows that these two maps are rather similar. That is, the majority of the loop ends correspond to the peaks on the energy curves (loop mapping error bars: ±3 amino acid residues). Table I contains closed loops mapped in nine major superfolds and van der Waals segments of the respective structures selected at the lowest level of the potential barrier (see Materials and methods). Sets of the loops and the segments contain 38 and 48 entities, respectively. Among 58 internal (not at the ends of the protein) loop ends there are 36 located at the respective borders of van der Waals segments (bold in Table I).
Quantitative agreement of the loop borders with energy plots is further demonstrated by Figure 3
Hierarchy of domain structure
There are many cases where a fold can be dissected in alternative ways (different techniques and/or authors). A principle of the systematic comparison of the domain assignments by different techniques (see Materials and methods) was developed earlier (Berezovsky et al., 1999). In this work, the procedure was applied to the domain assignments of the major superfolds. Single-domain assignments made by the van der Waals technique are in full accordance with the same conclusions by different techniques for the following superfolds: the single-domain structure in the Trefoil fold (1i1b) coincides with the result of the DOMAK program (Siddiqui and Barton, 1995); single-domain structures in α/β Sandwich (1aps), Doubly Wound (4fxn) and TIM-Barrel (7tim) have also been detected (Islam et al., 1995); both the DOMAK program and the algorithm developed by Islam et al. (Islam et al., 1995) show a single domain for the Up-Down (256b) and Jelly Roll (2stv) folds. The three-domain assignment for the TIM-Barrel fold coincides with that made by the DOMAK program (accuracy score 90%). In addition, the van der Waals approach demonstrates other variants of domains in these structures.
Inspection of known alternative van der Waals domain structures for the major superfolds reveals that their domains and subdomains also consist of closed loops. For example, formation of a two-domain structure in the Trefoil fold (1i1b) is achieved by the contribution of loops 43–63 and 70–99 to the two-loop domain 43–102, while the rest of the structure is a complex domain of two parts: the upstream part (residues 3–41) contains loop 19–40 and region 104–153 contains loops 101–122 and 122–144. In the Doubly Wound fold (4fxn), segment 1–33 (loop 1–30), being in strong interaction with the last loop 112–134, forms a complex domain of residues 1–30 and 122–138. At the same time, the second domain (residues 32–120) is made of two other loops (35–66 and 78–107). An Up-Down fold (256b) yields two variants of a two-domain structure (in addition to a single-domain description): domain 1 (residues 1–75; loops 10–33 and 41–62) and domain 2 (residues 77–106; loop 68–94) or, alternatively, domain 1 (residues 1–42; loop 10–33) and domain 2 (residues 44–106; loops 41–62 and 68–94). Finally, the domain structure of the TIM-Barrel fold can vary from a single-domain to a two- or three-domain organization. Loops 9–40 (black, Figure 4B) and 41–62 (light grey, Figure 4B) (segment 1–63) and loops 128–166 (grey, Figure 4D), 177–210 (light grey, Figure 4D), 207–228 (grey, Figure 4B) and 229–243 (light grey, Figure 4B) (segment 125–248) form the first domain of the two-domain structure, and loops 62–90 (black, Figure 4C) and 95–126 (light grey, Figure 4C) (segment 65–123) form the second domain. A three-domain structure is generated by strong interaction between loops in the following order: loops 9–40 (black), 41–62 (light grey), 207–228 (grey) and 229–243 (light grey) are in the first domain (segments 1–63 and 212–248; Figure 4B); loops 62–90 (black) and 95–126 (light grey) form a second domain (residues 65–123; Figure 4C), while the third domain (residues 125–210; Figure 4D) consists of the loops 128–166 (grey) and 177–210 (light grey). Hence there is an obvious correlation of maxima of van der Waals plots with positions of the loop ends. The interaction of the loops and their closure results in the formation/alteration of domains at different levels of the energy hierarchy. Whichever domain is considered, it consists of one, two or any number of nearly standard size loop-n-lock elements.
| Discussion |
|---|
Hierarchical subdivisions of the van der Waals domains provide common ground for the reconciliation of traditional definitions of domains. The very important advantage of this approach is the possibility of detecting structural domains that involve any number of continuous or discontinuous segments of the polypeptide chain. Moreover, van der Waals segmentation eventually leads to the elucidation of the levels of energy hierarchy, which correspond to distinct sets of structural domains (Berezovsky et al., 1999
Domain boundaries defined by the van der Waals energy approach match well the closed loop boundaries. Considering the inaccuracies in loop mapping and in energy calculations, this match is rather surprising. The correlation is best seen, for example, in the case of a Trefoil fold (1i1b) or a TIM-Barrel fold (7tim) in Figure 2G and I
α-helices) or modulate interactions between enthalpy-driven stable structures (loops, subdomains, domains). The exceptional role traditionally ascribed to directed interactions overestimates their marginal contribution to the overall stability of the globule. Directed interactions are always saturated, either by interaction between respective groups inside the globular structure or by contacts with water and counter-ions. Therefore, they can provide only a small advantage for a folded versus an unfolded structure. Van der Waals closure of the loop ends and additional (secondary) distant van der Waals contacts serve as the major contributors to folding enthalpy. A closed loop can therefore be considered as an elementary unit of domain structure, and interactions between closed loops provide the diversity of domain structures in globular proteins.
| Acknowledgments |
|---|
Professor E.N.Trifonov's stimulating discussions and thoughtful comments and Professor M.D.Frank-Kamenetskii's critical reading of the manuscript and fruitful discussions are greatly appreciated. I am grateful to Mrs. A.Weinberg for editing of the text. I.N.B. is a Post-Doctoral Fellow of the Feinberg Graduate School at the Weizmann Institute of Science.
| References |
|---|
Baldwin,R.L. and Rose,G.D. (1999a) Trends Biochem. Sci., 24, 26–33.
Baldwin,R.L. and Rose,G.D. (1999b) Trends Biochem. Sci., 24, 77–83.
Berezovskii,I.N. and Tumanyan,V.G. (1995) Biophysics, 40, 1181–1187.
Berezovskii,I.N., Esipova,N.G. and Tumanyan,V.G. (1997) Biophysics, 42, 557–565.
Berezovsky,I.N. and Trifonov,E.N. (2001a) Protein Eng., 14, 403–407.
Berezovsky,I.N. and Trifonov,E.N. (2001b) J. Mol. Biol., 307, 1419–1426.
Berezovsky,I.N., Tumanyan,V.G. and Esipova,N.G. (1997) FEBS Lett., 418, 43–46.
Berezovsky,I.N., Namiot,V.A., Tumanyan,V.G. and Esipova,N.G. (1999) J. Biomol. Struct. Dyn., 17, 133–155.
Berezovsky,I.N., Esipova,N.G., Tumanyan,V.G. and Namiot,V.A. (2000a) J. Biomol. Struct. Dyn., 17, 799–809.
Berezovsky,I.N., Esipova,N.G. and Tumanyan,V.G. (2000b) J. Comput. Biol., 7, 183–192.
Berezovsky,I.N., Grosberg,A.Y. and Trifonov,E.N. (2000c) FEBS Lett., 466, 283–286.
Crippen,G.M. (1978) J. Mol. Biol., 126, 315–332.
Doolittle,R.F. (1995) Annu. Rev. Biochem., 64, 287–314.
Dunfield,L.G., Burgess,A.W. and Scheraga,H.A. (1978) J. Phys. Chem., 24, 2609–2616.
Islam,S.A., Luo,J. and Sternberg,M.J.E. (1995) Protein Eng., 8, 513–525.
Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.
Kendrew,J.C., Bodo,G., Dintzis,H.M., Parrish,R.G., Wyckoff,H. and Phillips,D.C. (1958) Nature, 181, 662–666.
Kolinski,A., Skolnick,J., Godzik,A. and Hu,W.-P. (1997) Proteins, 27, 290–308.
Kwasigroch,J.M., Chomilier,J. and Mornon,J.P. (1996) J. Mol. Biol., 259, 855–872.
Leszczynski,J.F. and Rose,G.D. (1986) Science, 234, 849–855.
Martin,A.C.R., Toda,K., Stirk,H.J. and Thornton,J.M. (1995) Protein Eng., 8, 1093–1101.
Nemethy,G., Pottle,M.S. and Scheraga,H.A. (1983) J. Phys. Chem., 87, 1883–1887.
Oliva,B., Bates,P.A., Querol,E., Aviles,F.X. and Sternberg,M.J.E. (1997) J. Mol. Biol., 259, 814–830.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631–634.
Porter,R.R. (1959) Biochem. J., 73, 119–126.
Rose,G.D. (1979) J. Mol. Biol., 134, 447–470.
Siddiqui,A.S. and Barton,G.J. (1995) Protein Sci., 4, 872–884.
Wernisch,L., Hunting,M. and Wodak,S. (1999) Proteins, 35, 338–352.
Received March 18, 2002; revised December 13, 2002; accepted January 24, 2003.