Protein Engineering, Vol. 14, No. 6, 403-407,
June 2001
© 2001 Oxford University Press
Loop fold nature of globular proteins
Igor N. Berezovsky1 and Edward N. Trifonov
Department of Structural Biology, The Weizmann Institute of Science, P.O.B. 26, Rehovot 76100, Israel
Abstract
Protein chains make numerous returns in globules, thus forming
loops closed by tight residue-to-residue contacts (closed
loops). Previous statistical analysis of the sizes and locations
of the closed loops in all major protein folds revealed that
the loops have an almost standard contour length of 25–30
amino acid residues and follow one after another along the chain.
In this work the closed loops of the major folds are presented
in three dimensions. A special image filtering procedure is
introduced that allows one to visualize the standard size closed
loops for the first time. The loop positions along the sequences
are verified by detection of loop-end clusters.
Keywords: closed loops/image filtering/major folds/protein folding/protein structure
Introduction
Proteins are characterized in many ways on the basis of structural
and evolutionary considerations (Murzin et al., 1995; Orengo et al.,
1997). There is one type of structural element which,
inexplicably, has never been considered, namely closed loops,
i.e. returns of the chain trajectories. Note that these are
not loops in the sense of the traditional definition as linkers
between elements of secondary structure (Leszczynski and Rose,
1986; Martin et al., 1995; Kwasigroch et al., 1996; Oliva et al.,
1997). The so-called U-turns (Kolinski et al., 1997) also
do not include the loop closure points. Closed loops of a protein
globule connect points distantly positioned along the chain
which are thus in contact (defined, for example, as short
Cα-to-Cα distances). Compactness of the proteins implies large numbers
of such chain-to-chain contacts, unlike loose Gaussian trajectories
only occasionally returning to themselves. The loop fold structure
of the globular proteins is not immediately seen, being disguised
by frequent trajectory changes due to various elements of secondary
structure, primarily α-helices. A simple filtering (smoothing)
procedure described below makes the closed loops clearly seen.
The loops and the sites of multiple contacts in 10 major folds
are analyzed. The maps are constructed in which the closed loops
(total average size 24–26 amino acid residues) show the
same size preference as in previous work (25–30 residues)
where the statistics of the loop sizes of large ensembles of
protein structures were analyzed (Berezovsky et al., 2000).
Three-dimensional
(3-D) structures of 10 major fold types demonstrate that the
proteins are universally built of consecutively connected standard
closed loops.
Materials and methods
We define the closed loops as continuous sub-trajectories of the folded chains with small Cα-to-Cα distances between their ends (up to 10 Å). The Cα–Cα contacts with immediate neighbors along the sequence are not considered. Five residues are taken as the cut-off value. The standard deviations for the peak values in the loop size histograms (Figure 1) are estimated as square roots of the values in the nearby minima.
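
The loop definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function names are hypothetical, residues are indexed from 0, the five-residue cut-off is read as a minimum sequence separation between the two loop ends, and loop size is counted inclusively (end - start + 1).

```python
import numpy as np

def closed_loops(ca_coords, max_dist=10.0, min_separation=5):
    """List (start, end, distance) for residue pairs in C-alpha contact.

    ca_coords:      (N, 3) array-like of C-alpha coordinates, one row per residue.
    max_dist:       largest end-to-end distance accepted, in angstroms.
    min_separation: smallest sequence separation kept (assumed reading of
                    the five-residue cut-off for sequence neighbours).
    """
    ca = np.asarray(ca_coords, dtype=float)
    # Pairwise distances between all C-alpha atoms.
    diff = ca[:, None, :] - ca[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    loops = []
    n = len(ca)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist[i, j] <= max_dist:
                loops.append((i, j, float(dist[i, j])))
    return loops

def loop_size_histogram(loops):
    """Count loops by size, where size = end - start + 1."""
    sizes = np.asarray([j - i + 1 for i, j, _ in loops], dtype=int)
    return np.bincount(sizes)

# Toy hairpin (made-up coordinates): only residues 0 and 6 come within 10 A.
toy = [[0, 0, 0], [20, 0, 0], [40, 0, 0], [60, 0, 0],
       [40, 5, 0], [20, 5, 0], [0, 5, 0]]
print(closed_loops(toy))   # [(0, 6, 5.0)]
```

Real input would be the Cα coordinates read from a PDB entry; that parsing step is left out here.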

Fig. 1. Loop size distributions for 101 eukaryotic proteins (a) and 162 prokaryotic proteins (b), of more than 200 amino acid residues. The protein structures for the analysis are taken from the PDB database. The threshold of allowed sequence similarity is taken at 25% (PDB_SELECT).
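
The standard-deviation estimate stated in the text (square root of the histogram value at the nearby minimum) can be written out as a short calculation. The function names below are hypothetical, and the counts in the usage lines are made up. The last two lines are plain arithmetic on the excesses of 603 and 423 occurrences quoted in the Results: they give the largest minimum count compatible with "over 11 standard deviations", which is a derived bound and not a value reported in the paper.

```python
import math

def peak_excess_in_sd(peak_count, minimum_count):
    """Excess of a histogram peak over a nearby minimum, expressed in
    standard deviations estimated as sqrt(minimum_count)."""
    if minimum_count <= 0:
        raise ValueError("minimum_count must be positive")
    return (peak_count - minimum_count) / math.sqrt(minimum_count)

def largest_minimum_for(excess, n_sd):
    """Largest minimum count for which `excess` still reaches n_sd deviations."""
    return (excess / n_sd) ** 2

# Made-up counts: a peak of 500 over a minimum of 400 is 5 deviations.
print(peak_excess_in_sd(500, 400))            # 5.0

# Bounds implied by the quoted excesses at 11 deviations (derived, not reported).
print(round(largest_minimum_for(603, 11)))    # 3005
print(round(largest_minimum_for(423, 11)))    # 1479
```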
Inspection of the positional distribution of the loop ends along the sequences reveals numerous sites (small regions) where many loops originate, as illustrated in Figure 2. We use such diagrams for the purpose of locating the loops, considering first the most prominent ones, as suggested by the diagrams. The mapping is started from the sites of multiple end-to-end connections. That is, we map first only the loops with both ends belonging to multiple connection points, as in Figure 2c (see also the flowchart of this procedure presented in Figure 3). The loops with tightest end-to-end distances irrespective of their size are taken first. The procedure is repeated until the 10 Å limit is reached, although it is normally exhausted at shorter distances. Then the second round follows, which involves the standard loops with only one end belonging to the multiple connection points. The last stage involves single isolated standard loops with no multiple connections, again in the order of the tightness of the closure. These stages are also presented in the flowchart (Figure 3).
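
The three stages described above can be read as a greedy selection, sketched below. This is a reading of the text and not the authors' code. Several details are assumptions introduced for illustration: the function name `map_loops`, the tolerance `tol` used to decide whether a loop end sits at a multiple contact site, the `max_overlap` allowance between accepted loops, and the expectation that candidates have already been restricted to end-to-end distances of at most 10 Å.

```python
def map_loops(candidates, mcs, tol=2, max_overlap=4):
    """Greedy three-stage selection of closed loops.

    candidates:  iterable of (start, end, distance) tuples.
    mcs:         residue positions of multiple contact sites.
    tol:         allowed offset between a loop end and an MCS (assumption).
    max_overlap: residues two accepted loops may share (assumption).
    """
    def at_mcs(pos):
        return any(abs(pos - site) <= tol for site in mcs)

    def ends_at_mcs(loop):
        return int(at_mcs(loop[0])) + int(at_mcs(loop[1]))

    def shared(a, b):
        # Number of residues common to the two loops.
        return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

    mapped = []
    # Stage 1: both ends at MCS; stage 2: one end; stage 3: none.
    for required in (2, 1, 0):
        stage = [c for c in candidates if ends_at_mcs(c) == required]
        for loop in sorted(stage, key=lambda c: c[2]):   # tightest first
            if all(shared(loop, m) <= max_overlap for m in mapped):
                mapped.append(loop)
    return sorted(mapped)

# Toy input (made-up loops and sites, not data from the paper).
cands = [(0, 25, 4.3), (24, 50, 4.1), (10, 40, 3.0), (51, 78, 4.0)]
print(map_loops(cands, mcs=[0, 25, 50]))
# [(0, 25, 4.3), (24, 50, 4.1), (51, 78, 4.0)]
```

In the toy run the tightest candidate, (10, 40), is rejected because neither of its ends lies at a multiple contact site and it overlaps a loop accepted in the first stage, which is the priority order the text describes.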

Fig. 3. Flowchart of the loop mapping procedure. Multiple contact sites (MCS), dots in the scheme, correspond to positions of the major maxima in the diagrams of multiple contacts as in Figure 2c.
The uncertainty for the points of multiple contacts is ±2 amino acid residues. The (anti)parallel α- and β-structures form several short Cα–Cα contacts, in which case the shortest is taken. As suggested by the size distribution of the loops (Berezovsky et al., 2000) the least frequent loop size is 15 amino acid residues. Correspondingly, the loops accepted into the mapping procedure could be as small as 16 amino acid residues. Acceptance of such small loops may cause a bias in the final loop size distribution, but as the results below indicate, this is not the case. The procedure described is practically devoid of uncertainties.
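
The diagrams of loop-end clusters used to locate the points of multiple contacts can be sketched as per-residue counts of loop N-ends and C-ends together with their product, the form the Results name for Figure 2c. The helper names, the 0-based residue indexing and the toy loops are assumptions. Summing counts over a window of ±2 residues is one possible way to reflect the stated uncertainty; the paper does not say it was done this way.

```python
import numpy as np

def end_profiles(loops, n_residues):
    """Per-residue counts of loop N-ends and C-ends, and their product."""
    n_ends = np.zeros(n_residues, dtype=int)
    c_ends = np.zeros(n_residues, dtype=int)
    for start, end, _dist in loops:
        n_ends[start] += 1
        c_ends[end] += 1
    return n_ends, c_ends, n_ends * c_ends

def widen(counts, half_width=2):
    """Sum counts over a window of +/- half_width residues (assumption)."""
    kernel = np.ones(2 * half_width + 1, dtype=int)
    return np.convolve(counts, kernel, mode="same")

# Toy loops (made up): several loops begin or end around residue 10.
loops = [(0, 10, 4.0), (10, 20, 4.0), (11, 21, 5.0), (9, 30, 6.0)]
n_ends, c_ends, product = end_profiles(loops, n_residues=31)
tolerant = widen(n_ends) * widen(c_ends)
print(int(product.argmax()), int(tolerant[10]))   # 10 3
```

Positions where the product is large are candidates for multiple contact sites, since both loop starts and loop ends accumulate there.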
In a few cases composite loops have been observed, consisting of two loops with all four ends within a 10 Å distance. Such loops were split into smaller ones. For example, loop 32–83 (4.014 Å, 52) in β Aligned Prism (1vmoA) consists of loops 31–57 (3.740 Å, 27) and 56–76 (7.840 Å, 21). In the case of partial overlapping, the tighter of the two loops was accepted. With overlapping of less than five common amino acid residues, both loops were accepted.
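
The overlap rule in the paragraph above can be checked on the quoted example. The sketch below uses hypothetical function names and does not distinguish a loop nested inside another from a partial overlap; it also leaves out the geometric test that all four ends lie within 10 Å, which needs coordinates.

```python
def shared_residues(a, b):
    """Number of residues common to two loops given as (start, end, ...)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def resolve_pair(a, b, min_overlap=5):
    """Apply the partial-overlap rule to two (start, end, distance) loops."""
    if shared_residues(a, b) < min_overlap:
        return [a, b]                                 # small overlap: keep both
    return [min(a, b, key=lambda loop: loop[2])]      # otherwise keep the tighter

# The two components of loop 32-83 quoted in the text share residues 56 and 57.
print(shared_residues((31, 57, 3.740), (56, 76, 7.840)))    # 2
print(resolve_pair((31, 57, 3.740), (56, 76, 7.840)))       # both loops kept

# Made-up pair with 11 common residues: only the tighter loop survives.
print(resolve_pair((10, 40, 4.5), (30, 60, 3.9)))           # [(30, 60, 3.9)]
```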
A trajectory smoothing procedure replaces the coordinates of every Cα atom by the average coordinates of seven Cα atoms centered at a given residue.
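
A minimal sketch of this smoothing is a moving average over seven consecutive Cα positions. The function name is hypothetical, and the treatment of chain ends, where fewer than seven atoms are available, is an assumption: the window is simply truncated there.

```python
import numpy as np

def smooth_trajectory(ca_coords, window=7):
    """Replace each C-alpha position by the mean of the positions in a
    window centred on it; the window is truncated at the chain ends
    (assumption, the text does not specify the end treatment)."""
    ca = np.asarray(ca_coords, dtype=float)
    half = window // 2
    smoothed = np.empty_like(ca)
    for i in range(len(ca)):
        lo = max(0, i - half)
        hi = min(len(ca), i + half + 1)
        smoothed[i] = ca[lo:hi].mean(axis=0)
    return smoothed

# Toy zig-zag along x (made-up coordinates): y alternates between +1 and -1.
zigzag = [[i, (-1) ** i, 0] for i in range(12)]
print(smooth_trajectory(zigzag)[5])   # x stays 5, the y amplitude drops to 1/7
```

Averaging over seven residues damps the short-period zig-zag of helical segments while leaving the large-scale course of the chain in place, which is why the loops become visible in the smoothed trajectories.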
Results
The updated histograms of the loop size distributions for prokaryotic and eukaryotic proteins are shown in Figure 1. The histograms are calculated as earlier (Berezovsky et al., 2000) by utilizing enlarged sets of structures (162 prokaryotic and 101 eukaryotic proteins). Both plots demonstrate a major preference for loop sizes of 25–30 amino acid residues. The amplitudes in the peak positions show an excess over nearby minima of 603 and 423 occurrences in Figure 1a and b, respectively. This corresponds to over 11 standard deviations in both cases. The purpose of the loop mapping is to split every protein structure into a set of minimum-sized elementary closed loops (primary loops).
An important lead in the process of mapping is the existence of multiple contacts, clusters of N-ends and C-ends of the loops. This is illustrated by Figure 2, where clusters of N-ends (Figure 2a), C-ends (Figure 2b) and both (a product of the two, Figure 2c) are shown for the β Trefoil fold (1afc A). Figure 4 displays the polypeptide chain trajectories for 10 major folds in standard backbone (left) and smoothed presentations with the mapped loops indicated by various colors. A striking uniformity is observed among proteins that are otherwise thoroughly different. This is better seen when the smoothed trajectories, not obstructed by the ubiquitous zig-zags of α-helices, are inspected.
As Figure 4 amply illustrates, all major types of folds, despite substantial differences in their overall appearances, are equally `spelled' by consecutive arrays of the loops. It is important to note that this is not only a property of the typical sized folds (100–200 amino acid residues), but also of substantially larger molecules (data not shown), such as β-galactosidase made of five domains (Jacobson et al., 1994) or the huge multi-domain muscular protein titin (Politou et al., 1996). In other words, all globular proteins regardless of their types, size or function appear to be largely built of connected loops of the same typical size. In a few cases the mapping procedure allows for mutually exclusive alternative linear arrays of the loops. Although both variants can be considered in each case, selection of the tighter end-to-end contacts leads to a unique choice. For example, in the case of αβ Barrel the array 9–40 (4.463, 32), 42–63 (4.377, 22), 63–90 (4.365, 28), 90–122 (4.735, 33), 131–170 (3.739, 40), 167–211 (4.489, 45), 211–232 (4.640, 22) and 230–249 (6.316, 20) can be partially replaced by the overlapping set 68–112 (6.670, 45), 110–150 (4.379, 41), 147–190 (7.744, 44) and 188–227 (5.883, 40). The latter, however, offers the loops which are more relaxed and more scattered size-wise.
Similarly, in the rotationally symmetrical case of αβ Horseshoe (1bnh) two alternatives are possible, as shown in Figure 4, bottom. Apparent secondary contacts appear in composite loops, that is, large loops with smaller internal closures. For example, in β Sandwich (2hlaB) loop 39–80 (4.421, 42) covers the loop 49–68 (4.760, 20). A large loop may consist of several nearly standard-sized loops. For example, loop 254–299 (4.047, 46) of β 8 Propellor consists of loops 254–272 (5.156, 19) and 273–289 (3.814, 17); similarly, loop 324–384 (4.250, 61) is made of smaller ones, 327–343 (4.422, 17) and 348–379 (4.936, 32); finally, loop 468–509 splits into 466–482 (6.187, 17) and 485–506 (3.359, 22). In the β Sandwich (2hlaB) region 39–80 can be considered either as a composite loop or as the overlap with another tight loop 28–63 (3.741, 36). In both cases, linearity and nearly standard size are maintained. Thus, composite loops responsible for the secondary contacts between primary loops do not interfere with the general linear arrangement of the primary loops. The distant contacts may be responsible for 3-D stabilization of sequentially engaged primary loops during the protein folding.
The mean value of the loop sizes in the maps of Figure 2 is 24–26 amino acid residues, matching well the preferential size observed in the overall histogram of the loop sizes, as in Figure 1.




Fig. 4. Major protein folds in traditional backbone presentation (left of each single-column group) and in smoothed form (right of each single-column group): Non-Bundle (1eca), β Roll (1pht), β Sandwich (2hla B), β Trefoil (1afc A), β Aligned Prism (1vmo A), αβ Barrel (4tim A), β 8 Propellor (3aah A), β 3 Solenoid (2pec), β 3-Layer Sandwich (1pya B), αβ Horseshoe (1bnh). The alternative arrays for the αβ Horseshoe (1bnh) are 2–27 (4.267, 26), 26–53 (4.131, 28), 54–82 (4.007, 29), 83–110 (4.135, 28), 112–141 (4.227, 30), 140–167 (3.907, 28), 165–194 (3.992, 30), 197–224 (4.045, 28), 226–255 (4.436, 30), 254–281 (4.114, 28), 282–310 (4.383, 29), 311–338 (4.271, 28), 339–367 (4.451, 29), 368–395 (4.276, 28), 396–424 (4.390, 29), 430–456 (4.242, 27) (average Cα–Cα distance is 4.2 Å); and 2–27 (4.267, 26), 32–60 (4.576, 29), 60–89 (4.969, 30), 94–120 (4.862, 27), 119–147 (4.798, 29), 151–177 (4.273, 27), 178–205 (5.077, 28), 264–291 (4.543, 28), 292–320 (4.927, 29), 322–348 (4.277, 27), 349–376 (4.765, 28), 378–404 (4.596, 27), 401–429 (4.438, 29), 430–456 (4.242, 27). The final, double-page spread set consists of the loops with larger end-to-end distances (average Cα–Cα distance is 4.6 Å). These can be considered rather as secondary contacts in the polypeptide chain trajectory. Chain sections of various colors correspond to the nearly standard size closed loops mapped as described.
Discussion
The preferred size of the closed loops, 25–30 amino acid
residues, may originate from polymer statistical properties
of the polypeptide chains. It is in the range of the optimum
size for ring (loop) closure of the polypeptide chain with a
mixed amino acid sequence (Berezovsky et al., 2000). At first
sight this may appear as a statistical feature of no relevance
to the biological functions of proteins. The loops as such are
obviously important building blocks of the protein structure
and may well have been under selection pressure during protein
evolution. Both the size of the loops and their actual positions
along the protein sequence could have been selected. It was,
perhaps, natural from the beginning to keep unchanged the optimum
size as enforced by the polymer statistics. As to the actual
positions of the loop ends in the protein sequence, selection
most likely has taken place. Indeed, the sequence, evolutionarily
driven, would have matching sites, making `stitches' at key
positions to guarantee an efficient and unique loop pattern.
Such hypothetical stitches obviously should play an important
role both for primary looping (linear arrangement of nearly
standard-sized loops) and for secondary interactions. The loop
closure might also have been an important stage in the earliest
evolution of proteins when the chain lengths were approaching
the loop closure size. Since there are many proteins of such
small size which are biologically active (Douglass et al., 1984),
one could speculate that the observed nearly standard-sized
loops may have been independent active entities at some early
stage of protein evolution. Later, owing to fusion of the respective
genes, the small loop-like proteins may have turned into larger
multiloop structures. The loop closure dramatically decreases
the number of alternative conformations that the chain may acquire,
thus fixing selected conformations. The chain-to-chain contacts
are also advantageous energetically, providing the necessary
stability to the loops and their associations in multiloop structures.
The linear arrangement of the loops immediately suggests the sequence of events during cotranslational protein folding. The folding process may start with the formation of the contact closing the first primary loop. Other loops would be formed sequentially involving corresponding interacting sites until completion of the synthesis. Already at this initial stage the sequence would thus provide instructions for the protein folding (looping). A whole arsenal of current concepts about protein structure suggests further, secondary events: formation of α-helices, of β-sheets, of secondary hydrophobic and polar loop-to-loop contacts, etc. It is not excluded, of course, that the formation of these secondary elements may occur already during the primary looping as well as subsequently.
Notes
1 To whom correspondence should be addressed. E-mail:
igor.berezovsky{at}weizmann.ac.il 
Acknowledgments
The authors are grateful to A.Grosberg for stimulating discussions
and E.Yakobson for critical reading of the manuscript. I.N.B.
is a Post-Doctoral Fellow of the Feinberg Graduate School, Weizmann
Institute of Science.
References
Berezovsky,I.N., Grosberg,A.Y. and Trifonov,E.N. (2000) FEBS Lett., 466, 283–286.
Douglass,J., Civelli,O. and Herbert,E. (1984) Annu. Rev. Biochem., 53, 665–714.
Jacobson,R.H., Zhang,X.J., DuBose,R.F. and Matthews,B.W. (1994) Nature, 369, 761–766.
Kolinski,A., Skolnick,J., Godzik,A. and Hu,W.-P. (1997) Proteins: Struct. Funct. Genet., 27, 290–308.
Kwasigroch,J.M., Chomilier,J. and Mornon,J.P. (1996) J. Mol. Biol., 259, 855–872.
Leszczynski,J.F. and Rose,G.D. (1986) Science, 234, 849–855.
Martin,A.C.R., Toda,K., Stirk,H.J. and Thornton,J.M. (1995) Protein Eng., 8, 1093–1101.
Murzin,A., Brenner,S.E., Hubbard,T.J.P. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Oliva,B., Bates,P.A., Querol,E., Aviles,F.X. and Sternberg,M.J.E. (1997) J. Mol. Biol., 259, 814–830.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Politou,A.S., Gautel,M., Improta,S., Vangelista,L. and Pastore,A. (1996) J. Mol. Biol., 255, 604–616.
Received October 18, 2000;
revised February 26, 2001;
accepted March 12, 2001.
