PEDS Advance Access originally published online on June 24, 2005
Protein Engineering Design and Selection 2005 18(7):329-335; doi:10.1093/protein/gzi037
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Computational modeling of type I collagen fibers to determine the extracellular matrix structure of connective tissues
1Biomimetics Technologies, Unit 101, 6 Fernwood Gardens, Toronto, Ontario M4K 2J9, Canada, 2Pittsburgh Safety and Health, Technology Center, Cochrans Mill Road, Pittsburgh, PA 15236, USA and 3University of Toronto, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario M5T 2S8, Canada
4 To whom correspondence should be addressed at the University of Toronto E-mail: herb.vonschroeder{at}uhn.on.ca
| Abstract |
|---|
|
|
|---|
A method is presented for generating computer models of biological tissues. The method uses properties of extracellular matrix proteins to predict the structure and physical chemistry of the elements that make up the tissue. The method begins with Protein Data Bank coordinate positions of amino acids as input into TissueLab software. From the amino acid sequence, a type I collagen-like triple helix backbone was computationally constructed and boundary spheres were added based on known chemical and physical properties of the amino acids. Boundary spheres determined the contact surface characteristics of the collagen molecules and intermolecular interactions were then determined by considering the relationships of the contact surfaces and by resolving the energy-minimum state using feasible sequential quadratic programming. From this, the software created fibrils that corresponded exactly to known collagen parameters and were further confirmed by finite element modeling. Computationally derived fibrils were then used to create collagen fibers and three-dimensional collagen matrices. By resolving the energy-minimum state, large complex components of the extracellular space as well as other structures can be determined to provide three-dimensional structure of molecules, molecular interactions and the tissues that they form.
Keywords: collagen/computation/extracellular matrix/modeling/tertiary structure
| Introduction |
|---|
|
|
|---|
Tissue engineering is an interdisciplinary field that applies the principles of engineering and the life sciences to the development of biological substitutes that restore, maintain or improve tissue function (Langer, 2000
Developing three-dimensional models of the ECM can define the mechanical properties of fibers and thereby characterize the functionality of the structure as a whole (Kumar and Cochran, 1987
). Owing to the unique orientation of fibers in the tissue, different structures and scenarios can be created (Hartgerink et al., 2001
) and tested. For example, in 1993, Sacks et al. described the geometric properties of the right ventricle of the heart (Sacks et al., 1993
). In the process they defined the functionality of the ventricle in such disorders as chronic obstructive pulmonary disease, congenital defects and congestive heart failure. They created a geometric model showing how the unique geometry affected the functionality of the whole organ.
Previous attempts to model tissue have utilized experimental data (Sacks et al., 1997
) and have applied differential equations to simulated tissues (Dallon et al., 1999
). Modeling has also incorporated finite element analysis to account for the physical stresses during structure formation (Vorp et al., 2001
). Towards this, computational chemistry is rapidly emerging as a subfield of theoretical chemistry, where the primary focus is on solving chemically related problems by calculations (Jensen, 1998
) that can be applied to tissue modeling. One of the principal tools in computational chemistry that is used for the theoretical study of biological molecules such as proteins is the method of dynamic simulations at the molecular level. The molecular dynamics method was first introduced by Alder and Wainwright in the late 1950s (Alder and Wainwright, 1957
, 1959
) and the first protein simulations appeared in 1977 with the simulation of the bovine pancreatic trypsin inhibitor (McCammon et al., 1977
). Molecular dynamics methods involve calculations of bonded and non-bonded energies between each atom in the system examined. Bonded energies include stretching, bending and torsional energies. Non-bonded energies include van der Waals energy, electrostatic energy and hydrogen bonding energy. Because of limited current computational ability and the enormous number of the atoms in a molecule, present molecular dynamics simulations allow for only a few molecules in their calculations and are therefore not applicable to large-scale systems, such as ECM structures and formation.
As such, greater computational load or alternative strategies are required for the large-scale system. The basic principle of computational chemistry is still applicable and feasible by focusing on the energy-minimum state, not on the dynamic process of the energy state, and by considering only the dominant energy that determines the structure. By doing so, the computational load can be greatly reduced. It was our purpose to model type I collagen fibrils of the ECM by using energy-minimum calculations. The type I collagen-like molecule was selected based on the repetition and regularity of its molecular structure to reduce the computational load further. Given the unit structure of the collagen molecules, the alignment of the ECM can be theoretically calculated to achieve the energy-minimum configuration among molecules.
Type I collagen was also chosen for modeling since it is abundant in many organs in the human body and has a critical role in ECM formation and function. A complete understanding of the basis for collagen stability (and instability) could lead to new biomaterials and effective therapies for disorders (Jenkins and Raines, 2002
). A collagen molecule is a left-handed helix consisting of three alpha chains (two alpha-1 and one alpha-2 chains); each alpha chain contains
1040 amino acids. The basic amino acid sequence of collagen is the repetition of three amino acids: GlyXY in which every third amino acid is a glycine (Gly). X and Y are typically proline; Y is typically modified to hydroxyproline in the polymer sequence after the collagen chain is built (Berg and Prockop, 1973
) to increase collagen stability (Bella et al., 1994
). Glycine has a small molecular size that fits perfectly inside the helix. Hydroxyproline is critical for collagen stability. Five collagen molecules are bound together to form a microfibril (Cowin, 2000
) as shown in Figure 1.
|
Basic knowledge and principles of the structure of the collagen-based ECM have been acquired from X-ray crystallography and NMR spectroscopy. However, detailed structure and modeling cannot be completed based on existing information. The diameter of collagen microfibril is 3.54 nm (Piez and Trus, 1981
291 nm) and the discrete gaps are 0.66 D (
44 nm) (Cowin, 2000| Methodology |
|---|
|
|
|---|
Computational determination and rendering of collagen fibers were based on the consideration that proteins are rigid spheres that are packed together. The distance geometry between molecules and residues is a functional component that can be mathematically predicted. Reducing the molecular volumes to reduce the computational load is achieved by (a) considering the protein backbone as a braid that is converted to a mathematical model and (b) surface modeling that is based on differential geometry determined by calculation of energy sections that will determine intermolecular distances and hence define assembly and packing of molecules.
Computational modeling involved the following steps:
- Protein Data Bank (PDB) file input.
- Determination of backbone structure of polypeptide or molecule.
- Representation of amino acid residues as spheres with radii based on known physical properties (polarity and charge) of residues (Gilbert, 1999
). Excluded volumes (Connolly, 1985
) are also represented. The assumption is that the residue volume can be treated as a boundary sphere around the actual structure of the residue as a result of the vibration of the atoms; the atoms in the structure spin about their bond-axes. The speed of the spin of the atoms about the bond-axes in the residue is much greater than the spinning of the molecule.
- Representation of excluded volumes of amino acid side chains as boundary spheres based on physical properties.
- Sequential quadratic programming is used to determine intermolecular interaction of closest pairs of residues. Calculations are based on the minimum total energy of interaction. Total energy is the sum of electrostatic, van der Waals and hydrogen bond energies. The assumptions are that the energy between the atoms, at the closest points on the surfaces of two residues examined, is much greater than the energy among other combinations of atoms. Also, the energy between the closest atoms is predominant in the amount of the energy between two residues. The predominant energy is sufficient for the determination of the overall energy-minimum alignment.
- Energy minima and residue dimensions determine intermolecular distances, stacking geometry and three-dimensional shape of final structure.
For computational modeling of type I collagen we utilized the model simplification method illustrated in Figure 2. The structure of the type I collagen-like molecule [(GlyProPro)10]3 (Berisio et al., 2002
) was provided computationally by ProteoRubix software that used a volume minimization approach to determine optimal positions of each amino acid in the protein (Campbell et al., 2003
). The initial structure consisting of 1000 amino acids was input into our modeling program as a PDB file (Figure 2a) and any PDB file can be entered. The backbones of each of three alpha chains were extracted (Figure 2b). The backbone was based on the regular structure resulting from linking the amino acids through amide bonds (Creighton, 1993
). It was created by (a) using appropriate geometry for shorter amide bonds (C
N 1.36 Å) in which resonance structure was a significant contributor compared with regular amide bonds (CN 1.47 Å) and (b) creating a barrier to CN bond rotation of
17 kcal/mol (72 kJ/mol) (Nemethy et al., 1992
). The three backbones were represented with a single backbone by averaging the nodes on three backbones (Figure 2c).
|
The residues present at the surface of the triple helix define the characteristics of the molecular surface and determine intermolecular interactions. The amino acid residues were represented as boundary spheres and were added to the single backbone (Figure 2d). Boundary spheres were defined by the known chemical and physical properties of the residues, such as hydrophobicity and electrostatic charge. It was assumed that the backbones of the amino acid chains do not contribute significantly to the force interaction with other molecules because the backbones sit inside the molecule.
For intermolecular interactions and fibril formation, this model considered the contact between surfaces of different molecules. The surface interactions were determined by the common boundary sphere interactions between adjacent molecules to determine the energy minimum configuration between molecules. The non-bonding energies between molecules consisted of electrostatic forces (Ecoul), van der Waals forces (Evdw) and hydrogen bonds (EH-bond). The minimization of the cost function of the total of non-bonding energy is the sum of the three and was used to determine energy-minimizing configurations between molecules in the model:
![]() |
Assuming that the minimization of volume and energy is the only principle for calculation and the molecule is a rigid object that has six degrees of freedom, the cost function was solved by determining and measuring distances for every combination of residues on the interacting collagen backbones. The distances increased the value of the cost function if the combination of the residues had an attracting force, such as the interaction between a hydrophobic and a hydrophilic residue. It decreased the value of the cost function if the combination of the residues caused a repelling force, such as the combination of a positively charged residue with another positively charged residue.
The cost function thus determined the energy minimum state as the triple helix moved with six degrees of freedom, which computationally represents a stable interaction and position between the collagen molecules (Lawrence et al., 1997
). Any given molecule can change its energy in any of its degrees of freedom. There is one degree of freedom per independent mode of motion, i.e. translational (rigid whole body movement in the x, y or z directions), rotational (whole body rotation around one or two independent axes) and vibrational (oscillation of bond length or angle). Orientation, however, becomes fixed as a result of the energy minimizing function. The backbone was fixed and therefore did not contribute to the degrees of freedom.
Total energy calculations [f(x) = Etot] were based on the following calculations for residue interactions based on inherent physical and chemical properties of the residues. As seen from the equations, energy (E) minima were a function of distances (r).
Electrostatic energy
Electrostatic energy describes the repulsion or attraction between atoms based on atomic charge (Gilbert, 1999
). Ecoul is calculated as
![]() |
0 is the dielectric constant. van der Waals interaction
van der Waals energy is the non-electrostatic energy that describes the repulsion or attraction between atoms that are not covalently bonded. To calculate van der Waals energy, the LennardJones potential was used:
![]() |
is the depth of the minimum parameter (Alder and Wainwright, 1957Hydrogen bond
A hydrogen bond is formed between hydrogen atoms bonded to electronegative atoms and negatively charged heteroatoms. A hydrogen bond (25 kcal/mol) is significantly stronger than van der Waals interactions (0.10.2 kcal/mol). The amino acids lysine, arginine and histidine all have one or more hydrogen-bond donor sites. Glutamate, aspartate and methionine have one or more possible hydrogen-bond acceptor sites. Glutamine, asparagine, threonine, serine, cysteine and tyrosine have both donor and acceptor sites. In cases in which residues have multiple hydrogen donors and acceptors, the number of donors and acceptors actively involved in hydrogen bonding to other residues depends on the relative position of the residues and the environment. To simplify the problem, we calculated only the energy of one hydrogen bond, occurring at the closest pair of the point on the surfaces of the donor and acceptor residue. We also assumed that the directional term in hydrogen bonding is negligible in such a large-scale simulation since the residues are fixed to the backbone. To calculate the energy, we adopted one commonly used function modified from the LennardJones potential:
![]() |
Finite element model construction
To confirm the structures created by our software, residue boundary spheres and backbones were converted to a finite element mesh to construct the surface of the collagen molecule (Oden and Carey, 1983
). This model also allowed for the consideration of the contact between surfaces of different molecules. Specifically, model surface interactions were determined by common node interactions between adjacent molecules. The nodes corresponded to the attraction/repelling interactions between various molecular, van der Waals and hydrophobic forces at corresponding residue positions. The finite element method is basically concerned with the determination of field variables. The most important ones are the stress and strain fields determined in CalculiX (Dhondt, 2004
). This software computationally utilizes Lagrangian strain tensor for elastic media, Eulerian strain tensor for deformation plasticity and deviatoric elastic left CauchyGreen tensor for incremental plasticity to determine a Cartesian coordinate systems. All CalculiX input (e.g. distributed loading) and output are in terms of true stress.
The collagen model from Tissue Lab provided the values for the energy nodes. Repulsive and attractive stresses were located at corresponding residue positions on the surface. The stress zones modeled the hydrophilic and hydrophobic areas of the collagen. The finite element acts to minimize and balance stress throughout the structure and modeled the final outcome of the redistribution. In the initial study the cylindrical mesh was made up from 20 node brick elements. There were 16 elements per cylinder. For the initial test of the software, the cylinder was fixed and acted as a cantilever only on its z edge; all other surfaces were free to move. A force was applied to the top of the cylinder at the +z edge. The force was directed in the y-direction and the stress contours for this situation were determined (Eringen, 1988
; Simo and Hughes, 1997
). This was described as a multi-dimensional nonlinear optimization problem (Kumar and Cochran, 1990
; Nawrocki et al., 1996
; Belytschko et al., 2001
) to determine optimal interaction for final modeling of the collagen.
| Results |
|---|
|
|
|---|
For collagen fibril formation, the software examined the energy function to identify the region holding the global minimum from other regions that may have held local minima. Thus the optimizer narrowed the search region stochastically. Figure 3 shows the plots of energy function with two types of translations and two types of rotations. The molecule used here was a common collagen-like structure [(GlyProPro)10]3 (Berisio et al., 2002
|
Through the optimization process, Figure 4a shows five collagen molecules forming a five-molecule-based fibril in agreement with the known structure and periodicity of collagen (Figure 1). The molecules are shown to be tightly aligned together. As the molecules formed fibrils, these were computationally linked, again by determining energy minima to build three-dimension fibers shown in Figure 4bd.
|
For confirmation of a stable three-dimensional structure our energy calculations, which were based on known data of thermal stability, were compared with known experimental data for the denaturation process involving over 30 residues (Voet and Voet, 1990
H is 40006500 kJ/mol; and entropy
S is 1421 kJ/mol. For the enthalpy
H, we used two main criteria for estimating the strength of the H-bond in the A-HÖÖB (backbone-residue) system: the A-H stretching frequency or its relative shift (
0-
)/
0 (where
0 is stretching frequency of the free A-H group) and the distances AH and AÖÖÖB. According to these criteria H-bonds are regarded as weak, intermediate and strong. For the OHÖÖÖ.O bonds this approximate classification is given in Table I. The length of H-bonds in collagen is
1.5 Å (Richards, 1979
|
For confirmation of our computer-generated model, we converted collagen molecules into finite element constructs (Figure 5a) with nodes based on those generated by our software. Interactions of the finite element molecules resulted in five molecules forming a fibril (Figure 5b), which was identical with our model (Figure 4a). Figure 5c shows a lateral view of the fibril in agreement with our structure the known structure and periodicity (Figure 1) of collagen.
|
| Conclusions |
|---|
|
|
|---|
The results show the successful implementation of the physical and chemical principles to build three-dimensional collagen fibrils and an ECM scaffolding. Although the structure of the type I collagen is well known theoretically, the actual measurements are difficult to obtain because the technical challenges of obtaining diffraction of fibrous proteins. Available diffraction measurements of collagen-like polypeptide [(ProProGly)10]3 obtained from crystals grown in a microgravity and having the highest resolution description of a collagen triple helix reported to date (Berisio et al., 2002
Although finite element analysis provided confirmation of fibril assembly, the applications of this technique to tissue modeling should generally be considered as rudimentary. Finite element analysis does not allow for robust calculations of interactions including the significant bending or torsional molecular changes that occur.
It must be kept in mind that our modeling was based on a collagen-like molecule (GlyProPro)n that is less complex and more stable than that which occurs in the ECM. This structure was chosen because it could be compared with known crystallographic data on the molecule. Although in situ collagen has a slightly differing amino acid sequence and interacts with several other molecules ranging from proteoglycans to other collagen types and hydroxyapatite, our model did mimic collagen with high accuracy including the formation of a microfibril consisting of five collagen molecules. Any PDB file can be used as input for modeling in our software. Further, our model can readily be modified to account for alterations in amino acids and intermolecular interactions, as well as interactions with other ECM molecules (Brodsky and Ramshaw, 1994
), to improve accuracy further. The use of type I collagen-like molecule as an example shows that our computational methods are an intriguing first approach towards analyzing complex tertiary and quaternary structures of proteins.
In addition to energy minima on intermolecular interactions and interactions with other ECM molecules, several other properties also predispose the orientation of collagen fibers, including stress vectors, which are related to the effect of mechanical loads on a forming tissue. These loads can give result in directional orientation of the fibers that form the tissue (Hughes and DeVore, 1980
) and which thereby provide specific structure and hence functionality to the tissue. Additional programming can be readily added to determine preferred fiber orientation(s) during fibril formation based on physical stress.
As a starting point, the molecule was initially assumed to be a rigid object; however, it was expected that one molecule will deform to fit tightly to the other. Local deformations were initially calculated after locating the energy-minimum state by optimization, without allowing the molecule to deform. This was done by modeling the functions between the energy in the local domain and the bending and torsional angles of the segment of the backbone of the molecule. The software effectively determined local deformation for tighter packing. Although the same problem simplification method may not be applicable for all molecules, the use of feasible sequential quadratic programming can be successfully used to determine tissue structure based on molecular configuration. As such, this modeling can be applied to computer-assisted tissue engineering, proteomics, protein folding-related diseases, effects of mutations and the interactions between proteins with ligands, substrates, DNA, pharmaceuticals and other proteins or peptides. Molecules with less symmetry do require additional programming and software steps for the increased complexity. Modeling has direct applications to investigate the fundamental question of how amino acid sequences of peptides change to functional tertiary or quaternary structures and the four-dimensional process that they undergo.
| References |
|---|
|
|
|---|
Alder,B.J. and Wainwright,T.E. (1957) J. Chem. Phys., 27, 12081209.[CrossRef]
Alder,B.J. and Wainwright,T.E. (1959) J. Chem. Phys., 31, 459466.[CrossRef]
Bella,J., Eaton,M., Brodsky,B. and Berman,H.M. (1994) Science, 266, 7581.
Belytschko,T., Liu,W.K. and Moran,B. (2001) Nonlinear Finite Elements for Continua and Structures. Wiley, New York.
Berg,R.A. and Prockop,D.J. (1973) Biochemistry, 12, 33953401.[CrossRef][Medline]
Berisio,R., Vitagliano,L., Mazzarella,L. and Zagari,A. (2002) Protein Sci., 11, 262270.[CrossRef][Web of Science][Medline]
Billiar,K.L. and Sacks,M.S. (1997) J. Biomech., 30, 753756.[CrossRef][Web of Science][Medline]
Brodsky,B. and Ramshaw,J.A. (1994) Int. J. Biol. Macromol., 16, 2730.[CrossRef][Web of Science][Medline]
Campbell,P.G., Cohen,A.P., Ernst,L.A., Ernsthausen,J., Farkas,D.L., Galbraith,W. and Israelowitz,M. (2003) US Patent Application 20030216867.
Connolly,M.L. (1985) J. Am. Chem. Soc., 107, 11181124.[CrossRef]
Cowin,S.C. (2000) J. Biomech. Eng., 122, 553569.[CrossRef][Web of Science][Medline]
Creighton,T.E. (1993) Proteins. Freeman, New York.
Dallon,J.C., Sherratt,J.A. and Maini,P.K. (1999) J. Theor. Biol., 199, 449471.[CrossRef][Web of Science][Medline]
Dhondt,G. (2004) The Finite Element Method for Three-Dimensional Thermomechanical Applications. Wiley, Hoboken, NJ.
Eringen,A.C. (1988) Mechanics of Continua. Robert E. Krieger, Huntington, NY.
Gilbert,H.F. (1999) Basic Concepts in Biochemistry. McGraw-Hill, New York.
Hartgerink,J.D., Beniash,E. and Stupp,S.I. (2001) Science, 294, 16841688.
Hughes,K.E. and DeVore,D.P. (1980) US Patent 4 242 291.
Jenkins,C.L. and Raines,R.T. (2002) Nat. Prod. Rep., 19, 4959.[CrossRef][Web of Science][Medline]
Jensen,F. (1998) Introduction to Computational Chemistry. Wiley, Hoboken, NJ.
Kumar,K. and Cochran,J.E.,Jr (1987) J Appl. Mech., 54, 893903.
Kumar,K. and Cochran,J.E.,Jr (1990) J Appl. Mech., 57, 10011003.
Langer,R. (2000) Mol. Ther., 1, 1215.[CrossRef][Web of Science][Medline]
Lawrence,C.T., Zou,J.L. and Tits,A.L. (1997) User's Guide for CFSQP Version 2.5: a C Code for Solving (Large Scale) Constrained Nonlinear (Minimax) Optimization Problems, Generating Iterates Satisfying All Inequality Constraints. Technical Report TR-94-16r1. Institute for Systems Research, University of Maryland, College Park, MD.
McCammon,J.A., Gelin,B.R. and Karplus,M. (1977) Nature, 267, 585590.[CrossRef][Medline]
Nawrocki,A., Labrosse,M. and Dubigeon,S. (1996) Presented at the Engineering Systems Design and Analysis Conference, Montpellier.
Nemethy,G., Gibson,K.D., Palmer,K.A., Yoon,C.N., Paterlini,G., Zagari,A., Rumsey,S. and Schegara,H.A. (1992) J. Phys. Chem., 96, 64726484.[CrossRef]
Oden,T.J. and Carey,G.F. (1983) Finite Elements: Computational Aspects. Prentice Hall, Englewood Cliffs, NJ.
Panier,E.R. and Tits,A.L. (1993) Math Programming, 59, 261276.[CrossRef]
Piez,K.A. and Trus,B.L. (1981) Biosci. Rep., 1, 801810.[CrossRef][Web of Science][Medline]
Richards,F.M. (1979) Carlsberg Res. Commun., 44, 4763.[Web of Science]
Sacks,M.S., Chuong,C.J., Templeton,G.H. and Peshock,R. (1993) Ann. Biomed. Eng., 21, 263275.[CrossRef][Web of Science][Medline]
Sacks,M.S., Smith,D.B. and Hiester,E.D. (1997) Ann. Biomed. Eng., 25, 678689.[Web of Science][Medline]
Simo,J.C. and Hughes,T.J.R. (1997) Computational Inelasticity. Springer, New York.
Voet,D. and Voet,J.G. (1990) Biochemistry. Wiley, Hoboken, NJ.
Vorp,D.A., Steinman,D.A. and Ethier,C.R. (2001) Comput. Sci. Eng., 3, 5164.
Zhou,J.L. and Tits,A.L. (1993) J. Optimiz. Theory Appl., 76, 455476.[CrossRef]
Received December 23, 2004; revised May 14, 2005; accepted May 17, 2005.
Edited by Albert Berghuis
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
Y. Tang, R. Ballarini, M. J. Buehler, and S. J. Eppell Deformation micromechanisms of collagen fibrils under uniaxial tension J R Soc Interface, November 6, 2009; (2009) rsif.2009.0390v1. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||









