Protein Engineering vol. 16 no. 10 pp. 717-720, 2003
© 2003 Oxford University Press
On the rotational operators in protein structure simulations
1Biomedical Engineering Program and 2Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269-3139, USA
3 To whom correspondence should be addressed. e-mail: kazem{at}engr.uconn.edu
| Abstract |
|---|
|
|
|---|
The reduction of the computational complexity of the algorithms dealing with protein structure analysis and conformation predictions is of prime importance. One common element in most of these algorithms is the process of transforming geometrical information between dihedral angles and Cartesian coordinates of the atoms in the protein using rotational operators. In the literature, the operators used in protein structures are rotation matrices, quaternions in vector and matrix forms and the RodriguesGibbs formula. In the protein structure-related literature, the most widely promoted rotational operator is the quaternions operator. In this work, we studied the computational efficiency of the mathematical operations of the above rotational operators applied to protein structures. A similar study applied to protein structures has not been reported previously. We concluded that the computational efficiency of these rotational operators applied to protein chains is different from those reported for other applications (such as mechanical machinery) and the conclusions are not analogous. Rotation matrices are the most efficient mathematical operators in the protein chains. We examined our findings in two protein molecules: Ab1 tyrosine kinase and heparin-binding growth factor 2. We found that the rotation matrix operator has between 2 and 187% fewer mathematical operations than the other rotational operators.
Keywords: protein/rotation/simulations/transformation
| Introduction |
|---|
|
|
|---|
The computer algorithms used for the analysis and simulation of protein structures are computationally intensive. These programs require a set of generalized coordinates to define specific three-dimensional geometrical structures. The two commonly used sets of generalized coordinates are (i) the Cartesian coordinates of the atoms in the protein molecule as the generalized coordinates (similar to PDB file formats) and (ii) the dihedral angles. The application of dihedral angles has the advantage of reducing the dimension of search space drastically. However, in almost all cases, transforming the geometrical information back and forth between dihedral angles (as generalized coordinates) and Cartesian coordinates of the atoms in the chain is needed. In other words, given the dihedral angles we calculate the x, y and z coordinates of all atoms in the macromolecules (and vice versa). This transformation mainly relies on what is referred to as a rotation operator.
A rotation operator is the mathematical means of rotating a point (e.g. atom) in space by a given angle. Protein structures can be described as a serial combination of such rotations (Branden and Tooze, 1999
), which are called open serial chains. An analogy of an open serial chain is the body of a snake, where the vertebras (links) can move relative to each other about their joints. The representative structural description of a proteins open serial chain is shown in Figure 1.
|
Every time that we calculate the position of atoms (or body vectors) of protein structures, a sequential composition (multiplication) of several rotational operators is required. This process is computationally very expensive.
Several rotational operators have been developed over the past few decades. These methods are widely reported in the field of analytical and theoretical kinematics. The rotational operators used in protein simulation literature are the quaternion descriptions (Jain et al., 1993
; Kneller and Hinsen, 1994
), rotation matrices (Dullweber et al., 1997
), transformations (Manocha et al., 1995
) and RodriguesGibbs [similar to the vector rotation approach found in Güntert (Güntert, 1993
)]. The most widely used method in the modeling of molecular systems is the quaternions in its matrix form.
Our intention in this work was to study and compare the number of operations required in the application of (i) the rotation matrix, (ii) the quaternions vector form, (iii) the quaternions in matrix form, and (iv) the Rodrigues formula to rotate atoms in protein chains. The operator that requires the fewest operations decreases the computational costs of molecular simulations. A reduction in unnecessary computationally expensive calculations improves the computational efficiency of computer simulations.
Funda and Paul performed a similar computational comparison of rotational operators (excluding the Rodrigues formula) (Funda and Paul, 1988
). Their conclusion, however, is only true for small mechanical chains. They only considered a sequential chain of two revolute joints and assumed that the result increases linearly with the number of joints.
In this study, we not only calculated the number of operations of two compositions, but also provided a general formulation that accounts for a larger number of compositions such as are found in protein chains. We have shown that the conclusions vary significantly as the number of joints increases (as is the case in protein structures) and therefore the results reported previously are no longer valid. We examine our findings with two examples: Abl tyrosine kinase and heparin-binding growth factor 2.
| Methods |
|---|
|
|
|---|
Evaluation of rotational operators in protein structures is of prime importance for selecting the least computationally expensive operator in protein simulations. This study compares several rotational operators, the rotation matrix, quaternions (in both vector and matrix form) and Rodrigues formula, to decide which one requires the least number of operations. This by no means suggests that these are the only rotation operators, but they are the methods that have been widely used in molecular simulations.
First, the different operators are introduced and their respective number of operations applied to protein chains. Then, we show the number of operations required to rotate atoms or vectors of two proteins: Abl tyrosine kinase (62 amino acids) and heparin-binding growth factor 2 (155 amino acids).
Rotation matrix
The rotation matrix operator definition to rotate a vector in space about a screw axis, u, by an angle,
, is (Ha and Radcliffe, 1978
)
where C
and S
represent cos
and sin
, respectively.
Evaluation of the rotation matrix requires 15 multiplications (*), 10 additions (+) and two function evaluations (f). To determine the rotation of a single body vector about a unit screw axis by an angle
, the following relation is used:
This operation requires 9 (*) and 6 (+) for multiplying a rotation matrix and a vector. In open serial chains, there is a sequential multiplication of rotational operators. To determine the rotation of the last vector (see Figure 1), we have a composition of rotation operators:
The composition of two rotational operators accounts for 27 (*) and 18 (+). Again, referring to Figure 1, vector bM1 is affected by the same joints as bM, so its new position is defined as follows:
Vector bM2 is not affected by the joint at uN, so the equation becomes
This can be applied to all the vectors in the kinematic chain defining the position of the vectors. The general equation describing the total number of operations [assuming that (*) and (+) have the same computational cost (Field, 1999
)] to perform rotation of vectors in open serial chains is
where N is the number of degrees of freedom, M is the number of amino acids and Mj is the number of vectors to operate per amino acid (in fact the number of atoms per amino acid). In proteins
Mj is greater than N or, in other words, there are more atoms than degrees of freedom. (Note: only the dihedral angle contributions are considered, not the rotameter angles.)
Quaternion vector operator form
A quaternion/vector operator (Bottema and Roth, 1979
) that determines the rotation of a single body vector can be defined as
where
is a quaternion number that has four components as follows:
[3 (*)]. Evaluation of this rotational operator gives 22 (*), 19 (+) and 2 (f). The position of vector bM of Figure 1 is
Sequential multiplication of two quaternions has 16 (*) and 12 (+). The same joints affect vector bM1 compared with bM, so its new position is defined as follows:
Vector bM2 is not affected by the joint at uN, so the equation becomes
This can be applied to all the vectors in the open serial chain defining the position of the vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is
Quaternions in matrix form
In addition to the quaternion vector form, there is a matrix form of the quaternion. This form can be from the rotation matrix (Sheperd, 1978
) and is shown below:
where, q0 = cos (
/2), q1 = ux sin (
/2), q2 = uy sin (
/2), and q3 = uz sin (
/2).
Evaluation of operations of this matrix gives 16 (*), 13 (+) and 2 (f). This operator now acts in the same way as the rotation matrix. The equation below shows how this quaternion matrix operates in a different way with composition of rotations (see Figure 1):
The sequential composition of two rotational operators accounts for 27 (*) and 18 (+). The same joints affect vector bM1 as compared with bM, so its new position is defined as follows:
Vector bM2 is not affected by the joint at uN, so the equation becomes
This can be applied to all the vectors in the kinematic chain defining the position of the vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is
Other quaternion formulations can be found in the literature (Horn, 1987
), but they are similar to the quaternion vector form.
Rodrigues
The RodriguesGibbs formulation [derived from Bottema and Roth (Bottema and Roth, 1979
) or Goldstein (Goldstein, 1980
)] is a screw transformation in terms of a general coordinate system:
This accounts for 21 (*), 12 (+) and 2 (f) for each rotation operation. Again, we show how this method could be applied to successive N rotations and Mj vectors (see Figure 1). For example, let us define the rotation of the body vector, b5, which is affected by the first and second joints (two rotations, one vector):
which accounts for 21 (*), 12 (+), 2(f)];
which accounts for 15 (*), 11 (+). This accounts for a total of 36 (*), 23 (+) and 2 (f). The RodriguesGibbs formula differs from the rotation matrix and quaternion operators in that we have to substitute one vector into the other to obtain its final position. This approach can be applied to all the vectors in the kinematic chain defining the position of all the body vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is
In order to identify which method is the least computationally expensive, it will be helpful to show examples that require the operation of various vectors and joints (or degrees of freedom). In the next section we consider two protein examples, which meet the requirements mentioned before, to understand which of the methods is more computationally efficient.
Rotational operators applied to proteins
In this section, the number of operations to rotate the atoms of proteins Abl tyrosine kinase and heparin-binding growth factor 2 is determined using Equations 14. Kinase is an enzyme that attaches phosphate groups to organic substrates that has 62 amino acids: MNDPNLFVALYDFVASGDNTLSITK GEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS. Keratin is a protein responsible for the structural skin surface, hair, nails, etc. It has 155 amino acids: MAAG SITTLPALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGVSANRYLAMKEDGRLLASKSVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAILFLPMSAKS.
To determine the number of operations we (i) identify atoms affected by a rotation(s), because not all atoms are affected by all dihedral angles; and (ii) substitute corresponding N rotation(s) (dihedral angles), M amino acids and Mj vectors (atoms) in the four derived equations (14). Table I summarizes the number of computations required by each method to describe the position of atoms.
|
| Results |
|---|
|
|
|---|
Table I show that the rotation matrix requires a smaller number of operations than the other methods. The percentage difference in the number of mathematical operations of the rotational operators with respect to the rotation matrix is also given. The quaternion in matrix form is the closest to the rotation matrix with 2% more mathematical operations than the rotation matrix. The RodriguesGibbs formulation requires the largest number of operations with 187% more mathematical operations than the rotation matrix.
| Discussion |
|---|
|
|
|---|
In order to understand why the rotation matrix formulation requires the least number of operations in protein structures, let us take a closer look at the equations for each rotation operator. If Mj (number of atoms per amino acid) is larger that N (dihedral angles) the rotation matrix is computationally the least expensive as compared to the other rotational operators. In the case that N is greater than Mj, then it could be possible for the quaternion matrix to be less computationally expensive than the rotation matrix. However, this situation does not arise in the protein chain because Mj is always greater than N.
In addition, vector operators, such as the quaternions in vector form and the RodriguesGibbs formulation, are more computationally expensive for protein chain descriptions. Both the rotation matrix and the quaternions in matrix form require significantly fewer mathematical operations than the vector operators.
| References |
|---|
|
|
|---|
Bottema,O. and Roth,B. (1979) Theoretical Kinematics. Dover, New York, pp. 5662 and 518525.
Branden,C. and Tooze,J. (1999) Introduction to Protein Structure, 2nd edn. Garland Publishing, New York.
Dullweber,A., Leimkuhler,B. and McLachlan,R. (1997) J. Chem. Phys., 107, 58405851.[CrossRef]
Field,M. (1999) A Practical Introduction to the Simulation of Molecular Systems. Cambridge University Press, Cambridge.
Funda,J. and Paul,R. (1988) Presented at the IEEE International Conference on Robotics and Automation, April 2429, 1988, Philadelphia, PA.
Goldstein,H. (1980) Classical Mechanics. AddisonWesley, Reading, MA, Ch. 4.
Güntert,P. (1993) Dissertation, ETH, Zurich, No. 10135.
Ha,C. and Radcliffe,C.W. (1978) Kinematics and Mechanism Design. Wiley, New York, pp. 4551.
Horn,B.K.P. (1987) J. Opt. Soc. Am. A, 4, 629642.
Jain,A., Vaidehi,N. and Rodriguez,G. (1993) J. Comput. Phys., 106, 258268.[CrossRef]
Kernighan,R. and Pike,R. (1999) The Practice of Programming. AddisonWesley, Reading, MA.
Kneller,G. and Hinsen,K. (1994) Phys. Rev. E, 50, 15591564.[CrossRef]
Manocha,D., Zhu,Y. and Wright,W. (1995) Comput. Appl. Biol. Sci., 11, 7186.
Sheperd,S. (1978) AIAA J. Guidance Control Dyn., 1, 223224.
Received June 6, 2003; revised July 17, 2003; accepted August 20, 2003.
![]()
CiteULike
Connotea
Del.icio.us What's this?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
