Protein Engineering vol. 16 no. 11 pp. 819-829, 2003
© 2003 Oxford University Press
Homology modeling of the central catalytic domain of insertion sequence ISLC3 isolated from Lactobacillus casei ATCC 393
Department of Life Science, National Tsing Hua University, Hsinchu 30043, Taiwan
1 To whom correspondence should be addressed. e-mail: thlin{at}life.nthu.edu.tw
| Abstract |
|---|
The tertiary structure of the central catalytic domain of insertion sequence ISLC3 isolated from Lactobacillus casei ATCC 393 was predicted using the homology modeling approach. The novel insertion sequence was isolated by us from the temperate bacteriophage A3 of L.casei ATCC 393. The central catalytic domain of ISLC3, comprising 116 amino acid residues, was treated as the query sequence. Five Web-available threading methods were used to find primary structure templates for the query sequence. These primary templates were further screened using the SWISS-MODEL Protein Modeling Server with its default parameter settings to give six final structure templates. All of these final structure templates were integrase (IN) proteins of retroviruses. Multiple sequence alignment of these IN sequences against the query sequence revealed the signature DDE motif. Based on the structures of these final templates, the structure of the query sequence was constructed using the InsightII/Discover/Homology programs. A metal ion, Mg2+, was inserted into the center of the putative catalytic pocket formed by the DDE residues of the predicted structure in the final rounds of refinement by molecular dynamics (MD) simulations. The structure with a metal ion included was designated withMg and that without a metal ion was designated freeMg. The average exposed surface area of some hydrophobic residues of both the predicted freeMg and withMg structures was computed and compared with that computed for the six structure templates. Although the predicted withMg structure was slightly more exposed than the predicted freeMg structure, it appeared to be the more stable of the two, as revealed by the lower conformation energy recorded for it during the structure refinement by MD simulations. To verify the predicted structures further, the coordinates of both were submitted to the ERRAT Protein Verification Server. The quality of the predicted withMg structure was found to be much better than that of the freeMg structure. The validation results also indicated that 20% of the predicted withMg structure could be rejected at the 95% confidence level, whereas 10% could be rejected at the same level for the six structure templates. The predicted withMg structure was also docked into a short oligonucleotide representing the substrate of the ISLC3 transposase using the DOCK_4.0.2 program. Both the Glu104 and Asp68 residues of the DDE motif of the predicted withMg structure were able to form hydrogen bonds with the DNA substrate, which was similar to what was observed in a docking study using the retrovirus IN 1asu and its DNA substrate.
Keywords: insertion sequence/homology modeling/protein structure/threading
| Introduction |
|---|
Insertion sequences (ISs) are mobile DNA elements capable of mediating various types of DNA rearrangements such as transposition, deletion, inversion and cointegration. They are usually 0.8–2.5 kb long and encode a transposase protein. To date, more than 600 IS elements have been isolated from both eubacteria and archaea. Except for those highly similar variants from the same or related hosts, IS elements are considered heterogeneous at the nucleotide sequence level. Many can be grouped into families on the basis of conservation of motifs in their putative transposase amino acid sequences and their terminal nucleotide sequences. The IS3 family is one of the largest families (Mahillon and Chandler, 1998).
Until recently, the ISs isolated from lactobacilli were still rare. Two ISs, ISL2 and ISL3, were isolated as factors influencing lactose utilization of Lactobacillus helveticus (Zwahlen and Mollet, 1993) and L.bulgaricus (Germond et al., 1993), respectively. ISL2 was isolated as an insertion in a spontaneous lactose-negative mutant (Zwahlen and Mollet, 1993) and ISL3 was isolated from a deletion-prone region following the lacZ gene (Germond et al., 1993). The insertion element IS1223 was discovered in plasmid pSA3 resolution products recovered from transconjugants in L.johnsonii (Walker and Klaenhammer, 1993). Transpositional activity of IS1163 was found to abolish the lactocin S production of L.sake by integration into the lactocin S operon (Skaugen and Nes, 1994). Recently, we have isolated a novel insertion element ISLC3 (AF445084) from a temperate bacteriophage A3 of L.casei ATCC 393. The new IS was classified as a member of the IS3 family. We found that the transposition of ISLC3 created some circles on which 25 bp from the left-inverted repeat in the junction region were deleted. This unusual deletion was observed in L.casei ATCC 393 and also in an Escherichia coli model system. The unusual deletion in the junction region also generated a promoter which was much stronger in activity than the indigenous one. Here, we used several theoretical structure prediction methods to derive a three-dimensional (3D) structure model for the catalytic domain of the newly identified insertion element ISLC3. Our work was aimed at providing deeper insight into the structural features governing the catalytic activity exhibited by the newly identified transposase sequence. We used five threading methods available on the Web to find structure templates for the central catalytic domain of ISLC3. Further screening and refinement of the predicted 3D structures were conducted using the InsightII/Discover/Homology programs (BIOSYM Technology, San Diego, CA) or the SWISS-MODEL Protein Modeling Server (Guex and Peitsch, 1997) on the Web. To compare the predicted 3D structures with the known structures of some INs and transposases, an Mg2+ cofactor was placed in the center of the predicted catalytic domain in the final structure refinement using molecular dynamics (MD) simulations. The predicted structures with and without the cofactor Mg2+ included were designated the withMg and freeMg structures, respectively, and both were validated using several methods. A docking study in which the predicted withMg structure was docked into a short oligonucleotide representing the substrate of the ISLC3 transposase was also conducted. It was found that Glu104 and Asp68 of the DDE motif of the predicted withMg structure could form hydrogen bonds with the DNA substrate, similar to what was observed on docking the retrovirus IN 1asu into its DNA substrate.
| Materials and methods |
|---|
The insertion element ISLC3 (AF445084) was identified by us from a novel temperate bacteriophage A3 (unpublished data). The novel temperate bacteriophage was induced from L.casei ATCC 393 with 0.2 µg/ml mitomycin-C. The ISLC3 sequence was found to be inserted into the coding region of the putative phage structure genes. The complete sequence of ISLC3 was determined and is shown in Figure 1. The total length of ISLC3 was 1351 bp and both ends of the element were flanked by 37 bp inverted repeats (IRs). Apparently, there were two open reading frames, orfA (from 82 to 333 bp) and orfB (from 387 to 1229 bp) on the ISLC3 sequence, which were in phase 0 and 1 (Sekine and Ohtsubo, 1989
70-like promoter (Bao et al., 1997
The five Web-available threading methods used to find the structure templates for the ISLC3 central catalytic domain were as follows: TOPITS (http://dodo.cpmc.columbia.edu/predictprotein/); HMM (http://www.cse.ucsc.edu/research/compbio/HMM-apps/HMM-applications.html); 3D-JIGSAW (http://www.bmm.icnet.uk/3djigsaw/); 3D-PSSM (http://www.sbg.bio.ic.ac.uk/3dpssm/); and HFR (http://www.cs.bgu.ac.il/bioinbgu/). TOPITS (Rost, 1995
To construct a structure template for the query sequence, we used the InsightII/Discover/Homology programs implemented on a Silicon Graphics computer. Sequences of the six INs were aligned against the query sequence to find regions where the structures of these proteins were most closely matched. The matched structures were taken as the structures of the corresponding regions of the query sequence. Loop searching of the Protein Data Bank (Berman et al., 2002) then yielded the missing fragments of the query sequence. Residues showing bad contacts were replaced with their rotamers and also manually adapted. One round of minimization (300 steps of steepest descent plus 500 steps of conjugate gradient) was performed while keeping the conserved residues restrained to their initial positions in order to relax the loops and bad contacts. The tertiary structure constructed for the query sequence was refined using the InsightII/Discover/Discover3 MD simulation programs with the consistent valence force field. The protein was held inside a box of water molecules and the temperature was kept at 298 K during the MD simulation runs. The cofactor Mg2+ was added in the final round of structure refinement by MD simulation. The cofactor Mg2+ was initially placed in the center of a triangle formed by the C-atoms of the three central catalytic residues, namely Asp7, Asp68 and Glu104 (Figure 2). The secondary structure of each protein was defined using the Kabsch–Sander DSSP program (Kabsch and Sander, 1983) implemented in the SYBYL 6.9 package (Tripos Associates, St. Louis, MO). The exposed molecular surfaces of all the known or predicted structures were computed using the Connolly MS program (Connolly, 1983) with a probe size of 1.4 Å and all the structures were displayed using the Kraulis MolScript v2.1 program (Kraulis et al., 1994).
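The initial placement of the cofactor described above can be illustrated with a minimal sketch. It assumes that "center of a triangle" means the centroid of the three atom positions; the function name `triangle_center` and the coordinates are hypothetical and are not taken from the predicted structure.

```python
from typing import Tuple

Point = Tuple[float, float, float]


def triangle_center(a: Point, b: Point, c: Point) -> Point:
    """Return the centroid of the triangle with vertices a, b and c."""
    return tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))


# Hypothetical atom positions (in angstroms) standing in for the three
# catalytic residues; these are illustrative values, not model coordinates.
asp7 = (10.0, 4.0, 2.0)
asp68 = (13.0, 7.0, 2.0)
glu104 = (10.0, 7.0, 5.0)

mg_start = triangle_center(asp7, asp68, glu104)
print(mg_start)  # (11.0, 6.0, 3.0)
```

The centroid is only a starting position; in the procedure described here the ion is subsequently relaxed together with the protein during the MD refinement.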
The DOCK_4.0.2 program (Kuntz et al., 1994) was used for docking both the predicted withMg and retrovirus IN 1asu structures into their corresponding DNA substrates. The sequences of these two DNA substrates are displayed in Figure 9. These were rigid docking runs in which the charges on the DNAs and proteins were added using the Amber95_All parameters in the SYBYL 6.9 program. A B-form DNA was used as the starting conformation for each DNA substrate. With the conformation of the DNA fixed after the rigid docking process, the position of the DNA was further adjusted to be close to that of the DDE motif of the protein structure using the Swiss-PdbViewer_3.7 program. The docked and adjusted structure of the DNA–protein complex was further refined by MD simulation runs using the InsightII/Discover program. The structure of the DNA–protein complex was solvated in five layers of water and the Amber force field was employed. The DNA–protein structure complexes were briefly energy minimized and then subjected to 104 steps of MD simulation runs.
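One simple way to screen a refined DNA–protein complex for candidate hydrogen bonds is a distance cutoff between polar atoms of the two partners. The sketch below is an illustration only: the 3.5 Å cutoff, the function name `close_polar_contacts` and all atom labels and coordinates are assumptions, and a distance test alone ignores the donor–hydrogen–acceptor angle that stricter criteria also require.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def close_polar_contacts(
    protein_atoms: Dict[str, Coord],
    dna_atoms: Dict[str, Coord],
    cutoff: float = 3.5,  # assumed distance cutoff in angstroms
) -> List[Tuple[str, str, float]]:
    """List protein/DNA polar atom pairs no further apart than the cutoff."""
    contacts = []
    for p_name, p_xyz in protein_atoms.items():
        for d_name, d_xyz in dna_atoms.items():
            distance = math.dist(p_xyz, d_xyz)
            if distance <= cutoff:
                contacts.append((p_name, d_name, round(distance, 2)))
    # Shortest contacts first; ties keep their insertion order.
    return sorted(contacts, key=lambda contact: contact[2])


# Hypothetical polar atoms; the labels and coordinates are illustrative only.
protein = {"protein_O1": (0.0, 0.0, 0.0), "protein_O2": (6.0, 0.0, 0.0)}
dna = {"dna_N1": (3.0, 0.0, 0.0), "dna_O1": (6.0, 2.8, 0.0)}

for contact in close_polar_contacts(protein, dna):
    print(contact)
```

Pairs that pass such a screen would still need to be inspected in a molecular viewer before being reported as hydrogen bonds.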
| Results and discussion |
|---|
All the Web servers used found some structure templates with significant scores for the query sequence except TOPITS (Rost, 1995
The template structures found were compared by superposing the coordinates of the Cα-atoms of each structure on to each other using the SYBYL Fit module (Tripos Associates) and the results are presented in Table II. The difference in structural features between these template structures was small, since the root-mean-square deviation (r.m.s.d.) values computed between them were low, as can be seen from the table. The structures were further compared using MolScript (Kraulis et al., 1994
one by the Kabsch and Sander DSSP program (Kabsch and Sander, 1983
and
of templates 1asu (Bujacz et al., 1995
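The pairwise r.m.s.d. comparison summarized in Table II can be illustrated with a short sketch. It assumes that the structures have already been superposed (for example by a fitting program) and that each coordinate list contains the same equivalenced atoms in the same order; the function names and the toy coordinates are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-superposed atom lists."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(squared / len(a))


def pairwise_rmsd(
    structures: Dict[str, Sequence[Coord]]
) -> Dict[Tuple[str, str], float]:
    """Compute the r.m.s.d. for every unordered pair of named structures."""
    names = sorted(structures)
    return {
        (m, n): rmsd(structures[m], structures[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Toy two-atom "structures"; real input would be the equivalenced atoms of
# each template after superposition.
toy = {
    "templateA": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "templateB": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
}
print(pairwise_rmsd(toy))  # {('templateA', 'templateB'): 1.0}
```

This sketch does not perform the optimal superposition itself; it only evaluates the deviation once the fit has been done.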
The divalent metal ion requirements of HIV-1 IN have been investigated by Engelman and Craigie (Engelman and Craigie, 1995
The solvent-exposed molecular surface of a non-polar side chain has been used as a criterion to discriminate native proteins and incorrectly folded models. A direct numerical measure of burial of the non-polar atoms is the non-polar/polar side chain surface area ratio, with values of 2.0–2.2 and greater indicating incorrect folding in the case of hemerythrin and the VL domain (Novotny et al., 1988
20% while that for the latter was 10%.
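The non-polar/polar side chain surface area ratio mentioned above can be sketched as follows. This is a simplified, residue-level illustration: the set of residues treated as non-polar, the function name and the example areas are all assumptions, and an atom-level classification of side chain atoms would be needed for a faithful calculation.

```python
from typing import Iterable, Tuple

# Assumed residue-level classification; an atom-level split is more rigorous.
NONPOLAR_RESIDUES = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}


def nonpolar_polar_ratio(side_chain_areas: Iterable[Tuple[str, float]]) -> float:
    """Ratio of exposed non-polar to exposed polar side chain surface area.

    side_chain_areas holds (residue name, exposed side chain area) pairs.
    """
    nonpolar = 0.0
    polar = 0.0
    for residue, area in side_chain_areas:
        if residue.upper() in NONPOLAR_RESIDUES:
            nonpolar += area
        else:
            polar += area
    if polar == 0.0:
        raise ValueError("no exposed polar side chain surface to compare against")
    return nonpolar / polar


# Hypothetical exposed areas (square angstroms), for illustration only.
example = [("LEU", 30.0), ("VAL", 20.0), ("ASP", 25.0), ("LYS", 25.0)]
print(nonpolar_polar_ratio(example))  # 1.0
```

A ratio computed this way could then be compared against the 2.0–2.2 range quoted above for incorrectly folded models.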
The predicted withMg structure was further validated and refined by a biochemical approach by docking the structure into a short oligonucleotide (Figure 9) representing the substrate of the ISLC3 transposase. As a control, the ASV IN 1asu was also docked into its short DNA substrate (Figure 9). These were rigid docking processes where the parameter maximum_orientations was set at 3000 in the DOCK_4.0.2 program (Kuntz et al., 1994
Conclusion
The accuracy of comparative modeling depends strongly on the degree of homology between sequences of the query and the templates on which the model is built. The structure model we present here for the central catalytic domain of ISLC3 transposase may be categorized as of low accuracy (Pieper et al., 2002) since the model is based on a sequence identity of only 30%. However, the prediction accuracy is greatly enhanced by inserting a metal ion into the predicted structure in the final steps of structural refinement. Although both the transposase and IN protein family carry the signature DDE motif, there are substantial mechanistic and structural differences within this protein family. It has been noted that domains outside the catalytic core are not highly conserved among many transposases (Haren et al., 1999). While a tetramer of IN has been proposed to be required for the integration activity of the protein, the self-association properties of transposases are complex and still poorly understood (Haren et al., 1999). The crystal structure of the Inh protein of IS50, a regulatory derivative of the transposase lacking the first 55 amino acids, has recently been determined (Davies et al., 1999). Here, too, the DDE triad (D119, D188 and E326) forms a distinct catalytic pocket with a similar fold to that found in IN, although the sequence homology between them is low (Davies et al., 1999). The importance of the DDE residues has been demonstrated by site-directed mutagenesis for several IN proteins plus the transposases of bacteriophage Mu (Baker and Luo, 1994), Tn7 (Sarnovsky et al., 1996), IS10 (Junop and Haniford, 1997), Tc1/3 (Vos and Plasterk, 1994) and IS911 (Haren, 1998). Many of these results can now be understood from the known structure of the IN catalytic domains. It is also known that both the catalytic domain of the IN/transposase group and of other enzymes that promote phosphoryl transfer reactions, notably RNaseH (Grindley and Leschziner, 1995) and the RuvC resolvase (Rice et al., 1996), exhibit similar topologies. Therefore, the model presented here can serve as a guide for the allocation of amino acid residues of importance for further investigations or for the further refinement of the models of the ISLC3 central catalytic domain.
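The sequence identity figure that underlies the accuracy category discussed above can be computed from a pairwise alignment. The sketch below is one possible definition, stated as an assumption: identity is counted over columns in which neither sequence has a gap. The function name and the toy alignment are hypothetical, and other denominators (for example the shorter sequence length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment for illustration; not taken from the ISLC3 alignment.
print(percent_identity("ACDE-G", "ACNEFG"))  # 80.0
```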
| Acknowledgement |
|---|
This work was supported in part by a grant from the National Science Council, Taiwan (NSC91-2313-B007-001).
| References |
|---|
Baker,T.A. and Luo,L. (1994) Proc. Natl Acad. Sci. USA, 91, 6654–6658.
Bao,T.H., Betermier,M., Polard,P. and Chandler,M. (1997) EMBO J., 16, 3357–3371.
Bates,P.A., Kelley,L.A., MacCallum,R.M. and Sternberg,M.J.E. (2001) Proteins: Struct. Funct. Genet., Suppl 5, 39–46.
Berman,H.M. et al. (2002) Acta Crystallogr., D58, 899–907.
Bujacz,G., Jaskolski,M., Alexandratos,J., Wlodawer,A., Merkel,G., Katz,R.A. and Skalka,A.M. (1995) J. Mol. Biol., 253, 333–346.
Bujacz,G., Alexandratos,J., Qing,Z.L., Clement-Mella,C. and Wlodawer,A. (1996) FEBS Lett., 398, 175–178.
Cai,M., Zheng,R., Caffrey,M., Craigie,R., Clore,G.M. and Gronenborn,A.M. (1997) Nat. Struct. Biol., 4, 567–577.
Colovos,C. and Yeates,T.O. (1993) Protein Sci., 2, 1511–1519.
Connolly,M.L. (1983) Science, 221, 709–713.
Davies,D.R., Braam,L.M., Reznikoff,W.S. and Rayment,I. (1999) J. Biol. Chem., 274, 11904–11913.
Davies,D.R., Goryshin,I.Y., Reznikoff,W.S. and Rayment,I. (2000) Science, 289, 77–85.
Eijkelenboom,A.P., van den Ent,F.M., Vos,A., Doreleijers,J.F., Hard,K., Tullius,T.D., Plasterk,R.H., Kaptein,R. and Boelens,R. (1997) Curr. Biol., 7, 739–746.
Engelman,A. and Craigie,R. (1995) J. Virol., 69, 5908–5911.
Fischer,D. (2000) In Maun,L. (ed.), Pacific Symposium on Biocomputing 2000, pp. 119–130.
Germond,J.E., Lapierre,L., Delley,M. and Mollet,B. (1993) FEMS Microbiol. Rev., 12, 1027.
Goldgur,Y., Dyda,F., Hickman,A.B., Jenkins,T.M., Craigie,R. and Davies,D.R. (1998) Proc. Natl Acad. Sci. USA, 95, 9150–9154.
Grindley,N.D. and Leschziner,A.E. (1995) Cell, 83, 1063–1066.
Guex,N. and Peitsch,M.C. (1997) Electrophoresis, 18, 2714–2723.
Haren,L. (1998) PhD Thesis, Université Paul Sabatier, Toulouse.
Haren,L., Ton-Hoang,B. and Chandler,M. (1999) Annu. Rev. Microbiol., 53, 245–281.
Jenkins,T.M., Esposito,D., Engelman,A. and Craigie,R. (1997) EMBO J., 16, 6849–6859.
Junop,M.S. and Haniford,D.B. (1997) EMBO J., 16, 2646–2655.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Karplus,K., Barrett,C. and Hughey,R. (1998) Bioinformatics, 14, 846–856.
Kelley,L.A., MacCallum,R.M. and Sternberg,M.J.E. (2000) J. Mol. Biol., 299, 501–522.
Khan,E., Mack,J.P.G., Katz,R.A., Kulkosky,J. and Skalka,A.M. (1991) Nucleic Acids Res., 19, 851–860.
Kraulis,P.J., Domaille,P.J., Campbell-Burk,S.L., Van Aken,T. and Laue,E.D. (1994) Biochemistry, 33, 3515–3531.
Kuntz,I.D., Meng,E.C. and Shoichet,B.K. (1994) Acc. Chem. Res., 27, 117–123.
Mahillon,J. and Chandler,M. (1998) Microbiol. Mol. Biol. Rev., 62, 725–774.
Maignan,S., Guilloteau,J.P., Zhou-Liu,Q., Clement-Mella,C. and Mikol,V. (1998) J. Mol. Biol., 282, 359–368.
Novotny,J., Rashin,A.A. and Bruccoleri,R.E. (1988) Proteins: Struct. Funct. Genet., 4, 19–30.
Pieper,U., Eswar,N., Stuart,A.S., Llyin,V.A. and Sali,A. (2002) Nucleic Acids Res., 30, 255–259.
Polard,P., Ton-Hoang,B., Haren,L., Betermier,M., Walczak,R. and Chandler,M. (1996) J. Mol. Biol., 264, 68–81.
Ramachandran,G.N. and Sasisekharan,V. (1968) Adv. Protein Chem., 23, 283–437.
Rice,P.A. and Baker,T.A. (2001) Nat. Struct. Biol., 8, 302–307.
Rice,P., Craigie,R. and Davies,D.R. (1996) Curr. Opin. Struct. Biol., 6, 76–83.
Rost,B. (1995) In Rawlings,C., Clark,D., Altman,R., Hunter,L., Lengauer,T. and Wodak,S. (eds), The Third International Conference on Intelligent Systems for Molecular Biology (ISMB), AAAI Press, Cambridge, pp. 314–321.
Sakai,J., Chalmers,R.M. and Kleckner,N. (1995) EMBO J., 14, 4374–4383.
Sarnovsky,R.J., May,E.W. and Craig,N.L. (1996) EMBO J., 15, 6348–6361.
Sekine,Y. and Ohtsubo,E. (1989) Proc. Natl Acad. Sci. USA, 86, 4609–4613.
Sekine,Y., Eisaki,N. and Ohtsubo,E. (1994) J. Mol. Biol., 235, 1406–1420.
Skaugen,M. and Nes,I.F. (1994) Appl. Environ. Microbiol., 60, 2818–2829.
Vos,J.C. and Plasterk,R.H. (1994) EMBO J., 13, 6125–6132.
Walker,D.C. and Klaenhammer,T.R. (1993) Abstracts of the 93rd General Meeting of the American Society for Microbiology, H-230.
Yang,Z.N., Mueser,T.C., Bushman,F.D. and Hyde,C.C. (1999) J. Mol. Biol., 296, 535–548.
Zwahlen,M.C. and Mollet,B. (1993) FEMS Microbiol. Rev., 12, 27.
Received December 9, 2002; revised August 21, 2003; accepted September 12, 2003.