PEDS Advance Access originally published online on November 2, 2006
Protein Engineering Design and Selection 2006 19(12):555-562; doi:10.1093/protein/gzl044

© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of α-amylase-related proteins

Mark R. Stam, Etienne G.J. Danchin, Corinne Rancurel, Pedro M. Coutinho and Bernard Henrissat1

Architecture et Fonction des Macromolécules Biologiques, UMR6098 CNRS, Universités Aix-Marseille I & II, Case 932, 163 Avenue de Luminy, 13288 Marseille cedex 9, France

1To whom correspondence should be addressed. E-mail: Bernard.Henrissat{at}afmb.univ-mrs.fr


    Abstract
Family GH13, also known as the α-amylase family, is the largest sequence-based family of glycoside hydrolases and groups together a number of different enzyme activities and substrate specificities acting on α-glycosidic bonds. Because of this polyspecificity, simple membership of this family cannot be used to predict gene function from sequence alone. In order to establish robust groups that show an improved correlation between sequence and enzymatic specificity, we have performed a large-scale analysis of 1691 family GH13 sequences by combining clustering, similarity search and phylogenetic methods. About 80% of the sequences could be reliably classified into 35 subfamilies. Most subfamilies appear monofunctional (i.e. contain enzymes with the same substrate and the same product). Close examination of the other, apparently polyspecific, subfamilies revealed that they actually group together enzymes with strongly related (or even sometimes virtually identical) activities. Overall, our subfamily assignment allows us to set the limits for genomic function prediction in this large family of biologically and industrially important enzymes.

Keywords: α-amylase/functional prediction/glycoside hydrolase family GH13/phylogenetic analysis/subfamily classification


    Introduction
Starch is the major carbohydrate storage product of terrestrial plants and makes up an important part of the food consumed worldwide. As a direct consequence, the agricultural production of starch-rich plants is massive and exceeded 2.3 billion tons in 2002 just for maize, wheat, potatoes, cassava, rice, barley, oats and millet (FAOSTAT data, 2005; http://faostat.fao.org, last accessed December 2005). Besides its direct use as food, starch is also used as a raw material in many industrial applications such as high-fructose corn syrups, glues, sizing agents for the paper industry, ethanol production etc. (van der Maarel et al., 2002). Starch is made of amylose, which is a linear polymer of glucose residues linked by α-1,4-glycosidic bonds, and of amylopectin, which is an α-1,4-linked D-glucan with varying proportions of α-1,6-linked branches. Because of its widespread occurrence as a storage product, many enzymes for starch hydrolysis (glycosidases) or modification (transglycosidases) are found across the whole of biodiversity. The same is true for enzymes acting on glycogen, the animal and bacterial equivalent of plant starch. Interestingly, starch-degrading enzymes are found in just a very few of the numerous families of glycosidases and transglycosidases (termed GH for glycoside hydrolases). For a review of the classification of glycosidases in families, see Henrissat (1991), Henrissat and Bairoch (1993), Bourne and Henrissat (2001) and the Carbohydrate-Active enZyme (CAZy) database at http://www.cazy.org/CAZY. In this classification, the majority of the enzymes acting on starch, glycogen, and related oligo- and polysaccharides are found within family GH13, which represents the largest family of glycoside hydrolases (data from CAZy, May 2006). This family belongs to clan GH-H, which also contains families GH70 and GH77.
A clan is a hierarchical level higher than the family in the CAZy classification, where families from the same clan are believed to share a common ancestor and catalytic machinery (Davies and Henrissat, 1995; Henrissat and Bairoch, 1996; Henrissat and Davies, 1997; Stam et al., 2005). The GH13 family, also known as the α-amylase family, was identified very early (Nakajima et al., 1986; MacGregor, 1988; Svensson, 1988) and groups together enzymes that sometimes share only very limited sequence similarity. As a consequence, the α-amylase family has been the subject of numerous analyses aimed at deriving relationships between the sequence and the properties of the enzymes (for example Jespersen et al., 1993; Janecek et al., 1997; Kuriki and Imanaka, 1999; MacGregor et al., 2001). Fuelled by the importance of α-amylases and related enzymes, many crystallographic studies have been performed on GH13 family enzymes, and 50 different members had a known 3-D structure in May 2006 (see for instance Buisson et al., 1987; Matsuura et al., 1980; Boel et al., 1990; Watanabe et al., 1991; Burk et al., 1993; Kadziola et al., 1994; Machius et al., 1995). Structurally, the GH13 enzymes are characterized by a conserved structural core composed of three domains often designated as domains A, B and C (Ramasubbu et al., 1996): domain A folds as a (β/α)8-barrel (Brayer et al., 1995; Brzozowski and Davies, 1997; Feese et al., 2000; Kanai et al., 2001; Abad et al., 2002), and domain B is a loop of variable length inserted between strand β3 and helix α3 of the (β/α)8-barrel (Janecek, 1997). The active site is found in a cleft between domains A and B, where a triad of catalytic residues performs catalysis (Brzozowski and Davies, 1997). Domain C is a C-terminal extension characterized by a Greek key structure (Ramasubbu et al., 1996; Janecek, 1997).
In addition to this conserved core, some members of family GH13 bear a variable number of supplemental N- or C-terminal extensions such as starch-binding modules (families CBM26, CBM41, CBM34, CBM20 in CAZy) and other modules of still unknown function (Jespersen et al., 1991; Janecek, 1997). The conservation of a similar 3-D structure for the catalytic domain of family GH13 is logically accompanied by a conservation of the catalytic residues (Jespersen et al., 1991, 1991). From this conserved ancestral scaffold, a large variety of enzymes with varying substrate and product specificity has evolved, so that family GH13 now contains enzymes with at least 26 different Enzyme Classification (EC) numbers from different enzyme classes: glycoside hydrolases (EC 3.2.1.X, the most abundant), enzymes transferring carbohydrates (EC 2.4.1.X) and even isomerases (EC 5.4.99.15 and EC 5.4.99.16). These apparently different enzyme categories, however, use the same double displacement catalytic mechanism, which proceeds through the build-up and subsequent breakdown of a glycosyl-enzyme intermediate (Davies and Wilson, 1999; Uitdehaag et al., 1999), and differ only by the nature of the final acceptor (water for the hydrolases and hydroxyl groups of the substrate for the ‘transferases’, which are in fact transglycosidases). The EC numbers that describe each enzyme activity are in general very useful, especially to avoid ambiguities and the proliferation of trivial names. However, at least in the case of glycoside hydrolases, and in particular in the case of family GH13, these numbers rarely reflect the common structural features of the enzymes and they are not appropriate for enzymes showing broad specificity (i.e. that act on several substrates).
Other problems are that some EC numbers such as EC 3.2.1.98 (maltohexaose-producing α-amylase) are only particular cases of broader enzyme categories such as α-amylase (EC 3.2.1.1), and the distinction depends on the biochemical tests employed (or not) during characterization. Also, some different EC numbers such as EC 3.2.1.10 (oligo-1,6-glucosidase) and EC 3.2.1.70 (glucan 1,6-α-glucosidase) describe basically the same activity. Furthermore, the practical limitations in characterizing the many possible enzyme activities found among the members of this large family lead to biochemical characterizations with limited sets of substrates, resulting in biased activity descriptions and annotations (Green and Karp, 2005). The families of glycosidases based on amino acid sequence similarity (Henrissat, 1991) partly relieved these limitations by providing a unified classification system that correlated with the structure and the molecular mechanism of the enzymes (Henrissat and Davies, 1997).

Our continuous updates of CAZy show that family GH13 grew exponentially from 40 entries in 1991 to 2700 in May 2006, i.e. it doubled in size approximately every 3 years. A noteworthy fact about the current deluge of sequences released by genome sequencing centers and consortia is that virtually all of the novel members of family GH13 are just uncharacterized ORFs with varying degrees of annotation, mostly based on unsupervised automatic procedures such as best BLAST hit scores (Rost and Valencia, 1996), which contribute to the creation and subsequent propagation of mis-annotation in public databases (Devos and Valencia, 2001). The increased use of hidden Markov model (HMM)-based annotation methods (Bateman and Haft, 2002; Brown et al., 2005) alleviates some of the problems due to the exclusive use of BLAST but presently relies on HMMs of varying quality and annotation. These models suffer from the already mentioned mis-annotations, insufficient biochemical coverage of carbohydrate-active enzymes and pollution with remote similarities (M.R. Stam, E.G.J. Danchin, P.M. Coutinho, B. Henrissat, unpublished data). A reliable tool for substrate specificity prediction is highly desirable, and our day-to-day inspection of the current annotations released by genome sequencing centres shows that the situation is particularly critical in the field of glycoside hydrolases and glycosyltransferases, essentially due to the modular structure and the varying substrate specificity within sequence-based families (Coutinho and Henrissat, 1999). This situation is progressively worsened by the decreasing number of novel enzymatic characterization reports in modern scientific literature, perhaps reflecting the fact that the quest for increased impact factors renders journals reluctant to publish such characterizations. Because the situation is unlikely to change, it is becoming important to make the best possible use of the existing and future experimental data.
In its field, the CAZy classification effort represents the beginning of a solution, because it usually restricts the number of possible activities for a new sequence assigned to a family, especially when the number of experimentally characterized members is significant. However, the problem remains for large families such as family GH13 that group enzymes of different substrate specificities or even of different overall catalyzed chemical reactions (e.g. hydrolase, transferase, isomerase). To address these problems, and to make progress towards improved annotation of carbohydrate-active enzymes in genomic sequences, we have classified family GH13 into subfamilies following the accepted idea that sequences sharing high similarity should share more biochemical properties than those more distantly related. The difficulties we had to overcome in this work were the sheer size of the GH13 family, the varying modular structure of its members and the variety of EC numbers present.
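The growth figure quoted at the start of the preceding paragraph can be checked with a short calculation. The sketch below assumes pure exponential growth and an elapsed time of 15 years between the 1991 and May 2006 counts; both are simplifications introduced here, and the function name is purely illustrative.

```python
import math

def doubling_time(n_start, n_end, years):
    """Doubling time (in years) under an assumed pure exponential growth."""
    if n_start <= 0 or n_end <= n_start or years <= 0:
        raise ValueError("expected 0 < n_start < n_end and years > 0")
    # N(t) = N0 * 2**(t / T)  =>  T = t * ln(2) / ln(N_end / N_start)
    return years * math.log(2) / math.log(n_end / n_start)

# 40 entries in 1991, 2700 entries in May 2006 (about 15 years, an assumption)
t_double = doubling_time(40, 2700, 15)
print(f"estimated doubling time: {t_double:.2f} years")
```

Under these assumptions the estimate comes out at roughly 2.5 years, of the same order as the approximate figure given in the text.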


    Materials and methods
The sequences of catalytic modules of family GH13 members were extracted from the CAZy database. These sequences are the result of a 10-year manual annotation effort in which the boundaries of the different catalytic modules were identified using a combination of information resulting from (i) 3-D structure analyses, (ii) deletion studies, (iii) hydrophobic cluster analysis (Gaboriaud et al., 1987), (iv) BLAST and PSI-BLAST analysis (Altschul et al., 1997) and (v) multiple sequence alignments. A total of 1691 complete catalytic module sequences were extracted out of a total of 2100 family members available on 26 July 2005, the difference being attributable to fragmentary and other incomplete sequences. The advantage of analyzing exclusively complete and isolated catalytic module sequences is that the background noise due to the remaining components of the coding sequences, which include signal peptides, variable modules such as carbohydrate-binding modules (CBMs) and linker peptides, is eliminated. Moreover, additional modules associated with GH13 such as CBMs can have a different evolutionary history compared to that of the catalytic modules and can potentially produce inconsistencies in phylogenetic reconstructions (Machovic et al., 2005).

The extracted sequences corresponding to catalytic modules (GH13), comprising domains A, B and C, were subjected to a multiple sequence alignment using MUSCLE version 3.52 (Edgar, 2004), a program that reliably aligns large sets of protein sequences. The aligned sequences were clustered using the SECATOR algorithm (Wicker et al., 2001) as implemented in CLUSPACK (http://www-bio3d-igbmc.u-strasbg.fr/~wicker/programs.html). The underlying algorithm relies on BIONJ (Gascuel, 1997) to build a tree from the multiple sequence alignment and subsequently collapses the branches from subtrees after identification of the nodes joining different subtrees (Wicker et al., 2001). The resulting clusters of aligned sequences were considered as seeds for the creation of subfamilies. Many of the clusters contained too many sequences to make relevant subfamilies, and a supplementary step was necessary to remove sequences sharing insufficient similarity with the remainder of the sequences of the cluster. The automated analysis was therefore followed by a comparison of each sequence from each cluster against the library of amino acid sequences of GH13 catalytic modules using gapped BLASTP with default parameters (Altschul et al., 1997). The following criteria were used to identify sufficiently distinct subfamilies:

  1. sequences belonging to the same subfamily share higher sequence similarity than with the remainder of the family (Figure 1) and therefore appear at the top of the BLAST report;
  2. to ensure sufficient discriminative power, a significant shift in the BLAST E-value should be observed between the subfamily members and the remainder of the family (Figure 1);
  3. when the BLAST results were not consistent with a cluster identified by CLUSPACK, another round of CLUSPACK was performed using only the sequences of the cluster in order to obtain smaller clusters;
  4. a subfamily should contain at least five sequences from different organisms.
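Criteria 2 and 4 above lend themselves to a small illustration. The following is a minimal sketch, not the procedure actually used in this work: it assumes a BLAST hit list already sorted by increasing E-value, places the boundary at the largest single jump on a -log10(E) scale (a heuristic chosen here for illustration, since no numerical threshold is stated), and reads criterion 4 as requiring at least five distinct source organisms. All function names and the example data are hypothetical.

```python
import math

def split_at_largest_evalue_shift(hits):
    """Split a sorted BLAST hit list at its largest E-value jump.

    hits: list of (sequence_id, evalue) sorted by increasing E-value.
    Returns (putative_members, remainder, shift), where shift is the size
    of the jump measured in -log10(E) units.
    """
    if len(hits) < 2:
        raise ValueError("need at least two hits to look for a shift")
    # E-values reported as 0.0 are clamped so that the logarithm is defined.
    scores = [-math.log10(max(evalue, 1e-300)) for _, evalue in hits]
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    cut = max(range(len(drops)), key=drops.__getitem__)
    return hits[:cut + 1], hits[cut + 1:], drops[cut]

def has_enough_organisms(members, organism_of, minimum=5):
    """Criterion 4, read here as: members come from >= minimum distinct organisms."""
    return len({organism_of[seq_id] for seq_id, _ in members}) >= minimum

# Hypothetical report: five close hits, then a sharp drop in significance.
report = [("a", 1e-180), ("b", 1e-170), ("c", 1e-165), ("d", 1e-160),
          ("e", 1e-150), ("x", 1e-60), ("y", 1e-55)]
members, others, shift = split_at_largest_evalue_shift(report)
organisms = {"a": "org1", "b": "org2", "c": "org3", "d": "org4", "e": "org5"}
print([s for s, _ in members], round(shift), has_enough_organisms(members, organisms))
```

In practice the size of the shift would still need to be judged against the regular progression of E-values within the subfamily, as illustrated in Figure 1.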


Fig. 1. Example of a BLAST report obtained starting from a sequence from subfamily GH13_28. Starting with the sequence of the α-amylase from Bacillus subtilis SUH4-2 (Cho et al., 2000), sequences of subfamily GH13_28 (framed in a black box) are retrieved first, with a slow and progressive increase of the E-value. The regularity of the progression is interrupted by a large difference in the E-value when members of other subfamilies are retrieved.

 
The results given by the clustering method were compared with those from an independent bootstrap-supported phylogenetic analysis. Starting with the multiple sequence alignment determined earlier, a new phylogenetic tree was created by the minimum evolution method (Kidd and Sgaramella-Zonta, 1971) with 100 bootstrap replicates using MEGA version 3.1 (Kumar et al., 2004). Because of the large number of sequences in the alignment, the ‘complete deletion’ option of MEGA was selected. This option removes all columns that contain gaps from the multiple sequence alignment.
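The effect of the ‘complete deletion’ treatment described above can be sketched in a few lines. This is an illustration of the principle only (drop every alignment column in which at least one sequence has a gap), not MEGA's own code; the function name, the use of '-' as the only gap character and the toy alignment are assumptions made here.

```python
def complete_deletion(alignment, gap="-"):
    """Remove every column containing a gap in at least one sequence.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a new dict with the gapped columns removed from all sequences.
    """
    sequences = list(alignment.values())
    if not sequences:
        return {}
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the columns where no sequence carries a gap.
    kept = [i for i in range(length) if all(seq[i] != gap for seq in sequences)]
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}

toy = {"s1": "MK-LV", "s2": "MKALV", "s3": "M-ALV"}
print(complete_deletion(toy))
```

A consequence worth keeping in mind is that insertions or deletions specific to one group of sequences contribute nothing to the tree once their columns have been removed.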

We used TreeDyn (http://www.treedyn.org/) to analyze the resulting tree. TreeDyn allows the annotation of any leaf of the tree with external information. Here the different leaves were annotated with pertinent information for the subsequent interpretation: (i) EC number of biochemically characterized enzymes, (ii) the taxonomic group of the organisms present and (iii) the subfamilies identified in the clustering process.

The enzyme activities (EC numbers) reported for members of each subfamily were identified and checked for consistency in the context of related sequences. As routinely performed in CAZy, in order to eliminate self-propagating errors and for most sequences originating from genome sequencing efforts, all predicted activities were discarded. Biochemical activities were extracted and cross-checked using the literature and electronic data from different sources: (i) sequence and structure databases: GenBank (Benson et al., 2005), UniProt (Bairoch et al., 2005), PDB (Berman et al., 2000); (ii) biochemical databases: EMP (Selkov et al., 1996), PMD (Kawabata et al., 1999), and occasionally BRENDA (Schomburg et al., 2002); (iii) literature: PubMed (http://www.pubmed.org). Activities exhibited by only a limited number of elements in a subfamily were systematically checked to ensure reliability and support by accessible online resources.


    Results and discussion
The results of our functional classification effort are presented in the form of an annotated phylogenetic tree of the GH13 family (Figure 2). Each subfamily is represented by a coloured subtree. In order to adopt a general naming system that can be extended to other families of glycoside hydrolases, we chose to designate the subfamilies with Arabic numerals following the family number, by order of creation. For instance, subfamily 5 of family GH13 is designated GH13_5. Table I presents a summary of the different subfamilies created in family GH13. For each of the 35 subfamilies we report the identified EC numbers, the associated activities and the taxonomic group to which the sequences belong, according to the NCBI taxonomy (Benson et al., 2005).
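The naming scheme just described (family name, underscore, Arabic numeral) is simple enough to handle programmatically. The helper names below are hypothetical and only cover glycoside hydrolase (GH) families, which is the case discussed here.

```python
import re

# Family number, underscore, subfamily number, e.g. "GH13_5".
_SUBFAMILY = re.compile(r"^GH(\d+)_(\d+)$")

def subfamily_name(family, index):
    """Build a subfamily designation such as 'GH13_5'."""
    return f"GH{family}_{index}"

def parse_subfamily(name):
    """Return (family_number, subfamily_number) for a name like 'GH13_5'."""
    match = _SUBFAMILY.match(name)
    if match is None:
        raise ValueError(f"not a GH subfamily designation: {name!r}")
    return int(match.group(1)), int(match.group(2))

print(subfamily_name(13, 5), parse_subfamily("GH13_28"))
```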


Fig. 2. Phylogenetic tree of family GH13. Sequences classified into subfamilies 1–35 are shown in color. The sequences that were not included in subfamilies appear in black. The external black arcs cover subfamilies previously defined by Janecek (Janecek et al., 2003) and Oslancova (Oslancova and Janecek, 2002).

 

Table I. Composition of the 35 subfamilies within glycosidase family GH13

 
We have cross-checked the subfamilies created by the clustering method with the phylogenetic tree generated by MEGA (Figure 2). All the subfamilies correspond to a subtree of the phylogenetic tree. There is only one exception, namely, subfamily GH13_13, which is found among the branches that compose subfamily GH13_14. This apparent contradiction is explained by the parameters that were selected for computation by MEGA: because of the huge number of sequences to compute, we used the ‘complete deletion’ option of MEGA, with the consequence that gaps in the alignment were not taken into account. BLAST results (data not shown) and a multiple sequence alignment (Figure 3) confirmed that the corresponding subfamilies are closely related but distinct, and that the main difference between them is three gaps (Figure 3). The distinctiveness of the two subfamilies was verified by building a new phylogenetic tree with the sequences from subfamilies GH13_13 and GH13_14, using subfamily GH13_12 as an external group, and taking into account sequence gaps.


Fig. 3. Multiple sequence alignment of characterized enzymes from subfamilies GH13_13 and GH13_14. The black boxes highlight the gaps present in one subfamily but not in the other.

 
Thirty-five subfamilies have been identified from the sample of 1691 complete GH13 catalytic domains analyzed (Table I). A total of 1358 (80%) sequences could be assigned to a subfamily. The remaining 333 (20%) sequences were not included because of insufficient statistical support. Typically the latter sequences belong to: (i) sequences left unclustered by the SECATOR procedure; (ii) insufficiently populated small clusters often lacking biochemically characterized members; and (iii) groups that are too populated for the identification of long branches indicative of distinctiveness. It is likely that some sequences in the first two categories will integrate new subfamilies when more closely related members appear.

The (apparently) monospecific subfamilies

The largest subfamilies of family GH13 are subfamilies GH13_15, GH13_9 and GH13_11, which count 303, 132 and 119 members, respectively. Interestingly, only one activity (identified by a single EC number: EC 3.2.1.1, EC 2.4.1.18 and EC 3.2.1.68, respectively) is observed in each of these subfamilies. This feature is in fact observed for 26 of the 35 subfamilies identified, covering 68% of the analyzed sequences. The division into subfamilies coinciding with single activities suggests that the acquisition of these specificities preceded speciation. For example, subfamilies GH13_8 and GH13_9 group enzymes with an α-1,4-glucan branching activity (EC 2.4.1.18) and belong to the same subtree. The division into two subfamilies follows the taxonomy: subfamily GH13_8 groups sequences from Eukaryota while subfamily GH13_9 groups sequences from Bacteria.

Subfamily GH13_21 counts only a single biochemically characterized member, namely, an α-glucosidase (Peist et al., 1996). However, this enzyme is also highly active on γ-cyclodextrin, which is more coherent with the close relatedness of this subfamily to subfamily GH13_20 (cyclomaltodextrinases) and its more distant relationship to subfamilies GH13_30 and GH13_17 (α-glucosidases). This example illustrates the fact that even when an experimental characterization is available, not all possible activities have been tested. More characterizations are therefore needed to reliably assign an EC number to subfamily GH13_21.
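The bookkeeping behind statements such as "26 of the 35 subfamilies, covering 68% of the analyzed sequences" can be sketched as follows. This is an illustrative reconstruction, not the scripts used for this work: a subfamily is counted as monospecific when its biochemically characterized members carry exactly one EC number, and coverage is the fraction of sequences falling in such subfamilies. The member count given for GH13_31 in the toy input is a placeholder invented for the example; the other two entries reuse figures quoted in the text.

```python
def monospecific_coverage(subfamilies):
    """Identify single-EC subfamilies and the fraction of sequences they cover.

    subfamilies: dict name -> (number_of_sequences, set_of_EC_numbers), where
    the EC numbers are those of experimentally characterized members only.
    Returns (sorted list of monospecific subfamily names, covered fraction).
    """
    mono = sorted(name for name, (_, ecs) in subfamilies.items() if len(ecs) == 1)
    total = sum(count for count, _ in subfamilies.values())
    if total == 0:
        raise ValueError("no sequences provided")
    covered = sum(subfamilies[name][0] for name in mono)
    return mono, covered / total

toy_table = {
    "GH13_15": (303, {"3.2.1.1"}),
    "GH13_11": (119, {"3.2.1.68"}),
    "GH13_31": (78, {"3.2.1.10", "3.2.1.70"}),  # member count is a placeholder
}
names, fraction = monospecific_coverage(toy_table)
print(names, f"{fraction:.1%}")
```

Note that subfamilies without any characterized member (an empty EC set) are not counted as monospecific in this sketch.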

The (apparently) polyspecific subfamilies

Only five subfamilies (GH13_19, GH13_31, GH13_20, GH13_2 and GH13_4) contain more than one reported activity. However, we have noticed that the activities within each subfamily are closely related. In the case of subfamily GH13_19, there is barely any difference in terms of specificity between the α-amylase (EC 3.2.1.1) from Escherichia coli K12 (Spiess et al., 1997), the maltohexaose-forming α-amylase (EC 3.2.1.98) from Bacillus halodurans LBK 34 (Hashim et al., 2004) and the maltopentaose-forming α-amylase (no EC number assigned) from alkalophilic Gram positive bacteria DSM 5853 (Candussio et al., 1990). In the same way, two apparently ‘different’ activities are present in subfamily GH13_31, namely, glucan 1,6-α-glucosidase (EC 3.2.1.70) (Whiting et al., 1993) and oligo-1,6-glucosidase (EC 3.2.1.10) (Bornke et al., 2001). According to the IUBMB Enzyme Nomenclature (Enzyme Nomenclature Committee, 1992), both activities catalyze the hydrolysis of α-1,6-D-glucosidic linkages. Therefore, the difference is perhaps more semantic than biological. Another example of the assignment of different EC numbers to the same activity is found in subfamily GH13_20, which groups cyclomaltodextrinase (EC 3.2.1.54) from Paenibacillus sp. A11 (Kaulpiboon and Pongsawasdi, 2004), maltogenic α-amylase (EC 3.2.1.133) from Bacillus subtilis SUH4-2 (Cho et al., 2000) and neopullulanase (EC 3.2.1.135) from Thermoactinomyces vulgaris R-47 (Tonozuka et al., 1993). It has been demonstrated that these three enzymes act on the same substrate and generate the same product; therefore, they should be classified under the same name and the same EC number (Cheong et al., 2002; Lee et al., 2002).

Subfamily GH13_2 clusters together two maltogenic α-amylases (EC 3.2.1.133) (Dauter et al., 1999) and an acarviosyl transferase (EC 2.4.1.-) (Hemker et al., 2001) together with cyclodextrin glucanotransferases (EC 2.4.1.19) (Leemhuis et al., 2003). The two maltogenic α-amylases also show a high catalytic activity on cyclodextrin (Dauter et al., 1999), and a single amino acid mutation can change the acarviosyl transferase into an enzyme with 4-α-glucanotransferase activity (Leemhuis et al., 2004). This example shows the limitations of activity prediction based on subfamily analysis for polyspecific enzymes and for engineered variants.

Finally, two activities, sucrose hydrolase (EC 3.2.1.-) and amylosucrase (EC 2.4.1.4), are found in subfamily GH13_4. These two activities are closer to each other than suggested by their EC numbers, since they operate on the same substrate (sucrose) with the same molecular mechanism and differ only in their transglycosylation abilities. It is therefore likely that all members of this subfamily utilize sucrose as the substrate.

In conclusion, the close examination of the polyspecific subfamilies reveals that they actually contain enzymes with strongly related (or even sometimes nearly identical) substrate and/or product specificities, showing that here too subfamily assignment has strong predictive power.

Subfamilies with no associated EC number

Subfamilies GH13_3, GH13_23, GH13_34 and GH13_35 do not contain enzymes with an associated EC number. In fact, subfamilies GH13_34 and GH13_35 contain a particular set of GH13 members, which have lost their catalytic machinery and evolved to a novel function (Broer and Wagner, 2002; Janecek et al., 1997). Members from subfamily GH13_34 are known as 4F2 heavy chain proteins, which induce amino acid transport in vertebrates (Estevez et al., 1998), whereas subfamily GH13_35 groups cysteine, basic and neutral amino acid transporters and related proteins (Mizoguchi et al., 2001). A multiple sequence alignment of subfamily GH13_3 shows that while the catalytic base is conserved, the remainder of the catalytic machinery and other typical conserved motifs of this family are not (data not shown), suggesting that members of this subfamily have probably also acquired a novel, unrelated, function.

In contrast, multiple sequence alignment of subfamily GH13_23 members revealed a conserved catalytic apparatus (data not shown), suggesting that the members of this subfamily have a glycoside hydrolase capability.

Comparison with other efforts of classification within family GH13

Several criteria can be envisioned and used for the definition of subfamilies suitable to derive a better correlation between sequences and enzyme specificity than membership to the broad GH13 family. Here we have created subfamilies based on sequence similarity and phylogenetic reconstruction criteria. Overall sequence differences between catalytic modules reflect functional differences. Earlier, a classification of amylases for specificity prediction purposes had been proposed, based on the structure of the small domain B (Janecek et al., 1997). Although it is conceivable that domain B has co-evolved to some extent with the remainder of the catalytic domain of the enzymes, it does not cover entirely the active site of family GH13 enzymes and is too short to provide a signal-to-noise ratio sufficient for the classification of hundreds of proteins. Other efforts have attempted to define a limited number of subfamilies based on phylogenetic analyses of partial subsets of family GH13 sequences (Oslancova and Janecek, 2002; Janecek et al., 2003). We have mapped the groups resulting from these earlier analyses onto the tree presented in Figure 2. Our results are broadly in agreement with these earlier studies but provide a complete analysis of the entire family GH13 resulting in both finer subdivisions (i.e. more subfamilies) and an improved correlation with activity.


    Conclusions
The diversity of specificities and activities found in family GH13 shows that this family is old enough to have seen the emergence (and sometimes the loss) of many activities. The sequences belonging to subfamilies containing only one EC number represent 68% of the sequences analyzed. This excellent correlation with the subfamilies that we have defined through our phylogenetic analysis suggests that assignment to a subfamily is indeed a considerable step towards improved functional prediction. However, because not all subfamilies have biochemically characterized members and because a significant number of sequences are still not included in subfamilies, errors or imprecision are still possible during unsupervised automated genomic annotations.

In addition, the present study points out that there are still some branches of family GH13 that require structural or biochemical characterization and that additional subfamilies will emerge later. Here again, our work points to several limitations: experimental EC number assignments are sometimes ambiguous due to the use of unduly limited sets of substrates. To make things worse, the choice of an EC number appears to occasionally reflect more the opinion of the experimentalist than actual biochemical evidence. Finally, the traditional descriptive EC numbers were not intended nor designed to take into account functional drifts that arise from evolutionary events such as gene loss, convergence or duplication. All these aspects suggest the use of EC numbers in post-genomic approaches (for instance metabolic pathway mapping) with the greatest caution, as they were only designed to provide common names to describe enzyme reactions.

The rigorous approach we developed for the definition of sub-families in family GH13 will be applied in the future to other Carbohydrate-Active enZymes families which in turn will benefit from the improvement of predictability of specificity at a larger scale.

A limitation of the methods we have used is that they are very time-consuming, and one cannot repeat this type of analysis every time the CAZy database is updated. We have therefore developed a series of HMMs based on each of the 35 subfamilies described here, and these allow the rapid assignment of new sequences to these subfamilies. As an illustration, the set of complete sequences of GH13 modules selected at the beginning of our work (1691 sequences as of July 2005) had grown to over 2456 by August 2006. Out of the 765 novel full-length sequences added to family GH13 between July 2005 and August 2006, more than 90% could be added to the 35 subfamilies described here. This subfamily assignment will therefore become available and will be updated as an integral part of the data presented for family GH13 in the CAZy database (http://afmb.cnrs-mrs.fr/CAZY/fam/GH13.html).
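The assignment step can be pictured with a minimal Python sketch. It assumes that each new sequence has already been scored against every subfamily HMM by a profile-HMM search tool and that the scores are available as a dictionary. The decision rule (a minimum score plus a minimum margin over the runner-up), the threshold values, the subfamily labels and the scores are all hypothetical and are not the procedure or parameters used in this work.

```python
from typing import Dict, Optional


def assign_subfamily(
    scores: Dict[str, float],
    min_score: float = 50.0,   # hypothetical acceptance threshold
    min_margin: float = 10.0,  # hypothetical lead required over the runner-up
) -> Optional[str]:
    """Return the best-scoring subfamily, or None if the call is not confident."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_name, best_score = ranked[0]
    if best_score < min_score:
        return None  # no model matches well enough
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return None  # two subfamilies are too close to separate
    return best_name


# Hypothetical per-sequence scores against subfamily HMMs.
new_sequences = {
    "seq_A": {"GH13_1": 312.4, "GH13_5": 88.0},
    "seq_B": {"GH13_2": 41.0},
    "seq_C": {"GH13_9": 120.0, "GH13_10": 115.5},
}

assignments = {name: assign_subfamily(s) for name, s in new_sequences.items()}
assigned_fraction = sum(a is not None for a in assignments.values()) / len(assignments)

for name, subfamily in assignments.items():
    print(name, "->", subfamily or "unassigned")
print(f"assigned: {assigned_fraction:.0%}")
```

Leaving ambiguous or weakly scoring sequences unassigned is a conservative choice in this sketch: it mirrors the fact that a fraction of new sequences could not be placed in any of the 35 subfamilies.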


    Footnotes
 
Edited by Dick Janssen


    Acknowledgements
 
This work was funded in part by the European Commission (STREP FungWall grant, contract: LSHB-CT-2004-511952).


    References
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 Conclusions
 Acknowledgements
 References
 
Abad M.C., Binderup K., Rios-Steiner J., Arni R.K., Preiss J., Geiger J.H. (2002) J. Biol. Chem. 277:42164–42170.[Abstract/Free Full Text]

Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. (1997) Nucleic Acids Res. 25:3389–3402.[Abstract/Free Full Text]

Bairoch A., et al. (2005) Nucleic Acids Res. 33:D154–D159.[Abstract/Free Full Text]

Bateman A. and Haft D.H. (2002) Brief Bioinform. 3:236–245.[Abstract/Free Full Text]

Benson D.A., Karsch-Mizrachi I., Lipman D.J., Ostell J., Wheeler D.L. (2005) Nucleic Acids Res. 33:D34–D38.[Abstract/Free Full Text]

Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. (2000) Nucleic Acids Res. 28:235–242.[Abstract/Free Full Text]

Boel E., Brady L., Brzozowski A.M., Derewenda Z., Dodson G.G., Jensen V.J., Petersen S.B., Swift H., Thim L., Woldike H.F. (1990) Biochemistry. 29:6244–6249.[CrossRef][Medline]

Bornke F., Hajirezaei M., Sonnewald U. (2001) J. Bacteriol. 183:2425–2430.[Abstract/Free Full Text]

Bourne Y. and Henrissat B. (2001) Curr. Opin. Struct. Biol. 11:593–600.[CrossRef][Web of Science][Medline]

Brayer G.D., Luo Y., Withers S.G. (1995) Protein Sci. 4:1730–1742.[Web of Science][Medline]

Broer S. and Wagner C.A. (2002) Cell. Biochem. Biophys. 36:155–168.[Web of Science][Medline]

Brown D., Krishnamurthy N., Dale J.M., Christopher W., Sjolander K. (2005) Pac. Symp. Biocomput. 322–333.

Brzozowski A.M. and Davies G.J. (1997) Biochemistry. 36:10837–10845.[CrossRef][Medline]

Buisson G., Duee E., Haser R., Payan F. (1987) EMBO J. 6:3909–3916.[Web of Science][Medline]

Burk D., Wang Y., Dombroski D., Berghuis A.M., Evans S.V., Luo Y., Withers S.G., Brayer G.D. (1993) J. Mol. Biol. 230:1084–1085.[CrossRef][Web of Science][Medline]

Candussio A., Schmid G., Bock A. (1990) Eur. J. Biochem. 191:177–185.[Web of Science][Medline]

Cheong K.A., Kim T.J., Yoon J.W., Park C.S., Lee T.S., Kim Y.B., Park K.H., Kim J.W. (2002) Biotechnol. Appl. Biochem. 35:27–34.[CrossRef][Web of Science][Medline]

Cho H.Y., Kim Y.W., Kim T.J., Lee H.S., Kim D.Y., Kim J.W., Lee Y.W., Leed S., Park K.H. (2000) Biochim. Biophys. Acta 1478:333–340.[CrossRef][Medline]

Coutinho P.M. and Henrissat B. (1999) J. Mol. Microbiol. Biotechnol. 1:307–308.[Medline]

Dauter Z., Dauter M., Brzozowski A.M., Christensen S., Borchert T.V., Beier L., Wilson K.S., Davies G.J. (1999) Biochemistry 38:8385–8392.[CrossRef][Medline]

Davies G. and Henrissat B. (1995) Structure 3:853–859.[Medline]

Davies G.J. and Wilson K.S. (1999) Nat. Struct. Biol. 6:406–408.[CrossRef][Web of Science][Medline]

Devos D. and Valencia A. (2001) Trends Genet. 17:429–431.[CrossRef][Web of Science][Medline]

Enzyme Nomenclature Committee. (1992) Recommendations of the Nomenclature Committee of the International Union of Biochemistry and molecular Biology on the Nomenclature and Classification of Enzymes.(Academic Press, San Diego, CA, USA).

Edgar R.C. (2004) Nucleic Acids Res. 32:1792–1797.[Abstract/Free Full Text]

Estevez R., Camps M., Rojas A.M., Testar X., Deves R., Hediger M.A., Zorzano A., Palacin M. (1998) FASEB J. 12:1319–1329.[Abstract/Free Full Text]

Feese M.D., Kato Y., Tamada T., Kato M., Komeda T., Miura Y., Hirose M., Hondo K., Kobayashi K., Kuroki R. (2000) J. Mol. Biol. 301:451–464.[CrossRef][Web of Science][Medline]

Gaboriaud C., Bissery V., Benchetrit T., Mornon J.P. (1987) FEBS Lett. 224:149–155.[CrossRef][Web of Science][Medline]

Gascuel O. (1997) Mol. Biol. Evol. 14:685–695.[Abstract]

Green M.L. and Karp P.D. (2005) Nucleic Acids Res. 33:4035–4039.[Abstract/Free Full Text]

Hashim S.O., Delgado O., Hatti-Kaul R., Mulaa F.J., Mattiasson B. (2004) Biotechnol. Lett. 26:823–828.[CrossRef][Web of Science][Medline]

Hemker M., Stratmann A., Goeke K., Schroder W., Lenz J., Piepersberg W., Pape H. (2001) J. Bacteriol. 183:4484–4492.[Abstract/Free Full Text]

Henrissat B. (1991) Biochem. J. 280:309–316.[Web of Science][Medline]

Henrissat B. and Bairoch A. (1993) Biochem. J. 293:781–788.[Web of Science][Medline]

Henrissat B. and Bairoch A. (1996) Biochem. J. 316:695–696.[Web of Science][Medline]

Henrissat B. and Davies G. (1997) Curr. Opin. Struct. Biol. 7:637–644.[CrossRef][Web of Science][Medline]

Janecek S. (1997) Prog. Biophys. Mol. Biol. 67:67–97.[CrossRef][Web of Science][Medline]

Janecek S., Svensson B., Henrissat B. (1997) J. Mol. Evol. 45:322–331.[CrossRef][Web of Science][Medline]

Janecek S., Svensson B., MacGregor E.A. (2003) Eur. J. Biochem. 270:635–645.[Web of Science][Medline]

Jespersen H.M., MacGregor E.A., Sierks M.R., Svensson B. (1991) Biochem. J. 280:51–55.[Medline]

Jespersen H.M., MacGregor E.A., Henrissat B., Sierks M.R., Svensson B. (1993) J. Protein Chem. 12:791–805.[CrossRef][Web of Science][Medline]

Kadziola A., Abe J., Svensson B., Haser R. (1994) J. Mol. Biol. 239:104–121.[CrossRef][Web of Science][Medline]

Kanai R., Haga K., Yamane K., Harata K. (2001) J. Biochem. (Tokyo) 129:593–598.[Abstract/Free Full Text]

Kaulpiboon J. and Pongsawasdi P. (2004) J. Biochem. Mol. Biol. 37:408–415.[Web of Science][Medline]

Kawabata T., Ota M., Nishikawa K. (1999) Nucleic Acids Res. 27:355–357.[Abstract/Free Full Text]

Kidd K.K. and Sgaramella-Zonta L.A. (1971) Am. J. Hum. Genet. 23:235–252.[Web of Science][Medline]

Kumar S., Tamura K., Nei M. (2004) Brief Bioinform. 5:150–163.[Abstract/Free Full Text]

Kuriki T. and Imanaka T. (1999) J. Biosci. Bioeng. 87:557–565.[CrossRef][Web of Science][Medline]

Lee H.S., Kim M.S., Cho H.S., Kim J.I., Kim T.J., Choi J.H., Park C., Oh B.H., Park K.H. (2002) J. Biol. Chem. 277:21891–21897.[Abstract/Free Full Text]

Leemhuis H., Dijkstra B.W., Dijkhuizen L. (2003) Eur. J. Biochem. 270:155–162.[Web of Science][Medline]

Leemhuis H., Wehmeier U.F., Dijkhuizen L. (2004) Biochemistry 43:13204–13213.[CrossRef][Medline]

MacGregor E.A. (1988) J. Protein Chem. 7:399–415.[CrossRef][Web of Science][Medline]

MacGregor E.A., Janecek S., Svensson B. (2001) Biochim. Biophys. Acta 1546:1–20.[CrossRef][Medline]

Machius M., Wiegand G., Huber R. (1995) J. Mol. Biol. 246:545–559.[CrossRef][Web of Science][Medline]

Machovic M., Svensson B., MacGregor E.A., Janecek S. (2005) FEBS J. 272:5497–5513.[CrossRef][Medline]

Matsuura Y., Kusunoki M., Harada W., Tanaka N., Iga Y., Yasuoka N., Toda H., Narita K., Kakudo M. (1980) J. Biochem. (Tokyo) 87:1555–1558.[Abstract/Free Full Text]

Mizoguchi K., et al. (2001) Kidney Int. 59:1821–1833.[CrossRef][Web of Science][Medline]

Nakajima R., Imanaka T., Aiba S. (1986) Appl. Microbiol. Biotechnol 23:355–360.

Oslancova A. and Janecek A. (2002) Cell Mol. Life. Sci. 59:1945–1959.[CrossRef][Web of Science][Medline]

Peist R., Schneider-Fresenius C., Boos W. (1996) J. Biol. Chem. 271:10681–10689.[Abstract/Free Full Text]

Ramasubbu N., Paloth V., Luo Y., Brayer G.D., Levine M.J. (1996) Acta Crystallogr. D Biol. Crystallogr. 52:435–446.[CrossRef][Medline]

Rost B. and Valencia A. (1996) Curr. Opin. Biotechnol. 7:457–461.[CrossRef][Web of Science][Medline]

Schomburg I., Chang A., Schomburg D. (2002) Nucleic Acids Res. 30:47–49.[Abstract/Free Full Text]

Selkov E., et al. (1996) Nucleic Acids Res. 24:26–28.[Abstract/Free Full Text]

Spiess C., Happersberger H.P., Glocker M.O., Spiess E., Rippe K., Ehrmann M. (1997) J. Biol. Chem. 272:22125–22133.[Abstract/Free Full Text]

Stam M.R., Blanc E., Coutinho P.M., Henrissat B. (2005) Carbohydr. Res. 340:2728–2734.[CrossRef][Web of Science][Medline]

Svensson B. (1988) FEBS Lett. 230:72–76.[CrossRef][Web of Science][Medline]

Tonozuka T., Ohtsuka M., Mogi S., Sakai H., Ohta T., Sakano Y. (1993) Biosci. Biotechnol. Biochem. 57:395–401.[Medline]

Uitdehaag J.C., Mosi R., Kalk K.H., van der Veen B.A., Dijkhuizen L., Withers S.G., Dijkstra B.W. (1999) Nat. Struct. Biol. 6:432–436.[CrossRef][Web of Science][Medline]

van der Maarel M.J., van der Veen B., Uitdehaag J.C., Leemhuis H., Dijkhuizen L. (2002) J. Biotechnol. 94:137–155.[CrossRef][Web of Science][Medline]

Watanabe K., Kitamura K., Hata Y., Katsube Y., Suzuki Y. (1991) FEBS Lett. 290:221–223.[CrossRef][Web of Science][Medline]

Whiting G.C., Sutcliffe I.C., Russell R.R. (1993) J. Gen. Microbiol. 139:2019–2026.[Abstract/Free Full Text]

Wicker N., Perrin G.R., Thierry J.C., Poch O. (2001) Mol. Biol. Evol. 18:1435–1441.[Abstract/Free Full Text]

Received June 27, 2006; revised August 31, 2006; accepted September 18, 2006.




