PEDS Advance Access originally published online on June 20, 2007
Protein Engineering Design and Selection 2007 20(7):327-337; doi:10.1093/protein/gzm024
Development of a screening platform for directed evolution using the reef coral fluorescent protein ZsGreen as a solubility reporter
1Global Protein Science and Supply, AstraZeneca R&D, Södertälje S-151 85, Sweden 2Global Protein Science and Supply, AstraZeneca, Mereside, Alderley Park, Macclesfield, Cheshire SK10 4TG, UK
3 To whom correspondence should be addressed. E-mail: catherine.heddle{at}astrazeneca.com
Abstract
Soluble proteins, with high expression levels, are preferred candidates for structural and functional studies. In cases of low expression, aggregation or inclusion body formation, time-consuming searches for optimal expression or refolding conditions are required. We have developed a high-throughput solubility engineering and screening platform for proteins that are expressed in an insoluble form in Escherichia coli with the aim of obtaining a broad spectrum of best hits with increased solubility in difficult-to-express target proteins. This process has been developed using error-prone PCR to introduce random base changes in genes of interest. Expression of mutated proteins in fusion with the reef coral fluorescent protein ZsGreen as a solubility marker has enabled the selection of more soluble variants. We have used a colony picker to achieve high-throughput selection of E.coli expressing more soluble target protein–ZsGreen fusions, with increased fluorescence. The whole process enables us to complete one round of mutation, screening and analysis of 20 000 potential soluble clones within 8 weeks. We describe the development of the methods using different model proteins and show one example, the kinase domain from the human EphB2 receptor, as a successful application of the whole platform.
Keywords: directed evolution/E.coli/high-throughput solubility screen/recombinant expression/reef coral fluorescent proteins (RCFPs)
Introduction
Large quantities of soluble protein are required for structural and functional studies in the pursuit of identifying new candidate drugs. Escherichia coli (E.coli) is a potential expression system, but frequently recombinant proteins fail to fold correctly and form insoluble aggregates in inclusion bodies (Carrio et al., 1998).
Directed evolution screening methods using error-prone PCR (epPCR) and DNA shuffling techniques to introduce random but controllable numbers of point mutations into genes of interest have been reported as an alternative tool to select for mutants with improved solubility (Waldo et al., 1999; Hart and Tarendeau, 2006). An indirect folding reporter is used to detect changes in protein solubility in such mutant libraries. In this fusion reporter method, a test domain or library is expressed as an N-terminal fusion with the reporter. Following expression of this recombinant protein, the reporter protein activity level correlates with the folding success or failure of the upstream moiety. A well-characterised folding reporter system uses green fluorescent protein (GFP) as the C-terminal reporter tag (Waldo et al., 1999; Waldo, 2003). A poorly folded or aggregated upstream moiety prevents the correct folding of GFP and hence the fusion protein does not fluoresce.
Recently, several studies have reported the successful application of such directed evolution methods to the screening of protein variants with increased soluble expression in E.coli. For example, the above directed evolution method has been used to identify a tobacco etch virus protease triple mutant with a 5-fold increase in the yield of soluble expression and wild-type activity (van den Berg et al., 2006). Directed evolution was also used to increase the soluble expression of Pyrobaculum aerophilum methyl transferase, tartrate dehydratase β-subunit and nucleoside diphosphate kinase to 50%, 95% and 90%, respectively (Pedelacq et al., 2005).
We were interested in optimising the method in order to make it a feasible tool to improve the soluble expression of recalcitrant targets. We focused on three key aspects of the directed evolution process. First, we describe the choice of reef coral fluorescent protein (RCFP) as a solubility reporter and its validation in a high-throughput process using a colony picker. Second, we tested the epPCR method in conjunction with the Invitrogen Gateway™ cloning system for the construction of random mutation libraries. Third, we describe the use of a multi-parallel protein analysis technique for the expression, purification and characterisation of mutant proteins.
ZsGreen, AmCyan, ZsYellow and AsRed are members of the RCFP family of novel fluorescent proteins that have been isolated from non-bioluminescent species of reef-coral organisms. Although the sequence homology between RCFPs and GFP can vary significantly (less than 30% homology), they have a very conserved tertiary structure (Matz et al., 1999). RCFP variants with brighter fluorescence and specific emission characteristics have been generated and are now commercially available (Clontech). RCFPs have the same advantageous qualities that make GFP a suitable reporter, as they do not require external cofactors or substrates and can be used in vivo or in vitro (Bourett et al., 2002; Wenck et al., 2003). In this report, we tested RCFPs in conjunction with a directed evolution methodology to determine if they could be used like GFP as fusion tags to fluorescently select for E.coli expressing protein with improved folding and by implication with improved solubility.
One key element of the directed evolution process is the ability to readily create large expression libraries. To do this, we have tested the Invitrogen Gateway™ cloning method, which is based on site-specific recombination mediated by lambda phage and provides a rapid and highly efficient tool to transfer libraries to E.coli expression vectors (Hartley et al., 2000). The RCFP Gateway™ adapted expression vectors were constructed to determine which one of the fluorescent proteins was the most suitable as a solubility reporter.
We were also interested in understanding whether directed evolution could successfully be applied to classes of proteins that are historically poorly expressed in E.coli, such as receptor tyrosine kinases (RTKs). We chose to subject the kinase domain of the Ephrin B2 RTK (EphB2) to our newly optimised screening platform in an attempt to improve its soluble expression in E.coli. The crystal structure of the EphB2 kinase domain has previously been solved using murine recombinant protein to circumvent the problems of poor soluble expression of the human homologue in E.coli (Wybenga-Groot et al., 2001). In this article, we describe the successful application of our high-throughput directed evolution screening platform to EphB2 and the identification of mutants with improved soluble expression compared with a wild-type construct.
Materials and methods
Plasmid construction/Gateway™ cloning
The RCFP vectors pET28ZsGreen, pET28ZsYellow, pET28AmCyan, pET28AsRed and pET28GFP were constructed in-house (pET28 was obtained from Novagen). The RCFPs were obtained from Clontech. The GFP variant used was the red shift variant [S65T]GFP. The Gateway™ adapted destination vectors pT7-ZsGreen, pT7-AmCyan, pT7-ZsYellow and pT7-GFP were constructed using an in-house pT7 destination vector as backbone (modified T7 promoter expression vector) (Tobbell et al., 2002). Using modifying primers and PCR, the RCFP and GFP coding sequences were adapted to contain KpnI sites at the 5' and 3' ends to facilitate insertion into the pT7 destination vector downstream of the attR2 site. For all of the RCFPs and GFP, the ATG start codons were removed to allow for continuous read through from the gene of interest. The primer sequences used are as follows (each contains a KpnI site, GGTACC): modifying forward primer for ZsGreen 5'-ACCGGTCGGTACCGGCTCAGTC-3'; reverse primer for ZsGreen 5'-AGTCGCGGTACCTCAGGGCAATG-3'; forward primer for AmCyan 5'-ACCGGTCGGTACCGGCTCTTTC-3'; reverse primer for AmCyan 5'-GAGTCGCGGTACCTCAGAAAGGG-3'; forward primer for ZsYellow 5'-GTACCGGTCGGTACCGGCTCATTC-3'; reverse primer for ZsYellow 5'-CGCGGTACCTCAGGCCAAGGCA-3'; forward primer for GFP 5'-GGCAGCTGGTGGGTACCGAGCAAGGGCGAGG-3'; reverse primer for GFP 5'-CGGGGTACCTCACTTGTACAGCTCGTCCATGCC-3'.
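As a quick sanity check on the primer design described above, the following minimal sketch confirms that each modifying primer carries a KpnI recognition site (GGTACC). The primer sequences are copied from the text with any line-break spaces removed; the helper names `find_site` and `check_primers` are illustrative and not part of the original work.

```python
# KpnI recognition sequence; primers are copied from the Methods text.
KPNI_SITE = "GGTACC"

PRIMERS = {
    "ZsGreen forward": "ACCGGTCGGTACCGGCTCAGTC",
    "ZsGreen reverse": "AGTCGCGGTACCTCAGGGCAATG",
    "AmCyan forward": "ACCGGTCGGTACCGGCTCTTTC",
    "AmCyan reverse": "GAGTCGCGGTACCTCAGAAAGGG",
    "ZsYellow forward": "GTACCGGTCGGTACCGGCTCATTC",
    "ZsYellow reverse": "CGCGGTACCTCAGGCCAAGGCA",
    "GFP forward": "GGCAGCTGGTGGGTACCGAGCAAGGGCGAGG",
    "GFP reverse": "CGGGGTACCTCACTTGTACAGCTCGTCCATGCC",
}


def find_site(primer: str, site: str = KPNI_SITE) -> int:
    """Return the 0-based index of the first occurrence of `site`, or -1."""
    cleaned = primer.upper().replace(" ", "")
    return cleaned.find(site)


def check_primers(primers: dict) -> dict:
    """Map each primer name to the position of its KpnI site (-1 if absent)."""
    return {name: find_site(seq) for name, seq in primers.items()}


if __name__ == "__main__":
    for name, position in check_primers(PRIMERS).items():
        status = f"site at index {position}" if position >= 0 else "no site found"
        print(f"{name}: {status}")
```

Because KpnI recognises a palindromic sequence, the same search works for forward and reverse primers without taking a reverse complement.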
In our pT7-RCFP destination vectors, attR1 and attR2 sites flank the toxic ccdB gene directly upstream of the RCFP (Fig. 1). During the recombination events of the Gateway™ LR reaction, the ccdB gene is replaced by the gene of interest flanked by the corresponding attL1 and attL2 sites and thus acts as a negative selection tool (Hartley et al., 2000). These expression vectors allowed the expression of a fusion protein containing a C-terminal RCFP tag, hereafter referred to as Test Protein-RCFP.
Human oestrogen receptor DNA binding domain (ERDBD), R183-H267 (Wikstrom et al., 1999
Constructs were cloned into pT7-RCFP series expression vectors using BP and LR recombination reactions according to the Gateway™ technology instruction manual (Invitrogen). The ERDBD and [C245S]ERDBD were amplified and an N-terminal thrombin cleavage site added using forward primer 5'-CACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATCGCTACTGTGCAGTGTGC-3' and reverse primer 5'-AAAGCTGGGTCGTGTTTCAACATTCTC-3'. A 6His tag was then added to the N-terminus using the forward primer 5'-GAGATATACATATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCC-3' in combination with the previously mentioned reverse primer. The att sites were introduced using the forward primer 5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCTTTAACTTTAAGAAGGAGATATACATATG-3' and reverse primer 5'-GGGGACCACTTTGTACAAGAAAGCTGGGTC-3'. pENTR-ERDBD and pENTR-[C245S]ERDBD were verified by sequencing.
The wild-type human EphB2 tyrosine kinase domain used in this study consisted of amino acids 604–898 of the EphB2 RTK. It was cloned into the destination vector pT7-ZsGreen with an N-terminal 6His tag to allow for purification of the recombinant protein by IMAC (immobilised metal ion affinity chromatography). The addition of Gateway™ att sequences and 6His tag was carried out in a two-step PCR using 6His tag forward primer 5'-GGAGATATAACTATGCATCACCATCACCATCACGACCCCAACGAGGCAGTG-3' and the reverse primer 5'-GACCCTGAAACAGAGAGGAGAGGGGCGCCATG-3' for the initial PCR, and using attB1 forward primer 5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCTTTAACTTTAAGAAGGAGATATAACTATG-3' and attB2 reverse primer 5'-GGGGACCACTTTGTACAAGAAAGCTGGGTCCGGACCCTGAAACAGAACTTCCAG-3' for the second PCR. pENTR-EphB2 was verified by sequencing. All the cDNAs used for this study were obtained from our in-house collections of vector constructs, including those used to prepare the panel of soluble and insoluble proteins.
The epPCR was based on the method of Fromant et al. (1995). The EphB2 random mutation library was generated using 0.56 mM dATP, 0.9 mM dCTP, 0.2 mM dGTP, 1.4 mM dTTP, 3.26 mM MgCl2, 1 mM MnCl2 and 0.1 U/µl Taq DNA polymerase (Invitrogen). PCR products were verified on a 1% agarose gel and extracted using the Qiagen QIAquick Gel Extraction Kit. Purified products were eluted in 35 µl water. The DNA concentration of the purified PCR product was measured to ensure the total yield was over 500 ng (20 ng/µl). The BP reaction resulting in the pENTR-EphB2 library was carried out according to the Gateway™ technology instruction manual (Invitrogen). Following the BP reaction, library size was estimated by plating serial dilutions of cells on Luria Broth (Bloom et al., 2005) agar plates containing kanamycin (25 µg/ml). The rate of epPCR mutation was confirmed by sequencing 12 randomly selected clones, prepared using the TempliPhi method (GE Healthcare). Finally, the LR reaction to create the pT7-EphB2-ZsGreen library was carried out according to the Gateway™ technology instruction manual (Invitrogen). Libraries were stored as glycerol stocks at –80°C until screening. A viability count was carried out with each library prior to screening.
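The library size estimate from serial dilution plating can be sketched as a simple back-calculation from colony counts. The text does not give the dilutions or volumes used, so the function name `library_size` and all numbers in the example are hypothetical and serve only to illustrate the arithmetic.

```python
def library_size(colonies: int, dilution_factor: float,
                 plated_volume_ul: float, total_volume_ul: float) -> float:
    """Estimate the number of independent transformants in a library.

    colonies          colonies counted on one dilution plate
    dilution_factor   fold dilution of the plated sample (e.g. 100 for 1:100)
    plated_volume_ul  volume of diluted cells spread on the plate (µl)
    total_volume_ul   total volume of the undiluted transformation (µl)
    """
    if colonies < 0:
        raise ValueError("colony count cannot be negative")
    if min(dilution_factor, plated_volume_ul, total_volume_ul) <= 0:
        raise ValueError("dilution factor and volumes must be positive")
    # Colony-forming units per µl of the undiluted transformation mix.
    cfu_per_ul = colonies * dilution_factor / plated_volume_ul
    return cfu_per_ul * total_volume_ul


if __name__ == "__main__":
    # Hypothetical plate: 80 colonies from 100 µl of a 1:100 dilution,
    # taken from a 1000 µl transformation.
    print(f"Estimated library size: {library_size(80, 100, 100, 1000):.0f} clones")
```

In practice, counts from several dilutions would be averaged, using only plates with a countable number of well-separated colonies.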
For construct verification of the RCFP destination vectors, the CEQ DTCS QuickStart Kit (Beckman Coulter) was used. Amplification was set up according to the manufacturer's instructions using appropriate primers with the following modification: one-quarter volume of kit reagent was used for each PCR reaction. Following PCR, the reaction was stopped and samples precipitated following the manufacturer's instructions. Sequencing and analysis were carried out on a CEQ8000 Genetic Analysis System (Beckman Coulter).
For quality control of the libraries, template DNA was prepared by rolling circle amplification using the GE Healthcare TempliPhi Amplification kit. One millilitre of LB liquid culture, supplemented with appropriate antibiotics and dispensed into a 96-deep-well plate, was inoculated with a single colony (BP reaction transformants) or 1 µl of glycerol stock (library screen hits) and incubated overnight at 37°C, 250 r.p.m. The TempliPhi reaction was set up in 96-well PCR plates according to the supplier's instructions with the following modifications: 1 µl of a 1:100 dilution of the overnight culture in water was used as starting material, and the TempliPhi reaction mixture was spiked with 0.5 M betaine (Sigma) and incubated overnight at room temperature. Prior to DNA sequencing or analytical restriction digestion, the amplified product was diluted 1:5 in water and 10–15 µl was used to set up the sequencing reaction with Applied Biosystems BigDye terminator chemistry.
DNA sequence analyses were performed using LaserGene DNAStar SeqMan software (v5.07) and SoftGenetics Mutation Surveyor. In silico Gateway™ cloning and construction of restriction maps were carried out using Invitrogen Vector NTI™ Advance 9.
The library screening was performed in a two-step process. First, libraries of mutated genes assembled in pT7-ZsGreen were transformed into BL21(DE3) E.coli (Invitrogen). Transformants were plated at 2000 colonies per plate (Vented Q-trays, Genetix) on nylon membrane (Performa II, Genetix) positioned on LB agar plates containing tetracycline (12 µg/ml) and incubated overnight at 37°C. Induction was performed for 5 h at 37°C by transferring the membrane to LB agar plates containing tetracycline and supplemented with 1 mM IPTG (isopropyl-beta-D-thiogalactopyranoside). Colonies with increased fluorescence were identified and picked into 96-well plates (X6011, Genetix) containing 120 µl of LB liquid media with tetracycline (12 µg/ml) per well. Colony picking was carried out using a GloPix colony picker (Genetix), equipped with a ZsGreen1 filter (Chroma Technologies) with excitation at 470 nm and emission at 520 nm. The plates were incubated overnight at 37°C, 450 r.p.m (Comfort Shaker, Eppendorf). Glycerol stocks were made by adding 60 µl of 45% (v/v) glycerol/LB to each well and stored at –80°C. A second round of screening was immediately carried out as previously described with the following modifications: the hits obtained from the first picking stage were gridded out, using a 96-pin replicator, onto nylon membrane positioned on LB agar plates containing tetracycline. Additionally, un-mutated EphB2 in pT7-ZsGreen [expressed in BL21(DE3)] was spotted on the membrane alongside the library hits. The colony picker was set up to select colonies with an increased fluorescence signal compared with the un-mutated control.
Expression of recombinant proteins was carried out using E.coli BL21(DE3). For small-scale expression experiments, 1 µl of glycerol stock was used to inoculate 1 ml of LB liquid culture, which had been supplemented with the appropriate antibiotics and dispensed into 96-deep-well plates (Whatman-Uniplate, 2 ml). After overnight incubation at 37°C and 250 r.p.m, these start-up cultures were used to inoculate 5 ml of LB broth (with appropriate antibiotics) dispensed in 24-deep-well plates (Whatman-Uniplate, 10 ml) to a final optical density (OD 600 nm) of 0.05. Expression cultures were grown to an OD 600 nm of 0.4 at 37°C, 250 r.p.m and then transferred to a 20°C incubator. When the OD 600 nm reached 0.6, cultures were induced with IPTG to a final concentration of 0.1 mM. Cells were harvested by centrifugation after 18–20 h of induction. During cultivation, deep-well plates were sealed with Qiagen AirPore tape. For ERDBD constructs, expression was induced with 0.5 mM IPTG and performed at 20°C and 37°C. For EphB2 constructs, induction was carried out with 0.1 mM IPTG at 20°C. For the larger scale expression experiments, overnight cultures were used to inoculate 600 ml of LB in 2 l flasks, to a final OD 600 nm of 0.05. Culture and induction conditions were identical to those used during small-scale experiments.
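Inoculating to a fixed starting OD 600 nm of 0.05 amounts to a simple dilution calculation. The sketch below assumes that optical density scales linearly with dilution and that the stated culture volume is the final volume including the inoculum; the function name `inoculum_volume_ml` and the starter densities in the example are hypothetical.

```python
def inoculum_volume_ml(starter_od600: float,
                       target_od600: float = 0.05,
                       final_volume_ml: float = 5.0) -> float:
    """Volume of overnight culture needed to reach the target starting OD.

    Uses C1 * V1 = C2 * V2, treating OD 600 nm as proportional to cell density.
    """
    if starter_od600 <= target_od600:
        raise ValueError("starter culture must be denser than the target OD")
    return target_od600 * final_volume_ml / starter_od600


if __name__ == "__main__":
    # Hypothetical overnight cultures: OD 2.5 for a 5 ml deep-well culture
    # and OD 3.0 for a 600 ml flask culture.
    print(f"5 ml culture:   {inoculum_volume_ml(2.5) * 1000:.0f} µl of starter")
    print(f"600 ml culture: {inoculum_volume_ml(3.0, final_volume_ml=600.0):.1f} ml of starter")
```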
Fluorescence detection using a Microplate reader
For the optimisation of detection and the choice of RCFP, fluorescence was measured using a SpectraMax Gemini XS (Molecular Devices). To measure the fluorescence of ZsGreen-tagged recombinant proteins identified from the library screening, a TECAN microplate reader with a 485 nm excitation filter and a 535 nm emission filter was used at a gain setting of 0.8. In parallel, the OD 600 nm was quantified on a SpectraMax microplate reader. The fluorescence signal to noise ratio [labelled as fluorescence signal in arbitrary units (AU) hereafter] was calculated using the formula [(Signal – Noise)/Noise], where Signal is the fluorescence signal from the induced or mutated proteins and Noise corresponds to the fluorescence signal from un-induced samples, or from un-mutated protein in the case of a library screen. In addition, the fluorescence signal was normalised to equal cell density, which enabled direct comparison between samples.
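The signal to noise formula can be written out as a short calculation. The text does not specify how the normalisation to cell density was performed, so this sketch assumes that raw fluorescence readings are divided by OD 600 nm before the ratio is taken; the function names and the readings in the example are hypothetical.

```python
def signal_to_noise(signal: float, noise: float) -> float:
    """Fluorescence signal in arbitrary units: (Signal - Noise) / Noise."""
    if noise <= 0:
        raise ValueError("noise reading must be positive")
    return (signal - noise) / noise


def per_od(fluorescence: float, od600: float) -> float:
    """Normalise a raw fluorescence reading to cell density (assumed: per OD unit)."""
    if od600 <= 0:
        raise ValueError("OD 600 nm must be positive")
    return fluorescence / od600


def fluorescence_signal(raw_sample: float, od_sample: float,
                        raw_reference: float, od_reference: float) -> float:
    """Density-normalised signal to noise ratio of a sample versus its reference.

    The reference is the un-induced culture, or the un-mutated protein
    in the case of a library screen.
    """
    return signal_to_noise(per_od(raw_sample, od_sample),
                           per_od(raw_reference, od_reference))


if __name__ == "__main__":
    # Hypothetical plate-reader values, not taken from the paper.
    print(fluorescence_signal(raw_sample=500.0, od_sample=0.5,
                              raw_reference=400.0, od_reference=2.0))
```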
For high-throughput purification, cell pellets were re-suspended in 500 µl of lysis buffer (20 mM Tris pH 7.8, 200 mM NaCl, 5 mM imidazole, 13 mM CHAPS) supplemented with 10 mg/ml lysozyme and 10 U/ml Benzonase nuclease (Novagen). Cell suspensions were incubated for 30 min at 25°C with shaking (250 r.p.m). After one freeze-thaw cycle, lysates were separated into soluble and insoluble fractions by centrifugation at 3000g for 30 min through a 96-well filter plate (Whatman). Purification of 6His-tagged proteins from the soluble fraction was carried out using His-MultiTrap HP plates (GE Healthcare) according to the supplier's instructions. Plates were washed using lysis buffer and 6His-tagged proteins were eluted with 250 µl of lysis buffer supplemented with 0.2 M imidazole. For larger scale purification, cell pellets were re-suspended at 10 ml/g of cell paste in 40 mM HEPES pH 8.0, 500 mM NaCl, 20 mM imidazole, 1 mM TCEP, protease inhibitor tablets (Roche complete EDTA-free tablets), lysozyme (10 mg/ml), 5 mM MgCl2 and 10 U/ml Benzonase nuclease. After 30 min incubation at room temperature, cell lysis was carried out by sonication with 2 × 30 s pulses at 50% amplitude on a Sonics and Materials VibraCell sonicator. Insoluble material was removed by centrifugation at 30 000g for 30 min. For protein purification in denaturing conditions, insoluble material was re-suspended by sonication in lysis buffer supplemented with 8 M urea. 6His-tagged proteins were purified on Qiagen Ni-NTA resin packed into BioRad Bio-spin disposable chromatography columns, pre-equilibrated with 5 column volumes (CV) of lysis buffer. After a 10 CV wash using lysis buffer, proteins were eluted in the presence of 250 mM imidazole. Samples were analysed on NuPage Bis-Tris 4–12% SDS–PAGE gels (Invitrogen) or on the Agilent Lab-on-a-Chip system (Agilent Technologies) according to the manufacturer's instructions.
Coomassie-stained SDS–PAGE gels were analysed by densitometry on images captured with Syngene GeneGenius using Syngene GeneSnap and GeneTools software. The fraction of soluble protein was defined as %soluble = [Tsol/(Tsol + Tins)], with Tsol the soluble fraction and Tins the insoluble fraction. Protein concentration of purified samples was determined using Bio-Rad Protein assay reagent with bovine serum albumin as a standard.
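The soluble fraction defined above can be computed directly from the densitometry values of the two bands. In this minimal sketch the ratio Tsol/(Tsol + Tins) is multiplied by 100 to express it as a percentage, which is an assumption about how %soluble was reported; the function name and band intensities are hypothetical.

```python
def percent_soluble(t_sol: float, t_ins: float) -> float:
    """Percentage of target protein in the soluble fraction.

    t_sol  densitometry intensity of the target band in the soluble fraction
    t_ins  densitometry intensity of the target band in the insoluble fraction
    """
    if t_sol < 0 or t_ins < 0:
        raise ValueError("band intensities cannot be negative")
    total = t_sol + t_ins
    if total == 0:
        raise ValueError("no target protein detected in either fraction")
    return 100.0 * t_sol / total


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary densitometry units.
    print(f"{percent_soluble(30.0, 70.0):.0f}% soluble")
```

Both bands should be taken from equivalent loadings of the same lysate so that the two intensities are directly comparable.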
Results
Characterisation of RCFPs in E.coli liquid culture
The members of the RCFP family have the potential to be reporters of improved solubility. To determine their suitability in a solubility reporter assay, we first needed to identify the optimal wavelengths for their detection, so different combinations of excitation, emission and cutoff wavelengths were tested. Escherichia coli BL21(DE3) cells expressing the RCFPs alone in a pET28 vector were used. This was considered to be a baseline expression level for RCFPs as they were not linked to any fusion protein that could interfere with their folding and accumulation level. Fluorescence of whole cells expressing the RCFPs in liquid culture was measured at a variety of wavelengths based on the optima published by the supplier (Clontech). Whole cell fluorescence was measured directly from samples of induced cell cultures, and the signal to noise fluorescence measurements for each combination of excitation, emission and cutoff wavelengths for each RCFP are shown in Fig. 2. AsRed was immediately identified as having a poorer maximal signal to noise ratio compared with the other RCFPs and was eliminated as a potential solubility reporter. In contrast, ZsGreen and AmCyan showed a good signal to noise ratio whereas ZsYellow had an intermediate signal. The level of fluorescence for the RCFPs was greater in all cases than for the control GFP (S65T), suggesting that the RCFPs were much easier to detect than this variant of GFP.
Changes in solubility levels detected using ZsGreen
Since ZsGreen, AmCyan and ZsYellow had been identified as having suitable signal to noise levels for detection, their ability to report distinct levels of solubility in different proteins was then tested. This was done using two variants of the human ERDBD known to have different levels of solubility (Wikstrom et al., 1999; Low et al., 2002). ERDBD carrying the C245S point mutation has a reduced solubility compared with the wild-type ERDBD protein. C245, within the hydrophobic core of the second zinc finger, is a conserved residue among the superfamily of nuclear hormone receptor DNA binding domains. Its substitution with a Ser residue has been shown to greatly destabilise the ERDBD structure through loss of non-polar interactions.
The expression vectors pT7-ERDBD-ZsGreen, pT7-ERDBD-AmCyan, pT7-ERDBD-ZsYellow, pT7-[C245S]ERDBD-ZsGreen, pT7-[C245S]ERDBD-AmCyan and pT7-[C245S]ERDBD-ZsYellow were constructed. They allow the expression of ERDBD constructs with an N-terminal 6His tag and a C-terminal RCFP tag. Expression of the fusion proteins was carried out in BL21(DE3) cells at both 20°C and 37°C to assess the RCFPs' ability to function as reporters under different cultivation temperatures. All RCFP fusion constructs expressed at similar levels (data not shown). At both temperatures, ZsGreen appeared to be the best solubility reporter of all of the RCFP variants as it provided a strong fluorescence signal and showed the most distinguishable difference between the ERDBD and the [C245S]ERDBD mutant (Fig. 3). AmCyan, which showed high levels of fluorescence for both ERDBD and [C245S]ERDBD, was thought to be a less favourable option, as we could not easily differentiate between the two ERDBD variants. ZsYellow was also considered to be a possible candidate as an indicator of solubility but was not selected as it gave a lower overall signal to noise ratio and would provide a less sensitive detection system than ZsGreen.
ZsGreen appeared to be the best fluorescent tag in terms of signal sensitivity. It is known that large fusion tags such as the 40 kDa maltose binding protein (MBP) can promote the folding of their fusion partners and thus act as solubilising agents (Kapust and Waugh, 1999).
Initial selection of ZsGreen fusion proteins by colony picker according to solubility levels
To confirm the ability of ZsGreen to act as a solubility reporter for high-throughput screening, the following experiment was conducted using a Genetix automated colony picker. A mixed population of cells was created so that cells expressing ERDBD and [C245S]ERDBD ZsGreen fusion proteins were in the minority (20 of 2000) compared with cells expressing the non-fluorescent 6His-tagged ERDBD and [C245S]ERDBD constructs. The cell mixture was grown and recombinant protein expression induced as described in the library screening method. Fluorescent colonies were picked and analysed to determine if the colony picker could (1) identify and pick fluorescent colonies and (2) preferentially pick ERDBD-ZsGreen over [C245S]ERDBD-ZsGreen fusions. As shown in Fig. 5a and b, the colony picker was able to differentiate between fluorescent and non-fluorescent colonies using a ZsGreen specific filter. From 2000 plated colonies, the colony picker selected 18 fluorescent colonies (Fig. 5c). DNA sequencing confirmed that all 18 colonies contained the ZsGreen tag and determined the order of picking as follows: the first 12 clones picked corresponded to ERDBD-ZsGreen and the final six clones corresponded to [C245S]ERDBD-ZsGreen. The discrepancy between the total number of fluorescent clones (20 clones) and those suitable to pick (18 clones) was due to the stringency of the picking criteria: colonies of the wrong axis ratio, size or roundness (i.e. those that are not single colonies) are not selected.
Further testing of colony picking with ZsGreen fusion proteins
The ZsGreen tag was further tested in order to assess its use as a solubility reporter for typical therapeutic targets. The 30 kDa kinase domains of three RTKs (RTK1, RTK2 and RTK3) and of two Ser/Thr kinases (STK1 and STK2) were sub-cloned into the pT7-ZsGreen expression vector. These new kinase constructs were expressed alongside our soluble ERDBD and [C245S]ERDBD constructs. All three RTKs show poor solubility levels (<0.1 mg/l culture) when expressed and purified as 6His-tagged fusion proteins, whereas STK1 and STK2 have detectable levels of solubility at 10–14 mg/l (Fig. 6 and Table II). As seen in Fig. 7a, the whole cell fluorescence level correlates with the solubility levels of the 6His constructs, as only the ERDBD and STK constructs have detectable fluorescence levels above background level. The ERDBD–ZsGreen construct has the highest whole cell fluorescence (39 AU) followed by [C245S]ERDBD (10 AU) and STK1 and STK2 (5 AU). In contrast, the poorly soluble RTK constructs have very low levels of fluorescence (<1 AU). The absence of fluorescence for RTK2 and RTK3 may be explained by both their low expression level and poor solubility, whereas RTK1 is highly expressed, insoluble and yet yields no fluorescence. It is proposed that the presence of the RTKs as N-terminal fusions prevents ZsGreen from forming its native fluorescent structure.
This panel of RTK, STK and ERDBD constructs was then tested on the colony picker in order to determine whether the colony picker can effectively discriminate between soluble and insoluble constructs. A mixed population of cells containing equal numbers of cells expressing all the different constructs tagged with either a C-terminal 6His or ZsGreen was plated and screened using the GloPix colony picker. The identity of the picked clones was confirmed by DNA sequencing. Figure 7b shows that none of the insoluble RTKs was selected, whereas the soluble STK and ERDBD constructs were picked. Furthermore, the order of selection of the expressed constructs by the colony picker corresponds to their solubility level as ERDBD constructs were selected before the less soluble STKs and RTKs.
Construction of random mutation libraries
As part of the development of our solubility-screening platform, it was important to develop a robust system to generate large libraries of random mutations. The epPCR approach developed by Fromant et al. (1995) and the recombination-mediated Invitrogen Gateway™ cloning system were combined. The Gateway™ system was selected for ease of cloning and as a robust method for eliminating problems of poor ligation efficiency that can arise with conventional cloning methods. epPCR using ERDBD as the template was used to generate the first libraries. Since we aimed to keep the numbers of mutations low to avoid the introduction of deleterious levels of mutations (Suzuki et al., 1996; Bloom et al., 2005), we used an unbalanced ratio of nucleotides as well as MnCl2/MgCl2 salt concentrations predicted to introduce 3 mutations per 1000 bp. PCR products were then cloned into our pT7-ZsGreen expression vector. Table III summarises an analysis of a pool of expression clones showing that we introduced a variety of mutations spread throughout the target gene and were able to keep the number of mutations low (<3 mutations/kb). Additionally, the spectrum of mutations is in agreement with previously published work (Wong et al., 2004) on Taq polymerase that indicates a bias for mutations on A and T (data not shown).
We further tested the protocol by constructing a series of libraries in the pT7-ZsGreen expression vector using target genes ranging from 0.8 to 1.7 kb. As seen in Table IV, we were able to successfully create random mutation libraries of sizes ranging from 0.2 × 10⁴ to 2 × 10⁵ clones depending on the efficiency of the Gateway™ cloning reactions. We sometimes saw poor yield for the epPCR reaction as well as a discrepancy between the predicted and the observed rate of mutations, as some newly constructed libraries showed very high mutation rates (>5 mutations/kb). Our interpretation is that this increased error rate is gene-primer dependent, as the gene length, GC content and amount of starting material do not appear to affect the PCR mutation rate.
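The observed mutation rate of a library, expressed in mutations per kb, can be estimated from a handful of sequenced clones and compared with the intended rate. The following is a minimal sketch of that bookkeeping; the function names, the per-clone mutation counts and the library labels are hypothetical and are not data from this study.

```python
def mutations_per_kb(mutation_counts: list, sequenced_bp_per_clone: int) -> float:
    """Average number of base changes per 1000 bp across sequenced clones.

    mutation_counts         number of base changes found in each sequenced clone
    sequenced_bp_per_clone  length of target sequence read per clone (bp)
    """
    if not mutation_counts or sequenced_bp_per_clone <= 0:
        raise ValueError("need at least one clone and a positive read length")
    total_bp = len(mutation_counts) * sequenced_bp_per_clone
    return 1000.0 * sum(mutation_counts) / total_bp


def closest_library(observed_rates: dict, target: float) -> str:
    """Name of the library whose observed rate is nearest the target rate."""
    return min(observed_rates, key=lambda name: abs(observed_rates[name] - target))


if __name__ == "__main__":
    # Hypothetical sequencing results for three parallel epPCR libraries,
    # each with four clones read over 1000 bp.
    libraries = {
        "library A": mutations_per_kb([1, 2, 1, 0], 1000),
        "library B": mutations_per_kb([2, 3, 4, 3], 1000),
        "library C": mutations_per_kb([6, 5, 7, 6], 1000),
    }
    for name, rate in libraries.items():
        print(f"{name}: {rate:.1f} mutations/kb")
    print("Closest to 3 mutations/kb:", closest_library(libraries, target=3.0))
```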
To circumvent these issues, which could have a detrimental effect on the library quality, solubility screen and overall process timelines, we routinely set up four epPCR reactions in parallel using conditions to introduce 1.5, 2, 3 and 4 mutations per kb and effectively created three to four libraries per target gene. Sequencing analysis of 16 randomly selected clones was carried out to identify the library with the desired target mutation rate. The library identified as having the correct mutation rate was then used for screening. The quality of the library was also assessed by identifying the number of clones containing our gene of interest. Routinely 85–100% of library clones contained our gene of interest, with the remaining clones containing short or no insert resulting from Gateway™ cloning artefacts. To analyse the libraries, we routinely prepared DNA template straight from bacterial culture by rolling circle amplification using the GE Healthcare TempliPhi kit. This allows the preparation of enough material for the DNA sequencing and restriction mapping of clones in a high-throughput manner.
Preparation of soluble EphB2 mutants by directed evolution
In order to test the efficacy of our selection system, we applied our ZsGreen-based directed evolution screening to obtain soluble mutants of the kinase domain from the human EphB2 RTK. The Ephrin B2 RTK domain used in this study consisted of residues D604 to S898 (labelled as EphB2 in the following text). We constructed a fusion protein in which un-mutated EphB2 was linked to an N-terminal 6His tag and to a C-terminal ZsGreen reporter. Initial expression experiments demonstrated that E.coli cells express the wild-type EphB2-ZsGreen 62 kDa fusion at high levels, but the majority of this product ends up in inclusion bodies (Fig. 8a) and does not fluoresce (Fig. 8b). Similarly, an EphB2 construct with an N-terminal 6His tag fails to be expressed in soluble form (Fig. 10b). Thus, the presence of EphB2 sequence as an N-terminal fusion prevents ZsGreen from forming its native fluorescent structure. A random mutation library was generated by epPCR according to the protocol by Fromant et al. (1995), modified to allow for GatewayTM cloning into the pT7-ZsGreen destination vector. DNA sequencing of 16 randomly selected clones determined that the library had an average of 3 mutations per kb. The library size was estimated at 8 × 10⁴ independent clones following the GatewayTM recombination reaction. The EphB2 mutated sequences were then fused to a 6His tag at their 5' end and to the ZsGreen coding sequence at their 3' end in the pT7-ZsGreen vector and transformed into BL21(DE3) cells.
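As a rough illustration of what an average of 3 mutations per kb implies for this construct, the sketch below converts the stated residue range (D604 to S898) into a coding length and estimates the mean number of base changes per clone. The Poisson model for the spread of mutation counts is an assumption made here for illustration and is not reported in the study; the function names are hypothetical.

```python
import math

# Values taken from the text.
FIRST_RESIDUE = 604        # D604
LAST_RESIDUE = 898         # S898
MUTATIONS_PER_KB = 3.0     # average from 16 sequenced clones


def gene_length_bp(first_residue, last_residue):
    """Coding length in base pairs for an inclusive residue range."""
    return (last_residue - first_residue + 1) * 3


def expected_mutations(length_bp, rate_per_kb):
    """Mean number of base changes per clone at a given rate per kb."""
    return length_bp * rate_per_kb / 1000.0


def poisson_pmf(k, mean):
    """Probability of exactly k mutations, ASSUMING Poisson-distributed counts."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


length_bp = gene_length_bp(FIRST_RESIDUE, LAST_RESIDUE)
mean_mutations = expected_mutations(length_bp, MUTATIONS_PER_KB)
unmutated_fraction = poisson_pmf(0, mean_mutations)

print(f"mutated region: {length_bp} bp")
print(f"mean base changes per clone: {mean_mutations:.2f}")
print(f"expected fraction of clones with no base change: {unmutated_fraction:.3f}")
```

Under this assumption, roughly 7% of clones would be expected to carry no base change in the EphB2 region.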
A library of 20 000 clones was screened with the GloPix colony picker in a two-step process. In the first step, the top 1000 fluorescent clones were selected. In the second step, these top 1000 hits were gridded onto new plates using a 96-pin replicator and re-screened against the un-mutated fusion protein control. Over 60% of the top hits showed a 1.5- to 2-fold increase in fluorescence compared with the un-mutated EphB2 control, thus indicating a significant enrichment of potentially more soluble clones (data not shown).
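The logic of the two-step screen can be sketched in a few lines: rank clones by fluorescence, carry the top hits forward, then express the re-screen signal as a fold change over the un-mutated control. This is a minimal sketch and not the colony picker's software; the function names, the clone identifiers and all readings are hypothetical values chosen only to make the example run.

```python
def top_hits(fluorescence_by_clone, n):
    """Return the n clone IDs with the highest first-pass fluorescence."""
    ranked = sorted(fluorescence_by_clone,
                    key=fluorescence_by_clone.get,
                    reverse=True)
    return ranked[:n]


def fold_change(rescreen_by_clone, control_signal):
    """Fluorescence of each re-screened clone relative to the control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return {clone: signal / control_signal
            for clone, signal in rescreen_by_clone.items()}


def enriched_fraction(fold_changes, threshold=1.5):
    """Fraction of re-screened clones at or above the fold-change threshold."""
    if not fold_changes:
        return 0.0
    passing = sum(1 for value in fold_changes.values() if value >= threshold)
    return passing / len(fold_changes)


# Hypothetical first-pass readings (arbitrary fluorescence units).
first_pass = {"A1": 120.0, "A2": 310.0, "A3": 95.0, "A4": 260.0, "A5": 180.0}
selected = top_hits(first_pass, n=3)

# Hypothetical re-screen readings for the selected clones after gridding,
# measured alongside the un-mutated fusion control.
rescreen = {"A2": 300.0, "A4": 270.0, "A5": 170.0}
control = 150.0

changes = fold_change(rescreen, control)
fraction = enriched_fraction(changes, threshold=1.5)
print(selected, changes, fraction)
```

In the screen described above, `n` would be 1000 out of 20 000 clones and the 1.5-fold threshold corresponds to the lower bound of the reported increase.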
From this second step, the 400 most fluorescent clones selected by the colony picker were subjected to small-scale expression in order to determine whether the improvement in fluorescence correlated with an increase in soluble protein. Purification and analysis of the recombinant proteins were performed in a 96-well format, using the GE Healthcare Multitrap system and the Agilent Lab-on-a-chip platform, respectively. This allowed us to rapidly compile data in order to identify clones showing the highest fluorescence signal combined with high solubility compared with the un-mutated wild-type control (Fig. 9). The solubility level of the clone with the highest fluorescence (mutant 1, corresponding to C5 in Fig. 9b) was further assessed with expression and purification experiments at a larger scale. As seen in Fig. 10a, wild-type and mutant EphB2 are expressed at similar levels; however, whereas the wild-type EphB2-ZsGreen fusion fails to be purified from the soluble fraction, the mutant EphB2-ZsGreen fusion can be readily purified from the same fraction. This evolved mutant was sub-cloned into an expression vector lacking the fluorescent tag. Subsequent expression and purification experiments showed that this EphB2 variant retained its solubility in the absence of the ZsGreen C-terminal tag (Fig. 10b). Mutant 1 has a soluble yield of 8.5 mg/l culture, compared with <0.1 mg/l culture for wild-type EphB2.
| Discussion |
|---|
By combining the epPCR and GatewayTM methods for library construction with the ZsGreen reporter-based solubility screen and an automated colony picker, we have developed a process to screen and analyse 20 000 potentially soluble clones within 8 weeks (Fig. 11). This includes 2 weeks for the library construction, 2 weeks for the fluorescent solubility screen and another 4 weeks for downstream analysis of top performing mutants. Analysis of several RCFPs as solubility reporters showed that ZsGreen was the most promising with regard to the ease of detection, compared with the other RCFPs and GFP(S65T) [The authors would like to make clear that GFP(S65T) was used in this study and not the GFP folding reporter (S65T, F64L mutant of the cycle-3 GFP) as reported by Waldo et al.]. We have also demonstrated the successful use of ZsGreen for reporting differences in solubility with proteins of known solubility (ERDBD/[C245S]ERDBD), a panel of test proteins and the target EphB2. Characterisation of the evolved EphB2 soluble mutants is currently underway and will be the focus of a further publication (manuscript in preparation).
Importantly, ZsGreen does not by itself enhance or decrease the solubility of the fusion proteins and this ensures that the detected increase in solubility is due to the properties of the fusion protein alone. This is the case in the present study where the improved solubility of the EphB2 evolved mutant is retained in the 6His constructs lacking the ZsGreen tag. Additionally, the highest fluorescence signal was obtained for the cells expressing the EphB2 evolved variant mutant 1, which correlates with the highest soluble yield following purification.
In the epPCR method used, we observed a variable error rate and believe these variations in mutation rate to be due to the use of primers containing att sequences rather than primers optimised for epPCR. To overcome problems with error rate reproducibility, and any potential bottleneck in screening that might result from producing one library at a time, we set up parallel epPCR reactions with differing conditions to obtain the correct mutation rate. Quality control by sequencing of randomly chosen mutated genes identified which of the libraries contained the desired rate of mutation. This parallel approach saved time, as all libraries were immediately ready to use in the GatewayTM sub-cloning step and backup libraries were readily available if a library with a different mutation rate was required. The use of the TempliPhi sequencing method in our process saves time compared with standard DNA plasmid preparations, and the subsequent analysis of the sequences is greatly accelerated by software such as the SoftGene Mutation Surveyor, which creates large contigs of sequences and directly reviews and compiles mutations in a user-friendly format.
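The sequencing-based quality control described above amounts to estimating a mutation rate for each parallel library and keeping the one closest to the target. The sketch below shows one way to do this, assuming the sequenced clones are already aligned to the reference and contain substitutions only; it is not the Mutation Surveyor workflow, and the function names, library labels and sequences are hypothetical.

```python
def mutations_per_kb(reference, clone):
    """Substitutions per kb between two aligned, equal-length sequences."""
    if not reference or len(reference) != len(clone):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for ref_base, base in zip(reference, clone)
                     if ref_base != base)
    return mismatches * 1000.0 / len(reference)


def library_rate(reference, clones):
    """Mean mutation rate per kb over the sequenced clones of one library."""
    return sum(mutations_per_kb(reference, c) for c in clones) / len(clones)


def pick_library(reference, libraries, target_rate):
    """Return the library whose mean rate is closest to the target, plus all rates."""
    rates = {name: library_rate(reference, clones)
             for name, clones in libraries.items()}
    best = min(rates, key=lambda name: abs(rates[name] - target_rate))
    return best, rates


def with_substitutions(sequence, positions):
    """Helper for the toy data: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for position in positions:
        bases[position] = swap[bases[position]]
    return "".join(bases)


# Toy 1 kb reference and two made-up sequenced clones per library.
reference = "ATGC" * 250
libraries = {
    "low": [with_substitutions(reference, [10]),
            with_substitutions(reference, [20, 500])],
    "mid": [with_substitutions(reference, [1, 2, 3]),
            with_substitutions(reference, [5, 50, 500])],
    "high": [with_substitutions(reference, [1, 2, 3, 4, 5]),
             with_substitutions(reference, [7, 70, 700, 800])],
}

best, rates = pick_library(reference, libraries, target_rate=3.0)
print(best, rates)
```

In practice each library would be represented by the 16 sequenced clones mentioned in the text, and insertions or deletions would need a proper alignment step before counting.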
The GatewayTM cloning system is convenient for handling large numbers of clones and allows the easy transfer of libraries into the ZsGreen expression vector. During the GatewayTM recombination steps, we can readily monitor the library size and mutation rate. The initial screening, followed by gridding and comparison with the fluorescence of the wild-type ZsGreen fusion protein, ensured that only colonies with fluorescence above the wild-type ZsGreen fusion background level went forward in the screening process. This combination of optimisation steps has resulted in a high-throughput process. The application of multi-parallel approaches in the downstream analysis also enabled the confirmation of those hits with improved solubility and aided the decision on which hits to focus on for further characterisation. Although these approaches deal with multiple samples and allow the analysis of high numbers, the process is still slowed by the human element, namely the time required to analyse the data generated.
The directed evolution and solubility screening process at present, as outlined in Fig. 11 with white arrows, has in our hands been successful in delivering protein of sufficiently improved solubility for its intended end use. It is an alternative approach to obtaining soluble protein for targets that lack structural information. It is foreseen that improvements in the process can be made with respect to the generation of more mutations and further combination of the existing hits, shown in Fig. 11 with the patterned arrows. Generation of mutations with epPCR might be more predictable using gene-specific primers or commercially available kits, such as the Stratagene GeneMorph EZ clone random mutagenesis kit, with the attB sequences added in a subsequent nested PCR. Libraries constructed using these approaches are then ready for use in the ZsGreen-based solubility screen. Apart from enzyme-based methods such as PCR under low fidelity conditions using unbalanced dNTPs or MnCl2, the use of E.coli mutator strains, which are deficient in genes required for DNA repair or replication, has also been reported (reviewed in Wong et al., 2006).
The further recombination of hits is now under development using DNase shuffling as an extra step after hit identification, using EphB2 as an example. The longer-term objective of directed evolution, including further recombination, is the accumulation of useful knowledge via analysis of mutants with improved solubility. The aim of gathering this information is to facilitate the construction of smaller, more rationally designed libraries, using for example the QuikChange Multi kit (Stratagene). It is envisaged that information gained from such libraries could then be used to apply or avoid certain amino acid substitutions in a rational approach to mutate other members of the same enzyme family. However, in a drug discovery context where structural information would be derived from evolved protein variants, the improvement of solubility must be carefully balanced against loss of function and gross alteration of potential binding sites for drug compounds. Hence evolved mutants should be routinely screened for activity and compound binding.
The process as a whole, with respect to the time taken to deliver the first soluble hits, could benefit from further automation. A black frame in Fig. 11 highlights these steps. The use of a plate stacker linked to the colony picker would enable higher throughput and continuous screening of colonies and increase the screening capacity beyond the current 20 000 clones possible with the colony picker. At present, the first round of screening is carried out over 1 week with the gridding screen in the following week. With small adaptations to the process, we aim to carry out both steps in the same week, thus reducing the time for the process as a whole from 8 to 7 weeks. Downstream of the colony picking, small-scale fermentation and protein purification could be automated by using a dedicated protein expression and purification platform such as PiccoloTM (The Automation Partnership) or the ExpressionfactoryTM (NextGen Sciences).
Further development of the screening platform is underway and we are in the process of using it with a range of difficult-to-express target proteins. It is expected that, if successful, this will have a direct impact on the early stages of the drug discovery process.
| Footnotes |
|---|
Edited by Alan Berry
| Acknowledgements |
|---|
We thank Magnus Hansson and Per Åke Löfdahl for work on the epPCR and Hannu Ojanperä for assistance in the library screening. We would like to thank Mark McAlister and Isabelle Green for providing the original EphB2 construct. We also thank Mark Abbott, Christine Dartsch, April Greene and Ian Taylor for stimulating discussion and helpful criticism.
| References |
|---|
Baneyx F., Mujacic M. Nat. Biotechnol. (2004) 22:1399–1408.
Binns K.L., Taylor P.P., Sicheri F., Pawson T., Holland S.J. Mol. Cell Biol. (2000) 20:4791–4805.
Bloom J.D., Silberg J.J., Wilke C.O., Drummond D.A., Adami C., Arnold F.H. Proc. Natl Acad. Sci. USA (2005) 102:606–611.
Bourett T.M., Sweigard J.A., Czymmek K.J., Carroll A., Howard R.J. Fungal Genet. Biol. (2002) 37:211–220.
Bowden G.A., Paredes A.M., Georgiou G. Biotechnology (NY) (1991) 9:725–730.
Cabrita L.D., Bottomley S.P. Biotechnol. Annu. Rev. (2004) 10:31–50.
Carrio M.M., Corchero J.L., Villaverde A. FEMS Microbiol. Lett. (1998) 169:9–15.
Eijsink V.G., Bjork A., Gaseidnes S., Sirevag R., Synstad B., van den Berg S., Vriend G. J. Biotechnol. (2004) 113:105–120.
Fromant M., Blanquet S., Plateau P. Anal. Biochem. (1995) 224:347–353.
Hart D.J., Tarendeau F. Acta Crystallogr. D Biol. Crystallogr. (2006) 62:19–26.
Hartley J.L., Temple G.F., Brasch M.A. Genome Res. (2000) 10:1788–1795.
Kapust R.B., Waugh D.S. Protein Sci. (1999) 8:1668–1674.
Kim G.J., Cheon Y.H., Kim H.S. Biotechnol. Bioeng. (2000) 68:211–217.
Low L.Y., Hernandez H., Robinson C.V., O'Brien R., Grossmann J.G., Ladbury J.E., Luisi B. J. Mol. Biol. (2002) 319:87–106.
Matz M.V., Fradkov A.F., Labas Y.A., Savitsky A.P., Zaraisky A.G., Markelov M.L., Lukyanov S.A. Nat. Biotechnol. (1999) 17:969–973.
Pedelacq J.D., Waldo G.S., Cabantous S., Liong E.C., Terwilliger T.C. Protein Sci. (2005) 14:2562–2573.
Suzuki M., Christians F.C., Kim B., Skandalis A., Black M.E., Loeb L.A. Mol. Divers. (1996) 2:111–118.
Tobbell D.A., Middleton B.J., Raines S., Needham M.R., Taylor I.W., Beveridge J.Y., Abbott W.M. Protein Expr. Purif. (2002) 24:242–254.
van den Berg S., Lofdahl P.A., Hard T., Berglund H. J. Biotechnol. (2006) 121:291–298.
Waldo G.S. Methods Mol. Biol. (2003) 230:343–359.
Waldo G.S., Standish B.M., Berendzen J., Terwilliger T.C. Nat. Biotechnol. (1999) 17:691–695.
Wenck A., et al. Plant Cell Rep. (2003) 22:244–251.
Wikstrom A., Berglund H., Hambraeus C., van den Berg S., Hard T. J. Mol. Biol. (1999) 289:963–979.
Wong T.S., Tee K.L., Hauer B., Schwaneberg U. Nucleic Acids Res. (2004) 32:e26.
Wybenga-Groot L.E., Baskin B., Ong S.H., Tong J., Pawson T., Sicheri F. Cell (2001) 106:745–757.
Yang J.K., Park M.S., Waldo G.S., Suh S.W. Proc. Natl Acad. Sci. USA (2003) 100:455–460.
Received February 23, 2007; revised May 4, 2007; accepted May 8, 2007.