- Research article
- Open Access
The BsaHI restriction-modification system: Cloning, sequencing and analysis of conserved motifs
© Neely and Roberts; licensee BioMed Central Ltd. 2008
- Received: 21 January 2008
- Accepted: 14 May 2008
- Published: 14 May 2008
Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC.
The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases.
We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
- Recognition Sequence
- Methyltransferase Gene
- Motif Reading
- R351A Mutant
DNA restriction-modification (R-M) systems are valuable tools for molecular biology, and the methyltransferases in particular, which have well-conserved structures, also represent excellent model systems for studying the specific interactions between DNA and DNA-binding enzymes. Although the number of cloned and sequenced R-M systems is large in comparison to the number of unique recognition sequences, there is remarkably little sequence similarity amongst the restriction enzymes and, to a lesser extent, between the target recognition domains of the methyltransferases. This implies that these enzymes use a diverse ensemble of DNA recognition modes and methods.
We report the cloning, sequencing and subsequent expression and purification of the BsaHI R-M system from Bacillus stearothermophilus. These enzymes target the degenerate sequence GRCGYC, where R = G/A and Y = T/C (Chen, W., Pan, X. and Chen, Z. unpublished data. See REBASE). The inherent degeneracy of the DNA recognition by these enzymes provides an opportunity to study directly the mechanism of specific DNA recognition and to examine how this breaks down into degenerate DNA recognition. Furthermore, such enzymes are exciting targets in the ongoing effort to manipulate the recognition sequences of enzymes, particularly the restriction enzymes.
R. BsaHI belongs to the Type II subfamily of restriction enzymes. It recognises a palindromic sequence of bases and cleaves within this sequence between the purine and cytosine bases: GR/CGYC, where '/' is the cutting site. The restriction enzymes are also sub-classified as belonging to one of several 'superfamilies', named for their conserved motifs. Examples include the PD-(D/E)xK, HNH and GIY-YIG superfamilies. In the present work, we use a bioinformatics approach to classify the R. BsaHI enzyme and to identify its conserved, putative catalytic motifs. Subsequent in vitro transcription/translation of a series of mutants is used to identify a motif that is crucial for enzymatic activity.
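To make the degenerate site and cut position concrete, the following is a minimal sketch that expands GRCGYC using the R and Y definitions given above and locates GR/CGYC cut positions in a DNA string. The function names and the example sequence in the usage note are illustrative assumptions, not part of the original work.

```python
import re

# IUPAC codes needed for GRCGYC (R = G/A, Y = T/C, as defined in the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_regex(site):
    """Translate a degenerate site such as 'GRCGYC' into a regular expression."""
    return "".join(IUPAC[base] for base in site)


def expand(site):
    """List every concrete sequence covered by the degenerate site."""
    seqs = [""]
    for base in site:
        options = IUPAC[base].strip("[]")
        seqs = [s + o for s in seqs for o in options]
    return seqs


def find_cut_positions(dna, site="GRCGYC", cut_offset=2):
    """Return 0-based indices of the base immediately 3' of each cut (GR/CGYC)."""
    # A lookahead is used so that overlapping sites would also be reported.
    pattern = re.compile("(?=" + site_regex(site) + ")")
    return [m.start() + cut_offset for m in pattern.finditer(dna.upper())]
```

For example, `expand("GRCGYC")` yields the four concrete hexamers GACGCC, GACGTC, GGCGCC and GGCGTC, and `find_cut_positions("TTGACGTCAA")` reports the index of the cytosine that follows the cut.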
The target recognition domain (TRD) of the methyltransferase enzymes is the least conserved region of the enzymes of this type. However, some amino acids of the TRD are conserved and a previous report has revealed a consensus motif towards the C-terminal end of the TRD that reads (YFW)X(RK)X5P(STCA)PT(ILV)(TASV)X5–16H(PFYWL). Structural studies have shown that the residues within and around this motif form critical interactions with the DNA duplex [7, 8]. This so-called 'TL' motif lies between 10 and 50 amino acids from the conserved methyltransferase motif IX. Trautner et al. were the first to note that the TL motif was conserved and applied this knowledge to carefully define and modify the target recognition of multi-specific methyltransferase enzymes in domain swapping experiments [9–11]. A key finding of this work was that the residues lying to the N-terminal side of the TL motif are responsible for the recognition of the base to the 5'- side of the target base for methylation. Later bioinformatic analysis by Cheng and Blumenthal noted that the recognition of the base directly 5'- of the target base for methylation could be correlated with a conserved R or Q/N upstream of the TL motif for recognition of G or C at this position, respectively. We have built on this previous work with a new bioinformatics analysis, in which we show that, to some extent, the target specificity of a given methyltransferase can be predicted by examining the residues around the conserved TL motif in the variable region of the enzyme.
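The consensus quoted above can be written as a regular expression, which may help when locating the TL dipeptide in a new sequence. This is a sketch only: it assumes that 'X' stands for any single residue and that bracketed letters are alternatives, and the function name and the synthetic peptide in the test are hypothetical.

```python
import re

# (YFW)X(RK)X5P(STCA)PT(ILV)(TASV)X5-16H(PFYWL), with the T(ILV) pair captured
# in a named group so that its position can be reported.
TL_CONSENSUS = re.compile(
    r"[YFW].[RK].{5}P[STCA]P(?P<tl>T[ILV])[TASV].{5,16}H[PFYWL]"
)


def find_tl(protein):
    """Return the 0-based index of the T in the TL pair, or None if absent."""
    match = TL_CONSENSUS.search(protein.upper())
    return match.start("tl") if match else None
```

Only the first match is reported; a fuller analysis would also check the expected distance of the motif from conserved motif IX.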
The ExPASy ScanProsite tool was used to carry out a search for enzymes matching any one of three strongly conserved sequence motifs ('WGKNQF', '(Q/K)(T/N)DKAF(A/S)' and 'SPERRFD') from the BsaHI homologues. These motifs were not found beyond the enzymes shown in Figure 2, suggesting that their functionality is specific to these homologues.
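A search of this kind can be approximated locally for a small set of sequences. The sketch below does not call ScanProsite; it simply converts the motif notation used above into regular expressions and scans a dictionary of protein sequences. The helper names and the toy sequence in the test are assumptions made for illustration.

```python
import re

MOTIFS = ["WGKNQF", "(Q/K)(T/N)DKAF(A/S)", "SPERRFD"]


def motif_to_regex(motif):
    """Convert '(Q/K)' style alternatives into '[QK]' character classes."""
    return re.sub(
        r"\(([A-Z/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )


def scan_motifs(sequences, motifs=MOTIFS):
    """Map each sequence name to a list of (motif, 0-based start) hits."""
    hits = {}
    for name, seq in sequences.items():
        found = []
        for motif in motifs:
            for match in re.finditer(motif_to_regex(motif), seq.upper()):
                found.append((motif, match.start()))
        hits[name] = found
    return hits
```

Sequences with an empty hit list carry none of the three motifs, which is the outcome reported above for enzymes outside the homologue group.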
Lane "-bsaHIR" in Figure 4 shows that, in the absence of the bsaHIR gene, no sequence-specific digestion takes place. However, a small amount of smearing is evident, indicating that there is a little non-specific nuclease activity in the IVTT mixture. The positive control, with wild-type BsaHI (lane 'WT'), shows complete digestion of the λ-DNA during the four-hour incubation. The Q344A, S348A and R352A mutants all show similar activity and only a small fraction of the DNA is not completely digested. The activity of all of the other mutants has been significantly impaired by the mutation and can be described by P349A~F353A > E350A~D354A > R351A, where the activity of the R351A mutant is negligible.
The similar activity of the Q344A, S348A and R352A mutants to the wild-type R. BsaHI enzyme indicates that these amino acids do not play an essential functional role in the enzyme. This implies that Q344 and S348 lie in a region of the enzyme that is tolerant of mutation, perhaps a turn or flexible region of the amino acid chain. All of the other mutations, however, significantly decrease the rate of digestion, so the residues from P349 to D354 define a region of the enzyme that is critical to its function. There are clear differences in the digestion rates with the different mutants. The higher activity of the P349A and F353A mutants as compared to the E350A and D354A mutants perhaps indicates that alanine is able to compensate somewhat for the absence of the bulky P/F residues, whereas it clearly cannot mimic the hydrogen bonding functionality of the E/D residues. Remarkably, the R351A mutant is inactive. This result becomes more striking when one considers that the R352A mutant, in which the neighbouring residue is replaced, displays activity comparable to that of the wild-type enzyme. The marked difference in the activity of these mutants of identical, adjacent residues suggests a critical and tightly defined role for R351 in ensuring the activity of R. BsaHI.
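The mutant names used above follow directly from the motif 'QxxxSPERRFD' if its first residue is numbered 344, as the labels Q344 and D354 imply. A minimal sketch of this bookkeeping, with the qualitative activity ordering from the text recorded as tiers (the function and variable names are hypothetical):

```python
def alanine_scan(motif, first_position):
    """Name the alanine substitutions for every specified residue of a motif."""
    names = []
    for offset, residue in enumerate(motif):
        if residue == "x":  # unspecified positions were not mutated
            continue
        names.append(f"{residue}{first_position + offset}A")
    return names


# Qualitative ordering reported in the text, from most to least active.
ACTIVITY_TIERS = [
    ["Q344A", "S348A", "R352A"],  # comparable to wild type
    ["P349A", "F353A"],
    ["E350A", "D354A"],
    ["R351A"],                    # negligible activity
]

mutants = alanine_scan("QxxxSPERRFD", 344)
```

The tiers are a transcription of the gel-based comparison, not quantitative rates.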
Figure 7 shows the superimposed structures of M. HaeIII and M. HhaI and illustrates that the loops on either side of the conserved TL motif are structurally well conserved. Using these structures, we define two trimeric sequences, one on the N-terminal and one on the C-terminal side of the TL motif, which come into close contact with the DNA duplex. These trimers have the spacing 'NNN'x10TLx3'CCC' and will be referred to henceforth as the 'N-TL' and 'C-TL' motifs. There is good evidence for the importance of the C-TL motif in the solution phase for M. HhaI. In vitro compartmentalisation experiments have shown that G257 is critical to the function of M. HhaI, whereas nearby residues S252 and Y254 can be mutated whilst activity is retained. We hypothesised that, in enzymes using similar mechanisms of DNA recognition and recognising similar sequences, the DNA contacts are likely to be similarly spaced from the TL motif and that these key, DNA-contacting residues are likely to be conserved.
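The spacing 'NNN'x10TLx3'CCC' translates into simple index arithmetic once the position of the T in the TL pair is known. The sketch below assumes that position is supplied by the caller (for example from an alignment); the function name and the synthetic peptide used in the test are illustrative only.

```python
def tl_trimers(protein, t_index):
    """Return (N-TL, C-TL) trimers for a TL pair whose T is at t_index (0-based).

    Layout: NNN + 10 residues + T + L + 3 residues + CCC
    """
    n_start = t_index - 13   # 3 trimer residues plus 10 spacer residues before T
    c_start = t_index + 5    # T, L and 3 spacer residues precede the C-TL trimer
    if n_start < 0 or c_start + 3 > len(protein):
        raise ValueError("sequence too short around the TL motif")
    return protein[n_start:n_start + 3], protein[c_start:c_start + 3]
```

For a peptide built as 'SRN', ten spacer residues, 'TL', three spacer residues and 'GRQ', calling `tl_trimers(peptide, 13)` returns the two flanking trimers.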
A MUSCLE alignment of the characterised and putative cytosine C5-methyltransferases with known or predicted four base recognition sequences, which contain a clear TL motif, is shown in Additional File 1. For each of the distinct recognition sequences there is conservation of the highlighted N-TL motif and the C-TL motifs. The conservation within these critical regions of the enzymes suggests that, as in M. HhaI and M. HaeIII, these amino acids describe regions involved in DNA recognition and can potentially be employed to diagnose the recognition sequence of the four-base targeting cytosine C5-methyltransferases.
In the case where there is the most sequence information available for characterised enzymes, i.e. those recognising GGCC, the N-TL motif reads exclusively 'SRN'. The C-TL motif is also relatively well conserved, with a preference for the trimer 'GRQ'. There are intriguing overlaps in the amino acids used in both the N-TL and C-TL motifs. Most notable are the GCGC recognising enzymes, whose C-TL motif reads 'RHG', and the CGCG recognising enzymes, which employ a C-TL motif reading 'HHG'. Similar overlap is seen between the GCGC/CGCG recognising enzymes, with N-TL motifs reading 'QGE'/'QG(NQ)', and those recognising CCGG/GGCC, with N-TL motifs reading 'ERN'/'SRN'. Such overlap is likely an indicator of the common modes of DNA recognition employed by this group of cytosine C5 methyltransferases. The common use of C-TL and N-TL motifs by enzymes recognising opposite recognition sequences (for example GCGC and CGCG) is likely a result of the simple, reversible nature of the hinged structure about the TL motif and implies that this motif is suited to DNA binding in either direction along the duplex.
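The trimers quoted above can be collected into a small lookup to illustrate how such motifs might be used diagnostically. This is a sketch under stated assumptions: only the motifs named in this paragraph are included (no C-TL motif is quoted here for CCGG), the scoring scheme is invented for illustration, and a real classifier would need the full alignment in Additional File 1.

```python
import re

# Trimers as quoted in the text; 'QG(NQ)' is written as the regex 'QG[NQ]'.
KNOWN_MOTIFS = {
    "GGCC": {"N-TL": "SRN", "C-TL": "GRQ"},
    "CCGG": {"N-TL": "ERN"},
    "GCGC": {"N-TL": "QGE", "C-TL": "RHG"},
    "CGCG": {"N-TL": "QG[NQ]", "C-TL": "HHG"},
}


def diagnose(n_tl, c_tl):
    """Rank recognition sequences by how many of their known trimers match."""
    observed = {"N-TL": n_tl, "C-TL": c_tl}
    scores = {}
    for site, motifs in KNOWN_MOTIFS.items():
        score = sum(
            1 for name, pattern in motifs.items()
            if re.fullmatch(pattern, observed[name])
        )
        if score:
            scores[site] = score
    # Highest score first; ties are broken alphabetically by site.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

An empty result means that neither trimer matches any of the four recognition sequences listed.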
Examination of the amino acid sequences for the six-base recognising enzymes reveals that the cytosine C5 methylating enzymes targeting GTCGAC contain an easily identifiable TL motif. Alignment of the sequences, however, shows that there are no significantly conserved amino acids with the spacing from the 'TL' residues seen for the 4- and 5-base recognising enzymes ('NNN'x10TLx3'CCC'). Furthermore, although the motif YGRx8T(LIM)x9GRxGH is well conserved in the GTCGAC recognising enzymes, the recently sequenced M. TspMI enzyme, recognising CCCGGG, utilises an almost identical motif (YGRx8TIx9GRxLH). Clearly, the amino acids around the TL motif cannot be used to wholly describe the recognition sequences of the enzymes targeting these relatively long sequences.
The BsaHI restriction-modification system has been cloned and sequenced. The sequence alignment of R. BsaHI and its homologues clearly shows many highly conserved motifs. We showed through sequence alignment that these enzymes belong to the PD-(D/E)xK superfamily of restriction enzymes and, based on this sequence alignment, have identified residues that are potentially catalytic or involved in DNA binding. We also chose a motif reading 'QxxxSPERRFD' at the C-terminus of R. BsaHI and mutated this to investigate its function. We have shown that this motif is crucial to enzymatic activity and represents a good target for future studies. In particular, we have shown that the R351 residue is critical to the function of R. BsaHI.
M. BsaHI is a cytosine C5 methyltransferase that has been found to methylate the central cytosine of its GRCGYC recognition sequence. The amino acid sequence was found to contain all of the conserved motifs (I to X) for a cytosine C5 methyltransferase. Furthermore, the target recognition domain of M. BsaHI was found to contain the conserved TL motif. On either side of the TL motif, we identified two amino acid trimers, the N-TL and C-TL motifs, which can potentially be used to diagnose the recognition sequence of the four- and some five-base recognising cytosine C5 methyltransferases. Should these motifs turn out to be reliable indicators of recognition sequence, such information has potential application in the search for restriction enzymes with new specificities, since it should be possible, by simple sequence inspection, to discriminate against genes containing the N-TL and C-TL motifs for known recognition sequences.
All enzymes, DNA sequencing reagents and primers were from New England Biolabs Inc. DNA purification was done using spin-column purification (Qiagen) unless otherwise stated. All reagents were used as received and according to the manufacturers' instructions.
Cloning the BsaHI R-M System
The chromosomal DNA encoding the BsaHI R-M system was isolated by phenol extraction from the thermophilic bacterium Bacillus stearothermophilus, strain CPW11, from the NEB strain collection. This DNA was partially digested with HpyCH4IV to give an average fragment size of 1–3 kb. Fragments were cloned into the AccI site of pUC19 and subsequently transformed into the methyl-restriction deficient E. coli strain ER2566 (NEB T7-Express) using the heat-shock method. The methylase selection method (Hungarian Trick) was used to select clones containing a viable bsaHIM gene. Following two rounds of selection, the isolated clone containing the methyltransferase gene was sequenced. A chromosome walking technique [20, 21] was employed in order to sequence the DNA adjacent to the bsaHIM gene. The DNA sequence encoding the bsaHIR gene was located after three rounds of inverse PCR, upstream of the methyltransferase gene, as illustrated in Figure 1.
Alignments of the amino acid sequences of the BsaHI R-M enzymes and their homologues were carried out in the Jalview sequence alignment editor, with the alignments themselves generated by the MUSCLE or MAFFT programs. Homologues were identified by running a BLAST search, using an E-value cut-off of 1, of the bsaHIR and bsaHIM genes against the restriction/modification enzyme database, REBASE.
Mutations and In vitro Transcription and Translation of R. BsaHI
Targeted mutations of R. BsaHI were made using two rounds of PCR. In the first round, fragments of the bsaHIR gene were made using overlapping primers containing the mutated sequences. These fragments were purified and used as complementary primers for the second round of PCR during which a T7 promoter sequence was appended to the 5'-end of the gene. The assembled genes enabled the production of small amounts of wild-type and mutated R. BsaHI protein using the in vitro transcription/translation (IVTT) 'Puresystem' from the Post-Genome Institute, Japan. The IVTT system was used according to the manufacturer's instructions. Incubation for 2 h at 37°C resulted in an enzyme concentration equivalent to approximately 0.5 units of the wild-type R. BsaHI per μL (where 1 unit is sufficient to digest 1 μg of λ-DNA in 1 hour). We expect little variation in the expression levels of the mutants of R. BsaHI, although this has not been tested explicitly.
DNA Cleavage Assay
2 μL of the IVTT mixture was incubated with 500 ng of λ-DNA for 4 h at 37°C in the presence of RNase A. The digested DNA was purified and analysed by electrophoresis on a 1% agarose gel.
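The assay conditions can be related to the unit definition given above with a short calculation. The sketch assumes that digestion capacity scales linearly with enzyme amount and incubation time and that the enzyme remains fully active throughout; neither assumption is tested in the original work, and the variable names are illustrative.

```python
# Values taken from the text.
units_per_ul = 0.5        # approximate wild-type activity of the IVTT mixture
ivtt_volume_ul = 2.0      # volume added to the cleavage assay
incubation_h = 4.0        # incubation time
substrate_ug = 0.5        # 500 ng of lambda DNA

# One unit digests 1 ug of lambda DNA in 1 hour (unit definition from the text).
units_in_assay = units_per_ul * ivtt_volume_ul

# Assumed linear scaling with time: ug of DNA that could be digested.
capacity_ug = units_in_assay * incubation_h

fold_excess = capacity_ug / substrate_ug
print(f"{units_in_assay} unit(s), capacity {capacity_ug} ug, "
      f"{fold_excess}-fold excess over substrate")
```

Under these assumptions the wild-type enzyme is present in roughly eight-fold excess, so a mutant that still leaves DNA undigested has probably lost a substantial fraction of its activity.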
Overexpression and Purification of M. BsaHI
A PCR reaction was carried out to amplify the bsaHIM gene and to append a hexahistidine tag to the C-terminal end of the gene. The his-tagged gene was cloned into the NheI/EcoRI sites of the pTXB1 vector (NEB). This clone was transformed into E. coli ER2566, which was grown in Luria Broth in the presence of 100 μg/ml ampicillin at 37°C for 4.5 h. Expression of M. BsaHI was induced by addition of isopropyl-β-D-thiogalactopyranoside (IPTG) followed by outgrowth at 30°C for 16 h. The resultant cells (~1 g in 100 ml growth medium) were spun down and resuspended in 1 ml lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 10 mM imidazole, pH 8.0), then subjected to three 20 s intervals of sonication. Following centrifugation, the cell extract (~1 ml) was loaded onto a column containing 200 μL Ni-NTA Agarose beads (Qiagen). The his-tagged M. BsaHI was purified from the beads according to the manufacturer's instructions. Tests with Bradford's reagent indicated a protein concentration of approximately 0.5 mg/ml in the second and third (250 μL) elutions from the column.
Determining the Methylation Target of M. BsaHI
pUC19 plasmid DNA was incubated with M. BsaHI in the presence of SAM for 1.5 h. The methylated DNA (250 ng) was aliquoted to a second reaction containing 0.5 μL of restriction enzyme (R. BsaHI, R. HhaI or R. HpaII) in appropriate buffer. This reaction was incubated for 2 h. The digested DNA fragments were analysed using gel electrophoresis with a 2% agarose gel (Ambion, Agarose-HR) containing 1× SybrSafe dye (Invitrogen).
We would like to acknowledge the efforts of the organic synthesis and sequencing staff at New England Biolabs and Dr. Yu Zheng for helpful discussions. RKN would like to thank the EPSRC for their generous support.
- Cheng X, Blumenthal RM: S-Adenosylmethionine-Dependent Methyltransferases: Structure and Functions. 1st edition. Edited by: Cheng X and Blumenthal RM. Singapore, World Scientific; 1999.
- Roberts RJ, Vincze T, Posfai J, Macelis D: REBASE--enzymes and genes for DNA restriction and modification. Nucl Acids Res 2007, 35: D269-D270. 10.1093/nar/gkl891
- Townson SA, Samuelson JC, Xu SY, Aggarwal AK: Implications for switching restriction enzyme specificities from the structure of BstYI bound to a BglII DNA sequence. Structure 2005, 13: 791-801. 10.1016/j.str.2005.02.018
- Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal RM, Degtyarev SK, Dryden DTF, Dybvig K, Firman K, Gromova ES, Gumport RI, Halford SE, Hattman S, Heitman J, Hornby DP, Janulaitis A, Jeltsch A, Josephsen J, Kiss A, Klaenhammer TR, Kobayashi I, Kong H, Kruger DH, Lacks S, Marinus MG, Miyahara M, Morgan RD, Murray NE, Nagaraja V, Piekarowicz A, Pingoud A, Raleigh E, Rao DN, Reich N, Repin VE, Selker EU, Shaw PC, Stein DC, Stoddard BL, Szybalski W, Trautner TA, Van Etten JL, Vitor JMB, Wilson GG, Xu SY: A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucl Acids Res 2003, 31: 1805-1812. 10.1093/nar/gkg274
- Bujnicki JM, Rychlewski L, Radlinska M: Polyphyletic evolution of type II restriction enzymes revisited: two independent sources of second-hand folds revealed. Trends in Biochemical Sciences 2001, 26: 9-11. 10.1016/S0968-0004(00)01690-X
- Vilkaitis G, Dong A, Weinhold E, Cheng X, Klimasauskas S: Functional Roles of the Conserved Threonine 250 in the Target Recognition Domain of HhaI DNA Methyltransferase. J Biol Chem 2000, 275: 38722-38730.
- Klimasauskas S, Kumar S, Roberts RJ, Cheng XD: HhaI Methyltransferase Flips Its Target Base Out of the DNA Helix. Cell 1994, 76: 357-369. 10.1016/0092-8674(94)90342-5
- Reinisch KM, Chen L, Verdine GL, Lipscomb WN: The Crystal-Structure of HaeIII Methyltransferase Covalently Complexed to DNA - An Extrahelical Cytosine and Rearranged Base-Pairing. Cell 1995, 82: 143-153. 10.1016/0092-8674(95)90060-8
- Trautner TA, Pawlek B, Behrens B, Willert J: Exact size and organization of DNA target-recognizing domains of multispecific DNA-(cytosine-C5)-methyltransferases. EMBO J 1996, 15: 1434-1442.
- Lange C, Wild C, Trautner TA: Identification of a subdomain within DNA-(cytosine-C5)-methyltransferases responsible for the recognition of the 5' part of their DNA target. EMBO J 1996, 15: 1443-1450.
- Lauster R, Trautner TA, Noyer-Weidner M: Cytosine-specific type II DNA methyltransferases: A conserved enzyme core with variable target-recognizing domains. J Mol Biol 1989, 206: 305-312. 10.1016/0022-2836(89)90480-4
- Cheng X, Blumenthal RM: Finding a basis for flipping bases. Structure 1996, 4: 639-645. 10.1016/S0969-2126(96)00068-8
- Posfai J, Bhagwat AS, Posfai G, Roberts RJ: Predictive Motifs Derived from Cytosine Methyltransferases. Nucl Acids Res 1989, 17: 2421-2435. 10.1093/nar/17.7.2421
- Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res 2002, 30: 3059-3066. 10.1093/nar/gkf436
- Pingoud V, Sudina A, Geyer H, Bujnicki JM, Lurz R, Luder G, Morgan R, Kubareva E, Pingoud A: Specificity Changes in the Evolution of Type II Restriction Endonucleases: A Biochemical and Bioinformatic Analysis of Restriction Enzymes That Recognize Unrelated Sequences. J Biol Chem 2005, 280: 4289-4298. 10.1074/jbc.M409020200
- Pingoud V, Conzelmann C, Kinzebach S, Sudina A, Metelev V, Kubareva E, Bujnicki JM, Lurz R, Luder G, Xu SY, Pingoud A: PspGI, a Type II Restriction Endonuclease from the Extreme Thermophile Pyrococcus sp.: Structural and Functional Studies to Investigate an Evolutionary Relationship with Several Mesophilic Restriction Enzymes. J Mol Biol 2003, 329: 913-929. 10.1016/S0022-2836(03)00523-0
- de Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N: ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucl Acids Res 2006, 34: W362-W365. 10.1093/nar/gkl124
- Lee YF, Tawfik DS, Griffiths AD: Investigating the target recognition of DNA cytosine-5 methyltransferase HhaI by library selection using in vitro compartmentalisation. Nucl Acids Res 2002, 30: 4937-4944. 10.1093/nar/gkf617
- Szomolányi E, Kiss A, Venetianer P: Cloning the modification methylase gene of Bacillus sphaericus R in Escherichia coli. Gene 1980, 10: 219-225. 10.1016/0378-1119(80)90051-7
- Ochman H, Gerber AS, Hartl DL: Genetic Applications of an Inverse Polymerase Chain Reaction. Genetics 1988, 120: 621-623.
- Triglia T, Peterson MG, Kemp DJ: A procedure for in vitro amplification of DNA segments that lie outside the boundaries of known sequences. Nucl Acids Res 1988, 16: 8186. 10.1093/nar/16.16.8186
- Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java Alignment Editor. Bioinformatics 2004, 20: 426-427. [http://www.jalview.org] 10.1093/bioinformatics/btg430
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 2004, 32: 1792-1797. 10.1093/nar/gkh340
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.