Localization of TFIIB binding regions using serial analysis of chromatin occupancy
© Yochum et al; licensee BioMed Central Ltd. 2007
Received: 22 June 2007
Accepted: 12 November 2007
Published: 12 November 2007
RNA Polymerase II (RNAP II) is recruited to core promoters by the pre-initiation complex (PIC) of general transcription factors. Within the PIC, transcription factor for RNA polymerase IIB (TFIIB) determines the start site of transcription. TFIIB binding has not been localized, genome-wide, in metazoans. Serial analysis of chromatin occupancy (SACO) is an unbiased methodology used to empirically identify transcription factor binding regions. In this report, we use TFIIB and SACO to localize TFIIB binding regions across the rat genome.
A sample of the TFIIB SACO library was sequenced and 12,968 TFIIB genomic signature tags (GSTs) were assigned to the rat genome. GSTs are 20–22 base pair fragments that are derived from TFIIB-bound chromatin. TFIIB localized to both non-protein-coding and protein-coding loci. For 21% of the 1783 protein-coding genes in this sample of the SACO library, TFIIB binding mapped near the characterized 5' promoter that is upstream of the transcription start site (TSS). However, internal TFIIB binding positions were identified in 57% of the 1783 protein-coding genes. Internal positions are defined as those within an inclusive region greater than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop. We demonstrate that both TFIIB and TFIID (an additional component of PICs) bound to internal regions using chromatin immunoprecipitation (ChIP). The 5' caps of transcripts associated with internal TFIIB binding positions were identified using a cap-trapping assay. The 5' TSSs for internal transcripts were confirmed by primer extension. Additionally, an analysis of the functional annotation of mouse 3 (FANTOM3) databases indicates that internally initiated transcripts identified by TFIIB SACO in rat are conserved in mouse.
Our finding that TFIIB binding is not restricted to the 5' upstream region indicates that the propensity for the PIC to contribute to transcript diversity is far greater than previously appreciated.
The core promoter is the major regulatory element responsible for determining transcriptional output. The core promoter spans a region of 40–50 bases and encompasses the transcript start site. The core promoter assembles a pre-initiation complex (PIC) of general transcription factors (GTFs) in a step-wise fashion to recruit RNA polymerase II (RNAP II) [2, 3]. Reconstitution assays using purified factors demonstrate that TFIIB is required for transcript initiation by RNAP II [4–7]. The importance of TFIIB in transcript initiation was suggested by a co-crystal structure showing that TFIIB positions the coding DNA strand into the active site of RNAP II, thereby ensuring proper TSS selection. Additionally, TFIIB remains at the promoter and does not track with the elongating RNAP II complex [9, 10]. Thus, TFIIB is an ideal factor to localize core promoters.
Recently, the isolation and analysis of the mouse transcriptome by the functional annotation of mouse 3 (FANTOM3) consortium indicated that most protein-coding genes produce multiple transcripts. Importantly, for most genes the 5' ends of multiple internal transcripts (as identified by the 5' cap structure) localized far downstream of the 5' TSS for the full-length protein-coding transcript. It has been proposed that regulation of internally initiated and variant transcripts may occur through alternative or multiple promoters [12–14].
In this report, we use serial analysis of chromatin occupancy (SACO) to identify TFIIB binding regions in the rat genome. SACO allows an unbiased and genome-wide interrogation of transcription factor binding regions [15, 16]. In this method, a transcription factor (in this case TFIIB) is cross-linked to its binding site using formaldehyde, and the DNA-protein complexes are isolated by chromatin immunoprecipitation (ChIP). The DNA is purified from the transcription factor and is then processed into 20–22 bp tags as in long serial analysis of gene expression. In SACO, these tags are referred to as genomic signature tags (GSTs). The GSTs are concatamerized and sub-cloned into a sequencing vector. The concatamers of TFIIB GSTs comprise the SACO library. The TFIIB GSTs are aligned to the genome and only those with unique assignments are further considered. The resolution of SACO is limited by the largest chromatin fragments included in construction of the library (for this library, approximately 2.5 kilobases). Therefore, a conservative estimate is that a TFIIB GST identifies a putative TFIIB binding site within a 2.5 kilobase fragment of chromatin. In the current study, the sequencing and analysis of a sample of the SACO library indicates that internal TFIIB binding positions are a common feature of protein-coding genes.
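The two computational ideas in this paragraph, keeping only uniquely assigned GSTs and treating each GST as marking a binding site within roughly 2.5 kb, can be illustrated with a minimal Python sketch. This is not the pipeline used for the library: the record layout, the example tags and coordinates, and the function names are hypothetical, and reading the 2.5 kb resolution as a symmetric window around the tag is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical alignment records: (tag_sequence, chromosome, position).
# A tag that aligns to more than one locus appears in more than one record.
alignments = [
    ("ACGTACGTACGTACGTACGTA", "chr10", 1_200_345),
    ("TTGCATTGCATTGCATTGCAT", "chr3", 88_410),
    ("TTGCATTGCATTGCATTGCAT", "chr7", 5_902_117),  # second hit: ambiguous tag
    ("GGATCCGGATCCGGATCCGGA", "chrX", 40_221_900),
]

# Largest chromatin fragment in the library, as stated in the text.
RESOLUTION_BP = 2500


def unique_gsts(records):
    """Return only the records whose tag aligns to exactly one locus."""
    hits = Counter(tag for tag, _, _ in records)
    return [rec for rec in records if hits[rec[0]] == 1]


def candidate_window(position, resolution=RESOLUTION_BP):
    """Conservative interval assumed to contain the binding site.

    Assumption: the site may lie up to `resolution` bp on either side
    of the tag; the start is clipped at coordinate 0.
    """
    return max(0, position - resolution), position + resolution


kept = unique_gsts(alignments)
for tag, chrom, pos in kept:
    print(chrom, pos, candidate_window(pos))
```

With these made-up records, the tag that aligns to both chr3 and chr7 is discarded and the two remaining tags each yield a 5 kb candidate interval.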
TFIIB SACO library
The TFIIB SACO library contains approximately 10^6 GSTs indicative of putative TFIIB binding regions. A portion of the library corresponding to 19,204 GSTs was sequenced. Of these, 12,968 (68%) could be assigned to a unique position in the rat genome, similar to the fraction of unique GSTs in previously characterized SACO libraries [15, 16]. We mapped the distribution of TFIIB GSTs on each rat chromosome to determine whether this set of identified regions represents an unbiased sample (see Additional file 1). The chromosomes contain between 210 and 1416 TFIIB GSTs. The average number of TFIIB binding sites per megabase of DNA ranges from approximately two on the X chromosome to eight on chromosome 16. We then focused on a sub-population of the library that contains 2481 distinct TFIIB GSTs localizing to 1783 protein-coding genes in the reference sequence (RefSeq) database. An alignment of TFIIB GSTs and corresponding RefSeq genes on chromosome 10 demonstrates that the entire length of the chromosome is represented in the SACO library (see Additional file 2). A similar representation was present on the other chromosomes examined (data not shown). Therefore, chromosome size and position do not appear to bias TFIIB localizations identified in our library.
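The summary figures in this paragraph are simple ratios, and a short Python sketch makes the arithmetic explicit. The counts 19,204 and 12,968 come from the text; the helper name `gsts_per_megabase` and the example chromosome in the usage line are hypothetical and are not values from the library.

```python
# Counts reported in the text.
SEQUENCED_GSTS = 19_204
UNIQUELY_ASSIGNED_GSTS = 12_968

# Fraction of sequenced tags that map to a single genomic position.
unique_fraction = UNIQUELY_ASSIGNED_GSTS / SEQUENCED_GSTS
print(f"uniquely assigned: {unique_fraction:.1%}")


def gsts_per_megabase(n_gsts, chromosome_length_bp):
    """Average number of GSTs per megabase on one chromosome."""
    return n_gsts / (chromosome_length_bp / 1_000_000)


# Hypothetical chromosome: 800 GSTs on 100 Mb of sequence.
print(gsts_per_megabase(800, 100_000_000))
```

The first ratio rounds to the 68% quoted above. The density helper shows how a per-chromosome GST count and a chromosome length would give a per-megabase figure of the kind reported for the X chromosome and chromosome 16.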
TFIIB binding occurs internally as well as 5' and 3'
Confirmation of internal PIC binding and internal transcripts
Next, we mapped the 5' ends of 18 internal transcripts for the 9 genes in Figure 3A with internal TFIIB binding regions. Using a modified cap trap assay [20, 21], we found that the 5' ends of all 18 internal transcripts surveyed were associated with TFIIB GSTs. A representative agarose gel is shown for the cap-trapped TSS of an internal transcript from CBP (Figure 3B). This indicates that the internal TFIIB positions are associated with TSSs. We also mapped the TSS for an antisense CBP transcript identified by proximity to a 3' TFIIB GST.
Internal promoters are evolutionarily conserved
The TFIIB SACO approach described in this report allowed us to ascertain experimentally the prevalence of TFIIB binding and PIC localization among protein-coding genes. We found that the majority of protein-coding genes were characterized by internal TFIIB binding (Figure 2). Several lines of evidence demonstrate that the internal positions represent core promoters. First, repeat ChIP assays performed using antibodies against TFIIB and TFIID demonstrate that each of the 18 internal sites of 9 RefSeq genes chosen at random was occupied by PIC in vivo (Figure 3). Second, we experimentally isolated eighteen 5' capped transcripts closely associated with internal TFIIB positions from nine RefSeq genes. Third, 88% of the mouse homologs identified in our screen had CAGE evidence for internal TSSs. These findings in conjunction with the earlier FANTOM3 transcript analysis reiterate that alternative promoters are a common feature of protein-coding genes. Thus, our localization of PIC via TFIIB binding suggests that positional diversity of core promoters has been underestimated.
In the last few years, we have begun to understand the complexity and diversity of core promoters that reside in the mammalian genome. This is in stark contrast to the early emphasis on three key motif elements that defined a core promoter: a TATA box, an initiator (Inr) element, and a downstream promoter element (DPE). In comparison with D. melanogaster (in which much of the core promoter architecture was originally characterized), mammalian core promoters less frequently contain TATA boxes and less frequently pair TATA with Inr elements, and many promoters, including those within CpG islands, appear to lack all three of these core elements. Two TFIIB recognition elements (BREs) have been demonstrated to mediate TFIIB binding within core promoters. BREu is upstream of the TATA box and BREd is downstream. Both the BREu and the BREd modulate promoter activity, but this effect depends on the specific composition of elements present within a core promoter. This core promoter heterogeneity has made motif analysis of promoters challenging. A recent study by Kim et al. illustrates this issue. Using a TAF1 antibody to identify core promoters by ChIP-coupled microarray, the authors found that the TATA box was not significantly enriched among 10,567 active promoters in fibroblast cells. Whether there will be a uniform code of motifs that determines whether a stretch of DNA is predicted to function as a core promoter awaits further experimentation. Informative motif analysis of genome-wide studies, like Kim et al. or this study, will likely require experimental assays to categorize the vast array of sequences into putative classes of core promoters.
The most significant finding of our study is that TFIIB binding localizes to positions other than the 5' promoter of protein-coding genes. While the strong co-occurrence of TFIIB GSTs with transcript start sites identified by the FANTOM3 group suggests that these alternative positions identify promoters, TFIIB has been shown to have additional roles outside of transcript start site selection. Singh and Hampsey recently reported TFIIB binding to terminator sequences located downstream of protein-coding regions. Instead of functioning as a core promoter to drive an antisense transcript, TFIIB bound at this position communicated with the 5' promoter via a loop structure. The authors proposed that this loop between the 5' promoter and 3' terminator plays a role in transcript re-initiation. It will be of interest to determine whether TFIIB binding at the 3' end of protein-coding genes identified in our SACO screen similarly cooperates with the 5' promoter.
The results presented here provide the first genome-wide mapping of TFIIB binding in a metazoan. Because TFIIB is required for RNAP II-dependent transcription, our SACO screen provides an unbiased localization of promoter elements. Our finding that TFIIB occupation of internal regions is common within genes suggests that the full-length protein-coding transcripts may in fact represent a fraction of the genetic output from these loci. Identification of evolutionarily conserved internal promoters suggests that adjacent transcripts may be subjected to regulation that is independent of the 5' untranslated region and promoter elements. Clearly, we are only beginning to understand the transcriptome and its regulation in higher eukaryotes.
Rin-m cells (ATCC #CRL-2057), passage 5–10, were grown in RPMI 1640 (Invitrogen) supplemented with 10% FBS (Hyclone), 100 units/ml penicillin, 100 units/ml streptomycin, and 5 mM L-glutamine. Cells were maintained at 37°C and 5% CO2 and were 70–75% confluent at the time of harvesting.
Antibodies used for ChIP included: 3 μg TFIIB c18 (Santa Cruz, sc-225), 3 μg RNAP II (Santa Cruz, sc-9001), 10 μl TFIID (Upstate Cell Signaling Solutions, 06-241), and 3 μg Gal4 (Santa Cruz, sc-577). According to the manufacturer, the TFIID antibody is directed against TBP. ChIP assays contained 5 × 10^6 cells and were conducted as reported [21, 27]. Briefly, chromatin in formaldehyde-fixed lysates was sonicated to an average size of approximately 750 bp using a sonic dismembrator 60 (Fisher Scientific). Sonication was conducted for 5 × 20 sec, output 7, with 1 min intermittent rest periods. Lysates were clarified by centrifugation at 20,000 × g for 10 min at 4°C and then incubated with primary antibody overnight at 4°C. Immunocomplexes captured with bovine serum albumin/glycogen-blocked protein A sepharose (Repligen) were washed, and precipitated DNA fragments were isolated with 10% w/v Chelex-100 (BioRad). Isolated fragments were quantified by real-time PCR as previously described. Primers were designed using Primer3 software from the Massachusetts Institute of Technology (MIT). Primers were synthesized at Integrated DNA Technologies (IDT), and sequences are available upon request.
For a complete protocol for constructing a SACO library see [15, 21]. The TFIIB SACO library was subcloned into the SphI site of pZERO-2 (Invitrogen). A second SphI site in the kanamycin resistance gene in pZERO-2 was mutated using the QuikChange mutagenesis kit (Stratagene) prior to subcloning of the concatamers. The complete list of genomic targets identified is available upon request.
Sequencing was performed at High-Throughput Sequencing Solutions (Seattle, WA).
5 × 10^6 Rin-m cells were lysed using a Qiashredder column (Qiagen) and RNA was isolated using the RNeasy kit (Qiagen). Genomic DNA was removed using the DNA-free kit (Ambion). First strand cDNA was synthesized from 1 μg of DNA-free RNA using a random hexamer primer (Invitrogen) and Superscript III Reverse Transcriptase (Invitrogen). Reactions lacking reverse transcriptase were analyzed to check for the presence of genomic DNA. Real-time PCR was conducted with 50 ng of cDNA and the assay was capable of detecting 10–50 copies of target cDNA.
Mapping of 5' capped nucleotides
A modification of the FirstChoice RNA ligase-mediated rapid amplification of cDNA ends kit (RLM-RACE, Ambion) was used to identify 5' capped transcripts associated with TFIIB GSTs as reported [20, 21]. Total RNA was isolated from Rin-m cells with TRIzol (Invitrogen) and 10 μg was treated with calf intestinal phosphatase in a 20 μl reaction at 37°C for 60 min to remove 5' phosphates of contaminating nucleic acids. The 5' cap structure of the remaining RNA was removed by treatment with tobacco acid pyrophosphatase (TAP) in a 20 μl reaction at 37°C for 60 min, which leaves a free 5' monophosphate. An oligonucleotide containing two nested primer sites was ligated to the 5' end with T4 RNA ligase in a 20 μl reaction at 42°C for 60 min. First strand cDNA synthesis with M-MLV reverse transcriptase and random decamers was performed in a 20 μl reaction at 42°C for 60 min. The cDNA was amplified with 10 μM of a primer to the 5' cassette and 10 μM of a gene-specific primer designed 700–1000 bp away from the TFIIB tag. The products were further amplified by nested PCR using a second internal primer to the 5' cassette and a second gene-specific primer. The amplicons were gel purified, cloned into TOPO PCR2.1 (Invitrogen) and analyzed by sequencing.
Primer extension was performed with the primer extension/AMV reverse transcriptase system (Promega). Primers designed approximately 50 bp downstream of the 5' capped nucleotide of CBP were end-labeled with 32P-γ-ATP (Perkin Elmer) using T4 polynucleotide kinase (NEB). Unincorporated 32P-γ-ATP was removed using a NucAway spin column (Ambion) and specific activity of the probes was determined using a 2200CA liquid scintillation analyzer (Packard). Total RNA was isolated using TRIzol (Invitrogen) and 10 μg was annealed to the radiolabeled primer at 58°C for one hour. cDNA was extended using AMV reverse transcriptase (Promega). Products were resolved on a 7 M urea/1× TBE (89 mM Tris base, 2 mM EDTA, 89 mM boric acid)/8% polyacrylamide denaturing gel using a sequencing gel apparatus (Gibco BRL, Invitrogen). An end-labeled ΦX174 HinfI-digested ladder was included for size reference.
Initial processing and placement of the GSTs followed the pipeline established for our previous SACO libraries [15, 16]. Chromosome locations, start, end, and orientation of rat RefSeq features aligned to the rn3 build of the rat genome were obtained from the UCSC annotation database. The 1783 RefSeq features that were identified to be associated with putative internal TFIIB binding sites were mapped to their homologs in mouse using Homologene. Genomic coordinates and orientation for mouse RefSeq transcripts were obtained from the FANTOM3 build. The annotated 5' ends of transcripts were obtained from the boundary_set dataset from the FANTOM3 study. The boundary_set dataset was also used to locate internal TSSs for the mouse homologs. All TSSs with evidence of 1 or more CAGE tags (or with a reliability of 1 assigned by FANTOM3 curators) were selected for further analysis. Data and annotation from FANTOM3 were based on the mm5 build of the mouse genome.
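The placement of a GST relative to a RefSeq feature can be sketched in Python, using the definition of an internal position given in this report (more than 2.5 kb downstream from the 5' TSS and more than 2.5 kb upstream from the transcription stop). This is an illustrative sketch and not the pipeline itself: the function name, the argument layout, the labels for the 5' and 3' regions, the handling of positions outside the gene, and the use of strict inequalities at exactly 2.5 kb are all assumptions.

```python
FLANK_BP = 2500  # 2.5 kb exclusion zone at each end of the gene (from the text)


def classify_gst(pos, tx_start, tx_end, strand):
    """Classify a GST position relative to one gene.

    tx_start < tx_end are genomic coordinates of the transcript;
    strand is '+' or '-'. The labels returned are hypothetical.
    """
    if strand == "+":
        downstream_of_tss = pos - tx_start   # > 0 means 3' of the TSS
        upstream_of_stop = tx_end - pos      # > 0 means 5' of the stop
    elif strand == "-":
        downstream_of_tss = tx_end - pos
        upstream_of_stop = pos - tx_start
    else:
        raise ValueError("strand must be '+' or '-'")

    if downstream_of_tss > FLANK_BP and upstream_of_stop > FLANK_BP:
        return "internal"
    # Assumption: for short genes where both ends are within 2.5 kb,
    # the 5' label takes precedence.
    if abs(downstream_of_tss) <= FLANK_BP:
        return "5' region"
    if abs(upstream_of_stop) <= FLANK_BP:
        return "3' region"
    return "outside"


# Hypothetical gene spanning 10,000-50,000 on the plus strand.
print(classify_gst(30_000, 10_000, 50_000, "+"))  # internal
print(classify_gst(11_000, 10_000, 50_000, "+"))  # 5' region
print(classify_gst(49_000, 10_000, 50_000, "+"))  # 3' region
```

Orientation matters because the TSS is at the higher coordinate for minus-strand genes, so the same genomic position can be 5' for one orientation and 3' for the other.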
We thank Dr. Richard H. Goodman (Oregon Health and Science University) and members of the Goodman lab for discussions on SACO. We also thank the reviewers whose comments improved the manuscript. This publication was made possible with support from the department of Hematology and Oncology (Oregon Health and Science University), the Oregon Clinical and Translational Research Institute (OCTRI), grant number UL1 RR024140 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research.
- Juven-Gershon T, Hsu JY, Kadonaga JT: Perspectives on the RNA polymerase II core promoter. Biochem Soc Trans. 2006, 34 (Pt 6): 1047-1050.
- Pugh BF: Mechanisms of transcription complex assembly. Curr Opin Cell Biol. 1996, 8 (3): 303-311. 10.1016/S0955-0674(96)80002-0.
- Roeder RG: The role of general initiation factors in transcription by RNA polymerase II. Trends Biochem Sci. 1996, 21 (9): 327-335. 10.1016/0968-0004(96)10050-5.
- Parvin JD, Shykind BM, Meyers RE, Kim J, Sharp PA: Multiple sets of basal factors initiate transcription by RNA polymerase II. J Biol Chem. 1994, 269 (28): 18414-18421.
- Reinberg D, Horikoshi M, Roeder RG: Factors involved in specific transcription in mammalian RNA polymerase II. Functional analysis of initiation factors IIA and IID and identification of a new factor operating at sequences downstream of the initiation site. J Biol Chem. 1987, 262 (7): 3322-3330.
- Reinberg D, Roeder RG: Factors involved in specific transcription by mammalian RNA polymerase II. Purification and functional analysis of initiation factors IIB and IIE. J Biol Chem. 1987, 262 (7): 3310-3321.
- Sawadogo M, Roeder RG: Factors involved in specific transcription by human RNA polymerase II: analysis by a rapid and quantitative in vitro assay. Proc Natl Acad Sci USA. 1985, 82 (13): 4394-4398. 10.1073/pnas.82.13.4394.
- Bushnell DA, Westover KD, Davis RE, Kornberg RD: Structural basis of transcription: an RNA polymerase II-TFIIB cocrystal at 4.5 Angstroms. Science. 2004, 303 (5660): 983-988. 10.1126/science.1090838.
- Pokholok DK, Hannett NM, Young RA: Exchange of RNA polymerase II initiation and elongation factors during gene expression in vivo. Mol Cell. 2002, 9 (4): 799-809. 10.1016/S1097-2765(02)00502-6.
- Zawel L, Kumar KP, Reinberg D: Recycling of the general transcription factors during RNA polymerase II transcription. Genes Dev. 1995, 9 (12): 1479-1490. 10.1101/gad.9.12.1479.
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563. 10.1126/science.1112014.
- Heras SR, Lopez MC, Olivares M, Thomas MC: The L1Tc non-LTR retrotransposon of Trypanosoma cruzi contains an internal RNA-pol II-dependent promoter that strongly activates gene transcription and generates unspliced transcripts. Nucleic Acids Res. 2007.
- Kleinjan DA, Seawright A, Mella S, Carr CB, Tyas DA, Simpson TI, Mason JO, Price DJ, van Heyningen V: Long-range downstream enhancers are essential for Pax6 expression. Dev Biol. 2006, 299 (2): 563-581. 10.1016/j.ydbio.2006.08.060.
- Russcher H, Dalm VA, de Jong FH, Brinkmann AO, Hofland LJ, Lamberts SW, Koper JW: Associations between promoter usage and alternative splicing of the glucocorticoid receptor gene. J Mol Endocrinol. 2007, 38 (1–2): 91-98. 10.1677/jme.1.02117.
- Impey S, McCorkle SR, Cha-Molstad H, Dwyer JM, Yochum GS, Boss JM, McWeeney S, Dunn JJ, Mandel G, Goodman RH: Defining the CREB regulon: a genome-wide analysis of transcription factor regulatory regions. Cell. 2004, 119 (7): 1041-1054.
- Yochum GS, McWeeney S, Rajaraman V, Cleland R, Peters S, Goodman RH: Serial analysis of chromatin occupancy identifies beta-catenin target genes in colorectal carcinoma cells. Proc Natl Acad Sci USA. 2007, 104 (9): 3324-3329. 10.1073/pnas.0611576104.
- Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat Biotechnol. 2002, 20 (5): 508-512. 10.1038/nbt0502-508.
- Gazdar AF, Chick WL, Oie HK, Sims HL, King DL, Weir GC, Lauris V: Continuous, clonal, insulin- and somatostatin-secreting cell lines established from a transplantable rat islet cell tumor. Proc Natl Acad Sci USA. 1980, 77 (6): 3519-3523. 10.1073/pnas.77.6.3519.
- Bedoya FJ, Flodstrom M, Eizirik DL: Pyrrolidine dithiocarbamate prevents IL-1-induced nitric oxide synthase mRNA, but not superoxide dismutase mRNA, in insulin producing cells. Biochem Biophys Res Commun. 1995, 210 (3): 816-822. 10.1006/bbrc.1995.1731.
- Kim TH, Barrera LO, Qu C, Van Calcar S, Trinklein ND, Cooper SJ, Luna RM, Glass CK, Rosenfeld MG, Myers RM, et al: Direct isolation and identification of promoters in the human genome. Genome Res. 2005, 15 (6): 830-839. 10.1101/gr.3430605.
- Yochum GS, Cleland R, McWeeney S, Goodman RH: An antisense transcript induced by Wnt/beta-catenin signaling decreases E2F4. J Biol Chem. 2007, 282 (2): 871-878. 10.1074/jbc.M609391200.
- Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA: Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet. 2007, 8 (6): 424-436. 10.1038/nrg2026.
- Smale ST: Core promoters: active contributors to combinatorial gene regulation. Genes Dev. 2001, 15 (19): 2503-2508. 10.1101/gad.937701.
- Deng W, Roberts SG: Core promoter elements recognized by transcription factor IIB. Biochem Soc Trans. 2006, 34 (Pt 6): 1051-1053.
- Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B: A high-resolution map of active promoters in the human genome. Nature. 2005, 436 (7052): 876-880. 10.1038/nature03877.
- Singh BN, Hampsey M: A transcription-independent role for TFIIB in gene looping. Mol Cell. 2007, 27 (5): 806-816. 10.1016/j.molcel.2007.07.013.
- Nelson JD, Denisenko O, Sova P, Bomsztyk K: Fast chromatin immunoprecipitation assay. Nucleic Acids Res. 2006, 34 (1): e2. 10.1093/nar/gnj004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.