Non-consensus GLI binding sites in Hedgehog target gene regulation

Background: The GLI transcription factors, mediators of the hedgehog signal, bind with high affinity to the consensus sequence GACCACCCA. The affinities of single substitution variants of the GLI binding site have been measured systematically, but these variant sites appear to bind weakly compared with how frequently they occur in known GLI target gene promoters. Results: We quantified transcriptional activation by GLI using PTCH1 promoter-based luciferase reporters containing every single substitution of the GLI consensus binding site. As expected, variants with very low affinity did not activate the reporter. Many lower affinity binding sequences are, however, functional in the presence of moderate GLI concentrations. Using two natural promoters with non-consensus GLI sites, we showed that replacing the variant sequences with the consensus yields comparable activity. Conclusions: Variant GLI binding sites with relatively low affinity can lead to strong transcriptional activation within natural promoters. This may facilitate the identification of additional direct GLI target genes.


Background
Sequence-specific binding of transcription factors in response to diverse cellular input signals is a major determinant of transcriptional regulation. Binding sequences for many factors have been identified experimentally and/or by a wealth of prediction methods (reviewed in [1]). Consensus binding sites were classically determined by SELEX and verified by EMSA. More recently, affinities have been measured with methods better suited to moderate- and large-scale experimentation, such as microarray binding experiments [2]. Experimentally determined affinities or frequencies for each base at every position of a binding site can be represented as position weight matrices or sequence logos, which can be used to predict new binding sites [3,4]. It is well known that not all sequences that a transcription factor binds strongly in vitro are also bound in an in vivo context [5]. Global chromatin immunoprecipitation can identify the sequences bound by a transcription factor within the cellular context. It does not, however, indicate whether a binding site is functional, i.e. whether the presence of a given TF at this site affects expression of the target gene. Answering that question requires additional information, usually derived from microarray data, sequencing or promoter studies [6,7].
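To illustrate how a position weight matrix is built from aligned sites and then used to score candidates, the sketch below converts per-position base counts into log2 odds against a background and sums them over a site. The uniform background, the pseudocount of 0.5 and the function names are assumptions chosen for illustration. The toy input is simply the handful of GLI sites named in this article (consensus, PTCH1 BS1, a 7A site, the GLI1 9G site and the JUN 2G5C site). It is not a real GLI matrix and is not the method used in [3,4] or [12].

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length, aligned binding sites."""
    length = len(sites[0])
    counts = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        if len(site) != length:
            raise ValueError("all sites must be aligned and of equal length")
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts

def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Turn counts into log2(observed frequency / background frequency) per base."""
    if background is None:
        background = {base: 0.25 for base in BASES}  # uniform background (assumption)
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in BASES
        })
    return pwm

def pwm_score(pwm, site):
    """Sum the per-position log-odds scores for a site of the same length as the matrix."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix length")
    return sum(column[base] for column, base in zip(pwm, site.upper()))

# Toy input: GLI sites named in this article, NOT a real GLI matrix.
toy_sites = ["GACCACCCA", "GACCTCCCA", "GACCACACA", "GACCACCCG", "GGCCCCCCA"]
pwm = log_odds_pwm(count_matrix(toy_sites))
for site in ("GACCACCCA", "GACCACACA", "GACCAGGCA"):
    print(site, round(pwm_score(pwm, site), 2))
```

Because the matrix is only as good as its input, a matrix built mostly from consensus-like sites will rank variants such as 6G7G far below the consensus. That ranking is a property of the training data, not necessarily of function.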
Relative binding affinity is a good indicator of transcriptional activation or repression in an artificial system, as shown for example by Kang et al for the Zif268 DNA binding domain fused to repressor or activator domains [8]. A detailed quantitative description of the relationship between affinity and activation potential in the cell is difficult, since in vivo activation depends on the presence of co-factors, additional transcription factors and the epigenetic state of the chromatin. On the other hand, a single high affinity binding site combined with a minimal promoter frequently does not produce strong target gene activation. Reporter constructs therefore usually contain several repeats of consensus binding sites to enhance reporter activity. In the analysis of specific promoters, attention is usually focussed first on consensus sites, although the functionality of variant sites has been shown in vivo and in reporter gene assays for many transcription factors. The effect of variation in a single site on activation and specificity has been investigated extensively in E. coli [9]. Within specific mammalian promoters, however, the influence of variant sites on transcriptional activation has not been explored systematically.
The three GLI transcription factors, mediators of the hedgehog signal, share a DNA binding domain of five zinc fingers that is highly homologous among the three GLIs. Two of the five fingers are responsible for all but one of the protein-DNA base contacts [10]. The GLIs can function as activators and/or repressors and regulate target genes in a highly context-specific way. The consensus binding sequence GACCACCCA was first determined by Kinzler et al [11], and many direct GLI target genes have since been identified. Hallikas et al [12] determined the affinities of all single base substitutions in the GLI consensus binding sequence in vitro, using a fusion of luciferase with the GLI DNA binding domain. These data, together with information on species conservation, were used in the novel EEL prediction program to identify GLI regulated genes within the mouse and human genomes. These predictions successfully identified new target genes, although some known target genes were not represented in the original version. This emphasizes the need to characterise the relationship between affinity and functionality of GLI binding sites in more detail using functional assays.
We therefore set out to investigate the activity of all single site variants of the consensus GLI binding site in a luciferase assay. GLI transcriptional activity is frequently measured with artificial constructs containing multiple copies of the consensus site. Here we use a construct based on the PTCH1 promoter, which is functional in many different cell types and should approximate a "normal" control of gene expression. Using relatively low GLI concentrations to enhance specificity, we found that a rather large number of variant GLI binding sites were able to activate transcription within the PTCH1 promoter. We then converted variant binding sites to consensus within two unrelated natural promoters containing essential non-consensus GLI binding sites, and found that activity was not significantly enhanced.

Results and Discussion
A PTCH1 reporter system to measure the functionality of variant GLI binding sites
The hedgehog receptor PTCH1 is a well characterised direct GLI target gene, and its elevated expression is indicative of Hh pathway activation. PTCH1 expression is driven from several alternative transcription start sites [13]. The PTCH1 promoter region upstream of exon 1B (Figure 1A) has been shown to contain a GLI consensus site (BS2, -704) [14] that is essential for activation by GLI. Upstream of BS2, at -1033, we localised a second GLI binding site (BS1, GACCTCCCA) that differs from the consensus by a single substitution. BS1 alone is not sufficient for promoter activation by GLI in a luciferase assay, but it enhances transcriptional activation in the presence of BS2 (Figure 1C). We chose the essential BS2 site in the PTCH1 promoter to investigate the influence of all 27 possible single base substitutions in the consensus sequence (9 positions × 3 alternative bases) on transcriptional activation. To make it easy to exchange the consensus for a variant binding site, we replaced the consensus site with a linker sequence that allows each test sequence to be inserted quickly into the PTCH1 luciferase reporter construct (PTCH1_VAR) (Figure 1A). A HindIII site was inserted together with the variant sequence to allow fast identification of plasmids containing the variant. The base C at position 14 relative to the start of the consensus sequence has previously been shown to positively affect GLI binding affinity [12,15]. It is included in the construct as part of the HindIII site.
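The variant names used throughout this article (e.g. 7A, 6G7G, 2G5C) give each substituted position (1 to 9 in the consensus GACCACCCA) followed by the new base. The following minimal sketch enumerates the 27 single substitutions tested and derives a name for any 9 bp site under that convention. The function names, and the label "Cons" for an unchanged site, are illustrative assumptions.

```python
CONSENSUS = "GACCACCCA"  # 5'-3' orientation as used in this article
BASES = "ACGT"

def single_substitutions(consensus=CONSENSUS):
    """Return {name: sequence} for every single-base substitution, named like '7A'."""
    variants = {}
    for pos, ref in enumerate(consensus, start=1):
        for base in BASES:
            if base != ref:
                variants[f"{pos}{base}"] = consensus[:pos - 1] + base + consensus[pos:]
    return variants

def variant_name(site, consensus=CONSENSUS):
    """Name a site by its differences from the consensus, e.g. 'GACCAGGCA' -> '6G7G'."""
    site = site.upper()
    if len(site) != len(consensus):
        raise ValueError("site must have the same length as the consensus")
    diffs = [f"{pos}{new}"
             for pos, (ref, new) in enumerate(zip(consensus, site), start=1)
             if ref != new]
    return "".join(diffs) if diffs else "Cons"

variants = single_substitutions()
print(len(variants))              # 27 = 9 positions x 3 alternative bases
print(variants["7A"])             # GACCACACA
print(variant_name("GACCTCCCA"))  # 5T, i.e. PTCH1 BS1
```

Under this convention, PTCH1 BS1 (GACCTCCCA) is variant 5T, and the negative control GACCAGGCA is 6G7G.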
We then tested the functionality of the luciferase reporter system by comparing how strongly GLI2act activates the reporter construct containing the linker with the consensus sequence (PTCH1_VAR(Cons)) and the unmodified PTCH1 promoter luciferase reporter construct (PTCH1_WT) (Figure 1B). All results presented here were obtained with GLI2act, which is a strong activator. Comparable results were obtained with GLI1, although activity was lower (data not shown, CS unpublished). As shown in Figure 1B, both the wild type (PTCH1_WT) and the modified PTCH1 promoter construct (PTCH1_VAR(Cons)) were strongly induced in response to GLI2act, with only slightly lower activation of the modified promoter. As expected, the inactive variant 6G7G (GACCAGGCA) strongly reduced reporter activity in both PTCH1_WT and PTCH1_VAR (Figure 1B). No activation was observed with PTCH1_VAR lacking an inserted sequence. Thus, the modified PTCH1 reporter system is functional and can be used to systematically measure how variation in the GLI binding sequence affects GLI target gene activation.
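Reporter activities of this kind are usually normalized to a co-transfected control. The Figure 3 legend mentions LacZ, and the Figure 4 legend mentions the empty vector pcDNA4/to. The sketch below divides each luciferase reading by the LacZ signal from the same well and then expresses the GLI2act condition as fold activation over the empty-vector condition. This two-step calculation, the function names and all numbers are assumptions for illustration. The article does not spell out its exact normalization arithmetic.

```python
from statistics import mean

def normalized_activity(luciferase_rlu, lacz_signal):
    """Divide each luciferase reading by the LacZ signal from the same well."""
    if len(luciferase_rlu) != len(lacz_signal):
        raise ValueError("need one LacZ reading per luciferase reading")
    return [luc / lac for luc, lac in zip(luciferase_rlu, lacz_signal)]

def fold_activation(gli_luc, gli_lacz, ctrl_luc, ctrl_lacz):
    """Mean normalized activity with GLI2act divided by mean with the empty vector."""
    return (mean(normalized_activity(gli_luc, gli_lacz))
            / mean(normalized_activity(ctrl_luc, ctrl_lacz)))

# Hypothetical readings for one reporter, three replicate wells per condition.
print(round(fold_activation([12000, 15000, 13500], [1.0, 1.2, 1.1],
                            [400, 500, 450], [1.0, 1.0, 0.9]), 1))  # 26.3
```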

The effect of GLI binding site variants on PTCH1 promoter activation
To determine GLI activity for all single site variants of the 9 bp consensus binding sequence, we co-transfected each PTCH1_VAR luciferase reporter together with GLI2act into HaCaT cells (Figure 2A). PTCH1_VAR(6G7G) served as negative control (Figure 1B). To exploit the dynamic range of the reporter system, all assays were performed under optimal transcriptional activation conditions using moderate GLI2act levels. The boxplot (Figure 2A) shows the range of activities measured at each position, and statistical significance relative to the negative control and to the consensus is shown in Figure 2B. At first glance, it is striking that many sequence variants activate the reporter to a similar extent as the consensus GLI binding site. At position 5 in particular, there is no significant difference in transcriptional activity between the consensus and any non-consensus base (Figure 2A, B). In contrast, any substitution at position 4 or 6 leads to loss of activity. This is consistent with affinity measurements showing complete loss of GLI binding when these critical positions are altered (CS unpublished). At several positions, the identity of the substituted base has a pronounced effect on transcriptional activation. At position 7 (C in the consensus), for example, G and T do not support reporter gene transcription, while A reproducibly equals or even appears to exceed the level of activation by the consensus. A number of variants result in activities intermediate between consensus and background. Given the variability inherent in biological replicates, relatively small differences in activity cannot be considered significant. The linker sequence surrounding the binding site might differentially affect the activation of the various reporter constructs. To exclude this possibility, we also introduced a small number of binding site variants directly into the unmodified PTCH1 promoter construct by site-specific mutagenesis and tested them (Figure 3).
No major discrepancies were observed. In summary, this suggests that many variant GLI binding sites are functional and can substitute for the consensus. We then compared the transcriptional activation (Figure 2A) with the affinity profile described by Hallikas et al [12]. A large number of substitutions with quite low affinity significantly activate the luciferase reporter. This may be because the nonlinear normalization applied to the raw affinity data very strongly emphasizes the consensus site [16]. Conventional competitive EMSA measurements with linear normalization on selected binding sequences showed several single substitutions with Kd values within a factor of 10 of the consensus (CS, unpublished). These values are compatible with the luciferase reporter activities found here. They are also consistent with the many single and several double substitutions of the GLI consensus sequence that occur in promoters with known GLI dependent function in vivo (Table 1). Several of these promoters were not retrieved in genome-wide in silico searches for GLI target genes [12], e.g. BCL2, IL1R2, FST and TGM3 (Table 1).
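Figure 2B reports the significance of each variant's activity relative to the negative control (6G7G) and to the consensus. The article does not state which test was used. The sketch below assumes Welch's unequal-variance t-test, a common choice for small numbers of biological replicates, and computes the t statistic and Welch-Satterthwaite degrees of freedom with the standard library only. All activity values are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two replicates")
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

# Hypothetical normalized activities (arbitrary units) for one variant and both references.
variant = [21.0, 25.5, 23.0]
consensus = [24.0, 27.5, 26.0]
negative_6g7g = [1.2, 0.9, 1.1]
print(welch_t(variant, negative_6g7g))  # variant vs negative control
print(welch_t(variant, consensus))      # variant vs consensus
```

Under this scheme, a consensus-like variant is significantly above the negative control but not significantly different from the consensus. An intermediate variant differs from both.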
Although luciferase reporter assays do not perfectly represent the chromatin context, they can be used to distinguish potentially functional GLI binding sites from apparent binding sites that do not support reporter gene activation in their sequence context. The TGM3 promoter clearly demonstrates this. It contains three potential GLI binding sites: one consensus sequence and two variants with a C to A substitution at position 7 (7A) (Figure 4). Mutating the consensus sequence to the non-binding 6G7G does not affect reporter activity, and GLI does not bind the consensus site in a ChIP experiment (Figure 4B, C). In contrast, GLI binds the variant sites, and mutation of either variant site abolishes transcriptional activation (Figure 4B, C).
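Sites like the TGM3 7A variants are missed by searches that accept only the exact consensus. A minimal way to include them is to scan both strands for 9-mers within a chosen Hamming distance of GACCACCCA, as sketched below. This mismatch-tolerant scan is an illustrative assumption, not the EEL algorithm of [12]. It treats every substitution equally, so hits would still need to be filtered with functional data such as Figure 2. Substitutions at positions 4 and 6, for example, abolish activity. The demo sequence is synthetic and is not the TGM3 promoter.

```python
CONSENSUS = "GACCACCCA"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def scan_gli_sites(promoter, max_mismatches=1, consensus=CONSENSUS):
    """Report (start, strand, site, mismatches) for 9-mers close to the consensus.

    `start` is the 0-based index of the window on the given (+) strand; for '-'
    hits, `site` is the reverse complement written in consensus orientation.
    """
    promoter = promoter.upper()
    k = len(consensus)
    hits = []
    for start in range(len(promoter) - k + 1):
        window = promoter[start:start + k]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            mismatches = sum(ref != base for ref, base in zip(consensus, site))
            if mismatches <= max_mismatches:
                hits.append((start, strand, site, mismatches))
    return hits

# Synthetic sequence: a consensus site on the + strand and a 7A variant on the - strand.
demo = "TTTT" + "GACCACCCA" + "TTTT" + reverse_complement("GACCACACA") + "TTTT"
for hit in scan_gli_sites(demo):
    print(hit)
```

With `max_mismatches=0`, only the consensus site is reported, which mirrors the limitation of consensus-only searches discussed above.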

Non-consensus GLI binding sites in GLI target gene promoters
To further explore the influence of binding affinity on transcriptional activation in a natural promoter context other than PTCH1, we chose the JUN and GLI1 promoters, both of which contain functional non-consensus GLI binding sites. The human JUN promoter (JUNpromWT2G5C) [17] and the human proximal GLI1 promoter (GLI1promWT9G) [18] each contain only one functional GLI binding site. This eliminates possible interactions between nearby GLI binding sites (Figure 5A, B). Each binding site variant has been shown to be essential for activation by GLI, and both have significantly lower affinity than the consensus site [12,17]. In the binding profile of Hallikas et al, the 9G coefficient is 0.004 for GLI1 and GLI3 (consensus 0.982 and 0.937, respectively) and 0.000 for GLI2 (consensus 0.982) (see Table S1 in [12]). Double substitutions such as the one in the JUN promoter were not tested under identical conditions. To compare the activity of the variants with that of the consensus, we used site-directed mutagenesis to change the wild type variant sites to the consensus sequence (GLI1promCons, JUNpromCons) (Figure 5).

Figure 3 Validation of the transcriptional activation obtained with the PTCH1_VAR construct, using PTCH1_WT constructs carrying the same binding site variants, in a luciferase assay. GLI binding site variants (3T, 8A, 6G7G) were introduced into the PTCH1_WT construct by site-directed mutagenesis. HaCaT cells were co-transfected with the indicated reporter constructs and a GLI2act expression construct, together with LacZ for normalization. No significant differences were observed between the mutated PTCH1_WT constructs and the corresponding PTCH1_VAR constructs, demonstrating the relevance of the PTCH1_VAR test system. 6G7G was used as inactive negative control.
The luciferase reporter constructs (GLI1promCons, JUNpromCons) were then tested for their response to GLI2act in HaCaT cells, and luciferase activity was compared with that of the respective wild type promoter constructs (GLI1promWT9G, JUNpromWT2G5C). We detected no significant difference between the GLI consensus and the non-consensus wild type sequences in the context of either promoter. These results indicate that relatively low binding site affinity does not prevent activation by GLI in a luciferase assay at optimal GLI concentration.
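One way to see why a weaker site can still be fully functional at sufficient GLI concentration is the simplest equilibrium model of a single site: fractional occupancy θ = [GLI] / ([GLI] + Kd). The sketch below compares a consensus site with a hypothetical variant whose Kd is 10-fold higher, in line with the factor of 10 mentioned above for several single substitutions. It assumes one independent site with no cooperativity, co-factors or chromatin effects. The concentrations and Kd values are arbitrary units, not fitted to any data in this study.

```python
def fractional_occupancy(gli_conc, kd):
    """Fraction of time a single site is bound at equilibrium: [GLI] / ([GLI] + Kd)."""
    return gli_conc / (gli_conc + kd)

KD_CONSENSUS = 1.0  # arbitrary units (assumption)
KD_VARIANT = 10.0   # hypothetical variant binding 10-fold more weakly

for conc in (0.1, 1.0, 10.0, 100.0):
    cons = fractional_occupancy(conc, KD_CONSENSUS)
    var = fractional_occupancy(conc, KD_VARIANT)
    print(f"[GLI]={conc:>6}: consensus {cons:.2f}, variant {var:.2f}, ratio {var / cons:.2f}")
```

At low concentrations, the variant is occupied roughly 10-fold less than the consensus. As the concentration rises, both sites approach saturation and the ratio tends towards 1. In this simple model, a 10-fold affinity difference therefore need not translate into a 10-fold difference in activation.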
Recent observations show that lower affinity binding sites for transcription factors can be identified by global ChIP [7] and occur quite frequently. Large scale affinity measurements [2] showed that many transcription factors recognise numerous variations of their primary motifs. Some factors even have secondary motifs, which may contribute to changes in transcriptional specificity. Low affinity sites have been described to visibly influence gene expression in yeast, pointing to their potential relevance for modulating gene expression [19]. Vokes et al [15] identified a number of GLI target promoters/enhancers that behave in a tissue-specific way and are influenced by nuclear GLI concentration. In a more global study, high and lower affinity sites for the REST repressor were shown to cluster into groups responsible for the regulation of target genes expressed commonly, specifically or uniquely in different cell lines [20]. These data imply an important role for lower affinity sites in the context-dependent control of transcription and point to the need for more detailed investigation of their function.

Conclusion
The results presented here focus specifically on the activation potential of binding sites for the GLI transcription factors, the mediators of the hedgehog signal. We measured activation in a standardised luciferase assay in the context of the PTCH1 promoter, testing all single site mutations of the GLI consensus binding sequence. A rather large number of substitutions were found to be active. This is consistent with the many known GLI target gene promoters that contain variant sites with lower binding affinity. These results take into account the contribution of a larger subset of binding sites with significant affinity. They are therefore likely to help in the prediction and experimental validation of further direct GLI target genes.

Figure 4 Activation of TGM3 in response to GLI requires two low affinity GLI binding sites. (A) The human TGM3 upstream regulatory region contains two non-consensus GLI binding sites (7A) (BS2, BS3). Lines below represent the DNA regions amplified by qPCR in ChIP (*, **). (B) Luciferase reporter assay with the wild type TGM3 promoter fragment (TGM3prom) and constructs with mutated putative GLI binding sites (BS1 Mut, BS2 Mut, BS3 Mut). HaCaT cells were co-transfected with the indicated promoter constructs and either GLI2act expression constructs or the empty expression vector pcDNA4/to. Data shown are mean values of relative light units (RLU) from three independent experiments. Mutation of either BS2 or BS3 completely abolishes reporter activation, while mutation of the consensus sequence has no effect. (C) Chromatin immunoprecipitation (ChIP) demonstrates specific binding of GLI2act to a region of the TGM3 promoter containing BS2 and BS3 (** in A). No amplification was observed for the region containing BS1, the GLI consensus binding site (* in A). Chromatin isolated from GLI2actHaCaT cells was precipitated with either a specific antibody (GLI2 N-20) or species-matched normal IgG (nIgG). The region of the human PTCH promoter containing the functional consensus site (PTCH BS2) was used as positive control. Data shown are fold enrichment of specifically precipitated DNA (GLI2 N-20) relative to samples precipitated non-specifically with species-matched normal IgG.

Cloning
Base positions are numbered according to [14] for the PTCH1 promoter, [17] for the JUN promoter and [18] for the GLI1 promoter. The GLI consensus site orientation used is 5'-GACCACCCA-3' [11]. The wild type PTCH1 promoter (-1022 to +211) was amplified from BAC #RP11/43505 (obtained from Children's Hospital Oakland Research Institute (CHORI)) and cloned into the NheI and BglII sites of the pGL3 basic vector (Promega, Madison, USA). For the PTCH1_VAR construct, GLI binding site BS2 (-704) was replaced with a 29 bp linker sequence containing NsiI and XhoI restriction sites. Oligonucleotides representing all variant GLI binding sites were inserted into the pGL3_PTCH1_linker construct (Figure 1A). Each oligonucleotide also included a HindIII restriction site for quick screening of positive clones. To mutate GLI binding sites in wild type promoters, we used the QuickChange site-directed mutagenesis kit (Stratagene, La Jolla, USA) according to the manufacturer's protocol and verified the changes by sequencing. Primer and oligonucleotide sequences are listed in Table 2.
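Before ordering variant oligonucleotides, it is worth checking that each insert carries exactly one HindIII site for clone screening and no internal NsiI or XhoI site that would interfere with the linker cloning. The sketch below does this by simple string search. NsiI (ATGCAT), XhoI (CTCGAG) and HindIII (AAGCTT) recognition sequences are all palindromic, so one strand suffices. The insert layout (9 bp site, one unspecified spacer base written as N, then AAGCTT) is a hypothetical assumption. It is chosen only because it places the HindIII C at position 14 from the start of the site, as described above. The actual oligonucleotide sequences are given in Table 2 and are not reproduced here.

```python
# Recognition sequences (all three are palindromic, so one strand suffices).
RESTRICTION_SITES = {"NsiI": "ATGCAT", "XhoI": "CTCGAG", "HindIII": "AAGCTT"}

def find_restriction_sites(seq, sites=RESTRICTION_SITES):
    """Return {enzyme: [0-based positions]} for each recognition sequence present."""
    seq = seq.upper()
    found = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            found[enzyme] = positions
    return found

def screen_insert(core):
    """List problems with a hypothetical insert core (binding site + spacer + HindIII)."""
    found = find_restriction_sites(core)
    problems = []
    if len(found.get("HindIII", [])) != 1:
        problems.append("expected exactly one HindIII site for screening clones")
    for enzyme in ("NsiI", "XhoI"):
        if enzyme in found:
            problems.append(f"internal {enzyme} site would interfere with linker cloning")
    return problems

# Hypothetical layout: 9 bp site, one unspecified spacer base (N), then AAGCTT,
# which places the C of the HindIII site at position 14 counted from the site start.
core = "GACCACACA" + "N" + "AAGCTT"
print(core[13], screen_insert(core))
```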

Chromatin immunoprecipitation
ChIP in GLI2actHaCaT cells was performed as described in [22]. For specific precipitation, we used polyclonal goat anti-GLI2 (GLI2 N-20) (Santa Cruz Biotechnology). Species-matched normal IgG (Santa Cruz Biotechnology) served as unspecific control. PCR primer sequences are listed in Table 2.
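Figure 4C reports ChIP results as fold enrichment of GLI2-precipitated DNA over normal IgG. A common way to compute this from qPCR threshold cycles (Ct) is the ΔCt method, sketched below. The method assumes equal input chromatin in both precipitations and a per-cycle amplification factor of 2 (100% PCR efficiency). It does not normalize to an input sample. The function name and Ct values are hypothetical, and the exact calculation used in [22] may differ.

```python
from statistics import mean

def fold_enrichment(ct_specific, ct_igg, amplification_factor=2.0):
    """Fold enrichment of specific IP over normal IgG from qPCR Ct values.

    Assumes equal input chromatin and a per-cycle amplification factor of 2
    (100 % PCR efficiency), so each cycle earlier means twice as much template.
    """
    return amplification_factor ** (ct_igg - ct_specific)

# Hypothetical triplicate Ct values for one primer pair.
ct_gli2 = [26.1, 26.3, 25.9]
ct_nigg = [29.0, 29.2, 29.1]
print(round(fold_enrichment(mean(ct_gli2), mean(ct_nigg)), 1))  # 8.0
```

A region without GLI binding, such as the TGM3 BS1 region in Figure 4C, would give similar Ct values for both antibodies and hence a fold enrichment near 1.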