Quantitative profiling of BATF family proteins/JUNB/IRF hetero-trimers using Spec-seq

Background: BATF family transcription factors (BATF, BATF2 and BATF3) form hetero-trimers with JUNB and either IRF4 or IRF8 to regulate cell fate in T cells and dendritic cells in vivo. While each hetero-trimer combination has a distinct role, some degree of cross-compensation has been observed. The basis for the differential actions of IRF4 and IRF8 with BATF factors and JUNB is still unknown. We propose that the differences in function between these hetero-trimers may be caused by differences in their DNA binding preferences. While all three BATF family transcription factors have similar binding preferences when binding as a hetero-dimer with JUNB, the cooperative binding of IRF4 or IRF8 to the hetero-dimer/DNA complex could change those preferences. We used Spec-seq, which allows for the efficient and accurate determination of relative affinity to a large collection of sequences in parallel, to find differences between cooperative DNA binding of IRF4, IRF8 and BATF family members.
Results: We found that without IRF binding, all three hetero-dimer pairs exhibit nearly the same binding preferences for both expected wild-type binding sites, TRE (TGA(C/G)TCA) and CRE (TGACGTCA). IRF4 and IRF8 show very similar DNA binding preferences when binding with any of the three hetero-dimers. No major change in binding preferences was found in the half-sites between different hetero-trimers. IRF proteins bind with substantially lower affinity when there is a single-nucleotide spacer between the IRF and BATF binding sites, or when they bind in an alternative mode in the opposite orientation. In addition, the preference for the CRE binding site was reduced upon binding of either IRF in all BATF–JUNB combinations.
Conclusions: The specificities of BATF, BATF2 and BATF3 are all very similar, as are their interactions with IRF4 and IRF8. IRF protein binding adjacent to BATF sites increases affinity substantially compared to sequences with spacings between the sites, indicating cooperative binding through protein–protein interactions. The preference for the type of BATF binding site, TRE or CRE, is also altered when IRF proteins bind. These in vitro preferences aid in the understanding of in vivo binding activities.
Electronic supplementary material: The online version of this article (10.1186/s12867-018-0106-7) contains supplementary material, which is available to authorized users.

due to their DNA binding preferences. BATF family proteins form hetero-dimers with JUN family proteins and can recognize the 7-bp TPA response element (TRE: TGA(C/G)TCA) or the 8-bp cyclic AMP response element (CRE: TGACGTCA) [4][5][6]. The bZIP domains of all three BATF family members are highly conserved. None of the BATF transcription factors has a transcriptional activation domain, and they are considered to act as inhibitors of AP-1 activity [7]. BATF and BATF3 are relatively small compared to other bZIP transcription factors (125 and 118 amino acids, respectively) and contain no additional domains other than the bZIP. BATF2 has an extra carboxy-terminal domain of unknown function.
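The two site definitions above can be checked for strand symmetry with a short sketch. This is only an illustration, not part of the study's analysis: the helper names (`expand`, `reverse_complement`, `is_strand_symmetric`) are hypothetical, and TGA(C/G)TCA is written here with the IUPAC symbol S (C or G).

```python
from itertools import product

# IUPAC-style expansion, limited to the symbols needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(pattern):
    """Return every concrete sequence matched by a degenerate pattern."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in pattern))}


def reverse_complement(seq):
    """Reverse complement of a concrete (non-degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_strand_symmetric(pattern):
    """True if the set of matching sequences equals its own reverse complement."""
    seqs = expand(pattern)
    return seqs == {reverse_complement(s) for s in seqs}


TRE = "TGASTCA"   # TGA(C/G)TCA, with S = C or G (notation chosen for this sketch)
CRE = "TGACGTCA"

if __name__ == "__main__":
    for name, site in (("TRE", TRE), ("CRE", CRE)):
        print(name, sorted(expand(site)), is_strand_symmetric(site))
```

Under these definitions both sites read the same on either strand, which is why a bZIP dimer bound to them has no intrinsic orientation on the DNA.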
mRNA expression analysis showed that BATF and BATF3 are highly expressed in lymphocytes, while BATF2 is mostly expressed in macrophages [8]. Although they are sometimes expressed in the same cell types, each BATF family member has specific functions. For example, BATF controls TH17 differentiation [9] and BATF3 is required for the development of CD8α classical dendritic cells (cDC) [10]. Interestingly, BATF and BATF3 can cross-compensate in vivo in T cells and dendritic cells, but BATF2 can only compensate for BATF3 in dendritic cells [11]. The mechanism by which the family members compensate for each other is not clear.
Interferon regulatory factor (IRF) family transcription factors have diverse roles in regulating the immune system. IRFs have a conserved DNA binding domain (DBD) that is known to bind the interferon-stimulated response element (ISRE) on its own [12,13]. While the mammalian IRF family comprises nine members, IRF1 to IRF9, only IRF4 and IRF8 are known to function cooperatively with BATF family transcription factors. Structurally, IRF4 and IRF8 contain an IRF-association domain (IAD) C-terminal to the DBD. When binding cooperatively with BATF, the IAD is proposed to interact with the leucine zipper region of BATF, and the DBD binds to a "GAAA" motif either 0 or 4 base pairs away from the TRE, in opposite orientations [11,14,15].
The basis for the differential actions of IRF4 and IRF8 with BATF factors is still under investigation. One potential explanation is subtle differences in cooperative DNA binding between BATF factors and IRFs. Iwata et al. found that a "T" preference 8 base pairs 5′ to the TRE can affect the strength of the T cell antigen receptor signal [16]. We propose that the differences in function between these hetero-trimers may be caused by differences in their DNA binding preferences. We used Spec-seq, which allows for the efficient and accurate determination of relative affinity to a large collection of sequences in parallel [17][18][19][20][21], to find differences between cooperative DNA binding of IRF4, IRF8 and BATF family members.
Spec-seq is based on the principle that the relative binding affinities of a collection of DNA sequences can be measured by separating the bound and unbound fractions of DNA and determining the ratios of each sequence in the two fractions (see "Methods"). We have used this principle to measure binding specificity many times previously, but with methods that were low-throughput, allowing the measurement of relative affinity to only a few sequences per assay [22][23][24][25][26][27]. With the development of new sequencing technologies, Spec-seq allows that principle to be applied to measure the relative binding affinities of hundreds to thousands of sequences per assay [17][18][19][20][28]. We have recently demonstrated that it can be easily extended to measure the effects of modified bases on binding affinity, and also showed its high accuracy by comparison with a two-color competitive fluorescence anisotropy method [21]. Spec-seq can also be readily adapted to measuring the cooperativity of binding between two proteins to the same DNA sequence, in a method we call Coop-seq [17,29,30]. In this paper Spec-seq is applied for the first time to the study of hetero-trimeric protein-DNA complexes.

Spec-seq of BATF/BATF2/BATF3 with JUNB
We used full-length human BATF and BATF3 and the bZIP domain of BATF2 (142 aa). The BATFs were heterodimerized with JUNB prior to protein purification. Each BATFx-JUNB hetero-dimer was incubated with the Spec-seq library to allow DNA-BATFx-JUNB binding (Fig. 1a). The binding reactions were loaded onto native polyacrylamide gels for electrophoretic mobility shift assay (EMSA) (Additional file 1). DNA was extracted separately from the bound and unbound bands on the gel and then sequenced by Illumina sequencing. The read counts of each oligo were used for the Spec-seq calculation of relative binding affinity [17,18] (see "Methods"). The DNA library used here contained three oligos (Fig. 1b). Oligos 1 and 2 can be bound in either the CRE (TGACGTCA) or TRE (TGA(C/G)TCA) mode, whereas oligo 3 can only be bound in the TRE mode. For the TRE sequences there is a single randomized flanking position, which we find does not contribute to specificity, consistent with our previous results [21]. Spec-seq calculations generated relative binding energies for each of the oligos used in the library (Additional file 2). Energy logos were drawn using only the single-variant mutants of either the CRE or the TRE reference (energy PWMs are included in Additional file 3). All three BATF-JUNB combinations have similar binding preferences for the TRE and CRE sites (Fig. 1c). BATF binds the TRE and CRE sites with approximately equal affinity, while BATF2 and BATF3 have a preference for CRE of 0.[...] (Additional file 4). Our result agrees with Rodriguez-Martinez et al. [31], who also used heterodimers of the bZIP domains of all three BATFs with JUNB. The BATF2-JUNB combination is especially of note because, in previous reports, the combination of full-length BATF2 and JUNB failed to bind to TRE [32,33].
Fig. 1 caption (panels b, c): b Oligos used to generate the library used in the Spec-seq experiment. Only the binding sites are shown. Each of these sequences in the library is flanked with sequences for amplification purposes as described in "Methods". c Energy logos for BATFx-JUNB heterodimers for both TRE and CRE binding sites. Since these binding sites have no directional preferences, these logos are generated as symmetrical. Single variants from the TRE or CRE reference site were used to generate these logos. The Y-axis is negative energy in kT units, so the preferred sequence is on the top. Energy PWMs are in Additional file 3.
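The step from relative binding energies to an energy PWM, using only the reference site and its single-base variants, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `single_variant_pwm` and all energy values are hypothetical, and energies are taken to be in kT units relative to an arbitrary zero.

```python
def single_variant_pwm(energies, reference, alphabet="ACGT"):
    """Build a position-by-base energy matrix from single-base variants.

    energies  -- dict mapping sequence -> relative binding energy (kT);
                 the values used below are made up for illustration
    reference -- the reference (consensus) site; its bases score 0
    Variants with two or more mismatches are ignored, and single variants
    absent from the library are simply left out of their column.
    """
    ref_energy = energies[reference]
    pwm = []
    for pos in range(len(reference)):
        column = {}
        for base in alphabet:
            variant = reference[:pos] + base + reference[pos + 1:]
            if variant in energies:
                column[base] = energies[variant] - ref_energy
        pwm.append(column)
    return pwm


if __name__ == "__main__":
    # Hypothetical energies around the TRE reference TGAGTCA.
    example = {
        "TGAGTCA": 0.0,
        "TTAGTCA": 1.2,   # single variant at position 2
        "TGAGTCC": 0.8,   # single variant at position 7
        "TTAGTCC": 2.3,   # double variant, not used for the PWM
    }
    for i, col in enumerate(single_variant_pwm(example, "TGAGTCA"), start=1):
        print(i, col)
```

An energy logo then plots the negative of each column entry, so the preferred base at each position sits on top.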

IRF4 and IRF8 Spec-seq with BATF/BATF2/BATF3 and JUNB
IRF4 and IRF8 have low affinity for DNA on their own. When they are subjected to SELEX experiments, a "GAAA"-rich motif known as the ISRE can be found [13]. However, that motif does not reflect the binding situation in vivo. Glasmacher et al. [14] reported that an IRF4 ChIP-seq experiment in TH17 cells yields motifs with a "GAAA" either 0 or 4 bases away from the AP-1 site. The 0-spacer "GAAA" and 4-spacer "TTTC" binding sites suggest that the IRF could have two modes of DNA binding with different spacers and orientations (Fig. 2a). We designed our oligo library to measure the relative DNA binding affinity of IRFs in the presence of BATF-JUNB. The library contains oligos with randomized potential IRF sites. To allow only one potential IRF binding site per protein-DNA complex, we changed the non-randomized positions to a sequence (ACGG) that was determined in prior Spec-seq experiments to be non-preferred.
Since AP-1 sites are palindromic, we mutated the distal half of the AP-1 binding site to a less preferred one (TCC instead of TCA), because IRF was shown to prefer binding to the more conserved side of the AP-1 site [14] (Fig. 2b). As in the BATF-JUNB Spec-seq experiments, IRF-BATF-JUNB and the DNA library were incubated and then run on native polyacrylamide gels for EMSA (Additional file 1). DNA was extracted from the bound and unbound bands and sequenced by Illumina sequencing to produce read counts for the Spec-seq calculation. Energy logos were drawn using only the single-variant mutants of the "TTTC" or "GAAA" reference for the 4-spacer and 0-spacer sites, respectively, and then merged together (energy PWMs in Additional file 3). Overall, the 0-spacer sites (positions 5-8) for both IRF4 and IRF8 have higher specificity than the 4-spacer sites (positions 1-4) (Fig. 2c). The two bases closest to the AP-1 site contribute the most to those preferences. For both IRF4 and IRF8, the 0-spacer half site appears as "GA(T/A)A." IRF4 prefers A and T equally at the third position of the IRF site, while IRF8 prefers a T at the third position. The binding affinity is much higher with the 0-spacer sites than with the 4-spacer sites, and the magnitude of the difference depends on both the IRF protein and the BATF dimer (Additional file 5).
[...] (Fig. 3a). We first normalized all energy measurements by setting the TGAGTCAT (TRE-0sp) measurement in each experiment to 0. The binding energies of TGAGTCAT (TRE-0sp), ATGAGTCA (TRE-1sp) and CRE for each protein combination are graphed (Fig. 3b). A higher energy value represents lower binding affinity.
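The normalization to the TRE-0sp site can be illustrated with a short sketch. The function names and the energy values are hypothetical and serve only to show the arithmetic; energies are assumed to be in kT units, so a difference of ΔE corresponds to an affinity ratio of exp(−ΔE).

```python
import math


def normalize_to_reference(energies, reference="TRE-0sp"):
    """Shift all energies so that the reference site has energy 0."""
    ref = energies[reference]
    return {site: e - ref for site, e in energies.items()}


def fold_change_in_affinity(delta_energy_kt):
    """Affinity relative to the reference for an energy difference in kT.

    Values below 1 mean weaker binding than the reference site.
    """
    return math.exp(-delta_energy_kt)


if __name__ == "__main__":
    # Made-up energies for one protein combination (not measured values).
    raw = {"TRE-0sp": 0.7, "TRE-1sp": 2.7, "CRE": 1.2}
    for site, energy in normalize_to_reference(raw).items():
        print(site, round(energy, 2), round(fold_change_in_affinity(energy), 3))
```

Setting the same site to 0 in every experiment makes the energies of different protein combinations directly comparable on one plot.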

Discussion
We have found that the quantitative specificities of BATF, BATF2 and BATF3 are all very similar over a large collection of binding sites, the main difference being that BATF2 and BATF3 have a slight preference for 8-bp CRE sites over 7-bp TRE sites that is not observed for BATF. IRF4 and IRF8 have very similar specificities in combination with any of the BATF proteins. In every case there is a preference for IRF sites that are immediately adjacent (0-spacer sites) over those that have a single base in between, which strongly suggests cooperative binding through protein-protein interactions [29]. The preference for the 0-spacer sites over the 4-spacer sites, with the IRF site in the opposite orientation, is even stronger. The fact that such combinations are observed in in vivo binding sites [14] suggests that there are other, currently unknown, factors contributing to complex formation in vivo.
Although the specificities of the BATF proteins are very similar, as are those of the IRF proteins, there are some significant differences in the interaction energies that may account for differential binding in vivo.
Fig. 2 caption: a [...] [14]. The IRFx can bind either 0 or 4 nucleotides away from the BATFx-JUNB binding site. b Oligos used to generate the library used in the Spec-seq experiment. Each oligo contains two potential IRF binding locations, either 0 or 4 nucleotides from the BATFx-JUNB binding site. The IRF site intended for the binding test is randomized to NNNN, while the IRF site not intended for IRF binding was mutated to ACGG, a sequence not preferred by either IRF. The BATFx-JUNB site is mutated to have a "C" instead of an "A" at the 7th position to facilitate BATFx-JUNB binding in only one direction. c Energy logos for BATFx-JUNB-IRFx hetero-trimer binding. Logos for the two IRFx sites were generated separately and combined in a single logo. Single variants from the consensus IRFx binding site of GAAA were used to generate these logos. The Y-axis is negative energy (kT units), so the preferred sequence is on the top.

Conclusions
BATF, BATF2 and BATF3 can each form dimers with JUNB and bind DNA with very similar specificities. Each dimer can also interact with IRF4 and IRF8 to form hetero-trimeric protein complexes that bind to DNA with similar, but somewhat distinct, quantitative preferences, especially regarding the spacing between the monomeric sites. Spec-seq is an effective method to measure the relative affinities to hundreds of alternative binding sites in parallel.

Protein expression and purification

BATF/BATF2/BATF3-JUNB heterodimers
Full-length human BATF, BATF3 and a truncated version of BATF2 (aa 1-142) were cloned into a pUC19-based plasmid with a T7 promoter and T7 terminator. Only the N-terminal bZIP domain of BATF2 was used, both to make it equivalent to BATF and BATF3 and because earlier work had shown that full-length BATF2 did not bind TRE sequences with JUNB [32,33]. Each protein construct contains an N-terminal mCherry followed by a cleavage site for Tobacco Etch Virus nuclear-inclusion-a endopeptidase (TEV protease) and finally the protein of interest. In addition, a truncated version of human JUNB (aa 148-347) with a C-terminal 6-histidine (6His) tag was [...] Since the BATF proteins contain only mCherry and no affinity tags, all 6His-purified proteins were BATF-JUNB heterodimers. The mCherry on the BATF proteins was cleaved off using ProTEV Plus (Promega) following the manufacturer's instructions.

IRF4/IRF8
Full-length human IRF4 and mouse IRF8 were cloned into a pUC19-based plasmid with a T7 promoter and T7 terminator, containing an N-terminal Strep-tag followed by a cleavage site for thrombin protease, as described [39]. The constructs were transformed into Escherichia coli BL21(DE3) and grown in Luria broth (Sigma). Protein expression was induced by adding 0.4 mM isopropyl-β-thiogalactoside (IPTG) for 3 h at 30 °C. The proteins were purified using Strep-Tactin Superflow (IBA Life Sciences) following the manufacturer's instructions. The Strep-tag was cleaved off by thrombin protease digestion for 8 h at room temperature.

Library design and preparation
The BATF-JUNB Spec-seq library was designed by flanking the degenerate sequences of interest (those in Fig. 1b) with a 5′ flanking sequence of GATAGTCTCATTTTCACCCCGT and a 3′ flanking sequence of TTGTTCCATTACAGTATCTGT for downstream processing. The IRF Spec-seq library was designed by flanking the degenerate sequences of interest (those in Fig. 2b) [...]. The PCR product was then purified again using the QIAquick Nucleotide Removal Kit. Multiple samples were pooled, sequenced and analyzed as previously described [18].
Analysis of Spec-seq data to determine relative binding energies for a collection of sequences is as previously described [17,18]. Briefly, the affinity (association constant) of a TF for any sequence, $S_i$, can be determined by measuring the concentrations of the unbound TF, the unbound $S_i$ and the TF-$S_i$ complex ($[TF]$, $[S_i]$ and $[TF\text{-}S_i]$, respectively):
$$K_i = \frac{[TF\text{-}S_i]}{[TF]\,[S_i]}$$
Obtaining the relative affinities of the TF for a collection of sequences, $S_1 \dots S_n$ (affinities which for convenience we label $K_1 \dots K_n$), requires only measuring the distribution of those sequences in the bound and unbound fractions; none of the absolute concentrations, including that of the free protein, is needed, because $[TF]$ is common to all sequences and cancels in the ratios:
$$K_1 : K_2 : \dots : K_n = \frac{[TF\text{-}S_1]}{[S_1]} : \frac{[TF\text{-}S_2]}{[S_2]} : \dots : \frac{[TF\text{-}S_n]}{[S_n]}$$
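The calculation described above can be sketched in a few lines. This is a simplified illustration rather than the published analysis code: the function name `relative_energies` and the read counts are hypothetical, read counts are assumed to be proportional to the concentration of each sequence within its fraction, and relative energies are expressed in kT units as the negative natural log of the affinity ratio.

```python
import math


def relative_energies(bound, unbound, reference):
    """Relative binding energies (kT) from bound and unbound read counts.

    bound, unbound -- dicts mapping sequence -> read count in each fraction
    reference      -- sequence whose energy is set to 0
    Sequences with a zero or missing count in either fraction are skipped.
    Per-fraction sequencing depth cancels, because every bound/unbound
    ratio is divided by the ratio of the reference sequence.
    """
    ratios = {
        seq: bound[seq] / unbound[seq]
        for seq in bound
        if bound[seq] > 0 and unbound.get(seq, 0) > 0
    }
    ref_ratio = ratios[reference]
    return {seq: -math.log(r / ref_ratio) for seq, r in ratios.items()}


if __name__ == "__main__":
    # Made-up read counts for two sites; not data from this study.
    bound = {"TGAGTCAT": 400, "ATGAGTCA": 100}
    unbound = {"TGAGTCAT": 100, "ATGAGTCA": 100}
    for seq, energy in relative_energies(bound, unbound, "TGAGTCAT").items():
        print(seq, round(energy, 3))
```

Because only ratios of ratios enter the result, the bound and unbound libraries can be sequenced to different depths without changing the relative energies.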