Standardized collection of MNase-seq experiments enables unbiased dataset comparisons

Background The organization of eukaryotic DNA into chromatin has a strong influence on the accessibility and regulation of genetic information. The locations and occupancies of a principle component of chromatin, nucleosomes, are typically assayed through use of enzymatic digestion with micrococcal nuclease (MNase). MNase is an endo-exo nuclease that preferentially digests naked DNA and the DNA in linkers between nucleosomes, thus enriching for nucleosome-associated DNA. To determine nucleosome organization genome-wide, DNA remaining from MNase digestion is sequenced using high-throughput sequencing technologies (MNase-seq). Unfortunately, the results of MNase-seq can vary dramatically due to technical differences and this confounds comparisons between MNase-seq experiments, such as examining condition-dependent chromatin organizations. Results In this study we use MNase digestion simulations to demonstrate how MNase-seq signals can vary for different nucleosome configuration when experiments are performed with different extents of MNase digestion. Signal variation in these simulations reveals an important DNA sampling bias that results from a neighborhood effect of MNase digestion techniques. The presence of this neighborhood effect ultimately confounds comparisons between different MNase-seq experiments. To address this issue we present a standardized chromatin preparation which controls for technical variance between MNase-based chromatin preparations and enables the collection of similarly sampled (matched) chromatin populations. Standardized preparation of chromatin includes a normalization step for DNA input into MNase digestions and close matching of the extent of digestion between each chromatin preparation using gel densitometry analysis. The protocol also includes directions for successful pairing with multiplex sequencing reactions. Conclusions We validated our method by comparing the experiment-to-experiment variation between biological replicates of chromatin preparations from S. cerevisiae. Results from our matched preparation consistently produced MNase-seq datasets that were more closely correlated than other unstandardized approaches. Additionally, we validated the ability of our approach at enabling accurate downstream comparisons of chromatin structures, by comparing the specificity of detecting Tup1-dependent chromatin remodeling events in comparisons between matched and un-matched wild-type and tup1Δ MNase-seq datasets. Our matched MNase-seq datasets demonstrated a significant reduction in non-specific (technical) differences between experiments and were able to maximize the detection of biologically-relevant (Tup1-dependent) changes in chromatin structure.


Supplemental Validation of Method:
To further examine our matched datasets we also compared our genome-wide tup1∆ chromatin data to multiple wildtype chromatin datasets. Since we could not calculate gel-front correlations between our data and other MNase-seq experiments, we determined a genome-to-genome Pearson correlation between datasets to gauge how well-matched they are to our tup1D data (Supplemental Table 1).
As evident by the trend in Supplemental Figure 2, Tup1-specific differences in chromatin structure between wild-type and tup1∆ chromatin were seen for only highly-correlated datasets (orange and pink highlights), and our matched sample (yellow highlight) preparation provided the greatest Tup1-enrichment score. Importantly, the loss of Tup1enrichment signal seen for dissimilar chromatin regions in unmatched sample comparisons was due to identification of non-specific changes in chromatin structures caused by technical differences. Supplemental Figure 3 illustrates this concept, depicting a single dissimilar region of chromatin identified between our matched samples that is centered on a Tup1 binding site. Decreased specificity for unmatched comparisons is observed regardless of the stringency of defining dissimilar regions (Supplemental Figure 4).
Overall, these results demonstrate the superiority of our matched method of uniquely identifying Tup1-specific regions of dissimilar chromatin, whereas unmatched datasets suffer from non-specific differences in chromatin structure for all MNase-seq data comparison windows.

Supplemental Figure 1. Unmatched preparations show poor consistency in identifying changes in MNase protection signals
Graphs illustrating MNase protection data and the underlying nucleosome occupancy data for both wildtype (black) and tup1∆ (red) chromatin. Error bars represent the standard error between experiments.
Top panel: Graph of MNaseseq data for matched samples of wild-type and tup1∆ chromatin surrounding the YLR295C transcription start site. MNase-seq data from matched samples illustrates a decrease in MNase protection upstream of the YLR295C gene associated with loss of Tup1-dependent nucleosome stabilization, which has been demonstrated previously [1].
Upper middle panel: Bar graphs illustrating the reproducibility of MNase-seq results using MNase-qPCR approaches on both matched and unmatched chromatin preparations. Matched preparations (yellow) of chromatin DNA consistently identify changes in MNase protection signals which relate to the initial MNase-seq signal. These changes also relate to underlying changes in nucleosome occupancy (lower middle and bottom panels). Unmatched preparations (black and grey) show poorer ability to identify changes in MNase-seq signals, especially at locations with subtle changes in protection such as amplicons A and D (purple). Unmatched preparations also show more variance between preparations (larger standard error).
Lower middle and bottom panels: Bar graphs illustrating the underlying nucleosome occupancy (histone ChIP-qPCR) corresponding to MNase-seq signals upstream of the YLR295C gene.

Supplemental Figure 2. Matched MNase digests identify biologically relevant differences in chromatin structure.
Bar graph illustrating the average Tup1 binding (blue), from ChIP-chip analysis [2], for dissimilar chromatin structures identified when comparing various wild-type MNase-seq datasets to our tup1∆ dataset. Sources and characteristics of wild-type datasets are listed in Supplemental Table 1. All dataset comparisons are sorted by genome-to-genome Pearson correlations between processed wild-type and tup1∆ MNase-seq datasets, whereby correlation (r) increases along the x-axis reading left to right, such that Kent_NOCL_PD is the lowest correlated dataset and Rizzo_CL_CD is the highest correlated. Tup1-specific differences in chromatin structure between wild-type and tup1∆ chromatin are seen for only highly correlated datasets (pink and orange highlights) and are best captured by our matched preparation (yellow highlight). Error bars represent standard error. .

Supplemental Figure 3. Comparison between wild-type and tup1∆ MNaseseq experiments at a single region of Tup1-dependent chromatin
Graph illustrating MNase-seq signals, at 10-bp resolution, for a single region of chromatin. Graphs display data from our tup1∆ dataset (red) compared to various wild-type datasets. Sources and characteristics of wild-type datasets are listed in Supplemental Table 1. Graphs of dataset comparisons are sorted by genome-to-genome Pearson correlations between processed wild-type and tup1∆ MNase-seq datasets, whereby correlation (r) decreases reading from the top graph (wt = Rizzo_CL_CD) to bottom (wt = Weiner_CL_PD). Gray bars reflect windows of dissimilarity between chromatin structures, whereby the correlation between the two MNase-seq datasets is r<0.5 for a 1000 bp window centered on that base-pair. A bar graph of Tup1 binding by ChIP-chip analysis (blue; Bottom Graph) is also displayed for this same region. Notice how the number of dissimilar chromatin regions identified in MNase-seq datasets comparisons is inversely proportional to the Pearson correlation between datasets. Dissimilar chromatin structures between our matched experiments (yellow highlight) and other highly correlated datasets (pink and orange highlights) center on Tup1 binding events, whereas dissimilarities between unmatched chromatin structures do not, due to higher levels of technical variance. Identifications of non-specific changes in chromatin structure occur more frequently in poorly correlated and unmatched sample comparisons.

Supplemental Figure 4. The ability of matched MNase digests to specifically detect biologically-relevant differences in chromatin structure is NOT dependent on dissimilarity cutoff values used in analysis.
Results illustrate how the improved biological-signal-to-technical-noise ratio seen for our matched MNase-seq datasets is not contingent on the dissimilarity cutoff (r <0.5) used in the initial analysis.
A.) Heatmap illustrating the average Tup1 enrichment for all unique 1000 bp chromatin windows sorted by the percent rank of the Pearson correlation coefficient (r) from comparisons between tup1∆ MNase-seq data and various wild-type MNase-seq datasets (x-axis). Supplemental Table 1 provides the experimental characteristics of MNase-seq datasets used in this analysis. Only chromatin regions with available MNase-seq and ChIP-chip Tup1 binding data [2] were used in this analysis and the average Tup1 binding profiles for this data was calculated for each 0.25% percentile of data. Our matched dataset (yellow highlight) shows an enrichment of Tup1 binding that is specific to the top ranked chromatin widows (i.e. most dissimilar regions between wild-type and tup1∆ datasets). Conversely, unmatched dataset comparisons show a marked increase in non-specific Tup1 binding signals at regions that are not strongly dissimilar chromatin regions. Black bar notes the most dissimilar chromatin regions (top 15% rank) identified in this analysis and used to populate the line graph in SF 4B.
B.) Line graph illustrating the average Tup1 binding data for the strongest dissimilar chromatin regions identified by analysis in SF 4A. Our matched dataset (black) consistently shows a stronger Tup1-enrichment score for the most dissimilar regions of chromatin identified.

Supplemental Figure 5. MNase cut probability function for nucleosomal DNA templates.
The probability of cutting nucleosomal DNA templates during MNase digestion simulations was weighted using an exponential decay function (red) that relates MNase protection to a base-pair's location within a nucleosome. Protection was set to be 1000X greater than naked (linker) DNA at the nucleosome dyad with a decay to 50X protection at the edges of nucleosomes, based on the in vivo work of Widom and colleagues [3]. Sources and characteristics of technical preparation of the MNase -seq datasets used in this analysis. A range of different chromatin preparations, including variable crosslinking, growth, temperature and MNase-digestion extents, were included in this analysis to intentionally introduce technical differences into MNase-dataset comparisons and gauge their effect on signal comparisons. All experiments used chromatin from either wild-type (black) or tup1∆ (red) S. cerevisiae strains. All samples were grown in rich medium (YPD) to log phase and sequenced on the Illumina platform. All data were downloaded from the Sequence Read Archive Database (SRA) and processed identically, as described in Methods [10]. UNID = unique identifier, PD = Partial Digestion (<70% Monos), ND = Normal Digestion (70-89% Monos), CD = Complete Digestion (90-100% Monos), UK = Unknown Digestion (not specified). Correlation to tup1∆ = genome-to-genome Pearson correlation (r) between processed wild-type and tup1∆ MNase-seq datasets. Wild-type MNase-seq datasets represent a range of chromatin preparations, including our matched wild-type dataset which had the highest correlation to tup1∆ chromatin (r = 0.93). Other unmatched datasets represented a range of correlations to our tup1∆ data including well-correlated datasets (0.75 < r <.9) and poorly correlated datasets (r < 0.75).