DNA deformability changes of single base pair mutants within CDE binding sites in S. cerevisiae centromere DNA correlate with measured chromosomal loss rates and CDE binding site symmetries
 Brad Hennemuth^{1} and Kenneth A Marx^{1}
DOI: 10.1186/1471-2199-7-12
© Hennemuth and Marx; licensee BioMed Central Ltd. 2006
Received: 31 August 2005
Accepted: 16 March 2006
Published: 16 March 2006
Abstract
Background
The centromeres in yeast (S. cerevisiae) are organized by short DNA sequences (125 bp) on each chromosome consisting of 2 conserved elements, CDEI and CDEIII, separated by a CDEII region. CDEI and CDEIII are critical sequence specific protein binding sites necessary for correct centromere formation and, following assembly with proteins, are positioned near each other on a specialized nucleosome. Hegemann et al. BioEssays 1993, 15: 451–460 reported single base DNA mutants within the critical CDEI and CDEIII binding sites on the centromere of chromosome 6 and quantitated centromere loss of function, which they measured as loss rates for the different chromosome 6 mutants during cell division. Olson et al. Proc Natl Acad Sci USA 1998, 95: 11163–11168 reported the use of protein-DNA crystallography data to produce a DNA dinucleotide protein deformability energetic scale (PD-scale) that describes local DNA deformability by sequence specific binding proteins. We have used the PD-scale to investigate the DNA sequence dependence of the yeast chromosome 6 mutants' loss rate data. Each single base mutant changes 2 PD-scale values at that changed base position relative to the wild type. In this study, we have utilized these mutants to demonstrate a correlation between the change in DNA deformability of the CDEI and CDEIII core sites and the overall experimentally measured chromosome loss rates of the chromosome 6 mutants.
Results
In the CDEI and CDEIII core binding regions, an increase in the magnitude of change in deformability of chromosome 6 single base mutants with respect to the wild type correlates with an increase in the measured chromosome loss rate. These correlations were found to be significant relative to 10^{5} Monte Carlo randomizations of the dinucleotide PD-scale applied to the same calculation. A net loss of deformability also tends to increase the loss rate. Binding site position specific, 4 data-point correlations were also created using the wild type sequence and the 3 associated alternate base mutants at each binding site position. These position specific slope magnitudes, or sensitivities, correlated with and reflected the underlying position symmetry of the DNA binding sequences.
Conclusion
These results suggest the utility of correlating quantitative aspects of sequence specific protein-DNA complex single base mutants with changes in the easily calculated PD-deformability scale of the individual DNA sequence mutants. Using this PD approach, it may be possible in the future to understand the magnitude of biological or energetic functional effects of specific DNA sequence mutants within DNA-protein complexes in terms of their effect on DNA deformability.
Background
In a detailed genetic mapping study of the centromere of chromosome 6, Hegemann et al. [7] produced 67 single base DNA mutants within the critical CDEI and CDEIII binding sites and measured chromosome fragment loss rates for all the chromosome 6 mutants during mitosis. While no specific error ranges were presented in this study, the researchers indicated that repeated data points were within 10% of each other [8]. All possible single base mutants were tested for the 8 bp CDEI element, which has a center of symmetry between positions 5 and 6. Similarly, all possible single base mutants were tested for positions 11–17 of the CDEIII element, which forms the highly conserved core 7 bp palindrome with a center of symmetry at position 14. These areas are referred to as the 'core' protein binding sequences in our study due to their base conservation and elevated loss rates. In Figure 1 below we show each base position and present the magnitude of the measured chromosome loss rate for each base mutant as vertical lines, where the length of each line is proportional to the log_{e} (chromosome loss rate).
The current state of knowledge in understanding DNA-protein interactions posits that DNA sequence specific recognition by its cognate protein may happen by a combination of both direct and indirect readout of a DNA sequence [9]. Direct readout would correspond to specific functional groups in the DNA major and minor grooves, including tightly bound waters of hydration, interacting with specific amino acid residue features of the 3D structured protein. Indirect readout would correspond to features such as characteristic distortions of the DNA sequence that may not be entirely sequence specific. One type of indirect readout is the characteristic DNA deformation brought about by sequence specific protein binding. Olson et al. [10] have used crystallography data on nearly 100 protein-DNA complexes to produce a type of indirect readout scale, the dinucleotide protein deformability energetic scale (PD-scale), that describes the average DNA deformability by sequence specific binding proteins. The relative positions of adjacent planar base pairs are calculated in terms of the 6 dimensional parameters: twist, roll, tilt, shift, slide and rise. From these calculations, each of the 10 unique dinucleotides was assigned a single number that represents the multidimensional volume of energetic states found for that dinucleotide. That numerical PD-scale value for each dinucleotide is a representation of the average overall deformability for that dinucleotide by sequence specific binding proteins.
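The count of 10 unique dinucleotides follows from base pairing: a dinucleotide step read on one strand is the same physical step as its reverse complement read on the other strand. The following is a minimal Python sketch of that counting argument; the helper names are illustrative and are not taken from the original study.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def canonical_step(dinucleotide):
    """Choose one label for a step and its reverse complement (alphabetically first)."""
    return min(dinucleotide, reverse_complement(dinucleotide))

all_steps = ["".join(pair) for pair in product("ACGT", repeat=2)]
unique_steps = sorted({canonical_step(step) for step in all_steps})

print(len(all_steps), len(unique_steps))  # 16 10
print(unique_steps)
```

Four of the sixteen steps (AT, CG, GC, TA) are their own reverse complement, and the remaining twelve pair up into six, giving 4 + 6 = 10.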
In this study, we have demonstrated novel correlations between the chromosome 6 single base pair mutants' chromosome loss rate data obtained by Hegemann et al. [7], expressed in the log_{e} form, and the PD-scale differences for each single base mutant calculated from the difference between each mutant and the wild type DNA. The correlations were observed for both CDEI and CDEIII binding sites. We also examined each binding site position's specific 4 data-point correlation, created using the wild type sequence and the 3 associated alternate base mutants' chromosome loss data for that position. Slope magnitudes from these position-specific correlations were found to be symmetric about the center of symmetry of both the CDEI and CDEIII sites. The slopes quantitatively represent symmetric energetic/functional aspects of the assembled centromere in vivo and reflect the underlying symmetry of the palindromic DNA binding sequences. Monte Carlo statistical controls were performed to demonstrate the significance of the observed correlations.
Results and discussion
PD changes in mutant sequences correlate with measured chromosomal loss rates
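Table 1: Correlation and Monte Carlo data values

| PD Metric | Mutant Group | Data Points | R Value | Slope | Reg. P-Value | Monte Carlo Prob. (Better R-Value) | Monte Carlo Prob. (Steeper Slope) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Unsigned PD | CDE I Core | 24 | 54.6 | 0.186 | 0.0057 | 0.016 | 0.009 |
| Unsigned PD | CDE III Core | 21 | 54.7 | 0.427 | 0.0103 | 0.052 | 0.044 |
| Unsigned PD | CDE I & III | 45 | 50.1 | 0.347 | 0.0004 | 0.003 | 0.003 |
| Unsigned PD | ALL | 67 | 43.0 | 0.264 | 0.0003 | 0.020 | 0.007 |
| Unsigned PD | Non Core | 22 | 11.3 | 0.018 | 0.6100 | 0.280 | 0.460 |
| Signed PD | CDE I Core | 24 | 30.1 | 0.08 | 0.152 | 0.098 | 0.050 |
| Signed PD | CDE III Core | 21 | 68.0 | 0.37 | 0.0007 | 0.059 | 0.011 |
| Signed PD | CDE I & III | 45 | 22.9 | 0.11 | 0.130 | 0.228 | 0.282 |
| Signed PD | ALL | 67 | 30.1 | 0.12 | 0.013 | 0.083 | 0.065 |
| Signed PD | Non Core | 22 | 15.4 | 0.02 | 0.493 | 0.206 | 0.194 |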
It is worth noting that the intercepts of the regression lines from both these unsigned-PD CDEI and CDEIII core areas yield values very close to the wild type chromosome 6 loss rate point (marked as 'W' in Figure 3A). The wild type point was not included in the determination of the correlations and slopes. We believe that the closeness of the intercepts to the true wild type chromosomal loss rate value, along with the magnitudes of the correlations, suggests that the PD-scale is an appropriate and valid method for representing mutant DNA sequence differences that bring about consequent energetic and functional changes for this DNA-protein system.
When we simply compare the raw signed-PD data values associated with the two core area mutant groups, the differences are quite noticeable. The wild type average dinucleotide PD value for each site is about the same, but the average signed-PD value for all mutants is about -3 for CDEI vs +1 for CDEIII. This is explained by the fact that CDEIII positions often have 2 of the 3 mutants introducing a YR hinge, while many of the CDEI positions have 2 of 3 mutants eliminating YR hinges. Furthermore, the CDEIII position 14 and 15 mutants tend to eliminate the center YR hinge, and therefore most have a high net loss of deformability. These mutants also have the highest loss rates, suggesting the biological importance of this YR hinge to CDEIII-protein binding and correct centromere function. Since just the opposite is true of most of the other CDEIII position mutants, this partly explains the exceptionally high signed-PD correlation value in Table 1. In other words, the poly-Y, poly-R CDEIII site creates a tendency for higher correlation values in the signed-PD case. This is reflected in the Monte Carlo probability of 0.059 of obtaining a higher correlation than the actual value, and it leaves the general significance of the high signed-PD correlation debatable. Indeed, the other signed-PD results presented in Table 1 showed much lower correlations. However, in contrast to the unsigned-PD results, there is a clear tendency for these correlations to be negative. The only exception is the non-core group, which has the only positive slope and, as in the unsigned-PD case, has regression values indicating that the correlation is not significant. Therefore, although the statistical support is somewhat weak, these data suggest that, in the signed-PD metric, a net lowering of energetic deformability correlates with a higher chromosomal loss rate.
The slopes of these regressions are a representation of the rate of change in log_{e} (chromosome loss rate) per unit PD-scale change. In other words, they measure the binding site functional sensitivity to deformability changes averaged over the entire site for which mutants were generated and studied. The ratios of CDEIII to CDEI slopes, 2.3 for unsigned-PD and 4.7 for signed-PD, echo the greater sensitivity of CDEIII compared to CDEI seen in the raw chromosome loss rates. Besides the raw loss rates of the Hegemann et al. [7] study, many other experimental studies are in agreement with these slope ratios or binding site sensitivities, clearly indicating the greater importance of the CDEIII site compared to the CDEI site for correct centromere functioning [2, 6, 7]. This way of viewing the measured chromosomal loss rate data for different DNA sites provides an underlying structural/energetic basis for understanding them and calculating their magnitudes in terms of DNA deformation produced at those sites.
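Because the regressions are fit to log_{e} (chromosome loss rate), a slope can also be read as a multiplicative change in loss rate per PD-scale unit. The short Python sketch below uses only the two unsigned-PD core slopes listed in Table 1 to illustrate that reading and to recompute the quoted slope ratio; the function and variable names are illustrative.

```python
import math

# Unsigned-PD regression slopes for the two core sites, copied from Table 1.
UNSIGNED_SLOPES = {"CDEI core": 0.186, "CDEIII core": 0.427}

def fold_change_per_pd_unit(slope):
    """A slope in log_e(loss rate) per PD unit corresponds to a factor e**slope in loss rate."""
    return math.exp(slope)

for site, slope in UNSIGNED_SLOPES.items():
    factor = fold_change_per_pd_unit(slope)
    print(f"{site}: slope {slope} -> loss rate x{factor:.2f} per PD-scale unit")

slope_ratio = UNSIGNED_SLOPES["CDEIII core"] / UNSIGNED_SLOPES["CDEI core"]
print(f"CDEIII / CDEI slope ratio: {slope_ratio:.1f}")
```

This reading assumes the fitted linear relation holds over the PD range of the mutants; it is an interpretation aid, not an additional result.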
It is very clear from the group correlation values that the correlations are much stronger in the known conserved core protein binding centers as compared to strains with mutations in the less critical flanking areas, i.e. the non-core group. Although the mutant data are incomplete in these areas, the loss rates are varied and there are a few significant loss values. The lack of correlation here, in contrast to the core binding areas, may suggest a less critical role played by energetic deformability changes in these non-core DNA regions contributing to accurate biological function.
Dual variable regression models, using both the signed-PD and unsigned-PD, also reflect the overall weakness of the signed-PD correlations. Analysis of variance reveals that none of the signed-PD variable parameters are significant as an additional parameter except for the CDEIII group. There, the signed-PD parameter is actually more significant than the unsigned-PD, but both are very statistically significant. This last correlation has an R-value of 82.4; a P-value of 0.0027 reflects the significance of the unsigned-PD parameter added as a second parameter.
Monte Carlo simulation distributions demonstrate significance of delta PD sum correlations
The Monte Carlo analysis (see methods) was done both as a statistical cross-check of the reported regression significance and to reveal trends or bias inherent in the data set. Dramatic differences between regression P-values and Monte Carlo P-values are an indicator of bias and call into question the validity of the linear regression model assumptions. The Monte Carlo P-values are shown for both correlation and slope in Table 1 and generally have the same relative trends as the regression P-values. However, as noted above, we see a stark contrast in P-values in the CDEIII signed case: 0.0007 vs 0.059, indicating significant data set bias not reflected in the linear regression values. Notably, from a pure statistical viewpoint, the best fit is the combined core area (CDEI & CDEIII) group using the unsigned-PD metric. The regression P-value of 0.0004 and associated Monte Carlo value of 0.003 suggest the applicability of this approach to known sequence specific binding areas.
The superimposed normal curve created from the mean and standard deviation of each dataset in panels A–D reveals that the CDEI distribution is closer to a normal form than the CDEIII and the unsigned-PD distributions are closer to normal than the signed-PD. Therefore, the least amount of bias attributed to the PD-scale is found in the unsigned-PD CDEI case. It is striking that both CDEIII distributions show a tendency toward a bimodal form with the signed-PD version having a more pronounced effect and a significant negative bias. The exact shapes of these distributions are a complex interplay between the mutant loss rate pattern and the overlaps of the PD-scale delta sum calculations.
Double mutant groups
Three small double mutant groups were also tested [4, 8] where a primary mutation is held constant and a secondary mutation is created. These groups can be evaluated as previously described by using the PD-scale delta sums corresponding to the secondary areas. The first group was the CDEIII position 15 base 'T' (15T) group, whose second mutation was a changed base in CDEI position 3, 4, or 5. The group consisted of 8 of the possible 9 CDEI mutant variants. The 15T group had no correlation using unsigned-PD and a signed-PD correlation value of 62.3 with a positive slope of 0.0262. The signed-PD correlation was of poor quality; the regression P-value was 0.099. The second group was the CDEIII position 18 base 'A' (18A) group and consisted of the three possible CDEIII position 15 secondary mutations. Here we simply note that this group's 4 point unsigned position specific plot is a bit better than the single mutant version and maintains the general correlation trends previously discussed. The last group consists of 2 CDEIII double mutants which also follow previously discussed unsigned-PD correlation trends. Although the first group results are contrary to the previous findings, it should be noted that this group comprises only 8 data points and includes a high loss rate primary mutant. Use of the latter mutant forces an alternate loss rate assessment method to be used and involves mutations in both binding sites, which may introduce additional energetic factors in the chromosome loss rates involving changes in protein-protein interactions. Thus, these limited double mutant data do not provide a clear test of the utility of the PD-scale representation.
Position specific binding site sensitivities possess symmetric features mirroring the binding site sequence
The overall position specific symmetry for both core sites can be described with one number by using the six pair relative closeness sum (see methods). The value can range from 0 to 6; the lower the number, the more symmetric the result. For the unsigned-PD representation, this sum is 1.02, indicating a highly symmetric set of slope values for the combined core sites. Monte Carlo techniques using random PD-scale values as described in methods were used to count the number of values less than the real value, giving a probability of obtaining better overall symmetric patterns than the actual. It is understood that these unsigned-PD 4 point regressions have a natural tendency to produce significant positive correlations and, even worse, the slope magnitudes would tend to reflect the combined loss rates of the 3 mutants. Yet despite this, the computed Monte Carlo probability for a more symmetric value is a relatively low 0.012 and this is conservative, as it allows for a fairly wide range of individual pair 'relative closeness' values. It drops to 0.007 if all six pair values are required to be no worse than the worst actual single pair value. Therefore, we see the single site position changes in log_{e} (chromosomal loss rate) with respect to sensitivity to energetic deformability significantly reflecting the interactive protein-DNA symmetry of these systems.
The same symmetry analysis using the corresponding values for the signed-PD yields much poorer results than for unsigned-PD. The individual position specific slopes are less significant, and the overall symmetry is much worse; the six pair symmetry value is 2.59. Therefore, from a number of results we have presented, the unsigned-PD is the more useful representation, producing significant correlations of the log_{e} (chromosome loss rates) dependence on changes in the binding site deformability for both CDEI and CDEIII.
Applicability to other systems
It is clear from the mutant loss rates in Figure 1 that the CEN6 wild type CDEI & CDEIII sequences possess superior levels of function compared to all the single base pair mutant sequences. In order to carry out their function, these native sequences must fulfill multiple energetic requirements that are dynamic in nature. The centromere formation process involves many proteins, some binding to both DNA and other proteins. Then, the resulting DNA-multiprotein complex must sustain the energetic forces of the chromosome separation process. Since the DNA dinucleotide based PD-scale represents the average deformability energetics expressed by dinucleotides involved in sequence specific protein binding, it is not that surprising to find general correlations between the magnitude of the energetic perturbations associated with the mutants and their resulting loss of function. While we have found that the PD sensitivity is roughly doubled in the CDEIII site, the unsigned PD regression combining all 45 mutants in these two core sequences with clearly different PD-scale signatures is still very significant. The low regression P-value implies the effects of PD change are relatively close in magnitude and highlights the general applicability of the PD-scale to this system. Our study (data not shown) using the Cbf1p/CDEI binding constants [15] shows correlation at selected DNA site positions. However, using this simple approach with all preserved site positions does not provide an overall significant level of correlation. Currently, we do not understand the reason for this. Certainly the binding constants for Cbf1p/CDEI complexes are quantities of far simpler physicochemical systems than are the DNA-multiprotein complexes involved in the in vivo chromosome segregation functional data we analyzed in this study.
We do not yet know whether the types of correlations observed here will be a general type of behavior found in different DNA-protein systems where functional or energetic measurements of complexes possessing single base mutations have been carried out. We are interested in determining whether the PD change effects for other completely different DNA-protein systems exhibit correlations and also whether they are roughly linear as we observed here.
Conclusion
The PD-scale represents DNA dinucleotide based deformability energetics by sequence specific protein binding. We have investigated the use of calculating changes in the PD value of single base mutant sequences relative to the wild type sequence to demonstrate correlations with measured chromosome loss rates for these single base mutants within the core CDEI and CDEIII protein binding sites in the centromere of yeast chromosome 6. We have produced novel results that lead us to the following conclusions. The greater the magnitude of change in energetic deformability of a given mutant, the greater is its measured chromosome loss rate. Generally this is linear for the rate of log_{e} (chromosome loss rate) increase per PD-scale unit change, with the value for CDEIII being over twice that of CDEI. This higher deformation sensitivity reflects the conclusions from the data presented in Table 1 and agrees with numerous experimental studies, indicating that CDEIII is more critical to correct chromosome segregation than is CDEI. A net decrease in energetic deformability tends to correlate with a higher loss rate. This is strongly expressed in the CDEIII site. The position specific site sensitivities, or slope magnitudes, reflect the underlying sequence symmetries of these two sites. Taken together, these data suggest that the PD-scale representation of the deformability energetics of a DNA sequence is an important simple attribute of the DNA sequence that could be used in future studies to quantitate and understand the functional consequence of alterations in a DNA recognition sequence upon interaction with its sequence specific recognition protein.
Methods
We used the PD-scale of Olson et al. [10] in the following way. The single base change in a mutant causes two adjacent dinucleotide PD-values to differ from the wild type (Figure 2). Let delta1 and delta2 be these differences, calculated by Mutant – Wild Type. Then delta1 + delta2 is the signed PD-scale delta sum (signed-PD), and is the net directional change in the PD scale attribute of the mutant with respect to the wild type. Similarly, |delta1| + |delta2| is the unsigned PD-scale delta sum (unsigned-PD), and is the magnitude of the PD scale change. Each mutant's PD-scale delta sum is paired with the natural log of the chromosome loss rate, forming a data point in our plots. The PD-scale delta sums and the log_{e} (chromosomal loss rates) are used as the predictor and response values, respectively, for correlations using various groupings of these data points. Two types of groupings were used: larger groups of mutant data points without inclusion of the wild type data point, and position specific, 4 data-point groups, comprised of the 3 associated alternate base mutant data points for a single position along with the wild type data point.
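The delta sum calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the scale values are placeholders (simply 1 to 10) and are NOT the published PD-scale numbers, the 6 bp sequence is a toy example rather than a CDE site, the unsigned sum is taken as the sum of the absolute values of the two deltas, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# PLACEHOLDER values for the 10 unique dinucleotide steps.
# These are NOT the published PD-scale numbers; substitute the real scale before use.
PLACEHOLDER_PD = {"AA": 1.0, "AC": 2.0, "AG": 3.0, "AT": 4.0, "CA": 5.0,
                  "CC": 6.0, "CG": 7.0, "GA": 8.0, "GC": 9.0, "TA": 10.0}

def pd_value(step, scale):
    """Look up a dinucleotide step, treating it and its reverse complement as one step."""
    reverse_comp = "".join(COMPLEMENT[base] for base in reversed(step))
    return scale[min(step, reverse_comp)]

def pd_delta_sums(wild_type, position, new_base, scale):
    """Return (signed_pd, unsigned_pd) for one substitution at a 0-based interior position."""
    if not 0 < position < len(wild_type) - 1:
        raise ValueError("position must have a neighbor on both sides")
    mutant = wild_type[:position] + new_base + wild_type[position + 1:]
    # The changed base belongs to exactly two overlapping dinucleotide steps.
    deltas = [pd_value(mutant[start:start + 2], scale)
              - pd_value(wild_type[start:start + 2], scale)
              for start in (position - 1, position)]
    delta1, delta2 = deltas
    return delta1 + delta2, abs(delta1) + abs(delta2)

# Toy example: mutate the G at index 2 of ACGTAC to A.
signed_pd, unsigned_pd = pd_delta_sums("ACGTAC", 2, "A", PLACEHOLDER_PD)
print(signed_pd, unsigned_pd)
```

In this toy case the two deltas have opposite signs and cancel in the signed sum while still contributing to the unsigned sum, which is why the two metrics can rank the same mutants differently.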
All of the correlations used standard linear techniques with a constant term. For each group, two separate regressions were performed: one using unsigned-PD and one using signed-PD. Additionally, 2 variable linear regressions were done for the larger groups using both signed-PD and unsigned-PD as predictor variables. While P-values for all parameters in the single and dual variable regressions were produced, control Monte Carlo based P-values were also calculated. This Monte Carlo method first randomly reassigned the 10 PD-scale values to the 10 unique dinucleotides and then performed the same correlations done with the real PD-scale values. This randomization process was repeated 100,000 times, producing a distribution of R-values and, independently, slope values for each specific real-value regression performed. From these distributions, the probabilities of encountering a better correlation or a steeper slope than the actual values were obtained.
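The randomization control described above can be sketched in Python as below. This is a hedged illustration, not the authors' MATLAB code: the scale values and loss rates are made-up toy numbers, each mutant is represented by its two (wild type step, mutant step) pairs, the unsigned sum is assumed to be the sum of absolute deltas, and "better" or "steeper" is assumed to mean larger in absolute value.

```python
import math
import random

def r_and_slope(x, y):
    """Pearson correlation and least-squares slope of y on x (model with a constant term)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # degenerate predictor or response: counted as no correlation (assumption)
    return sxy / math.sqrt(sxx * syy), sxy / sxx

def unsigned_pd(mutant, scale):
    """Sum of absolute PD changes over the two dinucleotide steps altered by a mutant."""
    (wt1, mut1), (wt2, mut2) = mutant
    return abs(scale[mut1] - scale[wt1]) + abs(scale[mut2] - scale[wt2])

def monte_carlo_probabilities(mutants, log_loss, scale, trials=100_000, seed=0):
    """Fraction of shuffled scales giving a larger |R| or |slope| than the real scale."""
    rng = random.Random(seed)
    steps = list(scale)
    values = [scale[step] for step in steps]
    real_r, real_slope = r_and_slope([unsigned_pd(m, scale) for m in mutants], log_loss)
    better_r = steeper = 0
    for _ in range(trials):
        rng.shuffle(values)  # randomly reassign the 10 values to the 10 steps
        shuffled = dict(zip(steps, values))
        r, slope = r_and_slope([unsigned_pd(m, shuffled) for m in mutants], log_loss)
        better_r += abs(r) > abs(real_r)
        steeper += abs(slope) > abs(real_slope)
    return better_r / trials, steeper / trials

# TOY inputs only: placeholder scale and invented log_e(loss rate) values.
TOY_SCALE = {"AA": 1.0, "AC": 2.0, "AG": 3.0, "AT": 4.0, "CA": 5.0,
             "CC": 6.0, "CG": 7.0, "GA": 8.0, "GC": 9.0, "TA": 10.0}
TOY_MUTANTS = [(("CG", "AG"), ("AC", "AA")), (("CG", "CA"), ("AC", "AT")),
               (("CG", "CC"), ("AC", "AG")), (("AC", "GA"), ("TA", "AA")),
               (("AC", "GC"), ("TA", "CA")), (("AC", "CC"), ("TA", "GA"))]
TOY_LOG_LOSS = [-7.0, -8.5, -6.2, -9.0, -7.7, -6.9]

p_better_r, p_steeper = monte_carlo_probabilities(TOY_MUTANTS, TOY_LOG_LOSS, TOY_SCALE, trials=2000)
print(p_better_r, p_steeper)
```

Shuffling the scale values rather than the loss rates keeps the overlap structure of the delta sums intact, so the resulting distribution reflects bias contributed by the sequence context as well as chance.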
The position specific data are mainly only applicable to the core palindromic areas in CDEI & CDEIII, where all 3 mutants at the following positions were tested: CDEI 1–8 and CDEIII 11–17. The patterns of these regression values, relative to the center of symmetry, were the focus of our interest. Numerically, this symmetry is expressed by finding pairs of slope values close in magnitude. For CDEI, the symmetric position pairs are (5,6), (4,7), and (3,8); for CDEIII they are (13,15), (12,16), and (11,17). We expressed the relative closeness of two values, v1 and v2, as |v1 − v2|/(v1 + v2), allowing for a measure of the symmetry pattern by summing these 'relative closeness' values for all six pairs of symmetric positions. The value can range from 0 to 6; the lower the number, the more symmetric the result. Then, Monte Carlo techniques using random PD-scale values as described above were used to count the number of values less than the real value, giving a probability of obtaining better overall symmetric patterns than the actual.
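The six pair relative closeness sum can be sketched in Python as follows. The slope values below are invented for illustration only, the relative closeness is assumed to be |v1 − v2|/(v1 + v2) applied to slope magnitudes, and the function names are hypothetical.

```python
# Symmetric position pairs for CDEI and CDEIII as listed in the text.
SYMMETRIC_PAIRS = [(5, 6), (4, 7), (3, 8), (13, 15), (12, 16), (11, 17)]

def relative_closeness(v1, v2):
    """0 when two non-negative values are equal, 1 when one of them is zero."""
    total = v1 + v2
    return abs(v1 - v2) / total if total else 0.0

def six_pair_sum(slope_by_position):
    """Sum the relative closeness of slope magnitudes over the six symmetric pairs."""
    return sum(relative_closeness(abs(slope_by_position[a]), abs(slope_by_position[b]))
               for a, b in SYMMETRIC_PAIRS)

# Invented slopes: a perfectly mirrored pattern and a maximally lopsided one.
mirrored = {3: 0.2, 8: 0.2, 4: 0.5, 7: 0.5, 5: 1.0, 6: 1.0,
            11: 0.3, 17: 0.3, 12: 0.6, 16: 0.6, 13: 0.9, 15: 0.9}
lopsided = {3: 0.2, 8: 0.0, 4: 0.5, 7: 0.0, 5: 1.0, 6: 0.0,
            11: 0.3, 17: 0.0, 12: 0.6, 16: 0.0, 13: 0.9, 15: 0.0}

print(six_pair_sum(mirrored), six_pair_sum(lopsided))
```

The two toy patterns mark the ends of the 0 to 6 range, which gives a sense of scale for the sums of 1.02 and 2.59 reported for the unsigned-PD and signed-PD representations.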
The Mathworks' MATLAB^{®} version 7.0.1 programming software and statistics toolbox were used to calculate regressions and results.
Abbreviations
 Y: DNA pyrimidine base
 R: DNA purine base
 YR: DNA pyrimidine-purine dinucleotide
Declarations
Acknowledgements
The authors acknowledge financial support from a CFCI Seed Grant at UML.
References
 1. Westermann S, Cheeseman IM, Anderson S, Yates JR, Drubin DG, Barnes G: Architecture of the budding yeast kinetochore reveals a conserved molecular core. J Cell Biol. 2003, 163: 215–222. 10.1083/jcb.200305100
 2. Cheeseman IM, Drubin DG, Barnes G: Simple centromere, complex kinetochore: linking spindle microtubules and centromeric DNA in budding yeast. J Cell Biol. 2002, 157: 199–203. 10.1083/jcb.200201052
 3. Hemmerich P, Stoyan T, Wieland G, Koch M, Lechner J, Diekmann S: Interaction of yeast kinetochore proteins with centromere-protein/transcription factor Cbf1. Proc Natl Acad Sci. 2000, 97: 12583–12588. 10.1073/pnas.97.23.12583
 4. Niedenthal R, Stoll R, Hegemann JH: In vivo characterization of the Saccharomyces cerevisiae centromere DNA element I, a binding site for the helix-loop-helix protein CPF1. Mol Cell Biol. 1991, 11: 3545–3553.
 5. Meluh PB, Koshland D: Budding yeast centromere composition and assembly as revealed by in vivo crosslinking. Genes Dev. 1997, 11: 3401–3412.
 6. McAinsh AD, Tytell JD, Sorger PK: Structure, function, and regulation of budding yeast kinetochores. Annu Rev Cell Dev Biol. 2003, 19: 519–539. 10.1146/annurev.cellbio.19.111301.155607
 7. Hegemann JH, Fleig UN: The centromere of budding yeast. BioEssays. 1993, 15: 451–460.
 8. Jehn B, Niedenthal R, Hegemann JH: In vivo analysis of the Saccharomyces cerevisiae centromere CDEIII sequence: requirements for mitotic chromosome segregation. Mol Cell Biol. 1991, 11: 5212–5221.
 9. Gromiha MM, Siebers JG, Selvaraj S, Kono H, Sarai A: Intermolecular and Intramolecular Readout Mechanisms in Protein-DNA Recognition. J Mol Biol. 2004, 337: 285–294. 10.1016/j.jmb.2004.01.033
 10. Olson WK, Gorin AA, Lu X, Hock LM, Zhurkin VB: DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci. 1998, 95: 11163–11168. 10.1073/pnas.95.19.11163
 11. Olson WK, Zhurkin VB: Modeling DNA deformations. Curr Opin Struct Biol. 2000, 10: 286–297. 10.1016/S0959-440X(00)00086-5
 12. Liu G, Geurts AM, Yae K, Srinivasan AR, Fahrenkrug SC, Largaespada DA, Takeda J, Horie K, Olson WK, Hackett PB: Target-site preferences of Sleeping Beauty transposons. J Mol Biol. 2005, 346: 161–173. 10.1016/j.jmb.2004.09.086
 13. Sierk ML, Zhao Q, Rastinejad F: DNA Deformability as a Recognition Feature in the RevErb Response Element. Biochemistry. 2001, 40: 12833–12843. 10.1021/bi011086r
 14. Espelin CW, Kaplan KB, Sorger PK: Probing the architecture of a simple kinetochore using DNA-protein crosslinking. J Cell Biol. 1997, 139: 1383–1396. 10.1083/jcb.139.6.1383
 15. Wieland G, Hemmerich P, Koch M, Stoyan T, Hegemann J, Diekmann S: Determination of the binding constants of the centromere protein Cbf1 to all 16 centromere DNAs of Saccharomyces cerevisiae. Nucleic Acids Res. 2001, 29: 1054–1060. 10.1093/nar/29.5.1054
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.