Prediction of highly expressed genes in microbes based on chromatin accessibility
© Willenbrock and Ussery; licensee BioMed Central Ltd. 2007
Received: 25 October 2006
Accepted: 13 February 2007
Published: 13 February 2007
It is well known that gene expression is dependent on chromatin structure in eukaryotes and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values, and also with experimental data for 6 different microbial genomes, with a particular interest in experimental data from Escherichia coli. Moreover, position preference is examined further in 328 sequenced microbial genomes.
We find that absolute gene expression levels are correlated with the position preference in many microbial genomes. It is postulated that in these regions, the DNA may be more accessible to the transcriptional machinery. Moreover, ribosomal proteins and ribosomal RNA are encoded by DNA having significantly lower position preference values than other genes in fast-replicating microbes.
This insight into DNA structure-dependent gene expression in microbes may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches.
Transcription of DNA is highly influenced by DNA bending and flexibility. These structural properties depend on the base sequence, which in turn reflects, or may influence, the codon usage, itself important in determining the relative expression of a given gene. The prediction of highly expressed genes and the elucidation of their physical and biological properties have been addressed by a number of studies [2–4].
The translational 'codon adaptation index' (CAI) is highly correlated with the expression level in fast growing bacteria. It is based on the finding that highly expressed genes almost exclusively use the codons corresponding to abundant tRNAs in Escherichia coli and budding yeast. Consequently, for any sequenced bacterial genome, a codon bias signature can be deduced that is most likely to be efficient for translation. This bias is used to derive codon adaptation indices for all genes of a given organism, where high CAI values correspond to genes most likely to be highly expressed.
However, because CAI is based on codon usage bias, it can only predict highly expressed proteins (translated genes). It cannot be applied to tRNAs, ribosomal RNAs, and other non-coding RNAs. Moreover, for organisms with low translational bias, typically slow growing organisms, CAI is a less effective predictor of highly expressed genes. Furthermore, effective usage of CAI requires the identification of a representative subset of highly expressed genes in an organism on which the codon bias is based. While relatively good subsets may be found by simple BLAST searches for organisms closely related to well-characterized model organisms such as yeast and E. coli, this is more difficult for more distant microbes such as archaea.
On a more global scale, gene expression may be regulated from specific promoters that are sensitive to DNA superhelicity. That is, supercoiling may regulate gene expression at a genome-wide level [8, 9]. In this way, an organism may react rapidly to changes in growth and nutritional states as well as environmental conditions, since DNA superhelicity varies with the cellular energy charge, which, for example, differs in log phase versus stationary phase or is influenced by environmental factors such as temperature or osmotic stress. Such structural elements appear to be clustered around the chromosome in so-called topological domains [8, 11, 12].
The 'position preference' measure is a DNA structural measure that was originally derived for eukaryotes using chicken DNA and is a trinucleotide model of nucleosome positioning patterns. It reflects the preference of a given trinucleotide for being found in a region where the DNA minor groove faces either towards or away from the nucleosome histone core. Here, we use a minor modification of the original nucleosomal positioning trinucleotide scale in which absolute values reflect the magnitude of position preference. Thus, high absolute position preference reflects a high preference for nucleosomes, while low absolute position preference reflects trinucleotides that tend to exclude nucleosomes. At first sight, this only makes sense in eukaryotes, since prokaryotes do not have nucleosomes. However, prokaryotes also have chromatin, and the DNA is compacted to similar levels (i.e., more than 1000x) in both prokaryotes and eukaryotes. The position preference value is also a measure of the anisotropic DNA flexibility of certain trinucleotides, which can either favor nucleosome positioning ("high position preference") or tend not to be found in sequences wrapped around nucleosomes. Consequently, the 'position preference' measure also describes a more general structural property of DNA, namely how easily it can be wrapped around chromatin proteins. Accordingly, position preference has been used previously to show structural characteristics in prokaryotic genomes [14, 15]. For example, a cluster analysis of various structural properties, including position preference, identified groups of genes that contained all the ribosomal RNAs and a majority of the ribosomal proteins from Escherichia coli. These genes were characterized by higher than average DNaseI sensitivity and low position preference, indicating regions of DNA not easily condensed by chromatin. Since the ribosomal genes are among the most highly expressed in actively dividing E. coli cells, it was hypothesized that their common structural features may play a role in regulating expression and that there exists a correlation between low position preference values and highly expressed genes. This makes sense because regions of DNA that are not condensed into chromatin are more accessible to the RNA polymerase. Consequently, transcription is thought to be governed by 'effective' superhelicity, where topoisomerases, the transcription machinery and chromatin proteins compete for available supercoils.
Here, we use the position preference (PP) measure for the prediction of highly expressed genes in 6 sequenced microbial genomes with a particular interest in the model organism E. coli. The predictions are compared to experimental data as well as predictions by CAI and we thereby demonstrate that the position preference measure is a useful measure for prediction of highly expressed non-translated genes. We have extended this analysis by examining position preference values of genes in 328 sequenced microbial genomes. By characterizing the functional categories of genes predicted to be highly expressed, we find that these categories are independent of phylogeny but rather reflect the ecology of the organism, such as pathogens or extremophiles.
Results and discussion
Whole genome E. coli Atlas
Ribosomal proteins and non-translated RNA
Ribosomal proteins are often highly expressed and demonstrate high codon usage bias in terms of high CAI values, at least in fast replicating microbes. Consequently, for these microbes, we expect a similar correlation between low position preference and high gene expression level.
Although highly expressed ribosomal protein and non-translated RNA genes tended to have low position preference, especially in fast replicating organisms, this does not imply that position preference, like CAI, predicts highly expressed genes only in fast replicating organisms. On the contrary, position preference might provide an alternative measure for predicting highly expressed genes in slow replicating bacteria, in which ribosomal proteins and RNA genes are not always highly expressed and the CAI measure is therefore less effective.
The above results are somewhat in contrast to the findings of Segal and coworkers, who recently published a more refined model for nucleosomal positioning based on a combined experimental and computational approach. Although this model predicts a nucleosome pattern strikingly similar to that of the model used in our study [13, 22], at least for eukaryotes, they did not find nucleosome depletion at ribosomal protein sites in yeast. Consequently, they predicted high nucleosome occupancy encoded over these genes and reasoned that the expression of these genes is governed by other factors. However, although we only predict a slightly lower than average position preference for yeast ribosomal proteins, we find that the general trend observed across a large range of microbial genomes is that both DNA encoding ribosomal proteins and non-coding genes have lower position preference than the genomic average (Figure 2). This points to a possible regulation of ribosomal proteins by DNA structural properties.
Position preference versus CAI
The position preference measure used in this study is based on the experimentally determined preference demonstrated by individual trinucleotides to be positioned in a specific orientation in nucleosomal DNA. Consequently, the position preference score assigned to any given triplet is the same for all organisms, and the gene average depends only on the specific sequence of a gene, whereas the CAI score for a gene depends on both the sequence and the translational codon bias of the specific organism to which the gene belongs. Correspondingly, we only found small correlations or anti-correlations between CAI triplet weights and position preference triplet scores for a few organisms, none of which were significant (multiple testing corrected P-values = 1). Moreover, the correlation between CAI weights and position preference triplet values did not increase for fast replicating bacteria (P-value = 0.532), indicating that position preference may be a useful supplement for predicting highly expressed non-translated genes even in slow-growing microbes. In addition, because rRNAs, tRNAs, and other non-coding RNAs tend to have lower position preference than the genomic average, the position preference measure could be useful for identifying these genes in pre-annotated DNA sequences, in particular because position preference can be estimated at the DNA level and therefore does not require prior knowledge of gene coordinates.
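The effect of multiple testing correction on the per-organism P-values can be illustrated with a minimal sketch. It assumes a Bonferroni adjustment (the reference list includes Bonferroni's 1936 work, but the text does not state the exact procedure), and the function name `bonferroni` and the example P-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a list of raw P-values (assumed correction method).

    Each P-value is multiplied by the number of tests and capped at 1.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical raw P-values for three organisms, not taken from the study.
    print(bonferroni([0.01, 0.2, 0.5]))
```

With several hundred genomes tested, even moderately small raw P-values are capped at 1 after such a correction, which is consistent with the corrected P-values of 1 reported above.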
Prediction of highly expressed genes
From Figure 1, we would expect a correlation between low position preference and high gene expression level. However, a complete separation of highly expressed genes from the other genes was not possible using the position preference measure (for example, see Figure 2B). This is hardly surprising since no structural or coding property singularly determines the level of gene expression, for which a large number of regulatory steps are involved. Consequently, the level of separation may reflect the influence of each measure on gene expression. For the five additional microbial genomes where we had experimentally determined expression values, a clear difference was also observed between the distributions of CAI or position preference values for highly expressed genes and low expressed genes. For details, refer to supplementary Table S1 [Additional file 1].
Table: Predicted highly expressed non-translated E. coli genes by the position preference measure. [Table body not recoverable; surviving column header: gene expression rank. Surviving functional annotations: 'Ribosomal and stable RNAs' (eleven entries) and 'RNA; Cell division' (one entry).]
Functional categories of genes with low position preference
It is clear that the clustering brings together organisms that are relatively distant phylogenetically (Figure 4; the color bar on the right side represents the taxonomic phylum of each genome). As opposed to the apparent clustering according to similar environments found based on CAI, in the present analysis the ordering appeared related to the functionality of the microbe, i.e. pathogen versus non-pathogen. For example, the COG category 'replication, recombination and repair' is particularly overrepresented amongst genes with low position preference for a distinct cluster at the top of Figure 4, consisting of extremophilic archaea and bacteria as well as pathogenic bacteria (mainly Yersinia pestis and Shigella strains). The common feature of these organisms is that genes involved in replication, recombination and repair have very low position preference (and consequently are potentially highly expressed). Genes involved in recombination and repair in particular are essential for pathogens and microbes living under extreme conditions, making it reasonable for them to be highly expressed. Supporting this observation, we find that the same COG category is overrepresented for the pathogenic E. coli strains O157:H7 EDL933, O157:H7 RIMD0509952, CFT073 and UTI89, as well as for most Shigella strains, which are essentially pathogenic E. coli, whereas it does not dominate for the non-pathogenic E. coli strains K-12 W3110 and K-12 MG1655. This provides us with a possible means for distinguishing pathogenic strains from non-pathogenic strains. An important caveat is that some pathogenic strains have important virulence genes expressed on plasmids, which were not considered in this study. The more direct approach to distinguishing pathogenic strains from non-pathogenic strains is to look for pathogenicity factors. However, the exact combination of virulence genes and pathogenicity factors necessary to make a strain pathogenic is still unknown and also depends on the expression level of these genes.
Finally, four fungi clustered closely with certain probiotic bacteria (Lactobacillus); it is interesting to note that these organisms can live in a similar ecological niche. Also, a few microbes contain genes with low position preference that are involved in carbohydrate transport and metabolism, especially the Streptococcus genomes found in the bottom cluster of Figure 4. Again, this might be reflective of their ecological niche.
The above analysis demonstrates that the overrepresented COG categories differ between microorganisms independently of phylogeny. Moreover, the differences in the occurrences within the 'translation, ribosomal structure and biogenesis' COG category may explain why the position preference measure was more effective in some organisms than in others according to Figure 3. Consequently, instead of the above speculation that position preference is a eukaryotic measure and therefore works better in S. cerevisiae than in bacteria, the very high representation of this COG category among genes with low position preference in S. cerevisiae could explain why position preference is a better predictor of gene expression levels in S. cerevisiae than in prokaryotes, in particular G. sulfurreducens, where this COG category is barely present among genes with low position preference.
We use a nucleosome position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes, and compare it to a translational codon adaptation index for synonymous codon usage bias of potentially highly expressed genes. We hereby demonstrate that absolute gene expression levels are highly correlated with low position preference in multiple microbial genomes. This newly gained insight into DNA structure-dependent gene expression may be exploited for predicting the expression of non-translated genes such as non-coding RNAs that may not be predicted by any of the conventional codon usage bias approaches, and we speculate that it may also be used for prediction of highly expressed genes in slow growing microbes, in which the CAI measure is less successful. Genes often encoded by DNA with low position preference values were mostly involved in 'translation, ribosomal structure and biogenesis', 'energy production and conversion', and transcription. For pathogens and microbes living in extreme environments, the predominant functional category was 'replication, recombination and repair'. In particular, pathogenic E. coli strains and most Shigella strains demonstrated this trait while non-pathogenic E. coli strains did not. This provides a likely signature for distinguishing some pathogenic strains from non-pathogens. This new insight into DNA structure-dependent gene expression in microbial genomes may aid in our understanding of gene expression regulation. It may also be used in developing a reliable predictor of gene expression in both prokaryotes and eukaryotes.
Translational Codon Adaptation Index (CAI)
The codon adaptation index describes a codon usage bias in an organism. Here, we use a translational codon adaptation index (CAI), in which a codon bias signature is deduced that is most likely to be efficient for translation. In short, this method is based on a known set of 27 very highly expressed E. coli genes for bacterial genomes, and a set of 39 very highly expressed yeast genes for eukaryotes. Both reference sets were identified based on protein expression. In order to identify a set of constitutively highly expressed genes for each of the bacterial genomes analyzed in this work, the reference set of very highly expressed E. coli or yeast genes is aligned at the protein level against all genes annotated in the Genbank entry for each genome using BLASTP version 2.2.9. For each of these very highly expressed genes, the gene with the best alignment was added to a set of very highly expressed genes if it had an E-value below 10⁻⁶, and these were used as a reference set for the given organism. Using each genome-specific reference set, a weight table including all codons is derived, indicating the most translationally efficient codons. In turn, these weights are used for calculating a CAI value for each gene. The higher the CAI score, the more likely a gene is to be highly expressed.
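The last two steps, deriving a codon weight table from a reference set and scoring each gene, can be sketched as follows. This is a minimal illustration that assumes the classic Sharp and Li formulation (relative adaptiveness of each codon within its synonymous family, and CAI as the geometric mean of the weights); the exact weighting used in this study is not spelled out here. The function names, the `pseudocount` parameter and the toy sequences are hypothetical.

```python
import math
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))
SYNONYMOUS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMOUS.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_weights(reference_genes, pseudocount=0.5):
    """Relative adaptiveness of each codon within its synonymous family.

    Codons never seen in the reference set get `pseudocount` (an assumption)
    so that no weight is zero. Stop codons and single-codon amino acids
    (Met, Trp) are left out because they carry no usage information.
    """
    counts = Counter(c for g in reference_genes for c in split_codons(g))
    weights = {}
    for aa, family in SYNONYMOUS.items():
        if aa == "*" or len(family) == 1:
            continue
        freqs = {c: max(counts[c], pseudocount) for c in family}
        top = max(freqs.values())
        for c, f in freqs.items():
            weights[c] = f / top
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    # Toy reference set, not the 27 E. coli genes used in the study.
    reference = ["AAAAAAAAAAAG", "GCTGCTGCTGCC"]
    w = codon_weights(reference)
    print(cai("AAAGCT", w), cai("AAGGCC", w))
```

A gene built only from the codons preferred in the reference set scores close to 1, while a gene using rarer synonymous codons scores lower.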
The position preference measure is a model of anisotropic DNA flexibility, which is derived experimentally from the preference demonstrated by individual trinucleotides to be positioned in a specific orientation in nucleosomal DNA. The values indicate the preference of triplets for being specifically positioned in nucleosomal DNA. High absolute values correspond to triplets with a strong preference for having minor grooves facing either towards or away from the nucleosome core, while triplets with close-to-zero preference can occupy any rotational position on the nucleosomal DNA, and are thus assumed to be flexible in one direction. Since the 'position preference' measure is based on a simple trinucleotide model, values are assigned to every nucleotide in the DNA sequence simply by looking up the value for the corresponding triplet in which the nucleotide is centered [1, 14, 15]. Here, the average over all possible triplets in a gene is used to calculate the position preference score for that gene.
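The per-gene score described above can be sketched as a sliding-window lookup. This is a minimal illustration only: the function name `gene_position_preference` is hypothetical, and `TOY_SCALE` contains made-up numbers for four triplets, not the published trinucleotide scale, which would have to be supplied in its place.

```python
# Made-up values for illustration only; NOT the published trinucleotide scale.
TOY_SCALE = {"AAA": -0.4, "AAC": 0.1, "ACG": 0.2, "CGT": -0.1}


def gene_position_preference(seq, scale):
    """Average absolute position preference over all overlapping triplets.

    Each triplet is looked up in `scale`; triplets missing from the scale
    (for example those containing N) are skipped.
    """
    seq = seq.upper()
    values = [
        abs(scale[seq[i:i + 3]])
        for i in range(len(seq) - 2)
        if seq[i:i + 3] in scale
    ]
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Triplets AAA, AAC, ACG, CGT give absolute values 0.4, 0.1, 0.2, 0.1.
    print(gene_position_preference("AAACGT", TOY_SCALE))
```

Because the lookup needs only the DNA sequence, the same function could be applied to arbitrary windows along a genome rather than to annotated genes.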
Assigning Cluster of Orthologous Genes (COGs)
The system for delineation of Clusters of Orthologous Groups of proteins (COGs) is based on orthologous relationships between genes; it is useful for comparative genomics and facilitates the functional annotation of genomes. Here, genes were assigned a COG category by AutoFACT, an automatic functional annotation tool that uses Blastx version 2.2.9 to blast open reading frames against a database of sequences with assigned COG categories available from NCBI. The following COG categories were not used due to their low relevance in microbial functional genomics: 'chromatin structure and dynamics' (B), 'nuclear structure' (N), 'cytoskeleton' (Z), and 'extracellular structures' (W). Also, the two categories of poorly characterized functions were neglected: 'general function prediction only' (R) and 'function unknown' (S).
Prediction of ribosomal proteins
Ribosomal proteins for each Genbank entry were predicted using profile Hidden Markov Models (HMMs) from Pfam, since the quality of the annotations available from the Genbank entries varies tremendously. Pfam_ls profile HMMs for all ribosomal proteins were extracted (94 as of July 24th 2006). Pfam_ls files contain all the Pfam models for finding global or complete matches to a domain or family.
Gene expression data
Microarray-based gene expression data were taken from Willenbrock et al., 2006. Briefly, the dataset comprised pre-processed gene expression data for E. coli, C. jejuni, P. aeruginosa, S. cerevisiae [31, 32], G. sulfurreducens, and B. subtilis. Additional microarray gene expression data for E. coli at different growth stages were taken from a previously published dataset, for which raw data were normalized with qspline and expression indices were estimated.
All DNA and protein sequence information was extracted from each of the 328 Genbank entries. For correlation estimates, we used Spearman's rank correlation to avoid any problems with possible deviations from normality in the compared data (e.g. log-normal distribution for microarray data). Cluster analysis was based on hierarchical clustering of Euclidean distances using complete linkage. For density plots, the bandwidths were chosen as the standard deviation of the Gaussian smoothing kernel.
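For readers unfamiliar with rank correlation, a minimal self-contained sketch is given below. It computes Spearman's coefficient as the Pearson correlation of average ranks (tied values share the mean of their positions); it does not compute P-values, and the function names and toy values are hypothetical rather than taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values: position preference falls as expression rises.
    position_preference = [0.30, 0.25, 0.20, 0.10]
    expression = [5.0, 40.0, 300.0, 9000.0]
    print(spearman(position_preference, expression))
```

Because only the ordering of the values matters, the log-normal spread of microarray intensities does not affect the coefficient, which is the motivation given above for choosing this statistic.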
Additional data are available at our website. This website contains an overview of the 328 microbial genomes included in this study, linked to estimated position preference values. Supplementary Figure S1 is a detailed version of the heatmap sketched in Figure 4, providing the full organism names of all included microbial genomes. Supplementary Tables S1 and S2 provide some statistics for the comparison of expression values with CAI and position preference.
CAI: codon adaptation index
COG: cluster of orthologous genes
HMM: hidden Markov model
This study was supported financially by The Danish Center for Scientific Computing.
- Baldi P, Brunak S, Chauvin Y, Krogh A: Naturally occurring nucleosome positioning signals in human exons and introns. J Mol Biol. 1996, 263 (4): 503-510. 10.1006/jmbi.1996.0592
- Raghava GP, Han JH: Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics. 2005, 6: 59. 10.1186/1471-2105-6-59
- Karlin S, Barnett MJ, Campbell AM, Fisher RF, Mrazek J: Predicting gene expression levels from codon biases in alpha-proteobacterial genomes. Proc Natl Acad Sci USA. 2003, 100 (12): 7313-7318. 10.1073/pnas.1232298100
- Sharp PM, Li WH: The codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281
- Carbone A, Kepes F, Zinovyev A: Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol. 2005, 22 (3): 547-561. 10.1093/molbev/msi040
- Willenbrock H, Friis C, Juncker AS, Ussery DW: An environmental signature for 323 microbial genomes based on codon adaptation indices. Genome Biol. 2006, 7 (12): R114. 10.1186/gb-2006-7-12-r114
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389
- Willenbrock H, Ussery DW: Chromatin architecture and gene expression in Escherichia coli. Genome Biol. 2004, 5 (12): 252. 10.1186/gb-2004-5-12-252
- Peter BJ, Arsuaga J, Breier AM, Khodursky AB, Brown PO, Cozzarelli NR: Genomic transcriptional response to loss of chromosomal supercoiling in Escherichia coli. Genome Biol. 2004, 5: R87. 10.1186/gb-2004-5-11-r87
- Hatfield GW, Benham CJ: DNA topology-mediated control of global gene expression in Escherichia coli. Annu Rev Genet. 2002, 36: 175-203. 10.1146/annurev.genet.36.032902.111815
- Jeong KS, Ahn J, Khodursky AB: Spatial patterns of transcriptional activity in the chromosome of Escherichia coli. Genome Biol. 2004, 5: R86. 10.1186/gb-2004-5-11-r86
- Postow L, Hardy CD, Arsuaga J, Cozzarelli NR: Topological domain structure of the Escherichia coli chromosome. Genes Dev. 2004, 18 (14): 1766-1779. 10.1101/gad.1207504
- Satchwell SC, Drew HR, Travers AA: Sequence periodicities in chicken nucleosome core DNA. J Mol Biol. 1986, 191 (4): 659-675. 10.1016/0022-2836(86)90452-3
- Pedersen AG, Baldi P, Chauvin Y, Brunak S: DNA structure in human RNA polymerase II promoters. J Mol Biol. 1998, 281 (4): 663-673. 10.1006/jmbi.1998.1972
- Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, Ussery DW: A DNA structural atlas for Escherichia coli. J Mol Biol. 2000, 299 (4): 907-930. 10.1006/jmbi.2000.3787
- Brukner I, Sanchez R, Suck D, Pongor S: Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J. 1995, 14 (8): 1812-1818.
- Dlakic M, Ussery D, Brunak S: DNA bendability and nucleosome positioning in transcriptional regulation. DNA Conformation in Transcription. 2004, Ohyama T: Landes Bioscience
- Blot N, Mavathur R, Geertz M, Travers A, Muskhelishvili G: Homeostatic regulation of supercoiling sensitivity coordinates transcription of the bacterial genome. EMBO Rep. 2006, 7 (7): 710-715. 10.1038/sj.embor.7400729
- Schembri MA, Ussery DW, Workman C, Hasman H, Klemm P: DNA microarray analysis of fim mutations in Escherichia coli. Mol Genet Genomics. 2002, 267 (6): 721-729. 10.1007/s00438-002-0705-2
- Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE: Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 2005, 33 (4): 1141-1153. 10.1093/nar/gki242
- Ussery DW, Hallin PF, Lagesen K, Coenye T: Genome update: rRNAs in sequenced microbial genomes. Microbiology. 2004, 150 (Pt 5): 1113-1115. 10.1099/mic.0.27173-0
- Segal E, Fondufe-Mittendorf Y, Chen L, Thastrom A, Field Y, Moore IK, Wang JP, Widom J: A genomic code for nucleosome positioning. Nature. 2006, 442 (7104): 772-778. 10.1038/nature04979
- Bonferroni CE: Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 1936, 8: 3-62.
- Sharp PM, Li WH: Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons. Nucleic Acids Res. 1986, 14 (19): 7737-7749. 10.1093/nar/14.19.7737
- Sharp PM, Tuohy TM, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986, 14 (13): 5125-5143. 10.1093/nar/14.13.5125
- Koski LB, Gray MW, Lang BF, Burger G: AutoFACT: an automatic functional annotation and classification tool. BMC Bioinformatics. 2005, 6: 151. 10.1186/1471-2105-6-151
- NCBI COG categories. ftp://ftp.ncbi.nih.gov/pub/COG/COG/
- Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO: Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004, 429 (6987): 92-96. 10.1038/nature02456
- Stintzi A, Whitworth L: Investigation of the Campylobacter jejuni Cold Shock response by global gene expression analysis. Journal of Genome Science and Technology. 2003, 2 (1/2): 18-27.
- Bulik DA, Olczak M, Lucero HA, Osmond BC, Robbins PW, Specht CA: Chitin synthesis in Saccharomyces cerevisiae in response to supplementation of growth medium with glucosamine and cell wall stress. Eukaryot Cell. 2003, 2 (5): 886-900. 10.1128/EC.2.5.886-900.2003
- Ronald J, Akey JM, Whittle J, Smith EN, Yvert G, Kruglyak L: Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res. 2005, 15 (2): 284-291. 10.1101/gr.2850605
- Methe BA, Webster J, Nevin K, Butler J, Lovley DR: DNA microarray analysis of nitrogen fixation and Fe(III) reduction in Geobacter sulfurreducens. Appl Environ Microbiol. 2005, 71 (5): 2530-2538. 10.1128/AEM.71.5.2530-2538.2005
- Helmann JD, Wu MF, Gaballa A, Kobel PA, Morshedi MM, Fawcett P, Paddon C: The global transcriptional response of Bacillus subtilis to peroxide stress is coordinated by three transcription factors. J Bacteriol. 2003, 185 (1): 243-253. 10.1128/JB.185.1.243-253.2003
- Tjaden B, Haynor DR, Stolyar S, Rosenow C, Kolker E: Identifying operons and untranslated regions of transcripts using Escherichia coli RNA expression analysis. Bioinformatics. 2002, 18 (Suppl 1): S337-344.
- Workman C, Jensen LJ, Jarmer H, Berka R, Gautier L, Nielsen HB, Saxild HH, Nielsen C, Brunak S, Knudsen S: A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 2002, 3 (9): research0048. 10.1186/gb-2002-3-9-research0048
- Li C, Wong W: Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol. 2001, 2 (8): 1-11.
- Best DJ, Roberts DE: Algorithm AS 89: The Upper Tail Probabilities of Spearman's rho. Applied Statistics. 1975, 24 (3): 377-379. 10.2307/2347111
- Supplementary material: Prediction of highly expressed genes in microbes based on chromatin accessibility. http://www.cbs.dtu.dk/~hanni/Chromatin/
- Fisher RA: On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society. 1922, 85 (1): 87-94. 10.2307/2340521
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.