Selection and evaluation of reference genes for improved interrogation of microbial transcriptomes: case study with the extremophile Acidithiobacillus ferrooxidans

Background Normalization is a prerequisite for accurate real time PCR (qPCR) expression analysis and for the validation of microarray profiling data in microbial systems. The choice and use of reference genes that are stably expressed across samples, experimental conditions and designs is a key consideration for the accurate interpretation of gene expression data. Results Here, we evaluate a carefully selected set of reference genes derived from previous microarray-based transcriptional profiling experiments performed on Acidithiobacillus ferrooxidans and identify a set of genes with minimal variability under five different experimental conditions that are frequently used in Acidithiobacilli research. Suitability of these and other previously reported reference genes to monitor the expression of four selected target genes from A. ferrooxidans grown with different energy sources was investigated. Utilization of reference genes map, rpoC, alaS and era results in improved interpretation of gene expression profiles in A. ferrooxidans. Conclusion This investigation provides a validated set of reference genes for studying A. ferrooxidans gene expression under typical biological conditions and an initial point of departure for exploring new experimental setups in this microorganism and eventually in other closely related Acidithiobacilli. The information could also be of value for future transcriptomic experiments in other bacterial systems.


Background
Gene expression interrogation, both at single-gene (classical gene expression analysis) and genome-wide (global transcriptome analysis) levels, are prominent fields of study. By providing quantitative measures of mRNA changes, it has the extraordinary potential of identifying the functional consequences of genetic variability and environmental influence. Accurate quantification of gene expression is providing great insight into the physiology and metabolic complexity of microbes and their consortia and is thus contributing to our understanding of both fundamental and applied issues of major interest.
Microarray transcript profiling is the most widely used technique to evaluate global gene expression in microbial systems. However, due to methodological uncertainties inherent to the technique, it is imperative to validate the expression of key genes by an alternative procedure and quantitative real-time PCR (qPCR) has become the method of choice. qPCR is accurate, exhibits a broad dynamic range, and is sensitive and reproducible [1][2][3][4][5][6]. However, when performing qPCR, several parameters need to be controlled in order to obtain accurate and reliable expression measurements; these include variations in the amounts of starting material between samples, RNA extraction efficiency, RNA integrity/quality, efficiency of cDNA synthesis, and differences in the overall transcriptional activity of the cells analyzed [7].
The most frequently used strategy to control for such variations is relative normalization, where the expression of a target gene is measured with respect to total RNA, rRNA or a stably expressed internal reference gene. The use of rRNA or other reference gene has the advantage that their expression permits normalization against the cumulative errors of the entire process [8]. Ideally, the reference gene should be universally valid, expressed stably and at a similar level across all samples, cells, experimental treatments, and designs. Unfortunately, no such reference gene has been identified [9][10][11][12] and even widely used control genes have proven unsuitable in certain situations [6,13,14].
Housekeeping genes are usually chosen as reference genes both in eukaryotes [15] and prokaryotes [16] because they are assumed: a) to be essential, b) to be ubiquitous, c) not to be regulated or influenced by the experimental procedure and d) to be expressed at similar levels in different types of cells. However, it is becoming increasingly clear that some commonly chosen housekeeping genes vary considerably across cell types, with time, or due to experimental treatment [13]. This may be explained partially by the fact that some housekeeping proteins participate in other functions as well as in basal cell metabolism [17][18][19].
Therefore, for an accurate comparison of mRNA transcription in different samples, either validated reference genes are required for normalization or new ones should be determined empirically for each experimental system and condition studied [20,21]. Here, we evaluate a carefully selected set of reference genes for improving the interpretation of gene expression profiles in a model microorganism, Acidithiobacillus ferrooxidans. The implementation of high-throughput microarray analyses of A. ferrooxidans [22,23] and PCR techniques for expression profiling [24][25][26][27][28][29][30][31][32][33][34][35][36][37], have greatly enhanced our understanding of the genetic and physiological potential of this bioleaching bacterium. However, suitable reference genes to evaluate gene expression have not previously been identified in A. ferrooxidans.
In order to address this lacuna, we screened high-density oligonucleotide array-based expression profiles available for A. ferrooxidans and identified a set of nine genes with minimal variability under different experimental setups. Quantitative real-time PCR was then used to determine the mRNA levels of these genes, comparing their transcription in five different experimental conditions. Finally, we evaluated the suitability of these and other previously reported reference genes to monitor the expression of four selected target genes from A. ferrooxidans grown with different energy sources. This study defines reference genes for normalization of gene expression for future research, sparing other researchers in this and related fields from cumbersome and time-consuming screenings for an ideal reference gene, provided that they verify the stability of these candidates under their conditions of study.

Selection of candidate reference genes
The most useful reference genes for standards in gene expression studies should be stably expressed over a range of experimental conditions and should be of wide phylogenetic distribution. Taking these requirements into account, a combined computational and experimental approach was devised to identify reference genes in the Acidithiobacilli (Figure 1). The strategy involves the following steps: 1) A genome-wide bioinformatic identification of candidate reference genes An initial set of candidate reference genes was compiled from a gene list that includes "housekeeping genes" of wide phylogenetic distribution [38]. This set was used to textmine the A. ferrooxidans ATCC 23270 genome (GenBank/EMBL/DDBJ accession number CP001219) [39]. Due to difference in ontological descriptions of gene function, not all genes could be recovered by textmining and additional candidates were identified in the A. ferrooxidans genome by BLASTP and TBLASTX searches using the housekeeping genes as queries. The combined set of candidate reference genes was then used to formulate bidirectional BLASTP and TBLASTX searches of the genomes of A. thiooxidans and A. caldus. This search for well conserved orthologs across Acidithiobacilli was performed in order to better define the set of essential genes for this bacterial genus. Such an approach will promote the ability to carry out future comparative gene expression studies within the Acidithiobacilli. Only genes present in all three genomes were accepted and provided the initial bioinformatic compilation of candidate reference genes (Additional File 1).
2) The selection of stably expressed candidate reference genes for A. ferrooxidans The expression profiles for the set of candidate reference genes was then evaluated in three different growth conditions of A. ferrooxidans (iron, pH 1.6 vs. sulfur, pH 3.5, iron-sulfur mixture, pH 1.6 vs. sulfur, pH 1.6 and high iron, pH 1.6 vs. low iron, pH 1.6; see Methods for more details). Candidate reference genes that exhibited non-differential expression (log ratio expression |M| < 1.5) and had the most similar level of expression (log ratio expression M~0) between every pair of conditions in all three experiments were further selected (Additional file 2).

3) Removal of redundant candidate reference genes
The genetic context of these candidate reference genes in the genome of A. ferrooxidans was evaluated using the DNA sequence viewer and annotation tool Artemis v.10 [40]. In the case where more than one candidate gene belonged to the same operon or gene cluster, only one gene was selected for further experimental validation. Also, only one candidate was chosen from genes belonging to the same functional category as defined by TIGRfams [41]. This reduced redundancy in the set.

4) Evaluation of the expression profiles of the candidate reference genes by real time PCR
The expression of these selected candidate reference genes was analyzed by quantitative PCR in order to evaluate if the stability of expression observed in microarray experiments was supported by more sensitive and rigorous evaluation methods. Transcriptional levels were compared by assessing C t values of each gene for two of the former experimental conditions (iron pH 1.6 and sulfur pH 3.5) and three new ones (sulfur pH 2.5, sulfur pH 4.5 and thiosulfate pH 4.5, see Methods for further detail). These conditions are frequently used in the laboratory to study the biology of Acidthiobacilli because they simulate environmental conditions. Expression values (C t ) of the selected reference genes and their dispersion are plotted in Additional file 3.

Expression stability of candidate reference genes
The expression of the 9 selected reference genes was analyzed in the five different experimental conditions Pipeline for the computational and experimental strategy used to identify suitable reference genes for normalization of expres-sion Figure 1 Pipeline for the computational and experimental strategy used to identify suitable reference genes for normalization of expression.
described above by the methods of Vandesompele and Andersen as implemented in the Visual Basic Application geNorm [11] and Visual Basic Application NormFinder [7], respectively (Table 1). According to geNorm (which determines a gene expression stability value M to produce a rank where the best genes are those with the lowest M value) map, rpoC, era and gmk showed the least variability in expression in all conditions evaluated (range of expression of 1.029, 1.040, 1.046 and 1.067 fold respectively). The NormFinder approach (which enables identification of the single best genes in a ranking) showed era to be the most stable gene and rpoC and gmk to rank within the four most stably expressed genes. Slight differences observed between the two techniques are to be expected and can be explained by the way in which both methods analyze the data. geNorm selects the gene with the most stable expression independent of the expression of the other genes under analysis. NormFinder instead focuses on the genes with least intra-and inter-group expression variation, thus the selection of the best reference gene is affected by the other genes being analyzed. Similar differences between the results of geNorm and NormFinder have been reported in other studies [42,43]. Taking into account the results from both geNorm and NormFinder, we can conclude that rpoC and era are the most suitable reference genes for studies with the Acidithiobacilli.

Expression stability of reference genes previously used in studies of A. ferrooxidans
Three genes, recA, alaS and rrs (16S rRNA) have been used in prior studies as internal controls for experimental investigations in A. ferrooxidans [28,30,44,45], but there has been no formal report showing that they are reliable references. Conversely, there is evidence showing differential expression of recA and rrs under cellular stress [46] and starvation [45,47]. In addition, use of rrs as a reference gene has been challenged because it is a very abundant species of RNA present at concentrations outside most calibration ranges [16].
The expression of recA, alaS and rrs was analyzed under the same experimental conditions and their expression values were ranked with respect to the new reference genes derived above using geNorm [11] and NormFinder [7] ( Table 2). The variability in expression of rrs and alaS in five different experimental setups shows them to perform well although slightly poorer than rpoC, era and map (Figure 2). Both geNorm and NormFinder identified rrs and alaS among the six more stable genes. On the contrary, recA ranks further down, indicating less stable expression, independently of the ranking method used ( Table 2). In addition, the expression of recA varies more than two fold, and together with coaE and trpS exhibits the least suitable expression profile of the genes assessed in this study (Figure 2).
It can be concluded that, among the previously used reference genes in A. ferrooxidans, only alaS is suitable and can be used with confidence as a normalizer. Given the variability observed in the present study, use of recA as a normalizer is not recommended as it would introduce noise to the analysis and eventually produce misleading results. In addition, use of rrs as a reference gene is not recommended despite its stable expression in the five conditions Ranking of the nine evaluated reference genes according to their expression stability under five different experimental conditions ( §) using geNorm [11] and NormFinder [7] applications. * Genes with the most stable expression values.
() The rank of the genes is indicated in parentheses.
( §) Experimental conditions. Condition 1: cells grown in 9 K medium at pH 1.6 containing 200 mM FeSO 4 ; condition 2: cells grown in 9 K medium at pH 2.5 containing 1% elemental sulfur; condition 3: cells grown in 9 K medium at pH 3.5 containing 1% elemental sulfur; condition 4: cells grown in 9 K medium at pH 4.5 containing 1% elemental sulfur; condition 5: cells grown in DSMZ71 medium at pH 4.5 containing 0.5% thiosulfate. analyzed because its abundance prejudices the analysis of lowly abundant transcripts.

Use of multiple reference genes for improved normalization
Normalizing gene expression based upon the expression levels of a carefully selected set of reference genes, usually referred to as a normalization factor, performs better than normalizing against any single gene alone [11]. This raises the question as to whether a compromise could be discovered in which a pool of genes, fewer than nine but more than one, would provide greater confidence for use as a reference group in A. ferrooxidans. For this purpose, several normalization factors (NF) were calculated following the criteria defined by Vandesompele et al. [11]. A NF represents the geometric mean of n genes, and the pairwise variation between sequential normalization factors (NF n and NF n+1 ) gives an idea of how well each of these perform. Geometric means for seven of most stable genes, showing less than 1.5 fold variation in all experimental conditions (map, rpoC, alaS, era, gyrA and nth) were obtained and pairwise variations between any two subsequent values was calculated. As shown in Figure 3, addition of a fourth gene leads to a non-significant change in the average of the gene variance estimates. According to geNorm ranking, it is concluded that the NF derived from the pool of the three genes map, rpoC and alaS (NF 1 ) is suitable for reliable normalization of gene expression of target genes.
In spite of this we posit that era, which ranked first accord- Pairwise variation (V n/n+1 ) analysis between normalization fac-tors (NF n /NF n+1 ) to determine the optimal number of refer-ence genes required for normalization Figure 3 Pairwise variation (V n/n+1 ) analysis between normalization factors (NF n /NF n+1 ) to determine the optimal number of reference genes required for normalization.
ing to the Normfinder method, could also be included in the selected reference gene set.

Use of selected reference genes to normalize expression of differentially expressed genes in Fe-S cells
To assess the value of our study, the relative expression levels of selected target genes was analyzed using the following normalization strategies: a) the three best reference genes selected by geNorm and Normfinder rpoC, era and alaS were used individually, b) a NF derived from the combination of the three genes selected by geNorm, rpoC, map and alaS (NF 1 ), c) a NF derived from the combination of the top ranking genes selected by geNorm and Normfinder method rpoC, map and era (NF 2 ) or d) the frequently cited reference genes rrs and recA. For this purpose, four target genes were selected that are known to be differentially expressed in Expression of the four genes of interest in iron versus sulfur grown cells was evaluated using the relative expression analysis software qBase v1.3.5 [48]. Figure 4 shows a significant increase in the expression of the sdrA1 and cbbOIa genes in cells grown in ferrous iron and of the cyoB and mntH genes in cells grown in sulfur. In all cases, normalization by individual reference genes outlined here (rpoC, era or alaS), by the geNorm derived NF 1 (rpoC plus map plus alaS), by the combined NF 2 (rpoC plus map plus era) and by rrs gave similar results. For example, sdrA1 was upregulated 60-100 fold depending on whether the normalization strategy was a single stable reference gene or the normalization factors. Conversely, normalization by the recA gene dramatically altered the relative expression ratio of the target gene, revealing an up-regulation of less than 50.
These results demonstrate how the interpretation of bacterial gene expression levels can be affected by the choice of the reference genes in quantitative real-time RT-PCR analysis. If a single gene is to be used, e.g. in studies where only one or a few target genes are being evaluated, the reference gene should be one of the three validated stable ref- Figure 4 Comparison of different normalization strategies. Mean relative expression levels of four differentially expressed genes using different normalization strategies. Black: stable reference genes used individually, light grey: geometric mean of three stable reference genes selected by geNorm (NF 1 : rpoC, map and alaS), dark grey: geometric mean of four stable reference genes selected by geNorm (NF 2 : rpoC, map and era) and white: classical reference genes used individually. (*) Significantly different according to a two tailed unpaired t-test at 95% confidence. erence genes rpoC, era or alaS. In investigations where a larger number of target genes are to be evaluated or a higher degree of confidence is desired, use of the NF derived from the pool of the genes rpoC, map and alaS or era is advisable.

Conclusion
Normalization is a prerequisite for accurate real time PCR expression profiling. Significant random fluctuations or, even worse, directional changes in the expression of chosen reference genes between samples, can lead to the lack of detection of small differences between genes of interest or to erroneous results. Therefore, it is extremely important to find appropriate reference genes with minimal variability. This cumbersome task is often avoided and frequently priorly used reference genes are assumed to be good normalizers without further evaluation in unexplored experimental setups.
The geometric mean of few carefully selected genes, rpoC, map, alaS and/or era, is demonstrated to be the best normalizer for A. ferrooxidans in the diverse experimental conditions used in this study. Use of a single gene for normalization, instead, may result in relatively large variations in target gene expression and significant errors depending on the gene in question and the experimental setup, as showed to be the case when using the recA gene. Conversely, it is suggested that rpoC, era or alaS could be used as normalizers if only one reference gene is strictly necessary. Since ribosomal RNA is much more abundant than most target mRNA transcripts and its quantification falls outside most calibration ranges, the use of rrs is not recommended especially for the measurements of low abundance transcripts.
Whatever strategy is used to normalize for differences in quality and quantity of input RNA it must be validated for a particular experimental model on an individual basis. This investigation provides a validated set of reference genes for those studying A. ferrooxidans gene expression under typical biological conditions and an initial point of departure for those exploring new experimental setups in this microorganism or other closely related Acidithiobacilli or possibly also in other bacterial models.

Bioinformatic selection of candidate reference genes
An initial set of candidate reference genes was compiled from a gene list that includes "housekeeping genes" of wide phylogenetic distribution [38]. This set was used to textmine the A. ferrooxidans ATCC 23270 genome (Gen-Bank/EMBL/DDBJ accession number CP001219) [39]. Additional candidates were identified in the A. ferrooxidans genome by BLASTP and TBLASTX searches. The com-bined set of candidate genes was then used to formulate bidirectional BLASTP and TBLASTX searches of the genomes of A. thiooxidans and A. caldus. Genomic context for genes present in all three genomes was analyzed using the DNA sequence viewer and annotation tool Artemis v.10 [40]. Candidate genes belonging to the same predicted operon or gene cluster were excluded from further analysis. Reference genes were classified by function using TIGRfams [41] and one gene per functional category was selected. These last two steps were included to reduce redundancy in the gene set.

Reverse transcription
cDNA was prepared from 1 mg total RNA using random hexamers and Superscript II reverse transcriptase (Invitrogen) according to manufacturer instructions. The resulting cDNA was diluted 1:10 in distilled water and stored in aliquots at -20°C until further use.

Real-time PCR
Primers for real-time PCR assays and amplification efficiencies (E) are shown in Table 3. The real-time PCR reactions were performed in the Mx3000P QPCR System (Stratagene) using the SYBR GreenER qPCR SuperMix Universal Kit (Invitrogen). The 20 ml PCR reactions contained 2 ml of a 1:100 diluted cDNA sample; 200 nM of each primer and 1× SYBR GreenER qPCR SuperMix Universal (Invitrogen). The reference dye ROX was included at a final concentration of 5 nM. The cycling protocol was as follows: initial denaturation for 10 min at 95°C followed by 40 cycles of 30 s at 95°C, 15 s at 52°C; 30 s at 72°C. Fluorescence was measured after the extension phase at 72°C. The PCR products were subjected to a melting curve analysis, that commenced at 52°C and increased at 0.5°C s-1 up to 95°C, with a continuous fluorescent measurement. Specific amplification was confirmed by a single peak in the melting curve. For each experimental condition total RNA was extracted from two independent A. ferrooxidans cultures. Each RNA sample was retro-transcribed and the expression of all genes was assessed on the same cDNA sample. The reactions for each target gene were performed in triplicate and in the same PCR run. Thus, data sets consist of 6 values per gene per experimental set-up generated under standardized PCR cycling conditions. Stationary phase genomic DNA 10-fold dilutions (ranging from 10 ng to 1 pg) were used to generate a 5-point standard curve for every gene by using the Cycle Threshold (C t ) value versus the logarithm of each dilution factor. Reaction efficiency (E = [10(-1/ slope)]-1) for every gene was derived from the slope of the corresponding standard curves. Transcript quantities were calculated from the standard curve by the software accompanying the MxPro3000P QPCR System (Stratagene) set with default parameters. Each experiment included a no template control.

Stability of gene expression and relative quantification
The stability of gene expression was evaluated using the Excel-based applications geNorm [11] and Normfinder [7] and the relative expression was calculated with qBase 1.3.5 [48]. Briefly, the geNorm method is based on a pairwise comparison approach and depends on the calculation of an M value, defined as the average pairwise variation of a particular gene with all others. The NormFinder method is based on a different mathematic model that considers the intra-and inter-treatment variation in expression for gene ranking. The normalization factors were calculated following the criteria defined by Vandesompele [11]. These include: a) to use the geometric mean (n numbers are multiplied and then the nth root of the resulting product is taken) as this controls better for possible outliers and abundance differences between genes and b) to define the minimum number of reference genes needed for a reliable calculation by evaluating the pairwise variation between sequential normalization factors including three, four, five or more stable reference genes (NF n /NF n+1 ).

Evaluation of reference gene expression by microarray transcript profiling
A. ferrooxidans gene expression was evaluated under three experimental conditions: (1) cells were grown in 9 K medium containing 62 mM FeSO 4 at pH 1.6 versus cells grown in 9 K medium 1% elemental sulfur at pH 3.5 containing; (2) cells were grown in 9 K medium containing 62 mM FeSO 4 plus 1% elemental sulfur at pH 1.6 versus cells grown in 9 K medium containing 1% elemental sulfur at pH 1.6 and (3) cells were grown in 9 K medium containing 200 mM FeSO 4 at pH 1.6 versus cells grown in 9 K medium containing 62 mM FeSO 4 at pH 1.6. Construction, experimental and data analysis protocols for A. ferrooxidans type strain specific oligonucleotide microarrays have been previously described [22] and deposited in the ArrayExpress database under the following accession numbers (A-MEXP-1478, A-MEXP-1479).

Authors' contributions
PAN carried out the real time PCR standardization and analysis; PCC was responsible for the culture handling and preparation of the genomic DNA and total RNA sam-ples. PAN and RQ conceived the study; EJ tutored PAN; RQ and DSH helped in the biological interpretation, and drafted the manuscript. All authors read and approved the final manuscript.