Normalization of gene expression is critical for the accurate interpretation of transcriptional changes measured by qPCR. Total RNA has been explored as an alternative reference for normalizing qPCR gene expression, but this approach was found to be unsuitable because ribosomal RNA, its main constituent, is itself regulated [24–26]. The use of multiple reference genes, which takes technical variation into account, is considered the preferred method [4, 5]. In this study we aimed to identify reliable reference genes for qPCR in recombinant protein-producing E. coli, since overexpression of recombinant proteins in E. coli is used extensively in biotechnology.
We first analyzed single-platform, uniformly normalized microarray expression data from a substantial number of recombinant protein overexpression studies. From these, we selected 20 candidate reference genes based on the variation of their signal intensities across all selected arrays. The expression of the 20 candidate genes and 2 commonly used reference genes was then measured by qPCR under a variety of conditions, including different strains, growth stages, temperatures, and induction levels. The expression of cysG, idnT and hcaT was found to be the most constant, as ranked by geNorm and NormFinder. Furthermore, scaling with the geometric average of these three genes (cysG/idnT/hcaT) provided accurate data interpretation under all tested conditions. The genes idnT and hcaT encode transporters for idonate and 3-phenylpropionate, respectively, and cysG encodes a metabolic enzyme involved in siro-heme synthesis. Putative regulation of the expression of these genes is predicted in EcoCyc (http://ecocyc.org/), but only the control of idnT expression by its substrate idonate has been experimentally verified in E. coli. Idonate is not a commonly used ingredient in E. coli media, which might explain the constant expression of idnT in the diverse microarray studies we analyzed and in the qPCR study carried out herein.
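The ranking step can be illustrated with a minimal sketch of a geNorm-style stability measure M. It assumes the commonly described definition: for each gene, M is the average, over all other candidate genes, of the standard deviation across samples of the log2-transformed expression ratio. The function name, the gene labels, and the input values are hypothetical; this is not the geNorm software and does not reproduce the data of this study.

```python
import math
from statistics import stdev


def genorm_m(rel_quant):
    """Return a geNorm-style stability value M for each gene.

    rel_quant maps a gene name to a list of relative quantities,
    one per sample, in the same sample order for every gene.
    Lower M means more stable expression relative to the other genes.
    """
    genes = list(rel_quant)
    m_values = {}
    for gene in genes:
        pairwise_sd = []
        for other in genes:
            if other == gene:
                continue
            # log2 ratio of the two genes in each sample
            log_ratios = [
                math.log2(a / b)
                for a, b in zip(rel_quant[gene], rel_quant[other])
            ]
            # pairwise variation: SD of the log ratios across samples
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical relative quantities for three genes in three samples.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],  # always twice geneA: constant ratio
    "geneC": [1.0, 8.0, 2.0],  # varies independently of geneA and geneB
}
m = genorm_m(example)
ranking = sorted(m, key=m.get)  # most stable gene first
```

In this toy input, geneA and geneB keep a constant ratio and therefore receive lower M values than geneC. The full geNorm procedure additionally removes the least stable gene stepwise and recomputes M, which this sketch omits.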
Normalization of gene expression using the scaling factor derived from the three most stable genes (cysG/hcaT/idnT) revealed that induction of the 4 enzymes of the DXP pathway (dxs, idi, ispD, and ispF) did not inhibit the expression of the enzymes involved in lycopene production. The lycopene biosynthetic genes expressed from a multi-copy plasmid and some of the endogenous DXP pathway genes maintained constant transcription levels at high induction levels. In contrast to the empirically validated reference genes described here, rrsA [6, 7], which encodes 16S ribosomal RNA, and ihfB [2, 8–13] are commonly used as reference genes in E. coli despite not having been validated. Disturbingly, we found that ihfB and rrsA were less stable than cysG/idnT/hcaT under many of the conditions tested. When normalized with ihfB, the expression of lycopene biosynthetic genes appeared to be upregulated by up to 5-fold during induction, whereas normalization with rrsA overestimated gene downregulation by up to 8-fold (Figure 4 and additional file 1: supplementary figure S4). This type of incorrect interpretation can lead to faulty conclusions about transcriptional regulation and is especially problematic when it guides further genetic manipulation. These results showed that the E. coli transcription machinery is fairly robust and that the inhibition of lycopene production is not due to altered transcription of the pathway genes examined. The results also illustrate the importance of using reliable, validated reference genes for qPCR analysis in E. coli.
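The scaling by the geometric average of the three reference genes can be sketched as follows. This is a minimal illustration rather than the exact calculation used in this study: it assumes quantification cycle (Cq) values as input and an amplification efficiency of 2 (perfect doubling per cycle) for every gene, and all function names and Cq values are hypothetical.

```python
from statistics import geometric_mean


def relative_quantity(cq_sample, cq_calibrator, efficiency=2.0):
    """Quantity in the sample relative to the calibrator condition.

    Assumes the same amplification efficiency in both conditions;
    efficiency=2.0 corresponds to perfect doubling per cycle.
    """
    return efficiency ** (cq_calibrator - cq_sample)


def normalized_fold_change(target_cq, reference_cqs):
    """Fold change of a target gene scaled by the reference genes.

    target_cq:     (cq_sample, cq_calibrator) for the target gene
    reference_cqs: dict mapping each reference gene to its
                   (cq_sample, cq_calibrator) pair
    """
    # Normalization factor: geometric mean of the reference quantities
    factor = geometric_mean(
        relative_quantity(sample, calibrator)
        for sample, calibrator in reference_cqs.values()
    )
    return relative_quantity(*target_cq) / factor


# Hypothetical Cq values: every reference gene appears one cycle earlier
# in the sample, as if twice as much template had been loaded.
references = {
    "cysG": (21.0, 22.0),
    "idnT": (24.0, 25.0),
    "hcaT": (23.0, 24.0),
}
unchanged = normalized_fold_change((19.0, 20.0), references)
induced = normalized_fold_change((17.0, 20.0), references)
```

With these inputs, a target that shifts by the same single cycle as the references normalizes to a fold change of about 1, while a target that shifts by three cycles normalizes to about 4. An unstable reference gene would distort the normalization factor and therefore every fold change computed from it.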
In addition, the expression stability of rrsA was not consistent between the two tested strains: rrsA expression was fairly stable in DH10B but not in BL21 (DE3). This could be due to the strain-specific expression systems of the two cells. BL21 (DE3) was engineered to produce recombinant T7 RNA polymerase for target protein expression, whereas DH10B uses the endogenous E. coli expression system. Another possibility is the difference in their genetic backgrounds. Nonetheless, this example clearly highlights the necessity of evaluating the suitability of reference genes in different experimental contexts. Extending the present study, it will be of interest to examine the use of the normalization factor derived from these three genes (cysG/hcaT/idnT) in more specific applications, including biotechnologically relevant conditions such as heat- or cold-shock stress, the presence of alcohols, and the knockout of metabolic genes, as well as in the study of pathogenic strains.