A small intergenic region drives exclusive tissue-specific expression of the adjacent genes in Arabidopsis thaliana

Background Transcription initiation by RNA polymerase II is unidirectional from most genes. In plants, divergent genes, defined as non-overlapping genes organized head-to-head, are highly represented in the Arabidopsis genome. Nevertheless, functional analyses of these intergenic regions are scarce. The At5g06290 and At5g06280 loci are head-to-head oriented and encode a chloroplast-located 2-Cys peroxiredoxin B (2CPB) and a protein of unknown function (PUF), respectively. The 2-Cys peroxiredoxins are proteins involved in redox processes; they are part of the plant antioxidant defence and also act as chaperons. In this study, the transcriptional activity of a small intergenic region (351 bp) shared by At5g06290 and At5g06280 in Arabidopsis thaliana was characterized. Results Activity of the intergenic region in both orientations was analyzed by driving the β-glucuronidase (GUS) reporter gene during the development and growth of Arabidopsis plants under physiological and stress conditions. The results show that this region drives expression of 2cpb in photosynthetic tissues and of puf in vascular tissues. GUS expression driven by the promoter in the 2cpb orientation was enhanced by heat stress. In contrast, the promoter showed similar down-regulation of GUS expression in both orientations under low temperatures and other stress conditions such as mannitol, oxidative stress, or fungal elicitor. Conclusion The results from this study provide the first evidence of an intergenic region that, in opposite orientations, directs GUS expression in different spatially localized Arabidopsis tissues in a mutually exclusive manner. Additionally, this is the first demonstration of a small intergenic region that drives expression of a gene, 2cpb, whose product is involved in the chloroplast antioxidant defence. Furthermore, these results indicate that 2cpb is related to the heat stress defensive system in leaves and roots of Arabidopsis thaliana.


Background
A promoter region of a eukaryotic protein-encoding gene usually consists of a core promoter region of around 50 bp adjacent to the transcription initiation site, and multiple distal DNA regulatory elements that control transcription efficiency. There are several key genetic elements within a core promoter: the TATA box, an initiator element, the downstream promoter element usually found in TATA-less promoters, and the TFIIB-recognition element [1,2]. The TATA boxes are usually located about 25 to 30 bp upstream of the transcription start site (TSS), while the less conserved initiator elements span the TSS. These sequences contribute to accurate transcription initiation and to the strength of TATA-containing promoters. In Arabidopsis core promoters, the TATA box is located between -50 and -20 relative to the TSS and, instead of the initiator element around the TSS, the YR rule (Y: C or T; R: A or G) applies to most of them. Another element is the pyrimidine patch (Y Patch), although its role is still unknown. These three elements are orientation-sensitive [3]. Other promoter elements found in Arabidopsis and rice are regulatory element groups (REGs), which appear upstream of the TATA box (-20 to -400), and exist in an orientation-insensitive manner [3].
Transcription initiation by RNA polymerase II is unidirectional from most genes. However, several reports indicate that divergent transcription is likely a common feature for active promoters [4][5][6][7].
Divergent genes, defined as non-overlapping genes organized head-to-head in opposite orientation, represent 36.5% of the total gene pairs when separated by less than 1 kb in the Arabidopsis genome [8]. Nevertheless, functional analyses of the intergenic regions between those gene pairs are scarce. Head-to-head oriented genes sharing an intergenic region with putative bidirectional promoters have been reported in Brassica napus [9] and Capsicum annuum [10], and by computational analysis in rice, Arabidopsis, and black cottonwood [11]. Large-scale studies of expression data in Arabidopsis revealed that neighbouring genes in the genome are co-expressed [12], and that the lengths of the intergenic sequences have opposite effects on the ability of a gene to be epigenetically regulated for differential expression [13]. Two recent papers have shown activity of larger intergenic regions in rice (1.8 kbp) and Arabidopsis (2.1 kbp), functioning as bidirectional promoters of chymotrypsin protease inhibitor [14] and chlorophyll a/b-binding protein [15] genes, respectively. These systems were assessed in a heterologous background using onion epidermal cells [14], and also in stable transgenic plants, the latter intended to be used for genetic engineering-based crop improvement [15].
All divergent gene pairs are potential sources of bidirectional promoters. To define the function of the corresponding intergenic regions and their transcriptional regulation is of great interest for plant molecular biologists.
In this study, a divergent promoter of a protein-encoding gene pair (At5g06290 and At5g06280) with an intergenic region of 351 bp was analyzed. The At5g06290 and At5g06280 loci encode a 2-Cys peroxiredoxin B (2CPB), which is a chloroplast-located protein [16], and a protein of unknown function (PUF), respectively (http://www.arabidopsis.org). The 2-Cys peroxiredoxins are proteins involved in redox processes, and their functions are related to the antioxidant defence of the plant [17], photosynthesis, abiotic stress response, and possibly chloroplast-to-cytosol signalling [18]. In yeast, peroxiredoxins could act as molecular chaperons, increasing resistance to heat stress [19]. The expression pattern of At5g06290 and At5g06280 was tested by fusing the intergenic region, in opposite orientations, to the β-glucuronidase (GUS) reporter gene and following GUS expression during the development and growth of Arabidopsis plants as well as during stress situations.

Functional analysis of the intergenic region between At5g06280 and At5g06290 in Arabidopsis plants during their development and growth
To test the functionality of the intergenic region shared by the divergent genes At5g06280 and At5g06290 during the Arabidopsis life cycle, the DNA fragment was fused to GUS in both orientations (Prom280:GUS and Prom290:GUS, respectively). Accordingly, we cloned a 530 bp DNA fragment (the 351 bp intergenic region and the 5' untranslated regions) upstream of GUS in the binary vector pBI101.1. The constructs were introduced into wild-type Arabidopsis plants by floral dip; multiple transgenic plants were obtained, and more than 3 independent lines were examined for each construct throughout development. GUS staining was performed in Arabidopsis plants throughout the life cycle (Figure 1, stages 1.0 to 6.9 according to [20]). Interestingly, Prom280:GUS plants showed staining almost exclusively in the petiole and the vascular bundle of the midrib in all the leaves (Figure 1C, 1E, 1G and 1I) and in sepals (Figure 1K), but not in the cotyledons (Figure 1A), while Prom290:GUS plants showed staining mainly in the leaf mesophyll (Figure 1B, 1D, 1F, 1H and 1J), sepals (Figure 1L), and siliques (Figure 1M and 1N). Notably, GUS staining was stronger in Prom290:GUS plants (it was visible even after three hours of staining) than in Prom280:GUS plants at all growth stages (data not shown). These results indicate that the intergenic region between At5g06290 and At5g06280 directs GUS expression in a spatially exclusive manner depending on the promoter orientation during Arabidopsis development and growth (Figure 1).
As 2CPB is a chloroplastic protein [16], we analyzed the putative intracellular location of PUF using the ChloroP 1.1 Server [21] and the deduced amino acid sequence of At5g06280. The prediction indicated that PUF (156 residues) is likely to be a plastidic protein, because it has an amino-terminal extension indicative of a chloroplast transit peptide (score 0.506). For comparison, the 2CPB score was 0.598 using this web tool.

Response of Prom280:GUS and Prom290:GUS plants to various stresses
Different stress conditions lead to the production of reactive oxygen species (ROS) as a consequence of membrane and protein damage [22]. The expression of 2-Cys peroxiredoxins is reported to be redox regulated [23]. Therefore, the response of Prom280:GUS and Prom290:GUS plants to various environmental stresses was tested. Firstly, the effect of temperature treatment on 10-day-old Arabidopsis seedlings was analyzed. Plants of both transgenic lines were incubated for 48 h at 37°C or 4°C and then subjected to the GUS staining procedure. Figure 2 shows that leaves from both plant lines were stained more strongly under heat stress (Figures 2C and 2D), maintaining the same tissue specificity as in the control condition (Figures 2A and 2B). In addition, the root tips were stained in the case of Prom290:GUS plants (Figure 2J). In both plant lines the GUS staining pattern was conserved under cold stress (Figures 2E and 2F), although the expression levels were lower than under control conditions, as revealed by quantification of the GUS staining intensity (Figures 2G and 2H). Furthermore, no expression was detected in the plants carrying the vector without the intergenic region (empty vector) (Figure 2I). Further analysis of puf and 2cpb expression using the response viewer of Genevestigator software (http://www.genevestigator.ethz.ch/) [24] is presented in Figure 3. Under several cold treatments, the aerial part of Arabidopsis plants showed decreased expression of puf and 2cpb, while under heat conditions the plants showed enhanced expression of both genes (Figure 3). Similar responses were observed in the expression of GUS in Prom280:GUS and Prom290:GUS plants subjected to temperature stress (Figure 2). Additionally, in roots of Prom290:GUS plants, the expression of 2cpb is markedly increased by heat stress (Figure 2J), which is consistent with data obtained from roots under the same stress treatment (Figure 3).
To confirm the effect of heat treatment on the induction of 2CPB, 10-day-old wild-type Arabidopsis plants were kept for 2 days at 37°C, and the total protein of leaves and roots was extracted and analyzed by SDS-PAGE and immunoblotting. Results are presented in Additional file 1. The total protein pattern showed slight differences between control and treated plants in the leaf and root tissues, especially at molecular masses above 66 kDa. Immunoblot analysis of these tissues showed induction of 2CPB in both leaves and roots after heat treatment (Additional file 1, bottom panel). These data indicate that heat treatment increased not only the 2CPB protein level in roots and leaves of wild-type plants (Additional file 1), but also GUS activity in the same tissues, as observed in Prom290:GUS plants (Figures 2D and 2J).
Other sources of ROS are biotic and abiotic stresses. The effect of different stress conditions on the expression levels of Prom280:GUS and Prom290:GUS plants was evaluated, and the results are presented in Figure 4. These results suggest that puf and 2cpb are stress-responsive genes, although they are not always affected in the same way by the same stress conditions.
Figure 3 Expression levels of the genes after heat or cold stress as shown by Genevestigator. Response viewer of Genevestigator software shows that At5g06280 and At5g06290 genes decrease their expression levels in all cold stress experiments and increase their levels with heat stress treatments.
Figure caption: Effect of several stresses on GUS expression in Prom280:GUS and Prom290:GUS lines.

In search of cis-elements in the promoter of puf and 2cpb
In silico analysis of the divergent promoter was performed to search for cis-elements using the Plant Promoter Database (ppdb) [25], PlantCARE [26], PLACE [27], and AthaMap [28] web tools. The analysis revealed no TATA box. The distribution of elements in the 530 bp region is shown in Figure 5. We identified binding sites for four homeodomain-leucine zipper transcription factors: ATHB1, which was reported to be involved in differentiation of the palisade mesophyll cells and leaf development [29,30]; ATHB2, which is responsive to far-red light [31]; ATHB5, which is a transcription factor involved in the regulation of light-dependent developmental phenomena [29]; and transcription factors similar to ZmHox2a, which have the homeodomains ZmHOX2a(1) and ZmHOX2a(2) [32]. Furthermore, a Y Patch near the puf TSS and seven REGs near the 2cpb TSS were identified; however, their functions are still unknown. An AACA element, which was described as a negative regulatory element in vascular promoters that represses activity in other cell types [33], was identified in seven positions. Lastly, a CCAAT box, present in the promoters of heat shock protein (Hsp) genes [34], was found four times, and the nTTCn element present in the promoters of several Hsp genes [35] was found 23 times. This analysis revealed no other overrepresented cis-element in the promoter region under study.
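The kind of motif count reported in this section can be illustrated with a minimal sketch that scans both strands of a sequence for short consensus motifs. It assumes a plain-text DNA string and treats n as any base; the function names, the motif table, and the toy sequence are hypothetical, and this is not the procedure used by ppdb, PlantCARE, PLACE, or AthaMap, which rely on their own curated matrices and annotations.

```python
import re

# Illustrative motif table (an assumption); "n" is written as a character class.
MOTIFS = {
    "CCAAT": "CCAAT",
    "AACA": "AACA",
    "nTTCn": "[ACGT]TTC[ACGT]",
}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_motif(seq, pattern):
    """Count matches of pattern in seq, allowing overlapping hits."""
    # A lookahead matches at every start position without consuming text.
    return len(re.findall(f"(?=({pattern}))", seq))

def scan(seq):
    """Count each motif on the plus strand and on the minus strand."""
    plus = seq.upper()
    minus = reverse_complement(plus)
    return {
        name: {"plus": count_motif(plus, pat), "minus": count_motif(minus, pat)}
        for name, pat in MOTIFS.items()
    }

# Toy sequence, not taken from the At5g06280/At5g06290 region.
toy_counts = scan("CCAATTTCAAACA")
```

Counting the two strands separately matters for orientation-sensitive elements, whereas the sum of both counts is the relevant figure for orientation-insensitive ones.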

Distribution of distances between genes and their nearest neighbours in Arabidopsis genome
To further characterize this 351 bp promoter on a genome-wide scale, the distribution of intergenic regions of similar length in the Arabidopsis genome was studied. For that purpose, the distribution of distances between Arabidopsis genes and their nearest neighbours on the same and opposite strands was explored. The distances between the TSSs of the nearest gene neighbours were calculated for each of the 27,141 predicted genes (see Methods) that remained after filtering out genes annotated as pseudogenes and transposons. The distribution of distances between 5' ends of genes on opposite strands is bimodal and could be deconvoluted into two peaks centred at 323 bp (around 140 gene pairs between 300 and 350 bp in length) and 2.5 kbp (Figure 6A). This type of distribution was not observed for the approximately 14,000 genes whose nearest neighbours are on the same strand (Figure 6B), or when the distances were calculated between the 3' ends of the genes on opposite strands (Figure 6C). Notably, only 4.3% of the gene pairs with 5' ends on the same strand are closer than 1,000 bp (Figure 6B), while 75% of the gene pairs with 3' ends on opposite strands are closer than 1,000 bp, with 1,234 of them having overlapping regions (Figure 6C, inset). We designated the region between the two non-overlapping 5' ends of genes located on opposite strands as a putative bidirectional promoter. This analysis shows that, out of 6,438 divergent gene pairs (Figure 6A), 2,469 have putative bidirectional promoters of less than 1,000 bp in the Arabidopsis genome. Most of the head-to-head oriented genes (98%) showed non-overlapping bidirectional promoters, and only 874 (13.8%) gene pairs are separated by less than 323 bp.

Discussion
With the availability of complete genome sequences for a number of organisms, the functionality of intergenic regions has attracted more attention. Computational analysis has shown that divergent gene pairs with intergenic regions of less than 1 kb are quite abundant in the sequenced eukaryotic genomes of both plants and animals [5,8]. Interest in studying intergenic region functionality is increasing, not only to better understand divergent transcription but also to use these regions as a new toolkit to manipulate genomes [36]. In plants in particular, very few reports on this matter are available. One example integrated computational analysis and bidirectionalization to construct a synthetic transcriptional unit for high-level reporter-gene expression in response to specific elicitors [37]. In this study, it has been found that the region shared by two divergent genes on chromosome 5 of Arabidopsis thaliana (At5g06280 and At5g06290) functions as a promoter in both orientations. In addition, this study demonstrated that tissue and developmental expression patterns differed between puf and 2cpb. Head-to-head genes in other organisms, such as human, mouse, and rat, statistically tend to perform similar functions, and gene pairs associated with significant co-functions seem to have stronger expression correlations [38]. In this case, the gene products of At5g06280 and At5g06290 are both presumably located in the chloroplasts, although it is unknown whether their functions are related. It is known that 2CPB is located in the chloroplasts and prevents oxidative damage of chloroplast proteins [17]. The 2cpb transcript level was correlated with chlorophyll distribution, and the transcript also accumulated in plants with decreased catalase activity and upon heat stress [39]. Down-regulation of 2cpb was observed upon pathogen infection, ozone and cold [40,41]. In contrast, the role of PUF remains unknown; it is presumably a chloroplast-located protein, as predicted by ChloroP analysis [21].
When searching for potential orthologues of At5g06280 and At5g06290, it was found that this head-to-head gene organization is not conserved in other genomes (data not shown), suggesting that their gene products are most probably not functionally related. In humans, analysis of genome-wide expression data demonstrated that a minority of bidirectional gene pairs are expressed through a mutually exclusive mechanism [5]. In this study, the tissue-specific expression directed by the divergent promoter showed unidirectional activity for puf in the petiole and vascular bundles, and unidirectional activity in the opposite direction, in different tissues, for 2cpb. The higher expression of 2cpb in the leaf mesophyll, but not in vascular bundles, is consistent with its function in the redox processes of chloroplasts [40]. Taken together, these results suggest that the directionality of the promoter activity may be regulated to some degree in a tissue-specific manner. In fact, a cis-motif associated with vascular bundle expression (AACA) [33] was found several times in the puf direction of transcription.
Furthermore, it has been demonstrated that the divergent promoter shared by puf and 2cpb responded to temperature stress. In relation to this, the higher 2CPB levels in the leaf and root caused by heat treatment of Arabidopsis seedlings would indicate a role of this protein in temperature stress. In yeast, peroxiredoxins could alternatively function as peroxidases and molecular chaperons, increasing resistance to heat stress [19]. It is well known that exposure of plants to high temperature leads to the production of Hsps. The yeast heat shock factor 1 binding sequence nTTCn (or nGAAn) [35] was found highly represented in the intergenic region of this study. Therefore, it is tempting to speculate that high temperature could stimulate 2cpb similarly to Hsp genes. Remarkably, the puf expression was repressed similarly to 2cpb by several stress conditions.
Figure legend: No TATA boxes have been found. ATHB1 is the binding site of the transcription factor ATHB1, which is involved in differentiation of the palisade mesophyll cells and leaf development. ATHB2 is the binding site of the transcription factor ATHB2, which is an element of response to far-red light. ATHB5 is the binding site of the transcription factor ATHB5, which is involved in the regulation of light-dependent developmental phenomena. Hox2a_Hox2a is the binding site of proteins with the homeodomains ZmHOX2a(1) and ZmHOX2a(2). CCAAT box is found in the promoter of Hsp genes. Y Patch is a direction-sensitive plant core promoter element that appears around TSS. REG is a direction-insensitive element that is preferentially found around -20 to -400 bp relative to TSS. AACA is a negative regulatory element in vascular promoters that represses activity in other cell types. The yeast heat shock factor 1 binding sequence nTTCn is underlined in the minus strand and overlined in the plus strand.

In silico analysis of this promoter using ppdb revealed that it is a TATA-less promoter in both orientations. In plant genomes, TATA boxes are underrepresented in putative bidirectional promoters [11]. A recent study [42] suggested that TATA box-containing genes have longer intergenic upstream regions and increased variation across species because their upstream regulatory potential is greater and, therefore, more amenable to change and modulation. The TATA box appears to be responsible for promoter unidirectionality in most cases, whereas the absence of TATA boxes appears to be a novel mechanism of regulation by bidirectional promoters compared to unidirectional promoters. This analysis also revealed that, in a short region of this promoter (28 bp) (Figure 5B), four different cis-elements overlap: one heat shock element (CCAAT box), a Y Patch found in the majority of Arabidopsis promoters but with unknown function [25], and three binding sites of homeodomain-leucine zipper transcription factors, some of which are able to bind in both directions [27,28]. These cis-elements may direct the transcription of 2cpb, especially ATHB1, which is involved in differentiation of the palisade mesophyll cells, and ATHB5, which in turn is involved in the control of leaf morphology development [26]. Upstream of this region there are three AACA elements within ±25 bp of the puf TSS (Figure 5A). This is a negative regulatory element in vascular promoters, which represses activity in other cell types [33], suggesting that, in the intergenic region under analysis, this cis-element would prevent puf transcription in mesophyll cells. The expression of puf in the vascular bundle of midribs could be activated by ATHB2, which also has a homeodomain, and by the Y Patch located in the 28 bp region mentioned above. The 2cpb and puf putative promoter regions mentioned have a heat-response element near them, which could explain the results of the heat stress experiments. It was not possible to find any abiotic stress element overrepresented in the 530 bp region analyzed, suggesting that the expression pattern observed in Figure 4 could be the result of the complex interaction of the transcription factors that bind the 28 bp region. Overall, results obtained from this study indicate that the multiple stress responsiveness of the intergenic region would reside within the 351 bp.
In terms of length, the short promoter shared by 2cpb and puf belongs to a minority group of putative bidirectional promoters present in the Arabidopsis genome. In fact, the Arabidopsis genome has a bimodal distribution of distances between the 5' ends of genes on opposite strands, with the smaller group of gene pairs peaking at 323 bp. This is the first intergenic region from this small group of Arabidopsis promoters to be functionally studied. Plants are sessile organisms and, during their growth, they are occasionally affected by adverse environmental conditions; therefore, they may rely more strongly on elaborate transcriptional response programs to survive. It is thus highly possible that other intergenic regions of similar lengths and regulatory features could be found in plants.

Conclusion
In this report, it has been shown that a 351 bp intergenic region between the head-to-head oriented At5g06290 and At5g06280 directs gene expression in different Arabidopsis tissues in a mutually exclusive manner. The gene products of these loci are a chloroplast-located 2-Cys peroxiredoxin B involved in the antioxidant defence, and a protein of unknown function. This is the first report of an intergenic region that drives expression of a gene involved in the chloroplast antioxidant defence. These results also show that 2CPB is induced by heat stress in the leaves and roots, suggesting a function for this protein in the heat stress defensive system of Arabidopsis thaliana.

Plant material and growth conditions
Arabidopsis thaliana ecotype Columbia (Col-7) was synchronously germinated at 4°C for 48 h and grown in soil-vermiculite mixture (2:1 v/v) in growth chambers at 20-22°C, under long day conditions (16 h light/8 h darkness). The light intensity was set at 130 μmol m⁻² s⁻¹.

Stress treatments
Arabidopsis plants were cultivated on agar supplemented with the stress agent: osmotic stress (100 mM mannitol), salt stress (50 mM NaCl), oxidative stress (0.1 μM methyl viologen) or fungal elicitor (1.3 mg/mL autoclaved cellulase, Onozuka R-10, Yakult Honsha, Tokyo, Japan). For cold (4°C) and high (37°C) temperature stresses, the plants were grown for 10 days on MS agar without supplements under control conditions and then the temperature treatment was applied for 2 days. For higher light intensity (800 μmol m⁻² s⁻¹), the plants were grown for 10 days and the treatment was applied for 6 h.

DNA constructs
The intergenic region with the 5' UTRs of the genes At5g06280 and At5g06290 was isolated by PCR from an A. thaliana DNA CTAB preparation [43] using the primers 5'-CGCGGATCCAGTCTTTCTTCTTCTTTTTTTTTG-3' and 5'-CGCGGATCCTGACTCTGTTCTCTCTCTCTATC-3' (each primer carries an added BamHI restriction site, GGATCC, near its 5' end). The PCR product was subcloned into pGEM-T Easy Vector (Promega, Madison, USA). DNA sequencing was used to confirm that no spurious mutations were introduced during amplification. The fragment was excised with BamHI, and the 530 bp fragment was cloned in both orientations into the BamHI site of pBI101.1 to create the plasmids pBI280 and pBI290. The orientation of the fragment was analyzed by PCR with primers that hybridize in the pBI101.1 plasmid (5'-ACAGTTTTCGCGATCCAGAC-3' and 5'-TTATGCTTCCGGCTCGTATG-3') and the primers previously described. Escherichia coli strain DH5α was used for plasmid construction. Agrobacterium tumefaciens strain GV3101 pMP90 was transformed with the plasmids by electroporation, and Arabidopsis (Col-7) plants were transformed by floral dip infiltration [44] with the plasmids pBI101.1, pBI280, or pBI290.
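The layout of the two cloning primers can be checked with a short sketch that locates the BamHI recognition sequence (GGATCC) in each primer and separates the 5' extension from the template-annealing part. The labels primer_1 and primer_2 and the function name are hypothetical and used only for illustration; which primer anneals next to which gene is not assumed here.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

# Primer sequences as given in the text; the labels are illustrative only.
PRIMERS = {
    "primer_1": "CGCGGATCCAGTCTTTCTTCTTCTTTTTTTTTG",
    "primer_2": "CGCGGATCCTGACTCTGTTCTCTCTCTCTATC",
}

def split_primer(primer, site=BAMHI_SITE):
    """Split a primer into (5' extension, restriction site, annealing part)."""
    index = primer.find(site)
    if index < 0:
        raise ValueError("restriction site not found in primer")
    return primer[:index], site, primer[index + len(site):]

# Both primers start with a short 5' extension followed by the BamHI site.
parts = {name: split_primer(seq) for name, seq in PRIMERS.items()}
```

Because the same site flanks both ends of the amplified fragment, ligation into a single BamHI site can occur in either orientation, which is why the orientation has to be verified afterwards.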

Histochemical localization of GUS activity
GUS activity was localized by staining the tissues with 0.5 mg of 5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid (X-Gluc; Gold Biotechnology, St Louis, MO, USA) per mL in X-Gluc buffer containing 50 mM sodium phosphate (pH 7.2), 10 mM EDTA, 0.33 mg/mL potassium ferricyanide and 0.001% Tween 20. The tissues were vacuum-infiltrated for three rounds of one min each, and staining reactions proceeded overnight at 37°C. Chlorophyll was removed by soaking in ethanol. The photographs were taken with a Leica MZ16F binocular microscope.

Analysis of GUS activity
Quantitative analysis of GUS activity was performed on the whole aerial part using the GUS activity assay [45]. The experiment was performed twice; each treatment had three biological replicates, except the high light treatment, which had four, and each replicate was a pool of 10 Arabidopsis plants.

Immunoblot analysis
To measure the protein levels of 2CPB, 100 mg of tissue was ground to a fine powder in liquid N2 and then homogenized with 0.2 mL of buffer (25 mM Hepes (pH 7.5), 0.6 M mannitol, 0.462 mg/mL dithiothreitol, 2 mM EDTA, 0.175 mg/mL phenylmethylsulphonyl fluoride and 1% (w/v) polyvinylpolypyrrolidone). The homogenates were centrifuged at 15,000 × g for 20 min, and the supernatant protein concentration was determined using BSA as a standard protein as described by [46]. The supernatant was mixed with 10× sample buffer (250 mM Tris-HCl (pH 6.8), 10% SDS, 0.5% bromophenol blue and 20% glycerol), boiled for 5 min, and separated by 12% SDS-PAGE as described earlier [47]. The gels were stained with Coomassie Brilliant Blue R-250. For immunoblotting, the proteins were transferred to nitrocellulose membranes using a Mini Trans-Blot cell (Bio-Rad, CA, USA) at 100 mA for 100 min. The membranes were treated with polyclonal antibody raised against rapeseed 2-Cys peroxiredoxin [48]. Signals on the membranes were visualized with alkaline phosphatase-conjugated goat anti-rabbit IgG (SIGMA, St Louis, MO, USA).
The signal intensities were quantified from the immunoblot using the Gel-Pro Analyzer software (Media Cybernetics Inc, Silver Spring, MD) and normalized to the intensities observed in control conditions. A representative example from three independent experiments is shown.

Arabidopsis promoters length analysis
Annotation data for the Arabidopsis thaliana genes was downloaded from The Arabidopsis Information Resource (TAIR) FTP server (ftp://ftp.arabidopsis.org/Maps/seqviewer_data/sv_gene.data). The analysis was performed on 27,141 genes after filtering out pseudogenes and transposon-related genes (ftp://ftp.arabidopsis.org/Maps/gbrowse_data/TAIR8/TAIR8_GFF3_genes_transposons.gff) from 31,762 annotated genes. Start and stop positions of the transcription units along with information on the strand that encodes an mRNA were extracted. Microsoft Office Excel was used to calculate the distances between the 3' ends of the nearest neighbour genes and the distances between 5' ends of the neighbour genes. The overlapping genes were analyzed only in the graph corresponding to the 3' ends of the nearest neighbour genes and the resulting distances among them were less than zero (shown in Figure 6C, inset).
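The distance calculation for head-to-head gene pairs can be sketched in a few lines. The analysis described above was carried out in Microsoft Office Excel; the Python version below is only an illustrative equivalent under simplifying assumptions: genes are on a single chromosome, only adjacent genes in coordinate order are compared, and the distance is the plain coordinate difference between the two 5' ends. The Gene record, the function name, and the toy coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate of the transcription unit
    end: int     # rightmost coordinate of the transcription unit
    strand: str  # "+" or "-"

def divergent_pairs(genes):
    """Return (left gene, right gene, distance between 5' ends) for adjacent
    head-to-head pairs: a minus-strand gene followed by a plus-strand gene."""
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand == "-" and right.strand == "+":
            # The 5' end of a minus-strand gene is its rightmost coordinate;
            # the 5' end of a plus-strand gene is its leftmost coordinate.
            # A negative value means the two transcription units overlap.
            pairs.append((left.name, right.name, right.start - left.end))
    return pairs

# Toy annotation, not real TAIR coordinates.
toy_genes = [
    Gene("geneA", 100, 500, "-"),
    Gene("geneB", 851, 1500, "+"),
    Gene("geneC", 1600, 2000, "+"),
]
toy_pairs = divergent_pairs(toy_genes)
```

Swapping the strand test to a plus-strand gene followed by a minus-strand gene, and measuring from the left gene's end to the right gene's start, would give the corresponding distances between 3' ends.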