Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene

Background
The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS.

Results
We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion, to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach, where sense, GHRL-derived RNAs are highly expressed.

Conclusion
GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis.


Background
Ghrelin, a hormone with many physiological and pathophysiological roles, was initially described as the endogenous ligand for the growth hormone secretagogue receptor (GHSR 1a), through which it stimulates the release of growth hormone from the anterior pituitary [1]. Ghrelin is primarily produced in the stomach and plays a key role in regulating appetite, gut motility and energy balance [2][3][4][5][6]. Ghrelin is also an autocrine factor in a number of tissues, as it regulates insulin release and has therapeutic potential for inflammatory diseases, heart disease, cancer cachexia, diabetes mellitus and obesity [7]. Despite the importance of ghrelin in a range of physiological systems and pathophysiological conditions, little is known about the regulation of ghrelin synthesis and secretion. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL) [8]. However, the genomic structure, expression pattern and potential function of GHRLOS remain to be investigated. It is not known whether GHRLOS RNA species with open reading frames exist, or whether GHRLOS is a non-coding RNA gene.
There is strong support for the hypothesis that antisense transcripts provide a widespread and important mechanism for the regulation of the human genome [9,10]. Our understanding of the genome is currently undergoing a paradigm shift, as a previously hidden and complex layer of antisense and non-coding RNAs is emerging, which controls gene transcription and translation through a diverse range of mechanisms [11]. Much phenotypic diversity between humans and other species is likely to be due to regulation by RNA [12].
In this study, we examined the genomic structure and organisation of GHRLOS. We have found that GHRLOS spans approximately 44 kb of genomic DNA and transcribes long, 5' capped, polyadenylated RNA species that are extensively spliced and differentially expressed. High levels of GHRLOS expression occur in the brain, testis and thymus, tissues that are emerging as major sites of non-coding RNA expression. We have also examined GHRLOS RNA species in silico, revealing that GHRLOS is a candidate non-coding RNA gene. These data provide a strong basis for further functional studies to determine whether GHRLOS plays a role in the regulation of ghrelin gene expression.

Characterisation of GHRLOS start sites and alternative splicing
An initial aim of this study was to characterise GHRLOS in a range of tissues. As we have previously demonstrated the expression of GHRLOS mRNA transcripts in the human stomach [8], we performed 5' RLM-RACE (RNA ligase-mediated rapid amplification of cDNA ends) on this tissue. Unexpectedly, we identified a number of new exons (exons I-III) [GenBank:EU789528, EU789529, and EU789530] that are 2.5 to 4.8 kb upstream of the previously reported GHRLOS transcription start sites [8] in exon 4* (4*a-c). To simplify the numbering of GHRLOS exons, the previously reported [8] exons 4*, 2**, 2* and -1* have been renamed exons 1 to 4, respectively, while the exons upstream of the reported start sites in exon 1 are denoted by Roman numerals (I-III). Importantly, in the GHRLOS variants demonstrated via 5' RLM-RACE, exon 1 is extended, with a 106 nt region (exon 1d, hereafter termed exon 1) overlapping exon 4 of the ghrelin gene (see [Additional file 1]). The novel first exon, exon I, is 51 bp in size (Fig. 1) and overlaps the 3' untranslated region of the adjacent gene, TATDN2 (TatD DNase domain containing 2, also known as hypothetical protein KIAA0218) [13]. This gene initiates on the same DNA strand as GHRLOS, approximately 32,000 base pairs upstream of the 51 bp exon I of GHRLOS.
In order to determine the polyadenylation site(s) of GHRLOS transcripts, 3' RACE and inverse PCR were conducted using normal stomach and prostate tissues, the  ID 183613). Furthermore, all of the amplicons that we obtained were followed by a stretch of adenosines at the 3' ends that are not present in the genomic sequence. We, therefore, concluded that we had reached a genuine 3' end of

Complex pattern of GHRLOS variant exon expression
In order to examine the size and tissue distribution of GHRLOS transcripts, we performed Northern blot analysis of mRNA from human stomach and 12 normal, human tissues. With a riboprobe designed to span exons I, II, 1 and 2 of GHRLOS, a weak, smeared signal ranging from approximately 1.0 to 2.0 kb in size was observed in the human stomach upon a lengthened exposure time (data not shown). A riboprobe spanning exon 1 alone (a region common to all GHRLOS RNA isoforms) resulted in signals ranging from 1.0 to 5.5 kb in size in the pancreas, prostate, salivary gland and thymus (Fig. 2). Upon longer exposure times, a smear in the 1.0 to 5.5 kb range was seen in all tissues, except for peripheral blood leukocytes and urinary bladder (data not shown). This suggests that GHRLOS is a fully processed transcript consisting of many mRNA isoforms.

GHRLOS exons are highly polymorphic
The RACE, inverse RT-PCR and Northern blotting experiments indicate that GHRLOS is extensively spliced and that isoforms range greatly in size (from approximately 1.0-5.5 kb). To examine the alternative splicing pattern of GHRLOS in greater detail, we performed RT-PCR using a range of human tissues and cell lines.
We used a forward primer common to exon Ia (identified via 5' RLM-RACE) and a reverse primer in a region common to exons 4a and b. Sequence analysis indicated that, with the exception of exons 1 and 4, which are common to all known GHRLOS isoforms, GHRLOS exons are highly polymorphic in size and exon skipping occurs frequently. A representative amplicon banding pattern is shown in [Additional File 2].
The highly variable splicing pattern revealed by RT-PCR is consistent with the diffuse and broad signal which we detected by Northern blotting. In total we obtained 13 different GHRLOS splice variants (Fig. 3). Analysis using GMAP, a genomic mapping and alignment program for mRNA and EST sequences [14], indicated that all GHRLOS exons are flanked by canonical splice donor and acceptor sites (GT/AG), except for exons 3a and 3c where the splice junction is the common non-canonical splice pair GC/AG [15]. GHRLOS exons and introns identified are listed in [Additional File 3].  [16,17]. The CAGE method, therefore, detects the most 5' site of the mRNA transcripts (the transcription start site) and gives an unbiased and comprehensive picture of the positions and usage of transcription start sites [18]. To confirm if this region belongs to GHRLOS, we employed nested RT-PCR using thymus tissue, foetal brain tissue and Hep G2 hepatocarcinoma cell line cDNA (with RNA reverse transcribed using oligo(dT) primers). The forward primers were present immediately downstream of the CAGE tag cluster (which is 1.5 kb upstream of the 51 bp exon I of GHRLOS) and the reverse primers spanned exon 1 (which is common to all known GHRLOS variants). Sequence analysis revealed several novel exon I variants, which were approximately 1601 bp, 994 bp and 526 bp in length. All of these variants spliced into the expected acceptor site of the 106 bp exon 1 ( Fig. 1 and [Additional broad-type, TATA-less promoter that initiates transcription at many sites [19]. The length of GHRLOS transcripts initiating in exon I (present in the 3' UTR of TATDN2) is 1.3-3.6 kb, corresponding in size to the Northern data. However, the potential identity of the approximately 5.5 kb transcript seen in the Northern blot with a full-length, 106 nt exon 1 probe (Fig. 2) was not determined. 
After prolonged exposure of the Northern blot, the 5.5 kb transcript was observed in all tissues except the urinary bladder and uterus (data not shown). TATDN2 is only approximately 1 kb upstream of exon II and 4.5 kb from exon 1 of GHRLOS, suggesting that transcription-induced chimaeras (TICs) may be generated [20] and give rise to large GHRLOS transcripts. Because TICs must contain a first exon of the upstream gene [20], we employed primers in exon 2 (immediately after the start codon) of TATDN2 and exon 1 of GHRLOS (which was also the sequence of our Northern riboprobe).
Using nested RT-PCR (on cDNA reverse transcribed using oligo(dT) primers), we isolated a 2831 bp TATDN2-GHRLOS amplicon from the thymus [GenBank:EU789553] (Fig. 4). This transcript has canonical splice donor and acceptor sites (GT/AG) and splices into the expected acceptor site of the 106 bp exon 1. This variant harbours significant open reading frames corresponding to the TATDN2 protein, but contains alternative exons and a premature termination codon more than 50 bp upstream of the final coding exon 7 of TATDN2 (Fig. 4). This is likely to result in degradation of the mRNA by nonsense-mediated RNA decay (NMD), a surveillance mechanism that detects and degrades mRNA that may encode truncated proteins with dominant-negative or deleterious gain-of-function activities [21].
We have identified several novel GHRLOS transcripts. These include overlapping GHRLOS transcripts initiating in the TATDN2 3' UTR and putative transcription-induced chimaeras of TATDN2 and GHRLOS. These findings extend the previously reported length of GHRLOS by ~37 kb. We propose that GHRLOS harbours several promoters with start sites in exon 1 (of GHRLOS), and in the 3' UTR and the first exon of TATDN2.
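The prediction that the TATDN2-GHRLOS chimaera is an NMD target rests on where the premature termination codon sits relative to the last exon-exon junction. The following is a minimal sketch of that positional check, assuming the commonly cited 50-nucleotide rule. The function name, the coordinate convention and the example exon lengths are hypothetical illustrations and are not taken from this study.

```python
from typing import Sequence


def is_nmd_candidate(stop_codon_end: int,
                     exon_lengths: Sequence[int],
                     threshold: int = 50) -> bool:
    """Apply the 50-nucleotide rule to a spliced transcript.

    stop_codon_end: number of transcript nucleotides up to and including
        the last base of the stop codon (hypothetical convention).
    exon_lengths: lengths of the exons in 5' to 3' order.
    threshold: minimum distance (nt) between the stop codon and the last
        exon-exon junction for the transcript to be flagged.
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon-exon junction.
        return False
    # The last junction lies after all exons except the final one.
    last_junction = sum(exon_lengths[:-1])
    return (last_junction - stop_codon_end) > threshold


# Hypothetical three-exon transcript: the last junction is at position 350.
print(is_nmd_candidate(250, [200, 150, 300]))  # stop 100 nt upstream
print(is_nmd_candidate(320, [200, 150, 300]))  # stop 30 nt upstream
```

A stop codon located in the final exon gives a negative distance and is therefore never flagged, which matches the behaviour expected for a normal termination codon.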

Results of in silico analysis indicate that GHRLOS is a non-coding RNA gene
Sequence analysis of GHRLOS (excluding the putative TATDN2-GHRLOS transcription-induced chimaeras that are likely to result in nonsense mediated decay) reveals that GHRLOS transcripts do not harbour protein coding potential, but rather have several features of non-coding RNA genes. GMAP analysis showed that suggests that GHRLOS has evolved rapidly and may be unique to primates.
Our study suggests that GHRLOS is a non-coding RNA. In silico translation revealed that the heterogeneous GHRLOS RNAs contain multiple stop codons, resulting in a lack of extensive reading frames, and the putative ORFs do not span conserved regions (data not shown). Moreover, no significant sequence similarity to any known proteins was observed (data not shown). Finally, screening of the GHRLOS sequence against a reference collection of repeats (Repbase) using CENSOR [22] identified a 203 bp overlap of exon 4 with the extinct 224 bp MIR3 SINE element, which is present in all vertebrates [23] (Fig. 5). Interestingly, the presence of repeat elements in exons of non-coding RNAs has been reported previously [24][25][26][27].
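The in silico translation described above amounts to scanning each reading frame for start and stop codons and asking how long the resulting open reading frames are. The following is a minimal sketch of such a scan over the three forward frames. It is not the implementation of NCBI ORF Finder or of any other tool used in this study, and the function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_per_frame(seq: str) -> dict:
    """Return, for each forward frame (0, 1, 2), the length in codons of
    the longest ATG-initiated ORF that ends at an in-frame stop codon.

    The stop codon itself is not counted. ORFs that run off the end of
    the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    longest = {}
    for frame in range(3):
        best = 0
        start = None  # index of the current candidate ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
        longest[frame] = best
    return longest


# Hypothetical toy sequence: ATG AAA TTT TAG in frame 0.
print(longest_orf_per_frame("ATGAAATTTTAGCC"))
```

A transcript that is rich in stop codons yields only short ORFs in every frame, which is the pattern reported here for GHRLOS.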

The GHRLOS terminal exon 4 and a putative SEC13 exon overlap in an antisense manner
We have discovered that exon 4 of GHRLOS is also on the opposite strand of a novel terminal exon of the neighbouring SEC13 gene in a tail-to-tail, 3' to 3', fashion (  Fig. 6C and D). The putative C-terminal-coding exon of the SEC13-T isoform appears to be conserved only in primates (data not shown). Therefore SEC13-T alternative splicing is likely to be human-specific or primate-specific.

GHRLOS is expressed in many tissues and cell lines and the level of expression shows great variability
RT-PCR analysis (with primers spanning GHRLOS terminal exons) and Northern blotting (which is only suitable for high copy number transcripts) demonstrated that the size of GHRLOS transcripts is highly variable, resulting in significant transcript heterogeneity. We, therefore, employed a quantitative, real-time RT-PCR approach in order to more precisely gauge the expression of GHRLOS in a range of tissues and cell lines. As the number of alternatively spliced GHRLOS transcripts makes it impossible to generate real-time RT-PCR primers that are unique to each splice variant (data not shown), a strand-specific quantitative RT-PCR assay with primers in exon 4 (which is common to all GHRLOS variants) was designed to detect total GHRLOS RNA expression (Fig. 7A).
The level of total GHRLOS transcript expression varied greatly in different human tissues, with high levels in the thymus, testis, foetal brain, uterus, cerebellum, ovary, thyroid, and whole brain. Very low levels of total GHRLOS RNA expression were detected in the stomach, foetal liver and pancreas (Fig. 7B). In the thymus the level of expression was approximately 133-fold higher than in the stomach (P<0.01) and 110-fold higher than in the foetal liver (P<0.01). Furthermore, the level of expression in the adult liver was 9-fold higher than in the foetal liver (P<0.05), indicating differential expression according to developmental stage in this tissue. The level of GHRLOS in the foetal brain was two-fold higher than in the adult brain, but this was not statistically significant (P>0.05).
Expression of GHRLOS in a number of continuous cell lines was also examined using

Comparison of total GHRLOS and total GHRL expression
We examined the expression levels of the GHRL-GHRLOS cis-natural antisense transcript (cis-NAT) pair via quantitative real-time RT-PCR assays detecting total transcription from the ghrelin gene (GHRL). As expected [31], the highest level of GHRL expression was found in the stomach (data not shown), followed by the testis and pancreas (Fig. 8). When comparing total ghrelin and GHRLOS RNA expression in the stomach, GHRL was expressed at 2300-fold higher levels than GHRLOS, with GHRLOS expression almost undetectable (P<0.001). The levels of GHRLOS expression were higher than GHRL in the thymus, whole brain, the SW1353 chondrosarcoma cell line, uterus and prostate. However, total GHRL RNA levels were higher than GHRLOS in the pancreas and the OVCAR-3 ovarian cancer cell line.
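Fold differences of the kind reported above are often derived from real-time RT-PCR threshold cycle (Ct) values. The following is a minimal sketch that assumes the widely used 2^-ΔΔCt calculation with a reference gene and near-100% amplification efficiency. The text does not state that this was the quantification method used here, and the function name and all Ct values are hypothetical.

```python
def fold_change(ct_target_a: float, ct_ref_a: float,
                ct_target_b: float, ct_ref_b: float) -> float:
    """Relative expression of a target in sample A versus sample B.

    Assumes the 2^-ddCt approach: each target Ct is normalised to a
    reference gene in the same sample, and amplification efficiency is
    taken to be close to 100% (one doubling per cycle).
    """
    delta_a = ct_target_a - ct_ref_a  # normalised Ct in sample A
    delta_b = ct_target_b - ct_ref_b  # normalised Ct in sample B
    return 2.0 ** -(delta_a - delta_b)


# Hypothetical values: the target crosses threshold 3 cycles earlier in
# sample A than in sample B, with identical reference gene Ct values.
print(fold_change(22.0, 18.0, 25.0, 18.0))
```

Under these assumptions a difference of about 7 cycles corresponds to a fold change of roughly 128, which gives a sense of the scale behind differences of two orders of magnitude.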

Discussion
Our study demonstrates that GHRLOS gives rise to long, extensively spliced, mRNA-like, 5' capped and 3' polyadenylated transcripts, suggesting that they are genuine products of RNA polymerase II-mediated transcription [32]. We have shown that the GHRLOS gene gives rise to transcripts 1.0 to 5.5 kb in size and has many broadly distributed transcription start sites (TSSs) (Fig. 9). These include several TSSs in exon 1, TSSs overlapping the 3' UTR of TATDN2 and evidence of transcription-induced chimaeras employing TSSs in the first exon of TATDN2. The ghrelin locus, therefore, gives rise to many antisense transcripts that are currently annotated as a single gene, GHRLOS. A well-described example of such complex architecture is the imprinted murine Gnas locus, which gives rise to multiple coding and non-coding sense and antisense transcription units [33].
Promoters in 3' untranslated regions (3' UTRs) were first reported in 1987 [34] and were originally thought to be a rarity. It has recently been established that they are not uncommon and may be transcribed and regulated independently of their upstream "host gene" [35]. The ENCODE (ENCyclopedia Of DNA Elements) consortium has recently demonstrated that two-thirds of the loci in their dataset contain new putative first exons, which frequently overlap upstream genes [36]. However, it is currently not known how promoters that overlap 3' UTRs are regulated and coordinated [37]. We have demonstrated GHRLOS transcription start sites in the 3' UTR of TATDN2 via RLM-RACE. The sequence upstream of exon I of GHRLOS contains no apparent TATA boxes (data not shown), indicating that GHRLOS has a broad-type promoter, with many potential transcription start sites in the 2.1 kb 3' UTR of TATDN2. This may allow the transcription of numerous tissue-specific and developmental stage-specific transcripts [19]. In addition, multiple CAGE tags are present in the 3' UTR of TATDN2, indicating that GHRLOS transcripts initiate in this region.
Interestingly, we also report the joining of exons of the neighbouring genes TATDN2 and GHRLOS. Similar chimaeric transcripts (not caused by chromosomal translocation) have been reported in lower eukaryotes [38,39], but were until recently assumed to be relatively rare in mammals [20,[40][41][42][43]. It is not known how chimaeric transcripts arise, but transcriptional read-through, followed by canonical cis-splicing is the most likely mechanism [40]. Alternatively, chimaeric transcripts could arise through trans-splicing, but the existence of this mechanism has not been well- We previously reported that GHRLOS completely overlaps the ghrelin (GHRL) gene [8]. Here we also show that the 3' terminal exon 4 of GHRLOS is present on the opposite (antisense) strand to a novel, 3' terminal SEC13 exon (Fig. 8). SEC13 is a protein that forms a part of the coat protein complex II (COPII) [28]. COPII proteins are required for the trafficking of nascent proteins from the endoplasmic reticulum (ER) to the Golgi apparatus. It also plays a role in the selection and concentration of cargo proteins for transport [29]. SEC13 [28], therefore, has a core endocrine function. GHRLOS overlaps a novel SEC13 variant, SEC13-tentative (SEC13-T). antisense transcripts may regulate one or both of these genes.

GHRLOS, a candidate non-coding RNA
Non-coding RNAs are frequently not conserved between species, suggesting that they are either biological noise (non-functional transcription), or that they have species-specific functions. Species-specific non-coding transcripts have been observed and there is strong evidence that non-coding transcripts are functionally significant [48][49][50][51][52]. Interestingly, it has recently been observed, using in silico analysis, that non-coding RNAs are not conserved even between closely related Drosophila species [53].
While the number of protein-coding genes in distant eukaryotes (such as worms, mice, and humans) is approximately equal, the relative amount of non-coding DNA increases in proportion to eukaryotic complexity [54,55]. Mattick and colleagues explain this paradox by hypothesising that non-coding RNAs have evolved to enable the emergence of organisms with increasingly complex higher-level functions [55,56]. Our in silico analysis suggests that GHRLOS exons show very low sequence conservation in vertebrate species. Furthermore, exon 4 of GHRLOS contains a transposable element, a feature observed in many non-coding RNAs [25][26][27].
Our bioinformatic studies indicate that GHRLOS does not encode a protein and, therefore, is a non-coding RNA. Taking into account the full-length sequence of suggested that many small peptides may be translated [57], the majority of small peptides are processed from larger precursor proteins, as is ghrelin itself which is processed from preproghrelin. Therefore, while it cannot be excluded that GHRLOS encodes short ORFs (that are not conserved in the mouse) it appears unlikely that GHRLOS encodes biologically active peptides.
We examined the GHRLOS expression profile to strengthen the hypothesis that it is a candidate non-coding RNA. It has been demonstrated that mammalian long non-coding RNAs are expressed in a tissue-specific manner, indicating that they are biologically significant [50][58][59][60][61]. In both humans and mice, the major tissues of non-coding RNA expression are the complex organs: the brain, testis, and thymus [50,61]. This also holds true in Drosophila, where the majority of the candidate non-coding RNAs are expressed in the central nervous system [53]. Indeed, non-coding RNAs are emerging as important regulators of complex systems, such as the central nervous system (brain), and of intricate processes, including spermatogenesis in the testis [62][63][64][65].
We demonstrated high levels of GHRLOS in the thymus, brain, testis, uterus and ovary, while the expression levels in the stomach, where GHRL is highly expressed [1], were almost undetectable. Our data demonstrate that GHRLOS is predominantly expressed in a limited number of tissues and cell types, suggesting that these transcripts have physiological functions in distinct cell types and tissues.
Moreover, real-time RT-PCR showed extremely low levels of GHRLOS in the foetal and adult liver and high levels in the Hep G2 hepatocarcinoma cell line, suggesting that GHRLOS expression may be altered in liver cancer. Indeed, it has recently been reported that non-coding RNA expression is frequently altered in cancer [58,66]. This indicates that non-coding RNAs may have specific functions in normal cells. Much like protein-coding transcripts, ncRNAs may act as tumour suppressors, or be upregulated in cancer and act as oncogenes. An examination of GHRLOS expression in cancer would, therefore, be of great interest.
Here we have characterised the structure and organisation of GHRLOS, a ghrelin antisense gene, suggesting that GHRLOS has multiple first exons. Therefore, it is possible that GHRLOS could be a part of very large, continuous ncRNA species in the 3p25 chromosomal region and beyond. In the absence of hallmarks, such as large open reading frames, mapping complex non-coding RNA genes remains a complex task. For example, two RNAs in the FMR1 locus, the cis-NAT ASFMR1 [67] and the non-coding RNA gene FMR4 (found just upstream of FMR1 [51]) may be one continuous RNA isoform [51].

What is the function of GHRLOS?
It is currently difficult and costly to determine the mechanism of function of long non-coding RNAs. The roles of non-coding RNAs are likely to be diverse, and there is strong evidence that they play a role in regulating important pathways. Many ncRNAs are expressed during development, neural differentiation and macrophage activation, and in cancer, indicating that they have key functions in these processes [12,63,68]. Non-coding RNAs have been found to play a role in the silencing of overlapping genes in cis [69], in the silencing of distant chromosome regions in trans [70], in nuclear trafficking [71], apoptosis [51,72] and promoter repression [73], and can act as tumour suppressors [74]. Moreover, ncRNAs are emerging as markers for complex human disease, including lung cancer [75], heart disease [76], and a range of other pathologies [68,77]. GHRLOS may also serve as a host gene for snoRNA (small nucleolar RNA) genes [78], or GHRLOS RNA transcripts may be precursors for short RNAs, such as microRNAs [77], endogenous siRNAs [79,80], piRNAs [81] and other novel, short non-coding RNA species [82,83].
Although the understanding of natural antisense transcripts (NATs) remains in its infancy, they have been associated with a range of regulatory mechanisms that are not necessarily mutually exclusive. This includes transcriptional interference, RNA masking and dsRNA mediated gene-silencing via direct interaction between the sense and antisense transcripts [9,84,85].
While it is difficult to predict GHRLOS function, the fact that all spliced GHRLOS variants share exon 1, which overlaps the 3' untranslated exon 4 of GHRL is striking.
Our findings suggest that GHRLOS functions as a non-coding RNA. There are a few examples that suggest that antisense transcripts are important in the regulation of endocrine hormone receptors, including a thyroid hormone receptor [86,87] and the luteinising hormone/choriogonadotropin receptor [88], and in the regulation of growth factors [89]. Interestingly, it has been recently suggested that the invertebrate (insect) polypeptide hormone allatostatin may be regulated by cis-NATs [90]. To date, however, the physiological and pathophysiological roles of natural sense/antisense pairs have not been elucidated for any vertebrate endocrine hormone.
Further studies are necessary to reveal the function and molecular mechanisms regulating the candidate non-coding RNA gene GHRLOS.

Conclusions
In the present study, we have characterised GHRLOS, which gives rise to endogenous ghrelin natural antisense transcripts. GHRLOS exhibits features which are common to many non-coding RNA genes, including extensive splicing, lack of significant and conserved open reading frames, differential expression and lack of conservation in vertebrates. Our data also reveal that GHRLOS contains multiple first exons and that it overlaps both GHRL and a novel SEC13 exon in the antisense direction, suggesting that GHRLOS may have a role in regulating these genes. Moreover, we report TATDN2-GHRLOS chimaeras that may function to regulate the translation of the putative DNase TATDN2. Additional studies are underway to elucidate the functions of GHRLOS and to investigate, in particular, its overlapping genomic arrangement with the ghrelin gene. These studies may provide a new, physiologically relevant model system for investigating the roles of antisense gene and non-coding RNA regulation and the mechanisms involved, as well as establishing whether GHRLOS RNAs may be useful markers for diagnosis and prognosis of complex disease.

Bioinformatics
Multiple sequence alignments were generated using the MUltiple sequence Local AligNment and conservation visualization tool (Mulan) [  To locate transcription start sites in the putative first exons of GHRLOS, CAGE (Cap Analysis of Gene Expression) tags (deposited by the RIKEN consortium and its collaborators) were obtained via the Genome Network Platform Viewer [95]. We then recovered the RNA library information for each CAGE tag starting site. Briefly, each CAGE tag was individually queried against the 1.4 GB CAGE tag sequencing file, (release date 13.11.2006) available on the Genome Network Platform website using the UNIX grep command [96].
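The tag lookup described above was done with the UNIX grep command. The following is a minimal Python sketch of the same idea: stream a large text file line by line and keep the lines that contain a given tag sequence. The tab-separated record layout, the function names and the example records are hypothetical, as the format of the CAGE tag file is not described in the text.

```python
from typing import Iterable, Iterator


def grep_lines(lines: Iterable[str], tag: str) -> Iterator[str]:
    """Yield every line that contains `tag` as a literal substring.

    This is fixed-string matching (comparable to `grep -F`), which is
    sufficient for plain nucleotide sequences.
    """
    for line in lines:
        if tag in line:
            yield line.rstrip("\n")


def grep_file(path: str, tag: str) -> Iterator[str]:
    """Stream a large file from disk without loading it into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        yield from grep_lines(handle, tag)


# Hypothetical records: identifier, tag sequence, RNA library name.
records = [
    "tag1\tACGTACGTACGTACGTACGT\tlibrary_A\n",
    "tag2\tTTTTGGGGCCCCAAAATTTT\tlibrary_B\n",
]
for hit in grep_lines(records, "ACGTACGT"):
    print(hit)
```

Reading the file as a stream keeps memory use constant, which matters for a file of the size mentioned above (1.4 GB).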
The exon-intron-structure of ESTs and mRNA entries identified from BLAST searches, as well as sequenced PCR amplicons obtained in this study, were analysed against the human genome (NCBI release 35) using GMAP [14]. Presence of open reading frames was analysed by NCBI ORF Finder [97], Fickett's TestCode [98,99] and ESTScan2 [100,101]. The presence of transposable elements in GHRLOS sequence was examined using CENSOR v4.2.8 [22]. Protein domain analysis was performed using the SMART database [102].
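Once exon-intron structures have been obtained from a genomic alignment, each intron can be checked for the canonical GT/AG splice pair or the common non-canonical GC/AG pair mentioned in the results. The following is a minimal sketch of such a check on an intron sequence given in the orientation of the transcript. It is an illustration only and not how GMAP itself reports splice sites. The function name and the example sequences are hypothetical.

```python
def classify_intron(intron_seq: str) -> str:
    """Classify an intron by its first two and last two nucleotides.

    The sequence must be given 5' to 3' on the transcribed strand, from
    the first base after the upstream exon to the last base before the
    downstream exon.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence is too short to classify")
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical GT/AG"
    if (donor, acceptor) == ("GC", "AG"):
        return "non-canonical GC/AG"
    return "other"


# Hypothetical toy introns.
print(classify_intron("GTAAGTTTTCAG"))
print(classify_intron("GCAAGTTTTCAG"))
```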

Cell culture and RNA extraction
The following cell lines (originally obtained from the American Type Culture

5' and 3' RACE mapping of GHRLOS transcripts
To further characterise the 5' end of the putative ghrelin antisense RNAs, 5' RACE was undertaken using FirstChoice RLM-RACE-Ready human stomach cDNA (Ambion) according to the manufacturer's instructions. The first round PCR was performed with an adapter-specific sense primer (5'adapter-out-F, Table 1) and an exon 2-specific antisense primer (5'OS-out-R in Table 1). PCR product (1 µl) was used in a secondary, nested PCR with a gene specific primer in exon 2 (5'adapter-in-F and 5'OS-in-R, Table 1). PCRs were performed in a total reaction volume of 50 µl  Table 1) from the FirstChoice RLM-RACE Kit (Ambion). 3' RACE was performed with 2 µl of this cDNA. Two 3' RACE reactions were performed -one combined an exon 4 GHRLOS-specific forward primer and an adapter-specific reverse primer (3'4F and 3'2OR, Table 1), and the other used an adapter-specific reverse primer and an exon 2-specific forward primer 3'2OR/F, Table 1). PCR products were then diluted and used in a secondary, nested PCR with a gene-specific forward and a reverse adapter primer (3'2IF/R, Table 1). PCR products were purified using a High

Rolling Circle Amplification Rapid Amplification of cDNA Ends (RCA-RACE)
To simultaneously obtain the 5' and 3' ends of GHRLOS transcripts, we employed Rolling Circle Amplification-RACE (Rapid Amplification of cDNA Ends) [103], an improved inverse PCR approach. Briefly, 3 µg stomach, prostate, RWPE-1 cell line and PC3 prostate cancer cell line total RNA were reverse transcribed using 10 U of Transcriptor reverse transcriptase (Roche Applied Science) and 100 µM HPLC-purified 5'-end phosphorylated oligo d(T)-adapter primer (Phospo-dT, Table 1) (Proligo, Boulder, CO) according to the manufacturer's instructions. The single-stranded cDNA was purified using a High Pure PCR purification kit (Roche Applied Science) and eluted in 50 µl elution buffer (10 mM Tris-HCl, pH 8.5). Next, 25 µl purified linear cDNA was circularised using 100 U of CircLigase (EPICENTRE Biotechnologies, Madison, WI) and purified as before. After self-ligation, 15 µl circular cDNA was added to a rolling circle amplification reaction with 10 U of φ29 DNA polymerase (NEB) and 10 µM HPLC-purified random hexamer primers with two phosphorothioate linkages on their 3' ends (Pthioate-hex, Table 1) (Proligo).  Table 1). After 35 cycles at a 60 °C annealing temperature, the outer PCR product was diluted 100 times in water and 1 µl was used in a hemi-nested PCR of 20 cycles, with annealing at 60 °C (IPCR-in-F and IPCR-ALL-R, Table 1). Amplification products were eluted from agarose gels in 50 µl water overnight, reamplified, cloned into pCR-XL-TOPO (Invitrogen), transformed into One Shot MAX Efficiency DH5α-T1R chemically competent cells (Invitrogen) and sequenced at the Australian Genome Research Facility (AGRF, Brisbane, Australia).

Northern blot hybridisation
Initially, a cRNA probe spanning exons I, II, 1 and 2 of GHRLOS was employed.
Briefly, a 5' RACE clone in pGEM-T Easy was linearised with SalI restriction enzyme and a cRNA probe was synthesised using T7 RNA polymerase and a digoxigenin (DIG) RNA labelling kit (Roche Applied Science). Probe concentration was estimated by dot blot comparison with digoxigenin-labelled standards. Stomach poly(A)+ RNA (500 ng; FirstChoice, Ambion) was separated on a 1.2% formaldehyde gel and blotted, as described previously [104]. Samples were electrophoresed with 50 ng RNA Molecular Weight Marker II (Roche Applied Science). The blot was hybridised to 50 ng/mL DIG-labelled cRNA probe overnight.
Prehybridisation and hybridisation were performed with DIG-Easy Hyb (Roche Applied Science) at 65 °C. The membranes were washed twice for 5 min at room temperature with 1 × Saline-Sodium Citrate (SSC), 0.1% sodium dodecyl sulfate (SDS) and then washed three times for 10 min at 65 °C with 0.1 × SSC, 0.1% SDS.
The membrane was then reacted with an alkaline phosphatase (AP)-conjugated anti-DIG antibody (Roche Applied Science). AP activity was detected by chemiluminescence using CDP-Star (Roche Applied Science).
A second cRNA probe, which spanned exon 1 (common to all known GHRLOS mRNA isoforms), was synthesised from 100 ng human stomach genomic DNA (BioChain, Hayward, CA) using the PCR method [103] (Ex1-cRNA-F/R, Table 1). The PCR product was purified using a High Pure PCR Product Purification Kit (Roche Applied Science) and the DIG-labelled cRNA probe was synthesised and quantified as detailed above. A multi-tissue membrane containing poly(A)+ RNA from 12 human tissues (brain, duodenum, oesophagus, pancreas, PBL/leukocytes, prostate, salivary gland, testis, thymus, thyroid, urinary bladder and uterus) was purchased from OriGene (Rockville, MD). Prehybridisation and hybridisation were performed as described above, except that ULTRAhyb Ultrasensitive Hybridization Buffer (Ambion) was used instead of DIG-Easy Hyb (Roche Applied Science).
Equivalent loading between tissues on the blot was determined by rehybridising with 20 ng/mL DIG-labelled β-actin cRNA probe (Roche Applied Science).

Isolation of alternatively spliced GHRLOS mRNAs via non-quantitative RT-PCR
For non-quantitative RT-PCR analysis of GHRLOS splicing, RT-PCRs were performed with a forward primer in a region common to the 5' terminal exon Ia/b and a reverse primer in the 3' terminal exon 4 of GHRLOS (Ito4-F/R, Table 1). cDNA was synthesised in a final volume of 20 µl from 3 µg total RNA from tissues and cell lines using 10 U of Transcriptor reverse transcriptase (Roche Applied Science), 20 U of RNasin Plus RNase Inhibitor (Promega) and a 3' RACE adapter primer (3′-RACEadapter, Table 1) at 55 °C according to the manufacturer's instructions. PCR amplicons from the stomach, prostate, foetal brain, heart, thymus, testis, and pancreas were purified, sub-cloned and sequenced as described above.

Long-range RT-PCR to detect putative chimaeric TATDN2-GHRLOS transcripts
To detect long, chimaeric transcripts, we employed RT-PCR with a forward primer in exon 2 of TATDN2 (ChiOut-F, Table 1) and a reverse primer in exon 1 of GHRLOS (ChiOut-R, Table 1). cDNA was synthesised as above in a final volume of 20 µl from 2 µg total RNA from the Hep G2 hepatocarcinoma cell line, the CaCo-2 colorectal adenocarcinoma cell line, the OVCAR-3 ovarian cancer cell line, and a range of normal tissues (testis, prostate, pancreas, thymus, and foetal brain). PCR was carried out with 1 U of Platinum Taq HIFI polymerase (Invitrogen) as per the manufacturer's instructions, extending at 68 °C for 2.5 minutes per cycle. RT-PCR products were subcloned and sequenced as described above.

CAGE-aided cDNA primer walking
To determine whether the identified upstream CAGE tag start sites give rise to exons that belong to GHRLOS, we employed RT-PCR using a forward primer designed to the region immediately after a CAGE cluster in the ~2 kb 3' untranslated region of the adjacent gene TATDN2 (TSS ID T03F009D1927) and a reverse primer in exon 1, an exon which is common to all known GHRLOS variants (F_CAGE and R_CAGEout in Table 1).

Identification of novel SEC13 exon
To verify the presence of a novel SEC13 exon identified in a brain tumour EST [GenBank:BF931280], cDNAs reverse transcribed with an oligo(dT) primer (as described above) were challenged by RT-PCR with a forward primer in exon 8 of SEC13 and a reverse primer in the novel exon (231-F/R, Table 1).

Strand-specific, quantitative real-time RT-PCR
To allow strand-specific and RNA-specific amplification [103,105,106] of GHRLOS transcripts, reverse transcription was performed using a gene-specific primer in exon 4 with a linker (LK) [107] sequence attached to the 5' end of the primer (GHRLOS-Real-RT-LK, Table 1). cDNA was generated from 1 µg total RNA using 40 U of AMV reverse transcriptase (Roche Applied Science) at 42 °C, according to the manufacturer's instructions. The strand-specific, real-time RT-PCR was performed with an exon 4-specific forward primer, a reverse primer consisting of the LK sequence only (GHRLOS-Real-F and LK, Table 1) and a TaqMan probe (Ex4-TaqMan, Table 1). To detect sense GHRL transcripts, we employed a strand-specific RT-PCR approach, with a reverse transcription primer spanning the 3' terminal exon 4 of the ghrelin gene (GHRLex4_RT_LK, Table 1) followed by PCR with an exon 4-specific forward primer (GHRLex4_F, Table 1) and a linker-specific reverse primer (LK, Table 1).
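The logic of the linker-tagged approach can be sketched in a few lines: only cDNA primed by the linker-tagged reverse transcription primer carries the LK sequence, so a PCR that uses LK as its reverse primer cannot amplify self-primed cDNA, cDNA from the opposite strand, or genomic DNA. The transcript and linker sequences below are hypothetical placeholders, not those in Table 1, and the model ignores mismatches and annealing kinetics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_strand(transcript, site_start, site_end, linker=""):
    """First-strand cDNA (5'->3') primed at transcript[site_start:site_end].

    The primer is the reverse complement of the priming site, optionally
    carrying a linker on its 5' end; extension copies the upstream transcript.
    """
    primer = linker + revcomp(transcript[site_start:site_end])
    extension = revcomp(transcript[:site_start])
    return primer + extension


def amplifiable(cdna, forward, linker):
    """True if a forward primer and a linker-only reverse primer can amplify cdna."""
    second_strand = revcomp(cdna)  # made when the forward primer is extended
    fwd_pos = second_strand.find(forward)
    lk_pos = second_strand.find(revcomp(linker))  # site bound by the LK primer
    return fwd_pos != -1 and lk_pos != -1 and fwd_pos < lk_pos


# Hypothetical sequences, for illustration only (transcript written as DNA).
TRANSCRIPT = "ATGGCCTTAGCAGGTCCATTGACCGGATATCGGCTAAGCTTGCA"
LINKER = "GACTCGAGTCGACATCG"
FORWARD = TRANSCRIPT[4:22]

tagged_cdna = first_strand(TRANSCRIPT, 26, 44, linker=LINKER)
untagged_cdna = first_strand(TRANSCRIPT, 26, 44)  # e.g. self-primed cDNA

print("tagged cDNA amplifies:  ", amplifiable(tagged_cdna, FORWARD, LINKER))
print("untagged cDNA amplifies:", amplifiable(untagged_cdna, FORWARD, LINKER))
```

The same reasoning applies to the sense GHRL assay, where the linker is attached to a primer complementary to the opposite strand.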
GHRLOS and GHRL transcript levels were estimated by direct normalisation to the threshold cycle (CT) of the housekeeping gene, 18S ribosomal RNA (18S-Real-F/R, Table 1). As reported for GAPDH [108], 18S rRNA self-primes efficiently in reverse transcription reactions without the addition of random or gene-specific primers. All primers were designed using the Primer Express version 2.0 software (AB).
PCRs were performed in a total reaction volume of 20 µl using Platinum Quantitative PCR SuperMix-UDG w/ROX (Invitrogen) for GHRLOS, while GHRL and the housekeeping gene 18S ribosomal RNA were amplified using 2× SYBR green master mix (AB). Controls included cDNA reverse transcribed with random hexamer primers, as well as RNA reverse transcribed in the absence of primer. Real-time RT-PCR was performed using the AB 7000 sequence detection system (AB) and data were analysed using the absolute standard curve method (User Bulletin #2, AB) to determine expression levels in a range of tissues and cell lines. Briefly, values for each sample were calculated from duplicate reactions using standards constructed from PCR products. Statistical significance was determined using the Student's t-test and, where applicable, one-way analysis of variance (ANOVA) with Tukey post-hoc analysis. P-values of <0.05 were considered to be statistically significant. Data are presented as mean ± standard deviation (S.D.).
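A minimal sketch of standard-curve quantification follows, assuming a log-linear relationship between template quantity and CT and assuming that the target quantity is divided by the 18S quantity for normalisation. The standard quantities and CT values are invented for illustration and are not data from this study; the function names are likewise hypothetical.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of CT = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate a template quantity from a CT value on the standard curve."""
    return 10 ** ((ct - intercept) / slope)


def mean_quantity(ct_replicates, curve):
    """Mean interpolated quantity across replicate reactions."""
    slope, intercept = curve
    values = [quantity_from_ct(ct, slope, intercept) for ct in ct_replicates]
    return sum(values) / len(values)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 corresponds to doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Invented ten-fold dilution series of a PCR-product standard (copies, CT).
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [33.36, 30.04, 26.72, 23.40, 20.08]
curve = fit_standard_curve(standard_copies, standard_cts)

# Invented duplicate CT values for one sample; the same curve is reused for
# the reference gene purely to keep the example short.
target = mean_quantity([26.72, 26.72], curve)
reference = mean_quantity([20.08, 20.08], curve)
normalised = target / reference

print("slope, intercept:", curve)
print("efficiency:", amplification_efficiency(curve[0]))
print("normalised expression:", normalised)
```

In practice each amplicon would have its own standard curve, and the group comparisons described above (t-test, ANOVA with Tukey post-hoc analysis) would be run on the normalised values.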