Identification, characterization and expression of novel Sex Hormone Binding Globulin alternative first exons in the human prostate

Background The human Sex Hormone Binding Globulin (SHBG) gene, located at 17p13.1, comprises at least two different transcription units regulated by two different promoters. The first transcription unit begins with the exon 1 sequence and is responsible for the production of plasma SHBG by the hepatocytes, while the second begins with an alternative exon 1 sequence, which replaces the exon 1 present in liver transcripts. Alternative exon 1 transcription and translation have only been demonstrated in the testis of transgenic mice containing an 11-kb human SHBG transgene and in the human testis. Our goal has been to further characterize the 5' end of the SHBG gene and analyze the presence of the SHBG alternative transcripts in human prostate tissue and derived cell lines.
Results Using a combination of in silico and in vitro studies, we have demonstrated that the SHBG gene, along with exon 1 and alternative exon 1 (renamed here exon 1A), contains four additional alternative first exons: the novel exons 1B, 1C, and 1E, and a previously identified exon 1N, which has been further characterized and renamed as exon 1D. We have shown that these four alternative first exons are all spliced to the same 3' splice site of SHBG exon 2, and that exon 1A and the novel exon 1B can be spliced to exon 1. We have also demonstrated the presence of SHBG transcripts beginning with exons 1B, 1C and 1D in prostate tissues and cell lines, as well as in several non-prostatic cell lines. Finally, the alignment of the SHBG mammalian sequences revealed that, while exons 1C, 1D and 1E are very well conserved phylogenetically across non-primate mammalian species, exon 1B probably arose in apes due to a single nucleotide change that generated a new 5' splice site in exon 1B.
Conclusion The identification of multiple transcription start sites (TSS) upstream of the annotated first exon of human SHBG, and the detection of the alternative transcripts in human prostate, concur with the prediction of the ENCODE (ENCyclopedia of DNA Elements) project, and suggest that the regulation of SHBG is much more complex than previously reported.


Background
Sex hormone-binding globulin is a dimeric glycoprotein that transports sex steroids in the blood and regulates their access to target tissues [1]. The human SHBG gene is located on the short arm of chromosome 17 (17p13.1), in a region characterized as a hotspot for genetic recombination, gene amplification, and integration of foreign genomes [2,3]. In humans, the SHBG gene comprises a minimum of two different transcription units regulated by at least two different promoters [4]. The first transcription unit is responsible for the production of plasma SHBG by the hepatocytes, and begins with the exon 1 sequence, which encodes a leucine-rich secretion signal peptide. This transcription unit is regulated by the promoter 1 sequence, which contains several binding sites for liver-enriched transcription factors [4,5]. The second transcription unit begins with the alternative exon 1 sequence, which replaces the exon 1 present in the liver SHBG transcripts, and is regulated by an alternative promoter sequence [4,6] that proved to be very active when it was transfected into the GC2 mouse germ cell line [7]. The alternative exon 1 is found approximately 1.9 kb upstream of the exon 1 sequence, and does not contain an ATG in frame with the SHBG nucleotide coding sequence. It has been hypothesized that transcripts beginning with the alternative exon 1 could potentially initiate translation at the first in-frame ATG found in exon 2, which encodes the methionine 30 of the mature plasma protein [4,7].
The presence of SHBG mRNA has been demonstrated in human liver, brain, cardiac myocytes, adrenal glands, testis, prostate, mammary glands, placenta, fallopian tube, endometrium and granulosa-lutein cells of the ovary [6,8-14]. However, transcription and translation of SHBG alternative exon 1 have only been shown in the testis of transgenic mice containing an 11-kb human SHBG transgene and in the human testis [6], resulting in an SHBG isoform that binds androgens and estradiol with high affinity, and accumulates in the acrosome of developing sperm [4].
In the prostate, the presence of SHBG mRNA and protein has been described in epithelial and stromal cells [15,16]. However, the transcription unit responsible for the synthesis of these SHBG mRNAs has not been characterized and it is unknown whether the protein found in the human prostatic tissues is translated locally or results from extravascularization of the liver secreted protein, as described in other tissues [17]. Our goal in the present study has been to identify novel SHBG TSSs since, as indicated by the ENCODE project, about two-thirds of the genes in the 1% of the analyzed human genome had unannotated 5' extensions [18], and determine which SHBG transcription units are active in human prostate.

Cell cultures
Human prostate cancer cell lines LNCaP, PC3, DU-145 and PZ-HPV-7 were obtained from the American Type Culture Collection (ATCC, Rockville, MD). LNCaP, PC3 and DU-145 cells were maintained in RPMI 1640 medium (PAA Laboratories, Pasching, Austria), containing 10% fetal calf serum (PAA Laboratories), and supplemented with penicillin/streptomycin, sodium pyruvate and modified Eagle medium with non-essential amino acids as recommended. PZ-HPV-7 cells were grown in Keratinocyte-SFM Medium (Invitrogen, Carlsbad, CA), supplemented with 2.5 μg of EGF and 25 mg of bovine pituitary extract (both from Invitrogen). The hepatocarcinoma cell line HepG2 and the kidney carcinoma cell line Hek 293 were also obtained from ATCC and were grown and maintained in DMEM (PAA Laboratories) containing 10% fetal calf serum, and also supplemented with penicillin/streptomycin, sodium pyruvate and modified Eagle medium with non-essential amino acids.

Human prostate samples
Human prostate samples were obtained from the nontumoral tissue of patients with prostate carcinoma at the T2/T3N0M0 stage, submitted to radical prostatectomy. The informed consent was obtained in all cases, in keeping with Institutional Ethics Committee requirements. The histology of the prostate specimens was evaluated by the urological pathologist (Dr. de Torres).

5' Rapid Amplification of cDNA Ends (RACE)
5'RACE was performed using the FirstChoice® RLM (RNA Ligase Mediated)-RACE Kit (Ambion, Austin, TX) with 10 μg of total RNA from DU-145 and LNCaP cells as template, and reverse primers complementary to the fifth (5' TGAGATCTCGGCCTGTTTGTC 3') and third (5' AGGCCTGCCGTCTCGAAGTCCC 3' for nested PCR) exons of the SHBG gene. The first round of PCR amplification consisted of 40 cycles of denaturation at 94°C for 20 sec, annealing at 59°C for 30 sec, and extension at 72°C for 45 sec. Nested PCR amplification was performed under the same conditions for 36 cycles. PCR products were cloned into the pCR® 2.1-TOPO using the TOPO TA Cloning® kit (Invitrogen). The inserts were amplified by PCR, purified with the QIAquick Gel Extraction Kit (Qiagen) and sequenced using an ABI Prism 3100 genetic analyzer (Perkin-Elmer Corp., Wellesley, MA).
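As a quick sanity check on the two reverse primers quoted above, the following minimal sketch computes their length, GC fraction and reverse complement (the sense-strand sequence each primer anneals to). The function and constant names are illustrative assumptions, not part of the original protocol, and no melting-temperature model is implied.

```python
# Reverse primers as quoted in the Methods, written 5'->3'.
EXON5_REVERSE = "TGAGATCTCGGCCTGTTTGTC"          # first-round primer (exon 5)
EXON3_NESTED_REVERSE = "AGGCCTGCCGTCTCGAAGTCCC"  # nested primer (exon 3)

# Translation table for Watson-Crick complements (hypothetical helper name).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, primer in [("exon 5", EXON5_REVERSE), ("exon 3", EXON3_NESTED_REVERSE)]:
        print(name, len(primer), round(gc_fraction(primer), 3), reverse_complement(primer))
```

The reverse complement is the sequence one would search for in the sense strand of exon 5 or exon 3 when checking primer placement.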

Identification of orthologous SHBG first exons in different species
To identify the orthologous sequences of alternative human SHBG first exons, we used the blastn, discontiguous megablast and megablast web interfaces of NCBI (expected threshold: 10; match/mismatch scores: 2,-3; gap costs: existence 5 extension 2), the NCBI Trace

RNA extraction, RT-PCR and Real-time PCR
Total RNA was isolated from LNCaP, PC3, PZ-HPV-7, HepG2 and Hek 293 cell lines, and from human prostate tissue samples, using the RNeasy Mini/Midi Kit 50 (Qiagen, Hilden, Germany). Total RNA from rhabdomyosarcoma cell lines CW 9019 and RH 30, and from the neuroblastoma cell line imr 32, was kindly provided by Dr. Josep Roma, and total RNA from the breast cancer cell lines MDA-MB 468, BT 474 and T47D was kindly provided by Dr. Maurizio Scaltriti. Two μg of RNA from each sample were reverse transcribed using Superscript II H- (Invitrogen), at 42°C for 50 min. One μl of the resulting cDNA was amplified in a 25 μl reaction in the presence of Taq polymerase (Ecogen, Barcelona, Spain) or TaKaRa LA Taq™ (Takara Bio Inc., Shiga, Japan). The PCR amplification was performed under non-saturating conditions using the primer pairs described in Table 1. Each PCR was performed in triplicate. The PCR products were resolved by electrophoresis in a 1.5% agarose gel and purified, cloned and sequenced as described above.
For real-time PCR, one μl of the cDNA was amplified in a 20 μl reaction using the Quantitect™ SYBR® Green PCR kit (Qiagen), with forward primers that recognized exons 1A, 1B, 1C and 1D and a common reverse primer for exon 3 (Table 1). The reactions were performed in triplicate, using the universal thermal cycling parameters (Applied Biosystems). Data were calculated as the means ± SE for each SHBG alternative exon. The relative expression levels were calculated in relation to the levels of alternative exon 1A, according to the formula 2^(-ΔCT), where ΔCT is the difference in threshold cycle (CT) values between the target and the internal control (S18 gene). Differences were assessed using one-way ANOVA, and P values < 0.05 were considered significant.
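The relative quantification described above can be sketched as follows. This is a minimal illustration, assuming that each transcript level is 2 raised to minus its ΔCT (target CT minus S18 CT) and that levels are then divided by the exon 1A level. All CT values below are hypothetical placeholders, not data from this study, and the function names are our own.

```python
from typing import Dict


def expression_level(ct_target: float, ct_control: float) -> float:
    """Return 2^(-dCT), where dCT = CT(target) - CT(internal control)."""
    return 2.0 ** -(ct_target - ct_control)


def relative_to_reference(levels: Dict[str, float], reference: str = "1A") -> Dict[str, float]:
    """Express every level as a ratio to the reference transcript (exon 1A by default)."""
    base = levels[reference]
    return {exon: level / base for exon, level in levels.items()}


# Hypothetical threshold cycles, for illustration only.
CT_S18 = 12.0
CT_TARGETS = {"1A": 30.0, "1B": 28.0, "1C": 31.0, "1D": 30.0}

levels = {exon: expression_level(ct, CT_S18) for exon, ct in CT_TARGETS.items()}
relative = relative_to_reference(levels)

if __name__ == "__main__":
    for exon, ratio in relative.items():
        print(f"exon {exon}: {ratio:.2f}-fold relative to exon 1A")
```

A target whose CT is two cycles lower than that of exon 1A comes out as four-fold more abundant, which is the doubling-per-cycle assumption built into the formula.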

"In silico" identification of an additional SHBG first exon
With the use of the FirstEF program, three potential first exons were predicted in the genomic sequence -858/+132 respective to the described alternative exon 1 TSS [23]: one in the negative strand of the chromosome 17 (7471406-7471590), with a genomic size of 185 nucleotides and a predicted exon probability of 0.982, that corresponded to the second exon of the SAT2 gene (exon prediction 1); and two in the positive strand in the same orientation of the SHBG gene ( Figure 1A). Of the two predicted exons in the positive strand, one corresponded to the previously described alternative exon 1 ( Figure 1A, exon prediction 2) [9,23], localized on the chromosome 17 genomic sequence 7471905-7472194, and with a predicted exon probability of 0.982; and the other corresponded to a putative previously non described SHBG first exon of 65 nucleotides in length, situated at -278/-343 respective to the alternative exon 1 TSS, and localized on the chromosome 17 genomic sequence 7471760-7471824 ( Figure 1A, exon prediction 3). The predicted 3' end of this exon is perfectly defined by the consensus 5' splice site CTG/gtaagt ( Figure 1B). According to the FirstEF program, the novel potential SHBG first exon is contained in a CpG island of 202 nucleotides in length localized on the chromosome 17 genomic sequence 7471654-7471855 ( Figure 1A). We named this novel putative SHBG first exon, exon 1B, and therefore we renamed the SHBG alternative exon 1 as exon 1A.
In order to find out if exon 1B is expressed in prostate tissues and cell lines, we performed a PCR with a forward primer corresponding to the beginning of the predicted exon 1B ( Figure 1B, primer ii), and a reverse primer of exon 8. Using the cDNA obtained from the LNCaP cell line and from human prostate samples, we amplified a major band of approximately 1200 bp ( Figure 1C), which once cloned and sequenced, corresponded to the predicted exon 1B sequence followed by exons 2-3-4-5-6-7-8. The analysis of the sequence showed that all the exon-exon junctions resulted from splicing of consensus splice sites and that exon 1B transcripts used the same 3' splice site of exon 2 as exon 1 and exon 1A transcripts.
To determine whether the exon 1B TSS was located further upstream of the sequence predicted by the FirstEF program, we designed three additional forward primers for exon 1B, situated -90, -70 and -39 nucleotides from the predicted exon 1B start site, and performed PCR amplification using a reverse primer against exon 3 and cDNA from the LNCaP cell line. Using the primer situated -39 nucleotides from the predicted exon 1B start site (Figure 1B, primer i), we obtained a major PCR product (Figure 1D), which, once sequenced, consisted of a 39 nucleotide extension of exon 1B at its 5' end, followed by exons 2-3. These data allowed us to extend the exon 1B sequence to nucleotide 7471721 of chromosome 17, resulting in a characterized exon of 104 nucleotides in length (Table 2).
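The exon sizes quoted in this section follow directly from the chromosome 17 coordinates if intervals are treated as 1-based and inclusive. The sketch below, with helper names of our own choosing, reproduces the 65-nucleotide FirstEF prediction for exon 1B, the 202-nucleotide CpG island, the 185-nucleotide SAT2 exon, and the 104-nucleotide exon 1B obtained after the 39-nucleotide 5' extension.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def inclusive_length(interval: Interval) -> int:
    """Length of a 1-based inclusive genomic interval."""
    start, end = interval
    if end < start:
        raise ValueError("interval end must not precede its start")
    return end - start + 1


def extend_5prime_plus_strand(interval: Interval, nucleotides: int) -> Interval:
    """Extend the 5' end of a plus-strand feature by moving its start upstream."""
    start, end = interval
    return (start - nucleotides, end)


def contains(outer: Interval, inner: Interval) -> bool:
    """True if the inner interval lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Coordinates as reported in the text (chromosome 17).
SAT2_EXON2 = (7471406, 7471590)
EXON_1B_PREDICTED = (7471760, 7471824)
CPG_ISLAND = (7471654, 7471855)

EXON_1B_EXTENDED = extend_5prime_plus_strand(EXON_1B_PREDICTED, 39)

if __name__ == "__main__":
    print(inclusive_length(EXON_1B_PREDICTED))  # predicted exon 1B
    print(EXON_1B_EXTENDED, inclusive_length(EXON_1B_EXTENDED))
    print(contains(CPG_ISLAND, EXON_1B_EXTENDED))
```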
To provide further evidence of the exon 1B expression, we analyzed the databases from other species and identified two mRNA sequences, one in Macaca fascicularis [Gen-Bank: AB169062], and one in Macaca mulatta [GenBank: XR_013781.1]. The Macaca fascicularis mRNA sequence derived from a testis cDNA library and included an orthologous sequence of 36 nucleotides located upstream of the 5'splice site of the human exon 1B, and the Macaca mulatta mRNA sequence included the orthologous 21 nucleotides upstream of the 5'splice site in humans. Furthermore, in both Macaca mRNAs, the human 5' splice site was not found due to the presence of a GG sequence instead of the GT donor nucleotides of the human sequence, and therefore exon 1B and exon 1A are present in these mRNAs as one unique exon ( Figure 1E).
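The effect of the GT-to-GG difference on the donor site can be illustrated with a simple motif test. This is a minimal sketch, assuming the intronic part of the 5' splice site consensus is GTRAGT (R = A or G), as given elsewhere in this article. The human intron start `gtaagt` is taken from the text; the macaque-like string `ggaagt` is a hypothetical example in which only the first two bases (GG) are supported by the text.

```python
import re

# Intronic half of the consensus 5' splice site, GTRAGT with R = A or G.
DONOR_INTRON_MOTIF = re.compile(r"GT[AG]AGT")


def has_consensus_donor(intron_start: str) -> bool:
    """True if the first six intronic bases match the GTRAGT consensus."""
    return DONOR_INTRON_MOTIF.match(intron_start.upper()) is not None


if __name__ == "__main__":
    print(has_consensus_donor("gtaagt"))  # human exon 1B donor, as in CTG/gtaagt
    print(has_consensus_donor("ggaagt"))  # hypothetical GG variant, as in Macaca
```

A match-based test like this is deliberately strict; real splice site strength is graded, so a dedicated scoring tool would be needed for anything beyond a presence/absence check.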

Identification of additional SHBG first exons using 5'RLM-RACE
[Figure: Analysis of the annotated 5' end of the human SHBG gene]
To further characterize the 5' end of the SHBG gene, we used the FirstChoice® RLM-RACE kit to identify 5' capped SHBG transcripts from total RNA of the DU-145 and LNCaP cell lines. To perform the first round of the 5' RACE, we used a forward outer primer that recognized the adapter sequence linked to the 5' end of capped mRNAs, and a reverse primer against exon 5 (Figure 2A). Nested PCR was performed using a forward inner primer of the adapter sequence and a reverse primer recognizing exon 3 (Figure 2A). In the DU-145 cell line, the two rounds of amplification resulted in one major product of approximately 300 nucleotides (Figure 2B) that, after cloning and sequencing, corresponded to two different and novel SHBG mRNAs. One RACE product (Figure 2C, RACEfrag 1; 311 nucleotides) contained a novel SHBG first exon of 107 nucleotides followed by exons 2-3, and the other RACE product (Figure 2C, RACEfrag 2; 279 nucleotides) consisted of a different and novel first exon of 75 nucleotides, also followed by exons 2-3. In the LNCaP cell line, the two rounds of amplification also resulted in one major band of approximately 300 nucleotides (Figure 2D) that, after cloning and sequencing, corresponded to two different transcript sequences of 289 and 341 nucleotides (Figure 2E, RACEfrags 3 and 4). Both transcripts include an alternative SHBG first exon previously introduced in the public databases by Kahn and collaborators as exon 1N [GenBank: EU352656], followed by exons 2 and 3. However, these two RACEfrags contain two different TSSs at their 5' end, resulting in an exon of 137 nucleotides when TSS 1 (Chr 17: 7458018) is used, and an exon of 85 nucleotides when the transcription starts at TSS 2 (Chr 17: 7458070) (Figure 3A). These sequences are also 27 and 79 nucleotides shorter than the sequence of exon 1N (164 nucleotides), as the TSS of the latter was located at nucleotide 7457989 of chromosome 17.
Even so, the two variants of the alternative first exon found in RACEfrags 3 and 4 and exon 1N share the same 3' end and the same 5' splice site (Figure 3A). Interestingly, the 5' end of exon 1N partially overlaps with the sequence of the novel first exon identified in RACEfrag 2 (Figure 3A). However, based on our 5'RACE data, the TSS found in RACEfrag 2, although it is located further upstream of the TSS described for exon 1N, does not extend the 1N sequence at the 5' end because it uses a different 5' splice site and, therefore, generates a different 5' exon (Figure 3A). Accordingly, the novel first exons found in RACEfrag 1 and RACEfrag 2 were named exon 1C and exon 1E, respectively, and the exon 1N variants found in RACEfrags 3 and 4 were renamed exon 1D (Table 2). The analysis of the exon-exon junctions of all RACEfrag sequences indicated that exons 1C, 1D and 1E were all spliced to exon 2 using the same 3' splice site as exons 1, 1A and 1B (Figure 3C). A complete overview of the characteristics of the different SHBG alternative first exons is given in Table 2.
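The two lengths of the exon later renamed 1D can be cross-checked from figures given in the text. This minimal sketch assumes only that both variants share their 3' end with exon 1N (164 nucleotides) and are 27 and 79 nucleotides shorter, and it compares the resulting length difference with the distance between the two TSS coordinates. The names are illustrative.

```python
# Figures quoted in the text.
EXON_1N_LENGTH = 164
SHORTENING_VS_1N = {"TSS 1": 27, "TSS 2": 79}
TSS_COORDINATE = {"TSS 1": 7458018, "TSS 2": 7458070}  # chromosome 17, plus strand


def exon_1d_lengths() -> dict:
    """Exon length for each TSS, given the shared 3' end with exon 1N."""
    return {tss: EXON_1N_LENGTH - cut for tss, cut in SHORTENING_VS_1N.items()}


def tss_spacing() -> int:
    """Distance between the two TSSs; with a shared 3' end this equals the length difference."""
    return TSS_COORDINATE["TSS 2"] - TSS_COORDINATE["TSS 1"]


if __name__ == "__main__":
    lengths = exon_1d_lengths()
    print(lengths)
    print(tss_spacing(), lengths["TSS 1"] - lengths["TSS 2"])
```

Both routes give 137 nucleotides for TSS 1 and 85 nucleotides for TSS 2, in agreement with the lengths quoted for exon 1D in the Discussion.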

Phylogenetic comparison of the SHBG alternative first exons
[...] flanking intronic sequence is depicted in Figure 5A. The comparison of ape and Old World monkey sequences showed 100% sequence identity between humans and chimpanzees, and a calculated 97% sequence identity between humans, orangutans, baboons and rhesus macaques. However, much less homology was found between humans and horses (calculated sequence identity of 51%), humans and dogs (49%) and humans and mice (44%). The 5' splice site is only conserved in apes and Old World monkeys.
The alignment of the human exon 1B sequence with the sequences of Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Papio hamadryas, Macaca mulatta, Equus caballus, Oryctolagus cuniculus (rabbit), and Mus musculus showed a 96% sequence identity between humans and chimpanzees, and an 83-88% sequence identity between humans, gorillas, baboons and rhesus macaques (Figure 5B). Much less homology was found between humans and horses (51%), humans and rabbits (37%) and humans and mice (39%). The exon 1B 5' splice site is only conserved in humans, chimpanzees, gorillas and orangutans, and is found neither in the Old World monkeys (baboons and rhesus macaques) nor in horses, rabbits and mice (Figure 5B).
[...] (Figure 6A). There is also a high sequence identity with horses (86%), bovines (79%), pigs (78%), mice (69%) and rats (71%). Therefore, and in contrast to exons 1A and 1B, exon 1C is not only conserved in apes and Old World monkeys, but also in horses and pigs. However, it is not conserved in cows, mice and rats.
[...] caballus, Bos taurus, Canis familiaris, and Mus musculus showed a 95% sequence identity with chimpanzees, 96% with gorillas, 94% with orangutans, and 92% with baboons and rhesus macaques. High sequence identity was also found with bovines (82%), horses (81%) and dogs (78%), but less identity was found with mice (57%) (Figure 6B).
[...] macaques (Figure 6B). As in the case of exons 1C and 1D, there was a high homology between humans and horses, with a calculated sequence identity of 81%. There was also high identity between humans and bovines (80%) and humans and dogs (79%), and less identity was found between humans and mice (62%).
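The percentages reported in this section are pairwise sequence identities derived from alignments. The text does not state how gapped columns were treated, so the sketch below is one common convention rather than the authors' method: identical columns divided by all columns in which at least one sequence has a base, with gaps counted as mismatches. The aligned strings are hypothetical toy sequences, not SHBG sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are ignored, and
    columns where only one sequence has a gap count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap and base_b == gap:
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignments for illustration only.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
    print(percent_identity("ACGT-CGT", "ACGTACGT"))
```

Because the choice of denominator changes the result, identities computed under a different gap convention may differ slightly from the values quoted in the text.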

Activity of the SHBG transcription units in human prostate
Even though the presence of SHBG mRNA in the prostate has been previously reported [15,16], none of these studies took into account the nature of the transcription unit responsible for its production. To analyze the activity of the different SHBG transcription units in prostate tissues and prostate cancer cell lines, we performed RT-PCR using specific forward primers for each of the alternative first exons, as well as an exon 3 forward primer, to determine the global SHBG expression.
[Figure: Phylogenetic comparison of exon 1A and 1B across different vertebrate species]
[Figure: Localization of SHBG alternative first exons in Chromosome 17 genomic sequence]
[Figure: Phylogenetic comparison of exon 1C, 1D and 1E across different vertebrate species]
The analysis of cell lines using a specific exon 3 forward primer showed that all the prostate cell lines analyzed expressed SHBG, with LNCaP and PZ-HPV-7 cells presenting higher relative amounts of total SHBG mRNA compared with the PC3 cell line (Figure 7A). Using the exon 1 forward primer, the levels of SHBG mRNA were higher in PZ-HPV-7 cells than in PC3 cells, whereas almost no expression was found in LNCaP cells (Figure 7A). In contrast, when the exon 1A forward primer was used, the levels of SHBG mRNA were higher in LNCaP than in PC3 cells, whereas almost no expression was detected in PZ-HPV-7 cells (Figure 7A). For the exon 1B primer, the maximum levels were found in PZ-HPV-7 cells, followed by PC3 cells (Figure 7A), whereas with the exon 1C primer, similar levels of SHBG were found in the three prostate cancer cell lines (Figure 7A). As in the case of the exon 3 forward primer, the use of the exon 1D forward primer showed that the relative amounts of this transcript were higher in LNCaP and PZ-HPV-7 than in the PC3 cell line (Figure 7A). In contrast, we could not amplify transcripts containing exon 1E in any of the prostate cell lines tested, probably reflecting that their transcription is very low. It is noteworthy that in the exon 1A and exon 1B specific RT-PCR products, a faint upper-than-expected band was detected (Figure 7A, β band) that, once sequenced, corresponded to SHBG sequences that included the exon 1 sequence after exon 1A or exon 1B. These results suggested that exon 1A/exon 1B and exon 1 can be spliced together and therefore are not always mutually exclusive. When exons 1A or 1B were spliced to exon 2 (Figure 7A, α band), the 5' consensus splice sites of both first exons and the consensus 3' splice site of exon 2 were used (Figure 3C). However, when exons 1A or 1B were spliced to exon 1 (Figure 7A, β band), although the 5' consensus splice sites of exon 1A and 1B were the same, two different 3' splice sites of exon 1 were used, which we named a and b (Figure 7D). While exon 1A was found to be spliced only to the 3' splice site b, exon 1B was alternatively spliced to both a and b splice sites (Figure 7D). Additionally, when exon 1A primers were used, a lower-than-expected band was detected (Figure 7A; γ band) that, once sequenced, corresponded to SHBG transcripts where exon 1A was spliced to exon 2 and exon 4 was skipped.
Using exon 1D primers, a lower-than-expected band was also observed, but in this case, its sequence determined that it corresponded to an unspecific RT-PCR product. No SHBG sequences with non-canonical 5' and 3' consensus splice sites were obtained in any case.
When the activity of the different SHBG transcription units was analyzed in prostate samples, we detected the presence of the ones containing exon 1, 1A, 1B, 1C and 1D in all the samples analyzed ( Figure 7B), but exon 1E transcripts were not amplified with RT-PCR. As in the case of the prostate cancer cell lines, upper-than-expected RT-PCR bands were detected ( Figure 7B, β band) that, once sequenced, corresponded to specific SHBG transcripts whose exon 1A and 1B sequences were followed by the exon 1 sequence using the 3' splice sites a and b ( Figure  7D). In the case of exon 1B transcripts, two different β bands were observed ( Figure 7B; β 1 and β 2 bands) corresponding to the use of the 3' splice sites a (β 1 ) and b (β 2 ).
To determine whether transcription of these novel SHBG isoforms was restricted to prostate, human non-prostatic cell lines were analyzed by RT-PCR. The results revealed that transcripts containing exon 1B were present in cell lines derived from cervix carcinoma (HeLa), rhabdomyosarcoma (CW 9019 and RH 30), and breast cancer (MDA-MB 468, T47D, and BT 474) (Figure 7C), but were almost undetectable in the hepatocarcinoma cell line HepG2, in the kidney cell line Hek 293, and in the neuroblastoma cell line imr 32 (Figure 7C). As in the case of prostate cell lines and tissues, the upper-than-expected band corresponding to transcripts where exon 1B is followed by the exon 1 sequence was also detected in the RH 30, HeLa and HepG2 cell lines. Specifically, in the RH 30 cell line, transcripts containing exon 1B-exon 1 were more abundant than exon 1B-exon 2 transcripts (Figure 7C). The transcription unit containing exon 1C was identified in HeLa, HepG2, Hek 293, CW 9019, RH 30 and imr 32 cells, but no detectable levels were found in any of the breast cancer cell lines analyzed (Figure 7C). Furthermore, in cell lines where levels of exon 1C SHBG mRNA were higher (CW 9019 and imr 32; Figure 7A), the full-length exon 1C transcript sequence was detected by nested PCR, using primers directed against exons 1C and 8 for the first round, and exons 1C and 5 for the second round (Figure 8A). The major product found in the first round (1.2 kb) coincided with the expected size of the full-length transcript from exon 1C to exon 8 (Figure 8B). The 1.2 kb PCR bands from both cell lines were excised from the gel, purified and pooled for use as template DNA for the second round PCR. Direct sequencing of the second round PCR product demonstrated that the 635 nucleotide sequence contained exon 1C followed by exons 2, 3, 4 and 5 (Figure 8C).
Finally, the transcription unit of exon 1D was detected in all cell lines analyzed, but very low levels were found in the imr 32 and MDA-MB 468 cells (Figure 7C). The full-length sequence of transcripts beginning with exon 1D has been previously reported by Kahn and collaborators as the exon 1N SHBG transcript [GenBank: EU352670].
The analysis of the relative abundance of the alternative 1B, 1C and 1D transcripts compared to 1A in LNCaP, PC3 and PZ-HPV7 prostate cell lines as well as in HeLa cells by real-time PCR showed that the levels of exon 1B transcript were significantly higher in PC3 and PZ-HPV7 cells, and of exon 1D transcript in PZ-HPV7 cells (see Additional file 1).

Discussion
Recent reports in the framework of the ENCODE Project indicated that more than two thirds of the interrogated genes present additional TSSs upstream of their annotated first exons [18]. The analysis of 44 regions totaling 30 Mb or 1% of the human genome sequence showed that there is a protein-coding gene every 62 kb, and that the novel TSSs often mapped far upstream of known loci, since the newly identified TSSs were located on average 186 kb upstream of the most 5' annotated exons [18,24]. Additionally, an increasing number of reports indicate that many eukaryotic genes possess multiple transcriptional promoters associated with alternative first exons [25][26][27][28]. In this context, we decided to characterize the 5' end of the human SHBG gene.
In chromosome 17, where SHBG is located, alternative splicing has been described to occur extensively, with an average of 5 transcripts per gene, and it has also been shown that in this chromosome, 76.6% of the genes display at least two transcripts [29]. In the present study, we have used in silico and in vitro approaches in order to identify novel 5' SHBG alternative first exons. Using the FirstEF program we identified a potential novel SHBG first exon in the positive strand of chromosome 17, situated 278 nucleotides upstream of the TSS of the SHBG alternative exon 1. We named this potential novel SHBG first exon, exon 1B, and therefore we renamed the previously described alternative exon 1 [23] as exon 1A. [...] [30] and also that CpG island promoter regions are commonly associated with bidirectional promoter activity (approximately 15% of the imprinted genes have associated antisense transcripts) [31,32]. One example is the Gabpa-Atp5j genes in mouse, which contain two overlapping promoters in each direction [33]. When this condition occurs in yeast and bacteria, there is potential for transcriptional interference [34], and it has also been postulated that, in these CpG bidirectional promoters, the activity of one promoter might influence the epigenetic state of the other [33].
Using 5'RLM-RACE with cDNA obtained from DU-145 and LNCaP cells, we identified two additional SHBG first exons that were named exons 1C and 1E, and two different sequences corresponding to the previously described exon 1N. These two sequences resulted from two different TSSs, generating exon sequences 27 and 79 nucleotides shorter than exon 1N. We renamed these exon sequences as exon 1D.
Exon 1C has a length of 107 nucleotides and is localized 253 nucleotides upstream of exon 1B. As in the case of exon 1B, it also overlaps with the SAT 2 gene, specifically with the complete coding sequence of exon 3 and with 44 and 10 nucleotides of intron 3 and intron 2 respectively.
The TSS identified by 5'RACE, with T as the nucleotide +1, is displaced just one nucleotide from a consensus Inr element or Cap motif (pyrimidine, pyrimidine, A(+1), N, T/A, pyrimidine, pyrimidine, where N is any nucleotide) [33]. Exon 1D has a length of 137 or 85 nucleotides, depending on which TSS is used, and is located 13.35 kilobases upstream of the exon 1 TSS. Exon 1E has a length of 75 nucleotides and its 3' end is situated only 13 nucleotides upstream of the TSS1 of exon 1D. It overlaps with the intron 1 sequence of the FXR2 gene [GenBank: NM_004860.2], which is also situated in the negative strand. This overlap was previously described in rat chromosome 10, where a GC-rich sequence in the alternative promoter of rat SHBG overlapped with the 5' UTR of the FXR2 gene [35]. These results and ours suggest that the 5' ends of the SAT2 and FXR2 genes present a broad range of TSSs in the opposite strand, corresponding to the SHBG gene.
[Figure: Amplification of the full-length exon 1C SHBG transcript]
TATA boxes have been described to be six-fold more common in genes with single promoters than in genes with alternative promoters, and it has been suggested that strong TATA boxes might be incompatible with alternative promoters [36]. However, in exon 1E there is a TATAA sequence situated 30 nucleotides upstream from the TSS1 of exon 1D, probably regulating its transcription initiation. TATA-box directed transcription is normally associated with a sharply defined TSS situated 30-31 nucleotides downstream from the TATA sequence. In contrast, TATA-independent promoters normally present a much broader distribution of TSSs [33], suggesting that exons 1B, 1C and 1E might present TSSs in addition to the ones presented here. Furthermore, while TATA-box promoters are normally associated with tightly regulated transcripts with a strong bias toward postnatal activity [33,36], broad TSS promoters and CpG promoters are associated with ubiquitous transcripts and, in the case of genes with alternative promoters, with a weak bias toward prenatal transcription [33,36]. These data suggest a differential usage of the SHBG alternative promoters through different stages of development.
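The two promoter features discussed above, the Inr consensus (pyrimidine, pyrimidine, A(+1), N, T/A, pyrimidine, pyrimidine) and a TATAA sequence roughly 30 nucleotides upstream of a TSS, can be located with simple pattern searches. The sketch below is illustrative only: the helper names are our own, the distance is measured from the first T of TATAA to the TSS (an assumption, since the text does not define the reference point), and the test sequence is a hypothetical toy string rather than SHBG genomic sequence.

```python
import re
from typing import List

# Inr consensus YYANWYY; the lookahead allows overlapping matches to be reported.
INR_PATTERN = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT]))")


def inr_plus_one_positions(seq: str) -> List[int]:
    """0-based indices of the A(+1) base of every Inr consensus match."""
    return [m.start() + 2 for m in INR_PATTERN.finditer(seq.upper())]


def tata_distances(seq: str, tss_index: int, motif: str = "TATAA") -> List[int]:
    """Distances from the start of each upstream motif occurrence to the TSS."""
    seq = seq.upper()
    distances = []
    position = seq.find(motif)
    while position != -1 and position < tss_index:
        distances.append(tss_index - position)
        position = seq.find(motif, position + 1)
    return distances


# Hypothetical toy promoter: a TATAA box followed by spacer and one Inr element.
TOY_PROMOTER = "GG" + "TATAA" + "G" * 23 + "CCATTCC" + "GG"

if __name__ == "__main__":
    tss_candidates = inr_plus_one_positions(TOY_PROMOTER)
    print(tss_candidates)
    for tss in tss_candidates:
        print(tss, tata_distances(TOY_PROMOTER, tss))
```

Consensus matching of this kind only flags candidate elements; whether a given TATAA sequence is a functional promoter element has to be shown experimentally.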
The full-length sequence of transcripts beginning with exon 1B and 1C has been demonstrated. In the case of the exon 1C transcript, although the size of the product of the first round PCR suggests that exons 6 and 7 are included in the full-length transcript, it has only been possible to sequence it from exon 1C to exon 5.
We have shown the presence of exon 1B, 1C and 1D transcripts in prostate cell lines and tissues as well as in several non-prostatic cell lines. These results support that the activity of these different transcription units is not restricted to prostate, as it would be expected for 1B and 1C transcripts, since their promoters do not contain a TATA sequence. As for the 1D transcripts, the activity of the putative TATA promoter remains to be proven. With regard to the relative abundance of the alternative transcripts, real-time PCR assays showed that the highest levels corresponded to the exon 1B transcript in the four cell lines analyzed. Although differences in primer efficiency could not be completely excluded, it would be interesting in the future to elucidate the meaning of these variations.
All the SHBG exons identified by 5'RACE in this study are very well defined at their 3' end by consensus splice site sequences (AG/GTRAGT), and they are all spliced directly to the 3' splice site of exon 2, as was previously described for SHBG exon 1 and exon 1A [9]. However, when RT-PCR was performed in prostate cancer cell lines and prostate tissues using specific primers for exons 1A and 1B, two different 3' splice sites of exon 1 were used, a and b, both presenting a consensus -AG/G sequence for U2AF65 protein binding, but lacking a clear polypyrimidine tract and branch point. None of these alternative first exons contains in its sequence an ATG in frame with the SHBG coding sequence, suggesting that they act as 5' UTR sequences regulating the translation efficiency from the first in-frame ATG of exon 2, which encodes the methionine 30 of the transcripts that begin with exon 1. In this regard, it has been reported that stable mRNA secondary structures in the 5' UTR region (≥-35 kcal/mol) can considerably decrease the translation efficiency by affecting ribosomal recruitment and positioning at the initiation codon [37]. Using the MFOLD program, we predicted a hairpin stability ≥-40 kcal/mol for exons 1A, 1B, 1C and 1D (TSS1), and -35.8 kcal/mol for exon 1E, supporting that the secondary structure of the different alternative first exons could affect translation efficiency.
The functional significance of these novel alternative SHBG transcription units will depend on the demonstration of their protein-coding capacity or of their action as natural antisense transcripts of genes located on the opposite DNA strand. We tested the former possibility using different antibodies against SHBG, and detected bands of the expected size by Western blot analysis in human prostate samples but not in prostate cancer cell lines (data not shown). Our preliminary data from transient transfection experiments with different alternative SHBG full-length constructs showed that these transcripts are indeed translated, but that their translation efficiency is negatively regulated depending on each specific 5' UTR sequence, as predicted by the MFOLD program. Further studies are required to fully assess and understand how the use of these alternative transcripts and their probable alternative promoters contributes to the regulation of transcription and translation, as has already been demonstrated for other genes [28]. As for the second possible action, the discovery of novel 5' SHBG exons that overlap with the SAT2 and FXR2 sequences (exons 1B, 1C, 1D and 1E) suggests that the SHBG-SAT2 and SHBG-FXR2 gene pairs might be mutually regulated by transcriptional interference.
The phylogenetic comparison of the human SHBG alternative first exons with those of different mammalian species showed that exons 1C, 1D and 1E are highly conserved across species, whereas exon 1A and especially exon 1B are far less conserved. The variation in the degree of conservation of the different SHBG alternative first exons parallels the degree of conservation of their 5' splice sites: while the 5' splice sites of exons 1A and 1B are conserved only within primate species, those of exons 1C, 1D and 1E are conserved in both primate and non-primate species. These data suggest that 1A and 1B are recently evolved exons with a higher evolutionary turnover rate, especially exon 1B, which arose about 25 million years ago, when apes diverged from Old World monkeys.
The estimated total number of human protein-coding genes falls between 20,000 and 25,000 [38,39], whereas the numbers for simpler organisms such as Drosophila melanogaster and Caenorhabditis elegans are not much lower, with 13,000 and 18,000 genes respectively [38]. It has been hypothesized that functional diversification of this limited number of genes is needed to create the highly elaborate systems required for mammalian life [38]. Alternative splicing and alternative promoter usage are well-described mechanisms that produce a large number of protein-coding and non-coding transcripts from a single gene locus. In the framework of the ENCODE project, it was observed that 86% of the interrogated multi-exon gene loci in the ENCODE regions presented alternative splicing, generating > 5.4 transcripts per gene [18]. Our analysis showed that at least one additional transcription unit (exon 1B) arose in apes, due to a single nucleotide change that generated a new 5' splice site in exon 1B.

Conclusion
In the present study, we have identified three novel alternative SHBG first exons (exons 1B, 1C and 1E), and further characterized an alternative SHBG first exon previously deposited in the public databases (exon 1D). We have also demonstrated the activity of the transcription units containing exons 1B, 1C and 1D in human prostate tissues and in prostatic and non-prostatic cell lines. In view of these results, it will be necessary to determine the significance of these alternative TSSs for the regulation of expression of SHBG or of the overlapping genes on the opposite strand. Additionally, it would be interesting in the future to ascertain whether the appearance of an SHBG alternative transcript in humans confers any evolutionary advantage.