The sequences of the four core histones (H2a, H2b, H3 and H4) of Giardia intestinalis are similar to those of histones from other eukaryotes . Protein structure modeling of the putative translation products indicates that these proteins can assemble into nucleosomes that do not differ significantly from vertebrate nucleosomes . In the present study we extend this information to the copy number, promoter structure, and genomic distribution of the histone genes in this organism. Each of these genes is represented in the genome by only two copies, with the exception of the H4 gene, which is present in three copies. These copy numbers are at the low end of the range observed among eukaryotes, which extends from just two copies of each gene in yeast, through 10 to 20 copies in humans, to several hundred copies in sea urchins . The unexpectedly extensive DNA sequence identity between the copies of each gene, which extends into the 5' noncoding region, suggests that these copies arose from relatively recent gene duplications or gene conversions.
Canonical replication-dependent histone genes in higher eukaryotes lack polyadenylation signals, except in some invertebrate animals (insects [21, 22], worms , crustaceans , and mussels ) whose core histone genes have both a stem-loop structure and a polyadenylation signal. These findings have led to the hypothesis that, during animal evolution, the stem-loop structure progressively replaced the polyadenylation signal in defining the 3' ends of replication-dependent core histone mRNAs . The presence of polyadenylation signals and the absence of potential stem-loop structures in the 3' noncoding sequences of the Giardia core histone genes suggest that studies on representatives of other eukaryotic lineages are warranted to determine whether a similar evolutionary trend is a more general phenomenon.
The linker histone H1 is a ubiquitous component of eukaryotic chromatin and is expected to be present in this protist. An earlier report putatively identified a 21 kDa basic protein from nuclear extracts as histone H1 . The identification was based on the size of the protein and on the sequence of one of its peptides (VAATPVSTKAAP), which was highly similar to a sequence within the chicken histone H1 protein (VAAPPTPAKAAP). Our inability to find this peptide, or any homologs of H1 sequences from other organisms, in the G. intestinalis genome database is therefore puzzling. The Giardia genome sequencing project has achieved more than 11-fold coverage of the genome , so it is unlikely, although it cannot be excluded, that the H1 gene was missed in the assembly. Furthermore, in Western blot analysis with an antibody that recognizes the H1 protein of chicken, human and other mammals, we were not able to detect specific binding to any protein in a Giardia histone extract (data not shown).
These observations raise the question of the identity of the protein isolated by Triana et al. . The failure to identify a histone H1 in the Giardia genome indicates that this gene is either so divergent that it could not be recognized by our database searches or that it is absent from this organism. No unambiguous H1 has been reported for Apicomplexa , and the H1 gene is non-essential in the ciliate Tetrahymena thermophila . Although TEM analysis of Giardia chromatin by Triana et al. showed structures that were more compact than the "beads-on-a-string" nucleosome filament, it is unclear whether a 30 nm fibre structure is present. Indeed, in the presence of physiological levels of cations, polynucleosome fibres can fold further into more compact chromatin structures in the absence of H1 histones [29, 30]. In Giardia, more than 9,000 ORFs are crammed into a genome that is only 12 Mb in size. Thus, it is tempting to speculate that the formation of histone H1-dependent chromatin structures may not be necessary to compact such a small and gene-rich genome. In support of this idea, we were also unable to identify a histone H1 in the microsporidian Encephalitozoon cuniculi, an evolutionarily distant organism with a genome less than 3 Mb in size that contains almost 2,000 protein-encoding genes . Only further biochemical studies on Giardia chromatin can resolve the conundrum of the missing histone H1 gene. If Giardia indeed lacks a histone H1, this finding would be consistent with the hypothesis that H1 histones were recruited in eukaryotic evolution after the acquisition of the core histones to further refine the chromatin structure [32, 33]. However, further support for this hypothesis would require more detailed analysis of representative organisms from different lineages.
Whereas the evolutionary origin of the core histones can be traced back to a DNA-binding protein in archaebacteria, such as the HMf protein in Methanothermus fervidus [34, 35], the origin of the H1 histone is more difficult to determine. Unlike the highly conserved core histone proteins, the H1 proteins are very heterogeneous among protozoa, and exhibit great diversification and specialization even among mammals . Nevertheless, the sequence similarity of small basic proteins found in several eubacteria to the lysine-rich carboxyl terminus of metazoan H1 proteins has led to speculation that these eubacterial proteins are candidates for the ancestral histone H1 protein . In the dinoflagellate Crypthecodinium cohnii, proteins with similarity to the core histones are absent, but two proteins (HCC1 and HCC2) with significant similarity to the small molecular weight bacterial HU protein are present . The HU protein is the most abundant DNA-binding protein in E. coli, and one role of this multifunctional protein is the organization of the bacterial chromosomal DNA [38, 39]. Intriguingly, an HU-like DNA-binding protein was also identified by Triana et al.  based on a peptide sequence obtained from a small molecular weight protein on an SDS-PAGE gel of fractionated Giardia chromatin. However, we were unable to locate an exact match to this peptide sequence in the Giardia genome database by BLASTp and tBLASTn searches. The best-matching Giardia sequence was identical to the putative HU peptide at only 8 out of 13 amino acids. Negative results were also obtained when the Giardia genome was searched with the full-length HU sequence from Bacillus subtilis. When the top five Giardia sequences from each of the above searches were used individually in BLASTp and tBLASTn searches against all sequences in the GenBank database, neither HU nor other histone sequences were retrieved.
Possible explanations for our failure to find the HU gene are that the HU-like gene was missed in the sequencing of the Giardia genome or, more likely, that Triana et al.'s Giardia chromatin preparation was contaminated with bacteria.
The transcription initiation site of the G. intestinalis histone genes was located only a few nucleotides upstream of the translation start codon, in agreement with previous studies showing that mRNAs of this species have unusually short 5' UTRs . The translation of messages with such short 5' UTRs has been a point of debate because earlier reports suggested that Giardia mRNAs are uncapped . Recent studies, however, demonstrated more convincingly that Giardia mRNAs are capped, and that capped mRNAs with 5' UTRs as short as a single nucleotide can be efficiently translated in this protist [41, 42].
Our transient transfection assays, with luciferase activity as the reporter, show that an upstream stretch of about 50 bp has full promoter activity for the G. intestinalis core histone genes. This agrees with transcriptional analyses of other Giardia genes, which demonstrated that promoter regions range from 40 to 60 bp in length [18, 19, 43, 44]. Mutational analysis of these regions showed that the promoter elements are only weakly conserved and usually contain triplet A's and/or triplet T's [18, 19, 43, 44]. The 15 bp histone motif (him), with the consensus sequence GRGCGCAGATTNGG, was detected in all four core histone genes, but has not been found in other Giardia genes or in the core histone genes of other eukaryotes . Motivated by the assumption that him is a regulatory element that controls the coordinated expression of the Giardia core histone genes, we decided to characterize the promoter of the histone H4 gene by analyzing its upstream sequence in more detail. The marked decrease in luciferase activity observed with deletions or mutations in him indicates the importance of this motif to the function of the histone H4 promoter. The highly conserved nature of the motif, and our observation that the four histone promoters had approximately equivalent activity, suggest that him also has an important regulatory role in the transcription of the other Giardia core histone genes.
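As an illustration of how a degenerate consensus such as him can be located in an upstream sequence, the following minimal sketch expands the IUPAC ambiguity codes in the consensus (R = A or G, N = any base) into a regular expression and reports match positions. It is not the search procedure used in this study. The function names and the example fragment are hypothetical, the consensus string is copied as printed above (14 characters), and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Consensus exactly as printed in the text.
HIM_CONSENSUS = "GRGCGCAGATTNGG"


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(sequence, consensus=HIM_CONSENSUS):
    """Return 0-based start positions of all matches on the given strand."""
    # A lookahead is used so that overlapping matches are also reported.
    lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# Hypothetical upstream fragment, invented for illustration only.
example_upstream = "TTTAAAGGGCGCAGATTCGGAAATTT"
print(find_motif(example_upstream))
```

A fuller scan would also search the reverse complement and could tolerate a limited number of mismatches; both are omitted here to keep the sketch short.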
We used quantitative real-time PCR to compare the steady-state mRNA levels of the four histone genes and found that they were all within a four-fold range (two amplification cycles) of each other. We did not detect any significant differences in the relative levels of core histone mRNAs extracted from exponential phase cultures compared to mRNAs from stationary phase cultures. Moreover, we have not observed any marked changes in core histone mRNA levels during the cell cycle in our analysis of gene expression in semi-synchronized Giardia cultures (J. Yee, unpublished data). These results suggest that the core histone genes of G. intestinalis are constitutively expressed at approximately equivalent levels. These observations are also consistent with the absence in this species of orthologs of the transcription factors SPT10, SPT21, HIR1 and HIR2, which are involved in the expression of replication-dependent histones in yeast [45, 46]. Therefore, the him sequence is likely to be a binding site for a transcription factor that allows a relatively high and constant level of histone gene expression in Giardia. The equivalent results obtained by qRT-PCR assays using cDNA produced from mRNA with either poly(T) or gene-specific primers demonstrate that the histone transcripts are polyadenylated. These observations suggest that the single class of core histone genes in Giardia has a dual function: these genes provide bulk histones for packaging newly synthesized DNA during S-phase of the cell cycle, and replacement histones for the repair of chromatin during the other stages of the cell cycle.
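The correspondence between two amplification cycles and a four-fold range follows from the exponential nature of PCR: if template doubles every cycle, a difference of n threshold cycles corresponds to a 2^n difference in starting amount. The short sketch below makes this arithmetic explicit. The function name and the threshold-cycle values are hypothetical, and perfect doubling per cycle (efficiency of 2.0) is an assumption, not a measured property of the assays reported here.

```python
def fold_difference(ct_a, ct_b, efficiency=2.0):
    """Fold difference in starting template of sample A relative to sample B.

    ct_a, ct_b: threshold cycles; a lower value means more starting template.
    efficiency: amplification factor per cycle (2.0 assumes perfect doubling).
    """
    return efficiency ** (ct_b - ct_a)


# Hypothetical threshold cycles, two cycles apart.
print(fold_difference(20.0, 22.0))  # 4.0
```

With a per-cycle efficiency below 2.0, the same two-cycle difference would correspond to a somewhat smaller fold difference.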
While information on the identity and function of transcription factors in G. intestinalis is scarce, a recent survey of its genome identified only four general transcription initiation factors among the twelve that are normally associated with transcription in higher eukaryotes . Furthermore, a Giardia TBP was identified that is highly divergent with respect to archaeal and higher eukaryotic TBPs, and it contains substitutions in three out of four phenylalanines required for binding of other TBPs to TATA-sequences . In this study, we showed that proteins from a Giardia nuclear extract bound to a probe containing three him sequences in tandem, and demonstrated the specificity of these DNA-protein interactions by competition assays with an excess of unlabeled DNA containing different sequences. Competition was observed with unlabeled DNA containing the 50 bp minimal H4 promoter or a single wild type him sequence, but not with a region of the H4 coding sequence or with a canonical eukaryotic TATA-box sequence. In addition, a mutated sequence of him had greatly reduced ability to compete for protein binding to the probe. While these results suggest that a protein or proteins are binding specifically to him, the competition observed with DNA containing the H4 promoter with the him deleted and the GDH promoter lacking him appears to contradict this conclusion. However, these two sequences contain both an AT-rich and a g-CAB element, and we showed that the wild-type AT-rich sequence was able to compete for protein binding to the 3him probe but the mutant AT-rich sequence could not. Taken together, these results suggest that either a common protein is binding to him as well as to the g-CAB and AT-rich elements, or more likely, a common cofactor is required for the formation of different protein complexes at each of these promoter elements. 
One possible candidate for this common cofactor is pot, a protein that was described in our previous characterization of the GDH promoter [18, 19]. However, the identity of the proteins that bind him awaits the completion of our protein purification experiments.