Computational modeling and in silico analysis of differential regulation of myo-inositol catabolic enzymes in Cryptococcus neoformans

Background Inositol is a key cellular metabolite for many organisms. Cryptococcus neoformans is an opportunistic pathogen which primarily infects the central nervous system, a region of high inositol concentration, of immunocompromised individuals. Through the use of myo-inositol oxygenase, C. neoformans can catabolize inositol as a sole carbon source to support growth and viability. Results Three myo-inositol oxygenase gene sequences were identified in the C. neoformans genome. Differential regulation was suggested by computational analyses of the three gene sequences. These analyses included examination of the upstream regulatory regions and identification of ORE/TonE and UASINO sequences, conserved introns/exons, and in-frame termination sequences. Homology modeling of the proteins encoded by these genes revealed key differences in the myo-inositol active site. Conclusion The results suggest there are two functional copies of the myo-inositol oxygenase gene in the C. neoformans genome. The functional genes are differentially expressed in response to environmental inositol concentrations. Both the upstream regulatory regions of the genes and the structure of the specific proteins suggest that MIOX1 would function when inositol concentrations are low, whereas MIOX2 would function when inositol concentrations are high.


Background
Myo-inositol, a key cellular metabolite, is a simple six carbon ring sugar with one hydroxyl group on each carbon. Myo-inositol is the precursor for the synthesis of phosphatidylinositol, an essential membrane lipid, an anchor for proteins, and a core component of signal transduction mechanisms [1,2]. Inositol and compounds derived from inositol are among the major nonperturbing intracellular osmolytes which accumulate in response to hypertonic stress of the organism or tissue. Stepwise phosphorylation of inositol yields the myo-inositol polyphosphates. Inositol hexakisphosphate, in particular, has been found in soil, bacteria and most animals [2].
Cryptococcus neoformans is an opportunistic pathogen primarily infecting individuals with compromised immune systems. C. neoformans is found worldwide in soil and pigeon droppings. Under dry environmental conditions, the quiescent fungal spores in the soil or pigeon guano can become airborne. Once airborne, the dehydrated yeast spores can be inhaled by mammals. The immune system of an immunocompetent individual typically eliminates C. neoformans without any symptoms of disease. If the host is immunocompromised, the pathogen can cause cryptococcosis. C. neoformans infections often localize to the brain and central nervous system (CNS) [3]. C. neoformans is unusual among the fungi in that the pathogen can use inositol as a sole carbon source to support growth. Catabolism of myo-inositol in C. neoformans proceeds through the action of myo-inositol oxygenase (MIOX), though this has not been confirmed as the only pathway [4]. The inositol concentration in the cerebrospinal fluid (CSF) is high compared to plasma levels [5]. C. neoformans localized in the CNS could utilize myo-inositol as a substrate for MIOX in order to generate glucuronic acid for energy production.
The conversion of myo-inositol to glucuronic acid by MIOX involves the cleavage of the inositol ring between C6 and C1, and a four-electron transfer with one atom of oxygen incorporated into the product. Glucuronic acid is subsequently converted to gulonate, xylulose and xylulose-5-phosphate, which can then enter the pentose phosphate pathway, resulting in energy production. This enzyme has been extensively studied in many eukaryotic organisms, recently crystallized, and modeled [6-9]. However, regulation of the MIOX protein in any organism has only recently been examined. Transcription of the MIOX gene in humans has been shown to respond to osmotic response element (ORE) binding proteins and/or the tonicity-responsive enhancer (TonE) binding protein through conserved motifs [7,10]. Expression of genes regulated by ORE/TonE binding proteins has been shown to increase when an AP-1 protein binding sequence is located downstream. In conditions of high osmolarity, this AP-1 mediated increase in transcription is inhibited by A-Fos or Tam-67 [11]. The promoter of a renal specific oxidoreductase with increased expression in diabetes mellitus, which has been experimentally determined to respond to inositol in media, also contains the conserved ORE motif GGAAA [6]. An additional transcriptional regulating sequence, known as an inositol upstream activation sequence (UASINO), has the conserved core sequence CANNTG and has been identified upstream of several genes encoding proteins involved in phospholipid metabolism [1].
In this study, three genes on separate chromosomes, each encoding a MIOX protein, were identified in the C. neoformans genome. The MIOX promoter region, transcriptional regulatory sequences and myo-inositol binding pocket in C. neoformans were characterized. Examination of the genes suggested differential regulation. This examination included identification of upstream regulatory sequences such as ORE, UASINO, and TATA boxes, as well as introns, in-frame termination sequences, expressed sequence tags (ESTs) and CpG islands for each sequence. Molecular modeling of the three protein sequences indicated key differences between the isoforms, possibly affecting the ability of two isoforms to bind the myo-inositol substrate.
Computational analysis of the promoter region of MIOX1 revealed two possible conserved OREs containing the consensus sequence GGAAA [6]. One of these putative ORE sequences, GGGAAAATTGA, is located at -2137 upstream from the transcriptional start site. Another putative ORE, TGGAAAAAAAGA, is located at -645 and is followed by an AP-1 binding sequence (TGATTCA) located at -204. One putative cis-acting inositol upstream activating sequence (UASINO), CATGTGGAAT, was located at -397 and matches the experimentally determined sequence [13] (Table 1). MIOX1 has one predicted TATA box starting at nucleotide -106. EST b9fo8h9.r1 in the TIGR database aligned with bases -87 to 229 and 280-455 with 95% identity. EST a7e05cn.r1 aligned with nucleotides 540-764, 823-947, and 1007-1058 with 93% identity (Table 2). Thus, the MIOX1 gene contains three introns with GT/AG splice sites confirmed by comparison of the genomic sequence to ESTs. Four in-frame termination signals were located at the end of the genomic sequence.
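The kind of motif search described above can be illustrated with a short script. This is a minimal sketch, not the procedure used in the study: it assumes a forward-strand regular-expression scan, the function name scan_promoter and the toy sequence are hypothetical, and only the motif patterns (GGAAA, TGATTCA, CANNTG) come from the text.

```python
import re

# Motif patterns quoted in the text: ORE consensus GGAAA, AP-1 site TGATTCA,
# and the UASINO core CANNTG (N = any base).
MOTIFS = {
    "ORE": "GGAAA",
    "AP-1": "TGATTCA",
    "UASINO": "CA[ACGT]{2}TG",
}


def scan_promoter(upstream, motifs=MOTIFS):
    """Return (name, position, match) tuples for forward-strand motif hits.

    `upstream` is assumed to end immediately before the transcriptional
    start site, so its last base is position -1.
    """
    seq = upstream.upper()
    hits = []
    for name, pattern in motifs.items():
        # The lookahead lets overlapping occurrences be reported as well.
        for m in re.finditer(f"(?=({pattern}))", seq):
            position = m.start() - len(seq)  # negative coordinate relative to the TSS
            hits.append((name, position, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy promoter for illustration; this is not the MIOX1 sequence.
toy_promoter = "TTTGGGAAAATTGACCCATGTGGAATCCTGATTCAGG"
toy_hits = scan_promoter(toy_promoter)
```

Because short cores such as GGAAA occur frequently by chance, hits from a scan like this are only candidates; the reverse strand is also ignored here for brevity.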
The genomic region (TIGR189.m00292 plus +/-3000 extracted from chr09.b3501.040506 STGC) upstream of
Figure 1 Optimal alignment of C. neoformans myo-inositol oxygenase to the Mus musculus myo-inositol oxygenase. MIOX1, MIOX2 and MIOX3 refer to the C. neoformans myo-inositol oxygenase protein isoforms. The predicted consensus secondary structure for each sequence is shown below the respective amino acid. Helices are represented by h and are in red. Sheets are represented by s and are in green. Each protein is predicted to have 10 helices and four sheets. Conserved amino acids essential to enzyme function are highlighted in blue.
As RNA stabilization can affect functional expression, the predicted RNAs were examined. Secondary and tertiary analysis of the computationally predicted RNA showed no significant variance between the corresponding structures of the three genes (data not shown).

MIOX Sequence alignment and Homology Modeling
In the Mus musculus MIOX protein, the myo-inositol substrate has been determined to be buried in a pocket formed by two short sections of the protein and a hairpin loop [8]. Multiple sequence alignment of the Mus musculus MIOX protein sequence and the three MIOX protein sequences from C. neoformans reveals 100% identity for the two short sections of the protein (Figure 1). Despite the similarity of the protein sequences and the conservation of several key amino acids, homology modeling of the putative proteins encoded by the three C. neoformans genes revealed some significant differences (Figures 2 and 3). The following amino acid numbering is based on the mouse protein [8]. Amino acid Asp-124 was previously identified to be critical for the binding of iron, which is necessary for optimal protein function. This amino acid (Asp-124) is conserved in all three C. neoformans MIOX proteins [8]. (Figure 3). The first three residues (Leu, Val, Asp) and residues five through eight (Ser, Asp, Pro, Asp) in the MIOX2 and MIOX3 predicted hairpin loops align with the experimentally determined hairpin loop identified in Mus musculus. The fourth residue of the hairpin loop has a substitution of Ala for Asp in both the MIOX2 and MIOX3 proteins, replacing an acidic residue with a smaller non-polar residue. Amino acid residues nine and ten of the hairpin loop are the same as those located in the MIOX1 protein (Thr and Ser) (Table 5).

Discussion
C. neoformans appears to be the only organism in the animal and fungal kingdoms with multiple MIOX genes. Examination of over 60 completed eukaryotic genomes from the animal and fungal kingdoms revealed that if the MIOX gene is present, there is only one highly conserved copy (data not shown). Perhaps the three copies of the MIOX gene in the C. neoformans genome represent a physiological mechanism for survival in various environmental inositol concentrations.
This computational study suggests there are at least two types of sequences regulating transcription of the C. neoformans MIOX genes: one involves ORE sequences, the other involves UASINO sequences. ORE sequences were originally identified in vertebrates and UASINO sequences were demonstrated in the yeast Saccharomyces cerevisiae, but both are present in the C. neoformans genome. The genome of C. neoformans, a basidiomycete, has been shown to contain features similar to those of other yeasts, yet its gene organization is more complex, resembling that of higher eukaryotes [14]. Both MIOX1 and MIOX2 have two ORE/TonE sequences upstream of the genes. MIOX protein expression in humans has been demonstrated to be regulated by ORE/TonE binding proteins [7,10]. In vertebrates, the experimentally demonstrated binding sites for each transcription factor are slightly different. The core sequence TGGAAA is recognized by the ORE binding protein, whereas the core sequence GGAAAA is recognized by the TonE binding protein, also known as NFAT5 [11]. Interestingly, comparison of the C. neoformans promoter regions of MIOX1 and MIOX2 reveals nucleotide differences in the identified ORE/TonE sequences, suggesting differential regulation of the two genes. MIOX1 has two ORE-like sequences with the conserved TGGAAA sequence coupled with an AP-1 binding sequence. This would render MIOX1 subject to A-Fos and Tam-67 inhibition. Two TonE-like sequences are located in MIOX2; however, no AP-1 sequence was identified. The presence of the TonE sequences and the absence of an AP-1 binding sequence suggest that MIOX2 is up-regulated in the presence of inositol but is not subject to inhibition by A-Fos or Tam-67 [11].
Unlike MIOX1 and MIOX2, the MIOX3 gene lacked a conserved ORE/TonE sequence within 3000 base pairs upstream of the transcriptional start site suggesting MIOX3 is not regulated by the ORE or TonE transcription factors.
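The distinction between the two vertebrate core sequences quoted above (TGGAAA for the ORE binding protein, GGAAAA for the TonE binding protein) can be expressed as a simple string test. The following is an illustrative sketch only; the function name classify_site and the example strings are hypothetical and are not taken from the MIOX promoters.

```python
def classify_site(site):
    """Label a candidate element using the two core sequences quoted in the text.

    Returns a list because the cores overlap: a site containing TGGAAAA
    carries both TGGAAA and GGAAAA.
    """
    s = site.upper()
    labels = []
    if "TGGAAA" in s:
        labels.append("ORE-like")
    if "GGAAAA" in s:
        labels.append("TonE-like")
    return labels or ["neither"]


# Hypothetical example strings chosen to show each outcome.
examples = {
    "TTGGAAAC": classify_site("TTGGAAAC"),
    "CGGAAAAC": classify_site("CGGAAAAC"),
    "TGGAAAA": classify_site("TGGAAAA"),
    "CCCC": classify_site("CCCC"),
}
```

Reporting both labels for an overlapping site, rather than forcing a single call, keeps the ambiguity visible so that flanking bases can be inspected by hand.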
In addition to the ORE and AP-1 sequences, MIOX1 also contains a UASINO sequence. Several C. neoformans genes involved in phospholipid and phospholipid precursor biosynthesis (including MIOX) have been experimentally found to be regulated by the availability of inositol and choline in the medium [4,12]. Studies of the transcriptional control coordinating expression of these genes in S. cerevisiae led to the identification of an upstream activating sequence (UAS) with a core conserved sequence of CANNTG [1,15] [16]. Unexpressed genes and genes expressed at very low levels or only under specific conditions are also less likely to have associated ESTs/cDNAs. Therefore, the absence of ESTs for MIOX3 suggests that the gene is either expressed only under specific conditions or not expressed at all.
The identity of the MIOX2 and MIOX3 coding sequences (80%) is significantly less than that of their amino acids (91%). This can be attributed to changes in the wobble position. Manual analysis of the aligned sequences revealed the majority (78%) of the base changes to be in the wobble position. The synonymous changes allowed the identity of the DNA sequences to decrease but did not change most of the amino acids encoded, thereby retaining the protein sequence identity. The conservation of the amino acid sequence despite changes in the DNA suggests that the function of the protein is conserved through selective pressure.
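The wobble-position tally described above was performed manually; the same count can be sketched in code. This is a minimal illustration under stated assumptions: the two coding sequences are in frame, of equal length, and aligned without gaps. The function name wobble_fraction and the toy sequences are hypothetical.

```python
def wobble_fraction(cds_a, cds_b):
    """Fraction of nucleotide differences that fall on the third codon position.

    Both inputs are assumed to be in-frame coding sequences of equal length
    that are already aligned without gaps.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    pairs = zip(cds_a.upper(), cds_b.upper())
    diffs = [i for i, (a, b) in enumerate(pairs) if a != b]
    if not diffs:
        return 0.0
    # Index 2 within each codon (0-based) is the wobble position.
    third = sum(1 for i in diffs if i % 3 == 2)
    return third / len(diffs)


# Hypothetical toy sequences: three wobble changes and one second-position change.
toy_a = "GCTGATCCAAAA"
toy_b = "GCCGACCCGAGA"
toy_fraction = wobble_fraction(toy_a, toy_b)
```

Real alignments with insertions or deletions would need gap handling before a count like this is meaningful.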
The MIOX substrate, myo-inositol, is held in the active site by several key interactions. Three salt bridges created between Asp-85/Lys-127, Asp-88/Lys-257 and Asp-92/Arg-39 form a lid that holds the substrate in the active site, along with main-chain H-bonds with Thr-32, Arg-39 and Gln-136 that stabilize the active site. All residues essential in myo-inositol binding were found to be conserved in C. Protein sequence alignment and the homology models generated for the MIOX2 and MIOX3 proteins of C. neoformans also indicate three key differences from the mouse

Conclusion
The presence of multiple copies of the MIOX gene is unique to C. neoformans among the animal and fungal kingdoms. This study suggests that the C. neoformans genome has multiple copies of the MIOX gene which appear to be differentially expressed under various physiological inositol conditions. MIOX1 protein is predicted to be efficient in binding the myo-inositol substrate [12]. Tam-67 at the AP-1 site. This inhibition may be compensated for by the up-regulation of MIOX2. Although the MIOX2 enzyme is not predicted to be as efficient at holding the myo-inositol substrate in the active site, due to loss of salt bridges between the lid and main chain and loss of main-chain hydrogen bonds, expression of MIOX2 should not be inhibited at elevated inositol levels because it lacks an AP-1 binding sequence. This differential regulation of inositol catabolism could facilitate the growth and viability of C. neoformans in various environments.

MIOX Gene Identification and Characterization
To locate the MIOX gene within the Cryptococcus neoformans genome, the N-terminal region of the MIOX1 protein (previously isolated in this laboratory) was submitted as a query to the TBLASTN program via BLAST at the Stanford Genome Technology Center (STGC). A sequence with 100% identity to the MIOX protein N-terminal region, plus 3000 bases +/-, was then used to search for cDNAs and ESTs in the TIGR C. neoformans gene indices database using BLAST (blastn), and the resulting sequences were aligned with GAP from the Wisconsin sequencing package on the W-H2 server (GCG) [16]. Possible open reading frames (ORFs) were located using Map (GCG), Translate (GCG), GENSCAN [17], and ORF Finder [18].
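The idea behind open reading frame detection can be illustrated with a naive scan. This sketch is not a reimplementation of Map, Translate, GENSCAN or ORF Finder; it assumes the standard stop codons, searches only the three forward frames, and ignores introns, which tools such as GENSCAN are designed to handle. The function name find_orfs and the toy sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard stop codons


def find_orfs(seq, min_codons=3):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and each ORF includes its
    stop codon. `min_codons` counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume searching after the stop
    return sorted(orfs)


# Hypothetical toy sequence containing one short ORF in the third frame.
toy_seq = "CCATGGCTGATTAACC"
toy_orfs = find_orfs(toy_seq)
```

For an intron-containing gene, a continuous-frame scan like this would report only fragments, which is why EST alignment and gene-structure prediction were also needed.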

MIOX Secondary Structure Prediction
Predictions of protein secondary structures for MIOX1, MIOX2 and MIOX3 were computed using J-pred [29], PredictProtein [30], PSIPRED [31], Discrimination of protein Secondary structure Class (DSC) [32], Hydrophobic cluster analysis (HCA) [33], PSSFinder [34], and SAM_T02 [35] methods. The results from each method were compared and a consensus structure for each MIOX protein was generated based on regions of similarity. The consensus took into account only the sequence regions that were predicted with at least 50% reliability by at least five servers. All alignments were generated with CLUSTALW [36] or GAP, then edited manually using BioEdit [37].
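The consensus rule described above (a state is kept only where at least five servers agree with at least 50% reliability) can be sketched as a per-residue vote. This is an assumed formalization rather than the exact procedure used: the function name consensus_structure is hypothetical, "h" and "s" follow the Figure 1 notation, and "c" for coil and "-" for no consensus are labels introduced here.

```python
from collections import Counter


def consensus_structure(predictions, reliabilities=None,
                        min_votes=5, min_reliability=0.5):
    """Build a per-residue consensus from several secondary structure strings.

    `predictions` holds one equal-length string per server. `reliabilities`,
    if given, holds per-residue scores in [0, 1] for each server; calls below
    `min_reliability` are not counted. Positions without `min_votes`
    agreeing servers are marked "-".
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for pos in range(length):
        votes = Counter()
        for k, pred in enumerate(predictions):
            if reliabilities is not None and reliabilities[k][pos] < min_reliability:
                continue  # skip low-confidence calls
            votes[pred[pos]] += 1
        state, count = votes.most_common(1)[0] if votes else ("-", 0)
        consensus.append(state if count >= min_votes else "-")
    return "".join(consensus)


# Hypothetical output of seven servers for a four-residue stretch.
toy_predictions = ["hhsc", "hhsc", "hhsc", "hhsc", "hhcc", "hscc", "cscc"]
toy_consensus = consensus_structure(toy_predictions)
```

In the toy input, the third residue is left unassigned because only four of the seven servers agree on a sheet.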

Template Identification and Protein Modeling
The SWISS-MODEL Comparative Protein Modeling Server [38], along with Modeler within the Accelrys Insight II program suite, was used to generate 3D models of the putative MIOX proteins. The MIOX protein from Mus musculus (PDB code: 2HUO) was used as a template. Each C. neoformans MIOX protein was manually aligned to the template, then submitted to the server. The N-terminal does not support the formation of a salt bridge at these locations. The main-chain hydrogen bonds involving Arg and Gln are predicted to be conserved in all four proteins. The MIOX2 and MIOX3 proteins share a substitution of Val for Thr, disrupting the main-chain hydrogen bond found at that location in the Mus musculus protein. The MIOX1 protein has a conservative substitution of Glu for Thr, suggesting the preservation of that main-chain hydrogen bond.