What is normal? Next generation sequencing-driven analysis of the human circulating miRNAOme

Background MicroRNAs (miRNAs) are short non-protein-coding RNA species that have a regulatory function in modulating protein translation and degradation of specific mRNAs. MicroRNAs are estimated to target approximately 60 % of all human mRNAs and are associated with the regulation of all physiological processes. Similar to many messenger RNAs (mRNA), miRNAs exhibit marked tissue specificity, and appear to be dysregulated in response to specific pathological conditions. Perhaps, one of the most significant findings is that miRNAs are detectable in various biological fluids and are stable during routine clinical processing, paving the way for their use as novel biomarkers. Despite an increasing number of publications reporting individual miRNAs or miRNA signatures to be diagnostic of disease or indicative of response to therapy, there is still a paucity of baseline data necessary for their validation. To this end, we utilised state of the art sequencing technologies to determine the global expression of all circulating miRNAs within the plasma of 18 disease-free human subjects. Results In excess of 500 miRNAs were detected in our study population with expression levels across several orders of magnitude. Ten highly expressed miRNAs accounted for 90 % of the total reads that mapped showing that despite the range of miRNAs present, the total miRNA load of the plasma was predominated by just these few species (50 % of which are blood cell associated). Ranges of expression were determined for all miRNA detected (>500) and a set of highly stable miRNAs identified. Finally, the effects of gender, smoking status and body mass index on miRNA expression were determined. Conclusions The data contained within will be of particular use to researchers performing miRNA-based biomarker screening in plasma and allow shortlisting of candidates a priori to expedite discovery or reduce costs as required. Electronic supplementary material The online version of this article (doi:10.1186/s12867-016-0057-9) contains supplementary material, which is available to authorized users.

extra-cellular signalling molecules, retaining their biological activity in recipient cells [1].
In considering the use of miRNAs as novel biomarkers, it is essential to generate baseline data that describes which miRNA species are present in a given biological fluid, where they likely originate from, and what is considered normal in terms of their patterns of expression. Indeed, we take such data for granted when we consider other routine clinical analyses such as the measurement of plasma/serum aspartate aminotransferase and alanine aminotransferase for the determination of liver function. Despite an increasing number of publications (5897 articles indexed by PubMed at the time of publication) reporting individual miRNAs or miRNA signatures as biomarkers of various conditions, there is still a paucity of baseline data necessary for their validation. To date, there are no comprehensive assessments of plasma miRNA expression conducted using unbiased methodology (i.e. where the miRNA targets are not required a priori) in a representative set of healthy human subjects reported. To this end, we utilised state of the art sequencing and bioinformatic techniques to determine the global expression of all circulating miRNAs within the plasma of 18 diseasefree human subjects with high resolution. We report data to support the key questions outlined above including (1) a comprehensive list of all miRNAs found within human plasma, (2) the expected range of expression in a diseasefree cohort and (3) the likely tissue of origin for a selection of the most highly expressed miRNAs. Furthermore, we report the effects of sex, smoking status and differing body mass index (BMI) on these parameters.

Results and discussion
All data presented herein are derived from 18 human plasma specimens, details of which can be found in Table 1. Circulating RNA was extracted from 5 mL of each plasma sample, yielding an average RNA mass of 0.01 µg ( Table 2). The mass of RNA obtained was consistent irrespective of gender (P > 0.05; two-tailed t test; n = 9). Next generation sequencing for small RNA molecules yielded an average of 8,080,000 high quality (≥Q30) sequencing reads per sample (Fig. 1). Raw sequencing data in fastq format can be obtained from the Short Read Archive (SRA), accession number SRP064104. Mapping of the raw sequencing data to miRBase version 20.0 ( Fig. 2a) identified 548 different miRNAs across the 18 samples analysed. Of these, only 53 were present in all samples ( Table 3). The number of reads mapped and the number of miRNAs identified in each plasma sample is detailed in Fig. 2b, c respectively.
Normalised miRNA expression values were used to compare the expression of individual miRNAs between plasma samples (Additional file 1: S1). In the first instance, we investigated which miRNAs were most abundantly expressed in human plasma. Based upon the mean expression of 18 independent samples, microR-NAs 486-5p, 10b-5p, 320a, 423-5p, 92a-3p, 22-3p, 10a-5p, 181a-5p, 151a-3p, and let-7f-5p were found to be most abundantly expressed in the plasma of our study participants. Surprisingly, the most abundant miRNA (miR-486-5p) accounted for almost 60 % of the total number of sequencing reads mapping to miRNAs and furthermore, the ten most abundant miRNAs accounted for approximately 90 % of mapped reads. To verify these striking findings, we reviewed the results of thirty-seven publically available plasma miRNA-seq datasets obtained via miRmine online (Guan Laboratory, University of Michigan-Additional file 1: S1, sheet 2) and undertook a complete re-analysis (from raw sequencing data) of three randomly selected plasma sequencing runs from the short read archive [accession numbers; SRR1005875, SRR1005877 and SRR1005876]. The top ten miRNAs accounted for 68 and 74 % of the total mapped read count in the reviewed and re-analysed datasets respectively. The fraction of the total plasma miRNA read count attributed to each of the top 10 most highly expressed miRNAs is detailed in Fig. 3 (left-hand axis). The normalised expression (reads per million; RPM expressed in Log 2 scale) of each of these miRNAs in human whole blood, serum and plasma (using publically available data via miRmine) is included on the right-hand axis. Comparison of plasma data from this study and those derived from miRMine revealed no clear correlation, however, significant variation within the later dataset may explain this finding (Fig. 3, right-hand axis error bars).
Next we sought to determine the origin of these highly expressed miRNAs. As per the available literature, five out of the ten most highly expressed miRNAs were previously identified as being highly expressed by cells of the blood. MicroRNAs 486-5p and 92a-3p were shown to be highly expressed by erythrocytes, whilst miRNAs 181a-3p, 151a-3p and let-7f-5p were highly expressed by various blood cell types [11,15]. To ensure our highly expressed miRNAs were naturally present at such levels, as opposed to the result of haemolysis, we determined the expression variation in both blood cell and non-blood cell associated miRNAs with the expectation that these should be equivalent in the absence of haemolysis. Variation in our blood cell associated miRNAs was in fact less than that of the non-blood cell associated miRNAs (Coefficient of variation = 95.8 vs 116.3 %, N = 18), supporting our findings. Leveraging over 300 publically available next generation sequencing datasets (including 66, 13 and 37 in human blood, serum and plasma respectively), we were able to determine the tissue expression profile of each highly expressed miRNA. Whilst each miRNA exhibited a different tissue based expression pattern, nine of the ten (miRNA-151-3p was the exception) highly expressed miRNAs identified herein had median expression levels in either blood, serum or plasma in excess of the median expression for all tissues, suggesting that the blood is a significant source of these. Furthermore, in considering only those miRNA-seq datasets generated from human plasma, nine of the ten (miR-10b-5p was the exception) top expressed miRNAs identified in this study appeared within the upper quartile of expression, confirming that our results mirror those of other studies conducted using similar methods in the same biological sample type. Comprehensive tissue expression data for the most highly expressed miRNAs identified in human plasma is included within Fig. 4. Our realisation that a number of the most highly expressed miRNAs were of blood cell origin led us to consider the potential implications that this may have for biomarker development (for a comprehensive review of the challenges associated with miRNA biomarker development, please refer to [16]. The routine clinical sampling of blood involves venapuncture, delivery to a suitable storage vessel with or without anticoagulant and delivery to the analytical laboratory. The preparation of plasma or serum involves a further step during which blood cells are removed by centrifugation to leave a "cell free" , total number of reads of miRNA size (sum of red and green elements) and the total number of reads mapped to miRBase (green element) for each study participant. c Total number of miRNAs mapped for each study participant sorted according to gender (blue-males, pink-females) (n = 9, N = 18). Note, only those miRNAs detected by at least ten independent sequencing reads, and that were present in at least three study participants are included supernatant. At each point, there is potential to influence the number of blood cell associated miRNAs present in the remaining sample either through haemolysis following venapuncture or during transportation, or via variation in the time and or intensity of centrifugation. In support of this, Pritchard and colleagues have previously demonstrated significant increases in erythrocyte-associated miRNAs (miRs-451, 16, 92a, 486) in haemolysis, and explained miR-150 and 223 expression as a function of lymphocyte and neutrophil count respectively [15]. Thus, small variations in blood sample preparation may lead to individual samples being enriched or depleted in certain blood cell types and thus have significant impacts upon the expression of specific miRNAs. Of concern is the fact that all of the blood cell associated miRNAs identified herein have been identified in biomarker screens and or proposed as novel biomarkers for various conditions (miR-486-5p-gastric adenocarcinoma, miR-92a-3pcolorectal cancer, miR-181-5p-endometrial carcinoma, miR-151a-3p-paracetamol toxicity, let-7f-5p-Alzheimer's disease) and may thus be subject to the aforementioned issues.
In order to determine the normal range of circulating miRNA expression within our disease free population, the mean and the standard error of the normalised expression values was calculated and the miRNAs ranked according to their apparent stability. The most stably expressed miRNAs included microRNAs 486-5p, 25-3p, 10b-5p, 99b-5p, let-7f-5p, 10a-5p, 423-3p, 101-3p, 532-5p, and 103a-3p (Fig. 5). Although several of the most abundantly expressed miRNAs were identified, these highly stable miRNAs represented several orders of expression magnitude including both highly abundant and very lowly expressed miRNAs. Wang et al. [18] have previously reported the stability of a restricted set of plasma and serum miRNAs and despite differences in experimental design and measurement techniques, various miRNAs were identified as highly stable in both studies ( Table 4).
The sole inclusion criterion for this study was the requirement to be free from overt disease at the time of blood donation and thus our patient cohort was diverse in their gender, smoking status and BMI. It was therefore possible to determine the effects of these factors  on circulating miRNA expression. Perhaps unsurprisingly, the gender of the study participant had a significant effect on their circulating miRNA profile. Seventy-five miRNAs were differentially regulated between the two genders ( Fig. 6a) however rather surprisingly 74 of these were up-regulated in females (in the absence of any significant differences in RNA mass obtained, number of clean sequencing reads or indeed the percentage of reads mapped to miRNAs- Fig. 6b, c). MicroRNA 486-5p, present in erythrocytes, was the only miRNA to be expressed at a higher level in males than in females, and we consider that this is likely due to the higher initial erythrocyte count that males present with. In considering why all of the deregulated miRNAs were elevated in the female sex, we determined the genomic location at which they were encoded with a hypothesis that there would be an over representation of miRNAs encoded by the X chromosome. In fact of the 75 miRNAs deregulated, only three were encoded at a locus on the X chromosome confirming that X copy number is unlikely responsible for this phenomenon. A review of the available literature reveals that few studies have previously investigated gender-specific miRNA expression, and there is an absence of those that have specifically considered gender-specific expression in healthy human subjects. In order to validate our findings, we therefore returned to the 37 publically available plasma miRNA-seq datasets, 27 of which had gender information. Considering the 74 miRNAs differentially expressed between male and female subjects in our study, we confirmed that 53 of these were also more highly expressed in the female sex of the publically available datasets. We next considered the impact of smoking status on circulating miRNA expression. Of the 18 participants, 8 (45 %) were self-reported smokers (an equal number of males and females). Smoking was associated with the down-regulation of 27 plasma miRNAs (Fig. 7), including several previously identified as performing tumour suppressor-like functions; let-7i-5p [2], miR-148a-5p [22], miR-218-5p [17], miR-29-3p [20], miR-133a [4], miR-296-5p [10] and miR-370 [21]. We consider this finding particularly interesting; however, these results should be confirmed by analysis of a larger cohort of smokers incorporating robust statistical techniques to control for false discovery. Various publications have investigated the impact of smoking on miRNA expression in the lungs [8] and lung cells [5] and have reported an apparent global down-regulation of miRNA regulation, potentially mediated via disruption of the Erk/dicer/trbp pathway [14]. Here, we show the same phenomenon in the plasma Fig. 3 Fraction of the total sequencing reads mapped occupied by each of the ten most highly expressed miRNAs (based upon the mean of 18 samples) are presented (grey bars) ± S.E.M. The mean expression in human plasma, serum and whole blood derived from 116 publically available datasets is included on the right hand axis for comparison (reads per million-RPM ± S.E.M, in Log2 form). Note for clarity, errors bars are only included for the plasma datasets however, there is little agreement at present between the dysregulated miRNAs identified in each individual study-perhaps reflecting a difference in the smoking cohort, experimental setup or suggesting that there are tissue specific effects.
Finally, we considered the effects of obesity on the circulating miRNA profile of our study participants. Of the 18 donors, 8 had a BMI of greater or equal to 30 and were considered obese (2 males and 6 females). Sixteen miR-NAs were dysregulated in obese individuals, and were more highly expressed compared to normal weight control subjects in all cases (Fig. 8a). Given that our donors had a range of BMI values, we sought to determine whether any of the obesity-associated miRNAs correlated with BMI. The expression of both miR-129-5p and miR-30e-3p were found to correlate with obesity (R = 0.67 and 0.64 respectively) (Fig. 8b, c). In both cases, the relationship was stronger amongst female participants however it is highly likely that this effect is down to a higher number of females than males being obese.

Conclusions
The potential of miRNAs to serve as novel biomarkers is under intense investigation. To date, numerous studies have attempted to correlate miRNA expression with a range of endpoints. Despite the number of studies reporting such miRNA biomarkers, there is still a relative paucity of baseline data reporting the miRNA species expressed in a given biological fluid, what is considered normal in terms of their patterns of expression, and further, how this expression is altered by gender, obesity and smoking status. In this study we utilised state of the art sequencing technology to provide data to support the above key questions. In excess of 500 miRNAs were detected in our study population with expression across several orders of magnitude. However, only 53 of those miRNAs were detected in all participants with some miRNAs being detected in just a single individual. In considering relative expression between miRNAs, the top 10 most highly expressed candidates accounted for 90 % of the total reads that mapped to all miRNAs suggesting that despite the range of miRNAs present, the total miRNA load of the plasma was predominated by just 10 different species. Furthermore, many of the most abundant miRNAs have been shown to be highly expressed in cells of the blood and thus, perhaps with the exception of haematological biomarkers, their use as biomarkers should be approached with caution. Ranges of expression were determined for all miRNA detected (> 500) and a set of highly stable miRNAs identified. Finally, the effects of gender, smoking status and BMI on miRNA expression were determined. These data provide researchers with the ability to (1) determine the presence or absence of individual miRNAs within the plasma of a disease-free population, (2) consider the penetrance of each individual miRNA, (3) gain insight into the relative expression levels and "normal range" of individual miRNAs, ensuring that only suitabily highly expressed and stable candidates are taken forward, and (4) to have an indication of whether individual miRNAs are regulated by gender, smoking or obesity. Perhaps most striking and of most relevance here is the finding that of the large pool of miRNA detected in disease-free plasma, only a very small proportion of this may house suitable biomarker candidates (considering that, with the exception of the 10 most highly expressed miRNAs, only 10 % of the total miRNA pool is left and once miRNAs with low penetrance, those associated with cells of the blood, and those with high levels of inter-individual variation are discounted). Thus, the information contained within will assist researchers in designing more targeted miRNA biomarker studies.

Ethical approval and consent to participate
Sample collection was undertaken by Sera Lab ltd. who obtained ethical approval by an Institutional Review Board (Schulman Associates IRB #201209850). Written informed consent to participate was obtained from each study participant prior to sample collection.

Consent for publication
The supplying company (Sera Lab ltd) obtained written consent to produce reports or articles about the study from each participant, subject to their names being omitted from any such publications.

Samples, RNA isolation and sequencing
This study utilised human control material purchased from SeraLab ltd. All data were analysed anonymously; donor details are included in Table 1. Whole blood was drawn into EDTA containing tubes and stored on ice prior to centrifugation at 1000×g to obtain the plasma component. Plasma samples were frozen at −20 °C immediately after production. RNA was extracted from 5 mL of each plasma sample using the QiaAmp Circulating Nucleic Acid extraction kit in accordance with the manufacturer's standard instructions. The quantity and quality of all RNA extracts was determined using the QuBit fluorimeter (Invitrogen) and Agilent BioAnalyzer (with Small RNA chip) respectively. Small RNA sequencing libraries were preparing using the TruSeq Small RNA kit according to the manufacturer's standard directions. Sequencing was conducted on the Illumina HiSeq 2000 platform with the aim of obtaining >5,000,000 reads per sample.