Can the metatranscriptomics replace the approach of functional metatranscriptomics/functional metagenomics?

While metatranscriptomics reveals information about the expression of genes and their functions too, functional metatranscriptomics allows the characterization of genes expressed by different eukaryotic microorganisms. My question is can we use both of those techniques? I couldn't find out whether they will give the same results or whether each one will give its own.

I would consider these to be complementary approaches. They don't really provide the same information, or even the same kind of information.

A big challenge with sequence-based genomics and transcriptomics is that they rely on sequence homology for functional annotations. In other words, they only provide indirect information about gene function, which can be incorrect or, in many cases, completely absent (this is why you often see annotations like "hypothetical protein xxx" in genomics papers). So what can happen is, you may identify a great number of genes with no functional annotation, so you have no basis for determining what they are doing, or worse, they are incorrectly annotated and you publish misleading reports. This is especially challenging in complex and diverse environments like soil.

The advantage of the functional approach is that you don't have to rely on sequence homology for functional annotation, since you're characterizing the phenotype directly. The disadvantage is that you can only find the specific functions you are looking for. To bring it back to the example you linked, if you're only screening for dipeptide transporters, you would not be able to identify anything that doesn't directly influence that phenotype.

So, it really depends on what information you are after in your study. If you wanted to completely replace the functional genomics approach with sequencing approaches, it would only work if your genes of interest already have reliable functional annotations and if your sequencing depth is sufficient to identify them. That said, sequences from a functional study like the one you shared could be used to inform your sequencing approach, or to mine existing metagenomic or metatranscriptomic datasets for genes that share homology. Heck, if you only have a few target genes and they have some well conserved regions, you could probably come up with an acceptable RT-PCR approach too.

Here is my understanding of the terms you present.

Metatranscriptomics refers to RNA-seq of a mixed microbial community or environmental sample. This is a general approach to characterize the gene expression of all organisms in a sample, and will rely on alignment of transcripts to existing genomes or metagenomes for which genes have been called and annotated.

Functional metatranscriptomics refers to the process of isolating RNA, reverse transcribing the RNA to make a cDNA library, and cloning that library into a model host to assess cDNA-encoded functions. This is a method that allows for identification of transcripts that encode a specific function in cases where the organism(s) that produced those transcripts are not genetically tractable. In the ISME paper you link, cDNAs were cloned into a yeast strain lacking dipeptide transporters in order to look for environmental genes that complement the knocked-out functions.

My question is can we use both of those techniques?

These are two techniques that differ in the type and scope of data produced, and their utility in application will depend on the scientific questions you want to answer.

Metatranscriptomic Approach to Analyze the Functional Human Gut Microbiota

Affiliations Unidad Mixta de Investigación en Genómica y Salud-Centro Superior Investigación en Salud Pública (Generalitat Valenciana)/Instituto Cavanilles de Biodiversidad y Biología Evolutiva (Universitat de València), València, Spain, CIBER en Epidemiología y Salud Pública, València, Spain

Current address: EMBL-EBI, Cambridge, United Kingdom

Affiliation Unidad Mixta de Investigación en Genómica y Salud-Centro Superior Investigación en Salud Pública (Generalitat Valenciana)/Instituto Cavanilles de Biodiversidad y Biología Evolutiva (Universitat de València), València, Spain

REVIEW article

Sequencing-based analyses of microbiomes have traditionally focused on addressing the question of community membership and profiling taxonomic abundance through amplicon sequencing of 16S rRNA genes. More recently, shotgun metagenomics, which involves the random sequencing of all genomic content of a microbiome, has dominated this arena due to advancements in sequencing technology throughput and capability to profile genes as well as microbiome membership. While these methods have revealed a great number of insights into a wide variety of microbiomes, both of these approaches only describe the presence of organisms or genes, and not whether they are active members of the microbiome. To obtain deeper insights into how a microbial community responds over time to its changing environmental conditions, microbiome scientists are beginning to employ large-scale metatranscriptomics approaches. Here, we present a comprehensive review on computational metatranscriptomics approaches to study microbial community transcriptomes. We review the major advancements in this burgeoning field, compare strengths and weaknesses to other microbiome analysis methods, list available tools and workflows, and describe use cases and limitations of this method. We envision that this field will continue to grow exponentially, as will the scope of projects (e.g. longitudinal studies of community transcriptional responses to perturbations over time) and the resulting data. This review will provide a list of options for computational analysis of these data and will highlight areas in need of development.


Animal experiments and sample collection

Forty-eight steers were selected, according to their breeds and residual feed intake (RFI) ranking, from a herd of 738 beef cattle that were born in 2014 and raised at the Roy Berg Kinsella Research Ranch, University of Alberta. These 48 steers belong to three breeds and two RFI groups (high RFI [H-RFI, inefficient] and low RFI [L-RFI, efficient]), including two purebreds (Angus [ANG]: H-RFI, n = 8; L-RFI, n = 8 and Charolais [CHAR]: H-RFI, n = 8; L-RFI, n = 8) and one crossbred (Kinsella composite hybrid [HYB]: H-RFI, n = 8; L-RFI, n = 8). The animal study was approved by the Animal Care and Use Committee of the University of Alberta (no. AUP00000882), following the guidelines of the Canadian Council on Animal Care [30]. The HYB population was bred from multiple beef breeds including Angus, Charolais, Galloway, Hereford, Holstein, Brown Swiss, and Simmental as described previously [31]. These animals were all kept under the same feedlot conditions and fed the same high-energy finishing diet, which consisted of 80% barley grain, 15% barley silage, and 5% Killam 30% Beef Supplement Pellets (Tag 849053 Hi-Pro Feeds, Westlock, AB, Canada). Dry matter intake (DMI) and eating frequency (the number of times an individual visited the feed bunk per day) were individually recorded using the GrowSafe system (GrowSafe Systems Ltd., Airdrie, AB, Canada). RFI values were calculated based on DMI, average daily gain (ADG), metabolic weight (MWT), and back fat thickness as described previously [32]. Steers were slaughtered before feeding at Lacombe Research Centre (Agriculture and Agri-Food Canada, Lacombe, AB, Canada). Rumen digesta samples were collected at slaughter, snap-frozen using liquid nitrogen, and stored at −80 °C until further analysis. Rumen weight was obtained using a weight balance after completely emptying the rumen of digesta and fluid.

DNA extraction and metagenome sequencing

Total genomic DNA was isolated from rumen digesta using the repeated bead beating plus column (RBB + C) method as described in [33]. The quality and quantity of DNA were measured using a NanoDrop Spectrophotometer ND-1000 (Thermo Fisher Scientific Inc., Wilmington, DE, USA). Metagenome libraries were constructed using the TruSeq DNA PCR-Free Library Preparation Kit (Illumina, San Diego, CA, USA), and the quantity of each library was evaluated using a Qubit 2.0 fluorimeter (Invitrogen, Carlsbad, CA, USA). Sequencing of metagenome libraries was conducted at the McGill University and Génome Québec Innovation Centre (Montréal, QC, Canada) using Illumina HiSeq 2000 (100 bp paired-end sequencing of

RNA extraction and metatranscriptome sequencing

Total RNA was extracted from rumen digesta following the procedure described in [13]. The RNA yield was measured using a Qubit 2.0 fluorimeter (Invitrogen), and the RNA quality was measured using an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). Only samples with RNA integrity number (RIN) ≥ 7.0 were used to generate metatranscriptome libraries. In the current study, two types of metatranscriptome libraries were constructed: total RNA-based metatranscriptome libraries (T-metatranscriptome) and mRNA-enriched metatranscriptome libraries (M-metatranscriptome). For the M-metatranscriptome library construction, rRNA in each sample was depleted using the Ribo-Zero Gold rRNA Removal Kit (Epidemiology) (Illumina) according to the manufacturer’s instructions. Total RNA and enriched mRNA were used for T- and M-metatranscriptome library construction, respectively, using the TruSeq RNA Library Prep Kit v2 (Illumina). Sequencing of T- and M-metatranscriptome libraries was conducted at the McGill University and Génome Québec Innovation Centre (Montréal, QC, Canada) using Illumina HiSeq 2000 (100 bp paired-end sequencing of 140 bp inserts) and 2500 (125 bp paired-end sequencing of 140 bp inserts), respectively.

Analysis of metagenomes and metatranscriptomes

The quality control (QC) of each dataset was performed using Trimmomatic (version 0.35) [34] to trim artificial sequences (adapters), cut low quality bases (quality scores < 20), and remove short reads (< 50 bp). The program SortMeRNA (version 1.9) [35] was used to extract rDNA and rRNA reads from sequencing datasets. Non-rDNA/rRNA reads were then mapped to the bovine genome (UMD 3.1) using Tophat2 (version 2.0.9) [36] to remove potential host DNA and RNA contaminations. Taxonomic profiles of the active rumen microbiota were generated using 16S rRNA extracted from T-metatranscriptomes following the pipeline described in [13]. Briefly, post-QC bacterial and archaeal 16S rRNA reads were aligned to the V1-V3 region-enriched Greengenes database (version gg_13_8) [37] and the V6-V8 region-enriched RIM-DB database [38], respectively. After that, mapped reads were taxonomically classified using the naive Bayesian approach [39] in mothur [40].

To estimate rumen microbial functional profiles, non-rDNA sequences from all metagenomes (n = 48) were pooled, assembled, and annotated to create a functional reference database. Briefly, the pooled metagenomes were de novo assembled using the Spherical program [41]. Within Spherical, Velvet [42] was set as the assembler with a kmer size of 31, Bowtie2 [43] was set as the aligner, and 25% of total pooled sequences were subsampled as the input for each iteration of assembly, with eight iterations in total. After the de novo assembly of pooled metagenome reads, a total of 57,696,422 contigs with an average length of 144 bp (max 135,846 bp) and an N50 length of 140 bp were generated. Assembled contigs were then annotated using the blastx module in DIAMOND [44] against the UniProt database [45], and only annotations with bitscore > 40 were kept for the downstream analysis. Overlapping annotations were filtered and converted to the GFF format using the MGKit package. After discarding short contigs with length < 60 bp, 20,314,713 contigs (35.21%) were successfully annotated, with an average length of 195 bp and an N50 length of 197 bp. To identify the functional categories of metagenomes, T-metatranscriptomes, and M-metatranscriptomes, non-rDNA/rRNA sequences were individually aligned to the annotated contigs above using Bowtie2 and then counted using HTSeq [46]. Only reads mapped to contigs with eggNOG annotation information [47] were further retrieved to calculate the abundances of genes and functional categories using MGKit.
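The N50 values quoted above summarize contig length distributions. As a minimal sketch of how such a value can be computed (this is not the code used in the study; the function name and the example lengths are hypothetical), N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the study.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

With many very short contigs, as in the assembly described above, the N50 stays close to the average contig length.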

Statistical analysis

Values of RFI, DMI, eating frequency, and rumen weight were compared among the three breeds using ANOVA, and the comparison between efficient (L-RFI) and inefficient (H-RFI) animals was conducted using a t test within each breed separately. In the current study, only microbial taxa with a relative abundance higher than 0.01% in at least 50% of individuals within each breed were considered as being observed and used for the analysis. Bacterial compositional profiles were summarized at phylum and genus levels, and archaeal communities were summarized at the species level. Relative abundances of microbial taxa were arcsine square root transformed [19, 24], and then compared among breeds (using ANOVA) and between RFI groups within each breed (using a t test). To make alpha-diversity indices (including Chao1, Shannon evenness, Simpson evenness, Shannon index, and inverse Simpson) comparable among samples, the number of sequences per sample was normalized to the lowest read number for bacteria (n = 274,885) and archaea (n = 4263), respectively. These indices were compared between H- and L-RFI groups within each breed using the Kruskal-Wallis rank-sum test. Principal coordinate analysis (PCoA) was used to visualize rumen microbial communities based on the Bray-Curtis dissimilarity matrices at the genus level for bacteria and at the species level for archaea.
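The prevalence filter and the arcsine square root transformation described above can be sketched as follows. This is an illustration only, assuming relative abundances are expressed as proportions between 0 and 1; the function names and the example values are hypothetical and are not taken from the study:

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square root transform of a relative abundance in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("relative abundance must be between 0 and 1")
    return math.asin(math.sqrt(proportion))


def is_observed(abundances, min_abundance=0.0001, min_fraction=0.5):
    """Apply the stated filter: abundance higher than 0.01% (0.0001)
    in at least 50% of the individuals of a breed."""
    if not abundances:
        return False
    passing = sum(1 for a in abundances if a > min_abundance)
    return passing / len(abundances) >= min_fraction


# Hypothetical relative abundances of one taxon in four animals.
taxon = [0.0002, 0.0003, 0.0, 0.0]
if is_observed(taxon):
    transformed = [arcsine_sqrt(a) for a in taxon]
    print(transformed)
```

The transform stretches values near 0 and 1, which is why it is commonly applied to proportions before parametric tests such as ANOVA or the t test.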

Only functional categories and genes/transcripts with a minimum relative abundance of 0.01% in at least three samples within a dataset were considered as being detected as suggested in [19]. The abundance of each gene/transcript was then normalized into counts per million (cpm). To compare general microbial functional profiles among different datasets, breeds, and RFI groups, a principal component analysis (PCA) was conducted based on the auto-scaled cpm of functional categories and genes (or transcripts). Correlations between datasets were calculated using Spearman’s rank correlation. Differential abundances of functional categories and genes (or transcripts) were compared among sequencing datasets, breeds, and RFI groups using DESeq2 [48].
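As a rough illustration of the counts per million (cpm) normalization mentioned above (a sketch with hypothetical gene names and counts, not the MGKit or DESeq2 code used by the authors), each raw count is divided by the sample total and multiplied by one million:

```python
def counts_per_million(counts):
    """Convert raw read counts for one sample into counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw counts for a single sample.
sample_counts = {"geneA": 1, "geneB": 3}
print(counts_per_million(sample_counts))
```

Because cpm values within a sample always sum to one million, samples sequenced at different depths become directly comparable.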


During sequencing, errors are introduced, such as incorrect nucleotides being called. These are due to the technical limitations of each sequencing platform. Sequencing errors might bias the analysis and can lead to a misinterpretation of the data.

Sequence quality control is therefore an essential first step in your analysis. In this tutorial we use similar tools as described in the tutorial “Quality control”:

    • FastQC generates a web report that will aid you in assessing the quality of your data
    • MultiQC combines multiple FastQC reports into a single overview report
    • Cutadapt for trimming and filtering

Hands-on: Quality control

  1. FastQC with the following parameters:
    • “Short read data from your current history”: both T1A_forward and T1A_reverse datasets selected with Multiple datasets

Tip: Select multiple datasets

  1. Click on Multiple datasets
  2. Select several files by keeping the Ctrl (or COMMAND ) key pressed and clicking on the files of interest

Inspect the webpage output of FastQC tool for the T1A_forward dataset

Questions

What is the read length?

Solution

The read length is 151 bp.

  • In “Results”
    • “Which tool was used generate logs?”: FastQC
    • In “FastQC output”
      • “Type of FastQC output?”: Raw data
      • “FastQC output”: both Raw data files (outputs of FastQC)

      For more information about how to interpret the plots generated by FastQC and MultiQC, please see this section in our dedicated Quality Control Tutorial.

      Questions

      1. How many sequences does each file have?
      2. How is the quality score over the reads? And the mean score?
      3. Is there any bias in base content?
      4. How is the GC content?
      5. Are there any unidentified bases?
      6. Are there duplicated sequences?
      7. Are there over-represented sequences?
      8. Are there still some adapters left?
      9. What should we do next?

      Solution

      1. Both files have 260,554 sequences.
      2. The “Per base sequence quality” is globally good: the quality stays around 40 over the reads, with just a slight decrease at the end (but still higher than 35).

        The reverse reads have slightly worse quality than the forward reads, a usual case in Illumina sequencing.

        The distribution of the mean quality score is almost at the maximum for the forward and reverse reads.

      3. For both forward and reverse reads, the percentage of A, T, C, G over sequence length is biased. As for any RNA-seq data, or more generally libraries produced by priming using random hexamers, the first 10-12 bases have an intrinsic bias.

        We could also see that after these first bases the distinction between C-G and A-T groups is not clear as expected. It explains the error raised by FastQC.

      4. With sequences from random positions of a genome, we expect a normal distribution of the %GC of reads around the mean %GC of the genome. Here, we have RNA reads from various genomes. We do not expect a normal distribution of the %GC. Indeed, for the forward reads, the distribution shows several peaks, possibly corresponding to the mean %GC of different organisms.

      5. Almost no N were found in the reads, so there are almost no unidentified bases.

      6. The forward reads seem to have more duplicated reads than the reverse reads, with a rate of duplication up to 60% and some reads identified over 10 times.

        In data from RNA (metatranscriptomics data), duplicated reads are expected. The low rate of duplication in reverse reads could be due to bad quality: some nucleotides may have been wrongly identified, altering the reads and reducing the duplication.

      7. The high rate of overrepresented sequences in the forward reads is linked to the high rate of duplication.

      8. Illumina universal adapters are still present in the reads, especially at the 3’ end.

      9. After checking what is wrong, we should think about the errors reported by FastQC: they may come from the type of sequencing or from what we sequenced (check the “Quality control” training: FastQC for more details). Some, like the duplication rate or the base content biases, are due to the RNA sequencing. However, despite these challenges, we can still get slightly better sequences for the downstream analyses.

      Even though our data is already of pretty high quality, we can improve it even more by:

      1. Trimming reads to remove bases that were sequenced with low certainty (= low-quality bases) at the ends of the reads
      2. Removing reads of overall bad quality.
      3. Removing reads that are too short to be informative in downstream analysis
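To make the first step above concrete, here is a minimal sketch of 3' quality trimming based on partial sums of (quality − cutoff), a scheme similar in spirit to the one described in the Cutadapt documentation. It is an illustration under that assumption, not Cutadapt's actual code, and the function name is hypothetical:

```python
def trim_3prime(qualities, cutoff=20):
    """Return how many bases to keep from the 5' end of a read.

    Walk from the 3' end, accumulating (quality - cutoff). The read is
    cut at the position where this running sum is lowest, and the walk
    stops as soon as the sum becomes positive.
    """
    running = 0
    best = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += qualities[i] - cutoff
        if running > 0:
            break
        if running < best:
            best = running
            keep = i
    return keep


# Hypothetical Phred scores for one read; the low-quality tail is removed.
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(trim_3prime(quals, cutoff=10))
```

Using a running sum rather than cutting at the first low-quality base means that a single acceptable base inside a poor-quality tail does not stop the trimming.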

      Questions

      What are the possible tools to perform such functions?

      Solution

      There are many tools, such as Cutadapt, Trimmomatic, Trim Galore, and Clip, that can trim putative adapter sequences. We choose Cutadapt here because it is error tolerant, fast, and its version is pretty stable.

      There are several tools out there that can perform these steps, but in this analysis we use Cutadapt (Martin 2011).

      Cutadapt also helps find and remove adapter sequences, primers, poly-A tails and/or other unwanted sequences from the input FASTQ files. It trims the input reads by finding the adapter or primer sequences in an error-tolerant way. Additional features include modifying and filtering reads.

      Hands-on: Read trimming and filtering

      1. Cutadapt with the following parameters to trim low quality sequences:
        • “Single-end or Paired-end reads?”: Paired-end
          • “FASTQ/A file #1”: T1A_forward
          • “FASTQ/A file #2”: T1A_reverse

          The order is important here!

        • In “Filter Options”
          • “Minimum length”: 150
        • In “Read Modification Options”
          • “Quality cutoff”: 20
        • In “Output Options”
          • “Report”: Yes

      Questions

      Why do we run the trimming tool only once on a paired-end dataset and not twice, once for each dataset?

      Solution

      The tool can remove sequences if they become too short during the trimming process. For paired-end files it removes entire sequence pairs if one (or both) of the two reads became shorter than the set length cutoff. Reads of a read-pair that are longer than a given threshold but for which the partner read has become too short can optionally be written out to single-end files. This ensures that the information of a read pair is not lost entirely if only one read is of good quality.
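The pair-aware length filter described above can be sketched like this. It is a simplified illustration with hypothetical names and toy reads; it only covers the default behaviour of discarding the whole pair and ignores the optional single-end output:

```python
def filter_pairs(pairs, min_length):
    """Keep only read pairs in which both mates reach min_length.

    `pairs` is a list of (forward, reverse) sequence strings.
    Returns the kept pairs and the number of discarded pairs.
    """
    kept = []
    dropped = 0
    for forward, reverse in pairs:
        if len(forward) >= min_length and len(reverse) >= min_length:
            kept.append((forward, reverse))
        else:
            # One short mate is enough to discard the whole pair,
            # which keeps the forward and reverse files in sync.
            dropped += 1
    return kept, dropped


# Hypothetical trimmed reads, with a toy length cutoff of 5.
toy_pairs = [("ACGTACGT", "TTGACCA"), ("ACG", "TTGACCAGT"), ("ACGTAC", "TT")]
kept, dropped = filter_pairs(toy_pairs, min_length=5)
print(len(kept), dropped)
```

Filtering the two files independently would leave reads without partners and break the one-to-one order that paired-end tools rely on.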

      • Read 1 output to QC controlled forward reads
      • Read 2 output to QC controlled reverse reads

      Cutadapt tool outputs a report file containing some information about the trimming and filtering it performed.

      Questions

      1. How many base pairs have been removed from the forward reads because of bad quality? And from the reverse reads?
      2. How many sequence pairs have been removed because at least one read was shorter than the length cutoff?

      Solution

      1. 203,654 bp have been trimmed from the forward reads (read 1) and 569,653 bp from the reverse reads (read 2). This is not a surprise: we saw that the quality dropped more at the end of the sequences for the reverse reads than for the forward reads.
      2. 27,677 (10.6%) read pairs were too short after trimming and were filtered out.

      The functional characterization of microbial communities residing in biogas plants by analyzing community RNA and community proteins

      Metagenome analyses for biogas communities revealed their genetic potential. However, they do not enable conclusions on the metabolic activity of community members. To profile the metabolically active biogas community, metatranscriptome and metaproteome analyses were conducted. Metatranscriptome analyses provided insights into the transcriptional activity of biogas microbiomes (see Fig. 5). However, expression of enzymes, the catalysts of metabolism, involves translation of messenger RNAs, implying the possibility of regulation at the post-transcriptional level. Analysis of the biogas microbiome’s proteome was addressed in metaproteome studies (see Fig. 6).

      Metatranscriptome-based analyses of biogas-producing microbial communities. After sampling, whole community RNA was extracted followed by depletion of ribosomal RNAs. Metatranscriptome cDNA libraries were prepared and sequenced. Resulting metatranscriptome reads were mapped on corresponding metagenome data or MAGs. Finally, Transcripts per million (TPM) values were calculated for each gene to deduce transcriptional profiles of biogas microorganisms
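The transcripts per million (TPM) calculation mentioned in the caption above can be sketched as follows. This is a generic illustration of the standard TPM formula with hypothetical gene names, counts, and lengths, not the pipeline used in the biogas studies:

```python
def transcripts_per_million(read_counts, gene_lengths_bp):
    """Compute TPM from mapped read counts and gene lengths.

    Each count is first divided by its gene length (reads per base),
    then the rates are scaled so that they sum to one million.
    """
    rates = {gene: read_counts[gene] / gene_lengths_bp[gene] for gene in read_counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("no reads mapped to any gene")
    return {gene: rate / total_rate * 1_000_000 for gene, rate in rates.items()}


# Hypothetical counts and lengths: geneB has twice the reads but is twice as long.
counts = {"geneA": 10, "geneB": 20}
lengths = {"geneA": 1000, "geneB": 2000}
print(transcripts_per_million(counts, lengths))
```

Dividing by gene length before scaling keeps longer genes from appearing more highly expressed simply because they collect more reads.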

      Ribosomal RNA fragments filtering

      Metatranscriptomics sequencing targets any RNA in a pool of micro-organisms. The highest proportion of the RNA sequences in any organism will be ribosomal RNAs.

      These rRNAs are useful for the taxonomic assignment (i.e. which organisms are found) but they do not provide any functional information (i.e. which genes are expressed). To make the downstream functional annotation faster, we will sort the rRNA sequences using SortMeRNA (Kopylova et al. 2012). It can handle large RNA databases and sort out all fragments matching to the database with high accuracy and specificity.

      SortMeRNA removes any reads identified as rRNA from our dataset, and outputs a log file with more information about this filtering.

      Questions

      1. How many reads have been processed?
      2. How many reads have been identified as rRNA given the log file?
      3. Which type of rRNA are identified? Which organisms are we then expected to identify?

      Solution

      1. 465,754 reads are processed: 232,877 for forward and 232,877 for reverse (given the Cutadapt report).

      2. Out of the 465,754 reads, 119,646 (26%) have passed the e-value threshold and are identified as rRNA.

        The proportion of rRNA sequences is then quite high (around 26%), compared to metagenomics data, where they usually represent < 1% of the sequences. Indeed, there are only a few copies of rRNA genes in genomes, but they are highly expressed in cells.

        Some of the aligned reads are forward (resp. reverse) reads but the corresponding reverse (resp. forward) reads are not aligned. As we chose “If one of the paired-end reads aligns and the other one does not”: Output both reads to rejected file (--paired_out), if one read in a pair does not align, both go to unaligned.

      3. 20.56% of the reads are 23S bacterial rRNA, 2.34% are 16S bacterial rRNA, and 1.74% are 18S eukaryotic rRNA. We then expect to identify mostly bacteria but also probably some eukaryotes (18S eukaryotic rRNA).

      This work was supported by the National Science Foundation through the Graduate Research Fellowship Program under award number 1644760 for BK and DC, by the National Institutes of Health through the National Center for Complementary and Integrative Health award number 1R21AT010366, and by the National Institutes of Health under institutional development awards P20GM121344 and P20GM109035 from the National Institute of General Medical Sciences, which fund the Center for Antimicrobial Resistance and Therapeutic Discovery and the COBRE Center for Computational Biology of Human Disease, respectively. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the National Science Foundation or the National Institutes of Health.

      The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


      Technological advances in sequencing technologies and bioinformatics analysis tools now enable the generation of a metagenome from soil, although the ultimate goal of obtaining the entire complement of all genes of all organisms in a given sample of soil still lies in the future. The rich information obtained from a soil metagenome will undoubtedly provide new insights into the taxonomic and functional diversity of soil microorganisms; the question is whether it will also yield greater understanding of how C, N, and other nutrients cycle in soil. The purpose of this review is to describe the steps involved in producing a soil metagenome, including some of the potential pitfalls associated with its production and annotation. Possible solutions to some of these challenges are presented. Selected examples from published soil metagenomic studies are discussed, with an emphasis on clues that they have provided about biogeochemical cycling.


      Soil microbial communities are known to be incredibly diverse, harboring tens of thousands of species of bacteria and thousands of species of fungi in a gram of soil. This knowledge is based primarily on recent advances in DNA sequencing, which have made it possible to generate millions of sequence reads quickly and economically. The initial application of this high-throughput sequencing technology explored the taxonomic diversity and composition of soil microbial communities using a polymerase chain reaction (PCR)-based approach that focused on phylogenetically informative ribosomal genes ( 14, 57 ). This has become known as pyrotagged sequencing and, with the incorporation of barcoded primers or tags ( 34 ), has rapidly become the standard approach for describing the taxonomic composition of soil microbial communities (the soil “microbiome”). The pyrotagging approach has subsequently been extended to include targeted functional genes, such as nifH, amoA, etc. ( 47 ). Targeted pyrosequencing has the advantage of being able to focus on a gene, or a few genes, of specific interest; however, for some applications, such as the interaction among multiple community members or their collective response to environmental perturbations, a more comprehensive and complete inventory of microbial genes is desired: a “metagenome”.

      Initial soil metagenomic studies relied on constructing libraries (e.g., plasmid, bacterial artificial chromosome [BAC], cosmid, fosmid) that were then sequenced with the intent of finding genes that encoded products of interest, such as antimicrobials or enzymes ( 21 ). The first attempt to generate a comprehensive soil metagenome was reported by ( 69 ), who constructed it by Sanger sequencing of random lengths of DNA that were cloned into a virus (i.e., a phage library). Although the depth of sequencing was insufficient to achieve significant assembly of the DNA sequences into larger contiguous segments, or contigs, the individual reads were useful in comparing the gene content of the Minnesota farm soil metagenome with those obtained from other environments. Subsequent efforts to produce soil metagenomes have primarily used a shotgun sequencing approach that eliminates the need for making libraries and directly sequences the extracted DNA (Table 1).

      Location | Site description | Soil type | Experimental design | Biological replication | Sequencing platform | Sequencing depth | Assembly | Functional assignment, %† | Comments | Reference
      Nunavut, Canada | tundra | permafrost | two depths: active layer and permafrost | no | 454 GS FLX Titanium | 853 Mbp (0.35–0.99 million reads/sample) | Phrap assembler (140 Mbp, 134,000 contigs) | not given | multiple displacement amplification (MDA) was needed to generate sufficient DNA; detected genes for methanogenesis | 78
      Rothamsted, UK | Park Grass experiment (permanent grassland) | silty clay loam (Chromic Luvisol) | comparison of direct and indirect DNA extraction methods | no | 454 GS FLX Titanium | ~1 million reads | none | not given | extraction methods gave similar functional gene profiles | 24
      Pru Toh Daeng, Thailand | swamp forest | peat | pooled sample | no | 454 GS FLX | 45.9 Mbp (0.18 million reads) | GS De Novo Assembler (13.4 Mbp, 54,000 contigs averaging 248 bp) | SEED: 53 | searched mainly for polysaccharide-degrading genes | 40
      Alaska | Hess Creek (black spruce forest) | permafrost | two depths: active layer and permafrost, before and 2 and 7 d after thawing | yes (n = 2) | Illumina GAII (2 × 113 bp) | 39.8 Gbp (176 million reads) | Velvet assembler (9.6 Mbp, 3758 contigs >1 kb) | KEGG: 11 | used emulsion polymerase chain reaction to generate sufficient DNA; draft genome of dominant methanogen obtained | 46
      New Hampshire | Harvard Forest (northern hardwood forest) | sandy loam (Typic Dystrochrept) | single composite from two soil cores (0–10 cm) | no | 454 GS FLX Titanium | 748 Mbp (1.4 million reads) | | | also companion metatranscriptome | 65
      São Paulo, Brazil | mangrove forest | submerged sediment | samples from four sites, one of which was impacted by oil contamination | no | 454 GS FLX Titanium | 215 Mbp (~0.25 million reads/site) | none | COG: 60; KEGG: 30 | metabolic reconstruction of C, N, and S cycles | 4
      Rothamsted, UK | Park Grass experiment (permanent grassland) | silty clay loam (Chromic Luvisol) | 13 samples differing by date, depth, and DNA extraction method | no | 454 GS FLX Titanium | 4.8 Gbp (12.5 million reads) | Newbler assembler (15.2 Mbp, 267,000 contigs) | SEED: 56 | extraction method showed more variation than date or depth of sampling | 23
      Michigan and Minnesota | Kellogg Biological Station (cropped fields) and Cedar Creek Ecosystem Science Reserve (successional grasslands) | sandy loam (Hapludalf) and sand (Udipsamment) | low, medium, and high N addition at each site | yes (n = 3) | 454 GS FLX Titanium | 518 Mbp (1.35 million reads) | none | SEED: 25–35 | shift in functional capacity with N input; significant correlation of metagenome functional and phylogenetic composition | 28
      Worldwide | cold and hot deserts, selected nondesert biomes | various | 16 total soils | no | Illumina GAIIx (2 × 100 bp) | 6.2 Gbp total (0.39–1.9 million reads per soil) | none | SEED: 13–23 | functional composition correlated with site characteristics | 29
      Lucknow, India | dumpsite | not specified | three soils with a gradient of hexachlorocyclohexane contamination | no | 454 GS FLX Titanium | 1.2 Gbp (1.1–1.2 million reads per soil) | none | not given | draft genome of Chromohalobacter salexigens | 58
      Nevada | Nevada Desert FACE Facility (shrubland) | loamy sand (Aridisol) | four pooled samples (ambient and elevated CO2 plots, with two locations [creosote bush and interspace]) | no | 454 GS FLX Titanium | 724 Mbp total (0.31–0.68 million reads per sample) | none | SEED: 36–40 | functional genes less discriminating than 16S rRNA genes for the effect of elevated CO2 | 64
      Nunavut, Canada | tundra | permafrost | four pooled samples (control and three times for treated biopiles of oil-contaminated soil) | no | 454 GS FLX Titanium | 463 Mbp total (0.11–0.46 million reads per sample) | none | not applicable | used BLAST to focus on hydrocarbon-degrading genes (found to be higher in treated than control) | 79
      Breuil-Chenue, France | Norway spruce plantation | Alocrisol soil | closely spaced soil cores separated into organic and mineral horizons | yes (n = 3) | 454 GS FLX Titanium; Illumina HiSeq 2000 (1 × 75 bp) | 1.9 Gbp total of 454 (0.41–0.62 million reads per sample); 11.9 Gbp total of Illumina (23–29 million reads per sample) | unspecified assembler of Illumina data (0.77 Mbp, 3492 contigs) | SEED: 36–60 (454 data); ~25 (Illumina data) | differences in functional subsystems between organic and mineral horizons | 71
      † COG, Clusters of Orthologous Groups of proteins; KEGG, Kyoto Encyclopedia of Genes and Genomes; SEED, SEED subsystem hierarchy.

      Most soil shotgun metagenomes have been obtained using the 454 GS FLX platform with Titanium chemistry, which generates reads of 400 to 500 bp in length. The amount of sequence generated in these studies has differed by as much as 100-fold, with 0.1 to 1.0 million reads per soil sample being typical and a maximum metagenome size of about 0.5 Gbp. Assembly was attempted in about one-third of these metagenomic studies with some success, although most contigs were relatively short, <1000 bp (25, 40, 78). A few studies have used the Illumina sequencing platform for generating shotgun metagenomes from soils (29, 46), and one has combined both sequencing platforms (71). The soil metagenomes obtained with the Illumina system generally contained many more sequences (0.4–29 million reads per sample), with a maximum metagenome size of about 4.0 Gbp. Although a relatively small fraction of the short reads could be assembled into larger contigs, sufficient assembly was obtained in a permafrost soil to produce a draft genome of the dominant methanogen (46).
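As a rough consistency check on the depths listed in Table 1, the mean read length implied by a dataset is simply its total bases divided by its number of reads. The sketch below is illustrative only: the function name `mean_read_length_bp` is made up here, and the inputs are the rounded values from the table, so the results are approximate.

```python
def mean_read_length_bp(total_bp: float, n_reads: float) -> float:
    """Mean read length implied by a total sequence yield and a read count."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return total_bp / n_reads


# (total bases, number of reads), rounded values as listed in Table 1
table1_depths = {
    "Harvard Forest, 454 GS FLX Titanium": (748e6, 1.4e6),
    "Pru Toh Daeng, 454 GS FLX": (45.9e6, 0.18e6),
    "Hess Creek, Illumina GAII": (39.8e9, 176e6),
}

for name, (total_bp, n_reads) in table1_depths.items():
    length = mean_read_length_bp(total_bp, n_reads)
    print(f"{name}: ~{length:.0f} bp per read")
```

For the Hess Creek dataset this works out to roughly 226 bp, which is what a 2 × 113 bp paired-end run gives if each pair is counted as one read.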

      A major goal of soil metagenomic studies is to identify the functional potential of the complex microbial communities, whether using individual reads or assembled contigs. As might be expected, greater success in assigning functions has been obtained with the longer reads generated with 454 sequencing: 20 to 60% assignment depending on the databases used (Table 1). In contrast, only 10 to 25% of the shorter Illumina reads have been successfully assigned, although the 10-fold greater number of sequences resulted in more total functional gene assignments.
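The trade-off described above, a lower assignment rate but many more reads, comes down to simple arithmetic. The following is a minimal sketch with hypothetical round numbers chosen from within the ranges quoted in the text (1 million 454 reads at 40% assignment, 10 million Illumina reads at 15%); they are not values from any particular study, and `assigned_reads` is a made-up helper.

```python
def assigned_reads(n_reads: int, percent_assigned: int) -> int:
    """Number of reads that receive a functional assignment."""
    return n_reads * percent_assigned // 100


# Hypothetical example values, picked from the ranges given in the text
reads_454 = 1_000_000
reads_illumina = 10 * reads_454  # the "10-fold greater number of sequences"

assigned_454 = assigned_reads(reads_454, 40)
assigned_illumina = assigned_reads(reads_illumina, 15)

print(f"454:      {assigned_454:,} assigned reads")
print(f"Illumina: {assigned_illumina:,} assigned reads")
```

With these assumed inputs the Illumina dataset still yields several times more assigned reads, even though a smaller fraction of its reads is assigned.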

      Whether using individual reads or assembled contigs, the studies to date have been effective in understanding the functional potential of microbial communities in soil and in distinguishing among soils and treatments. For practical reasons, such as cost and computational constraints, many studies, particularly the earlier ones, have not had true biological replication.

      Of course, many more soil metagenomic projects are in progress: as of May 2013, 48 soil metagenomes had been registered. Most of these are not yet published and not all are shotgun metagenomes. As part of our research on Mollisol soils of the Great Plains, several metagenomes have been obtained, including some in excess of 500 million Illumina reads (unpublished data). Assembly of such large datasets is challenging, however.

      Future Perspectives

      There have been major advances in the field of microbiome research, including large-scale population-based studies, such as the Human Microbiome Project, MetaHIT, Lifelines-DEEP, and the Flemish Gut Flora Project, which have identified factors that are linked to gut microbiota composition. Studies have reported that ethnicity and lifestyle could influence gut microbial profiles (Chong et al., 2015; Liu et al., 2016b), and therefore, future studies should include more population-based studies in ethnically diverse groups to clarify this association and to determine the composition of a healthy gut microbiome in a particular population.

      Furthermore, certain pitfalls should be taken into account when performing microbiome analyses. These include experimental design, such as selection of 16S rRNA target region and sequencing platform (Tremblay et al., 2015), sample collection, storage (Vogtmann et al., 2017) and extraction methods, inclusion of positive and negative controls (Weiss et al., 2014), taking cage effects into consideration in animal models (Hildebrand et al., 2013), and the use of discovery and validation cohorts (Forslund et al., 2015; Sabino et al., 2016) as well as robust data analyses that incorporate power calculations (Kelly et al., 2015), appropriate reference genome databases (Balvočiūtė and Huson, 2017; Forster et al., 2016), correction for multiple comparisons (Benjamini and Hochberg, 1995), and confounders (Falony et al., 2016).
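To make the multiple-comparisons point concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up procedure cited above. The function name, the default `alpha`, and the example p-values are illustrative assumptions, not taken from any of the cited studies; in practice an established statistics library would normally be used instead.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at or below max_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one test per taxon
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.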

      Meta-omic technologies (such as metatranscriptomics, metaproteomics, and metabolomics, as discussed earlier) are increasingly used in the laboratory and enable us to interrogate the taxonomic and functional composition of the microbiome, as well as protein and metabolite synthesis to determine their role in health and disease.

      However, these techniques are not without challenges, which mirror those discussed for microbiome analyses (including sample collection, storage, processing, data analysis strategies, and use of appropriate databases and analysis pipelines). In addition, there is a need for meta-information (i.e., databases of information on sample origin, collection and storage, and experimental and analytical conditions) (Weckwerth and Morgenthal, 2005) and the integration of multiple data sets arising from multi-omic outputs (Abram, 2015), for example, by using network-based approaches, such as the 48-h multi-omic pipeline developed by Quinn et al. (2016) [also refer to review by Aguiar-Pulido et al. (2016)].

      Recent research has also indicated a role of the host in shaping the gut microbiome content, including host genetic variation and host epigenetic factors, as well as the host virome. Furthermore, exposure to environmental microbes and parasitic gut infections also plays a role in modulating the immune system as well as the gut microbiome. The feasibility of host-genome-epigenome-microbiome investigations lies within our grasp, as evident in the literature presented in this review; however, the combined investigation in a single cohort is yet to be attempted.

      Challenges include the high cost of multi-omic investigations in large, well-characterized cohorts as well as the need for sophisticated bioinformatic pipelines and mathematical models to integrate output from several omic data sets. Moreover, in an attempt to attain a whole systems overview (including the functional microbiome as well as host and environmental factors that influence and interact with the microbiome), we require mathematical models and statistical approaches to enable the meaningful biological interpretation of multi-omic outputs.

      Animal models provide strong evidence for the role of the gut microbiome in regulating anxiety- and stress-related phenotypes, and the potential to target the gut microbiome to alleviate anxiety. However, few human studies of the microbiome in anxiety- and stress-related disorders have been published. As effective probiotic treatments in animal models have not translated well to humans, gut microbiome studies of human subjects with anxiety disorders are warranted, before we can even attempt to understand the complexities of the functions, genes, pathways, proteins, and metabolites of the gut microbiome.

      Furthermore, to show causation and to understand the mechanisms through which dysbiosis influences disease, longitudinal studies are needed, preferably with birth cohorts or pre- and postdeployment cohorts, to track disease progression before onset. Such study designs would also enable the investigation of the role of the gut microbiome in treatment response.

      Probiotic intervention studies in humans suggest that the gut microbiome could be targeted to alleviate anxiety- and stress- or trauma-related outcomes. However, interpretation of these findings is impeded by several limitations, including small sample sizes and confounders such as different populations/ethnicities, gender bias, different probiotic strains (or combinations of strains) at different doses and for different treatment durations, and differences in outcome measurements. More concerted efforts to recruit large cohorts and adopt standardized approaches could yield more insights into how the gut microbiome can be targeted to alleviate anxiety- and stress- or trauma-related outcomes.

      Future studies should also address aspects such as the host's baseline microbiota composition and whether it predicts response to the probiotic treatment, possible effects of the probiotic vehicle, dose/response effects, and the stability of the treatment response. These studies should preferably use a longitudinal design, even beyond the standard treatment duration in clinical trials, to fully assess the long-term effects of microbial manipulation on behavior. Once we understand how the microbial composition is associated with disease, the aforementioned recommendations can be used to design more targeted microbial therapies in the near future. Once successful therapies have been designed and proven to be useful, future investigations could also utilize imaging data, such as fMRI and spectroscopy, to measure functional brain changes pre- and post-prebiotic, synbiotic, probiotic, or antibiotic interventions.

      Another complicating factor in the investigation and understanding of anxiety- and trauma-related disorders is phenotypic heterogeneity and the high prevalence of psychiatric and medical comorbid disorders, including depression (Elhai et al., 2008; Rytwinski et al., 2013) and metabolic diseases (Kahl et al., 2015; Meurs et al., 2016). Anxiety- and trauma-related disorders, and their common comorbidities, have been associated with increased inflammation (Zass et al., 2017), suggesting that the gut microbiome could play a role in comorbidity.

      Careful study designs, using large cohorts and inclusion of appropriate controls, will be required to understand the underpinnings of comorbidity. Furthermore, since a plethora of environmental variables has been shown to alter the microbiome, the collection of metadata should be extensive and thorough and variables should be tested for association with microbial composition to correct for the effects of confounding variables during data analysis.

      This review focused on the gut microbiome in the context of the human interactome. However, it should be noted that other microbial habitats include the mouth, skin, urogenital tract, and vagina, and that these could also play an intricate role in the human interactome. As an example, afferent signaling of bronchopulmonary immune activation to the CNS has been described (Hale et al., 2012; Lowry et al., 2016). Immune signals in the bronchopulmonary system reach the brain via the vagus nerve and sympathetic nerves, much like the afferents from the gastrointestinal system, but the specific targets of the bronchopulmonary afferents in the brain are distinct from the specific targets of the gastrointestinal afferents in the brain (Hale et al., 2012).

      Thus, peripheral signals arising from the microbiome in the bronchopulmonary system are not redundant with those arising from the gastrointestinal system, and may have unique cognitive and affective functions. Similar arguments could be made for the skin (Belkaid and Segre, 2014) and oral microbiomes (Castro-Nallar et al., 2015). This review, however, focused on the gut microbiome due to its established involvement in the MGB and its implications for anxiety- and stress-related disorders. Future studies could further investigate the role of the microbiome in other body sites in the context of anxiety- and stress-related disorders.

      Great efforts are being made to discover the missing heritability in complex disorders, such as anxiety- and trauma-related disorders. Investigations of gene–gene and gene–environment interactions (Nugent et al., 2011), copy number variations (Bersani et al., 2016; Fung et al., 2010; Kawamura et al., 2011), and epigenetic factors (Cappi et al., 2016; Kim et al., 2017) have yielded some additional insights into the molecular etiology of these disorders. However, renewed interest in and large-scale focus on microbial communities, together with investigation of the human interactome, which includes the (gut) microbiome composition, its genes, proteins, and metabolites, as well as host and environmental factors that shape the microbiome, have the potential to unravel the etiology of complex disorders and direct novel treatment strategies.
