Metabarcoding analysis - gws_ubiome

Introduction

One of the most cost-effective and rapid genomic approaches to determining the species composition of a microbiome is to sequence one or more genes (called barcodes or tags) that are shared across living organisms. For example, sequencing the gene coding for 16S rRNA reveals which bacterial species are present in the samples and in what abundance. The reads obtained after sequencing are aligned with bacterial genomes referenced in databases, allowing the identification and taxonomic classification of the sequenced species.

Data upload and preparation

Input fastq folder

Upload a single folder containing all the sequencing data. You must select the following format: Fastq folder.

Generation of the metadata file

The uBiome Make metadata file task (task: uBiome - Qiime2 metadata table maker) automatically generates a ready-to-use metadata file when given a fastq folder as input. Once the metadata file is generated, it can be edited and re-uploaded in the expected file format (see below).

Example:

#author: Paulson, Robert		
#data: 1996/08/17		
#project: Chaos		
#types_allowed:categorical or numeric	
#column-type	categorical	categorical	categorical
sample-id	forward-absolute-filepath	reverse-absolute-filepath	status
Sample_1	Sample_1.forward.fq.gz	Sample_1.reverse.fq.gz	ctrl
Sample_2a	Sample_2a.forward.fq.gz	Sample_2a.reverse.fq.gz	T1
Sample_2b	Sample_2b.forward.fq.gz	Sample_2b.reverse.fq.gz	T1
Sample_3a	Sample_3a.forward.fq.gz	Sample_3a.reverse.fq.gz	T2
Sample_3b	Sample_3b.forward.fq.gz	Sample_3b.reverse.fq.gz	T2
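
Before re-uploading an edited metadata file, it can help to check it locally. The following is a minimal sketch, not part of gws_ubiome: the function name `check_metadata` is hypothetical, and it only checks two points mentioned in this guide (tab-separated columns, and no spaces in sample names). Lines starting with `#` are treated as comments or directives and skipped.

```python
def check_metadata(text):
    """Parse a tab-separated metadata table and report simple problems.

    Returns (header, rows, problems). This is an illustrative helper,
    not the validation performed by the uBiome tasks themselves.
    """
    header = None
    rows = []
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines and '#' comment/directive lines (e.g. '#column-type').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if header is None:
            # The first non-comment line holds the column names.
            header = fields
            continue
        if len(fields) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} tab-separated "
                f"fields, found {len(fields)}"
            )
        if " " in fields[0]:
            problems.append(
                f"line {line_no}: sample name {fields[0]!r} contains a space"
            )
        rows.append(dict(zip(header, fields)))
    return header, rows, problems


example = (
    "#column-type\tcategorical\tcategorical\tcategorical\n"
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\tstatus\n"
    "Sample_1\tSample_1.forward.fq.gz\tSample_1.reverse.fq.gz\tctrl\n"
    "Sample 2a\tSample_2a.forward.fq.gz\tSample_2a.reverse.fq.gz\tT1\n"
)
header, rows, problems = check_metadata(example)
for problem in problems:
    print(problem)
```

Here the second sample is deliberately written with a space in its name, so the check reports one problem for that line.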

Protocol

STEP 1 - Checking the reads quality

This step (task: uBiome - Qiime2 quality check) lets you assess the quality of the reads in a sequencing dataset.
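
To give an idea of what a per-position quality summary contains, here is a minimal sketch that computes the mean Phred score at each read position from FASTQ quality strings. It assumes the common Phred+33 encoding and uses made-up quality strings. The function name is hypothetical, and this is not the code run by the quality check task.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Assumes Phred+33 encoded quality strings (the 4th line of each
    FASTQ record). Reads may have different lengths.
    """
    totals = []
    counts = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical quality strings: 'I' = Q40, '5' = Q20, '+' = Q10.
quals = ["IIII5", "III5", "II55+"]
means = mean_quality_per_position(quals)
for pos, value in enumerate(means, start=1):
    print(f"position {pos}: mean Q = {value:.1f}")
```

A drop in the mean score towards the end of the reads, as in this toy example, is the kind of pattern that guides the truncation parameters of the next step.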

Modify the metadata file with a spreadsheet application (e.g. Excel) by adding the informative columns (for example, a treatment column). We advise against using spaces in sample names, as they may cause errors with some tools.

Keep the tab character as the column separator.

Re-upload the file to the databox.

STEP 2 - Denoising, clustering sequences

This step (task: uBiome - Qiime2 featureInference) requires two parameters for paired-end data (one for single-end data) to truncate the last bases of each sequence, in order to remove the low-quality regions of the sequences (see above). Reads can also be hard-clipped at the 5' end with an optional parameter.
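
The effect of these parameters on a single read can be pictured with simple string slicing. The sketch below is only an illustration: the names `trunc_len` and `trim_left` are hypothetical, and it assumes, as DADA2-style filtering does, that reads shorter than the truncation length are discarded rather than kept.

```python
def trim_read(sequence, trunc_len, trim_left=0):
    """Truncate a read at `trunc_len` bases and clip `trim_left` bases in 5'.

    Returns None when the read is shorter than `trunc_len` (assumption:
    such reads are discarded). Otherwise the result has length
    trunc_len - trim_left.
    """
    if len(sequence) < trunc_len:
        return None
    return sequence[trim_left:trunc_len]


forward = "ACGTACGTAC"
print(trim_read(forward, trunc_len=8, trim_left=2))  # GTACGT
print(trim_read("ACGT", trunc_len=8))                # None
```

For paired-end data, the same operation would be applied to forward and reverse reads, each with its own truncation length.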

Files:

Outputs: Feature frequencies folder

STEP 3 - Assessing α rarefaction

For this step, the max depth parameter value that you provide to the task (task: uBiome - Qiime2 rarefaction) should be determined by reviewing the “Frequency per sample” information presented in the Feature table file created in the previous step. In general, choosing a value that is somewhere around the median frequency seems to work well, but you may want to increase that value if the lines in the resulting rarefaction plot do not appear to be leveling out, or decrease it if many of your samples seem to be lost because their total frequencies are closer to the minimum sampling depth than to the maximum sampling depth.
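
As a starting point for the max depth value, one commonly used heuristic is the median of the per-sample frequencies. The sketch below assumes you have copied the “Frequency per sample” values into a dictionary. The sample totals are made up, the function name is hypothetical, and the result should still be checked against the rarefaction plot as described above.

```python
from statistics import median


def suggest_max_depth(frequencies):
    """Suggest a starting max depth: the median per-sample frequency.

    `frequencies` maps sample names to their total feature counts.
    This is a heuristic starting point, not a rule applied by the task.
    """
    return int(median(frequencies.values()))


# Hypothetical per-sample totals read from the feature table summary.
frequencies = {
    "Sample_1": 12000,
    "Sample_2a": 8500,
    "Sample_2b": 9100,
    "Sample_3a": 15000,
    "Sample_3b": 700,
}
print(suggest_max_depth(frequencies))  # 9100
```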

STEP 4 - Taxonomy, diversity assessments

An important parameter that needs to be provided to this task (task: uBiome - Qiime2 Taxonomy Diversity) is the sampling depth, which is the even sampling (i.e., rarefaction) depth. Because most diversity metrics are sensitive to differences in sampling depth between samples, this task randomly subsamples the counts from each sample to the value provided for this parameter. For example, if you provide --p-sampling-depth 500, this step subsamples the counts in each sample without replacement so that each sample in the resulting table has a total count of 500. If the total count for any sample is smaller than this value, that sample is dropped from the diversity analysis.
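
The subsampling described above can be sketched in a few lines of Python. This is an illustration of rarefaction without replacement on a small, made-up feature table; the function name `rarefy` and the data are hypothetical, and the actual task relies on Qiime2 rather than on this code.

```python
import random


def rarefy(table, sampling_depth, seed=0):
    """Subsample each sample's counts without replacement to `sampling_depth`.

    `table` maps sample name -> {feature name -> count}. Samples whose
    total count is smaller than `sampling_depth` are dropped.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rarefied = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        if total < sampling_depth:
            continue  # not enough counts: the sample is dropped
        # One list entry per observed count, so that drawing entries
        # without replacement respects the original abundances.
        pool = [feature for feature, n in counts.items() for _ in range(n)]
        subsample = {}
        for feature in rng.sample(pool, sampling_depth):
            subsample[feature] = subsample.get(feature, 0) + 1
        rarefied[sample] = subsample
    return rarefied


table = {
    "Sample_1": {"otu_a": 400, "otu_b": 300},
    "Sample_2a": {"otu_a": 100, "otu_b": 50},
}
result = rarefy(table, sampling_depth=500)
print(sorted(result))                    # ['Sample_1']
print(sum(result["Sample_1"].values()))  # 500
```

With a sampling depth of 500, Sample_2a (150 counts in total) is dropped, while Sample_1 is reduced from 700 to exactly 500 counts.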

We recommend making your choice by reviewing the rarefaction views from the previous step. Choose a value that is as high as possible (so that you retain more sequences per sample) while excluding as few samples as possible. To help with this, the visualization file from the previous step displays two line plots showing the alpha diversity (observed features or Shannon) as a function of the sampling depth. They are used to determine whether the richness or evenness has saturated at a given sampling depth. The rarefaction curve (select the lineplot 2D view) should “level out” as you approach the maximum sampling depth. This plateau value must be evaluated visually.
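
The trade-off described above, and the two alpha diversity metrics shown in the plots, can be illustrated with a short sketch. The function names and the per-sample totals are hypothetical. The Shannon index is computed here with a base-2 logarithm, which is an assumption of this sketch.

```python
import math


def observed_features(counts):
    """Richness: number of features with a non-zero count."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2):
    """Shannon index of a count vector (log base 2 assumed here)."""
    total = sum(counts)
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p, base) for p in proportions)


def samples_retained(totals, sampling_depth):
    """Names of the samples kept at a given sampling depth."""
    return [name for name, total in totals.items() if total >= sampling_depth]


print(observed_features([5, 0, 3]))  # 2
print(shannon([1, 1, 1, 1]))         # 2.0

# Hypothetical per-sample totals: higher depths keep fewer samples.
totals = {"Sample_1": 12000, "Sample_2a": 8500, "Sample_3b": 700}
for depth in (500, 5000, 10000):
    print(depth, len(samples_retained(totals, depth)))
```

In this toy example, a depth of 500 keeps all three samples, 5000 keeps two, and 10000 keeps only one.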

Files:

Outputs: Diversity folder, diversity index table, distance matrix table (PCoA compatible)

Information:

1 - Diversity indexes

- Evenness (

2 - Taxonomic classification

For the classification, we use pre-trained, machine-learning-based scikit-learn classifiers, which learn which features best distinguish each taxonomic group, adding an additional step to the classification process. In our case, we use a Naive Bayes classifier pre-trained on the Greengenes 13_8 99% OTUs full-length sequences database and on NCBI 16S full-length sequences (from

STEP 5 - Samples differential analysis

To use this task (Task: uBiome - Qiime2 ANCOM differential analysis), you need to specify the

to perform the

This will allow you to assess which taxa have significantly different abundances among your samples.

Files:

Outputs: ANCOM test tables (ANCOM test result table, volcano plot ta

STEP 6 - Functional analysis prediction

This task (uBiome - 16s Functional Analysis Prediction) predicts the functional content of 16S rRNA data. It uses PICRUSt2 (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), which wraps a number of tools to generate functional predictions based on 16S rRNA gene sequencing data.

STEP 7 - Functional analysis prediction visualization

This task (uBiome - 16s Functional Analysis Prediction Visualization) lets you analyze and interpret the results of the PICRUSt2 functional prediction of 16S rRNA data using ggpicrust2.