This documentation is under construction. Please send your feedback to hub@gencovery.com.
Introduction
Whole Genome Shotgun (WGS) metagenomic sequencing makes it possible to sample all the genes of all the organisms present in a complex sample. It lets microbiologists assess bacterial diversity and estimate the abundance of microbes in various environments. WGS metagenomics also provides a means of studying non-cultivable microorganisms that are otherwise difficult or impossible to analyse.
This type of sequencing data can be analysed in two ways: (1) a mapping analysis against a database of reference genes, or (2) a metagenomic assembly followed by annotation of the assembled sequences and identification of the taxa present.
The pipeline used here is called SqueezeMeta. No assembly is performed: skipping this step shortens the computation time compared with a pipeline that includes an assembly.
Data upload and preparation
Input fastq folder
Upload one folder containing all the sequencing data using the Databox, and select the following format: Fastq folder.
Fastq folder: a folder containing all the sequencing data in fastq format, regardless of the sequencing strategy (paired or not).
Sample file
Upload one file describing all the samples using the Databox, and select the following format: File.
sample_file_list, expected format:
Sample1 readfileA_1.fastq pair1
Sample1 readfileA_2.fastq pair2
Sample1 readfileB_1.fastq pair1
Sample1 readfileB_2.fastq pair2
Sample3 readfileD_1.fastq pair1 noassembly
Sample3 readfileD_2.fastq pair2 noassembly
The sample file must be tab-separated, with one tab between each column.
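A malformed sample file is easy to produce, for instance when an editor replaces tabs with spaces. The following is a minimal sketch of a pre-upload check, not part of the pipeline itself. The function name check_sample_file is hypothetical, and the rules it enforces (three columns plus an optional fourth, third column equal to pair1 or pair2) are assumptions inferred from the example above.

```python
import csv
import io

# Assumption: the third column only ever takes these two values.
VALID_PAIRS = {"pair1", "pair2"}


def check_sample_file(text):
    """Return a list of (line_number, message) problems found in a sample file."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for number, fields in enumerate(reader, start=1):
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in (3, 4):
            problems.append(
                (number, f"expected 3 or 4 tab-separated columns, got {len(fields)}")
            )
            continue
        if fields[2] not in VALID_PAIRS:
            problems.append(
                (number, f"third column must be pair1 or pair2, got {fields[2]!r}")
            )
    return problems


example = (
    "Sample1\treadfileA_1.fastq\tpair1\n"
    "Sample1\treadfileA_2.fastq\tpair2\n"
    "Sample3\treadfileD_1.fastq\tpair1\tnoassembly\n"
    "Sample3 readfileD_2.fastq pair2\n"  # spaces instead of tabs: flagged
)
problems = check_sample_file(example)
print(problems)
```

An empty list means that no problem was detected by these checks; it does not guarantee that the listed files exist in the Fastq folder.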
Protocol
Mapping reads
This task (gws_metag - Short Reads Mapping with SqueezeMeta) performs taxonomic and functional assignment directly on the raw reads. The database used for the mapping is the nr GenBank database.
The first parameter to set is the number of threads used to run the analysis. Using as many threads as are available makes the pipeline run faster.
Another option is clustering_orthologous_groups. If set to true, a COG analysis (functional annotation) is performed on the reads against the eggNOG database.
Files:
Input: Fastq folder, Sample file
Outputs:
- Set of taxonomic, functional assignment and abundance tables
- Set of measures and statistic tables
- squeeze_meta_pipeline_folder (all files created by the pipeline)
In the Set of taxonomic, functional assignment and abundance tables, the reported abundances are absolute.
The pipeline produces one file per taxonomic level (e.g. Class Abundance) with the samples in columns and the taxonomic information in rows.
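Because the abundances are absolute, comparing samples of different sequencing depth usually calls for relative abundances. The sketch below shows one way to derive them from a table with taxa in rows and samples in columns. The function name to_relative, the in-memory dictionary layout and the example counts are all hypothetical illustrations, not output of the pipeline.

```python
def to_relative(table):
    """Convert absolute counts to per-sample fractions.

    table maps taxon -> {sample -> absolute count}; the result has the
    same shape, with each sample's values summing to 1 (or 0 if empty).
    """
    # Sum the counts of every taxon for each sample (column totals).
    totals = {}
    for counts in table.values():
        for sample, value in counts.items():
            totals[sample] = totals.get(sample, 0) + value
    # Divide each count by the total of its sample.
    return {
        taxon: {
            sample: (value / totals[sample] if totals[sample] else 0.0)
            for sample, value in counts.items()
        }
        for taxon, counts in table.items()
    }


# Illustrative counts only.
absolute = {
    "Bacilli": {"Sample1": 30, "Sample3": 10},
    "Clostridia": {"Sample1": 70, "Sample3": 30},
}
relative = to_relative(absolute)
print(relative)
```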
Other tables are derived from the previous ones by separating the different superkingdoms into different files and by swapping rows and columns.
These tables are easier to use for making graphics such as stacked barplots.
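To illustrate what swapping rows and columns means here, the following sketch transposes a small tab-separated table so that samples end up in rows and taxa in columns, the layout most plotting tools expect for stacked barplots. The pipeline already provides such tables; this only shows the operation. The function name transpose_tsv, the taxon header and the values are hypothetical.

```python
import csv
import io


def transpose_tsv(text):
    """Swap the rows and columns of a tab-separated table given as a string."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) turns the i-th column into the i-th row.
    # Note: zip stops at the shortest row, so rows must have equal length.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Taxa in rows, samples in columns (illustrative values).
table = "taxon\tSample1\tSample3\nBacilli\t30\t10\nClostridia\t70\t30\n"
transposed = transpose_tsv(table)
print(transposed)
```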