
WGS with SqueezeMeta: no assembly

This documentation is under construction. Please send us your feedback at hub@gencovery.com.


Whole Genome Shotgun (WGS) metagenomic sequencing allows one to sample all the genes of all the organisms present in a given complex sample. Microbiologists use it to assess bacterial diversity and to estimate the abundance of microbes in various environments. WGS metagenomics also provides a means of studying non-culturable microorganisms that are otherwise difficult or impossible to analyse.
WGS data can be analysed in two ways: (1) by mapping the reads against a database of reference genes, or (2) by building a metagenomic assembly, then annotating the assembled sequences and identifying the taxa present.
The pipeline used here is SqueezeMeta, run without assembly (approach 1). Skipping the assembly step makes the analysis faster than with the assembly-based pipeline.

Data upload and preparation

Input fastq folder

Upload one folder containing all the sequencing data using the Databox, and select the format Fastq folder.
Fastq folder: a folder containing all the sequencing data in FASTQ format, whether the reads are paired or not.
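Before uploading, it can help to check which files the folder actually contains, since every file named in the sample file must be present in it. The sketch below lists FASTQ files in a local folder; the set of recognised extensions and the function name list_fastq_files are assumptions for illustration, not part of the platform.

```python
from pathlib import Path
from typing import List

# Extensions treated as FASTQ here; this list is an assumption, adjust it to your data.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def list_fastq_files(folder: str) -> List[str]:
    """Return the sorted names of FASTQ files found directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    # Only regular files whose name ends with a recognised extension are kept.
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.lower().endswith(FASTQ_SUFFIXES)
    )
```

The returned names can be compared with the second column of the sample file to catch typos before the run.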

Sample file

Upload one file describing all the samples using the Databox, and select the format File.

sample_file_list, expected format:
Sample1	readfileA_1.fastq	pair1
Sample1	readfileA_2.fastq	pair2
Sample1	readfileB_1.fastq	pair1
Sample1	readfileB_2.fastq	pair2
Sample3	readfileD_1.fastq	pair1	noassembly
Sample3	readfileD_2.fastq	pair2	noassembly

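The sample file must be tab-separated, with one line per FASTQ file. The first column gives the sample name, the second the name of a file in the Fastq folder, and the third whether that file holds the forward (pair1) or reverse (pair2) reads. A sample can span several pairs of files, as Sample1 does above. An optional fourth column can carry a flag such as noassembly.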
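A common mistake is separating the columns with spaces instead of tabs. The sketch below parses a sample file and checks the column layout shown in the example; the function name parse_sample_file and its optional fastq_files argument (for cross-checking against the folder contents) are hypothetical helpers, not part of SqueezeMeta or the platform.

```python
from typing import Iterable, List, Optional, Tuple

# Allowed values for the third column, as in the example above.
PAIR_LABELS = {"pair1", "pair2"}

# (sample name, FASTQ file name, pair label, optional flag)
Row = Tuple[str, str, str, Optional[str]]


def parse_sample_file(text: str, fastq_files: Optional[Iterable[str]] = None) -> List[Row]:
    """Parse a tab-separated sample file and check each line.

    Each non-empty line must hold: sample name, FASTQ file name, pair1/pair2,
    and optionally a fourth flag column (e.g. noassembly).
    If `fastq_files` is given, every listed file must be among them.
    """
    known = set(fastq_files) if fastq_files is not None else None
    rows: List[Row] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            # A space-separated line ends up as a single field and fails here.
            raise ValueError(
                f"line {lineno}: expected 3 or 4 tab-separated columns, got {len(fields)}"
            )
        sample, fastq, pair = fields[:3]
        flag = fields[3] if len(fields) == 4 and fields[3] else None
        if pair not in PAIR_LABELS:
            raise ValueError(f"line {lineno}: third column must be pair1 or pair2, got {pair!r}")
        if known is not None and fastq not in known:
            raise ValueError(f"line {lineno}: {fastq} not found in the Fastq folder")
        rows.append((sample, fastq, pair, flag))
    return rows
```

The checks stop at the first problem and report its line number, which makes a malformed file quick to fix before uploading it.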


Mapping reads

The task gws_metag - Short Reads Mapping with SqueezeMeta performs taxonomic and functional assignment directly on the raw reads, which are mapped against the GenBank nr database.
The first parameter to set is the number of threads. Using as many threads as are available shortens the run time.
The second option is clustering_orthologous_groups. If set to true, the task also performs a COG analysis (functional annotation) of the reads against the eggNOG database.

Files:

Input: Fastq folder, Sample file
Outputs:
- Set of taxonomic, functional assignment and abundance tables
- Set of measures and statistic tables
- squeeze_meta_pipeline_folder (all files created by the pipeline)
In the Set of taxonomic, functional assignment and abundance tables, abundances are absolute, not relative.
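When relative abundances are needed, for instance to compare samples with different sequencing depths, each sample's values can be divided by that sample's total. A minimal sketch, assuming the table has already been loaded as a nested dictionary mapping each taxon to its per-sample values; this layout and the name to_relative are hypothetical.

```python
from collections import defaultdict
from typing import Dict

Table = Dict[str, Dict[str, float]]  # taxon -> sample -> abundance


def to_relative(table: Table) -> Table:
    """Convert absolute abundances to percentages of each sample's total."""
    # Sum every taxon's value per sample (i.e. per column).
    totals: Dict[str, float] = defaultdict(float)
    for per_sample in table.values():
        for sample, value in per_sample.items():
            totals[sample] += value
    # Divide each value by its column total; empty samples stay at 0.
    return {
        taxon: {
            sample: (100.0 * value / totals[sample]) if totals[sample] else 0.0
            for sample, value in per_sample.items()
        }
        for taxon, per_sample in table.items()
    }
```

The percentages are relative to the rows present in the table, so if unassigned reads are not listed as a row, they are not part of the denominator.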
The pipeline produces one file per taxonomic level (e.g. Class Abundance) with the samples in columns and the taxonomic information in rows.
Additional tables are derived from these: they split the superkingdoms into separate files and transpose rows and columns.
These tables are easier to use for making graphics such as stacked bar plots.
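The transposition step can be reproduced on any per-level table, which is useful when plotting a level for which no transposed file is available. A minimal sketch, assuming a tab-separated table whose header holds a taxon label followed by the sample names, with one row per taxon; the function name transpose_tsv is hypothetical.

```python
import csv
import io
from typing import List


def transpose_tsv(text: str) -> List[List[str]]:
    """Swap rows and columns of a tab-separated table.

    Input: header = taxon label + sample names, one row per taxon.
    Output: one row per sample, one column per taxon.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return []
    width = len(rows[0])
    # zip() would silently truncate ragged rows, so reject them explicitly.
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of columns")
    return [list(col) for col in zip(*rows)]
```

Samples in rows is the layout many plotting tools expect for stacked bar plots; for example, pandas' DataFrame.plot.bar(stacked=True) draws one bar per row and stacks the columns.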