RNA-Seq mapping - gws_omix

Introduction

RNA sequencing (RNA-Seq) is used to study the transcriptome of living organisms. This high-throughput sequencing technique makes it possible to measure the expression of an organism's genes and thus to compare the transcriptomes of individuals under different conditions, or gene expression between different organs, at different times, etc.

Another major advantage of this technology is its ability to discover new isoforms, alleles and mutations (SNPs, InDels) within the sequenced genes. RNA-Seq is also an excellent tool for refining the annotation of assembled genomes.

This documentation presents the main steps needed to run an RNA-Seq mapping analysis.

Data upload and preparation
Input folder

Upload a single folder containing all the sequencing data to the Databox, and select the following format: Fastq folder.

Making the ready-to-use metadata file

The uBiome Make metadata file task automatically generates a ready-to-use metadata file when given a fastq folder as input. Once the metadata file is generated, you can add specific metadata columns in the expected file format (see below).

Example:

#author: Paulson, Robert		
#data: 1996/08/17		
#project: Chaos		
#types_allowed:categorical or numeric	
#column-type	categorical	categorical	categorical
sample-id	forward-absolute-filepath	reverse-absolute-filepath	Treatment
Sample_1	Sample_1.forward.fq.gz	Sample_1.reverse.fq.gz	ctrl
Sample_2a	Sample_2a.forward.fq.gz	Sample_2a.reverse.fq.gz	T1
Sample_2b	Sample_2b.forward.fq.gz	Sample_2b.reverse.fq.gz	T1
Sample_3a	Sample_3a.forward.fq.gz	Sample_3a.reverse.fq.gz	T2
Sample_3b	Sample_3b.forward.fq.gz	Sample_3b.reverse.fq.gz	T2
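
As a rough illustration of how such a file is laid out, the sketch below separates the `#` header lines from the tab-separated sample table and groups sample ids by a metadata column. It assumes the layout shown in the example above; the function names (`parse_metadata`, `group_by`) are hypothetical and are not part of gws_omix.

```python
import csv
import io

# Shortened copy of the example above (tab-separated), embedded so the
# sketch is self-contained.
EXAMPLE = (
    "#author: Paulson, Robert\n"
    "#column-type\tcategorical\tcategorical\tcategorical\n"
    "sample-id\tforward-absolute-filepath\treverse-absolute-filepath\tTreatment\n"
    "Sample_1\tSample_1.forward.fq.gz\tSample_1.reverse.fq.gz\tctrl\n"
    "Sample_2a\tSample_2a.forward.fq.gz\tSample_2a.reverse.fq.gz\tT1\n"
)


def parse_metadata(text):
    """Split a metadata file into its '#' header lines and its sample rows."""
    header_lines = []
    table_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        if line.startswith("#"):
            header_lines.append(line.rstrip("\t"))
        else:
            table_lines.append(line)
    # The first non-comment line holds the column names.
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return header_lines, list(reader)


def group_by(rows, column):
    """Group sample ids by the value of one metadata column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row["sample-id"])
    return groups


headers, samples = parse_metadata(EXAMPLE)
groups = group_by(samples, "Treatment")
print(groups)
```

Grouping by a column such as Treatment is the kind of lookup needed later, when a metadata column is chosen for the differential analysis.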

Protocol
STEP 0 - Reads quality check

This step is optional. The task (OmiX - Fastqc quality check) lets you visually inspect the sequencing quality of a sequencing dataset.
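
To give an idea of what a quality check looks at, here is a minimal sketch that decodes the quality line of each FASTQ record (assuming the common Phred+33 encoding) and averages the scores per read position. It is only an illustration, not FastQC's implementation; the two embedded reads are made up.

```python
import io

# Hypothetical two-read FASTQ sample, embedded so the sketch is self-contained.
FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGA\n+\nII5#\n"


def read_fastq(handle):
    """Yield (name, sequence, quality string) from a 4-lines-per-record FASTQ."""
    while True:
        name = handle.readline().rstrip("\n")
        if not name:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield name[1:], seq, qual


def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def mean_quality_per_position(records):
    """Average the Phred score at each read position across all reads."""
    totals, counts = [], []
    for _, _, qual in records:
        for i, score in enumerate(phred_scores(qual)):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


means = mean_quality_per_position(read_fastq(io.StringIO(FASTQ)))
print(means)
```

For a real `.fq.gz` file, the handle could be opened with `gzip.open(path, "rt")` instead of `io.StringIO`.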

STEP 1 - Reads quality check and trimming

This step (task: OmiX - Trimgalore quality trimming) checks the sequencing quality of a sequencing dataset and removes reads that do not pass the filters set by the following parameters: quality = 0 to 40 [default = 20]; maximum number of unknown bases [default = no filter]; minimum size [default = 20 bp]. For paired-end projects, singletons (i.e., when only one read of the pair remains) can be kept (parameter: singleton = Yes|No).
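
The effect of these parameters can be sketched as follows. This is a simplified illustration and not the code run by the task: the 3' trimming uses a running-sum rule in the spirit of common quality trimmers, and the names `cutoff`, `min_length`, `max_n` and `singleton` are hypothetical stand-ins for the quality, minimum size, maximum unknown base and singleton parameters.

```python
def quality_trim_3prime(seq, quals, cutoff=20):
    """Trim low-quality bases from the 3' end.

    Walk from the 3' end accumulating (quality - cutoff) and cut at the
    position where this running sum is lowest; stop once it turns positive.
    """
    running = 0
    lowest = 0
    cut = len(seq)
    for i in range(len(seq) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            cut = i
    return seq[:cut], quals[:cut]


def keep_read(seq, min_length=20, max_n=None):
    """Apply the minimum-size and unknown-base (N) filters to a trimmed read."""
    if len(seq) < min_length:
        return False
    if max_n is not None and seq.upper().count("N") > max_n:
        return False
    return True


def filter_pair(fwd, rev, cutoff=20, min_length=20, max_n=None, singleton=False):
    """Trim and filter a read pair; each read is a (sequence, qualities) tuple.

    Returns (forward, reverse). A read that fails the filters is returned as
    None; the surviving mate is kept only when singleton is True.
    """
    trimmed = [quality_trim_3prime(seq, quals, cutoff) for seq, quals in (fwd, rev)]
    passed = [keep_read(seq, min_length, max_n) for seq, _ in trimmed]
    if all(passed):
        return trimmed[0], trimmed[1]
    if singleton and any(passed):
        return (trimmed[0] if passed[0] else None,
                trimmed[1] if passed[1] else None)
    return None, None


good = ("ACGTACGT", [40] * 8)
bad = ("ACGTACGT", [40, 40, 5, 5, 5, 5, 5, 5])
print(filter_pair(good, bad, min_length=5, singleton=False))
print(filter_pair(good, bad, min_length=5, singleton=True))
```

With `singleton=False` the whole pair is discarded as soon as one mate fails; with `singleton=True` the surviving mate is kept on its own.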

Information:

Trim Galore

Cutadapt

FastQC

STEP 2.a - Genome indexing

To perform genomic mapping you first need to index your genome data (task: OmiX - STAR genome index). This task can be done according to two options: either with

or with a

(fasta format). For the latter option, we advise using genomes from reference databases (Ensembl, NCBI, EBI...), which also provide the annotation file for these genomes (i.e., the positions of the genes in the genome). For more information on the annotation file format, see the Ensembl website. To download the files directly, go to:

Ensembl and choose your species, then download the XXXX.gtf.gz file

Ensembl plant ...

Ensembl fungi ...

Ensembl bacteria ...

Ensembl metazoan ...
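
To make "the position of the genes" concrete, the sketch below reads a few GTF lines and extracts the location of each gene. GTF is a tab-separated format with nine columns (sequence name, source, feature, start, end, score, strand, frame, attributes); the gene ids and coordinates used here are made up, and the helper names are hypothetical.

```python
# Hypothetical GTF content, embedded so the sketch is self-contained.
GTF_EXAMPLE = (
    "#!genome-build hypothetical\n"
    '1\tensembl\tgene\t1000\t5000\t.\t+\t.\tgene_id "GENE0001"; gene_name "abc1";\n'
    '1\tensembl\texon\t1000\t1200\t.\t+\t.\tgene_id "GENE0001"; transcript_id "T0001";\n'
    '2\tensembl\tgene\t300\t900\t.\t-\t.\tgene_id "GENE0002";\n'
)


def parse_attributes(field):
    """Turn the 9th GTF column ('key "value"; ...') into a dictionary."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip('"')
    return attrs


def gene_positions(gtf_text):
    """Map each gene_id to (sequence name, start, end, strand)."""
    genes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip empty lines and header comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed 'gene' features
        attrs = parse_attributes(cols[8])
        genes[attrs["gene_id"]] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return genes


positions = gene_positions(GTF_EXAMPLE)
print(positions)
```

A downloaded `XXXX.gtf.gz` file would first need to be decompressed, for example by reading it through `gzip.open(path, "rt")`.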

STEP 2.b - Genome mapping

The sequencing datasets contained in the uploaded fastq folder are mapped (task: OmiX - STAR genome mapping) on the previously indexed genome. A metadata file must be provided to perform this step.
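
For readers who want a feel for what a paired-end STAR run looks like outside the platform, here is a minimal sketch that builds one command line per sample from metadata rows. The flags are standard STAR options, but which options the OmiX task actually sets is not documented here, so treat the choice as an assumption; the `genome_index` and `mapped` paths are hypothetical. The commands are only built and printed, not executed.

```python
def star_mapping_command(sample_id, forward, reverse, genome_dir, out_dir, threads=4):
    """Build (without running) a STAR command for one paired-end sample."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,                   # folder produced by the indexing step
        "--readFilesIn", forward, reverse,           # mate 1, then mate 2
        "--readFilesCommand", "zcat",                # reads are gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{out_dir}/{sample_id}.",
    ]


# Rows as they could be read from the metadata file (values from its example).
samples = [
    {"sample-id": "Sample_1",
     "forward-absolute-filepath": "Sample_1.forward.fq.gz",
     "reverse-absolute-filepath": "Sample_1.reverse.fq.gz"},
    {"sample-id": "Sample_2a",
     "forward-absolute-filepath": "Sample_2a.forward.fq.gz",
     "reverse-absolute-filepath": "Sample_2a.reverse.fq.gz"},
]

commands = [
    star_mapping_command(
        row["sample-id"],
        row["forward-absolute-filepath"],
        row["reverse-absolute-filepath"],
        genome_dir="genome_index",  # hypothetical path
        out_dir="mapped",           # hypothetical path
    )
    for row in samples
]

for command in commands:
    print(" ".join(command))
```

This also shows why the metadata file is required: it is what links each sample id to its forward and reverse read files.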

STEP 3 - Gene transcription quantification

To quantify gene expression after read mapping (with STAR, see Step 2.b) using the Salmon suite (task: OmiX - Salmon quantification STAR), you just need to provide the output folder of the previous step (containing: the mapping file, the previously used genome and the annotation file).
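
Salmon reports expression as estimated read counts and as TPM (transcripts per million). As a rough guide to how the two relate, the sketch below applies the standard TPM definition to made-up numbers: each count is divided by the feature's effective length, and the resulting rates are rescaled to sum to one million. Salmon's own estimation of counts and effective lengths is more elaborate and is not reproduced here.

```python
def tpm(num_reads, effective_lengths):
    """Compute TPM from read counts and effective lengths (same keys in both)."""
    # Reads per base for each feature.
    rates = {name: num_reads[name] / effective_lengths[name] for name in num_reads}
    total = sum(rates.values())
    # Rescale so that all values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Made-up example: geneB has three times more reads but is three times longer.
reads = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 500.0, "geneB": 1500.0}

values = tpm(reads, lengths)
print(values)
```

In this example both genes end up with the same TPM, which illustrates why raw read counts alone do not allow expression levels to be compared between genes of different lengths.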

STEP 4 - Gene expression differential analysis

Once gene expression has been quantified, the quantification output is processed by the R package DESeq2 (task: OmiX - Deseq2 differential analysis). For this step, you need to specify the metadata column to use for the comparison. The results (with or without p-value filters) are available in the output resource set.
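
To give some intuition for what the comparison involves, the sketch below normalizes a small made-up count table with median-of-ratios size factors (the default normalization approach in DESeq2) and computes a naive log2 fold change between two levels of a metadata column. It is deliberately simplified: DESeq2's statistical model and p-values are not reproduced, and the function names and the pseudocount are assumptions made for this illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> list of counts per sample."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if min(values) <= 0:
            continue  # genes with a zero count are skipped
        log_geo_mean = sum(math.log(v) for v in values) / n_samples
        for j, value in enumerate(values):
            log_ratios[j].append(math.log(value) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]


def group_mean(values, groups, label):
    """Mean of the values belonging to samples of one group."""
    picked = [v for v, g in zip(values, groups) if g == label]
    return sum(picked) / len(picked)


def log2_fold_change(counts, groups, factors, numerator, denominator, pseudocount=1.0):
    """Naive log2 ratio of normalized group means (pseudocount avoids log of 0)."""
    result = {}
    for gene, values in counts.items():
        normalized = [v / f for v, f in zip(values, factors)]
        top = group_mean(normalized, groups, numerator) + pseudocount
        bottom = group_mean(normalized, groups, denominator) + pseudocount
        result[gene] = math.log2(top / bottom)
    return result


# Made-up counts: the second sample was sequenced twice as deeply, and g3
# additionally increases under treatment T1.
counts = {"g1": [100, 200], "g2": [50, 100], "g3": [10, 80]}
groups = ["ctrl", "T1"]  # values of the chosen metadata column, one per sample

factors = size_factors(counts)
lfc = log2_fold_change(counts, groups, factors, numerator="T1", denominator="ctrl")
print(factors)
print(lfc)
```

After normalization, g1 and g2 show no change because their difference is explained by sequencing depth alone, while g3 keeps a positive log2 fold change.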
