gws_omix

pyDESeq2 pairwise differential analysis

TASK
Typing name : TASK.gws_omix.pyDESeq2DifferentialAnalysis
Brick : gws_omix

Computes a pairwise differential expression analysis with the PyDESeq2 Python package.

  • PyDESeq2, a Python implementation of the DESeq2 method originally developed in R, is a versatile tool for differential expression analysis (DEA) of bulk RNA-seq data. Its results are similar, but not identical, to those of the R package: it achieves higher model likelihood and runs faster on large datasets.

  • Normalization:
    • Before differential expression analysis: median-of-ratios normalization (performed by dds.deseq2()) corrects for differences in sequencing depth across samples.
    • After differential expression analysis: the variance stabilizing transformation (VST) is applied for exploratory analyses such as PCA, clustering, and heatmaps.

  • The Wald test is used to compare two conditions:
    For each gene, the null hypothesis is that there is no differential expression between the two sample groups (e.g., treated vs. control). The Wald statistic is the estimated log2 fold change divided by its standard error. If the p-value is small (e.g., p < 0.05), the null hypothesis is rejected: if the gene were truly not differentially expressed, a difference at least this large would be observed less than 5% of the time. When thousands of genes are tested, however, some non-differentially expressed genes will still pass this threshold by chance (false positives), which is why DESeq2 and PyDESeq2 also report Benjamini-Hochberg adjusted p-values (padj).

  • Visual outputs:
    pydesq2_results_table.csv: DE results sorted by log2FoldChange (raw log2 fold changes from the Wald test).
    Volcano plot Volcano_<contrast> (red = significant, grey = not significant).
    Heatmap Heatmap_<contrast> of the top 50 DE genes (VST counts).
    Global PCA: `.

  • Example of metadata file (tab-separated):
    Sample       forward-absolute-filepath  reverse-absolute-filepath  Condition  Replicate  Batch
    SRR13978645  SRR13978645_1.fastq.gz     SRR13978645_2.fastq.gz     CTRL       R1         Batch1
    SRR13978644  SRR13978644_1.fastq.gz     SRR13978644_2.fastq.gz     CTRL       R2         Batch1
    SRR13978643  SRR13978643_1.fastq.gz     SRR13978643_2.fastq.gz     CTRL       R3         Batch2
    SRR13978642  SRR13978642_1.fastq.gz     SRR13978642_2.fastq.gz     SPRC1      R1         Batch1
    SRR13978641  SRR13978641_1.fastq.gz     SRR13978641_2.fastq.gz     SPRC2      R1         Batch1
    SRR13978640  SRR13978640_1.fastq.gz     SRR13978640_2.fastq.gz     SPRC2      R2         Batch2
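The median-of-ratios normalization described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under simplifying assumptions (samples in rows, only genes with a non-zero count in every sample used to build the pseudo-reference, and no handling of the case where no such gene exists). PyDESeq2 computes its size factors internally during dds.deseq2(), and its handling of edge cases may differ; the helper names here are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Size factors by the median-of-ratios method (hypothetical helper).

    counts: raw counts, shape (n_samples, n_genes), samples in rows.
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample are used, because the
    # geometric mean of a gene with any zero count would be zero.
    expressed = np.all(counts > 0, axis=0)
    log_counts = np.log(counts[:, expressed])
    # Log geometric mean of each gene across samples: the pseudo-reference.
    log_geo_means = log_counts.mean(axis=0)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means, axis=1))


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide each sample's counts by its size factor (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)[:, None]
```

For a second sample that is an exact twofold scale-up of the first, the size factors come out as 1/√2 and √2, so both samples normalize to identical values: sequencing depth is removed while relative expression is kept.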

Input

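count table matrix
Raw (unnormalized) read count matrix, with a gene ID column (see genes_colname) and one column per sample.
metadata file
Tab-separated (TSV) metadata file with one row per sample.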
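A minimal sketch of how the two inputs might be aligned before fitting, assuming the metadata uses the Sample and Condition column names from the example above and that the count table has genes in rows. The function prepare_inputs and its arguments are hypothetical (they mirror the configuration parameters documented below); the final transpose reflects PyDESeq2's expectation of a count matrix with samples in rows and genes in columns.

```python
import pandas as pd


def prepare_inputs(
    counts: pd.DataFrame,
    metadata: pd.DataFrame,
    genes_colname: str,
    control_condition: str,
    unnormal_condition: str,
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (counts, metadata) restricted to the two compared conditions.

    Hypothetical helper. Assumes counts has genes in rows (one gene-ID column
    plus one column per sample) and metadata has 'Sample' and 'Condition'.
    """
    # Keep only samples belonging to the two conditions being compared.
    keep = metadata["Condition"].isin([control_condition, unnormal_condition])
    meta = metadata.loc[keep].set_index("Sample")
    missing = set(meta.index) - set(counts.columns)
    if missing:
        raise ValueError(f"samples missing from count table: {sorted(missing)}")
    # Select the matching sample columns in metadata order, then transpose
    # to samples x genes, the layout PyDESeq2 works with.
    matrix = counts.set_index(genes_colname)[meta.index].T
    return matrix, meta
```

Ordering the count columns by the metadata index keeps both tables row-aligned, which avoids silently pairing a sample with the wrong condition.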

Output

Differential expression result
Differential expression results, summarizing the genes whose expression changes significantly between the two conditions.
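The per-gene test behind these results can be illustrated as follows: the Wald statistic is the log2 fold change divided by its standard error, compared with a standard normal distribution, and the resulting p-values are then adjusted with the Benjamini-Hochberg procedure. This is a simplified sketch, not PyDESeq2's code: PyDESeq2 estimates the fold changes and standard errors from its negative binomial model and also applies independent filtering and outlier handling, so its padj column can differ from plain Benjamini-Hochberg (filtered genes get NaN). The sketch assumes no missing p-values, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def wald_pvalues(log2fc, lfc_se):
    """Two-sided Wald p-values: z = log2FC / SE compared with N(0, 1)."""
    z = np.asarray(log2fc, dtype=float) / np.asarray(lfc_se, dtype=float)
    return 2 * norm.sf(np.abs(z))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    # Put values back in the original gene order and cap them at 1.
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

For example, a gene with log2FC = 1.96 and SE = 1 gets p ≈ 0.05, and the raw p-values [0.01, 0.04, 0.03, 0.5] adjust to roughly [0.04, 0.053, 0.053, 0.5].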
Principal Component Analysis
Shows how the samples differ after dimensionality reduction by principal component analysis.
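A minimal PCA sketch using NumPy's singular value decomposition on a samples × genes matrix. The brick plots PCA on VST counts; because VST is not reimplemented here, a log2(normalized + 1) transform is assumed as a simpler stand-in input. The function name pca_scores is hypothetical.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2):
    """Project samples onto their first principal components.

    expr: transformed expression (e.g. VST or log2 counts), shape
    (n_samples, n_genes); n_components must not exceed min(n_samples, n_genes).
    Returns (scores, explained_variance_ratio).
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Sample coordinates on each component: U * singular values.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance captured by each component.
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]
```

Typical use would be scores, explained = pca_scores(np.log2(normalized_counts + 1)), plotting scores[:, 0] against scores[:, 1] and labelling the axes with the explained-variance percentages.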
Interactive Average Hierarchical Clustering Heatmap
Genes and samples are clustered hierarchically with SciPy, using average linkage and the Euclidean distance metric, and the original DataFrame is reordered according to the clustering results. The reordered data are then displayed as a Plotly heatmap, which makes patterns and relationships in the gene expression data easier to see.
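The clustering step can be sketched with scipy.cluster.hierarchy: average linkage on Euclidean distances, computed once for genes and once for samples, with leaves_list giving the dendrogram leaf order used to reorder the DataFrame. The Plotly rendering is omitted, the function name is hypothetical, and the brick's actual code may differ in details.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(df: pd.DataFrame) -> pd.DataFrame:
    """Reorder genes (rows) and samples (columns) by average-linkage clustering.

    Hypothetical helper; df holds expression values with genes in rows and
    samples in columns, and needs at least two rows and two columns.
    """
    # Average linkage on Euclidean distances between genes (rows).
    row_order = leaves_list(linkage(df.values, method="average", metric="euclidean"))
    # Same clustering between samples (columns of the original table).
    col_order = leaves_list(linkage(df.values.T, method="average", metric="euclidean"))
    # Leaf order keeps each cluster contiguous in the reordered table.
    return df.iloc[row_order, col_order]
```

The reordered frame could then be passed to a Plotly heatmap such as plotly.express.imshow, so that similar genes and similar samples appear next to each other.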
Interactive Volcano plot
This plot shows the relationship between the log2 fold change and the adjusted p-value of each gene. The color scale represents log2 fold change values, and the point size is set for better visibility. The resulting plot, titled 'Volcano Plot', shows both the magnitude of the expression changes and their statistical significance.
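A sketch of how volcano plot coordinates can be derived from a results table with PyDESeq2's log2FoldChange and padj columns: the x axis is the log2 fold change and the y axis is -log10 of the adjusted p-value. Dropping genes with a missing padj and clipping adjusted p-values that underflow to 0 are assumptions of this sketch, not necessarily what the brick does; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def volcano_coordinates(results: pd.DataFrame) -> pd.DataFrame:
    """x = log2FoldChange, y = -log10(padj), one row per gene (hypothetical).

    results: DE table with 'log2FoldChange' and 'padj' columns.
    """
    # Assumption: genes without an adjusted p-value are not plotted.
    pts = results[["log2FoldChange", "padj"]].dropna()
    # Assumption: padj values of exactly 0 are clipped to the smallest
    # positive float so that -log10 stays finite.
    smallest = np.finfo(float).tiny
    return pd.DataFrame(
        {
            "x": pts["log2FoldChange"],
            "y": -np.log10(pts["padj"].clip(lower=smallest)),
        },
        index=pts.index,
    )
```

Genes far from zero on x and high on y are the strongly and significantly changed ones.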
heatmap
Displays average expression levels across groups, with rows representing individual genes (Ensembl IDs) and columns representing samples. Hierarchical clustering dendrograms are typically displayed alongside the heatmap, showing how samples relate to one another based on the similarity of their expression profiles.
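Per-group averages like those in this heatmap can be computed with a pandas groupby, assuming an expression table with genes in rows and samples in columns and a metadata table with the Sample and Condition columns from the example metadata file. The function name group_means is hypothetical, and the brick may aggregate differently.

```python
import pandas as pd


def group_means(expr: pd.DataFrame, metadata: pd.DataFrame) -> pd.DataFrame:
    """Average expression per condition (hypothetical helper).

    expr: genes in rows, samples in columns.
    metadata: one row per sample with 'Sample' and 'Condition' columns.
    Returns genes in rows and one column per condition.
    """
    # Map each sample name to its condition.
    condition = metadata.set_index("Sample")["Condition"]
    # Transpose to samples x genes, group samples by condition (aligned on
    # sample names), average, then transpose back to genes x conditions.
    return expr.T.groupby(condition).mean().T
```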

Configuration

genes_colname

Column name containing gene ids in expression matrix

Type : string

control_condition

Control (reference) condition, as written in the Condition column of the metadata file (e.g. CTRL)

Type : string

unnormal_condition

Condition compared with the control, as written in the Condition column of the metadata file (e.g. SPRC1)

Type : string

pvalue_value

Optional

P-value threshold used to call a gene significant

Type : float
Default value : 0.05

log2FoldChange_value

Optional

log2 fold change threshold used to call a gene significant

Type : float
Default value : 0.5
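The two thresholds could be combined as in this sketch, which labels each gene as up-regulated, down-regulated or not significant. It assumes, without confirmation from this page, that pvalue_value is compared with the adjusted p-value (padj), that log2FoldChange_value is compared with the absolute log2 fold change, and that both conditions must hold. The function name is hypothetical; the defaults mirror the documented default values.

```python
import pandas as pd


def flag_significant(
    results: pd.DataFrame,
    pvalue_value: float = 0.05,
    log2FoldChange_value: float = 0.5,
) -> pd.Series:
    """Label each gene 'up', 'down' or 'ns' (hypothetical helper).

    Assumption: a gene is significant when padj < pvalue_value AND
    |log2FoldChange| >= log2FoldChange_value. A NaN padj counts as 'ns'.
    """
    lfc = results["log2FoldChange"]
    sig = (results["padj"] < pvalue_value) & (lfc.abs() >= log2FoldChange_value)
    labels = pd.Series("ns", index=results.index)
    labels.loc[sig & (lfc > 0)] = "up"
    labels.loc[sig & (lfc < 0)] = "down"
    return labels
```

With the defaults, a gene with padj = 0.001 but log2FoldChange = 0.2 stays "ns": the fold-change cutoff filters out changes that are statistically detectable but small.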