pyDESeq2 pairwise differential analysis

Typing name : TASK.gws_omix.pyDESeq2DifferentialAnalysis

Brick : gws_omix

Computes a pairwise differential expression analysis using the PyDESeq2 Python package.

PyDESeq2, a Python implementation of the DESeq2 method originally developed in R, is a versatile tool for differential expression analysis (DEA) of bulk RNA-seq data. Its results are similar, but not identical, to those of DESeq2: it achieves higher model likelihood and runs faster on large datasets.
Normalization:
• Before differential expression analysis: median-of-ratios normalization (dds.deseq2()) corrects for differences in sequencing depth across samples.
• After differential expression analysis: the variance stabilizing transformation (VST) is applied for exploratory analyses such as PCA, clustering, and heatmaps.
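To make the median-of-ratios idea concrete, here is a minimal pure-Python sketch. PyDESeq2 computes size factors itself during dds.deseq2(); the helpers size_factors and normalize below are hypothetical illustrations, not part of its API, and they assume a small in-memory count table given as a dict of gene ID to per-sample counts.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factor for each sample (illustrative sketch).

    counts: dict mapping gene ID -> list of raw counts, one per sample,
    with the same sample order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean of each gene across samples. Genes with a zero count
    # in any sample are skipped, since their geometric mean would be zero.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("every gene has a zero count in some sample")
    factors = []
    for j in range(n_samples):
        # log(count / geometric mean) for every retained gene in sample j.
        log_ratios = [math.log(counts[gene][j]) - lg
                      for gene, lg in log_geo_means.items()]
        # The size factor is the median ratio, taken in log space.
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / s for c, s in zip(row, factors)]
            for gene, row in counts.items()}
```

For counts {"g1": [10, 20], "g2": [100, 200], "g3": [5, 0]}, gene g3 is ignored because of its zero, and the second sample gets a size factor twice that of the first, so the normalized g1 counts become equal.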
The Wald test is used to compare two conditions:
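For each gene, the null hypothesis of the Wald test states that there is no differential expression between the two sample groups (e.g., treated vs. control). If the p-value is small (e.g., p < 0.05), the null hypothesis is rejected: if the gene were truly not differentially expressed, a difference at least as large as the one observed would arise less than 5% of the time. The p-value is not the probability that the observed difference is due to chance. When many genes are tested, some non-differentially expressed genes will still appear significant by chance (false positives); at a 0.05 threshold, roughly 5% of truly unchanged genes are expected to pass. For this reason the results also report adjusted p-values (padj) that correct for multiple testing.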
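The Wald p-value and the multiple-testing correction can be sketched as below, assuming the log2 fold change of a gene and its standard error are already estimated (PyDESeq2 reports them as log2FoldChange and lfcSE). The function names are hypothetical. The Benjamini-Hochberg procedure shown is the standard one, but PyDESeq2 also applies filtering steps before adjustment (such as independent filtering), which this sketch omits.

```python
import math

def wald_pvalue(log2_fold_change, std_error):
    """Two-sided p-value of the Wald statistic z = LFC / SE under N(0, 1)."""
    z = log2_fold_change / std_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With raw p-values [0.01, 0.04, 0.03, 0.5], the adjusted values are about [0.04, 0.053, 0.053, 0.5]: the gene at p = 0.04 passes an unadjusted 0.05 cutoff but not the adjusted one.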
Visual outputs:
• pydesq2_results_table.csv: DE results sorted by log2FoldChange (raw Wald values).
• Volcano plot Volcano_<contrast> (red = significant, grey = not significant).
• Heatmap Heatmap_<contrast> of the top 50 DE genes (VST counts).
• Global PCA: `.
Example of metadata file (tab-separated; columns aligned here for readability):
Sample       forward-absolute-filepath  reverse-absolute-filepath  Condition  Replicate  Batch
SRR13978645  SRR13978645_1.fastq.gz     SRR13978645_2.fastq.gz     CTRL       R1         Batch1
SRR13978644  SRR13978644_1.fastq.gz     SRR13978644_2.fastq.gz     CTRL       R2         Batch1
SRR13978643  SRR13978643_1.fastq.gz     SRR13978643_2.fastq.gz     CTRL       R3         Batch2
SRR13978642  SRR13978642_1.fastq.gz     SRR13978642_2.fastq.gz     SPRC1      R1         Batch1
SRR13978641  SRR13978641_1.fastq.gz     SRR13978641_2.fastq.gz     SPRC2      R1         Batch1
SRR13978640  SRR13978640_1.fastq.gz     SRR13978640_2.fastq.gz     SPRC2      R2         Batch2
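For a pairwise comparison only the samples from the two compared conditions are needed. The sketch below shows one way to select them from a TSV metadata file laid out like the example above. The function samples_for_contrast and its column-name defaults are hypothetical; how the task itself subsets samples is not documented here.

```python
import csv
import io

def samples_for_contrast(metadata_tsv, condition_a, condition_b,
                         sample_col="Sample", condition_col="Condition"):
    """Return the sample IDs of each of the two compared conditions.

    metadata_tsv: text of a tab-separated metadata file with a header row.
    Samples from any other condition are left out of the pairwise comparison.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    groups = {condition_a: [], condition_b: []}
    for row in reader:
        condition = row[condition_col]
        if condition in groups:
            groups[condition].append(row[sample_col])
    # A contrast needs at least one sample on each side.
    for condition, samples in groups.items():
        if not samples:
            raise ValueError(f"no samples found for condition {condition!r}")
    return groups[condition_a], groups[condition_b]
```

With the example file, comparing CTRL against SPRC2 selects the three CTRL replicates and the two SPRC2 replicates; SPRC1, which has a single replicate, is left out.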

Input

count table matrix

File

metadata file

Metadata file in TSV (tab-separated) format

File

Output

Differential expression result

Differential expression results, providing a summary of the genes whose expression changes significantly between the two conditions.

Table

Principal Component Analysis

Shows how the samples separate after dimensionality reduction by principal component analysis.

Plotly resource
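To illustrate what the PCA output represents, the sketch below computes each sample's score on the first principal component by power iteration on the covariance matrix. It is an illustration under stated assumptions, not the task's implementation (which is not described here); the function name is hypothetical, and with thousands of genes a library SVD routine is the practical choice.

```python
import math
import random

def first_principal_component(samples, n_iter=200, seed=0):
    """Score of each sample on the first principal component.

    samples: list of equal-length expression vectors (e.g. VST), one per sample.
    The sign of a principal component is arbitrary, so scores may be negated.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("PCA needs at least two samples")
    p = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(p)]
    centered = [[s[k] - means[k] for k in range(p)] for s in samples]
    # Sample covariance matrix (p x p) of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # i.e. the direction of greatest variance.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all samples identical: no variance to explain
        v = [x / norm for x in w]
    # Project each centered sample onto the leading direction.
    return [sum(r[k] * v[k] for k in range(p)) for r in centered]
```

For four samples lying on a straight line, the scores are spread symmetrically around zero along that line, which is what the first axis of the PCA plot shows.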

Interactive Average Hierarchical Clustering Heatmap

Hierarchical clustering uses the average linkage method and the Euclidean distance metric. Using SciPy's hierarchical clustering, the code groups genes and samples and then reorders the original DataFrame according to the clustering result. The resulting heatmap, created with Plotly, displays the reordered data, making patterns and relationships in the gene expression dataset easier to see.

Plotly resource
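The sketch below reimplements average-linkage clustering with Euclidean distance from scratch, to show how the row order of a clustered heatmap arises. The task itself uses SciPy; the helper average_linkage_order is hypothetical, and its leaf order can differ from SciPy's dendrogram order because merged clusters may be laid out in a different left/right arrangement.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_order(rows):
    """Leaf order from agglomerative clustering with average linkage.

    rows: list of equal-length vectors (genes or samples).
    Returns row indices arranged so that merged clusters stay adjacent,
    which is how a clustered heatmap reorders its rows.
    """
    # Each cluster is kept as the ordered list of row indices it contains.
    clusters = [[i] for i in range(len(rows))]
    dist = {(i, j): euclidean(rows[i], rows[j])
            for i, j in combinations(range(len(rows)), 2)}

    def cluster_distance(a, b):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y) and merge them.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]],
                                                     clusters[pair[1]]))
        merged = clusters[x] + clusters[y]
        # Delete the higher index first so the lower index stays valid.
        del clusters[y]
        del clusters[x]
        clusters.append(merged)
    return clusters[0] if clusters else []
```

For rows [0.0], [10.0], [0.5] and [10.5], the order keeps the two close pairs {0, 2} and {1, 3} next to each other, so similar rows end up adjacent in the heatmap.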

Interactive Volcano plot

This plot shows the relationship between the log2 fold change and the adjusted p-value of each gene. The color scale represents log2 fold change values, and the point size is adjusted for better visibility. The resulting plot, titled 'Volcano Plot', shows gene expression changes together with their statistical significance.

Plotly resource
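The volcano coordinates and the significance split listed under Visual outputs (red = significant, grey = not significant) can be sketched as follows. The thresholds default to the pvalue_value (0.05) and log2FoldChange_value (0.5) settings described under Configuration; applying the p-value threshold to padj, and the exact comparison operators, are assumptions. The function volcano_points is hypothetical.

```python
import math

def volcano_points(results, padj_threshold=0.05, lfc_threshold=0.5):
    """Volcano-plot coordinates and colors (illustrative sketch).

    results: iterable of (gene_id, log2_fold_change, padj) tuples.
    Returns (gene_id, x, y, color) with x = log2FoldChange, y = -log10(padj).
    Assumed rule: significant when padj < padj_threshold and
    |log2FoldChange| >= lfc_threshold.
    """
    points = []
    for gene, lfc, padj in results:
        if padj is None or math.isnan(padj):
            continue  # no adjusted p-value, so no y coordinate
        y = -math.log10(max(padj, 1e-300))  # avoid log10(0)
        significant = padj < padj_threshold and abs(lfc) >= lfc_threshold
        points.append((gene, lfc, y, "red" if significant else "grey"))
    return points
```

Genes whose padj is missing (NaN) are skipped here because they cannot be placed on the y-axis.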

heatmap

Heatmap displaying average expression levels across groups, with rows representing individual genes (Ensembl IDs) and columns representing samples. Hierarchical clustering dendrograms are typically displayed alongside the heatmap, showing how samples relate to one another based on the similarity of their expression profiles.

File

Configuration

genes_colname

Column name containing gene ids in expression matrix

Type : string

control_condition

normal_condition

Type : string

unnormal_condition

Type : string

pvalue_value

Optional

pvalue_value

Type : float

Default value : 0.05

log2FoldChange_value

Optional

log2FoldChange value

Type : float

Default value : 0.5
