
Converting Snakemake Workflows to Constellab
This guide explains how to migrate your Snakemake pipelines to the Constellab platform using AI-assisted conversion.
Simple Conversion Powered by AI
Converting Snakemake pipelines to Constellab is remarkably simple thanks to AI automation. The AI assistant analyzes your Snakemake workflow, understands the rule logic, and automatically generates equivalent Constellab tasks with minimal user intervention. You don't need to manually rewrite code or understand the intricacies of both systems—the AI handles the heavy lifting for you.
Key benefits:
- 🤖 Fully automated analysis of your Snakemake workflow structure
- ⚡ Instant task generation with proper inputs, outputs, and configurations
- 🎯 Smart script conversion from bash to Python for better maintainability
- 📝 Automatic documentation generation for all converted tasks
- ✅ Built-in validation to ensure conversion accuracy
Simply point the AI to your Snakefile, answer a few validation questions, and let it create production-ready Constellab tasks in seconds.
What is Snakemake?
Snakemake is a workflow management system that enables reproducible and scalable data analyses. It uses a Python-based language to define computational pipelines composed of rules connected via input/output file dependencies.
Key features:
- Rule-based workflows: Define independent computational steps (rules) that are connected via file dependencies
- Reproducibility: Environment management through Conda/Mamba integration ensures consistent results
- Scalability: Execute workflows in parallel across local machines, HPC clusters, or cloud platforms
- Python-based DSL: Write workflows using a familiar Python-based domain-specific language
- Automatic parallelization: Snakemake automatically determines which rules can run in parallel based on dependencies
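For illustration, a minimal Snakefile with two rules chained through a file dependency might look like the sketch below (file names are hypothetical); Snakemake infers the execution order from the matching output and input paths:

```
rule count_reads:
    input: "data/sample.fastq"
    output: "results/sample.count"
    shell: "wc -l {input} > {output}"

rule summarize:
    input: "results/sample.count"  # produced by count_reads
    output: "results/summary.txt"
    shell: "cp {input} {output}"
```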
Snakemake has become a standard tool in bioinformatics and scientific computing, particularly for genomics and data analysis workflows.
Snakemake vs. Constellab: A Comparison
Both Snakemake and Constellab are open-source workflow management systems, but they serve different needs.
Key Advantages of Constellab
Constellab's main strength is being a complete platform, not just a workflow engine:
- Unified Interface: Manage pipelines, data, experiments, and results in one place
- Full Traceability: Every execution, parameter change, and data transformation is tracked and auditable
- Data-Centric: First-class data management with visualization, annotation, and sharing capabilities
- Collaboration: Teams can share protocols, datasets, and results seamlessly
- Extensibility: Modular "brick" architecture allows easy addition of new capabilities
- No-Code Options: Build and run workflows through the GUI without writing code
- Hybrid Approach: Supports both GUI-based and code-based workflow development
What is Supported in the Conversion?
The AI-powered conversion supports a wide range of Snakemake workflow features:
Workflow Structures
- ✅ Standard Rules: Single-level rule definitions
- ✅ Complex Workflows: Multi-step pipelines with file dependencies
- ✅ Wildcard Patterns: Handling of sample patterns and batch processing
Rule Execution Types
1. Shell Command Rules ✅
- Shell/Bash scripts embedded in `shell:` blocks
- Option to convert to Python: the AI can translate bash commands to Python code for:
  - Better maintainability and code readability
  - Easier debugging with proper error messages
  - Cross-platform compatibility
  - Integration with Python data science libraries
Example: `fastqc {input} -o {output}` → Python with `subprocess` or BioPython
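As a minimal sketch of what such a translation could look like (the `run_fastqc` helper and its paths are illustrative, not the exact code the AI generates):

```python
import subprocess

def run_fastqc(input_path: str, output_dir: str, threads: int = 2) -> None:
    # Equivalent of the Snakemake shell command: fastqc {input} -o {output_dir} -t {threads}
    result = subprocess.run(
        ["fastqc", input_path, "-o", output_dir, "-t", str(threads)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface a proper error message instead of a silent shell failure
        raise RuntimeError(f"FastQC failed: {result.stderr}")
```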
2. Python Script Rules ✅
- Rules that execute external Python scripts (`script:` directive)
- Rules with inline Python code blocks (`run:` directive)
- Automatic integration of Python logic into Constellab task `run()` methods
- Preservation of Python dependencies and imports
3. Virtual Environment Rules ✅
- Rules using Conda environments (`conda:` directive)
- Rules using Bioconda packages (common in bioinformatics)
- Rules with Mamba for faster environment solving
- Converted to use `MambaShellProxy`, `CondaShellProxy`, or `PipShellProxy` in Constellab
- Automatic handling of environment YAML files
Example: `conda: "envs/qc.yaml"` → `MambaShellProxy` with env file
4. Docker Container Rules 🧪 (Beta)
- Rules using Docker containers (`container:` directive)
- Rules with Singularity images
- Converted to use Constellab's `DockerService` for container orchestration
- Automatic generation of `docker-compose.yml` files
- Note: Beta feature; complex container configurations may require manual adjustments
Example: `container: "docker://biocontainers/fastqc:0.11.9"` → `DockerService` integration
5. Jupyter Notebook Rules ✅
- Rules that execute Jupyter notebooks (`notebook:` directive)
- Options to convert notebook cells to Python code or execute via nbconvert/papermill
- Preservation of notebook outputs and visualizations
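As a hedged sketch, executing a converted notebook rule via papermill could look like the following (notebook paths and parameter names are illustrative):

```python
import papermill as pm

# Run the notebook with injected parameters; papermill saves an executed
# copy that preserves all cell outputs and visualizations.
pm.execute_notebook(
    "notebooks/analysis.ipynb",           # source notebook (illustrative path)
    "results/analysis_executed.ipynb",    # executed copy with outputs
    parameters={"sample_name": "sample1", "threshold": 0.05},
)
```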
6. Wrapper Rules ✅
- Rules using Snakemake wrappers (`wrapper:` directive)
- AI analyzes wrapper functionality and implements equivalent logic
- Support for common bioinformatics wrappers
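For reference, a typical wrapper rule looks like the snippet below (the wrapper version is illustrative); during conversion, the AI replaces the wrapper reference with equivalent explicit logic in the generated task:

```
rule fastqc:
    input: "data/raw/sample1.fastq"
    output:
        html="results/qc/sample1.html",
        zip="results/qc/sample1_fastqc.zip"
    wrapper:
        "v3.0.0/bio/fastqc"
```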
Current Limitations
Protocol Template Generation 🚧 (Coming Soon)
The automatic generation of Constellab Protocol templates (equivalent to Snakemake workflows) is not yet available but is coming soon. Currently:
- ✅ Individual tasks are generated and immediately usable
- ✅ Tasks can be manually assembled into protocols using the GUI
- ✅ Task connections are documented in conversion output
- 🚧 Automatic protocol template creation is under development
Once available, the AI will generate complete protocol templates that:
- Automatically connect tasks based on Snakemake rule dependencies
- Set default parameters from Snakemake params and config
- Preserve workflow execution order and file dependencies
- Generate reusable templates ready for immediate use
Workaround: After task conversion, manually create protocol templates using the visual editor (see "Build Protocol Templates" section below).
Converting Snakemake to Constellab
Overview
Converting a Snakemake workflow to Constellab involves transforming each Snakemake rule into a Constellab Task. The AI-powered conversion command automates most of this process while preserving the original logic.
Conversion Concept
Each Snakemake rule becomes a Constellab Task: rule inputs and outputs map to task input/output resources, params values become configuration parameters, and the file dependencies between rules become connections between tasks.
Using the AI Conversion Command
The conversion is performed using the AI command /gws-snakemake-to-constellab.
Step 1: Prepare Your Snakemake Workflow
Ensure your Snakemake workflow is accessible in your workspace:
# Example structure
/lab/user/
├── snakemake/
│ ├── Snakefile # Your Snakemake workflow
│ ├── config.yaml # Configuration file (optional)
│ ├── envs/ # Conda environment files
│ │ ├── qc.yaml
│ │ └── analysis.yaml
│ └── scripts/ # External scripts (optional)
│       └── analyze.py
Step 2: Invoke the Conversion Command
In the Constellab AI assistant, use:
/gws-snakemake-to-constellab /lab/user/snakemake/Snakefile
Step 3: Review the Analysis
The AI will analyze your workflow and present:
- Configuration Parameters: All `config` and `params` values with defaults
- Wildcards: Pattern variables used across rules (e.g., `{sample}`, `{replicate}`)
- Rule Inventory: Each rule with its:
  - Inputs and outputs
  - Execution directive type (shell, script, run, etc.)
  - Virtual environment requirements
  - Brief description of functionality
- Workflow Structure: How rules are connected via file dependencies
- Validation Questions:
  - Which rules to convert?
  - Convert bash scripts to Python or keep as shell commands?
  - Handle external scripts inline or keep external?
  - How to handle wildcards (config params vs. batch processing)?
  - Confirm task names
  - Target brick location
Example Analysis Output:
## Snakemake Workflow Analysis
### Configuration
- Config values: min_length = 10, quality_threshold = 20
- Wildcards: {sample} - values inferred from input files
### Rules to Convert
#### 1. quality_control
- **Input**: FASTQ file (`data/raw/{sample}.fastq`)
- **Output**: QC file (`results/qc/{sample}_qc.txt`), Stats file (`results/qc/{sample}_stats.txt`)
- **Params**: quality_threshold = 20
- **Threads**: 2
- **Execution**: `shell:` directive with FastQC command
- **Environment**: conda environment (`envs/qc.yaml`)
- **Description**: Performs quality control on raw sequencing data using FastQC
#### 2. trim_reads
- **Input**: FASTQ file (`data/raw/{sample}.fastq`), QC file (dependency)
- **Output**: Trimmed FASTQ (`data/trimmed/{sample}_trimmed.fastq`)
- **Params**: min_length = 10
- **Execution**: `shell:` directive with cutadapt command
- **Description**: Trims low-quality reads based on minimum length threshold
#### 3. analyze
- **Input**: Trimmed FASTQ file
- **Output**: Analysis results (`results/analysis/{sample}_analysis.txt`)
- **Params**: threshold = 0.05
- **Threads**: 2
- **Execution**: `script:` directive → `scripts/analyze.py`
- **Environment**: conda environment (`envs/analysis.yaml`)
- **Description**: Analyzes trimmed data using external Python script
#### 4. generate_report
- **Input**: QC stats, Analysis results
- **Output**: Report (`results/{sample}_report.txt`)
- **Execution**: `run:` directive with inline Python code
- **Description**: Generates comprehensive sample report combining QC and analysis
### Workflow Structure
quality_control → trim_reads → analyze → generate_report
### Wildcards
- {sample}: Processes multiple samples (inferred: sample1, sample2, sample3)
- **Conversion Strategy Options**:
- Option A: Create config parameter for sample name (user specifies sample)
- Option B: Process multiple samples in a single task (batch processing)
---
**Questions for Validation:**
1. Should I convert all 4 rules as tasks? (Yes/No or specify which ones)
2. For `shell:` directives, should I convert bash scripts to Python code? (Recommended: Yes for better maintainability)
3. For `script:` directives, should I inline the Python script or keep it external? (Recommended: Inline for single-file tasks)
4. How should I handle wildcards?
- Option A: Config parameter (user specifies sample name at runtime)
- Option B: Batch processing (process all samples in one task execution)
5. Confirm task names: `QualityControl`, `TrimReads`, `Analyze`, `GenerateReport`?
6. Where should I create these tasks? (brick name or create new brick, e.g., "gws_genomics")
Please confirm before I proceed with implementation.
Step 4: Confirm Conversion
Answer the validation questions:
Yes, convert all 4 rules.
Convert bash to Python where possible.
Inline the Python script for the analyze rule.
Use config parameter for sample name (Option A).
Task names are good.
Create in a new brick called "gws_genomics".
Step 5: Automatic Task Generation
The AI will:
- Generate Constellab Task classes for each rule
- Convert Snakemake execution logic to Python
- Map inputs/outputs to Constellab resources (File, Folder, etc.)
- Create configuration parameters from Snakemake params and wildcards
- Handle virtual environments (conda, container)
- Add comprehensive documentation
- Generate unit tests
Each generated task will be a standalone Python file in your specified brick:
bricks/
└── gws_genomics/
└── src/
└── gws_genomics/
├── quality_control.py
├── trim_reads.py
├── analyze.py
├── generate_report.py
└── ...
After Conversion: Using Your Tasks
1. Tasks are Available in the System
Once converted, your tasks are immediately available in Constellab:
- Navigate to the Task Library in the web interface
- Search for your newly created tasks (e.g., "QualityControl", "TrimReads")
- View task documentation, inputs, outputs, and parameters
2. Create Scenarios
You can now run individual tasks as scenarios:
- Go to Scenarios → Create New Scenario
- Select a task from the library
- Configure inputs and parameters
- Execute and monitor the scenario
- View results and execution traces
3. Build Protocol Templates (Recommended)
The Protocol Template is Constellab's equivalent to a Snakemake workflow. It allows you to:
- Chain tasks together in a logical sequence
- Define data flow between tasks
- Set default configurations for reproducible workflows
- Share and reuse complete pipelines
Creating a Protocol Template
Option A: Visual Protocol Editor (GUI)
- Go to Protocols → Create New Protocol
- Drag and drop tasks from the library onto the canvas
- Connect task outputs to inputs by drawing links
- Configure default parameters for each task
- Add annotations and documentation
- Save as a template
Example Protocol Structure for the Genomics Workflow:
[QualityControl] → [TrimReads] → [Analyze] → [GenerateReport]
↓ ↓ ↓ ↓
qc_file trimmed_file analysis_file report
stats_file
Benefits of Protocol Templates
- Reusability: Apply the same workflow to different datasets
- Consistency: Ensure standardized processing across experiments
- Collaboration: Share templates with team members
- Versioning: Track changes to workflow structure over time
- Parameterization: Create flexible templates with adjustable parameters
4. Execute Complete Workflows
Once your protocol template is created:
- Create a Scenario from Template:
  - Select your protocol template
  - Provide input data
  - Adjust parameters if needed
  - Run the entire pipeline
- Monitor Execution:
  - Real-time progress tracking
  - View logs for each task
  - Inspect intermediate results
- Access Results:
  - Download output files
  - Visualize data with built-in views
  - Export results to external systems
- Trace Execution:
  - Full audit trail of what ran, when, and with what parameters
  - Reproduce results by re-running with identical settings
  - Compare different runs side-by-side
Detailed Conversion Examples
Example 1: Shell Command Rule
Original Snakemake Rule:
rule quality_control:
input:
"data/raw/{sample}.fastq"
output:
qc="results/qc/{sample}_qc.txt",
stats="results/qc/{sample}_stats.txt"
params:
quality_threshold=20
threads: 2
conda: "envs/qc.yaml"
shell:
"""
fastqc {input} -o results/qc -t {threads}
echo "Quality: {params.quality_threshold}" > {output.stats}
"""Converted Constellab Task:
from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
from gws_core import File, StrParam, IntParam, MambaShellProxy
import os
@task_decorator(
unique_name="QualityControl",
human_name="Quality Control",
short_description="Perform quality control on sequencing data"
)
class QualityControl(Task):
"""
[Generated by Snakemake to Task Converter]
Converted from Snakemake rule: quality_control
## Description
Performs quality control analysis on raw FASTQ files using FastQC.
Generates QC reports and quality statistics.
"""
input_specs = InputSpecs({
'fastq_file': InputSpec(
File,
human_name="FASTQ file",
short_description="Raw sequencing data file"
)
})
output_specs = OutputSpecs({
'qc_file': OutputSpec(
File,
human_name="QC file",
short_description="Quality control results"
),
'stats_file': OutputSpec(
File,
human_name="Stats file",
short_description="Quality statistics"
)
})
config_specs = ConfigSpecs({
'sample_name': StrParam(
human_name="Sample name",
short_description="Name of the sample (from wildcard {sample})",
default_value="sample1"
),
'quality_threshold': IntParam(
human_name="Quality threshold",
short_description="Minimum quality score threshold",
default_value=20
),
'threads': IntParam(
human_name="Threads",
short_description="Number of CPU threads to use",
default_value=2,
min_value=1
)
})
def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
# Get inputs
fastq_file = inputs['fastq_file']
# Get parameters
sample_name = params.get_value('sample_name')
quality_threshold = params.get_value('quality_threshold')
threads = params.get_value('threads')
# Create output paths
tmp_dir = self.create_tmp_dir()
qc_path = os.path.join(tmp_dir, f"{sample_name}_qc.txt")
stats_path = os.path.join(tmp_dir, f"{sample_name}_stats.txt")
# Get conda environment file path
env_file_path = os.path.join(os.path.dirname(__file__), "qc_env.yaml")
# Execute with conda environment
mamba = MambaShellProxy(
env_file_path=env_file_path,
env_name="qc_env",
message_dispatcher=self.message_dispatcher
)
self.log_info_message(f"Running quality control for {sample_name} with {threads} threads...")
# Run FastQC
mamba.run(f"fastqc {fastq_file.path} -o {tmp_dir} -t {threads}")
# Generate stats file
with open(stats_path, 'w') as f:
f.write(f"Quality: {quality_threshold}\n")
self.log_success_message("Quality control completed successfully")
# Return outputs
return {
'qc_file': File(qc_path),
'stats_file': File(stats_path)
}
Example 2: Python Script Rule
Original Snakemake Rule:
rule analyze:
input:
"data/trimmed/{sample}_trimmed.fastq"
output:
"results/analysis/{sample}_analysis.txt"
params:
threshold=0.05
threads: 2
conda: "envs/analysis.yaml"
script:
"scripts/analyze.py"Converted Constellab Task (with inlined script):
from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
from gws_core import File, StrParam, FloatParam, IntParam
import os
@task_decorator(
unique_name="Analyze",
human_name="Analyze",
short_description="Analyze trimmed sequencing data"
)
class Analyze(Task):
"""
[Generated by Snakemake to Task Converter]
Converted from Snakemake rule: analyze
## Description
Analyzes trimmed sequencing data using custom analysis logic.
The original external script has been inlined into this task.
"""
input_specs = InputSpecs({
'trimmed_file': InputSpec(
File,
human_name="Trimmed FASTQ",
short_description="Trimmed sequencing data"
)
})
output_specs = OutputSpecs({
'analysis_file': OutputSpec(
File,
human_name="Analysis results",
short_description="Analysis output file"
)
})
config_specs = ConfigSpecs({
'sample_name': StrParam(
human_name="Sample name",
short_description="Name of the sample",
default_value="sample1"
),
'threshold': FloatParam(
human_name="Threshold",
short_description="Analysis threshold value",
default_value=0.05
),
'threads': IntParam(
human_name="Threads",
short_description="Number of CPU threads",
default_value=2,
min_value=1
)
})
def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
# Get inputs
trimmed_file = inputs['trimmed_file']
# Get parameters
sample_name = params.get_value('sample_name')
threshold = params.get_value('threshold')
threads = params.get_value('threads')
# Create output path
tmp_dir = self.create_tmp_dir()
output_path = os.path.join(tmp_dir, f"{sample_name}_analysis.txt")
# Inlined script logic from scripts/analyze.py
self.log_info_message(f"Analyzing {sample_name} with threshold {threshold}...")
with open(trimmed_file.path, 'r') as f:
data = f.read()
# Perform analysis (original script logic)
result = f"Analysis Results for {sample_name}\n"
result += f"{'=' * 40}\n"
result += f"Threshold: {threshold}\n"
result += f"Threads used: {threads}\n"
result += f"\nInput data preview:\n{data[:200]}\n"
# Write results
with open(output_path, 'w') as f:
f.write(result)
self.log_success_message("Analysis completed")
return {'analysis_file': File(output_path)}
Example 3: Python Run Block Rule
Original Snakemake Rule:
rule generate_report:
input:
qc="results/qc/{sample}_stats.txt",
analysis="results/analysis/{sample}_analysis.txt"
output:
"results/{sample}_report.txt"
run:
with open(output[0], 'w') as f_out:
f_out.write(f"Report for {wildcards.sample}\n")
f_out.write("=" * 40 + "\n\n")
with open(input.qc, 'r') as f:
f_out.write("QC Statistics:\n")
f_out.write(f.read() + "\n\n")
with open(input.analysis, 'r') as f:
f_out.write("Analysis Results:\n")
f_out.write(f.read())
Converted Constellab Task:
from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
from gws_core import File, StrParam
import os
@task_decorator(
unique_name="GenerateReport",
human_name="Generate Report",
short_description="Generate analysis report from QC and analysis results"
)
class GenerateReport(Task):
"""
[Generated by Snakemake to Task Converter]
Converted from Snakemake rule: generate_report
## Description
Combines QC statistics and analysis results into a comprehensive report.
"""
input_specs = InputSpecs({
'qc_stats': InputSpec(
File,
human_name="QC statistics",
short_description="Quality control statistics file"
),
'analysis_results': InputSpec(
File,
human_name="Analysis results",
short_description="Analysis output file"
)
})
output_specs = OutputSpecs({
'report': OutputSpec(
File,
human_name="Report",
short_description="Combined analysis report"
)
})
config_specs = ConfigSpecs({
'sample_name': StrParam(
human_name="Sample name",
short_description="Name of the sample",
default_value="sample1"
)
})
def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
# Get inputs
qc_stats = inputs['qc_stats']
analysis_results = inputs['analysis_results']
# Get parameters
sample_name = params.get_value('sample_name')
# Create output path
tmp_dir = self.create_tmp_dir()
output_path = os.path.join(tmp_dir, f"{sample_name}_report.txt")
# Logic from run: block (directly converted)
with open(output_path, 'w') as f_out:
f_out.write(f"Report for {sample_name}\n")
f_out.write("=" * 40 + "\n\n")
with open(qc_stats.path, 'r') as f:
f_out.write("QC Statistics:\n")
f_out.write(f.read() + "\n\n")
with open(analysis_results.path, 'r') as f:
f_out.write("Analysis Results:\n")
f_out.write(f.read())
self.log_success_message(f"Report generated for {sample_name}")
return {'report': File(output_path)}
Handling Special Cases
Wildcards and Batch Processing
Snakemake uses wildcards to process multiple samples. Constellab offers two strategies:
Strategy A: Config Parameter (Single Sample)
- Each task execution processes one sample
- User specifies sample name as a config parameter
- Multiple samples require multiple task executions or protocol runs
Strategy B: Batch Processing (Multiple Samples)
- Task processes all samples in a single execution
- Use `ListParam` for sample names
- Loop through samples within the task
Example (Batch Processing):
from gws_core import ListParam
config_specs = ConfigSpecs({
'sample_names': ListParam(
human_name="Sample names",
short_description="List of sample names to process",
default_value=["sample1", "sample2", "sample3"]
)
})
def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
    sample_names = params.get_value('sample_names')
    outputs = {}
    for sample in sample_names:
        self.log_info_message(f"Processing {sample}...")
        # Process each sample and add its result to `outputs`
        # ...
    return outputs
Virtual Environments
Snakemake's `conda:` directive is converted to `MambaShellProxy`:
Original:
rule align:
conda: "envs/alignment.yaml"
shell: "bwa mem ..."Converted:
env_file_path = os.path.join(os.path.dirname(__file__), "alignment_env.yaml")
mamba = MambaShellProxy(
env_file_path=env_file_path,
env_name="alignment_env",
message_dispatcher=self.message_dispatcher
)
mamba.run("bwa mem ...")Docker Containers (Beta)
Snakemake's `container:` directive is converted to Constellab's `DockerService`. This is a beta feature; complex container configurations may require manual adjustments after conversion.
Summary
Converting from Snakemake to Constellab offers:
- ✅ Automated conversion with AI assistance
- ✅ Preserved logic from original Snakemake rules
- ✅ Enhanced traceability and audit trails
- ✅ User-friendly interface for non-programmers
- ✅ Complete platform for data and pipeline management
- ✅ Flexible deployment in any environment
- ✅ Reusable protocol templates for standardized workflows
- ✅ Better maintainability through Python-based tasks
Start your conversion today with /gws-snakemake-to-constellab and experience the power of a complete data lab automation platform!