Converting Snakemake Workflows to Constellab

Text editor image

Converting Snakemake Workflows to Constellab


This guide explains how to migrate your Snakemake pipelines to the Constellab platform using AI-assisted conversion.


Simple Conversion Powered by AI


Converting Snakemake pipelines to Constellab is remarkably simple thanks to AI automation. The AI assistant analyzes your Snakemake workflow, understands the rule logic, and automatically generates equivalent Constellab tasks with minimal user intervention. You don't need to manually rewrite code or understand the intricacies of both systems—the AI handles the heavy lifting for you.


Key benefits:


  • 🤖 Fully automated analysis of your Snakemake workflow structure
    • ⚡ Instant task generation with proper inputs, outputs, and configurations
      • 🎯 Smart script conversion from bash to Python for better maintainability
        • 📝 Automatic documentation generation for all converted tasks
          • ✅ Built-in validation to ensure conversion accuracy

            Simply point the AI to your Snakefile, answer a few validation questions, and let it create production-ready Constellab tasks in seconds.


            What is Snakemake?


            Snakemake is a workflow management system that enables reproducible and scalable data analyses. It uses a Python-based language to define computational pipelines composed of rules connected via input/output file dependencies.


            Key features:


            • Rule-based workflows: Define independent computational steps (rules) that are connected via file dependencies
              • Reproducibility: Environment management through Conda/Mamba integration ensures consistent results
                • Scalability: Execute workflows in parallel across local machines, HPC clusters, or cloud platforms
                  • Python-based DSL: Write workflows using a familiar Python-based domain-specific language
                    • Automatic parallelization: Snakemake automatically determines which rules can run in parallel based on dependencies

                      Snakemake has become a standard tool in bioinformatics and scientific computing, particularly for genomics and data analysis workflows.


                      Snakemake vs. Constellab: A Comparison


                      Both Snakemake and Constellab are open-source workflow management systems, but they serve different needs:



                      Key Advantages of Constellab


                      Constellab's main strength is being a complete platform, not just a workflow engine:


                      1. Unified Interface: Manage pipelines, data, experiments, and results in one place
                        1. Full Traceability: Every execution, parameter change, and data transformation is tracked and auditable
                          1. Data-Centric: First-class data management with visualization, annotation, and sharing capabilities
                            1. Collaboration: Teams can share protocols, datasets, and results seamlessly
                              1. Extensibility: Modular "brick" architecture allows easy addition of new capabilities
                                1. No-Code Options: Build and run workflows through the GUI without writing code
                                  1. Hybrid Approach: Supports both GUI-based and code-based workflow development

                                    What is Supported in the Conversion?


                                    The AI-powered conversion supports a wide range of Snakemake workflow features:


                                    Workflow Structures


                                    • ✅ Standard Rules: Single-level rule definitions
                                      • ✅ Complex Workflows: Multi-step pipelines with file dependencies
                                        • ✅ Wildcard Patterns: Handling of sample patterns and batch processing

                                          Rule Execution Types


                                          1. Shell Command Rules ✅


                                          • Shell/Bash scripts embedded in shell: blocks
                                            • Option to convert to Python: AI can translate bash commands to Python code for:Better maintainability and code readabilityEasier debugging with proper error messagesCross-platform compatibilityIntegration with Python data science libraries
                                              • Better maintainability and code readability
                                                • Easier debugging with proper error messages
                                                  • Cross-platform compatibility
                                                    • Integration with Python data science libraries

                                                    Examplefastqc {input} -o {output} → Python with subprocess or BioPython


                                                    2. Python Script Rules ✅


                                                    • Rules that execute external Python scripts (script: directive)
                                                      • Rules with inline Python code blocks (run: directive)
                                                        • Automatic integration of Python logic into Constellab task run() methods
                                                          • Preservation of Python dependencies and imports

                                                            3. Virtual Environment Rules ✅


                                                            • Rules using Conda environments (conda: directive)
                                                              • Rules using Bioconda packages (common in bioinformatics)
                                                                • Rules with Mamba for faster environment solving
                                                                  • Converted to use MambaShellProxyCondaShellProxy, or PipShellProxy in Constellab
                                                                    • Automatic handling of environment YAML files

                                                                      Exampleconda: "envs/qc.yaml" → MambaShellProxy with env file


                                                                      4. Docker Container Rules 🧪 (Beta)


                                                                      • Rules using Docker containers (container: directive)
                                                                        • Rules with Singularity images
                                                                          • Converted to use Constellab's DockerService for container orchestration
                                                                            • Automatic generation of docker-compose.yml files
                                                                              • Note: Beta feature—complex container configurations may require manual adjustments

                                                                                Examplecontainer: "docker://biocontainers/fastqc:0.11.9" → DockerService integration


                                                                                5. Jupyter Notebook Rules ✅


                                                                                • Rules that execute Jupyter notebooks (notebook: directive)
                                                                                  • Options to convert notebook cells to Python code or execute via nbconvert/papermill
                                                                                    • Preservation of notebook outputs and visualizations

                                                                                      6. Wrapper Rules ✅


                                                                                      • Rules using Snakemake wrappers (wrapper: directive)
                                                                                        • AI analyzes wrapper functionality and implements equivalent logic
                                                                                          • Support for common bioinformatics wrappers

                                                                                            Current Limitations


                                                                                            Protocol Template Generation 🚧 (Coming Soon)


                                                                                            The automatic generation of Constellab Protocol templates (equivalent to Snakemake workflows) is not yet available but is coming soon. Currently:


                                                                                            • ✅ Individual tasks are generated and immediately usable
                                                                                              • ✅ Tasks can be manually assembled into protocols using the GUI
                                                                                                • ✅ Task connections are documented in conversion output
                                                                                                  • 🚧 Automatic protocol template creation is under development

                                                                                                    Once available, the AI will generate complete protocol templates that:


                                                                                                    • Automatically connect tasks based on Snakemake rule dependencies
                                                                                                      • Set default parameters from Snakemake params and config
                                                                                                        • Preserve workflow execution order and file dependencies
                                                                                                          • Generate reusable templates ready for immediate use

                                                                                                            Workaround: After task conversion, manually create protocol templates using the visual editor (see "Build Protocol Templates" section below).


                                                                                                            What Gets Converted



                                                                                                            Converting Snakemake to Constellab


                                                                                                            Overview


                                                                                                            Converting a Snakemake workflow to Constellab involves transforming each Snakemake rule into a Constellab Task. The AI-powered conversion command automates most of this process while preserving the original logic.


                                                                                                            Conversion Concept



                                                                                                            Using the AI Conversion Command


                                                                                                            The conversion is performed using the AI command /gws-snakemake-to-constellab.


                                                                                                            Step 1: Prepare Your Snakemake Workflow


                                                                                                            Ensure your Snakemake workflow is accessible in your workspace:


                                                                                                            # Example structure
                                                                                                            /lab/user/
                                                                                                            ├── snakemake/
                                                                                                            │   ├── Snakefile             # Your Snakemake workflow
                                                                                                            │   ├── config.yaml           # Configuration file (optional)
                                                                                                            │   ├── envs/                 # Conda environment files
                                                                                                            │   │   ├── qc.yaml
                                                                                                            │   │   └── analysis.yaml
                                                                                                            │   └── scripts/              # External scripts (optional)
                                                                                                            │       └── analyze.py

                                                                                                            Step 2: Invoke the Conversion Command


                                                                                                            In the Constellab AI assistant, use:


                                                                                                            /gws-snakemake-to-constellab /lab/user/snakemake/Snakefile

                                                                                                            Step 3: Review the Analysis


                                                                                                            The AI will analyze your workflow and present:


                                                                                                            1. Configuration Parameters: All config and params values with defaults
                                                                                                              1. Wildcards: Pattern variables used across rules (e.g., {sample}{replicate})
                                                                                                                1. Rule Inventory: Each rule with its:Inputs and outputsExecution directive type (shell, script, run, etc.)Virtual environment requirementsBrief description of functionality
                                                                                                                  1. Workflow Structure: How rules are connected via file dependencies
                                                                                                                    1. Validation Questions:Which rules to convert?Convert bash scripts to Python or keep as shell commands?Handle external scripts inline or keep external?How to handle wildcards (config params vs batch processing)?Confirm task namesTarget brick location

                                                                                                                      Example Analysis Output:


                                                                                                                      ## Snakemake Workflow Analysis
                                                                                                                      
                                                                                                                      ### Configuration
                                                                                                                      - Config values: min_length = 10, quality_threshold = 20
                                                                                                                      - Wildcards: {sample} - values inferred from input files
                                                                                                                      
                                                                                                                      ### Rules to Convert
                                                                                                                      
                                                                                                                      #### 1. quality_control
                                                                                                                      - **Input**: FASTQ file (`data/raw/{sample}.fastq`)
                                                                                                                      - **Output**: QC file (`results/qc/{sample}_qc.txt`), Stats file (`results/qc/{sample}_stats.txt`)
                                                                                                                      - **Params**: quality_threshold = 20
                                                                                                                      - **Threads**: 2
                                                                                                                      - **Execution**: `shell:` directive with FastQC command
                                                                                                                      - **Environment**: conda environment (`envs/qc.yaml`)
                                                                                                                      - **Description**: Performs quality control on raw sequencing data using FastQC
                                                                                                                      
                                                                                                                      #### 2. trim_reads
                                                                                                                      - **Input**: FASTQ file (`data/raw/{sample}.fastq`), QC file (dependency)
                                                                                                                      - **Output**: Trimmed FASTQ (`data/trimmed/{sample}_trimmed.fastq`)
                                                                                                                      - **Params**: min_length = 10
                                                                                                                      - **Execution**: `shell:` directive with cutadapt command
                                                                                                                      - **Description**: Trims low-quality reads based on minimum length threshold
                                                                                                                      
                                                                                                                      #### 3. analyze
                                                                                                                      - **Input**: Trimmed FASTQ file
                                                                                                                      - **Output**: Analysis results (`results/analysis/{sample}_analysis.txt`)
                                                                                                                      - **Params**: threshold = 0.05
                                                                                                                      - **Threads**: 2
                                                                                                                      - **Execution**: `script:` directive → `scripts/analyze.py`
                                                                                                                      - **Environment**: conda environment (`envs/analysis.yaml`)
                                                                                                                      - **Description**: Analyzes trimmed data using external Python script
                                                                                                                      
                                                                                                                      #### 4. generate_report
                                                                                                                      - **Input**: QC stats, Analysis results
                                                                                                                      - **Output**: Report (`results/{sample}_report.txt`)
                                                                                                                      - **Execution**: `run:` directive with inline Python code
                                                                                                                      - **Description**: Generates comprehensive sample report combining QC and analysis
                                                                                                                      
                                                                                                                      ### Workflow Structure
                                                                                                                      quality_control → trim_reads → analyze → generate_report
                                                                                                                      
                                                                                                                      ### Wildcards
                                                                                                                      - {sample}: Processes multiple samples (inferred: sample1, sample2, sample3)
                                                                                                                        - **Conversion Strategy Options**:
                                                                                                                          - Option A: Create config parameter for sample name (user specifies sample)
                                                                                                                          - Option B: Process multiple samples in a single task (batch processing)
                                                                                                                      
                                                                                                                      ---
                                                                                                                      
                                                                                                                      **Questions for Validation:**
                                                                                                                      1. Should I convert all 4 rules as tasks? (Yes/No or specify which ones)
                                                                                                                      2. For `shell:` directives, should I convert bash scripts to Python code? (Recommended: Yes for better maintainability)
                                                                                                                      3. For `script:` directives, should I inline the Python script or keep it external? (Recommended: Inline for single-file tasks)
                                                                                                                      4. How should I handle wildcards?
                                                                                                                         - Option A: Config parameter (user specifies sample name at runtime)
                                                                                                                         - Option B: Batch processing (process all samples in one task execution)
                                                                                                                      5. Confirm task names: `QualityControl`, `TrimReads`, `Analyze`, `GenerateReport`?
                                                                                                                      6. Where should I create these tasks? (brick name or create new brick, e.g., "gws_genomics")
                                                                                                                      
                                                                                                                      Please confirm before I proceed with implementation.

                                                                                                                      Step 4: Confirm Conversion


                                                                                                                      Answer the validation questions:


                                                                                                                      Yes, convert all 4 rules.
                                                                                                                      Convert bash to Python where possible.
                                                                                                                      Inline the Python script for the analyze rule.
                                                                                                                      Use config parameter for sample name (Option A).
                                                                                                                      Task names are good.
                                                                                                                      Create in a new brick called "gws_genomics".

                                                                                                                      Step 5: Automatic Task Generation


                                                                                                                      The AI will:


                                                                                                                      1. Generate Constellab Task classes for each rule
                                                                                                                        1. Convert Snakemake execution logic to Python
                                                                                                                          1. Map inputs/outputs to Constellab resources (File, Folder, etc.)
                                                                                                                            1. Create configuration parameters from Snakemake params and wildcards
                                                                                                                              1. Handle virtual environments (conda, container)
                                                                                                                                1. Add comprehensive documentation
                                                                                                                                  1. Generate unit tests

                                                                                                                                    Each generated task will be a standalone Python file in your specified brick:


                                                                                                                                    bricks/
                                                                                                                                    └── gws_genomics/
                                                                                                                                        └── src/
                                                                                                                                            └── gws_genomics/
                                                                                                                                                ├── quality_control.py
                                                                                                                                                ├── trim_reads.py
                                                                                                                                                ├── analyze.py
                                                                                                                                                ├── generate_report.py
                                                                                                                                                └── ...

                                                                                                                                    After Conversion: Using Your Tasks


                                                                                                                                    1. Tasks are Available in the System


                                                                                                                                    Once converted, your tasks are immediately available in Constellab:


                                                                                                                                    • Navigate to the Task Library in the web interface
                                                                                                                                      • Search for your newly created tasks (e.g., "QualityControl", "TrimReads")
                                                                                                                                        • View task documentation, inputs, outputs, and parameters

                                                                                                                                          2. Create Scenarios


                                                                                                                                          You can now run individual tasks as scenarios:


                                                                                                                                          1. Go to Scenarios → Create New Scenario
                                                                                                                                            1. Select a task from the library
                                                                                                                                              1. Configure inputs and parameters
                                                                                                                                                1. Execute and monitor the scenario
                                                                                                                                                  1. View results and execution traces

                                                                                                                                                    3. Build Protocol Templates (Recommended)


                                                                                                                                                    The Protocol Template is Constellab's equivalent to a Snakemake workflow. It allows you to:


                                                                                                                                                    • Chain tasks together in a logical sequence
                                                                                                                                                      • Define data flow between tasks
                                                                                                                                                        • Set default configurations for reproducible workflows
                                                                                                                                                          • Share and reuse complete pipelines

                                                                                                                                                            Creating a Protocol Template


                                                                                                                                                            Option A: Visual Protocol Editor (GUI)


                                                                                                                                                            1. Go to Protocols → Create New Protocol
                                                                                                                                                              1. Drag and drop tasks from the library onto the canvas
                                                                                                                                                                1. Connect task outputs to inputs by drawing links
                                                                                                                                                                  1. Configure default parameters for each task
                                                                                                                                                                    1. Add annotations and documentation
                                                                                                                                                                      1. Save as a template

                                                                                                                                                                        Example Protocol Structure for the Genomics Workflow:


                                                                                                                                                                        [QualityControl] → [TrimReads] → [Analyze] → [GenerateReport]
                                                                                                                                                                             ↓                  ↓             ↓              ↓
                                                                                                                                                                          qc_file          trimmed_file  analysis_file   report
                                                                                                                                                                          stats_file

                                                                                                                                                                        Benefits of Protocol Templates


                                                                                                                                                                        • Reusability: Apply the same workflow to different datasets
                                                                                                                                                                          • Consistency: Ensure standardized processing across experiments
                                                                                                                                                                            • Collaboration: Share templates with team members
                                                                                                                                                                              • Versioning: Track changes to workflow structure over time
                                                                                                                                                                                • Parameterization: Create flexible templates with adjustable parameters

                                                                                                                                                                                  4. Execute Complete Workflows


                                                                                                                                                                                  Once your protocol template is created:


                                                                                                                                                                                  1. Create a Scenario from Template:Select your protocol templateProvide input dataAdjust parameters if neededRun the entire pipeline
                                                                                                                                                                                    1. Monitor Execution:Real-time progress trackingView logs for each taskInspect intermediate results
                                                                                                                                                                                      1. Access Results:Download output filesVisualize data with built-in viewsExport results to external systems
                                                                                                                                                                                        1. Trace Execution:Full audit trail of what ran, when, and with what parametersReproduce results by re-running with identical settingsCompare different runs side-by-side

                                                                                                                                                                                          Detailed Conversion Examples


                                                                                                                                                                                          Example 1: Shell Command Rule


                                                                                                                                                                                          Original Snakemake Rule:


                                                                                                                                                                                          rule quality_control:
                                                                                                                                                                                              input:
                                                                                                                                                                                                  "data/raw/{sample}.fastq"
                                                                                                                                                                                              output:
                                                                                                                                                                                                  qc="results/qc/{sample}_qc.txt",
                                                                                                                                                                                                  stats="results/qc/{sample}_stats.txt"
                                                                                                                                                                                              params:
                                                                                                                                                                                                  quality_threshold=20
                                                                                                                                                                                              threads: 2
                                                                                                                                                                                              conda: "envs/qc.yaml"
                                                                                                                                                                                              shell:
                                                                                                                                                                                                  """
                                                                                                                                                                                                  fastqc {input} -o results/qc -t {threads}
                                                                                                                                                                                                  echo "Quality: {params.quality_threshold}" > {output.stats}
                                                                                                                                                                                                  """

                                                                                                                                                                                          Converted Constellab Task:


                                                                                                                                                                                          from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
                                                                                                                                                                                          from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
                                                                                                                                                                                          from gws_core import File, StrParam, IntParam, MambaShellProxy
                                                                                                                                                                                          import os
                                                                                                                                                                                          
                                                                                                                                                                                          @task_decorator(
                                                                                                                                                                                              unique_name="QualityControl",
                                                                                                                                                                                              human_name="Quality Control",
                                                                                                                                                                                              short_description="Perform quality control on sequencing data"
                                                                                                                                                                                          )
                                                                                                                                                                                          class QualityControl(Task):
                                                                                                                                                                                              """
                                                                                                                                                                                              [Generated by Snakemake to Task Converter]
                                                                                                                                                                                          
                                                                                                                                                                                              Converted from Snakemake rule: quality_control
                                                                                                                                                                                          
                                                                                                                                                                                              ## Description
                                                                                                                                                                                              Performs quality control analysis on raw FASTQ files using FastQC.
                                                                                                                                                                                              Generates QC reports and quality statistics.
                                                                                                                                                                                              """
                                                                                                                                                                                          
                                                                                                                                                                                              input_specs = InputSpecs({
                                                                                                                                                                                                  'fastq_file': InputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="FASTQ file",
                                                                                                                                                                                                      short_description="Raw sequencing data file"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              output_specs = OutputSpecs({
                                                                                                                                                                                                  'qc_file': OutputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="QC file",
                                                                                                                                                                                                      short_description="Quality control results"
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'stats_file': OutputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="Stats file",
                                                                                                                                                                                                      short_description="Quality statistics"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              config_specs = ConfigSpecs({
                                                                                                                                                                                                  'sample_name': StrParam(
                                                                                                                                                                                                      human_name="Sample name",
                                                                                                                                                                                                      short_description="Name of the sample (from wildcard {sample})",
                                                                                                                                                                                                      default_value="sample1"
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'quality_threshold': IntParam(
                                                                                                                                                                                                      human_name="Quality threshold",
                                                                                                                                                                                                      short_description="Minimum quality score threshold",
                                                                                                                                                                                                      default_value=20
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'threads': IntParam(
                                                                                                                                                                                                      human_name="Threads",
                                                                                                                                                                                                      short_description="Number of CPU threads to use",
                                                                                                                                                                                                      default_value=2,
                                                                                                                                                                                                      min_value=1
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
                                                                                                                                                                                                  # Get inputs
                                                                                                                                                                                                  fastq_file = inputs['fastq_file']
                                                                                                                                                                                          
                                                                                                                                                                                                  # Get parameters
                                                                                                                                                                                                  sample_name = params.get_value('sample_name')
                                                                                                                                                                                                  quality_threshold = params.get_value('quality_threshold')
                                                                                                                                                                                                  threads = params.get_value('threads')
                                                                                                                                                                                          
                                                                                                                                                                                                  # Create output paths
                                                                                                                                                                                                  tmp_dir = self.create_tmp_dir()
                                                                                                                                                                                                  qc_path = os.path.join(tmp_dir, f"{sample_name}_qc.txt")
                                                                                                                                                                                                  stats_path = os.path.join(tmp_dir, f"{sample_name}_stats.txt")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Get conda environment file path
                                                                                                                                                                                                  env_file_path = os.path.join(os.path.dirname(__file__), "qc_env.yaml")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Execute with conda environment
                                                                                                                                                                                                  mamba = MambaShellProxy(
                                                                                                                                                                                                      env_file_path=env_file_path,
                                                                                                                                                                                                      env_name="qc_env",
                                                                                                                                                                                                      message_dispatcher=self.message_dispatcher
                                                                                                                                                                                                  )
                                                                                                                                                                                          
                                                                                                                                                                                                  self.log_info_message(f"Running quality control for {sample_name} with {threads} threads...")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Run FastQC
                                                                                                                                                                                                  mamba.run(f"fastqc {fastq_file.path} -o {tmp_dir} -t {threads}")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Generate stats file
                                                                                                                                                                                                  with open(stats_path, 'w') as f:
                                                                                                                                                                                                      f.write(f"Quality: {quality_threshold}\n")
                                                                                                                                                                                          
                                                                                                                                                                                                  self.log_success_message("Quality control completed successfully")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Return outputs
                                                                                                                                                                                                  return {
                                                                                                                                                                                                      'qc_file': File(qc_path),
                                                                                                                                                                                                      'stats_file': File(stats_path)
                                                                                                                                                                                                  }

                                                                                                                                                                                          Example 2: Python Script Rule


                                                                                                                                                                                          Original Snakemake Rule:


                                                                                                                                                                                          rule analyze:
                                                                                                                                                                                              input:
                                                                                                                                                                                                  "data/trimmed/{sample}_trimmed.fastq"
                                                                                                                                                                                              output:
                                                                                                                                                                                                  "results/analysis/{sample}_analysis.txt"
                                                                                                                                                                                              params:
                                                                                                                                                                                                  threshold=0.05
                                                                                                                                                                                              threads: 2
                                                                                                                                                                                              conda: "envs/analysis.yaml"
                                                                                                                                                                                              script:
                                                                                                                                                                                                  "scripts/analyze.py"

                                                                                                                                                                                          Converted Constellab Task (with inlined script):


                                                                                                                                                                                          from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
                                                                                                                                                                                          from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
                                                                                                                                                                                          from gws_core import File, StrParam, FloatParam, IntParam
                                                                                                                                                                                          import os
                                                                                                                                                                                          
                                                                                                                                                                                          @task_decorator(
                                                                                                                                                                                              unique_name="Analyze",
                                                                                                                                                                                              human_name="Analyze",
                                                                                                                                                                                              short_description="Analyze trimmed sequencing data"
                                                                                                                                                                                          )
                                                                                                                                                                                          class Analyze(Task):
                                                                                                                                                                                              """
                                                                                                                                                                                              [Generated by Snakemake to Task Converter]
                                                                                                                                                                                          
                                                                                                                                                                                              Converted from Snakemake rule: analyze
                                                                                                                                                                                          
                                                                                                                                                                                              ## Description
                                                                                                                                                                                              Analyzes trimmed sequencing data using custom analysis logic.
                                                                                                                                                                                              The original external script has been inlined into this task.
                                                                                                                                                                                              """
                                                                                                                                                                                          
                                                                                                                                                                                              input_specs = InputSpecs({
                                                                                                                                                                                                  'trimmed_file': InputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="Trimmed FASTQ",
                                                                                                                                                                                                      short_description="Trimmed sequencing data"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              output_specs = OutputSpecs({
                                                                                                                                                                                                  'analysis_file': OutputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="Analysis results",
                                                                                                                                                                                                      short_description="Analysis output file"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              config_specs = ConfigSpecs({
                                                                                                                                                                                                  'sample_name': StrParam(
                                                                                                                                                                                                      human_name="Sample name",
                                                                                                                                                                                                      short_description="Name of the sample",
                                                                                                                                                                                                      default_value="sample1"
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'threshold': FloatParam(
                                                                                                                                                                                                      human_name="Threshold",
                                                                                                                                                                                                      short_description="Analysis threshold value",
                                                                                                                                                                                                      default_value=0.05
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'threads': IntParam(
                                                                                                                                                                                                      human_name="Threads",
                                                                                                                                                                                                      short_description="Number of CPU threads",
                                                                                                                                                                                                      default_value=2,
                                                                                                                                                                                                      min_value=1
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
                                                                                                                                                                                                  # Get inputs
                                                                                                                                                                                                  trimmed_file = inputs['trimmed_file']
                                                                                                                                                                                          
                                                                                                                                                                                                  # Get parameters
                                                                                                                                                                                                  sample_name = params.get_value('sample_name')
                                                                                                                                                                                                  threshold = params.get_value('threshold')
                                                                                                                                                                                                  threads = params.get_value('threads')
                                                                                                                                                                                          
                                                                                                                                                                                                  # Create output path
                                                                                                                                                                                                  tmp_dir = self.create_tmp_dir()
                                                                                                                                                                                                  output_path = os.path.join(tmp_dir, f"{sample_name}_analysis.txt")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Inlined script logic from scripts/analyze.py
                                                                                                                                                                                                  self.log_info_message(f"Analyzing {sample_name} with threshold {threshold}...")
                                                                                                                                                                                          
                                                                                                                                                                                                  with open(trimmed_file.path, 'r') as f:
                                                                                                                                                                                                      data = f.read()
                                                                                                                                                                                          
                                                                                                                                                                                                  # Perform analysis (original script logic)
                                                                                                                                                                                                  result = f"Analysis Results for {sample_name}\n"
                                                                                                                                                                                                  result += f"{'=' * 40}\n"
                                                                                                                                                                                                  result += f"Threshold: {threshold}\n"
                                                                                                                                                                                                  result += f"Threads used: {threads}\n"
                                                                                                                                                                                                  result += f"\nInput data preview:\n{data[:200]}\n"
                                                                                                                                                                                          
                                                                                                                                                                                                  # Write results
                                                                                                                                                                                                  with open(output_path, 'w') as f:
                                                                                                                                                                                                      f.write(result)
                                                                                                                                                                                          
                                                                                                                                                                                                  self.log_success_message("Analysis completed")
                                                                                                                                                                                          
                                                                                                                                                                                                  return {'analysis_file': File(output_path)}

                                                                                                                                                                                          Example 3: Python Run Block Rule


                                                                                                                                                                                          Original Snakemake Rule:


                                                                                                                                                                                          rule generate_report:
                                                                                                                                                                                              input:
                                                                                                                                                                                                  qc="results/qc/{sample}_stats.txt",
                                                                                                                                                                                                  analysis="results/analysis/{sample}_analysis.txt"
                                                                                                                                                                                              output:
                                                                                                                                                                                                  "results/{sample}_report.txt"
                                                                                                                                                                                              run:
                                                                                                                                                                                                  with open(output[0], 'w') as f_out:
                                                                                                                                                                                                      f_out.write(f"Report for {wildcards.sample}\n")
                                                                                                                                                                                                      f_out.write("=" * 40 + "\n\n")
                                                                                                                                                                                          
                                                                                                                                                                                                      with open(input.qc, 'r') as f:
                                                                                                                                                                                                          f_out.write("QC Statistics:\n")
                                                                                                                                                                                                          f_out.write(f.read() + "\n\n")
                                                                                                                                                                                          
                                                                                                                                                                                                      with open(input.analysis, 'r') as f:
                                                                                                                                                                                                          f_out.write("Analysis Results:\n")
                                                                                                                                                                                                          f_out.write(f.read())

                                                                                                                                                                                          Converted Constellab Task:


                                                                                                                                                                                          from gws_core import Task, task_decorator, ConfigParams, TaskInputs, TaskOutputs
                                                                                                                                                                                          from gws_core import InputSpec, OutputSpec, InputSpecs, OutputSpecs, ConfigSpecs
                                                                                                                                                                                          from gws_core import File, StrParam
                                                                                                                                                                                          import os
                                                                                                                                                                                          
                                                                                                                                                                                          @task_decorator(
                                                                                                                                                                                              unique_name="GenerateReport",
                                                                                                                                                                                              human_name="Generate Report",
                                                                                                                                                                                              short_description="Generate analysis report from QC and analysis results"
                                                                                                                                                                                          )
                                                                                                                                                                                          class GenerateReport(Task):
                                                                                                                                                                                              """
                                                                                                                                                                                              [Generated by Snakemake to Task Converter]
                                                                                                                                                                                          
                                                                                                                                                                                              Converted from Snakemake rule: generate_report
                                                                                                                                                                                          
                                                                                                                                                                                              ## Description
                                                                                                                                                                                              Combines QC statistics and analysis results into a comprehensive report.
                                                                                                                                                                                              """
                                                                                                                                                                                          
                                                                                                                                                                                              input_specs = InputSpecs({
                                                                                                                                                                                                  'qc_stats': InputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="QC statistics",
                                                                                                                                                                                                      short_description="Quality control statistics file"
                                                                                                                                                                                                  ),
                                                                                                                                                                                                  'analysis_results': InputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="Analysis results",
                                                                                                                                                                                                      short_description="Analysis output file"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              output_specs = OutputSpecs({
                                                                                                                                                                                                  'report': OutputSpec(
                                                                                                                                                                                                      File,
                                                                                                                                                                                                      human_name="Report",
                                                                                                                                                                                                      short_description="Combined analysis report"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              config_specs = ConfigSpecs({
                                                                                                                                                                                                  'sample_name': StrParam(
                                                                                                                                                                                                      human_name="Sample name",
                                                                                                                                                                                                      short_description="Name of the sample",
                                                                                                                                                                                                      default_value="sample1"
                                                                                                                                                                                                  )
                                                                                                                                                                                              })
                                                                                                                                                                                          
                                                                                                                                                                                              def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
                                                                                                                                                                                                  # Get inputs
                                                                                                                                                                                                  qc_stats = inputs['qc_stats']
                                                                                                                                                                                                  analysis_results = inputs['analysis_results']
                                                                                                                                                                                          
                                                                                                                                                                                                  # Get parameters
                                                                                                                                                                                                  sample_name = params.get_value('sample_name')
                                                                                                                                                                                          
                                                                                                                                                                                                  # Create output path
                                                                                                                                                                                                  tmp_dir = self.create_tmp_dir()
                                                                                                                                                                                                  output_path = os.path.join(tmp_dir, f"{sample_name}_report.txt")
                                                                                                                                                                                          
                                                                                                                                                                                                  # Logic from run: block (directly converted)
                                                                                                                                                                                                  with open(output_path, 'w') as f_out:
                                                                                                                                                                                                      f_out.write(f"Report for {sample_name}\n")
                                                                                                                                                                                                      f_out.write("=" * 40 + "\n\n")
                                                                                                                                                                                          
                                                                                                                                                                                                      with open(qc_stats.path, 'r') as f:
                                                                                                                                                                                                          f_out.write("QC Statistics:\n")
                                                                                                                                                                                                          f_out.write(f.read() + "\n\n")
                                                                                                                                                                                          
                                                                                                                                                                                                      with open(analysis_results.path, 'r') as f:
                                                                                                                                                                                                          f_out.write("Analysis Results:\n")
                                                                                                                                                                                                          f_out.write(f.read())
                                                                                                                                                                                          
                                                                                                                                                                                                  self.log_success_message(f"Report generated for {sample_name}")
                                                                                                                                                                                          
                                                                                                                                                                                                  return {'report': File(output_path)}

                                                                                                                                                                                          Handling Special Cases


                                                                                                                                                                                          Wildcards and Batch Processing


                                                                                                                                                                                          Snakemake uses wildcards to process multiple samples. Constellab offers two strategies:


                                                                                                                                                                                          Strategy A: Config Parameter (Single Sample)


                                                                                                                                                                                          • Each task execution processes one sample
                                                                                                                                                                                            • User specifies sample name as a config parameter
                                                                                                                                                                                              • Multiple samples require multiple task executions or protocol runs

                                                                                                                                                                                                Strategy B: Batch Processing (Multiple Samples)


                                                                                                                                                                                                • Task processes all samples in a single execution
                                                                                                                                                                                                  • Use ListParam for sample names
                                                                                                                                                                                                    • Loop through samples within the task

                                                                                                                                                                                                      Example (Batch Processing):


                                                                                                                                                                                                      from gws_core import ListParam
                                                                                                                                                                                                      
                                                                                                                                                                                                      config_specs = ConfigSpecs({
                                                                                                                                                                                                          'sample_names': ListParam(
                                                                                                                                                                                                              human_name="Sample names",
                                                                                                                                                                                                              short_description="List of sample names to process",
                                                                                                                                                                                                              default_value=["sample1", "sample2", "sample3"]
                                                                                                                                                                                                          )
                                                                                                                                                                                                      })
                                                                                                                                                                                                      
                                                                                                                                                                                                      def run(self, params: ConfigParams, inputs: TaskInputs) -> TaskOutputs:
                                                                                                                                                                                                          sample_names = params.get_value('sample_names')
                                                                                                                                                                                                      
                                                                                                                                                                                                          results = []
                                                                                                                                                                                                          for sample in sample_names:
                                                                                                                                                                                                              self.log_info_message(f"Processing {sample}...")
                                                                                                                                                                                                              # Process each sample
                                                                                                                                                                                                              # ...
                                                                                                                                                                                                      
                                                                                                                                                                                                          return outputs

                                                                                                                                                                                                      Virtual Environments


                                                                                                                                                                                                      Snakemake's conda: directive is converted to MambaShellProxy:


                                                                                                                                                                                                      Original:


                                                                                                                                                                                                      rule align:
                                                                                                                                                                                                          conda: "envs/alignment.yaml"
                                                                                                                                                                                                          shell: "bwa mem ..."

                                                                                                                                                                                                      Converted:


                                                                                                                                                                                                      env_file_path = os.path.join(os.path.dirname(__file__), "alignment_env.yaml")
                                                                                                                                                                                                      mamba = MambaShellProxy(
                                                                                                                                                                                                          env_file_path=env_file_path,
                                                                                                                                                                                                          env_name="alignment_env",
                                                                                                                                                                                                          message_dispatcher=self.message_dispatcher
                                                                                                                                                                                                      )
                                                                                                                                                                                                      mamba.run("bwa mem ...")

                                                                                                                                                                                                      Docker Containers (Beta)


                                                                                                                                                                                                      Snakemake's container: directive is converted to DockerService:


                                                                                                                                                                                                      Summary


                                                                                                                                                                                                      Converting from Snakemake to Constellab offers:


                                                                                                                                                                                                      • ✅ Automated conversion with AI assistance
                                                                                                                                                                                                        • ✅ Preserved logic from original Snakemake rules
                                                                                                                                                                                                          • ✅ Enhanced traceability and audit trails
                                                                                                                                                                                                            • ✅ User-friendly interface for non-programmers
                                                                                                                                                                                                              • ✅ Complete platform for data and pipeline management
                                                                                                                                                                                                                • ✅ Flexible deployment in any environment
                                                                                                                                                                                                                  • ✅ Reusable protocol templates for standardized workflows
                                                                                                                                                                                                                    • ✅ Better maintainability through Python-based tasks

                                                                                                                                                                                                                      Start your conversion today with /gws-snakemake-to-constellab and experience the power of a complete data lab automation platform!



                                                                                                                                                                                                                      Comments (0)

                                                                                                                                                                                                                      Write a comment
                                                                                                                                                                                                                      Shine Logo
                                                                                                                                                                                                                      Stories & feedback

                                                                                                                                                                                                                      Do you have a resource to share?

                                                                                                                                                                                                                      Share it to accelerate projects for the entire community.