Load Constellab Bioprocess data QC0

Typing name : TASK.gws_plate_reader.ConstellabBioprocessLoadData

Brick : gws_plate_reader

Load and process Constellab Bioprocess QC0 fermentation data from multiple CSV files and follow-up data.

Overview

This task integrates data from four different sources to create a comprehensive dataset for fermentation analysis. It handles data merging, validation, missing data detection, and generates a visual summary of data availability.

Input Files Required

1. Info CSV (`info_csv`)

Contains experiment and fermenter information with columns:

Batch: Experiment/trial identifier (e.g., "A")
Fermentor: Fermenter identifier (e.g., "1", "2")
Medium: Culture medium used
Additional metadata columns describing experimental conditions

2. Raw Data CSV (`raw_data_csv`)

Contains raw measurement data with columns:

Batch: Experiment identifier (must match Info CSV)
Fermentor: Fermenter identifier (must match Info CSV)
Multiple measurement columns (e.g., biomass, pH, temperature)

3. Medium CSV (`medium_csv`)

Contains culture medium composition with columns:

Medium: Medium identifier (must match Info CSV)
Composition columns describing medium components and concentrations

4. Follow-up ZIP (`follow_up_zip`)

ZIP archive containing CSV files with temporal tracking data:

Filenames must follow the pattern `<BATCH> <FERMENTOR>.csv` (batch and fermenter identifiers separated by a space)
Example: "A 1.csv"
Each file contains the time-series measurements for one fermenter
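
The naming convention above can be illustrated with a short standard-library sketch. It is not the brick's actual code: the function name `index_follow_up_files` is hypothetical, and splitting the file stem on its last space is an assumption about how batch names containing spaces would be handled.

```python
import io
import zipfile
from pathlib import PurePosixPath


def index_follow_up_files(zip_bytes: bytes) -> dict[tuple[str, str], str]:
    """Map (BATCH, FERMENTOR) to the archive member holding its follow-up CSV."""
    index: dict[tuple[str, str], str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            path = PurePosixPath(member)
            # Skip directories, non-CSV files, and macOS "._" metadata files.
            if member.endswith("/") or path.suffix.lower() != ".csv":
                continue
            if path.name.startswith("._"):
                continue
            # "<BATCH> <FERMENTOR>.csv": split on the last space (assumption).
            batch, sep, fermentor = path.stem.rpartition(" ")
            if not sep or not batch or not fermentor:
                continue  # the name does not follow the pattern
            index[(batch, fermentor)] = member
    return index


# Build a small in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("A 1.csv", "Time;pH\n0;7,0\n")
    archive.writestr("A 2.csv", "Time;pH\n0;6,9\n")
    archive.writestr("__MACOSX/._A 1.csv", "binary junk")
    archive.writestr("README.txt", "not a data file")

follow_up_index = index_follow_up_files(buffer.getvalue())
```

Here only "A 1.csv" and "A 2.csv" end up in the index; the macOS metadata entry and the text file are skipped.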

Processing Steps

File Loading: Imports all CSV files and unzips follow-up data
Data Indexing: Creates lookup tables for each (BATCH, FERMENTOR) pair
Data Merging:
- Merges Raw Data and Follow-up data on the time column
- Normalizes decimal formats (comma → dot)
- Removes rows with negative time values
- Performs an outer join so that time points present in only one source are preserved
Metadata Enrichment: Adds batch, sample, and medium information as tags
Missing Data Detection: Identifies which data types (info/raw_data/medium/follow_up) are missing for each sample
Column Tagging: Automatically tags columns as:
- is_index_column: Time columns
- is_data_column: Measurement columns
- Metadata columns (BATCH, FERMENTOR, MEDIUM)
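
The merging step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the task's implementation: the column names `TIME`, `BIOMASS` and `PH` and the helper names are hypothetical, and rows are represented as simple dictionaries of strings.

```python
def to_float(text: str) -> float:
    """Parse a number written with ',' or '.' as the decimal separator."""
    return float(text.strip().replace(",", "."))


def parse_row(row: dict[str, str]) -> dict[str, float]:
    """Convert every cell of a row to float."""
    return {name: to_float(value) for name, value in row.items()}


def outer_join_on_time(raw_rows, follow_up_rows, time_key="TIME"):
    """Outer-join raw and follow-up rows on their time value.

    A time point present in only one source is kept; the columns of the
    other source are simply absent from that row.
    """
    merged: dict[float, dict[str, float]] = {}
    for row in raw_rows:
        values = parse_row(row)
        merged.setdefault(values[time_key], {}).update(values)
    for row in follow_up_rows:
        values = parse_row(row)
        if values[time_key] < 0:
            continue  # negative follow-up times are filtered out
        merged.setdefault(values[time_key], {}).update(values)
    return [merged[time] for time in sorted(merged)]


raw = [{"TIME": "0", "BIOMASS": "0,10"}, {"TIME": "2", "BIOMASS": "0,45"}]
follow_up = [
    {"TIME": "-1", "PH": "7.1"},
    {"TIME": "0", "PH": "7.0"},
    {"TIME": "1", "PH": "6,9"},
]
table = outer_join_on_time(raw, follow_up)
```

The resulting rows cover times 0, 1 and 2: time 0 carries both measurements, while times 1 and 2 each carry only the measurement from the source that recorded them.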

Outputs

1. Resource set (`resource_set`)

A ResourceSet containing one Table per (BATCH, FERMENTOR) combination:

Table Name: <BATCH>_<FERMENTOR>
Tags:
- batch: Experiment identifier (BATCH)
- sample: Fermenter identifier (FERMENTOR)
- medium: Culture medium name
- missing_value: Comma-separated list of missing data types (if any)
  - Possible values: "info", "raw_data", "medium", "follow_up", "follow_up_empty"
Column Tags:
- is_index_column='true': Time columns for plotting
- is_data_column='true': Measurement columns
- unit: Unit of measurement (when available)

2. Venn Diagram (`venn_diagram`) - Optional

A PlotlyResource containing an interactive Venn diagram showing:

4 Overlapping Circles: One per data type (Info, Raw Data, Medium, Follow-up)
Circle Labels: Show count of samples with that data type
Center Label: Shows count of complete samples (all 4 data types present)
Color Coding:
- Blue: Info data
- Green: Raw Data
- Orange: Medium data
- Purple: Follow-up data
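
The counts shown on the diagram amount to set sizes and one intersection. The sketch below uses made-up sample identifiers and hypothetical variable names purely to show the arithmetic; it does not draw anything and is not the task's plotting code.

```python
# Samples (BATCH, FERMENTOR) found in each data source; values are illustrative.
samples_by_source = {
    "info": {("A", "1"), ("A", "2"), ("B", "1")},
    "raw_data": {("A", "1"), ("A", "2")},
    "medium": {("A", "1"), ("A", "2"), ("B", "1")},
    "follow_up": {("A", "1"), ("B", "1")},
}

# Circle labels: number of samples that have each data type.
circle_counts = {name: len(samples) for name, samples in samples_by_source.items()}

# Center label: samples present in all four sources.
complete_samples = set.intersection(*samples_by_source.values())
center_count = len(complete_samples)
```

With this input, only sample ("A", "1") appears in all four sources, so the center count is 1.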

3. Medium Table (`medium_table`) - Optional

The medium composition CSV file converted to a Table resource with proper data types:

First column (MEDIUM): Kept as string (medium name/identifier)
Other columns: Converted to float (handles both comma and dot as decimal separator)
Missing values: Empty cells or non-numeric text converted to NaN
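
The conversion rules above might look like this in a standard-library sketch. The function names and the component columns `GLUCOSE` and `YEAST_EXTRACT` are hypothetical, and a semicolon delimiter is assumed so that decimal commas are unambiguous.

```python
import csv
import io
import math


def to_float_or_nan(text: str) -> float:
    """Convert a cell to float, accepting ',' or '.' as decimal separator."""
    try:
        return float(text.strip().replace(",", "."))
    except ValueError:
        return math.nan  # empty cell or non-numeric text


def load_medium_table(csv_text: str, delimiter: str = ";") -> list[dict]:
    """Parse the medium CSV: first column as string, the rest as floats."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    rows = []
    for cells in reader:
        row = {header[0]: cells[0]}  # medium identifier stays a string
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = to_float_or_nan(cell)
        rows.append(row)
    return rows


medium_csv = "MEDIUM;GLUCOSE;YEAST_EXTRACT\nM1;20,5;5.0\nM2;;n/a\n"
medium_rows = load_medium_table(medium_csv)
```

The first row parses to numeric values regardless of the separator used, and both cells of the second row become NaN.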

4. Metadata Table (`metadata_table`) - Optional

A Table that combines, for each (BATCH, FERMENTOR) pair, the medium composition data with the median values of that sample's follow-up measurements:

First column (Series): Contains the 'batch_sample' identifier (e.g., "EPA-WP3-25-001_23A")
Other columns: Medium composition columns and median values of follow-up measurements
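
One row of such a table could be assembled as sketched below. The helper `metadata_row`, the column names, and the split of the example identifier into batch "EPA-WP3-25-001" and sample "23A" are assumptions made for illustration.

```python
from statistics import median


def metadata_row(batch, sample, medium_composition, follow_up_columns):
    """Build one metadata row: identifier, medium composition, follow-up medians."""
    row = {"Series": f"{batch}_{sample}"}
    row.update(medium_composition)
    for name, values in follow_up_columns.items():
        row[name] = median(values)  # one median per follow-up measurement
    return row


row = metadata_row(
    batch="EPA-WP3-25-001",
    sample="23A",
    medium_composition={"GLUCOSE": 20.5},
    follow_up_columns={"PH": [7.0, 6.8, 6.9], "TEMPERATURE": [30.0, 30.5, 31.0]},
)
```

Each time series collapses to a single value, which is what makes the table suitable as input for sample-level analyses.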

Data Quality

Missing Data Handling

The task detects and reports missing data through tags:

Samples with missing Info have a missing_value tag that includes "info"
Samples with missing Raw Data have a missing_value tag that includes "raw_data"
Samples with missing Medium have a missing_value tag that includes "medium"
Samples with no Follow-up file have a missing_value tag that includes "follow_up"
Samples whose Follow-up file is empty have a missing_value tag that includes "follow_up_empty"
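
A possible way to derive the tag value for one sample is sketched below. The function signature is hypothetical, and joining the values with a bare comma (no space) is an assumption; the documentation only states that the list is comma-separated.

```python
from typing import Optional


def missing_value_tag(
    has_info: bool,
    has_raw_data: bool,
    has_medium: bool,
    follow_up_rows: Optional[int],
) -> str:
    """Build the comma-separated `missing_value` tag for one sample.

    `follow_up_rows` is None when the sample has no follow-up file and 0
    when the file exists but contains no data rows.
    """
    missing = []
    if not has_info:
        missing.append("info")
    if not has_raw_data:
        missing.append("raw_data")
    if not has_medium:
        missing.append("medium")
    if follow_up_rows is None:
        missing.append("follow_up")
    elif follow_up_rows == 0:
        missing.append("follow_up_empty")
    return ",".join(missing)


complete = missing_value_tag(True, True, True, follow_up_rows=12)
partial = missing_value_tag(True, False, True, follow_up_rows=None)
empty_file = missing_value_tag(False, True, False, follow_up_rows=0)
```

A complete sample yields an empty string here, which a caller could use to decide whether to set the tag at all.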

Data Validation

Checks for matching (BATCH, FERMENTOR) pairs across all data sources
Normalizes decimal separators for consistent numeric parsing
Filters out negative time values from follow-up data
Preserves all original columns and metadata

Use Cases

Quality Control: Use Venn diagram to quickly assess data completeness
Exploratory Analysis: Browse merged data with all temporal measurements
Selective Processing: Use tags to filter complete vs incomplete samples
Dashboard Display: Visualize data availability and sample information
Downstream Analysis: Provides clean, tagged data for filtering and interpolation

Example Workflow

[Info CSV] ──┐
[Raw Data] ──┼──> ConstellabBioprocessLoadData ──┬──> [Resource set] ──> Filter/Analysis
[Medium]   ──┤                                   ├──> [Venn Diagram] ──> Dashboard
[Follow-up]──┘                                   └──> [Medium Table] ──> PCA/UMAP

Notes

All CSV files should use UTF-8, Latin-1, or CP1252 encoding
Accepted separators: comma (,) or semicolon (;)
Column names are normalized to uppercase for matching
Follow-up files starting with "._" (macOS metadata) are ignored
Time columns are automatically detected and tagged for indexing
Output tables can be directly used with Filter and Interpolation tasks
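
The encoding, separator, and column-name rules in these notes can be sketched with the standard library. This is an illustration only: the function name is hypothetical, the order in which encodings are tried is an assumption, and the separator is guessed by counting characters in the header line.

```python
import csv
import io


def read_csv_bytes(data: bytes) -> list[dict[str, str]]:
    """Decode a CSV file and return rows keyed by upper-cased column names."""
    # Try UTF-8 first, then CP1252; Latin-1 accepts any byte sequence and
    # therefore serves as the last resort (order is an assumption).
    for encoding in ("utf-8", "cp1252", "latin-1"):
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    lines = text.splitlines()
    header_line = lines[0] if lines else ""
    # Pick whichever accepted separator is more frequent in the header.
    delimiter = ";" if header_line.count(";") > header_line.count(",") else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip().upper() for name in next(reader, [])]
    return [dict(zip(header, cells)) for cells in reader]


# A semicolon-separated CP1252 file and a comma-separated UTF-8 file.
info_rows = read_csv_bytes("Batch;Fermentor;Medium\nA;1;Café\n".encode("cp1252"))
raw_rows = read_csv_bytes(b"batch,fermentor\nB,2\n")
```

Both files yield rows keyed by `BATCH` and `FERMENTOR`, so they can be matched even though the original headers differ in case and separator.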

Input

Info CSV file

Raw data CSV file

Medium CSV file

Follow-up ZIP file

Output

Resource set containing all the tables: A set of resources (type: Resource set)
Venn diagram of data availability: Plotly resource (type: Plotly resource, optional)
Medium composition table: Medium CSV file with numeric columns properly converted to float (type: Table, optional)
Metadata table: Table containing metadata information (type: Table, optional)
