Version
Publication date: Sep 19, 2024
Confidentiality: Public

Load Constellab Bioprocess data QC0

TASK
1036 times
61.29 %
18 seconds
Typing name: TASK.gws_plate_reader.ConstellabBioprocessLoadData
Brick: gws_plate_reader

Load and process Constellab Bioprocess QC0 fermentation data from multiple CSV files and follow-up data.

Overview

This task integrates data from four different sources to create a comprehensive dataset for fermentation analysis. It handles data merging, validation, missing data detection, and generates a visual summary of data availability.

Input Files Required

1. Info CSV (info_csv)

Contains experiment and fermenter information with columns:

  • Batch: Experiment/trial identifier (e.g., "A")
  • Fermentor: Fermenter identifier (e.g., "1", "2")
  • Medium: Culture medium used
  • Additional metadata columns describing experimental conditions

2. Raw Data CSV (raw_data_csv)

Contains raw measurement data with columns:

  • Batch: Experiment identifier (must match Info CSV)
  • Fermentor: Fermenter identifier (must match Info CSV)
  • Multiple measurement columns (e.g., biomass, pH, temperature)

3. Medium CSV (medium_csv)

Contains culture medium composition with columns:

  • Medium: Medium identifier (must match Info CSV)
  • Composition columns describing medium components and concentrations

4. Follow-up ZIP (follow_up_zip)

ZIP archive containing CSV files with temporal tracking data:

  • Filenames must follow pattern: <BATCH> <FERMENTOR>.csv
  • Example: "A 1.csv"
  • Contains time-series data with temporal measurements
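
The filename convention above can be illustrated with a minimal sketch that uses only the Python standard library. The function names `parse_follow_up_name` and `index_follow_up_zip` are hypothetical and are not part of the brick. The sketch also assumes that the fermenter identifier is the part of the filename after the last space.

```python
import io
import zipfile
from pathlib import PurePosixPath


def parse_follow_up_name(member_name):
    """Return (batch, fermentor) for '<BATCH> <FERMENTOR>.csv', else None."""
    path = PurePosixPath(member_name)
    # Skip macOS metadata entries ("._*") and anything that is not a CSV file.
    if path.name.startswith("._") or path.suffix.lower() != ".csv":
        return None
    # Assumption: the fermenter identifier follows the last space in the name.
    batch, sep, fermentor = path.stem.rpartition(" ")
    if not sep or not batch or not fermentor:
        return None
    return batch, fermentor


def index_follow_up_zip(zip_bytes):
    """Map each (batch, fermentor) pair to the archive member holding its data."""
    index = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            key = parse_follow_up_name(name)
            if key is not None:
                index[key] = name
    return index
```

Checking an archive with a helper like this before running the task can reveal files that would not be matched to any (BATCH, FERMENTOR) pair.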

Processing Steps

  1. File Loading: Imports all CSV files and unzips follow-up data
  2. Data Indexing: Creates lookup tables for each (BATCH, FERMENTOR) pair
  3. Data Merging:
    • Merges Raw Data and Follow-up data on time column
    • Normalizes decimal formats (comma → dot)
    • Filters negative time values
    • Performs outer join to preserve all data points
  4. Metadata Enrichment: Adds batch, sample, and medium information as tags
  5. Missing Data Detection: Identifies which data types (info/raw_data/medium/follow_up) are missing for each sample
  6. Column Tagging: Automatically tags columns as:
    • is_index_column: Time columns
    • is_data_column: Measurement columns
    • Metadata columns (BATCH, FERMENTOR, MEDIUM)
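
The merging step (step 3) can be sketched in plain Python as follows. This illustrates the described behaviour and is not the task's actual implementation. The column name `TIME`, the helper names, and the choice to drop negative times from both inputs are assumptions made for the example.

```python
def to_number(text):
    """Parse a numeric string that may use a comma as decimal separator."""
    return float(str(text).strip().replace(",", "."))


def merge_on_time(raw_rows, follow_up_rows, time_key="TIME"):
    """Outer-join two lists of row dicts on a shared time column."""
    merged = {}
    for rows in (raw_rows, follow_up_rows):
        for row in rows:
            t = to_number(row[time_key])
            if t < 0:
                continue  # drop rows with a negative time value
            values = {k: to_number(v) for k, v in row.items() if k != time_key}
            merged.setdefault(t, {time_key: t}).update(values)
    # Outer join: every time point is kept, and absent measurements become None.
    others = sorted({k for row in merged.values() for k in row} - {time_key})
    columns = [time_key] + others
    return [{c: merged[t].get(c) for c in columns} for t in sorted(merged)]
```

A time point present in only one source still yields a row, with `None` in the columns that the other source would have provided.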

Outputs

1. Resource set (resource_set)

A ResourceSet containing one Table per (BATCH, FERMENTOR) combination:

  • Table Name: <BATCH>_<FERMENTOR>
  • Tags:
    • batch: Experiment identifier (BATCH)
    • sample: Fermenter identifier (FERMENTOR)
    • medium: Culture medium name
    • missing_value: Comma-separated list of missing data types (if any)
      • Possible values: "info", "raw_data", "medium", "follow_up", "follow_up_empty"
  • Column Tags:
    • is_index_column='true': Time columns for plotting
    • is_data_column='true': Measurement columns
    • unit: Unit of measurement (when available)

2. Venn Diagram (venn_diagram) - Optional

A PlotlyResource containing an interactive Venn diagram showing:

  • 4 Overlapping Circles: One per data type (Info, Raw Data, Medium, Follow-up)
  • Circle Labels: Show count of samples with that data type
  • Center Label: Shows count of complete samples (all 4 data types present)
  • Color Coding:
    • Blue: Info data
    • Green: Raw Data
    • Orange: Medium data
    • Purple: Follow-up data
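
The numbers shown in the diagram can be understood as set sizes. The sketch below is not the task's plotting code. It assumes each data type is represented by the set of (BATCH, FERMENTOR) pairs for which data exists, and the function name is hypothetical.

```python
def availability_counts(samples_by_type):
    """Return per-type sample counts and the number of complete samples."""
    counts = {name: len(samples) for name, samples in samples_by_type.items()}
    if samples_by_type:
        # A sample is complete when it appears in every data type.
        complete = set.intersection(*samples_by_type.values())
    else:
        complete = set()
    return counts, len(complete)
```

The per-type counts correspond to the circle labels, and the size of the intersection corresponds to the center label.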

3. Medium Table (medium_table) - Optional

The medium composition CSV file converted to a Table resource with proper data types:

  • First column (MEDIUM): Kept as string (medium name/identifier)
  • Other columns: Converted to float (handles both comma and dot as decimal separator)
  • Missing values: Empty cells or non-numeric text converted to NaN
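
The conversion rules above can be sketched as follows, using only the standard library. The helper names are hypothetical, and the example identifies the medium column by the name `MEDIUM` rather than by position.

```python
import math


def to_float_or_nan(cell):
    """Convert a cell to float, accepting ',' or '.' as decimal separator."""
    text = str(cell).strip().replace(",", ".")
    try:
        return float(text)
    except ValueError:
        # Empty cells and non-numeric text become NaN.
        return math.nan


def convert_medium_row(row, id_column="MEDIUM"):
    """Keep the medium identifier as a string and convert other cells to float."""
    return {
        key: (str(value) if key == id_column else to_float_or_nan(value))
        for key, value in row.items()
    }
```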

4. Metadata Table (metadata_table) - Optional

A Table that combines, for each (BATCH, FERMENTOR) pair, the medium composition data with the median values of that sample's follow-up data:

  • First column (Series): Contains the 'batch_sample' identifier (e.g., "EPA-WP3-25-001_23A")
  • Other columns: Medium composition columns and median values of follow-up measurements
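
One row of such a table could be assembled as in the sketch below. It is an illustration only. The function name, the `TIME` column name, and the decision to exclude the time column from the medians are assumptions.

```python
from statistics import median


def metadata_row(batch, sample, medium_composition, follow_up_rows, time_key="TIME"):
    """Build one metadata row: identifier, medium composition, follow-up medians."""
    row = {"Series": f"{batch}_{sample}"}
    row.update(medium_composition)
    columns = {k for r in follow_up_rows for k in r if k != time_key}
    for column in sorted(columns):
        # Ignore missing measurements when computing the median.
        values = [r[column] for r in follow_up_rows if r.get(column) is not None]
        row[column] = median(values) if values else None
    return row
```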

Data Quality

Missing Data Handling

The task detects and reports missing data through tags:

  • Samples with missing Info have a missing_value tag that includes "info"
  • Samples with missing Raw Data have a missing_value tag that includes "raw_data"
  • Samples with missing Medium have a missing_value tag that includes "medium"
  • Samples with missing Follow-up have a missing_value tag that includes "follow_up"
  • Samples with empty Follow-up files have a missing_value tag that includes "follow_up_empty"
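
The tag value can be pictured as a comma-separated string built from the data types that are absent for a sample. The sketch below is hypothetical. The exact separator (with or without a space) and the ordering of values used by the task are assumptions.

```python
DATA_TYPES = ("info", "raw_data", "medium", "follow_up")


def missing_value_tag(available, follow_up_is_empty=False):
    """Return the missing_value tag for a sample ('' when nothing is missing)."""
    missing = [name for name in DATA_TYPES if name not in available]
    # A follow-up file that exists but holds no data is reported separately.
    if "follow_up" in available and follow_up_is_empty:
        missing.append("follow_up_empty")
    return ",".join(missing)
```

For example, `missing_value_tag({"info", "medium"})` gives `"raw_data,follow_up"`, and a sample with all four data types gives an empty string.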

Data Validation

  • Checks for matching (BATCH, FERMENTOR) pairs across all data sources
  • Normalizes decimal separators for consistent numeric parsing
  • Filters out negative time values from follow-up data
  • Preserves all original columns and metadata

Use Cases

  1. Quality Control: Use Venn diagram to quickly assess data completeness
  2. Exploratory Analysis: Browse merged data with all temporal measurements
  3. Selective Processing: Use tags to filter complete vs incomplete samples
  4. Dashboard Display: Visualize data availability and sample information
  5. Downstream Analysis: Provides clean, tagged data for filtering and interpolation

Example Workflow

[Info CSV] ──┐
[Raw Data] ──┼──> ConstellabBioprocessLoadData ──┬──> [Resource set] ──> Filter/Analysis
[Medium]   ──┤                                   ├──> [Venn Diagram] ──> Dashboard
[Follow-up]──┘                                   └──> [Medium Table] ──> PCA/UMAP

Notes

  • All CSV files should use UTF-8, Latin-1, or CP1252 encoding
  • Accepted separators: comma (,) or semicolon (;)
  • Column names are normalized to uppercase for matching
  • Follow-up files starting with "._" (macOS metadata) are ignored
  • Time columns are automatically detected and tagged for indexing
  • Output tables can be directly used with Filter and Interpolation tasks
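
The encoding, separator, and column-name notes above can be combined into a small reader, sketched here with the standard library only. This is not the task's loader. The function name is hypothetical, the separator is guessed from the header line, and CP1252 is tried before Latin-1 because Latin-1 decoding never fails in Python and would otherwise always win.

```python
import csv
import io

ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_csv_rows(data):
    """Decode CSV bytes, guess the separator, and uppercase the column names."""
    text = ""
    for encoding in ENCODINGS:
        try:
            text = data.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    # Guess the separator from the header: semicolon if it is more frequent.
    header = text.splitlines()[0] if text else ""
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {key.strip().upper(): value for key, value in row.items() if key is not None}
        for row in reader
    ]
```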

Input

  • Info CSV file
  • Raw data CSV file
  • Medium CSV file
  • Follow-up ZIP file

Output

  • Resource set containing all the tables: A set of resources
  • Venn diagram of data availability: Plotly resource (optional)
  • Medium composition table: Medium CSV file with numeric columns properly converted to float (optional)
  • Metadata table: Table containing metadata information (optional)