Publication date: Sep 19, 2024
Confidentiality: Public

Load Fermentalg data QC0

TASK
Typing name: TASK.gws_plate_reader.FermentalgLoadData
Brick: gws_plate_reader

Load and process Fermentalg QC0 fermentation data from multiple CSV files and follow-up data.

Overview

This task integrates data from four different sources to create a comprehensive dataset for fermentation analysis. It handles data merging, validation, missing data detection, and generates a visual summary of data availability.

Input Files Required

1. Info CSV (info_csv)

Contains experiment and fermenter information with columns:

  • ESSAI: Experiment/trial identifier (e.g., "EPA-WP3-25-001")
  • FERMENTEUR: Fermenter identifier (e.g., "23A", "23B")
  • MILIEU: Culture medium used
  • Additional metadata columns describing experimental conditions

2. Raw Data CSV (raw_data_csv)

Contains raw measurement data with columns:

  • ESSAI: Experiment identifier (must match Info CSV)
  • FERMENTEUR: Fermenter identifier (must match Info CSV)
  • Temps de culture (h): Culture time in hours
  • Multiple measurement columns (e.g., biomass, pH, temperature)

3. Medium CSV (medium_csv)

Contains culture medium composition with columns:

  • MILIEU: Medium identifier (must match Info CSV)
  • Composition columns describing medium components and concentrations

4. Follow-up ZIP (follow_up_zip)

ZIP archive containing CSV files with temporal tracking data:

  • Filenames must follow pattern: <ESSAI> <FERMENTEUR>.csv
  • Example: "EPA-WP3-25-001 23A.csv"
  • Contains time-series data with temporal measurements
  • Column Temps (h) will be renamed to Temps de culture (h) for consistency
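The follow-up conventions above (filename pattern, "._" exclusion, time-column rename) can be sketched in Python. This is a minimal illustration using only the standard library; `load_follow_up` and its internals are hypothetical names, not the task's actual implementation.

```python
import csv
import io
import zipfile

def load_follow_up(zip_bytes):
    """Read follow-up CSVs from a ZIP, keyed by (ESSAI, FERMENTEUR).

    Sketch only: filenames are assumed to match the
    '<ESSAI> <FERMENTEUR>.csv' pattern; macOS '._' metadata entries
    are skipped, and 'Temps (h)' is renamed to 'Temps de culture (h)'.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            base = name.rsplit("/", 1)[-1]
            # Skip macOS metadata entries and non-CSV files
            if base.startswith("._") or not base.endswith(".csv"):
                continue
            essai, fermenteur = base[:-4].rsplit(" ", 1)
            with zf.open(name) as fh:
                rows = list(csv.DictReader(io.TextIOWrapper(fh, "utf-8")))
            # Normalize the time column name for consistency
            for row in rows:
                if "Temps (h)" in row:
                    row["Temps de culture (h)"] = row.pop("Temps (h)")
            tables[(essai, fermenteur)] = rows
    return tables
```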

Processing Steps

  1. File Loading: Imports all CSV files and unzips follow-up data
  2. Data Indexing: Creates lookup tables for each (ESSAI, FERMENTEUR) pair
  3. Data Merging:
    • Merges Raw Data and Follow-up data on time column
    • Normalizes decimal formats (comma → dot)
    • Filters negative time values
    • Performs outer join to preserve all data points
  4. Metadata Enrichment: Adds batch, sample, and medium information as tags
  5. Missing Data Detection: Identifies which data types (info/raw_data/medium/follow_up) are missing for each sample
  6. Column Tagging: Automatically tags columns as:
    • is_index_column: Time columns
    • is_data_column: Measurement columns
    • Metadata columns (ESSAI, FERMENTEUR, MILIEU)
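Step 3's merging rules (decimal normalization, negative-time filtering, outer join on the time column) can be illustrated with a small sketch. The row-dict representation and function names are assumptions for illustration, not the task's real code.

```python
def normalize_decimal(value):
    """Parse a number, converting comma decimals ('3,14') to dot form."""
    try:
        return float(str(value).replace(",", "."))
    except ValueError:
        return None

def merge_on_time(raw_rows, follow_up_rows, time_col="Temps de culture (h)"):
    """Outer-join two row lists on the time column, keeping all points.

    Rows with unparsable or negative times are dropped; rows sharing a
    time value are combined, so no data point from either side is lost.
    """
    merged = {}
    for rows in (raw_rows, follow_up_rows):
        for row in rows:
            t = normalize_decimal(row.get(time_col))
            if t is None or t < 0:  # filter unparsable and negative times
                continue
            merged.setdefault(t, {}).update(row)
            merged[t][time_col] = t  # keep the normalized numeric time
    return [merged[t] for t in sorted(merged)]
```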

Outputs

1. Resource Set (resource_set)

A ResourceSet containing one Table per (ESSAI, FERMENTEUR) combination:

  • Table Name: <ESSAI>_<FERMENTEUR>
  • Tags:
    • batch: Experiment identifier (ESSAI)
    • sample: Fermenter identifier (FERMENTEUR)
    • medium: Culture medium name
    • missing_value: Comma-separated list of missing data types (if any)
      • Possible values: "info", "raw_data", "medium", "follow_up", "follow_up_empty"
  • Column Tags:
    • is_index_column='true': Time columns for plotting
    • is_data_column='true': Measurement columns
    • unit: Unit of measurement (when available)

2. Venn Diagram (venn_diagram) - Optional

A PlotlyResource containing an interactive Venn diagram showing:

  • 4 Overlapping Circles: One per data type (Info, Raw Data, Medium, Follow-up)
  • Circle Labels: Show count of samples with that data type
  • Center Label: Shows count of complete samples (all 4 data types present)
  • Color Coding:
    • Blue: Info data
    • Green: Raw Data
    • Orange: Medium data
    • Purple: Follow-up data
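The counts shown on the diagram reduce to set arithmetic over (ESSAI, FERMENTEUR) keys. A minimal sketch, assuming each data type's availability is represented as a set of such keys (the actual task computes these internally):

```python
def venn_counts(info, raw, medium, follow_up):
    """Per-circle sizes plus the center count of fully complete samples.

    Each argument is the set of (ESSAI, FERMENTEUR) keys for which
    that data type is present.
    """
    return {
        "info": len(info),
        "raw_data": len(raw),
        "medium": len(medium),
        "follow_up": len(follow_up),
        # Center label: samples present in all four sources
        "complete": len(info & raw & medium & follow_up),
    }
```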

Data Quality

Missing Data Handling

The task detects and reports missing data through tags:

  • Samples with missing Info will have missing_value tag including "info"
  • Samples with missing Raw Data will have tag including "raw_data"
  • Samples with missing Medium will have tag including "medium"
  • Samples with missing Follow-up will have tag including "follow_up"
  • Samples with empty Follow-up files will have tag including "follow_up_empty"
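Downstream code can recover the missing data types from the comma-separated missing_value tag. This helper assumes tags are exposed as a plain dict of name to value; the real ResourceSet/Table tag API is not shown here.

```python
def missing_types(tags):
    """Parse the comma-separated 'missing_value' tag into a set.

    An absent tag means the sample is complete.
    """
    value = tags.get("missing_value", "")
    return set(value.split(",")) if value else set()

def is_complete(tags):
    """True when no data type is flagged as missing."""
    return not missing_types(tags)
```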

Data Validation

  • Checks for matching (ESSAI, FERMENTEUR) pairs across all data sources
  • Normalizes decimal separators for consistent numeric parsing
  • Filters out negative time values from follow-up data
  • Preserves all original columns and metadata

Use Cases

  1. Quality Control: Use Venn diagram to quickly assess data completeness
  2. Exploratory Analysis: Browse merged data with all temporal measurements
  3. Selective Processing: Use tags to filter complete vs incomplete samples
  4. Dashboard Display: Visualize data availability and sample information
  5. Downstream Analysis: Provides clean, tagged data for filtering and interpolation

Example Workflow

[Info CSV] ──┐
[Raw Data] ──┼──> FermentalgLoadData ──┬──> [Resource Set] ──> Filter/Analysis
[Medium]   ──┤                         └──> [Venn Diagram] ──> Dashboard
[Follow-up]──┘

Notes

  • All CSV files should use UTF-8, Latin-1, or CP1252 encoding
  • Accepted separators: comma (,) or semicolon (;)
  • Column names are normalized to uppercase for matching
  • Follow-up files starting with "._" (macOS metadata) are ignored
  • Time columns are automatically detected and tagged for indexing
  • Output tables can be directly used with Filter and Interpolation tasks
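The encoding, separator, and uppercase-normalization notes above can be combined into a small loader sketch (stdlib only; the task's actual reader may behave differently, and `read_csv_rows` is a hypothetical name):

```python
import csv
import io

# Encodings accepted per the notes above, tried in order
ENCODINGS = ("utf-8", "latin-1", "cp1252")

def read_csv_rows(raw_bytes):
    """Decode CSV bytes, detect the separator, and uppercase the header.

    The separator is guessed from the header line (semicolon vs comma),
    and column names are uppercased for matching across files.
    """
    for enc in ENCODINGS:
        try:
            text = raw_bytes.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("unsupported encoding")
    first_line = text.splitlines()[0]
    sep = ";" if first_line.count(";") > first_line.count(",") else ","
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = [h.strip().upper() for h in next(reader)]
    return [dict(zip(header, row)) for row in reader]
```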

Input

  • Info CSV file
  • Raw data CSV file
  • Medium CSV file
  • Follow-up ZIP file

Output

  • Resource set containing all the tables (a set of resources)
  • Venn diagram of data availability (Plotly resource)