Load Fermentalg data QC0
Load and process Fermentalg QC0 fermentation data from multiple CSV files and follow-up data.
Overview
This task integrates data from four different sources to create a comprehensive dataset for fermentation analysis. It handles data merging, validation, missing data detection, and generates a visual summary of data availability.
Input Files Required
1. Info CSV (info_csv)
Contains experiment and fermenter information with columns:
- ESSAI: Experiment/trial identifier (e.g., "EPA-WP3-25-001")
- FERMENTEUR: Fermenter identifier (e.g., "23A", "23B")
- MILIEU: Culture medium used
- Additional metadata columns describing experimental conditions
2. Raw Data CSV (raw_data_csv)
Contains raw measurement data with columns:
- ESSAI: Experiment identifier (must match Info CSV)
- FERMENTEUR: Fermenter identifier (must match Info CSV)
- Temps de culture (h): Culture time in hours
- Multiple measurement columns (e.g., biomass, pH, temperature)
3. Medium CSV (medium_csv)
Contains culture medium composition with columns:
- MILIEU: Medium identifier (must match Info CSV)
- Composition columns describing medium components and concentrations
4. Follow-up ZIP (follow_up_zip)
ZIP archive containing CSV files with temporal tracking data:
- Filenames must follow the pattern <ESSAI> <FERMENTEUR>.csv (example: "EPA-WP3-25-001 23A.csv")
- Contains time-series data with temporal measurements
- Column Temps (h) will be renamed to Temps de culture (h) for consistency
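The filename convention and the column rename can be illustrated with a short sketch. This is not the task's actual implementation: the function name read_follow_up_zip, the choice to split the filename on its last space, and the default semicolon separator are all assumptions made for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath

import pandas as pd


def read_follow_up_zip(zip_bytes: bytes, sep: str = ";") -> dict:
    """Map (ESSAI, FERMENTEUR) to the follow-up table found in the archive.

    Hypothetical helper: splitting on the LAST space assumes that the
    fermenter identifier itself contains no space.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            name = PurePosixPath(member).name
            # Skip directories, non-CSV entries, and macOS "._" metadata files.
            if not name.lower().endswith(".csv") or name.startswith("._"):
                continue
            essai, space, fermenteur = name[:-4].rpartition(" ")
            if not space:
                continue  # filename does not follow "<ESSAI> <FERMENTEUR>.csv"
            text = archive.read(member).decode("utf-8")
            # Read everything as text; numeric parsing happens in a later step.
            frame = pd.read_csv(io.StringIO(text), sep=sep, dtype=str)
            frame = frame.rename(columns={"Temps (h)": "Temps de culture (h)"})
            tables[(essai, fermenteur)] = frame
    return tables
```

Keying the result by the (ESSAI, FERMENTEUR) pair makes it straightforward to look up the follow-up table that belongs to a given row of the Info CSV.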
Processing Steps
- File Loading: Imports all CSV files and unzips follow-up data
- Data Indexing: Creates lookup tables for each (ESSAI, FERMENTEUR) pair
- Data Merging:
  - Merges Raw Data and Follow-up data on the time column
  - Normalizes decimal formats (comma → dot)
  - Filters out negative time values from the follow-up data
  - Performs an outer join to preserve all data points
- Metadata Enrichment: Adds batch, sample, and medium information as tags
- Missing Data Detection: Identifies which data types (info/raw_data/medium/follow_up) are missing for each sample
- Column Tagging: Automatically tags columns as:
  - is_index_column: Time columns
  - is_data_column: Measurement columns
  - Metadata columns (ESSAI, FERMENTEUR, MILIEU)
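The Data Merging step can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the task's own code: the helper names to_numeric_columns and merge_raw_and_follow_up are hypothetical, both inputs are assumed to already use the Temps de culture (h) column name, and measurement columns are assumed not to share names between the two tables (pandas would otherwise add _x/_y suffixes).

```python
import pandas as pd

TIME_COL = "Temps de culture (h)"


def to_numeric_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Convert comma-decimal text such as '7,1' to floats where possible."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        converted = pd.to_numeric(
            out[col].astype(str).str.replace(",", ".", regex=False),
            errors="coerce",
        )
        # Keep the original text column if nothing parsed as a number.
        if converted.notna().any():
            out[col] = converted
    return out


def merge_raw_and_follow_up(raw: pd.DataFrame, follow_up: pd.DataFrame) -> pd.DataFrame:
    """Outer-join raw and follow-up measurements on the culture time."""
    raw = to_numeric_columns(raw)
    follow_up = to_numeric_columns(follow_up)
    # Use the same float dtype for the join key on both sides.
    raw[TIME_COL] = raw[TIME_COL].astype(float)
    follow_up[TIME_COL] = follow_up[TIME_COL].astype(float)
    # Drop follow-up rows recorded before time zero (also drops missing times).
    follow_up = follow_up[follow_up[TIME_COL] >= 0]
    merged = raw.merge(follow_up, on=TIME_COL, how="outer")
    return merged.sort_values(TIME_COL).reset_index(drop=True)
```

With an outer join, a time point that appears in only one source is kept, and the columns coming from the other source are left empty (NaN) for that row.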
Outputs
1. Resource Set (resource_set)
A ResourceSet containing one Table per (ESSAI, FERMENTEUR) combination:
- Table Name: <ESSAI>_<FERMENTEUR>
- Tags:
  - batch: Experiment identifier (ESSAI)
  - sample: Fermenter identifier (FERMENTEUR)
  - medium: Culture medium name
  - missing_value: Comma-separated list of missing data types (if any)
    - Possible values: "info", "raw_data", "medium", "follow_up", "follow_up_empty"
- Column Tags:
  - is_index_column='true': Time columns for plotting
  - is_data_column='true': Measurement columns
  - unit: Unit of measurement (when available)
2. Venn Diagram (venn_diagram) - Optional
A PlotlyResource containing an interactive Venn diagram showing:
- 4 Overlapping Circles: One per data type (Info, Raw Data, Medium, Follow-up)
- Circle Labels: Show count of samples with that data type
- Center Label: Shows count of complete samples (all 4 data types present)
- Color Coding:
  - Blue: Info data
  - Green: Raw Data
  - Orange: Medium data
  - Purple: Follow-up data
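The numbers shown on the circle labels and in the center can be derived from plain set operations. The sketch below only computes the counts and does not draw anything; the function name availability_counts is hypothetical, and it assumes that each source has already been resolved to a set of (ESSAI, FERMENTEUR) keys (for the medium, that means the samples whose MILIEU was found in the Medium CSV).

```python
def availability_counts(keys_by_source: dict) -> dict:
    """Count samples per data type, plus samples present in every source."""
    counts = {source: len(keys) for source, keys in keys_by_source.items()}
    if keys_by_source:
        complete = set.intersection(*(set(k) for k in keys_by_source.values()))
    else:
        complete = set()
    counts["complete"] = len(complete)
    return counts


keys_by_source = {
    "info": {("EPA-WP3-25-001", "23A"), ("EPA-WP3-25-001", "23B")},
    "raw_data": {("EPA-WP3-25-001", "23A"), ("EPA-WP3-25-001", "23B")},
    "medium": {("EPA-WP3-25-001", "23A")},
    "follow_up": {("EPA-WP3-25-001", "23A")},
}
print(availability_counts(keys_by_source))
```

In this made-up example only fermenter 23A is present in all four sources, so the center count is 1.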
Data Quality
Missing Data Handling
The task detects and reports missing data through tags:
- Samples with missing Info will have a missing_value tag including "info"
- Samples with missing Raw Data will have a tag including "raw_data"
- Samples with missing Medium will have a tag including "medium"
- Samples with missing Follow-up will have a tag including "follow_up"
- Samples with empty Follow-up files will have a tag including "follow_up_empty"
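As an illustration of how such a tag value could be assembled, here is a small sketch. The function name missing_value_tag, the plain "," separator without spaces, and the idea that an empty follow-up file is tracked in a separate set are assumptions for the example, not details confirmed by the task.

```python
SOURCES = ("info", "raw_data", "medium", "follow_up")


def missing_value_tag(key, keys_by_source, empty_follow_ups=frozenset()):
    """Return the comma-separated missing_value tag for one sample key.

    An empty string means that no data type is missing.
    """
    missing = [s for s in SOURCES if key not in keys_by_source.get(s, set())]
    # Assumption: a follow-up file that exists but holds no rows is reported
    # as "follow_up_empty" rather than "follow_up".
    if key in empty_follow_ups:
        missing.append("follow_up_empty")
    return ",".join(missing)


def is_complete(tag_value: str) -> bool:
    """A sample is complete when its missing_value tag is empty."""
    return tag_value == ""
```

A downstream step can then separate complete from incomplete samples by checking whether the tag is empty, or look for a specific entry such as "medium" after splitting the value on commas.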
Data Validation
- Checks for matching (ESSAI, FERMENTEUR) pairs across all data sources
- Normalizes decimal separators for consistent numeric parsing
- Filters out negative time values from follow-up data
- Preserves all original columns and metadata
Use Cases
- Quality Control: Use Venn diagram to quickly assess data completeness
- Exploratory Analysis: Browse merged data with all temporal measurements
- Selective Processing: Use tags to filter complete vs incomplete samples
- Dashboard Display: Visualize data availability and sample information
- Downstream Analysis: Provides clean, tagged data for filtering and interpolation
Example Workflow
[Info CSV]  ──┐
[Raw Data]  ──┼──> FermentalgLoadData ──┬──> [Resource Set] ──> Filter/Analysis
[Medium]    ──┤                         └──> [Venn Diagram] ──> Dashboard
[Follow-up] ──┘
Notes
- All CSV files should use UTF-8, Latin-1, or CP1252 encoding
- Accepted separators: comma (,) or semicolon (;)
- Column names are normalized to uppercase for matching
- Follow-up files starting with "._" (macOS metadata) are ignored
- Time columns are automatically detected and tagged for indexing
- Output tables can be directly used with Filter and Interpolation tasks
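For readers preparing input files, the encoding and separator rules above can be approximated as follows. This is a sketch under assumptions, not the task's loader: the helper names are hypothetical, the encodings are simply tried in a fixed order, and the separator is guessed by counting characters in the header line.

```python
import io

import pandas as pd

# "utf-8-sig" also accepts plain UTF-8 and strips a leading byte-order mark.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_csv_bytes(data: bytes) -> str:
    """Decode with the first encoding that succeeds."""
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not reached in practice: latin-1 can decode any byte sequence.
    raise ValueError("could not decode CSV content")


def detect_separator(text: str) -> str:
    """Pick ';' or ',' depending on which is more frequent in the header."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    return ";" if header.count(";") > header.count(",") else ","


def load_csv(data: bytes) -> pd.DataFrame:
    """Read a CSV as text columns with upper-cased column names."""
    text = decode_csv_bytes(data)
    frame = pd.read_csv(io.StringIO(text), sep=detect_separator(text), dtype=str)
    frame.columns = [str(col).strip().upper() for col in frame.columns]
    return frame
```

Trying UTF-8 first matters because Latin-1 accepts any byte sequence, so placing it earlier would silently misread accented characters in UTF-8 files.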