Minimum Required Data Structure
ResourceSet Structure
The resource_set output must contain one table per batch-sample pair. Each table represents the complete time-series data for that specific batch-sample combination.
Structure:
ResourceSet
├── Table: "B001_S1"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   ├── Tags: batch='B001', sample='S1', medium='Medium1', missing_value='follow_up'
│   └── Column Tags: Time (h) → is_index_column, OD600 → is_data_column, ...
│
├── Table: "B001_S2"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   └── Tags: batch='B001', sample='S2', medium='Medium2'
│
└── Table: "B002_S1"
    └── ...
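As a minimal sketch of the one-table-per-pair rule, the snippet below groups flat measurement rows by (batch, sample) and names each group the way the tree above names its tables. It uses plain dicts and lists rather than the framework's Table class; the helper `group_by_pair` and the sample rows are illustrative assumptions, not part of the framework.

```python
from collections import defaultdict


def group_by_pair(rows):
    """Group flat rows into one list of rows per batch-sample pair.

    Keys follow the "<batch>_<sample>" pattern used for table names.
    (Hypothetical helper, for illustration only.)
    """
    groups = defaultdict(list)
    for row in rows:
        groups[f"{row['Batch']}_{row['Sample']}"].append(row)
    # Keep each time series ordered by its time column
    for series in groups.values():
        series.sort(key=lambda r: r["Time (h)"])
    return dict(groups)


# Illustrative rows, deliberately out of order
rows = [
    {"Batch": "B001", "Sample": "S1", "Time (h)": 1, "OD600": 0.08},
    {"Batch": "B001", "Sample": "S2", "Time (h)": 0, "OD600": 0.04},
    {"Batch": "B001", "Sample": "S1", "Time (h)": 0, "OD600": 0.05},
]
tables = group_by_pair(rows)
print(sorted(tables))  # ['B001_S1', 'B001_S2']
```

Each value in `tables` would then become one Table in the ResourceSet, with the dict key as its name.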
Individual Table Structure
Each batch-sample table must contain:
- Batch column: String or numeric, batch identifier (e.g., "B001", "ESSAI01")
- Sample column: String or numeric, sample identifier (e.g., "S1", "FERMENTEUR1")
- Time column: Numeric, representing time points (e.g., 0, 1, 2, 3, ...)
- Measurement columns: Numeric columns with actual data (e.g., OD600, pH, Glucose)
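The column requirements above can be checked before a table is built. The following is a rough sketch, assuming the table is available as a mapping from column name to list of values; `validate_table` is a hypothetical helper, and it does not handle missing entries such as None.

```python
from numbers import Number


def validate_table(columns, time_column, measurement_columns):
    """Return a list of problems found in a column-name -> values mapping.

    Hypothetical helper: checks that the required columns exist and that
    the time and measurement columns hold only numeric values.
    """
    problems = []
    for name in ("Batch", "Sample", time_column, *measurement_columns):
        if name not in columns:
            problems.append(f"missing column: {name}")
    for name in (time_column, *measurement_columns):
        values = columns.get(name, [])
        # bool is a Number subclass in Python, so exclude it explicitly
        if not all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            problems.append(f"non-numeric values in column: {name}")
    return problems


table = {
    "Batch": ["B001", "B001"],
    "Sample": ["S1", "S1"],
    "Time (h)": [0, 1],
    "OD600": [0.05, 0.08],
    "pH": [7.0, "6.9"],  # a string slipped into a measurement column
}
problems = validate_table(table, "Time (h)", ["OD600", "pH"])
print(problems)  # ['non-numeric values in column: pH']
```

Batch and Sample are only checked for presence, since both may be strings or numbers.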
Example table for batch-sample pair "B001_S1":
| Batch | Sample | Time (h) | OD600 | pH | Glucose (g/L) | Source |
|-------|--------|----------|-------|-----|---------------|-----------|
| B001 | S1 | 0 | 0.05 | 7.0 | 10.0 | raw_data |
| B001 | S1 | 1 | 0.08 | 6.9 | 9.5 | raw_data |
| B001 | S1 | 2 | 0.12 | 6.8 | 9.0 | raw_data |
| B001 | S1 | 24 | 2.50 | 6.5 | 2.0 | follow_up |
| B001 | S1 | 48 | 4.20 | 6.3 | 0.5 | follow_up |
Note: Raw data and follow-up data are merged into a single table per batch-sample pair.
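One way to picture this merge, sketched with plain dicts: rows from both sources are labelled with their origin and then sorted by time. The function `merge_sources` is hypothetical, and the `Source` labels are taken from the example table above; the framework's actual merge logic may differ.

```python
def merge_sources(raw_rows, follow_up_rows, time_column="Time (h)"):
    """Merge raw and follow-up rows into one time-ordered list.

    Each output row is a copy of the input row with a 'Source' entry
    recording where it came from. (Hypothetical helper.)
    """
    merged = [dict(row, Source="raw_data") for row in raw_rows]
    merged += [dict(row, Source="follow_up") for row in follow_up_rows]
    merged.sort(key=lambda row: row[time_column])
    return merged


raw = [
    {"Time (h)": 0, "OD600": 0.05},
    {"Time (h)": 1, "OD600": 0.08},
    {"Time (h)": 2, "OD600": 0.12},
]
follow_up = [
    {"Time (h)": 24, "OD600": 2.50},
    {"Time (h)": 48, "OD600": 4.20},
]
merged = merge_sources(raw, follow_up)
for row in merged:
    print(row["Time (h)"], row["OD600"], row["Source"])
```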
Metadata Table Structure
The metadata_table output (separate from ResourceSet) contains one row per batch-sample pair with:
- Series column: String identifier combining batch and sample (e.g., "B001_S1")
- Medium column: String, medium name
- Medium composition columns: Numeric, ingredients from medium CSV (e.g., Glucose, Nitrogen)
- Follow-up median columns: Numeric, median values from follow-up data (optional)
Example metadata table:
| Series | Medium | Glucose | Nitrogen | pH_median | OD600_median |
|---------|---------|---------|----------|-----------|--------------|
| B001_S1 | Medium1 | 10.0 | 2.0 | 6.4 | 3.35 |
| B001_S2 | Medium2 | 10.0 | 4.0 | 6.5 | 3.80 |
| B002_S1 | Medium1 | 10.0 | 2.0 | 6.3 | 3.10 |
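To make the row layout concrete, here is a small sketch that assembles one metadata row from a medium composition and the follow-up rows of a single pair. The helper `metadata_row` and the rounding to two decimals are assumptions made for illustration. Fed with the follow-up rows of the "B001_S1" example table (pH 6.5 and 6.3, OD600 2.50 and 4.20), it yields the medians shown in the first row above.

```python
from statistics import median


def metadata_row(batch, sample, medium, composition, follow_up_rows,
                 measured=("pH", "OD600")):
    """Build one metadata row for a batch-sample pair.

    Hypothetical helper. Median columns are added only when follow-up
    values exist, since they are optional.
    """
    row = {"Series": f"{batch}_{sample}", "Medium": medium, **composition}
    for name in measured:
        values = [r[name] for r in follow_up_rows if name in r]
        if values:
            # Rounding to 2 decimals is an assumption, not a framework rule
            row[f"{name}_median"] = round(median(values), 2)
    return row


follow_up = [
    {"Time (h)": 24, "OD600": 2.50, "pH": 6.5},
    {"Time (h)": 48, "OD600": 4.20, "pH": 6.3},
]
row = metadata_row("B001", "S1", "Medium1",
                   {"Glucose": 10.0, "Nitrogen": 2.0}, follow_up)
print(row)
```

With an empty `follow_up_rows` list, the row contains only the Series, Medium, and composition entries.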
Tagging Conventions
Tags are critical for the framework to understand your data structure.
Table-Level Tags (on each batch-sample table)
# Create the table for one batch-sample pair
batch, sample = 'B001', 'S1'
table = Table(merged_df)
table.name = f"{batch}_{sample}"
# Add table-level tags
table.add_tag(Tag('batch', batch)) # Batch identifier
table.add_tag(Tag('sample', sample)) # Sample identifier
table.add_tag(Tag('medium', 'Medium1', # Medium name + composition
    additional_info={'composed': {'Glucose': 10.0, 'Nitrogen': 2.0}}))
table.add_tag(Tag('missing_value', 'raw_data')) # Optional: data sources missing for this pair
Column-Level Tags (on each table)
# Tag time column as index
table.add_column_tag_by_name('Time (h)', 'is_index_column', 'true')
# Tag measurement columns as data columns
table.add_column_tag_by_name('OD600', 'is_data_column', 'true')
table.add_column_tag_by_name('pH', 'is_data_column', 'true')
table.add_column_tag_by_name('Glucose', 'is_data_column', 'true')
# DO NOT tag batch, sample, or medium columns as data columns
# These are metadata columns, not measurement data
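When a table has many columns, the tagging decision can be computed rather than written out by hand. The sketch below only plans which tag key each column should receive; it does not call the framework. `plan_column_tags` and the `METADATA_COLUMNS` set (including `Source`) are assumptions for illustration.

```python
# Columns treated as metadata and therefore left untagged (assumed set)
METADATA_COLUMNS = {"Batch", "Sample", "Medium", "Source"}


def plan_column_tags(column_names, time_column="Time (h)"):
    """Map each column to the tag key it should receive.

    The time column becomes the index column, metadata columns are
    skipped, and every other column is treated as measurement data.
    (Hypothetical helper.)
    """
    plan = {}
    for name in column_names:
        if name == time_column:
            plan[name] = "is_index_column"
        elif name not in METADATA_COLUMNS:
            plan[name] = "is_data_column"
    return plan


plan = plan_column_tags(
    ["Batch", "Sample", "Time (h)", "OD600", "pH", "Glucose", "Source"]
)
for name, key in plan.items():
    # Each entry could then be applied with:
    #   table.add_column_tag_by_name(name, key, 'true')
    print(name, "->", key)
```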
Missing Value Convention
The missing_value tag tracks which data sources are missing for a batch-sample pair:
- 'info': Sample not found in info CSV
- 'raw_data': No raw data file for this batch-sample
- 'follow_up': No follow-up data file for this batch-sample
- 'follow_up_empty': Follow-up file exists but contains no data
Example:
# Sample has raw data but no follow-up
table.add_tag(Tag('missing_value', 'follow_up'))
# Sample has no raw data, but has follow-up
table.add_tag(Tag('missing_value', 'raw_data'))
# Sample is missing from info CSV and has empty follow-up
table.add_tag(Tag('missing_value', 'info, follow_up_empty'))
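The tag value can also be derived from what was actually found for a pair. This is a minimal sketch, assuming the comma-separated format of the last example above; `missing_value_tag` is a hypothetical helper, and the ordering of the entries is an assumption.

```python
def missing_value_tag(in_info, has_raw, follow_up_rows):
    """Build the missing_value tag value, or None when nothing is missing.

    follow_up_rows is None when no follow-up file exists, and an empty
    list when the file exists but contains no data. (Hypothetical helper.)
    """
    missing = []
    if not in_info:
        missing.append("info")
    if not has_raw:
        missing.append("raw_data")
    if follow_up_rows is None:
        missing.append("follow_up")
    elif len(follow_up_rows) == 0:
        missing.append("follow_up_empty")
    return ", ".join(missing) if missing else None


print(missing_value_tag(True, True, None))       # follow_up
print(missing_value_tag(True, False, [{"t": 24}]))  # raw_data
print(missing_value_tag(False, True, []))        # info, follow_up_empty
```

A return value of None means the tag can be omitted, which matches its optional status.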