Data Requirements & Conventions

Minimum Required Data Structure

ResourceSet Structure

The resource_set output must contain one table per batch-sample pair, named after that pair (e.g., "B001_S1"). Each table holds the complete time series for its batch-sample combination.

Structure:

```
ResourceSet
├── Table: "B001_S1"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   ├── Tags: batch='B001', sample='S1', medium='Medium1', missing_value='follow_up'
│   └── Column Tags: Time (h) → is_index_column, OD600 → is_data_column, ...
│
├── Table: "B001_S2"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   └── Tags: batch='B001', sample='S2', medium='Medium2'
│
└── Table: "B002_S1"
    └── ...
```
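If your measurements start out in one long DataFrame, you can split them into the per-pair tables above with a pandas `groupby` on the identifier columns. The sketch below is a minimal example. It assumes the column names "Batch", "Sample", and "Time (h)" from the example table, and the helper `split_by_pair` is hypothetical. It covers only the pandas side; wrapping each DataFrame in the framework's Table and adding it to the ResourceSet is not shown.

```python
import pandas as pd


def split_by_pair(df: pd.DataFrame) -> dict:
    """Split a long DataFrame into one DataFrame per batch-sample pair.

    Keys follow the "{batch}_{sample}" naming used for ResourceSet tables.
    """
    tables = {}
    # sort=False keeps pairs in order of first appearance
    for (batch, sample), group in df.groupby(["Batch", "Sample"], sort=False):
        # Order each pair's rows by time so the table reads as a time series
        ordered = group.sort_values("Time (h)").reset_index(drop=True)
        tables[f"{batch}_{sample}"] = ordered
    return tables


# Toy input: only the identifier and time columns are needed to show the split
long_df = pd.DataFrame({
    "Batch": ["B001", "B001", "B001", "B002"],
    "Sample": ["S1", "S1", "S2", "S1"],
    "Time (h)": [1, 0, 0, 0],
})
tables = split_by_pair(long_df)
print(list(tables))  # ['B001_S1', 'B001_S2', 'B002_S1']
```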

Individual Table Structure

Each batch-sample table must contain:

Batch column: String or numeric, batch identifier (e.g., "B001", "ESSAI01")

Sample column: String or numeric, sample identifier (e.g., "S1", "FERMENTEUR1")

Time column: Numeric time points (e.g., 0, 1, 2, 24, 48); named "Time (h)" in the examples below and tagged as the index column

Measurement columns: Numeric columns holding the measured values (e.g., OD600, pH, Glucose)
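Before building a table, you can check these requirements up front. The sketch below is a minimal validation helper, not part of the framework. It assumes the time column is named "Time (h)" as in the example, and the name `check_pair_table` is hypothetical. It also checks that a table holds exactly one batch-sample pair.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

REQUIRED = ("Batch", "Sample", "Time (h)")


def check_pair_table(df: pd.DataFrame, measurement_cols) -> list:
    """Return a list of problems; an empty list means the table looks valid."""
    problems = [f"missing column: {c}" for c in REQUIRED if c not in df.columns]
    if "Time (h)" in df.columns and not is_numeric_dtype(df["Time (h)"]):
        problems.append("Time (h) is not numeric")
    for col in measurement_cols:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    # A table must describe a single batch-sample pair
    if {"Batch", "Sample"} <= set(df.columns):
        if len(df[["Batch", "Sample"]].drop_duplicates()) > 1:
            problems.append("table mixes several batch-sample pairs")
    return problems


good = pd.DataFrame({
    "Batch": ["B001", "B001"], "Sample": ["S1", "S1"],
    "Time (h)": [0, 1], "OD600": [0.05, 0.08], "pH": [7.0, 6.9],
})
bad = good.assign(pH=["7.0", "6.9"])  # pH stored as text
print(check_pair_table(good, ["OD600", "pH"]))  # []
print(check_pair_table(bad, ["OD600", "pH"]))   # ['pH is not numeric']
```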

Example table for batch-sample pair "B001_S1":

| Batch | Sample | Time (h) | OD600 | pH  | Glucose (g/L) | Source    |
|-------|--------|----------|-------|-----|---------------|-----------|
| B001  | S1     | 0        | 0.05  | 7.0 | 10.0          | raw_data  |
| B001  | S1     | 1        | 0.08  | 6.9 | 9.5           | raw_data  |
| B001  | S1     | 2        | 0.12  | 6.8 | 9.0           | raw_data  |
| B001  | S1     | 24       | 2.50  | 6.5 | 2.0           | follow_up |
| B001  | S1     | 48       | 4.20  | 6.3 | 0.5           | follow_up |

Note: Raw data and follow-up data are merged into a single table per batch-sample pair. The Source column records which input each row came from (raw_data or follow_up).
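One way to produce such a merged table is to stack the two inputs and label each row's origin. The sketch below uses `pd.concat` and assumes both inputs share the column names of the example table. The helper `merge_sources` is hypothetical, and the rows reuse the values shown for "B001_S1" above.

```python
import pandas as pd


def merge_sources(raw: pd.DataFrame, follow_up: pd.DataFrame) -> pd.DataFrame:
    """Stack raw and follow-up rows for one pair, recording provenance in Source."""
    parts = []
    if not raw.empty:
        parts.append(raw.assign(Source="raw_data"))
    if not follow_up.empty:
        parts.append(follow_up.assign(Source="follow_up"))
    if not parts:
        return pd.DataFrame()
    merged = pd.concat(parts, ignore_index=True)
    # Interleave both sources by time; stable sort keeps input order on ties
    return merged.sort_values("Time (h)", kind="stable").reset_index(drop=True)


raw = pd.DataFrame({
    "Batch": "B001", "Sample": "S1",
    "Time (h)": [0, 1, 2], "OD600": [0.05, 0.08, 0.12], "pH": [7.0, 6.9, 6.8],
})
follow_up = pd.DataFrame({
    "Batch": "B001", "Sample": "S1",
    "Time (h)": [24, 48], "OD600": [2.50, 4.20], "pH": [6.5, 6.3],
})
merged = merge_sources(raw, follow_up)
print(merged[["Time (h)", "Source"]])
```

Sorting by time after the concat keeps the table a valid time series even if a follow-up point falls between raw-data points.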

Metadata Table Structure

The metadata_table output (separate from the ResourceSet) contains one row per batch-sample pair, with the following columns:

Series column: String identifier combining batch and sample (e.g., "B001_S1"), matching the name of the corresponding ResourceSet table

Medium column: String, the medium name

Medium composition columns: Numeric, one column per ingredient in the medium CSV (e.g., Glucose, Nitrogen)

Follow-up median columns: Numeric, optional, named "<measurement>_median" and holding the median of that measurement over the follow-up data (e.g., OD600_median)

Example metadata table:

| Series  | Medium  | Glucose | Nitrogen | pH_median | OD600_median |
|---------|---------|---------|----------|-----------|--------------|
| B001_S1 | Medium1 | 10.0    | 2.0      | 6.4       | 3.35         |
| B001_S2 | Medium2 | 10.0    | 4.0      | 6.5       | 3.80         |
| B002_S1 | Medium1 | 10.0    | 2.0      | 6.3       | 3.10         |
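A metadata row can be assembled from the medium composition and the follow-up rows of one pair. The sketch below is hypothetical: the helper `metadata_row` and its parameters are not part of the framework. It takes medians over follow-up rows only, as described above. With the follow-up rows of the "B001_S1" example table (OD600 2.50 and 4.20, pH 6.5 and 6.3), it reproduces that pair's row in the table above.

```python
import pandas as pd


def metadata_row(series, medium, composition, follow_up, measurement_cols):
    """Build one metadata row: series id, medium, composition, follow-up medians."""
    row = {"Series": series, "Medium": medium, **composition}
    for col in measurement_cols:
        has_data = not follow_up.empty and col in follow_up.columns
        # Median over follow-up rows only; NaN when no follow-up data exists
        row[f"{col}_median"] = follow_up[col].median() if has_data else float("nan")
    return row


follow_up = pd.DataFrame({"Time (h)": [24, 48], "OD600": [2.50, 4.20], "pH": [6.5, 6.3]})
meta = pd.DataFrame([
    metadata_row("B001_S1", "Medium1", {"Glucose": 10.0, "Nitrogen": 2.0},
                 follow_up, ["pH", "OD600"]),
])
print(meta)
```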

Tagging Conventions

The framework relies on tags to identify each table's batch, sample, and medium, and to tell the time index column apart from the measurement columns.

Table-Level Tags (on each batch-sample table)

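```python
# Table and Tag come from the framework; merged_df, batch, sample, and
# missing_value (e.g. 'follow_up', or None) describe the current pair.

# Create the table and name it after its batch-sample pair
table = Table(merged_df)
table.name = f"{batch}_{sample}"

# Add table-level tags
table.add_tag(Tag('batch', 'B001'))           # Batch identifier
table.add_tag(Tag('sample', 'S1'))            # Sample identifier
table.add_tag(Tag('medium', 'Medium1',        # Medium name + composition
    additional_info={'composed': {'Glucose': 10.0, 'Nitrogen': 2.0}}))

# Optional: add only when a data source is missing (see Missing Value Convention)
if missing_value:
    table.add_tag(Tag('missing_value', missing_value))
```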

Column-Level Tags (on each table)

```python
# Tag the time column as the index column
table.add_column_tag_by_name('Time (h)', 'is_index_column', 'true')

# Tag measurement columns as data columns
# (names must match the column headers exactly)
table.add_column_tag_by_name('OD600', 'is_data_column', 'true')
table.add_column_tag_by_name('pH', 'is_data_column', 'true')
table.add_column_tag_by_name('Glucose (g/L)', 'is_data_column', 'true')

# DO NOT tag Batch, Sample, Source, or medium columns as data columns:
# they hold identifiers and provenance, not measurement data
```
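Rather than listing every measurement column by hand, you can derive the column tags from the DataFrame. The sketch below assumes a simple rule: the time column is the index, identifier and provenance columns are never tagged, and every other numeric column is a data column. The helper `column_tags` and the `METADATA_COLUMNS` set are hypothetical. Applying the result with the framework's `add_column_tag_by_name` is shown only as a comment.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

INDEX_COLUMN = "Time (h)"
METADATA_COLUMNS = {"Batch", "Sample", "Source", "Medium"}


def column_tags(df: pd.DataFrame) -> dict:
    """Map each column to the tag key it should receive, or None for no tag."""
    tags = {}
    for col in df.columns:
        if col == INDEX_COLUMN:
            tags[col] = "is_index_column"
        elif col in METADATA_COLUMNS:
            tags[col] = None  # identifiers/provenance, even if numeric
        elif is_numeric_dtype(df[col]):
            tags[col] = "is_data_column"
        else:
            tags[col] = None  # non-numeric columns are left untagged
    return tags


df = pd.DataFrame({
    "Batch": ["B001"], "Sample": ["S1"], "Time (h)": [0], "OD600": [0.05],
    "pH": [7.0], "Glucose (g/L)": [10.0], "Source": ["raw_data"],
})
tags = column_tags(df)

# With a framework Table built from the same DataFrame, the mapping could be applied as:
# for col, key in tags.items():
#     if key is not None:
#         table.add_column_tag_by_name(col, key, 'true')
```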

Missing Value Convention

The optional missing_value tag records which data sources are missing for a batch-sample pair. When several are missing, the codes are combined in one comma-separated value (e.g., 'info, follow_up_empty'):

'info': Sample not found in the info CSV

'raw_data': No raw data file for this batch-sample pair

'follow_up': No follow-up data file for this batch-sample pair

'follow_up_empty': Follow-up file exists but contains no data
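The tag value can be computed from a few availability checks. The sketch below is hypothetical. The helper `missing_value_tag` and its boolean inputs are illustrative, and how you detect each condition depends on your file layout. It lists codes in the order given above and joins them with ", ", which matches the 'info, follow_up_empty' example. 'follow_up' and 'follow_up_empty' are treated as mutually exclusive.

```python
def missing_value_tag(in_info, has_raw, has_follow_up_file, follow_up_rows):
    """Return the missing_value tag string, or None if every source is present."""
    missing = []
    if not in_info:
        missing.append("info")
    if not has_raw:
        missing.append("raw_data")
    if not has_follow_up_file:
        missing.append("follow_up")
    elif follow_up_rows == 0:
        missing.append("follow_up_empty")  # file exists but has no rows
    return ", ".join(missing) or None


print(missing_value_tag(False, True, True, 0))  # info, follow_up_empty
print(missing_value_tag(True, True, False, 0))  # follow_up
print(missing_value_tag(True, True, True, 5))   # None
```

Returning None for a complete pair lets the caller skip the optional tag entirely.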

Example:

```python
# Sample has raw data but no follow-up
table.add_tag(Tag('missing_value', 'follow_up'))

# Sample has no raw data, but has follow-up
table.add_tag(Tag('missing_value', 'raw_data'))

# Sample is missing from the info CSV and has an empty follow-up file
table.add_tag(Tag('missing_value', 'info, follow_up_empty'))
```
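Downstream code reading a missing_value tag can split a combined value back into individual codes. The sketch below assumes you already have the tag's value as a string. The helper `parse_missing_value` and the `KNOWN_CODES` set are hypothetical, built from the four codes listed above.

```python
KNOWN_CODES = {"info", "raw_data", "follow_up", "follow_up_empty"}


def parse_missing_value(tag_value):
    """Split a value such as 'info, follow_up_empty' into a set of codes."""
    if not tag_value:
        return set()
    codes = {part.strip() for part in tag_value.split(",") if part.strip()}
    unknown = codes - KNOWN_CODES
    if unknown:
        raise ValueError(f"unknown missing_value codes: {sorted(unknown)}")
    return codes


codes = parse_missing_value("info, follow_up_empty")
# The pair has no usable follow-up rows if either follow-up code is present
no_follow_up = bool(codes & {"follow_up", "follow_up_empty"})
print(sorted(codes), no_follow_up)  # ['follow_up_empty', 'info'] True
```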