Publication date: Sep 19, 2024
Confidentiality: Public

Data Requirements & Conventions

Minimum Required Data Structure


ResourceSet Structure


The resource_set output must contain one table per batch-sample pair. Each table represents the complete time-series data for that specific batch-sample combination.


Structure:


ResourceSet
├── Table: "B001_S1"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   ├── Tags: batch='B001', sample='S1', medium='Medium1', missing_value='follow_up'
│   └── Column Tags: Time (h) → is_index_column, OD600 → is_data_column, ...
│
├── Table: "B001_S2"
│   ├── Columns: Batch, Sample, Time (h), OD600, pH, Glucose, ...
│   └── Tags: batch='B001', sample='S2', medium='Medium2'
│
└── Table: "B002_S1"
    └── ...
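
The one-table-per-pair layout above can be sketched in plain Python. This is only an illustration of the grouping and naming convention, not the app's implementation: `split_by_batch_sample` is a hypothetical helper, rows are plain dictionaries rather than framework tables, and the OD600 values are made up.

```python
from collections import defaultdict


def split_by_batch_sample(rows):
    """Group flat measurement rows into one list of rows per batch-sample pair."""
    tables = defaultdict(list)
    for row in rows:
        # Table names follow the "<batch>_<sample>" convention, e.g. "B001_S1"
        name = f"{row['Batch']}_{row['Sample']}"
        tables[name].append(row)
    # Keep each time series ordered by its time column
    for series in tables.values():
        series.sort(key=lambda r: r["Time (h)"])
    return dict(tables)


# Illustrative rows only (hypothetical values)
rows = [
    {"Batch": "B001", "Sample": "S1", "Time (h)": 1, "OD600": 0.08},
    {"Batch": "B001", "Sample": "S2", "Time (h)": 0, "OD600": 0.04},
    {"Batch": "B001", "Sample": "S1", "Time (h)": 0, "OD600": 0.05},
    {"Batch": "B002", "Sample": "S1", "Time (h)": 0, "OD600": 0.06},
]
tables = split_by_batch_sample(rows)
print(sorted(tables))
```

Each resulting list would then be wrapped in its own table and added to the ResourceSet under that name.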

Individual Table Structure


Each batch-sample table must contain:


  • Batch column: String or numeric, batch identifier (e.g., "B001", "ESSAI01")
  • Sample column: String or numeric, sample identifier (e.g., "S1", "FERMENTEUR1")
  • Time column: Numeric, representing time points (e.g., 0, 1, 2, 3, ...)
  • Measurement columns: Numeric columns with actual data (e.g., OD600, pH, Glucose)

          Example table for batch-sample pair "B001_S1":


          | Batch | Sample | Time (h) | OD600 | pH  | Glucose (g/L) | Source    |
          |-------|--------|----------|-------|-----|---------------|-----------|
          | B001  | S1     | 0        | 0.05  | 7.0 | 10.0          | raw_data  |
          | B001  | S1     | 1        | 0.08  | 6.9 | 9.5           | raw_data  |
          | B001  | S1     | 2        | 0.12  | 6.8 | 9.0           | raw_data  |
          | B001  | S1     | 24       | 2.50  | 6.5 | 2.0           | follow_up |
          | B001  | S1     | 48       | 4.20  | 6.3 | 0.5           | follow_up |

          Note: Raw data and follow-up data are merged into a single table per batch-sample pair.
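
As a rough sketch of the merge described in the note, the snippet below combines raw and follow-up rows for one batch-sample pair, labels each row in a `Source` column, checks that the time value is numeric, and orders the result by time. `merge_sources` is a hypothetical helper working on plain dictionaries; the app's own merge logic may differ.

```python
def merge_sources(raw_rows, follow_up_rows, time_col="Time (h)"):
    """Merge raw and follow-up rows into one time-ordered list, labelling each row's origin."""
    merged = [dict(row, Source="raw_data") for row in raw_rows]
    merged += [dict(row, Source="follow_up") for row in follow_up_rows]
    for row in merged:
        value = row[time_col]
        # The time column must be numeric (bool is rejected explicitly)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{time_col} must be numeric, got {value!r}")
    return sorted(merged, key=lambda row: row[time_col])


# Rows taken from the example table for "B001_S1"
raw = [
    {"Batch": "B001", "Sample": "S1", "Time (h)": 0, "OD600": 0.05, "pH": 7.0, "Glucose (g/L)": 10.0},
    {"Batch": "B001", "Sample": "S1", "Time (h)": 1, "OD600": 0.08, "pH": 6.9, "Glucose (g/L)": 9.5},
    {"Batch": "B001", "Sample": "S1", "Time (h)": 2, "OD600": 0.12, "pH": 6.8, "Glucose (g/L)": 9.0},
]
follow_up = [
    {"Batch": "B001", "Sample": "S1", "Time (h)": 48, "OD600": 4.20, "pH": 6.3, "Glucose (g/L)": 0.5},
    {"Batch": "B001", "Sample": "S1", "Time (h)": 24, "OD600": 2.50, "pH": 6.5, "Glucose (g/L)": 2.0},
]
table_rows = merge_sources(raw, follow_up)
print([(r["Time (h)"], r["Source"]) for r in table_rows])
```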


          Metadata Table Structure


          The metadata_table output (separate from ResourceSet) contains one row per batch-sample pair with:


          • Series column: String identifier combining batch and sample (e.g., "B001_S1")
          • Medium column: String, medium name
          • Medium composition columns: Numeric, ingredients from medium CSV (e.g., Glucose, Nitrogen)
          • Follow-up median columns: Numeric, median values from follow-up data (optional)

                  Example metadata table:


                  | Series  | Medium  | Glucose | Nitrogen | pH_median | OD600_median |
                  |---------|---------|---------|----------|-----------|--------------|
                  | B001_S1 | Medium1 | 10.0    | 2.0      | 6.4       | 3.35         |
                  | B001_S2 | Medium2 | 10.0    | 4.0      | 6.5       | 3.80         |
                  | B002_S1 | Medium1 | 10.0    | 2.0      | 6.3       | 3.10         |
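
To make the origin of the median columns concrete, here is a minimal sketch that builds one metadata row from a batch-sample table. The helper names (`follow_up_medians`, `metadata_row`) and the rounding to two decimals are assumptions for illustration; the input values are the follow-up rows of the "B001_S1" example table.

```python
from statistics import median


def follow_up_medians(rows, data_columns):
    """Median of each data column over the follow-up rows only (empty if there are none)."""
    follow_up = [r for r in rows if r["Source"] == "follow_up"]
    if not follow_up:
        return {}  # median columns are optional
    return {
        f"{col}_median": round(median(r[col] for r in follow_up), 2)
        for col in data_columns
    }


def metadata_row(batch, sample, medium, composition, rows, data_columns):
    """Assemble one metadata row: series id, medium, composition, follow-up medians."""
    row = {"Series": f"{batch}_{sample}", "Medium": medium}
    row.update(composition)
    row.update(follow_up_medians(rows, data_columns))
    return row


rows = [
    {"Time (h)": 0, "OD600": 0.05, "pH": 7.0, "Source": "raw_data"},
    {"Time (h)": 24, "OD600": 2.50, "pH": 6.5, "Source": "follow_up"},
    {"Time (h)": 48, "OD600": 4.20, "pH": 6.3, "Source": "follow_up"},
]
row = metadata_row(
    "B001", "S1", "Medium1",
    {"Glucose": 10.0, "Nitrogen": 2.0},
    rows, ["pH", "OD600"],
)
print(row)
```

With these inputs the medians correspond to the first row of the example metadata table (pH 6.4, OD600 3.35).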

                  Tagging Conventions


                  The framework relies on tags to identify each table's batch, sample, and medium, and to tell the index column apart from the data columns.


                  Table-Level Tags (on each batch-sample table)


                  from gws_core import Table, Tag

                  # Create the table from the merged raw + follow-up DataFrame
                  table = Table(merged_df)
                  table.name = f"{batch}_{sample}"              # e.g. "B001_S1"
                  
                  # Add table-level tags
                  table.add_tag(Tag('batch', 'B001'))           # Batch identifier
                  table.add_tag(Tag('sample', 'S1'))            # Sample identifier
                  table.add_tag(Tag('medium', 'Medium1',        # Medium name + composition
                      additional_info={'composed': {'Glucose': 10.0, 'Nitrogen': 2.0}}))
                  table.add_tag(Tag('missing_value', 'raw_data'))  # Optional: data sources missing for this pair

                  Column-Level Tags (on each table)


                  # Tag time column as index
                  table.add_column_tag_by_name('Time (h)', 'is_index_column', 'true')
                  
                  # Tag measurement columns as data columns
                  table.add_column_tag_by_name('OD600', 'is_data_column', 'true')
                  table.add_column_tag_by_name('pH', 'is_data_column', 'true')
                  table.add_column_tag_by_name('Glucose', 'is_data_column', 'true')
                  
                  # DO NOT tag batch, sample, or medium columns as data columns
                  # These are metadata columns, not measurement data
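
The rule behind these calls (one index column, every measurement column tagged as data, metadata columns left untagged) can be sketched as a small planning step. `plan_column_tags` is a hypothetical helper, not part of the framework; it only decides which tag key each column should receive.

```python
def plan_column_tags(columns, index_column, metadata_columns):
    """Map each column to the tag key it should receive; metadata columns get no tag."""
    if index_column not in columns:
        raise ValueError(f"Index column {index_column!r} not found in table columns")
    plan = {}
    for col in columns:
        if col == index_column:
            plan[col] = "is_index_column"
        elif col not in metadata_columns:
            plan[col] = "is_data_column"
    return plan


columns = ["Batch", "Sample", "Time (h)", "OD600", "pH", "Glucose"]
plan = plan_column_tags(columns, "Time (h)", metadata_columns={"Batch", "Sample", "Medium"})
for column, tag_key in plan.items():
    # Each entry corresponds to one add_column_tag_by_name(column, tag_key, 'true') call
    print(column, "->", tag_key)
```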

                  Missing Value Convention


                  The missing_value tag tracks which data sources are missing for a batch-sample pair:


                  • 'info': Sample not found in info CSV
                  • 'raw_data': No raw data file for this batch-sample
                  • 'follow_up': No follow-up data file for this batch-sample
                  • 'follow_up_empty': Follow-up file exists but contains no data

                          Example:


                          # Sample has raw data but no follow-up
                          table.add_tag(Tag('missing_value', 'follow_up'))
                          
                          # Sample has no raw data, but has follow-up
                          table.add_tag(Tag('missing_value', 'raw_data'))
                          
                          # Sample is missing from info CSV and has empty follow-up
                          table.add_tag(Tag('missing_value', 'info, follow_up_empty'))
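
The tag value in these examples could be derived from a few availability checks, as in the sketch below. `missing_value_tag` and its parameters are hypothetical; joining multiple values with ", " mirrors the last example above, while the ordering of the values is an assumption.

```python
def missing_value_tag(in_info, has_raw, follow_up_state):
    """Build the missing_value tag value, or None when nothing is missing.

    follow_up_state is one of 'present', 'missing' or 'empty'.
    """
    if follow_up_state not in ("present", "missing", "empty"):
        raise ValueError(f"Unknown follow-up state: {follow_up_state!r}")
    missing = []
    if not in_info:
        missing.append("info")
    if not has_raw:
        missing.append("raw_data")
    if follow_up_state == "missing":
        missing.append("follow_up")
    elif follow_up_state == "empty":
        missing.append("follow_up_empty")
    return ", ".join(missing) if missing else None


print(missing_value_tag(True, True, "missing"))
print(missing_value_tag(True, False, "present"))
print(missing_value_tag(False, True, "empty"))
```

When the helper returns None, no missing_value tag would be added to the table.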