Cell Culture Time Series Subsampling

Typing name : TASK.gws_plate_reader.CellCultureSubsampling

Brick : gws_plate_reader

Subsample fermentation time series data by combining real and interpolated values

[Generated by Task Expert Agent]

Overview

This task creates a combined dataset where:

Interpolated columns use only interpolated values
Non-interpolated columns use only real measured values
Time column contains both real and interpolated time points

Purpose

Smart Data Combination: Use interpolation only where it makes sense
Preserve Real Data: Keep original measurements for columns with sparse data
Enable Comparison: Create uniform datasets while respecting data quality
Prepare for Analysis: Output ready for ML and statistical analysis

Interpolation Methods

Shape-Preserving Methods (Recommended for Biological Data)

`makima` (Default) ★ RECOMMENDED

Modified Akima Interpolation: Enhanced shape-preserving method
Best For: Biological time series with varying growth phases
Advantages:
- Avoids overshoot in steep regions (pH, biomass)
- Natural-looking curves following data trends
- Robust to outliers
- Smooth transitions between regions
Use When: General fermentation data analysis

`pchip`

Piecewise Cubic Hermite Interpolating Polynomial
Best For: Data with local extrema that must be preserved
Advantages:
- Preserves the monotonicity of the data between points (no artificial peaks/valleys)
- Shape-preserving in regions of constant slope
- Good for concentration measurements
Use When: Substrate consumption, product formation curves
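
To illustrate the no-overshoot property, here is a minimal sketch that compares SciPy's `PchipInterpolator` with an unconstrained `CubicSpline` on made-up, step-like data. Whether this task calls these exact SciPy classes internally is an assumption; the data values are invented for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Made-up step-like signal (illustrative values only, not real measurements)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
grid = np.linspace(t[0], t[-1], 501)

# PCHIP stays inside the range of the measured values
pchip_vals = PchipInterpolator(t, y)(grid)

# An unconstrained cubic spline dips below 0 and rises above 1 around the step
cubic_vals = CubicSpline(t, y)(grid)

print("pchip range:", pchip_vals.min(), pchip_vals.max())
print("cubic range:", cubic_vals.min(), cubic_vals.max())
```

For concentration-like quantities that cannot be negative, the cubic result would produce physically impossible values near the step, which is the main reason to prefer a shape-preserving method there.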

`akima`

Original Akima Spline: Classic shape-preserving method
Best For: Smooth biological responses
Advantages:
- Very smooth curves
- Less sensitive to outliers than cubic splines
- Natural interpolation
Use When: Temperature, pressure, flow rate data
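
As a rough sketch of how the two Akima variants can be called in SciPy: recent SciPy releases accept `method="makima"` on `Akima1DInterpolator`, while older releases only provide the original Akima scheme. The helper name `akima_interpolate`, the fallback behavior, and the sample values are assumptions for illustration, not the task's actual code.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator

def akima_interpolate(t, y, grid, modified=True):
    """Hypothetical helper: makima when available, plain Akima otherwise."""
    if modified:
        try:
            # The `method` keyword only exists in newer SciPy releases
            interpolator = Akima1DInterpolator(t, y, method="makima")
        except TypeError:
            # Older SciPy: fall back to the original Akima scheme
            interpolator = Akima1DInterpolator(t, y)
    else:
        interpolator = Akima1DInterpolator(t, y)
    return interpolator(grid)

# Made-up optical density readings (hours, arbitrary units)
t = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
od = np.array([0.05, 0.08, 0.30, 1.20, 1.90, 2.00])
smooth = akima_interpolate(t, od, np.linspace(0.0, 10.0, 51))
```

Both variants pass exactly through the measured points; they differ only in how the slopes at those points are estimated.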

Polynomial Methods

`linear`

Linear Interpolation: Straight lines between points
Best For: High-frequency data, preliminary analysis
Advantages: Fast, simple, no overshoot
Disadvantages: Angular appearance, not smooth
Use When: Quick checks, high sampling rate data

`quadratic`

Quadratic Spline: Piecewise second-degree polynomials
Best For: Slowly varying parameters
Advantages: Smoother than linear, faster than cubic
Use When: Steady-state measurements

`cubic`

Cubic Spline: Piecewise third-degree polynomials
Best For: Very smooth data
Advantages: Smooth, twice differentiable
Disadvantages: Can overshoot between points
Use When: High-quality data with minimal noise

Advanced Spline Methods

`cubic_spline`

Natural Cubic Spline: Global smoothing with boundary conditions
Best For: Complete curve fitting
Advantages: Global smoothness, natural boundary behavior
Use When: Need derivatives, mathematical modeling

`univariate_spline` or `spline`

Adaptive Order B-Spline: Automatic order selection
Best For: Complex curves with varying smoothness
Advantages: Adaptive, handles varied data patterns
Configuration: spline_order (1-5, default=3)
Use When: Unknown data behavior, exploratory analysis

`nearest`

Nearest Neighbor: Step-wise interpolation
Best For: Categorical data, control parameters
Use When: Binary states, control settings over time

Grid Strategies

`global_auto` (Default) ★ RECOMMENDED

Creates ONE common time grid for ALL samples
Advantages:
- Direct sample comparison at same time points
- Optimal for plotting multiple samples together
- Consistent analysis across dataset
Algorithm:
1. Finds global min/max time across all samples
2. Calculates median time step from all samples
3. Generates uniform grid spanning entire range
Use When: Comparing multiple fermentations
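
The three algorithm steps above can be sketched as follows. This is an illustration under stated assumptions, not the task's implementation: the function name `global_auto_grid` is hypothetical, and deriving the point count from the median step when `n_points` is not given is one plausible reading of the auto mode.

```python
import numpy as np

def global_auto_grid(time_arrays, n_points=None):
    """Hypothetical sketch: one uniform grid shared by all samples."""
    times = [np.sort(np.asarray(t, dtype=float)) for t in time_arrays]

    # Step 1: global min/max time across all samples
    t_min = min(t[0] for t in times)
    t_max = max(t[-1] for t in times)

    if n_points is None:
        # Step 2: median time step pooled over all samples
        steps = np.concatenate([np.diff(t) for t in times])
        step = np.median(steps[steps > 0])
        n_points = int(round((t_max - t_min) / step)) + 1

    # Step 3: uniform grid spanning the entire range
    return np.linspace(t_min, t_max, n_points)

# Two made-up samples with irregular sampling (hours)
grid = global_auto_grid([[0, 1, 2, 4], [0.5, 1.5, 2.5, 3.5, 6.0]])
print(grid)
```

Samples that end before the global maximum have grid points outside their measured range; how those points are filled depends on `edge_strategy`.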

`per_file`

Creates INDIVIDUAL time grid for EACH sample
Advantages:
- Preserves each sample's specific time range
- Better for samples with very different durations
- Optimal grid density per sample
Use When: Samples have vastly different durations

`reference`

Uses time grid from ONE specific sample as template
Configuration: reference_index (0-based index of reference sample)
Advantages:
- Forces all samples to exact same time points
- Perfect alignment for direct arithmetic operations
Use When:
- Need exact alignment for calculations
- One sample is the "gold standard"
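
For comparison with `global_auto`, the other two strategies can be sketched in a few lines. The helper names `per_file_grid` and `reference_grid` are hypothetical, and whether the task de-duplicates or re-sorts the reference time points is an assumption.

```python
import numpy as np

def per_file_grid(t, n_points):
    """Hypothetical sketch: uniform grid over one sample's own time range."""
    t = np.asarray(t, dtype=float)
    return np.linspace(t.min(), t.max(), n_points)

def reference_grid(time_arrays, reference_index=0):
    """Hypothetical sketch: reuse the time points of one reference sample."""
    if not 0 <= reference_index < len(time_arrays):
        raise IndexError("reference_index must be in 0..number_of_samples-1")
    # np.unique returns the sorted, de-duplicated time points
    return np.unique(np.asarray(time_arrays[reference_index], dtype=float))

# Made-up time points (hours) for two samples
samples = [[0, 2, 4], [1, 3, 5, 7]]
print(per_file_grid(samples[0], 5))
print(reference_grid(samples, reference_index=1))
```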

Configuration Parameters

Essential Parameters

`method` (String)

Default: "makima"
Options: See Interpolation Methods section
Impact: Determines curve shape and smoothness
Recommendation: Start with makima, try pchip if issues

`grid_strategy` (String)

Default: "global_auto"
Options: "global_auto", "per_file", "reference"
Impact: Determines time alignment across samples
Recommendation: Use default unless specific needs

`n_points` (Integer)

Default: 500
Range: 10 to 20,000
Impact:
- Higher: Smoother curves, larger files, slower processing
- Lower: Faster processing, may miss details
Auto Mode: If not set, calculated from data characteristics
Recommendation:
- 100-300: Quick previews
- 500-1000: Standard analysis
- 1000-5000: Publication quality
- 5000+: Detailed mathematical analysis

Advanced Parameters

`spline_order` (Integer)

Default: 3
Range: 1 to 5
Only For: univariate_spline / spline method
Values:
- 1: Linear (same as linear method)
- 2: Quadratic
- 3: Cubic (smooth, flexible) ★
- 4-5: Very smooth (may oscillate)

`edge_strategy` (String)

Default: "nan" (as listed in the Configuration section)
Options:
- "nearest": Extend with nearest value (flat)
- "linear": Linear extrapolation
- "nan": Fill with NaN (missing)
Impact: How to handle time points outside original range
Recommendation: Use "nearest" when a safe, flat extension beyond the measured range is acceptable
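
The three options can be sketched with plain NumPy, using linear interpolation inside the measured range for simplicity. The function name `apply_edge_strategy` is hypothetical, and the task may implement edge handling differently for each interpolation method.

```python
import numpy as np

def apply_edge_strategy(grid, t, y, edge_strategy):
    """Hypothetical sketch of the three edge options."""
    grid, t, y = (np.asarray(a, dtype=float) for a in (grid, t, y))

    # np.interp clamps to the first/last value outside the range ("nearest")
    values = np.interp(grid, t, y)
    below, above = grid < t[0], grid > t[-1]

    if edge_strategy == "nan":
        values[below | above] = np.nan
    elif edge_strategy == "linear":
        # Extend the first and last segments as straight lines
        values[below] = y[0] + (grid[below] - t[0]) * (y[1] - y[0]) / (t[1] - t[0])
        values[above] = y[-1] + (grid[above] - t[-1]) * (y[-1] - y[-2]) / (t[-1] - t[-2])
    elif edge_strategy != "nearest":
        raise ValueError(f"unknown edge_strategy: {edge_strategy}")
    return values

# Made-up sample measured from 1 h to 3 h, evaluated on a grid from 0 h to 4 h
t, y = [1, 2, 3], [10, 20, 40]
grid = [0, 1, 2, 3, 4]
for strategy in ("nearest", "linear", "nan"):
    print(strategy, apply_edge_strategy(grid, t, y, strategy))
```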

`reference_index` (Integer)

Default: 0
Only For: grid_strategy="reference"
Range: 0 to (number_of_samples - 1)
Impact: Which sample's time grid to use as template

Input Requirements

ResourceSet Structure

Source: CellCultureLoadData or Filter task output
Contents: Table resources with time series data
Required Column: "Time" (culture time in hours)
Required Tags:
- batch: Experiment identifier (preserved in output)
- sample: Sample identifier (preserved in output)
- Column tag is_index_column='true' on time column

Data Quality Requirements

Numeric Data: All measurement columns must be numeric
Decimal Format: Commas automatically converted to dots
Time Values: Must be finite (no NaN/Inf in time column)
Minimum Points: At least 2 time points per sample
Sorting: Time column will be sorted automatically
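
A minimal pandas sketch of the cleaning implied by these requirements (comma-to-dot conversion, dropping rows with invalid time, sorting by time). The function name `clean_sample`, the column names, and the rule for deciding which text columns are numeric are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def clean_sample(df, time_column="Time"):
    """Hypothetical sketch of decimal normalization and time cleaning."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # Convert decimal commas to dots, then try to parse as numbers
        as_text = out[col].astype(str).str.replace(",", ".", regex=False)
        converted = pd.to_numeric(as_text, errors="coerce")
        # Assumption: columns where nothing parses are metadata, kept as-is
        if col == time_column or converted.notna().any():
            out[col] = converted

    # Drop rows whose time is NaN or infinite, then sort by time
    finite_time = np.isfinite(out[time_column].astype(float))
    return out[finite_time].sort_values(time_column).reset_index(drop=True)

# Made-up raw table with decimal commas and one invalid time value
raw = pd.DataFrame({
    "Time": ["2,5", "0", "n/a", "1,0"],
    "OD600": ["0,30", "0,10", "0,99", "0,20"],
    "Medium": ["A", "A", "A", "A"],
})
clean = clean_sample(raw)
print(clean)
```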

Processing Steps

Data Loading: Extracts Tables from input ResourceSet
Decimal Normalization: Converts commas to dots in numeric columns
Time Grid Generation: Creates interpolation grid(s) based on strategy
Quality Checks: Validates sufficient data points, handles edge cases
Interpolation: Applies selected method to each numeric column
Tag Preservation: Copies all tags from original to interpolated Tables
Metadata: Adds interpolation method tag for traceability
Output Assembly: Combines all interpolated Tables into ResourceSet

Output Structure

interpolated_resource_set (ResourceSet)

Contains one Table per input sample with:

Columns

Time: Uniform time grid
All measurement columns: Interpolated values
Metadata columns: Preserved as-is (Batch, Fermentor, Medium)

Tags (Preserved)

batch: Original experiment ID
sample: Original sample ID
medium: Original medium name
missing_value: Original missing data info
interpolation_method: NEW - Records method used

Column Tags (Preserved)

is_index_column='true': On time column
is_data_column='true': On measurement columns
unit: Units of measurement

Properties

Same number of rows across all samples (if global_auto)
Aligned time points for easy comparison
Smooth, continuous curves
Ready for plotting and analysis

Use Cases

1. Multi-Sample Comparison

Filter → Interpolate (global_auto, makima) →
Plot all samples on same axes

2. Growth Rate Calculation

Interpolate (cubic_spline) →
Calculate derivatives → Analyze growth kinetics
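
As a sketch of the derivative step, the specific growth rate can be estimated as the time derivative of log biomass, μ = d(ln X)/dt, taken from a cubic spline. Using SciPy's `CubicSpline` with natural boundary conditions and fitting the spline to ln X are assumptions for illustration; the biomass values are synthetic.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic biomass growing exponentially at 0.3 per hour (illustrative only)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
biomass = 0.05 * np.exp(0.3 * t)

# Fit the spline to ln(X) so that its first derivative is the growth rate
log_spline = CubicSpline(t, np.log(biomass), bc_type="natural")

grid = np.linspace(0.0, 5.0, 101)
mu = log_spline(grid, 1)  # second argument 1 requests the first derivative

print("estimated growth rate (1/h):", mu.mean())
```

On real data the estimate varies over time, and measurement noise is amplified by differentiation, so it is worth plotting `mu` against `grid` before reporting a single value.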

3. Gap Filling

Data with missing points →
Interpolate (pchip) → Complete time series

4. Noise Reduction

Noisy measurements →
Interpolate (makima, fewer points) → Smoother curves

5. Machine Learning Prep

Interpolate (global_auto, 200 points) →
Fixed-length feature vectors → ML model
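
A rough sketch of how a common grid turns samples of different lengths into fixed-length feature vectors. The helper name `to_feature_matrix`, the use of `PchipInterpolator`, and the NaN fill outside each sample's range are assumptions; the measurements are made up.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def to_feature_matrix(samples, n_points=200):
    """Hypothetical sketch: one row per sample, one column per grid point."""
    samples = [(np.asarray(t, float), np.asarray(y, float)) for t, y in samples]
    t_min = min(t[0] for t, _ in samples)
    t_max = max(t[-1] for t, _ in samples)
    grid = np.linspace(t_min, t_max, n_points)

    # extrapolate=False yields NaN outside each sample's measured range
    rows = [PchipInterpolator(t, y, extrapolate=False)(grid) for t, y in samples]
    return grid, np.vstack(rows)

# Two made-up samples with different lengths and durations
samples = [
    ([0, 2, 5, 10], [0.1, 0.4, 1.5, 2.0]),
    ([0, 1, 3, 6, 8], [0.1, 0.2, 0.9, 1.7, 1.8]),
]
grid, features = to_feature_matrix(samples, n_points=200)
print(features.shape)
```

Most ML models cannot consume NaN directly, so rows that end early need either a different edge strategy or an explicit imputation step.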

Example Workflows

Standard Analysis Pipeline

CellCultureLoadData
  ↓ (50 samples with irregular sampling)
Filter (select 10 samples)
  ↓
Interpolate (method=makima, grid_strategy=global_auto, n_points=500)
  ↓
[10 samples, each with 500 uniform time points]
  ↓
Visualization / Statistical Analysis

High-Quality Publication

Filter (select best samples)
  ↓
Interpolate (method=pchip, n_points=2000)
  ↓
Export smooth, high-resolution curves

Quick Preview

Interpolate (method=linear, n_points=50)
  ↓
Fast visualization for QC

Performance Considerations

Speed (fastest to slowest): linear > nearest > quadratic > cubic > akima > pchip > cubic_spline > univariate_spline
Memory: Proportional to (n_samples × n_points × n_columns)
Quality: Shape-preserving methods generally best for biological data
Recommendations:
- < 10 samples: Any method, high n_points OK
- 10-100 samples: Use makima/pchip, n_points=500-1000
- > 100 samples: Consider linear or lower n_points

Error Handling

Automatic Fallbacks

Too Few Points: Skips sample with warning
Non-numeric Data: Preserves original values
NaN in Time: Filters out invalid rows
Method Failure: Falls back to linear interpolation
Spline Order: Reduces order if insufficient points
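
Two of these fallbacks (order reduction and the switch to linear interpolation) can be sketched with SciPy's `UnivariateSpline`, which needs more data points than the spline order. The function name `spline_with_order_fallback` and the exact reduction rule are assumptions, not the task's code.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_with_order_fallback(t, y, grid, spline_order=3):
    """Hypothetical sketch: lower the order for short series, else go linear."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)

    # A spline of order k needs at least k + 1 points
    k = max(1, min(spline_order, len(t) - 1))
    try:
        # s=0 forces the spline through every data point
        return UnivariateSpline(t, y, k=k, s=0)(grid), k
    except Exception:
        # Last resort: plain linear interpolation
        return np.interp(grid, t, y), 1

# Only three points: a cubic (order 3) is not possible, so order 2 is used
values, used_order = spline_with_order_fallback(
    [0, 1, 2], [0, 1, 4], np.linspace(0, 2, 5), spline_order=3
)
print(used_order, values)
```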

Common Issues

Issue	Cause	Solution
Oscillations	Cubic with sparse data	Use pchip or makima
Angular curves	Linear method	Switch to makima/pchip
Overshoot	Cubic spline	Use shape-preserving method
Slow processing	Too many points	Reduce n_points to 500-1000
Missing columns	Time column not found	Check column name exactly

Best Practices

Start Simple: Begin with default settings (makima, global_auto)
Visual Check: Plot original + interpolated to verify quality
Match Purpose: Choose method based on analysis goal
Document Choice: Interpolation method affects results
Validate: Check that interpolation preserves key features
Right Resolution: More points ≠ better (avoid over-sampling)

Scientific Considerations

For Growth Curves

Use makima or pchip (preserve exponential growth)
Avoid cubic spline (may create artificial inflection points)

For Substrate/Product

Use pchip (preserves monotonic consumption/production)
Avoid methods that create overshoots

For Environmental

Use akima (smooth physical parameter changes)
Linear OK if high-frequency logging

For Derivatives

Use cubic_spline (continuous first and second derivatives)
Higher n_points for accurate derivatives

Integration with Dashboard

The Cell Culture dashboard provides:

Interactive Selection: Choose method from dropdown
Live Preview: See interpolation results immediately
Comparison: View original vs interpolated side-by-side
Export: Download interpolated data as CSV

Notes

Time column name defaults to "Time" (see the time_column configuration parameter)
Numeric columns are interpolated, subject to min_values_threshold when it is set (metadata columns preserved)
Original data is never modified (creates new Tables)
Interpolation tag added for traceability
Compatible with all Cell Culture workflow tasks
Designed for biological time-series (fermentation focus)

Input

Input ResourceSet to interpolate

ResourceSet containing cell culture time series data

Resource set

Output

Subsampled ResourceSet

ResourceSet with combined real and interpolated data

Resource set

Configuration

time_column

Optional

Name of the time column

Type : string

Default value : Time

batch_column

Optional

Name of the batch column to remove from output

Type : string

Default value : ESSAI

sample_column

Optional

Name of the sample column to remove from output

Type : string

Default value : FERMENTEUR

method

Optional

Method: linear, nearest, quadratic, cubic, pchip, akima, makima, cubic_spline, univariate_spline, spline

Type : string

Allowed values :

makima
linear
cubic
nearest
cubic_spline
spline
pchip
quadratic
akima
univariate_spline

Default value : makima

grid_strategy

Optional

Strategy for time grid generation

Type : string

Allowed values :

global_auto
per_file
reference

Default value : global_auto

n_points

Optional

Number of points in interpolation grid (auto if not set)

Type : int

Default value : 500

spline_order

Optional

Order of spline interpolation for univariate_spline method (1-5)

Type : int

Default value : 3

edge_strategy

Optional

How to handle data beyond original time range

Type : string

Allowed values :

nearest
linear
nan

Default value : nan

reference_index

Optional

Index of resource to use as reference for grid (when grid_strategy='reference')

Type : int

min_values_threshold

Optional

Interpolate only columns with at least this many non-NaN values. If None, interpolate all columns.

Type : int
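
A minimal sketch of how `min_values_threshold` could split columns into those that are interpolated and those that keep only their real measured values. The function name `split_columns`, the column names, and the example data are hypothetical; the actual selection logic in the task may differ.

```python
import numpy as np
import pandas as pd

def split_columns(df, time_column, min_values_threshold=None):
    """Hypothetical sketch: decide which numeric columns get interpolated."""
    numeric = [c for c in df.select_dtypes(include="number").columns
               if c != time_column]
    if min_values_threshold is None:
        # No threshold: every numeric column is interpolated
        return numeric, []

    counts = df[numeric].notna().sum()
    interpolated = [c for c in numeric if counts[c] >= min_values_threshold]
    real_only = [c for c in numeric if counts[c] < min_values_threshold]
    return interpolated, real_only

# Made-up sample: dense OD readings, sparse offline glucose measurements
df = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0, 4.0],
    "OD600": [0.1, 0.2, 0.4, 0.8, 1.2],
    "Glucose": [20.0, np.nan, np.nan, np.nan, 2.0],
    "Medium": ["A"] * 5,
})
print(split_columns(df, "Time", min_values_threshold=3))
```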
