Publication date: Sep 19, 2024

Confidentiality: Public

Fermentalg Time Series Interpolation

TASK
Typing name: TASK.gws_plate_reader.FermentalgInterpolation
Brick: gws_plate_reader

Interpolate fermentation time series data with multiple advanced methods

Interpolate fermentation time series data to create uniform time grids for analysis.

Overview

This task transforms fermentation data with irregular time sampling into uniformly sampled time series, enabling direct comparison between experiments, better visualization, and advanced time-series analysis.

Purpose

  • Align Time Points: Create common time grid across all samples
  • Fill Gaps: Handle missing measurements through intelligent interpolation
  • Enable Comparison: Make samples with different sampling rates comparable
  • Smooth Noise: Apply shape-preserving methods so that noisy points do not produce overshoot or oscillation
  • Prepare for ML: Create uniform input for machine learning algorithms

Interpolation Methods

Shape-Preserving Methods (Recommended for Biological Data)

makima (Default) ★ RECOMMENDED

  • Modified Akima Interpolation: Enhanced shape-preserving method
  • Best For: Biological time series with varying growth phases
  • Advantages:
    • Avoids overshoot in steep regions (pH, biomass)
    • Natural-looking curves following data trends
    • Robust to outliers
    • Smooth transitions between regions
  • Use When: General fermentation data analysis

pchip

  • Piecewise Cubic Hermite Interpolating Polynomial
  • Best For: Data with local extrema that must be preserved
  • Advantages:
    • Preserves monotonicity between data points (no artificial peaks/valleys)
    • Keeps flat regions flat and monotonic regions monotonic
    • Good for concentration measurements
  • Use When: Substrate consumption, product formation curves
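
To illustrate the no-overshoot property described above, here is a minimal sketch that compares SciPy's PchipInterpolator with CubicSpline on made-up, step-like data. Whether the task relies on these exact SciPy classes is an assumption, and the values are illustrative only.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Illustrative data (not from a real run): a product titer that rises
# sharply between t = 2 h and t = 3 h, then plateaus.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

grid = np.linspace(t[0], t[-1], 501)
y_pchip = PchipInterpolator(t, y)(grid)
y_cubic = CubicSpline(t, y)(grid)

# Small tolerance to absorb floating-point rounding.
tol = 1e-12

# PCHIP stays inside the range of the data; the cubic spline leaves it.
pchip_in_range = bool(y_pchip.min() >= -tol and y_pchip.max() <= 1.0 + tol)
cubic_overshoots = bool(y_cubic.min() < -tol or y_cubic.max() > 1.0 + tol)
print(pchip_in_range, cubic_overshoots)
```

For a concentration that can only increase or only decrease, staying inside the data range means no negative or above-plateau values appear between measurements.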

akima

  • Original Akima Spline: Classic shape-preserving method
  • Best For: Smooth biological responses
  • Advantages:
    • Very smooth curves
    • Less sensitive to outliers than cubic splines
    • Natural interpolation
  • Use When: Temperature, pressure, flow rate data

Polynomial Methods

linear

  • Linear Interpolation: Straight lines between points
  • Best For: High-frequency data, preliminary analysis
  • Advantages: Fast, simple, no overshoot
  • Disadvantages: Angular appearance, not smooth
  • Use When: Quick checks, high sampling rate data

quadratic

  • Quadratic Spline: Piecewise second-degree polynomials
  • Best For: Slowly varying parameters
  • Advantages: Smoother than linear, faster than cubic
  • Use When: Steady-state measurements

cubic

  • Cubic Spline: Piecewise third-degree polynomials
  • Best For: Very smooth data
  • Advantages: Smooth, twice differentiable
  • Disadvantages: Can overshoot between points
  • Use When: High-quality data with minimal noise

Advanced Spline Methods

cubic_spline

  • Natural Cubic Spline: Global smoothing with boundary conditions
  • Best For: Complete curve fitting
  • Advantages: Global smoothness, natural boundary behavior
  • Use When: Need derivatives, mathematical modeling

univariate_spline or spline

  • Adaptive Order B-Spline: Automatic order selection
  • Best For: Complex curves with varying smoothness
  • Advantages: Adaptive, handles varied data patterns
  • Configuration: spline_order (1-5, default=3)
  • Use When: Unknown data behavior, exploratory analysis

nearest

  • Nearest Neighbor: Step-wise interpolation
  • Best For: Categorical data, control parameters
  • Use When: Binary states, control settings over time

Grid Strategies

global_auto (Default) ★ RECOMMENDED

  • Creates ONE common time grid for ALL samples
  • Advantages:
    • Direct sample comparison at same time points
    • Optimal for plotting multiple samples together
    • Consistent analysis across dataset
  • Algorithm:
    1. Finds global min/max time across all samples
    2. Calculates median time step from all samples
    3. Generates uniform grid spanning entire range
  • Use When: Comparing multiple fermentations
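
The three algorithm steps above can be sketched with NumPy as follows. This is not the task's actual code: `build_global_grid` is a hypothetical helper, pooling the time steps of all samples before taking the median is one possible reading of step 2, and the sketch derives the number of points from the median step rather than from n_points.

```python
import numpy as np

def build_global_grid(sample_times):
    """Build one uniform time grid covering every sample (hypothetical helper)."""
    arrays = [np.sort(np.asarray(t, dtype=float)) for t in sample_times]

    # Step 1: global min/max time across all samples.
    t_min = min(a[0] for a in arrays)
    t_max = max(a[-1] for a in arrays)

    # Step 2 (assumption): pool the time steps of all samples, take the median.
    steps = np.concatenate([np.diff(a) for a in arrays])
    step = float(np.median(steps[steps > 0]))

    # Step 3: uniform grid spanning the entire range.
    n_points = int(round((t_max - t_min) / step)) + 1
    return np.linspace(t_min, t_max, n_points)

# Two samples with different, irregular sampling times (in hours).
grid = build_global_grid([[0, 1, 2, 4], [0.5, 1.5, 3.5, 5]])
print(grid)
```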

per_file

  • Creates INDIVIDUAL time grid for EACH sample
  • Advantages:
    • Preserves each sample's specific time range
    • Better for samples with very different durations
    • Optimal grid density per sample
  • Use When: Samples have vastly different durations

reference

  • Uses time grid from ONE specific sample as template
  • Configuration: reference_index (0-based index of reference sample)
  • Advantages:
    • Forces all samples to exact same time points
    • Perfect alignment for direct arithmetic operations
  • Use When:
    • Need exact alignment for calculations
    • One sample is the "gold standard"
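
As a rough illustration of the reference strategy, the sketch below resamples every sample onto the time points of one reference sample and then subtracts two aligned curves. `align_to_reference` is a hypothetical helper, and linear interpolation (`np.interp`) is used only to keep the example short; the task itself applies whichever method is configured.

```python
import numpy as np

def align_to_reference(samples, reference_index=0):
    """Resample (times, values) pairs onto the reference sample's time points."""
    ref_t = np.asarray(samples[reference_index][0], dtype=float)
    aligned = [
        np.interp(ref_t, np.asarray(t, dtype=float), np.asarray(v, dtype=float))
        for t, v in samples
    ]
    return ref_t, np.vstack(aligned)

samples = [
    ([0.0, 2.0, 4.0], [1.0, 3.0, 5.0]),            # reference sample
    ([0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 3.0, 4.0]),  # sampled at other times
]
ref_t, aligned = align_to_reference(samples, reference_index=0)

# Rows now share the same time points, so element-wise arithmetic is valid.
difference = aligned[0] - aligned[1]
print(difference)
```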

Configuration Parameters

Essential Parameters

method (String)

  • Default: "makima"
  • Options: See Interpolation Methods section
  • Impact: Determines curve shape and smoothness
  • Recommendation: Start with makima; try pchip if the result overshoots or oscillates

grid_strategy (String)

  • Default: "global_auto"
  • Options: "global_auto", "per_file", "reference"
  • Impact: Determines time alignment across samples
  • Recommendation: Use the default unless you have specific alignment needs

n_points (Integer)

  • Default: 500
  • Range: 10 to 20,000
  • Impact:
    • Higher: Smoother curves, larger files, slower processing
    • Lower: Faster processing, may miss details
  • Auto Mode: If not set, calculated from data characteristics
  • Recommendation:
    • 100-300: Quick previews
    • 500-1000: Standard analysis
    • 1000-5000: Publication quality
    • 5000+: Detailed mathematical analysis

Advanced Parameters

spline_order (Integer)

  • Default: 3
  • Range: 1 to 5
  • Only For: univariate_spline / spline method
  • Values:
    • 1: Linear (same as linear method)
    • 2: Quadratic
    • 3: Cubic (smooth, flexible) ★
    • 4-5: Very smooth (may oscillate)

edge_strategy (String)

  • Default: "nearest"
  • Options:
    • "nearest": Extend with nearest value (flat)
    • "linear": Linear extrapolation
    • "nan": Fill with NaN (missing)
  • Impact: How to handle time points outside original range
  • Recommendation: Use default (nearest) for safety
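
The three edge options can be pictured with a small NumPy sketch. `interpolate_with_edges` is a hypothetical helper built on linear interpolation for brevity; it only shows how "nearest", "linear" and "nan" differ for grid points outside the measured time range, not how the task implements them.

```python
import numpy as np

def interpolate_with_edges(t, y, grid, edge_strategy="nearest"):
    """Linear interpolation with three ways of handling out-of-range points."""
    t, y, grid = (np.asarray(a, dtype=float) for a in (t, y, grid))
    # np.interp already repeats the first/last value outside [t[0], t[-1]].
    out = np.interp(grid, t, y)
    left, right = grid < t[0], grid > t[-1]
    if edge_strategy == "nan":
        out[left | right] = np.nan
    elif edge_strategy == "linear":
        # Extend the first and last segments as straight lines.
        out[left] = y[0] + (grid[left] - t[0]) * (y[1] - y[0]) / (t[1] - t[0])
        out[right] = y[-1] + (grid[right] - t[-1]) * (y[-1] - y[-2]) / (t[-1] - t[-2])
    elif edge_strategy != "nearest":
        raise ValueError(f"unknown edge_strategy: {edge_strategy!r}")
    return out

t, y = [1.0, 2.0, 3.0], [10.0, 20.0, 40.0]
grid = [0.0, 1.5, 4.0]  # first and last points lie outside the data
flat = interpolate_with_edges(t, y, grid, "nearest")
extrapolated = interpolate_with_edges(t, y, grid, "linear")
masked = interpolate_with_edges(t, y, grid, "nan")
```

With these made-up values, "linear" extends the last segment's slope beyond the data, which is why the flat "nearest" option is the more conservative default.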

reference_index (Integer)

  • Default: 0
  • Only For: grid_strategy="reference"
  • Range: 0 to (number_of_samples - 1)
  • Impact: Which sample's time grid to use as template

Input Requirements

ResourceSet Structure

  • Source: FermentalgLoadData or Filter task output
  • Contents: Table resources with time series data
  • Required Column: "Temps de culture (h)" (culture time in hours)
  • Required Tags:
    • batch: Experiment identifier (preserved in output)
    • sample: Sample identifier (preserved in output)
    • Column tag is_index_column='true' on time column

Data Quality Requirements

  • Numeric Data: All measurement columns must be numeric
  • Decimal Format: Commas automatically converted to dots
  • Time Values: Must be finite (no NaN/Inf in time column)
  • Minimum Points: At least 2 time points per sample
  • Sorting: Time column will be sorted automatically
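
The requirements above amount to a small cleaning pass before interpolation. The following standard-library sketch shows one way to express it for a single time/value series; `parse_decimal` and `clean_series` are hypothetical names, and the task itself operates on Table resources rather than plain lists.

```python
import math

def parse_decimal(text):
    """Parse '12,5' or '12.5' into a float; return NaN when parsing fails."""
    try:
        return float(str(text).strip().replace(",", "."))
    except ValueError:
        return math.nan

def clean_series(times, values):
    """Normalize decimals, drop rows with non-finite time, sort by time."""
    pairs = [(parse_decimal(t), parse_decimal(v)) for t, v in zip(times, values)]
    pairs = [(t, v) for t, v in pairs if math.isfinite(t)]
    if len(pairs) < 2:
        raise ValueError("at least 2 valid time points are required")
    pairs.sort(key=lambda pair: pair[0])
    return [t for t, _ in pairs], [v for _, v in pairs]

# Unsorted input with decimal commas and one unusable time value.
clean_t, clean_v = clean_series(
    ["2,0", "0,5", "n/a", "1.0"],
    ["0,9", "0,1", "0,2", "0,4"],
)
print(clean_t, clean_v)
```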

Processing Steps

  1. Data Loading: Extracts Tables from input ResourceSet
  2. Decimal Normalization: Converts commas to dots in numeric columns
  3. Time Grid Generation: Creates interpolation grid(s) based on strategy
  4. Quality Checks: Validates sufficient data points, handles edge cases
  5. Interpolation: Applies selected method to each numeric column
  6. Tag Preservation: Copies all tags from original to interpolated Tables
  7. Metadata: Adds interpolation method tag for traceability
  8. Output Assembly: Combines all interpolated Tables into ResourceSet

Output Structure

interpolated_resource_set (ResourceSet)

Contains one Table per input sample with:

Columns

  • Temps de culture (h): Uniform time grid
  • All measurement columns: Interpolated values
  • Metadata columns: Preserved as-is (ESSAI, FERMENTEUR, MILIEU)

Tags (Preserved)

  • batch: Original experiment ID
  • sample: Original sample ID
  • medium: Original medium name
  • missing_value: Original missing data info
  • interpolation_method: NEW - Records method used

Column Tags (Preserved)

  • is_index_column='true': On time column
  • is_data_column='true': On measurement columns
  • unit: Units of measurement

Properties

  • Same number of rows across all samples (if global_auto)
  • Aligned time points for easy comparison
  • Smooth, continuous curves
  • Ready for plotting and analysis

Use Cases

1. Multi-Sample Comparison

Filter → Interpolate (global_auto, makima) →
Plot all samples on same axes

2. Growth Rate Calculation

Interpolate (cubic_spline) →
Calculate derivatives → Analyze growth kinetics
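
A minimal sketch of this use case, assuming SciPy's CubicSpline with natural boundary conditions stands in for the cubic_spline method: fitting the spline to ln(biomass) and differentiating it gives the specific growth rate. The biomass values are synthetic, generated with a known rate of 0.3 per hour.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic biomass curve with a known specific growth rate of 0.3 1/h,
# sampled at irregular times (hours).
t = np.array([0.0, 1.5, 2.0, 4.0, 6.5, 8.0])
biomass = 0.1 * np.exp(0.3 * t)

# Fit a natural cubic spline to ln(X). Its first derivative is the
# specific growth rate mu(t) = d ln(X) / dt.
spline = CubicSpline(t, np.log(biomass), bc_type="natural")
grid = np.linspace(t[0], t[-1], 200)
mu = spline.derivative()(grid)

print(mu.min(), mu.max())
```

On real, noisy data the derivative amplifies measurement noise, so the resulting rate should be inspected against the raw points.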

3. Gap Filling

Data with missing points →
Interpolate (pchip) → Complete time series

4. Noise Reduction

Noisy measurements →
Interpolate (makima, fewer points) → Smoother curves

5. Machine Learning Prep

Interpolate (global_auto, 200 points) →
Fixed-length feature vectors → ML model
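
As a sketch of the fixed-length idea, the code below resamples two samples of different lengths onto one 200-point grid and stacks them into a matrix with one row per sample. `to_feature_matrix` is a hypothetical helper and uses linear interpolation only for brevity.

```python
import numpy as np

def to_feature_matrix(samples, n_points=200):
    """Turn (times, values) pairs into an array of shape (n_samples, n_points)."""
    t_min = min(min(t) for t, _ in samples)
    t_max = max(max(t) for t, _ in samples)
    grid = np.linspace(t_min, t_max, n_points)
    # np.interp holds the end values flat outside a sample's own time range.
    rows = [np.interp(grid, t, v) for t, v in samples]
    return grid, np.vstack(rows)

samples = [
    ([0.0, 5.0, 10.0], [0.1, 2.0, 4.0]),
    ([1.0, 3.0, 8.0, 12.0], [0.2, 1.0, 3.5, 3.9]),
]
grid, features = to_feature_matrix(samples, n_points=200)
print(features.shape)
```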

Example Workflows

Standard Analysis Pipeline

FermentalgLoadData
  ↓ (50 samples with irregular sampling)
Filter (select 10 samples)
  ↓
Interpolate (method=makima, grid_strategy=global_auto, n_points=500)
  ↓
[10 samples, each with 500 uniform time points]
  ↓
Visualization / Statistical Analysis

High-Quality Publication

Filter (select best samples)
  ↓
Interpolate (method=pchip, n_points=2000)
  ↓
Export smooth, high-resolution curves

Quick Preview

Interpolate (method=linear, n_points=50)
  ↓
Fast visualization for QC

Performance Considerations

  • Speed (fastest to slowest): linear > nearest > quadratic > cubic > akima > pchip > cubic_spline > univariate_spline
  • Memory: Proportional to (n_samples × n_points × n_columns)
  • Quality: Shape-preserving methods generally best for biological data
  • Recommendations:
    • < 10 samples: Any method, high n_points OK
    • 10-100 samples: Use makima/pchip, n_points=500-1000
    • > 100 samples: Consider linear or lower n_points
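
To put numbers on the memory rule above, here is a small back-of-the-envelope estimate. It assumes 8-byte floating-point values and 20 measurement columns, neither of which is stated by the task; `estimate_memory_mb` is a hypothetical helper.

```python
def estimate_memory_mb(n_samples, n_points, n_columns, bytes_per_value=8):
    """Rough size of the interpolated values in megabytes (10**6 bytes)."""
    return n_samples * n_points * n_columns * bytes_per_value / 1e6

small = estimate_memory_mb(10, 500, 20)     # 10 samples, default n_points
large = estimate_memory_mb(100, 5000, 20)   # 100 samples, high resolution
print(small, large)
```

Under these assumptions the small case needs under 1 MB and the large case about 80 MB, before counting any intermediate copies.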

Error Handling

Automatic Fallbacks

  • Too Few Points: Skips sample with warning
  • Non-numeric Data: Preserves original values
  • NaN in Time: Filters out invalid rows
  • Method Failure: Falls back to linear interpolation
  • Spline Order: Reduces order if insufficient points
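
The fallback behaviour can be sketched as below. Both helpers are hypothetical: `effective_spline_order` uses the fact that a spline of order k needs at least k + 1 points, and `interpolate_with_fallback` accepts any callable factory (for example `scipy.interpolate.PchipInterpolator`) and reverts to linear interpolation when it raises.

```python
import warnings
import numpy as np

def effective_spline_order(spline_order, n_points):
    """Reduce the order so it never exceeds n_points - 1 (minimum 1)."""
    return max(1, min(spline_order, n_points - 1))

def interpolate_with_fallback(t, y, grid, build_interpolant):
    """Try the requested method; fall back to linear interpolation on failure."""
    t, y, grid = (np.asarray(a, dtype=float) for a in (t, y, grid))
    if len(t) < 2:
        warnings.warn("too few points, sample skipped")
        return None
    try:
        return np.asarray(build_interpolant(t, y)(grid), dtype=float)
    except Exception as exc:  # broad on purpose: any method failure
        warnings.warn(f"method failed ({exc}); using linear interpolation")
        return np.interp(grid, t, y)

def failing_method(t, y):
    """Stand-in for a method that cannot handle the data."""
    raise ValueError("not enough points for this method")

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = interpolate_with_fallback([0, 1, 2], [0, 10, 20], [0.5, 1.5], failing_method)
    skipped = interpolate_with_fallback([0], [1], [0.5], failing_method)

reduced_order = effective_spline_order(5, n_points=3)
```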

Common Issues

| Issue | Cause | Solution |
|---|---|---|
| Oscillations | Cubic with sparse data | Use pchip or makima |
| Angular curves | Linear method | Switch to makima/pchip |
| Overshoot | Cubic spline | Use shape-preserving method |
| Slow processing | Too many points | Reduce n_points to 500-1000 |
| Missing columns | Time column not found | Check that the time column is named exactly "Temps de culture (h)" |

Best Practices

  1. Start Simple: Begin with default settings (makima, global_auto)
  2. Visual Check: Plot original + interpolated to verify quality
  3. Match Purpose: Choose method based on analysis goal
  4. Document Choice: Interpolation method affects results
  5. Validate: Check that interpolation preserves key features
  6. Right Resolution: More points ≠ better (avoid over-sampling)

Scientific Considerations

For Growth Curves

  • Use makima or pchip (preserve exponential growth)
  • Avoid cubic spline (may create artificial inflection points)

For Substrate/Product

  • Use pchip (preserves monotonic consumption/production)
  • Avoid methods that create overshoots

For Environmental

  • Use akima (smooth physical parameter changes)
  • Linear OK if high-frequency logging

For Derivatives

  • Use cubic_spline (continuous first and second derivatives)
  • Higher n_points for accurate derivatives

Integration with Dashboard

The Fermentalg dashboard provides:

  1. Interactive Selection: Choose method from dropdown
  2. Live Preview: See interpolation results immediately
  3. Comparison: View original vs interpolated side-by-side
  4. Export: Download interpolated data as CSV

Notes

  • Time column name is hardcoded: "Temps de culture (h)"
  • All numeric columns are interpolated (metadata columns preserved)
  • Original data is never modified (creates new Tables)
  • Interpolation tag added for traceability
  • Compatible with all Fermentalg workflow tasks
  • Designed for biological time-series (fermentation focus)

Input

Input ResourceSet to interpolate
ResourceSet containing fermentalg time series data

Output

Interpolated ResourceSet
ResourceSet with interpolated time series data

Configuration

method

Optional

Method: linear, nearest, quadratic, cubic, pchip, akima, makima, cubic_spline, univariate_spline, spline

Type: string
Allowed values: quadratic, cubic, univariate_spline, akima, nearest, cubic_spline, linear, spline, pchip, makima
Default value: makima

grid_strategy

Optional

Strategy for time grid generation

Type: string
Allowed values: global_auto, per_file, reference
Default value: global_auto

n_points

Optional

Number of points in interpolation grid (auto if not set)

Type: int
Default value: 500

spline_order

Optional

Order of spline interpolation for univariate_spline method (1-5)

Type: int
Default value: 3

edge_strategy

Optional

How to handle data beyond original time range

Type: string
Allowed values: nearest, linear, nan
Default value: nearest

reference_index

Optional

Index of resource to use as reference for grid (when grid_strategy='reference')

Type : int