Cell Culture Time Series Subsampling
Subsample fermentation time series data by combining real and interpolated values
Overview
This task creates a combined dataset where:
- Interpolated columns use only interpolated values
- Non-interpolated columns use only real measured values
- Time column contains both real and interpolated time points
Purpose
- Smart Data Combination: Use interpolation only where it makes sense
- Preserve Real Data: Keep original measurements for columns with sparse data
- Enable Comparison: Create uniform datasets while respecting data quality
- Prepare for Analysis: Output ready for ML and statistical analysis
Interpolation Methods
Shape-Preserving Methods (Recommended for Biological Data)
makima (Default) ★ RECOMMENDED
- Modified Akima Interpolation: Enhanced shape-preserving method
- Best For: Biological time series with varying growth phases
- Advantages:
- Avoids overshoot in steep regions (pH, biomass)
- Natural-looking curves following data trends
- Robust to outliers
- Smooth transitions between regions
- Use When: General fermentation data analysis
pchip
- Piecewise Cubic Hermite Interpolating Polynomial
- Best For: Data with local extrema that must be preserved
- Advantages:
- Guarantees monotonicity (no artificial peaks/valleys)
- Shape-preserving in regions of constant slope
- Good for concentration measurements
- Use When: Substrate consumption, product formation curves
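The overshoot difference between pchip and an unconstrained cubic spline can be seen on a small step-like series. The sketch below assumes SciPy's `PchipInterpolator` and `CubicSpline` as stand-ins for the task's internal interpolators (the actual implementation is not documented here), and the data values are made up for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Hypothetical step-like product curve: flat, sharp rise, flat again.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # culture time (h)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])   # measured values

t_fine = np.linspace(t.min(), t.max(), 501)

pchip_vals = PchipInterpolator(t, y)(t_fine)   # monotone between data points
cubic_vals = CubicSpline(t, y)(t_fine)         # smooth, but free to overshoot

# PCHIP stays inside the measured range; the cubic spline rings around the step.
print("pchip range:", pchip_vals.min(), pchip_vals.max())
print("cubic range:", cubic_vals.min(), cubic_vals.max())
```

For a concentration that physically cannot go below zero, the negative dip of the cubic spline is exactly the kind of artifact the shape-preserving methods avoid.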
akima
- Original Akima Spline: Classic shape-preserving method
- Best For: Smooth biological responses
- Advantages:
- Very smooth curves
- Less sensitive to outliers than cubic splines
- Natural interpolation
- Use When: Temperature, pressure, flow rate data
Polynomial Methods
linear
- Linear Interpolation: Straight lines between points
- Best For: High-frequency data, preliminary analysis
- Advantages: Fast, simple, no overshoot
- Disadvantages: Angular appearance, not smooth
- Use When: Quick checks, high sampling rate data
quadratic
- Quadratic Spline: Piecewise second-degree polynomials
- Best For: Slowly varying parameters
- Advantages: Smoother than linear, faster than cubic
- Use When: Steady-state measurements
cubic
- Cubic Spline: Piecewise third-degree polynomials
- Best For: Very smooth data
- Advantages: Smooth, twice differentiable
- Disadvantages: Can overshoot between points
- Use When: High-quality data with minimal noise
Advanced Spline Methods
cubic_spline
- Natural Cubic Spline: Global smoothing with boundary conditions
- Best For: Complete curve fitting
- Advantages: Global smoothness, natural boundary behavior
- Use When: Need derivatives, mathematical modeling
univariate_spline or spline
- Adaptive Order B-Spline: Automatic order selection
- Best For: Complex curves with varying smoothness
- Advantages: Adaptive, handles varied data patterns
- Configuration: spline_order (1-5, default=3)
- Use When: Unknown data behavior, exploratory analysis
nearest
- Nearest Neighbor: Step-wise interpolation
- Best For: Categorical data, control parameters
- Use When: Binary states, control settings over time
Grid Strategies
global_auto (Default) ★ RECOMMENDED
- Creates ONE common time grid for ALL samples
- Advantages:
- Direct sample comparison at same time points
- Optimal for plotting multiple samples together
- Consistent analysis across dataset
- Algorithm:
- Finds global min/max time across all samples
- Calculates median time step from all samples
- Generates uniform grid spanning entire range
- Use When: Comparing multiple fermentations
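The global_auto algorithm described above can be sketched as follows. This is an illustration under assumptions, not the task's code: the function name `global_auto_grid` is hypothetical, and the median step is taken over the pooled time steps of all samples, which is one possible reading of "median time step from all samples".

```python
import numpy as np

def global_auto_grid(sample_times):
    """Build one uniform time grid that spans every sample (hypothetical helper)."""
    cleaned = [np.sort(np.asarray(t, dtype=float)) for t in sample_times]
    t_min = min(t[0] for t in cleaned)     # global minimum time
    t_max = max(t[-1] for t in cleaned)    # global maximum time
    # Pool the positive time steps of all samples and take their median.
    steps = np.concatenate([np.diff(t) for t in cleaned])
    step = float(np.median(steps[steps > 0]))
    n_points = int(round((t_max - t_min) / step)) + 1
    return np.linspace(t_min, t_max, n_points)

samples = [
    [0.0, 2.0, 4.0, 8.0],          # irregular sampling
    [1.0, 3.0, 5.0, 7.0, 10.0],
]
grid = global_auto_grid(samples)
print(grid)
```

Here the pooled median step is 2 h, so the grid runs from 0 to 10 h in six points. When n_points is set explicitly, it would replace the step-derived point count.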
per_file
- Creates INDIVIDUAL time grid for EACH sample
- Advantages:
- Preserves each sample's specific time range
- Better for samples with very different durations
- Optimal grid density per sample
- Use When: Samples have vastly different durations
reference
- Uses time grid from ONE specific sample as template
- Configuration: reference_index (0-based index of reference sample)
- Advantages:
- Forces all samples to exact same time points
- Perfect alignment for direct arithmetic operations
- Use When:
- Need exact alignment for calculations
- One sample is the "gold standard"
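A minimal sketch of the reference strategy, assuming each sample is available as a pair of time and value arrays. The helper name `reference_grid` is hypothetical, and plain linear interpolation (`np.interp`) is used only to keep the example short.

```python
import numpy as np

def reference_grid(sample_times, reference_index=0):
    """Return the time points of the chosen sample as the common grid."""
    if not 0 <= reference_index < len(sample_times):
        raise IndexError("reference_index must be between 0 and number_of_samples - 1")
    return np.sort(np.asarray(sample_times[reference_index], dtype=float))

t_a, y_a = np.array([0.0, 2.0, 4.0, 6.0]), np.array([1.0, 2.0, 3.0, 4.0])
t_b, y_b = np.array([0.0, 3.0, 6.0]), np.array([0.0, 3.0, 6.0])

grid = reference_grid([t_a, t_b], reference_index=0)
b_on_grid = np.interp(grid, t_b, y_b)   # sample B resampled at sample A's time points
difference = y_a - b_on_grid            # point-by-point arithmetic is now valid
print(difference)
```

Because both series share sample A's exact time points, no further alignment is needed before subtracting or dividing them.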
Configuration Parameters
Essential Parameters
method (String)
- Default: "makima"
- Options: See Interpolation Methods section
- Impact: Determines curve shape and smoothness
- Recommendation: Start with makima; try pchip if issues arise
grid_strategy (String)
- Default: "global_auto"
- Options: "global_auto", "per_file", "reference"
- Impact: Determines time alignment across samples
- Recommendation: Use default unless specific needs
n_points (Integer)
- Default: 500
- Range: 10 to 20,000
- Impact:
- Higher: Smoother curves, larger files, slower processing
- Lower: Faster processing, may miss details
- Auto Mode: If not set, calculated from data characteristics
- Recommendation:
- 100-300: Quick previews
- 500-1000: Standard analysis
- 1000-5000: Publication quality
- 5000+: Detailed mathematical analysis
Advanced Parameters
spline_order (Integer)
- Default: 3
- Range: 1 to 5
- Only For: univariate_spline / spline method
- Values:
- 1: Linear (same as linear method)
- 2: Quadratic
- 3: Cubic (smooth, flexible) ★
- 4-5: Very smooth (may oscillate)
edge_strategy (String)
- Default: "nearest"
- Options:
- "nearest": Extend with nearest value (flat)
- "linear": Linear extrapolation
- "nan": Fill with NaN (missing)
- Impact: How to handle time points outside original range
- Recommendation: Use default (nearest) for safety
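The three edge_strategy options can be illustrated with a small helper. This is a sketch under assumptions: `apply_edge_strategy` is a hypothetical name, linear interpolation is used inside the data range for brevity, and the "linear" option extrapolates from the slope of the first or last data segment.

```python
import numpy as np

def apply_edge_strategy(t, y, grid, edge_strategy="nearest"):
    """Interpolate linearly inside the data range and handle the edges three ways."""
    t, y, grid = (np.asarray(a, dtype=float) for a in (t, y, grid))
    values = np.interp(grid, t, y)   # np.interp already holds the end values flat
    left, right = grid < t[0], grid > t[-1]
    if edge_strategy == "nearest":
        return values
    if edge_strategy == "nan":
        values[left | right] = np.nan
        return values
    if edge_strategy == "linear":
        left_slope = (y[1] - y[0]) / (t[1] - t[0])
        right_slope = (y[-1] - y[-2]) / (t[-1] - t[-2])
        values[left] = y[0] + left_slope * (grid[left] - t[0])
        values[right] = y[-1] + right_slope * (grid[right] - t[-1])
        return values
    raise ValueError(f"unknown edge_strategy: {edge_strategy!r}")

t, y = [1.0, 2.0, 3.0], [10.0, 20.0, 40.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
flat = apply_edge_strategy(t, y, grid, "nearest")
extrapolated = apply_edge_strategy(t, y, grid, "linear")
masked = apply_edge_strategy(t, y, grid, "nan")
```

Linear extrapolation continues the last observed trend, which can produce implausible values far from the data. The flat and NaN options never invent a trend.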
reference_index (Integer)
- Default: 0
- Only For: grid_strategy="reference"
- Range: 0 to (number_of_samples - 1)
- Impact: Which sample's time grid to use as template
Input Requirements
ResourceSet Structure
- Source: CellCultureLoadData or Filter task output
- Contents: Table resources with time series data
- Required Column: "Time" (culture time in hours)
- Required Tags:
- batch: Experiment identifier (preserved in output)
- sample: Sample identifier (preserved in output)
- Column tag is_index_column='true' on time column
Data Quality Requirements
- Numeric Data: All measurement columns must be numeric
- Decimal Format: Commas automatically converted to dots
- Time Values: Must be finite (no NaN/Inf in time column)
- Minimum Points: At least 2 time points per sample
- Sorting: Time column will be sorted automatically
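The data quality rules above could be applied to one sample roughly as follows. This is a sketch assuming the sample is held as a pandas DataFrame; `prepare_sample` is a hypothetical helper, not part of the task's API.

```python
import numpy as np
import pandas as pd

def prepare_sample(df, time_column="Time"):
    """Normalize decimal commas, drop rows with invalid time, and sort by time."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        # "7,1" -> "7.1"; anything that still cannot be parsed becomes NaN.
        as_text = out[col].astype(str).str.replace(",", ".", regex=False)
        converted = pd.to_numeric(as_text, errors="coerce")
        if converted.notna().any():   # leave pure-text metadata columns untouched
            out[col] = converted
    time = pd.to_numeric(out[time_column], errors="coerce").to_numpy(dtype=float)
    out = out[np.isfinite(time)]      # time values must be finite
    if len(out) < 2:
        raise ValueError("need at least 2 valid time points per sample")
    return out.sort_values(time_column).reset_index(drop=True)

raw = pd.DataFrame({
    "Time": ["2,5", "0,0", "1,0", None],
    "pH": ["7,1", "7,0", "6,9", "6,8"],
    "Medium": ["A", "A", "A", "A"],
})
clean = prepare_sample(raw)
print(clean)
```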
Processing Steps
- Data Loading: Extracts Tables from input ResourceSet
- Decimal Normalization: Converts commas to dots in numeric columns
- Time Grid Generation: Creates interpolation grid(s) based on strategy
- Quality Checks: Validates sufficient data points, handles edge cases
- Interpolation: Applies selected method to each numeric column
- Tag Preservation: Copies all tags from original to interpolated Tables
- Metadata: Adds interpolation method tag for traceability
- Output Assembly: Combines all interpolated Tables into ResourceSet
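The interpolation, preservation, and metadata steps above can be sketched for a single Table. This is an illustration under assumptions: the Table is represented as a pandas DataFrame, tags as a plain dict, and the SciPy classes in `INTERPOLATORS` stand in for the task's method names. `interpolate_table` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import Akima1DInterpolator, CubicSpline, PchipInterpolator

# Assumed mapping from method names to SciPy interpolators (subset only).
INTERPOLATORS = {
    "pchip": PchipInterpolator,
    "akima": Akima1DInterpolator,
    "cubic_spline": CubicSpline,
}

def interpolate_table(df, grid, method="pchip", time_column="Time"):
    """Interpolate every numeric column onto grid; repeat metadata columns as-is."""
    t = df[time_column].to_numpy(dtype=float)
    out = {time_column: grid}
    for col in df.columns:
        if col == time_column:
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            y = df[col].to_numpy(dtype=float)
            ok = np.isfinite(y)                   # skip missing measurements
            out[col] = INTERPOLATORS[method](t[ok], y[ok])(grid)
        else:
            out[col] = [df[col].iloc[0]] * len(grid)
    tags = {"interpolation_method": method}       # recorded for traceability
    return pd.DataFrame(out), tags

sample = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 4.0],
    "Biomass": [0.1, 0.2, 0.5, 1.5],
    "Medium": ["A", "A", "A", "A"],
})
table, tags = interpolate_table(sample, np.linspace(0.0, 4.0, 9))
print(table)
```

The input DataFrame is left unchanged, which mirrors the note that original data is never modified.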
Output Structure
interpolated_resource_set (ResourceSet)
Contains one Table per input sample with:
Columns
- Time: Uniform time grid
- All measurement columns: Interpolated values
- Metadata columns: Preserved as-is (Batch, Fermentor, Medium)
Tags (Preserved)
- batch: Original experiment ID
- sample: Original sample ID
- medium: Original medium name
- missing_value: Original missing data info
- interpolation_method: NEW - Records method used
Column Tags (Preserved)
- is_index_column='true': On time column
- is_data_column='true': On measurement columns
- unit: Units of measurement
Properties
- Same number of rows across all samples (if global_auto)
- Aligned time points for easy comparison
- Smooth, continuous curves
- Ready for plotting and analysis
Use Cases
1. Multi-Sample Comparison
Filter → Interpolate (global_auto, makima) →
Plot all samples on same axes
2. Growth Rate Calculation
Interpolate (cubic_spline) →
Calculate derivatives → Analyze growth kinetics
3. Gap Filling
Data with missing points →
Interpolate (pchip) → Complete time series
4. Noise Reduction
Noisy measurements →
Interpolate (makima, fewer points) → Smoother curves
5. Machine Learning Prep
Interpolate (global_auto, 200 points) →
Fixed-length feature vectors → ML model
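For the machine learning use case, resampling every sample onto one grid yields a fixed-length vector per sample. The sketch below assumes each sample is a pair of time and value arrays, uses linear interpolation for brevity, and the name `to_feature_matrix` is hypothetical.

```python
import numpy as np

def to_feature_matrix(samples, n_points=200):
    """Resample (time, values) pairs onto one grid and stack them row-wise."""
    t_min = min(t[0] for t, _ in samples)
    t_max = max(t[-1] for t, _ in samples)
    grid = np.linspace(t_min, t_max, n_points)
    # np.interp holds the end values flat outside a sample's own time range.
    rows = [np.interp(grid, t, y) for t, y in samples]
    return grid, np.vstack(rows)

samples = [
    (np.array([0.0, 5.0, 10.0]), np.array([0.1, 1.0, 3.0])),
    (np.array([0.0, 4.0, 8.0, 10.0]), np.array([0.2, 0.8, 2.0, 2.5])),
]
grid, features = to_feature_matrix(samples, n_points=200)
print(features.shape)
```

Each row of `features` has the same length regardless of how many measurements the original sample contained, so the matrix can be passed directly to most ML libraries.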
Example Workflows
Standard Analysis Pipeline
CellCultureLoadData
↓ (50 samples with irregular sampling)
Filter (select 10 samples)
↓
Interpolate (method=makima, grid_strategy=global_auto, n_points=500)
↓
[10 samples, each with 500 uniform time points]
↓
Visualization / Statistical Analysis
High-Quality Publication
Filter (select best samples)
↓
Interpolate (method=pchip, n_points=2000)
↓
Export smooth, high-resolution curves
Quick Preview
Interpolate (method=linear, n_points=50)
↓
Fast visualization for QC
Performance Considerations
- Speed: linear > nearest > quadratic > cubic > akima > pchip > cubic_spline > univariate_spline
- Memory: Proportional to (n_samples × n_points × n_columns)
- Quality: Shape-preserving methods generally best for biological data
- Recommendations:
- < 10 samples: Any method, high n_points OK
- 10-100 samples: Use makima/pchip, n_points=500-1000
- > 100 samples: Consider linear or lower n_points
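The memory proportionality above can be turned into a rough estimate. The sketch assumes values are stored as 64-bit floats (8 bytes each) and ignores container overhead. The function name is hypothetical.

```python
def estimate_memory_bytes(n_samples, n_points, n_columns, bytes_per_value=8):
    """Rough size of the interpolated output, assuming float64 storage."""
    return n_samples * n_points * n_columns * bytes_per_value

# 100 samples, 1000 grid points, 20 measurement columns
size = estimate_memory_bytes(100, 1000, 20)
print(f"{size / 1e6:.0f} MB")
```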
Error Handling
Automatic Fallbacks
- Too Few Points: Skips sample with warning
- Non-numeric Data: Preserves original values
- NaN in Time: Filters out invalid rows
- Method Failure: Falls back to linear interpolation
- Spline Order: Reduces order if insufficient points
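Two of these fallbacks (spline order reduction and falling back to linear) can be sketched as below. This assumes SciPy's `UnivariateSpline` as the spline backend, which requires more data points than the spline order. `safe_spline` is a hypothetical helper, and the task's real fallback logic may differ.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def safe_spline(t, y, grid, spline_order=3):
    """Interpolating spline with order reduction and a linear fallback."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    if len(t) < 2:
        raise ValueError("need at least 2 points")
    # Reduce the order when there are too few points for the requested one.
    k = min(spline_order, len(t) - 1)
    try:
        # s=0 forces the spline through every data point.
        return UnivariateSpline(t, y, k=k, s=0)(grid), k
    except Exception:
        # Method failure: fall back to linear interpolation.
        return np.interp(grid, t, y), 1

# Three points cannot support a cubic spline, so the order drops to 2.
values, used_order = safe_spline([0.0, 1.0, 2.0], [0.0, 1.0, 4.0],
                                 [0.0, 1.0, 2.0], spline_order=3)
print(used_order, values)
```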
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Oscillations | Cubic with sparse data | Use pchip or makima |
| Angular curves | Linear method | Switch to makima/pchip |
| Overshoot | Cubic spline | Use shape-preserving method |
| Slow processing | Too many points | Reduce n_points to 500-1000 |
| Missing columns | Time column not found | Check column name exactly |
Best Practices
- Start Simple: Begin with default settings (makima, global_auto)
- Visual Check: Plot original + interpolated to verify quality
- Match Purpose: Choose method based on analysis goal
- Document Choice: Interpolation method affects results
- Validate: Check that interpolation preserves key features
- Right Resolution: More points ≠ better (avoid over-sampling)
Scientific Considerations
For Growth Curves
- Use makima or pchip (preserve exponential growth)
- Avoid cubic spline (may create artificial inflection points)
For Substrate/Product
- Use pchip (preserves monotonic consumption/production)
- Avoid methods that create overshoots
For Environmental
- Use akima (smooth physical parameter changes)
- Linear OK if high-frequency logging
For Derivatives
- Use cubic_spline (smooth second derivative)
- Higher n_points for accurate derivatives
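As an illustration of the derivative case, the specific growth rate μ = (1/X)·dX/dt can be estimated from a cubic spline fitted to biomass. The sketch assumes SciPy's `CubicSpline` and uses synthetic exponential data with a known rate of 0.3 per hour, so the result can be checked.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Synthetic exponential growth with a known specific growth rate (illustrative).
mu_true = 0.3                              # 1/h
t = np.linspace(0.0, 10.0, 21)             # culture time (h)
biomass = 0.1 * np.exp(mu_true * t)        # g/L

spline = CubicSpline(t, biomass)
t_fine = np.linspace(0.0, 10.0, 501)

dX_dt = spline(t_fine, 1)                  # first derivative of the spline
mu = dX_dt / spline(t_fine)                # specific growth rate estimate

print("max deviation from true rate:", np.abs(mu - mu_true).max())
```

On real, noisy measurements the derivative amplifies noise, so the estimate is only as good as the data the spline is forced through.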
Integration with Dashboard
The Cell Culture dashboard provides:
- Interactive Selection: Choose method from dropdown
- Live Preview: See interpolation results immediately
- Comparison: View original vs interpolated side-by-side
- Export: Download interpolated data as CSV
Notes
- Time column name is hardcoded: "Time"
- All numeric columns are interpolated (metadata columns preserved)
- Original data is never modified (creates new Tables)
- Interpolation tag added for traceability
- Compatible with all Cell Culture workflow tasks
- Designed for biological time-series (fermentation focus)
Input
Output
Configuration
time_column (String)
- Description: Name of the time column
- Default: "Time"
batch_column (String)
- Description: Name of the batch column to remove from output
- Default: "ESSAI"
sample_column (String)
- Description: Name of the sample column to remove from output
- Default: "FERMENTEUR"
method (String)
- Description: Interpolation method
- Options: "makima", "linear", "cubic", "nearest", "cubic_spline", "spline", "pchip", "quadratic", "akima", "univariate_spline"
- Default: "makima"
grid_strategy (String)
- Description: Strategy for time grid generation
- Options: "global_auto", "per_file", "reference"
- Default: "global_auto"
n_points (Integer)
- Description: Number of points in interpolation grid (auto if not set)
- Default: 500
spline_order (Integer)
- Description: Order of spline interpolation for univariate_spline method (1-5)
- Default: 3
edge_strategy (String)
- Description: How to handle data beyond original time range
- Options: "nearest", "linear", "nan"
- Default: "nan"
reference_index (Integer)
- Description: Index of resource to use as reference for grid (when grid_strategy='reference')
min_values_threshold (Integer)
- Description: Interpolate only columns with at least this many non-NaN values. If None, interpolate all columns.
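The min_values_threshold rule can be sketched as a column split: columns with enough real measurements are interpolated, and sparser ones keep only their measured values. The helper `split_columns` and the DataFrame representation are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def split_columns(df, time_column="Time", min_values_threshold=None):
    """Return (columns to interpolate, columns kept as real measurements)."""
    numeric = [c for c in df.columns
               if c != time_column and pd.api.types.is_numeric_dtype(df[c])]
    if min_values_threshold is None:
        return numeric, []                 # no threshold: interpolate everything
    dense = [c for c in numeric if df[c].notna().sum() >= min_values_threshold]
    sparse = [c for c in numeric if c not in dense]
    return dense, sparse

sample = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0, 4.0],
    "pH": [7.0, 6.9, 6.8, 6.8, 6.7],               # logged at every time point
    "Titre": [np.nan, 0.2, np.nan, np.nan, 1.1],   # only two offline measurements
})
to_interpolate, keep_real = split_columns(sample, min_values_threshold=3)
print(to_interpolate, keep_real)
```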