Overview
The Cell Culture Application is a comprehensive analysis platform designed for fermentation and cell culture data analysis. It provides an intuitive workflow for processing, visualizing, and analyzing biological data from fermentation experiments.
Main Workflow Steps
1. Overview 📋
The starting point of your analysis. This step displays the results of the data loading scenario and helps you verify data quality.
What you'll see:
- Basic Statistics (4 key metrics):
  - Total Samples: Total number of batch-sample pairs found in the info file
  - Valid Samples: Number of complete samples (all data types present)
  - Completion Rate: Percentage of samples with complete data
  - Data Tables: Number of individual tables in the loaded ResourceSet
- Missing Data Visualization (Venn Diagram):
  - Interactive 3-circle Venn diagram showing data coverage
  - Blue circle (Info): Samples with info file data
  - Green circle (Raw Data): Samples with time series data
  - Purple circle (Follow-up): Samples with follow-up measurements
  - Center overlap: Complete samples with all three data types
  - Numbers show the count of samples in each region
  - Expandable table below lists all missing data details (Batch, Sample, Missing Value types)
- Complete Data Visualizations:
  - Batch Distribution: Pie chart showing sample count per batch
  - Medium Distribution: Bar chart showing sample count per medium type
Key Actions:
- Verify all expected samples were loaded
- Identify which samples have missing data (info/raw_data/follow_up)
- Check batch and medium distribution
- Confirm data completeness before proceeding to selection
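The basic statistics above can be sketched in a few lines. This is a stdlib-only illustration; the record structure (`info`, `raw_data`, `follow_up` flags) is an assumption for the example, not the application's actual schema.

```python
# Hypothetical per-sample records; each flag marks whether that data type is present
samples = [
    {"batch": "B1", "sample": "S1", "info": True, "raw_data": True, "follow_up": True},
    {"batch": "B1", "sample": "S2", "info": True, "raw_data": True, "follow_up": False},
    {"batch": "B2", "sample": "S1", "info": True, "raw_data": False, "follow_up": True},
]

total = len(samples)
valid = sum(1 for s in samples if s["info"] and s["raw_data"] and s["follow_up"])
completion_rate = 100 * valid / total if total else 0

# Missing-data details, analogous to the expandable table under the Venn diagram
missing = [
    (s["batch"], s["sample"], [k for k in ("info", "raw_data", "follow_up") if not s[k]])
    for s in samples
    if not (s["info"] and s["raw_data"] and s["follow_up"])
]

print(total, valid, round(completion_rate, 1))  # 3 1 33.3
```

A complete sample counts toward the completion rate only when all three data types are present, which is exactly the center overlap of the Venn diagram.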
2. Selection ✅
Interactive tool to select which batch-sample pairs you want to analyze.
What you'll see:
- Existing Selections (if any):
  - List of previously created selections with their ID and status
  - Expandable view to see selection details
- Create New Selection:
  - Interactive Data Table: Shows all valid samples (only complete data, no missing values)
  - Columns displayed: Batch, Sample, Medium
  - Multi-row selection mode: Click on rows to select/deselect
  - Selected rows are highlighted
How to use:
- Review the table of valid samples
- Click on rows to select the batch-sample pairs you want to analyze
- Click "Validate Selection" button to create a new selection scenario
- The system launches a filtering scenario that creates a new ResourceSet with only selected samples
When to Use:
- Remove failed or problematic experiments from analysis
- Focus on specific batches or conditions
- Create multiple selections for comparison (e.g., "control group", "treatment group")
- Reduce dataset size for faster analysis
Output:
- New selection scenario created and launched
- Filtered ResourceSet containing only selected batch-sample pairs
- Selection saved with timestamp for future reference
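Conceptually, the filtering scenario keeps only the chosen batch-sample pairs. A minimal sketch, assuming a simple list-of-dicts dataset (the application stores this in a ResourceSet):

```python
# Hypothetical dataset: one record per batch-sample pair
dataset = [
    {"batch": "B1", "sample": "S1", "od": [0.1, 0.5, 1.2]},
    {"batch": "B1", "sample": "S2", "od": [0.1, 0.4, 0.9]},
    {"batch": "B2", "sample": "S1", "od": [0.2, 0.6, 1.5]},
]
# Pairs chosen by clicking rows in the interactive table
selected = {("B1", "S1"), ("B2", "S1")}

# Keep only records whose (batch, sample) pair was selected
filtered = [row for row in dataset if (row["batch"], row["sample"]) in selected]
print(len(filtered))  # 2
```

Because each selection produces its own filtered dataset, you can keep several selections (e.g., control vs. treatment) side by side without touching the original data.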
3. Data Visualization
3.1. Table View 📊
Browse your data in tabular format.
Features:
- Sortable and filterable columns
- Search functionality
- Export to CSV
- View metadata (batch, sample, medium composition)
- Pagination for large datasets
3.2. Graph View 📈
Visualize time series data with interactive plots.
Features:
- Multi-sample Plots: Compare multiple fermentation curves
- Parameter Selection: Choose which measurements to display (OD, pH, glucose, etc.)
- Color Coding: Automatically color by batch, sample, or medium type
- Interactive Zoom: Focus on specific time ranges
- Export Options: Download plots as PNG or SVG
Common Visualizations:
- Growth curves (OD vs Time)
- Substrate consumption
- Product formation
- pH evolution
- Multi-parameter overlay
3.3. Medium View 🧪
Explore and manage medium composition data.
Features:
- Medium Composition Table: View all medium formulations
- Compare Formulations: Side-by-side comparison of different media
- Import/Export: Upload new medium compositions or export existing ones
- Metadata Integration: Link medium data with fermentation results
Typical Use Cases:
- Document medium variations
- Identify formulation differences
- Prepare data for predictive analysis
4. Quality Check 🔍
After visualizing your data, validate and clean it.
Features:
- Outlier Detection: Identify and remove statistical outliers using configurable thresholds
- Data Validation: Check for missing values, duplicates, and inconsistencies
- Visual Quality Reports:
  - Distribution plots
  - Time series plots with outliers highlighted
  - Missing data visualization
- Quality Metrics:
  - Percentage of valid data points
  - Outlier statistics per sample
  - Data completeness scores
Configuration Options:
- Z-score threshold for outlier detection
- Minimum data points per sample
- Missing value handling strategies
Output: Cleaned and validated dataset ready for analysis
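The z-score threshold mentioned above works like this minimal stdlib sketch; the application's actual implementation may differ (e.g., robust statistics or per-window computation):

```python
from statistics import mean, stdev

def flag_outliers(values, z_threshold=3.0):
    """Return a parallel list of booleans: True where |z-score| exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [False] * len(values)  # constant series: nothing to flag
    return [abs((v - mu) / sigma) > z_threshold for v in values]

readings = [0.50, 0.52, 0.49, 0.51, 5.00, 0.50]  # one obvious spike
flags = flag_outliers(readings, z_threshold=2.0)
print(flags)  # [False, False, False, False, True, False]
```

Lowering the threshold flags more points; reviewing flagged points manually (as recommended under Best Practices) avoids discarding genuine biological signal.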
5. Exploratory Analysis
5.1. Medium PCA 🔬
Principal Component Analysis on medium composition data.
Purpose: Reduce dimensionality of medium composition data and identify key formulation patterns.
Features:
- Component Selection: Choose number of principal components (typically 2-3)
- Variance Explained: Understand how much variation each component captures
- 2D/3D Visualization: Interactive scatter plots colored by batches or outcomes
- Loadings Plot: See which medium components contribute most to each PC
Insights:
- Identify similar medium formulations
- Detect clustering patterns
- Understand which nutrients drive variability
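The mechanics of PCA on a medium-composition matrix can be sketched with numpy's SVD. Component names and concentrations below are invented for illustration:

```python
import numpy as np

# Hypothetical medium matrix: rows = media, columns = glucose, nitrogen, phosphate (g/L)
X = np.array([
    [20.0, 2.0, 0.5],
    [18.0, 2.2, 0.4],
    [5.0,  4.0, 1.0],
    [6.0,  3.8, 1.1],
])

Xc = X - X.mean(axis=0)                  # center each component
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # sample coordinates on the PCs (scatter plot)
explained = s**2 / np.sum(s**2)          # fraction of variance per component
loadings = Vt                            # rows: contribution of each nutrient to each PC

print(explained.round(3))                # PC1 dominates for this toy data
```

The `explained` vector corresponds to the "Variance Explained" display, and `loadings` to the loadings plot showing which nutrients drive each component.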
5.2. Medium UMAP 🗺️
UMAP (Uniform Manifold Approximation and Projection) for medium composition.
Purpose: Non-linear dimensionality reduction for complex medium composition patterns.
Configuration:
- Number of Neighbors: Controls local vs global structure (default: 15)
- Minimum Distance: Affects point clustering tightness
- 2D/3D Output: Choose dimensionality for visualization
- K-Means Clustering: Optional automatic clustering
Advantages over PCA:
- Better preserves local structure
- More effective for non-linear relationships
- Clearer visual separation of groups
Results:
- Interactive 2D/3D plots
- Cluster assignments (if enabled)
- Downloadable coordinates table
6. Feature Extraction 📉
Extract biological characteristics from growth curves.
Purpose: Convert raw time-series data into biological features for predictive modeling.
Extracted Features:
- Growth Parameters:
  - Maximum growth rate (μmax)
  - Lag phase duration
  - Exponential phase duration
  - Maximum OD reached
- Substrate Consumption:
  - Consumption rate
  - Yield coefficients
- Product Formation:
  - Production rate
  - Final titer
  - Productivity
Configuration:
- Select which parameters to extract
- Define calculation windows
- Choose smoothing methods
Output:
- Table with one row per sample
- Columns for each extracted biological feature
- Ready for machine learning analysis
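As one concrete example, μmax is commonly estimated as the steepest slope of ln(OD) over a sliding window; the window size and smoothing here are assumptions, not the application's exact settings:

```python
import numpy as np

def mu_max(time_h, od, window=3):
    """Largest slope of ln(OD) over any `window` consecutive points (1/h)."""
    ln_od = np.log(od)
    best = -np.inf
    for i in range(len(time_h) - window + 1):
        # Linear fit of ln(OD) vs time on the window; [0] is the slope
        slope = np.polyfit(time_h[i:i + window], ln_od[i:i + window], 1)[0]
        best = max(best, slope)
    return best

# Synthetic curve: exponential growth at 0.6/h, then a plateau
t = np.array([0, 1, 2, 3, 4, 5], dtype=float)
od = 0.1 * np.exp(0.6 * np.minimum(t, 4))
print(round(mu_max(t, od), 3))  # 0.6
```

Each extracted feature becomes one column in the per-sample output table, which is what the downstream models consume.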
7. Advanced Analysis
These analyses require the features extracted in Step 6 (Feature Extraction). Run Feature Extraction before using these tools.
7.1. Metadata-Feature UMAP 🎯
Combine medium composition and extracted features for comprehensive UMAP analysis.
Prerequisites: Requires extracted features from Feature Extraction step.
Purpose: Visualize relationships between medium formulations and biological outcomes.
Features:
- Combined Dataset: Merges medium metadata with extracted features
- Column Selection: Choose which features to include in UMAP
- Medium Name Coloring: Color points by medium composition
- Hover Data: Display batch, trial, and other metadata on hover
- 2D/3D Visualization: Interactive plots with clustering option
Use Cases:
- Identify medium-performance relationships
- Find optimal formulation clusters
- Detect outlier experiments
7.2. PLS Regression 📊
Partial Least Squares regression for predictive modeling.
Prerequisites: Requires extracted features from Feature Extraction step.
Purpose: Model relationships between medium composition (X) and biological characteristics (Y).
Workflow:
- Select Target Variable: Choose which biological feature to predict (e.g., μmax, final titer)
- Configure Model:
  - Number of components (with cross-validation)
  - Columns to exclude
- Train Model: Automatic train/test split
- Evaluate Results
Results Display:
- Performance Metrics:
  - R² score
  - RMSE (Root Mean Square Error)
  - MAE (Mean Absolute Error)
- Predictions vs Actual: Scatter plots for train and test sets
- VIP Scores: Variable Importance in Projection (VIP > 1 = important)
- Component Selection: Cross-validation plot to choose optimal components
Advantages:
- Handles correlated predictors well
- Works with small sample sizes
- Provides interpretable variable importance
7.3. Random Forest Regression 🌲
Ensemble learning for non-linear predictive modeling.
Prerequisites: Requires extracted features from Feature Extraction step.
Purpose: Predict biological characteristics using decision tree ensembles.
Configuration:
- Number of Trees: More trees = better stability (default: 100)
- Maximum Depth: Controls tree complexity (None = no limit)
- Random Seed: For reproducibility
- Target Variable: Which biological feature to predict
- Feature Selection: Choose which medium components to include
Results Display:
- Performance Metrics: R², RMSE, MAE
- Feature Importance: Bar chart showing which medium components matter most
- Predictions vs Actual: Train and test set visualizations
- Top 10/20 Important Variables: Tables with importance scores
When to Use:
- Non-linear relationships expected
- Large feature sets
- Need robust predictions
- Want feature importance rankings
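The configuration above maps directly onto scikit-learn's `RandomForestRegressor`; the data below is invented so the importance ranking is known in advance:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Four hypothetical medium components; only the first two influence the outcome
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(60, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # deliberately non-linear in column 0

model = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=42)
model.fit(X, y)

# Feature importances should rank the two informative columns first
order = np.argsort(model.feature_importances_)[::-1]
print([int(i) for i in order[:2]])
```

This is the same ranking shown in the "Feature Importance" bar chart; the uninformative columns receive near-zero importance.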
7.4. Causal Effect Analysis 🔗
Identify cause-and-effect relationships between medium components and outcomes.
Purpose: Go beyond correlation to understand causal relationships.
Methods:
- Causal inference algorithms
- Intervention analysis
- Counterfactual reasoning
Features:
- Causal graph visualization
- Effect size estimation
- Confidence intervals
Output:
- Directed causal graphs
- Effect magnitude tables
- Recommendations for medium optimization
7.5. Optimization ⚙️
Use genetic algorithms to find optimal medium composition.
Purpose: Automatically discover the best medium formulation to maximize biological performance.
Configuration:
- Constraints Resource: Upload JSON file with component bounds, e.g.:

  {
    "glucose": {"lower_bound": 0, "upper_bound": 20},
    "nitrogen": {"lower_bound": 0.5, "upper_bound": 5}
  }
- Optimization Parameters:
  - Population size (default: 50)
  - Number of iterations (default: 100)
- Targets and Objectives:
  - Define which features to optimize (e.g., μmax, final titer)
  - Set minimum acceptable values for each target
  - Add multiple targets (multi-objective optimization)
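A stdlib sketch of loading and sanity-checking a constraints file in the format shown above (component name mapped to lower/upper bounds):

```python
import json

# Example constraints document in the documented format
raw = '''
{
  "glucose": {"lower_bound": 0, "upper_bound": 20},
  "nitrogen": {"lower_bound": 0.5, "upper_bound": 5}
}
'''

constraints = json.loads(raw)
for component, bounds in constraints.items():
    lo, hi = bounds["lower_bound"], bounds["upper_bound"]
    if lo > hi:
        raise ValueError(f"{component}: lower_bound exceeds upper_bound")

print(sorted(constraints))  # ['glucose', 'nitrogen']
```

Checking bounds before launching the optimizer avoids wasting a long run on an infeasible search space.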
Algorithm:
- Genetic algorithm with mutation and crossover
- Pareto optimization for multi-objective cases
- Constraint handling
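The mutation/crossover loop can be illustrated with a toy stdlib-only genetic algorithm. The fitness function below is a stand-in with a known optimum (glucose = 12, nitrogen = 2), not the application's trained model, and the operators are deliberately minimal:

```python
import random

random.seed(0)
BOUNDS = {"glucose": (0, 20), "nitrogen": (0.5, 5)}
KEYS = list(BOUNDS)

def fitness(m):
    # Stand-in objective peaking at glucose=12, nitrogen=2 (higher is better)
    return -((m["glucose"] - 12) ** 2 + 4 * (m["nitrogen"] - 2) ** 2)

def random_medium():
    return {k: random.uniform(*BOUNDS[k]) for k in KEYS}

def crossover(a, b):
    # Uniform crossover: each component inherited from either parent
    return {k: random.choice((a[k], b[k])) for k in KEYS}

def mutate(m, rate=0.3):
    # Redraw each component within its constraint bounds with probability `rate`
    return {k: (random.uniform(*BOUNDS[k]) if random.random() < rate else m[k])
            for k in KEYS}

pop = [random_medium() for _ in range(50)]      # population size 50
for _ in range(100):                            # 100 iterations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                          # elitism: keep the fittest
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(40)]
    pop = parents + children

best = max(pop, key=fitness)
print(round(best["glucose"], 1), round(best["nitrogen"], 1))
```

A real run would also track per-generation best fitness (the convergence plot) and, for multiple targets, keep a Pareto front instead of a single best individual.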
Results:
- Optimal medium formulation(s)
- Predicted performance values
- Convergence plots
- Sensitivity analysis
Use Cases:
- Design of Experiments (DoE) follow-up
- Medium formulation development
- Cost reduction while maintaining performance
- Multi-parameter optimization
Data Flow Summary
1. Load Data (from files or existing resources)
↓
2. Overview (verify data loaded correctly, check missing data)
↓
3. Selection (select batch-sample pairs to analyze)
↓
4. Visualization (Table/Graph/Medium views)
↓
5. Quality Check (remove outliers, validate data)
↓
6. Exploratory Analysis (PCA/UMAP on medium data)
↓
7. Feature Extraction (extract biological characteristics)
↓
8. Advanced Analysis (predictive models, optimization)
Tips and Best Practices
Data Preparation
- Ensure consistent column naming across files
- Check for missing values before analysis
- Document medium compositions clearly
- Use meaningful batch and sample names
Selection Strategy
- Create multiple selections for comparison (e.g., "good batches", "failed batches")
- Keep a "full dataset" selection for reference
- Select only valid samples (complete data)
Visualization & Quality Control
- Visualize data first to identify potential issues
- Always run Quality Check after visualization
- Review outlier detection results manually
- Document why specific samples are excluded
- Export quality reports for documentation
Analysis Workflow
- Start with exploratory analysis (PCA/UMAP) before modeling
- Extract features before running predictive models
- Use PLS for initial modeling, Random Forest for complex relationships
- Validate models with train/test splits
Optimization
- Start with wide constraint ranges, then narrow
- Use multiple optimization runs with different random seeds
- Validate optimization results experimentally
- Consider cost constraints in multi-objective optimization
Common Issues and Solutions
Problem: "No data available"
Solution: Check that Selection step completed successfully and produced a valid ResourceSet.
Problem: "Analysis failed"
Solution:
- Verify input data has required columns
- Check for missing values in critical columns
- Ensure numeric columns are properly formatted
Problem: "Model performance is poor"
Solution:
- Increase number of samples (minimum 20-30 recommended)
- Check for outliers in data
- Try different feature sets
- Consider data normalization
Problem: "Optimization doesn't converge"
Solution:
- Increase population size or iterations
- Check constraint bounds are reasonable
- Verify target objectives are achievable
- Review quality of training data
Export and Reporting
All analysis steps provide export options:
- CSV Downloads: Tables, coordinates, metrics
- Plot Exports: PNG, SVG formats
- Scenario Reports: Complete analysis documentation
- Model Artifacts: Save trained models for reuse