User Guide

Overview


The Cell Culture Application is an analysis platform for fermentation and cell culture data. It provides a guided workflow for processing, visualizing, and analyzing biological data from fermentation experiments.


Main Workflow Steps


1. Overview 📋


The starting point of your analysis. This step displays the results of the data loading scenario and helps you verify data quality.


What you'll see:


1. Basic Statistics (4 key metrics):
   • Total Samples: Total number of batch-sample pairs found in the info file
   • Valid Samples: Number of complete samples (all data types present)
   • Completion Rate: Percentage of samples with complete data
   • Data Tables: Number of individual tables in the loaded ResourceSet
2. Missing Data Visualization (Venn Diagram):
   • Interactive 3-circle Venn diagram showing data coverage
   • Blue circle (Info): Samples with info file data
   • Green circle (Raw Data): Samples with time series data
   • Purple circle (Follow-up): Samples with follow-up measurements
   • Center overlap: Complete samples with all three data types
   • Numbers show the count of samples in each region
   • Expandable table below lists all missing data details (Batch, Sample, Missing Value types)
3. Complete Data Visualizations:
   • Batch Distribution: Pie chart showing sample count per batch
   • Medium Distribution: Bar chart showing sample count per medium type
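
The basic statistics can be understood as set operations on batch-sample pairs. The sketch below is illustrative only: the function name `coverage_stats`, the tuple representation of a pair, and the example data are assumptions made for this guide, not the application's actual code.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (batch, sample); this representation is an assumption


def coverage_stats(
    info: Set[Pair], raw_data: Set[Pair], follow_up: Set[Pair]
) -> Dict[str, object]:
    """Compute overview metrics from the pairs present in each data type."""
    # Valid samples sit in the center overlap of the Venn diagram.
    valid = info & raw_data & follow_up

    # For every incomplete pair listed in the info file, record what it lacks.
    missing: Dict[Pair, List[str]] = {}
    for pair in sorted(info - valid):
        missing[pair] = [
            name
            for name, present in (("raw_data", raw_data), ("follow_up", follow_up))
            if pair not in present
        ]

    rate = 100.0 * len(valid) / len(info) if info else 0.0
    return {
        "total_samples": len(info),    # pairs found in the info file
        "valid_samples": len(valid),   # pairs with all three data types
        "completion_rate": rate,       # percentage of complete samples
        "missing": missing,
    }


# Hypothetical example data
info = {("B1", "S1"), ("B1", "S2"), ("B2", "S1"), ("B2", "S2")}
raw_data = {("B1", "S1"), ("B1", "S2"), ("B2", "S1")}
follow_up = {("B1", "S1"), ("B2", "S1"), ("B2", "S2")}

stats = coverage_stats(info, raw_data, follow_up)
print(stats["total_samples"], stats["valid_samples"], stats["completion_rate"])
print(stats["missing"])
```

In this example two of the four pairs are complete, so the completion rate is 50%, and the `missing` mapping plays the role of the expandable missing-data table.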

Key Actions:

• Verify all expected samples were loaded
• Identify which samples have missing data (info/raw_data/follow_up)
• Check batch and medium distribution
• Confirm data completeness before proceeding to selection

2. Selection ✅

Interactive tool to select which batch-sample pairs you want to analyze.

What you'll see:

1. Existing Selections (if any):
   • List of previously created selections with their ID and status
   • Expandable view to see selection details
2. Create New Selection:
   • Interactive Data Table: Shows all valid samples (only complete data, no missing values)
   • Columns displayed: Batch, Sample, Medium
   • Multi-row selection mode: Click on rows to select/deselect
   • Selected rows are highlighted

How to use:

1. Review the table of valid samples
2. Click on rows to select the batch-sample pairs you want to analyze
3. Click the "Validate Selection" button to create a new selection scenario
4. The system launches a filtering scenario that creates a new ResourceSet containing only the selected samples
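
Conceptually, the filtering step keeps only the tables whose batch-sample pair was selected. The following is a minimal sketch under stated assumptions: a plain dictionary stands in for the ResourceSet, and `filter_selection` is a hypothetical helper, not the application's API.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (batch, sample)


def filter_selection(
    tables: Dict[Pair, List[dict]], selected: Iterable[Pair]
) -> Dict[Pair, List[dict]]:
    """Return a new mapping containing only the selected batch-sample pairs."""
    wanted = set(selected)
    # Fail early if a selected pair is not part of the valid samples.
    unknown = wanted - set(tables)
    if unknown:
        raise KeyError(f"Selected pairs not in dataset: {sorted(unknown)}")
    # Build a new mapping so the original dataset stays untouched.
    return {pair: rows for pair, rows in tables.items() if pair in wanted}


# Hypothetical dataset: one small time series per pair
dataset = {
    ("B1", "S1"): [{"time": 0, "OD": 0.1}, {"time": 1, "OD": 0.2}],
    ("B1", "S2"): [{"time": 0, "OD": 0.1}, {"time": 1, "OD": 0.1}],
    ("B2", "S1"): [{"time": 0, "OD": 0.1}, {"time": 1, "OD": 0.3}],
}

control_group = filter_selection(dataset, [("B1", "S1"), ("B2", "S1")])
print(sorted(control_group))
```

Returning a new mapping rather than modifying the input mirrors the guide's description: each selection produces its own filtered dataset, so several selections can coexist for comparison.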

When to Use:

• Remove failed or problematic experiments from analysis
• Focus on specific batches or conditions
• Create multiple selections for comparison (e.g., "control group", "treatment group")
• Reduce dataset size for faster analysis

Output:

• New selection scenario created and launched
• Filtered ResourceSet containing only selected batch-sample pairs
• Selection saved with timestamp for future reference

3. Data Visualization

3.1. Table View 📊

Browse your data in tabular format.

Features:

• Sortable and filterable columns
• Search functionality
• Export to CSV
• View metadata (batch, sample, medium composition)
• Pagination for large datasets

3.2. Graph View 📈

Visualize time series data with interactive plots.

Features:

• Multi-sample Plots: Compare multiple fermentation curves
• Parameter Selection: Choose which measurements to display (OD, pH, glucose, etc.)
• Color Coding: Automatically color by batch, sample, or medium type
• Interactive Zoom: Focus on specific time ranges
• Export Options: Download plots as PNG or SVG

Common Visualizations:

• Growth curves (OD vs Time)
• Substrate consumption
• Product formation
• pH evolution
• Multi-parameter overlay

3.3. Medium View 🧪

Explore and manage medium composition data.

Features:

• Medium Composition Table: View all medium formulations
• Compare Formulations: Side-by-side comparison of different media
• Import/Export: Upload new medium compositions or export existing ones
• Metadata Integration: Link medium data with fermentation results

Typical Use Cases:

• Document medium variations
• Identify formulation differences
• Prepare data for predictive analysis

4. Quality Check 🔍

After visualizing your data, validate and clean it.

Features:

• Outlier Detection: Identify and remove statistical outliers using configurable thresholds
• Data Validation: Check for missing values, duplicates, and inconsistencies
• Visual Quality Reports:
  • Distribution plots
  • Time series plots with outliers highlighted
  • Missing data visualization
• Quality Metrics:
  • Percentage of valid data points
  • Outlier statistics per sample
  • Data completeness scores

Configuration Options:

• Z-score threshold for outlier detection
• Minimum data points per sample
• Missing value handling strategies

Output: Cleaned and validated dataset ready for analysis
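
To make the Z-score threshold concrete, here is a minimal sketch of Z-score outlier detection. It is not the application's implementation: the function name, the default values, and the use of the sample standard deviation are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def zscore_outliers(
    values: List[float], threshold: float = 3.0, min_points: int = 3
) -> List[int]:
    """Return the indices of values whose |z-score| exceeds the threshold."""
    # Too few points to estimate a spread reliably: flag nothing.
    if len(values) < max(min_points, 2):
        return []
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumption)
    if sigma == 0:
        return []  # all values identical, so there are no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical pH readings with one suspicious spike at the end
ph = [7.0, 7.1, 6.9, 7.0, 7.1, 6.9, 7.0, 9.5]
outliers = zscore_outliers(ph, threshold=2.0)
print(outliers)
```

With only eight points, a single extreme value cannot reach a Z-score of 3 (the maximum here is about 2.47), so the example uses a threshold of 2.0. This is one reason a minimum number of data points per sample matters when choosing the threshold.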


5. Exploratory Analysis

5.1. Medium PCA 🔬

Principal Component Analysis on medium composition data.

Purpose: Reduce dimensionality of medium composition data and identify key formulation patterns.

Features:

• Component Selection: Choose number of principal components (typically 2-3)
• Variance Explained: Understand how much variation each component captures
• 2D/3D Visualization: Interactive scatter plots colored by batches or outcomes
• Loadings Plot: See which medium components contribute most to each PC

Insights:

• Identify similar medium formulations
• Detect clustering patterns
• Understand which nutrients drive variability
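
The scores, loadings, and explained variance mentioned above can be illustrated with a small PCA computed through a singular value decomposition. This is a sketch using NumPy, assuming standardized inputs; the function name `pca` and the medium table are hypothetical, and the application may compute PCA differently.

```python
import numpy as np


def pca(X, n_components=2):
    """Return (scores, loadings, explained_variance_ratio) for the first components."""
    X = np.asarray(X, dtype=float)
    # Standardize each medium component so that units do not dominate the result.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized matrix: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # contribution of each variable
    return scores, loadings, explained[:n_components]


# Hypothetical media: columns are glucose, yeast extract, MgSO4 (g/L)
media = [
    [10.0, 5.0, 0.5],
    [20.0, 5.0, 0.5],
    [10.0, 10.0, 1.0],
    [20.0, 10.0, 1.0],
    [15.0, 7.5, 0.8],
]

scores, loadings, explained = pca(media, n_components=2)
print(np.round(explained, 3))
```

The `scores` array is what a 2D scatter plot would show (one point per medium), while each column of `loadings` indicates how strongly each medium component contributes to that principal component.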

5.2. Medium UMAP 🗺️

UMAP (Uniform Manifold Approximation and Projection) for medium composition.

Purpose: Non-linear dimensionality reduction for complex medium composition patterns.

Configuration:

• Number of Neighbors: Controls the balance between local and global structure (default: 15)
• Minimum Distance: Controls how tightly points are packed together in the plot
• 2D/3D Output: Choose dimensionality for visualization
• K-Means Clustering: Optional automatic clustering

Advantages over PCA:

• Tends to preserve local structure better
• More effective for non-linear relationships
• Often gives clearer visual separation of groups

Results:

• Interactive 2D/3D plots
• Cluster assignments (if enabled)
• Downloadable coordinates table

6. Feature Extraction 📉

Extract biological characteristics from growth curves.

Purpose: Convert raw time-series data into biological features for predictive modeling.

Extracted Features:

• Growth Parameters:
  • Maximum growth rate (μmax)
  • Lag phase duration
  • Exponential phase duration
  • Maximum OD reached
• Substrate Consumption:
  • Consumption rate
  • Yield coefficients
• Product Formation:
  • Production rate
  • Final titer
  • Productivity

Configuration:

• Select which parameters to extract
• Define calculation windows
• Choose smoothing methods

Output:

• Table with one row per sample
• Columns for each extracted biological feature
• Ready for machine learning analysis
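
As an illustration of how a feature such as μmax can be derived from a growth curve, the sketch below takes the steepest slope of ln(OD) against time over a sliding window. This is one common approach, not necessarily the method the application uses; the function name, the window size, and the example curve are assumptions.

```python
import math
from typing import Sequence


def max_growth_rate(times: Sequence[float], od: Sequence[float], window: int = 3) -> float:
    """Estimate the maximum specific growth rate as the steepest ln(OD) slope."""
    if len(times) != len(od):
        raise ValueError("times and od must have the same length")
    if window < 2 or len(times) < window:
        raise ValueError("need at least `window` points, and window >= 2")
    log_od = [math.log(v) for v in od]  # OD values must be strictly positive

    best = float("-inf")
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = log_od[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        if sxx == 0:
            continue  # repeated time stamps: the slope is undefined in this window
        # Least-squares slope of ln(OD) versus time within the window.
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best = max(best, sxy / sxx)
    if best == float("-inf"):
        raise ValueError("no window with distinct time points")
    return best


# Hypothetical curve: lag phase, exponential growth at 0.5 per hour, then plateau
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
od = [0.1, 0.1] + [0.1 * math.exp(0.5 * k) for k in (1, 2, 3)] + [0.1 * math.exp(1.5)]

mu_max = max_growth_rate(times, od, window=3)
print(round(mu_max, 3))
```

The `window` argument corresponds to the idea of a calculation window: a wider window is less sensitive to noise but can underestimate μmax when the exponential phase is short.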

7. Advanced Analysis

These analyses require the extracted features from Step 6 (Feature Extraction). Run Feature Extraction before using these tools.

7.1. Metadata-Feature UMAP 🎯

Combine medium composition and extracted features for comprehensive UMAP analysis.

Prerequisites: Requires extracted features from the Feature Extraction step.

Purpose: Visualize relationships between medium formulations and biological outcomes.

Features:

• Combined Dataset: Merges medium metadata with extracted features
• Column Selection: Choose which features to include in UMAP
• Medium Name Coloring: Color points by medium composition
• Hover Data: Display batch, trial, and other metadata on hover
• 2D/3D Visualization: Interactive plots with clustering option

Use Cases:

• Identify medium-performance relationships
• Find optimal formulation clusters
• Detect outlier experiments

7.2. PLS Regression 📊

Partial Least Squares regression for predictive modeling.

Prerequisites: Requires extracted features from the Feature Extraction step.

Purpose: Model relationships between medium composition (X) and biological characteristics (Y).

Workflow:

1. Select Target Variable: Choose which biological feature to predict (e.g., μmax, final titer)
2. Configure Model:
   • Number of components (with cross-validation)
   • Columns to exclude
3. Train Model: Automatic train/test split
4. Evaluate Results

Results Display:

• Performance Metrics:
  • R² score
  • RMSE (Root Mean Square Error)
  • MAE (Mean Absolute Error)
• Predictions vs Actual: Scatter plots for train and test sets
• VIP Scores: Variable Importance in Projection (by convention, VIP > 1 marks a variable as important)
• Component Selection: Cross-validation plot to choose optimal components
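
For reference, the three performance metrics and the VIP > 1 rule of thumb can be written out in a few lines. This sketch only shows the standard definitions; the function names and the example numbers are hypothetical and do not come from the application.

```python
import math
from typing import Dict, List, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², RMSE, and MAE from observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


def important_variables(vip: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """List variables whose VIP score exceeds the threshold, highest first."""
    kept = [name for name, score in vip.items() if score > threshold]
    return sorted(kept, key=lambda name: vip[name], reverse=True)


# Hypothetical test-set values and VIP scores
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
selected = important_variables({"glucose": 1.6, "yeast_extract": 1.1, "MgSO4": 0.4})
print(metrics)
print(selected)
```

R² close to 1 with low RMSE and MAE on the test set suggests the model generalizes; a large gap between train and test metrics usually points to overfitting, for example from too many components.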

Advantages:

• Handles correlated predictors well
• Works with small sample sizes
• Provides interpretable variable importance

7.3. Random Forest Regression 🌲

Ensemble learning for non-linear predictive modeling.

Prerequisites: Requires extracted features from the Feature Extraction step.

Purpose: Predict biological characteristics using decision tree ensembles.

Configuration:

• Number of Trees: More trees generally give more stable predictions, at the cost of longer computation (default: 100)
• Maximum Depth: Controls tree complexity (None = no limit)
• Random Seed: For reproducibility
• Target Variable: Which biological feature to predict
• Feature Selection: Choose which medium components to include

Results Display:

• Performance Metrics: R², RMSE, MAE
• Feature Importance: Bar chart showing which medium components matter most
• Predictions vs Actual: Train and test set visualizations
• Top 10/20 Important Variables: Tables with importance scores
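
The configuration options above map naturally onto scikit-learn's `RandomForestRegressor`. Whether the application uses scikit-learn internally is an assumption; the sketch below uses synthetic data and hypothetical feature names purely to show how the settings and outputs relate.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 60 "media" with 3 hypothetical components.
rng = np.random.default_rng(0)
feature_names = ["glucose", "yeast_extract", "MgSO4"]
X = rng.uniform(0, 10, size=(60, 3))
# Hypothetical target with a non-linear dependence on the second component.
y = 0.3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(
    n_estimators=100,  # Number of Trees
    max_depth=None,    # Maximum Depth (None = no limit)
    random_state=0,    # Random Seed
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Performance metrics on the held-out test set
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = float(mean_absolute_error(y_test, y_pred))

# Feature importance ranking, highest first
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
print(f"R2={r2:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Fixing the random seed makes both the train/test split and the forest reproducible, which matters when comparing importance rankings between runs.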

When to Use:

• Non-linear relationships expected
• Large feature sets
• Need robust predictions
• Want feature importance rankings

                                                                                                                                                                                                                                                7.4. Causal Effect Analysis 🔗


                                                                                                                                                                                                                                                Identify cause-and-effect relationships between medium components and outcomes.


                                                                                                                                                                                                                                                Purpose: Go beyond correlation to understand causal relationships.


                                                                                                                                                                                                                                                Methods:


                                                                                                                                                                                                                                                • Causal inference algorithms
                                                                                                                                                                                                                                                  • Intervention analysis
                                                                                                                                                                                                                                                    • Counterfactual reasoning

                                                                                                                                                                                                                                                      Features:


                                                                                                                                                                                                                                                      • Causal graph visualization
                                                                                                                                                                                                                                                        • Effect size estimation
                                                                                                                                                                                                                                                          • Confidence intervals

                                                                                                                                                                                                                                                            Output:


                                                                                                                                                                                                                                                            • Directed causal graphs
                                                                                                                                                                                                                                                              • Effect magnitude tables
                                                                                                                                                                                                                                                                • Recommendations for medium optimization

                                                                                                                                                                                                                                                                  7.5. Optimization ⚙️


                                                                                                                                                                                                                                                                  Use genetic algorithms to find optimal medium composition.


                                                                                                                                                                                                                                                                  Purpose: Automatically discover the best medium formulation to maximize biological performance.


                                                                                                                                                                                                                                                                  Configuration:


1. Constraints Resource: Upload JSON file with component bounds:
   { "glucose": {"lower_bound": 0, "upper_bound": 20}, "nitrogen": {"lower_bound": 0.5, "upper_bound": 5} }
2. Optimization Parameters:
   • Population size (default: 50)
   • Number of iterations (default: 100)
3. Targets and Objectives:
   • Define which features to optimize (e.g., μmax, final titer)
   • Set minimum acceptable values for each target
   • Add multiple targets (multi-objective optimization)
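Before uploading a constraints file, it can help to check it locally. The sketch below assumes the file has the same shape as the JSON shown in the configuration above (one object per component with `lower_bound` and `upper_bound`). The helper name `load_constraints` is hypothetical and not part of the application.

```python
import json

def load_constraints(text):
    """Parse constraints JSON and check each component's bounds."""
    constraints = json.loads(text)
    for name, bounds in constraints.items():
        if not isinstance(bounds, dict):
            raise ValueError(f"{name}: expected an object with lower_bound and upper_bound")
        missing = {"lower_bound", "upper_bound"} - set(bounds)
        if missing:
            raise ValueError(f"{name}: missing {sorted(missing)}")
        if bounds["lower_bound"] > bounds["upper_bound"]:
            raise ValueError(f"{name}: lower_bound exceeds upper_bound")
    return constraints

# Same content as the example in this guide.
example = (
    '{"glucose": {"lower_bound": 0, "upper_bound": 20}, '
    '"nitrogen": {"lower_bound": 0.5, "upper_bound": 5}}'
)
constraints = load_constraints(example)
print(sorted(constraints))
```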

                                                                                                                                                                                                                                                                        Algorithm:


                                                                                                                                                                                                                                                                        • Genetic algorithm with mutation and crossover
                                                                                                                                                                                                                                                                          • Pareto optimization for multi-objective cases
                                                                                                                                                                                                                                                                            • Constraint handling
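To show how these pieces fit together, here is a minimal single-objective sketch of a genetic algorithm with crossover, mutation, and bound clipping. It is not the application's implementation: tournament selection, elitism, the mutation rate, the function names, and the toy objective are all assumptions made for the example. In practice the fitness would come from a trained predictive model.

```python
import random

def genetic_optimize(bounds, fitness, population_size=50, iterations=100,
                     mutation_rate=0.2, seed=0):
    """Maximize `fitness` over values constrained to `bounds` ({name: (low, high)})."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return {n: rng.uniform(*bounds[n]) for n in names}

    def tournament(population):
        # Pick the fitter of two randomly chosen individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(p1, p2):
        # Uniform crossover: each component comes from either parent.
        return {n: (p1[n] if rng.random() < 0.5 else p2[n]) for n in names}

    def mutate(individual):
        # Gaussian mutation, clipped back into the allowed range (constraint handling).
        child = dict(individual)
        for n in names:
            if rng.random() < mutation_rate:
                low, high = bounds[n]
                step = rng.gauss(0, 0.1 * (high - low))
                child[n] = min(high, max(low, child[n] + step))
        return child

    population = [random_individual() for _ in range(population_size)]
    for _ in range(iterations):
        best = max(population, key=fitness)  # elitism: keep the best unchanged
        offspring = [
            mutate(crossover(tournament(population), tournament(population)))
            for _ in range(population_size - 1)
        ]
        population = [best] + offspring
    return max(population, key=fitness)

# Toy objective with a known optimum at glucose=10, nitrogen=2 (illustrative only).
def toy_fitness(medium):
    return -((medium["glucose"] - 10) ** 2) - (medium["nitrogen"] - 2) ** 2

bounds = {"glucose": (0, 20), "nitrogen": (0.5, 5)}
best = genetic_optimize(bounds, toy_fitness)
print(best)
```

The default population size (50) and iteration count (100) mirror the defaults listed in the configuration. Larger values explore more candidates at the cost of run time.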

                                                                                                                                                                                                                                                                              Results:


                                                                                                                                                                                                                                                                              • Optimal medium formulation(s)
                                                                                                                                                                                                                                                                                • Predicted performance values
                                                                                                                                                                                                                                                                                  • Convergence plots
                                                                                                                                                                                                                                                                                    • Sensitivity analysis

                                                                                                                                                                                                                                                                                      Use Cases:


                                                                                                                                                                                                                                                                                      • Design of Experiments (DoE) follow-up
                                                                                                                                                                                                                                                                                        • Medium formulation development
                                                                                                                                                                                                                                                                                          • Cost reduction while maintaining performance
                                                                                                                                                                                                                                                                                            • Multi-parameter optimization

                                                                                                                                                                                                                                                                                              Data Flow Summary


                                                                                                                                                                                                                                                                                              1. Load Data (from files or existing resources)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              2. Overview (verify data loaded correctly, check missing data)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              3. Selection (select batch-sample pairs to analyze)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              4. Visualization (Table/Graph/Medium views)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              5. Quality Check (remove outliers, validate data)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              6. Exploratory Analysis (PCA/UMAP on medium data)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              7. Feature Extraction (extract biological characteristics)
                                                                                                                                                                                                                                                                                                 ↓
                                                                                                                                                                                                                                                                                              8. Advanced Analysis (predictive models, optimization)

                                                                                                                                                                                                                                                                                              Tips and Best Practices


                                                                                                                                                                                                                                                                                              Data Preparation


                                                                                                                                                                                                                                                                                              • Ensure consistent column naming across files
                                                                                                                                                                                                                                                                                                • Check for missing values before analysis
                                                                                                                                                                                                                                                                                                  • Document medium compositions clearly
                                                                                                                                                                                                                                                                                                    • Use meaningful batch and sample names

                                                                                                                                                                                                                                                                                                      Selection Strategy


                                                                                                                                                                                                                                                                                                      • Create multiple selections for comparison (e.g., "good batches", "failed batches")
                                                                                                                                                                                                                                                                                                        • Keep a "full dataset" selection for reference
                                                                                                                                                                                                                                                                                                          • Select only valid samples (complete data)

                                                                                                                                                                                                                                                                                                            Visualization & Quality Control


                                                                                                                                                                                                                                                                                                            • Visualize data first to identify potential issues
                                                                                                                                                                                                                                                                                                              • Always run Quality Check after visualization
                                                                                                                                                                                                                                                                                                                • Review outlier detection results manually
                                                                                                                                                                                                                                                                                                                  • Document why specific samples are excluded
                                                                                                                                                                                                                                                                                                                    • Export quality reports for documentation

                                                                                                                                                                                                                                                                                                                      Analysis Workflow


                                                                                                                                                                                                                                                                                                                      • Start with exploratory analysis (PCA/UMAP) before modeling
                                                                                                                                                                                                                                                                                                                        • Extract features before running predictive models
• Use PLS for initial modeling and Random Forest for complex (non-linear) relationships
                                                                                                                                                                                                                                                                                                                            • Validate models with train/test splits
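As an illustration of the train/test validation idea, the sketch below splits a list of samples reproducibly. The function name, the 80/20 ratio, the seed, and the sample labels are assumptions. The guide does not specify how the application performs its split.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle samples reproducibly and split them into train and test lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical batch-sample labels.
samples = [f"batch1-sample{i}" for i in range(1, 11)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

The model is fitted on `train` only and its metrics are reported on `test`, so the reported performance reflects samples the model has not seen.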

                                                                                                                                                                                                                                                                                                                              Optimization


• Start with wide constraint ranges, then narrow them around promising regions
                                                                                                                                                                                                                                                                                                                                • Use multiple optimization runs with different random seeds
                                                                                                                                                                                                                                                                                                                                  • Validate optimization results experimentally
                                                                                                                                                                                                                                                                                                                                    • Consider cost constraints in multi-objective optimization
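When cost is treated as a second objective, the result is a set of trade-off candidates rather than a single optimum. The sketch below filters candidates down to the Pareto front, assuming every objective is maximized (cost is negated). The function names and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]

# (predicted performance, negative cost) pairs; values are illustrative only.
candidates = [(5.0, -3.0), (4.0, -1.0), (3.0, -2.0), (5.0, -4.0)]
front = pareto_front(candidates)
print(front)
```

Each formulation on the front is a defensible choice. Picking among them depends on how much performance you are willing to trade for lower cost.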

                                                                                                                                                                                                                                                                                                                                      Common Issues and Solutions


                                                                                                                                                                                                                                                                                                                                      Problem: "No data available"


Solution: Check that the Selection step completed successfully and produced a valid ResourceSet.


                                                                                                                                                                                                                                                                                                                                      Problem: "Analysis failed"


                                                                                                                                                                                                                                                                                                                                      Solution:


                                                                                                                                                                                                                                                                                                                                      • Verify input data has required columns
                                                                                                                                                                                                                                                                                                                                        • Check for missing values in critical columns
                                                                                                                                                                                                                                                                                                                                          • Ensure numeric columns are properly formatted
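These three checks can be run on a file before loading it. The sketch below is a hedged example using only the standard library. The helper `check_table` is hypothetical, and the CSV layout (Batch, Sample, Medium plus a numeric `glucose` column) is an assumption based on the column names used elsewhere in this guide.

```python
import csv
import io

def check_table(csv_text, required_columns, numeric_columns):
    """Return a list of human-readable problems found in a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required_columns if c not in header]
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in required_columns:
            if column in header and not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty value in {column}")
        for column in numeric_columns:
            value = (row.get(column) or "").strip()
            if value:
                try:
                    float(value)
                except ValueError:
                    problems.append(f"row {row_number}: non-numeric {column}={value!r}")
    return problems

# Illustrative table with one empty Medium cell and one badly formatted number.
table = "Batch,Sample,Medium,glucose\nB1,S1,M1,10.5\nB1,S2,,abc\n"
problems = check_table(table, ["Batch", "Sample", "Medium", "glucose"], ["glucose"])
for problem in problems:
    print(problem)
```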

                                                                                                                                                                                                                                                                                                                                            Problem: "Model performance is poor"


                                                                                                                                                                                                                                                                                                                                            Solution:


• Increase the number of samples (a minimum of 20-30 is recommended)
                                                                                                                                                                                                                                                                                                                                              • Check for outliers in data
                                                                                                                                                                                                                                                                                                                                                • Try different feature sets
                                                                                                                                                                                                                                                                                                                                                  • Consider data normalization
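One common form of normalization is z-score scaling, which puts every feature on a comparable scale (mean 0, standard deviation 1). The sketch below is an assumption about what normalization could look like here. The guide does not state which method the application applies, if any.

```python
import statistics

def zscore(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]

# Hypothetical concentrations of one medium component.
scaled = zscore([2.0, 4.0, 6.0])
print(scaled)
```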

Problem: "Optimization doesn't converge"

Solution:

• Increase the population size or the number of iterations
• Check that constraint bounds are reasonable
• Verify that target objectives are achievable
• Review the quality of the training data
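As a quick offline sanity check for the bounds and target items above, the constraint bounds can be compared against the ranges actually covered by the training data. This is only a sketch under stated assumptions: the parameter names, the `check_bounds` and `target_is_plausible` helpers, and the 20% margin are hypothetical and are not part of the application.

```python
def check_bounds(bounds, observed):
    """Return warnings for bounds that look unreasonable.

    bounds:   {name: (lower, upper)} as entered for the optimization
    observed: {name: [values]} taken from the training data
    """
    warnings = []
    for name, (lower, upper) in bounds.items():
        if lower >= upper:
            warnings.append(f"{name}: lower bound {lower} is not below upper bound {upper}")
            continue
        values = observed.get(name)
        if not values:
            warnings.append(f"{name}: no training data for this parameter")
            continue
        lo, hi = min(values), max(values)
        if upper < lo or lower > hi:
            warnings.append(
                f"{name}: bounds [{lower}, {upper}] do not overlap the training range [{lo}, {hi}]"
            )
    return warnings

def target_is_plausible(target, observed_outputs, margin=0.2):
    """True if the target lies within the observed output range plus a margin."""
    lo, hi = min(observed_outputs), max(observed_outputs)
    span = hi - lo
    return lo - margin * span <= target <= hi + margin * span

# Hypothetical parameters and values (illustrative only).
bounds = {
    "glucose_g_per_l": (5.0, 20.0),
    "temperature_c": (39.0, 37.0),   # inverted on purpose
    "ph": (8.5, 9.0),                # outside the training range on purpose
}
observed = {
    "glucose_g_per_l": [8.0, 10.0, 12.0],
    "temperature_c": [36.5, 37.0],
    "ph": [6.8, 7.0, 7.2],
}

for message in check_bounds(bounds, observed):
    print("Warning:", message)

print(target_is_plausible(3.0, [1.8, 2.4, 2.9]))
print(target_is_plausible(10.0, [1.8, 2.4, 2.9]))
```

A target far outside what the training data has ever produced forces the model to extrapolate, which is a common reason for an optimization that never settles.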

Export and Reporting

All analysis steps provide export options:

• CSV Downloads: Tables, coordinates, metrics
• Plot Exports: PNG, SVG formats
• Scenario Reports: Complete analysis documentation
• Model Artifacts: Save trained models for reuse
