Getting Started

Optimal Design of Experiments for Bioprocessing and Beyond

1. Overview

The gws_design_of_experiment brick supports optimal design of experiments (DoE), with a focus on bioprocessing and applicability to other domains. It combines machine learning (ML) tools with optimization algorithms to help researchers and engineers optimize processes and estimate causal relationships in complex datasets.

2. Key Features

2.1 Machine Learning Tools

The brick provides a suite of ML tools for dimensionality reduction, feature extraction, and predictive modeling, including PCA and UMAP for exploring data structure.

2.2 Advanced Tools

A. Causal Effect Analysis

Purpose: Estimate cause-and-effect relationships between variables rather than correlations alone.

Methodologies:

Double Machine Learning (DML): Implemented using the EconML package.

LinearDML: For discrete treatments.

CausalForestDML: For continuous treatments.

Output: An estimate of the Average Treatment Effect (ATE) for each specified treatment-target pair.
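The idea behind Double Machine Learning can be sketched without any external library. The example below is a hand-rolled illustration of the partialling-out step with two-fold cross-fitting. It is not EconML's API and not necessarily how the brick calls it. The synthetic data, the single confounder w, the linear nuisance models, and the helper names fit_line and dml_ate are all assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_ate(w, t, y, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of t on y.

    w is a single observed confounder (an assumption of this sketch).
    """
    n = len(w)
    t_res, y_res = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance models are fitted on the training fold only (cross-fitting).
        at, bt = fit_line([w[i] for i in train], [t[i] for i in train])
        ay, by = fit_line([w[i] for i in train], [y[i] for i in train])
        # Residualize treatment and outcome on the held-out fold.
        for i in test:
            t_res[i] = t[i] - (at + bt * w[i])
            y_res[i] = y[i] - (ay + by * w[i])
    # Final stage: regress outcome residuals on treatment residuals.
    return sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)


# Synthetic data: w drives both the treatment and the outcome.
rng = random.Random(0)
n = 2000
w = [rng.gauss(0, 1) for _ in range(n)]
t = [0.8 * wi + rng.gauss(0, 1) for wi in w]
# The true effect of t on y is 2.0 by construction.
y = [2.0 * ti + 1.5 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

naive_slope = fit_line(t, y)[1]  # ignores the confounder, so it is biased upward
ate = dml_ate(w, t, y)           # adjusts for the confounder

print(f"naive slope: {naive_slope:.2f}, DML estimate: {ate:.2f}")
```

Because w raises both t and y, the naive regression slope overstates the effect, while the residual-on-residual estimate should land close to the true value of 2.0. In practice the nuisance models would be flexible ML regressors rather than straight lines.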

B. Genetic Algorithms

Purpose: Solve single- and multi-objective optimization problems (e.g., computing an optimal medium composition).

Methodologies:

NSGA-II: Non-dominated Sorting Genetic Algorithm II for multi-objective optimization.

GA: Classic Genetic Algorithm for single-objective optimization.
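To make the single-objective case concrete, here is a minimal genetic algorithm written with the standard library only. It is a sketch under stated assumptions, not the brick's implementation and not the DEAP API: the objective yield_score is a hypothetical response surface over two medium components, and the operators (tournament selection, blend crossover, Gaussian mutation, elitism) are one common choice among many.

```python
import random


def yield_score(x):
    """Hypothetical response surface peaking at concentrations (3.0, 1.5)."""
    return -((x[0] - 3.0) ** 2) - (x[1] - 1.5) ** 2


def tournament(pop, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, bounds, pop_size=40, generations=60, sigma=0.2, seed=0):
    """Maximize fitness over box bounds; returns the best individual found."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            # Blend crossover: a random convex combination of the parents.
            alpha = rng.random()
            child = [alpha * g1 + (1 - alpha) * g2 for g1, g2 in zip(p1, p2)]
            # Gaussian mutation, clipped back into the allowed range.
            child = [
                min(hi, max(lo, g + rng.gauss(0, sigma)))
                for g, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


bounds = [(0.0, 10.0), (0.0, 5.0)]  # hypothetical concentration ranges
best = run_ga(yield_score, bounds)
print("best concentrations:", [round(g, 2) for g in best])
```

Elitism keeps the best score from ever decreasing between generations, which makes the run easy to reason about. A multi-objective method such as NSGA-II replaces the single fitness comparison with Pareto-based ranking.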

3. Complementarity: Causality vs. Multivariate Correlation Analysis

Synergy: Use multivariate correlation analysis to explore relationships and generate hypotheses, then use causal analysis to test those hypotheses and guide interventions.

4. Practical Applications

Bioprocessing: Optimize medium composition, fermentation conditions, and yield.

Manufacturing: Improve process parameters for quality and efficiency.

Healthcare: Identify causal factors in clinical outcomes.

5. Getting Started

Prerequisites

Python 3.8+

Required packages: scikit-learn, umap-learn, econml, deap (for genetic algorithms).
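A quick way to confirm the prerequisites is to check that each package can be imported. The sketch below uses only the standard library. The helper name missing_packages is hypothetical, and the mapping reflects the usual import names of these distributions (scikit-learn imports as sklearn, umap-learn as umap).

```python
import sys
from importlib.util import find_spec

# Maps each pip distribution listed above to the module name it is imported as.
REQUIRED = {
    "scikit-learn": "sklearn",
    "umap-learn": "umap",
    "econml": "econml",
    "deap": "deap",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


python_ok = sys.version_info >= (3, 8)
missing = missing_packages()

if not python_ok:
    print("Python 3.8 or newer is required.")
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```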

Example Workflow

Data Preparation: Load and preprocess your dataset.

Exploratory Analysis: Use PCA/UMAP to visualize data structure.

Causal Analysis: Apply DML to estimate treatment effects.

Optimization: Use NSGA-II to find optimal process parameters.
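For the optimization step, NSGA-II ranks candidate solutions by Pareto dominance instead of a single score. The sketch below shows only that dominance test and the extraction of the first non-dominated front; it is not a full NSGA-II and not the brick's code. The candidate values and the two objectives (yield to maximize, cost to minimize) are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one. All objectives are assumed to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


# Each candidate is (negative yield, cost): negating the yield turns
# "maximize yield" into a minimization so both objectives point the same way.
candidates = [
    (-9.0, 5.0),
    (-7.0, 3.0),
    (-6.0, 4.0),  # dominated by (-7.0, 3.0): lower yield at higher cost
    (-4.0, 1.0),
    (-9.0, 6.0),  # dominated by (-9.0, 5.0): same yield at higher cost
]

front = pareto_front(candidates)
for neg_yield, cost in front:
    print(f"yield={-neg_yield}, cost={cost}")
```

The front contains the trade-offs worth considering: no remaining candidate can improve the yield without raising the cost. Choosing one point from the front is a decision left to the experimenter.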

6. References

EconML Documentation

DEAP: Genetic Algorithms in Python