Optimization task using machine learning models.
This task performs optimization on experimental data by:
- Training multiple machine learning models (Random Forest, XGBoost, CatBoost)
- Selecting the best performing model based on cross-validation R² scores
- Using evolutionary algorithms (NSGA-II or a genetic algorithm, GA) to search for optimal solutions
- Generating comprehensive optimization results and analysis files
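The model-selection step can be illustrated with a minimal, standard-library sketch. It assumes the per-fold R² scores have already been computed for each model and that they are aggregated by their mean; the aggregation rule, the helper names `r2_score` and `select_best_model`, and the score values are assumptions for illustration, not the task's actual implementation.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def select_best_model(fold_scores):
    """Return (name, mean_r2) for the model with the highest mean fold R²."""
    mean_scores = {name: mean(scores) for name, scores in fold_scores.items()}
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores[best]


# Made-up per-fold R² values, for illustration only.
fold_scores = {
    "random_forest": [0.81, 0.78, 0.84],
    "xgboost": [0.86, 0.83, 0.88],
    "catboost": [0.85, 0.80, 0.87],
}
best_name, best_r2 = select_best_model(fold_scores)
print(best_name, round(best_r2, 3))
```

Averaging across folds is only one possible rule; a selection that also penalizes fold-to-fold variance would be an equally plausible reading of "best performing".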
The optimization process considers:
- Target variables: Variables to maximize during optimization
- Constraints: Manual bounds on input features
- Thresholds: Minimum acceptable values for target variables
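As a rough sketch of how constraints and thresholds interact, the check below treats a candidate as acceptable only if every constrained feature lies within its bounds and every predicted target meets its minimum. The function name `is_feasible`, the inclusive comparisons, and the feature and target names are assumptions made for illustration.

```python
def is_feasible(features, predictions, bounds, thresholds):
    """Return True if a candidate respects all bounds and all thresholds.

    features:    {feature_name: value} for the candidate solution
    predictions: {target_name: predicted value} from the selected model
    bounds:      {feature_name: {"lower_bound": x, "upper_bound": y}}
    thresholds:  {target_name: minimum acceptable value}
    """
    # Every constrained feature must lie inside its (inclusive) bounds.
    for name, limits in bounds.items():
        value = features[name]
        if not (limits["lower_bound"] <= value <= limits["upper_bound"]):
            return False
    # Every target with a threshold must reach at least that threshold.
    for target, minimum in thresholds.items():
        if predictions[target] < minimum:
            return False
    return True


# Hypothetical names and values, for illustration only.
bounds = {"temperature": {"lower_bound": 20.0, "upper_bound": 100.0}}
thresholds = {"yield": 80.0}

ok = is_feasible({"temperature": 85.0}, {"yield": 91.2}, bounds, thresholds)
too_hot = is_feasible({"temperature": 120.0}, {"yield": 95.0}, bounds, thresholds)
low_yield = is_feasible({"temperature": 85.0}, {"yield": 70.0}, bounds, thresholds)
```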
Generated Output Files:
- generalized_solutions.csv: All optimization solutions found
- best_generalized_solution.csv: Best solution based on CV and target values
- actual_vs_predicted.csv: Model validation data (observed vs predicted)
- feature_importance_matrix.csv: Feature importance for each target variable
- constraints_used_in_optimization.csv: Bounds applied to each feature
- optimization_progress.csv: Convergence history during optimization
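After a run, it can be useful to confirm that the results folder contains every file listed above. The sketch below does only that; the helper name `missing_outputs` is hypothetical, and the temporary folder stands in for a real results folder.

```python
import tempfile
from pathlib import Path

# File names taken from the list of generated output files.
EXPECTED_FILES = (
    "generalized_solutions.csv",
    "best_generalized_solution.csv",
    "actual_vs_predicted.csv",
    "feature_importance_matrix.csv",
    "constraints_used_in_optimization.csv",
    "optimization_progress.csv",
)


def missing_outputs(results_folder):
    """Return the expected file names that are absent from results_folder."""
    folder = Path(results_folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Demonstration on a temporary folder that holds only one of the six files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "generalized_solutions.csv").touch()
    missing = missing_outputs(tmp)
```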
Inputs:
data (Table): Experimental data containing features and target variables
targets_thresholds (JSONDict): Minimum threshold values for each target variable
manual_constraints (JSONDict): Custom bounds for input features, in the format:
    {"feature_name": {"lower_bound": value, "upper_bound": value}}
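A small validation sketch for the manual_constraints format follows. It assumes both keys are required for every feature and that lower_bound must not exceed upper_bound; whether the task itself enforces this, or accepts one-sided bounds, is not stated here. The function name `parse_manual_constraints` is hypothetical.

```python
import json

REQUIRED_KEYS = {"lower_bound", "upper_bound"}


def parse_manual_constraints(text):
    """Parse a JSON string of feature bounds and check its basic shape."""
    constraints = json.loads(text)
    if not isinstance(constraints, dict):
        raise ValueError("manual_constraints must be a JSON object")
    for feature, limits in constraints.items():
        # Assumption: each feature carries exactly the two documented keys.
        if not isinstance(limits, dict) or set(limits) != REQUIRED_KEYS:
            raise ValueError(f"{feature}: expected keys {sorted(REQUIRED_KEYS)}")
        if limits["lower_bound"] > limits["upper_bound"]:
            raise ValueError(f"{feature}: lower_bound exceeds upper_bound")
    return constraints


# Hypothetical feature name and values.
parsed = parse_manual_constraints(
    '{"temperature": {"lower_bound": 20, "upper_bound": 100}}'
)
```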
Outputs:
results_folder (Folder): Directory containing all optimization results and analysis files
Example:
For a chemical process optimization, you might want to maximize yield and purity
while keeping temperature below 100°C and pressure above 2 bar, with a minimum
yield of 80% and a minimum purity of 95%.
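The two JSON inputs for this scenario might look like the sketch below. The column names, the use of percent rather than fractions, and the filler values (a 20°C lower bound on temperature and a 10 bar upper bound on pressure) are assumptions added to complete the two-sided format; only the 100°C, 2 bar, 80% and 95% figures come from the example. The bounds are also written as inclusive, although the example says "below" and "above".

```python
import json

# Minimum acceptable values for each target (assumed to be in percent).
targets_thresholds = {"yield": 80.0, "purity": 95.0}

# Bounds on input features; 20.0 and 10.0 are made-up placeholders.
manual_constraints = {
    "temperature": {"lower_bound": 20.0, "upper_bound": 100.0},  # °C
    "pressure": {"lower_bound": 2.0, "upper_bound": 10.0},       # bar
}

# Serialize both inputs as JSON strings.
thresholds_json = json.dumps(targets_thresholds, indent=2)
constraints_json = json.dumps(manual_constraints, indent=2)
print(thresholds_json)
print(constraints_json)
```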