Publication date: Sep 19, 2024

Confidentiality: Public

Filter Cell Culture ResourceSet by Selection

TASK
Typing name: TASK.gws_plate_reader.FilterFermentorAnalyseLoadedResourceSetBySelection
Brick: gws_plate_reader

Filter a cell culture (fermentation) ResourceSet by selecting specific batch/sample combinations.

Overview

This task enables selective analysis by filtering a ResourceSet down to specific batch/sample combinations. It takes the output of the CellCultureLoadData task as input and preserves all tags and metadata on the Tables it keeps.

Purpose

  • Select specific fermenters and experiments for focused analysis
  • Remove incomplete or problematic samples from analysis
  • Create subsets for comparative studies (e.g., different media, conditions)
  • Reduce dataset size for faster processing and visualization

How It Works

Input Requirements

  • ResourceSet: Output from CellCultureLoadData containing Tables with tags
  • Selection Criteria: List of batch/sample pairs to include

Tag-Based Filtering

Each Table in the ResourceSet is examined for two required tags:

  • batch: Experiment/trial identifier (e.g., "EPA-WP3-25-001")
  • sample: Fermenter identifier (e.g., "23A", "23B")

Only Tables matching the selection criteria are included in the output.

Filtering Process

  1. Parses selection criteria (JSON format or dict)
  2. Builds a set of (batch, sample) pairs so that each membership check takes O(1) time on average
  3. Iterates through all resources in input ResourceSet
  4. For each Table resource:
    • Extracts 'batch' and 'sample' tag values
    • Checks if pair exists in selection criteria
    • Includes resource in output if matched
    • Logs a warning if the resource is missing a required tag
  5. Returns filtered ResourceSet with only selected samples
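
The five steps above can be sketched in plain Python. This is a minimal illustration, not the task's actual implementation: `TaggedTable` and `filter_by_selection` are hypothetical stand-ins, the ResourceSet is modelled as an ordinary dict, and warnings are collected in a list instead of being logged.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TaggedTable:
    """Hypothetical stand-in for a tagged Table resource (not the gws_core class)."""
    name: str
    tags: dict = field(default_factory=dict)


def filter_by_selection(resources, selection_criteria):
    """Return the resources whose (batch, sample) tags appear in the selection."""
    # Step 1: accept either a JSON string or an already-parsed list of dicts.
    if isinstance(selection_criteria, str):
        selection_criteria = json.loads(selection_criteria)

    # Step 2: build a set of (batch, sample) tuples for fast membership tests.
    wanted = {(c["batch"], c["sample"]) for c in selection_criteria}

    filtered = {}
    warnings = []
    # Steps 3-4: examine every resource exactly once.
    for name, resource in resources.items():
        batch = resource.tags.get("batch")
        sample = resource.tags.get("sample")
        if batch is None or sample is None:
            warnings.append(f"{name}: missing 'batch' or 'sample' tag, skipped")
            continue
        if (batch, sample) in wanted:
            filtered[name] = resource  # same object, not a copy

    # Step 5: return only the selected resources (plus the collected warnings).
    return filtered, warnings


resources = {
    "EPA-WP3-25-001_23A": TaggedTable(
        "EPA-WP3-25-001_23A", {"batch": "EPA-WP3-25-001", "sample": "23A"}
    ),
    "EPA-WP3-25-001_23B": TaggedTable(
        "EPA-WP3-25-001_23B", {"batch": "EPA-WP3-25-001", "sample": "23B"}
    ),
    "untagged": TaggedTable("untagged"),
}
selected, warnings = filter_by_selection(
    resources, '[{"batch": "EPA-WP3-25-001", "sample": "23A"}]'
)
print(sorted(selected))
print(warnings)
```

In this sketch only the "23A" table is returned, and the untagged entry produces a single warning.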

Configuration

Selection Criteria Format

The selection_criteria parameter accepts a list of dictionaries, or a JSON string that encodes such a list:

Python Format

[
    {'batch': 'EPA-WP3-25-001', 'sample': '23A'},
    {'batch': 'EPA-WP3-25-001', 'sample': '23B'},
    {'batch': 'EPA-WP3-25-002', 'sample': '23A'},
]

JSON Format

[
    {"batch": "EPA-WP3-25-001", "sample": "23A"},
    {"batch": "EPA-WP3-25-001", "sample": "23B"},
    {"batch": "EPA-WP3-25-002", "sample": "23A"}
]

Alternative Format (with couple keys)

{
    'couple0': {'batch': 'batch01', 'sample': 'A1'},
    'couple1': {'batch': 'batch01', 'sample': 'B1'},
    'couple2': {'batch': 'batch02', 'sample': 'A1'}
}
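
All three formats carry the same information: a collection of batch/sample pairs. The sketch below shows one way they could be reduced to a common set of tuples. `normalize_selection` is a hypothetical helper written for illustration, and it assumes that the keys of the alternative format ('couple0', 'couple1', ...) carry no meaning beyond grouping.

```python
import json


def normalize_selection(selection_criteria):
    """Convert any of the three formats into a set of (batch, sample) tuples."""
    # A string is assumed to be JSON and is parsed first.
    if isinstance(selection_criteria, str):
        selection_criteria = json.loads(selection_criteria)

    # The alternative format maps 'couple0', 'couple1', ... to the pair dicts,
    # so only the values are needed.
    if isinstance(selection_criteria, dict):
        entries = list(selection_criteria.values())
    else:
        entries = list(selection_criteria)

    return {(entry["batch"], entry["sample"]) for entry in entries}


python_format = [
    {"batch": "EPA-WP3-25-001", "sample": "23A"},
    {"batch": "EPA-WP3-25-001", "sample": "23B"},
]
json_format = json.dumps(python_format)
couple_format = {
    "couple0": {"batch": "EPA-WP3-25-001", "sample": "23A"},
    "couple1": {"batch": "EPA-WP3-25-001", "sample": "23B"},
}

pairs = normalize_selection(python_format)
same = pairs == normalize_selection(json_format) == normalize_selection(couple_format)
print(same)
```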

Input Specifications

resource_set (ResourceSet)

  • Source: Typically from CellCultureLoadData task
  • Requirements:
    • Must contain Table resources
    • Each Table must have 'batch' and 'sample' tags
    • Tags are case-sensitive
  • Example Structure:
    ResourceSet {
      "EPA-WP3-25-001_23A": Table [tags: batch=EPA-WP3-25-001, sample=23A, medium=M1]
      "EPA-WP3-25-001_23B": Table [tags: batch=EPA-WP3-25-001, sample=23B, medium=M1]
      "EPA-WP3-25-002_23A": Table [tags: batch=EPA-WP3-25-002, sample=23A, medium=M2]
    }
    

Output Specifications

filtered_resource_set (ResourceSet)

  • Contents: Only Tables matching selection criteria
  • Preservation:
    • All original columns preserved
    • All tags preserved (batch, sample, medium, missing_value, etc.)
    • Column tags preserved (is_index_column, is_data_column, unit)
    • Table names preserved
  • Example: If selecting 2 out of 10 samples, output contains 2 Tables

Behavior Details

Matching Logic

  • Exact Match: Tag values must match exactly (case-sensitive)
  • Both Tags Required: Resources missing either tag are excluded with a warning
  • Non-Table Resources: Ignored (only Table resources are processed)
  • Duplicate Prevention: Same resource won't be added multiple times
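
The consequences of exact matching are easy to overlook when selections are typed by hand. The short sketch below uses plain tuples and a set, under the assumption that tag values are compared as-is with no case folding or whitespace trimming (the trimming behavior is not stated above and is assumed here).

```python
# Selection held as a set of (batch, sample) tuples.
selection = {
    ("EPA-WP3-25-001", "23A"),
    ("EPA-WP3-25-001", "23A"),  # repeated entry collapses into one
}

candidates = [
    ("EPA-WP3-25-001", "23A"),   # exact match
    ("epa-wp3-25-001", "23A"),   # batch differs in case
    ("EPA-WP3-25-001", "23a"),   # sample differs in case
    ("EPA-WP3-25-001", "23A "),  # trailing space in sample
]
matches = [pair in selection for pair in candidates]

print(len(selection))
print(matches)
```

Only the first candidate matches; the other three would be left out of the output under exact comparison.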

Error Handling

  • Missing Tags: Logs warning and skips resource
  • Invalid JSON: Attempts to parse the value as a dict instead, and logs an error if that also fails
  • Empty Selection: Returns empty ResourceSet (valid but unusual)
  • No Matches: Returns empty ResourceSet with info message
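
The fallback for invalid JSON is not described in detail here. One plausible reading, sketched below purely as an assumption, is that text which is not valid JSON (for example the single-quoted Python format) is retried as a Python literal. `parse_selection` is a hypothetical helper, and the use of `ast.literal_eval` is an illustrative choice rather than the task's confirmed behavior.

```python
import ast
import json
import logging

logger = logging.getLogger("selection_parsing_sketch")


def parse_selection(raw):
    """Parse a selection string; return None if it cannot be parsed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # not valid JSON, try the fallback below
    try:
        # Fallback for Python-literal text such as "[{'batch': 'b', 'sample': 's'}]".
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        logger.error("Could not parse selection criteria: %r", raw)
        return None


from_json = parse_selection('[{"batch": "b01", "sample": "A1"}]')
from_python = parse_selection("[{'batch': 'b01', 'sample': 'A1'}]")
from_garbage = parse_selection("{'batch': ")
print(from_json == from_python, from_garbage)
```

`ast.literal_eval` only evaluates literals, so it does not execute arbitrary code from the input string.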

Use Cases

1. Quality-Based Filtering

CellCultureLoadData → View Venn Diagram →
Select only complete samples → Filter → Analysis

2. Comparative Analysis

Select samples from Medium A → Filter → Analyze
Select samples from Medium B → Filter → Analyze
Compare results

3. Time-Period Selection

Select experiments from specific date range →
Filter by batch names → Analysis

4. Fermenter Comparison

Select same experiment across different fermenters →
Filter → Compare performance

Integration with CellCulture Dashboard

The dashboard provides an interactive interface:

  1. Overview Step: View all samples in table
  2. Selection Step:
    • Click rows to select samples
    • See batch/sample/medium information
    • Preview selection count
  3. Automatic Filtering: Dashboard creates and runs filter scenario
  4. Visualization: View filtered data in graphs and tables

Example Workflow

[ResourceSet with 50 samples]
      ↓
[Select 10 samples of interest]
      ↓
FilterFermentorAnalyseLoadedResourceSetBySelection
      ↓
[Filtered ResourceSet with 10 samples]
      ↓
[Interpolation / Visualization / Export]

Performance Notes

  • Fast Lookup: Uses set-based matching, so each membership check is O(1) on average and a full pass is linear in the number of resources
  • Memory Efficient: Only selected resources are added to the output; table data is not duplicated
  • Scalable: Handles hundreds of samples efficiently
  • No Data Copy: References original Table objects (metadata preserved)

Tips and Best Practices

  1. Check Data First: View overview and Venn diagram before filtering
  2. Use Dashboard: Interactive selection is easier than manual JSON
  3. Save Selection: Document selected samples for reproducibility
  4. Verify Output: Check filtered count matches expected selection
  5. Chain Operations: Filter → Interpolate → Visualize for complete workflow
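
Tip 4 can be automated with a small comparison between the selection and the tags found in the output. The sketch below is a hypothetical helper working on plain dicts; how the tags are read from the filtered ResourceSet depends on the platform API and is not shown.

```python
def unmatched_pairs(selection, output_tags):
    """Return selected (batch, sample) pairs that have no resource in the output."""
    expected = {(s["batch"], s["sample"]) for s in selection}
    found = {(t["batch"], t["sample"]) for t in output_tags}
    return sorted(expected - found)


selection = [
    {"batch": "EPA-WP3-25-001", "sample": "23A"},
    {"batch": "EPA-WP3-25-001", "sample": "23B"},
    {"batch": "EPA-WP3-25-002", "sample": "23A"},
]
# Tags of the Tables actually present in the filtered output (illustrative values).
output_tags = [
    {"batch": "EPA-WP3-25-001", "sample": "23A", "medium": "M1"},
    {"batch": "EPA-WP3-25-001", "sample": "23B", "medium": "M1"},
]

missing = unmatched_pairs(selection, output_tags)
print(missing)
```

An empty result means every selected pair was found. Here one pair is reported as missing, which would point to a typo in the selection or an absent tag in the input.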

Troubleshooting

Issue            | Cause                  | Solution
-----------------|------------------------|---------------------------------------------
Empty output     | No matches found       | Verify tag values are exact (case-sensitive)
Missing samples  | Tags not set           | Check input from CellCultureLoadData
Warning messages | Resources missing tags | Review input data quality
JSON parse error | Invalid format         | Use list of dicts or valid JSON string

Notes

  • Designed specifically for CellCulture workflow (follows CellCultureLoadData)
  • Preserves complete data integrity (no data modification)
  • Tag keys ('batch', 'sample') are hardcoded constants for consistency
  • Output can be used directly with CellCultureSubsampling task
  • Compatible with Streamlit dashboard for interactive use

Input

Input ResourceSet to filter
ResourceSet from CellCultureLoadData containing batch/sample tagged resources

Output

Filtered ResourceSet
ResourceSet containing only selected batch/sample combinations

Configuration

selection_criteria

List of dictionaries with 'batch' and 'sample' keys for filtering

Type : list