Filter Cell Culture ResourceSet by Selection
Filter cell culture data by selecting specific batch/sample combinations
Filter fermentation data ResourceSet by selecting specific batch/sample combinations.
Overview
This task filters a ResourceSet so that it contains only specific batch/sample combinations, enabling selective analysis. It is designed to consume the output of the CellCultureLoadData task and preserves all tags and metadata.
Purpose
- Select specific fermenters and experiments for focused analysis
- Remove incomplete or problematic samples from analysis
- Create subsets for comparative studies (e.g., different media, conditions)
- Reduce dataset size for faster processing and visualization
How It Works
Input Requirements
- ResourceSet: Output from CellCultureLoadData containing Tables with tags
- Selection Criteria: List of batch/sample pairs to include
Tag-Based Filtering
Each Table in the ResourceSet is examined for two required tags:
- batch: Experiment/trial identifier (e.g., "EPA-WP3-25-001")
- sample: Fermenter identifier (e.g., "23A", "23B")
Only Tables matching the selection criteria are included in the output.
Filtering Process
- Parses selection criteria (JSON format or dict)
- Creates fast lookup set for O(1) matching
- Iterates through all resources in input ResourceSet
- For each Table resource:
  - Extracts 'batch' and 'sample' tag values
  - Checks if pair exists in selection criteria
  - Includes resource in output if matched
  - Logs warning if resource missing required tags
- Returns filtered ResourceSet with only selected samples
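The steps above can be sketched in plain Python. This is a minimal illustration, not the task's actual implementation: `TaggedTable` and `filter_by_selection` are hypothetical names, and a plain dict stands in for the ResourceSet.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class TaggedTable:
    """Hypothetical stand-in for a Table resource that carries tags."""
    name: str
    tags: dict = field(default_factory=dict)


def filter_by_selection(resources, selection):
    """Keep the resources whose (batch, sample) tags appear in `selection`."""
    # Build the lookup set once so each membership test is O(1) on average.
    wanted = {(c["batch"], c["sample"]) for c in selection}
    kept = {}
    for name, res in resources.items():
        if not isinstance(res, TaggedTable):
            continue  # non-Table resources are ignored
        batch = res.tags.get("batch")
        sample = res.tags.get("sample")
        if batch is None or sample is None:
            logger.warning("Skipping %r: missing 'batch' or 'sample' tag", name)
            continue
        if (batch, sample) in wanted:
            kept[name] = res  # same object, so tags and columns are untouched
    return kept


resources = {
    "EPA-WP3-25-001_23A": TaggedTable(
        "EPA-WP3-25-001_23A",
        {"batch": "EPA-WP3-25-001", "sample": "23A", "medium": "M1"},
    ),
    "EPA-WP3-25-001_23B": TaggedTable(
        "EPA-WP3-25-001_23B",
        {"batch": "EPA-WP3-25-001", "sample": "23B", "medium": "M1"},
    ),
    "untagged": TaggedTable("untagged", {"medium": "M2"}),
}
selection = [{"batch": "EPA-WP3-25-001", "sample": "23A"}]
filtered = filter_by_selection(resources, selection)
print(sorted(filtered))
```

The untagged entry is skipped with a warning rather than raising, which mirrors the behaviour described in the list above.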
Configuration
Selection Criteria Format
The selection_criteria parameter accepts a list of dictionaries, the equivalent JSON string, or a dictionary of named batch/sample pairs:
Python Format
[
{'batch': 'EPA-WP3-25-001', 'sample': '23A'},
{'batch': 'EPA-WP3-25-001', 'sample': '23B'},
{'batch': 'EPA-WP3-25-002', 'sample': '23A'},
]
JSON Format
[
{"batch": "EPA-WP3-25-001", "sample": "23A"},
{"batch": "EPA-WP3-25-001", "sample": "23B"},
{"batch": "EPA-WP3-25-002", "sample": "23A"}
]
Alternative Format (with couple keys)
{
'couple0': {'batch': 'batch01', 'sample': 'A1'},
'couple1': {'batch': 'batch01', 'sample': 'B1'},
'couple2': {'batch': 'batch02', 'sample': 'A1'}
}
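One way to handle the three formats above is to reduce them all to the same set of (batch, sample) tuples before matching. The helper below is a hedged sketch: `normalize_criteria` is a hypothetical name, and the task's real parsing code may differ.

```python
import json


def normalize_criteria(criteria):
    """Return a set of (batch, sample) tuples from any of the formats above."""
    if isinstance(criteria, str):
        # JSON format: decode to a list or dict first.
        criteria = json.loads(criteria)
    if isinstance(criteria, dict):
        # 'couple' format: the keys are only labels, the values are the pairs.
        criteria = list(criteria.values())
    return {(item["batch"], item["sample"]) for item in criteria}


as_list = normalize_criteria([
    {"batch": "batch01", "sample": "A1"},
    {"batch": "batch01", "sample": "B1"},
])
as_json = normalize_criteria(
    '[{"batch": "batch01", "sample": "A1"}, {"batch": "batch01", "sample": "B1"}]'
)
as_couples = normalize_criteria({
    "couple0": {"batch": "batch01", "sample": "A1"},
    "couple1": {"batch": "batch01", "sample": "B1"},
})
print(as_list == as_json == as_couples)
```

Because the result is a set, a pair listed twice in the input collapses to a single entry.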
Input Specifications
resource_set (ResourceSet)
- Source: Typically from CellCultureLoadData task
- Requirements:
  - Must contain Table resources
  - Each Table must have 'batch' and 'sample' tags
  - Tags are case-sensitive
- Example Structure:
ResourceSet {
  "EPA-WP3-25-001_23A": Table [tags: batch=EPA-WP3-25-001, sample=23A, medium=M1]
  "EPA-WP3-25-001_23B": Table [tags: batch=EPA-WP3-25-001, sample=23B, medium=M1]
  "EPA-WP3-25-002_23A": Table [tags: batch=EPA-WP3-25-002, sample=23A, medium=M2]
}
Output Specifications
filtered_resource_set (ResourceSet)
- Contents: Only Tables matching selection criteria
- Preservation:
  - All original columns preserved
  - All tags preserved (batch, sample, medium, missing_value, etc.)
  - Column tags preserved (is_index_column, is_data_column, unit)
  - Table names preserved
- Example: If selecting 2 out of 10 samples, output contains 2 Tables
Behavior Details
Matching Logic
- Exact Match: Tag values must match exactly (case-sensitive)
- Both Tags Required: Resources missing either tag are excluded with warning
- Non-Table Resources: Ignored (only Table resources are processed)
- Duplicate Prevention: Same resource won't be added multiple times
Error Handling
- Missing Tags: Logs warning and skips resource
- Invalid JSON: Attempts to parse as dict, logs error if fails
- Empty Selection: Returns empty ResourceSet (valid but unusual)
- No Matches: Returns empty ResourceSet with info message
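The invalid-JSON case can be illustrated with a two-stage parse. The source does not say how the fallback "parse as dict" is done, so the use of `ast.literal_eval` here is an assumption, as are the name `parse_criteria_text` and the choice to return `None` on failure.

```python
import ast
import json
import logging

logger = logging.getLogger(__name__)


def parse_criteria_text(text):
    """Parse criteria text; return the parsed object, or None if unparseable."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        # Assumed fallback: Python-literal syntax such as single-quoted dicts.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        logger.error("Could not parse selection criteria: %r", text)
        return None


valid = parse_criteria_text('[{"batch": "EPA-WP3-25-001", "sample": "23A"}]')
python_style = parse_criteria_text("[{'batch': 'EPA-WP3-25-001', 'sample': '23A'}]")
broken = parse_criteria_text("[{batch: EPA-WP3-25-001")
print(valid == python_style, broken)
```

`ast.literal_eval` only evaluates literals, so it is a safer fallback than `eval` for text that comes from a user or a dashboard.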
Use Cases
1. Quality-Based Filtering
CellCultureLoadData → View Venn Diagram →
Select only complete samples → Filter → Analysis
2. Comparative Analysis
Select samples from Medium A → Filter → Analyze
Select samples from Medium B → Filter → Analyze
Compare results
3. Time-Period Selection
Select experiments from specific date range →
Filter by batch names → Analysis
4. Fermenter Comparison
Select same experiment across different fermenters →
Filter → Compare performance
Integration with CellCulture Dashboard
The dashboard provides an interactive interface:
- Overview Step: View all samples in table
- Selection Step:
  - Click rows to select samples
  - See batch/sample/medium information
  - Preview selection count
- Automatic Filtering: Dashboard creates and runs filter scenario
- Visualization: View filtered data in graphs and tables
Example Workflow
[ResourceSet with 50 samples]
↓
[Select 10 samples of interest]
↓
FilterFermentorAnalyseLoadedResourceSetBySelection
↓
[Filtered ResourceSet with 10 samples]
↓
[Interpolation / Visualization / Export]
Performance Notes
- Fast Lookup: Uses set-based matching, so each batch/sample check is O(1) on average
- Memory Efficient: Output holds only the selected resources, not the full input
- Scalable: Handles hundreds of samples efficiently
- No Data Copy: References original Table objects (metadata preserved)
Tips and Best Practices
- Check Data First: View overview and Venn diagram before filtering
- Use Dashboard: Interactive selection is easier than manual JSON
- Save Selection: Document selected samples for reproducibility
- Verify Output: Check filtered count matches expected selection
- Chain Operations: Filter → Interpolate → Visualize for complete workflow
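The "Verify Output" tip can be automated with a small check that compares the requested pairs against the tags found in the filtered output. This is a sketch under assumptions: `check_selection` is a hypothetical helper, and `filtered_tags` stands for the tag dictionaries read back from the filtered Tables.

```python
def check_selection(selection, filtered_tags):
    """Return the requested (batch, sample) pairs absent from the output."""
    requested = {(c["batch"], c["sample"]) for c in selection}
    found = {(t["batch"], t["sample"]) for t in filtered_tags}
    return sorted(requested - found)


selection = [
    {"batch": "EPA-WP3-25-001", "sample": "23A"},
    {"batch": "EPA-WP3-25-001", "sample": "23B"},
    {"batch": "EPA-WP3-25-002", "sample": "23C"},
]
filtered_tags = [
    {"batch": "EPA-WP3-25-001", "sample": "23A", "medium": "M1"},
    {"batch": "EPA-WP3-25-001", "sample": "23B", "medium": "M1"},
]
missing = check_selection(selection, filtered_tags)
print(f"{len(filtered_tags)} of {len(selection)} requested samples found")
print(missing)
```

An empty result means every requested pair made it into the output; anything else is worth recording alongside the saved selection.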
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Empty output | No matches found | Verify tag values are exact (case-sensitive) |
| Missing samples | Tags not set | Check input from CellCultureLoadData |
| Warning messages | Resources missing tags | Review input data quality |
| JSON parse error | Invalid format | Use list of dicts or valid JSON string |
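Since matching is case-sensitive, an empty output is often caused by a pair that differs from an existing tag only by letter case. The hypothetical helper below (not part of the task) lists such near misses, assuming the available (batch, sample) pairs have been collected from the input ResourceSet.

```python
def case_near_misses(selection, available):
    """Map each unmatched requested pair to available pairs differing only by case."""
    available = set(available)
    by_folded = {}
    for pair in available:
        key = (pair[0].casefold(), pair[1].casefold())
        by_folded.setdefault(key, []).append(pair)
    hints = {}
    for c in selection:
        pair = (c["batch"], c["sample"])
        if pair in available:
            continue  # exact match, nothing to diagnose
        key = (pair[0].casefold(), pair[1].casefold())
        candidates = by_folded.get(key, [])
        if candidates:
            hints[pair] = sorted(candidates)
    return hints


available = [("EPA-WP3-25-001", "23A"), ("EPA-WP3-25-001", "23B")]
selection = [
    {"batch": "epa-wp3-25-001", "sample": "23a"},  # wrong case
    {"batch": "EPA-WP3-25-001", "sample": "23B"},  # exact match
]
hints = case_near_misses(selection, available)
print(hints)
```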
Notes
- Designed specifically for CellCulture workflow (follows CellCultureLoadData)
- Preserves complete data integrity (no data modification)
- Tag keys ('batch', 'sample') are hardcoded constants for consistency
- Output can be used directly with CellCultureSubsampling task
- Compatible with Streamlit dashboard for interactive use
Input
Input ResourceSet to filter
ResourceSet from CellCultureLoadData containing batch/sample tagged resources
Output
Filtered ResourceSet
ResourceSet containing only selected batch/sample combinations
Configuration
selection_criteria
List of dictionaries with 'batch' and 'sample' keys for filtering
list