🔍 Introduction
BLAST (Basic Local Alignment Search Tool) is one of the most widely used algorithms for comparing biological sequences. This workflow integrates NCBI RefSeq database preparation and per-sequence local BLAST searches directly into Constellab, enabling researchers to perform rapid and reproducible sequence alignments against curated RefSeq RNA or protein datasets without relying on online NCBI servers.
The workflow consists of two tasks:
- Build RefSeqBlastDatabase: downloads and prepares the RefSeq BLAST databases locally.
- RefSeqDB Local BLAST: performs a local BLAST search for each query sequence, using the previously built database.
This approach allows for:
- Offline BLAST execution (no dependency on NCBI web service availability)
- Batch processing of large datasets (many query sequences searched in a single run)
🧰 Prerequisites
- Access to Constellab and a valid Digital Lab environment
- Installed bricks: gws_omix ≥ 0.11.11
- Sufficient disk space for the RefSeq database download (~400 GB)
- Input files:
  - For Build RefSeqBlastDatabase: no input files required; the task downloads the data.
  - For RefSeqDB Local BLAST: a FASTA file containing nucleotide or protein sequences, and the database folder from RefSeqBlastDB (the output of the Build RefSeqBlastDatabase task).
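Because the download is large, it can help to confirm free disk space before starting the build task. The following is a minimal sketch, not part of the workflow itself: the function name `has_enough_space` and the target path are hypothetical, and the 400 GB figure is simply the estimate quoted above.

```python
import shutil

# Assumption: ~400 GB is the estimate given in the prerequisites.
REQUIRED_GB = 400


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1000**3


if __name__ == "__main__":
    target = "."  # hypothetical location where the database would be stored
    if has_enough_space(target):
        print("Enough free space for the RefSeq download.")
    else:
        print("Less than", REQUIRED_GB, "GB free; free up space before building.")
```

Run it on the machine (or Digital Lab volume) that will hold the database, pointing `target` at the intended storage location.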
🧪 Use Case Steps
I. Building the RefSeq BLAST Database (Build RefSeqBlastDatabase Task)
- Add the "Build RefSeqBlastDatabase" task in Constellab.
- Run the task. It downloads the required RefSeq RNA and protein datasets from NCBI and prepares them as local BLAST databases.
- Output: A BLAST DB folder ready to be used in the "RefSeqDB Local BLAST" task.
II. Running Per-Sequence Local BLAST (RefSeqDB Local BLAST Task)
- Add the "RefSeqDB Local BLAST" task in Constellab.
- Link your inputs:
  - input_fasta: FASTA file with sequences to query.
  - blast_db_folder: Output from the Build RefSeqBlastDatabase task.
- Configure parameters:
  - sequence_type: "nucl" (nucleotide) or "prot" (protein).
  - blast_program: One of "blastn", "blastp", "blastx", "tblastn".
  - evalue: Expectation value threshold (default: 1e-5).
  - max_target_seqs: Maximum number of target sequences to keep per query (default: 5).
  - threads: Number of CPU threads (default: 8).
- Run the task.
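To make the parameters above more concrete, here is a rough sketch of how they could map onto a standalone NCBI BLAST+ command line. This is an assumption for illustration only: the guide does not show how the task invokes BLAST internally, the function name `build_blast_command` and the example paths are hypothetical, and the tabular output format (`-outfmt 6`) is a guess based on the .tsv results. The flags themselves are standard BLAST+ options, and the table records which query and database types each program expects.

```python
# Query type and database type expected by each program (standard BLAST behaviour).
BLAST_PROGRAMS = {
    "blastn": ("nucl", "nucl"),   # nucleotide query vs nucleotide database
    "blastp": ("prot", "prot"),   # protein query vs protein database
    "blastx": ("nucl", "prot"),   # translated nucleotide query vs protein database
    "tblastn": ("prot", "nucl"),  # protein query vs translated nucleotide database
}


def build_blast_command(program, query_fasta, db_path, out_tsv,
                        evalue=1e-5, max_target_seqs=5, threads=8):
    """Build (but do not run) a BLAST+ argument list for one query file."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"Unsupported blast_program: {program!r}")
    return [
        program,
        "-query", query_fasta,
        "-db", db_path,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-num_threads", str(threads),
        "-outfmt", "6",  # assumption: tab-separated output
        "-out", out_tsv,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_blast_command("blastn", "query.fasta", "refseq_db/rna", "query.tsv")
    print(" ".join(cmd))
```

The list could be passed to `subprocess.run` on a machine where BLAST+ is installed; inside Constellab the task handles this step for you.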

📤 Output
The RefSeqDB Local BLAST task produces a blast_results ResourceSet containing one .tsv file per query sequence.
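The one-file-per-query layout can be pictured with a small sketch that splits a multi-FASTA input into individual records and derives a result file name for each. This only illustrates the idea and is not the task's actual code: the helper names `iter_fasta` and `result_name`, and the naming scheme based on the first word of the header, are assumptions.

```python
import re
from typing import Iterator, Tuple


def iter_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def result_name(header: str) -> str:
    """Hypothetical naming scheme: first word of the header, made filename-safe."""
    words = header.split()
    seq_id = words[0] if words else "unnamed"
    return re.sub(r"[^A-Za-z0-9._-]", "_", seq_id) + ".tsv"


if __name__ == "__main__":
    example = ">seq1 example transcript\nACGTAC\n>seq2\nGGCTA\n"
    for header, sequence in iter_fasta(example):
        print(result_name(header), len(sequence))
```

Each record would then be searched on its own and its hits written to the corresponding .tsv file.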
Each table includes: