🔍 Introduction
Bakta is a powerful tool for the rapid and standardized annotation of bacterial genomes, MAGs and plasmids, from both isolates and metagenome-assembled genomes (MAGs).
It provides dbxref-rich, sORF-inclusive, taxon-independent annotations in machine-readable formats such as JSON, GFF3, GenBank, EMBL, TSV, and FASTA, ensuring compatibility with downstream workflows.
Unlike protein-only functional annotators (e.g., EggNOG-mapper), Bakta is a full annotation pipeline, comparable to Prokka, DFAST, and PGAP, capable of:
- Predicting CDS (coding DNA sequences) and non-coding RNAs (tRNA, rRNA, tmRNA, ncRNA)
- Detecting CRISPR arrays and origins of replication (oriC/V)
- Adding functional descriptions and stable cross-references to major databases (RefSeq, UniRef100, UniParc), facilitating FAIR-compliant and reproducible analyses.
This makes Bakta a complete solution for researchers working with bacterial genome annotation, comparative genomics, and downstream bioinformatics pipelines.
🧰 Prerequisites
- Access to Constellab and a valid Digital Lab environment
- Installed bricks:
gws_microbial_genomics version ≥ 0.1.1 - Input file:
A genome assembly in FASTA format (contigs, plasmids, MAGs)
Bakta Database:
A pre-downloaded Bakta DB (
db-full, db-light, or db) generated using Build/Update Bakta Databasetask.
🧪 Use Case Steps
- Import your genome FASTA into Constellab.
- Link it to the Task:
"Procaryotes Genome Annotation".
- Configure Parameters:
prefix: Output prefix (default: FASTA stem).
genus, species, strain: (optional) Organism metadata.
translation_table: Choose genetic code (default: The Bacterial, Archaeal and Plant Plastid Code, NCBI 11).
replicon_type & replicon_topology: Apply to all contigs if desired (e.g., plasmid + circular).
complete_genome: Mark sequences as complete (optional).
threads: Number of CPU threads to allocate.
- Run the Task:

📂 Output
Bakta produces a set of standardized files for downstream use:

✅ Example Use Cases
- Annotating new bacterial isolates before submission to NCBI/ENA.
- Adding functional context to MAGs in metagenomic studies.
- Comparing plasmid vs chromosome content.
- Generating publication-ready genome maps.
🧬 Comparative Summary: Bakta vs eggNOG-mapper
EXAMPLE:
With Bakta
- Input:
ecoli_contigs.fna
- Outputs:
ecoli.gff3, ecoli.gbff, ecoli.faa, ecoli.ffn, etc.
- What you get in practice:
-
“On contig_12, from 10543 to 11890: a CDS named gyrA”
-
“On contig_3: a tRNA-Leu gene”
-
A GenBank file you can use for comparison, submission, and
visualization.
With
eggNOG-mapper (after Bakta)
- Typical input: ecoli.faa (the
proteins predicted by Bakta)
- Output:
annotation.tsv
- What you get in practice:
-
For the gyrA protein: functional and ontology assignments such as COG
category, GO terms, EC number (if applicable), KEGG pathway (e.g., DNA
replication), etc.
In
short: you move from “here is the gene in the genome” to “here is what it does
and which pathways it belongs to.”