gws_microbial_genomics

Standardized annotation of bacterial genomes and MAGs

🔍 Introduction


Bakta is a tool for the rapid, standardized annotation of bacterial genomes and plasmids, from both isolates and metagenome-assembled genomes (MAGs).


It produces taxon-independent annotations that are rich in database cross-references (dbxrefs) and include small open reading frames (sORFs). Results are written in machine-readable formats (JSON, GFF3, GenBank, EMBL, TSV, and FASTA), so they plug directly into downstream workflows.


Unlike protein-only functional annotators (e.g., eggNOG-mapper), Bakta is a full annotation pipeline, comparable to Prokka, DFAST, and PGAP. It is capable of:


• Predicting CDSs (coding DNA sequences) and non-coding RNAs (tRNA, rRNA, tmRNA, ncRNA)
• Detecting CRISPR arrays and origins of replication (oriC/oriV)
• Adding functional descriptions and stable cross-references to major databases (RefSeq, UniRef100, UniParc), which supports FAIR-compliant, reproducible analyses

This makes Bakta a complete solution for bacterial genome annotation, comparative genomics, and the first stage of downstream bioinformatics pipelines.


🧰 Prerequisites


• Access to Constellab and a valid Digital Lab environment
• Installed brick: gws_microbial_genomics, version ≥ 0.1.1
• Input file: a genome assembly in FASTA format (contigs, plasmids, or MAGs)
• Bakta database: a pre-downloaded Bakta DB (db-full, db-light, or db) generated with the "Build/Update Bakta Database" task

🧪 Use Case Steps



1. Import your genome FASTA file into Constellab.
2. Link it to the Task "Procaryotes Genome Annotation".
3. Configure the parameters:
   - prefix: output file prefix (default: the FASTA file name without its extension)
   - genus, species, strain: optional organism metadata
   - translation_table: genetic code to use (default: NCBI table 11, The Bacterial, Archaeal and Plant Plastid Code)
   - replicon_type and replicon_topology: optionally apply one type and topology to all contigs (e.g., plasmid + circular)
   - complete_genome: optionally mark the sequences as complete
   - threads: number of CPU threads to allocate
4. Run the Task.
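Outside the Constellab interface, these settings correspond to options of the Bakta command-line tool. The sketch below is a hypothetical helper (`build_bakta_command` is not part of gws_microbial_genomics, and the Task may build its call differently) that assembles an equivalent Bakta command from the parameters above. It assumes the standard Bakta CLI options `--db`, `--output`, `--prefix`, `--genus`, `--species`, `--strain`, `--translation-table`, `--complete`, and `--threads`; replicon_type and replicon_topology are left out for brevity.

```python
from pathlib import Path
from typing import List, Optional


def build_bakta_command(
    fasta: str,
    db_dir: str,
    out_dir: str = "bakta_out",
    prefix: Optional[str] = None,
    genus: Optional[str] = None,
    species: Optional[str] = None,
    strain: Optional[str] = None,
    translation_table: int = 11,
    complete_genome: bool = False,
    threads: int = 1,
) -> List[str]:
    """Assemble a Bakta command line mirroring the Task parameters.

    Hypothetical helper: the Constellab Task may build its call differently.
    """
    # Bakta accepts NCBI translation tables 11 (default), 4 and 25.
    if translation_table not in (4, 11, 25):
        raise ValueError("Bakta accepts translation tables 4, 11 or 25")
    # Default prefix: the FASTA file name without its extension (its "stem").
    prefix = prefix or Path(fasta).stem
    cmd = [
        "bakta",
        "--db", db_dir,
        "--output", out_dir,
        "--prefix", prefix,
        "--translation-table", str(translation_table),
        "--threads", str(threads),
    ]
    # Optional organism metadata is only passed when it is provided.
    for flag, value in (("--genus", genus), ("--species", species), ("--strain", strain)):
        if value:
            cmd += [flag, value]
    if complete_genome:
        cmd.append("--complete")
    cmd.append(fasta)  # the genome FASTA is Bakta's positional argument
    return cmd
```

On a machine where Bakta is installed, the returned list can be run with `subprocess.run(cmd, check=True)`. Building a list rather than one shell string avoids quoting problems with paths or strain names that contain spaces.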



📂 Output


Bakta produces a set of standardized files for downstream use, all named with the chosen prefix.
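Because every output file shares the prefix, a quick check after a run is to look for the expected extensions. The sketch below assumes the core file set described in the Bakta documentation (your installed Bakta version may also write plots, logs, or hypothetical-protein tables); `find_bakta_outputs` and `BAKTA_OUTPUTS` are hypothetical names, not part of the brick.

```python
from pathlib import Path
from typing import Dict

# Core Bakta output extensions (assumption: based on the Bakta documentation;
# other versions may write more or differently named files).
BAKTA_OUTPUTS: Dict[str, str] = {
    ".tsv": "annotations as a simple, human-readable table",
    ".gff3": "annotations and sequences in GFF3",
    ".gbff": "annotations and sequences in GenBank format",
    ".embl": "annotations and sequences in EMBL format",
    ".fna": "replicon/contig DNA sequences",
    ".ffn": "nucleotide sequences of annotated features",
    ".faa": "amino acid sequences of CDSs and sORFs",
    ".json": "all annotations and metadata in JSON",
    ".txt": "run summary",
}


def find_bakta_outputs(out_dir: str, prefix: str) -> Dict[str, bool]:
    """Report, for each expected <prefix><extension> file, whether it exists."""
    base = Path(out_dir)
    return {prefix + ext: (base / (prefix + ext)).is_file() for ext in BAKTA_OUTPUTS}
```

For example, `find_bakta_outputs("bakta_out", "ecoli")` maps names such as `ecoli.gff3` to `True` or `False`, which makes a missing format easy to spot before passing results to the next step.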




✅ Example Use Cases


• Annotating new bacterial isolates before submission to NCBI/ENA.
• Adding functional context to MAGs in metagenomic studies.
• Comparing plasmid and chromosome content.
• Generating publication-ready genome maps.



🧬 Comparative Summary: Bakta vs eggNOG-mapper


Example


With Bakta


• Input: ecoli_contigs.fna
• Outputs (with prefix set to ecoli): ecoli.gff3, ecoli.gbff, ecoli.faa, ecoli.ffn, etc.
• What you get in practice:
  - “On contig_12, from 10543 to 11890: a CDS named gyrA”
  - “On contig_3: a tRNA-Leu gene”
  - A GenBank file you can use for comparison, submission, and visualization
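Statements like these are read directly from the GFF3 file, where each feature line holds the contig (seqid), feature type, start, end, and a column of `key=value` attributes. The minimal parser below follows the GFF3 specification rather than any Bakta-specific API and turns a feature line into a sentence of the kind shown above. It assumes the gene symbol is stored in a `gene` attribute, falling back to `Name` or `ID`; check your own output, since attribute usage can differ between Bakta versions. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple
from urllib.parse import unquote

# (seqid, feature type, start, end, attributes)
Feature = Tuple[str, str, int, int, Dict[str, str]]


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split the GFF3 attribute column ("key=value" pairs separated by ';')."""
    attrs: Dict[str, str] = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def iter_features(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one tuple per feature line, skipping comments and directives."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; there are no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive or comment
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _source, ftype, start, end, _score, _strand, _phase, attrs = cols
        yield seqid, ftype, int(start), int(end), parse_attributes(attrs)


def describe_feature(seqid: str, ftype: str, start: int, end: int,
                     attrs: Dict[str, str]) -> str:
    """Render one feature as a plain-language sentence."""
    # Assumption: the gene symbol is in 'gene'; fall back to 'Name', then 'ID'.
    name = attrs.get("gene") or attrs.get("Name") or attrs.get("ID", "unnamed")
    return f"On {seqid}, from {start} to {end}: a {ftype} named {name}"
```

For a hypothetical tab-separated line `contig_12  Bakta  CDS  10543  11890  .  +  0  ID=HYPO_00120;gene=gyrA`, `describe_feature(*feature)` returns "On contig_12, from 10543 to 11890: a CDS named gyrA". Passing an open `ecoli.gff3` file handle to `iter_features` streams the whole annotation without loading it into memory.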


With eggNOG-mapper (after Bakta)


• Typical input: ecoli.faa (the proteins predicted by Bakta)
• Output: annotation.tsv
• What you get in practice:
  - For the gyrA protein: functional and ontology assignments such as COG category, GO terms, EC number (if applicable), and KEGG pathway (e.g., DNA replication)
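The eggNOG-mapper table can be queried in a similar way. The sketch below assumes the eggNOG-mapper v2 annotation layout (lines starting with `##` are comments, the header row starts with `#query`, and `-` marks a missing value) with the column names `COG_category`, `GOs`, `EC`, and `KEGG_Pathway`. The annotation.tsv produced in Constellab may use a different layout, so check its header first; `read_emapper_annotations` and `summarize` are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, List

# Columns of interest, named as in eggNOG-mapper v2 output (assumption).
FIELDS = ("COG_category", "GOs", "EC", "KEGG_Pathway")


def read_emapper_annotations(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Index annotation rows by query protein ID."""
    # Drop '##' comment lines; the '#query' header row is kept for DictReader.
    rows = (line for line in lines if not line.startswith("##"))
    table: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        query = row.get("#query")
        if query:
            table[query] = row
    return table


def summarize(row: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the COG, GO, EC and KEGG pathway values as lists."""
    summary: Dict[str, List[str]] = {}
    for field in FIELDS:
        value = (row.get(field) or "-").strip()
        # '-' means no assignment; multiple values are comma-separated.
        summary[field] = [] if value == "-" else value.split(",")
    return summary
```

Assuming the ecoli.faa headers carry Bakta locus tags, the `#query` IDs match the `ID` attributes in ecoli.gff3, so each eggNOG-mapper row can be joined back to the gene's position in the genome.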


In short, Bakta tells you where each gene is in the genome; eggNOG-mapper then tells you what it does and which pathways it belongs to.






