Publication date: Mar 9, 2022
Confidentiality: Public

DNA translation

🔬 Introduction


This task translates a DNA sequence into protein in the three forward reading frames (1–3) using the Standard genetic code, and detects open reading frames (ORFs), defined as an ATG start codon followed by the nearest in-frame stop codon (TAA, TAG or TGA). It produces a concise text report with the raw per-frame translations, plus one TSV table per frame listing each ORF's nucleotide coordinates, lengths, stop codon and peptide. This makes it easy to scan a sequence for coding potential, check frame consistency, or extract candidate peptides for downstream analyses.


🧰 Prerequisites


  • Access to Constellab and a valid Digital Lab environment
  • Installed bricks: gws_omix >>> xxxxxxxxxxxx
  • Input file: a DNA sequence in FASTA or plain-text format
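Because the task accepts either FASTA or plain text, the input has to be normalized to a single sequence and checked against the IUPAC DNA alphabet before translation. The following is a minimal Python sketch of that step. It is not the gws_omix implementation: the function name read_dna is hypothetical, and it assumes a single-record file (lines starting with ">" are treated as headers and skipped, and all other lines are concatenated).

```python
# IUPAC nucleotide codes for DNA: A, C, G, T plus ambiguity codes (N = any base).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def read_dna(text: str) -> str:
    """Return one uppercase DNA string from FASTA or plain-text input.

    Hypothetical helper. Assumption: header lines start with '>' and are
    skipped; every other line is concatenated, so a multi-record FASTA
    would be merged into a single sequence.
    """
    body = "".join(
        line for line in text.splitlines() if not line.lstrip().startswith(">")
    )
    # Drop all whitespace (line breaks, spaces inside plain text) and uppercase.
    seq = "".join(body.split()).upper()
    if not seq:
        raise ValueError("no DNA sequence found in input")
    invalid = sorted(set(seq) - IUPAC_DNA)
    if invalid:
        raise ValueError(f"non-IUPAC DNA characters: {''.join(invalid)}")
    return seq
```

To read a file from disk, pass its contents, for example with open(path) as fh: seq = read_dna(fh.read()). A ValueError signals an empty input or a character outside the IUPAC DNA alphabet.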

🧪 Workflow: Step by Step


  1. Add the DNA translation task in Constellab.
  2. Provide the input: select your DNA file (FASTA or plain text).
  3. Configure (optional): set prefix, the base name for the output files (default: dna_translate_orf).
  4. Run the task.

Under the hood:


  • The worker validates that the input uses IUPAC DNA codes (A, C, G, T plus ambiguity codes, including N).
  • It translates frames 1, 2 and 3 (offsets 0, 1 and 2 from the first nucleotide). Stop codons appear as *.
  • It finds ORFs in each frame: an ORF starts at an ATG and ends at the first in-frame stop codon (TAA, TAG or TGA); the peptide is translated up to, but not including, the stop.
  • It writes one TSV per frame and a compact text report.
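The translation and ORF steps above can be sketched in a few lines of Python. This is a minimal illustration of the ORF definition used on this page, not the brick's source code. Beyond what the page states, it assumes that (a) codons containing ambiguity codes such as N translate to X, (b) scanning resumes after each stop codon, so an ATG nested inside an ORF is not reported on its own, and (c) end_nt and len_nt include the stop codon. The names translate and find_orfs are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}


def translate(seq: str, offset: int = 0) -> str:
    """Translate every complete codon starting at `offset`; stops become '*'.

    Assumption: codons containing ambiguity codes (e.g. N) become 'X'.
    Expects an uppercase sequence.
    """
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def find_orfs(seq: str, frame: int) -> list[dict]:
    """Report ATG -> first in-frame stop ORFs for forward frame 1, 2 or 3.

    Assumptions: scanning resumes after each stop codon, and end_nt / len_nt
    include the stop codon. Coordinates are 1-based and inclusive.
    """
    orfs = []
    i = frame - 1
    while i + 3 <= len(seq):
        if seq[i:i + 3] != "ATG":
            i += 3
            continue
        # Walk codon by codon until the first in-frame stop.
        j = i
        while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
            j += 3
        if j + 3 > len(seq):
            break  # no downstream in-frame stop, so no further ORF in this frame
        peptide = translate(seq[i:j])  # excludes the stop codon
        orfs.append({
            "frame": frame,
            "start_nt": i + 1,
            "end_nt": j + 3,
            "len_nt": j + 3 - i,
            "len_aa": len(peptide),
            "stop": seq[j:j + 3],
            "peptide": peptide,
        })
        i = j + 3
    return orfs
```

The raw per-frame translations then come from {f: translate(seq, f - 1) for f in (1, 2, 3)}, and the per-frame tables from find_orfs(seq, f) for f in 1, 2 and 3. An ATG with no downstream in-frame stop yields no row, which matches the ATG → stop definition.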


📤 Output


  • Text report (<prefix>.txt), containing:
      ◦ the input length (nt)
      ◦ the raw protein translations for frames 1–3 (wrapped at 70 characters; stops shown as *)
  • Per-frame ORF tables (<prefix>.frame1_orfs.tsv, ...frame2_orfs.tsv, ...frame3_orfs.tsv), tab-separated with a header on the first line. Columns:
      ◦ frame: 1, 2 or 3
      ◦ start_nt, end_nt: 1-based nucleotide coordinates (inclusive)
      ◦ len_nt, len_aa: ORF length in nucleotides / amino acids
      ◦ stop: stop codon encountered (TAA, TAG or TGA)
      ◦ peptide: translated amino-acid sequence (without the terminal *)
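To make the output layout concrete, here is a small Python sketch that writes a report and per-frame tables with these columns. It is illustrative only: the function names are hypothetical, the report's line labels (such as "Frame 1:") are an assumption, and the <prefix>.frameN_orfs.tsv naming for frames 2 and 3 is inferred from the frame 1 name.

```python
import csv
import io
from pathlib import Path

COLUMNS = ["frame", "start_nt", "end_nt", "len_nt", "len_aa", "stop", "peptide"]
WRAP = 70  # characters per line for the raw translations


def format_tsv(orfs: list[dict]) -> str:
    """Header line plus one tab-separated row per ORF (header only if none)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(orfs)
    return buf.getvalue()


def format_report(seq_len: int, translations: dict[int, str]) -> str:
    """Hypothetical report layout: input length, then each frame wrapped at 70 chars."""
    lines = [f"Input length (nt): {seq_len}"]
    for frame in (1, 2, 3):
        protein = translations.get(frame, "")
        lines.append(f"Frame {frame}:")
        lines.extend(protein[i:i + WRAP] for i in range(0, len(protein), WRAP))
    return "\n".join(lines) + "\n"


def write_outputs(prefix: str, seq_len: int, translations: dict[int, str],
                  orfs_by_frame: dict[int, list[dict]], outdir: str = ".") -> None:
    """Write <prefix>.txt and one <prefix>.frameN_orfs.tsv per forward frame."""
    out = Path(outdir)
    (out / f"{prefix}.txt").write_text(format_report(seq_len, translations))
    for frame in (1, 2, 3):
        table = format_tsv(orfs_by_frame.get(frame, []))
        (out / f"{prefix}.frame{frame}_orfs.tsv").write_text(table)
```

Writing a header even for an empty frame keeps the three tables structurally identical, so they can be concatenated or loaded with the same parser downstream.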

💡 Tips & Notes


  • Only the forward strand is analyzed; reverse-complement frames are not included.
  • If no ATG→stop ORF is found in a frame, that frame's TSV contains only the header line.
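If you also need the reverse-strand frames, one possible workaround (not a feature of the task) is to reverse-complement the sequence yourself, run the task a second time on the result, and map the reported coordinates back to the input strand. A minimal Python sketch, with hypothetical function names, assuming an IUPAC DNA sequence:

```python
# IUPAC-aware complement: each code maps to the code of the complementary base set
# (e.g. R = A/G pairs with Y = C/T; S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an IUPAC DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_forward_coords(start_rc: int, end_rc: int, seq_len: int) -> tuple[int, int]:
    """Map 1-based inclusive coordinates on the reverse complement back to the input strand."""
    return seq_len - end_rc + 1, seq_len - start_rc + 1
```

Frames 1–3 of the reverse-complemented sequence then play the role of the three reverse frames. For example, an ORF at positions 1–3 on the reverse complement of a 10 nt sequence lies at 8–10 on the input strand: to_forward_coords(1, 3, 10) returns (8, 10).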