Introduction

Getting Started

This documentation is under construction. Please send us your feedback at hub@gencovery.com.
BIOTA is a unified and structured collection of omics data from official open European (EMBL-EBI) and NCBI biological databases. It is dedicated to Gencovery Web Services (GWS) for the design and use of digital twins of cell metabolism and, more broadly, to any omics data integration and analysis.
All data distributed in BIOTA are provided in their original versions, without any alteration by the Gencovery team. Users can check the reliability of the data against the cited data providers and the related published scientific papers.
Gencovery is committed to keeping BIOTA open, and sustainably and easily accessible to academic and industry scientists around the world.
Gencovery will deliver in the near future Open APIs and Open Navigation Features to allow scientists to navigate the BIOTA database.
BIOTA is divided into two sections. The first is a collection of ontology data and the second is a collection of molecular data, classified using the ontology data.
The ontology database

  • Gene ontology: 47,210 entries
  • Systems biology ontology: 671 entries
  • Evidence and conclusion ontology: 1,828 entries
  • Brenda tissue ontology: 10,338 entries
  • NCBI taxonomy: 2,312,365 entries
  • Pathway: 20,987 entries
  • Enzyme classification: 399 entries

The molecular database

  • Metabolic compounds: 137,185 entries
  • Enzymes: 96,866 entries
  • Enzyme orthologs: 6,481 entries
  • Metabolic reactions: 53,449 entries
  • Proteins: 564,277 entries

Ontology databases
The BIOTA ontology database is a collection of controlled biological terms used to describe biological information. Ontology information defines the hierarchies relating these terms as well as their semantics. Ontologies are used to standardise the description of biological data during analysis and interpretation.
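To make the idea of a term hierarchy concrete, here is a minimal sketch of how the ancestors of a term could be collected by following parent links. The term identifiers and the `PARENTS` mapping are hypothetical placeholders invented for illustration; they are not real ontology accessions and do not reflect how BIOTA stores its data.

```python
# Hypothetical toy ontology: each term maps to its direct parents ("is_a" links).
# Identifiers are illustrative placeholders, not real ontology accessions.
PARENTS = {
    "TERM:root": [],
    "TERM:metabolic_process": ["TERM:root"],
    "TERM:lipid_metabolism": ["TERM:metabolic_process"],
    "TERM:fatty_acid_oxidation": ["TERM:lipid_metabolism"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen
```

Annotating data with a specific term then makes it retrievable through any of the broader terms above it in the hierarchy.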
Gene ontology database
The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species (http://geneontology.org).
License: GO data are available under the Creative Commons License (CC BY 4.0).
Systems biology ontology
The SBO (Systems Biology Ontology) is a set of controlled, relational vocabularies of terms commonly used in Systems Biology, and in particular in computational modelling. It defines terms to standardise the description of models and biochemical experiments in order to ease their efficient reuse (http://www.ebi.ac.uk/sbo).
License: SBO is under the Artistic License 2.0.
Evidence and conclusion ontology
The Evidence and Conclusion Ontology (ECO) contains terms that describe types of evidence and assertion methods. ECO terms are used in the process of bio-curation to capture the evidence that supports biological assertions (http://www.evidenceontology.org).
License: ECO is under the Creative Commons License CC0 1.0 Universal (CC0 1.0).
Brenda tissue ontology
Brenda Tissue Ontology (BTO) is a structured controlled vocabulary for the source of an enzyme, comprising tissues, cell lines, cell types and cell cultures. It provides terms, classifications, and definitions of tissues, organs, anatomical structures, plant parts, cell cultures, cell types, and cell lines of organisms from all taxonomic groups (animals, plants, fungi, etc.) as enzyme sources (https://www.ebi.ac.uk/ols/ontologies/bto).
License: BTO data are available under the Creative Commons License (CC BY 4.0).
NCBI taxonomy
The NCBI Taxonomy Database is the reference curated classification and nomenclature for all of the organisms in the public sequence databases (https://www.ncbi.nlm.nih.gov/taxonomy). This currently represents about 10% of the described species of life on the planet.
Databases of molecular data on the NCBI Web site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein.
To learn more about the NCBI Website and Data Usage Policies and Disclaimers, please see the NCBI Policies page.
Pathways are collected from two open databases: Reactome and BKMS.
Reactome pathway data
Reactome is a free, open-source, curated and peer-reviewed pathway database. It provides intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. 
BKMS pathway data
BKMS-react is an integrated and non-redundant biochemical reaction database containing known enzyme-catalyzed and spontaneous reactions. Biochemical reactions collected from BRENDA, KEGG, MetaCyc and SABIO-RK were matched and integrated by aligning substrates and products.
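The idea of matching reactions by aligning substrates and products can be sketched as follows. This is a deliberately simplified illustration, not the actual BKMS-react procedure: it assumes compound names are already harmonised across sources, and it ignores stoichiometry and reaction direction. The source labels and the `merge_reactions` helper are hypothetical.

```python
from collections import defaultdict


def merge_reactions(records):
    """Group records that share the same substrate set and product set.

    Each record is a (source, substrates, products) tuple. The result maps a
    (substrates, products) key to the set of sources reporting that reaction.
    """
    merged = defaultdict(set)
    for source, substrates, products in records:
        # frozenset makes the key independent of the order compounds are listed in
        key = (frozenset(substrates), frozenset(products))
        merged[key].add(source)
    return dict(merged)


# Hypothetical input: the same reaction reported by two sources in different order
RECORDS = [
    ("source_A", ["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    ("source_B", ["ATP", "glucose"], ["ADP", "glucose 6-phosphate"]),
    ("source_C", ["pyruvate"], ["acetaldehyde", "CO2"]),
]
MERGED = merge_reactions(RECORDS)
```

Here the three input records collapse into two non-redundant reactions, the first of which keeps track of both sources that reported it.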
Enzyme classification
The enzyme classification is collected from the Expasy-ENZYME database (https://enzyme.expasy.org). It is based on the Enzyme Commission number (EC number), a numerical classification scheme for enzymes based on the chemical reactions they catalyze. In this database, all levels of the hierarchy of enzyme (functional) classes are described, to help users interpret their data even when the exact enzyme classification is not well known.
See https://enzyme.expasy.org/enzyme-byclass.html for a detailed description of all enzyme classes.
To learn more about the Enzyme Commission number (EC Number), please click on the related Wikipedia page.
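As an illustration of the hierarchy described above, the sketch below expands an EC number into its enclosing classes, using "-" for unspecified levels. The `ec_lineage` function is a hypothetical helper written for this page, not part of any BIOTA or Expasy API, and it handles only plain four-field EC numbers.

```python
def ec_lineage(ec_number):
    """Return the enclosing EC classes, from most general to most specific."""
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    lineage = []
    for depth in range(1, 5):
        # Stop at the first unspecified level of a partial EC number
        if parts[depth - 1] == "-":
            break
        fields = parts[:depth] + ["-"] * (4 - depth)
        lineage.append(".".join(fields))
    return lineage
```

For example, `ec_lineage("1.1.1.1")` returns `["1.-.-.-", "1.1.-.-", "1.1.1.-", "1.1.1.1"]`, so data annotated only down to `1.1.-.-` can still be placed within the classification.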
License: Expasy-ENZYME database data are available under the Creative Commons License (CC BY 4.0).
The molecular database
BIOTA molecular data are a collection of compounds (metabolites), enzymes, other proteins and genes found in living organisms. They are collected and structured from various open databases to accelerate the design of digital twins of cell metabolism.
Metabolic compounds
Data on metabolic compounds are collected from the ChEBI database. Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on small chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI (https://www.ebi.ac.uk/chebi/).
License: ChEBI data are available under the Creative Commons License (CC BY 4.0)
Enzyme data are based on the reference European open databases BRENDA and Expasy. They provide information on known enzymes in living organisms (classifications, gene sequences, thermodynamic parameters, etc.).
IMPORTANT: Please do not confuse Enzyme with Enzyme-Ortholog (Enzo).
An Enzyme is a unique enzyme in a given living organism, while an Enzyme-Ortholog refers to a uniquely referenced EC Number (whatever the organism).
Enzyme-Orthologs are therefore better suited to working with enzyme functional classes, while Enzymes are better suited to exploring organism-specific data (taxonomy, specific gene sequences of proteins). For this reason, about 6,000 Enzyme-Orthologs exist, while about 90,000 Enzymes are referenced.
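The relationship between the two notions can be sketched with a small data model: one Enzyme record per (EC Number, organism) pair, and one Enzyme-Ortholog per distinct EC Number. The `Enzyme` class, the `group_by_ec` helper and the sample records are assumptions made for illustration only; they do not describe BIOTA's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    """Illustrative record: one enzyme in one organism."""
    ec_number: str
    organism: str


def group_by_ec(enzymes):
    """Collapse organism-specific enzymes into one entry per EC Number."""
    orthologs = defaultdict(list)
    for enzyme in enzymes:
        orthologs[enzyme.ec_number].append(enzyme.organism)
    return dict(orthologs)


# Illustrative sample records
ENZYMES = [
    Enzyme("1.1.1.1", "Homo sapiens"),
    Enzyme("1.1.1.1", "Saccharomyces cerevisiae"),
    Enzyme("2.7.1.1", "Homo sapiens"),
]
ENZOS = group_by_ec(ENZYMES)
```

Three Enzyme records yield two Enzyme-Orthologs here, which mirrors why the number of Enzyme-Orthologs is much smaller than the number of Enzymes.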
BRENDA database
BRENDA is a major collection of enzyme functional data available to the scientific community (https://www.brenda-enzymes.org). Collected data are copyright-protected by Prof. Dr. D. Schomburg, Technische Universität Braunschweig, BRICS, Department of Bioinformatics and Biochemistry, Rebenring 56, 38106 Braunschweig, Germany.
BRENDA describes each type of characterized enzyme for which an Enzyme Commission (EC) number has been reported. In addition, supplementary information on each enzyme is provided, such as:

  • their tissue/cellular localization (see BRENDA Tissue Ontology),
  • their kinetic parameters and related environmental conditions (KM, pH, etc.),
  • their cofactors,
  • genetic sequences (as FASTA data).

As EC numbers do not specify enzymes but enzyme-catalyzed reactions, a single reaction can be related to several EC Numbers.
License: Current BRENDA data are available under the Creative Commons License (CC BY 4.0). Please refer to BRENDA download without registration for more details. You are granted license to copy, distribute, display and make commercial use of the BRENDA database if you make appropriate reference to BRENDA and the authors.
Expasy database
Expasy is the bioinformatics resource portal of the SIB Swiss Institute of Bioinformatics (https://www.expasy.org). It provides access to over 160 databases and software tools, developed by SIB Groups and supporting a range of life science and clinical research domains, from genomics, proteomics and structural biology, to evolution and phylogeny, systems biology and medical chemistry. Expasy-ENZYME is a repository of information relative to the nomenclature of enzymes. It describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
License: Expasy enzyme data are available under the Creative Commons License (CC BY 4.0).
Enzyme orthologs (Enzo)
To our knowledge, the concept of Enzyme-Ortholog (Enzo) is not a standard concept in biology or bioinformatics. It is analogous to KEGG Orthologs and was introduced in the BIOTA database to uniquely reference enzymes by their EC Number characteristics as provided in Expasy-ENZYME (https://www.expasy.org), whatever the living organism. Enzos are well suited to characterizing metabolic pathways and to the fast reconstruction of metabolic models from functional enzyme data.
This concept is used for instance in the paper of Tabei and co-authors, Bioinformatics 2016.
License: Expasy enzyme data are available under the Creative Commons License (CC BY 4.0).
Metabolic reactions
Rhea is an expert-curated knowledgebase of chemical and transport reactions of biological interest (https://www.rhea-db.org/). It uses the chemical information coming from ChEBI, UniProtKB, InChIKey, GO, etc. Rhea is linked to BRENDA and Expasy through enzyme EC Numbers.
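Linking reactions to enzymes through EC Numbers amounts to a lookup on a shared key. The sketch below shows the principle with hypothetical records: the identifiers (`RXN_1`, `ENZ_A`, etc.) and field names are placeholders, not real Rhea or BRENDA entries, and the function is not part of any BIOTA API.

```python
# Hypothetical records: identifiers are placeholders, not real database entries.
REACTIONS = [
    {"reaction_id": "RXN_1", "ec_numbers": ["1.1.1.1"]},
    {"reaction_id": "RXN_2", "ec_numbers": ["2.7.1.1", "2.7.1.2"]},
]
ENZYME_RECORDS = [
    {"enzyme_id": "ENZ_A", "ec_number": "1.1.1.1"},
    {"enzyme_id": "ENZ_B", "ec_number": "2.7.1.1"},
    {"enzyme_id": "ENZ_C", "ec_number": "2.7.1.1"},
]


def enzymes_for_reaction(reaction, enzyme_records):
    """Return the ids of enzymes whose EC Number is attached to the reaction."""
    wanted = set(reaction["ec_numbers"])
    return [rec["enzyme_id"] for rec in enzyme_records if rec["ec_number"] in wanted]
```

With these records, `RXN_2` resolves to `ENZ_B` and `ENZ_C`, since both carry one of the EC Numbers attached to that reaction.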
License: Rhea data are available under the Creative Commons License (CC BY 4.0).
Protein data are gathered from UniProtKB (https://www.uniprot.org/). The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. As much annotation information as possible is added in addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information).
BIOTA contains the manually reviewed and annotated records of the UniProtKB (Swiss-Prot).
UniProt is updated frequently, every eight weeks. BIOTA is currently updated less often.
License: UniProt data are available under the Creative Commons License (CC BY 4.0).
Gencovery Numerical Resources (GNR) refer to the software, libraries and data provided by us through our web services. GNR may be covered by third-party licenses. Gencovery guarantees that GNR are accessible for your commercial and non-commercial use through Gencovery web services. For ad-hoc use of GNR outside Gencovery web services, please check third-party licenses to ensure you are legally authorised. Gencovery does not warrant or assume any legal liability or responsibility for the accuracy or completeness of any information disclosed through Gencovery web services. This is not a legal notice. Please refer to our terms of use for any legal notice about our web services.