Getting Started

Introduction

BIOTA is a unified and structured collection of omics data from official open European (EMBL-EBI) and NCBI biological databases. It is dedicated to Gencovery Web Services (GWS) for the design and use of digital twins of cell metabolism and, more broadly, for any omics data integration and analysis.

All data distributed in BIOTA are provided in their original versions, without any alteration by the Gencovery team. Users can check the reliability of the data against the cited data providers and the related published scientific papers.

BIOTA is divided into two sections. The first is a collection of ontology data and the second is a collection of molecular data, classified using the ontology data.

The ontology database

Gene ontology: 47,210 entries

Systems biology ontology: 671 entries

Evidence and conclusion ontology: 1,828 entries

Brenda tissue ontology: 10,338 entries

NCBI taxonomy: 2,312,365 entries

Pathway: 20,987 entries

Enzyme classification: 399 entries

The molecular database

Metabolic compounds: 137,185 entries

Enzymes: 96,866 entries

Enzyme orthologs: 6,481 entries

Metabolic reactions: 53,449 entries

Proteins: 564,277 entries

Ontology databases

The BIOTA ontology database is a collection of controlled biological terms used to describe biological information. Ontology information defines the hierarchies that relate these terms as well as their semantics. These terms are used to standardise the description of biological data during analysis and interpretation.
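As a rough illustration of what "hierarchies related to these terms" means in practice, the sketch below walks a small parent map to collect every ancestor of a term. The term identifiers (`TOY:1` to `TOY:4`), the `parents` dictionary layout and the `ancestors` function are hypothetical and chosen only for this example; they are not part of BIOTA or of any of the ontologies listed here.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: each term maps to its direct parents.
# A term may have several parents, as is common in biological ontologies.
TOY_PARENTS: Dict[str, List[str]] = {
    "TOY:4": ["TOY:2", "TOY:3"],
    "TOY:2": ["TOY:1"],
    "TOY:3": ["TOY:1"],
    "TOY:1": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("TOY:4", TOY_PARENTS)))
```

Annotating data with a specific term then implicitly annotates it with all of that term's ancestors, which is what makes comparisons possible across data sets described at different levels of detail.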

Gene ontology database

The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species (http://geneontology.org).

Systems biology ontology

The Systems Biology Ontology (SBO) is a set of controlled, relational vocabularies of terms commonly used in systems biology, and in particular in computational modelling. It defines terms that standardise the description of models and biochemical experiments in order to ease their efficient reuse (http://www.ebi.ac.uk/sbo).

Evidence and conclusion ontology

The Evidence and Conclusion Ontology (ECO) contains terms that describe types of evidence and assertion methods. ECO terms are used in the process of bio-curation to capture the evidence that supports biological assertions (http://www.evidenceontology.org).

Brenda tissue ontology

The BRENDA Tissue Ontology (BTO) is a structured controlled vocabulary for the source of an enzyme, comprising tissues, cell lines, cell types and cell cultures. It provides terms, classifications, and definitions of tissues, organs, anatomical structures, plant parts, cell cultures, cell types, and cell lines of organisms from all taxonomic groups (animals, plants, fungi, etc.) as enzyme sources (https://www.ebi.ac.uk/ols/ontologies/bto).

NCBI taxonomy

The NCBI Taxonomy Database is the top reference curated classification and nomenclature for all of the organisms in the public sequence databases (https://www.ncbi.nlm.nih.gov/taxonomy). This currently represents about 10% of the described species of life on the planet.

Databases of molecular data on the NCBI Web site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein.

Pathways

Pathways are collected from two open databases: Reactome and BKMS-react.

Reactome pathway data

Reactome is a free, open-source, curated and peer-reviewed pathway database. It provides intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.

BKMS pathway data

BKMS-react is an integrated and non-redundant biochemical reaction database containing known enzyme-catalyzed and spontaneous reactions. Biochemical reactions collected from BRENDA, KEGG, MetaCyc and SABIO-RK were matched and integrated by aligning substrates and products.

Enzyme classification

The enzyme classification is collected from the Expasy-ENZYME database (https://enzyme.expasy.org). It is based on the Enzyme Commission number (EC number), a numerical classification scheme for enzymes based on the chemical reactions they catalyze. In this database, all levels of the hierarchy of enzyme (functional) classes are described to help users better interpret their data even when the exact enzyme classification is not well known.

See https://enzyme.expasy.org/enzyme-byclass.html for a detailed description of all enzyme classes.
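To make the class hierarchy concrete, here is a minimal sketch that expands an EC number into the chain of classes it belongs to, from the top-level class down to the most specific level given. It assumes the usual four-field notation in which unspecified levels are written as `-` (for example `2.7.-.-`). The function name `ec_lineage` is hypothetical and is not part of BIOTA or Expasy.

```python
from typing import List


def ec_lineage(ec_number: str) -> List[str]:
    """Expand an EC number into its classes, most general first.

    Unspecified trailing levels must be written as '-' (e.g. '2.7.-.-').
    """
    fields = ec_number.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")

    # Keep only the leading fields that are actually specified.
    known: List[str] = []
    for field in fields:
        if field == "-":
            break
        known.append(field)
    if not known:
        raise ValueError(f"no class specified in {ec_number!r}")

    # One entry per level, padding the unspecified positions with '-'.
    return [
        ".".join(known[:depth] + ["-"] * (4 - depth))
        for depth in range(1, len(known) + 1)
    ]


print(ec_lineage("1.1.1.1"))  # ['1.-.-.-', '1.1.-.-', '1.1.1.-', '1.1.1.1']
print(ec_lineage("2.7.-.-"))  # ['2.-.-.-', '2.7.-.-']
```

When only a partial classification such as `2.7.-.-` is known, the same expansion still places the enzyme within its broader functional classes, which is the situation the hierarchy is meant to cover.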

The molecular database

BIOTA molecular data is a collection of compounds (metabolites), enzymes and other proteins, and genes found in living organisms. They are collected and structured from various open databases to accelerate the design of digital twins of cell metabolism.

Metabolic compounds

Data on metabolic compounds are collected from the ChEBI database. Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on small chemical compounds. The molecular entities in question are either natural products or synthetic products used to intervene in the processes of living organisms. Genome-encoded macromolecules (nucleic acids, proteins and peptides derived from proteins by cleavage) are not as a rule included in ChEBI (https://www.ebi.ac.uk/chebi/).

Enzymes

Enzyme data are based on two leading European open reference databases, BRENDA and Expasy. They give information on all known enzymes in living organisms (classifications, gene sequences, thermodynamic parameters, etc.).

BRENDA database

BRENDA is a major collection of enzyme functional data available to the scientific community (https://www.brenda-enzymes.org). Collected data are copyright-protected by Prof. Dr. D. Schomburg, Technische Universität Braunschweig, BRICS, Department of Bioinformatics and Biochemistry, Rebenring 56, 38106 Braunschweig, Germany.

BRENDA describes each type of characterized enzyme for which an Enzyme Commission (EC) number has been reported. In addition, supplementary information on enzymes is provided, such as

their tissue/cellular localization (see BRENDA Tissue Ontology),

their kinetic parameters and related environmental conditions (KM, pH, etc.),

their cofactors,

genetic sequences (as FASTA data).

As EC numbers do not specify enzymes but enzyme-catalyzed reactions, a single reaction can be related to several EC Numbers.

Expasy database

Expasy is the bioinformatics resource portal of the SIB Swiss Institute of Bioinformatics (https://www.expasy.org). It provides access to over 160 databases and software tools, developed by SIB Groups and supporting a range of life science and clinical research domains, from genomics, proteomics and structural biology, to evolution and phylogeny, systems biology and medical chemistry. Expasy-ENZYME is a repository of information relative to the nomenclature of enzymes. It describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.

Enzyme orthologs (Enzo)

To our knowledge, the Enzyme Ortholog (Enzo) is not a standard concept in biology or bioinformatics. It is analogous to KEGG Orthologs and was introduced in the BIOTA database to uniquely reference enzymes by their EC number characteristics as provided in Expasy-ENZYME (https://www.expasy.org), whatever the living organism. Enzos are well suited to characterizing metabolic pathways and to the fast reconstruction of metabolic models from functional enzyme data.
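The idea of referencing enzymes by EC number regardless of organism can be sketched as a simple grouping step. The record layout (`ec_number` and `organism` keys) and the function `group_by_ec` are hypothetical and only illustrate the principle; they do not reflect BIOTA's actual schema or API, and the three records are illustrative examples rather than BIOTA data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_ec(records: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Collapse organism-specific enzyme records into one entry per EC number."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record["ec_number"]].append(record["organism"])
    return dict(groups)


# Illustrative records (assumed layout, not BIOTA data).
enzymes = [
    {"ec_number": "1.1.1.1", "organism": "Saccharomyces cerevisiae"},
    {"ec_number": "1.1.1.1", "organism": "Homo sapiens"},
    {"ec_number": "2.7.1.1", "organism": "Homo sapiens"},
]

for ec_number, organisms in group_by_ec(enzymes).items():
    print(ec_number, organisms)
```

Each key of the result plays the role of one organism-independent entry, so a pathway described in terms of these keys can be reused across species.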

Metabolic reactions

Rhea is an expert-curated knowledgebase of chemical and transport reactions of biological interest (https://www.rhea-db.org/). It uses chemical information from ChEBI, UniProtKB, InChIKey, GO, etc. Rhea is linked to BRENDA and Expasy through enzyme EC numbers.

Proteins

Protein data are gathered from UniProtKB (https://www.uniprot.org/). The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. As much annotation information as possible is added in addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information).

BIOTA contains the manually reviewed and annotated records of the UniProtKB (Swiss-Prot).

Notice

Gencovery Numerical Resources (GNR) refer to the software, libraries and data provided by us through our web services. GNR may be covered by third-party licenses. Gencovery guarantees that GNR are accessible for your commercial and non-commercial use through Gencovery web services. For ad hoc use of GNR outside Gencovery web services, please check third-party licenses to ensure you are legally authorised. Gencovery does not warrant or assume any legal liability or responsibility for the accuracy or completeness of any information disclosed through Gencovery web services. This is not a legal notice. Please refer to our terms of use for any legal notice about our web services.
