
How to visualise a KEGG pathway using Constellab?

Maëva Beugin
May 11, 2023, 2:42 PM

Co-authors : 
Adama OUATTARA

Introduction

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a resource that provides detailed information on metabolic pathways, genes, proteins, diseases and many other aspects of molecular biology. KEGG data visualisation helps researchers understand the complex interactions between different biological elements. Through interactive diagrams and maps, it offers a clear and informative way to explore and interpret biological data, which makes it a valuable tool for life science research.
Let's see how to view a KEGG pathway using Constellab.
In the second part of this story, we will visualise a KEGG pathway in Constellab again, this time with colours that depend on gene expression.
For the sake of simplicity, we will use live code in a live task. The example presented here could later be integrated into a custom brick for distribution. This tutorial is intended for scientists who want to use KEGG maps in Constellab.
LICENCE
KEGG is a database resource for understanding high-level functions and utilities of the biological system. Kanehisa Laboratories owns and controls the rights to KEGG. Although the KEGG database is made freely available for academic use via the website, it is not in the public domain. All commercial use of KEGG requires a licence. Please ensure that you have a licence to use the KEGG database.

Easily visualise a KEGG pathway in Constellab

The following figure shows a live task executed in a Conda environment.

Conda environment

name: .venv
channels:
  - conda-forge
dependencies:
  - python=3.8 
  - requests==2.28.2

Live code

# Python live task: look up a KEGG pathway and download its image.
import requests


def get_kegg_pathway_id_by_name(pathway_name):
    try:
        # Make the request to search for pathway entries by name
        response = requests.get(f"http://rest.kegg.jp/find/pathway/{pathway_name}")


        if response.status_code == 200:
            # Parse the response to extract the pathway ID
            lines = response.text.strip().split("\n")
            if lines:
                pathway_entry = lines[0].split("\t")
                if len(pathway_entry) == 2:
                    pathway_id = pathway_entry[0].split(":")[1]
                    return pathway_id


        print(f"Pathway with name '{pathway_name}' not found.")
        return None


    except requests.exceptions.RequestException as e:
        print(f"Error fetching pathway data: {e}")
        return None
      
def download_kegg_pathway_image(pathway_id, save_path):
    try:
        # Make the request to fetch the image data
        response = requests.get(f"http://rest.kegg.jp/get/{pathway_id}/image")


        if response.status_code == 200:
            # Save the image to the specified path
            with open(save_path, "wb") as f:
                f.write(response.content)
            print(f"Pathway image saved to {save_path}")
        else:
            print(f"Error fetching pathway image. Status code: {response.status_code}")


    except requests.exceptions.RequestException as e:
        print(f"Error fetching pathway image: {e}")


# pathway_id and pathway_name come from the live task parameters
if pathway_id is None:
    pathway_id = get_kegg_pathway_id_by_name(pathway_name)

if not pathway_id:
    # Stop here: without an ID there is no image to download
    raise ValueError("Pathway not found or error occurred.")

print(f"The KEGG ID for '{pathway_name}' is: {pathway_id}")

save_path = pathway_id + "_pathway_image.png"
download_kegg_pathway_image(pathway_id, save_path)
target_paths = [save_path]

Parameters

pathway_id = None  # Replace with the pathway ID you want to explore (for example "map00010"), or leave as None if not known
pathway_name = "Glycolysis"  # If you don't know the pathway ID, fill this with the pathway name

Enter the ID of the pathway you want to see, or its name, and run the experiment: you will get an image of that pathway.
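The name lookup in the live code relies on the plain-text response being tab-separated, with the ID prefixed by "path:". The sketch below isolates that parsing step so it can be tried without a network call. The function name `parse_kegg_find_response` and the sample response text are illustrative assumptions, not part of the original live task.

```python
from typing import Optional


def parse_kegg_find_response(text: str) -> Optional[str]:
    """Return the pathway ID from the first line of a 'find' response, or None."""
    lines = text.strip().split("\n")
    fields = lines[0].split("\t")
    # A usable entry has exactly two tab-separated fields:
    # "path:<id>" and a description.
    if len(fields) != 2 or ":" not in fields[0]:
        return None
    return fields[0].split(":", 1)[1]


# Hand-written sample in the format the live code expects (assumption).
sample = "path:map00010\tGlycolysis / Gluconeogenesis\n"
print(parse_kegg_find_response(sample))
print(parse_kegg_find_response(""))
```

Like the live code, this keeps only the first match, so a broad name may return a pathway other than the one intended. Passing the ID directly avoids that ambiguity.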

Results

If you follow the previous steps you will obtain this image:
Nice, isn't it? Now you've got a dataset and you want to visualise the genes that are differentially expressed on a KEGG pathway? Let's get started!

Visualise a KEGG pathway coloured in Constellab

We're going to create a live task using R's pathview package.
If you want more information about the package pathview, here is the documentation.

Input

Provide your dataset as input, with the following columns:
- the identifier of the genes studied
- the expression value of the gene in the control condition
- the expression value of the gene in the wild type condition
Here, we will use data from this dataset. It contains rat gene expression data for two conditions. If you want to reproduce this tutorial, download the "expression values across all genes" file and delete the text header and the "Gene ID" column.
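Before running the task, it can help to check that the prepared file really contains the three expected columns. The following is a minimal sketch, assuming the column names used in the Parameters section of this tutorial ("Gene_Name", "native", "post_ex_vivo_lung_perfusion"). The helper `missing_columns` and the tiny example table are hypothetical and hold no real expression values.

```python
import csv
import io

# Column names taken from the Parameters section of this tutorial.
REQUIRED_COLUMNS = ["Gene_Name", "native", "post_ex_vivo_lung_perfusion"]


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Made-up table, for illustration only.
example = "Gene_Name,native,post_ex_vivo_lung_perfusion\ngeneA,10.0,20.0\n"
print(missing_columns(example, REQUIRED_COLUMNS))
```

An empty list means the header is complete. Any name returned is a column to add or rename before launching the live task.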

Conda environment

We will use the package pathview so we need to add it in the dependencies. We also add 'org.Rn.eg.db' which is a genome-wide annotation for rat, so you will need to update this if you do not have rat data.

name: .venv
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base
  - bioconductor-pathview
  - bioconductor-org.Rn.eg.db

Live code

The code is as follows. It computes the log2 fold change from our dataset and passes the pathview function a dataframe that contains only these values, with the gene IDs as the index.

# Load required packages
library(pathview)


# Read the source CSV file (with header, comma separator)
data <- read.csv(source_paths[1], header = TRUE, sep = ",")
# Work on a copy of the data
data_pathview <- data


# Compute fold_change
fold_change <- data_pathview[column_wild]/data_pathview[column_control]
# Compute log2 of the fold change
log2_fold_change <- log2(fold_change)


# Add a column to the dataframe with the value of log2(FC)
data_pathview <- cbind(data_pathview, log2_fold_change)
colnames(data_pathview)[ncol(data_pathview)] <- "log2_fold_change"


# Only keep the necessary columns 
data_pathview = data_pathview[,c(column_gene_id,"log2_fold_change")]
data_pathview <- na.omit(data_pathview)
# Set the gene id as index
row.names(data_pathview) <- data_pathview[,column_gene_id]
data_pathview[,column_gene_id] <- NULL


# Use pathview function 
pv.out <- pathview(gene.data = data_pathview, cpd.data = NULL, gene.idtype = type_gene_id, pathway.id = pathway_id, species = specie)


# Write the csv file and the image into the result folder
result_path <- "result.csv"
write.csv(data_pathview, file = result_path, row.names = TRUE)
image_path <- paste0(specie,pathway_id,".pathview.png")
target_paths <- c(result_path, image_path)

If you want to compare more than one fold change, you only need to provide a single dataframe containing all the fold changes, with the gene IDs kept in the index. In that case, the output file name will end with ".pathview.multi.png".
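To make the fold-change step concrete, here is a small Python sketch of the same arithmetic as the R code above: log2 of the wild-type value divided by the control value, indexed by gene. The function name and the example rows are hypothetical. Unlike the R code, which only drops missing values with na.omit, this sketch skips rows with zero or negative values, which is an assumption made here to keep the logarithm defined.

```python
import math


def log2_fold_changes(rows, gene_col, control_col, wild_col):
    """Map each gene ID to log2(wild / control), skipping unusable rows."""
    result = {}
    for row in rows:
        control = float(row[control_col])
        wild = float(row[wild_col])
        # log2 is undefined for zero or negative ratios, so skip those rows
        # (assumption: the R code would keep them as Inf or NaN instead).
        if control <= 0 or wild <= 0:
            continue
        result[row[gene_col]] = math.log2(wild / control)
    return result


# Made-up values, for illustration only.
rows = [
    {"Gene_Name": "geneA", "native": "10", "post_ex_vivo_lung_perfusion": "20"},
    {"Gene_Name": "geneB", "native": "8", "post_ex_vivo_lung_perfusion": "2"},
    {"Gene_Name": "geneC", "native": "0", "post_ex_vivo_lung_perfusion": "5"},
]
print(log2_fold_changes(rows, "Gene_Name", "native", "post_ex_vivo_lung_perfusion"))
# prints {'geneA': 1.0, 'geneB': -2.0}
```

A doubling of expression gives +1 and a four-fold drop gives -2, so the sign alone tells you the direction of the change.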

Parameters

specie = "rno"
pathway_id = "00010"
type_gene_id = "symbol"
column_gene_id = "Gene_Name"
column_control = "native"
column_wild = "post_ex_vivo_lung_perfusion"

Indicate the species studied, then the identifier of the pathway you wish to see. Also indicate the identifier type of your genes (the default is "entrez", for Entrez gene IDs) and, finally, the names of the columns in your dataframe.
In this case, the data come from rat, so we set the species to "rno", and we want to see the glycolysis pathway ("00010"). The genes are identified by their symbol.

Results

With this updated code, you can obtain an image like this:
EC numbers coloured red indicate that the log2 fold change between the two conditions is near 1, so the gene is over-expressed in the wild-type condition compared with the control condition.
Conversely, an EC number coloured green means that the gene is under-expressed in the wild-type condition compared with the control condition.
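The reading of the colours can be sketched as follows. This assumes pathview's default gene colour scale (green for negative values, grey around zero, red for positive values, saturating at a limit of 1). The real package draws a continuous gradient, so these helper functions are only a simplified, hypothetical illustration of the direction of the colour.

```python
def clamp(value, limit=1.0):
    """Restrict a log2 fold change to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def colour_direction(log2_fc, limit=1.0):
    """Return the end of the colour scale that a log2 fold change leans towards."""
    scaled = clamp(log2_fc, limit) / limit
    if scaled > 0:
        return "red"    # over-expressed in the wild-type condition
    if scaled < 0:
        return "green"  # under-expressed in the wild-type condition
    return "grey"       # no change


for value in (1.0, -2.0, 0.0):
    print(value, colour_direction(value))
```

Under this assumption, any value beyond the limit is drawn in the same saturated colour, so a log2 fold change of 1 and one of 3 would look identical on the map.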
This demonstration was done using a task from the brick gws_core, version 0.5.13.