How to visualise a KEGG pathway using Constellab?
Introduction
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a valuable resource that provides detailed information on metabolic pathways, genes, proteins, diseases and many other aspects of molecular biology. Visualising KEGG data helps researchers understand the complex interactions between different biological elements. Through interactive diagrams and maps, KEGG visualisation gives a clear and informative view for exploring and interpreting biological data, which makes it a useful tool for life science research.
Let's see how to view a KEGG pathway using Constellab.
In the second part of this story, we will visualise the same KEGG pathway in Constellab, this time with colours that depend on gene expression.
For the sake of simplicity, we will use live code in a live task. The example presented here could later be integrated into a custom brick for distribution. This tutorial is intended for scientists who want to use KEGG maps in Constellab.
LICENCE
KEGG is a database resource for understanding high-level functions and utilities of the biological system. Kanehisa Laboratories owns and controls the rights to KEGG. Although the KEGG database is made freely available for academic use via the website, it is not in the public domain. All commercial use of KEGG requires a licence. Please ensure that you have a licence to use the KEGG database.
Easily visualise a KEGG pathway in Constellab
The following figure shows a live task executed in a Conda environment.
Conda environment
name: .venv
channels:
  - conda-forge
dependencies:
  - python=3.8
  - requests==2.28.2
Live code
# This is a snippet template for a Python live task.
import requests


def get_kegg_pathway_id_by_name(pathway_name):
    """Return the ID of the first KEGG pathway matching the name, or None."""
    try:
        # Make the request to search for pathway entries by name
        response = requests.get(f"http://rest.kegg.jp/find/pathway/{pathway_name}")
        if response.status_code == 200:
            # Parse the response to extract the pathway ID
            lines = response.text.strip().split("\n")
            if lines:
                pathway_entry = lines[0].split("\t")
                if len(pathway_entry) == 2:
                    # Keep the part after an optional "path:" prefix
                    pathway_id = pathway_entry[0].split(":")[-1]
                    return pathway_id
        print(f"Pathway with name '{pathway_name}' not found.")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching pathway data: {e}")
        return None


def download_kegg_pathway_image(pathway_id, save_path):
    """Download the pathway image from KEGG and write it to save_path."""
    try:
        # Make the request to fetch the image data
        response = requests.get(f"http://rest.kegg.jp/get/{pathway_id}/image")
        if response.status_code == 200:
            # Save the image to the specified path
            with open(save_path, "wb") as f:
                f.write(response.content)
            print(f"Pathway image saved to {save_path}")
        else:
            print(f"Error fetching pathway image. Status code: {response.status_code}")
    except requests.exceptions.RequestException as e:
        print(f"Error fetching pathway image: {e}")


# pathway_id and pathway_name come from the live task parameters
if pathway_id is None:
    pathway_id = get_kegg_pathway_id_by_name(pathway_name)
    if pathway_id:
        print(f"The KEGG ID for '{pathway_name}' is: {pathway_id}")
    else:
        # Stop here: without an ID there is no image to download
        raise ValueError("Pathway not found or error occurred.")

save_path = pathway_id + "_pathway_image.png"
download_kegg_pathway_image(pathway_id, save_path)
target_paths = [save_path]
Parameters
pathway_id = None  # Replace with the pathway ID you want to explore (map00010 for example), or leave as None if not known
pathway_name = "Glycolysis"  # If you don't know the pathway ID, fill this in with the pathway name
Enter either the ID or the name of the pathway you want to see and run the experiment: you'll get an image of that pathway!
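The live code above keeps only the first match returned by the name search. If a name matches several pathways, it can help to list every match before choosing an ID. Here is a minimal sketch of that parsing step, assuming the reply is tab-separated text with one entry per line, as the live code expects. The function name `parse_kegg_find_response` and the sample reply are hypothetical and made up for illustration; nothing is fetched from KEGG here.

```python
def parse_kegg_find_response(text):
    """Return a list of (pathway_id, description) tuples from a 'find' reply."""
    matches = []
    for line in text.strip().splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Drop an optional database prefix such as "path:"
        pathway_id = parts[0].split(":")[-1]
        matches.append((pathway_id, parts[1]))
    return matches


# Illustrative reply text (made up for this example, not fetched from KEGG)
sample = (
    "path:map00010\tGlycolysis / Gluconeogenesis\n"
    "path:map99999\tHypothetical second match"
)
for pathway_id, description in parse_kegg_find_response(sample):
    print(pathway_id, description)
```

You could then pick the ID you want from the printed list and set it as `pathway_id` in the parameters.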
Results
If you follow the previous steps you will obtain this image:
Nice, isn't it? Now you've got a dataset and you want to visualise the genes that are differentially expressed on a KEGG pathway? Let's get started!
Visualise a coloured KEGG pathway in Constellab
We're going to create a live task using R's pathview package.
If you want more information about the package pathview, here is the documentation.
Input
Provide your dataset as input, with the following columns:
- the identifier of the genes studied
- the expression value of the gene in the control condition
- the expression value of the gene in the wild type condition
Here, we will use data from this dataset. It contains rat gene expression data for two conditions. If you want to reproduce this tutorial, download the "expression values across all genes" file and delete the text header and the "Gene ID" column.
Conda environment
We will use the package pathview, so we need to add it to the dependencies. We also add 'org.Rn.eg.db', which is a genome-wide annotation for rat, so you will need to change this if you are not working with rat data.
name: .venv
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base
  - bioconductor-pathview
  - bioconductor-org.Rn.eg.db
Live code
The code is as follows. It calculates the log2 of the fold change for our dataset and passes the function pathview a dataframe that contains only these values, with the gene IDs as the index.
# Load required packages
library(pathview)

# Read the source csv file with header and comma separator
data <- read.csv(source_paths[1], header = TRUE, sep = ",")

# Select data
data_pathview <- data

# Compute fold_change
fold_change <- data_pathview[column_wild] / data_pathview[column_control]

# Compute log2 of the fold change
log2_fold_change <- log2(fold_change)

# Add a column to the dataframe with the value of log2(FC)
data_pathview <- cbind(data_pathview, log2_fold_change)
colnames(data_pathview)[ncol(data_pathview)] <- "log2_fold_change"

# Only keep the necessary columns
data_pathview <- data_pathview[, c(column_gene_id, "log2_fold_change")]
data_pathview <- na.omit(data_pathview)

# Set the gene id as index
row.names(data_pathview) <- data_pathview[, column_gene_id]
data_pathview[, column_gene_id] <- NULL

# Use pathview function
pv.out <- pathview(gene.data = data_pathview, cpd.data = NULL,
                   gene.idtype = type_gene_id, pathway.id = pathway_id,
                   species = specie)

# Write the csv file and the image into the result folder
# (write.csv always writes the column names, so col.names is not set)
result_path <- "result.csv"
write.csv(data_pathview, file = result_path, row.names = TRUE)
image_path <- paste0(specie, pathway_id, ".pathview.png")
target_paths <- c(result_path, image_path)
If you want to compare more than one fold change, you only need to provide a single dataframe containing all the fold changes, keeping the gene IDs in the index. In that case, the output file name will end in ".pathview.multi.png".
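To make the log2 fold-change step more concrete, here is a small Python sketch of the same arithmetic on hypothetical expression values. The function name, the gene names and the numbers are made up for illustration. One deliberate difference from the R live code: this sketch returns None when either value is zero or negative, whereas the R division would produce Inf, -Inf or NaN in those cases.

```python
import math


def log2_fold_change(control, treated):
    """Return log2(treated / control), or None when the ratio is undefined."""
    if control <= 0 or treated <= 0:
        return None
    return math.log2(treated / control)


# Hypothetical expression values: gene -> (control, treated)
examples = {
    "GeneA": (10.0, 20.0),  # doubled: log2 fold change of 1
    "GeneB": (8.0, 2.0),    # divided by four: log2 fold change of -2
    "GeneC": (0.0, 5.0),    # undefined ratio
}
for gene, (control, treated) in examples.items():
    print(gene, log2_fold_change(control, treated))
```

A value of 1 therefore means the expression doubled between the two conditions, and a value of -1 means it was halved.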
Parameters
specie = "rno"
pathway_id = "00010"
type_gene_id = "symbol"
column_gene_id = "Gene_Name"
column_control = "native"
column_wild = "post_ex_vivo_lung_perfusion"
Indicate the species studied, then the identifier of the pathway you wish to see. Also indicate the identifier type of your genes (the default is "entrez", for Entrez gene IDs) and, finally, the names of the columns in your dataframe.
In this case, the data come from rat, so we use "rno", and we want to see the glycolysis pathway ("00010"). Finally, the genes are identified by their symbol.
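Note that the first part of this story used the ID "map00010", while here the pathway is given as the five-digit number "00010". The sketch below shows one way to derive that number from a prefixed ID and to build the image name that the R live code expects, following the same pattern as its `paste0` call. Both helper names are hypothetical, and the pattern for IDs (lowercase letters followed by five digits) is an assumption based on the examples in this tutorial.

```python
import re


def kegg_pathway_number(pathway_id):
    """Extract the five-digit number from IDs such as 'map00010' or '00010'."""
    match = re.fullmatch(r"[a-z]*(\d{5})", pathway_id)
    if match is None:
        raise ValueError(f"Unexpected pathway ID: {pathway_id!r}")
    return match.group(1)


def pathview_image_name(species, pathway_id, multi=False):
    """Build the image file name: species code, pathway number, then suffix."""
    suffix = ".pathview.multi.png" if multi else ".pathview.png"
    return f"{species}{kegg_pathway_number(pathway_id)}{suffix}"


print(pathview_image_name("rno", "map00010"))
```

With the parameters used in this tutorial, this prints `rno00010.pathview.png`.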
Results
With this updated code, you can obtain an image like this:
EC numbers coloured in red indicate that the log2 of the fold change between the two conditions is near 1, so the gene is over-expressed in the wild-type condition compared with the control condition.
Conversely, an EC number coloured in green means that the gene is under-expressed in the wild-type condition compared with the control condition.
This demonstration was done using a task from the brick gws_core, version 0.5.13.
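As a rough way to read the colours, the following Python sketch maps a log2 fold change to a colour family and an intensity between 0 and 1. It is a simplified illustration, not pathview's actual colouring code: the function name is hypothetical, and the saturation limit of 1 and the grey colour for unchanged genes are assumptions chosen to match the description above.

```python
def colour_for_log2_fc(log2_fc, limit=1.0):
    """Return (colour family, intensity in [0, 1]) for a log2 fold change."""
    # Values beyond +/- limit are shown at full intensity
    intensity = min(abs(log2_fc) / limit, 1.0)
    if log2_fc > 0:
        return "red", intensity    # over-expressed
    if log2_fc < 0:
        return "green", intensity  # under-expressed
    return "grey", 0.0             # unchanged


for value in (1.0, 0.5, 0.0, -0.5, -3.0):
    print(value, colour_for_log2_fc(value))
```

In this simplified picture, a gene whose expression doubled (log2 fold change of 1) appears in full red, and one whose expression was halved appears in full green.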