How to visualise a KEGG pathway using Constellab?

Mar 12, 2024

Co-authors : 
DA
Djomangan Adama O

Introduction


KEGG (Kyoto Encyclopedia of Genes and Genomes) is a valuable resource that provides detailed information on metabolic pathways, genes, proteins, diseases and many other aspects of molecular biology. KEGG data visualisation allows researchers to visually understand the complex interactions between different biological elements. Using interactive diagrams and maps, KEGG visualization provides a clear and informative perspective for exploring and interpreting biological data, making it a valuable tool for life science research.


Let show how to view genes mapped on KEGG pathways using Constellab.


In the second part of this story we will see some usefull live task to help you to manipulate the visualisation of KEGG pathway.



Task KEGG Visualisation


Prerequesites


You will need a File containing the genes to study with a header. If you have the fold change for the gene expression between two conditions for these genes, then add them in the following columns.


Steps to follow


  1. Upload a File with the genes and the fold change, if applicable.
    1. Link it to the task available in the Task
      1. Fill parameters
        1. Run your experiment
          1. Get your coloured KEGG pathways!





            Be aware that this task can take some time, especially the first time, as a virtual environment has to be installed, and also depending on the length of the genes provided, it can take more time. 


            Results


            In output you will get a set of pathways with the genes mapped. If you have provided the fold change, each box of the pathway will be separated depending on the number of fold changes.




            If you want to have other visualisation, we provide below some useful live task. 


            Visualise easily a KEGG pathway in Constellab using R Live Task


            The following figure shows a live task executed in a Conda environment



            Conda environment


            name: .venv
            channels:
              - conda-forge
            dependencies:
              - python=3.8 
              - requests==2.28.2

            Live code


            # This is a snippet template for a Python live task.
            import requests
            import os
            def get_kegg_pathway_id_by_name(pathway_name):
                try:
                    # Make the request to search for pathway entries by name
                    response = requests.get(f"http://rest.kegg.jp/find/pathway/{pathway_name}")
                    if response.status_code == 200:
                        # Parse the response to extract the pathway ID
                        lines = response.text.strip().split("\n")
                        if lines:
                            pathway_entry = lines[0].split("\t")
                            if len(pathway_entry) == 2:
                                pathway_id = pathway_entry[0].split(":")[1]
                                return pathway_id
                    print(f"Pathway with name '{pathway_name}' not found.")
                    return None
                except requests.exceptions.RequestException as e:
                    print(f"Error fetching pathway data: {e}")
                    return None
                  
            def download_kegg_pathway_image(pathway_id, save_path):
                try:
                    # Make the request to fetch the image data
                    response = requests.get(f"http://rest.kegg.jp/get/{pathway_id}/image")
                    if response.status_code == 200:
                        # Save the image to the specified path
                        with open(save_path, "wb") as f:
                            f.write(response.content)
                        print(f"Pathway image saved to {save_path}")
                    else:
                        print(f"Error fetching pathway image. Status code: {response.status_code}")
                except requests.exceptions.RequestException as e:
                    print(f"Error fetching pathway image: {e}")
            if(pathway_id is None):
              pathway_id = get_kegg_pathway_id_by_name(pathway_name)
            if pathway_id:
                print(f"The KEGG ID for '{pathway_name}' is: {pathway_id}")
            else:
                print("Pathway not found or error occurred.")
            save_path = pathway_id + "_pathway_image.png"
            download_kegg_pathway_image(pathway_id, save_path)
            target_paths = [save_path]

            Parameters


            pathway_id = None  # Replace with the pathway ID you want to explore (map00010 for example), None if not know
            pathway_name ="Glycolysis"#if you don't know pathway ID, fill this with the name

            Enter the id of the pathway you want to see or the name of the pathway and run the experiment: you'll get an image with the pathway entered!


            Results


            If you follow the previous steps you will obtain this image:



            Nice, isn't it? Now you've got a dataset and you want to visualise the genes that are differentially expressed on a KEGG pathway? Let's get started!


            Visualise a KEGG pathway coloured in Constellab using R Live Task


            We're going to create a live task using R's pathview package.


            If you want more information about the package pathview, here is the documentation.



            Input


            Place your dataset as input, with the columns :


            - the identifier of the genes studied


            - the expression value of the gene in the control condition


            - the expression value of the gene in the wild type condition


            Here, we will use data from this dataset. It's rat gene expression data are available for 2 conditions. If you want to reproduce this tutorial, download the "expression values across all genes" file and delete the text header and the "Gene ID" column.


            Conda environment


            We will use the package pathview so we need to add it. We also add 'org.Rn.eg.db' which is a genome-wide annotation for rat, so you will need to update this if you do not have rat data.


            name: .venv
            channels:
              - conda-forge
              - bioconda
            dependencies:
              - r-base
              - bioconductor-pathview
              - bioconductor-org.Rn.eg.db

            Live code


            The code is as follows. It consists of calculating the log2 of the fold change of our dataset and giving the function pathview a dataframe with only these values and with gene id in the index.


            # Load required packages
            library(pathview)
            # Read the source csv file with header, row names and comma separator
            data <- read.csv(source_paths[1], header = TRUE, sep = ",")
            #select data
            data_pathview = data
            # Compute fold_change
            fold_change <- data_pathview[column_wild]/data_pathview[column_control]
            # Compute log2 of the fold change
            log2_fold_change <- log2(fold_change)
            # Add a column to the dataframe with the value of log2(FC)
            data_pathview <- cbind(data_pathview, log2_fold_change)
            colnames(data_pathview)[ncol(data_pathview)] <- "log2_fold_change"
            # Only keep the necessary columns 
            data_pathview = data_pathview[,c(column_gene_id,"log2_fold_change")]
            data_pathview <- na.omit(data_pathview)
            # Set the gene id as index
            row.names(data_pathview) <- data_pathview[,column_gene_id]
            data_pathview[,column_gene_id] <- NULL
            # Use pathview function 
            pv.out <- pathview(gene.data = data_pathview, cpd.data = NULL, gene.idtype = type_gene_id, pathway.id = pathway_id, species = specie)
            # Write the csv file and the image into the result folder
            result_path <- "result.csv"
            write.csv(data_pathview, file = result_path, col.names = TRUE,row.names = TRUE)
            image_path <- paste0(specie,pathway_id,".pathview.png")
            target_paths <- c(result_path, image_path)


            Parameters


            specie = "rno"
            pathway_id = "00010"
            type_gene_id = "symbol"
            column_gene_id = "Gene_Name"
            column_control = "native"
            column_wild = "post_ex_vivo_lung_perfusion"

            Indicate the species studied, then the identifier of the pathway you wish to see. Also indicate the identifier type of your genes (by default it's "entrez" for entrez gene id) and finally the names of the columns of your dataframe.


            In this case, there are rat data so we put "rno", we want to see the glycolysis pathway ("00010"). Finally, genes are encoded by their symbol.


            Results


            With this udpated code, you can obtain a image like this:



            The EC numbers colored in red indicates that the log2 of the fold change between the two conditions is near 1 so the gene is over-expressed in the wild-type condition compared with the control condition.


            Conversely, if it is coloured green, this means that the gene is underexpressed in the wild-type condition compared with the control condition.



            Conclusion


            So this tutorial has allowed you to see how to visualise the KEGG pathway in Constellab, either by using a Task either by using the Live Rask to make your own pipeline. These visualisations will certainly help you to analyse your results after a differential gene expression analysis, for example.