Pathway Analysis – Hands on


The pathway analysis in PathVisio will show you how to interpret transcriptomics data in a biologically meaningful context. You will use the statistically analysed data from the paper “Gene Expression Patterns Induced by HPV-16 L1 Virus-Like Particles in Leukocytes from Vaccine Recipients” by García-Piñeres and coworkers, published in The Journal of Immunology in 2008 (see paper http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2701477/).

The questions in the following assignments are meant to guide you through the practical session. You will receive all the answers at the end of the course. Feel free to ask the instructors if you need help with any steps in the practical.

Make sure you have gone through the installation instructions (sent by email last week) before you start.


Preparation

1. Unzip the human identifier mapping (ID) gene database.
Go to the pathway-analysis folder and unzip “Hs_Derby_20130701”. The “Hs_Derby_20130701.bridge” file is needed to identify the genes in the pathway and the dataset.

2. Unzip the human pathway collection of WikiPathways.
Go to the pathway-analysis folder and create a new folder: Human_pathways. Then unzip “pathways-2015-09-03” and extract the pathways into the newly created Human_pathways folder.


Assignment 1: Open pathway in PathVisio

  1. Open PathVisio (see installation instructions).
  2. Open the human Statin pathway in PathVisio.
    • Go to File → Open and browse to the pathway-analysis → Human_pathways folder
    • Select Hs_Statin_Pathway_WP430_78268.gpml

Question 1A. Small numbers above data nodes, interactions or the info box in the top left of the pathway indicate publication references. Double click the info box in the top left (Title, Availability, Last modified, Organism) and go to the “Literature” tab.
What are the title and authors of the paper reference for this pathway?

Question 1B. Click on the DGAT1 gene in the top right.
With which identifier and database is this gene annotated? (Check the “Backpage” tab on the right side).

Load the identifier mapping database
1. Go to Data → Select Gene Database → Browse to Hs_Derby_20130701.bridge in the pathway-analysis folder.
2. Check the status bar at the bottom to see if the gene database has been loaded.

Question 1C. Select the DGAT1 gene again and go to the “Backpage” tab.
Can you now also find the Ensembl identifier(s) for this gene? (Required for following steps!)


Assignment 2: Data import in PathVisio

Question 2A. Have a look at the statistically analysed data (dataset.txt in the pathway-analysis folder). The first column contains the identifiers of the genes. From which of the three databases below are the identifiers in the dataset? (Required for following steps!)
⎕ Ensembl
⎕ Entrez Gene
⎕ OMIM

Import the data as described below. Go through the different dialogs, and before you click Finish, answer the questions at the end!

Description data import

  • Data → Import expression data
  • Select the vaccination dataset (dataset.txt in pathway-analysis folder) as the input file. Everything else should be filled in automatically (see Figure 2a).
  • In the next dialog, make sure the correct separators are used. You should see the different columns in the preview (see Figure 2b).
Figure 2a

Figure 2b

  • Important: in this step you need to select the column that contains the gene identifier and the database (system code) for the identifier. Select the database you chose in Question 2A (note: if the database is wrong, your identifiers will not be recognized correctly), see Figure 2c.
  • Now the data gets imported (see Figure 2d). Before clicking Finish, answer the questions below:
Figure 2c

Figure 2d

Question 2B. How many rows were successfully imported?

Question 2C. How many identifiers were not recognized? What does that mean?
Important: if the number of rows is the same as the number of identifiers not recognized, the data import was not done correctly. You probably didn’t select the correct database! Delete the created .pgex file and redo the import, or ask one of the instructors for help. (Required for following steps!)

After you click Finish, you should see a default visualization on the pathway (if all genes are gray, the data import was not successful). If there is no pathway open, you can check the status bar at the bottom, where the dataset will be listed.
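If the preview in the import dialog does not show separate columns, it can help to check outside PathVisio which separator the file uses and what the columns are called. The following is a minimal Python sketch, not part of PathVisio. It assumes a plain-text table with a header row; the function names and the sample rows (including the identifiers GENE_A and GENE_B) are made up for illustration and are not taken from dataset.txt.

```python
CANDIDATE_SEPARATORS = ["\t", ",", ";"]


def guess_separator(header_line):
    """Return the candidate separator that occurs most often in the header line."""
    best = max(CANDIDATE_SEPARATORS, key=header_line.count)
    if header_line.count(best) == 0:
        raise ValueError("no known separator found in header line")
    return best


def preview(text, max_rows=3):
    """Split the header and the first data rows of a delimited table into columns."""
    lines = [line for line in text.splitlines() if line.strip()]
    sep = guess_separator(lines[0])
    return [line.split(sep) for line in lines[: max_rows + 1]]


# Hypothetical sample, NOT the real dataset.txt.
sample = "id\tlogFC\tP.Value\nGENE_A\t1.2\t0.01\nGENE_B\t-0.3\t0.40\n"
for row in preview(sample):
    print(row)
```

To inspect the real file, the text could be read with `open("dataset.txt").read()` and passed to `preview`. The first column shown is the one to select as the identifier column during import.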


Assignment 3: Creating a basic visualization

Follow these instructions to create a basic visualization:
1. Go to Data → Visualization Options
2. Create a new visualization named “basic visualization”

Figure 3a

3. Select “Expression as color” and “Text label”.
4. In “Expression as color” select “Basic”.
5. Check the checkbox before “logFC” and define a new color set.

Figure 3b

6. Select “Gradient” and define a gradient from -1 through 0 to 1 (blue – white – red) → Click OK.

Figure 3c
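To get a feeling for what the gradient does, the mapping from a logFC value to a colour can be sketched as a linear interpolation between the three anchor colours. This is only an illustration in Python, not PathVisio's own code. The function name, the exact RGB values and the clamping of values outside the range from -1 to 1 are assumptions made for this sketch.

```python
BLUE, WHITE, RED = (0, 0, 255), (255, 255, 255), (255, 0, 0)


def gradient_color(log_fc, low=-1.0, mid=0.0, high=1.0):
    """Map a logFC value to an RGB tuple on a blue-white-red gradient.

    Values outside [low, high] are clamped to the end colours
    (an assumption of this sketch).
    """
    x = max(low, min(high, log_fc))
    if x <= mid:
        t = (x - low) / (mid - low)      # 0 at low, 1 at mid
        start, end = BLUE, WHITE
    else:
        t = (x - mid) / (high - mid)     # 0 at mid, 1 at high
        start, end = WHITE, RED
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


for value in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0):
    print(value, gradient_color(value))
```

Genes with a logFC close to 0 therefore appear almost white, while the colour saturates towards blue or red as the logFC approaches -1 or 1.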

Question 3A. Make a screenshot of the pathway.
What do the colors in the pathway mean biologically? (Hint: Check the “Legend” tab on the right side).

Question 3B. Select the HMGCR gene (top left), go to the “Data” tab.
What is the logFC of the HMGCR gene?


Assignment 4: Creating an advanced visualization

PathVisio also allows users to visualize multiple data columns together. For that we need to create a new advanced visualization.
1. Go to Data → Visualization Options
2. Create a new visualization named “advanced visualization”

Figure 3a

3. Select “Expression as color” and “Text label”.
4. In “Expression as color” select “Advanced”.
5. Check the checkbox before “logFC” and define a new color set, see Figure 4a.
6. Select “Gradient” and define a gradient from -1 through 0 to 1 (blue – white – red) → Click OK.
7. Check the checkbox before “P.Value” and define a new color set.
8. At the bottom, click on “Add Rule”. Go to the text field next to “Rule Logic” and specify the following criteria: [P.Value] < 0.05. Then click on color and select a light green. Click OK and OK, see Figure 4b.

Figure 4a

Figure 4b

Question 4A. Make a screenshot of the pathway.
What do the colors in the different columns on the data nodes in the pathway mean biologically? (Hint: Check the “Legend” tab on the right side).

Question 4B. How many significant genes (P.Value < 0.05) are in the pathway?


Assignment 5: Perform pathway statistics

To identify pathways that might be affected after vaccination, you can perform pathway statistics to calculate Z-Scores for each pathway (check lecture!). PathVisio automatically ranks the pathways based on the Z-Score.
● Go to Data → Statistics
● First we need to define a criterion for differentially expressed genes. We are going to select those genes based on a significant p-value, but we are also going to make sure the change is large enough by specifying a logFC threshold:
([logFC] < -1 OR [logFC] > 1) AND [P.Value] < 0.05

Question 5A. Explain in your own words what this expression criterion means (which genes will be selected)?
([logFC] < -1 OR [logFC] > 1) AND [P.Value] < 0.05

● Now we need to specify the pathway directory. In the pathway-analysis folder you created the directory: Human_pathways
Browse to this directory and select it.
● Then click on Calculate and wait for the result table.
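The criterion entered in the Statistics dialog behaves like a boolean filter that is evaluated for every row of the dataset. As a rough illustration, the same logic can be written in Python as shown below. The function name, the parameter names and the example values are hypothetical and only serve to show how the brackets, OR and AND combine.

```python
def meets_criterion(log_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Mirror of: ([logFC] < -1 OR [logFC] > 1) AND [P.Value] < 0.05"""
    return (log_fc < -fc_cutoff or log_fc > fc_cutoff) and p_value < p_cutoff


# Hypothetical (logFC, P.Value) pairs, not taken from dataset.txt.
examples = [(1.8, 0.001), (-1.4, 0.03), (0.4, 0.001), (2.1, 0.20)]
for log_fc, p_value in examples:
    print(log_fc, p_value, meets_criterion(log_fc, p_value))
```

Note that the brackets matter: without them, the AND would only apply to the second logFC condition.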

Question 5B. What are the top 5 altered pathways and what are their Z-Scores?

Question 5C. How many genes of the dataset are in at least one pathway (N) and how many differentially expressed genes of the dataset are present in at least one pathway (R)? (Check “N and R” above the result table).

Question 5D. What is the pathway with the lowest Z-Score? What does a low Z-Score mean biologically? (ignore pathways with NaN)
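For reference, a Z-Score of this kind can be reproduced by hand. The sketch below assumes the MAPPFinder-style formula covered in the lecture, in which the number of genes meeting the criterion in a pathway is compared with the number expected by chance. N and R are the totals shown above the result table; the lowercase n (measured genes in the pathway) and r (genes in the pathway that meet the criterion) are names introduced here. Returning NaN when the variance is zero is an assumption of this sketch that mimics pathways without measured genes.

```python
import math


def pathway_z_score(n, r, N, R):
    """Z-Score of one pathway under a MAPPFinder-style formula (assumed here).

    n: measured genes in the pathway
    r: genes in the pathway that meet the criterion
    N: measured genes in at least one pathway
    R: genes meeting the criterion in at least one pathway
    """
    if N <= 1:
        return math.nan
    p = R / N                      # overall fraction of genes meeting the criterion
    expected = n * p               # expected number of such genes in this pathway
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return math.nan            # e.g. no measured genes in the pathway
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts, not results from the practical.
print(pathway_z_score(n=50, r=10, N=5000, R=500))
print(pathway_z_score(n=50, r=0, N=5000, R=500))
print(pathway_z_score(n=0, r=0, N=5000, R=500))
```

A positive value means that more genes in the pathway meet the criterion than expected by chance, and a negative value means fewer.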


Assignment 6: Export pathway figures

The pathway diagrams, with or without data, can be exported as an image.
Open the highest-ranked pathway after performing the pathway statistical analysis. Go to File → Export and export the image as .png. Now you can use the pathway in presentations or papers.

Figure 6a