

Step 0: Dataset

We chose dataset 13 from the diXa data warehouse. To access the dataset page, you need the username and password handed out by the organizers.

Description
The study is a repeated-dose 14-day toxicity study in adult male rats (Rattus norvegicus) using the chemical compound FP003SE (one of the compounds sponsored by pharmaceutical companies), administered orally once a day. The study examines gene, protein and metabolite profiles, along with traditional toxicological endpoint information, in the liver, kidney, blood and urine of rats 1-14 days after exposure.

Pre-processing
In this practical we will use the transcriptomic data provided for liver. Quality control, normalization and statistical analysis were performed with arrayanalysis.org and you will go through the steps this afternoon with Lars Eijssen.

Statistical analysis
The pre-processed data file (right click – save as) that we provide contains the fold change, log fold change, p-value and adjusted p-value for the comparison between the high dose group and the vehicle (control) group after 14 days of treatment.


Step 1: Pathways and identifier mapping databases

In addition to the experimental data file, you need two other types of files to use PathVisio:

  • Pathways: A set of files containing all pathways (*.gpml files)
  • Rat identifier mapping database: A species-specific identifier mapping database so PathVisio can take care of the identifier mapping step.

If you are working with one of the provided laptops, you can find those files in your home directory in PathVisio-Data. In case you want to download the data yourself:

  • Pathways: Download the “GPML → Analysis collection pathways → Rattus norvegicus” from Wikipathways
  • Identifier mapping database: Download the rat database Rn_Derby_20120602.zip from BridgeDb
  • Unzip both files
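Before starting PathVisio, you can check that both downloads were unzipped correctly. The short Python sketch below is not part of PathVisio; it simply searches a folder for *.gpml pathway files and *.bridge mapping databases. It assumes the files ended up below a PathVisio-Data folder in your home directory, so adjust the path if you unzipped them elsewhere.

```python
from pathlib import Path


def check_pathvisio_data(data_dir: Path) -> tuple[list[Path], list[Path]]:
    """Return the pathway (*.gpml) and mapping database (*.bridge) files found
    anywhere below data_dir, so you can confirm both downloads were unzipped."""
    gpml_files = sorted(data_dir.rglob("*.gpml"))
    bridge_files = sorted(data_dir.rglob("*.bridge"))
    return gpml_files, bridge_files


if __name__ == "__main__":
    # Assumed location: PathVisio-Data in your home directory (adjust if needed).
    data_dir = Path.home() / "PathVisio-Data"
    gpml, bridge = check_pathvisio_data(data_dir)
    print(f"{len(gpml)} pathway files, {len(bridge)} mapping database(s)")
    for db in bridge:
        print("mapping database:", db.name)
```

If the script reports zero pathway files or no .bridge file, the corresponding zip file has probably not been extracted yet.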

Note: If you want to use PathVisio on your own dataset after this course, you can find pathways and identifier mapping databases for other species on the PathVisio download site as well.


Step 2: Start PathVisio

On the provided laptops:

  • On your Desktop there is a link to the PathVisio 3.0.1 directory. Run the pathvisio.sh file to start PathVisio.

On your own laptop:

  • Go to the PathVisio download page, http://www.pathvisio.org/downloads/ (see Fig 1a).
  • Download the latest version of PathVisio (3.1.0).
  • If you are asked how to open the file, select “Java Web Start”.
  • If you see a dialog like the one in Fig 1b, select the checkbox and click “Run”.
  • PathVisio will now start all modules (Fig 1c) and open the PathVisio main window (Fig 1d).

 

Fig 1a: Go to the PathVisio download page.
Fig 1b: Start PathVisio as a Java Web Start program.
Fig 1c: PathVisio will start all modules.
Fig 1d: PathVisio opens with an empty pathway view.

Step 3: ArrayAnalysis.org results in PathVisio format

  • Download the PathVisio file here. This file contains the following information:
  1. ENSG_ID = Ensembl gene identifier
  2. logFC = the log2 fold change when comparing the control group (no dose) with the high dose group after 15 days
  3. Fold Change = the fold change (not log-transformed) when comparing the control group (no dose) with the high dose group after 15 days
  4. AveExpr = average expression value
  5. P.Value = p-value when comparing the control group (no dose) with the high dose group after 15 days
  6. adj.P.Val = p-value adjusted for multiple testing (false discovery rate, FDR) when comparing the control group (no dose) with the high dose group after 15 days

The file was obtained as a result of the statistical analysis module of ArrayAnalysis.org. You will go through the analysis pipeline with this dataset later today.
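If you want to inspect the file outside PathVisio, the Python sketch below reads it with the standard csv module. It assumes the file is tab-delimited (as in step 4) and that the header row uses exactly the column names listed above; the example row is invented. It also shows how logFC relates to a plain ratio: because logFC is a log2 value, the ratio is 2 ** logFC. The Fold Change column in the file may follow a different (for example signed) convention for down-regulated genes, so compare it with logFC rather than assuming either way.

```python
import csv
import io

# Assumed header names, taken from the column list in step 3.
EXPECTED_COLUMNS = ["ENSG_ID", "logFC", "Fold Change", "AveExpr", "P.Value", "adj.P.Val"]


def read_results(handle):
    """Parse the tab-delimited results file into a list of dicts.

    The identifier column stays a string; all other expected columns are
    converted to floats. Raises ValueError if an expected column is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        parsed = {"ENSG_ID": row["ENSG_ID"]}
        for col in EXPECTED_COLUMNS[1:]:
            parsed[col] = float(row[col])
        rows.append(parsed)
    return rows


def ratio_from_log_fc(log_fc):
    """logFC is a log2 ratio: 1 means twice as high, -1 means half as high."""
    return 2 ** log_fc


if __name__ == "__main__":
    # Invented example row, not real data from the study.
    sample = (
        "ENSG_ID\tlogFC\tFold Change\tAveExpr\tP.Value\tadj.P.Val\n"
        "GENE_A\t1.0\t2.0\t8.5\t0.001\t0.02\n"
    )
    for row in read_results(io.StringIO(sample)):
        print(row["ENSG_ID"], row["logFC"], ratio_from_log_fc(row["logFC"]))
```

To read the real file instead, open it with `open("comp_high-vecile.txt", newline="")` and pass the file handle to `read_results`.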


Step 4: Import the data into PathVisio

  • In the menu bar of PathVisio, click Data → Import expression data
  • Use the Browse buttons to locate the following files (see Fig 2a):
    • Input file: The experiment data file (comp_high-vecile.txt; see step 3 for a description).
    • Output file: Will be filled in automatically after selecting the input file, you don’t need to change this.
    • Gene database: Use the identifier mapping database you downloaded in step 1 (Rn_Derby_20120602.bridge).
  • Click “Next”.
  • Make sure that “Tab” is selected as the delimiter, because the columns in our data are separated by tabs. Check in the preview that the data look as you would expect.
  • Click “Next”.
  • Select the columns that contain the gene identifiers and the identifier type. Our dataset has no system code column, so select “Use the same system code for all rows”. Select Ensembl and NOT Ensembl Rat, see Fig. 2b.
  • Click “Next”.
  • The data will now be imported into an expression dataset that is saved as a .pgex file on your hard disk. Any exceptions will be reported in a file ending in .pgex.ex. No exceptions should occur for our dataset (see Fig. 2c).
  • Click “Finish”.
  • A message about old Ensembl identifiers might pop up. You can ignore this warning.
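Conceptually, choosing “Use the same system code for all rows” pairs every identifier in the selected column with one fixed identifier type. The Python sketch below mimics only that pairing step; the real import also looks the identifiers up in the .bridge database, which this sketch does not do. The system code "En" for Ensembl is an assumption based on BridgeDb's naming, so check the code shown in the import dialog.

```python
import csv
import io

# Assumption: "En" is BridgeDb's system code for (non-species-specific) Ensembl.
ENSEMBL_CODE = "En"


def attach_system_code(handle, id_column="ENSG_ID", system_code=ENSEMBL_CODE):
    """Pair every identifier in a tab-delimited file with one fixed system code.

    Returns (xrefs, problems): xrefs is a list of (identifier, system_code)
    tuples; problems lists rows without an identifier, loosely analogous to
    what PathVisio reports in the .pgex.ex file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    xrefs, problems = [], []
    for row in reader:
        # reader.line_num is the line number in the file (the header is line 1).
        identifier = (row.get(id_column) or "").strip()
        if not identifier:
            problems.append(f"line {reader.line_num}: empty identifier")
            continue
        xrefs.append((identifier, system_code))
    return xrefs, problems


if __name__ == "__main__":
    # Invented two-row example; the second row lacks an identifier.
    sample = "ENSG_ID\tlogFC\nGENE_A\t1.0\n\t0.5\n"
    xrefs, problems = attach_system_code(io.StringIO(sample))
    print(xrefs)
    print(problems)
```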

In the footbar of PathVisio (the status bar at the bottom of the window) you can see which identifier mapping databases and which dataset are loaded (see Fig 2d). To test whether the dataset has been created correctly, open a pathway: click File → Open, go to the directory where you extracted the pathway files (on the provided laptops, the pathways are in PathVisio-Data/pathways/ in your home directory), select a gpml file (e.g. Rn_Apoptosis_WP1290_67050.gpml) and click “Open”. The pathway diagram will now appear, possibly with a default visualization as in Fig 2e. If you click on a gene, the imported data will appear at the bottom of the Backpage panel.

 

Fig 2a: Data → Import expression data → choose the input file and gene database.
Fig 2b: Specify the identifier column and system code.
Fig 2c: Make sure that no exceptions occur when importing the data.
Fig 2d: In the footbar of PathVisio you can see which mapping databases and which dataset are loaded.
Fig 2e: Open a pathway. A default visualization might be shown on the pathway elements.

Step 5a: Create a visualization by coloring gene expression

In PathVisio you can make different visualizations to color the genes in a pathway based on their measured data in the dataset.

Our dataset compares the high dose group with the vehicle group in the liver experiment. We can color the pathway elements by log fold change to see whether a gene is over- or under-expressed in the high dose group. To do this, we will use a gradient from blue through white to red (you can of course choose any of the other available colors as well).

  1. Go to Data → Visualization Options
  2. Create a new visualization by clicking the button in the top-right corner and select “New”, see Fig 3a.
  3. Specify a name for the visualization (e.g. “Liver transcriptomic data”)
  4. Check the box in front of “Expression as color” and the box in front of “Text label”.
  5. In the options panel below “Expression as color”, select the data column that contains the logFC (the log2 fold change between the two groups), see Fig 3b.
  6. Create a new color set by clicking the button to the right of the color set dropdown box and select “New” to create a new color set.
  7. The color set dialog will now open. Click the checkbox in front of “Gradient” to activate the color gradient.
  8. Select the gradient that runs from blue to white to red (click the arrow to open the drop-down list) and specify the limits in the boxes below the gradient, e.g. -2, 0, 2, see Fig 3c.
  9. Click “Ok” to apply the gradient and press “OK” again to close the color set dialog
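A gradient with limits -2, 0 and 2 maps each logFC value to a color by interpolating between the three chosen colors. The Python sketch below shows the idea, assuming pure blue (0, 0, 255), white and pure red (255, 0, 0) as the three colors, linear interpolation per channel, and clamping of values outside the limits; PathVisio's exact color handling may differ.

```python
def gradient_color(value, low=-2.0, mid=0.0, high=2.0):
    """Map a value to an (R, G, B) tuple on a blue-white-red gradient.

    Values at or below `low` are pure blue, `mid` is white and values at or
    above `high` are pure red; in between, each channel is interpolated linearly.
    """
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    if value <= mid:
        # Fraction of the way from low (0.0) to mid (1.0), clamped at low.
        t = (max(value, low) - low) / (mid - low)
        start, end = blue, white
    else:
        # Fraction of the way from mid (0.0) to high (1.0), clamped at high.
        t = (min(value, high) - mid) / (high - mid)
        start, end = white, red
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


if __name__ == "__main__":
    for log_fc in (-3.0, -1.0, 0.0, 1.0, 2.0):
        print(log_fc, gradient_color(log_fc))
```

For example, a gene with logFC = 1 (twice as high in the high dose group) gets a light red halfway between white and red, while every gene with logFC of 2 or more gets the same full red.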

Open the Apoptosis pathway (Rn_Apoptosis_WP1290_67050.gpml) out of the rat pathways you downloaded in step 1 (if you haven’t already).
The pathway will now look similar to Fig. 3d. The difference in expression between the two groups is shown by the color of the gene boxes: blue means down-regulated and red means up-regulated in the high dose group.

To save the pathway with the data visualization, click File → Export. Here you can save the pathway in different formats, such as *.png, for use in presentations.

Fig 3a: Create a new visualization.
Fig 3b: Specify the basic expression-as-color visualization.
Fig 3c: Specify a gradient from -2 to 2.
Fig 3d: Resulting colored pathway image. Expression data are shown in the backpage below the cross references.

Step 5b: Advanced visualization of logFC and pValue

  1. Go to Data → Visualization Options
  2. Create a new visualization by clicking the button in the top-right corner and select “New”, see Fig 3a.
  3. Specify a name for the visualization (e.g. “Liver transcriptomic data advanced”)
  4. Check the box in front of “Expression as color” and the box in front of “Text label”.
  5. In the “Expression as color” panel, select “Advanced”.
  6. For the logFC you can use the same gradient as in step 5a, see Fig 4a.
  7. For the p-value we will define a color rule ([P.Value] < 0.05), see Fig 4b: create a new color set, click “Add Rule”, specify the rule logic and color, then press “OK”.
  8. The pathway elements are now split into two columns. The first column shows the logFC gradient, while the second column shows whether a measurement was significant (p-value < 0.05). The legend tab on the right side shows which column of the pathway element represents what, see Fig 4c.
Fig 4a: Advanced visualization options.
Fig 4b: Define a color rule.
Fig 4c: Resulting pathway showing logFC and p-value.
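The color rule in the second column is a simple yes/no test per gene. The Python sketch below expresses it; the three color names are hypothetical placeholders for whatever you pick in the rule dialog, and the separate color for genes without data is an assumption about how such genes are shown.

```python
from typing import Optional

# Hypothetical colors: replace them with the ones you chose in the dialog.
RULE_COLOR = "yellow"     # the rule [P.Value] < 0.05 holds
DEFAULT_COLOR = "white"   # the rule does not hold
NO_DATA_COLOR = "gray"    # assumption: the gene has no data in the dataset


def p_value_column(p_value: Optional[float], threshold: float = 0.05) -> str:
    """Color of the second column of a pathway element, following the rule
    [P.Value] < threshold."""
    if p_value is None:
        return NO_DATA_COLOR
    return RULE_COLOR if p_value < threshold else DEFAULT_COLOR


if __name__ == "__main__":
    # Invented p-values for three hypothetical genes.
    for gene, p in {"GENE_A": 0.001, "GENE_B": 0.3, "GENE_C": None}.items():
        print(gene, p_value_column(p))
```

Because the rule uses a strict “<”, a gene with a p-value of exactly 0.05 is not flagged.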

Step 6: Search for regulated pathways

In the final step of this tutorial we are going to find out which pathways are enriched with regulated genes. We can then study these pathways and for example see whether they are influenced by the compound fed to the rats. These pathways might provide leads for further investigation of the biological implications.

To identify regulated pathways, we are going to use PathVisio to calculate a z-score for each pathway.
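For reference, the z-score used for this kind of over-representation analysis (as in GenMAPP's MAPPFinder, on which PathVisio's statistics are, as far as we know, modelled) compares the number of changed genes on a pathway with the number expected by chance. With N the number of measured genes in the dataset, R the number of those that meet the criterion, n the number of measured genes on the pathway and r the number of those that meet the criterion:

```latex
z = \frac{r - n\frac{R}{N}}
         {\sqrt{n\left(\frac{R}{N}\right)\left(1-\frac{R}{N}\right)\left(1-\frac{n-1}{N-1}\right)}}
```

A positive z-score means the pathway contains more changed genes than expected by chance; values above roughly 1.96 are often taken as a sign of enrichment.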

  1. Go to Data → Statistics.
  2. The “Pathway Statistics” dialog will open (Fig 5a)
  3. In the text field below “Expression:”, type “([logFC] < -1.2 OR [logFC] > 1.2) AND [P.Value] < 0.05” (without the quotes). This expression selects the genes that are significantly changed (up- or down-regulated) in the high dose treated animals.
  4. In the text field below “Pathway Directory:”, fill in the directory where the pathway (gpml) files are located (see step 1). You can also use the “Browse” button to locate and select the directory.
  5. Click the “Calculate” button. You should see a progress dialog titled “Calculate Z-scores”.
  6. After a few minutes, the analysis should be finished and you will see a list of pathways appear in the dialog, see Fig 5a.
  7. If you click on a pathway in the list, it will be opened. You can then apply the visualization created in the previous section to study the gene expression profiles and find out if any of the genes were changed in the dataset, see Fig 5b.
  8. Save the list of pathways by clicking the “Save results” button. You can then open the results in Excel.
Fig 5a: Statistics dialog.
Fig 5b: Click a row in the result list to open the pathway.
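The Python sketch below illustrates the ranking logic on toy data: it applies the expression typed in the statistics dialog to every measured gene and computes a z-score per pathway from the counts r, n, R and N. Gene sets are given as plain Python sets here; PathVisio instead reads them from the GPML files and maps identifiers through the .bridge database, and its counting may differ in details, so treat this as an illustration rather than a reimplementation.

```python
import math


def meets_criterion(row):
    """([logFC] < -1.2 OR [logFC] > 1.2) AND [P.Value] < 0.05.

    A log2 threshold of 1.2 corresponds to a ratio of about 2.3 (2 ** 1.2).
    """
    return (row["logFC"] < -1.2 or row["logFC"] > 1.2) and row["P.Value"] < 0.05


def z_score(r, n, big_r, big_n):
    """z-score for a pathway with n measured genes, r of which meet the
    criterion, in a dataset of big_n measured genes, big_r of which meet it.
    Returns NaN when the score is undefined (e.g. no gene meets the criterion)."""
    if big_n < 2:
        return math.nan
    p = big_r / big_n
    variance = n * p * (1 - p) * (1 - (n - 1) / (big_n - 1))
    if variance <= 0:
        return math.nan
    return (r - n * p) / math.sqrt(variance)


def rank_pathways(rows, pathways):
    """rows: {gene_id: {"logFC": float, "P.Value": float}} for all measured genes.
    pathways: {pathway_name: set of gene ids on the pathway}.
    Returns (name, r, n, z) tuples, highest z first; pathways without measured
    genes or with an undefined z-score are left out."""
    changed = {gene for gene, row in rows.items() if meets_criterion(row)}
    measured_all = set(rows)
    big_n, big_r = len(measured_all), len(changed)
    results = []
    for name, genes in pathways.items():
        measured = genes & measured_all
        n, r = len(measured), len(measured & changed)
        if n == 0:
            continue
        z = z_score(r, n, big_r, big_n)
        if not math.isnan(z):
            results.append((name, r, n, z))
    return sorted(results, key=lambda item: item[3], reverse=True)


if __name__ == "__main__":
    # Toy data only: ten measured genes, two of which meet the criterion.
    rows = {f"G{i}": {"logFC": 0.0, "P.Value": 0.5} for i in range(10)}
    rows["G0"] = {"logFC": 2.0, "P.Value": 0.01}
    rows["G1"] = {"logFC": -1.5, "P.Value": 0.001}
    pathways = {"Pathway A": {"G0", "G1", "G2"}, "Pathway B": {"G3", "G4", "G5"}}
    for name, r, n, z in rank_pathways(rows, pathways):
        print(f"{name}: {r}/{n} changed, z = {z:.2f}")
```

Note that only genes present in the dataset count towards n; pathway genes without measurements are ignored, which is why a pathway with many unmeasured genes can still rank highly.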

Design your own pathway

If you finished the first part and still have time left, please continue with this tutorial.

WikiPathways was established to facilitate the contribution and maintenance of pathway information by the biology community. WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways thus presents a new model for pathway databases that enhances and complements ongoing efforts, such as KEGG, Reactome and Pathway Commons. Building on the same MediaWiki software that powers Wikipedia, we added a custom graphical pathway editing tool and integrated databases covering major gene, protein, and small-molecule systems. The familiar web-based format of WikiPathways greatly reduces the barrier to participate in pathway curation. More importantly, the open, public approach of WikiPathways allows for broader participation by the entire community, ranging from students to senior experts in each field. This approach also shifts the bulk of peer review, editorial curation, and maintenance to the community.

We are using the circadian clock pathway as an example in the tutorial. Please follow the 11 steps on the tutorials page.