Introduction to ArrayAnalysis.org

In this part of the workshop we will look at the steps needed to obtain a data set from the DiXA warehouse and to use ArrayAnalysis.org to determine its quality, normalise it, and perform statistical analysis. ArrayAnalysis.org is a collection of workflows and modules that can be easily accessed through the web. It can be used to process data sets from start to end, or to run only part of the workflow by uploading or downloading intermediate results.

Here we will use which contains several modules currently available. Use the username and password provided to you.

This morning, you used a table of statistical results based on data from an experiment from the DiXA warehouse to perform pathway and network analysis. In the previous session, you saw how to obtain raw data and experimental annotations from the warehouse, and how to select a subset of a data set.

In this session you will start from the subset of liver samples at 15 days, including controls and high dose FP003SE samples (2×5 arrays in total), and obtain the table of statistical results that is needed for the pathway analysis and other analysis techniques.

The steps we are going to take are quality control (QC) and pre-processing of the raw data, and statistical analysis to determine the differentially expressed genes (DEGs) between the treated and control samples.

Before we do this, there will be a short introductory presentation on ArrayAnalysis.org (10 minutes).

The Quality Control and Pre-processing module for Affymetrix arrays

Since it takes a while to run, we have already run the QC workflow on the data set containing the 10 CEL files. First we will demonstrate on the screen how you would do this yourself (10 minutes). The results of this run contain a PDF report with QC images, a zip file of these images, and a tab-delimited text file of normalised data.

If you would like to run the analysis yourself later, the zip archive of CEL files is provided as well as the description file. The original ISA-TAB (description) file of the study, which was used to compose the description file, can be found here.

The QC report

We will first focus on the QC report, which is provided here. Together with your neighbour, have a look at the report and discuss what you think of the images. Try to find out: (1) what the images mean, and (2) whether they reveal any deviations or not. Take 20 minutes to do this. You can find more information on the images on the site, by clicking on the ‘Modules description’ tab at the top of the window.
After you have studied the report, we will take another 15 minutes to go through it together and discuss your findings.
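
As an illustration of what "deviation" can mean in practice, the sketch below compares the intensity distribution of each array with the others, which is roughly the information a per-array boxplot conveys. It is not taken from the QC module itself: the function names, the array names, the example values, and the threshold of 1 log2 unit are all assumptions made for this example.

```python
import statistics


def summarise_arrays(intensities):
    """Return {array_name: (median, interquartile_range)} per array.

    `intensities` maps a (hypothetical) array name to a list of
    log2 intensity values for that array.
    """
    summary = {}
    for name, values in intensities.items():
        q1, _, q3 = statistics.quantiles(values, n=4)
        summary[name] = (statistics.median(values), q3 - q1)
    return summary


def flag_deviating(summary, max_shift=1.0):
    """List arrays whose median lies more than `max_shift` log2 units
    away from the median of all array medians (an arbitrary cut-off)."""
    centre = statistics.median(median for median, _ in summary.values())
    return sorted(
        name
        for name, (median, _) in summary.items()
        if abs(median - centre) > max_shift
    )


# Made-up values, only to show the mechanics.
example = {
    "control_1": [6.0, 7.0, 8.0, 9.0, 10.0],
    "control_2": [6.1, 7.2, 7.9, 9.1, 10.2],
    "treated_1": [8.5, 9.5, 10.5, 11.5, 12.5],
}
summary = summarise_arrays(example)
deviating = flag_deviating(summary)
print(deviating)
```

An array flagged in this way is not necessarily of poor quality; it is a prompt to look more closely at the corresponding images in the report.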

The normalised data table

Now have a look at the normalised data table, provided here (unzip it first and open it using a spreadsheet programme). What exactly does it contain?
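
If you prefer to inspect the table with a script rather than a spreadsheet, a minimal sketch could look like the one below. It assumes that the first column holds an identifier and that every other column holds numeric values for one array; the real table may contain additional annotation columns, so check the header first. The column names and values in the example are invented.

```python
import csv
import io


def read_expression_table(handle):
    """Read a tab-delimited table into (sample_names, rows).

    Assumes the first column is an identifier and all remaining
    columns are numeric values, one column per array.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    rows = {}
    for record in reader:
        if not record:  # skip empty lines
            continue
        rows[record[0]] = [float(value) for value in record[1:]]
    return samples, rows


# A tiny in-memory stand-in for the unzipped file (made-up content).
example = io.StringIO(
    "probe_id\tcontrol_1\tcontrol_2\ttreated_1\ttreated_2\n"
    "probe_A\t7.1\t7.3\t9.0\t9.2\n"
    "probe_B\t5.0\t5.2\t5.1\t4.9\n"
)
samples, rows = read_expression_table(example)
print(len(rows), "rows x", len(samples), "sample columns")
```

For the real file, open it with `open(path, newline="")` and pass the resulting handle to the function in place of the `io.StringIO` object.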

The Statistics Module

This table can be uploaded to the statistics module. If you have run the QC module yourself, you can click on the button to send the data directly to the statistics module. Now, you will enter the workflow from the statistics module and upload the normalised data table. What else do you need to tell the module in this case?

Now run the statistics module yourself, uploading the (unzipped) table of normalised data and entering the experimental groups or uploading the description file. Note that if you had started the statistics module from the QC module, the annotation would also have been sent directly to the statistics module. Also have a look at the outcome of the statistics module, and try to understand what the histograms, the p value tables, and the tables of DEGs show (20 minutes).
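
To make the idea of a treated versus control comparison concrete, the sketch below computes, for each gene, the difference in mean log2 expression (the log2 fold change) and a Welch t statistic, and then ranks the genes. This is a simplified stand-in and not necessarily the method the statistics module uses; the gene names and values are invented, and the conversion to p values and the correction for multiple testing are deliberately left out.

```python
import math
import statistics


def compare_groups(control, treated):
    """Return (log2 fold change, Welch t statistic) for one gene.

    Both inputs are lists of log2-scale expression values, so the
    difference of the group means is the log2 fold change.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    standard_error = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    # Without any variation in either group the statistic is undefined.
    t_stat = diff / standard_error if standard_error > 0 else float("nan")
    return diff, t_stat


# Made-up data: gene -> (control values, treated values), 2x5 arrays.
example = {
    "gene_A": ([7.0, 7.2, 6.8, 7.1, 6.9], [9.0, 9.2, 8.8, 9.1, 8.9]),
    "gene_B": ([5.0, 5.2, 4.8, 5.1, 4.9], [5.1, 4.9, 5.0, 5.2, 4.8]),
}
results = {gene: compare_groups(c, t) for gene, (c, t) in example.items()}

# Rank genes by the absolute t statistic, strongest evidence first.
ranked = sorted(results.items(), key=lambda item: abs(item[1][1]), reverse=True)
for gene, (log_fc, t_stat) in ranked:
    print(f"{gene}\tlogFC={log_fc:.2f}\tt={t_stat:.2f}")
```

Turning the t statistic into a p value needs the t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind` with `equal_var=False` is one option if SciPy is available.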

The final step of this part of the workshop is to discuss the statistics outcome and how it could be used for further analyses (15 minutes).
Further analyses that can be done using the table of DEGs include:
* Pathway, Gene Ontology, and Network analysis
* Clustering and Principal Component Analysis
* Classification analysis
* …

Very soon the Pathway Module of ArrayAnalysis will be launched (it is already available as a mock-up), allowing pathway analysis in PathVisio (using PathVisio RPC) to be applied directly to the statistics outcome.

Reference

User-friendly solutions for microarray quality control and pre-processing on ArrayAnalysis.org.
Eijssen LM, Jaillard M, Adriaens ME, Gaj S, de Groot PJ, Müller M, Evelo CT.
Nucleic Acids Res. 2013 Jul 1;41(W1):W71-W76. Epub 2013 Apr 24.
PMID: 23620278