In this network analysis tutorial, you are going to:
- Learn how to use Cytoscape
- Use a gene list and find relevant interactions
- Analyze networks based on network properties such as degree or betweenness
- Visualize a data set on a network
- Open a WikiPathways pathway in Cytoscape and extend the pathway with regulatory information
The tutorial consists of three exercises:
- Exercise 1: Cytoscape basics
- Exercise 2: Interaction network from GeneMANIA
- Exercise 3: Extend biological process with regulatory interactions
If you haven’t installed Cytoscape 3.1.0 yet, you can find the installers for all operating systems on the USB stick (ICSB2013Workshop/Cytoscape-3.1.0). The directory ICSB2013Workshop/CytoscapeApps contains all apps used in this practical. You can also install them from the App Manager, as explained in Exercise 1.
Exercise 1: Cytoscape basics
Cytoscape is an open source software tool for integrating, visualizing, and analyzing data in the context of networks. This exercise covers:
- Network loading
- Importing data
- Visualizing data using Visual Styles
- App manager
Loading a Simple Network
- Go to File → Import → Network → File…
- You should see a file chooser dialog.
- In the Cytoscape 3 directory, open the sampleData folder and select galFiltered.sif and then click on Open.
You should then see a dialog in which you can add the new network to an existing network collection or create a new one. You can also select which column is used for mapping between networks.
Manipulating Your Network
Now that you have a network loaded, you can interact with it in a number of ways:
- Start by clicking on the node at the upper left corner of the network. The node will turn yellow. If you hold the mouse button down over the node and drag, the node moves on the screen.
- Now add another node to the selection by holding down the Shift key and clicking on a node. Note that both nodes are now selected (yellow). Again, move the nodes around. Note that both nodes will move.
- To select a group of nodes, hold the mouse down in the upper left-hand corner and drag your mouse over a region of the network. Again, a group of nodes will be selected and can be moved around on the screen.
- To zoom in on the selected nodes, click the zoom-to-selection icon on the toolbar.
- To move the window around the network, you can either use the middle mouse button, or drag the small window outlined in blue around in the Network Overview Pane.
- Finally, zoom out by clicking the zoom-out icon on the toolbar.
Selecting nodes by hand in dense networks can be difficult and error-prone. Instead, you can search for a node by name or attribute:
- In the Search input field on the toolbar, type in ynr050c. This will select that node and zoom the display to focus on it.
The search interface will also allow you to select nodes by other attributes, but first, we need to import more attributes…
Visualizing Data on Networks
Cytoscape provides a number of features to load arbitrary data and visualize that data by mapping the attribute values to visual styles.
Importing and Exploring Your Data
Cytoscape can read data from delimited text files and Excel files.
- Go to File → Import → Table → File…
- You will see a file chooser dialog. Select the galExpData.csv from the sampleData folder and then click on Open.
You should see the following dialog:
- In the Select a Network Collection section, choose the network to which the attributes should be imported and the key column used for mapping.
- Next, under Import Data As, select Node Table Columns.
Notice that the data preview shows data from all columns as one field. To correct this, we need to change which delimiter is used:
- In the Advanced section, select Show Text File Import Options
- In the Delimiter field, deselect Tab and instead select Comma.
- In the Attribute Names section, select Transfer first line attribute names
- Click OK to import.
You’ve successfully loaded data!
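Conceptually, the table import treats the first line as column names, splits each row on the chosen delimiter and attaches the values to the node whose name matches the key column. The Python sketch below illustrates that idea with the standard csv module. It is not how Cytoscape implements the import, and the column names (name, expA, expB) and node names are hypothetical placeholders, not the contents of galExpData.csv.

```python
import csv
import io


def _to_number(value):
    """Convert numeric strings to float; leave other values unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def import_node_table(text, key_column, network_nodes, delimiter=","):
    """Map rows of a delimited table onto network nodes.

    The first line is used as column names (like "Transfer first line
    attribute names"); rows whose key matches no node are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    node_table = {}
    for row in reader:
        key = row.pop(key_column)
        if key in network_nodes:
            node_table[key] = {col: _to_number(val) for col, val in row.items()}
    return node_table


# Toy input with hypothetical node and column names
sample = "name,expA,expB\nnodeA,0.5,-1.2\nnodeB,1.0,0.3\nnodeZ,2.0,2.0\n"
print(import_node_table(sample, "name", {"nodeA", "nodeB", "nodeC"}))
# {'nodeA': {'expA': 0.5, 'expB': -1.2}, 'nodeB': {'expA': 1.0, 'expB': 0.3}}
```

Parsing the same text with delimiter="\t" returns each line as a single field, which is why the preview shows everything in one column until Comma is selected.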
Now explore the Table Panel to confirm the mapping of the data to the network.
- Locate the Show Column button in the Table Panel.
- Select the data attributes: gal1RGexp, gal4RGexp, and gal80Rexp.
- Return to the Table Panel by clicking away from the attribute list.
- Select some nodes in the network (ctrl-A selects all nodes) and see the associated data in the Table Panel
Note that by default the Table Panel only shows data for selected nodes. To display data for the entire network instead, click the table display mode button in the Table Panel and select Show all. You now have access to the data mapped onto your network and can start visualizing it.
- Go to Layout → Prefuse Force Directed Layout.
- Go to the Visual Styles tab in the Control Panel and select Sample1 in the Current Visual Style drop-down.
The Visual Styles tab lists a set of default visual styles. In the next section you will customize a visual style to highlight your data values; details on editing and creating Visual Styles are given there.
So far, we have selected a couple of attributes to display in the Table Panel. Next, we will explore attributes and the Table Panel in more depth.
- Type mcm1 in the Search box. This selects the node YMR043W and displays its attributes in the Table Panel.
- Next, add a new attribute for MCM1. Click the Create New Column icon and select New Single Column → String.
- Type pdb as the name of the attribute. This defines a new string attribute for nodes and adds it to the Table Panel.
- Now click the empty pdb cell for YMR043W and type 1mnm, the PDB ID for the yeast protein mcm1. Press Return or Tab to enter the value.
- Move the pdb column into second position by dragging its column header directly after the name column.
- Finally, select a number of nodes and note that the attributes for all of the nodes are shown in the Table Panel.
- Clicking a column header sorts the rows by that column; clicking it again reverses the sort order.
Visualizing Data with Visual Styles
- Go to File → Open.
- You should see the Open a Session File Dialog.
- Open the sampleData folder, select galFiltered.cys and click Open (click ‘Yes’ when asked whether to discard the current session).
Notice how the galFiltered visual style maps multiple data and annotation values to:
- Edge Stroke Color
- Edge Width
- Node Fill Color
- Node Label
- Node Size
Modifying the Visual Style
Customizing the way you visualize and manipulate networks is a key function of Cytoscape. This is achieved through Visual Styles.
The Visual Styles interface is divided into three tabs, for Node, Edge and Network visual properties. For each property, three columns show the Default value (the value used when no mapping applies), the Mapping, and any Bypass. A pre-defined set of properties is displayed by default; additional properties can be selected from the Add visual properties drop-down at the top of the list. The interface also lists the pre-defined visual styles in the Current Visual Style drop-down at the top. In the figures below, node fill color is mapped using a continuous mapping (Cm) with grey as the default color, and the currently selected node has a visual style bypass defined (yellow).
- To start, select the Visual Styles tab on the Control Panel.
- Under the Node tab, find Fill Color and click on the arrow on the right side to expand.
- Click on the value ‘gal1RGexp’ to select an alternative data value to map to node color: select ‘gal80Rexp’. Notice the changes in the network display.
- Click on the gradient color mapping to open the Continuous Mapping Editor.
- Control the values and colors of the mapping by means of triangular handles and endpoint markers.
- Double-click on any handle or marker to change its color: change green to blue and change red to yellow. Notice the immediate changes to the network.
- Click-and-drag any handle to slide its value between the min and max.
- To save your changes, close the editor.
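A continuous mapping can be thought of as piecewise-linear interpolation between the handles, with the endpoint markers supplying the colors for values below the first handle and above the last. The sketch below assumes linear interpolation in RGB space and uses hypothetical handle positions (green at -1, white at 0, red at 1); Cytoscape's own interpolation may differ in detail.

```python
def continuous_color(value, handles, below, above):
    """Map a number to an RGB tuple by linear interpolation between handles.

    handles: list of (data_value, (r, g, b)) sorted by data_value.
    below/above: colors for values outside the handle range (endpoint markers).
    """
    if value < handles[0][0]:
        return below
    if value > handles[-1][0]:
        return above
    for (x0, c0), (x1, c1) in zip(handles, handles[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0) if x1 != x0 else 0.0
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
    return handles[-1][1]  # only reached when there is a single handle


GREEN, WHITE, RED = (0, 255, 0), (255, 255, 255), (255, 0, 0)
handles = [(-1.0, GREEN), (0.0, WHITE), (1.0, RED)]
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, continuous_color(v, handles, below=GREEN, above=RED))
```

Moving a handle in the editor corresponds to changing its data_value here, and double-clicking it to change its color corresponds to replacing the RGB tuple.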
Creating a Visual Style
- In the Current Visual Style section, start a new visual style by clicking the Options… button and selecting Create new Visual Style.
- Enter a name for your custom visual style and click OK.
- Find Fill Color under the Node tab and click the right-side arrow to expand.
- For Column, click the select value field and choose ‘gal1RGexp’ to map expression fold-change values. For Mapping Type, select ‘Continuous Mapping’. Click on the gradient to open the gradient editor:
- Click the Min/Max button and set the values to -1 and 1, respectively.
- Double-click the gradient handles to set colors, for example:
- below -1.0 = green
- -0.8 = green
- 0.8 = red
- above 1.0 = red
- Click Add to add another gradient handle, drag it to 0.0, and leave its color white.
- Close the gradient dialog.
- Next, we want to edit the visual properties for Node Size.
- Under the Node tab, expand Size and select ‘gal1RGsig’ to map p-values. For Mapping Type, select ‘Continuous Mapping’. Click on the gradient to open the editor:
- Note: We want smaller p-values (more significant) to show as larger nodes
- Click Min/Max and set the values accordingly: 1.0E-8 and 1.0E-3.
- Double-click the solid red box for p-values below the minimum (far left) to set the maximum node size. Set it to 70.0.
- Double-click the solid red box for p-values above the maximum (far right) to set the minimum node size. Set it to 20.0.
- Set the left open box to Node Size 70.0 and the right open box to 20.0. This defines the range of the gradient.
- Click ‘Add’ to add a new gradient handle. Set it to Attribute Value = 1E-4 and Node Size = 40.0.
- Click ‘OK’ in the gradient dialog and explore the visualization you have created!
This creates a pseudo-exponential gradient mapping. To explore it:
- Zoom in by selecting an area and clicking the zoom-to-selection icon on the toolbar.
- Use the bird’s-eye-view panel (bottom-left) to pan around network
- Try switching the Visual Style by switching mappings to another column of data:
- For Node Color, select ‘gal4RGexp’.
- For Node Size select ‘gal4RGsig’.
Notice the change in the view of the network! You can reuse the mappings across multiple data sets. Now is a good time to save your Cytoscape session so that the visual style you created is kept.
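The node size mapping set up above can be reproduced as a sketch with the same three handles (1.0E-8 → 70, 1E-4 → 40, 1.0E-3 → 20), again assuming linear interpolation between neighbouring handles. Because the handles are spaced by orders of magnitude, the two linear segments have very different slopes, which is what bends the mapping away from a single straight line.

```python
def map_size(p, handles, below, above):
    """Piecewise-linear continuous mapping from a p-value to a node size.

    handles: list of (p_value, size) sorted by p_value.
    below/above: sizes for p-values outside the handle range.
    """
    if p < handles[0][0]:
        return below
    if p > handles[-1][0]:
        return above
    for (x0, s0), (x1, s1) in zip(handles, handles[1:]):
        if x0 <= p <= x1:
            return s0 + (p - x0) / (x1 - x0) * (s1 - s0)
    return handles[-1][1]  # single-handle case


# Handles as configured in the steps above; smaller p-values give larger nodes
HANDLES = [(1e-8, 70.0), (1e-4, 40.0), (1e-3, 20.0)]
for p in (1e-9, 1e-6, 1e-4, 5e-4, 1e-3, 0.05):
    print(f"p={p:g}  size={map_size(p, HANDLES, below=70.0, above=20.0):.1f}")
```

A mapping that is truly logarithmic in the p-value would instead interpolate on log10(p), spreading sizes evenly across orders of magnitude; the piecewise version only changes slope at the handles.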
Laying Out Your Network
A network layout is a process that positions the nodes and edges of the network. Cytoscape offers a large variety of layouts, and apps may add new ones. All layouts appear under the Layout menu. In this section, we will explore some of the core layouts supported by the Cytoscape team. Most layouts can lay out only a portion of the network, and most expose parameters that can be used to tune the layout algorithm.
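To give an idea of what a force-directed layout such as the Prefuse Force Directed Layout does, the sketch below implements a simple spring-electrical layout in the style of Fruchterman and Reingold: all nodes repel each other, edges pull connected nodes together, and the maximum step shrinks each iteration. It is not the Prefuse algorithm used by Cytoscape, and the iteration count, cooling factor and initial area are arbitrary illustrative choices.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=100, seed=42):
    """Toy spring-electrical layout; returns {node: (x, y)}.

    nodes: list of node names; edges: list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = math.sqrt(4.0 / max(len(nodes), 1))  # ideal distance for a 2 x 2 area
    temperature = 0.1  # maximum step per iteration, reduced over time

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Nodes joined by an edge attract with force d^2 / k
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node by at most `temperature`, then cool down
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[n][0] += dx / length * step
            pos[n][1] += dy / length * step
        temperature *= 0.95
    return {n: (x, y) for n, (x, y) in pos.items()}


layout = force_directed_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```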
Using the App Manager
The App Manager allows users to quickly and conveniently add extra features to Cytoscape directly from within Cytoscape, eliminating the need for manual searches through different websites to install and update apps.
If you do not have Internet access enabled, you will not see the list of available apps or be able to automatically update existing ones; however, you will still be able to view and delete previously installed apps.
- Go to Apps → App Manager. The App Manager has three tabs: Install from App Store, Currently Installed and Check for Updates.
- The Currently Installed tab lists any apps already installed. To find out more about a specific app, click its name to display basic information at the bottom of the window.
- The Install from App Store tab provides an interface to the Cytoscape App Store.
- To install PathExplorer: Search for “paths”. Select PathExplorer from the search results.
- Click on the Install button at the bottom of the window.
- Click Close to exit the App Manager.
- Right click anywhere in the network and PathExplorer items should appear in the context menu. You’ll use this in a later exercise!
- (Optional) Install from the App Store website:
- Open apps.cytoscape.org.
- Search for any app of interest (or select one of the Featured Apps if nothing comes to mind).
- Notice that you can Install apps from here if Cytoscape 3 is running. You can also Update outdated apps, or Download apps if Cytoscape is not running.
Go ahead and install these apps for Exercise 3:
- PathExplorer: Finds paths, filters them based on node and edge attributes and saves them.
- WikiPathways: Web service client and GPML file format importer
- CyTargetLinker: Extends biological networks with regulatory interactions.
Exercise 2: Interaction network from GeneMANIA
In the fenofibrate treatment data set, we identified 16 significantly up-regulated genes (logFC > 1.2 and p-value < 0.05).
We are going to use the GeneMANIA website to find interactions between these genes and the genes most closely related to them.
Note: There is also a GeneMANIA app for Cytoscape, but it requires downloading a very large data file, so we use the website during the workshop. The same steps can be repeated in the app once the data file is installed.
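The selection criterion for the 16 genes can be expressed as a simple filter. The sketch below assumes each row of the data set has the columns logFC and p.value (the names used later in this exercise) plus a gene identifier column, here called id as a hypothetical placeholder; the example rows are made up and are not part of the fenofibrate data set.

```python
def significant_up(rows, logfc_min=1.2, p_max=0.05):
    """Return the IDs of rows with logFC above logfc_min and p.value below p_max."""
    return [r["id"] for r in rows if r["logFC"] > logfc_min and r["p.value"] < p_max]


# Hypothetical rows for illustration only
rows = [
    {"id": "geneA", "logFC": 1.5, "p.value": 0.01},
    {"id": "geneB", "logFC": 1.5, "p.value": 0.20},   # not significant
    {"id": "geneC", "logFC": 0.8, "p.value": 0.001},  # fold change too small
    {"id": "geneD", "logFC": -1.6, "p.value": 0.01},  # down-regulated
]
print(significant_up(rows))  # ['geneA']
```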
- Go to http://www.genemania.org.
- Select R. norvegicus (rat) as the species and paste the 16 gene IDs into the gene input box (see Fig. 2a).
- At the bottom of the advanced settings, specify that 50 related genes should be added (see Fig. 2a).
- Click on Go
- The resulting network is displayed in a Cytoscape.js viewer; Cytoscape.js is a JavaScript library for displaying networks in the web browser (see Fig. 2b).
- Most of the nodes are connected by co-expression edges; GeneMANIA has no pathway information for these rat genes.
- The Functions tab on the right side lists over-represented GO terms. These terms confirm our pathway analysis: the data set clearly points to fatty acid metabolism and related processes (see Fig. 2c). Clicking a GO term highlights the genes in the network that are annotated with it.
Fig 2a: Query GeneMANIA; Fig 2b: Network data for input genes; Fig 2c: Show over-represented GO terms.
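Over-representation of a GO term is commonly assessed with a one-sided hypergeometric test: given N background genes of which K carry the term, how likely is it to see at least k annotated genes among the n genes in the network by chance? The sketch below computes that p-value. It is a generic illustration and not necessarily the statistic GeneMANIA reports; with many GO terms tested at once, the p-values would also need a multiple-testing correction such as a false discovery rate.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated with the GO term,
    n: genes in the network, k: network genes annotated with the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts for illustration only
print(hypergeom_enrichment_p(k=5, n=65, K=100, N=15000))
```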
In the next step we are going to open the network in the Cytoscape application and visualize the data set on the network.
- Import the genemania.xgmml file from the USB stick into Cytoscape and apply the Prefuse Force Directed Layout.
- A network with 65 nodes and 1263 edges is opened.
- Now import the fenofibrate treatment data set (see Fig. 2d) and visualize the logFC on the network (see Fig. 2e). Check that some of the nodes have values in the newly added node columns logFC and p.value.
- After importing, specify the visualization options for the node fill color. Use a continuous mapping for the logFC from -2 (red) to 2 (green) to obtain a gradient from down- to up-regulated (see Fig. 2f).
- The resulting network shows the known interactions between the top genes of the data set, with the logFC visualized on the nodes (see Fig. 2g).
Fig 2d: Data import in Cytoscape; Fig 2e: Imported data set; Fig 2f: Visual style for Node Fill Color; Fig 2g: Resulting network.
Exercise 3: Extend biological process with regulatory interactions
In this exercise we use four different apps:
- WikiPathways-App: search and download pathways from WikiPathways in pathway or network view (network = unique nodes, no pathway layout).
- NetworkAnalyzer: find the hubs in a network by analyzing the node degree and betweenness properties in a network.
- PathExplorer: find all paths that go to or from a specific node.
- CyTargetLinker: extend the network with regulatory interactions to better understand the regulation of a biological process.
Note: In this exercise we use the human Fatty Acid Beta Oxidation pathway. Because we want to use the human ENCODE transcription factor data, we chose the human pathway instead of the rat pathway.
Download pathway from WikiPathways
- Go to File → Import → Network → Public Databases and select WikiPathways as the data source.
- Search for Fatty acid beta oxidation, restricting the species to Homo sapiens only (see Fig. 3a). Select the “Import as network” option.
- The resulting network contains 143 nodes and 198 edges (see Fig. 3c).
Fig 3a: WikiPathways web service import; Fig 3c: Pathway shown as network in Cytoscape.
Analyzing the properties of the network
- Go to Tools → Network Analyzer → Network Analysis → Analyze network. Analyze the network as an undirected network.
- You can have a closer look at the different properties, like node degree distribution or betweenness.
- Finally, you can visualize the properties on the network to identify hub nodes. Click Visualize Parameters and select “map node size to Degree and map node color to Betweenness” (see Fig. 3d).
- Two of the hub nodes in the resulting network are HADHA and HADHB, the genes encoding the subunits of the mitochondrial trifunctional protein, which catalyzes the last three steps of mitochondrial beta-oxidation of long-chain fatty acids (see Fig. 3e).
Fig 3d: Visualize parameters; Fig 3e: Find hub nodes in resulting network.
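Node degree is the number of neighbours of a node, and betweenness counts how many shortest paths between other pairs of nodes pass through it. The sketch below computes both for an undirected, unweighted network using Brandes' algorithm; it illustrates the measures and is not NetworkAnalyzer's code. Tools such as NetworkAnalyzer usually report a normalized betweenness between 0 and 1, whereas this function returns the raw count (each unordered pair counted once), and the node names are hypothetical.

```python
from collections import deque


def degree_and_betweenness(edges):
    """Degree and raw betweenness for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs. Uses Brandes' algorithm with one
    breadth-first search per source node.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    betweenness = dict.fromkeys(adj, 0.0)

    for source in adj:
        # BFS: count shortest paths (sigma) and record predecessors
        order = []
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to the source
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                betweenness[w] += delta[w]

    # Every unordered pair was counted once from each endpoint
    return degree, {n: b / 2.0 for n, b in betweenness.items()}


edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
degree, betweenness = degree_and_betweenness(edges)
print("hub by degree:", max(degree, key=degree.get))
print(betweenness)
```

Dividing each value by (n - 1)(n - 2) / 2, where n is the number of nodes, gives the common normalization to the 0 to 1 range for undirected graphs.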
Find all paths in the network that lead to the TCA cycle
- The pathway diagram on WikiPathways shows that the pathway has several end points leading into the TCA cycle. With the PathExplorer app, we take a closer look at what this means from a network perspective.
- Please be aware that the group and complex nodes are not ideal for this analysis (the WikiPathways App will be extended soon to support the visualization of a pathway without those group nodes).
- Find the TCA Cycle node in the network and right click on the node. Select PathExplorer → Find paths to here (see Fig. 3f).
- The app highlights all paths within the network that end in this node. You can immediately see that nearly all paths end up in the TCA cycle (see Fig. 3g).
Fig 3f: Use PathExplorer to find paths to TCA cycle; Fig 3g: Highlighted paths to TCA cycle.
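Finding the paths that lead to a node amounts to walking the directed edges backwards from that node. The sketch below enumerates every simple path (no repeated nodes) that ends at a target node; it is a conceptual illustration, not PathExplorer's implementation, and the node names are hypothetical. Because the number of simple paths can grow exponentially in dense networks, a tool that only needs to highlight the paths could instead collect the nodes and edges reachable backwards from the target.

```python
def paths_to(target, edges):
    """All simple directed paths that end at `target`.

    edges: iterable of (source, target) pairs. The search follows edges
    backwards from `target` and skips nodes already on the current path.
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(v, []).append(u)
    paths = []

    def walk(node, suffix):
        for p in preds.get(node, []):
            if p in suffix:  # avoid cycles
                continue
            path = [p] + suffix
            paths.append(path)
            walk(p, path)

    walk(target, [target])
    return paths


edges = [("a", "b"), ("b", "TCA"), ("c", "TCA"), ("TCA", "d")]
for path in paths_to("TCA", edges):
    print(" -> ".join(path))
```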
Extending the network with TF and microRNA regulators
- Go to Apps → CyTargetLinker → Extend network. Select the network, choose GeneID as the attribute containing a biological identifier, specify the directory containing the regulatory interaction networks (RINs; they are available on the USB sticks), and choose “Add Regulators” as the direction (see Fig. 3h).
- Select all RINs in the directory on the USB stick (see Fig. 3i).
- The extension might take a while – check status in the status bar in the bottom left corner.
- CyTargetLinker extends the network with 1930 microRNA-target interactions from various databases and 58 transcription factor-target gene interactions from ENCODE (see Fig. 3j and 3k).
- When the overlap threshold in the CyTargetLinker control panel (left side) is set to 2, only interactions supported by at least 2 RINs are shown. This allows us, for example, to identify two microRNAs that target several genes in the pathway: hsa-miR-124 and hsa-miR-506 (see Fig. 3l).
Fig 3h: CyTargetLinker dialog; Fig 3i: RIN selection; Fig 3j: CyTargetLinker legend; Fig 3k: Extended network; Fig 3l: Threshold overlap = 2.
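The overlap threshold can be understood as counting, for each regulator-target interaction, how many RINs contain it, and keeping only the interactions that reach the threshold. The sketch below illustrates that idea together with the extension step (adding interactions whose target is already in the network). It is not CyTargetLinker's implementation; the RIN names, regulators and genes are hypothetical, and real RINs are read from files rather than Python dictionaries.

```python
def extend_with_regulators(network_genes, rins, min_support=1):
    """Collect regulator -> target interactions whose target is in the network.

    rins: {rin_name: list of (regulator, target) pairs}.
    Returns {(regulator, target): sorted list of supporting RIN names},
    keeping only interactions found in at least `min_support` RINs.
    """
    support = {}
    for rin_name, interactions in rins.items():
        for regulator, target in interactions:
            if target in network_genes:
                support.setdefault((regulator, target), set()).add(rin_name)
    return {
        edge: sorted(sources)
        for edge, sources in support.items()
        if len(sources) >= min_support
    }


# Hypothetical RINs and genes for illustration only
rins = {
    "rinA": [("miR-1", "geneA"), ("miR-2", "geneB"), ("miR-1", "geneX")],
    "rinB": [("miR-1", "geneA"), ("TF-1", "geneB")],
}
network = {"geneA", "geneB"}
print(extend_with_regulators(network, rins, min_support=1))
print(extend_with_regulators(network, rins, min_support=2))
```

Raising min_support from 1 to 2 drops the interactions found in only one RIN, which mirrors how changing the overlap threshold thins out the extended network.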