
Integrative Bioinformatics using Cytoscape (and R2)

(Bio)Chemistry versus Molecular Biology: some basic concepts

(Bio)Chemistry: concentrations, molecular structures, reaction equations; quantitative; defined experimental setup

Molecular Biology: regulation, large biomolecules, large-scale processes; qualitative; complex experimental setup (by necessity!)

Human Genetics

Molecular Biology: new techniques, so Integrative Bioinformatics is needed


New techniques: (deep) sequencing, arrays, proteomics
Quantitative analysis: handling large datasets, statistics
Capturing complexity: integration, graphs

Integrative Bioinformatics: Integrated Bioinformaticians!



Integrative Bioinformatics: An example


Integrative Bioinformatics: What they did


1. Sequence the genome; assign gene function using protein sequence and structural similarities (Bonneau et al., 2004; Ng et al., 2000).
2. Perturb cells: environmental factors; knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005).
3. Measure changes: microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006).
4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006).
5. Use the machine-learning algorithm Inferelator to construct a dynamic network model of the influence of changes in environmental factors (EFs) and transcription factors (TFs) on the expression of coregulated genes (Bonneau et al., 2006).
6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate and then experimentally test hypotheses that drive additional iterations of steps 2-6 (Shannon et al., 2006).


Integrative Bioinformatics: Their framework


Integrative Bioinformatics: results


This goes to show:

1. Aggregate: combine data from different sources
2. Search/Visualize: filter
3. Analyze/Feedback: algorithms

There is a need for adaptable software.

Goal: facilitate ideas


Cytoscape - Network Visualization and Analysis


Freely available, open-source (Java) software, easily extensible (plugin API)
Visualizing networks (e.g. molecular interaction networks)
Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, ...)
Used in several hundred analyses in recent literature
Continuity guaranteed


An example Cytoscape workflow


Cytoscape Workflow
1. Load Networks (import network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

A specific example of this workflow:
Cline et al. Integration of biological networks and gene expression data using Cytoscape. Nature Protocols 2, 2366-2382 (2007).


Networks as graphs
A network is a collection of:
Nodes (or vertices)
Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)

Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes): any sort of object.
Edges can represent physical or functional interactions, activation, regulation, reactions: any sort of relation.
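As a minimal sketch of this definition (plain Python, not Cytoscape's internal data model; the `Network` and `Edge` classes are hypothetical), a network can be held as a node set plus an edge list in which each edge carries a direction flag, a relation type and a weight. Keeping edges in a list rather than a set is what allows multiple edges between the same pair, and nothing prevents an edge from pointing back to its own source:

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    target: str
    interaction: str = "pp"   # free-text relation type, e.g. "pp" or "pd"
    directed: bool = False
    weight: float = 1.0


@dataclass
class Network:
    nodes: set = field(default_factory=set)
    # A list (not a set), so parallel edges between the same pair are kept.
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, **attrs):
        """Add an edge; both end points become nodes if not yet present."""
        self.nodes.update((source, target))
        edge = Edge(source, target, **attrs)
        self.edges.append(edge)
        return edge

    def neighbors(self, node):
        """Nodes reachable from `node` in one step, respecting edge direction."""
        result = set()
        for e in self.edges:
            if e.source == node:
                result.add(e.target)
            elif not e.directed and e.target == node:
                result.add(e.source)
        return result


net = Network()
net.add_edge("TF1", "geneA", interaction="pd", directed=True)           # directed
net.add_edge("geneA", "geneB", interaction="pp")                         # undirected
net.add_edge("geneA", "geneB", interaction="coexpression", weight=0.8)   # parallel edge
net.add_edge("geneB", "geneB")                                           # self-edge
print(sorted(net.neighbors("geneA")))  # ['geneB']
```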


Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication


Creating a network


Free-format Text and Excel Files


Specify Input File
Define Columns

Text Parsing Options

Preview
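The import steps above can be mimicked outside Cytoscape to show what they amount to. This sketch is not Cytoscape's importer: the `read_interaction_table` helper, the column names and the example rows are invented for illustration. It selects the source and target columns, applies a delimiter as the parsing option, skips incomplete rows, and keeps the remaining columns as edge attributes:

```python
import csv
import io


def read_interaction_table(handle, source_col, target_col, interaction_col=None,
                           delimiter="\t", default_interaction="pp"):
    """Read (source, interaction, target, extra) tuples from a delimited table.

    The named columns define the network; all remaining columns are kept
    as edge attributes (as strings).
    """
    edges = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        source, target = row[source_col].strip(), row[target_col].strip()
        if not source or not target:
            continue  # skip incomplete rows rather than creating empty nodes
        interaction = (row[interaction_col].strip() if interaction_col
                       else default_interaction)
        extra = {k: v for k, v in row.items()
                 if k not in (source_col, target_col, interaction_col)}
        edges.append((source, interaction, target, extra))
    return edges


# Invented example table: a header row followed by tab-separated records.
table = io.StringIO(
    "bait\tprey\tmethod\tscore\n"
    "geneA\tgeneB\ttwo-hybrid\t0.9\n"
    "geneC\t\tco-IP\t0.4\n"
    "geneB\tgeneD\tco-IP\t0.7\n"
)
edges = read_interaction_table(table, source_col="bait", target_col="prey")
for edge in edges[:5]:  # a "preview" of the parsed rows
    print(edge)
```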


Pathways: plenty of resources

http://pathguide.org: over 240 pathway databases



All kinds of network data


Physical interactions: protein-protein interactions, protein-DNA interactions, metabolic interactions

Functional interactions: co-expression relations, genetic interactions, knockout/siRNA targets

Pre-formatted Network Files


Cytoscape supports many popular file formats:
SIF (Simple Interaction Format)
GML (Graph Modelling Language)
XGMML (eXtensible Graph Markup and Modeling Language)
BioPAX (Biological Pathway Exchange)
PSI-MI 1 and 2.5 (Proteomics Standards Initiative Molecular Interaction format)
SBML Level 2 (Systems Biology Markup Language)
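SIF is the simplest of these: each line holds a source node, an interaction type and one or more target nodes, and a line with a single name declares a node without edges. A minimal parser, assuming the convention that tabs (when present on a line) are the field delimiter and otherwise any whitespace is, could look like this; it reads the four yeast interactions from the Visual Data Integration example, written in the multi-target form. `parse_sif` is an illustrative helper, not part of Cytoscape:

```python
def parse_sif(text):
    """Parse Simple Interaction Format text into a node set and an edge list.

    Each line: source interaction target [target ...].
    A line with a single name declares a node without edges.
    """
    nodes, edges = set(), []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Tabs, when present, delimit fields so that names may contain spaces.
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        source = fields[0]
        nodes.add(source)
        if len(fields) == 1:
            continue  # isolated node
        if len(fields) == 2:
            raise ValueError(f"interaction without target: {line!r}")
        interaction = fields[1]
        for target in fields[2:]:
            nodes.add(target)
            edges.append((source, interaction, target))
    return nodes, edges


sif = """YDR382W pp YDL130W YFL039C
YFL039C pp YCL040W YHR179W
"""
nodes, edges = parse_sif(sif)
print(len(nodes), len(edges))  # 5 4
```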

These files are available for download from data sources (URLs, web services, formatted table files).

Internet Databases
Cytoscape version 2.6
Web service clients import networks directly from several trusted internet resources:
IntAct (EMBL-EBI)
Pathway Commons (a collection of data resources)
NCBI Entrez Gene
Many more will be included.


Interaction Database Search


Import, visualize and analyze


Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication


What are Attributes?


Any data that describes or provides details about the nodes and edges in the network, e.g.:
gene expression data, mass spectrometry data, protein structure information, Gene Ontology (GO) terms, interaction confidence values, etc.

Cytoscape supports multiple data types:
Numbers (integers, floats)
Text (strings)
Logical (booleans)

Attribute Management
Select Attributes for Display

Node or Edge ID

String and floating-point attributes

Specific Attribute Tabs

Load Attributes: Import Attribute Files


Map data about networks onto networks. Attributes can be loaded in many of the same ways as networks:
Import pre-formatted attribute files
Import formatted text or Excel files
Create attributes manually in the attribute editor
Load attributes from web services
ID mapping through node attributes

ID Mapping
Mapping identifiers from one source to another is a major challenge. There are multiple levels of IDs, e.g. probe -> gene -> peptide -> protein. Cytoscape provides ID mapping through the BioMart web service of the EBI to convert the IDs. Not perfect, but sufficient; an additional mapping mechanism is underway.
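The difficulty shows up quickly in code: mappings are often one-to-many, and every step in a chain such as probe -> gene -> protein can lose identifiers. The sketch below uses plain dictionaries with hypothetical identifiers; it does not call BioMart, and `map_ids` is an illustrative helper:

```python
def map_ids(ids, *tables):
    """Map identifiers through a chain of lookup tables (e.g. probe -> gene -> protein).

    Each table maps one ID to a list of IDs, because mappings are often
    one-to-many. Returns the final mapping and the IDs that were lost.
    """
    mapped = {i: [i] for i in ids}
    for table in tables:
        mapped = {
            original: sorted({t for c in current for t in table.get(c, [])})
            for original, current in mapped.items()
        }
    unmapped = [i for i, targets in mapped.items() if not targets]
    return mapped, unmapped


# Hypothetical identifiers, for illustration only.
probe_to_gene = {"probe_1": ["GENE_A"], "probe_2": ["GENE_B", "GENE_C"]}
gene_to_protein = {"GENE_A": ["PROT_A1", "PROT_A2"], "GENE_B": ["PROT_B"]}

mapped, unmapped = map_ids(["probe_1", "probe_2", "probe_3"],
                           probe_to_gene, gene_to_protein)
print(mapped)    # {'probe_1': ['PROT_A1', 'PROT_A2'], 'probe_2': ['PROT_B'], 'probe_3': []}
print(unmapped)  # ['probe_3']
```

Note that probe_2 silently loses GENE_C on the second step; reporting such partial losses alongside the fully unmapped IDs is what makes a mapping "not perfect but sufficient" visible in practice.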


Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication


Visual Data Integration


1. Network Data
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W

2. Attribute Data
ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078

Both are combined by the VizMapper.
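The attribute data above follows a simple layout: the attribute name on the first line, then one `ID = value` line per node. A sketch of parsing it and joining it onto the nodes of the network, assuming exactly that layout (real attribute files may carry extra type information in the header, which this sketch ignores); `parse_node_attributes` is an illustrative helper:

```python
NETWORK_SIF = """YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W"""

ATTRIBUTE_FILE = """ExpressionValue
YCL040W = 0.542
YDL130W = -0.123
YDR382W = -0.058
YFL039C = 0.192
YHR179W = 0.078"""


def parse_node_attributes(text):
    """Header line = attribute name; each further line is 'node ID = value'."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    name, values = lines[0], {}
    for line in lines[1:]:
        node_id, _, raw = line.partition("=")
        values[node_id.strip()] = float(raw)  # float() tolerates surrounding spaces
    return name, values


# Nodes come from the interaction lines (source, interaction, target).
nodes = set()
for line in NETWORK_SIF.splitlines():
    source, _, target = line.split()
    nodes.update((source, target))

name, values = parse_node_attributes(ATTRIBUTE_FILE)
# Join: every network node gets the attribute, or None if the file lacks it.
node_attributes = {n: {name: values.get(n)} for n in sorted(nodes)}
print(node_attributes["YFL039C"])  # {'ExpressionValue': 0.192}
```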


VizMapper
List of Visual Styles
Default Visual Style Editor
List of Visual Attributes

List of Data Attributes

Mapping definition


Types of mappings
Continuous:
Continuous data mapped to continuous visual attributes (e.g. gene expression levels mapped to node color)
Continuous data mapped to discrete visual attributes (e.g. p-value categories mapped to node shape)
Discrete:
Discrete (categorical) data mapped to discrete visual attributes (e.g. GO annotation mapped to node shape)
Discrete data mapped to continuous visual attributes (e.g. multiple GO terms mapped to pie coloring)
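The first three mapping types can be sketched as small functions. The gradient end points, the p-value cut-offs and the shape names below are arbitrary choices for illustration, not Cytoscape defaults; discrete-to-continuous pie mappings are left out:

```python
def continuous_color(value, vmin=-1.0, vmax=1.0,
                     low=(0, 0, 255), mid=(255, 255, 255), high=(255, 0, 0)):
    """Continuous -> continuous: blue-white-red RGB gradient over [vmin, vmax]."""
    value = max(vmin, min(vmax, value))  # clamp out-of-range values
    center = (vmin + vmax) / 2
    if value <= center:
        t, start, end = (value - vmin) / (center - vmin), low, mid
    else:
        t, start, end = (value - center) / (vmax - center), mid, high
    return tuple(round(a + t * (b - a)) for a, b in zip(start, end))


def pvalue_shape(p, cutoffs=((0.01, "DIAMOND"), (0.05, "TRIANGLE"))):
    """Continuous -> discrete: bin p-values into categories, one shape each."""
    for limit, shape in cutoffs:
        if p < limit:
            return shape
    return "ELLIPSE"


def category_shape(term, table, default="ELLIPSE"):
    """Discrete -> discrete: look up a shape for a categorical value."""
    return table.get(term, default)


print(continuous_color(0.542))  # color for YCL040W's expression value above
print(pvalue_shape(0.03))       # TRIANGLE
print(category_shape("ribosome", {"ribosome": "RECTANGLE"}))  # RECTANGLE
```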


Network Filtering


Several Layout Algorithms

Spring-embedded
Circular
Hierarchical
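Of these, the circular layout is the easiest to state: nodes are placed at equal angles on a circle. This sketch keeps the given node order; a real implementation may reorder nodes, for example to reduce edge crossings. `circular_layout` and its default radius are illustrative:

```python
import math


def circular_layout(nodes, radius=100.0, center=(0.0, 0.0)):
    """Place nodes at equal angles on a circle, in the order given."""
    nodes = list(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        positions[node] = (center[0] + radius * math.cos(angle),
                           center[1] + radius * math.sin(angle))
    return positions


print(circular_layout(["YDR382W", "YDL130W", "YFL039C", "YCL040W", "YHR179W"]))
```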


Linkout
Nodes and edges act as hyperlinks to external databases, through user-configurable URLs. Collection of the biological results.
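User-configurable linkouts can be thought of as URL templates with a placeholder that is replaced by the node identifier. In this sketch the `%ID%` token, the `linkout_urls` helper and the example.org address are assumptions for illustration:

```python
from urllib.parse import quote


def linkout_urls(node_id, templates, token="%ID%"):
    """Expand each URL template by substituting the (URL-escaped) node ID."""
    return {name: template.replace(token, quote(node_id))
            for name, template in templates.items()}


templates = {"ExampleDB": "https://db.example.org/lookup?id=%ID%"}  # hypothetical
print(linkout_urls("YFL039C", templates))
# {'ExampleDB': 'https://db.example.org/lookup?id=YFL039C'}
```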


Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication


Prepare for Publication


Fine-tune the figures:
Manual layout manipulation options (align, scale, rotate)
Manually override visual styles: place labels, change colors, etc.


Finalizing the Figures


Publication-quality graphics in several formats:
PDF, EPS, SVG, PNG, JPEG, and BMP

Export Session to HTML for Web



Cytoscape: So what?
The big pro-Cytoscape argument: it is EXTENSIBLE. Plugins, plugins, plugins.
In our case, this enabled extended array data analysis.


Cytoscape is Extensible
Cytoscape is open-source and free software.
A plugin interface allows any programmer to write their own extensions to Cytoscape.
Plugins represent the primary biological analysis mechanism in Cytoscape.
Plugins are distributed from a central Cytoscape database and can be installed while Cytoscape is running.

Hello World Plugin

http://cytoscape.org/cgi-bin/moin.cgi/Hello_World_Plugin
http://cytoscape.org/cgi-bin/moin.cgi/Developer_Homepage

Extending the workflow through plugins

Graph-based integration and analysis of molecular biological data



Integrative Bioinformatics in our group


Aggregate data: 18000+ Affymetrix arrays (tumor series, public data, experiments); manipulated cell lines (lentiviral library)
Search/Visualize/Selection: R2 (statistical cutoffs, correlations, clinical data coupling)
Analysis/Feedback: R2 and Cytoscape (known interactions, transcription factor binding)
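The correlation step ("find genes co-expressed with my gene of interest") reduces to computing a correlation coefficient per gene and applying a statistical cutoff. R2 is a web application and this is not its implementation; it is a plain-Python sketch assuming Pearson correlation, an absolute cutoff of 0.9 and invented expression profiles:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(expression, gene, cutoff=0.9):
    """Genes whose profile correlates with `gene` at |r| >= cutoff, strongest first."""
    reference = expression[gene]
    hits = {other: pearson(reference, profile)
            for other, profile in expression.items() if other != gene}
    kept = {g: r for g, r in hits.items() if abs(r) >= cutoff}
    return dict(sorted(kept.items(), key=lambda item: -abs(item[1])))


# Invented profiles over four samples, for illustration only.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # perfectly co-expressed with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    "geneD": [1.0, 3.0, 2.0, 4.0],   # r = 0.8, below the cutoff
}
print(correlated_genes(expression, "geneA"))
```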


Integrative Bioinformatics in our group


Figure: system overview. Components: patient data and GEO arrays; HGServer; R2 array analysis interface; statistical analysis Perl module; database (DB); Cytoscape Web Start; AMC plugin; Cytoscape interface; external data sources (array data: tumor and experiments; canonical paths); algorithms.

Array data analysis: R2

Mainly the work of Jan Koster



R2 interface: Demo


R2 interface


Timeseries in R2 / Cytoscape (Demo)


Timeseries in R2


Timeseries in R2: integration with Cytoscape through Java Web Start


Timeseries in Cytoscape: Visualization


Timeseries in Cytoscape: Aggregate data


Timeseries in Cytoscape: Search/Filter


Timeseries in Cytoscape: Filter


Timeseries in Cytoscape


TF (green) and partners (red)


Filtering


Coloring, layout


Summarizing:
1. Aggregate: combine NOTCH3 knockout data with TF and PPI data
2. Search/Visualize: lay out the time series; find downstream targets
3. Analyze/Feedback: identify MSX1; knock it out in a new experiment
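The "find downstream targets" step can be sketched as a breadth-first search over directed regulatory edges, followed by intersecting the targets with the genes that changed after a knockout. The edges, gene names, knockout result and two-step limit below are hypothetical; this illustrates the idea and is not the procedure used for the NOTCH3 data:

```python
from collections import deque


def downstream_targets(edges, start, max_steps=2):
    """Breadth-first search along directed (regulator, target) edges.

    Returns {node: number of steps from start} for nodes within max_steps.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nxt in successors.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance


# Hypothetical regulatory edges and knockout result, for illustration only.
regulatory_edges = [("TF_X", "gene1"), ("TF_X", "gene2"),
                    ("gene1", "gene3"), ("gene3", "gene4")]
changed_after_knockout = {"gene2", "gene3", "gene9"}

targets = downstream_targets(regulatory_edges, "TF_X", max_steps=2)
candidates = sorted(set(targets) & changed_after_knockout)
print(targets)     # {'gene1': 1, 'gene2': 1, 'gene3': 2}
print(candidates)  # ['gene2', 'gene3']
```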


More Plugin Examples


BiNGO (enriched GO categories found in the sub-network)
WikiPathways (visualize curated pathways)
MCODE (putative protein complexes)
GenePro (protein-protein interaction cluster visualization)
jActiveModules (search for significant sub-networks)
NetworkAnalyzer (statistical analysis of networks)
Agilent Literature Search (network creation)
CyGoose (Gaggle communication)
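As an example of the computation behind such plugins, over-representation of a GO category in a sub-network is commonly tested with the hypergeometric distribution (one of the tests BiNGO offers). A minimal sketch with invented counts; the multiple-testing correction across GO categories that a real analysis needs is not shown:

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes in a set of n,
    when K of the N genes in the reference set carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Invented numbers: 5 of 20 reference genes carry a GO term,
# and 3 of the 4 genes in a sub-network carry it.
p = hypergeometric_pvalue(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 0.032
```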

Timeseries and BiNGO: Aggregate


Timeseries and BiNGO: Analyze


Timeseries and BiNGO


GOlorize plug-in (Pasteur)


Node placement on the basis of both the connection structure (the edges) and the class structure (GO)
A modification of the classic force-directed layout algorithm
Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)
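The idea of adding class structure to a force-directed layout can be sketched as follows. This is not GOlorize's algorithm: it is a Fruchterman-Reingold-style spring layout with one extra term that pulls nodes of the same class together, where the hypothetical `class_force` parameter plays a role loosely comparable to a class attractive force. All parameter values are arbitrary:

```python
import math
import random


def _push(pos, disp, a, b, force):
    """Positive force moves a and b apart; negative force pulls them together."""
    dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
    d = math.hypot(dx, dy) or 1e-9        # avoid division by zero for coincident nodes
    fx, fy = dx / d * force, dy / d * force
    disp[a][0] += fx
    disp[a][1] += fy
    disp[b][0] -= fx
    disp[b][1] -= fy


def class_directed_layout(nodes, edges, node_class, iterations=200,
                          k=1.0, class_force=0.5, seed=0):
    """Spring-embedded layout with an extra attraction between same-class nodes."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                d = math.dist(pos[a], pos[b]) or 1e-9
                force = k * k / d                        # repulsion between all pairs
                cls = node_class.get(a)
                if cls is not None and cls == node_class.get(b):
                    force -= class_force * d * d / k     # extra pull inside a class
                _push(pos, disp, a, b, force)
        for a, b in edges:
            d = math.dist(pos[a], pos[b])
            _push(pos, disp, a, b, -d * d / k)           # spring attraction along edges
        step = 0.1 * (1 - it / iterations)               # cooling: moves shrink over time
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                move = min(length, step)
                pos[n][0] += dx / length * move
                pos[n][1] += dy / length * move
    return {n: tuple(p) for n, p in pos.items()}


layout = class_directed_layout(
    ["a", "b", "c", "d"], [("a", "b"), ("c", "d")],
    {"a": "X", "b": "X", "c": "Y", "d": "Y"})
print(layout)
```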


GOlorize plug-in interface

Default settings for the class attractive force and separation factor
Class-directed network layout

Example: genetic interaction network

Standard Spring-embedded layout algorithm in Cytoscape



Example: genetic interaction network

Spring-embedded layout algorithm with GO colour-coding



Example: genetic interaction network

Final results of the GOlorize layout algorithm in Cytoscape


Garcia et al. Bioinformatics 2007

Find Network Clusters - MCODE Plugin

Network clusters are highly interconnected sub-networks that may also partly overlap.
Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways.
Clusters in a protein similarity network represent protein families.
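MCODE begins by weighting every vertex by how densely its neighborhood is connected. Following our reading of Bader & Hogue (2003), the weight is k times the density of the highest k-core in the vertex's neighborhood (the vertex plus its direct neighbors), with density measured against |V|(|V|-1)/2 possible edges. The later stages (growing clusters from high-weight seed vertices, post-processing) are omitted; this sketches the weighting step only, and the function names are illustrative:

```python
from itertools import combinations


def k_core(adj, k):
    """Nodes of the k-core: repeatedly drop nodes with fewer than k neighbors left."""
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            if len(adj[n] & nodes) < k:
                nodes.discard(n)
                changed = True
    return nodes


def density(adj, nodes):
    """Edges among `nodes` divided by the maximum |V|(|V|-1)/2 (no self-loops)."""
    if len(nodes) < 2:
        return 0.0
    present = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
    return present / (len(nodes) * (len(nodes) - 1) / 2)


def mcode_vertex_weight(adj, v):
    """Highest k-core level k times the density of that core, within v's neighborhood."""
    hood = adj[v] | {v}
    sub = {n: adj[n] & hood for n in hood}  # subgraph induced by the neighborhood
    k, core = 0, set(sub)
    while True:
        next_core = k_core(sub, k + 1)
        if not next_core:
            break
        k, core = k + 1, next_core
    return k * density(sub, core)


# Hypothetical undirected network: triangle a-b-c plus d hanging off a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print({v: mcode_vertex_weight(adj, v) for v in sorted(adj)})
# {'a': 2.0, 'b': 2.0, 'c': 2.0, 'd': 1.0}
```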

Network Clustering
7000 yeast interactions among 3000 proteins


Bader & Hogue, BMC Bioinformatics 2003 4(1):2


Figure: example MCODE clusters: Proteasome 26S, Proteasome 20S, Ribosome, RNA splicing, RNA Pol core.

Bader & Hogue, BMC Bioinformatics 2003 4(1):2


Find Network Motifs - Netmatch plugin

A network motif is a sub-network that occurs significantly more often than expected by chance alone.
Input: query and target networks, optional node/edge labels
Output: topological query matches as subgraphs of the target network
Supports: subgraph matching, node/edge labels, label wildcards, approximate paths
http://alpha.dmi.unict.it/~ctnyu/netmatch.html
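Exact query matching with label wildcards can be sketched by brute force: try every injective assignment of query nodes to target nodes and keep those where the labels agree (or the query label is the wildcard) and every query edge exists in the target. This is exponential and meant only for toy graphs; it is not NetMatch's algorithm, approximate paths are not supported, and the labels, node names and `match_query` helper are hypothetical:

```python
from itertools import permutations


def match_query(query_labels, query_edges, target_labels, target_edges, wildcard="?"):
    """Brute-force exact subgraph matching with node labels and a label wildcard.

    query_labels / target_labels map node -> label; edges are directed
    (source, target) pairs. Returns every valid query -> target node mapping.
    """
    target_edge_set = set(target_edges)
    query_nodes = list(query_labels)
    matches = []
    for candidate in permutations(target_labels, len(query_nodes)):
        mapping = dict(zip(query_nodes, candidate))
        labels_ok = all(query_labels[q] in (wildcard, target_labels[t])
                        for q, t in mapping.items())
        if labels_ok and all((mapping[a], mapping[b]) in target_edge_set
                             for a, b in query_edges):
            matches.append(mapping)
    return matches


# Hypothetical target network with node labels.
target_labels = {"n1": "receptor", "n2": "kinase", "n3": "kinase",
                 "n4": "TF", "n5": "kinase"}
target_edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n1", "n5")]

# Query: a receptor, then any node ("?"), then a kinase, connected in a chain.
query_labels = {"q1": "receptor", "q2": "?", "q3": "kinase"}
query_edges = [("q1", "q2"), ("q2", "q3")]
print(match_query(query_labels, query_edges, target_labels, target_edges))
# [{'q1': 'n1', 'q2': 'n2', 'q3': 'n3'}]
```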

Finding query sub-networks

Query
Ferro et al. Bioinformatics 2007

Results

Finding Signaling Pathways


Potential signaling pathways from plasma membrane to nucleus via cytoplasm

NetMatch Results
Figure: signaling pathway example and NetMatch query. Labels: Ras, Raf-1, Mek, MAPK (MAP kinase cascade), TFs, nucleus (growth control, mitogenesis); shortest path between subgraph matches.
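Connecting two matches by a shortest path amounts to a breadth-first search in an unweighted network. A minimal sketch with hypothetical node names; `shortest_path` is an illustrative helper:

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Breadth-first search for one shortest path in an unweighted network."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable


adjacency = {  # hypothetical undirected network, listed in both directions
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"], "F": [],
}
print(shortest_path(adjacency, "A", "E"))  # ['A', 'B', 'D', 'E']
```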

Find Active Subnetworks


Active modules are sub-networks that show differential expression over user-specified conditions or time-points
Input data: microarray gene-expression attributes; mass-spectrometry protein abundance

Method
Calculate a z-score per node and a Z_A score per subgraph, corrected using randomly sampled expression data
Score over multiple experimental conditions
A simulated-annealing-based search method is used to find the high-scoring networks
Ideker T, Ozier O, Schwikowski B, Siegel AF. Bioinformatics. 2002;18 Suppl 1:S233-40
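The scoring part of this method can be sketched as follows, following Ideker et al. (2002): each node's p-value is converted to z_i = Φ^-1(1 - p_i), a subnetwork A of k nodes gets z_A = (Σ z_i) / √k, and z_A is standardized against the mean and standard deviation of scores of random node sets of the same size. Scoring over multiple conditions and the simulated-annealing search are not shown; the p-values and helper names are invented for illustration:

```python
import math
import random
from statistics import NormalDist


def node_z(p_value):
    """z_i = inverse normal CDF of (1 - p_i): smaller p-values give larger z."""
    p = min(max(p_value, 1e-12), 1 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(1 - p)


def raw_score(z, nodes):
    """z_A = (sum of z_i over the k nodes of A) / sqrt(k)."""
    return sum(z[n] for n in nodes) / math.sqrt(len(nodes))


def corrected_score(z, nodes, samples=1000, seed=0):
    """Standardize z_A against random node sets of the same size k."""
    rng = random.Random(seed)
    k = len(nodes)
    background = [raw_score(z, rng.sample(list(z), k)) for _ in range(samples)]
    mean = sum(background) / samples
    sd = math.sqrt(sum((s - mean) ** 2 for s in background) / (samples - 1))
    return (raw_score(z, nodes) - mean) / sd


# Invented p-values for six genes in one condition.
p_values = {"g1": 0.001, "g2": 0.01, "g3": 0.5, "g4": 0.8, "g5": 0.3, "g6": 0.9}
z = {gene: node_z(p) for gene, p in p_values.items()}
print(round(raw_score(z, ["g1", "g2"]), 2))
print(corrected_score(z, ["g1", "g2"]) > corrected_score(z, ["g4", "g6"]))  # True
```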

Finding active modules


jActiveModules plug-in
Input: interaction network and p-values for gene expression values over several conditions

Output: significant subnetworks that show differential expression over one or several conditions

Ideker T et al. Science 2001; Bioinformatics 2002


Cerebral: Cellular location and expression data


Concluding
Cytoscape is a proven, valuable tool for integrative bioinformatics.
Easily extensible: well suited to answering new biological research questions.
Analyses can be tedious for biologists; it is up to bioinformaticians to translate them into simple workflows.
Therefore: bioinformaticians, integrate into wet-lab research groups!

Some notes
Plugin lifetime: maintenance, interoperability

Visualization issues: standard biologist layouts, fancy visuals

Cytoscape 3.0 aims to solve these issues (amongst others)



Availability
Cytoscape:
http://cytoscape.org
cytoscape-discuss@googlegroups.com
cytoscape-helpdesk@googlegroups.com

R2
Available shortly through http://humangeneticsamc.nl
Keep yourself posted on http://groups.google.com/group/r2-announce
