
Finding Temporal Pattern in Gene Expression Profiles

Yuka Kutsumi, Olvi Tole, Lakshmi Mamidi, MS, Vincent K. Sam, MS, Guenter Tusch, PhD
Medical and Bioinformatics Graduate Program, School of Computing and Information Systems, Grand Valley State University, Allendale, MI, USA
Summary
Temporal translational research is based on measurements obtained at different points in time. Our web-based, multi-user program helps researchers find temporal patterns, such as peaks, in large microarray data sets that combine data from different but related studies. It does so by transforming the data into an abstract layer that is independent of the particular time points chosen in the individual studies.

System Description
We created a software tool on open-source platforms that supports the R statistical package, Bioconductor, and Web 2.0 knowledge representation standards through the open-source Semantic Web tool Protégé-OWL. We report here on the web interface that connects to programs based on R and Bioconductor. The SPOT project is implemented mainly with free, open-source technology. It is hosted on an Apache server (Fedora Linux), and its database is built on MySQL. The database is accessed both by the interface, to display the information the user selects, and by the back-end code, which contains the application logic and is developed in R. The interface is developed in PHP, a scripting language that makes it easy to execute R commands and to query the MySQL database.

The R scripts perform different tasks depending on the step of the project being executed (fig. 1). At the beginning, the user selects the criteria for the experiments through a PHP form (fig. 2). PHP calls an R script that connects to the GEOmetadb [4] SQLite database. The user then selects algorithms or parameters (fig. 3) that allow the system to train patterns to recognize, for instance, a peak in the fold changes of the temporal expression data [2]. The program generates R code and OWL/SWRL rules (fig. 4). The researcher (user) can define different gene expression profile peaks, e.g., Early or Late in the time course, and search for similar profiles in the database of interesting studies. We developed tools to support this process using the Protégé-OWL ontology development toolkit (compare [3]). SWRL allows users to write rules that are expressed in terms of OWL concepts and that can reason about OWL individuals. The Protégé-OWL plug-in makes it easy to build ontologies that are backed by OWL code.

Acknowledgements:
We would like to thank the following individuals, without whose support this study would not have been possible: Dr. Amar Das (Dartmouth), Martin O'Connor (Stanford U), Dr. Craig Webb, Dr. Jeremy Miller (VAI), Dr. Timothy Redmond, Dr. Mark Musen (Stanford U), Ramya Gunda and Jayashanti Gagginepally (GVSU).
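The dataset-selection step in the System Description (a form that triggers a query against the GEOmetadb SQLite database) can be illustrated with a minimal sketch. It is written in Python rather than the project's PHP and R, and it runs against a tiny in-memory table. The table name `gds`, its three columns, and the dataset IDs are simplified stand-ins invented for illustration, not the real GEOmetadb schema or real GEO records.

```python
import sqlite3


def find_datasets(conn, organism, keyword):
    """Return (gds, title) rows for one organism whose title contains keyword."""
    query = (
        "SELECT gds, title FROM gds "
        "WHERE sample_organism = ? AND title LIKE ? "
        "ORDER BY gds"
    )
    # Parameter binding keeps user-supplied form values out of the SQL text.
    return conn.execute(query, (organism, f"%{keyword}%")).fetchall()


def build_demo_db():
    """Create a toy in-memory table; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gds (gds TEXT, title TEXT, sample_organism TEXT)")
    conn.executemany(
        "INSERT INTO gds VALUES (?, ?, ?)",
        [
            ("GDS0001", "Retinal injury time course", "Rattus norvegicus"),
            ("GDS0002", "Embryo development time course", "Drosophila melanogaster"),
            ("GDS0003", "Heat shock response", "Drosophila melanogaster"),
        ],
    )
    return conn


demo = build_demo_db()
hits = find_datasets(demo, "Drosophila melanogaster", "time course")
print(hits)
```

With a real GEOmetadb file, the connection would point at the downloaded SQLite file instead of `:memory:`, and the column names would have to be checked against the actual schema.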

User Interface and Evaluation


The R/Protégé interface is implemented as a Java program that interfaces with R and Protégé through their respective APIs. The user selects the algorithms or parameters in the system via pull-down menus. The program then generates the corresponding R macros and OWL/SWRL code. The user interface was evaluated with regard to functionality, robustness, user friendliness, and the number of clicks necessary to achieve one's goal. The SPOT interface has been improved in terms of readability. It includes warning messages and user guidance, and the core search results are reached after only three pages. The evaluation was performed mainly by running sample gene data obtained from the Meltzer lab GEO site [6]. Multiple platforms were tested, and the corresponding GDS numbers were verified for all of them. In addition, a time series sample was tested, and the expected GDS number was confirmed. We also tested the search by organism with Drosophila melanogaster; the query returned all expected datasets. As a next step, we need to establish standard test and validation criteria and have more biologists evaluate the application.

SPOT: S - Protégé OWL/SWRL Temporal Abstraction

[Diagram: SPOT workflow in two phases.
Learning Concepts from a Subset (Train & Test Data Set): Microarray database; Select training samples; Annotate; Learn Intervals (R); Create Temporal Concepts (Protégé OWL/SWRL); Select IDs (SWRL); Feedback.
Searching for Learned Concepts in Database: Research database; Apply Intervals (R); Apply Temporal Concepts (Protégé OWL/SWRL); Select subset (SWRL/SQL).]

Background
For stimulus-response studies, a researcher typically obtains a fold-change profile and tries to retrieve similar profiles from microarray databases or clinical databases (which increasingly include microarray data, whole-genome sequencing, or other next-generation sequencing data). Peaks in gene profiles in temporal microarray studies represent a biological effect that is reversed after some time. Useful biological information about the genome of an organism can be obtained by finding which genes are induced or repressed in a given phase of the cell cycle. Sets of genes whose expression is regulated under the same condition are likely to have a related biological function or an evolutionary relationship. In the last decade, DNA chip technology has increasingly been used in molecular biology. NCBI GEO is a public database for microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community for gene expression analysis. Researchers running stimulus-response microarray studies typically search open-source platforms for profiles similar to their own fold-change expression profile [1]. However, these traditional approaches assume that the pattern of time points in all selected experiments is identical or very similar to the experimental design of the initial study. Temporal abstraction [2], i.e., creating interval-based abstractions from expression profiles, does not rely on that assumption. Temporal abstraction is one of the methods used in the SPOT project.
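To make the idea of interval-based abstraction concrete, here is a minimal Python sketch. It is not the SPOT implementation: the function names, the threshold of 0.5 on the change between consecutive measurements, and the sample values are all assumptions chosen for illustration. Each step between two time points is labeled INCREASE, DECREASE, or STEADY, and consecutive steps with the same label are merged into one interval.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float   # time of the first measurement in the interval
    finish: float  # time of the last measurement in the interval
    trend: str     # "INCREASE", "DECREASE" or "STEADY"


def classify(delta, threshold):
    """Label a change between two consecutive measurements."""
    if delta > threshold:
        return "INCREASE"
    if delta < -threshold:
        return "DECREASE"
    return "STEADY"


def abstract_profile(times, values, threshold=0.5):
    """Turn a time series into merged trend intervals (threshold is hypothetical)."""
    intervals = []
    for i in range(1, len(times)):
        trend = classify(values[i] - values[i - 1], threshold)
        if intervals and intervals[-1].trend == trend:
            intervals[-1].finish = times[i]  # extend the current interval
        else:
            intervals.append(Interval(times[i - 1], times[i], trend))
    return intervals


# Two made-up studies sampled at different time points (in days).
study_a = abstract_profile([0, 0.25, 0.5, 1, 2], [0.0, 1.0, 2.0, 0.5, 0.4])
study_b = abstract_profile([0, 0.5, 1.5], [0.0, 1.8, 0.2])

print([iv.trend for iv in study_a])  # ['INCREASE', 'DECREASE', 'STEADY']
print([iv.trend for iv in study_b])  # ['INCREASE', 'DECREASE']
```

Although the two studies use different sampling schedules, both reduce to an INCREASE interval followed by a DECREASE interval, so they can be compared at the level of intervals rather than raw time points.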

Figure 1: SPOT Overview

Figure 2: Selection of data of interest

References:
1. Tusch G, Bretl C, O'Connor M, Das A. SPOT--towards temporal data mining in medicine and bioinformatics. AMIA Annu Symp Proc (2008): 1157.
2. Shahar Y, Musen M. Knowledge-based temporal abstraction in clinical domains. Artif Intell Med (1996), 8(3): 267-98.
3. O'Connor MJ, Shankar RD, Parrish DB, Das AK. Knowledge-Data Integration for Temporal Reasoning in a Clinical Trial System. Int J Med Inform (2008), doi: 10.1016/j.ijmedinf.2008.07.013
4. Zhu Y, Davis S, Stephens RM, Meltzer PS, Chen Y. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics (2008): 2798-2800.
5. Vázquez-Chona F, Song BK, Geisert EE Jr. Temporal changes in gene expression after injury in the rat retina. Invest Ophthalmol Vis Sci (2004), 45(8): 2737-46.

Figure 3: Select time intervals for training


SWRL Code Example
Tissue(?tissue) ^
hasExperiment(?tissue, ?exp) ^
hasGene(?exp, ?gene) ^
hasGeneName(?gene, ?geneName) ^
hasOutputType(?gene, ?outputType) ^
swrlb:equal(?outputType, "INCREASE") ^
temporal:hasValidTime(?gene, ?tVT) ^
hasGene(?exp, ?gene2) ^
hasGeneName(?gene2, ?geneName2) ^
swrlb:equal(?geneName2, ?geneName) ^
hasOutputType(?gene2, ?outputType2) ^
swrlb:equal(?outputType2, "DECREASE") ^
temporal:hasValidTime(?gene2, ?tVT2) ^
temporal:meets(?tVT, ?tVT2, "days") ^
temporal:hasStartTime(?tVT, ?startTime) ^
temporal:hasFinishTime(?tVT2, ?finishTime) ^
swrlb:lessThanOrEqual(?finishTime, 1) ^
swrlx:createOWLThing(?hbVT, ?exp)
-> temporal:ValidPeriod(?hbVT) ^
temporal:hasStartTime(?hbVT, ?startTime) ^
temporal:hasFinishTime(?hbVT, ?finishTime) ^
hasEarlyPeak(?exp, ?hbVT)
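The logic of this rule can be paraphrased in ordinary code. The Python sketch below is an illustrative approximation, not generated SPOT output: it assumes the intervals of one gene in one experiment are already available as (start, finish, trend) tuples sorted by start time, with times in days, and the function name and the `latest_finish` parameter are hypothetical. It reports an early peak when an INCREASE interval meets a DECREASE interval and the DECREASE interval finishes no later than day 1.

```python
def find_early_peak(intervals, latest_finish=1.0):
    """Return (start, finish) of the first early peak, or None.

    intervals: list of (start, finish, trend) tuples sorted by start, in days.
    A peak is an INCREASE interval that meets a DECREASE interval, i.e. the
    first interval finishes exactly where the second one starts.
    """
    for (s1, f1, t1), (s2, f2, t2) in zip(intervals, intervals[1:]):
        meets = f1 == s2
        if t1 == "INCREASE" and t2 == "DECREASE" and meets and f2 <= latest_finish:
            # Mirrors the rule head: the peak period runs from the start of
            # the increase to the finish of the decrease.
            return (s1, f2)
    return None


early = [(0, 0.5, "INCREASE"), (0.5, 1.0, "DECREASE"), (1.0, 2.0, "STEADY")]
late = [(0, 2, "STEADY"), (2, 3, "INCREASE"), (3, 4, "DECREASE")]

print(find_early_peak(early))  # (0, 1.0)
print(find_early_peak(late))   # None
```

Unlike the SWRL rule, which matches any pair of intervals with the same gene name in the same experiment, this sketch only compares adjacent intervals; for a sorted, gap-free list of intervals the two are equivalent for the "meets" relation.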

Ontology
[Diagram labels: Tissue, Name, hasExperiment, Experiment, hasGene, GeneName, OutputType, IntervalEvent, hasValidTime, ValidTime, StartTime, FinishTime.]

6. http://gbnci.abcc.ncifcrf.gov/geo/gds_subset.php

Figure 4: SWRL Code for the concept Early Peak
