Yuka Kutsumi, Olvi Tole, Lakshmi Mamidi, MS, Vincent K. Sam, MS, Guenter Tusch, PhD Medical and Bioinformatics Graduate Program, School of Computing and Information Systems, Grand Valley State University, Allendale, MI, USA
Summary
Temporal translational research is based on measurements that have been obtained at different points in time. Our web-based multi-user program helps the researcher find temporal patterns, such as peaks, in large microarray data sets that include data from different but related studies. This is accomplished by transforming the data into an abstract layer that is independent of the particular selection of time points in the individual studies.
System Description
We created a software tool using open-source platforms that supports the R statistical package, Bioconductor, and Web 2.0 knowledge representation standards using the open-source Semantic Web tool Protégé-OWL. We report here on the web interface that connects to programs based on R and Bioconductor. The SPOT project is implemented mainly in free open-source technology. The project is hosted on an Apache server (Fedora Linux). The project's database is built on MySQL. This database is accessed by the interface, to display the information that the user selects, and by the back-end code, which contains the logic of the application and is developed in R. The interface is developed in PHP, a scripting language that makes it easy to execute R commands and queries on the MySQL database. The R scripts perform different tasks depending on the step of the project that is being executed (fig. 1). At the beginning, the user selects the criteria of the experiments through a PHP form (fig. 2). PHP calls an R script that connects with the GEOmetadb [4] SQLite database. Then the user selects different algorithms or parameters in the system (fig. 3) that allow the system to train patterns to recognize, for instance, a peak in fold changes of the temporal expression data [2]. The program generates R code and OWL/SWRL rules (fig. 4). The researcher (user) can define different gene expression profile peaks, e.g., Early or Late in the time course, and search for similar profiles in the database of interesting studies. We developed tools to support this process using the Protégé-OWL ontology development toolkit (compare [3]). SWRL allows users to write rules that can be expressed in terms of OWL concepts and that can reason about OWL individuals. The Protégé-OWL plug-in makes it easy to build ontologies that are backed by OWL code.
Acknowledgements:
We would like to thank the following individuals, without whose support this study would not have been possible: Dr. Amar Das (Dartmouth), Martin O'Connor (Stanford U), Dr. Craig Webb, Dr. Jeremy Miller (VAI), Dr. Timothy Redmond, Dr. Mark Musen (Stanford U), Ramya Gunda and Jayashanti Gagginepally (GVSU).
SPOT:
S - Protégé OWL/SWRL Temporal Abstraction
[Figure: Learning Concepts from a Subset (Train & Test Data Set). Diagram labels: Microarray database; Select; Select training samples; Feedback; Annotate; Learn Intervals (R); Research database; Apply Intervals (R).]
Background
For stimulus-response studies, a researcher typically obtains a fold change profile and tries to retrieve similar profiles in microarray databases or clinical databases (which increasingly include microarray data, whole-genome sequencing, or other next-generation sequencing data). Peaks in gene profiles in temporal microarray studies represent a biological effect that is reversed after some time. Useful biological information regarding the genome of an organism can be determined by finding which genes are induced or repressed in a phase of the cell cycle. Sets of genes whose expression is regulated under the same condition are likely to have a related biological function or an evolutionary relationship. In the last decade, DNA chip technology has increasingly been used in molecular biology. NCBI GEO is a public database for microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community for gene expression analysis. For stimulus-response microarray studies, researchers typically obtain fold change expression profiles and try to retrieve similar profiles in open-source platforms [1]. However, these traditional approaches assume that the pattern of time points in all selected experiments is identical or very similar to the experimental design of the initial study. Temporal abstraction [2], i.e., creating interval-based abstractions from expression profiles, is not based on that assumption. Temporal abstraction is one of the methods used in the SPOT project.
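The idea of interval-based abstraction can be illustrated with a minimal Python sketch. It is not the SPOT implementation (which the poster describes as R code plus OWL/SWRL rules); the function names `abstract_trends` and `has_peak`, the `tolerance` threshold, the trend labels, and the example profiles are all hypothetical and chosen only for illustration. Each gap between consecutive measurements is labeled by the change in log2 fold change, adjacent gaps with the same label are merged into intervals, and a peak is taken to be an Increasing interval later followed by a Decreasing one.

```python
from typing import List, Sequence, Tuple

# (start_time, end_time, trend label); labels are illustrative assumptions
Interval = Tuple[float, float, str]


def abstract_trends(times: Sequence[float],
                    values: Sequence[float],
                    tolerance: float = 0.5) -> List[Interval]:
    """Turn a sampled profile into merged trend intervals.

    `tolerance` is a hypothetical threshold on the change between two
    consecutive measurements; smaller changes are labeled "Steady".
    """
    intervals: List[Interval] = []
    for i in range(len(times) - 1):
        change = values[i + 1] - values[i]
        if change > tolerance:
            label = "Increasing"
        elif change < -tolerance:
            label = "Decreasing"
        else:
            label = "Steady"
        if intervals and intervals[-1][2] == label:
            # Same trend as the previous gap: extend that interval.
            start, _, _ = intervals[-1]
            intervals[-1] = (start, times[i + 1], label)
        else:
            intervals.append((times[i], times[i + 1], label))
    return intervals


def has_peak(intervals: Sequence[Interval]) -> bool:
    """True if an Increasing interval is followed by a Decreasing one.

    Steady intervals are skipped, so a plateau between the rise and the
    fall still counts as a peak.
    """
    labels = [label for _, _, label in intervals if label != "Steady"]
    return any(a == "Increasing" and b == "Decreasing"
               for a, b in zip(labels, labels[1:]))


# Two made-up studies with different sampling schedules (hours).
study_a = ([0, 1, 2, 4, 8], [0.0, 1.2, 2.5, 1.0, 0.1])
study_b = ([0, 3, 6, 12, 24, 48], [0.0, 0.2, 1.8, 2.9, 0.6, 0.3])

for name, (hours, log2_fc) in (("A", study_a), ("B", study_b)):
    trend_intervals = abstract_trends(hours, log2_fc)
    print(name, trend_intervals, "peak:", has_peak(trend_intervals))
```

Because the comparison is made on the sequence of trend labels rather than on the raw time points, both made-up studies are recognized as containing a peak even though their sampling schedules differ. Thresholding the raw change instead of a slope is a simplification made for this sketch.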
References:
1. Tusch G, Bretl C, O'Connor M, Das A. SPOT--towards temporal data mining in medicine and bioinformatics. AMIA Annu Symp Proc. 2008: 1157.
2. Shahar Y, Musen M. Knowledge-based temporal abstraction in clinical domains. Artif Intell Med (1996), 8(3): 267-98.
3. O'Connor MJ, Shankar RD, Parrish DB, Das AK. Knowledge-Data Integration for Temporal Reasoning in a Clinical Trial System. Int J Med Inform (2008), doi: 10.1016/j.ijmedinf.2008.07.013
4. Zhu Y, Davis S, Stephens RM, Meltzer PS, Chen Y. GEOmetadb: powerful alternative search engine for the Gene Expression Omnibus. Bioinformatics (2008): 2798-2800.
5. Vázquez-Chona F, Song BK, Geisert EE Jr. Temporal changes in gene expression after injury in the rat retina. Invest Ophthalmol Vis Sci (2004), 45(8): 2737-46.
6. http://gbnci.abcc.ncifcrf.gov/geo/gds_subset.php
[Figure: Ontology. Diagram labels: ValidTime; StartTime; FinishTime; has ValidTime.]