T. Sobh and K. Elleithy (eds.), Advances in Systems, Computing Sciences and Software Engineering, 197–200.
© 2006 Springer.
198 UDOH AND BHUIYAN
The improved visualization features are attractive to any bioinformatics programmer, since the representations are intuitive.

II. ANALYSIS and DESIGN

In the first phase of developing bioinformatics software in a PHP/MySQL environment, this clustering system was developed on a Linux server hosting Tomcat 5.0, PHP 5.0, MySQL 4.1 and Ghostscript 8.15. The base technologies used for the analysis and design are the server-side scripting language PHP and a MySQL database for persistent storage. On execution, the PHP code retrieves the microarray data from the MySQL database. It converts the data into a usable format and then passes the output to the clustering software, which in turn sends the result to the dendrogram and Ghostscript programs for visualization. Some of the clustering algorithms programmed include single link, complete link, group average, weighted average, weighted centroid and Ward's method (Fig. 2).

As can be gleaned from Fig. 2, the PHP code calls the mathematical algorithms to perform the clustering and also provides an easy-to-use interface to the system. The user can specify a variety of parameters or algorithms through the web-based interface, while the results of the clustering are presented by showing which clusters are included in particular groups (Fig. 2). The program shows, for each cluster, the percentage of samples with and without the cancer malignancy. For example, if only a few clusters are used, the cancerous samples within those groups may be too mixed with normal samples (such as 30% cancerous and 70% normal within a cluster). In such a scenario, the number of clusters may be increased to allow a finer level of granularity when clustering. Attribute grouping and associative clustering explain similar dependencies and also offer improved classification of such genes [8, 9]. Below is a dendrogram produced for the analyzed sample (Fig. 3).

Cancerous samples are indicated by arrows. These cancerous samples are all in the first group, along with a few normal samples. It is clear in this example that almost all of the cancer samples are present in the first group, with a very high probability that any sample in that group will be cancerous (Fig. 3). A variation of this dendrogram can be obtained if another algorithm is selected (Fig. 4).
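
The six linkage methods named above can all be expressed through the Lance-Williams distance update, which may help in seeing how they differ. The following is a minimal sketch in Python, not the system's PHP code: the function names (`agglomerate`, `_coeffs`), the method labels and the toy points are assumptions made for illustration, and "weighted centroid" is taken here to mean the median (WPGMC) linkage.

```python
from itertools import combinations

# Methods whose update rule is defined on squared Euclidean distances.
SQUARED = {"weighted_centroid", "ward"}


def _coeffs(method, ni, nj, nk):
    """Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma)."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "group_average":
        n = ni + nj
        return ni / n, nj / n, 0.0, 0.0
    if method == "weighted_average":
        return 0.5, 0.5, 0.0, 0.0
    if method == "weighted_centroid":
        return 0.5, 0.5, -0.25, 0.0
    if method == "ward":
        n = ni + nj + nk
        return (ni + nk) / n, (nj + nk) / n, -nk / n, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method="single", n_clusters=2):
    """Merge the two closest clusters until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    members = {i: [i] for i in range(len(points))}
    dist = {}
    for i, j in combinations(range(len(points)), 2):
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        dist[(i, j)] = d2 if method in SQUARED else d2 ** 0.5

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(points)  # id given to each newly merged cluster
    while len(members) > n_clusters:
        # Pick the closest pair of current clusters.
        i, j = min(combinations(sorted(members), 2), key=lambda p: d(*p))
        ni, nj = len(members[i]), len(members[j])
        # Update distances from every other cluster to the merged one.
        for k in members:
            if k in (i, j):
                continue
            ai, aj, b, g = _coeffs(method, ni, nj, len(members[k]))
            dist[(k, next_id)] = (ai * d(k, i) + aj * d(k, j)
                                  + b * d(i, j) + g * abs(d(k, i) - d(k, j)))
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    # Toy points (hypothetical, not taken from the microarray data).
    toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    print(agglomerate(toy, method="ward", n_clusters=2))
```

Switching the `method` argument corresponds to selecting another algorithm in the web interface, and `n_clusters` corresponds to the number of clusters the user asks for. Recording the merge order and merge distances instead of stopping at `n_clusters` would give the information a dendrogram needs.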
                                                             34449_at   35150_at
Id   Sample_Tissue_Site  Sample_General_Pathologic_Category  CASP2      TNFRS
72   LIVER, NOS          NORMAL                              -0.64565   -0.71759
77   COLON, NOS          NORMAL                              0.38159    -0.32849
79   KIDNEY, NOS         MALIGNANT                           0.83026    -0.2525
83   LIVER, NOS          MALIGNANT                           0.047638   -0.81221
91   KIDNEY, NOS         MALIGNANT                           -0.22587   -0.45274
96   KIDNEY, NOS         NORMAL                              -0.0528    -0.21131
98   LUNG, NOS           MALIGNANT                           -0.02914   -0.2353
100  COLON, NOS          MALIGNANT                           0.65866    0.78431
101  LUNG, NOS           NORMAL                              0.23735    0.003307
109  LIVER, NOS          MALIGNANT                           -0.3751    -0.10789
117  LIVER, NOS          NORMAL                              -1.0638    -0.84062
118  COLON, NOS          NORMAL                              -0.88117   -0.7941
124  LIVER, NOS          NORMAL                              0.073264   -0.29181
Fig. 1: A cross-section of a microarray dataset used for the design of the system
(http://www.samsi.info/200304/dmml/web-internal/bio/data.html)
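
The per-cluster percentages of malignant and normal samples described in the text can be computed from cluster labels and the pathologic category column. The sketch below is in Python rather than the system's PHP and rests on assumptions: the function names, the 80% purity threshold and, above all, the cluster labels are hypothetical and are not output of the system. Only the category list is taken from the rows of Fig. 1.

```python
from collections import Counter, defaultdict


def cluster_composition(labels, categories):
    """Return {cluster: {category: percent}} for two parallel lists."""
    counts = defaultdict(Counter)
    for cluster, category in zip(labels, categories):
        counts[cluster][category] += 1
    return {
        cluster: {cat: 100.0 * n / sum(c.values()) for cat, n in c.items()}
        for cluster, c in counts.items()
    }


def mixed_clusters(composition, min_purity=80.0):
    """Clusters whose dominant category falls below min_purity percent.

    The threshold is an assumed value; a non-empty result suggests
    re-running the clustering with a larger number of clusters.
    """
    return sorted(c for c, pct in composition.items()
                  if max(pct.values()) < min_purity)


# Pathologic categories of the 13 samples in Fig. 1, in row order.
categories = [
    "NORMAL", "NORMAL", "MALIGNANT", "MALIGNANT", "MALIGNANT", "NORMAL",
    "MALIGNANT", "MALIGNANT", "NORMAL", "MALIGNANT", "NORMAL", "NORMAL",
    "NORMAL",
]
# Hypothetical cluster labels, made up purely for illustration.
labels = [1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2, 2, 1]

composition = cluster_composition(labels, categories)
for cluster in sorted(composition):
    print(cluster, composition[cluster])
print("clusters that look too mixed:", mixed_clusters(composition))
```

A cluster flagged by `mixed_clusters` corresponds to the "too mixed" case discussed in the text, where increasing the number of clusters may separate cancerous from normal samples more cleanly.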
CLUSTER-BASED MINING OF MICROARRAY DATA 199