Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 3, May-June 2013 ISSN 2278-6856
A COMPARATIVE STUDY AND ANALYSIS FOR MICROARRAY GENE EXPRESSION DATA USING CLUSTERING TECHNIQUES
G. Baskar(1), Dr. P. Ponmuthuramalingam(2)
(1,2) Department of Computer Science, Government Arts College (Autonomous), Coimbatore, Tamil Nadu, INDIA
1. INTRODUCTION
Identifying biomarkers for cancer diagnosis is an important problem in cancer genomics, and gene expression microarrays are used to identify candidate genes in various cancer studies. Gene expression profiling, or microarray analysis, has enabled the measurement of thousands of genes in a single RNA sample. A variety of microarray platforms have been developed to accomplish this; the basic idea of each is a glass slide or membrane on which probes are spotted or arrayed, which is then used to determine whether a gene is present or absent. Feature selection approaches have been applied to the identification of differentially expressed genes in microarray data. Microarray samples are often classified by cell line or tumour type, and it is of interest to discover a set of genes that can be used as class predictors; the leukemia dataset of Golub et al. is a widely used example. Both supervised and unsupervised classifiers have been used to build classification models from microarray data. Cluster analysis groups objects into clusters such that objects in the same cluster are very similar and objects in different clusters are quite distinct. Data mining helps to convert such data into useful information. On the chip, red indicates up-regulation and green indicates down-regulation [3].
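The red/green reading of a spot can be made concrete with a log ratio of the two channel intensities. The sketch below is illustrative only: it assumes a two-colour array in which the red channel carries the test sample and the green channel the reference (a common convention, not one stated in this paper), and the function name `regulation_call` and its two-fold default `threshold` are hypothetical choices.

```python
import math


def regulation_call(red: float, green: float, threshold: float = 1.0) -> str:
    """Classify a spot from its red and green channel intensities.

    Assumption (hypothetical convention): red = test sample, green = reference.
    A log2 ratio of at least +threshold is called up-regulated (red),
    at most -threshold down-regulated (green), anything between unchanged.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(red / green)
    if log_ratio >= threshold:
        return "up"
    if log_ratio <= -threshold:
        return "down"
    return "unchanged"


print(regulation_call(800.0, 200.0))  # log2(4) = 2, prints up
print(regulation_call(150.0, 600.0))  # log2(0.25) = -2, prints down
print(regulation_call(500.0, 450.0))  # log2(1.11) is about 0.15, prints unchanged
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric (+1 and -1), which is why the threshold is applied to the log ratio rather than to the raw ratio.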
Figure 1: Preparation of a microarray

In this paper we analyse clustering algorithms on cancer datasets and compare the results with respect to accuracy.
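Because clustering produces arbitrary cluster ids rather than class names, accuracy has to be computed after matching clusters to classes. This section does not state the exact definition used, so the sketch below assumes one common choice: the fraction of samples correctly grouped under the best one-to-one mapping of clusters to classes. The function name `clustering_accuracy` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of samples whose cluster maps to their true class under the
    best one-to-one cluster-to-class assignment.

    Brute force over permutations, so only practical for a few clusters.
    """
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # How many samples of each class landed in each cluster.
    counts = Counter(zip(cluster_ids, true_labels))
    # Pad with None so surplus clusters can stay unmatched (they score 0).
    candidates = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(candidates, len(clusters)):
        correct = sum(counts[(cl, lab)] for cl, lab in zip(clusters, assignment))
        best = max(best, correct)
    return best / len(true_labels)


truth = ["ALL", "ALL", "AML", "AML", "AML"]
found = [1, 1, 0, 0, 1]
print(clustering_accuracy(truth, found))  # prints 0.8
```

For larger numbers of clusters the permutation search grows factorially; a Hungarian-algorithm solver would find the same best mapping in polynomial time.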
REFERENCES
[1] G. H. John, R. Kohavi, and K. Pfleger, Irrelevant Features and the Subset Selection Problem, in Proc. Eleventh International Conference on Machine Learning, pp. 121-129, 1994.
[2] H. Liu and L. Yu, Toward Integrating Feature Selection Algorithms for Classification and Clustering, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.
[3] C. Ding and H. Peng, Minimum Redundancy Feature Selection from Microarray Gene Expression Data, in Proc. Computational Systems Bioinformatics Conference (CSB 2003), pp. 523-529, 2003.
[4] L. Yu and H. Liu, Efficient Feature Selection via Analysis of Relevance and Redundancy, Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[5] M. S. Pepe, R. Etzioni, Z. Feng, et al., Phases of Biomarker Development for Early Detection of Cancer, J. Natl. Cancer Inst., vol. 93, pp. 1054-1060, 2001.
[6] T. Li, C. Zhang, and M. Ogihara, A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression, Bioinformatics, vol. 20, pp. 2429-2437, 2004.
[7] H. Liu and L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 17, no. 4, pp. 491-502, 2005.
[8] M. Wasikowski and X. Chen, Combating the small sample class imbalance problem using feature selection, IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1388-1400, 2010.
[9] C. A. Davis, F. Gerick, V. Hintermair, et al., Reliable gene signatures for microarray classification: assessment of stability and performance, Bioinformatics, vol. 22, pp. 2356-2363, 2006.
[10] C. A. Davis, F. Gerick, V. Hintermair, et al., Reliable gene signatures for microarray classification: assessment of stability and performance, Bioinformatics, vol. 22, pp. 2356-2363, 2006.
[11] J. A. Lozano, J. M. Pena, and P. Larranaga, An empirical comparison of four initialization methods for the k-means algorithm, Lett., vol. 20, pp. 1027-1040, 1999.
[12] T. O. Nielsen, R. B. West, S. C. Linn, et al., Molecular characterization of soft tissue tumours: a gene expression study, Lancet, 2002.
[13] G. Baskar and D. Napoleon, Message Passing between Data Point on Clustering Algorithm for Gene Leukemia Dataset, IJARCS, vol. 1, no. 4, Nov-Dec 2010.
$$ v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^m \, x_i}{\sum_{i=1}^{n} (\mu_{ij})^m}, \qquad j = 1, 2, \ldots, c $$

4) Repeat steps 2) and 3) until the minimum value of J is achieved or $\|U^{(k+1)} - U^{(k)}\| < \beta$, where k is the iteration step, $\beta \in [0, 1]$ is the termination criterion, $U = (\mu_{ij})_{n \times c}$ is the fuzzy membership matrix, and J is the objective function.
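The iteration above can be sketched in NumPy as follows. Only the centre update, the loop and the stopping rule on $\|U^{(k+1)} - U^{(k)}\|$ come from the text here; the random initial memberships and the membership update $\mu_{ij} = 1 / \sum_{k} (d_{ij}/d_{ik})^{2/(m-1)}$ are assumptions taken from the standard fuzzy c-means formulation, since steps 1) and 2) are not reproduced in this section. The function name `fuzzy_c_means` and its defaults (`m=2.0`, `beta=1e-5`, `max_iter=300`) are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, beta=1e-5, max_iter=300, seed=0):
    """Minimal fuzzy c-means sketch.

    X: (n, d) data matrix, c: number of clusters, m > 1: fuzzifier,
    beta: termination threshold on ||U(k+1) - U(k)||.
    Returns (U, V): (n, c) membership matrix and (c, d) cluster centres.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed initialisation: random memberships, each row summing to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centre update: v_j = sum_i mu_ij^m x_i / sum_i mu_ij^m.
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Assumed standard membership update from distances d_ij = ||x_i - v_j||.
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)  # guard against a point sitting exactly on a centre
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        # Stop once the membership matrix changes by less than beta.
        converged = np.linalg.norm(U_new - U) < beta
        U = U_new
        if converged:
            break
    # Recompute centres from the final memberships so U and V agree.
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V = fuzzy_c_means(X, c=2)
print(np.round(V[np.argsort(V[:, 0])], 1))
```

Hard cluster labels, for example to score against known classes, can be taken as `U.argmax(axis=1)`.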
5. DATA SETS
Colon tumour is a disease in which cancerous growths (tumours) are found in the tissues of the colon. This dataset contains 62 samples: 40 biopsies are from tumours (labelled "negative") and 22 normal biopsies (labelled "positive") are from healthy parts of the colons of the same patients. The total number of genes to be tested is 2000 (Alon et al., 1999).

Leukemias are primary disorders of the bone marrow; they are malignant neoplasms of hematopoietic stem cells. The total number of genes to be tested is 7129, and the number of samples is 72, all from acute leukemia patients, either acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) (Golub et al., 1999).
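Since the algorithms are compared on accuracy, it may help to know what a trivial grouping would score on these class proportions. The sketch below computes a majority-class baseline, that is, the accuracy obtained by putting every sample in one cluster and mapping it to the largest class, using the colon class sizes stated above. The function name `majority_baseline` is hypothetical, and the leukemia class split is not given in this section, so it is not computed here.

```python
def majority_baseline(class_counts):
    """Accuracy of a trivial grouping that assigns every sample to one
    cluster and maps that cluster to the largest class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class_counts must contain at least one sample")
    return max(class_counts.values()) / total


# Class sizes for the colon data as given in the text above.
colon = {"tumour": 40, "normal": 22}
print(f"{majority_baseline(colon):.3f}")  # prints 0.645
```

A clustering result on the colon data is therefore only informative to the extent that its accuracy clearly exceeds about 64.5%.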
6.
Graph 1: Average accuracy of the algorithms
Authors' profiles
[1] G. Baskar received his Master's degree in Information Technology from K.S. Rangasamy College of Technology, Tiruchengode, Tamil Nadu, India, in 2008 and his M.Phil. degree in Computer Science from Bharathiar University, Coimbatore, Tamil Nadu, India, in 2010. He has been working towards the Ph.D. degree in the Department of Computer Science, Government Arts College, Coimbatore, Tamil Nadu, India, since 2011. His areas of interest include data mining and bioinformatics.
[2] P. Ponmuthuramalingam received his Master's degree in Computer Science from Alagappa University, Karaikudi, in 1988 and his Ph.D. in Computer Science from Bharathiar University, Coimbatore. He is working as Associate Professor and Head of the Department of Computer Science, Government Arts College (Autonomous), Coimbatore. His research interests include text mining, the Semantic Web, network security and parallel algorithms.