International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org
Volume 3, Issue 3, May June 2014 ISSN 2278-6856
Volume 3, Issue 3 May June 2014 Page 65
Abstract: Data mining is the process of analyzing data in its different forms and patterns and summarizing it into valuable information; it is also known as knowledge discovery. Data mining techniques are used to find hidden or new patterns in stored data, and they can be applied in every sector, such as business, agriculture and marketing. In this paper we describe several classification and clustering techniques that are applied to the education sector to support its growth. A large number of students take admission in different courses in every college or university, and the institutions collect information from these students at the time of admission and store it in computers. This study can help the learner and teaching communities to improve their performance. Clustering is used to divide data into groups according to their characteristics; these groups are called clusters. Classification techniques are used to assign data to predefined classes. These techniques can also be combined with other specific discovery techniques or algorithms. This paper describes various data mining approaches and techniques that can be applied to educational data to build a new environment, improve performance and support new predictions from the data. We present a comparative study of classification techniques, such as Bayes net, naive Bayes and decision tree, and of clustering techniques, such as k-means, hierarchical clustering, OPTICS and DBSCAN.
Keywords: Data mining, Classification, Clustering, Classification methods, Clustering algorithms, Data mining applications.
1. INTRODUCTION
Data mining has the ability to find existing relationships and patterns in data. It combines machine learning, statistics and visualization techniques to discover and extract knowledge.
Figure 1.1: Data mining applications in the education sector [2]
The application of data mining in the education sector is an emerging trend. Data mining terms, tasks, techniques and applications can be used to develop data mining in the education sector [1]. Clustering and classification are both very useful for improving performance in the education sector.
2. LITERATURE REVIEW
S. Anupama Kumar and M. N. Vijayalakshmi [3] illustrate how various data mining techniques, such as classification and clustering, are applied to student databases. Their study can help the learner and teaching communities to improve performance. These techniques can also be combined with other specific discovery models to increase the capacity of the model. The paper explains several data mining techniques in relation to educational data in order to design a new environment. Its result is that the education system can enhance its performance by using data mining techniques, and that every method has its own key area in which it performs accurately.

Umamaheswari and S. Niraimathi [4] present various data mining techniques used to analyze student records in order to categorize students into grade order across their studies, which helps in interview situations. The paper examines which factors help to rank students for the recruitment process. As a result, eligible students can be discovered easily and the short-listing effort is reduced. The paper concludes that data mining techniques can be used efficiently to manage the performance level of students. Classification is a data mining technique that is used to categorize students accurately based on their levels. Clustering is an important data mining function for analyzing the distribution of information in data sources, and cluster analysis is an important research topic. This goes a long way towards making the recruitment process easier to define.

Dr. Mohd Maqsood Ali [5] presents the role of data mining in the education sector. Every university, public or private, and its colleges enroll thousands of students into various courses or programs every year.
They collect information from students at the time of admission and store it in computers. Before using the data, we must understand its nature, because the data can be used for classifying and predicting students' behavior, performance and dropout, as well as teachers' performance. The paper explains the application of data mining in the education sector: the various techniques of data mining can be used in education just as they are used in industry or business. Data mining can be applied for classifying and clustering students' characteristics based on demographic, psychographic and behavioral variables. It can also be applied using if-then rules. In addition, it can define the profiles of successful and unsuccessful students based on the GPA achieved during the semesters.

M. Sukanya, S. Biruntha, Dr. S. Karthik and T. Kalaikumaran [6] show that performance in the education sector can be improved by using classification and clustering, since data mining techniques are able to predict the performance of students in an educational environment. Students' academic performance depends on diverse factors such as personal, social, psychological and other environmental variables. Data mining is a process used to find hidden patterns and relationships in large amounts of data, which is very helpful in decision making. It helps the education sector to capture and compile information at low cost. Because information and communication technology is used for this purpose, educational databases are growing rapidly with the large amount of data stored in them. The data mining approach creates useful information from existing students to manage relationships with upcoming students. This helps teachers to improve the performance of students and to identify the students who need special attention, so that action can be taken at the right time to reduce the failure ratio. A Bayesian classification method can be applied to a student database to predict students' divisions on the basis of the previous year's database.

A Comparative Study on Role of Data Mining Techniques in Education: A Review
Suman #1, Mrs. Pooja Mittal *2
#1 Student of Masters of Technology, Department of Computer Science and Application, M.D. University, Rohtak, Haryana, India
*2 Assistant Professor, Department of Computer Science and Application, M.D. University, Rohtak, Haryana, India

Data mining is a powerful analytical tool that enables educational institutions to better allocate resources and staff and to proactively manage student outcomes. Student performance in university courses is of great concern to higher education management, and several factors may affect it. Data mining extracts hidden information with the help of different mining techniques.

Richard A. Huebner of Norwich University [7] presents a survey of educational data mining research. Educational data mining (EDM) is an emerging discipline that applies data mining techniques to educational data. The discipline focuses on analyzing educational data to develop models for improving learning experiences and institutional effectiveness. EDM is an area full of exciting opportunities for researchers and practitioners. The field assists higher educational institutions with efficient ways to improve institutional effectiveness and student learning. Data mining is a significant tool for helping organizations to enhance decision making and to analyze new patterns and relationships in large amounts of data.

Bharat Chaudhari and Manan Parikh [8] present a comparative study of clustering algorithms using the Weka tool. Clustering is a process in which data is divided into different clusters according to its characteristics. The data in one cluster differs from the data in another cluster, but within a cluster the data is homogeneous. The paper compares the performance of clustering algorithms in terms of their class-wise cluster building ability. Its outcome is that k-means is better than the other clustering algorithms considered (hierarchical clustering and density-based clustering); it produces quality results when a large amount of data is used.

Sharaf Ansari, Sailendra Chetlur, Srikanth Prabhu, N. Gopalakrishna Kini, Govardhan Hegde and Yusuf Hyder [9] provide an overview of clustering algorithms used in data mining. Such algorithms play an important role in our lives, because we need a great deal of information and data mining is the process of extracting data and recognizing patterns. The paper gives an overview of some cluster analysis techniques such as DBSCAN, OPTICS, STING and CLIQUE.

Narendra Sharma, Aman Bajpai and Mr. Ratnesh Litoriya [10] compare various clustering algorithms using the Weka tool. There are various data mining tools for analyzing data. They allow users to analyze data from different dimensions or angles, categorize it, and summarize the relationships identified. Weka is one such data mining tool. The main objective is to compare the different clustering algorithms of Weka and to find out which algorithm is most suitable for users. Every algorithm has its own importance and is chosen according to the behavior of the data, but on the basis of this research the authors found that k-means is the simplest of the algorithms compared.

Bakar, A. A., Kefli, Z., Abdullah, S., Sahani,
M. [11] present a paper on predictive models for dengue outbreak using multiple rule base classifiers. The main aim is to develop predictive models for dengue outbreak detection using multiple rule-based classifiers. The rule-based classifiers used are the decision tree, rough set classifier, naive Bayes and associative classifier. Dengue fever (DF) and dengue hemorrhagic fever (DHF) have continuously been a public health issue in Malaysia and a growing pandemic, as reported by the World Health Organization (WHO). The purpose of the classification modeling is to build a predictive model for predicting dengue outbreaks. Since no previous research had used this data for predictive modeling, several classifiers are investigated to study the performance of the rule-based classifiers individually and in combination.

Johns, S. and Santos, M. V. [12] present a paper on the evolution of neural networks for pairwise classification using gene expression programming. Neural networks are a common choice for solving classification problems,
but require experimental adjustments of the topology to be effective.

Oyelade, O. J., Oladipupo, O. O. and Obagbuwa, I. C. [13] describe the application of the k-means clustering algorithm to the analysis of students' academic performance. The main aim is to analyze students' performance using a k-means clustering implementation. They combine the k-means model with a deterministic model to analyze the results of students of a private institution in Nigeria, which is a good benchmark for monitoring the progression of academic performance of students in higher institutions so that academic planners can make effective decisions. They examine the predictive power of the clustering algorithm, with the Euclidean distance as the measure of similarity, and report better results than the earlier k-means model.

Madhuri V. Joseph, Lipsa Sadath and Vanaja Rajan [14] compare various techniques in data mining. Data mining has various techniques that are used to explore important data in bulk amounts of data and that can deal with different data types. The paper explains and compares some common data mining techniques that are mainly used in daily life and in the business environment. Every data mining technique has an important role in the business environment according to its functionality, so there is no single model that plays every role.

Aastha Joshi and Rajneet Kaur [15] describe clustering as a process in which we find structure in unlabeled data. The paper reviews six types of clustering techniques, among them k-means clustering, hierarchical clustering, DBSCAN clustering, OPTICS and STING. The result of the comparison is that k-means is better for large data sets, in that its performance increases as the number of clusters increases. The hierarchical algorithm is useful for categorical data.
Density-based methods such as OPTICS and DBSCAN are designed to find clusters of arbitrary shape, whereas partitioning and hierarchical methods are designed to find spherical-shaped clusters.

Shiv Pratap Singh Kushwah, Keshav Rawat and Pradeep Gupta [16] present a comparison of data mining algorithms for clustering. The paper covers classification and clustering techniques. Data mining is used in every field for the analysis of large volumes of data. The paper reports that the k-means approach it describes is less sensitive to initialization and provides results at multiple resolutions, but that the k-means algorithm is sensitive to the presence of outliers. KNN classification is easy to understand and easy to implement.
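To illustrate why KNN classification is regarded as easy to implement, the following minimal Python sketch classifies a student by majority vote among the nearest training records. The attribute names, the sample records and the function name knn_predict are hypothetical and are not taken from any of the reviewed papers.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort the training records by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (attendance %, internal marks) -> result.
train = [
    ((90, 80), "pass"), ((85, 75), "pass"), ((95, 88), "pass"),
    ((40, 30), "fail"), ((50, 35), "fail"), ((45, 20), "fail"),
]

print(knn_predict(train, (88, 70)))  # nearest three records are all "pass"
```

The sketch has no training phase at all: the stored records are the model, which is what makes the method simple, at the price of scanning all records for every prediction.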
3. DATA MINING TECHNIQUES
Data mining methods such as prediction, clustering and relationship mining are mostly used in fields such as marketing, agriculture and finance. These methods can also be applied efficiently to educational data. Data mining offers many types of techniques, such as clustering, classification and neural networks, but in this paper we consider only two of them: clustering and classification.
Clustering techniques
Classification techniques
Prediction
Association rule
Neural networks
Decision trees
Nearest neighbor method
Figure 3.1: Data mining techniques [3]
4. CLUSTERING
Clustering is used to organize data into groups according to their values, characteristics, similarities and dissimilarities. In this approach, data of the same type are stored in the same group, and these groups are known as clusters; data is heterogeneous between two clusters. Clustering can be applied to a group of schools to investigate the similarities and differences between them, and students can be clustered together to study the differences in their behavior [3]. Many types of clustering algorithm are available:
PARTITIONING METHODS
o K-means method
o K-medoids method
HIERARCHICAL METHODS
o Agglomerative
o Divisive
GRID BASED METHODS
DENSITY BASED METHODS
o DBSCAN
Fig 4.1: Clustering techniques
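As an illustration of the density-based family, the following is a minimal Python sketch of the DBSCAN idea: a point with at least min_pts neighbours within radius eps is a core point, clusters grow outwards from core points, and points reachable from no core point are marked as noise. The sample points and the parameter values are assumptions chosen for the example, and the sketch favours readability over efficiency.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region(j)
            if len(more) >= min_pts:
                queue.extend(more)  # j is a core point, so keep expanding
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9), (5, 15)]
print(dbscan(points, eps=1.5, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike the partitioning methods, the number of clusters is not given in advance, and the isolated point is reported as noise instead of being forced into a cluster.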
In educational data mining, clustering plays a very important role. It is used to group students according to their behavior; for example, clustering can separate active students from non-active students according to their performance in activities.
Fig 4.2: Clustering Students Based On Performance
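The grouping of active and non-active students can be sketched with a plain k-means procedure (Lloyd's algorithm with Euclidean distance). This is only an illustrative Python sketch: the two attributes, the student records and the choice k=2 are assumptions made for the example and do not come from the reviewed studies.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: (activities attended, average score) per student.
students = [(9, 85), (8, 90), (10, 88), (2, 40), (1, 35), (3, 45)]
labels, centroids = kmeans(students, k=2)
for student, label in zip(students, labels):
    print(student, "-> cluster", label)
```

On this small, well-separated data set the first three students end up in one cluster and the last three in the other. The cluster numbers themselves carry no meaning, so the analyst still has to interpret which cluster corresponds to the active students.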
5. CLASSIFICATION
Classification is the process of finding a model that describes and distinguishes data classes. It is a supervised learning technique, because the classes are predefined. It consists of two steps:
Model construction
Model usage
Types of classification models:
Classification by decision tree induction
Bayesian classification
Neural networks
Support vector machines (SVM)
Classification based on associations
Model construction: This step describes a set of predefined classes. Each tuple or sample is assumed to belong to a predefined class. The set of tuples used for model construction is known as the training set. The model can be represented as classification rules or decision trees. Figure 5.1 shows the mathematical model.
Figure 5.1: Model construction [3]
Model usage: The model is used to classify future or unknown objects, that is, objects whose class labels are not known in advance.
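The two steps can be sketched with a small naive Bayes classifier, one of the Bayesian classification models listed above. In this illustrative Python sketch, build_model corresponds to model construction from a training set and classify corresponds to model usage. The function names, the attributes and the previous-year records are hypothetical, and Laplace smoothing is an added assumption to avoid zero probabilities.

```python
from collections import Counter, defaultdict


def build_model(training_set):
    """Model construction: count class and attribute-value frequencies."""
    class_counts = Counter(label for _, label in training_set)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    domains = defaultdict(set)  # attribute index -> values seen in training
    for attributes, label in training_set:
        for i, value in enumerate(attributes):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def classify(model, attributes):
    """Model usage: pick the class with the highest naive Bayes score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(attributes):
            # Laplace smoothing avoids zero probabilities for unseen values.
            score *= (value_counts[(i, label)][value] + 1) / (count + len(domains[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical previous-year records: (attendance, assignments) -> division.
training_set = [
    (("high", "done"), "first"),
    (("high", "done"), "first"),
    (("high", "missed"), "second"),
    (("low", "done"), "second"),
    (("low", "missed"), "third"),
    (("low", "missed"), "third"),
]
model = build_model(training_set)
print(classify(model, ("high", "done")))  # -> first
```

The training set carries the predefined class of every tuple, which is what makes the procedure supervised; only the new student passed to classify has an unknown class.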
In educational data mining, classification predicts the result of a student, for example his or her final grade. A decision tree is used to represent the logical rules for a student's final grade.
Fig 5.2: Student Classifications
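How a decision tree yields logical rules for the final grade can be sketched with a one-level tree: the attribute with the highest information gain is chosen as the root, and each of its values becomes one if-then rule. The Python sketch below is illustrative only; the attributes, records and function names are hypothetical, and a full decision tree induction algorithm would apply the same split recursively.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute."""
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def one_level_rules(rows, labels):
    """Split on the best attribute and emit one if-then rule per value."""
    best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
    rules = {}
    for value in sorted(set(row[best] for row in rows)):
        subset = [lab for row, lab in zip(rows, labels) if row[best] == value]
        rules[value] = Counter(subset).most_common(1)[0][0]  # majority grade
    return best, rules


# Hypothetical student records and final grades.
rows = [
    {"attendance": "high", "assignment": "done"},
    {"attendance": "high", "assignment": "missed"},
    {"attendance": "high", "assignment": "done"},
    {"attendance": "low", "assignment": "done"},
    {"attendance": "low", "assignment": "missed"},
    {"attendance": "low", "assignment": "missed"},
]
grades = ["pass", "pass", "pass", "fail", "fail", "fail"]

best, rules = one_level_rules(rows, grades)
for value, grade in rules.items():
    print(f"IF {best} = {value} THEN grade = {grade}")
```

In this example attendance separates the grades perfectly, so it is selected as the root and the sketch prints one rule for high attendance and one for low attendance.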
6. APPLICATION OF DATA MINING IN EDUCATION SECTOR
Educational data mining offers several advantages compared with traditional educational research paradigms such as laboratory experiments and design research. Many applications are available for the education sector; some of them are described below:
Aksenova et al. (2006) developed a predictive algorithm for new, existing and returning students at all levels of education. The model is built on population, unemployment rates, institutional fees, household income and the past enrolment data of institutions.
Kovacic, J. Zlatko (2010) modeled students' success from enrolment data (socio-demographic and environment variables) using data mining techniques such as CART. The study concluded that pre-enrolment information can be used to identify students at risk of dropping the course and to advise them so that they succeed.
Mardikyan and Badur (2012) identified factors that affect instructors' teaching performance in universities, colleges or other institutes using stepwise regression and decision tree techniques.
7. CONCLUSION
Data mining is a broad area that integrates several fields, including machine learning, statistics, pattern recognition and artificial intelligence, for the analysis of large volumes of data. This paper examines techniques of data mining in the education sector. Data mining can also be applied using if-then rules to assess performance, students' complaints and similar matters.
REFERENCES
[1] Dr. Mohd Maqsood Ali, "Role of Data Mining in Education Sector", International Journal of Computer Science and Mobile Computing (IJCSMC), Vol. 2, Issue 4, April 2013, pp. 374-383.
[2] Witten, I. H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann Publishers, San Francisco, 2005, p. 5.
[3] S. Anupama Kumar and M. N. Vijayalakshmi, "Relevance of Data Mining Techniques in Edification Sector", International Journal of Machine Learning and Computing, Vol. 3, No. 1, February 2013.
[4] Umamaheswari and S. Niraimathi, "A Study on Student Data Analysis Using Data Mining Techniques", International Journal of Advanced Research in Computer Science and Software Engineering.
[5] Dr. Mohd Maqsood Ali, "Role of Data Mining in Education Sector", International Journal of Computer Science and Mobile Computing (IJCSMC), Vol. 2, Issue 4, April 2013, pp. 374-383.
[6] M. Sukanya, S. Biruntha, Dr. S. Karthik and T. Kalaikumar, "Data Mining: Performance Improvement in Education Sector using Classification and Clustering Algorithm", International Conference on Computing and Control Engineering (ICCCE 2012), 12 & 13 April 2012.
[7] Richard A. Huebner, Norwich University, "A survey of educational data mining research", Research in Higher Education Journal.
[8] Bharat Chaudhari and Manan Parikh, "A Comparative Study of clustering algorithms Using weka tools", International Journal of Application or Innovation in Engineering & Management (IJAIEM), Volume 1, Issue 2, October 2012.
[9] Sharaf Ansari, Sailendra Chetlur, Srikanth Prabhu, N. Gopalakrishna Kini, Govardhan Hegde and Yusuf Hyder, "An overview of clustering algorithms used in data mining", International Journal of Emerging Technology and Advanced Engineering, www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal), Volume 3, Issue 12, December 2013.
[10] Narendra Sharma, Aman Bajpai and Mr. Ratnesh Litoriya, "Comparison the various clustering algorithms of weka tools", International Journal of Emerging Technology and Advanced Engineering, www.ijetae.com (ISSN 2250-2459), Volume 2, Issue 5, May 2012.
[11] Bakar, A. A., Kefli, Z., Abdullah, S. and Sahani, M., "Predictive Models for Dengue Outbreak Using Multiple Rulebase Classifier", in: International Conference on Electrical Engineering and Informatics, pp. 1-6, 2011.
[12] Johns, S. and Santos, M. V., "On the Evolution of Neural Networks for Pairwise Classification Using Gene Expression Programming", in: Proceedings of the Annual Conference on Genetic and Evolutionary Computation, pp. 1903-1904, 2009.
[13] Oyelade, O. J., Oladipupo, O. O. and Obagbuwa, I. C., "Application of k-Means Clustering algorithm for prediction of Students Academic Performance", International Journal of Computer Science and Information Security (IJCSIS), Vol. 7, No. 1, 2010.
[14] Madhuri V. Joseph, Lipsa Sadath and Vanaja Rajan, "Data Mining: A Comparative Study on Various Techniques and Methods", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 2, February 2013, ISSN: 2277 128X.
[15] Aastha Joshi and Rajneet Kaur, "A Review: Comparative Study of Various Clustering Techniques in Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 3, March 2013.
[16] Shiv Pratap Singh Kushwah, Keshav Rawat and Pradeep Gupta, "Analysis and Comparison of Efficient Techniques of Clustering Algorithms in Data Mining", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume 1, Issue 3, August 2012.