
Discovering new knowledge with advanced data mining tool

Simon Kocbek (1), Primoz Kosec (2), Peter Kokol (2), Mitja Lenic (2) and Matjaz Debevc (2)
(1) University of Maribor, Faculty of Health Care, Zitna ulica 15, 2000 Maribor, Slovenia
E-mail: simon.kocbek@uni-mb.si
(2) University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ulica 17, 2000 Maribor, Slovenia
E-mail: pkosec@uni-mb.si; kokol@uni-mb.si; mitja.lenic@uni-mb.si; matjaz.debevc@uni-mb.si

Abstract

In this paper we present the results of an intelligent analysis of a database gathered during the e-learning project DISNET [1], in which a random sample of around 300 unemployed people participated. Intelligent data analysis using advanced methods for decision tree construction was used to find the main factors that influence a trainee’s progress in learning how to use e-materials. We also analysed how a trainee’s gender influences the usage of e-materials.

1. Introduction

The importance of data mining is rapidly increasing, and it could be crucial for learners who try to digest large and varied amounts of data. It is the job of data mining to help these learners by leveraging sophisticated techniques in data analysis, restructuring, and organization.

Data mining can also be extremely useful in the field of education, mainly because educators can identify and monitor students with special education needs, who are increasingly integrated into regular curricula. Even though tracking their progress is more difficult, data mining can provide improved situation awareness for educators.

Our goal was to find out how the more successful trainees using an e-learning system differ from the less successful ones, and how each group of trainees feels about the e-learning system. We also wanted to investigate what male and female trainees think about the usability of e-learning materials.

More than 300 trainees were included in the educational process, which took the form of courses following the method of blended learning. People with disabilities were also included in the project. Special preparation was devoted to deaf and hard of hearing people, for whom adapted e-learning materials, including a video interpreter, were developed. To evaluate and further improve the e-materials, the efficiency of learning and the trainees’ satisfaction with the e-learning materials and platform were tested.

The data were collected in 2006 in the framework of the European PHARE 2003 project DISNET, entitled “Improvement of computer literacy of unemployed adults”.

2. Method

Traces of the machine learning community can be found as early as the mid-1960s. Over time, different approaches have evolved [2]. Most of the effort was, and still is, concentrated on finding a way to extract generalized knowledge from examples.

The selection of an appropriate method for data analysis can be crucial for success.

2.1. Existing algorithms

Decision trees [3] are understandable to humans and can be used even without a computer, but they have difficulty expressing complex nonlinear problems.

There are many other approaches, such as knowledge representation with rules, rough sets, case-based reasoning, support vector machines, various fuzzy methodologies and ensemble methods [4], and they all try to answer the same question: how to find an optimal solution, i.e. learn how to learn.

Evolutionary approaches (EA) to knowledge extraction are also a good alternative, because they are not inherently limited to local solutions. They are based on evolutionary ideas of natural selection and genetic

Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007)


0-7695-2916-X/07 $25.00 © 2007
processes of biological organisms. Genetic algorithms are able to evolve solutions to real-world problems, provided they have been suitably encoded [5].

Hybrid approaches rest on the assumption that single models can unleash their full power only in synergetic combination [6]. Each single method has its advantages, but also its limitations and disadvantages. Therefore, the logical step is to combine different methods to overcome the disadvantages and limitations of a single method.

2.2. Multimethod approach

While studying those approaches, we were inspired by the idea of hybrid approaches and evolutionary algorithms. Both approaches are very promising for improving the quality of knowledge extraction, and are not inherently limited to sub-optimal solutions. We also noticed that almost all attempts at combining different methods use a loose coupling approach. The methods work almost independently of each other, and therefore a lot of luck is needed to make them work as a team.

The multimethod approach introduces the idea of a population of different intelligent systems that can produce multiple comparably good solutions, which are incrementally improved using the EA approach. To enable knowledge sharing between different methods, support for transformation between the individual methods is provided. The initial population of intelligent systems is generated using different methods. In each generation, operations appropriate for the individual knowledge representation are applied to improve existing intelligent systems and to create new ones. That enables incremental refinement of the extracted knowledge from different aspects of a given problem. For example, different induction methods, such as different purity measures, can simply be combined in one decision tree. As long as the knowledge representation is the same, combining different methods is not a big obstacle. The main problem is how to combine methods that use different knowledge representations (for example neural networks and decision trees). In such cases we provide two alternatives: (1) to convert one knowledge representation into another, using different (already known) methods, or (2) to combine both knowledge representations into a single intelligent system.

The first alternative requires the implementation of knowledge transmutators (for example conversion of a neural network into a decision tree). Such conversions are not perfect and some of the knowledge is normally lost, but conversions can produce a different view of the presented problem that can lead to better results.

The second alternative requires some cut-points where knowledge representations can be merged. In a decision tree, internal nodes or decision leaves represent such cut-points (Fig. 1), i.e. a condition can be replaced by another intelligent system (for example a support vector machine, SVM). Such trees are called hybrid decision trees.

Figure 1 - An example of a decision tree induced using the multimethod approach. Each node is induced with an appropriate method (GA - genetic algorithm, ID3, Gini, Chi-square, J-measure, SVM, neural network, etc.)

3. Preparing the data

To evaluate the efficiency of e-learning, the following steps had to be taken:
1. Pre-exam: before e-learning of each module, a trainee was tested for his/her prior knowledge.
2. E-learning of one module. Four modules were included: Internet, Information Technology (IT), Microsoft Word and Microsoft Windows.
3. Post-exam: after the trainee had finished the course, he/she took a new test.

Each trainee had one week to learn each module. The pre-exam results were compared to the post-exam results to establish how much the trainees learned in one week. We calculated the average progress of each trainee and classified the trainees into two classes: (1) Class 1, where the trainee’s average progress was below 23%, and (2) Class 2, where the trainee’s average progress was above or equal to 23%.

To evaluate the usability of the e-learning course, the trainees answered the SUMI (Software Usability Measurement Inventory) questionnaire [8].

We decided to create two types of datasets. A single sample in both types had the same input values: the answers to the SUMI questionnaire and the information
about disability. The output for the first type was the learning efficiency class and the output for the second type was the gender of the trainee. After cleaning the data, 169 samples were in each dataset, of which there were: (a) 81 trainees from learning efficiency class 1 and 88 trainees from learning efficiency class 2, and (b) 60 female and 53 male trainees. Altogether there were 116 females, but we randomly reduced that number to avoid wrong conclusions. Both datasets also included 5 trainees with partial and 8 trainees with full deafness.

Although 169 samples may seem a small amount of data, analysing this kind of data by hand would consume too much time and energy. For that reason, we decided to use data mining techniques.

4. Results

To perform the intelligent data analysis, the complete dataset was randomly divided into a training set and a testing set at a ratio of 2 to 1 by the multimethod tool.

In the first experiment the first type of dataset was used, where the output was the learning efficiency class. We tested several different methods on the data, and the classification accuracy was 75%. The most important attribute was the statement “Downloading and uploading data files into e-material is difficult”. Participants who agreed with the statement fell into Class 1. Another interesting attribute was the statement “I frequently need help when I am using e-material”. Trainees who agreed with the statement fell into the same class. In the next experiment we used the second type of dataset, which had the trainee’s gender as the output. Again, we tested several different methods on the data. The classification accuracy was around 71%. The results showed that the most important attribute in the induced decision trees was the question “Did you often feel tense using e-material?” Female participants answered “yes” more often than male participants.

More than half of all the females also agreed with the statement “Sometimes I am not sure that I use the right command for working with e-material”. Male trainees, on the other hand, did not have that problem, and most of them agreed with the statement “When I use this e-material I feel like I have total control of it”.

5. Conclusion

We presented the results of intelligent data analysis used in education. An advanced multimethod approach was used to classify trainees’ progress and gender. One of the most frequently shown patterns was that men were more comfortable using e-materials than women. According to the distribution of the trainees’ average progress, the most significant factor that influenced a participant’s progress was managing data files. Consequently, these participants needed assistance, which was crucial for identifying the less successful trainees (Class 1). The results of searching for new patterns in an educational database, obtained with intelligent data analysis based on the multimethod decision tree induction approach, turned out to be very interesting. The induced decision trees were highly accurate in terms of total accuracy and also average class accuracy. However, some new interesting patterns, which can have some influence, were also shown, but they should be more carefully investigated and expertly evaluated. The presented results also show that our multimethod approach for decision tree induction can be used for knowledge discovery in educational databases.

6. References

[1] M. Debevc, M. Verlic, P. Povalej, P. Kokol, “Designing and implementation of ECDL e-learning material for deaf and hard of hearing”, 34. Österreichischen Linguistiktagung, Universität Klagenfurt, 8.12.2006.
[2] Thrun S, Pratt L. Learning to Learn. Kluwer Academic Publishers, 1998.
[3] Quinlan JR. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann Publishers, 1993.
[4] Dietterich TG. Ensemble Methods in Machine Learning. In: First International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science. New York: Springer-Verlag, 2000: 1-15.
[5] Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison Wesley, 1989.
[6] Iglesias CJ. The Role of Hybrid Systems in Intelligent Data Management: The Case of Fuzzy/neural Hybrids. Control Engineering Practice 1996; 4 (6): 839-845.
[7] M. Lenic, P. Kokol, “Combining classifiers with multimethod approach”. In: A. Abraham et al. (eds.), Soft computing systems: design, management and applications 2002, Frontiers in artificial intelligence and applications. Amsterdam: IOS Press, 2002; 87: 374-383.
[8] J. Kirakowski, M. Corbett, “SUMI: The Software Usability Measurement Inventory”, British Journal of Educational Technology, 24 (3), 1993, pp. 210-212.
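
The remark in Section 2.2 that different purity measures can be combined in one decision tree can be illustrated with a minimal Python sketch. It assumes binary splits and uses hypothetical function names (`gini`, `entropy`, `split_gain`) and made-up class labels. It is not the multimethod tool itself, only a way to see that two measures can score the same candidate split, so each node could in principle use a different one.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits, the measure behind ID3's information gain."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Made-up node: labels 1 and 2 stand for the two progress classes of Section 3.
parent = [1, 1, 1, 2, 2, 2, 2, 2]
left, right = [1, 1, 1, 2], [2, 2, 2, 2]

for name, measure in (("Gini", gini), ("entropy", entropy)):
    print(name, round(split_gain(parent, left, right, measure), 4))
```

Because both measures share the same signature, a node-level induction routine could accept the measure as a parameter, which is one simple way to mix measures within a single tree.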

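The labelling rule from Section 3 (Class 1 below 23% average progress, Class 2 otherwise) can be sketched as follows. The paper does not give the exact formula for progress, so this sketch assumes it is the post-exam score minus the pre-exam score, in percentage points, averaged over the modules. The names `average_progress` and `progress_class` and the example scores are hypothetical.

```python
from statistics import mean

THRESHOLD = 23.0  # class boundary stated in Section 3


def average_progress(pre_scores, post_scores):
    """Mean of (post - pre) over modules; scores are assumed to be percentages."""
    if not pre_scores or len(pre_scores) != len(post_scores):
        raise ValueError("need one pre-exam and one post-exam score per module")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


def progress_class(avg_progress, threshold=THRESHOLD):
    """Return 1 if progress is below the threshold, otherwise 2."""
    return 1 if avg_progress < threshold else 2


# Made-up scores for the four modules (Internet, IT, Word, Windows).
pre = [40.0, 55.0, 30.0, 60.0]
post = [70.0, 75.0, 65.0, 80.0]

avg = average_progress(pre, post)
print(avg, progress_class(avg))
```

A trainee whose progress is exactly 23% lands in Class 2, matching the "above or equal to" wording of the text.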

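Section 4 describes a random 2-to-1 division into training and testing sets, and Section 5 refers to total accuracy and average class accuracy. The sketch below shows one plausible reading of those terms, not the multimethod tool's actual procedure: a seeded shuffle for the split, total accuracy as the share of correct predictions, and average class accuracy as the unweighted mean of per-class accuracies. All names and the toy predictions are assumptions.

```python
import random
from collections import defaultdict


def split_two_to_one(samples, seed=0):
    """Shuffle a copy of `samples` and return (train, test) at roughly 2:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def total_accuracy(y_true, y_pred):
    """Share of all test samples that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of the accuracy achieved within each true class."""
    hits, counts = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# 169 sample indices, as in the cleaned datasets of Section 3.
samples = list(range(169))
train, test = split_two_to_one(samples)
print(len(train), len(test))

# Made-up labels and predictions for six test samples.
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 2, 1]
print(total_accuracy(y_true, y_pred), average_class_accuracy(y_true, y_pred))
```

Reporting both figures is useful when classes are unbalanced, since a high total accuracy can hide poor performance on the smaller class.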