
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882

Volume 4, Issue 10, October 2015

A Utility Mining Approach for Building a Knowledge-based
Recommender for Educational Decision Support
Deborah Evelyn. S (1)
(1) Department of Computer Science and Engineering, University College of Engineering, Kanchipuram

ABSTRACT
With the boom in the number of streamlined career
options available today, students enrolled in a broad
course of study need strategic guidance. Helping them
discover their domain of expertise paves a hassle-free
path towards a successful career. The key notion of this
paper is to help students connect the dots. For this we
employ data mining technologies, which provide a
collection of methodologies to discover vital patterns
and relationships among data within large data sets. The
database can be based on the curriculum and the
evaluation process; the corresponding patterns then
reveal the students' areas of expertise. Through a
knowledge-based recommender, this information can be
conveyed to the student community. This system has
proved to be synergistic in educational decision support.
Alongside this concept, a psychological proposition
called the First Letter Hypothesis has been put forth
through research on the databases.
Keywords: Association rules, Confidence, Knowledge
base, Recommendation, Support, Utility mining.
I. INTRODUCTION
1.1. SIGNIFICANCE OF UTILITY MINING
The vastness and accessibility of data have motivated
the formulation of various strategies to unravel
meaningful knowledge hidden in huge databases through
data mining. Of the numerous mining techniques
available, frequent itemset pattern mining and utility
mining have gained much significance. Key factors
behind this development are the nature of the databases
(the transition from static databases to transactional
and incremental ones) and the nature of the attributes
and entities they contain.
More recently, the application domains that employ data
mining have drifted from the frequent itemset mining
approach toward the utility mining approach. First,
frequent itemset mining implicitly treats the utilities
of all item sets as equal and represents their
occurrences with binary values. Secondly, in frequent
itemset mining the value of an item set increases only
with its frequency. These limitations led to the
development of a better strategy, i.e. the utility
mining technique.
The utility mining approach has been formulated to
identify item sets of high utility (e.g. profit margin,
value, user preferences, etc.) and to allow users to set
a utility threshold for the item sets in a database. It
is an improved version of the frequent itemset pattern
mining strategy and is a state-of-the-art approach that
can be adopted.
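The contrast between frequency and utility can be illustrated with a minimal Python sketch. The data layout, the function names and the brute-force enumeration below are assumptions made for illustration only; they are not the procedure used in this project.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_utility):
    """Sum quantity * unit utility over the transactions that contain every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_utility[item] for item in itemset)
    return total


def itemset_frequency(itemset, transactions):
    """Count the transactions that contain every item (the frequent-itemset view)."""
    return sum(1 for t in transactions if all(item in t for item in itemset))


def high_utility_itemsets(transactions, unit_utility, min_utility):
    """Brute-force enumeration of all itemsets; suitable for toy data only."""
    items = sorted(unit_utility)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_utility)
            if utility >= min_utility:
                result[itemset] = utility
    return result


# Hypothetical toy data: each transaction maps an item to its quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical per-unit utility (e.g. profit) of each item.
unit_utility = {"a": 5, "b": 1, "c": 10}

print(itemset_frequency(("b",), transactions), itemset_frequency(("c",), transactions))
print(high_utility_itemsets(transactions, unit_utility, min_utility=30))
```

In this toy data the items b and c occur equally often, yet only c clears the utility threshold, which is the distinction that frequency counts alone cannot express.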
1.2. RECOMMENDER SYSTEMS
Recommender systems are a subclass of information
filtering systems that seek to predict the rating or
preference that a user would give to an item.
There are four types of recommender systems, as given
below:
Content-based: It is an approach that focuses on the
content, i.e. the type of file or format of information
mined in the past activities of users.
Collaborative: It is an approach that works with a
predictive model trained by the logs of the past activities
of users.
Hybrid: It is a combination of the above types.
Knowledge-based: This type of recommender system is
the one that truly depends on a knowledge base built by
the association rules mined in the process of data
mining.
The last type of recommender system discussed above is
the best suited to this project because the database
that was mined was a static database. The other types of
recommender systems are designed to work with
transactional and incremental databases.
1.3. ROLE OF DATA MINING IN THE
RECOMMENDER SYSTEM
The association rules generated during the data mining
process were used to formulate the knowledge base of the
recommender system. Associations among the attributes
of the database in hand were generated using the Apriori
and PredictiveApriori algorithms. The knowledge base so
obtained provided meaningful insight into the students'
approach to their respective curriculum and also
supported the First Letter Hypothesis.

www.ijsret.org


1.4. OTHER RELATED KEYWORDS AND DEFINITIONS
1.4.1. DATA MINING
Data mining is the process of discovering interesting
knowledge from large amounts of data stored in
databases, data warehouses or other information
repositories. Based on this view, the architecture of a
typical system has the following major components.
1.4.2. ASSOCIATION RULES, SUPPORT AND CONFIDENCE
Association rules are used to show the relationships
between data items. Mining association rules allows
finding rules of the form X -> Y, where X and Y are item
sets drawn from some data set U.
Support and confidence are the common measures of the
quality of an association rule. The support of the
association rule X -> Y is the percentage of
transactions in the database that contain X ∪ Y. The
confidence of the association rule X -> Y is the ratio
of the number of transactions that contain X ∪ Y to the
number of transactions that contain X.
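The two measures defined above can be sketched in a few lines of Python. The function names and the sample transactions are hypothetical and serve only to make the definitions concrete.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Transactions containing X and Y divided by transactions containing X."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    x_count = sum(1 for t in transactions if x <= t)
    if x_count == 0:
        return 0.0  # the rule never applies, so report zero confidence
    xy_count = sum(1 for t in transactions if xy <= t)
    return xy_count / x_count


# Hypothetical transactions built from attribute=value pairs.
transactions = [
    frozenset({"Understanding=Profound", "Marks=Good"}),
    frozenset({"Understanding=Profound", "Marks=Good"}),
    frozenset({"Understanding=Profound", "Marks=Average"}),
    frozenset({"Understanding=Nominal", "Marks=Low"}),
]

rule_x = {"Understanding=Profound"}
rule_y = {"Marks=Good"}
print(support(rule_x | rule_y, transactions))
print(confidence(rule_x, rule_y, transactions))
```

Here the rule covers two of the four transactions, and two of the three transactions that satisfy the antecedent also satisfy the consequent.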

Figure 1.1: Association Rules


1.4.3. FIRST LETTER HYPOTHESIS
This hypothesis states that individuals whose names
begin with one of the last ten letters of the alphabet
are better achievers and competitors than those whose
names begin with one of the first ten letters of the
alphabet.
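One way to test such a statement is to group candidates by the initial letter of their names and compare an aggregate score per group. The sketch below is only an assumed formulation: the group labels, the function names and the use of a mean score are not taken from this paper.

```python
import string

FIRST_TEN = set(string.ascii_uppercase[:10])   # A to J
LAST_TEN = set(string.ascii_uppercase[-10:])   # Q to Z


def letter_group(name):
    """Classify a name by its initial letter."""
    initial = name.strip()[:1].upper()
    if initial in FIRST_TEN:
        return "first_ten"
    if initial in LAST_TEN:
        return "last_ten"
    return "middle"


def mean_score_by_group(records):
    """records is an iterable of (name, score) pairs; returns the mean per group."""
    totals = {}
    for name, score in records:
        group = letter_group(name)
        running_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (running_sum + score, count + 1)
    return {group: s / n for group, (s, n) in totals.items()}
```

Names beginning with K to P fall outside both groups and are reported separately, since the hypothesis as stated says nothing about them.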

II. THE ARCHITECTURE OF THE EDUCATIONAL RECOMMENDER

The following diagram shows a simple schematic of the
proposed architecture of this project. The project is
divided into three layers for ease of implementation.
The application layer deals purely with the creation and
preprocessing of the databases. The data mining layer
consists of the activities aimed at the efficient
extraction of association rules (the meaningful patterns
of this project) and the formulation of the knowledge
base from the most realistic association rules mined
during this process. The recommendation layer delivers
the recommender service built on this knowledge base.
Special focus has been given to the formulation of the
knowledge base; its details are explained later in this
section, and the association rules generated through the
mining process are discussed in Section IV.

Figure 2.1: Educational Recommender Architecture


2.1. APPLICATION LAYER
The database containing the student information is the
application database for this project. Since no such
database existed, it was created through questionnaires.
The following steps were involved in this process.
2.1.1. DATA COLLECTION AND PREPROCESSING
The databases were created from the data submitted by
the students of UCEK through a carefully completed
questionnaire. It consisted of the following sections,
to be completed for every mainstream subject in the
undergraduate curriculum for Computer Science and
Engineering.
Understanding (rating range: 1-3)
Marks Scored (rating range: 1-3): 1 - E and below, 2 - C or D, 3 - B and above
Confidence (rating range: 1-3)
First Attempt (P, PF, F): P - cleared the finals, PF - cleared
the finals but not in the first attempt, F - still a backlog
The details furnished were then preprocessed for
efficient and compatible association mining by
smoothing and transformation. The following
transformations were carried out in order to make the
database compatible with the mining tool.
The range 1-3 for understanding was transformed into
Nominal (1), Sound (2) and Profound (3). The range 1-3
for marks scored was transformed into Low (1), Average
(2) and Good (3). The range 1-3 for confidence rating
was transformed into Doubtful (1), Secure (2) and
Confident (3). No transformation was required on the
first attempt column. The unsupervised transformation
function NumericToNominal was applied to numeric
attributes.
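The label mapping described above can be sketched as follows. This is an illustrative Python equivalent, not the NumericToNominal filter itself, and the record field names are assumptions.

```python
# Label tables taken from the transformations described in the text.
UNDERSTANDING = {1: "Nominal", 2: "Sound", 3: "Profound"}
MARKS = {1: "Low", 2: "Average", 3: "Good"}
CONFIDENCE = {1: "Doubtful", 2: "Secure", 3: "Confident"}


def to_nominal(record):
    """Map the numeric 1-3 ratings of one questionnaire row to nominal labels.

    The keys 'understanding', 'marks', 'confidence' and 'first_attempt'
    are hypothetical field names chosen for this sketch.
    """
    return {
        "understanding": UNDERSTANDING[record["understanding"]],
        "marks": MARKS[record["marks"]],
        "confidence": CONFIDENCE[record["confidence"]],
        # The first attempt column (P, PF, F) is already nominal.
        "first_attempt": record["first_attempt"],
    }
```

A rating outside the range 1-3 raises a KeyError, which surfaces data entry errors before mining begins.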
Missing values were found in the columns for elective
subjects. These were smoothed by applying the minimum
threshold value of the entity in the respective column.
This smoothing was necessary because the data mining
tool used cannot handle null values.
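A minimal sketch of this smoothing step is given below. It assumes that the "minimum threshold value" is the smallest value observed in the column, which is one possible reading of the description; the function and column names are hypothetical.

```python
def fill_missing_with_column_min(rows, columns):
    """Return copies of the rows with None replaced by the column minimum.

    Assumption: the fill value is the smallest value present in the column.
    Columns with no observed values are left untouched.
    """
    filled = [dict(row) for row in rows]
    for column in columns:
        present = [row[column] for row in rows if row.get(column) is not None]
        if not present:
            continue
        floor = min(present)
        for row in filled:
            if row.get(column) is None:
                row[column] = floor
    return filled
```

The input rows are copied rather than modified, so the raw questionnaire data stays available for checking.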

2.1.2. PROCEDURE FOR THE CONSTRUCTION OF THE APPLICATION LAYER
1. Collect the required data by administering survey
questionnaires or by employing other survey methods to
the population of interest (in this project, student
community).
2. Apply data cleaning strategies for efficiency during
the mining process.
3. Apply the required data transformation strategies for
compatibility of attribute types.
4. Create the database in a format compatible with the
data mining tool that is to be used.
2.2. DATA MINING LAYER
This step involved the application of association
algorithms for the formulation of association rules.
Hence, the next phase of the project was to mine the
preprocessed databases. Attempts to cluster the database
through the tool failed because of the large number of
attributes considered and the overall size of the
database. The subjects were therefore clustered manually
into the following categories.
Table 2.1: Categories and Subjects
SL. NO | Category | Subjects
1 | Application Programming | Fundamentals Of Computing And Programming; Object Oriented Programming; Java Programming Paradigms
2 | Critical Programming | Fundamentals Of Computing And Programming; Data Structures; Design Analysis And Algorithms
3 | Hardware Logic | Electric Circuits And Electron Devices; Digital Principles Of System Design; Microprocessors And Microcomputers; Computer Organization And Architecture; Advanced Computer Architecture
4 | System Theory | Operating System; System Software
5 | Machine Learning | Artificial Intelligence; Theory Of Computations; Principles Of Compiler Design
6 | Network Study | Computer Networks; Web Technology
7 | Software Engineering | Software Engineering; Object Oriented Analysis And Design
8 | Database Techniques | Database Management Systems; Advanced Database Technology

Each cluster was mined independently using both the
Apriori and PredictiveApriori algorithms. The inferences
of these procedures were compared to find out which of
the two was the more realistic approach; these
inferences are discussed in Section 4.2.
2.2.1. ASSOCIATION RULE MINING ALGORITHMS
The Apriori algorithm is a frequent itemset mining and
association rule learning strategy that operates over a
transactional database. An item set that is encountered
more frequently during the mining process has a greater
support value. It is thus an algorithm that highlights
the general trend in a particular dataset.
The pseudo code for the algorithm is given below for a
transaction database T and a support threshold of ε.
Usual set theoretic notation is employed, though note
that T is a multiset. Ck is the candidate set for level
k. At each step, the algorithm is assumed to generate
the candidate sets from the large item sets of the
preceding level, heeding the downward closure lemma.
Count[c] accesses a field of the data structure that
represents candidate set c, which is initially assumed
to be zero. Many details are omitted below; usually the
most important part of the implementation is the data
structure used for storing the candidate sets and
counting their frequencies.
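The level-wise procedure described above can be sketched in Python as follows. This is a minimal illustration with an absolute support count as threshold and plain dictionaries for counting; it is not the implementation used by the mining tool, and the function name and toy data are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: count} for every itemset meeting the support count."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_support_count}
    result = dict(large)

    k = 2
    while large:
        previous = list(large)
        # Join step: merge large (k-1)-itemsets into k-item candidates.
        candidates = set()
        for i in range(len(previous)):
            for j in range(i + 1, len(previous)):
                union = previous[i] | previous[j]
                # Prune step (downward closure): every (k-1)-subset must be large.
                if len(union) == k and all(
                    frozenset(sub) in large for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {s: c for s, c in counts.items() if c >= min_support_count}
        result.update(large)
        k += 1
    return result


# Hypothetical toy database of seven transactions.
toy = [
    {"a", "b", "c", "d"}, {"a", "b", "d"}, {"a", "b"},
    {"b", "c", "d"}, {"b", "c"}, {"c", "d"}, {"b", "d"},
]
frequent = apriori(toy, min_support_count=3)
```

With a support count of 3, the candidate {b, c, d} survives the pruning step because all of its pairs are large, but it is dropped after counting because only two transactions contain it.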

Apriori, while historically significant, suffers from a
number of inefficiencies or trade-offs, which have
spawned other algorithms. Candidate generation
generates large numbers of subsets (the algorithm
attempts to load up the candidate set with as many as
possible before each scan). Bottom-up subset
exploration (essentially a breadth-first traversal of the
subset lattice) finds any maximal subset S only after all
of its proper subsets.
The PredictiveApriori algorithm overcomes this by
considering a threshold that varies along the process
and by continually predicting the course of the following
steps or associations.
Thus the knowledge base of the recommender system
can be formed from the most realistic association rules
produced by both algorithms. This knowledge base
directs the recommender service that is provided to the
user through the front end. The following schematic
shows the architecture of the recommender.
2.2.2. PROCEDURE FOR THE CONSTRUCTION
OF THE KNOWLEDGE BASE
1. Load the database into the data mining environment.
2. Apply the necessary filters and transforms for
attribute type compatibility. This step is necessary
because neither the Apriori nor the PredictiveApriori
algorithm can handle varying data types.
3. Compare association rules generated.
4. Evaluate the same based on accuracy rate and support
count.
5. Select realistic associations to build the knowledge
base.
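Steps 3 to 5 above can be sketched as a simple filter over the rules produced by the two algorithms. The Rule structure, the threshold parameters and the tie-breaking policy are assumptions made for this sketch; the paper does not state numeric cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support_count: int
    accuracy: float   # confidence or predicted accuracy reported by the miner
    algorithm: str    # e.g. "Apriori" or "PredictiveApriori"


def build_knowledge_base(rules, min_support_count, min_accuracy):
    """Keep rules that clear both thresholds; on duplicates keep the most accurate."""
    kept = {}
    for rule in rules:
        if rule.support_count < min_support_count or rule.accuracy < min_accuracy:
            continue
        key = (rule.antecedent, rule.consequent)
        best = kept.get(key)
        if best is None or rule.accuracy > best.accuracy:
            kept[key] = rule
    # Most accurate rules first, ties broken by higher support count.
    return sorted(kept.values(), key=lambda r: (-r.accuracy, -r.support_count))
```

Keeping a single entry per antecedent and consequent pair means a rule found by both algorithms appears in the knowledge base only once.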
2.3. RECOMMENDATION LAYER
The knowledge base programmed into the recommender
system's source code is the cornerstone of the
recommender service; it acts as the recommender
algorithm. In this project, the knowledge base
formulated in the previous stage is programmed directly
into the system. In a knowledge-based recommender,
separate interfacing application software is therefore
not required.
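The idea of a knowledge base programmed directly into the source code can be sketched as a fixed rule table and a lookup function. The example is written in Python for brevity, although the project's front end is written in Java; the rules, attribute names and matching policy shown here are hypothetical and are not the rules mined in this project.

```python
# Hypothetical rule table: (conditions that must all hold, recommended category).
KNOWLEDGE_BASE = [
    ({"Data Structures.understanding": "Profound",
      "Data Structures.marks": "Good"}, "Critical Programming"),
    ({"Database Management Systems.confidence": "Confident"}, "Database Techniques"),
]


def recommend(profile, knowledge_base=KNOWLEDGE_BASE):
    """Return the categories whose conditions are all met by the student profile."""
    matches = []
    for conditions, category in knowledge_base:
        if all(profile.get(attr) == value for attr, value in conditions.items()):
            if category not in matches:
                matches.append(category)
    return matches
```

Because the rules live in the program itself, answering a query needs no database connection or mining run at recommendation time.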

III. IMPLEMENTATION DETAILS

This section provides a brief look into how the
architecture described in Section II was implemented
using its various components.
3.1. DATABASES FOR RESEARCH
The databases were created as per the procedure given
in Section 2.1.2. Initially the data from the flat files
were fed into MS Excel and saved in the comma-separated
values format. The various anomalies were corrected as
described in Section 2.1.1. Each category had a
corresponding database. All the databases had the same
number of instances (200), while the number of
attributes in each varied with the subjects covered
under the category.
3.2. ASSOCIATION RULE MINING WITH WEKA
The association rules were generated using the Apriori
and PredictiveApriori algorithms. Each run was executed
with 10 cycles of cross validation and was set to
generate the 10 best association rules. The comparison
based on the performance evaluation of the two
algorithms is discussed in the following section.

Figure 3.3: All attributes of application programming after preprocessing

3.3. RECOMMENDATION USER INTERFACE
The front end was developed in the Java programming
language using NetBeans IDE 6.9.1, with Swing components
and their associated event handling mechanisms. Each
category of the core stream subjects is introduced and
explained in a separate frame. Each category frame
displays the subjects under it, its applications and its
scope, i.e. the job titles that it involves.
3.4. FLOW DIAGRAM OF IMPLEMENTATION METHODOLOGY

Figure 3.4: Flow diagram of Implementation Methodology

Thus the various implementation strategies are
explained. The results are discussed in the following
section.

IV. RESULTS AND DISCUSSIONS

4.1. ASSOCIATION RULE MINING
The association rule mining processes yielded two types
of inferences: the application specific inferences
regarding the patterns and rules that were generated
from the application databases, and the domain specific
inferences, i.e. technical inferences on the comparison
between the Apriori and PredictiveApriori association
rule mining algorithms.
The following association rules were generated using
the Apriori and PredictiveApriori algorithms.

Sl. No | Category | Algorithm Used
1.1 | Application Programming | Apriori
1.2 | Application Programming | PredictiveApriori
2.1 | Critical Programming | Apriori
2.2 | Critical Programming | PredictiveApriori
3.1 | Hardware Logic | Apriori
3.2 | Hardware Logic | PredictiveApriori
4.1 | System Theory | Apriori
4.2 | System Theory | PredictiveApriori
5.1 | Machine Learning | Apriori
5.2 | Machine Learning | PredictiveApriori
6.1 | Network Study | Apriori
6.2 | Network Study | PredictiveApriori
7.1 | Software Engineering | Apriori
7.2 | Software Engineering | PredictiveApriori
8.1 | Database Techniques | Apriori
8.2 | Database Techniques | PredictiveApriori
[Association Rules column: the rule listing for each row was a figure and is not recoverable here.]
Table 4.1. Tabulation of association rules
4.2. COMPARISON BASED ON SUPPORT AND CONFIDENCE
The following table shows the support and confidence
values of the 10 best association rules generated by the
two algorithms. All the results of the Apriori algorithm
show that the support value of the associations
increases gradually over the run. This clearly shows
that the Apriori algorithm is a frequent itemset mining
strategy: its relationships are established among
attributes with varying support counts.
Conversely, in the PredictiveApriori strategy,
relationships are built among attributes with the same
support count first.
The following graph shows the difference in support
count along execution in the two algorithms.

Figure 4.1: Comparison of Support Count

The following graph compares the confidence, or
accuracy, of the same association rules. It shows that
the PredictiveApriori algorithm has a higher accuracy
level than the Apriori algorithm. The fall in the
accuracy rate of the PredictiveApriori algorithm is very
small in comparison with that of Apriori, i.e. the slope
of its accuracy curve is smaller. Higher accuracy
correlates to higher utility.

Figure 4.2: Comparison of Confidence


4.3. RESULTANT GRAPHS OF FIRST LETTER HYPOTHESIS
The following graph shows the plot of performance in
the finals against the first letters of the candidates'
names.

Figure 4.3: First Letter Hypothesis Plot

From the graph it is clear that the statement of the
First Letter Hypothesis is true. The following section
holds the inference of the same.

V. CONCLUSION AND FUTURE WORK


5.1. CONCLUSION
Thus the implementation of a recommender system was
completed successfully, and the comparison drawn
between the two algorithms indicates that
PredictiveApriori performs better in terms of predictive
accuracy and the various statistical measures that were
considered. The following inferences were observed in
this educational research. Candidates who have a good
understanding of the basics do well in the successive
levels. A secure level of understanding and confidence
produces a greater possibility of success in the finals.
Basic courses are very easy to clear in the finals. A
good understanding may not always lead to an equivalent
level of success.
The two-dimensional graphs generated as a graphical
result of the association rule mining process showed
that the psychological relationship between performance
or competitiveness and the first letter of a candidate's
name, as defined by the First Letter Hypothesis, held
true. The explanation offered is that individuals whose
names begin with the first ten letters of the alphabet
are always first in line when sorted and hence do not
have the urge to fight in order to move ahead; the
converse holds true for individuals whose names begin
with the last ten letters of the alphabet.

