Introduction
KDD
ETL Process
Data Warehouse
Machine Learning
Data Mining and its Techniques
Association
Classification
Decision Trees
Clustering
Prediction
Time Series
Application of Data Mining in Stock Market Prediction
Implementation of Data Mining Techniques using RapidMiner
Conclusion
References
INTRODUCTION
Knowledge Discovery in Databases (KDD) refers to the process of finding knowledge in data by applying data mining methods. Data mining is considered a subset of KDD; it focuses on extracting patterns and information from data sets using a suitable data mining technique.
(Figure: the KDD process, with the stages Selection, Preprocessing, Transformation, Data Mining and Interpretation/Evaluation)
Developing an understanding
Developing an understanding of the application domain and the KDD goals of the end user.
Selection
Creating a target data set.
Data cleaning and preprocessing
Removing noise or outliers, collecting the information needed to model the data, and deciding on strategies for handling noise or missing data fields.
Data reduction and projection
The data mining task
Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
Choosing the data mining algorithm(s)
Selecting the methods to be used for searching for patterns in the data, deciding which models and parameters may be appropriate, and matching a particular data mining method with the overall criteria of the KDD process.
Data mining
Searching for patterns of interest in a particular representational form, or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.
Interpreting mined patterns
ETL PROCESS
Extract
Transform
Load
EXTRACT
The Extract step covers the data extraction from the source
and makes it accessible for further processing.
Data Mart
A data mart is a subset of a data warehouse that contains the information relevant to a single specified subject (a business area or functional area).
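The extract step, together with the transform and load steps listed above and a data mart as a single-subject subset, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the CSV content, the table name `sales`, the view name `north_sales_mart` and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source export; a real source could be a file, an API, or an operational database.
SOURCE_CSV = """date,region,product,amount
2016-01-04,North,Tea,12.50
2016-01-04,north ,Milk,3.20
2016-01-05,South,Tea,
2016-01-05,South,Milk,2.90
"""


def extract(text):
    """Extract: read raw rows from the source and make them available for processing."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with no amount, tidy region names, convert amounts to numbers."""
    records = []
    for row in rows:
        if not row["amount"]:
            continue  # one possible policy for missing values: drop the record
        region = row["region"].strip().title()
        records.append((row["date"], region, row["product"], float(row["amount"])))
    return records


def load(conn, records):
    """Load: write the cleaned records into a warehouse fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# A data mart modelled as a view over one subject area: sales for the North region only.
conn.execute("CREATE VIEW north_sales_mart AS SELECT date, product, amount FROM sales WHERE region = 'North'")
print(conn.execute("SELECT product, amount FROM north_sales_mart ORDER BY product").fetchall())
```

A view keeps the mart in sync with the warehouse table; a separately materialised mart table would be the alternative when the subject area needs its own storage or schema.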
DATA WAREHOUSE
DATA MINING
"The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions" (Simoudis, 1996).
DATA MINING TECHNIQUES
Association:
Association is one of the most popular and well-researched data mining techniques. In association, a pattern is discovered based on relationships between items in the same transaction. The association technique is commonly used by retailers to study customers' buying habits.
For example, based on historical sales data, retailers might find that customers often buy milk when they buy tea; they can therefore place milk and tea next to each other to save customers time and increase sales.
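The milk and tea example can be made concrete with the two measures usually used to rank association rules: support (how often items appear together) and confidence (how often the consequent appears when the antecedent does). The sketch below assumes these standard measures and uses invented baskets; it enumerates item pairs by brute force rather than using an algorithm such as Apriori, so it only suits very small item sets.

```python
from itertools import combinations

# Hypothetical baskets; each set is one customer transaction.
TRANSACTIONS = [
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"tea", "biscuits"},
    {"milk", "bread"},
    {"tea", "milk", "bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Share of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force search for single-item rules X -> Y that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = confidence({x}, {y}, transactions)
            if conf >= min_confidence:
                rules.append((x, y, support({x, y}, transactions), conf))
    return rules


for x, y, s, c in pair_rules(TRANSACTIONS):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

On these baskets tea and milk appear together in three of five transactions, and three of the four tea baskets also contain milk, so tea -> milk passes both thresholds.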
Classification:
Classification is a classic data mining technique based on machine learning. It is used to assign each item in a data set to one of a predefined set of classes or groups.
Decision Trees
The decision tree is one of the most commonly used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that narrow down the data so that a final decision can be made.
DECISION TREE PROCESS
Identify the root node based on the test attribute: the attribute with the highest information gain becomes the test attribute.
Child nodes are identified by applying the information gain calculation recursively.
Leaf nodes are identified when the attribute list becomes empty, i.e. when there are no attributes left to test.
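These steps correspond to an ID3-style procedure. A minimal Python sketch, assuming categorical attributes, entropy-based information gain, and a small invented data set (the attribute names `outlook`, `windy` and `play` are hypothetical), might look like this:

```python
import math
from collections import Counter

# Hypothetical training data; "play" is the class attribute.
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the class minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, attributes, target):
    """Return a class label (leaf) or {test_attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        # Leaf: the node is pure, or the attribute list is empty (use the majority class).
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}


print(build_tree(ROWS, ["outlook", "windy"], "play"))
```

Here `outlook` becomes the root because its information gain (2/3 bit) exceeds that of `windy` (about 0.08 bit); only the `sunny` branch needs a further split, after which the attribute list is empty.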
Clustering:
Clustering is a data mining technique that uses automatic algorithms to group objects with similar characteristics into meaningful or useful clusters.
The clustering technique defines the classes itself and puts objects into each class, whereas in classification techniques objects are assigned to predefined classes.
This technique groups data into categories through unsupervised learning.
Unsupervised learning is implemented in clustering by partitioning the database, typically starting from random initial partitions.
Clustering Example
Using the clustering technique, we can keep books that share certain similarities in one cluster, or one shelf, and label it with a meaningful name.
Readers who want books on that topic then only have to go to that shelf instead of searching the entire library.
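The slides do not name a specific clustering algorithm; k-means is one common partitioning method that starts from randomly chosen centroids and refines the partition iteratively. The sketch below uses invented 2-D points (think of two numeric features per book) and a fixed random seed for repeatability; it is an illustration of the idea, not the method used in this work.

```python
import random

# Hypothetical 2-D feature vectors (for books, these might be two topic scores).
POINTS = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: random initial centroids, then alternate assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, clusters


centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

Unlike classification, no labels are supplied: the two groups emerge from the distances alone, and naming each cluster (like labelling a shelf) is left to a person.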
Prediction:
Prediction, as its name implies, is a data mining technique that discovers relationships among independent variables and between dependent and independent variables.
For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
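For the sales and profit example, the simplest fitted curve is a straight line found by ordinary least squares. The sketch below assumes a linear relationship and uses invented figures; a real analysis might need a different model form.

```python
# Hypothetical history: sales (independent variable) and profit (dependent variable).
SALES = [100.0, 200.0, 300.0, 400.0]
PROFIT = [12.0, 21.0, 33.0, 42.0]


def fit_line(xs, ys):
    """Ordinary least squares for profit = slope * sales + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(SALES, PROFIT)
forecast = slope * 500.0 + intercept  # predicted profit if sales reach 500
print(f"profit = {slope:.3f} * sales + {intercept:.2f}; forecast at 500: {forecast:.2f}")
```

With these numbers the fitted line is profit = 0.102 * sales + 1.5, so sales of 500 give a predicted profit of 52.5.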
Time Series:
APPLICATION OF DATA MINING TECHNIQUES IN STOCK MARKET PREDICTION
Classification
Clustering
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION
Classification can be applied to stock exchange data using decision trees [6]. This helps investors know when to buy or sell stocks.
For the decision tree implementation of classification, six attributes are used, with the class attribute being the action (whether to buy or sell the stock). The definitions of these attributes are shown in the following table.
(Table: attribute definitions, with columns S#, Status and Description)
Given these attributes, the sample stock database looks as shown in the table below.
If open, min, max or last > previous, replace the value with a positive sign.
If open, min, max or last < previous, replace the value with a negative sign.
If open, min, max or last = previous, replace the value with equal.
The table below shows the data after this conversion.
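The three conversion rules can be written as a small function. In this sketch the field names follow the slide (open, min, max, last, previous, action); treating `previous` as the previous closing price, and dropping it once it has served as the reference value, are assumptions, and the rows are invented.

```python
# Hypothetical raw rows; "previous" is assumed to be the previous closing price and "action" the class attribute.
RAW = [
    {"previous": 10.0, "open": 10.2, "min": 9.8, "max": 10.5, "last": 10.0, "action": "sell"},
    {"previous": 10.0, "open": 9.9, "min": 9.7, "max": 10.0, "last": 10.1, "action": "buy"},
]

PRICE_FIELDS = ("open", "min", "max", "last")


def sign_of(value, previous):
    """Map a price to positive / negative / equal relative to the previous price."""
    if value > previous:
        return "positive"
    if value < previous:
        return "negative"
    return "equal"


def to_signs(rows):
    """Replace each price field with its sign; keep the class attribute, drop 'previous' (assumption)."""
    return [
        {**{f: sign_of(row[f], row["previous"]) for f in PRICE_FIELDS}, "action": row["action"]}
        for row in rows
    ]


for row in to_signs(RAW):
    print(row)
```

Discretising prices this way turns continuous values into a few categories, which suits a decision tree that splits on categorical attributes.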
After the data transformation, the next step is to apply the classification model using the decision tree technique. The resulting rules are shown in the figure below.
Each of the six attributes can appear in the decision tree according to its gain ratio. In the figure, the root of the classification tree is open, which has the maximum gain ratio; a positive open value gives a sell indication.
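Gain ratio (the split criterion used by C4.5) divides an attribute's information gain by its split information, so that attributes with many distinct values are not unfairly favoured. The rows below are invented and already in the transformed sign format; they are not the data set from [6], and only two attributes are included to keep the example short.

```python
import math
from collections import Counter

# Invented rows in the transformed (sign) format; not the data set used in [6].
ROWS = [
    {"open": "positive", "last": "positive", "action": "sell"},
    {"open": "positive", "last": "negative", "action": "sell"},
    {"open": "negative", "last": "positive", "action": "buy"},
    {"open": "negative", "last": "negative", "action": "buy"},
    {"open": "equal", "last": "positive", "action": "buy"},
]


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on attribute, divided by the split information."""
    values = [r[attribute] for r in rows]
    remainder = sum(
        count / len(rows) * entropy([r[target] for r in rows if r[attribute] == value])
        for value, count in Counter(values).items()
    )
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


scores = {a: gain_ratio(ROWS, a, "action") for a in ("open", "last")}
root = max(scores, key=scores.get)
print(scores, "-> root:", root)
```

In this toy data every open value yields a pure subset, so open has the larger gain ratio and becomes the root; the attribute with the maximum gain ratio is chosen at each node in the same way.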
APPLICATION OF CLUSTERING IN STOCK MARKET PREDICTION
Other data mining techniques can also be applied to obtain information from stock exchange data; for example, clustering may help classify data that has no crisp boundaries. Once the classes are well known, other types of analysis, such as association rules or neural networks, can be applied easily.
IMPLEMENTATION OF CLASSIFICATION AND TIME SERIES IN RAPIDMINER
IMPLEMENTATION
The following operators were included in our RapidMiner process:
Data Extraction
Renaming (Cleaning)
Date to Numerical (Classification)
Filter Data for January (Classification)
Windowing (Time Series)
Select Attributes (Classification)
Renaming 2 (Cleaning)
Generate Gain / Loss Attribute (Classification)
Generate Profit / Loss Attribute (Classification)
Reorder Attributes (Cleaning)
Histogram (Visual Representation)
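The RapidMiner process itself is built from GUI operators, so there is no code to reproduce. As a rough Python analogue of a few of those operators (not RapidMiner's API), the sketch below filters January quotes, derives a gain / loss value, labels each day as Profit or Loss, counts the labels for a histogram, and builds sliding windows. The quote layout and the definition of gain as close minus open are assumptions, since the slides do not define them.

```python
from collections import Counter
from datetime import date

# Hypothetical daily quotes: (trading date, open price, close price).
QUOTES = [
    (date(2015, 12, 31), 50.0, 51.0),
    (date(2016, 1, 4), 51.0, 50.2),
    (date(2016, 1, 5), 50.2, 50.9),
    (date(2016, 1, 6), 50.9, 51.4),
    (date(2016, 2, 1), 51.4, 51.0),
]


def filter_month(rows, month):
    """Like 'Date to Numerical' + 'Filter Data for January': keep rows whose month number matches."""
    return [row for row in rows if row[0].month == month]


def windows(values, size):
    """Like 'Windowing': every run of `size` consecutive values becomes one example."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]


def profit_loss(rows):
    """Like 'Generate Gain / Loss' (close - open) followed by 'Generate Profit / Loss' (label by sign)."""
    labels = []
    for _, open_price, close_price in rows:
        gain = close_price - open_price
        labels.append("Profit" if gain > 0 else "Loss")  # a zero change counts as Loss here (arbitrary choice)
    return labels


january = filter_month(QUOTES, 1)
frequency = Counter(profit_loss(january))  # the counts a Profit / Loss histogram would plot
print(frequency)
print(windows([close for _, _, close in january], 2))
```

Filtering by month number works across all years in the data, which matches comparing January behaviour over a multi-year range.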
RESULTS
Kia Motors:
Kia Motors Profit / Loss Frequency for January (2006-2016) Histogram
Toyota Motors:
Toyota Motors Profit / Loss Frequency for January (2006-2016) Histogram
The following table compares the profit / loss frequency of all three companies:
The table shows that only Toyota Motors has a higher profit frequency than loss frequency. It is therefore less risky to buy Toyota Motors stock in January than stock of the other two companies.
CONCLUSION
REFERENCES
[1] S. Sumathi and S. N. Sivanandam, Introduction to Data Mining and its Applications, vol. 29. Berlin: Springer, 2006.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. American Association for Artificial Intelligence, 1996, pp. 1-34.
[3] S. Patel, P. Patel, and S. Patel, "Overview of ETL process with its important," International Journal of Engineering Research and Applications (IJERA), 2012.
[4] S. Sumathi and S. N. Sivanandam, Introduction to Data Mining and its Applications (Studies in Computational Intelligence). Springer-Verlag New York, Inc., 2006.
[5] E. Hajizadeh, H. D. Ardakani, and J. Shahrabi, "Application of data mining techniques in stock markets: A survey," Journal of Economics and International Finance, vol. 2, p. 109, 2010.
[6] Q. A. Al-Radaideh, A. A. Assaf, and E. Alnagi, "Predicting stock prices using data mining techniques," in The International Arab Conference on Information Technology, 2013.
[7] S. Fallahpour, M. H. Zadeh, and E. N. Lakvan, "Use of clustering approach for portfolio management," International SAMANM Journal of Finance and Accounting, 2014.