
DATA MINING AND ITS APPLICATION IN STOCK MARKET

MUHAMMAD SAUD KHAN
MUHAMMAD AZEEM IQBAL
SHEIKH ISRAR AHMAD
FAHAD ANWAAR
SCHEME OF PRESENTATION

Introduction
KDD
ETL Process
Data Warehouse
Machine Learning
Data Mining and its Techniques
Association
Classification
Decision Trees
Clustering
Prediction
Time Series
Application of Data Mining in Stock Market Prediction
Implementation of Data Mining Techniques using RapidMiner
Conclusion
References
INTRODUCTION

Databases and data are everywhere.
An enormous amount of data is available.
There is demand to extract knowledge from this data.
Data mining comes to the rescue.
Data Mining Techniques
Application of Data Mining in Stock Market Prediction
KNOWLEDGE DISCOVERY IN DATABASES

Knowledge Discovery in Databases (KDD) refers to the process of finding knowledge in data by applying data mining methods. Data mining is considered a subset of KDD; it focuses on extracting patterns and information from data sets suited to the data mining technique used.
KDD process stages: Selection, Preprocessing, Transformation, Data Mining, Interpretation/Evaluation

Developing an Understanding
Developing an understanding of the application domain and of the end user's KDD goals.

Creating a target data set
Selecting a data set, or data samples, on which discovery is to be performed.

Data cleaning and preprocessing
Removing noise or outliers, collecting the information needed to model the data, and deciding on strategies for handling noise or missing data fields.

Data reduction and projection
Finding useful features to represent the data, depending on the goal of the task. This also includes using reduction or transformation methods to reduce the effective number of variables under consideration, or to find simpler representations for the data.

The data mining task
Deciding whether the goal of the KDD process is classification, regression, clustering, etc.

Choosing the data mining algorithm(s)
Selecting the methods to be used for searching for patterns in the data. This includes deciding which model(s) and parameters may be appropriate, and matching a particular data mining method with the overall criteria of the KDD process.

Data mining
Searching for patterns of interest in a particular representational form, or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.

Interpreting mined patterns
Interpreting the trends by studying the mined patterns.

Consolidating and visualizing discovered knowledge
Understanding the knowledge discovered as a result of the whole process and presenting it visually to the targeted audience.
ETL PROCESS

ETL stands for Extract-Transform-Load.

It is a process with both a logical and a physical implementation.
It describes how data is extracted from its source and loaded into the data warehouse.
Data is extracted from homogeneous or heterogeneous source systems, transformed into the proper format for storage, and loaded into the destination system or data warehouse.
More recently, ETL has come to include cleaning as a separate step, so the sequence becomes Extract-Clean-Transform-Load.
ETL PROCESS

Extract -> Transform -> Load
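EXTRACT

The Extract step covers extracting the data from the source and making it accessible for further processing.

There are three ways to perform the extract:

Update notification: the source system notifies when a record has changed.
Incremental extract: the source system may not send a notification, but it can identify which records have changed.
Full extract: no notification is available, so the complete data set is extracted.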
TRANSFORM

The transform step applies a set of rules to transform the data from the source to the target.

The transformation step may also require joining data from several sources, generating aggregates, generating surrogate keys, sorting, deriving new calculated values, and applying advanced validation rules.
TRANSFORM

An important function of transformation is the cleaning of data in the data warehouse:

Making identifiers unique (e.g. sex categories Male/Female/Unknown)
Converting null values into a standardized Not Available/Not Provided value
Converting phone numbers and ZIP codes to a standardized form
Validating address fields and converting them to proper naming, e.g. Street/St/St./Str.
Validating address fields against each other (State/Country, City/State, City/ZIP code, City/Street)
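A few of the cleaning rules listed above can be sketched in Python. This is only an illustration under assumptions: the function names, the "Not Available" placeholder, and the list of street abbreviations are hypothetical choices, not part of any particular ETL tool.

```python
import re

# Hypothetical set of spellings treated as "Street" (assumption for illustration).
# Note: a real rule would need care, since "St" can also mean "Saint".
STREET_VARIANTS = {"street", "st", "st.", "str", "str."}


def clean_null(value):
    """Replace missing or blank values with a standardized placeholder."""
    if value is None or str(value).strip() == "":
        return "Not Available"
    return value


def clean_phone(raw):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", raw)


def clean_street(address):
    """Rewrite any known street abbreviation as the word 'Street'."""
    words = address.split()
    return " ".join("Street" if w.lower() in STREET_VARIANTS else w for w in words)


cleaned_phone = clean_phone("(555) 010-2345")
cleaned_address = clean_street("12 Main St.")
cleaned_missing = clean_null("  ")
```

Each rule is a small pure function, so the rules can be applied column by column during the transform step.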
LOAD

The Load process loads data into the warehouse. It is necessary to ensure that the load is performed correctly and with as few resources as possible. The target of the Load process is often a database.
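The three ETL steps can be sketched end to end with the Python standard library. This is a minimal sketch, not a production pipeline: the CSV content, the table name prices, and the rule of dropping rows with a missing price are all made up for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Made-up source data; the second row has a missing closing price.
SOURCE_CSV = "symbol,close\nAAA,10.5\nBBB,\nCCC,7.25\n"


def extract(text):
    """Extract: read rows from the source and make them accessible."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing price and convert types."""
    cleaned = []
    for row in rows:
        if row["close"] == "":
            continue  # cleaning rule assumed for this sketch
        cleaned.append((row["symbol"], float(row["close"])))
    return cleaned


def load(records, connection):
    """Load: write the transformed records into the target database."""
    connection.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, close REAL)")
    connection.executemany("INSERT INTO prices VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), connection)
loaded = connection.execute("SELECT symbol, close FROM prices ORDER BY symbol").fetchall()
```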
ETL, DATA WAREHOUSE, DATA MINING
DATA WAREHOUSE

A data warehouse is a repository for all the data that an enterprise's various business systems collect. The repository may be physical or logical.

Data Mart
A data mart is a subset of the warehouse that contains the information relevant to a single specified subject (a business area or functional area).
DATA WAREHOUSE

There are two approaches to data warehousing:

Top Down:
The top-down approach builds the data marts with the help of ETL after the complete data warehouse has been created.
Bottom Up:
The bottom-up approach builds the data marts first and then combines them into a single, all-encompassing data warehouse.
MACHINE LEARNING

Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn and adapt when exposed to new data, machine learning allows computer programs to find hidden insights without being explicitly programmed where to look.

The process of machine learning is similar to that of data mining but..
MACHINE LEARNING

Machine learning algorithms are often categorized as:

Supervised:
Supervised algorithms apply what has been learned from labeled past data to new data.
Unsupervised:
Unsupervised algorithms are used to draw inferences from unlabeled datasets.
DATA MINING

"The process of extracting previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions" (Simoudis, 1996).
DATA MINING

Data mining can be applied to many types of data, in areas ranging from weather forecasting and stock market trends to electric load prediction and product design.
It can be described as an analytic process designed to explore data (usually large amounts of data) in search of consistent patterns and/or systematic relationships between variables.
The findings are then validated by applying the detected patterns to new subsets of data.
The ultimate goal is prediction.
DATA MINING

The process of data mining consists of three stages:

Exploration
This stage involves collecting and preparing the data, including data cleaning and transformation. Depending on the size of the data, different analysis tools may be required. This stage helps identify the variables in the data and characterize their behavior.
Model Building
This stage involves choosing the best model based on predictive performance. It is also referred to as pattern identification. It is a somewhat complex stage because it involves choosing the pattern that best supports prediction.
Deployment
This stage applies the model to new data in order to generate predictions or estimates of the expected outcome.
Muhammad Azeem Iqbal


DATA MINING TECHNIQUES


Association
Classification
Decision Trees
Clustering
Prediction
Time Series
APPLICATION OF DATA MINING IN STOCK MARKET
PREDICTION
IMPLEMENTATION OF DATA MINING TECHNIQUES
USING RAPIDMINER
CONCLUSION
DATA MINING TECHNIQUES

Data mining includes several algorithms and techniques for picking out interesting patterns from large data sets. We will discuss the following data mining techniques:
Association
Classification
Decision Trees
Clustering
Prediction
Time Series
DATA MINING TECHNIQUES

Association:
Association is one of the most popular and well-researched data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction. The association technique is commonly used by retailers to research customers' buying habits.

For example, based on historical sales data, retailers might find that customers usually buy milk when they buy tea. They can therefore put milk and tea next to each other to save customers time and increase sales.
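The tea and milk example can be made concrete with the two standard measures of an association rule, support and confidence. The sketch below uses a handful of made-up transactions; the item names and counts are assumptions for illustration only.

```python
# Made-up shopping baskets (assumption for illustration).
transactions = [
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"bread", "milk"},
    {"tea", "biscuits", "milk"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


tea_milk_support = support({"tea", "milk"}, transactions)
tea_to_milk_confidence = confidence({"tea"}, {"milk"}, transactions)
```

In this toy data, tea and milk appear together in three of five baskets, and every basket with tea also contains milk, so the rule "tea implies milk" has high confidence.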
DATA MINING TECHNIQUES

Classification:
Classification is a classic data mining technique based on machine learning. It is used to assign each item in a set of data to one of a predefined set of classes or groups.

Classification makes use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we develop software that can learn how to classify the data items into groups.
CLASSIFICATION

For example, given the records of all employees who left the company, we can apply classification to predict who will probably leave the company in a future period. In this case, we divide the employee records into two groups, named leave and stay, and then ask our data mining software to classify the employees into these groups.
DATA MINING TECHNIQUES

Decision Trees
The decision tree is one of the most commonly used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that narrow down the data, so that we can make the final decision based on it.
DECISION TREE PROCESS

Identify the root node from the test attribute, which is determined on the basis of information gain.
The attribute with the highest information gain becomes the test attribute.
Child nodes are identified by applying the information gain step recursively.
Leaf nodes are identified when the attribute list becomes empty, i.e. when there are no test attributes available.
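Choosing the test attribute by information gain can be sketched as follows. The four training rows are hypothetical and only meant to show the calculation; entropy is measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training rows (assumption for illustration).
rows = [
    {"age": "youth", "student": "no", "buys_computer": "no"},
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "senior", "student": "no", "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]

gains = {a: information_gain(rows, a, "buys_computer") for a in ("age", "student")}
test_attribute = max(gains, key=gains.get)
```

In this toy data the student attribute separates the classes perfectly, so it has the highest gain and would become the test attribute at the root; the same step is then repeated on each child's subset of rows.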


DECISION TREE EXAMPLE

Buys computer - Decision Tree

IF age = youth and student = no THEN buys_computer = no
IF age = youth and student = yes THEN buys_computer = yes
DATA MINING TECHNIQUES

Clustering:
Clustering is a data mining technique that uses automatic algorithms to form meaningful or useful clusters of objects that have similar characteristics.
The clustering technique defines the classes and puts objects in each class, whereas in classification techniques objects are assigned to predefined classes.
This technique groups data into categories through unsupervised learning.
In clustering, unsupervised learning can be implemented by starting from random partitions of the database.
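One common clustering algorithm that starts from an initial partition and refines it is k-means. The sketch below is a simplified one-dimensional version under stated assumptions: the values are made up, and a round-robin assignment is used as a deterministic stand-in for a random initial partition so that the result is reproducible.

```python
def k_means_1d(values, k, iterations=20):
    """Simplified 1-D k-means: alternate centroid updates and reassignment."""
    # Deterministic stand-in for a random initial partition (assumption).
    assignment = [i % k for i in range(len(values))]
    centroids = [0.0] * k
    for _ in range(iterations):
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
        new_assignment = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
    return assignment, centroids


# Made-up values forming two obvious groups.
values = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
assignment, centroids = k_means_1d(values, k=2)
```

No class labels are supplied; the two groups emerge from the data alone, which is the difference from classification described above.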
Clustering Example

In a library, a wide range of books on various topics is available. The challenge is how to arrange those books so that readers can take several books on a particular topic without hassle.

By using the clustering technique, we can keep books that have some kind of similarity in one cluster, or on one shelf, and label it with a meaningful name.

If readers want books on that topic, they only have to go to that shelf instead of searching the entire library.
DATA MINING TECHNIQUES

Prediction:

Prediction, as its name implies, is a data mining technique that discovers the relationships among independent variables and the relationships between dependent and independent variables.

For instance, prediction analysis can be used in sales to predict future profit: if we consider sales to be the independent variable, profit could be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
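The sales and profit example can be sketched with the simplest regression model, a straight line fitted by ordinary least squares. The figures are made up, and a straight line is an assumption; real data may call for a different curve.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: sales (independent) and profit (dependent).
sales = [100.0, 200.0, 300.0, 400.0]
profit = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 500.0 + intercept  # prediction for a future sales figure
```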
DATA MINING TECHNIQUES

Time Series:

A time series is a sequence of data points, typically measured at successive times spaced at (often uniform) time intervals. Time series analysis comprises methods that attempt to understand such time series, either to understand the underlying context of the data points (where they came from and how they were generated) or to make forecasts (predictions).
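Two basic time series operations can be sketched in a few lines: turning a series into sliding windows (each window paired with the value that follows it) and making a naive forecast from the average of the most recent window. The prices and the window size of 3 are assumptions for illustration, and a moving average is only one of many possible forecasting methods.

```python
def sliding_windows(series, window_size):
    """Pair each window of past values with the value that follows it."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


def moving_average_forecast(series, window_size):
    """Forecast the next value as the mean of the last window_size values."""
    window = series[-window_size:]
    return sum(window) / len(window)


# Made-up closing prices at uniform time intervals.
closing_prices = [10.0, 11.0, 12.0, 13.0, 14.0]
pairs = sliding_windows(closing_prices, 3)
forecast = moving_average_forecast(closing_prices, 3)
```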
APPLICATION OF DATA MINING TECHNIQUES IN STOCK MARKET PREDICTION

Classification

Clustering
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION

Classification can be applied to stock exchange data using decision trees [6]. This helps investors know when to buy or sell stocks.
For the decision tree implementation of classification, six attributes are taken, with action (whether to buy or sell the stock) as the class attribute.
The definitions of these attributes are shown in the table on the next slide.
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION

S#  Attribute  Description
1   Previous   Previous day close price of the stock
2   Open       Current day open price of the stock
3   Min        Current day minimum price of the stock
4   Max        Current day maximum price of the stock
5   Last       Current day close price of the stock
6   Action     The action taken by the investor on this stock (Buy, Sell)
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION

Given these attributes, the sample stock database looks as shown in the table below.
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION

If open, min, max or last > previous, replace the value with positive.
If open, min, max or last < previous, replace the value with negative.
If open, min, max or last = previous, replace the value with equal.
After this conversion of the input data, the transformed table is as shown in the table.
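The conversion rules above can be sketched as a small function that compares each price attribute with the previous day's close. The sample record is made up, and the lowercase field names are assumptions chosen to mirror the attribute names.

```python
def to_sign(value, previous):
    """Map a price to positive, negative or equal relative to the previous close."""
    if value > previous:
        return "positive"
    if value < previous:
        return "negative"
    return "equal"


def transform_record(record):
    """Replace open, min, max and last with their sign; keep the action label."""
    previous = record["previous"]
    transformed = {
        name: to_sign(record[name], previous)
        for name in ("open", "min", "max", "last")
    }
    transformed["action"] = record["action"]
    return transformed


# Made-up record for one stock on one day.
sample = {"previous": 10.0, "open": 10.5, "min": 9.8, "max": 10.9, "last": 10.0, "action": "sell"}
transformed = transform_record(sample)
```

After this step every input attribute is categorical, which is the form a decision tree learner can split on directly.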
APPLICATION OF CLASSIFICATION IN STOCK MARKET PREDICTION

After the data transformation, the next step is to apply the classification model using the decision tree technique. The rules built in this way are shown in the figure below.

Each of the six attributes defined can be placed on the decision tree based on the gain ratio. From the figure, the root of the classification tree is open; when it is positive, it gives a sell indication with the maximum gain ratio.
APPLICATION OF CLUSTERING IN STOCK MARKET PREDICTION

The application of clustering techniques to stock exchange data is useful for portfolio management [7], or for preprocessing data to identify unknown classes before applying other data mining techniques.
Clustering can also be used to identify stocks that have similar temporal behavior. For example, tea and coffee stocks may show similar buying patterns in winter because of increased profitability. In this scenario, clustering will place the tea and coffee stocks in the same cluster, giving investors an indication that if the value of one of the stocks increases, the other is likely to increase as well.
APPLICATION OF CLUSTERING IN STOCK MARKET PREDICTION

Some researchers have adopted clustering as a preprocessing step before applying other data mining techniques to extract information from stock exchange data; for example, clustering may help in classifying data that has no crisp boundaries. Once the classes are well known, other types of analysis, such as association rules or neural networks, can be applied easily.
IMPLEMENTATION OF CLASSIFICATION AND TIME SERIES IN RAPIDMINER

We implemented classification and time series techniques to predict the safest company, in terms of estimated risk, for buying stocks in the month of January, based on historical data collected from Yahoo Finance. We selected three car manufacturing companies for our implementation:
Kia Motors
Toyota Motors
BMW
The process was run on the past ten years of data for these companies.
IMPLEMENTATION
The following operators were included in our RapidMiner process:
Data Extraction
Renaming (Cleaning)
Date to Numerical (Classification)
Filter Data for January (Classification)
Windowing (Time Series)
Select Attributes (Classification)
Renaming 2 (Cleaning)
Generate Gain / Loss Attribute (Classification)
Generate Profit / Loss Attribute (Classification)
Reorder Attributes (Cleaning)
Histogram (Visual Representation)
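The core of this process (filter the January records, label each day as profit or loss, and count the labels) can be approximated in plain Python. This is not the RapidMiner process itself, only a rough analogue under assumptions: the rows are made up, and a day is labeled profit when its close is above its open, a definition the slides do not state.

```python
from collections import Counter
from datetime import date


def january_profit_loss(rows):
    """Count profit and loss days among the January rows."""
    counts = Counter()
    for day, open_price, close_price in rows:
        if day.month != 1:
            continue  # keep January only
        # Assumed labeling rule: close above open is a profit, otherwise a loss.
        counts["profit" if close_price > open_price else "loss"] += 1
    return counts


# Made-up daily records: (date, open price, close price).
rows = [
    (date(2015, 1, 5), 10.0, 10.4),
    (date(2015, 1, 6), 10.4, 10.1),
    (date(2015, 2, 2), 10.1, 10.9),
    (date(2016, 1, 4), 11.0, 11.2),
]
counts = january_profit_loss(rows)
```

The resulting counts are the kind of profit / loss frequencies that a histogram would then display.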
RESULTS

Kia Motors:
[Figure: Kia Motors Profit / Loss Frequency for January (2006-2016) Histogram]
RESULTS

Toyota Motors:
[Figure: Toyota Motors Profit / Loss Frequency for January (2006-2016) Histogram]
RESULTS

BMW (Bayerische Motoren Werke):
[Figure: BMW Profit / Loss Frequency for January (2006-2016) Histogram]
RESULTS

The following table shows the profit / loss frequency comparison between all three companies:

Sr.  Company        Loss Freq.  Profit Freq.
1    Kia Motors     139         97
2    Toyota Motors  106         113
3    BMW            125         116

The table shows that only Toyota Motors has a greater profit frequency than loss frequency. It is therefore less risky to buy Toyota Motors stock in January than stock of the other two companies.
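The comparison can be checked with a few lines of arithmetic on the frequencies from the table. Ranking by the share of profit days is one simple way to compare the companies; it is an illustrative choice, not a full risk measure.

```python
# (loss frequency, profit frequency) taken from the results table.
frequencies = {
    "Kia Motors": (139, 97),
    "Toyota Motors": (106, 113),
    "BMW": (125, 116),
}

# Share of profit days among all counted days for each company.
profit_share = {
    company: profit / (loss + profit)
    for company, (loss, profit) in frequencies.items()
}

more_profit_than_loss = [
    company for company, (loss, profit) in frequencies.items() if profit > loss
]
safest = max(profit_share, key=profit_share.get)
```

The profit shares come out at roughly 0.41 for Kia Motors, 0.52 for Toyota Motors and 0.48 for BMW.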
CONCLUSION

Data mining is an important part of the knowledge discovery process: it can analyze an enormous set of data and extract hidden, useful knowledge. Data mining can be applied effectively not only in the business environment but also in other fields such as weather forecasting, medicine, transportation, healthcare, insurance and government. Data mining can be very beneficial if suitable techniques are adopted for the task at hand.
CONCLUSION

The suitability of a data mining technique depends on the main goal of the user and on the underlying structure of the database. In the case of the stock exchange, the investor's main goals are informed decision making and prediction of future prices. Because stock exchange data is nonlinear in nature, no particular data mining technique can be considered the most suitable for predicting future prices. However, an approximate prediction can be made either through mining historical data or through pattern recognition.
REFERENCES

[1] S. Sumathi and S. N. Sivanandam, Introduction to Data Mining and its Applications, vol. 29. Berlin: Springer, 2006.
[2] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery: an overview," in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. American Association for Artificial Intelligence, 1996, pp. 1-34.
[3] S. Patel, P. Patel, and S. Patel, "Overview of ETL process with its important," International Journal of Engineering Research and Application (IJERA), 2012.
[4] S. Sumathi and S. N. Sivanandam, Introduction to Data Mining and its Applications (Studies in Computational Intelligence). Springer-Verlag New York, Inc., 2006.
[5] E. Hajizadeh, H. D. Ardakani, and J. Shahrabi, "Application of data mining techniques in stock markets: A survey," Journal of Economics and International Finance, vol. 2, p. 109, 2010.
[6] Q. A. Al-Radaideh, A. A. Assaf, and E. Alnagi, "Predicting stock prices using data mining techniques," The International Arab Conference on Information Technology, 2013.
[7] S. Fallahpour, M. H. Zadeh, and E. N. Lakvan, "Use of Clustering Approach For Portfolio Management," International SAMANM Journal of Finance and Accounting, 2014.
