
DATA MINING CONCEPTS

WHAT IS DATA MINING?

Definition:
It is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, and statistics.

WHAT IS DATA MINING?
Data mining (knowledge discovery from data):
- Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.

WHY DATA MINING?
Credit ratings/targeted marketing:
- Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
- Identify likely responders to sales promotions.
Fraud detection:
- Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
Customer relationship management:
- Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining helps extract such information.

DATA MINING VS KDD
Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.
Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.

KNOWLEDGE DISCOVERY PROCESS
Data mining: the core of the knowledge discovery process.
[Figure: the knowledge discovery pipeline. Stages shown: Databases, Data Cleaning, Data Integration, Preprocessed Data, Selection, Task-relevant Data, Data Transformations, Data Mining, Interpretation, Knowledge.]

GOALS
PREDICTION - Data mining can show how certain attributes within the data will behave in the future.
IDENTIFICATION - Data patterns can be used to identify the existence of an item, an event, or an activity.
CLASSIFICATION - Data mining can partition the data so that different classes or categories can be identified based on combinations of parameters.
OPTIMIZATION - One goal of data mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.

BASIC OPERATION
ASSOCIATION RULES - These rules correlate the presence of a set of items with another range of values for another set of variables.
- Example: When a female retail shopper buys a handbag, she is likely to buy shoes.
CLASSIFICATION HIERARCHIES - The goal is to work from an existing set of events or transactions to create a hierarchy of classes.

BASIC OPERATION
SEQUENTIAL PATTERNS - A sequence of actions or events is sought.
CLUSTERING - A given population of events or items can be partitioned (segmented) into sets of "similar" elements.
- Example: An entire population of treatment data on a disease may be divided into groups based on the similarity of side effects produced.

ASSOCIATION RULES
An association algorithm creates rules that describe how often events have occurred together. (2)
Example: When a customer buys a hammer, then 90% of the time they will buy nails.
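
The "90% of the time" figure in the hammer/nails example is what is usually called the confidence of a rule. A minimal sketch of how it could be computed follows. The baskets are made-up data, and the function names `support` and `confidence` are illustrative, not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also
    contain `consequent`."""
    with_antecedent = [t for t in transactions if frozenset(antecedent) <= t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if frozenset(consequent) <= t)
    return both / len(with_antecedent)


# Hypothetical baskets: 10 contain a hammer, and 9 of those also contain nails.
baskets = (
    [frozenset({"hammer", "nails"})] * 9
    + [frozenset({"hammer"})]
    + [frozenset({"paint"})] * 10
)

hammer_to_nails = confidence(baskets, {"hammer"}, {"nails"})
both_support = support(baskets, {"hammer", "nails"})
```

Here `hammer_to_nails` is 9 out of 10 hammer baskets, and `both_support` is 9 out of all 20 baskets. A rule is normally reported only when both values clear chosen thresholds.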

CLASSIFICATION
Given old data about customers and payments, predict new applicants' loan eligibility.
[Figure: data on previous customers (Age, Salary, Profession, Location, Customer type) is fed to a classifier, which produces decision rules (Salary > 5 L, Prof. = Exec). The rules are applied to new applicants' data to label each applicant good or bad.]

CLASSIFICATION
Decision Tree Method
[Figure: Example of Decision Tree for Credit Card Application. The root node tests Married (no/yes); other decision nodes test Salary, Accnt Bal, and Age; branch labels include <20k, >=20k, >=50k, <5k, >=5k, <25k, and >=25k; the leaves are poor risk, fair risk, and good risk.]
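
A decision tree like the credit card example can be read as nested if/else tests. The sketch below is one plausible reading only. Which threshold belongs to which branch, and treating the Age split as 25 years, are assumptions, since the figure layout is not fully recoverable. The function name `credit_risk` is hypothetical.

```python
def credit_risk(married: bool, salary: float, account_balance: float, age: int) -> str:
    """Walk an assumed version of the credit card decision tree.

    Assumed layout (not confirmed by the source):
      Married = no  -> split on Salary
      Married = yes -> split on Accnt Bal, then on Age
    """
    if not married:
        if salary < 20_000:
            return "poor risk"
        if salary < 50_000:
            return "fair risk"
        return "good risk"

    if account_balance < 5_000:
        return "poor risk"
    if age < 25:
        return "fair risk"
    return "good risk"


# Hypothetical applicants.
single_low_salary = credit_risk(married=False, salary=15_000, account_balance=0, age=30)
married_young = credit_risk(married=True, salary=40_000, account_balance=8_000, age=23)
```

Each path from the root to a leaf corresponds to one decision rule, which is why a tree and a rule list can be used interchangeably for this kind of classifier.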

SEQUENTIAL PATTERNS
Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.

(A B) (C) (D E)

Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
[Figure: the pattern (A B) (C) (D E) annotated with the timing constraints <= xg, > ng, <= ws, and <= ms.]

SEQUENTIAL PATTERN: EXAMPLE
In point-of-sale transaction sequences:
Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
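
To make the bookstore rule concrete, the sketch below checks whether a customer's purchase history contains a pattern in order, then counts how often that happens. The customer histories are made-up data, timing constraints are ignored, and the names `contains_pattern` and `pattern_support` are illustrative.

```python
def contains_pattern(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a distinct transaction
    in `sequence`, with the transactions appearing in the same order."""
    pos = 0
    for itemset in pattern:
        # Advance to the next transaction that contains this itemset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later itemsets must match strictly later transactions
    return True


def pattern_support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    return sum(contains_pattern(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase histories, one list of transactions per customer.
customers = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"C++_Primer"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}, {"Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"Cookbook"}, {"C++_Primer"},
     {"Perl_for_dummies", "Tcl_Tk", "Magazine"}],
]

left_side = [{"Intro_To_Visual_C"}, {"C++_Primer"}]
full_rule = left_side + [{"Perl_for_dummies", "Tcl_Tk"}]

left_support = pattern_support(customers, left_side)
rule_support = pattern_support(customers, full_rule)
rule_confidence = rule_support / left_support
```

Three of the four customers match the left-hand side and two match the whole rule, so the rule's confidence on this toy data is two out of three.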

CLUSTERING
Clustering algorithms find groups of items that are similar. They divide a data set so that records with similar content are in the same group, and groups are as different as possible from each other.
Example: An insurance company could use clustering to group clients by their age, location, and types of insurance purchased.
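
As an illustration of the insurance example, here is a minimal k-means sketch. k-means is one common clustering algorithm and is not named in the slides. The client data (age, number of policies) and the starting centroids are hypothetical, and location is left out because it is not numeric.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points. Returns the final centroids and
    the cluster assignments made in the last iteration."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical clients: (age, number of policies held).
clients = [(22, 1), (25, 1), (27, 2), (58, 4), (61, 5), (65, 4)]

# Assumed starting centroids: the youngest and the oldest client.
centroids, clusters = kmeans(clients, centroids=[(22, 1), (65, 4)])
```

On this data the younger and older clients end up in separate clusters. With real features on different scales (age versus premium amounts, say), the values would usually be normalized first so that one feature does not dominate the distance.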

CLUSTERING
Group Data into Clusters
- Similar data is grouped in the same cluster
- Dissimilar data is grouped in different clusters
How is this achieved?
K-Nearest Neighbor
A classification method that classifies a point by calculating the distances between the point and points in the training data set. Then it assigns the point to the class that is most common among its k nearest neighbors.
APPLICATIONS OF DATA MINING
Banking: loan/credit card approval
- predict good customers based on old customers
Customer relationship management:
- identify those who are likely to leave for a competitor
Targeted marketing:
- identify likely responders to promotions
Fraud detection: telecommunications, financial transactions
- from an online stream of events, identify fraudulent events

APPLICATIONS OF DATA MINING
Medicine: disease outcome, effectiveness of treatments
- analyze patient disease history: find relationships between diseases
Molecular/Pharmaceutical:
- identify new drugs
Scientific data analysis:
- identify new galaxies by searching for sub clusters
Web site/store design and promotion:

ADVANTAGES OF DATA MINING
Provides new knowledge from existing data:
- Public databases
- Government sources
- Company databases
Old data can be used to develop new knowledge.
New knowledge can be used to improve services or products.
Improvements lead to:
- Bigger profits
- More efficient service

THANK YOU & GOD BLESS!
