Data Mining and Its Applications
Data Mining Techniques: For Marketing, Sales, and Customer Support, by Michael J.A. Berry and Gordon Linoff, John Wiley & Sons, Inc., 1997.
Discovering Data Mining: From Concept to Implementation, by Cabena, Hadjinian, Stadler, Verhees and Zanasi, Prentice Hall, 1997.
Building Data Mining Applications for CRM, by Alex Berson, Stephen Smith and Kurt Thearling, McGraw-Hill, 1999.
Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, by Olivia Parr Rud, John Wiley & Sons, Inc., 2001.
Mastering Data Mining: The Art and Science of Customer Relationship Management, by Michael J.A. Berry and Gordon S. Linoff, John Wiley & Sons, Inc., 2000.
Machine Learning, by Tom M. Mitchell, McGraw-Hill, 1997.
Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2001.
Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005.

Why Mine Data?


Lots of data is being collected and warehoused:
Web data, e-commerce
Purchases at department/grocery stores
Bank/credit card transactions

Computers have become cheaper and more powerful

Competitive pressure is strong:
Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Mining Large Data Sets - Motivation


There is often information hidden in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all


What is Data Mining?


Many definitions:
Non-trivial extraction of implicit, previously unknown and potentially useful information from data
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns


What is (not) Data Mining?


What is not Data Mining?

Look up a phone number in a phone directory
Query a Web search engine for information about "Amazon"

What is Data Mining?

Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

Data Mining Tasks


Prediction Methods
Use some variables to predict unknown or future values of other variables.

Description Methods
Find human-interpretable patterns that describe the data.


Three Main Data Mining Tasks


Classification
Clustering
Association Rule Discovery

There are many other approaches, but most of them can be categorized into one of these three.


Data Mining for Retail Industry


Retail industry: huge amounts of data on sales, customer shopping history, etc.

Applications of retail data mining:
Identify customer buying behaviors
Discover customer shopping patterns and trends
Improve the quality of customer service
Achieve better customer retention and satisfaction
Enhance goods consumption ratios
Design more effective goods transportation and distribution policies

July 9, 2016

Data Mining: Concepts and Techniques

Customer Profiling
Question:
What kinds of customers were profitable in the last year?

Data
Customer details such as age, gender, occupation, salary level, account, etc.
Earnings from customers in the last year.

Data Mining
Divide customers into profitability categories according to earnings, such as highly profitable, profitable, non-profitable, and loss.
Find rules using data mining techniques.
Analyze the rules and take actions.


Customer Profiling: Rules


IF age > 30 AND age <= 45 AND
occupation is professional AND
salary level is between 50,000 and 70,000
THEN this customer is profitable
Each rule is backed by statistical measures such as support and confidence.
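
As a rough illustration of what support and confidence mean for a rule of this kind, the sketch below evaluates the rule against a handful of customer records. The records, the field names, and the helper names `matches_condition` and `rule_statistics` are all hypothetical and are not taken from the slides.

```python
# Hypothetical customer records; every value here is made up for illustration.
customers = [
    {"age": 35, "occupation": "professional", "salary": 60000, "category": "profitable"},
    {"age": 42, "occupation": "professional", "salary": 55000, "category": "profitable"},
    {"age": 38, "occupation": "professional", "salary": 65000, "category": "loss"},
    {"age": 28, "occupation": "clerk", "salary": 30000, "category": "non-profitable"},
    {"age": 50, "occupation": "manager", "salary": 90000, "category": "highly profitable"},
]

def matches_condition(c):
    """IF part of the rule: age in (30, 45], professional, salary 50,000 to 70,000."""
    return (30 < c["age"] <= 45
            and c["occupation"] == "professional"
            and 50000 <= c["salary"] <= 70000)

def rule_statistics(records):
    """Return (support, confidence) of the rule: condition -> category is 'profitable'."""
    covered = [c for c in records if matches_condition(c)]
    correct = [c for c in covered if c["category"] == "profitable"]
    support = len(correct) / len(records)        # share of all customers the rule gets right
    confidence = len(correct) / len(covered) if covered else 0.0  # share of covered customers it gets right
    return support, confidence

support, confidence = rule_statistics(customers)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

On this toy data three customers satisfy the IF part and two of them are profitable, so the rule holds for two of five customers overall and for two of the three it covers.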


Customer Segmentation
Customer segmentation is a process to divide customers into different groups or segments.
Customers in the same segment have similar needs or behaviors so that similar marketing strategies or service policies can be applied to them.
Customer segments are required in several business areas including:
Marketing
Customer services
Products and service development
Sales promotion
Purchase recommendation
Customer retention
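
The slides do not name a segmentation algorithm. One common choice is k-means clustering, sketched below in plain Python under that assumption. The two features per customer (visits per month, average basket value), the sample values, and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]
segments, centers = kmeans(customers, centroids=[(1, 20), (10, 200)])
print(segments)
print(centers)
```

In practice the features would usually be rescaled first so that one attribute with a large numeric range does not dominate the distance, and the number of segments would be chosen with the business use in mind.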


Customer Retention

In most industries the cost of retaining a customer, subscriber or client is substantially less than the initial cost of obtaining that customer.

Question:
Find out what kinds of customers tend to churn and build a model which can predict the likely-to-churn customers.

Data mining solution:
Collect data about the customers who have churned.
Select a set of customers who have been loyal.
Merge the two data sets to form training, testing and evaluation data sets.
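
The merge-and-split step can be sketched as follows. This is only an outline under stated assumptions: the 60/20/20 split, the fixed random seed, the helper name `build_datasets`, and the placeholder customer identifiers are all hypothetical, since the slides do not specify them.

```python
import random

def build_datasets(churned, loyal, seed=0, percent=(60, 20)):
    """Label, merge, shuffle, and split customers into training/testing/evaluation sets.

    percent gives the training and testing shares; the remainder is the evaluation set.
    """
    labeled = [(c, 1) for c in churned] + [(c, 0) for c in loyal]  # 1 = churned, 0 = loyal
    random.Random(seed).shuffle(labeled)  # seeded so the split is repeatable
    n_train = len(labeled) * percent[0] // 100
    n_test = len(labeled) * percent[1] // 100
    training = labeled[:n_train]
    testing = labeled[n_train:n_train + n_test]
    evaluation = labeled[n_train + n_test:]
    return training, testing, evaluation

# Placeholder customer identifiers standing in for real customer records.
churned = [f"churned_{i}" for i in range(10)]
loyal = [f"loyal_{i}" for i in range(10)]
training, testing, evaluation = build_datasets(churned, loyal)
print(len(training), len(testing), len(evaluation))
```

Shuffling before splitting keeps churned and loyal customers mixed in all three sets; a stratified split would additionally guarantee the same churn ratio in each set.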


Financial Products Recommendation


Mellon Bank Corporation is a major financial services company headquartered in Pittsburgh.

Product: an extendible loan secured by the value of a client's own property (a home equity line of credit, HELOC).
Achieve the highest possible return on investment.
Based on customers with demand deposit accounts (DDAs), build a model for HELOC.


Data Preparation
The primary data source was the approximately 40,000 Mellon customers who had (or once had) HELOCs and DDAs.

Data
Demographic data sourced both internally and externally (age, income, length of residence, and other indicators of economic condition)
DDA data (history of loan balance over 3, 6, 9, 12, 18 months, history of returned checks, history of interest rates)
Property data sourced externally (home purchase price, loan-to-value ratio)
Other data related to creditworthiness

Use 120 variables



Responders


Basket Analysis
Transactions:
A B C
A C D
B C D
A D E
B C E

Rule         Support   Confidence
A → D        2/5       2/3
C → A        2/5       2/4
A → C        2/5       2/3
B & C → D    1/5       1/3
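
The support and confidence figures for the five baskets on this slide (ABC, ACD, BCD, ADE, BCE) can be recomputed with a few lines of Python. The sketch below uses the standard definitions: support is the fraction of all baskets containing every item in the rule, and confidence is that count divided by the number of baskets containing the left-hand side. The function names are illustrative only.

```python
from fractions import Fraction

# The five baskets from the slide.
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

def support(antecedent, consequent):
    """Fraction of all transactions that contain antecedent and consequent together."""
    both = set(antecedent) | set(consequent)
    return Fraction(sum(both <= t for t in transactions), len(transactions))

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    return Fraction(n_both, n_antecedent)

# Each string is read as a set of single-letter items, so "BC" means B and C.
for lhs, rhs in [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]:
    print(lhs, "->", rhs, support(lhs, rhs), confidence(lhs, rhs))
```

`Fraction` reduces its result, so the confidence of C → A prints as 1/2 rather than 2/4. Note that A → C and C → A share the same support but differ in confidence, because the denominators count different baskets.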

The Impact of Fraud


GAO (the United States General Accounting Office) cited $19.1 billion in improper government payments in 17 major programs for fiscal year 1998:
Medicare: $12.6 billion
Supplemental Security Income: $1.6 billion
The Food Stamp Program: $1.4 billion
Old Age and Survivors Insurance: $1.2 billion
Disability Insurance: $941 million
Housing Subsidies: $847 million
Veterans Benefits, Unemployment Insurance and others: $514 million


Background
HIC (the Health Insurance Commission) in Australia is a federal government agency.
HIC pays insurance claims of more than 20 million Australian dollars and pays out about A$8 billion in funds every year.
More than 300 million transactions are processed and stored every year: 1.3 TB in five years.


Preventing Fraud and Abuse


Business Objectives
The focus of the HIC project was on the recent and steady 10% annual rise in the cost of pathology claims for clinical tests.

Approaches
To identify potentially fraudulent claims or claims arising from inappropriate practice, and
To develop general profiles of the GP practices in order to compare the practice behaviors of individual GPs.


Data Preprocessing
Two databases:

Episode database
One episode record records a patient visit.
In total, 6.8 million records.
There were 227 different pathology tests.

GP (doctor) database
There are 17,000 records related to active GPs.

The behavior of 10,409 GPs was to be studied.
A matrix of 10,409 by 227 elements.
The elements were then scaled from 0 to 1 with respect to the total number of tests of each kind.
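
The slide does not give the exact scaling formula. One plausible reading, assumed here, is that each GP's count for a test is divided by the total number of tests of that kind across all GPs, which puts every element between 0 and 1. The sketch below applies that reading to a tiny matrix; the counts and the function name `scale_by_test_total` are hypothetical.

```python
def scale_by_test_total(counts):
    """Divide each entry by its column total (total number of tests of that kind).

    Assumption: this is one reading of "scaled from 0 to 1 with respect to the
    total number of tests of each kind"; the original project may have used another.
    """
    column_totals = [sum(column) for column in zip(*counts)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, column_totals)]
        for row in counts
    ]

# Hypothetical counts: 3 GPs (rows) by 2 pathology tests (columns).
# The real matrix was 10,409 GPs by 227 tests.
counts = [
    [10, 0],
    [30, 5],
    [60, 15],
]
scaled = scale_by_test_total(counts)
for row in scaled:
    print(row)
```

Under this reading each column sums to 1, so a row shows what share of every test type a given GP accounts for, which is the kind of profile a segmentation step can compare across GPs.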


Input to Segmentation


Overview


Data Mining
The team conducted association rule mining. With support set to 0.25%, the team decided that the presence of some tests in the input database, the Pathology Episode Initiation (PEI) tests, was causing spurious rules to be revealed.
PEI tests depend on who ordered them and where they were ordered.
When the PEI tests were removed, the number of rules dropped significantly.
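
The effect of removing a near-ubiquitous item can be illustrated on toy data. The sketch below only counts frequent pairs of tests, which is the first stage of association rule mining rather than the full procedure, and it uses a threshold of 50% instead of 0.25% so that four made-up episodes are enough. The episode contents and the function name `frequent_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(episodes, min_support, exclude=frozenset()):
    """Return {pair: support} for test pairs that co-occur in at least min_support of episodes."""
    counts = Counter()
    for tests in episodes:
        kept = sorted(set(tests) - exclude)       # drop excluded items, fix the pair order
        counts.update(combinations(kept, 2))      # count every unordered pair once per episode
    n = len(episodes)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical episodes; "PEI" stands in for the episode-initiation items.
episodes = [
    ["PEI", "OCP", "FCS"],
    ["PEI", "OCP", "FCS"],
    ["PEI", "MCS"],
    ["PEI", "FCS"],
]
with_pei = frequent_pairs(episodes, min_support=0.5)
without_pei = frequent_pairs(episodes, min_support=0.5, exclude={"PEI"})
print(len(with_pei), len(without_pei))
```

Because the excluded item appears in every episode, it pairs with almost everything and inflates the result; once it is excluded, only the pairs that reflect a real co-ordering of tests remain.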


Result Analysis
A request for a microscopic examination of feces for parasites (OCP) was associated with a culture examination of feces (FCS) in 0.85% of cases.
There was a 92.6% chance that if OCP tests were requested, they would be done with FCS.
In 0.61% of cases, OCP was associated with a different, more expensive test called MCS32, which costs A$13.55 per test.


GP Profiles


Discussions
Segment 13:
Represents the majority of traditional GPs who are practicing conventionally.
5,450 GPs, 52% of all GPs.
Only 6.2% of the medical pathology tests.

Segment 4:
54 GPs, only 0.51% of all GPs.
2.7% of the medical pathology tests.


Financial Data Mining: News-Sensitive Stock Prediction


Advanced Topics
Sequential Mining
Time-Series Mining
Spatial Mining
Web Mining
Social Network Mining
Text Mining
Data Stream Mining
Mining and Privacy

Examples of Data Mining Systems


Microsoft SQL Server
Integrates DB and OLAP with mining
Supports the OLE DB for DM standard

SAS Enterprise Miner
A variety of statistical analysis tools
Data warehouse tools and data mining algorithms

IBM Intelligent Miner
A wide range of data mining algorithms
Scalable mining algorithms
Data preparation and data visualization tools
Tight integration with IBM's DB2 RDBMS

Clementine (SPSS)
An integrated data mining development environment for end-users and developers
Multiple data mining algorithms and visualization tools

Weka (http://cs.waikato.ac.nz/ml/weka)
A free data mining tool.
