
Course: M0824 - Data Mining

Year: Sep 2011

Introduction to Data Mining


Session 01

Learning Outcomes
Explain data mining concepts and techniques.

Bina Nusantara
University

Acknowledgments
These slides have been adapted
from Han, J., Kamber, M., & Pei, J. (2006). Data Mining:
Concepts and Techniques. 2nd edition. Morgan Kaufmann. San
Francisco.


Outline

What Motivated Data Mining?


Data Mining On What Kind of Data?
Data Mining Functionalities
Classification of Data Mining Systems
Integration of a Data Mining System with a
Database or Data Warehouse System
Major Issues in Data Mining


What Motivated Data Mining?

Why Data Mining?


The Explosive Growth of Data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, Web, computerized
society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks
Science: remote sensing, bioinformatics, scientific simulation
Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
"Necessity is the mother of invention": hence data mining, the automated analysis of massive
data sets


What Is Data Mining?


Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown and
potentially useful) patterns or knowledge from huge amounts of data
Data mining: a misnomer?
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Watch out: is everything "data mining"? The following are not:
Simple search and query processing
(Deductive) expert systems


Knowledge Discovery (KDD) Process


This is a view from the typical database systems and data warehousing communities.
Data mining plays an essential role in the knowledge discovery process.

[Figure: the KDD process, from raw data to knowledge]
Databases --> Data Cleaning and Data Integration --> Data Warehouse --> Selection --> Task-relevant Data --> Data Mining --> Pattern Evaluation

March 6, 2016    Data Mining: Concepts and Techniques
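The stages of the KDD process can be sketched as a chain of small functions. This is only an illustrative sketch, not the method from the textbook: the records, the function names (clean, integrate, select, mine, evaluate) and the min_count threshold are all assumptions made for this example, and the "mining" step is reduced to simple frequency counting.

```python
from collections import Counter

# Hypothetical raw records from two source databases; None marks a missing value.
store_a = [{"customer": 1, "item": "bread"}, {"customer": 2, "item": None}]
store_b = [
    {"customer": 3, "item": "bread"},
    {"customer": 4, "item": "milk"},
    {"customer": 5, "item": "bread"},
]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several cleaned sources into one store."""
    return [r for source in sources for r in source]

def select(records, field):
    """Selection: keep only the task-relevant attribute."""
    return [r[field] for r in records]

def mine(values):
    """Data mining (simplified here): count how often each value occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {p: c for p, c in patterns.items() if c >= min_count}

warehouse = integrate(clean(store_a), clean(store_b))
task_data = select(warehouse, "item")
knowledge = evaluate(mine(task_data), min_count=2)
print(knowledge)  # {'bread': 3}
```

Each function corresponds to one box of the figure, so a stage can be swapped for a real technique without touching the others.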

What is (not) Data Mining?

What is not Data Mining?

Look up a phone number in a phone directory
Query a Web search engine for information about "Amazon"

What is Data Mining?

Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)

Data Mining in Business Intelligence


[Figure: pyramid of layers, listed from top to bottom; the potential to support business decisions increases toward the top]

Decision Making (End User)
Data Presentation: Visualization Techniques (Business Analyst)
Data Mining: Information Discovery (Data Analyst)
Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
Data Preprocessing/Integration, Data Warehouses (DBA)
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)

Data Mining Tasks...

Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

Data Mining On What Kind of Data?

Data Mining Function: (1) Generalization


Information integration and data warehouse
construction
Data cleaning, transformation, integration, and
multidimensional data model

Data cube technology


Scalable methods for computing (i.e., materializing)
multidimensional aggregates
OLAP (online analytical processing)

Multidimensional concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g.,
dry vs. wet region
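To make "materializing multidimensional aggregates" concrete, the sketch below computes the sum of one measure for every subset of the dimensions, which is the brute-force way to fill a data cube. The fact table (region, season, rainfall), the function name compute_cube and the choice of sum as the aggregate are assumptions for illustration; scalable cube methods avoid this exhaustive enumeration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, season, rainfall_mm)
facts = [
    ("dry", "summer", 10),
    ("dry", "winter", 30),
    ("wet", "summer", 120),
    ("wet", "winter", 200),
]
dimensions = ("region", "season")

def compute_cube(rows, dims):
    """Sum the last column of each row for every subset of the dimensions."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(range(len(dims)), size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in group)
                totals[key] += row[-1]
            cube[tuple(dims[i] for i in group)] = dict(totals)
    return cube

cube = compute_cube(facts, dimensions)
print(cube[()])            # grand total: {(): 360}
print(cube[("region",)])   # roll-up by region: {('dry',): 40, ('wet',): 320}
```

The roll-up by region is also a small instance of discrimination: it contrasts the dry and wet regions on one summarized characteristic.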

Data Mining Function: (2) Association and Correlation Analysis
Frequent patterns (or frequent itemsets)
What items are frequently purchased together in your
Walmart?

Association, correlation vs. causality


A typical association rule

Diaper --> Beer [0.5%, 75%] (support, confidence)


Are strongly associated items also strongly correlated?

How to mine such patterns and rules efficiently in large datasets?
How to use such patterns for classification, clustering,
and other applications?
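The two numbers attached to an association rule can be computed directly from transaction data. For a rule A --> B, support is the fraction of all transactions that contain both A and B, and confidence is the fraction of transactions containing A that also contain B. The sketch below uses eight made-up baskets chosen so that the confidence comes out at 75%; the basket contents and function names are assumptions for illustration, and this direct counting is not an efficient mining algorithm.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical market baskets.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "eggs"},
    {"eggs"},
]

print(support(baskets, {"diaper"}, {"beer"}))     # 0.375
print(confidence(baskets, {"diaper"}, {"beer"}))  # 0.75
```

Confidence is not symmetric: swapping antecedent and consequent changes the denominator, so beer --> diaper has a different confidence from diaper --> beer.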


Association Rule Discovery: Application 1

Marketing and Sales Promotion:


Let the rule discovered be
{Bagels, ...} --> {Potato Chips}
Potato Chips as consequent => Can be used to
determine what should be done to boost its sales.
Bagels in the antecedent => Can be used to see which
products would be affected if the store discontinues
selling bagels.
Bagels in antecedent and Potato chips in consequent =>
Can be used to see what products should be sold with
Bagels to promote sale of Potato chips!


Association Rule Discovery: Application 2

Supermarket shelf management.

Goal: To identify items that are bought together by
sufficiently many customers.
Approach: Process the point-of-sale data collected with
barcode scanners to find dependencies among items.
A classic rule:

If a customer buys diapers and milk, then he is very likely to buy beer.


Data Mining Function: (3) Classification


Classification and label prediction
Construct models (functions) based on some training examples
Describe and distinguish classes or concepts for future prediction
E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
Predict some unknown class labels
Typical methods
Decision trees, naive Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-based classification, logistic regression
Typical applications:
Credit card fraud detection, direct marketing, classifying stars,
diseases, web pages
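As a minimal illustration of constructing a model from training examples and then predicting unknown labels, the sketch below trains a decision stump (a decision tree with a single split) that classifies cars by gas mileage. The training data, the labels "guzzler" and "economy", and the function name train_stump are assumptions for this example; it needs at least two distinct attribute values, and real decision-tree learners choose splits with measures such as information gain rather than raw error counts.

```python
def train_stump(samples):
    """Find the threshold and the pair of labels with the fewest training errors.

    samples is a list of (value, label) pairs with at least two distinct values.
    """
    values = sorted({v for v, _ in samples})
    labels = sorted({y for _, y in samples})
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below in labels:
            for above in labels:
                errors = sum(
                    1 for v, y in samples
                    if (below if v <= threshold else above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best

    def classify(value):
        return below if value <= threshold else above

    return classify, threshold

# Hypothetical training examples: (miles per gallon, class label)
training = [
    (15, "guzzler"), (18, "guzzler"), (22, "guzzler"),
    (35, "economy"), (40, "economy"), (45, "economy"),
]

classify, threshold = train_stump(training)
print(threshold)     # 28.5
print(classify(20))  # guzzler
print(classify(38))  # economy
```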

Classification: Application 1
Direct Marketing
Goal: Reduce cost of mailing by targeting a set
of consumers likely to buy a new cell-phone
product.
Approach:
Use the data for a similar product introduced before.
We know which customers decided to buy and which
decided otherwise. This {buy, don't buy} decision
forms the class attribute.


Classification: Application 2
Fraud Detection
Goal: Predict fraudulent cases in credit card
transactions.
Approach:
Use credit card transactions and the information on
its account-holder as attributes.
When does a customer buy, what does he buy, how
often does he pay on time, etc.


Classification: Application 3
Customer Attrition/Churn:
Goal: To predict whether a customer is likely to be lost to a
competitor.
Approach:

Use detailed records of transactions with each of the past and present customers to find
attributes.
How often the customer calls, where he calls, what time-of-the day
he calls most, his financial status, marital status, etc.

Label the customers as loyal or disloyal.


Data Mining Function: (4) Cluster Analysis


Unsupervised learning (i.e., Class label is unknown)
Group data to form new categories (i.e., clusters),
e.g., cluster houses to find distribution patterns
Principle: Maximizing intra-class similarity &
minimizing interclass similarity
Many methods and applications
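One of the many methods is k-means, which is not named on this slide and is used here only as an example. The sketch clusters hypothetical house prices in one dimension: points are assigned to the nearest center, then each center moves to the mean of its cluster, which pushes members of a cluster toward each other and away from other clusters. The data, the function name kmeans_1d, and the evenly spaced initial centers are assumptions; the sketch requires k >= 2.

```python
def kmeans_1d(points, k, iterations=10):
    """Cluster numbers into k groups (k >= 2) with a basic k-means loop."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic start: pick k points spread evenly over the sorted data.
    centers = [pts[i * (n - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical house prices (in thousands).
prices = [100, 420, 110, 400, 120, 440]

centers, clusters = kmeans_1d(prices, k=2)
print(centers)   # [110.0, 420.0]
print(clusters)  # [[100, 110, 120], [400, 420, 440]]
```

No class labels are given to the function, which is what makes this unsupervised: the two groups emerge from the distances alone.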


Clustering: Application 1
Market Segmentation:
Goal: subdivide a market into distinct subsets
of customers where any subset may
conceivably be selected as a market target to
be reached with a distinct marketing mix.
Approach:
Collect different attributes of customers based on
their geographical and lifestyle related information.
Find clusters of similar customers.


Clustering: Application 2
Document Clustering:
Goal: To find groups of documents that are similar to
each other based on the important terms appearing in
them.
Approach: To identify frequently occurring terms in each
document. Form a similarity measure based on the
frequencies of different terms.
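A common way to turn term frequencies into a similarity measure is cosine similarity between term-count vectors. The slide does not say which measure is meant, so this choice, the three sample documents, and the function names are assumptions for illustration. Whitespace splitting stands in for real tokenization.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents returned by a search for "Amazon".
docs = {
    "d1": "amazon rainforest trees river",
    "d2": "rainforest river wildlife trees",
    "d3": "amazon online shopping books",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_12 = cosine_similarity(tf["d1"], tf["d2"])
sim_13 = cosine_similarity(tf["d1"], tf["d3"])
sim_23 = cosine_similarity(tf["d2"], tf["d3"])
print(sim_12, sim_13, sim_23)
```

Here d1 and d2 share three of their four terms, so they score higher than d1 and d3, which share only "amazon"; a clustering method would therefore group the two rainforest documents together.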


Classification of Data Mining Systems

Data Mining: Confluence of Multiple Disciplines


[Figure: Data Mining at the center, drawing on the surrounding disciplines]

Machine Learning
Pattern Recognition
Statistics
Applications
Visualization
Algorithm
Database Technology
High-Performance Computing

Integration of a Data Mining System with a Database or Data Warehouse System

Integration of Data Mining and Data Warehousing


Coupling of data mining systems with DBMS and data warehouse systems
No coupling, loose coupling, semi-tight coupling, tight coupling
On-line analytical mining
Integration of mining and OLAP technologies
Interactive mining of multi-level knowledge
Necessity of mining knowledge and patterns at different levels of
abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
Integration of multiple mining functions
Characterized classification, first clustering and then association

Architecture: Typical Data Mining System


[Figure: layered architecture, listed from top to bottom]

Graphical User Interface
Pattern Evaluation
Data Mining Engine
Knowledge Base
Database or Data Warehouse Server
Data cleaning, integration, and selection
Data sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories

Major Issues in Data Mining

Major Challenges in Data Mining


Efficiency and scalability of data mining algorithms
Parallel, distributed, stream, and incremental mining methods
Handling high-dimensionality
Handling noise, uncertainty, and incompleteness of data
Incorporation of constraints, expert knowledge, and background
knowledge in data mining
Pattern evaluation and knowledge integration
Mining diverse and heterogeneous kinds of data: e.g.,
bioinformatics, Web, software/system engineering, information
networks
Application-oriented and domain-specific data mining
Invisible data mining (embedded in other functional modules)
Protection of security, integrity, and privacy in data mining

Continued in Session 02:
Data Pre-processing
