
Chapter 1

Introduction to Data Mining

IR & DM S2 Arif Djunaidy FTIF ITS Bab 1 - 1/48


Evolution of Database Technology
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO,
deductive, etc.) and application-oriented DBMS (spatial,
scientific, engineering, etc.)
1990s-2000s:
Data mining and data warehousing, multimedia databases,
and Web databases



What is Data Mining?
Data mining is part of knowledge discovery: the extraction of
knowledge that was not previously known

Knowledge Discovery in Databases (KDD)



Data Mining: A KDD Process
(Another View)
[Figure: the KDD process as a sequence of stages: Databases → Data Cleaning → Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data mining: the core of the knowledge discovery process.



KDD Process
Step 1: Goal Identification → Defined Goals
Step 2: Create Target Data → Target Data (from a Data Warehouse, a Transactional Database, or a Flat File)
Step 3: Data Preprocessing → Cleansed Data
Step 4: Data Transformation → Transformed Data
Step 5: Data Mining → Data Model
Step 6: Interpretation & Evaluation
Step 7: Taking Action
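Steps 2 to 6 above can be pictured as a chain of functions, each consuming the previous step's output; Steps 1 and 7 are human decisions and are left out. The Python sketch below is a minimal illustration under stated assumptions: the function names (`create_target_data`, `preprocess`, `transform`, `mine`, `evaluate`), the cleaning rules, and the toy "model" (the majority class per marital status) are all hypothetical. The records are a few rows of the classification example in this chapter plus one made-up row with a missing value.

```python
from collections import Counter

def create_target_data(records, fields):
    """Step 2: keep only the task-relevant fields of each record."""
    return [{f: r.get(f) for f in fields} for r in records]

def preprocess(records):
    """Step 3 (assumed rules): drop records with missing values, then exact duplicates."""
    seen, cleaned = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

def transform(records):
    """Step 4: convert income strings such as '125K' into integers."""
    return [{**r, "income": int(r["income"].rstrip("K")) * 1000} for r in records]

def mine(records):
    """Step 5: a toy model, the majority class for each marital status."""
    by_status = {}
    for r in records:
        by_status.setdefault(r["marital"], Counter())[r["cheat"]] += 1
    return {status: counts.most_common(1)[0][0] for status, counts in by_status.items()}

def evaluate(model, records):
    """Step 6: fraction of records whose class the model predicts correctly."""
    hits = sum(model.get(r["marital"]) == r["cheat"] for r in records)
    return hits / len(records)

raw = [
    {"tid": 1, "refund": "Yes", "marital": "Single", "income": "125K", "cheat": "No"},
    {"tid": 2, "refund": "No", "marital": "Married", "income": "100K", "cheat": "No"},
    {"tid": 5, "refund": "No", "marital": "Divorced", "income": "95K", "cheat": "Yes"},
    {"tid": 8, "refund": "No", "marital": "Single", "income": "85K", "cheat": "Yes"},
    {"tid": 10, "refund": "No", "marital": "Single", "income": "90K", "cheat": "Yes"},
    {"tid": 11, "refund": "No", "marital": None, "income": "70K", "cheat": "No"},  # hypothetical row
]

target = create_target_data(raw, ["marital", "income", "cheat"])
cleaned = preprocess(target)   # the row with a missing value is dropped
data = transform(cleaned)
model = mine(data)
print(evaluate(model, data))   # 0.8 (evaluated on the same data only for brevity)
```

In practice Step 6 would use a separate test set rather than the data the model was built from.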



Motivation:
Necessity is the Mother of Invention
The data explosion problem

We are drowning in data, but starving for knowledge

Solution: Data warehousing and data mining

Data warehousing and on-line analytical processing (OLAP)
Extraction of knowledge (rules, regularities, patterns,
constraints) from databases



Scale of Data



Why Mine Data? - Commercial Viewpoint
Large amounts of data have been
collected and stored
Web logs
Online/e-commerce transactions
Department store and
grocery store transactions
Bank and credit card
transactions

Computers have become cheaper and more powerful

The need to compete with the right strategy has become greater



Why Mine Data? - Scientific Viewpoint
Data are collected and stored
at high speed (GB/hour)
Remote sensors on satellites
Telescopes scanning the skies
Micro-arrays generating gene
expression data
Scientific simulations
generating terabytes of data
Traditional techniques can no longer
process the raw data
Data mining can help scientists
in classifying and segmenting data
in hypothesis formation

Why Data Mining?
Such large volumes of data sometimes contain hidden
information
Humans have a limited ability to examine large volumes
of data during analysis



Definitions of Data Mining



Misconceptions



Data Mining Applications
Database analysis and decision support
Market analysis and management
target marketing, customer relationship management, market
basket analysis, cross selling, market segmentation
Risk analysis and management
Forecasting, customer retention, improved underwriting,
quality control, competitive analysis
Fraud detection and management
Other applications
Text mining (newsgroups, email, documents) and Web analysis
Intelligent query answering



Origins of Data Mining
A combination of several disciplines, such as machine
learning/AI, pattern recognition, statistics, and
database systems
Traditional techniques can no longer handle the available data because of:
Enormous amounts of data
High dimensionality
Heterogeneous data
[Figure: Data Mining at the intersection of Statistics/AI, Machine Learning/Pattern Recognition, and Database systems]



Origins of Data Mining:
Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]



Data Mining Tasks
Prediction Methods
Use several variables to predict future or
previously unknown values

Description Methods

Find human-interpretable patterns that
describe the data

From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996



Data Mining Tasks...
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]



Data
A collection of data objects
and their attributes
Object: record, point,
case, sample, entity,
instance
Attribute / variable / field:
a characteristic of an object
(e.g., marital status, age)



Classification: Definition
Extraction of patterns that group or classify a set of
objects/data (the training set) into specific classes
based on their attributes

Input: a dataset (training set)

Each record contains a set of attributes; one of the attributes is the class.

Output: a classification model, with one of the attributes
taken as the class

Goal: use the discovered model to predict the class of new data
A test set is used to measure the accuracy of the
discovered model
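The roles of the training set and the test set can be sketched in Python. This is an illustration, not the chapter's method: it assumes a 1-nearest-neighbour classifier and a hypothetical distance (one point per mismatching categorical attribute plus the income difference scaled by the training income range), and it holds out the last three labelled records of the classification example's training table as a test set with known classes.

```python
# Labelled records from the classification example: (refund, marital, income in K, cheat)
records = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

def fit(train):
    """'Learning' for 1-NN: store the records and the income range used for scaling."""
    incomes = [r[2] for r in train]
    return {"train": train, "span": (max(incomes) - min(incomes)) or 1}

def distance(a, b, span):
    """Hypothetical distance: categorical mismatches plus range-scaled income gap."""
    return (a[0] != b[0]) + (a[1] != b[1]) + abs(a[2] - b[2]) / span

def predict(model, x):
    """Return the class of the closest training record."""
    nearest = min(model["train"], key=lambda r: distance(r, x, model["span"]))
    return nearest[3]

def accuracy(model, test):
    """Fraction of test records whose known class matches the prediction."""
    return sum(predict(model, r) == r[3] for r in test) / len(test)

model = fit(records[:7])            # training set: first 7 labelled records
print(accuracy(model, records[7:]))  # test set: last 3 labelled records
```

With only seven training records this model classifies one of the three held-out records correctly (accuracy 1/3); the point is the train-then-test procedure rather than the score.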



Classification Example

Training Set
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set → Learn Classifier → Model; the Model is then applied to the Test Set]
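The last step of this example, applying a model to the unlabeled test records, can be sketched as follows. The decision tree in `classify` is hypothetical and written by hand rather than learned by the code; it is simply one model that fits all ten training records (split on Refund, then Marital Status, then Taxable Income at an assumed 80K threshold).

```python
# Training set: (refund, marital status, taxable income in K, cheat)
training = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

# Test set: class unknown
test = [
    ("No", "Single", 75), ("Yes", "Married", 50), ("No", "Married", 150),
    ("Yes", "Divorced", 90), ("No", "Single", 40), ("No", "Married", 80),
]

def classify(refund, marital, income):
    """Hypothetical hand-written tree: Refund -> Marital Status -> Taxable Income."""
    if refund == "Yes":
        return "No"
    if marital == "Married":
        return "No"
    return "Yes" if income >= 80 else "No"  # assumed threshold of 80K

train_acc = sum(classify(*r[:3]) == r[3] for r in training) / len(training)
predictions = [classify(*x) for x in test]
print(train_acc)     # 1.0: the tree fits every training record
print(predictions)
```

Under this tree every test record is predicted as Cheat = No; a different model consistent with the same training set could predict differently.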



Classification: Application 1

Fraud Detection

From [Berry & Linoff] Data Mining Techniques, 1997



Classification: Application 2



Classification: Application 3

Sky Survey Cataloging


Goal: to predict the class (star or galaxy) of sky objects,
especially visually faint ones, based on the telescopic
survey images (from Palomar Observatory).
3000 images with 23,040 x 23,040 pixels per image.
Approach:
Segment the image.
Measure image attributes (features): 40 of them per object.
Model the class based on these features.
Success story: found 16 new high red-shift quasars,
some of the farthest objects, which are difficult to find.

From [Fayyad et al.], Advances in Knowledge Discovery and Data Mining, 1996



Classifying Galaxies Courtesy: http://aps.umn.edu

Class: stages of formation (Early, Intermediate, Late)
Attributes: image features, characteristics of light
waves received, etc.

Data size:
72 million stars, 20 million galaxies
Object catalog: 9 GB
Image database: 150 GB



Techniques / Methods



Clustering
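Clustering groups objects so that objects in the same cluster are more similar to one another than to objects in other clusters. The slide's details did not survive extraction, so as an assumption the sketch below uses k-means (Lloyd's iteration), one common clustering algorithm, with squared Euclidean distance and a naive deterministic start (the first k points). The points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Plain k-means; initial centroids are the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # hypothetical 2-D objects
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Real implementations usually choose initial centroids randomly (often several times) because k-means only finds a local optimum.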



Clustering: Application 1



Clustering: Application 2



Measuring the Similarity of Data Attributes
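The slide's content did not survive extraction. Assuming it covers the usual measures, the sketch below shows three common ones: Euclidean distance for numeric attributes, the simple matching coefficient for symmetric binary or categorical attributes, and the Jaccard coefficient for asymmetric binary attributes (where 0-0 matches are ignored). The function names and example vectors are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def simple_matching(x, y):
    """Fraction of attributes on which two objects agree (0-0 matches count)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def jaccard(x, y):
    """1-1 matches divided by attributes where at least one object has a 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0  # all-zero case: 0.0 by assumption

x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(euclidean((0, 0), (3, 4)))  # 5.0
print(simple_matching(x, y))      # 0.7
print(jaccard(x, y))              # 0.0
```

The two binary measures disagree sharply here: x and y share many zeros (high simple matching) but no ones (Jaccard of 0), which is why Jaccard suits sparse data such as market baskets.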



Measuring the Similarity of Data Attributes (continued)



Association Analysis: Definition
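The definition slide did not survive extraction. As an assumption, the sketch below uses the standard measures for an association rule X → Y over a set of transactions: support, the fraction of transactions that contain every item of X ∪ Y, and confidence, count(X ∪ Y) / count(X). The transactions are a hypothetical market-basket example.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(lhs -> rhs) = count(lhs | rhs) / count(lhs); 0.0 if lhs never occurs."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        return 0.0
    return support_count(set(lhs) | set(rhs), transactions) / lhs_count

transactions = [  # hypothetical market baskets
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.6666666666666666
```

Rule mining algorithms typically keep only rules whose support and confidence exceed user-chosen thresholds.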



Association Analysis: Applications



Association Rule Discovery: Application 1



Association Rule Discovery: Application 2



Illustration 1



Illustration 2



Illustration 3

