Professional Documents
Culture Documents
AND
DATA MINING
Prepared By
TEAM ROLL NO
MAHESH B. ROHIT 20
ROHITKUMAR R. THAKOR 30
NILESH H. TILOKANI 31
RAKESH M. VYAS 32
Overview
❚ Introduction
❚ Data Warehousing
❚ Data Mining
A producer wants to know….
Which are our
lowest/highest margin
customers ?
Who are my customers
What is the most and what products
effective distribution are they buying?
channel?
[Barry Devlin]
What are the users saying...
❚ Data should be integrated
across the enterprise
❚ Summary data has a real
value to the organization
❚ Historical data holds the
key to understanding data
over time
❚ What-if capabilities are
required
What is Data Warehousing?
A process of
Information
transforming data into
information and
making it available to
users in a timely
enough manner to
make a difference
❚ Function
❙ Missing data: Decision support requires historical data, which
op dbs do not typically maintain.
❙ Data consolidation: Decision support requires consolidation
(aggregation, summarization) of data from many
heterogeneous sources: op dbs, external sources.
❙ Data quality: Different sources typically use inconsistent data
representations, codes, and formats which have to be
reconciled.
Why Now?
Data Warehouse
Engine Analyze
Purchased Query
Data
Legacy
Data Metadata Repository
Components of the Warehouse
Acct. Interest
Name BalanceDate Opened Address
No Rate
Frequently
accessed Rarely
accessed
Acct. Acct. Interest
Balance Name Date Opened Address
No No Rate
Smaller table
and so less
I/O
True Warehouse
Data Sources
Data Warehouse
Data Marts
Data Warehouse for Decision
Support & OLAP (on-line Analytical Processing)
❚ Putting Information technology to help the
knowledge worker make faster and better
decisions
❙ Which of my customers are most likely to go
to the competition?
❙ What product promotions have the biggest
impact on revenue?
❙ How did the share price of software
companies correlate with profits over last 10
years?
Decision Support
Application-Orientation Subject-Orientation
Operation Data
al Warehouse
Database
Credit
Loans Customer
Card
Vendor
Product
Trust
Savings Activity
OLTP vs. Data Warehouse
❚ The Data
Warehouse helps
to “optimize” the
business
Wal*Mart Case Study
❚ Founded by Sam Walton
❚ One the largest Super Market Chains in
the US
❚ Extracting ❚ Enrichment
❚ Conditioning ❚ Scoring
❚ Scrubbing ❚ Loading
❚ Merging ❚ Validating
❚ Householding ❚ Delta Updating
Data Transformation Terms
❚ Extracting
❙ Capture of data from operational source in
“as is” status
❙ Sources for data generally in legacy
mainframes in VSAM, IMS, IDMS, DB2; more
data today in relational databases on Unix
❚ Conditioning
❙ The conversion of data types from the source
to the target data store (warehouse) --
always a relational database
Data Transformation Terms
❚ Householding
❙ Identifying all members of a household
(living at the same address)
❙ Ensures only one mail is sent to a
household
❙ Can result in substantial savings: 1 lakh
catalogues at Rs. 50 each costs Rs. 50
lakhs. A 2% savings would save Rs. 1
lakh.
Data Transformation Terms
❚ Enrichment
❙ Bring data from external sources to
augment/enrich operational data. Data
sources include Dunn and Bradstreet, A.
C. Nielsen, CMIE, IMRA etc...
❚ Scoring
❙ computation of a probability of an
event. e.g..., chance that a customer
will defect to AT&T from MCI, chance
that a customer is likely to buy a new
product
Loads
Industry Application
Finance Credit Card Analysis
Insurance Claims, Fraud Analysis
Telecommunication Call record analysis
Transport Logistics management
Consumer goods promotion analysis
Data Service providersValue added data
Utilities Power usage analysis
Data Mining in Use
❚ IT Department uses Data Mining to
ascertain financial transaction of
individuals
❚ Basketball teams use it to track game
strategy
❚ Warranty Claims Routing
❚ Holding on to Good Customers
❚ Removing Bad Customers
What makes data mining possible?
❚ Data Warehousing
provides the Enterprise
with a memory