You are on page 1of 12

Data Warehousing

What is Data Warehouse?


•A single, complete and consistent store of data obtained
from a variety of different sources made available to end
users in a way, they can understand and use in a business
context.
•A process of transforming data into information and making
it available to users in a timely enough manner to make a
difference.

• Data warehousing refers to single, centralized, and unified


repository of data that works across the enterprise.

• A data warehouse is subject-oriented, integrated, time-


varying, non-volatile collection of data that is used primarily
in organizational decision making.
Process of Data warehousing
•Large corporate bodies with their continuing businesses want
to retain that data and undertake analysis of data for looking at
their profitability trends in complex environment.

• The data from diverse documents and sources is clubbed into


one data model.

•Standardization involves complying with single system of


keeping data.

•Normalization refers to removing all duplicate data that has


crept in because of variety of process

• Finally, data is consolidated and made available to different


users across the organization
The Purpose of Data Warehouse
Purpose of Data Warehousing:
Better business intelligence for end users.
• Reduction in time to access and analyze information.

• Consolidation of disparate information sources.

• Replacement of older, less-responsive


decision support systems

• Faster time to market for products and services


Data warehousing pitfalls
•There are also disadvantages to using a data warehouse.
Some of them are:
• Over their life, data warehouses can have high costs.

•Data warehouses can get outdated relatively quickly.


There is a cost of delivering suboptimal information to the
organization.

• There is often a fine line between data warehouses and


operational systems. Duplicate, expensive functionality
may be developed in the data warehouse that, in
retrospect, should have been developed in the operational
systems.

The time it take to load the warehouse will also expand to


Data Mining
Data Mining works with Warehouse
Data
• Data Warehousing provides the
Enterprise with a memory

• Data Mining provides the


Enterprise with intelligence
What is Data mining?
•Data mining refers to extracting or mining knowledge from
large amount of data.

•Data mining is also known as knowledge discovery in database.

•Data mining software tools find hidden pattern and


relationships in large pool of data and infer rules from them that
can be used to predict future behavior guide.

•The major reason why data mining gained a great deal of


attraction is due to wide availability of data and imminent need
of turning that data into information and knowledge.

•The mining of gold from sand or rocks is referred to as gold


mining rather than rock or sand mining. Thus data mining
should have been named knowledge mining from data.
Data Mining: A KDD Process

Pattern Evaluation
– Data mining: the core
of knowledge discovery
process. Data Mining

Task-relevant Data

Data Warehouse Selection

Data Cleaning

Data Integration

Databases
The KDD process
• Problem fomulation

• Data collection
– subset data: sampling might hurt if highly skewed data
– feature selection
• Pre-processing: cleaning
– name/address cleaning, different meanings (annual, yearly),
duplicate removal, supplying missing values
• Transformation:
– map complex objects e.g. time series data to features e.g.
frequency
• Choosing mining task and mining method:
• Result evaluation and Visualization:

Knowledge discovery is an iterative process


Application Areas

Industry Application
Finance Credit Card Analysis
Insurance Claims, Fraud Analysis
Telecommunication Call record analysis
Transport Logistics management
Consumer goods promotion analysis
Data Service providers Value added data
Utilities Power usage analysis
Purpose of Data Mining
• Credit ratings/targeted marketing:
– Given a database of 100,000 names, which persons are the least
likely to default on their credit cards?
– Identify likely responders to sales promotions

• Fraud detection
– Which types of transactions are likely to be fraudulent, given the
demographics and transactional history of a particular customer?

• Customer relationship management:


– Which of my customers are likely to be the most loyal, and
which are most likely to leave for a competitor? :
Data Mining helps extract such
information

You might also like