
DATA WAREHOUSING

Module 4:
WHAT IS DW?
DW: a storage area for processed and integrated data
from different sources (operational data and external data)

A data warehouse allows its users to extract required data for
business analysis and strategic decision making.
OTHER

A data warehouse is a subject-oriented, integrated,
non-volatile, time-variant collection of data in support of
management's decisions. (William H. Inmon)
SUBJECT ORIENTED
Example for an insurance company:

Applications area: Auto and Fire Policy Processing Systems;
Commercial and Life Insurance Systems; Claims Processing
System; Accounting System; Billing System
Data warehouse: Customer, Policy, Losses and Premium data
INTEGRATED
Data is stored once in a single integrated location
(e.g. insurance company)

Customer data stored in several databases: Auto Policy
Processing System; Fire Policy Processing System; FACTS, LIFE,
Commercial and Accounting Applications
Data Warehouse Database: Subject = Customer
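
A minimal sketch of the integration idea, assuming customer data sits in two separate policy systems. The record layouts, field names and the `integrate_customers` helper are hypothetical and only illustrate the principle of one integrated record per subject:

```python
# Hypothetical customer records held separately by two source systems.
auto_policy_system = [
    {"cust_id": "C1", "name": "Ann Lee", "auto_policy": "A-100"},
]
fire_policy_system = [
    {"cust_id": "C1", "name": "Ann Lee", "fire_policy": "F-200"},
    {"cust_id": "C2", "name": "Bob Roy", "fire_policy": "F-201"},
]


def integrate_customers(*sources):
    """Merge per-system records into one record per customer id."""
    warehouse = {}
    for source in sources:
        for record in source:
            # One entry per customer (Subject = Customer); later sources
            # add their own fields to the same integrated record.
            merged = warehouse.setdefault(record["cust_id"], {})
            merged.update(record)
    return warehouse


customers = integrate_customers(auto_policy_system, fire_policy_system)
```

Here `customers["C1"]` carries both the auto and the fire policy, so each customer is stored once rather than once per application.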
TIME-VARIANT
Data is stored as a series of snapshots or views which
record how it is collected across time.
[Figure: data warehouse data, where the key combines a time
element with the data]
Data is tagged with some element of time: creation
date, as-of date, etc.
Data is available online for long periods of time (for
example, five or more years) for trend analysis and
forecasting.
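
A small sketch of time-tagged snapshots, assuming each record is keyed by a business key plus an as-of date. The names `record_snapshot` and `premium_history` and the sample values are hypothetical:

```python
from datetime import date

# Snapshot store: the key combines a business key with a time element.
snapshots = {}


def record_snapshot(policy_id, as_of, premium):
    """Store the state of a policy as of a given date."""
    snapshots[(policy_id, as_of)] = {"premium": premium}


def premium_history(policy_id):
    """Return (as_of, premium) pairs for one policy, oldest first."""
    rows = [
        (as_of, row["premium"])
        for (pid, as_of), row in snapshots.items()
        if pid == policy_id
    ]
    return sorted(rows)


record_snapshot("P-1", date(2022, 1, 1), 500.0)
record_snapshot("P-1", date(2023, 1, 1), 540.0)
record_snapshot("P-2", date(2023, 1, 1), 300.0)

history = premium_history("P-1")
```

Because every snapshot keeps its own date, `history` can feed trend analysis without losing earlier values.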
NON-VOLATILE
Existing data in the warehouse is not overwritten or updated.

[Figure: external sources and production databases are loaded
(Load) into the data warehouse database]
Production applications environment: Update, Insert, Delete
Data warehouse environment: Read-Only
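
The contrast between the two environments can be sketched as follows. This is an illustrative model only: the `ReadOnlyWarehouse` class and its `load` and `read` methods are hypothetical names, not part of any real product:

```python
class ReadOnlyWarehouse:
    """Hypothetical store that supports loading and reading only."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies; existing rows are never overwritten.
        self._rows.extend(dict(row) for row in rows)

    def read(self):
        # Readers get copies, so they cannot change stored data.
        return tuple(dict(row) for row in self._rows)


warehouse = ReadOnlyWarehouse()

# In the production environment a record is updated in place.
production_record = {"policy_id": "P-1", "premium": 500.0}
warehouse.load([production_record])

production_record["premium"] = 540.0  # production-side update
warehouse.load([production_record])

premiums = [row["premium"] for row in warehouse.read()]
```

After the second load the warehouse holds both the old and the new premium, while the production record only holds the latest one.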
DATA WAREHOUSE DESIGN
DW development can follow three different
methodologies:
Bottom-up design
Top-down design
Hybrid design
ARCHITECTURE

EXTRACT-TRANSFORM-LOAD
ETL is a process in data warehousing responsible for
pulling data out of the source systems and placing it into
a data warehouse.
ETL involves the following tasks:
Extracting the data: data is pulled from the source systems
(SAP, ERP, other operational systems) and converted into one
consolidated data warehouse format which is ready for
transformation processing.
Transforming the data: may involve the following tasks:
Applying business rules (so-called derivations, e.g.,
calculating new measures and dimensions),
Cleaning (e.g., mapping NULL to 0 or "Male" to "M" and
"Female" to "F"),
Filtering (e.g., selecting only certain columns to load),
Splitting a column into multiple columns and vice versa,
Joining together data from multiple sources (e.g., lookup,
merge),
Transposing rows and columns,
Applying any kind of simple or complex data
validation (e.g., if the first 3 columns in a row are
empty then reject the row from processing).

Loading the data into a data warehouse, data repository, or
other reporting applications
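
The three ETL stages can be sketched in a few lines. This is a simplified illustration using in-memory lists rather than real source systems; the column names, sample rows and the `annual_premium` derivation are assumptions made for the example:

```python
COLUMNS = ["cust_id", "name", "gender", "monthly_premium"]
GENDER_CODES = {"Male": "M", "Female": "F"}


def extract(sources):
    """Pull rows from every source system into one consolidated list."""
    return [dict(row) for source in sources for row in source]


def is_valid(row):
    """Validation: reject a row whose first three columns are all empty."""
    return any(row.get(col) not in (None, "") for col in COLUMNS[:3])


def transform(rows):
    cleaned = []
    for row in rows:
        if not is_valid(row):
            continue
        # Filtering: keep only the columns selected for loading.
        out = {col: row.get(col) for col in COLUMNS}
        # Cleaning: map NULL to 0 and "Male"/"Female" to "M"/"F".
        if out["monthly_premium"] is None:
            out["monthly_premium"] = 0
        out["gender"] = GENDER_CODES.get(out["gender"], out["gender"])
        # Business rule (derivation): calculate a new measure.
        out["annual_premium"] = out["monthly_premium"] * 12
        cleaned.append(out)
    return cleaned


def load(rows, warehouse):
    """Append the transformed rows to the warehouse table."""
    warehouse.extend(rows)


# Hypothetical source systems.
crm_rows = [
    {"cust_id": 1, "name": "Ann", "gender": "Female",
     "monthly_premium": 50, "note": "vip"},
]
billing_rows = [
    {"cust_id": 2, "name": "Bob", "gender": "Male", "monthly_premium": None},
    {"cust_id": None, "name": "", "gender": None, "monthly_premium": 10},
]

warehouse_table = []
load(transform(extract([crm_rows, billing_rows])), warehouse_table)
```

The third billing row is rejected by validation, the `note` column is filtered out, and the two remaining rows are loaded with cleaned values and the derived `annual_premium`.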
DATA MARTS
The data mart is a subset of the data warehouse that is
usually oriented to a specific business line or team.
The information in data marts pertains to a single
department.
Each department or business unit is considered
the owner of its data mart including all
the hardware, software and data.
This enables each department to use, manipulate and
develop their data any way they see fit without altering
information inside other data marts or the data
warehouse.
Implementation takes less time than a data warehouse,
around 4-12 months.
It is relatively cheaper than a data warehouse.
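
As an illustration of a data mart as a departmental subset, the sketch below copies one department's rows out of a warehouse table. The table contents and the `build_data_mart` helper are hypothetical:

```python
import copy

# Hypothetical warehouse table covering several departments.
warehouse = [
    {"dept": "sales", "region": "north", "revenue": 120},
    {"dept": "sales", "region": "south", "revenue": 80},
    {"dept": "claims", "region": "north", "open_claims": 7},
]


def build_data_mart(rows, dept):
    """Copy the rows of one department into its own data mart."""
    return [copy.deepcopy(row) for row in rows if row["dept"] == dept]


sales_mart = build_data_mart(warehouse, "sales")

# The department can change its own mart freely...
sales_mart[0]["revenue"] = 999
# ...while the warehouse row keeps its original value of 120.
```

Copying the rows is what lets a department manipulate its data without altering the warehouse or other data marts.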
[Figure: from data to information: organizationally structured
(data warehouse), departmentally structured, individually
structured]
NEED FOR DATA WAREHOUSING
Better business intelligence for end-users
Reduction in time to locate, access, and analyze
information
Consolidation of disparate information sources
Strategic advantage over competitors
Faster time-to-market for products and services
Replacement of older, less-responsive decision
support systems
Reduction in demand on IS to generate reports
ADVANTAGES & LIMITATIONS
ADVANTAGES
Integration of data from multiple sources
Performing new types of analyses
Reducing cost to access historical data
Improved decision support system
LIMITATIONS
Long initial implementation time and associated high cost
Adding new data sources takes time and incurs high cost