
Data Warehouse

Agenda

What is Data Warehouse
Transaction System vs Data Warehouse
Data Warehouse Architecture
Metadata
Data Flows
Issues for building Data Warehouse
Warehouse Schema
Tools & Technologies
Advantages of Data Warehouse
Problems
Data Mart
Data Mining


What is Data Warehouse?

A collection of integrated, subject-oriented, time-variant, and non-volatile data in support of management's decision-making process.

Described as the "single point of truth", the "corporate memory", the sole historical register of virtually all transactions that occur in the life of an organization.


Transaction System vs. Data Warehouse

Transaction System

Supports day-to-day operational processes
Contains raw, detailed data that has not been refined or cleansed
Volatile -- data changes from day to day, with frequent updates
Technical issues drive the data structure and system design
Disparate data structures, physical locations, query types, etc.
Users rely on technical analysts for reporting needs
Operational processes are impacted by queries run off the system

Data Warehouse

Supports management analysis and decision-making processes
Contains summarized, refined, and cleansed information
Non-volatile -- provides a data snapshot; adjustments are not permitted, or are limited
Business analysis requirements drive the data structure and system design
Integrated, consistent information on a single technology platform
Users have direct, fast access via On-line Analytical Processing tools
Minimal impact on operational processes
Data Warehouse Architecture

[Figure: Data warehouse architecture. Components shown: operational data stores (ODS 1, ODS 2, ODS 3); Load Manager; Warehouse Manager; Query Manager; DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data; end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining).]


Operational data store (ODS): a repository of current and integrated operational data used for analysis.

Load manager: performs all the operations associated with the extraction and loading of data into the warehouse.

Warehouse manager: performs all the operations associated with the management of the data in the warehouse.

Query manager: also called the backend component, it performs all the operations associated with the management of user queries.


End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.

Summarized data: stores all the aggregations generated by the warehouse manager. It exists to speed up query performance and does not require backup.

Archive/backup data: backup ensures recovery of the data warehouse from any data loss or failure. In archiving, older data is removed from the system in a format that allows it to be quickly restored if required.

Meta-data


Importance of Meta Data

Meta-data: data about data
The purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse
The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data
The meta-data associated with data management describes the data as it is stored in the warehouse
The meta-data is required by the query manager to generate appropriate queries; it is also associated with the users of queries


Data flows

Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse
Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data
Downflow: the processes associated with archiving and backing up of data in the warehouse
Outflow: the processes associated with making the data available to the end-users
Meta-flow: the processes associated with the management of the meta-data
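
The inflow, upflow, and outflow steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the warehouse DBMS, and the table names, function names, and raw rows are all hypothetical, not taken from the slides.

```python
import sqlite3

# Hypothetical raw rows from an operational source: (month, region, net_sales).
# The values are invented for illustration.
RAW_ROWS = [
    ("01-1996", " sw ", "30000"),
    ("01-1996", "NE", "12000"),
    ("02-1996", "ne", "23000"),
    ("02-1996", "NE", None),  # incomplete record, dropped during cleansing
]


def cleanse(rows):
    """Inflow (cleansing): normalize region codes and drop incomplete rows."""
    for month, region, net_sales in rows:
        if net_sales is None:
            continue
        yield month, region.strip().upper(), int(net_sales)


def inflow(conn, rows):
    """Inflow (loading): store the cleansed detailed data in the warehouse."""
    conn.execute(
        "CREATE TABLE detailed_sales (month TEXT, region_id TEXT, net_sales INTEGER)"
    )
    conn.executemany("INSERT INTO detailed_sales VALUES (?, ?, ?)", cleanse(rows))


def upflow(conn):
    """Upflow: add value by summarizing the detailed data per region."""
    conn.execute(
        "CREATE TABLE summary_by_region AS "
        "SELECT region_id, SUM(net_sales) AS net_sales "
        "FROM detailed_sales GROUP BY region_id"
    )


def outflow(conn):
    """Outflow: make the summarized data available to end-users."""
    return conn.execute(
        "SELECT region_id, net_sales FROM summary_by_region ORDER BY region_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
inflow(conn, RAW_ROWS)
upflow(conn)
summary = outflow(conn)
print(summary)  # [('NE', 35000), ('SW', 30000)]
```

Downflow (archiving and backup) and meta-flow are left out to keep the sketch short.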

[Figure: Information flows of a data warehouse. The architecture diagram (operational data sources 1 to n, operational data store (ODS), Load Manager, Warehouse Manager, Query Manager, DBMS with meta-data, detailed data, lightly and highly summarized data, archive/backup data, and end-user access tools) annotated with the five flows: Inflow, Upflow, Downflow, Outflow, and Meta-flow.]


Issues to be addressed in Building Data Warehouse

When and how to gather Data?
What schema to use?
Data Cleansing
How to propagate updates?
What data to summarize?


Warehouse Schema

Fact Table:
Stores the business data. The data in a fact table are called facts. Fact tables contain multidimensional data.

Dimension Table:
To minimize storage requirements, dimension attributes in the fact table are usually short identifiers that are foreign keys into other tables, called dimension tables.
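
As a minimal sketch of the foreign-key relationship between a fact table and a dimension table, the following Python snippet uses SQLite. The table names `product_dim` and `sales_fact` are hypothetical, chosen only for illustration; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Dimension table: a short identifier plus descriptive attributes.
conn.execute("CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_desc TEXT)")

# Fact table: a numeric measure plus a foreign key into the dimension table.
conn.execute(
    "CREATE TABLE sales_fact ("
    " prod_id INTEGER NOT NULL REFERENCES product_dim(prod_id),"
    " net_sales INTEGER NOT NULL)"
)

conn.execute("INSERT INTO product_dim VALUES (100, 'Power supply')")
conn.execute("INSERT INTO sales_fact VALUES (100, 30000)")  # key exists in the dimension

try:
    conn.execute("INSERT INTO sales_fact VALUES (999, 1)")  # no such product
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(rejected, fact_rows)  # True 1
```

The fact row stores only the short `prod_id`; the longer description lives once in the dimension table, which is where the storage saving comes from.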


Schema with Fact & Dimension Table

PRODUCT: Name of the Product, Product Number, Description of Product
AREA: Area 1, Area 2, Area 3
DURATION: Year, Beginning Date, Completion Date


Star Schema

Fact table in the center, with all the dimension tables attached to the central fact table.
Example: Sales Processing
Fact Table: SALES
Dimension Tables: PRODUCT, AREA, TIME, CUSTOMER

Dimension Tables

Region_Dimension_Table
region_id  region_doc
NE         Northeast
NW         Northwest
SE         Southeast
SW         Southwest

Product_Dimension_Table
prod_grp_id  prod_id  prod_grp_desc   prod_desc
10           100      Fewer devices   Power supply
20           140      Circuit boards  Motherboard
30           220      Components      Co-processor

Account_Dimension_Table
account_id  account_doc
100000      ABC Electronics
110000      Midway Electric
120000      Victor Components
130000      Washburn, Inc.
140000      Zerox

Vendor_Dimension_Table
vend_id  vendor_desc
100      PowerAge, Inc.
200      Advanced Micro Devices
300      Farad Incorporated

Time_Dimension_Table
month    mo_in_fiscal_yr  month_name
01-1996  4                January
02-1996  5                February
03-1996  6                March

Fact Table

Monthly_Sales_Summary_Table
month    prod_id  region_id  account_id  vend_id  net-sales  gross_sales
01-1996  100      SW         100000      100      30,000     50,000
02-1996  140      NE         110000      200      23,000     42,000
03-1996  220      SW         100000      300      32,000     49,000
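
A star join over this sample data might look like the following sketch, which loads the region and time dimensions and the monthly sales fact rows into an in-memory SQLite database. Two things are assumptions made for the example: the lower-case table names, and the column name `net_sales` (the slide writes `net-sales`, which is not a valid unquoted SQL identifier).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dim (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE time_dim (month TEXT PRIMARY KEY, mo_in_fiscal_yr INTEGER, month_name TEXT);
CREATE TABLE monthly_sales_summary (
    month       TEXT REFERENCES time_dim(month),
    prod_id     INTEGER,
    region_id   TEXT REFERENCES region_dim(region_id),
    account_id  INTEGER,
    vend_id     INTEGER,
    net_sales   INTEGER,
    gross_sales INTEGER
);
""")
conn.executemany("INSERT INTO region_dim VALUES (?, ?)", [
    ("NE", "Northeast"), ("NW", "Northwest"), ("SE", "Southeast"), ("SW", "Southwest"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [
    ("01-1996", 4, "January"), ("02-1996", 5, "February"), ("03-1996", 6, "March"),
])
conn.executemany("INSERT INTO monthly_sales_summary VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("01-1996", 100, "SW", 100000, 100, 30000, 50000),
    ("02-1996", 140, "NE", 110000, 200, 23000, 42000),
    ("03-1996", 220, "SW", 100000, 300, 32000, 49000),
])

# Star join: the fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT r.region_doc, SUM(f.net_sales), SUM(f.gross_sales)
    FROM monthly_sales_summary AS f
    JOIN region_dim AS r ON r.region_id = f.region_id
    JOIN time_dim AS t ON t.month = f.month
    WHERE t.mo_in_fiscal_yr BETWEEN 4 AND 6
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()
print(rows)  # [('Northeast', 23000, 42000), ('Southwest', 62000, 99000)]
```

Every dimension is exactly one join away from the fact table, which is what keeps star-schema queries simple.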

Snowflake Schema

Consists of a fact table and normalized dimension tables.
Disadvantages:
  Data becomes unmanageable
  Data is difficult to retrieve
  Metadata becomes complex
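
To see why retrieval gets harder, here is a sketch of a snowflaked product dimension in SQLite. The split into `product_group_dim` and `product_dim` is an assumption made for illustration, as is the table name `sales_fact`; the rows reuse the product sample values from the dimension tables shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: product-group attributes move to their own table.
CREATE TABLE product_group_dim (prod_grp_id INTEGER PRIMARY KEY, prod_grp_desc TEXT);
CREATE TABLE product_dim (
    prod_id     INTEGER PRIMARY KEY,
    prod_desc   TEXT,
    prod_grp_id INTEGER REFERENCES product_group_dim(prod_grp_id)
);
CREATE TABLE sales_fact (
    prod_id   INTEGER REFERENCES product_dim(prod_id),
    net_sales INTEGER
);
""")
conn.executemany("INSERT INTO product_group_dim VALUES (?, ?)", [
    (10, "Fewer devices"), (20, "Circuit boards"), (30, "Components"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (100, "Power supply", 10), (140, "Motherboard", 20), (220, "Co-processor", 30),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [
    (100, 30000), (140, 23000), (220, 32000),
])

# Reaching the group description now takes two joins instead of one.
group_totals = conn.execute("""
    SELECT g.prod_grp_desc, SUM(f.net_sales)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.prod_id = f.prod_id
    JOIN product_group_dim AS g ON g.prod_grp_id = p.prod_grp_id
    GROUP BY g.prod_grp_desc
    ORDER BY g.prod_grp_desc
""").fetchall()
print(group_totals)
```

Each additional level of normalization adds another join to queries and another table to describe in the meta-data.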

[Figure: Snowflake schema. Fact table SALES with dimension tables PRODUCT, AREA, TIME, and CUSTOMER; the diagram also shows the normalized tables Product Category and Product Manufacturer.]


Starflake Schema

Combination of Star Schema and Snowflake Schema.
Consists of Fact table, Star Dimension and Snowflake Dimension.

[Figure: Starflake schema. Fact table SALES with star dimensions Product and Location and snowflake dimensions; other labels in the figure: Price, Weight, Location 1, Location 2.]


Tools and Technologies

Tools & Technologies used in the construction of a Data Warehouse:

Data Extraction - SAS
Data Cleansing - Apertus, Trillium
Data Storage - ORACLE, SYBASE


Advantages of using data warehouse

End users can access a wide variety of data
Supports business decision making for future purposes
Increases data consistency
Increases productivity
Decreases computing costs
Combines data


Problems

Increased end-user demands
High demand for resources
High maintenance
Extracting, cleansing and loading data could be time consuming.
Data warehousing increases project scope.
Problems with compatibility with systems already in place, e.g. transaction processing system.
Providing training to end-users, who end up not using the data warehouse.
Security could develop into a serious issue, especially if the data warehouse is web accessible.


Data mart

It is a subset of a data warehouse that supports the requirements of a particular department or business function
The characteristics that differentiate Data Marts and Data Warehouses include:

A data mart focuses on only the requirements of users associated with one department or business function
Data marts do not normally contain detailed operational data, unlike data warehouses
As data marts contain less data compared with data warehouses, data marts are more easily understood and navigated
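
One way to picture a data mart is as a summarized subset carved out of the warehouse for a single department. The sketch below assumes a hypothetical `warehouse_sales` table and a hypothetical Southwest (SW) regional sales department; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(month TEXT, region_id TEXT, account_id INTEGER, net_sales INTEGER)"
)
# Hypothetical detailed rows held in the corporate warehouse.
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", [
    ("01-1996", "SW", 100000, 30000),
    ("01-1996", "SW", 120000, 5000),
    ("02-1996", "NE", 110000, 23000),
    ("03-1996", "SW", 100000, 32000),
])

# Data mart for the SW department: only its region, summarized by month,
# with no account-level detail.
conn.execute("""
    CREATE TABLE sw_sales_mart AS
    SELECT month, SUM(net_sales) AS net_sales
    FROM warehouse_sales
    WHERE region_id = 'SW'
    GROUP BY month
""")

mart_rows = conn.execute(
    "SELECT month, net_sales FROM sw_sales_mart ORDER BY month"
).fetchall()
print(mart_rows)  # [('01-1996', 35000), ('03-1996', 32000)]
```

The mart holds two rows where the warehouse holds four, which reflects the reduced volume and simpler navigation described above.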
[Figure: Data mart architecture. Labels shown: first tier (operational data sources and operational data stores ODS 1 to 3, Load Manager, Warehouse Manager, Query Manager, DBMS with meta-data, detailed data, lightly and highly summarized data, archive/backup data, end-user access tools); second tier (Data Mart with summarized data in a relational database, and summarized data in a multi-dimensional database).]


Reasons for creating a Data Mart

To give users access to the data they need to analyze most often

To provide data in a form that matches the collective view of the data by a group of users in a department or business function

To improve end-user response time due to the reduction in the volume of data to be accessed

To provide appropriately structured data as dictated by the requirements of end-user access tools

Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse

Data Mining

The process of extracting previously unknown, valid, and actionable information from large volumes of data and then using the information to make crucial business decisions.

Applications: early warning systems, fraud detection, market research, direct mail.

Data Mining provides techniques to:
Detect trends or patterns, find correlations
Data Analysis
Forecasting and business modeling
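
As one small example of finding correlations, the sketch below computes the Pearson correlation coefficient between two series. The function name `pearson` and the monthly figures are hypothetical; real data mining tools apply far richer techniques than this.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures: sales rise in step with advertising spend.
ad_spend = [10, 20, 30, 40, 50]
net_sales = [25, 45, 65, 85, 105]

r = pearson(ad_spend, net_sales)
print(round(r, 6))
```

A coefficient near 1 or -1 suggests a strong linear relationship worth investigating; it does not by itself establish cause and effect.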
