
Introduction to Data Warehousing Concepts

Topics Covered
What is a data warehouse
Definition of a data warehouse
Why organizations use data warehousing?
OLTP vs. OLAP
Dimensional Modeling
Dimensions and Measures
Types of data warehouses
Data warehouse schemas and other basics

6 June 2011

What is a data warehouse?


A data warehouse is a database designed and optimized for querying and data analysis. It holds a collection of historical data from different operations in a company.


Definition of a data warehouse


A data warehouse is a:
Subject-oriented
Integrated
Non-volatile
Time-variant
Accessible
store of data obtained from a variety of sources and made available to end users in a way that they can understand and use in a business context.


Definition of a data warehouse


Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
Time-variant: The changes to the data in the data warehouse are tracked and recorded so that reports can be produced showing changes over time.
Non-volatile: Data in the data warehouse is never overwritten or deleted. Once committed, the data is static, read-only, and retained for future reporting.
Integrated: The data warehouse contains data from most or all of an organization's operational systems, and this data is made consistent.
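The time-variant and non-volatile properties can be illustrated with a minimal sketch. It assumes a hypothetical price history: the `PriceRecord` class, the helper functions, and the sample values are all invented for illustration and are not part of the original slides. Each change is appended as a new version rather than overwriting the old one, so a report can ask what the value was on any past date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a stored record cannot be modified
class PriceRecord:
    product: str
    price: int
    valid_from: date


# Hypothetical append-only store standing in for a warehouse table.
history: List[PriceRecord] = []


def record_price(product: str, price: int, valid_from: date) -> None:
    """Append a new version; earlier versions are never overwritten."""
    history.append(PriceRecord(product, price, valid_from))


def price_as_of(product: str, day: date) -> Optional[int]:
    """Return the price that was in effect on the given day, if any."""
    versions = [r for r in history if r.product == product and r.valid_from <= day]
    if not versions:
        return None
    return max(versions, key=lambda r: r.valid_from).price


# Illustrative data only.
record_price("widget", 10, date(2011, 1, 1))
record_price("widget", 12, date(2011, 6, 1))

march_price = price_as_of("widget", date(2011, 3, 15))
june_price = price_as_of("widget", date(2011, 6, 6))
print(march_price, june_price, len(history))
```

Both versions remain in `history`, which is what allows reports over past periods.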


Why organizations use data warehousing?


A competitive business environment creates a need for complex analysis of an ever-increasing volume of business data. Hence data warehousing is used to:
Turn vast volumes of business data into meaningful management information
Give users online access to this information
Organizations need information that is:
Holistic in its coverage of the business
Selected and enriched
Easily accessible
Easily understandable
Of a high quality
Directly applicable to the decision situation


OLTP Vs. OLAP


There are two basic data processing models:
OLTP (Online Transaction Processing): The main aim of OLTP is reliable and efficient processing of a large number of transactions while ensuring data consistency.
OLAP (Online Analytical Processing): The main aim of OLAP is efficient multidimensional processing of large data volumes.
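The difference between the two models can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical, chosen only to contrast many short keyed transactions with one summarizing query; a real OLAP system would run against a separate, much larger store.

```python
import sqlite3

# Hypothetical orders table, used only to contrast the two workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)

# OLTP style: many short transactions, each touching a few rows by key.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders VALUES (1, 'East', 120)")
with conn:
    conn.execute("INSERT INTO orders VALUES (2, 'West', 80)")
with conn:
    conn.execute("INSERT INTO orders VALUES (3, 'East', 200)")
with conn:
    conn.execute("UPDATE orders SET amount = 150 WHERE order_id = 1")

# OLAP style: one read-only query that scans and summarizes many rows.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
```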


OLTP Vs. OLAP


                     OLTP                                      OLAP
users                Clerk, IT professional                    Knowledge worker
function             Day-to-day operations                     Decision support
DB design            Application-oriented                      Subject-oriented
data                 Current, up-to-date, detailed,            Historical, summarized, multidimensional,
                     flat relational, isolated                 integrated, consolidated
usage                Repetitive                                Ad hoc
access               Read/write, index/hash on primary key     Lots of scans
unit of work         Short, simple transaction                 Complex query
# records accessed   Tens                                      Millions
# users              Thousands                                 Hundreds
DB size              100 MB to GB                              100 GB to TB
metric               Transaction throughput                    Query throughput, response


OLTP Vs. OLAP


                              OLTP              OLAP
Indexes                       Few               Many
Joins                         Many              Some
Duplicated Data               Normalized DBMS   Denormalized DBMS
Derived Data and Aggregates   Rare              Common


Dimensional Modeling
Dimensional Modeling is a different approach to database design. Features of Dimensional Modeling are:
Highly denormalized schema
Data is contained in two types of tables: Dimension tables and Fact tables
Dimension tables usually have a large number of columns and fewer rows.
Fact tables usually have fewer columns and a large number of rows.


Dimensions ( Who, what, when, where )


Dimensions are the context of measurements. A dimension is a subject area of a business against which the facts are measured. For example, a sales summary in a fact table can be viewed by the Region dimension (sales by country, state, city) or by the Time dimension (monthly or yearly sales).


For Example
Location Dimension - Table Schema

Field Name     Type
Dim_Id         Integer(4)
Loc_Code       Varchar(4)
Name           Varchar(50)
State_Name     Varchar(20)
Country_Name   Varchar(20)


For Example
Location Dimension - Table Data

Dim_Id   Loc_Code   Name           State_Name         Country_Name
1001     IL01       Chicago Loop   Illinois           USA
1002     IL02       Brooklyn       New York           USA
1003     MX01       Mexico City    Distrito Federal   Mexico
1004     TO01       Toronto        Ontario            Canada
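As a rough sketch, the location dimension above could be created and queried with Python's built-in `sqlite3` module. The table name `location_dim` is an assumption. The column names, types, and rows come from the slides, although SQLite treats the declared lengths only as hints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema taken from the slide; the table name location_dim is an assumption.
conn.execute(
    """
    CREATE TABLE location_dim (
        Dim_Id       INTEGER PRIMARY KEY,
        Loc_Code     VARCHAR(4),
        Name         VARCHAR(50),
        State_Name   VARCHAR(20),
        Country_Name VARCHAR(20)
    )
    """
)

# Rows as shown in the table data slide.
location_rows = [
    (1001, "IL01", "Chicago Loop", "Illinois", "USA"),
    (1002, "IL02", "Brooklyn", "New York", "USA"),
    (1003, "MX01", "Mexico City", "Distrito Federal", "Mexico"),
    (1004, "TO01", "Toronto", "Ontario", "Canada"),
]
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", location_rows)

# A dimension supplies descriptive context, for example all locations in one country.
usa_locations = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM location_dim WHERE Country_Name = 'USA' ORDER BY Dim_Id"
    )
]
print(usa_locations)
```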


Measures ( metrics and measurements )


Measures are summarized numeric data regarding the actual business process. Features of Measures:
Measures are usually additive (like total sales). However, they can be semi-additive (like balances) or non-additive (like unit price).
Measures are aggregated (rolled up) on the basis of the dimensions.
Facts are an overall summary of the measures related to a business area, i.e. fact tables contain measures.
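The three kinds of measures can be illustrated with a small sketch. All rows below are hypothetical and chosen only to make the arithmetic easy to follow. The sketch treats balances as summable across accounts but not across months, which is one common reading of "semi-additive".

```python
# Hypothetical sales rows: (month, store, units_sold, sales_amount)
sales = [
    ("2011-01", "A", 10, 100),
    ("2011-01", "B", 5, 100),
    ("2011-02", "A", 25, 200),
]

# Additive: sales amount can be summed across every dimension.
total_sales = sum(amount for _, _, _, amount in sales)

# Non-additive: unit prices (10, 20 and 8 here) cannot be summed meaningfully.
# Re-derive the price from additive measures instead.
total_units = sum(units for _, _, units, _ in sales)
avg_unit_price = total_sales / total_units

# Hypothetical balance rows: (month, account, balance)
balances = [
    ("2011-01", "acct1", 500),
    ("2011-01", "acct2", 300),
    ("2011-02", "acct1", 450),
    ("2011-02", "acct2", 350),
]

# Semi-additive: balances add up across accounts, but not across months.
# For a period total, take the latest month rather than summing all months.
latest_month = max(month for month, _, _ in balances)
closing_balance = sum(bal for month, _, bal in balances if month == latest_month)

print(total_sales, avg_unit_price, closing_balance)
```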


For Example
Monthly Sales Fact - Table Schema

Field Name   Type
TM_Dim_Id    Integer(4)
PR_Dim_Id    Integer(4)
LOC_Dim_Id   Integer(4)
Sales        Integer(4)
Tax          Integer(4)


For Example
Monthly Sales Fact - Table Data

TM_Dim_Id   PR_Dim_Id   LOC_Dim_Id   Sales       Tax
1001        1001        1003         89513383    8900
1002        1002        1001         25468926    2512
1003        1001        1003         777215631   7796
1001        1004        1001         65894001    6574
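To show how a measure is rolled up along a dimension, the sketch below loads the fact rows above and joins them to the location dimension from the earlier example. The table names `monthly_sales_fact` and `location_dim` are assumptions. The time and product dimensions are omitted because their contents are not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Only the location columns needed for this roll-up are created.
    CREATE TABLE location_dim (Dim_Id INTEGER PRIMARY KEY, Name TEXT, Country_Name TEXT);
    CREATE TABLE monthly_sales_fact (
        TM_Dim_Id INTEGER, PR_Dim_Id INTEGER, LOC_Dim_Id INTEGER,
        Sales INTEGER, Tax INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?)",
    [
        (1001, "Chicago Loop", "USA"),
        (1002, "Brooklyn", "USA"),
        (1003, "Mexico City", "Mexico"),
        (1004, "Toronto", "Canada"),
    ],
)
conn.executemany(
    "INSERT INTO monthly_sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        (1001, 1001, 1003, 89513383, 8900),
        (1002, 1002, 1001, 25468926, 2512),
        (1003, 1001, 1003, 777215631, 7796),
        (1001, 1004, 1001, 65894001, 6574),
    ],
)

# Roll the additive Sales measure up to the country level of the location dimension.
sales_by_country = conn.execute(
    """
    SELECT l.Country_Name, SUM(f.Sales)
    FROM monthly_sales_fact AS f
    JOIN location_dim AS l ON l.Dim_Id = f.LOC_Dim_Id
    GROUP BY l.Country_Name
    ORDER BY l.Country_Name
    """
).fetchall()
print(sales_by_country)
```

Countries with no matching fact rows (Canada here) do not appear, because the query uses an inner join.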


Types of data warehouses


Data warehouse without staging area
[Figure: Data Sources (operational systems, flat files) feed the Data Warehouse (metadata repository, summary data, raw data), which serves Users (analysis, reporting, data mining).]


Types of data warehouses


Data warehouse with staging area
[Figure: Data Sources (operational systems, flat files) feed a Staging area, which loads the Data Warehouse (metadata repository, summary data, raw data), which serves Users (analysis, reporting, data mining).]


Types of data warehouses


Data warehouse with staging area and data marts
[Figure: Data Sources (operational systems, flat files) feed a Staging area, which loads the Data Warehouse (metadata repository, summary data, raw data). The warehouse feeds Data Marts (Purchasing, Sales, Inventory), which serve Users (analysis, reporting, data mining).]


Data warehouse schemas and other basics


Three basic conceptual schemas are:
Star Schema: A single object (fact table) in the middle connected to a number of dimension tables
Snowflake Schema: A refinement of the star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables
Fact Constellation: Multiple fact tables share dimension tables


Data warehouse schemas and other basics


Star Schema
[Figure: Sales Fact at the center, connected to the Time, Customer, Product, and Store dimensions.]


Data warehouse schemas and other basics


Star Schema
Date: Date ID, Month, Year
Product: Product ID, Prod Name, Prod Desc, Category, QOH
Store: Store ID, City, State, Country, Region
Customer: Customer ID, Cust Name, Cust City, Cust Country
Sales Fact Table: Date ID, Product ID, Store ID, Customer ID
Measurements: Unit Sales, Dollar Sales
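The star schema above could be sketched in SQLite as follows. The table and column names are adapted from the slide (lowercased, with underscores, and a `_dim` suffix added as an assumption), and every data row is hypothetical. The query shows the typical pattern: the fact table is joined to its dimensions with one join each, and the measure is grouped by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, prod_name TEXT,
                              prod_desc TEXT, category TEXT, qoh INTEGER);
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
                            country TEXT, region TEXT);
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, cust_name TEXT,
                               cust_city TEXT, cust_country TEXT);
    -- The fact table holds one foreign key per dimension plus the measurements.
    CREATE TABLE sales_fact (
        date_id INTEGER REFERENCES date_dim,
        product_id INTEGER REFERENCES product_dim,
        store_id INTEGER REFERENCES store_dim,
        customer_id INTEGER REFERENCES customer_dim,
        unit_sales INTEGER,
        dollar_sales INTEGER
    );

    -- Hypothetical sample rows.
    INSERT INTO date_dim VALUES (1, 'Jan', 2011), (2, 'Feb', 2011);
    INSERT INTO product_dim VALUES (1, 'Pen', 'Blue pen', 'Stationery', 100),
                                   (2, 'Lamp', 'Desk lamp', 'Furniture', 20);
    INSERT INTO store_dim VALUES (1, 'Chicago', 'Illinois', 'USA', 'Midwest');
    INSERT INTO customer_dim VALUES (1, 'Acme', 'Chicago', 'USA');
    INSERT INTO sales_fact VALUES (1, 1, 1, 1, 10, 50),
                                  (2, 1, 1, 1, 4, 20),
                                  (2, 2, 1, 1, 1, 30);
    """
)

# Each dimension is exactly one join away from the fact table.
sales_by_year_and_category = conn.execute(
    """
    SELECT d.year, p.category, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN date_dim AS d ON d.date_id = f.date_id
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY d.year, p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_year_and_category)
```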


Data warehouse schemas and other basics


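Snowflake Schema
[Figure: Sales Fact at the center, connected to the Time, Product, Store, and Customer dimensions. The dimension hierarchies are split into separate tables: Quarter and Year for Time, Sub Cat and Category for Product, City and State for Store.]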


Data warehouse schemas and other basics


Snowflake Schema
Year: Year ID, Year
Month: Month ID, Month, Year ID
Date: Date ID, Date, Month ID
Category: Cat ID, Category
Sub Cat: Sub cat ID, Sub cat, Cat ID
Product: Product ID, Product, Product desc, Sub cat ID
Country: Country ID, Country
State: State ID, State, Country ID
City: City ID, City, State ID
Store: Store ID, Store, City ID
Customer: Customer ID, Cust Name, Cust City, Cust Country
Sales Fact Table: Date ID, Product ID, Store ID, Customer ID
Measurements: Unit Sales, Dollar Sales
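In a snowflake schema, rolling a measure up to a higher hierarchy level means walking the chain of normalized tables. The sketch below covers only the product branch (Product, Sub Cat, Category) and a reduced fact table. The table names are adapted from the slide and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Product hierarchy, normalized into three tables.
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sub_cat (sub_cat_id INTEGER PRIMARY KEY, sub_cat TEXT,
                          cat_id INTEGER REFERENCES category);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product TEXT,
                          product_desc TEXT, sub_cat_id INTEGER REFERENCES sub_cat);
    -- Reduced fact table: only the product key and one measurement.
    CREATE TABLE sales_fact (product_id INTEGER REFERENCES product,
                             dollar_sales INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO category VALUES (1, 'Stationery'), (2, 'Furniture');
    INSERT INTO sub_cat VALUES (10, 'Pens', 1), (11, 'Paper', 1), (20, 'Lamps', 2);
    INSERT INTO product VALUES (100, 'Blue pen', 'Ballpoint', 10),
                               (101, 'A4 ream', '500 sheets', 11),
                               (200, 'Desk lamp', 'LED', 20);
    INSERT INTO sales_fact VALUES (100, 50), (101, 25), (200, 30), (100, 5);
    """
)

# Reaching the category level takes three joins instead of the single
# join a denormalized star-schema product dimension would need.
sales_by_category = conn.execute(
    """
    SELECT c.category, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN sub_cat AS s ON s.sub_cat_id = p.sub_cat_id
    JOIN category AS c ON c.cat_id = s.cat_id
    GROUP BY c.category
    ORDER BY c.category
    """
).fetchall()
print(sales_by_category)
```

The trade-off is less duplicated data in the dimension tables in exchange for more joins at query time.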


Data warehouse schemas and other basics


Fact Constellation
[Figure: Two fact tables, Sales Fact and Forecast Fact, connected to the Time, Store, Product, and Customer dimensions, with dimensions shared between the two fact tables.]


Data warehouse schemas and other basics


Fact Constellation
Date: Date ID, Month, Year
Product: Product ID, Prod Name, Prod Desc, Category, QOH
Store: Store ID, City, State, Country, Region
Customer: Customer ID, Cust Name, Cust City, Cust Country
Sales Fact Table: Date ID, Product ID, Store ID, Customer ID
Measurements: Unit Sales, Dollar Sales
Forecast Fact Table: Date ID, Month ID, Product ID, Customer ID
Measurements: Fcst_Weight_net, Fcst_Turnover
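Because both fact tables share the product dimension, actual and forecast figures can be compared product by product. The sketch below keeps only the columns needed for that comparison, uses table names adapted from the slide, and fills them with hypothetical rows. Each fact table is aggregated on its own before the results are joined, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension (reduced to two columns for brevity).
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, prod_name TEXT);
    -- Two fact tables referencing the same dimension.
    CREATE TABLE sales_fact (product_id INTEGER REFERENCES product_dim,
                             dollar_sales INTEGER);
    CREATE TABLE forecast_fact (product_id INTEGER REFERENCES product_dim,
                                fcst_turnover INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO product_dim VALUES (1, 'Pen'), (2, 'Lamp');
    INSERT INTO sales_fact VALUES (1, 50), (1, 20), (2, 30);
    INSERT INTO forecast_fact VALUES (1, 60), (2, 25), (2, 10);
    """
)

# Aggregate each fact table separately, then join on the shared dimension key.
actual_vs_forecast = conn.execute(
    """
    WITH actual AS (
        SELECT product_id, SUM(dollar_sales) AS dollar_sales
        FROM sales_fact GROUP BY product_id
    ),
    forecast AS (
        SELECT product_id, SUM(fcst_turnover) AS fcst_turnover
        FROM forecast_fact GROUP BY product_id
    )
    SELECT p.prod_name, a.dollar_sales, f.fcst_turnover
    FROM product_dim AS p
    JOIN actual AS a ON a.product_id = p.product_id
    JOIN forecast AS f ON f.product_id = p.product_id
    ORDER BY p.prod_name
    """
).fetchall()
print(actual_vs_forecast)
```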


Thank You

