DATA WAREHOUSE
DEFINITION
A Data Warehouse is
DATA WAREHOUSE
CHARACTERISTICS
Subject Oriented
Data is categorized and stored by business
subject rather than by application
[Figure: OLTP Applications (Insurance, Loans, Savings) versus Data Warehouse (Customer Financial Information)]
Focusing on the modeling and analysis
of data for decision makers, not on daily
operations or transaction processing
Provide a simple and concise (brief)
view around particular subject issues by
excluding data that are not useful in
the decision support process
Integrated
Data on a given subject is defined and
stored once
[Figure: OLTP Applications (Savings, Current accounts, Loans) feeding a single Customer subject in the Data Warehouse]
Constructed by integrating multiple,
heterogeneous data sources
relational databases, flat files, on-line
transaction records
Time Variant
Data is stored as a series of
snapshots, each representing a
period of time
Time      Data
Jan-97    January
Feb-97    February
Mar-97    March
The time horizon for the data warehouse is
significantly longer than that of operational
systems
Operational database: current value data
Data warehouse data: provide information from
a historical perspective (e.g., past 5-10 years)
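The contrast between current-value data and a series of snapshots can be sketched in a few lines of Python. This is only an illustration: the names `operational_balance`, `warehouse_snapshots`, and `load_snapshot`, and all balances, are hypothetical and not taken from the slides.

```python
# Operational system: keeps only the current value, overwritten in place.
operational_balance = 1200.0
operational_balance = 1350.0  # February's update replaces January's value
operational_balance = 1100.0  # March's update replaces February's value

# Data warehouse: one snapshot per period, appended and never overwritten.
warehouse_snapshots = {}

def load_snapshot(period, balance):
    """Add a snapshot for a period; refuse to overwrite an existing one."""
    if period in warehouse_snapshots:
        raise ValueError(f"snapshot for {period} already loaded")
    warehouse_snapshots[period] = balance

load_snapshot("Jan-97", 1200.0)
load_snapshot("Feb-97", 1350.0)
load_snapshot("Mar-97", 1100.0)

# The warehouse can still answer a historical question;
# the operational system only knows the latest value.
print(warehouse_snapshots["Jan-97"])
print(operational_balance)
```

Keying each snapshot by its period is what gives the warehouse its longer time horizon in this sketch.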
Non Volatile
Typically data in the data warehouse is
not updated or deleted
Operational: Insert, Update, Delete, Read
Warehouse: Load, Read
In Detail
DISTINCT FEATURES

Feature                        OLTP                      OLAP
User and system orientation    Customer                  Market
Usage                          Repetitive                Ad hoc
Data contents                  Current, Detailed         Historical, Consolidated
Data design                    ER + application          Schema (e.g., star) + subject
View                           Current, Local            Evolutionary
Access pattern                 Update                    Read-only but complex queries
# Records accessed             Tens, twenty, etc.        Hundreds
DB size                        100 MB to GB              100 GB to TB
Metric                         Transaction throughput    Query throughput, response
PROBLEMS OF DATA
WAREHOUSING
DATA MODEL
Multidimensional Data
Model
A data cube :
From tables and spreadsheets to Data Cube
Allows data to be modeled and viewed in multiple
dimensions.
Dimensions :
Perspective or entities with respect to which an organization
wants to keep records.
Example :
AllElectronics may create a sales data warehouse in order
to keep records of the store's sales with respect to the
dimensions time, item, branch, and location.
Thus, we can keep track of things like monthly sales of items
and the branches and locations at which the items were sold.
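The idea of viewing the same sales facts along different dimensions can be sketched in Python. This is a minimal illustration, not a real OLAP engine: the sample records, the `dollars_sold` measure, and the helper `aggregate` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales facts described by the four dimensions from the example.
sales = [
    {"time": "Q1", "item": "TV", "branch": "B1", "location": "Vancouver", "dollars_sold": 600.0},
    {"time": "Q1", "item": "Phone", "branch": "B1", "location": "Vancouver", "dollars_sold": 200.0},
    {"time": "Q2", "item": "TV", "branch": "B2", "location": "Toronto", "dollars_sold": 700.0},
    {"time": "Q2", "item": "TV", "branch": "B1", "location": "Vancouver", "dollars_sold": 100.0},
]

def aggregate(facts, dims):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact["dollars_sold"]
    return dict(totals)

# The same facts viewed along different dimensions.
by_time = aggregate(sales, ("time",))
by_item_and_location = aggregate(sales, ("item", "location"))
grand_total = aggregate(sales, ())  # no dimensions: one overall cell

print(by_time)
print(by_item_and_location)
print(grand_total)
```

Each choice of dimensions corresponds to one view of the cube; passing no dimensions collapses everything into a single total.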
A data cube
We usually think of cubes as 3-D
geometric structures
In a data warehouse
The data cube is n-dimensional
A data warehouse
Requires a concise, subject-oriented schema
Facilitates on-line data analysis
Data Model :
Multidimensional model
Model can exist in the form of :
Star Schema
Snowflake Schema
Fact Constellation Schema
Star Schema
There is a central
large Fact table with
no redundancy
Each tuple in the fact
table has a foreign
key to a dimension
table which
describes the details
of that dimension
What is the problem of the schema and how
to overcome it?
What is the advantage of the schema?
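A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_time`, `dim_item`), columns, and rows below are hypothetical and chosen only to show a fact table whose tuples carry one foreign key per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, holding the descriptive details.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);

-- Central fact table: foreign keys to each dimension plus the measures.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);

INSERT INTO dim_time VALUES (1, 'Jan', 1997), (2, 'Feb', 1997);
INSERT INTO dim_item VALUES (10, 'TV', 'BrandA'), (11, 'Phone', 'BrandB');
INSERT INTO fact_sales VALUES (1, 10, 2, 800.0), (1, 11, 5, 500.0), (2, 10, 1, 400.0);
""")

# Each dimension is reached from the fact table with a single join.
rows = conn.execute("""
    SELECT t.month, i.item_name, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_key = f.time_key
    JOIN dim_item AS i ON i.item_key = f.item_key
    GROUP BY t.time_key, t.month, i.item_key, i.item_name
    ORDER BY t.time_key, i.item_key
""").fetchall()

for row in rows:
    print(row)
```

In this sketch `brand` is stored directly in `dim_item`, so the dimension table is denormalized, and every dimension is exactly one join away from the fact table.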
Snowflake Schema
[Figure: Fact Table]
What is the problem of the schema?
Some of the dimension tables are normalized,
thus splitting data into additional tables
Queries then need more joins, which can reduce performance,
so the Snowflake schema is not as popular as
the Star schema
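The effect of normalizing a dimension can be sketched with sqlite3. Here a hypothetical `dim_item` table is split so that supplier details live in a separate `dim_supplier` table; all table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The item dimension is normalized: supplier details move to their own table.
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE dim_item (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key)
);
CREATE TABLE fact_sales (
    item_key INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL
);

INSERT INTO dim_supplier VALUES (100, 'wholesale'), (101, 'retail');
INSERT INTO dim_item VALUES (10, 'TV', 100), (11, 'Phone', 101), (12, 'Radio', 100);
INSERT INTO fact_sales VALUES (10, 800.0), (11, 500.0), (12, 100.0);
""")

# Reaching supplier_type now takes two joins instead of one.
by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_key = f.item_key
    JOIN dim_supplier AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()

print(by_supplier_type)
```

The supplier type is stored once per supplier rather than once per item, which saves space in the dimension at the cost of the extra join.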
Illustration of Constellations
[Figure: constellation of three fact tables]
DATA WAREHOUSE
DESIGN
Physical Design
Logical Design
The logical design should result in :
1. a set of entities and attributes
corresponding to fact tables and
dimension tables
2. a mapping of operational data from
your source into subject-oriented
information in your target data
warehouse schema
PHYSICAL DESIGN
Introduction
Physical database design is a fundamental
part of data warehouse design.
The performance of a data warehouse is
largely affected by the physical design of the
underlying databases and the environment
where the databases are running.
To do the physical database design :
it is important to understand the physical system
architecture on which the database will be
operating.
Non-Functional (NF)
Requirements
Some NF requirements that are commonly encountered :
the data warehouse must be available
24 hours a day, 7 days a
week
downtime of no more than one hour
per month
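The downtime limit can be turned into an availability percentage with a short calculation. This is a rough sketch that assumes a 30-day month; the function name `availability` is hypothetical.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month

def availability(max_downtime_hours, days=DAYS_PER_MONTH):
    """Fraction of the period the system must be up, given allowed downtime."""
    total_hours = days * HOURS_PER_DAY
    return (total_hours - max_downtime_hours) / total_hours

# One hour of downtime out of 720 hours in the month.
monthly = availability(1)
print(f"{monthly:.4%}")
```

Under that assumption, one hour of downtime per month corresponds to roughly 99.86% availability.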
HIGH AVAILABILITY REQUIREMENTS
Consequence
The database engine cannot be
implemented on a single
server, but on a failover cluster
a configuration of installations on several
identical servers (nodes)
database instances run on an
active node, but when the active node is
not available, the database instances
automatically switch over to a
secondary node
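The switch-over behaviour can be illustrated with a toy model in Python. This is only a conceptual sketch, not how any particular clustering product works: the class `FailoverCluster`, its methods, and the node names are hypothetical.

```python
class FailoverCluster:
    """Toy active/passive cluster: one active node, the rest on standby."""

    def __init__(self, nodes):
        if len(nodes) < 2:
            raise ValueError("a failover cluster needs at least two nodes")
        self.healthy = {node: True for node in nodes}
        self.active = nodes[0]

    def mark_down(self, node):
        """Record a node failure and fail over if it was the active node."""
        self.healthy[node] = False
        if node == self.active:
            self._fail_over()

    def _fail_over(self):
        # Pick the first healthy node, in the order the nodes were listed.
        for node, is_healthy in self.healthy.items():
            if is_healthy:
                self.active = node
                return
        raise RuntimeError("no healthy node left to take over")

cluster = FailoverCluster(["node-a", "node-b"])
print(cluster.active)       # database instances start on the first node
cluster.mark_down("node-a")
print(cluster.active)       # instances have switched to the secondary node
```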
Reporting Services (SSRS)
a web service : installed on NLB servers
a database : running on a failover
cluster
Network Specification
Between the SSIS server and the
database server, and between the
database server and the OLAP server,
try to use a Gigabit network (capable
of 1 Gbps throughput)
rather than the normal 100 Mbps
Ethernet.
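A rough calculation shows why the link speed matters when large volumes move between servers. The sketch below assumes decimal units (1 GB = 1000 MB), ignores protocol overhead, and uses a hypothetical 100 GB load; the function name `transfer_seconds` is also an assumption.

```python
def transfer_seconds(size_gigabytes, link_mbps):
    """Ideal transfer time, ignoring protocol overhead and contention."""
    size_megabits = size_gigabytes * 1000 * 8  # decimal units: 1 GB = 1000 MB
    return size_megabits / link_mbps

# Hypothetical 100 GB load moved over each kind of link.
gigabit = transfer_seconds(100, 1000)
fast_ethernet = transfer_seconds(100, 100)

print(f"Gigabit: {gigabit / 60:.1f} minutes")
print(f"100 Mbps: {fast_ethernet / 60:.1f} minutes")
```

Under these idealized assumptions the same load takes ten times longer on 100 Mbps Ethernet than on a Gigabit link.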