50102GC20
Production 2.0
May 1999
M08761
Data Warehousing Fundamentals
Volume 1 Student Guide
Authors
Chon S. Chua
Richard Green
Technical Contributors and Reviewers
Jackie Collins
Jennifer Jacoby
Mike Schmitz
John Haydu
Russ Pitts
Lauran Serhal
Brian Pottle
Donna Corrigan
Patricia Moll
Harry Penbert
SuiWah Chan
Joel Barkin
Steve Dressler
Publisher
Tony McGettigan
Copyright Oracle Corporation, 1999. All rights reserved.
This documentation contains proprietary information of Oracle Corporation. It is
provided under a license agreement containing restrictions on use and disclosure
and is also protected by copyright law. Reverse engineering of the software is
prohibited. If this documentation is delivered to a U.S. Government Agency of the
Department of Defense, then it is delivered with Restricted Rights and the
following legend is applicable:
Restricted Rights Legend
Use, duplication or disclosure by the Government is subject to restrictions for
commercial computer software and shall be deemed to be Restricted Rights
software under Federal law, as set forth in subparagraph (c) (1) (ii) of DFARS
252.227-7013, Rights in Technical Data and Computer Software (October 1988).
This material or any portion of it may not be copied in any form or by any means
without the express prior written permission of Oracle Corporation. Any other
copying is a violation of copyright law and may result in civil and/or criminal
penalties.
If this documentation is delivered to a U.S. Government Agency not within the
Department of Defense, then it is delivered with Restricted Rights, as defined in
FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).
The information in this document is subject to change without notice. If you find
any problems in the documentation, please report them in writing to Education
Products, Oracle Corporation, 500 Oracle Parkway, Box SB-6, Redwood Shores,
CA 94065. Oracle Corporation does not warrant that this document is error-free.
Data Warehouse MethodA Methodology for Designing Data Warehouse,
SQL*Loader, PL/SQL, Pro*C, Oracle7, Oracle8, and Oracle8i, Distributed Option,
Parallel Query Option, Parallel Server Option, Media Server, Spatial Data Option,
ConText Option, Video Server, Text Server, WebServer, Oracle Universal Server
ROLAP Option, Express Server, Web-enabled Express Server, SQL*Net,
Developer/2000, Relational Access Manager, Discoverer, Designer/2000,
SQL*Bridge, Transparent Gateway Developers Kit, Procedural Gateway
Developers Kit, Express, Express Analyzer, Express Objects, Sales Analyzer,
and Financial Analyzer are product names, trademarks, or registered trademarks
of Oracle Corporation.
All other products or company names are used for identification purposes only
and may be trademarks of their respective owners.
Contents
Preface
Profile
Related Publications
Typographic Conventions
Lesson 1: Introduction
Course Objectives
Agenda
Questions About You
Lesson 2: Meeting a Business Need
Overview
Unsuitability of OLTP Systems for Complex Analysis
Management Information Systems and Decision Support
Data Extract Processing
Business Drivers for Data Warehouses
Current Situation and Growth of Data Warehousing
Typical Uses of a Data Warehouse
Summary
Practice 2-1
Lesson 3: Defining Data Warehouse Concepts and Terminology
Overview
Data Warehouse Definition
Data Warehouse Properties
Data Warehouse Terminology
Components of a Data Warehouse
Oracle Warehouse Vision, Products, and Services
Summary
Practice 3-1
Lesson 4: Driving Implementation Through a Methodology
Overview
Warehouse Development Approaches
The Need for an Iterative and Incremental Methodology
Oracle Data Warehouse Method
DWM Fundamental Elements
Oracle Warehouse Technology Initiative (WTI)
Summary
Practice 4-1
Lesson 5: Planning for a Successful Warehouse
Overview
Managing Financial Issues
Obtaining Business Commitment
Managing a Warehouse Project
Identifying Planning Phases
Identifying Warehouse Strategy Phase Deliverables
Identifying Project Scope Phase Deliverables
Summary
Practice 5-1
Lesson 6: Analyzing User Query Needs
Overview
Types of Users
Gathering User Requirements
Managing User Data Access
Security
OLAP
Query Access Architectures
Summary
Practice 6-1
Lesson 7: Modeling the Data Warehouse
Overview
Data Warehouse Database Design Phases
Phase One: Defining the Business Model
Phase Two: Creating the Dimensional Model
Data Modeling Tools
Summary
Practice 7-1
Lesson 8: Choosing a Computing Architecture
Overview
Architecture Requirements
The Hardware Architecture
Database Server Requirements
Parallel Processing
Summary
Practice 8-1
Lesson 9: Planning Warehouse Storage
Overview
The Server Data Architecture
Protecting the Database
Summary
Practice 9-1
Lesson 10: Building the Warehouse
Overview
Extracting, Transforming, and Transporting Data
Extracting Data
Examining Data Sources
Extraction Techniques
Extraction Tools
Summary
Practice 10-1
Lesson 11: Transforming Data
Overview
Importance of Data Quality
Transformation
Transforming Data: Problems and Solutions
Transformation Techniques
Transformation Tools
Summary
Practice 11-1
Lesson 12: Transportation: Loading Warehouse Data
Overview
Transporting Data into the Warehouse
Building the Transportation Process
Transporting the Data
Postprocessing of Loaded Data
Summary
Practice 12-1
Lesson 13: Transportation: Refreshing Warehouse Data
Overview
Capturing Changed Data
Limitations of Methods for Applying Changes
Purging and Archiving Data
Final Tasks
Selecting ETT Tools
Summary
Practice 13-1
Lesson 14: Leaving a Metadata Trail
Overview
Defining Warehouse Metadata
Developing a Metadata Strategy
Examining Types of Metadata
Metadata Management Tools
Common Warehouse Metadata
Summary
Practice 14-1
Lesson 15: Supporting End-User Access
Overview
Business Intelligence
Multidimensional Query Techniques
Categories of Business Intelligence Tools
Data Mining in a Warehouse Environment
Oracle Data Mining Partners
Summary
Practice 15-1
Lesson 16: Web-Enabling the Warehouse
Overview
Accessing the Warehouse Over the Web
Common Web Data Warehouse Architecture
Issues in Deploying a Data Warehouse on the Web
Evaluating Web-Based Tools
Summary
Practice 16-1
Lesson 17: Managing the Data Warehouse
Overview
Managing the Transition to Production
Managing Growth
Managing Backup and Recovery
Identifying Data Warehouse Performance Issues
Summary
Appendix A: Practice Solutions
Practice 2-1
Practice 3-1
Practice 4-1
Practice 5-1
Practice 6-1
Practice 7-1
Practice 8-1
Practice 9-1
Practice 10-1
Practice 11-1
Practice 12-1
Practice 13-1
Practice 14-1
Practice 15-1
Practice 16-1
Glossary
Preface
Profile
Before You Begin This Course
This course is the entry-level course in the Data Warehousing curriculum.
Prerequisites
There are no prerequisites for this course.
How This Course Is Organized
Data Warehousing Fundamentals is an instructor-led course featuring lectures,
paper-and-pencil exercises, and group discussions to reinforce the concepts and
skills introduced.
Lesson Aim
Lesson 1: Introduction
In this lesson, the class format is reviewed, the class agenda is described, and
students introduce themselves. Because this class is expected to appeal to a broad
audience, the introduction will give the instructor an idea of the composition of the
class in terms of data warehouse knowledge, Oracle knowledge, and the specific role
that each student plays with regard to data warehousing.
Lesson 2: Meeting a Business Need
This lesson examines how data warehousing has evolved from early management
information systems to today's decision support systems. The primary motivating
factors for data warehouse creation are explored. The types of industries employing
data warehouses are considered.
Lesson 3: Defining Data Warehouse Concepts and Terminology
This lesson introduces the Oracle definition of a data warehouse. The lesson offers a
general description of the properties of a data warehouse. The standard components
and tools required to build, operate, and use a data warehouse are identified.
Lesson 4: Driving Implementation Through a Methodology
This lesson introduces the Oracle Data Warehouse Method (DWM), a methodology
employed by Oracle Consulting Services for incremental development of a total
warehouse solution using a phased development approach. Partnering initiatives
launched by Oracle are described.
Lesson 5: Planning for a Successful Warehouse
This lesson introduces the planning that is critical to the success of a data warehouse
project. Planning phases, deliverables, and project roles are identified. Overall
warehouse strategy and project scope are defined.
Lesson 6: Analyzing User Query Needs
This lesson describes the analysis required to identify and categorize users who may
need to access data from the warehouse, and how their requirements differ. Data
access and reporting tools are considered.
Lesson 7: Modeling the Data Warehouse
This lesson examines the role of data modeling in a data warehousing environment.
The lesson presents a very high-level overview of warehouse modeling steps. You
consider the different types of models that can be employed, such as the star schema.
Tools available for warehouse modeling are introduced.
Lesson 8: Choosing a Computing Architecture
This lesson examines the computer architectures that commonly support data
warehouses. The benefits of each hardware architecture and reasons for using
distributed warehouses are examined. Students examine the technology requirements
of a database server for warehousing.
Lesson 9: Planning Warehouse Storage
This lesson examines database setup and management issues, such as partitioning,
indexing, and ways to protect your database.
Lesson 10: Building the Warehouse
In this lesson, you explore the sources of data for the data warehouse. You consider
how the extraction and transformation processes take data from source systems and
change it into data that is acceptable to the users of the data warehouse. The lesson
also describes typical data anomalies and looks at ways to eliminate them.
Lesson 11: Transforming Data
In this lesson, you explore how the transformation process transforms data from
source systems into data suitable for end-user query and analysis applications.
Lesson 12: Transportation: Loading Warehouse Data
In this lesson, you examine how the extracted and transformed data is transported
into the warehouse.
Lesson 13: Transportation: Refreshing Warehouse Data
In this lesson, you examine methods for updating the warehouse with changed data
after the first-time load.
Lesson 14: Leaving a Metadata Trail
This lesson focuses on the concept of warehouse metadata, and the role it plays in a
well-developed and managed warehousing environment.
Lesson 15: Supporting End-User Access
This lesson investigates the ways that users may access the data in the data
warehouse. Students are introduced to the concept of business intelligence. The
lesson discusses the discovery model used by mining tools, and the reasons
enterprises are looking at data mining solutions for discovery of information.
Lesson 16: Web-Enabling the Warehouse
This lesson discusses how to take advantage of the Web to deploy data warehouse
information. It addresses internal and external access, as well as the advantages of
Web-enabling a data warehouse. The lesson outlines the steps involved in deploying a
Web-enabled data warehouse. Challenges in deploying a Web-enabled data warehouse
are also discussed.
Lesson 17: Managing the Data Warehouse
This lesson explores the management issues, critical success factors, and challenges
to successful data warehouse implementation. The lesson addresses issues pertaining
to the management of the entire warehouse life cycle.
Related Publications
Oracle Publications
Oracle8i for Data Warehousing: Fast and Simple for More Data and More Users
(Nov 1998), at http://websight.us.oracle.com
Large Scale Data Warehousing with Oracle8i, Winter Corporation Sponsored
Research Program, at http://websight.us.oracle.com
DWM Handbook V1.0.0
Additional Publications
Oracle DBA Handbook, Loney, Kevin; Osborne McGraw-Hill; ISBN: 007882406.
Oracle: The Complete Reference, Koch, George and Kevin Loney; Oracle Press;
ISBN: 007882396X.
The Data Warehouse Toolkit, Kimball, Ralph; John Wiley & Sons; ISBN:
0471153370.
Building the Data Warehouse, Inmon, W.; John Wiley & Sons; ISBN:
0471141615.
Oracle8 Data Warehousing, Dodge, Gary and Gorman, T.; John Wiley & Sons;
ISBN: 0471199524.
The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing,
Developing, and Deploying Data Warehouses, Kimball, Ralph and others; John
Wiley & Sons, 1998; ISBN: 0471255475.
Data Warehouse Design Solutions, Adamson, C. and Venerable, M.; John Wiley &
Sons, 1998; ISBN: 0-471-25195-X.
Data Warehousing: Architecture and Implementation, Humphries, M. et al.;
Prentice Hall PTR, 1999; ISBN: 0-13-080902-0.
Web Sites
Data Warehouse Institute Web site, at http://www.dw-institute.com/index.htm
The Data Warehouse Information Center Web site, at
http://pwp.starnetinc.com/larryg/index.html
The Data Warehouse.com Web site, at http://data-warehouse.com/
The Data Warehouse Knowledge Center Web site, at http://www.datawarehouse.org
Typographic Conventions
Typographic Conventions in Text

Convention: Bold italic
Element: Glossary term (if there is a glossary)
Example: The algorithm inserts the new key.

Convention: Caps and lowercase
Element: Buttons, check boxes, triggers, windows
Examples:
  Click the Executable button.
  Select the Can't Delete Card check box.
  Assign a When-Validate-Item trigger . . .
  Open the Master Schedule window.

Convention: Courier new, case sensitive (default is lowercase)
Element: Code output, directory names, filenames, passwords, pathnames, URLs,
user input, usernames
Examples:
  Code output: debug.seti(I,300);
  Directory: bin (DOS), $FMHOME (UNIX)
  Filename: Locate the init.ora file.
  Password: Use tiger as your password.
  Pathname: Open c:\my_docs\projects
  URL: Go to http://www.oracle.com
  User input: Enter 300
  Username: Log on as scott

Convention: Initial cap
Element: Graphics labels (unless the term is a proper noun)
Example: Customer address (but Oracle Payables)

Convention: Italic
Element: Emphasized words and phrases, titles of books and courses, variables
Examples:
  Do not save changes to the database.
  For further information, see Oracle7 Server SQL Language Reference Manual.
  Enter user_id@us.oracle.com, where user_id is the name of the user.

Convention: Quotation marks
Element: Interface elements with long names that have only initial caps; lesson
and chapter titles in cross-references
Examples:
  Select "Include a reusable module component" and click Finish.
  This subject is covered in Unit II, Lesson 3, "Working with Objects."

Convention: Uppercase
Element: SQL column names, commands, functions, schemas, table names
Example: Use the SELECT command to view information stored in the LAST_NAME
column of the EMP table.

Convention: Arrow
Element: Menu paths
Example: Select File>Save.

Convention: Brackets
Element: Key names
Example: Press [Enter].

Convention: Commas
Element: Key sequences
Example: Press and release these keys one at a time: [Alt], [F], [D]

Convention: Plus signs
Element: Key combinations
Example: Press and hold these keys simultaneously: [Ctrl]+[Alt]+[Del]

Typographic Conventions in Code

Convention: Caps and lowercase
Element: Oracle Forms triggers
Example: When-Validate-Item

Convention: Lowercase
Element: Column names, table names
Example:
  SELECT last_name
  FROM s_emp;
Element: Passwords
Example:
  DROP USER scott
  IDENTIFIED BY tiger;
Element: PL/SQL objects
Example:
  OG_ACTIVATE_LAYER
  (OG_GET_LAYER (prod_pie_layer))

Convention: Lowercase italic
Element: Syntax variables
Example: CREATE ROLE role

Convention: Uppercase
Element: SQL commands and functions
Example:
  SELECT userid
  FROM emp;

Typographic Conventions in Navigation Paths
This course uses simplified navigation paths, such as the following example, to direct
you through Oracle Applications.
(N) Invoice>Entry>Invoice Batches Summary (M) Query>Find (B) Approve
This simplified path translates to the following:
1 (N) From the Navigator window, select Invoice>Entry>Invoice Batches Summary.
2 (M) From the menu bar, select Query>Find.
3 (B) Click the Approve button.
N = Navigator, M = Menu, B = Button
Lesson 1: Introduction
Course Objectives
After completing this course, you should be able to do the following:
Explain why data warehousing is a popular solution in today's information
technology environment
Describe the terminology used with data warehousing
Identify the standard components of a data warehouse implementation
Explain the importance of using a methodology for development, and specifically
identify the phases of the Oracle Data Warehouse Method
Identify and use data warehouse modeling concepts
Identify the different processes required to manage and maintain the warehouse
Identify the hardware platforms that can be employed with a data warehouse
Identify the features required of a database server for a warehouse implementation
Identify the tools that can be used at each phase during the data warehouse
development cycle
Describe user profiles and the techniques users may employ for querying the
warehouse
Identify data warehousing implementation issues and challenges
Position the products for the Oracle warehouse
Agenda
Day 1
Lesson 1: Introduction
Lesson 2: Meeting a Business Need
Lesson 3: Defining Data Warehouse Concepts and Terminology
Lesson 4: Driving Implementation Through a Methodology
Lesson 5: Planning for a Successful Warehouse
Lesson 6: Analyzing User Query Needs
Day 2
Lesson 7: Modeling the Data Warehouse
Lesson 8: Choosing a Computing Architecture
Lesson 9: Planning Warehouse Storage
Lesson 10: Building the Warehouse
Lesson 11: Transforming Data
Lesson 12: Transportation: Loading Warehouse Data
Day 3
Lesson 13: Transportation: Refreshing Warehouse Data
Lesson 14: Leaving a Metadata Trail
Lesson 15: Supporting End-User Access
Lesson 16: Web-Enabling the Warehouse
Lesson 17: Managing the Data Warehouse
Questions About You


To tailor the class to your specific needs and to encourage dialog among all, please
answer the following questions:
What is your name and company?
What is your role in your organization?
What is your level of Oracle expertise?
Why are you building a data warehouse or data mart?
What do you hope to get out of this class?
You will get a lot more out of this class if you are aware of the background of your
classmates and the issues that they face in the development of a data warehouse. Each
student has a unique perspective and an experience and knowledge set from which we
can learn. Because this class is expected to appeal to a broad audience, the
introduction will give the instructor an idea of the composition of the class in terms of
data warehouse knowledge, Oracle knowledge, and the specific role that each student
plays with regard to data warehousing.
Lesson 2: Meeting a Business Need
[Slide: Overview. Course road map with boxes for Meeting a Business Need; Defining DW
Concepts & Terminology; Planning for a Successful Warehouse; Analyzing User Query
Needs; Modeling the Data Warehouse; Choosing a Computing Architecture; Planning
Warehouse Storage; ETT (Building the Warehouse); Supporting End User Access;
Managing the Data Warehouse; and Project Management (Methodology, Maintaining
Metadata).]
Overview
The Overview slide is a road map representing the flow of the course. The vertical box
entitled "Meeting a Business Need" emphasizes that the warehouse is business driven.
The warehouse architecture, data model, and user query needs all stem from business
requirements. The horizontal box running across the bottom represents the ongoing
project management throughout the warehouse lifecycle.
This lesson examines how data warehousing has evolved from early management
information systems to today's decision support systems. The primary motivating
factors for data warehouse creation are explored. The types of industries employing
data warehouses are considered.
Objectives
After completing this lesson, you should be able to do the following:
Describe why an online transaction processing (OLTP) system is not suitable for
complex analysis
Describe how extract processing for decision support querying led to data
warehouse solutions employed today
Explain why businesses are driven to employ data warehouse technology
Identify some of the industries that employ data warehouses
Characteristics of OLTP Systems
Characteristic                      OLTP
Typical operation                   Update
Level of analytical requirements    Low
Screens                             Unchanging
Amount of data per transaction      Small
Data level                          Detailed
Age of data                         Current
Orientation                         Records
Copyright Oracle Corporation, 1999. All rights reserved.

Why OLTP Is Not Suitable for Complex Analysis

OLTP                                        Complex Analysis
Information to support day-to-day service   Historical information to analyze
Data stored at transaction level            Data needs to be integrated
Database design: Normalized                 Database design: Denormalized, star schema
Unsuitability of OLTP Systems for Complex Analysis
Operational systems largely exist to support transactions, for example, the booking of
an airline ticket.
Decision support, which is a type of complex analysis, is very different from OLTP.
Most OLTP transactions locate and update a single record in a database or add
one or more new records. Even a simple decision support query, such as "How many
luxury cars did we sell in Boston in January 1999?", requires very different
operations at the database level from those of an OLTP transaction: a potentially
large number of records must be located, and there are no update operations at all.
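The contrast between the two workloads can be sketched in a few lines. The example below is only an illustration: the sales table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real operational system.

```python
import sqlite3

# Hypothetical single-table schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, model_class TEXT, "
    "city TEXT, sale_date TEXT, status TEXT)"
)
rows = [
    (1, "luxury", "Boston", "1999-01-05", "open"),
    (2, "luxury", "Boston", "1999-01-19", "open"),
    (3, "compact", "Boston", "1999-01-20", "open"),
    (4, "luxury", "Chicago", "1999-01-22", "open"),
    (5, "luxury", "Boston", "1999-02-02", "open"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# OLTP-style operation: locate one record by its key and update it.
cursor = conn.execute(
    "UPDATE sales SET status = 'delivered' WHERE id = ?", (3,)
)
rows_updated = cursor.rowcount

# Decision-support-style query: read many records, aggregate them,
# and update nothing.
luxury_boston_jan = conn.execute(
    "SELECT COUNT(*) FROM sales "
    "WHERE model_class = 'luxury' AND city = 'Boston' "
    "AND sale_date BETWEEN '1999-01-01' AND '1999-01-31'"
).fetchone()[0]

print(rows_updated, luxury_boston_jan)
```

The update touches exactly one row through its primary key, while the analytical query must examine every row that might qualify. On a production-sized table, that difference is what makes the two workloads compete for resources.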
Characteristics of OLTP Systems
The characteristics of OLTP systems are described below.
Characteristic                     OLTP
Typical operation                  Update
Level of analytical requirements   Low
Screens                            Unchanging
Amount of data per transaction     Small
Data level                         Detailed
Age of data                        Current
Orientation                        Records
Why OLTP Is Not Suitable for Complex Analysis
OLTP databases are fully normalized and are designed to consistently store
operational data, one transaction at a time. Complex analysis, on the other hand,
requires a database design that even business users find directly usable. To achieve
this, different database design techniques are required, for example, the use of
dimensional modeling and star schemas with highly denormalized dimension tables.
OLTP focuses on recording and completing different types of business transactions but
is unable to provide decision makers with the information they need. The data needed
for such complex analysis is scattered throughout different OLTP systems and must
first be carefully integrated before the information needed can be obtained. Extracting
the data from these OLTP systems demands so much of the system resources that the
IT professional must wait until nonoperational hours before running the queries
required to produce the report. Thus OLTP systems are not suitable for complex
analysis because the database design is not optimized to run such queries.
Additionally, OLTP systems do not have an integrated pool of data from all the
operational systems within the enterprise from which business users can derive complex
analysis. Also, OLTP systems do not store the historical data that is needed for complex
analysis.
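To make the star schema design mentioned above more concrete, here is a minimal sketch under stated assumptions. The table names (fact_sales, dim_product, dim_store), the columns, and the data are all hypothetical, and SQLite is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized dimension tables: descriptive attributes are kept
    -- together in one table per dimension (hypothetical names).
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, name TEXT, category TEXT, brand TEXT);
    CREATE TABLE dim_store (
        store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Central fact table: one row per product, store, and month.
    CREATE TABLE fact_sales (
        product_key INTEGER, store_key INTEGER, sale_month TEXT, units INTEGER);

    INSERT INTO dim_product VALUES (1, 'Sedan LX', 'luxury', 'Acme');
    INSERT INTO dim_product VALUES (2, 'Hatch S', 'compact', 'Acme');
    INSERT INTO dim_store VALUES (1, 'Boston', 'Northeast');
    INSERT INTO dim_store VALUES (2, 'Chicago', 'Midwest');
    INSERT INTO fact_sales VALUES (1, 1, '1999-01', 3);
    INSERT INTO fact_sales VALUES (1, 2, '1999-01', 2);
    INSERT INTO fact_sales VALUES (2, 1, '1999-01', 5);
    INSERT INTO fact_sales VALUES (1, 1, '1999-02', 4);
    """
)

# A typical analytical question: units sold by category and city for a month.
units_by_category_city = conn.execute(
    """
    SELECT p.category, s.city, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store s ON s.store_key = f.store_key
    WHERE f.sale_month = '1999-01'
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
    """
).fetchall()

print(units_by_category_city)
```

Each analytical question becomes a join from the fact table to a small number of dimension tables, which is the property that makes this layout easier for business users to query than a fully normalized design.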
Management Information Systems and Decision Support
• MIS systems provided business data
• Reports were developed on request
• Reports provided little analysis capability
• Decision support tools gave personal ad hoc access to data
[Diagram labels: Production platforms, Operational reports, Decision makers, Ad hoc access]
Analyzing Data from Operational Systems
• Data structures are complex
• Systems are designed for high performance and throughput
• Data is not meaningfully represented
• Data is dispersed
• OLTP systems may be unsuitable for intensive queries
[Diagram labels: Production platforms, Operational reports]
Management Information Systems and Decision Support
Early Management Information Systems
Early Management Information Systems (MIS) provided management with reports to
assess the performance of the business. Report requirements were submitted as a
request to the MIS development team, who developed the report and made it available
to the user some time afterward: days, weeks, or even months later. The data in the
reports was presented in a way that was difficult to use for analysis and
forecasting.
Personal Computing
With the advent of personal computing and 4GL programming techniques, MIS
became known as decision support, or decision support systems (DSS). DSS was
judged to support business users better by giving them direct access to the operational
data for additional ad hoc querying, which provided more flexible reporting as the
information was needed.
Analyzing Data from Operational Systems
Although decision support tools are friendly, intuitive, and easy to use, the
structure of data in online transaction processing systems often does not support the
user's real analytical requirements:
• The structure of the operational data is often complex and too highly normalized
  (3NF) for analysis.
• The system was designed for high-performance, high-throughput online
  transaction processing rather than CPU-intensive analysis of information.
• The data is not always meaningfully presented to the end user query tool.
• The same data elements may be defined differently in each operational system.
  For example, a customer record may hold the customer telephone number. In one
  system this number is stored as a 15-digit number, and in another as a
  20-character alphanumeric value.
• Data is dispersed across multiple and diverse systems, leading to data redundancy and
  the inability to coordinate data between systems to provide a global picture of the
  business.
• Running online transaction processing and decision support concurrently on one
  machine degrades performance of the operational system, response time to users,
  and performance of networks. The overall impact on the operational system may
  be too great.
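The telephone number example above can be illustrated with a short sketch. The two record layouts and the normalize_phone helper are hypothetical. The point is only that the same data element must be brought to a common representation before values from different systems can be compared.

```python
import re


def normalize_phone(raw):
    """Reduce a phone value from any source system to its digits only."""
    return re.sub(r"\D", "", str(raw))


# Hypothetical records for the same customer in two operational systems.
billing_record = {"customer": "A. Smith", "phone": 16175550123}          # numeric field
support_record = {"customer": "A. Smith", "phone": "+1 (617) 555-0123"}  # text field

# A direct comparison fails because the representations differ.
raw_values_match = billing_record["phone"] == support_record["phone"]

# After normalization, the two values can be recognized as the same number.
normalized_values_match = (
    normalize_phone(billing_record["phone"])
    == normalize_phone(support_record["phone"])
)

print(raw_values_match, normalized_values_match)
```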
Data Extract Processing
• End user computing offloaded from the operational environment
• User's own data
[Diagram labels: Operational systems, Extracts, Decision makers]
Management Issues
• Extract explosion
[Diagram labels: Operational systems, Extracts, Decision makers]
Data Extract Processing
DSS and Degradation
The problem of performance degradation was partially solved by using extract
processing techniques, which select data from one environment and transport it to
another environment for user access (a data extract).
Data Extract Program
The data extract program searches through files and databases, gathering data
according to specific criteria. The data is then placed into a separate set of files, which
may reside in another environment, for use by analysts for decision support activities.
Extract processing was a logical progression from decision support systems. It was
seen as a way to move the data from the high-performance, high-throughput online
transaction processing systems onto client machines dedicated to analysis. Extract
processing also gave the user ownership of the data.
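A data extract program of the kind described above might look like the following sketch. Everything here is an assumption made for illustration: the orders table, the region criterion, the run_extract function, and the CSV output file are not taken from any particular product.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def run_extract(conn, region, out_path):
    """Select rows matching a criterion and write them to a separate file."""
    rows = conn.execute(
        "SELECT order_id, region, amount FROM orders "
        "WHERE region = ? ORDER BY order_id",
        (region,),
    ).fetchall()
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["order_id", "region", "amount"])
        writer.writerows(rows)
    return len(rows)


# Hypothetical operational system with a handful of orders.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 45.5)],
)

# The extract lands in a file outside the operational database,
# where an analyst can work on it without loading the source system.
extract_file = Path(tempfile.mkdtemp()) / "east_orders.csv"
extracted_count = run_extract(operational, "East", extract_file)

print(extracted_count, extract_file)
```

Note that nothing in this sketch records when the extract was taken or which rules were applied, which is exactly the gap that leads to the management and quality issues discussed in this lesson.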
Management Issues with Data Extract Programs
Although the principle of extracts appears logical, and to some degree represents a
model similar to the way a data warehouse works, there are problems with processing
extracts. Extract programs may become the source for other extracts, and extract
management can become a full-time task for information systems departments. In
some companies, hundreds of extract programs may be running at any given time.
Productivity Issues
Duplicated effort
Multiple technologies
Obsolete reports
No metadata
Data Quality Issues


No common time basis
Different calculation algorithms
Different levels of extraction
Different levels of granularity
Different data field names
Different data field meanings
Missing information
No data correction rules
No drill-down capability
Productivity Issues with Extract Processing
The productivity issues in an extract processing environment are listed below:
• Extract effort is duplicated, because multiple extracts access the same data and use
  mainframe resources unnecessarily.
• The program designed to access the extracted data must encompass all
  technologies employed by the source data.
• A report cannot always be reused, because business structures change.
• There is no common metadata providing a standard way of extracting, integrating,
  and using the data.
Data Quality Issues with Extract Processing
The data quality issues in an extract processing environment are listed below:
• The data has no common time basis, so users cannot compare query results with
  confidence. The data extracts may have been taken at different points in time.
• Each data extract may use a different algorithm for calculating derived and
  computed values. This makes the data difficult for managers to evaluate, compare,
  and communicate, because they may not know the methods or algorithms used to
  create the data extract or reports.
• Data extract programs may use different levels of extraction.
• Access to external data may not be consistent, and the granularity of the external
  data may not be well defined.
• Data sources may be difficult to identify, and data elements may be repeated on
  many extracts.
• The data field names and values may have different meanings in the various
  systems in the enterprise (lack of semantic integrity).
• There are no data correction rules to ensure that the extracted data is correct and
  clean.
• The reports provide data rather than information, and they offer no drill-down
  capability.
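The problem of different calculation algorithms can be shown with a small, hypothetical example. Both functions below claim to report "revenue" from the same orders, but one counts gross sales and the other excludes returned orders. The data and the function names are invented for illustration.

```python
# Hypothetical order data shared by two independent extract programs.
orders = [
    {"amount": 100.0, "returned": False},
    {"amount": 250.0, "returned": True},
    {"amount": 50.0, "returned": False},
]


def revenue_extract_a(rows):
    """Extract A: revenue means gross sales, returns included."""
    return sum(row["amount"] for row in rows)


def revenue_extract_b(rows):
    """Extract B: revenue means sales net of returned orders."""
    return sum(row["amount"] for row in rows if not row["returned"])


revenue_a = revenue_extract_a(orders)
revenue_b = revenue_extract_b(orders)

# Two reports with the same label now disagree, and a manager who does not
# know which rule each extract applied cannot reconcile them.
print(revenue_a, revenue_b)
```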
From Extract to Warehouse DSS
• Controlled
• Reliable
• Quality information
• Single source of data
[Diagram labels: Internal and external systems, Data warehouse, Decision makers]
Advantages of Warehouse Processing Environment
No duplication of effort
No need for tools to support many technologies
No disparity in data, meaning, or representation
No time period conflict
No algorithm confusion
No drill-down restrictions
Transitioning from Extract Processing Environment to Warehouse
Processing Environment
Decision support made a transition from using data extracts to using the data
warehouse. The data warehouse is a complete environment that requires
skill, knowledge, and commitment to put together, particularly for a very large scale
enterprise implementation.
The data warehouse environment is more controlled and therefore more reliable for
decision support than an extract environment. The data warehouse environment
supports all of your decision support requirements by providing high-quality
information, made available by accurate and effective cleansing routines, consistent
and valid data transformation rules, and documented presummarization of
data values. It contains a single source of accurate, reliable information that can be
used for analysis.
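The three activities named above (cleansing, transformation, and presummarization) can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the input rows, the rule that rejects non-numeric amounts, and the rule that standardizes region names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows arriving from different source systems.
raw_rows = [
    {"region": " east ", "month": "1999-01", "amount": "120.50"},
    {"region": "EAST", "month": "1999-01", "amount": "79.50"},
    {"region": "West", "month": "1999-01", "amount": "n/a"},  # fails cleansing
    {"region": "west", "month": "1999-01", "amount": "40"},
]


def cleanse(row):
    """Cleansing rule: reject rows whose amount is not a number."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {**row, "amount": amount}


def transform(row):
    """Transformation rule: one consistent spelling for each region."""
    return {**row, "region": row["region"].strip().title()}


def presummarize(rows):
    """Presummarization: total amount per region and month."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)


clean_rows = [row for row in map(cleanse, raw_rows) if row is not None]
summary = presummarize(transform(row) for row in clean_rows)

print(summary)
```

Because every row passes through the same documented rules, two users who query the summary get the same answer, which is the main difference from an uncontrolled extract environment.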
Advantages of the Warehouse Processing Environment over the Extract
Processing Environment
The advantages of the warehouse processing environment are listed below:
• No duplication of effort
• No need to consider using a query and reporting tool that supports more than one
  technology
• No disparity with the data and its meaning
• No disparity with the way data is represented
• No conflict over the time periods employed
• No contention over the algorithms that have been used
• No restriction on drill-down capabilities
Business Motivators
Know the business
Reinvent to face new challenges
Invest in products
Invest in customers
Retain customers
Invest in technology
Improve access to business information
Be profitable
Provide superior services and products
Business Motivators
Provide supporting information systems
Get quality information
Reduce costs
Streamline the business
Improve margins
Business Drivers for Data Warehouses
Businesses in the nineties face challenges such as regulatory control, competition,
market maturity, product differentiation, customer behavior, and accelerated product
life cycles, all of which require businesses to develop market awareness,
responsiveness, adaptability, innovation, efficiency, and quality.
Critical Success Factors for a Dynamic Business Environment
In order to succeed in an ever-changing business environment, a company must:
• Know both its market and its business (internally and externally).
• Reinvent itself to face new challenges. This may involve changing product
  requirements, diverse and effective services, or even changes in internal
  organizational structures.
• Invest in research and development of new product channels.
• Invest in high-value customers who contribute greater returns to the business.
• Retain existing customers and attract new customers.
• Invest in new technology to support business needs.
• Improve access to information so that it can make rapid decisions, based on an
  accurate picture of the business.
• Be profitable. At the same time, it must be able to invest in resources for the
  future, such as technology and people.
• Provide superior services and products to keep market share and maintain income.
Information Needed to Ensure Success
To support these strategies, a business needs to have:
• Access to consistent and high-quality information on the behaviors of the business
  and the external markets, so that it can constantly monitor the state of the
  business.
• Information that can help to reduce costs, streamline the business, and improve
  margins.
Technological Advances
• Parallelism: hardware, operating system, database, query, index, applications
• Large databases
• 64-bit architectures
• Indexing techniques
• Affordable, cost-effective open systems
• Robust warehouse tools
• Sophisticated end user tools
Technology Needed to Support the Business Needs
Today's information technology climate provides you with cost-effective computing
resources in the hardware and software arena, Internet and intranet solutions, and
databases that can hold very large volumes of data for analysis, using a multitude of
data access technologies.
Technological Advances Enabling Data Warehousing
Technology (specifically open systems technology) is making it affordable to analyze
vast amounts of data, and hardware solutions are now more cost-effective.
Parallelism
Recent advances in parallelism have benefited all aspects of computing:
• Hardware environment
• Operating system environment
• Database management systems and all associated database operations
• Query techniques
• Indexing strategies
• Applications
Other Factors
• Very large volumes of data can be managed, for warehouses greater than one
  terabyte in size.
• Recently introduced 64-bit architectures are increasing server capacity and speed.
• Improved indexing techniques (bitmap index, hash index, star join) provide rapid
  access to data.
• Warehouse tools are becoming more robust and less expensive.
• Licensing strategies are more effective and affordable.
• Open systems are available.
• Sophisticated, user-friendly, and intuitive tools are available to the user community
  for all types of data warehouse access.
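The idea behind the bitmap index mentioned in the list above can be sketched in plain Python. This is a conceptual illustration only, not how any database product implements it: each distinct column value gets one bitmap (here a Python integer used as a bit set), with bit n set when row n holds that value. The column names and data are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two low-cardinality columns of a hypothetical six-row table.
region = ["East", "West", "East", "North", "West", "East"]
category = ["luxury", "luxury", "compact", "luxury", "compact", "luxury"]

region_index = build_bitmap_index(region)
category_index = build_bitmap_index(category)

# A query with two conditions becomes a single bitwise AND of two bitmaps.
matching_rows = rows_from_bitmap(region_index["East"] & category_index["luxury"])

print(matching_rows)
```

Combining conditions with bitwise operations, rather than scanning rows, is why this technique suits the low-cardinality columns that are common in warehouse dimension data.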
Current Situation and Growth
[Chart: Projected Growth, revenue in 1996 and 2001 (axis 0 to 25)]
[Chart: Current Revenue by region: USA, Europe, APAC, Other (axis 0 to 60)]
Growth Motivators and Inhibitors
Motivators:
• Successful implementations
• Decreased risk
• Robust extraction software
• Improving price to performance ratios
• Improved staff training
Inhibitors:
• Year 2000 compliance
• Skills shortage
• Lack of integrated metadata
• Data cleaning cost
Current Situation and Growth of Data Warehousing
Data warehouses are becoming increasingly popular. The statistics for the estimated
growth of data warehousing are compelling. These figures are not specific to Oracle
but are industry wide.
Revenues
A recent report has shown that in 1996 data warehouse revenues (which include
hardware, software, and people-provided services) totaled $8 billion (U.S.). It is forecast
that in 2001 this figure will rise to $23 billion (U.S.), assuming a compound annual
growth rate of around 20% per year.
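As a quick check on the arithmetic behind these figures, the compound annual growth rate implied by the two revenue numbers can be computed directly. The sketch below uses only the values quoted above. It assumes the standard compound growth formula and a five-year span from 1996 to 2001.

```python
start_revenue = 8.0   # billions of U.S. dollars, 1996 (from the text)
end_revenue = 23.0    # billions of U.S. dollars, 2001 forecast (from the text)
years = 2001 - 1996

# Growth rate that turns the 1996 figure into the 2001 figure.
implied_cagr = (end_revenue / start_revenue) ** (1 / years) - 1

# Revenue that exactly 20% per year would produce over the same period.
projected_at_20_percent = start_revenue * 1.20 ** years

print(f"Implied growth rate: {implied_cagr:.1%}")
print(f"Revenue at exactly 20% per year: ${projected_at_20_percent:.1f} billion")
```

The implied rate comes out somewhat above 20 percent, and exactly 20 percent per year would give a little under $20 billion, so "around 20%" should be read as a rounded figure rather than a precise one.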
Geography
Most data warehouse implementations exist in the U.S., with Europe following close
behind, and then Asia Pacific.
Growth Motivators
These include:
• Increased successful implementations
• Decreased risk with vendors supplying a total solution
• More robust and functional extraction software
• Improved (and improving) price-to-performance equipment ratios
• Improved training for IT staff
Growth Inhibitors
These may include:
• Year 2000 compliance
• Shortage of skills in specific areas of data warehousing
• The lack of integrated metadata components
• The labor-intensive commitment to the data cleaning function and its
  corresponding dollar and time cost
Enterprisewide Implementations and Data Marts
Enterprise data warehouses are in a position to dominate the business, compared with
the smaller data mart implementations that are specific to departments or specific
functional requirements.
Typical Uses of a Data Warehouse


Airline
Banking
Health care
Investment
Insurance
Retail
Telecommunications
Manufacturing
Credit card suppliers
Clothing distributors
[Chart: Percentage Market Coverage (axis 0 to 40) for Financial, Retail, Telecom, Manufacturing, Others]
Typical Uses of a Data Warehouse
The requirements of a business can be met by employing a data warehouse solution,
which collects data from internal business operations and external data from outside
organizations to provide a single source of reliable data for analysis.
Typical Users of a Data Warehouse
There are many industries that employ data warehouses:
• Airlines for aircraft deployment, analysis of route profitability, frequent flyer
  promotions, and maintenance
• Banking for trend analysis, promotion of products and services, and customer
  service
• Health care for analysis and cost reduction
• Investment and insurance companies for planning, customer analysis, risk
  assessment, and portfolio management
• Retail stores for trend analysis, buying pattern analysis, promotions, customer
  profiling, and pricing
• Telecommunications for analysis and for product and service promotions
Other industries that currently use data warehouse solutions are manufacturers, credit
card issuers, and clothing distributors.
Figures show that the highest proportion of revenues in data warehousing is spent by
the financial services, retail, telecommunications, and manufacturing industries.
Summary
This lesson covered the following topics:
• Describing why an online transaction processing (OLTP) system is not suitable for
  complex analysis
• Describing how extract processing for decision support querying led to the data
  warehouse solutions employed today
• Explaining why businesses are driven to employ data warehouse technology
• Identifying some of the industries that employ data warehouses
Practice 2-1 Overview


The practice covers answering questions and
discussing how data warehousing meets business
needs
Practice 2-1
1 OLTP databases hold up-to-the-minute information and are most commonly
designed as read-only databases.
True
False
2 In the scenario below, state whether it refers to an operational system or an
analytical processing system.
"Show me how a specific brand of printer is selling throughout different parts of
the United States and how this specific brand of printer has been selling since it was
first introduced into my stores."
This scenario refers to:
a An operational system
b An analytical processing system
3 Who is the target audience for the data warehouse?
a The business community in the organization
b IT professionals
c Data-entry clerks
d None of the above
e All of the above
4 Are the following statements true or false?
a Operational systems display the following qualities:
Good performance _____
Static data contents _____
High availability _____
Unpredictable CPU use _____
b Identify the reasons why business analysis is not easy with operational
systems.
Data is not structured for drill-down capability. _____
The system is not designed for querying. _____
Data analysis can be CPU-intensive. _____
Data is not integrated between systems. _____
5 In groups of three or four, discuss the questions below and present your points to
the class at the end of the discussion.
a List some of the reasons that your company is considering implementing a data
warehouse or data mart.
b What are some of the business problems that your company is trying to
answer?
c Why is the business community in your organization unable to find the
answers to their business questions based on the existing information systems?
3
Defining Data Warehouse Concepts and Terminology
Lesson 3: Defining Data Warehouse Concepts and Terminology
Overview
[Course road map: Meeting a Business Need; Defining DW Concepts & Terminology (highlighted); Planning for a Successful Warehouse; Analyzing User Query Needs; Choosing a Computing Architecture; Modeling the Data Warehouse; Planning Warehouse Storage; ETT (Building the Warehouse); Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]
The previous lesson covered how data warehousing has evolved from early
management information systems to today's decision support systems that meet a
business need. This lesson defines data warehouse concepts and terminology. Note
that the "Defining Data Warehouse Concepts and Terminology" block is highlighted in
the course road map on the facing page.
Specifically, this lesson introduces the Oracle definition of a data warehouse, offers
a general description of the properties of a data warehouse, and identifies the standard
components and tools required to build, operate, and use a data warehouse.
Objectives
After completing this lesson, you should be able to do the following:
• Identify a common, broadly accepted definition of a data warehouse
• Recognize some of the operational properties of a data warehouse
• Recognize common data warehousing terminology
• Identify the functionality associated with each component required for a successful
  data warehouse implementation
• Identify and position the Oracle Warehouse vision, products, and services
Lesson 3: Defining Data Warehouse Concepts and Terminology

Definition of a Data Warehouse


An enterprise structured repository of subject-oriented, time-variant, historical
data used for information retrieval and decision support. The data warehouse
stores atomic and summary data.
Source: Oracle Data Warehouse Method
Data Warehouse Definition
This definition of a data warehouse from the Oracle Data Warehouse Method
describes many of the most significant characteristics of a data warehouse. The Oracle
Data Warehouse Method was developed using experiences gained from successful
data warehouse projects carried out by Oracle Consulting Services. This method is
discussed in Lesson 4.
Subject-Oriented
While the data in an OLTP system is stored to support a specific business process (for
example, order entry or campaign management) as efficiently as possible,
data in a data warehouse is stored by common subject area (for example,
customer or product) for ease of access. That is because the complete set of
questions to be posed to a data warehouse is never known in advance. Every question the
data warehouse answers spawns new questions. Thus, the design of a data
warehouse focuses on giving users easy access to the data so that current and future
questions can be answered.
Time-Variant
The data warehouse contains slices of data across different periods of time. With these
data slices, the user can view reports for both current and past periods.
Historical
A data warehouse typically contains several years' worth of data. This is necessary to
support trending, forecasting, and time-based performance reporting (for example,
current year versus previous year).
Information Retrieval and Decision Support
A data warehouse is a facility for getting at information to answer questions. It is not
meant for direct data entry; batch updates are the norm for refreshing data warehouses.
Atomic and Summary Data
Depending on the purpose of the data warehouse, it may contain atomic data,
summary data, or both.
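The relationship between atomic and summary data can be sketched in a few lines of Python. This is only an illustration: the sample rows, the tuple layout, and the function name summarize_by_month are assumptions made for the example and are not part of the Oracle Data Warehouse Method.

```python
from collections import defaultdict

# Hypothetical atomic data: one row per transaction (date, product, amount).
atomic_sales = [
    ("1997-01-10", "savings", 100.0),
    ("1997-01-25", "savings", 50.0),
    ("1997-02-03", "savings", 75.0),
    ("1997-02-14", "loans", 250.0),
]


def summarize_by_month(rows):
    """Roll atomic rows up to one total per (month, product)."""
    totals = defaultdict(float)
    for sale_date, product, amount in rows:
        month = sale_date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(month, product)] += amount
    return dict(totals)


# Summary data derived from the atomic rows above.
summary_sales = summarize_by_month(atomic_sales)
```

A warehouse that keeps only summary_sales answers monthly questions quickly but can no longer answer questions about individual transactions, which is one reason some warehouses store both levels.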

Data Warehouse Properties


[Figure: the four properties of a data warehouse: subject-oriented, integrated, time-variant, and nonvolatile]

Subject-Oriented
Data is categorized and stored by business subject rather than by application.
[Figure: the OLTP applications Equity Plans, Shares, Insurance, Loans, and Savings map to the data warehouse subject "Customer financial information"]
Data Warehouse Properties
Bill Inmon defines a data warehouse as follows:
A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management's decision-making process.
Subject-Oriented
Subject-oriented data is organized around major subject areas of an enterprise, and is
useful for an enterprise-wide understanding of those subjects. For example, a banking
operational system keeps independent records of customer savings, loans, and other
transactions. A warehouse pulls this independent data together to provide financial
information. You can access subject-oriented data related to any major subject area of
an enterprise:
• Customer financial information
• Toll calls made in the telecommunications industry
• Airline passenger booking information
• Insurance claim data
The data is transformed so that it is consistent and meaningful for the warehouse.
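The banking example above can be sketched as follows. The customer IDs, balances, and the function name build_customer_subject are hypothetical; the point is only that records kept separately by each application are regrouped around the customer subject.

```python
from collections import defaultdict

# Hypothetical extracts from independent OLTP applications,
# each a list of (customer ID, balance) pairs.
savings = [("C001", 1200.0), ("C002", 300.0)]
loans = [("C001", 5000.0)]
insurance = [("C002", 45.0)]


def build_customer_subject(**applications):
    """Group application records by customer rather than by application."""
    subject = defaultdict(dict)
    for app_name, records in applications.items():
        for customer_id, value in records:
            subject[customer_id][app_name] = value
    return dict(subject)


customer_financials = build_customer_subject(
    savings=savings, loans=loans, insurance=insurance
)
```

After the call, customer_financials holds one entry per customer, so a question such as "what does customer C001 hold with us?" is answered from a single place.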

Integrated
Data on a given subject is defined and stored once.
[Figure: the OLTP applications Savings, Current accounts, and Loans are integrated into the single data warehouse subject Customer]

Time-Variant
Data is stored as a series of snapshots, each representing a period of time.
[Figure: the data warehouse pairs time values with data snapshots for 1997: 01/97 (January), 02/97 (February), 03/97 (March)]
Integrated
In many organizations, data resides in diverse independent systems, making it difficult
to integrate into one set of meaningful information for analysis. A key characteristic of
a warehouse is that data is completely integrated. Data is stored in a globally
acceptable manner, even when the underlying source data is stored differently. The
transformation and integration process can be time-consuming and costly. It requires
commitment from every part of the organization, particularly top-level managers who
make the decisions and allocate resources and funds.
Data Consistency: You must deal with data inconsistencies and anomalies before the
data is loaded into the warehouse. Consistency is applied to naming conventions,
measurements, encoding structures, and physical attributes of the data.
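As a minimal sketch of applying consistency to encoding structures and measurements, the following Python fragment maps differing source encodings onto one warehouse convention. The code tables, field names, and the choice of centimetres as the standard unit are all assumptions made for illustration.

```python
# Hypothetical encodings used by different source systems for one attribute.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}

# Hypothetical conversion factors to the warehouse standard unit (centimetres).
TO_CENTIMETRES = {"cm": 1.0, "in": 2.54, "m": 100.0}


def standardize(record):
    """Return a new record that follows the warehouse conventions."""
    gender = GENDER_CODES[str(record["gender"]).strip().lower()]
    length_cm = record["length"] * TO_CENTIMETRES[record["unit"]]
    return {"gender": gender, "length_cm": round(length_cm, 2)}


from_system_a = standardize({"gender": "Male", "length": 10, "unit": "in"})
from_system_b = standardize({"gender": 0, "length": 1.5, "unit": "m"})
```

An unknown code or unit raises KeyError in this sketch, which matches the idea that inconsistencies are dealt with before the data reaches the warehouse.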
Data Redundancy: Data redundancy at the detail level is eliminated in the warehouse
environment; the warehouse contains only data that is physically selected
and moved into it. However, selective and deliberate redundancy in the form of
aggregates and summaries is required in the warehouse to improve the performance of
queries, especially drill-down analysis.
Time-Variant
Warehouse data is by nature historical; it does not usually contain the current
transactional data. Data is represented over a long time horizon, from two to ten years,
compared with one to three months of data for a typical operational system. The data
allows for analysis of past and present trends, and for forecasting using what-if
scenarios.
Time Element: The data warehouse always contains a key element of time, such as
quarter, month, week, or day, that determines when the data was loaded. The date may
be a single snapshot date, such as 10-JAN-97, or a range, such as 01-JAN-97 to
31-JAN-97.
Snapshots by Time Period: Warehouse data is essentially a series of snapshots by
time periods that do not change.
Special Dates: A time dimension usually contains all the dates required for analysis,
including special dates like holidays and events.
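A time dimension of this kind can be sketched as one row per calendar day. The column names, the SPECIAL_DATES table, and the function name build_time_dimension are hypothetical choices for the example; a real time dimension would carry whatever attributes the business needs.

```python
from datetime import date, timedelta

# Hypothetical special dates supplied by the business.
SPECIAL_DATES = {date(1997, 1, 1): "New Year's Day"}


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day,
            "month": day.strftime("%m/%y"),      # for example "01/97"
            "quarter": (day.month - 1) // 3 + 1,
            "iso_weekday": day.isoweekday(),     # 1 = Monday ... 7 = Sunday
            "special": SPECIAL_DATES.get(day),   # None for ordinary days
        })
        day += timedelta(days=1)
    return rows


january_1997 = build_time_dimension(date(1997, 1, 1), date(1997, 1, 31))
```

The row for 10-JAN-97 can then serve as the key for a single snapshot date, and the full list as the key range 01-JAN-97 to 31-JAN-97.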

Nonvolatile
Typically, data in the data warehouse is not updated or deleted.
[Figure: the operational database supports insert, read, update, and delete; the warehouse supports load and read]

Changing Data
[Figure: operational databases feed the warehouse database through a first-time load followed by repeated refreshes; older warehouse data is purged or archived]
Nonvolatile
Typically, data in the data warehouse is read-only. Data is loaded into the data
warehouse for the first-time load, and then refreshed regularly. Warehouse data is
accessed by the business users. Warehouse operations typically involve:
• Loading the initial set of warehouse data (often called the first-time load)
• Refreshing the data regularly (called the refresh cycle)
Accessing the Data: Once a snapshot of data is loaded into the warehouse, it rarely
changes. Therefore, data manipulation is not a consideration at the physical design
level. The physical warehouse is optimized for data retrieval and analysis.
Refresh Cycle: The data in the warehouse is refreshed; that is, snapshots are added.
The refresh cycle is determined by the business users. A refresh cycle need not be the
same as the grain (level at which the data is stored) of the data for that cycle. For
example, you may choose to refresh the warehouse weekly, but the grain of the data
may be daily.
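The difference between refresh cycle and grain can be illustrated with a small sketch: the refresh runs once a week, but each run appends rows stored at daily grain. The row layout and the function name weekly_refresh are assumptions for the example.

```python
from datetime import date, timedelta


def weekly_refresh(warehouse, source_rows, week_start):
    """Append one week of daily-grain rows; existing snapshots are untouched."""
    week_end = week_start + timedelta(days=7)
    new_rows = [
        row for row in source_rows
        if week_start <= row["day"] < week_end
    ]
    warehouse.extend(new_rows)  # insert only: no update, no delete
    return len(new_rows)


# Hypothetical source data: one row per day for two weeks.
source = [
    {"day": date(1997, 1, 6) + timedelta(days=offset), "sales": 10 * offset}
    for offset in range(14)
]

warehouse = []
first = weekly_refresh(warehouse, source, date(1997, 1, 6))
second = weekly_refresh(warehouse, source, date(1997, 1, 13))
```

Each call adds seven daily rows, so after two weekly cycles the warehouse holds fourteen rows at daily grain.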
Changing Warehouse Data
The following operations are typical of a data warehouse:
• The initial set of data is loaded into the warehouse, often called the first-time load.
  This is the data by which you will measure the business, and the data containing
  the criteria by which you will analyze the business.
• Frequent snapshots of core data warehouse data (more occurrences) are added
  according to the refresh cycle, using data from the multiple source systems.
Warehouse data may need to be changed in other ways:
• The data you are using to analyze the business may change; the data warehouse
  must be kept up-to-date to keep it accurate.
• The business determines how much historical data is needed for analysis, say five
  years' worth. Older data is either archived or purged.
• Inappropriate or inaccurate data values may be deleted from or migrated out of the
  data warehouse.
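The archive-or-purge decision can be sketched as a split on a retention cutoff. The five-year window comes from the example above; the field name snapshot_date and the function name split_by_retention are hypothetical.

```python
from datetime import date


def split_by_retention(rows, today, years_to_keep=5):
    """Separate rows inside the retention window from rows to archive or purge."""
    try:
        cutoff = today.replace(year=today.year - years_to_keep)
    except ValueError:
        # today is 29 February and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - years_to_keep, day=28)
    keep = [row for row in rows if row["snapshot_date"] >= cutoff]
    expired = [row for row in rows if row["snapshot_date"] < cutoff]
    return keep, expired


snapshots = [
    {"snapshot_date": date(1993, 12, 31)},
    {"snapshot_date": date(1994, 6, 1)},
    {"snapshot_date": date(1998, 1, 1)},
]
keep, expired = split_by_retention(snapshots, today=date(1999, 5, 1))
```

Whether the expired rows are archived or purged is a business decision; the sketch only identifies them.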

Data Warehouse Versus OLTP


Property            Data Warehouse                     Operational
Response time       Seconds to hours                   Subseconds to seconds
Operations          Primarily read-only                DML
Nature of data      Snapshots over time                30-60 days
Data organization   Subject, time                      Application
Size                Large to very large                Small to large
Data sources        Operational, internal, external    Operational, internal
Activities          Analysis                           Processes

Usage Curves
[Figure: the usage curve of the operational system is predictable; the usage curve of the data warehouse is variable and random]
Data Warehouse Versus Online Transaction Processing (OLTP)
Response Time and Data Operations: Data warehouses are constructed for very
different reasons than online transaction processing (OLTP) systems. OLTP systems
are optimized for getting data in, that is, for storing data as a transaction occurs. Data
warehouses are optimized for getting data out, that is, for providing quick response for
analysis purposes.
Because the OLTP environment tends to have a high volume of activity, rapid
response is critical. Data warehouse applications, by contrast, are analytical rather than
operational, so slower performance is acceptable.
Nature of Data: The data stored in each database varies in nature. The data
warehouse contains snapshots of data over time to support time-series analysis,
whereas the OLTP system stores very detailed data for a short time, such as 30 to 60
days.
Data Organization: The data warehouse is subject specific and supports analysis, so
data is arranged accordingly. For the OLTP system to support subsecond
response, the data must be arranged to optimize the application. For example, an order
entry system may have tables that hold each of the elements of the order, whereas a
data warehouse may hold the same data but arrange it by subject, such as customer or
product.
Data Sources: Because the data warehouse is created to support analytical activities,
data from a variety of sources can be integrated. The operational data store of the
OLTP system holds only internal data or data necessary to capture the operation or
transaction.
Usage Curves
Operational systems and data warehouses have different usage curves.
An operational system has a more predictable usage curve; the warehouse has a less
predictable, more varied, and random usage curve.
Access to the warehouse varies not just on a daily basis, but may even be affected by
forces such as seasonal variations. For this reason, you cannot expect the operational
system to handle heavy analytical queries (DSS) and continue to give good transaction
rates for the minute-by-minute processing required.

User Expectations
• Control expectations
• Set achievable targets for query response
• Set SLAs
• Educate
• Growth and use is exponential

Enterprisewide Warehouse
• Large-scale implementation
• Scopes the entire business
• Data from all subject areas
• Developed incrementally
• Single source of enterprisewide data
• Synchronized enterprisewide data
• Single distribution point to dependent data marts
User Expectations
The difference in response time may be significant between a data warehouse and a
client-server environment fronted by personal computers. You must control the users'
expectations regarding response. Set reasonable and achievable targets for query
response time, which can be assessed and proved in the first increment of
development. You can then define, specify, and agree on Service Level Agreements.
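One way to assess a proposed response-time target against measurements from the first increment is sketched below. The measurements, the 90 percent threshold, and the function name meets_target are illustrative assumptions; an actual Service Level Agreement would define its own measure.

```python
def meets_target(response_seconds, target_seconds, required_share=0.9):
    """Return True if enough of the measured queries met the target."""
    if not response_seconds:
        raise ValueError("no measurements supplied")
    within = sum(1 for seconds in response_seconds if seconds <= target_seconds)
    return within / len(response_seconds) >= required_share


# Hypothetical response times, in seconds, measured during the first increment.
measured = [4.0, 12.5, 30.0, 7.2, 95.0, 18.3, 22.1, 9.9, 41.0, 15.4]

achievable = meets_target(measured, target_seconds=60)  # 9 of 10 within target
too_strict = meets_target(measured, target_seconds=30)  # 8 of 10 within target
```

With these sample figures a 60-second target is met and a 30-second target is not, which is the kind of evidence that supports agreeing on a realistic target.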
If users are accustomed to fast PC-based systems, they may find the warehouse
excessively slow. However, it is up to those educating the users to ensure that they are
aware of just how big the warehouse is, how much data is there, and what benefit the
information brings to both the user and the business.
Exponential Growth and Use
Once implemented, data warehouses continue to grow in size. Each time the
warehouse is refreshed more data is added, deleted, or archived. The refresh happens
on a regular cycle. Successful data warehouses grow very quickly, perhaps to a
magnitude of gigabytes a month and terabytes over time.
Once the success of the warehouse is proven, the use increases dramatically. Users
who may have been skeptical want access. Use often grows faster than expected.
Enterprisewide Data Warehouse
To summarize, an enterprisewide warehouse stores data from all subject areas within
the business for analysis by end users. The scope of the warehouse is the entire
business and all operational aspects within the business.
An enterprisewide warehouse is normally (and should be) created through a series of
incrementally developed solutions. Never create an enterprisewide data warehouse
under one project umbrella; it will not work.
With an enterprisewide data warehouse, all users access the warehouse, which
provides:
• A single source of corporate enterprisewide data
• A single source of synchronized data in the enterprisewide warehouse for each
  subject area
• A single point for distribution of data to dependent data marts

Data Warehouses Versus Data Marts


Property              Data Warehouse      Data Mart
Scope                 Enterprise          Department
Subjects              Multiple            Single-subject, LOB
Data source           Many                Few
Size (typical)        100 GB to > 1 TB    < 100 GB
Implementation time   Months to years     Months
Data Warehouse Versus Data Mart
Definition: A data mart is a subset of data warehouse fact and summary data that
provides users with information specific to their requirements.
Scope: A data warehouse deals with multiple subject areas and is typically
implemented and controlled by a central organizational unit such as the corporate
information technology group. It is often called a central or enterprise data
warehouse.
Subjects: A data mart is a simpler form of a data warehouse designed for a single
line of business (LOB) or functional area such as sales, finance, or marketing.
Data Source: A data warehouse typically assembles data from multiple source
systems. A data mart typically assembles data from fewer sources.
Size: Data marts are differentiated from data warehouses not by size, but by
use and management.
Implementation Time: Data marts are typically smaller and less complex than data
warehouses and are therefore typically easier to build and maintain.
A data mart can be built as a proof-of-concept step toward the creation of an
enterprisewide warehouse.

Dependent Data Mart
[Figure: operational systems, flat files, and external data feed the data warehouse (Marketing, Sales, Finance, Human Resources), which in turn feeds the Marketing, Sales, and Finance data marts]

Independent Data Mart


[Figure: operational systems, flat files, and external data feed a Sales or Marketing data mart directly]
Dependent and Independent Data Marts
Data marts can be categorized into two types: dependent and independent. The
categorization is based primarily on the data source that feeds the data mart.
Dependent Data Mart: Dependent data marts have the following characteristics:
• The source is the warehouse. Dependent data marts rely on the data warehouse for
  content.
• The extraction, transformation, and transportation (ETT) process is easy.
  Dependent data marts draw data from a central data warehouse that has already
  been created. Thus, the main effort in building a mart, the data cleansing and
  extraction, has already been performed. The dependent data mart simply requires
  data to be moved from one database to another.
• The data mart is part of the enterprise plan. Dependent data marts are usually built
  to achieve improved performance and availability, better control, and lower
  telecommunication costs resulting from local access to data relevant to a specific
  department.
Independent Data Mart: Independent data marts are stand-alone systems built from
scratch that draw data directly from operational and/or external sources of data.
Independent data marts have the following characteristics:
• The sources are operational systems and external sources.
• The ETT process is difficult. Because independent data marts draw data from
  unclean or inconsistent data sources, efforts are directed toward error processing
  and integration of data.
• The data mart is built to satisfy analytical needs. The creation of independent data
  marts is often driven by the need for a quick solution to analysis demands.
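The difference in ETT effort between the two types can be sketched as follows. The row layout and both function names are hypothetical. The dependent mart simply copies rows that the warehouse has already cleansed, while the independent mart must clean its sources and set aside rows that need error processing.

```python
def load_dependent_mart(warehouse_rows, department):
    """Dependent mart: copy already-cleansed warehouse rows for one department."""
    return [row for row in warehouse_rows if row["department"] == department]


def load_independent_mart(source_rows, department):
    """Independent mart: cleanse raw source rows and collect the rejects."""
    loaded, rejected = [], []
    for row in source_rows:
        name = str(row.get("department", "")).strip().lower()
        if name != department:
            continue  # row belongs to another department
        amount = row.get("amount")
        if isinstance(amount, (int, float)):
            loaded.append({"department": name, "amount": float(amount)})
        else:
            rejected.append(row)  # needs error processing before it can load
    return loaded, rejected


# Hypothetical cleansed warehouse rows and raw operational rows.
warehouse_rows = [
    {"department": "sales", "amount": 10.0},
    {"department": "finance", "amount": 5.0},
]
raw_rows = [
    {"department": " Sales ", "amount": 10},
    {"department": "sales", "amount": "n/a"},
    {"department": "finance", "amount": 3},
]

dependent = load_dependent_mart(warehouse_rows, "sales")
independent, rejects = load_independent_mart(raw_rows, "sales")
```

Both marts end up with the same sales row, but only the independent load has to deal with inconsistent names and unusable values.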

Data Warehouse Terminology


• Operational data store (ODS): Stores tactical data from production systems that
  are subject-oriented and integrated to address operational needs
• Metadata
Data Warehouse Terminology
Operational Data Store
The operational data store (ODS) stores tactical data from production systems. The
data is subject-oriented and integrated to address operational needs. The detailed,
current information in the ODS is transactional in nature, is updated frequently (at
least daily), and is held only for a short period of time.
The objectives of the ODS are to:
• Integrate information from the production systems
• Relieve the production systems of reporting and analysis demands
• Provide access to current data
In addition, the ODS can be a data source for the data warehouse and may be accessed
with the same tools used to access the data warehouse and data marts. The goal is to
provide a tactically structured, efficient information processing environment that
satisfies the analysis and reporting needs of the day-to-day operations of the
business.
Metadata
Information about data, derived directly from the business owners and users, is
maintained to support operations and use of the data warehouse.

Data Warehouse Terminology


Architecture
[Figure: architecture showing source data, data integration, the enterprise data warehouse, and a business area warehouse]
Architecture
A set of rules or structures providing a framework for the overall design of a system or
product.
Technical Infrastructure
The technologies, platforms, databases, gateways, and other components necessary to
make the architecture functional within the corporation.
Data Access Environment
An environment that includes the front-end data-access tools and technologies,
training on how to use these tools and technologies, the implementation of metadata,
and the training to navigate through the metadata.

Methodology
• Ensures a successful data warehouse
• Encourages incremental development
• Provides a staged approach to an enterprisewide warehouse:
  • Safe
  • Manageable
  • Proven
  • Recommended

Modeling
• Warehouses differ from operational structures:
  • Analytical requirements
  • Subject orientation
• Data must map to subject-oriented information:
  • Identify business subjects
  • Define relationships between subjects
  • Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
Components of a Data Warehouse
Although every warehouse implementation varies, for every data warehouse there are:
• Implementation methodologies
• Design and modeling considerations
• Operational and management processes to be developed
• Data management considerations
• User access reporting requirements and tools to be chosen
These are components and requirements that remain constant within any warehouse
development and production environment.
Methodology
Employing a methodology is important for the development of any system, and even
more so in a warehouse environment. The warehouse is such a big investment, in
every kind of resource, that its success is essential.
To avoid failure of the warehouse implementation, you must employ a methodology
and keep to it. Failure generally has one of two causes: the warehouse is not
delivered on time, or the warehouse fails to
deliver what the business users need. A good method helps to manage expectations by
identifying clear deliverables.
Modeling
The warehouse may be modeled from scratch or using an existing operational model
that defines the operational systems. It is more common (and recommended) to model
from scratch, referencing the source systems available and identifying any gaps in data
needs.
The data warehouse is modeled in a different way from an operational system. First,
the structure needs to take into account the way data is analyzed, and the schema is
created accordingly. Second, the warehouse is based upon subjects (not functions), and
it is these subject areas that form the basis of the model.
Subject areas are modeled and implemented one at a time.
Modeling Tools: You can use specific modeling tools, such as Oracle Designer/2000,
to model the warehouse initially and facilitate iterative development.

Extraction, Transformation, and Transportation
• Purchase specialist tools, or develop programs
• Extraction: select data using different methods
• Transformation: validate, clean, integrate, and time stamp data
• Transportation: move data into the warehouse
[Figure: OLTP databases, staging file, warehouse database]

Data Management
• Efficient database server and management tools for all aspects of data management
• Imperatives:
  • Productive
  • Flexible
  • Robust
  • Scalable
  • Efficient
• Hardware, operating system, and network management
Extraction, Transformation, and Transportation (ETT)
These processes are fundamental to the creation of quality information in the data
warehouse. You take data from source systems; clean, verify, validate, and convert it
into a consistent state; then move it into the warehouse.
• Extraction: The process of selecting specific operational attributes from the
  various operational systems.
• Transformation: The process of integrating, verifying, validating, cleaning, and
  time stamping the selected data into a consistent and uniform format for the target
  databases. Rejected data is returned to the data owner for correction and
  reprocessing.
• Transportation: The process of moving data from an intermediate storage area into
  the target warehouse database.
ETT Tools: Specialized tools make these tasks easier to set up,
maintain, and manage than programs developed in-house. Specialized tools
are available from Oracle with the Data Mart Suite.
Specialized tools can be an expensive option, which motivates many warehouse teams to
employ customized ETT programs written in COBOL, C++, PL/SQL, or other
programming languages or application development tools.
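The three ETT steps can be sketched as a small custom program. This is a simplified illustration in Python, not a description of any Oracle tool: the function names, the row layout, and the validation rules are assumptions, and in-memory lists stand in for the staging file and the warehouse table.

```python
from datetime import datetime, timezone


def extract(source_rows, attributes):
    """Extraction: select only the required attributes from each source row."""
    return [{name: row.get(name) for name in attributes} for row in source_rows]


def transform(rows, load_time):
    """Transformation: validate, clean, and time stamp rows; collect rejects."""
    staged, rejected = [], []
    for row in rows:
        customer = str(row.get("customer") or "").strip().upper()
        amount = row.get("amount")
        if not customer or not isinstance(amount, (int, float)):
            rejected.append(row)  # returned to the data owner for correction
            continue
        staged.append(
            {"customer": customer, "amount": float(amount), "loaded_at": load_time}
        )
    return staged, rejected


def transport(staged_rows, warehouse):
    """Transportation: move staged rows into the target warehouse table."""
    warehouse.extend(staged_rows)
    return len(staged_rows)


# Hypothetical OLTP rows; the "clerk" attribute is not needed in the warehouse.
oltp_rows = [
    {"customer": " acme ", "amount": 120, "clerk": "jw"},
    {"customer": "", "amount": 80, "clerk": "jw"},
]
warehouse_table = []
load_time = datetime(1999, 5, 1, tzinfo=timezone.utc)

staged, rejected = transform(extract(oltp_rows, ["customer", "amount"]), load_time)
moved = transport(staged, warehouse_table)
```

Keeping the rejected rows separate, rather than silently dropping them, mirrors the requirement that rejected data go back to the data owner for correction and reprocessing.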
Data Management
The heart of the warehouse is the database management system (or Server, in the case
of Oracle), which must be:
Productive
Flexible
Robust
Scalable
Efficient
The server must possess many other properties (they are considered in a later lesson).
The warehouse environment must also be capable of managing the hardware,
operating system, and overall network infrastructure.
Warehousing environments normally employ a relational database management
system (RDBMS) or server.
Tools: Oracle provides tools (such as Oracle Enterprise Manager) that can be used to
manage and control access to the warehouse environment.
Slide: Data Access and Reporting
Tools that retrieve data for business analysis
Imperatives: Ease of use, Intuitive, Metadata, Training
More than one tool may be required
Diagram labels: Warehouse Database, Simple Queries, Forecasting, Drill-down
Data Access and Reporting
Every warehouse implementation requires tools for end user access. The tools chosen
depend on the users' requirements for information. They range from simple
reporting tools, through more complex OLAP tools, to highly advanced data mining tools.
Ultimately, they should be easy to use and provide flexibility. There are hundreds of
access and query tools available.
Tools: It is important that the tools are intuitive and easy to use. The warehouse data
must be presented to the user in a meaningful, business-specific manner that the user
can easily interpret. Metadata provides the user with these data descriptions and
navigation information.
Users have different query requirements, and one query tool may not fit all
requirements. Users may need to perform simple to complex business modeling; trend
analysis using data spanning time periods; complex drill-down; simple queries on
prepared summary information; what-if analysis; detailed trend analysis and
forecasting; and data mining.
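Three of the query styles named above (a simple query on summary information, a drill-down, and a trend across time periods) can be illustrated against a tiny in-memory table. The sketch below uses Python's standard sqlite3 module purely for convenience; the table name, columns, and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Widget", "Q1", 100.0),
        ("North", "Gadget", "Q1", 50.0),
        ("South", "Widget", "Q1", 80.0),
        ("North", "Widget", "Q2", 120.0),
    ],
)

# Simple query on summary information: total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Drill-down: break one region's total into its products.
north_detail = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "WHERE region = ? GROUP BY product ORDER BY product",
    ("North",),
).fetchall()

# Trend across time periods: the same region, quarter by quarter.
north_trend = conn.execute(
    "SELECT quarter, SUM(amount) FROM sales "
    "WHERE region = ? GROUP BY quarter ORDER BY quarter",
    ("North",),
).fetchall()

print(by_region)
print(north_detail)
print(north_trend)
```

Each query reads the same detail rows at a different level of summarization, which is one reason a single access tool may not suit every user.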
Note: Data warehouse implementors, or WTI partners, may need to provide extensive
and intensive training in the use and optimization of selected extraction and reporting
tools. If the tools are SQL-based, for example, the user needs to know how many
tables or indexes can be used before execution impedes system performance.
Slide: Oracle Warehouse Components
Diagram labels: Any Source, Any Data, Any Access; Operational data, External data;
Relational / Multidimensional, Text, image, Spatial, Audio, video, Web, Oracle Medi;
Relational tools, OLAP tools, Applications / Web
Oracle Warehouse Vision, Products, and Services
Oracle Warehouse Framework
Oracle Warehouse is a comprehensive program involving products, partners, and
services.
Loading Any Source: Oracle and a variety of third parties provide solutions to extract
and load data from multiple data sources into the warehouse. You can gather data from
multiple sites and multiple applications.
Managing Any Data: Oracle warehouses using Oracle7, Oracle8, and Oracle8i
relational database management systems can store any data, including atomic,
summary, and transient data. You can also store metadata definitions about the data.
Analyzing Data Using Any Access: Oracle Warehouse presents summarized
information using client-server and Web-based tools.
Relational analysis tools: Oracle provides tools for ad hoc query of relational data
as well as the development of custom data warehouse applications. Discoverer is
an ad hoc query tool that provides decision support and analysis capabilities
through a graphical front end.
Online Analytical Processing (OLAP) tools: The Oracle Warehouse supports
multidimensional data, which is a summarized cube of information that allows
sophisticated analysis across a variety of different dimensions, such as product,
time, and region. For OLAP analysis of multidimensional data, Oracle Express
Analyzer is an object-oriented ad hoc query tool. To build custom query and
reporting applications, Oracle provides Express Objects, an object-oriented OLAP
development environment.
Slide: Oracle Data Mart Suite
Diagram labels: Data Modeling (Oracle Data Mart Designer); Data Management (Oracle
Enterprise Manager); Data Extraction (Oracle Data Mart Builder); Data Access &
Analysis (Discoverer & Oracle Reports); OLTP Engines, OLTP Databases; Warehousing
Engines, Data Mart Database, Oracle8; SQL*PLUS
Slide: Data Mart Implementation with the Oracle Data Mart Suite
Oracle Enterprise Server, Oracle Enterprise Manager, Oracle Data Mart Designer,
Oracle Data Mart Builder, Oracle Discoverer, Oracle Web Application Server,
Oracle Reports
Oracle Warehouse Products
Oracle Data Mart Suite: This suite (ODMS) consists of seven products, all of which
are used in this course except Oracle Web Application Server and Oracle Reports. Each
of the products in the suite plays a role in the implementation or use of the
data mart. ODMS delivers an integrated package with the software and documentation
needed to implement a data mart quickly and easily. ODMS consists of these products:
Oracle Enterprise Server
Oracle Enterprise Manager
Oracle Data Mart Designer
Oracle Data Mart Builder
Oracle Discoverer
Oracle Web Application Server
Oracle Reports and Reports Server
Slide: Oracle Warehouse Builder Architecture
Diagram labels: Warehouse Builder (Code Generation, Metadata, Workflow); Metadata;
Sources; Target Tables; Oracle8i; PL/SQL, Java Transforms; Extraction Facilities
(Loader, Remote SQL); Gateways (OLE-DB/ODBC, Mainframe, Specialized); ERP Data (SAP,
PeopleSoft, Oracle); Filter Transform; External Functions; PL/SQL, Java Wrapper;
Transform Driver
Oracle Warehouse Builder
Oracle Warehouse Builder (OWB) is the new Oracle integrated product for the design,
building, and management of enterprise data warehouses.
Oracle Warehouse Builder rolls the functionality of multiple stand-alone data
warehousing tools into a common, fully integrated Java-based graphical user
environment. Visual modeling and design, data extraction, movement and loading,
aggregation, metadata management, metadata integration with analysis tools, and
warehouse administration (everything IT shops need to design, build, and
manage data warehouses) are available in this team- and project-oriented
visual tool. OWB consists of the following components:
OWB Repository
OWB User Interface
OWB Warehouse Administrator
OWB Software Development Kit
Oracle Integrator for SAP and for PeopleSoft
Slide: Oracle Business Intelligence Tools
Oracle Reports: IS develops, users view; current
Oracle Discoverer: business users; tactical
Oracle Express: analysts; strategic
Slide: The Tool for Each Task
Production reporting: Oracle Reports. What were sales by region last quarter?
Ad hoc query and analysis: Oracle Discoverer. What is driving the increase in North
American sales?
Advanced analysis: Oracle Express. Given the rapid increase in Web sales, what will
total sales be for the rest of the year?
Oracle Business Intelligence Tools
Business intelligence is a set of concepts, methods, and processes to improve business
decisions using information from multiple sources and applying experience and
assumptions to develop an accurate understanding of business dynamics.
Different end users need different tools and access to different data with targeted
capabilities. These tools must be able to meet the demands of particular needs.
However, they should also work together, and must be able to evolve with users as
their needs change. Oracle offers integrated, best-of-breed tools across the entire
business intelligence spectrum.
Every enterprise has a spectrum of business intelligence requirements. At a basic
level, these business intelligence requirements, or tasks, can be associated with
particular kinds of questions.
Oracle Reports, Oracle Discoverer, and Oracle Express are interoperable today,
providing seamless analysis across the entire business intelligence spectrum.
Discoverer users are able to dynamically pass the contents of a workbook to Express,
building a multidimensional cube on the fly and invoking the Express calculation
engine for more sophisticated analysis. Conversely, Express users are able to drill
out to Discoverer to explore the detail-level data in the relational system from data
summarized in an Express cube. Oracle Reports publishes views of data from both
Discoverer worksheets and Express data cubes.
Task: Production reporting. The creation and publication of snapshot reports of data
to answer the question "what happened?" This is the kind of reporting on which
businesses run, for example, weekly sales reports.
Business question: What were sales by region last quarter? How many widgets did I
produce this week?
Business intelligence tool: Oracle Reports

Task: Ad hoc query and analysis. Certain users will need to create their own ad hoc
queries to answer the question "why?"
Business question: What is driving the increase in North American sales?
Business intelligence tool: Oracle Discoverer

Task: Advanced analysis. This includes more sophisticated analytical tasks, such as
time-series analysis, forecasting, financial modeling, and multiuser what-if
simulations.
Business question: Given the rapid increase in Web sales, what will total sales be
for the rest of the year?
Business intelligence tool: Oracle Express
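The forecasting question in the advanced analysis task can be illustrated with a very small trend projection. The sketch below fits a straight line by ordinary least squares to six months of hypothetical Web sales and extends it over the remaining six months. It is a deliberately simple stand-in, not the method used by the Express calculation engine, and the figures and function name are assumptions made for illustration.

```python
# Hypothetical monthly Web sales for months 1 through 6 (illustrative only).
monthly_sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(values):
    """Ordinary least-squares line through the points (1, v1), (2, v2), ..."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(monthly_sales)

# Project the fitted trend over months 7 through 12.
forecast = [intercept + slope * month for month in range(7, 13)]
rest_of_year = sum(forecast)
print(slope, intercept, rest_of_year)
```

A straight-line trend is the simplest possible assumption; a real forecast would also need to account for seasonality and for the rapid growth mentioned in the question.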
Slide: Oracle Warehouse Services
Diagram labels: Oracle Consulting, Oracle Education, Oracle Support Services, Customers
Oracle Warehouse Services
Oracle Consulting: This service provides full life-cycle implementation services for
data warehousing solutions. Oracle Consulting has leveraged Oracle's heavy
investment in new technology development through involvement in leading-edge
client engagements. It has also built knowledge repositories and problem-solving
approaches in data warehousing and incorporated them in its Data Warehouse Method.
Major new programs are being planned by Oracle Consulting's Data Warehousing
Practice to help companies think about and manage their customers and their
businesses in better ways. Concepts such as one-to-one marketing and balanced
scorecard are brought to life with data warehousing technology and by professionals
who can provide a transition from management vision to fully operational systems.
Oracle Education: This service offers a suite of products and services to meet your
training needs, including instructor-led training, online interactive learning, interactive
courseware, in-depth seminars, customized classes, and enterprisewide performance
consulting services. Oracle offers courses in a variety of media, such as:
Instructor-led training (ILT) courses run either at an Oracle Education Center or
at your site
Customized training (combining media offerings)
Media-based training using computer-based training (CBT) courses
Oracle Support Services: This service offers a range of program options, enabling
customers to select the best fit for their organization. Ranging from basic telephone
support and Web-based systems to highly customized, on-site support, the programs
include OracleFoundation, OracleMetals, OracleExpertise, and OracleLifecycle.
Three global support centers and more than 90 local centers worldwide constitute a
global support infrastructure that enables Oracle Support Services to provide
around-the-clock, around-the-world coverage for core technology and mission-critical
applications.
Summary
This lesson covered the following topics:
Identifying a common, broadly accepted definition of the data warehouse
Distinguishing the differences between OLTP systems and analytical systems
Defining some of the common data warehouse terminology
Identifying some of the elements and processes in a data warehouse
Identifying and positioning the Oracle Warehouse vision, products, and services
Slide: Practice 3-1 Overview
This practice covers the following topics:
Answering questions regarding data warehousing concepts and terminology
Discussing some of the data warehouse concepts and terminology
Practice 3-1
1 Indicate whether the following statements about warehouse data are true or false.
Statement True False
a Data is organized by time.
b Data is always stored in a relational database.
c Data relates to business-specific areas.
d Data is sometimes integrated.
e Data is replaced according to a refresh cycle.
f Data warehouses may contain any type of data.
2 _______ is a set of rules or structures providing a framework for the overall design
of a system or product.
a Technical infrastructure
b Data access environment
c Architecture
3 The ________ is closely related to the architecture and consists of the
technologies, platforms, databases, gateways, and other components necessary to
make the architecture functional within the corporation.
a Data access environment
b Technical infrastructure
c Data warehouse
4 A telco company needs to understand their network traffic to better pinpoint
frequent trouble spots and predict network expansion and usage. Storing call detail
records and summarizing them by switch and trunk groups among other things in
another environment will satisfy this need.
Which of the following are you going to design?
a Operational data store (ODS)
b Data warehouse
5 An online bookstore has customers in their Sales Order System and in their
Marketing System. These customers do not match between systems, because
Marketing staff do not always update the Marketing System with current and
complete customer data. Also, they want to develop profiles of their customers
according to buying patterns and summarize product sales to get the feedback
necessary to improve marketing programs and promotions.
Which of the following are you going to design?
a Operational data store (ODS)
b Data warehouse
6 Discussion: Discuss the questions below about data warehousing concepts and
terminology and present your points to the class at the end of the discussion.
a Discuss whether a data warehouse, enterprisewide data warehouse,
independent data mart, dependent data mart, or operational data store is most
suitable for your company's needs.
b Discuss how the pieces of Inmon's classic definition of a data warehouse,
"A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management's decision-making process," apply
to your environment.
c How will your recommendations in question 6a above deliver benefits?
.................................
4
Driving Implementation
Through a Methodology
Slide: Overview (course road map)
Project Management (Methodology, Maintaining Metadata)
Road map blocks: Defining DW Concepts & Terminology; Planning for a Successful
Warehouse; Analyzing User Query Needs; Choosing a Computing Architecture; Modeling
the Data Warehouse; Planning Warehouse Storage; ETT (Building the Warehouse);
Meeting a Business Need; Supporting End User Access; Managing the Data Warehouse
Overview
The previous lesson covered data warehouse concepts and terminology. This lesson
discusses the need to drive a data warehouse implementation project through a
methodology. Note that the Project Management block is highlighted in the course
road map on the facing page.
Specifically, this lesson introduces the Oracle Data Warehouse Method, a
methodology employed by Oracle Consulting Services for incremental development
of a total warehouse solution by using a phased development approach. It also
describes partnering initiatives launched by Oracle.
Objectives
After completing this lesson, you should be able to do the following:
Explain the different approaches to warehouse development and the benefits of an
incremental approach to development
Identify the purpose of the Oracle Method
Discuss the purpose and fundamental elements of the Oracle Consulting Data
Warehouse Method
Identify the Data Warehouse Method as a series of processes and approaches
Discuss the objectives of the Oracle Warehouse Technology Initiative
Slide: Big Bang Approach
Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts
Warehouse Development Approaches
The most challenging aspect of data warehousing lies not in its technical difficulty,
but in choosing the best approach to data warehousing for your company's structure and
culture, and dealing with the organizational and political issues that will inevitably
arise during implementation.
Big Bang Approach
Historically, IT departments attempted to deliver enterprisewide data warehouse
implementations in a single project. Data warehouse development is a huge
task, and it is a mistake to assume that the solution can be built all at once. The time
required to develop the warehouse often means that user requirements and
technologies change before the project is completed.
In this approach, you do the following:
1 Analyze the entire information requirement for the organization
2 Build the enterprise data warehouse to support these requirements
3 Build access, as required, either directly or by subsetting to data marts
Advantages of the Big Bang Approach: This approach has few real advantages over
other approaches and should be avoided in most cases. Its advantages are limited to
the following:
It suits a warehouse that is being built as part of another major project or
program, such as reengineering, where the two depend on each other
It gives a big picture of the data warehouse before the data warehousing
project starts
Disadvantages of the Big Bang Approach: This approach has the following
disadvantages:
Involves a high risk
Takes a longer time to deliver any perceived business benefit
Runs the risk of needing to change requirements, which will change during
analysis
Slide: Incremental Approach to Warehouse Development
Multiple iterations
Shorter implementations
Validation of each phase
Diagram: three iterations, each with the phases Strategy, Definition, Analysis,
Design, Build, Production
Incremental Approach
The incremental approach manages the growth of the data warehouse by developing
incremental solutions that comply with the full-scale data warehouse architecture.
Rather than starting by building an entire enterprisewide data warehouse as a first
deliverable, start with just one or two subject areas, implement them as scalable data
marts, and roll them out to your end users. Then, after observing how users are actually
using the warehouse, add the next subject area or the next increment of functionality to
the system. This is also an iterative process. It is this iteration that keeps the data
warehouse in line with the needs of the organization.
Think big and start small. In other words, your strategy identifies the enterprisewide
warehouse, which is delivered in small increments over short time frames.
Benefits
Some of the benefits of the incremental approach to warehouse development are listed
below.
Delivers a strategic data warehouse solution through incremental development
efforts
Provides extensible, scalable architecture
Supports the information needs of the enterprise organization
Quickly provides business benefit and ensures a much earlier return on investment
Allows a data warehouse to be built based on a subject or application area at a time
Allows the construction of an integrated data mart environment
Slide: Top-Down Approach
Diagram labels: Legacy data, Operations data, External data sources, Data warehouse,
Data marts (Marketing, Sales), Users
Top-Down Incremental Approach
This is the fundamental approach recommended for data warehousing projects in the
Oracle Data Warehouse Method. In this approach, you do the following:
1 Analyze enterprise requirements to develop a conceptual information model and
warehouse road map including identifying and prioritizing subject areas.
2 Complete a model of a selected subject area, map to available data, and perform a
source system analysis.
3 Implement base technical architecture and establish metadata, extraction, and load
processes for the initial subject area.
4 Create and populate the initial subject area data mart within the overall warehouse
framework.
Advantages of the Incremental Top-Down Approach This approach has the
following advantages:
Provides a relatively quick implementation and payback. Typically, the scoping,
definition study, and initial implementation are scaled down so that they can be
completed in six to seven months.
Offers significantly lower risk because it avoids being as analysis heavy as the big
bang approach.
Emphasizes high-level business needs.
Achieves synergy among subject areas. Maximum information leverage is
achieved as cross-functional reporting and a single version of the truth are made
possible.
Disadvantages of the Incremental Top-Down Approach This approach has the
following disadvantages:
Requires an increase in up-front costs before the business sees any return on its
investment
Makes it difficult to define the boundaries of the scoping exercise if the business is global
May not be suitable unless the client needs cross-functional reporting
Note: An enterprise data warehouse is not always the right answer, but if you are
going to build an enterprise data warehouse, then this approach is by comparison the
best approach.

[Slide: Bottom-Up Approach. Diagram labels: legacy data, operations data, and external data sources; data warehouse; data marts (Marketing, Sales).]

Bottom-Up Approach:
Advantages and Disadvantages
Advantages:
Appealing to IT
Easier to get buy-in from IT
Disadvantages:
Requires source systems to encapsulate the
current business processes
Design may be out-of-date before delivery
Requires reengineering for each increment
Solutions may be rejected by the next line of
business to be involved
Overall benefit to the business may be
minimized
Bottom-Up Incremental Approach
This approach is similar to the top-down approach but the emphasis is on the data
rather than the business benefit. Here, IT is in charge of the project either because IT
wants to be in charge or the business has deferred the project to IT.
The general steps in this approach are as follows:
1 Generally define the scope and coverage of the data warehouse.
2 Analyze the source systems that are in scope for the data warehouse.
3 Define the initial increment based on the political pressure, assumed business
benefit and data volumes.
4 Define the target model based on the source and map source to target.
5 Implement base line technical architecture and establish metadata, extraction, and
load processes as required to support the increment.
6 Create and populate the initial subject areas within the overall data warehouse
framework.
Advantages of the Bottom-Up Incremental Approach This approach has the
following advantages:
This is a proof of concept type of approach and therefore it is often appealing to
IT.
It is easier to get IT buy-in for this approach because it is focused on IT.
Disadvantages of the Bottom-Up Incremental Approach This approach has the
following disadvantages:
Because the solution model is typically developed from source systems, and
these source systems encapsulate the current business processes, the overall
extensibility of the model is compromised.
IT is often the last to know about business changes, so IT could be designing
something that will be out of date before it is delivered.
As the framework of definition in this approach tends to be much narrower, often a
significant amount of reengineering work is required for each increment.
As data definitions are rarely agreed upon by various lines of business for the first
increment, the solution may be rejected by the next line of business to be involved.
IT staff are used to working with data rather than information. They rarely consider
the temporal aspects of the data, which limits the overall benefit to the business.

Oracle Method
Consists of:
Online guidelines and manuals
Workplan templates
Deliverable templates
Created by experienced, field-based practitioners
for estimating, managing, developing,
and delivering business solutions.
The Need for an Iterative and Incremental Methodology
The recommended approach to a data warehousing project is iterative and
incremental. By restricting efforts to those required to bring up and maintain
a single subject warehouse, it is much easier to demonstrate value in a relatively short
period of time and obtain management buy-in regarding the potential value of the
approach. At the same time, this approach manages the growth of the data
warehouse by developing incremental solutions that comply with a full-scale,
enterprisewide data warehouse architecture. The scoped increments are delivered
in relatively short timeframes while complying with the strategic data warehouse
architecture.
Data Warehouse Method (DWM) is Oracle's full life-cycle approach to delivering data
warehouse solutions. DWM is part of Oracle Method, which is Oracle's integrated
approach to solution delivery.
Oracle Method
The Oracle Method (OM) methodology provides the means to document, standardize,
reuse, and improve the way that we deliver services. It consists of online guidelines
and manuals, workplan templates, and deliverable templates created by experienced,
field-based practitioners for estimating, managing, developing, and delivering
business solutions.

Oracle Data Warehouse Method


Guides through development:
Business functions
Processes
Tasks
Modeled on the Custom Development Method

Method Materials
Software Tools
Workplan templates*
Deliverable templates*
Online handbooks
Estimating software
*Not yet available in production
Handbooks
Method handbook
Process and task
reference*
Deliverable reference*
Oracle Data Warehouse Method
The Oracle Data Warehouse Method (DWM) is based on the proven Oracle Method,
which documents, standardizes, and improves the way services are delivered. Services
include initial strategic studies, business process reengineering, custom and package
application implementation, change management, and program management.
These services follow a standard approach to defining tasks and deliverables, and are
easily integrated to suit your needs.
Method Materials
The Oracle Method includes software and hard copy handbooks for all lines of
business. These components of the Oracle Method assist all members of your project
team, from project managers to analysts to developers.
The software includes:
Workplan templates*
Deliverable templates*
Online handbooks
Estimating software
The hard copy handbooks contain:
Method handbook
Process and task reference*
Deliverable reference*
* Not yet available in production; planned for later releases.

Oracle Data Warehouse Method


Focuses on scoping
Manages risk
Relies on user involvement throughout
Delivers an extensible, scalable solution
Uses a variety of technologies
Identifies tasks with clear objectives and
deliverables
Employs common techniques, skills, and
dependencies
Assigns tasks to processes and processes to
phases

Benefits
Consistency
Productivity
Experience and
best practices
Flexibility
Risk avoidance
Oracle Data Warehouse Method
A warehouse project has many challenges, and the method addresses them by:
Focusing on scoping and requirements, and creating a data warehouse architecture
that is flexible and able to flourish in a dynamic business environment with
unpredictable uses
Managing the risk of a data warehouse project by developing a strong business
case, including measurements to validate the success of the warehouse.
Involving users throughout the life of the project, and advocating the involvement
of a strong executive sponsor from your organization
Defining the technical and warehouse architecture, integrating all data warehouse
components, and delivering an extensible and scalable solution
Outlining approaches, such as data mart solutions, that produce quick and
immediate business benefit while adhering to a strategic architecture
Employing a variety of technologies available from Oracle and third-party
vendors, such as a relational database, OLAP, data acquisition, data access,
metadata, and warehouse management technologies
Laying out the processes and tasks relevant to a data warehouse project, with clear
objectives and deliverables
Assigning tasks to processes, based on common techniques, skills, or
dependencies
Assigning processes to phases, based upon the development approach selected
(The end of a phase reflects the completion of a major set of objectives and
milestones in a data warehouse development effort.)
Benefits
The experience and best practices provide the following benefits:
Consistency is achieved among consultants and practitioners because all
organizations are working from a common set of tasks and deliverables with a
clear understanding of the development processes.
Productivity is increased by following established approaches and adhering to
successful practices. Productivity is also improved by the reduction in mistakes
and reworking, and the ability for a consultant to understand the structure and flow
of the project very quickly.
Flexibility is gained by providing a structured development environment that
allows personnel to be used efficiently based on skills and availability. Flexibility
is also achieved by using a common set of tasks as a foundation for the project
with the ability to customize the tasks based on the needs of each client.
Low risk is achieved through the use of a common set of tasks that outlines the best
ways of developing a warehouse. Mistakes are avoided and the impacts of
decisions can be evaluated within the framework and guidelines of experience.

DWM Fundamental Elements


Approaches
Phases
Processes
Tasks and deliverables
Roles
[Diagram: Process 1 and Process 2 shown against Phase 1, Phase 2, and Phase 3, with Task1 through Task3 in each phase.]
DWM Fundamental Elements
The fundamental elements of DWM are:
Approaches: Because Data Warehouse Method is an umbrella method that must
apply to any type of warehouse engagement, from the smallest OLAP engagement
to the largest multiterabyte one that also includes data access, a series of
approaches have been defined. These approaches make the method more
accessible by tailoring it to specific types of service offerings.
Phases: A phase is a grouping of processes with a common objective.
Processes: A process is a grouping of tasks with a common objective. The tasks in a
process also typically share a common skill set.
Tasks and deliverables: A task is defined as a unit of work that results in the output
of a single deliverable. As the most elementary unit of work, tasks provide the core
of the work breakdown structure (WBS). A WBS simply groups tasks into a
hierarchy for planning and scheduling purposes.
Roles: A skill set of resources assigned to a project
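The relationship among these elements can be sketched as a small data structure. The following Python sketch is illustrative only and is not part of DWM: the class names, the example task, deliverable, and role are all hypothetical, and it assumes the hierarchy described above (phases group processes, processes group tasks, and each task yields a single deliverable).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Elementary unit of work that produces exactly one deliverable."""
    name: str
    deliverable: str
    role: str  # skill set expected to perform the task (hypothetical field)


@dataclass
class Process:
    """Grouping of tasks with a common objective."""
    name: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Phase:
    """Grouping of processes with a common objective."""
    name: str
    processes: list[Process] = field(default_factory=list)


def work_breakdown(phases: list[Phase]) -> list[str]:
    """Flatten the hierarchy into indented lines, one per WBS entry."""
    lines = []
    for phase in phases:
        lines.append(phase.name)
        for process in phase.processes:
            lines.append(f"  {process.name}")
            for task in process.tasks:
                lines.append(f"    {task.name} -> {task.deliverable} [{task.role}]")
    return lines


# Hypothetical example; the task, deliverable, and role names are illustrative only.
phases = [
    Phase("Strategy", [
        Process("Data acquisition", [
            Task("Identify data sources", "Source inventory", "Data analyst"),
        ]),
    ]),
]
for line in work_breakdown(phases):
    print(line)
```

Keeping one deliverable per task makes each WBS line directly checkable: a task is complete when its deliverable exists.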

[Slide: Approaches. Diagram labels: Incremental; Packaged data mart; Warehouse; Data mart; Warehouse infrastructure implementation; Business application implementation; Increment I; Proof of Concept; Increment II through N.]
Data Warehouse Method Approaches
Methods are developed and documented by phase. Phasing is a useful and necessary
concept for managing projects but can cause unnecessary overhead and project
inefficiencies if only one phasing model is available for all sizes and types of projects.
Based on the type of data warehouse solution required, you determine the
development approach that is right for the project.
Currently DWM incorporates different project phasing models.
Incremental The incremental approach is proven and is considered the best
development practice for data warehousing. This is due to the delivery of immediate
and consistent benefits to the organization, while balancing the delivery of incremental
solutions with a strong, long-term data warehouse architecture.
The goal of the incremental approach is to provide benefits quickly during the initial
increment. Each incremental development effort for the data warehouse solution must
be defined and scoped. This allows complexity and risk to be managed and
work done in prior increments to be reused and leveraged. Each increment should
support a well-defined, long-term data warehouse architecture designed for
corporatewide data and all functional areas of the client organization.
The incremental approach enables you to develop increments in order of business need
or highest return on investment (ROI).
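Ordering increments by return on investment can be sketched as a simple ranking. This Python sketch is an illustration under stated assumptions, not a DWM deliverable: the `Increment` class, the benefit and cost figures, and the simple ROI formula (net benefit divided by cost) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Increment:
    name: str
    expected_benefit: float  # hypothetical estimate, same currency as cost
    estimated_cost: float

    @property
    def roi(self) -> float:
        """Simple ROI: net benefit divided by cost (an assumed formula)."""
        return (self.expected_benefit - self.estimated_cost) / self.estimated_cost


def prioritize(increments):
    """Return increments ordered from highest to lowest ROI."""
    return sorted(increments, key=lambda inc: inc.roi, reverse=True)


# Illustrative figures only.
candidates = [
    Increment("Sales", expected_benefit=300.0, estimated_cost=100.0),
    Increment("Marketing", expected_benefit=150.0, estimated_cost=100.0),
    Increment("Finance", expected_benefit=500.0, estimated_cost=200.0),
]
roadmap = [inc.name for inc in prioritize(candidates)]
print(roadmap)
```

In practice business need may override a pure ROI ranking, so a real road map would likely combine this score with priorities set by the business.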
Packaged Implementation The Packaged Implementation approach is a viable
alternative for quickly delivering a useful warehouse solution that is focused on a
specific business function, that is, a data mart solution. Because a data
warehouse begins to deliver value as soon as the first query is run, implementing a
packaged solution can maximize the client's potential to identify and leverage
opportunities quickly, and hence to gain a competitive advantage.

[Slide: Incremental Approach. Diagram labels: Business Strategy; IT Strategy; Requirements Capture; Warehouse Strategy Phase; Scoping Services; Technical Architecture Services; Warehouse Infrastructure Services; Warehouse Business Solution Services; Increment 1 through Increment n; Increment A through Increment z; Proof of Concept.]

Incremental Development
Focus on business functionality
Deliver business benefit
Suited to warehouse evolution
Once an increment is complete, the selection and scope of the next increment is defined
Each increment follows the same phase sequence
[Diagram labels: Strategy; Incremental Development (Definition, Analysis, Design, Build, Transition to Production, Discovery); PGM/PJM (Project and Program Management); ETA (Enterprise Technical Architecture).]
Incremental Approach
The incremental approach is the preferred Oracle approach to building an enterprise
data warehouse solution; it is effective and proven. This approach manages the growth
of the data warehouse by developing incremental solutions that comply with the full-
scale data warehouse architecture.
The architecture is designed to provide a solid framework for the long-term data
warehouse. It includes a central data warehouse with corporate data for all functional
areas, and the functionality to populate, manage, and access the full-scale data
warehouse.
The data warehouse also controls and feeds each data mart within the architecture. By
establishing this architecture, the strategic data warehouse can grow incrementally
while supporting data extensibility and avoiding a divergent group of data marts.
Incremental Development The increments start with the strategy phase, which
defines the overall data warehouse solution and architecture at a high level, including:
Scope of entire solution
Identification and prioritizing of increments
Initial technical architecture
Initial data warehouse architecture
An initial increment is then developed following the phasing model. The increment is
usually scoped to provide maximum benefit, target a specific user audience, and
ensure that the concept can be proved.
At the end of each increment, the discovery phase acts as the review and evaluation
phase. Subsequent increments follow the same phasing approach, building on
experiences gained and lessons learned from development of the first increment.
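The phasing described above can be sketched as a schedule generator. This Python sketch is illustrative only: it assumes the strategy phase runs once for the overall solution and that every increment then repeats the same six phases, ending with discovery. The function and label names are hypothetical.

```python
STRATEGY = "Strategy"
# Phase sequence repeated for every increment (assumed from the text above).
INCREMENT_PHASES = [
    "Definition", "Analysis", "Design", "Build", "Transition", "Discovery",
]


def plan(increment_names):
    """Return (scope, phase) pairs: Strategy once, then the same sequence per increment."""
    schedule = [("Overall solution", STRATEGY)]
    for name in increment_names:
        for phase in INCREMENT_PHASES:
            schedule.append((name, phase))
    return schedule


schedule = plan(["Increment 1", "Increment 2"])
for scope, phase in schedule:
    print(f"{scope}: {phase}")
```

Because discovery closes each increment, its review findings are available before the definition phase of the next increment begins.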
Data Mart Development DWM also provides an approach for the development of a
solution scoped to address the requirements of a specific functional area or
organization: a data mart solution.

The Strategy Phase
Strategy: business requirements, data acquisition, architecture, data quality, administration, metadata, data access, documentation, testing, training
[Diagram: phase sequence Strategy, Definition, Analysis, Design, Build, Transition, Discovery.]
Phases of the Incremental Approach
Strategy Phase The goal of the strategy phase is to clearly define the business
objectives and purpose of the data warehouse solution. Business objectives for the data
warehouse project must be driven by top management and must be business-centric.
The purpose and objectives for the total data warehouse solution are essential to
setting and managing expectations. The strategy phase also clearly defines the data
warehouse team and the executive sponsor.
The overall objectives of the strategy phase include:
Achieve a clear awareness of the business goals and objectives.
Derive the data warehouse scope from business objectives.
Document a clear definition of the data warehouse scope in its entirety.
Document the incremental approach used to support the business objectives.
Define success measurements.
Identify the operational and external data sources required to support the business
goals.
Outline the strategies for data acquisition and data quality.
Define the strategy for warehouse administration.
Identify the role of metadata and document the strategy for metadata management.
Define the data access methods necessary to support business objectives.
Describe the strategy for warehouse documentation and training.
Identify the testing methods necessary to support user acceptance.
Identify the existing technical architecture and capacity plan.
Create the enterprise data warehouse architecture.
Determine the configuration and capacity requirements.
Prerequisite information needed for the strategy phase includes:
High-level business descriptions and existing reference material
Source system documentation and data models, including external data providers
Note: Without a complete understanding of the business objectives and scope of the
overall warehouse you will not be able to proceed successfully.

The Definition Phase
Definition: business requirements, data acquisition, architecture, data quality, administration, metadata management, data access, documentation, training
[Diagram: phase sequence Strategy, Definition, Analysis, Design, Build, Transition, Discovery.]
Phases of the Incremental Approach (continued)
Definition Phase The goal of the definition phase is to clearly define the scope and
objectives for the incremental development effort. Initial increment conceptual
models are created, data sources are documented, and the scope of data quality is
clearly defined. The technical architecture and data warehouse architecture are also
created.
The overall objectives of the definition phase include:
Document a clear scope of the definition phase.
Understand operational and external data sources.
Plan for the initial load and refresh of the warehouse.
Define the interface, configuration, and capacity requirements.
Integrate metadata.
Define the scope of the data quality effort.
Outline warehouse administration efforts.
Outline data access methods.
Train the user community.
Prerequisite information needed for the definition phase includes:
Business goals and objectives
Data warehouse purpose, objectives, and scope
Enterprise data warehouse logical model
Source system data flows
Subject area gap analysis
Data acquisition strategy
Data warehouse architecture and technical infrastructure
Data access environment and data quality strategy
Data warehouse administration strategy, metadata strategy, and training strategy

The Analysis Phase
Analysis: business requirements, data acquisition, architecture, data quality, administration, metadata, data access, documentation, testing, training
[Diagram: phase sequence Strategy, Definition, Analysis, Design, Build, Transition, Discovery.]
Phases of the Incremental Approach (continued)
Analysis Phase The goal of the analysis phase is to focus on the users' information,
data acquisition, and data access requirements for business analysis and decision
making. Relational and multidimensional models are produced for the data warehouse,
metadata, and if appropriate, the data marts. Tool selection is also completed for all
appropriate warehouse components during this phase.
The overall objectives of the analysis phase include:
Collect and model detailed data requirements, including summarization, to support
the business requirements.
Identify and model multidimensional structures.
Map source data to target database objects.
Resolve design conflicts and data quality issues.
Collect and model metadata requirements.
Collect detailed data access, reports, and query requirements.
Select the appropriate tools for data acquisition, data quality, administration,
metadata, and data access components of the warehouse project.
Prerequisite information needed for the analysis phase includes:
Business goals and objectives
Data warehouse purpose, objectives, and scope
Detailed data load, refresh, and summarization plan
Detailed data quality acceptance plan
Data warehouse architecture, technical infrastructure, and capacity plan
Warehouse administration and metadata integration plans
Data access and training plans
Viable data acquisition tools, data quality tools, metadata tools, and data access
tools lists
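One analysis-phase objective, mapping source data to target database objects, can be illustrated with a small source-to-target mapping table. This Python sketch is an assumption-laden illustration, not an Oracle tool or DWM template: the source system `orders`, the target table `sales_fact`, and all column names and transforms are hypothetical.

```python
# Hypothetical source-to-target mapping rules:
# (source system, source column) -> (target table, target column), plus a transform.
MAPPING = [
    {"source": ("orders", "cust_no"), "target": ("sales_fact", "customer_key"), "transform": int},
    {"source": ("orders", "amt"), "target": ("sales_fact", "sale_amount"), "transform": float},
    {"source": ("orders", "ord_dt"), "target": ("sales_fact", "order_date"), "transform": str.strip},
]


def apply_mapping(source_name, row, mapping=MAPPING):
    """Build one target row from a source row using the rules for that source system."""
    target_row = {}
    for rule in mapping:
        system, column = rule["source"]
        if system != source_name:
            continue  # rule belongs to a different source system
        _, target_column = rule["target"]
        target_row[target_column] = rule["transform"](row[column])
    return target_row


source_row = {"cust_no": "42", "amt": "19.90", "ord_dt": " 1999-05-01 "}
print(apply_mapping("orders", source_row))
```

Holding the mapping as data rather than code means the same rules can also be recorded as metadata, which is one of the other analysis-phase objectives.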

The Design Phase
Design: data acquisition, architecture, data quality, administration, metadata management, data access, database design & build, documentation, testing, training, transition
[Diagram: phase sequence Strategy, Definition, Analysis, Design, Build, Transition, Discovery.]
Phases of the Incremental Approach (continued)
Design Phase The goal of the design phase is to transform the requirements
identified during the analysis phase into detailed design specifications and to complete
the technical architecture installation.
The overall objectives of the design phase include:
Document a clear scope of the design phase.
Design the initial data load and refresh modules.
Execute the hardware and software installation plan.
Design the data cleansing, error and exception handling, and audit and control
modules.
Outline the metadata specifications for reporting, bridging, and capturing.
Design the end user layer and standard queries and reports.
Establish and document the user and role access privileges.
Create the database designs for the data warehouse, data mart, metadata repository,
and multidimensional structures identified during the analysis phase.
Document the initial version of all modules designed.
Create the test plans for integration testing, system testing, regression testing,
volume testing, and ad hoc query testing.
Prerequisite information needed for the design phase includes:
The initial data load and refresh requirements
The technical infrastructure and data warehouse architecture
The data acquisition plan
The metadata requirements
The data access requirements
The test strategy

The Build Phase
[Slide diagrams: the phases Strategy, Definition, Analysis, Design, Build, Transition, and Discovery, with the Build phase labeled. Processes shown: Data acquisition, Architecture, Data quality, Administration, Metadata management, Data access, Database design and build, Documentation, Testing, Training, and Transition.]

Build Phase: The goal of the build phase is to create and test the database structures,
data acquisition modules, warehouse administration modules, metadata modules, data
access modules, and reports and queries.
The overall objectives of the build phase include:
Deliver a well-designed, thoroughly tested, and integrated data warehouse solution.
Optimize the database structures to meet design standards and performance objectives.
Deliver access components.
Deliver documentation for using and maintaining the warehouse.
Prerequisite information needed for the build phase includes:
The data acquisition module designs
The technical architecture and capacity plan
The data quality and issue resolution plans
The warehouse administration and scheduling plan
The metadata implementation plan
Specifications for the end-user layer, standard queries and reports, roles and privileges, and query governor limits
The logical and physical database and multidimensional database design
The index and data storage design
The user guide, the metadata reference guide, and the warehouse administration reference
Test plans for integration testing, system testing, environment testing, regression testing, and ad hoc access testing

Transition to Production Phase
[Slide diagram: the phases Strategy, Definition, Analysis, Design, Build, Transition, and Discovery, with Transition to production labeled. Processes shown: Data acquisition, Testing, Training, Transition, and Post-implementation support.]

Discovery Phase
[Slide diagram: the phases Strategy, Definition, Analysis, Design, Build, Transition, and Discovery, with Discovery labeled. Process shown: Post-implementation support.]

Transition to Production Phase: The goal of the transition to production phase is to
install the warehouse, go to production, prepare the users to use and manage the
solution, and begin managing the growth and maintenance of the warehouse.
The overall objectives of this phase include:
Install the warehouse solution.
Prepare users to use the warehouse and support personnel to manage the warehouse.
Populate the production database with production data on the production platform, using production modules.
Deliver an integrated warehouse and monitor the performance and end-user access.
Identify additional access and informational requirements.
Prerequisite information needed for the transition to production phase includes:
All production implementation modules
The integrated data warehouse architecture and technical infrastructure
Production data
Installation plan
System documentation
Training materials
Discovery Phase: The goal of this phase is to evaluate the implemented increment,
identify increment opportunities, and identify and plan for the next increment. This
enables the users and developers to analyze the effort most recently undertaken,
make adjustments, review the possible increments, and select the next effort based on
business need and data warehouse infrastructure need.
The overall objectives of this phase include:
Perform a detailed evaluation of the implemented increment.
Identify opportunities and select the next increment.
Evaluate the completed project plan and consider experiences and lessons learned from previous efforts.
Drive ongoing data warehouse development with business need and user input.
Prerequisite information needed for the discovery phase includes:
System in production
Increment project plan
Use log evaluation
Enterprise data warehouse implementation road map and infrastructure road map
Enterprise data warehouse architecture and technical architecture
Increment technical architecture
Enterprise data warehouse requirements

Processes
Cohesive set of tasks that meet objectives
Common skill set
Project deliverables
Most overlap and interrelate; others are strict predecessors

Processes
Business Requirements Definition
Data Acquisition
Architecture
Data Quality
Warehouse Administration
Metadata Management
Data Access
Database Design and Build
Documentation
Testing
Training
Transition
Post-Implementation Support

Processes
A process is a cohesive set of related tasks that meets a specific project objective and
results in key deliverables.
Each process is a discipline involving similar skills to perform the tasks within the
process. You might think of a process as a simultaneous subproject within a larger
development project.
Every data warehouse project involves most if not all of the following processes,
whether they are the responsibility of the consulting team, the client, IT staff, a third
party, or a combination of these. Most processes overlap in time with others and are
interrelated through common deliverables, while others are strict predecessors of each
other.
Business Requirements Definition
Data Acquisition
Architecture
Data Quality
Warehouse Administration
Metadata Management
Data Access
Database Design and Build
Documentation
Testing
Training
Transition
Post-Implementation Support

Business Requirements Definition
Defines requirements
Clarifies scope
Establishes implementation road map
Provides initial focus on enterprise implementation
Identifies information needs
Models the requirements

Data Acquisition
Identify, extract, transform, and transport source data
Consider internal and external data
Move data between sources and target
Perform gap analysis between source data and target database objects
Define first-time load and refresh strategy
Define tool requirements
Build, test, and execute data acquisition modules

Business Requirements Definition
The Business Requirements Definition process defines the requirements, clarifies the
scope, and establishes the implementation road map of the data warehouse. With the
direction of the business organization, strategic business goals and initiatives are
outlined and used to direct the strategies, purpose, and goals of the data warehouse
solution.
As the process continues, Business Requirements Definition focuses on scoping the
solution to be developed and delivered, identifying the warehouse information needs,
and modeling the requirements.
Data Acquisition
The Data Acquisition process identifies, extracts, transforms, and transports all source
data necessary for the operation of the data warehouse. Data acquisition is performed
among several components of the warehouse, including operational and external data
sources to data warehouse, data warehouse to data mart, and data mart to individual
marts.
Early in the data acquisition process, data sources are identified and evaluated against
the subject areas, and gap analysis is conducted to ensure that the data is available to
support the information requirements. Strategies are developed for the first-time load
of the warehouse and for the subsequent refreshes of the warehouse.
You evaluate tools against high-level requirements and make recommendations.
With the detailed analysis output, modules are designed and built to extract, transform,
transport, and load the source data into the warehouse. Once built, the modules are
tested and executed and the production database objects are populated.
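The gap analysis step described above can be sketched as a set comparison between the data elements each subject area requires and the elements the identified sources actually supply. This is only an illustration under stated assumptions: the function name find_gaps, the subject areas, and the element names are hypothetical and are not part of DWM.

```python
def find_gaps(required, available):
    """Return, per subject area, the required data elements no source supplies.

    required:  dict mapping subject area -> iterable of required element names
    available: iterable of element names found in the identified data sources
    """
    available = set(available)
    gaps = {}
    for subject_area, elements in required.items():
        missing = sorted(set(elements) - available)
        if missing:
            gaps[subject_area] = missing
    return gaps


# Hypothetical information requirements and source inventory.
required = {
    "Sales": ["order_id", "order_date", "amount", "region"],
    "Customer": ["customer_id", "segment"],
}
available = ["order_id", "order_date", "amount", "customer_id"]

gaps = find_gaps(required, available)
print(gaps)
```

An empty result would indicate that every required element has at least one source; anything reported needs a new source, an external feed, or a change to the requirement.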

Architecture
Specify technical foundation
Create warehouse architectural design
Integrate products of architecture components for scalability and flexibility
Determine database environment: distributed or centralized
Define development, testing, training, and production environments
Configure the platform
Perform database sizing
Consider disk striping

Data Quality
Ensure data consistency, reliability, accuracy
Develop a strategy for:
    Cleansing
    Integrity functions
    Quality management procedures
Identify business rules for:
    Cleansing
    Error handling
    Audit and control
Define data quality tool requirements
Build, test, and execute data quality modules

Architecture
The Architecture process specifies elements of the technical foundation and
architectural design of the data warehouse. The focus is on integrating different
products and the data warehouse components to ensure an extensible and scalable
architecture.
For the technical architecture, an evaluation is performed to determine whether the
database environment should be distributed or centralized. Network, hardware and
software requirements, including acquisition; infrastructure changes; and the platform
configuration are defined and implemented.
The platform configuration covers the data acquisition environment, server
architecture, middleware, database sizing, and disk striping.
The data warehouse architecture ensures an integrated strategic data warehouse
architecture while delivering incremental solutions.
Data Quality
The Data Quality process ensures the consistency, reliability, and accuracy of the data
in the warehouse. A data quality strategy is developed based upon a clear
understanding of the agreements and contractual obligations for data cleansing, audit
and control, and integrity functions.
Data management procedures are defined.
Data quality tools are evaluated and recommended.
The process identifies the business rules for error and exception handling, scrubbing
and cleansing, and audit and control. The business rules for error handling may vary
between the initial load and subsequent updates to the data warehouse. Using the data
quality strategy, procedures, and tools, modules are developed to support the
requirements for data quality.
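As a rough sketch of how business rules, error handling, and audit counts might fit together in a data quality module, the example below applies named rules to incoming records, routes failures to an exception list, and keeps control totals. The rule names, record fields, and the apply_rules function are hypothetical; the refresh rule set is deliberately stricter than the initial-load set only to illustrate that the two may differ.

```python
def apply_rules(records, rules):
    """Split records into accepted and rejected, and keep audit counts.

    rules is a list of (name, predicate) pairs; a record is rejected by the
    first rule whose predicate returns False.
    """
    accepted, rejected = [], []
    audit = {"read": 0, "accepted": 0, "rejected": 0}
    for record in records:
        audit["read"] += 1
        failed = next((name for name, check in rules if not check(record)), None)
        if failed is None:
            accepted.append(record)
            audit["accepted"] += 1
        else:
            rejected.append((record, failed))
            audit["rejected"] += 1
    return accepted, rejected, audit


# Hypothetical business rules: the refresh adds one rule to the initial-load set.
initial_load_rules = [
    ("amount_not_negative", lambda r: r["amount"] >= 0),
]
refresh_rules = initial_load_rules + [
    ("region_present", lambda r: bool(r.get("region"))),
]

rows = [
    {"id": 1, "amount": 10.0, "region": "EMEA"},
    {"id": 2, "amount": -5.0, "region": "APAC"},
    {"id": 3, "amount": 7.5, "region": ""},
]

_, rejected_initial, audit_initial = apply_rules(rows, initial_load_rules)
_, rejected_refresh, audit_refresh = apply_rules(rows, refresh_rules)
print(audit_initial, audit_refresh)
```

The audit dictionary stands in for the audit and control totals: records read should always equal records accepted plus records rejected.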

Warehouse Administration
Specify maintenance strategy for:
    Configuration management
    Warehouse management
    Data governing
Define warehouse management workflow and tool requirements
Build, test, and execute modules
Prove data access management and monitoring
Automate warehouse management tasks

Metadata Management
Define metadata strategy
Define metadata types
Specify requirements for the metadata repository, integration, and access
Establish technical and business views of metadata
Develop modules for capturing, bridging, and accessing metadata

Warehouse Administration
The Warehouse Administration process specifies the strategy and requirements for the
maintenance, use and ongoing update of the data warehouse. Strategies are established
for configuration management, warehouse administration, and data governing.
Warehouse administration workflow, tool evaluation, and testing are addressed.
Modules are designed and built for scheduling, backup and recovery, archiving,
security, audit, and data governing. Several data access management and monitoring
tasks are addressed during this process, including authorizing access to appropriate
levels of data, monitoring usage, governing queries, identifying repetitive queries,
calculating metrics, defining access thresholds, adding or removing users, and
updating access authority.
To provide successful ongoing support and maintenance of the warehouse, this process
focuses on the automation of the warehouse management tasks.
The process also defines strategies for security and control, backup and recovery,
disaster recovery, archiving, and restoration.
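Two of the monitoring tasks named above, identifying repetitive queries and checking access thresholds, can be sketched against a usage log. The log layout (user, sql, rows), the function names, and the threshold value are assumptions made for this illustration; a real warehouse would take this information from its own monitoring or query governor facilities.

```python
from collections import Counter


def normalize(sql):
    """Lowercase a statement and collapse whitespace so trivial variants match."""
    return " ".join(sql.lower().split())


def repetitive_queries(usage_log, min_runs=3):
    """Return (query_text, run_count) pairs for queries run at least min_runs times."""
    counts = Counter(normalize(entry["sql"]) for entry in usage_log)
    return [(sql, runs) for sql, runs in counts.most_common() if runs >= min_runs]


def over_threshold(usage_log, max_rows):
    """Return log entries whose returned row count exceeds the access threshold."""
    return [entry for entry in usage_log if entry["rows"] > max_rows]


# Hypothetical usage log entries.
usage_log = [
    {"user": "ann", "sql": "SELECT region, SUM(amount) FROM sales GROUP BY region", "rows": 12},
    {"user": "bob", "sql": "select region, sum(amount) from sales group by region", "rows": 12},
    {"user": "ann", "sql": "SELECT * FROM sales", "rows": 2_000_000},
    {"user": "cy", "sql": "SELECT region,  SUM(amount) FROM sales GROUP BY region", "rows": 12},
]

print(repetitive_queries(usage_log))
print(over_threshold(usage_log, max_rows=1_000_000))
```

Queries that repeat often are candidates for standard reports or summary structures, and queries that exceed the threshold are candidates for governing.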
Metadata Management
The Metadata Management process specifies the metadata strategy and the
requirements for the metadata repository, integration, and access. The primary
objective of this process is to provide technical and business views of the warehouse
metadata.
The technical view focuses on compiling the metadata to support warehouse
management. This view includes data acquisition rules; transformation of source
data to the target database; time and date of data; data authorization; refresh,
archive, and backup schedules and results; and the data accessed, including
metrics such as frequency and volume of requests.
The business view focuses on enabling users to understand the information
available in the warehouse and how it may be accessed. The business metadata
focuses on what data is in the warehouse, the source of the data, how it was
transformed from source to target, and information compiled while accessing the
warehouse.
The Metadata Management process also develops the modules for capturing, bridging,
and accessing the metadata. Metadata is created by several data warehouse
components, such as data acquisition, database design, and data access. Each
component, particularly if supported by a tool, has its own metadata storage facility
and access capabilities. The disparate metadata must therefore be linked using bridging
capabilities to ensure consistency and to facilitate access by the appropriate personnel.
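The idea of bridging can be sketched as linking entries from two separate metadata stores by a shared object name, and reporting objects that only one store knows about. The store layout, the attribute names, and the bridge_metadata function are hypothetical and serve only to show the principle.

```python
def bridge_metadata(technical, business):
    """Link technical and business metadata entries that describe the same object.

    Both arguments map an object name (for example "sales.amount") to a dict
    of attributes. Objects known to only one store are returned separately so
    that the inconsistency can be resolved.
    """
    linked = {
        name: {"technical": technical[name], "business": business[name]}
        for name in technical.keys() & business.keys()
    }
    unmatched = sorted(technical.keys() ^ business.keys())
    return linked, unmatched


# Hypothetical entries from a data acquisition tool and a data access tool.
technical = {
    "sales.amount": {"source": "orders.amt", "refresh": "nightly"},
    "sales.region": {"source": "orders.region_cd", "refresh": "nightly"},
}
business = {
    "sales.amount": {"description": "Order value"},
    "customer.segment": {"description": "Marketing segment"},
}

linked, unmatched = bridge_metadata(technical, business)
print(sorted(linked), unmatched)
```

Here the linked entry gives one place to see both views of sales.amount, while the unmatched names show where the two stores disagree.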

Data Access
Identify, select, and design user access tools
Define user profiles
Determine requirements for interface style, queries, reports, and the end user layer
Evaluate, acquire, and install access tools
Design and develop data access objects:
    Queries and reports
    Catalogs
    Hierarchies and dimensions

Database Design and Build
Support data requirements
Provide efficient access
Create and validate logical and physical models
Create relational and multidimensional database objects
Evaluate partitioning, segmentation, and placement
Identify indexes and keys
Generate DDL
Build and implement database objects

Data Access
The Data Access process focuses on identifying, selecting, and designing tools to
support user access to data. A strategy is established and the user requirements are
defined as a framework for the data access environment.
Tools are evaluated, tested, and recommended.
User profiles are defined based on the level of data required to support their analysis,
decision-making requirements, and skill level. Detailed requirements are also
collected for the user interface style and for queries and reports.
With the user profiles, functional requirements, and levels of data to be accessed, tool
criteria are established for each data access component. In most cases, data access is
supported by a variety of tools rather than one tool to support everyone.
After tools are selected and installed, the data access objects are designed and
developed, including canned queries and reports, catalogs, metadata retrieval,
hierarchies, dimensions, user layer schemas, and user interfaces.
Database Design and Build
The Database Design and Build process implements the design of database objects
that support the data requirements and ensure efficient access to the data. This process
focuses on creating and validating the database logical and physical designs for the
relational and multidimensional database.
Physical data partitioning, segmentation, and data placement are evaluated against
business and user requirements and operational constraints. Indexes and key
definitions are decided. The database data definition language (DDL) is generated and
is used to build and implement the data warehouse database objects in the development,
testing, and production environments.
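Generating DDL from a design can be sketched as turning a small table model into CREATE TABLE and CREATE INDEX statements. The table, column, and index names below are hypothetical, the SQL is deliberately generic, and physical options such as partitioning and storage placement are left out; a design tool would normally produce this output.

```python
def generate_ddl(table, columns, primary_key, indexes=()):
    """Build CREATE TABLE and CREATE INDEX statements from a simple table model.

    columns:     list of (column_name, sql_type) pairs
    primary_key: list of column names forming the key
    indexes:     iterable of column names that each get a single-column index
    """
    column_lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    column_lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    statements = [f"CREATE TABLE {table} (\n" + ",\n".join(column_lines) + "\n)"]
    for column in indexes:
        statements.append(f"CREATE INDEX {table}_{column}_idx ON {table} ({column})")
    return statements


# Hypothetical fact table definition.
ddl = generate_ddl(
    "sales_fact",
    [("sale_id", "INTEGER"), ("region_id", "INTEGER"), ("amount", "NUMERIC")],
    primary_key=["sale_id"],
    indexes=["region_id"],
)
for statement in ddl:
    print(statement + ";")
```

Keeping the model separate from the generated statements means the same definition can be run unchanged in each environment.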

Documentation
Produce textual deliverables:
    Glossary
    User and technical documentation
    Online help
    Metadata reference guide
    Warehouse management reference
    New features guide

Testing
Develop a test strategy
Create test plans, scripts, and scenarios
Test all components:
    Data acquisition
    Data access
    Ad hoc access
    Regression
    Volume
    Backup
    Recovery
Support acceptance testing

Documentation
The Documentation process focuses on producing all user and technical
documentation for the data warehouse, including references, user and system
operations guides, and online help.
To ensure active and successful use of the warehouse, the metadata reference guide
describes the contents of the data warehouse in business terms and provides a
navigational road map to the contents of the data warehouse.
In addition, the warehouse management documentation outlines the workflow and
manual and automated management procedures.
The new features guide highlights any enhancements to warehouse functionality that
result from the implementation of the solution.
Testing
The Testing process is an integrated approach to testing the quality of all components
of the data warehouse. The testing strategy is developed and approved before the test
system is created. System integration and module test plans, test scripts, and test
scenarios are developed. Each test is performed and proven. Testing includes proving
the physical design of the database.
Data acquisition modules, data access tools, and canned queries and reports also
undergo thorough module and integration testing. The testing strategy addresses all
components of the solution, including the ad hoc access processes.
Regression testing is performed, testing changes to the data warehouse against a
baseline, to ensure past functionality works when an enhancement is added.
Volume testing is conducted on the production platform to ensure that performance
meets established objectives.
Preparation of the acceptance environment and support for acceptance testing are also
performed during the Testing process.
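Regression testing against a baseline, as described above, can be sketched by rerunning a fixed set of queries after an enhancement and comparing the results with saved ones. The query names, result rows, and the regression_check function are hypothetical; in practice the results would come from the warehouse itself.

```python
def regression_check(baseline, current):
    """Compare current query results with a saved baseline.

    Both arguments map a query name to its result rows. Returns the names of
    baseline queries whose results changed or that no longer exist. Queries
    found only in current are new functionality, not regressions.
    """
    return sorted(
        name for name, rows in baseline.items()
        if current.get(name) != rows
    )


# Hypothetical results saved before an enhancement and collected after it.
baseline = {
    "sales_by_region": [("EMEA", 100), ("APAC", 80)],
    "customer_count": [(42,)],
}
current = {
    "sales_by_region": [("EMEA", 100), ("APAC", 80)],
    "customer_count": [(41,)],
    "sales_by_month": [("1999-05", 180)],
}

failures = regression_check(baseline, current)
print(failures)
```

A non-empty list does not always mean a defect, because refreshed data can legitimately change a result, so the baseline is best captured against a fixed test data set.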

Training
Define requirements:
    Technical
    End user
    Business
Identify staff to be trained
Establish time frames
Design and develop materials
Focus on tool training and use of the warehouse

Transition
Define tasks for transitioning to the production warehouse
Migrate modules and procedures
Develop the installation plan
Prepare the maintenance environment
Prepare the production environment

Training
The Training process defines the development and user training requirements,
identifies the technical and business personnel requiring training, and establishes time
frames for executing the training plans.
Training plans and training materials are designed and developed. User and technical
training is conducted.
The key objective is to provide both users and administrators with adequate training to
take on the tasks of operating, maintaining and using the data warehouse solution.
Training should focus on tool training and how business value is generated from the
information in the data warehouse.
Transition
The Transition process focuses on the tasks required to move to the production data
warehouse, including creating the installation plan and preparing the
maintenance and production environments. During this process, the warehouse
management workflow is implemented and the production data warehouse becomes
available.

Post-Implementation Support
Evaluate and review warehouse use
Monitor warehouse use
Refresh the warehouse
Monitor and respond to problems
Conduct performance testing and tuning
Transfer responsibility
Evaluate and review the implemented solution

Post-Implementation Support
The Post-Implementation Support process provides an opportunity to evaluate and
review the solution. You evaluate use of the warehouse by accessing metadata and
evaluating queries and reports run against the warehouse. The information assists with
management of standard queries and reports, and the user layer, and identifies required
indexes.
The process also focuses on refreshing the warehouse, monitoring and responding to
system problems, correcting errors, and conducting performance and tuning activities
for all components of the data warehouse. Other actions at this time include:
Change control for information requirements
Roll out of metadata, queries, reports, filters, and conditions
Library of shared objects
Security
Incorporation of new users
Distribution of data marts and catalogs
During this process, responsibility for the data warehouse may be transferred from
information system (IS) staff to the owning organization.

Tasks and Deliverables
Outlined in Work Breakdown Structure
Organized by process and phase

Task ID    Task Name
A          Strategy
A.RD.EXEC  Business Requirements Definition
A.RD.001   Obtain Existing Reference Material
A.RD.002   Obtain Reference Data Models
A.RD.003   Define Strategic Goals, Vision of the Enterprise
A.RD.004   Establish Business Initiatives
A.RD.005   Define Objectives and Purpose of Enterprise Data Warehouse
A.RD.015   Collect Enterprise Business Information Requirements
A.RD.034   Document Data Warehouse Subject Areas
A.RD.035   Create Data Warehouse Subject Area Data Model
A.RD.044   Define Data Warehouse Implementation Roadmap
A.RD.045   Prepare Business Case for Enterprise Data Warehouse

Tasks and Deliverables
Tasks are the foundation for the work breakdown structure (WBS). Each task is
assigned to a process and phase within an approach.
DWM identifies tasks and deliverables that are included in a full life-cycle
development project. They are fully outlined in the Work Breakdown Structure and are
organized by process and phase. Below you see a sample of tasks as identified in the
WBS.
Task ID    Task Name                                                          Deliverable
A          Strategy
A.RD.EXEC  Business Requirements Definition
A.RD.BEG   Begin Strategy Execution
A.RD.001   Obtain Existing Reference Material                                 Existing Reference Material
A.RD.002   Obtain Reference Data Models                                       Reference Data Models
A.RD.003   Define Strategic Goals, Vision, and Initiatives of the Enterprise  Enterprise Goals, Vision, and Initiatives
A.RD.004   Define Objectives and Purpose of Enterprise Data Warehouse         Enterprise Data Warehouse Statement of Value
A.RD.015   Collect Enterprise Information Requirements                        Enterprise DW Information Requirements
A.RD.034   Create Enterprise DW Logical Data Model                            Enterprise DW Logical Model
A.RD.044   Define Enterprise DW Implementation Roadmap                        Enterprise DW Implementation Roadmap
A.RD.045   Prepare Business Case for Enterprise Data Warehouse                Enterprise DW Business Case
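The sample task IDs above appear to follow a PHASE.PROCESS.TASK pattern, with A standing for the Strategy phase and RD for Business Requirements Definition. Under that assumption, which the guide does not state explicitly, tasks can be grouped by phase and process as sketched here; the group_tasks function is hypothetical.

```python
from collections import defaultdict


def group_tasks(tasks):
    """Group (task_id, task_name) pairs by (phase code, process code).

    Assumes IDs of the form PHASE.PROCESS.TASK, as in "A.RD.001". IDs that do
    not have exactly three parts (such as the phase row "A") are skipped.
    """
    grouped = defaultdict(list)
    for task_id, name in tasks:
        parts = task_id.split(".")
        if len(parts) != 3:
            continue
        phase, process, task = parts
        grouped[(phase, process)].append((task, name))
    return dict(grouped)


# A few rows taken from the sample WBS.
tasks = [
    ("A", "Strategy"),
    ("A.RD.EXEC", "Business Requirements Definition"),
    ("A.RD.001", "Obtain Existing Reference Material"),
    ("A.RD.002", "Obtain Reference Data Models"),
]

grouped = group_tasks(tasks)
for (phase, process), entries in grouped.items():
    print(phase, process, [task for task, _ in entries])
```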

Roles
The project team: roles and responsibilities
Common roles
    Analyst, database administrator, programmer, tester
Warehouse-specific roles
    DW architect, metadata architect, data quality administrator, DW administrator

Roles
A warehouse project is complex in many ways, and the makeup of the project team is no
exception. The DWM identifies the roles required and the main responsibilities of each role.
It identifies roles that are common within technology departments, such as:
- Development database administrator, who works closely with the system administrator
- Lead tester, who oversees the test script planning, development, and execution
  activities
- Production database administrator, who installs and configures the production
  database and maintains database access controls
It identifies roles that are unique to data warehouse projects, for example:
- Data warehouse administrator: Responsible for warehouse management, maintenance,
  and the total data warehouse production environment
- Data warehouse architect: Establishes the strategic data warehouse architecture and
  manages the integration of the developed increments with the wider data warehouse
  architecture
- Data warehouse database designer: Responsible for producing the logical and physical
  database designs for the data warehouse and data mart and for metadata objects
Within this element of the method, other roles are identified.

Warehouse Technology Initiative


Customer driven
Warehouse products only
Quality, not quantity
High-value partnerships
Requires:
Oracle certified solution partner level
Product certification
References
Oracle Warehouse Technology Initiative (WTI)
A number of leading hardware and software vendors offer warehouse initiatives. Such an
initiative may comprise a solution that uses a single vendor's products or one that
combines products from multiple vendors.
The Oracle Warehouse Technology Initiative (WTI) offers the Oracle database
combined with specialized tools from dedicated warehouse providers. The partner
company must supply products with specific functionality that supports data
warehouses using an Oracle database.
Oracle Alliance Program
The Oracle Alliance Program is a partnership of the world's leading information
technology companies. There are more than 3,000 partners in 93 countries. Through
the program, partners and Oracle work together to offer mutually reinforcing products
and services that expand markets and lead to greater business success for all. The
program includes partners from key segments of the information technology industry,
including software developers, hardware vendors, distributors, resellers, consultants,
and system integrators.

WTI Partners by Categories


Design and administration
Source
Manage
Access
Data content provider
WTI Partners by Categories
Oracle's WTI is composed of the following partner categories: design and
administration, source, manage, access, and data content providers.
Design and Administration: Products in this category enable you to plan and design a
data warehouse from the ground up. These products help you identify and qualify the
source data, lay out the data structures, and define the mapping between data sources
and the target data warehouse.
Source: WTI partners in this category produce tools that help you build and implement
the data warehouse. IT professionals use these tools and utilities to extract, transform,
cleanse, and move data from source systems into the data warehouse or data marts.
Manage: This category covers products in every area of warehouse management,
including administering the database, managing the warehouse metadata, and
managing recurring tasks. It includes any tool or utility that enables you to manage or
administer an Oracle7, Oracle8, Oracle8i, or Express-based data warehouse or data mart.
Access: Products in this category enable you to view the contents of your data
warehouse or data mart database for analysis. Tools include report writers, query
products, OLAP software, executive information systems, and data mining. The products
embrace a broad range of architectures, from server-only to client-server to Web-based
servers.
Data Content Provider: This category includes any enterprise that sells or rents data
sets suitable for data warehousing. The data can range from market-share information
to demographics to financial-time services.

Summary
This lesson discussed the following topics:
- Explaining the different approaches to warehouse development and the benefits of
  an incremental approach
- Identifying the purpose of the Oracle Method
- Discussing the purpose and fundamental elements of Data Warehouse Method
- Discussing the objectives of the Oracle Warehouse Technology Initiative

Practice 4-1 Overview


This practice covers the following topics:
Defining the business requirements of a fictitious
beverage company, including the purpose, goals,
and strategies of a data warehouse by interviewing
executives
Uncovering some of the possible issues and
challenges in a data warehouse implementation
project through the class discussion
Practice 4-1
Exercise Background
Task: You and a team of two or three other people are about to embark on Phase I of
a data warehouse project, that is, determining the business requirements. This task
involves interviewing executives in your company to define the purpose, goals, and
strategies of the data warehouse.
In this exercise, you are going to form small groups and role-play the interviewing
session with your teammates. Do the following:
- Read through this worksheet. (5 mins)
- Form groups of four and role-play the interviewing session with your teammates.
  Each of you assumes a role: the DW team manager, the chief financial officer (CFO),
  the chief operating officer (COO), or the information technology (IT) director. Use
  the interview questions and the background about each character to help you in this
  exercise. (15 mins)
- Regroup and answer the questions in the class discussion. Give feedback based on
  what you observed. (20 mins)
Scenario: Krispan Beverages, Inc., produces soft drinks, noncarbonated drinks,
mixers, and sparkling waters, and distributes them all over the world. The CFO has
been promoting and executing on the concepts of data warehousing for some time.
Some of the executives at Krispan seem to think that they are ready to build a data
warehouse to better understand their business and to help business decision makers
make better decisions.
Company Profile: Krispan Beverages, Inc., is based in California. The company
develops, manufactures, markets, and distributes a full line of branded cola and
multiflavored soft drinks, juice products, and bottled water.
Mission Statement: We exist to create value for our share owners on a long-term
basis by building a business that enhances Krispan's trademarks. We do this by
maintaining our market-leading status, developing superior soft drinks, both carbonated
and noncarbonated, and profitable nonalcoholic beverage system, financial analysis,
and distribution services using empowered team dynamics in a total quality
paradigm.
Role 1: Data Warehouse Team Manager. He is the manager of the Data Warehousing
Implementation team. He is going to interview the following key people using the
interview questions that follow:
- The chief financial officer (CFO), who is also the board-appointed project sponsor
  of this data warehouse implementation project
- The chief operating officer (COO)
- The information technology (IT) director
Role 2: CFO. He is the board-appointed project sponsor and the person who has
been gaining a lot of profits from the company's success. He does not want the new
systems because they will require a lot of change within his group. He is conservative
in his thinking and wants things to go on as before. He supports the company's mission
statement, but only so far as it meets his own agenda.
Role 3: COO. She wants the system because she realizes the power of information,
believes that the data warehouse will give her real control in the company, and
acknowledges that the data warehouse will enable the company to be more
competitive in the marketplace. The COO has a good high-level understanding of what
she wants the system to provide, but she will need significant help in sorting out the
details. She understands the vision for the business and fully supports it.
Role 4: IT Director. She does not understand the vision of the business but pretends
that she does by quoting it on a regular basis. She is very technically savvy but lacks
a business understanding of the organization. She wants power and influence, and
believes she can get both through the new infrastructure and big systems that are
planned.
Interview Questions
Ask the key persons the following questions.
Question to Ask | CFO | COO | IT Director
1 What is the business vision? | | |
2 Why does the company need an enterprise data warehouse? | | |
3 What do you expect the data warehouse to provide or what will you get out of the warehouse? | | |
4 How soon do you need to have data loaded into the data warehouse and how up-to-date does the data need to be? | | |
Class Discussions
1 Identify the major challenges for a data warehousing implementation project, as
shown in this exercise.
2 Give your suggestions on how to overcome these challenges.
3 If you apply the Oracle Data Warehouse Method to this project, how would you apply
it, and where do you see the benefits of using this method?
Lesson 5: Planning for a Successful Warehouse
[Slide: Overview. Course roadmap diagram with the following blocks: Defining DW Concepts
& Terminology; Planning for a Successful Warehouse (highlighted); Meeting a Business
Need; Analyzing User Query Needs; Choosing a Computing Architecture; Modeling the Data
Warehouse; Planning Warehouse Storage; ETT (Building the Warehouse); Supporting End User
Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining
Metadata).]
Overview
The previous lesson introduced the importance of driving a warehouse project by a
methodology.
This lesson introduces the planning that is critical to the success of a data warehouse
project. Planning phases, deliverables, and project roles are identified. Overall
warehouse strategy and project scope are defined.
Note that the Planning for a Successful Warehouse block is highlighted in the
overview slide on the facing page.
Objectives
After completing this lesson, you should be able to do the following:
- Explain the financial issues that must be managed in developing and implementing
  a data warehouse
- Outline techniques for obtaining business commitment to the warehouse
- Outline the key tasks involved in managing a warehouse project
- Identify the major warehouse planning phases and their deliverables
- List warehouse strategy phase deliverables
- List warehouse scope phase deliverables
Financial Justification
- Intangible benefits (45%): remain competitive; respond to changing business
  conditions; support reorganization
- Productivity or ROI (30%): for internal users; for external users
- Better data and better decision making (25%): reduce IS costs; better response time;
  rigorous reporting
Source: Database Associates, Data Warehouse in Practice, June 1993
ROI and Associated Costs
Build a strong case
Costs
ROI
Profitability
Efficiency
Objectives
Consider
Impact of time for ETT
Additional storage requirements
Cost of redundant data
Cost of database, software licenses, labor
Managing Financial Issues
Financial Justification
The project is a big investment in resources and finances. Management must be able to
report on how the data warehouse benefits the business. Justification is divided into
three main areas:
- Intangible benefits (45%): The business can remain competitive, respond to changing
  business conditions, and support reorganization.
- Better data and better decision making (25%): These reduce information technology
  costs, provide better response times, and provide rigorous reporting.
- Productivity or return on investment (ROI) (30%): These benefit internal and external
  users.
Return on Investment: The financial justification must set out a strong case that
clearly establishes measurements such as cost versus return on investment, and
increased efficiency and profit. It must also set clearly defined objectives that can be
monitored and measured.
Associated Costs: Along with the cost justification, you should provide a plan that
specifies other factors that will affect the cost of the project and other aspects of
the business:
- The cost of developing ETT or purchasing the ETT tools
- The actual time required for data cleansing, transformation, and extraction, which
  may impact day-to-day operations
- Storage requirements for extract, summarization, work space, log space, backup,
  recovery, and maintenance
- The cost of redundant data
- Hardware and software costs
- The cost of server and system software licenses
- Labor costs
You may regard this as a negative approach because some of these issues have an adverse
impact on the business. However, given the enormous size of a data warehouse
project, every issue, good or bad, must be clearly understood and appreciated.
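
To make "cost versus return on investment" concrete, the following Python sketch totals a
set of cost categories like those listed above and computes a simple ROI ratio. It is an
illustration only: the formula (benefit minus cost, divided by cost) is the common
simple-ROI definition rather than one prescribed by this course, and every figure, key
name, and the function simple_roi are hypothetical.

```python
def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Return (benefit - cost) / cost, the simple ROI ratio.

    Assumption: benefits and costs cover the same time period.
    """
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical cost figures, grouped like the associated-cost list above.
COSTS = {
    "ett_tools": 120_000,
    "storage": 80_000,
    "hardware_and_software": 150_000,
    "licenses": 90_000,
    "labor": 260_000,
}

TOTAL_COST = sum(COSTS.values())   # 700,000 in this made-up example
EXPECTED_BENEFIT = 910_000         # hypothetical measurable benefit

ROI = simple_roi(EXPECTED_BENEFIT, TOTAL_COST)
print(f"Total cost: {TOTAL_COST:,}  ROI: {ROI:.0%}")
```

A ratio such as this covers only the measurable part of the justification; the
intangible benefits described above still have to be argued separately.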
Funding the Project
State that initial system integration costs are high.
Determine who funds the project:
- Information systems: development group
- Department: users
[Diagram labels: Information systems; Selected subject for pilot; Small staff; Short
duration; Department (three boxes); More subjects funded by end-user organizations]
Charging Back Costs
Some warehouses do not charge initially.
Benefits:
- Encourages efficient use
- Provides shared costs
Drawbacks:
- Users cannot dwell on detail.
- Users try to reduce costs.
- Machine resources are taken up monitoring use.
Funding
Initially, the information technology group may fund the project until the pilot run
of the first increment. After the pilot, when the process is proven, funding usually
passes to the individual departments, particularly if the implementation is a
departmentalized data mart.
Debates often arise between information systems and individual departments about
who should pay for resources, such as the hardware and software, system (warehouse)
monitoring tools, and OLAP tools. Individual departments often express concern that,
if they fund tools for one of the first subject areas to be developed, they should be
able to recoup part of the investment from other departments that build subject areas
and benefit from those tools at a later time.
If the information systems department funds the tools, it either absorbs the cost or
bills it back to individual departments as required, over the depreciation life of the
tools. In the case of specific data marts (for departments), the cost is often the
responsibility of the local department.
Some warehouses do not charge for the first few months, usually while the project is
being funded by information systems development groups. Once the warehouse is
piloted and has proved successful, charges are normally levied.
Charge Models
There are different models that you may use; none of them is completely fair. There
are no chargeback models designed strictly for the warehouse environment, and the best
model may be a hybrid, developed in house specifically for the purpose.
Chargeback Benefits
- Encourages efficient and sensible use of resources
- Promotes realistic requests for additional requirements
- Allows users to share the cost of data warehouse processing and maintenance
Chargeback Drawbacks
- Users are reluctant to dwell on detail, knowing they are being charged for the service.
- Users may not be motivated to discover more, anticipating that costs may run too
  high.
- Machine resources are needed to monitor and maintain a charging system.
The business value of tangible, measurable results, in most cases, far outweighs the
overhead costs. Even if chargeback strategies are not deployed, the information
systems team still needs to monitor warehouse use and can use those metrics to justify
future direction.
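
As an illustration of the hybrid charge model mentioned above, the following Python sketch
splits a period's warehouse cost into a fixed pool shared equally by all departments and
a usage pool shared in proportion to monitored usage. This is only one possible in-house
model, not a model defined by this course; the function hybrid_chargeback, the
fixed_fraction parameter, the usage units, and all figures are hypothetical.

```python
def hybrid_chargeback(total_cost, usage_by_dept, fixed_fraction=0.5):
    """Split total_cost across departments.

    fixed_fraction of the cost is shared equally; the remainder is
    shared in proportion to each department's measured usage
    (for example, query hours taken from warehouse monitoring).
    """
    if not usage_by_dept:
        raise ValueError("at least one department is required")
    if not 0.0 <= fixed_fraction <= 1.0:
        raise ValueError("fixed_fraction must be between 0 and 1")

    dept_count = len(usage_by_dept)
    total_usage = sum(usage_by_dept.values())
    fixed_pool = total_cost * fixed_fraction
    usage_pool = total_cost - fixed_pool

    charges = {}
    for dept, usage in usage_by_dept.items():
        if total_usage > 0:
            usage_charge = usage_pool * usage / total_usage
        else:
            # No recorded usage at all: fall back to an equal split.
            usage_charge = usage_pool / dept_count
        charges[dept] = fixed_pool / dept_count + usage_charge
    return charges


# Hypothetical monthly cost and usage figures.
MONTHLY_COST = 12_000
USAGE = {"Sales": 600, "Finance": 300, "Marketing": 100}
CHARGES = hybrid_chargeback(MONTHLY_COST, USAGE)

for dept, amount in CHARGES.items():
    print(f"{dept}: {amount:,.2f}")
```

Setting fixed_fraction to 1 shares the cost equally regardless of use, and setting it to 0
charges purely by use. A value in between is one way to soften the drawback that pure
usage charging discourages users from exploring the data.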
Obtaining Business Commitment
Ensure that the warehouse:
Has total support
Is driven by the business
Research the problem
Identify goals, visions, priorities
Research the solution
Identify the benefits
Identify the constraints
Obtaining Business Commitment
A data warehouse implementation requires the total support of those who control the
business and make the decisions that drive the business forward. The warehouse is a
business-driven project, not an information technology drive for the latest hardware,
software, tools, and techniques.
Business objectives must be clear, well defined, measurable, and achievable:
- Research and study the business problem; identify the business vision, goals, and
  priorities.
- Research the solution and define what the warehouse solution may do.
- Identify the benefits of the solution, such as efficiency, people power, customer
  satisfaction, and returns.
- Identify the constraints, such as schedule, costs, and experience.
Note: Obtaining business commitment is supported by the Business Requirements
Definition of the DWM Strategy Phase.
Data Warehouse Champion
Maintains intergroup communication
Settles conflicts
Identifies and solves issues
Articulates the vision
Brings in business expertise
Organizes and supports the team
Communicates progress
Brings the data warehouse to life
Steering Committee
Provides direction
Decides upon implementation issues
Sets priorities
Assists with resource allocation
Communicates to all levels at all times
Members: business executives, information systems representatives, knowledge workers
Data Warehouse Champion
There must be someone within the organization who remains focused and works to:
- Ensure all groups within the development team communicate.
- Settle conflicts between groups.
- Identify and solve issues or problems at any level.
- Articulate the vision and wisdom of the warehouse to everyone involved in
  developing and using the warehouse.
- Bring business expertise to the task.
- Organize and support the team.
- Communicate progress, processes, and achievements throughout the organization.
- Bring the data warehouse to life.
Steering Committee
The steering committee should comprise representatives of different sectors within the
business:
- Business executives
- Information systems representatives
- Users
The aim of the committee is to:
- Provide business direction.
- Decide upon enterprisewide implementation issues.
- Determine and set development priorities.
- Assist with resource allocation.
- Communicate consistently to all areas and levels of the organization.
Each subject area may have its own detailed project plan, which can be rolled up to a
master plan weekly or monthly. The steering committee must be aware of how
changes to business direction and priorities affect existing project plans, milestones,
and deliverables, and it must approach the renegotiation of existing plans tactfully and
diplomatically.
Note: The steering committee is not a substitute for the project manager.
Warehouse Data Ownership
Users must own the data
Users must be involved throughout
Users must be part of the steering committee:
Enhances cooperation
Reduces friction
Helps meet requirements
Enhances feedback
Warehouse Data Ownership
It is important that users feel they own the warehouse and the data contained within it.
If they have a vested interest in the project, they are eager for more information and
have an interest in the future use and maintenance of the content.
You should involve users throughout the project, making them part of the steering
committee.
Involving the users in this way leads to:
- Enhanced cooperation between different departments in the business
- Reduced friction among groups or departments, with problem resolution and
  formal project and change management
- Meeting business requirements
- Continuous and useful feedback
Managing a Warehouse Project
Determine organizational readiness for the
warehouse
Adopt an incremental approach to warehouse
development
Set expectations
Manage expectations
Assemble the project team
Estimate the data warehouse project
Recognize critical success factors
Managing a Warehouse Project
Managing a warehouse project involves seven broad categories of tasks:
- Determining organizational readiness for the warehouse
- Adopting an incremental approach to warehouse development
- Setting expectations
- Managing expectations
- Assembling the project team
- Estimating the data warehouse project
- Identifying critical success factors
These tasks are described on the following pages.
Determining Organizational Readiness for the Warehouse
1. Are the objectives and business drivers clearly defined, compelling, and agreed upon?
2. Have you selected a methodology for design, development, and implementation?
3. Is the project scope clearly defined, with a focus on business rather than technology?
4. Is there strong support from a business management sponsor?
5. Does the business management sponsor have specific expectations?
6. Are there cooperative relations between business and Information Systems staff?
7. Have you identified which source data will be used to populate the data warehouse?
8. What is the quality and cleanliness of the source data?
9. Are you authorized to choose and acquire hardware and software to implement the warehouse?
10. Are you prepared to select and train your implementation team?
Determining Organizational Readiness for the Warehouse
Before you commit time, money, staff, and other resources to your data warehouse
project, it is essential that you assess the readiness of your organization for the
warehouse.
Several good readiness checklists are available in data warehousing textbooks. The ten
questions listed above are a representative set of essential indicators that test an
organization's readiness.
If your organization is significantly unprepared in light of these indicators, experience
shows that the lack of readiness does not correct itself once the warehouse project
starts. If your organization is not ready for or committed to the warehouse, it is best to
delay the project rather than to start it and hope to catch up.
Setting Expectations
Scope
Rollout over time
Phases
Incremental
Setting Expectations
Establish expectations for each data warehouse project phase early. Every organization
has heard something about data warehousing, data marts, data mining, and related
topics. To set expectations throughout the organization, you first need to determine
what each member of the organization expects from the data warehouse.
Set Expectations for the Incremental Approach
Educate all members of the organization in advance that the data warehouse project will
be developed incrementally. Explain that there is no formal implementation of the entire
data warehouse all at once. Help the user community to understand that the data
warehouse provides views of the business over time and under continually changing
strategic environments.
Managing Expectations
Documenting
Informing sponsors
Reporting progress to end users
Managing Expectations
Documenting Deliverables
You can manage expectations throughout the data warehouse project management cycle by
documenting the deliverables that were completed within each phase.
Keeping Sponsors Informed
Keep the executive sponsor of the warehouse, as well as the end-user community,
informed of the iterative development that is taking place during each phase.
Reporting Incremental Progress to End Users
Highlight all new progress and functionality so that the user community sees the
incremental advances that increase the amount of information available from the data
warehouse.
Assembling the Project Team
Project manager/Project leader
Architect
Executive sponsor
Data analyst
Database or system administrator
Assembling the Project Team
During the life cycle of a data warehouse project, you will need to call on staff from
both the business and Information Systems sides of your organization. Often,
project roles are shared and switched over the project life cycle.
Project Manager/Project Leader
Manages and defines the data warehouse project plan
Is responsible for the overall design and function of the data warehouse
Coordinates project resources, controls the budget, documents project status,
resolves issues, coordinates vendor activity, and manages change control
On large data warehouse projects, the project manager and project leader are typically
two different individuals.
Architect
Designs and documents data warehouse architecture and technical infrastructure
On a small data warehouse project, may also be responsible for integrating all
networking products and host connectivity
Executive Sponsor
Provides clout; influences resource availability, funding, and scheduling
Provides understanding of the organization and its business
Data Analyst
Is responsible for the data model and schema design
Manages data quality, data integration, aggregation, and updates
On a small data warehouse project, may also be involved in data extraction and
transformation
On a large data warehouse project, may also be involved in exploring end-user
data requirements and deploying business intelligence and analysis tools
Database or System Administrator
Is responsible for physical database implementation
Installs all hardware and software products for the data warehouse environment
Manages database installation, configuration, security, and administration
May also be involved in helping programmers with data extraction,
transformation, loading, backup, and archiving
Estimating the Data Warehouse Project
Bottom-Up Project Estimate
[Slide table: Percentage of Project Effort. Rows are processes (Requirements definition,
Data acquisition, Architecture, Data quality, Administration, ...); columns are labeled
A through F, followed by a Total column. The individual cell values cannot be matched to
rows and columns in this copy. Percentages shown include 4.1%, 16.1%, 9.9%, 4.3%, and
11.0%.]
Estimating the Data Warehouse Project
"How much will it cost?" and "When will it be ready?" are typically the first questions
asked at the start of a data warehouse project. The most reliable way to answer these
questions is to calculate a bottom-up project estimate.
Bottom-Up Project Estimate
A bottom-up estimate is developed from a work breakdown structure that contains all the
tasks to be performed, with project roles mapped to tasks and each role's percentage of
participation in a task defined. The task and role mapping provides the framework for
documenting the estimating factors that influence each task. The estimating factors are
then used in an estimating formula for each task.
The Percentage of Project Effort table depicted on the slide summarizes a bottom-up
estimating model. Each cell represents the percentage of project effort that a process
consumes in a given phase. Phase columns sum down to phase totals, and process rows sum
across to process totals. The following key maps the column labels in the table to
phases.
Column Label    Meaning
A               Strategy
B               Definition
C               Analysis
D               Design
E               Build
F               Transition
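As a rough illustration of the bottom-up approach described above, the following Python sketch rolls task-level estimates up into a percentage-of-effort table whose rows sum to process totals and whose columns sum to phase totals. The tasks, base hours, estimating factors, role percentages, and the formula itself (base hours multiplied by the estimating factors) are hypothetical values chosen for illustration. They are not taken from the slide or from any Oracle estimating method.

```python
from collections import defaultdict
from math import prod

# Hypothetical work breakdown structure. Each task belongs to one process and
# one phase, has a base effort in hours, a list of estimating factors
# (multipliers), and role percentages that split the task effort across roles.
TASKS = [
    {"process": "Requirements definition", "phase": "Strategy", "base_hours": 40,
     "factors": [1.5], "roles": {"Data analyst": 0.75, "Project manager": 0.25}},
    {"process": "Requirements definition", "phase": "Analysis", "base_hours": 80,
     "factors": [1.25], "roles": {"Data analyst": 1.0}},
    {"process": "Data acquisition", "phase": "Design", "base_hours": 60,
     "factors": [1.0], "roles": {"Architect": 0.5, "Data analyst": 0.5}},
    {"process": "Data acquisition", "phase": "Build", "base_hours": 120,
     "factors": [1.5, 1.0], "roles": {"Database administrator": 1.0}},
]


def task_hours(task):
    """Assumed estimating formula: base hours times the product of the factors."""
    return task["base_hours"] * prod(task["factors"])


def effort_table(tasks):
    """Return {(process, phase): percentage of total project effort}."""
    total = sum(task_hours(t) for t in tasks)
    cells = defaultdict(float)
    for t in tasks:
        cells[(t["process"], t["phase"])] += 100.0 * task_hours(t) / total
    return dict(cells)


def totals(cells, axis):
    """Sum cell percentages by process (axis=0) or by phase (axis=1)."""
    sums = defaultdict(float)
    for key, pct in cells.items():
        sums[key[axis]] += pct
    return dict(sums)


def hours_by_role(tasks):
    """Distribute each task's hours across roles using the role percentages."""
    hours = defaultdict(float)
    for t in tasks:
        for role, share in t["roles"].items():
            hours[role] += share * task_hours(t)
    return dict(hours)


cells = effort_table(TASKS)
process_totals = totals(cells, 0)   # rows summed across phases
phase_totals = totals(cells, 1)     # columns summed down processes
print(process_totals)
print(phase_totals)
print(hours_by_role(TASKS))
```

Because every cell is a share of the same project total, the process totals and the phase totals each add up to 100 percent, which gives a quick consistency check on the model.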
Recognizing Critical Success Factors
Focus on the business, not the technology
Use an iterative development methodology
Include end users on the project team
Recognizing Critical Success Factors
Each data warehouse project management phase has critical success factors. The
critical success factors for the overall data warehouse project typically include these
three items:
Design the data warehouse with a focus on the business, not the technology. In a
successfully managed data warehouse project there are no technical decisions,
only business decisions.
Use an iterative development methodology. Include short phases that provide
frequent deliverables to help manage expectations throughout the project.
Include end users on the project team. End-user input is necessary for design
decisions that enable the data warehouse project to meet the business goals.
Identifying Planning Phases
Strategy
Scope
Analysis
Design
Build
Production
Identifying Planning Phases
Effective and efficient data warehouse project management involves the use of project
phases. Project phases identify the tasks to be completed, the resources required, the
directing and reporting efforts, and the quality assurance required before moving on to
the next phase. Project phasing is a management technique used to focus project teams
toward a short-term goal and to communicate progress to senior management.
Phase       Goal
Strategy    Clearly define the business objectives and purpose of the data
            warehouse solution, while establishing an environment for
            incremental development. The strategy phase provides the enterprise
            vision for the data warehouse.
Scope       Clearly define the scope and objectives for the incremental
            development effort while complying with strategy. Initial models are
            created, data sources are documented, and the scope of data quality is
            defined. The technical architecture and data warehouse architecture
            are also created for the scoped solution.
Analysis    Formulate the detailed requirements for the data acquisition, and the
            data access requirements for business analysis and decision making.
Design      Take the requirements from the analysis phase and translate them
            into detailed design specifications, while accounting for the technical
            architecture, data warehouse architecture, and available technology.
Build       Create and test the database structures, data acquisition modules,
            warehouse administration modules, metadata modules, data access
            modules, and reports and queries.
Production  Install the incremental solution, prepare the client personnel to use
            and manage the solution, go to production, and begin managing the
            growth and maintenance of the warehouse.
Strategy Phase Deliverables
The Strategy Phase
Business goals and objectives
Data warehouse purpose, objectives, and scope
Data warehouse logical model
Incremental milestones
Source system data flows
Subject area gap analysis
Data acquisition strategy
Phases
Strategy
Scope
Analysis
Design
Build
Production
Identifying Warehouse Strategy Phase Deliverables
For each of the data warehouse project phases there are deliverables. The deliverables
for the strategy phase focus on defining the business objectives and purpose of the data
warehouse solution.
The purpose and objectives for the total data warehouse solution are essential to
setting and managing expectations. The strategy phase also clearly defines the data
warehouse team and the executive sponsor.
Strategy Deliverable               Description

Business goals and objectives      Documents the strategic business goals and objectives

Data warehouse purpose,            Documents the purpose and objectives of the
objectives, and scope              enterprise data warehouse, its scope, and how it is
                                   intended to be used

Enterprise data warehouse logical  High-level, logical information model that diagrams
model                              the major entities and relationships for the enterprise

Incremental milestones             Documents a realistic scope of the data warehouse,
                                   acceptable delivery milestones for each increment, and
                                   source data availability

Source system data flows           Outlines source system data, where it originates, the
                                   flow of data between business functions and source
                                   systems, degree of reliability, and data volatility

Subject area gap analysis          Documents the variance between the information
                                   requirements and the ability of the data sources to
                                   provide the information

Data acquisition strategy          Documents the approach for extracting, transforming,
                                   transporting, and loading data from the source systems
                                   to the target environments for the initial load and
                                   subsequent refreshes
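The subject area gap analysis listed in the table can be pictured as a set difference between the data elements that the business requires and the elements that the source systems can supply. The following Python sketch is a minimal illustration only. The subject areas, data element names, and source system names are all hypothetical, and a real gap analysis would also weigh data quality and availability.

```python
# Hypothetical information requirements, grouped by subject area.
REQUIRED = {
    "Sales": {"order_date", "product_id", "customer_id", "net_amount", "promotion_id"},
    "Customer": {"customer_id", "region", "segment"},
}

# Hypothetical data elements that each source system can provide.
SOURCES = {
    "order_entry": {"order_date", "product_id", "customer_id", "net_amount"},
    "crm": {"customer_id", "region"},
}


def gap_analysis(required, sources):
    """Return, per subject area, the required elements that no source supplies."""
    available = set().union(*sources.values())
    return {area: sorted(elements - available) for area, elements in required.items()}


gaps = gap_analysis(REQUIRED, SOURCES)
for area, missing in gaps.items():
    print(area, "is missing:", missing)
```

An empty list for a subject area means that, in this simplified model, every required element has at least one candidate source.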
Strategy Phase Deliverables
Phases
Strategy
Scope
Analysis
Design
Build
Production
The Strategy Phase
Data warehouse architecture
Technical infrastructure
Data access environment
Data quality strategy
Data warehouse administration strategy
Metadata strategy
Training strategy
Identifying Warehouse Strategy Phase Deliverables (continued)
Strategy Deliverable               Description

Data warehouse architecture        Documents the set of rules or structures providing the
                                   framework for the centralized data warehouse, data
                                   marts, metadata repository, fact tables,
                                   multidimensional structures, and data access
                                   components

Technical infrastructure           Outlines the technologies, platforms, databases,
                                   gateways, and other components necessary to make
                                   the architecture functional

Data access environment            Documents the identification, selection, and design of
                                   tools that support end-user access to the warehouse
                                   data

Data quality strategy              Outlines the approach for data management, error and
                                   exception handling, data cleansing, and the audit and
                                   control of the data

Data warehouse administration      Documents the warehouse administration tasks and
strategy                           considerations such as version control, archive,
                                   backup, and analysis of metadata and query profiles
                                   for optimization

Metadata strategy                  Documents the strategy for capturing, integrating, and
                                   accessing metadata for all components of the
                                   warehouse environment

Training strategy                  Outlines the development and end-user training
                                   requirements, identifies the technical and business
                                   personnel requiring training, and establishes time
                                   frames for executing the training plans
Defining the Warehouse Project Scope
Focus on the business, not the technology
Break down the project into manageable phases
Encourage rapid turnaround on deliverables
Always include the end users on the team
Phases
Strategy
Scope
Analysis
Design
Build
Production
Identifying Project Scope Phase Deliverables
Defining the Warehouse Project Scope
Without a complete understanding of the business objectives and scope of the overall
warehouse, project staff cannot proceed successfully.
Focus on the Business, Not the Technology
Iterative development requires discipline in scoping deliverables. A clear business
focus, rather than technology considerations, should drive scope. A realistic scope that
produces deliverables in short time frames helps ensure success and continued
management commitment to the data warehouse implementation.
Break the Project Down into Manageable Phases
One challenge in defining manageable phases is dealing with numerous tasks coupled to
numerous interdependencies, all occurring within a short time frame. Breaking this
complexity down into manageable pieces improves the project's chances of success.
Define Deliverables
As each phase is broken down into a collection of processes, define the expected
deliverables for each task.
Involve End Users
Iterative development works only when users are active participants on the delivery
team. In a data warehouse project, no decision should be purely technical: business
requirements drive all technical decisions.
Scope Phase Deliverables
The Scope Phase
Business requirements definition
Data sources
Load and refresh plans
Technical architecture
Data warehouse architecture
Phases
Strategy
Scope
Analysis
Design
Build
Production
Defining the Warehouse Project Scope (continued)
The deliverables for the scope phase focus on clearly defining the scope and objectives
for the incremental development effort. Initial models are created, data sources are
documented, and the scope of data quality is clearly defined. The technical
architecture and data warehouse architecture are also created.
Scope Deliverable                  Description

Business requirements definition   Documents the objectives and defines the
                                   development efforts for the business requirement task
                                   (The scope clearly outlines the requirements,
                                   functionality, expected benefits, and costs of the
                                   solution. Success criteria and business constraints are
                                   also documented.)

Data sources                       Outlines the operational and external data source
                                   systems, hardware and software platforms, types of
                                   data in the system, frequency and source of updates

Load and refresh plans             Documents how extraction, transformation, and
                                   transportation will be performed

Technical architecture             Documents capacity planning, interface requirements,
                                   hardware architecture, software, tools, and
                                   configuration requirements

Data warehouse architecture        Outlines the database objects, data access components,
                                   and metadata repository
Scope Phase Deliverables
Phases
Strategy
Scope
Analysis
Design
Build
Production
The Scope Phase
Data quality
Warehouse administration plan
Metadata integration plan
Data access plan
Training plan
Defining the Warehouse Project Scope (continued)
Scope Deliverable                  Description

Data quality                       Documents the plan for data cleansing and scrubbing,
                                   error and exception handling, auditing, and feeding
                                   back corrected data to source systems

Warehouse administration plan      Documents the tasks, resources, and time frames for
                                   producing the warehouse administration functionality

Metadata integration plan          Outlines the tasks, resources, and time frames needed
                                   to ensure the metadata is integrated with the data
                                   warehouse components

Data access plan                   Documents the data access tasks for implementing an
                                   existing tool or developing a system to provide access
                                   capabilities

Training plan                      Outlines the training needed to support the tasks of the
                                   current phase
Summary
This lesson discussed the following topics:
Cultivating management support, both financial and political, for the warehouse
Developing a realistic scope that produces deliverables in short time frames to help ensure success
Assessing your organization's readiness for a data warehouse
Setting realistic expectations
Summary
This lesson discussed the following topics:
Cultivating management support, both financial and political, for the warehouse
Developing a realistic scope that produces deliverables in short time frames to help
ensure success
Assessing your organization's readiness for a data warehouse
Setting realistic expectations
Practice 5-1 Overview
This practice covers the following topics:
Generating a warehouse organizational readiness checklist
Generating a warehouse strategy deliverables checklist
Generating a warehouse project scope deliverables checklist
Practice 5-1
Warehouse Organizational Readiness Checklist
1 For each item in the following list that measures warehouse readiness, rate your
own organization's readiness. Rate each item's relative importance in measuring
your organization's readiness.
Readiness Measure                                  Your Organization's Readiness
Are the objectives and business drivers clearly defined, compelling, and agreed upon?
Have you selected a methodology for design, development, and implementation?
Is the project scope clearly defined, with a focus on business rather than technology?
Is there strong support from a business management sponsor?
Does the business management sponsor have specific expectations?
Are there cooperative relations between business and Information Systems staff?
Have you identified which source data will be used to populate the data warehouse?
What is the quality and cleanliness of the source data?
Are you authorized to choose and acquire hardware and software to implement the warehouse?
Are you prepared to select and train your implementation team?
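One possible way to combine the two ratings requested in this exercise is an importance-weighted average of the readiness ratings. The Python sketch below illustrates that idea. The 1-to-5 rating scales, the sample ratings, the abbreviated measure names, and the threshold for flagging weak measures are all assumptions made for this example. The course itself does not prescribe a scoring method.

```python
# Hypothetical ratings: measure -> (readiness, importance), each on an
# assumed scale of 1 (low) to 5 (high).
RATINGS = {
    "Objectives and business drivers agreed upon": (4, 5),
    "Methodology selected": (3, 3),
    "Business management sponsor support": (2, 5),
    "Source data quality": (1, 4),
}


def readiness_score(ratings):
    """Return the importance-weighted average readiness (same 1-5 scale)."""
    total_weight = sum(importance for _, importance in ratings.values())
    if total_weight == 0:
        raise ValueError("at least one measure needs a nonzero importance")
    weighted = sum(readiness * importance for readiness, importance in ratings.values())
    return weighted / total_weight


def weakest_measures(ratings, threshold=2):
    """Return measures rated at or below the threshold, most important first."""
    weak = [(name, imp) for name, (ready, imp) in ratings.items() if ready <= threshold]
    return [name for name, _ in sorted(weak, key=lambda item: -item[1])]


print(round(readiness_score(RATINGS), 2))
print(weakest_measures(RATINGS))
```

A single score hides detail, so the list of weak but important measures is usually the more useful output when deciding whether to delay the project.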
Warehouse Strategy Deliverables Checklist
2 Form into small groups, and consider each of the following strategy deliverables.
For each deliverable, discuss briefly whether you would use it in your own strategy
checklist back at your workplace, and rate its importance relative to the other
deliverables.
Strategy Deliverable       Description                         Will You Use? Why?

Business goals and         Documents the strategic
objectives                 business goals and objectives

Data warehouse             Documents the purpose and
purpose, objectives,       objectives of the enterprise data
and scope                  warehouse, its scope, and how
                           it is intended to be used.

Enterprise data            High-level, logical information
warehouse logical          model that diagrams the major
model                      entities and relationships for the
                           enterprise

Incremental milestones     Documents a realistic scope of
                           the data warehouse, acceptable
                           delivery milestones for each
                           increment, and source data
                           availability

Source system data         Outlines source system data,
flows                      where it originates, the flow of
                           data between business functions
                           and source systems, degree of
                           reliability, and data volatility

Subject area gap           Documents the variance
analysis                   between the information
                           requirements and the ability of
                           the data sources to provide the
                           information

Data acquisition           Documents the approach for
strategy                   extracting, transforming,
                           transporting, and loading data
                           from the source systems to the
                           target environments for the
                           initial load and subsequent
                           refreshes
Warehouse Strategy Deliverables Checklist (continued)
Strategy Deliverable       Description                         Will You Use? Why?

Data warehouse             Documents the set of rules or
architecture               structures providing the
                           framework for the centralized
                           data warehouse, data marts,
                           metadata repository, fact tables,
                           multidimensional structures,
                           and data access components

Technical infrastructure   Outlines the technologies,
                           platforms, databases, gateways,
                           and other components
                           necessary to make the
                           architecture functional

Data access                Documents the identification,
environment                selection, and design of tools
                           that support end-user access to
                           the warehouse data

Data quality strategy      Outlines the approach for data
                           management, error and
                           exception handling, data
                           cleansing, and the audit and
                           control of the data

Data warehouse             Documents the warehouse
administration strategy    administration tasks and
                           considerations such as version
                           control, archive, backup, and
                           analysis of metadata and query
                           profiles for optimization

Metadata strategy          Documents the strategy for
                           capturing, integrating, and
                           accessing metadata for all
                           components of the warehouse
                           environment

Training strategy          Outlines the technical and
                           business personnel requiring
                           training, and establishes time
                           frames for executing the
                           training plans
Warehouse Project Scope Deliverables Checklist
3 Staying in your small group, discuss each of the following project scope
deliverables. For each deliverable, discuss briefly whether you would use it in your
own project scoping checklist back at your workplace, and rate its importance
relative to the other deliverables.
Scope Deliverable | Description | Will You Use? Why?
Business requirements definition | Documents the objectives and defines the development efforts for the business requirement task (The scope clearly outlines the requirements, functionality, expected benefits and costs of the solution. Success criteria and business constraints are also documented.) |
Data sources | Outlines the operational and external data source systems, hardware and software platforms, types of data in the system, frequency and source of updates |
Load and refresh plans | Documents how extraction, transformation, and transportation will be performed |
Technical architecture | Documents capacity planning, interface requirements, hardware architecture, software, tools and configuration requirements |
Data warehouse architecture | Outlines the database objects, data access components, and metadata repository |
Data quality | Documents the plan for data cleansing and scrubbing, error and exception handling, auditing, and feeding back corrected data to source systems |
Warehouse administration plan | Documents the tasks, resources, and time frames for producing the warehouse administration functionality |
Metadata integration plans | Outlines the tasks, resources, and timeframes needed to ensure that the metadata is integrated with the data warehouse components |
Data access plan | Documents the data access tasks for implementing an existing tool or developing a system to provide access capabilities |
Training plan | Outlines the training needed to support the tasks of the current phase |
Lesson 6: Analyzing User Query Needs
[Slide: Overview. Course road map with the blocks Project Management (Methodology, Maintaining Metadata); Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Analyzing User Query Needs (highlighted); Choosing a Computing Architecture; Modeling the Data Warehouse; Planning Warehouse Storage; ETT (Building the Warehouse); Meeting a Business Need; Supporting End User Access; Managing the Data Warehouse.]
Overview
The previous lesson covered planning for a successful warehouse. This lesson
discusses analyzing user query needs. Note that the Analyzing User Query Needs
block is highlighted in the course road map on the facing page.
Specifically, this lesson identifies the analysis required to identify and categorize users
who may need to access data from the warehouse. This lesson also helps you
determine how their requirements differ. Data access and reporting tools are
considered.
Objectives
After completing this lesson, you should be able to do the following:
Identify the warehouse users
Identify how to gather user requirements
Identify tasks involved with managing query access
Identify the different database models that support OLAP query tools
Describe query access architectures
[Slides: Types of Users and User Access. The user types are executives, casual users or managers, and business analysts or power users; the User Access graphic is labeled from Structured to Unstructured.]
Types of Users
In any warehouse environment, the user communities and their query requirements
vary according to their roles and responsibilities.
Types of Users | Definition | Requirements
Executives | They are in charge of the business and have overall responsibility for controlling the business at an enterprise level, determining profitability, competitiveness, and strategy. They need to see bottom-line figures. | They may interface with the warehouse only through printed reports, although they will experience the power of the data warehouse as the reports become more accurate, more consistent, and easier to produce. Their needs drive the development of the applications, the architecture of the warehouse, the data it contains, and the priorities for implementation.
Casual users or managers | They are in charge of a smaller component of the business and need information to manage the profitability, direction, planning, and control of that subset of the business. They also need to see the enterprisewide picture in order to fit localized plans into the corporate goal. | They need an easy-to-use tool that helps them specify what they want to see and that determines on its own how to produce the desired results. The tool must allow construction of all the reporting elements without being too complicated. A single interface and invisible multipass SQL are critical.
Business analysts or power users | They have a solid understanding of the business process and also have a technical understanding of dimensional modeling and SQL, which are required to extract the answers to business questions from the data warehouse and produce the reports needed by the managers and executives. They often function as a liaison between business and technical groups. | They need a tool that reflects the way they would break down and solve the business problem. The tool should handle reporting elements such as ranking and comparison across summary levels.
Gathering User Requirements


Areas to focus:
How users do business and what the business
drivers are
What attributes users need (required versus good
to have)
What the business hierarchies are
What data users use and what they like to have
What levels of detail or summary needed
What type of front-end data access tool used
How users expect to see the query results
Gathering User Requirements: Possible Obstacles
The following are some of the possible obstacles:
Business objective of the data warehouse has not
been specifically defined
Scope of the data warehouse is too broad
Misunderstanding about the purpose and function
of decision support systems and operational
systems
Gathering User Requirements
You must approach the gathering of data warehouse end-user requirements in a
radically different way from that used for operational systems.
The following are the areas to focus on when gathering user requirements:
How users do business
What the business drivers are
What attributes users need
Which attributes are absolutely required and which attributes are good to have
What the business hierarchies are
What data users use now and what they would like to have
What levels of detail or summary the users need
What type of front-end data access tool will be used
How the users expect to see the results of their queries
The following are some of the possible obstacles to gathering user requirements.
The business objective of the data warehouse has not been specifically defined
The scope of the data warehouse is too broad
There is a misunderstanding about the purpose and function of decision support
systems and operational systems
Data Access Strategy


Define user requirements early
Determine the choice of tools early
Identify user roles and access requirements
Managing User Data Access
Data Access Tool Requirements
The front-end tools must be able to associate the business terms that users employ
from day to day with clear, easy-to-understand data definitions. This enables users
to become productive with the product quickly, without extensive training.
Metadata provides definitions of the data in simple, straightforward business
terminology that the user can understand.
The tool must be flexible enough to support different reporting requirements, such as:
Simple reports
Complex trend analysis
Regression analysis
Multidimensional data analysis
Exceptions reporting
Forecasting
Data manipulation
Data mining
Parameterized reports for batch execution
Web-based or client-server based (or both)
Data Access Strategy
Because the data, and access to that data, matter so much to warehouse users, the
choice of tools that users will employ is a primary concern. Define the user
requirements and determine the tools early in the definition of the data warehouse.
User Query Progression


Starts simple
Becomes more analytical
Requires different techniques and flexible tools
[Slide graphic: What? followed by Why? Why? Why?]
User Query Progression
The tools that you employ must provide the flexibility to answer a user's immediate
and future needs. The answer to a question may not be immediately obvious, and one
question can often lead to another. Querying the warehouse is an iterative process.
For example, a user may start with a query that answers a reasonably simple question,
such as: What are the sales figures for Sprock tennis rackets during the first half of
1999 in the U.S.A. as a whole?
Once the query is answered, the user may start to ask more analytical questions, such
as: Why did the sales figures for Sprock tennis rackets in the U.S.A. increase during
that period?
The answer proves to be that the World Tennis Championships ran in Miami in March
1999. Obviously, tennis caught everyone's attention. Now that the answer is known,
the process continues: Which U.S. state sold the most Sprock tennis rackets? Why?
To answer these types of questions, the user needs to be able to analyze data in a
number of different ways.
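The progression from "What?" toward "Why?" can be sketched as a sequence of queries against the same data, each one slicing it a different way. The following is a minimal illustration only: it uses Python's built-in sqlite3 module rather than an Oracle warehouse, and the sales table, its columns, and every figure in it are hypothetical, invented for this example.

```python
import sqlite3

# Hypothetical fact table and figures, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, state TEXT, sale_month TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Sprock tennis racket", "FL", "1999-03", 900),
        ("Sprock tennis racket", "FL", "1999-04", 400),
        ("Sprock tennis racket", "CA", "1999-03", 500),
        ("Sprock tennis racket", "NY", "1999-02", 300),
        ("Sprock tennis racket", "NY", "1999-08", 250),  # outside the first half
        ("Other racket", "FL", "1999-03", 100),
    ],
)

FIRST_HALF = ("Sprock tennis racket", "1999-01", "1999-06")

# Step 1, "What?": total units for the product in the first half of 1999.
total_units = conn.execute(
    "SELECT SUM(units) FROM sales "
    "WHERE product = ? AND sale_month BETWEEN ? AND ?",
    FIRST_HALF,
).fetchone()[0]

# Step 2, toward "Why?": break the same figure down by month to find the peak.
units_by_month = conn.execute(
    "SELECT sale_month, SUM(units) FROM sales "
    "WHERE product = ? AND sale_month BETWEEN ? AND ? "
    "GROUP BY sale_month ORDER BY SUM(units) DESC",
    FIRST_HALF,
).fetchall()

# Step 3, the follow-up question: rank the states by units sold.
units_by_state = conn.execute(
    "SELECT state, SUM(units) FROM sales "
    "WHERE product = ? AND sale_month BETWEEN ? AND ? "
    "GROUP BY state ORDER BY SUM(units) DESC",
    FIRST_HALF,
).fetchall()
```

Each step reuses the same filter and changes only the grouping, which is why a tool that lets the user regroup and re-rank without rewriting the whole query suits this iterative style.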
Training
Methods
Informal: one-to-one or small class
Formal: larger class
Self-study
Basic topics
Logging on
Accessing metadata
Creating and submitting a query
Interpreting results
Saving queries and storing results
Utilizing resources
Learning warehouse fundamentals
ILT
IDL
CBT
Training the Users
Training Methods Users must be trained in using the system you have put in place.
There are a number of ways of teaching. The common methods are:
Informal sessions with a small number of users who can disseminate the information after the class (Typically the sessions are on a one-to-one basis, as there are few real users of the warehouse initially.)
Formal sessions in a classroom environment with larger numbers of students
Self-study using interactive video, computer-based training (CBT), or reference manuals
Fundamental Training Topics The basic training should include some of the
following fundamental topics:
How to switch on the hardware and log on to the data warehouse
How to find out what data is there (access the metadata) and interpret its meaning
How to create and issue a query
How to prioritize queries
How to monitor query execution
How to interpret query results
How to save the query and store results
How resources are used within the query environment, particularly where query governors are in place (as in a warehouse)
How the warehouse works:
Where the data comes from
The level of data quality and integrity (or lack of it)
What mapping is and how it is important
Backup and recovery responsibilities (if any)
Data and query availability
Scheduled downtime
Query Efficiency
User considerations
Successful completion
Faster query execution
Less CPU used
More opportunity for further analysis
Query Efficiency
Designer considerations
Use indexes
Select minimum data
Employ resource governors
Minimize bottlenecks
Develop metrics
Use prepared and tested queries
Use quiet periods
Query Efficiency
User's Perspective An efficient query has the following characteristics from a
user's perspective:
Runs successfully and completely, and produces the desired results
Takes less time to run and is therefore more beneficial to productivity
Uses less CPU power and therefore costs less if charges are levied
Enables the user to move on more quickly to further analysis
Designer's Role Efficient query access depends on good design of the data
warehouse. The following points are important to ensure query efficiency:
Create indexes on key values to minimize full-table scans.
Select only the minimum amount of data required.
Administer resource governors on the server to:
  Prevent access
  Cut off a query after it has run for a specified time
  Inform the user how long a query will take (Resource governors may be set for the entire application or by user group. Governors are vital where data volumes are very large.)
Minimize intensive I/O bottlenecks.
Develop metrics to support queries.
Make more use of prepared and tested queries.
Submit large jobs out of working hours, or when CPU usage, network, and I/O contention is minimal.
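The first two designer points (index the key values, select only the data required) can be observed in a query plan. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, not Oracle, and the sales table and idx_sales_product index are hypothetical names. The exact wording of the plan text varies between SQLite versions, so only its leading keyword is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the names are placeholders for this example.
conn.execute("CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, i % 50, 10.0) for i in range(1000)],
)

def plan_for(sql):
    """Return the plan steps SQLite reports for a statement, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # column 3 holds the human-readable detail

# Select only the column that is needed rather than every column.
query = "SELECT amount FROM sales WHERE product_id = 7"

plan_before = plan_for(query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")
plan_after = plan_for(query)    # with the index: the plan becomes an index search
```

Comparing plan_before with plan_after shows the change from a table scan to an index search; an Oracle execution plan would be read in the same spirit, although its format differs.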
Note: The Database Resource Manager in Oracle8i enables you to control and limit
the total amount of processing resources available to a given user or set of
users. Using this facility, you can:
Guarantee certain users a minimum amount of processing resources regardless of the load on the system and the number of users
Distribute available processing resources by allocating percentages of CPU time to different users and applications
Limit the degree of parallelism that a set of users can use
Configure an instance to use a particular method of allocating resources
Select the priority from a given set of priorities that the DBA has assigned to the user
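The governor behavior of cutting off a query after it has run for a specified time can be sketched in a few lines. This is not the Oracle8i Database Resource Manager and does not use its interface; it is a stand-in built on SQLite's progress handler through Python's sqlite3 module, and the function name run_with_time_limit and the time limits are assumptions made for the example.

```python
import sqlite3
import time

def run_with_time_limit(conn, sql, max_seconds):
    """Run sql, cutting it off once it has run for max_seconds."""
    deadline = time.monotonic() + max_seconds

    def over_budget():
        # A non-zero return value tells SQLite to abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Call over_budget after every 1,000 SQLite virtual-machine instructions.
    conn.set_progress_handler(over_budget, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # In this sketch any OperationalError is treated as a cut-off.
        return None
    finally:
        conn.set_progress_handler(None, 0)  # remove the governor again

conn = sqlite3.connect(":memory:")

# A cheap query finishes well inside the limit.
quick = run_with_time_limit(conn, "SELECT 1 + 1", max_seconds=1.0)

# A deliberately expensive query (counting to 100 million) is cut off.
runaway_sql = (
    "WITH RECURSIVE counter(n) AS ("
    "SELECT 1 UNION ALL SELECT n + 1 FROM counter WHERE n < 100000000"
    ") SELECT COUNT(*) FROM counter"
)
slow = run_with_time_limit(conn, runaway_sql, max_seconds=0.05)
```

A production governor would also report back to the user, for example with an estimate of how long the query would have taken, rather than silently returning nothing.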
Charge Models
Examples of charge models:
Flat allocation model
Transaction-based model
Telephone service model
Cable TV model
Develop your own unique model
Avoid a charge model that
discourages users from using
the warehouse
Charging for Data Warehouse Access
At some point the IT Department might need to start charging user groups for data
warehouse usage, as a way of obtaining continuous funding for the data warehouse
initiative. The chargeback schemes will work only if there are reliable mechanisms to
track and monitor usage of the warehouse per user.
Charge Models There are a number of different models that may be used to charge
for services. Some examples are:
Flat allocation model: The cost is allocated by a central group (Financial Controller) based on the percentage of resources used by the organization, such as office space, number of users, and budgets.
Transaction-based model: The cost is based on query usage, which may mean calculations based on CPU use, I/O, data, or table elements accessed and reported.
Telephone service model: The cost is based on connection time.
Cable TV model: The cost is based on simple standard service charges plus charges for special services.
Some of these models may not apply to your installation; you may consider
developing a unique model based on your own requirements.
Note: Whatever model you employ should balance the users' need to access data
against the cost of providing that data, without discouraging use.
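The arithmetic behind the four charge models can be compared side by side. The following sketch is illustrative only: every rate, usage figure, and function name is hypothetical, and the transaction-based model is reduced to CPU seconds, although I/O or data volume could be priced the same way.

```python
from decimal import Decimal

# All rates and usage figures below are hypothetical.

def flat_allocation(total_cost, share_of_resources):
    """Flat allocation: a central group splits the total cost by percentage share."""
    return total_cost * share_of_resources

def transaction_based(cpu_seconds, rate_per_cpu_second):
    """Transaction-based: the charge follows measured query usage (CPU only here)."""
    return cpu_seconds * rate_per_cpu_second

def telephone_service(connect_minutes, rate_per_minute):
    """Telephone service: the charge follows connection time."""
    return connect_minutes * rate_per_minute

def cable_tv(standard_charge, special_service_charges):
    """Cable TV: a standard service charge plus charges for special services."""
    return standard_charge + sum(special_service_charges, Decimal("0"))

monthly_cost = Decimal("10000")
charges = {
    "flat": flat_allocation(monthly_cost, Decimal("0.25")),
    "transaction": transaction_based(Decimal("12000"), Decimal("0.15")),
    "telephone": telephone_service(Decimal("3000"), Decimal("0.50")),
    "cable_tv": cable_tv(Decimal("1500"), [Decimal("200"), Decimal("350")]),
}
```

Running the same user group through each function makes it easier to see which model would penalize heavy but legitimate use, which is the outcome the note above warns against.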
Query Scheduling and Monitoring


Query scheduling
Manages information usage
Directs queries
Executes queries
Sets job queue priorities
Query monitoring
Track resource-intensive queries
Detect unused queries
Catch queries that use summary data
inefficiently
Catch queries that perform regular summary
calculations at the time of query execution
Detect illegal access
Query Management and Monitoring Tools


Use tools, schedulers, Oracle Enterprise Manager
Consider
Automation levels
Technology interfaces
Cost
Managing Queries
Query Scheduling Once the warehouse is operational, queries are submitted to the
warehouse server. You need to create a process that:
Manages the use of information in the data warehouse
Directs queries to the appropriate data source, using metadata
Schedules the execution of a query
Sets job queue priorities
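The scheduling process described above can be sketched as a small priority queue. This is a toy model, not a feature of any Oracle product: the QueryScheduler class, the subject-to-source mapping that stands in for metadata, and the query names are all hypothetical.

```python
import heapq
import itertools

class QueryScheduler:
    """Toy job queue: lower priority numbers run first, ties run in submission order."""

    def __init__(self, source_for_subject):
        # Hypothetical metadata: maps a subject area to the data source that holds it.
        self._source_for_subject = source_for_subject
        self._queue = []
        self._sequence = itertools.count()

    def submit(self, name, subject, priority):
        source = self._source_for_subject[subject]  # direct the query using metadata
        heapq.heappush(self._queue, (priority, next(self._sequence), name, source))

    def run_all(self):
        """Pop jobs in priority order and return (name, source) for each one."""
        executed = []
        while self._queue:
            _, _, name, source = heapq.heappop(self._queue)
            executed.append((name, source))  # a real scheduler would execute here
        return executed

scheduler = QueryScheduler({"sales": "sales_mart", "finance": "central_warehouse"})
scheduler.submit("year_end_rollup", "finance", priority=5)
scheduler.submit("daily_sales_summary", "sales", priority=1)
scheduler.submit("regional_sales_trend", "sales", priority=5)
run_order = scheduler.run_all()
```

The sequence counter keeps jobs of equal priority in the order they were submitted, so a long-queued job is not overtaken by a later one at the same priority.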
Query Monitoring You need to keep a check on warehouse query activity. The
query management program (or tool) must:
Track resource-intensive queries, which require analysis to identify why they are so resource-intensive, followed by tuning to improve performance.
Detect queries that are never used and remove them. Remember to advise users of this kind of change.
Catch queries that use summary data inefficiently; the summary strategy may need revision.
Catch queries that perform regular summary calculations at the time of query execution. You may decide to include another summary table in the data warehouse with the presummarized data to provide immediate access, which improves overall speed of access.
Detect illegal access. The user concerned may need access to data that is currently denied.
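Two of the monitoring checks above, tracking resource-intensive queries and detecting queries that are never used, reduce to simple passes over a query log. The sketch below assumes a log of (query name, CPU seconds) pairs and a catalog of saved queries; both structures, the names in them, and the 60-second threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical query log: (query name, CPU seconds used by one execution).
query_log = [
    ("monthly_sales_by_region", 420.0),
    ("top_ten_products", 3.5),
    ("monthly_sales_by_region", 390.0),
    ("customer_lookup", 0.4),
]
# Hypothetical catalog of saved (prepared) queries.
saved_queries = {"monthly_sales_by_region", "top_ten_products",
                 "customer_lookup", "old_campaign_report"}

def resource_intensive(log, cpu_threshold):
    """Return names whose average CPU seconds per run exceed the threshold."""
    runs = defaultdict(list)
    for name, cpu_seconds in log:
        runs[name].append(cpu_seconds)
    return sorted(n for n, cpu in runs.items() if sum(cpu) / len(cpu) > cpu_threshold)

def never_used(log, catalog):
    """Return saved queries that do not appear in the log at all."""
    return sorted(catalog - {name for name, _ in log})

tuning_candidates = resource_intensive(query_log, cpu_threshold=60.0)
removal_candidates = never_used(query_log, saved_queries)
```

A query that appears in tuning_candidates on a regular schedule is also a candidate for a presummarized table, as the fourth monitoring point suggests.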
Query Management and Monitoring Tools For scheduling you can use custom in-
house developed programs, a UNIX scheduler, third-party tools, or Oracle Enterprise
Manager.
For monitoring you may use the DSS tools themselves (where they have the
capability), in-house developed tools, and server management products such as Oracle
Enterprise Manager.
Consider the automation levels, technology interfaces, and cost of the query
management and monitoring tools before purchasing them.
Security
Do not overlook
Subject area sponsors:
Review and authorize
request for access
rights
Identify enhancements
Transparent security
Easy to implement,
maintain, and manage
Security Plan
Define a strategy:
Allocate business area owners
Ensure invisibility
Ensure easy management
Consider auditing
Manage passwords
Security
Security is commonly controlled by the database administrator (DBA). It must be
considered early in development to ensure that access to information, a key
company resource that needs protection, is controlled. Never assume that you can
overlook security because user access is query-only. There are some simple
guidelines on security that you can follow:
Ensure that each subject area has a sponsor who can carry out the following tasks:
Review and authorize requests for access rights
Identify further enhancements to the security setup (Data may be separated
into that which is accessible to all users and that which is accessible to a select
few.)
Ensure that the security is transparent and does not impair access from the user
perspective
Ensure that the strategy is easy for you to implement, maintain, and manage
Security Plan
Allocate an owner to every business area within the warehouse. The owner should
be able to advise what access any requestor should be given and define the data
that can be made available publicly, compared with data that must be restricted.
Ensure that the security levels are virtually invisible to the users.
Ensure that you can manage and administer the security simply and define a clear,
simple strategy for:
Access requests
Allocating predefined roles, both public and restricted, to subject areas
Auditing to identify unauthorized access attempts
Password management
Role-Based Security
Subject area access:
Summary data for new
users
All data for experienced
users
Departmental access
Limited object access
Access during load
Application Context and Fine-Grained Access Control in Oracle8i
[Slide graphic: application context (Who am I? Where am I?), access policy, and table]
Role-Based Security
Use database roles, the same technique that you use in an operational
environment. However, you need to consider implementing role-based security
somewhat differently, because of the differences in the way the warehouse and
operational systems work.
For example, you should set up roles that do the following jobs:
Provide users with access to specific subject areas
Provide users with access by department
Limit access to specific objects within any subject area
Control access when loading data (You need a role to REVOKE and a role to
GRANT if you are using Oracle databases.)
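The role jobs listed above can be modeled in a few lines to show how they interact. The sketch below is a toy model and not a database API: the RoleCatalog class, the role names, and the object names are hypothetical, and the begin_load and end_load methods are an assumption that stands in for revoking and re-granting access around a load.

```python
class RoleCatalog:
    """Toy model of role-based access; it is not a database API."""

    def __init__(self):
        self._role_objects = {}   # role name -> set of objects the role may query
        self._user_roles = {}     # user name -> set of role names
        self._loading = set()     # objects currently being loaded

    def define_role(self, role, objects):
        self._role_objects[role] = set(objects)

    def grant(self, user, role):
        self._user_roles.setdefault(user, set()).add(role)

    def revoke(self, user, role):
        self._user_roles.get(user, set()).discard(role)

    def begin_load(self, obj):
        self._loading.add(obj)      # block queries while the object is loaded

    def end_load(self, obj):
        self._loading.discard(obj)  # restore access once the load is complete

    def can_query(self, user, obj):
        if obj in self._loading:
            return False
        return any(obj in self._role_objects[role]
                   for role in self._user_roles.get(user, ()))

catalog = RoleCatalog()
catalog.define_role("sales_summary", {"sales_monthly_summary"})
catalog.define_role("sales_all", {"sales_monthly_summary", "sales_detail"})
catalog.grant("new_user", "sales_summary")   # summary data for a new user
catalog.grant("analyst", "sales_all")        # all data for an experienced user

new_user_sees_detail = catalog.can_query("new_user", "sales_detail")
analyst_sees_detail = catalog.can_query("analyst", "sales_detail")

catalog.begin_load("sales_detail")
analyst_during_load = catalog.can_query("analyst", "sales_detail")
catalog.end_load("sales_detail")
analyst_after_load = catalog.can_query("analyst", "sales_detail")
```

Keeping the load switch separate from the role definitions means the roles themselves do not have to be redefined for each load window.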
Fine-Grained Access Control in Oracle8i
Fine-grained access control gives customers a way to extend their table-based and
view-based security to finer levels of granularity than previously possible. It is
implemented by attaching security policies to tables or views. These security policies
can limit access by users to only specific rows within the table or view.
Application Context Application context is a feature related to fine-grained access
control that can be used to implement a security policy. It is provided so that customers
who want to implement fine-grained access control can base their security policies on
information about the user, such as who the user is, which machine the user is working
on, and what the user's management hierarchy is. Application context provides a secure
framework to store such information so that it can be used to control access to
database objects.
The justification for fine-grained access control is as follows:
Application-based security can be bypassed.
Views work best for a limited number of user groups.
Internet and remote access demand data-driven, user-based security.
Privacy requirements must be met (for example, in medical, human resources, and defense applications).
Building security in one place reduces cost of ownership.
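The idea of a security policy that limits a user to specific rows, driven by information held in an application context, can be emulated as follows. This is an emulation only and does not use Oracle8i's own interface: it runs on SQLite through Python's sqlite3 module, the context is a plain dictionary, and the orders table, the region_policy function, and the user names are hypothetical. Note that here the policy lives in application code, whereas the point made above is that Oracle8i attaches the policy to the table or view itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EAST", 100.0), (2, "WEST", 250.0), (3, "EAST", 75.0)],
)

def region_policy(context):
    """Policy function: return a row predicate and its bind values for this user."""
    if context.get("role") == "auditor":
        return "1 = 1", ()                      # auditors see every row
    return "region = ?", (context["region"],)   # everyone else sees one region

def query_orders(context):
    """Every query in this sketch goes through here, so the predicate is always added."""
    predicate, binds = region_policy(context)
    sql = "SELECT order_id FROM orders WHERE " + predicate + " ORDER BY order_id"
    return [row[0] for row in conn.execute(sql, binds)]

east_rows = query_orders({"user": "pat", "region": "EAST"})
all_rows = query_orders({"user": "sam", "role": "auditor"})
```

Because the predicate is derived from the context rather than written into each query, the same statement returns different rows for different users, which is the data-driven, user-based behavior the justification list calls for.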
Comparing OLAP and DSS


OLAP is used for multidimensional analysis.
DSS provides a system enabling decision making.
OLAP tools provide a DSS capability.
OLAP for the warehouse provides analytical power.
Other terms: EIS, KBS

The Functionality of OLAP


Rotate and drill down to successive levels of detail.
Create and examine calculated data interactively on large volumes of data.
Determine comparative or relative differences.
Perform exception and trend analysis.
Perform advanced analytical functions, for example, forecasting, modeling, and
regression analysis.
OLAP
The term online analytical processing (OLAP) was coined by Dr. E. F. Codd to
describe a technology that could bridge the gap between personal computing and
enterprise data management. Decision support systems (DSS) are systems that enable
decision makers in organizations to access data relevant to the decisions they are
required to make. The definitions of OLAP and DSS are often confused with each
other.
Comparing OLAP and DSS
OLAP: Online analytical processing covers a wide spectrum of usage and a wide
variety of requirements. It has been defined in a number of ways, for example, as a
loosely defined set of principles that provide a dimensional framework for decision
support. Essentially, OLAP is a flexible analytical tool that is commonly used to
analyze and interpret data in a data warehouse or data mart.
DSS: Decision support systems are not new; they have been around for many years.
In an earlier lesson, you saw that decision support systems were provided with
information obtained from data extract processing.
DSS, therefore, provide users with data, enabling decision making. They may or may
not be built on a data warehouse or data mart. They may instead work against an
operational environment, or an operational environment with data extracts used for
specific decision-making activities.
There is little distinction between decision support and online analytical processing.
Online analytical processing tools provide a decision support capability. Both online
analytical processing and decision support query and reporting tools provide the
means for informed decision making.
OLAP and DSS Tools for the Warehouse
Ultimately, online analytical processing tools and decision support tools that are
designed to access warehouse data are more flexible and more capable of true analysis
than standard reporting tools typically used to access relational operational data.
The Functionality of OLAP: OLAP provides much more than just the ability to
perform rotating or drilling down. It offers the ability to create and examine calculated
data interactively on large volumes of data, the ability to determine comparative or
relative differences, as well as the ability to perform exception and trend analysis on
calculated data. Some of the advanced analytical functions of OLAP are forecasting,
modeling, regression analysis, and solving simultaneous equations.
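The operations named above can be illustrated on a very small data set. The sketch below
uses made-up sales figures and plain Python, not an OLAP product: it aggregates at a
summary level, drills down one level, rotates the view so that a different dimension
leads, and computes a comparative difference. The field names and values are assumptions
chosen for the illustration.

```python
# Illustrative sketch with made-up sales figures; not tied to any OLAP product.
from collections import defaultdict

# Each fact: (year, quarter, product, sales)
FACTS = [
    (1997, "Q1", "P1", 100), (1997, "Q2", "P1", 120),
    (1997, "Q1", "P2", 80),  (1997, "Q2", "P2", 60),
    (1998, "Q1", "P1", 130), (1998, "Q2", "P1", 150),
    (1998, "Q1", "P2", 90),  (1998, "Q2", "P2", 70),
]
FIELDS = {"year": 0, "quarter": 1, "product": 2}

def aggregate(facts, *dims):
    """Sum sales grouped by the named dimensions, in the order given."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[FIELDS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

by_year = aggregate(FACTS, "year")                      # summary level
by_year_quarter = aggregate(FACTS, "year", "quarter")   # drill down one level
by_product_year = aggregate(FACTS, "product", "year")   # rotate: product leads

# Comparative difference: change from 1997 to 1998 for each product.
change = {
    product: by_product_year[(product, 1998)] - by_product_year[(product, 1997)]
    for product in ("P1", "P2")
}
print(by_year)
print(by_year_quarter[(1998, "Q1")])
print(change)
```

An OLAP tool performs the same kind of regrouping interactively and on far larger
volumes of data.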
Note: OLAP and DSS are also referred to as EIS (executive information systems) or
KBS (knowledge-based systems).

Original OLAP Rules


1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multiuser support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels
Original 12 OLAP Rules of Dr. E. F. Codd
The OLAP rules were originally defined by Dr. E. F. Codd. He saw the need for a
model that was more suitable for mapping to the way analysts understand the business.
1 Multidimensional conceptual view: A tool should provide users with a
multidimensional model that corresponds to the business problems and is
intuitively analytical to use.
2 Transparency: The OLAP system's technology, the underlying database and
computing architecture, and the heterogeneity of input data sources should be
transparent to users to preserve their productivity and proficiency with familiar
front-end environments and tools.
3 Accessibility: The OLAP system should access only the data actually required to
perform the analysis. Additionally, the system should be able to access data from
all heterogeneous enterprise data sources required for the analysis.
4 Consistent reporting performance: As the number of dimensions and the size of
the database increase, users should not perceive any significant degradation in
performance.
5 Client-server architecture: The OLAP system has to conform to client-server
architectural principles for maximum price and performance, flexibility,
adaptivity, and interoperability.
6 Generic dimensionality: Every data dimension must be equivalent in both
structure and operational capabilities.
7 Dynamic sparse matrix handling: The OLAP system has to be able to adapt its
physical schema to the specific analytical model that optimizes sparse matrix
handling to achieve and maintain the required level of performance.
8 Multiuser support: The OLAP system must be able to support a work-group of
users working concurrently on a specific model.
9 Unrestricted cross-dimensional operations: The OLAP system must be able to
recognize dimensional hierarchies and automatically perform associated roll-up
calculations within and across dimensions.
10 Intuitive data manipulation: Consolidation path reorientation, drill-down,
roll-up, and other manipulations should be accomplished through direct
point-and-click, drag-and-drop actions on the cells of the cube.
11 Flexible reporting: The ability to arrange rows, columns, and cells in a fashion that
facilitates analysis by intuitive visual presentation of analytical reports must exist.
12 Unlimited dimensions and aggregation levels: Depending on business
requirements, an analytical model may have a dozen or more dimensions, each
having multiple hierarchies. The OLAP system should not impose any artificial
restrictions on the number of dimensions or aggregation levels.
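Rule 7 does not prescribe a storage technique, but one simple way to picture sparse
matrix handling is to store only the cells that actually hold data. The sketch below is
an assumption made for illustration, not a description of any product: the dimension
sizes, the class name, and the values are hypothetical.

```python
# Illustrative sketch: a sparse cube stores only the cells that hold data.
# Dimension sizes and values are hypothetical.

DIM_SIZES = {"product": 1000, "store": 200, "month": 36}

class SparseCube:
    def __init__(self, dim_sizes):
        self.dim_sizes = dim_sizes
        self.cells = {}                      # (product, store, month) -> value

    def put(self, coords, value):
        self.cells[coords] = value

    def get(self, coords):
        return self.cells.get(coords, 0)     # empty cells read as zero

    def density(self):
        """Fraction of the logical cells that are physically stored."""
        logical_cells = 1
        for size in self.dim_sizes.values():
            logical_cells *= size
        return len(self.cells) / logical_cells

cube = SparseCube(DIM_SIZES)
cube.put((2, 50, 13), 1250.0)
cube.put((7, 50, 13), 310.0)

print(cube.get((2, 50, 13)))   # a stored cell
print(cube.get((3, 50, 13)))   # a cell that was never stored
print(cube.density())
```

Here the logical cube has 7,200,000 cells, yet only two are held in memory. A system
that follows Rule 7 would be expected to choose between dense and sparse layouts such as
this one according to the data.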

Relational Database Model

         Attribute 1   Attribute 2   Attribute 3   Attribute 4
         Name          Age           Gender        Emp No.
Row 1    Anderson      31            F             1001
Row 2    Green         42            M             1007
Row 3    Lee           22            M             1010
Row 4    Ramos         32            F             1020

The table above illustrates the employee relation.

Multidimensional Database Model


The data is found at the intersection of dimensions.
[Diagram: two cubes. FINANCE has the dimensions Store, GL_Line, and Time. SALES has the
dimensions Store, Product, Time, and Customer.]
Comparing Relational and Multidimensional Database Models
Before examining online analytical processing in any more detail, you should consider
the difference between relational and multidimensional (OLAP) database models.
The Relational Database Model: A relation is a two-dimensional table. Each row in
the table holds data that pertains to some thing or a portion of some thing. Each column
of the table contains data regarding an attribute. Rows are sometimes called tuples and
columns are sometimes called attributes.
For example, the top slide on the facing page is a sample table. Notice that it has four
rows (tuples), each made up of four columns (attributes).
The Multidimensional Database Model: You can visualize the data model for a
multidimensional database as a cube (the equivalent of a table in a relational database).
Each cube has several dimensions (equivalent to index fields in relational tables).
The cube acts like an array in a conventional programming language. Logically, the
space for the entire cube is preallocated. To find or insert data, you use the dimension
values to calculate the position.
For example, sales for Product P2, Store London, and Time Jan97 may be in position
[2,50,13]. In practice, a multidimensional product would have techniques to compress
the amount of disk space used.
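The position calculation can be sketched in a few lines. The example below assumes a
small three-dimensional cube whose members and ordering are invented for the
illustration, and it uses a plain row-major layout. A real multidimensional product
would use its own layout and compression.

```python
# Illustrative sketch: the dimension members and their order are hypothetical.
PRODUCTS = ["P0", "P1", "P2", "P3"]
STORES = ["Boston", "London", "Paris"]
TIMES = ["Jan97", "Feb97", "Mar97"]

def index_of(members):
    """Map each dimension value to its position along the dimension."""
    return {member: i for i, member in enumerate(members)}

PRODUCT_IX = index_of(PRODUCTS)
STORE_IX = index_of(STORES)
TIME_IX = index_of(TIMES)

# Space for every cell is preallocated, as in a conventional array.
cells = [0.0] * (len(PRODUCTS) * len(STORES) * len(TIMES))

def position(product, store, time):
    """Convert dimension values to coordinates, then to a row-major offset."""
    p, s, t = PRODUCT_IX[product], STORE_IX[store], TIME_IX[time]
    offset = (p * len(STORES) + s) * len(TIMES) + t
    return (p, s, t), offset

coords, offset = position("P2", "London", "Jan97")
cells[offset] = 4200.0          # insert a value at the calculated position
print(coords, offset, cells[offset])
```

No search is needed to find or insert a value: the dimension values alone determine
where the cell lives.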
In the diagram, the database contains two cubes. Sales is a four-dimensional cube of
information collected over time by store, product and customer. The Financial
information cube is three-dimensional, collected by time, store, and general ledger
account line. The store and time dimensions are common to the two cubes. Because
the database can contain many cubes, this approach is sometimes referred to as
multicube storage.
A cube can also be a formula rather than a variable. In this case the cube is stored as a
calculation formula such as Profit = Revenue - Expenses, and the data is calculated on
demand from the stored cubes for revenue and expenses. This is like a view in a
relational system.
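A formula cube of this kind can be sketched as an object that holds a calculation and no
data. The class name and the figures below are hypothetical, and the two stored cubes
are simplified to dictionaries keyed by store and time.

```python
# Illustrative sketch with made-up figures; names are hypothetical.
revenue = {("London", "Jan97"): 900.0, ("Paris", "Jan97"): 650.0}
expenses = {("London", "Jan97"): 700.0, ("Paris", "Jan97"): 500.0}

class FormulaCube:
    """Holds a calculation, not data; values are derived when requested."""
    def __init__(self, formula, *inputs):
        self.formula = formula
        self.inputs = inputs

    def __getitem__(self, coords):
        return self.formula(*(cube[coords] for cube in self.inputs))

profit = FormulaCube(lambda r, e: r - e, revenue, expenses)
print(profit[("London", "Jan97")])

revenue[("London", "Jan97")] = 1000.0   # the stored data changes
print(profit[("London", "Jan97")])      # the formula reflects the change
```

As with a relational view, nothing is stored for Profit itself, so it can never be out
of step with the cubes it is derived from.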
The power of this model is the high degree of analysis it puts at your fingertips, when
combined with online analytical processing tools. Online analytical processing today
generally involves the use of a separate multidimensional server that contains a
relatively small amount of highly indexed data from operational systems.

Choosing Between Relational and Multidimensional Servers
Each database server has its own strengths and weaknesses.
Relational Server
Benefits:
Well-known environment with many experts in most organizations able to
support the product.
Can be used with data warehousing and operational systems.
Many tools available with advanced features including improvements made to
performance with report servers.
Disadvantages:
Does not have any complex functions or analysis capabilities provided by
OLAP tools.
These products may also be restricted to the volumes of data they can access.
Multidimensional Server
Benefits:
Quick access to very large volumes of data.
Extensive and comprehensive libraries of complex functions specifically for
analysis.
Strong modeling and forecasting capabilities.
Can access multidimensional and relational database structures.
Disadvantages:
Difficulty of changing dimensions without reaggregating to time.
Lack of support for very large volumes of data.

MOLAP Server
The application layer stores data in a multidimensional structure.
The presentation layer provides the multidimensional view.
[Diagram: DSS client, application layer containing the MOLAP engine, and warehouse]
Data: arrays, cached, offloaded from server
Efficient storage and processing
Complexity hidden from the user
Analysis using preaggregated summaries and precalculated measures
Multidimensional OLAP Server (MOLAP)
The multidimensional online analytical processing (MOLAP) engine takes the data
from the warehouse or from operational sources. The MOLAP engine then stores the
data in proprietary data structures, summarizes it, and precalculates as many outcomes
as possible.
Characteristics
Data is stored as a precalculated array.
The data resides, or is cached, in a proprietary multidimensional database, with a
multidimensional viewer. Both the data and index values are held in arrays.
The database is organized to allow rapid retrieval of related data across multiple
dimensions.
Data can be offloaded from the server onto the client for local access, reducing
network traffic. However, it can take time to form the cubes.
The MOLAP tools store and process multidimensional data efficiently.
The calculation engine creates new information from existing data through
formulas and transformations.
The complexity of the underlying data is transparent to the user.
The tools can exploit the complexity of the analysis involved.
The complex analytical querying capabilities enable a business to respond to
change faster.
Preaggregated summary data and precalculated measures enable quick and easy
analysis of complex data relationships.
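Preaggregation can be sketched as follows. This is a simplified illustration, not the
storage scheme of any MOLAP product: every detail value is added, at load time, to each
summary cell that contains it, so that queries become plain lookups. The ALL marker, the
function name, and the data are assumptions made for the example.

```python
# Illustrative sketch: precompute every summary level at load time.
from itertools import product as cartesian

ALL = "*"   # marker meaning "all members of this dimension"

DETAIL = {
    ("P1", "London", "Jan97"): 10, ("P1", "Paris", "Jan97"): 20,
    ("P2", "London", "Jan97"): 5,  ("P2", "London", "Feb97"): 15,
}

def build_cube(detail):
    """Add each detail value to every summary cell that contains it."""
    cube = {}
    for coords, value in detail.items():
        # For each dimension, choose either the member itself or ALL.
        for key in cartesian(*[(member, ALL) for member in coords]):
            cube[key] = cube.get(key, 0) + value
    return cube

cube = build_cube(DETAIL)            # done once, during the periodic load

# Queries are now plain lookups; nothing is summed at query time.
print(cube[("P1", ALL, ALL)])        # all P1 sales
print(cube[(ALL, "London", ALL)])    # all London sales
print(cube[(ALL, ALL, ALL)])         # grand total
```

The sketch also shows the cost noted above: four detail cells produce many more stored
cells, and building them takes time before any query can run.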

ROLAP Server
The warehouse stores atomic data.
The application layer generates SQL for the three-dimensional view.
The presentation layer provides the multidimensional view.
[Diagram: DSS client, application layer containing the ROLAP engine, and warehouse
server, with multiple SQL statements passing between the engine and the server]
Data and metadata in server
Multidimensional views of data
High connectivity
Unlimited database size and query criteria
Complex SQL generated by tool
Relational Database OLAP Server (ROLAP)
The relational online analytical processing (ROLAP) engine takes data from the
relational data warehouse. The ROLAP engine uses its built-in SQL functionality to
create a multidimensional representation of the data and presents that to the user as a
multidimensional view.
Characteristics
Data and metadata are stored as records in the relational database. The OLAP server
uses this metadata dynamically to generate the SQL statements necessary to
retrieve the data as the user requests it.
Users see a multidimensional view of data that is stored in relational tables.
End users are supplied with a multidimensional viewing tool to view the relational
data.
There is high-capacity connectivity to powerful servers.
There are no limitations on the size of the database or the kind of analysis that may
be performed. However, if the server is SQL-driven, performance with some engines
may suffer severely when the user joins several tables or performs complex
computations.
Complex SQL code is generated by the ROLAP tool. The tools create a number of
SQL statements when they access the database; this may adversely affect
performance.
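The way a ROLAP engine turns metadata into SQL can be sketched in miniature. In the
example below, SQLite (through Python's standard sqlite3 module) stands in for the
warehouse server, and the fact table, its columns, the metadata layout, and the
generate_sql function are all hypothetical. A real ROLAP tool generates considerably
more complex SQL, often as several statements.

```python
# Illustrative sketch: table, column, and dimension names are hypothetical,
# and SQLite stands in for the warehouse server.
import sqlite3

# Metadata: how each business dimension maps onto relational columns.
METADATA = {
    "fact_table": "sales_fact",
    "measure": "amount",
    "dimensions": {"product": "product_name", "store": "store_city",
                   "month": "sale_month"},
}

def generate_sql(row_dims, filters):
    """Build a parameterized GROUP BY query from the metadata."""
    cols = [METADATA["dimensions"][d] for d in row_dims]
    where = [f"{METADATA['dimensions'][d]} = ?" for d in filters]
    sql = (f"SELECT {', '.join(cols)}, SUM({METADATA['measure']}) "
           f"FROM {METADATA['fact_table']}")
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += f" GROUP BY {', '.join(cols)} ORDER BY {', '.join(cols)}"
    return sql, list(filters.values())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact "
             "(product_name TEXT, store_city TEXT, sale_month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("P1", "London", "Jan97", 10), ("P1", "Paris", "Jan97", 20),
    ("P2", "London", "Jan97", 5), ("P2", "London", "Feb97", 15),
])

# The user asks for "sales by product for the London store".
sql, params = generate_sql(["product"], {"store": "London"})
print(sql)
print(conn.execute(sql, params).fetchall())
```

The user works only with dimension names such as product and store; the column names and
the SQL text come from the metadata. The sketch expects at least one row dimension.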

MOLAP, ROLAP, and HOLAP


[Diagram: Express user, Express Server, and warehouse, with two question marks]
MOLAP, ROLAP, and HOLAP
Multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP
(HOLAP) are terms that can cause some confusion.
OLAP: The theme common to each of these configurations is online analytical
processing. OLAP tools and applications must be able to manipulate
and display data using a multidimensional view. The multidimensional data model is
specifically designed for this type of analysis, and reflects the way users think about
their businesses.
Performance Versus Storage: The central issue surrounding this OLAP
configuration question is the trade-off between performance and storage space.
When data is stored in the multidimensional model (MOLAP), data-access
performance is maximized for the end user. However, some redundancy of storage
results, and multidimensional databases can become extremely large.
When data is stored only in the warehouse and is brought into the
multidimensional cache when queried (ROLAP), added storage is not an issue, but
query performance suffers.
Flexible OLAP Access: A complete OLAP solution should provide any of these
options. Oracle Express technology is based on a multidimensional data model,
but the underlying data can be structured in a number of ways.

MOLAP
[Diagram: data from the warehouse is periodically loaded into the MDDB in Express
Server; the Express user sends a query and receives data]

ROLAP
[Diagram: Express Server performs a live fetch from the warehouse into its data cache;
the Express user sends a query and receives data]
MOLAP: In a pure MOLAP environment, data from the warehouse, online
transaction processing (OLTP) systems, or other external sources is periodically
loaded into a multidimensional database (MDDB) such as Oracle Express, where it is
presummarized and optimized for analysis.
ROLAP: In a ROLAP environment, relational data from a data warehouse or data
mart is retrieved on the fly in response to a user query, and that data is brought into the
Oracle Express multidimensional cache.
Once data has been cached into Oracle Express, subsequent access of that same data
does not require a refetch of the data from the warehouse.

Hybrid (HOLAP)
[Diagram: Express Server holds an MDDB and a cache, filled from the warehouse by
periodic load and by fetch and cache; the Express user sends a query and receives data]
HOLAP The MOLAP and ROLAP approaches can be combined into a hybrid
(HOLAP) solution, which takes advantage of the strengths of both the ROLAP and
MOLAP methods.
In the hybrid solution, the relational database is used to store the bulk of the detail
data, and the multidimensional model is used to store summary data.
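One way to picture the hybrid arrangement is a query router that answers from the
summary store when it can and falls back to the relational detail otherwise. The sketch
below is an assumption about how such routing might look, with hypothetical names and
data; it does not describe how any particular product decides.

```python
# Illustrative sketch: both stores and all names are hypothetical.
ALL = "*"

SUMMARY_STORE = {            # multidimensional side: precomputed summaries
    ("P1", ALL): 30, ("P2", ALL): 20, (ALL, ALL): 50,
}
DETAIL_ROWS = [              # relational side: the bulk of the detail data
    ("P1", "London", 10), ("P1", "Paris", 20),
    ("P2", "London", 5), ("P2", "London", 15),
]

def query(product=ALL, store=ALL):
    """Answer from the summary store when possible, else scan the detail."""
    key = (product, store)
    if key in SUMMARY_STORE:
        return SUMMARY_STORE[key], "summary"
    total = sum(amount for p, s, amount in DETAIL_ROWS
                if product in (ALL, p) and store in (ALL, s))
    return total, "detail"

print(query(product="P1"))                  # summary level
print(query(product="P2", store="London"))  # detail level
```

Summary questions get MOLAP-style speed, while detail questions remain answerable
without copying all of the detail into the multidimensional store.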

Choosing a Reporting Architecture


Business needs
User adaptability
GUI interface
Computer architecture
Network architecture
Network throughput
Openness
[Chart: query performance (OK to Good) plotted against analysis (Simple to Complex) for
ROLAP and MOLAP]
Choosing Between ROLAP and MOLAP Architectures and Tools
Factors Influencing Query Tool Choice: The diagram shows that ROLAP serves
the user who requires simple analysis and MOLAP serves the user who needs more
complex analysis, because of the performance and summarization benefits of MOLAP.
There are a number of key issues to consider when determining which product to use:
Business need: Does the tool fit current and future reporting requirements?
Consider whether the tool can access the data sources and models needed to
provide the information required. Can the tool access the volumes of data
necessary to perform the analysis required?
User: Some tools have a steep learning curve and are specialized in their
presentation. Is there room in your organization for yet another specialist tool?
Does the tool provide the flexibility, functionality, and speed needed?
GUI: Consider how organized, intuitive, user-friendly, and robust the interface is.
Computing architecture: Consider existing computer architectures. Decide
whether the fat client with its associated features and functionality could be
replaced by the thin client. Do the selected tools fit in with your current and
planned architecture?
Network architecture: Consider how the products deploy their requests across the
network, and the effects on the network and server. Can the chosen network
(WAN, LAN, or MAN) support the analysis approaches chosen? Conversely, can
the tool fit within the network architecture defined?
Network throughput: Does the network have the capacity required? Is it likely to be
affected by access contention? What is your networking strategy? Do you have
one?
Openness: Is the product portable and does it have the necessary application
program interface (API) to connect to the databases you have in place? Can you
write or customize APIs?

Choosing a Reporting Architecture


Performance
Scalability
Management
Enterprisewide perspective
Performance: Will the product be able to respond to the variety of queries required
in acceptable (defined) time frames? Determine your own speed metrics. Ensure
that the tool can meet the response times required by the service level agreement;
if it cannot, renegotiate the agreement.
Scalability: Consider whether the tool is capable of expanding to meet future
needs, for example, moving from a simple daily reporting situation to alert-driven
exception reporting, without major modification.
Management: What kind of management and support does the product require? Is
there a large administrative task in setting up the environment and building end-
user layers (metalayers)?
Enterprisewide perspective: Always consider the tools with an enterprisewide
approach in mind, not just local, or departmental, considerations.

Client-Server Access
Mainframe power preserved
Tools: simple query, complex query, data mining
[Diagram: Windows, Macintosh, OS/2, and UNIX clients reach the warehouse server through
a common protocol and a common gateway]

Web Access
Internet: global network
Intranet: corporate access
Lower costs: hardware, communication, application
Security issues
Query Access Architectures
In the industry today, there are many architectures, and in the warehouse environment
the two most prominent are client-server and Web access.
Client-Server Access
The principle behind the client-server approach is to split the processing among
servers and localized processing on the client.
This openness among systems makes the configuration highly flexible.
Different users may run different tools to access the data warehouse, such as:
Simple query tools
Complex analysis tools
Data mining tools
Web Access
At this time, data warehouse information is being provided through Web-based
applications on intranets (networks within a company), as an alternative to other DSS
delivery mechanisms.
Internet and intranet access to a warehouse may bring these benefits:
Lower hardware costs
Lower communication costs
Lower application licensing and maintenance costs
Minimized burden on administrators
Internet Security Issues: Security issues abound in this environment, and you must
carefully consider the impact of providing global access to your data. You should
consider:
View-based security techniques, with a permissions table identifying users'
clearance codes. These codes are matched against clearance codes held with the data
in the warehouse.
Caching techniques that allow only queries available to users of a certain code to
actually access the cached data.
Password abstraction, in which the password that you specify for access is
converted behind the scenes before access to the database is made
available.
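The view-based technique in the list above can be sketched with a small example. SQLite
(through Python's standard sqlite3 module) stands in for the warehouse, and the table
names, column names, and clearance values are hypothetical. The sketch also assumes that
a user may see rows whose clearance code is at or below the user's own code; the text
says only that the codes are matched, so treat that rule as an assumption.

```python
# Illustrative sketch: SQLite stands in for the warehouse; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL, clearance INTEGER);
CREATE TABLE permissions (username TEXT, clearance INTEGER);
INSERT INTO sales VALUES ('EMEA', 100, 1), ('EMEA', 900, 3), ('APAC', 250, 2);
INSERT INTO permissions VALUES ('clerk', 1), ('manager', 3);
-- Users query the view, never the base table.
CREATE VIEW secure_sales AS
  SELECT p.username, s.region, s.amount
  FROM sales s JOIN permissions p ON s.clearance <= p.clearance;
""")

def visible_rows(username):
    """Return the rows that the named user's clearance code allows."""
    cur = conn.execute(
        "SELECT region, amount FROM secure_sales "
        "WHERE username = ? ORDER BY amount", (username,))
    return cur.fetchall()

print(visible_rows("clerk"))
print(visible_rows("manager"))
```

In a production database the view would normally identify the user from the session,
not from a value passed in by the caller; the parameter is used here only to keep the
sketch self-contained.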

Fat Client
PC clients to high-end servers
Demand more software and hardware
Are difficult to administer
Give limited application reusability
Provide a lot of software for limited use
Are expensive to buy, maintain, and license

Thin Client
Browser device to server
Lower hardware cost
Lower license cost
Open deployment
Challenges
Less of a library
More security, data integrity, and
distributed capabilities
Robustness, scalability, and
extensibility
Example: NC from Oracle
Query Access Architectures
Fat Client
In a client-server architecture, a fat client is a client that performs the bulk of the data
processing operations. The data itself is stored on the server.
During the 1980s, the industry introduced PCs (clients) with graphical interfaces and
high-end servers that can house databases. As these became more popular, companies
downsized, rightsized, and reduced mainframe computing architectures. Today, the PC
is the foundation of most modern enterprise systems, and gives many users the ability
to perform many tasks with ease.
PCs create some challenges, however:
They have become fat, demanding more software and hardware.
Administering multiple copies of software is difficult.
Once developed, client software offers limited reusability in extending
applications.
Users need only a small selection of the software available on the PC.
PCs are costly to purchase, license, and maintain because of the amount of software
required to support each device.
Thin Client
In client-server applications, a thin client is designed to be especially small so that the
bulk of the data processing occurs on the server. A thin client is a network computer
without a hard disk drive, whereas a fat client includes a disk drive.
Advances in Internet technology, decreases in the cost of high-end servers, and
increases in the total cost of purchasing, supporting, and maintaining PCs are
prompting IT departments to reconsider their client-server strategy. They are starting
to use the features of the Web to eliminate the reliance on PCs. To this end, the thin
client (a browser device) connects to a high-end server, which contains the
application logic.
Thin client access to a data warehouse across the Web has a number of advantages:
Lower hardware cost per user
Lower licensing costs per user (The software is centralized on the server.)
Open deployment platform
Web access is still in its early years and has some challenges to face. It needs to:
Evolve from a library of documents to an electronic business platform that can
conduct secure transactions on intranets and the Internet
Provide rich levels of security, data integrity, and distributed transaction support
Provide robustness, scalability, and extensibility
The network computer (NC), available from Oracle, is an example of a thin client.
Summary
The lesson discussed the following topics:
The purpose of building a data warehouse is to enable users to access the
information in the warehouse
Determining user query needs is an important part of the data warehouse project
implementation
Planning for good data access capability is important to the success of the data
warehousing project
Practice 6-1 Overview


This practice covers the following topics:
Completing a user profile exercise
Answering true/false questions on user involvement in determining query access to
the data warehouse
Performing the Security Consideration Checklist exercise
Practice 6-1
1 Complete the User Profile column in this exercise with one of the following user
types:
Executive
Casual user or manager
Business analyst or power user

Name: Brian O'Reilly
Access Needs: Needs to develop simple forecasts, such as budgets; ease of use is important
Technology: Microsoft Office; Internet browser; spreadsheets
User Profile: ________

Name: Mary Ramos
Access Needs: One-click access; only needs highly summarized information; ease of use
is very important
Technology: E-mail; Microsoft Office; Internet browser
User Profile: ________

Name: Kim Seng
Access Needs: Constantly wants to get more data; understands the organization's
business processes
Technology: Spreadsheets; Oracle Reports; Oracle Discoverer; Oracle Express Analyzer
User Profile: ________

Name: Amber Salinas
Access Needs: Lots of drilling; customized graphical user interface (GUI); needs to
know data structures; extensive SQL programming
Technology: Oracle7X, Oracle8X Server; Oracle Express
User Profile: ________

2 Answer true or false to the following questions.
a Do not involve users in the early process of the data warehouse implementation
because they are going to delay your delivery date. (True / False)
b Choose the warehouse data access tools by involving only IT staff because they are
the ones who know what the users need. (True / False)
c Prototype access methods with prospective users. (True / False)
3 Security Consideration Checklist exercise: Form into small groups, and discuss
each of the following questions. For each question, discuss briefly whether you
would use it in your own security consideration checklist back at your workplace,
and rate its importance relative to the other questions on the checklist.
Security Consideration Question (for each: Will You Use? Why?)
a Security should be addressed at column level (and in some cases at the row level),
at the table level, at the database level, at the tools level, at the client and
server level, and at the network level.
b Create views to limit access to particular columns or, in unusual circumstances,
rows.
c Do not rely on anything to protect the database except the database security.
d How are reports upgraded when new versions are released?
e Security should be implemented based on what makes the most sense for both the
short-term and long-term health of the business. Judge security not only by its
structure, but by how well it supports the entire corporate organization's needs and
survival.
Lesson 7: Modeling the Data Warehouse
Overview
[Slide: course overview diagram with the following blocks: Project Management
(Methodology, Maintaining Metadata); Defining DW Concepts & Terminology; Planning for
a Successful Warehouse; Analyzing User Query Needs; Choosing a Computing
Architecture; Modeling the Data Warehouse; Planning Warehouse Storage; ETT (Building
the Warehouse); Meeting a Business Need; Supporting End User Access; Managing the
Data Warehouse]
This lesson examines the role of data modeling in a data warehousing environment.
The lesson presents a very high level overview of warehouse modeling steps. You
consider the different types of models that can be employed, such as the star schema.
Tools available for warehouse modeling are introduced.
Note that the Modeling the Data Warehouse block is highlighted in the overview
slide on the facing page.
Objectives
After completing this lesson, you should be able to do the following:
List generic phases for modeling a data warehouse
List the components of a warehouse data model
Identify tools available for warehouse modeling
Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse
Database Design. That course teaches comprehensive database design by using a case
study, whereas this lesson provides a high-level overview.
Data Warehouse Database Design Phases
In the past several years, a number of methods for designing a data warehouse have
been published. Although these methods define certain terms differently, all include
the same general tasks required to produce a sound data warehouse database design.
This lesson focuses on the major tasks associated with the data warehouse database
design process. These tasks have been grouped into four phases:
1. Defining the business model (conceptual model)
2. Creating the dimensional model (logical model, or star schema)
3. Modeling summaries
4. Creating the physical model
Phase One: Defining the Business Model
Performing Strategic Analysis
Performed at the enterprise level, strategic analysis identifies, prioritizes, and selects
the major business processes (also called business events or subject areas) that are
most important to the overall corporate strategy.
Strategic analysis includes the following steps:
Identify the business processes that are most important to the overall corporate
strategy.
Understand the business processes by drilling down on the dimensions that
characterize each business process.
Prioritize and select the business process to implement in the warehouse, based on
which one will provide the quickest and largest return on investment (ROI).
Creating the Business Model
The strategic analysis step produces a high-level definition of the chosen business
process or processes. In this second step of the business modeling phase, a business
model is created.
Defining Business Requirements The business model is created by defining the
business analysis requirements for each process. The previous lesson discussed
interviewing end users to learn their query needs. You will also need to meet with
business managers and business analysts who are directly responsible for the specific
business processes in order to:
Define specific business measures.
Create a detailed listing of the dimensions that characterize each measure.
Identify the granularity required to satisfy the analysis requirements.
Clarify business definitions and business rules.
Verifying Data Sources Concurrently, you must perform an information systems
(IS) data audit, a systematic exploration of the underlying legacy source systems to
verify that the data required to support the business requirements is available.
Business Requirements Drive the Design Process
The entire scope of the data warehouse initiative must be driven by business
requirements. Business requirements determine:
What data must be available in the warehouse
How data is to be organized
How often data is updated
End-user application templates
Maintenance and growth
Primary Input The business requirements are the primary input to the design of the
data warehouse. Information requirements as defined by the business people (the end
users) will lay the foundation for the data warehouse content.
Other Inputs Overlaying those requirements with source information and further
research regarding how data is used helps to determine the specific data that the data
warehouse will provide. Other sources may be:
Existing metadata
Source ER diagrams from relational OLTP systems
Research
Legacy nonrelational systems data
Identifying Measures and Dimensions
[Slide: example measures and dimensions]
Measures (the attribute varies continuously): Balance, Units Sold, Cost, Sales
Dimensions (the attribute is perceived as a constant or discrete value): Description,
Location, Color, Size
Measures A measure contains a numeric value that measures an aspect of the
business. Typical examples are gross sales dollars, total cost, profit, margin dollars, or
quantity sold. A measure can be additive or partially additive across dimensions.
Dimensions A dimension is an attribute by which measures can be characterized or
analyzed. Dimensions bring meaning to raw data. Typical examples are customer
name, date of order, or product brand.
Ultimately, the business requirements document should contain a list of the business
measures and a detailed list of all dimensions, down to the lowest level of detail for
each dimension. An example is shown in the slide for a retail customer sales process.
Distinguishing Between Measures and Dimensions
During the warehouse design, you must decide whether a piece of data is a measure or
a dimension.
You can use the following as a guide:
If the data regularly changes value, it is a measure; for example, units sold or
account balances.
If the data is constant (a discrete value), it is a dimension. For example, the color of
a product and the address of a customer are unlikely to change frequently.
A need or capability to summarize often identifies a measure.
Dimensions are typically represented along the axes of existing reports.
These rules are not definitive but act as a guide where there is indecision.
Determining Granularity
[Slide: candidate levels of granularity: YEAR? QUARTER? MONTH? WEEK? DAY?]
When gathering more specific information about measures and analytic parameters, it
is important also to understand the level of detail that is required for analysis and
business decisions. This level of detail is called granularity. The greater the level of
detail, the finer the level of granularity.
The Key Question What do your users really need for now and for the near-term
future? Determine that and then design for one grain finer.
Consider that users typically perform fine-grain analysis on a short horizon, maybe six
weeks. Thus, as a solution, you can retain six weeks of data online and roll off the aged
data automatically.
Note: Remember that you can always aggregate upward, but you cannot disaggregate
lower than the data that is stored in the data mart.
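The note above can be illustrated with a minimal sketch, assuming hypothetical daily sales rows stored as (date, amount) pairs. Rolling the stored day-level grain up to months is a simple aggregation; going the other way is not possible, because the monthly totals no longer carry the daily detail.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows stored at the day grain: (sale date, sales dollars).
daily_sales = [
    (date(1999, 3, 1), 120.0),
    (date(1999, 3, 2), 80.0),
    (date(1999, 4, 15), 200.0),
]


def roll_up_to_month(rows):
    """Aggregate day-grain rows upward to (year, month) totals."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = roll_up_to_month(daily_sales)
```

Here monthly_sales holds one total per (year, month) pair. Nothing in it records how the March total was split between individual days, which is why the stored grain sets the lower limit for analysis.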
Identifying Business Rules
[Slide: example business rules]
Product
  Type: PC, Server
  Monitor: 15 inch, 17 inch, 19 inch, None
  Status: New, Rebuilt, Custom
Location
  Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Store
  Store > District > Region
Time
  Month > Quarter > Year
Business model elements should also be documented with agreed-upon business rules
and definitions. For example, the wholesale computer sales process might include the
following business rules:
All product items are grouped by status.
March, April, and May make up the first quarter in the fiscal year.
A store is in one and only one district.
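Once agreed upon, rules like these can be written down as small, checkable functions. The following is a minimal sketch of two of the rules above; the function names and the (store, district) input format are hypothetical, and the quarters after the first are assumed to follow in consecutive three-month blocks.

```python
def fiscal_quarter(month):
    """Map a calendar month (1-12) to a fiscal quarter, with March starting Q1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Shift the calendar so that March becomes position 0, then group by threes.
    return ((month - 3) % 12) // 3 + 1


def stores_violating_single_district(assignments):
    """Return the stores that are assigned to more than one district.

    assignments is an iterable of (store, district) pairs.
    """
    districts_by_store = {}
    for store, district in assignments:
        districts_by_store.setdefault(store, set()).add(district)
    return sorted(s for s, d in districts_by_store.items() if len(d) > 1)


assignments = [("S1", "North"), ("S2", "North"), ("S2", "South")]
violations = stores_violating_single_district(assignments)
```

Checks of this kind can be run against source data during the IS data audit, so that rule violations surface before the warehouse is loaded.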
Phase Two: Creating the Dimensional Model
When you complete the first phase, defining the business model, you proceed to the
second phase, creating the dimensional (logical) model. This phase includes the
following tasks:
Identify fact tables
Translate business measures into fact tables
Analyze source system information for additional measures
Identify base and derived measures
Document additivity of measures
Identify dimension tables
Link fact tables to the dimension tables
Create views for users
Dimension Tables
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
Fact Tables
Fact tables have the following characteristics:
Contain numeric measures (metrics) of the business
May contain summarized (aggregated) data
May contain date-stamped data
Are typically additive
Have a key value that is typically a concatenated key composed of the primary keys
of the dimensions
Are joined to dimension tables through foreign keys that reference primary keys in
the dimension tables
Dimension Tables
Dimensions are the textual descriptions of the business. Dimension tables are typically
smaller than fact tables and the data changes much less frequently. Dimension tables
give perspective regarding the whys and hows of the business and element
transactions.
While dimensions generally contain relatively static data, customer dimensions are
updated more frequently.
Dimensions Are Essential for Analysis The key to a powerful dimensional model
lies in the richness of the dimension attributes because they determine how facts can
be analyzed. Dimensions can be considered as the entry point into fact space.
Always name attributes in the users' vocabulary. That way, the dimension will
document itself and its expressive power will be apparent.
Fact Tables
Facts are the numerical measures of the business. The fact table is the largest table in
the star schema and is composed of large volumes of data.
Although a star schema typically contains one fact table, other DSS schemas can
contain multiple fact tables.
Raw facts such as dollar sales can be combined or calculated with other facts to create
measures. Measures can be stored in the fact table or created when necessary for
reporting purposes.
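As a minimal sketch of creating measures when necessary, the following derives two measures from raw facts at reporting time. The column names (sales_dollars, cost_dollars, sales_units) and the two derived measures are hypothetical examples, not a prescribed set.

```python
def add_derived_measures(fact_row):
    """Return a copy of a fact row extended with measures derived from its raw facts."""
    margin = fact_row["sales_dollars"] - fact_row["cost_dollars"]
    units = fact_row["sales_units"]
    # Guard against division by zero for rows with no units sold.
    avg_price = fact_row["sales_dollars"] / units if units else None
    return {**fact_row, "margin_dollars": margin, "avg_unit_price": avg_price}


row = {"sales_dollars": 100.0, "cost_dollars": 60.0, "sales_units": 4}
enriched = add_derived_measures(row)
```

Margin dollars can be summed across rows like any additive fact. The average unit price cannot be summed meaningfully, so at a coarser grain it should be recomputed from the summed sales dollars and units.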
Dimensional Model (Star Schema)
[Slide: a central fact table, Facts (units, price), surrounded by four dimension
tables: Product, Channel, Customer, and Time]
Schema A schema is a collection of database objects, such as tables, views, indexes,
and synonyms.
Dimensional Model The dimensional model has a single fact table and one or more
lookup or dimension tables for analytical purposes.
Star Schema The star schema is the simplest form of a dimensional model.
The fact table contains foreign keys that reference primary keys in the dimension
tables.
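The key relationships described above can be sketched with SQLite through Python's sqlite3 module. The table and column names below are modeled on the lesson's examples but are otherwise hypothetical, the time dimension is called time_dim, and only three small dimensions are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per primary key.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER);

    -- Fact table: numeric measures plus a concatenated key made up of
    -- foreign keys that reference the dimension primary keys.
    CREATE TABLE sales_fact (
        product_id INTEGER NOT NULL REFERENCES product (product_id),
        store_id INTEGER NOT NULL REFERENCES store (store_id),
        day_id INTEGER NOT NULL REFERENCES time_dim (day_id),
        sales_dollars REAL,
        sales_units INTEGER,
        PRIMARY KEY (product_id, store_id, day_id)
    );

    INSERT INTO product VALUES (1, 'PC'), (2, 'Server');
    INSERT INTO store VALUES (10, 100), (11, 100);
    INSERT INTO time_dim VALUES (19990301, 199903), (19990302, 199903);
    INSERT INTO sales_fact VALUES
        (1, 10, 19990301, 1000.0, 2),
        (1, 11, 19990302, 500.0, 1),
        (2, 10, 19990301, 3000.0, 1);
""")

# A typical star query: one join from the fact table to a dimension,
# then a summary of an additive measure by a dimension attribute.
sales_by_product = conn.execute("""
    SELECT p.product_desc, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    GROUP BY p.product_desc
    ORDER BY p.product_desc
""").fetchall()
```

With foreign keys enabled, an attempt to insert a fact row that refers to a product missing from the product dimension is rejected, which keeps every fact reachable from its dimensions.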
Star Schema Model
Central fact table
Radiating dimensions
Denormalized model
[Slide: example star schema]
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc
Easy for users to understand
Fast response to queries
Simple metadata
Supported by many front-end tools
Less robust to change
Slower to build
Does not support history
A star schema model can be depicted as a simple star: a central table contains fact data,
and multiple tables radiate out from it, connected by database primary and foreign
keys. Unlike other database structures, a star schema has denormalized dimensions.
A star model:
Is easy for users to understand because the structure is so simple and
straightforward
Provides fast response to queries through optimization and a reduction in the
number of joins required between fact and dimension tables
Contains simple metadata
Is supported by many front-end tools
Is slow to build because of the level of denormalization
The star schema is emerging as the predominant model for data warehouses.
Snowflake Schema Model
[Slide: example snowflake schema]
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Product Table: Product_id, Product_desc
Time Table: Week_id, Period_id, Year_id
Direct use by some tools
More flexible to change
Provides for speedier data loading
May become large and unmanageable
Degrades query performance
More complex metadata
[Slide: normalized hierarchy example: Country, State, County, City]
A snowflake model is closer to an entity relationship diagram than the classic star
model because the dimension data is more normalized. Developing a snowflake model
means building class hierarchies out of each dimension (normalizing the data).
A snowflake model:
Can degrade query performance significantly because of its greater number of table joins
Provides a structure that is easier to change as requirements change
Is quicker at loading data into its smaller normalized tables, compared to loading into a star schema's larger denormalized tables
Allows using history tables for changing data, rather than level fields (indicators)
Has a complex metadata structure that is harder for end-user tools to support
One of the major reasons why the star schema model has become more predominant
than the snowflake model is its query performance advantage. In a warehouse
environment, the snowflake's quicker load performance is much less important than its
slower query performance.
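The extra joins that a snowflake introduces can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table layouts loosely follow the Item, Dept, and Mgr tables on the slide, while the denormalized item_dim table, the sample rows, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake: item -> dept -> mgr are separate normalized tables
    CREATE TABLE mgr  (mgr_id INTEGER PRIMARY KEY, mgr_name TEXT);
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_desc TEXT, mgr_id INTEGER);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_desc TEXT, dept_id INTEGER);
    -- Star (hypothetical): one denormalized item dimension
    CREATE TABLE item_dim (item_id INTEGER PRIMARY KEY, item_desc TEXT,
                           dept_desc TEXT, mgr_name TEXT);
    CREATE TABLE sales_fact (item_id INTEGER, store_id INTEGER,
                             sales_dollars REAL, sales_units INTEGER);

    INSERT INTO mgr  VALUES (1, 'Lee'), (2, 'Ortiz');
    INSERT INTO dept VALUES (10, 'Grocery', 1), (20, 'Hardware', 2);
    INSERT INTO item VALUES (100, 'Coffee', 10), (101, 'Tea', 10), (200, 'Hammer', 20);
    INSERT INTO item_dim
        SELECT i.item_id, i.item_desc, d.dept_desc, m.mgr_name
        FROM item i JOIN dept d ON d.dept_id = i.dept_id
                    JOIN mgr  m ON m.mgr_id  = d.mgr_id;
    INSERT INTO sales_fact VALUES (100, 1, 50.0, 5), (101, 1, 30.0, 3), (200, 2, 80.0, 4);
""")

# Snowflake query: three joins to reach the manager name.
snowflake_rows = conn.execute("""
    SELECT m.mgr_name, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN item i ON i.item_id = f.item_id
    JOIN dept d ON d.dept_id = i.dept_id
    JOIN mgr  m ON m.mgr_id  = d.mgr_id
    GROUP BY m.mgr_name ORDER BY m.mgr_name
""").fetchall()

# Star query: one join to the denormalized dimension gives the same answer.
star_rows = conn.execute("""
    SELECT i.mgr_name, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN item_dim i ON i.item_id = f.item_id
    GROUP BY i.mgr_name ORDER BY i.mgr_name
""").fetchall()
```

Both queries return the same totals per manager; the difference is the number of joins the database must perform, which is the cost the notes above attribute to the snowflake model.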
Other Warehouse Models
Besides the star and snowflake schemas, there are other models that can be considered.
Constellation
A constellation model (also called a galaxy model) comprises a series of star models.
Constellations are a useful design feature if you have a primary fact table and
summary tables of a different dimensionality. A constellation can simplify design by
allowing you to share dimensions among many fact tables.
Third Normal Form Warehouse
Some data warehouses consist of a set of relational tables that have been normalized
to third normal form (3NF). Their data can be directly accessed using SQL code. They
may have more efficient data storage, at the price of slower query performance due to
extensive table joins. Some large companies build a 3NF central data warehouse
feeding dependent star data marts for specific lines of business.
Using Summary Data
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated
summaries
Usually exists in summary fact tables
Phase 3: Modeling summaries
Using Summary Data
Summary data contains fact data that is summarized, such as maximum, minimum,
average, and total, rather like the total or subtotal line of a report. When you require
summary information, you have two choices:
Issue the SQL, access the dimensions, then access the base fact table and perform the summary calculations on all the selected rows to produce the result (possibly involving thousands to millions of rows).
Issue the SQL, access the dimensions, then use the keys to access the related summary table and find the row with the presummarized data to produce the result.
Having direct access to a summary table containing precomputed data reduces disk
I/O, CPU sorting, and memory swapping requirements. Summary data is also referred
to as aggregated data, aggregated facts, or aggregated detail.
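The two choices can be compared in a small sketch using Python's built-in sqlite3 module. The table names sales_fact and sales_by_month, the column names, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (month TEXT, region TEXT, sales_dollars INTEGER)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("1999-01", "North", 100),
        ("1999-01", "North", 250),
        ("1999-01", "East", 50),
        ("1999-02", "South", 300),
        ("1999-02", "South", 75),
    ],
)

# Precompute the summary once, for example at load time.
conn.execute(
    "CREATE TABLE sales_by_month AS "
    "SELECT month, SUM(sales_dollars) AS tot_sales FROM sales_fact GROUP BY month"
)

# Choice 1: aggregate all selected base fact rows at query time.
from_base = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_fact WHERE month = ?", ("1999-01",)
).fetchone()[0]

# Choice 2: read the single presummarized row from the summary table.
from_summary = conn.execute(
    "SELECT tot_sales FROM sales_by_month WHERE month = ?", ("1999-01",)
).fetchone()[0]
```

Both lookups yield the same figure. With a handful of rows the cost difference is invisible, but the first query touches every matching fact row while the second touches one summary row.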
Lightly and Highly Summarized Data
Summary data falls into two loose categories:
Lightly summarized data is summarized from the incoming fact data and normally stored over a unit of time. Please refer to the earlier discussion on granularity.
Highly summarized data is more compact. It may be distilled from lightly summarized data or introduced into the warehouse already in the highly compact format.
Designing Summary Tables
[Figure: sample summary report by store showing units and sales ($) for Products A, B, and C, each with a total line, followed by average, maximum, total, and percentage lines]
Designing Summary Tables
Summary tables contain fact data that is aggregated using functions such as total,
average, and margin. The summary table shares the dimensions used by the fact data.
Summary data usually exists in summary fact tables, but it may exist in dimension
tables if it is discrete (such as year-to-date figures). For example, a customer
dimension may contain attributes, such as city, state, and country. The summary table
can use these hierarchical attributes to show summary measures for those dimensions
of the business.
How Many Summaries?
The issue with summary tables is not whether you are going to have any, but how many
you are going to have. Business users require summary information. For example, a
manager needs the bottom line figures that show how well the company is performing.
Analysis of the requirement is instrumental in ensuring that the users get the
information they need and that they get it quickly. A warehouse may contain hundreds
of summary tables.
What to Summarize
Deciding on what summary data to maintain in the warehouse is an early design
consideration and is based upon the users' query requirements. These requirements
are determined early on during analysis, and should be documented, implemented, and
monitored. You can identify a summary requirement that was not specified earlier by
monitoring the SQL statements that users submit and looking for commonly used
GROUP BY clauses.
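One way to spot candidate summaries is to count how often each GROUP BY column list appears in a log of submitted SQL. The sketch below is a rough illustration, not a full SQL parser: the regular expression handles only simple statements, and the logged queries are hypothetical.

```python
import re
from collections import Counter

# Capture the column list after GROUP BY, stopping at HAVING, ORDER BY,
# LIMIT, a semicolon, or the end of the statement.
GROUP_BY_RE = re.compile(
    r"\bGROUP\s+BY\s+(.+?)(?=\bHAVING\b|\bORDER\s+BY\b|\bLIMIT\b|;|$)",
    re.IGNORECASE | re.DOTALL,
)

def group_by_key(sql):
    """Return the GROUP BY columns as a sorted, lowercased tuple, or None."""
    match = GROUP_BY_RE.search(sql)
    if match is None:
        return None
    columns = [c.strip().lower() for c in match.group(1).split(",")]
    return tuple(sorted(c for c in columns if c))

# Hypothetical query log.
logged_sql = [
    "SELECT region, month, SUM(sales) FROM sales GROUP BY region, month",
    "select month, region, sum(sales) from sales group by month, region order by month",
    "SELECT month, SUM(sales) FROM sales GROUP BY month;",
    "SELECT * FROM sales WHERE region = 'North'",
]

usage = Counter(
    key for key in (group_by_key(sql) for sql in logged_sql) if key is not None
)
```

Column lists are sorted so that `GROUP BY region, month` and `GROUP BY month, region` count as the same candidate summary. The most frequent keys in `usage` are the groupings most worth precomputing.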
A well-designed set of summary tables improves query performance by allowing
queries direct access to precomputed summaries and predefined views of data.
Summary Tables Example
SALES FACTS
Sales$    Region   Month
10,000    North    Jan 99
12,000    South    Feb 99
11,000    North    Jan 99
15,000    West     Mar 99
18,000    South    Feb 99
20,000    North    Jan 99
10,000    East     Jan 99
 2,000    West     Mar 99
SALES BY MONTH/REGION
Month     Region   Tot_Sales$
Jan 99    North    41,000
Jan 99    East     10,000
Feb 99    South    30,000
Mar 99    West     17,000
SALES BY MONTH
Month     Tot_Sales$
Jan 99    51,000
Feb 99    30,000
Mar 99    17,000
Summary Tables Example
Assume a banking scenario. Simple cumulative data stores the total of deposit
transactions, summarized in the warehouse, for the first day and for every day
thereafter. With this method there is no loss of daily detail, but a lot of
processing is required when querying data.
Rolling summarized data brings in daily totals for the first seven days. On day eight,
the first seven days are totaled and stored as a weekly record. At the end of the month,
weekly records are added together to create a monthly record. You reset weekly and
monthly records (slots) to zero at appropriate points. With this method there is less
processing required when querying data, but the detail is lost.
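The rolling method can be sketched in a few lines of Python. This is a simplified illustration that assumes a month of exactly four seven-day weeks; the function name and the sample daily totals are hypothetical.

```python
def roll_up(daily_totals, days_per_week=7):
    """Roll daily totals into weekly records, then sum the weeks into a month.

    Returns (weekly_records, monthly_record, leftover_daily_slots).
    """
    weekly_records = []
    daily_slots = []
    for amount in daily_totals:
        daily_slots.append(amount)
        if len(daily_slots) == days_per_week:
            # A full week has arrived: store one weekly record and reset
            # the daily slots, which is where the daily detail is lost.
            weekly_records.append(sum(daily_slots))
            daily_slots = []
    monthly_record = sum(weekly_records)
    return weekly_records, monthly_record, daily_slots

# Hypothetical deposits: 1 on day 1, 2 on day 2, and so on for 28 days.
weekly, monthly, leftover = roll_up(list(range(1, 29)))
```

A query for the monthly figure now reads one record instead of 28, but after the roll-up the individual daily totals can no longer be recovered. Days that do not fill a complete week remain in the leftover slots and are not part of the monthly record in this sketch.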
Summary Table Management
The requirement for summary tables may change over time, as what constitutes a
popular query changes. Queries may be seasonal; for example, you may have specific
queries for spring, summer, autumn, and winter. The query management process should
be able to identify the summaries that are used, the summaries that need to be
created, and the summaries that may be removed.
Summary Management in Oracle8i
[Figure: summary advisor diagram. Labels: Product, Region, Time, City, State, Sales, and Sales summary; the summary advisor reports summary usage, space requirements, and summary recommendations]
Summary Management in Oracle8i
Oracle8i summary management features include three major components:
Query-rewrite capabilities
Mechanisms for maintaining summary tables, including incremental updates
Advisory capabilities that help the warehouse administrator create and delete summaries, based on usage
Summary Advisor
The Oracle8i summary advisor offers the following information:
Summary usage: such as the number of times a rewrite was made to use a summary, the space used by a summary, and a cost-benefit ratio for each summary.
Summary recommendations: such as creation, retention, and dropping of summaries.
Space requirements: based on queries for possible summaries.
Materialized Views
Summaries are stored in materialized views. While creating materialized views, you
can specify storage options to control the size and location of the views.
Query Rewrite
The Oracle8i cost-based optimizer may use a summary to satisfy a query on the base
table (SALES). The process of transforming a query so that it accesses materialized
views, such as the query using the SALES table in the example, is called query rewrite.
If the SALES table consists of several million rows and the materialized view
contains a few thousand rows, the rewritten query can execute much faster. Query
rewrite is the key benefit enabled by materialized views.
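The idea behind query rewrite can be illustrated with a toy routing function. This is not Oracle's algorithm: it is a minimal sketch that assumes additive measures such as SUM, and the Summary class, the summary names, and the row counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Summary:
    name: str
    group_cols: frozenset  # columns the summary is grouped by
    row_count: int         # rough size, used as a stand-in for cost

def choose_source(query_group_cols, summaries, base_table="SALES"):
    """Pick the smallest summary that can answer the grouping, else the base table."""
    needed = frozenset(query_group_cols)
    # A summary qualifies if it keeps every column the query groups by,
    # so its rows can be aggregated further to the requested level.
    candidates = [s for s in summaries if needed <= s.group_cols]
    if not candidates:
        return base_table
    return min(candidates, key=lambda s: s.row_count).name

summaries = [
    Summary("SALES_BY_MONTH_REGION", frozenset({"month", "region"}), 4000),
    Summary("SALES_BY_MONTH", frozenset({"month"}), 36),
]

by_month = choose_source(["month"], summaries)
by_region = choose_source(["region"], summaries)
by_product = choose_source(["product"], summaries)
```

A query grouped by month is routed to the smallest qualifying summary, a query grouped by region falls back to the month and region summary, and a query grouped by product must read the base table because no summary retains that column.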
Using Time in the Data Warehouse
The Time Dimension
Where should the element of time be stored?
Time
dimension
Sales fact
Time is critical to the data warehouse.
A consistent representation of time is required for
extensibility.
Using Time in the Data Warehouse
Though it may seem obvious, real-life aggregations of time can be quite complex.
Which weeks roll up to which quarters? Is the first quarter the calendar months of
January, February, and March, or the first 13 weeks of the year that begin on Monday?
Some causes for nonstandardization are:
Some countries start the work week on Monday, others on Sunday.
Weeks do not cleanly roll up to years, because a calendar year is one day longer than 52 weeks (two days longer in leap years).
There are differences between calendar and fiscal periods. Consider a warehouse that includes data from multiple organizations, each with its own calendars.
Holidays are not the same for all organizations and all locations.
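The week arithmetic is easy to check with Python's standard library. The sketch below uses ISO 8601 week numbering (weeks start on Monday) purely as one example convention; your organization's week definition may differ.

```python
import calendar
from datetime import date

def days_beyond_52_weeks(year):
    """Days left over after 52 full weeks (364 days) of the given year."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return days_in_year - 52 * 7

extra_1999 = days_beyond_52_weeks(1999)  # ordinary year
extra_2000 = days_beyond_52_weeks(2000)  # leap year

# Under ISO 8601, the first days of January can belong to the last
# week of the previous year, so "week of year" and "calendar year"
# do not line up cleanly.
iso_year, iso_week, iso_weekday = date(1999, 1, 1).isocalendar()
```

January 1, 1999 is a Friday that ISO numbering assigns to week 53 of 1998, which is the kind of mismatch a time dimension has to record explicitly rather than derive at query time.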
Representing time is critical in the data warehouse. You may decide to store multiple
hierarchies in the data warehouse to satisfy the varied definitions of units of time. If
you are using external data, you may find that you create a hierarchy or translation
table simply to be able to integrate the data. Matching the granularity of time defined
in external data to the time dimension in your own warehouse may be quite difficult.
The Time Dimension
Because online transaction data, typically the source data for the warehouse, often
does not carry an explicit time element, you apply an element of time in the
extraction, transformation, and transportation process. For example, you might assign
a week identifier to all the airline tickets that sold within that week. The
transaction may not have a time or date stamp on it, but you can determine the date
of the sale from the date on which the transaction file was generated.
The dimension of time is most critical to the data warehouse.
A consistent representation of time is required for extensibility.
Storing the Time Dimension
Typically there is a time dimension table in the data warehouse, although time
elements may be stored in the fact table. Before deciding where to store time, you
must consider the following:
Almost every data warehouse has a time dimension.
Organizations use a variety of time periods for data analysis.
A row whose key is an SQL date may be populated with additional time qualifiers needed to perform business analysis, such as workday, fiscal period, and special events.
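A time dimension keyed by date can be generated ahead of time, one row per day, with the qualifiers filled in. The following Python sketch makes several assumptions that are not from this course: a fiscal year that starts in June and is named for the calendar year in which it ends, a Monday to Friday work week, and a one-entry holiday list.

```python
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 6      # hypothetical: fiscal year starts in June
HOLIDAYS = {date(1999, 1, 1)}    # hypothetical holiday calendar

def time_dimension_row(d):
    """Build one time dimension row for the date d."""
    fiscal_period = (d.month - FISCAL_YEAR_START_MONTH) % 12 + 1
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return {
        "date_key": d.isoformat(),
        "iso_week": d.isocalendar()[1],
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "fiscal_year": fiscal_year,
        "fiscal_period": fiscal_period,
        "is_holiday": d in HOLIDAYS,
        # Workday here means Monday to Friday and not a listed holiday.
        "is_workday": d.weekday() < 5 and d not in HOLIDAYS,
    }

def build_time_dimension(start, end):
    """Return one row per day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(time_dimension_row(current))
        current += timedelta(days=1)
    return rows

rows = build_time_dimension(date(1999, 1, 1), date(1999, 1, 7))
```

Because every qualifier is stored on the row, a query can filter on `is_workday` or group by `fiscal_period` with a simple join instead of recomputing calendar rules.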
Creating the Physical Model
Phase 4: Creating the Physical Model
Translate the dimensional design to a physical model
for implementation
Define storage strategy for tables and indexes
Perform database sizing
Define initial indexing strategy
Define partitioning strategy
Update metadata document with physical information
Physical Model Design Tasks
Define naming and database standards
Perform database sizing
Design tablespaces
Develop initial indexing strategy
Develop data partition strategy
Define storage parameters
Set initialization parameters
Use parallel processing
Creating the Physical Model
The physical model resides in the relational database server (RDBMS). You need to
ensure that each object stored (primarily tables) is held in the appropriate manner and
contains all the necessary indexes to ensure optimal performance. There are other
considerations that you should bear in mind for performance, such as data partitioning.
Dimensional Model to Physical Model
You map the dimensional model to the physical elements by performing the following
tasks on the base dimensional model:
Add the format, such as data types and lengths, to the attributes of each entity.
Define storage strategy for tables and indexes.
Perform database sizing.
Define the initial indexing strategy.
Define partitioning strategy.
Update metadata document.
Physical Model Design Tasks
A good physical model is often the difference between the success and failure of a
data warehouse. The design of the physical model builds on the logical model, adding
indexes, referential integrity, and physical storage characteristics.
Transforming the base dimensional data model into the physical model includes:
Defining naming and database standards
Performing an initial sizing for the data warehouse database
Designing tablespaces
Defining an initial indexing strategy (primary, unique, nonunique, and bitmapped indexes) for loading programs and end-user access (It may include dropping and re-creating the indexes before and after batch load routines.)
Using partitioning to split table and index data into smaller, more manageable chunks
Determining where to place database objects on disk, using techniques such as disk mapping, striping, or RAID
Setting initialization parameters
Using parallel processing
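The practice of dropping and re-creating indexes around a batch load can be sketched with Python's built-in sqlite3 module. SQLite stands in for the warehouse database here only so that the example is runnable; the table, index name, and generated rows are hypothetical, and the actual benefit depends on the database and the size of the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (item_id INTEGER, store_id INTEGER, sales_dollars REAL)"
)
conn.execute("CREATE INDEX sales_item_idx ON sales_fact (item_id)")

def batch_load(connection, rows):
    """Load rows without index maintenance, then rebuild the index once."""
    # Drop the index so that each inserted row does not update it.
    connection.execute("DROP INDEX IF EXISTS sales_item_idx")
    with connection:  # commit the whole batch as one transaction
        connection.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    # Re-create the index in a single pass over the loaded data.
    connection.execute("CREATE INDEX sales_item_idx ON sales_fact (item_id)")

batch_load(conn, [(i % 100, i % 10, float(i)) for i in range(10_000)])

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('sales_fact')")]
```

After the load, the index exists again and covers all loaded rows, so end-user queries see the same access paths as before the batch window.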
Using Data Modeling Tools
Tools with a GUI enable definition, modeling, and reporting
Avoid a mix of modeling techniques caused by:
Development pressures
Developers with lack of knowledge
No strategy
Determine a strategy
Write and publish formally
Make available electronically
Tool options: CASE tools, spreadsheets, paper and pencil
GUI Tool Interface
Data Modeling Tools
You can model the warehouse database by using tools that provide a GUI for:
Entering metadata definitions of facts, dimensions, hierarchies, and relationships
Drawing diagrams of star schemas containing the facts and dimensions
Documenting business requirements
Defining integrity rules and constraints
Generating reports about your metadata definitions
Techniques and Considerations
Avoid implementing your data warehouse using a mixture of techniques or models. This
mixture is often caused by:
Development pressure, where a combination of all previous models is considered a quick approach
Unknowledgeable or untrained designers
Lack of a coherent and available strategy
Determine a strict modeling strategy, and publish the approved strategy formally
throughout the business subject areas.
Consider establishing a data warehouse group to write and maintain all standards and
procedures, or to adapt existing standards and procedures to accommodate data
warehousing. The documents should be made available electronically (on the Web, for
example) and placed in a central repository.
GUI Data Modeling Tools
These tools are also referred to as computer-aided software engineering (CASE) tools.
Instead of using these tools, many warehouse implementers simply use spreadsheets or
paper and pencil to model their designs and document the metadata.
WTI Partner                     Product
Logic Works (see note below)    ERwin
MicroStrategy                   DSS Architect and DSS Agent
Oracle                          Designer, Data Mart Designer
Prism Solutions, Inc.           Inmon Generic Data Models
Smart Corporation               Smart DB Workbench
Note: Logic Works was acquired by Platinum, which in turn was acquired by
Computer Associates.
Summary
This lesson discussed the
following topics:
Creating a business model
Creating a dimensional model
Modeling the summaries
Creating a physical model
[Figure: modeling flow from selecting among business processes to the business model, dimensional model, and physical model]
Summary
In this lesson, you explored one process for modeling the warehouse database. This
lesson discussed the following topics:
Creating a business model driven by business processes
Creating a logical dimensional model containing a central fact characterized by
several dimensions
Modeling the summaries needed for end-user analysis
Translating the logical model to a physical model
Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse
Database Design. That course uses a case study to teach comprehensive database
design, whereas this lesson provided a high-level overview.
Practice 7-1 Overview
This practice covers the following topics:
Specifying true or false to a series of statements
Completing a series of sentences accurately
Practicing identifying a simple business model
Practice 7-1
1 Identify whether the following statements are true or false.
Question (True / False)
The business model is a logical representation of selected business processes.
The star model is normalized.
The snowflake model is denormalized.
All warehouses must have a time dimension.
In a warehouse environment, data loading performance is less important than query performance.
2 Complete these sentences.
a Access to data in a _________ table is faster than calculating aggregates at the
time of query execution.
b The data warehouse model contains ____ tables that comprise the measures of
the business.
c Dimensions are denormalized in a _______ model.
d A common guideline is to define granularity at one level ________ than
currently used by end users.
3 Practice identifying a simple business model. Pair up with a partner and take turns
interviewing each other to sketch a simple business model.
a Ask your partner to list several of the most important business processes in his
or her organization.
b Ask him or her to prioritize a single business process that would be easiest to
model and deliver the best return on investment in a short time as a data
warehouse project.
c For the chosen business process, help your partner identify one or two business
measures and dimensions that give meaning to those measures.
Lesson 8: Choosing a Computing Architecture

Overview
[Figure: course road map with Project Management (Methodology, Maintaining Metadata) spanning these blocks: Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Analyzing User Query Needs; Choosing a Computing Architecture (highlighted); Modeling the Data Warehouse; Planning Warehouse Storage; ETT (Building the Warehouse); Meeting a Business Need; Supporting End User Access; Managing the Data Warehouse]

Objectives
After completing this lesson, you should be able to
do the following:
Discuss the architectural requirements for the data
warehouse
Consider the benefits of each hardware
architecture
Describe the database server characteristics
required in a warehouse environment
Review the importance of parallelism for the data
warehouse environment
Overview
The previous lesson covered modeling the data warehouse. This lesson discusses
choosing a computing architecture for the warehouse. Note that the Choosing a
Computing Architecture block is highlighted in the course road map on the facing
page.
Specifically, this lesson examines the computer architectures that commonly support
data warehouses. The benefits of each hardware architecture and reasons for using
distributed warehouses are examined. Students examine the technology requirements
of a database server for warehousing.
Objectives
After completing this lesson, you should be able to do the following:
Discuss the architectural requirements for the data warehouse
Consider the benefits of each hardware architecture
Describe the database server characteristics required in a warehouse environment
Review the importance of parallelism for the data warehouse environment

Architectural Requirements
Scalability, Manageability, Availability, Extensibility, Flexibility, Integration
[Figure: balancing user, business, budget, and technology considerations]

Strategy for Architecture Definition


Obtain existing architecture plans
Obtain existing capacity plans
Document existing interfaces
Prepare capacity plan
Prepare technical architecture
Document operating system requirements
Develop recovery plans
Develop security and control plans
Create architecture
Create technical risk assessment
Architecture Requirements
The tenets listed on the Architectural Requirements slide are regarded as the primary
tenets of a data warehouse environment; that is, the architecture must be scalable,
manageable, available, extensible, flexible, and integrated. This list can be
extended to include tunable, reliable, robust, supportable, and recoverable.
Making Compromises
Compromises may affect the task of balancing user needs and business requirements if
budgetary constraints restrain your choices or if technical difficulties are too
challenging.
The architecture requirements definition must be considered at an early stage, in
parallel with the user requirements. Only at this time can successful choices be made.
Architecture requirements definition is a specific phase of the Oracle Data Warehouse
Method (DWM).
Strategy for Architecture Definition
You must have a definitive strategy that employs identified and proven technology.
Using DWM as a foundation for this discussion, consider some of the tasks you need
to perform in the early stages when planning the hardware architecture and
surrounding environment.
Obtain existing plans and outlines of the current technical architecture for the
environments that will supply the warehouse.
Obtain existing capacity plans for the current environments.
Document existing data warehouse interfaces, and document enterprise data
warehouse interface requirements.
Prepare enterprise data warehouse capacity plan.
Prepare enterprise data warehouse technical architecture.
Document enterprise data warehouse system operational requirements.
Develop recovery and fallback strategy.
Develop security and control strategy.
Create enterprise data warehouse architecture.
Create technical risk assessment.
All of these tasks are mentioned in this lesson, but not in the order identified above.

Hardware Architectures
Involve all experts
New technology
Old technology
Networking

Hardware Architectures
Robust
Available
Reliable
Extensible
Scalable
Supportable
Recoverable
Parallel
VLM
64-bit
Connective
Open
The Hardware Architecture
Consider the hardware architectures first. This is an area of the plan where a number
of people, including the data warehousing IT team members, must be involved. These
people include the current database administrators of the operational systems, who
have experience and expertise with the current systems and their performance and who
can also provide useful input regarding the existing architectures and interfaces. You
must ensure that networking staff are involved as well, because networking is a
critical issue for processes such as ETT and user access.
Hardware Requirements
The choice of hardware architecture is critical to the success of the data warehouse and
its infrastructure. Warehouses require hardware architectures that are:
Robust
Available
Reliable
Flexible
Extensible
Scalable
Supportable
Recoverable
Parallel
In addition, the architecture should
Have a very large memory (VLM) capability
Be able to use 64-bit addressing
Be connective and conform to open system standards
Note: Do not confuse the term database server with a file server on a local area
network or any other server. For our purposes, the term database server describes the
Relational Database Management System (RDBMS) or Database Management
System (DBMS).
Hardware Architectures
SMP
Cluster
MPP
NUMA
Hybrids use SMP and
MPP
Evaluation Criteria
Determine the platform for your needs
[Figure: SMP, Clusters, NUMA, and MPP platforms compared on scalability and maturity, each on a scale from low to high]
Hardware Requirements (continued)
Today, hardware architectures support a number of different configurations that are
useful for data warehousing and are more cost-effective than hardware architectures
previously available:
Symmetric multiprocessing (SMP): SMP architectures are the oldest of the
technologies and have a proven track record.
Cluster and massively parallel processing (MPP): Cluster and MPP architectures are
comparatively new but are more scalable and provide a lot of power.
Nonuniform memory access (NUMA): NUMA is an even more recent innovation that gives
you the scalability of an MPP environment and the manageability of an SMP environment.
Some architectures are a hybrid, employing both SMP and MPP capabilities.
Evaluation Criteria
By specifying the hardware requirements early on in the development of the
warehouse, you have enough lead time to acquire and test the chosen components.
Determining the platform depends upon a number of factors, and the different
architectures have advantages and disadvantages that you must evaluate before
making a final decision:
A symmetric multiprocessing architecture may be sufficient if you have a small
database, can afford a longer response time, and have problems that are not
complex. Problem complexity is determined by the number of users, the type of
calculations, and the types of queries that the system must handle.
The larger your database, the more complex your problems, and the shorter the
required response time, the closer you are to specifying a massively parallel
processing system.
Parallel Processing
Parallel daily operations
Shared resources
Memory
Disk
Nothing
Loosely or tightly coupled
Application
Database
Operating system
Hardware
Making the Right Choice


Requirements differ from operational systems
Benchmark
Available from vendors
Develop your own
Use realistic queries
Scalability important
Parallel Processing
Hardware architectures that contain parallel processors are often categorized
according to the resources they share.
Shared memory: SMP machines are often described as tightly coupled.
Shared disk: Clustered architectures are often described as loosely coupled.
Shared nothing: MPP machines are described as loosely or tightly coupled, according to
the way communication is accomplished among nodes.
NUMA is an SMP architecture with loosely coupled memory that uses both uniform and
nonuniform memory access.
Making the Right Choice
How do you know which architecture to choose? Operational environments do not
map directly to the way the warehouse operates, with its unpredictable workloads and
scalability requirements.
The only realistic way to determine the interaction between your data warehouse
database and the hardware configuration is to perform full-scale testing. Of course, you
may not always be able to achieve this.
When benchmarking, use real user queries against volumes of data that mimic the
volumes anticipated in the warehouse.
If you are unhappy with vendor benchmarks, consider developing your own. This is
going to add to the cost of development. However, costs are high for a warehouse
implementation and you may find the amount spent on your own benchmark
worthwhile in the long term.
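A home-grown benchmark can start small. The following is a minimal sketch, assuming a Python environment and using the standard sqlite3 module purely as a stand-in for the warehouse database. The table name, row counts, and query are hypothetical placeholders; in practice you would substitute real user queries and the data volumes you anticipate.

```python
import sqlite3
import time


def build_sample_warehouse(row_count):
    """Create an in-memory table with row_count synthetic sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region INTEGER, amount INTEGER)")
    rows = ((i % 10, i % 1000) for i in range(row_count))
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def time_query(conn, sql, repeats=3):
    """Return the best elapsed time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(volumes, sql):
    """Time the same query at each data volume to see how it scales."""
    results = {}
    for row_count in volumes:
        conn = build_sample_warehouse(row_count)
        results[row_count] = time_query(conn, sql)
        conn.close()
    return results


if __name__ == "__main__":
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for rows, seconds in run_benchmark([10_000, 100_000], query).items():
        print(f"{rows:>8} rows: {seconds:.4f} s")
```

Running the same query at several volumes shows how elapsed time grows with data size, which is the behavior a warehouse benchmark needs to expose.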
Because scalability is probably one of the most important requirements, you might
tend toward the choice of a more scalable platform, such as an MPP or NUMA machine.
SMP
Communication by shared memory
Disk controllers accessible to all CPUs
Proven technology
[Diagram: four CPUs connected by a common bus to shared memory and shared disks]
SMP
Benefits:
High concurrency
Workload balancing
Moderate scalability
Easy administration
Limitations:
Memory (cluster for improvements)
Bandwidth
Symmetric Multiprocessing
A symmetric multiprocessing (SMP) machine comprises a set of CPUs that share
memory. It has a shared everything architecture:
Each CPU has full access to the shared memory through a common bus.
Communication between the CPUs uses the shared memory.
Disk controllers are accessible to all CPUs.
This is a proven technology, particularly in the data warehousing environment.
Note: A bus is a cable or circuit used to transfer data or electrical signals among
devices.
Benefits
High concurrency
Workload balancing
Moderate scalability; SMP is not as scalable as MPP or NUMA
Easier to administer than a cluster environment, with proven tools
Limitations
Available memory may be limited; this can be enhanced by clustering
Bandwidth for CPU-to-CPU communication and for I/O and bus communication
Note: SMP machines are often nodes in a cluster. Multiple SMP nodes can be used
with certain vendors' architectures (DEC, Pyramid, Sequent, Sun, SparcServer)
where disk is shared among the multiple nodes. Some warehouse sites are exploring
the evolving concept of loaning excess memory or processing capacity among
applications or hardware.
Some SMP vendors allow you to scale to MPP without losing your SMP box. You
simply add interconnect software and associated technology.
NUMA
Nonuniform memory access
Loosely coupled shared memory
[Diagram: six CPUs, two shared memory units, and two disks connected by a shared bus]
NUMA
Benefits:
Fully scalable, incremental additions to disk,
CPU, and bandwidth
Performs better than MPP
Suited for Oracle server
Limitations:
The technology is new and less proven
You need new tools for easy system
management
NUMA is more expensive than SMP
Nonuniform Memory Access
Shared memory systems are systems with loosely coupled memory. The shared
memory may be accessed by using uniform memory access from CPUs or by
nonuniform memory access (NUMA).
The Oracle Parallel Server can work with either form of memory access, but NUMA is
a more costly form of access and synchronization than uniform memory access. Although
any CPU can access all of the memory, access to memory on a remote node costs more
than access to local memory.
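The cost difference can be illustrated with a simple weighted average. The sketch below is not taken from the course: the latencies (100 ns local, 300 ns remote) and the function name are hypothetical assumptions chosen only to show the arithmetic, and real figures depend on the hardware.

```python
def average_access_cost(local_fraction, local_ns=100.0, remote_ns=300.0):
    """Average memory access time when some accesses go to a remote node.

    local_fraction is the share of accesses (0.0 to 1.0) served from the
    CPU's own node. The latency defaults are illustrative assumptions only.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    for fraction in (1.0, 0.9, 0.5):
        cost = average_access_cost(fraction)
        print(f"{fraction:.0%} local accesses: {cost:.0f} ns on average")
```

The more often a CPU must reach memory on another node, the closer the average cost moves toward the remote latency, which is why data placement matters on NUMA machines.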
Benefits
A fully scalable architecture that can overcome some of the scalability problems of
SMP
A very scalable parallel architecture, to which you can add disk, CPU, and bandwidth
incrementally to any level
Better performance than an MPP system for ad hoc or mixed workloads
Suited to the Oracle server
Limitations
The technology is new and less proven.
You need new tools for easy system management.
NUMA is more expensive than SMP.
Clusters
[Diagram: Node 1, Node 2, and Node 3, each with three CPUs and shared memory, connected by a common high-speed bus to shared disks]
Clusters
Shared disk, loosely coupled
Dedicated memory
High-speed bus
Shared resources
SMP node
Benefits:
High availability
Single database concept, incremental growth
Limitations:
Scalability, internode synchronization needed
Operating system overhead
Clusters
Shared disk, loosely coupled systems have the following characteristics:
Each node consists of one or more CPUs and associated dedicated memory.
Memory is not shared between nodes.
Communication occurs over a high-speed bus.
Each node has access to all of the disks and other resources.
An SMP machine can be a node, if the hardware supports it.
Benefits
High availability; all data is accessible even if one node dies
The concept of one database, which is an advantage over shared nothing systems
such as MPP
Incremental growth
Limitations
Bandwidth of the high-speed bus limits the scalability of the system.
Internode synchronization is required. Each node has a data cache; cache
consistency must be maintained for the locking mechanisms to work effectively.
The shared disk software imposes an overhead on the operating system.
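The internode synchronization requirement can be pictured with a toy model. The sketch below is a deliberately simplified write-through scheme with invalidation, assumed here for illustration only; it is not a description of how Oracle Parallel Server or any vendor's cluster software maintains cache consistency, and all class and method names are hypothetical.

```python
class SharedDisk:
    """Disk visible to every node, as in a shared disk cluster."""

    def __init__(self):
        self.blocks = {}


class Node:
    """A cluster node with its own private data cache."""

    def __init__(self, name, disk, cluster):
        self.name = name
        self.disk = disk
        self.cache = {}
        self.cluster = cluster
        cluster.append(self)

    def read(self, block_id):
        # Serve from the local cache when possible, otherwise go to disk.
        if block_id not in self.cache:
            self.cache[block_id] = self.disk.blocks[block_id]
        return self.cache[block_id]

    def write(self, block_id, value):
        # Write through to the shared disk, then invalidate stale copies
        # held by the other nodes (the internode synchronization step).
        self.disk.blocks[block_id] = value
        self.cache[block_id] = value
        for other in self.cluster:
            if other is not self:
                other.cache.pop(block_id, None)


if __name__ == "__main__":
    disk, cluster = SharedDisk(), []
    node1 = Node("node1", disk, cluster)
    node2 = Node("node2", disk, cluster)
    node1.write("orders", 1)
    node2.read("orders")       # node2 now caches the block
    node1.write("orders", 2)   # invalidates node2's cached copy
    print(node2.read("orders"))
```

Every write touches every other node's cache, which hints at why synchronization traffic on the bus limits how far a cluster scales.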
MPP
[Diagram: four nodes, each with its own CPU, memory, and disk]
MPP
A shared nothing architecture
Many nodes
Fast access
Exclusive memory on a node
Low cost per node
Scalable
nCUBE configuration
Massively Parallel Processing
The massively parallel (MPP) architecture is concerned with disk access, rather than
memory access, and works well with operating systems that provide transparent disk
access. You can scale the configuration up by adding more CPUs.
If a table or database is located on a disk, access depends entirely on the CPU that
owns it. If the CPU fails, the data cannot be accessed, regardless of how many other
CPUs are running, unless logical pointers are established to alternative CPUs.
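The ownership rule described above can be sketched in a few lines. This is a minimal model, assuming simple modulo placement of integer keys; the class name, methods, and placement rule are hypothetical and stand in for whatever partitioning scheme a particular database server uses.

```python
class SharedNothingSystem:
    """Rows are partitioned across nodes; each node owns its own disk."""

    def __init__(self, node_count):
        self.partitions = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count

    def owner(self, key):
        # Modulo placement of integer keys (an illustrative assumption).
        return key % len(self.partitions)

    def insert(self, key, row):
        self.partitions[self.owner(key)][key] = row

    def lookup(self, key):
        node = self.owner(key)
        if not self.alive[node]:
            # No other node can read the failed node's disk in this model.
            raise RuntimeError(f"node {node} is down; its data is unreachable")
        return self.partitions[node][key]

    def fail(self, node):
        self.alive[node] = False


if __name__ == "__main__":
    system = SharedNothingSystem(node_count=4)
    for key in range(8):
        system.insert(key, f"row{key}")
    system.fail(1)
    print(system.lookup(2))   # owned by node 2, still available
    try:
        system.lookup(5)      # owned by node 1, which has failed
    except RuntimeError as error:
        print(error)
```

The model omits the logical pointers to alternative CPUs mentioned above; adding them would amount to letting a second node take over the failed node's partition.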
Typically, massively parallel architectures have the following characteristics:
Are very fast compared with SMP and cluster architectures
Support a few to thousands of nodes
Provide fast access between nodes
Have nonshared memory associated with each node
Have a low cost per node
Massively parallel technology is comparatively new and not proven to the same extent
as SMP and cluster technology.
nCUBE Arrangements: Nodes may be organized in a grid arrangement if using
nCUBE. Multiprocessor designs provide a scalable architecture that lets you increase
performance easily as your needs grow. The key to a multiprocessor system is the
interconnect, the mechanism that allows the processors to communicate and
cooperate. In an nCUBE system, processors are connected in a multidimensional cube
called a hypercube, providing the fastest and densest communications network
available. The hypercube network is organized so that connections among processors
form cubes. As more processors are added, the cube grows to larger dimensions. The
nCUBE system is scalable to hundreds of processors.
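A hypercube interconnect is easy to model. The sketch below uses the standard textbook numbering, in which nodes are labeled with binary numbers and two nodes are linked when their labels differ in exactly one bit. That numbering and the function names are assumptions for illustration, not details given for nCUBE hardware in this course.

```python
def hypercube_neighbors(node, dimensions):
    """Return the nodes directly linked to `node` in a hypercube.

    Nodes are numbered 0 .. 2**dimensions - 1; two nodes are linked when
    their binary labels differ in exactly one bit.
    """
    if not 0 <= node < 2 ** dimensions:
        raise ValueError("node is outside the hypercube")
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hops(source, target):
    """Minimum number of links between two nodes (count of differing bits)."""
    return bin(source ^ target).count("1")


if __name__ == "__main__":
    for d in (2, 3, 4):
        print(f"{d}-dimensional cube: {2 ** d} nodes, {d} links per node")
    print(hypercube_neighbors(0, 3))
    print(hops(0, 7))
```

Each added dimension doubles the number of processors while adding only one link per node, so the longest path between two nodes grows by just one hop.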
MPP Benefits
Unlimited incremental growth
Very scalable
Fast access
Low cost per node
Good for DSS
MPP Limitations
Rigid partitioning
Cache consistency
Restricted disk access
High memory cost per node
High management burden
Careful data placement
Massively Parallel Processing (continued)
Benefits
Practically unlimited incremental growth
Very scalable (given careful data placement)
Fast access between nodes
Low cost per node (each node is an inexpensive processor)
Each node has its own devices, but, in case of failure, other nodes can access the
devices of the failed node (on most systems); failure may be local to the node.
Good for DSS and read-only databases
Limitations
Many database servers require rigid data partitioning for parallelism and
scalability (this is not necessary with Oracle).
Cache consistency must be maintained.
Disk access is restricted.
The memory cost per node is high.
The management burden is high.
Careful data placement is required for scalability.
Windows NT
Architecture based on the client-server model
Benefits:
Include built-in Web services
Scalability
Ease of management and control
Limitations:
Not as secure
Cannot execute programs remotely
Lack linear scalability beyond four processors
Addressing space for applications is limited to
two gigabytes
Windows NT
The architecture for Windows NT is based on the client-server model. The approach
divides the operating system into an executive running in kernel mode and several
server processes, each running in user mode. Each server process implements a unique
operating system environment.
Benefits
Windows NT server operating system includes built-in Web services that provide a
complete, integrated intranet solution.
Windows NT offers scalability improvements of up to 33 percent, yielding more
linear scalability on machines with eight or more processors.
Ease of management and control: user profiles and system policies enable
system administrators to easily manage user desktops, including the ability to
control access to the network and to desktop resources, as well as support for users
accessing multiple workstations.
Limitations
Windows NT is not as secure as other operating systems such as UNIX.
On other operating systems, you can execute programs on your machine remotely,
but you cannot do this with Windows NT.
Although Windows NT can support SMP with up to 32 processors, Windows NT
has been criticized for its lack of linear scalability beyond four processors.
Addressing space limits Windows NT applications to two gigabytes. This is
insufficient for large data warehouses.
Architectural Tiers
Tiered structures:
Modular
Logical separation
Distributed structures:
Two-tier
Three-tier
Four-tier (and more)
Architectural Tiers
Architectures can be the simple two-tier type, the more complex three-tier type, or, if
Web applications are involved, up to a four-tier type. This enables a useful division of
labor for specific tasks and processes, and can assist and complement the network
setup.
Two-Tier Architecture: A simple two-tier architecture involves:
A mainframe, such as an IBM machine, with source data, which is extracted and copied
periodically to the smaller server
A smaller server, such as a Windows NT server
A query and analysis tool is then provided for the NT environment.
This structure does not fit well into the kind of enterprisewide environments discussed
so far. Three-tier architectures are more common.
Three-Tier Architecture: A three-tier architecture employs a separate middleware
layer for data access and translation.
Tier 1 hosts the production applications on a mainframe or midrange system and is
devoted to real-time production level data processing.
Tier 2 comprises a departmental server resident with the warehouse users, for
example, a UNIX workstation or NT server, which is optimized for query
processing, analysis, and reporting.
Tier 3 comprises the desktop and handles reporting, analysis, and graphical data
presentation. PCs are connected on a LAN.
The three-tier architecture is more effective than two-tier architecture because the first
tier is devoted to operational processing, the second to department-level query
processing and analysis, and the third to desktop data presentation.
Four-Tier and Greater Architecture: This architecture is similar in structure to the
three tiers, with the addition of a Web-based tier.
Middleware
Technologies for integration
Gateway
Middleware
Middleware is a term that is used to describe technologies that allow you to integrate
multiple server technologies seamlessly. Middleware tools are
common in today's computing environment. Oracle gateway technology is one
example of middleware available off the shelf.
In a multitier data warehousing environment with Internet access, middleware is
being increasingly redefined and refined.
Database Server Requirements


Robust
Available
Reliable
Extensible
Scalable
Supportable
Recoverable
Parallel
Parallelism
Database
Query
Load
Index
Sort
Backup
Recovery
Database Server Requirements
The database server (DBMS) must be:
Robust
Available
Reliable
Flexible
Extensible
Scalable
Supportable
Recoverable
Parallel
Parallelism
The driving force behind the warehouse implementation is the needs of the end users
who require access to the information. The database environment must handle all
operational tasks and processes quickly and efficiently. Parallel capabilities
minimize the time taken to perform all the major functions of the warehouse and
maximize availability.
As you have seen, parallelism at all levels is becoming mandatory for warehouses:
Database (server)
Query
Load
Index
Sort
Backup
Recovery
Further Considerations
Optimization strategy
Partitioning strategy
Summarization strategy
Indexing techniques
Hardware and software scalability
Availability
Administration
Server Environments
Operational servers: Open DBMS (network, relational, hierarchical) and mainframe proprietary DBMS; for example, Oracle, IMS, DB2, VSAM, Rdb, Non Stop SQL, RMS
Warehouse servers: Open DBMS (relational); general purpose and warehouse-specific DBMS; for example, Oracle, Informix, Sybase, IBM DB2, NCR/AT&T Teradata, Red Brick
Data mart servers: Open DBMS (relational and multidimensional); general purpose and warehouse-specific DBMS; for example, Oracle, Oracle Express, Arbor Essbase, MS SQL Server, NT
Further Considerations
Parallelism is not the only consideration; you must also consider the following:
The optimization strategy, particularly star query techniques employed with star
and snowflake structures (Today's servers enable you to optimize data access in
many different ways.)
The partitioning strategy
Summarization strategies, to ensure that the overhead of creating summaries does
not affect the load
Indexing techniques, in particular, bitmap indexes
Hardware and software scalability
Availability of the warehouse
The system administration, which must easily manage the entire infrastructure
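To see why bitmap indexes suit warehouse queries, consider the following minimal sketch. It assumes Python integers as bitmaps and uses hypothetical function names and sample rows; it illustrates the general technique and is not how any particular database server stores its indexes.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct column value to an integer used as a bitmap.

    Bit i of the bitmap is set when row i holds that value.
    """
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return dict(index)


def matching_rows(bitmap):
    """Turn a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


if __name__ == "__main__":
    customers = [
        {"region": "east", "status": "gold"},
        {"region": "west", "status": "gold"},
        {"region": "east", "status": "silver"},
        {"region": "east", "status": "gold"},
    ]
    by_region = build_bitmap_index(customers, "region")
    by_status = build_bitmap_index(customers, "status")
    # AND the two bitmaps to answer "region = east AND status = gold".
    print(matching_rows(by_region["east"] & by_status["gold"]))
```

Combining conditions becomes a bitwise operation on compact bitmaps, which works best for columns with few distinct values, such as many dimension attributes.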
Server Environments
Many different database servers and hardware architectures can be employed for a
warehouse solution. It is generally assumed that data warehouse database technology
means relational technology.
Operational Servers: Open or mainframe proprietary database servers (whether
network, hierarchical, or relational), such as Oracle, IMS, DB2, DB2/PE, VSAM,
Rdb, Non-Stop SQL, or RMS.
Warehouse Servers: Open (usually relational) database servers that may be
warehouse specific or general purpose, such as Oracle, Informix, Adabas D,
OpenIngres, or Red Brick.
Data Mart Servers: Relational databases, multidimensional (OLAP) databases, or both;
they may be warehouse specific or general purpose, such as Oracle, Oracle Express,
Arbor Essbase, MS SQL Server, and NT-based environments.
Parallel Processing
A large task broken into smaller tasks:
Concurrent execution
One or more processors
[Diagram: elapsed time with one processor (not parallel) compared with four processors (parallel)]
Parallel Database
Increased speed
Improved scalability
Performance gains
Availability
Flexibility
More users
Parallel Processing
A parallel processor takes a task (usually a large task) and divides it into smaller tasks
that can be executed concurrently on one or more nodes (separate processors). As a
result, a large task requested by a single user completes more quickly. Before
examining the individual parallel features, consider the parallel database.
Parallel Database
A parallel database takes advantage of architectures that share access to data, software,
and peripheral devices by running multiple instances that share a single physical
database.
This type of processing has two key features:
Increased speed: The server can perform the same task in less time
Improved scalability: The ability to perform a task many times larger, on a system
many times larger, without any performance degradation
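These two features are commonly expressed as ratios of elapsed times. The sketch below assumes the usual definitions (speedup compares the same task on two systems; scaleup compares a task and system that grow together). The function names and the sample timings are hypothetical.

```python
def speedup(time_original, time_parallel):
    """How many times faster the same task runs on the larger system."""
    return time_original / time_parallel


def scaleup(time_small_task_small_system, time_large_task_large_system):
    """Ratio of elapsed times when the task and the system grow together.

    A value of 1.0 means there is no performance degradation.
    """
    return time_small_task_small_system / time_large_task_large_system


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(f"speedup: {speedup(120.0, 30.0):.1f}x")
    print(f"scaleup: {scaleup(60.0, 75.0):.2f}")
```

A scaleup below 1.0, as in the second sample, indicates that the larger system took longer on the proportionally larger task, so some degradation occurred.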
These key features give you the following benefits:
Higher performance
Greater availability
Greater flexibility
Greater accessibility to online users
All of these features directly benefit the warehouse and are supported by the Oracle7,
Oracle8, and Oracle8i Server.
Parallel Query
SQL code split among server processes.
[Diagram: a query divided into three subqueries]
Parallel Load
Bypass SQL processing to speed throughput.
[Diagram: Order table with Jan 98, Feb 98, and Mar 98 data]
Parallel Query
Most database servers today support parallel query. Specifically, the Oracle Server
parallel query option divides the work of processing a single SQL statement among
multiple query server processes. In some applications, particularly decision support
systems, an individual query may use vast amounts of CPU resource and disk I/O. The
server parallelizes individual queries into units of work that can be processed
simultaneously.
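The idea of dividing one statement into units of work can be sketched as follows. This is an illustration only: Python threads stand in for query server processes, the aggregate is a simple SUM, and all names are hypothetical. It shows the decomposition and recombination, not the Oracle Server implementation, and in standard CPython threads do not run this CPU-bound work faster.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(rows):
    """Work done by one query server: scan a slice and aggregate it."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, degree=4):
    """Split one SUM query into up to `degree` units of work and combine them."""
    chunk = max(1, -(-len(rows) // degree))  # ceiling division
    slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return sum(pool.map(partial_sum, slices))


if __name__ == "__main__":
    orders = [(order_id, order_id % 100) for order_id in range(1000)]
    serial = sum(amount for _, amount in orders)
    print(parallel_total(orders, degree=4) == serial)
```

The final step, combining the partial results, corresponds to the coordinating process that assembles the answer returned to the user.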
Parallel Load
Parallelism can dramatically speed up loading data. Database servers can bypass
standard SQL processing (that is, data manipulation language commands, such as
INSERT), and the data is loaded directly into the database tables.
Parallel Processing
Index: Reduces the time to create
Sort: Allocates memory in cache efficiently
Backup: Runs simultaneously from any node (offline or online)
Recovery: Runs simultaneously from redo logs
Summaries: Uses the CREATE TABLE AS SELECT statement
Parallel Index
Creating an index in parallel decreases the time required to create and reconfigure a
warehouse. Many indexes exist in the warehouse database. Nearly every attribute on
dimension tables and composite key values on the fact table are indexed. Indexes take
up a lot of space in the warehouse, and you must consider the direct access storage
device (DASD) needed for indexes as well as fact and dimension tables.
Parallel Sort
Sorting is an intensive task that requires a substantial amount of memory. If you are
working in a parallel environment, sort areas are allocated more efficiently to reduce
serialization and cross-instance pinging. Sort space is cached in memory (in the Oracle
server this is in the System Global Area).
Parallel Backup
With parallel operations, backups can be performed simultaneously from any node of a
parallel server.
Online backups are taken while the database is active, allowing users continuous
access.
Offline backups are taken while the database is shut down, preventing user access.
Parallel Recovery
The goal of parallel recovery is to employ I/O parallelism to reduce the elapsed time
required to perform crash recovery, instance recovery, or media failure recovery. The
server uses one process to read files sequentially and dispatch redo information to
several recovery processes to apply the changes from the log files to the data files.
Parallel Table Creation
With the Oracle7, Oracle8, and Oracle8i Server you can create tables in a parallel
manner using the CREATE TABLE AS SELECT (CTAS) statement.
Summary
This lesson discussed the following topics:
Outlining the basic architecture requirements for a warehouse
Highlighting the benefits and limitations of all the different hardware architectures

Practice 8-1 Overview


This practice covers the following topics:
Defining, stating benefits and limitations of SMP,
NUMA, clusters, and MPP
Defining parallelism and explaining its importance
to the data warehouse
Practice 8-1
1 Form into small groups, and consider each of the following hardware
architectures. With your books closed, create a short definition for each
architecture. Each answer should include the benefits and limitations of each
architecture.
Architecture    Definition    Benefits    Limitations
SMP
NUMA
Clusters
MPP
2 Staying in your small group, discuss each of the following questions.
a What is parallelism?
b Why is it important to the data warehouse?
Lesson 9: Planning Warehouse Storage

Overview
[Figure: course road map. Under Project Management (Methodology, Maintaining Metadata),
the blocks are Defining DW Concepts & Terminology; Planning for a Successful Warehouse;
Analyzing User Query Needs; Modeling the Data Warehouse; Choosing a Computing
Architecture; Planning Warehouse Storage (highlighted); ETT (Building the Warehouse);
Supporting End User Access; Managing the Data Warehouse; and Meeting a Business Need.]
The previous lesson covered choosing a computing architecture. This lesson discusses
planning warehouse storage. Note that the Planning Warehouse Storage block is
highlighted in the course road map on the facing page.
Specifically, this lesson examines the database setup and management issues such as
partitioning, indexing, and ways to protect your database.
Objectives
After completing this lesson, you should be able to do the following:
Discuss different partitioning methods and types of indexes
Consider the benefits and limitations of different RAID levels in protecting the
database

Data Partitioning
Breaking up of data into
separate physical
units that can be handled
independently
Ease of:
Restructuring
Reorganization
Removal
Recovery
Monitoring
Management
Archiving
Indexing
[Figure: Order table partitioned by month (Jan 98, Feb 98, Mar 98); a partition can be
added or dropped, and other data is not affected]

Objects to Partition
Tables:
Fact
Dimension
Indexes
The Server Data Architecture
Data Partitioning
Partitioning enables you to break tables down into smaller, more manageable units,
thus addressing the problems of supporting the large tables and indexes that are
inherent in data warehouses. A large table is broken into many smaller physical tables
or views, which are pulled together again for queries that access data from more than
one of them.
The data may be partitioned horizontally or vertically. Partitioning helps in the
following ways:
Improves the speed of access and data management, because query and backup tasks
need to visit only the relevant partitions rather than all of them
Increases availability, by reducing the time needed to perform warehouse
management tasks (such as load) and by letting you take one area of the database
offline while keeping the others active
You partition fact data to break the large volumes of data up into smaller units.
Partitioned data can easily be:
Restructured
Reorganized
Removed
Recovered
Monitored
Managed
Archived
Indexed, with improved sequential data scanning
Note: In determining objects to partition, you use partitioning initially on the fact
table, because it is the largest and requires the most management and maintenance.
However, you can use partitioning on any table in the data warehouse. You should also
partition indexes.

Horizontal Partitioning
Table and index data are split by:
Time
Sales region or person
Geography
Organization
Line of business
Candidate columns appear in
WHERE clause
Analysis determines requirement

Vertical Partitioning
You may use vertical partitioning when:
Speed of query and update actions is improved by
it
Users require access to specific columns
Some data is changed infrequently
Descriptive dimension text may be better moved
away from the dimension itself
Data Partitioning (continued)
There are two broad categories of partitioning: horizontal partitioning and vertical
partitioning.
Horizontal Partitioning Horizontal partitioning is commonly used in warehouse
environments because it enables you to store a very large table in smaller tables. It
gives the database administrator control over the rows that go into each table.
For example, 12 months of data can be stored in 12 tables or views, one for each
month. The advantage, when querying data, is that full table scans are reduced. A
query that requires information for the month of February merely scans a single table
or view of the data.
Warehouse partitioning can be based on different criteria, but usually one or more of
the following:
Time
Sales region
Sales person
Geographical unit
Organization
Line of business
Example: Partitioning by time is most common, because most of the information you
need for analysis is based on time periods. Partitioning by time is also effective for
loading and archiving tasks. You can insert a new data table into the warehouse for
each month, and easily remove (drop) the oldest table.
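A minimal sketch of partitioning by time follows, assuming a hypothetical MonthlyPartitions class that keeps one Python list per month. It is not how any database server stores partitions; it only shows why a query for February touches a single partition and why adding the newest month or dropping the oldest leaves the other data alone. The order rows are invented for illustration.

```python
from collections import OrderedDict


class MonthlyPartitions:
    """Rows of one logical table held in separate per-month lists (hypothetical)."""

    def __init__(self):
        self.partitions = OrderedDict()  # month label -> rows, oldest first

    def add(self, month, rows):
        """Insert a new month of data without touching existing partitions."""
        self.partitions[month] = list(rows)

    def drop_oldest(self):
        """Remove the oldest partition and return its month label."""
        month, _rows = self.partitions.popitem(last=False)
        return month

    def rows_for(self, month):
        """Scan only the partition for the requested month."""
        return self.partitions.get(month, [])

    def all_rows(self):
        """Full scan: visit every partition."""
        return [row for rows in self.partitions.values() for row in rows]


# Invented order rows: (order_id, amount)
orders = MonthlyPartitions()
orders.add("1998-01", [("A1", 120), ("A2", 75)])
orders.add("1998-02", [("B1", 200)])
orders.add("1998-03", [("C1", 50), ("C2", 310)])

february_total = sum(amount for _, amount in orders.rows_for("1998-02"))

orders.add("1998-04", [("D1", 90)])   # load the new month
dropped = orders.drop_oldest()         # archive or remove the oldest month
```

After these calls the table holds February through April, and the February query never read the January or March rows.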
Vertical Partitioning With vertical partitioning, you break tables up on a column-
by-column basis. You may use vertical partitioning when:
It would improve the speed of query and update actions.
Users require access to specific columns. It is useful if queries are specifically on a
small number of columns rather than a whole row, or you want to control visibility
to sensitive data, such as salary figures on a payroll (HR) system.
Some data is changed infrequently. You can keep the infrequently changed data in
a separate partition. It is easier to manage data this way, and you can make some of
the attributes globally read-only. You can also store less frequently accessed data
on CD-ROM and in a carousel or cartridge unit.
Descriptive dimension text may be better moved away from the dimension itself.
Initial partitioning strategies are normally set in the first implementation of the
warehouse. After the warehouse has been in use, analysis and review of performance,
users' query techniques, and data management strategies often reveal the need for
further or alternative partitioning. Review the strategy continually.

Partitioning Methods
Range partitioning (Oracle8 and Oracle8i)
Hash partitioning (Oracle8i)
Composite partitioning (Oracle8i)
[Figure: range partitioning, hash partitioning, and composite partitioning]
Partitioning Methods
The different types of partitioning methods that are available for Oracle8 and Oracle8i
are listed below.
Range Partitioning (Oracle8 and Oracle8i)
Range partitioning has been available since Oracle8. It partitions data based on
ranges of values and guarantees that only data with a particular set of values is
contained in each partition. Range partitioning is good for rolling windows of
data.
Hash Partitioning (Oracle8i)
Hash partitioning is a new feature of Oracle8i. Hash partitioning reduces
administrative complexity by providing many of the manageability benefits of
partitioning, with minimal configuration effort. When implementing hash
partitioning, the administrator simply chooses a partitioning key and the number of
partitions. Oracle8i automatically distributes the data evenly across all partitions.
Hash partitioning is particularly appropriate for tables that do not have a natural
partitioning key.
Composite Partitioning (Oracle8i)
Composite partitioning partitions data using the range method and, within each
partition, subpartitions it using the hash method. This new type of partitioning,
which is available only in Oracle8i, supports historical operations at the
partition level, and parallelism (parallel DML) and data placement at the
subpartition level. Composite partitioning is ideal for both historical data and
data placement.
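The three methods differ only in how a row is routed to a partition, which the following sketch tries to make concrete. Everything in it is an assumption for illustration: the partition names, the date bounds, and the use of zlib.crc32 as the hash (the hash function the server uses is internal and is not described here).

```python
import bisect
import zlib

# Hypothetical range partitions: each bound is an exclusive upper limit (ISO dates
# compare correctly as strings).
RANGE_BOUNDS = ["1998-02-01", "1998-03-01", "1998-04-01"]
RANGE_NAMES = ["jan98", "feb98", "mar98"]


def range_partition(order_date):
    """Range method: pick the first partition whose upper bound exceeds the date."""
    position = bisect.bisect_right(RANGE_BOUNDS, order_date)
    if position == len(RANGE_NAMES):
        raise ValueError("no partition defined for " + order_date)
    return RANGE_NAMES[position]


def hash_partition(key, n_partitions):
    """Hash method: the administrator supplies only a key and a partition count."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def composite_partition(order_date, customer_id, n_subpartitions=4):
    """Composite method: range on the date, then hash within that range partition."""
    return range_partition(order_date), hash_partition(customer_id, n_subpartitions)


print(range_partition("1998-02-14"))
print(hash_partition("customer-42", 4))
print(composite_partition("1998-03-31", "customer-42"))
```

Range routing depends on the meaning of the key (a date falls in exactly one month), while hash routing only needs the key to be present, which is why it suits tables without a natural partitioning key.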
Two new partitioning methods introduced in Oracle8i, hash and composite
partitioning, offer improvements for tables that do not naturally lend themselves to
range partitioning, in one or more of the following areas:
Ease of specification
Simplicity of management for support of parallelism
Reduction in skew in the amount of resources required to perform maintenance
operations (such as export or backup) on different partitions of a table
Performance, through support for partitionwise joins and intrapartition parallel
data manipulation language (DML)
Better use of hierarchical storage management solutions
Benefits of Partitioning A major reason for supporting partitioned objects in
Oracle8 and Oracle8i was the dramatic increase in the size of database objects (for
example, tables) and the need to:
Reduce downtime (owing to scheduled maintenance and data failures)
Improve performance through partition elimination (also called partition pruning)
Improve manageability and ease of configuration

Star Query Optimization


Optimum performance with star schema models
1. Dimensions are queried to create a
2. Cartesian product, computed against
3. Smaller reference tables.
4. The result is joined to
5. A fact table to produce a query result.
[Figure: steps 1 through 5 combining to produce the query result]

Star Transformation
STAR_TRANSFORMATION_ENABLED
[Figure: star schema in which Fact_Table (Key 1, Key 2, Key 3, Dollars) is joined to the
dimension tables Product_Table, Market_Table, and Time_Table; the sample selection is
Brand ABC, SF, and March 1998]
Star Query Optimization
Star query optimization is a mechanism that provides high levels of performance when
querying data in a star or snowflake model (a natural representation for most
warehouses). Optimizers that support star query execution can handle the complex joins
with a specific execution plan.
The star query works by accessing dimensions to create a Cartesian product, which is
computed against smaller reference tables. The result is joined to the fact table, which
is scanned once to produce the query result.
Note: The Oracle server cost-based optimizer supports this technique.
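The sequence described above (filter the dimensions, form the Cartesian product of the surviving keys, then join the result to the fact table) can be sketched as follows. The tables, keys, and dollar amounts are invented, and a dictionary lookup on the composite key stands in for the join to the fact table; this is not the execution plan that any optimizer actually produces.

```python
from itertools import product

# Hypothetical dimension tables: key -> attributes
product_dim = {1001: {"brand": "ABC"}, 1002: {"brand": "XYZ"}}
market_dim = {2001: {"city": "SF"}, 2002: {"city": "NY"}}
time_dim = {
    3001: {"month": "March", "year": 1998},
    3002: {"month": "April", "year": 1998},
}

# Hypothetical fact table keyed by (product key, market key, time key) -> dollars
fact = {
    (1001, 2001, 3001): 6000,
    (1001, 2002, 3001): 10000,
    (1002, 2001, 3001): 15200,
    (1001, 2001, 3002): 9526,
}


def matching_keys(dimension, **conditions):
    """Step 1: query one small dimension table for the keys that satisfy the filter."""
    return [
        key
        for key, attrs in dimension.items()
        if all(attrs[column] == value for column, value in conditions.items())
    ]


def star_query(brand, city, month, year):
    """Steps 2 to 5: Cartesian product of dimension keys, joined to the fact table."""
    combinations = product(
        matching_keys(product_dim, brand=brand),
        matching_keys(market_dim, city=city),
        matching_keys(time_dim, month=month, year=year),
    )
    return sum(fact[combo] for combo in combinations if combo in fact)


total = star_query("ABC", "SF", "March", 1998)
```

Because the dimension tables are small, the Cartesian product stays small too, and the large fact table is consulted only for key combinations that can possibly match.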
Star Transformation
The star transformation is a cost-based query transformation aimed at executing star
queries efficiently. Whereas the star optimization works well for schemas with a small
number of dimensions and dense fact tables, the star transformation may be considered
as an alternative if any of the following holds true:
The number of dimensions is large.
The fact table is sparse.
There are queries where not all dimension tables have constraining predicates.
The STAR_TRANSFORMATION_ENABLED parameter specifies whether a cost-based
query transformation is applied to star queries. The default value is FALSE. This
parameter can be set dynamically using the ALTER SESSION command.

Indexing
Indexing is used because:
It is a huge cost saving, greatly improving
performance and scalability
Can replace a full table scan by a quick read of the
index followed by a read of only those disk blocks
that contain the rows needed

B-Tree Index
Most common type of indexing
Used for high cardinality columns
Designed for few rows returned
Indexing Data
By intelligently indexing data in your data warehouse, you can increase both the
performance and scalability of your warehouse solution. Using indexes, you can
replace a full table scan by a quick read of the index followed by a read of only those
disk blocks that contain the rows needed. The types of indexes are described below.
B-Tree Indexes This is the most common type of indexing, used for high cardinality
columns, and designed for queries that return few rows. Rather than scanning an entire
table to find rows where a certain column satisfies a WHERE clause predicate, you
create a separate index structure on that column. This index structure contains a sorted
list of all the distinct column values, and each value in the index is associated
with a list of pointers to all the rows in the original table that contain that value. The
index is stored internally as a balanced tree (B-tree), which allows the database
engine to find any element in the sorted list quickly.
Note: Cardinality is defined as the number of distinct key values expressed as a
percentage of the number of rows in the table. For example, a million-row table with
five distinct values has a low cardinality, while a 100-row table with 80 distinct values
has a high cardinality.
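The two ideas in this section, a sorted list of distinct values with row pointers and cardinality as a percentage, can be illustrated with a short sketch. A sorted Python list searched with bisect stands in for the tree structure here; it is not a real B-tree, and the function names and sample rows are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(rows, column):
    """Return (sorted distinct values, value -> list of row positions)."""
    pointers = defaultdict(list)
    for row_id, row in enumerate(rows):
        pointers[row[column]].append(row_id)
    return sorted(pointers), pointers


def lookup(index, value):
    """Binary-search the sorted values, then follow the pointers to the rows."""
    keys, pointers = index
    position = bisect.bisect_left(keys, value)
    if position < len(keys) and keys[position] == value:
        return pointers[value]
    return []


def cardinality_pct(rows, column):
    """Distinct values as a percentage of the row count, as defined in the note."""
    return 100.0 * len({row[column] for row in rows}) / len(rows)


# Invented sample table: 100 rows, unique order_id, four colors
COLORS = ["Blue", "Green", "Mauve", "Gold"]
rows = [{"order_id": i, "color": COLORS[i % 4]} for i in range(100)]

order_index = build_index(rows, "order_id")
print(lookup(order_index, 42))            # one row position
print(cardinality_pct(rows, "order_id"))  # high cardinality
print(cardinality_pct(rows, "color"))     # low cardinality
```

On the unique order_id column each lookup returns a single pointer, which is the "few rows returned" case; on the color column each value points at a quarter of the table, so an index of this kind saves much less work.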

Bitmap Indexes
Provide performance benefits and storage savings
Store values as 1s and 0s
Use instead of B-tree indexes when:
Tables are large
Columns have relatively low cardinality
Bitmap index on product color
Blue - 1000100100010010100
Green - 0001010000100100000
Mauve - 0100000011000001001
Gold - 0010001000001000010

Oracle8 and Oracle8i Index Enhancements
Oracle8 index enhancements:
Partitioned index
Index-organized tables
Oracle8i index enhancements:
Function-based index
New bitmap index
improvements
Online index build and rebuild
Descending index
Statistics can be collected when an index is
created
Bitmap Indexes
Bitmap indexes provide substantial performance benefits and storage savings on low
cardinality data. When a bitmap index is created on a column, a bit stream (ones and
zeros) is created for each distinct value in the indexed column, with one bit per row.
Scanning 1s and 0s is much more efficient than scanning data values.
Bitmap indexes are an alternative to normal B-tree indexes in the following situations:
The table is large (millions of rows).
The indexed columns have low cardinality.
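A small sketch can show why bit streams are cheap to combine. It assumes a hypothetical build_bitmap_index function that uses one Python integer per distinct value, with bit n set when row n holds that value; real bitmap indexes are also compressed, which is not modeled here. The color and size columns are invented.

```python
def build_bitmap_index(values):
    """One integer per distinct value; bit n is 1 when row n has that value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def row_ids(bitmap):
    """Translate a bitmap back into the list of matching row positions."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Invented low cardinality columns for eight product rows
colors = ["Blue", "Mauve", "Gold", "Green", "Blue", "Green", "Gold", "Blue"]
sizes = ["S", "L", "S", "M", "L", "M", "L", "S"]

color_index = build_bitmap_index(colors)
size_index = build_bitmap_index(sizes)

# WHERE color = 'Blue' AND size = 'S' becomes a single bitwise AND
blue_and_small = row_ids(color_index["Blue"] & size_index["S"])

# WHERE color = 'Blue' OR color = 'Gold' becomes a single bitwise OR
blue_or_gold = row_ids(color_index["Blue"] | color_index["Gold"])
```

Each predicate is answered with one bitwise operation over the whole column, and only the rows whose bits survive need to be fetched from the table.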
Oracle8 and Oracle8i Index Enhancements
Partitioned Indexes (Oracle8)
You may choose to partition B-tree or bitmap indexes in step with your table
partitioning strategy; these are called local indexes. A local index may be prefixed
(the partitioning key forms the leading columns of the index) or nonprefixed (the
partitioning key does not). A global index is partitioned differently from the table.
Index Organized Tables (Oracle8)
The data for the table is held in the index, and changes to data result only in
changes to the index. Access can be by primary key or by any other key that is a valid
prefix of the primary key. Standard SQL is used to access these tables. Some of
the benefits are that they provide faster key-based access for exact match or
range searches, and that storage requirements are reduced because key values are
stored only once and no ROWID is required.
Oracle8i Index Enhancements
The following are the index enhancements for Oracle8i:
Function-based index: Allows a warehouse administrator to build an index on
a function. A common use of a function-based index is in the creation of case-
insensitive indexes, which can be implemented by creating an index on the
uppercase function applied to a character column.
New bitmap index improvements: Reduction in compress or uncompress
operations.
Online index build and rebuild: Rebuilding indexes and index-organized tables
can be done without locking the table.
Descending index: Indexes in Oracle8i can be stored in descending order of
key values.
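The case-insensitive use of a function-based index mentioned above can be sketched as an index keyed on the result of a function rather than on the stored value. The FunctionBasedIndex class and the customer rows are hypothetical and only mimic the idea; they say nothing about how the server stores such an index.

```python
from collections import defaultdict


class FunctionBasedIndex:
    """Index keyed on func(column value) instead of on the raw column value."""

    def __init__(self, rows, column, func):
        self.func = func
        self.entries = defaultdict(list)
        for row_id, row in enumerate(rows):
            self.entries[func(row[column])].append(row_id)

    def lookup(self, value):
        """Apply the same function to the search value, then probe the index."""
        return self.entries.get(self.func(value), [])


# Invented customer rows with inconsistent capitalization
customers = [
    {"name": "Smith"},
    {"name": "SMITH"},
    {"name": "Jones"},
    {"name": "smith"},
]

upper_name_index = FunctionBasedIndex(customers, "name", str.upper)
smith_rows = upper_name_index.lookup("sMiTh")
```

Because the uppercase form is computed once when the index is built, a case-insensitive search does not have to apply the function to every row in the table at query time.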

Protecting the Database


RAID is essential with large databases
RAID improves:
Reliability
Storage management
There are different levels of RAID
You can eliminate disk contention with disk
striping
Protecting the Database
You must consider using some form of protection against media failure, such as
mirroring or RAID (Redundant Array of Independent Disks) technology, so that the
data warehouse can be restored to its original state. This protection is valuable even in
a small database, because it can often avoid the need for recovery. The larger the
database, the greater the necessity for this sort of technology, and the greater its cost.
RAID
RAID achieves data accessibility benefits in a cost effective manner:
Improved reliability (fault tolerance)
Enhanced storage management
RAID Levels
There are a number of different levels of RAID:
RAID Level 0: Striping without parity (DSA)
RAID Level 0+1: Mirrored striping
RAID Level 1: Mirrored disk array (MDA)
RAID Level 3: Data striping with byte level parity
RAID Level 4: Same as RAID 3, but with block level parity
RAID Level 5: Independent Disk Array (IDA)
Note: RAID Levels 0, 1, and 5 are discussed on the following pages because these are
found to be most useful. In a data warehouse where the workload profile is unknown,
you should use machine striping for all objects. To eliminate contention for disks you
should ensure that tables that are subject to multiple concurrent parallel scans are
given a dedicated set of disks, striped to give the necessary I/O bandwidth and load
balancing abilities.
The stripe size is a hotly debated issue. It impacts tablescan performance as well as
database operational issues, such as backups and restores. When setting the stripe size,
the administrator should ensure that each I/O can be satisfied within one stripe.

RAID 0: Striping
The file is written to a four-drive disk array:
Block 1 on Drive 1
Block 2 on Drive 2 . . .
Block 5 in another sector on Drive 1
[Figure: disk array controller with File A striped across four drives: blocks (a) and (e)
on Drive 1, (b) and (f) on Drive 2, (c) on Drive 3, and (d) on Drive 4]

RAID 0: Striping
Benefits:
Good for simultaneous reads and writes
No redundancy
Scalable
Limitations:
Not recommended for mission-critical systems
No recovery from data loss
One bad sector affects entire disk of data
RAID Level 0: Striping
RAID-0 spreads (stripes) the database across hardware volumes. Striping spreads
the I/O load across multiple disks, increasing throughput. There is a tradeoff between
performance and resilience: because each file is spread over many disks, every disk
holds pieces of many files, and inevitably more files are lost if a single disk fails.
This makes the use of mirroring or a RAID level that provides redundancy all the more
important.
In the example, you see a file written to a four-drive disk array. Data is striped by
system block size, in increments of one segment at a time (the segment size is a
system-dependent feature). Independent data paths go to the drives, and the spreading
of segment-length portions of data is repeated across the entire disk array.
Benefits:
Is good for simultaneous reads and writes, which benefits applications that produce
very large files
Needs no extra disks for redundancy, because data is only striped (by system block
size, in increments of one segment at a time)
Provides a scalable solution
Limitations:
Is not recommended for mission-critical systems
Provides no recovery from data loss
Allows one bad sector to affect the entire disk of data
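The block placement in the four-drive example (block 1 on drive 1, block 2 on drive 2, block 5 back on drive 1) is simple round-robin arithmetic, sketched below. The function name stripe_location is hypothetical, and the sketch assumes that every file starts on drive 1 and that one block equals one stripe segment; real controllers differ.

```python
def stripe_location(block_number, n_drives=4):
    """Map a 1-based block number to (drive, row on that drive), both 1-based."""
    index = block_number - 1
    return index % n_drives + 1, index // n_drives + 1


def layout(block_labels, n_drives=4):
    """Group the blocks of one file by the drive that holds them."""
    drives = {drive: [] for drive in range(1, n_drives + 1)}
    for block_number, label in enumerate(block_labels, start=1):
        drive, _row = stripe_location(block_number, n_drives)
        drives[drive].append(label)
    return drives


# A six-block file, with blocks labeled a through f
file_a = layout("abcdef")
print(file_a)

# Any file with at least n_drives blocks has a block on every drive, so the
# failure of any one drive damages it.
drives_touched = [drive for drive, blocks in file_a.items() if blocks]
```

The same arithmetic explains both the benefit and the limitation: consecutive blocks sit on different drives and can be read at the same time, but losing one drive removes a block from every sufficiently large file.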

RAID 1: Mirrored Disk


Copy of files stored on mirror disk
[Figure: disk array controller with Disk 1 and Disk 2, each paired with a mirror disk;
Disk 1 and its mirror hold File A (a) and (b), and Disk 2 and its mirror hold File B (c),
(d), and (e)]

RAID 1: Mirrored Disk


Benefits:
Complete data redundancy
No performance penalty
Improves reads
Scalability
Limitations:
Highest cost of all RAID configurations
RAID Level 1: Mirrored
RAID Level 1 (or mirroring) provides the simplest level of redundancy. Each primary
disk is mirrored by another disk in the RAID set. You can add more mirror disks, but
they add redundancy, not capacity.
Mirroring doubles the number of disks in your disk set. Higher levels of RAID require
special equipment but reduce the number of extra disks needed, so you can get more
data onto a system.
This method gives the following benefits:
Complete data redundancy
No performance penalty; in fact, RAID-1 improves performance for reads
Scalable
The limitation of this method is that it bears the highest cost of all RAID
configurations.
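The cost difference between mirroring and the parity-based levels can be made concrete with a little arithmetic. The sketch below assumes one mirror per primary disk for RAID-1 and one disk's worth of parity for RAID-5; the function name `usable_capacity` and the disk sizes in the example are illustrative and do not come from the course.

```python
def usable_capacity(level: int, num_disks: int, disk_gb: float) -> float:
    """Return the space left for data, in GB, for a set of identical disks."""
    if level == 0:
        # Striping only: every disk holds data.
        return num_disks * disk_gb
    if level == 1:
        # Assumption: each primary disk has exactly one mirror.
        if num_disks % 2 != 0:
            raise ValueError("RAID-1 needs an even number of disks")
        return num_disks * disk_gb / 2
    if level == 5:
        # Parity consumes the equivalent of one disk, spread over the array.
        if num_disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        return (num_disks - 1) * disk_gb
    raise ValueError("only RAID levels 0, 1, and 5 are modeled here")
```

With eight hypothetical 9 GB disks, this gives 72 GB for RAID-0, 36 GB for RAID-1, and 63 GB for RAID-5, which is why mirroring is the most expensive way to protect a given volume of data.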

RAID 5: Independent Disk Array

[Figure: Data striped with parity across the array. A disk array controller spreads
segments (a) through (j) of File C across Disk 1 through Disk 4, interleaved with the
parity segments P (a,b,c), P (d,e,f,g,h), and P (i,j).]

Benefits:
Efficient data integrity
Data reconstruction
Multiple concurrent seeks across array
Scalable
Limitations:
Disk overhead
Data write rate
A warehouse typically uses RAID 0, 1, or 5
RAID Level 5: Independent Disk Array (IDA)
RAID-0 is designed to engage all disk drives in the array at the same time on the same
read and write operation. However, RAID-5 is designed to engage as many drives as
possible at the same time on different read and write operations.
The stripe size is system-dependent, as with RAID-0. When the host sends a portion of
data to be written to disk, the RAID controller breaks it up into smaller portions,
according to the stripe size, and writes the portions to the disks in parallel.
The parity information is interleaved throughout the disk array and is stored in
parity segments.
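The following sketch shows how a parity segment allows a lost segment to be rebuilt. It assumes bytewise XOR parity, the scheme commonly used for RAID-5; the function name `xor_segments` and the sample segments are illustrative only.

```python
from functools import reduce


def xor_segments(segments: list[bytes]) -> bytes:
    """XOR equal-length segments together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


# Three data segments of one stripe, plus the parity segment computed from them.
stripe_data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_segments(stripe_data)

# If the disk holding the second segment fails, XOR-ing everything that
# survives (the other data segments and the parity) yields the lost segment.
rebuilt = xor_segments([stripe_data[0], stripe_data[2], parity])
```

The same operation rebuilds a lost parity segment from the data segments, so the array survives the failure of any single disk. It also suggests why writes are slower: every change to a data segment requires the matching parity segment to be updated as well.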
This method gives the following benefits:
Efficient data integrity
Reconstruction of data from a single failed disk, using the data and parity on the surviving disks
Multiple concurrent seeks across disk array
Scalability
The limitations of this method are the disk overhead for parity and the reduced data write rate.
Note: A typical data warehouse employs RAID-0, RAID-1, or RAID-5.

Backup
Plan at the design stage
Use hot backups for VLDBs
Back up necessary components:
Fact and dimension data
Warehouse schema
Metadata schema
Metadata
Export/Import utility
Disk space
Time
Backup
The backup and recovery strategy for a warehouse needs to be considered at the design
stage. Details such as how the data is partitioned greatly affect the strategy. For small
and medium databases, daily cold backups (taken while all instances of the database
are shut down) and export/import are viable backup tools.
However, once you move to very large databases (VLDBs), complete cold backups
become difficult to fit into an overnight window. In addition, the disk space required
for a complete export of a large database becomes an issue. You need to consider other
strategies, such as using tape or other devices.
The defined backup strategy for the warehouse should allow for hot backups, where
you can back up any part of the database at any time of the day, while the database
instances are still active. With Oracle, this means backing up individual and active
tablespaces.
You should back up every component that is essential to warehouse operations,
that is, everything required to restore a working environment:
Fact data
Dimension data
Data warehouse and metadata schema
Data warehouse metadata
Export/Import
The export/import utility enables all or part of a database to be extracted into a
dump file and then imported into another database (under another owner if required).
Generally, an export of a VLDB uses too much disk space. You could write the export
through a named pipe on a UNIX system to overcome space problems. However, this
technique would be very time-consuming.
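The named-pipe technique can be sketched as follows. This is an illustration under stated assumptions, not the export utility's own procedure: the loop that writes rows stands in for the export utility, the choice to compress the stream with gzip is an assumption about how the space is saved, and the names `export_through_pipe`, `export.pipe`, and `export.dmp.gz` are hypothetical.

```python
import gzip
import os
import shutil
import tempfile
import threading


def compress_from_pipe(pipe_path: str, out_path: str) -> None:
    """Read everything written to the named pipe and store it gzip-compressed."""
    with open(pipe_path, "rb") as pipe, gzip.open(out_path, "wb") as out:
        shutil.copyfileobj(pipe, out)


def export_through_pipe(rows: list[str], workdir: str) -> str:
    """Stream rows through a FIFO so no uncompressed dump file lands on disk."""
    pipe_path = os.path.join(workdir, "export.pipe")
    out_path = os.path.join(workdir, "export.dmp.gz")
    os.mkfifo(pipe_path)  # named pipes are a UNIX feature
    reader = threading.Thread(target=compress_from_pipe, args=(pipe_path, out_path))
    reader.start()
    # Stand-in for the export utility, which would be pointed at pipe_path
    # instead of at an ordinary dump file.
    with open(pipe_path, "wb") as pipe:
        for row in rows:
            pipe.write(row.encode() + b"\n")
    reader.join()
    os.remove(pipe_path)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        dump = export_through_pipe(["0001,Male", "0002,Female"], workdir)
        print(os.path.basename(dump), os.path.getsize(dump), "bytes")
```

The pipe itself occupies no disk space; only the compressed output is stored. The whole export still has to flow through a single stream, which is consistent with the warning that the technique is time-consuming.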

Summary
This lesson covered the following topics:
Vertical partitioning and horizontal partitioning
The different types of partitioning methods
The difference between B-tree indexes and bitmap indexes
Why a warehouse typically uses RAID 0, 1, or 5 to protect the database

Practice 9-1 Overview


This practice covers the following topics:
Defining the partitioning method
Identifying the indexing method
Determining RAID levels and justifying each level
Practice 9-1
1 For each of the following descriptions, state the partitioning method it best
describes. The partitioning methods are range partitioning, hash partitioning, and
composite partitioning.
Description Partitioning Method
a Places specific ranges of table entries on different disks. For example, records
having name as a key may have names beginning with A-B in one partition, C-D in
the next, and so on. Likewise, a DSS managing monthly operations might partition
each month onto a different set of disks.
b Distributes DBMS data evenly across the set of disk spindles. This partitioning
method is applied to one or more database keys, and the records are distributed
across disk subsystems accordingly.
c The drawback of this partitioning method is that the quantity of data may vary
significantly from one partition to another, and the frequency of data access may
vary as well. For example, as the data accumulates, it may turn out that a larger
number of customer names fall into the M-N range than the A-B range.
d This partitioning method is a combination of two partitioning methods. A table
that is partitioned using this method is initially partitioned by range, and then
subpartitioned using the hash method.
2 For each of the following descriptions, state the type of indexing method it best
describes. The indexing methods are B-tree, bitmap, and index-organized tables.
Description Indexing Method
a Contains a hierarchy of highest-level and succeeding lower-level index blocks. The
upper-level blocks are called branch blocks, and they point to the lower-level
blocks. The leaf blocks are the lowest-level blocks, and they contain the unique
ROWID that points to the location of the actual row.
b This indexing method benefits queries in which the WHERE clause contains multiple
predicates on low-cardinality columns.
c This method merges table data and index data into one structure. Thus, the data is
the index and the index is the data.
3 Form into small groups and consider each of the following questions. Discuss each
question in your group, and present your group's answers to the class at the end of
the discussion.
a How does RAID-5 differ from RAID-1?
b How do I decide between RAID-5 and RAID-1?
c What variables can affect the performance of a RAID-5 device?
d What types of files are suitable for placement on RAID-5 devices?
4 For each of the descriptions below, assign the RAID level: RAID Level 0, RAID
Level 1, or RAID Level 5.
Description RAID Level
a This RAID level has the lowest cost and highest performance.
b This RAID level is low cost and has high availability.
c This RAID level has high performance and high availability.
Table Row ID   Male   Female
0001           1      0
0002           0      1
0003           0      1
0004           1      0
Each row has a bit for each key. Each key value has a bit for each row.
