Professional Documents
Culture Documents
The Data Warehousing Institute takes pride in the educational soundness and technical
accuracy of all of our courses. Please give us your comments wed like to hear from
you. Address your feedback to:
email: info@dw-institute.com
Publication Date:
May 2004
Copyright 2002-2004 by The Data Warehousing Institute. All rights reserved. No part
of this document may be reproduced in any form, or by any means, without written
permission from The Data Warehousing Institute.
ii
TABLE OF CONTENTS
Module 1
1-1
Module 2
2-1
Module 3
3-1
Module 4
4-1
Module 5
5-1
Appendix A
A-1
iii
Module 1
Data Warehousing Concepts
Topic
Data Warehousing Basics
Page
1-2
1-10
1-16
1-30
1-34
1-36
Readiness Assessment
1-38
1-1
impact
realizes business value
Outcome
achievement, discovery
Action
insight, resolve, decision, innovation
done by people
Knowledge
recall, experience, instinct, beliefs
Information
done by software
& databases
facts, metrics
Data
descriptive, quantitative, qualitative
1-2
INFORMATION
KNOWLEDGE
ACTIONS AND
OUTCOMES
IMPACT AND
VALUE
1-3
Source
Data
source to warehouse
ETLs
intake
integration
distribution
Data
Staging
Data
Warehouse
delivery
access
Data Marts
Marts
Data
views, views,
web reports,
spreadsheets,
etc.) etc.)
(cubes, cubes,
(star-schema,
web reports,
spreadsheets,
1-16
INTAKE
INTEGRATION
Integration describes how the data fits together. The challenge for the
warehousing architect is to design and implement consistent and
interconnected data that provides readily accessible, meaningful business
information. Integration occurs at many levels the key level, the
attribute level, the definition level, the structural level, and so forth
(Data Warehouse Types, www.billinmon.com) Additional data cleansing
processes, beyond those performed at intake, may be required to achieve
desired levels of data integration.
DISTRIBUTION
DELIVERY
ACCESS
Data stores with access responsibility are those that provide business
retrieval of integrated data typically the targets of a distribution process.
Access-optimized data stores are biased toward easy of understanding and
navigation by business users.
1-17
Source
Data
source to warehouse
ETLs
intake
integration
Data Warehouse
distribution
delivery
Data Marts
(star-schema and/or cubes)
access
queries & analysis
Business Intelligence Tools
The Inmon
Data Warehouse
Source
Data
source to warehouse
ETLs
intake
integration
Data
Warehouse
distribution
Data Marts
1-18
KIMBALLS
DEFINITION (BUS)
Kimball defines the warehouse as nothing more than the union of all the
constituent data marts. (Ralph Kimball, et. al, The Data Warehouse Life
Cycle Toolkit, Wiley Computer Publishing, 1998) This definition
contradicts the concept of the data warehouse as a single-source hub. The
Kimball data warehouse assumes all data store roles -- intake, integration,
distribution, access, and delivery
DIFFERENCES IN
PRACTICE
Kimball Warehouse
intake
integration
distribution
Distribution is insignificant
because data marts are a subset
of the data warehouse
access
delivery
1-19
Architecture
Implementation
Operation
1-34
IMPLEMENTATION
RESULTS
OPERATION
RESULTS
Operation is the phase where data warehousing delivers value. That value
is realized through business services that provide data and information
and enable confident decisions and positive actions. Training, support,
and administration are also key elements of data warehouse operation.
1-35
Module 2
Data Warehousing Architecture
Topic
Business Architecture
Page
2-2
Data Architecture
2-10
Technology Architecture
2-46
Project Architecture
2-48
Organizational Architecture
2-58
2-1
Business Architecture
Business Processes
events / transactions
activities
ce
kfor
wor
product
sources
inputs
ss
e
n
si ss
bu roce
p
customers
2-6
Business Architecture
Business Processes
UNDERSTANDING
BUSINESS
PROCESSES
Business processes are the things that a business does to produce its
products, deliver its services, manage its infrastructure, etc. Every
business process can be understood in terms of the components of that
process:
2-7
Data Architecture
Data Modeling Concepts
Contextual
Models
2-20
Source Composition
Source Subjects
Triage
Warehousing Subjects
Business Questions
Facts & Qualifiers
Target Configuration
Conceptual
Models
Logical
Models
Structural
Models
Physical
Models
Implemented Warehousing
Databases
Implemented
Data
Data Architecture
Data Modeling Concepts
FAMILIAR DATA
MODELING
PRINCIPLES
WAREHOUSE
MODELING
DIFFERENCES
Both source data and warehousing data need to be modeled. Each is modeled separately, and
they are associated through a technique called triage.
Warehouse data uses range from publishing and managed query to complex OLAP applications
and data mining. The ideal data structure depends on planned uses of the data.
The complexities of warehouse data modeling require that many modeling techniques be used.
Matrix models, E/R models, subject models, dimensional models, star-schema, and snowflakeschema are all used to meet various data modeling needs.
Redundancy and time-variance combine to make a very large database (VLDB) a common
warehouse consideration. Optimizing for data volumes and database size is a common
requirement of warehouse modeling.
2-21
Data Architecture
Integration and Data Flow Standards
Hub and Spoke Integration
Data
Sources
Integration
Hub
Data
Mart
Data
Mart
Data
Mart
Bus Integration
Data
Sources
Integration Bus
Data
Mart
2-34
Data
Mart
Data
Mart
Data Architecture
Integration and Data Flow Standards
HUB-AND-SPOKE
INTEGRATION
BUS INTEGRATION
2-35
Project Architecture
Methodology
Top-Down Development
Bottom-Up Development
Incremental Deployment
Hybrid Methods
Incremental
Deployment
Incremental
Enterprise Modeling
Incremental Development
Planning
Identify Business
Area Scope
Integration Structure
Design & Development
2-50
Project Architecture
Methodology
TOP-DOWN
BOTTOM-UP
BALANCING
ENTERPRISE &
BUSINESS UNIT
FOCUS
2-51
Organizational Architecture
Program, Project & Operations Roles
BI Program
sponsorship
program management
data governance
BI Projects
integration design
database development
ETL development
project management
metadata management
2-64
BI Operations
BI application development
architecture specification
quality management
Organizational Architecture
Program, Project & Operations Roles
ROLES AND
RESPONSIBILITIES
Program Management
Sponsorship
Advocacy, political will, resource acquisition, issue resolution, expectation setting, etc.
Data Governance
Data definitions, business rules alignment, data quality management, access authorization, etc.
Business basis for data rules about content, relationships, correctness, integrity, etc.
Requirements for data & information, service levels, quality & reliability, etc.
Architecture Specification
Frameworks & standards for business alignment, data, technology, projects, etc.
Quality Management
Beyond data quality quality of information, delivery, interface, reporting, services, etc.
Meta data strategy, meta data implementation, meta data content, etc.
BI Project Roles & Responsibilities
Project Management
Integration Design
Database Development
ETL Development
BI Application Development
Analysis, design, construction, and deployment of information services & analytic applications
BI Operations Roles
Maintenance and support of data migration processes; Continuous data quality management
2-65
Module 3
Data Warehouse Implementation
Topic
Page
Implementation Planning
3-2
3-8
3-22
Deployed Technology
3-40
Implementation Components
3-44
Delivery Results
3-48
3-1
Contextual
Models
Source Composition
Source Subjects
Triage
Conceptual
Models
Product
LOB
lob-code
e
lob-nam
hic Area
Geograp
REGION
rgn-code
e
rgn-nam
T
DISTRIC
er
b
m
u
n
diste
dist-nam
ZONE
mber
zone-nu e
m
zone-na
3-16
Warehousing Subjects
Business Questions
Facts & Qualifiers
Target Configuration
Logical
Models
CT LINE
PRODU e
line-cod n
criptio
e
n
li -des
CT
PRODU
id
tc
u
prod
desc
tc
u
d
ro
p
name
product-
SIZE OF SE
MER BA
S
CU TO
r-count
custome ount
ld-c
househo
Structural
Models
Time
YEAR
ber
year-num
ER
QUART er
mb
u
n
quarter-
MONTH
umber
month-n
3-17
product-type
product-SKU
product-code
payment-method
transaction-status
register-id
transaction-amt
store-number
transaction-time
customer address
renewal date
membership date
customer name
member number
PRODUCT
member-number
membership-type
MEMBERSHIP MASTER
date-joined
date-last-renewed
term-last-renewed
date-of-last-activity
last-name
first-name
business-name
address
city-and-state
zip-code
POINT-OF-SALE DETAIL
transaction-date
SALES TRANSACTION
CUSTOMER
date-time
terminal-id
transaction-id
line-number
SKU
3-25
membership-type
MEMBERSHIP MASTER
date-joined
date-last-renewed
term-last-renewed
date-of-last-activity
product-descrip.
product-type
product-SKU
product-code
PRODUCT
payment-method
transaction-status
register-id
transaction-amt
store-number
transaction-time
customer address
renewal date
membership date
customer name
member number
member-number
cells
exp
to ide and
transf ntify
or
by typ mations
e & na
me
last-name
first-name
business-name
address
city-and-state
zip-code
POINT-OF-SALE DETAIL
transaction-date
SALES TRANSACTION
CUSTOMER
date-time
terminal-id
transaction-id
line-number
log
transf ic of
orm
is sep ations
a
docum rately
ented
SKU
Transform Need
Description
Specify Selection
Requirements
Identifies and describes the selection processes needed to choose among multiple sources. The
objective is to select the best data to be used for warehouse population. Selection requirements
may exist at both data store and date element levels.
Specify Filtering
Requirements
Identifies and describes the filtering processes needed to choose records from a source file (or
rows from a source table) to be used for data warehouse population.
Identifies and describes the conversion and translation processes which change the formats and
values of data elements. Conversion processing achieves consistency of formats and value sets
among data extracted from multiple sources. Translation processes change data formats and
values from encoded and cryptic to descriptive and meaningful.
Identifies and describes needed derivation processes used to develop a value for a single data
element by applying logic to the values of some other data elements. It also identifies and
describes the processes through which summary data values are created.
Specify Clean-up
Requirements
Identifies and describes the clean-up processing needed to ensure quality and integrity of the data
that is placed into the data warehouse. Clean-up needs may exist at both data record and data
element levels. Among the issues of clean-up processing are intra- and inter-record consistency
checking, and decisions regarding elements with null values or invalid values.
3-35
Deployed Technology
Range and Roles of Technology
3-42
desktop
wireless
voice
Analytic Applications
BPM (scorecards & dashboards)
CRM Analytics
Operations Analytics
Collaboration
E-mail, Groupware, Workflow
Text Analysis
Text Search & Text Mining
Content Management
Infrastructure
Storage, Servers, Databases, Metadata, Administration & Management, Networking
web
Data Integration
Modeling, Mapping, Cleansing, ETL
Data Resources
Operational Systems, Documents, Images, External Data, Audio/Visual
Deployed Technology
Range and Roles of Technology
TECHNOLOGY
ROLES AND
RELATIONSHIPS
Delivery
Delivery media includes web portals, desktop clients, email, wireless, voice print, pager and fax.
Delivery technology sets include (1) B2E Portal intranet business-to-enterprise delivery to the
workforce, (2) B2B Portal internet business-to-delivery to vendors, customers, partners and anyone
with internet access, (3) B2C Portal extranet business-to-customer delivery.
Analytic Applications
Analytic applications are the technology components of business applications, ranging from static
reporting to dashboards and scorecards. They place information into business function context, i.e.
Customer Relationship Management (CRM), Supply Chain Management (SCM), Business
Performance Management (BPM), etc.
Analytic Application
Development Tools
Tools, templates, and packaged applications to quickly build views, reports, dashboards, scorecards,
and other applications to deliver information in context of a business function or business process.
Collaboration
Web applications to support employees, partners, customers, vendors and others to collaborate on
documents, share business metrics, manage content, and work collectively. While reporting is still
dominant today, collaboration capabilities will grow as the technology and market place mature.
Data access and analysis tools are todays most common delivery technologies. Unlike analytic
applications, these tools focus on data before information, and they provide less business context
than analytic applications. The most widely-used tools include managed reporting, query, and OLAP.
Text Analysis
Text analysis tools use semantics and statistical techniques to identify, tag, and select relevant
content from text documents. Parsing, pattern recognition, natural language processing and other
advanced techniques are used to transform unstructured text into data and/or information structures.
Data warehouses and data marts integrate and reconcile data from multiple data sources. Their
purpose is to prepare data to serve as the raw material from which information is created. Regardless
of the multiple definitions of data warehouse and data mart that are used in the industry, all
warehouses and marts exist primarily to serve this purpose.
Content Management
Data Resources
This class includes all sources from which data can be acquired. When both internal and external data
are considered, and when both structured and unstructured data are included, the range of possible
source technologies becomes exceptionally broad.
Infrastructure
This technology class describes the underlying hardware, software, networking, administration and
support structures upon which systems and data sources are constructed and operated.
3-43
Delivery Results
Data and Information Services
Executive
Business Manager
Knowledge Worker
Expert
Regular
Use
Occasional
Use
Beginner
Training Services
y of s
a
r
r
a
ice
v
r
e
s
BI
Support Services
The right kinds of services matched to the customers roles, responsibilities, and experience level
3-50
Delivery Results
Data and Information Services
MEETING
CUSTOMER NEEDS
Classification of customers as
o Knowledge workers who carry out the day-to-day activities of the
business
o Managers responsible for performance of individual business
processes
o Executives responsible for business performance across many
business processes
Classification of services as
o Data access and information delivery services that make data and
information available to the business.
o Analysis and reporting services that deliver analytic applications
of greater complexity than simple data access and information
delivery.
o Training services that develop customer skill and ability to use
the data warehouse, with a goal of making each customer selfsufficient.
o Support services that augment the services culture, enhance
communications with customers, and ensure rapid resolution of
problems.
3-51
Module 4
Data Warehouse Operation
Topic
Page
Business Services
4-2
4-6
Managed Quality
4-14
Managed Infrastructure
4-16
4-1
Business Services
Valuable and Sustainable Services
Operation
business services
data refresh
managed platforms
managed environment
customer service
managed quality
managed infrastructure
E
AINABL
T
S
U
S
LE A N D
VICES
R
E
S
N
V A LU A B
O
TI
FO R M A
N
I
&
SS
A
DAT
BUSINE
E
H
T
R
FO
impact
Outcome
achievement, discovery
Action
insight, resolve, decision, innovation
Knowledge
recall, experience, instinct, beliefs
Information
facts, metrics
Data
descriptive, quantitative, qualitative
4-2
Business Services
Valuable and Sustainable Services
WAREHOUSING
FOR THE
LONG TERM
4-3
Managed Quality
4-14
Dimensions of Quality
Managed Quality
Dimensions of Quality
QUALITY
IMPROVEMENT
DIMENSIONS OF
QUALITY
Business intelligence quality is much more than simple data quality. Data
quality is, in fact, a relatively small and easy piece of the overall quality
domain. BI quality is measured and managed in three major categories:
Business Quality directly affects the business value derived from BI,
and the economic success of the BI program.
4-15
Managed Infrastructure
Processes, Technology, and People
program management
change management
quality management
data governance
development methodologies
project management
data warehouse administration
metadata management
data warehousing tools & technology
BI tools & technology
infrastructure tools & technology
BI roles & responsibilities
BI organizations
4-16
Managed Infrastructure
Processes, Technology, and People
COMPLEX
INFRASTRUCTURE
PROCESS
This course has already discussed the analytics processes of BI. When
successful, BI becomes a key component in decision making processes. It
depends, however, on many other processes to achieve this level of
success. The process components of BI infrastructure are program
management, change management, data governance, development
methodology, project management, data warehouse administration, and
metadata management.
TECHNOLOGY
While technology cant create BI, neither can BI be created without use of
technology. Blending the right technologies with the process and people
components of BI is a key to success. Technology infrastructure includes
data warehousing tools, BI tools, and enabling/infrastructure hardware
and software.
PEOPLE
People are integral to effective BI. Neither processes nor technology can
deliver value independently of the knowledge, decisions, and actions of
people. Human infrastructure is arguably the single most important of all
BI infrastructure categories. Identifying the right set of roles and
responsibilities, assigning them to people with the right skills, and
constructing the right kinds of organizations and relationships are all
critical to BI success.
4-17
Module 5
Summary and Conclusions
Topic
Page
Common Mistakes
5-2
5-6
5-1
Common Mistakes
From TDWIs 10 Mistakes Series
An effective project manager will not
1
2
3
4
5
6
7
8
9
10
11
1
2
3
4
5
6
7
8
9
10
Hiring yourself.
Squelching disagreement.
Confusing titles with roles and responsibilities.
Talking the walk.
Thinking one size fits all.
Pointing fingers.
Interviewing only for technical skills.
Limiting leadership.
Becoming too task focused.
Believing that all decisions are created equal.
An effective data modeler will avoid
1
2
3
4
5
6
7
8
9
10
5-3
Kimball, Reeves, Ross & Thornthwaite The Data Warehouse Lifecycle Toolkit
1998, John Wiley & Sons
INTERNET SITES:
The Data Warehousing Institute (www.dw-institute.com)
Business Intelligence and Data Warehousing
The Data Administration Newsletter (www.tdan.com)
Information and Data Management
The Data Warehousing Information Center (www.dwinfocenter.org)
Data Warehousing Resources
Inmon Associates, Inc.(www.billinmon.com)
The Inmon Approach to Data Warehousing
The Ralph Kimball Group (www.rkimball.com)
The Kimball Approach to Data Warehousing
5-6