
DW Concepts

Introduction to DW
Authors: Deepak Natarajan & Aditya Gollapudi

Systech Education

Course Agenda

Session 1

What is DW?
Elements of DW
OLTP vs. OLAP
Types of OLAP
Implementing Life Cycle

Session 2

Session 3


Purpose

The purpose of this module is to give insight into the basic concepts and terminology of data warehousing and business intelligence.


Objective

Upon completion of this chapter, a participant will be able to:

Define a data warehouse.
Understand the elements of DW.
Differentiate between OLTP and OLAP systems.
Explain the types of OLAP systems.
Understand the implementation life cycle.


What is DW?

Data Warehouse


The Problem
[Figure: data scattered across separate, disconnected systems such as IBM, VAX, and HP machines and Fred's PC]

The Solution

[Figure: the data warehouse and its metadata]


Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.


Data Warehouse

Subject-oriented:

Data that gives information about a particular subject instead of about a company's ongoing operations.

Integrated:

Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.


Data Warehouse

Time-variant:

All data in the data warehouse is identified with a particular time period.

Non-volatile:

Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.


Elements of DW

Source System
Staging Area
Dimensional Model
Data Mart
Data Warehouse
ODS
EDW
Kimball's Approach
Inmon's Approach


Source System

An operational system of record whose function is to capture the transactions of the business.

Characteristics of a Source System:

Priorities are uptime and availability.
Queries against the source system are narrow and severely restricted.
Maintains little historical data.


Staging Area

It is a storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.

Characteristics of a Staging Area:

It is layered between the source system and the presentation server.
It does not provide query and presentation services.
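A few of the staging steps named above (clean, transform, de-duplicate) can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`cust_id`, `name`, `city`) and the function `stage` are hypothetical and do not come from the course material.

```python
def stage(source_rows):
    """Clean, transform, and de-duplicate raw source rows (illustrative only)."""
    seen = set()
    staged = []
    for row in source_rows:
        # Clean: skip rows that lack the business key.
        if not row.get("cust_id"):
            continue
        # Transform: trim whitespace and standardize letter case.
        cleaned = {
            "cust_id": row["cust_id"].strip(),
            "name": row["name"].strip().title(),
            "city": row["city"].strip().upper(),
        }
        # De-duplicate: keep the first occurrence of each business key.
        if cleaned["cust_id"] in seen:
            continue
        seen.add(cleaned["cust_id"])
        staged.append(cleaned)
    return staged


# Hypothetical source rows, including a duplicate and a row with no key.
raw = [
    {"cust_id": "C1 ", "name": "  alice smith", "city": "chennai"},
    {"cust_id": "C1", "name": "Alice Smith", "city": "Chennai"},
    {"cust_id": "", "name": "unknown", "city": "n/a"},
    {"cust_id": "C2", "name": "bob jones", "city": " pune "},
]
staged_rows = stage(raw)
```

Only the staged rows would move on to the presentation server; end users never query the staging area itself.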

Dimensional Model

It is a technique for modeling data that is an alternative to entity-relationship (E-R) modeling. A dimensional model contains the same information as an E-R model but packages the data in a symmetric format.

Components of a Dimensional Model

Fact Table

The primary table in each dimensional model, meant to contain measurements of the business.

Dimension Table

Each dimension is defined by its primary key, which serves as the basis for referential integrity with any given fact table.
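The fact and dimension tables described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions made for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Dimension tables: each is defined by a single-part primary key.
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT,
        month         TEXT
    );
    -- Fact table: business measurements, keyed by the dimension keys.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        amount      REAL,
        PRIMARY KEY (product_key, date_key)
    );
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Pen", "Stationery"), (2, "Notebook", "Stationery"), (3, "Mug", "Kitchen")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240101, 10, 50.0), (2, 20240101, 4, 40.0),
     (3, 20240101, 2, 30.0), (1, 20240201, 6, 30.0)],
)

# A star join: the fact table joined to one dimension, summed by an attribute.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The multi-part primary key on `fact_sales` is built from the dimension keys, which is what ties each measurement back to its dimensions.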


Data Mart

A logical subset of the complete data warehouse. A data mart is usually built for a single part of the business and organized around a single business process.

Every data mart must be represented by a dimensional model. Data marts are the basis for both the top-down and bottom-up approaches to building a data warehouse.

Data Warehouse

The queryable source of data in the enterprise. The data warehouse is the union of all the constituent data marts.

Historical data is maintained. Data is fed from the data staging area. It is also frequently updated on a controlled load basis as data is corrected, snapshots are accumulated, and labels are changed.

Operational Data Store

An ODS is an integrated database of operational data. Its sources include legacy systems and it contains current or near term data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data.


Operational Data Store

An ODS is usually designed to contain low-level or atomic (indivisible) data such as transactions and prices. Only the data warehouse contains aggregate data.
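The difference between atomic and aggregate data can be shown with a small sketch. The transaction records and the function `daily_product_totals` are hypothetical, chosen only to illustrate the roll-up from ODS-style rows to a warehouse-style summary.

```python
from collections import defaultdict

# Atomic rows, as an ODS might hold them: one row per transaction.
ods_transactions = [
    {"day": "2024-03-01", "product": "Pen", "price": 5.0, "qty": 2},
    {"day": "2024-03-01", "product": "Pen", "price": 5.0, "qty": 1},
    {"day": "2024-03-01", "product": "Mug", "price": 15.0, "qty": 1},
    {"day": "2024-03-02", "product": "Pen", "price": 5.0, "qty": 4},
]


def daily_product_totals(transactions):
    """Roll atomic transactions up to one total per (day, product)."""
    totals = defaultdict(float)
    for t in transactions:
        totals[(t["day"], t["product"])] += t["price"] * t["qty"]
    return dict(totals)


# Aggregate rows, as a data warehouse might hold them.
warehouse_aggregate = daily_product_totals(ods_transactions)
```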


Enterprise Data Warehouse

An Enterprise Data Warehouse is a data warehouse containing all publishable quality data of a permanent nature collected by an organization. This inevitably includes historic data from multiple data sources.


Operational transaction data is usually excluded due to its volatile nature. Enterprise data warehouses are valuable resources, are costly to construct, and require a long time to evolve.


Kimball's Approach

Start with clearly defined user requirements.
Build one subject area at a time, based on a star schema.
The data warehouse is the union of all the data marts, but only if the dimensions conform across all the fact tables.
Before loading the facts and dimensions, first concentrate on the staging area.

Inmon's Approach

Advocates a normalized enterprise data warehouse.
Uses time-variant data structures.
Don't be too concerned about requirements up front.

Build it and they will come.


OLTP vs. OLAP

OLTP - ER Model
OLAP - Dimensional Model
Relationship between ER Model and Dimensional Model
Advantages of Dimensional Modeling


OLTP - ER Model

A logical design technique that seeks to eliminate data redundancy.
A conceptual data model that views the real world as entities and relationships.
An entity-relationship diagram is used to visually represent data objects.


Facts about ER Models

The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements. The highest art form of ER modeling is to remove all redundancy in the data. This is immensely beneficial to transaction processing because transactions are made very simple and deterministic.

Example: The transaction of updating a customer's address may devolve to a single record lookup in a customer address master table. This lookup is controlled by a customer address key, which defines the uniqueness of the customer address record and allows an indexed lookup that is extremely fast. It is safe to say that the success of transaction processing in relational databases is mostly due to the discipline of ER modeling.
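The address update described above might look like this in practice. This is a minimal sketch using Python's `sqlite3` module; the table `customer_address`, its columns, and the helper `update_address` are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key defines uniqueness and gives the database an index to use.
conn.execute("""
    CREATE TABLE customer_address (
        customer_address_key INTEGER PRIMARY KEY,
        street TEXT,
        city   TEXT
    )
""")
conn.executemany(
    "INSERT INTO customer_address VALUES (?, ?, ?)",
    [(101, "12 Lake Road", "Chennai"), (102, "4 Hill Street", "Pune")],
)


def update_address(connection, key, street, city):
    """Update one address by key; returns the number of rows changed."""
    cur = connection.execute(
        "UPDATE customer_address SET street = ?, city = ? "
        "WHERE customer_address_key = ?",
        (street, city, key),
    )
    connection.commit()
    return cur.rowcount


rows_changed = update_address(conn, 101, "7 Park Avenue", "Bengaluru")
updated = conn.execute(
    "SELECT street, city FROM customer_address WHERE customer_address_key = 101"
).fetchone()
```

Because the address is stored exactly once, the transaction touches a single row and no other table needs to change.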

OLAP Dimensional Modeling

A logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access.

Every dimensional model is composed of:

One table with a multi-part key, called the fact table
A set of smaller tables, called dimension tables


The Relation between Dimensional Modeling and Entity-Relationship Modeling

A single entity-relationship diagram breaks down into multiple fact table diagrams.
ER diagrams are useful, but they are meant to be viewed in small sections, not all at once.


OLTP

OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions.
Source of data: Operational data; OLTPs are the original source of the data

OLAP

OLAP enables a user to easily and selectively extract and view data from different points of view.
Source of data: Consolidation data; OLAP data comes from the various OLTP databases


OLTP
Purpose of data: To control and run fundamental business tasks
What the data reveals: A snapshot of ongoing business processes

OLAP
Purpose of data: To help with planning, problem solving, and decision support
What the data reveals: Multi-dimensional views of various kinds of business activities


OLTP
Inserts and updates: Short and fast inserts and updates initiated by end users
Queries: Relatively standardized and simple queries returning relatively few records

OLAP
Inserts and updates: Periodic, long-running batch jobs refresh the data
Queries: Often complex queries involving aggregations


OLTP
Processing speed: Typically very fast
Space requirements: Can be relatively small if historical data is archived

OLAP
Processing speed: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
Space requirements: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP


OLTP
Database design: Highly normalized with many tables
Data access: Very frequent access; small quantities of data per operation; reading, writing, modifying, deletion

OLAP
Database design: Typically denormalized with fewer tables; use of star and/or snowflake schemas
Data access: Moderate access frequency; large quantities of data; predominantly reading operations


OLTP
Backup and recovery: Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability

OLAP
Backup and recovery: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method


Steps Involved in Converting ER Model to Dimensional Model

The first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately.

The second step is to select those many-to-many relationships in the ER model containing numeric and additive non-key facts and to designate them as fact tables.

The third step is to denormalize all of the remaining tables into flat tables with single-part keys that connect directly to the fact tables. These tables become the dimension tables.
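Steps two and three can be sketched on a toy schema. The sketch assumes step one is already done and only one business process (order taking) is in scope; all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (ER-style) source tables.
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES category(category_id)
    );
    -- Many-to-many relationship between orders and products that carries
    -- numeric, additive non-key facts (quantity, amount).
    CREATE TABLE order_line (
        order_id   INTEGER,
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER,
        amount     REAL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO category VALUES (1, 'Stationery'), (2, 'Kitchen');
    INSERT INTO product VALUES (10, 'Pen', 1), (11, 'Notebook', 1), (12, 'Mug', 2);
    INSERT INTO order_line VALUES
        (500, 10, 3, 15.0), (500, 12, 1, 15.0), (501, 11, 2, 20.0);

    -- Step 2: the many-to-many table with additive facts becomes the fact table.
    CREATE TABLE fact_order_line AS
        SELECT order_id, product_id AS product_key, quantity, amount
        FROM order_line;

    -- Step 3: product and category are denormalized into one flat dimension
    -- with a single-part key that connects directly to the fact table.
    CREATE TABLE dim_product AS
        SELECT p.product_id AS product_key, p.product_name, c.category_name
        FROM product AS p
        JOIN category AS c ON c.category_id = p.category_id;
""")

dim_rows = conn.execute(
    "SELECT product_key, product_name, category_name "
    "FROM dim_product ORDER BY product_key"
).fetchall()
amount_by_category = conn.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_order_line AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
```

After the conversion, a category-level question needs one join instead of two, at the cost of repeating the category name on every product row.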

Structure of a Dimensional Model evolved from an ER Model

The master DM model of a data warehouse for a large enterprise will consist of somewhere between 10 and 25 very similar-looking star join schemas. Each star join will have four to 12 dimension tables. If the design has been done correctly, many of these dimension tables will be shared from fact table to fact table.

Dimension Model and Drilling

Drilling Down means adding more dimension attributes to the SQL answer set from within a single star join.

Drilling Up means removing dimension attributes from the SQL answer set within a single star join.

Drilling Across means linking separate fact tables together through the conformed (shared) dimensions.
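The three drilling operations can be sketched against two small fact tables that share one conformed dimension. The schema (`fact_sales`, `fact_shipments`, `dim_product`) and the data are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT
    );
    CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
    CREATE TABLE fact_shipments (product_key INTEGER, units_shipped INTEGER);
    INSERT INTO dim_product VALUES
        (1, 'Acme', 'Stationery'), (2, 'Bolt', 'Stationery'), (3, 'Acme', 'Kitchen');
    INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 2), (1, 4);
    INSERT INTO fact_shipments VALUES (1, 20), (2, 8), (3, 3);
""")


def query(sql):
    return conn.execute(sql).fetchall()


# Drilled-up view: a single dimension attribute in the answer set.
by_category = query("""
    SELECT p.category, SUM(f.units_sold)
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""")

# Drill down: add one more dimension attribute (brand) to the same star join.
by_category_brand = query("""
    SELECT p.category, p.brand, SUM(f.units_sold)
    FROM fact_sales AS f JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category, p.brand ORDER BY p.category, p.brand
""")

# Drill across: query the second fact table separately, grouped by the same
# conformed attribute, then line the two answer sets up on that attribute.
shipped_by_category = dict(query("""
    SELECT p.category, SUM(f.units_shipped)
    FROM fact_shipments AS f JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
"""))
drill_across = [
    (category, sold, shipped_by_category.get(category))
    for category, sold in by_category
]
```

Going from `by_category_brand` back to `by_category` is drilling up: the brand attribute is simply dropped from the answer set.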

Advantages of Dimensional Modeling


A predictable, standard framework.
The predictable framework of the star join schema withstands unexpected changes in user behavior.
Extensible to accommodate unexpected new data elements and new design decisions.
Administrative utilities and software processes help to manage and use aggregates.

Types of OLAP

ROLAP MOLAP HOLAP


Types of OLAP

ROLAP

Relational OLAP. ROLAP systems store data in the relational database.

MOLAP

Multidimensional OLAP. MOLAP systems store data in multidimensional cubes.

HOLAP

Hybrid OLAP. HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.
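The storage difference between ROLAP and MOLAP can be sketched in a few lines. This is a toy illustration, not how any particular OLAP product works: the ROLAP side keeps rows in a relational table and aggregates with SQL at query time, while the MOLAP side pre-aggregates the same rows into a dictionary that stands in for a cube. All names and data are hypothetical.

```python
import sqlite3
from collections import defaultdict

rows = [("Pen", "East", 10), ("Pen", "West", 5), ("Mug", "East", 2), ("Mug", "West", 7)]

# ROLAP style: data stays relational; aggregates are computed by SQL on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def rolap_units(product):
    return conn.execute(
        "SELECT SUM(units) FROM sales WHERE product = ?", (product,)
    ).fetchone()[0]


# MOLAP style: every cell, including roll-ups over "ALL", is computed up front.
cube = defaultdict(int)
for product, region, units in rows:
    for cell in ((product, region), (product, "ALL"), ("ALL", region), ("ALL", "ALL")):
        cube[cell] += units


def molap_units(product="ALL", region="ALL"):
    # A query is just a lookup into the pre-aggregated cube.
    return cube[(product, region)]
```

A hybrid design would answer summary questions from the cube and fall back to the relational table for detail rows.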


ROLAP
[Figure: ROLAP architecture. Source systems feed the data warehouse server; relational viewers on the clients use SQL to read relational data.]


MOLAP
[Figure: MOLAP architecture. An RDBMS server feeds an MDBMS server; multidimensional viewers on the clients read multidimensional data.]


HOLAP
[Figure: HOLAP architecture. An RDBMS server and an MDBMS server serve the clients; multidimensional viewers read multidimensional data and use SQL reach-through to the relational data.]


Implementing Life Cycle

Planning
Business Requirements Definition
Dimensional Modeling
Physical Design
Data Staging Design and Development
Technical Architecture Design


Implementing Life Cycle (Contd.)

Product Selection and Installation
End User Application Specification
End User Application Development
Deployment
Maintenance and Growth
Project Management


Lifecycle Approach

Successful implementation of a data warehouse depends on the appropriate integration of numerous tasks and components. You need to coordinate the many facets of a data warehouse and demonstrate strength across all aspects of the project for success. The Business Dimensional Lifecycle ensures that the project pieces are brought together in the right order and at the right time.

Business Dimensional Lifecycle


[Figure: Business Dimensional Lifecycle diagram]

Project Planning
Business Requirements Definition
Technology Track: Technical Architecture Design, Product Selection & Installation
Data Track: Dimensional Modeling, Physical Design, Data Staging Design & Development
Application Track: End-User Application Specification, End-User Application Development
Deployment
Maintenance & Growth
Project Management


Project Planning

The lifecycle begins with project planning. It addresses the definition and scoping of the data warehouse project, including early critical tasks like readiness assessment and business justification. Then project planning focuses on resource and skill-level staffing requirements coupled with project task assignments, duration and sequencing. Project planning is dependent on the business requirements, as denoted by the two-way arrow between the activities.

Business Requirements Definition

The likelihood of a data warehouse's success is greatly increased by a sound understanding of the business end users and their analytical requirements. The designers must understand the key factors driving the business to effectively determine business requirements and translate them into design considerations. The business requirements establish the foundation for the three parallel tracks focused on technology, data, and end user applications.

Dimensional Modeling

The definition of the business requirements determines the data needed to address business users' analytical requirements. Data models to support these analyses are then designed by constructing a matrix that represents key business processes and their dimensionality. This matrix serves as a blueprint to ensure that the data warehouse is extensible across the organization over time.

Dimensional Modeling (Contd.)

Coupling this data analysis with our earlier understanding of the business requirements, we then develop a dimensional model with a fact table grain, associated dimensions, attributes, hierarchical drill paths, and facts. The logical database design is completed with the appropriate table structures and primary/foreign key relationships. The preliminary aggregation plan is also developed.

Physical Design

Physical database design focuses on defining the physical structures necessary to support the logical database design. Primary elements include defining the naming standards and setting up the database environment. Preliminary indexing and partitioning strategies are also determined.

Data Staging Design & Development

The data staging process has three major steps:

Extraction, Transformation, and Load

The extract process always exposes data quality issues that have been buried within the operational data stores. If not addressed properly, they can significantly impact the credibility of the data warehouse. Also, two warehouse staging processes need to be built:

One for the initial population of the data warehouse.
Another for the regular, incremental loads.
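The two staging processes can be sketched as two entry points over the same extract, transform, and load functions. This is a minimal sketch under stated assumptions: the tables `orders` and `fact_orders` are hypothetical, and the incremental load assumes that `order_id` only ever increases, so the highest id already loaded can serve as a high-water mark. That technique is an assumption of this sketch, not something prescribed by the course.

```python
import sqlite3

source = sqlite3.connect(":memory:")     # stands in for an operational system
warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse
source.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, amount REAL)")


def extract(after_id):
    """Extraction: read source rows newer than the given order id."""
    return source.execute(
        "SELECT order_id, amount FROM orders WHERE order_id > ? ORDER BY order_id",
        (after_id,),
    ).fetchall()


def transform(rows):
    """Transformation: here, only round amounts to two decimals."""
    return [(order_id, round(amount, 2)) for order_id, amount in rows]


def load(rows):
    """Load: append rows to the warehouse and report how many were added."""
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    warehouse.commit()
    return len(rows)


def initial_load():
    # Initial population: take everything the source holds.
    return load(transform(extract(after_id=0)))


def incremental_load():
    # Regular load: take only rows beyond what the warehouse already has.
    high_water_mark = warehouse.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM fact_orders"
    ).fetchone()[0]
    return load(transform(extract(after_id=high_water_mark)))


source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
first = initial_load()
source.execute("INSERT INTO orders VALUES (3, 7.25)")
second = incremental_load()
third = incremental_load()  # nothing new in the source
total = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
```

Running the incremental load a second time with no new source rows adds nothing, which is the property that makes it safe to schedule regularly.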

Technical Architecture Design

The technical architecture design establishes the overall architecture framework and vision. Three factors must be considered simultaneously to establish the data warehouse technical architecture design:

Business requirements, current technical environment, and planned strategic technical directions.


Product Selection and Installation

Using the technical architecture design as a framework, specific architectural components such as the hardware platform, database management system, data staging tool, or data access tool will need to be evaluated and selected. A standard technical evaluation process is defined along with specific evaluation factors for each architectural component. Once the products have been evaluated and selected, they are installed and thoroughly tested to ensure end-to-end integration with the data warehouse environment.

End User Application Specification

A set of standard end user applications is usually defined, since not all business users need ad hoc access to the data warehouse. Application specifications describe the report templates, user-driven parameters, and required calculations. These specifications ensure that the development team and the business users have a common understanding of the applications to be delivered.

End User Application Development

The development of the end user applications involves configuring the tool metadata and constructing the specified reports. Ideally, these applications are built using an advanced data access tool that provides significant productivity gains for the development team. In addition, it offers a powerful mechanism for business users to easily modify existing report templates.

Deployment

Deployment represents the convergence of technology, data, and end user applications accessible from the business users' desktops. Extensive planning is required to ensure that these puzzle pieces fit together properly. Business user education that integrates all aspects of the convergence must be developed and delivered. In addition, user support and communication or feedback strategies should be established before any business users have access to the data warehouse.

Maintenance & Growth

Focus on the business users must continue by providing them with ongoing support and education. Also focus on the backroom must continue to ensure that the processes and procedures are in place for effective ongoing operations of the data warehouse.


Project Management

Used to ensure that Business Dimensional Lifecycle activities remain on track and in sync. Activities include:

Monitoring project status
Issue tracking
Change control to preserve scope boundaries
The development of a comprehensive project communication plan that addresses both the business and information systems organizations

Guidelines for using the Business Dimensional Lifecycle

The Business Dimensional Lifecycle diagram identifies:

High-level task sequencing
Activities that should be happening concurrently throughout the technology, data, and application tracks

Focus on sequencing and concurrency, not absolute timelines.
