
Lecture 05, Tue, Feb 17, 2009, 18:00 to 21:00, FAST NU, Karachi

Architectural Components
Three Major Areas
- Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging
- Data Storage: Loading, Archiving, Management
- Information Delivery: Reports, Query Processing, Complex Analysis

Building Blocks of the Data Warehouse
- Source Data
- Data Staging
- Data Storage
- Information Delivery
- Metadata
- Management and Control
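
The data acquisition and storage steps listed above can be sketched as a tiny in-memory pipeline. This is a minimal illustration, not a production ETL design. The source system names (`orders`, `online_sales`), the field names (`customer_id`, `name`), and the plain dict that stands in for the warehouse table are all hypothetical.

```python
# Minimal ETL sketch. Source system names, field names and the dict used
# as the "warehouse" are hypothetical, chosen only for illustration.

def extract(sources):
    """Extraction: pull raw rows from every source system, tagging their origin."""
    for system, rows in sources.items():
        for row in rows:
            yield {**row, "source": system}


def transform(rows):
    """Transformation: standardise formats (here, trim and title-case names)."""
    for row in rows:
        yield {**row, "name": row["name"].strip().title()}


def cleanse_and_integrate(rows):
    """Cleansing and integration: reject unkeyed rows and merge rows for the
    same customer from different sources. The returned dict is the staging area."""
    staged = {}
    for row in rows:
        key = row.get("customer_id")
        if not key:
            continue  # cleansing: a row without a key cannot be integrated
        entry = staged.setdefault(
            key, {"customer_id": key, "name": row["name"], "sources": []}
        )
        entry["sources"].append(row["source"])
    return staged


def load(staged, warehouse):
    """Loading: move the integrated rows from staging into the warehouse table."""
    warehouse.update(staged)
    return warehouse


sources = {
    "orders": [{"customer_id": "C1", "name": " jane brown "}],
    "online_sales": [
        {"customer_id": "C1", "name": "JANE BROWN"},
        {"customer_id": "", "name": "unknown"},
    ],
}
warehouse = load(cleanse_and_integrate(transform(extract(sources))), {})
print(warehouse)
```

The two spellings of the same customer are merged into one integrated row, and the row with no key is rejected during cleansing.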

Architectural Components
[Figure: Source Data (Production, Internal, Archived, External) passes through Data Staging (Data Acquisition) into Data Storage (Data Warehouse, Data Marts, MDDB, Metadata) and on to Information Delivery (Reports / Queries, OLAP, Data Mining), all under Management & Control.]

Infrastructure Supporting Architecture
Operational
- People
- Procedures
- Training
- Management Software
Physical
- Hardware
- Operating System
- DBMS
- Network
- Software

Platform Options
- Single Platform Option
- Hybrid Option: Source Data Platforms, Staging Area Platforms
Options for Staging Area
- Source Data Platforms
- Data Storage Platforms
- Separate Platforms
Data Movement Options
- Shared Disk
- Mass Data Transmission
- Real Time Connection
- Manual Methods

Server Hardware
- SMP (Symmetric Multiprocessing)
- Clusters
- MPP (Massively Parallel Processing)
- ccNUMA or NUMA (Cache-coherent Nonuniform Memory Architecture)

Symmetric Multiprocessing
Features
- Shared-everything architecture
- Simplest form of parallel processing
Benefits
- Proven technology since 1970
- Workload balance
- Scalable performance
- Easy administration
Limitations
- Limited available memory
- Limited bandwidth
- Limited availability
Consideration
- Suitable when the data warehouse is 200 to 300 GB and concurrency requirements are reasonable

[Figure: SMP. Four processors connected by a common bus to a shared memory and shared disks.]
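
The shared-everything model can be illustrated by having several worker threads scan one table held in a single shared memory space and add their results to one shared total. This is a minimal sketch under stated assumptions: the `sales` list and the worker count are made up. In CPython the global interpreter lock prevents these threads from actually running the summation in parallel, so the example shows the SMP memory model (one address space, with shared state protected by a lock), not a speedup.

```python
# Sketch of shared-everything processing: every worker reads the same
# in-memory table and updates the same result, so updates to the shared
# result are serialised with a lock. Data and worker count are illustrative.
import threading
from concurrent.futures import ThreadPoolExecutor

sales = list(range(1, 1001))  # one table held in shared memory
total = 0
total_lock = threading.Lock()


def scan(start, stop):
    """Scan one slice of the shared table and add the partial sum to the total."""
    global total
    partial = sum(sales[start:stop])
    with total_lock:  # all workers write to the same memory location
        total += partial


workers = 4
chunk = len(sales) // workers
with ThreadPoolExecutor(max_workers=workers) as pool:  # waits for all tasks on exit
    for i in range(workers):
        stop = len(sales) if i == workers - 1 else (i + 1) * chunk
        pool.submit(scan, i * chunk, stop)

print(total)  # 500500
```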

Clusters
Features
- Each node has one or more processors and associated memory
- Memory is shared within each node only
- High-speed bus communication
- Shared disks
- Cluster of nodes
Benefits
- High availability
- Preserves the concept of one database
- Incremental growth
Limitations
- Bus bandwidth
- High O/S overhead
- Cache consistency maintenance for inter-node synchronization
Consideration
- Suitable if the data warehouse is expected to grow in well-defined increments

[Figure: Clusters. Four processors grouped into nodes, each node with its own shared memory. The nodes are linked by a common high-speed bus and share the disks.]

Massively Parallel Processing
Features
- Shared-nothing architecture
- Focus on disk access rather than memory access
- Works well with an O/S that supports transparent disk access
- Inter-node communication through processor-to-processor connections
Benefits
- Highly scalable
- Fast access between nodes
- Improved system availability
- Low cost per node
Limitations
- Requires rigid data partitioning
- Restricted data access
- Limited workload balance
- Cache consistency must be maintained
Considerations
- Medium to large data warehouses of 400 to 500 GB

[Figure: MPP. Four processors, each with its own private memory and disk.]
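
The MPP features and limitations above (shared nothing, rigid data partitioning) can be simulated in a single process. Each hypothetical "node" owns a private partition. Rows are routed by hashing the partition key, each node aggregates only its own rows, and a coordinator merges the partial results. The node count, the `(region, amount)` rows, and hash partitioning on region are assumptions made for illustration. `zlib.crc32` is used instead of Python's built-in `hash`, because `hash` on strings is randomised per process and would not give a stable partition map.

```python
# Sketch of shared-nothing (MPP-style) aggregation, simulated in one process.
# Each "node" only sees its own partition; a coordinator merges the results.
# Node count, data and the choice of partition key are illustrative.
import zlib
from collections import defaultdict

NUM_NODES = 4


def node_for(key: str) -> int:
    """Deterministic hash partitioning: a key always maps to the same node."""
    return zlib.crc32(key.encode()) % NUM_NODES


def partition(rows):
    """Route every (region, amount) row to the node that owns its region."""
    nodes = [[] for _ in range(NUM_NODES)]
    for region, amount in rows:
        nodes[node_for(region)].append((region, amount))
    return nodes


def local_aggregate(node_rows):
    """Work done on one node using only its private data."""
    totals = defaultdict(int)
    for region, amount in node_rows:
        totals[region] += amount
    return totals


def coordinator(nodes):
    """Merge partial results. A region lives on exactly one node, so keys never collide."""
    result = {}
    for node_rows in nodes:
        result.update(local_aggregate(node_rows))
    return result


rows = [("north", 10), ("south", 5), ("east", 7), ("north", 3), ("west", 8), ("south", 2)]
print(coordinator(partition(rows)))
```

This also shows why partitioning is rigid. A query that groups by the partition key stays local to each node, but a query on any other column has to touch every node, and changing `NUM_NODES` remaps keys and forces the data to be redistributed.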

Cache-coherent Nonuniform Memory Architecture
Features
- New architecture, since the early 1990s
- A big SMP broken into smaller SMPs
- Single real memory address space over the entire machine
Benefits
- Maximum flexibility
- Overcomes the memory limitations of SMP
- Better scalability than SMP
- Partitioning with a centralized approach
Limitations
- Complex programming
- Limited software support
- Still maturing
Consideration
- For experienced technology users

[Figure: ccNUMA. Four processors grouped around two shared memories, each group with its own disks.]

Software Tools
Architecture first, then tools.
- Data Modeling
- Data Extraction
- Data Transformation
- Data Loading
- Data Quality
- Queries and Reports
- OLAP
- Alert Systems
- Middleware and Connectivity
- Data Warehouse Management

Metadata
Definitions
- Data about data
- Table of contents for the data
- Catalog for the data
- Data warehouse atlas
- Data warehouse roadmap
- Data warehouse directory
- The nerve center

Metadata Example
Entity Name:            Customer
Alias Names:            Account, Client
Definition:             A person or an organization that purchases goods or services from the company
Remarks:                It includes regular, current and past customers
Source Systems:         Finished Goods Orders, Maintenance Contracts, Online Sales
Created Date:           January 15, 1999
Last Update Date:       January 21, 2001
Update Cycle:           Weekly
Last Full Refresh:      December 29, 2000
Full Refresh Cycle:     Every Six Months
Data Quality Reviewed:  January 25, 2001
Last Deduplication:     January 10, 2001
Planned Archival:       Every Six Months
Responsible User:       Jane Brown
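
Part of the Customer entry above can be held as a structured record, so that tools can query it instead of reading it as free text. This is a minimal sketch, assuming Python 3.9 or later. The class name `EntityMetadata`, its field names, and encoding the "Weekly" update cycle as 7 days are illustrative assumptions, not part of the original example. Only a subset of the attributes is modelled.

```python
# Sketch of one metadata entry as a structured record. The class and field
# names are hypothetical; only some attributes of the example are modelled.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class EntityMetadata:
    entity_name: str
    alias_names: list[str]
    definition: str
    source_systems: list[str]
    created: date
    last_update: date
    update_cycle_days: int  # assumption: "Weekly" encoded as 7 days
    responsible_user: str

    def next_update_due(self) -> date:
        """Date of the next scheduled incremental update."""
        return self.last_update + timedelta(days=self.update_cycle_days)


customer = EntityMetadata(
    entity_name="Customer",
    alias_names=["Account", "Client"],
    definition="A person or an organization that purchases goods or services from the company",
    source_systems=["Finished Goods Orders", "Maintenance Contracts", "Online Sales"],
    created=date(1999, 1, 15),
    last_update=date(2001, 1, 21),
    update_cycle_days=7,
    responsible_user="Jane Brown",
)
print(customer.next_update_due())  # 2001-01-28
```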

Need for Metadata
- For Using the Data Warehouse
- For Building the Data Warehouse
- For Administering the Data Warehouse
Who needs it?
- IT Professionals
- Power Users
- Casual Users

A Nerve Center
[Figure: Data Warehouse Metadata at the center, linked to Source Systems, Extraction Tools, Transformation Tools, Cleansing Tools, Data Load Function, External Data, Query Tool, Reporting Tool, OLAP Tool, Data Mining, and Applications.]

Metadata by Functional Areas
- Data Acquisition: Extraction, Transformation, Cleansing, Integration, Staging
- Data Storage: Loading, Archiving, Management
- Information Delivery: Reports, Query Processing, Complex Analysis
Metadata types: Business Metadata, Technical Metadata

Metadata Requirements
- Capturing and Storing Data
- Variety of Metadata Sources
- Metadata Integration
- Metadata Standardization
- Rippling through Revisions
- Keeping Metadata Synchronized
- Metadata Exchange
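
"Rippling through revisions" and "keeping metadata synchronized" both depend on knowing what is derived from what. One common way to support this is to store lineage metadata as a graph and walk it whenever an item changes. This is a minimal sketch, assuming Python 3.9 or later. All item names in the `lineage` map and the function `impacted_by` are hypothetical. It lists every downstream item whose metadata would need review after a source field is revised.

```python
# Sketch of impact analysis over lineage metadata. Every item name below is
# hypothetical; each key maps to the items derived directly from it.
from collections import deque

lineage = {
    "orders.cust_name": ["staging.customer.name"],
    "staging.customer.name": ["dw.customer_dim.name"],
    "dw.customer_dim.name": ["report.top_customers", "olap.customer_cube"],
}


def impacted_by(changed: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a revised item to everything downstream of it."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        item = queue.popleft()
        for child in graph.get(item, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(impacted_by("orders.cust_name", lineage))
```

A revision to the source field reaches the staging area, the warehouse dimension, and both delivery objects, in the order in which the walk discovers them.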

Metadata Sources
- Source Systems
- Data Extraction
- Data Transformation and Cleansing
- Data Loading
- Data Storage
- Information Delivery
