You are on page 1of 50

Data Warehouse Architecture

Agenda
    Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI

Data Warehouse Architecture


 Data Architecture
 Business processes

 Infrastructure
 Hardware, networking, operating systems, desktops

 Technical
 Technologies used a various levels of the architecture i.e. query tools, extraction tools

Data Architecture
 Determined by business processes
 If you are a manufacturer then production orders, orders, shipping and billing are important

 Dimensional or E-R
 Depends on level of query detail  Different models at different levels (query, data staging, etc)

 MOLAP, ROLAP

Infrastructure
 Network capacity, servers, RAM, storage  Where you invest in infrastructure depends somewhat on needs
 Nightly updates - adequate staging horsepower  Worldwide availability - parallel or distributed servers  Customer-level analysis - large servers

Technical Architecture
 The technical architecture is driven by the metadata catalog/repository
 Metadata refers to the data that describes the data objects, structures, data flows, etc defined in the data warehouse  Metadata is stored independently of the business data allowing for more flexibility
For example, because connectivity parameters are not hardcoded in the ETL process data sources can easily be changed

Technical Architecture
 Extract  Data comes from multiple sources and is of multiple types. Data compression and encryption handling must be considered at this area, if it applies.  Transform  Data transformation includes surrogate key management, integration, de-normalization, cleansing, conversion, aggregation, and auditing.  Load  Loading is often done to multiple targets, with load optimization and support for the entire load cycle.  Security  Administrator access and data encryption policies.  Job control  This includes job definition, job scheduling (time and event), monitoring, logging, exception handling, error handling, and notification.

Agenda
    Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI

Generic Architecture

Data Mart Layer Operational Data Store Data Warehouse Layer

Data Staging Layer

Source System

Source System

Source System

Source System

Data Staging Layer  Decouples extraction processes from data cleansing processes
 Reduces impact of loads on transaction system  Snapshot allows repeatable / re-startable ETL processes  Area to store data using different load schedules to the DW

 Data at this layer is not typically available to end users  Area to store 3rd party data

Data Staging Layer  Employ data manipulation not appropriate for the DW  Data in Data Staging Area is:  a copy of operational data  deleted after a short period so nonhistorical  in flat tables  atomic

Do you need a Data Staging Layer


Is real-time reporting required? Does OLTP run 24-7? How complex is the reporting requirement? Does OLTP hardware have enough resources to handle reporting and transaction processing?  What are the long term needs of the organization?    

Operational Data Store


 ODS is a hybrid of data warehouse and operational systems  Allows OLTP response times, update capabilities and DSS capabilities  Can be used for operational reporting because:
 Data from multiple source systems are integrated at this point  Operational reporting is typically more structured so dimensional models are not required

 Data in the ODS is:


    kept for 6 months so non-historical in flat tables or 3NF often a copy of operational data atomic

Operational Data Store


 Class I
 Real-time

 Class II
 Two-hour to four-hour

 Class III
 Daily

 Class IV
 Aggregated data from the data warehouse

Do you need Data Staging Layer


 Is real-time reporting required?  Is cross-domain reporting required (multiple source systems)?  How complex is the reporting requirement (in terms of aggregation)?

Enterprise Data Warehouse


 EDW is the central hub of the BI infrastructure in some architectural approaches  Single point of truth for enterprise  The question of the appropriate data model for the EDW layer remains controversial  Data in the EDW is:
     cleansed and harmonized during the ETL process historical distributed to various other systems often 3NF but can be dimensional usually atomic but sometimes aggregate

To Normalize or Not To Normalize


Property Redundancy Data volume Number of records Administration effort Support for realignment Delta handling Transformation Normalized Lower Lower Higher Higher Good Easy Difficult Denormalized Higher Higher Lower Lower Poor Difficult Easy

Choice of whether to normalize or not depends on the complexity and performance of transformations when transferring data between layers and reporting requirements

Data Marts
 Data mart is a departmental, or subject oriented, subset of the EDW  Data in the data mart comes from the EDW if one exists  Relationship between EDW and data marts is often called a hub-and-spoke architecture  Data in Data Mart is:
    often aggregated but may be atomic dimensional may or may not be historical volatile

Agenda
    Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI

Two Competing Camps


 Corporate Information Factory
 Hub-and-spoke architecture  Top-Down  Inmon

 Data Bus Architecture


 Bottom-up  Kimball

Corporate Information Factory

Data Bus Architecture

Corporate Information Factory


 Relies on a single centralized data warehouse
 Data is atomic and normalized

 Data is distributed to data marts


 Exploration warehouses, data mining warehouses
Often 3NF and atomic because its easier to use for analytical models

 OLAP data bases


Often star schema and aggregated because its easier to perform complex analyses

Data Bus Architecture


 No centralized Data Warehouse
 In the data bus architecture the staging area serves a similar purpose as the data warehouse in the CIF

 Define conformed dimensions and fact tables


 Conformed dimensions form the data bus

 Create data marts in conformance with the conformed dimensions and fact tables
 A metadata repository forms the basis of the data bus

 No reliance on normalized databases


 Summarized and atomic data both stored in star schemas

Data Mart Dimensions

Service Call Status Service Call Type

Date Billing Date

End Date

Employee Time

Organization/ Location Fact Groups Customer

GL Account Product

Purchased Product

Pricing Package Vendor Destination Phone # Originating Phone #

Usage Type

Bus Architecture Matrix

Conformed Dimensions and Facts


 A conformed dimensions and fact table are semantically the same across the enterprise  Architecture defines and reconciles the attributes associated with each conformed dimension or fact table  Often as much a political issue to create these as a technical issue  Conformed dimension and fact definitions should include:
     Organization wide definitions Equations used to derive them (for facts) Units of measure Reporting periods Granularity (for facts)

Dependent Data Marts / Hub & Spoke


Report

Report

Report

Report

Report

An approach also used in the early days, but refined over time: - Originally suggested extensive effort in building the DW - Now recommends building DW incrementally

Data Mart Bus (conformed)


Report

Report

Report

Report

Report

An approach also used in the early days, but refined over time: - Originally suggested building silos - Now recommends enterprise perspective

Independent Data Marts


Report

Report

Report

Report

How Data Warehousing was often performed in the early days - Individual projects developing solutions into functional silos - No program / enterprise perspective - No conformed dimensions

Central Data Warehouse


Report

Report

Report

Report

Report

Seeks to overcome the limitations of previous architectures: - Highly variable with many individual approaches

Federated Data Warehouse

Report

Report

Report

Report

Report

An attempt to consolidate legacy Data Marts

Architecture Alternatives
Hub and Spoke Architecture (Corporate Information Factory) Source System Staging Area
Normalized Relational Warehouse (atomic data)

End user Applications

Dependent Data Marts (summarized/ some atomic data)

Data Mart Bus Architecture with Linked Dimensional Data Marts Source System Staging Area
Dimensionalized Data Marts linked by Conformed Dimensions (atomic/summarized data)

End user Applications

Architecture Alternatives
Independent Data Marts Architecture Source System Staging Area
Independent Data Marts (atomic/summarized data)

End user Applications

Centralized Data Warehouse Architecture Source System Staging Area


Normalized Relational Warehouse (atomic/some summarized data)

End user Applications

Architecture Alternatives

Federated Architecture
Existing Data Warehouses, Data Marts and Legacy Systems Logical/Physical Integration of common data elements

End user Applications

Architecture Hub-and-Spoke Bus Centralized Independent Data Marts Federated

Usage 39% 26% 17% 12% 4%

The Bottom Line


 Differences between these in practice are blurry
 The argument is mostly a distraction

 Best idea is to follow best practices:


 Build incrementally according to a business function using portfolio approach  Employ an enterprise perspective through governance, standards and a portfolio approach  Dimensionally model data at the reporting level and store atomic data at other levels  Utilise conformed dimensional models  Employ a Staging Area or Data Warehouse

Agenda
    Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI

InfoObjects
 Attributes that are assigned to characteristic InfoObjects are stored independently of any InfoProvider to which the characteristic is attached so characteristics serve as conformed dimensions in the SAP data model

InfoProviders
 InfoProviders are different metaobjects in the data basis that can be seen within query definition  Physical
 DataStore  InfoCube  InfoObjects

 Logical
    VirtualProviders InfoSets (Join) MultiCubes (Union) Aggregation Level

Overview of InfoProviders

DataStore Objects
 DataStores are used mostly as a staging area for data before it is transferred to one or more InfoProviders
 Can be used for reporting, data cleansing, data harmonization and data-quality management  Data is atomic  Unlike InfoCubes, DataStores allow record updates

 DataStores consist of one or three flat tables depending on the type  DataStores can be used as a source for queries but are not as efficient as InfoCubes
 Required when reporting is at atomic level

DataStore Objects
 Standard DataStore Objects
 Building blocks for operational data stores  Permits delta updates  Data must be activated after loading

 Direct-update DataStore Objects


 Used when access to data is required immediately after it is loaded i.e. as part of an analytical process  Accessed via the BI API so that external applications can write to it directly

 Write-optimized DataStore Objects


 Optimized for uploading large amounts of data  Doesnt support delta updates

Modeling Layers of BI Architecture in SAP  Data Staging Layer


 Persistent Staging Area (PSA)
Used by the SAP provided Business Content

 DataStore Object
Allows reporting which is helpful for administrators and developers

 Operational Data Store


 DataStore Objects
Often in conjunction with SAP PI for near real-time messaging

 VirtualProviders
Care must be taken with this approach because performance of source systems can be affected

Modeling Layers of BI Architecture in SAP


 EDW
 DataStore Objects
Data in a DataStore Object is in a flat table but you can build normalized structures using multiple DataStore Objects

 Data Mart Layer


 Dimensional objects such as InfoCubes, Virtual InfoCubes, Multiproviders

Metadata in SAP BI
 Metadata in SAP BI is stored in the metadata repository  Documents all objects and the data flows defined among the various objects  Metadata can be exported to HTML and serve as project documentation

Distributing Data in SAP BI


The Data Mart Interface Is used to distribute data from one BI system to another.

The Open Hub Service allows you to distribute data from a BI system to non-SAP data marts

SAP BI Architecture Alternatives


 Single Global Data Warehouse  Global Data Warehouse Feeding Local Data Warehouses  Local Data Warehouses Feeding a Global Data Warehouse  Template-Driven Local Data Warehouses  Virtual Data Warehouse

Single Global Data Warehouse


Dependent Data Marts

InfoMart

InfoMart

Single Global DWH

Global Data Warehouse Feeding Local Data Warehouses

Dependent Data Marts


InfoMart

InfoMart

InfoMart

InfoMart

Local DWH

Local DWH

InfoMart

Global DWH

InfoMart

Local Data Warehouses Feeding a Global Data Warehouse

Dependent Data Marts


InfoMart

InfoMart

InfoMart

Global DWH

InfoMart

Local DWH

Local DWH

Template-Driven Local Data Warehouses


Data Mart Bus Architecture with Linked Dimensional Data Marts
InfoMart

InfoMart

InfoMart

InfoMart

Local DWH
Template

Local DWH

Template

InfoMart

Global DWH

InfoMart

You might also like