Chapter 11

Logical Database Design


Authors: Graeme C. Simsion and Graham C. Witt
Are data warehouses / marts just operational databases?
[Figure: data warehouse architecture. Four source data stores and one external data source are each loaded into the Data Warehouse by a Load Program; three further Load Programs populate three Data Marts, each accessed through Query Tools.]

Copyright: ©2005 by
Differences Compared with Operational Databases
• Different requirements
• Database technology may not be
relational
– Special multi-dimensional databases now
exist to service this area
• Many techniques of database design
can still apply

Characteristics
• Data integration
– With data from many operational databases
• Loads data rather than performing transactional updates
• Less predictable patterns of database access (‘hits’)
• Complex queries but a simple interface
• May emphasize historical data
• May summarize other data
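As a minimal sketch of the "load rather than update" characteristic, the Python below appends each periodic snapshot with its load date and never modifies earlier rows, so history is retained and can be summarized later. It assumes a simple account-balance feed; `BalanceRow`, `load_snapshot` and `total_by_load_date` are hypothetical names, not part of any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BalanceRow:
    """One warehouse row (hypothetical): an account balance as at a load date."""
    load_date: date
    account_id: str
    balance: int  # whole currency units, for simplicity


def load_snapshot(warehouse: List[BalanceRow], load_date: date,
                  source_rows: Iterable[Tuple[str, int]]) -> None:
    """Append a full snapshot from the source; existing rows are never updated or deleted."""
    warehouse.extend(BalanceRow(load_date, acct, bal) for acct, bal in source_rows)


def total_by_load_date(warehouse: List[BalanceRow]) -> Dict[date, int]:
    """Summarize the retained history: total balance for each load."""
    totals: Dict[date, int] = defaultdict(int)
    for row in warehouse:
        totals[row.load_date] += row.balance
    return dict(totals)


warehouse: List[BalanceRow] = []
load_snapshot(warehouse, date(2005, 1, 31), [("A", 100), ("B", 50)])
load_snapshot(warehouse, date(2005, 2, 28), [("A", 120), ("B", 40)])
print(total_by_load_date(warehouse))
```

Because nothing is overwritten, account A's January balance is still available after the February load.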

Quality Criteria Revisited
• Completeness
• Non-redundancy
• Enforcement of (business) rules
• Data reusability
• Stability and flexibility
• Simplicity and elegance
• Communication and effectiveness
• Performance
Modelling / designing?
• Data warehouses feed data marts
– Each needs a separate approach
– Marts are more focused
– Warehouses are more general and must
handle all marts envisaged
• Consider each in turn
• (revisit 16.1 over the page)

[Figure (repeated): data warehouse architecture. Source data and external data are loaded into the Data Warehouse by Load Programs; further Load Programs populate the Data Marts, each accessed through Query Tools.]

Modeling for Data Warehouses
1. May need an initial corporate model of the business
2. Need to understand existing operational data and databases
3. Determine requirements of the warehouse
4. Determine sources and handle differences between them
5. Shaping data for data marts
The last two steps are more complex than the others

Sources and Differences when Designing
• Minimize the number of source systems
• Carefully judge the quality of each source data item
• Reconcile multiple sources
– e.g. differences in the timeframe and currency of an item
• Handle incompatibilities between the coding schemes used for data items
• Unpack overloaded attributes
– e.g. an address that contains a postcode, where the postcode becomes an important part of the warehouse
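The last two points can be sketched in Python under stated assumptions: the source systems (`billing`, `crm`), their customer type codes, and the rule that a postcode is a standalone 4-digit number at the end of the address (as in Australian addresses) are all hypothetical. The sketch maps each source's codes onto a single warehouse scheme and splits the postcode out of an overloaded address.

```python
import re
from typing import Optional, Tuple

# Hypothetical coding schemes: two source systems code customer type differently.
CUSTOMER_TYPE_MAP = {
    ("billing", "RET"): "RETAIL",
    ("billing", "WHS"): "WHOLESALE",
    ("crm", "R"): "RETAIL",
    ("crm", "W"): "WHOLESALE",
}


def standardize_customer_type(source: str, code: str) -> str:
    """Translate a source-specific code into the warehouse's single coding scheme."""
    try:
        return CUSTOMER_TYPE_MAP[(source, code)]
    except KeyError:
        # Unmapped codes are a data quality issue to resolve, not something to guess.
        raise ValueError(f"unmapped code {code!r} from source {source!r}") from None


# Assumption: the postcode is a standalone 4-digit number at the end of the address.
POSTCODE_AT_END = re.compile(r"^(?P<rest>.*?)[\s,]*(?<!\d)(?P<postcode>\d{4})\s*$")


def unpack_address(address: str) -> Tuple[str, Optional[str]]:
    """Split an overloaded address into (address without postcode, postcode or None)."""
    match = POSTCODE_AT_END.match(address)
    if match is None:
        return address.strip(), None
    return match.group("rest"), match.group("postcode")


print(standardize_customer_type("crm", "W"))
print(unpack_address("12 High St, Kew VIC 3101"))
```

Raising an error for unmapped codes, rather than passing them through, keeps incompatible codes from silently entering the warehouse.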

Shaping Data for Data Marts
• Need to maximize flexibility
• Cater for purposes common to several marts, and for basic commonality between them (sorted out when handling the requirements for the warehouse)
• If it is difficult to cater for both flexibility and common purposes, opt for flexibility
• The rule: Maximize Flexibility, Minimize
Anticipation

Modeling for Data Marts
• Modeling for general business people
– Little technical knowledge
– Need for special queries
• Much simpler than operational databases
– Focus on facts, rather than on transaction handling and complex business rules
• Users of data marts need to move easily
between marts

Basic Data Mart Architecture: Star Schema
• Fact table (only one)
• Dimension tables
– Used to classify the facts into categories
[Figure: star schema; * marks a foreign key to a dimension table]
Sale (fact table): Accounting Month No*, Product ID*, Customer ID*, Location ID*, Quantity, Value
Period: Accounting Month No, Quarter No, Year No
Customer: Customer ID, Customer Type Code, Region Code, State Code, Customer Name
Product: Product ID, Product Type Code, Product Name
Location: Location ID, Location Type Code, Region Code, State Code, Location Name
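A minimal SQLite sketch of the star schema above, using Python's standard `sqlite3` module. Table and column names follow the figure; the data types and sample rows are assumptions. Each dimension is reached by a single join from the fact table.

```python
import sqlite3

# Column names follow the star schema figure; types and sample rows are assumptions.
SCHEMA = """
CREATE TABLE period   (accounting_month_no INTEGER PRIMARY KEY,
                       quarter_no INTEGER, year_no INTEGER);
CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_type_code TEXT,
                       region_code TEXT, state_code TEXT, customer_name TEXT);
CREATE TABLE product  (product_id TEXT PRIMARY KEY, product_type_code TEXT,
                       product_name TEXT);
CREATE TABLE location (location_id TEXT PRIMARY KEY, location_type_code TEXT,
                       region_code TEXT, state_code TEXT, location_name TEXT);
-- The single fact table: each foreign key points at one dimension.
CREATE TABLE sale (
    accounting_month_no INTEGER REFERENCES period,
    product_id  TEXT REFERENCES product,
    customer_id TEXT REFERENCES customer,
    location_id TEXT REFERENCES location,
    quantity INTEGER,
    value    INTEGER);
"""


def build_star() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO period VALUES (?, ?, ?)",
                     [(200501, 1, 2005), (200504, 2, 2005)])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
                     [("C1", "RET", "NORTH", "NSW", "Acme"),
                      ("C2", "WHS", "SOUTH", "VIC", "Bolt")])
    conn.execute("INSERT INTO product VALUES ('P1', 'TOOL', 'Hammer')")
    conn.execute("INSERT INTO location VALUES ('L1', 'STORE', 'NORTH', 'NSW', 'Sydney')")
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?)",
                     [(200501, "P1", "C1", "L1", 10, 100),
                      (200504, "P1", "C1", "L1", 5, 50),
                      (200504, "P1", "C2", "L1", 20, 180)])
    return conn


# A typical star query: one join per dimension used, then group by dimension attributes.
QUERY = """
SELECT p.year_no, p.quarter_no, c.region_code, SUM(s.value)
FROM sale s
JOIN period p   ON p.accounting_month_no = s.accounting_month_no
JOIN customer c ON c.customer_id = s.customer_id
GROUP BY p.year_no, p.quarter_no, c.region_code
ORDER BY p.year_no, p.quarter_no, c.region_code
"""

conn = build_star()
for row in conn.execute(QUERY):
    print(row)
```

Because every dimension is one join away, the same pattern answers questions by product type or location simply by joining a different dimension table.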

Alternative Architectures: Snowflake
• One fact table
• Dimensions are hierarchical
– 1:many relationships within a dimension are kept as separate tables, rather than collapsed through denormalization as in the star schema
[Figure: snowflake schema]
Sale (fact table): Accounting Month No, Product ID, Customer ID, Location ID, Quantity, Value
Period: Accounting Month No, Quarter No, Year No
Customer: Customer ID, Customer Type ID, Region ID, Customer Name
Customer Type: Customer Type ID, Customer Type Name
Region: Region ID, State ID, Region Name
State: State ID, State Name
Product: Product ID, Product Type ID, Product Name
Product Type: Product Type ID, Product Type Name
Location: Location ID, Location Type ID, Region ID, Location Name
Location Type: Location Type ID, Location Type Name
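To contrast with the star, the sketch below (SQLite via Python's `sqlite3`; types and sample data are assumptions) builds only the customer branch of the snowflake. Reaching State from Sale takes three joins; the hypothetical `customer_flat` view shows the same hierarchy collapsed into a single star-style dimension.

```python
import sqlite3

# Only the customer branch of the snowflake is shown; other dimensions are omitted.
SCHEMA = """
CREATE TABLE state    (state_id TEXT PRIMARY KEY, state_name TEXT);
CREATE TABLE region   (region_id TEXT PRIMARY KEY,
                       state_id TEXT REFERENCES state, region_name TEXT);
CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_type_id TEXT,
                       region_id TEXT REFERENCES region, customer_name TEXT);
CREATE TABLE sale     (accounting_month_no INTEGER, product_id TEXT,
                       customer_id TEXT REFERENCES customer, location_id TEXT,
                       quantity INTEGER, value INTEGER);
-- Collapsing the Customer -> Region -> State hierarchy gives a star-style dimension.
CREATE VIEW customer_flat AS
SELECT c.customer_id, c.customer_name, r.region_id, r.region_name,
       st.state_id, st.state_name
FROM customer c
JOIN region r ON r.region_id = c.region_id
JOIN state st ON st.state_id = r.state_id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("NSW", "New South Wales"), ("VIC", "Victoria")])
conn.executemany("INSERT INTO region VALUES (?, ?, ?)",
                 [("R1", "NSW", "Sydney Metro"), ("R2", "NSW", "Hunter"),
                  ("R3", "VIC", "Melbourne Metro")])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 [("C1", "T1", "R1", "Acme"), ("C2", "T1", "R2", "Bolt"),
                  ("C3", "T2", "R3", "Cog")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?)",
                 [(200501, "P1", "C1", "L1", 1, 100),
                  (200501, "P1", "C2", "L1", 2, 40),
                  (200502, "P1", "C3", "L2", 3, 70)])

# Snowflake query: walk the hierarchy one table at a time.
SNOWFLAKE_QUERY = """
SELECT st.state_name, SUM(s.value)
FROM sale s
JOIN customer c ON c.customer_id = s.customer_id
JOIN region r   ON r.region_id   = c.region_id
JOIN state st   ON st.state_id   = r.state_id
GROUP BY st.state_name ORDER BY st.state_name
"""
# Same result through the collapsed view, as a star schema would present it.
FLAT_QUERY = """
SELECT f.state_name, SUM(s.value)
FROM sale s JOIN customer_flat f ON f.customer_id = s.customer_id
GROUP BY f.state_name ORDER BY f.state_name
"""
print(conn.execute(SNOWFLAKE_QUERY).fetchall())
print(conn.execute(FLAT_QUERY).fetchall())
```

The snowflake stores each state name once; the star trades that for fewer joins per query.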

Snowflakes and Many-to-many Relationships
[Figure: Salesperson and Sale in a many-to-many relationship (a sale can be credited to more than one salesperson, and a salesperson can be credited with many sales); Product classifies Sale]
Many-to-many relationships cannot be handled without some design action:
1. Ignore less common cases
– But include data in the fact (e.g. include the number of salespeople in the fact table)
2. Use a repeating group in the dimension table
3. Treat sale-by-salesperson as the fact table
Whatever you do, involve the business users in decision-making about architecture
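Options 1 and 3 can be compared in a short Python sketch. The `Sale` record and sample data are hypothetical; option 1 assumes the first listed salesperson is the one credited (the common case), and option 3 assumes the value is split equally among salespeople. The real allocation rule is exactly the kind of decision to take to business users.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sale:
    """A source sale (hypothetical) that may be credited to several salespeople."""
    sale_id: str
    product_id: str
    value: int                    # whole currency units
    salespeople: Tuple[str, ...]  # salespeople credited with this sale


def option1_rows(sales: List[Sale]) -> List[tuple]:
    """Option 1: one fact row per sale, crediting only the first (assumed primary)
    salesperson, but keeping the number of salespeople as a fact."""
    return [(s.sale_id, s.salespeople[0], s.product_id, s.value, len(s.salespeople))
            for s in sales]


def option3_rows(sales: List[Sale]) -> List[tuple]:
    """Option 3: the fact is sale-by-salesperson, one row per (sale, salesperson).
    Assumption: the value is split equally among the credited salespeople."""
    rows = []
    for s in sales:
        share = s.value / len(s.salespeople)
        for person in s.salespeople:
            rows.append((s.sale_id, person, s.product_id, share))
    return rows


def value_by_salesperson(rows: List[tuple]) -> Dict[str, float]:
    """Total the option 3 shares for each salesperson."""
    totals: Dict[str, float] = defaultdict(float)
    for _sale_id, person, _product_id, share in rows:
        totals[person] += share
    return dict(totals)


sales = [Sale("S1", "P1", 300, ("Ann", "Bob")), Sale("S2", "P2", 100, ("Ann",))]
print(option1_rows(sales))
print(value_by_salesperson(option3_rows(sales)))
```

With the equal split, the option 3 shares still add up to total sales value, so sales totals are not double counted.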

Time-dependent Data
• History and time are common in data marts, and you must be able to:
– Handle different granularities of time
– Cater for overlapping periods
– Consider hierarchies of time periods
• Slowly changing dimensions are common (e.g. customers may move between customer categories over time)
– Speed of dimension data change
– Speed of moving fact data from one dimension to
another
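A hierarchy of time periods with two overlapping groupings can be sketched as follows. It assumes accounting months are numbered as YYYYMM and that the financial year runs July to June, labelled by the year in which it ends; neither assumption comes from the original. The same monthly facts roll up to calendar quarters, calendar years or financial years.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def period_levels(month_no: int) -> Dict[str, Hashable]:
    """Derive the levels of the period hierarchy for one accounting month.
    Assumptions: month_no is encoded as YYYYMM, and the financial year runs
    July to June and is labelled by the calendar year in which it ends."""
    year, month = divmod(month_no, 100)
    return {
        "month": month_no,
        "calendar_quarter": (year, (month - 1) // 3 + 1),
        "calendar_year": year,
        # Financial years overlap calendar years rather than nesting inside them.
        "financial_year": year + 1 if month >= 7 else year,
    }


def roll_up(facts: Iterable[Tuple[int, int]], level: str) -> Dict[Hashable, int]:
    """Sum (month_no, value) facts to the requested level of the hierarchy."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for month_no, value in facts:
        totals[period_levels(month_no)[level]] += value
    return dict(totals)


facts = [(200505, 10), (200506, 20), (200507, 30), (200512, 40), (200601, 50)]
print(roll_up(facts, "calendar_quarter"))
print(roll_up(facts, "calendar_year"))
print(roll_up(facts, "financial_year"))
```

Holding facts at the finest grain (month) is what allows both overlapping year definitions to be derived from the same data.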

Dimension Change Example
[Figure: Customer Group, Customer and Purchase; each customer belongs to a customer group and makes purchases]

• Customers can change group


• Solutions
– Two group foreign keys (the current group, and the group at the time of purchase / transaction)
– Ignore the change if it is slow and the cost of ignoring it is low
– Hold a history of each customer’s
membership of groups
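The first and third solutions can be combined in a small Python sketch: a dated history of group membership (third solution) is used to fill two group keys on each purchase (first solution). The names and data are hypothetical, and the history is assumed to be sorted by effective date with no future-dated entries.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical membership history: customer -> [(effective_from, group)], sorted by date.
History = Dict[str, List[Tuple[date, str]]]


def group_at(history: History, customer_id: str, on: date) -> Optional[str]:
    """Group the customer belonged to on the given date (None if before the first record)."""
    entries = history[customer_id]
    i = bisect_right([start for start, _ in entries], on)
    return entries[i - 1][1] if i else None


def current_group(history: History, customer_id: str) -> str:
    """Assumes the latest entry is the current membership (no future-dated entries)."""
    return history[customer_id][-1][1]


def purchase_fact(history: History, customer_id: str, purchase_date: date, value: int) -> dict:
    """First solution on the slide: carry two group foreign keys on each purchase."""
    return {
        "customer_id": customer_id,
        "purchase_date": purchase_date,
        "value": value,
        "group_now": current_group(history, customer_id),
        "group_at_purchase": group_at(history, customer_id, purchase_date),
    }


history: History = {"C1": [(date(2004, 1, 1), "STANDARD"), (date(2005, 3, 1), "GOLD")]}
print(purchase_fact(history, "C1", date(2004, 6, 15), 80))
```

Reporting by `group_at_purchase` shows sales as they were classified at the time; reporting by `group_now` restates history under the current grouping.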
Concluding Word
• Data warehousing and data marts are
complex
• Specific design challenges and
limitations exist
• Patterns are also useful here
– There are resources available
• Do further reading about the area if
you’re interested
Next lecture
• Ontology: What’s all the fuss?
– How can (and can't) ontology help you?
– What role does underlying theory (such as ontology) play in practical data modelling?
– If we have ontology, what happens to creativity?
