Yannis Kotidis
AT&T Labs-Research
Roadmap
What is the data warehouse
Multi-dimensional data modeling
Data warehouse design
  the star schema, bitmap indexes
Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
Large time horizon for trend analysis (current and past data)
Non-volatile store
  physically separate store from the operational environment
Bulk load/refresh
warehouse is offline
Relational OLAP
(Redbrick, Informix, Sybase, SQL Server)
OLTP:
  automates day-to-day operations (purchasing, banking, etc.)
  transactions access (and modify!) a few records at a time
  database design is application oriented
  metric: transactions/sec
OLAP:
  complex queries that access millions of records
  long scans would interfere with normal operations
  needs historical data for trend analysis
  metric: query response time
Examples of OLAP
Comparisons (this period vs. last period)
Show me the sales per store for this year and compare it to that of the previous year to identify discrepancies
Multidimensional Modeling
Example: compute total sales volume per product and store
[Table: total sales per product and store, with columns Product, Store, Total Sales]
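The aggregation in this example can be sketched in a few lines of Python. This is only an illustration: the fact rows and the helper name `total_sales` are assumptions, not taken from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, store, amount). Values are illustrative only.
sales = [
    (1, 1, 200), (1, 1, 254), (1, 2, 468),
    (2, 2, 800), (3, 4, 540), (4, 4, 745),
]

def total_sales(rows):
    """Sum the amount measure for every (product, store) pair."""
    totals = defaultdict(int)
    for product, store, amount in rows:
        totals[(product, store)] += amount
    return dict(totals)

totals = total_sales(sales)
for (product, store), amount in sorted(totals.items()):
    print(f"product={product} store={store} total={amount}")
```

Each distinct (product, store) pair becomes one cell of the multidimensional view; the two rows for product 1 in store 1 collapse into a single total.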
DIMENSIONS
[Figure: data cube over the product, location, and time dimensions, each with a hierarchy: day, week, month (e.g. August) for time; store, city, state for location; product, category for product]
Pivoting
Pivoting: aggregate on selected dimensions
usually 2 dims (cross-tabulation)
Sales by product (rows) and store (columns)

Product   Store 1   Store 2   Store 3   Store 4     ALL
1             454       468       296       652    1870
2                       800                         800
3                                 240       540     780
4             925                           745    1670
ALL          1379      1268       536      1937    5120
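A minimal sketch of the pivot operation, assuming the cell values can be read off the cross-tabulation as below and that blank cells count as zero. The function name `pivot` and the dictionary layout are illustrative choices.

```python
# Cell values read off the cross-tabulation: (product, store) -> sales.
# Treating missing cells as zero is an assumption made for this sketch.
cells = {
    (1, 1): 454, (1, 2): 468, (1, 3): 296, (1, 4): 652,
    (2, 2): 800,
    (3, 3): 240, (3, 4): 540,
    (4, 1): 925, (4, 4): 745,
}

def pivot(cells):
    """Build a product-by-store cross-tab with ALL margins for rows and columns."""
    products = sorted({p for p, _ in cells})
    stores = sorted({s for _, s in cells})
    table = {}
    for p in products:
        row = {s: cells.get((p, s), 0) for s in stores}
        row["ALL"] = sum(row.values())          # row total
        table[p] = row
    # Column totals, including the grand total under "ALL".
    table["ALL"] = {c: sum(table[p][c] for p in products) for c in stores + ["ALL"]}
    return table

table = pivot(cells)
print(table["ALL"])
```

The ALL row produced this way matches the column totals on the slide (1379, 1268, 536, 1937, and 5120 overall).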
[Figure: star schema]
SALES (fact table): time_key, product_key, location_key; measures: units_sold, amount
PRODUCT (dimension table)
LOCATION (dimension table): location_key, store, street_address, city, state, country, region
[Figure: the star schema (SALES, PRODUCT, LOCATION), annotated with a projection on the PRODUCT category (π category) and a selection on LOCATION (σ region = Europe)]
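A star join of this kind can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the table contents, the `name` column, and the reduced column lists are assumptions, while the key and measure names follow the schema shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, store TEXT, city TEXT, region TEXT);
CREATE TABLE sales    (time_key INTEGER, product_key INTEGER, location_key INTEGER,
                       units_sold INTEGER, amount REAL);
-- Hypothetical rows, for illustration only.
INSERT INTO product  VALUES (1, 'p1', 'food'), (2, 'p2', 'toys');
INSERT INTO location VALUES (1, 's1', 'Athens', 'Europe'), (2, 's2', 'Boston', 'America');
INSERT INTO sales    VALUES (1, 1, 1, 2, 10.0), (1, 2, 1, 1, 5.0),
                            (2, 1, 1, 3, 15.0), (2, 1, 2, 4, 20.0);
""")

# Join the fact table to both dimensions, keep only European sales,
# and aggregate the amount measure per product category.
rows = conn.execute("""
    SELECT p.category, SUM(s.amount)
    FROM sales s
    JOIN product  p ON s.product_key  = p.product_key
    JOIN location l ON s.location_key = l.location_key
    WHERE l.region = 'Europe'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The fact table carries only keys and measures; every descriptive attribute used in the WHERE and GROUP BY clauses comes from a dimension table.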
Bitmap Index
Each value in the column has a bit vector:
  The i-th bit is set if the i-th row of the base table has the value for the indexed column
  The length of the bit vector: # of records in the base table
[Figure: bitmap index on the LOCATION table, rows with location_key L1 to L5]
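The definition above can be illustrated with Python integers used as bit vectors. This is a sketch under assumptions: the five region values are made up, and `build_bitmap_index` and `rows_of` are hypothetical helper names.

```python
def build_bitmap_index(column):
    """Return {value: int}, where bit i of the int is set iff row i holds that value."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index

def rows_of(bitmap):
    """Decode a bit vector into the list of row positions whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical region column of a five-row LOCATION table (rows L1..L5).
region = ["Africa", "America", "Asia", "Europe", "Europe"]
index = build_bitmap_index(region)

# A disjunctive predicate becomes a single bitwise OR of two vectors.
europe_or_asia = index["Europe"] | index["Asia"]
print(rows_of(europe_or_asia))
```

Row positions are zero-based here, so position 0 corresponds to L1.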
Join-Index
A join index relates the values of the dimensions of a star schema to rows in the fact table.
  a join index on region maintains, for each distinct region, a list of ROW-IDs of the tuples recording the sales in that region
Join indices can span multiple dimensions, OR
  can be implemented as bitmap indexes (per dimension)
  use bit-ops for multiple joins
[Figure: join index from LOCATION regions (Africa, America, Asia, Europe) to SALES row IDs (R102, R117, R118, R124, ...)]
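A per-dimension bitmap join index, combined with a bitwise AND for a two-dimension join, might look like the following sketch. All data, the category values, and the helper name `bitmap_join_index` are assumptions for illustration.

```python
def bitmap_join_index(fact_keys, dim_attr):
    """Map each dimension attribute value to a bit vector over fact-table rows."""
    index = {}
    for row_id, key in enumerate(fact_keys):
        value = dim_attr[key]                       # look up the dimension attribute
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

# Hypothetical dimension tables (key -> attribute) and fact rows (location, product).
location_region = {"L1": "Africa", "L2": "America", "L3": "Asia",
                   "L4": "Europe", "L5": "Europe"}
product_category = {"P1": "food", "P2": "toys"}
sales = [("L4", "P1"), ("L1", "P2"), ("L5", "P2"), ("L2", "P1"), ("L5", "P1")]

by_region = bitmap_join_index([loc for loc, _ in sales], location_region)
by_category = bitmap_join_index([prod for _, prod in sales], product_category)

# Multi-dimension join: AND the two per-dimension bit vectors.
hits = by_region["Europe"] & by_category["food"]
matching_rows = [i for i in range(len(sales)) if (hits >> i) & 1]
print(matching_rows)
```

The fact table is never scanned to evaluate the predicates; only the matching row positions need to be fetched afterwards.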
The join index will prune a fraction of the data (assuming uniform sales), but the remainder is still large (several million transactions)
Index is unclustered
Pre-computation is necessary
Cross-Tabulation (products/store)
Sales

Product   Store 1   Store 2   Store 3   Store 4     ALL
1             454       468       296       652    1870
2                       800                         800
3                                 240       540     780
4             925                           745    1670
SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
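To make the effect of the CUBE BY clause concrete, the sketch below emulates it in Python by aggregating over every subset of the grouping attributes. The input rows stand for the already joined SALES and LOCATION tuples and are hypothetical, as is the function name `cube`; rolled-up attributes are reported as 'ALL'.

```python
from itertools import combinations
from collections import defaultdict

def cube(rows, dims, measure):
    """Aggregate `measure` over every subset of `dims`; rolled-up dimensions show as 'ALL'."""
    result = defaultdict(int)
    for row in rows:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                result[key] += row[measure]
    return dict(result)

# Hypothetical joined rows (store comes from LOCATION, the rest from SALES).
rows = [
    {"store": 1, "product_key": 1, "amount": 454},
    {"store": 2, "product_key": 1, "amount": 468},
    {"store": 2, "product_key": 2, "amount": 800},
]
result = cube(rows, ["product_key", "store"], "amount")
for key in sorted(result, key=str):
    print(key, result[key])
```

With two grouping attributes the cube contains four group-bys: (product_key, store), (product_key), (store), and the grand total.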
[Figure: lattice of group-bys: (product, quarter), (store, quarter), (product, store); (quarter), (product), (store); none]
Computation Directives
Hash/sort based methods (Agrawal et al., VLDB '96)
  1. Smallest-parent
  2. Cache-results
  3. Amortize-scans
  4. Share-sorts
  5. Share-partitions
[Figure: lattice of group-bys: (product, store, quarter); (product, quarter), (store, quarter), (product, store); (quarter), (product), (store); none]
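Of the five directives, smallest-parent is the easiest to illustrate: each group-by is computed from the smallest already computed group-by one level above it in the lattice. The following is a simplified sketch of that single idea, not the method of the cited paper; the base data and the function names are assumptions.

```python
from itertools import combinations
from collections import defaultdict

def aggregate(table, src_dims, dst_dims):
    """Roll up a {key_tuple: total} table from src_dims to the subset dst_dims."""
    positions = [src_dims.index(d) for d in dst_dims]
    out = defaultdict(int)
    for key, total in table.items():
        out[tuple(key[p] for p in positions)] += total
    return dict(out)

def cube_smallest_parent(base, dims):
    """Compute all group-bys top-down, each from its smallest computed parent."""
    views = {tuple(dims): aggregate(base, dims, dims)}
    parent_of = {}
    for size in range(len(dims) - 1, -1, -1):
        for view in combinations(dims, size):
            # Parents are the views with exactly one more attribute.
            parents = [p for p in views if len(p) == size + 1 and set(view) <= set(p)]
            best = min(parents, key=lambda p: len(views[p]))
            views[view] = aggregate(views[best], list(best), list(view))
            parent_of[view] = best
    return views, parent_of

# Hypothetical base data keyed by (product, store, quarter).
base = {
    ("p1", "s1", "q1"): 10, ("p1", "s2", "q1"): 20, ("p1", "s3", "q1"): 30,
    ("p2", "s1", "q1"): 5,  ("p2", "s2", "q2"): 7,
}
views, parent_of = cube_smallest_parent(base, ["product", "store", "quarter"])
print(parent_of[("product",)], views[("product",)])
```

In this data the (product, quarter) view has fewer rows than (product, store), so the per-product totals are derived from the former.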
Compute aggregates in a multi-way fashion, visiting cube cells in the order that minimizes the number of times each cell is visited, which reduces memory access and storage cost.
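Benefit of materializing view v given the set S of already selected views:

$$B(v, S) = \sum_{u \,:\, u \preceq v,\; C_v(u) < C_S(u)} \bigl( C_S(u) - C_v(u) \bigr)$$

where $u \preceq v$ means that u can be computed from v, $C_S(u)$ is the cost of answering u using S, and $C_v(u)$ is the cost of answering u using v.

[Figure: lattice of views: (product, store), (product, quarter), (store, quarter); (quarter), (product), (store); none]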
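The benefit formula can be evaluated directly on the lattice. The sketch below assumes a linear cost model, in which answering a view from a materialized ancestor costs as many rows as that ancestor holds; this cost model, the view sizes, and the function names are assumptions for illustration.

```python
def is_answerable_from(u, v):
    """u can be computed from v when u's group-by attributes are a subset of v's."""
    return set(u) <= set(v)

def cost(u, materialized, size):
    """C_S(u): rows of the smallest view in S that can answer u (linear cost model)."""
    return min(size[w] for w in materialized if is_answerable_from(u, w))

def benefit(v, materialized, size):
    """B(v, S): total cost reduction over all views u that v can answer more cheaply."""
    total = 0
    for u in size:
        if is_answerable_from(u, v):
            c_s = cost(u, materialized, size)
            c_v = size[v]                 # C_v(u) under the linear cost model
            if c_v < c_s:
                total += c_s - c_v
    return total

# Hypothetical view sizes (number of rows) for the product/store/quarter lattice.
top = ("product", "store", "quarter")
size = {
    top: 100,
    ("product", "store"): 50, ("product", "quarter"): 75, ("store", "quarter"): 20,
    ("product",): 30, ("store",): 10, ("quarter",): 4, (): 1,
}
S = {top}                                 # only the top view is materialized so far
print(benefit(("product", "store"), S, size))
print(benefit(("store", "quarter"), S, size))
```

A selection procedure could compare such benefit values to decide which view to materialize next; with these sizes (store, quarter) is the more attractive candidate.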
Problem Generalization
Materialize and maintain the right subset of views with respect to the workload and the available resources
What is the workload?
  Farmers vs. Explorers [Inmon99]
  Pre-compiled queries (report generating tools, data mining)
  Ad-hoc analysis (unpredictable)
Alternatives: use query result caching techniques and reuse prior computations (WATCHMAN, DynaMat)
Exploit dependencies among the views to maintain the best subset of them within the given update window
[Figure: architecture with the components User, Query Interface, Aggregate Locator, View Pool, Admission Control, and DW base tables]
Space bound
Time bound
The End
Thank you!