Professional Documents
Culture Documents
Corinne Baragoin
Geetha Balasubramaniam
Bhuvana Chandrasekharan
Landon DelSordo
Jan B Lillelund
Julie Maw
Annie Neroda
Paulo Pereira
Jo A Ramos
ibm.com/redbooks
International Technical Support Organization
September 2003
SG24-7002-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page xxix.
This edition applies to IBM DB2 Universal Database V8.1 Fixpack 2+, IBM DB2 Cube Views
V8.1, IBM DB2 Office Connect Analytics Edition V4.0, IBM QMF For Windows V7.2f, Ascential
MetaStage V7.0, Meta Integration Model Bridge V3.1, IBM DB2 OLAP Server V8.1, Cognos
Series 7, BusinessObjects Enterprise 6, and MicroStrategy V7.2.3.
Note: We recommend that you consult the product documentation or follow-on versions of this
redbook for more current information.
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxvii
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
The team that wrote this redbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxxii
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
Comments welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
Chapter 4. Using the cube model for summary tables optimization . . . 125
4.1 Summary tables and optimization requirements . . . . . . . . . . . . . . . . . . . 126
4.2 How cube model influences summary tables and query performance . . 127
4.3 MQTs: a quick overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.3.1 MQTs in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.2 MQTs in DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.4 What you need to know before optimizing . . . . . . . . . . . . . . . . . . . . . . . 136
4.4.1 Get at least a cube model and one cube defined . . . . . . . . . . . . . . 136
6.3.2 Launch Excel and load Office Connect Add-in . . . . . . . . . . . . . . . . 229
6.3.3 Connect to OLAP-aware database (data source) in DB2 . . . . . . . . 230
6.3.4 Import cube metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
6.3.5 Bind data to Excel worksheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6.4 OLAP style operations in Office Connect . . . . . . . . . . . . . . . . . . . . . . . . 235
6.5 Saving and deleting reports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.6 Refreshing data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.7 Optimizing for better performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
6.7.1 Enable SQLDebug trace in Office Connect. . . . . . . . . . . . . . . . . . . 241
6.7.2 Use DB2 Explain to check if SQL is routed to the MQT . . . . . . . . . 243
6.7.3 Scenario demonstrating benefit of optimization . . . . . . . . . . . . . . . 244
Chapter 8. Using Ascential MetaStage and the DB2 Cube Views MetaBroker . . 271
8.1 Ascential MetaStage product overview . . . . . . . . . . . . . . . . . . . . . . . . . . 272
8.1.1 Managing metadata with MetaStage. . . . . . . . . . . . . . . . . . . . . . . . 276
8.2 Metadata flow scenarios with MetaStage . . . . . . . . . . . . . . . . . . . . . . . . 281
8.2.1 Importing ERwin dimensional metadata into DB2 Cube Views. . . . 281
8.2.2 Leveraging existing enterprise metadata with MetaStage . . . . . . . 289
8.2.3 Performing cross-tool impact analysis . . . . . . . . . . . . . . . . . . . . . . 295
8.2.4 Performing data lineage and process analysis in MetaStage . . . . . 308
8.3 Conclusion: benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
Chapter 9. Meta Integration of DB2 Cube Views within the enterprise toolset
Chapter 11. Accessing DB2 dimensional data using Cognos . . . . . . . . 483
11.1 The Cognos solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
11.1.1 Cognos Business Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
11.2 Architecture and components involved . . . . . . . . . . . . . . . . . . . . . . . . . 487
11.3 Implementation steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
11.4 Implementation considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
11.4.1 Optimizing drill through . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
11.4.2 Optimizing Impromptu reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
11.4.3 Implementation considerations: mappings . . . . . . . . . . . . . . . . . . 513
11.4.4 Enhancing the DB2 cube model . . . . . . . . . . . . . . . . . . . . . . . . . . 520
11.5 Cube model refresh considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
11.6 Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
11.6.1 Sales analysis scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
11.6.2 Financial analysis scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
11.6.3 Performance results with MQT . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
11.7 Conclusion: benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
Appendix C. FAQs, diagnostics, and tracing . . . . . . . . . . . . . . . . . . . . . . 669
Setup questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
Metadata questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
OLAP Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
Appendix E. The case study: retail datamart . . . . . . . . . . . . . . . . . . . . . . . . 685
The cube model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
The cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
Tables in the star schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
MQT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
6-5 Provide connection information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
6-6 Cube Import Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
6-7 Select cube(s) for import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6-8 Imported cube metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6-9 Export data to Microsoft Excel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6-10 Select Excel sheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6-11 View data in Excel spreadsheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
6-12 Show Pivot table field list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
6-13 Drag and drop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
6-14 Member selection or filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
6-15 PivotTable wizard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
6-16 Layout wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
6-17 PivotChart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
6-18 SQLDebug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
6-19 Access plan graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6-20 Customized report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
6-21 Access plan graph - STORE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
6-22 Access plan graph - PRODUCT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
7-1 Components required for QMF for Windows with DB2 Cube Views . . 250
7-2 New object window for QMF for Windows . . . . . . . . . . . . . . . . . . . . . . 251
7-3 List of queries saved at the server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7-4 OLAP Query wizard server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
7-5 OLAP Query wizard sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7-6 OLAP Query wizard cube schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7-7 OLAP Query wizard cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
7-8 View of the cube in Object Explorer . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
7-9 Hierarchy levels in Object Explorer . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
7-10 Default Layout Designer toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
7-11 Layout Designer without enable online mode option . . . . . . . . . . . . . . 256
7-12 Default filter window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
7-13 Filter window with options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
7-14 Formatting options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
7-15 Drill down operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
7-16 Drill up operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7-17 Roll up operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7-18 Slices of the product dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
7-19 Portion of CONSUMER table from a relational view . . . . . . . . . . . . . . 263
7-20 Sales cube example in DB2 OLAP Center . . . . . . . . . . . . . . . . . . . . . 265
7-21 OLAP report 1: most profitable consumer groups in the West region . 266
7-22 OLAP report 2: most profitable sales . . . . . . . . . . . . . . . . . . . . . . . . . . 267
7-23 OLAP report 3: consumer buying trends . . . . . . . . . . . . . . . . . . . . . . . 268
7-24 Resource Limits Group in QMF for Windows Administrator. . . . . . . . . 269
8-1 Ascential Enterprise Integration Suite . . . . . . . . . . . . . . . . . . . . . . . . . 272
8-45 DataStage Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
8-46 DataStage run results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
8-47 RunImport output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
8-48 MetaStage category browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
8-49 Data lineage menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8-50 Data lineage path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8-51 MetaStage category browser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
8-52 Browse from menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
8-53 Process analysis menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
8-54 Process analysis path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
8-55 Inspect event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
9-1 A sample of typical metadata movement solutions . . . . . . . . . . . . . . . 330
9-2 Meta Integration functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
9-3 Meta Integration architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
9-4 Meta Integration supported tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
9-5 A metadata integration solution example . . . . . . . . . . . . . . . . . . . . . . . 334
9-6 Business cases for metadata movement solutions . . . . . . . . . . . . . . . 335
9-7 Possible metadata movement solutions for DB2 Cube Views . . . . . . . 336
9-8 Metadata movement scenarios illustrated in this chapter . . . . . . . . . . 336
9-9 The cube model used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
9-10 Logical view of the ERwinv4 model . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
9-11 Enabling the ERwin v4 dimensional features. . . . . . . . . . . . . . . . . . . . 341
9-12 Specifying the table dimensional roles. . . . . . . . . . . . . . . . . . . . . . . . . 342
9-13 DB2 schema generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
9-14 Importing the ERwin v4 model into MIMB . . . . . . . . . . . . . . . . . . . . . . 345
9-15 Specifying the export bridge parameters . . . . . . . . . . . . . . . . . . . . . . . 345
9-16 Exporting the model to DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . 346
9-17 Specifying the XML file to import into OLAP Center . . . . . . . . . . . . . . 347
9-18 Controlling how the metadata is imported into OLAP Center . . . . . . . 347
9-19 The ERwin v4 business names and description are also converted . . 348
9-20 Exporting from the DB2 cube model as XML . . . . . . . . . . . . . . . . . . . . 349
9-21 Converting the cube model XML file to an ERwin v4 XML file . . . . . . . 350
9-22 Cube model converted to ERwin v4 with business names . . . . . . . . . 351
9-23 Logical view of the ERWin model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
9-24 Logical view of the ERwin model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
9-25 Enabling the ERwin dimensional features . . . . . . . . . . . . . . . . . . . . . . 355
9-26 Specifying the tables dimensional roles . . . . . . . . . . . . . . . . . . . . . . . . 356
9-27 Saving the model as ERX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
9-28 ERwin names expansion feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
9-29 Importing the ERwin model into MIMB. . . . . . . . . . . . . . . . . . . . . . . . . 358
9-30 Specifying the export bridge parameters . . . . . . . . . . . . . . . . . . . . . . . 359
9-31 Exporting the model to DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . 360
9-32 Specifying the XML file to import into OLAP Center . . . . . . . . . . . . . . 360
9-76 This is the fact table of the star schema . . . . . . . . . . . . . . . . . . . . . . . 401
9-77 Starting the CWM export wizard from DB2 Data Warehouse Center . 402
9-78 Selecting the database to be exported to CWM . . . . . . . . . . . . . . . . . 402
9-79 The CWM XMI file rendered in a browser . . . . . . . . . . . . . . . . . . . . . . 403
9-80 MIMB: importing the DB2 Data Warehouse Center CWM XMI file . . . 404
9-81 The sample warehouse Beverage Company imported from CWM . . . 404
9-82 Specifying the export parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
9-83 Choosing a subsetting mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
9-84 Subsetting the star schema model. . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
9-85 Exporting the cube model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
9-86 The Beverage Company star schema imported into DB2 Cube Views 408
9-87 The sample Informatica XML model . . . . . . . . . . . . . . . . . . . . . . . . . . 409
9-88 Importing the Informatica model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
9-89 Specifying the export parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
9-90 Exporting the model to DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . 411
9-91 The cube model as imported in DB2 OLAP Center . . . . . . . . . . . . . . . 412
10-1 OLAP Server architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
10-2 Integration Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
10-3 Metadata flow through the Integration Server bridge . . . . . . . . . . . . . . 424
10-4 Reverse metadata flow through the Integration Server bridge . . . . . . 425
10-5 Export from DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
10-6 Integration Server Bridge window . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
10-7 The IS bridge from DB2 Cube Views to Integration Server . . . . . . . . . 430
10-8 Use of bridge from DB2 Cube Views to Integration Server . . . . . . . . . 431
10-9 Import model into Integration Server . . . . . . . . . . . . . . . . . . . . . . . . . . 432
10-10 Import metaoutline into Integration Server . . . . . . . . . . . . . . . . . . . . . . 433
10-11 Result of import into Integration Server model. . . . . . . . . . . . . . . . . . . 434
10-12 Integration Server column renaming in metadata . . . . . . . . . . . . . . . . 436
10-13 Integration Server column properties . . . . . . . . . . . . . . . . . . . . . . . . . . 437
10-14 Integration Server measure hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . 437
10-15 Add back missing columns in Integration Server . . . . . . . . . . . . . . . . . 439
10-16 Integration Server export . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
10-17 The IS bridge from Integration Server to DB2 Cube Views . . . . . . . . . 441
10-18 Import wizard screen 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
10-19 Import wizard screen 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
10-20 DB2 Cube Views cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
10-21 Integration Server metaoutline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
10-22 The measure in DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
10-23 The measure in Integration Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
10-24 MQT script FROM clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
10-25 MQT script GROUP BY clause . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
10-26 Integration Server data load SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456
10-27 Load explain with MQT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
11-25 Query properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
11-26 Data definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
11-27 Results on aggregate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
11-28 Calculated measure in DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . 513
11-29 Calculated measure in Impromptu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
11-30 Alternate hierarchies in DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . 515
11-31 Alternate hierarchies in PowerPlay Transformer . . . . . . . . . . . . . . . . . 516
11-32 Reproduce the alternate hierarchies in DB2 Cube Views . . . . . . . . . . 517
11-33 Impromptu query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
11-34 Transformer: adding alternate hierarchy . . . . . . . . . . . . . . . . . . . . . . . . 519
11-35 Create alternate drill down . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
11-36 Transformer Model relative time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
11-37 Transformer model Day of Week . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
11-38 PowerPlay seasonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
11-39 PowerPlay seasonality: another example . . . . . . . . . . . . . . . . . . . . . . 522
11-40 Transformer Model measure formatting. . . . . . . . . . . . . . . . . . . . . . . . 523
11-41 Scenario 1: report example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
11-42 Scenario 1: report example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
11-43 Scenario 1: report example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
11-44 Scenario 1: report example 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
11-45 Scenario 1: report example 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
11-46 Scenario 1: report example 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
11-47 Scenario 1: report example 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
11-48 Scenario 1: report example 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
11-49 Scenario 1: report example 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
11-50 Scenario 2: report example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
11-51 Scenario 2: report example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
11-52 Scenario 2: report example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
11-53 Scenario 3: add a new calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
11-54 Scenario 3: the graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
11-55 Financial scenario: report example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 534
11-56 Financial scenario: report example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 535
11-57 Financial scenario: report example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 535
11-58 Forecasting scenario: Forecast option . . . . . . . . . . . . . . . . . . . . . . . . . 536
11-59 Forecasting scenario: result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
11-60 Create a mobile sub-cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
11-61 Open the PowerPlay sub-cube saved . . . . . . . . . . . . . . . . . . . . . . . . . 539
12-1 BusinessObjects Enterprise 6 product family . . . . . . . . . . . . . . . . . . . 544
12-2 Metadata flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 547
12-3 Metadata mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
12-4 Additional metadata mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
12-5 Cube model to universes mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 550
12-6 Hierarchies mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
13-10 Drilling to campaign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
13-11 Question 2: report grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
13-12 DB2 explain for question 2: without MQT . . . . . . . . . . . . . . . . . . . . . . 606
13-13 DB2 explain for question 2: with MQT . . . . . . . . . . . . . . . . . . . . . . . . . 607
13-14 Question 3: report grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
13-15 Question 4: report grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
13-16 Question 5: report grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
14-1 Web services layered architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
14-2 Web services for DB2 Cube Views Architecture . . . . . . . . . . . . . . . . . 620
14-3 Web services for DB2 Cube Views . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
14-4 Sales cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
14-5 XML Representation of Sales Cube (Part 1 of 2). . . . . . . . . . . . . . . . . 623
14-6 XML Representation of Sales Cube (Part 2 of 2). . . . . . . . . . . . . . . . . 624
14-7 Metadata for STORE dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
14-8 Dimensions in Sales cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
14-9 XML Representation of Dimension members in Sales Cube (1 of 2) . 627
14-10 XML Representation of Dimension members in Sales Cube (2 of 2) . 628
14-11 Dimension Members - STORE dimension . . . . . . . . . . . . . . . . . . . . . . 630
14-12 Top level members in DATE dimension. . . . . . . . . . . . . . . . . . . . . . . . 631
14-13 Slice of Sales Cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
A-1 DataStage Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
A-2 Project properties dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642
A-3 Services manager under Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
A-4 Directory Administrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
A-5 Select data source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
A-6 MetaStage attach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
A-7 ERwin import category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
A-8 New import category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
A-9 New import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
A-10 Import selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
A-11 ERwin import parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
A-12 ERwin saved as XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
A-13 DB2 Configuration Assistant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
A-14 New user-defined category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 653
A-15 ERwin Sales model User Category . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
A-16 Add Selection to Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
A-17 Select Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
A-18 Request Publication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
A-19 MetaStage Subscribe. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
A-20 New subscription . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
A-21 Subscription options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
A-22 DataStage MetaBroker subscription parameters . . . . . . . . . . . . . . . . . 657
A-23 DataStage client login . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES
THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy,
modify, and distribute these sample programs in any form without payment to IBM for the purposes of
developing, using, marketing, or distributing application programs conforming to IBM's application
programming interfaces.
The following terms are trademarks of International Business Machines Corporation and Rational Software
Corporation, in the United States, other countries or both:
Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other
countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure
Electronic Transaction LLC.
Other company, product, and service names may be trademarks or service marks of others.
Multidimensionality is the primary requirement for an OLAP system, and the
cube refers to the collection of data that an OLAP system implements.
Business Intelligence and OLAP systems are no longer limited to a privileged
few business analysts: they are being democratized, shared with rank-and-file
employees who demand Relational Database Management Systems (RDBMS)
that are more OLAP-aware.
IBM DB2® Cube Views V8.1 (called DB2 Cube Views throughout this redbook)
and its cube model give DB2 Universal Database™ (DB2 throughout this
redbook) the ability to address multidimensional analysis and become a key
player in the OLAP world.
This redbook also documents, within Part 3, some front-end tools and metadata
bridges to DB2 Cube Views provided by IBM and different business partners
through their own products. The business partners’ metadata bridge chapters,
also delivered as Redpapers, are:
MetaStage® metadata bridge from Ascential™ (REDP3712)
Universal metadata bridge from Business Objects (REDP3711)
Cognos metadata bridge from Cognos, Inc. (REDP3713)
QMF™ for Windows® front-end tool from IBM and Rocket Software (REDP3702)
MetaIntegration metadata bridge from MetaIntegration Technologies, Inc.
(REDP3714)
MicroStrategy metadata bridge from MicroStrategy, Inc. (REDP3715)
Landon DelSordo is a Certified Senior IT Specialist in the US. She has over 25
years experience in IT. She has a degree in mathematics from the College of
William and Mary. Her areas of expertise include business intelligence, OLAP
and large data warehouses. She began working with DB2 in 1982 prior to the
availability of Version 1 on MVS™.
Julie Maw is a Senior IT specialist in the United Kingdom. She has 19 years of
experience in IBM, mostly working with iSeries™ customers. She has been
working in the business intelligence field for six years, initially as a member of
the Business Intelligence Offering Team within IBM Global Services. She is
currently a member of the EMEA Business Intelligence Technical Sales team
specializing in DB2 OLAP Server.
Thanks to the following business partners and IBMers who came onsite in San
Jose for one week to test and document their metadata bridges from/to DB2
Cube Views:
Nina Sandy and Patrick Spedding, Cognos, Inc. on Cognos metadata bridge
Thanks to the following people for their help in planning and preparing this
project and their involvement and input all along the project:
Nathan Colossi
Daniel De Kimpe
John Poelman
Gary Robinson
Christopher Yao
IBM Silicon Valley Lab
Thanks to the following people for their input and contributions during the project:
William Sterling
IBM WW Technical Sales Support
Richard Sawa
Hyperion
Mike Alcorn
Upendra Chitnis
Jason Dere
Bruno Fischel
Suzanna Khatchatrian
Jeff Gibson
Gregor Meyer
Benjamin Nguyen
Joyce Taylor
Craig Tomlyn
Tamuyen Phung
Jennifer Xia
IBM Silicon Valley Lab
John Medicke
Stephen Rutledge
IBM Software Solutions and Strategy Division
Wenbin Ma
Calisto Zuzarte
IBM Toronto Lab
Matt Kelley
Rocket Software
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you'll develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
Part 1
The multidimensional model in DB2 Cube Views simply reflects the physical
layout of the tables.
[Figure 1-2: Hierarchies of the four dimensions — All Time > Year > Quarter > Month > Day; All Stores > Store Country > Store Region > Store State > Store City > Store Name; All Customers > Customer Country > Customer Region > Customer State > Customer City > Customer Name; All Products > Product Group > Product Line > Product Name]
In Figure 1-2 we see four dimensions: Time, Store, Customer, and Product. The
fact table itself is not represented in this diagram. The solid line and the dashed
line represent two different slices of the data. The solid line query is a slice
across the database for months, store cities, all customers, at the product line
level for one or more business metrics in the fact table. For example, the query
could be for sales in May of a specific product line for all customers consolidated
by store city. The dashed line represents a slice at the month, all-stores,
customer state, and product name levels.
OLAP implementations
The term OLAP is a general term that encompasses a number of different
technologies that have been developed to implement an OLAP database. The
most common server implementations that are available currently are MOLAP,
ROLAP, and HOLAP.
HOLAP stands for Hybrid OLAP and, as the name implies, is a hybrid of ROLAP
and MOLAP. In a MOLAP database the data is mostly pre-calculated which has
the advantage that it offers very fast query response time, but the disadvantages
include the time taken to calculate the database and the space required to hold
these pre-calculated values. There is therefore a practical limit on the size of a
MOLAP database. In a ROLAP database the performance of the queries will be
largely governed by the complexity of the SQL and the number and size of the
tables being joined in the query. However, within these constraints, a ROLAP
solution is generally a more scalable solution.
From a user perspective the line between what is stored in MOLAP and what is
stored in relational should be seamless. A HOLAP environment therefore
attempts to combine the benefits of both MOLAP and ROLAP technologies. By
storing the lower levels of a dimension in relational instead of in MOLAP, the
MOLAP database size is reduced and therefore the time required to perform the
pre-calculation of the aggregate data is reduced. Queries that request data from
the MOLAP section of the database will benefit from the fast performance that is
expected from having pre-calculated data. Moreover, by storing the lower levels
of the database in relational, the database as a whole (MOLAP and relational
combined) can be extended to take advantage of the scalability benefits in the
relational database.
The above terms are used to refer to server based OLAP technologies. There is
also another acronym, DOLAP, which refers to Desktop OLAP. DOLAP enables
users to quickly pull together small cubes that run on their desktops or laptops.
All of these OLAP technologies can be implemented using IBM DB2’s family of
products. MOLAP is provided with IBM DB2 OLAP Server, which is an OEM version
of Hyperion Essbase, and is separate from the DB2 database engine. HOLAP is
available using both IBM DB2 OLAP Server and IBM DB2 itself. ROLAP and
DOLAP are available with DB2, and various front-end tools provide a dimensional
representation to the end user of the ROLAP database using terminology with
which the business analysts are familiar.
All styles of OLAP need the same basic metadata: the cube model and a
mapping of that model to the relational source. A common factor in most of these
scenarios is that at some point, a query against the relational database is going
to be generated. However, until now the relational database, unlike the other
tools described above, has had no metadata at all describing the nature of the
OLAP structure within the database.
This does not mean however that this is yet another isolated store of metadata.
To develop DB2 Cube Views, IBM involved its business partners with the result
that these partners have developed associations between their products and
DB2 Cube Views. Most Business Intelligence tools are able to exchange
metadata between their products and DB2 Cube Views via a bridge. For
example, a ROLAP front end tool requires metadata about the OLAP structure
within the underlying database. DB2 Cube Views has that information and is able
to share that information by passing the metadata across the bridge to the front
end tool. Some products, namely QMF for Windows and Office Connect to date,
have been enhanced such that they access the DB2 Cube Views metadata
directly. DB2 Cube Views thus not only shares metadata with all partner tools,
but also lets you define the OLAP model and its mapping and export that
metadata. The OLAP model and mappings are defined once in DB2 and then, to
avoid replicating effort, exported to each tool.
Now that DB2 UDB V8.1 is OLAP-aware it is able to make use of the metadata it
has available to it to optimize the database for OLAP. As we have discussed, an
OLAP database is a subject area specific database that holds business metrics
and dimensions. Any business metric can be queried according to numerous
slices of the database as governed by the hierarchies that are available within the
dimensions. A query that is based on the lowest level available in each dimension
is a query based at the level of granularity of the fact table itself. Often, however,
a query is going to be expressed at a higher level in one or more of the
dimensions and as such represents an aggregate or summary of the base data.
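For instance, here is a minimal sketch of the difference (the table and column names are illustrative only, not taken from this redbook's case study):

-- Base-grain query: grouped at the lowest level of every
-- dimension, that is, at the granularity of the fact table
SELECT f.day_id, f.store_id, f.product_id,
       SUM(f.sales) AS sales
FROM sales_fact f
GROUP BY f.day_id, f.store_id, f.product_id;

-- Higher-level query: an aggregate (summary) of the base data,
-- expressed at the month and store-region levels
SELECT t.month, s.region, SUM(f.sales) AS sales
FROM sales_fact f
JOIN time_dim t ON t.day_id = f.day_id
JOIN store_dim s ON s.store_id = f.store_id
GROUP BY t.month, s.region;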
[Figure 1-3: Examples of MQT benefits in an OLAP environment — DB2 OLAP Server data loading, cube build, and hybrid analysis; QMF for Windows drill through; query results returned as rows and columns]
Figure 1-3 illustrates some examples of how the presence of MQTs can aid
performance in an OLAP environment. At the top of the figure we see DB2 OLAP
Server. Where an aggregate of the base data is required to load into DB2 OLAP
Server, then the use of an MQT should improve the performance of extract for the
data load. More critically perhaps, the use of an MQT may significantly improve
the performance of the relational queries that are generated in a hybrid analysis
environment.
The challenge here is that at each part of the implementation where metadata is
required, it is necessary to perform steps to re-create the metadata. This process
of losing information and rediscovering it is expensive and error prone. Nor is this
a one time problem. As the schema changes, all tools and applications will have
to be updated.
OLAP models are rarely this simple; they usually include more dimensions and
require additional storage for aggregates such as months, quarters, years,
product groups, and sales regions. The space and time required to build such
aggregates can make aggregation impractical when it amounts to
preaggregating everything.
Having decided which summary tables to create, the challenge is then one of
managing those summary tables. Summary tables occupy space and take time
to refresh. The DBA will need to determine a balance between creating more
summary tables and operating within the space and time limitations that exist in
their particular environment. Moreover there is also a balance to be struck
between creating more summary tables and overloading the optimizer.
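As a hedged illustration of what such a summary table looks like in DB2 (the names are ours, not the case study's), a deferred-refresh MQT is created once and then re-synchronized after each load of the base tables:

-- A deferred-refresh summary table (MQT): it occupies disk space
-- and must be refreshed when the base star schema changes
CREATE TABLE sales_by_month_region AS
  (SELECT t.month, s.region,
          COUNT(*) AS row_cnt,
          SUM(f.sales) AS sales
   FROM sales_fact f
   JOIN time_dim t ON t.day_id = f.day_id
   JOIN store_dim s ON s.store_id = f.store_id
   GROUP BY t.month, s.region)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT now, and again after each base-table load
REFRESH TABLE sales_by_month_region;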
Using the same example, the DBA would have to generate the metadata to
describe the relational source data and the target MOLAP database structure,
and then generate the scripts required to load the data. This demands a full
understanding of the source and target databases and of any transformations
that are required.
Whatever method is available to the DBA, having specified how the data should
be loaded into any MOLAP database, the next step is to optimize that data load.
The DBA will need to determine the indexes that should be built and may well
need to analyze the query to determine how best to improve its performance.
A DBA needs to understand the dimensional model, the nature of the data, and
the access patterns. Cost/benefit calculations that consider the cost to build, the
space the aggregates will consume, and the benefit they will yield may help. The
cost/benefit analysis will help determine which slices of the multidimensional
model will be pre-aggregated and stored and which will be computed on
demand. Some incoming queries will directly correspond to pre-aggregated
stored values; others can be quickly derived from existing partial aggregates. In
both cases, faster queries result. However, getting to the point where the DBA is
confident in their choice of aggregate tables takes time.
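One simple input to that cost/benefit analysis can be computed directly, as in this sketch (illustrative names again): comparing the row count of a candidate aggregate against the fact table indicates how much space the aggregate would consume and how much work it could save at query time.

-- Row count of the base fact table
SELECT COUNT(*) AS fact_rows FROM sales_fact;

-- Row count of a candidate month-by-region aggregate
SELECT COUNT(*) AS aggregate_rows
FROM (SELECT DISTINCT t.month, s.region
      FROM sales_fact f
      JOIN time_dim t ON t.day_id = f.day_id
      JOIN store_dim s ON s.store_id = f.store_id) AS q;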
Through the use of bridges, DB2 Cube Views is able to share its OLAP metadata
(import/export) with other partner Business Intelligence tools, offering users of
those tools a fast start option and assisting in reducing the maintenance involved
in changing metadata that may be stored repeatedly in different formats in
different products.
DB2 Cube Views is available as part of both DB2 UDB V8.1 Data Warehouse
Standard Edition and DB2 UDB V8.1 Data Warehouse Enterprise Edition, in
addition to being available separately.
The DB2 Cube Views metadata model takes a layered approach, an overview of
which is shown in Figure 1-4.
[Figure 1-4: The layered metadata model — cube objects (cube model, cube dimension, cube hierarchy, cube facts, cube measure) map to OLAP objects (dimension, hierarchy, facts, measure, attribute, attribute relationship, join), which in turn map to the relational metadata]
Figure 1-4 demonstrates how the cube metadata, shown in the top part of the
diagram, maps to the relational table constructs in DB2 UDB V8.1, shown in the
bottom part of the diagram.
The cube metadata defines two major structures, the cube model and the cube:
The cube model can be compared to a conceptual OLAP database. The cube
model can be constructed in many ways. It maps OLAP metadata objects to
the relational structures in DB2 UDB V8.1. The metadata objects that make up
a cube model include dimensions, hierarchies, facts, measures, attributes,
attribute relationships, and joins.
Some query tools are able to connect directly to the DB2 Cube Views metadata
via the DB2 Cube Views API and provide the end user with the cube definition
that they require in order to navigate the cube and query the data. Other tools will
make use of the DB2 Cube Views metadata via a bridge as is discussed in
“Metadata bridges to back-end and front-end tools” on page 19.
The user interface to the DB2 Cube Views metadata is via a client workstation
graphical user interface called OLAP Center. OLAP Center is a Java™ based
utility that uses available DB2 UDB V8.1 common classes and maintains the
same look and feel as the other DB2 GUI tools. OLAP Center can launch and can
be launched by other DB2 UDB V8.1 tools. The architecture of OLAP Center
is depicted in Figure 3-17 on page 83.
The Optimization Advisor takes as its input the metadata and the values that
are entered in OLAP Center (disk space limit, time limit, and MQT maintenance
options), together with database statistics and, optionally, sampled data.
The Optimization Advisor may also make use of the recently introduced
super-aggregate operators in order to create MQTs that can potentially be used
by a greater number of queries.
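DB2's super-aggregate operators include ROLLUP, CUBE, and GROUPING SETS. As a hedged sketch (illustrative names, not the product's own output), a single MQT built with GROUPING SETS can serve queries at several combinations of levels that would otherwise each need their own summary table:

-- One "super-MQT" covering three slices at once; MQTs that use
-- super-aggregates must be REFRESH DEFERRED
CREATE TABLE sales_super_mqt AS
  (SELECT t.year, t.month, s.region, s.state,
          COUNT(*) AS row_cnt,
          SUM(f.sales) AS sales
   FROM sales_fact f
   JOIN time_dim t ON t.day_id = f.day_id
   JOIN store_dim s ON s.store_id = f.store_id
   GROUP BY GROUPING SETS ((t.year, t.month, s.region, s.state),
                           (t.year, s.region),
                           (t.year)))
DATA INITIALLY DEFERRED REFRESH DEFERRED;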
As the Optimization Advisor is optimizing at the cube model level, and is able to
take advantage of recently developed assists within DB2 UDB V8.1, it is in a
good position to meet its objective of maximizing efficiency by being able to
determine a smaller number of MQTs than might otherwise have been
determined manually. This is illustrated in Figure 1-5.
[Figure 1-5: The many MQTs that would have been built manually are optimized by the wizard into one "super-MQT"]
1.4.3 Interfaces
The interface options that are available with release one of DB2 Cube Views
have already been discussed in earlier sections of this chapter. The purpose of
this subsection is to summarize those interface options, and also to introduce an
additional interface that is not actually part of release one of the product, but is
available as a Technology Preview.
The interface options that are available in release one of the product to access
DB2 Cube Views metadata are listed in Table 1-1.
Direct access to metadata through DB2 Cube Views API — IBM QMF for Windows 7.2f, Office Connect 4.0 Analytic Edition
Note: Cognos accesses DB2 Cube Views metadata using both bridge and
API.
Figure 1-6 illustrates some possible future application scenarios using DB2 Cube
Views Web services.
[Figure 1-6: Mobile clients, customer data applications, and company portals exchanging XML with DB2 Cube Views Web services over SOAP/HTTPS across the Internet or an intranet]
The intention for DB2 Cube Views Web services is that it will provide access for
Web services developers to OLAP analytical data. It is not the intention for DB2
Cube Views Web services to become a new slice, dice, and drill interface, but
more that DB2 Cube Views Web services will allow developers to quickly find
sources of dimensional information for their applications, determine the slices
they need, and retrieve the data using an XPath-based execute method. Without
learning OLAP interfaces and query languages, Web services developers will be
able to call on their existing knowledge of XML and XPath to add analytic
information to their applications.
DB2 Cube Views Web services is available from the IBM alphaWorks® Web
site:
http://www.alphaworks.ibm.com.
Some tools will push metadata into DB2 Cube Views, some will pull metadata
from DB2 Cube Views into their own tool metadata structure, and some will offer
a two-way bridge which can both push and pull metadata to and from DB2 Cube
Views. This is illustrated in Figure 1-7. Typically design and ETL tools will be
pushing metadata into DB2 Cube Views; and query and reporting and OLAP
tools will be pulling metadata from DB2 Cube Views.
[Figure 1-7: Bridges exchange XML metadata between tools, the DB2 Cube Views stored procedure, and MOLAP engines]
Using this simple XML-based interface, instead of working directly against the
new DB2 Cube Views catalog tables, protects developers of these bridges from
changes to the underlying tables. For further information on front-end tools and
metadata bridges, please refer to Part 3, on page 219.
In each scenario, we refer to the back-end tools and front-end tools in a generic
way rather than naming any specific products.
In addition, they will most likely be using one of today’s premier data delivery
platforms as a front-end for the database because it provides ease of use and
because it works so well when coupled with a star schema database. To
integrate your front-end tool, the star schema that you have built as tables,
columns, primary keys, and foreign keys will need to be mapped to the tool as a
collection of OLAP objects like measures, derivations, dimensions, hierarchies,
attributes and joins. DB2 Cube Views gives you a new GUI called the OLAP
Center where you can map these OLAP objects directly to your relational objects
and hold these mappings in DB2, as shown in Figure 2-2.
Using the OLAP Center, you can pinpoint the columns in the fact table that
actually contain the measures and capture formulas for deriving additional
measures that are not physically stored in the star. Further, you can describe the
dimensions and their various hierarchies, even multiple hierarchies if that applies.
You can also indicate the proper joins to use when accessing the star. Once you
have these OLAP objects described, you can group them into cubes, even into
multiple cubes, each of which represents a subset of your full cube model based
on the star schema. If you have already captured this information in a back-end
data modeling or ETL (Extract, Transform, Load) tool, you can skip the data entry
and just import the metadata directly via a metadata bridge.
Once the OLAP metadata is stored in DB2 Cube Views, you can use another
metadata bridge to send it over to your favorite front-end data delivery tool,
automatically, to populate its metadata layer. This way, if a different person is
responsible for the database from the one who is responsible for the data
delivery tool, then the metadata layer will be consistent. Also, if you will be using
multiple tools, the metadata only needs to be captured once, in DB2 Cube Views,
and then shared with all the other tools in your solution. Figure 2-3 below
illustrates this metadata transfer.
[Figure 2-3: Metadata transfer between OLAP objects and relational objects]
Once the metadata layer in your reporting tool has been populated, the tool will
soon be sending SQL queries to your star schema. If the SQL requires
aggregation and joins, and it probably does, the user’s response time could
possibly be slow. That is a problem.
But let us say you have a good DBA who knows what to do. He pre-builds an
aggregate table and adds it to the database where your star schema is located.
The really nice thing about pre-built aggregates in DB2 is that the tool writing the
SQL doesn’t have to know about them. The DB2 optimizer will automatically use
them if the query matches up to them well enough. This makes for very much
faster query response times. Figure 2-4 shows a query being satisfied by a
pre-built aggregate.
[Figure 2-4: A query's SQL routed by the DB2 optimizer to a pre-built aggregate]
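To make that transparency concrete, here is a minimal sketch (reusing the illustrative names from earlier, which are not from this redbook): the application's SQL mentions only the base star schema, yet DB2 can answer it from the aggregate.

-- Allow the optimizer to consider deferred-refresh MQTs
SET CURRENT REFRESH AGE ANY;

-- The tool's query, written against the base tables only; if it
-- matches a pre-built aggregate closely enough, the DB2 optimizer
-- rewrites it to read from the MQT instead
SELECT t.month, s.region, SUM(f.sales) AS sales
FROM sales_fact f
JOIN time_dim t ON t.day_id = f.day_id
JOIN store_dim s ON s.store_id = f.store_id
GROUP BY t.month, s.region;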
The not-so-nice thing about pre-built aggregates is that the optimizer might not
choose to use them every time if the SQL doesn’t quite match up. In that case,
your DBA may have wasted his time building the wrong aggregates. Perhaps he
could solve this problem by building more aggregates, maybe even one for every
possible situation. The trouble with that approach is he might end up using as
much disk space on aggregates as he did on the star schema itself, not to
mention the time he’ll have to spend designing the aggregates and refreshing
them with data periodically. DB2 Cube Views can help: it can build the ideal set
of aggregates, or MQTs, for him the first time, finding the best compromise
among space, time, and query performance.
In Figure 2-5, you can see the DB2 Cube Views Optimization Advisor, a very
smart expert system on performance that is going to ask your DBA a few
questions before it gets to work on building the aggregates. Questions like these:
1. What kinds of queries do you plan to use against this star schema?
[Figure 2-5: The Optimization Advisor's inputs — query types, disk space, the cube model, statistics, and sample data — feeding recommendations toward the DB2 catalog and optimizer]
Now, the Optimization Advisor has what it needs to recommend one or more
aggregates for your database. In Figure 2-6 you can see that it has generated an
aggregate table, in some ways similar to the aggregates your DBA might have
built by himself, but it is probably much more than that. By using very
sophisticated rules and techniques, the aggregates recommended by the
Optimization Advisor will very likely be super aggregates with multiple
aggregations across multiple combinations of hierarchical levels of multiple
dimensions defined within the cube model. In a way, some aggregate tables
become a little bit like cubes, but not complete ones because of the space
restrictions placed on them by your DBA and by the Optimization Advisor itself. Best
of all, the aggregates will be recommended in such a way that they are highly
likely to be chosen by the DB2 optimizer at query time.
[Figure 2-6: The Optimization Advisor's questions answered, with the recommended aggregates registered in the DB2 catalog for use by the optimizer]
Now, let’s gain a deeper understanding of the benefits of DB2 Cube Views by
examining a series of scenarios one by one:
Feeding metadata into DB2 Cube Views
Feeding front-end tools from DB2 Cube Views
– Supporting Multidimensional OLAP (MOLAP) tools with DB2 Cube Views
– Supporting Relational OLAP (ROLAP) tools with DB2 Cube Views
– Supporting Hybrid OLAP (HOLAP) tools with DB2 Cube Views
– Supporting bridgeless ROLAP tools with DB2 Cube Views
Feeding Web services from DB2 Cube Views
These scenarios will help the reader understand the metadata flows in and out of
DB2 Cube Views, as well as the performance and administrative benefits of
using DB2 Cube Views in each case.
[Figure 2-7: Front-end tools (MOLAP, ROLAP, HOLAP) over relational objects in DB2]
Whichever of the three approaches you choose to use, the result will be a
mapping between your relational objects and your OLAP objects that will lie at
the heart of DB2 Cube Views. Figure 2-8 shows an example of some relational
objects you might have and the OLAP objects to which they might map.
Figure 2-9 represents the same example but this time showing the relational
objects as a star schema.
Figure 2-8 Relational objects mapped to OLAP objects in DB2 Cube Views (cube model, cube, dimension, hierarchy, facts, attribute, join, and attribute relationship)
Figure 2-9 The same mapping, with the relational objects shown as a star schema
Data modeling tools add a lot of value to database implementation projects. They
greatly increase understanding through graphic representations of data
relationships and data meaning, while they dramatically decrease the time it
takes to develop a new database from its inception to its implementation. They
capture information and store it as metadata for future reference and
as the basis of further development. Also, they typically generate database
commands capable of creating all the physical objects for the new database.
ETL tools are also rich sources of metadata related to the star schema, since
they are used to populate it. Their metadata includes detailed information about
the target star schema tables and columns, as well as information about the
source system databases and the transformations that have been performed on
each data element on its way from source to target. This transformation history
information makes data lineage reporting possible. For example, an end user
might find it useful to know that the net sales figure he is looking at on a report is
actually the result of a complex calculation involving two separate fields each of
which was originally extracted from a different operational database.
Metadata management tools offer distinct advantages, too, since they interact
with multiple tools, exchanging and integrating their metadata into one
centralized, consolidated resource. These powerful
metadata resources offer valuable assistance to the enterprise in the form of
cross-tool data lineage reporting as well as cross-tool impact analysis reporting.
An impact analysis report would alert a data analyst that a change made in one
tool, for example a data modeling tool, will have an impact on another tool, such
as an ETL tool or a reporting tool.
Note: When feeding the metadata mappings into DB2 Cube Views from
back-end tools, the imported metadata may contain all or part of the needed
information. For example, the metadata may describe only the relational
schema (the star schema) and not the complete metamodel. So while
importing that star schema metadata helps jump start the DBA’s work, there
may still remain the tasks of defining a more complete cube model using DB2
Cube Views OLAP Center and mapping it back to the star schema.
Also, if you are using an ETL tool that offers data lineage reporting, then the
objects in DB2 Cube Views can show up in the data lineage reports because
metadata bridges exist to share the DB2 Cube Views metadata with the ETL tool
repositories. Lastly, if you are using a metadata management tool that offers
cross-tool impact analysis and shares metadata with DB2 Cube Views, then its
reports can show users how a change in a data model can affect the DB2 Cube
Views objects, or how a change in a DB2 Cube Views object can affect an
existing report on your data delivery platform.
Figure 2-10 Back-end tools (data modeling, ETL, and metadata management) feed OLAP metadata through a metadata bridge into DB2 Cube Views; their DDL creates the star schema relational objects in DB2
Benefits
The benefits in this approach are:
Low administrative effort
Better cross-tool data understanding
Data model enrichment
A scenario
Let’s say you had already created a rich layer of metadata in your favorite
front-end tool before you installed DB2 Cube Views. Since that metadata already
contains descriptions of your star schema database and of the OLAP objects
related to it, it makes sense to save time and re-work by exporting the metadata
from the front-end tool and importing it into DB2 Cube Views through a metadata
bridge, thereby obtaining the most complete metamodel possible.
Figure 2-11 MOLAP and ROLAP reporting tools feed OLAP metadata through a metadata bridge into DB2 Cube Views, mapped to the star schema relational objects in DB2
Benefits
The main benefit will be to speed up your start-up with DB2 Cube Views.
Speedy start-up
Clearly, any star schema reporting system that was implemented without DB2
Cube Views stands to improve its performance by adding DB2 Cube Views, and
you can get started building and using the automatic high performance
aggregates (also known in DB2 as Materialized Query Tables, or MQTs) as soon
as you have completed the job of defining your OLAP objects and mapping them to
your relational objects. Your data delivery platform already contains the relational
and OLAP metadata objects and the mappings that you need, so all you have to
do to get the DB2 Cube Views model populated is to import them from the
front-end tool via a metadata bridge. This type of metadata exchange is possible
with any reporting tool that supports a two-way metadata bridge with DB2 Cube
Views, and it will speed you on your way to reaping the performance benefits of
DB2 Cube Views.
A scenario
Let’s say none of your back-end or front-end tools offers any bridges to DB2
Cube Views. In that case, you will use the OLAP Center to create your OLAP
metadata from scratch, using a GUI built especially for that purpose.
Figure 2-12 OLAP objects created in DB2 Cube Views with the DB2 OLAP Center GUI and mapped to the star schema relational objects in DB2
Benefits
The benefits in this approach are:
Speedy start-up
Highly refined OLAP object definitions
Speedy start-up
The Quick Start wizard in the OLAP Center is truly a time saver. Because it
detects the OLAP objects instead of requiring the user to enter each one
manually, the OLAP model is quickly built and the user can spend his time on
further refinements rather than basic tasks. The Quick Start wizard can detect and create the
following objects:
A cube model that contains all of the other metadata objects.
A facts object that corresponds to the fact table you specified.
Measures that correspond to the fact table columns you specified.
For example, you may have a model that contains facts and dimensions, but not
hierarchies. The OLAP Center has a Hierarchy wizard you can use to create
hierarchies for each dimension. A hierarchy can be defined using only one
attribute, or it can define relationships between two or more attributes within a
given dimension of a cube model. Defining these relationships provides a
navigational and computational means of traversing the specified dimension.
You can define multiple hierarchies for a dimension in a cube model. This wizard
allows you to specify other advanced OLAP objects, such as:
Hierarchy type (for example, balanced, unbalanced, standard, ragged,
network, recursive)
Hierarchy level
Attributes associated with each hierarchy level
Attribute type (for example, associated or descriptive)
Other wizards in the OLAP Center enable the creation of still more metadata
objects:
Dimension type (for example, regular or time)
Attributes associated with each dimension
Calculated attributes
Calculated measures
New tables
New measures
New attributes
New joins
Aggregation rules for each measure (for example, SUM, COUNT, MIN, MAX,
AVG, STDDEV, script, none)
Figure 2-13 Metadata bridge, MQT optimization, and SQL flow between DB2 UDB with DB2 Cube Views and a MOLAP, ROLAP, or HOLAP front-end tool
The data story is one of speed and efficiency, owing to the superiority of DB2
Cube Views’ automatically-built aggregate tables over manually-built aggregates.
Since the aggregates are built directly by the Optimization Advisor, they offer a
high likelihood of being chosen by the DB2 optimizer at query time.
Each scenario is a little different from the others, and each will offer the user
some unique benefits. It is entirely possible that your plans will include
implementing more than one of these types of tools. If so, then DB2 Cube Views
will offer you the additional benefit of allowing you to collect your OLAP metadata
in one central place, namely in DB2, and then share it many times with all your
reporting tools via metadata bridges.
A scenario
Let’s say you decide to build a MOLAP cube for your users, using the data in your
star schema as the source of the cube. If you were to compare the data in your
MOLAP cube to the data in your relational star schema, you would probably
notice a few striking differences. The first difference you are likely to notice is one
of grain. Grain refers to the lowest level of detail, in terms of the dimensional
hierarchies, that is stored in the database.
For example, a leaf-level cell in your MOLAP database might represent all the
GrandMa’s soup sales (that is, all the GrandMa’s Soup SKUs combined) for one
month for one state for all customers from the same zip code. In that case the
grain of the MOLAP database could be said to be the intersection of product
group + month + state + customer zip. Clearly, the MOLAP database in this case
would represent an aggregation of the fact table data and it would have a
different grain from that of the fact table. Figure 2-14 shows the cube as a subset
of the star schema.
Figure 2-14 MOLAP cube with higher grain than that of the star schema: the MOLAP area covers only the upper levels of the Product, Time, Market, and Customer hierarchies
Similar to the last example, the MOLAP database in this example is also an
aggregate of the fact table, and it also has a different grain from the fact table.
This time, it also has one less dimension, and all the data is aggregated to the
“All Customers” level. Figure 2-15 shows this second cube as a different subset
of the star schema, with different dimensionality from the star schema.
Figure 2-15 MOLAP cube with different dimensionality from that of the star schema: the Customer dimension is aggregated to the All level and drops out of the cube
Differences in grain and dimensionality between the fact table and the MOLAP
database make it necessary to aggregate the fact table data in order to load it
efficiently into the lowest-level cells of the MOLAP database. The MOLAP tools
understand this and generate SQL containing appropriate aggregation grouping
constructs when they load the relational data into the MOLAP databases.
The data flow, represented by the lower set of arrows pointing both directions,
carries the SQL extract request from the MOLAP tool to DB2 and the result set
data from the relational tables to the MOLAP database at load time. Notice that
the pre-built aggregate is being used as the source of this load rather than the
relational base tables, even though the front-end tool created its extract SQL
based on the base tables.
Figure 2-16 Metadata bridge, MQT optimization, and SQL cube-load flow between DB2 and the MOLAP front-end tool
Based on this knowledge, the Optimization Advisor can create an MQT that
contains a slice of fact table data aggregated to match the lowest level of data to
be loaded to your MOLAP database. Once this is done, your loads and refreshes
will be run considerably faster, since there is no additional aggregation that
needs to be done at load time.
Figure 2-17 The MQT slice is built just below the MOLAP area within the cube model hierarchies (Market: All, Region, State; Product: All, Family, SKU, Sub SKU; Time: All, Year, Quarter, Month)
It could be argued rightly that the time it takes to refresh an MQT takes away
some of the benefit described just above, but by no means all of it. Since the
MOLAP database will be unavailable to the end users during the MOLAP load,
this downtime can be considerably reduced by moving the aggregation work from
the MOLAP tool to the relational database in the form of the MQT refresh. Also, if
your schedule is such that the periodic fact table refresh can be scheduled well
ahead of the periodic MOLAP load, then the MQT refreshes can be done early,
along with the relational update, when there is less pressure on the load window.
Another key inherent advantage of MQTs is that they are available to all users of
the data warehouse, not just a particular MOLAP tool. This is a general benefit of
MQTs overall vis-à-vis MOLAP databases.
Economies of scale
The benefit of fast MOLAP database load is greatly increased in certain
situations, reaping many times the basic benefit. Time savings are multiplied
when you are building several MOLAP databases from the same star schema
data. Grain and dimensionality will vary from MOLAP database to MOLAP
database, but DB2 Cube Views can build an MQT that will be shared by multiple
MOLAP database loads. If you want this benefit, then you will define one cube
within your cube model that represents a combination, or superset, of all the
MOLAP databases you want to load, in terms
of grain and dimensionality. In these cases, the time saved during the MOLAP
loads is many times the time spent refreshing the MQTs because there are
multiple loads, but only one MQT.
Another multiplied benefit can be realized because of the MQT’s ability to accept
incremental refreshes. Every time the fact table is updated with new data, the
MQTs associated with it will need to be refreshed as well, so that they will stay
synchronized and usable by the DB2 optimizer. If the MQT is built using SUM
and COUNT column functions, then it is capable of being refreshed
incrementally, rather than having to be rebuilt from scratch each time the fact
table is updated. The multiplied benefit is realized in shops where the star
schema data is updated more frequently than the cubes. For example, if the
relational fact table is updated once a day and the MOLAP databases are
refreshed once a week, then only a fraction of the aggregation work for the week
has to be done on the day of the MOLAP load. This can add up to a tremendous
benefit in time saved.
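As a hedged sketch (table and column names illustrative), an incrementally
maintainable MQT of this kind carries the COUNT alongside the SUM:

-- REFRESH IMMEDIATE asks DB2 to maintain the MQT incrementally as rows are
-- inserted into the fact table; SUM plus COUNT(*) is what makes that possible
-- (assuming here that SALES is a NOT NULL column).
CREATE TABLE MART.SALES_SUM AS (
    SELECT PRODUCT_ID, MONTH_ID,
           SUM(SALES) AS SUM_SALES,
           COUNT(*) AS ROW_COUNT
    FROM MART.FACT
    GROUP BY PRODUCT_ID, MONTH_ID )
    DATA INITIALLY DEFERRED REFRESH IMMEDIATE;

-- Initial population; afterwards, daily fact-table inserts are folded into the
-- MQT automatically, so a weekly MOLAP load finds it already up to date.
REFRESH TABLE MART.SALES_SUM;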
A scenario
Let’s assume you have built the same star schema as the one described in the
MOLAP scenario. This time, however, you are not going to offer any MOLAP
cubes. Instead, you want to allow your users to access the entire star schema
directly, in ROLAP fashion, producing reports at any and all levels of grain and
dimensionality. Let’s assume you want to allow them to be able to do both
Drill-down queries and general Report queries. Drill-down queries produce
reports that typically present the end user with a view of the data that
corresponds with the very highest level of aggregation of your data, and then
allow the user to drill down, dimension by dimension, until he can see the data
that is most interesting to him. By contrast, report queries can start with a query
that is equally likely to access any part of the cube model, and possibly offer
drill-down from there.
The data flow, represented by the lower set of arrows pointing both directions,
carries the SQL data retrieval request from the ROLAP tool to DB2 and the result
set data from the relational tables to the ROLAP tool at report query time. Notice
that the pre-built aggregate is being used to satisfy this query rather than the
relational base tables, even though the front-end tool constructed its SQL to read
the base tables.
Figure 2-18 Metadata bridge, MQT optimization, and ROLAP SQL flow between DB2 and the ROLAP front-end tool
If you tell DB2 Cube Views that you want to optimize for drill-down, then it will
build an MQT for you that has a concentration of aggregations at the middle
levels of your dimensions as well as dense aggregations at the top levels of your
cube model. Then at query time, your end users will get fast response times from
start to finish, without your having to build or maintain cubes for them. See
Figure 2-19 for a graphical look at the dense rollup plus the additional
aggregation slices that would be built into your MQT in this scenario.
Figure 2-19 MQT optimized for drill-down: an area densely optimized for drill-down at the top of the cube model, plus additional MQT slices at lower levels
The second optimization option, Report, is the most generalized of all query
types. If you indicate through the OLAP Center that you want this type of
optimization, then the Optimization Advisor will build you an MQT similar to the
one built for Drill-down, but without the dense rollup at the top. The MQTs that
are built to support ROLAP reporting contain multiple slices of aggregated data,
so that as many queries as possible can be re-routed to the MQT by the DB2
optimizer. See Figure 2-20 for a depiction of the multiple-slice MQT that DB2
Cube Views might build for you in this situation.
Figure 2-20 Multiple-slice MQT built for the Report optimization option
The largest benefits are on the data side. Since ROLAP tools access very large
tables and do not use cubes, their analytical queries must depend on pre-built
aggregates in order to achieve outstanding performance. The aggregate tables
built by the DB2 Cube Views expert system satisfy this requirement extremely
well because they are chosen by the DB2 optimizer in a high percentage of
cases. Although most of these tools offer their own aggregate awareness
features that access pre-built aggregate tables instead of using the base star
schema tables when they can, using aggregate-awareness features outside of
DB2 adds a considerable amount of administrative overhead to the configuration
and use of the ROLAP tool. By relying instead on DB2 and DB2 Cube Views to
create and maintain the aggregate-awareness in the overall system,
administration is greatly simplified and much more efficient.
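One practical detail, shown here as a hedged sketch: for deferred-refresh MQTs,
DB2 special registers control whether the optimizer will consider rerouting at
all, and DB2 Explain (for example, the db2exfmt tool) can confirm that a query
was in fact routed to the MQT. The query and table names below are illustrative:

-- Allow the optimizer to consider MQTs that are not refreshed immediately.
SET CURRENT REFRESH AGE ANY;
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION SYSTEM;

-- Capture the access plan without executing, then inspect it with db2exfmt
-- to verify that the MQT, not the base fact table, appears in the plan.
SET CURRENT EXPLAIN MODE EXPLAIN;
SELECT PRODUCT_ID, SUM(SALES)
FROM MART.FACT
GROUP BY PRODUCT_ID;
SET CURRENT EXPLAIN MODE NO;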
A scenario
Let’s assume, once again that you have built the same star schema as you did for
the MOLAP and ROLAP scenarios, but this time you want it all. You want the
speed of MOLAP for those queries that can be resolved within your cubes, and
you want optimized ROLAP for those queries that stray outside your MOLAP
boundaries. Also, you want all the slicing and dicing to appear seamless to your
end users, regardless of which database is used to satisfy their queries (see
Figure 2-21).
Figure 2-21 The MOLAP area covers the upper levels of the cube model, with the HOLAP drill-through area just below it
DB2 Cube Views can automatically build aggregates that are tailored specifically
to optimize your HOLAP drill-throughs. If you want these queries optimized, you
first create a cube definition within your cube model that corresponds in terms of
grain and dimensionality to the HOLAP area you expect will be hit by drill-through
queries. In this case, what you define in the OLAP Center as your cube actually
represents more data than will be held in your MOLAP database because you
are specifying the HOLAP area rather than the MOLAP area with your cube
definition.
The data flow, represented by the lower set of arrows pointing in both directions,
carries the SQL data retrieval request from the HOLAP tool to DB2 and the result
set data from the relational tables to the HOLAP tool at MOLAP load time or at
report query time. Notice that the pre-built aggregate is being used to satisfy
these queries rather than the relational base tables, even though the front-end
tool constructed its SQL to read the base tables.
Figure 2-22 Data and metadata exchange with HOLAP tools
Based on this knowledge, the Optimization Advisor can create an MQT that
contains multiple slices of fact table data aggregated to optimize both the load of
your MOLAP database and your drill-through queries that go below the MOLAP
database. When the advisor is choosing which slices to build, it gives highest
priority to the dimension with the greatest number of distinct values, in other
words, the dimension with the greatest cardinality, because drill-through queries
in this dimension will benefit the most from using an MQT. Once the MQT is built,
your MOLAP loads will run considerably faster, since there is no additional
aggregation that needs to be done at MOLAP database load time, and your
drill-through queries will also be optimized.
In our scenario, DB2 Cube Views will recognize that, because the cardinality of the
product dimension is very high, aggregating in that dimension will be expensive,
and it will give high priority to building an MQT that has a slice of data
that aggregates the product dimension lower than the boundary of the cube. On
the other hand, since there are only 3 months per quarter, pre-building an
aggregate by month would get lower priority. Consequently, the slices of data just
below the cube grain that are likely to be hit at drill-through time have been
optimized.
As stated above, these actions will produce an MQT that optimizes both your
MOLAP load and your drill-through queries; the drill-through optimization also
benefits the MOLAP load and refresh. Your loads and refreshes will run
considerably faster, since there is no additional aggregation that needs to be
done at load time.
In the past, reporting tools that were not equipped to store any OLAP metadata
were not able to offer their users any OLAP slice and dice or drill-down reporting
at all, even if they were connected to star schema relational databases. Since
DB2 Cube Views stores the OLAP metadata centrally in the database, that
inability is now gone. DB2 Cube Views has an application programming interface
(API) that allows any program or tool to access the OLAP metadata directly to
retrieve OLAP-aware information about the underlying star schema. This way, the
end user sees only OLAP objects to choose from, and he never has to see
anything about tables, columns, joins or SQL. Several tools have already
incorporated this API and by doing so they have transformed themselves into
bridgeless ROLAP tools.
A scenario
Let’s assume that you have the same star schema as was built for all the
previous scenarios, but the difference this time is your front-end tool. Instead of
deploying a sophisticated (and expensive) enterprise-wide data delivery platform
rich with metadata and report distribution options, your organization has opted for
a less expensive tool that will offer your users OLAP-style navigation through
your star schema data using only the resources contained immediately in your
DB2 database.
Figure 2-23 Data and metadata exchange with bridgeless ROLAP tools: MQT optimization in DB2 and ROLAP SQL from the bridgeless ROLAP tool
The data flow begins as the user clicks on the displayed OLAP metadata objects.
The bridgeless ROLAP tool uses the information in the relational mapping
metadata retrieved earlier via the API to construct the SQL statements needed to
extract the corresponding data from the relational tables and produce reports.
Notice that the pre-built aggregate is being used to satisfy the query rather than
the relational base tables, even though the bridgeless ROLAP tool constructed its
SQL to read the base tables.
Note: DB2 Cube Views Web services are available from the IBM alphaWorks
Web site:
http://www.alphaworks.ibm.com
In its search to find external information on its customers, the Grocery Max folks
discovered a company called Cross References that specializes in demographic
data collection and is also a Web services provider. After finding the description
of Cross References’ Web services on a public UDDI registry, Grocery Max
decided to use Cross References’ OLAP data to augment their own. By doing
so, they learned that 30% of the locals and potential customers of the selected
stores had French origins. In light of this new information, Grocery Max
executives decided to increase the supply of French products for the selected stores.
In addition to the metadata services described above, the OLAP provider might
offer a Web service to retrieve the OLAP data based on the fact names and
dimension values that were previously retrieved in the metadata and to translate
the caller’s XPath statements into SQL. This data flow is represented by the
lower set of arrows pointing in both directions, which carry the SQL data retrieval
request from the Web service to DB2 and the resulting dataset from the relational
tables back to the Web service. Notice that the pre-built aggregate is being used
to satisfy this query rather than the relational base tables, even though the SQL
was constructed to read the base tables.
Figure: A Web application sends XPath requests as XML over the Internet to the Web service; the Web service translates them into ROLAP SQL against DB2, where the optimization MQT satisfies the queries, and the results return to the application as XML
2.4.3 Benefits
The benefits in this approach are:
Easy integration of information
Easy access to remote information
Cube modeling in DB2 UDB V8.1 now makes the database aware of the higher
level of organization that OLAP requires by building metadata objects in the DB2
catalog in the form of dimensions, hierarchies, attributes, measures, joins, and so
on. This metadata model is very strong and complete and capable of modeling a
wide range of schemas from simple to complex. It adds value not only to DB2 but
also to the tools and applications that access such dimensional data through
simple DB2 interfaces. It allows dimensional intelligence, in the form of
facts, dimensions, hierarchies, and attributes, to be exchanged from back-end tools
through the database to front-end tools. Metadata needs to be defined only once
and is then available to all tools and applications that need such metadata.
OLAP tools and applications that interface with DB2 each have a different view of
OLAP, ranging from well-defined (rigid) dimensional models to flexible ones.
With all these requirements as input, the DB2 Cube Views model has taken a
layered approach to creating objects in the DB2 catalog. This approach allows
the tools to derive maximum benefit from the cube model.
The DB2 Cube Views model is designed to handle star/snowflake schemas due
to simple yet compelling advantages that these types of schemas possess:
Industry standard:
Such designs are widely implemented and easily understood for OLAP-type
solutions. Understanding allows more useful navigation to be performed by
users/tools accessing the database and allows meaningful data to be
retrieved more easily.
Performance:
Star schema databases deliver high performance in data retrieval by
minimizing the number of joins required (relative to a normalized relational
model) and generally by simplifying access to the data. Performance is
enhanced because queries need to join only a few tables, and a single table
scan can span many records.
These have such significance that star schemas (or snowflake designs) are
recommended for performance reasons when building cube models in DB2.
This model is named star schema due to the dimension tables appearing as
points of a star surrounding the central fact table, as shown in Figure 3-1.
Figure 3-1 Star schema: the central FACT table (Product_ID, Customer_ID, Date_ID, Region_ID, Sales, Expenses, Profit) surrounded by the dimension tables Product (Product_ID, Product_Desc), Customer (Customer_ID, Customer_Name), Region (Region_ID, Country, State, City), and Time (Year_ID, Quarter_ID, Month_ID, Date_ID)
How you came to the decision to build a star schema, whether it is a star schema
data warehouse or a star schema datamart drawn from a 3NF data warehouse,
is another debate not addressed directly in this book.
With DB2 Cube Views the database designers can define logically all
dimensions, measures, and hierarchies from the same transaction data in the
form of cube models and then deploy as many cubes as they feel necessary,
maintaining consistency among applications.
For example, snowflaking the Product dimension table generates subsets of rows:
all rows at the Family level of the star schema are extracted, and only those
attributes that refer to that level (Family_Intro_Date, for example) and the
keys of the hierarchy (Family_Family_ID) are included in the new table.
Figure: Snowflake schema example: the SALES fact table (Product_ID, Customer_ID, State_ID, Date_ID, Sales, Expenses, Profit) joins to CUSTOMER (Customer_ID, Customer_name), TIME (Year, Quarter, Month, Date_ID), a snowflaked MARKET dimension (Market: State, Market Type; Region: Region_Region_ID, Director; Population: Population_Population_ID, Population_Alias), and a snowflaked PRODUCT dimension (Product: Product_Product_ID, Family_ID, Caffeinated, Package type, Ounces, SKU, Intro_Date; Family: Family_Family_ID, Family_Intro_Date)
In this chapter, we will use star schema to also represent snowflake unless
explicitly required to distinguish between them.
To better understand the notion of a cube model in DB2, we will pursue a layered
approach to this concept in the context of the following scenario.
Imagine that over time, we are tracking sales data of a retail company selling
cosmetics that has stores spread over several states. Information is stored about
each customer, the line of products that the company sells, the stores, and the
campaign details that the company adopts. These questions arise when the company
wants to decide on a new campaign:
Which is the best geographic location to start with, based on stores
making consistent profits?
Which time period is the best to start the campaign?
Who is the target market for the new product?
For our business case study, we used a retail database with the tables
CONSUMER_SALES, CONSUMER, PRODUCT, STORE, DATE, CAMPAIGN.
Note: Refer to Appendix E, “The case study: retail datamart” on page 685 for
a complete description of the star schema for the retail database case study.
Figure: DB2 relational objects: a central fact table surrounded by dimension tables
Profit = @Column(STAR.CONSUMER_SALES.TRXN_SALE_AMT) -
@Column(STAR.CONSUMER_SALES.TRXN_COST_AMT)
Some of the measures described in the facts object can be actual columns from
the relational table or aggregated measures (measures that have been calculated
using aggregation functions such as SUM or AVG). For example, using the Profit SQL
expression as input for the SUM aggregation function, an aggregation on the
Profit measure would be: SUM(Revenue - Cost). For further information on the
options available when creating advanced measures in DB2 Cube Views, please
refer to 3.4, “Enhancing a cube model” on page 118.
In DB2 Cube Views, we define a metadata object called Fact based on one fact
table from the relational star schema (see Figure 3-5).
Figure 3-5 The facts object and its measures mapped to the fact table among the DB2 relational objects
3.2.2 Attributes
Simply stated, an attribute represents a database table column.
For example, attributes for the facts object are derived from the key columns:
DATE_KEY, CONSUMER_KEY, STORE_ID, ITEM_KEY, COMPONENT_ID.
Note 2: When other attributes are used in the defining SQL expression, the
other attributes cannot form attribute reference loops. For example, if Attribute
A references Attribute B, then Attribute B cannot reference Attribute A.
Figure 3-6 Attributes on dimension tables
3.2.3 Dimensions
The dimension object provides a way to categorize a set of related attributes that
together describe one aspect of a measure (see Figure 3-7). A dimension is a
collection of data of the same type.
Information for these objects is abstracted from the relational tables of the star
schema constituting the dimensions.
Dimensions are used in cube models to organize the data in the facts object
according to logical categories like Region, Product, or Time.
Figure 3-7 Dimension metadata objects mapped to the dimension tables among the DB2 relational objects
Figure 3-8 Hierarchy example: 2001 contains 1st Qtr., which contains Jan, Feb, and Mar
Hierarchies are defined by the different levels in the dimension and the
parentage.
The type of hierarchy objects defined in DB2 Cube Views can be:
Balanced:
A hierarchy is a balanced hierarchy if children have one parent, levels
have associated meaning or semantics, and the parent of any member in a
level is found in the level above. For example, see the Time dimension in
Figure 3-9.
Figure 3-9 Balanced hierarchy: Time (Year, Quarter, Month, Day)
Unbalanced:
A hierarchy is unbalanced if children have one parent and levels do not have
associated meaning or semantics, as in Figure 3-10.
Figure 3-10 Unbalanced hierarchy example
The semantics are in the relationships between levels rather than in the level
itself, as in the example:
Product A 'is composed of' products X and Y and component Z
Component Z 'is composed of' Component E and part F
Product X is composed of Components J and K
Component K is composed of parts W1 and W2
Ragged:
A hierarchy is ragged if the parent of a member can come from more than one
level above it, so that some levels are skipped for some members, as in this
Geography hierarchy (Region, Country, State, City): Americas/USA/California/
San Jose, but Europe/Greece/-/Athens and Europe/Iceland/-/Reykjavik.
Network:
A hierarchy is a network hierarchy if children can have more than one parent,
such as a family tree, for example.
The implementation in the DB2 tables defines the deployment mode: it can be
standard or recursive:
In a standard deployment, the attributes in the dimension table define each
level in the hierarchy. All types of hierarchies are supported.
In a recursive deployment, the levels in a dimension hierarchy are defined by
the parent-child relationship:
– One attribute defines parent.
– One attribute defines children.
Only the unbalanced hierarchy type is supported.
Thus a hierarchy can be recursively deployed (using the OLAP Center GUI)
only when the hierarchy has exactly two attributes.
For a more detailed description of the type and deployment of hierarchy objects, see
the IBM DB2 Cube Views Setup and User’s Guide, SC18-7298.
The associated type specifies that the right attribute is associated with the left
attribute, but is not a descriptor of the left attribute. For example, a CityPopulation
right attribute is associated with, but not a descriptor of CityID.
Cardinality defines the type of relationship between the left and right attributes. For
example, in a 1:1 cardinality, there is at most one left attribute instance for each
right attribute instance, and at most one right attribute instance for each left
attribute instance. Other possible values for cardinality are 1:Many, Many:1, and
Many:Many.
Figure 3-12 Attribute relationships within dimensions, mapped to the dimension tables among the DB2 relational objects
3.2.6 Joins
DB2 Cube Views stores metadata objects called joins, representing joins in the
star schema (see Figure 3-13).
In the case of a star schema, the join objects in the cube model are those that
exist between the facts object and each dimension object.
Figure 3-13 Join objects: joins between the facts object and dimensions, joins between fact tables, and joins between dimension tables, mapped to the relational objects in DB2
A join object joins two relational tables together. The simplest form of a join maps
a column in the first table to a column in the second table, along with an operator
to indicate how the columns should be compared.
While any type of join can be selected for modeling purposes, the Optimization
Advisor requires inner joins that follow the optimization rules described in
IBM DB2 Cube Views Setup and User’s Guide, SC18-7298.
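As a hedged illustration using the case-study tables, a fact-to-dimension join
object corresponds to an inner equijoin such as the following (the query itself
is illustrative):

SELECT T.CAL_YEAR_ID, SUM(F.TRXN_SALE_AMT) AS SALES
FROM STAR.CONSUMER_SALES F
     INNER JOIN STAR.DATE T
     ON F.DATE_KEY = T.DATE_KEY   -- the join object: left column, operator, right column
GROUP BY T.CAL_YEAR_ID;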
The cube model now has the objects that are depicted in Figure 3-14.
Figure 3-14 The cube model: facts, dimensions, hierarchies, attributes, joins, and attribute relationships mapped to the fact and dimension tables
Cube model
A cube model in DB2 Cube Views is a logical representation of the underlying
physical tables in DB2 and is itself a metadata object in DB2 Cube Views. A cube
model and all its related metadata objects are stored in the DB2 catalog within
the DB2 database prepared for DB2 Cube Views.
From the perspective of a BI tool that ultimately imports the DB2 Cube Views
metadata model, the cube model is the virtual multidimensional environment or
universe within which users navigate through their graphical interface. Tool users
are unaware of the underlying mapping to DB2 relational objects, and tend not to
think of their environment as a logical abstraction of a DB2 star schema but
rather as a pure conceptual representation of the business.
A cube is a very precise definition of an OLAP cube that can be delivered using a
single SQL statement. The cube facts and list of cube dimensions are subsets of
those in the referenced cube model. Cubes are appropriate for tools and
applications that do not use multiple hierarchies because cube dimensions only
allow one cube hierarchy per cube dimension. You can use the Cube wizard in
OLAP Center to create a cube. You must have a complete cube model to create
an associated cube.
One or more cubes can be derived from a cube model. A cube has a cube facts
object as its central object, surrounded by cube dimensions. The cube facts (measures) and cube
dimensions are again subsets of the corresponding objects referenced in the
cube model. Cube hierarchies are scoped down to the cube and each can be a
subset of the parent hierarchy that it references. Each cube dimension can have
only one hierarchy defined. This structural difference between cube model and a
cube allows a slice of data (the cube) to be retrieved by a single SQL query.
Using cubes is appropriate in the case of tools and applications that do not
require multiple hierarchies. Cube metadata can also be used by the
Optimization Advisor when optimizing query performance or a specific business
subject (refer to Chapter 4, “Using the cube model for summary tables
optimization” on page 125).
Figure: Cube metadata objects (cube, cube dimension, cube hierarchy, cube facts, cube measure, attribute, join) derived as subsets of the corresponding cube model objects (cube model, dimension, hierarchy, facts, measure, attribute, join, attribute relationship)
A basic complete cube model based on a star schema should have a facts object
joined to two or more dimension objects. At least one hierarchy should be
defined for each dimension.
The OLAP Center has the same look and feel as other DB2 GUI tools. It is a
Java-based program using available DB2 common classes.
On the Windows platform, the OLAP Center can also be started from Start ->
Programs -> IBM DB2 -> Business Intelligence Tools -> OLAP Center.
Figure: The OLAP Center connects through JDBC to DB2, where the star schema, the OLAP metadata, and the MQTs reside; metadata can be imported from an XML input file and exported to an XML output file
These are the main tasks that are performed from the OLAP Center:
Import of OLAP partner metadata in the form of eXtensible Markup Language
(XML) files into DB2. This is done using the Import wizard available through
the OLAP Center menu. The XML files can be imported from partners’ tools
through their bridge. See Chapter 5, “Metadata bridges overview” on
page 221 on OLAP partner bridges.
Export of OLAP metadata from DB2 as XML files that can be
made available to other OLAP solutions. The XML files can be exported and
passed through a bridge to partners’ tools. See Chapter 5, “Metadata bridges
overview” on page 221 on OLAP partner bridges.
Creation and manipulation of metadata objects in DB2. The GUI helps view
metadata objects using detailed and graphic views and manipulation of
metadata objects through Object Properties. The Quick Start wizard and
Object Creation wizards help in creating the OLAP metadata objects in DB2.
Now, let’s get started on creating and building metadata objects in DB2 Cube
Views. We will focus on the different methods to build a cube model using the
OLAP Center, as depicted in Figure 3-18.
The different tasks involved in building a cube model (based on a star schema)
are presented as a broad overview in Table 3-2.
Table 3-2 Tasks in building a cube model

Task: Preparing the relational database
How: Register the DB2 Cube Views stored procedure with the database; create the metadata catalog tables
Where described: Section 3.3.2, “Preparing the DB2 relational database for DB2 Cube Views” on page 86

Task: Building a cube model
How: With the Quick Start wizard
Where described: Section 3.3.4, “Building a cube model with Quick Start wizard” on page 91
3.3.2 Preparing the DB2 relational database for DB2 Cube Views
Setting up the database to be used with DB2 Cube Views includes:
Registering the DB2 Cube Views stored procedure with the database
Creating metadata catalog tables for DB2 Cube Views
This preparation is done manually using DB2 commands or from the OLAP
Center GUI.
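As a hedged sketch of the manual route (the path assumes a default Windows
install, and MDSAMPLE is an illustrative database name), both steps are
performed by the db2mdapi.sql script shipped with DB2 Cube Views, run from a
DB2 command window:

db2 CONNECT TO MDSAMPLE
db2 -tvf SQLLIB\misc\db2mdapi.sql

The script registers the DB2 Cube Views stored procedure and creates the
metadata catalog tables in the connected database.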
The import scenario is used in such BI environments that already have OLAP
metadata (dimensional models) captured by various tools which can be reused
and/or consolidated by DB2 Cube Views. This can help gain a head start in the
development of cube models by reducing development effort. This can also make
OLAP definitions across the enterprise consistent.
The principle
Building a cube model by import is based on the principle that a dimensional
model metadata is available from sources outside of DB2 Cube Views. This
metadata information has been converted into a form that is compatible with DB2
Cube Views. Metadata to be imported into DB2 Cube Views is always in the form
of an XML file. The import XML file is the output received from passing the
source tool’s metadata through a metadata bridge.
For example, if we are importing metadata coming from a partner tool bridge that
uses XML files, then the partner tool metadata is first exported as an XML file
from the partner tool and its bridge. The output from the bridge is then used as
the import XML file. The following steps guide you through importing metadata
into DB2 Cube Views using the OLAP Center:
1. Launch OLAP Center.
2. Connect to the relational database in DB2 from OLAP Center (see
Figure 3-19 on page 87) in which you wish to store the cube model metadata.
This should be a relational database that has already been prepared for DB2
Cube Views (if not, preparation will take place on connecting to the database
for the first time, as explained earlier).
3. Once connected to the database, choose from OLAP Center --> Import
(see Figure 3-20).
At this stage, the import XML file is read, and information is displayed about
the objects that it contains (cube model, facts, dimensions, cube and so on).
If these objects that are being imported are brand new definitions to be added
to DB2, then they have a (New) tag associated with the name. If an object
with that name already exists in DB2, then an (Existing) tag is displayed next
to the object. Apart from this graphical description, the window also displays a
textual description of the number of new objects, number of existing objects,
and total number of objects being imported.
Here you have the option to replace existing objects or create new objects in
the DB2 catalog.
6. Click Next to see the summary of options that you have chosen and click
Finish to actually import the metadata.
7. On successful import, the detailed view of OLAP objects in OLAP Center
shows the cube model and cubes (if any) imported (see Figure 3-23).
For importing, the db2mdapiclient utility typically uses an XML file that is
produced by a DB2 Cube Views bridge or that was exported from the OLAP
Center.
For example, to import DB2 Cube Views metadata into the relational database in
DB2, say, MDSAMPLE, change to the ..\SQLLIB\samples\olap\xml\input
directory and enter the following command:
db2mdapiclient -d MDSAMPLE -u db2admin -p mypasswrd -i create.xml -o
myresponse.xml -m MDSampleMetadata.xml
When creating cube models, the Quick Start wizard should always be used if
possible, as it includes additional features, such as join auto-detection, that are
not available when creating the cube model manually.
To start using the Quick Start wizard, right-click Cube Models as shown in
Figure 3-25 and choose Create Cube Model - Quick Start.
The Quick Start wizard allows the user to specify the fact object and consequently its
measures. Once the fact objects and measures have been specified, the wizard
creates a basic cube model with a fact object and other dimension objects.
Basic objects (dimensions, attributes and joins) are created using the RI
constraints and the primary key/foreign key pair information. Once a basic cube
model has been created, the properties of the metadata objects that have been
created can be modified at a later time. Other metadata objects like hierarchies
and cubes, which do not get created while using the Quick Start wizard, need to
be added manually, from scratch.
To create an empty cube model, right-click the cube models object and select
Create Cube Model (see Figure 3-25).
Note: If more than one table is needed to build the facts object, then you will
be prompted to specify the join. In our illustrative star schema, there is only
one table for the facts object.
Available measure columns from the facts table are listed on the left hand side
panel from which the user is allowed to select measures. Measures are selected
with a mouse left-click action and then clicking > to move the measure to the
panel on the right hand side. Figure 3-30 shows TRXN_COST_AMT,
TRXN_SALE_AMT and PROMO_SAVINGS_AMT as the selected measures.
Click Create Calculated Measure to launch the SQL expression builder as
shown in Figure 3-31 and create a calculated measure, for example:
Profit =@Column(STAR.CONSUMER_SALES.TRXN_SALE_AMT) -
@Column(STAR.CONSUMER_SALES.TRXN_COST_AMT)
Type in the name of the calculated measure (example Profit) in the Name field.
The actual expression is built by selecting the measures used in the calculation
with the mouse (from the list in Data) and by choosing the appropriate operator
(for example, the operator is '-' in the case of calculating Profit).
Thus, to calculate profit, select TRXN_SALE_AMT first, select the operator (-)
and then select the measure TRXN_COST_AMT.
Click Validate to check if the expression built is valid and click OK to create the
calculated measure and return to the Facts Wizard.
Click Next to see the list of measures (which includes the calculated measures
that you created). You can change the type of aggregation to SUM or any of the
types listed in the drop down box according to the requirement. This action can
also be performed after creating the Facts object from its context menu option
Properties...
Tip: You can create your own aggregation scripts at a later time when you
have built at least one dimension for the cube model. Typically, the basic cube
model is built and then further enhancements are made depending on
business requirements. Building an aggregation script is done by choosing
Edit aggregations... from context menu option on selecting a measure.
Dimension objects are created from the OLAP Center GUI by selecting Create
Dimension... from the context menu option on Dimensions (see Figure 3-32).
Note: Alternatively, dimensions can also be created from the context menu
option on “All Dimensions” from the OLAP Center. Pursuing this option does
not require the user to specify information about the join with the facts objects.
Dimensions once created in this manner can be added to the cube model at a
later time. Adding such dimensions to the cube model will then be done from
the context menu option Add Dimension on Dimensions in the cube model.
The join to the facts objects needs to be specified only at the time of adding
the dimension to the cube model.
Specify the name of the Dimension object, including the schema name under
which it needs to be created (see Figure 3-33).
Clicking the > button moves the selected table to the window on the right.
Click Next to specify the joins if using more than one table. In our example, there
is only one candidate table to represent the Time dimension.
Select the attributes to describe the dimension object (see Figure 3-35).
Select and click > to identify specific attributes or click >> to select the entire list
of existing attributes from the relational table.
Note: You can also create calculated attributes at the time of creating the
dimension using the Create Calculated Attribute... or at a later time, by
editing the attribute properties from the context menu option on the attribute.
Specify whether the type of the dimension is Regular or Time — those are the
two dimension types that you can have — by selecting the appropriate radio
button (see Figure 3-36).
In our example, we have selected type as Time. For other dimensions such as
Product, Region, and so on, you should choose the type to be Regular.
Click Next to specify the join between the dimension and the facts object.
Note: When using Quick Start, where Referential Integrity has already been
defined, Quick Start will automatically detect the joins.
You can select an existing candidate join or use Create Join to define new joins
(see Figure 3-37).
Click OK to return to the Dimension wizard and select the requisite join.
Note: Two tables can be joined using more than one attribute pair or, in other
words, specifying more than one attribute pair in the join information while
creating/modifying a join object. To do this, select the attribute from each
column to form the attribute pair and click Add. Repeat this to add another
attribute pair.
From the OLAP Center GUI, you can now see the dimension created under the
cube model. Set the view to Show OLAP objects --> Graphical view (see
Figure 3-40).
On the panel on the right hand side, you will see the graphical view of the objects
created with a line between the Fact and Dimension object denoting the join.
Expanding the object-tree list on the left hand side, you will see that the facts
object has an implicitly created attribute once the fact-dimension join has been
specified. For example, DATE_KEY (the foreign key in the fact table) is an
attribute of the facts object MyFact.
Proceed in the same manner, by launching the Dimension wizard to create the
other dimension objects for the cube model.
At least one hierarchy must be defined for each dimension object in a cube
model, if you want to use the dimension as part of a cube.
Select the elements or attributes that form the hierarchy (see Figure 3-43). For
example, to build a Year-Quarter-Month hierarchy in the Time dimension, we can
choose CAL_YEAR_ID, CAL_QUARTER_ID and CAL_MONTH_ID.
Select the type of hierarchy from the type/deployment drop-down list (see
Figure 3-43).
Note: Recursive deployment is only valid if you have two members in your
hierarchy and will be shown in the pull down option if you have selected
exactly two hierarchy level attributes for your hierarchy. If you have more or
less than two, you won't see that option, since it won't be valid to choose.
Click Next to specify related attributes for the hierarchy attributes (see
Figure 3-45).
Define all of the relational attributes that you need to define and then click OK to
return to the Hierarchy Wizard. Select Finish to complete the definition of the
hierarchy (see Figure 3-47).
You can create more than one hierarchy for a dimension in a cube model.
Proceed to create all the dimension objects (for example, STORE, PRODUCT,
CONSUMER, CAMPAIGN for the retail model depicted in Appendix E, “The case
study: retail datamart” on page 685) for the cube model and the hierarchies for
each dimension.
This completes the process of creating a basic cube model with fact and
dimensions and related objects.
2. Provide the Name and Schema for the cube (see Figure 3-50).
4. Select the dimensions for the cube (see Figure 3-52). The available
dimensions are those defined for the cube model.
5. Select the hierarchy for the cube by clicking the push button. Remember that
each cube dimension can have only one hierarchy defined. Select the hierarchy
from the drop down list (see Figure 3-53).
By default, all the levels in a chosen hierarchy are selected. You can further
deselect members from the hierarchy list.
6. Click OK to return to the Cube Wizard and click Finish to complete creating
the cube.
To ensure cube model completeness, OLAP Center will validate cube models
and cubes once created and will enforce a set of rules. The rules are listed and
documented in IBM DB2 Cube Views Setup and User’s Guide, SC18-7298 in the
Metadata Object Rules section as:
Base rules
Completeness rules
Optimization rules
You can define how the measures are derived from the database columns when
you create a calculated measure. See Example 3-3.
You can also write an aggregation script to specify different types of aggregation
across different dimensions.
When defining complex measures as in Example 3-4, you can control the order
of aggregation.
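To see why the order matters, here is a hedged plain-SQL sketch (table and
column names illustrative) of a measure that sums sales within each month and
then averages across months; SUM followed by AVG gives a different answer than
a single AVG over all rows:

-- Average monthly sales: SUM within each month first, then AVG across months.
SELECT AVG(MONTHLY_SALES) AS AVG_MONTHLY_SALES
FROM ( SELECT MONTH_ID, SUM(SALES) AS MONTHLY_SALES
       FROM MART.FACT
       GROUP BY MONTH_ID ) AS T;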
You can create a measure as a function of two parameters (see Example 3-5), to
correlate Sales and Marketing to support analysis of the effectiveness of your
marketing.
All these functions can be performed from the OLAP Center by right-clicking the
Measures object and selecting either Edit Measures or Edit Aggregations.
Create calculated attributes:
You can create attributes derived from base data (see Example 3-6). To
create a calculated attribute, from the OLAP Center, right-click the Attributes
tree object in a dimension and select Edit Attributes and then click Create
Calculated Attribute.
You can also perform the same function by right-clicking a dimension object
and select Properties... or Edit Attributes...
Note: You have to perform a CAST to ensure that the strings which are
concatenated are of the same data type.
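As a hedged sketch (STORE_ID is from the case study; CITY_NAME is a
hypothetical column), a calculated attribute that concatenates a city name with
a numeric store identifier must CAST the number to a character type first:

-- CAST the numeric STORE_ID so both operands of CONCAT are character strings.
Store_Label = @Column(STAR.STORE.CITY_NAME) CONCAT ' - ' CONCAT
              CAST(@Column(STAR.STORE.STORE_ID) AS CHAR(10))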
Geography hierarchy (Region, Country, State, City):

Region      Country   State        City
Americas    USA       California   San Jose
Europe      Greece    -            Athens
Europe      Iceland   -            Reykjavik
Some countries in this hierarchy do not have State and similarly some
countries do not have any associated semantics for Region. For this type of
data, you can implement a ragged hierarchy for the cube model using the
OLAP Center. To do this, right-click the Hierarchies tree object or the
dimension object that you wish to create a hierarchy for and select Create
Hierarchy...
The top-bottom flow chart in Figure 3-55 gives an idea of how to decide the
type of hierarchy to be deployed.
Figure 3-55 Deciding the hierarchy type: if children can have more than one
parent, the hierarchy is a network; otherwise, if the levels have no associated
semantics, it is unbalanced; otherwise, if the parent is always exactly one level
above, it is balanced, and if not, it is ragged.
The cube model may need to be enhanced, based on the type of queries that are
run against the star schema, so that those queries perform better.
This may involve building cubes within the cube model to accommodate better
optimization. We need to remember here that cubes should be built as proper
subsets of the cube model.
Whether or not to build a cube also depends on the end-user tool or
application accessing the metadata. Using Office Connect, for example,
requires that many cubes be built based on the slice of data that the user
frequently retrieves. If extracts are regularly performed, then again a cube must
be built to mimic the SQL query behind the extract. Cubes can also be used to
act as filters against the cube model, thus letting the user access only a slice of
the data that he is interested in.
These concepts are discussed in detail in Chapter 4, “Using the cube model for
summary tables optimization” on page 125.
For example, using the OLAP Center export and import features to back up and
recover only the DB2 Cube Views metadata is not a recommended approach and
should be used with caution, because you risk losing synchronization between
the metadata objects in the DB2 Cube Views catalog tables and the data in the
DB2 database. For the same reasons, using the db2move utility to move only the
DB2 Cube Views catalog tables is not recommended either.
When you do need to move only the DB2 Cube Views metadata from one server
to another, for example from a development to a test environment, prefer the
OLAP Center export and import features or the db2mdapiclient utility over the
DB2 db2move utility. The db2mdapiclient utility provides a way to export all
objects within a cube model (the cube model and all its cubes and other
objects); it does not allow the user to select which cubes within a cube model
should be exported.
This chapter also demonstrates the different methods of building a cube model in
DB2 Cube Views. A cube model can be built by import, with Quick Start wizard or
from scratch. When building a cube model by import, you start with OLAP
metadata that is already available, which has been passed through a suitable
bridge to transform it into DB2 Cube Views format. When building a cube model
from scratch, you can either use the Quick Start wizard or build the metadata
objects yourself. Using the Quick Start wizard builds a cube model using existing
joins between fact and dimension tables and this requires RI (referential integrity)
implemented for the star schema.
You can also choose to build a cube model by sequentially defining the objects
(facts, dimensions, joins) yourself.
Note: Even though Referential Integrity is highly recommended for DB2 Cube Views
and is a prerequisite for the Quick Start wizard, it is not mandatory when building
the cube model manually, and informational constraints may be used (see 4.4.2,
“Define referential integrity or informational constraints” on page 136).
Since many of these queries are run frequently, they may cause a significant workload on the systems supporting the data warehouse or data mart. Other queries may aggregate so much information that they impede or exclude other work scheduled to run on the system. Taken as a whole, available system resources prohibit the repeated aggregation of the base tables every time one of these queries is run, even when appropriate indexes exist.
Therefore, as a solution to this problem, decision support DBAs can build a large number of summary tables, or materialized aggregate views, that pre-aggregate and store the results of these queries to increase system performance.
In modeling terms, the summary tables group the data along various dimensions,
corresponding to specified levels of hierarchy, and compute various aggregate
functions or measures.
These types of requests involve data scans, joins, aggregations, and sorts, and if
they are performed repeatedly against the base fact table and dimension tables,
they will result in poor query performance. Instead, when a DBA creates
summary tables that have already performed this work and stored the results so
that they are available for subsequent query users, the result can be dramatically
improved response times for the query workload.
DB2 provides support for summary tables through its Materialized Query Tables,
or MQTs. Its implementation of MQTs is more generalized than just summary
data. DB2 permits a materialized query table to be created without aggregation,
providing the benefits of pre-joins or caching of data from distributed remote
sources. In the case of analytical data in general and of DB2 Cube Views in
particular, the materialized data is always summarized and aggregated. A
summary table, therefore, is a specialized type of MQT and the only type we’re
considering in this book.
In addition to the concern about the potential for disk consumption, there are other challenges with maintaining aggregates. We must consider the time required to create and maintain them. Particularly in the cases where you
require that the MQTs remain current with the base data, we must factor in the
amount of time required to update all MQTs affected by base data changes. We’ll
discuss the various types of MQTs and maintenance options in the next section.
As the number of MQTs grows, we also must recognize that this places an
additional burden on the DB2 optimizer to evaluate each one as a candidate for
the target of a query rewrite. A very large number of MQTs can potentially slow
down the performance of a query by significantly extending the amount of time
required to optimize it.
One solution to these issues is to pre-aggregate only some of the data, allowing
the rest of the data to be aggregated on demand. Obviously, this is most effective
when the most frequently requested slices of data are pre-aggregated and
stored. Making that determination can be quite challenging for DBAs.
DB2 Cube Views has introduced a very sophisticated Optimization Advisor that performs a cost/benefit analysis of a number of potential MQTs based on the multidimensional model, the anticipated workload type, catalog statistics, and block-level sampling of the data, as illustrated in Figure 4-1.
The extent of this cost/benefit analysis is governed by the administrator’s
specification of the amount of disk space available to store MQTs and the
maximum amount of time to spend on the sampling process.
Figure 4-1 The Optimization Advisor (inputs: cube model information, statistics and query types, data samples, and time and space limits; output: summary tables)
Distributive measures use simple aggregation functions such as SUM and COUNT
that can be aggregated from intermediate values. Non-distributive measures use
more complex aggregation functions, such as STDDEV and AVG, which must be
aggregated each time from the base tables.
Note: If AVG is needed in an MQT and will be aggregated further in a number
of queries, you may consider including SUM and COUNT as measures and derive
the AVG function from these values (SUM(SUM)/SUM(COUNT)) to avoid pushing the
query to the base tables.
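As a sketch of this technique (the sales measure column is hypothetical; the FACT_TABLE and TIME names follow the examples later in this chapter):

-- Store SUM and COUNT instead of AVG, so the MQT can be rolled up further:
CREATE TABLE DB2INFO.MQT_AVG_HELPER AS (
   SELECT t.month_name,
          SUM(f.sales) AS sum_sales,
          COUNT(f.sales) AS count_sales
   FROM FACT_TABLE f, TIME t
   WHERE f.time_id = t.time_id
   GROUP BY t.month_name
) DATA INITIALLY DEFERRED REFRESH DEFERRED;

A query needing the average at any coarser level can then compute SUM(sum_sales)/SUM(count_sales) instead of recomputing AVG from the base tables.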
DB2 supports several complex GROUP BY expressions that offer significant benefit
with MQTs. DB2 can view these complex aggregations as separate groupings,
thus allowing queries to use MQTs that are defined with groupings that are
supersets of those requested in the query. DB2 Cube Views exploits this
capability to reduce the total number of MQTs it recommends while still
accommodating a large number of potential queries. The two complex groupings
which DB2 Cube Views supports are GROUPING SETS and ROLLUP.
For example:
SELECT … GROUP BY GROUPING SETS ((store_id, product_id), (date_year, date_month))
is equivalent to:
SELECT … GROUP BY store_id, product_id
UNION ALL
SELECT … GROUP BY date_year, date_month
With the grouping set, the data is scanned only once whereas with the union, it is
scanned twice. This is very significant with MQTs in that we’re not only reducing
the requirement for scanning the data but also reducing the number of MQTs
being built.
For example:
SELECT …
GROUP BY ROLLUP (region_id,district_id,store_id)
is equivalent to:
SELECT …
GROUP BY (store_id, district_id, region_id)
UNION ALL
SELECT …
GROUP BY (district_id, region_id)
UNION ALL
SELECT …
GROUP BY (region_id)
UNION ALL
SELECT … (the grand total, with no grouping columns)
As a rule, DB2 Cube Views recommends the smallest number of MQTs possible
to enhance the performance of the largest number of queries. The number of
MQTs can be minimized by using these complex constructs because DB2 is able
to understand the separate sub-groupings that exist as part of the superset and
optimize a large number of queries that are not an exact match with the MQT
SELECT statement. DB2 Cube Views provides three significant advantages by
using these complex constructs:
Reduction in number of MQTs
Reduction in total disk space
Reduction in refresh times by minimizing base table scans
DB2 can use an MQT when the grouping is different from the requested grouping
if the MQT has a GROUP BY on a finer granularity than the request. For example, if
an MQT aggregates at the month level, DB2 can use that MQT to support
requests for data grouped by quarter or year. Therefore, it is unnecessary to
define multiple MQTs or slices for each different level.
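A sketch of this behavior, with a hypothetical sales measure and the TIME columns used in the examples in this chapter:

-- An MQT aggregated at the month level:
CREATE TABLE DB2INFO.MQT_BY_MONTH AS (
   SELECT t.quarter_desc, t.month_name, SUM(f.sales) AS sales
   FROM FACT_TABLE f, TIME t
   WHERE f.time_id = t.time_id
   GROUP BY t.quarter_desc, t.month_name
) DATA INITIALLY DEFERRED REFRESH DEFERRED;

A query that groups only by t.quarter_desc can be rerouted to this MQT, because DB2 can re-aggregate the month rows up to quarters.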
Thus, the DB2 Cube Views Optimization Advisor and the DB2 optimizer combine
to provide a very powerful summary table capability.
We will also discuss additional MQT options available in DB2 and considerations
for using them.
First, there are options you specify when you create MQTs, and then there are
options for maintaining them. We’ll begin by discussing the options for creating
them, and then cover maintenance.
4.3.1 MQTs in general
An MQT is created with a CREATE TABLE statement with an AS fullselect clause
and the REFRESH IMMEDIATE or REFRESH DEFERRED option. A summary table
additionally includes aggregation.
Example 4-1 is a simple example of the SQL to create and populate an MQT
aggregating data between the STORE and SALES tables.
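As a minimal sketch of such a statement (the column names follow the Sales star schema shown later in this chapter, and the measure choices are illustrative):

CREATE TABLE DB2INFO.SALES_BY_STORE AS (
   SELECT s.store_name,
          SUM(f.trxn_sale_amt) AS total_sales,
          COUNT(*) AS trxn_count
   FROM sales f, store s
   WHERE f.store_id = s.store_ident_key
   GROUP BY s.store_name
) DATA INITIALLY DEFERRED REFRESH DEFERRED;

REFRESH TABLE DB2INFO.SALES_BY_STORE;

The REFRESH TABLE statement performs the initial population of the table.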
You may optionally identify the names of the columns of the MQT. The column
names are required if the result of the full select has duplicate names or
unnamed columns.
After the table has been created, the MQT has to be synchronized with the base
tables. This is controlled via the refresh option. The System Maintained MQT
refresh options are summarized in Table 4-1.
Note: Since DB2 Cube Views only generates System Maintained MQTs, we do not cover User Maintained MQTs in this book.
Table 4-1 basically specifies what major work the DBA needs to do with the
MQTs after changes are made to the base tables. There are two cases:
One where changes are made with SQL
One where the base tables are loaded
There is, however, an important difference between the two options when a LOAD
INSERT is performed on the base tables:
Important: The MQTs can still be used in query rewrites when they are
created using the REFRESH DEFERRED option even though the base tables have
been reloaded with new and most likely changed data. This can produce
wrong results from queries as long as the MQTs are not in sync with the base
tables.
However, if the REFRESH IMMEDIATE option is used, the MQTs will be set to check pending and thus cannot be considered by DB2 as candidates for query rewrites. This ensures that queries will only use the base tables after a load, as long as the MQTs are not synchronized with them. An exception is an online LOAD (ALLOW READ ACCESS) on the base tables, in which case the MQT can still be considered as a candidate for query rewrite.
It should also be noted that because MQTs are instantiated as real tables, the
same guidelines apply to MQTs as for ordinary tables with regard to optimizing
access using tablespace definitions, creating indexes, reorganizing and
collecting statistics.
What this basically means is that your summary tables have to be refreshed
manually with a REFRESH TABLE <MQT tablename> statement in all cases except
where the summary table has been created with the REFRESH IMMEDIATE option
and the base table(s) are changed using regular SQL statements.
Since it is likely that the summary tables will be out of synchronization with the
base tables after the base tables are updated or changed, it is important to plan
for the maintenance of the summary tables in advance prior to putting them in
production. There are several reasons for this:
The main reason is that the base tables oftentimes are loaded and not altered
with SQL statements.
Only the SUM, COUNT, COUNT_BIG, and GROUPING aggregation functions are usable in a REFRESH IMMEDIATE MQT. Otherwise, the Optimization Advisor might change the CREATE TABLE statement of the MQT from REFRESH IMMEDIATE to REFRESH DEFERRED.
Further steps will be to plan for synchronization of summary tables with the base
tables as early as possible in the development cycle and to include it in the
normal database maintenance schedule. Detailed scripts are provided in
“Further steps in MQT maintenance” on page 198.
Note: DB2 does not permit an MQT to be built on another MQT. Therefore, if
your base data contains MQTs, they will not be considered by OLAP Center to
be part of the model.
Here are the steps to drop a summary table from a command line:
1. Connect to the database of the cube model whose summary tables you want to drop. For example, enter: db2 connect to RETAIL.
2. Enter the following command: DROP TABLE <table_name>, where table_name
is the name of the summary table that you want to drop.
Note: When rerunning the Optimization Advisor, you should manually drop
any old MQTs from the previous optimization not being recreated during the
new optimization. They will not be dropped automatically.
Over time, the number of MQTs suggested by the Optimization Advisor may
change. This means that if the number of MQTs decreases compared to an
earlier optimization, you need to manually drop the extra MQTs. The reason for
this is that the Optimization Advisor only creates DROP statements for those MQTs
that are being created.
4.4 What you need to know before optimizing
There are basically four things that you need to have or know before using
Optimization Advisor:
1. Get at least a cube model and one cube defined.
2. Make sure that referential integrity and informational constraints are in place
on the base tables.
3. Know or have an idea of the type of queries that will be used on the OLAP
database.
4. Understand how Optimization Advisor uses the cube model/cube definitions
and how they interact together to leverage query optimization.
As the Optimization Advisor is using the cube definition to optimize for some of
the query types, for example, drill through, a good practice would be to create at
least one cube as a subset of the cube model.
A cube model can be easily created using the OLAP Center graphical interface. It can also be imported via bridges from different partner tools (such as MetaStage, Meta Integration, DB2 OLAP Integration Server, and so on).
In addition to the foregoing reasons, this also allows the DB2 optimizer to exploit
knowledge of these special relationships to process queries more efficiently. If
the Referential Integrity constraints can be guaranteed by the application and
you do not want to incur the overhead of maintaining the constraints, consider
using informational constraints. Informational constraints are constraint rules that
can be used by the DB2 optimizer but are not enforced by the database
manager. This permits queries to benefit from improved performance without
incurring the overhead of referential constraints during data maintenance.
The DB2 Cube Views Quick Start wizard used to create the cube model also
requires referential integrity.
The DB2 optimizer (specifically for MQTs) only requires referential integrity for query rewrite purposes in the following situation: when the SQL statement being processed references fewer tables than the SQL statement used to create the MQT.
For example, consider that the MQT was created using the SQL statement in
Example 4-2.
The DB2 optimizer for the query in Example 4-3 does require referential integrity,
because it is not using all tables defined on the MQT.
FROM FACT_TABLE f,
STORE s,
TIME t,
SCENARIO y
WHERE
f.store_id=s.store_id and
f.time_id=t.time_id and
f.scenario_id=y.scenario_id
GROUP BY s.store_name, t.quarter_desc, t.month_name, y.scenario_name
The DB2 optimizer for the query in Example 4-4 does not require referential integrity, because it is using all tables defined on the MQT.
DB2 V8.1 introduced informational constraints, and they may be used as well.
Attention: If you use informational constraints, you must ensure that the data
is in fact accurately adhering to the constraints you have described.
Otherwise, you could get different results from the base tables than from the
MQT.
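One simple way to verify this for a join, sketched here for the SALES-to-PRODUCT join used later in this chapter (adapt the names to your schema), is to count orphan foreign key values before declaring the constraint:

-- Rows in SALES whose item key has no matching PRODUCT row:
SELECT COUNT(*) AS orphan_rows
FROM sales f
WHERE NOT EXISTS (SELECT 1 FROM product p
                  WHERE p.product_ident_key = f.item_key);

A result of zero orphan rows is exactly what the informational constraint asserts.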
In addition to the foregoing examples, also consider the star schema example in
Figure 4-2.
(Figure 4-2: a PRODUCT table with row (1, TV), a STORE table with row (1, San Francisco), and a FACT table with rows (1, 2, $300), (1, 1, $250), (2, 1, $100), and (2, 3, $100); the relationships between the tables are implied, not defined.)
Primary keys and foreign keys have not been defined for referential integrity, and
we create the MQT in Figure 4-3.
(Figure 4-3: an MQT over the FACT and STORE tables with columns SID and SALES, containing the rows (1, 350) and (2, 300).)
As you can see in Figure 4-4, DB2 does not include the non-matching rows in the MQT, so different queries can generate different results: a SELECT SUM(sales) FROM FACT produces a different result than a SELECT SUM(sales) FROM MQT1, even though the two queries are semantically equivalent.
(Figure 4-4: SELECT SUM(SALES) FROM FACT returns $750 using the base table but $650 using MQT1. The STORE table contains (1, San Francisco) and (2, New York); the FACT rows (1, 2, $300), (1, 1, $250), (2, 1, $100), and (2, 3, $100) include a store key, 3, with no matching STORE row, so that row is missing from MQT1, which contains (1, 350) and (2, 300).)
Similar issues exist with NULLS as with constraints, and for that reason DB2
Cube Views requires that foreign keys be created as non-nullable. If a foreign key
is nullable, DB2 assumes that it could contain NULLS. If all foreign keys are
nullable, an MQT will only be used if the joins in the MQT exactly match with the
joins in the query. In this case, many MQTs would be created in order to optimize
the model. Therefore, DB2 Cube Views requires non-nullable foreign keys to
avoid an explosion of the number of MQTs.
In order to enforce the optimization rules on this star schema, you need to define
constraints on each of the facts-to-dimension joins as shown in Figure 4-5.
Several rules define each of these joins. You can use informational constraints
only for foreign key constraints.
(Figure 4-5: the Sales star schema. The SALES fact table (COMPONENT_ID, CONSUMER_ID, DATE_KEY, ITEM_KEY, STORE_ID, TRXN_QTY, TRXN_SALE_AMT, TRXN_COST_AMT, PROMO_SAVINGS_AMT, CURRENT_POINT_BAL) joins to the Campaign table, the Consumer table (CONSUMER_IDENT_KEY, FULL_NAME, GENDER_DESC, AGE_RANGE_DESC), the PRODUCT table (PRODUCT_IDENT_KEY, ITEM_DESC, SUB_CLASS_DESC, CLASS_DESC, SUB_DEPT_DESC, DEPT_DESC), and the STORE table (STORE_IDENT_KEY, STORE_NAME, AREA_DESC, DISTRICT_DESC, REGION_DESC, CHAIN_DESC, ENTERPRISE_DESC).)
For example, for the join between the Product and SALES tables, you must
define constraints for the following rules:
Product_Ident_Key is the primary key in the Product table.
Product.Product_Ident_Key and SALES.Item_Key are both non-nullable
columns.
SALES.Item_Key is a foreign key referencing Product.Product_Ident_Key.
The join cardinality is 1:Many (Product.Product_Ident_Key :
SALES.Item_Key).
The join type is inner join if summary table optimization is needed. One way these rules might be expressed in DDL is sketched below.
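As a sketch, assuming the key columns are already defined as NOT NULL (the constraint names are illustrative, and the informational variant is shown for the foreign key):

ALTER TABLE product
   ADD CONSTRAINT pk_product PRIMARY KEY (product_ident_key);

ALTER TABLE sales
   ADD CONSTRAINT fk_item FOREIGN KEY (item_key)
   REFERENCES product (product_ident_key)
   NOT ENFORCED ENABLE QUERY OPTIMIZATION;

Omitting NOT ENFORCED would create an enforced referential constraint instead.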
The examples supplied are from the Sales cube model and the lines denote the
portions of a cube model that each query type accesses.
For example, in Figure 4-6, a user might start by accessing the sale value for all stores of all products for the year 2002. Then the user can move deeper into the data by querying for sales by quarter in all stores for all products. Performance is usually very important for these types of queries because they are issued in real time.
For the drill down query type, the Optimization Advisor optimizes based on the
cube model and not on the cubes defined on the model. The Optimization
Advisor recommends summary tables that aggregate data at the top of the cube
model. Using the Optimization Advisor for optimizing for the drill down query type
will benefit queries that access the top levels of the cube model.
(Figure 4-6: drill down queries access the top levels of the dimension hierarchies: Time (Year, Quarter), Store (Enterprise, Chain, Store), Consumer (Gender, Age Range), Product (Department, Sub Department), and Campaign (Campaign Type, Campaign Component).)
Accessing the top level data without these summary tables in place will require
repeated queries and numerous computations to be done on the base data. With
the summary tables that pre-compute the aggregations at the top level, there will
be considerable performance improvement.
Report queries
Report queries can hit anywhere in the cube model, but they usually tend to favor
the top and middle of the hierarchies. For example, as depicted in Figure 4-7, a
user might access the sale value of each item for all stores for the month January
2002. Then the user might access the sale value for each store area by product
class for each month in the year 2002.
(Figure 4-7: report queries tend to access the top and middle levels of the same Time, Store, Consumer, Product, and Campaign hierarchies.)
For the report query type, the Optimization Advisor optimizes based on the cube
model and not on the cubes defined under the model. The Optimization Advisor
recommends summary tables that aggregate data from the top, down towards
the middle of the cube model. Query performance is usually not as critical for
report queries as for drill down queries because a user is less likely to be waiting
for an immediate response to each individual query. If optimization for many
query types will be required and space is at a premium, you should consider the
inclusion of report optimization last.
Extract queries
Extract queries access only the base level of a cube defined for the cube model
and are used to load data into a Multidimensional OLAP (MOLAP) data store.
Data aggregated to the base level of the cube is loaded into a MOLAP
application for further processing. For example, the Quarter-Chain-Age
Range-Sub Department-Campaign Type in Figure 4-8 represents the base level
of a cube defined for the cube model.
(Figure 4-8: the base level of a cube defined for the cube model: Quarter (Time), Chain (Store), Age Range (Consumer), Sub Department (Product), and Campaign Type (Campaign).)
Extract query optimization is based on the bottom slice of the cubes defined for
the cube model. Performance improvements will vary depending on how close
the base level of the cube is to the bottom of the cube model. The higher the slice
is on the cube model, the higher the expected performance improvements are.
Accessing the higher level data without these summary tables in place will require repeated and costly queries to get the base data for the cube. With summary tables that pre-compute the aggregations at the base level of the cube, there will be considerable performance improvement.
The cube defined on the cube model should logically map to the MOLAP cube to
which you want to load the data. Theoretically, there will be an MQT generated
for each cube defined against the cube model.
Consider having a MOLAP outline (Example 4-6) which maps to the cube in
Figure 4-8.
The extract query for the MOLAP cube in Example 4-6 requires the data at the
base level of the cube as given in Figure 4-8. The aggregation for the higher
levels in the MOLAP cube will be performed by the MOLAP application based on
the base level data.
(Figure: the Time, Store, Consumer, Product, and Campaign hierarchies, with drill through queries accessing the levels at and below the base level of the cube.)
Drill through query optimization is based on the cubes defined for the cube
model. The Optimization Advisor recommends summary tables that aggregate
data for a few levels at the bottom of the cube. The level of aggregation is based
on the disk space availability. Using the Optimization Advisor for optimizing for
drill through query type will benefit queries that frequently access the relational
data below the MOLAP cube.
What this means, for example, is that if drill through and extract both are
specified, only drill through will be done. If drill down and drill through are
selected, only drill down will be done.
Overcoming the urge to create multiple cube models can be challenging when
one faces the prospect of having to cater to diverse queries, some of which
exploit the top, most aggregated, part of the cube while others go to the lowest
level of granularity on one or more dimensions. Generally, however, the
Optimization Advisor will provide reasonable optimization for any combination of
query types specified provided that the defined cube or cubes under the cube
model fairly accurately reflect the query types.
For example, if you want to optimize for both extract and drill through, it is generally sufficient to create a logical cube in DB2 Cube Views for each HOLAP cube or MOLAP cube, and the Optimization Advisor should provide reasonable optimization of both the hybrid queries and the extract queries. The reason is that both extract and drill through optimize for the bottom of the defined logical cube in the same way, with the small difference that drill through adds additional slices near the bottom of the cube to the MQTs, favoring dimensions with high cardinality. However, if measured performance is still not sufficient after optimization, you can consider defining an additional cube to provide a hint to the Optimization Advisor.
Note: Care should be taken if you are making extracts (or drill throughs) that go to the bottom of the cube model. Creating a cube that is basically a copy of the cube model and running the Optimization Advisor on it will produce an MQT, but since the MQT will be comparable in size to the fact table, the DB2 optimizer will most likely not reroute the query to the MQT, even though the MQT is aggregated at a lower level on all relevant dimensions and is thus a candidate for a query rewrite. The reason for this is that the cost of going to the base table will be equal to or less than the cost of going to the MQT.
Cubes are defined and created for business reasons. They satisfy the business requirements of a particular project, whereas the cube model represents the whole subject area. Cubes are not required for drill down and report query patterns. For drill down, the Optimization Advisor builds aggregates from the topmost levels down, attempting to reach as far as the middle levels of the dimensions. The same goes for report style optimization. The difference is that report style optimization does not include rollups at the top of the logical cube.
Drill down: top and middle levels (includes rollups at the top levels)
Drill through: queries at the bottom levels of the cube (includes additional low-level slices)
Cubes must be defined when optimizing for drill through and extract query types
to reflect the data that these queries will likely access. If you perform both extract
and drill through queries for a particular cube model, you should build two cubes:
one designed for extract queries, and a second designed for drill through
queries. For a further discussion on optimization for different query types, please refer to “Define the query types” on page 160.
When considering the number of cubes to build, note that the more cubes the Optimization Advisor must consider, the more MQTs will be created (when specifying query types that optimize based on cubes). It may seem that the more MQTs we build, the better chance DB2 will have of doing query rewrites and thus improving performance, but this may not always be the case.
Therefore, it is important to consider whether the MQTs created are in fact all
needed: not only because they consume disk space, but also because they
extend the SQL query compilation time of all queries going to the database.
The Optimization Advisor depends heavily on the metadata supplied to it. Therefore, it is also important to know that you can improve the recommended summary tables by selecting which measures and attribute relationships you want to include in a particular cube model.
If you are very limited on disk space, you might choose to include only critical
measures in your cube model and drop any measures that you do not expect to
use regularly. Excluding measures from the cube model may, however, have an
impact on front-end tools, so this action should be chosen judiciously.
It should be noted that distributive measures (like SUM or COUNT) are handled very well when you do rollups, while non-distributive measures have to be calculated from the bottom of the cube up. This means that even though you have an MQT at a lower level in the hierarchy of the cube, it cannot be used if the non-distributive measures in the MQT are not aggregated to the exact level needed by the query. From an MQT point of view, this means that in a space-constrained environment, you should first seek to eliminate non-distributive measures from the cube model before looking at the distributive ones.
4.5 Using the Optimization Advisor
Basically, the Optimization Advisor is designed to create MQTs based on the
cube model, the defined cubes, the size and analysis time constraints, and the
base tables and their statistics.
Important: The value proposition of the Optimization Advisor is that the DBA
will only have to decide whether to use MQTs or not. Once the decision is
made, the Optimization Advisor will build the MQTs without further effort.
This contrasts with earlier times, when the MQTs would have to be built by hand, often without the expected performance improvement, or at the cost of many hours of analysis and reiterative efforts.
In the following sections we’ll take you through the necessary steps of using the
Optimization Advisor.
The Optimization Advisor thus analyzes the information that you provide to the
wizard along with the metadata of the cube model, the DB2 statistics, and any
data sampling of the base tables that you allow (given a time constraint). The
result is a recommendation of which summary tables and summary table indexes
should be created. The choices that you provide for each parameter in the wizard
affect the summary tables the wizard recommends and ultimately the
performance that you gain.
In general, the Optimization Advisor will try to limit the recommendation to the
smallest number of summary tables possible while seeking to avoid impacting
the resulting performance of the queries.
Figure 4-10 Optimization Advisor wizard (inputs: the disk space limitation and the optimization time allowance; output: summary tables)
The following list describes the information that you must provide to the wizard:
The type of queries expected on the cube model: This helps the Optimization
Advisor understand which portions of the cube model are queried most
frequently.
Disk space limitations: This helps the Optimization Advisor to recommend
summary tables that in aggregate do not exceed the maximum allowable disk
space.
Time limitations: This is the maximum amount of time that the Optimization Advisor can use to sample the base tables and produce a recommendation.
If you allow data sampling, the Optimization Advisor examines the base data
(fact and dimensions) to get an estimate of how big a given grouping of data
would be.
In addition to the information provided to the wizard, the Optimization Advisor
analyzes the following information to create recommendations:
Cube model metadata, which includes the cube model, the cubes defined
based on the cube model, the measures, the attribute relationships, etc.
DB2 statistics, including the number of records, number of pages and average
record length.
Data sampling information (if you allow data sampling, which we recommend
highly), which includes overall trends and exceptions in the data. This is also
known as data sparsity.
As a result, the Optimization Advisor produces two SQL files that can create and refresh the recommended summary tables. If you choose, you can change the files, but generally we recommend running them unchanged.
Note: Before optimizing a cube model using the Optimization Advisor, you
must have DB2 constraints specified for the base tables used in the cube
model. Constraints must be specified between each fact table and dimension
table and between each dimension table in a snowflake schema. The
constraints must be specified on nonnullable columns.
The Optimization Advisor wizard is launched from the OLAP Center. There are
two ways to invoke this; one is illustrated in Figure 4-13. All screens used to
perform this task are shown to enable you to see all of the options that are
available through the wizard.
Consider the view of the Sales cube model in Figure 4-11 as displayed in the
OLAP Center.
A Sales cube is defined based on the Sales cube model seen in Figure 4-11 with
the same 5 dimensions, except that the hierarchies are a subset of the cube
model hierarchies. Figure 4-12 shows the view of the Sales Cube as displayed in
the OLAP Center.
Figure 4-12 Sales Cube
It is not our intention to walk you through the cube model and cubes, but it is
important that you have determined that the basic cube model and cubes are in
place before you start the optimization process.
The next two things we need to plan for before running the Optimization Advisor
wizard are:
How much disk space we have for the summary tables and summary table
indexes
How much time we have to generate the recommendations
These are important because the Optimization Advisor will try to create the recommendations that are most appropriate within the constraints that you have specified. For our scenario, assume we have no time limit and have a disk space limitation of 8 gigabytes.
In the following sections we will go through the Optimization Advisor wizard for
the sample scenario and analyze the Summary table recommendations.
Figure 4-13 Menu selection to access Optimization Advisor
1. Define query type: Specify the type or types of queries expected to be performed most often on the cube model. The available types of queries are: Drill down, Report, Drill through, and Extract. Refer to section “Define the query types” on page 160.
2. Specify limitations: Specify the available disk space for the summary tables and indexes that will be built. Specify if you want to allow data sampling. Also specify the maximum amount of time you want to allow for the Optimization Advisor to determine recommendations. The more space, information, and time that you specify, the more significantly your performance results will improve. Refer to section “Specify disk space and time limitations” on page 161.
3. Specify file names to store the SQL scripts: Enter a unique file name in both the Create summary tables SQL script field and the Refresh summary tables SQL script field. Refer to section “Specify file names to store the SQL scripts” on page 166.
4. Save the recommended SQL scripts into the file names specified and close
the Optimization Advisor wizard.
5. Run the SQL scripts. If you are creating large summary tables, building the
summary tables might require a substantial amount of time to complete. You
can use the DB2 Command Center or Command Window to run the SQL
scripts. You need to have the following privileges to run the SQL scripts:
– CREATEIN, DROPIN on schema DB2INFO
– SELECT and ALTER (or CONTROL) on base tables
Here are the steps to run the SQL scripts from the DB2 Command Window:
a. Change to the directory where you saved the SQL scripts.
b. Connect to the database of the cube model that you optimized. For
example, enter: db2 connect to RETAIL.
c. Enter the following command:
db2 -tvf filename
where filename is the name of the create summary table SQL script.
You can run the refresh summary table SQL script anytime, depending on
how current you want the data in the summary table to be with the base
data, to synchronize the summary tables with the change in base data.
You must choose at least one query type. The Optimization Advisor optimizes
based on the cube model or the cubes defined on the cube model based on the
query type selected. The options, Drill down and Report, are selected by default.
For our sample scenario, since we want to optimize for drill through type of
queries, choose the option Drill through in this page and click Next to continue.
The following factors need to be considered before specifying the available disk
space:
The query performance levels that you want
The number of cube models that you are optimizing for
How critical each cube model is
How frequently each cube model is used
The availability and cost of the disk space
Disk space for summary tables (as a percentage of the fact table size) versus expected performance benefit:
5%: Medium
10%: High
Unlimited: Highest
If you want to specify no limit on the disk space for summary tables, select the option Unlimited disk space available in the wizard (Figure 4-15). Alternatively, you can specify a disk space limit by choosing the option Maximum disk space allowed and specifying the available disk space in MB or GB.
For our sample scenario, since we have a disk space limitation of 8 Gigabytes,
choose the option Maximum disk space allowed and specify 8 and choose the
option GB from the drop down list.
In the case where the Optimization Advisor is not allowed to perform sampling, DB2 Cube Views must rely on the current statistics in the catalog. Because those statistics describe individual tables and not the intersections of values among them, the resulting recommendations will be less precise.
If you allow data sampling, the Optimization Advisor will examine the data in the
cube model to get more information so that it can create the most effective set of
recommendations that will match the available disk space. By default, Data
Sampling is selected and no limit is set for the time to do the data sampling.
In the Optimization Advisor wizard (Figure 4-15), you can choose to allow Data
Sampling by selecting the option. If you allow Data Sampling, then you can
choose to specify unlimited time for the data sampling process by selecting the
option Unlimited Time Available. Alternatively, if you want to specify a time limit
for the data sampling process, select the option Maximum Time Allowed and
specify a maximum time limit in Minutes or Hours.
For our sample scenario, since we want to allow Data Sampling with no time
limitation, select the Data Sampling option and choose the Unlimited Time
Available option.
Restriction: The sampling done by DB2 can only be performed on tables and
not views. This means that if a fact table is composed of several tables
overlaid with a view, and the view is specified in the cube model as the fact
table, the sampling will fail.
For our sample scenario, since we have non-distributive measures, choose the
DEFERRED option.
Specify tablespaces
You can specify different tablespaces for storing the summary tables and the
summary table indexes. The tablespaces defined under the DB2 data source are
listed for you to choose from. The SQL for summary tables and the indexes will
refer to the selected tablespaces. The summary tables are generally wide, so it is recommended to use a tablespace with a large page size to store the summary tables.
Click Next to have the wizard determine the recommendations for creating and refreshing the summary tables. This might take some time depending on the volume of data being handled.
For our sample scenario, choose any tablespace from the drop down lists.
Specify file names to store the SQL scripts
In the summary page of the wizard in Figure 4-17, specify unique file names for
the create summary table SQL and the refresh summary table SQL scripts. You
can view the Create or Refresh SQL that is recommended to optimize the model
by clicking the Show SQL button.
You can see more information about the recommended summary tables that the
SQL will create by clicking the Details button. The following details will be
shown:
Expected disk space usage by summary tables — see Figure 4-18. See also
Table 4-4 on page 162 for some recommendations for summary table disk
space usage as a percentage of the fact table.
The reason to use DEFERRED when IMMEDIATE is specified — see
Figure 4-19. The full text of the DEFERRED refresh message is: “[OC7201]
The "DB2INFO.MQT0000000002T01" recommended summary table will use
DEFERRED refresh because one or more nullable attributes were found as
columns in the fullselect of this recommended summary table.”
The Optimization Advisor creates one or more summary tables. For summary
tables created with the DEFERRED update option, the create summary table
SQL and the refresh summary table SQL are the same. The DEFERRED option
drops the previously created summary tables instead of applying the delta to the
original data. This improves the performance.
For our sample scenario, specify the file names for the create summary table
script and the refresh summary table script as c:\sales_drillthru_createmqt.sql
and c:\sales_drillthru_refreshmqt.sql respectively. Click Finish to save the
scripts and close the Optimization Advisor wizard.
All that remains to be done is to run the resulting create script in a DB2 command window: db2 -tvf <DDL script>.
It is clear that refreshing the MQTs is a fairly simple operation in this case.
Note: The Optimization Advisor should be re-run, and the MQTs re-optimized, periodically when the data source changes significantly in size.
As can be seen in Example E-1 on page 694, the MQT creation (and sometimes the refresh) scripts are quite large and would take considerable effort to create by hand. With the Optimization Advisor, this effort is now more or less eliminated.
Moreover, since the DBA often has no query workloads to work from — for
example, in cases where the datamart is being implemented and not in
production yet — the Optimization Advisor will be able to provide a set of MQTs
that enhance the performance of the data mart for a number of different queries
based on the actual design of the database and the expected workload.
Note that having created and run the MQT creation script, a plan should be
established for running the refresh script at appropriate times. In practice, this is
often done right after the base tables have been updated or reloaded, for
example, in a designated service window or at times with low workload. Deferring
the refresh to a later time — especially in cases where the MQTs have been
created with refresh DEFERRED — will introduce inconsistencies between the
MQTs and the base tables if the base tables have been updated using load
insert. This situation should be considered carefully before it becomes practice.
Please refer to “MQTs: a quick overview” on page 131 for a further discussion on
this subject.
After creating MQTs, we need to consider what activities should follow. This is
covered in the next section.
The steps described in Figure 4-20 propose a method for creating MQTs and
keeping them aligned with the workload on the database.
(Figure 4-20: an iterative review process: after the initial creation, use DB2 Explain SQL on a representative subset of the query workload, check the MQT DFT_REFRESH_AGE setting, and check whether your queries follow the Advisor's recommendations; if needed, envisage changing the cube model.)
The process depicted in Figure 4-20 basically shows what general steps to take when reviewing the MQT implementation, including the initial creation of the MQTs and the iterative process of maintaining them.
The process as depicted is simplified and thus only includes steps that directly
pertain to the creation and maintenance of the MQTs. This means that normal
table maintenance such as reorganization and collecting statistics is not
included, other than the initial statistics creation, as can be seen in the upper
left-hand corner of the figure.
Prior to going into full production and during the initial creation of the MQTs, we
suggest, if possible, that you start with a subset of the data for the base tables
and run an iteration or two across the reduced data. This assumes that a
meaningful query workload can be obtained at this stage. The advantage of this
approach is that you can create the MQTs fairly quickly and determine whether
they perform as expected or whether changes are required.
If we look back at Figure 4-20 on page 169, we can transform this into a little
more detailed checklist:
1. Create a subset of the data for your base tables. This will initially reduce
loading and refresh times of the base tables and MQTs as well as reducing
any query times when creating the query workload.
2. Create a cube model and any relevant cubes. Obviously this is needed, as
the Optimization Advisor depends on the metadata to suggest MQTs.
3. Run statistics on the base tables. This is especially important prior to running
the Optimization Advisor and creating the MQTs for the first time, since
without the statistics this process can be prolonged significantly as well as
produce suboptimal MQTs.
4. Use the Optimization Advisor to create the MQTs.
5. Use DB2 Explain SQL on a representative subset of the query workload. The
purpose of this is to see whether the query workload has changed since the
MQTs were created or the assumptions about the workload were incorrect.
6. Validate if the query is using MQTs:
a. Make sure that DFT_REFRESH_AGE is configured for the type of MQT being
used (0 for Immediate and Any for Deferred/Immediate)
b. Make sure that the query matches the requirements to use the MQT.
(More details are given in DB2 UDB’s High Function Business Intelligence
in e-business, SG24-6546, in section 2.7, “Materialized view matching
considerations”.)
c. Check that primary keys and foreign keys are defined on dimensions and
fact tables.
d. Check that statistics are updated on MQT and base tables and run them if
not.
e. Make sure that the MQTs are not in check pending state.
Going through the list above, some tools might come in handy, as shown in
Table 4-5.
How deep down in the hierarchies do the MQTs go? Save your output from the Optimization Advisor, or use the DB2 Control Center: right-click the MQT table and select Generate DDL.
Are the DB2 parameters correctly set? Use the DB2 Command Window and the DB2 Control Center.
In the following sections we will elaborate and expand on the toolbox provided by DB2 to help monitor MQTs.
4.6.1 What SQL statements are being run?
After the Optimization Advisor has been run and the MQTs have been created,
we should check to make sure that DB2 is actually using them. The first step in
this process is to determine what SQL is being run. Most of the front-end
Business Intelligence and reporting tools have facilities for displaying and/or
saving the SQL submitted to DB2. Retrieving the SQL from these tools may not
be practical, however — especially in the cases where they are only available
from the user workstations.
A way for the DBA to make certain that the SQL is stored for analysis is by capturing dynamic SQL using DB2's Snapshot Monitor. The DB2 Snapshot
Monitor captures both the statement text and the statistics pertaining to the
statements, including number of executions, number of rows read and updated,
and the execution times. This will provide enough information to the DBA for
further analysis into the types and frequency of the queries as well as the
statements themselves for access path analysis. Note that the statements
captured by the DB2 Snapshot Monitor are those originally submitted to DB2.
They do not reflect any potential rewrite by the optimizer.
To get the statements from DB2, you attach to the instance and query the DB2
Snapshot Monitor:
ATTACH TO <instance> USER <userid> USING <userid>
GET SNAPSHOT FOR DYNAMIC SQL ON <database name>
This will provide point-in-time information from the SQL statement cache for the
database.
You can also use the new TABLE function to specify the elements of interest
from the snapshot. For example:
SELECT SNAPSHOT_TIMESTAMP, ROWS_READ, NUM_EXECUTIONS,
PREP_TIME_WORST, TOTAL_EXEC_TIME, STMT_TEXT
FROM TABLE (SNAPSHOT_DYN_SQL ('database name', -1))
AS SNAPSHOT_DYN_SQL
Usually the queries run against a star schema are run as dynamic SQL.
However, in the event that there are static SQL statements issued against the
database, these statements can be retrieved from the SYSCAT.STATEMENTS
catalog view of the database. For example:
SELECT TEXT FROM SYSCAT.STATEMENTS
Important note: MQTs are never considered when static embedded SQL
queries are executed.
Before using Explain, the Explain tables must be created. The Explain tables
capture access plans when the Explain facility is activated. You can create them
using table definitions documented in the SQL Reference, Volume 1, or you can
create them by invoking the sample command line processor (CLP) script
provided in the EXPLAIN.DDL file located in the 'misc' subdirectory of the sqllib
directory. To invoke the script, connect to the database where the Explain tables
are required, then issue the command:
db2 -tf EXPLAIN.DDL
Explaining the chosen optimizer path can also be done through the Control
Center. A feature of this tool is that if the Explain tables have not been created,
they will be created for you. To access the tool, you can right-click the database
you want the SQL explained for and you will see it, as shown in Figure 4-21.
Figure 4-21 Explain SQL
Selecting Explain SQL will produce a new window as shown in Figure 4-22 (possibly preceded by a message stating that the Explain tables have been created, if they were missing) where the SQL statement to be explained can be entered.
The advantage of using this approach to explaining SQL statements is that the access path chosen by the optimizer is displayed graphically, which makes it easier to quickly determine whether, for example, query rewrites are performed. In the following example a query is explained graphically. The result set is displayed at the top of the graphic, and the tables from which the information is retrieved are at the bottom of the graphic.
Between the top and the bottom there are a number of different boxes, each
describing the actions DB2 performs to get from the base data to the result set.
In the case depicted in Figure 4-23 we see a simple scenario, where DB2
accesses the DB2INFO.MQT0000000001T02 Materialized Query Table to get
the needed information using a table scan (TBSCAN). You can double-click each
box to get a detailed explanation of each step, but often the Access Plan Graph
provides enough information to determine where the problem is located.
Notice in Figure 4-23 that the cost timeron tally is displayed in every box in the
graph. Often, if queries take too long, for example, the problem can be located to
the place in the graph where the tally increases dramatically compared to the
rest of the graph.
As can quickly be seen from the timeron tally, the table scan of the
STAR.CONSUMER_SALES is, not surprisingly, very expensive. It is also easy to
see that there are no MQTs being used.
We will not attempt to go into more detail about DB2 Visual Explain. Instead, we
refer to the Visual Explain Help Guide, which provides detailed explanations of
each of the elements displayed in the graph as well as providing an insight into
how the graphs should be interpreted. A quick way of doing this is to double-click
a box in the Access Plan Graph and click the Help button. From here, there is a
specific description, as well as a general one, of what you see.
Basically, what the grouping sets describe is how the MQT is to be aggregated
across the dimension hierarchies. If you compare this to the cube model in DB2
Cube Views, you will be able to map where the MQT aggregates to, in each of
the dimensions.
Once you have the grouping sets and the cube model, you can determine the
hierarchies, by looking again in DB2 Cube Views, as shown in Figure 4-25.
Figure 4-25 A cube model hierarchy
Continue this analysis for any other grouping sets and MQTs, and you will have
determined all the slices made by the MQTs in your cube.
For an example of how you can visualize such a slice, please refer to Figure 4-6
on page 144, where two very shallow slices are depicted across the cube
dimensions.
Now that we know how deep into the cube the MQTs go, we have a good
foundation for determining how well any given query workload matches the
MQTs built.
The analysis of such a query workload lies outside the bounds of this book, but a way to get started would be to find the queries that do not reroute to the MQTs (use Explain for this) and see if you can group them into families depending on which dimensions they make use of and how deep down the dimensions they go. Now order the families by size and by how many hierarchy levels they are from the closest slice of any MQT that covers the needed dimensions. Take the largest families and the families that are only one or two hierarchy levels from being able to use an MQT, and explore the cost of changing your MQTs (for example, by using Cube Views) to match those query families. Continue until a large enough percentage of queries route to the MQTs, or until other constraints, such as disk space, are reached.
The EXPLAIN output will indicate if the indexes are being used. If they are not, use
for example the Index Advisor (db2advis) to get recommendations on indexes to
benefit your workload.
Note that only one of the three input options can be used: [-s, -i, -w].
Create an input file called db2advis.in with the 5 lines provided in Example 4-9.
Since the Index Advisor can use a file with the queries as input, along with
frequency indications, it is ideally suited for our needs, since we are already
working with a query workload for which we want to optimize the database.
See db2advis in the Command Reference for more information on the Index
Advisor.
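As a sketch (the exact options are described under db2advis in the Command Reference), an invocation using the db2advis.in input file and our RETAIL database might look like:

db2advis -d RETAIL -i db2advis.in -t 5

where -t 5 limits the advise time to 5 minutes.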
Important: Always make sure that RUNSTATS have been run on the base tables
prior to building an MQT. DB2 depends on the statistics for accessing the
base tables, and the REFRESH time of the MQTs may be extended
considerably if the statistics are not present or current.
Make sure the DB2 special registers are set to allow the DB2 optimizer to consider the types of MQTs you are interested in. The relevant register is CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION: it controls what types of MQTs can be used for optimization. The default is SYSTEM, while NONE will disable query rewrites.
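For example, the following statements (a typical setting; see the SQL Reference for all permitted values) allow system-maintained MQTs, including REFRESH DEFERRED ones, to be considered:

SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION SYSTEM;
SET CURRENT REFRESH AGE ANY;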
Note that the CURRENT REFRESH AGE special register must be set to a value other
than zero for the specified table types to be considered when optimizing the
processing of dynamic SQL queries.
In cases where a SET statement has not yet been executed, the special registers are determined by the value of the database configuration parameters. The database configuration parameters can, for example, be viewed and changed from the DB2 Control Center. Right-click the database and select Configure Parameters.
The special register values described above can be mapped to the database
configuration keywords as shown in Table 4-6.
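For example, the following command (assuming the RETAIL database from our scenario) sets the default refresh age so that REFRESH DEFERRED MQTs can be considered for dynamic SQL:

db2 update db cfg for RETAIL using DFT_REFRESH_AGE ANY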
Should you want an MQT not to be included in the DB2 optimizer's efforts to reroute queries, you can issue an ALTER TABLE statement with the option DISABLE QUERY OPTIMIZATION. The materialized query table will then not be used for query optimization. The table can still be queried directly, though.
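For example, using one of the generated MQT names from our scenario:

ALTER TABLE DB2INFO.MQT0000000001T02 DISABLE QUERY OPTIMIZATION;

Issuing the same statement with ENABLE QUERY OPTIMIZATION makes the MQT a rewrite candidate again.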
In all cases a REFRESH TABLE <MQT tablename> statement should clear the CHECK
PENDING state.
In general, we get the largest performance benefits at the top of the cube, where the measures are highly aggregated and the MQTs have few rows, compared to lower down, where the aggregations are smaller and the MQTs have more rows. We are, however, also helped by the fact that MQTs denormalize the base data (often further) and thus, at the expense of disk space, eliminate many joins, which are often very costly. This means that we see substantial benefits from MQTs even when the aggregation is fairly low. In the tests we made, we saw substantial performance benefits even where the MQTs built had slices that went through the lower half of the cube model dimensions (guided by our cube definitions).
It is, however, difficult to make firm recommendations as to how low you can go in the cube before the performance benefit becomes small. One of the main reasons is that it is very difficult to say anything about data sparsity for a given set of base data. In addition, data sparsity varies greatly between sets of base data. The point of diminishing returns must, therefore, be determined iteratively, given there are no other constraints such as time or space. It is our experience that, after some initial experimentation, the basic recommendations of the Optimization Advisor performed quite well within a few iterations, given that we had no initial query workload with which to qualify our estimates. It was, however, fairly clear that the better an idea you have about the query workload, the better matched the MQTs will be.
This optimization will obviously not always be the right approach. In order to get
an idea about what effect the Optimization Advisor query type specification has,
Table 4-7 is presented.
Drill down: The focus of the optimization is at the upper parts of the hierarchies, and rollups may be done.
Report: Like drill down, but without the top-level rollups.
Drill through: Like extract, but with additional slices near the bottom.
Since most cases need cubes specified under the cube model in the OLAP
Center, it can be tempting to build a lot of them for any case that comes to mind.
This is not recommended. Generally it is advisable to create the cubes
predominantly for business reasons and to limit the number of cubes to a small
number, preferably less than a handful for big cases.
Avoid creating multiple cube models on the same base tables. You will have
difficulty maintaining and synchronizing the metadata between the cube
models and you will most likely not have one complete set of metadata
describing the entire data either.
Resist the urge to build MQTs that go to the bottom of all dimensions (by
building a cube reflecting the entire cube model). Even though any query
could, in theory, be routed to the MQT, DB2 will most likely discard it because
the cost of going to the MQT will be higher than going straight to the base fact
table. The end result will be a lot of wasted disk space.
If you are not getting the MQTs you want and changing the existing cubes
does not work, try adding a cube to the cube model to provide additional hints
to the Optimization Advisor.
The examples in the following sections are taken from query workloads
generated by various OLAP tools, but for simplicity's sake we have selected a
number of them that all use the same MQT, built for drill down queries. The
script to create and refresh the MQT is provided in “MQT” on page 694.
Note that the MQT is specified with REFRESH DEFERRED. This was a deliberate
selection by us when running the Optimization Advisor and was done in our test
setup for flexibility reasons. We thus avoid placing the MQT in CHECK PENDING
state if changes are made to the base table. This is, however, a choice the
DBA must be very careful about using in a production environment.
Having inconsistencies between the base tables and the MQTs can result, from
the user's point of view, in getting different results depending on whether the DB2
optimizer chooses to use MQTs or not. Since the optimizer behavior is
transparent to the end user, confidence in the data mart or data warehouse can
quickly be lost even when inconsistencies are quite acceptable from a theoretical
point of view. Database inconsistencies are, in practice, unnerving to end
users, even more so when using MQTs, since the end user has no way of
knowing when an MQT is being used. The problem persists even in cases
where the end users are SQL literate, because the DB2 optimizer's ability to
rewrite the SQL hides the use of MQTs.
Now suppose the user does a query on the quarter level again (all the time
knowing that the sums on quarters are off by a fraction) but this time adds region
to the query, which by chance is not in the MQT. If the user now sums the
numbers for the regions, he will find that they match the sum for the three months
he did earlier, but not the sum for the quarter. Apparently the sums for quarters are
now correct? Yes, but only because the DB2 optimizer chooses to go to the base
tables, since region has not been aggregated in the MQT.
This behavior is not very easy to predict, especially if there are multiple MQTs
and the queries span many dimensions. Often it is better to have the MQTs
invalidated when the base tables are updated, and to experience a performance
impact until the MQTs have been refreshed, than to have a period where the users
cannot depend 100% on their results.
With the MQT listed above in mind, we will now take a look at the query examples.
Doing an explain without the MQTs yields the situation shown in Figure 4-26.
Figure 4-26 The top five most profitable consumer groups without MQTs
As can be seen in Figure 4-26, the cost of running the query without the query
rewrite is 766,026.56 timerons.
Now, in Figure 4-27, we try the same query on the same data, but with the MQTs
in place.
Figure 4-27 tells us that the cost of the query is now 13,842.65 timerons, a
performance improvement of much more than an order of magnitude.
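If you want to verify the rerouting yourself, one approach (a sketch, assuming
the explain tables have already been created) is to populate the explain tables
for a query against our sample STAR schema and then inspect the access plan,
for example with Visual Explain; a rerouted plan references the MQT instead of
the base fact table:

EXPLAIN PLAN FOR
SELECT T1."REGION_DESC", SUM(T2."TRXN_SALE_AMT")
FROM "STAR"."CONSUMER_SALES" T2, "STAR"."STORE" T1
WHERE T2."STORE_ID" = T1."IDENT_KEY"
GROUP BY T1."REGION_DESC";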
Now, one may argue that there are no relevant indexes on the fact table and that
this would change the picture, but placing indexes on the fact table is expensive
in terms of space (especially since you would have to index the foreign key
columns), and you would still not have the preaggregations from which we so
richly benefit here. Moreover, it should be noted that the MQTs generated by
the Optimization Advisor take up approximately 10% of the space occupied by
the fact table, much less than would be occupied by a set of indexes covering
the measures of the fact table.
Example 4-11 Querying down the cube
Select T1."REGION_DESC" "c1" , T1."AREA_DESC" "c2" , sum(T2."TRXN_SALE_QTY")
"c3" , sum(T2."TRXN_SALE_AMT") "c4"
from "STAR"."CONSUMER_SALES" T2, "STAR"."STORE" T1
where T2."STORE_ID" = T1."IDENT_KEY"
group by T1."REGION_DESC", T1."AREA_DESC"
order by 1 asc , 2 asc
Running the query without the MQTs provides the access graph in Figure 4-28.
Figure 4-28 Sales amount and quantity by region area without MQTs
Now, in Figure 4-29, see what happens if we allow the optimizer to make use of
the MQTs.
Figure 4-29 Sales amount and quantity by region area with MQTs
4.7.4 Moving towards the middle of the cube
The query in Example 4-12 goes down to the lower levels of the campaign
dimension. Here we start to qualify the query as we are looking for the Coupon
component of the campaigns.
The aggregations are not as large as before: since we are looking only for the
Coupon component, the number of rows needed from the fact table is reduced
and less work needs to be done. But take a look at the timeron cost of running
the query against the base tables in Figure 4-30.
Even though we are only interested in the coupon campaigns, DB2 has no index
to take advantage of (the COMPONENT_ID) and thus again performs a
tablespace scan on the fact table. If there are many queries that reference this
column on the fact table we might consider building an index, but let’s take a look
at what our MQT can do for this query.
Figure 4-31 Sales through Coupon campaigns with MQTs
We could continue our exploration into the realm of MQTs, but we think that
these examples, even though they are quite simple, fairly represent the general
performance benefits reaped from using MQTs.
All these factors are considered by the Optimization Advisor, but it is nevertheless
important to understand the important role they play in determining
how efficient the use of MQTs is. By building cubes that more or less
encompass the entire cube model and providing a large space allowance, it is
possible to make the Optimization Advisor build large MQTs that go very deep
into the cube. However, space issues become severe, as do the creation and
refresh times of the MQTs as they approach the number of rows of the fact table,
so the DBA responsible for their creation should survey the suggested MQTs
carefully before deploying them in a production environment.
Prior to the actual deployment of the MQTs, we suggest doing the following:
Determine the actual space requirements.
Determine the necessary MQT refresh window, as the MQT may be nearly as
large as, or even larger than, the base fact table.
Perform an Explain on the resulting database with a representative query
workload to confirm that the MQTs are used and provide a substantial
performance benefit compared to their potentially large cost of creation and
maintenance.
Generally, it should be noted that if queries are often performed at the transaction
level of the cube, with few or no aggregations, MQTs will not be of much help. In
this case we suggest exploring the use of indexes to boost performance where
certain fact table columns are often chosen over others or, if that is not the case,
relying in general on other means of performance optimization.
During the refresh of an MQT (either INCREMENTAL or FULL refresh), time is
spent joining the dimension and fact tables involved in the MQT. How well tuned
your physical data model is can therefore significantly affect the population of
MQTs. There are also different techniques that you can use to populate the
MQTs, such as load instead of refresh, or avoiding logging.
At query execution time, the DB2 optimizer considers MQTs like regular
tables when it comes to access plan strategies. Good indexes on MQTs are
therefore important for query optimization, and up-to-date statistics on base
tables and MQTs are required for the DB2 optimizer to be able to choose an
MQT instead of accessing the base tables.
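For example (the MQT name is hypothetical; the fact table is from our sample
STAR schema):

RUNSTATS ON TABLE star.consumer_sales AND INDEXES ALL;
RUNSTATS ON TABLE star.mqt_sales AND INDEXES ALL;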
Another important aspect that you need to consider for your query
environment is the different approaches that you can apply to refresh
the MQTs. You can, for example, perform INCREMENTAL or FULL refreshes of
the MQTs, and you can perform IMMEDIATE or DEFERRED updates on
existing MQTs. These different approaches can affect the availability of the MQTs
and impact your query environment. In the following sections we
discuss the details and techniques, and how you can implement them.
Note: In this chapter the term Regular Tables or Base Tables is used to
reference the underlying tables that are used to feed MQTs.
DB2 supports two different types of MQT: User Maintained and System
Maintained.
For the System Maintained MQTs, you can specify the frequency of maintenance
as either:
Refresh DEFERRED (point-in-time)
Refresh IMMEDIATE (current time)
DB2 Cube Views Advisor allows you to select either refresh DEFERRED or
refresh IMMEDIATE options. However, if for any reason, the SQL query for the
MQT is not supported by an MQT as refresh IMMEDIATE, DB2 Cube Views
automatically generates the MQT as refresh DEFERRED.
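As a hedged sketch of what a system-maintained refresh IMMEDIATE MQT looks
like (the MQT name and measure selection are illustrative; the base tables are
those of our sample STAR schema, and if a summed column is nullable, a COUNT
of that column must be added to the select list):

CREATE TABLE mqt_sales_by_region AS (
  SELECT t1.region_desc,
         SUM(t2.trxn_sale_qty) AS sale_qty,
         COUNT(*) AS count_of_rows   -- required for refresh IMMEDIATE
  FROM star.consumer_sales t2, star.store t1
  WHERE t2.store_id = t1.ident_key
  GROUP BY t1.region_desc)
DATA INITIALLY DEFERRED
REFRESH IMMEDIATE
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM;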
During this interval (between updates to the base tables and the refresh of the
MQT), by default, the MQT is available for queries, unless you explicitly execute
the statement SET CURRENT REFRESH AGE 0 (in which case only MQTs defined as
refresh IMMEDIATE are available for query rewrite).
For the refresh DEFERRED option, there are two different scenarios, which
depend on the type of maintenance performed on the base tables:
1. Updates, Inserts, and Deletes:
a. These automatically reflect the changes on the STAGING tables within the
same unit of work.
b. There is a latency between updates on the STAGING tables and the
MQTs.
c. The MQTs are still available to be used during query rewrite by the DB2
optimizer.
d. Additional action is required to synchronize the MQTs: transfer the
changes from the STAGING tables to the MQT (see the staging table
example after this list).
2. Load and Import with Insert option:
a. Data is inserted only in the Base Tables.
b. Depending on the Load options on the base tables, the MQTs are placed
in Check Pending No Access state, and the DB2 optimizer does not route
any query to these MQTs until they are refreshed again.
c. There is a latency between updates on the STAGING tables and the
MQTs.
d. Additional action is required to compute the delta into the STAGING tables
and from the STAGING tables to the MQTs.
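For reference, the staging table used by scenario 1 above is created with a
statement like the following (a minimal sketch; both table names are
hypothetical, and the MQT must be defined as refresh DEFERRED):

CREATE TABLE stg_mqt_sales FOR mqt_sales_deferred PROPAGATE IMMEDIATE;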
Here is a list of considerations that you might use as a reference to make this
decision:
Refresh IMMEDIATE MQTs, like refresh INCREMENTAL MQTs, can only
have the COUNT, SUM, COUNT_BIG, and GROUPING aggregation functions.
Latency of the data. The tolerance for latency depends on the application.
– Some applications can accept a latency of the data for query, such as
end-of-day, end-of-week, end-of-month. For example, data warehouses
and strategic decision-making could accept a certain latency for the data.
In fact, for some situations, it is a requirement for the application that the
data is only refreshed during certain periods. In such cases, the MQT does
not need to be kept in synchronization with the base tables, and the
refresh DEFERRED option should be used.
– For OLAP applications and tactical decisions, any MQT latency is
probably unacceptable, and the IMMEDIATE option can be used.
Refresh IMMEDIATE on MQTs with a high volume of insert, update, and
delete activity could cause significant performance overhead on the base
tables.
Refresh IMMEDIATE requires:
– An extra column with COUNT(*) for maintenance
– An extra column with COUNT(nullable_column_name) in the select list for
each nullable column that is referenced in the select list with a SUM.
Refresh DEFERRED requires a staging table for INCREMENTAL refresh.
The INCREMENTAL refresh might be faster on an MQT defined as refresh
IMMEDIATE compared to an MQT defined as refresh DEFERRED because
there is no need to use staging tables.
Refresh DEFERRED MQTs can be kept out of synchronization.
Load insert activity on base tables:
– The MQTs defined as refresh IMMEDIATE option are unavailable while
the Load Insert operation is being performed on the base tables, unless
you specify ALLOW READ ACCESS on the load statement.
– The MQTs defined as refresh DEFERRED are available while the Load
Insert operation is being performed on the base tables.
4.9.4 INCREMENTAL refresh versus FULL refresh
When deciding between INCREMENTAL refresh and FULL refresh on an MQT,
you need to consider the following:
Refresh INCREMENTAL MQTs, like refresh IMMEDIATE MQTs, can only
have the COUNT, SUM, COUNT_BIG, and GROUPING aggregation functions.
INCREMENTAL refresh increases the availability of the MQTs, and the refresh
operation can be faster than a FULL refresh.
INCREMENTAL refresh requires an index on the GROUP BY columns;
otherwise the performance can be slower than a FULL refresh.
INCREMENTAL refresh requires logging, unless logging is turned off for the
MQT with ALTER TABLE <mqt> ACTIVATE NOT LOGGED INITIALLY.
The import replace and load replace options cannot be used on the
underlying tables of an MQT that needs to be incrementally maintained; a FULL
refresh is required when those options are used.
INCREMENTAL refresh can generate updates and deletes of existing rows on
the MQT.
The frequency of INCREMENTAL refreshes can cause a logging overhead
against the MQT:
– More frequent refreshes have the potential to involve more updates
against the MQT.
– Less frequent refreshes may result in fewer updates because data
consolidation may occur either on the staging table (for refresh
DEFERRED MQT) or underlying table (for refresh IMMEDIATE MQT).
– Less frequent refreshes could result in a large volume of data in the
staging table (for refresh deferred MQT) that needs to be pruned and
logged.
The current version of DB2 Cube Views only generates scripts to perform a FULL
refresh of the MQTs. If you plan to have MQTs incrementally maintained, you
need to create the required scripts, as well as update the DB2 Cube Views
scripts as required by the INCREMENTAL refresh process. The next section
covers the scripts required for different scenarios to perform FULL and
INCREMENTAL refreshes on MQTs.
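The two refresh modes boil down to the following statements (the MQT name is
hypothetical):

REFRESH TABLE mqt_sales NOT INCREMENTAL;
REFRESH TABLE mqt_sales INCREMENTAL;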
Note: Since DB2 Cube Views generates the DDL to create the MQTs and
these scripts are manually executed by the DBAs, the DBAs can change them.
We do not recommend changing the select statement of the MQT, because this
SQL is created based on sampling of the source data as well as the type of
query/report that is performed against the MQT. If for any reason you need to
change the SQL, make sure that your MQT is still valid and still used by the
end user queries/reports.
Note: Even if the MQT is defined as refresh IMMEDIATE, when the load
utility (standard practice for data warehouse applications) is used to update the
base tables, the data is not automatically propagated to the MQTs. An
additional command is required to select, compute, and insert the data from
the base tables into the MQT.
Figure 4-32 shows the process flow and the major tasks required for the FULL
refresh process on MQTs defined either as IMMEDIATE or DEFERRED.
Figure 4-32 Full refresh on MQTs (MQT maintenance type: refresh DEFERRED or
refresh IMMEDIATE). After the initial load of the dimension (Market, Product,
Time, Channel, Scenario) and fact tables, the MQT is created and refreshed
(not incremental); after additional load appends of delta input, the MQT is
either refreshed again (not incremental) or dropped and re-created.
After you perform the initial load on the base tables and create the MQT, you can
execute a command to perform a FULL refresh of the MQT. By issuing a
refresh command against the MQT, DB2 selects and computes the data from
the underlying tables (based on the select statement used to create the MQT)
and inserts it into the target MQT.
Assume that you need to append large volumes of data into existing underlying
tables and that you have decided to perform a FULL refresh of the MQTs after
every load append. Since this process does not automatically refresh the MQTs,
you need to execute a refresh command in order to synchronize the
MQTs with the underlying tables. By issuing a refresh command against an
MQT that is already populated, DB2 first deletes the data from the MQT (unless
you manually drop and recreate the MQT), then selects and computes the entire
data from the underlying tables (based on the select statement used to create
the MQT), and inserts it into the target MQT.
Table 4-9 and Table 4-10 provide a complete list of tasks required to implement
FULL Refresh on REFRESH DEFERRED and REFRESH IMMEDIATE MQTs.
Table 4-9 Initial FULL refresh on refresh DEFERRED and IMMEDIATE MQTs

Step 1:
CREATE TABLE <fact_table> (
  <prod_key> INTEGER NOT NULL, ...
  <sales> INTEGER NOT NULL,
  <misc> SMALLINT, ...
) IN <fact_table_space>
Considerations: The NOT NULL constraint is required on the surrogate key for
referential integrity.

Step 2:
CREATE TABLE <product> (
  <prod_key> INTEGER NOT NULL,
  <prod_name> VARCHAR (30) NOT NULL,
  <prod_group> VARCHAR (30), ...)
IN <dimension_table_space>
Considerations: The NOT NULL constraint is required on the primary key for
referential integrity.

Step 3:
CREATE UNIQUE INDEX <uix_prod_key> ON <product> (prod_key);
Considerations: Recommended for performance purposes. See the DB2
documentation for additional options on CREATE INDEX.

Step 4:
ALTER TABLE <product> ADD CONSTRAINT <pk_prod_key> PRIMARY KEY (prod_key);
Considerations: The primary key constraint is required for referential integrity.

Step 5:
ALTER TABLE <fact_table> ADD CONSTRAINT fk_prod_key FOREIGN KEY (prod_key)
  REFERENCES <product> (prod_key)
  ON DELETE NO ACTION
  ON UPDATE NO ACTION
  NOT ENFORCED / ENFORCED
  ENABLE QUERY OPTIMIZATION
Considerations: A FOREIGN KEY constraint is required for the DB2 optimizer to
support MQTs in situations where the query does not match the number of tables
defined in the MQT. It can be either ENFORCED or NOT ENFORCED.

Step 6:
DB2 Load from ... insert/replace into <product> ...
DB2 Load from ... insert/replace into <fact_table> ...
Considerations: Initial load of the dimension and fact tables. Note: If an
ENFORCED FOREIGN KEY constraint is defined, you first need to load the
dimensions and then load the fact table.
Step 7a:
CREATE TABLE <mqt> AS (SELECT
  SUM(t1.sales) AS <sales>,
  SUM(t1.misc) AS <misc>,
  COUNT(t1.misc) AS <count_misc>,
  COUNT(*) AS <count_of_rows>,
  t2.prod_name AS <prod_name>,
  t2.prod_group AS <prod_group>, ...
FROM <fact_table> t1, <product> t2, ...
WHERE t1.prod_key = t2.prod_key AND ...
GROUP BY <prod_name, ...>)
DATA INITIALLY DEFERRED
REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM
IN <mqt_table_space>
NOT LOGGED INITIALLY;
Considerations: CREATE TABLE DDL for REFRESH DEFERRED MQTs. This example
does support INCREMENTAL refresh. Notes:
– The COUNT(t1.misc) and COUNT(*) columns are only required for INCREMENTAL
refresh.
– For refresh DEFERRED MQTs, DB2 Cube Views does not generate DDL for
INCREMENTAL refresh, nor does it generate the COUNT(*) and
COUNT(nullable_measure_column) columns that INCREMENTAL refresh requires.
– Columns that accept nulls and are listed in the GROUP BY clause can
considerably affect the performance of INCREMENTAL refresh.
– ENABLE QUERY OPTIMIZATION is required in order for the MQT to be used by the
DB2 optimizer during query rewrite.
– The NOT LOGGED INITIALLY option is not required; however, it significantly
improves performance during FULL and INCREMENTAL refreshes of the MQT.
Table 4-10 FULL refresh on refresh DEFERRED/IMMEDIATE MQTs

Step 2:
Load from ... insert/replace into <fact_table> ...
Considerations: Incremental loads on the fact table.

Step 3:
REFRESH TABLE <mqt> NOT INCREMENTAL;
or
SET INTEGRITY FOR <mqt> IMMEDIATE CHECKED
Considerations: The REFRESH TABLE command performs a FULL refresh of the
MQT. SET INTEGRITY can also be used and has the same effect (a FULL refresh
of the MQT).

Step 4:
CREATE INDEX <index1> ON <mqt> (column list..)...
Considerations: Optional. You can create indexes on the MQT to improve query
performance. Use the DB2 Index Advisor to identify required indexes. Note:
DB2 Cube Views generates indexes for the MQT.

Step 6:
RUNSTATS ON TABLE <mqt> AND INDEXES ALL
Considerations: Update the table and index statistics, because they are used by
the optimizer to determine the cost for query rewrite.
Figure 4-33 shows the process flow and the major tasks required for the
INCREMENTAL refresh process on MQTs defined as refresh IMMEDIATE, beginning
with step 1, the initial load of the dimension and fact tables.
After you perform step 1 (initial load on the base tables) and create the MQT,
you need to execute a command to perform a FULL (not INCREMENTAL) refresh
(step 2) of the MQT. By issuing a refresh command (step 2) against the MQT, DB2
computes the data from the underlying tables (based on the select statement
used to create the MQT) and inserts it into the target MQT.
Assume that you need to append more data into the existing base tables as well
as synchronize the MQTs. After you append the data into the underlying tables
(step 3), you need to execute a command to incrementally refresh the MQT. By
issuing a refresh command with the INCREMENTAL option (step 4) against the MQT,
the delta information is selected and computed from the underlying tables (based
on the select statement used to create the MQT) and inserted into the MQT
table. This process can either insert new rows into or update existing rows of
the MQT.
Table 4-11 shows all required steps that you need to perform in order to
incrementally refresh an MQT defined as refresh IMMEDIATE.
Table 4-11 Steps for INCREMENTAL refresh on refresh IMMEDIATE MQTs

Step 1:
ALTER TABLE <mqt> ACTIVATE NOT LOGGED INITIALLY;
Considerations: This is not a required step; however, for large MQTs, or when
you are incrementally adding large volumes of data, disabling logging for the
MQT can significantly improve the INCREMENTAL refresh process. If for any
reason the refresh process fails, the MQT might become invalid, and you would
need to drop it, re-create it, and perform a FULL refresh of the MQT again.

Step 3:
RUNSTATS ON TABLE <mqt> AND INDEXES ALL;
Considerations: Update the table and index statistics, because they are used by
the optimizer to determine the cost for query rewrite.

Step 4:
Load from ... insert into <fact_table> ...
Considerations: Incremental load of data on the fact table. Note: For
INCREMENTAL refresh, only the Insert option is supported.

Step 5:
SET INTEGRITY FOR <fact_table> IMMEDIATE CHECKED;
Considerations: This command removes the “Check Pending” status from the
fact table.

Step 6:
REFRESH TABLE <mqt> INCREMENTAL;
or
SET INTEGRITY FOR <mqt> IMMEDIATE CHECKED INCREMENTAL
Considerations: The REFRESH TABLE command with the INCREMENTAL option
considers only the appended data from the underlying tables (fact table and
dimensions). This option can both insert new rows into the MQT and update
existing rows.

Step 7:
DROP INDEX <index_for_incremental>;
CREATE INDEX <index1> ON <mqt> (column list..)...
Considerations: Optional. You can remove the index created for the INCREMENTAL
refresh process if you think it is not used by any other query. Optionally, you
can create additional indexes on the MQT to improve query performance. Use the
DB2 Index Advisor to identify required indexes. Note: DB2 Cube Views generates
indexes for the MQT.

Step 8:
REORG TABLE <mqt>;
Considerations: Optional. Reorganize the MQT, especially if a clustering index
was defined.

Step 9:
RUNSTATS ON TABLE <mqt> AND INDEXES ALL
Considerations: Update the table and index statistics, because they are used by
the optimizer to determine the cost for query rewrite.
Figure 4-34 shows the process flow and the major tasks required for the
incremental refresh process on MQTs defined as refresh DEFERRED.
Figure 4-34 INCREMENTAL refresh on refresh DEFERRED MQTs: step 1, initial load
of the dimension and fact tables; step 2, refresh of the MQT (NOT INCREMENTAL);
step 3, load append of delta data; step 4, refresh of the STAGING table; step 5,
refresh of the MQT (INCREMENTAL).
After you perform step 1 (initial load on the base tables) and create the MQT, you
need to execute a command to perform a FULL refresh (step 2) of the MQT. By
issuing a refresh command (step 2) against the MQT, DB2 computes and
aggregates the initial information from the underlying tables (based on the select
statement used to create the MQT) and inserts the data into the MQT table.
Assume you need to append more data into the existing base tables as well as
synchronize the MQTs. After you append the data into the underlying tables
(step 3), you need to execute a command to incrementally refresh the MQT. By
issuing a SET INTEGRITY command with the INCREMENTAL option (step 4) against
the staging table, DB2 selects and computes the delta information from the
underlying tables (based on the select statement used to create the MQT) and
inserts it into the staging table.
Table 4-12 shows all required steps that you need to perform in order to
incrementally refresh an MQT defined as refresh DEFERRED.
Step 7:
REFRESH TABLE <mqt> INCREMENTAL;
or
SET INTEGRITY FOR <mqt> IMMEDIATE CHECKED INCREMENTAL
Considerations: This step selects the data from the staging table and populates
the MQT table. It can both insert new rows into the MQT and update existing
rows.
The following are some recommendations that you need to carefully evaluate
before implementing them in your production environment:
Create indexes on the MQT columns that are referenced in the WHERE clause of
the most used queries; use the DB2 Index Advisor to help you identify
appropriate indexes (see the example after this list).
Evaluate the possibility of using unique indexes with include columns on
dimension tables. They can speed up retrieval from these tables during the
INCREMENTAL and FULL refresh of MQTs.
Create a non-unique index on the MQT columns that guarantee uniqueness
of rows in the MQT. In the case of a partitioned MQT, the partition key should
be a subset of the columns described above.
Do not create an index on the staging table, since such indexes degrade the
performance of appends to the staging table.
For partitioned tables, make sure that you partition the staging table according
to the partitioning of the MQT, to promote collocated joins.
Refresh of MQTs consumes CPU, I/O, and buffer pool resources, which
ultimately impacts other users contending for the same resources. Refresh
resource consumption can be reduced by combining multiple MQTs in a
single refresh statement, since DB2 uses “multiple-query optimization” to
share joins and aggregations required of each MQT in order to reduce the
resource consumption.
Reorganize tables (regular and MQTs) after incremental load, insert, and
delete of large amounts of data.
Collect statistics for the underlying tables and the MQTs:
– After performing an INCREMENTAL refresh on MQTs
– After performing a FULL refresh on MQTs
– After performing any changes on existing MQTs (such as creating, altering,
or removing an index, or altering the table)
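As a small illustration of the indexing and statistics recommendations above
(the index name and column are hypothetical):

CREATE INDEX ix_mqt_region ON <mqt> (region_desc);
RUNSTATS ON TABLE <mqt> AND INDEXES ALL;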
4.11 Configuration considerations
When using MQTs, two of the main questions will be:
How to estimate the memory required for MQTs
How to estimate the storage required for MQTs
The following recommendations apply only to non-clustered (or single node DB2)
configurations:
SORTHEAP:
– SORTHEAP is usually very important for MQT REFRESH.
– The size of the sortheap allocation depends on the complexity of the MQT.
– Look at the Explain plan of the full select used by the MQT to estimate
sortheap requirements, and size accordingly.
– If you have the memory to consume (32-bit versus 64-bit), this is a good
candidate for over-allocation.
– You need to ensure that the dbm cfg sort heap threshold parameter
(SHEAPTHRES) is sized appropriately to support the sortheap specified.
– Note that the sortheap allocation will usually be significantly smaller in the
runtime environment.
STATEMENT HEAP:
– An MQT refresh may require a lot of statement heap; otherwise, you may
get an error like “statement too complex”.
Besides the space required for the MQTs, you also need additional temporary
space for joins and aggregations.
If the MQT's size estimate provided by DB2 Cube Views is very large, it is
probably an indication that you may need more tempspace.
The following formula helps you to estimate the TEMP space required for the
refresh of an MQT:

TEMP space required = (# of pages required) * (page size)

where:

(# of pages required) = (total # of rows in the MQT) * (10 + 10) / 2560

(10 + 10) is the size stored in TEMP per MQT row. It is counted twice in
situations where both a DELETE of the old data and an INSERT of the new data
in the MQT are needed.
2560 is based on the fact that we store 256 rows per page (assuming 256 slots
in a page of 10 bytes each).
Notes:
1. The DELETE refers to deleting the old data in the MQT. If it is an initial
population, you do not need to account for this.
2. Note that the result of the formula gives the number of pages, and the
number 2560 is independent of the page size. Depending on the page
size, we need to compute disk space accordingly: an 8 KB page size would
require double the amount of disk compared to a 4 KB page size.
3. Particular care needs to be taken at the catalog node, because the refresh
process of MQTs also uses TEMP space on this node. If you have a
catalog node that is separate from the data nodes and has very little TEMP
tablespace, you can have problems performing a full REFRESH on MQTs.
Make sure you add additional TEMP space on this node to avoid any
problems during the refresh process.
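As a worked example under the assumptions of this formula (the row count is
hypothetical): for an MQT of 1,000,000 rows being re-populated, the number of
pages required is 1,000,000 * (10 + 10) / 2560, or roughly 7,813 pages. With a
4 KB page size that is about 32 MB of TEMP space (7,813 * 4096 bytes); with an
8 KB page size, about 64 MB.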
4.12 Conclusion
Building efficient MQTs is a difficult and time-consuming task if done by hand. As
an alternative, we show that the OLAP Center's Optimization Advisor not only
eliminates the laborious MQT construction job for the DBA, but also does a very
good job of constructing MQTs based on the knowledge provided in the OLAP
Center cube model and accompanying cubes.
By briefly analyzing some examples from our sample data mart, we show how
MQTs may be deployed, how to ensure that they are used, and ultimately how
much they may benefit queries.
We also explain how to start building a Web service with OLAP capabilities.
The purpose of this section is to provide an introduction to those tools that can
access the DB2 Cube Views metadata directly via the API, and to document the
metadata bridges that make use of the API. For more information, please refer to
the bridge article on the Developer Domain:
http://www7b.software.ibm.com/dmdd/library/techarticle/0305poelman/0305poelman.html
The bridges that are documented are the ones that were available for testing at
the time of writing this book.
The advantage of having implemented the API as a DB2 stored procedure is that
it becomes language neutral. Any programming language that can talk to DB2
can invoke this stored procedure.
To use the API, the calling program must construct XML documents to pass into
the stored procedure. The program will also need to parse the XML that is
returned by the stored procedure.
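As a hedged sketch (the stored procedure name and three-parameter layout
reflect our understanding of the DB2 Cube Views V8 API and should be verified
against the product documentation), any SQL-capable client can invoke the API
like this:

CALL DB2INFO.MD_MESSAGE(?, ?, ?)

where the input parameters carry the XML request document and the output
parameter returns the XML response that the caller must parse.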
One of the considerations that a developer will need to decide upon when
developing a bridge is whether the bridge will call the API to read the metadata,
or read in exported metadata XML files — in the case of a pull from DB2 Cube
Views. In the case of a push to DB2 Cube Views, the consideration is whether to
call the API to create the metadata, or write the metadata to an XML file.
As can be seen in Table 5-1, all partners to date have chosen to develop their
bridges by reading from and writing to XML files.
For example, the Cognos DB2 Dimensional Metadata Wizard (available in MR1 of
Cognos Series 7 Version 2) pulls from DB2 Cube Views via XML or CLI, at cube
model scope, with no two-way exchange.
Notes:
Note 1: Dependent upon source and target products.
These bridges and their functionalities will be constantly changing over time, and
you will have to check directly with the tool partners for availability of these
functions, specifically for two-way and incremental changes to functions.
IBM DB2 Office Connect helps users overcome current limitations by providing a
simple GUI-based patented process that enables information in a spreadsheet to
be transferred seamlessly to multiple databases.
This chapter describes how to access multidimensional data in DB2 using IBM
DB2 Office Connect Analytics Edition.
For example, you can obtain a multidimensional view of aggregate data for
applications such as budgeting, cost allocation, financial performance analysis,
and financial modeling. If you are working with sales data, you can analyze the
data and then display revenue for:
All years, or for specified periods of time, such as one year, one quarter, or
one month
All products or specified products
All sales outlets or specified outlets
All countries or specified countries or regions within a country
In order to use Office Connect to exploit the new OLAP aware DB2, you should
have the following products installed:
IBM DB2 Office Connect Analytics Edition V4.0
Excel - Office2000 or XP and above
DB2 UDB V8.1 FP2+ and above
DB2 Cube Views V8.1 FP2+
The architecture figure shows the client side (Project Manager, Connection
Manager, and pivot table services) connected through an ODBC connection to the
DB2 database server: the client receives cube metadata (cube model and cube)
from the system catalog and metadata tables, and retrieves relational tables
and data through SQL queries.
Attention: If you have both the Essbase and Office Connect add-ins active,
then Essbase owns the double-click and right-click mouse actions.
IBM DB2 Office Connect accesses OLAP metadata in DB2 through the DB2
Cube Views API (implemented as a stored procedure in DB2).
The actual data retrieval (with the help of pivot table services) is through SQL
queries.
The workflow figure outlines the steps: prepare metadata in DB2; launch Excel
and load the Office Connect Add-in; connect to the OLAP-aware database in DB2
using the Office Connect Add-in; import cube metadata into Project Manager;
bind data to the Excel worksheet; create a basic top-level report; and create a
custom report (format the report using OLAP style Office Connect operations).
Second, a cube model needs to be created in DB2 Cube Views. Office Connect is
designed to work only with DB2 Cube Views cubes; therefore, cubes (a subset
of the cube model) also need to be defined.
Note: We only used Excel XP for the examples and figures provided in
this chapter. You may notice some slight differences when using Excel 2000.
The Add-in can also be enabled/disabled from Tools -> Add-ins in Excel (see
Figure 6-4).
Select Next to select the cube metadata that you want to import (see Figure 6-7).
2. You have the option to select more than one cube at this time. If you select the
cube model, then all the cubes defined within that cube model are also
selected. The cube model's name is retrieved to put the cube in context; the
cube model has no other semantics within an Office Connect retrieval.
To bind data to the Excel worksheet, you can right-click a selected cube and
select export data to Microsoft Excel (see Figure 6-9), or left-click a selected
cube and drag and drop it onto the Excel spreadsheet.
You have the option to select the sheet that you want to export to (see
Figure 6-10).
Click OK to export the data to the Excel spreadsheet (see Figure 6-11). The
spreadsheet report will now show data at the topmost level for all dimensions.
In Figure 6-11, STORE, PRODUCT and DATE are the dimensions. The fields
below (that is, Data) are the measures. This default report gives the
Sales data measures for all stores, all products, and all years. This is the
topmost level of aggregation. The level to which you can drill down depends
on the cube definition, in terms of how many measures you have subsetted for
this cube and the hierarchy levels of the dimensions.
You will also have a pop-up window called the Pivot Table Field List. This is
discussed in 6.4, “OLAP style operations in Office Connect”.
You can add dimensions/members to the report from the Pivot Table Field List.
Note: The Pivot Table Field List option (window) is only available under Microsoft
Office Excel XP. In Office 2000 there is no Pivot Table Field List; once a cube or
a component of a cube is dropped to create a pivot table, Excel's pivot table
wizard and the Office Connect Project Manager are the only windows available to
the user to add or remove cube components.
The Pivot Table Field List can also be invoked from the Office Connect tool bar
(see Figure 6-12)
Office Connect uses these Excel pivot tables to view the data and also to pivot it.
A pivoting action moves a dimension from a row to a column and vice versa.
Here are some of the common actions that you can perform in the Office
Connect workspace.
To drag a dimension from either the field list or the pivot table, simply
left-click the name, drag it, and release the mouse button at the location that
you desire.
Note: You can only drag and drop the dimensions not the member names.
Tip: When using the mouse to move a dimension, a little spreadsheet icon
appears and moves with the mouse pointer. It has a rectangle to represent the
data, a long rectangle to the left to represent the row headings and a wide
rectangle at the top to represent the column headings.
You can also drill down using the Show Detail button on the tool bar.
From this member selection window, you can navigate and filter or deselect the
members that you would like to remove from your report. You can use the help
text for additional information.
Drilling up
Use the same member selection window to just select the upper level member
that you require while deselecting the lower level members.
If you drag the dimension from the row or column back to the top, then again this
will drill up for that dimension but to the top-most level.
You can also use the Hide Detail button on the tool bar.
To launch the PivotTable layout wizard, right-click the pivot table and select
Wizard... (see Figure 6-15)
Select Layout to launch the layout wizard (see Figure 6-16). Here you can drag
out the uninteresting measures, swap dimensions between rows and columns, or
remove dimensions from the report.
Chart
Right-click anywhere on the pivot table report and select Pivot Chart to display
the report as a chart (see Figure 6-17). You can use the Chart Wizard to select
the type of chart.
You also have the option to save the data source connection information from the
Project Manager (Project -> Save data source to file).
With Office Connect, simply reopening the saved report in Excel (without having
saved the data source information) does not require supplying the data source
connection information again for that report or worksheet.
To delete an Office Connect report, right-click the report in Project Manager and
select Delete.
Office Connect requires cubes to be defined that mimic the SQL queries you
expect when using Office Connect.
Question: How do you check if the report SQL query from Office Connect is
exploiting the MQT once it is built?
Answer: Extract the SQL query from Office Connect (by enabling SQLDebug
Trace) and use it in DB2 Explain. This will show whether the query is being
routed to the MQT or not.
The subsections 6.7.1, “Enable SQLDebug trace in Office Connect” and 6.7.2,
“Use DB2 Explain to check if SQL is routed to the MQT” explain how to do this.
Now, any type of drill action should give the SQL that the query is using. After
performing a query in Office Connect, an SQLDebug window will appear that
displays the SQL that has just been submitted.
Save the SQL using copy/paste to perform DB2 Explain. Example 6-1 shows a
SQL query that was used for retrieving the top most level of a cube.
Use the SQL that you saved from the SQLDebug trace to obtain the access plan
graph that DB2 uses. This graph will show whether DB2 will choose the MQT for
the data retrieved by this query (see Figure 6-19).
The following scenario illustrates the benefit of using MQTs (in other words,
optimizing the cube model in DB2 Cube Views) for Office Connect.
From the basic top-level report (refer to Figure 6-11) that we start with, we first
drill down to show data only for the West region. Example 6-2 shows the SQL
used to retrieve the data for this drill down action.
SUM("STAR"."CONSUMER_SALES"."TRXN_SALE_AMT") as
"TRXN_SALE_AMT",SUM("STAR"."CONSUMER_SALES"."TRXN_COST_AMT") as
"TRXN_COST_AMT",SUM("STAR"."CONSUMER_SALES"."TRXN_SALE_AMT" -
"STAR"."CONSUMER_SALES"."TRXN_COST_AMT") as
"Profit",SUM("STAR"."CONSUMER_SALES"."PROMO_SAVINGS_AMT") as "PROMO_SAVINGS_AMT"
From "STAR"."CONSUMER_SALES" inner join "STAR"."STORE" ON
"STAR"."CONSUMER_SALES"."STORE_ID"="STAR"."STORE"."IDENT_KEY" Where
(("STAR"."STORE"."ENTERPRISE_DESC"='Enterprise ' And
"STAR"."STORE"."CHAIN_DESC"='Chain Retail Market ' And
"STAR"."STORE"."REGION_DESC"='West') ) Group by
"STAR"."STORE"."ENTERPRISE_DESC","STAR"."STORE"."CHAIN_DESC","STAR"."STORE"."REGION_D
ESC"
Without having implemented any MQT, the cost (in timerons) for this query was
362,752.31.
We then drill down on products to include data only for SKINCARE (see
Example 6-3).
Again, without any MQTs implemented, the DB2 Explain shows that the cost (in
timerons) for this query was 49,184.94.
See Figure 6-21 and Figure 6-22 for the access plan graphs for these two
queries.
After using the Optimization Advisor to create MQTs, the corresponding costs (to
drill down on STORE and PRODUCT) are 25.19 and 25.19 timerons, respectively.
The scope and breadth of the QMF product family portfolio has allowed it to
continue integrating the latest technologies. This chapter discusses the
new integration offerings of QMF for Windows with multidimensional data
analysis through the use of OLAP technology.
In either case, the main point is that QMF was currently fulfilling the need of
some customers to do primitive OLAP-like functions in addition to providing for
their query and reporting needs.
Figure 7-1 Components required for QMF for Windows with DB2 Cube Views
All communications between QMF for Windows and DB2 Cube Views occur via
XML.
Prior to the release of QMF for Windows v7.2f, the types of queries supported
were SQL, Prompted and Natural Language. The introduction of a new OLAP
query object type was the necessary feature that brought the OLAP construct of
a cube into the QMF data space (see Figure 7-2). To create a new OLAP query,
select File->New... to display the new object window.
The new OLAP query can be saved at the server level in the QMF control tables
as type OLAP Query.
This new OLAP query object provides a drag-and-drop interface enabling the
user to build an OLAP query. The building of the OLAP query begins with the use
of the OLAP query wizard after OLAP query is selected for the New window.
Figure 7-4 OLAP Query wizard server
If DB2 Cube Views has not been installed or properly configured on the server
selected, an error message will occur as shown in Example 7-1.
2. Choose how to sort the cube list: schema or model (see Figure 7-5). Upon
completion of this step, QMF for Windows retrieves and sorts the list of cubes
by invoking the stored procedure of DB2 Cube Views to obtain the existing
cube definitions from the DB2 Cube Views catalog tables. If no cubes are
found on the server, an error message will occur.
a. The cube list sorted by schema begins with the server name, followed by
each schema name that contains one or more cubes and concludes with
all cubes owned by the schema name (see Figure 7-6).
b. The cube list sorted by model begins with the server name, followed by
each cube model that contains one or more cubes and concludes with all
cubes derived from the cube model (see Figure 7-7).
Figure 7-7 OLAP Query wizard cube
A tool tip can be displayed by placing the mouse over a metadata object in the
Object Explorer. The tool tip consists of the actual metadata object name, its
business name, its data type and the aggregation, if applicable.
There are three groups in the Layout Designer:
Top Dimensions
Side Dimensions
Measures
The Layout Designer in Figure 7-10 enables the user to drag and drop attributes
into the various groupings to create an interactive view of the multidimensional
data. The top and side groups will contain dimensions. The measure group
contains measures.
An option on the Layout Designer is to enable online mode. When this option is
selected, changes made in the Layout Designer will automatically result in
updates to the Query Results View.
When the enable online mode is not checked as in Figure 7-11, the Query
Results View appears greyed out, and updates made to the Layout Designer will
not take effect until the user selects Apply.
The query layout can also be created by using drag and drop within the lower
portion of the tree control in the Object Explorer entitled Layout. The Layout
Designer and the Layout tree control contain the identical query information.
The Query Results View appears in the middle panel by default. The actions of
dragging and dropping dimensions and measures into the Layout Designer are
reflected in changes to the Query Results View. The Query Results View is
constantly refreshed on each change made in the Layout Designer. This task is
accomplished under the covers with SQL generated by QMF for Windows.
The SQL execution and status is indicated by the message line in the lower left
hand corner of the application. As with a regular SQL query, the user can cancel
the operation of the SQL generated OLAP query by selecting the Cancel Query
button or menu option.
When a cube model is selected for an OLAP query, the default result set will
contain the first measure, aggregated up to the highest level.
Filter option
The OLAP Query Filter command brings to the front a window that allows the
user to select which values to include in the results. The filter panel in
Figure 7-12 allows the user to determine precisely which values are available.
A checked box
indicates that the value is included and an unchecked box indicates that the
value is not included. This filter also serves to re-add values that have previously
been excluded from the results. Changing the filter values requires the OLAP
query to execute SQL behind the scenes to generate the new results set.
The default for the filter option is that all attributes are selected and
included in the Query Results View.
It can be determined from the Object Explorer window whether any filters are in
place: a filter symbol is located in the upper left-hand corner of the existing
metadata object icon.
Formatting options
Formatting options shown in Figure 7-14 can also be applied to columns in the
Query Results View. To add formatting, select the desired column and either use
the right-click option or the formatting tool bar to change the formatting
parameters. You can specify column heading names, data text colors,
background colors, and data format.
OLAP functionality
QMF for Windows provides the mechanisms by which the user can employ
OLAP techniques while performing multidimensional data analysis. These
techniques include drill down, drill up, rollup, pivot, slice and dice, and drill
through.
Drill down
Drill down refers to a specific analytical OLAP technique when the user traverses
among levels of data ranging from the highest, most summarized level to the
lowest, most detailed level. The drill down path is defined by the hierarchy within
the cube dimension.
Drill up
Drill up refers to a specific analytical OLAP technique when the user traverses
among levels of data ranging from the lowest, most detailed level to the highest,
most summarized level. The drill up path is defined by the hierarchy within the
cube dimension and is the same as the drill down path. To decrease the
granularity of the result set, the drill up feature can be employed in the QMF for
Windows Query Results View. Simply click the minus (-) sign preceding the data
value to collapse the level. Drill up can also be accomplished by
right-clicking a column header within the Query Results View and selecting drill
up.
By default, dimensions are displayed drilled up to the highest level of the
hierarchy (see Figure 7-16).
Roll up
Roll up refers to a specific analytical OLAP technique involving the computation
of the data relationships between all levels of a hierarchy in a dimension. These
data relationships are often summations, though any type of computational
relationship or formula might be defined.
The All values row represents the value of all of the collective hierarchy levels
rolled up to the highest level of aggregation.
Pivot
Pivot refers to a specific analytical OLAP technique of changing the dimensional
orientation of the result set. Pivot can be accomplished in QMF for Windows by
changing one of the top dimensions into a side dimension and vice versa or
swapping dimensions.
Drill through
Drill through refers to a specific analytical OLAP technique of switching from a
cube (multidimensional data model) to the underlying relational data. Since QMF
for Windows is a complete relational query and reporting tool, the underlying
relational tables from which the cube is built can be accessed and viewed, as in
Figure 7-19.
7.5.1 Who can use OLAP functionality?
Because of its easy-to-use interface, QMF for Windows v7.2f can be tailored to
the OLAP requirements of virtually any knowledge worker, from a senior-level
executive to a skilled business analyst, or even the average manager,
salesperson, or novice user. Different members of an organization can access shared
OLAP queries, make data or formatting modifications, and save these modified
queries, thereby building a base of OLAP queries that suit the needs of each
individual user. The results from the analysis of the OLAP query can also be
printed.
To begin OLAP analysis with QMF for Windows, at least one cube object derived
from a cube model has to be defined with the OLAP Center, since QMF for Windows
builds its OLAP query upon a cube. Since metadata objects are saved at the
server level, different users of QMF for Windows can access any existing
cube objects and do not need to use the OLAP Center first before creating
OLAP queries in QMF for Windows.
Figure 7-20 represents our scenario cube, named Sales Cube. Sales Cube is
defined by a star schema with one central fact table, CONSUMER_SALES, and
five dimension tables: CONSUMER, DATE, STORE, CAMPAIGN and PRODUCT.
1. Begin by creating a new OLAP Query object. Select File->New and choose
the OLAP Query icon.
2. Follow the OLAP Query wizard to select the appropriate server and cube from
the given cube list.
3. After the initial result set is retrieved, drag and drop the Consumer dimension
into the Side Dimension group indicated in the Layout Designer.
4. Drag and drop the Profit measure into the Measures group indicated in the
Layout Designer. Anytime dimensions and measures are added or removed
from the result set, the SQL is generated and sent by QMF for Windows to
DB2 to process the request.
5. Select the Filter option. Under dimension Store, expand Region Description
and deselect the Central and East attributes. This results in the inclusion of
only values from the West region.
In Figure 7-21, it can be seen that the most profitable groups are Unknown_less
than 19, Female_26-35, Female_36-45 and Female_19-25.
Figure 7-21 OLAP report 1: most profitable consumer groups in the West region
2. Pivot on the Time dimension by moving the Time dimension from the Side
Dimension group to the Top Dimension group.
3. Add the Consumer dimension to the Side Dimension Group.
4. Add Profit to the Measure Group.
5. Drill down into the Female level and ascertain in Figure 7-23 that Females
56-65 have increased the profit margin by close to 5% from 1998 to 1999.
7.6 Maintenance
When operating OLAP queries, the user should pay attention to:
Invalidation of OLAP queries
Performance issues
Select the Limits tab. The following limits may need to be adjusted to
successfully run high demanding OLAP queries:
Maximum Rows to Fetch:
– Warning Limit
– Cancel Limit
Maximum Bytes to Fetch:
– Warning Limit
– Cancel Limit
7.7 Conclusion
QMF for Windows v7.2f provides support for multidimensional data analysis
through the introduction of the OLAP query, enhancements to the graphical-user
interface and support of DB2 Cube Views. For more information on QMF for
Windows and the QMF Family, go to:
http://www.ibm.com/qmf
Chapter 8. Using Ascential MetaStage and the DB2 Cube Views MetaBroker
of your data quickly, without guesswork or labor intensive manual intervention
and on-going maintenance. MetaBrokers come in five groups:
a. The first group deals with data model design, and includes tools such as
CA ERwin, Oracle Designer, and the Unified Modeling Language (UML).
b. The second group deals with OLAP and Business Intelligence tools such
as Cognos PowerPlay, Business Objects, and Hyperion.
c. The third group deals with ETL tools such as Ascential DataStage and
Informatica PowerCenter.
d. The fourth group enables the sharing of operational metadata. This allows
critical DataStage operational metadata to be perfectly reconciled with its
associated design within the directory. Conceptually, this can be used with
other tools' event metadata, provided that it conforms to the prescribed
format and meaning.
e. The fifth group is a custom MetaBroker capability called MetaArchitect.
MetaArchitect is a repeatable mechanism used to establish relationships
and interchanges with a third party tool's metadata when no MetaBroker
currently exists. It can also be used for special requirements such as
Stewardship, DataStage, or Glossary information exchange. Using an
existing metamodel, such as DataStage, MetaArchitect can alias the
existing base model into a form that would not otherwise be possible. This
enables rich bi-directional metadata exchange via Comma Separated
Values (CSV) or XML Metadata Interchange (XMI) file formats, with
optional XSLT style sheets for granular XML vocabulary formatting.
MetaArchitect is the most expedient and consistent approach for the
integration of home grown and commercial repositories such as CA
Advantage and ASG Rochade.
3. MetaStage Explorer:
MetaStage Explorer is a power user client interface for inspecting and
interacting with the metadata in the MetaStage Directory. It delivers
sophisticated metadata navigation and analysis functions. To minimize
manual intervention, key Explorer functions have been script-enabled to
permit users to focus on high-value analysis and management activities. The
MetaStage Explorer delivers these key capabilities:
– Impact analysis: Using one of two model-oriented browser capabilities,
you can traverse the underlying metadata objects using any tools' Meta
Model representation to understand their relationships. More powerfully,
you can immediately determine where an individual ERwin table is used,
and what depends on that definition for its daily function, such as a
Business Objects universe, Cognos Impromptu catalog, or DataStage
design. With any change to an original ERwin design, you know exactly
how data is flowing into a warehouse, or what BI tools' reports could be
affected.
Under the covers, the MetaStage Browser utilizes an administratively
controlled, SQL-accessible portion of the MetaStage Directory. Built using
industry standard Java
technology, the MetaStage Browser provides a reusable template for
integrating your metadata into your own information delivery environment —
such as an Enterprise Information Portal, Business Intelligence tools, and
Microsoft Excel — or other application or Web technologies using
industry-standard SQL.
(Figure: MetaStage clients and other applications access the MetaStage Directory via SQL; ProfileStage, DataStage, and QualityStage share the Directory, supporting impact analysis.)
Figure 8-3 shows the flow of metadata in a typical data warehouse lifecycle,
from source system analysis to conceptual and physical models to Business
Intelligence (BI), left to right.
This flow of metadata from design to end user illustrates the implementation of a
publish and subscribe paradigm that MetaStage uses to enable an organization
to formalize metadata policies and procedures.
This process lets you control who has the authority to make data public and to
export it to other tools.
Examining Figure 8-3 a little more closely, keeping in mind the publish and
subscribe paradigm, a warehouse project might typically start with a set of
conceptual and physical data models, or source system analysis. To create data
models, you may use any tool of your choosing. While it is recommended that
you standardize on a data modeling tool in your organization, MetaStage does not
force you to standardize on any one modeling or Business Intelligence tool. In
fact, this is the beauty of MetaStage. If, for example, you find that it is more
productive for one data modeling group to use UML class diagrams with
Rational® Rose® and another group to use ERwin, MetaStage does not prevent
you from doing this.
Note: MetaStage supports UML 1.1, 1.3 and 1.4 via the XMI file format.
Therefore, any modeling tool that supports XMI export (for example,
Rational Rose) is automatically supported by MetaStage.
Step 1 in Figure 8-3 shows that when the warehouse data models are stable,
they can be made available to other users by publishing the metadata objects in
MetaStage. Once published, the data model definitions then become the
standard metadata definitions that subsequent warehouse processes will use. In
the flow depicted by Figure 8-3, the users of the data model definitions are the
Extract, Transform, Load (ETL) operation and the BI process.
Simultaneously, the ETL and BI development groups can subscribe to the data
model definitions provided by data modelers. By subscribing to the standard data
model definitions, the ETL and BI processes can now operate in parallel using the
same metadata definitions, achieving a maximum level of reuse and consistency.
After the ETL and BI developers have completed their respective tasks, the
specific metadata definitions for the ETL and BI processes can now be published
as shown in Steps 3 and 5 in Figure 8-3. Now that MetaStage has the complete
set of metadata definitions that span the enterprise data warehouse process,
business and technical metadata can be selectively distributed to end users via
either:
Generated HTML documentation (customizable by XSL transformations)
Directly, via SQL against an administratively controlled relational schema (see the sketch after this list)
Customizable JSP-based standard Web interface
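To illustrate the second option, the query below sketches the kind of SQL an end user or portal could issue against that schema. It is a sketch only: the schema and table names (MSBROWSE, TABLE_DEF, COLUMN_DEF) are hypothetical stand-ins, since the relational schema actually exposed is defined by your MetaStage administrator.

-- Hypothetical browse schema; all names are illustrative only
SELECT t.TABLE_NAME, c.COLUMN_NAME
FROM MSBROWSE.TABLE_DEF t
JOIN MSBROWSE.COLUMN_DEF c ON c.TABLE_ID = t.TABLE_ID
WHERE t.TABLE_NAME = 'CONSUMER_SALES';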
Since MetaStage stores all metadata definitions in its directory, powerful
analysis can be performed on both design and runtime process metadata.
MetaStage provides powerful query, impact analysis, and data lineage
capabilities to better understand the nature of the business and technical
metadata in your environment.
(Figure: the MetaStage Integration Hub, a central Directory database connected through MetaBrokers to ERwin, DB2 Cube Views, and Business Intelligence tools, with the Explorer and Administrator clients, the Listener, the DataStage Server, and the Process MetaBroker.)
The unique semantic translation that only Ascential MetaBrokers can do occurs
after the MetaBroker has read each tool's specific metadata. Once read, the
MetaBroker will perform semantic translation into atomic semantic units and
store each semantic unit in the Directory. Once stored in the Integration Hub, the
semantics of each unit are preserved for use by any other MetaBroker. The units
highlighted in Figure 8-4 reflect the semantic equivalency of atomic units in the
Directory; these units are available to be read by any other MetaBroker where
there is metadata equivalency between tools. Ascential calls this semantic
overlap. The picture conceptualizes the semantic overlap between different
tools' metadata models. Furthermore, having understood the full extent of each
tool's metadata model, and the overlap, MetaStage can readily share metadata
between tools.
For example, a table stored by ERwin has the equivalent meaning in DB2 Cube
Views and most BI tools. If the ERwin MetaBroker stores a table metadata
definition object, the same table object is then available to be read by the DB2
Cube Views MetaBroker or any other MetaBroker that has a semantically
equivalent metadata definition.
To drill down one step further into the flow of metadata definitions with
MetaStage, we will use Figure 8-5 to illustrate the process. Figure 8-5 shows that
metadata definitions always flow in and out of MetaStage via a MetaBroker.
Note: API includes native tool APIs such as COM, C++, Java, SQL, and others,
depending on the data source access provided by the tool.
(Figure 8-5: metadata flows in and out of the MetaStage Directory through MetaBrokers: ERwin and DB2 Cube Views via XML; Business Objects, Cognos, and others via their APIs.)
Unlike Figure 8-3 on page 277, Figure 8-5 implies no sequence of flow.
The appropriate sequence often depends upon each company's best practice
approach to metadata management, and the tools involved. Figure 8-3 on page 277 depicts a
recommended flow of metadata for a specific warehouse project circumstance
where we start with data models. However, MetaStage does not enforce this flow
of metadata. In the following sections we will explore concrete examples of the
flow of metadata in various scenarios to get a head start in developing DB2 Cube
Views cube models and generally integrating DB2 Cube Views in your data
warehouse environment.
Figure 8-6 ERwin 4.1 sales star schema
ERwin 4.1 not only has the ability to design your star schema for use in DB2
Cube Views, it can also tag each table in the star schema as playing an OLAP
role so that other tools can use that information. In ERwin 4.1 you can tag a
table as being a Fact or Dimension table.
To capture the dimensional model from ERwin 4.1, you must first tag each table
in your star or snowflake schema as being a fact or a dimension. Applying this
additional metadata is what provides the abstraction from your physical data
structure to a dimensional structure. Without such dimensional metadata, the
MetaBroker simply sees a relational data structure.
Figure 8-7 shows that the CONSUMER_SALES table has been defined as a
Fact.
Note: You must manually select each table as being a fact or dimension for the
appropriate XML tag to be generated in the ERwin XML export. Choosing
Calculate Automatically will not produce dimensional metadata in the XML
export.
A summary of the process involved to get ERwin 4.1 dimensional metadata into
DB2 Cube Views is shown by Figure 8-8.
1. The ERwin 4.1 metadata must be exported to an XML file format.
2. MetaStage uses the ERwin 4.1 MetaBroker to import the ERwin 4.1 XML file
format.
3. The relevant metadata objects are exported to the DB2 Cube Views XML file
format.
4. The DB2 Cube Views XML file format is imported into OLAP Center.
To provide the ERwin 4.1 dimensional metadata to DB2 Cube Views, you must
first export your ERwin 4.1 model to an XML file format and import the ERwin 4.1
metadata into MetaStage. You do this by performing a MetaStage import using
the ERwin 4.1 XML file format as the source. Figure 8-9 shows the MetaStage
import dialog used to import the ERwin 4.1 metadata.
Note: The ERwin 4.0 MetaBroker is forward compatible with ERwin 4.1.
Figure 8-11 shows the ERwin 4.1 metadata in MetaStage after the import is
complete.
Figure 8-11 shows the major subset of metadata that can be shared with DB2
Cube Views. We can see that each ERwin table and its respective OLAP object
is imported into MetaStage. These objects can now be exported to DB2 Cube
Views so that further refinement of the cube model can occur in DB2 Cube
Views.
As shown in Figure 8-8 on page 284, the DB2 Cube Views MetaBroker creates
an XML file containing the metadata definitions that must be imported into DB2
Cube Views after the export of metadata from MetaStage completes.
The DB2 Cube Views MetaBroker requires the name and location of the XML file
to produce, as shown by Figure 8-13. This XML file contains the source metadata
definitions that will be imported to DB2 Cube Views using the XML import
feature.
In Figure 8-15 we can see the Hyperion cube model. Since there has already
been an investment in developing a cube model, it makes sense to reuse the
same cube model in other parts of the organization. In this case we want to make
the cube model available to DB2 cube views. To do this with MetaStage, we
must first import the Hyperion MOLAP database model into MetaStage and then
export it to DB2 Cube Views.
Figure 8-16 Summary metadata flow from Hyperion Essbase to DB2 Cube Views
Once you have exported the Hyperion metadata model to an XML file format,
MetaStage can be used to import the Hyperion metadata into MetaStage.
Figure 8-17 shows the import selection dialog.
After selecting the Hyperion MetaBroker to import the Hyperion metadata, we will
see the metadata in MetaStage as shown in Figure 8-18.
Here you must provide the Hyperion metadata XML file for the MetaBroker
to import.
Once the Hyperion metadata is in MetaStage, we export this metadata to DB2
Cube Views by subscribing to this metadata.
You must specify the location of the DB2 Cube Views XML file for the MetaBroker
to produce. After running the DB2 Cube Views MetaBroker, an XML file will be
produced. Import this XML file into DB2 Cube Views using OLAP Center.
The Hyperion cube model is now stored in DB2 Cube Views and ready for
enhancement and use. The resultant cube model in DB2 Cube Views is shown in
Figure 8-14 on page 289.
To show cross-tool impact analysis we will look at the column TRXN_SALE_AMT
defined in the ERwin data model shown in Figure 8-22.
To perform cross-tool impact analysis, MetaStage must have metadata from all
the tools you want to include in the analysis stored in its Directory.
For this example we will use metadata from ERwin and DB2 Cube Views. In
8.2.1, “Importing ERwin dimensional metadata into DB2 Cube Views” on
page 281, we saw how to import the ERwin metadata into MetaStage and then
export this metadata to DB2 Cube Views. We will assume that this step has been
performed.
Assuming that we already have the ERwin metadata in the MetaStage Directory
we now need to import the DB2 Cube Views metadata into MetaStage. To do this
you must export the appropriate cube model from OLAP Center into an XML file
format for the DB2 Cube Views MetaBroker to read.
When the OLAP Center XML export is complete, you will have an XML source
file to use with the DB2 Cube Views MetaBroker. The DB2 Cube Views
MetaBroker will read the metadata from this file to import into MetaStage.
You must import the DB2 Cube Views metadata by running the MetaBroker.
Invoke a MetaStage import: at the Import Selection dialog, choose IBM DB2
Cube Views as shown in Figure 8-24 to run the import.
After running the DB2 Cube Views MetaBroker you will have all of the metadata
relating to your cube model in MetaStage, as shown in Figure 8-26. Notice that the
ERwin metadata was already imported into MetaStage.
Figure 8-26 MetaStage after ERwin and DB2 Cube Views metadata import
Before we can run cross-tool impact analysis queries, we must run the
MetaStage Object Connector. From MetaStage, choose Tools>Object
Connector to open the Object Connector dialog shown in Figure 8-27.
The Object Connector will automatically search the MetaStage Directory for
objects that have equivalent identities and connect them and their respective
child objects using the special Connected To relationship.
Note: Each object has an identity that usually includes its name.
Run the Object Connector to connect semantically equivalent objects in the
MetaStage Directory. When all equivalent objects have been connected,
cross-tool impact analysis can be performed. In our scenario we are interested
in the column TRXN_SALE_AMT shown in Figure 8-22. In our example we want
to show the impact of making a change to TRXN_SALE_AMT. If we want to make
a change to a column in our data model, we would typically make the change in
the tool that stores the master copy of our data model. In this case, ERwin stores
the master copy of the data model metadata. Therefore, we should make the
change in ERwin. Functionally, in MetaStage this means that it will be more
effective if we make ERwin the context from which we run our impact analysis
query.
To make the ERwin copy of TRXN_SALE_AMT the root of our impact
analysis query, we must change the context in MetaStage so that we are
browsing the ERwin metadata. When we run the impact analysis query, however,
we will see the impact of making a change to TRXN_SALE_AMT from ERwin
across to DB2 Cube Views. Impact analysis queries always begin from some
object. We will call this object the root object. Therefore, in our example
TRXN_SALE_AMT will be the root of our impact analysis query.
You will now see the sales data model from the ERwin perspective shown in
Figure 8-31.
From Figure 8-22 on page 296 we already know where the TRXN_SALE_AMT
is: it is part of the CONSUMER_SALES table. To run the impact analysis query
on the TRXN_SALE_AMT we need to navigate to the column object. To do this,
right-click the CONSUMER_SALES table object and select Browse from
CONSUMER_SALES>CA ERwin 4.0 as shown in Figure 8-32.
Figure 8-33 ERwin Where Used impact analysis menu
After running the impact analysis query, you will see the screen in Figure 8-34,
showing the impact analysis across tools with the viewing context being that of
ERwin only. To show both the ERwin and the DB2 Cube Views context on the
same screen, click the button Show Connected Objects via creation view.
After clicking the Show Connected Objects via creation view button, you will
be presented with a new impact analysis query path viewer. You will be able to
navigate around the path viewer canvas by using the horizontal and vertical scroll
bars.
If we scroll down to the bottom of the path viewer canvas, shown by Figure 8-35,
we can see the impact on DB2 Cube Views of making a change to column
TRXN_SALE_AMT. We can see that TRXN_SALE_AMT has a relationship
Of_OLAPMember to the measure Profit that is subsequently used in other
Measures.
Figure 8-35 Impact analysis path viewer with creation view context
Note: After creating the impact analysis, the user can right-click and create
HTML documentation.
Process analysis uses process metadata to tell you the history of process runs,
including success, failure, or warnings, the parameters used, and the time and
date of execution. It focuses on the path between job designs and the events
they generate. This information is useful, for example, if you want to check
whether past jobs ran successfully or ran with errors.
Data lineage uses process metadata to tell you the history of an item of data,
including its source, status, and when it was last modified. It focuses on the
source table in a DataStage job and the derivations, transformations and lookups
that connect it to a target table in the Operational Data Store or datamart. This
information is useful, for example, if you are trying to resolve a data warehousing
design problem, and need to collect information about the way the information
was transformed for the business user from the source system.
When you right-click a captured object in one of the above classes, and choose
Data Lineage> Find Sources, or Data Lineage> Find Targets, a data lineage
path appears in the Path Viewer if a path is available for that object. The path
includes the source data collection, the target data collection, the links that
connect them, and either the number of rows read and written or the time and
date of the event.
Data lineage queries allow you to answer the following types of questions:
Which jobs updated table Sales in the last two days?
What was the overall status of DataStage job CashItems, and did it report any
unusual occurrences related to table Sales?
What data sources did job CashItems use? How exactly did it transform them
into Sales?
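To make the shape of the first question concrete, it could be expressed as the following relational query if the process metadata were held in tables such as the hypothetical RUNS and EVENTS below. This is purely illustrative: MetaStage answers these questions through its query and Path Viewer facilities, not through SQL that you write yourself.

-- Hypothetical tables; the real MetaStage Directory schema differs
SELECT DISTINCT r.JOB_NAME, r.RUN_END
FROM RUNS r
JOIN EVENTS e ON e.RUN_ID = r.RUN_ID
WHERE e.EVENT_TYPE = 'WRITE'
  AND e.TARGET_TABLE = 'SALES'
  AND r.RUN_END >= CURRENT TIMESTAMP - 2 DAYS;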
Note: Data lineage queries can also report on the success or failure of
processes, but only insofar as the processes affect specific tables that were
written to or read from. Process analysis queries look at executable objects
such as DataStage jobs and all the resources touched by the events and
activities they generate.
Jobs that run successfully generate events associated with the data resources
they access. Jobs that are aborted generate events identifying the point at which
failure occurred.
When you capture process metadata with the Process MetaBroker, these
events, along with the related activities and resources, are stored in the
MetaStage Directory. MetaStage uses objects of the Event, Activity, and
Software Executable classes to create Process Analysis paths. There are two
types of process analysis paths:
Find Runs: These paths start with a specified software executable and
continue through sets of events to the resources touched. (In the MetaStage
view, a DataStage job design is an instance of the Software Executable
class.)
Both paths follow the same data model containment and dependency
relationships used in impact analysis, but process analysis paths provide a more
direct display of the relationships between executables and runs.
Figure 8-37 is a Find Runs path example, taking ActivityJob01UD as the source
software executable.
The shading of the two run icons and the shaded area around the event icon
(lower right) illustrate what you see if a run of the selected executable fails. This
shading appears as red in the MetaStage Path Viewer and indicates failed runs.
You can trace the failure to a specific event in the run, and see that, in this case,
the failure is associated with CTransformerStage1. You can right-click the event
and inspect it to see the error message DataStage wrote to the log, which is
stored in the Event’s Message attribute. By inspecting the Actual parameter set
object, you can see the parameter values used for this run.
Note: The same process analysis can be done using the DataStage view.
The basic flow of data in the DataStage job shown in Figure 8-38 assumes that
some other system produces flat sequential files. Although DataStage has the
capability to access almost any kind of source system (including, but not
limited to, Siebel, Oracle Applications, SAP R/3, PeopleSoft, J.D. Edwards,
RDBMS, mainframe flat files and database sources, JMS, WebSphere MQ, Web
services, and XML) to load directly into the DB2 star schema, this example does
not show this configuration. The DataStage job will read the data from each file,
transform the data, and load the results into the respective tables. As the job
runs, certain operational process metadata will be produced and captured by the
MetaStage Process MetaBroker.
(Figure: the DataStage Server job reads flat files such as consumer_sales.txt and loads the star schema; events flow over TCP/IP through the ActivityCommand interface to the Process MetaBroker and its cached events directory, and completed runs are sent as run XML to the Listener on the MetaStage host, where RunImport creates Activity, RunContext, Run, and Event objects in the MetaStage Repository.)
The Process MetaBroker is installed and configured on the server host running
DataStage. When a DataStage Server job runs, the job uses the
ActivityCommand interface installed by the Process MetaBroker to
communicate process events via TCP/IP to the Process MetaBroker. As the
DataStage Server job continues to run, events will be cached in the Process
MetaBroker events directory specified in the Process MetaBroker configuration
file.
When an end event is received by the Process MetaBroker signaling that the
DataStage Server job has completed, the Process MetaBroker is ready to
transmit the run to MetaStage. The Process MetaBroker will transform each
individual event file for a particular run into a single XML file and send it to the
Listener running on the MetaStage host. The Listener is a process that runs on
the MetaStage host and listens on a particular port defined in a configuration file.
The Listener’s purpose is to wait for the Process MetaBroker to send completed
runs to the MetaStage host.
When the Process MetaBroker packages up a run in an XML file, it will connect
to the Listener on the MetaStage host and transmit the XML file. The Listener will
store the XML file in a directory specified in a configuration file. Once the run
XML files are on the MetaStage host, they can be imported into the MetaStage
Directory using the RunImport utility. The RunImport utility will read the run XML
file(s) and import them into the MetaStage Directory. Performing a RunImport will
result in Activity, RunContext, Run, and Event objects being created in the
MetaStage Directory. These objects will have relationships to DataStage ETL
design metadata objects so that data lineage and process analysis can be
performed.
When these have been completed, we will perform the steps required to import
the process metadata so it is ready for data lineage and process analysis
queries.
We must now import the DataStage job design metadata into MetaStage. To do
this we will create a DataStage import category called DataStage_p0 in
MetaStage as shown in Figure 8-40.
Figure 8-41 Importing multiple DataStage job designs from a DataStage project
After clicking New as shown in Figure 8-41, the Import Selection dialog will be
presented. Here we select Ascential DataStage v7 as the source MetaBroker
shown in Figure 8-42.
We will accept the defaults and click OK. The DataStage MetaBroker parameters
dialog will be shown. Accept the defaults and click OK.
The DataStage login dialog will be shown. For our example, we are connecting to
the host wb-arjuna and the project p0 as shown in Figure 8-43. We will click OK
here and the DataStage MetaBroker will import the contents of the p0 DataStage
project into MetaStage.
We now have all the data model and job design metadata in MetaStage. Before
we can run the DataStage jobs, we must look at the Locator concept a little
more.
Locators in MetaStage
Because of the inconsistencies in certain ODBC drivers, MetaStage cannot
always match captured table definitions to the identical table definitions
previously imported from DataStage. When MetaStage does match a captured
table definition to a previously imported table definition, it does not create a new
object in the directory, but instead connects the run-time process metadata
information to the originally imported table definition. You can then use data
lineage and process analysis to see which events touched this table definition
when the job was run, and to see which column definitions the table definition
contains. (If you are viewing the objects in the MetaStage view, table definitions
are called data collections, and column definitions are called data items.)
When it cannot match a table definition, MetaStage creates a new table definition
with the same name as the table definition in the job design, and adds it to the
directory. However, this table definition will have no column definitions, because
the Process MetaBroker does not capture column definitions during runtime. The
value of its Creation Model attribute is MetaStage instead of DataStage.
When you import table definitions from these databases into DataStage, a fully
qualified locator string is created for them based on the information in the Locator
table. This locator information remains with the table definitions when they are
imported into MetaStage or captured by the Process MetaBroker.
The Locator table will be used as a lookup while the DataStage jobs are running
so that the DataStage engine can create event files with the correct Locator path
for DataStage objects. The DDL to create the locator table is:
CREATE TABLE MetaStage_Loc_Info (
Computer varchar(64),
SoftwareProduct varchar(64),
DataStore varchar(64));
To create the locator table we submit the SQL above to the DB2 connection we
established in Figure A-13 on page 652, which was RETAIL. Next we must insert
an entry in the Locator table for the DataStage Server engine to use. The SQL for
our example is:
insert into db2admin.MetaStage_Loc_Info (Computer, SoftwareProduct,
DataStore) values ('wb-arjuna', 'DB2', 'RETAIL');
For our example the values for the SQL insert statement can be obtained by
using the MetaStage Class Browser. We will open the MetaStage Explorer and
click the Computer icon in the Shortcut bar on the left.
In our example, there will be two Computer objects, one created by the ERwin
MetaBroker and one created by the DataStage MetaBroker. We will expand the
wb-arjuna object imported by ERwin and further expand the Hosts_Resource
and Created_Resource relationships. As shown in Figure 8-44, the values
displayed for our example are the values we wish to insert into the Locator table.
Figure 8-44 MetaStage: computer instances
This means that when TableDefinition objects are created in the process
metadata, the Locator path used will be the path we specify in the locator table
MetaStage_Loc_Info described above. After submitting the SQL to insert the
Locator path entry, the table will have the following row:
SELECT * FROM MetaStage_Loc_Info;
Get Data All:
"COMPUTER", "SOFTWAREPRODUCT", "DATASTORE"
"wb-arjuna", "DB2", "RETAIL"
1 row fetched from 3 columns.
Now that we have created and populated the Locator table, we proceed to run the
DataStage jobs to produce process metadata. Since we have already configured
the Process MetaBroker and Listener, capturing process metadata for our jobs is
a simple matter of running the DataStage jobs.
We will open the DataStage Director shown in Figure 8-45 to run our DataStage
jobs. We have two jobs for this example, LoadDimensions and LoadFacts.
1. First we will run the LoadDimensions job. To do this we will highlight the
LoadDimensions job and click the Run Now button from the tool bar. Running
this job will produce a run XML file on our Listener host. The location of the
run XML file will be the value we entered into the Listener configuration file in
Table A-3 on page 644. When the LoadDimensions job is complete we will
run the LoadFacts job to produce another run XML file. The results of our
DataStage job runs are shown in Figure 8-46.
For our example, we ran two jobs and we have two resultant XML files. All the
events and activities associated with the job runs are contained in the XML files.
A sample of the run XML is shown in Example 8-1.
We can see that a Write event started and affected the LocatorComponent:
wb-arjuna->DB2->RETAIL->STAR->CAMPAIGN. We can see that the
DataStage Server used our Locator entry as part of the Locator path that was
inserted into the run XML.
Now that we have the run XML files produced, we will import the runs into the
MetaStage Directory using RunImport shown in Figure 8-39 on page 313.
RunImport is designed to be scheduled to run on a regular basis after
DataStage runs your warehouse activities. RunImport can be scheduled using
any Windows command scheduler, including the Windows at scheduler.
It is recommended that the MetaStage Explorer be shut down during
the RunImport process.
For our example we will simply open a command window and run the default
RunImportStart.bat file provided with the installation of RunImport. We will
navigate to the RunImport installation directory and run the batch command:
D:\mstage\java\client\runimport>RunImportStart.bat
The output from the RunImport for our example is shown in Figure 8-47.
We can see that the two run XML files were successfully processed and
committed to the MetaStage Directory. Any associated RunImport log
information will be in the log file location specified in the RunImport configuration
file as shown in Table A-4 on page 646.
For our data lineage query we will look at the CONSUMER dimension table and
what happened to it during DataStage job runs. To do this we will open the
MetaStage Explorer and examine the CONSUMER TableDefinition object.
In Figure 8-48, we have opened the MetaStage Explorer and clicked on the
DataStage_p0 Import category to show the DataStage objects. Highlighted is
the CONSUMER TableDefinition object.
Figure 8-48 MetaStage category browser
We will now right-click the CONSUMER object to expose the context menu for
the object. For our data lineage example we will find the sources of the
CONSUMER object as shown in Figure 8-49.
By clicking the Find Sources menu option, we see the data lineage path shown
in Figure 8-50.
We see from Figure 8-50 that the CONSUMER table was loaded from the
consumer.txt file and that in this particular job 8749 rows were inserted into the
table. The value in red indicates a write event and the values in blue indicate a
read event. Each object on the data lineage path can be inspected in detail to find out
more information about the particular object.
For example, the toConsumerTable link could be opened to drill down into the
transformations that occurred to each column on the Link object.
To do this we will open the MetaStage Explorer and click the DataStage_p0
import category and scroll down to the LoadFacts job design shown in
Figure 8-51.
We will right-click the LoadFacts job design object to expose the context menu
shown in Figure 8-52.
We will choose the Browse from LoadFacts -> Ascential DataStage v7 menu.
This will give us a tree control from which we can browse the LoadFacts job
design object in more detail. Now we have the ability to browse relationships from
the LoadFacts job design object. From here we expand the Compiles
into_Compiled job relationship. Since we started examining the job design, we
need to find the actual compiled instance of that job design that ran on the
DataStage Server. Figure 8-53 shows that we have right-clicked the LoadFacts
compiled job to expose the context menu.
Figure 8-53 Process analysis menu
On the menu we can choose to run the Process Analysis->Find Runs query.
Running the query results in the process analysis path shown in Figure 8-54.
We can see that the compiled job ran on 2003-06-18 and that there was some
problem in its execution. We know that there was a problem because the ending
event toConsumerSalesTable has a red icon. We can examine in more detail the
reason for the problem running the job by inspecting the toConsumerSalesTable
event object. If we double-click the event object we can see more detail about the
event. Figure 8-55 shows the actual DataStage Server message.
We have seen in the previous sections that MetaStage can provide tremendous
value, not only as a simple integration path to exchange metadata with other
tools and DB2 Cube Views, but as an enterprise metadata management tool.
In addition to the exchange of metadata with DB2 Cube Views, MetaStage
provides tight integration with all your warehouse tools. This provides
metadata consistency and optimizes design metadata sharing and reuse,
reducing the costs of time delays during development and production. In addition,
MetaStage stores a persistent directory of all your metadata in a location that can
be integrated into fail-safe disaster recovery systems to protect your metadata
investment.
(Figure: Meta Integration Works components, including the Metadata Integrator, Data Bridge, Model Mapper, and Component Builder, supporting metadata and data movement between Warehouse Manager and Cube Views.)
The code of the produced data movement components can be reviewed through
any Quality Assurance (QA) process, and does not depend on any middleware
(it is free of any run-time cost at deployment time). The Model Mapper provides the
mapping migrations required to support the perpetual changes in the source and
destination data stores. Indeed, one of the key features of MIW is the built-in
support for change management, facilitating the maintenance and/or generation
of new versions of the data movement components as needed. Data Connectors
are available for most popular databases via ODBC (such as DB2), as well as for
XML data sources (such as HL7 for health care) to serve the expanding needs in the
fields of EDI, e-business, and enterprise information portals.
9.1.2 Meta Integration Repository (MIR)
MIR is based on a modern 3-tier architecture, as shown in Figure 9-3, with support
for multiple users, security, and concurrency control. The repository metamodel
integrates standards like the OMG CWM and UML, and supports XMI compliant
metadata interchange. MIR can manage massive amounts of metadata and
make it persistent on most popular RDBMSs such as DB2, Oracle, or SQL Server. The
underlying repository database is fully open, allowing users to build their own
metadata Web portals, or use their existing data tools to perform metadata
reporting, mining, and even intelligence.
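As a minimal sketch of what this openness makes possible, a query like the following could drive simple metadata reporting. The names used here (MIR.REPOSITORY_OBJECTS, OBJECT_TYPE) are hypothetical stand-ins; the actual repository schema is described in the MIR documentation.

-- Hypothetical repository table; all names are illustrative only
SELECT OBJECT_TYPE, COUNT(*) AS OBJECT_COUNT
FROM MIR.REPOSITORY_OBJECTS
GROUP BY OBJECT_TYPE
ORDER BY OBJECT_COUNT DESC;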
(Figure: the Meta Integration Model Bridge (MIMB), with import and export bridges to and from IBM DB2 Cube Views XML, IBM Rational Rose MDL, and OMG CWM XMI; models pass through a non-persistent MIR, with an optional repository persistency portability layer storing the MIR persistent repository on DB2, and an export bridge to DB2 Warehouse Manager. DB2 Cube Views metadata is exchanged with OLAP Center via XML import and export.)
The exchange of metadata between various tools and DB2 Cube Views using
metadata bridges is motivated by several business cases (tools integration in the
enterprise, documentation...) and helps data warehouse specialists, database
administrators, data modelers and application developers in the following ways:
Forward engineering of a data model created in a design tool or an ETL tool
to a DB2 Cube Views cube model. This metadata movement capability allows
a data modeler to reuse metadata already designed and available in the
enterprise to quickly create a cube model in DB2 Cube Views, therefore
saving time when creating the OLAP metadata and leveraging the existing
metadata, such as business names and descriptions that are not likely to be
stored in the database.
(Figure: the development tool categories: object modeling (OM), data modeling (DM) with dimensions, data integration (ETL and EAI), data movement, and OLAP reporting with cubes.)
The tools vendors themselves can provide some of these metadata movements;
for example, IBM Rational® Rose® provides bi-directional integration between
UML object modeling and physical data modeling. Similarly, BI vendors provide
the forward engineering from their OLAP dimension design tool to their OLAP
based reporting tool. However, large corporations use best-of-breed tools from
many vendors. In such cases, MIMB can play a key role implementing all the
metadata movement required for the integration of their development tools, as
illustrated in Figure 9-7.
Figure 9-7 Possible metadata movement solutions for DB2 Cube Views
As each tool has its own tricks and each MIMB bridge has its own set of
import/export options, each scenario has been written as an independent piece
and can be read separately based on your interests.
(Figure: MIMB bridges connect DB2 Cube Views OLAP Center with DB2 Warehouse Manager, ERwin 4.1, ERwin/ERX 3.52, PowerMart/PowerCenter 5.x to 6.x, PowerDesigner 9.5, and OLAP metadata standards.)
The current MIMB v3.1 provides IBM DB2 Cube Views import and export bridges
for IBM DB2 OLAP Center, and is available for download at:
http://www.metaintegration.net/Products/Downloads/
This version 3.1 provides complete support for the foregoing use cases (1)
and (2) of forward engineering:
An ERwin star schema sample model is provided with instructions to generate
the DB2 Cube Views dimensions, facts, and cube model.
However, MIMB v3.1 currently provides incomplete support for the foregoing
use cases (3) and (4), due to current BI/OLAP limitations in the Meta
Integration Repository (MIR) metamodel of v3.x.
Note: To get the most up-to-date information on new versions and releases,
concerning metamodel extensions and support for change management and
impact analysis between all the integrated data modeling, ETL, and BI tools,
check the following site regularly:
http://www.metaintegration.net/Products/MIMB
The business name, description, and data type of relational objects are also
converted.
The produced cube model can then be edited in DB2 OLAP Center to enrich it
with additional OLAP metadata such as hierarchies, levels, cubes, calculated
measures, and more.
The generated model can be edited in the destination tool to further document it,
and add information that was not contained in the source cube model XML file.
This missing information can be physical information (such as indexes or
tablespaces) that can be retrieved automatically from the database using the
destination tool’s database synchronization features, or it can be logical
information, such as generalizations (super type sub type entities) or UML
methods.
For more mapping information, please read the MIMB software documentation,
which includes the complete mapping specification of each bridge. This
documentation can be consulted online at:
http://www.metaintegration.net/Products/MIMB/
The DB2 Cube Views cube model we used is shown in Figure 9-9.
1. Using ERwin v4, create the star schema model.
2. Using ERwin v4, generate the SQL DDL for this database.
3. Using DB2, run this SQL script to create the tables and columns of this
schema (a minimal DDL sketch follows this list).
4. Using ERwin v4, save the model as XML.
5. Using MIMB, convert this ERwin v4 XML file into a DB2 Cube Views XML file.
6. Using DB2 Cube Views, import this DB2 Cube Views XML file.
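The script run in step 3 is the ordinary DB2 DDL generated in step 2. The following is a minimal sketch only; the column lists are abbreviated and illustrative (CONSUMER_NAME, for instance, is a hypothetical column), not the complete schema used in this redbook:

-- Dimension table (abbreviated, illustrative columns)
CREATE TABLE STAR.CONSUMER (
    CONSUMER_ID INTEGER NOT NULL,
    CONSUMER_NAME VARCHAR(64),
    PRIMARY KEY (CONSUMER_ID));

-- Fact table referencing the dimension (abbreviated)
CREATE TABLE STAR.CONSUMER_SALES (
    CONSUMER_ID INTEGER NOT NULL,
    TRXN_SALE_AMT DECIMAL(15,2),
    FOREIGN KEY (CONSUMER_ID) REFERENCES STAR.CONSUMER (CONSUMER_ID));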
Figure 9-12 Specifying the table dimensional roles
Note: The role of each table should be set explicitly, so that it is saved in the
ERwin v4.x XML file format and the bridges can use it. Otherwise, if ERwin
v4.x computes the dimensional role of the table automatically, it will not be
saved in the ERwin v4.x XML file.
2) Using ERwin v4.x, generate the SQL DDL for this database
Once the model has been designed, the SQL DDL can be generated and the
database created in DB2 UDB. In the ERwin v4.x model Physical View, choose
the menu Tools -> Forward Engineer/Schema Generation to generate the
SQL script as shown in Figure 9-13.
Note: The database schema must be created in DB2 before the cube model is
imported into DB2 Cube Views.
At this point, the database has been set up and is ready to receive the cube
model metadata.
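Before saving the model as XML, a quick sanity check can confirm that the tables were created. This query is a sketch assuming the STAR schema used in this scenario:

-- List the tables created in the STAR schema
SELECT TABNAME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'STAR'
ORDER BY TABNAME;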
When the model is loaded in ERwin v4.x, choose Save As from the File menu,
select the XML format type in the Save as type list, type the file name for the
model you are saving in the File name text box, and click Save.
Note: If the ERwin v4 model is logical and physical (business names have
been defined in the logical view), the Save as XML process described above
will not properly save the physical names into the XML file, if ERwin v4.x
automatically computed these physical names.
To work around this issue, you can use an alternate Save as XML feature of
ERwin v4, located in the menu Tools -> Add-Ins -> Advantage Repository
Export. It produces a slightly different XML file format where the physical
names are expanded.
This issue does not occur if the ERwin v4.x model is physical only.
5) Using MIMB, convert ERwin XML into DB2 Cube Views XML file
Start the MIMB tool and select the import bridge labeled CA ERwin 4.0 SP1 to
4.1, and import your ERwin v4.x XML file, as shown in Figure 9-14.
The MIMB validation feature checks that the model is valid according to the rules
of the MIR metamodel. If something is wrong (a key is empty, a column does
not belong to any table, or a foreign key does not reference a primary key), it will
display a warning or error message.
The subsetting feature allows you to create a subset of the model so that the
model exported to the destination tool only contains the few tables you chose.
Select the export bridge labeled IBM DB2 Cube Views and click the Options
button to specify the export parameters as shown in Figure 9-15.
The export parameters used in this scenario are as follows:
The DB2 schema for the tables of the model is STAR, as the model may not
always specify where each table is located.
The cube model to be created will be located in the same STAR DB2 schema.
We specify that the source encoding of the ERwin v4 model is utf-8.
The other options are left with their default value.
Close this window, specify the name of the DB2 Cube Views XML file to be
created, and click the Export button.
6) Using DB2 Cube Views, import this DB2 Cube Views XML file
At this point, the cube model XML file has been created and is ready to be
opened into the DB2 OLAP Center graphical tool. Just start OLAP Center,
connect to your database, and choose Import in the OLAP Center menu as
shown in Figure 9-17.
The content of the XML file is displayed in Figure 9-18, which allows you to control
how this metadata should be imported, in case there is already some metadata
in place and an object name collision occurs:
Either update the existing objects with the newly imported version.
Or keep the current version of the metadata.
Figure 9-18 Controlling how the metadata is imported into OLAP Center
Finally, the ERwin star schema metadata converted and imported into OLAP
Center will provide the DB2 cube model in Figure 9-9 on page 339.
The business names and descriptions defined in ERwin v4.x are also converted
to the cube model, as shown in Figure 9-19.
Figure 9-19 The ERwin v4 business names and description are also converted
Congratulations, the ERwin v4.x star schema model was converted to DB2 Cube
Views!
The first step of the conversion process is to save this cube model into an XML
file. Use the OLAP Center menu OLAP Center -> Export and select the cube
model to be exported as shown in Figure 9-20.
2) Using MIMB, convert DB2 Cube Views XML into ERwin XML file
Start the MIMB software, select the import bridge labeled IBM DB2 Cube Views
and import your model. Select the export bridge labeled CA ERwin 4.0 SP1 to
4.1, select the name of the export ERwin v4 XML file, and click the Export Model
button as shown in Figure 9-21.
Figure 9-21 Converting the cube model XML file to an ERwin v4 XML file
The cube model converted to ERwin v4.x contains the business names and
descriptions, and the logical view is shown in Figure 9-23.
Figure 9-24 Logical view of the ERwin model
Figure 9-26 Specifying the tables dimensional roles
Note: The role of each table should be set explicitly, so that it is saved in the
ERwin ERX file format and the bridges can use it. Otherwise, if ERwin
computes the dimensional role of the table automatically, it is not saved in the
ERwin ERX file.
When the model is loaded in ERwin, choose Save As from the File menu, select
the ERX format type in the File format area, type the file name for the model you
are saving in the File name text box and click OK as shown in Figure 9-27.
Note: When saving a logical and physical model, the physical names of
tables, columns, and keys may not always be saved into the ERX file. Indeed,
when ERwin is used to manage the automatic generation of physical names
from logical names, only the generation rules are saved.
One solution is to make sure all physical names are explicitly set, therefore not
relying on any generation rules from the logical names.
Alternatively, when saving a model as ERX, the dialog box offers a button
called Expand, which opens another dialog box labeled Expand Property
Values. Select the DB2 tab of this window, and check the appropriate names
to expand (column name) as shown in Figure 9-28.
Figure 9-28 ERwin names expansion feature
5) Using MIMB, convert ERwin ERX file into DB2 Cube Views XML
Start the MIMB tool and select the import bridge labeled “CA ERwin 3.0 to 3.5.2”,
and import your ERX file, as shown in Figure 9-29.
Close this window, specify the name of the DB2 Cube Views XML file to be
created, and click the Export button (see Figure 9-31).
Figure 9-31 Exporting the model to DB2 Cube Views
6) Using DB2 Cube Views, import this DB2 Cube Views XML file
At this point, the cube model XML file has been created and is ready to be
opened into the OLAP Center graphical tool. Just start OLAP Center, connect to
your database and choose Import in the OLAP Center menu to get the display in
Figure 9-32.
Figure 9-32 Specifying the XML file to import into OLAP Center
Figure 9-33 Controlling how the metadata is imported into OLAP Center
Finally, the ERwin star schema metadata converted and imported into OLAP
Center will provide the DB2 cube model in Figure 2-9 on page 15.
The business names and descriptions defined in ERwin are also converted to the
cube model, as shown in Figure 9-34.
Figure 9-34 The ERwin business names and descriptions are also converted
Congratulations, the ERwin star schema model was converted to DB2 Cube
Views!
You can now edit this model in DB2 OLAP Center to enrich it with additional
OLAP metadata such as hierarchies, levels, cubes, calculated measures, and
more.
This step has already been detailed in “1) Using DB2 Cube Views, export your
cube model as an XML file” on page 349. The DB2 cube model is saved into an
XML file using OLAP Center > Export in DB2 Cube Views.
2) Using MIMB, convert DB2 Cube Views XML file into ERX file
Start the MIMB software, select the import bridge labeled IBM DB2 Cube Views
and import your model. Select the export bridge labeled “CA ERwin 3.0 to 3.5.2”,
select the name of the export ERwin ERX file, and click the Export Model
button.
Figure 9-35 Converting the DB2 cube model XML to an ERwin ERX file
3) Using ERwin, import this ERX file
At this point, you can open the generated ERX file into ERwin using menu File ->
Open. When the file choice window appears, select ERwin ERX (*.erx) in the
List files of type list box and select the ERX file produced by MIMB as shown in
Figure 9-36.
The cube model converted to ERwin contains the business names and
descriptions, and a logical view is displayed in Figure 9-37.
Forward engineering from PowerDesigner to DB2 Cube Views
The goal of this scenario is to demonstrate how an existing PowerDesigner PDM
model can be converted to a DB2 cube model.
During the implementation of this data model in PowerDesigner, the dimensional
modeling features of the PDM physical diagram were used. A dimensional type
was specified on each table (Fact or Dimension) as shown in Figure 9-39.
Figure 9-41 DB2 schema generation
When the model is loaded in PowerDesigner, choose Save As from the File
menu, select the Physical Data Model (xml) (*.pdm) format in the Save as type
list, type the file name for the model you are saving in the File name text box and
click Save.
We recommend not using such external shortcuts for the purpose of metadata
integration with DB2 Cube Views.
Select the export bridge labeled IBM DB2 Cube Views and click the Options
button to specify the export parameters as shown in Figure 9-43.
Figure 9-43 Specifying the export bridge parameters
Close this window, specify the name of the DB2 Cube Views XML file to be
created, and click the Export button to get the display shown in Figure 9-44.
6) Using DB2 Cube Views, import the DB2 Cube Views XML file
At this point, the cube model XML file has been created and is ready to be
opened into the OLAP Center graphical tool. Just start OLAP Center, connect to
your database, and choose Import in the OLAP Center menu to get the display
shown in Figure 9-45.
Figure 9-45 Specifying the XML file to import into OLAP Center
The content of the XML file is displayed in Figure 9-46, which allows you to control
how this metadata should be imported, in case there is already some metadata
in place and an object name collision occurs:
Either update the existing objects with the newly imported version.
Or keep the current version of the metadata.
Figure 9-46 Controlling how the metadata is imported into OLAP Center
Finally, the PowerDesigner metadata converted and imported into OLAP Center
will provide the DB2 cube model in Figure 2-9 on page 15.
1) Using DB2 Cube Views, export your cube model as an XML file
The DB2 Cube Views model used in this scenario is the one shown in Figure 9-9
on page 339.
This step has already been detailed in “1) Using DB2 Cube Views, export your
cube model as an XML file” on page 24. The DB2 cube model is saved into an
XML file using OLAP Center in DB2 Cube Views.
Figure 9-48 Converting the cube model XML file to a PowerDesigner XML file
The cube model converted to PowerDesigner contains the business names and
descriptions as displayed in Figure 9-50.
Note: The Rose MDL file format is very widely used as a de facto standard
means of exchanging UML metadata. Many design tools support it and
therefore this scenario can also be used to interact and exchange metadata
with them.
A non-exhaustive list of such tools would include IBM Rational XDE, Microsoft
Visual Studio 6 (Visual Modeler), Sybase PowerDesigner, Embarcadero
Describe, Gentleware Poseidon and Casewise.
Figure 9-51 The Rose object model
To create this model in Rose, the UML object model was developed first, and was
then transformed into a relational database schema, as shown in Figure 9-53,
Figure 9-54, Figure 9-55, and Figure 9-56.
Figure 9-53 Create a new database
Figure 9-57 Generation of the SQL DDL in Rose
At this point, the database has been set up and it is ready to receive the cube
model.
When the model is loaded in Rose, choose Save from the File menu.
5) Using MIMB, convert Rose MDL into DB2 Cube Views XML
Start the MIMB tool and select the import bridge labeled IBM Rational Rose
2000e to 2002, and click the Options button to specify the import parameters as
shown in Figure 9-58.
Then, we can import the Rose MDL file, as shown in Figure 9-59.
Figure 9-59 Importing the Rose model into MIMB
Select the export bridge labeled IBM DB2 Cube Views and click the Options
button to specify the export parameters as shown in Figure 9-60.
Close this window, specify the name of the DB2 Cube Views XML file to be
created, and click the Export button as shown in Figure 9-61.
6) Using DB2 Cube Views, import the DB2 Cube Views XML file
At this point, the cube model XML file has been created and is ready to be
opened into the OLAP Center graphical tool. Just start OLAP Center, connect to
your database, and choose Import in the OLAP Center menu to display
Figure 9-62.
Figure 9-62 Specifying the XML file to import into OLAP Center
The content of the XML file is displayed in Figure 9-63, which allows you to control
how this metadata should be imported, in case there is already some metadata
in place and an object name collision occurs:
Either update the existing objects with the newly imported version.
Or keep the current version of the metadata.
Finally, the Rose star schema metadata converted and imported into OLAP
Center will provide the DB2 cube model in Figure 9-9 on page 339.
The business names and descriptions defined in Rose are also converted to the
cube model, as shown in Figure 9-64.
Figure 9-64 The Rose objects’ business name and description are also converted
Congratulations, the Rose star schema model was converted to DB2 Cube
Views!
This step has already been detailed in “1) Using DB2 Cube Views, export your
cube model as an XML file” on page 24. The DB2 cube model is saved into an
XML file using OLAP Center > Export in DB2 Cube Views.
2) Using MIMB, convert DB2 Cube Views XML into Rose MDL
Start the MIMB software, select the import bridge labeled IBM DB2 Cube Views
and import your model. Select the export bridge labeled IBM Rational Rose
2002, select the name of the export Rose MDL file, and click the Export Model
button as shown in Figure 9-65.
Figure 9-65 Converting the cube model XML file to a Rose MDL file
Figure 9-66 The cube model converted to Rose Data Modeler
9.5.5 Metadata integration of DB2 Cube Views with CWM and XMI
The Object Management Group (OMG) Common Warehouse Metamodel (CWM)
is an industry standard metamodel supported by numerous leading data and
metadata management tools vendors. The CWM metamodel shown in
Figure 9-68 is defined as an instance of the Meta Object Facility (MOF)
meta-metamodel and expressed using the OMG Unified Modeling Language
(UML) in terms of classes, relationships, diagrams and packages. Any model
instance of the UML and CWM metamodel can also be serialized into an XML
document using the XML Metadata Interchange (XMI) facility.
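For illustration, a CWM XMI document has the general shape sketched below. This is a hypothetical, simplified fragment; apart from the XMI framing tags, the element names are illustrative and are not copied from the CWM DTD:

   <?xml version="1.0" encoding="UTF-8"?>
   <!-- Hypothetical sketch of a CWM relational table serialized via XMI 1.1 -->
   <XMI xmi.version="1.1">
     <XMI.header>
       <XMI.metamodel xmi.name="CWM" xmi.version="1.1"/>
     </XMI.header>
     <XMI.content>
       <!-- A relational table with one column, expressed as CWM objects -->
       <CWMRDB:Table xmi.id="a1" name="CONSUMER_SALES">
         <CWMRDB:Column xmi.id="a2" name="TRXN_SALE_AMT"/>
       </CWMRDB:Table>
     </XMI.content>
   </XMI>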
Figure 9-68 The OMG metamodeling levels: M3 (the MOF meta-metamodel), M2 (the UML metamodel and the CWM metamodel, with its Warehouse Management, Analysis, and Resources packages), M1 (models and metadata, such as a UML class Customer with an operation getAddress, or a CWM relational table CustomerAddress with columns Street, Zip, and so on), and M0 (instance data)
Meta Integration Technology, Inc. (MITI) has been a strong supporter of the
OMG CWM standard since 1999 and joined the OMG in 2000 as an influencing
member. In 2001, MITI became a domain member of the OMG, focusing on XMI-based metadata interchange. MITI works primarily on the implementation and support of the CWM standard with other key OMG members such as Adaptive, Hyperion, IBM, Oracle, SAS, and Unisys. MITI also actively participates in the OMG enablement showcases, demonstrating bi-directional metadata bridges with many design tools, ETL tools, and BI tools.
The April 28 - May 2, 2002 CWM Enablement Showcase at the Annual Meta
Data Conference / DAMA Symposium, San Antonio, Texas is shown in
Figure 9-69.
Figure 9-69 The CWM Enablement Showcase: design tools (such as Sybase PowerDesigner and Oracle Designer) and database schema extraction from popular RDBMSs (DB2, Oracle, SQL Server, and so on)
The metadata interchange and integration challenges using the CWM and XMI
standards are due to multiple factors:
The UML, CWM and XMI standards are evolving and each of them has
multiple versions. A given instance metadata document is therefore a
combination of versions of each of these standards.
A testing suite or open source reference implementation is not yet available.
Software vendors implementing import/export capabilities often need to extend the standard by using additional properties (TaggedValues) and additional metadata resources (the CWMX extension packages), leading to vendor-specific CWM dialects.
This scenario demonstrates how to export DB2 Cube Views metadata in the
OMG CWM XMI format.
The next scenario will show how to import OMG CWM XMI metadata generated
by DB2 Warehouse Manager into DB2 Cube Views.
1) Using DB2 Cube Views, create a cube model and export it in XML
The DB2 Cube Views model used in this scenario is the one shown in Figure 9-9
on page 339.
This step has already been detailed in “1) Using DB2 Cube Views, export your
cube model as an XML file” on page 24. The DB2 cube model is saved into an
XML file using OLAP Center > Export in DB2 Cube Views.
2) Using MIMB, convert this XML file into a CWM XMI file
Start the MIMB software, select the import bridge labeled IBM DB2 Cube Views
and import the cube model XML file as shown in Figure 9-70.
Select the export bridge labeled OMG CWM 1.0 and 1.1 XMI 1.1 and type the
name of the export file in the To field. Click the Options button to specify the
export options.
The CWM export bridge has many parameters, which control how the CWM file should be created.
We also specify that the source encoding of the cube model XML file is utf-8 as
shown in Figure 9-72.
Figure 9-72 Specifying the export options: encoding
Finally, click the Export Model button as shown in Figure 9-73 to create the
CWM XMI file.
This scenario focuses on the exchange of metadata between DB2 Warehouse Center and DB2 Cube Views via the OMG CWM XMI file format. We will demonstrate how a datamart designed in DB2 Warehouse Center in the form of a star schema can be saved as a CWM XMI file, then converted to a DB2 Cube Views XML file using the MIMB utility, and finally opened in DB2 Cube Views as a cube model.
1) Using DB2 Data Warehouse Center, create the star schema model
This scenario uses the small “Beverage Company” star schema model shown in
Figure 9-75.
2) Using Data Warehouse Center, save the model as a CWM XMI file
From the DB2 Data Warehouse Center, export the metadata as shown in
Figure 9-77.
Figure 9-77 Starting the CWM export wizard from DB2 Data Warehouse Center
3) Using MIMB, convert CWM XMI file into DB2 Cube Views XML file
At this point, start MIMB and select the IBM DB2 Warehouse Manager import
bridge. This bridge is designed to understand the DB2 Warehouse Manager
dialect of CWM. Then, import the CWM XMI file as shown in Figure 9-80.
Figure 9-80 MIMB: importing the DB2 Data Warehouse Center CWM XMI file
Click the button labeled Model Viewer to review the imported metadata in
Figure 9-81.
Figure 9-81 The sample warehouse Beverage Company imported from CWM
Select the export bridge labeled IBM DB2 Cube Views and click the Options
button to specify the export parameters as shown in Figure 9-82.
In this scenario, the star schema tables are located in a DB2 schema called
DWCTBC, which we also use to store the OLAP metadata. We specify that the
source encoding of the CWM file is utf-8. The other parameters are left at their default values.
Figure 9-83 Choosing a subsetting mode
Drag and drop the four tables to be subsetted and click the Subset selected class(es) button as shown in Figure 9-84.
The cube model has now been produced and is ready to be imported into DB2
Cube Views.
4) Using DB2 Cube Views, import this Cube Views XML file
Finally, the metadata is imported into OLAP Center as shown in Figure 9-86.
Figure 9-86 The Beverage Company star schema imported into DB2 Cube Views
Congratulations, you have imported into DB2 Cube Views a star schema
designed in DB2 Warehouse Manager!
Note: A copy of the Informatica software was not available during the writing
of this chapter, so the XML file shown here was not directly generated by
Informatica, but instead was forward engineered from an ERwin model to
Informatica using the MIMB software. Nevertheless, the principles of this
scenario are still relevant and the conversion process is the same.
2) Using MIMB, convert Informatica XML into DB2 Cube Views XML
Start the MIMB software, select the import bridge labeled Informatica
PowerMart/Center XML, select the XML file to be imported, and click Import
Model to import it as shown in Figure 9-88.
Figure 9-88 Importing the Informatica model
If the Informatica XML file contains the definition of tables that are not part of the
target star schema, you can filter them out using the MIMB subsetting feature.
Please refer to the MIMB documentation for details.
Then, select the export bridge labeled IBM DB2 Cube Views and click the Options button to specify the export parameters as shown in Figure 9-89.
Note: The fact or dimension information for each table may not always be specified in the Informatica XML file. In this case, we can specify it in the export bridge options. We also specify that the source encoding of the Informatica model is utf-8.
At this point, you can export the model to the DB2 Cube Views XML file format as
shown in Figure 9-90.
3) Using DB2 Cube Views, import this DB2 Cube Views XML file
At this point, the cube model file is ready for importing. Select the Import item in
the OLAP Center menu and use the wizard to import the file. You can see the
result in Figure 9-91.
Figure 9-91 The cube model as imported in DB2 OLAP Center
Congratulations, you have imported into DB2 Cube Views a star schema
designed in Informatica PowerMart/Center!
Change in the enterprise is a reality; therefore, each of these tools has implemented metadata version and configuration management features to properly capture change and manage the versions of the enterprise metadata. Whether change occurs first in the database or in a tool managing the database, and whether it is a small incremental update or a dramatic new version, it needs to be propagated to the other tools in the enterprise, and these tools also need to understand what has changed and how to handle the new version of the metadata.
The Meta Integration Model Bridge utility can extract the new version of the metadata from the source tool where the change happened, transform this metadata using sophisticated forward engineering and reverse engineering algorithms across vendor tools, formats, and methodologies, and publish the new version of the metadata into the destination tool.
To analyze the new version of the metadata in the destination tool and compare it to the version that may already be in place, it is recommended to use that tool's version management features, such as a metadata comparator and metadata integrator; these are implemented in most design tools, ETL tools, and their underlying metadata repositories.
When DB2 Cube Views is the destination of a metadata flow, the version and configuration management features are available in the XML import wizard. They can be used to control whether the current version of the metadata stored in the DB2 catalog is replaced by the new version of the metadata in the XML file.
For example, the Meta Integration Repository server (MIR) and Works client
(MIW) suite of software is fully equipped for metadata version and configuration
management, with a metadata repository manager, metadata comparator,
metadata integrator, metadata mapper, in addition to all the metadata bridges
also available in the Meta Integration Model Bridge (MIMB) utility (more than 40
of them as of summer 2003).
9.7 Conclusion: benefits
DB2 Cube Views simplifies and expedites business data analysis by presenting
relational information as multidimensional objects.
MIMB helps you do exactly this, and more. MIMB allows you to bring design and modeling information into DB2 Cube Views, and automates the creation of the cube model and its related dimensions.
When you are done with your DB2 Cube Views design, you can also use MIMB
to exchange your multidimensional model with your BI and reporting tools.
MIMB allows you to reuse the multidimensional objects you've created in DB2
Cube Views and populate this metadata in your BI and reporting tools.
DB2 UDB V8.1 is a close partner product to DB2 OLAP Server, and together
these database products provide high level functionality and performance in
order to enable business managers to analyze and more effectively manage
business performance.
Figure 10-1 illustrates the DB2 OLAP Server functions in a relational database
environment. At the bottom of the figure three types of database cubes are
shown. On the right hand side there is the MOLAP cube where all data that is
queried by the user is stored in the DB2 OLAP Server MOLAP file structure. In
the middle there is again the MOLAP database, but in addition there is the ability
for the user to drill through to the underlying relational database using Integration
Server drill through. This relational database is represented by the box at the top
left of the figure and is shown as a star schema model. Finally, on the left hand
side, there is the hybrid database whereby higher levels of the hierarchies are
held in the MOLAP database and lower levels are held in the relational database.
The metadata for Integration Server is also stored in a relational database, and
this is represented by the box at the top left of the figure and labelled the IS
metadata catalog.
Many end user tools are available to query the data in DB2 OLAP Server.
Figure 10-1 DB2 OLAP Server in a relational environment: the IS metadata catalog and star schema model in DB2 (accessed via ODBC), DB2 OLAP Integration Server (member load, data load, and drill-through reports), the Hybrid Analysis engine, DB2 OLAP Server, and the Essbase APIs
In order to perform these tasks, the metadata in the Integration Server model has
to map relational tables to dimensions in DB2 OLAP Server. It also has to
describe how the hierarchies can be built from the relational data, and any
transformations that are to take place. The metaoutline needs to specify the sequence of the dimensions that are required in the physical outline and select the hierarchies that are to be built from those defined in the model. Measures can be defined directly from the relational database, or complex measures can be built using the many function templates available in DB2 OLAP Server. The metaoutline is used to build the physical outline, and as such there is additional metadata that can be specified in the metaoutline that is specific to OLAP Server, for example, specifying whether the dimension is dense or sparse.
Figure 10-2 illustrates this process. Integration Server holds metadata that
describes both the source relational database and the target DB2 OLAP Server
database. With this information, Integration Server can generate and run the
SQL that is required to both build the target outline, and perform the data load for
the target DB2 OLAP Server database.
Figure 10-2 Integration Server uses its metadata about the DB2 source and the DB2 OLAP Server target to create the outline and load the data
There are other features of DB2 OLAP Server available such as Enterprise
Services, Administration Services, Spreadsheet Services and OLAP Mining. For
more information on DB2 OLAP Server please refer to DB2 OLAP Server V8.1:
Using Advanced Functions, SG24-6599 and go to the following Web site:
http://www-3.ibm.com/software/data/db2/db2olap/
The Integration Server Bridge is a two-way bridge, meaning that metadata can
be sent from DB2 Cube Views to Integration Server and also from Integration
Server to DB2 Cube Views. However, you must always bear in mind that DB2
Cube Views metadata is designed for OLAP in general, whereas the Integration
Server metadata is specific to DB2 OLAP Server. Therefore, there will be elements in each product that cannot be mapped when sent across to the other product.
This means that some metadata will be lost no matter which direction the metadata flows. It is therefore not recommended that the bridge be used for round-tripping.
Table 10-1 shows the mapping that takes place between DB2 Cube Views and Integration Server.
Table 10-1 Object mapping between DB2 Cube Views and Integration Server

   Integration Server object    DB2 Cube Views object
   Fact                         Facts
   Dimension                    Dimension
   Member                       Attribute
   Hierarchy                    Hierarchy
   Join                         Join
   Metaoutline                  Cube
The Integration Server Bridge reads from and writes to XML files, and runs on
the Windows platform only.
This section suggests some of the issues to consider in the following scenarios:
DB2 OLAP Server and DB2 Cube Views not installed
DB2 OLAP Server with Integration Server installed, but DB2 Cube Views not
installed
OLAP Server installed, but Integration Server and DB2 Cube Views not
installed
DB2 Cube Views installed, but OLAP Server not installed
The metadata in DB2 Cube Views is generic to OLAP, whereas the Integration
Server metadata is specific to DB2 OLAP Server. Also, DB2 Cube Views
requires unique names within object type across the whole model whereas
Integration Server allows duplicate names in different contexts. For example, a
hierarchy in one dimension can have the same name as a hierarchy in another
dimension. DB2 Cube Views would not allow this. In general, therefore, the
process flow may be to create the metadata in DB2 Cube Views and then export
across the bridge to Integration Server. Once in Integration Server it can be
further enhanced for DB2 OLAP Server specific functionality.
Furthermore, the metadata flow may need to take into account additional
products other than just DB2 OLAP Server. The metadata flow may start with a
push into DB2 Cube Views (across a bridge) from a data modelling tool or an
ETL tool, for example. In this case the natural flow between DB2 Cube Views and
Integration Server would again be from DB2 Cube Views to Integration Server.
Figure 10-3 provides an illustration of the scenario for metadata flow from DB2
Cube Views to Integration Server:
1. Create the metadata in DB2 Cube Views by any of the methods available.
2. Export the metadata to an XML file.
3. Process the XML file through the Integration Server Bridge.
4. Import the XML files that are produced into Integration Server.
Figure 10-3 Processing the exported XML file through the Integration Server bridge produces one XML file for the IS model and one for the IS metaoutline
The result of having imported the XML files that have been generated by the
bridge into Integration Server will depend on whether the cube model, or the
cube model and the cube, were exported originally from DB2 Cube Views. A
cube model in DB2 Cube Views maps to an Integration Server model. A cube in DB2 Cube Views maps to a metaoutline in Integration Server. Figure 10-3
shows an input XML file with information from both a cube model and a cube
being processed by the bridge to generate two XML files: one for the Integration
Server model and one for the Integration Server metaoutline.
The metadata in DB2 Cube Views is generic to OLAP and not specific to
Integration Server. Some metadata objects in DB2 Cube Views, such as
aggregation scripts, have no equivalent in Integration Server, and for these
objects it will not be possible to flow the metadata from DB2 Cube Views to
Integration Server (IS).
In this case the only software product that needs to be installed is DB2 Cube
Views.
The task here, therefore, is to export the metadata from Integration Server to
DB2 Cube Views in the reverse direction across the bridge, as can be shown in
Figure 10-4. The Integration Server model and metaoutline are exported
separately from Integration Server, and then the bridge combines these two XML
files into a single XML file that can then be imported into DB2 Cube Views.
Figure 10-4 Metadata flow from Integration Server across the bridge to DB2 Cube Views
The metadata in Integration Server is specifically created in order to generate
DB2 OLAP Server databases, the functionality of which cannot be totally
replicated within DB2 UDB V8.1. Therefore, it may not be possible to flow the
metadata for the more complex objects from Integration Server to DB2 Cube
Views.
10.2.3 DB2 OLAP Server installed, but not IS and DB2 Cube Views
In this scenario we have an existing installation of DB2 OLAP Server, but until
now Integration Server has not been implemented. Here we are introducing both
Integration Server and DB2 Cube Views. It is assumed that DB2 UDB V8.1 is
already installed and that a DB2 multidimensional database was already being
used to load data into the DB2 OLAP Server database via data load rules files.
Here we have DB2 OLAP Server applications and databases, but no metadata to
describe the relational database sources and the dimensions and hierarchies
within those data sources.
It may well be that for some DB2 OLAP Server databases the effort involved in generating the metadata outweighs the benefits, and a decision is taken not to generate metadata for those databases: for example, if those databases perform well and there are no plans to introduce Hybrid Analysis into them, or if they have no data load performance issues.
Furthermore, if those databases are loaded from non-DB2 data sources, then
metadata exchange with DB2 Cube Views will not be appropriate.
Metadata can either be created in DB2 Cube Views and put through the
Integration Server bridge into Integration Server, or it can be created in
Integration Server and the flow can then be from Integration Server, across the
bridge into DB2 Cube Views. For the same reasons that were discussed in the
initial scenario where neither DB2 OLAP Server nor DB2 Cube Views were
installed, the flow of metadata may well be from DB2 Cube Views to Integration
Server.
10.2.4 DB2 Cube Views installed, but not DB2 OLAP Server
In this scenario we have an existing installation of DB2 Cube Views and we are
introducing DB2 OLAP Server, including Integration Server. The DB2 Cube
Views implementation has created metadata in the form of one or more cube
models, each having one or more cubes, to describe a DB2 multidimensional
database.
One suggestion that is worth thinking about before starting with the Integration
Server bridge has to do with organizing the output from each process. Use of the
Integration Server bridge is going to generate a number of XML files. It is
recommended that a naming convention be adopted to assist in identifying both
the content of each XML file, and the process that generated each XML file. For
example, was the XML file generated as a result of an export from DB2 Cube
Views or as the result of a bridge process? If it was produced by the bridge, was
the bridge being used to process the XML from DB2 Cube Views or from
Integration Server? Having separate folders for each process may assist with the
task of identifying both the content of an XML file and the process that was used
to create it.
10.3.1 Metadata flow from DB2 Cube Views to Integration Server
The tasks that need to take place in order for metadata to flow from DB2 Cube
Views to Integration Server are as follows:
1. Export the metadata from OLAP Center to an XML file.
2. Process the XML file through the Integration Server bridge to produce one or
two output XML files.
3. Import the output XML file(s) into Integration Server.
If you click a cube, then the cube model that is associated with that cube will also
be selected. A cube in DB2 Cube Views will map to a metaoutline in Integration
Server. It is not possible to select a cube without a cube model because a cube
model is required before a cube can exist. Moreover, a metaoutline without a
model in Integration Server is not valid. So by selecting to export a cube in DB2
Cube Views, the result in Integration Server will be both a model and a
metaoutline.
In DB2 Cube Views V8.1 FP2+, it is not possible to select more than one cube for
export from within OLAP Center. Therefore if you have more than one cube in a
cube model that you wish to export to Integration Server, you need to perform the
export, bridge, import process multiple times.
Having selected the cube model or cube that you wish to migrate, you must then
enter the full path name of the XML file that you wish to create. There is a browse
button available to assist in selecting the appropriate drive and folder.
Finally, when you have entered the export file name, click the OK button. You
should then receive an OLAP Center message informing you that your export
has been successful.
To launch the bridge, use the command line and run the ISBridge.bat file directly.
By default the ISBridge.bat file is located in the SQLLIB\bin directory.
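For example, with a default installation the bridge can be started from a Windows command prompt as follows (the drive and path may differ on your system):

   C:\> cd \SQLLIB\bin
   C:\SQLLIB\bin> ISBridge.bat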
When launching the bridge, you will be presented with the IS Bridge window as
shown in Figure 10-6.
Figure 10-6 Integration Server Bridge window
The Integration Server bridge window contains two tabs. The To DB2 tab is used
if processing metadata from Integration Server to DB2 Cube Views. The From
DB2 tab is used if processing metadata from DB2 Cube Views to Integration
Server. As this section is concerned with going from DB2 Cube Views to
Integration Server, then at this point, you should select the From DB2 tab and
you will be presented with the window shown in Figure 10-7.
Figure 10-7 The IS bridge from DB2 Cube Views to Integration Server
In the DB2 Cube Views XML file name field, enter the full path name of the XML
file that was created as the result of the export from DB2 Cube Views. There is a
Browse button available to assist in finding the correct file and folder. In the
Output directory field, enter the full path name of the directory in which you want the output XML file(s) to be created.
Figure 10-8 Use of bridge from DB2 Cube Views to Integration Server
When you click the Apply button, the bridge will process the input XML file and
generate the output XML file(s) in the target directory specified. You should
receive a successful completion message that will also inform you of the name of
the output XML file which gets created in order to generate the Integration Server
model, and the name of the output XML file which gets created in order to
generate the Integration Server metaoutline (if applicable).
The bridge will generate a log file. The log file that is generated has a different
name for each direction of the bridge. When processing from DB2 Cube Views to
Integration Server the log file is called aislog.xml and by default is located in the
directory where the ISBridge is started.
The log file will contain informational messages recording the names of the XML
files that have been created. It will also detail any objects that could not be
mapped. It is important therefore that you review this log as it is a record of what
has not been mapped across into Integration Server. If any of these objects are
required in Integration Server they will have to be created manually.
Table 10-1 on page 422 contains the mappings that take place. A list of the
objects that cannot be mapped by the bridge can be found in the manual, Bridge
for Integration Server User’s Guide, SC18-7300.
If you have more than one cube in the cube model that you wish to take across to
Integration Server, then you will have to repeat this process for each XML output
file that was exported from DB2 Cube Views. When you process the subsequent
files across the bridge the bridge will inform you that the model XML file already
exists if you specify the same output directory. You then have the option to
replace or not to replace. If you choose not to replace, then only one XML file (for
the metaoutline) will be generated.
The window you get should look something like that displayed in Figure 10-9.
Click the Import to Catalog button to import the model into Integration Server.
Once the import process has completed, you can go into Integration Server to
view the results of the import. Figure 10-11 shows the model as it would appear
in the Integration Server model.
Figure 10-11 Result of import into Integration Server model
If you have more than one cube from the same cube model to bring across to
Integration Server, then you should repeat the process for each cube. Selecting
the additional cubes will also select the same cube model again for export. When
you process this through the bridge, therefore, you will not only get the XML file
that you need to import in order to create the second metaoutline, but also you
will once again get the XML file for creating the Integration Server model.
Furthermore, if you requested the same output directory, it will overwrite the original XML file that was generated by the previous bridge process. Clearly, when
importing into Integration Server, you only need to import the XML file for the
additional metaoutlines.
Naming considerations
In terms of names, you will notice that the name given to the metaoutline is the
name of the cube within DB2 Cube Views. The accounts dimension is given a
fixed name of Accounts. Moreover, in general, the names that you will see are
the actual names from DB2 Cube Views, not the business names. There is no
concept of comments in Integration Server, so there is nowhere to store any
comments that have been documented in DB2 Cube Views.
The reason that the column names are used instead of the business names is to prevent problems arising should you want to combine two or more tables in the Integration Server model metadata, that is, by dropping one table on top of the other in the right-hand panel of the Integration Server model. In order to do this, the metadata column names need to be unique. If they are not, Integration Server will rename the second column in the model metadata by prefixing the
column name with the table name and an underscore character. This is
illustrated in Figure 10-12.
The top of this figure shows two tables called MARKET and REGION as they
would be displayed in the Integration Server model if they were displayed
separately. There is a column called REGIONID in both tables. The bottom of
the figure shows the result of dropping the region table on top of the market table
in the Integration Server model. The REGIONID from the second table is
renamed to REGION_REGIONID.
To allow for this functionality, the Integration Server bridge uses the column names rather than the business names when moving metadata from DB2 Cube Views to Integration Server.
Related attributes
One of the DB2 Cube Views constructs that is not supported by the bridge is the
attribute relationship. If a descriptive attribute relationship was defined in DB2
Cube Views you may wish to reassign that relationship in Integration Server by
defining the descriptive column as an alias of the member that gets created in the
metaoutline. Similarly an associated attribute relationship may equate
conceptually in DB2 OLAP Server terms to an attribute. In this case you should
manually create the attribute dimension in Integration Server.
Calculated measures
Some, but not all, of the calculated measures can be mapped across to Integration Server, and those that are mapped may appear differently in Integration Server.
When this is mapped across to Integration Server, the measure is created in the
Integration Server model in the Accounts dimension with a transformation rule,
as is shown in Figure 10-13.
For other measures, the Integration Server bridge will attempt to build a measure
hierarchy in the Integration Server metaoutline. For example, in DB2 Cube Views
a Sales per unit measure is defined as in Example 10-2.
If the bridge is unable to build a straightforward measure hierarchy, then the
measure will be dropped. Example 10-3 shows a similar calculation to
Example 10-2 in that a division is involved, however, in this example the value
that is being divided is in itself an expression.
In this case the measure will be dropped and will not appear in Integration
Server.
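To make the distinction concrete, the following sketch contrasts the two kinds of calculation; the column names are hypothetical and are not those of the book's Example 10-2 and Example 10-3:

   -- A simple ratio: the bridge can map this to a measure hierarchy
   SUM(SALES) / SUM(QUANTITY)

   -- The dividend is itself an expression: the bridge cannot build a
   -- straightforward measure hierarchy, so the measure is dropped
   (SUM(SALES) - SUM(COST_OF_GOODS)) / SUM(QUANTITY)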
The Integration Server bridge will not create any formula in the Integration Server
metaoutline. What it will first try to do is to map a DB2 Cube Views measure to a
measure in the Integration Server model as in the Profit example in Figure 10-13
on page 437. Only if it is not able to do this will it then attempt to map the
measure to a measure hierarchy in the Integration Server metaoutline as shown
in Figure 10-14 on page 437. The most likely reason for it not being able to map
to a measure in the model in Integration Server is if the aggregation function is
set to None in DB2 Cube Views.
When considering the measures, you may choose to group measures differently
in Integration Server and to change the consolidation attributes. Member
properties such as the use of two-pass calculation and dynamic calc storage
settings should also be reviewed.
When adding any new items into Integration Server, consider the effect on the
resulting SQL that will be generated. If you plan to make use of an MQT, then
clearly anything that you add must be able to be derived from the available
MQT(s).
2. Export the Integration Server metaoutline metadata to an XML file.
3. Process these XML files through the Integration Server bridge to produce one
output XML file.
4. Import the output XML file into DB2 Cube Views.
Select the model that you wish to export and click the Save as XML File button.
When you have saved the model, repeat the process for the metaoutline.
At this stage you will have two XML files, one for the Integration Server model
export and one for the Integration Server metaoutline export.
If the model has more than one metaoutline that you wish to take over to DB2
Cube Views, then you will also need to repeat the export process for each
metaoutline.
In this direction you are required to enter four items, as shown in Figure 10-17.
Figure 10-17 The IS bridge from Integration Server to DB2 Cube Views
You must specify the full path names of the model and metaoutline XML files that
were exported from Integration Server. The Browse button is available to assist
with locating these files. You are also required to specify the schema name of the
database tables that relate to this metadata. Finally you should enter the full path
name of the XML output file that you wish the bridge to create. It is important that
you specify the .xml file type suffix when you enter the file name.
After you have clicked the Apply button, you should receive a successful
completion message once the bridge process has finished. The output XML file
should be created in the directory that you specified.
The bridge will generate a log file. The log file that is generated has a different
name for each direction of the bridge. When processing from Integration Server
to DB2 Cube Views the log file is called isalog.xml and by default is located in the
directory where the ISBridge is started.
The log file will be empty if the bridge was able to map everything successfully.
Objects that could not be mapped should be reported in the log. It is
recommended therefore that you review this log to see those objects that have
not been mapped across into DB2 Cube Views. At Fixpack 2+ of the product, it is possible that some objects that could not be mapped do not appear in the log. A manual review in OLAP Center of the imported objects is therefore recommended. A list of the objects that cannot be mapped by the bridge can be found in the manual, Bridge for Integration Server User’s Guide, SC18-7300.
Table 10-1 on page 422 contains the mappings that do take place.
If you exported more than one metaoutline from Integration Server, then you will
need to repeat this process for each metaoutline that you wish to bring across to
DB2 Cube Views. Note that, for every metaoutline that you wish to process, you
will also be required to specify the exported model XML file each time.
Start up OLAP Center and from the menu bar click OLAP Center->Import and
you will be presented with the first screen of the Import Wizard as shown in
Figure 10-18.
After having entered the full path name of the XML file that was generated by the
Integration Server bridge, click Next, and you will be presented with the list of
metadata objects that OLAP Center is going to import, as shown in Figure 10-19.
Click the appropriate radio button to specify how the OLAP Center import should
resolve existing names.
The import should then complete and the result can be viewed from OLAP
Center.
If you are bringing across more than one metaoutline, you will have more than
one XML file to import. When you import the additional XML file(s) that contain
the additional migrated metaoutline(s) you will of course be including the model
metadata each time in the XML file that you are importing. The import wizard will
recognize that some of the objects that you are attempting to import exist already
in the cube model, and when you get to the screen shown in Figure 10-19, the
wizard will display the number of new objects (reflecting the new metaoutline)
and the number of existing objects (reflecting the already imported model and
metaoutline) that you are requesting to import. You can then select the
appropriate radio button option to either replace or not replace the existing
objects with the ones in the current import XML file.
Once you have imported the additional metaoutlines, the end result will be one
cube model with multiple cubes, one cube per metaoutline.
Some of the considerations are discussed in this section:
Bridge mapping
Measures considerations
Naming considerations
Alternate hierarchies
Hidden columns
Automatically generated time dimension
Bridge mapping
The Integration Server model will have been mapped to a cube model in DB2
Cube Views, and the Integration Server metaoutline will have been mapped to a
cube in DB2 Cube Views. The name of the Integration Server model will become
the name of the DB2 Cube Views cube model. The name of the metaoutline will become the name of the DB2 Cube Views cube.
The bridge will attempt to map facts in the accounts dimension of the Integration Server model back to measures in the cube model, and members of the accounts dimension in the metaoutline back to measures in the DB2 Cube Views cube.
Measures considerations
In Integration Server the measures are usually a combination of measures that
can be mapped straight back to a column in the fact table, and measures that are
derived or calculated measures. Measures that can be mapped straight back to
columns in the fact table will appear in both the Integration Server model and the
metaoutline. This follows the architecture rules for DB2 Cube Views where
measures that are in the cube must also appear in the cube model. For these
types of measures the mapping from Integration Server to DB2 Cube Views is
straightforward.
Naming considerations
The bridge takes into account the differences in requirements for the uniqueness
of names in Integration Server and DB2 Cube Views. The rules are stricter in
DB2 Cube Views than they are in Integration Server, and the bridge therefore
performs some name changes of objects in order to avoid name collisions. So,
for example, in Integration Server, a dimension in one metaoutline can have the
same name as a dimension in another metaoutline for the same model. In DB2
Cube Views, dimensions referenced in different cubes for the same cube model
cannot have the same name. In Integration Server, a hierarchy created for
dimension A in a model can have the same name as a hierarchy for dimension B
in the same model. In DB2 Cube Views, hierarchy names must be unique across
the cube model.
Therefore, to avoid name collisions, the Integration Server bridge will perform
name changes for dimensions and hierarchies when processing the metadata
from Integration Server to DB2 Cube Views.
For dimensions, the name change occurs at the individual cube level; there are
no name changes at the cube model level. At the cube level, the dimension
name is changed such that it is prefixed with the cube name and a blank
character. So, for example, if there were a metaoutline in Integration Server
called Sales Cube that contained a dimension called Customer, then when taken
across the bridge and imported into DB2 Cube Views, the resulting dimension
name would be Sales Cube Customer.
The Integration Server bridge will attempt to map a metaoutline hierarchy back to
a hierarchy in the Integration Server model. If it is unsuccessful, the bridge will
create a cube model hierarchy for the cube hierarchy to reference. If the bridge
needs to create a cube model hierarchy, it will use a naming convention of NewDimensionName with a suffix of HIER.
Note: Be aware that any name changes that the Integration Server bridge
performs are not logged in the isalog.xml file.
Alternate hierarchies
Another consideration regarding hierarchies is that in Integration Server a
dimension in a metaoutline may have multiple or alternate hierarchies. This will
not map to DB2 Cube Views because at the cube level only one hierarchy per
dimension is permitted. If there is more than one hierarchy for any given dimension in a metaoutline, the bridge will map the first hierarchy that is presented to it in the XML file that is exported from Integration Server. This may or may not be the first hierarchy that is presented to the user in the user interface.
Hidden columns
In the Integration Server model it is possible to flag that a column should be
hidden. This is often used where there are many columns in the relational table
that are not required for the OLAP Server database. The Integration Server
bridge assumes that hidden columns are not required, and will therefore only
take them across to DB2 Cube Views if they are required in a join. If a column is
flagged as hidden and it is not required for a join, it will not be taken across to
DB2 Cube Views as an available attribute.
Should you wish to include these hidden columns in DB2 Cube Views then you
can add them back in OLAP Center once the import has completed.
These three values can then be used to create a time hierarchy of Year, Quarter,
Month (and of course many other time hierarchies can be created in the same
way). When you export the model to XML you will see a dimension called Time in
the XML file for which the physical table name that is given is the name of the
fact table. The join between the fact table and the time dimension will physically
be a self join on the fact table.
When the Integration Server metadata is imported into DB2 Cube Views (via the
bridge), the Cube Model will reflect what was created in the Integration Server
model. There will be a Time dimension that is based on the fact table, the
attributes for Year, Quarter and Month will be mapped to SQL expressions as
described above, and the hierarchy Year, Quarter and Month will be created. The
join will be a self join on the fact table. By default in Integration Server the join on
the fact table will be based on the date column that was selected. If this has not
been changed then this will come across the bridge as an inner join on the date
column with a cardinality of Many:1.
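For example, assuming the selected date column on the fact table is named TRXN_DATE (a hypothetical name), the three attributes would map to SQL expressions of this form:

   YEAR(TRXN_DATE)     -- Year attribute
   QUARTER(TRXN_DATE)  -- Quarter attribute
   MONTH(TRXN_DATE)    -- Month attribute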
If left like this, an error will be received when trying to run the Optimization
Advisor. The error will indicate that a primary key is not defined using the
columns involved in the fact table self join. In order to optimize this type of cube
model a primary key must be defined, the join cardinality must be 1:1 and the join
type must be inner. An example of this is described in Chapter 6 of the IBM DB2
Cube Views Business Modeling Scenarios manual, SC18-7803.
Therefore, in order to optimize this cube model, you must ensure that a primary
key is defined for the fact table, and that the column(s) referenced in the primary
key are the ones used in the definition of the self join on the fact table in OLAP
Center.
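As a minimal sketch, assuming a fact table DWCTBC.CONSUMER_SALES (a hypothetical name) whose self join is defined on the TRXN_DATE column, the required primary key could be added as follows:

   -- The column(s) used in the fact table self join must form the
   -- primary key; primary key columns must also be NOT NULL
   ALTER TABLE DWCTBC.CONSUMER_SALES
     ADD PRIMARY KEY (TRXN_DATE);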
10.4 Maintenance
This chapter has looked at the use of the Integration Server bridge through the GUI. It is also possible to use the Integration Server bridge from a
command line using the ISBridge command. The syntax and parameters for this
command are detailed in the Bridge for Integration Server User's Guide,
SC18-7300.
This means that in combination with the db2mdapiclient utility for importing and exporting metadata to and from DB2 Cube Views (as described in Appendix D, “DB2 Cube Views stored procedure API” on page 673), and the Integration Server impexp.bat utility for importing and exporting metadata to and from the Integration Server
catalog, it is also possible to fully script the process that has been described in
this chapter.
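The outline of such a script is sketched below; the <...> placeholders stand for the actual parameters of each utility, which are documented in Appendix D and in SC18-7300 and are not reproduced here:

   REM Sketch of a fully scripted export-bridge-import flow (Windows batch)
   REM 1. Export the cube model from DB2 Cube Views as an XML file:
   db2mdapiclient <connection and export parameters>
   REM 2. Convert the exported file with the Integration Server bridge:
   ISBridge <bridge parameters>
   REM 3. Import the generated XML file(s) into the Integration Server catalog:
   impexp.bat <import parameters>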
The current Integration Server bridge only performs the mapping of metadata
objects contained in the XML files that it is given. It relies totally on the import
utility of both tools (DB2 Cube Views and Integration Server) to place the
metadata in the catalogs.
When going the other way, from Integration Server to DB2 Cube Views, it is also
recommended that a full refresh be performed.
The interaction of DB2 OLAP Server with the relational database occurs in three
areas:
Data load: Loading the MOLAP database from the relational database
Hybrid Analysis: Extending the MOLAP hierarchy into the relational
database
Integration Server drill through reports: Running relational reports from
specific intersections in the MOLAP database
10.5.1 Data load
In performing a data load, an SQL query is generated by Integration Server that
will involve joining each of the dimension tables required for this database to the
fact table, and potentially performing some aggregation of the data. This
aggregation will depend on the level of granularity in the relational source
compared to the level required for the OLAP database.
Sometimes a relational data source is purposely built for an OLAP database, and
as such, it is at the same level of granularity as the OLAP database. Other times
the OLAP database is a summary extraction from the relational source data, and
as such, the data load query will involve a level of aggregation. Usually with
larger databases and certainly if Hybrid Analysis or drill through reporting are
enabled, then the MOLAP database will contain higher level data and will
therefore require aggregations in the data load SQL.
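The generated query typically has the following shape; this is a simplified sketch with hypothetical table and column names, not SQL actually produced by Integration Server:

   SELECT D.YEAR, D.QUARTER, D.MONTH,
          S.REGION, S.DISTRICT,
          SUM(F.TRXN_SALE_AMT) AS SALES
   FROM   CONSUMER_SALES F, DATE_DIM D, STORE_DIM S
   WHERE  F.DATE_KEY  = D.DATE_KEY
   AND    F.STORE_KEY = S.STORE_KEY
   GROUP BY D.YEAR, D.QUARTER, D.MONTH, S.REGION, S.DISTRICT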
Loading data from an MQT should be faster than loading data from the base
tables. There will be no joins to perform and the data in the MQT should be an
aggregation of the base data. The higher the level of aggregation, the smaller the
MQT that the data load has to query and therefore the greater the potential
performance benefit. However, there is a cost to consider, and that is the cost of
building the MQT in terms of both time to build and storage space required.
If the level of granularity in the MOLAP database and the relational database are
the same, then the hierarchies in the cube and cube model will be the same.
When optimizing for extract, the Optimization Advisor will recommend an MQT based on the cube, which will result in a large MQT that is essentially the fact table joined to each of the dimension tables.
This type of MQT will probably take a long time to build. Additionally, the closer the MQT is to the level of the fact table, the greater the chance that the DB2 optimizer will choose not to select that MQT when deciding how best to return the result set. In general, if the number of rows in the MQT is close to the number of rows in the fact table, then there will probably be little performance benefit.
In terms of justifying the time to build, there are a number of things to consider.
Clearly, the more DB2 OLAP Server databases that get built from the one relational database, the more likely it is that the benefits of a reduced load time outweigh the cost to build the MQT. Moreover, as the MOLAP database is
unavailable to users whilst the data load is taking place, then there are additional
advantages related to end user availability to be had, by moving the workload
away from the data load and into the relational database.
However, if the MOLAP user also requires access to the same relational
database, then consideration will also need to be given to the scheduling of the MQT refreshes so as not to affect the end users. Similarly, the synchronization of
data also needs to be considered in an environment where the user has access
to both the MOLAP and relational data.
In this example a cube has been defined in DB2 Cube Views from the cube
model example used in this book. The cube is defined at a higher level than the
base fact table. For example the DATE dimension goes down to month instead of
day, the CONSUMER dimension does not go down to individuals, the STORE dimension does not go down to individual stores, and the STORE hierarchy is only
three levels deep. The cube as defined in DB2 Cube Views is shown in
Figure 10-20.
Figure 10-20 DB2 Cube Views cube
This cube was then exported from DB2 Cube Views, processed through the
Integration Server bridge, imported into Integration Server, and the resultant
metaoutline modified slightly such that a database can then be built in DB2 OLAP
Server. Figure 10-21 shows the metaoutline in Integration Server.
The section “Review the results in Integration Server” on page 434 discussed
some of the changes that might be done, having imported the metadata into
Integration Server. Listed below are the changes that were made in this example:
Changed the order of the dimensions, allocated Time and Accounts properties, and specified the appropriate dense and sparse settings.
Changed the names of the dimensions to remove the metaoutline suffix.
Changed the names of the measures to business names.
Prefixed members with their parent names where appropriate in order to ensure unique member names.
Specified dynamic calc for the higher level members of the DATE dimension.
Changed consolidation properties for the measures and added three
generation two members to the ACCOUNTS dimension in order to group the
measures as either quantity, values or loyalty points measures.
Profit % is lost as a measure when processed across the bridge. This was
added back manually in Integration Server, specifying that the percentage be
rounded to one decimal place. Figure 10-22 and Figure 10-23 show how this
measure is defined in both DB2 Cube Views and Integration Server.
The Optimization Advisor was run against the cube model specifying a query
type of Extract. The MQT script that was generated was then run to create the
MQT. As this MQT is for extract, it is a straightforward summary table with a
simple GROUP BY. The MQT aggregation is based on the cube. In the MQT
create script the tables are tagged as in Figure 10-24. The actual columns in the
MQT create GROUP BY clause are shown in Figure 10-25. By relating these
back to the cube definition in Figure 10-20 on page 452 it is clear that the extract
is matching the cube exactly.
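An extract MQT of this kind follows the general DB2 pattern sketched below; the table and column names are hypothetical, and the real script is the one generated by the Optimization Advisor:

   -- Deferred-refresh summary table over the star schema join
   CREATE TABLE DB2INFO.SALES_MQT AS
     ( SELECT D.YEAR, D.QUARTER, D.MONTH, S.REGION, S.DISTRICT,
              SUM(F.TRXN_SALE_AMT) AS SALES
       FROM   CONSUMER_SALES F, DATE_DIM D, STORE_DIM S
       WHERE  F.DATE_KEY  = D.DATE_KEY
       AND    F.STORE_KEY = S.STORE_KEY
       GROUP BY D.YEAR, D.QUARTER, D.MONTH, S.REGION, S.DISTRICT )
     DATA INITIALLY DEFERRED REFRESH DEFERRED;
   -- Populate the MQT
   REFRESH TABLE DB2INFO.SALES_MQT;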
The data load SQL can then be copied and pasted into DB2 Explain to see
whether the data load will in fact use the MQT. In order to access the data load
SQL click Outline->User Defined SQL from the Integration Server metaoutline
display. You will then see the window displayed in Figure 10-26. The SQL can
then be copied from here.
Figure 10-26 Integration Server data load SQL
Figure 10-27 shows the DB2 access plan graph from Visual Explain and this
indeed verifies that a table scan of the MQT will be used for the data load instead
of a join from the base tables.
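One way to perform this check from a DB2 command window is sketched below, assuming the data load SQL has been saved in a file named loadsql.sql (a placeholder name) and that the explain tables have already been created; db2exfmt is the command line formatter, while Visual Explain, as used here, shows the same plan graphically:

   db2 CONNECT TO <database>
   db2 SET CURRENT REFRESH AGE ANY
   db2 SET CURRENT EXPLAIN MODE EXPLAIN
   db2 -tvf loadsql.sql
   db2 SET CURRENT EXPLAIN MODE NO
   db2exfmt -d <database> -1 -o explain.out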
The cost in timerons, taken from DB2 Explain, for the data load query with the MQT and without the MQT is shown in Table 10-2.
A significant cost improvement can be seen when the query is re-routed to the
MQT.
A second example of the potential benefits of DB2 Cube Views comes from a
real customer installation. In this example the customer was a large enterprise that had implemented partitioning of DB2 OLAP Server databases, sourcing their data from a single database. Their source fact table was approximately a
300 million row table and from this they were able to build a 17 million row MQT
to meet the data load requirements of the DB2 OLAP Server databases.
The time to just run the query to extract the data from the base tables was
approximately 1.5 hours. This time does not include the time taken to write the
blocks in the DB2 OLAP Server database. The time to run the query to extract
from the MQT was just 2 minutes. Multiply this performance enhancement across
each of their MOLAP partitions and this represents a highly significant
improvement.
In this kind of environment, where there are a number of similar OLAP Server databases, it may be worth spending some time thinking about the type of cubes
to design in DB2 Cube Views. If a straight one-to-one mapping takes place from
the Integration Server metaoutlines to DB2 Cube Views, then there will be the
same number of cubes in DB2 Cube Views as there are databases in DB2 OLAP
Server. When the Optimization Advisor is run in DB2 Cube Views, the script that
is generated will create one MQT per cube.
This may not be the optimal design. It may therefore be worth considering designing fewer cubes in DB2 Cube Views that represent a super-set of the actual cubes that might otherwise be created. Certainly, in this
particular example, only one MQT was created and was used for data load by
each of the partitions in the partitioned database.
DB2 Cube Views can also benefit Hybrid Analysis. If the user’s query can be re-routed to
an MQT that is an aggregation of the base data when it crosses the line from
MOLAP to relational, then the user will experience better performance. The
transition from the MOLAP database to the relational database will be a
smoother one.
The more aggregation that has been achieved in the MQT, and therefore the
fewer the number of rows relative to the fact table, the greater the benefit for the
performance of the query.
The Optimization Advisor will attempt to create a low level slice of the data and
then to add one or a few more slices of the data. A key factor governing the slices
that are selected is the disk space limitation value that is entered when running
the Optimization Advisor.
Figure 10-28 shows an extract from the script that was created for the drill
through optimization for this cube (as is detailed in Figure 10-20 on page 452).
The extract shows that in this example the Optimization Advisor has selected two
slices of the database. The first is the low level slice which is identical to the
extract MQT. For the second slice, the Optimization Advisor has identified the
PRODUCT dimension as having the greater cardinality, and therefore as being
the one that is most likely to be in the relational database. This second slice of
the data goes right down to ITEM_DESC in the PRODUCT dimension.
Having identified the PRODUCT dimension as the anchor point for what is most
likely to be in the relational database, the Optimization Advisor will then evaluate
a number of different options for identifying what else should be in the slice. In
this example, this second slice includes the top two levels of the CAMPAIGN
dimension.
Query workloads
The DB2 OLAP Server spreadsheet client was then used to perform a selection
of queries to measure the performance achieved when a result set has to be
fetched from the relational database. The queries that were run were simple
queries chosen to demonstrate performance characteristics of extending out of
the MOLAP database and into relational with MQTs, rather than to demonstrate
any query functionality of the end user tool or any analytical functions with DB2
OLAP Server.
Both query types were run for each Hybrid Analysis scenario. This generated a
number of query workloads which are detailed below. The name for each query
workload is the name of the query type prefixed with the name of the hybrid
scenario. For most query workloads, there are variations. Each variation is
named by using an alphabetic suffix.
Hybrid 1 (H1)
The dimension with the greatest cardinality is PRODUCT, and with there being
over 10,000 products in the dimension table, the greatest benefit in terms of
reducing the size of the MOLAP database and the time it takes to calculate that
database would be in putting the leaf level of PRODUCT into relational as is
shown in Figure 10-29. In this figure, each of the dimensions and their
hierarchies are represented. Those members of the hierarchy that are inside the
area marked with the thick line are in MOLAP, and those members of the
hierarchy outside of the area (just item in this case) are in DB2.
Figure 10-29 Hybrid 1 scenario: hierarchy levels by dimension (Campaign Type,
Sub Campaign; Gender, Age Range; Year, Quarter; Department, Sub department,
Item; Region, District), with Item outside the MOLAP area
The queries that were run for the Hybrid 1 scenario are as follows:
H1_Query 1:
In this query, ITEM_DESC is in relational and other dimensions are at a high
level.
This query looks at the sale of shampoo products in the east region resulting
from new product introduction campaigns in 2000 and 2001, comparing sales
figures for females and males.
The members selected from generation two of each of the other hierarchies
are described in Table 10-3.
From the STORE dimension, the member selected was East.
This query involves the user drilling down on SHAMPOO to retrieve the
relational data. The measure used in each of the queries is sales.
Figure 10-30 shows the query as it would be in the spreadsheet client prior to
the drill down on SHAMPOO.
H1_Query 2:
In this query, ITEM_DESC is in relational and other dimensions are at a low
level.
This query looks at the sales of luxury shower products across the cities in
California resulting from a campaign targeting young single men. The report
looks at the months in the first quarter of 2001 and breaks the sales down into
three consumer groups, to see which products were purchased by the
different consumer groups.
The members selected from level zero of each of the other hierarchies are
described in Table 10-4.
From the STORE dimension, each of the cities in California was selected: San
Jose, San Francisco, Los Angeles, Sacramento, and San Diego.
For the Product dimension, six members from within the luxury shower
sub-class were selected; these members would need to be retrieved from the
relational database.
Figure 10-31 shows a subset of the report; it includes only one month from
the DATE dimension and two cities from the STORE dimension.
Hybrid 2 (H2)
In this scenario, the bottom two levels of PRODUCT (item and sub class) are put
into relational, as is shown in Figure 10-32.
Figure 10-32 Hybrid 2 scenario: item and sub class of PRODUCT outside the
MOLAP area
The queries that were run for the Hybrid 2 scenario are:
H2_Query 1:
In this query, ITEM_DESC and SUB_CLASS_DESC are in relational and
other dimensions are at a high level.
The members selected from generation two of each of the other hierarchies
are as specified in Table 10-3 on page 461. The query is very similar to the
one shown in Figure 10-30 on page 462.
This time, however, the query is split into two parts, as there are now two
levels of the PRODUCT dimension in relational. H2_Query 1a involves a drill
down on HAIRCARE and H2_Query 1b involves a drill down on SHAMPOO
as before.
H2_Query 2:
In this query, ITEM_DESC and SUB_CLASS_DESC are in relational and
other dimensions are at a low level.
The members selected from level zero of each of the other hierarchies are as
specified in Table 10-4 on page 462. The query is very similar to the one
shown in Figure 10-31.
This time again the query is split into two parts as there are now two levels of
the PRODUCT dimension in relational. H2_Query 2a includes two level one
members, Stand Shower and Luxury Shower, from the PRODUCT dimension.
H2_Query 2b is identical to H1_Query 2.
A subset of H2_Query 2a is shown in Figure 10-33. All of the cities are shown
but again only one month is shown in the figure.
Hybrid 3 (H3)
In this scenario, the leaf levels of two dimensions were placed outside of the
MOLAP database. PRODUCT is the clear choice for one of the dimensions
because of the number of items. The choice for the second dimension was not
such an obvious one in this particular model, as the cardinality across the other
dimensions was not so significant and was similar in each dimension. Usually
there would be a clear choice for which dimension might next be enabled for
hybrid. STORE was selected as an example of a second dimension, as is shown
in Figure 10-34.
Figure 10-34 Hybrid 3 scenario: the leaf levels of PRODUCT and STORE outside
the MOLAP area
The queries that were run for the Hybrid 3 scenario are:
H3_Query 1
In this query, ITEM_DESC and AREA_DESC are in relational and other
dimensions are at a high level.
The members selected from generation two of each of the other hierarchies
are as specified in Table 10-5. It is almost the same as before, the only
change being for the STORE dimension.
From the STORE dimension, the member selected was Florida.
This query also now needs to have two parts, one being a drill down on the
PRODUCT dimension and one being a drill down on the STORE dimension.
H3_Query 1a looks like Figure 10-35 prior to the drill downs.
Figure 10-35 H3_Query 1a
H3_Query 2
The members selected from level zero of each of the other hierarchies are as
specified in Table 10-4 on page 462.
The query is exactly the same as H1_Query 2 as shown in Figure 10-31 because
this already includes both ITEM_DESC and AREA_DESC.
Hybrid 4 (H4)
In this scenario, the bottom two levels of PRODUCT and the leaf level of STORE
are put into relational as is shown in Figure 10-37.
Figure 10-37 Hybrid 4 scenario: the bottom two levels of PRODUCT and the leaf
level of STORE outside the MOLAP area
The queries that were run for the Hybrid 4 scenario are:
H4_Query 1:
In this query ITEM_DESC and SUB_CLASS_DESC from the PRODUCT
dimension and AREA_DESC from the STORE dimension are in relational and
other dimensions are at a high level.
The query is again very similar to before, but this time three drill down queries
will be performed. Review H3_Query 1a in Figure 10-35 on page 466. The
initial report will need to be one level higher than this in the PRODUCT
dimension. The three drill down queries that will be performed, therefore, will
be:
– Drill down on HAIRCARE (H4_Query 1a). Select only SHAMPOO.
– Drill down on SHAMPOO (H4_Query 1b). Then select the shampoos as
shown in Figure 10-36 on page 466.
– Finally, drill down on Florida (H4_Query 1c).
H4_Query 2:
In this query ITEM_DESC and SUB_CLASS_DESC from the PRODUCT
dimension and AREA_DESC from the STORE dimension are in relational and
other dimensions are at a low level.
Query H4_Query 2a is identical to H2_Query 2a as shown in Figure 10-33 on
page 464. This includes SUB_CLASS_DESC and AREA_DESC.
Query H4_Query 2b is identical to H1_Query 2 as shown in Figure 10-31 on
page 463 as this includes ITEM_DESC and AREA_DESC.
Hybrid 5 (H5)
In this scenario the bottom two levels of both PRODUCT and STORE are put into
relational, as is shown in Figure 10-38.
Figure 10-38 Hybrid 5 scenario: the bottom two levels of both PRODUCT and
STORE outside the MOLAP area
The queries that were run for the Hybrid 5 scenario are:
H5_Query 1:
In this query, ITEM_DESC and SUB_CLASS_DESC from the PRODUCT
dimension and AREA_DESC and DISTRICT_DESC from the STORE
dimension are in relational and other dimensions are at a high level.
The query is again very similar to before, but this time four drill down queries
will be performed. Review again H3_Query 1a in Figure 10-35 on page 466.
The initial report will need to be one level higher than this in both the
PRODUCT and STORE dimension. The four drill down queries that will be
performed therefore will be:
– Drill down on HAIRCARE (H5_Query 1a). Select only SHAMPOO.
– Drill down on SHAMPOO (H5_Query 1b). Then select the shampoos as
shown in Figure 10-36 on page 466.
– Drill down on East (H5_Query 1c). Select Florida.
– Drill down on Florida (H5_Query 1d).
Query results
These query workloads were then run with each of the different levels of Hybrid
Analysis enabled. The SQL that Hybrid Analysis generated was captured by
using the Hybrid Analysis trace functionality within DB2 OLAP Server. A logging
level of 2 was used to capture the SQL.
The SQL that Hybrid Analysis generates can be copied and pasted into DB2
Visual Explain to see which tables will be accessed.
For example, consider H1_Query 1, which involves a drill down on SHAMPOO.
Hybrid Analysis will generate two queries for this, the first to discover the
member names for the children of SHAMPOO, and the second to fetch the data
for those members. These two queries are shown in Example 10-4 and
Example 10-5.
The first query is a lookup from the PRODUCT table, and as such, the result set
will be taken directly from that table. However it is the second query that should
benefit from being re-directed to the MQT.
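A minimal sketch of the general shape of these two queries, using assumed
table and column names rather than the exact SQL of the examples, is:

-- Member lookup (the shape of the first query): the children of SHAMPOO
SELECT DISTINCT P.ITEM_DESC
FROM STAR.PRODUCT P
WHERE P.SUB_CLASS_DESC = 'SHAMPOO';

-- Data fetch (the shape of the second query): the candidate for re-routing
SELECT P.ITEM_DESC, SUM(F.TRXN_SALE_AMT) AS SALES
FROM STAR.SALESFACT F, STAR.PRODUCT P, STAR.STORE S
WHERE F.PRODUCT_ID = P.PRODUCT_ID
  AND F.STORE_ID = S.STORE_ID
  AND P.SUB_CLASS_DESC = 'SHAMPOO'
  AND S.REGION_DESC = 'East'
GROUP BY P.ITEM_DESC;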
Without the MQT, the query accesses the fact table and each of the dimension
tables and has to perform many joins. This can be seen in the main section of the
access plan graph from DB2 Explain, which is shown in Figure 10-41.
With the MQT available to be used, the query is re-routed to the MQT.
Figure 10-42 shows the bottom section of the DB2 access plan graph. The initial
fetch was costed at 654.44 and after that there was very little cost involved, as
the final cost for the query was reported as 654.5.
Figure 10-42 H1_Query 1 with MQT
The benefit of the MQT to the Hybrid Analysis user for this query is significant.
Each of the queries described in this section was run and the performance
results were captured. The tests were not run under benchmark conditions, but
they were run on a dedicated system with no other jobs running at the same
time. For each query, Hybrid Analysis will generate two or more SQL statements.
The performance results were recorded for each individual SQL statement (from
here on referred to as query) within each Hybrid Analysis query (referred to in the
charts as query workload). For each individual query the charts record the
elapsed query time both with and without the MQT being available, and whether
the query was re-routed to an MQT.
For this query workload, the chart records 35802 rows fetched in 16.699
seconds without re-routing (N), and 35802 rows fetched in 2.344 seconds with
re-routing to the MQT (Y).
As expected, the first query does not re-route, as it is just performing a query
lookup on the PRODUCT table. However the second query does re-route and the
elapsed time for the query is reduced from 17 seconds to 2 seconds.
The full results for each of the queries can be found in Appendix B, “Hybrid
Analysis query performance results” on page 661.
The results in the appendix have been summarized and are presented in
Table 10-8.
Generally speaking, without MQTs, there is an expectation that the Query 2
workloads will perform better than the Query 1 workloads because of the level
of aggregation required in the Query 1 queries (and of course the Query 2
queries also perform record selection). This is shown in the results for the
majority of the query workloads. There are exceptions, however, because
another factor that needs to be taken into account is the number of queries
generated by Hybrid Analysis. For example, consider the H5 set of query
workloads. H5_Query 1a requires the highest level of aggregation and without
the MQT performs poorly, even though only two queries are actually generated
for this workload. However, for the other query workloads in H5, the Query 1
workloads slightly outperform the Query 2 workloads. For these other query
workloads, the level of aggregation required by the Query 1 workloads is lower
and is less of a factor than the high number of queries generated by the
corresponding Query 2 workloads.
The effect of the MQT here is to significantly improve the performance of the
poorly performing Query 1’s because some or all of the aggregation required is
available in the MQT.
The very worst performing Query 1 queries (without the MQT) are H2_Query 1a,
H4_Query 1a and H5_Query 1a. These are all 1a queries, which means higher
levels of aggregation are required.
The point that was emphasized here was that with DB2 Cube Views the
performance results were more consistent from the outset, without having to
spend time doing performance analysis and creating numerous indexes in order
to achieve the optimum results. With the MQT the performance for H2_Query 1a
was improved from 161.955 seconds to 5.269 seconds. H4_Query 1a was
improved from 161.156 seconds to 1.143 seconds and H5_Query 1a was
improved from 162.635 seconds to 2.023 seconds.
When looking at the performance results in Table 10-8 on page 473 the key
factor that comes across is the improvement in the consistency of the query
response times when the MQT is available:
Without the MQT, query response times varied from 0.511 seconds to as
much as 162.535 seconds.
With the MQT, the response time variation was reduced significantly, with
response times of between 0.474 seconds and 5.269 seconds. Note that the
performance figures in Table 10-8 relate to the query portion of the Hybrid
Analysis query only and do not equate to the total response time experienced
by the spreadsheet client user.
The detailed results can be found in Table B-1 in Appendix B, “Hybrid Analysis
query performance results” on page 661. Here it is interesting to see that for
some of the Hybrid Analysis query workloads, more than one of the generated
SQL queries can be re-routed to the MQT. These queries are
H2_Query 1b, H3_Query 1b, H4_Query 1b, H4_Query 1c, H5_Query 1b,
H5_Query 1c and H5_Query 1d. Each of these query workloads generated two
data fetch queries that could be re-routed to the MQT.
None of the Query 2 type queries generated SQL that involved re-routing more
than once, although it is the Query 2 type queries that generate the most SQL
statements. The more dimensions and levels that are in relational, the more
queries Hybrid Analysis has to issue in order to confirm which table and column
a data value relates to.
The worst performing Query 2 type queries (without the MQT) are H2_Query 2a,
H4_Query 2a and H5_Query 2b. These are in fact all the same query, and they
are also queries at the sub-class level of PRODUCT when both sub-class and
item are in relational. With the MQT H2_Query 2a performance is improved from
14.237 seconds to 1.639 seconds. H4_Query 2a is improved from 14.759 to
2.503 seconds, and H5_Query 2b is improved from 13.597 to 2.652 seconds.
Only the script that was generated by the Optimization Advisor was run; no
additional database tuning was performed.
A few of the query workloads actually perform slightly worse with the MQT
compared to without the MQT in this test. The reasons for this were not pursued
at the time as the increase was only approximately 1 second. The query
workloads that experienced this slight increase were all ones where higher
numbers of queries were generated, were all Query 2 type queries, and were all
ones at the lower levels of the hierarchies. A possible explanation for this could
be related to the fact that the MQT in this example was larger than the fact table.
The best results are usually achieved when the MQT is based on a slice of data
at higher levels of the dimensions, resulting in an MQT that is smaller than the
fact table. However in this example the MQT that was generated was positioned
at a fairly low level in relation to the fact table, resulting in an MQT that was larger
than the fact table.
A maximum batch window of 4 hours was specified for the calculation. The
calculation of the database with everything in the MOLAP database (including
item) did not complete within this time frame and was therefore canceled. The
number of blocks in the database after the data load was already 3,634,703
which is significantly larger than each of the fully calculated databases with
Hybrid Analysis enabled.
However, it is important to review carefully the content of the drill through reports
that are being developed. Generally, if the drill through to the relational database
is to fetch lower levels of a hierarchy, then this will probably, although by no
means necessarily, be implemented as a Hybrid Analysis solution rather than an
Integration Server drill through report. Typically Integration Server drill through
reports are written to access information that is outside of the hierarchy, for
example to access text columns or additional dates.
If a drill through report accesses columns that are not available in the MQT,
then the MQT cannot be used. For example, the MQT in our scenario does
not go down to individual consumers or stores; therefore, any report requesting
data at these levels would not re-route to the MQT. Placing these lower levels in
the cube in DB2 Cube Views in order to get them included in the MQT will
significantly increase the size of the MQT. Similarly, if the drill through report
requires a number of textual data columns to be included in the MQT, then again
the size of the MQT may increase significantly. It is important to always consider
the size increase implications of placing more columns and rows in the MQT.
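To make this concrete, the following sketch (again with assumed table and
column names) contrasts a query at a level covered by the MQT with one that is
below its grain:

-- At or above the MQT grain (sub class level): eligible for re-routing
SELECT P.SUB_CLASS_DESC, SUM(F.TRXN_SALE_AMT) AS SALES
FROM STAR.SALESFACT F, STAR.PRODUCT P
WHERE F.PRODUCT_ID = P.PRODUCT_ID
GROUP BY P.SUB_CLASS_DESC;

-- Below the MQT grain: CONSUMER_ID is not in the MQT, so this query
-- must run against the base tables
SELECT F.CONSUMER_ID, SUM(F.TRXN_SALE_AMT) AS SALES
FROM STAR.SALESFACT F
GROUP BY F.CONSUMER_ID;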
When the Optimization Advisor is run, it will take into account any disk space limit
that it has been given. If it estimates that to include the additional columns will
result in this disk space limit being exceeded, then the script that it produces will
not include one or more of these additional columns. It is important therefore to
review the script that is produced by the Optimization Advisor to check what has
been included.
The Optimization Advisor should then be run and the new MQT created. In the
Optimization Advisor there is no separate query type option to differentiate
between Hybrid Analysis and drill through reports. For both types of queries the
option that should be selected is drill through.
A subset of the report when it is run from the DB2 OLAP Server spreadsheet
client is shown in Figure 10-44. In this example the user in the spreadsheet client
has clicked on the cell intersection of SHAMPOO for the product subclass and
Double Income No Kids for the campaign. The resulting report lists the items
within the shampoo subclass, the product brand code and the component
description.
The template SQL that Integration Server generates for the drill through report
can be accessed by clicking the Template SQL button in the Drill-Through
Reports window. This button can be seen in Figure 10-43 on page 478. The
template SQL can be copied into DB2 Visual Explain and modified to substitute
the actual column names and data values for the template containers in the
template.
Figure 10-45 shows the bottom section of the DB2 access plan graph when there
is no MQT available. Here it is clear that the query is accessing the base tables.
Table 10-10 shows the relative performance costs that were provided by DB2
Explain.
10.6 Conclusions
From the examples in this section, it is clear that DB2 Cube Views and DB2
OLAP Server work very well together. At all of the points where DB2 OLAP
Server issues queries to DB2, performance benefits may be gained if the DB2
optimizer is able to re-route the query to an MQT. The fewer the rows in the MQT
compared to the base fact table, the greater the gain in performance. The optimal
slices of data for this MQT will be dependent upon the cardinality of the data.
Data load, Hybrid Analysis, and Integration Server drill through reports can all
benefit from having their queries re-routed to an appropriate MQT.
Hybrid Analysis is of particular interest because there are two benefits. Firstly,
implementing Hybrid Analysis results in a smaller database and potentially a
dramatic decrease in calculation time. Secondly, the introduction of DB2 Cube
Views can assist in improving the performance of the relational queries that
Hybrid Analysis generates.
It may be that there is an existing DB2 OLAP Server MOLAP database for which
there is a requirement to reduce the calculation time. By placing the lowest level
of the dimension with the greatest cardinality in DB2, the calculation time can be
significantly reduced and the Hybrid Analysis relational queries can be assisted
with DB2 Cube Views.
Or there may be an existing hybrid environment and DB2 Cube Views can assist
by enabling an additional level in the same or another dimension to be included
in the relational part of the hybrid space.
The considerations for Hybrid Analysis do not change with DB2 Cube Views, but
DB2 Cube Views is an enabler for Hybrid Analysis and as such offers
administrators greater flexibility in how they design their OLAP databases.
The ability to exchange metadata in both directions between DB2 Cube Views
and Integration Server increases productivity. Once the metadata exists in one
product, it can be sent across the bridge to the other product, thereby enabling a
fast start in that second product.
Cognos Business Intelligence is easy to use, with all reporting and analysis
capabilities accessible from one Web-based portal. Users can select reports,
customize them, analyze information, and share information with the same
facility as using the Web. For IT departments, Cognos BI is easy to deploy and
administer, and is built for the demands of enterprise-scale environments.
Cognos Query lets novice and experienced users directly access corporate data
resources for real-time data exploration. Using only a Web browser, users can
navigate suites of published queries, saving and modifying them as required to
meet their information needs.
Event detection
In concert with Cognos' powerful reporting and analysis capabilities, Cognos
NoticeCast provides the ability to push information to users, allowing them to
focus quickly on what needs immediate attention. NoticeCast delivers
personalized, high-value information based on defined events, providing
automatic monitoring of performance management. Delivered alerts combine
business intelligence content with operational issues. Any
user, anywhere across the organization or value chain, can monitor key events
using e-mail notifications and alerts that push business-critical information to
them.
Scorecards
Cognos Series 7 delivers enterprise scorecarding through Cognos Metrics
Manager. Cognos Metrics Manager lets companies model plans or strategies as
a set of inter-connected performance indicators. This can communicate
goal-driven metrics to thousands of employees across your organization.
Cognos Metrics Manager is next-generation scorecarding technology that is an
essential component of corporate performance management. Your company can
move from its plans, to monitoring performance, to drilling down on information to
understand the issues causing the results.
Analytic applications
Cognos offers an integrated set of analytic applications based on its Cognos
Series 7 architecture. These applications come with pre-built reports,
performance indicators, and connections with underlying data sources from ERP
vendors. They package reporting and analysis, scorecarding, and planning
capabilities for the areas of customer, supply chain, and financial/operational
analysis.
Cognos Analytic Applications offer quick time to results, and a BI foundation that
can be easily customized with Cognos business intelligence technology.
Cognos bridge
The bridge populates the metadata from Cube Views into the Cognos tools,
Impromptu (SQL reports, .imr) and Transformer.
Impromptu (see Figure 11-3) is a SQL-based report writing tool that delivers
the widest possible range of management reports to users in any large
organization. Designers can create measure folders, which group measures into
business rules and allow users to navigate through various measure rollups and
drill down to
view the lower level measures in their OLAP reports. Designers can define
multidimensional data structures visually, using standard drag and drop actions;
define dimensions, levels, categories (members), or measures by dragging and
dropping data items appropriately. This applies to advanced features like
alternate drill downs, calculated measures, measure allocations, and calculated
categories.
Designers can easily define date periods to analyze data across time, from years
down to individual days. Designers can define their own custom time periods as
required, and easily create relative time period definitions.
You can generate Transformer models and Impromptu queries (.iqd) that reflect
your dimensional design. This helps you build, more quickly and easily,
PowerCubes that can provide drill-through access to the underlying DB2 data.
This allows you to have a starting point from which to build and expand your
business intelligence environment.
Terminology within the Cognos tool set closely matches the terms in DB2 Cube
Views, making the transition from one environment to the other an easy one. The
resulting reports, catalog, and model are easily updated and manipulated to meet
the requirements of the MQTs.
1. Step 1: The bridge prompts the user for a connection string that defines
the DB2 Cube Views environment to be used. The user is then prompted for
authentication to DB2. Any access via the bridge to DB2 Cube Views is
controlled by DB2 level security. The bridge then displays a list of DB2
Cube Views cube models that have been defined.
Note: The Cognos bridge will only import the DB2 Cube Views cube
models’ metadata.
2. Step 2: The bridge parses the DB2 Cube Views metadata. This process is
logged dynamically to a log file, and any objects missing from the import
are recorded in the import log. The OLAP metadata objects are divided up
into Cognos and non-Cognos objects. Cognos objects are built into the
appropriate Cognos tool, either an Impromptu Catalog (.cat) or Transformer
model (.mdl). The business rule names given to the objects in DB2 Cube
Views are carried across into the corresponding Cognos objects.
A set of reports is created by the bridge, based on each table of the star schema.
Each report has groupings based on the columns of the table as well as the
measures from the fact table. The first three columns of the table are grouped.
A default template for these reports is included with the bridge. This template
outlines the layout of the reports that the bridge generates. It can then be
customized in Impromptu appropriately. Templates are a starting point from
which reports can be built. They can contain information about column metadata,
margin settings, page orientation, font choices for different report objects and so
on. In many cases, users will create a default template that contains the
corporate or group logo, colors and fonts as per company standards. The
template report, called libcogmdi.imt, is stored in
<rootlocation>\cer3\bin\, where <rootlocation> is the directory into which
the Cognos software was installed.
The drill through report is based on the fact table of the star schema. By default,
it is linked to each measure within the Transformer model. For further information
about the default reports, please see section 2.4.
Figure 11-13 shows screenshots of how the Cognos environment can match the
cube model. These are screenshots of the fully modified Transformer model that
matches the cube model with the dimensions, measures, hierarchies, and
complex measures. Additionally, Cube Groups, Measure folders, and extra
dimensions have been created in Cognos, providing additional business value.
The first hierarchy listed in the cube model will be the first folder, and any
additional hierarchies will be listed within the folder called ‘Additional columns
folder’. As with Transformer, further business rules, such as additional
calculations, prompts, and filters, can be added to the Impromptu Catalog,
enhancing the business value of the users’ reporting capabilities.
Some of the Cognos OLAP features that can enhance the BI from DB2 Cube
Views include:
PowerPlay Cube Groups: A set of cubes built relating to a single level within
one dimension of the model. For instance, PowerCubes can be created on
Regions or Campaigns (see Figure 11-13 on page 500).
Measure formatting: Applying formatting options to the measure values so
they appear for the consumers consistently.
Measure folders: Group measures into business rules which allow users to
navigate through various measure rollups and drill down to view the lower
level measures in their OLAP reports.
PowerCube level security: Restricting access to portions of data in a
PowerCube to certain members of the user community.
Security in Cognos: Embed the user ID for drill through access; the user
will not be prompted when accessing the PowerCube or Impromptu report.
By default, Transformer will use a transactional query as its drill through source.
This works well for users who drill to the bottom of cube dimensions, then wish to
drill through to DB2 for more details. The cube model in DB2 Cube Views will
already have been defined to represent the complete star schema, and the cubes
will have been defined based on business needs. Assuming that the cube model
is reasonably optimized, the Optimization Advisor will already have been run
to recommend a drill through MQT.
For example, if an MQT exists that is grouped on the Customer, Campaign and
Store dimensions, at the Age_Range, Campaign_Type and Region levels within
these dimensions, then:
1. Author an appropriate report in Impromptu.
2. Ensure that the resulting SQL is in the format of Example 11-1 (the sketch
following this list illustrates the general shape).
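A sketch of the general shape such SQL needs to take is shown below, with
assumed table and column names; the grouping matches the Age_Range,
Campaign_Type, and Region levels of the MQT so that the optimizer can re-route
the query:

SELECT CU.AGE_RANGE_DESC, CA.CAMPAIGN_TYPE_DESC, ST.REGION_DESC,
       SUM(F.TRXN_SALE_AMT) AS SALES
FROM STAR.SALESFACT F, STAR.CONSUMER CU, STAR.CAMPAIGN CA, STAR.STORE ST
WHERE F.CONSUMER_ID = CU.CONSUMER_ID
  AND F.CAMPAIGN_ID = CA.CAMPAIGN_ID
  AND F.STORE_ID = ST.STORE_ID
GROUP BY CU.AGE_RANGE_DESC, CA.CAMPAIGN_TYPE_DESC, ST.REGION_DESC;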
When the user clicks Drill Through in PowerPlay, a number of drill through
options are displayed, for example:
Drill through to campaigns by customer and region (see Figure 11-16)
Drill through to Products by Region, Area, and Campaign
The drill through in Figure 11-16 provides the results shown in Figure 11-17.
When PowerPlay drills to an Impromptu report (Figure 11-18), it passes the drill
context to Impromptu, which is potentially applied to query items in the report.
Note: Filtered dimensions that are not included in the drill through query will
simply be ignored, ensuring that the MQT is still used. Users have the option
of adding to the query any additional information they would like to include
once they have drilled through, or even drilling from this query to another,
more detailed, report (assuming report to report drill paths have been
defined).
The resulting SQL from this drill through report can be run in the DB2 Cube
Views Control Center, to verify that it is using the MQT (see Figure 11-19).
Figure 11-21 Drill through DB2 explain: without MQT (lower level access graph)
Figure 11-22 Drill through DB2 explain: without MQT (upper level access graph)
In this example, the drill through that leveraged an MQT index for its query
cost 975.93 timerons. An equivalent drill through query against the fact and
dimension tables, rather than an MQT, cost 38,690.26 timerons. The MQT query
executed in 3.55% of the time of the non-MQT query, that is, 96.45% faster.
Table 11-2 summarizes the results.
2. This will generate a simple SQL SELECT query, returning the transaction detail
rows from the fact table as in Figure 11-24, and descriptive information from
the associated dimension tables (in this case the Campaign dimension table).
3. From the Task bar in Impromptu, select Report / Query, Group tab as shown
in Figure 11-25. Group on the Dimension Columns that you are reporting on.
This is to ensure that the query will return aggregated data, rather than the
individual transaction details.
4. Choose the Data tab, select each measure column in turn, and add an
aggregation, for example, total (which generates a SQL SUM) for each
measure column (data column from the fact table) as shown in Figure 11-26.
5. Once all measure columns are defined as aggregates, click OK. The query
returns aggregate data (see Figure 11-27), via the MQT. The sketch below
shows the effect on the generated SQL.
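In terms of the SQL that Impromptu generates, the effect of steps 3 and 4 is
roughly as sketched below, with assumed table and column names; the grouped,
aggregated form is the one the optimizer can match against the MQT:

-- Step 2: transaction detail from the fact table (no grouping, no MQT match)
SELECT CA.CAMPAIGN_TYPE_DESC, F.TRXN_SALE_AMT
FROM STAR.SALESFACT F, STAR.CAMPAIGN CA
WHERE F.CAMPAIGN_ID = CA.CAMPAIGN_ID;

-- Steps 3 and 4: grouped on the dimension column, with a SUM on each measure
SELECT CA.CAMPAIGN_TYPE_DESC, SUM(F.TRXN_SALE_AMT) AS TOTAL_SALES
FROM STAR.SALESFACT F, STAR.CAMPAIGN CA
WHERE F.CAMPAIGN_ID = CA.CAMPAIGN_ID
GROUP BY CA.CAMPAIGN_TYPE_DESC;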
Calculated measures
Calculated measures defined in the OLAP Center will need to be reproduced in
the Cognos environment. Figure 11-28 shows, for example, a measure “Profit%”
in DB2 Cube Views.
So, if we need to analyze the trend of sales or profitability throughout the week,
and compare weekdays to weekends, we can add a manual level to the
dimension in Transformer: Weekday/Weekend. Figure 11-39 shows % Profit by
Year and Day of Week.
Measure formatting
Measure formatting can be applied within Transformer (see Figure 11-40), to
ensure that measures are formatted appropriately (for example, $, %) when
displayed to the user in PowerPlay. The measure formatting properties are
stored in the PowerCube, and are the default settings for the measure.
In other words:
Who is buying? (Customer dimension)
What are they buying? (Product dimension)
When are they buying? (Date dimension)
Where are they buying? (Location dimension)
How much? (Revenue and Units Sold measures)
You can see customer buying patterns, needs, answer important business
questions and track sales performance metrics. For example, rank and compare
sales volumes and values by customer type over time, or evaluate the
effectiveness of sales resources.
Following are some example business questions that can be answered with the
Cognos reporting and analysis tools:
2. To more easily see the trend, switch to a clustered bar graph view (either via
the Clustered Bar Graph toolbar button, or using the Change Displays
option under the Explore menu option) (see Figure 11-42).
3. We can now clearly see that the sales for Campaign New Product
Introduction were much higher in 1999. Drill down into New Product
Introduction and drill into Year 1999 as shown in Figure 11-43.
4. To see which days of the week these sales are occurring, drag the Day of
Week dimension over one of the date labels, for example, 1999 Q2. Drag the
Consumer dimension into the legend (see Figure 11-44).
6. Hide the Unknown category (right-click the category and select Hide). You
should see in Figure 11-46 that, even though most purchases are being made
by female consumers, the proportion of males increases at the weekend.
To illustrate, we will analyze which are our top 10 stores for revenue for the
organization, based on YTD growth over the same period last year.
2. Using the Rank toolbar icon (or by selecting Rank from the Explore menu),
rank on YTD Growth, and select Top 10, as shown in Figure 11-51.
To do this:
1. We would first need to add the margin range as a calculation in the
Impromptu catalog as shown in Figure 11-53.
2. We then add this calculation to the Fact iqd, and finally to the Transformer
Model. We now have this margin calculation available as a dimension. For
example, to answer the question:
– How many high profit transactions occurred, by region, in 2001?
We built the graph in Figure 11-54.
3. Drill into the Profit Measure, to see the source measures it is derived from.
We can see in Figure 11-57 that our Costs have reduced, but our Sales have
reduced by more, hence the reduction in profit compared to last year.
The basis for the scenarios we used came from the Cognos Analytic
Applications. These Applications combine software and services expertise from
Cognos with the best practices, thinking, and experience of business experts.
The result is business intelligence software that comes with built-in reports, key
metrics, and integrated business processes. Customers can be up and running
(and gaining value) from business intelligence technology, quickly and easily.
Table 11-3 summarizes some of the grouping combinations tested with Cognos
and the results observed when using the DB2 Explain access plan graph.
Both Cognos PowerPlay for drill through and Cognos Impromptu catalog for
other SQL reports may benefit from MQTs.
The drill through report is basically a SQL report that a report author would
create. Optimally, as explained in 11.4.1, “Optimizing drill through” on page 502,
that report would be created with the knowledge of the MQTs and therefore built
to leverage the MQT. The PowerPlay Cube allows users to navigate the data
without having to connect to the database. They perform OLAP analysis in
PowerPlay until they reach a point where they want additional data that is not
in the PowerCube, or a list style report (with invoice numbers on it, for
example); at that point they drill through. When the user drills through, he is
using the Impromptu report that has already been defined and designed to
leverage the MQT. The end user who performs the drill through does NOT build
their own report from scratch. The navigation performed in the PowerCube is
passed on as a filter to the Impromptu report before it is executed against
the database (sent to the optimizer).
So we could say that ALL reports in Impromptu have the capability to use the
MQTs.
A good practice when designing the star schema and cube model to leverage MQT
performance would be:
Understand the business rules and business questions that the users want to
ask.
Find out the most common types of inquiries the users want to perform, and
then design the star schema around them.
Define DB2 Cube Views cube model and cubes to meet those expectations.
To build the reports in “Scenarios” on page 524, Cognos ran several SQL based
reports leveraging the MQTs and the metadata from DB2 Cube Views. Dramatic
improvements in performance were realized with these reports. The architecture
of the Cognos BI Solution leverages the improved query performance with the
use of SQL based Impromptu reports, via drill through and direct query.
Maximum benefit of the MQTs is realized when the SQL reports themselves are
built using the intelligence of the metadata in combination with the definitions
of the MQTs; this yields optimal performance of the Impromptu report.
The metadata import greatly facilitates the designing and creation of effective
and meaningful MOLAP PowerCubes. PowerCube data population uses
transaction level data, and as MQTs are generally built at summary levels, this
process will not necessarily leverage MQTs. Users navigate large volumes of
summarized data with sub-second response times using PowerCubes.
At the time of this publishing, Cognos was set to release Cognos ReportNet.
Cognos ReportNet was designed and developed to meet the requirements of all
areas of enterprise reporting (ad hoc reporting, managed reporting, production
reporting). ReportNet leverages DB2 Cube Views, providing complete reporting
access.
Founded in 1969, Cognos serves more than 22,000 customers around the world.
In this section we discuss only the Business Intelligence platform. For more
information on the BusinessObjects product line, please check the Web site:
http://www.businessobjects.com
WebIntelligence is the industry’s best query, reporting, and analysis solution for
the Web. WebIntelligence is a thin-client solution that enables users to query,
report, analyze, and share corporate data using a simple browser as their
interface, while maintaining tight security over data access.
Information delivery
BusinessObjects Enterprise 6 meets information delivery requirements through
the combination of a BI portal and powerful broadcast capabilities.
By providing easy access to DB2 Cube Views’ metadata, you leverage your
investment in existing technology, increase the efficiency and effectiveness of
BusinessObjects universe management, and optimize your BusinessObjects
reports’ queries when using DB2 Cube Views’ MQTs.
The DB2 Cube Views metadata is exported in a DB2 Cube Views XML file.
This file can be imported by the BusinessObjects Universal Metadata Bridge to
automatically create a BusinessObjects universe file according to mapping rules.
BusinessObjects classes and objects are derived from the DB2 Cube Views
objects according to the mapping listed in Figure 12-3 and in Figure 12-4.
Note: The values between the < > marks are default values to assign to
BusinessObjects objects properties that cannot be Null and when the
corresponding value is missing in DB2 Cube Views.
The numbering in the following list corresponds to the numbering on the diagram
in Figure 12-5:
1. A Measure class is created in the BusinessObjects universe for the list of
measures defined in the fact.
The exception is where the dimensionRef property is empty. In this case the
name is:
<measure name> by other dimension
Figure 12-10 DB2 Cube Views XML file for multiple aggregations
When the BusinessObjects Universal Metadata Bridge reads the XML file, it
converts the aggregations to measures.
The universe list includes the new measures as shown in Figure 12-14.
Figure 12-15 lists the internal data types and their equivalent in BusinessObjects
objects.
2. To export metadata:
a. From the OLAP Center main window, click OLAP Center --> Export. The
Export window opens.
b. Select either one cube model or one cube model and one cube to export.
You cannot export a cube without its corresponding cube model.
c. Specify an export XML file name or browse for an XML file to overwrite.
d. Click OK. The Export window closes, and a DB2 Cube Views XML file is
created containing information about the metadata objects that you
specified.
BusinessObjects Universal Metadata Bridge analyses the content in the XML file
to extract metadata information. It then creates a BusinessObjects universe
including classes, objects, tables, columns, custom hierarchies, aggregation
functions and joins.
2. In the XML section of the screen, select the XML file location by either typing
in the path to the file, or clicking the button next to the XML File text box as in
Figure 12-22 and Figure 12-23.
3. Cube is the default option button selection and the available cube schemas
appear in the list box. If you would rather use a cube model, click Cube
Model.
4. Select the schema you want to use to create a universe, and click Import.
The schema appears in the panel as shown in Figure 12-24.
Notice that the object group contains dimensions, attributes, hierarchies, and
measures.
5. Enter the universe name.
6. Select a universe connection in the Universe Connection panel.
7. If you want to replace an existing universe, select the Replace existing
universe check box.
All parameters needed for batch mode during execution are entered as
arguments of the executable when it is called.
To create a universe using batch mode, one or more XML files containing
metadata must be available.
Batch mode can be called from a command line, script, or Scheduler. Batch
mode produces a log file containing errors and warnings encountered during
execution of the batch file. A batch file is composed of:
Batch file sequences
Batch file arguments
To check if the report query is optimized by DB2 Cube Views through the
Optimization Advisor, we used the following method:
1. In BusinessObjects, launch SQL Viewer from the Query Panel (see
Figure 12-30), and copy the SQL statement.
2. In DB2 Control Center, launch Explain SQL Panel, and paste the SQL
statement.
3. In DB2 Control Center, analyze the access plan graph from DB2 Explain to
check if MQTs are used.
The following examples show reports created on top of the universe that has
been previously built with the BusinessObjects Universal Metadata Bridge. It can
be seen that query response times are improved by MQTs.
12.4.1 Query 1
Query 1 addresses the business question:
What are the top five most profitable consumer groups?
The report
The report is shown in Figure 12-33.
The SQL
The SQL is shown in Example 12-1.
The Access Plan Graph of the query shows that tablespace scans have been
used because no MQTs can be used for query rewrites.
The measured response time for the refresh of the report is also long: 12
seconds.
The Access Plan Graph of the query is simple; the MQT MQT0000000001T01
has been used to retrieve all the information.
The response time of the “Refresh report” action is short in this case, thanks
to the MQT: less than 4 seconds were needed for refreshing this report, of
which most can be attributed to system and network latency.
12.4.2 Query 2
Query 2 addresses the business question:
What are the most Profitable Consumer Groups buying (Level 1 of product)?
The report
The report is shown in Figure 12-36.
The SQL
The SQL is shown in Example 12-2.
This is acceptable for queries that are run only occasionally. Moreover, since
the cost of creating MQTs for low levels in the hierarchies is high in terms of
space while yielding few performance benefits, we would generally avoid these
types of MQTs.
12.4.3 Query 3
Query 3 addresses the business question:
What are the top three most profitable departments per year by region?
The report
The report is shown in Figure 12-37.
The Access Plan Graph of the query shows that tablespace scans have been
used because no MQTs can be used for query rewrites.
With MQTs built by the Optimization Advisor, the data access used is described
in Figure 12-39.
The Access Plan Graph of the query is simple; the MQT MQT0000000001T02
has been used to retrieve all the information. The response time of Refresh
report action is shorter in this case, thanks to the MQT.
12.5 Deployment
The design of DB2 Cubes and MQTs is an iterative process. The review of the
BusinessObjects Universe allows you to improve your Cube/Cube Model and the
BusinessObjects query response time gives you indications on how you should
use DB2 Optimization Advisor to create the required MQTs.
Familiarity with DB2 Cube Views and the MicroStrategy product suite is assumed
in the following sections of this chapter.
After the DB2 Cube Views metadata is defined, the Import Assistant can be used
to convert this multidimensional information into its MicroStrategy equivalent that
will serve as the basis for additional development or immediate reporting
activities. The Import Assistant analyzes the DB2 Cube Views metadata —
translating each component, including Attributes, Hierarchies, and Measures —
and produces a MicroStrategy project ready for use. Within the MicroStrategy
environment, one can then run queries and create reports right away or enhance
the project further to take advantage of modeling facilities specific to
MicroStrategy.
The flow of information during the import process is represented in Figure 13-2.
Figure 13-2 Import information flow (components shown: Cube Reader, Cube
Translator, and Project Creator, connecting the DB2 Cube Views XML, the DB2
warehouse catalog, and the MicroStrategy metadata)
The ZIP file contains a stand-alone installation for the Import Assistant and its
on-line help. The Import Assistant must be installed on a machine with
MicroStrategy Architect V7.2.3.
13.3.2 Prerequisites
Prior to using the Import Assistant, the following prerequisites should be
completed:
The MicroStrategy product suite is installed on a machine and an ODBC DSN
to the database is set up.
A database is created for the purpose of hosting the MicroStrategy metadata
and an ODBC DSN is established for it.
The MicroStrategy metadata is configured using the MicroStrategy
Configuration Wizard.
The DB2 Cube Views metadata is defined and its XML representation is
generated.
13.3.3 Import
To begin using the Import Assistant, double-click MstrDb2Import.exe. The
Import Assistant dialog is shown in Figure 13-3. You must enter the following
input parameters:
The location of the schema definition file that is the DB2 Cube Views XML file.
The project source in which to create the project based on the imported
metadata.
The database instance that points to the DB2 Cube Views database.
The location of the log file that the import process generates.
Note: The DB2 Cube Views XML file must contain a single cube (not a cube
model); otherwise the Import Assistant will not function properly.
Project source
The project source specifies the MicroStrategy metadata repository into which
to import the DB2 Cube Views metadata. You may determine the project source in
one of the
following ways:
Select the appropriate project source from the drop-down menu.
Click New to create a new project source. This opens the Project Source
Manager. You need to choose a name for the new project source and enter
the ODBC DSN for the MicroStrategy metadata and its corresponding
database login and password.
When you have selected your project source, click Login. Enter the username
and password, and click OK. You must have administrator privileges to log in.
Database instance
The database instance specifies connection information to the DB2 Cube Views
database. You may do this in one of the following ways:
Select the appropriate database instance from the drop-down menu.
Click New to create a new database instance. This opens the Database
Instances dialog box. You need to choose a name for the new database
instance and enter the ODBC DSN for the DB2 Cube Views database and its
corresponding database login and password.
Import
When you have finished determining the schema definition file, project location
and process log file, click Import. The metadata from IBM DB2 Cube Views
begins to transfer to MicroStrategy 7i. The Import Assistant displays status
information about the different steps of the import process in its feedback
window. The feedback window is shown in Figure 13-4.
When the transfer is complete, open MicroStrategy Desktop and log in to the
project source you selected to view your imported project.
At the present time, cube models, cubes, facts, cube facts, cube dimensions and
cube hierarchies are not used to extract information since they are either
container objects or subset objects.
The user can thus start creating new reports based on this infrastructure.
The DB2 Cube Views object model and the MicroStrategy object model do not
coincide exactly. Table 13-1 summarizes the mapping between the two object
models.
DB2 Cube Views objects           MicroStrategy objects
Attributes                       Attributes
Descriptive relationships        Attributes (merged as attribute forms)
Joins                            Attribute definitions and the system dimension
Hierarchies                      Dimensions and hierarchies
Measures                         Facts and metrics
Attributes
The Import Assistant supports all attribute definitions specified in the DB2 Cube
Views XML including attributes that use IBM DB2 functions and attributes that
are based on other attributes. One exception to note is the case of attributes
defined across multiple tables.
Joins
MicroStrategy does not explicitly have the concept of a join. Join information is
used in part to infer MicroStrategy attribute definitions and the MicroStrategy
system dimension.
The Import Assistant does not support joins other than equi-joins, nor joins on
more than one column. These joins are simply ignored during the import
process.
Currently, the Import Assistant does not handle joins between columns with
different names. It is recommended to use MicroStrategy Architect after the
import to properly map the columns. An alternative is to rename the columns
directly in the database to render the naming convention consistent.
Descriptive relationships
The Import Assistant uses descriptive relationships to merge the relevant DB2
Cube Views attributes into a single MicroStrategy attribute with multiple attribute
forms. The DB2 Cube Views designer should define as many descriptive
relationships as possible to ensure the most accurate representation of the
model while still avoiding redundant relationships.
Associated relationships
Associated relationships are used by the Import Assistant to further refine the
model and link logically-connected parts of the system dimension. Whenever few
joins exist within a dimension, it is strongly recommended to define as many
associated relationships as the model logically requires. An alternative is to link
the various resulting independent attributes with MicroStrategy Architect after the
import.
Dimensions
The Import Assistant supports all dimension definitions.
Hierarchies
The Import Assistant creates a hierarchy for every DB2 Cube Views hierarchy. It
is recommended that hierarchies be defined using attributes that are ID forms for
MicroStrategy attributes.
Measures
The Import Assistant converts each measure into a set of objects: facts, base
formulae and metrics.
Asymmetric measures are currently not handled by the Import Assistant. They
should ideally result in nested aggregation metrics in MicroStrategy. For example,
the measure PROMO_SAVINGS_PTS defined below should be aggregated with
MAX along the Campaign dimension, AVG along the Time dimension and SUM
along other dimensions as specified in Example 13-2.
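The intent of such a nested aggregation can be sketched in SQL as follows
(with an assumed fact table and columns; this illustrates the semantics only):

SELECT MAX(T2.AVG_SAVINGS) AS PROMO_SAVINGS_PTS    -- MAX along Campaign
FROM (SELECT T1.CAMPAIGN_ID,
             AVG(T1.SUM_SAVINGS) AS AVG_SAVINGS    -- AVG along Time
      FROM (SELECT F.CAMPAIGN_ID, F.TIME_ID,
                   SUM(F.PROMO_SAVINGS_PTS) AS SUM_SAVINGS  -- SUM along the rest
            FROM STAR.SALESFACT F
            GROUP BY F.CAMPAIGN_ID, F.TIME_ID) AS T1
      GROUP BY T1.CAMPAIGN_ID) AS T2;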
The Import Assistant does not support measures defined across multiple tables.
Note: The performance results are expressed in timerons, the DB2 access path
cost measure.
Note: In order to ensure that MicroStrategy will make use of DB2 MQTs, a
good practice is to include ID columns in the DB2 Cube Views cube model
design, since the SQL built through MicroStrategy includes ID columns.
Note: For the following examples, the Attribute names in MicroStrategy have
been modified to enhance the readability of the reports. For example,
REGION_ID has been renamed Region.
The business question mentioned above has been resolved making use of
MicroStrategy Metric Level functionality, which allows the user to create metrics
with a specific dimensionality. In this case, the user has created a Transaction
Sales Amount metric (Trnx Sale Amt) at the report level and a second
Transaction Sales Amount metric at the Region level (see Figure 13-5 for the
result).
Note 2: The Regional Transaction Sales Amount Contribution metric has been
created using the Derived Metrics functionality in MicroStrategy. Derived
metrics allow users to create metrics on the fly after report results have been
returned.
Note: Make sure you remove all tabs from the SQL by using a text editor
before pasting the SQL in the DB2 Explain SQL Dialog.
The database cost for solving this SQL was 3,779.44 timerons, with a much
simpler plan construction. The results are shown in Figure 13-9.
Table 13-2 summarizes the data access path costs reported by DB2 Explain
when using DB2 Cube Views MQTs.
The business issue mentioned above has been resolved using the Drill
Anywhere functionality in MicroStrategy which allows the user to drill anywhere in
the project’s browsing hierarchies. In this case, the user has drilled to the
Campaign attribute from Region in the 01 – Regional Department Sales
Contribution report, as shown in Figure 13-10.
Table 13-3 summarizes the data access path costs reported by DB2 Explain
when using DB2 Cube Views MQTs.
The business issue mentioned above has been resolved by making use of the
Rank function in MicroStrategy. Two metrics have been developed using this
functionality: one that ranks the Transaction Sales Amount over all Regions, and
a second one that makes use of the Break By function parameter set at the
Region level. This second metric will provide the user with a Rank on Campaign
per Region while the first one will provide the user with a Rank over all Regions.
After submitting the report’s SQL to a database with DB2 Cube Views and MQTs
available, the total database cost for generating results is 3779.44 timerons.
Table 13-4 summarizes the data access path costs reported by DB2 Explain
when using DB2 Cube Views MQTs.
The business issue mentioned above has been resolved by making use of the
Report Limit functionality in the Report Data options in MicroStrategy. The user
has created a Top 5 filter based on the Rank of Trnx Sales Amt and added the
filter to the Report Limit properties in the Report Data options. The report is
displayed in Figure 13-15.
After submitting the report’s SQL to a database with DB2 Cube Views and MQTs
available, the total database cost for generating results is 1002.01 timerons.
Table 13-5 summarizes the data access path costs reported by DB2 Explain
when using DB2 Cube Views MQTs.
The business issue mentioned above has been resolved by making use of the
Conditional Metrics functionality in MicroStrategy. The user has created two
metrics based on the sum of Transaction Sale Amount, adding to each of them
a filter on the desired age groups.
Note: The metrics “26-25 Contribution” and “46-55 Contribution” were created
using the Derived Metrics functionality in MicroStrategy.
After submitting the report’s SQL to a database with DB2 Cube Views and MQTs
available, the total database cost for generating results is 3,281.01 timerons.
Table 13-6 summarizes the data access path costs obtained from DB2 Explain
when using DB2 Cube Views MQTs.
Web services are unlikely to become the new slice, dice, and drill interface for
dedicated OLAP tools. These tools require the high-speed service they get from
existing native interfaces. But Web services-based analytic applications will need
access to multidimensional information. These new applications will cross
organizational and business boundaries, assembling information from a variety
of sources and using it to inform and drive business processes.
Web services for DB2 Cube Views provides the following simple, high-level Web
services, using XPath as the query language:
Describe: To query and navigate through OLAP metadata
Members: To retrieve dimension member data
Execute: To execute slice and dice queries on a cube
The following are the primary advantages of using Web services for DB2 Cube
Views:
Allows application developers to provide analytic capabilities to any client on
any device or platform, using any programming language. These Web
services are based on open standards like XML, HTTP, and SOAP, so
clients can have an independent implementation using any tool, technology, or
hardware platform. For example, client applications that run on pervasive
devices like PDAs can access OLAP data. Refer to 14.2, "Overview of the
technologies used" on page 615 for more understanding of XML and SOAP.
Allows client applications to easily and securely access remote analytical data
hosted by partners, customers or suppliers over the Web. This helps in
building analytical applications from diverse sources of data.
Transition from a tightly-coupled client-server paradigm to loosely-coupled
Web-based analytical systems. Prior to Web services, a client component
for accessing the OLAP server had to be installed on each client.
Input to these Web services needs to be specified as an XPath expression,
and the output is an XML document. Application developers can therefore
leverage their existing knowledge of XML and XPath without having to learn
OLAP interfaces and query languages. Refer to 14.2, "Overview of the
technologies used" on page 615 for more understanding of XPath.
Management
Security
UDDI Service publication
14.2.2 XML
eXtensible Markup Language (XML) is an extensible tag language that can describe
complicated structures in ways that are easy for programs to understand. Web
services depend heavily on XML. XML is language- and platform-independent; it
is XML that enables the conversation between business programs.
XML is a meta-markup language used for creating your own markup
languages. Using XML, you can define the tags for your markup language. XML
tags are used to describe the contents of the document. This means that any
type of data can be defined easily using XML. XML is universal not only in its
range of applications but also in its ease of use: its text-based nature makes it
easy to create tools, and it is also an open, license-free, cross-platform standard.
14.2.3 SOAP
The current industry standard for XML messaging is Simple Object Access
Protocol (SOAP). SOAP is the basis for the W3C XML Protocol Working Group.
SOAP Remote Procedure Call (RPC) is the latest stage in the evolution of SOAP:
the body of a SOAP message contains a call to a remote procedure and the
parameters to pass in. Both the call and the parameters are expressed in XML.
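For illustration, a SOAP RPC request is an ordinary XML document. The
envelope namespace below is the standard SOAP 1.1 namespace; the method
name, method namespace, and parameters are hypothetical:

   <SOAP-ENV:Envelope
       xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
     <SOAP-ENV:Body>
       <m:GetSalesTotal xmlns:m="urn:example-analytics">
         <region>East</region>
         <year>1999</year>
       </m:GetSalesTotal>
     </SOAP-ENV:Body>
   </SOAP-ENV:Envelope>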
14.2.4 WSDL
If we want to find services automatically, we require a way to formally describe
both their invocation interface and their location. The Web Services Description
Language (WSDL) V1.1 provides a notation serving these purposes.
A WSDL specification uses XML syntax; therefore, there is an XML Schema that
defines the WSDL document.
14.2.5 UDDI
UDDI stands for Universal Description, Discovery, and Integration. UDDI is a
technical discovery layer; it can be seen as the Yellow Pages of the Web
services world. It defines:
The structure for a registry of service providers and services
The API that can be used to access registries with this structure
The organization and project defining this registry structure and its API
14.2.6 XPath
XML Path Language (XPath) provides a notation for selecting elements within an
XML document. That is, XPath is a language for addressing and matching parts
of an XML document when considered as a tree of nodes. It uses a compact and
non-XML syntax. XPath operates on the logical structure underlying XML. XPath
models an XML document as a tree of nodes (root nodes, element nodes,
attribute nodes, text nodes, namespace nodes, processing instruction nodes,
and comment nodes).
The basic syntactic construct in XPath is the expression. Evaluating an
expression yields an object, which has one of the following four basic types:
Node-set (an unordered collection of nodes without duplicates)
Boolean
Number
String
XPath uses path notation to define locations within a document. A path starting
with "/" signifies an absolute path. An XPath location step selects part of the
document based on a basis and a predicate: the basis performs a selection
based on an axis name and a node test, and the predicate then performs
additional selection on the outcome of the basis. For example, the path
/library/book[1] selects the first book element under library.
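To make this concrete, consider the following small document (a hypothetical
example):

   <library>
     <book year="1996"><title>DB2 Primer</title></book>
     <book year="2002"><title>Data Warehousing</title></book>
   </library>

Here /library/book[1]/title selects the title of the first book, and
/library/book[@year="2002"] uses a predicate on the year attribute to select
the second book.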
The OLAP service provider may be registered in a UDDI registry so that service
requestors or clients can find and discover Web services to retrieve metadata,
execute slice and dice queries, and retrieve member data.
An OLAP Web services client can discover OLAP providers in UDDI registries,
access the provider through the Web services to retrieve XML descriptions of
cubes, and execute slice and dice queries on those cubes.
(Figure: Web services clients exchange SOAP messages over HTTP with the
OLAP provider, which serves both the cube metadata and the cube data.)
These Web services provide a means to query a cube defined in DB2 Cube
Views for its metadata, members and measures data, in XML format.
The Describe Web service accesses the DB2 Cube Views metadata catalog
tables using DB2 Cube Views API to retrieve the information. The Members and
Execute Web services use cube metadata and its input parameter values to
construct the SQL to query the base tables of the cube (star schema tables).
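For example, a request for Transaction Sale Amount by year and department
might cause SQL of roughly the following shape to be generated against the star
schema (a sketch only; it assumes the level attributes map directly to columns of
the dimension tables, and reuses the join predicates of the cube model):

   SELECT T4."CAL_YEAR_DESC", T5."DEPARTMENT_DESC",
          SUM(T1."TRXN_SALE_AMT")
   FROM "STAR"."CONSUMER_SALES" AS T1,
        "STAR"."DATE" AS T4,
        "STAR"."PRODUCT" AS T5
   WHERE T1."DATE_KEY" = T4."IDENT_KEY"
     AND T1."ITEM_KEY" = T5."IDENT_KEY"
   GROUP BY T4."CAL_YEAR_DESC", T5."DEPARTMENT_DESC"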
Figure 14-3 shows the input and output for the Web services provided by IBM.
Describe
   Input:  XPath (specifies the Cube, Dimension, Hierarchy, and Level)
           Depth (specifies the grain of the query)
   Output: XML document describing the cube metadata:
           <Cube-1 Name>
             <Cube Dimension 1>
               <Hierarchy Level 1>
               <Hierarchy Level 2>
               ...
             </Cube Dimension 1>
             <Cube Dimension 2>
             ...
             <Cube Fact>
               <Measure 1>
               ...
           </Cube-1 Name>
           <Cube-2 Name>
           ...
           </Cube-2 Name>

Members
   Input:  XPath (specifies the Cube, Dimension, Hierarchy, and Level)
           Depth (specifies the grain of the query)
   Output: XML document listing the dimension members:
           <Cube-1 Name>
             <Cube Dimension 1>
               <Member1 at level 1>
                 <Member at Level 2>
                 ...
               <Member1 at level 1>
                 <Member at Level 2>
               ...
             </Cube Dimension 1>
             <Cube Dimension 2>
             ...
           </Cube-1 Name>

Execute
   Input:  XPath (specifies the Where-Clause and Aggregation Level)
           Measures (list of measures)
   Output: XML document containing the requested slice:
           <Cell-1 Measure1=... Measure2=... Dim Member=... Dim Member=...>
           <Cell-2 Measure1=... Measure2=... Dim Member=... Dim Member=...>
           ...
To describe each of the Web services for DB2 Cube Views, let us consider the
representation of the Sales cube in Figure 14-4.
14.4.1 Describe
Client applications can retrieve the cube metadata defined in DB2 Cube Views
using the Describe Web service. The metadata for a cube defined in DB2 Cube
Views includes:
Cube dimensions
Cube dimensions hierarchy
Cube fact (cube measures)
The metadata also includes the business names for the cube, each of its
dimensions, the levels in the dimension hierarchy and the measures.
The metadata does not include the actual member and fact data.
Figure 14-5 XML representation of the Sales cube (elided details are shown
as "..."):

<Sales_Cube businessName="Sales Cube">
  <DATE businessName="DATE" kind="cubeDimension">
    <CAL_YEAR_DESC businessName="...">
      <CAL_QUARTER_DESC businessName="...">
        <CAL_MONTH_DESC businessName="Calendar Month Name">
          <DAY_DESC businessName="Day Description"/>
        </CAL_MONTH_DESC>
      </CAL_QUARTER_DESC>
    </CAL_YEAR_DESC>
  </DATE>
  <CAMPAIGN businessName="CAMPAIGN" kind="cubeDimension">
    <CAMPAIGN_TYPE_DESC businessName="Campaign Type Description">
      <CAMPAIGN_DESC businessName="Campaign Description">
        <STAGE_DESC businessName="Stage Description">
          <CELL_DESC businessName="Cell Description">
            <PACKAGE_DESC businessName="Package Description">
              <COMPONENT_DESC businessName="Component Description"/>
            </PACKAGE_DESC>
          </CELL_DESC>
        </STAGE_DESC>
      </CAMPAIGN_DESC>
    </CAMPAIGN_TYPE_DESC>
  </CAMPAIGN>
  <CONSUMER businessName="CONSUMER" kind="cubeDimension">
    <GENDER_DESC businessName="Gender Description">
      <AGE_RANGE_DESC businessName="Age Range Description">
        <FULL_NAME businessName="Full Name"/>
      </AGE_RANGE_DESC>
    </GENDER_DESC>
  </CONSUMER>
  <PRODUCT businessName="PRODUCT" kind="cubeDimension">
    <DEPARTMENT_DESC businessName="Department Description">
      <SUB_DEPT_DESC businessName="Sub Department Description">
        <CLASS_DESC businessName="Class Description">
          <SUB_CLASS_DESC businessName="Sub Class Description">
            ...
          </SUB_CLASS_DESC>
        </CLASS_DESC>
      </SUB_DEPT_DESC>
    </DEPARTMENT_DESC>
  </PRODUCT>
  <STORE businessName="STORE" kind="cubeDimension">
    <ENTERPRISE_DESC businessName="Enterprise Description">
      <CHAIN_DESC businessName="Chain Description">
        <REGION_DESC businessName="Region Description">
          <DISTRICT_DESC businessName="District Description">
            <AREA_DESC businessName="Area Description">
              <STORE_NAME businessName="Store Name"/>
            </AREA_DESC>
          </DISTRICT_DESC>
        </REGION_DESC>
      </CHAIN_DESC>
    </ENTERPRISE_DESC>
  </STORE>
  <SALES_FACT businessName="SALES FACT" kind="cubeFacts">
    <Profit businessName="Profit"/>
    <CURRENT_POINT_BAL businessName="Consumer Point Balance"/>
    <MAIN_TENDER_AMT businessName="Main Tender Amount"/>
    <MAIN_TNDR_CURR_AMT businessName="Main Tender Current Amount"/>
    <PROMO_SAVINGS_AMT businessName="Promotion Savings Amount"/>
    <PROMO_SAVINGS_PTS businessName="Promotion Savings Points"/>
    <TOTAL_POINT_CHANGE businessName="Total Point Change"/>
    <TRXN_COST_AMT businessName="Transaction Cost Amount"/>
    <TRXN_SALE_AMT businessName="Transaction Sale Amount"/>
    <TRXN_SALE_QTY businessName="Transaction Sale Quantity"/>
    <TRXN_SAVINGS_AMT businessName="Transaction Savings Amount"/>
    <TRXN_SAVINGS_PTS businessName="Transaction Savings in Points"/>
    <CONSUMER_QTY businessName="Consumer Quantity"/>
    <ITEM_QTY businessName="Item Quantity"/>
    <TRXN_QTY businessName="Transaction Quantity"/>
  </SALES_FACT>
</Sales_Cube>
There are also other attributes associated with each of the elements. For
example, Business Name is an attribute for each of the elements in the XML
document.
Any element in the XML document can be referenced by specifying the XPath.
Refer to Section 14.2.6, "XPath" on page 618 to understand XPath. For
example, DATE within the XML document in Figure 14-5 can be referenced as
Sales_Cube/DATE.
Depth is used to filter out child nodes below a certain level from the nodes
selected by the XPath. A Depth of -1 indicates no filter.
A client application can use Describe Web service to query specific metadata
information by specifying an XPath query expression and depth.
Table 14-1 explains the input and output parameters of the Describe Web
service.
If a client application wants to query the STORE dimension in the Sales cube
and restrict the selection to only 3 levels deep, the Describe Web service will be
invoked with the following input:
XPath: Sales_Cube/STORE
Depth: 3
The output shown in Figure 14-7 from the Describe Web service will be the
metadata for STORE dimension with information on the top 3 levels in the
hierarchy.
<STORE businessName="STORE" kind="cubeDimension">
  <ENTERPRISE_DESC businessName="Enterprise Description">
    <CHAIN_DESC businessName="Chain Description">
      <REGION_DESC businessName="Region Description" />
    </CHAIN_DESC>
  </ENTERPRISE_DESC>
</STORE>
A client application can query the Sales_Cube for all its metadata by invoking the
Describe Web service with the following input.
XPath: Sales_Cube
Depth: -1
The output will be the same as the complete XML representation of the Sales
Cube as in Figure 14-2.
A client application can query the Sales_Cube for only its high level metadata by
invoking Describe Web service with the following input.
XPath: Sales_Cube
Depth: 1
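The output will then contain only the cube and its immediate children (the
dimensions and the fact), presumably along the following lines (a sketch based
on the Sales cube metadata shown earlier):

   <Sales_Cube>
     <DATE businessName="DATE" kind="cubeDimension"/>
     <CAMPAIGN businessName="CAMPAIGN" kind="cubeDimension"/>
     <CONSUMER businessName="CONSUMER" kind="cubeDimension"/>
     <PRODUCT businessName="PRODUCT" kind="cubeDimension"/>
     <STORE businessName="STORE" kind="cubeDimension"/>
     <SALES_FACT businessName="SALES FACT" kind="cubeFacts"/>
   </Sales_Cube>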
As you can see, the metadata retrieved by the XPath is controlled by altering the
value of the Depth parameter.
14.4.2 Members
Client applications can retrieve the member data for the dimensions of a cube
using the Members Web service. For example, querying the Sales_Cube for the
members of all its dimensions returns an XML document of the following form
(elided members are shown as "..."):

<Sales_Cube>
  <DATE>
    <CAL_YEAR_DESC name="1998">
      <CAL_QUARTER_DESC name="...">
        ...
        <CAL_MONTH_DESC name="June 1998"> ... </CAL_MONTH_DESC>
        ...
      </CAL_QUARTER_DESC>
      <CAL_QUARTER_DESC name="Fourth Quarter 1998"> ... </CAL_QUARTER_DESC>
      ...
    </CAL_YEAR_DESC>
    <CAL_YEAR_DESC name="1999"> ... </CAL_YEAR_DESC>
    ...
  </DATE>
  <CONSUMER>
    <GENDER_DESC name="...">
      <AGE_RANGE_DESC name="...">
        <FULL_NAME name="Elvira Ricks"/>
        ...
      </AGE_RANGE_DESC>
      <AGE_RANGE_DESC name="19-25"> ... </AGE_RANGE_DESC>
      ...
    </GENDER_DESC>
    <GENDER_DESC name="Male"> ... </GENDER_DESC>
  </CONSUMER>
  <STORE> ... </STORE>
  <CAMPAIGN> ... </CAMPAIGN>
  <PRODUCT> ... </PRODUCT>
</Sales_Cube>
Depth is used to filter out child nodes below a certain level from the nodes
selected by the XPath. A Depth of -1 indicates no filter.
A client application can use Members Web service to query dimension members
by specifying an XPath query expression and depth.
Table 14-2 explains the input and output parameters of the Members Web
service.
For example, a client application can query the STORE dimension in Sales cube
for all its members by invoking the Members Web service with the following
input.
XPath: Sales_Cube/STORE
Depth: -1
The output shown in Figure 14-11 from the Members Web service will be all the
members in the STORE dimension for all levels in the hierarchy.
<STORE>
  <ENTERPRISE_DESC name="...">
    <CHAIN_DESC name="...">
      <REGION_DESC name="...">
        <DISTRICT_DESC name="...">
          <AREA_DESC name="..."> ... </AREA_DESC>
          ...
        </DISTRICT_DESC>
        <DISTRICT_DESC name="Texas"> ... </DISTRICT_DESC>
        ...
      </REGION_DESC>
      <REGION_DESC name="East"> ... </REGION_DESC>
      <REGION_DESC name="West"> ... </REGION_DESC>
      ...
    </CHAIN_DESC>
    ...
  </ENTERPRISE_DESC>
  ...
</STORE>
If a client application wants to list the top level members in the DATE dimension,
the Members Web service will be invoked with the following input. The output will
be as in Figure 14-12.
XPath: Sales_Cube/DATE
Depth: 1
Top-level members (years):

<DATE>
  ...
  <CAL_YEAR_DESC name="1999" />
  <CAL_YEAR_DESC name="2000" />
  <CAL_YEAR_DESC name="2001" />
</DATE>
As you can see, the Members Web service lists only the top-level members
(here, the CAL_YEAR_DESC members) of the DATE dimension, as defined by
the Depth parameter.
14.4.3 Execute
The Execute Web service retrieves an XML representation of the cube. An XML
cube contains members and measures data.
Consider a slice of the Sales Cube in Figure 14-13. The slice contains data for
the cross-section of 2 dimensions DATE and PRODUCT up to the
Month/Sub-Department level. The NULL values in the table denote the highest
level of aggregation for the specific column.
Year   Quarter               Month    Department   Sub-Department   Measure
-      -                     -        -            -                  9000
1999   -                     -        -            -                  9000
1999   First Quarter 1999    -        -            -                  3000
1999   Fourth Quarter 1999   -        -            -                  3000
1999   Second Quarter 1999   -        -            -                  3000
1999   First Quarter 1999    Jan-99   -            -                  3000
1999   Fourth Quarter 1999   Nov-99   -            -                  3000
1999   Second Quarter 1999   Jun-99   -            -                  3000
-      -                     -        HOMECARE     -                  9000
-      -                     -        HOMECARE     GARDEN              480
-      -                     -        HOMECARE     STATIONERY         8520
1999   -                     -        HOMECARE     -                  9000
1999   First Quarter 1999    -        HOMECARE     -                  3000
1999   Fourth Quarter 1999   -        HOMECARE     -                  3000
1999   Second Quarter 1999   -        HOMECARE     -                  3000
1999   First Quarter 1999    Jan-99   HOMECARE     -                  3000
1999   Fourth Quarter 1999   Nov-99   HOMECARE     -                  3000
1999   Second Quarter 1999   Jun-99   HOMECARE     -                  3000
1999   -                     -        HOMECARE     GARDEN              480
1999   First Quarter 1999    -        HOMECARE     GARDEN              160
1999   Fourth Quarter 1999   -        HOMECARE     GARDEN              160
1999   Second Quarter 1999   -        HOMECARE     GARDEN              160
1999   First Quarter 1999    Jan-99   HOMECARE     GARDEN              160
1999   Fourth Quarter 1999   Nov-99   HOMECARE     GARDEN              160
1999   Second Quarter 1999   Jun-99   HOMECARE     GARDEN              160
1999   -                     -        HOMECARE     STATIONERY         8520
1999   First Quarter 1999    -        HOMECARE     STATIONERY         2840
1999   Fourth Quarter 1999   -        HOMECARE     STATIONERY         2840
1999   Second Quarter 1999   -        HOMECARE     STATIONERY         2840
1999   First Quarter 1999    Jan-99   HOMECARE     STATIONERY         2840
1999   Fourth Quarter 1999   Nov-99   HOMECARE     STATIONERY         2840
1999   Second Quarter 1999   Jun-99   HOMECARE     STATIONERY         2840
Each cell represents a row of data in the slice represented in Figure 14-13. The
column values are represented as attributes of the cell element. Attributes for
NULL values are absent. Member values identify a cell in a cube.
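For example, the all-NULL first row and one Month/Sub-Department row of the
slice would appear along the following lines (a sketch; it assumes the measure
is exposed as an attribute named after the measure itself):

   <Cell TRXN_SALE_AMT="9000"/>
   <Cell TRXN_SALE_AMT="160" CAL_YEAR_DESC="1999"
         CAL_QUARTER_DESC="First Quarter 1999" CAL_MONTH_DESC="Jan-99"
         DEPARTMENT_DESC="HOMECARE" SUB_DEPT_DESC="GARDEN"/>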
Table 14-3 explains the input and output parameters of the Execute Web service:
Input  - XPath: defines the query. It specifies:
         - The cube name: the DB2 Cube Views cube on which the slice and
           dice query will be executed (only one cube name can be specified).
         - The where-clause: filters rows in the slice and consequently
           removes cell XML elements from the XML cube.
         - The aggregation level: defines the level of aggregation of the
           data returned and identifies the dimensions and levels that
           should be retrieved in the XML cube.
Input  - Measures: the list of measures.
Output - XML document: the XML document containing the slice of the cube.
The output from the Execute Web service will have the following attributes for
each cell as defined by the aggregation level in addition to the measure:
DEPARTMENT_DESC
SUB_DEPT_DESC
The cells are filtered from Figure 14-13 based on the where-clause. As a result,
the output will only have 3 cells as in Example 14-2.
14.5 Conclusion
Web services for DB2 Cube Views presents a new opportunity for Web
services-based analytical applications, running on any device or platform and
written in any programming language, to access OLAP metadata and data.
Part 4 Appendixes
This appendix describes how to configure DataStage and MetaStage to produce
and collect operational process metadata. It also describes in detail how to
design and run a DataStage job that will be used to populate our sales model
datamart.
Figure A-2 Project properties dialog
3. Next, we must ensure that the Process MetaBroker configuration file has the
correct startup parameters for our environment. To do this, navigate to the
installation directory of the Process MetaBroker on the server machine and
open the file processmb.cfg. By default on Windows, the file is:
C:\Program Files\Ascential\MetaStage\Process MetaBroker\processmb.cfg
All default settings for the Process MetaBroker are acceptable; however, you
may choose to reconfigure the variables shown in Table A-2.
If you change any variable in the Process MetaBroker configuration file, you
must stop and start the Process MetaBroker. To do this in the example
Windows environment, open the Services Manager via the Windows Control
Panel, as shown in Figure A-3, and stop and start the Process MetaBroker so
that the changed variables take effect.
The DataStage Server and Process MetaBroker have now been configured to
produce process metadata when jobs contained by the project you selected in
Figure 38 run.
Now that the server machine has been configured to produce process metadata,
the client must be configured to accept the process metadata that the Process
MetaBroker receives from the running DataStage job.
Configure the client
There are three steps involved in configuring the client to accept process
metadata from the Process MetaBroker:
1. Configure the Listener to accept DataStage job run XML files
2. Create a MetaStage Directory
3. Configure RunImport to import the DataStage job runs
If you change any variable in the Listener configuration file, you must stop and
start the MetaStage Listener using the Services Manager under the Windows
Control Panel so that the changed variables take effect.
2. Next, if you have not done so already, you must create a MetaStage
Directory (Directory) so that the RunImport configuration file can reference
the Directory name.
a. Run the MetaStage Directory Administrator. You will see the Directory
Administrator dialog shown in Figure A-4.
b. Click New. You will now be asked to select a data source in which to create
the MetaStage Directory shown in Figure A-5.
Choose the Machine Data Source tab and either select an existing data
source name (DSN) or create a new one. Before you click OK, make note
of the DSN you chose or created; this name will become the name of your
MetaStage Directory, and you will need it later when configuring
RunImport. You will be asked to enter any login details; then click
OK. When the Directory Administrator completes, you will have an empty
MetaStage Directory to work with.
3. Finally, before the client is ready to accept process metadata, we must ensure
that the RunImport configuration file has the correct startup parameters on
our client machine. To do this, navigate to the installation directory of the
RunImport on the client machine and open the file runimport.cfg. By default
on Windows, the file is:
C:\Program Files\Ascential\MetaStage\Listener\runimport.cfg
All default settings for the RunImport are acceptable; however, you may
choose to reconfigure the variables shown in Table A-4.
Now we have configured both the client and the server in our environment to be
able to produce and consume process metadata. We can move on to creating
the DataStage Server jobs that will produce our process metadata.
In this example we will discuss a sample DataStage job that will be used to load
our consumer sales model data warehouse. To build our DataStage job we will
need source and target metadata. We will get the source and target metadata
from the ERwin consumer sales model shown in Figure 8-6 on page 282 and
Figure 8-22 on page 296.
We will now obtain the source and target metadata. To do this we will use
MetaStage as the metadata integration hub and Directory. We will first import the
ERwin consumer sales model into MetaStage and then export the metadata for
sources and targets to DataStage so that we can use the metadata definitions
to build a DataStage job to load our data warehouse:
1. Start MetaStage Explorer.
In the Attach dialog shown in Figure A-6, make sure that you choose the
Directory name created or selected in Figure A-5. In the case of the example it
is msrepos. Click Current to open the current version of the Directory.
MetaStage allows you to connect to different versions of the Directory, but
since this is a new directory and no imports have been done, simply choose
Current.
2. When you open MetaStage you will see the screen shown in Figure A-7, but
you will not have an ERwin Import Category yet.
3. Now that we have a container for our ERwin metadata, we can import the
metadata objects into MetaStage. Right-click the Import Category
ERwin_SalesModel and choose Import->New as shown in Figure A-9.
4. After choosing to perform a new import, you will be asked to make an Import
Selection as shown in Figure A-10. Choose CA ERwin v4.0 from the Source
MetaBroker drop down list.
Figure A-12 ERwin saved as XML
8. We will create a new User-defined category called ERwin_SalesModel to
match our Import category as shown in Figure A-15.
b. We now see the Select Category dialog shown in Figure A-17. Select
ERwin_SalesModel and click OK. The ERwin design metadata will now be
inserted into our user-defined category.
d. To export the ERwin metadata to DataStage we will subscribe to the
objects. Right-click the ERwin_SalesModel Publication category and
choose Subscribe to on the context menu shown in Figure A-19.
e. The Subscription wizard will be shown. Click Next and the New
Subscription dialog will be shown as in Figure A-20. Choose Ascential
DataStage v7 and click Next.
h. Next we see the DataStage client login screen shown in Figure A-23. Here
is where we select the destination DataStage project that will receive
the ERwin design metadata, and the host on which the server resides. As can
be seen from Figure A-23, we chose the host wb-arjuna, which was the
ERwin import parameter we chose in Figure A-11 on page 651. For our
example we will export all the metadata definitions to the project p0. In this
example we are not required to enter any login information for DataStage.
Clicking OK runs the export to DataStage.
When the export completes we can open the DataStage Manager as shown
in Figure A-24. If we navigate to the Table Definitions Folder and open
DataStage7_MetaBroker->STAR we will see that our ERwin tables for the
consumer sales model are now in DataStage.
Similarly, Figure A-26 shows the DataStage job that we will use to load the
consumer sales fact table. Both DataStage jobs shown in Figure A-25 and
Figure A-26 were built using the ERwin design metadata imported into
DataStage.
Figure A-26 DataStage Designer: load fact
Figure A-27 shows that we used the ERwin metadata exported by the DataStage
MetaBroker to load the source columns for the file access. A similar operation
was performed for the target side of each DataStage job.
We will now perform the steps required to import the process metadata so it is
ready for data lineage and process analysis queries as described in 8.2.4,
“Performing data lineage and process analysis in MetaStage” on page 308.
Table B-1 lists all of the queries, both without the use of an MQT and with a
drill-through query type MQT available.
Setup questions
Q: What version of DB2?
A: DB2 UDB V8.1 FP2+.
Q: What edition of DB2?
A: ESE, DB2 Warehouse Edition.
DB2 Warehouse Enterprise Edition includes DB2 UDB V8.1 Enterprise
Server Edition (ESE).
DB2 Warehouse Standard Edition includes DB2 UDB V8.1 Workgroup Server
Unlimited Edition.
OLAP Center
Q: OLAP Center won’t start, or can’t connect to DB2.
A: This may happen, though rarely. Increase the value of the application
heap size (APPLHEAPSZ) parameter in the database configuration file (see
the example below).
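For example, from the DB2 command line (assuming a database named
SAMPLE, and that 2048 4-KB pages is a sufficient application heap for your
workload):

   db2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING APPLHEAPSZ 2048

The new value takes effect the next time the database is activated.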
How do I … ?
– Q: Delete a set of objects?
A: You cannot perform this action at present. You will have to delete
objects one by one.
– Q: Delete all objects?
A: Use db2mdapiclient, which provides the ability to delete all the objects
in one operation.
– Q: Create complex measures?
A: Use the Aggregation Script Builder.
Q: Is there a Tutorial?
A: No, but there is online help and info pops.
This appendix contains a brief introduction to the following topics in the DB2
Cube Views API:
API architecture overview
Purposes and functionality of the API
Stored procedure interface
API operations
Error handling and tracing
db2mdapiclient utility
Figure D-1 shows a diagrammatic representation of the DB2 Cube Views API
and how metadata is exchanged through the API.
(Figure D-1: client applications on the client side, including db2mdapiclient
and tools such as Office Connect and QMF, exchange metadata with the DB2
Cube Views catalog tables in the DB2 database on the server; applications can
also issue SQL queries directly against the relational tables and data.)
This stored procedure accepts input and output parameters in which you can
express complex metadata and metadata operations. Applications A and B push
metadata to the DB2 catalog (by creating and manipulating objects) and also
pull metadata from it. Application C just pulls metadata from DB2 Cube Views.
In all these cases, the metadata flows through the stored procedure interface.
Metadata operations
The API supports the following groups of metadata operations:
- Retrieval: Describe
- Modification: Create, Drop, Alter, Import, Rename
- Administration: Validate
Each request contains an operation description and, where applicable, the
application metadata (metadata objects); the response contains the operation
status, the operation results, and any returned metadata objects.
Syntax:
call md_message (request, metadata, response)
Prototype:
md_message (request IN CLOB(1M),
metadata INOUT CLOB(1M),
response OUT CLOB(1M))
Remarks:
- Request and response parameters are mandatory
- Metadata parameter is optional
- XML parameters exchanged using Character Large Object (CLOB) structures
- "CALL" SQL statement invokes the stored procedure
Tracing
The API supports three priorities of tracing (low, medium and high). Using the
configuration file, an administrator can set the level of tracing to log to file.
Runtime tracing is turned off by default, and the trace file name is
db2mdtrace.log.
When tracing is turned on, with the level set to a value other than none, errors
that occur in the API might be recorded in both the error log and the trace log,
depending on the level and severity setting for these logs.
Error logging
The API distinguishes between three severities of errors (low, medium and high).
The default severity setting is medium, and the error log file name is
db2mderror.log. When an error occurs while reading the configuration file, this
error is logged in a file named db2mdapi.log.
When the API is configured to high or medium error logging, and a high or
medium error occurs, the API generates a callstack beginning at the point where
the error occurs in the API. This callstack is similar to a medium-level trace, but
the data is sent to the error log instead of the trace log.
Note: The default location of the trace and error logs is ..\sqllib\db2 directory
on Windows ( ../sqllib/db2dump directory on AIX).
Location
The source code, db2mdapiclient.cpp, is located in the \SQLLIB\samples\olap\client\
directory on Windows (/home/db2inst1/sqllib/samples/olap/client on AIX).
Tasks
You can use the db2mdapiclient utility to perform any of the operations that are
supported by the DB2 Cube Views stored procedure, md_message(), as
described in Table D-1.
Usage
The db2mdapiclient utility uses files to hold the XML that is passed to and
received from the md_message() stored procedure (see Figure D-4).
(Figure D-4: db2mdapiclient invokes the md_message() stored procedure
through the API; the metadata is held in the DB2 system catalog.)
To see a list of parameters for the db2mdapiclient command, you can enter
‘db2mdapiclient’ at a command line (on both Windows and AIX) as shown in
Figure D-5.
USAGE:
db2mdapiclient [OPTIONS]
Options can be specified in any order
REQUIRED OPTIONS:
-d or --database database name
-i or --inputoperation input operation file name
-o or --outputoperation output operation file name
OTHER OPTIONS:
-u or --userid userid for database
-p or --password password for database
-m or --inputmetadata input metadata file name. Required for
operations such as "create" & "import"
-n or --outputmetadata output metadata file name. If output
metadata file is not specified & there
is metadata returned by stored procedure
then output metadata will be written to
outputmetadata.xml
-a or --parameterbuffersize parameter buffer size, defaults to
1000000 bytes
-b or --metadatabuffersize metadata buffer size, defaults to
1000000 bytes
-v or --verbose print extra information while
processing
-h or --help this usage text
To import DB2 Cube Views metadata for a database (say, SAMPLE), change to
the ..\SQLLIB\samples\olap\xml\input directory (on Windows) and enter the
command shown in Example D-3.
To export DB2 Cube Views metadata for a database (say, SAMPLE), change to
the ..\SQLLIB\samples\olap\xml\input directory (on Windows) and enter the
command shown in Example D-4.
To validate DB2 Cube Views metadata for a database (say, SAMPLE), change to
the ..\SQLLIB\samples\olap\xml\input directory (on Windows) and enter the
command shown in Example D-5.
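Based on the options listed in Figure D-5, these commands take roughly the
following form (the user ID, password, and file names are illustrative; the
actual operation files are those shown in Examples D-3 through D-5):

   db2mdapiclient -d SAMPLE -u db2admin -p mypassword -i import.xml -o response.xml -m metadata.xml
   db2mdapiclient -d SAMPLE -u db2admin -p mypassword -i export.xml -o response.xml -n exported.xml
   db2mdapiclient -d SAMPLE -u db2admin -p mypassword -i validate.xml -o response.xml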
The default structure of validate.xml allows validation of all metadata objects in
the DB2 catalog for optimization (that is, checking for conformance to base
rules, cube completeness rules, and optimization rules).
We used one AIX machine (GREENLAND) and two Windows 2000 machines
(HELIUM and GALLIUM); note that the results provided in this redbook come
from our testing on non-optimized configurations.
SELECT
   Profit
FROM
"STAR"."CONSUMER_SALES" AS T1,
"STAR"."CAMPAIGN" AS T2,
"STAR"."CONSUMER" AS T3,
"STAR"."DATE" AS T4,
"STAR"."PRODUCT" AS T5,
"STAR"."STORE" AS T6
WHERE
T1."COMPONENT_ID"=T2."IDENT_KEY" AND
T1."CONSUMER_KEY"=T3."IDENT_KEY" AND
T1."DATE_KEY"=T4."IDENT_KEY" AND
T1."ITEM_KEY"=T5."IDENT_KEY" AND
T1."STORE_ID"=T6."IDENT_KEY"