White Paper
020607dmxwpADM
analysis. ADM provides a solution: it speeds data-intensive joins and aggregations, making it possible to perform them much more frequently. ADM high-speed aggregations and joins exploit patented algorithms, parallel processing technology, and dynamic optimization to accelerate applications and reduce the computing resources they consume. These savings increase dramatically as data volumes grow. Figure 1 shows a job built from join and aggregate tasks in the DMExpress management interface.
[Figure 1: job diagram with tasks labeled Join and Aggregate.]
Figure 1. Using DMExpress ADM High-Speed Join and Aggregation to Generate a Product Sales Report.
Aggregate operations yield the greatest performance benefits when their output feeds data analysis; for example, when daily sales data is aggregated by quarter and by sales region. If the queries run at consistent times and the aggregation application is fast enough, it is often possible to generate the aggregations and store them in tables within the database before the query runs. The query can then compute its results from these aggregate tables and avoid the processing needed to create the aggregations. For a query over six million rows, reducing the number of rows read by creating aggregations across dimensions can vastly shorten processing time. A query answered from base-level data can take hours and involve millions of data records and millions of calculations. With pre-calculated aggregates, the same query can be answered in seconds with just a few records and calculations.
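The pre-aggregation idea described above can be sketched in a few lines of Python. This is an illustration only, not DMExpress or ADM code: the sample rows, the `build_aggregate_table` and `quarterly_sales` names, and the in-memory dictionary standing in for a database aggregate table are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales rows: (sale_date, region, amount).
daily_sales = [
    (date(2007, 1, 15), "East", 100.0),
    (date(2007, 2, 3), "East", 250.0),
    (date(2007, 4, 20), "East", 75.0),
    (date(2007, 1, 9), "West", 300.0),
    (date(2007, 5, 30), "West", 125.0),
]


def quarter_of(d):
    """Return the (year, quarter) pair that a date falls in."""
    return (d.year, (d.month - 1) // 3 + 1)


def build_aggregate_table(rows):
    """Scan the base rows once and total the amounts per (year, quarter, region)."""
    totals = defaultdict(float)
    for sale_date, region, amount in rows:
        year, quarter = quarter_of(sale_date)
        totals[(year, quarter, region)] += amount
    return dict(totals)


# Built ahead of time, before any query runs.
aggregate_table = build_aggregate_table(daily_sales)


def quarterly_sales(year, quarter, region):
    """Answer the query with one lookup instead of rescanning the base rows."""
    return aggregate_table.get((year, quarter, region), 0.0)
```

With the table in place, a call such as `quarterly_sales(2007, 1, "East")` reads a single pre-computed record. The cost of scanning the daily rows is paid once, when the aggregate table is built, rather than on every query.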
Multi-level hierarchical aggregations (Figure 2) can be performed efficiently with ADM. Using the aggregate functionality and a processing scheme that works with progressively smaller datasets, ADM can perform hierarchical aggregation and then use DMExpress merge functionality to bring the hierarchically aggregated data together to create a final sales report.
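As a rough sketch of the scheme just described, the following Python computes each level of a hierarchy from the level below it, so every pass reads a smaller dataset, and then merges the levels into one report. The store/region hierarchy, the sample rows, and the `roll_up` and `merge_report` helpers are hypothetical; plain Python stands in for the DMExpress aggregate and merge tasks and does not reflect their actual interface.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, store, amount).
sales = [
    ("East", "Store 1", 100.0),
    ("East", "Store 1", 50.0),
    ("East", "Store 2", 200.0),
    ("West", "Store 3", 300.0),
]


def roll_up(rows, key_len):
    """Sum the last field of each row, grouped by its first key_len fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[:key_len]] += row[-1]
    return [key + (total,) for key, total in sorted(totals.items())]


# Each level is computed from the previous, smaller level.
by_store = roll_up(sales, 2)          # rows of (region, store, total)
by_region = roll_up(by_store, 1)      # rows of (region, total)
grand_total = roll_up(by_region, 0)   # a single row: (total,)


def merge_report(store_rows, region_rows, total_rows):
    """Interleave the levels so each region subtotal follows its stores."""
    report = []
    for region, region_total in region_rows:
        for row_region, store, store_total in store_rows:
            if row_region == region:
                report.append((region, store, store_total))
        report.append((region, "ALL STORES", region_total))
    report.append(("ALL REGIONS", "ALL STORES", total_rows[0][0]))
    return report


sales_report = merge_report(by_store, by_region, grand_total)
```

Only the first pass touches the detail rows. The region pass reads the store totals and the grand-total pass reads the region totals, which is why the work shrinks at each level.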
Aggregations can also be used to replace fact data with a rolled-up version of itself. For example, you can use this approach to create a monthly product-sales-by-store summary of an original fact table. All of the daily sales records are aggregated into monthly records, which reduces the size of the fact table, yet dimensional analysis by time at the month, quarter, and year levels can still be performed. ADM can also perform multi-level hierarchical aggregation to help build cubes for more advanced dimensional data analyses.
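To make the roll-up concrete, here is a minimal Python sketch, under assumed sample data, of replacing daily fact rows with monthly ones and then answering quarter-level and year-level questions from the monthly rows alone. The field layout and the `to_monthly` and `total_for` names are illustrative assumptions, not part of DMExpress.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily fact rows: (sale_date, product, store, amount).
daily_facts = [
    (date(2007, 1, 5), "Widget", "Store 1", 10.0),
    (date(2007, 1, 20), "Widget", "Store 1", 15.0),
    (date(2007, 2, 11), "Widget", "Store 1", 20.0),
    (date(2007, 4, 2), "Widget", "Store 1", 5.0),
    (date(2007, 1, 7), "Gadget", "Store 2", 40.0),
]


def to_monthly(rows):
    """Collapse daily rows into one row per (year, month, product, store)."""
    totals = defaultdict(float)
    for sale_date, product, store, amount in rows:
        totals[(sale_date.year, sale_date.month, product, store)] += amount
    return [key + (total,) for key, total in sorted(totals.items())]


# The smaller monthly table takes the place of the daily fact table.
monthly_facts = to_monthly(daily_facts)


def total_for(monthly_rows, year, quarter=None):
    """Total sales for a year, or for one quarter of it, from monthly rows only."""
    total = 0.0
    for row_year, month, _product, _store, amount in monthly_rows:
        in_quarter = quarter is None or (month - 1) // 3 + 1 == quarter
        if row_year == year and in_quarter:
            total += amount
    return total
```

Day-level detail is gone after the roll-up, so this approach fits only when no analysis needs a finer grain than the month.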
Using High-Performance Joins for Changed Data Capture (CDC) and Other Preprocessing
To reduce the time needed to retrieve information, data must be preprocessed into the proper form for the dimensional data warehouse. High-speed joins are critical to this process, which can include looking up legacy values and replacing them appropriately, cleansing and validating data, identifying and eliminating nonmatching values, and pre-aggregation. When ADM high-performance join is used for data-intensive operations, descriptive information can be combined with factual data late in a processing sequence so that storage and throughput requirements are minimized. CDC is an increasingly important preprocessing function. By loading only new, updated, and deleted records into the data warehouse, CDC significantly conserves time and resources when carrying out data-intensive processes (Figure 3).
Rather than replacing the information in the data warehouse with the entire contents of the online transactional database, a join matches the primary key of each previously loaded record with that of its corresponding new record and then compares the data portions of the two records to determine whether they have changed. In this way, only added, deleted, and altered records are updated, which significantly reduces the elapsed time of database loads. By using a high-performance join for CDC, data warehouse updates can be performed with far greater efficiency.
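The key-match-and-compare logic described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: each snapshot is a Python dictionary mapping a primary key to a tuple of data fields, and `capture_changes` is a hypothetical helper. It shows the classification step only and says nothing about how ADM implements its join.

```python
def capture_changes(previous, current):
    """Classify records as added, deleted, or altered by matching primary keys.

    Both arguments map a primary key to a tuple holding the record's data fields.
    """
    prev_keys = previous.keys()
    curr_keys = current.keys()

    # Keys present only in the new snapshot are additions.
    added = {key: current[key] for key in curr_keys - prev_keys}
    # Keys present only in the old snapshot are deletions.
    deleted = {key: previous[key] for key in prev_keys - curr_keys}
    # Keys present in both are altered only if the data portions differ.
    altered = {
        key: current[key]
        for key in prev_keys & curr_keys
        if previous[key] != current[key]
    }
    return added, deleted, altered


# Hypothetical snapshots of a customer table keyed by customer ID.
previously_loaded = {
    1: ("Alice", "NY"),
    2: ("Bob", "NJ"),
    3: ("Carol", "CT"),
}
new_extract = {
    1: ("Alice", "NY"),   # unchanged, so it is not reloaded
    2: ("Bob", "PA"),     # data portion changed
    4: ("Dave", "NY"),    # new record
}

added, deleted, altered = capture_changes(previously_loaded, new_extract)
```

Only the records in `added`, `deleted`, and `altered` need to be applied to the warehouse. Unchanged records, such as key 1 here, are skipped entirely.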
Case Studies Demonstrate Elapsed Time Savings Using DMExpress ADM High-Performance Aggregations and Joins
Because of the significant benefits they provide, aggregations and joins are used in nearly every industrial sector where large volumes of data must be analyzed, including the banking, pharmaceutical, retail, and telecommunications industries. Below are two examples from the financial industry in which ADM high-performance aggregations and joins were used to significantly accelerate data processing.
Figure 4. Investment Bank Uses DMExpress with ADM to Reduce Data Processing Time by 90%.
Figure 5. Bank Cuts Processing Time by 33% Using DMExpress with ADM.
Figure 6. DMExpress Speed is Utilized at Many Points in the Data Management Structure.
DMExpress leverages nearly 40 years of research and development in high-performance software. It is needed wherever you have data-intensive applications. It speeds processes like ETL, staging data for a data warehouse, and database loading by up to 90%. The ultimate business benefit of DMExpress is twofold:
It provides rapid availability of data for critical operations such as BI (including CRM, ERP, SCM, and CPM), data marts, data mining, web log processing, online data stores, and OLTP systems.
Its speed and efficiency minimize resource consumption, making it possible to consolidate hardware for significant cost savings.
For more information about DMExpress ADM and high-performance aggregations and joins, contact DMExpress sales at 201-930-8200 or visit http://www.syncsort.com/products/dmx/features/adm.htm.
Syncsort Incorporated, 2007. All rights reserved. DMExpress is a trademark of Syncsort Incorporated. All other company and product names used herein may be the trademarks of their respective companies.