Change Data Capture - Next Generation ETL

Moving less data, faster - Attunity CDC makes ETL efficient

an Attunity White Paper

Attunity Change Data Capture (CDC) delivers up-to-the-minute data and dramatically
reduced resource consumption when used as part of ETL and data synchronization
processes. For enterprises that need to use data stored in mainframe and legacy
data sources as part of their ETL/BI initiatives, Attunity offers a robust and flexible
solution for moving the right enterprise data to the right place at the right time.

Change Data Capture - next generation ETL


Copyright 2004 Attunity Ltd. All rights reserved

Attunity Connect Product White Paper, May 2004
Attunity Ltd. follows a policy of continuous development and reserves the right to alter, without prior notice, the
specifications and descriptions outlined in this document. No part of this document shall be deemed to be part of any
contract or warranty. Attunity Ltd. retains the sole proprietary rights to all information contained in this document. No part
of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means,
electronic, mechanical, photocopy, recording, or otherwise, without prior written permission of Attunity Ltd. or its duly
appointed authorized representatives.
Copyright 2004 Attunity Ltd. All rights reserved.
Attunity, the Attunity logo, Application Adapter Framework, Attunity AAF, and Attunity Connect are trademarks of Attunity Ltd. All other
marks are the property of their respective owners.

Table of Contents

THE CASE FOR CHANGE DATA CAPTURE - EXECUTIVE SUMMARY
Changing business conditions demand new solutions
Change data capture meets business demands
Don't forget your mainframe
Attunity - serving operational data for ETL and BI applications
CHANGE DATA CAPTURE AND ETL - SOLUTION PATTERNS AND COMPONENTS
CDC Solution Components
Scenario 1 - Batch-Oriented CDC (pull CDC)
Scenario 2 - Live/Real-time CDC (push CDC)
ATTUNITY CONNECT FOR CHANGE DATA CAPTURE
Product Architecture and Modules
Setting up CDC using the Attunity Studio
Attunity Change Capture Agents
VSAM-CICS CDC (Mainframe)
VSAM-Batch CDC (Mainframe)
DB2-Journal CDC (Mainframe)
DB400-Journal CDC (iSeries)
Adabas CDC (Mainframe)
Query-based CDC (all platforms, all data sources)
Using Attunity and BI/ETL Tools
CASE STUDY - REAL-TIME VSAM CDC AT STATE HEALTHCARE
The Challenge
The Solution
High Performance and Throughput
ABOUT ATTUNITY

The Case for Change Data Capture - Executive Summary


Business Intelligence (BI) is at the heart of the best global organizations, enabling them to understand
business trends, improve decisions, and support day-to-day operations. ETL (extract, transform, and load) is
the process that enterprises use in order to build the consolidated data stores (e.g., Data Warehouses and
Data Marts) required for effective BI. Traditionally, ETL processes have been run periodically, on a monthly
or weekly basis, and use a bulk approach that moves and integrates the entire data-set from the source
system to the target data warehouse. While this approach was acceptable for enterprises over the years (it
was the only option available in most cases), current business conditions require a new way of integrating
data using Change Data Capture.

Changing business conditions demand new solutions

Business Globalization and 24/7 operations. In the past, enterprises could stop online systems
during the night or weekend to provide a window of time for running bulk ETL processes.
Nowadays, running a global business with 24/7 operations means smaller downtime windows, or
none at all.

Need for up-to-date, current data. Customer demand, competitive pressure, and the need for better
decisions all require more up-to-date data. To make the most of BI in today's ever-accelerating business climate,
managers should not be working with last week's data. Today, decision makers need data that is
updated a few times a day, or even in real time.

Data volumes are increasing. The more the business grows, the bigger the data volumes in
operational data stores become. Larger data volumes consume more CPU and network resources
during a bulk ETL process, even as the bulk extract windows shrink over time.

Cost reduction. Bulk ETL operations are costly and inefficient, as they require more processing
power, more memory, and more network bandwidth. In addition, because bulk ETL processes run for long
periods of time, they also require more administration and IT resources to manage.

To stay ahead of these changing business conditions and increase the value of BI implementations, a
new generation of intelligent ETL is required. The power behind it is Change Data Capture:
Change Data Capture (CDC) is an innovative approach to data integration, which is based
on the identification, capture, and delivery of the changes made to enterprise data sources.

Change data capture meets business demands

No downtime window for ETL. CDC enables organizations to move changes to data while the
operational systems are running, without the need for a downtime window.

Current, up-to-date data. By constantly identifying changes, CDC delivers new data more
frequently, even in real time, providing more current data for enterprise users.

Reduced cost. By moving only the changed data, CDC requires significantly fewer resources for
moving and transforming data. Cost is reduced in hardware, software, and human resources.

Don't forget your mainframe


BI is only as good as the data it relies on. Analysts estimate that mainframe systems still store up to 70% of
corporate business information, and mainframes still process most of the business transactions in the world.
Mainframe data sources also typically store higher volumes of data, further increasing the need for a more
efficient approach to moving data, such as Change Data Capture. In addition, popular mainframe data
sources like VSAM, which are non-relational, present additional challenges when incorporating that data into
BI solutions. BI tools expect relational data; non-relational data requires data model changes and new data
structure mappings.

Attunity - serving operational data for ETL and BI applications


Attunity is a provider of data access and change data capture products that work with ETL and BI tools to
increase the value of enterprise data, enabling enterprises to serve data where and when it fits best and to
meet the needs of the business quickly and efficiently.
Attunity products work with mainframe and many other enterprise data sources, providing:

Change data capture, in batch or real-time (for next generation ETL processes)

High-speed bulk data extraction (for traditional ETL processes)

Standard data access (for enterprise reporting and BAM)

Federated data access (for enterprise reporting and BAM)

To learn more about the Attunity products, visit us at www.attunity.com or email info@attunity.com.
The rest of this document provides in-depth information about CDC solution patterns and how Attunity
enables enterprises to leverage CDC today for more efficient ETL and more effective BI.

Change Data Capture and ETL - Solution Patterns and Components


CDC solutions are designed to maximize the efficiency of operational data extraction in two ways: by
minimizing resource usage, replicating or moving only the changes to the data (i.e. the deltas), and by
minimizing the latency in the delivery of the changed data to potential consumers. To minimize latency,
CDC solutions need to:

Capture changes to the data in a near-real-time or real-time manner.

Enable consumers of changed data to receive changes quickly, either by requesting the changes
more frequently (e.g. every hour, or every 20 minutes), or by automatically sending the changes as
soon as they are identified.

CDC Solution Components

Change Capture Agents. Change capture agents are live software components that are
responsible for the identification and capture of changes to the source operational data store. These
agents also prepare the changes for delivery to the target database/application. Change capture
agents are typically built and optimized for specific data stores (e.g. by monitoring a journal, or using
dedicated hooks), though generic agents exist as well. To capture changes, CDC agents typically
scrape system or database journals, or use hooks, triggers, or user exits, to collect changes and
notify the receiving systems.

Change Delivery Mechanisms. Change delivery mechanisms are responsible for the reliable
delivery of changed data to the change consumer, typically the ETL tool or program that will
complete the ETL process (i.e. adding the transform and load steps). Change delivery mechanisms
can use either a pull model, where the change consumer initiates the request to get the changes, or
a push model, where changes are pushed to the consumer as soon as they are captured.
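
[Figure: CDC-enabled ETL process. Change Data Capture picks up changes from the operational data source through hooks and triggers, or by monitoring the journal/log. The ETL tool or program reads the changes (pull) or has them sent to it (push), and then loads the data into the data warehouse and data marts.]
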
The above diagram provides an overview of a CDC-enabled ETL process. The following paragraphs provide
an overview of two CDC scenarios and the components that take part in the process.

Scenario 1 - Batch-Oriented CDC (pull CDC)


In this scenario, the ETL tool periodically requests the changes, each time receiving a batch of records that
represent all the changes that were captured since the last request cycle. Change delivery requests can be
done in low or high frequencies: twice a day, or every 15 minutes for example. For many organizations, the
preferred method of providing extracted changes is to expose them as records of a data source table. This
approach enables the ETL tool to seamlessly access the changed records using standard interfaces like
ODBC. The CDC solution needs to take care of maintaining the context of the last change delivery, and
deleting the records that were already read. Once the records are read, they can be transformed and loaded
by the ETL tool to the target data store.
This scenario is essentially identical to traditional bulk ETL, except that it captures and moves only the
changes to the data set instead of moving the entire source data store. This approach greatly reduces the
required resources and eliminates the need for a downtime window for ETL operations.
When should organizations use this approach? This batch-oriented approach is very easy to implement,
as it is conceptually so similar to traditional ETL processes. Organizations should use this method when a
latency of a few minutes to a few hours is acceptable.
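
The pull cycle described in this scenario can be pictured with a minimal sketch. It is not Attunity code: the ChangeTable class, its method names, and the record layout are hypothetical, chosen only to show a change table that hands out everything captured since the last request and then discards what was read.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeTable:
    """Hypothetical change table: holds captured changes until a consumer reads them."""
    _records: list = field(default_factory=list)
    _next_seq: int = 1

    def capture(self, operation, key, data=None):
        # Called by a (simulated) change capture agent for every source change.
        self._records.append(
            {"seq": self._next_seq, "op": operation, "key": key, "data": data}
        )
        self._next_seq += 1

    def read_batch(self):
        # Return all changes captured since the last request, then drop them,
        # so the next request only sees newer changes.
        batch, self._records = self._records, []
        return batch


def run_etl_cycle(changes, warehouse):
    """One pull cycle: extract the pending changes, then apply them to the target."""
    batch = changes.read_batch()
    for change in batch:
        if change["op"] == "DELETE":
            warehouse.pop(change["key"], None)
        else:  # INSERT or UPDATE
            warehouse[change["key"]] = change["data"]
    return len(batch)


changes = ChangeTable()
warehouse = {}

changes.capture("INSERT", 101, {"name": "Ada"})
changes.capture("UPDATE", 101, {"name": "Ada L."})
first = run_etl_cycle(changes, warehouse)   # picks up both pending changes

changes.capture("DELETE", 101)
second = run_etl_cycle(changes, warehouse)  # picks up only the newer change
```

In a real deployment the scheduler of the ETL tool, not application code, would decide how often a cycle runs; the sketch only shows why each cycle moves the deltas rather than the full data set.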

Scenario 2 - Live/Real-time CDC (push CDC)


In this scenario, which accommodates near real-time or real-time latency requirements, the change delivery
mechanism pushes the changes to the ETL tool or program as soon as possible. This is typically done by
using an event-delivery mechanism or messaging middleware. In either case, this method requires that the
ETL tool employ a listener that constantly waits for change events, and a change publisher that the CDC
agents use to send change notifications in real time. Some CDC solutions use proprietary event delivery
mechanisms, and some use standard messaging middleware (e.g. MQSeries) that promotes easier
interoperability.
Note that while message-oriented or event-driven integration is more common in EAI processes (i.e. using
tools like Integration Brokers), many of the leading ETL tool vendors are beginning to add this feature to their
solutions to accommodate the demands of high-end, real-time BI applications.
When should organizations use this approach? This real-time approach is most suitable to situations where
applications demand zero latency and require the most up-to-date data.
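
As a rough illustration of the listener and publisher roles in the push model, the sketch below uses an in-process Python queue in place of real messaging middleware. The function names, the event layout, and the STOP sentinel are assumptions made for the example, not part of any Attunity or MQSeries interface.

```python
import queue
import threading

STOP = object()  # sentinel telling the listener to shut down


def publish(event_queue, change):
    """Change publisher: called by the (simulated) capture agent for each change."""
    event_queue.put(change)


def listen(event_queue, target_store):
    """Listener on the ETL side: blocks until a change event arrives, then applies it."""
    while True:
        change = event_queue.get()
        if change is STOP:
            break
        if change["op"] == "DELETE":
            target_store.pop(change["key"], None)
        else:  # INSERT or UPDATE
            target_store[change["key"]] = change["data"]


event_queue = queue.Queue()
target_store = {}
listener = threading.Thread(target=listen, args=(event_queue, target_store))
listener.start()

# Changes are pushed as soon as they are captured; the consumer never polls.
publish(event_queue, {"op": "INSERT", "key": 7, "data": {"status": "open"}})
publish(event_queue, {"op": "UPDATE", "key": 7, "data": {"status": "closed"}})

event_queue.put(STOP)
listener.join()
```

The difference from the pull scenario is who initiates delivery: here the publisher does, so latency is bounded by the transport rather than by a polling interval.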
The remainder of this paper presents an overview of Attunity's change data capture solutions and
shows how they work in the two scenarios described above.

Attunity Connect for Change Data Capture


Attunity provides right-time data access and movement, enabling enterprises to extend the value of business
information stored in mainframe and legacy data sources. Attunity delivers a robust and flexible solution for
IT organizations, providing the ability to serve corporate data where and when it fits best, to quickly and
efficiently meet the needs of the business.
The Attunity Connect product family provides a unified solution for bulk data movement, change data
capture, federated data access, and direct data access. Attunity Connect supports more than 30 data
sources on more than 20 computing platforms, and provides unique capabilities for legacy, non-relational
data sources on the mainframe and other legacy platforms.

Product Architecture and Modules


Attunity Connect is integration middleware designed to enable standards-based and federated data access,
as well as change data capture from enterprise data sources. Attunity Connect provides the following key
components:

Attunity Server. The server manages the Attunity components on the data server, the client
connections, security, and load balancing.

Attunity Metadata Repository. The repository defines the data models that expose the data source
for direct access or change data capture. For non-relational data sources, Attunity enables users to define
mappings to an enhanced relational data model, including the import of existing metadata.

Attunity Data Drivers. The data drivers provide standard access to various data sources, relational
and non-relational. The drivers enable reading and writing data in operational data sources for bulk data
extraction or for enterprise reporting.

Attunity Change Capture Agents and Change Queue. The change capture agents are live
software components that continuously monitor for changes in data sources and prepare a change
queue, which may be virtual or physical. A virtual change queue is essentially a virtual layer that
reflects an existing log (e.g. CICS logstream) as a change queue, without actually copying the
change records. A physical change queue acts as a staging area and creates a copy of the change
records.

Attunity Standard Client Interfaces. The client interfaces are components that client applications
can use to perform queries using the back-end Attunity drivers or a change queue. These include
ODBC, JDBC, OLE DB, ADO, and ADO.NET. A change reader can use any one of these interfaces.

Attunity Event Queue Services. The event queue services provide an event delivery mechanism
that uses a message queue transport to send events to event listeners. On the data server, event
publishers can publish events from change queues or from legacy applications, and have them
routed to event listeners on other platforms.
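
The distinction between a virtual and a physical change queue, described in the component list above, can be sketched as follows. This is a conceptual illustration only: both classes, their methods, and the log entry layout are hypothetical and do not reflect how Attunity Connect is implemented.

```python
class VirtualChangeQueue:
    """Presents an existing log as a change queue without copying its records."""

    def __init__(self, log, tables):
        self._log = log              # reference to the live log, not a copy
        self._tables = set(tables)

    def read(self, after_position=0):
        # Walk the underlying log on demand and yield only the relevant entries.
        for position, entry in enumerate(self._log, start=1):
            if position > after_position and entry["table"] in self._tables:
                yield position, entry


class PhysicalChangeQueue:
    """Staging area that keeps its own copy of the relevant change records."""

    def __init__(self, tables):
        self._tables = set(tables)
        self._staged = []

    def stage(self, log):
        # Copy matching entries out of the log into the staging area.
        self._staged.extend(dict(e) for e in log if e["table"] in self._tables)

    def read(self):
        return list(self._staged)


log = [
    {"table": "ACCOUNTS", "op": "UPDATE"},
    {"table": "AUDIT", "op": "INSERT"},
]
virtual = VirtualChangeQueue(log, ["ACCOUNTS"])
physical = PhysicalChangeQueue(["ACCOUNTS"])
physical.stage(log)

# A later log entry is visible through the virtual queue immediately,
# but reaches the physical queue only when it is staged again.
log.append({"table": "ACCOUNTS", "op": "DELETE"})
virtual_positions = [position for position, _ in virtual.read()]
staged_count = len(physical.read())
```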

The following diagram describes the high-level Attunity Connect CDC architecture:
(* The Attunity Server and Metadata Repository are omitted for clarity)

Setting up CDC using the Attunity Studio


The Attunity Studio provides a single, integrated GUI for configuring Change Data Capture.
Configuration generally follows these steps:
1. Configure a data source. In this step, the user configures a logical data source in Attunity Connect
that represents the operational data store from which changes are to be extracted.
2. Data Source Metadata Setup. In this step, the user configures the data model for the structure of the
extracted data for the configured data source. The Studio easily leverages existing metadata (e.g.
COBOL copybooks) by providing Import Wizards that generate a corresponding relational metadata
model.
3. Configure the Change Capture Agent. In this step, the user defines the data changes to be captured
using the CDC Agent configuration wizard. This may require additional information on the location of
a journal such as the CICS logstream or DB2 Journal that maintains the history of changes to the
selected tables.
The following screen shots provide a sample of the Attunity Studio Wizards:

[Screen shots: the VSAM CDC wizard, which uses the CICS logs; choosing tables for CDC; mapping VSAM to a relational metadata model.]
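
To give a feel for the metadata import in step 2, the sketch below maps a few elementary COBOL copybook fields to relational column definitions. It is a deliberately simplified illustration, not the logic of the Attunity Import Wizards: the sample copybook, the function names, and the type mapping are assumptions, and real copybooks (OCCURS, REDEFINES, edited pictures) need far more handling.

```python
import re

# Matches an elementary item such as "05  CUST-ID  PIC 9(8)." and captures
# the field name and the picture string.
FIELD = re.compile(r"^\s*\d{2}\s+([A-Z0-9-]+)\s+PIC\s+([SXV9()0-9]+)", re.IGNORECASE)


def expand(picture):
    """Expand repeat counts, e.g. 9(3)V99 becomes 999V99."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), picture.upper())


def to_column(line):
    """Map one elementary copybook field to (column_name, sql_type), or None."""
    match = FIELD.match(line)
    if match is None:
        return None  # group item or a clause this sketch does not support
    name, picture = match.group(1), expand(match.group(2))
    column = name.lower().replace("-", "_")
    if "X" in picture:
        return column, f"CHAR({picture.count('X')})"
    integer_part, _, fraction = picture.lstrip("S").partition("V")
    return column, f"DECIMAL({len(integer_part) + len(fraction)},{len(fraction)})"


# Hypothetical copybook used only for this example.
COPYBOOK = """\
       01  CUSTOMER-RECORD.
           05  CUST-ID      PIC 9(8).
           05  CUST-NAME    PIC X(30).
           05  BALANCE      PIC S9(7)V99 COMP-3.
"""

columns = [c for c in map(to_column, COPYBOOK.splitlines()) if c]
```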

Attunity Change Capture Agents


The Attunity Connect infrastructure is flexible and extensible, providing the ability to plug in new change
capture agents as they are developed. The following paragraphs describe the change capture agents
currently available from Attunity, and those that will be available shortly:

VSAM-CICS CDC (Mainframe)


The Attunity VSAM-CICS change capture agent monitors a CICS logstream for changes in VSAM tables.
Configuring the agent is simple and done using a Wizard in the Attunity Studio. It requires the user to specify
the name of the CICS logstream and the names of the tables from which changes are to be captured. The
VSAM-CICS CDC agent is generally available.

DB2 CDC (Mainframe)


The Attunity DB2-Journal change capture agent monitors the DB2 database journals on mainframes and
captures changes made to specific tables. Configuring the agent is simple; it is done using a Wizard in the
Attunity Studio that allows users to configure the DB2 journal details and the names of the tables for which
changes are to be captured. The DB2-Journal CDC agent is generally available.

DB400 CDC (iSeries)


The Attunity DB400-Journal change capture agent monitors the DB400 database journals on iSeries
systems and captures changes made to specific tables. Configuring the agent is simple; it is done using a
Wizard in the Attunity Studio that allows users to configure the DB400 journal details and the names of the
tables for which changes are to be captured. The DB400-Journal CDC agent is generally available.

Query-based CDC (all platforms, all data sources)


The Attunity Query-based change capture agent enables the capture of changes to any data source
supported by Attunity Connect, including any RDBMS. The agent queries custom change tables, which can
easily be implemented using database triggers. The Query-based CDC agent is generally available today.
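
A trigger-maintained change table of the kind the Query-based agent could poll might look like the following sketch. It uses SQLite through Python's standard sqlite3 module purely for illustration; the table, column, and trigger names are hypothetical, and trigger syntax differs between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Hypothetical change table that a query-based agent would read.
CREATE TABLE customers_changes (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT NOT NULL,
    id   INTEGER NOT NULL,
    name TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('INSERT', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('UPDATE', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('DELETE', OLD.id, NULL);
END;
""")

# Ordinary application activity; the triggers record each change as a side effect.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

# What a polling agent would see, in capture order.
changes = conn.execute(
    "SELECT op, id, name FROM customers_changes ORDER BY seq"
).fetchall()
```

The trade-off of this pattern is that the triggers add work to every write on the source table, which is why journal-based agents are preferred where a journal exists.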

Additional change capture agents coming shortly:


The following change capture agents are currently in development and will be available later this year:

VSAM-Batch (Mainframe)

IMS/DB CDC (Mainframe)

Adabas (Mainframe, Unix)

Oracle (Unix, Windows)

SQL Server (Windows)

Using Attunity and BI/ETL Tools


Attunity Connect works seamlessly with, and has been deployed successfully with, the following ETL tools:

Informatica PowerCenter

Ascential DataStage

Business Objects Data Integrator

Cognos DecisionStream

Hummingbird Genio

Attunity ODBC Clients enable immediate interoperability with all of these tools to support metadata browsing,
data extraction and change data capture. By employing the batch-oriented CDC scenario described earlier, it
is easy to set up ETL processes that retrieve changes every hour or every few minutes.
In this case, implementing a CDC process is similar to implementing a traditional bulk ETL process. To set
it up, Attunity Connect is configured to recognize the change queue as a data source, and users of the ETL
tool can simply extract the records from the change queue, referring to it as the source data store. Each time
the process is run, a batch of changed records is returned and processed by the ETL tool. Attunity
Connect keeps track of the last changed record read, and the next time the ETL tool reads the change
queue (e.g., select * from myChanges), it receives the next batch of changes that occurred since the last
request.
In addition, Attunity takes care of normalizing non-relational data by virtually mapping it to a relational data
model. This facilitates the processing of non-relational data in ETL tools and makes it easier to transform and
load this data into a relational database as the target data store.
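
The idea of mapping non-relational data to a relational model can be illustrated with a record that contains a repeating group. The sketch below is a generic example, not Attunity's mapping logic: the record layout, the field names, and the normalize function are assumptions made for illustration.

```python
def normalize(record, key_field):
    """Split one hierarchical record into a parent row and child rows."""
    # Scalar fields stay on the parent row.
    parent = {k: v for k, v in record.items() if not isinstance(v, list)}
    children = []
    for field_name, items in record.items():
        if isinstance(items, list):
            for position, item in enumerate(items, start=1):
                # Each repeating-group entry becomes its own row, linked to the
                # parent by its key and numbered by position within the group.
                children.append(
                    {"table": field_name, key_field: record[key_field],
                     "row_no": position, **item}
                )
    return parent, children


# Hypothetical record with a repeating group of claims.
record = {
    "policy_id": "P-1001",
    "holder": "J. Smith",
    "claims": [
        {"claim_date": "2004-01-15", "amount": 120.0},
        {"claim_date": "2004-03-02", "amount": 75.5},
    ],
}

parent, children = normalize(record, "policy_id")
```

The parent row and the child rows can then be loaded into two related tables in the target database, which is the shape most ETL and BI tools expect.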
Real-time change data capture is supported by integrating an Attunity Event Router into any of these tools.
Attunity is planning to add support for MQSeries by the end of June 2004, which will facilitate interoperability
with ETL tools that support this popular messaging middleware.
Furthermore, Attunity products enable direct access to operational data stores, as well as federated access
to historical (DW) and operational data. These capabilities complement the BI offerings provided by the
vendors mentioned above, further increasing the value of Attunity Connect for the end user.

Case Study - Real-time VSAM CDC at State Healthcare


In 2003, a US state's healthcare agency (name withheld) deployed Attunity Connect to provide real-time, bidirectional synchronization of VSAM and SQL Server data sources. The solution required near-zero latency
and thus real-time change data capture. Deployed in less than six months, the system is now in
production.

The Challenge
The State agency chose to host its HIPAA-compliant solution on a Windows platform, using SQL Server and
BizTalk Server. This presented an immediate challenge in that the State's existing healthcare systems run
on a mainframe and use VSAM tables as their operational data store. To guarantee success, both systems
have to remain in sync to ensure data integrity.

The Solution
The State agency implemented an end-to-end solution based on BizTalk Server and Attunity Connect's
Change Data Capture modules. The result is a new system that:

Captures VSAM data changes in CICS online mode


On the mainframe, CICS programs make changes to VSAM tables and update a log/journal in online
mode. Attunity Connect's Query-based Change Capture Agent identifies and captures the changes in
the log, prepares them for delivery, and puts them into an Attunity Connect event queue.

Captures VSAM data changes in batch mode


On the mainframe, COBOL programs make changes to VSAM tables without updating logs or journals.
To capture these changes, enhanced COBOL programs make the necessary calls to activate Attunity
Connect's API-based change capture agents. The agents prepare the changed data for delivery into an
Attunity Connect event queue.

Delivers VSAM changes to SQL Server


In this solution, the Attunity Event Router for BizTalk Server acts as the change listener that retrieves
changes from the Attunity Connect event queue and delivers them to BizTalk Server for processing. The
system leverages the powerful mapping capabilities of BizTalk Server, restructuring mainframe data for
the target relational tables, and Attunity Connect's XML Database Adapter formats the data for SQL
Server.

Captures SQL Server data changes


On SQL Server, database triggers capture data changes and send them to Microsoft BizTalk Server via
Attunity Connect. BizTalk then applies transformations and uses the Attunity Connect event adapter to
prepare changes for delivery.

Delivers SQL Server changes to VSAM tables


Attunity Connect event routers on the mainframe pick up change events and update the VSAM tables
using insert, update, or delete functions.

This diagram provides a high-level solution architecture overview:

High Performance and Throughput


The new solution accommodates huge throughput, especially in the batch mode. By fine-tuning the change
data capture processes using flexible settings, the solution today handles millions of change-messages daily,
transporting gigabytes of data.

About Attunity
Attunity is a leading provider of connectivity solutions for enterprise data and legacy applications. Founded
in 1987 and traded on the NASDAQ exchange, Attunity's worldwide operations support over 1,000 direct end
users including many of the Fortune 1000. Through distribution and OEM agreements with global-class
partners such as Oracle and HP, Attunity-based solutions are deployed on tens of thousands of systems
worldwide.
The Attunity Connect product family provides standards-based access to over 30 data sources on 20
different computing platforms. Attunity Connect engines reside natively on each target platform and provide
enterprise-class integration capabilities such as real-time read/write access, federated data access between
relational and non-relational data sources, bulk data extraction and change data capture.
Attunity's products are available through direct sales and support offices (listed below) as well as distributors
in Japan, S.E. Asia, Europe, and Latin America. For more information, visit www.attunity.com or email
info@attunity.com.
Corporate Headquarters / USA
Attunity Inc.
40 Audubon Road
Wakefield, MA 01880, USA
t +1 (781) 213-5200
f +1 (781) 213-5240
sales@attunity.com

United Kingdom
Attunity (UK) Ltd.
Unit 6
Beacontree Plaza
Reading
RG2 0BS
United Kingdom
t +44(0)118 975 3330
f +44(0)118 975 3005
info-uk@attunity.com

EMEA Regional Office


Attunity Ltd.
8 Hagalim Street
POB 12227
Herzliya 46733
Israel
t +972 9 960 2626
f +972 9 960 2601
info-emea@attunity.com

France
Attunity (France) S.A.
51, Blvd. Bessières
75017 Paris
France
t +33 1 53 06 80 80
f +33 1 53 06 80 89
info-france@attunity.com

Israel
Attunity (Israel) Ltd.
8 Hagalim Street
POB 12227
Herzliya 46733
Israel
t +972 9 960 2600
f +972 9 960 2601
info-il@attunity.com

People's Republic of China
Attunity (Hong Kong) Ltd.
Room 2406, New York Life Tower
Windsor House, 311 Gloucester Road
Causeway Bay
Hong Kong
t +(852) 2756 9233
f +(852) 2707 0622
info-hk@attunity.com

Attunity (PRC) Ltd.
8E, Tseng Chow Commercial Mansion
1590 Yan'an Road West
Shanghai 200052
People's Republic of China
t +86-21-62809691, 62809692
f +86-21-62806762
info@attunity.com.cn

Australia
Attunity Pty Limited
Suite 8A, 5 Railway Parade
Hurstville NSW 2220
Australia
t +61 2 9580-9880
f +61 2 9580-4898
info-australia@attunity.com
