
Ensuring Business Availability of Information with Dual Load Solution


Introducing Informatica Dual Load Solution for Teradata

WHITE PAPER
This document contains Confidential, Proprietary and Trade Secret Information (“Confidential Information”) of
Informatica Corporation and may not be copied, distributed, duplicated, or otherwise reproduced in any manner
without the prior written consent of Informatica.

While every attempt has been made to ensure that the information in this document is accurate and complete, some
typographical errors or technical inaccuracies may exist. Informatica does not accept responsibility for any kind of
loss resulting from the use of information contained in this document. The information contained in this document is
subject to change without notice.

The incorporation of the product attributes discussed in these materials into any release or upgrade of any
Informatica software product—as well as the timing of any such release or upgrade—is at the sole discretion of
Informatica.

Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670; 6,339,775; 6,044,374;
6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following pending U.S. Patents: 09/644,280;
10/966,046; 10/727,700.

This edition published September 2010



Table of Contents
Executive Summary
Business Availability of Information at Premium
Teradata Dual Active Environment
Considerations for Dual Load Solutions
Informatica Dual Load Solution for Teradata
Customer Use Cases
Case 1: Dual Active/Disaster Recovery for Customer Centricity
Case 2: Disaster Recovery as Part of Enterprise Risk and Business Continuity Management
Summary of Benefits
Conclusion



Executive Summary
Businesses are increasingly dependent on information technology to run their operations
and compete effectively in the marketplace. The information management discipline that involves
the delivery of data from source to target and to end users is no exception. As a result, data
availability and recovery service-level requirements have become more stringent in the past few
years, and many organizations must develop a comprehensive, tier-based strategy for information
availability and protection in their environment.
One of the major impediments in data delivery has been the lack of data warehousing and data
integration systems that ensure end-to-end availability of business-critical information across
multiple systems. In response, Informatica, in collaboration with Teradata, pioneered a solution that
extracts and transforms data once and simultaneously loads it into multiple data warehousing
systems, ensuring consistency, recovery, and restartability. The solution smooths peak data
processing for increased throughput and lower risk of outage while ensuring data protection and
recovery in the event of an outage.
This white paper illustrates how an organization can load-balance mission-critical information,
drive higher efficiency in the technology infrastructure, reduce risks of data loss during outages,
and better prepare for disaster recovery. It also describes how a product-based, dual load solution
can help an IT department demonstrate the higher operational productivity, resiliency, and agility
of the data integration and data warehousing environment. This fortified IT underpinning—delivering
business-critical data in the most reliable, resilient, and secure fashion—equips an organization
with minute-by-minute decision-making abilities, cross-business unit visibility, and transparency
for regulatory compliance.


Business Availability of Information at Premium

“Optimization must occur at all levels, not just at query performance layer. The staging area
and detailed storage layer support the extensibility and flexibility requirements. As more data is
added to the warehouse over time, either as deeper tables with more rows or wider tables with
more columns, loading becomes a performance issue.”
Mark Beyer
Research Vice President, Gartner
“Data Warehouse Architecture Best Practices and Guiding Principles,” November 6, 2009

Availability of business-critical information is increasingly vital to organizational performance,
and availability and recovery service-level requirements have become more stringent in recent
years as a result. Enterprises are revisiting their strategy and architecture to meet data availability
and recovery requirements, including an assessment of how information supports business
processes today and how it can improve those processes in the future. They are also evaluating
which technologies will be critical for ensuring the delivery of trustworthy, actionable, and
authoritative data as part of the IT backbone.
Consequently, business intelligence (BI) and data warehousing (DW) teams are under pressure to
deliver mission-critical data warehouses more reliably, quickly, and cost-effectively at lower risk.
They face the following tasks:
• Lower the cost and risk of BI/DW implementations by avoiding outages and containing the
costs of downtime, including recovery tasks
• Explicitly demonstrate how BI/DW systems are meeting or exceeding the service-level
agreements (SLAs) committed to the business, especially for mission-critical operations
• Ensure strategic use of analytical and operational data from BI/DW systems across all
applications
• Work closely with enterprise architects, project sponsors, and IT operations/risk groups to select
the right solutions and implement them successfully
However, in developing a business continuity and disaster recovery plan, an organization
could face the following obstacles:
Inability to achieve continuous availability
• Service disruptions longer than seconds or a few minutes
• Inability to keep systems operational during planned downtime
• Lack of transparency on what happened during downtime

Limited recoverability and risk protection
• Delay in restoring systems or data after an outage
• Failure to contain localized failures and geographic disasters
• Questionable data protection

Lack of a solution for loading data into multiple target systems, delaying time to market
• Inability to keep systems “live” and contributing to workload execution
• Inability to smooth peak processing
• Days or weeks spent moving data between production and QA systems



In many cases, to overcome these barriers, organizations use a tiered architecture at the
hardware level without the ability to load data simultaneously into multiple target systems, or
they deploy a hand-coded approach to scheduling multiple data loads that lacks the requisite
data resiliency and protection.
There’s a better way. Consider a scenario with data centers in two cities, say Los Angeles,
California, and Jersey City, New Jersey, where the primary target system in Los Angeles and the
secondary target system in Jersey City are deployed with a highly reliable data loading solution.
Like many large enterprises, this company has dozens of major offices and branches across the
country, so business users’ queries can be routed to the primary or secondary system based on
their location to optimize response time, while data synchronization continually occurs between
the systems in Los Angeles and Jersey City. If the primary system gets too busy, a query
optimization system can redirect a query from the primary to the secondary system so that the
user does not experience a significant delay in response time. Meanwhile, data from the source
systems is loaded into both the primary and secondary target systems so that both stay up to
date. In this typical active-active environment, applications are executed against a dual active
pair, and data is synchronized between the two systems.
Now imagine a prolonged system outage in Los Angeles, where the primary system is located. Up
to the point of failure, IT has been accessing, transforming, and preserving the most up-to-date,
business-relevant data in a data integration repository environment. As a result, as soon as an
outage occurs, a query previously directed at the primary system in Los Angeles can be
immediately redirected to the secondary system in Jersey City for business continuity. This
approach helps an organization perform recovery tasks in a reliable, secure manner because
checkpoints and monitoring information are available in addition to the inherent data security of
the enterprise data integration platform. This disaster recovery scenario uses an active-passive
configuration, in which data is updated in tandem but applications run against the primary
system only.
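The routing behavior in the two scenarios above (nearest system first, overflow when busy, failover on outage) can be sketched in Python. This is a minimal illustration under stated assumptions, not the logic or API of Teradata Query Director: `TargetSystem`, `route_query`, the `nearest` site-to-system map, the `load` utilization metric, and the 0.8 busy threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetSystem:
    name: str
    available: bool = True
    load: float = 0.0  # hypothetical utilization metric: 0.0 (idle) to 1.0 (saturated)


def route_query(user_site, systems, nearest, mode="active-active", busy_threshold=0.8):
    """Return the name of the target system that should answer a user's query.

    systems: {"primary": TargetSystem, "secondary": TargetSystem}
    nearest: hypothetical map from a user's site to "primary" or "secondary"
    """
    primary, secondary = systems["primary"], systems["secondary"]
    if mode == "active-passive":
        # Disaster recovery: applications use the primary only; the secondary
        # takes over when the primary is unavailable.
        candidates = [primary, secondary]
    elif mode == "active-active":
        # Dual active: prefer the system closest to the user ...
        first = systems[nearest[user_site]]
        other = secondary if first is primary else primary
        # ... unless it is too busy and the other system is less loaded.
        if first.load > busy_threshold and other.available and other.load < first.load:
            first, other = other, first
        candidates = [first, other]
    else:
        raise ValueError(f"unknown mode: {mode}")
    for system in candidates:
        if system.available:  # fail over to the next candidate on outage
            return system.name
    raise RuntimeError("no target system is available")


systems = {"primary": TargetSystem("Los Angeles"), "secondary": TargetSystem("Jersey City")}
nearest = {"San Francisco": "primary", "New York": "secondary"}
print(route_query("New York", systems, nearest))  # Jersey City
systems["primary"].available = False  # simulate the Los Angeles outage
print(route_query("San Francisco", systems, nearest, mode="active-passive"))  # Jersey City
```

The same function covers both configurations because they differ only in which system is tried first; availability is always checked last, so an outage falls through to the surviving system.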


Teradata Dual Active Environment


Let us further examine a typical environment where a Teradata dual active system can be deployed
as shown in Figure 1.

Figure 1. Teradata Dual Active Architecture. Solution: reliable, simultaneous dual load of bulk data.
Teradata System A (Los Angeles) and Teradata System B (Jersey City) are synchronized by table
copy, replication, and dual load (ETL); Teradata Multi-System Manager (TMSM) provides
monitoring, administration, and operational control; Teradata Query Director (IP routing) routes
queries from portals/applications, business intelligence, Teradata Relationship Manager, and
Teradata Demand Chain Manager used at the San Francisco branch and New York headquarters.

First, at the left, you can see how three data synchronization methods (table copy, replication, and
dual load) help ensure that the primary and secondary systems are current and synchronized.
The dual load solution, highlighted in blue, uniquely enables extract-transform-load (ETL)
processing of bulk data into multiple Teradata target systems at any data volume with low
latency. This dual load technique coexists with other data synchronization techniques such as
table copy and data replication. Table copy helps move, archive, and restore data within tables
across Teradata systems in medium-to-long latency scenarios. Data replication captures changes
in the primary database and applies them from the primary system to the secondary system in
low-volume scenarios. The dual load solution also communicates with Teradata Multi-System
Manager (TMSM) about the status of data loading jobs. TMSM serves as a centralized management
system that performs monitoring, administration, and operational control across Teradata systems.
Teradata Query Director uses IP routing to direct queries that come from users in multiple
locations through multiple applications, including BI systems, Teradata Relationship Manager, and
Teradata Demand Chain Manager.
If users are located closer to the primary system than to the secondary system, the query can be
routed to the primary system by the Teradata Query Director. If another user is in the vicinity of the
secondary system, the query can be routed to the secondary system to ensure optimal speed.
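The three synchronization methods described above differ mainly in the data volume and latency they target. The Python sketch below turns those stated characteristics into a rough rule of thumb; `SyncMethod` and `choose_sync_method` are hypothetical names, and treating replication as suitable for low-latency needs at low volume is an assumption, since the text only says it handles low-volume changes.

```python
from enum import Enum


class SyncMethod(Enum):
    TABLE_COPY = "table copy"
    REPLICATION = "replication"
    DUAL_LOAD = "dual load"


def choose_sync_method(bulk_volume: bool, low_latency: bool) -> SyncMethod:
    """Rough rule of thumb derived from the characteristics described in the text."""
    if not low_latency:
        # Medium-to-long latency is acceptable: move, archive, or restore table data.
        return SyncMethod.TABLE_COPY
    if bulk_volume:
        # Bulk ETL output that must reach both systems quickly, at any data volume.
        return SyncMethod.DUAL_LOAD
    # Low-volume changes captured on the primary and applied to the secondary
    # (assumes replication keeps up with low-latency needs at low volume).
    return SyncMethod.REPLICATION


for bulk, fast in [(True, True), (False, True), (True, False)]:
    print(bulk, fast, choose_sync_method(bulk, fast).value)
```

Because the methods coexist, a real environment would typically apply different methods to different tables rather than pick one globally.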



If the primary system becomes unavailable, Teradata Query Director can reroute the query from
the primary to the secondary system so that the user can continue with routine business
activities that require fresh, reliable information. The dual load solution continues to load data
into the secondary server and saves the staging file for the unavailable server while sending alerts
and events to a management system. This means that users can continue to get the freshest data
from the source to the target system, while the IT team can efficiently manage recovery using the
workflow tasks and other essential information about monitoring the environment.
The dual load approach has unique advantages. From the architectural standpoint, it can:
• Ensure that infrastructure components meet business SLAs, including data availability,
recoverability, and protection requirements as part of the tiered approach
• Serve as a technology foundation that delivers IT services supporting business processes and
their associated service levels, including the architecture of business systems
• Promote standardization and reusability while rationalizing the introduction of new subsystems
From the business standpoint, it can:
• Tie into enterprise risk management and governance initiatives, including business continuity
and disaster recovery management as part of a layered strategy
• Lower the total cost of ownership of the data warehousing infrastructure while mitigating risks
to business operations
• Increase IT subsystem performance and thus positively impact business operations, for example
by mitigating revenue loss and containing costs and other risks
• Future-proof the data warehousing environment by actively balancing the loads, thereby making
it more reliable, resilient, and robust to handle mission-critical business intelligence and other
applications

“Informatica PowerCenter Dual Load Option for Teradata is the result of significant joint
collaboration between our two companies. It provides joint customers with reliable, simultaneous
loading of data into multiple Teradata systems. This collaboration also enables customers to
significantly enhance disaster recovery capabilities, increase efficiency in resource utilization and
reduce overall operating costs.”
Stephen Brobst
Chief Technology Officer, Teradata


Considerations for Dual Load Solutions


For data loading processes, many organizations still hand code individual processes instead of
using a platform-based data integration solution. The scripting approach does not offer a single,
unified environment where you can access, integrate, and deliver data: any data, anywhere, at
any time. This one-off, hand-coded approach does not scale because of its high administrative and
operational overhead, and it offers weaker protection against outages and less support for other
recovery objectives. Furthermore, coding individual processes is expensive, resource intensive,
time consuming, and prone to errors.
To address these inherent deficiencies, a dual load solution must be based on a proven, robust
platform that enables IT teams to reuse data extraction, transformation, and loading processes
end to end. This ability to reuse the existing environment and processes also helps bring down the
overall total cost of ownership of the solution in three ways:
1. Universal data access capabilities reduce the cost of accessing data, regardless of format.
2. High performance, continuous data integration capabilities lower the costs of managing a large,
distributed Tier 1 environment.
3. The dual load solution integrated with the TMSM helps reduce the cost of monitoring,
administration, and operational control of multiple Teradata data warehouses.
Furthermore, a dual load solution based on a unified and comprehensive platform is more
scalable and secure than hand coding, and it can meet future technical and business
requirements more easily. A dual load solution based on an enterprise data integration platform is
uniquely suited to handle the largest volumes of mission-critical data while ensuring that data is
protected and recoverable from source to target. In evaluating a dual load solution, an
organization should consider the following questions:
Does the solution increase efficiency in data loading?
• Perform reliable, simultaneous bulk data loading
• Handle low data latency between multiple systems
• Require no additional CPU on primary systems

Can it minimize development, administration, and monitoring overhead?
• Promote reusability of data integration workflows across multiple systems
• Monitor data loading state across the dual active environment
• Require minimal dependencies between systems for switchover, failover, and restarts

How does it mitigate risk as part of the business continuity and disaster recovery mandate?
• Continue data loading even if one system is unavailable
• Ensure control and transparency over recovery state and restartability
• Demonstrate data loss protection and security from source through loading



Informatica Dual Load Solution for Teradata
To address these critical dual loading requirements, Informatica is introducing the Informatica®
Dual Load Solution for Teradata. This solution extracts and transforms the data once and
simultaneously loads it into multiple Teradata systems (dual active or disaster recovery), ensuring
consistency, recovery, and restartability.
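The "extract and transform once, load many" pattern can be illustrated with a short Python sketch. It is not Informatica's implementation: `extract`, `transform`, `make_loader`, and `dual_load` are hypothetical, and in-memory lists stand in for the Teradata target systems. The point it shows is that transformation runs a single time and the same output is loaded into every target concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def extract():
    # Stand-in for reading source data (hypothetical rows).
    return [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "3.25"}]


def transform(rows):
    # Transformation runs once; its output is shared by every target load.
    return [{"id": r["id"], "amount_cents": round(float(r["amount"]) * 100)} for r in rows]


def make_loader(table):
    """Return a loader that appends rows to an in-memory 'table' (a Teradata stand-in)."""
    def load(rows):
        table.extend(rows)
        return len(rows)
    return load


def dual_load(rows, loaders):
    """Load the same transformed rows into every target at the same time."""
    with ThreadPoolExecutor(max_workers=len(loaders)) as pool:
        futures = {name: pool.submit(load, rows) for name, load in loaders.items()}
        return {name: future.result() for name, future in futures.items()}


system_a, system_b = [], []
rows = transform(extract())  # extract and transform once
counts = dual_load(rows, {"System A": make_loader(system_a), "System B": make_loader(system_b)})
print(counts)  # {'System A': 2, 'System B': 2}
```

Running transformation once is what keeps the targets consistent: both systems receive identical rows, and the transformation cost is not paid twice.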
Informatica’s Dual Load Solution for Teradata ensures peak application performance, prevents
service disruptions, and facilitates disaster planning with superior data protection and recovery.
The solution leverages the Informatica Platform as follows:
• Accesses, integrates, and loads the freshest business-critical data from all systems using
Informatica’s comprehensive, open, and unified platform
• Ensures your mission-critical enterprise data warehouse enjoys high availability, full failover, and
reliable data recovery
• Provides the business with minute-by-minute decision-making abilities, cross-business unit
visibility, and transparency for regulatory compliance
• Increases the efficiency and cost-effectiveness of data warehousing by automating processes,
improving business-IT collaboration, and incorporating proven best practices
Figure 2 illustrates the high-level solution architecture for Informatica Dual Load Solution for
Teradata.

Figure 2. High-Level Solution Architecture: Informatica Dual Load Solution for Teradata


How does it work? First, Informatica PowerExchange® Adapters secure and sustain direct
connectivity to any data in source systems, whether relational data, mainframe data, packaged
applications, data in the cloud, or unstructured and semi-structured data. The data is then
extracted into the Informatica Platform where PowerCenter® Advanced Edition™ with its Metadata
Manager performs lineage analysis and creates a metadata catalog to enhance transparency and
collaboration between IT and the business. The PowerCenter High Availability Option™ configures
multiple backup services and minimizes service disruptions across the entire platform to provide
resiliency, restartability, failover, and recoverability. Within the PowerCenter environment, binary
staging files are maintained and saved for pushing the data into the Informatica Dual Load
Solution environment. The Informatica Dual Load Solution consists of the following capabilities:
Dual Load Staging Adapter
• Defines dual load connections as extensions to the Teradata Parallel Transporter (TPT) adapter
component of PowerExchange for Teradata
• Defines run-time session-level parameters specifying connectivity attributes to external systems
• Makes all data repeatable and staged into binary files for recovery and restartability
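Staging data in binary files is what makes a load repeatable: after a failure, loading can resume from the last checkpoint instead of starting over. The Python sketch below shows the idea with a fixed-width record layout. The layout, `RECORD`, `write_staging_file`, and `read_staging_file` are hypothetical and do not reflect the actual format of PowerCenter or TPT staging files.

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record layout: (row id, amount in cents) as little-endian 64-bit ints.
RECORD = struct.Struct("<qq")


def write_staging_file(path, rows):
    """Stage transformed rows in a binary file so a load can be repeated later."""
    with open(path, "wb") as f:
        for row_id, cents in rows:
            f.write(RECORD.pack(row_id, cents))


def read_staging_file(path, start_row=0):
    """Yield staged rows, skipping the start_row rows already loaded before the checkpoint."""
    with open(path, "rb") as f:
        f.seek(start_row * RECORD.size)  # fixed-width records make restart a simple seek
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:  # end of file (or a truncated final record)
                break
            yield RECORD.unpack(chunk)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stage_0001.bin")
    write_staging_file(path, [(1, 1050), (2, 325), (3, 99)])
    print(list(read_staging_file(path, start_row=2)))  # [(3, 99)]
```

Fixed-width records keep the restart logic trivial: the byte offset of any checkpoint is the row count times the record size.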

Dual Load Utility and Template Workflow


• Loads data into dual or multiple Teradata systems simultaneously via PowerExchange for
Teradata with TPT API
• Supports dual loading with no need to change mapping logic
• Orchestrates loading and recovery tasks, including continuous data loading to secondary
Teradata system if primary fails
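The orchestration described above (keep loading the targets that are up, hold the staging file for a target that is down, and replay it once that target returns) can be sketched in Python. This is an illustration under stated assumptions, not the Dual Load Utility or the TPT API: `load_all`, `recover`, `TargetUnavailable`, and the in-memory `FakeTarget` are hypothetical, staging files are represented by their names, and targets are loaded one after another for readability.

```python
from collections import deque


class TargetUnavailable(Exception):
    """Raised by a loader when its target system cannot be reached."""


class FakeTarget:
    """In-memory stand-in for a Teradata system, for illustration only."""

    def __init__(self, up=True):
        self.up = up
        self.loaded = []

    def load(self, staging_file):
        if not self.up:
            raise TargetUnavailable(staging_file)
        self.loaded.append(staging_file)


def load_all(staging_file, loaders, pending):
    """Load one staging file into every target; queue it for any target that is down.

    pending maps a target name to a deque of staging files awaiting recovery.
    """
    status = {}
    for name, load in loaders.items():
        if pending.get(name):
            # Older files are still waiting for this target: queue behind them to keep order.
            pending[name].append(staging_file)
            status[name] = "queued"
            continue
        try:
            load(staging_file)
            status[name] = "loaded"
        except TargetUnavailable:
            # Keep loading the remaining targets; remember this file for recovery.
            pending.setdefault(name, deque()).append(staging_file)
            status[name] = "queued"
    return status


def recover(name, load, pending):
    """Replay queued staging files for one target, oldest first, once it is back."""
    queue = pending.get(name, deque())
    replayed = 0
    while queue:
        load(queue[0])  # if this raises, the file stays at the head of the queue
        queue.popleft()
        replayed += 1
    return replayed


primary, secondary = FakeTarget(), FakeTarget(up=False)
pending = {}
print(load_all("stage_0001.bin", {"A": primary.load, "B": secondary.load}, pending))
```

Queuing new files behind older pending ones, rather than loading them as soon as the target returns, keeps each target's load order identical to the staging order.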
Teradata Multi-System Manager (TMSM) Event Reporter
• Integrates with Teradata Multi-System Manager (TMSM)
• Reports on staging and loading status, including number of rows sent and checkpoints taken
• Provides detailed updates, including the following:
• Staging completed
• Loading to server completed successfully
• Loading to server failed
• Staging file queued for the server
• Recovery for loading to the server completed successfully
• Recovery for loading to the server failed
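To make the reporting model concrete, here is a small Python sketch of a status event carrying the details listed above (event type, server, rows sent, checkpoint). It does not use the TMSM API, which this paper does not describe; `LoadEvent`, `StatusEvent`, and `EventReporter` are hypothetical names, and a real reporter would forward each event to TMSM rather than keep it in a list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LoadEvent(Enum):
    STAGING_COMPLETED = "Staging completed"
    LOAD_SUCCEEDED = "Loading to server completed successfully"
    LOAD_FAILED = "Loading to server failed"
    STAGING_QUEUED = "Staging file queued for the server"
    RECOVERY_SUCCEEDED = "Recovery for loading to the server completed successfully"
    RECOVERY_FAILED = "Recovery for loading to the server failed"


@dataclass
class StatusEvent:
    event: LoadEvent
    server: Optional[str] = None  # None for events not tied to one server (e.g. staging)
    rows_sent: int = 0
    checkpoint: Optional[int] = None  # row count committed at the last checkpoint

    def message(self) -> str:
        where = f" [{self.server}]" if self.server else ""
        return f"{self.event.value}{where}: rows_sent={self.rows_sent}, checkpoint={self.checkpoint}"


@dataclass
class EventReporter:
    """Collects status events; a real reporter would forward them to a monitoring system."""
    events: List[StatusEvent] = field(default_factory=list)

    def report(self, event: LoadEvent, **details) -> str:
        status = StatusEvent(event, **details)
        self.events.append(status)
        return status.message()


reporter = EventReporter()
print(reporter.report(LoadEvent.LOAD_SUCCEEDED, server="System A", rows_sent=500, checkpoint=500))
print(reporter.report(LoadEvent.STAGING_QUEUED, server="System B"))
```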

Together, these capabilities help IT organizations quickly design, test, and populate data
warehouses to meet stringent SLAs and other business demands. Based on the Informatica data
integration platform, the dual load solution also uniquely empowers IT to extend current data
warehousing projects and modernize the environment with highly reliable, secure, and
resilient data integration processing.



Customer Use Cases
Case 1: Dual Active/Disaster Recovery for Customer Centricity
As with many corporations, this financial services leader has been examining what kinds of
experiences its customers are having both online and at branches, to make operations more
profitable and mitigate risks during the recent financial turmoil. This corporate-wide effort
led to higher usage of business intelligence and other reports on customer activities and
related investment information. That, in turn, accelerated data volume growth and created
the need for continuous availability to support brokers and analysts so they could perform
their jobs effectively. To support this effort, the company decided to modernize its enterprise
data warehousing environment and better prepare for future growth. With the stakes raised for
mission-critical data, it began an intense data infrastructure design effort to integrate the many
views of customers and activity across two major data centers. A fault-tolerant design and a layered
approach to disaster recovery were also corporate mandates. Within IT, it was also crucial for the
data warehousing team to accelerate the process of development, quality assurance, testing, and
production.
The company is planning to address both dual active and disaster recovery scenarios, load-
balancing data between two systems and creating a dedicated disaster recovery environment.
In addition, it designed an environment to take advantage of the dual loading processes for
development and test environments so that it can reduce time moving to production from weeks
to days. The enterprise data warehouse is standardized on Informatica and Teradata; this dual
active implementation with Informatica’s dual load solution is part of its continuing effort to
extend its environment and get more from what it already has. This new undertaking will deliver
mission-critical business information for greater customer profitability, increase data warehousing
development productivity, and contain risks and recovery costs with dedicated dual load
processing during planned or unplanned downtime.


Case 2: Disaster Recovery as Part of Enterprise Risk and Business Continuity Management
Located in an area prone to earthquakes and wildfires, a large media company set out to
establish a comprehensive enterprise risk and business continuity management plan. Business
dependency on information and the resulting costs of downtime were escalating. The monolithic
architecture made it difficult for the company to provide a tiered recovery strategy that could
contain the costs of recovery and match the quality of service to the criticality of information
services. The company set up two systems in different cities and found that traditional site-level
failover did not meet its risk and governance requirements. By designing and implementing
a more granular approach to information availability from its data warehouse, IT was able to
strengthen the architecture with a tier-based approach and thus ensure data
availability and protection in the event of disasters or other recovery scenarios.
Focusing on classifying the criticality of business information, the company developed a plan so
that the service levels of information delivery are proportional to business priority. This new
approach prompted it to re-examine how multiple applications and systems can feed source data
directly into its data infrastructure to reduce the costs and risks of managing multiple staging
jobs. By reducing redundant data manipulation and processing, the company
was better positioned to handle routine transactions at lower cost. Furthermore, the stronger
data infrastructure meant that the business could move some of the business processes that
were traditionally handled manually into automated processes with business rules based on
information such as availability of partner, content, and pricing. By injecting information into these
business processes in this fortified environment, the company saw an opportunity to increase
agility for operations and reduce the costs of support and maintenance for the processes that
were handled by one-off manual manipulations. As a result, the company is on track to better
prepare for a series of recovery scenarios, including geographic disasters and other outages, while
improving the quality and speed of the decision processes for business users supported by the
technology infrastructure.



Summary of Benefits
Informatica empowers businesses to tap the mission-critical information infrastructure end to
end for superior performance and continuous availability. The Informatica Dual Load Solution
for Teradata helps increase business availability of Tier 1 operations and mitigates loss due
to outages. As a result, with fresh, trustworthy data, the solution sustains upsell and cross-
sell activities without disruptions. It ensures a continuous stream of sales, distribution, and
partner data for efficient revenue generation. The solution is designed to reduce costs for data
infrastructure, downtime, and disaster recovery by balancing loads and smoothing peak data
processing to get more from the existing investment. It also helps cut down activities for planned
and unplanned downtime by extending the data integration infrastructure and minimizing change
management. Finally, it helps manage risk and governance with continuity and a recovery plan for
information assets. It can protect vital business processes intricately linked to up-to-date data.
The ability to prevent data loss and reconcile data quickly after outages helps meet recovery
objectives.

Conclusion
Increasing business availability of information is a key concern for many organizations seeking
to become data driven. They are re-examining their IT architecture to ensure that it supports a
layered strategy in which data availability, recoverability, and protection are fundamental tenets
of business continuity. In response, an increasing number
of IT departments are taking a multisystem approach to the data warehousing architecture
and adopting a more systematic, streamlined approach to access, integrate, and deliver the
freshest, most relevant data to the business. To support this tiered approach, Informatica,
along with Teradata, has introduced the only dual load solution in the market that relies on a
comprehensive, product-based data integration platform. The Informatica Dual Load Solution
for Teradata helps organizations access, integrate, and load the freshest, business-critical data
to construct and maintain the multiple Teradata data warehouses for business continuity and
disaster recovery. This pioneering dual load solution extracts and transforms the data once and
simultaneously loads it into multiple Teradata systems. With Informatica’s dual load solution, an IT
organization can minimize the costs, time, and risks associated with availability and protection of
information, empowering business to tap the mission-critical information infrastructure for superior
performance and continuous availability.


Learn More
Learn more about Informatica’s EDW solutions at http://www.informatica.com/solutions/
enterprise_data_warehouse. Visit us at http://www.informatica.com or call (800) 653-3871 to
learn more about Informatica and the entire Informatica Platform.

About Informatica
Informatica Corporation (NASDAQ: INFA) is the world’s number one independent provider of data
integration software. Organizations around the world gain a competitive advantage in today’s
global information economy with timely, relevant and trustworthy data for their top business
imperatives. More than 4,100 enterprises worldwide rely on Informatica to access, integrate and
trust their information assets held in the traditional enterprise, off premise and in the cloud.



Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA
phone: 650.385.5000 fax: 650.385.5500 toll-free in the US: 1.800.653.3871 www.informatica.com

© 2010 Informatica Corporation. All rights reserved. Printed in the U.S.A. Informatica, the Informatica logo, and The Data Integration Company are trademarks or registered trademarks of Informatica Corporation in the United States and in
jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. First Published: August 2010 7188 (09/20/2010)
