
Data Migration Strategies

What is Data Migration?


The term data migration is used in several contexts for data movement activities. Let's look at the definition: "Data migration is the process of transferring data between storage types, formats or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger/takeover)." (Source: Wikipedia)

This article addresses large-scale data migration projects, where data is moved from source (old) systems to target (new) systems on a one-time basis, usually as the result of an application or technology upgrade initiative. The business objective of a data migration project is to move the data set of interest from the source system to the target system while improving data quality and maintaining business continuity.

Data Migration Strategies


At any given time, according to industry analyst estimates, roughly two-thirds of the Fortune 1000/Global 2000 are engaged in some form of data conversion project, including migration from legacy systems to packaged applications, data consolidation, data quality improvement and the creation of data warehouses and data marts. These projects are driven by internal financial pressures, increasing competition, ongoing deregulation, industry consolidation due to mergers and acquisitions, and Year 2000 and European currency concerns. According to studies conducted by The Standish Group, a research advisory firm based in Dennis, Massachusetts, 15,000 of these projects with budgets of $3 million or greater will begin in 1999, at a total cost of $95 billion. AMR Research estimates that ERP conversion projects alone will generate $52 billion by 2002, and Gartner Group/Dataquest forecasts that $2.6 billion will be spent on data warehousing software in 1999, growing to $6.9 billion by 2002.

Project Failure as the Norm?

"Not only time, but company credibility, is regarded as a corporate asset. Running the risk of project default and/or cancellation is all too commonplace, and extensive investment in refining insufficient data facilities is costly and counter-productive. This approach alone can jeopardize a company's timetable and competitiveness." (Jim Johnson, Chairman, The Standish Group)

Unfortunately, most data migration projects don't go as smoothly as anticipated. According to The Standish Group, in 1998, 74 percent of all IT projects either overran or failed, resulting in almost $100 billion in unexpected costs. Of the 15,000 data migration projects starting in 1999, as many as 88 percent will either overrun or fail.

One of the primary reasons for this extraordinary failure rate is the lack of a thorough understanding of the source data early in these projects. Conventional approaches to data profiling and migration can create nearly as many problems as they resolve: data not loading properly, poor quality data and compounded inaccuracies, time and cost overruns and, in extreme cases, late-stage project cancellations. The old adage "garbage in, garbage out" is the watchword here.

Understanding Your Data: The First Essential Step

Before undertaking large-scale legacy-to-x application data migrations, data analysts need to learn as much as possible about the data they plan to move. If, for example, you have a variety of pegs that you'd like to fit into round holes, it's best to find out which ones are already round and which are square, triangular or rectangular. Considering the magnitude of an organization's data, the data analyst's task in obtaining this knowledge is overwhelming, to say the least.

IT organizations can begin their data analysis by implementing a two-step process: data profiling and mapping. Data profiling involves studying the source data thoroughly to understand its content, structure, quality and integrity. Once the data has been profiled, an accurate set of mapping specifications can be developed based on this profile, a process called data mapping. The combination of data profiling and mapping comprises the essential first step in any successful data migration project and should be completed prior to attempting to extract, scrub, transform and load the data into the target database.

Conventional Techniques in Data Migration: Problems and Pitfalls

The conventional approach to data profiling and mapping starts with a large team of people (data and business analysts, data administrators, database administrators, system designers, subject matter experts, etc.). These people meet in a series of joint application development (JAD) sessions and attempt to extract useful information about the content and structure of the legacy data sources by examining outdated documentation, COBOL copybooks, inaccurate metadata and, in some cases, the physical data itself. Typically, this is a very labor-intensive process, supplemented in some cases by semi-automated query techniques. Profiling legacy data in this way is extremely complex, time-intensive and error-prone. Once the process is complete, only a limited understanding of the source data is achieved.

At that point, according to the project flow chart, the data analyst moves on to the mapping phase. However, since the source data is so poorly understood and inferences about it are largely based on assumptions rather than facts, this phase typically results in an inaccurate data model and set of mapping specifications. Based on this information, the data is extracted, scrubbed, transformed and loaded into the new database. Not surprisingly, in almost all cases, the new system doesn't work correctly the first time. Then the rework process begins: redesigning, recoding, reloading and retesting. At best, the project incurs significant time and cost overruns. At worst, faced with runaway costs and no clear end in sight, senior management cancels the project, preferring to live with an inefficient but partially functional information system rather than incur the ongoing costs of an "endless" data migration project.
Strategies for Data Migration: 6 Steps to Preparing Your Data

Data profiling and mapping consist of six sequential steps, three for data profiling and three for data mapping, with each step building on the information produced in the previous steps. The resulting transformation maps, in turn, can be used in conjunction with third-party data migration tools to extract, scrub, transform and load the data from the old system to the new system.

Data sources are profiled in three dimensions: down columns (column profiling); across rows (dependency profiling); and across tables (redundancy profiling).

Column Profiling. Column profiling analyzes the values in each column or field of source data, inferring detailed characteristics for each column, including data type and size, range of values, frequency and distribution of values, cardinality, and null and uniqueness characteristics. This step allows analysts to detect and analyze data content quality problems and evaluate discrepancies between the inferred, true metadata and the documented metadata. (A minimal sketch of this step appears after these three profiling descriptions.)

Dependency Profiling. Dependency profiling analyzes data across rows, comparing values in every column with values in every other column, and infers all dependency relationships that exist between attributes within each table. This process cannot be accomplished manually. Dependency profiling identifies primary keys and whether or not expected dependencies (e.g., those imposed by a new application) are supported by the data. It also identifies "gray-area dependencies": those that are true most of the time, but not all of the time, and are usually an indication of a data quality problem.

Redundancy Profiling. Redundancy profiling compares data between tables of the same or different data sources, determining which columns contain overlapping or identical sets of values. It looks for repeating patterns among an organization's "islands of information": billing systems, sales force automation systems, post-sales support systems, etc. Redundancy profiling identifies attributes containing the same information but with different names (synonyms) and attributes that have the same name but different business meanings (homonyms). It also helps determine which columns are redundant and can be eliminated and which are necessary to connect information between tables. Redundancy profiling eliminates processing overhead and reduces the probability of error in the target database. As with dependency profiling, this process cannot be accomplished manually. (A combined sketch of dependency and redundancy checks also follows.)
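To make column profiling concrete, here is a minimal sketch in Python. The language, file name and column layout are illustrative assumptions, not part of the original article; a real profiling tool infers far more than this.

```python
import csv
from collections import Counter

def profile_columns(path, delimiter=","):
    """Infer basic per-column characteristics: inferred type,
    null count, cardinality, value range and most frequent values."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        values = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name in values:
                values[name].append(row[name])

    def inferred_type(non_null):
        # Try the narrowest type first; fall back to string.
        for cast, label in ((int, "integer"), (float, "decimal")):
            try:
                for v in non_null:
                    cast(v)
                return label
            except ValueError:
                continue
        return "string"

    profile = {}
    for name, vals in values.items():
        non_null = [v for v in vals if v not in ("", None)]
        profile[name] = {
            "inferred_type": inferred_type(non_null),
            "rows": len(vals),
            "nulls": len(vals) - len(non_null),
            "cardinality": len(set(non_null)),
            "unique": len(set(non_null)) == len(non_null),
            # Note: min/max compare values as text in this simple sketch.
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
            "top_values": Counter(non_null).most_common(3),
        }
    return profile

if __name__ == "__main__":
    # "customers.csv" is a hypothetical source extract.
    for col, stats in profile_columns("customers.csv").items():
        print(col, stats)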

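The article stresses that dependency and redundancy profiling cannot be done by hand, because they require comparing every column against every other column. A brute-force sketch of both checks, under the same illustrative assumptions as above (tables held in memory as lists of dicts), might look like this:

```python
from collections import Counter, defaultdict
from itertools import permutations

def dependency_profile(rows, threshold=1.0):
    """Check every ordered column pair (A, B) for the functional
    dependency A -> B. A strength of 1.0 is an exact dependency;
    values just below 1.0 are the article's 'gray-area' dependencies."""
    columns = list(rows[0].keys())
    findings = []
    for a, b in permutations(columns, 2):
        groups = defaultdict(Counter)
        for row in rows:
            groups[row[a]][row[b]] += 1
        # Rows consistent with A -> B: the most common B per value of A.
        agree = sum(c.most_common(1)[0][1] for c in groups.values())
        strength = agree / len(rows)
        if strength >= threshold:
            findings.append((a, b, strength))
    return findings

def redundancy_profile(table_a, table_b, min_overlap=0.8):
    """Compare value sets across two tables to flag likely synonyms:
    columns holding the same data under different names."""
    overlaps = []
    for col_a in table_a[0]:
        vals_a = {row[col_a] for row in table_a}
        for col_b in table_b[0]:
            vals_b = {row[col_b] for row in table_b}
            union = vals_a | vals_b
            jaccard = len(vals_a & vals_b) / len(union) if union else 0.0
            if jaccard >= min_overlap:
                overlaps.append((col_a, col_b, jaccard))
    return overlaps
```

Lowering threshold to, say, 0.95 surfaces the gray-area dependencies that usually signal a data quality problem worth investigating before mapping begins.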
Figure 1: Key steps in data profiling and mapping

Once the data profiling process is finished, the profile results can be used to complete the remaining three data mapping steps of a migration project: normalization, model enhancement and transformation mapping.

Normalization. By building a fully normalized relational model that is based on, and fully supported by, the consolidated profile of all the data, analysts can ensure the data model will not fail.

Model Enhancement. This process involves modifying the normalized model by adding structures to support new requirements or by adding indexes and de-normalizing the structures to enhance performance.

Transformation Mapping. Once the data model modifications are complete, a set of transformation maps can be created to show the relationships between columns in the source files and tables in the enhanced model, including attribute-to-attribute flows. Ideally, these transformation maps facilitate the capture of scrubbing and transformation requirements and provide essential information to the programmers creating conversion routines to move data from the source to the target database.
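One lightweight way to realize a transformation map is as data rather than prose: each entry records a source column, a target attribute and the scrubbing/transformation rule a conversion routine must apply. This is a sketch only; all table, column and rule names below are illustrative assumptions, not names from the article.

```python
# A transformation map as data: source column -> target attribute,
# plus the scrub/transform rule the conversion routine must apply.
TRANSFORMATION_MAP = [
    # (source table.column,  target table.column,  transform)
    ("LEGACY.CUST_NM", "customer.full_name", str.strip),
    ("LEGACY.CUST_PH", "customer.phone",     lambda v: v.replace("-", "")),
    ("LEGACY.ST_CD",   "customer.state",     str.upper),
]

def apply_map(source_row):
    """Produce target attribute values from one source record."""
    target = {}
    for src, dst, transform in TRANSFORMATION_MAP:
        column = src.split(".", 1)[1]
        target[dst] = transform(source_row[column])
    return target

# Example: a hypothetical legacy record.
print(apply_map({"CUST_NM": " Ada Lovelace ", "CUST_PH": "555-0100", "ST_CD": "ca"}))
```

Keeping the map as data means the same artifact can drive code generation, documentation and review sessions with subject matter experts.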
Developing an accurate profile of existing data sources is the essential first step in any successful data migration project. By executing a sound data profiling and mapping strategy, small, focused teams of technical and business users can quickly perform the highly complex tasks necessary to achieve a thorough understanding of source data, a level of understanding that simply cannot be achieved through conventional processes and semi-automated query techniques.

Better Decisions

By following these six steps in data profiling and mapping, companies can take their data migration projects to completion successfully the first time around, eliminating extensive design rework and late-stage project cancellations. A good data profiling and mapping methodology will take into account the entire scope of a project, even warning IT management if the business objectives of the project are not supported by the data. Data profiling and mapping, if done correctly, can dramatically lower project risk, enabling valuable resources to be redirected to other, more fruitful projects. Finally, it will deliver higher data and application quality, resulting in more informed business decisions, which typically translate into greater revenues and profits.

John B. Shepherd is vice president of Professional Services for Evoke Software, developer of Migration Architect, the industry's premier data profiling and mapping software. John has 13 years of experience managing point-of-sale and application development organizations in the transportation, telecommunications and healthcare industries. Shepherd can be reached in San Francisco at Evoke Software, (415) 512-0300 or via e-mail at jshepherd@evokesoft.com.


Proven Strategies
In this article, we discuss proven strategies for executing large-scale data migration projects. The list has been compiled over years of working on numerous large-scale, critical data migration projects. Each strategy can be adopted with some level of customization, per an individual organization's needs.

Strategy 1: Invest in Profiling Source Data

Source data is the starting point for any data migration effort. Understanding the characteristics of source data is paramount to the success of the project for several reasons: to uncover undocumented data relationships, data quality issues, data volumes, data anomalies, etc. Data profiling essentially provides x-ray vision into the source data sets, which helps the team understand their strengths and weaknesses. The investment made here has a direct impact on the effectiveness of downstream processes and software code components. It is also important to define the scope of the data profiling exercise up front to avoid overspending on this task. Basic data profiling can be performed by developing scripts, as in the sketch below; however, for highly complex and large data sets, an industrial-strength data profiling tool is worth the investment.
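As a sketch of the kind of basic profiling script the article mentions, the following assumes Python with pandas and a hypothetical extract file; a commercial profiling tool would go much deeper.

```python
import pandas as pd

# "source_extract.csv" is a hypothetical source extract; adjust the
# path and dtype handling to the real source system.
df = pd.read_csv("source_extract.csv")

# Quick x-ray of the source: volume, types, nulls, cardinality, ranges.
print("rows x columns:", df.shape)
print(df.dtypes)                    # inferred data types per column
print(df.isna().sum())              # null counts per column
print(df.nunique())                 # cardinality per column
print(df.describe(include="all"))   # ranges and frequent values
```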
Strategy 2: Create a Data Migration Process Model

Data migration can be as simple as a single step with just one source and one target system, or it can be a highly complex process involving multiple source systems, multiple steps and multiple target systems. Create an elaborate process model depicting every step of the migration process. This artifact serves as the road map for moving data, as well as an agreement among the stakeholders involved. The process model also serves as input to the downstream administration, configuration management and software development processes. The process model should include interim steps to validate the volume and quality of the data flowing through the process. With these embedded checkpoints, data analysts can make sure that exceptions are within accepted limits and that there are no hidden surprises (see the sketch after Strategy 6).

Strategy 3: Define Roles and Responsibilities Up Front

Data migration can be a complex and daunting project involving several stakeholders and IT task managers/team leaders. A formal handshake at every critical step of the data migration process is imperative. The architects and the project manager should identify all possible roles and assign responsibilities to those roles as part of project planning. The project manager should then formally assign these roles to all project staff members. By assigning roles and responsibilities up front, project leadership can ensure that the entire data migration life cycle is supported, with appropriate accountability established. Conduct a formal walkthrough of the roles and responsibilities document to get buy-in from all stakeholders and project staff members.

Strategy 4: Divide and Conquer

Just like any other large and complex task, data migration should be divided around logical groupings of data such as business area, geography, cost center, etc. The choice of grouping depends on the business context for the data migration task. It is recommended to choose the smallest data set (Hawaii) first and then move on to larger data sets (California). With this methodology, the team can learn and fine-tune the migration process early on with smaller data sets, thus minimizing risk. The migration of each data set can be treated like a release, which helps the team immensely with communication. Each release should be followed by a formal release evaluation step to document lessons learned and to educate the team on refinements for the next release.

Strategy 5: Invest in Technology/Tool Training

For large-scale migration projects, it is recommended to invest in a proven technology/tool for obvious reasons: automation, metadata collection, scheduling, error handling, etc. If the team is new to the technology/tool, then it is highly recommended to invest in formal training for the staff responsible for developing and executing the data migration code components. By investing in such training, project leadership can minimize the risk associated with the project's learning curve. During the training process, the team also gets the opportunity to establish a relationship with the vendor's technical support staff.

Strategy 6: Conduct Performance Testing

For large-scale data migration efforts, the size of the data sets being moved from source to destination data stores can be overwhelming. Due to business and/or operational requirements, most data migration projects have predefined and short time windows for moving data. Hence, it is imperative for the code components to have acceptable performance levels. It is highly recommended to fully test the code components for performance at production scale. The project staff should continue to tune the software and/or configuration parameters until the desired throughput has been achieved. The repetitive performance testing will also help staff get acquainted with the technology and the migration process.
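Strategies 2 and 6 both lend themselves to a small harness: embedded checkpoints that compare source and target row counts and exception rates, plus throughput measurement against the migration window. A minimal sketch follows; the thresholds and the extract/load routines are illustrative assumptions, not prescriptions from the article.

```python
import time

def run_step(name, extract, load, max_exception_rate=0.01,
             required_rows_per_sec=5000):
    """Run one migration step with an embedded checkpoint:
    verify row counts, exception rate and throughput before proceeding."""
    start = time.monotonic()
    rows = extract()                  # hypothetical extract routine
    loaded, exceptions = load(rows)   # hypothetical load routine
    elapsed = time.monotonic() - start

    rate = exceptions / max(loaded + exceptions, 1)
    throughput = loaded / elapsed if elapsed else float("inf")

    # Checkpoints: fail fast instead of discovering surprises later.
    assert loaded + exceptions == len(rows), f"{name}: row count mismatch"
    assert rate <= max_exception_rate, f"{name}: exception rate {rate:.2%}"
    assert throughput >= required_rows_per_sec, (
        f"{name}: {throughput:,.0f} rows/sec below target")
    print(f"{name}: {loaded:,} rows in {elapsed:.1f}s "
          f"({throughput:,.0f} rows/sec)")
```

Running each release through the same harness at production scale gives the team the tuning feedback loop Strategy 6 calls for, while the assertions implement Strategy 2's checkpoints.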
Strategy 7: Have a Plan B

Last but not least, just as for any mission-critical project, have a plan B. Even after significant planning, testing and rehearsal, migration projects tend to face surprises. Hence, from a business continuity standpoint, it is imperative to have an alternative solution planned and tested before the project begins. The plan B must be formulated with input from all stakeholders, including business leadership, business users, operational IT staff and migration project leadership. The migration project leadership should get sign-off on the plan B from all stakeholders and communicate any changes thereafter. By communicating the plans and intentions to all stakeholders, the project leadership can ensure that all dependent business processes are prepared for the change.

Conclusion
As we have seen, most data migration projects fail for various reasons. One of the primary reasons is underestimation of the scale and complexity of the data migration effort. By proactively investing in estimation and planning, IT managers can get a good handle on the project. Data migration is a multidimensional effort, which can be time sensitive and mission critical. By following these simple and proven strategies, IT managers can certainly improve the probability of success.

End Notes:
1. "Data Migration in the Global 2000." Bloor Research, September 2007.
