MDM overview
Financial services scenario
ibm.com/redbooks
International Technical Support Organization
April 2009
SG24-7704-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page xvii.
This edition applies to Version 8, Release 0, Modification 1 of IBM InfoSphere Information Server
(5724-Q36) and Version 8, Release 0, Modification 1 of IBM InfoSphere Master Data
Management Server (5724-V51).
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
The team that wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xx
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiv
Contents v
vi MDM: RDP for MDM
Figures
1-1 Optimal architecture combining MDM Hub with Data Integration Hub . . . . 3
1-2 Synchronization of master data in the enterprise . . . . . . . . . . . . . . . . . . . . 4
1-3 IBM Information Server suite in the flow . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1-4 IBM Information Server functionality in MDM deployment . . . . . . . . . . . . . 6
1-5 RDP for MDM solution overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1-6 MDM Logical Data Model & Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1-7 Role and flow of RDP for MDM solution . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1-8 Enabling Suspect Duplicate Processing in MDM Server UI . . . . . . . . . . . 17
1-9 Configuring the critical matching fields for person matching . . . . . . . . . . . 18
1-10 Configuring the critical matching fields for organization matching . . . . . 19
1-11 Creating a new application accessing MDM Server . . . . . . . . . . . . . . . . 21
1-12 Modifying an existing application . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1-13 Creating a new adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1-14 Using a Change Data Capture solution. . . . . . . . . . . . . . . . . . . . . . . . . . 22
1-15 Overview of RDP for MDM solution scenarios . . . . . . . . . . . . . . . . . . . . 23
2-1 Main components of RDP for MDM processing . . . . . . . . . . . . . . . . . . . . 29
2-2 CDIDTP table contents: Corresponds to the I’n’ columns . . . . . . . . . . . . . 48
2-3 CDCONTMETHTP table contents: Corresponds to the C’n’ columns . . . 48
2-4 Contact (P) and Contract (C) RT/ST combinations . . . . . . . . . . . . . . . . . . 50
2-5 Import SIF flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2-6 Validation and Standardization flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2-7 Error Consolidation and Referential Integrity flow. . . . . . . . . . . . . . . . . . . 58
2-8 Match Pers and Org flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2-9 Match LOB flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2-10 Bulk Load flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2-11 Upsert Load flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3-1 TBank environment configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3-2 Rapid MDM approach used in the scenario for the initial load . . . . . . . . . 73
3-3 Checking table data, part 1 of 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3-4 Checking table data, part 2 of 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3-5 Savings table data, part 1 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3-6 Savings table data, part 2 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3-7 Savings table data, part 3 of 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3-8 Loans table data, part 1 of 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3-9 Loans table data, part 2 of 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3-10 Loans table data, part 3 of 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3-11 Loans table data, part 4 of 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3-12 DQA approach: Data assessment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Figures ix
3-76 Creating a reference table using Information Analyzer, part 1 of 6 . . 154
3-77 Creating a reference table using Information Analyzer, part 2 of 6 . . 155
3-78 Creating a reference table using Information Analyzer, part 3 of 6 . . 156
3-79 Creating a reference table using Information Analyzer, part 4 of 6 . . 157
3-80 Create SIF, part 5 of 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
3-81 Create SIF, part 6 of 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
3-82 Create SIF tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
3-83 Create SIF, part 1 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
3-84 Create SIF, part 2 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
3-85 Create SIF, part 3 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3-86 Create SIF, part 4 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3-87 Create SIF, part 5 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
3-88 Create SIF, part 6 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
3-89 Create SIF, part 7 of 7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
3-90 Launch RDP for MDM jobs, part 1 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 182
3-91 Launch RDP for MDM jobs, part 2 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 182
3-92 Launch RDP for MDM jobs, part 3 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 183
3-93 Launch RDP for MDM jobs, part 4 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 184
3-94 Launch RDP for MDM jobs, part 5 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 185
3-95 Launch RDP for MDM jobs, part 6 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 186
3-96 Launch RDP for MDM jobs, part 7 of 7 . . . . . . . . . . . . . . . . . . . . . . . . . 187
3-97 Verify successful load, part 1 of 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
3-98 Verify successful load, part 2 of 5 . . . . . . . . . . . . . . . . . . . . . . . . . . 193
3-99 Verify successful load, part 3 of 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
3-100 Verify successful load, part 4 of 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
3-101 Verify successful load, part 5 of 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
3-102 Suspect resolution, part 1 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
3-103 Suspect resolution, part 2 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
3-104 Suspect resolution, part 3 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
3-105 Suspect resolution, part 4 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
3-106 Suspect resolution, part 5 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
3-107 Suspect resolution, part 6 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
3-108 Suspect resolution, part 7 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
3-109 Suspect resolution, part 8 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
3-110 Suspect resolution, part 9 of 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
3-111 Hierarchy scenario example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
3-112 TBank hierarchy scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
3-113 RDP for MDM jobs Director output, part 1 of 2 . . . . . . . . . . . . . . . . . . 214
3-114 RDP for MDM jobs Director output, part 2 of 2 . . . . . . . . . . . . . . . . . . 215
3-115 Hierarchy view using MDM Server UI, part 1 of 15 . . . . . . . . . . . . . . . 217
3-116 Hierarchy view using MDM Server UI, part 2 of 15 . . . . . . . . . . . . . . . 218
3-117 Hierarchy view using MDM Server UI, part 3 of 15 . . . . . . . . . . . . . . . 219
Tables

Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation
and/or its affiliates.
Red Hat, and the Shadowman logo are trademarks or registered trademarks of Red Hat, Inc. in the U.S. and
other countries.
SAP, and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other
countries.
EJB, J2EE, J2SE, Java, JSP, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in
the United States, other countries, or both.
Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Nagraj Alur is a Project Leader with the IBM ITSO, San Jose Center. He has
more than 33 years of experience in database management systems (DBMSs),
and has been a programmer, systems analyst, project leader, independent
consultant, and researcher. His areas of expertise include DBMSs, data
warehousing, distributed systems management, database performance,
information integration, and client/server and Internet computing. He has written
extensively on these subjects and has taught classes and presented at
conferences all around the world. Before joining the ITSO in November 2001, he
was on a two-year assignment from the Software Group to the IBM Almaden
Research Center, where he worked on Data Links solutions and an eSourcing
prototype. He holds a master’s degree in computer science from the Indian
Institute of Technology (IIT), Mumbai, India.
Mike Carney is an Executive Architect within the IBM Software Group, and the
lead architect and development team lead for RDP DataStage Jobs. He has over
20 years of experience with Software application development and data
warehousing, including 10 years with IBM Advanced Consulting Group for
DataStage, where he has contributed to many innovations to the Information
Server Product. Mike holds a BA in Mathematics from Boston College.
Priyanka Deswal is a Senior IT Specialist with the IBM Software group in India.
She has more than nine years of experience in Information Management. She is
currently working as a Technical Pre-Sales specialist for Information Platform and
Solutions. She is an advanced cluster certified DB2® Expert. Her areas of expertise include DB2, InfoSphere Information Server, InfoSphere MDM Server, and DB2 Content Manager. She holds a bachelor's degree in computer science and engineering.
Elizabeth Dial is a Technical Architect with the IBM Software Group. She is
currently part of the Trusted Information Agenda team, supporting the
advancement of industry leading architectures for IBM customers worldwide. As
a member of the InfoSphere Worldwide Center of Excellence, Elizabeth has
created and contributed to best practices pertaining to data quality and the
iterations methodology, and has participated in the design of data quality
components to support the InfoSphere suite of products. Elizabeth has 10 years
of professional experience designing and implementing data integration projects
that include data warehousing, SOA, and the integration of QualityStage with
IBM MDM Server and other enterprise applications.
Preface xxi
Patrick Owen began his IT career at Acxiom Corporation, one of the world's
largest Data Service Providers specializing in personal identification and
name/address hygiene. After his experiences there with the Orchestrate®
Parallel Framework, he moved to Ascential® Software and now holds the
position of world-wide InfoSphere Information Server Architect, specializing in
performance, High-Availability, and Grid. Patrick has worked on projects
spanning many industries (including: insurance, package delivery, mortgage,
utilities, retail, and entertainment rental). He holds a BS in Computer Science
from the University of Arkansas at Little Rock, where he published several papers
on Optical Character Recognition, and Water Vapor Mapping Systems for
Extra-Terrestrial Landers.
David Borean
Karen Chouinard
Charles Jia
Linda Park
Joseph Tsang
Lena Woolf
IBM Canada
Aarti Borkar
Paul Christensen
Stacy Scoggins
Neil D Potter
Kiranmayi Potu
Brian L Tinnel
Ningning (Kevin) Wang
Balakumaran (Bala) Vaithyalingam
Larissa Wojciechowski
IBM USA
Srinivas Mudigonda
IBM India
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you will develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
The content we provide in this Redbooks publication dives into the details of RDP for MDM, to give you a better understanding of the technical underpinnings, operational metrics, and deployment methods for the RDP for MDM solution.
The context for the RDP for MDM solution is described as follows:
MDM and the enterprise
Typical enterprises run a myriad of applications and systems that all work together over the enterprise network to accomplish the management, control, and reporting of the business. Each of these systems holds some slice of data that is critical to the enterprise and in fact represents the gold copy of that data: the master data. For all of the systems in the enterprise to work effectively together, this master data must be managed, standardized, and synchronized. Otherwise, it would be like people in the United Nations trying to communicate without translators.
Over time, we have shown that the optimal method of managing this
master data in an enterprise is through the use of flexible, scalable MDM and
data integration hubs working in unison, as shown in Figure 1-1 on page 3. It
is critical that a common data governance practice be used across the enterprise, and that these hubs be driven by a common set of data transformation and data quality business rules, giving consistency across enterprise deployments. These goals, together with the goal of providing a rapid deployment framework for MDM in the enterprise, led to the development of the RDP for MDM solution.
Figure 1-1 Optimal architecture combining MDM Hub with Data Integration Hub
Chapter 1. Rapid Deployment Package for Master Data Management solution overview 3
Synchronization of master data in the enterprise
Synchronization of master data in the enterprise is where the flexibility and
scalability of the MDM Hub/Data Integration Hub approach really shows its
power. By providing access and consumption control of MDM data, driven by
a common set of business and data quality rules, we have the best of all
worlds. Synchronization can now be done at an end-of-day batch, delta load,
intraday trickle feed, as a service-oriented architecture (SOA) service call, or
as a full XA-compliant transaction under high availability with two-phase commit and rollback, as shown in Figure 1-2.
Synchronization can happen at either the MDM Hub or Data Integration Hub
layer through a common set of rules. Metadata gold copies with known data
quality can now be managed in the enterprise under a common set of
precedence and synchronization rules.
Figure 1-3 IBM Information Server suite in the flow
Data Integration and MDM — why use RDP MDM in every MDM Server
deployment
It is critical that RDP for MDM be used in all MDM Server deployments for
both loading and delta processing in order to allow fully automated end-to-end
metadata and data quality management in the enterprise. When using data
management services in the MDM Server hub, it is critical that these services
update the metadata when changes are made to MDM Server resident gold
copy metadata so that the metadata lineage linkage is maintained.
It is also critical that QualityStage be called and used for all data quality
processing so that a single set of data quality rules apply for the enterprise.
By abiding by these rules, both MDM Server and Information Server can now
be used to deliver synchronized master data across all domains in the
enterprise, no matter how large or complex. Figure 1-4 shows the breadth of
Information Server functionality contained in the RDP for MDM solution to
achieve the stated objectives.
The RDP for MDM solution is an important first step towards implementing the MDM vision and realizing the business value it offers. Your first
implementation phase should scope an attainable business objective to
validate the advantages of continuing with subsequent phases that
incrementally incorporate all the features of MDM Server.
The RDP for MDM solution reflects the three main phases of an implementation
of MDM Server:
Source data analysis
MDM Server point-in-time load
MDM Server data consumption
In the following sections, we describe the following aspects of the MDM solution:
MDM Server implementation phases
RDP for MDM solution
Configuring the RDP for MDM solution
Main configuration scenarios
Best practices
1.2 MDM Server implementation phases
This section describes the three main phases of an implementation of MDM
Server:
Source Data Analysis (also known as Data Profiling)
MDM Server point-in-time load
MDM Server data consumption
The source data analysis phase is critical to ensuring a successful data integration effort that delivers timely and superior data quality.
During the initial load (as well as delta processing), appropriate translation and
transformation of the source data needs to occur. Also required is any potential
data cleansing activity that includes standardization, matching, and
de-duplication.
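As an illustration of the kind of standardization applied during this step, the sketch below normalizes an address field. The actual rules are configured in QualityStage rule sets rather than hand-coded, and the suffix mappings here are assumptions for the example:

```python
import re

# Illustrative suffix mappings; real QualityStage rule sets are far richer.
STREET_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

def standardize_address(raw: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace, normalize suffixes."""
    tokens = re.sub(r"[^\w\s]", "", raw.upper()).split()
    tokens = [STREET_SUFFIXES.get(t, t) for t in tokens]
    return " ".join(tokens)

print(standardize_address("123  Main street,"))  # -> 123 MAIN ST
```

Standardizing both sides of a comparison in this way is what makes the later matching step reliable: "123 Main street," and "123 MAIN ST" become the same token sequence.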
As part of the data cleansing activity during initial load and delta processing,
potential duplicate records are identified. You need to take appropriate action to
confirm or deny whether the identified duplicate condition exists. You need to
determine the match rules for identifying potential duplicates between two or
more records. This involves defining match criteria, which vary depending upon
whether the records are of the type person or organization. For a person, the
default business duplicate key is the Social Security Number. For an
organization, the default duplicate key is the Corporate Tax Identification.
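A duplicate-key check along these lines can be sketched as follows. The field names (`party_type`, `ssn`, `corp_tax_id`) are illustrative, not the MDM Server schema:

```python
def is_potential_duplicate(rec_a: dict, rec_b: dict) -> bool:
    """Compare two party records on the default business duplicate key:
    Social Security Number for persons, Corporate Tax Identification
    for organizations. Field names are hypothetical."""
    if rec_a.get("party_type") != rec_b.get("party_type"):
        return False
    key = "ssn" if rec_a["party_type"] == "person" else "corp_tax_id"
    # A missing or empty key never triggers a duplicate.
    return bool(rec_a.get(key)) and rec_a.get(key) == rec_b.get(key)

a = {"party_type": "person", "ssn": "123-45-6789", "name": "J. Smith"}
b = {"party_type": "person", "ssn": "123-45-6789", "name": "John Smith"}
print(is_potential_duplicate(a, b))  # True
```

In practice the match criteria are configurable (see the critical matching fields discussed later in this chapter), so a production rule would score several fields rather than test a single key.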
Chapter 1. Rapid Deployment Package for Master Data Management solution overview 9
When duplicates are identified, you need to survive (combine and consolidate) data about an entity from the multiple records and fill in gaps left by missing fields, creating a more complete record based on the fields in the duplicate records. The survivorship process can be automated when we are confident that the records are duplicates. If any degree of doubt exists, the records are stored separately and marked as potential duplicates. The Data Stewards then have to perform manual consolidation of duplicate data using the MDM Server Data Stewardship UI, which calls MDM Server business transactions.
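A simple field-level survivorship policy can be sketched as below. Real survivorship rules in MDM Server also weigh source precedence and recency, so treat this most-complete-record policy as an assumption for illustration:

```python
def survive(records) -> dict:
    """Build a consolidated golden record by taking, for each field,
    the first non-empty value across the duplicate records."""
    golden = {}
    for rec in records:
        for field, value in rec.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden

dups = [
    {"name": "John Smith", "phone": "", "email": "js@example.com"},
    {"name": "J. Smith", "phone": "555-0100", "email": ""},
]
print(survive(dups))
# {'name': 'John Smith', 'email': 'js@example.com', 'phone': '555-0100'}
```

Note how the golden record is more complete than either input: the phone number survives from the second record even though the first record wins the name field.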
Note: The RDP for MDM solution described in 1.3, “RDP for MDM solution” on
page 11 provides a user interface, as well as business services that are Web
services or Enterprise Java™ Bean (EJB™)-enabled. That does not preclude
other consumption mechanisms (such as stored procedures or extracts), but
those solutions tend not to be as scalable as Web services or EJBs.
Figure 1-5 shows how the components that make up the RDP for MDM solution
correspond to the three MDM Server implementation phases.
Figure 1-5 RDP for MDM solution overview
The components shown in Figure 1-5 on page 11 are as follows:
IBM InfoSphere Information Analyzer
InfoSphere Information Analyzer assesses the quality of the reference data, the frequency of data per column, and the cardinality across columns.
IBM InfoSphere FastTrack
InfoSphere FastTrack accelerates the translation of business requirements
into data integration projects, which requires collaboration across analysts,
data modelers, and developers. It allows business logic to be captured and
translated into DataStage ETL jobs.
IBM InfoSphere DataStage
InfoSphere DataStage is an ETL tool that uses a graphical notation to
construct data integration solutions.
IBM InfoSphere QualityStage
InfoSphere QualityStage (QS in Figure 1-5 on page 11) is used to assess the
quality of free-form fields such as names and addresses.
IBM InfoSphere Master Data Management (MDM) Server
IBM InfoSphere Master Data Management (MDM) Server allows businesses
to centrally manage customer, product, and account data for use
enterprise-wide.
Standard Interface Format (SIF)
Standard Interface Format (SIF) is a flat file format delimited by the pipe symbol between columns and the new-line character between records. The first two columns identify the type of data record and are denoted by the Record Type (RT) followed by the Sub Type (ST). After the first two columns, the Source System Key (SSK) is provided to allow the record to be referenced using the existing source systems’ keys. The SIF is described in detail in Appendix B.1, “SIF details” on page 248.
DataStage and QualityStage jobs
These jobs perform validation, standardization, matching, de-duplication,
suspect processing, loading, and delta processing of SIF input data into the
MDM data repository.
MDM Server data model subset
RDP for MDM solution supports a subset of the MDM Server data model —
this subset is shown in Figure 1-6 on page 13.
Figure 1-6 MDM Logical Data Model & Domains
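As a minimal illustration of the SIF layout described in the component list (pipe-delimited columns, one record per line, with RT, ST, and SSK as the leading columns), the parser sketch below peels off the common leading columns. The per-type column layouts from Appendix B.1 are not modeled, and all sample values are invented:

```python
from typing import NamedTuple

class SifRecord(NamedTuple):
    record_type: str        # RT: kind of data record (first column)
    sub_type: str           # ST (second column)
    source_system_key: str  # SSK: key from the originating source system
    rest: tuple             # remaining columns; layout depends on RT/ST

def parse_sif(text: str) -> list:
    """Split pipe-delimited SIF lines into records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rt, st, ssk, *rest = line.split("|")
        records.append(SifRecord(rt, st, ssk, tuple(rest)))
    return records

# Invented sample: one person record and one contract record.
sample = "P|01|SRC-1001|SMITH|JOHN\nC|02|SRC-2001|SAVINGS|OPEN"
for rec in parse_sif(sample):
    print(rec.record_type, rec.sub_type, rec.source_system_key)
```

Keeping the SSK on every record is what allows loaded master data to be traced back to, and synchronized with, the originating source system.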
Figure 1-7 shows the general flow of the RDP for MDM solution and the various
roles and supporting InfoSphere products participating in the different phases of
the implementation.
Figure 1-7 Role and flow of RDP for MDM solution
Delta processing: RDP for MDM uses the Information Server engine to maintain metadata lineage automatically. If you choose to use the Maintenance Business Services method for delta processing, an additional set of custom Java metadata update services needs to be developed and connected into the X-Meta hub for data lineage to be automatically maintained and available through Metadata Workbench.
MDM Business Services Load provides support for standard MDM Server
XML layout, as well as MDM Server SIF layout as the file input format.
This Business Services Load method uses the MDM Server SIF layout. This
method requires that the records with the same context be loaded as one
grouping. This is accomplished by producing the SIF layout and passing it
through the SIF Sequencer DataStage job, as shown in Figure 1-5 on
page 11. The SIF Sequencer sorts the records to allow the input to match
what would have been expected by the MDM Server Business Services if
XML was used as the input format.
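The sequencing step can be sketched as a sort that groups records by source system key, with parent records ahead of their dependents. The real SIF Sequencer is a DataStage job, and the type ordering below is an assumption for illustration:

```python
def sequence_sif(records) -> list:
    """Order (record_type, sub_type, ssk) tuples so that all records
    sharing a source system key form one contiguous grouping, with an
    assumed parent-before-child type ordering within each group."""
    type_order = {"P": 0, "A": 1, "C": 2}  # party, address, contract (assumed)
    return sorted(records, key=lambda r: (r[2], type_order.get(r[0], 99)))

mixed = [
    ("A", "01", "SRC-2"), ("P", "01", "SRC-1"),
    ("C", "01", "SRC-1"), ("P", "01", "SRC-2"),
]
print(sequence_sif(mixed))
# [('P', '01', 'SRC-1'), ('C', '01', 'SRC-1'), ('P', '01', 'SRC-2'), ('A', '01', 'SRC-2')]
```

After this sort, each grouping arrives intact at the MDM Server Business Services, matching what the services would have received had XML been the input format.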
In the MDM consumption phase, the master data that is eventually loaded into
MDM Server (and updated per data latency requirements of the application)
is consumed using one of the interfaces identified earlier.
Potential duplicate master data that could not be automatically collapsed
during load needs to be examined by the Data Steward. Often, duplicate
records have conflicting data in the same fields. It is the Data Steward's role
to choose which data will survive and be carried forward into the master
record. The Data Steward uses the MDM Server UI to perform this task.
The MDM Server UI invokes MDM Server business transactions, which create the new master record and inactivate duplicate records. As part of
this process, MDM Server invokes duplicate suspect processing again to
identify any potential new suspects. MDM Server will call QualityStage
runtime matching jobs with the same set of rules as in batch QualityStage job
used during initial load to guarantee the same matching logic.
Also provided are the following features:
– Data Stewardship UI, which allows users to perform various processing
activities on party data
– Party Maintenance UI, which is a Web services-based UI featuring a
graphical 360 view of customer data
– Sample out-of-the-box (OOTB) reports and a framework to build your own custom reports.
This feature allows a client to implement reporting requirements around
Suspect Duplicate Processing in support of the Data Stewardship role.
Because the critical fields for person matching and organization matching can be
configured independently, the UI uses two different screens.
Note: A1, A2, and B matches are described in 2.8, “Match” on page 60.
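Using the default minimum match scores shown in the configuration screens (A1/Duplicate 275.0, A2/Suspect 250.0, B/Unresolved Suspect 230.0), classification of a match score into a suspect category can be sketched as:

```python
# Default minimum match score per suspect category, as shown in the
# organization matching configuration screen.
THRESHOLDS = [
    ("A1/Duplicate", 275.0),
    ("A2/Suspect", 250.0),
    ("B/Unresolved Suspect", 230.0),
]

def categorize(score: float) -> str:
    """Map a QualityStage match score to a suspect category; A1 matches
    can be collapsed automatically, while A2 and B go to a Data Steward."""
    for category, minimum in THRESHOLDS:
        if score >= minimum:
            return category
    return "No match"

print(categorize(281.0))  # A1/Duplicate
print(categorize(240.0))  # B/Unresolved Suspect
```

Because the thresholds are ordered from highest to lowest, a score is always assigned to the strongest category it clears.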
Select the matching critical data fields for a person by moving the appropriate fields from the left pane to the right pane under the “Matching Critical Data Fields” section.
The fields selected are as follows:
– Name
– Address
– City
– State/Province
– Country
– ZIP/Postal Code
– Gender
– Birth Date
– Social Security Number
– Driver License Number
The screen lists the available fields (including County, Passport, Home E-mail
Address, Home Telephone, and Mobile Telephone) in the left pane, and the
selected matching critical data fields in the right pane, with Add and Remove
buttons to move fields between the panes.
Figure 1-9 Configuring the critical matching fields for person matching
For organization matching, the Matching Critical Data for Organization screen
shows the suspect match categories with their minimum match scores
(A1/Duplicate: 275.0, A2/Suspect: 250.0, B/Unresolved Suspect: 230.0). The
selected matching critical data fields for an organization are as follows:
– Name
– Address
– City
– State/Province
– Country
– ZIP/Postal Code
– Established Date
– Corporate Tax Identification
– DUNS Number
– Business Telephone
Figure 1-10 Configuring the critical matching fields for organization matching
1.4.2 Main configuration scenarios
Figure 1-15 on page 23 shows the options available for the initial load of the
MDM data repository using the RDP for MDM solution, delta load using the Delta
RDP for MDM option, and the Batch Framework with SIF or XML layout as input.
Note: Direct database access to MDM Server should only be allowed for
inquiry purposes, because for add/update purposes it would bypass data
quality validations, thereby compromising the master data management
standards. It is highly recommended that you use either the MDM Server
Web services layer or its EJB layer for processing data.
– New applications would access the master data by interfacing with MDM
Server and use the Source System Keys stored therein to access the
non-master data stored in the existing systems as shown in 3.8, “MDM
consumption application” on page 230.
Figure 1-11 is the simplest scenario for consuming data within MDM
Server. The new application can be designed to ensure that customer
master data is read directly from the MDM Server, as well as use MDM
Server to add new customers before adding new accounts in its own
system.
Figure 1-11 New application using the MDM Server Add/Update/Search Web services
– Existing applications will continue to read data from their own repositories.
A synchronization process would need to be written to update existing
systems with clean and accurate master data from the MDM Server. The
MDM Server provides a notification mechanism to inform external systems
about significant MDM events, such as the collapse of duplicate parties.
The synchronization process should listen for the data changes that have
occurred in master repository and update existing systems accordingly.
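A synchronization process along these lines could be sketched as follows. The event shape and the in-memory remap table are hypothetical, not the actual MDM Server notification payload (which a real implementation would consume through the MDM Server notification mechanism and apply via each system's own update interface):

```python
def handle_mdm_event(event, source_systems):
    """Propagate a duplicate-party collapse event to existing systems.

    Hypothetical sketch: 'event' stands in for an MDM Server
    notification, and the per-system 'remap' dict stands in for a real
    update call against that system's repository.
    """
    if event["type"] != "PARTY_COLLAPSE":
        return []
    survivor, merged = event["survivor_id"], event["merged_ids"]
    updated = []
    for system in source_systems:
        # Repoint every merged party's key at the surviving party
        for old_id in merged:
            system.setdefault("remap", {})[old_id] = survivor
            updated.append((system["name"], old_id, survivor))
    return updated
```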
Three scenarios are briefly described here as follows:
• In the “Modifying an existing application” scenario (Figure 1-12), an
existing application is modified to use MDM Server Web Services to
search for new customers and update their data within MDM Server.
This could involve the creation of new screens within the existing
application, or by re-coding existing screens.
Figure 1-12 Existing application modified to use the MDM Server Add/Update/Search Web services
In the legacy adapter scenario, the existing application is left unchanged
and a legacy adapter invokes the MDM Server Add/Update/Search Web
services on its behalf.
The delta load portion of Figure 1-15 shows the invocation of the
customizable (as indicated by the red border around this box) RDP
Maintenance Business Services component of the MDM Server by the Batch
Framework when processing delta records in the SIF or XML layout format. It
also shows how the SIF records can be consumed directly by the delta RDP
for MDM functionality. This is not covered in this IBM Redbooks publication.
On the consumption side, new applications consume MDM data through custom
stored procedures or the MDM UI, while legacy applications connect through
an ESB or legacy adapters.
1.4.3 Best practices
The goal of any initial implementation of a technology should be a successful first
phase that fulfills an acceptable set of business requirements.
The goal of the RDP for MDM implementation is to minimize the amount of
configuration necessary within client implementations, by providing prebuilt
configuration assets at the most common configuration points, as follows:
Name Standardization
Address Standardization
Party Matching
Data Model Extensions
If you want greater flexibility with standardizing and matching your input source
data, we suggest you use Configuration C, described in 1.4.2, “Main
configuration scenarios” on page 20 for your RDP for MDM implementations
where you choose to perform both standardization and matching in the SIF to
Load section as shown in Figure 1-15 on page 23.
Note: These configuration parameters are described in Table 2-2 on page 36.
Currently, the SIF and the DataStage/QualityStage jobs have some flexibility for
customization to accommodate the specific needs of your organization (such as
modifying the code table values in the MDM Server and modifying the
standardization rules and de-duplication process). In addition, you can add
columns and change the precision and scale of existing MDM Server columns,
which would require modification of the SIF and DataStage jobs.
If the initial load uses the configuration options in the config file while the
runtime uses different configuration options in the CONFIGELEMENT table, the
two matching processes diverge and the data will no longer be clean after a
few days in production.
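A simple guard against this divergence is to compare the two sets of options before going live. The following sketch is illustrative (the parameter dictionaries stand in for the config file and the CONFIGELEMENT table, which in practice you would read with your own file and database access code):

```python
def find_config_drift(initial_load_params, configelement_params):
    """Report parameters whose initial-load (config file) value differs
    from the runtime value in the CONFIGELEMENT table.

    Returns {name: (load_value, runtime_value)} for every mismatch.
    """
    drift = {}
    for name, load_value in initial_load_params.items():
        runtime_value = configelement_params.get(name)
        if runtime_value is not None and runtime_value != load_value:
            drift[name] = (load_value, runtime_value)
    return drift
```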
1 The configuration parameters to be overridden have “configelement ==” in the help text field
Note: The names of the error files vary widely. Therefore, you should
construe that other error files conform to the template shown here for
Party.
1. By Delta RDP for MDM, which was not available at the time of writing of this IBM Redbooks
publication
QS_B_MATCH_CUTOFF_ORGANIZATION (150)
QS_EXCLUDE_FIELDS_FROM_MATCH_ORGANIZATION
QS_A1_MATCH_CUTOFF_PERSON (205)
QS_A2_MATCH_CUTOFF_PERSON (175)
QS_B_MATCH_CUTOFF_PERSON (150)
QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON
QS_MATCH_ORG_NATID (I2)
QS_MATCH_ORG_1
QS_MATCH_ORG_2
QS_MATCH_ORG_3
QS_MATCH_ORG_4
QS_MATCH_PERSON_NATID
QS_MATCH_PERSON_1
QS_MATCH_PERSON_2
QS_MATCH_PERSON_3
QS_MATCH_PERSON_4
DB_SCHEMA
DB_USERID
DB_PASSWORD
a. These parameters must be customized for the user’s particular environment; where a
recommended value exists, it is shown in parentheses
Table 2-2 on page 36 through Table 2-5 on page 41 describe all the
parameters available for customization. Reviewing them can be a daunting task.
However, it is our opinion that there are a few key parameters that fall in the MUST MODIFY list
(described in “MUST MODIFY parameters” on page 43), others that should be in
the CONSIDER MODIFYING list (described in “CONSIDER MODIFYING
parameters” on page 45), while the rest can be left to default until sufficient
experience has been gained to attempt to customize them as well.
MDM_DEPLOYMENT_NAME WebSphere® Customer Center MDM deployment name required by the jobs
reading and writing to the CONFIGELEMENT
table. Must match the deployed MDM
application name in order for it to update the
correct values.
File location DS_SUPPORT_FILE_DIR /mdmisdata03/data/MDMIS/PARAMETERS/ Directory where required files are installed,
for instance the FREQUENCY files used by QS
Match (at present these appear to be the only
files stored there)
QualityStage QS_A1_MATCH_CUTOFF_ORGANIZATION 205 Specify Org A1a Minimum Match Score - 205
QS_MATCH_ORG_2 I2 (blank)
QS_MATCH_ORG_4 I3 (blank)
QS_MATCH_PERSON_2 C3 (blank)
QS_MATCH_PERSON_3 C5 (blank)
QS_MATCH_PERSON_4 C7 (blank)
DROP DS_DETECTED_DUPLICATES_ACTION E Action to take if duplicate (same key) records are
detected in the SIF file. The duplicate records will be
removed from input. E: Error all duplicates / K: Keep first,
error others.
DS_PARTY_DROP_SEVERITY_LEVEL 4 Party will be dropped if there are errors with severity <=
DS_PARTY_DROP_SEVERITY_LEVEL. Severity level
ranges from 0 (worst) to 10 (least severe)
DS_EMAIL_ERROR_CHECK_REPORT 1 Flag to indicate whether the error report (of SIF file error
counts) should be e-mailed at all. (Whether to abort is
controlled by three parameters: DS_SIF_ERROR_THRESHOLD,
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD, and
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT)
DS_SIF_ERROR_THRESHOLD 101 Percentage of ALL SIF records with errors that will cause
the job stream to abort (any value above 100 will skip this
check)
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT 101 Number of Individual SIF Files, whose Error Threshold has
been exceeded, that are required for an abort.
Error DROP_ON_ASSIGNEDBY_ERR 1 Identifier assigned-by party was dropped. 0: do not drop the
Consolidation party, but drop the identifier record; 1: drop the party.
ReasonCode 100385, severity <= party drop
DROP_ON_FROM_ERR 1 Contact Rel from-party error action. 0: do not drop the party,
but drop the contact rel record; 1: drop the party.
ReasonCode 100383, severity <= party drop
Runtime FS_HIERARCHY_SIF_FILE_PATTERN /mdmisdata03/Projects/MDMISINT3/SIF Hierarchy SIF files pattern. Includes full path and file
_IN/sanitycheck/*.hsif mask. All files meeting this pattern are read by the RDP
jobs
FS_SIF_FILE_PATTERN /mdmisdata03/Projects/MDMISINT3/SIF SIF files pattern. Includes full path and file mask. All files
_IN/sanitycheck/*.sif meeting this pattern are read by the RDP jobs
DS_PROCESSING_DATE (auto assigned) 1900-01-01 00:00:00 Generated at runtime. Can be used to fix the processing
date if you are restarting the load at a later date.
SK_MID_ADDRESS_ID_SF skMid_ADDRESS_ID.sf The file that holds the previous surrogate key
SK_MID_ALERT_ID_SF skMid_ALERT_ID.sf The file that holds the previous surrogate key
SK_MID_CONT_EQUIV_ID_SF skMid_Contacts_CONTEQUIV_ID.sf The file that holds the previous surrogate key
SK_MID_CONT_ID_SF skMid_Contacts_CONT_ID.sf The file that holds the previous surrogate key
SK_MID_CONT_REL_ID_SF skMid_ContactRel_CONT_REL_ID.sf The file that holds the previous surrogate key
SK_MID_CONTACT_METHOD_ID_SF skMid_CONTACT_METHOD_ID.sf The file that holds the previous surrogate key
SK_MID_CONTR_COMP_VAL_ID_SF skMid_CONTR_COMP_VAL_ID.sf The file that holds the previous surrogate key
SK_MID_CONTR_COMPONENT_ID_SF skMid_CONTR_COMPONENT_ID.sf The file that holds the previous surrogate key
SK_MID_CONTRACT_ID_SF skMid_CONTRACT_ID.sf The file that holds the previous surrogate key
SK_MID_CONTRACT_ROLE_ID_SF skMid_CONTRACT_ROLE_ID.sf The file that holds the previous surrogate key
SK_MID_HIER_ULT_PAR_ID_SF skMid_HIER_ULT_PAR_ID.sf The file that holds the previous surrogate key
SK_MID_HIERARCHY_ID_SF skMid_HIERARCHY_ID.sf The file that holds the previous surrogate key
SK_MID_HIERARCHY_NODE_ID_SF skMid_HIERARCHY_NODE_ID.sf The file that holds the previous surrogate key
SK_MID_HIERARCHY_REL_ID_SF skMid_HIERARCHY_REL_ID.sf The file that holds the previous surrogate key
SK_MID_IDENTIFIER_ID_SF skMid_Identifier_IDENTIFIER_ID.sf The file that holds the previous surrogate key
SK_MID_LOB_REL_ID_SF skMid_LOB_REL_ID.sf The file that holds the previous surrogate key
SK_MID_LOCATION_GROUP_ID_SF skMid_LOCATION_GROUP_ID.sf The file that holds the previous surrogate key
SK_MID_MISCVALUE_ID_SF skMid_MISCVALUE_ID.sf The file that holds the previous surrogate key
SK_MID_NATIVE_KEY_ID_SF skMid_NativeKey_NATIVE_KEY_ID.sf The file that holds the previous surrogate key
SK_MID_ORG_NAME_ID_SF skMid_OrgName_ORG_NAME_ID.sf The file that holds the previous surrogate key
SK_MID_PERSON_NAME_ID_SF skMid_PersonName_PERSON_NAME_ID.sf The file that holds the previous surrogate key
SK_MID_PERSON_SEARCH_ID_SF skMid_PersonName_PERSON_SEARCH_ID.sf The file that holds the previous surrogate key
SK_MID_PPREF_ID_SF skMid_PrivPref_PPREF_ID.sf The file that holds the previous surrogate key
SK_MID_ROLE_LOCATION_ID_SF skMid_ROLE_LOCATION_ID.sf The file that holds the previous surrogate key
SK_MID_SUSPECT_ID_SF skMid_Contacts_SUSPECT_ID.sf The file that holds the previous surrogate key
SK_PREFIX_CONT_ID_NEXT_VAL 1 Surrogate key value for xxxx
SK_PREFIX_CONT_ID_SF skPrefix_Contacts_CONT_ID.sf The file that holds the previous surrogate key
SK_PREFIX_CONTRACT_ID_SF skPrefix_Contracts_CONTRACT_ID.sf The file that holds the previous surrogate key
SK_PREFIX_HIERARCHY_ID_SF skPrefix_HIERARCHY_ID.sf The file that holds the previous surrogate key
DB_INSTANCE (blank)
DB_PASSWORD (blank)
DB_SCHEMA (blank)
DB_USERID (blank)
$APT_DB2INSTANCE_HOME /home/dsadm/remote_db2config
$APT_STRING_PADCHAR (blank)
DS_PARALLEL_APT_CONFIG_FILE /opt/IBM/InformationServer/Server/Configuratio
ns/MDM_Default.apt
DS_SEQUENTIAL_APT_CONFIG_FILE /opt/IBM/InformationServer/Server/Configuratio
ns/MDM_1X1.apt
DS_LANGUAGE_TYPE_CODE 100
FS_DATA_SET_HEADER_DIR /mdmisdata03/Projects/MDMISINT3/DATA/
FS_ERROR_DIR /mdmisdata03/Projects/MDMISINT3/ERROR/
FS_LOG_DIR /mdmisdata03/data/MDMIS/LOG/
FS_PARAM_SET_DIR ./ParameterSets/
FS_REJECT_DIR /mdmisdata03/Projects/MDMISINT3/REJECT/
FS_SK_FILE_DIR /mdmisdata03/Projects/MDMISINT3/SK/
FS_TMP_DIR /mdmisdata03/data/MDMIS/TMP/
FS_HIERARCHY_SIF_FILE_PATTERN /mdmisdata03/Projects/MDMISINT3/SIF_IN/san
itycheck/*.hsif
FS_SIF_FILE_PATTERN /mdmisdata03/Projects/MDMISINT3/SIF_IN/san
itycheck/*.sif
$APT_IMPORT_PATTERN_USES_FILESET true
$APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS true
$APT_SORT_INSERTION_OPTIMIZATION true
QS_MATCH_PERSON_NATID I1
QS_PERFORM_ORG_MATCH 1
QS_PERFORM_PERSON_MATCH 1
QS_STAN_ADDRESS 1
QS_STAN_ORG_NAME 1
QS_STAN_PERSON_NAME 1
a. This name must match the name used when deploying the MDM application
DS_USE_NATIVE_KEY 1
SK_MID_ALERT_ID_NEXT_VAL 1
SK_MID_CONT_EQUIV_ID_NEXT_VAL 1
SK_MID_CONT_ID_NEXT_VAL 1
SK_MID_CONT_REL_ID_NEXT_VAL 1
SK_MID_CONTACT_METHOD_ID_NEXT_VAL 1
SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL 1
SK_MID_CONTR_COMPONENT_ID_NEXT_VAL 1
SK_MID_CONTRACT_ID_NEXT_VAL 1
SK_MID_HIER_ULT_PAR_ID_NEXT_VAL 1
SK_MID_HIERARCHY_ID_NEXT_VAL 1
SK_MID_HIERARCHY_NODE_ID_NEXT_VAL 1
SK_MID_HIERARCHY_REL_ID_NEXT_VAL 1
SK_MID_IDENTIFIER_ID_NEXT_VAL 1
SK_MID_LOB_REL_ID_NEXT_VAL 1
SK_MID_LOCATION_GROUP_ID_NEXT_VAL 1
SK_MID_MISCVALUE_ID_NEXT_VAL 1
SK_MID_NATIVE_KEY_ID_NEXT_VAL 1
SK_MID_ORG_NAME_ID_NEXT_VAL 1
SK_MID_PERSON_NAME_ID_NEXT_VAL 1
SK_MID_PERSON_SEARCH_ID_NEXT_VAL 1
SK_MID_PPREF_ID_NEXT_VAL 1
SK_MID_ROLE_LOCATION_ID_NEXT_VAL 1
SK_MID_SUSPECT_ID_NEXT_VAL 1
SK_PREFIX_CONT_ID_NEXT_VAL 1
SK_PREFIX_CONTRACT_ID_NEXT_VAL 1
SK_PREFIX_HIERARCHY_ID_NEXT_VAL 1
QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON (blank)
QS_MATCH_ORG_1 (blank)
QS_MATCH_ORG_2 (blank)
QS_MATCH_ORG_3 (blank)
QS_MATCH_ORG_4 (blank)
QS_MATCH_PERSON_1 C1
QS_MATCH_PERSON_2 C3
QS_MATCH_PERSON_3 C5
QS_MATCH_PERSON_4 C7
QS_PHONETIC_CODING_TYPE_ADDRESS QSNYSIIS
QS_PHONETIC_CODING_TYPE_ORGANIZATION QSNYSIIS
QS_PHONETIC_CODING_TYPE_PERSON QSNYSIIS
QS_REJECT_ADDRESS_IF_NOT_STANDARDIZED 0
QS_REJECT_ORG_NAME_IF_NOT_STANDARDIZED 0
QS_REJECT_PERSON_NAME_IF_NOT_STANDARDIZED 0
DS_PARTY_DROP_SEVERITY_LEVEL 4
Notification DS_EMAIL_ERROR_CHECK_DISTRIBUTION
DS_EMAIL_ERROR_CHECK_REPORT 1
Abort DS_DROP_MAX_ITERATIONS 10
handling
DS_FAILED_COLUMNIZATION_ACTION F
DS_FAILED_RECORDIZATION_ACTION F
DS_SIF_ERROR_THRESHOLD 101
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD 101
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT 101
Note: A fix is being developed to ensure that these parameters are processed
properly by the IL_000_AutoStart_PS_IL job.
Each record type / sub-record type (also referred to as RT/ST) combination has a
unique layout (metadata). The record type identifies the primary subject areas
which are Contact (P) and Contract (C). The contact and contract RT/ST
combinations are shown in Figure 2-4.
Contact record type (‘P’) and sub types:
PP Person Contact
PO Organization Contact
PG Organization Name
PH Person Name
PE External Match
PA Address
PC Contact Method
PI Identifier
PB Line of Business Relationship
PR Contact Relationship
PM Person Miscellaneous Value
PS Privacy Preference
PT Person Alert
Contract record type (‘C’) and sub types:
CH Contract
CK Native Key
CC Contract Component
CR Contract Component Role
CL Role Location
CV Contract Component Value
CM Contract Misc Value
CT Contract Alert
Note: The layout of the RT/ST closely mirrors the tables defined in the MDM
Server repository. The financial services scenario described in Chapter 3,
“Financial services business scenario” on page 67 includes a spreadsheet
template identifying the metadata associated with each RT/ST combination.
This template can be used to define the mapping specification of columns in a
source system to those in the SIF for creating the SIF from that particular
source system.
In the above record layout, <CR><LF> is the DOS end-of-record character
sequence (carriage return followed by line feed).
The domain values of key columns in the SIF must contain the values defined
by the MDM Server. This will require transformation of domain values in the
source system to that of the MDM Server. For example, the domain values for
Gender in the MDM Server are ‘M’ and ‘F’, while the source system may have
‘0’ and ‘1’. The process creating the SIF is responsible for mapping the
domain values appropriately.
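The Gender example above can be sketched as follows; the mapping-table contents for a given source system are hypothetical (only the MDM domain values ‘M’ and ‘F’ come from the text):

```python
# Hypothetical code-mapping table for one source system;
# '0'/'1' are the source codes, 'M'/'F' the MDM Server domain values.
GENDER_MAP = {"0": "M", "1": "F"}

def map_domain_value(source_value, mapping, default=None):
    """Translate a source-system code to its MDM Server domain value,
    returning 'default' when no mapping exists (so unmapped codes can
    be flagged rather than silently passed through)."""
    return mapping.get(source_value, default)
```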
When a column is identified as being not nullable, a value must be provided
for it and that value cannot be null.
The Timestamp format is configurable using a format string such as
YYYY-MM-DD.HH.MM.SS. Refer to IBM WebSphere DataStage and
QualityStage Version 8 Parallel Job Developer Guide, SC18-9891-00 for
details on format strings.
The order of rows does not matter, because the rows will be sorted in the
proper order by the DataStage jobs.
Tip: We recommend that you use naming conventions to identify the origin,
content, and date and time attributes of the SIF. Implement a directory
structure where multiple SIFs can be queued in a READY directory and moved
to a LOADING directory, before finally being moved to a COMPLETED
PROCESSING directory when complete.
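The queueing directory structure suggested in the tip can be sketched as follows (the directory names READY, LOADING, and COMPLETED come from the tip; the helper itself is illustrative):

```python
import shutil
from pathlib import Path

def promote_sif(sif_file: Path, from_dir: Path, to_dir: Path) -> Path:
    """Move a SIF between queue directories, for example
    READY -> LOADING when a load starts, then LOADING -> COMPLETED
    when processing finishes."""
    to_dir.mkdir(parents=True, exist_ok=True)
    target = to_dir / sif_file.name
    shutil.move(str(from_dir / sif_file.name), str(target))
    return target
```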
Figure 2-5 shows the high level flow of the Import SIF step.
In this flow, the SIF file is split by RT/ST into 18 groups. Each group passes
through a Column Import stage and a Duplicate Drop stage into its own data set,
while rejected rows (invalid RT/ST, failed recordization, failed columnization,
invalid date format, and duplicates) are funneled into an error file.
The IL_010_IS_Import_SIF job reads one or more SIF files through the File Set
facility of DataStage. During this step, the row number and input file name
are captured and appended to the input data before the end-of-record
(DOS Line Feed) characters. This is done to enable error reconciliation back to
the original input files.
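The annotation step can be sketched as follows; the delimiter and field order are illustrative, not the actual SIF layout, and for simplicity the sketch appends the fields after stripping the record terminator rather than before it:

```python
def annotate_sif_rows(lines, filename):
    """Append the source file name and row number to each SIF record
    so that any later error can be reconciled back to the exact line
    of the original input file."""
    annotated = []
    for row_number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        annotated.append(f"{record}|{filename}|{row_number}")
    return annotated
```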
If the file (set) fails this basic recordization, the job fails. However, the process
captures all such errors before failing. The errors are written to a special reject
flat file. When parsing the data in the columns, invalid values in RT/ST and
Admin_ID generate error log rows and are directed to a reject file named
SIF_Import_ERR_MSG.[batchid].txt. Errors detected with columns are called
columnization errors.
Statistics are collected on the number of input errors, the number of rows written
to each RT/ST link, and the number of rows rejected due to recordization and
columnization. Error handling includes recordization and columnization
processing as follows:
During record-level parsing, if a SIF row is not properly formed (record type,
sub-record type, admin_sys_tp_cd, admin_client_id_or_contract_id, data
string, end of record DOS Line Feed character), any additional columns
detected after the final expected column (as defined by the metadata) are
ignored and the following warning message is written to the Director log, as
shown in Example D-2 on page 277.
[“Import consumed only ‘m’ bytes of the record's ‘n’ bytes (no
further warnings will be generated from this partition)”]
Note: Carefully review the Director log output for such warnings, because
they do not appear in the RDP for MDM error logs.
During column parsing, the individual RT/ST record types are processed
through the Column Importer stage after being split. The number of columns
and data types as well as any NOT NULL SIF column restriction is enforced at
this stage. Note that this is not necessarily the same as a NOT NULL
restriction on a column in the target database.
Rejected data is written to an error data set with the record/offset number
within the source file, RT/ST, Admin columns along with an error code values.
One error data set is created for each RT/ST combination. In a later process
(Error Consolidation) all error data sets are consolidated and written to a
sequential file.
Note: A configuration option (Fail entire load if any row fails column
import?) specifies whether the load process should be halted or continued
on occurrence of this error. If this is set to allow the process to continue,
implicitly-related rows may be rejected in the Error Consolidation step.
Each of the 18 output data sets of the SIF parser is processed in a separate
run. Rows pass through a user exit (for custom validations), lookups against
the MDM code and reference data, a Transformer stage, and QualityStage
standardization, producing an output data set of validated rows. Rejected rows
(invalid codes, invalid date bounds, and so forth) are funneled into an error
file. The DS_PROCESSING_DATE parameter supplies the processing date used by
the lookups.
Prior to the first validation stage, you may perform some custom validation or
some additional defaulting or value substitution in the Pre-Code Table Validation
Exit. This occurs after the NOT NULL column rules have been applied. Whether
there was an error in the exit or not, the input row is not dropped at this point. If
an error is detected in the exit, it must set the “Error on Line Flag from User Exit”
column to 1. The row is then passed on to the MDM Code Table lookups. A
discussion of the Pre-Code Table Validation Exit is beyond the scope of this IBM
Redbooks publication.
Note: The key to MDM Code Tables requires the code value, a language
code, and the expiration date. The default language code from the MDM
Configuration Manager and the processing date will be used.
Each of the tests may write an error log row and set the 'Error on Line' flag, but
the data will be passed to the next edit to enable all possible errors to be
captured in a single pass. The exception is an error in metadata validation.
Rows that are to be rejected will generate an entry in the Error Log and
set the “Error on Line” flag to 1. All tests are performed on all rows. Each of the
following example tests (partial list) will place a 1 in an 'Error on Line' column and
generate an individual entry in the Error Log:
“Error on Line from User Exit” contains a 1.
A column is mandatory but not present.
A column is present but must be empty.
A value should exist on an MDM Code Table, and does not.
A parent key should be located, but was not.
A date column contains data in the wrong format.
The Error message log contains the record number of the row in the SIF, a
message number and description of the error including the column referred to,
and optionally a copy of the data or some snippet of the data that is important to
understanding the error.
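The single-pass pattern described above (set the flag, keep going, log every error) can be sketched as follows; the column names, the code-table shape, and the simplified ISO date check are illustrative, not the actual RDP job logic:

```python
import re

def validate_row(row, code_tables):
    """Run every edit on a parsed SIF row in a single pass: each failed
    test adds an error-log entry, but the row is not dropped until all
    tests have run, so all errors are captured at once."""
    errors = []
    if not row.get("name"):
        errors.append("MANDATORY_COLUMN_MISSING: name")
    gender = row.get("gender")
    if gender and gender not in code_tables.get("gender", set()):
        errors.append("CODE_TABLE_MISS: gender=%s" % gender)
    birth = row.get("birth_date")
    if birth and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", birth):
        errors.append("BAD_DATE_FORMAT: birth_date=%s" % birth)
    return {"error_on_line": 1 if errors else 0, "errors": errors}
```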
Special edits are performed for Person Name, Organization Name, and Address
with respect to standardization.
For the Name RT/STs, a configuration option ("Stan_Person_Name" and
"Stan_Org_Name" for Person Names and Organization Names respectively)
is used to specify whether standardization is to be performed by
QualityStage.
The Address RT/ST contains a column (OVERRIDE_IND) to indicate whether
QualityStage should perform standardization on a particular address. If the
value is N, then the normal processing occurs. If the value is Y then
standardization by QualityStage has been overridden and is bypassed.
Note: Phone number standardization does not have its own parameter. It
is driven by Address Standardization. If Address Standardization is on, the
phone number gets standardized as part of the IL_020_ContactMethod
process.
For both Address and Name, a configuration option ("Phonetic Coding Type")
is used to specify whether QualityStage should generate NYSIIS, or Soundex
phonetic values, or none.
Edits are performed to ensure the other data on the SIF is consistent with the
flags/options.
After all tests have been performed, the row is checked to see if the "Error On
Line" flag is a 1. If it is, the row is discarded and the number of rows discarded is
tracked. If it is not, the row is written to the "Valid RT/ST Data Set".
The validated data sets (person, organization name, and so forth) are joined
and checked; errors from each join, including referential integrity (RI)
violations, are funneled into an error file.
Two error logs are produced, one for Contact-related errors and the other for
Contract-related errors.
Error consolidation is the process that collects all errors including those from the
previous steps (Import SIF and Validation and Standardization) and RI validation
in this step and copies the data into two processing streams as follows:
One stream picks up the error severity associated with the errors and only
passes the row forward if the associated severity is greater than or equal to
the configuration option ("Party Drop Severity Level"). The errors kept are
sorted and duplicates removed so that only one row for each source system
key (SSK) is kept.
The other stream contains all previous errors and will be joined with the errors
generated by the following association test to consolidate them into one error
data stream for Contacts and another data stream for Contracts.
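The severity-based filtering of the first stream can be sketched as follows. This is an illustrative sketch following the table definition of DS_PARTY_DROP_SEVERITY_LEVEL (a party is dropped when any of its errors has severity <= the configured level, where 0 is worst and 10 least severe); the error-row shape is hypothetical:

```python
def parties_to_drop(error_rows, drop_severity_level):
    """Return one drop decision per source system key (SSK): a party
    is dropped when any of its errors has severity <= the configured
    'Party Drop Severity Level'. Deduplicates so each SSK appears
    at most once."""
    dropped = set()
    for err in error_rows:
        if err["severity"] <= drop_severity_level:
            dropped.add(err["ssk"])
    return sorted(dropped)
```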
Note: All prior jobs must have completed and will have consolidated their own
errors and written them to their own Error Parallel Data Sets. The Job
Sequencer ensures that this and other dependencies are controlled.
The next process is to drop rows with Association Errors. The process is built
with a configuration option ("Party Drop Severity Level") to specify if the process
should drop all rows for a new parent (Contact or Contract) if any of the data
associated with that parent is in error. Such dropped rows have an “error by
association” entry that is sent to the log.
Error log entries generated in this step are funneled together with all the errors
from the previous steps and a consolidated error file is created.
The validated contact data (with address, contact method, and organization
name data joined in) is split into person and organization streams and passed
to the match process, which outputs A1 matches and A2/B suspects.
If a party is matched to another party and the SIF also contains a Contact
Relationship row for the matched parties, we have a conflict. If the two parties are
the same party (A1) they cannot also have a party-to-party relationship because
there is only one party. In this situation, the match results will be overridden and
the match type is changed to an A2. A suspect row is generated but the parties
are not merged into one party. This processing is done before the generation of
implied matches.
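The conflict rule described above can be sketched as follows; the match and relationship structures are hypothetical, but the rule itself (an A1 pair that also has a Contact Relationship row is downgraded to A2 so a suspect row is generated instead of a merge) follows the text:

```python
def resolve_relationship_conflict(match, contact_relationships):
    """Downgrade an A1 match to A2 when the same pair of parties also
    appears as a party-to-party Contact Relationship in the SIF: one
    party cannot have a relationship with itself, so the records are
    kept separate and flagged as suspects instead of being merged."""
    pair = frozenset((match["party_a"], match["party_b"]))
    if match["category"] == "A1" and pair in contact_relationships:
        return {**match, "category": "A2"}
    return match
```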
Note: A LOBa Relationship SIF row can override the results of the match
process. A configuration option (Allow Match Across LOB) will identify if
matching across lines of business (LOB) is allowed. If configuration options
specify that matching across LOB is not allowed (default is “allowed”), match
pairs identified as A1 are sent through a process to determine if a group of
matched records must be broken into multiple groups based on LOB.
a. Large enterprises often have multiple lines of business (LOB). Privacy laws and company practices
often restrict the sharing of data across lines of business. An example is an insurance company
that has individual life, group life, and property and casualty lines of business. A party may be a
group life and individual life customer only. In this example, the party would have two lines of
business relationships.
The LOB relationship data is used to split the matched person and organization
groups, producing A2, A3, and B suspects where matching across LOB is not
allowed.
Match Execution outputs data where the input rows and candidate rows are
grouped and include a Match Category (A1, A2, and so forth) and Match Score.
If one or more contacts have an A1 match to each other, the data associated
with the contacts are merged at the row level, not at the column level.
Note: From a survivorship perspective, the last row’s values are retained. It
should be noted that the concept of last row is not deterministic because it
depends upon the output of a sort.
If one or more contacts do not have an A1 match, then suspect rows are
created for each match.
The MDM Server model provides the ability to store addresses as a top level
subject domain and then link them to their various usages. Multiple contacts,
contact methods within or across contacts, and contracts can share the same
address. Duplicate addresses may exist within the SIF input or between the SIF
and rows already in the database. Removal of duplicates and altering references
to point to the survivor is achieved using a cryptographic checksum. The
critical address columns (specified by the
DS_MD5_CRITICAL_ADDRESS_COLUMNS parameter) that determine whether
an address is a duplicate are passed into MD5 (Message-Digest Algorithm 5), a
widely used cryptographic hash function with a 128-bit hash value. The
resulting value is stored with the address. This value is also calculated for
inserts and updates received in the SIF. The SIF values can then be used to
check for uniqueness within the SIF and to do a quick join/lookup to the
database to see whether a duplicate already exists.
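The checksum technique can be sketched as follows. The column names and the normalization (trimming and uppercasing before hashing) are illustrative assumptions; only the use of MD5 over the critical address columns comes from the text:

```python
import hashlib

def address_checksum(address, critical_columns):
    """MD5 checksum over the critical address columns (named by the
    DS_MD5_CRITICAL_ADDRESS_COLUMNS parameter in the real jobs).
    Two addresses with the same checksum are treated as duplicates,
    so comparing 128-bit digests replaces comparing every column."""
    key = "|".join((address.get(col) or "").strip().upper()
                   for col in critical_columns)
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```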
With RDP for MDM, you have two choices for loading the MDM data repository,
Bulk Load and Upsert. Bulk Load can only be done natively, because there is
no ODBC support for bulk loading.
Figure 2-10 shows the load flow of the operational tables and the history
tables.
Tip: For superior performance, Bulk Load is the preferred loading choice.
In the Bulk Load flow, Load Ready Data is loaded into the operational tables
and the records are copied into the history tables.
Tip: If you do not want to install and configure the components required for
native access to the database, then Upsert is the preferred loading method
over Bulk Load.
In the Upsert flow, Load Ready Data is loaded into the operational tables and
copied into the history tables; rejected records are written to rejects files.
You may also choose to load the history tables during the load by setting the
LOAD_HISTORY_FLAG configuration parameter to “C” for compound1 history
records (default), “S” for simple2 history records, or none.
1. Includes the history of all the changes that have occurred, plus the original record.
2. Includes only the history of all the changes that have occurred. If a record never changed, it will not
be in the history table.
The RedHat Enterprise Linux 4 platform was chosen as the platform for the DB2
for LUW MDM repository.
The environment consisted of two RedHat Enterprise Linux hosts,
orion.itsosj.sanjose.ibm.com (9.43.86.101) and phoenix.itsosj.sanjose.ibm.com
(9.43.86.102), hosting the legacy systems, the MDM repository, the IADB, and
the DataStage Engine of IBM InfoSphere Information Server, with users and
administrators connecting to both.
Attention: Our objective was to showcase the RDP for MDM implementation
on a RedHat Enterprise Linux platform. For convenience, we chose our data
sources and target MDM repository to be hosted on a single Linux platform,
even though we recognize that in a real world environment these systems
would likely be hosted on an eclectic mix of operating systems, servers, and
database management systems. The configuration we used was only meant
to showcase the functionality of the RDP for MDM solution, and should in no
way be seen as meeting the scalability and performance requirements of
your business solution.
Note: DQA is assumed to have occurred and only the results of this task
are presented here.
2. Review the MDM data model and customize it to the specific requirements of
your organization.
If you customize the data model, the RDP for MDM jobs need to be modified as well.
Note: Customization of the MDM data model is not covered in this IBM
Redbooks publication.
3. Create the code mapping tables from source to SIF, and update the MDM
code tables with domain values if appropriate.
4. Create a canonical form from the various data sources (three sources in this
scenario).
Attention: The canonical form was a concept we invented for this scenario
and is not defined in the RDP for MDM solution.
Note: In our case, we thoroughly cleaned the data prior to creating the SIF
so that no errors occurred.
However, to show the error messages generated by the RDP for MDM
jobs, we created other SIFs containing the most frequently occurring errors
and ran them through the RDP for MDM jobs. The purpose was to show the
correspondence between a particular error and the error messages
generated for it by the RDP for MDM jobs. This is described in Appendix D,
“Error processing” on page 271.
9. Verify the successful loading of the MDM repository using the MDM Server
Reporting facility.
10.Resolve any suspect parties that were not automatically collapsed by the load
jobs but are suspected to be duplicates, using the MDM Server Data
Stewardship UI.
Note: The Delta RDP for MDM solution was not available at the time of
writing of this IBM Redbooks publication. It will be addressed later as an
update to this IBM Redbooks publication or a separate IBM Redpaper.
We also ran the RDP for MDM jobs with SIFs containing commonly encountered
problems to show the correspondence between a specific error condition in the
SIF and the error messages generated by the RDP for MDM jobs. This is
described in Appendix D, “Error processing” on page 271.
Figure 3-2 Rapid MDM approach used in the scenario for the initial load
Briefly, a DQA is performed on the data sources to identify the master data
columns and the domain values in these master data columns for inclusion in the
MDM repository. The MDM data model’s master data columns and
corresponding domain values are reviewed against those of the three data sources.
Based on this review, the MDM Code Reference tables may need to be updated
with additional values, and source-to-SIF code mapping tables generated
between the source master data columns and corresponding MDM master data
columns.
The master data from the data sources is loaded into a canonical form that
closely mirrors the format of the SIF records consumed by the RDP for MDM
jobs. During this process, you need to ensure that all MDM required columns (as
described in Appendix B, “Standard Interface File details” on page 247) have
valid data in them to avoid rejection by the RDP for MDM jobs. It is more efficient
to detect and fix these errors early in the cycle (potentially in the source system
itself) than after the RDP for MDM jobs have flagged them.
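Such an early pre-check can be sketched as follows. The column names below are hypothetical placeholders; the authoritative list of required SIF columns is in Appendix B:

```python
# Hypothetical subset of required SIF columns for a person record.
REQUIRED = ["CUSTOMERID", "SRCSYSTEMID", "LASTNAME"]

def validate_row(row: dict) -> list:
    """Return the list of required columns that are missing or empty."""
    return [c for c in REQUIRED if not str(row.get(c, "")).strip()]

rows = [
    {"CUSTOMERID": "10000024", "SRCSYSTEMID": "1", "LASTNAME": "Jensen"},
    {"CUSTOMERID": "10000025", "SRCSYSTEMID": "1", "LASTNAME": ""},
]
for r in rows:
    missing = validate_row(r)
    if missing:
        # Flag the row before it reaches the RDP for MDM jobs.
        print(r["CUSTOMERID"], "rejected, missing:", missing)
```

Running a check of this kind against the canonical form lets you route bad rows back to the source system owners before the load, rather than mining the reject files afterward.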
The purpose of creating a canonical form is to have a single format for validating
the efficacy of the RDP for MDM rulesets, and for simplifying the DataStage jobs
for creating the SIF, regardless of the number of data sources involved. Typically,
the data used for validating the efficacy of the RDP for MDM rulesets would be a
representative sample of all the data. If the RDP for MDM rulesets are modified
to address your organization’s data, these modified rulesets must replace the
corresponding default ones in the RDP for MDM jobs.
The data in the canonical form is then loaded into the SIF using the
source-to-SIF column mapping templates you have created, and the
source-to-SIF code mapping tables generated earlier.
Important: Before the RDP for MDM jobs can be run, you must drop all
referential integrity constraints and triggers defined in the MDM repository.
The referential constraints and triggers must be recreated before the MDM
repository can be considered operational and consumable by business
applications.
This overall flow is covered in more detail for our particular scenario as follows:
TBank checking, savings, and loans systems
Data Quality Assessment (DQA)
Create a canonical form from the data sources
Validate efficacy of the RDP for MDM rulesets and modify to suit
Create SIF
Execute RDP for MDM jobs
Verify successful load
The DDL of the three tables is shown in Example 3-1 on page 76, while the data
content in each of these tables is shown in Figure 3-3 on page 78 through
Figure 3-11 on page 86. Note that all the columns are defined as being nullable
with no Primary Key defined. In a real-world environment, you would most likely
have a Primary Key defined for each table.
The master data columns in each table are highlighted in bold in Example 3-1 on
page 76.
For example, the customer Anton T & Larue Jensen appears as 10000024 in the CHECKING system, 20000024 in SAVINGS, and 30000019 in LOAN.
Because the business is the ultimate recipient and user of the data resulting from
the integration effort, the success of a DQA is dependent upon the ability and
commitment of the business community to participate in the process, and more
importantly, to resolve semantic and business rule differences at the functional
level. Figure 3-12 on page 89 provides a high-level overview of the main steps of
the DQA process.
Prepare the data for assessment
Select the data sources to be investigated and analyzed.
Conduct data discovery
The DA and SME perform the investigation and analyses using tools such as
IBM WebSphere Information Analyzer and IBM InfoSphere AuditStage. This
involves checking metadata integrity, structural integrity, entity integrity,
relational integrity, and domain integrity.
Document data quality issues and decisions
After all information about data quality is known, the appropriate data
alignment and cleansing decisions can be made and implemented.
Figure 3-12 depicts the IT data analyst and SME using Information Analyzer and
AuditStage to run full-volume profiling of the staged source(s), review all the
targeted information, and reach data alignment decisions. The checks cover:
Metadata/domain integrity: column analysis, completeness, consistency, and pattern consistency, including translation table creation
Domain integrity: business rule identification and validation
Structural integrity: table analysis and key analysis
Entity integrity: duplicate analysis and targeted data accuracy
Relational integrity: cross-table analysis and redundancy analysis
The IBM InfoSphere Information Server product provides three tools for data
assessment:
IBM InfoSphere Information Analyzer
This product enables you to assess large volumes of data in a fraction of the
time that manual analysis would require. Through its Column Analysis, Primary
Key Analysis and Cross-Table Analysis functions, IBM InfoSphere Information
Analyzer enables systematic analysis and reporting of results, allowing the
data analyst and subject matter expert to focus on the real problem of data
quality issues.
IBM InfoSphere QualityStage
This product complements IBM InfoSphere Information Analyzer by
investigating free-form text fields such as names, addresses, and
descriptions. IBM InfoSphere QualityStage allows you to define rules for
standardizing free-form text domains, which is essential for effective
probabilistic matching of potentially duplicate master data records. This level
of sophisticated data assessment is critical to understanding the total quality of your data.
In this scenario, we focus on determining the domain values in the columns in the
source systems that need to be mapped to the corresponding columns in the
MDM repository. The determination of the domain values in the source systems
may necessitate adding new domain values to the MDM repository to
accommodate values that exist in the source systems.
Note: As part of the implementation preparation, the MDM code tables must
be populated with appropriate values. The steps in the MDM implementation
process to determine what these values should be, and how they are loaded,
are not within the scope of this IBM Redbooks publication.
For example, if the source system rates a customer into five categories (one
through five) and the MDM repository only allows four categories, you will need
to add an additional category to the MDM repository code reference table for
customer rating. Also, because the SIF must be loaded with domain values
expected by the MDM repository, the process creating the SIF must map the
values in the source systems to the values in the MDM repository. Mapping
tables are required for each code reference table in the MDM repository. For
example, gender may be stored as 0 (female) and 1 (male) in the source
systems, while the MDM repository expects M (male) and F (female). This
requires a mapping table for gender that maps 0 to F and 1 to M.
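A source-to-SIF code mapping of this kind can be sketched as follows. The mapping contents come from the scenario, but the lookup function itself is a hypothetical illustration, not the generated DataStage logic:

```python
# Hypothetical in-memory code mapping tables, one per MDM code column.
GENDER_MAP = {"0": "F", "1": "M"}                  # SAVINGS 0/1 -> MDM F/M
CUSTSTATUS_MAP = {"A": "1", "B": "2", "C": "3", "D": "4"}

def map_code(value, mapping):
    """Map a source domain value to the MDM domain value. Nulls pass
    through; unmapped values are surfaced rather than loaded as bad codes."""
    if value is None:
        return None
    if value not in mapping:
        raise ValueError(f"unmapped source value: {value!r}")
    return mapping[value]

print(map_code("0", GENDER_MAP))       # F
print(map_code("A", CUSTSTATUS_MAP))   # 1
```

Raising on an unmapped value is a deliberate choice here: it forces the gap to be resolved in the code reference tables instead of silently loading an invalid domain value into the SIF.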
Table 3-2 shows the columns that need to be mapped between the sources and
the target MDM repository. This list was arrived at after an analysis of the code
reference tables in the MDM repository and the ones in the source systems.
Table 3-2 Code table mapping between the sources and the MDM repository
Country: COUNTRY in CHECKING (US, (null), and other country codes), COUNTRY in SAVINGS (US, (null)), and COUNTRY in LOAN (US, (null)) map to COUNTRY_TP_CD in CDCOUNTRYTP
Gender: GENDER in SAVINGS (1, 0, (null)) and GENDER in LOAN (M, F, (null))
Attention: If the MDM repository is populated from the same column (such as
GENDER) in multiple data sources, it is possible that there could be
overlapping values that have different semantic meanings. For example, in
one system, the value 0 could represent a female, while 0 in another system
represents a male. When creating the canonical form, semantic conflicts must
be resolved before populating the column. This situation did not exist in our
scenario.
Figure 3-17 on page 96 shows the mapping between the master data columns in
the source systems to the corresponding columns in the canonical form table
(Example 3-2 on page 95).
There are two columns (SRCSYSTEMID and ZIPCODE) that do not have
corresponding columns in the source. The SRCSYSTEMID column is
generated based on the source system columns being mapped (1 for
Checking, 2 for Savings, and 3 for Loan), while the zip code is embedded in
other columns in the source systems and therefore not explicitly mapped.
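The derivation of these two columns can be sketched as follows. The extraction of a US zip code by regular expression is our simplifying assumption for illustration, not the actual job logic:

```python
import re

# SRCSYSTEMID is derived from which source table the row came from.
SRC_IDS = {"CHECKING": "1", "SAVINGS": "2", "LOAN": "3"}

def to_canonical(source: str, address: str) -> dict:
    # The zip code is not a separate source column, so we pull a
    # five-digit US zip out of the address text (a simplifying assumption).
    m = re.search(r"\b(\d{5})(?:-\d{4})?\b", address)
    return {"SRCSYSTEMID": SRC_IDS[source],
            "ZIPCODE": m.group(1) if m else None}

print(to_canonical("SAVINGS", "555 Bailey Ave, San Jose, CA 95141"))
# {'SRCSYSTEMID': '2', 'ZIPCODE': '95141'}
```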
In Figure 3-17, the master data columns of the CHECKING, SAVINGS, and
LOAN source systems (columns such as CUSTOMERID, ACCOUNTID, NAME,
ADDRESS, STREET, CITY, COUNTRY, PHONE, CELLPHONE, EMAIL, SSN,
PASSPORTNB, GENDER, DOB, and DOD) are mapped to the corresponding
canonical form columns (SRCSYSTEMID, CUSTOMERID, ACCOUNTID,
FIRSTNAME, LASTNAME, INITIALS, FREEFORMNAME, STREETADDRESS,
FREEFORMADDRESS, CITY, COUNTRY, ZIPCODE, PHONENB, CELLNB,
EMAIL, SSN, PASSPORTNB, DRIVERLICNB, AGEVERIFICATIONNB, DOB,
DOD, GENDER, MARITALSTATUS, NATIONALITY, CUSTOMERSTATUS,
CUSTOMERPERF, WORKSTATUS, PREF_LANGUAGE, SALUTATION,
SOLICITATIONALLOW, and STARTDATE).
SRCSYSTEMID is assigned a value of 1 (checking), 2 (savings) or 3 (loans) depending upon the source
ZIPCODE has no assignment from any of the input sources
Note: We assume that the Data Quality Assessment (DQA) has taken place
previously, and the required ODBC data sources (see Figure 3-18 on page 98,
which includes the definition of the TBANK and IADB data sources) have been
defined for both the sources and target systems. All the data sources that
were imported using the InfoSphere Information Server console are also available
to FastTrack users. The metadata acquired from these data sources is used to
identify the target columns and tables in FastTrack, and to configure ODBC
connectivity in the generated DataStage jobs.
1 Template jobs for more complex requirements
Figure 3-19 on page 101 through Figure 3-31 on page 112 describe the main
screenshots in creating a specification that maps the SAVINGS source columns
to the corresponding CANONICAL target columns using FastTrack, and the
generation and configuration of the DataStage job for that specification.
Note: The mapping is repeated for the CHECKING and LOAN sources as
well, but that is not repeated here.
To define the sources to canonical form table target mapping, perform the
following steps:
1. Log in to the appropriate server (virgo) with the user ID isadmin, who is
assumed to have the required permissions to access InfoSphere Information
Server, as shown in Figure 3-19 on page 101.
2. FastTrack source-to-target mapping specifications are contained in projects.
We opened a previously created project named SourceToSif_Canonical for
our mapping specification, as shown in Figure 3-20 on page 102.
Note: The mapping specification is complete when all the columns have
been mapped correctly.
Important: The generated job shown in Figure 3-31 on page 112 is the job
corresponding to CHECKING_TO_CANONICAL instead of
SAVINGS_TO_CANONICAL. This was an error on our part while capturing
screenshots.
Figure 3-32 on page 114 through Figure 3-35 on page 115 show the main
screenshots of the execution of the generated job. After all the sources were
processed, the partial contents of the canonical form table are shown in
Example 3-3 on page 116.
3.5.4 Validate efficacy of the RDP for MDM rulesets & modify to suit
The purpose of creating a canonical form is to have a single format for validating
the efficacy of the RDP for MDM rulesets, and for simplifying the DataStage jobs
for creating the SIF, regardless of the number of data sources involved. The data
used for validating the efficacy of the RDP for MDM rulesets should be a
representative sample of all the data.
Note: In our test environment, the volume of data was quite small. We
therefore chose to use all of it as input to this process.
If the RDP for MDM rulesets are modified to address your organization’s data,
then these modified rulesets must replace the corresponding default ones in the
RDP for MDM jobs.
Figure 3-36 on page 119 through Figure 3-42 on page 123 show some of the
main screenshots that describe the import process. To import OOTB RDP for
MDM rulesets into a DataStage project, perform the following steps:
1. Launch the WebSphere DataStage and QualityStage Designer. From the task
bar, navigate to Import → DataStage Components, as shown in Figure 3-36
on page 119.
2. In the DataStage Repository Import window (Figure 3-37 on page 119),
specify the RDP for MDM jobs dsx file. Select the Import selected radio
button to select the components to import. Click OK.
3. Figure 3-38 on page 120 through Figure 3-40 on page 122 show the available
components. Because we were only interested in the components related to
name and address standardization, we only selected them (four shared
containers and all the rulesets) and clicked OK, as shown in Figure 3-40 on
page 122. The progress of the import of the selected components is shown in
Figure 3-41 on page 122.
At the completion of the import, you see the imported components in the
ValidationStanContainers in the navigation pane in Figure 3-42 on page 123.
You can now proceed to validate the efficacy of the OOTB RDP for MDM rulesets
in the standardization job in “Validating the RDP for MDM rulesets on the
standardization job” on page 123.
Figure 3-37 Import OOTB RDP for MDM rulesets into standardization job,
part 2 of 7
Figure 3-41 Import OOTB RDP for MDM rulesets into standardization job,
part 6 of 7
Our objective was to ensure that most, if not all of the input data was loaded
by RDP for MDM into the MDM data repository. Towards this end, we focused
on ensuring that the critical columns (CITY in this case) had the necessary
information. This led us to work on the USPREP ruleset. (The USPREP
ruleset is not supplied with the RDP for MDM jobs. It is a part of the standard
QualityStage ruleset.) Due to time constraints, we did not modify
the other rulesets to enhance the quality of the standardization performed by
the OOTB RDP for MDM rulesets. However, given our recommendation to
work with all the OOTB RDP for MDM rulesets, we demonstrate here the
process of validating the efficacy of the OOTB RDP for MDM rulesets and
modifying them if necessary for subsequent replacement of the original OOTB
rulesets.
Figure 3-43 on page 127 through Figure 3-59 on page 138 show some of the
main screenshots that describe the validation process. Perform the following
steps to validate RDP for MDM rulesets on the standardization job:
1. Launch the WebSphere DataStage and QualityStage Designer and display
the VSSTANAddress shared container on the Designer canvas, as shown in
Figure 3-43 on page 127 through Figure 3-45 on page 129. VSSTANAddress
is the shared container that contains the USPREP stage to be validated. This
stage processes address data from one or more source columns and moves it
into appropriate domain columns. Because we were only interested in the
address fields, our focus was on reviewing the street address in the
AddressDomain_USPREP column and the city name, state, and zip code in
the AreaDomain_USPREP column.
2. The USPREP stage was inspected to see which rulesets were used and how
they were used. We did not modify it; we reviewed it in order to generate
a corresponding standardization job (J02_ORGUSPREP_STAN) to test the
OOTB RDP for MDM rulesets.
1 FREEFORMADDRESS is really the column we wanted to target, but we knew that either
FREEFORMADDRESS or STREETADDRESS contained the data we needed. Therefore, this
setup ensures that it generates one address to be standardized for each row.
Figure 3-46 Validate RDP for MDM ruleset on standardization job, part 4 of 17
Figure 3-48 Validate RDP for MDM ruleset on standardization job, part 6 of 17
Figure 3-50 Validate RDP for MDM ruleset on standardization job, part 8 of 17
Figure 3-55 Validate RDP for MDM ruleset on standardization job, part 13 of 17
Figure 3-57 Validate RDP for MDM ruleset on standardization job, part 15 of 17
1 The USPREP ruleset is modified rather than ORGUSPREP, because that is the ruleset in the RDP
for MDM jobs which would need to be replaced.
Note: The override codes A (for ADDRESS) circled in the Enter Input
Pattern text field correspond to the literal ZQADDRZQ explained earlier.
The boxed values correspond to the characters overridden with A in the
Override Code column of the Current Pattern List.
3. The modified rulesets are provisioned as shown in Figure 3-63 on page 143.
You need to provision new, copied, or customized rule sets in the Designer
client before you can compile and run a job that uses them.
4. A copy of the J02_ORGUSPREP_STAN job is created as
J12_USPREP_STAN using a stage named USPREP, as shown in Figure 3-64
on page 143.
The USPREP stage is modified to refer to the canonical form data columns
STREETADDRESS and FREEFORMADDRESS, as shown in Figure 3-65 on
page 144.
Figure 3-66 on page 144 shows the execution of this job.
5. Figure 3-67 on page 145 shows the results of processing by the modified
USPREP ruleset, which shows the AreaDomain_USPREP column populated
with the city name for the relevant rows. This indicates successful input
pattern overrides.
6. Figure 3-68 on page 146 through Figure 3-70 on page 147 show the
successful export of the modified USPREP ruleset as a dsx file
(USPREPCHANGED.dsx).
We proceeded to import the modified USPREP ruleset into the RDP for MDM
jobs as described in “Import modified RDP for MDM rulesets into RDP for MDM
project” on page 147.
Figure 3-64 Override Input Pattern & rerun the standardization job, part 5 of 11
Figure 3-66 Override Input Pattern & rerun the standardization job, part 7 of 11
Figure 3-70 Override Input Pattern & rerun the standardization job, part 11 of 11
Import modified RDP for MDM rulesets into RDP for MDM
project
Figure 3-71 on page 149 through Figure 3-75 on page 151 describe some of the
main screenshots involved in importing the modified RDP for MDM rulesets in
“Override Input Pattern, rerun the standardization job, and export modified
ruleset” on page 138 into the RDP for MDM project using WebSphere DataStage
Designer.
Note: After provisioning, the job using the modified rulesets must be
recompiled.
With the creation of the SIF (that proceeded in parallel and is described in 3.5.5,
“Create SIF” on page 151), we proceeded to execute the RDP for MDM jobs as
described in 3.5.6, “Execute RDP for MDM jobs” on page 175.
Figure 3-73 Import modified RDP for MDM rulesets into RDP for MDM jobs,
part 3 of 5
Figure 3-74 Import modified RDP for MDM rulesets into RDP for MDM jobs,
part 4 of 5
Because we had used Information Analyzer in the DQA, we briefly describe the
process of creating one reference table to serve as a lookup for mapping values
in the canonical form table to code values stored in the MDM code tables.
Figure 3-76 on page 154 through Figure 3-81 on page 159 describe the main
screenshots for creating a single reference table. It involves determining the code
values in the appropriate MDM code table and then creating the reference table
in Information Analyzer using these values as follows:
1. Figure 3-76 on page 154 shows the navigation pane in the MDM Server UI.
Navigate to Administration Console → Navigation tree → Code Tables. In
the content pane, select a code table of interest (CdAdminSysTp) from the
drop-down list and click GO.
2. Figure 3-77 on page 155 shows the list of valid values in this table. We added
code values for the Checking (1000000), Savings (1000001), and Loan
(1000002) systems through this GUI.
Note: Repeat this process for all the code tables of interest in the MDM data
repository for which reference tables need to be created.
Note: Repeat this process for all the columns in the canonical form table
that have code values.
Table 3-3 on page 160 summarizes the code table value mappings between the
source and the target MDM data repository for our scenario.
CUSTOMERSTATUS maps to CLIENT_ST_TP_CD in CDCLIENTSTTP: A to 1, B to 2, C to 3, D to 4, and (null).
Generate the SIF from the data stored in the canonical form
data table
We used FastTrack Version 8.0.1 to define the mapping between the columns in
the canonical form data table and the SIF, and generated a DataStage job to load
the SIF tables. A subsequent DataStage job extracted the data from the SIF
tables and created the SIF file for processing by the RDP for MDM jobs.
Because the FastTrack process was similar to the one described in the creation
of the canonical form data table, it is not repeated here.
Figure 3-83 on page 165 through Figure 3-88 on page 170 show the main
screenshots of the lookup of the reference tables used to transform the
code table values.
Table 3-4 on page 163 shows the mapping between the columns in the canonical
form data table to the corresponding SIF columns.
Figure 3-89 on page 171 shows the execution of the job jpGenerateOutputSIF
that extracts the contents of the SIF tables and generates the SIF file with
pipe (|) delimiters between the columns.
Example 3-4 on page 171 shows the partial contents of the SIF file generated by
this process corresponding to the canonical form data.
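Writing pipe-delimited SIF records of this kind can be sketched as follows. The row contents are hypothetical, and the actual file is produced by the jpGenerateOutputSIF DataStage job:

```python
import csv, io

# Hypothetical SIF rows: record type, source key, source system, data columns.
rows = [
    ["PH", "10000024", "1", "Anton T", "Jensen"],
    ["PA", "10000024", "1", "555 Bailey Ave", "San Jose"],
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", lineterminator="\n")
writer.writerows(rows)
print(buf.getvalue(), end="")
# PH|10000024|1|Anton T|Jensen
# PA|10000024|1|555 Bailey Ave|San Jose
```

In a real extract, `buf` would be a file handle opened against the SIF output directory, and nullable columns would be emitted as empty fields between consecutive delimiters.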
1 These tables map one-to-one with the RT/ST combinations described in Appendix B.1, “SIF details”
on page 248. The DDL for these tables can be downloaded from the IBM Redbooks publications Web
page:
http://www.redbooks.ibm.com/redpieces/abstracts/sg247704.html
CANONICAL_TBL.FREEFORMADDRESS,CANONICAL_TBL.STREETADDRESS PA ADDRESS.ADDR_LINE_ONE
CANONICAL_TBL.CUSTOMERID ADDRESS.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE ADDRESS.ADMIN_SYS_TP_CD
CANONICAL_TBL.CITY ADDRESS.CITY_NAME
CTS_LKP_COUNTRY.TRANSFORMVALUE ADDRESS.COUNTRY_TP_CD
CANONICAL_TBL.ZIPCODE ADDRESS.POSTAL_CODE
CANONICAL_TBL.CUSTOMERID PP CONTACT.ADMIN_CLIENT_ID
CTS_LKP_SRCSYSTEM.TRANSFORMVALUE CONTACT.ADMIN_SYS_TP_CD
CTS_LKP_AGEVERDOC.TRANSFORMVALUE CONTACT.AGE_VER_DOC_TP_CD
CANONICAL_TBL.DOB CONTACT.BIRTH_DT
CTS_LKP_NATIONALITY.TRANSFORMVALUE CONTACT.CITIZENSHIP_TP_CD
CTS_LKP_CUSTPERF.TRANSFORMVALUE CONTACT.CLIENT_IMP_TP_CD
CTS_LKP_CUSTSTATUS.TRANSFORMVALUE CONTACT.CLIENT_ST_TP_CD
CANONICAL_TBL.DOD CONTACT.DECEASED_DT
CTS_LKP_GENDER.TRANSFORMVALUE CONTACT.GENDER_TP_CODE
CTS_LKP_MARITALST.TRANSFORMVALUE CONTACT.MARITAL_ST_TP_CD
CTS_LKP_PREFLANG.TRANSFORMVALUE CONTACT.PREF_LANG_TP_CD
CANONICAL_TBL.CUSTOMERID PC CONTACTMETHOD.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTACTMETHOD.ADMIN_SYS_TP_CD
CANONICAL_TBL.CELLNB (when it is NOT NULL) CONTACTMETHOD.REF_NUM
CANONICAL_TBL.CUSTOMERID PC CONTACTMETHOD.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTACTMETHOD.ADMIN_SYS_TP_CD
CANONICAL_TBL.EMAIL (when it is NOT NULL) CONTACTMETHOD.REF_NUM
CANONICAL_TBL.CUSTOMERID PC CONTACTMETHOD.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTACTMETHOD.ADMIN_SYS_TP_CD
CANONICAL_TBL.PHONENB (when it is NOT NULL) CONTACTMETHOD.REF_NUM
CANONICAL_TBL.ACCOUNTID CH CONTRACT.ADMIN_CONTRACT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTRACT.ADMIN_SYS_TP_CD
CANONICAL_TBL.ACCOUNTID CC CONTRACTCOMPONENT.ADMIN_CONTRACT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTRACTCOMPONENT.ADMIN_SYS_TP_CD
CTS_LKP_PRODTP.TRANSFORMVALUE CONTRACTCOMPONENT.PROD_TP_CD
CANONICAL_TBL.CUSTOMERID CR CONTRACTROLE.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTRACTROLE.ADMIN_CLIENT_SYS_TP_CD
CANONICAL_TBL.ACCOUNTID CONTRACTROLE.ADMIN_CONTRACT_ID
CTS_LKP_SRCID.TRANSFORMVALUE CONTRACTROLE.ADMIN_SYS_TP_CD
CTS_LKP_PRODTP.TRANSFORMVALUE CONTRACTROLE.PROD_TP_CD
REGIONS_STAGING.REGION_ID HN HIERARCHY_NODE.ADMIN_CLIENT_ID
REGIONS_STAGING.REGION_DESCRIPTION HIERARCHY_NODE.DESCRIPTION
REGIONS_STAGING.REGION_ID HN HIERARCHY_REL.ADMIN_CLIENT_ID_CHILD
REGIONS_STAGING.PARENT_REGION_ID HIERARCHY_REL.ADMIN_CLIENT_ID_PARENT
(when PARENT_REGION_ID IS NOT NULL)
REGIONS_STAGING.REGION_ID HN HIERARCHY_UP.ADMIN_CLIENT_ID
REGIONS_STAGING.REGION_DESCRIPTION HIERARCHY_UP.DESCRIPTION
(when PARENT_REGION_ID is null)
CANONICAL_TBL.CUSTOMERID PI IDENTIFIER.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE IDENTIFIER.ADMIN_SYS_TP_CD
CANONICAL_TBL.DRIVERLICNB (when it is NOT NULL) IDENTIFIER.REF_NUM
CANONICAL_TBL.CUSTOMERID PI IDENTIFIER.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE IDENTIFIER.ADMIN_SYS_TP_CD
CANONICAL_TBL.PASSPORTNB (when it is NOT NULL) IDENTIFIER.REF_NUM
CANONICAL_TBL.CUSTOMERID PI IDENTIFIER.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE IDENTIFIER.ADMIN_SYS_TP_CD
CANONICAL_TBL.SSN (when it is NOT NULL) IDENTIFIER.REF_NUM
CANONICAL_TBL.CUSTOMERID PH PERSONNAME.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE PERSONNAME.ADMIN_SYS_TP_CD
CANONICAL_TBL.FREEFORMNAME PERSONNAME.FREE_FORM_NAME
CANONICAL_TBL.FIRSTNAME,CANONICAL_TBL.FREEFORMNAME PERSONNAME.GIVEN_NAME_ONE
CANONICAL_TBL.FREEFORMNAME,CANONICAL_TBL.LASTNAME PERSONNAME.LAST_NAME
CTS_LKP_SALUTATION.TRANSFORMVALUE PERSONNAME.PREFIX_NAME_TP_CD
CANONICAL_TBL.CUSTOMERID CL ROLELOCATION.ADMIN_CLIENT_ID
CTS_LKP_SRCID.TRANSFORMVALUE ROLELOCATION.ADMIN_SYS_TP_CD
CANONICAL_TBL.ACCOUNTID ROLELOCATION.ADMIN_CONTRACT_ID
CTS_LKP_SRCID.TRANSFORMVALUE ROLELOCATION.ADMIN_CLIENT_SYS_TP_CD
CTS_LKP_PRODTP.TRANSFORMVALUE ROLELOCATION.PROD_TP_CD
Deactivate RI constraints
The following command was used to deactivate the RI constraints:
db2 -tsvf Deactivate_FK.sql
DB_INSTANCE db2inst1
DB_SCHEMA DB2INST1.a
DB_USERID db2inst1
$APT_IMPORT_PATTERN_USES_FILESET_MOUNTED True
$APT_STRING_PADCHAR (blank)
DS_PARALLEL_APT_CONFIG_FILE /opt/IBM/InformationServer/Server/Configuratio
ns/MDM_Default.apt
DS_SEQUENTIAL_APT_CONFIG_FILE /opt/IBM/InformationServer/Server/Configuratio
ns/MDM_1X1.apt
DS_LANGUAGE_TYPE_CODE 100
FS_DATA_SET_HEADER_DIR /data/RDP/DATA/
FS_ERROR_DIR /data/RDP/ERROR/
FS_LOG_DIR /data/RDP/LOG/
FS_PARAM_SET_DIR ./ParameterSets/
FS_REJECT_DIR /data/RDP/REJECT/
FS_SK_FILE_DIR /data/RDP/SK/
FS_TMP_DIR /data/RDP/TMP/
FS_HIERARCHY_SIF_FILE_PATTERN /data/RDP/SIF_IN/canonical_1/*.hsif
FS_SIF_FILE_PATTERN /data/RDP/SIF_IN/canonical_1/*.sif
$APT_IMPORT_PATTERN_USES_FILESET True
$APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS True
$APT_SORT_INSERTION_OPTIMIZATION True
QS_MATCH_PERSON_NATIDc I1
QS_PERFORM_ORG_MATCHd 1
QS_PERFORM_PERSON_MATCHe 1
QS_STAN_ADDRESSf 1
QS_STAN_ORG_NAMEg 1
QS_STAN_PERSON_NAMEh 1
a. Period is required
b. Default is the passport number — this should be I2 which is the Corporate Tax Identification
c. The default setting of C2 equates to Business Phone Number, which is not a reasonable national id document. We therefore changed it to I1, which
is SSN
d. We chose to perform Org match
e. We chose to perform Person match
f. We chose to perform standardization on address
g. We chose to perform standardization on OrgName
h. We chose to perform standardization on PersonName
DS_USE_NATIVE_KEY 1
SK_MID_ALERT_ID_NEXT_VAL 1
SK_MID_CONT_EQUIV_ID_NEXT_VAL 1
SK_MID_CONT_ID_NEXT_VAL 1
SK_MID_CONT_REL_ID_NEXT_VAL 1
SK_MID_CONTACT_METHOD_ID_NEXT_VAL 1
SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL 1
SK_MID_CONTR_COMPONENT_ID_NEXT_VAL 1
SK_MID_CONTRACT_ID_NEXT_VAL 1
SK_MID_HIER_ULT_PAR_ID_NEXT_VAL 1
SK_MID_HIERARCHY_ID_NEXT_VAL 1
SK_MID_HIERARCHY_NODE_ID_NEXT_VAL 1
SK_MID_HIERARCHY_REL_ID_NEXT_VAL 1
SK_MID_IDENTIFIER_ID_NEXT_VAL 1
SK_MID_LOB_REL_ID_NEXT_VAL 1
SK_MID_LOCATION_GROUP_ID_NEXT_VAL 1
SK_MID_MISCVALUE_ID_NEXT_VAL 1
SK_MID_NATIVE_KEY_ID_NEXT_VAL 1
SK_MID_ORG_NAME_ID_NEXT_VAL 1
SK_MID_PERSON_NAME_ID_NEXT_VAL 1
SK_MID_PERSON_SEARCH_ID_NEXT_VAL 1
SK_MID_PPREF_ID_NEXT_VAL 1
SK_MID_ROLE_LOCATION_ID_NEXT_VAL 1
SK_MID_SUSPECT_ID_NEXT_VAL 1
SK_PREFIX_CONT_ID_NEXT_VAL 1
SK_PREFIX_CONTRACT_ID_NEXT_VAL 1
SK_PREFIX_HIERARCHY_ID_NEXT_VAL 1
QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON (blank)
QS_MATCH_ORG_1b I2
QS_MATCH_ORG_2 (blank)
QS_MATCH_ORG_3 (blank)
QS_MATCH_ORG_4 (blank)
QS_MATCH_PERSON_1 C1
QS_MATCH_PERSON_2 C3
QS_MATCH_PERSON_3 C5
QS_MATCH_PERSON_4 C7
QS_PHONETIC_CODING_TYPE_ADDRESS QSNYSIIS
QS_PHONETIC_CODING_TYPE_ORGANIZATION QSNYSIIS
QS_PHONETIC_CODING_TYPE_PERSON QSNYSIIS
QS_REJECT_ADDRESS_IF_NOT_STANDARDIZED 0
QS_REJECT_ORG_NAME_IF_NOT_STANDARDIZED 0
QS_REJECT_PERSON_NAME_IF_NOT_STANDARDIZED 0
DS_PARTY_DROP_SEVERITY_LEVELc 0
DS_EMAIL_ERROR_CHECK_REPORT 1
Abort handling DS_DROP_MAX_ITERATIONS 10
DS_FAILED_COLUMNIZATION_ACTIONd C
DS_FAILED_RECORDIZATION_ACTIONe C
DS_SIF_ERROR_THRESHOLD 120
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD 50
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT 12
a. We chose to adopt the time stamp format for finer granularity information
b. We did not have organizations in our input data. But if we had, then the default value of C1 (which is SSN) is not appropriate
c. Defines the severity level below which parties are dropped — we chose the least sensitive setting
d. Defines what you want to do with a parsing failure — we chose C(ontinue)
e. Defines what you want to do with a parsing failure — we chose C(ontinue)
The Job Run Options are shown in Figure 3-91 through Figure 3-95 on
page 186.
The successful completion of the job is shown in Figure 3-96 on page 187 with
an elapsed time of 7 minutes and 2 seconds.
We then proceeded to verify the successful load of the MDM data repository as
described in 3.5.7, “Verify successful load” on page 188.
Note: If the RDP for MDM jobs have successfully validated all the rows, then
no errors should be highlighted by this process. If RI constraints are found to
be violated, then an error (SQLSTATE 23512) is raised and the table is put into
a check pending state. You will then have to resolve these errors before
proceeding further.
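DB2 for LUW flags such tables with STATUS = 'C' in the SYSCAT.TABLES catalog view, and the SET INTEGRITY statement revalidates the constraints and clears the state. The helper below is an illustrative sketch of how the pending tables could be located and the corrective statements generated; the class and method names are ours, and it assumes an existing JDBC connection to the MDM database:

```java
import java.sql.*;
import java.util.*;

public class CheckPendingHelper {
    // DB2 for LUW marks set-integrity-pending (check pending) tables with
    // STATUS = 'C' in the SYSCAT.TABLES catalog view.
    static final String FIND_PENDING =
        "SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE STATUS = 'C'";

    // Build the statement that validates RI constraints and takes the
    // table out of the check pending state. TABSCHEMA/TABNAME are CHAR
    // columns in the catalog, so trailing blanks are trimmed.
    public static String setIntegrityStatement(String schema, String table) {
        return "SET INTEGRITY FOR " + schema.trim() + "." + table.trim()
             + " IMMEDIATE CHECKED";
    }

    // List the corrective statements for all pending tables over an
    // existing JDBC connection (not executed in this sketch).
    public static List<String> findPendingStatements(Connection con)
            throws SQLException {
        List<String> pending = new ArrayList<>();
        try (Statement s = con.createStatement();
             ResultSet rs = s.executeQuery(FIND_PENDING)) {
            while (rs.next()) {
                pending.add(setIntegrityStatement(rs.getString(1),
                                                  rs.getString(2)));
            }
        }
        return pending;
    }
}
```

Running the generated SET INTEGRITY statements (for example, through the DB2 command line) resolves the RI errors before proceeding further.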
Figure 3-97 on page 192 through Figure 3-101 on page 196 show the search
and successful retrieval of information relating to a customer whose given name
is “Torben”.
Note: The information associated with Torben Andersom has been merged
from different recordsa in the input because the RDP for MDM jobs were
able to automatically match (“A1”) the Torben Andersom records in the
CHECKING, SAVINGS, and LOAN systems.
a. Passport Number is from the LOAN system, while Social Security Number is
from the CHECKING/SAVINGS systems.
Note: You should review the master data of other important customers as
well. Once the information retrieved is deemed to be accurate, you can
conclude that the load by the RDP for MDM jobs was successful.
The process for manually reviewing these suspects and resolving duplicates is
called suspect resolution. The MDM Server UI provides the capability to find the
identified suspects and resolve (and mark) them as duplicates or not, as shown
in Figure 3-102 on page 199 through Figure 3-110 on page 207. It involves
searching for suspects, reviewing their details, and collapsing them into a single
record and choosing the column values to store in the collapsed record.
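The column-survivorship aspect of a collapse can be illustrated with a small sketch. The class, method, and column values below are hypothetical; in practice the data steward chooses the surviving value for each column in the MDM Server UI:

```java
import java.util.*;

public class CollapseSketch {
    // Collapse two suspect records into one survivor. For each column, the
    // preferred record's value wins when present; otherwise the other
    // record's value is kept. (Illustrative only; not MDM Server code.)
    public static Map<String, String> collapse(Map<String, String> preferred,
                                               Map<String, String> other) {
        Map<String, String> survivor = new LinkedHashMap<>(other);
        for (Map.Entry<String, String> e : preferred.entrySet()) {
            String v = e.getValue();
            if (v != null && !v.isEmpty()) {
                survivor.put(e.getKey(), v); // preferred value wins when present
            }
        }
        return survivor;
    }
}
```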
Note: Using the realtime services of MDM Server ensures that resolved
suspects are not raised as possible duplicates again.
Note: Each time parties are collapsed, the MDM Server will invoke suspect
processing for a newly created party to identify any new potential suspects. If
you have modified the Standardization and Matching Quality Stage rules in the
RDP for MDM initial load, the same rules must be deployed in MDM Server
runtime to ensure identical business logic between runtime services requests
and load. For more information about integrating runtime Quality Stage rules
with MDM Server see InfoSphere MDM Server Developer Guide. This guide is
part of the documentation available to you after you have installed the MDM
Server.
You should repeat this process for all persons that have suspects associated with
them.
You may now integrate the realtime services of MDM Server into your existing
applications as described earlier.
Figure 3-111 on page 210 illustrates the concept. It shows the following
information:
3 hierarchies
– National
– Western Region
– Eastern Region
6 parties
– Austin
– Bill
– Charles
– David
– Estelle
– Frank
1. Each node must reference a valid Hierarchy using the Hierarchy Name (such as Legal, Marketing, and Finance) and TypeCode (1, 2, and 3).
2. The RDP for MDM data model supports only a party/contact hierarchy, even though MDM Server supports product hierarchies as well.
Note: Each party has 2 corresponding hierarchy nodes associated with it.
Note: Business rules have been defined to ensure that a cyclic graph does not
occur.
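As an illustration of what such a rule enforces (this is not the MDM Server implementation), acyclicity can be checked with a depth-first traversal of the parent-to-children hierarchy relationships:

```java
import java.util.*;

public class HierarchyCycleCheck {
    // Returns true if the parent -> children relationships contain a cycle.
    public static boolean hasCycle(Map<String, List<String>> children) {
        Set<String> done = new HashSet<>();   // fully explored nodes
        Set<String> inPath = new HashSet<>(); // nodes on the current DFS path
        for (String node : children.keySet()) {
            if (dfs(node, children, done, inPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> children,
                               Set<String> done, Set<String> inPath) {
        if (inPath.contains(node)) return true; // back edge: cycle found
        if (done.contains(node)) return false;  // already known acyclic
        inPath.add(node);
        for (String child : children.getOrDefault(node,
                                       Collections.<String>emptyList())) {
            if (dfs(child, children, done, inPath)) return true;
        }
        inPath.remove(node);
        done.add(node);
        return false;
    }
}
```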
Hierarchy data is processed as a separate feed after all other party or contact
data has been validated, matched, keys assigned, and the data loaded into the
MDM data repository. The input hierarchy data is validated against the hierarchy
data and party or contact information already in the MDM data repository.
The Hierarchy RT/ST (Table B-20 on page 260 through Table B-23 on page 261)
data is processed in the same manner as the non-hierarchy RT/ST party/contact
and contract data.
Note: There were no organizations in our TBank data. For the purposes of
creating a hierarchy, we loaded organization records into the MDM data
repository. The loading of these organization records is not described here.
Figure 3-112 shows the MARKETING hierarchy, the various party (person1 and
organization2) hierarchy nodes, and the hierarchy node relationships defined for
the TBank scenario. We combined persons and organizations in the same
hierarchy. Some organizations in our scenario had no persons (an unlikely
situation in the real world).
The hierarchy comprises a US – Wide Marketing root with Local Marketing nodes
for San Jose, San Francisco, Eugene, Salem, and Seattle, and party nodes for
Yesica Anderson, A Carter, Christina Anderson, Alex Skov, Denise Farrel,
Kurt Madi, and Barry Rosen.
1. Persons are shown as ovals.
2. Organizations are shown as rectangles.
The sequence of the SIF records is immaterial because they are sorted into the
required sequence by the RDP for MDM jobs.
As mentioned earlier, you will typically integrate the realtime services of MDM
Server into your existing applications in order to access the master data therein.
However, our scenario involved writing a simple new MDM consumption
application that obtains a 360-degree view of a customer: master data is
obtained from the MDM Server through a Web service call, and non-master data
is retrieved from the corresponding CHECKING, SAVINGS, and LOAN source systems.
Our application provided a GUI for searching on first name and last name in
the MDM repository, returning the address (master data from the MDM
repository) and the balance (non-master data) from the appropriate source
systems CHECKING, SAVINGS, and LOAN. The MDM consumption application was
developed as a JSP and J2EE™ application and can be downloaded from the
IBM Redbooks Web page:
http://www.redbooks.ibm.com/redpieces/abstracts/sg247704.html
Note: In our sample application, we did not provide for wild card searches,
and we assumed that the search would return either zero or one row from the
MDM repository with the associated party ID.
Uses the party ID to retrieve the address information from the MDM
repository, and the corresponding source system keys (SSK) for the checking,
saving, and loan systems as highlighted in Example 3-10 on page 233.
Uses the SSKs to connect to the DB2 for LUW source systems to retrieve the
balance (non-key) data (as highlighted in Example 3-10 on page 233) and
present back to the user as shown in Figure 3-131 on page 232.
In this case, the customer Renee Jackson only has a Savings account, no
Checking or Loan accounts.
Note: The code shown in Example 3-10 on page 233 is only meant to show
the Web service calls and subsequent access to the source systems. It has no
error handling capabilities, which would be essential in a real-world
application.
<html>
<head>
<title>test</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Rational Application Developer">
</head>
<body>
<%
if (request.getParameter("query") != null) {
    try {
        // Search the MDM repository through the Web service call
        // (connection setup and search request are elided in this excerpt)
        PersonSearchResult myPSrchR = myPSRR.getSearchResult(0);

        // CHECKING system: the SSK retrieval that sets strCustId and
        // proceedFlag for this system is elided in this excerpt
        proceedFlag = false;
        if (proceedFlag) {
            query = "select balance from db2inst1.checking where customerid = ?";
            stmt = con.prepareStatement(query);
            custid = Integer.parseInt(strCustId);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if (rs.next()) { %>
                <%-- display the CHECKING balance (markup elided) --%>
            <% }
            stmt.close();
        } else { %>
            <%-- no CHECKING account for this customer (markup elided) --%>
        <% }

        // SAVINGS system: retrieve the source system key (SSK) for the party
        myPASKR = myPSP.getPartyAdminSysKeyByPartyId(myControl, "1000001", partyId);
        strCustId = myPASKR.getAdminSysKey().getAdminSysPartyId();
        strCustId = strCustId.substring(0, strCustId.length() - 1);
        proceedFlag = true; // set to false when no SAVINGS SSK exists (elided)
        if (proceedFlag) {
            query = "select balance from db2inst1.savings where savingsid = ?";
            custid = Integer.parseInt(strCustId);
            stmt = con.prepareStatement(query);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if (rs.next()) { %>
                <%-- display the SAVINGS balance (markup elided) --%>
            <% }
            stmt.close();
        } else { %>
            <%-- no SAVINGS account for this customer (markup elided) --%>
        <% }

        // LOAN system: retrieve the SSK for the party
        myPASKR = myPSP.getPartyAdminSysKeyByPartyId(myControl, "1000002", partyId);
        strCustId = myPASKR.getAdminSysKey().getAdminSysPartyId();
        proceedFlag = true; // set to false when no LOAN SSK exists (elided)
        if (proceedFlag) {
            query = "select balance from db2inst1.loan where customerid = ?";
            custid = Integer.parseInt(strCustId);
            stmt = con.prepareStatement(query);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if (rs.next()) { %>
                <%-- display the LOAN balance (markup elided) --%>
            <% }
            stmt.close();
        } else { %>
            <%-- no LOAN account for this customer (markup elided) --%>
        <% } %>
</table>
<%
    } catch (Exception e) {
        e.printStackTrace();
    }
} else { %>
    <%-- display the search form (markup elided) --%>
</form>
<%
}
%>
</body>
</html>
Master data usage and functionality can be categorized into 3 different styles:
Collaborative
Collaborative use of master data involves creating, defining, verifying, and
augmenting master data to establish a single version of the truth about
customers, products, suppliers, and accounts.
Operational
Operational use focuses on the management, delivery, and consumption of
master data in day-to-day operations.
Analytical
Analytical use stages master data destined for analytical systems or supplies
rich insight to operational processes.
After the decision is made to consolidate data from multiple sources, you need to
decide how closely to integrate the data between the systems and how often to
keep it up to date.
A.2.2 Registry
The Registry style creates a skeleton record with the minimum amount of data
required to identify the master record and to facilitate the linking of accounts
across multiple source systems. This is the most popular style implemented for a
first phase of IBM InfoSphere MDM Server. The data collected is not used to
update other systems; the system of record remains with the individual
source systems.
A.2.3 Coexistence
The Coexistence style implements all the features of the Registry style, but also
provides data elements that the client wants to track at the party level. Master
data can be updated in source systems or in MDM Server, in which case the data
is fed back to source systems. The Rapid Deployment Package (RDP) for MDM
solution facilitates the process of feeding MDM Server with master data from
source systems in batch fashion. The coexistence style is one step closer than
the Registry style to becoming the system of record, but the existing source
systems still remain as the system of record.
A.2.4 Transaction
The Transaction style implements centralized management of master data. All
data updates happen directly to the MDM solution and can be distributed to other
applications and systems, which implement read-only access. The MDM
repository becomes the system of record for master data.
InfoSphere MDM Server maintains master data for multiple domains including
customer, account, and product, as well as other data types such as location and
privacy preferences. Through business services, InfoSphere MDM Server
facilitates integration with all applications and business processes that consume
master data.
You can interface with MDM Server using one of the supported interfaces1,
including:
RMI
JMS
Batch
Web Services
1. MDM Server supports an XML-based transaction interface. It comes with a request and a response
schema, defined in XSD. All input XMLs must conform to the request schema, while MDM Server
always responds with an XML conforming to the response schema. The schemas define the
structure of the business objects, which should be passed in or returned from MDM Server
transactions.
Extension Framework
This component provides mechanisms for extending the behavior and data
model of the product.
A code generation tool is provided to allow clients to add columns to
existing tables and to add new tables to implement new business features.
The code generation tool also generates the Web services integration code
for the data extension or data addition.
External Components
These are components delivered as part of MDM Server and are consumers
of MDM Server’s services.
MDM Server provides a framework for batch processing. The batch processor
is a common J2SE™ component that supports pluggable readers/writers,
multiple instances, and concurrent processing within an instance for high
throughput. The batch processor invokes the request framework for each
transaction read. Therefore, all MDM Server services and client-defined
services are available to batch processing.
Note: For full details on IBM InfoSphere MDM Server functionality and the
data model, refer to MDM Server documentation available with the RDP for
MDM packaged software.
The key data columns in your source systems must be mapped to the
corresponding columns in the appropriate SIF RT/ST records before the data can
be loaded into the MDM repository using the RDP for MDM jobs.
Note: The SIF supports both inserts to and updates of records in the MDM
repository, but not delete operations. In this IBM Redbooks publication, we
cover both inserts (for initial load) and updates to perform delta processing.
To map the columns in your source systems to the SIF, you must know the data
type of each column in the RT/ST; the data types are defined in the RT/ST
templates provided as part of the RDP for MDM solution. Table B-1 on page 250
through Table B-23 on page 261 do not contain the data type information.
When the value of a column in an RT/ST record can be null (as indicated by a
“Y” in the “Can be empty?” column in Table B-1 on page 250), you can define the
action to be taken on the corresponding column of the MDM data repository when
NULL is supplied in that RT/ST column, as follows:
Set the null indicator for that column in the RT/ST to a 1 or 0. The Mapping Rule
specifies the action to be taken on the value in the corresponding column of the
MDM data repository. The null indicator columns (names beginning with
“NULL_”) and their corresponding Mapping Rules are shown in Table B-1 on
page 250 through Table B-23 on page 261. For example, in the RT/ST, the
NULL_PREF_LANG_TP_CD column in Table B-1 on page 250 corresponds to
the PREF_LANG_TP_CD column (which can be empty), and its Mapping Rule
specifies that the following action be taken:
If 1 then set to null, if 0 and column is empty use prior value, if 0
and column is not empty overwrite prior value
Note: If the PREF_LANG_TP_CD column has a value, then the null indicator
setting does not apply.
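The stated Mapping Rule can be sketched as a small function. The class and method names below are illustrative, not part of the RDP for MDM code:

```java
public class NullIndicatorRule {
    // Applies the SIF mapping rule: if the null indicator is 1, the target
    // column is set to null; if 0 and the supplied value is empty, the
    // prior value is kept; if 0 and the supplied value is not empty, it
    // overwrites the prior value.
    public static String apply(int nullIndicator, String supplied, String prior) {
        if (nullIndicator == 1) {
            return null;                         // explicitly null the column
        }
        if (supplied == null || supplied.isEmpty()) {
            return prior;                        // keep the prior value
        }
        return supplied;                         // overwrite the prior value
    }
}
```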
Table B-1 on page 250 through Table B-23 on page 261 provide a high level
overview of the individual columns and mapping rules for each of the 23 RT/ST
combinations.
RECTYPE N "P"
SUBTYPE N "P" or "O" (Cannot be updated)
ADMIN_SYS_TP_CD N CDADMINSYSTP
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
FORCE_MATCH N "Y" or "N"
CONTEQUIV_DESCRIPTION Y
ACCE_COMP_TP_CD Y CDACCETOCOMPTP
PREF_LANG_TP_CD Y CDLANGTP
CONTACT_NAME Y
SOLICIT_IND Y
CONFIDENTIAL_IND Y
CLIENT_IMP_TP_CD Y CDCLIENTIMPTP
CLIENT_ST_TP_CD Y CDCLIENTSTTP
CLIENT_POTEN_TP_CD Y CDCLIENTPOTENTP
RPTING_FREQ_TP_CD Y CDRPTINGFREQTP
LAST_STATEMENT_DT Y
ALERT_IND Y
PRVBY_ADMIN_SYS_TP_CD Y CDADMINSYSTP
PRVBY_ADMIN_CLIENT_ID Y
DO_NOT_DELETE_IND Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SINCE_DT Y
LEFT_DT Y
ACCESS_TOKEN_VALUE Y
ORG_TP_CD Y MUST BE EMPTY if SUBTYPE = "P", REQUIRED FOR SUBTYPE = "O" CDORGTP
INDUSTRY_TP_CD Y MUST BE EMPTY if SUBTYPE = "P" CDINDUSTRYTP
ESTABLISHED_DT Y MUST BE EMPTY if SUBTYPE = "P"
BUY_SELL_AGR_TP_CD Y MUST BE EMPTY if SUBTYPE = "P" CDBUYSELLAGREETP
PROFIT_IND Y MUST BE EMPTY if SUBTYPE = "P"
MARITAL_ST_TP_CD Y MUST BE EMPTY if SUBTYPE = "O" CDMARITALSTTP
BIRTHPLACE_TP_CD Y MUST BE EMPTY if SUBTYPE = "O" CDCOUNTRYTP
CITIZENSHIP_TP_CD Y MUST BE EMPTY if SUBTYPE = "O" CDCOUNTRYTP
HIGHEST_EDU_TP_CD Y MUST BE EMPTY if SUBTYPE = "O" CDHIGHESTEDUTP
AGE_VER_DOC_TP_CD Y MUST BE EMPTY if SUBTYPE = "O" CDAGEVERDOCTP
GENDER_TP_CODE Y MUST BE EMPTY if SUBTYPE = "O" not validated
BIRTH_DT Y MUST BE EMPTY if SUBTYPE = "O"
DECEASED_DT Y MUST BE EMPTY if SUBTYPE = "O"
CHILDREN_CT Y MUST BE EMPTY if SUBTYPE = "O"
DISAB_START_DT Y MUST BE EMPTY if SUBTYPE = "O"
DISAB_END_DT Y MUST BE EMPTY if SUBTYPE = "O"
USER_IND Y MUST BE EMPTY if SUBTYPE = "O"
NULL_DESCRIPTION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ACCE_COMP_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREF_LANG_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CONTACT_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOLICIT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CONFIDENTIAL_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CLIENT_IMP_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CLIENT_ST_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CLIENT_POTEN_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_RPTING_FREQ_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_STATEMENT_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ALERT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PROVIDED_BY_CONT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DO_NOT_DELETE_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SINCE_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LEFT_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ACCESS_TOKEN_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_INDUSTRY_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ESTABLISHED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BUY_SELL_AGR_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PROFIT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_MARITAL_ST_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BIRTHPLACE_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CITIZENSHIP_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_HIGHEST_EDU_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGE_VER_DOC_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GENDER_TP_CODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BIRTH_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DECEASED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CHILDREN_CT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DISAB_START_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DISAB_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_USER_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "G"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
ORG_NAME_TP_CD N CDORGNAMETP
ORG_NAME N
S_ORG_NAME Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
P_ORG_NAME Y
NULL_S_ORG_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "H"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
PREFIX_NAME_TP_CD Y CDPREFIXNAMETP
PREFIX_DESC Y
NAME_USAGE_TP_CD N CDNAMEUSAGETP
FREE_FORM_NAME Y Must be supplied if LAST_NAME is empty. Must be empty if GIVEN_NAME or LAST_NAME present.
GIVEN_NAME_ONE Y
GIVEN_NAME_TWO Y
GIVEN_NAME_THREE Y
GIVEN_NAME_FOUR Y
LAST_NAME Y Must be empty if FREE_FORM_NAME supplied
GENERATION_TP_CD Y CDGENERATIONTP
SUFFIX_DESC Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
USE_STANDARD_IND Y
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
P_LAST_NAME Y
P_GIVEN_NAME_ONE Y
P_GIVEN_NAME_TWO Y
P_GIVEN_NAME_THREE Y
P_GIVEN_NAME_FOUR Y
GIVEN_NAME_ONE_SEARCH Y
GIVEN_NAME_TWO_SEARCH Y
GIVEN_NAME_THREE_SEARCH Y
GIVEN_NAME_FOUR_SEARCH Y
LAST_NAME_SEARCH Y
NULL_PREFIX_NAME_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREFIX_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GIVEN_NAME_ONE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GIVEN_NAME_TWO N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GIVEN_NAME_THREE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GIVEN_NAME_FOUR N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_GENERATION_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SUFFIX_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_USE_STANDARD_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "E"
ADMIN_SYS_TP_CD N CDADMINSYSTP
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
DESCRIPTION Y
LINKTO_ADMIN_SYS_TP_CD N Not required.
LINKTO_ADMIN_CLIENT_ID N
RECTYPE N "P"
SUBTYPE N "A"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
UNDEL_REASON_TP_CD Y CDUNDELREASONTP
MEMBER_IND Y
PREFERRED_IND N
SOLICIT_IND Y
EFFECT_START_MMDD Y
EFFECT_END_MMDD Y
EFFECT_START_TM Y
EFFECT_END_TM Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
CARE_OF_DESC Y
ADDR_USAGE_TP_CD Y CDADDRUSAGETP
COUNTRY_TP_CD Y CDCOUNTRYTP
RESIDENCE_TP_CD Y CDRESIDENCETP
PROV_STATE_TP_CD Y CDPROVSTATETP
ADDR_LINE_ONE Y
ADDR_LINE_TWO Y
ADDR_LINE_THREE Y
CITY_NAME Y
POSTAL_CODE Y
ADDR_STANDARD_IND Y
OVERRIDE_IND Y
RESIDENCE_NUM Y
COUNTY_CODE Y
LATITUDE_DEGREES
LONGITUDE_DEGREES
POSTAL_BARCODE
P_CITY
BUILDING_NAME
STREET_NUMBER
STREET_NAME
P_STREET_NAME
STREET_SUFFIX
PRE_DIRECTIONAL
POST_DIRECTIONAL
BOX_DESIGNATOR
BOX_ID
STN_INFO
STN_ID
REGION
DEL_DESIGNATOR
DEL_ID
DEL_INFO
NULL_UNDEL_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_MEMBER_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREFERRED_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOLICIT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_START_MMDD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_END_MMDD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_START_TM N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_END_TM N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CARE_OF_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_COUNTRY_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_RESIDENCE_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PROV_STATE_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ADDR_LINE_TWO N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ADDR_LINE_THREE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_POSTAL_CODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ADDR_STANDARD_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_OVERRIDE_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_RESIDENCE_NUM N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_COUNTY_CODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LATITUDE_DEGREES N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LONGITUDE_DEGREES N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_POSTAL_BARCODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BUILDING_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_STREET_NUMBER N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_STREET_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_STREET_SUFFIX N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PRE_DIRECTIONAL N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_POST_DIRECTIONAL N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BOX_DESIGNATOR N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BOX_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_STN_INFO N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_STN_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REGION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DEL_DESIGNATOR N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DEL_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DEL_INFO N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "C"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
UNDEL_REASON_TP_CD Y CDUNDELREASONTP
MEMBER_IND Y
PREFERRED_IND Y
SOLICIT_IND Y
EFFECT_START_MMDD Y
EFFECT_END_MMDD Y
EFFECT_START_TM Y
EFFECT_END_TM Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
CONT_METH_TP_CD N CDCONTMETHTP
METHOD_ST_TP_CD Y CDMETHODSTATUSTP
ATTACH_ALLOW_IND Y
TEXT_ONLY_IND Y
MESSAGE_SIZE Y
COMMENT_DESC Y
REF_NUM N
CONT_METH_STD_IND Y
COUNTRY_CODE Y
AREA_CODE Y
EXCHANGE Y
PH_NUMBER Y
EXTENSION Y
NULL_UNDEL_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_MEMBER_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREFERRED_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOLICIT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_START_MMDD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_END_MMDD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_START_TM N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EFFECT_END_TM N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_METHOD_ST_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTACH_ALLOW_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_TEXT_ONLY_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_MESSAGE_SIZE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_COMMENT_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CONT_METH_STD_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_COUNTRY_CODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AREA_CODE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EXCHANGE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PH_NUMBER N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EXTENSION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
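The NULL_* indicator rule that repeats throughout these tables can be sketched as follows. This is a minimal illustration only; the function and its signature are hypothetical and not part of RDP for MDM:

```python
# Hypothetical sketch of the NULL_* indicator rule described in the tables
# above: indicator "1" forces the column to null, "0" with an empty incoming
# value keeps the prior value, and "0" with a non-empty value overwrites it.
def resolve_column(null_indicator, incoming, prior):
    if null_indicator == "1":
        return None       # explicitly set the column to null
    if incoming == "":
        return prior      # column is empty: keep the previously stored value
    return incoming       # column is not empty: overwrite the prior value
```

For example, resolve_column("0", "", "416") keeps the prior area code "416", while resolve_column("1", "", "416") nulls it out.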
RECTYPE N "P"
SUBTYPE N "I"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
ID_TP_CD N CDIDTP
ID_STATUS_TP_CD Y CDIDSTATUSTP
REF_NUM Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
EXPIRY_DT Y
ASSIGNEDBY_ADMIN_SYS_TP_CD Y CDADMINSYSTP
ASSIGNEDBY_ADMIN_CLIENT_ID Y
IDENTIFIER_DESC Y
ISSUE_LOCATION Y
LAST_USED_DT Y
LAST_VERIFIED_DT Y
SOURCE_IDENT_TP_CD Y CDSOURCEIDENTTP
NULL_ID_STATUS_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REF_NUM N ref_num can only be null for 1 identifier status type. If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EXPIRY_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ASSIGNED_BY N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_IDENTIFIER_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ISSUE_LOCATION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_USED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "B"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
ENTITY_NAME N "CONTACT"
LOB_TP_CD N CDLOBTP
LOB_REL_TP_CD N CDLOBRELTP
START_DT Y Use Processing Date if not supplied.
END_DT Y
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "R"
ADMIN_SYS_TP_CD_TO N Not required.
ADMIN_CLIENT_ID_TO N
ADMIN_SYS_TP_CD_FROM N Not required.
ADMIN_CLIENT_ID_FROM N TO and FROM SSKs cannot be the same.
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
REL_TP_CD N CDRELTP
REL_DESC Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
REL_ASSIGN_TP_CD Y CDRELASSIGNTP
END_REASON_TP_CD Y CDENDREASONTP
NULL_REL_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REL_ASSIGN_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "C"
SUBTYPE N "H"
ADMIN_SYS_TP_CD N CDADMINSYSTP
ADMIN_CONTRACT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
CONTR_LANG_TP_CD Y CDLANGTP
CURRENCY_TP_CD Y CDCURRENCYTP
FREQ_MODE_TP_CD Y CDFREQMODETP
BILL_TP_CD Y CDBILLTP
PREMIUM_AMT Y
NEXT_BILL_DT Y
CURR_CASH_VAL_AMT Y
LINE_OF_BUSINESS Y
BRAND_NAME Y
SERVICE_ORG_NAME Y
BUS_ORGUNIT_ID Y
SERVICE_PROV_ID Y
REPLBY_ADMIN_SYS_TP_CD Y Required if Replaced By contract ID present. CDADMINSYSTP
REPLBY_ADMIN_CONTRACT_ID Y
ISSUE_LOCATION Y
PREMAMT_CUR_TP Y CDCURRENCYTP
CASHVAL_CUR_TP Y CDCURRENCYTP
ACCESS_TOKEN_VALUE Y
MANAGED_ACCOUNT_IND Y WARNING: Leave null unless advised by MDM Server expert.
AGREEMENT_NAME Y
AGREEMENT_NICKNAME Y
SIGNED_DT Y
EXECUTED_DT Y
END_DT Y
ACCOUNT_LAST_TRANSACTION_DT Y
TERMINATION_DT Y
TERMINATION_REASON_TP_CD Y CDTERMINATIONREASONTP
AGREEMENT_DESCRIPTION Y
AGREEMENT_ST_TP_CD Y CDAGREEMENTSTTP
AGREEMENT_TP_CD Y CDAGREEMENTTP
SERVICE_LEVEL_TP_CD Y CDSERVICELEVELTP
LAST_VERIFIED_DT Y
LAST_REVIEWED_DT Y
PRODUCT_ID Y NOT USED
CLUSTER_KEY Y
NULL_CONTR_LANG_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CURRENCY_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_FREQ_MODE_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BILL_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREMIUM_AMT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_NEXT_BILL_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CURR_CASH_VAL_AMT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LINE_OF_BUSINESS N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BRAND_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SERVICE_ORG_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BUS_ORGUNIT_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SERVICE_PROV_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REPL_BY_CONTRACT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ISSUE_LOCATION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREMAMT_CUR_TP N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CASHVAL_CUR_TP N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ACCESS_TOKEN_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_MANAGED_ACCOUNT_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGREEMENT_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGREEMENT_NICKNAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SIGNED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EXECUTED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REPLACES_CONTRACT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ACCOUNT_LAST_TRANSACTION_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_TERMINATION_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_TERMINATION_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGREEMENT_DESCRIPTION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGREEMENT_ST_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_AGREEMENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SERVICE_LEVEL_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_VERIFIED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_LAST_REVIEWED_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PRODUCT_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CLUSTER_KEY N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "C"
SUBTYPE N "K"
ADMIN_FLD_NM_TP_CD N CDADMINFLDNMTP
ADMIN_CONTRACT_ID N
LINKTO_ADMIN_FLD_NM_TP_CD N CDADMINFLDNMTP
LINKTO_ADMIN_CONTRACT_ID N
CONTRACT_COMP_IND Y ANY VALUE INPUT WILL BE OVERRIDDEN TO "N"
RECTYPE N "C"
SUBTYPE N "C"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CONTRACT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
PROD_TP_CD N CDPRODTP
CONTRACT_ST_TP_CD N CDCONTRACTSTTP
CURR_CASH_VAL_AMT Y
PREMIUM_AMT Y
ISSUE_DT Y
VIATICAL_IND Y
BASE_IND Y
CONTR_COMP_TP_CD Y CDCONTRCOMPTP
SERV_ARRANGE_TP_CD Y CDARRANGEMENTTP
EXPIRY_DT Y
PREMAMT_CUR_TP Y CDCURRENCYTP
CASHVAL_CUR_TP Y CDCURRENCYTP
CLUSTER_KEY Y
NULL_CURR_CASH_VAL_AMT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREMIUM_AMT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ISSUE_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VIATICAL_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_BASE_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SERV_ARRANGE_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_EXPIRY_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PREMAMT_CUR_TP N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CASHVAL_CUR_TP N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CLUSTER_KEY N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "C"
SUBTYPE N "R"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CONTRACT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
ADMIN_CLIENT_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
CONTR_COMP_TP_CD Y CDCONTRCOMPTP
PROD_TP_CD N CDPRODTP
CONTR_ROLE_TP_CD N CDCONTRACTROLETP
REGISTERED_NAME Y
DISTRIB_PCT Y
IRREVOC_IND Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
RECORDED_START_DT Y
RECORDED_END_DT Y
SHARE_DIST_TP_CD Y CDSHAREDISTTP
ARRANGEMENT_TP_CD Y CDARRANGEMENTTP
ARRANGEMENT_DESC Y
END_REASON_TP_CD Y CDENDREASONTP
NULL_REGISTERED_NAME N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DISTRIB_PCT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_IRREVOC_IND N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_RECORDED_START_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_RECORDED_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SHARE_DIST_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ARRANGEMENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ARRANGEMENT_DESC N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "C"
SUBTYPE N "L"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CONTRACT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
ADMIN_CLIENT_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
CONTR_COMP_TP_CD Y CDCONTRCOMPTP
PROD_TP_CD N CDPRODTP
CONTR_ROLE_TP_CD N CDCONTRACTROLETP
ADDR_USAGE_TP_CD N CDADDRUSAGETP
START_DT Y Use Processing Date if not supplied.
END_DT Y
UNDEL_REASON_TP_CD Y CDUNDELREASONTP
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_UNDEL_REASON_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "C"
SUBTYPE N "V"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CONTRACT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
CONTR_COMP_TP_CD Y CDCONTRCOMPTP
PROD_TP_CD N CDPRODTP
DOMAIN_VALUE_TP_CD N CDDOMAINVALUETP
VALUE_STRING N
START_DT Y Use Processing Date if not supplied.
END_DT Y
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUE_STRING N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PRIORITY_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_SOURCE_IDENT_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DESCRIPTION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_0 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR0_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_1 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR1_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_2 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR2_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_3 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR3_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_4 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR4_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_5 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR5_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_6 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR6_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_7 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR7_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_8 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR8_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_VALUEATTR_TP_CD_9 N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ATTR9_VALUE N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "P"
SUBTYPE N "S"
ADMIN_SYS_TP_CD N Not required.
ADMIN_CLIENT_ID N
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
PPREF_REASON_TP_CD N CDPPREFREASONTP
SOURCE_IDENT_TP_CD N CDSOURCEIDENTTP
VALUE_STRING Y
START_DT Y Use Processing Date if not supplied.
END_DT Y
PPREF_TP_CD N CDPPREFTP
PPREF_ACT_OPT_ID Y PPREFACTIONOPT
NULL_VALUE_STRING N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_PPREF_ACT_OPT_ID N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_REMOVED_BY_USER N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_CREATED_BY_USER N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_ALERT_SEV_TP_CD N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_END_DT N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
NULL_DESCRIPTION N If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value
RECTYPE N "H"
SUBTYPE N "H"
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
NAME N
HIERARCHY_TP_CD N CDHIERARCHYTP
DESCRIPTION
START_DT
END_DT
RECTYPE N "H"
SUBTYPE N "N"
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
NAME N
HIERARCHY_TP_CD N CDHIERARCHYTP
ADMIN_SYS_TP_CD N
ADMIN_CLIENT_ID N
ENTITY_NAME
DESCRIPTION
START_DT
END_DT
NODEDESIG_TP_CD
LOCALEDESCRIPTION
RECTYPE N "H"
SUBTYPE N "R"
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
NAME N
HIERARCHY_TP_CD N CDHIERARCHYTP
ADMIN_SYS_TP_CD_PARENT N CDADMINSYSTP
ADMIN_CLIENT_ID_PARENT N
ADMIN_SYS_TP_CD_CHILD N CDADMINSYSTP
ADMIN_CLIENT_ID_CHILD N
DESCRIPTION
START_DT
END_DT
RECTYPE N "H"
SUBTYPE N "U"
LOAD_TYPE Y “U” update, “A” add, “empty” either add or update as applicable
NAME N
HIERARCHY_TP_CD N CDHIERARCHYTP
ADMIN_SYS_TP_CD N CDADMINSYSTP
ADMIN_CLIENT_ID N
DESCRIPTION
START_DT
END_DT
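The LOAD_TYPE column follows the same convention in every record type above: “U” forces an update, “A” forces an add, and an empty value lets the load add or update as applicable. A minimal sketch of that dispatch, assuming a plain in-memory store (the function and the store are hypothetical, not RDP for MDM code):

```python
# Hypothetical sketch of LOAD_TYPE handling: "A" = add, "U" = update,
# "" (empty) = add or update as applicable (upsert). `store` is a plain
# dict keyed by the record's source system key (SSK).
def apply_record(store, ssk, record, load_type):
    exists = ssk in store
    if load_type == "A" and exists:
        raise ValueError("add requested but record already exists")
    if load_type == "U" and not exists:
        raise ValueError("update requested but record does not exist")
    store[ssk] = record   # add or overwrite, as applicable
    return store
```

With an empty LOAD_TYPE the same call succeeds whether or not the SSK is already present, which is the “either add or update as applicable” behavior the tables describe.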
Note: MDM Server also comes with MDM Server Workbench, a development
tool to help with the creation of these data and behavior extensions. This
workbench comes in the form of a plug-in to Rational® Software Architect.
You may also create new transactions or services using the MDM Server
application framework. You can build transactions by constructing new
controller/business components and using the existing Request Framework and
Common Components.
1 MDM Server also uses its own extension framework to plug in some modules, such as Rules of Visibility, in order to keep them loosely coupled and easy to turn on or off.
The Extension Controller uses the parameters to determine if any Extension Sets
must be further evaluated. Relevant Extension Sets are then interrogated and
qualified extensions, either Java or rules sets, are invoked.
Because RDP for MDM loads directly into the MDM target tables, creating new
MDM Server services or behavior extensions will have no impact on RDP for
MDM. However, with extensions to the MDM data model, changes must be made
to both the MDM Server and the corresponding RDP for MDM assets.
MDM Server provides a code generation tool to allow clients to change existing
column attributes, add new columns to existing tables, and add new tables to
satisfy business requirements. The code generation tool also generates the Web
Services integration code for these data extensions.
Data extensions and additions:
- Add a new element to an existing SIF record, or modify an existing element's data type and precision/scale/length, when that element does not participate in some transformation or aggregation: a change to the corresponding ImportSIF shared container (names starting with ILIS…) will propagate through to the target. For BulkLoad, no further changes are required. For Insert (Upsert), a change to the corresponding DB shared container (names starting with ILDBIN…) is also required.
- Add a new table (new SIF record): this is beyond the scope of a typical RDP for MDM implementation.
In this way, as long as the extensions are confined to existing shared containers,
it should be possible to upgrade core RDP jobs without losing client-specific
customizations.
Using RCP judiciously facilitates re-usable job designs based on input metadata,
rather than using a large number of jobs with hard-coded table definitions to
perform the same tasks. Furthermore, RCP facilitates re-use through parallel
shared containers.
By using RCP, only the columns explicitly referenced within the shared container logic need to be defined; the remaining columns pass through at runtime, provided that each stage in the shared container has RCP enabled in its stage Output properties.
There are some RDP for MDM jobs, stages, and shared containers where RCP is explicitly disabled. In most cases, RCP is disabled for QualityStage match, as this is standard practice (only matching key columns are output). However, there are other objects and jobs where RCP is disabled that should be reviewed to ensure that the additional columns are passed down when necessary.
Table C-2 on page 270 summarizes the jobs and containers that may require
review.
Note: This is an incomplete table, and may change with new releases of RDP
assets.
MDMIS R4 IL_000_PS_Stage_ErrReasonTbl
MDMIS R4 IL_010_IS_Import_SIF
MDMIS R4 IL_020_VS_Address
Figure D-1 Main components of RDP processing & error logs generated
The general format of the error messages in the error logs is shown in Table D-1
on page 273. Please refer to the download site for a document on the error
codes:
http://www.redbooks.ibm.com/redpieces/abstracts/sg247704.html
3 ADMIN_SYS_TP_CD These two fields are the SSK (Source System Key)
4 ADMIN_CLIENT_ID_OR_
CONTRACT_ID
5 CONT_ID This is the surrogate key (SK) generated for each row
6 SIF_FILE_NAME This is the physical location of the error row among the
input data files
7 SIF_ROW_NUMBER
13 INTERNAL_ID This is a surrogate key that we apply inside the RDP for
MDM jobs for use there; it is dropped before loading the
database (where the CONT_ID is used instead).
15 ERR_STAGE_NAME This is the name of the stage that detected the error and
produced the error row.
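Given the field positions documented above (3, 4, 5, 6, 7, 13, and 15), a single pipe-delimited error-log record can be picked apart as sketched below. The parser function is hypothetical; the sample record is taken from Example D-4, and positions not documented in the table are left unnamed:

```python
# Sketch of reading one pipe-delimited RDP for MDM error-log record, using
# the field positions documented in the table above; undocumented positions
# are simply skipped.
def parse_error_record(line):
    f = line.rstrip("\n").split("|")
    return {
        "ADMIN_SYS_TP_CD": f[2],                  # field 3: first half of the SSK
        "ADMIN_CLIENT_ID_OR_CONTRACT_ID": f[3],   # field 4: second half of the SSK
        "CONT_ID": f[4],                          # field 5: surrogate key for the row
        "SIF_FILE_NAME": f[5],                    # field 6: input file holding the error row
        "SIF_ROW_NUMBER": f[6],                   # field 7: row number within that file
        "INTERNAL_ID": f[12],                     # field 13: internal surrogate key
        "ERR_STAGE_NAME": f[14],                  # field 15: stage that detected the error
    }

# First record from Example D-4, rejoined onto one line.
record = ("P|P|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|2|1624|"
          "100054|The following is not correct: ClientPotentialType|0|canonical_errCode|"
          "8611|2008-11-01 09:24:13|020_Contact.CheckCodeAndContentValidationErrors|"
          "IL_020_VS_Contact")
fields = parse_error_record(record)
```

Running this against the Example D-4 record yields the SSK (1000002, 8000090), SIF row number 2, and the stage name 020_Contact.CheckCodeAndContentValidationErrors, matching the walkthrough that follows.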
In this appendix, we created a number of SIF files containing the most commonly
encountered errors to identify the corresponding error messages generated by
the RDP for MDM jobs. The contents of the consolidated error log are shown here.
Example D-2 on page 277 shows the contents of the error log for this error:
The first record highlights row 28 in the SIF file SIF_Out.pipe, with the SSK of
(1000000,70005817), which the SIF parser is unable to parse. The error message
shows Unable to parse record at RT/ST Level, and the error severity level
is 0. The name of the stage (tx_RTST_ci_Rejects) and the job name
(IL_010_Parse_Columnization) are also provided.
This row is rejected.
Note: Currently, the pipe character cannot be substituted as the field delimiter,
nor is an escape character provided.
Example D-4 on page 280 shows the contents of the error log for this error:
The first record highlights row 2 in the SIF file SIF_Out.CodeError, with the
SSK of (1000002,8000090), that is in error. The error message shows “The
following is not correct: ClientPotentialType”, and the error severity level is 0.
The name of the stage
(020_Contact.CheckCodeAndContentValidationErrors) and the job name
(IL_020_VS_Contact) are also provided.
The second row has the error message “Record In Error Dropped” for the
same row (2) in the SIF. It also shows the name of the stage in which this
occurs, 020Contact.DropErrorRows, and the job name,
IL_020_VS_Contact.
The subsequent records show the rows (689, 780, and 41) in the SIF that are
also rejected because they are associated with row 2, which was dropped. The
messages “Invalid PersonName Records: No Matching Contact Record” (row
689) and “Record dropped by association. Fatal errors were detected on
related party records.” (rows 780 and 41) are generated.
Example: D-3 Validation error with the code table error—partial contents of SIF
P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||1984-05-07
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
P|P|1000002|8000090|A|N|||||||2||-13||||||||||||||||||3|||||M|1975-09-02
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
P|P|1000001|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
P|P|1000002|8000037|A|N|||||||1||||||||||||||||||||3|||||F|1986-09-02
00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
0|0|0|0|
Example: D-4 Validation error with the code table error log output
P|P|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|2|1624|100054|
The following is not correct: ClientPotentialType|0|canonical_errCode|8611|2008-11-01
09:24:13|020_Contact.CheckCodeAndContentValidationErrors|IL_020_VS_Contact
P|P|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|2|110184|10006
6|Record In Error Dropped|0|canonical_errCode|8611|2008-11-01
09:24:13|020Contact.DropErrorRows|IL_020_VS_Contact
P|H|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|689|110126|100
246|Invalid PersonName Records: No Matching Contact
Record|0|canonical_errCode|8611|2008-11-01 09:25:10|030_CONTACT_RIV.Party Join
Proc|IL_030_RI_Contact_Person_Org
P|I|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|780|110387|100
387|Record dropped by association. Fatal errors were detected on related party
records.|0|canonical_errCode|8611|2008-05-30
09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|A|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|41|110387|1003
87|Record dropped by association. Fatal errors were detected on related party
records.|0|canonical_errCode|8611|2008-05-30
09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
When this error occurs, any additional columns detected after the final expected column (as defined by the metadata) are ignored, and a warning message (“Import consumed only 74 bytes of the record's 164 bytes (no further warnings will be generated from this partition)”) is written to the Director log, as shown in Figure D-2 on page 284.
Attention: The main point here is to carefully review the Director log output for such warnings, because they do not appear in the RDP for MDM error logs. The count of bytes (74 in our example) begins after the SSK, because that is where the columns begin, and the count includes the column delimiter pipe (|) character.
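Because these warnings surface only in the Director log, it can help to scan an exported copy of that log programmatically. The following Python sketch is not part of the product; it assumes the warning wording matches the sample quoted above and simply extracts the consumed and expected byte counts:

```python
# Sketch: scan Director log lines for short-record warnings.
# The warning text pattern is an assumption based on the sample
# message quoted in this appendix.
import re

WARNING = re.compile(
    r"Import consumed only (\d+) bytes of the record's (\d+) bytes"
)

def find_short_record_warnings(log_lines):
    """Yield (consumed_bytes, expected_bytes) for each warning found."""
    for line in log_lines:
        m = WARNING.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))

sample = [
    "Import consumed only 74 bytes of the record's 164 bytes "
    "(no further warnings will be generated from this partition)"
]
print(list(find_short_record_warnings(sample)))  # [(74, 164)]
```

Running this over the full exported log quickly shows which partitions silently dropped trailing columns.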
Figure D-2 End of record missing error: partial contents of Director log output
Example D-9 on page 286 shows the contents of the error log for this error:
The second record highlights row 8 in the SIF file SIF_Out.endBeforeStartDate, with the SSK of (1000002,8000212), that is in error. The error message shows “EndDate must be after StartDate”, and the error severity level is 0. The name of the stage (020_Contact.CheckCodeAndContentValidationErrors) and the job name (IL_020_VS_Contact) are also provided.
The first record also highlights the fact that row 8 is dropped, with the error message “Record In Error Dropped”.
The subsequent records are errors resulting from the invalid date bounds. Note the various rows (157, 254, 353, and 675), the error messages, and the stage and job in which these errors were detected.
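Reading the pipe-delimited error records by eye is tedious. The following Python sketch splits one record into named fields; the field names are our own labels inferred from the examples in this appendix, not official column names, so treat them as assumptions:

```python
# Sketch: split one RDP for MDM error-log record into named fields.
# Field names are inferred from the log examples in this appendix
# and are NOT official product column names.
FIELDS = [
    "party_type",      # e.g., P
    "record_type",     # e.g., P, H, I, A, C
    "source_id",       # first part of the SSK, e.g., 1000002
    "source_key",      # second part of the SSK, e.g., 8000212
    "flag",
    "error_file",      # path of the SIF error output file
    "row",             # row number in the SIF file
    "reject_code",
    "error_code",
    "message",
    "severity",
    "error_category",
    "run_id",
    "timestamp",
    "stage",           # stage that detected the error
    "job",             # job that detected the error
]

def parse_error_record(line):
    """Split a pipe-delimited error record into a dict of named fields."""
    return dict(zip(FIELDS, line.rstrip("\n").split("|")))

record = parse_error_record(
    "P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/"
    "SIF_Out.endBeforeStartDate|8|102|100056|EndDate must be after "
    "StartDate|0|canonical_endBeforeStartDate|9107|2008-11-04 "
    "04:44:17|020_Contact.CheckCodeAndContentValidationErrors|"
    "IL_020_VS_Contact"
)
print(record["row"], record["message"], record["job"])
# 8 EndDate must be after StartDate IL_020_VS_Contact
```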
Example D-9 Start date after end date error log output
P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|8|110184|100066|Record In Error Dropped|0|canonical_endBeforeStartDate|9107|2008-11-04 04:44:17|020Contact.DropErrorRows|IL_020_VS_Contact
P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|8|102|100056|EndDate must be after StartDate|0|canonical_endBeforeStartDate|9107|2008-11-04 04:44:17|020_Contact.CheckCodeAndContentValidationErrors|IL_020_VS_Contact
P|H|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|675|110126|100246|Invalid PersonName Records: No Matching Contact Record|0|canonical_endBeforeStartDate|9107|2008-11-04 04:45:44|030_CONTACT_RIV.Party Join Proc|IL_030_RI_Contact_Person_Org
P|I|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|254|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_endBeforeStartDate|9107|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|C|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|353|110387|100387|Record dropped by association. Fatal errors were detected on related
Example D-11 on page 288 shows the contents of the error log for this error:
The first record highlights row 1 in the SIF file SIF_Out.dateFormatError, with the SSK of (1000002,8000719), that is in error. The error message shows “Unable to parse record at RT/ST Level”, and the error severity level is 0. The name of the stage (tx_RTST_ci_Rejects) and the job name (IL_010_Parse_Columnization) are also provided.
The subsequent records are errors resulting from the invalid date format. Note the various rows (227, 422, 513, and 859), the error messages, and the stage and job in which these errors were detected.
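Because one bad source record produces a cascade of entries across several jobs, it can help to group the error log by SSK and review each source record's trail in one place. The sketch below is illustrative only; the field positions are inferred from the log examples in this appendix:

```python
# Sketch: group error-log records by SSK to trace one bad source
# record through the jobs. Field positions (2 and 3 for the SSK,
# 9 for the message) are inferred from the examples in this appendix.
from collections import defaultdict

def group_by_ssk(lines):
    """Return {(source_id, source_key): [(record_type, message, job)]}."""
    groups = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("|")
        ssk = (f[2], f[3])                      # (source id, source key)
        groups[ssk].append((f[1], f[9], f[-1])) # record type, message, job
    return groups

log = [
    "P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/"
    "SIF_Out.endBeforeStartDate|8|110184|100066|Record In Error Dropped|0|"
    "canonical_endBeforeStartDate|9107|2008-11-04 04:44:17|"
    "020Contact.DropErrorRows|IL_020_VS_Contact",
    "P|H|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/"
    "SIF_Out.endBeforeStartDate|675|110126|100246|Invalid PersonName "
    "Records: No Matching Contact Record|0|canonical_endBeforeStartDate|"
    "9107|2008-11-04 04:45:44|030_CONTACT_RIV.Party Join Proc|"
    "IL_030_RI_Contact_Person_Org",
]
for ssk, errors in group_by_ssk(log).items():
    print(ssk, errors)
```

For the two sample records above, both entries appear under the single SSK (1000002, 8000212), making the cascade from the validation job into the referential-integrity job easy to follow.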
Select the Additional materials and open the directory that corresponds with
the IBM Redbooks form number, SG247704.
RDP for MDM overview

IBM InfoSphere Rapid Deployment Package (RDP) for MDM provides a rapid deployment approach to implementing MDM solutions that provide immediate return on investment. It provides a seamless upgrade path to IBM InfoSphere MDM Server, which provides the complete range of MDM functionality in the market today.

Financial services scenario

In this IBM Redbooks publication, we use a simple financial services MDM scenario to describe in detail the RDP for MDM offering and show how it can deliver a return on investment in a short timeframe, using a phased approach that ensures minimal risk.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

SG24-7704-00 0738432636