
Informatica Data Quality (Version 9.5.1 HotFix 2)
User Guide

Informatica Data Quality User Guide
Version 9.5.1 HotFix 2
June 2013
Copyright (c) 2009-2013 Informatica Corporation. All rights reserved.
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any
means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and
other Patents Pending.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS
227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in
writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart,
Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand,
Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data
Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product
names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved.
Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright ©
Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All
rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt,
Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights
reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright ©
Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva,
Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for
Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright ©
International Business Machines Corporation. All rights reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright
(c) University of Toronto. All rights reserved. Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright ©
MicroQuill Software Publishing, Inc. All rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright ©
2003-2010 Lorenzi Davide. All rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights
reserved. Copyright © EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved. Copyright © Jinfonet Software. All rights reserved. Copyright © Apple
Inc. All rights reserved. Copyright © Telerik Inc. All rights reserved. Copyright © BEA Systems. All rights reserved.


This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions of the
Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in writing, software
distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licenses for
the specific language governing permissions and limitations under the Licenses.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright ©
1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License Agreement, which may be
found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a particular purpose.
The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and
Vanderbilt University, Copyright (c) 1993-2006, all rights reserved.


This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this
software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.
This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this software
are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby
granted, provided that the above copyright notice and this permission notice appear in all copies.
The product includes software copyright © 2001-2005 MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at
http://www.dom4j.org/license.html.
The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available
at http://dojotoolkit.org/license.
This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this
software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.
This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at
http://www.gnu.org/software/kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland.


Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.
This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to
terms available at http://www.boost.org/LICENSE_1_0.txt.
This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at
http://www.pcre.org/license.txt.
This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at
http://www.eclipse.org/org/documents/epl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html,
http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html,
http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org,
http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement;
http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html;
http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html;
http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/; http://www.postgresql.org/about/licence.html;
http://www.sqlite.org/copyright.html; http://www.tcl.tk/software/tcltk/license.html; http://www.jaxen.org/faq.html; http://www.jdom.org/docs/faq.html; http://www.slf4j.org/license.html;
http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html;
http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt;
http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/;
https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js; http://www.h2database.com/html/license.html#summary; and http://jsoncpp.sourceforge.net/LICENSE.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License
(http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement
Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the MIT License (http://www.opensource.org/licenses/mit-license.php), and
the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).
This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are
subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information
please visit http://www.extreme.indiana.edu/.
This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226;
6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422; 7,676,516;
7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.
DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to
change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation
("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE
POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF
CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
Part Number: DQ-UG-95100-HF2-0001
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica My Support Portal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Web Site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica How-To Library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Knowledge Base. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Informatica Support YouTube Channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Informatica Marketplace. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Informatica Velocity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Informatica Global Customer Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Part I: Informatica Data Quality Concepts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 1: Introduction to Data Quality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Data Quality Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Chapter 2: Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Reference Data Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
User-Defined Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Informatica Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Reference Data and Transformations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Reference Table Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Managed and Unmanaged Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Content Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Character Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Classifier Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Pattern Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Probabilistic Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Regular Expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Token Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Creating a Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Creating a Reusable Content Expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Generating Reference Data from a Midstream Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 3: Classifier Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Classifier Models Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Classifier Transformation Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Classifier Model Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Compiling a Classifier Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Classifier Model Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Adding Data and Label Values to a Classifier Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Deleting Data Values from a Classifier Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Classifier Model Label Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Adding a Label to a Classifier Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Assigning Labels to Classifier Model Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Deleting a Label from a Classifier Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Classifier Scores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Classifier Model Views. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Classifier Model Filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Finding Values in Reference Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Finding Data Rows with no Labels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Finding a Value in a Data Row. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Creating a Classifier Model from a Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Copy and Paste Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Copying a Classifier Model to Another Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Importing a Classifier Model from Another Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 4: Probabilistic Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Probabilistic Models Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Labeler Transformation Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Parser Transformation Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Probabilistic Model Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Compiling the Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Probabilistic Model Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Adding a Reference Data String to a Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Copying Data Strings to a Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Probabilistic Model Label Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Overflow Label. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Assigning Labels to Probabilistic Model Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Adding a Label to a Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Deleting a Label from a Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Probabilistic Model Advanced Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Creating an Empty Probabilistic Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Creating a Probabilistic Model from a Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Copy and Paste Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Copying a Probabilistic Model to Another Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Importing a Probabilistic Model from Another Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Part II: Data Quality Features in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Chapter 5: Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . 34
Column Profile Concepts Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Column Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Scorecards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Filtering Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Sampling Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Creating a Single Data Object Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter 6: Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . 38
Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Column Value Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Column Pattern Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Column Statistics Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Exporting Profile Results from Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Chapter 7: Rules in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Rules in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Creating a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Applying a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 8: Scorecards in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Scorecards in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Creating a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Exporting a Resource File for Scorecard Lineage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Viewing Scorecard Lineage from Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Chapter 9: Mapplet and Mapping Profiling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Mapplet and Mapping Profiling Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Running a Profile on a Mapplet or Mapping Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Comparing Profiles for Mapping or Mapplet Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Generating a Mapping from a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Chapter 10: Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Reference Table Data Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Creating a Reference Table Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Creating a Reference Table from a Relational Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Editing Reference Table Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Finding Data Values in a Reference Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Part III: Data Quality Features in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Chapter 11: Column Profiles in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . 54
Column Profiles in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Column Profiling Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Profile Results Option. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Sampling Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Drilldown Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Creating a Column Profile in the Analyst Tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Editing a Column Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Running a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Creating a Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Managing Filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Synchronizing a Flat File Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Synchronizing a Relational Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 12: Column Profile Results in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . 60
Column Profile Results in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Profile Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Column Statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Column Profile Drilldown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Drilling Down on Row Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Applying Filters to Drilldown Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Column Profile Export Files in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Profile Export Results in a CSV File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Profile Export Results in Microsoft Excel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Exporting Profile Results from Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Chapter 13: Rules in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Rules in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Predefined Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Predefined Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Applying a Predefined Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Expression Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Expression Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Creating an Expression Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Chapter 14: Scorecards in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Scorecards in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Informatica Analyst Scorecard Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Metric Weights. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Adding Columns to a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Running a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Viewing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Editing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Defining Thresholds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Metric Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Drilling Down on Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Viewing Trend Charts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Notification Email Message Template. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Setting Up Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Configuring Global Settings for Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Scorecard Integration with External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Viewing a Scorecard in External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Scorecard Lineage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Viewing Scorecard Lineage in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 15: Exception Record Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Exception Record Management Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Exception Management Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Reserved Column Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Exception Management Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Viewing and Editing Bad Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Updating Bad Record Status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Viewing and Filtering Duplicate Record Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Editing Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Consolidating Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Viewing the Audit Trail. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Chapter 16: Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
General Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Reference Table Column Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Create Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Creating a Reference Table in the Reference Table Editor. . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Create a Reference Table from Profile Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Creating a Reference Table from Profile Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Creating a Reference Table from Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Creating a Reference Table from Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Create a Reference Table From a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Analyst Tool Flat File Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Create a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Creating a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Creating a Database Connection for a Reference Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Reference Table Updates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Managing Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Managing Rows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Finding and Replacing Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Exporting a Reference Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Enable and Disable Edits to an Unmanaged Reference Table. . . . . . . . . . . . . . . . . . . . . . . . . 98
Audit Trail Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Viewing Audit Trail Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Rules and Guidelines for Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Preface
The Informatica Data Quality User Guide is written for Informatica users who create and run data quality processes in
the Informatica Developer and Informatica Analyst client applications. The Informatica Data Quality User Guide
contains information about profiles and other objects that you can use to analyze the content and structure of data and
to find and fix data quality issues.
Informatica Resources
Informatica My Support Portal
As an Informatica customer, you can access the Informatica My Support Portal at http://mysupport.informatica.com.
The site contains product information, user group information, newsletters, access to the Informatica customer
support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base,
Informatica Product Documentation, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have
questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email
at infa_documentation@informatica.com. We will use your feedback to improve our documentation. Let us know if we
can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your product,
navigate to Product Documentation from http://mysupport.informatica.com.
Informatica Web Site
You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about
Informatica, its background, upcoming events, and sales offices. You will also find product and partner information.
The services area of the site includes important information about technical support, training and education, and
implementation services.
Informatica How-To Library
As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The
How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes
articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors,
and guide you through performing specific real-world tasks.
Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com.
Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products.
You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have
questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through
email at KB_Feedback@informatica.com.
Informatica Support YouTube Channel
You can access the Informatica Support YouTube channel at http://www.youtube.com/user/INFASupport. The
Informatica Support YouTube channel includes videos about solutions that guide you through performing specific
tasks. If you have questions, comments, or ideas about the Informatica Support YouTube channel, contact the
Support YouTube team through email at supportvideos@informatica.com or send a tweet to @INFASupport.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment, extend, or
enhance data integration implementations. By leveraging any of the hundreds of solutions available on the
Marketplace, you can improve your productivity and speed up time to implementation on your projects. You can
access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at http://mysupport.informatica.com. Developed from the real-world experience
of hundreds of data management projects, Informatica Velocity represents the collective knowledge of our
consultants who have worked with organizations from around the world to plan, develop, deploy, and maintain
successful data management solutions. If you have questions, comments, or ideas about Informatica Velocity,
contact Informatica Professional Services at ips@informatica.com.
Informatica Global Customer Support
You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a
user name and password. You can request a user name and password at http://mysupport.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America
Toll Free:
Brazil 0800 891 0202
Mexico 001 888 209 8853
North America +1 877 463 2435

Europe / Middle East / Africa
Toll Free:
France 0805 804632
Germany 0800 5891281
Italy 800 915 985
Netherlands 0800 2300001
Portugal 800 208 360
Spain 900 813 166
Switzerland 0800 463 200
United Kingdom 0800 023 4632
Standard Rate:
Belgium +31 30 6022 797
France +33 1 4138 9226
Germany +49 1805 702702
Netherlands +31 30 6022 797
United Kingdom +44 1628 511445

Asia / Australia
Toll Free:
Australia 1 800 120 365
Asia Pacific 00 080 00016360
China 400 810 0900
Part I: Informatica Data Quality Concepts
This part contains the following chapters:
Introduction to Data Quality, 2
Reference Data, 4
Classifier Models, 15
Probabilistic Models, 23
Chapter 1: Introduction to Data Quality
This chapter includes the following topic:
Data Quality Overview, 2
Data Quality Overview
Use Informatica Data Quality to analyze the content and structure of your data and enhance the data in ways that meet
your business needs.
You use Informatica applications to design and run processes to complete the following tasks:
Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can
identify strengths and weaknesses in data and help you define a project plan.
Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in
a profile.
Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a
profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that
the city, state, and ZIP code values are consistent.
Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the
type of information it contains. Parsing can also add information to records. For example, you can define a parsing
operation to add units of measurement to product data.
Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal
address data. Address validation corrects errors in addresses and completes partial addresses by comparing
address records against address reference data from national postal carriers. Address validation can also add
postal information that speeds mail delivery and reduces mail costs.
Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data
from one or more fields in each record. You select the fields to be analyzed, and you select the comparison
strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which
identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in
record data.
Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can
run a mapping to capture any exception record that remains in a data set after you run other data quality processes.
You review and edit exception records in the Analyst tool or in Informatica Data Director for Data Quality.
Create reference data tables. Informatica provides reference data that can enhance several types of data quality
process, including standardization and parsing. You can create reference tables using data from profile results.
Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project
objectives. You can create mapplets and validate them as rules in the Developer tool.
Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is
available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can
take ownership of objects at different stages of a project.
Export mappings to PowerCenter. You can export and run mappings in PowerCenter. You can export mappings to
PowerCenter to reuse the metadata for physical data integration or to create web services.
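The field matching approach described above can be sketched with a simple string similarity measure. The following example uses Python's difflib to average the similarity of selected fields across two records; the field names, records, and scoring scheme are illustrative, not the product's matching algorithm.

```python
from difflib import SequenceMatcher

def field_similarity(record_a, record_b, fields):
    """Average the string similarity of the selected fields of two records."""
    scores = []
    for field in fields:
        ratio = SequenceMatcher(None, record_a[field].lower(),
                                record_b[field].lower()).ratio()
        scores.append(ratio)
    return sum(scores) / len(scores)

# Two records that likely describe the same person.
a = {"name": "John Smith", "city": "Boston"}
b = {"name": "Jon Smith", "city": "Boston"}
score = field_similarity(a, b, ["name", "city"])  # close to 1.0
```

A real duplicate analysis would compare many record pairs and apply a threshold to decide which pairs belong in the same cluster.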
Chapter 2: Reference Data
This chapter includes the following topics:
Reference Data Overview, 4
User-Defined Reference Data, 5
Informatica Reference Data, 6
Reference Data and Transformations, 6
Reference Tables, 7
Content Sets, 8
Reference Data Overview
A reference data object contains a set of data values that you can use to perform search operations on source data.
You can create reference data objects in the Developer tool and Analyst tool, and you can import reference data
objects to the Model repository. The Data Quality Content installer includes reference data objects that you can import.
You can create and edit the following types of reference data:
Reference tables
A reference table contains standard and alternative versions of a set of data values. You add a reference table to
a transformation in the Developer tool to verify that source data values are accurate and correctly formatted.
A database table contains at least two columns. One column contains the standard or preferred version of a
string, and other columns contain alternative versions. When you add a reference table to a transformation, the
transformation searches the input port data for values that also appear in the table. You can create tables with
any data that is useful to the data project you work on.
Content Sets
Content sets are repository and file objects that contain reference data values. Content sets are similar in
structure to reference tables, but they are more commonly used for lower-level data operations. There are
different types of content sets. When you add a content set to a transformation, the transformation searches the
input port data for values that appear in the content set or for strings that match the data patterns defined in the
content set.
You can download the Data Quality Content Installer from Informatica. The installer includes the following types
of reference data:
Informatica reference tables
Database tables created by Informatica. You import Informatica reference tables when you import accelerator
objects from the Content Installer. The reference tables contain standard and alternative versions of common
business terms from several countries. The types of reference information include telephone area codes,
postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit
Informatica reference tables.
Informatica content sets
Content sets created by Informatica. You import content sets when you import accelerator objects from the
Content Installer. A content set contains different types of reference data that you can use to perform search
operations in data quality transformations.
Address reference data files
Reference data files that identify all valid addresses in a country. The Address Validator transformation reads this
data. The Content Installer installs files for the countries that you have purchased. Address reference data is
current for a defined period, and you must refresh the data regularly, for example every quarter. You cannot
create, view, or edit address reference data files.
Identity population files
Contain information on types of personal, household, and corporate identities. The Match transformation and the
Comparison transformation use this data to parse potential identities from input fields. You cannot create or edit
identity population files.
The Content Installer writes population files to the file system.
User-Defined Reference Data
You can use the values in a data object to create a reference data object.
For example, you can select a data object or profile column that contains values that are specific to a project or
organization. The column values let you create custom reference data objects for a project.
You can build a reference data object from a data column in the following cases:
The data rows in the column contain the same type of information.
The column contains a set of data values that are either correct or incorrect for the project.
Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.
The following table lists common examples of project data columns that can contain reference data:

Stock Keeping Unit (SKU) codes
Use an SKU column to create a reference table of valid SKU codes for an organization. Use the reference table to
find correct or incorrect SKU codes in a data set.

Employee codes
Use an employee code or employee ID column to create a reference table of valid employee codes. Use the
reference table to find errors in employee data.

Customer account numbers
Run a profile on a customer account column to identify account number patterns. Use the profile to create a token
set of incorrect data patterns. Use the token set to find account numbers that do not conform to the correct
account number structure.

Customer names
When a customer name column contains first, middle, and last names, you can create a probabilistic model that
defines the expected structure of the strings in the column. Use the probabilistic model to find data strings that do
not belong in the column.
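As a minimal sketch of the employee-code case above, a set of known valid codes drawn from a trusted column can flag records that fail the check. The code values and field names here are hypothetical.

```python
# Hypothetical set of valid employee codes taken from a trusted column.
valid_codes = {"E1001", "E1002", "E1003"}

def flag_invalid(records):
    """Return the records whose employee code is not in the reference set."""
    return [r for r in records if r["emp_code"] not in valid_codes]

data = [{"emp_code": "E1001"}, {"emp_code": "E9999"}]
bad = flag_invalid(data)  # only the E9999 record is flagged
```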
Informatica Reference Data
You purchase and download address reference data and identity population data from Informatica. You purchase an
annual subscription to address data for a country, and you can download the latest address data from Informatica at
any time during the subscription period.
The Content Installer user downloads and installs reference data separately from the applications. Contact an
Administrator tool user for information about the reference data installed on your system.
Reference Data and Transformations
Several transformations read reference data to perform data quality tasks.
The following transformations can read reference data:
Address Validator. Reads address reference data to verify the accuracy of addresses.
Case Converter. Reads reference data tables to identify strings that must change case.
Classifier. Reads content set data to identify the type of information in a string.
Comparison. Reads identity population data during duplicate analysis.
Labeler. Reads content set data to identify and label strings.
Match. Reads identity population data during duplicate analysis.
Parser. Reads content set data to parse strings based on the information they contain.
Standardizer. Reads reference data tables to standardize strings to a common format.
You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a reference
table from column profile data. You can export reference tables to the file system.
The Data Quality Content Installer file set includes Informatica reference data objects that you can import.
Reference Tables
A reference table contains the standard versions of a set of data values and alternative versions of the values that
might occur in business data.
You add a reference table to a transformation in the Developer tool. You use the transformations to find reference data
values in input data and to write the alternative values as output data.
You create a reference table in the following ways:
Create an empty reference table and enter the data values.
Create a reference table from data in a flat file.
Create a reference table from data in another database table.
Create a reference table from column profile results.
You can create a reference table in the Developer tool or Analyst tool. You can edit reference table data in the
Developer tool. You can edit reference table data and metadata in the Analyst tool. When you create a reference table,
the Model repository stores the table metadata as a repository object.
Reference Table Structure
Most reference tables contain at least two columns. One column contains the correct or required versions of the data
values. Other columns contain different versions of the values, including alternative versions that may appear in the
source data.
The column that contains the correct or required values is called the valid column. When a transformation reads a
reference table in a mapping, the transformation looks for values in the non-valid columns. When the transformation
finds a non-valid value, it returns the corresponding value from the valid column. You can also configure a
transformation to return a single common value instead of the valid values.
The valid column can contain data that is formally correct, such as ZIP codes. It can contain data that is relevant to a
project, such as stock keeping unit (SKU) numbers that are unique to an organization. You can also create a valid
column from bad data, such as values that contain known data errors that you want to search for.
For example, a Developer tool user creates a reference table that contains a list of valid SKU numbers in a retail
organization. The user adds the reference table to a Labeler transformation and creates a mapping with the
transformation. The user runs the mapping on a product database table. When the mapping runs, the Labeler creates
a column that identifies the product records that do not contain valid SKU numbers.
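The valid/non-valid lookup described above can be sketched as follows. The column names and values are illustrative, not the product's implementation: when an input value appears in a non-valid column, the sketch returns the corresponding value from the valid column.

```python
# Minimal sketch of a reference table with one valid column and two
# alternative columns. The rows and column names are hypothetical.
reference_rows = [
    {"valid": "Massachusetts", "alt1": "MA", "alt2": "Mass."},
    {"valid": "California",    "alt1": "CA", "alt2": "Calif."},
]

def standardize(value):
    """Return the valid version of a value found in a non-valid column."""
    for row in reference_rows:
        if value in (row["alt1"], row["alt2"]):
            return row["valid"]
    return value  # unchanged when no non-valid column matches

standardize("Calif.")  # returns "California"
```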
Reference Tables and the Parser Transformation
You create a reference table with a single column when you want to use the table data in a pattern-based parsing
operation. You configure the Parser transformation to perform pattern-based parsing, and you import the data to the
transformation configuration.
Managed and Unmanaged Reference Tables
Reference tables store metadata in the Model repository. Reference tables can store column data in the reference
data warehouse or in another database.
When the reference data warehouse stores the column data, Informatica services identify the table as a managed
reference table. When another database stores the column data, Informatica services identify the table as an
unmanaged reference table. The Content Management Service specifies the database connection for the reference
data warehouse.
You can edit managed and unmanaged reference table data in the Developer tool and Analyst tool. You can edit
managed and unmanaged reference table metadata in the Analyst tool.
Before you edit the table, verify that you have the required privileges on the following services:
Content Management Service. To edit reference table data, you need the Edit Reference Table Data privilege. To
edit reference table metadata, you need the Edit Reference Table Metadata privilege.
Model Repository Service. To view the project that contains the reference table, you need the Create Project
privilege.
Use the Security options in the Administrator tool to review or update the service privileges.
To edit data in an unmanaged reference table, also verify that you configured the reference table object to permit
edits.
Note: If you edit the metadata for an unmanaged reference table in a database application, use the Analyst tool to
synchronize the Model repository with the database table. You must synchronize the Model repository and the
database table before you use the unmanaged reference table in the Developer tool.
Content Sets
A content set is a Model repository object that you use to store reusable content expressions. A content expression is
an expression that you can use in Labeler and Parser transformations to identify data.
You can create content sets to organize content expressions into logical groups. For example, if you create a number
of content expressions that identify Portuguese strings, you can create a content set that groups these content
expressions. Create content sets in the Developer tool.
Content expressions include character sets, pattern sets, regular expressions, and token sets. Content expressions
can be system-defined or user-defined. System-defined content expressions cannot be added to content sets. User-
defined content expressions can be reusable or non-reusable.
Character Sets
A character set contains expressions that identify specific characters and character ranges. You can use character
sets in Labeler transformations that use character labeling mode.
Character ranges specify a sequential range of character codes. For example, the character range "[A-C]" matches
the uppercase characters "A," "B," and "C." This character range does not match the lowercase characters "a," "b," or
"c."
Use character sets to identify a specific character or range of characters as part of labeling operations. For example,
you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you can identify
patterns with a Parser transformation and write problematic patterns to separate output ports.
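The character labeling operation described above can be sketched with a regular-expression character range. This is an illustration of the concept only, not the Labeler transformation's implementation; the "9" label is an assumption.

```python
import re

def label_characters(value, pattern=r"[0-9]", label="9"):
    """Replace each character that matches the range with its label,
    mimicking a character labeling pass over a telephone number column."""
    return "".join(label if re.fullmatch(pattern, ch) else ch for ch in value)

label_characters("(617) 555-0123")  # returns "(999) 999-9999"
```

As in the example above, the range `[A-C]` would match only the uppercase characters, so `r"[A-C]"` as the pattern would leave "a", "b", and "c" unlabeled.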
Character Set Properties
Configure properties that determine character labeling operations for a character set.
The following table describes the properties for a user-defined character set:

Label
Defines the label that a Labeler transformation applies to data that matches the character set.

Standard Mode
Enables a simple editing view that includes fields for the start range and end range.

Start Range
Specifies the first character in a character range.

End Range
Specifies the last character in a character range. For a range with a single character, leave this field blank.

Advanced Mode
Enables an advanced editing view where you can manually enter character ranges using range characters and
delimiter characters.

Range Character
Temporarily changes the symbol that signifies a character range. The range character reverts to the default
character when you close the character set.

Delimiter Character
Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default
character when you close the character set.
Classifier Models
A classifier model analyzes input strings and determines the types of information that they contain. You use a
classifier model in a Classifier transformation.
Use a classifier model when input strings contain significant amounts of data. For example, you can use a classifier
model to identify the subject matter in a set of documents. You export the text from each document, and you store each
document as a separate field in a single data column. The Classifier transformation reads the data and classifies the
subject matter in each field according to the labels defined in the classifier model.
The classifier model contains the following columns:
Data column
A column that contains the words and phrases that are likely to exist in the input data. The transformation
compares the input data with the data in this column.
Label column
A column that contains descriptive labels that can define the information in the data. The transformation returns a
label from this column as output.
The classifier model also contains compilation data that the Classifier transformation uses to calculate the correct
information type for the input data.
You create a classifier model in the Developer tool. The Model repository stores the metadata for the classifier model
object. The column data and compilation data are stored in a file in the Informatica directory structure.
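As a rough illustration of how a label column can classify input data, the following sketch assigns the label whose data column words overlap most with the input field. The real classifier model also uses compilation data, which this example ignores; the labels and phrases are hypothetical.

```python
# Hypothetical model: each label maps to words likely to appear in
# input fields of that type.
model = {
    "finance": {"invoice", "payment", "account", "balance"},
    "shipping": {"carrier", "tracking", "parcel", "delivery"},
}

def classify(text):
    """Return the label with the greatest word overlap with the input."""
    words = set(text.lower().split())
    return max(model, key=lambda label: len(words & model[label]))

classify("tracking number for your parcel delivery")  # returns "shipping"
```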
Pattern Sets
A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can use
pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports. Use
pattern sets in Parser transformations that use pattern parsing mode.
For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This
transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You can
configure the Parser transformation to write names and initials in the output to separate ports.
Pattern Set Properties
Configure properties that determine the patterns in a pattern set.
The following property applies to a user-defined pattern set:
Pattern
Defines the patterns that the pattern parser searches for. You can enter multiple patterns for one pattern set. You can enter patterns constructed from a combination of wildcards, characters, and strings.
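To illustrate the idea outside the product: pattern parsing operates on the labels that a token labeling step assigns, not on the raw text. The following sketch is a minimal, hypothetical version of label-sequence matching; the label names ("Word", "Init") and the labeling rule are assumptions, not Informatica's token labeling logic.

```python
# Conceptual sketch of pattern parsing over token labels. The label
# names ("Word", "Init") and the labeling rule are assumptions, not
# Informatica's token labeling logic.

def label_tokens(tokens):
    """Label each token: "Init" for single letters (with or without a period), else "Word"."""
    return ["Init" if len(token.rstrip(".")) == 1 else "Word" for token in tokens]

def matches_pattern(tokens, pattern):
    """Return True when the token labels match the pattern exactly."""
    return label_tokens(tokens) == pattern.split()

matches_pattern("John F Kennedy".split(), "Word Init Word")  # → True
```

A Parser transformation in pattern parsing mode applies this kind of test to each input row and routes rows that match a pattern, such as a name with a middle initial, to the ports you configure.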
Probabilistic Models
A probabilistic model identifies data values by the types of information that they represent and by the position of the
values in an input string.
You use probabilistic models with the Labeler and Parser transformations.
A probabilistic model contains the following columns:
- An input column that represents the data on the input port. You populate the column with sample data from the input port. The model uses the sample data as reference data in parsing and labeling operations.
- One or more label columns that identify the types of information in each input string. You add the label columns to the model, and you assign labels to the data values in each string. Use the label columns to indicate the correct position of the data values in the string.
You create a probabilistic model in the Developer tool. The Model repository stores the metadata for the probabilistic
model object. The column data and compilation data are stored in a file in the Informatica directory structure.
The probabilistic model also contains compilation data that the transformations can use to calculate the correct
information type for the input data. You update the model logic when you compile the model in the Developer tool.
Regular Expressions
In the context of content sets, a regular expression is an expression that you can use in parsing and labeling
operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions in
Parser transformations that use token parsing mode. You can also use regular expressions in Labeler transformations
that use token labeling mode.
Parser transformations use regular expressions to match patterns in input data and parse all matching strings to one
or more outputs. For example, you can use a regular expression to identify all email addresses in input data and parse
each email address component to a different output.
Labeler transformations use regular expressions to match an input pattern and create a single label. Regular
expressions that have multiple outputs do not generate multiple labels.
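As an illustration of the parsing behavior described above, the following sketch uses a standard regular expression with one capture group per output to split email addresses into components. The pattern and the component names are assumptions for this sketch, not the expression syntax that ships with the product.

```python
import re

# Illustrative pattern with one capture group per output:
# account name, domain, and top-level domain. The grouping is an
# assumption for this sketch, not the product's expression syntax.
EMAIL_PATTERN = re.compile(r"\b([\w.]+)@([\w-]+)\.([A-Za-z]{2,})\b")

def parse_emails(text):
    """Return one (account, domain, tld) tuple per email address found."""
    return EMAIL_PATTERN.findall(text)

parse_emails("Contact jan.seedorf@example.com or support@example.org today.")
# → [('jan.seedorf', 'example', 'com'), ('support', 'example', 'org')]
```

In a Parser transformation, each capture group would correspond to one of the outputs defined by the Number of Outputs property; in a Labeler transformation, a match on the whole pattern would produce a single label.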
Regular Expression Properties
Configure properties that determine how a regular expression identifies and writes output strings.
The following properties apply to a user-defined regular expression:
Number of Outputs
Defines the number of output ports that the regular expression writes.
Regular Expression
Defines a pattern that the Parser transformation uses to match strings.
Test Expression
Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
Next Expression
Moves to the next string that matches the regular expression and changes the font of that string to bold.
Previous Expression
Moves to the previous string that matches the regular expression and changes the font of that string to bold.
Token Sets
A token set contains expressions that identify specific tokens. You can use token sets in Labeler transformations that
use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode.
Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token
set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens,
you can use the Parser transformation to write email addresses to output ports that you specify.
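The email example above can be sketched with an ordinary regular expression that tests whether a token matches an "AccountName@DomainName" format. The label name "EMAIL" and the pattern are assumptions for illustration only, not values from the product's built-in token sets.

```python
import re

# Sketch of a regular-expression token set. The label name "EMAIL"
# and the pattern are illustrative assumptions.
EMAIL_TOKEN = re.compile(r"^[\w.]+@[\w-]+(?:\.[\w-]+)+$")

def label_token(token):
    """Label tokens that match an AccountName@DomainName format."""
    return "EMAIL" if EMAIL_TOKEN.match(token) else token

[label_token(t) for t in ["contact", "ann@example.com", "today"]]
# → ['contact', 'EMAIL', 'today']
```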
Token Set Properties
Configure properties that determine the labeling operations for a token set.
A user-defined token set has the following general properties:
Name
Defines the name of the token set.
Description
Describes the token set.
Token Set Options
Defines whether the token set uses regular expression mode or character mode.
A token set in regular expression mode has the following properties:
Label
Defines the label that a Labeler transformation applies to data that matches the token set.
Regular Expression
Defines a pattern that the Labeler transformation uses to match strings.
Test Expression
Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
Next Expression
Moves to the next string that matches the regular expression and changes the font of that string to bold.
Previous Expression
Moves to the previous string that matches the regular expression and changes the font of that string to bold.
A token set in character mode has the following properties:
Label
Defines the label that a Labeler transformation applies to data that matches the character set.
Standard Mode
Enables a simple editing view that includes fields for the start range and end range.
Start Range
Specifies the first character in a character range.
End Range
Specifies the last character in a character range. For single-character ranges, leave this field blank.
Advanced Mode
Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.
Range Character
Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.
Delimiter Character
Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
Creating a Content Set
Create content sets to group content expressions according to business requirements. You create content sets in the
Developer tool.
1. In the Object Explorer view, select the project or folder where you want to store the content set.
2. Click File > New > Content Set.
3. Enter a name for the content set.
4. Optionally, select Browse to change the Model repository location for the content set.
5. Click Finish.
Creating a Reusable Content Expression
Create reusable content expressions from within a content set. You can use these content expressions in Labeler
transformations and Parser transformations.
1. Open a content set in the editor and select the Content view.
2. Select a content expression view.
3. Click Add.
4. Enter a name for the content expression.
5. Optionally, enter a text description of the content expression.
6. If you selected the Token Set expression view, select a token set mode.
7. Click Next.
8. Configure the content expression properties.
9. Click Finish.
Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste
From options to create copies of existing content expressions. You can use the CTRL key to select multiple content
expressions when using these options.
Generating Reference Data from a Midstream Profile
You can run a profile on mapping data to create a data source for a reference data object. For example, run a profile on
the object that you connect to a Labeler or Parser transformation. You can add the profile data to a probabilistic model.
When you create a probabilistic model with data from the midstream profile, you customize the model for the mapping
data.
Complete the following steps to run a midstream mapping profile and generate input data for a probabilistic model:
1. Open the mapping that contains the object that you will connect to the Labeler or Parser transformation.
2. Select a data object and click Profile Now.
Select the Results tab in the profile, and review the profile results.
3. Under Column Profiling, select the column you want to add to the probabilistic model.
4. Under Details, select the option to Show Values.
The editor displays the data values in the column you selected.
Note: You can select all values in the column or a subset of values.
5. If you want to add a subset of column values to a probabilistic model, follow these steps:
a. Use the Shift or Ctrl keys to select one or multiple values from the editor.
b. Right-click the values and select Send to > Export Results to File.
6. If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to
File.
7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the
Developer client machine.
If you save the file on the client machine, enter a path to the file.
You can use the file as a data source for the Label or Data column in the probabilistic model.
C H A P T E R 3
Classifier Models
This chapter includes the following topics:
- Classifier Models Overview
- Classifier Model Structure
- Classifier Model Reference Data
- Classifier Model Label Data
- Classifier Scores
- Classifier Model Views
- Classifier Model Filters
- Creating a Classifier Model from a Data Object
- Copy and Paste Operations
Classifier Models Overview
A classifier model is a reference data object. Use a classifier model to analyze long text strings that contain multiple
values. A classifier model identifies the most common type of information in each string.
You add a classifier model to a Classifier transformation. The transformation searches for common values between
the classifier model data and the data in each input row. The transformation uses the common values to categorize the
type of information that each row represents.
You use a classifier model when the input data has the following characteristics:
- The input data contains text. Classifier models apply natural language processes to text data to identify the types of information in the text. Natural language processes detect relevant words in the input string and disregard words that are not relevant.
- The input data strings contain multiple values. For example, you can create a data column that contains the contents of an email message in each field.
The Classifier transformation reads string datatypes. The transformation imposes no limit on the length of the input
strings.
You compile classifier models in the Developer tool. When you compile a model, you create associations between
similar data values in the model. The Classifier transformation uses the compiled data to search for information in the
input data.
Classifier Transformation Example
You can use a classifier model and a Classifier transformation to categorize email messages based on the text that
they contain.
For example, you work in a customer support center, and you review the email messages that the organization
receives from customers. The organization has customers in many countries, and it receives emails in many
languages. You want to sort the emails by language, so that you can send each email to the center that can best reply
to the customer.
You complete the following steps to sort the emails:
1. You write the email messages to a single file or a database table.
2. You create a classifier model that contains sample text for each language.
Note: You can use sample data from the email messages data as source data for the model. Copy the email
message text to a file or database table, and create a data source from the file or table in the Model repository.
3. You add the classifier model to a Classifier transformation.
4. You add the transformation to a mapping, and you connect the transformation ports to the data source and data
targets. You create a data target for each language.
When you run the mapping, the Classifier transformation analyzes the email messages and writes the email text to the
correct data target. You can share the data target with the team members in the appropriate support center.
Classifier Model Structure
A classifier model contains a column of reference data values and a column that specifies a label for each row of
reference data values. When a Classifier transformation compares the input data and the model data, the
transformation returns the label that most closely describes the input data.
A classifier model also contains compilation data. The Classifier transformation uses the compilation data to calculate
similarities between the reference data and the input data. When you compile a model, you create or update the
compilation data.
The Model repository stores the classifier model object. When you create or update a classifier model, you write the
reference data and the compiled metadata to a file on the Developer tool machine. The file is read-only. You can view
the file path in the classifier model in the Developer tool.
Compiling a Classifier Model
Each time you edit a label value or reference data value in a classifier model, you must compile the model. When you
compile the model, you update the compilation data in the model.
To update the compilation data, open the model in the Developer tool and click Compile.
Classifier Model Reference Data
A classifier model contains a reference data column that can include sentences, paragraphs, or pages of text. The
reference data represents the different types of text input that a Classifier transformation can read in a mapping. When
you create a model, verify that the reference data includes the types of text that you expect to find when you run the
mapping.
You can use the mapping source data to create a classifier model. Select a sample of the source data and copy the
data sample to the model.
Consider the following rules and guidelines when you work with classifier model reference data:
- A reference data field can be of any length. You can enter pages of text into each data field.
- You import reference data from a data object.
- You cannot edit reference data values. However, you can delete a data row.
Adding Data and Label Values to a Classifier Model
Use a data source to append data to a classifier model. You can add data values and label values.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Click Append Data.
The Classifier Model wizard opens.
4. Browse the Model repository and select the data object that you want to use. Click Next.
Note: Do not select a social media data object as a data source.
5. Review the columns on the data object, and select a column to add as a data column or label column for the
model. You can add a reference data column and a label column in the same operation.
To use a data source column as the reference data column in the model, select the column name and click
Data.
You can select multiple data columns. The classifier model merges the contents of the columns you select into
a single column.
To use a data source column as the label column for the model, select the column name and click Label.
Click Next.
6. Select the number of rows to copy from the data source.
Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the
start of the data set.
7. Click Finish and save the model.
After you append the data, verify that the data rows you added include label values.
Deleting Data Values from a Classifier Model
You can delete data values in the default view and in the detailed view of a classifier model. To delete all data values,
use the default view.
1. Open the classifier model in the Developer tool. To open the model, select the model name in the content set and
click Edit.
2. Select the row that contains the data you want to delete.
You can select a single row, multiple rows, or all rows.
3. Click Delete.
Classifier Model Label Data
A classifier model contains descriptive labels for the types of information in the reference data fields. When you create
a model, you assign a label to each data field.
You can select a column as the label data column when you add data source data to a classifier model. You can also
add label names to the model directly. You can assign any label in the model to any reference data field.
Labels are independent of the reference data values that they describe. If you delete all table rows that contain a
selected label, you do not delete the label from the model. If you delete a label, you do not delete the reference data
values that you associated with the label.
Adding a Label to a Classifier Model
If you add data values without labels to a classifier model, add the labels separately.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Select the New Label option to add a label.
The label appears in the model.
4. Type a name in the New Label dialog box.
5. Click OK.
Assigning Labels to Classifier Model Data
Labels represent the types of information that a classifier model can identify in the source data. When you create a
classifier model, verify that every data row in the model has a label.
You can assign a label to one row or multiple rows.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Filter the data rows in the model to display the rows that do not have a label.
To display the rows that do not have a label, clear all label names in the Labels panel.
4. Select one or more data rows. You can use the Select All option to select all the rows that appear.
The model adds a check-mark to the rows that you select.
5. Browse the label values in the model, and select a label to apply to the data rows.
The model assigns the label to the rows you selected.
6. Compile the model to add the label names to the classifier model logic.
If you assign a label that you cleared from the display of label names, the model hides the rows. Select the label name
in the Labels panel to view the rows.
Deleting a Label from a Classifier Model
You can delete a label in the default view or the detailed view of a classifier model.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Click Properties.
4. In the Manage Labels dialog box, select one or more labels to delete.
5. Click Delete.
6. Click OK to close the dialog box.
Classifier Scores
A Classifier transformation compares each row of input data with every row of reference data in a classifier model. The
transformation calculates a score for each comparison. The scores represent the degrees of similarity between the
input row and the reference data rows.
When you run a mapping that contains a Classifier transformation, the mapping returns the label that identifies the
reference data row with the highest score. The score range is 0 through 1. A high score indicates a strong match
between the input data and the model data.
Review the classifier scores to verify that the label output accurately describes each row of input data. You can also
review the scores to verify that the classifier model is appropriate to the input data. If the transformation output
contains a large percentage of low scores, the classifier model might be inappropriate. To improve the comparisons,
compile the model again. If the compiled model does not improve the scores, replace the model in the
transformation.
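The scoring model can be illustrated with a simple word-overlap measure. Informatica calculates scores from compiled natural-language data, so the arithmetic below is only a conceptual sketch of the 0-through-1, highest-score-wins behavior, with invented sample data:

```python
def score(input_text, reference_text):
    """Fraction of input words that also appear in the reference row (0 through 1)."""
    input_words = set(input_text.lower().split())
    reference_words = set(reference_text.lower().split())
    if not input_words:
        return 0.0
    return len(input_words & reference_words) / len(input_words)

def classify(input_text, model):
    """Return the label of the reference row with the highest score."""
    return max(model, key=lambda row: score(input_text, row[0]))[1]

# Hypothetical (reference data, label) rows for language classification.
model = [
    ("the quick brown fox jumps", "English"),
    ("le renard brun rapide saute", "French"),
]
classify("a quick fox", model)  # → "English"
```

If every reference row scored near zero for most input rows, the equivalent real-world conclusion would be that the model data does not resemble the input data and the model needs to be recompiled or replaced.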
Classifier Model Views
You can use the default view and the detailed view to update the data in a classifier model. The default view displays
the label values and data values in a table. The detailed view displays the data values in a series of text boxes.
Use the default view to review and update the labels on each row. You can select one row, multiple rows, and all rows.
The default view can display approximately 100 characters of row data. The detailed view can display all data in each
row. Click the Classifier Model Data option to toggle between the views. Use the detailed view to review and update
the data in a single row.
You can add data, filter data rows, and add labels to rows in each view. You can search the data values in a single row
in the detailed view.
The following image shows the default view of a classifier model that contains data for language classification:
Classifier Model Filters
You can apply filters to classifier model data in the default and detailed views.
You can use filters to perform the following tasks:
- Find data values that do not have an associated label. Use the label options to filter the data rows that display in a classifier model. If a data row does not use a label, add a label to the row.
- Find data values in reference data rows. Use the filter in the default view to find data values in the reference data. Verify that the reference data overlaps with the source data in a mapping.
- Find a data value within a reference data row. Use the filter in the detailed view when you need to verify that a reference data row contains a data value. A data row can contain a large quantity of data values.
Finding Values in Reference Data
Use the filter in the default view to verify that reference data rows contain the data values you expect.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Type a text value in the filter field.
The Developer tool displays the data rows that contain the filter text.
Finding Data Rows with No Labels
Clear all label options to display the reference data rows that do not have a label.
When you open a classifier model, the Developer tool displays all rows and labels by default.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. In the default view, select or clear a label.
When you select a label, the model displays the data strings that you associated with the label. When you clear a
label, the model hides the data strings that you associated with the label.
4. To verify that all data strings use a label, clear all the label values. The model displays any string that does not use
a label.
5. Click a label value to add the label to the data string.
Finding a Value in a Data Row
Use the filter in the detailed view to search for a data value in a single data row.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Select the detailed view.
4. In the default view, enter a value in the filter field.
The model displays the data rows that contain the value.
5. Select a data row to search.
6. Type the search value in the search field below the data row.
The model highlights the first instance of the value in the row.
7. Click the Down arrow to find the next instance of the value in the row.
Use the Up and Down arrows to move through the values in the data row.
Creating a Classifier Model from a Data Object
Use a data object as the source for classifier model data.
A classifier model performs optimally when you use the input data from the Classifier transformation as the source for
the model reference data. For example, you can run a profile on the transformation object in the mapping. Create a
data object from the profile results.
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Classifier Models, and click Add.
The Classifier Model wizard opens.
4. Enter a name for the classifier model.
Optionally, enter a text description of the model.
5. Browse the Model repository and select the data object that contains the reference data.
Click Next.
6. Review the columns on the data object, and select a column to add as reference data values or label values for
the model.
To add a data column as reference data, select the column name and click Data.
To use a data column as a source for label values, select the column name and click Label.
Click Next.
7. Select the number of rows to copy from the data source.
Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the
start of the data set.
8. Click Finish and save the model.
After you create the classifier model, compile the model.
Copy and Paste Operations
You can copy a classifier model from one content set to another in a Model repository. Copy a classifier model to share
resources with other Developer tool users.
You can copy a model to another content set, or you can import a model to the current content set. You can import
multiple models from multiple content sets in the repository in a single operation.
When you copy a model, the Content Management Service creates a copy of the model data file on the service
machine. Each model uses a different data file.
Copying a Classifier Model to Another Content Set
You can copy a classifier model from one content set to another in a Model repository. When you copy a classifier
model, you specify the model object and the source and destination content sets.
1. Open the content set that contains the classifier model.
2. Select a classifier model and click Copy To.
3. Browse the Model repository and select a content set.
You can copy the classifier model to a content set in the current project or another project.
4. Click OK.
The Developer tool copies the classifier model to the selected content set.
Importing a Classifier Model from Another Content Set
You can import a classifier model from one content set to another in a Model repository. When you import a classifier
model, you specify one or more model objects and the source and destination content sets.
1. Open the content set to contain the classifier model.
2. Select a classifier model and click Paste From.
3. Browse the Model repository and select a classifier model.
You can paste the classifier model from a content set in the current project or another project.
4. Click OK.
The Developer tool pastes the classifier model to the current content set.
C H A P T E R 4
Probabilistic Models
This chapter includes the following topics:
- Probabilistic Models Overview
- Probabilistic Model Structure
- Probabilistic Model Reference Data
- Probabilistic Model Label Data
- Probabilistic Model Advanced Properties
- Creating an Empty Probabilistic Model
- Creating a Probabilistic Model from a Data Object
- Copy and Paste Operations
Probabilistic Models Overview
A probabilistic model is a reference data object. Use a probabilistic model to understand the contents of a data string
that contains multiple data values. A probabilistic model identifies the types of information in each value in the
string.
You can add a probabilistic model to a Labeler transformation and a Parser transformation:
- Use a probabilistic model in a Labeler transformation to assign a descriptive label to each value in a data string. The Labeler transformation writes the labels to an output port in the same format as the input string.
- Use a probabilistic model in a Parser transformation to write each value in an input string to a new port. The Parser transformation creates an output port for each data category that you define in the probabilistic model.
Probabilistic models use natural language processes to identify the type of information in a string. Natural language
processes detect relevant terms in the input string and disregard terms that are not relevant.
You compile a probabilistic model in the Developer tool. When you compile a model, you create associations between
similar data values in the model. The Labeler and Parser transformations use the compiled data to analyze the
values in the input strings.
Labeler Transformation Example
The customer database at an insurance organization contains multiple data entry errors. You are a data steward at the
insurance organization. You configure a mapping with a Labeler transformation to determine the different types of
data that each column contains.
The following data fragment contains customer account data:
Row ID Field 1 Field 2 Field 3
1 19132954 AIM SECURITIES PETRIE TAYBRO
2 10110169 JASE TRAPANI BANK OF NEW YORK
3 10111786 WANGER ASSET MANAGEMENT, LLP JAN SEEDORF
4 10112299 FELIX LEVENGER HARVARD MAGAZINE
5 10112036 DESCHNES & FILS LTE (QUEBEC) RICHARD TREMBLAY
6 BERGER ASSOCIATES 10111101 DAREEN HULSMAN
7 19131385 EAGLE FINANCIAL GROUP INC PATRICK MCKINNIE
8 LAKENYA PASKETT WHITEHALL FINANCIAL GROUP 15954710
When you run the mapping, the Labeler transformation compares the input data with the probabilistic model reference
data. The Labeler transformation assigns a label to each input value. The transformation writes the labels to an output
port. Each output row contains a set of labels that defines the data structure on the corresponding input row.
The Labeler adds the following labels to the output port:
Row ID Output Labels
1 number organization contact
2 number contact organization
3 number organization contact
4 number contact organization
5 number organization contact
6 organization number contact
7 organization number contact
8 contact organization number
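A rough, rule-based sketch of the labeling in this example appears below. The term list and the three label names are assumptions made for illustration; a real probabilistic model derives its logic from compiled reference data rather than hand-written rules.

```python
# Conceptual sketch: assign one label per field based on simple rules
# and a short, hypothetical list of organization terms. Real
# probabilistic models use compiled natural-language data instead.
ORGANIZATION_TERMS = {"SECURITIES", "BANK", "GROUP", "ASSOCIATES", "MAGAZINE"}

def label_field(value):
    """Label a field as a number, an organization, or a contact name."""
    if value.replace(" ", "").isdigit():
        return "number"
    if any(term in value.split() for term in ORGANIZATION_TERMS):
        return "organization"
    return "contact"

row = ["19132954", "AIM SECURITIES", "PETRIE TAYBRO"]
[label_field(v) for v in row]  # → ['number', 'organization', 'contact']
```

The label sequence for each row describes that row's structure, which is why rows 6 through 8 in the example, where the fields appear in a different order, receive differently ordered labels.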
Parser Transformation Example
A supermarket stores product descriptions in a single column in a database table. The product descriptions contain
multiple data values that represent different types of information. You are a data steward at the supermarket. You
want to create columns for the different types of information in the product descriptions.
You configure a mapping with a Parser transformation to organize the data values into the correct fields.
The following data fragment contains product data for orange juice:
Product Description
Sunnydream Orange Juice Unsweetened 12 oz
When you run the mapping, the Parser transformation writes the input data to the following output ports:
Product Name Product Type Product Details Product Size
Sunnydream Orange Juice Unsweetened 12 oz
Probabilistic Model Structure
A probabilistic model contains a column of reference data values and one or more columns of label values. The
reference data values represent the different values that can appear in the input data. The labels represent the types
of information that the input data values can contain.
A Labeler or Parser transformation uses the label values to analyze the structure of the data in an input row. The
structure of the row determines the type of information in each data value. Each input row can have a different
structure. When you assign labels to the reference data values, you define the possible order in which the input data
values might appear.
A probabilistic model also contains compilation data. The transformations use the compilation data to calculate
similarities between the reference data and the input data. When you compile a model, you create or update the
compilation data.
The following figure shows a probabilistic model in the Developer tool:
When you use a probabilistic model in a Labeler transformation, the Labeler assigns a label value to each value in the
input row. For example, the transformation labels the string "Franklin Delano Roosevelt" as "FIRSTNAME
MIDDLENAME LASTNAME."
When you use a probabilistic model in a Parser transformation, the Parser writes each input value to an output port
based on the label that matches the value. For example, the Parser writes the string "Franklin Delano Roosevelt" to
FIRSTNAME, MIDDLENAME, and LASTNAME output ports.
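The routing behavior described above can be sketched in plain Python. This is an illustrative toy only, not Informatica's probabilistic algorithm: the hard-coded token-to-label table stands in for a trained, compiled model.

```python
# Illustrative sketch only: a toy model that routes tokens to labeled
# output "ports", loosely analogous to what a Parser transformation does.
# The token-to-label table below is invented for illustration.

TOKEN_LABELS = {
    "Franklin": "FIRSTNAME",
    "Delano": "MIDDLENAME",
    "Roosevelt": "LASTNAME",
}

def parse_row(row):
    """Assign each whitespace-delimited token to the port for its label."""
    ports = {}
    for token in row.split():
        label = TOKEN_LABELS.get(token, "OVERFLOW")
        ports.setdefault(label, []).append(token)
    return ports

print(parse_row("Franklin Delano Roosevelt"))
# {'FIRSTNAME': ['Franklin'], 'MIDDLENAME': ['Delano'], 'LASTNAME': ['Roosevelt']}
```

A real probabilistic model scores many candidate label sequences and picks the most likely one; this sketch only shows the input-to-port routing that results.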
Compiling the Probabilistic Model
Each time you update a probabilistic model, you can compile the model. Compile the model to update the model logic
with the current data values and label values.
Before you compile the model, verify that all label values identify at least one data value.
To compile the model, open the model in the Developer tool and click Compile.
Probabilistic Model Reference Data
The reference data values in a probabilistic model represent the types of input data that a Labeler or Parser
transformation might read in a mapping.
You can add, edit, and delete reference data rows in the Developer tool. You can paste reference data from the
clipboard. You can also use mapping source data as a source for the reference data values in a probabilistic
model.
After you add the data values, create the label columns and assign a label to each data value in each row of the
model.
Note: The label values in the model indicate the possible structure of the input rows that the transformation reads.
Verify that the probabilistic model contains the label structure that you expect the transformation to find in the source
data.
Adding a Reference Data String to a Probabilistic Model
You can use the keyboard to enter a reference data string into a probabilistic model.
1. Open the content set that contains the model, then select the model name and click Edit.
2. Click New to add an empty row to the model.
3. Enter a data string in the data field.
4. Save the probabilistic model.
After you add a reference data string to a model, assign a label to each value in the string.
Copying Data Strings to a Probabilistic Model
You can copy a column of data strings to a probabilistic model from the Windows clipboard. Add the data you copy to
the Data column.
1. Open a data source in a data application and select one or more cells in a data column.
2. Copy the cells to the clipboard.
3. Open the content set that contains the model.
4. Select the model name and click Edit.
5. Right-click a cell in the Data column and click Paste.
The copied cells appear in the Data column.
6. Save the probabilistic model.
Probabilistic Model Label Data
A probabilistic model contains descriptive labels for the types of information in the reference data. When you create a
model or add reference data to a model, assign a label to each reference data value.
The labels you create appear as columns in the probabilistic model. When you assign a label to a data value, the
model adds the value to the label column. You can assign any label in the model to any reference data value. If the
same value has different meanings in two rows of reference data, you can assign different labels to the value in each
row.
You can define the same combination of labels for multiple input strings. Multiple examples of a label increase the
likelihood that the probabilistic model assigns the correct label to an input data value.
Address Data Example
You can build a probabilistic model to parse address data values. The probabilistic model determines the address
data type from the value and also from its position in the input string. For example, the model can determine when the
same value is a street name or an address suffix.
The following table shows how you can assign labels to address data values in different combinations:
| Reference Data | Label 1 - Street Names | Label 2 - Address Suffixes |
| Park Place | Park | Place |
| Park Avenue | Park | Avenue |
| Madison Avenue | Madison | Avenue |
| Central Park | Central Park | |
| State Street | State | Street |
The Labeler transformation can return any of the label combinations that you define in the model. Organize the label
columns from left to right in the order in which you want the labels to appear in the output data.
Note: If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you
invalidate the parsing operation that uses the model. You must delete and re-create the operation that uses the
probabilistic model.
If a probabilistic model contains a label value that does not identify a data value, you cannot compile the model.
Overflow Label
When a transformation cannot assign a label that you define to an input data value, the transformation assigns an
overflow label to the data.
The Labeler transformation assigns an overflow label to any data value that it cannot identify. The Parser
transformation creates an overflow column for unassigned data.
A transformation can fail to recognize an input value if the number of values in the input row exceeds the number of
labels in the probabilistic model. Before you use a model in a mapping, review the mapping source data and verify that
the model contains the correct number of label values.
The following table shows how a Parser transformation uses an overflow port to parse data that a probabilistic model
cannot recognize:
| Input Data | Street_Names port | Address_Suffixes port | Overflow port |
| Park Place | Park | Place | |
| Park Avenue | Park | Avenue | |
| Madison Avenue | Madison | Avenue | |
| Central Park | Central Park | | |
| Washington Square Park | Washington Square Park | | |
| Madison Square Garden | | | Madison Square Garden |
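The overflow behavior above can be sketched as follows. This is an illustrative approximation, not the product's parsing logic: the sets of street names and suffixes are invented stand-ins for the labeled reference data, and a value that fits no labeled combination falls through to the overflow column.

```python
# Illustrative sketch: values that no label combination in the model
# accounts for fall through to an overflow column.
# The reference sets below are invented for illustration.

STREET_NAMES = {"Park", "Madison", "Central Park", "Washington Square Park"}
SUFFIXES = {"Place", "Avenue"}

def parse_address(value):
    out = {"Street_Names": "", "Address_Suffixes": "", "Overflow": ""}
    if value in STREET_NAMES:            # the whole value is a street name
        out["Street_Names"] = value
        return out
    name, _, last = value.rpartition(" ")
    if name in STREET_NAMES and last in SUFFIXES:
        out["Street_Names"], out["Address_Suffixes"] = name, last
    else:
        out["Overflow"] = value          # unrecognized value
    return out

print(parse_address("Park Avenue"))
print(parse_address("Madison Square Garden"))
```

Running the sketch routes "Park Avenue" to the street name and suffix columns, while "Madison Square Garden" lands whole in the overflow column.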
Assigning Labels to Probabilistic Model Data
Assign a label to every data value in every row of a model.
You can assign different labels to the same data value if the data value appears in multiple locations in the input data.
For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith."
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. Verify that the model contains the reference data that you need.
4. Right-click an input data row and select New Label. Enter a column name in the New Label dialog box.
The label appears in the model.
5. Right-click an input data row and select View tokens and labels as rows.
The Labels panel displays under the input data column. The panel displays each reference data value as a data
row.
6. In the Tokens column, select a reference data value.
7. In the Labels column, select a label to assign to the data value.
8. Save the probabilistic model.
Note: A label is a structural element in a model. If you add or remove a label after you add the model to a
transformation, you invalidate the operation that uses the model. Delete and re-create the transformation operation.
Adding a Label to a Probabilistic Model
Add a label for every type of data value in the Data column. If you use the probabilistic model in a Parser
transformation, add a label for each output port that you expect the transformation to create.
1. Open the content set that contains the model.
2. Select the model name and click Edit.
3. From the Label menu, select New.
4. In the New Label dialog box, enter a label name.
5. Click OK to add the label to the model.
Deleting a Label from a Probabilistic Model
When you delete a label from a model, any data value associated with the label remains in the model. Assign another
label to each data value.
1. Open the probabilistic model in the Developer tool.
To open the model, select the model name in the content set and click Edit.
2. From the Label menu, select Edit.
3. In the Edit Label dialog box, select a label name.
4. Click Delete to delete the label.
5. Click OK.
Probabilistic Model Advanced Properties
You can review the computational properties that the Developer tool uses to compile a probabilistic model. Open the
Advanced Properties dialog box to review the properties.
The basic element in the compilation of probabilistic models is the n-gram. An n-gram is a series of letters that follow or
precede other letters to complete a word. When a mapping runs, the Labeler or Parser transformation creates multiple
n-grams for each value in the reference data column of the probabilistic model. The transformation compares the input
data values with the reference data values and the n-grams. The advanced properties on a probabilistic model
determine how the probabilistic model handles n-grams and other model features.
Note: The default property values represent the preferred settings for probabilistic analysis and probabilistic model
compilation. If you edit an advanced property, you may adversely affect the accuracy of the probabilistic analysis. Do
not edit the advanced properties unless you understand the effects of the changes you make.
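The n-gram concept described above can be illustrated with a short sketch. This is not Informatica's compilation algorithm; it only shows how character n-grams make it possible to measure the similarity between an input value and a reference value.

```python
# Illustrative sketch of character n-grams and n-gram similarity.
# Not the product's algorithm; Jaccard similarity is used here as a
# simple, commonly used stand-in for "how alike are these strings".

def ngrams(text, n=3):
    """Return the set of character n-grams in a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity of the two strings' n-gram sets (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

print(round(similarity("Avenue", "Avenues"), 2))   # high overlap
print(round(similarity("Avenue", "Boulevard"), 2)) # no shared 3-grams
```

Because near-matches share most of their n-grams, a model can recognize an input value that differs slightly from its reference data.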
Creating an Empty Probabilistic Model
You can use a data object as the source for the data in a probabilistic model, or you can create an empty model.
Create an empty probabilistic model when you want to enter reference data at a later time.
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add.
The Probabilistic Model wizard opens.
4. Select the Probabilistic Model option.
Click Next.
5. Enter a name for the model.
Click Finish and save the model.
The probabilistic model opens in the Developer tool.
After you create the empty model, you must add input data.
Creating a Probabilistic Model from a Data Object
You can use a data object as a source for probabilistic model data. For example, use the source data object from the
mapping that will read the probabilistic model. You can also profile an object in the mapping and create a data object
from the profile results.
A probabilistic model performs optimally when you use the input data from the Labeler or Parser transformation as the
source for the model reference data. For example, you can run a profile on the transformation object in the mapping.
Create a data object from the profile results.
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add.
The Probabilistic Model wizard opens.
4. Select the Probabilistic Model from Data Objects option.
Click Next.
5. Enter a name for the probabilistic model.
Optionally, enter a text description of the model.
6. Browse the Model repository and select the data object that contains the reference data.
Click Next.
7. Review the data columns on the data object, and select a column to add as reference data values or label values
for the model.
To add a data column as reference data, select the column name and click Data.
To use a data column as a source for label values, select the column name and click Label.
Click Next.
8. Select the number of rows to copy from the data source.
Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the
start of the data set.
9. Set the delimiters to use for the reference data values. Specify a delimiter to identify multiple values that
represent a single piece of information.
The default delimiter is a character space.
10. Enter a name for the overflow column.
The overflow column contains any token that the labeling or parsing operation cannot recognize.
The default name is O.
11. Click Finish and save the model.
The probabilistic model opens in the Developer tool.
After you create the probabilistic model, compile the model.
Copy and Paste Operations
You can copy a probabilistic model from one content set to another in a Model repository. Copy a probabilistic model
to share resources with other Developer tool users.
You can copy a model to another content set, or you can import a model to the current content set. You can import
multiple models from multiple content sets in the repository in a single operation.
When you copy a model, the Content Management Service creates a copy of the model data file on the service
machine. Each model uses a different data file.
Copying a Probabilistic Model to Another Content Set
You can copy a probabilistic model from one content set to another in a Model repository. When you copy a
probabilistic model, you specify the model object and the source and destination content sets.
1. Open the content set that contains the probabilistic model.
2. Select a probabilistic model and click Copy To.
3. Browse the Model repository and select a content set.
You can copy the probabilistic model to a content set in the current project or another project.
4. Click OK.
The Developer tool copies the probabilistic model to the selected content set.
Importing a Probabilistic Model from Another Content Set
You can import a probabilistic model from one content set to another in a Model repository. When you import a
probabilistic model, you specify one or more model objects and the source and destination content sets.
1. Open the content set that will contain the probabilistic model.
2. Select a probabilistic model and click Paste From.
3. Browse the Model repository and select a probabilistic model.
You can paste the probabilistic model from a content set in the current project or another project.
4. Click OK.
The Developer tool pastes the probabilistic model to the current content set.
Part II: Data Quality Features in Informatica Developer
This part contains the following chapters:
- Column Profiles in Informatica Developer, 34
- Column Profile Results in Informatica Developer, 38
- Rules in Informatica Developer, 41
- Scorecards in Informatica Developer, 43
- Mapplet and Mapping Profiling, 45
- Reference Tables, 47
Chapter 5
Column Profiles in Informatica Developer
This chapter includes the following topics:
- Column Profile Concepts Overview, 34
- Column Profile Options, 35
- Rules, 35
- Scorecards, 35
- Column Profiles in Informatica Developer, 36
- Creating a Single Data Object Profile, 37
Creating a Single Data Object Profile, 37
Column Profile Concepts Overview
A column profile determines the characteristics of columns in a data source, such as value frequency, percentages,
and patterns.
Column profiling discovers the following facts about data:
- The number of unique and null values in each column, expressed as a number and a percentage.
- The patterns of data in each column and the frequencies with which these values occur.
- Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
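The kinds of facts listed above can be computed over any column of values. The following sketch is illustrative only, using plain Python over a made-up sample column rather than the profiling engine.

```python
# Illustrative sketch only: the kinds of facts a column profile gathers,
# computed here with plain Python over an invented sample column.

from collections import Counter

column = ["Park", "Park", "Madison", None, "Central"]

total = len(column)
nulls = sum(1 for v in column if v is None)
values = [v for v in column if v is not None]
unique = len(set(values))
freq = Counter(values)

print(f"null values:    {nulls} ({100 * nulls / total:.0f}%)")
print(f"unique values:  {unique}")
print(f"frequencies:    {dict(freq)}")
print(f"min/max length: {min(map(len, values))}/{max(map(len, values))}")
```
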
Use column profile options to select the columns on which you want to run a profile, set data sampling options, and set
drill-down options when you create a profile.
A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to the
profile to validate data.
Create scorecards to periodically review data quality. You create scorecards before and after you apply rules to
profiles so that you can view a graphical representation of the valid values for columns.
Column Profile Options
When you create a profile with the Column Profiling option, you can use the profile wizard to define filter and
sampling options. These options determine how the profile reads rows from the data set.
After you complete the steps in the profile wizard, you can add a rule to the profile. The rule can have the business
logic to perform data transformation operations on the data before column profiling.
Rules
Create and apply rules within profiles. A rule is business logic that defines conditions applied to data when you run a
profile. Use rules to further validate the data in a profile and to measure data quality progress.
You can add a rule after you create a profile. You can reuse rules created in either the Analyst tool or Developer tool in
both the tools. Add rules to a profile by selecting a reusable rule or create an expression rule. An expression rule uses
both expression functions and columns to define rule logic. After you create an expression rule, you can make the rule
reusable.
Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet and validate the mapplet
as a rule. You can run rules from both the Analyst tool and Developer tool.
Scorecards
A scorecard is the graphical representation of the valid values for a column or output of a rule in profile results. Use
scorecards to measure data quality progress. You can create a scorecard from a profile and monitor the progress of
data quality over time.
A scorecard has multiple components, such as metrics, metric groups, and thresholds. After you run a profile, you can
add source columns as metrics to a scorecard and configure the valid values for the metrics. Use a metric group to
categorize related metrics in a scorecard into a set. A threshold identifies the range, in percentage, of bad data that is
acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data.
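The relationship between a score and its thresholds can be sketched briefly. This is an illustrative sketch only: the cutoff values and range names below are invented, and in the product you configure thresholds per metric in the scorecard itself.

```python
# Illustrative sketch: a score is the percentage of valid values for a
# metric, classified against threshold ranges. Cutoffs are hypothetical.

def score(valid_count, total_count):
    """Percentage of values that satisfy the metric's valid-value rule."""
    return 100.0 * valid_count / total_count

def classify(pct, good=90.0, acceptable=70.0):
    """Map a score to a threshold range (hypothetical cutoffs)."""
    if pct >= good:
        return "Good"
    if pct >= acceptable:
        return "Acceptable"
    return "Unacceptable"

pct = score(valid_count=86, total_count=100)
print(pct, classify(pct))  # 86.0 Acceptable
```
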
When you run a scorecard, you can configure whether you want to drill down on the metrics for a score on the live data
or staged data. After you run a scorecard and view the scores, you can drill down on each metric to identify valid data
records and records that are not valid. You can also view scorecard lineage for each metric or metric group in a
scorecard. To track data quality effectively, you can use trend charts and monitor how the scores change over a period
of time.
The profiling warehouse stores the scorecard statistics and configuration information. You can configure a third-party
application to get the scorecard results and run reports. You can also display the scorecard results in a web
application, portal, or report such as a business intelligence report.
Column Profiles in Informatica Developer
Use a column profile to analyze the characteristics of columns in a data set, such as value percentages and value
patterns. You can add filters to determine the rows that the profile reads at runtime. The profile does not process rows
that do not meet the filter criteria.
You can discover the following types of information about the columns you profile:
- The number of times a value appears in a column.
- The frequency of occurrence of each value in a column, expressed as a percentage.
- The character patterns of the values in a column.
- The maximum and minimum lengths of the values in a column, and the first and last values.
You can define a column profile for a data object in a mapping or mapplet or an object in the Model repository. The
object in the repository can be in a single data object profile, multiple data object profile, or profile model.
You can add rules to a column profile. Use rules to select a subset of source data for profiling. You can also change the
drilldown options for column profiles to determine whether the drilldown reads from staged data or live data.
Filtering Options
You can add filters to determine the rows that a column profile uses when performing profiling operations. The profile
does not process rows that do not meet the filter criteria.
1. Create or open a column profile.
2. Select the Filter view.
3. Click Add.
4. Select a filter type and click Next.
5. Enter a name for the filter. Optionally, enter a text description of the filter.
6. Select Set as Active to apply the filter to the profile. Click Next.
7. Define the filter criteria.
8. Click Finish.
Sampling Properties
Configure the sampling properties to determine the number of rows that the profile reads during a profiling
operation.
The following table describes the sampling properties:
| Property | Description |
| All Rows | Reads all rows from the source. Default is enabled. |
| First | Reads from the first row up to the row you specify. |
| Random Sample of | Reads a random sample from the number of rows that you specify. |
| Random Sample (Auto) | Reads from a random sample of rows. |
Creating a Single Data Object Profile
You can create a single data object profile for one or more columns in a data object and store the profile object in the
Model repository.
1. In the Object Explorer view, select the data object you want to profile.
2. Click File > New > Profile to open the profile wizard.
3. Select Profile and click Next.
4. Enter a name for the profile and verify the project location. If required, browse to a new location.
5. Optionally, enter a text description of the profile.
6. Verify that the name of the data object you selected appears in the Data Objects section.
7. Click Next.
8. Configure the profile operations that you want to perform. You can configure the following operations:
- Column profiling
- Primary key discovery
- Functional dependency discovery
- Data domain discovery
Note: To enable a profile operation, select Enabled as part of the "Run Profile" action for that operation.
Column profiling is enabled by default.
9. Review the options for your profile.
You can edit the column selection for all profile types. Review the filter and sampling options for column profiles.
You can review the inference options for primary key, functional dependency, and data domain discovery. You
can also review data domain selection for data domain discovery.
10. Review the drilldown options, and edit them if necessary. By default, the Enable Row Drilldown option is
selected. You can edit drilldown options for column profiles. The options also determine whether drilldown
operations read from the data source or from staged data, and whether the profile stores result data from previous
profile runs.
11. Click Finish.
Chapter 6
Column Profile Results in Informatica Developer
This chapter includes the following topics:
- Column Profile Results in Informatica Developer, 38
- Column Value Properties, 39
- Column Pattern Properties, 39
- Column Statistics Properties, 39
- Exporting Profile Results from Informatica Developer, 40
Column Profile Results in Informatica Developer
Column profile analysis provides information about data quality by highlighting value frequencies, patterns and
statistics of data.
The following table describes the profile results for each type of analysis:
Column profile:
- Percentage and count statistics for unique and null values
- Inferred datatypes
- The datatype that the data source declares for the data
- The maximum and minimum values
- The date and time of the most recent profile run
- Percentage and count statistics for each unique data element in a column
- Percentage and count statistics for each unique character pattern in a column

Primary key profile:
- Inferred primary keys
- Key violations

Functional dependency profile:
- Inferred functional dependencies
- Functional dependency violations
Column Value Properties
Column value properties show the values in the profiled columns and the frequency with which each value appears in
each column. The frequencies are shown as a number, a percentage, and a bar chart.
To view column value properties, select Values from the Show list. Double-click a column value to drill down to the
rows that contain the value.
The following table describes the properties for column values:
| Property | Description |
| Values | List of all values for the column in the profile. |
| Frequency | Number of times a value appears in a column. |
| Percent | Number of times a value appears in a column, expressed as a percentage of all values in the column. |
| Chart | Bar chart for the percentage. |
Column Pattern Properties
Column pattern properties show the patterns of data in the profiled columns and the frequency with which the patterns
appear in each column. The patterns are shown as a number, a percentage, and a bar chart.
To view pattern information, select Patterns from the Show list. Double-click a pattern to drill down to the rows that
contain the pattern.
The following table describes the properties for column value patterns:
| Property | Description |
| Patterns | Pattern for the selected column. |
| Frequency | Number of times a pattern appears in a column. |
| Percent | Number of times a pattern appears in a column, expressed as a percentage of all values in the column. |
| Chart | Bar chart for the percentage. |
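Character-pattern analysis of this kind can be sketched in a few lines. This is an illustrative sketch only: the symbol choices (X for a letter, 9 for a digit) are a common profiling convention and an assumption here, not necessarily the product's notation.

```python
# Illustrative sketch of character-pattern analysis: each character is
# reduced to a symbol class so that similar values share one pattern.
# The X/9 notation is an assumption, not the product's documented output.

from collections import Counter

def pattern(value):
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)       # punctuation and spaces pass through
    return "".join(out)

phones = ["555-0199", "555-0123", "55X-0100"]
counts = Counter(pattern(p) for p in phones)
print(counts)  # Counter({'999-9999': 2, '99X-9999': 1})
```

Grouping values by pattern like this makes malformed entries (here, the value containing a letter) stand out as low-frequency patterns.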
Column Statistics Properties
Column statistics properties provide maximum and minimum lengths of values and first and last values.
To view statistical information, select Statistics from the Show list.
The following table describes the column statistics properties:
| Property | Description |
| Maximum Length | Length of the longest value in the column. |
| Minimum Length | Length of the shortest value in the column. |
| Bottom | Last five values in the column. |
| Top | First five values in the column. |
Note: The profile also displays average and standard deviation statistics for columns of type Integer.
Exporting Profile Results from Informatica Developer
You can export column values and column pattern data from profile results.
Export column values in Distinct Value Count format. Export pattern values in Domain Inference format.
1. In the Object Explorer view, select and open a profile.
2. Optionally, run the profile to update the profile results.
3. Select the Results view.
4. Select the column that contains the data for export.
5. Under Details, select Values or select Patterns and click the Export button.
The Export data to a file dialog box opens.
6. Accept or change the file name. The default name is [Profile_name]_[column_name]_DVC for column value
data and [Profile_name]_[column_name]_DI for pattern data.
7. Select the type of data to export. You can select either Values for the selected column or Patterns for the
selected column.
8. Under Save, select Save on Client.
9. Click Browse to select a location and save the file locally on your computer. By default, Informatica Developer
writes the file to a location set in the Data Integration Service properties of Informatica Administrator.
10. If you do not want to export field names as the first row, clear the Export field names as first row check box.
11. Click OK.
Chapter 7
Rules in Informatica Developer
This chapter includes the following topics:
- Rules in Informatica Developer Overview, 41
- Creating a Rule in Informatica Developer, 41
- Applying a Rule in Informatica Developer, 42
Rules in Informatica Developer Overview
A rule is business logic that defines conditions applied to source data when you run a profile. You can create reusable
rules from mapplets in the Developer tool. You can reuse these rules in Analyst tool profiles to validate source
data.
Create a mapplet and validate as a rule. This rule appears as a reusable rule in the Analyst tool. You can apply the rule
to a column profile in the Developer tool or in the Analyst tool.
A rule must meet the following requirements:
- It must contain an Input and Output transformation. You cannot use data sources in a rule.
- It can contain Expression transformations, Lookup transformations, and passive data quality transformations. It cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation because the Match transformation is an active transformation.
- It does not specify cardinality between input groups.
Creating a Rule in Informatica Developer
To create a rule in the Developer tool, you validate a mapplet as a rule.
First, create a mapplet in the Developer tool.
1. Right-click the mapplet editor.
2. Select Validate As > Rule.
Applying a Rule in Informatica Developer
You can add a rule to a saved column profile. You cannot add a rule to a profile configured for join analysis.
1. Browse the Object Explorer view and find the profile you need.
2. Right-click the profile and select Open.
The profile opens in the editor.
3. Click the Definition tab, and select Rules.
4. Click Add.
The Apply Rule dialog box opens.
5. Click Browse to find the rule you want to apply.
Select a rule from a repository project, and click OK.
6. Click the Value column under Input Values to select an input port for the rule.
7. Optionally, click the Value column under Output Values to edit the name of the rule output port.
The rule appears in the Definition tab.
Chapter 8
Scorecards in Informatica Developer
This chapter includes the following topics:
- Scorecards in Informatica Developer Overview, 43
- Creating a Scorecard, 43
- Exporting a Resource File for Scorecard Lineage, 44
- Viewing Scorecard Lineage from Informatica Developer, 44
Scorecards in Informatica Developer Overview
A scorecard is a graphical representation of the quality measurements in a profile. You can view scorecards in the
Developer tool. After you create a scorecard in the Developer tool, you can connect to the Analyst tool to open the
scorecard for editing. Run the scorecard on current data in the data object or on data stored in the staging
database.
You can edit a scorecard, run the scorecard, and view the scorecard lineage for a metric or metric group in the Analyst
tool.
Creating a Scorecard
Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add columns to
the scorecard.
1. In the Object Explorer view, select the project or folder where you want to create the scorecard.
2. Click File > New > Scorecard.
The New Scorecard dialog box appears.
3. Click Add.
The Select Profile dialog box appears. Select the profile that contains the columns you want to add.
4. Click OK, then click Next.
5. Select the columns that you want to add to the scorecard.
By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns
that are not included in the profile.
6. Click Finish.
The Developer tool creates the scorecard.
7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the
Analyst tool.
Exporting a Resource File for Scorecard Lineage
You can export a project containing scorecards and dependent objects as a resource file for Metadata Manager. Use
the exported resource file in the XML format to create and load a resource for scorecard lineage in Metadata
Manager.
1. To open the Export wizard, click File > Export.
2. Select Informatica > Resource File for Metadata Manager.
3. Click Next.
4. Click Browse to select a project that contains the scorecard objects and lineage that you need to export.
5. Click Next.
6. Select the scorecard objects that you want to export.
7. Enter the export file name and file location.
8. To view the dependent objects that the Export wizard exports with the objects that you selected, click Next.
The Export wizard displays the dependent objects.
9. Click Finish.
The Developer tool exports the objects to the XML file.
Viewing Scorecard Lineage from Informatica Developer
To view the scorecard lineage for a metric or metric group from the Developer tool, launch the Analyst tool.
1. In the Object Explorer view, select the project or folder that contains the scorecard.
2. Double-click the scorecard to open it.
The scorecard appears in a tab.
3. Click Open with Informatica Analyst.
The Analyst tool opens in the browser window.
4. In the Scorecard view of the Analyst tool, select a metric or metric group.
5. Right-click and select Show Lineage.
The scorecard lineage diagram appears in a dialog box.
Chapter 9
Mapplet and Mapping Profiling
This chapter includes the following topics:
- Mapplet and Mapping Profiling Overview, 45
- Running a Profile on a Mapplet or Mapping Object, 45
- Comparing Profiles for Mapping or Mapplet Objects, 46
- Generating a Mapping from a Profile, 46
Mapplet and Mapping Profiling Overview
You can define a column profile for an object in a mapplet or mapping. Run a profile on a mapplet or a mapping object
when you want to verify the design of the mapping or mapplet without saving the profile results. You can also generate
a mapping from a profile.
Running a Profile on a Mapplet or Mapping Object
When you run a profile on a mapplet or mapping object, the profile runs on all data columns and enables drill-down
operations on the data that is staged for the data object. You can run a profile on a mapplet or mapping object with
multiple output ports.
The profile traces the source data through the mapping to the output ports of the object you selected. The profile
analyzes the data that would appear on those ports if you ran the mapping.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Right-click a data object or transformation and select Profile Now.
If the transformation has multiple output groups, the Select Output Group dialog box appears. If the
transformation has a single output group, the profile results appear on the Results tab of the profile.
4. If the transformation has multiple output groups, select the output groups as necessary.
5. Click OK.
The profile results appear on the Results tab of the profile.
Comparing Profiles for Mapping or Mapplet Objects
You can create a profile that analyzes two objects in a mapplet or mapping and compares the results of the column
profiles for those objects.
Like profiles of single mapping or mapplet objects, profile comparisons run on all data columns and enable drill-down
operations on the data that is staged for the data objects.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Press the CTRL key and click two objects in the editor.
4. Right-click one of the objects and select Compare Profiles.
5. Optionally, configure the profile comparison to match columns from one object to the other object.
6. Optionally, match columns by clicking a column in one object and dragging it onto a column in the other
object.
7. Optionally, choose whether the profile analyzes all columns or matched columns only.
8. Click OK.
Generating a Mapping from a Profile
You can create a mapping object from a profile. Use the mapping object you create to develop a valid mapping. The
mapping you create has a data source based on the profiled object and can contain transformations based on profile
rule logic. After you create the mapping, add objects to complete it.
1. In the Object Explorer view, find the profile on which to create the mapping.
2. Right-click the profile name and select Generate Mapping.
The Generate Mapping dialog box appears.
3. Enter a mapping name. Optionally, enter a description for the mapping.
4. Confirm the folder location for the mapping.
By default, the Developer tool creates the mapping in the Mappings folder in the same project as the profile. Click
Browse to select a different location for the mapping.
5. Confirm the profile definition that the Developer tool uses to create the mapping. To use another profile, click
Select Profile.
6. Click Finish.
The mapping appears in the Object Explorer.
Add objects to the mapping to complete it.
Chapter 10: Reference Tables
This chapter includes the following topics:
Reference Tables Overview, 47
Reference Table Data Properties, 47
Creating a Reference Table Object, 48
Creating a Reference Table from a Flat File, 49
Creating a Reference Table from a Relational Source, 50
Copying a Reference Table in the Model Repository, 51
Editing Reference Table Data, 52
Finding Data Values in a Reference Table, 52
Reference Tables Overview
A reference table contains standard versions and alternative versions of a set of data values. You configure a
transformation to read a reference table to verify that source data values are accurate and correctly formatted.
Reference tables store metadata in the Model repository. Reference tables can store column data in the reference
data warehouse or in another database.
Use the Developer tool to create reference tables and to update reference table data values.
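The relationship between standard and alternative values can be illustrated with a small sketch. The table contents and the `standardize` function below are hypothetical sample data for illustration, not an Informatica API:

```python
# Illustrative sketch only: a reference table pairs alternative versions
# of a value with its standard (valid) version. Hypothetical sample data.
REFERENCE_TABLE = {
    # alternative value -> standard value
    "NY": "New York",
    "N.Y.": "New York",
    "CA": "California",
    "Calif.": "California",
}

def standardize(value):
    """Return the standard version of a value, or the value itself
    if it is already standard or is not in the reference table."""
    return REFERENCE_TABLE.get(value, value)
```

A transformation that reads such a table can both verify a source value and replace it with its standard form.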
Reference Table Data Properties
You can view properties for reference table data and metadata in the Developer tool. The Developer tool displays the
properties when you open the reference table from the Model repository.
A reference table displays general properties and column properties. You can view reference table properties in the
Developer tool. You can view and edit reference table properties in the Analyst tool.
The following table describes the general properties of a reference table:
Name: Name of the reference table.
Description: Optional description of the reference table.
The following table describes the column properties of a reference table:
Valid: Identifies the column that contains the valid reference data.
Name: Name of each column.
Data Type: Data type of the data in each column.
Precision: Precision of each column.
Scale: Scale of each column.
Description: Description of the contents of the column. You can optionally add a description when you create the reference table.
Include a column for row-level descriptions: Indicates that the reference table contains a column for descriptions of column data.
Default value: Default value for the fields in the column. You can optionally add a default value when you create the reference table.
Connection Name: Name of the connection to the database that contains the reference table data values.
Creating a Reference Table Object
Choose this option when you want to create an empty reference table and add values by hand.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table as Empty.
3. Enter a name for the table.
4. Select a project to store the table metadata.
At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the
repository. Select the project you need.
Click Next.
5. Add two or more columns to the table. Click the New option to create a column.
Set the following properties for each column:
Property: Default value
Name: column
Data Type: string
Precision: 10
Scale: 0
Description: Empty. Optional property.
6. Select the column that contains the valid values. You can change the order of the columns that you create.
7. Optionally, edit the following properties:
Property: Default value
Include a column for row-level descriptions: Cleared
Audit note: Empty
Default value: Empty
Click Finish.
The reference table opens in the Developer tool workspace.
Creating a Reference Table from a Flat File
You can create a reference table from data stored in a flat file.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Flat File.
3. Browse to the file you want to use as the data source for the table.
4. Enter a name for the table.
5. Select a project to store the table metadata.
At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the
repository. Select the project you need.
Click Next.
6. Set UTF-8 as the code page.
7. Specify the delimiter that the flat file uses.
8. If the flat file contains column names, select the option to import column names from the first line of the file.
9. Optionally, edit the following properties:
Property: Default value
Text qualifier: No quotation marks
Start import at line: Line 1
Row Delimiter: \012 LF (\n)
Treat consecutive delimiters as one: Cleared
Escape character: Empty
Retain escape character in data: Cleared
Maximum rows to preview: 500
Click Next.
10. Select the column that contains the valid values.
11. Optionally, edit the following properties:
Property: Default value
Include a column for row-level descriptions: Cleared
Audit note: Empty
Default value: Empty
Maximum rows to preview: 500
Click Finish.
The reference table opens in the Developer tool workspace.
Creating a Reference Table from a Relational Source
You can use a database source to create a managed or unmanaged reference table.
Note: You can configure a database connection in the Connection Explorer. If the Developer tool does not show the
Connection Explorer, select Window > Show View > Connection Explorer from the Developer tool menu.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Relational Source. Click Next.
3. Select a database connection.
At the Connection field, click Browse. The Choose Connection dialog box opens and displays the available
database connections.
To create a managed reference table, connect to the reference data warehouse. To create an unmanaged
reference table, connect to a different database.
Click OK when you select a connection.
4. If the database connection you select does not specify the reference data warehouse, select Unmanaged
table.
If you want to perform edit operations on an unmanaged reference table, select the Editable option.
5. Select a database table.
At the Resource field, click Browse. The Select a Resource dialog box opens and displays the resources on the
database connection. Explore the database and select a database table.
6. Enter a name for the table.
7. Select a project to store the reference table object.
At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the
repository.
Select a project and click Next.
8. Select the column that contains the valid values.
9. Optionally, edit the following properties:
Property: Default value
Include a column for row-level descriptions: Cleared
Audit note: Empty
Default value: Empty
Maximum rows to preview: 500
Click Finish.
Copying a Reference Table in the Model Repository
You can copy a reference table between projects and folders in the Model repository. Copy a reference table to share
resources with other Developer tool users.
The reference table and the copy you create are not linked in the Model repository or in the database. When you create
a copy, you create a new database table.
1. Browse the Model repository and find the reference table you want to copy.
2. Right-click the reference table and select Copy from the context menu.
3. In the Model repository, find the project or folder where you want to store the copy of the table.
4. Click Paste.
Editing Reference Table Data
You can edit reference table data values in the Developer tool.
1. In the Object Explorer, select the project or folder that contains the reference table.
2. Right-click the reference table object and select Open.
The table opens on the Overview tab.
3. Select the Data tab to view the reference data values.
4. Edit the data values. You can edit the data in the following ways:
To add a data row, click New.
The cursor moves to the final row of the table and adds a row. Enter values for each field in the row.
To edit a data value, double-click the value in the reference table and update the value.
To delete a data row, select the row and click Delete.
5. When you complete the edits, save the reference table.
Finding Data Values in a Reference Table
You can search a reference table for data values. Use the search options when a table contains one or more instances
of a data value that you must update.
1. In the Object Explorer, select the project or folder that contains the reference table.
2. Right-click the reference table object and select Open.
The table opens on the Overview tab.
3. Select the Data tab to display the reference data values.
4. Enter the search criteria on the toolbar:
Enter a data value in the Find field.
Select a column to search.
5. Search the columns you select for the data value in the Find field.
Use the Up and Down options to find instances of the data value.
Part III: Data Quality Features in
Informatica Analyst
This part contains the following chapters:
Column Profiles in Informatica Analyst, 54
Column Profile Results in Informatica Analyst, 60
Rules in Informatica Analyst, 67
Scorecards in Informatica Analyst, 71
Exception Record Management, 82
Reference Tables, 87
Chapter 11: Column Profiles in Informatica Analyst
This chapter includes the following topics:
Column Profiles in Informatica Analyst Overview, 54
Column Profiling Process, 54
Profile Options, 55
Creating a Column Profile in the Analyst Tool, 56
Editing a Column Profile, 57
Running a Profile, 58
Creating a Filter, 58
Managing Filters, 59
Synchronizing a Flat File Data Object, 59
Synchronizing a Relational Data Object, 59
Column Profiles in Informatica Analyst Overview
When you create a profile, you select the columns in the data object for which you want to profile data. You can set or
configure sampling and drilldown options for faster profiling. After you run the profile, you can examine the profiling
statistics to understand the data.
You can profile wide tables and flat files that have a large number of columns. You can profile tables with more than 30
columns and flat files with more than 100 columns. When you create or run a profile, you can choose to select all the
columns or select each column you want to include for profiling. The Analyst tool displays the first 30 columns in the
data preview. You can select all columns for drilldown and view value frequencies for these columns. You can use
rules that have more than 50 output fields and include the rule columns for profiling when you run the profile again.
Column Profiling Process
As part of the column profiling process, you can choose to either include all the source columns for profiling or select
specific columns. You can also accept the default profile options or configure the profile results, sampling, and drill-
down options.
The following steps describe the column profiling process:
1. Select the data object you want to profile.
2. Determine whether you want to create a profile with default options or change the default profile options.
3. Choose where you want to save the profile.
4. Select the columns you want to profile.
5. Select the profile results option.
6. Choose the sampling options.
7. Choose the drill-down options.
8. Define a filter to determine the rows that the profile reads at run time.
9. Run the profile.
Note: Consider the following rules and guidelines for column names and profiling multilingual and Unicode data:
You cannot add a column to a profile if both the column name and profile name match. You cannot add the same
column twice to a profile even if you change the column name.
You can profile multilingual data from different sources and view profile results based on the locale settings in the
browser. The Analyst tool changes the Datetime, Numeric, and Decimal datatypes based on the browser locale.
You can sort on multilingual data. The Analyst tool displays the sort order based on the browser locale.
To profile Unicode data in a DB2 database, set the DB2CODEPAGE database environment variable in the
database and restart the Data Integration Service.
Profile Options
Profile options include profile results option, data sampling options, and data drilldown options. You can configure
these options when you create a column profile for a data object.
You use the New Profile wizard to configure the profile options. You can choose to create a profile with the default
options for columns, sampling, and drilldown options. When you create a profile for multiple data sources, the Analyst
tool uses default column profiling options.
Profile Results Option
You can choose to discard previous profile results or to display results for previous profile runs.
The following table describes the profile results option for a profile:
Show results only for columns, rules selected in current run: Discards the profile results for previously profiled columns and displays results for the columns and rules selected for the latest profile run. Do not select this option if you want the Analyst tool to display profile results for previously profiled columns.
Sampling Options
Sampling options determine the number of rows that the Analyst tool chooses to profile. You can configure sampling
options when you go through the wizard or when you run a profile.
The following table describes the sampling options for a profile:
All Rows: Chooses all rows in the data object.
First <number> Rows: The number of rows that you want to run the profile against. The Analyst tool chooses the rows from the first rows in the source.
Random Sample <number> Rows: The number of rows for a random sample to run the profile against. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance.
Random sample: Random sample size based on the number of rows in the data object. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance.
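The difference between the first two strategies can be sketched in a few lines of Python. This is an illustration of the sampling concepts only; the function names are assumptions and this is not the Analyst tool's implementation:

```python
import random

# Illustrative sketch of two sampling strategies: take the first n rows,
# or take n rows chosen at random from the whole source.
def first_n_rows(rows, n):
    """Sample the first n rows of the source, like 'First <number> Rows'."""
    return rows[:n]

def random_sample(rows, n, seed=None):
    """Sample n rows at random, like 'Random Sample <number> Rows'.
    A seed makes the sample reproducible for testing."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))
```

A first-rows sample is cheap but can miss variation later in the source; a random sample is more representative at the cost of staging the data.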
Drilldown Options
You can configure drilldown options when you go through the wizard or when you run a profile.
The following table describes the drilldown options for a profile:
Enable Row Drilldown: Drills down to row data in the profile results. By default, this option is selected.
Select Columns: Identifies columns for drilldown that you did not select for profiling.
Drilldown on live or staged data: Drills down on live data to read current data in the data source, or on staged data to read profile data that is staged in the profiling warehouse.
Creating a Column Profile in the Analyst Tool
Select a data object and create a custom profile or a default profile. When you create a custom profile, you can
configure the columns, the rows to sample, and the drilldown options. The Analyst tool creates the profile in the same
project and folder as the data object.
1. In the Navigator, select the project that contains the data object that you want to create a custom profile for.
2. In the Contents panel, right-click the data object and select New > Profile.
The New Profile wizard appears. The Column profiling option is selected by default.
3. Click Next.
4. In the Sources panel, select a data object.
5. Choose to create a default profile or a custom profile.
To create a default profile, click Save or Save & Run.
To create a custom profile, click Next.
6. Enter a name and an optional description for the profile.
7. In the Folders panel, select the project or folder where you want to create the profile.
The Analyst tool displays the project that you selected and shared projects that contain folders where you can
create the profile. The profile objects in the folder appear in the Profiles panel.
8. Click Next.
9. In the Columns panel, select the columns that you want to profile. The columns include any rules you applied to
the profile. The Analyst tool lists the name, datatype, precision, and scale for each column.
Optionally, select Name to select all columns.
10. Accept the default option in the Profile Results Option panel.
The first time you run the profile, the Analyst tool displays profile results for all columns selected for profiling.
11. In the Sampling Options panel, configure the sampling options.
12. In the Drilldown Options panel, configure the drilldown options.
Optionally, click Select Columns to select columns to drill down on. In the Drilldown columns window, select
the columns for drill down and click OK.
13. Click Next.
14. Optionally, define a filter for the profile.
15. Click Next to verify the row drilldown settings including the preview columns for drilldown.
16. Click Save to create the profile, or click Save & Run to create the profile and then run the profile.
Editing a Column Profile
You can make changes to a column profile after running it.
1. In the Navigator, select the project or folder that contains the profile that you want to edit.
2. Click the profile to open it.
The profile opens in a tab.
3. Click Actions > Edit.
A shortcut menu appears.
4. Based on the changes you want to make, choose one of the following menu options:
General. Change the basic properties such as name, description, and profile type.
Data Source. Choose another matching data source.
Column Profiling. Select the columns you want to run the profile on and configure the necessary sampling
and drill down options.
Column Profiling Filter. Create, edit, and delete filters.
Column Profiling Rules. Create rules or change current ones.
Data Domain Discovery. Set up data domain discovery options.
5. Click Save to save the changes or click Save & Run to save the changes and then run the profile.
Running a Profile
Run a profile to analyze a data source for content and structure and select columns and rules for drill down. You can
drill down on live or staged data for columns and rules. You can run a profile on a column or rule without profiling all the
source columns again after you run the profile.
1. In the Navigator, select the project or folder that contains the profile you want to run.
2. Click the profile to open it.
The profile appears in a tab. Verify the profile options before you run the profile.
3. Click Actions > Run Profile.
The Analyst tool displays the profile results.
Creating a Filter
You can create a filter to generate a subset of the original data source that meets the filter criteria. You can
then run a profile on this sample data.
1. Open a profile.
2. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
The current filters appear in the Filters panel.
3. Click New.
4. Enter a filter name and an optional description.
5. Select a simple, advanced, or SQL filter type.
Simple. Use conditional operators, such as <, >, =, BETWEEN, and ISNULL for each column that you want to
filter.
Advanced. Use function categories, such as Character, Consolidation, Conversion, Financial, Numerical,
and Data cleansing.
Click the function name on the Functions panel to view its return type, description, and parameters. To
include the function in the filter, click the right arrow (>) button, and you can specify the parameters in the
Function dialog box.
Note: For a simple or an advanced filter on a date column, provide the condition in the YYYY/MM/DD
HH:MM:SS format.
SQL. Creates SQL queries. You can create an SQL filter for relational data sources. Enter the WHERE clause
expression to generate the SQL filter. For example, to filter company records in the European region from a
Company table with a Region column, enter
Region = 'Europe'
in the editor.
6. Click Validate to verify the SQL expression.
Managing Filters
You can create, edit, and delete filters.
1. In the Navigator, select the project or folder that contains the profile you want to filter.
2. Open the profile.
3. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
The current filters appear in the Filters panel.
4. Choose to create, edit, or delete a filter.
Click New to create a filter.
Select a filter, and click Edit to change the filter settings.
Select a filter, and click Delete to remove the filter.
Synchronizing a Flat File Data Object
You can synchronize the changes to an external flat file data source with its data object in Informatica Analyst. Use the
Synchronize Flat File wizard to synchronize the data objects.
1. In the Contents panel, select a flat file data object.
2. Click Actions > Synchronize.
The Synchronize Flat File dialog box appears in a new tab.
3. Verify the flat file path in the Browse and Upload field.
4. Click Next.
A synchronization status message appears.
5. When you see a Synchronization complete message, click OK.
The message displays a summary of the metadata changes made to the data object. To view the details of the
metadata changes, use the Properties view.
Synchronizing a Relational Data Object
You can synchronize the changes to an external relational data source with its data object in Informatica Analyst.
External data source changes include adding, changing, and removing columns and changes to rules.
1. In the Contents panel, select a relational data object.
2. Click Actions > Synchronize.
A message prompts you to confirm the action.
3. To complete the synchronization process, click OK. Click Cancel to cancel the process.
If you click OK, a synchronization status message appears.
4. When you see a Synchronization complete message, click OK.
The message displays a summary of the metadata changes made to the data object. To view the details of the
metadata changes, use the Properties view.
Chapter 12: Column Profile Results in Informatica Analyst
This chapter includes the following topics:
Column Profile Results in Informatica Analyst Overview, 60
Profile Summary, 61
Column Values, 62
Column Patterns, 62
Column Statistics, 63
Column Profile Drilldown, 63
Column Profile Export Files in Informatica Analyst, 64
Column Profile Results in Informatica Analyst Overview
View profile results to understand the structure of data and analyze its quality. You can view the profile results after
you run a profile. You can view a summary of the columns and rules in the profile and the values, patterns, and
statistics for columns and rules.
After you run a profile, you can view the profile results in the Column Profiling, Properties, and Data Preview views.
You can export value frequencies, pattern frequencies, or drilldown data to a CSV file. You can export the complete
profile summary information to a Microsoft Excel file so that you can view all data in a file for further analysis.
In the Column Profiling view, you can view the summary information for columns for a profile run. You can view
values, patterns, and statistics for each column in the Values, Patterns, and Statistics views.
The Analyst tool displays rules as columns in profile results. The profile results for a rule appear as a profiled column.
The profile results that appear depend on the profile configuration and sampling options.
The following profiling results appear in the Column Profiling view:
The summary information for the profile run, including the number of unique and null values, inferred datatype, and
last run date and time.
Values for columns and the frequency in which the value appears for the column. The frequency appears as a
number, a percentage, and a chart.
Value patterns for the profiled columns and the frequency in which the pattern appears. The frequency appears as
a number and a percentage.
Statistics about the column values, such as average, length, and top and bottom values.
Note: You can select a value or pattern and view profiled rows that match the value or pattern on the Details panel.
In the Properties view, you can view profile properties on the Properties panel. You can view properties for columns
and rules on the Columns and Rules panel.
In the Data Preview view, you can preview the profile data. The Analyst tool includes all columns in the profile and
displays the first 100 rows of data.
Profile Summary
The summary for a profile run includes the number of unique and null values expressed as a number and a
percentage, inferred datatypes, and last run date and time. You can click each profile summary property to sort on
values of the property.
The following table describes the profile summary properties:
Name: Name of the column in the profile.
Unique Values: Number of unique values for the column.
% Unique: Percentage of unique values for the column.
Null: Number of null values for the column.
% Null: Percentage of null values for the column.
Datatype: Datatype derived from the values for the column. The Analyst tool can derive the String, Varchar, Decimal, and Integer datatypes from the values in a column, and displays "-" for nulls. Note: The Analyst tool cannot derive the datatype from the values of a numeric column that has a precision greater than 38, or of a string column that has a precision greater than 255. If you create a column profile on a date column that contains year values earlier than 1800, the inferred datatype may appear as a fixed-length string. Change the default value of the year-minimum parameter in InferDateTimeConfig.xml as necessary.
% Inferred: Percentage of values that match the datatype inferred by the Analyst tool.
Documented Datatype: Datatype declared for the column in the profiled object.
Maximum Value: Maximum value in the column.
Minimum Value: Minimum value in the column.
Last Profile Run: Date and time you last ran the profile.
Drilldown: If selected, drills down on live data for the column.
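The unique-value and null-value properties above amount to simple counts over the column. The following sketch shows one way to compute them; it is illustrative only, not the Analyst tool's implementation, and the function name is an assumption:

```python
def summarize_column(values):
    """Compute summary statistics for one column, mirroring the
    Unique Values, % Unique, Null, and % Null properties.
    None stands in for a null value."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    nulls = total - len(non_null)
    unique = len(set(non_null))
    return {
        "Unique Values": unique,
        "% Unique": 100.0 * unique / total if total else 0.0,
        "Null": nulls,
        "% Null": 100.0 * nulls / total if total else 0.0,
    }
```

For example, a four-row column with values "a", "b", "a", and null has two unique values and is 25% null.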
Column Values
The column values include values for columns and the frequency in which the value appears for the column.
The following table describes the properties for the column values:
Value: List of all values for the column in the profile. Note: The Analyst tool excludes the CLOB, BLOB, Raw, and Binary datatypes from column values in a profile.
Frequency: Number of times a value appears for a column, expressed as a number, a percentage, and a chart.
Percent: Percentage of rows in which the value appears for the column.
Chart: Chart for the percentage.
Drill down: Drills down to specific source rows based on a column value.
Note: To sort the Value and Frequency columns, select the columns. When you sort the results of the Frequency
column, the Analyst tool sorts the results based on the datatype of the column.
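A value-frequency list of this kind is a counting operation over the column. The sketch below shows the idea; it is illustrative only, not the Analyst tool's implementation:

```python
from collections import Counter

def value_frequencies(values):
    """Return (value, count, percent) tuples, highest frequency first,
    mirroring the Value, Frequency, and Percent properties above."""
    total = len(values)
    counts = Counter(values)
    return [(value, count, 100.0 * count / total)
            for value, count in counts.most_common()]
```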
Column Patterns
The column patterns include the value patterns for the columns and the frequency in which the pattern appears.
By default, the profiling warehouse stores the 16,000 highest-frequency unique values, including NULL values, for profile results. If the profile results contain at least one NULL value, the Analyst tool can display NULL values as patterns.
Note: The Analyst tool cannot derive the pattern for a numeric column that has a precision greater than 38. The
Analyst tool cannot derive the pattern for a string column that has a precision greater than 255.
The following table describes the properties for the column patterns:
Pattern: Pattern for the column in the profile.
Frequency: Number of times a pattern appears for a column, expressed as a number.
Percent: Percentage of rows in which the pattern appears for the column.
Chart: Chart for the percentage.
Drill down: Drills down to specific source rows based on a column pattern.
The following table describes the pattern characters and what they represent:
9: Represents any numeric character. Informatica Analyst displays up to three characters separately in the "9" format and displays more than three characters as a count within parentheses. For example, the format "9(8)" represents a numeric value with 8 digits.
X: Represents any alphabetic character. Informatica Analyst displays up to three characters separately in the "X" format and displays more than three characters as a count within parentheses. For example, the format "X(6)" may represent the value "Boston." Note: The pattern character X is not case sensitive and may represent uppercase or lowercase characters from the source data.
p: Represents "(", the left parenthesis.
q: Represents ")", the right parenthesis.
b: Represents a blank space.
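The character mapping and the run-collapsing rule can be sketched as follows. This is an illustration of the pattern notation only, not the tool's actual algorithm, and the function name is an assumption:

```python
import re

def column_pattern(value):
    """Derive a pattern string for a value: 9 = digit, X = letter,
    p/q = parentheses, b = blank. Runs of more than three identical
    pattern characters collapse to a count in parentheses, e.g. 9(8)."""
    mapped = []
    for ch in value:
        if ch.isdigit():
            mapped.append("9")
        elif ch.isalpha():
            mapped.append("X")
        elif ch == "(":
            mapped.append("p")
        elif ch == ")":
            mapped.append("q")
        elif ch == " ":
            mapped.append("b")
        else:
            mapped.append(ch)  # keep other characters as-is
    pattern = "".join(mapped)
    # Collapse runs of four or more identical pattern characters.
    return re.sub(r"(.)\1{3,}",
                  lambda m: f"{m.group(1)}({len(m.group(0))})",
                  pattern)
```

For example, "Boston" yields "X(6)" and "12345678" yields "9(8)", matching the formats described above.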
Column Statistics
The column statistics include statistics about the column values, such as average, length, and top and bottom values.
The statistics that appear depend on the column type.
The following table describes the types of column statistics for each column type:
Statistic           Column Type      Description
Average             Integer          Average of the values for the column.
Standard Deviation  Integer          The standard deviation, or variability
                                     between column values, for all values of
                                     the column.
Maximum Length      Integer, String  Length of the longest value for the column.
Minimum Length      Integer, String  Length of the shortest value for the column.
Bottom              Integer, String  Lowest values for the column.
Top                 Integer, String  Highest values for the column.
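As an illustration, the statistics above can be computed from a column of integer values as follows. The function and dictionary keys are ours for demonstration, not an Informatica API, and the population standard deviation is an assumed interpretation:

```python
import statistics

def column_statistics(values, top_n=5):
    """Compute the profile statistics listed above for an integer column."""
    return {
        "Average": sum(values) / len(values),
        "Standard Deviation": statistics.pstdev(values),
        "Maximum Length": max(len(str(v)) for v in values),
        "Minimum Length": min(len(str(v)) for v in values),
        "Bottom": sorted(values)[:top_n],            # lowest values
        "Top": sorted(values, reverse=True)[:top_n],  # highest values
    }

stats = column_statistics([10, 250, 3, 47, 8800])
print(stats["Average"])  # 1822.0
```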
Column Profile Drilldown
Drilldown options for a column profile enable you to drill down to specific rows in the data source based on a column
value. You can choose to read the current data in a data source for drilldown or read profile data staged in the profile
warehouse. When you drill down to a specific row on staged profile data, the Analyst tool creates a drilldown filter for
the matching column value. After you drill down, you can edit, recall, reset, and save the drilldown filter.
You can select columns for drilldown even if you did not choose those columns for profiling. After you perform
a drilldown on a column value, you can export the drilldown data for the selected values or patterns to a CSV file at a
location you choose. Although Informatica Analyst displays the first 200 values for drilldown data, the tool exports all
values to the CSV file.
Drilling Down on Row Data
After you run a profile, you can drill down to specific rows that match the column value or pattern.
1. Run a profile.
The profile appears in a tab.
2. In the Summary view, select a column name to view the profile results for the column.
3. Select a column value on the Values tab or select a column pattern on the Patterns tab.
4. Click Actions > Drilldown to view the rows of data.
The Drilldown panel displays the rows that contain the values or patterns. The column value or pattern appears
at the top of the panel.
Note: You can choose to drill down on live data or staged data.
Applying Filters to Drilldown Data
You can filter the drilldown data iteratively so that you can analyze data irregularities on the subsets of profile
results.
1. Drill down to row data in the profile results.
2. Select a column value on the Values tab.
3. Right-click and select Drilldown Filter > Edit to open the DrillDown Filter dialog box.
4. Add filter conditions, and click Run.
5. To manage current drilldown filters, you can save, recall, or reset filters.
- To save a filter, select Drilldown Filter > Save.
- To go back to the last saved drilldown filter results, select Drilldown Filter > Recall.
- To reset the drilldown filter results, select Drilldown Filter > Reset.
Column Profile Export Files in Informatica Analyst
You can export column profile results to a CSV file or a Microsoft Excel file based on whether you choose a part of the
profile results or the complete results summary.
You can export value frequencies, pattern frequencies, or drilldown data to a CSV file for selected values and
patterns. You can export the profiling results summary for all columns to a Microsoft Excel file. Use the Data
Integration Service privilege Drilldown and Export Results to determine, by user or group, who can export profile
results.
Profile Export Results in a CSV File
You can export value frequencies, pattern frequencies, or drilldown data to view the data in a file. The Analyst tool
saves the information in a CSV file.
When you export inferred column patterns, the Analyst tool exports a different format of the column pattern. For
example, when you export the inferred column pattern X(5), the Analyst tool displays the following format of the
column pattern in the CSV file: XXXXX.
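The expansion from the compact notation to the exported form can be sketched as follows; the function name is hypothetical:

```python
import re

def expand_pattern(compact: str) -> str:
    """Expand the compact pattern notation, for example 'X(5)' -> 'XXXXX',
    to mirror the format written to the exported CSV file."""
    return re.sub(r"(.)\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  compact)

print(expand_pattern("X(5)"))       # XXXXX
print(expand_pattern("9(3)bX(2)"))  # 999bXX
```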
Profile Export Results in Microsoft Excel
When you export the complete profile results summary, the Analyst tool saves the information to multiple worksheets
in a Microsoft Excel file. The Analyst tool saves the file in the "xlsx" format.
The following table describes the information that appears on each worksheet in the export file:
Tab             Description
Column Profile  Summary information exported from the Column Profiling view
                after the profile runs. Examples are column names, rule names,
                number of unique values, number of null values, inferred
                datatypes, and the date and time of the last profile run.
Values          Values for the columns and rules and the frequency with which
                the values appear for each column.
Patterns        Value patterns for the columns and rules that you ran the
                profile on and the frequency with which the patterns appear.
Statistics      Statistics about each column and rule. Examples are average,
                length, top values, bottom values, and standard deviation.
Properties      Properties view information, including profile name, type,
                sampling policy, and row count.
Exporting Profile Results from Informatica Analyst
You can export the results of a profile to a ".csv" or ".xlsx" file to view the data in a file.
1. In the Navigator, select the project or folder that contains the profile.
2. Click the profile to open it.
The profile opens in a tab.
3. In the Column Profiling view, select the column that you want to export.
4. Click Actions > Export Data.
The Export Data to a file window appears.
5. Enter the file name. Optionally, use the default file name.
6. Select the type of data to export.
- All (Summary, Values, Patterns, Statistics, Properties)
- Value frequencies for the selected column.
- Pattern frequencies for the selected column.
- Drilldown data for the selected values or patterns.
7. Select a file format. The format is Excel for the All option and CSV for the other options.
8. Select the code page of the file.
9. Click OK.
Chapter 13: Rules in Informatica Analyst
This chapter includes the following topics:
- Rules in Informatica Analyst Overview
- Predefined Rules
- Expression Rules
Rules in Informatica Analyst Overview
A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to the
profile to cleanse, change, or validate data.
You may want to use a rule in different circumstances. You can add a rule to cleanse one or more data columns. You
can add a lookup rule that provides information that the source data does not provide. You can add a rule to validate a
cleansing rule for a data quality or data integration project.
You can add a rule before or after you run a profile. When you add a rule to a profile, you can create a rule or you can
apply a rule. You can create or apply the following rule types for a profile:
- Expression rules. Use expression functions and columns to define rule logic. Create expression rules in the
  Analyst tool. An analyst can create an expression rule and promote it to a reusable rule that other analysts can
  use in multiple profiles.
- Predefined rules. Reusable rules that a developer creates in the Developer tool. Rules that a developer creates
  in the Developer tool as mapplets can appear in the Analyst tool as reusable rules.
After you add a rule to a profile, you can run the profile again for the rule column. The Analyst tool displays profile
results for the rule column. You can modify the rule and run the profile again to view changes to the profile results. The
output of a rule can be one or more virtual columns. The virtual columns exist in the profile results. The Analyst tool
profiles the virtual columns. For example, you use a predefined rule that splits a column that contains first and last
names into FIRST_NAME and LAST_NAME virtual columns. The Analyst tool profiles the FIRST_NAME and
LAST_NAME columns.
Note: If you delete a rule object that other object types reference, the Analyst tool displays a message that lists those
object types. Determine the impact of deleting the rule before you delete it.
Predefined Rules
Predefined rules are rules created in the Developer tool or provided with the Developer tool and Analyst tool. Apply
predefined rules to the Analyst tool profiles to modify or validate source data.
Predefined rules use transformations to define rule logic. You can use predefined rules with multiple profiles. In the
Model repository, a predefined rule is a mapplet with an input group, an output group, and transformations that define
the rule logic.
Predefined Rules Process
Use the New Rule Wizard to apply a predefined rule to a profile.
You can perform the following steps to apply a predefined rule:
1. Open a profile.
2. Select a predefined rule.
3. Review the rule parameters.
4. Select the input column.
5. Configure the profiling options.
Applying a Predefined Rule
Use the New Rule Wizard to apply a predefined rule to a profile. When you apply a predefined rule, you select the rule
and configure the input and output columns for the rule. Apply a predefined rule to use a rule promoted as a reusable
rule or use a rule created by a developer.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2. Click the profile to open it.
The profile appears in a tab.
3. Click Actions > Edit > Column Profiling Rules.
The Edit Profile dialog box appears.
4. To open the New Rule dialog box, click + .
5. Select Apply a Rule.
6. Click Next.
7. In the Rules panel, select the rule that you want to apply.
The name, datatype, description, and precision columns appear for the Inputs and Outputs columns in the
Rules Parameters panel.
8. Click Next.
9. In the Inputs section, select an input column. The input column is a column name in the profile.
10. Optionally, in the Outputs section, configure the label of the output columns.
11. Click Next.
12. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the
profile. Optionally, select Name to include all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
13. In the Sampling Options panel, configure the sampling options.
14. In the Drilldown Options panel, configure the drilldown options.
15. Click Save to apply the rule or click Save & Run to apply the rule and then run the profile.
Expression Rules
Expression rules use expression functions and columns to define rule logic. Create expression rules and add them to
a profile in the Analyst tool.
Use expression rules to change or validate values for columns in a profile. You can create one or more expression
rules to use in a profile. Expression functions are SQL-like functions used to transform source data. You can create
expression rule logic with the following types of functions:
- Character
- Conversion
- Data Cleansing
- Date
- Encoding
- Financial
- Numeric
- Scientific
- Special
- Test
Expression Rules Process
Use the New Rule Wizard to create an expression rule and add it to a profile.
The New Rule Wizard includes an expression editor. Use the expression editor to add expression functions,
configure columns as input to the functions, validate the expression, and configure the return type, precision, and
scale.
The output of an expression rule is a virtual column that uses the name of the rule as the column name. The Analyst
tool profiles the virtual column. For example, you use an expression rule to validate a ZIP code. The rule returns 1 if the
ZIP Code is valid and 0 if the ZIP code is not valid. Informatica Analyst profiles the 1 and 0 output values of the
rule.
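In the Analyst tool you would build such a rule from expression functions. The following Python sketch only illustrates the 1-and-0 output behavior described above, with a simple US ZIP format check as an assumed definition of "valid"; the function name and regular expression are ours:

```python
import re

def zip_rule(value) -> int:
    """Return 1 when the value matches a 5-digit or ZIP+4 US ZIP code format,
    0 otherwise -- the same 1/0 output that the profile then summarizes."""
    return 1 if re.fullmatch(r"\d{5}(-\d{4})?", value or "") else 0

print([zip_rule(z) for z in ["02134", "02134-1001", "2134", None]])  # [1, 1, 0, 0]
```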
You can perform the following steps to create an expression rule:
1. Open a profile.
2. Configure the rule logic using expression functions and columns as parameters.
3. Configure the profiling options.
Creating an Expression Rule
Use the New Rule Wizard to create an expression rule and add it to a profile. Create an expression rule to modify or
validate values for columns in a profile.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2. In the Contents panel, click the profile to open it.
The profile appears in a tab.
3. Click Actions > Edit > Column Profiling Rules.
The Edit Profile dialog box appears.
4. Click New.
5. Select Create a rule.
6. Click Next.
7. Enter a name and optional description for the rule.
8. Optionally, choose to promote the rule as a reusable rule and configure the project and folder location.
If you promote a rule to a reusable rule, you or other users can use the rule in another profile as a predefined
rule.
9. In the Functions tab, select a function and click the right arrow to enter the parameters for the function.
10. In the Columns tab, select an input column and click the right arrow to add the expression in the Expression
editor. You can also add logical operators to the expression.
11. Click Validate. You can proceed to the next step if the expression is valid.
12. Optionally, click Edit to configure the return type, precision, and scale.
13. Click Next.
14. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the
profile. Optionally, select Name to select all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
15. In the Sampling Options panel, configure the sampling options.
16. In the Drilldown Options panel, configure the drilldown options.
17. Click Save to create the rule or click Save & Run to create the rule and then run the profile.
Chapter 14: Scorecards in Informatica Analyst
This chapter includes the following topics:
- Scorecards in Informatica Analyst Overview
- Informatica Analyst Scorecard Process
- Metrics
- Scorecard Notifications
- Scorecard Integration with External Applications
- Scorecard Lineage
Scorecards in Informatica Analyst Overview
A scorecard is the graphical representation of valid values for a column in a profile. You can create scorecards and
drill down on live data or staged data.
Use scorecards to measure data quality progress. For example, you can create a scorecard to measure data quality
before you apply data quality rules. After you apply data quality rules, you can create another scorecard to compare
the effect of the rules on data quality.
Scorecards display the value frequency for columns as scores. The scores reflect the percentage of valid values in the
columns. After you run a profile, you can add columns from the profile as metrics to a scorecard. You can create metric
groups so that you can group related metrics to a single entity. You can define thresholds that specify the range of bad
data acceptable for columns in a record and assign metric weights for each metric. When you run a scorecard, the
Analyst tool generates weighted average values for each metric group. To identify valid data records and records that
are not valid, you can drill down on each column. You can use trend charts in the Analyst tool to track how scores
change over a period of time.
Informatica Analyst Scorecard Process
You can run and edit the scorecard in the Analyst tool. You can create and view a scorecard in the Developer tool. You
can run the scorecard on current data in the data object or on data stored in the staging database.
When you view a scorecard in the Contents view of the Analyst tool, it opens the scorecard in another tab. After you
run the scorecard, you can view the scores on the Scorecard view. You can select the data object and navigate to the
data object from a score within a scorecard. The Analyst tool opens the data object in another tab.
You can perform the following tasks when you work with scorecards:
1. Create a scorecard in the Developer tool and add columns from a profile.
2. Optionally, connect to the Analyst tool and open the scorecard in the Analyst tool.
3. After you run a profile, add profile columns as metrics to the scorecard.
4. Run the scorecard to generate the scores for columns.
5. View the scorecard to see the scores for each column in a record.
6. Drill down on the columns for a score.
7. Edit a scorecard.
8. Set thresholds for each metric in a scorecard.
9. Create a group to add or move related metrics in the scorecard.
10. Edit or delete a group, as required.
11. View trend charts for each score to monitor how the score changes over time.
12. View scorecard lineage for each metric or metric group.
Metrics
A metric is a column of a data source or output of a rule that is part of a scorecard. When you create a scorecard, you
can assign a weight to each metric. Create a metric group to categorize related metrics in a scorecard into a set.
Metric Weights
When you create a scorecard, you can assign a weight to each metric. The default value for a weight is 1.
When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based on the metric
score and weight you assign to each metric.
For example, you assign a weight of W1 to metric M1, and you assign a weight of W2 to metric M2. The Analyst tool
uses the following formula to calculate the weighted average:
(M1 X W1 + M2 X W2) / (W1 + W2)
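The formula generalizes to any number of metrics. A minimal sketch, with a hypothetical function name:

```python
def weighted_average(metrics):
    """Weighted average of (score, weight) pairs, per the formula above:
    (M1 x W1 + M2 x W2 + ...) / (W1 + W2 + ...)."""
    total_weight = sum(w for _, w in metrics)
    return sum(score * w for score, w in metrics) / total_weight

# Two metrics: 90% valid with weight 3, 60% valid with the default weight 1.
print(weighted_average([(90.0, 3), (60.0, 1)]))  # 82.5
```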
Adding Columns to a Scorecard
After you run a profile, you can add profile columns to a scorecard. Use the Add to Scorecard Wizard to add columns
from a profile to a scorecard and configure the valid values for the columns. If you add a profile column to a scorecard
from a source profile that has a filter or a sampling option other than All Rows, profile results may not reflect the
scorecard results.
1. In the Navigator, select the project or folder that contains the profile.
2. Click the profile to open it.
The profile appears in a tab.
3. Click Actions > Run Profile to run the profile.
4. Click Actions > Add to Scorecard.
The Add to Scorecard Wizard appears.
Note: Use the following rules and guidelines before you add columns to a scorecard:
- You cannot add a column to a scorecard if both the column name and the scorecard name match.
- You cannot add a column to a scorecard twice, even if you change the column name.
5. Select Existing Scorecard to add the columns to an existing scorecard.
The New Scorecard option is selected by default.
6. Click Next.
7. Select the scorecard that you want to add the columns to, and click Next.
8. Select the columns and rules that you want to add to the scorecard as metrics. Optionally, click the check box in
the left column header to select all columns. Optionally, select Column Name to sort column names.
9. Select each metric in the Metrics panel and configure the valid values from the list of all values in the Score
using: Values panel.
You can select multiple values in the Available Values panel and click the right arrow button to move them to the
Selected Values panel. The total number of valid values for a metric appears at the top of the Available Values
panel.
10. Select each metric in the Metrics panel and configure metric thresholds in the Metric Thresholds panel.
You can set thresholds for Good, Acceptable, and Unacceptable scores.
11. Click Next.
12. In the Score using: Values panel, set up the metric weight for each metric. You can double-click the default
metric weight of 1 to change the value.
13. In the Metric Group Thresholds panel, set up metric group thresholds.
14. Click Save to save the scorecard or click Save & Run to save and run the scorecard.
Running a Scorecard
Run a scorecard to generate scores for columns.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. Click Actions > Run Scorecard.
4. Select a score from the Metrics panel and select the columns from the Columns panel to drill down on.
5. In the Drilldown option, choose to drill down on live data or staged data.
For optimal performance, drill down on live data.
6. Click Run.
Viewing a Scorecard
Run a scorecard to see the scores for each metric. A scorecard displays the score as a percentage and bar. View data
that is valid or not valid. You can also view scorecard information, such as the metric weight, metric group score, score
trend, and name of the data object.
1. Run a scorecard to view the scores.
2. Select a metric that contains the score you want to view.
3. Click Actions > Drilldown to view the rows of valid data or rows of data that is not valid for the column.
The Analyst tool displays the rows of valid data by default in the Drilldown panel.
Editing a Scorecard
Edit valid values for metrics in a scorecard. You must run a scorecard before you can edit it.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. Click Actions > Edit.
The Edit Scorecard dialog box appears.
4. On the Metrics tab, select each score in the Metrics panel and configure the valid values from the list of all values
in the Score using: Values panel.
5. Make changes to the score thresholds in the Metric Thresholds panel as necessary.
6. Click the Metric Groups tab.
7. Create, edit, or remove metric groups.
You can also edit the metric weights and metric thresholds on the Metric Groups tab.
8. Click the Notifications tab.
9. Make changes to the scorecard notification settings as necessary.
You can set up global and custom settings for metrics and metric groups.
10. Click Save to save changes to the scorecard, or click Save & Run to save the changes and run the
scorecard.
Defining Thresholds
You can set thresholds for each score in a scorecard. A threshold specifies the range in percentage of bad data that is
acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data. You
can define thresholds for each column when you add columns to a scorecard, or when you edit a scorecard.
Complete the following prerequisite tasks before you define thresholds for columns in a scorecard:
- In the Navigator, select the project or folder that contains the profile, and add columns from the profile to the
  scorecard in the Add to Scorecard window.
- Optionally, in the Navigator, select the project or folder that contains the scorecard and click the scorecard to
  edit it in the Edit Scorecard window.
1. In the Add to Scorecard window, or the Edit Scorecard window, select each metric in the Metrics panel.
2. In the Metric Thresholds panel, enter the thresholds that represent the upper bound of the unacceptable range
and the lower bound of the good range.
3. Click Next or Save.
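The two threshold bounds divide scores into three ranges. The following sketch shows one way to classify a score; the handling of scores that fall exactly on a boundary is our assumption, not documented behavior:

```python
def score_range(score, unacceptable_max, good_min):
    """Classify a score using the two threshold bounds described above:
    at or below the unacceptable upper bound -> Unacceptable, at or above
    the good lower bound -> Good, otherwise Acceptable."""
    if score <= unacceptable_max:
        return "Unacceptable"
    if score >= good_min:
        return "Good"
    return "Acceptable"

print([score_range(s, 50, 90) for s in (40, 75, 95)])
# ['Unacceptable', 'Acceptable', 'Good']
```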
Metric Groups
Create a metric group to categorize related scores in a scorecard into a set. By default, the Analyst tool categorizes all
the scores in a default metric group.
After you create a metric group, you can move scores out of the default metric group to another metric group. You can
edit a metric group to change its name and description, including the default metric group. You can delete metric
groups that you no longer use. You cannot delete the default metric group.
Creating a Metric Group
Create a metric group to add related scores in the scorecard to the group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. Click Actions > Edit.
The Edit Scorecard window appears.
4. Click the Metric Groups tab.
The default group appears in the Metric Groups panel and the scores in the default group appear in the Metrics
panel.
5. Click the New Group icon to create a metric group.
The Metric Groups dialog box appears.
6. Enter a name and optional description.
7. Click OK.
8. Click Save to save the changes to the scorecard.
Moving Scores to a Metric Group
After you create a metric group, you can move related scores to the metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. Click Actions > Edit.
The Edit Scorecard window appears.
4. Click the Metric Groups tab.
The default group appears in the Metric Groups panel and the scores in the default group appear in the Metrics
panel.
5. Select a metric from the Metrics panel and click the Move Metrics icon.
The Move Metrics dialog box appears.
Note: To select multiple scores, hold the Shift key.
6. Select the metric group to move the scores to.
7. Click OK.
Editing a Metric Group
Edit a metric group to change the name and description. You can change the name of the default metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard opens in a tab.
3. Click Actions > Edit.
The Edit Scorecard window appears.
4. Click the Metric Groups tab.
The default metric group appears in the Metric Groups panel and the metrics in the default metric group appear
in the Metrics panel.
5. On the Metric Groups panel, click the Edit Group icon.
The Edit dialog box appears.
6. Enter a name and an optional description.
7. Click OK.
Deleting a Metric Group
You can delete a metric group that is no longer valid. When you delete a metric group, you can choose to move the
scores in the metric group to the default metric group. You cannot delete the default metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard opens in a tab.
3. Click Actions > Edit.
The Edit Scorecard window appears.
4. Click the Metric Groups tab.
The default metric group appears in the Metric Groups panel and the metrics in the default metric group appear
in the Metrics panel.
5. Select a metric group in the Metric Groups panel, and click the Delete Group icon.
The Delete Groups dialog box appears.
6. Choose the option to delete the metrics in the metric group or the option to move the metrics to the default metric
group before deleting the metric group.
7. Click OK.
Drilling Down on Columns
Drill down on the columns for a score to select columns that appear when you view the valid data rows or data rows
that are not valid. The columns you select to drill down on appear in the Drilldown panel.
1. Run a scorecard to view the scores.
2. Select a column that contains the score you want to view.
3. Click Actions > Drilldown to view the rows of valid or invalid data for the column.
4. Click Actions > Drilldown Columns.
The columns appear in the Drilldown panel for the selected score. The Analyst tool displays the rows of valid
data for the columns by default. Optionally, click Invalid to view the rows of data that are not valid.
Viewing Trend Charts
You can view trend charts for each score to monitor how the score changes over time.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. In the Scorecard view, select a score.
4. Click Actions > Show Trend Chart.
The Trend Chart Detail window appears. You can view score values that have changed over time. The Analyst
tool uses historical scorecard run data for each date and the latest valid score values to calculate the score. The
Analyst tool uses the latest threshold settings in the chart to depict the color of the score points.
Scorecard Notifications
You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores or
metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable, Acceptable,
and Good.
You can configure email notifications for individual metric scores and metric groups. If you use the global settings, the
Analyst tool sends notification emails when the scores of selected metrics cross the threshold from the Good to
Acceptable and from the Acceptable to Unacceptable score ranges. You also get notification emails for each scorecard
run if the score remains in the Unacceptable score range across consecutive scorecard runs.
You can customize the notification settings so that scorecard users get email notifications when the scores move from
the Unacceptable to Acceptable and Acceptable to Good score ranges. You can also choose to send email
notifications if a score remains within specific score ranges for every scorecard run.
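The notification decision described above can be sketched as follows. The settings representation (a set of range-to-range moves and a set of per-run ranges) is ours, not the Analyst tool's:

```python
def should_notify(prev_range, curr_range, notify_on_moves, notify_in_ranges):
    """Fire a notification when the score moves between two configured ranges,
    or when it stays in a range configured for per-run notification."""
    if prev_range != curr_range:
        return (prev_range, curr_range) in notify_on_moves
    return curr_range in notify_in_ranges

# Default-style settings: notify on downward moves and while Unacceptable.
moves = {("Good", "Acceptable"), ("Acceptable", "Unacceptable")}
print(should_notify("Good", "Acceptable", moves, {"Unacceptable"}))            # True
print(should_notify("Unacceptable", "Unacceptable", moves, {"Unacceptable"}))  # True
print(should_notify("Acceptable", "Good", moves, {"Unacceptable"}))            # False
```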
Notification Email Message Template
You can set up the message text and structure of email messages that the Analyst tool sends to recipients as part of
scorecard notifications. The email template has an optional introductory text section, read-only message body
section, and optional closing text section.
The following table describes the tags in the email template:
Tag                      Description
ScorecardName            Name of the scorecard.
ObjectURL                A hyperlink to the scorecard. You need to provide the
                         username and password.
MetricGroupName          Name of the metric group that the metric belongs to.
CurrentWeightedAverage   Weighted average value for the metric group in the
                         current scorecard run.
CurrentRange             The score range, such as Unacceptable, Acceptable, and
                         Good, for the metric group in the current scorecard run.
PreviousWeightedAverage  Weighted average value for the metric group in the
                         previous scorecard run.
PreviousRange            The score range, such as Unacceptable, Acceptable, and
                         Good, for the metric group in the previous scorecard run.
ColumnName               Name of the source column that the metric is assigned to.
ColumnType               Type of the source column.
RuleName                 Name of the rule.
RuleType                 Type of the rule.
DataObjectName           Name of the source data object.
Setting Up Scorecard Notifications
You can set up scorecard notifications at both metric and metric group levels. Global notification settings apply to
those metrics and metric groups that do not have individual notification settings.
1. Run a scorecard in the Analyst tool.
2. Click Actions > Edit.
3. Click the Notifications tab.
4. Select Enable notifications to start configuring scorecard notifications.
5. Select a metric or metric group.
6. Click the Notifications check box to enable the global settings for the metric or metric group.
7. Select Use custom settings to change the settings for the metric or metric group.
You can choose to send a notification email when the score is in Unacceptable, Acceptable, and Good ranges
and moves across thresholds.
8. To edit the global settings for scorecard notifications, click the Edit Global Settings icon.
The Edit Global Settings dialog box appears where you can edit the settings including the email template.
Configuring Global Settings for Scorecard Notifications
If you choose the global scorecard notification settings, the Analyst tool sends emails to target users when the score is
in the Unacceptable range or moves down across thresholds. As part of the global settings, you can configure the
email template including the email addresses and message text for a scorecard.
1. Run a scorecard in the Analyst tool.
2. Click Actions > Edit to open the Edit Scorecard dialog box.
3. Click the Notifications tab.
4. Select Enable notifications to start configuring scorecard notifications.
5. Click the Edit Global Settings icon.
The Edit Global Settings dialog box appears where you can edit the settings, including the email template.
6. Choose when you want to send email notifications using the Score in and Score moves check boxes.
7. In the Email from field, change the email ID as necessary.
By default, the Analyst tool uses the Sender Email Address property of the Data Integration Service as the
sender email ID.
8. In the Email to field, enter the email ID of the recipient.
Use a semicolon to separate multiple email IDs.
9. Enter the text for the email subject.
10. In the Body field, add the introductory and closing text of the email message.
11. To apply the global settings, select Apply settings to all metrics and metric groups.
12. Click OK.
Scorecard Integration with External Applications
You can create a scorecard in the Analyst tool and view its results in external applications or web portals. Specify the
scorecard results URL in a format that includes the host name, port number, project ID, and scorecard ID to view the
results in external applications.
Open a scorecard after you run it and copy its URL from the browser. The scorecard URL must be in the following
format:
http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}
The following table describes the scorecard URL attributes:
Attribute Description
HOST_NAME Host name of the Analyst Service.
PORT Port number for the Analyst Service.
MRS_PROJECT_ID Project ID in the Model repository.
SCORECARD_ID ID of the scorecard.
MRS_PARENT_PATH Location of the scorecard in the Analyst tool. For example, /project1/folder1/sub_folder1.
VIEW_MODE Determines whether a read-only or editable view of the scorecard gets integrated with the external application.
CREDENTIAL Last part of the URL generated by the single sign-on feature that represents the object type, such as scorecard.
The VIEW_MODE attribute in the scorecard URL determines whether you can integrate a read-only or editable view of
the scorecard with the external application:
view=objectonly
Displays a read-only view of the scorecard results.
view=objectrunonly
Displays scorecard results where you can run the scorecard and drill down on results.
view=full
Opens the scorecard results in the Analyst tool with full access.
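The attribute list above maps directly onto URL query parameters. As an illustrative sketch, the URL can be assembled in Python; the host, port, and ID values below are placeholders, not values from a real deployment.

```python
from urllib.parse import urlencode

def scorecard_url(host, port, project_id, scorecard_id, parent_path,
                  view_mode="objectonly", credential=None):
    """Build an Analyst tool scorecard results URL.

    view_mode is one of "objectonly" (read-only), "objectrunonly"
    (run and drill down), or "full" (full Analyst tool access).
    """
    params = {
        "mode": "scorecard",
        "project": project_id,
        "id": scorecard_id,
        "parentpath": parent_path,
        "view": view_mode,
    }
    if credential is not None:
        params["pcsfcred"] = credential
    base = "http://{0}:{1}/AnalystTool/com.informatica.at.AnalystTool/index.jsp".format(host, port)
    return base + "?" + urlencode(params)

# Placeholder values, not a real deployment:
url = scorecard_url("analyst-host", 8085, "PRJ1", "SC42", "/project1/folder1")
```

Note that urlencode percent-encodes the slashes in the parent path; browsers accept either the encoded or the literal form.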
Viewing a Scorecard in External Applications
You view a scorecard using the scorecard URL in external applications or web portals. Copy the scorecard URL from
the Analyst tool and add it to the source code of external applications or web portals.
1. Run a scorecard in the Analyst tool.
2. Copy the scorecard URL from the browser.
3. Verify that the URL matches the following format:
http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}
4. Add the URL to the source code of the external application or web portal.
Scorecard Lineage
Scorecard lineage shows the origin of the data, describes the path, and shows how the data flows for a metric or metric
group. You can use scorecard lineage to analyze the root cause of an unacceptable score variance in metrics or
metric groups. View the scorecard lineage in the Analyst tool.
Complete the following tasks to view scorecard lineage:
1. In Informatica Administrator, associate a Metadata Manager Service with the Analyst Service.
2. Select a project and export the scorecard objects in it to an XML file using the Export Resource File for Metadata
Manager option in the Developer tool or infacmd oie exportResources command.
3. In Metadata Manager, use the exported XML file to create a resource and load it.
Note: The name of the resource file that you create and load in Metadata Manager must use the following naming
convention: <MRS name>_<project name>. For more information about how to create and load a resource file,
see Informatica PowerCenter Metadata Manager User Guide.
4. In the Analyst tool, open the scorecard and select a metric or metric group.
5. View the scorecard lineage.
Viewing Scorecard Lineage in Informatica Analyst
You can view a scorecard lineage diagram for a metric or metric group. Before you can view a scorecard lineage diagram in the Analyst tool, you must load the scorecard lineage and metadata in Metadata Manager.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it.
The scorecard appears in a tab.
3. In the Scorecard view, select a metric or metric group.
4. Right-click and select Show Lineage.
The scorecard lineage diagram appears in a new window.
Important: If you do not create and load a resource in Metadata Manager with an exported XML file of the
scorecard objects, you might see an error message that the resource is not available in the catalog. For more
information about exporting an XML file for scorecard lineage, see Exporting a Resource File for Scorecard
Lineage on page 44.
Chapter 15
Exception Record Management
This chapter includes the following topics:
Exception Record Management Overview, 82
Exception Management Tasks, 84
Exception Record Management Overview
An exception is a record that contains unresolved data quality issues. The record may contain errors, or it may be an
unintended duplicate of another record. You can use the Analyst tool to review and edit exception records that are
identified by a mapping that contains an Exception transformation.
You can review and edit the output from an Exception transformation in the Analyst tool or in the Informatica Data
Director for Data Quality web application. You use Informatica Data Director for Data Quality when you are assigned a
task as part of a workflow.
You can use the Analyst tool to review the following exception types:
Bad records
You can edit records, delete records, tag them to be reprocessed by a mapping, or profile them to analyze the
quality of changes made to the records.
Duplicate records
You can consolidate clusters of similar records to a single master record. You can consolidate or remove
duplicate records, extract records to form new clusters, and profile duplicate records.
The Exception transformation creates a database table to store the bad or duplicate records. The Model repository
stores the data object associated with the table. The transformation also creates one or more tables for the metadata
associated with the bad or duplicate records.
To review and update the bad or duplicate records, import the database table to the staging database in the Analyst
tool. The Analyst tool uses the metadata tables in the database to identify the data quality issues in each record. You
do not use the data object in the Model repository to update the record data.
Exception Management Process Flow
The Exception transformation analyzes the output of other data quality transformations and creates tables that
contain records with different levels of data quality.
After the Exception transformation creates an exception table, you can use the Analyst tool or Informatica Data
Director for Data Quality to review and update the records in the table.
You can configure data quality transformations in a single mapping, or you can create mappings for different stages in
the process.
Use the Developer tool to perform the following tasks:
Create a mapping that generates score values for data quality issues
Use a Match transformation in cluster mode to generate score values for duplicate record exceptions.
Use a transformation that writes a business rule to generate score values for records that contain errors. For
example, you can define an IF/THEN rule in a Decision transformation. Use the rule to evaluate the output of
other data quality transformations.
Use an Exception transformation to analyze the record scores
Configure the Exception transformation to read the output of other transformations or to read a data object from
another mapping. Configure the transformation to write records to database tables based on score values in the
records.
Configure target data objects for good records or automatic consolidation records
Connect the Exception transformation output ports to the target data objects in the mapping.
Create the target data object for bad or duplicate records
Use the Generate bad records table or Generate duplicate record table option to create the database object
and add it to the mapping canvas. The Developer tool auto-connects the bad or duplicate record ports to the data
object.
Run the mapping
Run the mapping to process exceptions.
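The score-based routing that the Exception transformation applies can be modeled conceptually. The following Python sketch is illustrative only; the threshold values, field names, and three-way split are assumptions, not Informatica's implementation.

```python
def route_record(record, lower=0.4, upper=0.9):
    """Route a record to a target table by its data quality score.

    Assumed semantics with illustrative thresholds: scores at or
    above `upper` go to the good-records target, scores below
    `lower` are rejected, and scores in between go to the bad
    records table for manual review in the Analyst tool.
    """
    score = record["score"]
    if score >= upper:
        return "good"
    if score < lower:
        return "reject"
    return "bad"

# Route a few sample records into target buckets.
targets = {"good": [], "bad": [], "reject": []}
for rec in [{"id": 1, "score": 0.95},
            {"id": 2, "score": 0.20},
            {"id": 3, "score": 0.60}]:
    targets[route_record(rec)].append(rec["id"])
```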
Use the Analyst tool or Informatica Data Director for Data Quality to perform the following tasks:
Review the exception table data
You can use the Analyst tool or Informatica Data Director for Data Quality to review the bad or duplicate record
tables.
- Use the Analyst tool to import the exception records into a bad or duplicate record table. Open the imported table from the Model repository and work on the exception data.
- Use Informatica Data Director for Data Quality if you are assigned a task to review or correct exceptions as part of a Human task.
Note: The exception tables you create in the Exception transformation include columns that provide metadata to
Informatica Data Director for Data Quality. The columns are not used in the Analyst tool. When you import the
tables to the Analyst tool for exception data management, the Analyst tool hides the columns.
Reserved Column Names
When you create a bad record or consolidation table, the Analyst tool generates columns for use in its internal tables.
Do not import tables that use these names. If an imported table contains a column with the same name as one of the
generated columns, the Analyst tool will not process it.
Reserve the following column names for bad record or consolidation tables:
- checkStatus
- rowIdentifier
- acceptChanges
- recordGroup
- masterRecord
- matchScore
- any name beginning with DQA_
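If you script table preparation before import, you can check candidate column names against this reserved list. A minimal Python sketch; the helper name is hypothetical.

```python
# Column names the Analyst tool reserves for bad record or
# consolidation tables, per the list above.
RESERVED = {"checkStatus", "rowIdentifier", "acceptChanges",
            "recordGroup", "masterRecord", "matchScore"}

def conflicting_columns(columns):
    """Return the column names that conflict with the reserved
    names, including any name that begins with DQA_."""
    return sorted(c for c in columns
                  if c in RESERVED or c.startswith("DQA_"))

conflicts = conflicting_columns(["name", "matchScore", "DQA_flag"])
```

A non-empty result means the table should be renamed or restructured before import, because the Analyst tool will not process it.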
Exception Management Tasks
You can perform the following exception management tasks in the Analyst tool:
Manage bad records
Identify problem records and fix data quality issues.
Consolidate duplicate records
Merge groups of duplicate records into a single record.
View the audit trail
Review the changes made in the bad or duplicate record tables before writing the changes to the source
database.
Viewing and Editing Bad Records
Complete these steps to view and edit bad records:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a bad records table.
4. Optionally, use the menus to filter the table records. You can filter records by value in the following columns:
Priority, Quality Issue, Column, and Status.
5. Click Show to view the records that match the filter criteria.
6. Double-click a cell to edit the cell value.
7. Click Save to save the rows you updated.
Saving changes to a record is the first step in processing the record in the Analyst tool. After you save changes to a
record, you can update the record status to accept, reprocess, or reject the record.
Updating Bad Record Status
For each record that does not require further editing, perform one of the following actions:
Select one or more records by clicking the check box next to each record. Select all the records in the table by clicking
the check box at the top of the first column.
Note: The Analyst tool does not display records that you have taken action on.
Click Accept.
Indicates that the record is acceptable for use.
Click Reject.
Indicates that the record is not acceptable for use.
Click Reprocess.
Selects the record for reprocessing by a data quality mapping. Select this option when you are unsure if the record
is valid. Rerun the mapping with an updated business rule to recheck the record.
Viewing and Filtering Duplicate Record Clusters
Complete these steps to view and filter duplicate clusters:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a duplicate record table.
4. The first cluster in the table opens.
The Analyst tool also displays the number of clusters in the table. Click a number to move to a cluster.
5. Optionally, use the Filter option to filter the cluster list.
In the Filter Clusters dialog box, select a column and enter a filter string. The Analyst tool returns all clusters with
one or more records that contain the string in the column you select.
Editing Duplicate Record Clusters
Edit clusters to change how the Analyst tool consolidates potential duplicate records.
You can edit clusters in the following ways:
To remove a record from a cluster:
Clear the selection in the Cluster column to remove the record from the cluster. When you delete a record from a
cluster, the record assumes a unique cluster ID.
To create a new cluster from records in the current cluster:
Select a subset of records and click the Extract Cluster button. This action creates a new cluster ID for the
selected records.
To edit the record:
Select a record field to edit the data in that field.
To select the fields that populate the master record:
Click the selection arrow in a field to add its value to the corresponding field in the Final Record row. An arrow
indicates that the field provides data for the master record.
To specify a master record:
Click a cell in the Master column for a row to select that row as the master record.
Consolidating Duplicate Record Clusters
When you have processed a cluster, complete this step to consolidate the cluster records to a single record in the
staging database.
u In the cluster you processed, click the Consolidate Cluster button.
The Analyst tool performs the following updates on cluster records:
- In the staging database, the Analyst tool updates the master record with the contents of the Final record and sets the status to Updated.
- The Analyst tool sets the status of the other selected records to Consolidated.
- The Analyst tool sets the status of any cleared record to Reprocess.
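Conceptually, consolidation applies these status rules to each record in the cluster. This Python sketch models that behavior with assumed field names; it is not the Analyst tool's implementation.

```python
def consolidate_cluster(records, final_record, master_id, selected_ids):
    """Apply consolidation statuses to the records of one cluster.

    The master record takes the contents of the final record and the
    status Updated; other selected records become Consolidated;
    records cleared from the cluster are marked Reprocess.
    """
    for rec in records:
        if rec["id"] == master_id:
            rec.update(final_record)
            rec["status"] = "Updated"
        elif rec["id"] in selected_ids:
            rec["status"] = "Consolidated"
        else:
            rec["status"] = "Reprocess"
    return records

cluster = [{"id": 1, "name": "J Smith"},
           {"id": 2, "name": "John Smith"},
           {"id": 3, "name": "Jane Smyth"}]
result = consolidate_cluster(cluster, {"name": "John Smith"},
                             master_id=1, selected_ids={1, 2})
```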
Viewing the Audit Trail
The Analyst tool tracks changes to the exception record database in an audit trail.
Complete the following steps to view audit trail records:
1. Select the Audit Trail tab.
2. Set the filter options.
3. Click Show.
The following table describes record statuses for the audit trail.
Record Status Description
Updated Edited during bad record processing, or selected as the master record during consolidation.
Consolidated Consolidated to a master record during consolidation.
Rejected Rejected during bad record processing.
Accepted Accepted during bad record processing.
Reprocess Marked for reprocessing during bad record processing.
Rematch Removed from a cluster during consolidation.
Extracted Extracted from a cluster into a new cluster during consolidation.
Chapter 16
Reference Tables
This chapter includes the following topics:
Reference Tables Overview, 87
Reference Table Properties, 87
Create Reference Tables, 89
Create a Reference Table from Profile Data, 90
Create a Reference Table From a Flat File, 92
Create a Reference Table from a Database Table, 94
Copying a Reference Table in the Model Repository, 95
Reference Table Updates, 96
Audit Trail Events, 98
Rules and Guidelines for Reference Tables, 99
Reference Tables Overview
A reference table contains standard versions and alternative versions of a set of data values. You configure a
transformation to read a reference table to verify that source data values are accurate and correctly formatted.
Reference tables store metadata in the Model repository. Reference tables can store column data in the reference
data warehouse or in another database.
Use the Analyst tool to create and update reference tables. You can edit reference table data and metadata in the
Analyst tool.
Reference Table Properties
You can view and edit reference table properties in the Analyst tool. A reference table displays general properties that
describe the repository object and column properties that describe the column data.
To view the properties, open the reference table and select the Properties view.
To edit the properties, open the reference table and select the Edit Table option.
General Reference Table Properties
The general properties include information about the users who created and updated the reference table. The general
properties also identify the table location and the current valid column in the table.
The following table describes the general properties:
Property Description
Name Name of the reference table.
Description Optional description of the reference table.
Location Project that contains the reference table in the Model repository.
Valid Column Column that contains the valid reference data.
Created on Creation date for the reference table.
Created By User who created the reference table.
Last Modified Date of the most recent update to the reference table.
Last Modified By User who most recently edited the reference table.
Connection Name Connection name of the database that stores the reference table data.
Type Reference table type. The reference table can be managed or unmanaged. If the table is unmanaged, the property indicates if users can edit the table data.
Reference Table Column Properties
The column properties contain information about the column metadata.
The following table describes the column properties:
Property Description
Name Name of each column.
Data Type The datatype for the data in each column. You can select one of the following datatypes:
- bigint
- date/time
- decimal
- double
- integer
- string
You cannot select a double data type when you create an empty reference table or create a reference table from a flat file.
Precision Precision for each column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate. The precision values you configure depend on the data type.
Scale Scale for each column. Scale is the maximum number of digits that a column can accommodate to the right of the decimal point. Applies to decimal columns. The scale values you configure depend on the data type.
Description Optional description for each column.
Nullable Indicates if the column can contain null values.
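For example, a decimal column with precision 5 and scale 2 can hold at most three digits before the decimal point and two after. The following Python sketch, using the standard decimal module, illustrates that rule; the helper name is hypothetical.

```python
from decimal import Decimal

def fits(value, precision, scale):
    """Check whether a decimal value fits a column defined with the
    given precision (total digits) and scale (digits after the point)."""
    sign, digits, exponent = Decimal(value).as_tuple()
    integer_digits = len(digits) + exponent   # digits left of the point
    fraction_digits = -exponent               # digits right of the point
    return integer_digits <= precision - scale and fraction_digits <= scale

ok = fits("999.99", precision=5, scale=2)        # largest value that fits
too_wide = fits("1000.00", precision=5, scale=2) # needs 4 integer digits
```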
Create Reference Tables
Use the reference table editor, profile results, or a flat file to create reference tables. Create reference tables to share
reference data with developers in the Developer tool.
Use the following methods to create a reference table:
- Create a reference table in the reference table editor.
- Create a reference table from profile column data or profile pattern data.
- Create a reference table from flat file data.
- Create a reference table from data in another database table.
Creating a Reference Table in the Reference Table Editor
Use the New Reference Table Wizard and the reference table editor view to create a reference table. You use the
reference table editor to define the table structure and add data to the table.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Use the reference table editor.
4. Click Next.
5. Enter the table name, and optionally enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
6. For each column you want to include in the reference table, click the Add New Column icon and configure the
properties for each column.
Note: You can reorder or delete columns.
7. Optionally, enter an audit note for the table.
The audit note appears in the audit trail log.
8. Click Finish.
Create a Reference Table from Profile Data
You can use profile data to create reference tables that relate to the source data in the profile. Use the reference tables
to find different types of information in the source data.
You can use a profile to create or update a reference table in the following ways:
- Select a column in the profile and add it to a reference table.
- Browse a profile column and add a subset of the column data to a reference table.
- Select a column in the profile and add the pattern values for that column to a reference table.
Creating a Reference Table from Profile Columns
You can create a reference table from a profile column. You can add a profile column to an existing reference table.
The New Reference Table Wizard adds the column to the reference table.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a
reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
5. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project
or folder, preview the reference table data and click Next. Select the column to add and click Finish.
6. Click Next.
7. The column name appears by default as the table name. Optionally enter another table name, a description, and
default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Click Next.
9. In the Column Attributes panel, configure the column properties for the column.
10. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
11. Preview the column values in the Preview panel.
12. Click Next.
13. The column name appears as the table name by default. Optionally, enter another table name and a
description.
14. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
15. Optionally, enter an audit note.
16. Click Finish.
Creating a Reference Table from Column Values
You can create a reference table from the column values in a profile column. Select a column in a profile and select the
column values to add to a reference table or create a reference table to add the column values.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a
reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Values view, select the column values you want to add. Use the CONTROL or SHIFT keys to select
multiple values.
5. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
6. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project
or folder, preview the reference table data and click Next. Select the column to add and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally enter another table name, a description, and
default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a
description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.
Creating a Reference Table from Column Patterns
You can create a reference table from the column patterns in a profile column. Select a column in the profile and select
the pattern values to add to a reference table or create a reference table to add the pattern values.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a
reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Patterns view, select the column patterns you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
6. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project
or folder, preview the reference table data and click Next. Select the column to add and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally enter another table name, a description, and
default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a
description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables: panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.
Create a Reference Table From a Flat File
You can import reference data from a CSV file. Use the New Reference Table wizard to import the file data.
You must configure the properties for each flat file that you use to create a reference table.
Analyst Tool Flat File Properties
When you import a flat file as a reference table, you must configure the properties for each column in the file. The
options that you configure determine how the Analyst tool reads the data from the file.
The following table describes the properties you can configure when you import file data for a reference table:
Properties Description
Delimiters Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the escape character and the quote character if selected. You cannot select non-printing multibyte characters as delimiters.
Text Qualifier Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes. If you select a quote character, the wizard ignores delimiters within pairs of quotes.
Column Names Imports column names from the first line. Select this option if column names appear in the first row. The wizard uses data in the first row in the preview for column names. Default is not enabled.
Values Option to start value import from a line. Indicates the row number in the preview at which the wizard starts reading when it imports the file.
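These properties correspond to standard delimited-file parsing options. As an illustration (not the Analyst tool's parser), Python's csv module accepts the delimiter and text qualifier directly:

```python
import csv
import io

# Sample data: comma delimiter, double-quote text qualifier, and
# column names in the first line.
data = 'code,description\n"AL","Alabama"\n"NY","New York, NY"\n'

# delimiter and quotechar play the roles of the Delimiters and
# Text Qualifier properties; the qualifier lets a delimiter appear
# inside a quoted value without splitting the column.
reader = csv.reader(io.StringIO(data), delimiter=",", quotechar='"')
rows = list(reader)
header, values = rows[0], rows[1:]
```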
Creating a Reference Table from a Flat File
When you create a reference table from a flat file, the table uses the column structure of the file and imports the file data.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Import a flat file.
4. Click Next.
5. Click Browse to select the flat file.
6. Click Upload to upload the file to a directory in the Informatica services installation directory that the Analyst tool
can access.
7. Enter the table name. Optionally, enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Select a code page that matches the data in the flat file.
9. Preview the data in the Preview of file panel.
10. Click Next.
11. Configure the flat file properties.
12. In the Preview panel, click Show to update the preview.
13. Click Next.
14. On the Column Attributes panel, verify or edit the column properties for each column.
15. Optionally, create a description column for rows in the reference table. Enter the name and precision for the
column.
16. Optionally, enter an audit note for the table.
17. Click Finish.
Create a Reference Table from a Database Table
When you create a reference table from a database table, you connect to the database and import the table data.
Before you import reference table data from a database table, verify that the Informatica domain contains a
connection to the database. If the domain does not contain a connection to the database, you can define one in the
Analyst tool.
Use the Manage Connections options to define a database connection. You can find the Manage Connections
options on the Analyst tool header. The options also appear in the New Reference Table wizard.
Creating a Reference Table from a Database Table
To create the reference table, connect to a database and import the data.
1. In the Navigator, select the project or folder to store the reference table object.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Connect to a relational table.
4. Select Unmanaged Table to create a table that does not store data in the reference data warehouse.
To perform edit operations on an unmanaged reference table, select the Editable option.
Click Next.
5. Select the database connection from the list of established connections.
Click Next.
6. On the Tables panel, select a table.
7. Verify the table properties in the Properties panel.
Optionally, click Data Preview to view the table data.
Click Next.
8. On the Column Attributes panel, select the Valid column.
Optionally, specify an audit comment to write to the audit trail when a user updates the reference table.
If you create a managed reference table, you can perform the following actions on the Column Attributes
panel:
- Edit the reference table column names.
- Add a metadata column for row-level descriptions.
9. Click Next.
10. Enter a name and optionally a description for the reference table.
11. On the Folders panel, verify the project or folder to store the reference table.
The Reference Tables panel lists the reference tables in the folder that you select.
12. Click Finish.
Creating a Database Connection for a Reference Table
You can use the New Reference Table wizard to define a database connection. When you define the database
connection, you can select the table that contains the data values for the reference table.
1. In the Navigator, select the project or folder to store the reference table object.
2. Click Actions > New > Reference Table.
The New Reference Table wizard appears.
3. Select the option to Connect to a relational table.
4. Select Unmanaged Table if you want to create a table that does not store data in the reference data
warehouse.
If you want to perform edit operations on an unmanaged reference table, select the Editable option.
5. Click Next.
6. Click Manage Connections.
The Manage Connections window opens.
7. To create a database connection, click New.
Enter the connection properties for the database that contains the reference table data.
8. Click OK.
The database connection appears in the list of available connections.
Copying a Reference Table in the Model Repository
You can copy a reference table between folders in a Model repository project.
The reference table and the copy you create are not linked in the Model repository or in the database. When you create
a copy, you create a new database table.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Duplicate from the context menu.
3. In the Duplicate dialog box, select a folder to store the copy of the reference table.
4. Optionally, enter a new name for the copy of the reference table.
5. Click OK.
Reference Table Updates
The business data that a reference table contains can change over time. Review and update the data and metadata in
a reference table to verify that the table contains accurate information. You update reference tables in the Analyst tool.
You can update the data and metadata in a managed reference table and an unmanaged reference table.
You can perform the following operations on reference table data and metadata:
Manage columns
Use the Edit column properties dialog box to add columns, delete columns, and edit column properties.
Manage rows
Use the Add Row dialog box to add rows of data to a reference table. Use the Edit Row or Edit Multiple
Rows dialog box to update the row data.
Edit reference data values
You can edit a reference data value in the reference table view.
Replace data values
Use the Find and Replace option to replace data values that are no longer accurate or relevant to the
organization. You can find a value in a column and replace it with another value. You can replace all values in a
column with a single value.
Export a reference table
Export a reference table to a comma-separated values (CSV) file, dictionary file, or Excel file.
Enable or disable edits on an unmanaged table
Update an unmanaged reference table to enable or disable edits to table data and metadata.
Managing Columns
Use the Edit column properties dialog box to manage the columns in a reference table. You can also set table
properties in the Edit column properties dialog box.
1. In the Navigator, select the project or folder that contains the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Edit Table or click the Edit Table icon.
The Edit column properties dialog box appears. Use the dialog box options to perform the following
operations:
- Change the valid column in the table.
- Delete a column from the table.
- Change a column name.
- Update the descriptive text for a column.
- Update the editable status of the reference table.
- Update the audit note for the table. The audit note appears in the audit log for any action that you perform in the Edit column properties window.
4. When you complete the operations, click OK.
Managing Rows
You can add, edit, or delete rows in a reference table.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3. Edit the data rows. You can edit the data rows in the following ways:
- To add a row, select Actions > Add Row. In the Add Row window, enter a value for each column. Optionally, enter an audit note. Click OK to apply the changes.
- To edit a data value, double-click the value in the reference table and update the value. After you edit the data, use the row-level options to accept or reject the edit.
- To edit multiple rows, select the rows to edit and select Actions > Edit. In the Edit Multiple Rows window, enter a value for each column in the row. Optionally, enter an audit note. Click OK to apply the changes.
- To delete rows, select the rows to delete and click Actions > Delete. In the Delete Rows window, optionally enter an audit note. Click OK to delete the data.
Note: Use the Developer tool to edit row data in a large reference table. For example, if a reference table contains
more than 500 rows, edit the table in the Developer tool.
Finding and Replacing Values
You can find and replace data values in a reference table. Use the find and replace options when a table contains one
or more instances of a data value that you must update.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3. Click Actions > Find and Replace.
The Find and Replace toolbar appears.
4. Enter the search criteria on the toolbar:
- Enter a data value in the Find field.
- Select the columns to search. By default, the operation searches all columns.
- Enter a data value in the Replace With field.
5. Search the columns you select for the data value in the Find field.
Use the following options to replace values one by one or to replace all values:
- Use the Next and Previous options to find values one by one.
- To replace a value, select Replace.
- Use the Highlight All option to display all instances of the value.
- To replace all instances of the value, select Replace All.
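The replace options can be pictured as a short sketch. The row layout and the function below are illustrative only, not part of the Analyst tool or any Informatica API: the tool performs this work internally when you use the Find and Replace toolbar.

```python
# Illustrative sketch of find and replace over reference table rows.
# The data layout and function are hypothetical, not an Informatica API.

def find_and_replace(rows, find, replace_with, columns=None, replace_all=False):
    """Replace `find` with `replace_with` in the selected columns.

    If `columns` is None, search all columns (the default in the
    Analyst tool). If `replace_all` is False, replace only the first
    match, mimicking the one-by-one Replace option."""
    replaced = 0
    for row in rows:
        for col in (columns or row.keys()):
            if row.get(col) == find:
                row[col] = replace_with
                replaced += 1
                if not replace_all:
                    return replaced
    return replaced

rows = [
    {"code": "IE", "country": "Eire"},
    {"code": "IE", "country": "Eire"},
]
count = find_and_replace(rows, "Eire", "Ireland",
                         columns=["country"], replace_all=True)
# Both rows now read "Ireland"; count is 2.
```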
Exporting a Reference Table
Export a reference table to a comma-separated file, dictionary file, or Microsoft Excel file.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it. The table opens in the Reference Table tab.
3. Click Actions > Export Data.
The Export data to a file window appears.
4. Configure the following options:
- File Name. File name for the exported data.
- File Format. Format of the exported file. You can select the following formats:
  - csv. Comma-separated file.
  - xls. Microsoft Excel file.
  - dic. Informatica dictionary file.
- Export field names as first row. Select the option to indicate that the first row of the file contains the column names.
- Code Page. Code page of the reference data. The default code page is UTF-8.
5. Click OK to export the file.
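The effect of the CSV export options can be sketched with the standard Python csv module. The column names and rows below are hypothetical sample data; the sketch only illustrates what "Export field names as first row" and a UTF-8 code page mean for the output file.

```python
# Illustrative sketch of a CSV export with field names as the first
# row and a UTF-8 code page. The table data is hypothetical.
import csv
import io

columns = ["code", "country"]
rows = [["IE", "Ireland"], ["FR", "France"]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(columns)   # "Export field names as first row"
writer.writerows(rows)

# The Code Page option corresponds to the byte encoding of the file.
data = buf.getvalue().encode("utf-8")
lines = buf.getvalue().splitlines()
# lines[0] holds the column names; the data rows follow.
```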
Enable and Disable Edits to an Unmanaged Reference Table
You can configure an unmanaged reference table to enable or disable updates to data values and to columns in the
table.
Before you change the editable status of the reference table, save the table.
1. In the Navigator, select the project or folder that contains the reference table.
2. Click the reference table name to open it in a tab.
3. Click Actions > Edit Table or click the Edit Table icon.
The Edit column properties window appears.
4. Select or clear the Editable option.
When you change the editable status of the reference table, the properties dialog box closes.
Audit Trail Events
Use the Audit Trail view for a reference table to view audit trail log events.
The Analyst tool creates audit trail log events when you make a change to a reference table and enter an audit trail
note. Audit trail log events provide information about the reference tables that you manage.
You can configure query options on the Audit Trail tab to filter the log events that you view. You can specify filters on the date range, type, user name, and status. You can configure the following options when you view audit trail log events:
- Date. Start and end dates for the log events to search for. Use the calendar to choose dates.
- Type. Type of audit trail events. You can filter and view the following event types:
  - Data. Events related to data in the reference table. Events include creating, editing, deleting, and replacing all rows.
  - Metadata. Events related to reference table metadata. Events include creating reference tables, adding, deleting, and editing columns, and updating valid columns.
- User. User who edited the reference table and entered the audit trail comment. The Analyst tool generates the list of users from the Analyst tool users configured in the Administrator tool.
- Status. Status of the audit trail log events. Status corresponds to the action performed in the reference table editor.
Audit trail log events also include the audit trail comments and the column values that were inserted, updated, or
deleted.
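The filter options above amount to a query over log events by date range, type, user, and status. The sketch below is illustrative only: the event records are hypothetical and do not reflect the Analyst tool's internal log format.

```python
# Illustrative sketch of the audit trail query options: filter log
# events by date range, event type, user, and status. The event
# records are hypothetical, not the Analyst tool's internal format.
from datetime import date

events = [
    {"date": date(2013, 6, 1), "type": "Data", "user": "analyst1", "status": "Edit"},
    {"date": date(2013, 6, 5), "type": "Metadata", "user": "admin", "status": "Add column"},
    {"date": date(2013, 7, 1), "type": "Data", "user": "analyst1", "status": "Delete"},
]

def filter_events(events, start, end, event_type=None, user=None, status=None):
    """Return events in [start, end] that match the optional filters."""
    return [
        e for e in events
        if start <= e["date"] <= end
        and (event_type is None or e["type"] == event_type)
        and (user is None or e["user"] == user)
        and (status is None or e["status"] == status)
    ]

# Data events in June 2013, any user, any status.
june_data = filter_events(events, date(2013, 6, 1), date(2013, 6, 30),
                          event_type="Data")
```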
Viewing Audit Trail Events
View audit trail log events to get more information about changes made to a reference table.
1. In the Navigator, select the project or folder that contains the reference table that you want to view the audit trail
for.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click the Audit Trail view.
4. Configure the filter options.
5. Click Show.
The log events for the specified query options appear.
Rules and Guidelines for Reference Tables
Use the following rules and guidelines while working with reference tables in the Analyst tool:
- When you import a reference table from an Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, or Microsoft SQL Server database, the Analyst tool cannot display the preview if the table, view, schema, synonym, or column names contain mixed case or lowercase characters. To preview data in tables that reside in case-sensitive databases, set the Support Mixed Case Identifiers attribute to true. Set the attribute to true in the connections for Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, and Microsoft SQL Server databases in the Developer tool or Administrator tool.
- When you create a reference table from inferred column patterns in one format, the Analyst tool populates the reference table with column patterns in a different format. For example, when you create a reference table for the column pattern X(5), the Analyst tool displays the following format for the column pattern in the reference table: XXXXX.
- When you import an Oracle database table, verify the length of any VARCHAR2 column in the table. The Analyst tool cannot import an Oracle database table that contains a VARCHAR2 column with a length greater than 1000.
- To read a reference table, you need execute permissions on the connection to the database that stores the table data values. For example, if the reference data warehouse stores the data values, you need execute permissions on the connection to the reference data warehouse. You need execute permissions to access the reference table in read or write mode. The database connection permissions apply to all reference data in the database.
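One way to verify column lengths before an import is to scan the data for values that exceed the 1000-character limit noted in the guideline above. The helper below is a hypothetical pre-import check, not an Informatica utility; the sample rows are invented.

```python
# Illustrative pre-import check: flag reference data values longer
# than 1000 characters, the VARCHAR2 limit noted in the guidelines.
# The helper and sample rows are hypothetical, not an Informatica API.
LIMIT = 1000

def oversized_values(rows, limit=LIMIT):
    """Return (row_index, column, length) for each value over the limit."""
    problems = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if isinstance(value, str) and len(value) > limit:
                problems.append((i, col, len(value)))
    return problems

rows = [
    {"code": "IE", "note": "x" * 1200},  # too long to import
    {"code": "FR", "note": "ok"},
]
problems = oversized_values(rows)
# One problem value: row 0, column "note", 1200 characters.
```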
INDEX
A
Analyst tool
find and replace reference data values 97
C
column profile
drilldown 64
Informatica Developer 36
options 35
overview 34
process 55
column profile results
Informatica Developer 38
column properties
reference tables in Analyst tool 87
reference tables in Developer tool 47
creating a custom profile
profiles 56
creating a reference table from column patterns
reference tables 91
creating a reference table from column values
reference tables 91
creating a reference table from profile columns
reference tables 90
creating a reference table manually
reference tables 89
creating an expression rule
rules 69
D
data object profiles
creating a single profile 37
Developer tool
find and replace reference data values 52
E
export
scorecard lineage to XML 44
exporting a reference table
reference tables 98
expression rules
process 69
F
flat file properties
reference tables in Analyst tool 87
reference tables in Developer tool 47
flat files
synchronizing a flat file data object 59
I
importing a reference table
reference tables 93
Informatica Analyst
column profile results 60
column profiles overview 54
rules 67
Informatica Data Quality
overview 2
Informatica Developer
rules 41
M
managing columns
reference tables 96
managing rows
reference tables 97
mapping object
running a profile 45
Mapplet and Mapping Profiling
Overview 45
P
predefined rules
process 68
profile results
column patterns 62
column statistics 63
column values 62
drilling down 64
Excel 65
exporting 64
exporting from Informatica Analyst 65
exporting in Informatica Developer 40
summary 61
profiles
creating a custom profile 56
running 58
R
reference tables
column properties in Analyst tool 87
column properties in Developer tool 47
creating a reference table from column patterns 91
creating a reference table from column values 91
creating a reference table from profile columns 90
creating a reference table manually 89
exporting a reference table 98
finding and replacing values in the Analyst tool 97
finding and replacing values in the Developer tool 52
flat file properties in Analyst tool 87
flat file properties in Developer tool 47
importing a reference table 93
managed and unmanaged 7
managing columns 96
managing rows 97
viewing audit trail events 99
rules
applying a predefined rule 68
applying in Informatica Developer 42
creating an expression rule 69
creating in Informatica Developer 41
expression 69
overview 35
predefined 68
S
scorecard
configuring global notification settings 78
configuring notifications 78
viewing in external applications 80
scorecard integration
Informatica Analyst 79
scorecard lineage
viewing from Informatica Developer 44
viewing in Informatica Analyst 80
scorecards
adding columns to a scorecard 72
creating a metric group 75
defining thresholds 74
deleting a metric group 76
drilling down 76
editing 74
editing a metric group 75
Informatica Analyst 71
Informatica Analyst process 71
Informatica Developer 43
metric groups 75
metric weights 72
metrics 72
moving scores 75
notifications 77
overview 35
running 73
viewing 73
T
tables
synchronizing a relational data object 59
trend charts
viewing 77
V
viewing audit table events
reference tables 99