
Managing a Data Cleansing

Process for Materials or Services


Edition II
A practical guide to cleansing master data using
international data quality standards designed
for the military

ECCMA White paper


by
Peter R. Benson
2013-06-04

Managing a Data Cleansing Process for Materials or Services


Introduction
If you are reading this, it may be because you are evaluating a data cleansing project proposal, or because you are about to embark on a data cleansing project, either using in-house resources or contracting the project out to a service provider. You may also be evaluating a range of software applications designed to make the project easier. Of course, you could also be reading this because you are in the middle of a data cleansing project and nothing is going according to plan. The project is late, over budget or, worse, not delivering the quality of data you expected.
You may be staring into the abyss of a go-live date that at first appeared manageable but now appears impossible, with a process that is grinding to a halt as the number of problems and exceptions continues to grow. Even worse, you may have cut corners to meet your go-live date only to be faced with an ever-increasing number of complaints about the changes you have made, or not made, to the item names or descriptions in your ERP system. No one can find what they are looking for, and purchasing is filling in the gaps with free-text purchase orders. With end users and senior managers all pointing to the quality of the data as the root cause of the failure, why did you get involved, and how was it possible for such a simple project to go so horribly wrong?
The answer lies in the fact that data cleansing or cataloging looks deceptively simple; surely anyone can do it. In fact, data cleansing is like any other process: while it can be successfully accomplished on a very small scale by anyone with some common sense, it still takes a process and skill to be successful. Very much like cooking, there is an enormous difference between cooking for the family, cooking professionally for hundreds of guests, and designing, building and managing a food manufacturing process designed to consistently, efficiently and economically turn out millions of quality items.
Cooking is a good analogy: first you need a description of the end product, then you need the right ingredients, the right process and the right tools. Of course this is not all; it also takes experience and skill, if not a natural gift. As with most processes, specialization and industrialization have allowed companies to develop tools that increase the speed and reduce the cost of data cleansing, but when it comes to quality, in-house data cleansing will always deliver better quality data for the same reason that Machiavelli gave when he explained why a citizen army is always better than hired mercenaries: mercenaries fight for money while citizens fight to protect their homes and their families; the motivation is different.
Data cleansing (cleaning) or cataloging has come a very long way over the last ten years, and as the original author of the UNSPSC (United Nations Standard Products and Services Code) I am honored to have watched the industry grow, not only in size, but in sophistication. The purpose of this document is to provide an insight into the process of data cleansing, to make it easier to evaluate data cleansing proposals and to make it possible for you to manage a data cleansing project with confidence.

Show me the money
Regardless of where or how it is done, data cleansing costs money, and justifying the cost is the first step in any data cleansing project. The most common justification for data cleansing is cost avoidance or cost reduction through part standardization or supplier rationalization. These are realistic goals that can be estimated with a reasonable degree of accuracy, but they are frustratingly hard to sell to upper management.
Reducing costs, while clearly necessary and vital to profitability, is intrinsically hard to sustain over time simply because of the law of diminishing returns. Once the low-hanging fruit has been identified and harvested, an ever-increasing effort is required for an ever-decreasing yield. The challenge is to demonstrate that the sustained effort necessary to maintain quality data contributes to revenue and profit in a significant, measurable way, and this is hard to do on cost savings alone.
Master data plays a key role in almost every aspect of a business from identifying prospective customers
to making a sale, creating and delivering a product or service as well as paying suppliers, contractors and
employees, not forgetting calculating and paying management bonuses and shareholder dividends.
Rather than measuring savings in maintenance costs, it is simply more effective to focus on the potential for increased output and reduced unit cost through reduced downtime and improvements in production or, better still, to convert increased output and reduced unit cost into cash flow or return on capital. Focusing on the impact quality master data has on growth and profitability is far more attractive, and for good reason: the return on investment (ROI) increases over time.
In a steel plant I worked with, the IT and purchasing managers were trying to justify their data cleansing project based on savings in maintenance costs and improving the requisition-to-order process. Given the size of the plant and the quality of the data, the expected savings were significant both in terms of money and time, but they were unable to get the project approved by senior management. The reason was simple: a quick analysis of finished product cost showed that maintenance cost, including labor, represented a mere 0.15% of total finished product cost. It was understandable that it should not have been on management's high-priority list. But downtime was clearly on management's radar, as total output and unit production costs are directly and immediately impacted by downtime. The correlation between maintenance and downtime was clearly understood, so all it required was to point out that the predicted cost of improving data quality and the requisition-to-order process would be covered by reducing downtime by 2 minutes per day! With better data, perhaps we could find the root causes of the problems and reduce downtime by 5 minutes or even 15 minutes, which would represent an ROI of 750% (if 2 minutes per day pays for the project, 15 minutes returns seven and a half times its cost). With a full order book and senior managers struggling to increase output, restating the data cleansing project benefits in terms of potential increased output capacity made the project an immediate top priority. It also did something else: it created a clear understanding that maintaining data quality would be key to maintaining output, and it won acceptance of the need for sustained funding for the data quality program.

Fundamental principles:
Data is a critical asset to all organizations. Of the different types of data an organization holds, master data plays a key role, as the following definition makes clear.

master data
data held by an organization that describes the entities that are both independent and fundamental
for that organization, and that it needs to reference in order to perform its transactions
EXAMPLE: A credit card transaction is related to two entities that are represented by master data. The first is the issuing bank's credit card account that is identified by the credit card number, where the master data contains information required by the issuing bank about that specific account. The second is the accepting bank's merchant account that is identified by the merchant number, where the master data contains information required by the accepting bank about that specific merchant.
NOTE 1 Master data typically includes records that describe customers, products, employees, materials,
suppliers, services, shareholders, facilities, equipment, and rules and regulations.
NOTE 2 The determination of what is considered master data depends on the viewpoint of the
organization.
NOTE 3 The term "entity" is used in the general sense, not as used in information modeling.
[ISO 8000-110]

A master data record in an ERP system includes many different data elements controlled and managed by different business functions. Some are general or basic data elements and some are function-specific. Items that are stocked or inventoried will need minimum stock levels and reorder quantities, as well as lead times. Every item needs a name, and almost everything is going to need a price and a purchase order description, although it is regrettably common to see that, where this is mandatory, the item name is simply copied into the purchase order description field.
Data cleansing, as its name implies, is a process of transformation; it consists of taking one set of names
and descriptions and creating another set of names and descriptions.

Data cleansing is the process of improving the quality of the names and descriptions of an item, typically in an ERP application.
The data cleansing process can be broken down into two steps: in the first step, the original name and descriptions are deconstructed and then enriched to create a structured master data record; in the second step, that record is used to build new names and descriptions.
The process of building the structured master data record is called cataloging, and the process used to transform a structured master data record into descriptions is called rendering.
Both the cataloging and rendering processes are driven by rules. The rules for cataloging are contained in data requirements (DR), also known as cataloging templates or identification guides; these are the actual data quality standards. The rules for rendering are contained in rendering guides (RG), or description rules.
When the process we know today as data cleansing was originally being developed, contractors would
perform the transformation from the original item names and descriptions to the new item names and
descriptions without providing their customers with copies of the cataloging templates, the structured
master data, or the rules used for creating the new names and descriptions. This is rarely the case today
as most customers realize that without the rules and the structured master data, they can never be
independent of the contractor.

ISO 22745 was designed as an international standard to ensure that the data needed to cleanse master data could be preserved independently of any software application and easily exchanged between data cleansing applications or services, encouraging the competition that has resulted in better quality at a lower cost.
ISO/TS 22745-10 is the international standard for representing an open technical dictionary.
ISO/TS 22745-20 is the international standard for the maintenance of an open technical dictionary.
ISO/TS 22745-30 is the international standard for representing computer-processable data requirements using an open technical dictionary.
ISO/TS 22745-35 is the international standard for requesting data that meets specified data requirements.
ISO/TS 22745-40 is the international standard for the exchange of structured master data using an open technical dictionary.
ISO/WD 22745-45 is the international standard for representing computer-processable rendering guides using an open technical dictionary.
ISO 8000 was designed as a standard to be used specifically for contracting for quality master data.

Cataloging

The first step in the journey to better descriptions is cataloging. This is the process of describing something, anything, and it is indeed an ancient art. Aristotle was struggling with descriptions over 2,350 years ago when he wrote Categories (http://classics.mit.edu/Aristotle/categories.html). Cataloging identifies the discrete characteristics of something in the form of property-value pairs, where the property provides the meaning of the value. Examples of properties are height, weight and material. Properties are used to represent characteristics, and the first rule of cataloging is that the properties must be explicitly defined; this is typically done using a dictionary. While properties are used to define the meaning of values, it is often useful to further define values in terms of their data types; a date, a measure, a numeric value and a text string are all examples of data types.
Many items may be described using the same properties with different values. For example, you can describe many individuals using the properties of name, date of birth and place of birth; the properties remain the same, only the values change. A group of items that can be described using the same properties is called a class. A class name is therefore nothing more than the name for a group of properties; typically it will be used in naming the item, and it will also be used in its descriptions.
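To make the property-value model concrete, the following is a minimal sketch in Python. The class BOLT and its properties are taken from an example later in this paper; the values are illustrative assumptions, not entries from any standard dictionary.

    # A minimal sketch of a structured master data record as property-value
    # pairs. The class is itself a property; every other characteristic is a
    # property that gives meaning to its value.
    record = {
        "CLASS": "BOLT",
        "characteristic_data": {
            "TYPE": "HEX HEAD",                  # property -> value
            "MATERIAL": "STEEL COMP 316",
            "LENGTH": (50, "MM"),                # a measure: value plus unit
        },
    }

Another bolt would be described with the same properties and different values; sharing the same set of properties is what makes both items members of the class BOLT.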

Although it is possible to have several data requirements for a single class, this is not common, and users often use the class name as the name of the data requirement; this is a common source of confusion. While we are on the subject of sources of confusion, if you look back at the example of a structured master data record, you will see that the class is included in the list of properties. Yes, the class is indeed a property, and the class name of a material should not be confused with the class name of a classification.

Data quality
Lex parsimoniae, the principle of the law of parsimony that has become known as Ockham's razor, applies to cataloging: the fewer properties you need to uniquely describe an item, the better.
Choosing the right data requirement is the key to successful cataloging. What you are looking for is just enough data to be useful. The definition of data quality is the degree to which the data complies with the data requirement. You need to set the bar high enough to achieve your goals but no higher, as anything more will incur unnecessary additional costs.
Data that exceeds the data requirement is not better data; it is just more expensive data.
The best way to define your initial data requirements is to perform a scoping study. Performed correctly, a scoping study can identify your initial dictionary, your initial data requirements and your initial description rules. The emphasis is on "initial" because your dictionary, data requirements and description rules will evolve over time as you become more familiar with the role data plays in your company.
Duplicate and substitutable items
Your initial data requirements will need to be set to allow you to identify duplicate and substitutable
items. The concepts of duplicate and substitutable are different.
Duplication applies to items of production. Duplicate items are created when a single item is given multiple numbers by a manufacturer or supplier, or when items are manufactured to a standard specification. Most buyers cringe at the very thought that a manufacturer or supplier should use different part numbers for the same item, but they do. The reason some suppliers advertise that you will not find the same item at a lower price elsewhere is that they know the manufacturer issued the part number specifically for them, and the exact same item has a different part number when it is sold through a different supplier. If this were not bad enough, part numbers and model numbers have become brands in their own right, so a manufacturer may keep the part number or model number while making what, in their opinion, is a small, insignificant change in features or design. To you, these features or design changes may be important. If you have ever ordered a replacement part only to find it no longer fits, you will understand the nature of the problem and why it is always safer to order something using a full specification.
Substitution applies to items of supply. Substitutable items are created when several items are given a single number by a buyer. Identifying substitutable items is very important to buyers; it is the primary method of reducing price and risk by leveraging competition.
The true skill of a cataloger lies in their ability to understand and identify substitutable items.
True duplicate items are easier to identify and safer to group under a single material record, but as we have seen, the part number alone is not a reliable indication of duplication. Duplication should always be determined by comparing characteristic data.

In the following structured master data record both the characteristic and the identification data are shown:
[Figure: example structured master data record showing characteristic and identification data.]

Items that share the same characteristics are substitutable, and you can merge the records by grouping all the identification data under a single material; however, considerable care needs to be taken in determining which items are duplicates or substitutable. The characteristic data for each item needs to be considered carefully, as well as which characteristics you consider to be fundamental. Removing characteristics will cause materials to become substitutable, and adding characteristics will cause materials that were substitutable to become different.
The skill in cataloging is to be able to use both the characteristic and the identification data appropriately. If you are trying to buy one tire, you should use the part number (identification data), but if you are requesting quotes for hundreds of tires, you should use the characteristic data.
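As a rough illustration of comparing characteristic data, here is a sketch in Python. The grouping rule (exact equality of all characteristic property-value pairs) is a deliberate simplification; in practice a cataloger also weighs which characteristics are fundamental, as discussed above.

    # Sketch: group records that share identical characteristic data. Such
    # records are candidates for merging as substitutable items; combined
    # with matching item references they point to true duplicates.
    def characteristics(record):
        """Return the characteristic data as an order-independent key."""
        return frozenset(record["characteristic_data"].items())

    def merge_candidates(records):
        """Group record ids that share identical characteristic data."""
        groups = {}
        for rec in records:
            groups.setdefault(characteristics(rec), []).append(rec["id"])
        return [ids for ids in groups.values() if len(ids) > 1]

    items = [
        {"id": "MAT-001", "characteristic_data": {"TYPE": "HEX HEAD", "LENGTH": (50, "MM")}},
        {"id": "MAT-002", "characteristic_data": {"LENGTH": (50, "MM"), "TYPE": "HEX HEAD"}},
    ]
    print(merge_candidates(items))   # -> [['MAT-001', 'MAT-002']]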

External identification data is copyright
The types of identification data can include a NATO Stock Number (NSN) issued by a NATO codification bureau, a Supplier Part Number (SPN), a Manufacturer Part Number (MPN) or a Standard Reference Number (SRN). Other types not shown include a Drawing Reference Number (DRN) or a Global Trade Item Number (GTIN) issued by GS1, an international association of retail item numbering organizations. Finally, it is also common to include the Buyer Material Numbers (BMN) or Stock Keeping Unit numbers (SKU) that are typically issued by other buyers. This is common when there are multiple business units within a group or when a group of companies shares a common catalog.
Your passport number, your vehicle identification number, your vehicle registration number, your club membership number, your telephone number, a taxpayer identifier: these are all familiar identifiers. Just as every master data record you create will have an internal identifier, external identifiers are typically the internal identifiers of other organizations.
An identifier is created by an author; all identifiers are copyright, they are the legal property of their author, and the author is the authoritative source of the characteristic data that was used to assign the identifier. Taking this into consideration, it follows that you should never use an external identifier as your internal identifier, and you must exercise great care in how you use external identifiers. Including external identifiers in your master data is an acceptable use, as is using external identifiers as internal search parameters, but you must be careful to clearly identify the source of any characteristic data or other external identifiers that are retrieved as the result of a search using an external identifier.
This is often hard to follow, and an example helps. A D-U-N-S Number issued by Dun and Bradstreet (D&B) is essentially a proprietary product number; it identifies a collection of data that belongs to D&B. While you can store the D-U-N-S Numbers that were assigned to your trading partners by D&B, you should only use these numbers to buy data from D&B. What you cannot do without a license is use a D-U-N-S Number to look up or distribute data that did not come from D&B. When you think about it, this is eminently reasonable; imagine a third party selling credit data or address verification data using your internal customer or vendor identifier. While it is acceptable to include D-U-N-S Numbers as organization identifiers in your vendor or customer master data and to display this number in reports, you should not allow the use of the D-U-N-S Number as a lookup field, even within your organization, if you do not have a license to do so (to my knowledge only one organization, the federal government, has been granted a license to use the D-U-N-S Number for public search of data that does not belong to D&B).

The data cleansing building blocks

ISO 8000 defines quality master data as "portable data that meets requirements."
This idea of standardized cataloging originated in 1950 with the development of the NATO Codification System (NCS) and is at the heart of all cataloging and data cleansing today. The principle is very simple: the master data must conform to a specified data requirement, and both the master data and the data requirements are coded using a common dictionary; this is what makes the master data portable.
Structured master data
The structured master data is the key to the data cleansing process; it is composed of identification data and characteristic data. The identification data can be a manufacturer's model number, a supplier's part number or even a drawing number or a standards reference number. What is important to remember is that identification data are third-party identifiers controlled by third parties, so knowing who assigned the identifier is as important as the identifier itself. An item identifier combined with the identifier of the organization that issued the item identifier is called an item reference.
Characteristic data is data that describes the physical characteristics or performance characteristics of an item. NATO refers to this as the fit, form and function of an item.

[Figure: a structured master data record is composed of identification data and characteristic data.]

Cataloging is the process of creating a structured master data record that conforms to a data requirement.
Both identification and characteristic data are represented in the form of property-value pairs, where the property gives meaning to the value. The definitions of the properties, as well as any other concepts used in cataloging, are contained in the dictionary.
Data requirements
Data requirements are the data quality standards against which the structured master data is
measured.
The data requirements are the cataloging rules; they define what data elements are required or optional. Comparing the structured master data against the data requirements is how you identify the missing data. As we will see, data requirements play an important role in data acquisition when they are used to create requests for missing data or for data validation.
In advanced data requirements you can also assign a data type to a property, as well as validation rules such as a limited list of codes, a specific unit of measure or a numeric range. You can also define a mask, for example if you want a string to be uppercase, a fixed number of digits after the decimal point or a specific date format. These advanced features of data requirements are typically used in designing data capture systems.
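As a minimal sketch of how a data requirement can drive validation, the following Python fragment checks a record for missing mandatory properties, data types and units of measure. The property names are taken from the DR1 example below; the rule encoding itself is an illustrative assumption, not the ISO 22745-30 representation.

    # Sketch: validate a structured master data record against a simple
    # data requirement. Each property carries an obligation, a data type
    # and, for measures, an expected unit of measure.
    DATA_REQUIREMENT = {
        "THREAD CLASS":  {"obligation": "mandatory", "type": str},
        "BODY MATERIAL": {"obligation": "mandatory", "type": str},
        "PIPE SIZE":     {"obligation": "mandatory", "type": float, "uom": "INCH"},
        "MAX PRESSURE":  {"obligation": "mandatory", "type": float, "uom": "PSI"},
    }

    def validate(record):
        """Return a list of data quality findings for the record."""
        findings = []
        for prop, rule in DATA_REQUIREMENT.items():
            if prop not in record:
                if rule["obligation"] == "mandatory":
                    findings.append(f"missing mandatory property: {prop}")
                continue
            value = record[prop]
            if "uom" in rule:
                value, uom = value                   # a measure is (value, unit)
                if uom != rule["uom"]:
                    findings.append(f"{prop}: expected {rule['uom']}, got {uom}")
            if not isinstance(value, rule["type"]):
                findings.append(f"{prop}: expected a {rule['type'].__name__}")
        return findings

    print(validate({"BODY MATERIAL": "STEEL COMP 316",
                    "PIPE SIZE": (0.25, "INCH")}))
    # -> ['missing mandatory property: THREAD CLASS',
    #     'missing mandatory property: MAX PRESSURE']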
Items that are described using the same characteristics are said to be members of a class. For example, all the bolts in your structured master data could be described using the same properties of TYPE, MATERIAL and LENGTH; if this were the case, all the materials would belong to the class BOLT. It is possible for a class to have several data requirements, for example one for engineering that documents the data needed by engineering and one for procurement that documents the data needed for purchasing, but in practice most companies create a single data requirement and name the data requirement after the class. This is actually not a good practice, and I recommend that data requirements simply be identified using a reference number. The following is an example of two data requirements:
Data Requirement reference  DR1                      DR2
Date updated                2013-05-21               2013-05-21
Class name                  VALVE,BALL               BEARING

Characteristic data
Property 1  (mandatory)     THREAD CLASS             TYPE
Property 2  (mandatory)     BODY MATERIAL            INSIDE DIAMETER
Property 3  (mandatory)     PIPE SIZE                OUTSIDE DIAMETER
Property 4  (mandatory)     CONNECTION STYLE         WIDTH
Property 5  (mandatory)     MAX PRESSURE             LOAD CAPACITY
Property 6  (optional)                               SPEED RATING
Property 7  (optional)                               SEALING METHOD

Identification data
Property 11 (mandatory)     MANUFACTURER REFERENCE   MANUFACTURER REFERENCE
                            (PREFERRED)              (PREFERRED)
Property 12 (optional)      NATO STOCK NUMBER


The dictionary
The only difference between an open technical dictionary and the dictionary you used at school is that
the concepts defined in the dictionary are given a concept identifier; this makes it easier to link the
concepts used in your structured master data and your data requirements to the definitions in your
dictionary.
Replacing concept names with their identifiers is called concept encoding, and its purpose is to make sure your data is unambiguous. Computers love encoded data; the identifiers are shorter than words and much faster to process, but of course, in order for humans to make sense of encoded data, it needs to be decoded.
The role of the dictionary is to encode data and to decode data. One of the significant benefits of using a dictionary and concept encoding is that, by using a multilingual dictionary, it is possible to encode using one language and decode using another. In practice this means that it is possible to catalog in English, for example, and render descriptions in French, automatically. This process has been used successfully by companies creating multilingual ERP descriptions, with one company successfully using items cataloged in English to create item names and descriptions in twenty-nine languages.
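The following Python sketch illustrates concept encoding and multilingual decoding with a two-entry dictionary. The abbreviated concept identifiers are taken from the example dictionary below; the French terms are illustrative translations, not entries from the eOTD.

    # Sketch: encode property names as concept identifiers, then decode
    # them back into terms in any language the dictionary supports.
    DICTIONARY = {
        "02-005366": {"EN": "INSIDE DIAMETER", "FR": "DIAMETRE INTERIEUR"},
        "02-010188": {"EN": "WIDTH",           "FR": "LARGEUR"},
    }
    TERM_TO_ID = {entry["EN"]: cid for cid, entry in DICTIONARY.items()}

    def encode(record):
        """Replace English property names with concept identifiers."""
        return {TERM_TO_ID[prop]: value for prop, value in record.items()}

    def decode(encoded, language):
        """Replace concept identifiers with terms in the chosen language."""
        return {DICTIONARY[cid][language]: value for cid, value in encoded.items()}

    encoded = encode({"INSIDE DIAMETER": "25 MM", "WIDTH": "10 MM"})
    print(decode(encoded, "FR"))   # cataloged in English, decoded in French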

You can build a dictionary using a spreadsheet with columns for concept identifier, term and definition. ISO 22745-10 is the international standard for representing an open technical dictionary; it is a very useful model for dictionaries used to render complex, defined-length names and descriptions or multilingual names and descriptions.
The following example dictionary was created as a subset of the ECCMA Open Technical Dictionary (eOTD). The concept identifiers have been abbreviated by removing the leading 0161-1# and the trailing #1, which are required constants when exchanging standard-compliant concept identifiers. In the eOTD not all the terms are in capitals, nor are all the definitions in mixed case; these were converted to make the dictionary easier to read.
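Since the full identifiers must be restored when exchanging data, here is a one-line sketch of the expansion; the prefix and suffix are the constants named above.

    # Sketch: restore the standard-compliant form of an abbreviated
    # eOTD concept identifier.
    def expand(abbreviated: str) -> str:
        return f"0161-1#{abbreviated}#1"

    assert expand("01-087529") == "0161-1#01-087529#1"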
Every company must maintain its own dictionary; creating it as a subset of the eOTD simply makes the task easier.

Concept identifier  Type      Term (Abbrev)                               Definition
01-1142515          Class     BEARING                                     A device that supports and allows a rotating shaft to run without damage by reducing friction
01-1145956          Class     VALVE,BALL                                  A device utilizing a ball of varying configurations connected to a shaft which rotates to control or block flow in a pipe
02-095207           Property  TYPE                                        A generic type, or configuration of the object
02-014725           Property  BODY MATERIAL                               The chemical compound or mechanical mixture properties of which the body is fabricated
02-005366           Property  INSIDE DIAMETER (ID)                        The length of a straight line which passes through the center of an item, and terminates at the inside circumference
02-006986           Property  OUTSIDE DIAMETER (OD)                       The length of a straight line which passes through the center of a circular figure or body, and terminates at the outside circumference
02-010188           Property  WIDTH                                       A measurement of the shortest dimension of the item, in distinction from length
02-016927           Property  LOAD CAPACITY (LC)                          The weight the item can accommodate
02-101753           Property  SPEED RATING (SR)                           The maximum safe operating speed or rotational speed (rpm)
02-019192           Property  SEALING METHOD                              The means by which the item is sealed
02-128590           Property  MANUFACTURER REFERENCE (PREFERRED) (MR(P))  A preferred reference consisting of the manufacturer name and manufacturer assigned part number
02-128594           Property  NATO STOCK NUMBER (NSN)                     A number issued by a NATO codification bureau identifying an item of supply
02-128591           Property  SUPPLIER REFERENCE (SR)                     A reference consisting of a supplier name and supplier assigned part number
02-024128           Property  THREAD CLASS                                A numeric-alpha designator indicating the pitch-diameter tolerance and the external or internal location of the thread
02-007268           Property  PIPE SIZE                                   Designates the size of the pipe
02-024592           Property  CONNECTION STYLE                            The style designation indicating the configuration that most nearly corresponds to the appearance of the connection
02-093392           Property  MAX PRESSURE                                The maximum operating pressure that the packing is designed to withstand
05-003934           UOM       INCH (")                                    A unit of linear measure equal to one twelfth of a foot (2.54 cm)
07-000255           VALUE     STEEL COMP 316 (SS)                         See industry standard

Rendering guides
Rendering guides are a recent addition to a cataloger's tool kit; they are covered in ISO/WD 22745-45, the most recent addition to the cataloging standards. A rendering guide is an extension of the data requirement; it specifies the sequence of properties that should be used in a name or description, as well as how the property-value pairs of the characteristic and identification data should be represented in the name or description.
Most typically there are generic rules that apply to all names or descriptions, followed by rules that apply to one or more classes, and finally there may be a rule that applies to a specific item. The following is an example of general rendering rules followed by two rendering guides for the same item class: one for the item name and the second for the purchase order description.
GENERAL RULES (SEPARATORS)
Class name - characteristic data            ": "
Property-value pairs                        ", "
Property name - property value              "="
Characteristic data - identification data   "; "

SPECIFIC DESCRIPTION RULES
The rule specifies the order of the properties (P1..Pn) and whether the property name (N) is to be suppressed (S) or abbreviated (A), and whether the property value (V) is to be abbreviated (A).

Description rule reference     RG1                         RG2
Based on data requirement ref  DR1                         DR1
Description type               ITEM NAME                   PURCHASE ORDER
Item class                     VALVE,BALL                  VALVE,BALL
Rule                           CN: P2NSVA, P3NSVA, P5NSV   CN: P3=V, P5=V, P2=V; P11=V
Example                        VALVE,BALL: SS, , 2500PSI   VALVE,BALL: SIZE=1/4 INCH,
                                                           MAX PRESSURE=2500PSI, BODY
                                                           MATERIAL=STEEL COMP 316;
                                                           MANUFACTURER REFERENCE
                                                           (PREFERRED)=PARKER:4ZMB4LPFA-SSP
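To show how such a guide drives rendering, here is a Python sketch that reproduces the RG1 item name above. The rule encoding (a list of property, suppress-name and abbreviate-value flags) is an illustrative simplification of the guide notation; the value abbreviation (STEEL COMP 316 to SS) comes from the dictionary.

    # Sketch: render an item name from characteristic data using the
    # general separator rules and a simplified encoding of RG1.
    ABBREV = {"STEEL COMP 316": "SS"}       # dictionary-supplied abbreviations

    RG1 = [("BODY MATERIAL", True, True),   # P2: name suppressed, value abbreviated
           ("PIPE SIZE",     True, True),   # P3: name suppressed, value abbreviated
           ("MAX PRESSURE",  True, False)]  # P5: name suppressed, value as-is

    def render(class_name, data, guide):
        parts = []
        for prop, suppress_name, abbrev_value in guide:
            value = data.get(prop, "")      # a missing value renders as empty
            if abbrev_value:
                value = ABBREV.get(value, value)
            parts.append(value if suppress_name else f"{prop}={value}")
        return f"{class_name}: " + ", ".join(parts)

    item = {"BODY MATERIAL": "STEEL COMP 316", "MAX PRESSURE": "2500PSI"}
    print(render("VALVE,BALL", item, RG1))  # -> VALVE,BALL: SS, , 2500PSI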

The following is an example of a structured master data record containing descriptions rendered from characteristic and identification data.
[Figure: example structured master data record with rendered descriptions.]

Classifications
The key to classifications is to understand that they are derived from the characteristic data. If a classification is provided without the characteristic data from which the classification was derived, the classification cannot be verified.
All classifications are designed for a specific purpose, so it is common to need more than one classification. Some will be internal classifications, for example for spend analysis, and some will be external, third-party managed classifications.

[Figure: examples of externally managed and internally managed classifications.]

Some classifications can be derived from the item class, and many third-party classifications can be automatically assigned using one of the many eOTD commercial classification lookup tables maintained by ECCMA. In some cases the class is not sufficient to determine the classification, and another property must be used in conjunction with, or instead of, the class. An example of this is the customs tariff code (HTS), where the material typically determines the classification.
If the classification changes, these lookup tables or other classification rules will need to be reapplied to update the classifications. For this reason it is not recommended to maintain classifications in a master data record unless they are regularly used in search or reporting functions.
While I am the original author of the UNSPSC, responsible not only for its name and design rules but also for the process used to create and maintain it, perhaps it is only fitting that I also recognize its weakness, and in doing so the weakness of all classifications. As the name implies, a classification is an organization of classes. In a hierarchical classification, classes are grouped into superclasses, themselves grouped into further superclasses. These groups of classes are called the nodes of a classification, and they are also given names. Hierarchical classifications are referred to as tree structures, with root or leaf nodes or with parent and child nodes.
Once we start to define classes as collections of characteristics, it follows that a superclass should include the characteristics of all of its subclasses, and that a subclass should not contain a characteristic that was not inherited from its parent class. When you apply this logic, many of the hierarchical classifications used in procurement and spend analysis, such as UNSPSC, eClass, CPV or NIGP, start to break down.
The UNSPSC was designed as a standard spend analysis classification specifically for the purchasing card industry. The plan was to encourage merchants to adopt Level III credit card processing, in which each line item has a description and a supplier-assigned UNSPSC commodity classification. The objective was to be able to provide better accounting for what was being bought with a company purchasing card, and to actually decline the card at the point of sale when the corporate accounts department flagged a UNSPSC code as declined for a specific individual or group of individuals. In theory at least, this would solve the problem of the high number of refusals caused by using the much more generic merchant classification (which is what is still used today). Beyond the merchants' unwillingness to pay the extra cost required to implement Level III credit card transactions, the concept was flawed in that it relied on the seller classifying what they sold. Over and above the challenge of suppliers using different versions of the classification, we quickly found that suppliers wanted to classify what they were selling in as many UNSPSC classifications as possible, even going to the extent of giving the same product many different names and bar codes in order to be listed under the different classifications if only one was allowed per product. We also found that the codes used by sellers to classify their products were not the codes the buyers wanted the items to be classified under. Largely abandoned for the purpose of managing purchasing card transactions, the UNSPSC then became used by suppliers as a catalog classification.
Luckily, just as we were developing the UNSPSC and others were developing yet more material or service classifications, high-speed, high-relevance text search was coming into its own. Today, these third-party managed classifications are of limited value, and most companies realize that they need to manage multiple classifications, not only to satisfy their customers but their own internal requirements. More important, a classification can never replace a good description, and buyers now realize that they need to obtain the characteristic data from their suppliers so that they can name, describe and classify items in whatever manner suits their internal operational requirements.

Rendering names and descriptions
Descriptions are created (rendered) from both the characteristic and identification data in the
structured master data record.

The difference between names and descriptions that were manually entered and those that were automatically rendered can be observed in the consistency of the terminology and the formatting of the names and descriptions. It is possible for manually entered data to be consistent, particularly in a small, well-disciplined group with minimal turnover, but this is rarely the case, and a quick glance at most material master data will identify inconsistencies in the use of terminology, as well as in the formatting of names and descriptions. It is very expensive to build a search engine that is tolerant of a lack of consistency in names and descriptions; a simple space or a change in a character can cause items to be missed. To a computer, ORING, O RING, O-RING, O'RING and O/RING are very different.
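A small Python sketch makes the point; the naive normalization shown here maps the variants together, though real cleansing relies on controlled terminology and rendering rules rather than ad hoc normalization.

    # Sketch: to a computer these strings are five different keys; a naive
    # normalization (uppercase, strip non-alphanumerics) collapses them.
    import re

    def normalize(term: str) -> str:
        return re.sub(r"[^A-Z0-9]", "", term.upper())

    variants = ["ORING", "O RING", "O-RING", "O'RING", "O/RING"]
    print(len(set(variants)), len({normalize(v) for v in variants}))  # -> 5 1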
Automating the process of creating names and descriptions through the application of rendering guides
not only creates consistency in the use of terminology and formats but also consistency across the
names and descriptions of all the items that belong to the same class. Automating this process also
allows changes to be made to a large number of items very quickly.
Item names and descriptions need not be static; in fact, the ability to change them as required is one of the more useful features of an item master. What does not change is the material number.
Item names and descriptions must be useful, and both requirements and rendering preferences change over time, so it is important to be able to respond to requests to change item names or descriptions. It is important to create a culture and a process where users can recommend changes and be confident that they will be acted upon; the alternative is that users will find a workaround, typically adding another item record for the same item but with the name or description they asked for. The following is a workflow for resolving issues that arise when a name or description is not acceptable.

Workflow: a name or description is not acceptable
1. Is all the required data in the description?
   Yes: change the description rules, then create the new name or description.
   No: go to step 2.
2. Is the required data in the master data record?
   Yes: change the description rules, then create the new name or description.
   No: go to step 3.
3. Does the data requirement include the property or coded value?
   Yes: add the data to the master data record, then create the new name or description.
   No: go to step 4.
4. Is the property or coded value in the dictionary?
   Yes: add the property or coded value to the data requirement, add the data to the master data record, then create the new name or description.
   No: add the property or coded value to the dictionary first, then to the data requirement, add the data to the master data record, and create the new name or description.
The result in every case is a new name and description.

Data cleansing workbook:
The data requirement (cataloging template) is the data quality standard that drives the data cleansing process; the terminology used in the data requirements must be defined in a dictionary; and finally there are the rendering guides. In addition to these three resources you will need an organization lookup table that lists the names of the organizations used in the identification data to form the item references, as well as in both the characteristic and identification data to identify the source of the data (provenance). While it is possible to include organization data in the dictionary, it is easier to manage this data separately in an organization table.
The following are the tables that you will need in your data cleansing workbook:
1. Data Requirements
2. Dictionary
3. Rendering Guides
4. Organization identifiers
A data cleansing workbook (Data Cleansing Workbook.xlsx) is available from www.ECCMA.org.

The scoping study


A scoping study is a critical part of any data cleansing project. The objective of the study is to provide an analysis sufficient to define the quantity and quality of the source data, as well as a framework for measuring the quantity and quality of the data to be delivered and the anticipated level of effort required to perform the data cleansing task. Undertaking a data cleansing project without a scoping study is like building a house without a plan. It can be done, but the result is rarely predictable even if the frustration is.

The source data for the scoping study is the material or service master, as well as the vendor master and preferably three years of purchase order line-item detail.
The purchase order data is used to identify the active materials and vendors and to provide an estimate of the level of duplication. The analysis will also identify the all-important free-text transactions: items where the material or service master reference is missing.
The next step in the analysis is the identification of key vendors. The level of effort required to obtain the characteristic data from which the new descriptions will be created will depend on the level of cooperation from the suppliers, and to a large degree this will depend on the nature of the relationship. Everything else being equal, the more money you spend with a vendor, the more likely they are to pay attention to your request for data, but how you ask and what you ask for can also make a huge difference. The nicer you are and the more specific the request, the better.
The next step is to group materials into data cleansing strategies and priorities.
[Figure: matrix grouping materials by risk and by number of suppliers.]
The groups on the right of the matrix, where there is a high number of suppliers, are the groups where vendor rationalization will yield the most return, with group six being the top priority. The groups on the left of the matrix are high-risk categories, with group five being the top risk group. Whenever there is a small number of suppliers, the emphasis should be on contracts, as well as on monitoring the financial well-being of your suppliers; conversely, when the number of suppliers is high, close monitoring of market trends is the best strategy.

Outsourced data cleansing
The data cleansing industry typically categorizes material master and service master data in accordance
with one of the following quality levels. The purpose of the data cleansing process is to move data from
Level 1 to Level 4.

Level 1: The item is identified in terms of its source of supply and has a reference number sufficient to successfully place an order for the item from that source.
Level 2: The item is identified and has been assigned a class sufficient to allow the classification of the item for spend analysis.
Level 3: The item is identified and partially described; some of the properties specified in the data requirement have been provided. The data is useful for some of the business functions.
Level 4: The item is identified and all mandatory properties in the data requirement have been provided; it can be competitively sourced based solely on its description, and the data meets the requirements needed to support all known business functions.

Phase 1: taking data from level 1 to level 2


Source Data Extraction
Before the data cleansing process can begin, data must be extracted from the source systems. These are typically ERP systems, but they can also be specialized procurement, production, inventory or maintenance planning systems: basically anywhere there is a name or a description of an item.
The data entering the process should be, at a minimum, Level 1 data; if there is insufficient data to purchase the item from a known source, then it is very unlikely that the data cleansing process will be able to improve the description.

The reference number specified in Level 1 can, however, be very source-specific, as in "Joe, can you send me 100 boxes of those small screws we buy from you?" As long as Joe understands what you mean by "small screws," it is considered a valid Level 1 reference, at least in the context of ordering screws from Joe.
In some instances, where the item is a true commodity that can be obtained from many sources, the reference may be a standard or a published specification, such as a military item specification (MILSPEC).
Typically, the data cleansing service provider will supply a data extraction template, and a good service provider will spend some time getting to know your system to ensure that any data you have that may be useful to the process is not overlooked.
Many data cleansing service providers will ask for the vendor master, and this is actually a good sign, as it indicates that they will probably be looking to see if they can find some of the item descriptions in your suppliers' electronic catalogs. You should consider setting ground rules regarding communicating with your suppliers.
Before the data cleansing work begins, a good service provider will look to see if there are any flags in the master data that indicate obsolete items. A good service provider will also ask for a twelve- or twenty-four-month extract of your purchase order transaction file, and they will use this to identify high-value and frequently purchased items, as these need to be prioritized.
Reference Data Extraction
Typically, descriptions contain manufacturer or vendor names, part numbers or other reference data
such as references to standards or drawings. This process analyzes the descriptions and extracts
potential reference data.
If the original Level 1 description was "HOSE, 1/4" X 250 FT. GRAINGER 5W553", the extracted reference data would be "GRAINGER 5W553". At this stage there is no way of knowing if it is a supplier or a manufacturer reference.
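As a rough sketch of this extraction step, the following Python fragment pulls organization-plus-part-number candidates out of a free-text description. The pattern and the list of known organization names are illustrative assumptions; production extraction uses the organization lookup table and far more robust matching.

    # Sketch: extract potential reference data (organization name plus a
    # part-number-like token) from a free-text description.
    import re

    KNOWN_ORGS = ["GRAINGER", "PARKER"]     # from the organization lookup table

    def extract_references(description: str):
        refs = []
        text = description.upper()
        for org in KNOWN_ORGS:
            match = re.search(rf"{org}\s+([A-Z0-9-]+)", text)
            if match:
                refs.append((org, match.group(1)))
        return refs

    print(extract_references('HOSE, 1/4" X 250 FT. GRAINGER 5W553'))
    # -> [('GRAINGER', '5W553')]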
Potential Duplicate Identification Based on Reference Data
This is a largely automated process that does not yield a significant number of duplicates, but a good service provider will include it in the file preparation work. The process includes the de-duplication of items that have identical descriptions and the identification of items that have similar reference data. The items with similar reference data are marked as potential duplicates, and they are reviewed either by the customer or by one of the service provider's domain experts. At this stage what is looked for are obvious duplicates, so duplicate identification is based on very close matching of reference data.
Class Assignment
This is a process in which descriptions are analyzed and the item is assigned a class from the eOTD. This will bring line items to Level 2 quality data. The assignment of a class is based on the original description and is not definitive; the class may be modified when the data required by the data requirement is extracted, the item is researched, or the item is physically inspected during a walk-down.

Class assignment is a direct replacement for the older UNSPSC classification; the process is similar and
largely automated but considerably more accurate and reliable. Assigning the UNSPSC or other
commercial classifications (CPV, eClass, NIGP) to an item once an eOTD class has been assigned is simply
a matter of applying a table look-up and it is a completely automated process.
If the original Level 1 description was "HOSE, 1/4" X 250 FT. GRAINGER 5W553", searching for the class concept of HOSE would have resulted in the eOTD class concept 0161-1#01-087529#1, associated with the term HOSE and the definition "A flexible pipe, of rubber, plastic, etc. may be reinforced, designed to convey liquid, air, gas, etc."
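A minimal Python sketch of this lookup; the keyword table is an illustrative stand-in for the largely automated matching used in practice, with the HOSE identifier taken from the example above.

    # Sketch: assign an eOTD class by matching a class term found in the
    # original description against a term-to-concept lookup table.
    CLASS_LOOKUP = {
        "HOSE": "0161-1#01-087529#1",       # eOTD class concept for HOSE
        "BEARING": "0161-1#01-1142515#1",   # expanded from the dictionary above
    }

    def assign_class(description: str):
        text = description.upper()
        for term, concept_id in CLASS_LOOKUP.items():
            if term in text:
                return term, concept_id
        return None                          # escalate for research or walk-down

    print(assign_class('HOSE, 1/4" X 250 FT. GRAINGER 5W553'))
    # -> ('HOSE', '0161-1#01-087529#1')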

Level 2 analysis
Potential Duplicate Identification Based on Class and Reference Data
Under this process, items are grouped by class, and the combination of the class and the reference data is used to identify potential duplicates. In NATO this is referred to as the SCREENING process. Partial reference data matching within a class is an efficient and reliable way to identify potential duplicates. The items identified as a result of the process are marked as potential duplicates and a report is generated. Duplicate resolution itself is a separate process that requires physical verification followed by resolution in the master data and procurement records. The potential-duplicate report based on class and reference data is typically one of the first indicators of the benefits that can be expected from a data cleansing project.

The combination of the report with unit price and minimum stock levels will provide a reliable
indication of the savings that can be expected from an inventory rationalization project.
The combination of the report with the Purchase Order transaction file will provide a reliable
indication of the savings that can be expected from a vendor rationalization project.

Spend Analysis Classification Mapping


As we saw under classifications, most companies develop and maintain a number of spend classifications, for example a spend classification that rolls up to the chart of accounts and another that groups items by procurement specialty. If these classifications were created as groupings of the root class taken from the eOTD, then the assigned eOTD class can be automatically mapped to the other classifications; as the eOTD class is already mapped to the UNSPSC, the CPV and several other commercial classifications, adding third-party classifications becomes an automated process.
For example, the eOTD concept 0161-1#01-087529#1 "Hose" would be classified as 40.14.20.00 "Hoses" in UNSPSC version UNv120901, as 37-11-01-90 "Hose (w/o conn., unspecified)" in eClass version 8.0, as 44165100 "Hoses" in CPV-2007, or as 460-00 "HOSE, ACCESSORIES, AND SUPPLIES: INDUSTRIAL, COMMERCIAL, AND GARDEN" in NIGP.
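A sketch of the table lookup in Python, using the codes quoted above; a real mapping table would be version-controlled, since the target classifications change with each release.

    # Sketch: map an assigned eOTD class to third-party classifications
    # through a simple lookup table.
    EOTD_TO_CLASSIFICATION = {
        "0161-1#01-087529#1": {             # HOSE
            "UNSPSC UNv120901": "40.14.20.00",
            "eClass 8.0": "37-11-01-90",
            "CPV-2007": "44165100",
            "NIGP": "460-00",
        },
    }

    def classify(eotd_class: str, scheme: str) -> str:
        return EOTD_TO_CLASSIFICATION[eotd_class][scheme]

    print(classify("0161-1#01-087529#1", "CPV-2007"))   # -> 44165100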

Phase 2: taking data from level 2 to level 4
Develop Data Requirements
This is by far the most critical part of the data cleansing process. Many data cleansing companies differentiate themselves by their domain expertise, but you need to own and control the data requirements used to clean your data. The development and validation of the data requirements used to clean your data represent one of your major investments in the data cleansing process.
These data requirements will be used to verify the quality of your master data and to maintain your master data going forward, and they will be the key component in your ability to request the data you need from your suppliers. It cannot be stressed too highly that a data cleansing process that does not provide you with access to these data requirements is to be avoided. You can ask that your data requirements either are registered and published in the ECCMA Data Requirement Registry (eDRR) or are given to you in a form that you can use with any commercial off-the-shelf cataloging or data cleansing software program. The preferred format is eOTD-i-xml, an ISO 22745-30 compliant format.
If you have decided to work with a company that uses proprietary data requirements, you must negotiate a license to use these data requirements after the data cleansing project is completed, as they are an integral part of your master data.
The following is an example of a data requirement that includes a data type and specified units of measure.

Class: HOSE (0161-1#01-087529#1)
  INTERIOR DIAMETER   numeric, mm   mandatory
  LENGTH              numeric, m    mandatory
  MATERIAL            text          optional
  COLOR               text          optional

The development of the data requirements will determine the cost of the data cleansing project. While there is a cost associated with poor quality master data that does not support the needs of a business, it is also possible to over-specify data requirements and, as a result, spend more than is necessary on data cleansing.
As we saw earlier, data requirements change over time and according to need. The best way to deal with this is to work to satisfy the most obvious and well known of your current data requirements and accept that, as new requirements are identified, some of the descriptions will need to be reworked.
It is better to start with simple data requirements, as these will not only lower the cost of the data cleansing project but also make it much easier to keep the project on track.

Value Extraction
This process consists of analyzing the original item descriptions using the data requirements as a guide and extracting properties and their associated values. Value extraction is considered complete when all mandatory properties and their values, as specified in the data requirements, are populated.
Value extraction is a semi-automated process; it requires a high degree of domain and technical expertise, as well as quality control. Despite what some service providers claim, you cannot extract data that is not there, but of course you can research missing data. This is a much more expensive proposition.
If you are considering cleansing your master data, it is because the existing data is incomplete or unreliable, so it follows that relying on the data extracted from these descriptions may not be a very good idea. Extracting data can still serve a purpose: even if it is incorrect, it can be useful in the validation process, because tests have shown that, in responding to requests for data, respondents are much more willing to correct errors than they are to fill in a blank field. The bottom line is that extracted data must always be validated.
An additional task often associated with data extraction is value standardization. Value standardization consists of establishing preferred units of measure and converting all values to these units of measure. Creating consistent metric units by setting the position of the decimal is useful and without risk. Conversion between metric and imperial measurements is a risky business, and best practice is to clearly indicate that a value is the result of a conversion.
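A small Python sketch of value standardization that follows this best practice by flagging converted values; the preferred unit and the conversion table are illustrative choices.

    # Sketch: standardize lengths to a preferred unit (mm) and flag any
    # value that is the result of a conversion.
    TO_MM = {"MM": 1.0, "CM": 10.0, "INCH": 25.4}

    def standardize_length(value: float, uom: str):
        return {"value": round(value * TO_MM[uom], 2),
                "uom": "MM",
                "converted": uom != "MM"}    # mark imperial-to-metric conversions

    print(standardize_length(0.25, "INCH"))
    # -> {'value': 6.35, 'uom': 'MM', 'converted': True}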
Cataloging at Source (C@S)
Cataloging at Source (C@S) is a process that was developed and extensively tested by NATO as a replacement for the traditional military cataloging method. It is at the heart of the development of ISO 22745, the international standard for cataloging, and of ISO 8000, the international standard for data quality.
The traditional method of cataloging in the military was for the buyers to request that the suppliers or manufacturers provide technical specifications and drawings, which were then used by military catalogers to create a NATO Stock Number (NSN) record, essentially the military equivalent of your master data record.

[Figure: structure of an NSN (item of supply) record]
Segment A: identification guide reference and item name
Segments V (coded) and M (clear text): characteristic data (fit, form, function)
Segment W: packaging data
Segment C: identification data for the item of production (NCAGE and part number; the NCAGE resolves to a manufacturer name and address)
Segment H: material management data

Beyond manufacturer resistance and supplier inability to provide what amounted to unspecified data, the cost of extracting the data from source documents was prohibitive. Cataloging at source took a different approach in specifying exactly what data was needed. The process was extensively tested, and it demonstrated a substantial improvement in the quality of the data provided and the speed with which it was provided; as a result, it lowered the cost of cataloging (by 75%!).
In March 2011, this resulted in the inclusion of the following clause in the standard that specifies the information exchange requirements for most material management functions commonly performed in supporting international projects:
"The Contractor shall supply identification and characteristic data in accordance with ISO 8000-110:2009 on any of the selected items covered in his contract. Following an initial codification request as specified in section 3.2, the NATO Codification Bureau (NCB) shall present a list of the required properties in accordance with the US Federal Item Identification Guides." (The US Federal Item Identification Guides are data requirements.)
The process also demonstrated that suppliers and manufacturers welcomed the change: for the first
time, they were given visibility of exactly what data their customer wanted or needed, and they preferred
being asked for data to the alternative, where they had no visibility of what data was being
collected or from where it was being obtained.
ISO 22745 was developed to support the cataloging at source process and to create what has become
known as the data supply chain.


Creating and managing a data supply chain is the single most important development in data cleansing.
It is a recognition that the characteristic data essential to creating a structured master data record
originates from outside the organization. Cataloging at source has, to a large degree, replaced the data
extraction and research function performed by contractors, and it is the largest single contributor to
reducing the cost of cataloging.
If your service provider is using automated web search tools such as web robots (also known as web
wanderers, crawlers, or spiders), you should require written confirmation that they are doing so
ethically and legally. They should have a written policy in which they expressly agree to adhere to the
robot exclusion rules defined in the robots.txt file on the target web site and to respect the rules
governing the use of a third party's web site. These automated programs are used by search engines
such as Google, Yahoo and Bing to index web content. Unfortunately, spammers also use them to scan
for email addresses, and many companies use them to obtain data improperly; this is not only frowned
upon, it can be illegal and can be considered industrial espionage. If these automated search agents are
not managed properly they can also seriously disrupt the operation of a third party's website.
Remember, the data cleansing company is working for you and is conducting research as your
agent, so you should care about how they do their work.
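As a minimal illustration, Python's standard library includes a robots.txt parser that a well-behaved crawler can consult before fetching a page. The site URL and user-agent name below are placeholders.

```python
# A well-behaved crawler checks a site's exclusion rules before fetching.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the site's exclusion rules

user_agent = "DataCleansingBot"  # hypothetical crawler name
page = "https://www.example.com/catalog/item-123"

if rp.can_fetch(user_agent, page):
    print("allowed: fetch the page, subject to polite rate limits")
else:
    print("excluded by robots.txt: do not fetch this page")
```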
The following workflow details the cataloging at source process.



Cataloging at source workflow:
1. Start from a service or material item record in the ISO 8000-120 master data warehouse.
2. If the data is not sufficient to order the item from a known supplier, contact the buyer to obtain
data sufficient to order the item.
3. If the supplier master data does not contain a technical point of contact email, contact the
supplier to obtain one.
4. If the description is not sufficient to assign a class, contact the buyer or supplier, or conduct
on-line research, to determine the class.
5. If a data requirement for the class does not exist in the registry, create the data requirement.
6. If the supplier has an ISO 22745 catalog, send the supplier an ISO 22745 request for data;
otherwise, send the technical point of contact an email request for data with a URL to an on-line form.
7. If a reply is received, add the data to the ISO 8000-120 master data warehouse; if not, fall back
to on-line research or data extraction.
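Read as code, the workflow is a chain of checks where the first failure names the next action. The sketch below is a minimal Python illustration; the field names (orderable, tpoc_email and so on) are assumptions made for the sketch, not ISO 22745 terminology.

```python
def next_action(item, registry):
    """Return the next cataloging-at-source step for one item record.
    All field names are illustrative assumptions, not ISO 22745 terms."""
    if not item.get("orderable"):
        return "contact buyer for data sufficient to order the item"
    if not item.get("tpoc_email"):
        return "contact supplier for a technical point of contact email"
    if not item.get("item_class"):
        return "contact buyer/supplier or research on-line to assign a class"
    if item["item_class"] not in registry:
        return "create a data requirement for the class"
    if item.get("supplier_has_iso22745_catalog"):
        return "send the supplier an ISO 22745 request for data"
    return "email the technical point of contact a link to an on-line form"

item = {"orderable": True, "tpoc_email": "sales@acme.example",
        "item_class": "hex cap screw", "supplier_has_iso22745_catalog": False}
print(next_action(item, registry={"hex cap screw"}))
```

If no reply is received to either request, the fallback in the workflow is on-line research or data extraction before the record is added to the ISO 8000-120 master data warehouse.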


Leveraging level 4 data
Generate Standardized Descriptions
One of the major benefits of level 4 data is the ability to automatically and programmatically generate
descriptions. Descriptions can be generated in any of the languages supported by the dictionary.
Descriptions are generated using a rendering guide that specifies the data elements to be included, as
well as their order. The rendering guide also specifies the overall length of the description and where
abbreviations should be used. The software required to auto-generate descriptions can be quite
sophisticated, and both the item name and all the descriptions should be rendered from the structured
master data record.
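A minimal sketch of such a renderer follows, assuming a hypothetical guide format made up of an ordered property list, an abbreviation table and a length cap; real rendering rules stated in conformance with ISO 22745-45 are more elaborate.

```python
# Minimal sketch of rendering a description from a structured record.
# The guide format below is an illustrative assumption.
RENDERING_GUIDE = {
    "properties": ["name", "material", "thread_size", "length"],
    "abbreviations": {"STAINLESS STEEL": "SST", "HEXAGON": "HEX"},
    "max_length": 40,
}

def render_description(record, guide):
    parts = [str(record[p]) for p in guide["properties"] if p in record]
    text = ",".join(parts).upper()
    for full, abbr in guide["abbreviations"].items():
        text = text.replace(full, abbr)
    return text[:guide["max_length"]]  # truncate to the guide's length cap

record = {"name": "screw, cap, hexagon head", "material": "stainless steel",
          "thread_size": "M6", "length": "30 mm"}
print(render_description(record, RENDERING_GUIDE))
# SCREW, CAP, HEX HEAD,SST,M6,30 MM
```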
The major advantage of this process is the dynamic nature of descriptions. If the data requirement for a
class changes and a new property is introduced, new descriptions can easily be generated for all the
items in the class.
Potential Duplicate Identification Based on Characteristic Data
This process combines the reference data, the class and the characteristic data to produce a
probability that two records describe the same item. In the hands of experienced domain experts this
process can be extremely efficient at identifying potential duplicates with a very high degree of
confidence.
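A minimal sketch of the idea: within a class, score the fraction of shared characteristics whose values match. The even weighting is an assumption made for the sketch; in practice the weights would be set per class and tuned by a domain expert.

```python
def duplicate_score(a, b):
    """Rough 0..1 confidence that two records describe the same item,
    based on shared characteristic values. Evenly weighted for the sketch."""
    if a["class"] != b["class"]:
        return 0.0
    shared = set(a["props"]) & set(b["props"])
    if not shared:
        return 0.0
    matches = sum(1 for p in shared if a["props"][p] == b["props"][p])
    return matches / len(shared)

x = {"class": "hex cap screw", "props": {"material": "SST", "thread": "M6"}}
y = {"class": "hex cap screw", "props": {"material": "SST", "thread": "M8"}}
print(duplicate_score(x, y))  # 0.5: flag for review, not a confirmed duplicate
```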
Competitive Sourcing
Competitive sourcing is one of the primary purposes of cleansing data. The better the specification, the
higher the response to your Request for Quote (RFQ) and the easier it is to analyze the replies. Many
suppliers will not respond to an incomplete technical specification because they know that even if they
get the order there is a high probability that the item they supply will be incorrect and be returned.
Automating the generation and analysis of RFQs is relatively straightforward; in fact it was one of the
very first systems I designed. It was called Jade (I cannot remember why) and it was driven by a very
primitive supplier master and item master. The system generated detailed RFQs which were sent out
first by mail and then by telex through a special network. This was in the days before email, when in the
UK it was illegal to connect a fax machine to the British Telecom network; you could only lease the
equipment, which required a dedicated line, at a combined cost of $250 per month! Of course BT also
owned all the telephones and all the answering machines. BT would never have given up on this
goldmine had it not been for massive civil disobedience. I was actually fined and threatened with
permanent disconnection for plugging an answering machine purchased in the US into the BT network.
Luckily, times have changed.
Physical Verification
Physical verification is not typically quoted as part of a standard data cleansing process, but it is
recommended for many items identified as potential duplicates. If it is undertaken, it is common to
include a physical stock check and photographs of the items. Although it is obvious, it is good practice
to include a ruler in the picture for scale (surprisingly, many contractors forget this).



Potential Duplicate Resolution
Great care needs to be taken in resolving potential duplicates. The first step is to determine that the
items are truly duplicates. Even an identical manufacturer part number cannot be relied upon as
conclusive proof of duplication. Part numbers are very useful search strings, and in many instances they
have become recognizable brands, so manufacturers and suppliers may retain a part number even
when they make changes to the materials or components, which often results in changes to fit, form
or function. Physical verification of the items to determine that they are true duplicates is highly
recommended, but it is important to keep in mind that a common duplication problem is
counterfeiting, where the two items share the same visible physical characteristics but may be
substantially different in terms of their performance characteristics.
Resolution of duplication consists of selecting one or more items to be marked as deprecated in the item
master. This means that the item number is no longer to be used, but it is not deleted, as deleting an
item from the master data would make it impossible to report on the historical records. A deprecated
item number should no longer be available for requisition, but there may still be open purchase
orders, and these should be given time to work through the system.
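A minimal sketch of the bookkeeping, using an invented in-memory item master: the duplicate is flagged, pointed at the survivor, and filtered out of requisition lists, but never deleted.

```python
# Sketch of duplicate resolution. Item numbers and descriptions are invented.
item_master = {
    "100234": {"desc": "SCREW,CAP,HEX HEAD,SST,M6,30MM", "deprecated": False},
    "104871": {"desc": "HEX SCREW M6X30 SST", "deprecated": False},
}

def deprecate_duplicate(duplicate_id, surviving_id):
    record = item_master[duplicate_id]
    record["deprecated"] = True           # never delete: history must survive
    record["replaced_by"] = surviving_id  # point users at the surviving item

def requisitionable():
    return [n for n, r in item_master.items() if not r["deprecated"]]

deprecate_duplicate("104871", surviving_id="100234")
print(requisitionable())  # ['100234']
```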
If the item is inventoried, then the physical inventory should be consolidated as soon as the duplication
has been confirmed. It is a good idea to leave the empty bin in place with a note including the new item
number and location. Dating the note will allow the bin to be safely reused when needed.
The savings attributed to duplicate identification and resolution are typically measured as the reduction
in inventory of the highest priced item plus the annual savings. The lower the inventory turn and the
higher the price differential, the greater the savings. While the savings due to a reduction in
inventory would normally be shown as a balance sheet item, it is not uncommon for the excess
inventory to be consumed. This reduces expenditure to below normal in the short term, so it is very
important to be aware that expenditure will recover once the excess inventory is absorbed.
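As a worked example with invented figures, suppose 40 units of a $25.00 duplicate are consolidated and an annual demand of 120 units is redirected to its $18.00 equivalent:

```python
# Worked example with invented numbers: one-time inventory reduction plus
# the annual saving from buying only the lower priced duplicate.
qty_on_hand = 40                      # units of the higher priced duplicate
price_high, price_low = 25.00, 18.00  # unit prices of the two duplicates
annual_demand = 120                   # units per year across both items

one_time_saving = qty_on_hand * price_high               # 1000.0 (balance sheet)
annual_saving = annual_demand * (price_high - price_low) # 840.0 per year
print(one_time_saving, annual_saving)
```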

The growth of commodities


Commodities were traditionally materials of uniform quality, defined by a standard that could be
referenced in the contract, and produced in large quantities by many different producers. In order to be
traded as a commodity, the compliance of the item with the standard needs to be capable of
being independently verified. One of the benefits of standards is commoditization: from the buyer's
perspective it increases competition, and from the supplier's perspective it increases market size. While
the number of commodities traded on the commodities markets has grown, this growth has been
eclipsed by the growth in the commoditization of intangibles in the form of financial instruments such
as derivatives. The lesson learned from the commoditization of intangible assets is the critical role
played by identifiers and the associated characteristic data. These lessons apply to the commoditization
of services.



Materials vs. Services
In many companies the expenditure on materials is decreasing and the expenditure on services is
increasing; this is a reflection of a growing sophistication in the supply chain, which needs to be matched
by an increase in the ability to reliably contract for services. A service can be described using the same
process used to describe tangible materials: a service is assigned a service number, assigned a
class, and described using characteristic data. Contracting for an intangible service relies just
as much on the specification as does contracting for any tangible item. The difference is that the
characteristics of a tangible item are typically its physical or performance characteristics, while the
characteristics that describe a service are typically its tangible outputs, described as the deliverables.
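To make the parallel concrete, here is a hypothetical service record sketched as a Python dictionary; every field name and value is invented for illustration.

```python
# Hypothetical service record: the same shape as a material record, but the
# characteristic data describes the deliverables. All values are invented.
service_record = {
    "service_number": "S-004512",         # analogous to a material number
    "class": "pump maintenance service",  # assigned class
    "characteristics": {                  # tangible outputs: the deliverables
        "deliverable": "overhauled centrifugal pump with test certificate",
        "turnaround": "10 working days",
        "report": "inspection and repair report",
    },
}
```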
Best practice is to avoid bundling materials with services when the materials have their own material
numbers. This can be a challenge when contracting for maintenance services that include the
replacement of materials such as motors and valves. Reconciling purchase orders may be more difficult,
but without this effort spend analysis can be very frustrating.

Contracting for data cleansing services


The quality and consistency of data cleansing services continues to improve, largely because
the quality of the delivered data can now be objectively and independently measured as the degree to
which the data meets the data requirements. All master data cleansing should include, or be preceded
by, a scoping study that defines the number of records to be cleansed, the number of item classes, the
data requirement by class and, where appropriate, the priority by item class. Throughout the data
cleansing project the customer should actively monitor the dictionary, the data requirements and the
description rules, all of which should be clearly defined as deliverables. Changes in the data requirements
during the project should be documented in change orders, as they will impact project cost and
duration.



The following is a recommended specification of the deliverables that should be included in a master
data cleansing contract:
The master data delivered pursuant to this contract shall be ISO 8000-120 compliant:
1. The master data shall be provided in ISO 22745-40 compliant Extensible Markup Language
(XML).
2. The provenance of the property values shall be identified in accordance with ISO 8000-120
using an ECCMA Global Organization Registry (eGOR) identifier to identify the source of the
data and shall be dated with the date the data was obtained (a conceptual sketch follows this
specification).
3. Identification data (for example part numbers, drawing numbers, standard specifications)
shall be in the form of a reference where the organization that issued the identifier shall
itself be identified using an ECCMA Global Organization Registry (eGOR) identifier.
4. The master data shall comply with agreed data requirements that shall be delivered in XML
in compliance with ISO 22745-30 or registered in the ECCMA Data Requirements Registry
(eDRR).
5. Property values that are rendered from other property values (for example rendered
descriptions) shall identify the rules used in rendering and the rules shall be stated in
conformance with ISO 22745-45.
6. If a classification is provided, all the characteristics used in assigning the classification must
be included in the characteristic data.
7. The master data, the data requirements and the description rules shall be encoded using
concept and terminology identifiers from the ECCMA Open Technical Dictionary (eOTD), an
ISO 22745 compliant open technical dictionary that supports free identifier resolution.
For the avoidance of doubt, the following data must be provided in an application-neutral format
without the inclusion of proprietary tags:
1. The dictionary (including all classes, attributes, units of measure and coded values with any
and all terminology necessary to render descriptions)
2. The data requirements (cataloging templates)
3. The description rendering rules
Statement of intellectual property: Contractor hereby warrants that data delivered pursuant to
this contract is free from any and all claims to intellectual property, be it in the form of copyright,
patent or trade secret, that would restrict the customer from using or redistributing the delivered
data.
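To illustrate deliverable 2 conceptually, the sketch below shows a property value carrying its source and the date it was obtained. This is a sketch of the idea only, not the ISO 22745-40 XML encoding, and the identifiers are invented placeholders rather than real eOTD or eGOR identifiers.

```python
# Conceptual sketch only: a property value carrying ISO 8000-120 style
# provenance. Identifier strings are invented placeholders, not actual
# eOTD or eGOR identifiers.
property_value = {
    "property": "OTD-PROPERTY-PLACEHOLDER",    # dictionary concept identifier
    "value": "25.4",
    "unit_of_measure": "OTD-UOM-PLACEHOLDER",  # e.g. the concept for millimetre
    "provenance": {
        "source": "EGOR-ORG-PLACEHOLDER",      # who supplied the value
        "date_obtained": "2013-05-14",         # when the value was obtained
    },
}
```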
