master data
data held by an organization that describes the entities that are both independent and fundamental
for that organization, and that it needs to reference in order to perform its transactions
EXAMPLE: A credit card transaction is related to two entities that are represented by master data. The
first is the issuing bank's credit card account that is identified by the credit card number, where the
master data contains information required by the issuing bank about that specific account. The second is
the accepting bank's merchant account that is identified by the merchant number, where the master
data contains information required by the accepting bank about that specific merchant.
NOTE 1 Master data typically includes records that describe customers, products, employees, materials,
suppliers, services, shareholders, facilities, equipment, and rules and regulations.
NOTE 2 The determination of what is considered master data depends on the viewpoint of the
organization.
NOTE 3 The term "entity" is used in the general sense, not as used in information modeling.
[ISO 8000-110]
Data cleansing is the process of improving the quality of the names and descriptions of an item, typically in an ERP application.
The data cleansing process can be broken down into two steps. In the first step, the original names and descriptions are deconstructed and then enriched to create a structured master data record; in the second step, that record is used to build new names and descriptions.
The process of building the structured master data record is called cataloging and the process used to
transform a structured master data record into descriptions is called rendering.
Both the cataloging and rendering processes are driven by rules. The rules for cataloging are contained in data requirements (DR), also known as cataloging templates or identification guides; these are the actual data quality standards. The rules for rendering are contained in rendering guides (RG), also known as description rules.
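To make the two rule-driven steps concrete, here is a minimal Python sketch. It assumes a data requirement can be reduced to a class name plus a list of required properties, and a rendering guide to an ordered list of properties plus a separator; the names `catalog`, `render`, `DATA_REQUIREMENT` and `RENDERING_GUIDE` are hypothetical, and real cataloging templates and description rules carry far more detail.

```python
# Hypothetical data requirement (DR): the properties a VALVE,BALL record should carry.
DATA_REQUIREMENT = {
    "class": "VALVE,BALL",
    "properties": ["BODY MATERIAL", "PIPE SIZE", "CONNECTION STYLE"],
}

# Hypothetical rendering guide (RG): which properties appear, in what order, joined how.
RENDERING_GUIDE = {
    "properties": ["PIPE SIZE", "BODY MATERIAL", "CONNECTION STYLE"],
    "separator": ", ",
}


def catalog(extracted: dict[str, str], dr: dict) -> dict:
    """Step 1: build a structured record holding only the properties the DR asks for,
    and note which required properties are still missing."""
    values = {p: extracted[p] for p in dr["properties"] if p in extracted}
    missing = [p for p in dr["properties"] if p not in extracted]
    return {"class": dr["class"], "values": values, "missing": missing}


def render(record: dict, rg: dict) -> str:
    """Step 2: build a new description from the structured record using the RG."""
    parts = [record["values"][p] for p in rg["properties"] if p in record["values"]]
    return record["class"] + ": " + rg["separator"].join(parts)


# Values as they might be pulled out of an original free-text description.
extracted = {"PIPE SIZE": '1/2"', "BODY MATERIAL": "SS", "CONNECTION STYLE": "THREADED"}
record = catalog(extracted, DATA_REQUIREMENT)
print(render(record, RENDERING_GUIDE))  # VALVE,BALL: 1/2", SS, THREADED
```

Because the rules are held as data rather than buried in the code, rerunning `render` after a rendering guide changes updates every description built from the same structured records.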
When the process we know today as data cleansing was originally being developed, contractors would
perform the transformation from the original item names and descriptions to the new item names and
descriptions without providing their customers with copies of the cataloging templates, the structured
master data, or the rules used for creating the new names and descriptions. This is rarely the case today
as most customers realize that without the rules and the structured master data, they can never be
independent of the contractor.
Cataloging
The first step in the journey to better descriptions is cataloging. This is the process of describing something, anything, and it is indeed an ancient art. Aristotle was struggling with descriptions over 2,350 years ago when he wrote Categories (http://classics.mit.edu/Aristotle/categories.html). Cataloging identifies the discrete characteristics of something in the form of property-value pairs, where the property provides the meaning of the value. Examples of properties are height, weight and material.
Properties are used to represent characteristics, and the first rule of cataloging is that the properties must be explicitly defined; this is typically done using a dictionary. While properties define the meaning of values, it is often useful to further define values in terms of their data types; a date, a measure, a numeric value and a text string are examples of data types.
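As a minimal sketch of these two rules (properties must be defined in a dictionary, and values can be checked against a data type), the following Python uses a tiny hypothetical dictionary. The `Property` class, the entries and the `check_pair` function are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Property:
    """A dictionary entry: the property's term, its definition and its expected data type."""
    term: str
    definition: str
    data_type: type


# Hypothetical mini-dictionary; a real one would hold many more properties.
DICTIONARY = {
    "HEIGHT": Property("HEIGHT", "Vertical dimension of the item", float),
    "MATERIAL": Property("MATERIAL", "Substance the item is made of", str),
    "DATE OF MANUFACTURE": Property("DATE OF MANUFACTURE", "Date the item was made", date),
}


def check_pair(prop: str, value: object) -> bool:
    """A property-value pair is accepted only if the property is defined in the
    dictionary and the value has the data type that definition expects."""
    entry = DICTIONARY.get(prop)
    return entry is not None and isinstance(value, entry.data_type)


print(check_pair("HEIGHT", 12.5))    # True
print(check_pair("HEIGHT", "tall"))  # False: wrong data type
print(check_pair("COLOUR", "RED"))   # False: property not in the dictionary
```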
Many items may be described using the same properties with different values. For example, you can describe many individuals using the properties name, date of birth and place of birth; the properties remain the same, and only the values change. A group of items that can be described using the same properties is called a class. A class name is therefore nothing more than the name for a group of properties; typically, it will be used in naming the item and in its descriptions.
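The idea that a class is simply a named group of properties can be sketched in a few lines of Python. The class names, property sets and the `fits_class` helper below are illustrative assumptions, not taken from a real dictionary.

```python
# A class is modeled here as nothing more than a named set of properties.
# The class and property names are illustrative, not taken from a real dictionary.
CLASSES = {
    "PERSON": {"NAME", "DATE OF BIRTH", "PLACE OF BIRTH"},
    "BEARING": {"INSIDE DIAMETER", "OUTSIDE DIAMETER", "WIDTH"},
}


def fits_class(item: dict[str, str], class_name: str) -> bool:
    """True if the item has a value for every property that defines the class."""
    return CLASSES[class_name].issubset(item)


people = [
    {"NAME": "A. SMITH", "DATE OF BIRTH": "1970-01-01", "PLACE OF BIRTH": "LONDON"},
    {"NAME": "B. JONES", "DATE OF BIRTH": "1985-06-30", "PLACE OF BIRTH": "PARIS"},
]
# Same properties, different values: both individuals belong to the class PERSON.
print(all(fits_class(p, "PERSON") for p in people))  # True
print(fits_class(people[0], "BEARING"))               # False
```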
Items that share the same characteristics are substitutable, and you can merge their records by grouping all the identification data under a single material. However, considerable care needs to be taken in determining which items are duplicates or substitutable. The characteristic data for each item needs to be considered carefully, as well as the characteristics that you consider to be fundamental. Removing characteristics will cause materials to become substitutable, and adding characteristics will cause materials that were substitutable to become different.
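The effect of choosing which characteristics are fundamental can be shown with a small Python sketch. The material numbers and values are hypothetical, and grouping on identical values only yields candidates; it does not replace the careful review described above.

```python
from collections import defaultdict

# Hypothetical bearing records: material number -> characteristic data.
ITEMS = {
    "M-001": {"INSIDE DIAMETER": "20 mm", "OUTSIDE DIAMETER": "47 mm", "SEALING METHOD": "SHIELDED"},
    "M-002": {"INSIDE DIAMETER": "20 mm", "OUTSIDE DIAMETER": "47 mm", "SEALING METHOD": "SEALED"},
    "M-003": {"INSIDE DIAMETER": "20 mm", "OUTSIDE DIAMETER": "47 mm", "SEALING METHOD": "SHIELDED"},
}


def substitutable_groups(items: dict[str, dict[str, str]], fundamental: list[str]) -> list[list[str]]:
    """Group material numbers whose values agree on every fundamental characteristic;
    only groups with more than one member are returned."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for material, data in items.items():
        key = tuple(data.get(p) for p in fundamental)
        groups[key].append(material)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# With SEALING METHOD treated as fundamental, only M-001 and M-003 are candidates.
print(substitutable_groups(ITEMS, ["INSIDE DIAMETER", "OUTSIDE DIAMETER", "SEALING METHOD"]))
# Removing that characteristic makes all three materials appear substitutable.
print(substitutable_groups(ITEMS, ["INSIDE DIAMETER", "OUTSIDE DIAMETER"]))
```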
The skill in cataloging is to be able to use both the characteristic data and the identification data appropriately. If you are trying to buy one tire, you should use the part number (identification data), but if you are requesting quotes for hundreds of tires, you should use the characteristic data.
ISO 8000 defines quality master data as portable data that meets requirements.
This idea of standardized cataloging originated in 1950 with the development of the NATO Codification System (NCS) and is at the heart of all cataloging and data cleansing today. The principle is very simple: the master data must conform to a specified data requirement, and both the master data and the data requirements are coded using a common dictionary. This is what makes the master data portable.
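A minimal Python sketch shows why coding with a common dictionary makes master data portable. The concept identifiers and terms come from the example dictionary in this guide (abbreviated form); the local labels used by the supplier and the buyer, and the `encode` and `decode` helpers, are hypothetical.

```python
# Concept identifiers and terms from the example dictionary (abbreviated form).
COMMON_DICTIONARY = {
    "02-014725": "BODY MATERIAL",
    "02-007268": "PIPE SIZE",
}

# Each party maps its own (hypothetical) local labels to the shared concept identifiers.
SUPPLIER_LABELS = {"BODY MATL": "02-014725", "SIZE": "02-007268"}
BUYER_LABELS = {"MATERIAL OF BODY": "02-014725", "NOMINAL PIPE SIZE": "02-007268"}


def encode(local_record: dict[str, str], labels: dict[str, str]) -> dict[str, str]:
    """Replace each local property label with the shared concept identifier."""
    return {labels[label]: value for label, value in local_record.items()}


def decode(coded_record: dict[str, str]) -> dict[str, str]:
    """Resolve concept identifiers to terms using the common dictionary."""
    return {COMMON_DICTIONARY[cid]: value for cid, value in coded_record.items()}


supplier_record = {"BODY MATL": "SS", "SIZE": '1/2"'}
buyer_record = {"MATERIAL OF BODY": "SS", "NOMINAL PIPE SIZE": '1/2"'}

coded_supplier = encode(supplier_record, SUPPLIER_LABELS)
coded_buyer = encode(buyer_record, BUYER_LABELS)
print(coded_supplier == coded_buyer)  # True: both records carry the same meaning
print(decode(coded_supplier))          # {'BODY MATERIAL': 'SS', 'PIPE SIZE': '1/2"'}
```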
Structured master data
The structured master data is the key to the data cleansing process; it is composed of identification data and characteristic data. The identification data can be a manufacturer's model number, a supplier's part number, or even a drawing number or a standards reference number. What is important to remember is that identification data are identifiers assigned and controlled by third parties, so knowing who assigned the identifier is as important as the identifier itself. An item identifier combined with the identifier of the organization that issued it is called an item reference.
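Because an identifier means little without its issuer, it helps to store the two together. The following is a minimal Python sketch of an item reference as an immutable pair; the organization identifiers and part number are hypothetical, and no particular organization-identifier scheme is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemReference:
    """An item identifier qualified by the organization that issued it."""
    organization_id: str  # identifier of the issuing organization (scheme not assumed)
    item_id: str          # the part, model, drawing or standard number itself


# Hypothetical references: the same part number issued by two different organizations.
a = ItemReference("ORG-A", "12345-X")
b = ItemReference("ORG-B", "12345-X")
c = ItemReference("ORG-A", "12345-X")

print(a == b)  # False: same number, different issuer, so a different item reference
print(a == c)  # True
```

Being frozen, the dataclass is hashable, so item references can be used directly as dictionary keys or set members when matching records.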
Characteristic data is data that describes the physical characteristics or performance characteristics of
an item. NATO refers to this as the Fit, Form and Function of an item.
Characteristic data
Cataloging is the process of creating a structured master data record that conforms to a data
requirement.
Both identification and characteristic data are represented in the form of property-value pairs, where the property gives meaning to the value. The definitions of the properties, as well as of any other concepts used in cataloging, are contained in the dictionary.
[Figure: Example data requirements DR1 (VALVE,BALL) and DR2 (BEARING), both dated 2013-05-21, each listing its properties and marking some as optional. Properties shown include THREAD CLASS, BODY MATERIAL, PIPE SIZE, CONNECTION STYLE, TYPE, INSIDE DIAMETER, OUTSIDE DIAMETER, WIDTH, MAX PRESSURE, LOAD CAPACITY, SPEED RATING, SEALING METHOD, MANUFACTURER REFERENCE (PREFERRED) and NATO STOCK NUMBER.]
You can build a dictionary using a spreadsheet with columns for concept identifier, term and definition.
ISO 22745-10 is the international standard for representing an open technical dictionary; it is a very useful model for dictionaries used to render complex, defined-length names and descriptions or multilingual names and descriptions.
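A spreadsheet dictionary exported to CSV can be loaded with Python's standard `csv` module. This is a minimal sketch: the column names `concept_id`, `term` and `definition` are assumptions, and the two rows are taken from the example dictionary in this guide.

```python
import csv
import io

# The dictionary kept as a spreadsheet and exported to CSV (column names assumed).
CSV_TEXT = """concept_id,term,definition
01-1142515,BEARING,
02-024128,THREAD CLASS,A numeric-alpha designator indicating the pitch-diameter tolerance and the external or internal location of the thread.
"""


def load_dictionary(text: str) -> dict[str, dict[str, str]]:
    """Index dictionary rows by concept identifier, rejecting duplicate identifiers."""
    entries: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        cid = row["concept_id"].strip()
        if cid in entries:
            raise ValueError(f"duplicate concept identifier {cid}")
        entries[cid] = row
    return entries


dictionary = load_dictionary(CSV_TEXT)
print(dictionary["02-024128"]["term"])  # THREAD CLASS
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader` instead of the in-memory string.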
The following example dictionary was created as a subset of the ECCMA open technical dictionary (eOTD). The concept identifiers have been abbreviated by removing the leading 0161-1# and the trailing #1, which are required constants when exchanging standard-compliant concept identifiers. In the eOTD, not all the terms are in capitals and not all the definitions are in mixed case, so these were converted to make the dictionary look more attractive.
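Converting between the abbreviated identifiers used in the table and the full exchange form is a simple string operation. A minimal Python sketch, using the constants described above (the function names are hypothetical):

```python
PREFIX = "0161-1#"  # leading constant removed in the abbreviated form
SUFFIX = "#1"       # trailing constant removed in the abbreviated form


def expand(short_id: str) -> str:
    """Restore the full concept identifier required when exchanging data."""
    return f"{PREFIX}{short_id}{SUFFIX}"


def abbreviate(full_id: str) -> str:
    """Strip the constant prefix and suffix; reject identifiers that lack them."""
    if not (full_id.startswith(PREFIX) and full_id.endswith(SUFFIX)):
        raise ValueError(f"not a full concept identifier: {full_id}")
    return full_id[len(PREFIX):-len(SUFFIX)]


print(expand("01-087529"))               # 0161-1#01-087529#1
print(abbreviate("0161-1#01-1142515#1"))  # 01-1142515
```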
Every company must maintain its own dictionary; creating it as a subset of the eOTD simply makes the task easier.
Concept ID   Type       Term                                 Abbrev   Definition
01-1142515   Class      BEARING
01-1145956   Class      VALVE,BALL
02-095207    Property   TYPE
02-014725    Property   BODY MATERIAL
02-005366    Property   INSIDE DIAMETER                      ID
02-006986    Property   OUTSIDE DIAMETER                     OD
02-010188    Property   WIDTH
02-016927    Property   LOAD CAPACITY                        LC
02-101753    Property   SPEED RATING                         SR
02-019192    Property   SEALING METHOD
02-128590    Property   MANUFACTURER REFERENCE (PREFERRED)   MR(P)
02-128594    Property   NATO STOCK NUMBER                    NSN
02-128591    Property   SUPPLIER REFERENCE                   SR
02-024128    Property   THREAD CLASS                                  A numeric-alpha designator indicating the pitch-diameter tolerance and the external or internal location of the thread.
02-007268    Property   PIPE SIZE
02-024592    Property   CONNECTION STYLE
02-093392    Property   MAX PRESSURE
05-003934    UOM        INCH                                 "
07-000255    Value      SS
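[Figure: Example rendering guides for data requirement DR1 (item class VALVE,BALL): RG1 produces the ITEM NAME and RG2 the PURCHASE ORDER description. Each guide gives its rule and an example, using separators such as ": ", ", ", "=" and "; ".]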
Some classifications can be derived from the item class and many third party classifications can be
automatically assigned using one of the many eOTD commercial classification lookup tables maintained
by ECCMA. In some cases the class is not sufficient to determine the classification and another property
must be used in conjunction with, or instead of, the class. An example of this is the customs tariff code
(HTS) where the material typically determines the classification.
If the classification changes, these lookup tables or other classification rules will need to be reapplied to update the classifications. For this reason, it is not recommended to maintain classifications in a master data record unless they are regularly used in search or reporting functions.
While I am the original author of the UNSPSC, responsible not only for its name and design rules but also
for the process used to create and maintain it, perhaps it is only fitting that I also recognize its weakness
and in doing so the weakness of all classifications. As the name implies, a classification is an organization
of classes. In a hierarchical classification, classes are grouped into super classes, themselves grouped
into super classes. These groups of classes are called the nodes of a classification and they are also
The difference between names and descriptions that were manually entered and those that were automatically rendered can be observed in the consistency of the terminology and the formatting. It is possible for manually entered data to be consistent, particularly in a small, well-disciplined group with minimal turnover, but this is rarely the case, and a quick glance at most material master data will identify inconsistencies in the use of terminology as well as in the formatting of names and descriptions. It is very expensive to build a search engine that is tolerant of a lack of consistency in names and descriptions; a single space or a change in a character can cause items to be missed. To a computer, ORING, O RING, O-RING , ORING and O/RING are very different.
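The following Python sketch illustrates the problem. An exact-match search finds none of the variants, and a tolerant search needs a normalization rule (here, an assumed rule that keeps only letters and digits). Such rules are a patch that has to be designed, tuned and maintained; consistent rendering removes the need for them.

```python
import re

VARIANTS = ["ORING", "O RING", "O-RING ", "O/RING"]


def exact_matches(query: str, names: list[str]) -> list[str]:
    """What a plain string comparison finds."""
    return [n for n in names if n == query]


def normalized(text: str) -> str:
    """Assumed normalization rule: upper-case and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def tolerant_matches(query: str, names: list[str]) -> list[str]:
    key = normalized(query)
    return [n for n in names if normalized(n) == key]


print(exact_matches("O-RING", VARIANTS))          # [] (even "O-RING " fails on its trailing space)
print(len(tolerant_matches("O-RING", VARIANTS)))  # 4
```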
Automating the process of creating names and descriptions through the application of rendering guides
not only creates consistency in the use of terminology and formats but also consistency across the
names and descriptions of all the items that belong to the same class. Automating this process also
allows changes to be made to a large number of items very quickly.
Item names and descriptions need not be static; in fact, the ability to change them as required is one of the more useful features of an item master. What does not change is the material number.
Item names and descriptions must be useful, and both requirements and rendering preferences change over time, so it is important to be able to respond to requests to change item names or descriptions. It is important to create a culture and a process where users can recommend changes and be confident that they will be acted upon; otherwise, they will find a workaround, typically adding another item record for the same item but with the name or description they asked for. The following is a workflow for resolving issues that arise when a name or description is not acceptable.
[Figure: Workflow for resolving an unacceptable name or description. If the data requirement already includes the property or coded value, change the description rules. If it does not, check whether the property or coded value is in the dictionary: if it is, add it to the data requirement; if it is not, add it to the dictionary.]
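The decision logic of this workflow can be sketched as a small Python function. The sketch assumes one reading of the flowchart: if the data requirement already includes the property or coded value, the description rules are changed; otherwise the property or coded value is first added to the dictionary (if it is missing there) and then to the data requirement. The function name and action strings are hypothetical.

```python
def resolve_unacceptable_name(prop: str, data_requirement: set[str], dictionary: set[str]) -> list[str]:
    """Return, in order, the actions the workflow calls for (under the reading above)."""
    if prop in data_requirement:
        # The data is already captured; only the way it is rendered must change.
        return ["change the description rules"]
    actions = []
    if prop not in dictionary:
        # A property or coded value must be defined before a data requirement can use it.
        actions.append("add property or coded value to dictionary")
    actions.append("add property or coded value to data requirement")
    return actions


print(resolve_unacceptable_name("COLOR", {"WIDTH"}, {"WIDTH"}))
# ['add property or coded value to dictionary', 'add property or coded value to data requirement']
```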
[Figure: Data Cleansing Workbook (.xlsx) holding the data requirements, dictionary, rendering guides and organization identifiers.]
[Figure: Levels 1 to 4.]
Level 2 analysis
Potential Duplicate Identification Based on Class and Reference Data
Under this process, items are grouped by class, and the combination of the class and the reference data is used to identify potential duplicates. In NATO this is referred to as the SCREENING process. Partial reference data matching within a class is an efficient and reliable way to identify potential duplicates. The items identified by this process are marked as potential duplicates, and a report is generated. Duplicate resolution itself is a separate process that requires physical verification followed by resolution in the master data and procurement records. The report of potential duplicates based on class and reference data is typically one of the first indicators of the benefits that can be expected from a data cleansing project.
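A minimal Python sketch of this screening step groups items by class and a normalized form of their reference data. The guide does not define "partial" matching, so this sketch assumes it means ignoring case, spaces and punctuation; the records are hypothetical, and a real report would also compare the organization that issued each reference.

```python
import re
from collections import defaultdict

# Hypothetical records: material number, item class and reference (part number).
ITEMS = [
    {"material": "100001", "class": "BEARING", "reference": "AB-1234-X"},
    {"material": "100002", "class": "BEARING", "reference": "ab 1234 x"},
    {"material": "100003", "class": "BEARING", "reference": "AB-1235-X"},
    {"material": "100004", "class": "VALVE,BALL", "reference": "AB-1234-X"},
]


def reference_key(reference: str) -> str:
    """Assumed partial-match rule: ignore case, spaces and punctuation."""
    return re.sub(r"[^A-Z0-9]", "", reference.upper())


def potential_duplicates(items: list[dict[str, str]]) -> list[list[str]]:
    """Material numbers sharing both class and normalized reference."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for item in items:
        groups[(item["class"], reference_key(item["reference"]))].append(item["material"])
    return [sorted(m) for m in groups.values() if len(m) > 1]


print(potential_duplicates(ITEMS))  # [['100001', '100002']]
```

Item 100004 carries the same reference but is not flagged because its class differs, which is why the class is part of the grouping key.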
The combination of the report with unit price and minimum stock levels will provide a reliable
indication of the savings that can be expected from an inventory rationalization project.
The combination of the report with the Purchase Order transaction file will provide a reliable
indication of the savings that can be expected from a vendor rationalization project.
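One simple way to turn the duplicate report into an indication of inventory savings is sketched below. The formula is an assumption made for illustration, not a method stated in this guide: if each group of confirmed duplicates were merged, only the record with the highest minimum stock level would keep its stock, and the value of the other minimum stocks could be released.

```python
def releasable_stock_value(group: list[tuple[float, int]]) -> float:
    """group: (unit_price, minimum_stock) for each record believed to be the same item.
    Assumption: after merging, only the record with the highest minimum stock is kept."""
    total = sum(price * minimum for price, minimum in group)
    kept_price, kept_minimum = max(group, key=lambda record: record[1])
    return total - kept_price * kept_minimum


# Two records for the same bearing, with hypothetical prices and minimum stock levels.
print(releasable_stock_value([(10.0, 5), (12.0, 3)]))  # 36.0
```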
HOSE (0161-1#01-087529#1)
Property            Data type     Obligation
INTERIOR DIAMETER   numeric, mm   mandatory
LENGTH              numeric, m    mandatory
MATERIAL            text          optional
COLOR               text          optional
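A record can be checked against the HOSE data requirement with a few lines of Python. This minimal sketch assumes numeric values are already expressed in the unit the data requirement specifies, and it reports properties that the data requirement does not include; the function name and error messages are hypothetical.

```python
# The HOSE data requirement expressed as data (class identifier in full form).
HOSE_DR = {
    "class_id": "0161-1#01-087529#1",
    "properties": {
        "INTERIOR DIAMETER": {"type": "numeric", "uom": "mm", "mandatory": True},
        "LENGTH": {"type": "numeric", "uom": "m", "mandatory": True},
        "MATERIAL": {"type": "text", "uom": None, "mandatory": False},
        "COLOR": {"type": "text", "uom": None, "mandatory": False},
    },
}


def conformance_errors(record: dict, dr: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for prop, rule in dr["properties"].items():
        if prop not in record:
            if rule["mandatory"]:
                errors.append(f"missing mandatory property {prop}")
            continue
        value = record[prop]
        if rule["type"] == "numeric" and not isinstance(value, (int, float)):
            errors.append(f"{prop} must be numeric (in {rule['uom']})")
        if rule["type"] == "text" and not isinstance(value, str):
            errors.append(f"{prop} must be text")
    for prop in record:
        if prop not in dr["properties"]:
            errors.append(f"{prop} is not part of the data requirement")
    return errors


print(conformance_errors({"INTERIOR DIAMETER": 25, "LENGTH": 10.0}, HOSE_DR))  # []
print(conformance_errors({"INTERIOR DIAMETER": "25mm"}, HOSE_DR))
```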
The development of the data requirements will determine the cost of the data cleansing project. While there is a cost associated with poor quality master data that does not support the needs of a business, it is also possible to over-specify data requirements and, as a result, spend more than is necessary on data cleansing.
As we saw earlier, data requirements change over time and according to need. The best way to deal with this is to work to satisfy the most obvious and well-known of your current data requirements and accept that, as new requirements are identified, some of the descriptions will need to be reworked.
It is better to start with simple data requirements, as these will not only lower the cost of the data cleansing project but also make it much easier to keep the project on track.
[Figure: Structure of a NATO catalog record. NSN: item of supply. Segment A: identification guide and item name. Characteristic data (fit, form, function): Segment V (coded) and Segment M (clear text). Packaging data: Segment W. Identification data for the item of production: Segment C, giving the NCAGE and part number, with the NCAGE linked to a name and address.]
Beyond manufacturer resistance and suppliers' inability to provide data that had never been specified, the cost of extracting the data from source documents was prohibitive. Cataloging at source took a different approach by specifying exactly what data was needed. The process was extensively tested, and it demonstrated a substantial improvement in the quality of the data provided and in the speed with which it was provided; as a result, it lowered the cost of cataloging by 75%.
In March 2011, this resulted in the inclusion of the following clause in the standard that specifies the information exchange requirements for most material management functions commonly performed in supporting international projects:
The Contractor shall supply identification and characteristic data in accordance with ISO 8000-110:2009 on any of the selected items covered in his contract. Following an initial codification request as specified in section 3.2, the NATO Codification Bureau (NCB) shall present a list of the required properties in accordance with the US Federal Item Identification Guides (the US Federal Item Identification Guides are data requirements).
The process also demonstrated that suppliers and manufacturers welcomed the change because, for the first time, they were given visibility of exactly what data their customer wanted or needed. They preferred being asked for data to the alternative, where they had no visibility of what data was being collected or where it was being obtained from.
ISO 22745 was developed to support the cataloging at source process and to create what has become
known as the data supply chain, as illustrated in the following diagram.
Creating and managing a data supply chain is the single most important development in data cleansing. It is a recognition that the characteristic data essential in creating a structured master data record originates from outside the organization. Cataloging at source has, to a large degree, replaced the data extraction and research function performed by contractors, and it is the largest single contributor to reducing the cost of cataloging.
If your service provider is using automated web search tools such as web robots, also known as web wanderers, crawlers or spiders, you should require written confirmation that they are doing so ethically and legally. They should have a written policy in which they expressly agree to adhere to the robot exclusion rules defined in the robots.txt file on the target web site and to respect the rules governing the use of a third party's web site. These automated programs are used by search engines such as Google, Yahoo and Bing to index web content. Unfortunately, spammers also use them to scan for email addresses, and many companies use them to obtain data improperly; this is not only frowned upon but can be illegal and can be considered industrial espionage. If these automated search agents are not managed properly, they can also seriously disrupt the operation of a third party's web site.
Remember, the data cleansing company is working for you and is conducting research as your agent, so you should care about how they do their work.
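Honoring the robot exclusion rules is straightforward to check in code. The following minimal sketch uses Python's standard `urllib.robotparser` module; the robots.txt content, user agent name and URLs are hypothetical. Respecting robots.txt is only one part of the obligation, because a site's terms of use still apply.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against a site's robots.txt rules before fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for a supplier's web site.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(ROBOTS_TXT, "ExampleCatalogBot", "https://www.example.com/products/valve"))  # True
print(allowed_by_robots(ROBOTS_TXT, "ExampleCatalogBot", "https://www.example.com/private/prices"))  # False
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the site's own robots.txt instead of parsing a string.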
The following is a workflow that details the cataloging at source process.
[Figure: Cataloging at source workflow. Decision points include: Is the description sufficient to assign a class? Does a data requirement exist in the registry? Does the supplier have an ISO 22745 catalog? Does the supplier master data contain a technical point of contact email? Was a reply received? Is the data sufficient to order the item from a known supplier? The workflow also includes an on-line research or data extraction step.]