
6 Dimensions of Data Quality

Overview:
To be successful in business, you need to make decisions fast and based on the
right information.
While a business intelligence system makes it much simpler to analyze and report
on the data loaded into a data warehouse, the existence of data alone does not
ensure that executives can make decisions smoothly; the quality of the data is
equally important.
Consider a high-level meeting to review company performance: if two reports
compiled from supposedly the same set of data reflect two different revenue
figures, no one can know which figures are accurate, and important decisions may
be postponed while the truth is investigated.
One cause of data quality issues lies in the source data itself: data sources
can have scattered or misplaced values, outdated and duplicate records, and
inconsistent (or undefined) data standards and formats across customers,
products, transactions, financials and more. But perhaps the largest contributor
to data quality issues is that the data are entered, edited, maintained,
manipulated and reported on by people.
On the surface, it is obvious that data quality is about cleaning up bad data:
data that are missing, incorrect or invalid in some way. But to ensure data are
trustworthy, it is important to understand the key dimensions of data quality
and assess how the data are bad in the first place.

Reference: DAMA-UK-DQ-Dimensions-White-Paper-R37.pdf

Completeness: Completeness is defined as expected comprehensiveness. Data can
be complete even if optional data are missing: as long as the data meet
expectations, they are considered complete.
For example, a customer's first name and last name are mandatory but the middle
name is optional, so a record can be considered complete even if a middle name
is not available.
Questions you can ask yourself: Is all the requisite information available? Do
any data values have missing elements, or are they in an unusable state? In some
cases missing data are irrelevant, but when the missing information is critical
to a specific business process, completeness becomes an issue.
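The mandatory-versus-optional rule above can be sketched as a simple check. This is a minimal illustration, not IDQ's own implementation; the field names are assumptions.

```python
# Completeness check: mandatory fields must be non-empty, optional ones may be
# missing or blank. Field names are illustrative.
MANDATORY = {"first_name", "last_name"}

def is_complete(record):
    """A record is complete when every mandatory field has a non-empty value."""
    return all(record.get(field) for field in MANDATORY)

customers = [
    {"first_name": "Ada", "middle_name": "", "last_name": "Lovelace"},  # complete
    {"first_name": "Alan", "last_name": ""},                            # incomplete
]
print([is_complete(c) for c in customers])  # [True, False]
```

Note that the record with an empty middle name still counts as complete, matching the definition of completeness as expected (not total) comprehensiveness.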
Consistency: Consistency means that data across all systems reflect the same
information and are in sync with each other across the enterprise. Examples:
A business unit's status is closed, but there are sales recorded for that
business unit.
An employee's status is terminated, but the pay status is active.


Questions you can ask yourself: Are data values the same across the data sets? Are
there any distinct occurrences of the same data instances that provide conflicting
information?
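The terminated-employee example can be expressed as a cross-system rule. This is an illustrative sketch with assumed field values, not an IDQ feature.

```python
# Consistency check across two "systems": HR status vs. payroll status.
# Rule: an employee terminated in HR must not have active pay in payroll.
hr = {"E1": "active", "E2": "terminated"}
payroll = {"E1": "active", "E2": "active"}

def inconsistent_employees(hr_status, pay_status):
    """Return employee IDs that violate the terminated-but-paid rule."""
    return [emp for emp, status in hr_status.items()
            if status == "terminated" and pay_status.get(emp) == "active"]

print(inconsistent_employees(hr, payroll))  # ['E2']
```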
Conformity: Conformity means the data follow a set of standard data
definitions, such as data type, size and format. For example, a customer's date
of birth is in the format mm/dd/yyyy.
Questions you can ask yourself: Do all data values comply with the specified
formats? Maintaining conformance to those formats is important.
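A format rule like mm/dd/yyyy can be checked with a pattern. The regex below enforces the format only, not calendar validity (a stricter check could also parse the value with `datetime.strptime`); it is a sketch, not IDQ's validation logic.

```python
import re

# Conformity check: does a date-of-birth string follow mm/dd/yyyy?
DOB_FORMAT = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$")

def conforms(dob):
    """True when the string matches the mm/dd/yyyy format."""
    return bool(DOB_FORMAT.match(dob))

print(conforms("07/04/1990"))  # True
print(conforms("1990-07-04"))  # False
print(conforms("13/01/2000"))  # False (no month 13)
```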
Accuracy: Accuracy is the degree to which data correctly reflect the real-world
object or event being described. Examples:
Incorrect spellings of product or person names, addresses, and even untimely or
out-of-date data can impact operational and analytical applications.
The sales figures of a business unit reflect its real sales.
The address of an employee in the employee database is the employee's real
address.
Questions you can ask yourself: Do data objects accurately represent the
real-world values they are expected to model? Are there incorrect spellings of
product or person names, wrong addresses, or untimely or out-of-date data?
These issues can impact operational and analytical applications.
Duplication: Are there multiple, unnecessary representations of the same data
objects within your data set? The inability to maintain a single representation for
each entity across your systems poses numerous vulnerabilities and risks.
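Duplicate analysis scores the similarity between records rather than requiring exact matches. A minimal field-matching sketch using Python's standard-library `difflib` is shown below; the 0.85 threshold is an illustrative assumption, and IDQ's own matching transformations use their own algorithms.

```python
from difflib import SequenceMatcher

# Field-matching sketch: score name-field similarity and flag pairs above a
# threshold as likely duplicates.
records = ["John Smith", "Jon Smith", "Mary Jones"]

def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity ratio meets the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

print(likely_duplicates(records))  # [('John Smith', 'Jon Smith')]
```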
Integrity: Integrity means the validity of data across relationships; it
ensures that all data in a database can be traced and connected to other data.
For example, in a customer database there should be valid customers, valid
addresses, and a relationship between them. If address data exist without a
related customer, that data is not valid and is considered an orphaned record.
Ask yourself: Are any data missing important relationship linkages? The
inability to link related records together may actually introduce duplication
across your systems.
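The orphaned-record example amounts to a referential-integrity check: find child rows whose foreign key has no parent. A sketch with illustrative table and column names:

```python
# Referential-integrity sketch: find address rows whose customer_id has no
# matching customer (orphaned records).
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
addresses = [
    {"customer_id": 1, "city": "London"},
    {"customer_id": 9, "city": "Leeds"},  # orphan: no customer with id 9
]

def orphaned(addresses, customers):
    """Return address records that reference a non-existent customer."""
    valid_ids = {c["id"] for c in customers}
    return [a for a in addresses if a["customer_id"] not in valid_ids]

print(orphaned(addresses, customers))  # [{'customer_id': 9, 'city': 'Leeds'}]
```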
Timeliness: Timeliness references whether information is available when it is
expected and needed. Timeliness of data is very important. This is reflected in:
Companies that are required to publish their quarterly results within a given
time frame
Customer service providing up-to-date information to customers
A credit system checking credit card account activity in real time
Timeliness depends on user expectation. Online availability of data could be
required for a room-allocation system in hospitality, but nightly data could be
perfectly acceptable for a billing system.
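Because the acceptable freshness window varies by use case (seconds for a credit check, a day for a billing system), a timeliness check is usually parameterized by a maximum age. A minimal sketch, with assumed thresholds:

```python
from datetime import datetime, timedelta, timezone

# Timeliness sketch: flag records older than the expected freshness window.
def is_timely(last_updated, max_age):
    """True when the record was updated within the allowed window."""
    return datetime.now(timezone.utc) - last_updated <= max_age

fresh = datetime.now(timezone.utc) - timedelta(hours=1)
stale = datetime.now(timezone.utc) - timedelta(days=3)
print(is_timely(fresh, timedelta(hours=24)))  # True  (fine for nightly billing)
print(is_timely(stale, timedelta(hours=24)))  # False (too old even for billing)
```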
----------------------------------------------------------------------
IDQ Functionality:
IDQ (Informatica Data Quality) is used from a data quality analysis
perspective: it provides details on a subset of data and its attributes,
showing what is wrong and what is being used. It can generate various levels of
reports and graphs based on the data, and is used for understanding, acting
and reporting.
Use the IDQ to design and run processes to complete the following tasks:
Profile data: Profiling reveals the content and structure of data.
Profiling is a key step in any data project, as it can identify strengths and
weaknesses in data and help you define a project plan.
Create scorecards to review data quality: A scorecard is a graphical
representation of the quality measurements in a profile.
Standardize data values: Standardize data to remove errors and inconsistencies
that you find when you run a profile. You can standardize variations in punctuation,
formatting, and spelling. For example, you can ensure that the city, state, and ZIP
code values are consistent.
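IDQ performs standardization with its own transformations and reference data; the idea can be sketched in a few lines, with a small lookup table standing in for a reference table (the entries below are illustrative assumptions).

```python
# Standardization sketch: collapse punctuation, casing, and spelling variants
# of a state value to one canonical form via a reference lookup.
STATE_VARIANTS = {"calif": "CA", "california": "CA", "ca": "CA"}

def standardize_state(value):
    """Normalize whitespace, case, and trailing periods, then map variants."""
    key = value.strip().lower().rstrip(".")
    return STATE_VARIANTS.get(key, value.strip().upper())

print(standardize_state(" Calif. "))    # 'CA'
print(standardize_state("california"))  # 'CA'
print(standardize_state("ny"))          # 'NY' (no variant entry; uppercased)
```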
Parse data: Parsing reads a field composed of multiple values and creates a field
for each value according to the type of information it contains. Parsing can also add
information to records. For example, you can define a parsing operation to add units
of measurement to product data.
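The units-of-measurement example can be illustrated with a small parser that splits one composite field into two. The regex and field names are assumptions for illustration, not IDQ's Parser transformation.

```python
import re

# Parsing sketch: read a field holding multiple values ("500 ml") and split it
# into separate quantity and unit-of-measure fields.
PRODUCT_SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")

def parse_size(raw):
    """Split a product-size string into quantity and unit fields."""
    match = PRODUCT_SIZE.match(raw)
    if not match:
        return {"quantity": None, "unit": None}
    return {"quantity": float(match.group(1)), "unit": match.group(2).lower()}

print(parse_size("500 ml"))  # {'quantity': 500.0, 'unit': 'ml'}
print(parse_size("1.5L"))    # {'quantity': 1.5, 'unit': 'l'}
```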
Validate postal addresses: Address validation evaluates and enhances the
accuracy and deliverability of postal address data. Address validation corrects errors
in addresses and completes partial addresses by comparing address records against
address reference data from national postal carriers. Address validation can also
add postal information that speeds mail delivery and reduces mail costs.
Find duplicate records: Duplicate analysis calculates the degrees of similarity
between records by comparing data from one or more fields in each record. You
select the fields to be analyzed, and you select the comparison strategies to apply
to the data. The Developer tool enables two types of duplicate analysis: field
matching, which identifies similar or duplicate records, and identity matching, which
identifies similar or duplicate identities in record data.
Create reference data tables: Informatica provides reference data that can
enhance several types of data quality process, including standardization and
parsing. You can create reference tables using data from profile results.

Create and run data quality rules: Informatica provides rules that you can run
or edit to meet your project objectives. You can create mapplets and validate
them as rules in the Developer tool.
Collaborate with Informatica users: The Model repository stores reference data
and rules, and this repository is available to users of the Developer tool and
Analyst tool. Users can collaborate on projects, and different users can take
ownership of objects at different stages of a project.

Export mappings to PowerCenter: You can export mappings to PowerCenter to
reuse the metadata for physical data integration or to create web services.


----------------------------------------------------------------------
IDQ Questions:

IDQ Pros and Cons


https://www.trustradius.com/products/informatica-data-quality/reviews

What are the most used transformations in IDQ?


What is address doctor?
Can we export an object from IDQ to the PowerCenter tool? If yes, how?
What is a reference table?
In IDQ, is it possible to create user-defined reference tables? In what
circumstances can they be required?
What is a parser transformation?
What is the functionality of labeler transformation?
How do you export all object profiles at once?
Does IDQ have an emailing system like PowerCenter?

How can we publish IDQ SSR results on the Intranet/Web?


What types of IDQ plans can be exported as mapplets to PowerCenter?
How do you add many physical data objects at one time?
Is there a way we can parameterize Notifications Recipients list in Exception task
inside a Human Task?
Can we use Oracle tables as reference tables in IDQ?
How do you check in Informatica Data Quality which fields in a range are
unique?
What algorithms do tools such as DataFlux or Informatica use to remove
duplicates?
How do you include your mapping output variables in the workflow without having
to use a Human task?
----------------------------------------------------------------------
Other topics:
Persistent cache in lookup
Bulk vs. normal load
Constraint-based load ordering
