
SAP Data Services Transforms

Hi Everyone,

SAP Data Services is one of the finest ETL (Extract, Transform, Load) tools. It delivers a
single enterprise-class solution for data integration, data quality, data profiling, and text
data processing that allows you to integrate, transform, improve, and deliver trusted data to
critical business processes.

SAP Data Services transforms are built-in system objects stored in the repository, which are
used whenever we want to transform data from source(s) to target(s).

The transforms can be found under the Transforms tab of the Local Object Library, which
provides access to all repository objects (built-in or user-built).

The transforms are broadly classified into the four categories below. Expanding each
category shows the list of transforms it contains.

1. Data Integrator
2. Data Quality
3. Platform
4. Text Data Processing

1. Data Integrator Transforms:

The Data Integrator transforms are:

 Data_Transfer
 Date_Generation
 Effective_Date
 Hierarchy_Flattening
 History_Preserving
 Key_Generation
 Map_CDC_Operation
 Pivot (Columns to Rows)
 Reverse Pivot (Rows to Columns)
 Table_Comparison
 XML_Pipeline

Let's look at each of the transforms in the Data Integrator category in more
detail.

Data Transfer:

 This transform writes the data from a source or the output from another transform
into a transfer object and subsequently reads data from the transfer object.
 The transfer type can be a relational database table or file.
 We can use the Data_Transfer transform to push down operations to the database
server when the transfer type is a database table.
 We can also push down resource-consuming operations such as joins, GROUP
BY, and sorts using this transform.
 Please go through the article ‘Data Transfer transform in SAP Data Services’ to know
more details about this transform.

Date Generation:

 This transform produces a series of dates incremented as we specify in the transform
settings.
 We use this transform to produce the key values for a time dimension target.
 From this generated sequence we can also populate other fields in the time
dimension (such as day_of_week) using functions in a query transform.
 Please go through the article ‘Date Generation transform in SAP Data Services’ to
know more details about this transform.
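
To make the idea concrete, here is a minimal sketch in plain Python (not Data Services code) of generating a date series and deriving a day_of_week field from it. The function name generate_dates and the field names date_key and day_of_week are assumptions chosen for this illustration only.

```python
from datetime import date, timedelta

def generate_dates(start, end, step_days=1):
    """Yield dates from start to end (inclusive), step_days apart."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)

# Build time-dimension rows: the generated date acts as the key and the
# other fields are derived from it, much as a Query transform would do.
time_dim = [
    {"date_key": d, "day_of_week": d.isoweekday()}  # Monday=1 ... Sunday=7
    for d in generate_dates(date(2024, 1, 1), date(2024, 1, 7))
]
```

Changing step_days gives a weekly or other fixed-interval series in this sketch.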

Effective Date:

 This transform is used to calculate an “effective-to” value for data that contains an
effective date.
 The calculated effective-to date and an existing effective date produce a date range
that allows queries based on effective dates to produce meaningful results.
 Please go through the article ‘Effective Date transform in SAP Data Services’ to know
more details about this transform.
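
The calculation can be sketched in plain Python as follows. This is only an illustration of the concept, not the transform's implementation: the column names key and effective_date, the function name add_effective_to, and the far-future default date are all assumptions.

```python
from datetime import date

def add_effective_to(rows, default_to=date(9000, 12, 31)):
    """Return copies of rows with an effective_to value added.

    Within each key, a row's effective_to is the effective_date of the
    next row for that key; the latest row gets default_to (an assumed
    far-future placeholder).
    """
    ordered = sorted(rows, key=lambda r: (r["key"], r["effective_date"]))
    result = []
    for i, row in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        same_key = nxt is not None and nxt["key"] == row["key"]
        effective_to = nxt["effective_date"] if same_key else default_to
        result.append({**row, "effective_to": effective_to})
    return result

prices = [
    {"key": "A", "effective_date": date(2024, 6, 1)},
    {"key": "A", "effective_date": date(2024, 1, 1)},
    {"key": "B", "effective_date": date(2024, 3, 1)},
]
ranged = add_effective_to(prices)
```

With both dates present, a query such as "which row was valid on a given day" reduces to a simple range check.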

Hierarchy Flattening:

 This transform constructs a complete hierarchy from parent/child relationships, then
produces a description of the hierarchy in vertically or horizontally flattened format,
which can be used to build star schema models in a data warehouse environment.
 Please go through the article ‘Hierarchy Flattening transform in SAP Data Services’ to
know more details about this transform.
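
As a rough sketch of what vertical flattening means, the plain-Python function below expands parent/child pairs into one row per ancestor/descendant pair, together with the depth between them. The function name, the output layout, and the assumption that the input contains no cycles are all mine; the transform's real output has more columns.

```python
def flatten_vertically(edges):
    """Turn (parent, child) pairs into (ancestor, descendant, depth) rows.

    Assumes the hierarchy has no cycles.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    rows = []

    def walk(ancestor, node, depth):
        # Record every node reachable from `ancestor`, one level at a time.
        for child in children.get(node, []):
            rows.append((ancestor, child, depth + 1))
            walk(ancestor, child, depth + 1)

    nodes = {name for edge in edges for name in edge}
    for node in nodes:
        walk(node, node, 0)
    return sorted(rows)

org_chart = [("CEO", "VP"), ("VP", "Manager")]
flattened = flatten_vertically(org_chart)
```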

History Preserving:

 The History_Preserving transform allows us to produce a new row in our target
rather than updating an existing row.
 We can indicate in which columns the transform identifies changes to be preserved.
 If the value of certain columns changes, this transform creates a new row for each
row flagged as UPDATE in the input data set.
 Please go through the article ‘History Preserving transform in SAP Data Services’ to
know more details about this transform.

Key Generation:

 This transform is used to generate new keys for new rows/records in a data set.
 When it is necessary to generate artificial keys in a table, the Key_Generation
transform looks up the maximum existing key value from a table and uses it as the
starting value to generate new keys.
 The transform expects the generated key column to be part of the input schema.
 Please go through the article ‘Key Generation transform in SAP Data Services’ to
know more details about this transform.
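
The behaviour described above can be sketched in a few lines of plain Python. This is an illustration under assumed names (generate_keys, the id column, and an increment of 1), not the transform itself.

```python
def generate_keys(new_rows, existing_keys, key_column="id", increment=1):
    """Assign surrogate keys to new_rows, continuing from the maximum existing key."""
    next_key = max(existing_keys, default=0)  # start from 0 if the table is empty
    keyed = []
    for row in new_rows:
        next_key += increment
        keyed.append({**row, key_column: next_key})
    return keyed

keyed_rows = generate_keys(
    [{"name": "x"}, {"name": "y"}],
    existing_keys=[1, 5, 3],
)
```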

Map CDC Operation:

 Using its input requirements (values for the Sequencing column and a Row operation
column), the Map CDC Operation transform performs the following functions:
o Sorts input data based on values in the Sequencing column drop-down list and
(optionally) the Additional grouping columns box.
o Maps output data based on values in the Row operation column drop-down list.
Source table rows are mapped to INSERT, UPDATE, or DELETE operations
before passing them on to the target.
o Resolves missing, separated, or multiple before- and after-images for UPDATE
rows.
o Allows you to filter columns and view UPDATE rows prior to running the job.
 While commonly used to support relational or mainframe changed-data capture
(CDC), this transform supports any data stream as long as its input requirements are
met. Relational CDC sources include Oracle and SQL Server.
 This transform is typically the last object before the target in a data flow because it
produces INSERT, UPDATE and DELETE operation codes. Data Services produces a
warning if other objects are used.
 Please go through the article ‘Map CDC Operation transform in SAP Data Services’ to
know more details about this transform.

Pivot (Columns to Rows):

 This transform creates a new row for each value in a column that we identify as a
pivot column.
 The Pivot transform allows us to change how the relationship between rows is
displayed.
 For each value in each pivot column, Data Services produces a row in the output
data set. We can also create pivot sets to specify more than one pivot column.
 Please go through the article ‘Pivot (Columns to Rows) transform in SAP Data
Services’ to know more details about this transform.
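
A small plain-Python sketch of the columns-to-rows idea follows. The function name and the output column names (pivot_seq, pivot_hdr, pivot_data) are assumptions for this example; check the transform editor for the names used in your version.

```python
def pivot(rows, non_pivot, pivot_columns):
    """Emit one output row per pivot column for every input row."""
    out = []
    for row in rows:
        for seq, col in enumerate(pivot_columns, start=1):
            new_row = {name: row[name] for name in non_pivot}
            new_row["pivot_seq"] = seq        # position of the pivot column
            new_row["pivot_hdr"] = col        # name of the original column
            new_row["pivot_data"] = row[col]  # value of the original column
            out.append(new_row)
    return out

wide = [{"emp": "E1", "salary": 100, "bonus": 10}]
tall = pivot(wide, non_pivot=["emp"], pivot_columns=["salary", "bonus"])
```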

Reverse Pivot (Rows to Columns):

 This transform creates one row of data from several existing rows.
 The Reverse Pivot transform allows us to combine data from several rows into one
row by creating new columns.
 For each unique value in a pivot axis column and each selected pivot column, Data
Services produces a column in the output data set.
 Please go through the article ‘Reverse Pivot (Rows to Columns) transform in SAP
Data Services’ to know more details about this transform.
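
The rows-to-columns direction can be sketched in plain Python as below. The function name and the parameter names group_by, axis_col and value_col are hypothetical; they only mirror the concepts of non-pivot columns, the pivot axis column and the pivoted column.

```python
def reverse_pivot(rows, group_by, axis_col, value_col):
    """Combine several rows into one row per group, creating a column
    for each distinct value found in axis_col."""
    grouped = {}
    for row in rows:
        key = tuple(row[name] for name in group_by)
        # Start each output row with the grouping columns.
        out = grouped.setdefault(key, dict(zip(group_by, key)))
        out[row[axis_col]] = row[value_col]
    return list(grouped.values())

tall_rows = [
    {"emp": "E1", "item": "salary", "amount": 100},
    {"emp": "E1", "item": "bonus", "amount": 10},
    {"emp": "E2", "item": "salary", "amount": 200},
]
wide_rows = reverse_pivot(tall_rows, group_by=["emp"], axis_col="item", value_col="amount")
```

In this sketch a group with no row for a given axis value simply lacks that column; a real job would normally supply a default value instead.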

Table Comparison:

 This transform compares two data sets and produces the difference between them as
a data set with rows flagged as INSERT, UPDATE, or DELETE.
 The Table_Comparison transform allows us to detect and forward changes that have
occurred since the last time a target was updated.
 Note that in order to use the Table_Comparison transform with Teradata 13 and
later tables as the comparison table and target table, we must do the following
things:
o On the Teradata server, set the General parameter DBSControl to TRUE to
allow uncommitted data to be read.
o In the Data Services Teradata datastore, add the following statement in the
“Additional session parameters” field:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
 Please go through the article ‘Table Comparison transform in SAP Data Services’ to
know more details about this transform.
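
The comparison logic can be illustrated with the following plain-Python sketch. It assumes a single key column and compares whole rows, and it always reports deletions; the function name and these simplifications are mine, not the transform's actual options.

```python
def compare_tables(input_rows, comparison_rows, key):
    """Flag each difference between the input and the comparison table."""
    existing = {row[key]: row for row in comparison_rows}
    seen = set()
    changes = []
    for row in input_rows:
        seen.add(row[key])
        old = existing.get(row[key])
        if old is None:
            changes.append(("INSERT", row))   # key not in comparison table
        elif old != row:
            changes.append(("UPDATE", row))   # key exists, values differ
    for existing_key, old in existing.items():
        if existing_key not in seen:
            changes.append(("DELETE", old))   # key missing from the input
    return changes

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 4, "name": "d"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
changes = compare_tables(source, target, key="id")
```

Unchanged rows (id 1 here) produce no output, so only the differences are forwarded.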

XML Pipeline:

 This transform is used to process large XML files one instance of a repeatable
structure at a time.
 With this transform, Data Services does not need to read the entire XML input into
memory and then build an internal data structure before performing the transformation.
 An NRDM (nested relational data model) structure is not required to represent the
entire XML data input. Instead,
the XML_Pipeline transform uses a portion of memory to process each instance of a
repeatable structure, then continually releases and reuses memory to steadily flow
XML data through the transform.
 During execution, Data Services pushes operations of the XML_Pipeline transform to
the XML source.
 Please go through the article ‘XML Pipeline transform in SAP Data Services’ to know
more details about this transform.
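
The same streaming idea exists in Python's standard library, which makes for a convenient analogy. The sketch below uses xml.etree.ElementTree.iterparse to handle one repeating element at a time and then release it; the sample XML, the tag name order and the function name stream_items are made up for this illustration and have nothing to do with Data Services internals.

```python
import io
import xml.etree.ElementTree as ET

def stream_items(source, tag):
    """Yield one dict per repeating element, without keeping the
    already-processed elements' content in memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the element's children once it is processed

xml_data = io.BytesIO(
    b"<orders>"
    b"<order><id>1</id><qty>5</qty></order>"
    b"<order><id>2</id><qty>3</qty></order>"
    b"</orders>"
)
orders = list(stream_items(xml_data, "order"))
```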

The image below shows all the transforms available in the Data Integrator category.

2. Data Quality Transforms:

The Data Quality transforms are:

 Associate
 Country_ID
 Data_Cleanse
 DSF2_Walk_Sequencer
 Geocoder
 Global_Address_Cleanse
 Global_Suggestion_List
 Match
 USA_Regulatory_Address_Cleanse
 User_Defined

Let's look at each of the transforms in the Data Quality category in more
detail.

Associate:

 The Associate transform works downstream from Match transforms to provide a way
to combine, or associate, their match results by using the Match transform-generated
Group Number fields.
 We may need to add a Group Statistics operation to the Associate transform to
gather match statistics.
 You can combine the results of two or more Match transforms, two or more Associate
transforms, or any combination of the two.
 For example, we may use one Match transform to match on name and address, use
a second Match transform to match on SSN, and then use an Associate transform to
combine the match groups produced by the two Match transforms.
 Please go through the article ‘Associate transform in SAP Data Services’ to know
more details about this transform.
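
One way to picture the name/address plus SSN example above is as a union of overlapping groups. The plain-Python sketch below uses a small union-find structure for that; it is my own illustration of the concept, with hypothetical record IDs and group numbers, and says nothing about how the transform is implemented.

```python
def associate(group_sets):
    """Combine match groups from several match passes.

    group_sets: one dict per match pass, mapping record id -> group
    number (None when the record matched nothing in that pass).
    Returns a dict mapping record id -> representative record id of
    its association group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for pass_index, groups in enumerate(group_sets):
        first_in_group = {}
        for record, group in groups.items():
            find(record)  # register the record even if it is unmatched
            if group is None:
                continue
            anchor = first_in_group.setdefault((pass_index, group), record)
            union(record, anchor)
    return {record: find(record) for record in list(parent)}

by_name_address = {1: 10, 2: 10, 3: None, 4: None}
by_ssn = {1: None, 2: 7, 3: 7, 4: None}
association = associate([by_name_address, by_ssn])
```

Records 1 and 2 match on name and address, records 2 and 3 match on SSN, so all three end up in one association group while record 4 stays on its own.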

Country ID:

 The Country ID transform parses our input data and then identifies the country of
destination for each record.
 After identifying the country, the transform can output the country name, any of
three different ISO country codes, an ISO script code, and a percentage of
confidence in the assignment.
 Though we can use the Country ID transform before any transform in a data flow, we
will probably find it most useful during a transactional address cleanse job.
 Place the Country ID transform before the Global Suggestion List transform. The
Global Suggestion List transform needs the ISO_Country_Code_2Char field that the
Country ID transform can output.
 It is not necessary to use the Country ID transform before the Global Address
Cleanse transform in a data flow because the Global Address Cleanse transform
contains its own Country ID processing.
 It is also not necessary to use the Country ID transform before the USA Regulatory
Address Cleanse transform because the input data should contain U.S. addresses
only.
 Please go through the article ‘Country ID transform in SAP Data Services’ to know
more details about this transform.

Data Cleanse:

 Use the Data Cleanse transform to parse and format custom or person and firm data
as well as phone numbers, dates, e-mail addresses, and Social Security numbers.
 Custom data includes operational or product data specific to the business.
 The cleansing package we specify defines how our data should be parsed and
standardized.
 Within a data flow, the Data Cleanse transform is typically placed after the address
cleansing process and before the matching process.
 Please go through the article ‘Data Cleanse transform in SAP Data Services’ to know
more details about this transform.

DSF2 Walk Sequencer:

 To add walk sequence information to our data, include the DSF2 Walk Sequencer
transform in the data flow. We can then send our data through presorting software
to qualify for the following walk-sequence discounts:
o Carrier Route
o Walk Sequence
o 90% Residential Saturation
o 75% Total Active Saturation
 DSF2 walk sequencing is often called “pseudo” sequencing because it mimics USPS
walk sequencing.
 Where USPS walk-sequence numbers cover every address, DSF2 walk sequence
processing provides “pseudo” sequence numbers for the addresses only in that
particular file.
 Please go through the article ‘DSF2 Walk Sequencer transform in SAP Data
Services’ to know more details about this transform.

Geocoder:

 The Geocoder transform uses geographic coordinates expressed as latitude and
longitude, addresses, and point-of-interest (POI) data. Using the transform, we can
append addresses, latitude and longitude, census data (US only), and other
information to the data.
 Based on mapped input fields, the Geocoder transform has three modes of geocode
processing:
o Address Geocoding
o Reverse Geocoding
o POI textual search
 Please go through the article ‘Geocoder transform in SAP Data Services’ to know
more details about this transform.

Global Address Cleanse:

 The Global Address Cleanse transform identifies, parses, validates, and corrects
global address data, such as primary number, primary name, primary type,
directional, secondary identifier, secondary number, locality, region and postcode.
 Note: The Global Address Cleanse transform does not support CASS certification or
produce a USPS Form 3553. If you want to certify your U.S. address data, you must
use the USA Regulatory Address Cleanse transform, which supports CASS.
 If we perform both address cleansing and data cleansing, the Global Address Cleanse
transform typically comes before the Data Cleanse transform in the data flow.
 Please go through the article ‘Global Address Cleanse transform in SAP Data
Services’ to know more details about this transform.

Global Suggestion List:

 The Global Suggestion List transform queries addresses with minimal data, and it can
offer suggestions for possible matches. It is a beneficial research tool for managing
unassigned addresses from a batch process.
 Global Suggestion List functionality is designed to be integrated into our own custom
applications via the Web Service.
 The Global Suggestion List transform requires the two character ISO country code on
input. Therefore, we may want to place a transform, such as the Country ID
transform, that will output the ISO_Country_Code_2Char field before the Global
Suggestion List transform.
 The Global Suggestion List transform is available for use with the Canada, Global
Address, and USA engines.
 Please go through the article ‘Global Suggestion List transform in SAP Data
Services’ to know more details about this transform.

Match:

 The Match transform is responsible for performing matching based on the business
rules we define. The transform then sends matching and unique records on to the
next transform in the data flow.
 For best results, the data in which we are attempting to find matches should be
cleansed. Therefore, we may need to include other Data Quality transforms before
the Match transform.
 Please go through the article ‘Match transform in SAP Data Services’ to know more
details about this transform.

USA Regulatory Address Cleanse:

 The USA Regulatory Address Cleanse transform identifies, parses, validates, and
corrects U.S. address data according to the U.S. Coding Accuracy Support System
(CASS).
 This transform can create the USPS Form 3553 and output many useful codes to our
records. We can also run in a non-certification mode as well as produce suggestion
lists.
 If we perform both data cleansing and matching, the USA Regulatory Address
Cleanse transform typically comes before the Data Cleanse transform and any of the
Match transforms in the data flow.
 SAP recommends using a sample job or data flow that is set up according to best
practices for a specific use case.
 Please go through the article ‘USA Regulatory Address Cleanse transform in SAP Data
Services’ to know more details about this transform.

User Defined:

 The User-Defined transform provides us with custom processing in a data flow using
the full Python scripting language.
 The applications for the User-Defined transform are nearly limitless. It can do just
about anything that we can write Python code to do.
 We can use the User-Defined transform to generate new records, populate a field
with a specific value, create a file, connect to a website, or send an email, just to
name a few possibilities.
 We can place this transform anywhere in our data flow. If we have created our own
transform, then the only restrictions about where it can be located in the data flow
are those which we place on it.
 Although the User-Defined transform is quite flexible and powerful, we will find that
many of the tasks we want to perform can be accomplished with the Query
transform.
 The Query transform is generally more scalable and faster, and uses less memory
than User-Defined transforms.
 Please go through the article ‘User Defined transform in SAP Data Services’ to know
more details about this transform.

The image below shows all the transforms available in the Data Quality category.

3. Platform Transforms:

The Platform transforms are:

 Case
 Data_Mask
 Map_Operation
 Merge
 Query
 Row_Generation
 SQL
 Validation
 XML_Map

Let's look at each of the transforms in the Platform category in more
detail.

Case:

 This transform specifies multiple paths in a single transform (different rows are
processed in different ways).
 The Case transform simplifies branch logic in data flows by consolidating case or
decision making logic in one transform. Paths are defined in an expression table.
 Please go through the article ‘Case transform in SAP Data Services’ to know more
details about this transform.
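
Conceptually, the expression table is a list of labelled conditions, and each row is sent down the path whose condition it satisfies. The plain-Python sketch below shows that idea; the labels, the default path, and the choice to stop at the first matching case are assumptions made for this example.

```python
def route_rows(rows, cases, default_label="default"):
    """Send each row to the output of the first case whose condition is true."""
    outputs = {label: [] for label, _ in cases}
    outputs[default_label] = []
    for row in rows:
        for label, condition in cases:
            if condition(row):
                outputs[label].append(row)
                break  # assumed: a row follows one path only
        else:
            outputs[default_label].append(row)  # no case matched
    return outputs

sales = [
    {"region": "EMEA", "amount": 50},
    {"region": "APJ", "amount": 5000},
    {"region": "AMER", "amount": 10},
]
paths = route_rows(
    sales,
    cases=[
        ("emea", lambda r: r["region"] == "EMEA"),
        ("high_value", lambda r: r["amount"] > 1000),
    ],
)
```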

Data Mask:

 The Data Mask transform enables us to protect personally identifiable information in
our data.
 Personal information includes data such as credit card numbers, salary information,
birth dates, personal identification numbers, or bank account numbers.
 We may want to use data masking to support security and privacy policies, and to
protect our customer or employee information from possible theft or exploitation.
 Please go through the article ‘Data Mask transform in SAP Data Services’ to know
more details about this transform.
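
As a simple illustration of one masking style (hiding all but the last few characters), here is a plain-Python sketch. The function name mask_keep_last and its parameters are hypothetical and are not options of the Data Mask transform.

```python
def mask_keep_last(value, visible=4, mask_char="*"):
    """Mask every letter or digit except the last `visible` ones,
    leaving separators such as dashes in place."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    masked = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)

masked_card = mask_keep_last("4111-1111-1111-1234")
```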

Map Operation:

 This transform modifies data based on mapping expressions and current operation
codes. The operation codes can be converted between data manipulation operations.
 Writing map expressions per column and per row type (INSERT/UPDATE/DELETE)
allows us to:
o Change the value of data for a column.
o Execute different expressions on a column, based on its input row type.
o Use the before_image function to access the before image value of an
UPDATE row.
 Please go through the article ‘Map Operation transform in SAP Data Services’ to
know more details about this transform.
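
The operation-code conversion can be pictured with the plain-Python sketch below, where each row carries an operation code and a mapping decides what it becomes. The function name and the tuple layout are assumptions; DISCARD is used here to mean "drop the row".

```python
def map_operation(rows, mapping):
    """Convert operation codes according to mapping; drop rows mapped to DISCARD.

    rows: list of (operation_code, row_data) tuples.
    mapping: dict of input code -> output code; unmapped codes pass through.
    """
    out = []
    for op, row in rows:
        new_op = mapping.get(op, op)
        if new_op == "DISCARD":
            continue
        out.append((new_op, row))
    return out

flagged = [("INSERT", {"id": 1}), ("UPDATE", {"id": 2}), ("DELETE", {"id": 3})]
# Example: treat updates as inserts (e.g. for an audit table) and ignore deletes.
remapped = map_operation(flagged, {"UPDATE": "INSERT", "DELETE": "DISCARD"})
```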

Merge:

 This transform combines incoming data sets, producing a single output data set with
the same schema as the input data sets.
 Please go through the article ‘Merge transform in SAP Data Services’ to know more
details about this transform.

Query:

 The Query transform retrieves a data set that satisfies conditions that we specify.
 A Query transform is similar to a SQL SELECT statement.
 Please go through the article ‘Query transform in SAP Data Services’ to know more
details about this transform.

Row Generation:

 This transform produces a data set with a single column.
 The column values start with the number that we set in the ‘Row number starts at’
option. The value then increments by one to a specified number of rows.
 Please go through the article ‘Row Generation transform in SAP Data Services’ to
know more details about this transform.

SQL:

 This transform performs the indicated SQL query operation. Use this transform to
perform standard SQL operations when other built-in transforms cannot perform
them.
 The options for the SQL transform include specifying a datastore, join rank, cache,
array fetch size, and entering SQL text.
 Note: The SQL transform supports a single SELECT statement only.
 Please go through the article ‘SQL transform in SAP Data Services’ to know more
details about this transform.

Validation:

 The Validation transform qualifies a data set based on rules for input schema
columns.
 We can apply multiple rules per column or bind a single reusable rule (in the form of
a validation function) to multiple columns.
 The Validation transform can identify the row, column, or columns for each validation
failure. We can also use the Validation transform to filter or replace (substitute) data
that fails our criteria.
 When we enable a validation rule for a column, a check mark appears next to it in
the input schema.
 Please go through the article ‘Validation transform in SAP Data Services’ to know
more details about this transform.
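
To show the pass/fail split in miniature, here is a plain-Python sketch in which each rule is a function bound to a column. The function name validate, the failed_columns field, and the sample rules are assumptions for this example only.

```python
def validate(rows, rules):
    """Split rows into those that pass every rule and those that fail.

    rules: list of (column_name, rule_function) pairs; the same
    rule_function can be bound to several columns.
    """
    passed, failed = [], []
    for row in rows:
        failed_columns = [col for col, rule in rules if not rule(row[col])]
        if failed_columns:
            failed.append({**row, "failed_columns": failed_columns})
        else:
            passed.append(row)
    return passed, failed

def not_blank(value):
    """Reusable rule: value must be a non-empty string."""
    return isinstance(value, str) and value.strip() != ""

customers = [
    {"name": "Ann", "city": "Pune", "age": 30},
    {"name": "", "city": "Oslo", "age": -1},
]
rules = [
    ("name", not_blank),
    ("city", not_blank),              # same rule bound to a second column
    ("age", lambda v: v >= 0),
]
good_rows, bad_rows = validate(customers, rules)
```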

XML Map:

 The XML_Map transform is a data transform engine designed for hierarchical data. It
provides functionality similar to a typical XQuery or XSLT engine.
 The XML_Map transform takes one or more source data sets and produces a single
target data set. Flat data structures such as database tables or flat files are also
supported as both source and target data sets.
 We can use the XML_Map transform to perform a variety of tasks. For example:
o We can create a hierarchical target data structure such as XML or IDoc from a
hierarchical source data structure.
o We can create a hierarchical target data structure based on data from flat
tables.
o We can create a flat target data set such as a database table from data in a
hierarchical source data structure.
 The XML_Map transform works in two modes: Normal and Batch.
 In normal mode, data is handled on a row-by-row basis before it is sent to the next
transform.
 In batch mode, data is handled as a block of rows before it is sent to the next
transform.
 There are different transform icons to indicate each mode.
 Please go through the article ‘XML Map transform in SAP Data Services’ to know
more details about this transform.

The image below shows all the transforms available in the Platform category.

4. Text Data Processing Transforms:

The Text Data Processing transforms are:

 Entity_Extraction

Let's look at the transform in the Text Data Processing category in more
detail.

Entity Extraction:

 The Entity Extraction transform performs linguistic processing on content by using
semantic and syntactic knowledge of words.
 We can configure the transform to identify paragraphs, sentences, and clauses and it
can extract entities and facts from text.
 Typically, we use the Entity Extraction transform when we have text with specific
information we want to extract and then use in downstream analytics and
applications.
 Please go through the article ‘Entity Extraction transform in SAP Data Services’ to
know more details about this transform.

The image below shows all the transforms available in the Text Data Processing
category.

With this, we have seen all the transforms present in SAP Data Services.
