
semantic integration explained

So what does all this talk about semantic data integration really mean? Metadata, image files, meta types? How does
semantic rationalization work? This primer explains it all.

Central to the expressor semantic data integration system™ is the concept of semantic rationalization, the process of
mapping fields from multiple and diverse external data resources to data types and definitions that are then used
exclusively within expressor projects.

The input to the semantic rationalization process is the metadata describing the data that will be processed and
emitted by the expressor application. The output is an image file and entries within the listings of subject areas, terms,
term abbreviations, and names comprising the dictionary definitions that will be used in subsequent semantic
rationalization efforts.

Image files are specific to the information resource for which they were written but different image files will map the
same common name — the definition — to differently-named fields in their corresponding data resources. For
example, the definition part_number may be mapped in one image file to the RDBMS table column
name pnum and in another to the COBOL data item part_no.
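
As a rough illustration only, the mapping idea can be sketched in Python. The dictionaries and the to_definitions helper below are hypothetical stand-ins, not the expressor image file format or API, and the sample values are invented.

```python
# Hypothetical stand-ins for two image files: each maps a
# resource-specific field name to the shared definition name.
RDBMS_IMAGE = {"pnum": "part_number"}
COBOL_IMAGE = {"part_no": "part_number"}


def to_definitions(record, image):
    """Rename resource fields to definition names; unmapped fields are dropped."""
    return {image[field]: value for field, value in record.items() if field in image}


# Records from either resource end up keyed by the same definition name.
rdbms_row = to_definitions({"pnum": "A1234B"}, RDBMS_IMAGE)
cobol_row = to_definitions({"part_no": 12345}, COBOL_IMAGE)
```

Both rows are now keyed by part_number, although the values still carry their resource types at this point.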

The expressor system represents all data as one of five expressor types — byte, date time, decimal, number, and
string. Each definition is associated with an expressor meta type, which is a descriptive alias (e.g., binary or id) for the
actual expressor data type (i.e., respectively, byte and string). The meta type assigned to a definition, and its
underlying type, may be different from the type assigned to the data in its resource, but this conversion provides for
greater consistency and ease of use across projects.
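
The alias relationship can be pictured as a simple lookup, shown here as a minimal Python sketch. Only the meta types binary and id are named in the text; the table layout and the underlying_type function are assumptions made for illustration.

```python
# The five expressor types listed in the text.
EXPRESSOR_TYPES = {"byte", "date time", "decimal", "number", "string"}

# Meta types are descriptive aliases; only these two are named in the text.
META_TYPES = {"binary": "byte", "id": "string"}


def underlying_type(meta_type):
    """Resolve a meta type alias to its expressor data type (hypothetical helper)."""
    base = META_TYPES[meta_type]
    if base not in EXPRESSOR_TYPES:
        raise ValueError(f"{meta_type} maps to unknown type {base}")
    return base
```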

The important concept is that individuals developing expressor data integration applications work only with the
definition names and their assigned meta types and never need to refer to, or even know, the names or types used by
the actual data resources. Consequently, all of the application code and business rule development, referred to as
semantic integration, uses the definition names and corresponding meta types, thereby decoupling the
integration design from the physical environment.

Let us see how definitions and meta types provide for semantic integration. Suppose the previously mentioned RDBMS
column contains an alphanumeric part number [e.g., pnum varchar2(6)] while the COBOL file resource data
item identifies a numeric part number [e.g., part_no PIC 9(5)]. Note that these fields differ not only in
type but also in size.

During the semantic rationalization process, each field is mapped to the definition part_number and assigned the
meta type id, an alias for the expressor string type. What is the rationale for changing the types of
the pnum and part_no fields? It is a reasonably safe assumption that mathematical operations will not be performed
on the part_number field, so storing and manipulating this value as a string type will provide greater functionality
(the string manipulation and pattern matching functions will be usable), less confusing scripting (part numbers are
always represented by the same type), and more efficient memory management.

It is the responsibility of the expressor parallel data processing engine, not the integration developer, to manage the
conversions between the types in the resources and the types within the integration, so these conversions are
completely hidden from the developer, who only needs to be aware of the meta type assigned to the definition. Now
all integration projects in which a data resource contains a field mapped to the part_number definition will treat this
data as a string type.
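
The effect of this hidden conversion might be sketched in Python as follows. This is not how the expressor engine is implemented; the lookup tables, the read_value function, and the sample values are all hypothetical, and only the string case is sketched.

```python
# Hypothetical lookup tables: definition -> meta type -> expressor type.
META_TYPES = {"binary": "byte", "id": "string"}
DEFINITIONS = {"part_number": "id"}


def read_value(definition, raw_value):
    """Convert a raw resource value to the type implied by the definition's meta type."""
    base_type = META_TYPES[DEFINITIONS[definition]]
    if base_type == "string":
        return str(raw_value)
    # Other expressor types are outside the scope of this sketch.
    raise NotImplementedError(f"no conversion sketched for {base_type}")


# An alphanumeric column value and a numeric COBOL item both arrive as strings.
from_rdbms = read_value("part_number", "A1234B")
from_cobol = read_value("part_number", 12345)
```

Code written against part_number can then treat every value as a string, whichever resource it came from.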

What about the size differences? Although all part numbers are now represented by a common type, how can code
be simplified if part numbers differ in size? The answer is to write a business rule for the
definition part_number that adjusts the size of part number fields. For example, the following rule first
left-pads the part number with zeros and then extracts the right-most eight characters as the formatted part number value.

string.substring(string.concatenate("00000000", part_number), -8)

Note that the formatted part number actually contains eight characters, which means that the rule is usable with data
resources where part number values contain up to eight characters. Of course, the rule can be modified to format the
part number to any width using any padding character. Writing a comparable rule for a numeric field would be much
more complicated, especially when using zero as the padding digit. The first step of integration would apply this rule to
the incoming part number and all subsequent steps would work with an eight character wide string field.
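
For readers unfamiliar with the expressor rule syntax, the same pad-then-slice logic can be written in Python. This is an equivalent sketch rather than expressor code; the function name and the width and pad parameters are assumptions added to show how the rule generalizes.

```python
def format_part_number(part_number, width=8, pad="0"):
    """Left-pad a part number string, then keep the right-most `width` characters."""
    padded = pad * width + part_number
    return padded[-width:]


# Values shaped like the two example resources: PIC 9(5) and varchar2(6).
numeric_style = format_part_number("12345")
alphanumeric_style = format_part_number("A1234B")
```

As with the original rule, a value longer than the chosen width loses its left-most characters, so the width should be at least as large as the longest part number in any resource.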

As part of the expressor semantic integration paradigm, it is important to note that all transformation code and
business rules refer to the data fields using their definition names and not the names used by their respective
resources. In the expressor illustrator, a Visio-based graphical development tool, the scripting window, coding aids, and
schema viewers show only the definition names and access the definition-specific business rules. Consequently, all
data manipulation within the expressor processing engine uses these semantically rationalized definition names and
their corresponding expressor data types. Only when data is read into, or emitted from, the expressor integration does
the engine use the resource-specific field names and types.
 
