
SPICA

DATA SYSTEMS

Business Intelligence (BI) Training

Business Objects Data Integrator

1
SPICA Training Details
DATA SYSTEMS

• DBMS
• RDBMS
• SQL
• Data Warehousing
• ETL Tool- Business Objects Data Integrator

2
SPICA DBMS - What is a Database, Anyway?
DATA SYSTEMS

• A database is simply a container of information.


To find an example of a database that most people use every day, you need look no further than your address book or the yellow pages.

3
SPICA Basic database terms- Item/Entity
DATA SYSTEMS

Let's consider the type of information that's in an address book:

Person’s Name
Person’s Address
Person’s Phone No
Person’s DOB

What is the common element ?


-Person

In this example, each person is considered an item. In other words,


the item is a thing or things that the database is storing information
about. A database can certainly store information about multiple sets
of items (customers, orders, products, etc.)

4
SPICA Basic database terms- Field
DATA SYSTEMS

• What information about a person (Item/Entity) is


stored in database
– Name
– Address
– Phone No
– DOB

Each individual piece of information is known as a "Field", e.g. Name, Address, DOB.

5
SPICA Basic database terms- Record
DATA SYSTEMS

• All information about one particular item is


known as a record.

6
SPICA Basic database terms- Value
DATA SYSTEMS

• A value is the actual text (or numerical amount or date) that is used to fill in the blank when adding information to your database.

7
Putting it all together - Table
SPICA
DATA SYSTEMS
Table: Person (the Item/Entity)

Name         Address                          Phone No.      DOB
John Steel   344 Main Street, New York City   201-367-2345   21/12/1997
Mike Brown   45th Street, New York City       201-456-9857   25/10/1980

Each column heading (Name, Address, Phone No., DOB) is a Field, each row is a Record,
and each individual entry (e.g. "John Steel") is a Value.

8
SPICA Basic database terms-Primary Key
DATA SYSTEMS

• Uniquely identifies each record in the


table

Person

Person Id   Name         Address                          Phone No.      DOB
1001        John Steel   344 Main Street, New York City   201-367-2345   21/12/1997
1002        John Steel   45th Street, New York City       201-456-9857   25/10/1980
9
SPICA DBMS Examples
DATA SYSTEMS

• dBase
• Oracle
• SQL Server
• Etc.

10
SPICA RDBMS
DATA SYSTEMS

• Business applications involve many tables


• RDBMS Links (Relationship) the data in multiple
tables
• The Relationship between the tables is based on
one or more field values common to both tables.

11
Relationship between Tables-
SPICA Foreign Key
DATA SYSTEMS

Country
Country Id   Name
1            USA
2            UK

Person
Person Id   Name         Address                          Phone No.      Country Id   DOB
1001        John Steel   344 Main Street, New York City   201-367-2345   1            21/12/1997
1002        John Steel   45th Street, London              201-456-9857   2            25/10/1980
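For illustration, the relationship above can be declared with a FOREIGN KEY constraint. This is only a sketch; the column data types are assumed, not taken from the slides:

CREATE TABLE Country (
  Country_Id NUMBER PRIMARY KEY,
  Name VARCHAR2(50));

CREATE TABLE Person (
  Person_Id NUMBER PRIMARY KEY,
  Name VARCHAR2(50),
  Address VARCHAR2(100),
  Phone_No VARCHAR2(20),
  Country_Id NUMBER REFERENCES Country (Country_Id),  -- foreign key to Country
  DOB DATE);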

12
Data Types
SPICA (Data Classification)
DATA SYSTEMS

• Char()
• Varchar2()
• Number
• Number (m, n)
• Date
• Etc.

13
SPICA RDBMS Examples
DATA SYSTEMS

• Oracle
• SQL Server
• Etc.

14
Structured Query Language (SQL)
SPICA
DATA SYSTEMS

• Was developed by IBM Corporation, Inc


• Set of commands to access data within the relational
database
• Application programs and Oracle tools often allow users
to access the database without directly using SQL
• These applications in turn must use SQL when executing
the user's request

15
Features of SQL
SPICA
DATA SYSTEMS

• It processes sets of data as groups rather than


as individual units.

• It provides automatic navigation to the data.

• Common Language for All Relational


Databases

16
SQL Statements
SPICA
DATA SYSTEMS

CREATE, ALTER, DROP, RENAME, TRUNCATE      Data Definition Language (DDL)

INSERT, UPDATE, DELETE                     Data Manipulation Language (DML)

COMMIT, ROLLBACK, SAVEPOINT                Transaction Control Language (TCL)

SELECT                                     Data Retrieval Language (DRL)

GRANT, REVOKE                              Data Control Language (DCL)
17
Database Objects
SPICA
DATA SYSTEMS

Object      Description

Table       Basic unit of storage; composed of rows and columns
View        Logically represents subsets of data from one or more tables
Sequence    Generates primary key values
Index       Improves the performance of some queries
Synonym     Gives alternative names to objects
18
Dual Table
SPICA
DATA SYSTEMS

• The DUAL Table is a dummy table available to all users


in the database.
• It has one column and one row.
• The Dual table is used to select system variables or to
evaluate an expression.

For example:

SELECT SYSDATE,USER FROM dual;

19
Writing Simple Queries
SPICA
DATA SYSTEMS

• A query is a request for information from the


database tables.

For Example:
SELECT * from Jobs;

20
SPICA Ensuring Uniqueness
DATA SYSTEMS

DISTINCT keyword following SELECT ensures


that the resulting rows are Unique.

For example:

SELECT DISTINCT department_id from


employees;

21
Limiting Rows
SPICA
DATA SYSTEMS

A WHERE clause in the SELECT statement is used to


limit the number of rows processed.

For Example:

Select name from employees where dept_id=30

22
Comparison Operators
SPICA
DATA SYSTEMS

1. Equality =
2. Inequality !=,<> or ^=
3. Less than <
4. More than >
5. Less Than or Equal To <=
6. Greater Than or Equal To >=

23
Exercise-1
SPICA
DATA SYSTEMS

1. Use Dual Table


2. Write a query to retrieve all the columns from table
3. Write a query that will select unique rows from table
4. Write a query which will limit the rows.
5. Write a query using NULL.

24
Sorting Rows
SPICA
DATA SYSTEMS

To retrieve all employee names of department 90


from the employees table order by last_name.

For Example:
SELECT first_name ||' '|| last_name from
employees where department_id=90 ORDER BY
last_name;

25
SPICA Conversion Functions
DATA SYSTEMS

• To_char (date, ['format'])
  select ename, to_char(hiredate, 'dd/mm/yy') from emp;
• To_number (string)
  Select to_number('12') + 12 from dual;
• To_date (string, 'format')
  Select to_date('8/4/73', 'dd/mm/yy') from dual;

26
SPICA Exercise
DATA SYSTEMS

• Write a query to display customer name


alphabetically.
• Write a query using Conversion function to show
qtr-yyyy from system date.

27
SPICA Group Functions
DATA SYSTEMS

• Group functions are sometimes called aggregate functions and return a value based on a number of inputs. The exact number of inputs is not determined until the query is executed and all rows are fetched. This differs from single-row functions, whose inputs are known before the query is executed.

• Group functions ignore NULL values. COUNT returns 0 when only NULLs (or no rows) are evaluated; the other group functions (SUM, AVG, MAX, MIN) return NULL in that case.

28
SPICA AVG
DATA SYSTEMS

This function has the syntax AVG([{DISTINCT|ALL}] n), where n is a numeric expression. The AVG function returns the mean of the expression n. If neither DISTINCT nor ALL is specified in the function call, the default is ALL.

For Example:
SELECT job_id, AVG(salary) FROM hr.employees WHERE job_id LIKE 'AC%' GROUP BY job_id;

29
SPICA MAX
DATA SYSTEMS

This function has the syntax MAX([{DISTINCT


|ALL}] x), where x is an expression. This function
returns the highest value in the expression x.

For Example:

SELECT MAX(hire_date),MAX(salary) from


hr.employees;

30
SPICA SUM
DATA SYSTEMS

This function has the syntax SUM([{DISTINCT|ALL}] x),


where x is a numeric expression. This function returns
the sum of the expression x.

SELECT SUM(blocks) from user_tables;

31
Grouping Data with GROUP BY
SPICA
DATA SYSTEMS

As the name implies, group functions work on data that is grouped. We tell the database how to group or categorize the data with a GROUP BY clause. Whenever we use a group function in the SELECT clause of a SELECT statement, every column in the SELECT list that is not inside a group function must appear in the GROUP BY clause. If no GROUP BY clause is specified, the default grouping becomes the entire result set.
For Example:
SELECT cust_state_province,count(*) customer_count from
sh.customers GROUP BY cust_state_province;
In this example, we categorize the data by state and apply the group
function(COUNT). It returns the number of rows for each state in the
CUSTOMER table. If we want to order the results by the number of
customers, our ORDER BY clause can contain either the column number or
the grouping function.
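For example, reusing the sh.customers query above, the result can be ordered by the grouping function or by column number (a sketch):

SELECT cust_state_province, COUNT(*) customer_count
FROM sh.customers
GROUP BY cust_state_province
ORDER BY COUNT(*) DESC;     -- equivalently: ORDER BY 2 DESC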

32
SPICA Joins
DATA SYSTEMS

• Equi Joins
• Outer Joins

33
SPICA Equi Joins
DATA SYSTEMS

This type of join is based on = operator. This combines


rows from two tables that have equivalent values for the
specified columns.

For Example:
Select locations.location_id, city, department_name from
locations,departments where
locations.location_id=departments.location_id;

34
SPICA Outer Joins
DATA SYSTEMS

We use outer join to see the data from one table even if
there is no corresponding row in the joining table.

35
SPICA Left Outer
DATA SYSTEMS

A left outer join is a join between two tables that returns rows based on the matching condition, as well as unmatched rows from the table to the left of the JOIN clause. For example, the following query returns the country name and city name from the COUNTRIES and LOCATIONS tables, as well as all country names from the COUNTRIES table, even those with no matching location.

SELECT c.country_name,l.city from countries c LEFT OUTER JOIN


locations l
ON c.country_id =l.country_id

36
SPICA Right Outer Join
DATA SYSTEMS

A right outer join is a join between two tables that returns


rows based on the matching condition, as well as
unmatched rows from the table to the right of the join
clause.

Select country_name, city from locations l RIGHT


OUTER JOIN countries c
on c.country_id=l.country_id;

37
SPICA Full Outer Join
DATA SYSTEMS

A full outer join is new in Oracle 9i. This is a join between two tables that returns rows based on the matching condition, as well as unmatched rows from the tables on both the right and left of the JOIN clause. Suppose that you want to list all the employees' last names with their
department names. You want to include all the
employees, even if they have not been assigned a
department. You also want to include all the
departments, even if there are no employees working for
that department.

38
SPICA Example
DATA SYSTEMS

Select e.employee_id, e.last_name,


d.department_id,d.department_name from employees e
FULL OUTER JOIN departments d ON
e.department_id=d.department_id;

39
DDL
SPICA
DATA SYSTEMS

• CREATE
• ALTER
• DROP
• TRUNCATE
• RENAME

40
CREATE TABLE STATEMENT
SPICA
DATA SYSTEMS

CREATE TABLE "table_name"


("column 1" "data_type_for_column_1",
"column 2" "data_type_for_column_2",
... )

41
Create Customer Table
SPICA
DATA SYSTEMS

CREATE TABLE customer


(First_Name char(50),
Last_Name char(50),
Address char(50),
City char(50),
Country char(25),
Birth_Date date)

42
Alter Table
SPICA
DATA SYSTEMS

Use Alter Table Statement to :


• Add a column
• Drop a column
• Change a column name
• Change the data type for a column

43
SPICA Add Column
DATA SYSTEMS

First, we want to add a column called "Gender" to this table. To do this, we key in:

ALTER table customer add Gender char(1)

44
Rename Column
SPICA
DATA SYSTEMS

Next, we want to rename "Address" to "Addr". To do this,


we key in,

ALTER table customer change Address Addr


char(50)

45
Change Datatype
SPICA
DATA SYSTEMS

Then, we want to change the data type for "Addr" to


char(30). To do this, we key in,

ALTER table customer modify Addr char(30)

46
SPICA Drop Column
DATA SYSTEMS

Finally, we want to drop the column "Gender". To do this,


we key in,

ALTER table customer drop Gender

47
SPICA Drop Table
DATA SYSTEMS

Sometimes we may decide that we need to get rid of a table in the database. In fact, it would be problematic if we could not do so, because this could create a maintenance nightmare for the DBAs. Fortunately, SQL allows us to do it with the DROP TABLE command.
The syntax for DROP TABLE is

DROP TABLE "table_name"

48
SPICA Truncate Table
DATA SYSTEMS

• TRUNCATE TABLE allows you to remove all


rows from a table.

• TRUNCATE requires exclusive access to the table, whereas DELETE doesn't. So if you'd like to empty a table that is open by another user, you should use the DELETE command instead of TRUNCATE.
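For example (illustrative only, using the customer table created earlier):

TRUNCATE TABLE customer;    -- DDL: fast, releases storage, cannot be rolled back
DELETE FROM customer;       -- DML: row by row, can be rolled back before COMMIT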

49
SPICA Rename Table
DATA SYSTEMS

RENAME TABLE allows you to rename an existing table in any schema


(except the schema SYS).

Syntax

RENAME TABLE table-Name TO new-Table-Name

If there is a view or foreign key that references the table, attempts to


rename it will generate an error. In addition, if there are any check
constraints or triggers on the table, attempts to rename it will also generate
an error.

RENAME TABLE SAMP.EMP_ACT TO EMPLOYEE_ACT

50
SPICA Exercise
DATA SYSTEMS

• Create a table.
• Alter a table.
• Drop a table.
• Truncate table.

51
SPICA
DATA SYSTEMS

Modifying Data

52
SPICA DML Statements
DATA SYSTEMS

• INSERT Adds rows to a table


• UPDATE Changes the value stored in a table.
• MERGE Updates or inserts rows from one table into
another.
• DELETE Removes rows from a table.

53
Inserting Rows into a Table
SPICA
DATA SYSTEMS

The INSERT statement is used to add rows to one or


more tables. Rows can be added with specific data
values, or the rows can be created from existing data
using a subquery.

54
SPICA Insert - Example
DATA SYSTEMS

INSERT INTO checking (account_id, create_date, balance)
VALUES ('Kiesha', SYSDATE, 5000);

INSERT INTO e_checking SELECT * FROM checking WHERE account_type = 'C';

55
Updating Rows in a Table
SPICA
DATA SYSTEMS

The UPDATE statement is used to modify existing rows in a table.

56
SPICA Update - Example
DATA SYSTEMS

UPDATE order_rollup
SET (qty, price) = (SELECT SUM(qty), SUM(price) FROM order_lines
                    WHERE customer_id = 'KOHL')
WHERE customer_id = 'KOHL'
AND order_period = TO_DATE('01-Oct-2001');

OR

UPDATE order_rollup SET phone = '3125551212', fax = '7735551212'
WHERE customer_id = 'KOHL';

57
Merging Rows into a table
SPICA
DATA SYSTEMS

The Merge statement is used to both update and insert


rows in a table.

58
SPICA Example
DATA SYSTEMS

MERGE INTO oe.product_information pi
USING (SELECT product_id, list_price, min_price FROM new_prices) np
ON (pi.product_id = np.product_id)
WHEN MATCHED THEN UPDATE SET
  pi.list_price = np.list_price, pi.min_price = np.min_price
WHEN NOT MATCHED THEN INSERT
  (pi.product_id, pi.category_id, pi.list_price, pi.min_price)
  VALUES (np.product_id, 33, np.list_price, np.min_price);

59
Transaction Control Statements
SPICA
DATA SYSTEMS

• COMMIT
• ROLLBACK
• SAVEPOINT

60
Advantages of COMMIT & ROLLBACK statements
SPICA
DATA SYSTEMS

• Ensure Data Consistency


• Preview data changes before making changes
permanent
• Group logically related operations

61
Controlling Transactions
SPICA
DATA SYSTEMS

COMMIT ... INSERT → UPDATE → SAVEPOINT A → INSERT → SAVEPOINT B → DELETE

ROLLBACK TO B  - undoes the DELETE
ROLLBACK TO A  - also undoes the second INSERT
ROLLBACK       - undoes the entire transaction
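A minimal sketch of the sequence above, assuming a simple dept table (deptno, dname) that is not part of the slides:

INSERT INTO dept (deptno, dname) VALUES (50, 'Support');
UPDATE dept SET dname = 'Helpdesk' WHERE deptno = 50;
SAVEPOINT a;
INSERT INTO dept (deptno, dname) VALUES (60, 'Training');
SAVEPOINT b;
DELETE FROM dept;
ROLLBACK TO b;    -- undoes only the DELETE
ROLLBACK TO a;    -- also undoes the second INSERT
ROLLBACK;         -- undoes the whole transaction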
62
State of the Data After COMMIT
SPICA
DATA SYSTEMS

• Data changes are made permanent in the database.


• The previous state of the data is permanently lost.
• All users can view the results.
• Locks on the affected rows are released; those rows are available for other users to manipulate.
• All savepoints are erased.

63
State of the Data After ROLLBACK
SPICA
DATA SYSTEMS

• Discard all pending changes by using the ROLLBACK statement.
• Data changes are undone.
• The previous state of the data is restored.
• Locks on the affected rows are released.

64
SPICA Exercise
DATA SYSTEMS

• Use DML Statement


• Use Commit/Rollback

65
SPICA
DATA SYSTEMS

Data Warehousing

66
Agenda
SPICA
DATA SYSTEMS

• OLTP Systems
• Business Intelligence
• Data Warehousing Components
• Dimensional Modeling
– Dimensions
– Facts
– Star Schema
– Time Dimension
• Retail Store Case Study
• Dimensional Modeling life Cycle
• Surrogate Keys
• Degenerate Dimensions
• Snowflake Vs Star Schema
67
SPICA Agenda ..
DATA SYSTEMS

• Updates to Dimension Tables


– Slowly Changing Dimensions
• Usage/Requirements for SCD
• Type 1
• Type 2
• Type 3
– Conformed/Non-Conformed Dimension tables
• Bus Matrix

68
SPICA OLTP Systems
DATA SYSTEMS

– OLTP systems are designed for Day to Day


operations.
• Order Entry System
• Payroll System
• ERP
– Database design: ER modeling in third normal form (3NF)

69
Data Models Revisited
SPICA
DATA SYSTEMS

• An abstract view of data that excludes many implementation


details. May be easier for most users to understand

• Comprised of logical concepts, e.g. objects or entities, their


properties and their interrelationships

• Hides computer storage details that are either too complex


or not of interest to most users

70
SPICA What is Data Modeling Life Cycle?
DATA SYSTEMS

•The goal of the data modeling life cycle is primarily


the creation of a storage area for the business data.

71
E/R Modeling
SPICA
DATA SYSTEMS

E/R modeling is a design technique in which we store the


data in highly normalized form inside a relational database.

Features of ER model:
• ER model is highly normalized
• Stress is on optimization of OLTP transaction

72
SPICA ER Model for Retail Sales
DATA SYSTEMS

[ER diagram: a normalized (3NF) model for retail sales, with foreign-key links between
Region_Master, Country_Master, State_Master, City_Master, Customer_Master, Customer_Type_Master,
Channel_Master, Customer_Channel_Master, Order_Header_Txn, Order_Line_Txn, Product_Master,
Product_Category_Master, Product_Price_Master, Promotion_Master, Promotion_Type_Master and
Promotion_Cost_Master]
73
SPICA Answer the following Queries
DATA SYSTEMS

• Total Sales for Product


• Total Sales by Store
• Total Sales by Country
• Total Sales by Year
• Total Sales by Qtr

74
SPICA SQL for Queries
DATA SYSTEMS

Select sum(sales_amt), Product_Master.Product_id
from Sales_Tx, Product_Master
Where Sales_Tx.Product_id = Product_Master.Product_id
Group By Product_Master.Product_id

Select sum(sales_amt), Country_Master.Country_Code
from Sales_Tx, Store_Master, Country_Master, State_Master
Where Sales_Tx.Store_id = Store_Master.Store_id
And State_Master.State_cd = Store_Master.State_Code
And Country_Master.Country_Code = State_Master.Country_Code
Group By Country_Master.Country_Code

75
SQL for Queries for
SPICA Quarterly/Yearly Sales
DATA SYSTEMS

• Select Sum(Sales_Amt), to_char(order_date, 'Q-YYYY')
  from Tx_table group by to_char(order_date, 'Q-YYYY')

• Select Sum(Sales_Amt), to_char(order_date, 'YYYY')
  from Tx_table group by to_char(order_date, 'YYYY')

76
SPICA Observations for ER Diagram
DATA SYSTEMS

• More Joins for queries


• Query Performance is poor
• When the date column is wrapped in a function such as to_char(Tx_Date, 'Q-YYYY'), indexes on that column are not used
• Runtime Aggregation results in poor
performance

77
SPICA Business Intelligence
DATA SYSTEMS

• Business Intelligence
– What is Business Intelligence?
• BI refers to technologies, applications for
collection, integration, analysis & presentation of
Business Information.
– How different it is from OLTP Systems?
• OLTP systems are designed for Day to Day
operations, while BI applications are for strategic
decision making.

78
SPICA Functional View of Systems
DATA SYSTEMS

[Diagram: functional/departmental systems - Sales, Marketing, Finance, MIS, Rates/Regulatory and Customer Service - each with its own data: Demographics, Promotions, General Ledger, Contracts, Purchasing, Product Info, Market Data, Accounting]

79
SPICA Difficult Answer for Simple Questions
DATA SYSTEMS

• How many customers do we have?
– Customers are present in multiple systems
– Which system has the correct count?
• What were the sales for last month?
– Sales data is available in multiple systems
– The conversion rates applied by the sales and finance systems (Average Daily Rate vs. Average Monthly Rate) are different.

80
Data Integration for DSS
SPICA (Decision Support Systems)
DATA SYSTEMS

[Diagram: Product Data, Customer Data, Sales Data, Revenue Data, G/L Data and External Data are integrated into a single store to support the DSS]
81
SPICA Components of Business Intelligence
DATA SYSTEMS

• Data Warehouse
• Data Marts
• Enterprise Data Warehouse
• OLAP (Reports & Dash Boards)
• Operational Data Store
• Data Mining

82
How does BI help Companies?
SPICA
DATA SYSTEMS

• Historical Analysis of Data (Trend Analysis)


• Predictive Analysis & Planning
• Churn Management
• Strategic Decision Making
• Single view of Customer
• 360 degree view of Business
• Cross Sell/Up-Sell
• Finding out relation between products, fraud
management (Using Data Mining)
• Etc..

83
SPICA Data Warehouse
DATA SYSTEMS

A Data Warehouse is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is relevant to some moment in time.

84
SPICA Goals of Data Warehousing
DATA SYSTEMS

• Easy accessibility of information
• Present the organization's information consistently
• Adaptive and resilient to change
• A secure bastion that protects our information assets
• Serve as a foundation for improved decision making

85
SPICA Characteristics of DW
DATA SYSTEMS

• Subject Oriented

• Integrated

• Non Volatile

• Time Variant

86
SPICA OLTP/OSS
DATA SYSTEMS

• OLTP (On line transaction processing) or OSS


(Operational Support Systems) systems were
built to automate business transactions.

• A focus on bookkeeping functions.

• Applications were built along functional lines

• Historical data was typically not needed or


retained.
87
SPICA Online Analytical Processing (OLAP)
DATA SYSTEMS

• Standard reporting
• Ad-hoc query and reporting
• Multidimensional analytical reporting
• Predictive analysis and planning
• Data Cubes (ROLAP vs MOLAP)

88
SPICA OLTP VS OLAP Requirements
DATA SYSTEMS

• Work Load
• Data Modifications
• Schema Design
• Historical Data

89
SPICA OLTP VS OLAP Query
DATA SYSTEMS

Characteristic            Transactional Query      Analytical Query

Typical Operation         Update                   Analyze
Age of Data               Current                  Historical
Level of Data             Detail                   Aggregate
Data Required per Query   Minimum                  Extensive
Querying Pattern          Individual Queries       Iterative Queries

90
SPICA Dimensional Modeling – what it is?
DATA SYSTEMS

• Dimensional Modeling is a logical design technique for


Data warehousing aimed at easier access of data and
business representation of data.

• Features of Dimensional Model:


– De-normalized Structures
– Database structured for faster and easier querying
– Stress is on easier interpretation of data
– Dimensional Databases occupy extra space compared to
equivalent ER model because of redundancy
– Consists of Fact & Dimension Tables

91
SPICA De-normalization
DATA SYSTEMS

• Carefully introduced redundancy to improve


query performance

92
SPICA Dimension tables
DATA SYSTEMS

• Dimension tables contain the details about the business entities such as customer, product, etc. This enables the business users to better understand the data and their reports.
• Since the data in a dimension table is denormalized, it typically has a large number of columns.
• The attributes in a dimension table are typically used as row and column headings in a report or query results display.
• Attributes arrange members into hierarchies or levels.

93
SPICA Geography Dimension Example
DATA SYSTEMS

Location_Key  ZIP_Code  City_Code  City_Name  State_Code  State_Name   Country_Code  Country_Name

1             08854     123        Edison     NJ          New Jersey   USA           United States of America
2             08855     123        Edison     NJ          New Jersey   USA           United States of America
3             05646     356        Bronx      NY          New York     USA           United States of America

94
SPICA Fact tables
DATA SYSTEMS

• The fact table contains numerical values of what you measure.
• The fact table contains the keys to the associated dimension tables. These are called foreign keys in the fact table.
• Fact tables typically contain a small number of columns.
• Compared to dimension tables, fact tables have a large number of rows.
• The information in a fact table has characteristics such as:
  - It is numerical and used to generate aggregates and summaries.
  - Data values need to be additive, or semi-additive, to enable summarization of a large number of values from fact tables.


95
SPICA Star Schema – What is it ?
DATA SYSTEMS

• Star schema is combination of fact table and several dimension tables.


• Each Dimension has foreign key relationship with fact table.
• Such an arrangement in the dimensional model looks like a star formation,
with the fact table at the core of the star and the dimension tables along the
spikes of the star. The dimensional model is therefore called a STAR
schema.

96
SPICA Star Schema Example
DATA SYSTEMS

97
SPICA Surrogate Key
DATA SYSTEMS

Integer keys are sequentially assigned as needed to populate the dimension table and to join to the fact table.

The surrogate key is the primary key of the dimension table.
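A sketch of how surrogate keys are typically generated with an Oracle sequence (object names are assumed, not from the slides):

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (cust_key, customer_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C1001', 'John Steel');
-- cust_key is the surrogate primary key; customer_id is the natural (operational) key kept as an attribute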

98
SPICA Data Warehouse Surrogate Keys
DATA SYSTEMS

Recommend surrogate keys:
• Integer, non-meaningful, sequence number
• Surrogate keys join fact and dimension tables
• Treat natural, operational keys as attributes

Benefits:
• Isolate the warehouse from operational changes (SCDs)
• Improve performance
• Handle "Not applicable", "Date TBD", .....
• Allow integration from multiple sources (e.g. the same key can be used by multiple systems for different customers)
• Enable tracking of dimension changes
99
SPICA Date and Time Dimensions
DATA SYSTEMS

• Virtually everywhere: measurements are defined


at specific times, repeated over time, etc.
• Most common: calendar-day dimension with the
grain of a single day, many attributes
• Doesn’t have a conventional source:
– Built by hand, spreadsheet
– Holidays, workdays, fiscal periods, week numbers,
last day of month flags, must be entered manually
– 10 years are about 4K rows

100
SPICA Date and Time Dimensions..
DATA SYSTEMS

101
Date Dimension
SPICA
DATA SYSTEMS

• Note the Natural key: a day type and a full date


– Day type: date and non-date types such as inapplicable
date, corrupted date, hasn’t happened yet date
– fact tables must point to a valid date from the dimension, so
we need special date types, at least one, the “N/A” date
• How to generate the primary key?
– Meaningless integer?
– Or “10102005” meaning “Oct 10, 2005”? (reserving 9999999
to mean N/A?)
– This is a close call, but even if meaningless integers are
used, the numbers should appear in numerical order (why?
Because of data partitioning requirements in a DW, data in a
fact table can be partitioned by time)

102
SPICA Non & Semi-Additive Facts
DATA SYSTEMS

Non-additive facts lose their meaning when added, such as ratios, percentages, unit price, etc.

Semi-additive facts are additive across some dimensions but non-additive across other dimensions, e.g. inventory levels, account balances, etc.
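For instance, with an assumed account_balance_fact table, a semi-additive balance can be summed across accounts for a single day, but across time it should be averaged (or taken at period end) rather than summed:

SELECT time_key, SUM(balance) FROM account_balance_fact GROUP BY time_key;        -- valid: sum across accounts
SELECT account_key, AVG(balance) FROM account_balance_fact GROUP BY account_key;  -- across time, use AVG or the period-end balance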

103
SPICA DW Data Model- Sales System
DATA SYSTEMS
[Star schema diagram for the sales system:
Dimensions - Customer_Dim (cust_key, customer id, customer name, phone no., active flag, marital status, address lines, birth date, no. of children), Product_Dim (prod_key, product id, product name, brand name, SKU, gross weight), Region_Dim (region_key, zipcode, city name, state name, country name), Time_Dim (time_key, calendar date, fiscal date, day name, month no., month name), Order_Dim (order_key, order id, order contract), Payment_Mode_Dim (pay_mode_key, mode type).
Facts - Order_Fact (prod_key, cust_key, time_key, region_key, order_key, ord_amt, ord_qty), Sum_Order_Fact_Prod (time_key, region_key, prod_key, ord_amt, ord_qty), Sales_Target_Fact (time_key, region_key, prod_key, target_amt, target_qty), Payment_Fact (order_key, time_key, region_key, pay_mode_key, amt_paid)]
104
Answer the following Queries For
SPICA Orders DW
DATA SYSTEMS

• Total Sales for Product


• Total Sales by Store
• Total Sales by Country
• Total Sales by Year
• Total Sales by Qtr

105
SPICA Query Observation
DATA SYSTEMS

• Since the data is de-normalized, fewer joins are needed in the query
• Performance of the query is faster
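As a sketch, the yearly-sales question against the star schema above needs only one join (a Year attribute on Time_Dim is assumed):

SELECT t.year, SUM(f.ord_amt)
FROM order_fact f, time_dim t
WHERE f.time_key = t.time_key
GROUP BY t.year;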

106
Advantages of Dimensional Modeling over
SPICA ER modeling for DW
DATA SYSTEMS

• Dimensional Modeling: Easy for business users to understand


• Dimensional Modeling: Faster response for querying and reporting
• ER Models Tend to be very complex and difficult to navigate.
• Dimensional Modeling: Gracefully extensible to new design decisions
• Adding new facts
• Adding new dimensions
• Adding new dimension attributes
• Dimensional Modeling: Models ‘Business’ rather than relationships among
data elements

107
SPICA OLTP VS DW Structure
DATA SYSTEMS

108
SPICA Steps in Dimensional Modeling
DATA SYSTEMS

Identify the business process.


• From the business process you should be able
to:
– Identify the Grain
– Identify the Dimensions
• Revise grain if necessary
– Identify the Facts

109
SPICA
DATA SYSTEMS

Retail Store Sales


Dimensional Schema

110
SPICA Dimensional Modeling Life Cycle
DATA SYSTEMS

111
Retail Store Summarized
SPICA Business Case
DATA SYSTEMS

Background:
• The chain consists of over 1000 grocery stores in five states
• Stores average 60,000 stock keeping units (SKUs) in departments such as frozen foods, dairy, etc.
• Bar codes are scanned directly into the cash registers' point of sale (POS) system
• Products are promoted via coupons, temporary price reductions, ads and in-store promotions

Analytic Requirements:
• Need to know what is selling in the stores each day in order to evaluate product movement, as well as to see how sales are impacted by promotions
• Need to understand the mix of products in a consumer's market basket.
112
Design Steps 1-3
SPICA
DATA SYSTEMS

1. Identify the Business Process:

2. Identify the Grain:

3. Identify the Dimensions:

113
SPICA Declare the Grain
DATA SYSTEMS

• What level of details should be made available


in data model?

• Decisions to be made:
1. What should be level of granularity?
2. What if business data contains data at 2 different
granularities?
3. Should we create a summary fact?

Grain for the retail case: a transaction line item in a bill


114
Identify Dimensions
SPICA
DATA SYSTEMS

• Identify Dimension table and preferably columns


– Methodology:
1. How will I describe my measured data?
2. What will be my report headers?

Grain: a transaction line item in a bill
Dimensions: Date, Store, Promotion, Product

115
Identify Facts
SPICA
DATA SYSTEMS

• Decisions to be made:
1. Which facts will appear in the fact table?
2. Whether to store calculated values in fact table
or in views?

Sales Quantity
Unit Sales Price
Total sales price
Profit

116
Retail Store Sales Star
SPICA Schema in Action
DATA SYSTEMS

What were the weekly sales for the snacks category during the “Super Bowl”
promotion in the NY District during the month January 2003?
[Star schema diagram:
Product dimension (PRODUCT KEY, Product Desc, Product Size, Package Type, Category)
Store dimension (STORE KEY, Store Name, City, District, Zone)
Date dimension (DATE KEY, Date Desc, Week, Month, Year)
Promotion dimension (PROMOTION KEY, Promotion Desc, Discount)
Sales fact (PRODUCT KEY, STORE KEY, PROMOTION KEY, DATE KEY, Sales Qty, Sales $ Amt)]
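A hedged sketch of that query against the schema above (exact table and column names are assumed):

SELECT d.week, SUM(f.sales_qty) sales_qty, SUM(f.sales_amt) sales_amt
FROM sales_fact f, date_dim d, product_dim p, store_dim s, promotion_dim pr
WHERE f.date_key = d.date_key
AND f.product_key = p.product_key
AND f.store_key = s.store_key
AND f.promotion_key = pr.promotion_key
AND p.category = 'Snacks'
AND s.district = 'NY'
AND pr.promotion_desc = 'Super Bowl'
AND d.month = 'January' AND d.year = 2003
GROUP BY d.week;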
117
Data Warehousing Components
SPICA
DATA SYSTEMS

[Architecture diagram - Data capture → Data Cleansing/Transformation → Maintenance → Presentation:
operational databases and external sources feed the ETL process; raw data is cleansed and stored in the Data Warehouse and Data Marts; information is delivered through Reporting, Data Mining and Analytic Applications]
118
Ralph Kimball's Approach
SPICA Simplified Elements of a Warehouse
DATA SYSTEMS
[Diagram - simplified elements of a warehouse (Kimball):
Source Systems → Data Staging Area (extract, transform from source to target, maintain conformed dimensions; flat files or relational tables; no user query support; design goals: throughput, integrity/consistency) → Presentation Area (data mart bus of conformed facts and dimensions; Data Mart #1, #2, ...; dimensional, atomic and summary data; business-process centric; design goals: ease of use, query performance) → Data Access Tools (ad hoc query tools, report writers, analytic applications, modeling: forecasting, scoring, data mining)]
119
Independent Data Marts: Ralph Kimball’s Ideology
Bill Inmon's Approach
SPICA
DATA SYSTEMS

[Diagram - simplified elements of a warehouse (Inmon):
Source Systems → Data Staging Area → Operational Data Store and Data Warehouse of normalized tables holding atomic data (user query support to atomic data) → dependent Data Marts #1, #2, ... (dimensional, departmental, summary data, loaded via ETL) → Data Access Tools]
120
Dependent Data Marts: Bill Inmon’s Ideology
Staging Area
SPICA
DATA SYSTEMS

• A storage area and a set of processes that clean, transform, combine, de-duplicate, archive and prepare source data for use in the data warehouse
• optional
• Accepts data from different sources
• To cleanse the source data
• Data model is required at staging area
• Multiple data models may be required for parking different sources
and for transformed data to be pushed out to warehouse
• location on network to store data files
• No access to Business Users
• ETL Processes use staging area
– Extract
– Transform
– Load

121
SPICA DW Tools
DATA SYSTEMS

Vendor                   ETL                              OLAP

Oracle                   OWB                              Siebel Analytics
Microsoft                SQL Server SSIS                  SQL Server SSRS
SAP-Business Objects     BO Data Integrator / SAP BW      Business Objects XI R2 / SAP BW
Informatica              Informatica PowerCenter          PowerAnalyzer
IBM                      DataStage                        Cognos 8

122
SPICA Other DW Tools
DATA SYSTEMS

• Modeling
– ERwin
• Data Cleansing
– Ascential Quality Manager, Trillium, Vality, FirstLogic,
Innovative Systems

123
SPICA Data Mart
DATA SYSTEMS

A Logical and Physical subset of Data


Warehouse’s presentation area.

124
SPICA Operational Data Store
DATA SYSTEMS

• Operational data store is a real time integrated data


store used by operations team in the organization to
manage business process and their results.

• ODS is a normalized data base that integrates


operational data in near real time to give the current
business performance based on KPI’s to the operational
managers.

• The Data Warehouse is used for strategic decision making, while the ODS is used for day-to-day decision making.
125
SPICA Data Mart Topologies
DATA SYSTEMS

Independent Multi Marts                        Dependent Data Marts

Advantages                                     Advantages
• "Fast" implementation                        • Single version of the truth
• Quick ROI                                    • Consistent data model
• Departmental control                         • Robust data transformation

Disadvantages                                  Disadvantages
• Multiple data models                         • Must have an existing data warehouse
• Multiple interfaces to manage/maintain       • Must fit with the corporate strategy
• No single version of the truth
• Duplication of data
126
SPICA Enterprise Architecture with ODS layer
DATA SYSTEMS

[Diagram - enterprise architecture with an ODS layer:
Operational Data → Data transformation → ODS Layer → Enterprise Warehouse → Data Marts/DW;
IT users work on the upstream layers, Business Users consume the Data Marts/DW]
127
SPICA DW Project Lifecycle
DATA SYSTEMS

• Project Planning
• Business Requirement Definition
• Technical Architecture Design
• Dimensional Modeling
• Physical Design
• Data Staging Design and Development (ETL)
• Analytic Application Specification Design and
Development (OLAP)
• Testing and Production Deployment
• Maintenance

128
SPICA Snowflake Schema
DATA SYSTEMS

• Single fact table surrounded by normalized


dimension tables.
• Less intuitive, slower performance due to joins
• Snow flake schema is used to represent
hierarchies of information.

129
Snowflaking
SPICA
DATA SYSTEMS

• Removal of low cardinality attributes from the dimension table into separate sub-dimension tables
• Disadvantage: user understanding is reduced
• Advantage: when a dimension is large, a sub-dimension saves space
• A sub-dimension may be loaded at different times than the main dimension

130
SPICA Example of Snow flake schema
DATA SYSTEMS

131
SPICA Snowflake - Disadvantages
DATA SYSTEMS

• Normalization of dimension makes it difficult for


user to understand
• Decreases the query performance because it
involves more joins
• Dimension tables are normally smaller than fact
tables - space may not be a major issue to
warrant snowflaking

132
Star vs. Snowflake
SPICA Design Variations
DATA SYSTEMS

[Diagram - Star: the POS Trxn fact surrounded by Date, Store, Product and Promotion dimensions.
Snowflake: the same fact, but Product is normalized into Prod, Brand, Class, Color and Size; Store into Store, City and District; Date into Day, Week, Month and Year; Promotion into Promotion and Promo Type]
133
SPICA Factless Fact table
DATA SYSTEMS

• Factless fact tables are fact tables that do not


have any measures.
• These kind of fact tables arise when there are
no obvious measures for the business area.
• Daily attendance tracking is one such
example of a business area having no
concrete measures.

134
SPICA Factless Fact table
DATA SYSTEMS

• Table: Promotion Fact (Time Key, Product Key, Store Key, Promotion Key)
• This promotion fact table records which items are on promotion in which stores and at what times.
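A sketch of querying such a factless fact table (names assumed): count how many products were on promotion in each store on a given day.

SELECT store_key, COUNT(*) promoted_products
FROM promotion_fact
WHERE time_key = 20030126      -- assumed smart date key for 26-Jan-2003
GROUP BY store_key;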

135
SPICA
DATA SYSTEMS

Updates to Dimension Tables

136
Slowly Changing Dimensions
SPICA
DATA SYSTEMS

• Consider the customer demographics dimension


table. What happens when a customer’s status
changes from rental home to own home?

• Next, look at the product dimension table, where the product category for a product was changed.

• The corresponding row in that dimension table


must be changed.
137
SPICA Dimensions..
DATA SYSTEMS

We can derive the following principles about dimensions:

• Most dimensions are generally constant over time


• Many dimensions, though not constant over time, change slowly
• The product key of the source record does not change
• The description and other attributes change slowly over time
• In the source OLTP systems, the new values overwrite the old ones
• Overwriting of dimension table attributes is not always the
appropriate option in a data warehouse
• The ways changes are made to the dimension tables depend on the
types of changes and what information must be preserved in the
data warehouse

138
SPICA Slow-changing Dimensions
DATA SYSTEMS

• When the DW receives notification that some


record in a dimension has changed, there are
three basic responses:
– Type 1 slow changing dimension (Correction of
Errors)
– Type 2 slow changing dimension (Preservation of
History)
– Type 3 slow changing dimension (Alternate Realities)

139
Type 1 Slowly Changing Dimension
SPICA (Overwrite)
DATA SYSTEMS

• Overwrite one or more values of the dimension with the new value
• Use when
– the data are corrected
– there is no interest in keeping history
– there is no need to run previous reports or the changed value is
immaterial to the report
• Type 1 Overwrite results in an UPDATE SQL statement when the
value changes
• If a column is Type-1, the ETL subsystem must
– Add the dimension record, if it’s a new value or
– Update the dimension attribute in place
• Must also update any Staging tables, so that any subsequent
DW load from the staging tables will preserve the overwrite
• This update never affects the surrogate key
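A minimal Type 1 sketch (table and column names assumed): the changed attribute is simply overwritten in place, and the surrogate key is untouched.

UPDATE customer_dim
SET marital_status = 'Married'
WHERE customer_id = 'C1001';    -- natural key; cust_key (surrogate) stays the same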

140
Applying Type 1 Changes to the
Data Warehouse
SPICA
DATA SYSTEMS

• Overwrite the attribute value in the dimension


table row with the new value
• The old value of the attribute is not preserved
• No other changes are made in the dimension
table row
• The key of this dimension table or any other key
values are not affected
• This type is easiest to implement

141
SPICA Type 1 Change
DATA SYSTEMS

142
Type-2 Slowly Changing Dimension
SPICA (Preservation of History)
DATA SYSTEMS

• Standard approach: when a record changes, instead of overwriting
– create a new dimension record
– with a new surrogate key
– add the new record into the dimension table
– use this record going forward in all fact tables
– no fact tables need to change
– no aggregates need to be re-computed
• Perfectly partitions history, because each detailed version of the dimension is correctly connected to the span of fact tables for which that version is correct
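A sketch of a Type 2 change using start/end dates and a current flag (names assumed): expire the current row, then insert a new row with a new surrogate key.

UPDATE customer_dim
SET end_date = SYSDATE, current_flag = 'N'
WHERE customer_id = 'C1001' AND current_flag = 'Y';

INSERT INTO customer_dim
  (cust_key, customer_id, marital_status, start_date, end_date, current_flag)
VALUES
  (customer_dim_seq.NEXTVAL, 'C1001', 'Married', SYSDATE, NULL, 'Y');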

143
SPICA Type-2 Slowly Changing Dimensions
DATA SYSTEMS

• The natural key does not


change
• The job attribute changes
• We can constrain our query on
– the Manager job
– Joe’s employee id
• Type-2 do not change the
natural key (the natural key
should never change)

144
SPICA Types of Type 2
DATA SYSTEMS

• Flag based (Active/Inactive)


• Version based (1,2,3..)
• Start Date & End Date based

145
SPICA Type-2 SCD Precise Time Stamping
DATA SYSTEMS

• With a Type-2 change, you might want to


include the following additional attributes in the
dimension
– Date of change
– Exact timestamp of change
– Reason for change
– Current Flag (current/expired)

146
Type 2 Change
SPICA (Marital status change, then address change)
DATA SYSTEMS

147
Type-3 Slowly Changing Dimensions
SPICA (Alternate Realities/Soft Changes)
DATA SYSTEMS

• Applicable when a change happens to a dimension record but the old record remains
valid as a second choice
– Product category designations
– Sales-territory assignments
• Instead of creating a new row, a new column is inserted (if it does not already exist)
– The old value is added to the secondary column
– Before the new value overrides the primary column
– Example: old category, new category
• Usually defined by the business after the main ETL process is implemented
– “Please move Brand X from Men’s Sportswear to Leather goods but allow me to track Brand
X optionally in the old category”
• The old category is described as an “Alternate reality”
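A sketch of a Type 3 change (names assumed): a secondary column holds the old value before the primary column is overwritten.

ALTER TABLE product_dim ADD old_category VARCHAR2(30);

UPDATE product_dim
SET old_category = category,        -- preserve the prior value (the alternate reality)
    category = 'Leather Goods'      -- new primary value
WHERE brand = 'Brand X';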

148
SPICA Type 3 Change
DATA SYSTEMS

149
SPICA Standardize Dimension & Facts
DATA SYSTEMS

• Conform Dimensions
• Conform Facts

150
SPICA Conformed Dimensions
DATA SYSTEMS

• Dimensions are conformed when they are either


exactly the same (including the keys) or one is a
perfect subset of the other.
• If the product dimension is shared between two
fact tables of sales and inventory, then the
attributes of the product dimension must have
the same meaning in relation to each of the two
fact tables.

151
SPICA Conformed Dimensions..
DATA SYSTEMS

• A conformed dimension is a comprehensive


combination of attributes from the source
systems after resolving all discrepancies and
conflicts.
• For example, a conformed product dimension
must truly reflect the master product list of the
enterprise and must include all possible
hierarchies. Each attribute must be of the correct
data type and must have proper lengths and
constraints.
152
SPICA Conformed Dimensions….
DATA SYSTEMS

153
SPICA Conformed Facts
DATA SYSTEMS

• Facts from multiple fact tables are conformed when the


technical definitions of the facts are equivalent.
• Ensure same definitions and terminology across data
marts
• Types of facts to be standardized include revenue, price,
cost, and profit margin
• Guarantee that the same algorithms are used for any
derived units in each fact table
• Make sure each fact uses the right unit of measurement

154
SPICA BUS MATRIX
DATA SYSTEMS

Dimensions Facts

Product Customer Employee Time Revenue Salary

Sales X X x X X

HR X X X

CRM X X X

Marketing X X X X

155
SPICA Business Objects Data Integrator
DATA SYSTEMS

156
SPICA Course Overview
DATA SYSTEMS

1. Understanding Data Integrator


2. Defining Source & Target Metadata
3. Creating a Batch Job
4. Creating Data Flows
5. Validating,Tracing & Debugging Batch Jobs
6. Using Built in Transforms
7. Using Built in Functions
8. Using Data Integrator Scripting Language & Variables
9. Using Workflows
10. Capturing Changes in Data
11. Handling Errors & Auditing
12. Installation & Administrator
157
SPICA Architecture
DATA SYSTEMS

158
SPICA Designer
DATA SYSTEMS

• The Designer is a development tool with an


easy-to-use graphical user interface. It enables
developers to define data management
applications that consist of data mappings,
transformations, and control logic.
• Use the Designer to create applications
containing work flows (job execution definitions)
and data flows (data transformation definitions).

159
Data Integrator repository
SPICA
DATA SYSTEMS

• The Data Integrator repository is a set of tables that hold


user-created and predefined system objects, source and
target metadata, and transformation rules.
• Set up repositories on an open client/server platform to
facilitate sharing metadata with other enterprise tools.
• Store each repository on an existing RDBMS.
• Each repository is associated with one or more Data
Integrator Job Servers which run the jobs you create.

160
Two types of repositories:
SPICA
DATA SYSTEMS

• A local repository is used by an application


designer to store definitions of Data Integrator
objects (like projects, jobs, work flows, and data
flows) and source/target metadata.
• A central repository is an optional component that can be used to support multi-user development. The central repository provides a shared object library allowing developers to check objects in and out of their local repositories.

161
Data Integrator Job Server
SPICA
DATA SYSTEMS

• The Data Integrator Job Server starts the data


movement engine that integrates data from multiple
heterogeneous sources, performs complex data
transformations, and manages extractions and
transactions from ERP systems and other sources.
• The Data Integrator Job Server can move data in either
batch or real-time mode and uses distributed query
optimization, multithreading, in-memory caching, in-
memory data transformations, and parallel processing to
deliver high data throughput and scalability.

162
Data Integrator engine
SPICA
DATA SYSTEMS

• When Data Integrator jobs are executed, the Job


Server starts Data Integrator engine processes
to perform data extraction, transformation, and
movement.

163
Data Integrator Access Server
SPICA
DATA SYSTEMS

• The Access Server is a real-time, request-reply


message broker that collects message requests,
routes them to a real-time service, and delivers a
message reply within a user-specified time
frame.

164
Data Integrator Administrator
SPICA
DATA SYSTEMS

• The Administrator provides browser-based


administration of Data Integrator resources
including:
– Scheduling, monitoring, and executing batch jobs
– Configuring, starting, and stopping real-time services
– Configuring Job Server, Access Server, and
repository usage
– Managing users

165
SPICA Data Integrator Objects
DATA SYSTEMS

• Projects
• Jobs
• Data Flows
• Work flows
• Scripts
• Transforms

166
SPICA Projects & Jobs
DATA SYSTEMS

• Projects
– A project is an object that allows you to group jobs. A
project is the highest level of organization offered by
Data Integrator.
– Opening a project makes one group of objects easily
accessible in the user interface.
• Jobs
– A job is the only object you can execute. You can
manually execute and test jobs in development.
– In production, you can schedule batch jobs.
– A job is made up of steps you want executed
together. 167
SPICA Data Flow
DATA SYSTEMS

• Data flows extract, transform, and load data.


• Everything having to do with data, including
reading sources, transforming data, and loading
targets, occurs inside a data flow.
• The lines connecting objects in a data flow
represent the flow of data through data
transformation steps.

168
SPICA Workflow
DATA SYSTEMS

• A work flow defines the decision-making process


for executing data flows.
• For example, elements in a work flow can
determine the path of execution based on a
value set by a previous job or can indicate an
alternative path if something goes wrong in the
primary path.

169
SPICA Transforms
DATA SYSTEMS

• Data Integrator includes objects called


transforms.
• Transforms operate on data sets.
• Transforms manipulate input sets and produce
one or more output sets.
• E.g.
– Query
– Case
– Date Generator
– Etc.

170
SPICA Scripts
DATA SYSTEMS

• Scripts are single-use objects used to call


functions and assign values to variables in a
work flow.
• For example, you can use the SQL function in a
script to determine the most recent update time
for a table and then assign that value to a
variable.
• You can then assign the variable to a parameter
that passes into a data flow and identifies the
rows to extract from a source.

171
SPICA Object Use
DATA SYSTEMS

• Single Use Objects


– Some objects are defined only within the context of a
single job or data flow, for example scripts and
specific transform definitions.
• Re-usable Objects
– A reusable object has a single definition; all calls to
the object refer to that definition. If you change the
definition of the object in one place, you are changing
the object in all other places in which it appears.

172
SPICA Relationship between the objects
DATA SYSTEMS

• Jobs are composed of work flows and/or data flows


• Work flow is an incorporation of several data flows
• Data flow is the process by which source data is
transformed into target data
• Work flow:  Control Operations → Data Flow → Data Flow → Control Operations

• Data flow:  Source(s) → Data Transformation(s) → Target(s)
173
SPICA Object Hierarchy
DATA SYSTEMS

Projects
  Jobs
    Scripts
    Work Flows & Conditionals*
      Scripts
      Batch Data Flows**
        Transforms
        Data Sources and Targets
          Database Datastores: Tables, Template Tables
          File Formats: Flat Files
        Functions

*  Work flows and conditionals are optional and can be embedded
** Data flows can also be embedded
174
SPICA Data Integrator Development Process
DATA SYSTEMS

• Design
• Test
• Production

175
SPICA Designer Interface
DATA SYSTEMS

• Key Areas of Designer windows


• Toolbar
• Local Object Library
• Project Area
• Tool palette
• Workspace

176
SPICA Key Areas of Designer Window
DATA SYSTEMS

[Screenshot: Menu Bar, Toolbar, Project Area, Workspace, Tool Palette and Local Object Library]
177
SPICA Defining Source and Target Metadata
DATA SYSTEMS

• Using Datastores


• Importing Metadata
• Defining a File Format

178
SPICA Datastores
DATA SYSTEMS

• Connection or multiple connections to data sources


• Import metadata from the data source
• Metadata consists of:
– Table name
– Column names
– Column data types
– Primary key columns
– Table attributes
– RDBMS functions
– Application specific data structure
• Data Integrator uses datastores to read data from source table or
load data to target table

179
SPICA Types of Datastores
DATA SYSTEMS

• Database Datastores:
– Provides a simple way to import metadata
• Application Datastores:
– Easily import metadata from most ERP systems
• Adapter Datastores:
– Provides access to an application’s data and
metadata or just metadata

180
SPICA Using Datastore
DATA SYSTEMS

• Explain Datastore
• Create a database datastore
• Change a datastore definition

181
SPICA Importing metadata
DATA SYSTEMS

• Types of metadata
• Capture metadata information from imported data
• Import metadata by browsing
• Activity : Creating ODS and importing table metadata

182
SPICA Defining a file format
DATA SYSTEMS

• Explain file formats (Fixed width, Comma Separated)


• Handle errors in file formats

183
SPICA Handling errors in file format
DATA SYSTEMS

• When you enable error handling in File Format


Editor, Data Integrator:
– Checks for two types of Flat-file source error:
• Data-type conversion error
• Row-format error

– Stops processing the source file after reaching a specified number of invalid rows
– Logs data-type conversion or row-format warning to
the Data Integrator error log

184
SPICA Exercise
DATA SYSTEMS

• Login to Designer
• Import Tables
• Import File Definition & define error handling

185
SPICA Creating a Batch Job
DATA SYSTEMS

186
SPICA Agenda
DATA SYSTEMS

• Create a Project
• Creating a Job
• Explain source & target objects
• Explain what a transform is
• Understand the Query transform
• Understand Job Execution
• Activity
– Defining a data flow to load into target
– Using format file to populate a target table
– Using template tables

187
SPICA Introduction
DATA SYSTEMS

• Create a project
• Create a job
• Create Data Flow
• Add,connect and delete objects in workspace
• Using Query Transforms

188
SPICA Create Data Flow
DATA SYSTEMS

189
SPICA Source Objects & Target Objects
DATA SYSTEMS

• Define the sources to read data from and the targets to write data to:
– Table
– Template table
– File
– Document
– XML file
– XML message

190
SPICA Transform
DATA SYSTEMS

• Transforms manipulate input sets and produce one or more output sets.
• Sometimes transforms such as Date_Generation and SQL can also be used as source objects.
• Use operation codes with transforms to indicate how each row in the data set is applied to the target table.
• The most commonly used transform is the Query transform.

191
SPICA Understanding Query Transform
DATA SYSTEMS

• Query transform can perform following operations:


– Choose (filter) the data to extract from sources
– Join data from multiple sources
– Map columns from input to output schemas
– Perform transformations and functions on data
– Add new columns to output schema
– Assign primary key to output columns

192
SPICA Query Editor Window
DATA SYSTEMS

[Screenshot: the Query Editor window shows the Input Schema, the Output Schema, a Smart Editor area, and tabs where each tab represents a different SQL option]
193
SPICA Understanding Target Table Editor
DATA SYSTEMS

• The target table loader offers different tabs where you can set database type properties and different table loading options, and use different tuning techniques for loading a job.

194
SPICA Table loading options
DATA SYSTEMS

Rows per commit             Number of rows sent to the target database in one fetch/commit batch.

Column comparison           Specifies how the input columns are mapped to output columns. There are 2 options:
                            Compare by position - Data Integrator disregards the column names and maps source columns to target columns by position.
                            Compare by name - Data Integrator maps source columns to target columns by name.

Delete data from table      Used for batch jobs; sends a TRUNCATE statement to clear the contents of the table before
before loading              loading. Defaults to not selected.

Enable Partitioning         Loads data using the number of partitions in the table as the maximum number of parallel
                            instances. You can select one of the following loader options: Number of loaders, Enable
                            partitioning or Transactional loading.
195
SPICA
DATA SYSTEMS

No. of loaders              Loading with one loader is known as single-loader loading. Parallel loading refers to loading
                            jobs that contain a number of loaders greater than one. The default number of loaders for this
                            option is one; the maximum number of loaders is five.

Use overflow file           Used for recovery purposes. If a row cannot be loaded, it is written to a file. When this option
                            is selected, options are enabled for the file name and file format. The overflow format can
                            include the data rejected and the operation being performed (write_data) or the SQL command
                            used to produce the rejected operation (write_sql).

Ignore columns with value   Enter a value that might appear in a source column and that you do not want updated in the
                            target table. When this value appears in the source column, the corresponding target column is
                            not updated during auto correct loading. You can enter spaces.

Ignore columns with null    The Auto correct load option must be enabled to use this option; use it if you do not want NULL
                            source columns updated in the target table.
196
SPICA
DATA SYSTEMS

Use input keys              If the target table contains no primary key, this option enables Data Integrator to use the
                            primary key from the input.

Update key columns          Allows Data Integrator to update key column values when it loads data to the target.

Auto correct load           Ensures that the same row is not duplicated in a target table. This is particularly useful for
                            data recovery operations.

Include in transaction      Indicates that this target is included in the transaction processed by a batch or real-time job.
                            This option allows you to commit data to multiple tables as part of the same transaction. If
                            loading fails for any one of the tables, no data is committed to any of the tables.

Transaction order           Indicates where this table falls in the loading order of the tables being loaded. By default,
                            there is no ordering. Tables with transaction order zero are loaded at the discretion of the
                            data flow process.
197
SPICA Save & Validate the Job
DATA SYSTEMS

198
List of Trace Options
SPICA
DATA SYSTEMS

Trace               Description

Row                 Writes a message when a transform imports or exports a row.
Session             Writes a message when a job description is read from the repository, when the job is optimized and when the job runs.
Work Flow           Writes a message when a work flow description is read from the repository, when the work flow is optimized, when the work flow runs and when the work flow ends.
Data Flow           Writes a message when the data flow starts, when the data flow is optimized, when the data flow runs and when the data flow ends.
Transform           Writes a message when a transform starts, completes or terminates.
Custom Transform    Writes a message when a custom transform starts and completes successfully.
Custom functions    Writes a message for all user invocations of the AE_log message function from custom C code.
SQL functions       Writes data retrieved before SQL functions:
                    • every row retrieved by the named query before the SQL is submitted in the key generation function
                    • every row retrieved by the named query before the SQL is submitted in the lookup function (but only if PRE_LOAD_CACHE is not specified)
                    • when mail is sent using the mail_to function.
SQL Readers         Writes a message (using Table Comparison transforms) about whether a row exists in the target table that corresponds to an input row in the source table.
199
SPICA Using Log files
DATA SYSTEMS

Tool            Description

Monitor log: Itemizes the steps executed in the job and the time execution began and ended.

Statistics log: Displays each step of each data flow in the job, the number of rows streamed through each step, and the duration of each step.

Error log: Displays the name of the object being executed when a Data Integrator error occurred and the text of the resulting error message. If the job ran against SAP data, some of the ABAP errors are also available in the Data Integrator error log.

[Screenshot callouts: Monitor log, Statistics log, Error log]

200
SPICA Using descriptions & annotations
DATA SYSTEMS

• Use descriptions with objects
  – Add a description to an object:
    Designer determines when to show object descriptions based on a system-level setting and an object-level setting.
  – Display a description in the workspace:
    Right-click the object in the workspace and select View Enabled Descriptions.

• Use annotations to describe jobs, work flows, and data flows
  – Add annotations to a job, work flow, data flow, or diagram in the workspace.
201
Using View Data and the Interactive
SPICA Debugger
DATA SYSTEMS

• Sources
  View data allows you to see source data before you execute a job. Using data details you can:
  – Create higher quality job designs
  – Scan and analyze imported table and file data from the object library
  – See the data for those same objects within existing jobs
  – Refer back to the source data after you execute the job

• Targets
  View data allows you to check target data before you execute a job, then look at the changed data after the job executes. In a data flow, you can use one or more View Data panels to compare data between transforms and within source and target objects.
202
SPICA Using Interactive Debugger
DATA SYSTEMS

• The Designer includes an interactive debugger that allows you to examine and modify data row-by-row (during a debug-mode job execution) by placing filters and breakpoints on lines in a data flow diagram. The interactive debugger provides powerful options to debug a job. The Designer displays additional windows: the Call Stack, Trace, Variables, and View Data panes.

[Screenshot callouts: Call Stack window, Variables window, Trace window]

Data sample rate — The number of rows cached for each line when a job executes using the interactive debugger. For example, in the following data flow diagram, if the source table has 1000 rows and you set the Data sample rate to 500, then the Designer displays up to 500 of the last rows that pass through a selected line. The debugger displays the last row processed when it reaches a breakpoint.
203
SPICA Setting filters and breakpoints
DATA SYSTEMS

• Before you start a debugging session, however, you might want to set the
following:
• Filters and breakpoints
• Interactive debugger port between the Designer and an engine.

204
SPICA Template tables
DATA SYSTEMS

• During the initial design of an application, you might find it convenient


to use template tables to represent database tables. With template
tables, you do not have to initially create a new table in your DBMS
and import the metadata into Data Integrator. Instead, Data Integrator
automatically creates the table in the database with the schema
defined by the data flow when you execute a job. After creating a
template table as a target in one data flow, you can use it as a source
in other data flows. Though a template table can be used as a source
table in multiple data flows, it can only be used as a target in one data
flow.
• Template tables are particularly useful in early application development
when you are designing and testing a project.
• When the job is executed, Data Integrator uses the template table to
create a new table in the database you specified when you created the
template table. Once a template table is created in the database, you
can convert the template table in the repository to a regular table.
• Once a template table is converted, you can no longer alter the schema.
205
Creating Template Table
SPICA
DATA SYSTEMS

206
SPICA Convert template table into regular table
DATA SYSTEMS

207
SPICA Using Built-in transforms
DATA SYSTEMS

208
SPICA List of Built In Transforms
DATA SYSTEMS

• Query
• Case
• Merge
• Data Transfer
• Date Generation
• Key Generation
• Validation

209
SPICA Query Transform
DATA SYSTEMS

• The Query transform can also be used for the following:
  – Performing calculations
  – Joining sources
  – Aggregation

210
SPICA Propose Join
DATA SYSTEMS

211
SPICA Outer Join Specifications
DATA SYSTEMS

212
SPICA Outer Join Example
DATA SYSTEMS

213
Join Rank
(The highest ranked source is accessed first to construct the join.)
SPICA
DATA SYSTEMS

214
SPICA Calculation
DATA SYSTEMS

215
Convert Query to Flat file & connect it as
SPICA target (Useful for testing purpose)
DATA SYSTEMS

216
SPICA Aggregate Data
DATA SYSTEMS

217
SPICA Group By
DATA SYSTEMS

218
SPICA Using Case Transforms
DATA SYSTEMS

• Case transform: Specifies multiple paths in a single transform (different rows are processed in different ways).
• It provides case logic based on row values and operates within data flows.
• You use the Case transform to simplify branch logic in data flows by consolidating case or decision-making logic into one transform.

219
SPICA Case Expression
DATA SYSTEMS

220
SPICA Case Logic
DATA SYSTEMS

• When the Row can be TRUE for one case only option is enabled, the row is passed to the first case whose expression returns TRUE. Otherwise, the row is passed to all the cases whose expressions return TRUE.
• Default: If a row does not pass any case expression, it goes to the default case. The default case can be renamed.
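For illustration only, a Case transform routing rows by country might be labeled and configured as follows; the label names and the COUNTRY_ID column are assumptions, not part of the course material.

# Hypothetical Case transform labels and expressions:
#   Case_US  : COUNTRY_ID = 1
#   Case_UK  : COUNTRY_ID = 2
#   default  : receives rows for which no expression returns TRUE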

221
SPICA Case output
DATA SYSTEMS

222
SPICA Exercise
DATA SYSTEMS

223
SPICA Merge Transform
DATA SYSTEMS

• Combines incoming data sets, producing a single output data set with the same schema as the input data sets.
• All sources must have the same schema, including:
  • The same number of columns
  • The same column names
  • The same data types of columns
– If the column names are not the same, a source can be passed through a Query transform to rename the columns before the Merge.
224
SPICA Data flow with Merge Transform
DATA SYSTEMS

225
Data Outputs
SPICA
DATA SYSTEMS

• The output data set contains a row for every row in the
source data sets.
• The transform does not strip out duplicate rows.
• If the data types of columns in the sources do not match
the target, add a query in the data flow before the Merge
transform. In the query, apply a data type conversion to
the columns with data types that do not match the target
column data types.
• You must apply other operations such as DISTINCT in a
query following the Merge transform.

226
SPICA Merge Output
DATA SYSTEMS

227
SPICA Exercise
DATA SYSTEMS

228
SPICA Date_generation Transform
DATA SYSTEMS

• Produces a series of dates incremented as you specify


• Use this transform to produce the key values for a time
dimension target.
• From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query (see the sketch after this list).
• Data Outputs: A data set with a single column named
DI_GENERATED_DATE containing the date sequence.
• The rows generated are flagged as INSERT.
• Generated dates can range from 1900.01.01 through
9999.12.31.
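As a sketch only, a Query transform placed after the Date_Generation transform could derive time dimension columns from DI_GENERATED_DATE with mappings like the following; the output column names are hypothetical.

# Hypothetical Query output column mappings on DI_GENERATED_DATE:
#   CALENDAR_DATE : DI_GENERATED_DATE
#   MONTH_NUM     : month(DI_GENERATED_DATE)
#   QUARTER_NUM   : quarter(DI_GENERATED_DATE)
#   DATE_LABEL    : to_char(DI_GENERATED_DATE, 'YYYY.MM.DD')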
229
SPICA Options
DATA SYSTEMS

• Start date: The first date in the output sequence. Specify this date using the following format: YYYY.MM.DD, where YYYY is a year, MM is a month value, and DD is a day value.
• End date: The last date in the output sequence. Use the same format used for Start date to specify this date.
• Increment: The interval between dates in the output sequence. Select Daily, Monthly, or Weekly.

230
SPICA Options..
DATA SYSTEMS

• Join rank
• A positive integer indicating the weight of the output data
set if the data set is used in a join. Sources in the join
are accessed in order based on their join ranks. The
highest ranked source is accessed first to construct the
join.
• Cache
• Select this check box to hold the output from the
transform in memory to be used in subsequent
transforms. Select Cache only if the resulting data set is
small enough to fit in memory.

231
SPICA Configuration
DATA SYSTEMS

232
SPICA Key Generation
DATA SYSTEMS

• Generates new keys for new rows in a data set.


• When it is necessary to generate artificial keys in
a table, the Key_Generation transform looks up
the maximum existing key value from a table
and uses it as the starting value to generate new
keys.
• The transform expects the generated key
column to be part of the input schema.

233
Options
SPICA
DATA SYSTEMS

• Table name
  – The fully qualified name of the source table from which the maximum existing key is determined. The table should already be imported.
• Generated key column
  – The column in the key source table containing the existing key values. A column with the same name must exist in the input data set.
• Increment value
  – The interval between generated key values.

234
SPICA Configuration
DATA SYSTEMS

235
SPICA Time Dimension Population
DATA SYSTEMS

236
SPICA Data Transfer Transform
DATA SYSTEMS

• Data Integrator generates SQL SELECT statements to retrieve the


data from source databases.
• Data Integrator automatically distributes the processing workload by
pushing down as much as possible to the source database server.
• Pushing down operations provides the following advantages:
– Use the power of the database server to execute SELECT operations
(such as joins, Group By, and common functions such as decode and
string functions). Often the database is optimized for these operations.
– Minimize the amount of data sent over the network. Fewer rows need to be retrieved when the SQL statements include filters or aggregations.
• You can also do a full push down from the source to the target,
which means Data Integrator sends SQL INSERT INTO... SELECT
statements to the target database.

237
SPICA Data Transfer Transform..
DATA SYSTEMS

• The Data_Transfer transform writes the data from a source, or the output from another transform, into a transfer object and subsequently reads data from the transfer object.
• The transfer type can be a relational database table, persistent cache table, or file.
• Use the Data_Transfer transform to push down
operations to the database server when the transfer type
is a database table.
• You can push down resource-consuming operations
such as joins, GROUP BY, and sorts.

238
SPICA Configuration
DATA SYSTEMS

239
Example
SPICA
DATA SYSTEMS

• This simple data flow contains a Query


transform that does a lookup of sales subtotals
and groups the results by country and region.

240
SPICA Example..
DATA SYSTEMS

• Suppose the GROUP BY operation processes


millions of rows.
• Data Integrator cannot push the GROUP BY
operation down to the database because the
Query before it contains a lookup_ext function
which Data Integrator cannot push down.
• You can add a Data_Transfer transform to split
the lookup_ext function and the GROUP BY
operation into two sub data flows to enable Data
Integrator to push the GROUP BY to the target
database.
241
SPICA Example..
DATA SYSTEMS

When you execute the job, Data Integrator displays messages for each sub data flow. Also watch the transfer table being created and dropped during the job run.

242
SPICA Exercise
DATA SYSTEMS

243
SPICA Using Validation Transform
DATA SYSTEMS

• The validation transform provides the ability to compare your


incoming data against a set of pre-defined business rules and, if
needed, take any corrective actions. View Data features can identify
anomalies in the incoming data to help you better define corrective
actions in the validation transform.

244
Example
SPICA
DATA SYSTEMS

• The following example defines a validation transform that verifies that the zip code data is a five-digit number.
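A minimal sketch of such a rule as a custom validation condition, assuming the input column is named ZIP and using only the length built-in; a production rule would typically also check that every character is a digit.

# Custom condition on column ZIP (hypothetical column name);
# rows failing the condition are routed to the Fail output.
ZIP IS NOT NULL AND length(ZIP) = 5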

245
SPICA Apply Validation Rule on Zip Code
DATA SYSTEMS

246
SPICA Input Data
DATA SYSTEMS

247
SPICA Output in both Targets
DATA SYSTEMS

248
SPICA Built in Transforms & Operation Codes
DATA SYSTEMS

• Transforms manipulate data inputs and produce


one or more output data sets.
• Operation codes maintain the status of each row
in each dataset described by inputs to and
outputs from objects in data flows.

249
SPICA Operation Codes
DATA SYSTEMS

Normal: Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted from a source table. Most transforms operate only on rows flagged as NORMAL.

Insert: Rows can be flagged as INSERT by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as INSERT. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as INSERT as input.

Delete: Is ignored by the target. Rows flagged as DELETE are not loaded. Rows can be flagged as DELETE in the Map_Operation and Table_Comparison transforms. Only the History_Preserving transform, with the Preserve delete row(s) as update row(s) option selected, can accept data sets with rows flagged as DELETE.

Update: Rows can be flagged as UPDATE by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as UPDATE. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as UPDATE as input.
250
SPICA Workflows
DATA SYSTEMS

Workflows, Variables, Scripting & Functions

251
What is a work flow?
SPICA
DATA SYSTEMS

• A work flow defines the decision-making process


for executing data flows.
• For example, elements in a work flow can
determine the path of execution based on a
value set by a previous job or can indicate an
alternative path if something goes wrong in the
primary path.

252
SPICA Jobs Vs Workflow
DATA SYSTEMS

• Jobs are special work flows.


• Jobs are special because you can execute them.
• Almost all of the features documented for work
flows also apply to jobs, with one exception: jobs
do not have parameters.

253
Elements in work flows
SPICA
DATA SYSTEMS

• Workflow
• Data flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks

254
Order of execution in work flows
SPICA
DATA SYSTEMS

• Sequential Dataflow execution from left to right

255
SPICA Parallel Dataflow execution
DATA SYSTEMS

256
SPICA Parallel execution of complex workflows
DATA SYSTEMS

257
Using Data Integrator Scripting Language
and Variables
SPICA
DATA SYSTEMS

• You can increase the flexibility and reusability of work flows and data flows by using local and global variables when you design your jobs.
• For example, a single data flow can be used for processing either US or UK data when run at different times.
• The Query transform can have a where clause that references $region; based on the parameter value, data will be processed either for US or UK.
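A minimal sketch of this pattern; the script statement and the column name CUSTOMER.REGION are illustrative assumptions.

# Script placed before the data flow: set the variable for this run.
$region = 'US';    # or 'UK' for the UK load

# Where clause inside the Query transform (hypothetical column):
# CUSTOMER.REGION = $region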

258
SPICA Variables..
DATA SYSTEMS

• If you define variables in


a job or work flow, Data
Integrator typically uses
them in a script, catch, or
conditional process.

259
Variables can be used as file names for:
SPICA
DATA SYSTEMS

• Flat file sources and targets


• XML file sources and targets

260
SPICA Local Vs Global Variables
DATA SYSTEMS

• Global variables are defined at Job level & can


be used by any object within the job.
• Local variables are defined at workflow or
dataflow level

261
SPICA Parameters
DATA SYSTEMS

• Parameters can be defined to:


– Pass their values into and out of work flows
– Pass their values into data flows

262
SPICA Parameter & Variable creation
DATA SYSTEMS

263
Passing values into data flows
SPICA
DATA SYSTEMS

264
Defining local & Global variables
SPICA
DATA SYSTEMS

265
Defining parameters
SPICA
DATA SYSTEMS

• There are two steps for setting up a parameter


for a work flow or data flow:
– Add the parameter definition to the flow.
– Set the value of the parameter in the flow call.

266
SPICA Assign value to Parameter
DATA SYSTEMS

267
Viewing global variables- Job Property
SPICA
DATA SYSTEMS

268
Setting global variable values
SPICA
DATA SYSTEMS

• Values for global variables can be set outside a


job:
– As a job property
– As an execution or schedule property
– It can also be set while running job from designer
– Data Integrator saves schedule property values in the
repository. However, these values are only
associated with a job schedule, not the job itself.
Consequently, these values are viewed and edited
from within the Administrator.

269
Understanding Data Integrator Scripting
SPICA Language
DATA SYSTEMS

• Introduction
• Language syntax
  – Supports ANSI SQL-92 varchar behavior
  – Treats an empty string as a zero-length varchar value (instead of NULL)
  – Evaluates comparisons against NULL values to FALSE
  – Uses the new IS NULL and IS NOT NULL operators in the Data Integrator scripting language to test for NULL values
  – Treats trailing blanks as regular characters, instead of trimming them, when reading from all sources
  – Ignores trailing blanks in comparisons in transforms (Query and Table_Comparison) and functions (decode, ifthenelse, lookup, lookup_ext, lookup_seq)

270
SPICA Basic Syntax rules
DATA SYSTEMS

• Statements end with a semicolon (;)
• Variable names begin with a dollar sign ($)
• String values are enclosed in single quotes (')
• Comments begin with a pound sign (#)
271
Comparison results for the
variable assignments $var1 = NULL and $var2=NULL
SPICA
DATA SYSTEMS

Condition              Translates to                        Returns

If (NULL = NULL)       NULL is equal to NULL                FALSE
If (NULL != NULL)      NULL is not equal to NULL            FALSE
If (NULL = '')         NULL is equal to empty string        FALSE
If (NULL != '')        NULL is not equal to empty string    TRUE
If ('bbb' = NULL)      bbb is equal to NULL                 FALSE
If ('bbb' != NULL)     bbb is not equal to NULL             FALSE
If ('bbb' = '')        bbb is equal to empty string         FALSE
If ('bbb' != '')       bbb is not equal to empty string     TRUE
If ($var1 = NULL)      NULL is equal to NULL                FALSE
If ($var1 != NULL)     NULL is not equal to NULL            FALSE
If ($var1 = '')        NULL is equal to empty string        FALSE
If ($var1 != '')       NULL is not equal to empty string    FALSE
If ($var1 = $var2)     NULL is equal to NULL                FALSE
If ($var1 != $var2)    NULL is not equal to NULL            FALSE
272
Comparing two variables: always test for NULL
SPICA
DATA SYSTEMS

Condition: If ($var1 = $var2)
Recommendation: Do not compare without explicitly testing for NULLs. Business Objects does not recommend using this logic because any relational comparison to a NULL value returns FALSE.

Condition: If (($var1 IS NULL) AND ($var2 IS NULL)) OR ($var1 = $var2)
Recommendation: Will execute the TRUE branch if both $var1 and $var2 are NULL, or if neither is NULL but they are equal to each other.
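A scripted version of the recommended test, using the ifthenelse built-in; the variable names are illustrative.

# Any relational comparison involving NULL returns FALSE,
# so test for NULL explicitly before comparing the values.
$vars_match = ifthenelse((($var1 IS NULL) AND ($var2 IS NULL)) OR ($var1 = $var2), 'Y', 'N');
print('Variables match: ' || $vars_match);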

273
SPICA Workflow Object-Scripts
DATA SYSTEMS

• Scripts are single-use objects used to call


functions and assign values to variables in a
work flow.

274
Setting file names at run-time using variables
SPICA
DATA SYSTEMS

• When you use variables as file names for sources and targets, you can also use multiple file names and wildcards.
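A minimal sketch, assuming a global variable $G_FILENAME is defined at the job level and used as the file name of a flat file source or target; to_char and sysdate are built-in functions.

# Build a dated file name at run time.
$G_FILENAME = 'orders_' || to_char(sysdate(), 'YYYYMMDD') || '.dat';

# A wildcard can also be assigned so that multiple files are read:
# $G_FILENAME = 'orders_*.dat';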

275
Workflow-Try/catch blocks
SPICA
DATA SYSTEMS

• A try/catch block is a combination of one try


object and one or more catch objects that allow
you to specify alternative work flows if errors
occur while Data Integrator is executing a job.

276
SPICA Try-Catch
DATA SYSTEMS

277
SPICA Available Exceptions for Catching
DATA SYSTEMS

This object gets executed when an exception is caught
(it could be a work flow, a script, etc.)

278
Conditionals
SPICA
DATA SYSTEMS

• Conditionals are single-use objects used to implement if/then/else logic in a work flow.
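For illustration only, a conditional might be configured as follows; the variable and work flow names are hypothetical.

# If   : $G_REGION = 'US'
# Then : WF_Load_US   (executed when the expression returns TRUE)
# Else : WF_Load_UK   (executed otherwise)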

279
SPICA Workflow with try-catch & conditional
DATA SYSTEMS

280
SPICA Conditional Configuration
DATA SYSTEMS

281
SPICA Functions
DATA SYSTEMS

• Define built in functions


• Differentiate between functions and transforms
• List types of operations available for functions
• Using functions in expressions
• Use date and time functions and date_generation
transform to build a dimension table
• Use lookup functions to look up status in a table
• Understand other functions

282
SPICA Built in functions
DATA SYSTEMS

• Functions take input values and produce a return value. Functions operate on individual values passed to them.
• Input values can be parameters passed into a data flow, values from a column of data, or variables defined inside a script. You can use functions in expressions that include script and conditional statements.

283
Differentiating between functions and
SPICA transforms
DATA SYSTEMS

• Functions operate on single values, such as values in specific columns in a data set.
• Transforms operate on data sets, creating, updating, and deleting rows of data.

284
SPICA Types of operations for Functions
DATA SYSTEMS

Operation type        Description

Aggregate: Generates a single value from a set of values. Aggregate functions include max, min, and count.

Iterative: Maintains state information from one invocation to another. The life of an iterative function's state information is the execution life of the query in which it is included. Example: lookup.

Stateless: State information is not maintained from one invocation to the next. Stateless functions such as to_char or month can be used anywhere expressions are allowed.
285
SPICA Functions
DATA SYSTEMS

286
SPICA Other types of functions
DATA SYSTEMS

• Database & application functions – Functions specific to your DBMS
• Custom functions – Functions you define yourself

287
SPICA Using functions in expressions
DATA SYSTEMS

• Functions are typically used to add:
  – Columns based on some other value (lookup function)
  – Generated key fields
• You can use functions in:
  – Transforms: for example, the Query, Case, and SQL transforms.
  – Scripts: these are single-use objects used to call functions and assign values to variables in a work flow.
  – Conditionals: these are single-use objects used to implement if/then/else logic in a work flow. Conditionals and their components (if expressions, then and else diagrams) are included in the scope of the parent control flow's variables and parameters.
  – Other custom functions
288
SPICA Custom functions
DATA SYSTEMS

– Written by the user in Data Integrator scripting language


– Reusable objects
– Managed through the function wizard

Consider these guidelines when you create your own functions:

• Functions can call other functions.


• Functions cannot call themselves.
• Functions cannot participate in a cycle of recursive calls. For example, function A cannot call function B, which calls function A.
• Functions return a value.
• Functions can have parameters for input, output, or both. However, data flows cannot pass parameters of type output or input/output.

Before creating a custom function, you must know the input, output, and return values and their data types. The return value is predefined to be Return.
289
SPICA Using built in functions
DATA SYSTEMS

• The built-in functions for date and time and the built-in Date_Generation transform are useful when building a time dimension table.
• to_char
  – Converts a date to a string.
• to_date
  – Converts a string to a date.
• month
  – Determines the month in which the given date falls.
• quarter
  – Determines the quarter in which the given date falls.
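Illustrative calls to these functions; the literal values are arbitrary.

$date_string = to_char(sysdate(), 'YYYY.MM.DD');       # date to string
$load_date   = to_date('2008.01.15', 'YYYY.MM.DD');    # string to date
$month_num   = month($load_date);                      # returns 1
$quarter_num = quarter($load_date);                    # returns 1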

290
Use lookup functions to look up status in a
SPICA table
DATA SYSTEMS

• A specialized type of join, similar to an SQL outer join


– A SQL outer join may return multiple matches for a single record in the
outer table.
– Lookup functions always return exactly the same no. of records that are
in the source (outer) table.
– Sophisticated caching options
– A default value when no match is found.

291
SPICA Lookup functions
DATA SYSTEMS

292
Lookup_ext()
SPICA
DATA SYSTEMS
• While all lookup functions return one row for each row in the source, they differ in how they choose which of several matching rows to return:
  – Lookup_ext()
    • Allows specification of an Order by column and a Return policy (Min, Max) to return the record with the highest/lowest value in a given field, for example a surrogate key.
    • This function also extends functionality by allowing you to:
      – Return multiple columns from a single lookup.
      – Choose from more operators to specify a lookup condition.
      – Specify a return policy for the lookup.
      – Call lookup_ext, using the Function Wizard, in the query output.

293
SPICA lookup_ext Syntax:
DATA SYSTEMS

lookup_ext ([translate_table, cache_spec, return_policy],


[return_column_list], [default_value_list],[condition_list],
[orderby_column_list],[output_variable_list], [sql_override])

294
SPICA Using lookup_seq
DATA SYSTEMS

• Retrieves a value in a table or file based on the values in a different source


table or file and a particular sequence value.
• Lookup_seq syntax:
lookup_seq ( translate_table, result_column, default_value,
sequence_column, sequence_expression, compare_column,expression)

295
Use database type functions to
SPICA return information on data sources
DATA SYSTEMS

• db_type
  – Returns the database type of the datastore configuration in use at runtime. This function is useful if your datastore has multiple configurations.
  Syntax: db_type(ds_name)
• db_version
  – Returns the database version of the datastore configuration in use at runtime.
  Syntax: db_version(ds_name)
• db_database_name
  – Returns the database name of the datastore configuration in use at runtime.
  Syntax: db_database_name(ds_name)
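Illustrative calls, assuming a datastore named DS_SALES with multiple configurations; the datastore name is passed as a quoted string.

$db_type    = db_type('DS_SALES');
$db_version = db_version('DS_SALES');
$db_name    = db_database_name('DS_SALES');
print('Running against ' || $db_type || ' ' || $db_version || ' (' || $db_name || ')');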

296
DB Functions
SPICA
DATA SYSTEMS

• db_owner
  – Returns the real owner name for the datastore configuration that is in use at runtime. This function is useful if your datastore has multiple configurations because, with multiple configurations, you can use alias owner names instead of database owner names.
  Syntax: db_owner(ds_name, alias_name)
• decode
  – Returns an expression based on the first condition in the specified list of conditions and expressions that evaluates to TRUE. It provides an alternate way to write nested ifthenelse functions.
  Syntax: decode(condition_and_expression_list, default_expression)
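A minimal decode sketch, assuming a variable $region_id holds a numeric region code; the names and labels are arbitrary.

$region_name = decode(($region_id = 1), 'North America',
                      ($region_id = 2), 'Europe',
                      'Other');    # default expression when no condition is TRUE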

297
SPICA
DATA SYSTEMS

More Transforms

298
Map_Operation
SPICA
DATA SYSTEMS

• Allows conversions between data manipulation


operations.
• The Map_Operation transform allows you to
change operation codes on data sets to produce
the desired output.

299
To Delete rows based on input change
SPICA Normal-Delete
DATA SYSTEMS

300
Table_Comparison
SPICA
DATA SYSTEMS

• Compares two data sets and produces the


difference between them as a data set with rows
flagged as INSERT, UPDATE, or DELETE.
• The Table_Comparison transform allows you to
detect and forward changes that have occurred
since the last time a target was updated.

301
SPICA Data Inputs
DATA SYSTEMS

• The data set from a source or the output from


another transform. Only rows flagged as
NORMAL are considered by the transform. This
data is referred to as the input data set.

• The specification for a database table to


compare to the input data set. This table is
referred to as the comparison table.

302
Comparison method
SPICA
DATA SYSTEMS

Row-by-row select — Select this option to have the transform look up


the target table using SQL every time it receives an input row. This option
is best if the target table is large compared to the number of rows the
transform will receive as input.
Cached comparison table — Select this option to load the comparison
table into memory. In this case, queries to the comparison table access
memory rather than the actual table. This option is best when you are
comparing the entire target table. Data Integrator uses pageable cache
as the default. If the table fits in the available memory, you can change
the cache type to in-memory in the data flow Properties.
Sorted input — Select this option to read the comparison table in the
order of the primary key column(s) using sequential read.
This option improves performance because Data Integrator reads the
comparison table only once.

303
SPICA Options
DATA SYSTEMS

304
SPICA
DATA SYSTEMS

• Input primary key column(s)


– The input data set columns that uniquely identify each
row. These columns must be present in the
comparison table with the same column names and
data types.
• Compare columns
– (Optional) Improves performance by comparing only
the sub-set of columns you drag into this box from the
input schema. If no columns are listed, all columns in
the input data set that are also in the comparison
table are used as compare columns.
305
SPICA
DATA SYSTEMS

• Generated Key Column


– For an UPDATE, the output data set will contain the
largest generated key found for a given primary key.
• Detect deleted row(s) from comparison table
– (Optional) Generates DELETEs for all rows that are in
the comparison table and not in the input set.
Assumes the input set represents the complete data
set. By default this option is turned off.

306
Data Outputs
SPICA
DATA SYSTEMS

• A data set containing rows flagged as INSERT,


UPDATE or DELETE.
• This data set contains only the rows that make
up the difference between the two input sources:
one from the input to the transform (input data
set), and one from a database table you specify
in the transform (the comparison table).

307
SPICA Insert
DATA SYSTEMS

308
SPICA Update
DATA SYSTEMS

309
SPICA Delete
DATA SYSTEMS

310
History_Preserving
SPICA
DATA SYSTEMS

• The History_Preserving transform allows you to


produce a new row in your target rather than
updating an existing row.
• You can indicate in which columns the transform
identifies changes to be preserved.
• If the value of certain columns change, this
transform creates a new row for each row
flagged as UPDATE in the input data set.

311
Data Inputs
SPICA
DATA SYSTEMS

• A data set that is the result of a comparison


between two images of the same data in which
changed data from the newer image are flagged
as UPDATE rows and new data from the newer
image are flagged as INSERT rows.

312
SPICA Input Data..
DATA SYSTEMS

313
SPICA Properties
DATA SYSTEMS

314
SPICA SCD-2
DATA SYSTEMS

• Date Based
• Flag Based

315
SPICA Flag based Dimension
DATA SYSTEMS

• Compare columns
  – Rows flagged as INSERT should be inserted with the current flag set to 'A'.
  – If the input row is flagged as UPDATE because of a phone number change, the row should be updated and the current flag does not change.
  – If the input row is flagged as UPDATE because of a state change, the row should be inserted with the current flag set to 'A' and the existing row should be updated with the current flag set to 'I'.

316
Auditing
SPICA
DATA SYSTEMS

• Auditing provides a way to ensure that a data flow loads


correct data into the warehouse. Use auditing to perform
the following tasks:
• Define audit points to collect run time statistics about the
data that flows out of objects.
• Define rules with these audit statistics to ensure that the
data at the following points in a data flow is what you
expect:
– Extracted from sources
– Processed by transforms
– Loaded into targets
• Generate a run time notification that includes the audit
rule that failed.

317
SPICA
DATA SYSTEMS

318
SPICA Audit Configuration
DATA SYSTEMS

• Audit label — The unique name in the data flow that


Data Integrator generates for the audit statistics
collected for each audit function that you define. You use
these labels to define audit rules for the data flow.
• Audit rule — A Boolean expression in which you use audit labels to verify the Data Integrator job (a sketch follows this list). If you define multiple rules in a data flow, all rules must succeed or the audit fails.
• Actions on audit failure — One or more of three ways to
generate notification of an audit rule (or rules) failure:
email, custom script, raise exception.
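As a sketch, assuming Count audit points are defined on a source named ORDERS and a target named ORDERS_FACT; the label names below depend on how the audit points are actually defined.

# Audit labels generated for the Count audit function (hypothetical object names):
#   $Count_ORDERS        rows read from the source
#   $Count_ORDERS_FACT   rows loaded into the target
#
# Audit rule (Boolean expression): the audit fails when the counts differ.
# $Count_ORDERS = $Count_ORDERS_FACT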

319
Accessing the Audit window
SPICA
DATA SYSTEMS

320
SPICA Define Audit Points
DATA SYSTEMS

321
SPICA Audit Rule & Action on failure
DATA SYSTEMS

322
SPICA Installation & Configuration
DATA SYSTEMS

323
Create local or Central Repositories in
SPICA Repository Manager
DATA SYSTEMS

324
Add Job Server & Associate Repositories
SPICA
DATA SYSTEMS

325
Connect to local repository & access global
repository from it by activating link
SPICA (Tools/Central Repositories)
DATA SYSTEMS

326
SPICA Central Repository Objects are available
DATA SYSTEMS

327
Add objects from local to central repository
SPICA for the first time to check-in the object
DATA SYSTEMS

328
Check-out the object next time you
SPICA want to make any changes
DATA SYSTEMS

329
SPICA Check-In after changes are done
DATA SYSTEMS

330
SPICA Undo check-out to read only permission
DATA SYSTEMS

331
SPICA Management Console
DATA SYSTEMS

332
SPICA Register Repository
DATA SYSTEMS

333
SPICA Schedule Batch Jobs
DATA SYSTEMS

334
SPICA Execute Job & view log
DATA SYSTEMS

335
SPICA Add Users
DATA SYSTEMS

336
SPICA Log Retention Period
DATA SYSTEMS

337
