SPICA DATA SYSTEMS
Training Details
• DBMS
• RDBMS
• SQL
• Data Warehousing
• ETL Tool- Business Objects Data Integrator
DBMS: What is a database, anyway?
Basic database terms: Item/Entity
Person’s Name
Person’s Address
Person’s Phone No
Person’s DOB
Basic database terms: Field
Basic database terms: Record
Basic database terms: Value
Putting it all together: Table
Figure: a Table (Person) representing an Item/Entity
Basic database terms: Primary Key
Figure: the Person table
• dBase
• Oracle
• SQL Server
• Etc.
RDBMS
Relationship between Tables: Foreign Key
Country
Country Id | Name
1          | USA
2          | UK

Person
Person Id | Name       | Address                        | Phone No.    | Country Id | DOB
1001      | John Steel | 344 Main Street, New York City | 201-367-2345 | 1          | 21/12/1997
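The two tables above can be tried out with a minimal sketch in Python's built-in sqlite3 module. SQLite stands in for Oracle here (an assumption made for portability), so TEXT and INTEGER replace the Oracle data types and the DOB is stored as ISO text. The second person, pointing at a non-existent country 99, is hypothetical and exists only to show the foreign key rejecting an orphan row. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables on the slide.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,   -- primary key of the parent table
        name       TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT,
        phone_no   TEXT,
        country_id INTEGER REFERENCES country(country_id),  -- foreign key
        dob        TEXT
    )""")

conn.executemany("INSERT INTO country VALUES (?, ?)", [(1, "USA"), (2, "UK")])
conn.execute(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)",
    (1001, "John Steel", "344 Main Street, New York City", "201-367-2345", 1, "1997-12-21"),
)

# Hypothetical orphan row: country 99 does not exist, so the insert is rejected.
try:
    conn.execute("INSERT INTO person VALUES (1002, 'Jane Doe', NULL, NULL, 99, NULL)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Follow the foreign key from the child row back to its parent row.
row = conn.execute("""
    SELECT p.name, c.name
    FROM person p JOIN country c ON p.country_id = c.country_id
""").fetchone()
print(fk_rejected, row)  # True ('John Steel', 'USA')
```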
Data Types (Data Classification)
• Char()
• Varchar2()
• Number
• Number (m, n)
• Date
• Etc.
RDBMS Examples
• Oracle
• SQL Server
• Etc.
Structured Query Language (SQL)
Features of SQL
SQL Statements
Data Definition Language (DDL): CREATE, ALTER, DROP, RENAME, TRUNCATE
Data Manipulation Language (DML): INSERT, UPDATE, DELETE
Transaction Control Language (TCL): COMMIT, ROLLBACK, SAVEPOINT
Data Retrieval Language (DRL): SELECT
Data Control Language (DCL): GRANT, REVOKE
Database Objects
Object   | Description
Table    | Basic unit of storage; composed of rows and columns
View     | Logically represents subsets of data from one or more tables
Sequence | Generates primary key values
Index    | Improves the performance of some queries
Synonym  | Gives alternative names to objects
Dual Table
For example:
Writing Simple Queries
For example:
SELECT * FROM jobs;
Ensuring Uniqueness
For example:
Limiting Rows
For Example:
Comparison Operators
1. Equality: =
2. Inequality: !=, <> or ^=
3. Less than: <
4. Greater than: >
5. Less than or equal to: <=
6. Greater than or equal to: >=
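A minimal sketch of the last few slides (SELECT *, DISTINCT, and a WHERE clause using two of the comparison operators), run through Python's sqlite3 module. The employees rows are a small illustrative set modeled loosely on Oracle's HR sample schema, not the real table. SQLite accepts != and <> but not Oracle's ^=, so the sketch uses <>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down employees table; values are illustrative only.
conn.execute("CREATE TABLE employees (last_name TEXT, job_id TEXT, department_id INTEGER, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    ("King", "AD_PRES", 90, 24000),
    ("Kochhar", "AD_VP", 90, 17000),
    ("De Haan", "AD_VP", 90, 17000),
    ("Higgins", "AC_MGR", 110, 12008),
    ("Gietz", "AC_ACCOUNT", 110, 8300),
])

# SELECT *: every column of every row.
all_rows = conn.execute("SELECT * FROM employees").fetchall()

# DISTINCT removes duplicate values (AD_VP appears twice in the table).
jobs = conn.execute("SELECT DISTINCT job_id FROM employees ORDER BY job_id").fetchall()

# WHERE limits the rows; >= and <> are two of the comparison operators listed above.
high_paid = conn.execute("""
    SELECT last_name FROM employees
    WHERE salary >= 17000 AND job_id <> 'AD_PRES'
    ORDER BY last_name
""").fetchall()

print(len(all_rows))  # 5
print(jobs)           # [('AC_ACCOUNT',), ('AC_MGR',), ('AD_PRES',), ('AD_VP',)]
print(high_paid)      # [('De Haan',), ('Kochhar',)]
```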
Exercise-1
Sorting Rows
For example:
SELECT first_name || ' ' || last_name
FROM employees
WHERE department_id = 90
ORDER BY last_name;
Conversion Functions
Exercise
Group Functions
AVG
For example:
SELECT job_id, AVG(salary)
FROM hr.employees
WHERE job_id LIKE 'AC%'
GROUP BY job_id;
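The AVG example can be run, together with MAX and SUM, in a minimal sqlite3 sketch. The employee rows are hypothetical, and the table is plain employees because SQLite has no hr schema. Each group function collapses the rows of one job_id group into a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; SQLite has no "hr" schema, so the table is just "employees".
conn.execute("CREATE TABLE employees (employee_id INTEGER, job_id TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (205, "AC_MGR", 12000), (206, "AC_ACCOUNT", 8000),
    (207, "AC_ACCOUNT", 9000), (100, "AD_PRES", 24000),
])

rows = conn.execute("""
    SELECT job_id, AVG(salary), MAX(salary), SUM(salary)
    FROM employees
    WHERE job_id LIKE 'AC%'      -- only the accounting jobs
    GROUP BY job_id              -- one result row per job_id
    ORDER BY job_id
""").fetchall()
print(rows)
# [('AC_ACCOUNT', 8500.0, 9000.0, 17000.0), ('AC_MGR', 12000.0, 12000.0, 12000.0)]
```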
MAX
For Example:
SUM
Grouping Data with GROUP BY
As the name implies, group functions work on data that is grouped. We tell
the database how to group or categorize the data with a GROUP BY clause.
Whenever we use a group function in the SELECT clause of a SELECT
statement, every column in the SELECT list that is not inside a group
function must also appear in the GROUP BY clause. If no GROUP BY clause is
specified, the entire result set is treated as a single group.
For example:
SELECT cust_state_province, COUNT(*) customer_count
FROM sh.customers
GROUP BY cust_state_province;
In this example, we categorize the data by state and apply the group
function (COUNT). It returns the number of rows for each state in the
CUSTOMERS table. If we want to order the results by the number of
customers, the ORDER BY clause can contain either the column position
(ORDER BY 2) or the group function itself (ORDER BY COUNT(*)).
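Both ORDER BY forms described above can be checked with a minimal sqlite3 sketch. The customers table is a hypothetical stand-in for sh.customers holding only the column the query needs, and DESC is added so the states with the most customers come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for sh.customers with only the grouping column.
conn.execute("CREATE TABLE customers (cust_id INTEGER, cust_state_province TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    (1, "CA"), (2, "CA"), (3, "CA"), (4, "NY"), (5, "NY"), (6, "TX"),
])

# Order by the column position of the group function...
by_position = conn.execute("""
    SELECT cust_state_province, COUNT(*) AS customer_count
    FROM customers
    GROUP BY cust_state_province
    ORDER BY 2 DESC
""").fetchall()

# ...or by the group function itself; both give the same order here.
by_function = conn.execute("""
    SELECT cust_state_province, COUNT(*) AS customer_count
    FROM customers
    GROUP BY cust_state_province
    ORDER BY COUNT(*) DESC
""").fetchall()

print(by_position)                 # [('CA', 3), ('NY', 2), ('TX', 1)]
print(by_position == by_function)  # True
```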
Joins
• Equi Joins
• Outer Joins
Equi Joins
For example:
SELECT locations.location_id, city, department_name
FROM locations, departments
WHERE locations.location_id = departments.location_id;
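A minimal sqlite3 sketch of the equi join above, using hypothetical miniature locations and departments tables. Rows are combined only where the location_id values are equal, so the London location, which has no department, does not appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the HR locations and departments tables.
conn.execute("CREATE TABLE locations (location_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE departments (department_id INTEGER, department_name TEXT, location_id INTEGER)")
conn.executemany("INSERT INTO locations VALUES (?, ?)",
                 [(1700, "Seattle"), (1800, "Toronto"), (2400, "London")])
conn.executemany("INSERT INTO departments VALUES (?, ?, ?)", [
    (10, "Administration", 1700), (20, "Marketing", 1800), (90, "Executive", 1700),
])

# Equi join: the join condition is an equality between the two location_id columns.
rows = conn.execute("""
    SELECT locations.location_id, city, department_name
    FROM locations, departments
    WHERE locations.location_id = departments.location_id
    ORDER BY department_name
""").fetchall()
print(rows)
# [(1700, 'Seattle', 'Administration'), (1700, 'Seattle', 'Executive'), (1800, 'Toronto', 'Marketing')]
```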
Outer Joins
We use an outer join to see the rows from one table even when there is
no corresponding row in the joined table.
Left Outer Join
A left outer join is a join between two tables that returns the rows
that satisfy the join condition as well as the unmatched rows from the
table to the left of the JOIN clause. For example, a left outer join of
the COUNTRIES table to the LOCATIONS table returns the country name and
city name for every matching pair, plus every remaining country name from
the COUNTRIES table (with a NULL city).
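A minimal sqlite3 sketch of that COUNTRIES/LOCATIONS left outer join. Both tables are hypothetical miniatures; Brazil has no location, so it still appears, with None (SQL NULL) in the city column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature COUNTRIES and LOCATIONS tables.
conn.execute("CREATE TABLE countries (country_id TEXT PRIMARY KEY, country_name TEXT)")
conn.execute("CREATE TABLE locations (location_id INTEGER, city TEXT, country_id TEXT)")
conn.executemany("INSERT INTO countries VALUES (?, ?)", [
    ("US", "United States of America"), ("UK", "United Kingdom"), ("BR", "Brazil"),
])
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)",
                 [(1700, "Seattle", "US"), (2400, "London", "UK")])

# COUNTRIES is on the left, so all of its rows are kept, matched or not.
rows = conn.execute("""
    SELECT c.country_name, l.city
    FROM countries c
    LEFT OUTER JOIN locations l ON c.country_id = l.country_id
    ORDER BY c.country_name
""").fetchall()
print(rows)
# [('Brazil', None), ('United Kingdom', 'London'), ('United States of America', 'Seattle')]
```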
Right Outer Join
Full Outer Join
Example
DDL
• CREATE
• ALTER
• DROP
• TRUNCATE
• RENAME
CREATE TABLE Statement
Create Customer Table
Alter Table
Add Column
Rename Column
Change Datatype
Drop Column
Drop Table
Truncate Table
Rename Table: Syntax
Exercise
• Create a table.
• Alter a table.
• Drop a table.
• Truncate a table.
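The exercise steps can be rehearsed with a minimal sqlite3 sketch. The customer table is hypothetical, and the syntax is SQLite's rather than Oracle's: SQLite renames with ALTER TABLE ... RENAME TO and has no TRUNCATE statement, so an unqualified DELETE empties the table instead. (In Oracle, TRUNCATE is a DDL statement and cannot be rolled back, unlike DELETE.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table (hypothetical customer table for the exercise).
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Alter the table: add a column.
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")

# Rename the table (SQLite syntax: ALTER TABLE ... RENAME TO ...).
conn.execute("ALTER TABLE customer RENAME TO client")

conn.executemany("INSERT INTO client (customer_id, name, city) VALUES (?, ?, ?)",
                 [(1, "Acme", "Boston"), (2, "Globex", "Denver")])
conn.commit()
columns = [info[1] for info in conn.execute("PRAGMA table_info(client)")]  # info[1] is the column name
print(columns)  # ['customer_id', 'name', 'city']

# "Truncate": a DELETE without a WHERE clause removes every row.
conn.execute("DELETE FROM client")
conn.commit()
remaining = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
print(remaining)  # 0

# Drop the table entirely.
conn.execute("DROP TABLE client")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```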
Modifying Data
DML Statements
Inserting Rows into a Table
Insert: Example
Updating Rows in a Table
Update: Example
UPDATE order_rollup
SET (qty, price) = (SELECT SUM(qty), SUM(price)
                    FROM order_lines
                    WHERE customer_id = 'KOHL')
WHERE customer_id = 'KOHL'
  AND order_period = TO_DATE('01-Oct-2001', 'DD-Mon-YYYY');
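A minimal sqlite3 sketch of the same correlated update, with hypothetical data in order_lines and order_rollup. It assumes two changes for SQLite: the date is compared as ISO text because SQLite has no TO_DATE, and each column gets its own scalar subquery, which gives the same result as Oracle's (qty, price) = (SELECT ...) form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables mirroring the names in the UPDATE example.
conn.execute("CREATE TABLE order_lines (customer_id TEXT, qty INTEGER, price REAL)")
conn.execute("CREATE TABLE order_rollup (customer_id TEXT, order_period TEXT, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                 [("KOHL", 10, 100.0), ("KOHL", 5, 40.0), ("MACY", 7, 70.0)])
conn.execute("INSERT INTO order_rollup VALUES ('KOHL', '2001-10-01', 0, 0)")

# One scalar subquery per column; only KOHL's order lines are summed.
conn.execute("""
    UPDATE order_rollup
    SET qty   = (SELECT SUM(qty)   FROM order_lines WHERE customer_id = 'KOHL'),
        price = (SELECT SUM(price) FROM order_lines WHERE customer_id = 'KOHL')
    WHERE customer_id = 'KOHL'
      AND order_period = '2001-10-01'   -- ISO date text instead of TO_DATE
""")
conn.commit()
result = conn.execute("SELECT qty, price FROM order_rollup").fetchone()
print(result)  # (15, 140.0)
```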
OR
Merging Rows into a Table
Example
Transaction Control Statements
• COMMIT
• ROLLBACK
• SAVEPOINT
Advantages of COMMIT & ROLLBACK Statements
Controlling Transactions
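Figure: savepoints A and B marked within one transaction. ROLLBACK TO B
undoes the work done after savepoint B, ROLLBACK TO A undoes the work done
after savepoint A, and a plain ROLLBACK undoes the entire transaction.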
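The savepoint sequence in the figure can be traced step by step with a minimal sqlite3 sketch. The connection is opened with isolation_level=None (autocommit), so the sketch issues BEGIN, SAVEPOINT, ROLLBACK TO and ROLLBACK itself instead of relying on the module's implicit transactions. The table t and its rows are hypothetical.

```python
import sqlite3

# Autocommit mode: transaction statements are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (step TEXT)")

def steps():
    """Return the rows currently visible in t, in insertion order."""
    return [r[0] for r in conn.execute("SELECT step FROM t ORDER BY rowid")]

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES ('insert 1')")
conn.execute("SAVEPOINT a")
conn.execute("INSERT INTO t VALUES ('insert 2')")
conn.execute("SAVEPOINT b")
conn.execute("INSERT INTO t VALUES ('insert 3')")

conn.execute("ROLLBACK TO b")   # undoes only 'insert 3'
after_b = steps()
conn.execute("ROLLBACK TO a")   # also undoes 'insert 2'
after_a = steps()
conn.execute("ROLLBACK")        # undoes the whole transaction
after_all = steps()

print(after_b)    # ['insert 1', 'insert 2']
print(after_a)    # ['insert 1']
print(after_all)  # []
```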
State of the Data After COMMIT
State of the Data After ROLLBACK
Exercise
Data Warehousing
Agenda
• OLTP Systems
• Business Intelligence
• Data Warehousing Components
• Dimensional Modeling
– Dimensions
– Facts
– Star Schema
– Time Dimension
• Retail Store Case Study
• Dimensional Modeling life Cycle
• Surrogate Keys
• Degenerate Dimensions
• Snowflake Vs Star Schema
Agenda (continued)
OLTP Systems
Data Models Revisited
What is the Data Modeling Life Cycle?
E/R Modeling
Features of the ER model:
• The ER model is highly normalized.
• The emphasis is on optimizing OLTP transactions.
ER Model for Retail Sales
Figure: ER model entities and their attributes
Customer_Master: Customer_Cd, Customer_Type_Cd (FK), City_Cd (FK), Customer_Desc, Address_Line1, Address_Line2, Phone_No, Zipcode
Region_Master: Region_Cd, Region_Desc
Channel_Master: Channel_Code, Channel_Desc
Country_Master: Country_Cd, Region_Cd (FK), Country_Desc
Customer_Channle_Master: Customer_Channel_ID, Channel_Code (FK), Customer_Cd (FK)
State_Master: State_Cd
City_Master: City_Cd
Answer the following Queries
SQL for Queries
SQL Queries for Quarterly/Yearly Sales
• SELECT SUM(Sales_Amt), TO_CHAR(order_date, 'Q-YYYY')
  FROM Tx_table
  GROUP BY TO_CHAR(order_date, 'Q-YYYY');
• SELECT SUM(Sales_Amt), TO_CHAR(order_date, 'YYYY')
  FROM Tx_table
  GROUP BY TO_CHAR(order_date, 'YYYY');
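The same quarterly and yearly totals can be produced in a minimal sqlite3 sketch over a hypothetical Tx_table. SQLite has no TO_CHAR with a quarter mask, so the sketch assumes the quarter is derived from the month as (month + 2) / 3 using integer division, and builds the same 'Q-YYYY' label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction table; dates are stored as ISO-8601 text in SQLite.
conn.execute("CREATE TABLE tx_table (order_date TEXT, sales_amt REAL)")
conn.executemany("INSERT INTO tx_table VALUES (?, ?)", [
    ("2003-01-15", 100.0), ("2003-03-02", 50.0),   # Q1 2003
    ("2003-04-20", 70.0),                          # Q2 2003
    ("2004-11-05", 30.0),                          # Q4 2004
])

# Quarter label 'Q-YYYY'; the outer parentheses matter because || binds tighter than /.
quarter_expr = ("((CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3) "
                "|| '-' || strftime('%Y', order_date)")

quarterly = conn.execute(f"""
    SELECT SUM(sales_amt), {quarter_expr} AS qtr
    FROM tx_table GROUP BY qtr ORDER BY MIN(order_date)
""").fetchall()

yearly = conn.execute("""
    SELECT SUM(sales_amt), strftime('%Y', order_date) AS yr
    FROM tx_table GROUP BY yr ORDER BY yr
""").fetchall()

print(quarterly)  # [(150.0, '1-2003'), (70.0, '2-2003'), (30.0, '4-2004')]
print(yearly)     # [(220.0, '2003'), (30.0, '2004')]
```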
Observations for ER Diagram
Business Intelligence
• Business Intelligence
– What is Business Intelligence?
• BI refers to the technologies and applications used to collect,
integrate, analyze and present business information.
– How is it different from OLTP systems?
• OLTP systems are designed for day-to-day operations, while BI
applications support strategic decision making.
Functional View of Systems
Figure: separate functional systems for Sales, Marketing, Finance, Rates/Customer, MIS, Regulatory, Service, Demographics, Promotions, General Ledger and Contracts
Difficult Answer for Simple Questions
Data Integration for DSS (Decision Support Systems)
Figure: Sales; product data, customer data, sales data, revenue data, G/L data and external data integrated for decision support
Components of Business Intelligence
• Data Warehouse
• Data Marts
• Enterprise Data Warehouse
• OLAP (Reports & Dashboards)
• Operational Data Store
• Data Mining
How Does BI Help Companies?
Data Warehouse
Goals of Data Warehousing
Characteristics of DW
• Subject Oriented
• Integrated
• Non-volatile
• Time-variant
OLTP/OSS
• Standard reporting
• Ad-hoc query and reporting
• Multidimensional analytical reporting
• Predictive analysis and planning
• Data Cubes (ROLAP vs MOLAP)
OLTP vs OLAP Requirements
• Workload
• Data Modifications
• Schema Design
• Historical Data
OLTP vs OLAP Query
Dimensional Modeling: What Is It?
De-normalization
Dimension tables
Dimension tables contain the details about business entities such as customers
and products. This helps business users understand the data and their reports.
Since the data in a dimension table is denormalized, it typically has a large number
of columns.
The attributes in a dimension table are typically used as row and column headings
in a report or in query results.
Dimension members are arranged into hierarchies or levels (for example, city, state
and country in a geography dimension).
Geography Dimension Example
Fact tables
Star Schema Example
Surrogate Key
Data Warehouse Surrogate Keys
Benefits
• Isolate the warehouse from operational changes (SCDs)
• Improve performance
• Handle special cases such as “Not applicable” and “Date TBD”
• Allow integration from multiple sources (e.g. the same natural key can be
used by different source systems for different customers)
• Enable tracking of dimension changes
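Several of these benefits can be seen in a minimal Python sketch of surrogate key assignment. The SurrogateKeyMap class, the source-system names and the reserved key 0 for "Not applicable" are all hypothetical choices for illustration; a real warehouse would persist the mapping in a key table. Keying on the pair (source system, natural key) is what lets the same natural key from two systems map to two different warehouse rows.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical in-memory map from (source system, natural key) to a surrogate key."""

    NOT_APPLICABLE = 0  # reserved key for a special "Not applicable" dimension row

    def __init__(self):
        self._next = count(1)  # surrogate keys are meaningless integers 1, 2, 3, ...
        self._keys = {}

    def key_for(self, source_system, natural_key):
        if natural_key is None:
            return self.NOT_APPLICABLE
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next)
        return self._keys[pair]


keys = SurrogateKeyMap()
# The same natural key "C100" identifies different customers in two source systems.
crm_key = keys.key_for("CRM", "C100")
billing_key = keys.key_for("BILLING", "C100")
again = keys.key_for("CRM", "C100")   # repeat lookups return the same key
missing = keys.key_for("CRM", None)   # unknown / not applicable

print(crm_key, billing_key, again, missing)  # 1 2 1 0
```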
Date and Time Dimensions
Date and Time Dimensions (continued)
Date Dimension
Non-Additive & Semi-Additive Facts
DW Data Model: Sales System
Figure: DW data model for the sales system
Customer_Dim: cust_key, customer id, customer name, phno, Active Flag, Marital_Status, Address_Line1, Address_line2, Birth_Date, No_Of_Children
Product_Dim: prod_key, product id, product name, Brand_Name, SKU, gross_wt
Time_Dim: Time_Key, Cal_Date, Fiscal_Date, Day_Name, Month_No, Manth_Name
Region_Dim: Region_Key, Zipcode, city_name, state_name, country_name
Order_Dim: order_key, order id, ord_cotract
Payment_mode_dim: pay_mode_key, mode_type
Orde-Fact: prod_key (FK), cust_key (FK), Time_Key (FK), Region_Key (FK), order_key (FK), ord_amt, ord_qty
Sum_Orde_Fact_Prod: Time_Key (FK), Region_Key (FK), prod_key (FK), ord_amt, ord_qty
Sales_Target_Fact: Time_Key (FK), Region_Key (FK), prod_key (FK), target_amt, target_qty
Payment_Fact: order_key (FK), Time_Key (FK), Region_Key (FK), pay_mode_key (FK), amt_paid
Answer the following Queries for Orders DW
Query Observation
Advantages of Dimensional Modeling over ER Modeling for DW
OLTP vs DW Structure
Steps in Dimensional Modeling
Dimensional Modeling Life Cycle
Retail Store Summarized Business Case
Background:
The chain consists of over 1,000 grocery stores in five states.
Stores average 60,000 stock keeping units (SKUs) in departments such as frozen
foods and dairy.
Bar codes are scanned directly into the cash registers’ point of sale (POS) system.
Products are promoted via coupons, temporary price reductions, ads and in-store
promotions.
Analytic Requirements:
Need to know what is selling in the stores each day, in order to evaluate product
movement and to see how sales are affected by promotions.
Need to understand the mix of products in a consumer’s market basket.
Design Steps 1-3
Declare the Grain
• Decisions to be made:
1. What should the level of granularity be?
2. What if the business data contains data at two different
granularities?
3. Should we create a summary fact table?
Figure: dimensions Date, Store, Promotion and Product
Identify Facts
• Decisions to be made:
1. Which facts will appear in the fact table?
2. Should calculated values be stored in the fact table
or derived in views?
Facts: Sales Quantity, Unit Sales Price, Total Sales Price, Profit
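Decision 2 can be illustrated with a minimal sqlite3 sketch that stores only the base facts and derives Total Sales Price in a view. The sales_fact table, its columns and the rows are hypothetical. The view keeps the derived value consistent with its inputs without extra storage; storing it instead saves the calculation at query time but must be kept in step on every load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified retail sales fact table holding only the base facts.
conn.execute("""
    CREATE TABLE sales_fact (
        date_key INTEGER, product_key INTEGER, store_key INTEGER,
        sales_quantity INTEGER, unit_sales_price REAL
    )""")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (20030105, 1, 10, 3, 2.50),
    (20030105, 2, 10, 1, 4.00),
])

# The derived fact is computed by a view instead of being stored in the table.
conn.execute("""
    CREATE VIEW sales_fact_v AS
    SELECT *, sales_quantity * unit_sales_price AS total_sales_price
    FROM sales_fact
""")

totals = conn.execute(
    "SELECT product_key, total_sales_price FROM sales_fact_v ORDER BY product_key"
).fetchall()
print(totals)  # [(1, 7.5), (2, 4.0)]
```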
Retail Store Sales Star Schema in Action
What were the weekly sales for the snacks category during the “Super Bowl”
promotion in the NY District in January 2003?
Figure: retail store sales star schema
Store dimension: STORE KEY, Store Name, City, District, Zone
Product dimension: PRODUCT KEY, Product Desc, Product Size, Package Type, Category
Fact table: DATE KEY, PRODUCT KEY, STORE KEY, PROMOTION KEY
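The Super Bowl question maps onto a typical star join: the fact table is joined to each dimension on its key, the dimensions supply the filters and the row headings, and the fact table supplies the measure being summed. A minimal sqlite3 sketch follows. Apart from the four *_key columns, every table and column name (week_no, district, promotion_name, sales_amount and so on) and all the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal star schema and sample rows.
conn.executescript("""
    CREATE TABLE date_dim      (date_key INTEGER PRIMARY KEY, cal_date TEXT, week_no INTEGER);
    CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_desc TEXT, category TEXT);
    CREATE TABLE store_dim     (store_key INTEGER PRIMARY KEY, store_name TEXT, district TEXT);
    CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
    CREATE TABLE sales_fact    (date_key INTEGER, product_key INTEGER, store_key INTEGER,
                                promotion_key INTEGER, sales_amount REAL);

    INSERT INTO date_dim VALUES (1, '2003-01-06', 2), (2, '2003-01-13', 3), (3, '2003-02-03', 6);
    INSERT INTO product_dim VALUES (1, 'Potato chips', 'Snacks'), (2, 'Milk', 'Dairy');
    INSERT INTO store_dim VALUES (1, 'Store A', 'NY'), (2, 'Store B', 'NJ');
    INSERT INTO promotion_dim VALUES (1, 'Super Bowl'), (2, 'None');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 1, 120.0),   -- counted: snacks, NY, Super Bowl, January
        (2, 1, 1, 1, 80.0),    -- counted
        (2, 2, 1, 1, 50.0),    -- dairy: excluded
        (2, 1, 2, 1, 40.0),    -- NJ store: excluded
        (3, 1, 1, 1, 60.0);    -- February: excluded
""")

weekly = conn.execute("""
    SELECT d.week_no, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN date_dim d      ON f.date_key = d.date_key
    JOIN product_dim p   ON f.product_key = p.product_key
    JOIN store_dim s     ON f.store_key = s.store_key
    JOIN promotion_dim m ON f.promotion_key = m.promotion_key
    WHERE p.category = 'Snacks'
      AND m.promotion_name = 'Super Bowl'
      AND s.district = 'NY'
      AND d.cal_date BETWEEN '2003-01-01' AND '2003-01-31'
    GROUP BY d.week_no
    ORDER BY d.week_no
""").fetchall()
print(weekly)  # [(2, 120.0), (3, 80.0)]
```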
Figure: data warehouse architecture. Operational systems (operational databases and external sources) supply raw data to ETL, which cleans, processes and stores it in the data warehouse and data marts; reporting, data mining and analytic applications turn it into information.
Ralph Kimball’s Approach: Simplified Elements of a Warehouse
Figure: Kimball's simplified elements of a warehouse
Source Systems: extract
Data Staging Area: services (extract; transform from source to target; maintain conformed dimensions); no user query support
Presentation Area: Data Mart #1 (dimensional; atomic and summary data; business-process centric); Data Mart Bus (conformed facts and dims)
Data Access Tools: ad hoc query tools, report writers, analytic applications, modeling
Independent Data Marts: Ralph Kimball’s Ideology
Bill Inmon’s Approach
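Figure: Inmon's approach. Data is extracted and loaded into an “Operational Data Store” (atomic data, user query support) and into the data warehouse (atomic data); departmental data marts (Data Mart #1, #...) are then extracted as dimensional, departmentally centric summary data for user access.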
DW Tools
Other DW Tools
• Modeling
– ERwin
• Data Cleansing
– Ascential Quality Manager, Trillium, Vality, FirstLogic,
Innovative Systems
Data Mart
Operational Data Store
DM                                        |
Advantages                                | Advantages
• “Fast” implementation                   | • Single version of the truth
• Quick ROI                               | • Consistent data model
• Departmental control                    | • Robust data transformation
Disadvantages                             | Disadvantages
• Multiple data models                    | • Must have an existing data warehouse
• Multiple interfaces to manage/maintain  | • Must fit with the corporate strategy
• No single version of the truth          |
• Duplication of data                     |
Enterprise Architecture with ODS layer
Figure: layers from operational data through data transformation, the ODS layer and the enterprise warehouse to data marts/DW, used by IT users and business users
DW Project Lifecycle
• Project Planning
• Business Requirement Definition
• Technical Architecture Design
• Dimensional Modeling
• Physical Design
• Data Staging Design and Development (ETL)
• Analytic Application Specification Design and
Development (OLAP)
• Testing and Production Deployment
• Maintenance
Snowflake Schema
Snowflaking
Example of Snowflake Schema
Snowflake: Disadvantages
Star vs. Snowflake Design Variations
Figure: star design (POS Trxn, Date, Store, Product, Promotion) compared with a snowflake design (POS Trxn, Day, Store, District, City, Prod Class, Brand, Color, Size, Promotion)
Factless Fact table
Factless Fact table (continued)
Figure: Promotion Key
Slowly Changing Dimensions
Slow-changing Dimensions
Type 1 Slowly Changing Dimension (Overwrite)
• Overwrite one or more values of the dimension with the new value
• Use when
– the data is being corrected
– there is no interest in keeping history
– there is no need to rerun previous reports, or the changed value is
immaterial to the report
• A Type 1 overwrite results in an UPDATE SQL statement when the
value changes
• If a column is Type 1, the ETL subsystem must
– add the dimension record, if it is a new value, or
– update the dimension attribute in place
• Any staging tables must also be updated, so that any subsequent
DW load from the staging tables preserves the overwrite
• This update never affects the surrogate key
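The two ETL actions listed for a Type 1 column (add the record if the value is new, otherwise update the attribute in place) fit in a minimal sqlite3 sketch. The customer_dim table, the apply_type1 helper and the sample values are hypothetical. cust_key plays the surrogate key and customer_id the natural key, and the final checks confirm that the overwrite leaves the surrogate key and the row count unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer dimension: cust_key = surrogate key, customer_id = natural key.
conn.execute("""
    CREATE TABLE customer_dim (
        cust_key    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT UNIQUE,
        city        TEXT
    )""")


def apply_type1(customer_id, city):
    """Type 1: overwrite the attribute in place, or add the member if it is new."""
    cur = conn.execute(
        "UPDATE customer_dim SET city = ? WHERE customer_id = ?", (city, customer_id))
    if cur.rowcount == 0:  # natural key not seen before: add the dimension record
        conn.execute(
            "INSERT INTO customer_dim (customer_id, city) VALUES (?, ?)", (customer_id, city))
    conn.commit()


apply_type1("C100", "Boston")
key_before = conn.execute(
    "SELECT cust_key FROM customer_dim WHERE customer_id = 'C100'").fetchone()[0]
apply_type1("C100", "Chicago")  # correction: the old value is simply overwritten
row = conn.execute(
    "SELECT cust_key, city FROM customer_dim WHERE customer_id = 'C100'").fetchone()
n_rows = conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]

print(row)                   # (1, 'Chicago')
print(row[0] == key_before)  # True: the surrogate key is unaffected
print(n_rows)                # 1: no history row was added
```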
Applying Type 1 Changes to the Data Warehouse
Type 1 Change
Type 2 Slowly Changing Dimension (Preservation of History)
Type 2 Slowly Changing Dimensions
Types of Type 2
Type 2 SCD: Precise Time Stamping
Type 2 Change (Marital status change, then address change)
Type 3 Slowly Changing Dimensions (Alternate Realities/Soft Changes)
• Applicable when a change happens to a dimension record but the old value remains
valid as a second choice
– Product category designations
– Sales-territory assignments
• Instead of creating a new row, a new column is added (if it does not already exist)
– The old value is copied to the secondary column
– before the new value overwrites the primary column
– Example: old category, new category
• Usually defined by the business after the main ETL process is implemented
– “Please move Brand X from Men’s Sportswear to Leather Goods but allow me to track Brand
X optionally in the old category”
• The old category is described as an “alternate reality”
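A minimal sqlite3 sketch of the Brand X request as a Type 3 change. The product_dim table, the apply_type3 helper and the old_category column name are hypothetical. The secondary column is added only if it is missing, and a single UPDATE copies the old value across before overwriting the primary column: within one UPDATE, every right-hand side sees the row's pre-update values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical product dimension with a single category column.
conn.execute("CREATE TABLE product_dim (prod_key INTEGER PRIMARY KEY, brand TEXT, category TEXT)")
conn.execute("INSERT INTO product_dim VALUES (1, 'Brand X', 'Men''s Sportswear')")


def apply_type3(prod_key, new_category):
    """Type 3: keep the prior value in a second column instead of adding a row."""
    cols = [info[1] for info in conn.execute("PRAGMA table_info(product_dim)")]
    if "old_category" not in cols:  # add the column only if it does not exist yet
        conn.execute("ALTER TABLE product_dim ADD COLUMN old_category TEXT")
    # old_category receives the category as it was before this change.
    conn.execute(
        "UPDATE product_dim SET old_category = category, category = ? WHERE prod_key = ?",
        (new_category, prod_key))
    conn.commit()


apply_type3(1, "Leather Goods")
result = conn.execute("SELECT brand, category, old_category FROM product_dim").fetchone()
print(result)  # ('Brand X', 'Leather Goods', "Men's Sportswear")
```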
Type 3 Change
Standardize Dimensions & Facts
• Conform Dimensions
• Conform Facts
Conformed Dimensions
Conformed Dimensions (continued)
Conformed Facts
Bus Matrix
Dimensions Facts
Sales X X x X X
HR X X X
CRM X X X
Marketing X X X X
Business Objects Data Integrator
Course Overview
Designer
Data Integrator Repository
Two Types of Repositories
Data Integrator Job Server
Data Integrator Engine
Data Integrator Access Server
Data Integrator Administrator
Data Integrator Objects
• Projects
• Jobs
• Data Flows
• Work flows
• Scripts
• Transforms
Projects & Jobs
• Projects
– A project is an object that allows you to group jobs. A
project is the highest level of organization offered by
Data Integrator.
– Opening a project makes one group of objects easily
accessible in the user interface.
• Jobs
– A job is the only object you can execute. You can
manually execute and test jobs in development.
– In production, you can schedule batch jobs.
– A job is made up of steps you want executed
together.
Data Flow
Workflow
Transforms
Scripts
Object Use
Relationship between the objects
• Data flow:
Object Hierarchy
Figure: Projects, Jobs, Transforms, Datastores (Database Tables, Template Tables), File Formats (Flat Files), Functions
• Design
• Test
• Production
Designer Interface
Key Areas of Designer Window
Figure: Menu Bar, Workspace, Project Area, Tool Palette, Local Object Library
Defining Source and Target Metadata
Datastores
Types of Datastores
• Database Datastores:
– Provide a simple way to import metadata
• Application Datastores:
– Easily import metadata from most ERP systems
• Adapter Datastores:
– Provides access to an application’s data and
metadata or just metadata
Using Datastores
• Explain Datastore
• Create a database datastore
• Change a datastore definition
Importing metadata
• Types of metadata
• Capture metadata information from imported data
• Import metadata by browsing
• Activity: Creating an ODS and importing table metadata
Defining a file format
Handling errors in file formats
Exercise
• Log in to Designer
• Import Tables
• Import File Definition & define error handling
Creating a Batch Job
Agenda
• Create a Project
• Creating a Job
• Explain source & target objects
• Explain what a transform is
• Understand the Query transform
• Understand Job Execution
• Activity
– Defining a data flow to load into target
– Using a file format to populate a target table
– Using template tables
Introduction
• Create a project
• Create a job
• Create Data Flow
• Add, connect and delete objects in the workspace
• Using Query Transforms
Create Data Flow
Source Objects & Target Objects
Transform
Understanding Query Transform
Query Editor Window
Figure: Input Schema, Output Schema
Table loading options
Rows per commit: The number of rows sent to the target database in each commit
(the transaction size).
Column comparison: Specifies how the input columns are mapped to the output
columns. There are two options: compare by position and compare by name.
Enable partitioning: Loads data using the number of partitions in the table as the
maximum number of parallel instances. You can select one of the following loader
options: No. of loaders, Enable partitioning or transactional loading.
No. of loaders: Loading with one loader is known as single loader loading. Parallel loading
refers to loading jobs that contain more than one loader. The default number of loaders is
one; the maximum is five.
Use overflow file: This option is used for recovery purposes. If a row cannot be loaded, it is
written to a file. When this option is selected, options are enabled for the file name and file
format. The overflow format can include the rejected data and the operation being performed
(write_data) or the SQL command used to produce the rejected operation (write_sql).
Ignore columns with value: Enter a value that might appear in a source column and that you do
not want updated in the target table. When this value appears in the source column, the
corresponding target column is not updated during auto correct loading. You can enter spaces.
Ignore columns with null: Select this option if you do not want NULL source columns updated in
the target table. The Auto correct load option must be enabled to use it.
Use input keys: If the target table contains no primary key, this option enables
Data Integrator to use the primary key from the input.
Update key columns: This option allows Data Integrator to update key column
values when it loads data to the target.
Auto correct load: Ensures that the same row is not duplicated in a target table.
This is particularly useful for data recovery operations.
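The effect of two of these options can be imitated outside Data Integrator with a minimal sqlite3 sketch: a commit every rows_per_commit rows, and an auto-correct style of loading that updates a row whose key already exists and inserts it otherwise, so that reloading the same rows (for example, after a failed run) creates no duplicates. This is an analogy written for illustration; the load function and the target table are hypothetical and say nothing about how Data Integrator implements these options internally.

```python
import sqlite3


def load(conn, rows, rows_per_commit=2):
    """Load (id, name) rows auto-correct style, committing every rows_per_commit rows.

    Returns the number of commits issued.
    """
    commits = 0
    for i, (row_id, name) in enumerate(rows, start=1):
        cur = conn.execute("UPDATE target SET name = ? WHERE id = ?", (name, row_id))
        if cur.rowcount == 0:  # key not in the target yet: insert instead
            conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", (row_id, name))
        if i % rows_per_commit == 0:
            conn.commit()
            commits += 1
    if len(rows) % rows_per_commit:  # commit the final, partial batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
batch = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

first = load(conn, batch)
second = load(conn, batch)  # e.g. a rerun after a failure part-way through
n_rows = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(first, second, n_rows)  # 3 3 5
```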
List of Trace Options
Trace            | Description
Session          | Writes a message when a job description is read from the repository, when the job is optimized and when the job runs.
Work Flow        | Writes a message when a work flow description is read from the repository, when the work flow is optimized, when the work flow runs and when the work flow ends.
Data Flow        | Writes a message when the data flow starts, when the data flow is optimized, when the data flow runs and when the data flow ends.
Custom Transform | Writes a message when a custom transform starts and completes successfully.
Custom functions | Writes a message for all user invocations of the AE_log message function from custom C code.
SQL Readers      | Writes a message (when using Table_Comparison transforms) about whether a row exists in the target table that corresponds to an input row from the source table.
Using Log files
Error log
Tool           | Description
Monitor log    | Itemizes the steps executed in the job and the time execution began and ended.
Statistics log | Displays each step of each data flow in the job, the number of rows streamed through each step, and the duration of each step.
Using descriptions & annotations
• Sources
View data allows you to see source data before you execute a job.
Using data details you can:
– Create higher quality job designs
– Scan and analyze imported table and file data from the object library
– See the data for those same objects within existing jobs
– Refer back to the source data after you execute the job
• Targets
View data allows you to check target data before you execute
a job, then look at the changed data after the job executes.
In a data flow, you can use one or more View data panels to
compare data between transforms and within source and target
objects.
Using Interactive Debugger
• The Designer includes an interactive debugger that allows you to examine and
modify data row by row (during a debug-mode job execution) by placing filters
and breakpoints on lines in a data flow diagram. When you debug a job, the
Designer displays additional windows: Call Stack, Trace, Variables and View Data panes.
Figure: Call Stack window, Variables window, Trace window
Data sample rate: The number of rows cached for each line when a job executes using the interactive debugger. For
example, if the source table has 1000 rows and you set the Data sample rate to 500, the Designer displays up to 500
of the last rows that pass through a selected line. The debugger displays the last row processed when it reaches a
breakpoint.
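The "last N rows" behavior of the data sample rate works like a bounded cache that drops its oldest entry whenever a new row arrives. A minimal Python sketch of that idea using collections.deque with maxlen follows; it reproduces the 1000-row, rate-500 example, but the sample_last_rows function is hypothetical and says nothing about how the Designer actually caches rows.

```python
from collections import deque


def sample_last_rows(rows, data_sample_rate):
    """Keep only the most recent data_sample_rate rows that passed a line."""
    cache = deque(maxlen=data_sample_rate)  # the oldest row is dropped automatically when full
    for row in rows:
        cache.append(row)
    return list(cache)


source = range(1, 1001)  # a 1000-row source, as in the example above
shown = sample_last_rows(source, 500)
print(len(shown), shown[0], shown[-1])  # 500 501 1000
```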
Setting filters and breakpoints
• Before you start a debugging session, you might want to set the
following:
• Filters and breakpoints
• Interactive debugger port between the Designer and an engine.
Template tables
Convert template table into regular table
Using Built-in transforms
List of Built-in Transforms
• Query
• Case
• Merge
• Data Transfer
• Date Generation
• Key Generation
• Validation
Query Transform
Propose Join
Outer Join Specifications
Outer Join Example
Join Rank (the highest-ranked source is accessed first to construct the join)
Calculation
Convert Query to Flat File & Connect It as Target (useful for testing purposes)
Aggregate Data
Group By
Using Case Transforms
Case Expression
Case Logic
Case Output
Exercise
Merge Transform
Data Outputs
• The output data set contains a row for every row in the
source data sets.
• The transform does not strip out duplicate rows.
• If the data types of columns in the sources do not match
the target, add a query in the data flow before the Merge
transform. In the query, apply a data type conversion to
the columns with data types that do not match the target
column data types.
• You must apply other operations such as DISTINCT in a
query following the Merge transform.
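In SQL terms, the Merge transform's output resembles a UNION ALL: one output row per input row, with duplicates kept. Removing duplicates is a separate DISTINCT step applied afterwards. A minimal sqlite3 sketch of that analogy follows; the two customer source tables are hypothetical and already have matching data types, which is the precondition described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical sources with the same schema (data types already aligned).
conn.execute("CREATE TABLE customers_east (cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE customers_west (cust_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers_east VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO customers_west VALUES (?, ?)", [(2, "Globex"), (3, "Initech")])

# Merge-like behavior: every input row appears in the output, duplicates included.
merged = conn.execute("""
    SELECT cust_id, name FROM customers_east
    UNION ALL
    SELECT cust_id, name FROM customers_west
""").fetchall()

# DISTINCT applied in a separate step after the merge.
deduped = conn.execute("""
    SELECT DISTINCT cust_id, name FROM (
        SELECT cust_id, name FROM customers_east
        UNION ALL
        SELECT cust_id, name FROM customers_west
    ) ORDER BY cust_id
""").fetchall()

print(len(merged))  # 4
print(deduped)      # [(1, 'Acme'), (2, 'Globex'), (3, 'Initech')]
```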
Merge Output
Exercise
Date_generation Transform
Options (continued)
• Join rank
• A positive integer indicating the weight of the output data
set if the data set is used in a join. Sources in the join
are accessed in order based on their join ranks. The
highest ranked source is accessed first to construct the
join.
• Cache
• Select this check box to hold the output from the
transform in memory to be used in subsequent
transforms. Select Cache only if the resulting data set is
small enough to fit in memory.
Configuration
Key Generation
Options
• Table Name
– The fully qualified name of the source table from
which the maximum existing key is determined.
Should already be imported.
• Generated key column
– The column in the key source table that contains the
existing key values. A column with the same name
must exist in the input data set.
• Increment value
– The interval between generated key values.
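The three options combine into a simple rule: find the maximum existing key in the key source table, then hand out new keys above it in steps of the increment value. A minimal Python sketch of that rule follows. The generate_keys function and the cust_key column name are hypothetical, and starting from 0 when the table is empty is an assumption of the sketch.

```python
def generate_keys(existing_keys, new_rows, increment=1):
    """Assign keys to new rows, starting after the maximum existing key.

    existing_keys: key values already in the key source table
    new_rows: dicts for incoming rows that do not have a key yet
    """
    next_key = max(existing_keys, default=0)  # assumption: start from 0 if the table is empty
    keyed = []
    for row in new_rows:
        next_key += increment
        keyed.append({**row, "cust_key": next_key})  # hypothetical generated key column
    return keyed


existing = [1, 2, 7]
result = generate_keys(existing, [{"name": "Acme"}, {"name": "Globex"}], increment=5)
print([r["cust_key"] for r in result])  # [12, 17]
```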
Configuration
Time Dimension Population
Data Transfer Transform
Data Transfer Transform (continued)
Configuration
Example
Example (continued)
When you execute the job, Data Integrator displays messages for
each sub data flow. You can also watch the transfer table being created
and dropped during the job run.
Exercise
Using Validation Transform
Example
Apply Validation Rule on Zip Code
Input Data
Output in both Targets
Built-in Transforms & Operation Codes
Operation Codes
Normal: Creates a new row in the target. All rows in a data set are
flagged as NORMAL when they are extracted from a source table.
Most transforms operate only on rows flagged as NORMAL.
Insert: Rows can be flagged as INSERT by the Table_Comparison
transform to indicate that a change occurred in a data set as
compared with an earlier image of the same data set. The
Map_Operation transform can also produce rows flagged as
INSERT. Only the History_Preserving and Key_Generation
transforms can accept data sets with rows flagged as INSERT
as input.
Delete: Is ignored by the target. Rows flagged as DELETE are not
loaded. Rows can be flagged as DELETE by the Map_Operation
and Table_Comparison transforms. Only the History_Preserving
transform, with the “Preserve delete row(s) as update row(s)”
option selected, can accept data sets with rows flagged as
DELETE.
Update: Rows can be flagged as UPDATE by the Table_Comparison
transform to indicate that a change occurred in a data set as
compared with an earlier image of the same data set. The
Map_Operation transform can also produce rows flagged as
UPDATE. Only the History_Preserving and Key_Generation
transforms can accept data sets with rows flagged as UPDATE
as input.
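How a Table_Comparison-style step could arrive at these flags can be sketched in a few lines of Python. The incoming rows are compared with an earlier image of the data: new keys become INSERT, changed rows become UPDATE, keys missing from the input become DELETE (only when deletes are detected), and unchanged rows produce no output. The compare function, its detect_deletes parameter and the dict-based rows are hypothetical; this illustrates the idea and is not Data Integrator's implementation.

```python
def compare(current_target, incoming, key="id", detect_deletes=True):
    """Flag incoming rows against an earlier image of the data.

    Returns (op_code, row) pairs; unchanged rows are dropped.
    """
    target_by_key = {row[key]: row for row in current_target}
    seen = set()
    flagged = []
    for row in incoming:
        seen.add(row[key])
        old = target_by_key.get(row[key])
        if old is None:
            flagged.append(("INSERT", row))  # key absent from the earlier image
        elif old != row:
            flagged.append(("UPDATE", row))  # key exists but a value changed
    if detect_deletes:
        for k, old in target_by_key.items():
            if k not in seen:
                flagged.append(("DELETE", old))  # key no longer present in the input
    return flagged


target = [{"id": 1, "city": "Boston"}, {"id": 2, "city": "Denver"}, {"id": 3, "city": "Austin"}]
source = [{"id": 1, "city": "Boston"}, {"id": 2, "city": "Dallas"}, {"id": 4, "city": "Miami"}]
for op, row in compare(target, source):
    print(op, row["id"])
# UPDATE 2
# INSERT 4
# DELETE 3
```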
Workflows
What is a work flow?
Jobs vs Workflows
Elements in work flows
• Workflow
• Data flows
• Scripts
• Conditionals
• While loops
• Try/catch blocks
Order of execution in work flows
Parallel data flow execution
Parallel execution of complex workflows
Using Data Integrator Scripting Language and Variables
Variables (continued)
Variables can be used as file names for:
Local vs Global Variables
Parameters
Parameter & Variable creation
Passing values into data flows
Defining local & global variables
Defining parameters
Assign value to Parameter
Viewing global variables: Job Property
Setting global variable values
Understanding Data Integrator Scripting Language
• Introduction
• Language Syntax
– Supports ANSI SQL-92 varchar behavior.
– Treats an empty string as a zero-length varchar value (instead of NULL).
– Evaluates comparisons to NULL as FALSE.
– Uses the new IS NULL and IS NOT NULL operators in the Data Integrator
Scripting Language to test for NULL values.
– Treats trailing blanks as regular characters, instead of trimming them,
when reading from all sources.
– Ignores trailing blanks in comparisons in transforms (Query and
Table_Comparison) and functions
(decode, ifthenelse, lookup, lookup_ext, lookup_seq).
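The varchar rules above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Data Integrator code: Python's None stands in for NULL, and the helper names is_null and varchar_equals are hypothetical. It shows that an empty string is a real value rather than NULL, that any comparison involving NULL is FALSE, and that trailing (but not leading) blanks are ignored when comparing.

```python
from typing import Optional


def is_null(value: Optional[str]) -> bool:
    """Model of the IS NULL test: only a real NULL (None) qualifies."""
    return value is None


def varchar_equals(a: Optional[str], b: Optional[str]) -> bool:
    """Model of a varchar comparison inside a transform or lookup function.

    Assumptions of this sketch:
    - a comparison involving NULL (None) evaluates to FALSE;
    - an empty string is a zero-length value, not NULL;
    - trailing blanks are ignored when comparing; leading blanks still count.
    """
    if a is None or b is None:
        return False
    return a.rstrip(" ") == b.rstrip(" ")
```

A value such as "NY  " keeps its trailing blanks when read (its length stays 4); the blanks only stop mattering at comparison time.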
Basic Syntax rules
Comparison results for the variable assignments $var1 = NULL and $var2 = NULL
Condition: If ($var1 = $var2)
Recommendation: Do not compare without explicitly testing for NULLs. Business Objects does not recommend using this logic because any relational comparison to a NULL value returns FALSE.
Condition: If ((($var1 IS NULL) AND ($var2 IS NULL)) OR ($var1 = $var2))
Recommendation: Executes the TRUE branch if both $var1 and $var2 are NULL, or if neither is NULL but they are equal to each other.
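The difference between the two conditions can be checked with a small Python model, assuming None stands in for NULL. The function names relational_eq and null_safe_eq are hypothetical; each mirrors one condition from the comparison above.

```python
from typing import Any, Optional


def relational_eq(var1: Optional[Any], var2: Optional[Any]) -> bool:
    """If ($var1 = $var2): any relational comparison to NULL is FALSE."""
    if var1 is None or var2 is None:
        return False
    return var1 == var2


def null_safe_eq(var1: Optional[Any], var2: Optional[Any]) -> bool:
    """If ((($var1 IS NULL) AND ($var2 IS NULL)) OR ($var1 = $var2))."""
    return (var1 is None and var2 is None) or relational_eq(var1, var2)
```

With $var1 = NULL and $var2 = NULL, the plain comparison takes the FALSE branch, while the null-safe form takes the TRUE branch.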
Workflow Object-Scripts
Setting file names at run-time using variables
Workflow-Try/catch blocks
Try-Catch
Available Exceptions for Catching
Conditionals
Workflow with try-catch & conditional
Conditional Configuration
Functions
Built in functions
Differentiating between functions and transforms
Types of operations for Functions
Other types of functions
Using functions in expressions
Before creating a custom function, you must know its input, output, and
return values and their data types. The return value is predefined and is
named Return.
Using built-in functions
• The built-in date and time functions and the built-in Date_Generation transform
are useful when building a time dimension table.
• to_char
– Converts a date to a string.
• to_date
– Converts a string to a date.
• month
– Determines the month in which the given date falls.
• quarter
– Determines the quarter in which the given date falls.
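The way these functions combine into a time dimension can be sketched in Python. The helpers below are analogues, not ports: Python's strftime/strptime codes ("%Y.%m.%d") replace Data Integrator's own format strings (for example 'YYYY.MM.DD'), and the loop stands in for a daily date generator. The column names date_key, date_text, month, quarter, and year are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Any, Dict, List

# Python format codes; Data Integrator's to_char/to_date take their own
# format strings, so this constant is an assumption of the sketch.
FMT = "%Y.%m.%d"


def to_date(text: str, fmt: str = FMT) -> date:
    """Analogue of to_date: convert a string to a date."""
    return datetime.strptime(text, fmt).date()


def to_char(d: date, fmt: str = FMT) -> str:
    """Analogue of to_char: convert a date to a string."""
    return d.strftime(fmt)


def quarter(d: date) -> int:
    """Analogue of quarter: months 1-3 -> 1, 4-6 -> 2, and so on."""
    return (d.month - 1) // 3 + 1


def build_time_dimension(start: date, end: date) -> List[Dict[str, Any]]:
    """Return one row per day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),
            "date_text": to_char(day),
            "month": day.month,
            "quarter": quarter(day),
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows
```

For example, build_time_dimension(to_date("2024.03.30"), to_date("2024.04.02")) yields four rows, with the quarter switching from 1 to 2 on 1 April.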
Use lookup functions to look up status in a table
Lookup functions
Lookup_ext()
• While all lookup functions return one row for each row in the source, they differ in how
they choose which of several matching rows to return:
– Lookup_ext()
• Allows you to specify an Order by column and a Return policy (MIN or MAX) to
return the record with the highest or lowest value in a given field, for example a
surrogate key.
• This function also extends functionality by allowing you to:
– Return multiple columns from a single lookup.
– Choose from more operators to specify a lookup condition.
– Specify a return policy for the lookup.
– Call lookup_ext, using the Function Wizard, in the query output.
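The Order by column and Return policy behaviour can be illustrated with a Python sketch. It is not the lookup_ext signature: the function lookup_ext_sketch, the (column, operator, value) condition tuples, and the small operator table are assumptions made for illustration. Among all rows matching every condition, it picks the one with the highest (MAX) or lowest (MIN) order-by value and returns several columns from it.

```python
import operator
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]

# A few comparison operators a lookup condition might use (assumed subset).
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def lookup_ext_sketch(
    table: List[Row],
    conditions: Sequence[Tuple[str, str, Any]],
    order_by: str,
    return_policy: str,
    return_columns: Sequence[str],
    default: Optional[Any] = None,
) -> Row:
    """Return several columns from the single matching row chosen by return_policy."""
    if return_policy not in ("MAX", "MIN"):
        raise ValueError("return_policy must be 'MAX' or 'MIN'")
    matches = [
        row for row in table
        if all(OPERATORS[op](row[col], value) for col, op, value in conditions)
    ]
    if not matches:
        # No match: every requested column gets the default value.
        return {col: default for col in return_columns}
    choose = max if return_policy == "MAX" else min
    best = choose(matches, key=lambda row: row[order_by])
    return {col: best[col] for col in return_columns}
```

With two dimension rows for the same customer, a MAX policy on a surrogate key column returns the most recently generated row, which is the typical use named above.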
lookup_ext Syntax
Using lookup_seq
Use database type functions to return information on data sources
• db_type
– Returns the database type of the datastore configuration in use at runtime.
This function is useful if your datastore has multiple configurations.
Syntax: db_type(ds_name)
• db_version
– Returns the database version of the datastore configuration in use at runtime.
Syntax: db_version(ds_name)
• db_database_name
– Returns the database name of the datastore configuration in use at runtime.
Syntax: db_database_name(ds_name)
DB Functions
• db_owner:
– Returns the real owner name for the datastore configuration that is in
use at runtime. This function is useful if your datastore has multiple configurations,
because with multiple configurations you can use alias owner names instead of
database owner names.
Syntax: db_owner(ds_name, alias_name)
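Why these functions matter with multiple configurations can be shown with a Python model. Everything here is hypothetical: the DatastoreConfig and Datastore classes, the DATASTORES registry, and the sample type, version, database, and owner strings are made up, and the real return values come from Data Integrator. The point is that the functions always answer for whichever configuration is active at runtime, and db_owner resolves an alias to that configuration's real owner.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DatastoreConfig:
    db_type: str
    db_version: str
    database_name: str
    owner_aliases: Dict[str, str] = field(default_factory=dict)  # alias -> real owner


@dataclass
class Datastore:
    configurations: Dict[str, DatastoreConfig]
    active: str  # name of the configuration in use at runtime

    def current(self) -> DatastoreConfig:
        return self.configurations[self.active]


# Hypothetical registry standing in for the repository's datastores.
DATASTORES: Dict[str, Datastore] = {}


def db_type(ds_name: str) -> str:
    return DATASTORES[ds_name].current().db_type


def db_version(ds_name: str) -> str:
    return DATASTORES[ds_name].current().db_version


def db_database_name(ds_name: str) -> str:
    return DATASTORES[ds_name].current().database_name


def db_owner(ds_name: str, alias_name: str) -> str:
    return DATASTORES[ds_name].current().owner_aliases[alias_name]
```

Switching a datastore's active configuration changes every answer without touching the expressions that call these functions, which is the benefit the slide describes.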
• decode:
– Returns an expression based on the first condition in the specified list
of conditions and expressions that evaluates to TRUE. It provides an alternative
way to write nested ifthenelse functions.
Syntax: decode(condition_and_expression_list, default_expression)
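The equivalence between decode and nested ifthenelse can be sketched in Python, assuming, as the syntax line suggests, that conditions and expressions alternate in one flat list followed by the default. The Python functions decode and ifthenelse below only mimic that behaviour; region_with_decode and region_with_nested_ifthenelse are hypothetical examples.

```python
from typing import Any


def decode(*args: Any) -> Any:
    """decode(cond1, expr1, cond2, expr2, ..., default_expression).

    Returns the expression paired with the first TRUE condition, else the default.
    Python evaluates every argument before the call, so this sketch does not
    model short-circuit evaluation.
    """
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected condition/expression pairs plus a default")
    *pairs, default = args
    for i in range(0, len(pairs), 2):
        if pairs[i]:
            return pairs[i + 1]
    return default


def ifthenelse(condition: bool, true_value: Any, false_value: Any) -> Any:
    return true_value if condition else false_value


def region_with_decode(code: str) -> str:
    return decode(code == "N", "North", code == "S", "South", "Unknown")


def region_with_nested_ifthenelse(code: str) -> str:
    return ifthenelse(code == "N", "North",
                      ifthenelse(code == "S", "South", "Unknown"))
```

Both versions return the same value for every code; the decode form stays flat as more conditions are added, whereas each extra condition adds another nesting level to ifthenelse.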
More Transforms
Map_Operation
To Delete rows based on input change
Normal-Delete
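The Normal-Delete mapping can be illustrated with a small Python sketch of operation-code remapping. The function map_operation, the (code, row) tuples, and the "DISCARD" choice for dropping rows are assumptions for illustration, not the transform's actual interface: each input code is looked up in a mapping, unmapped codes pass through unchanged, and NORMAL rows are re-flagged as DELETE.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Flagged = Tuple[str, Row]  # (operation code, row)


def map_operation(rows: List[Flagged], mapping: Dict[str, str]) -> List[Flagged]:
    """Re-flag each row; codes absent from the mapping pass through unchanged.

    Mapping a code to "DISCARD" (an assumption of this sketch) drops those rows.
    """
    result: List[Flagged] = []
    for op_code, row in rows:
        new_code = mapping.get(op_code, op_code)
        if new_code == "DISCARD":
            continue
        result.append((new_code, row))
    return result
```

Calling map_operation(rows, {"NORMAL": "DELETE"}) turns every NORMAL row into a DELETE row while leaving rows with other codes untouched.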
Table_Comparison
Data Inputs
Comparison method
Options
Data Outputs
Insert
Update
Delete
History_Preserving
Data Inputs
Input Data
Properties
SCD-2
• Date Based
• Flag Based
Flag based Dimension
• Compare columns
– Rows flagged as INSERT should be inserted with the
current flag set to ‘A’.
– An input row might be flagged as UPDATE because of a
phone number change. This row should be updated in place;
its current flag does not change.
– If an input row is flagged as UPDATE because of a state change,
it should be inserted as a new row with the current flag set to ‘A’,
and the existing row should be updated with the current flag set to ‘I’.
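The three rules can be expressed as a plain Python sketch. In Data Integrator this logic is built from transforms such as Table_Comparison and History_Preserving; the function apply_flag_based_scd2 and the column names cust_id, phone, and state are hypothetical stand-ins, while current_flag with values 'A' and 'I' follows the rules above. Surrogate key generation is left out.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_flag_based_scd2(dimension: List[Row], incoming: List[Row],
                          key: str = "cust_id") -> List[Row]:
    """Apply the flag-based SCD-2 rules; returns a new list of dimension rows.

    Hypothetical columns: key, "phone" (overwritten in place) and "state"
    (history preserved); "current_flag" is 'A' (active) or 'I' (inactive).
    """
    dim = [dict(row) for row in dimension]  # leave the caller's rows untouched
    for new in incoming:
        current = next((row for row in dim
                        if row[key] == new[key] and row["current_flag"] == "A"), None)
        if current is None:
            dim.append({**new, "current_flag": "A"})      # new key: insert as active
        elif new["state"] != current["state"]:
            current["current_flag"] = "I"                 # expire the old version
            dim.append({**new, "current_flag": "A"})      # insert the new version
        elif new["phone"] != current["phone"]:
            current["phone"] = new["phone"]               # update in place, flag kept
    return dim
```

A state change is checked first, so a row whose state and phone both change becomes a new active version carrying the new phone number.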
Auditing
Audit Configuration
Accessing the Audit window
Define Audit Points
Audit Rule & Action on failure
Installation & Configuration
323
Create local or Central Repositories in
SPICA Repository Manager
DATA SYSTEMS
324
Add Job Server & Associate Repositories
SPICA
DATA SYSTEMS
325
Connect to local repository & access global
repository from it by activating link
SPICA (Tools/Central Repositories)
DATA SYSTEMS
326
SPICACentral Repository Objects are available
DATA SYSTEMS
327
Add objects from local to central repository
SPICA for the first time to check-in the object
DATA SYSTEMS
328
Check-out the object next time you
SPICA want to make any changes
DATA SYSTEMS
329
SPICA Check-In after changes are done
DATA SYSTEMS
330
SPICA Undo check-out to read only permission
DATA SYSTEMS
331
SPICA Management Consol
DATA SYSTEMS
332
SPICA Register Repository
DATA SYSTEMS
333
SPICA Schedule Batch Jobs
DATA SYSTEMS
334
SPICA Execute Job & view log
DATA SYSTEMS
335
SPICA Add Users
DATA SYSTEMS
336
SPICA Log Retention Period
DATA SYSTEMS
337