The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole discretion of Oracle.
Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 1
CON6624
Oracle Data Integration Platform
A Cornerstone for Big Data
September, 2016
Copyright 2016, Oracle and/or its affiliates. All rights reserved. | Confidential Oracle Internal/Restricted/Highly Restricted
Agenda
4. Data Governance: DATA THAT CAN BE TRUSTED
5. Streaming Data: DATA IN MOTION OR AT REST
Eight Core Products
Cloud or On-Premise
#1 Realtime / Streaming Data Integration Tool
Innovative Technology: 1st to certify replication with Streaming Big Data
[Diagram labels: Highly Available Databases, Bulk Data]
Reference Architecture
The comprehensive architecture covers five key areas: #1 Data Ingestion, #2 Data Preparation &
Transformation, #3 Streaming Big Data, #4 Parallel Connectivity, and #5 Data Governance.
Oracle Data Integration has all five covered.
[Diagram: reference architecture with example components. Sources: Business Data, Data Streams, Social and Logs, Enterprise Data, Highly Available Databases, Bulk Data. Components: Oracle Data Integrator, GoldenGate, Data Preparation, Stream Analytics, Dataflow ML, Connectors, Active DataGuard, Pub / Sub, REST APIs, NoSQL. Layers: Speed Layer, Batch Layer, Serving Layer. Consumers: Apps, Analytics.]
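The Speed Layer, Batch Layer, and Serving Layer named in the reference architecture can be illustrated with a minimal sketch. It assumes a simple event-counting workload; the function names and sample events are hypothetical and do not represent any Oracle product API.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute totals from the full event history."""
    return Counter(e["key"] for e in master_events)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(e["key"] for e in recent_events)


def serve(key, batch, speed):
    """Serving layer: answer a query by combining both views."""
    return batch.get(key, 0) + speed.get(key, 0)


# Hypothetical sample data: three historical events and one recent event.
history = [{"key": "orders"}, {"key": "orders"}, {"key": "logins"}]
recent = [{"key": "orders"}]

batch = batch_view(history)
speed = speed_view(recent)
print(serve("orders", batch, speed))  # 3
```

The batch view is accurate but stale, the speed view is fresh but partial, and the serving layer hides that split from the consumer.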
Oracle GoldenGate
Oracle GoldenGate provides low-impact capture, routing,
transformation, and delivery of database transactions
across homogeneous and heterogeneous environments in
real-time with no distance limitations.
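The capture, routing, and delivery of database transactions described above can be sketched conceptually as an ordered change trail that is replayed on a target. This is an illustration only: the `Change` record and the `capture`, `route`, and `deliver` functions are hypothetical and are not the GoldenGate API or its trail format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[dict] = None  # changed columns (None for deletes)


def capture(transactions):
    """Capture: turn committed source operations into an ordered trail."""
    return [Change(op, key, row) for op, key, row in transactions]


def route(trail, wanted_ops):
    """Route: forward only the change types a target subscribes to."""
    return [c for c in trail if c.op in wanted_ops]


def deliver(trail, target):
    """Deliver: replay the trail in order against a target table (a dict)."""
    for c in trail:
        if c.op == "delete":
            target.pop(c.key, None)
        else:
            # Inserts and updates merge the changed columns into the row.
            target[c.key] = {**target.get(c.key, {}), **c.row}
    return target


# Hypothetical committed operations on a source table.
source_tx = [
    ("insert", 1, {"name": "Ada", "city": "London"}),
    ("insert", 2, {"name": "Lin", "city": "Oslo"}),
    ("update", 1, {"city": "Paris"}),
    ("delete", 2, None),
]

trail = capture(source_tx)
replica = deliver(route(trail, {"insert", "update", "delete"}), {})
print(replica)  # {1: {'name': 'Ada', 'city': 'Paris'}}
```

Because only changes travel, not full table copies, the source does little extra work, which is the sense of "low-impact" in this sketch.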
Realtime Performance
Extensible & Flexible
Proven & Reliable
* The most popular enterprise integration tool in history
Supports Databases, Big Data and NoSQL
[Diagram labels: Most Databases, Transaction Streams, Data Events, Cloud DBs, Big Data]
GoldenGate for Ingest
[Diagram: GoldenGate (GG) processes: Capture, Trail, Pump, Route, Deliver. Other labels: Application DBMS, User Updates, Databus, Speed Layer, Batch Layer, Serving Layer, Streaming Analytics, REST Services, Applications, Visualization Tools, Reporting Tools, Data Marts.]
Oracle Data Preparation
Oracle Data Preparation is a self-service tool that makes
it simple to transform, prepare, enrich and standardize
business data. It can help IT accelerate solutions for the
business by giving control of data formatting directly to
data analysts.
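As a rough illustration of what "transform, prepare, enrich and standardize" can mean for a single record, here is a minimal sketch. The field names, the `COUNTRY_ALIASES` lookup, and the deliberately simple email pattern are all assumptions made for the example; they are not part of Oracle Data Preparation.

```python
import re

# Assumption: a deliberately simple email check, not a full RFC validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Assumption: a tiny hypothetical lookup used to standardize country values.
COUNTRY_ALIASES = {"UK": "GB", "UNITED KINGDOM": "GB", "USA": "US"}


def prepare(record):
    """Standardize one raw record and flag problems instead of dropping it."""
    name = " ".join(record.get("name", "").split()).title()
    email = record.get("email", "").strip().lower()
    country = record.get("country", "").strip().upper()
    valid = bool(EMAIL_RE.match(email))
    return {
        "name": name,
        "email": email if valid else None,
        "country": COUNTRY_ALIASES.get(country, country) or None,
        "email_valid": valid,
    }


raw = {"name": "  ada   LOVELACE ", "email": " Ada@Example.COM ", "country": "uk"}
print(prepare(raw))
```

Invalid values are kept as flagged rows rather than silently discarded, so an analyst can still review them.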
DATA WRANGLING wastes time and money
    MONTHS of effort spent on each new dataset
    PROGRAMMERS writing scripts or complex ETL
"Big Data's dirty little secret is that 90% of time spent on a project is
devoted to preparing data. After all the preparation work, there isn't
enough time left to do sophisticated analytics on it." (Thomas H. Davenport)
[Diagram labels: Files, Apps, Logs, Internet, STRUCTURED Enterprise data, ETL, Self-Service, ETL & Data Integration, Reporting, Enterprise Reporting, Better Recommendations, Discovery & Visualization, Weeks or Months, "I want my data!!"]
Oracle Data Integrator
Oracle Data Integrator provides high-performance bulk
data movement, massively parallel data transformation
using database or big data technologies, and block-level
data loading that leverages native data utilities.
[Diagram labels: Bulk Data Transformation; Bulk Data Performance; Bulk Data Movement; Most Apps, Databases & Cloud; Cloud DBs; Big Data; Big Data Frameworks (Spark, Sqoop, Pig, Hive); Loaders; SQL; Reporting Tools; Data Marts]
Business Value of ODI: Only Tool with Portable Mappings
No ETL engine is required
Runtime exec in Oozie or via ODI Java Agent
Separation of Logical and Physical design
Rich set of pre-built operators
Physical exec on SQL, Hive, Pig, or Spark
User defined functions
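The idea of separating logical and physical design can be sketched as one technology-neutral mapping rendered for several execution targets. Everything here is hypothetical: `LogicalMapping` and the three renderers are illustrative names, not ODI's mapping model or its code generation, and the table and column names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalMapping:
    """Technology-neutral description of a mapping (hypothetical model)."""
    source: str
    target: str
    columns: tuple
    filter_expr: str


def to_sql(m):
    """Physical design 1: render the mapping as a SQL INSERT ... SELECT."""
    cols = ", ".join(m.columns)
    return (f"INSERT INTO {m.target} ({cols}) "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_expr}")


def to_hive(m):
    """Physical design 2: render the same mapping as HiveQL."""
    cols = ", ".join(m.columns)
    return (f"INSERT INTO TABLE {m.target} "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_expr}")


def to_spark(m):
    """Physical design 3: render the same mapping as PySpark source text."""
    cols = ", ".join(repr(c) for c in m.columns)
    return (f"spark.table({m.source!r}).filter({m.filter_expr!r})"
            f".select({cols}).write.insertInto({m.target!r})")


mapping = LogicalMapping("orders", "big_orders", ("id", "amount"), "amount > 100")
for render in (to_sql, to_hive, to_spark):
    print(render(mapping))
```

The logical mapping is written once; only the renderer changes when the execution technology changes, which is the sense of "portable" in this sketch.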
Oracle Stream Analytics
Oracle Stream Analytics is a powerful analytic toolkit
designed to work directly on data in motion. Simple data
correlations, complex event processing, geo-fencing, and
advanced dashboards run on millions of events per
second.
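Geo-fencing on data in motion can be sketched as a stateful check applied to each position event as it arrives. This is a conceptual example, not Oracle Stream Analytics code: the circular fence, the device names, and the coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fence_events(positions, center, radius_km):
    """Yield ENTER/EXIT events as each device crosses a circular fence."""
    inside = {}  # last known state per device
    for device, lat, lon in positions:
        now_inside = haversine_km(lat, lon, *center) <= radius_km
        if now_inside != inside.get(device, False):
            yield (device, "ENTER" if now_inside else "EXIT")
        inside[device] = now_inside


# Hypothetical fence centre and a short stream of position events.
center = (37.784, -122.401)
stream = [
    ("truck-1", 37.900, -122.401),  # far outside the fence
    ("truck-1", 37.785, -122.401),  # moves inside
    ("truck-1", 37.786, -122.402),  # still inside, no new event
    ("truck-1", 37.900, -122.401),  # leaves again
]
events = list(fence_events(stream, center, radius_km=1.0))
print(events)  # [('truck-1', 'ENTER'), ('truck-1', 'EXIT')]
```

Only state changes are emitted, so downstream consumers see two events instead of four position updates.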
Extreme Performance
[Diagram labels: Applications, Application, Oracle Dataflow ML, Batch Layer, Visualization Tools]
Oracle Metadata Management
Oracle Metadata Management provides an integrated
toolkit that combines business glossary, workflow,
metadata harvesting and rich data steward collaboration
features.
BI Report Lineage
Taxonomy Lineage
Business Glossary
Data Model Lineage
End-to-End Lineage
Supports Databases, Big Data, ETL Tools, BI Tools, etc.
100+ Supported Systems
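End-to-end lineage can be pictured as a graph walk over harvested metadata: start at a report and follow "feeds" edges backwards to every contributing source. The edge list and asset names below are hypothetical sample data, and the `upstream` function is an illustration rather than an Oracle Metadata Management API.

```python
from collections import deque

# Hypothetical harvested metadata: each edge reads "source feeds target".
EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    ("erp.orders", "dw.fact_orders"),
    ("dw.dim_customer", "report.revenue_by_customer"),
    ("dw.fact_orders", "report.revenue_by_customer"),
]


def upstream(asset, edges):
    """Return every asset that directly or indirectly feeds `asset`."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen, queue = set(), deque([asset])
    while queue:  # breadth-first walk against the direction of data flow
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("report.revenue_by_customer", EDGES)))
```

Reversing the edge direction gives impact analysis, that is, which reports are affected when a source table changes.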
OEMM for Data Governance
140+ Supported Tools
[Diagram labels: Applications, Application DBMS, User Updates, Databus, Kafka, Speed Layer, Batch Layer, Serving Layer, Data Catalog, Generated Streaming, REST Services, NoSQL, ER Models, ETL Tools, HDFS Files, Visualization Tools]
Eight Core Products
Cloud or On-Premise
Agenda
Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 23
Analytic Data Sandbox
1. Analytic Data Sandbox:
Stakeholder: Functional Line of Business (LoB)
Core Value: Faster access to business data, Faster time to value on Analytics
Innovation: Schema-on-read empowers rapid data staging and true Data Discovery
Industries: All industries
Supports "Data First" Style of Analytics
    No schema required
    Staging data is simple and fast
    Minimal data preparation required (mainly for un/semi-structured data sets)
Typical Customer Data Types / Sets
    Usually bringing in Structured Data from OLTP (Primary data is their existing Application data)
Discovery, Exploratory and Visualization Style Analytics ("Data First" Analytics): Oracle Endeca, Big Data Discovery, Tableau, Cliq, Spotfire, DataMeer etc
Business Intelligence, Reporting and Dashboard Style Analytics ("Model First" Analytics): Oracle BIEE, Visual Analyzer, Cognos, SAS, MicroStrategy, Business Objects, Actuate etc
[Diagram labels: BI Self Service, Staging, Sandbox, DBMS, ETL Offload (on prem or cloud). Callouts: "Often the data flow may not require any ETL Tooling"; "Other data flows may still require ETL as a pipeline"]
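The schema-on-read idea named on the Analytic Data Sandbox slide can be sketched as follows: raw records are staged untouched, and each reader applies only the schema it needs at query time. The sample lines and the `read_with_schema` helper are hypothetical.

```python
import json

# Hypothetical staged data: stored as-is, with inconsistent fields and types.
RAW_LINES = [
    '{"user": "a1", "event": "click", "ms": 120}',
    '{"user": "b2", "event": "view"}',
    '{"user": "a1", "event": "click", "ms": "95", "extra": true}',
]


def read_with_schema(lines, schema):
    """Apply a schema while reading; the stored lines are never rewritten."""
    for line in lines:
        raw = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = raw.get(field)
            # Missing fields become None; present fields are cast on read.
            row[field] = cast(value) if value is not None else None
        yield row


# One reader cares about timings, so it projects and casts just two fields.
timing_schema = {"user": str, "ms": int}
clicks = [r for r in read_with_schema(RAW_LINES, timing_schema)
          if r["ms"] is not None]
print(clicks)  # [{'user': 'a1', 'ms': 120}, {'user': 'a1', 'ms': 95}]
```

A second reader could apply a different schema to the same staged lines without any reload, which is why staging is fast in this style of analytics.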
ETL Offload
2. ETL Offload:
Stakeholder: Information Technology (IT)
Core Value: Cost avoidance on DW/Marts
Innovation: YARN/Hadoop empowers lower cost compute and lower cost storage
Industries: Teradata, Netezza & AbInitio customers
Discovery, Exploratory and Visualization Style Analytics ("Data First" Analytics): Oracle Endeca, Big Data Discovery, Tableau, Cliq, Spotfire, DataMeer etc
Business Intelligence, Reporting and Dashboard Style Analytics ("Model First" Analytics): Oracle BIEE, Visual Analyzer, Cognos, SAS, MicroStrategy, Business Objects, Actuate etc
Deep Data Storage
3. Deep Data Storage:
Stakeholder: Risk / Compliance (LoB)
Core Value: High fidelity aged data
Innovation: SQL on Hadoop engines enable very low cost, queryable data access
Industries: Insurance and Banking
Discovery, Exploratory and Visualization Style Analytics ("Data First" Analytics): Oracle Endeca, Big Data Discovery, Tableau, Cliq, Spotfire, DataMeer etc
Business Intelligence, Reporting and Dashboard Style Analytics ("Model First" Analytics): Oracle BIEE, Visual Analyzer, Cognos, SAS, MicroStrategy, Business Objects, Actuate etc
Streaming Big Data Analytics
4. Streaming:
Stakeholder: Marketing (LoB) / Telematics (LoB)
Core Value: New Data Services or Higher Click Rates
Innovation: MPP capable streaming platforms combined with modern in-motion analytics
Industries: Automotive, Aerospace, Industrial Manufacturing, some Energy/Oil & Gas
Decisions on Data Before it hits Disk
    Data volume may be too high to persist all data
    Only save the important data
    Data may be highly repetitive (sensor data)
    Correlations may need to happen with very low latency requirements based on LoB demand
Key Use Case for Data Monetization
[Diagram labels: In-Motion Analytics, "Data First" Analytics, "Model First" Analytics, Staging, Sandbox, Pattern mining, Streaming, Deep Data Storage. Callout: "Other data flows may still require ETL as a pipeline"]
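Deciding on data before it hits disk, when sensor readings are highly repetitive, can be sketched with a simple deadband filter that keeps a reading only when it differs enough from the last one kept. The threshold, sensor names, and values are assumptions for illustration; this is one possible filtering rule, not a technique the slides prescribe.

```python
def significant_readings(readings, threshold):
    """Yield only readings that differ enough from the last one kept."""
    last_kept = {}  # last persisted value per sensor
    for sensor, value in readings:
        previous = last_kept.get(sensor)
        if previous is None or abs(value - previous) >= threshold:
            last_kept[sensor] = value
            yield (sensor, value)


# Hypothetical stream of (sensor, temperature) readings.
stream = [
    ("t1", 20.0),
    ("t1", 20.1),   # repetitive, dropped
    ("t1", 20.05),  # repetitive, dropped
    ("t1", 21.5),   # significant change, kept
    ("t2", 5.0),    # first reading of a new sensor, kept
    ("t1", 21.6),   # repetitive, dropped
]
kept = list(significant_readings(stream, threshold=1.0))
print(kept)  # [('t1', 20.0), ('t1', 21.5), ('t2', 5.0)]
```

Half of the sample readings never reach storage, and the filter needs only one remembered value per sensor, so it can run inline on the stream.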
Some Common Themes Across Use Cases
1. Nearly 100% Analytic Use Cases:
    Data Discovery directly in Hadoop
    ETL Offloading for analytics in SQL DB
    Deep Data Storage for analytics in SQL DB
    Streaming Analytics for data before it hits disk (Lambda Arch)
2. Nearly all the Data is Structured Data:
    OLTP Sources: every customer starts with the trusted data sets that already drive the majority of business value (App Data)
    New Sources: Clickstream Logs, Machine Data and other App Exhaust all have structure even if they may not have schema
3. Many more Sources are App/OLTP Sources:
    By Quantity of Sources: most customers have many (dozens or hundreds) of App/OLTP sources they are bringing in
    By Volume: by quantity of data, the amount of Machine Data or Log data may often exceed the OLTP data sets
4. Mainframes Matter:
    High Value App: most of the biggest customers are bringing mainframe (DB2/z, IMS, VSAM) data to Hadoop
5. Multiple Projects / Programs using Hadoop:
    Larger Customers: most of the biggest customers have multiple Hadoop projects running in parallel; some are IT led (DW/ETL Offload) and others are LoB led (Discovery/Telematics)
6. Customers are Starting in Phases:
    By Value: IT led vs. LoB led initiatives have different characteristics; even if the Lake / Reservoir factors in as a long term goal, the initial phases are often quite small in scale
7. Sizes of Hadoop Clusters Vary Widely:
    Investment Sizes Differ (by a lot): some start with mega-commitments (1000s of Nodes) and others start very small
8. Commodity H/W Clusters Dominate:
    Commodity: for use cases designed to work across groups
    Appliances: for use cases attached to a single project
9. Data Lakes as a Way to Handle Vendor Diversity:
    Middleware for Data: bigger customers have DWs/DBs from every vendor and 6+ different BI tools; Hadoop is becoming the canonical data platform to sit in between
10. Open Source Data Platform is a Strategic Priority:
    Senior Stakeholder Feedback: as a design point priority for their next gen, it is becoming more important that Open Source has a central role to play in the enterprise data platform
11. Industry Clusters:
    1. Banking, 2. Insurance, 3. Manufacturing, 4. Media, 5. Retail
Agenda
T: @markrittman
About the Presenter
Oracle ACE Director, blogger + ODTUG member
Regular columnist for Oracle Magazine
Past ODTUG Executive Board Member
Author of two books on Oracle BI
Co-founder of Rittman Mead, now independent analyst
15+ Years in Oracle BI, DW, ETL + now Big Data
Based in Brighton, UK
How / Why
[Diagram labels. Sources: Streaming Enterprise Chat + Web sources; Voice + Chat Transcripts; Batch Load from files, DB; Load raw text from blog entries, reviews. Challenges: Heterogeneous sources with JSON payloads; Invalid emails; Invalid and missing data; Sensitive data; Embedded Information in unstructured text; NLP Entities. Other label: Easy]
Demo Stations:
    Middleware Demoground - Moscone South
    Database Demoground - Moscone South
    Big Data Showcase - Moscone South
Data Integration Solutions Program - tinyurl.com/DISOOW16
Oracle Cloud Platform Innovation Awards
Meet the Most Impressive Cloud Platform Innovators
Tuesday, Sep 20, 4:00 p.m. - 6:00 p.m.
YBCA Theater | 701 Mission St

Oracle PaaS Customer Appreciation Reception
Meet the Most Impressive Cloud Platform Innovators
Tuesday, Sep 20, 6:00 p.m. - 8:30 p.m.
YBCA Theater | 701 Mission St
@OracleDI
Blogs.oracle.com/DataIntegration/