
Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole discretion of Oracle.

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 1
CON6624
Oracle Data Integration Platform
A Cornerstone for Big Data

Christophe Dupupet (@XofDup)


Director | A-Team

Mark Rittman (@markrittman)


Independent Analyst

Julien Testut (@JulienTestut)


Senior Principal Product Manager

September, 2016

Copyright 2016, Oracle and/or its affiliates. All rights reserved. | Confidential Oracle Internal/Restricted/Highly Restricted
Agenda

1 Oracle Data Integration for Big Data


2 Big Data Patterns
3 A Practitioner's View on Oracle Data Integration for Big Data
4 Q&A

Copyright 2015, Oracle and/or its affiliates. All rights reserved. |


Five Core Capabilities
1. Business Continuity: DATA ALWAYS AVAILABLE
2. Data Movement: DATA ANYWHERE IT'S NEEDED
3. Data Transformation: DATA ACCESSIBLE IN ANY FORMAT
4. Data Governance: DATA THAT CAN BE TRUSTED
5. Streaming Data: DATA IN MOTION OR AT REST

Eight Core Products | Cloud or On-Premise
Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 5
Most Innovative Technology
- #1 Realtime / Streaming Data Integration Tool
- #1 Pushdown / E-LT Data Integration Tool
- 1st to certify replication with Streaming Big Data
- 1st to certify E-LT tool with Apache Spark/Python
- 1st to power Data Preparation with ML + NLP + Graph Data
- 1st to offer Self-Service & Hybrid Cloud solution
Hybrid Open-Source
...Open Source at the core of speed & batch processing engines
...Enterprise Vendor tools for connecting to existing IT systems
...Cloud Platforms for data fabric


[Diagram: Lambda architecture. Data Streams (social and logs) and Enterprise Data (highly available databases, bulk data) feed a Speed Layer (raw data, stream processing, prepared data) and a Batch Layer (batch processing); both serve a Business Data Layer of apps, Pub/Sub, REST APIs, NoSQL and analytics.]

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 7
Reference Architecture
A comprehensive architecture covers the key areas: #1 Data Ingestion, #2 Data Preparation & Transformation, #3 Streaming Big Data, #4 Parallel Connectivity and #5 Data Governance, and Oracle Data Integration has them all covered.
Examples
[Diagram: the same Lambda architecture annotated with Oracle products: Oracle Data Integrator, Stream Analytics, Data Preparation and Dataflow ML in the Speed and Batch Layers; GoldenGate, Connectors and Active DataGuard on the source side; Data Quality, Metadata Management & Business Glossary spanning all layers.]

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 8
Oracle GoldenGate
Oracle GoldenGate provides low-impact capture, routing,
transformation, and delivery of database transactions
across homogeneous and heterogeneous environments in
real-time with no distance limitations.
Key attributes: Realtime Performance | Extensible & Flexible | Proven & Reliable
[Diagram: transaction streams and data events flow from most databases to Cloud, Big Data and DB targets.]
Supports Databases, Big Data and NoSQL.
* The most popular enterprise integration tool in history
GoldenGate for Ingest
[Diagram: GoldenGate captures changes from application DBMSs and user/application updates (Capture -> Trail -> Pump -> Route -> Deliver), feeding the Databus, Speed Layer, Batch Layer and Serving Layer alongside streaming analytics, REST services, visualization and reporting tools, and data marts.]
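For readers new to this ingest path: GoldenGate for Big Data can publish change records onto a message bus such as Kafka, and downstream consumers pick them up from there. The sketch below is a minimal, hypothetical Python consumer (topic name, brokers and payload field names are assumptions, not GoldenGate defaults), just to show what "transaction streams on the Databus" looks like from the consuming side.

```python
# Minimal sketch: read change records that a GoldenGate Kafka handler might publish.
# Topic name, brokers and record fields are illustrative assumptions, not GG defaults.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "orders.changes",                      # hypothetical topic fed by GoldenGate
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for record in consumer:
    change = record.value
    # A typical CDC payload carries an operation type and before/after images.
    op = change.get("op_type")             # e.g. I / U / D (assumed field name)
    after = change.get("after", {})
    print(op, after)
```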

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 10
Oracle Data Preparation
Oracle Data Preparation is a self-service tool that makes it simple to transform, prepare, enrich and standardize business data; it can help IT accelerate solutions for the business by giving control of data formatting directly to data analysts.
[Diagram: files flow through ETL to apps and reporting.]
- Self-Service: zero software to install, easy-to-use browser-based interface
- Better Recommendations: better automation and less grunt work for humans
- Built-in Data Graph: a graph database of real-world facts used for enrichment
BDP for Data Preparation
OPPORTUNITY
- "I want my data!!"
- DATA WRANGLING wastes time and money
- MONTHS of effort spent on each new dataset
- PROGRAMMERS writing scripts or complex ETL
BUSINESS VALUE
[Diagram: unstructured data (logs, internet) and structured data (enterprise) take weeks or months of wrangling before they reach enterprise reporting, discovery & visualization, and ETL & data integration.]
"Big Data's dirty little secret is that 90% of time spent on a project is devoted to preparing data... After all the preparation work, there isn't enough time left to do sophisticated analytics on it." - Thomas H. Davenport
Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 12
Oracle Data Integrator
Oracle Data Integrator provides high performance bulk
data movement, massively parallel data transformation
using database or big data technologies, and block-level
data loading that leverages native data utilities

Key attributes: Bulk Data Performance | Non-Invasive Footprint | Future-Proof IT Skills
[Diagram: bulk data transformation and bulk data movement between most apps, databases, DBs, Big Data and Cloud.]
- 1000s of customers, more than other ETL tools
- Flexible ELT: workloads run anywhere (DBs, Big Data, Cloud)
- Up to 2x faster batch processes and 3x more efficient tooling
Copyright 2015, Oracle and/or its affiliates. All rights reserved. |
ODI for Transformations
[Diagram: Oracle Data Integrator orchestrates big data frameworks (Kafka, Spark Streaming, Spark SQL, Sqoop, Oozie, Pig, Hive, loaders, SQL) and OGG in place of dedicated ETL engines, moving data from applications, ERP and application DBMSs across the Databus, Speed Layer, Batch Layer and Serving Layer into NoSQL stores, REST services, visualization and reporting tools, and data marts.]
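ODI pushes the transformation work down to the cluster rather than running its own engine. As a hedged illustration of that E-LT idea, here is the kind of PySpark/Spark SQL job an ODI mapping might generate; it is not actual KM output, and the database, table and column names are made up.

```python
# Illustrative PySpark job of the kind an ODI mapping might push down to Spark SQL.
# Database, table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("odi_style_elt_sketch")
         .enableHiveSupport()          # read/write Hive tables in the cluster metastore
         .getOrCreate())

orders = spark.table("staging.orders_raw")
customers = spark.table("staging.customers")

# Join, filter and aggregate entirely on the cluster (E-LT: no separate ETL engine).
daily_revenue = (orders
                 .filter(F.col("status") == "SHIPPED")
                 .join(customers, "customer_id")
                 .groupBy("order_date", "region")
                 .agg(F.sum("amount").alias("revenue")))

daily_revenue.write.mode("overwrite").saveAsTable("dw.daily_revenue")
```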

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 14
Business Value of ODI: Only Tool with
Portable Mappings
- No ETL engine is required
- Runtime execution in Oozie or via the ODI Java Agent
- Separation of Logical and Physical design
- Rich set of pre-built operators
- Physical execution on SQL, Hive, Pig or Spark
- User-defined functions

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 15
Oracle Stream Analytics
Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion: simple data correlations, complex event processing, geo-fencing and advanced dashboards run on millions of events per second.

Key attributes: Business Friendly | Extreme Performance | Spatial Awareness
[Diagram: data and events from web/devices, databases and transaction streams flow in, with results sent downstream (e.g. Hadoop).]
- Innovative dual model for Apache Spark or Coherence grid
- Simple to use spatial and geo-fencing features, an industry first
- Includes Oracle GoldenGate for streaming transactions
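Oracle Stream Analytics exposes geo-fencing declaratively; purely to make the concept concrete, here is a toy haversine-based fence check in plain Python. The coordinates, radius and event shape are invented for the example.

```python
# Toy geo-fence check: is an event inside a circular fence around a site?
# Oracle Stream Analytics does this declaratively; this is only a conceptual sketch.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

FENCE = {"lat": 37.784, "lon": -122.401, "radius_km": 1.0}   # hypothetical fence

def in_fence(event):
    return haversine_km(event["lat"], event["lon"],
                        FENCE["lat"], FENCE["lon"]) <= FENCE["radius_km"]

print(in_fence({"lat": 37.785, "lon": -122.400}))   # True: inside the fence
```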
Oracle Dataflow ML
Oracle Dataflow ML is a big data solution for stream and batch processing in a single environment: Lambda-based applications that can run streaming ETL for cloud-based analytic solutions.

Key attributes: Spark-based Pipelines | ML-powered Profiling
[Diagram: streaming data and bulk data movement between most apps, databases, DBs, Big Data and Cloud through a stream-or-batch data pipeline.]
- Batch and stream processing at the same time
- Machine learning guides users for data profiling
- Data movement across Oracle PaaS services
Copyright 2015, Oracle and/or its affiliates. All rights reserved. |
[Diagram: streaming data from devices and applications enters via the Databus; Oracle Stream Analytics and Oracle Dataflow ML process it in the Speed and Batch Layers, while Oracle GoldenGate streams data from databases; results feed the Serving Layer, REST services, visualization and reporting tools, and data marts.]

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 18
Oracle Metadata Management
Oracle Metadata Management provides an integrated
toolkit that combines business glossary, workflow,
metadata harvesting and rich data steward collaboration
features.
BI Report Lineage

Taxonomy Lineage
Business Glossary
Data Model Lineage
End-to-End Lineage
Supports Databases, Big Data, ETL Tools, BI Tools etc:
100+ Supported Systems
OEMM for Data Governance
[Diagram: Oracle Enterprise Metadata Management harvests metadata from 140+ supported tools across the architecture: ER models, ETL tools, generated streaming and ETL code, BI models, HDFS files, HCatalog, Hive, Sqoop, Kafka, OLTP databases, application DBMSs, data warehouses and data marts, feeding a Data Catalog that spans the Databus, Speed Layer, Batch Layer and Serving Layer.]

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 20
Eight Core
Products

Cloud or On-
Premise
Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 21
Agenda

1 Oracle Data Integration for Big Data


2 Big Data Patterns
3 A Practitioner's View on Oracle Data Integration for Big Data
4 Q&A

Copyright 2015, Oracle and/or its affiliates. All rights reserved. |


4 Business Patterns of Big Data Customer Adoption
1. Analytic Data Sandbox
   Stakeholder: Functional Line of Business (LoB)
   Core Value: Faster access to business data, faster time to value on analytics
   Innovation: Schema-on-read empowers rapid data staging and true Data Discovery
2. ETL Offload
   Stakeholder: Information Technology (IT)
   Core Value: Cost avoidance on DW/Marts
   Innovation: YARN/Hadoop empowers lower cost compute and lower cost storage
3. Deep Data Storage
   Stakeholder: Risk / Compliance (LoB)
   Core Value: High fidelity aged data
   Innovation: SQL on Hadoop engines enable very low cost, queryable data access
4. Streaming
   Stakeholder: Marketing (LoB) / Telematics (LoB)
   Core Value: New data services or higher click rates
   Innovation: MPP-capable streaming platforms combined with modern in-motion analytics
[Diagram: leverage a wide range of modern analytic styles (in-motion, data-first and model-first analytics) over a staging area, sandbox, streaming and deep data storage, with ETL offload from the DBMS on prem or in the cloud.]

Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 23
Analytic Data Sandbox
Stakeholder: Functional Line of Business (LoB)
Core Value: Faster access to business data, faster time to value on analytics
Innovation: Schema-on-read empowers rapid data staging and true Data Discovery
Industries: All industries

Supports Data First Style of Analytics
- No schema required
- Staging data is simple and fast
- Minimal data preparation required (mainly for un/semi-structured data sets)

Typical Customer Data Types / Sets
- Usually bringing in Structured Data from OLTP (primary data is their existing application data)
- Often bringing in Semi-Structured data (secondary data is clickstream, logs, machine data)
- Business value is usually in the combination of the various data sets and the improved speed of discovery

[Diagram: discovery, exploratory and visualization style analytics (Oracle Endeca, Big Data Discovery, Tableau, Cliq, Spotfire, DataMeer etc.) and business intelligence, reporting and dashboard style analytics (Oracle BIEE, Visual Analyzer, Cognos, SAS, MicroStrategy, Business Objects, Actuate etc.) sit over BI self service, the staging area and the sandbox; often the data flow may not require any ETL tooling, while other data flows may still require ETL as a pipeline from the DBMS (ETL offload, on prem or cloud).]

Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 24
ETL Offload
Stakeholder: Information Technology (IT)
Core Value: Cost avoidance on DW/Marts
Innovation: YARN/Hadoop empowers lower cost compute and lower cost storage
Industries: Teradata, Netezza & AbInitio customers

Supports Model First Style of Analytics
- Schemas required (for working areas, sources and targets)
- Staging data requires modeled staging tables
- Data preparation required (mapping data sets); un/semi-structured data sets require pre-parsing

Typical Customer Data Types / Sets
- Usually bringing in Structured Data from OLTP Apps (primary data is their existing application data)
- Occasionally adding new data types to the EDW schema (secondary data is clickstream, logs, machine data)
- Business value is usually tied to the cost avoidance around escalating DW and ETL tooling costs

[Diagram: the same analytic-styles stack (discovery/visualization and BI/reporting tools over data-first and model-first analytics, staging and sandbox); the primary data flow requires data integration tools, with ETL offload from the DBMS on prem or in the cloud.]

Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 25
Deep Data Storage
Stakeholder: Risk / Compliance (LoB)
Core Value: High fidelity aged data
Innovation: SQL on Hadoop engines enable very low cost, queryable data access
Industries: Insurance and Banking

Typically Deep Storage of Relational Data
- Schemas required (item detail records, not necessarily aggregates)
- Archival can be on the way in as part of routine loading, and also via periodic pruning from the EDW and data marts
- Queryable archive
- Compliance

Popular with SQL on Hadoop and Federation
- Teradata Query Grid from Teradata/Aster
- IBM BigSQL from Netezza/PureData
- Oracle Big Data SQL from Exadata
- Pivotal HAWQ from Greenplum
- Cisco Composite Software also selling on this use case (in addition to BI Virtualization)

[Diagram: the same analytic-styles stack with pattern mining over the staging area, sandbox and deep data storage; ETL offload from the DBMS on prem or in the cloud.]

Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 26
Streaming Big Data Analytics
Stakeholder: Marketing (LoB) / Telematics (LoB)
Core Value: New data services or higher click rates
Innovation: MPP-capable streaming platforms combined with modern in-motion analytics
Industries: Automotive, Aerospace, Industrial Manufacturing, some Energy/Oil & Gas

Decisions on Data Before it Hits Disk
- Data volume may be too high to persist all data; only save the important data
- Data may be highly repetitive (sensor data)
- Correlations may need to happen with very low latency requirements based on LoB demand

Key Use Case for Data Monetization
- Customers are standing up new data services (e.g. realtime equipment failure alerts and subscription-based monitoring)
- Connected Car services from most car makers
- Disaster preparedness centers in Energy/Aerospace

[Diagram: in-motion, data-first and model-first analytics over the staging area, sandbox, streaming and deep data storage with pattern mining; other data flows may still require ETL as a pipeline from the DBMS (ETL offload, on prem or cloud).]
Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 27
Some Common Themes Across Use Cases
1. Nearly 100% Analytic Use Cases
   - Data Discovery directly in Hadoop
   - ETL Offloading for analytics in SQL DB
   - Deep Data Storage for analytics in SQL DB
   - Streaming Analytics for data before it hits disk (Lambda Arch)
2. Nearly all the Data is Structured Data
   - OLTP Sources: every customer starts with the trusted data sets that already drive the majority of business value (App Data)
   - New Sources: Clickstream Logs, Machine Data and other App Exhaust all have structure even if they may not have schema
3. Many more Sources are App/OLTP Sources
   - By Quantity of Sources: most customers have many (dozens or hundreds) of App/OLTP sources they are bringing in
   - By Volume: by quantity of data, the amount of Machine Data or Log data may often exceed the OLTP data sets
4. Mainframes Matter
   - High Value App: most of the biggest customers are bringing mainframe (DB2/z, IMS, VSAM) data to Hadoop
5. Multiple Projects / Programs using Hadoop
   - Larger Customers: most of the biggest customers have multiple Hadoop projects running in parallel; some are IT led (DW/ETL Offload) and others are LoB led (Discovery/Telematics)
6. Customers are Starting in Phases
   - By Value: IT-led vs. LoB-led initiatives have different characteristics; even if the Lake / Reservoir factors in as a long term goal, the initial phases are often quite small in scale
7. Size of Hadoop Clusters Varies Widely
   - Investment sizes differ (by a lot): some start with mega-commitments (1000s of nodes) and others start very small
8. Commodity H/W Clusters Dominate
   - Commodity: for use cases designed to work across groups
   - Appliances: for use cases attached to a single project
9. Data Lakes as a Way to Handle Vendor Diversity
   - Middleware for Data: bigger customers have DWs/DBs from every vendor and 6+ different BI tools; Hadoop is becoming the canonical data platform to sit in between
10. Open Source Data Platform is a Strategic Priority
   - Senior Stakeholder Feedback: as a design-point priority for their next gen, it is becoming more important that Open Source has a central role to play in the enterprise data platform
11. Industry Clusters
   - 1. Banking, 2. Insurance, 3. Manufacturing, 4. Media, 5. Retail

Copyright 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure 28
Agenda

1 Oracle Data Integration for Big Data


2 Big Data Patterns
3 A Practitioner's View on Oracle Data Integration for Big Data
4 Q&A

Copyright 2015, Oracle and/or its affiliates. All rights reserved. |


THOUGHTS ON ORACLE DATA INTEGRATION
FOR BIG DATA - A PRACTITIONER'S VIEW
Mark Rittman, Oracle ACE Director

ORACLE OPENWORLD 2016, SAN FRANCISCO

T: @markrittman
About the Presenter
Oracle ACE Director, blogger + ODTUG member
Regular columnist for Oracle Magazine
Past ODTUG Executive Board Member
Author of two books on Oracle BI
Co-founder of Rittman Mead, now independent analyst
15+ Years in Oracle BI, DW, ETL + now Big Data
Based in Brighton, UK



Big Data Technology Core to Modern BI Platforms
Every engagement and customer discussion has Big Data central to the project
Hadoop extending traditional DWs through scalability, flexibility, cost, RDBMS-compatibility
Hadoop as the ETL engine driven by ODI Big Data KMs
New datatypes and methods of analysis enabled by Hadoop schema-on-read
Project innovation driven by machine learning, streaming, ability to store + keep *all* data

And what is driving the interest in these projects?

Oracle Big Data Platform
[Diagram: Operational Data (segments, transactions, customer master data, marketing, event, social + unstructured data, voice + chat transcripts) flows via OGG for Big Data 12c and ODI 12c into a Data Factory and Data Reservoir. Raw customer data is stored in its original format (usually files) such as SS7, ASN.1, JSON etc.; mapped customer data and models are data sets produced by mapping and transforming the raw data. Machine learning, Oracle Stream Analytics, Oracle Data Preparation and Oracle Big Data Discovery support scoring, modeling and an enriched customer profile, with data visualization and sales applications on top, in a safe and secure discovery and development environment. The platform runs on Oracle Big Data Appliance (Starter Rack + Expansion): Cloudera CDH + Oracle software, 18 high-spec Hadoop nodes with InfiniBand switches for internal Hadoop traffic optimised for network throughput, 1 Cisco management switch, and a single place for support for H/W + S/W.]



The Big Data Secret? It's all about Data Integration
Data from all the sources will need to be integrated to create the single customer view
Hadoop technologies (Flume, Kafka, Storm) can be used to ingest events, log data
Files can be loaded as-is into the HDFS filesystem
Oracle/DB data can be bulk-loaded using Sqoop (a bulk-load sketch follows below)
GoldenGate for trickle-feeding transactional data
But the nature of new data sources brings challenges
May be semi-structured or of unknown schema
Joining schema-free datasets
Need to consider quality and resolve incorrect, incomplete, and inconsistent customer data
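As a hedged sketch of the bulk-load step mentioned above: the slide names Sqoop, but the same pattern can be illustrated with PySpark's JDBC reader landing a table on HDFS as Parquet. The JDBC URL, credentials, table and target path are hypothetical, and the Oracle JDBC driver would need to be on the Spark classpath.

```python
# Sketch of a Sqoop-style bulk load done with PySpark's JDBC reader instead of Sqoop itself.
# JDBC URL, credentials, table and HDFS path are all hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bulk_load_sketch").getOrCreate()

customers = (spark.read.format("jdbc")
             .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
             .option("dbtable", "CRM.CUSTOMERS")
             .option("user", "etl_user")
             .option("password", "change_me")
             .option("fetchsize", 10000)          # stream rows in batches
             .load())

# Land the data as-is in the reservoir; schema is applied later (schema-on-read).
customers.write.mode("overwrite").parquet("hdfs:///data/reservoir/crm/customers")
```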

[Diagram: building the Single Customer View / Enriched Customer Profile. Who: heterogeneous sources (streaming, enterprise, chat + web) with JSON payloads. What: data from structured + schema-on-read sources needs integrating. How: apply schema to raw and semi-structured data, with ML. Why: requires preparation + obfuscation.]


Landing, Preparing and Securing Raw Data is *Hard*
Finding raw data is easy; then the real work needs to be done - can be > 90% of project
Four main tasks to land, prepare and integrate raw data to turn it into a customer profile
1. Ingest it in real-time into the data reservoir
2. Apply Schema to Raw and Semi-Structured Data
3. Remove Sensitive Data from Any Input Files
4. Transform and map into your Customer 360-degree profile



Oracle Big Data Preparation Cloud Service
Data enrichment tool aimed at domain experts, not programmers
Uses machine-learning to automate
data classification + profiling steps
Automatically highlight sensitive data,
and offer to redact or obfuscate
Dramatically reduce the time required
to onboard new data sources
Hosted in Oracle Cloud for zero-install
File upload and download from browser
Automate for production data loads
[Diagram: Raw Data (data stored in the original format, usually files, such as SS7, ASN.1, JSON etc.), for example voice + chat transcripts, becomes Mapped Data (data sets produced by mapping and transforming the raw data).]



Step 2: Apply Schema to Raw and Semi-Structured Data

- Batch load from files, DB (easy): invalid emails, invalid and missing data, sensitive data
- Stream from APIs, HTTP (moderate): embedded information, no reliable patterns
- Load raw text from blog entries, reviews: embedded information in unstructured text, NLP entities
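A minimal sketch of the "apply schema" step for the easy case above: reading semi-structured JSON events with an explicit schema and flagging invalid emails. The HDFS path, field layout and validation rule are illustrative assumptions.

```python
# Sketch: impose a schema on raw JSON events and flag invalid emails.
# Path, schema and the email rule are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("apply_schema_sketch").getOrCreate()

schema = StructType([
    StructField("customer_id", StringType()),
    StructField("email", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

events = spark.read.schema(schema).json("hdfs:///data/reservoir/raw/web_events/")

# Schema-on-read: validation happens after landing, not before.
checked = events.withColumn(
    "email_valid",
    F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"))

checked.filter(~F.col("email_valid")).show(10)     # inspect invalid emails
```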



Step 3: Remove Sensitive Data from Any Input Files
Automatically profile and analyse datasets
Use Machine Learning to spot and obfuscate sensitive data automatically
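Oracle Big Data Preparation does this with machine-learning classification; as a much simpler stand-in that only illustrates the obfuscation idea, here is a regex-based redactor for emails and card-like numbers. The patterns are illustrative, not a complete PII detector.

```python
# Naive redaction sketch: mask email addresses and card-like numbers in free text.
# BDP uses ML-driven classification; this regex stand-in only illustrates the idea.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label} REDACTED>", text)
    return text

print(redact("Contact jane.doe@example.com, card 4111 1111 1111 1111."))
# -> Contact <EMAIL REDACTED>, card <CARD REDACTED>.
```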



Step 4: Transform, Join + Map into Polyglot Data Stores
Oracle Data Integration offers a wider set of products for managing Customer 360 data
Oracle GoldenGate
Oracle Enterprise Data Quality
Oracle Data Integrator
Oracle Enterprise Metadata
Management
All Hadoop enabled
Works across Big Data,
Relational and Cloud



Future-Proof Big Data Integration Platform
Projects built yesterday using MapReduce need to be rewritten today in Spark
Then Spark needs to be upgraded to Spark Streaming + Kafka for real time
Upgrades, and replatforming onto the latest tech, can bring fragile initiatives to a halt
ODI's pluggable KM approach to big data integration makes tech upgrades simple
Focus time + investment on new big data initiatives
Not rewriting fragile hand-coded scripts
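To make the Spark Streaming + Kafka target concrete, here is a minimal Structured Streaming sketch that reads a Kafka topic and appends the raw events to the reservoir. Topic, brokers and paths are hypothetical, and the job needs the spark-sql-kafka package on the classpath; the deck's point is that ODI's KMs would generate and maintain code like this rather than it being hand-written.

```python
# Minimal Structured Streaming sketch: read events from Kafka and append them to the reservoir.
# Topic, brokers and output paths are hypothetical; requires the spark-sql-kafka-0-10 package.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming_ingest_sketch").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "web.clicks")
          .load()
          .select(F.col("value").cast("string").alias("json"),
                  F.col("timestamp")))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/reservoir/raw/web_clicks/")
         .option("checkpointLocation", "hdfs:///checkpoints/web_clicks/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()
```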

Big Data Management Platform
[Diagram: a big data platform all running natively under Hadoop: Hive + Pig (log processing, UDFs etc.), Kafka + Spark Streaming, Spark (in-memory data processing), Apache Beam(?), on YARN (cluster resource management) and HDFS (the cluster filesystem holding raw data), driven by the ODI desktop client; curated data in a data warehouse provides a historical view and business-aligned access; Discovery & Development Labs offer a safe and secure environment with data sets, samples, models and programs for scoring, modeling and the enriched customer profile.]



And the Next Challenge : Data Quality + Provenance
Big data projects have had it easy so far in terms of data quality + data provenance
Innovation labs + schema-on-read prioritise discovery + insight, not accuracy and audit trails
But a data reservoir without any cleansing, management + data quality = data cesspool
and nobody knows where all the contamination came from, or who made it worse



Data Governance: Why I Recommend Oracle DI Tools
From my perspective, this is what makes Oracle Data Integration my Hadoop DI platform of choice
Most vendors can load and transform data in Hadoop (not as well, but basic capability)
Only Oracle have the tools to tackle tomorrow's Big Data challenge: Data Quality + Data Governance
- Oracle Enterprise Data Quality
- Oracle Enterprise Metadata Mgmt
Seamlessly integrated with ODI
Brings enterprise smarts to less mature Big Data projects
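Oracle Enterprise Data Quality is a full product; as a small, hedged illustration of the kind of checks such tooling automates, here is a PySpark sketch that profiles null and malformed emails and duplicate customer keys. The table and column names are made up.

```python
# Sketch of basic data quality checks of the kind a DQ tool automates.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks_sketch").enableHiveSupport().getOrCreate()

customers = spark.table("reservoir.customers")

report = customers.agg(
    F.count(F.lit(1)).alias("rows"),
    F.sum(F.col("email").isNull().cast("int")).alias("null_emails"),
    F.sum((~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")).cast("int")).alias("bad_emails"),
)
report.show()

# Duplicate natural keys are a common source of a "data cesspool".
dupes = (customers.groupBy("customer_id")
         .count()
         .filter(F.col("count") > 1))
print("duplicate customer_ids:", dupes.count())
```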



Data Integration Solutions Program - tinyurl.com/DISOOW16

Presentations on: Oracle Data Integrator, Oracle GoldenGate, Oracle Enterprise Data Quality, Oracle Enterprise Metadata Management, Big Data Preparation Cloud Service

Hands-on labs: Oracle Enterprise Data Quality [HOL7466], Oracle GoldenGate Deep Dive [HOL7528], ODI and OGG for Big Data [HOL7434], Big Data Preparation Cloud Service [HOL7432]

Demo Stations: Middleware Demoground - Moscone South, Database Demoground - Moscone South, Big Data Showcase - Moscone South

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 44
Data Integration Solutions Program - tinyurl.com/DISOOW16

Monday, Sept 19
- Oracle Data Integration Solutions Platform Overview and Roadmap [CON6619]
- Oracle Data Integration: the Foundation for Cloud Integration [CON6620]
- A Practical Path to Enterprise Data Governance with Cummins [CON6621]
- Oracle Data Integrator Product Update and Strategy [CON6622]
- Deep Dive into Oracle GoldenGate 12.3 New Features for the Oracle 12.2 Database [CON6555]

Tuesday, Sept 20
- Oracle Data Integration Platform: a Cornerstone for Big Data [CON6624]
- Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7434]
- Oracle Enterprise Data Quality Product Overview and Roadmap [CON6627]
- Self Service Data Preparation for Domain Experts - No Programming Required [CON6630]
- Oracle Big Data Preparation Cloud Service: Self-Service Data Prep for Business Users [HOL7432]
- Oracle GoldenGate 12.3 Product Update and Strategy [CON6631]
- New GoldenGate 12.3 Services Architecture [CON6551]
- Meet the Experts: Oracle GoldenGate Cloud Service [MTE7119]

Wednesday, Sept 21
- Data Quality for the Cloud: Enabling Cloud Applications with Trusted Data [CON6629]
- Transforming Streaming Analytical Business Intelligence to Business Advantage [CON7352]
- Oracle Enterprise Data Quality for All Types of Data [HOL7466]
- Oracle GoldenGate for Big Data [CON6632]
- Accelerate Cloud On-Boarding using Oracle GoldenGate Cloud Service [CON6633]
- Oracle GoldenGate Deep Dive and Oracle GoldenGate Cloud Service for Cloud Onboarding [HOL7528]
- Oracle Big Data Integration in the Cloud [CON7472]

Thursday, Sept 22
- Best Practices for Migrating to Oracle Data Integrator [CON6623]
- Best Practices for Oracle Data Integrator: Hear from the Experts [CON6625]
- Dataflow, Machine Learning and Streaming Big Data Preparation [CON6626]
- Data Governance with Oracle Enterprise Data Quality and Metadata Management [CON6628]
- Faster Design, Development and Deployment with Oracle GoldenGate Studio [CON6634]
- Getting started with Oracle GoldenGate [CON7318]
- Best Practice for High Availability and Performance Tuning for Oracle GoldenGate [CON6558]

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 45
Oracle Cloud Platform Innovation Awards
Meet the Most Impressive Cloud Platform Innovators
Tuesday, Sep 20, 4:00 p.m. - 6:00 p.m. | YBCA Theater | 701 Mission St
- Meet peers who implemented cutting-edge solutions with Oracle Cloud Platform
- Learn how you can transform your business
- No registration or OpenWorld pass required to attend

Oracle PaaS Customer Appreciation Reception
Tuesday, Sep 20, 6:00 p.m. - 8:30 p.m. | YBCA Theater | 701 Mission St
- FREE Appreciation Reception for all Oracle PaaS Customers directly following the Innovation Awards Ceremony
- No OpenWorld pass is required to attend this reception

Copyright 2016 Oracle and/or its affiliates. All rights reserved. |


Connect with Oracle Data Integration

Oracle Data Integration

@OracleDI

Blogs.oracle.com/DataIntegration/

Oracle Data Integration

Copyright 2015, Oracle and/or its affiliates. All rights reserved. |


Agenda

1 Oracle Data Integration for Big Data


2 Big Data Patterns
3 A Practitioner's View on Oracle Data Integration for Big Data
4 Q&A

Copyright 2015, Oracle and/or its affiliates. All rights reserved. |


Copyright 2016, Oracle and/or its affiliates. All rights reserved. | 49
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole discretion of Oracle.

Copyright 2015, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential 51
