Overwhelmed by new data types?
Sensor- / machine-based data
Sentiment data
Clickstream data
Big Data: Transactions, Interactions, Observations
80% of new data in 2015 will land on Hadoop!
Hadoop Core
[Figure: Hadoop Core architecture stack]
Layers: Data Presentation, Integration, Operations, Security, Data Processing, Data Management, Infrastructure
Other labels: Analytics, Analytics Apps, Transactional Apps, Batch, Middleware, Access Management, Ingestion
Data Management: Metadata Services, Distributed Storage (HDFS), Distributed Processing (MapReduce), Non-relational DB, Structured In-Memory, Data Encryption
Infrastructure: Virtualization, Compute / Storage / Network
Example application landscape
Real-Time Streams (social, sensors)
Real-Time Processing (S4, Storm, Spark)
Machine Learning (Mahout, etc.)
Data Visualization (Excel, Tableau)
Cloud Infrastructure: Compute, Storage, Networking
Source: VMware
Disruptive innovations in Big Data
Innovations: Hadoop is 100x cheaper per TB than in-memory appliances like HANA and handles unstructured data as well

Legacy BI
Business problem: Backward-looking analysis, using data out of business applications
Technology solution (selected vendors): SAP Business Objects, IBM Cognos, MicroStrategy
Data type / scalability: Structured; limited (2-3 TB in RAM)

High Performance BI
Business problem: Quasi-real-time analysis, using data out of business applications
Technology solution (selected vendors): Oracle Exadata, SAP HANA
Data type / scalability: Structured; limited (2-8 TB in RAM)

Hadoop Ecosystem
Business problem: Forward-looking predictive analysis; questions defined in the moment, using data from many sources
Technology solution (selected vendors): Hadoop distributions; no ACID transactions; limited SQL set (joins)
Data type / scalability: Structured or unstructured; unlimited (20-30 PB)

True big data
Legacy vendor definition of big data
Innovations: Store first, ask questions later
Much cheaper storage, but not just storage
Illustrative acquisition cost, options compared:
Based on HDS SAN Storage
Based on NetApp FAS-Series
Based on NetApp E-Series (NOSH)
Hardware can be self-assembled
Based on large scale object storage interfaces
1) Hadoop offers Storage + Compute (incl. search). Data Cloud offers Amazon S3 and native storage functions
Target use cases
[Chart: potential value (lower to higher) against time to value (shorter to longer)]
Audiences: IT Infrastructure & Operations; Business Intelligence & Data Warehousing; Line of Business & Business Analysts; CXO
Use cases shown:
Lower Cost Storage
ETL Offload
Enterprise Data Warehouse Archive
Enterprise Data Warehouse Offload
Enterprise Data Lake
Capacity Planning & Utilization
Fraud Management
CDR based Data Analytics
Service Renewal Implementation
Targeted Advertising Analytics
Customer Profiling & Revenue Analytics
New Business Models
Enterprise data warehouse offload use case
The Challenge
Many EDWs are at capacity
Running out of budget before running out of relevant data
Older data archived in the dark, not available for exploration
The Solution
Hadoop for data storage and processing: parse, cleanse, apply structure and transform
Free EDW for valuable queries
Retain all data for analysis!
EDW workload before offload: Operational (44%), ETL Processing (42%), Analytics (11%)
EDW workload after offload: Operational (50%), Analytics (50%)
HADOOP: Storage & Processing, cost is 1/10th
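The solution steps named on this slide (parse, cleanse, apply structure, transform) can be sketched in a few lines. This is a minimal illustration in plain Python, not Hadoop code: the input layout, the sample values, and the function names are all assumptions invented for the sketch. On a cluster the same stages would typically run as distributed jobs over files in HDFS.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export; the column layout and values are invented for illustration.
RAW = """\
2015-01-03, store-7 ,19.99
2015-01-03,store-7,5.00
bad line without enough fields
2015-01-04,STORE-9, 12.50
2015-01-04,store-9,not-a-number
"""

def parse(text):
    """Parse: split the raw text into rows of fields."""
    return list(csv.reader(io.StringIO(text)))

def cleanse(rows):
    """Cleanse: drop malformed rows, normalise whitespace and case."""
    for row in rows:
        if len(row) != 3:
            continue  # wrong number of fields
        day, store, amount = (field.strip() for field in row)
        try:
            yield day, store.lower(), float(amount)
        except ValueError:
            continue  # amount is not numeric

def apply_structure(records):
    """Apply structure: give each record named fields."""
    return [{"day": d, "store": s, "amount": a} for d, s, a in records]

def transform(records):
    """Transform: aggregate to daily totals per store."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["day"], r["store"])] += r["amount"]
    return dict(totals)

daily_totals = transform(apply_structure(cleanse(parse(RAW))))
```

Here `daily_totals` holds one total per day and store; the malformed line and the row with a non-numeric amount are dropped during cleansing, so only the refined result would need to be loaded into the EDW.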
From data puddles and ponds to lakes and oceans
AVOID: Systems separated by workload type due to contention
GOAL: Platform that natively supports mixed workloads as shared service
Questions to ask in designing a solution for a particular business use case
Which distribution is right for your needs today vs. tomorrow?
Which distribution will ensure you stay on the main path of open source innovation, vs. trap you in proprietary forks?
[Figure: architecture stack] Presentation, Data Application, Integration, Operations, Security, Data Processing, Data Management, Infrastructure
We can't justify the budget for a new project
We don't have big data problems
We don't have petabytes of data
Every organization has data problems!
Hadoop can help
MYTH: Big Data means Petabytes
Not just Volume. Remember Variety, Velocity
Plenty of issues at smaller scales
Data processing
Unstructured data
Often warehouse volumes are small because the technology is expensive, not because there is no relevant data
Scalability is about growing with the business, affordably and predictably
MYTH: Big Data means Data Science
Hadoop solves existing problems faster, better, cheaper than conventional technology, e.g.
Landing zone capturing and refining multi-structured data types with unknown future value
Cost effective platform for retaining lots of data for long periods of time
Walk before you run
Big Data Is a State of Mind
Waves of adoption crossing the chasm
Wave 1: Batch Orientation
Wave 2: Interactive Orientation
Wave 3: Real-Time Orientation
Challenges in the Enterprise
Questions?
gregory.smith@t-systems.com