
Utilizing Big Data Analytics with Hadoop
Fern Halper
@fhalper
TDWI Research Director for Advanced Analytics
April 17, 2014
Sponsor


Speakers
Fern Halper, Research Director for Advanced Analytics, TDWI
Tapan Patel, Product Marketing Manager, SAS
Agenda
The evolving big data ecosystem
Status of big data, analytics, and Hadoop
Considerations for getting started

New TDWI Checklist

Free to download
http://tdwi.org/research/list/tdwi-checklist-reports.aspx

An evolving ecosystem
Hadoop
Big data
Advanced analytics
In-memory
Examining the pieces: Big Data
Social
M2M/IoT
Text
Mobile/Location
Volume
Formats

70% of those respondents currently using predictive analytics are utilizing big data


(source: TDWI Predictive Analytics Best Practices Report, 2014)
Examining the pieces: Analytics
The Analytics Spectrum
Excel
Dashboards and Reports
Other BI Visualization
Advanced Analytics
Advanced Analytics
Advanced analytics provides algorithms for
complex analysis of either structured or unstructured
data. It includes sophisticated statistical models,
machine learning, text analytics, advanced
visualization, and other advanced
data mining techniques.
Examining the pieces: Hadoop


HDFS/MapReduce
Schema on read
Ecosystem of tools
Commercial distributions
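
The first two points on this slide can be illustrated with a small, single-process sketch. It is not Hadoop code: it only mimics the map, shuffle, and reduce steps in plain Python, and it applies a schema at read time rather than at write time. The record layout (date, channel, event) and all function names are assumptions made up for the illustration.

```python
from collections import defaultdict
from itertools import chain

# Raw lines kept exactly as they arrived (as they might sit in HDFS).
# Nothing validated them on write. The layout is a hypothetical example.
RAW_LINES = [
    "2014-04-17,web,click",
    "2014-04-17,mobile,click",
    "2014-04-17,web,purchase",
    "malformed line",
]


def parse(line):
    """Schema on read: impose the structure only when the data is read."""
    parts = line.split(",")
    if len(parts) != 3:
        return None  # record does not fit the schema; skip it
    date, channel, event = parts
    return {"date": date, "channel": channel, "event": event}


def map_phase(record):
    """Map: emit one (key, value) pair per parsed record."""
    yield (record["channel"], 1)


def shuffle(pairs):
    """Shuffle: group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def count_by_channel(lines):
    records = (r for r in map(parse, lines) if r is not None)
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_by_channel(RAW_LINES))
```

In a real cluster the map and reduce functions would run in parallel on many nodes, and the framework would handle the shuffle; the sketch only shows the division of labor.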

In-memory analytics
Performance
Interactivity
Status: Evolving architectures
What technical issues or practices are driving change in your DW architecture? Select all that apply.
Source: TDWI Evolving Data Warehouse Architectures in the Age of Big Data, 2014 (n=1688 responses)
Status: Big data pieces
Status: Analytics pieces
Considerations
Defining the problem
Data preparation
Analyzing the data
Making it work (i.e., the team)
Governance
Data preparation
ETL vs. ELT
Data quality
Metadata
Data exploration
Query
Visualization
Descriptive statistics
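
As a minimal sketch of the descriptive-statistics step, the following uses only Python's standard `statistics` module. The `describe` helper and the sample values are hypothetical; on real big data the same summaries would typically be computed by the platform rather than in a single process.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    data = sorted(values)
    return {
        "count": len(data),
        "min": data[0],
        "max": data[-1],
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


if __name__ == "__main__":
    # Hypothetical sample, e.g. session lengths in minutes.
    summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
    for name, value in summary.items():
        print(f"{name}: {value}")
```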
Analysis
Data mining
Supervised
Unsupervised
Other analytics
Operationalize
Business process
In-database scoring
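
The idea behind in-database scoring is to send the model to the data instead of pulling the data out to the model. The sketch below illustrates this with SQLite from Python's standard library: a linear model's coefficients are turned into a SQL expression, so the score is computed where the rows live. The table, columns, and coefficients are all hypothetical, and this is not how any particular vendor product implements scoring.

```python
import sqlite3

# Hypothetical coefficients from a model that was trained elsewhere.
INTERCEPT = -1.0
COEFFS = {"visits": 0.5, "purchases": 1.0}


def build_demo_db():
    """Create an in-memory table standing in for the production database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, visits REAL, purchases REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 2.0, 0.0), (2, 4.0, 1.0)],
    )
    return conn


def score_in_database(conn):
    """Compute the linear score inside the database with a single query."""
    # Column names come from the trusted COEFFS dict, never from user input.
    terms = " + ".join(f"{weight} * {column}" for column, weight in COEFFS.items())
    sql = f"SELECT id, {INTERCEPT} + {terms} AS score FROM customers ORDER BY id"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for customer_id, score in score_in_database(build_demo_db()):
        print(customer_id, score)
```

Only the scores leave the database, which is the main appeal when the table is large.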
Skills
Computing
Analytic modeling
Creative thinker
Communicator
Big Data:
The Big Data Maturity Model
Poll Question
Are you making use of Hadoop for advanced analytics?
Yes
No, but we're thinking about it
No, and no plans to do so
Don't know
Copyright 2014, SAS Institute Inc. All rights reserved.
UTILIZING BIG DATA ANALYTICS
WITH HADOOP
TAPAN PATEL, PRODUCT MARKETING MANAGER, SAS
DATA TO DECISION LIFECYCLE
PREPARE DATA
EXPLORE DATA
DEVELOP MODELS
DEPLOY & MONITOR
COMPETITIVE ADVANTAGE
ACCESS TO HADOOP
HADOOP
HiveQL
SAS SERVER
1. Push some of SAS processing to Hadoop
Key Offerings:
SAS/Access to Hadoop
SAS/Access to Cloudera Impala
EMBEDDED PROCESS FRAMEWORK
HADOOP
SAS DATA step & DS2
SAS SERVER
2. Push SAS processing to Hadoop with MapReduce
Key Offerings:
SAS Scoring Accelerator for Hadoop
SAS Data Quality Accelerator for Hadoop
SAS Code Accelerator for Hadoop
SAS Data Management
SAS IN-MEMORY ANALYTICS AND HADOOP
3. In-memory processing; use Hadoop for storage persistence and commodity computing
SAS LASR ANALYTIC SERVER
SAS IN-MEMORY
HADOOP
WEB CLIENTS
APPLICATIONS
ERP
SCM
CRM
Images
Audio and Video
Machine Logs
Text
Web and Social
Data Discovery and Visualization
Statistics and Predictive Analytics
Data Management
Text Analytics
SAS VISUAL STATISTICS
INTERACTIVE PREDICTIVE ANALYTICS
EXPLORE AND DISCOVER
PREDICT AND REFINE
DEPLOY AND MONITOR
SAS IN-MEMORY STATISTICS FOR HADOOP
WHAT IS IT?
Provides a single interactive programming environment
for Hadoop to perform:
analytical data manipulation
variable transformations
exploratory analysis
statistical modeling and machine learning
integrated modeling comparison and scoring

Takes advantage of distributed in-memory computing
optimized for analytical workloads
MANIPULATE DATA
EXPLORE DATA
DEVELOP MODELS
SCORE
SAS IN-MEMORY STATISTICS FOR HADOOP
PRODUCT DEMONSTRATION
Questions?
Download a free copy of the report
Download the report as a PDF file at:

http://tdwi.org/research/2014/03/checklist-utilizing-big-data-analytics-with-hadoop

Feel free to distribute the PDF file of any TDWI Checklist Report
Contact Information
If you have further questions or comments:


Fern Halper, TDWI
fhalper@tdwi.org


Tapan Patel, SAS
Tapan.Patel@sas.com
