1. Pull Out Your Phone
2. Open Your Texting App
3. Prepare to Send a Text to 22333
4. Here are the Possible Text Responses
223-33
Today's Agenda
Big Reality:
Real Solutions
SOCIAL
BLOG
SMART METER
VOLUME
VELOCITY
VARIETY
VALUE/ VERACITY
Structured (aka "Process-mediated")
Examples: ERP, CRM, POS
Characteristics: transactional, referential, relational; traditionally IT managed
Semi-Structured (aka "Machine-generated")
Examples: XML, JSON, Network Logs, Sensor Data
Characteristics: well suited to computer processing but massive in volume and accumulation; often too large for EDW
Unstructured (aka "Human-sourced")
Examples: CDR, Doctor's Notes, Social Media, Audio, Video, comment fields
Characteristics: subjective record of personal experiences; structuring required to realize value
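To make the distinction above concrete, here is a minimal Python sketch that takes one machine-generated JSON record (semi-structured) and flattens it into a fixed-column row of the kind a relational system expects. The smart-meter record, its field names, and the column list are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical smart-meter reading; all field names are illustrative assumptions.
raw = (
    '{"meter_id": "M-001", "ts": "2013-01-01T00:15:00", '
    '"reading": {"kwh": 1.25}, "tags": ["residential"]}'
)

# Fixed schema, as a structured (relational) store would require.
COLUMNS = ("meter_id", "ts", "kwh")


def to_row(line: str) -> tuple:
    """Flatten one semi-structured JSON record into a fixed-column tuple."""
    rec = json.loads(line)
    flat = {
        "meter_id": rec.get("meter_id"),
        "ts": rec.get("ts"),
        # Nested and optional fields are what make the record semi-structured;
        # missing values become None instead of raising an error.
        "kwh": rec.get("reading", {}).get("kwh"),
    }
    return tuple(flat[c] for c in COLUMNS)


row = to_row(raw)
print(row)
```

Fields outside the fixed schema (here, `tags`) are simply dropped, which is one reason raw machine-generated data is often kept outside the warehouse as well.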
Users
Data Virtualization
BI / Analytics
Hadoop
Disparate, non-performing
Integrated, performance-optimized
Human Information
Hadoop
NoSQL
(Traditional) EDW
Analytics Mart
In-Memory
Vendor Specific
1T Rows
Reliability
Meridium Capstone DB / Excel
Supply Chain
Aspen EDO Scheduling
Engineering
Intergraph AspenTech AutoCAD Bentley Aveva
Finance
SAP JDE Oracle
SAP ESS
I was thinking: what if we created a new search engine with added functionality to help me find that obscure R2-D2 action figure faster and more easily?
Bob, what if we could answer these questions: What did someone buy? What did they bid on? It's also: Where were they at the time? It's also: Who influenced them within their social circle? All that data is amassed. Cool idea? We can build a
Big Data's infrastructure challenge
First test run of a Hadoop cluster consisted of 400 nodes
Two 24-petabyte clusters
This will allow us to bring the processing to where the data sits, removing the need for time-consuming data transfers
Larger index than Voyager
More descriptions, history & metadata in indexes
100 engineers, all new codebase, 18 months
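The idea of bringing the processing to where the data sits can be sketched in plain Python. This is an in-process simulation under stated assumptions, not Hadoop's API: each list stands in for the records stored on one node, `local_count` is a hypothetical function that would run on that node, and only its small summary travels to the coordinator.

```python
from collections import Counter

# Hypothetical partitions: each list stands in for records stored on one node.
node_partitions = {
    "node-1": ["r2-d2 figure", "star wars poster", "r2-d2 figure"],
    "node-2": ["r2-d2 figure", "lego set"],
}


def local_count(records):
    """Runs where the data lives and returns only a small summary."""
    return Counter(records)


def merge(partials):
    """Runs on the coordinator; combines the per-node summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Only the compact Counter objects cross the (simulated) network,
# never the raw records themselves.
partials = [local_count(recs) for recs in node_partitions.values()]
totals = merge(partials)
print(totals.most_common(1))
```

The design point is that the summaries are far smaller than the raw records, so moving them is cheap compared with moving the data.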
Reference Check 21 image DB
Clickstream data from banking web site
Meaning-based positive comments
From a database get me all matches from the CRM and Call Detail Records that match the query
From unstructured sources get me all matches for calls, chat, email that were positive for the structured results
Pull < total 30% of net worth from Check 21 Image database
Customers who conducted net worth report from our banking web site
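The first two steps above (a structured match, then a filter over unstructured sources) might look roughly like the following sketch. Everything here is an assumption made for illustration: the table and column names, the sample rows, and the `sentiment` label, which stands in for the output of a meaning-based analysis that is not implemented here.

```python
import sqlite3

# In-memory database with hypothetical CRM and Call Detail Record tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cdr (customer_id INTEGER, call_id TEXT);
INSERT INTO crm VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO cdr VALUES (1, 'c-100'), (2, 'c-200');
""")

# Step 1: structured match across CRM and Call Detail Records.
structured = conn.execute(
    "SELECT crm.customer_id, crm.name FROM crm "
    "JOIN cdr ON cdr.customer_id = crm.customer_id "
    "ORDER BY crm.customer_id"
).fetchall()

# Step 2: unstructured interactions (calls, chat, email), already labeled
# with a sentiment by some upstream analysis (assumed, not shown).
interactions = [
    {"customer_id": 1, "channel": "chat", "sentiment": "positive"},
    {"customer_id": 2, "channel": "email", "sentiment": "negative"},
    {"customer_id": 3, "channel": "call", "sentiment": "positive"},
]

# Keep only positive interactions for customers found in step 1.
matched_ids = {customer_id for customer_id, _ in structured}
positive = [
    item for item in interactions
    if item["customer_id"] in matched_ids and item["sentiment"] == "positive"
]
print(positive)
```

Carol has a positive interaction but no call detail record, so she is excluded: the structured result constrains the unstructured search.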
Structured Data
Columnar
10,015,664,356,165 rows (10 trillion)
22B-43B rows loaded daily
20 node x86 cluster
1.794 petabytes raw data
7:1 compression
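A quick back-of-the-envelope check of what these figures imply, as a sketch only. It assumes decimal units (1 PB = 10^15 bytes) and, for the load-time estimate, a constant daily load rate; neither assumption is stated on the slide.

```python
# Figures taken from the slide.
ROWS = 10_015_664_356_165
RAW_PETABYTES = 1.794
COMPRESSION_RATIO = 7           # 7:1
DAILY_LOAD_LOW = 22e9           # rows per day
DAILY_LOAD_HIGH = 43e9          # rows per day

# Assumption: decimal units (1 PB = 10**15 bytes, 1 TB = 10**12 bytes).
PB = 10**15
TB = 10**12

raw_bytes = RAW_PETABYTES * PB
compressed_tb = raw_bytes / COMPRESSION_RATIO / TB   # on-disk size after compression
bytes_per_row = raw_bytes / ROWS                     # average raw row width

# Assumption: the daily load rate is constant over the whole history.
days_fast = ROWS / DAILY_LOAD_HIGH
days_slow = ROWS / DAILY_LOAD_LOW

print(f"compressed size: {compressed_tb:.1f} TB")
print(f"average raw row: {bytes_per_row:.0f} bytes")
print(f"days to accumulate: {days_fast:.0f} to {days_slow:.0f}")
```

Under those assumptions the compressed footprint is about 256 TB, the average raw row is roughly 179 bytes, and the table represents somewhere between about 233 and 455 days of loading.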
Unstructured Data
In 2009
Peanut butter recall cost producers $1,000,000,000
48 million people get sick, 128,000 are hospitalized, and 3,000 die each year from foodborne diseases
What were people talking about two weeks before the recall? How did they feel? Where were they? What were they eating?
How Do I Start?
Data Analytics Gap Analysis
Proof of Value
Vendor-Agnostic Assessment of Current State and Go-Forward Strategy
Benchmark your Data Management Infrastructure against industry
ROI Assessment
Thank You