Professional Documents
Culture Documents
Deloitte Analytics
Your data, inside out
Big Data refers to the set of problems and subsequent technologies developed to
solve them that are hard or expensive to solve in traditional relational databases
Flexibility of Big Data technologies is traded with consistency and integrity of RDBMS
technologies
Big Data technologies are complementary and not a replacement for RDBMS technologies
2013
Multiple Commercial distributions of
Increase in Big Data powered projects
2010
Facebook announce their Hadoop cluster has 21 PB of storage
July 27, 2011 the data had grown to 30 PB
June 13, 2012 the data had grown to 100 PB
November 8, 2012, the warehouse grows by roughly half a PB per day
2006
Google releases paper describing Big Table 2008
Yahoo announce 10,000
2004 core Linux Hadoop cluster
Google releases paper is powering search
describing MapReduce
2005
Hadoop open-source
implementation of
Attempts to use multiple Time MapReduce created
RDBMS through sharding
Processing Solutions
T - Traditional I - In-Memory
Traditional General Appliances
Purpose Processing
Systems Distributed
Computing Architectures
M - MPP - Massively C- Distributed
Parallel Processing Cluster Systems
Common Usage Scenarios for Processing Systems
Turnaround Time / Processing Velocity
Batch Near Real Time - Real Time
T Traditional M MPP M MPP M MPP + I In-Memory
Structured
Data volume Data volume
T Traditional + M MPP C Distributed Cluster M Specialized MPP + C Distributed Cluster
Variety
Semi-
structured Data volume Data volume
C Distributed Cluster Specialized Systems (Hybrid Solutions)
Unstructured
Data volume Data volume
Costs
Syntactic Analysis and text mining, with statistical models for keywords and key
Search & topics detection
Analysis Semantic Analysis with a Natural Language Processing engine and ontologies,
to map concepts in a specific context
Dynamic and easy-to-use reports for freely navigating across data with no
Data predefined paths
Discovery
Tools with the ability to combine structured and unstructured data from
Data disparate systems and automatically organize information for search, discovery, and
Mash-Up analysis
48 2M 47K 2.5 8 35
Hours of video queries on App downloads Zettabytes of Zettabytes of Zettabytes of
uploaded every Google every per minute via world data in world data in world data in
minute to Youtube minute iTunes 2010 2015 2020
Web / Social Media Machine to Machine Big Transaction Data Biometric Human Generated
?
One Third of Business Uncertainty is due to Hypothesis - Long Term rewards
leaders do not trust inconsistency, validation to determine and better outcomes
the information they ambiguity, latency if the data will have a from hidden
use and approximation meaningful impact relationships in data
Unstructured
Social Interaction Data Attitudinal Data
Customer contact notes
media
Customer Email Options
Contracts
survey text Chat transcription Preferences
Scanned documents Web crawling Call Center notes Needs and Desires
Web analytics Market Research
E-mail Log Web In person dialogues Social Media
Competitor
Web Cust. Experience
scans External
Internal External (credit and risk) Behavioural Data Descriptive Data
Marketing data agency data
P&L Telephone Orders Attributes
Customer directory Transactions Characteristics
survey results Payment History Self Declared Info
Socio-demographic
Customer contact logs Usage History Social Geo /
Name, Address details Demographics info
Price benchmark comparisons
Transaction history
Structured
TMT Energy
Manufact
Insurance uring
Log Analysis / Ad
Social Networks Analysis Dynamic Pricing /
Retail Optimization
Event Analytics recommendation Engines
Consumer Cross Channel Analytics
Brand and Sentiment Session / Content
Goods Loyalty Program
Analysis Optimization
Optimization
Churn prediction Market Analysis
Fraud Detection
TMT Fraud scenarios Consumers behavior
Market needs
identification Targeting
Risk Modeling & Fraud
Surveillance and Fraud
Identification Real-time upsell, cross sales
Finance Detection
Trade Performance marketing offers
Customer Risk Analysis
Analytics
Production Optimization Grid Failure Prevention
Energy Individual Power Grid
Consumption prediction Smart Meters
Supply Chain Optimization Dynamic Delivery
Manufacturing Customer Churn Analysis
Asset maintenance Replacement parts
Multi-party Fraud Scenario Customer Risk Analysis
Insurance Premium
Insurance Investigation Dynamic Insurance Plan
Determination
Weather Impact Analysis (sensor enabled)
Strategy Determine business drivers and if Big Data can play a role in better insight
Identify and acquire the skill sets required to understand and leverage Big
Data to add value
Future
Future
Industry Strategic Action-
Key Issues Synthesis
Scenarios Direction plans
Scenarios
Think out
of the box
Situation Analysis of Strategic Big
Reward
Assessment internal sources Data Plan
What data sources should be collected and how can they be acquired efficiently?
Should retention be provided for those data?
How intensively will those data be processed?
How is data quality managed across so many sources of data, many of which come from
outside the organization, such as public social networks?
What structure can be derived from non-traditional data sources (documents, Web logs, video
streams, etc.) to make storage, analysis, and ultimately decision-making easier?
How can non-traditional unstructured data be integrated with data stored in traditional
transactional systems?
How can decision-makers comprehend the results of analyzing so much data quickly enough
to act?
What data governance is appropriate when analysis is distributed, needs change and data
definitions and schemas evolve over time?
What architectures and algorithms can be used to decompose problems and data for rapid
execution in parallel environments?
What levels of availability and reliability are possible in mission-critical applications, as data
volumes are so large?
Is specialized hardware required for a particular need, or can low-cost commodity hardware be
leveraged to scale processing?
Given the specialized nature of processing needed, is cloud computing an appropriate platform
choice, and if so, what variant of cloud computing (public, private, hybrid) is needed?
How can security and privacy concerns be factored into the design of a Big Data
environment to reduce vulnerability to external and internal threats?
How are regulations around audit trails and data destruction to be interpreted in a Big Data
environment?
What intellectual property, licensing, and data protection considerations apply when Big
Data environments are distributed across organizational and national boundaries?
How can current IT skill sets best be leveraged in evolving the infrastructure to include Big
Data?