
APAC Big Data & Cloud Summit 2013

Girish Juneja
GM, Big Data Software
Software & Services Group

Data

Fab

Transistor

System

Enablement

Optimization

Intelligence

Feedback loops driving exponential growth


Machine Generated

Data

Computing

Experience

Social

User Generated

30 million networked sensors growing at 30% a year

1 trillion devices connected to the Internet by 2015

500 million smart phone users increasing 20% a year
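
The growth rates quoted above compound year over year. As a rough illustration (the five-year horizon is an assumption chosen for the example, not a figure from the slides), the projection can be sketched as:

```python
import math


def project(initial: float, annual_rate: float, years: int) -> float:
    """Compound growth: initial * (1 + rate) ** years."""
    return initial * (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Figures from the slide; the 5-year horizon is a hypothetical choice.
sensors_in_5_years = project(30e6, 0.30, 5)
smartphones_in_5_years = project(500e6, 0.20, 5)

print(f"Sensors after 5 years:     {sensors_in_5_years / 1e6:.1f} million")
print(f"Smartphones after 5 years: {smartphones_in_5_years / 1e6:.1f} million")
print(f"Sensor count doubles every {doubling_time(0.30):.2f} years")
```

At 30% a year the sensor population doubles in well under three years, which is the sense in which the slide describes the growth as exponential.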

Evolving towards end-to-end real-time analytics


Decade: 90s
Paradigm: Reporting / Data Mining; High Cost / Isolated Use
Architecture: Batch (sales reports); sequential SQL queries
Scale: Multi-core
Platform: RDBMS

Decade: 2000s
Paradigm: Model-based discovery; High Cost / Dept Use
Architecture: Batch (i.e., correlated buying patterns); NoSQL, parallel analysis; shared disk/memory
Scale: Node, Node, Node
Platform: NoSQL, RDBMS; proprietary MPP / DW appliance

Decade: Today
Paradigm: Unbounded MapReduce query; Low Cost / Enterprise Use; arrival of vast amounts of unstructured data
Architecture: Real-time (i.e., recommendation engine); process at storage node; built-in data replication/reliability; shared nothing, in memory
Scale: Unlimited linear scale; distributed node addition
Platform: Open-source software loosely coupled to commodity hardware (Node, Node, Node)
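
To make the "MapReduce query" and "process at storage node" ideas concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases. It is plain Python, not the Hadoop API; the function names and the word-count task are illustrative assumptions, and each inner list merely stands in for the data held on one storage node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def map_reduce(partitions: List[List[str]]) -> Dict[str, int]:
    """Run map on each partition (one per hypothetical node), then shuffle and reduce."""
    mapped: List[Tuple[str, int]] = []
    for partition in partitions:
        for record in partition:
            mapped.extend(map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce([["big data", "data node"], ["node node"]])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that already store each partition, which is why adding nodes adds both storage and compute.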

Make big data work for you


Amount of data your enterprise will need to ingest: 50X

Proportion of data that is useful to you: 10%


Projected increase in your IT budget: 10% => Business as usual is not an option
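
A back-of-the-envelope reading of these three numbers, under two assumptions that are not stated on the slide: the cost of handling data scales linearly with volume, and the 10% useful fraction applies to the future (50X) volume.

```python
def required_cost_reduction(data_multiple: float, budget_multiple: float) -> float:
    """Factor by which cost per unit of data must fall to stay within budget.

    Assumes cost scales linearly with data volume (an assumption for this sketch).
    """
    return data_multiple / budget_multiple


DATA_MULTIPLE = 50.0      # from the slide: 50X data to ingest
USEFUL_FRACTION = 0.10    # from the slide: 10% of data is useful
BUDGET_MULTIPLE = 1.10    # from the slide: 10% budget increase

all_data = required_cost_reduction(DATA_MULTIPLE, BUDGET_MULTIPLE)
useful_only = required_cost_reduction(DATA_MULTIPLE * USEFUL_FRACTION, BUDGET_MULTIPLE)

print(f"Cost per unit must fall about {all_data:.1f}x to ingest everything")
print(f"Cost per unit must fall about {useful_only:.1f}x to handle only the useful 10%")
```

Under these assumptions, even handling only the useful tenth of the data outgrows the budget several times over, which is the arithmetic behind "business as usual is not an option".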

Benefit from Intel's long-standing investments


Systems Architecture Manufacturing Leadership
Energy Efficient Performance

Security

Software

Global Ecosystem

Using volume economics to drive innovation


Intel

Fabricating silicon for big data


2007 2009 2011

45 nm

32 nm

22 nm

37%
Performance Gain at Low Voltage [1]

22nm
A Revolutionary Leap in Process Technology
High-k Metal Gate
Intel lead vs. Industry

Tri Gate
Intel lead vs. Industry

>50%
Active Power Reduction at Constant Performance [1]

3.5 years

4 years

Pumping the heart of the open datacenter


Intel Xeon Processor E7-4800 Product Family
Highest reliability & scalability
Highest memory capacity
Highest enterprise & database performance

Intel Xeon Processor E5-4600 Product Family
Density-optimized
Cost-optimized
Improved HPC performance

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

[1] Source: Published results as of 8 May 2012. See http://www.intel.com/performance/server/xeonE7/summary.htm for full list of benchmarks and configuration details.

Enabling open source solutions


Optimize software to take advantage of Intel architecture
VT-*: 3x performance in 3 years
MCA: Mission-critical deployments
AES-NI: Accelerates crypto in JBoss
SSD, 10GbE: 30x throughput
TXT: Trusted Compute Pools

Contributing to Apache Hadoop


HBase distributed tables across data centers
HDFS data replication across data centers
Archival storage support for cold data on HDFS
SSE instructions
JVM enhancements
InfiniBand RDMA support

File-based encryption for Hadoop jobs
ACLs for HDFS and HBase at cell level

Flash storage for MapReduce shuffle data
Caching and non-volatile memory for increased throughput
HDFS adaptive replication of hot files

Supporting Intel Distribution for Apache Hadoop


Security Data Mining

Batch Analytics

Graph Analytics

Full SQL

Full Text Search

Intel Distribution for Apache Hadoop* software


Granular access control in HBase
Rhino

Common authentication, access control, auditing

Up to 20X faster crypto with AES-NI*
30X faster Terasort on Intel Xeon processors, Intel 10GbE, and SSD
HPC

Bringing MapReduce to data on Lustre FS


Enabling real-time 100% SQL on Hadoop
Optimizing Hadoop for virtualization & cloud

Up to 8.5X faster queries in Hive*
Job profiling and configuration, automated by Intel Active Tuner
*Based on internal testing

Cloud

Backed by portfolio of datacenter products


Software
Cache Acceleration Software

Server

Storage & Memory

Network

With broad support from the ecosystem

* Other names and brands may be claimed as the property of others.

Proven in the enterprise


Using the Intel Distribution to gain tremendous results

IT


From Hype to High Performance


Putting advanced capabilities at work to solve real use cases

Expose new data
Dashboard/historical reporting
Real-time campaigns
Vertical apps
Predictive data services
Graph visualization
Log analysis

Fraud & threat detection
Life sciences research
Behavioral analysis
Warranty analysis
Customer segmentation
Infrastructure optimization

Data-Driven Business: Customer Service


Value
Enable subscriber access to billing data
30X gain in performance; lower TCO
Subscriber Self Service

Analytics
Provides real-time retrieval of 6 months of data
Supports new BI with 15 types of queries
Enables targeted ad serving and promotions
Intel Distribution

Data Management
30 TB/month of billing data
300K reads/second; 800K inserts/second
CDR

133-node cluster / Intel Xeon E5 processors
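
For a sense of scale, the billing figures above can be turned into an average ingest rate and a retained volume. This is a rough sketch that assumes decimal terabytes and a 30-day month; the replication factor of 3 is HDFS's default and is shown only as an assumption, since the slide does not state how this cluster was configured.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes


def average_ingest_bytes_per_second(tb_per_month: float, days_per_month: int = 30) -> float:
    """Average sustained ingest rate implied by a monthly data volume."""
    return tb_per_month * TB / (days_per_month * SECONDS_PER_DAY)


def retained_tb(tb_per_month: float, months: int, replication: int = 1) -> float:
    """Volume held for a retention window, optionally multiplied by replica count."""
    return tb_per_month * months * replication


avg_rate = average_ingest_bytes_per_second(30)
raw_six_months = retained_tb(30, 6)
replicated_six_months = retained_tb(30, 6, replication=3)  # assumed replication factor

print(f"Average ingest: {avg_rate / 1e6:.2f} MB/s")
print(f"Six months of data: {raw_six_months:.0f} TB raw, "
      f"{replicated_six_months:.0f} TB with 3 replicas")
```

The average rate is modest; the quoted 300K reads/second and 800K inserts/second describe operation counts, which this sketch does not attempt to derive.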

Data-Intensive Discovery: Genomics


Value
Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
90% gain in throughput; 6X data compression

Analytics
Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
Provide APIs for applications to combine and analyze public and private data sets

Intel Distribution

Data Management
Use Hive and Hadoop for query and search
Dynamically partition and scale HBase
10-node cluster / Intel Xeon E5 processors / 10GbE

Data-Rich Communities: Smart City


Value
Enforce traffic laws and detect license fraud
Monitor and predict traffic patterns
In a city of 31 million people
Regional Detection Prevention

Analytics
Detect traffic law violations automatically
Detect driver license fraud by data mining

Forecast traffic with predictive analytics


Local

Data Management
30,000 cameras
6 Mb/s stream rate per camera
15 PB of images in use / 2B records in HBase
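
The camera figures imply a large aggregate stream. The arithmetic below assumes "Mb/s" means megabits per second, decimal units, and that every camera streams continuously; the last line additionally assumes the raw stream is stored in full, which the slide does not claim.

```python
SECONDS_PER_DAY = 86_400

CAMERAS = 30_000
BITS_PER_SECOND_PER_CAMERA = 6e6   # 6 Mb/s, read as megabits per second
STORED_PB = 15.0                   # from the slide: 15 PB of images in use

aggregate_bits_per_second = CAMERAS * BITS_PER_SECOND_PER_CAMERA
aggregate_gbps = aggregate_bits_per_second / 1e9
aggregate_gigabytes_per_second = aggregate_bits_per_second / 8 / 1e9
petabytes_per_day = aggregate_bits_per_second / 8 * SECONDS_PER_DAY / 1e15

# Hypothetical: how long the raw stream would take to fill 15 PB if kept in full.
days_to_fill = STORED_PB / petabytes_per_day

print(f"Aggregate stream: {aggregate_gbps:.0f} Gb/s "
      f"({aggregate_gigabytes_per_second:.1f} GB/s)")
print(f"Raw volume per day: {petabytes_per_day:.3f} PB")
print(f"15 PB corresponds to about {days_to_fill:.1f} days of raw stream")
```

Numbers at this scale suggest why the slide pairs the camera feed with distributed storage and automated analytics rather than manual review.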

Foster the ecosystem and develop new markets for Intel and its partners

Catalyzing the ecosystem

Resources
Content
Case Studies, Whitepapers, Demos
http://hadoop.intel.com

Contacts
Girish Juneja
RK Hiremane
Eddie Toh
hadoop@intel.com