APAC Big Data & Cloud Summit 2013

Girish Juneja
GM, Big Data Software
Software & Services Group








Feedback loops driving exponential growth

Machine Generated





User Generated

30 million networked sensors growing at 30% a year

1 trillion devices connected to the Internet by 2015

500 million smartphone users increasing 20% a year

Evolving towards end-to-end real-time analytics

Paradigm, architecture, and platform by decade:

Paradigm: Reporting / Data Mining; High Cost / Isolated Use
Architecture: Batch sales reports; sequential SQL queries
Platform: RDBMS

Paradigm: Model-based discovery; High Cost / Dept Use
Architecture: Batch, e.g. correlated buying patterns; No SQL; parallel analysis; shared disk/memory
Platform: Proprietary MPP / DW appliance

Paradigm: Unbounded MapReduce query; Low Cost / Enterprise Use; arrival of vast amounts of unstructured data
Architecture: Real-time, e.g. recommendation engine; process at storage node; built-in data replication/reliability; shared nothing, in memory; unlimited linear scale; distributed node addition
Platform: Open source SW loosely coupled to commodity HW
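
The shared-nothing, process-at-the-storage-node style named above can be sketched in a few lines. This is a plain-Python simulation, not the Hadoop API: each list stands in for the records held on one storage node, the data is made up for illustration, and the function names (map_on_node, merge_counts, run_job) are hypothetical. Each node aggregates its own records locally, much like a combiner would, and only the small partial results are merged.

```python
from collections import Counter
from functools import reduce


def map_on_node(local_records):
    """Map step: a node counts words in its own records only (shared nothing)."""
    counts = Counter()
    for record in local_records:
        counts.update(record.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def run_job(partitions):
    """Run the map step per partition, then reduce the partial results."""
    partials = [map_on_node(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


# Hypothetical data: three partitions standing in for three storage nodes.
partitions = [
    ["big data", "real time analytics"],
    ["big clusters", "data at rest"],
    ["real time data"],
]
totals = run_job(partitions)
print(totals.most_common(3))
```

Adding a node in this sketch means appending another partition to the list; the map and reduce functions do not change, which is the idea behind scaling by distributed node addition.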

Make big data work for you

Amount of data your enterprise will need to ingest: 50X

Proportion of data that is useful to you: 10%

Projected increase in your IT budget: 10%

=> Business as usual is not an option
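
A quick back-of-the-envelope check shows why the three figures above do not fit together. The numbers come from the slide; combining them this way is an assumption made for illustration, and the variable names are hypothetical.

```python
# Figures from the slide. Combining them like this is an illustrative assumption.
data_growth = 50.0       # data to ingest grows 50X
useful_fraction = 0.10   # 10% of the data is useful
budget_growth = 1.10     # IT budget grows by 10%

# Useful data alone, measured against today's total data volume.
useful_growth = data_growth * useful_fraction

# How much more data each unit of budget has to cover than it does today.
ingest_per_budget = data_growth / budget_growth

print(f"Useful data: {useful_growth:.0f}X today's total volume")
print(f"Data per budget unit: {ingest_per_budget:.1f}X today's level")
```

Under these assumptions the useful slice alone is about five times today's entire data volume, while each unit of budget must cover roughly 45 times more data than it does now.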

Benefit from Intel's long-standing investments

Systems Architecture
Manufacturing Leadership
Energy-Efficient Performance



Global Ecosystem

Using volume economics to drive innovation


Fabricating silicon for big data

A Revolutionary Leap in Process Technology

2007: 45 nm
2009: 32 nm
2011: 22 nm

High-k Metal Gate: Intel lead vs. industry 3.5 years
Tri-Gate: Intel lead vs. industry 4 years

Performance Gain at Low Voltage
Active Power Reduction at Constant Performance

Pumping the heart of the open datacenter

Intel Xeon Processor E7-4800 Product Family
Highest reliability & scalability
Highest memory capacity
Highest enterprise & database performance

Intel Xeon Processor E5-4600 Product Family
Density-optimized
Cost-optimized
Improved HPC performance

Enabling open source solutions

Optimize software to take advantage of Intel architecture
3x performance in 3 years
Mission Critical deployments
Accelerates Crypto in JBoss
30x throughput
Trusted Compute Pools




SSD, 10GbE


Contributing to Apache Hadoop

HBase distributed tables across data centers
HDFS data replication across data centers
Archival storage support for cold data on HDFS
SSE instructions
JVM enhancements
InfiniBand RDMA support

File-based encryption for Hadoop jobs
ACLs for HDFS and HBase at cell level

Flash storage for MapReduce shuffle data
Caching and non-volatile memory for increased throughput
HDFS adaptive replication of hot files

Supporting Intel Distribution for Apache Hadoop

Security Data Mining

Batch Analytics

Graph Analytics

Full SQL

Full Text Search

Intel Distribution for Apache Hadoop* software

Granular access control in HBase

Common authentication, access control, auditing

Up to 20X faster crypto with AES-NI*
30X faster TeraSort on Intel Xeon processors, Intel 10GbE, and SSD

Bringing MapReduce to data on Lustre FS

Enabling real-time 100% SQL on Hadoop
Optimizing Hadoop for virtualization & cloud

Up to 8.5X faster queries in Hive*
Job profiling and configuration, automated by Intel Active Tuner

Backed by a portfolio of datacenter products

Cache Acceleration Software


Storage & Memory


With broad support from the ecosystem

Proven in the enterprise

Using the Intel Distribution to gain tremendous results


From Hype to High Performance

Putting advanced capabilities to work to solve real use cases

Expose new data
Dashboard/historical reporting
Real-time campaigns
Vertical apps
Predictive data services
Graph visualization
Log analysis

Fraud & threat detection
Life sciences research
Behavioral analysis
Warranty analysis
Customer segmentation
Infrastructure optimization

Data-Driven Business: Customer Service

Enable subscriber access to billing data
30X gain in performance; lower TCO
Subscriber Self Service

Provides real-time retrieval of 6 months of data
Supports new BI with 15 types of queries
Enables targeted ad serving and promotions
Intel Distribution

Data Management
30 TB/month of billing data
300K reads/second; 800K inserts/second

133-node cluster / Intel Xeon E5 processors
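
To put the figures for this deployment in proportion, the sketch below converts them into an average ingest rate and a per-node request load. The inputs are the slide's numbers; the 30-day month, decimal terabytes, and an even spread of requests across all 133 nodes are assumptions made for illustration, not details from the deployment.

```python
# Figures from the slide.
monthly_bytes = 30e12        # 30 TB/month of billing data (decimal TB assumed)
reads_per_second = 300_000
inserts_per_second = 800_000
nodes = 133

# Assumption: a 30-day month of continuous ingest.
seconds_per_month = 30 * 24 * 60 * 60
avg_ingest_bytes_per_second = monthly_bytes / seconds_per_month

# Assumption: requests are spread evenly over every node in the cluster.
reads_per_node = reads_per_second / nodes
inserts_per_node = inserts_per_second / nodes

print(f"Average ingest: {avg_ingest_bytes_per_second / 1e6:.1f} MB/s")
print(f"Per node: {reads_per_node:.0f} reads/s, {inserts_per_node:.0f} inserts/s")
```

Under these assumptions the average ingest is modest, on the order of 12 MB/s, so the quoted request rates rather than the raw byte volume are the demanding part of the workload.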

Data-Intensive Discovery: Genomics

Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
90% gain in throughput; 6X data compression

Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
Provide APIs for applications to combine and analyze public and private data sets

Intel Distribution

Data Management
Use Hive and Hadoop for query and search
Dynamically partition and scale HBase
10-node cluster / Intel Xeon E5 processors / 10GbE

Data-Rich Communities: Smart City

Enforce traffic laws and detect license fraud
Monitor and predict traffic patterns
In a city of 31 million people
Regional Detection Prevention

Detect traffic law violations automatically
Detect driver license fraud by data mining

Forecast traffic with predictive analytics


Data Management
30,000 cameras
6 Mb/s stream rate per camera
15 PB of images in use / 2B records in HBase
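
The camera figures above imply a large aggregate stream, which a short calculation makes concrete. The sketch assumes that "Mb" means megabits, that every camera streams continuously at the full rate, and that units are decimal; none of this is stated on the slide, and the variable names are hypothetical.

```python
# Figures from the slide.
cameras = 30_000
stream_bits_per_second = 6e6   # 6 Mb/s per camera, read as megabits (assumption)
images_in_use_pb = 15.0        # 15 PB of images in use

# Assumption: all cameras stream continuously at the full rate.
aggregate_bits_per_second = cameras * stream_bits_per_second
aggregate_gbps = aggregate_bits_per_second / 1e9

# Convert bits to bytes, then scale to one day, in decimal petabytes.
bytes_per_day = aggregate_bits_per_second / 8 * 86_400
pb_per_day = bytes_per_day / 1e15

# How many days of raw streams the 15 PB would correspond to.
days_of_raw_video = images_in_use_pb / pb_per_day

print(f"Aggregate stream: {aggregate_gbps:.0f} Gb/s")
print(f"Raw volume: {pb_per_day:.3f} PB/day")
print(f"15 PB is about {days_of_raw_video:.1f} days of raw streams")
```

Under these assumptions the cameras produce about 180 Gb/s in aggregate, or nearly 2 PB a day, so 15 PB corresponds to roughly a week of raw streams.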

Foster the ecosystem and develop new markets for Intel and its partners

Catalyzing the ecosystem

Case Studies Whitepapers Demos

