Christian D. Black
Datacenter Solutions Architect NSG, Intel Corporation
2013 SNIA Analytics and Big Data Summit. Intel Corporation. All Rights Reserved.
*Other names and brands may be claimed as the property of others.
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Terasort, HiBench, and Iometer, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance.
Configurations:
Configuration: Terasort* - Intel Xeon 5600 & E5, 7200 RPM HDD & Intel 520 Series SSD, Intel 1GbE and 10Gb Ethernet, and open source Apache Hadoop* & Intel Distribution for Apache Hadoop*
Configuration: Iometer* - See http://www.intel.com/go/ssd for detailed products specifications and testing configurations.
Configuration: Intel Cache Acceleration Software - Intel Xeon E5, 7200 RPM HDD & Intel SSD DC S3700 Series, Intel 10Gb Ethernet, Intel Distribution for Apache Hadoop* & Intel Cache Acceleration Software
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright 2013 Intel Corporation. All rights reserved.
Agenda
Whoami?
Background - Hadoop*
Distributions & Ecosystem
Fitting Hadoop* into the Landscape
Lots of businesses talking
Intel SSDs for the datacenter
Intel SSDs and Hadoop*
Where Intel SSDs can best speed Hadoop*
Main Clusters & Intel Cache Acceleration Software
Your Workload Really Matters
Info, Contacts, and Questions
whoami?
Datacenter Solutions Architect, NSG
Moved from Intel IT to NSG in Q1 '13
Started on an Atari* 400 in 1980
Anyone recall Pilot* & Logo*?
20 years of Enterprise IT experience
USAF
Migration from Novell* to Windows NT*
MCSE*, SAP*, Datacenter Architecture, & IT Research/Pathfinding
Driving SSD into Enterprise IT @ Intel for the last 4 years
Hadoop* experience
15x cluster builds w/5 different distros
18 months supporting research & development internally
Background - Hadoop*
Just over the peak of the hype cycle at 8 years old!
Intel: reliable & long-term commitment to Big Data
Intel Distribution for Apache Hadoop* software, Q4 '12
Active contributions to Intel Distribution for Apache Hadoop* software and Apache* open source projects
Why Hadoop*?
Hadoop* invented to download & index the Internet
Application & software framework
HDFS (Hadoop* Distributed File System)
Map-Reduce (Distributed Compute Framework)
Sift and store mountains of data at ¢/GB
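As a rough, single-process illustration of the Map-Reduce model named above, the sketch below counts words in three steps: map, group by key, reduce. It is not Hadoop's API. All function names are hypothetical, and the in-memory grouping stands in for the intermediate data that a real cluster writes to local disk between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (word, 1) pair per word; this is the intermediate data."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    return reduce_phase(shuffle(map_phase(records)))


counts = word_count(["big data big clusters", "Big disks"])
```

The point of the sketch is the middle step: every map output has to be stored and regrouped before any reducer can run, which is where temp/intermediate IO comes from.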
Infrastructure Vendors
Apache*
100% Open Source
Intel Distribution for Apache Hadoop*, Cloudera* (CDH), Greenplum* (GDH), MapR* (MRH), Hortonworks* (HDH), + 8-10 or more
% of Open Source in codebase
Software, Services, and/or Support Revenue Model
In Big Data there are 100s of Companies building on 10-15 Open Source Projects
Differentiation
Commercial Software
Open Source
# http://mattturck.com/2012/10/15/a-chart-of-the-big-data-ecosystem-take-2/
#Hadoop architecture diagram and component inclusion in the Intel Distribution for Apache Hadoop* Software @ hadoop.intel.com. Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps.
Hadoop* Framework #
Many are
Adobe*, Facebook*, EBay*: cluster size range 120 to 4500 hosts
40+ other companies: cluster size range 20 to 120 hosts
Sequential Read @ 500 MB/s & Sequential Write @ 450 MB/s (2)
4k Random IOPS: 75k Read, 36k Write @ 45-65 µs (2)
Up to 14PB Endurance (800GB Drive)

Sequential Read @ 500 MB/s & Sequential Write @ 450 MB/s (2)
4k Random IOPS: 75k Read, 11.5k Write @ 50-65 µs (2)
Up to 450TB Endurance (800GB Drive)

Full data path protection: parity, CRC, memory ECC, LBA tag validation (1), and power loss protection (PLI)
1 Abbreviations: CRC = Cyclic Redundancy Check, LBA = Logical Block Address, PLI = Power Loss Imminent
2 Data based on Intel SSD DC S3700 and DC S3500 Series data sheets. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Iometer*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: See http://www.intel.com/go/ssd for detailed products specifications and testing configurations.
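To put the two endurance figures above on a comparable footing, total bytes written can be converted into full drive writes per day. The sketch below does that arithmetic for an 800GB drive. The 5-year service window is an assumption made for illustration, not a figure from the slides, and the function name is hypothetical.

```python
TB = 10**12
PB = 10**15
GB = 10**9


def drive_writes_per_day(endurance_bytes, capacity_bytes, years=5):
    """Full-capacity writes per day that would use up the rated endurance.

    `years` is an assumed service window, not a number from the slides.
    """
    days = years * 365
    return endurance_bytes / (capacity_bytes * days)


# 800GB drive rated for up to 14PB written: roughly 9.6 drive writes per day
high_endurance = drive_writes_per_day(14 * PB, 800 * GB)

# 800GB drive rated for up to 450TB written: roughly 0.3 drive writes per day
standard_endurance = drive_writes_per_day(450 * TB, 800 * GB)
```

Under that assumption the two ratings differ by about 30x in sustainable write volume, which matters when a drive is used for write-heavy temp or cache data.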
128MB-256MB sequential IO operations
Write once, read many, occasional re-balance
Perfect for $.04/GB 7k RPM spinning rust @ 130-150MB/Sec
Temp intermediate/spillover data creates disk contention

SSDs provide 450-500MB/Sec sustained
Intel internal testing shows pure SSDs provide up to 80% (1) performance increase for 1TB Terasort* in Hadoop*
$1-$2.35/GB for SSD
Due to cost per GB, SSDs are perceived as a tough sell with typical enterprise IT financial constraints
1 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Terasort or HiBench, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: Intel Xeon 5600 & E5, 7200 RPM HDD & Intel 520 Series SSD, Intel 1GbE and 10Gb Ethernet, and open source Apache Hadoop * & Intel Distribution for Apache Hadoop*
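The "tough sell" on this slide comes from two ratios: cost per GB and sequential bandwidth. The sketch below works them out using only the numbers quoted on the slide. The constant and function names are hypothetical, and the per-block timing ignores seek time, contention, and file system overhead.

```python
HDD_COST_PER_GB = 0.04          # $/GB, from the slide
SSD_COST_PER_GB = (1.00, 2.35)  # $/GB range, from the slide
HDD_MB_PER_SEC = (130, 150)     # sequential range, from the slide
SSD_MB_PER_SEC = (450, 500)     # sustained range, from the slide


def seconds_per_block(block_mb, mb_per_sec):
    """Idealized time to stream one block at a steady sequential rate."""
    return block_mb / mb_per_sec


# Cost: SSD is roughly 25x to 59x the price per GB
cost_ratio_low = SSD_COST_PER_GB[0] / HDD_COST_PER_GB
cost_ratio_high = SSD_COST_PER_GB[1] / HDD_COST_PER_GB

# Bandwidth: SSD is roughly 3x to 3.8x faster on sequential IO
bandwidth_ratio_low = SSD_MB_PER_SEC[0] / HDD_MB_PER_SEC[1]
bandwidth_ratio_high = SSD_MB_PER_SEC[1] / HDD_MB_PER_SEC[0]

# One 128MB block: just under a second on HDD, about a quarter second on SSD
hdd_block_seconds = seconds_per_block(128, HDD_MB_PER_SEC[0])
ssd_block_seconds = seconds_per_block(128, SSD_MB_PER_SEC[1])
```

A price gap of 25x or more against a raw bandwidth gain under 4x is why the rest of the deck targets SSDs at specific spots rather than replacing every data disk.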
Subset of a company's Big Data, typically <20 servers
Java*/MapReduce*/Hadoop* developers' time is critical
Up to 97% (1) accelerated development with a combination of current Intel products over the last generation of Intel products
Balanced node resources with the right mix of Intel SSD + Intel Xeon E5 processor + Intel 10Gb Ethernet
Intel SSD DC S3700 Series
1 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Terasort or HiBench, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: Intel Xeon 5600 & E5, 7200 RPM HDD & Intel 520 Series SSD, Intel 1GbE and 10Gb Ethernet, and open source Apache Hadoop * & Intel Distribution for Apache Hadoop*
Impala*, HBase*, or other real-time query clusters
Fast query returns are business critical
Up to 97% (1) accelerated development with a combination of current Intel products over the last generation of products
Balanced node resources with the right mix of Intel SSD + Intel Xeon E5 processor + Intel 10Gb Ethernet
Intel SSD DC S3700 Series
1 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Terasort or HiBench, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: Intel Xeon 5600 & E5, 7200 RPM HDD & Intel 520 Series SSD, Intel 1GbE and 10Gb Ethernet, and open source Apache Hadoop * & Intel Distribution for Apache Hadoop*
Main Hadoop* clusters, >20 servers
MapReduce* intermediate data (temp/intermediate) + 1x Intel SSD DC S3700 Series per host
Intermediate data written to SSD (Hadoop* .xml setting)
Alleviates contention on HDDs by moving temp/intermediate to SSD
Product selection based on estimated max generated spillover
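The slide does not name the .xml setting. In Hadoop 1.x the MapReduce local (intermediate) directory is controlled by `mapred.local.dir` in mapred-site.xml, and later releases use other property names such as `mapreduce.cluster.local.dir` or, under YARN, `yarn.nodemanager.local-dirs`. The sketch below assumes the Hadoop 1.x name and a hypothetical SSD mount point, and simply builds the property element so the shape of the change is visible. Check the property name against the documentation for the distribution in use.

```python
import xml.etree.ElementTree as ET


def local_dir_property(dirs, name="mapred.local.dir"):
    """Build a Hadoop-style <property> element for the local/temp directories.

    The default `name` is the Hadoop 1.x property; it is an assumption here,
    since the slide only says "Hadoop* .xml setting".
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical mount point for the single SSD in each host
prop = local_dir_property(["/mnt/ssd0/mapred/local"])
xml_text = ET.tostring(prop, encoding="unicode")
```

Pointing this directory at the SSD leaves the HDFS data directories on the HDDs, so only temp and spill traffic moves.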
Cache Acceleration Software
Main Hadoop* clusters, >20 servers
Intel Cache Acceleration Software + 1x Intel SSD DC S3700 Series/host or 1x Intel SSD 910 Series PCIe*/host
Read/write caching for all data disks
Accelerates Terasort* jobs up to 42% (1)
Product selection based on write and working dataset size
1 Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Terasort or HiBench, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: Intel Xeon E5, 7200 RPM HDD & Intel SSD DC S3700 Series, Intel 10Gb Ethernet, Intel Distribution for Apache Hadoop* & Intel Cache Acceleration Software
IO, CPU, Network, or Memory intensive & when?
Generates lots or little intermediate data?
Intel SSDs provide 450-500MB/Sec sustained bandwidth and handle both sequential and random IO gracefully
Hadoop* relies on bandwidth/throughput
2x400GB (1GB/Sec) better than 1x800GB SSD (500MB/Sec) for Hadoop*
Unleashing IO can require a rebalance
May require more/faster cores and a faster network
1 Data based on Intel SSD DC S3700 and DC S3500 Series data sheets. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as Iometer*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing Configuration: See http://www.intel.com/go/ssd for detailed products specifications and testing configurations.
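The 2x400GB versus 1x800GB comparison above is simple aggregate-bandwidth arithmetic. The sketch below reproduces it under two assumptions that the slide implies but does not state: each drive delivers about 500MB/Sec regardless of capacity, and throughput scales linearly with drive count (no controller, bus, or CPU bottleneck). The function names are hypothetical.

```python
def aggregate_bandwidth_mb_s(drive_count, per_drive_mb_s=500):
    """Idealized combined sequential bandwidth of identical drives."""
    return drive_count * per_drive_mb_s


def seconds_to_stream(data_mb, bandwidth_mb_s):
    """Idealized time to stream a dataset at a steady rate."""
    return data_mb / bandwidth_mb_s


two_small = aggregate_bandwidth_mb_s(2)  # 2 x 400GB drives -> 1000 MB/s
one_large = aggregate_bandwidth_mb_s(1)  # 1 x 800GB drive  ->  500 MB/s

# Streaming 800GB (800,000 MB): 800 s on two drives, 1600 s on one
two_small_seconds = seconds_to_stream(800_000, two_small)
one_large_seconds = seconds_to_stream(800_000, one_large)
```

Same capacity, half the streaming time, which is also why the slide warns that faster IO can shift the bottleneck to cores or the network.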
Business Contacts
Please work with identified NSG BDMs for Intel SSD + Hadoop* design wins!
Expertise | Contact
Datacenter Solutions Architecture, Big Data & HPC | Christian Black, DSA, NSG
Intel SSD + Intel CAS SW Marketing Support | Carolyn Hanley, PME, NSG
Intel Distribution for Apache Hadoop* software | David Collins, IDH Director of Sales, IAG/DSD
Open Questions