February 2012
Hortonworks Vision
We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
How do we achieve that vision?
Enable an ecosystem around an enterprise-viable open source data platform.
A set of open source projects
Transforms commodity hardware into a service that:
- Stores petabytes of data reliably
- Allows huge distributed computations
Key attributes:
- Redundant and reliable (no data loss)
- Extremely powerful
- Batch processing centric
- Easy to program distributed apps
- Runs on commodity hardware
One of the best examples of open source driving innovation and creating a market
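To make the "huge distributed computations" and "easy to program distributed apps" points concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps behind a word count. It only imitates the programming model in plain Python and does not use any Hadoop API; the names `map_words`, `reduce_counts`, and `run_job` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum every count that was emitted for one word."""
    return word, sum(counts)


def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run mapper and reducer locally, grouping by key in between.

    On a real cluster the grouping (shuffle) moves data between machines;
    here it is just an in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical input lines, not data from the presentation.
    print(run_job(["web mobile web", "social web"], map_words, reduce_counts))
```

The developer writes only the two small functions; in this model, splitting the input, grouping by key, and rerunning failed pieces are left to the framework.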
Serving Logs
Social Media
Sensor Data
Text Systems
Unstructured Systems
Hortonworks Inc. 2012
CASE STUDY
YAHOO! HOMEPAGE
Personalized for each visitor
Result: twice the engagement
Personalized modules: Recommended links, News Interests, Top Searches
Comparisons shown: vs. one size fits all; vs. editor selected; +79% clicks vs. randomly selected
Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
Web Site
Web Site
Reports
Transaction Logs
Web Site
Web
Mobile
Hadoop
EDW
Offsite Backups / DR
HDFS Snapshots, Cloud Backup, other tools
No Storage Limits
No file limits, scale beyond 10,000 computers / cluster
BI, Reporting, Visualization Analytics, EDW Algorithms, Data Science Tools, Languages
Operations
Web
Mobile
Hadoop
Teradata Aster
Teradata
Social
Online Archival (e.g., historical Black Friday data)
Other Logs
Key benefits
- Graphical development
- Robust and scalable execution
- Broadest connectivity to support all systems: 450+ components
- Real-time debugging
JavaScript Framework
- Interactive JavaScript console for fast iterative development
- Fluent data query API that translates JavaScript queries to server-side Pig Latin and HiveQL
- Robust data visualization & charting
- Open source client and server-side frameworks
- Open source Hive ODBC Driver
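The idea of a fluent query API that is translated into HiveQL can be sketched as a small builder whose chained calls collect query parts and then render a query string. This is an illustration in Python only, not the API of the JavaScript framework on this slide; the `Query` class, its methods, and the `page_views` table are hypothetical.

```python
from typing import List, Optional


class Query:
    """Hypothetical fluent builder that renders a HiveQL SELECT statement."""

    def __init__(self, table: str) -> None:
        self._table = table
        self._columns: List[str] = ["*"]
        self._filters: List[str] = []
        self._limit: Optional[int] = None

    def select(self, *columns: str) -> "Query":
        # Returning self is what allows calls to be chained.
        self._columns = list(columns)
        return self

    def where(self, condition: str) -> "Query":
        # Conditions are passed through as raw text; this sketch does no escaping.
        self._filters.append(condition)
        return self

    def limit(self, count: int) -> "Query":
        self._limit = int(count)
        return self

    def to_hiveql(self) -> str:
        """Translate the collected parts into a single HiveQL string."""
        sql = "SELECT {} FROM {}".format(", ".join(self._columns), self._table)
        if self._filters:
            sql += " WHERE " + " AND ".join(self._filters)
        if self._limit is not None:
            sql += " LIMIT {}".format(self._limit)
        return sql


if __name__ == "__main__":
    query = Query("page_views").select("url", "clicks").where("clicks > 100").limit(10)
    print(query.to_hiveql())
```

In a setup like the one described, the generated string would be sent to the server for execution; that step is omitted here because the slide does not describe it.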
Business Analysts
Hadoop Core
HDFS (Hadoop Distributed File System)
MapReduce (Distributed Programming Framework)
Pig (Data Flow)
Hive (SQL)
HCatalog (Table & Schema Management)
HBase (Columnar NoSQL Store)
Zookeeper (Cluster Coordination)
Oozie (Workflow scheduling)
Sqoop & Other Ingest, ETL tools
Mahout & Other libraries
Ambari & Other Monitoring & Management
Thank You!
Questions & Answers