Vision Paper
Why You Should Read This Document
This paper describes Intel's perspective on the analytics of big data generated by sensors and devices at the edge of networks. The paper includes a discussion of:
• The importance of data at the edge of networks, where some of the biggest big data is generated
• How big data is inherently different from the data managed by traditional data management or business intelligence platforms, and why it matters
• A quick overview of emerging technologies, including distributed frameworks such as the Apache Hadoop* framework and Apache* MapReduce
• Four analytics use cases for government, retail, automotive, and manufacturing: two utilizing the Hadoop* framework and two focused on intelligent systems
AUGUST 2012
Contents
Data at the Edge: New Opportunities for Big Data
Big Data and Emerging Technologies: The Abridged Version
Big Data at the Edge: A Closer Look
Harnessing Data from Intelligent Systems and Sensors
Use Cases for Data at the Edge
What's Next?
Intel IT Center Vision Paper | Distributed Data Mining and Big Data
Relation-Based Data
Application: Single-computer platform that scales with better CPUs; centralized processing
Relational databases (SQL); centralized storage

Big Data
Application: Cluster platforms that scale to thousands of nodes; distributed processing
Nonrelational databases that manage varied data types and formats (NoSQL and HBase* databases); distributed storage
Analytics: Real-time; predictive and prescriptive; distributed analytics
Sensors from multiple systems at the edge stream data via the Internet, LANs, WANs, and mobile networks.
For an example of the size and scope of edge data, consider the machine-generated data from the engines of a Boeing* jet. Each engine generates 20 terabytes (TB) of sensor data every hour, so that a four-engine jumbo jet quickly reaches 640 TB of data during an Atlantic crossing. With more than 25,000 commercial flights in the U.S. sky on any given day, a single day of sensor data can measure in exabytes. [1]
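The arithmetic behind these figures can be checked with a short Python sketch. The flight duration is not stated in the text; the eight hours below is simply what 640 TB implies at 20 TB per engine per hour across four engines. The fleet-wide estimates are illustrative only: they assume decimal units (1 EB = 1,000,000 TB) and hypothetical per-flight volumes, since most flights are not four-engine Atlantic crossings.

```python
# Figures taken from the text above.
TB_PER_ENGINE_HOUR = 20
ENGINES_ON_JUMBO = 4
CROSSING_TB = 640
DAILY_US_FLIGHTS = 25_000

TB_PER_EB = 1_000_000  # decimal units: 1 exabyte = 1,000,000 terabytes


def implied_flight_hours(total_tb, engines, tb_per_engine_hour):
    """Flight duration implied by a total data volume."""
    return total_tb / (engines * tb_per_engine_hour)


def fleet_daily_exabytes(flights, tb_per_flight):
    """Daily volume in exabytes if every flight produced tb_per_flight."""
    return flights * tb_per_flight / TB_PER_EB


# Duration implied by the 640 TB figure.
hours = implied_flight_hours(CROSSING_TB, ENGINES_ON_JUMBO, TB_PER_ENGINE_HOUR)

# Upper-end assumption: every flight resembles the jumbo-jet crossing.
upper = fleet_daily_exabytes(DAILY_US_FLIGHTS, CROSSING_TB)

# More modest assumption (hypothetical): twin-engine aircraft, two-hour flight.
modest_tb = TB_PER_ENGINE_HOUR * 2 * 2
modest = fleet_daily_exabytes(DAILY_US_FLIGHTS, modest_tb)

print(hours, upper, modest)
```

Under either assumption the daily total lands in the exabyte range, which is consistent with the claim in the text.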
Humans are also generating sensed data. Deb Roy, director of the Cognitive Machines Group at the MIT Media Lab, tracked the activities and sounds in his home for three years, starting with the day he brought home his newborn son. Analysis of more than 90,000 hours of video and 140,000 hours of audio mapped his son's acquisition of speech and has provided enormous insight into how humans develop and learn. [2]
[1] Rogers, Shawn. "Big Data Is Scaling BI and Analytics." Information Management (September 1, 2011). information-management.com/issues/21_5/big-data-is-scaling-bi-and-analytics-10021093-1.html?zkPrintable=true
[2] Roy, Deb. "The Birth of a Word." TED talk (March 2011). ted.com/talks/deb_roy_the_birth_of_a_word.html
[3] "Global Internet Traffic Projected to Quadruple by 2015." The Network (press release) (June 1, 2011). http://newsroom.cisco.com/press-release-content?type=webcontent&articleId=324003
What's Next?
Big data is a game changer, and it's already here. While most of the momentum around big data today is around social media sources, Intel believes that realizing the promise of big data analytics must include a way to harness the potential of big data from intelligent systems and sensors. Intel sees the following next steps as critical for organizations that want to take advantage of edge data sources:
• Understand use cases and their implications. We must understand how existing disparate data sources can be evolved into a network of integrated, intelligent, connected systems.
• Define the usage model requirements for the analytics of edge data. The architecture must take advantage of big data distributed frameworks to move computation closer to where the data resides and support big data analytics at the edge via intelligent systems and local clouds.
• Enable the fast and secure delivery of aggregated data from edge analytics systems to other cloud and analytics platforms for further analysis.
• Address issues related to data ownership, interoperability, security, and privacy.

As interest in data from intelligent systems and sensors continues to grow and organizations better understand how they can use it, Intel is at the forefront of this emerging topic. Intel is already taking a leadership role with cloud computing and big data analytics. As technical advisor to the Open Data Center Alliance (ODCA), an independent IT consortium comprising global IT leaders from more than 300 companies, Intel will play a major role in the newly formed Data Services Workgroup as it works to define usage model requirements that support the secure collection, management, and analysis of big data; drive benchmarking for the Hadoop framework; and develop interoperable standards that make big data frameworks cloud-ready.
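Two of the steps listed above are moving computation closer to where the data resides and delivering only aggregated data to other platforms. They can be illustrated with a minimal Python sketch. This is not an Intel or Hadoop API: the `Summary` record and both function names are hypothetical, chosen only to show an edge node reducing raw readings to a compact summary that a central platform can merge without ever receiving the raw data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """Compact record sent from an edge node instead of raw readings."""
    sensor_id: str
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_at_edge(sensor_id, readings):
    """Reduce one window of raw readings to a single summary on the edge node."""
    if not readings:
        raise ValueError("no readings in window")
    return Summary(sensor_id, len(readings), min(readings), max(readings), mean(readings))


def merge_in_cloud(summaries):
    """Combine edge summaries centrally, weighting each average by its count."""
    total = sum(s.count for s in summaries)
    return Summary(
        "all",
        total,
        min(s.minimum for s in summaries),
        max(s.maximum for s in summaries),
        sum(s.average * s.count for s in summaries) / total,
    )


# Hypothetical readings from two sensors.
edge_a = summarize_at_edge("a", [1.0, 2.0, 3.0])
edge_b = summarize_at_edge("b", [5.0])
combined = merge_in_cloud([edge_a, edge_b])
print(combined)
```

Because each summary carries its count, the central merge reproduces the overall average exactly while the network carries one small record per sensor window rather than every reading.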
Plus, Intel has years of experience providing the technology that powers intelligent systems, as well as platforms that deliver the exceptional performance, low latency, and high throughput needed to handle large data sets and transform them into deep insights. Count on Intel for the technology, guidance, and vision to make big data work for you.
To learn more about edge data and big data analytics, visit the IT Center at intel.com/bigdata.