AJAY S PIYUSH SAINI ASMITA BARDHAN ANIRUDH SAHIL YADAV MANAS DWIVEDI RANJITH TUSHAR KESHAV
Introduction
Big Data is essentially a large amount of data that is being stored and warehoused.
Large amounts of data are being stored in databases all around the world due to e-commerce, e-banking, social networking and e-business.
Retrieving, processing and storing high-speed data streams is now a challenging task for many business organizations.
Around 100 PB of data is processed every day around the world.
Data Streams
Data streams are continuous flows of data that interact with a large number of applications.
They are huge, fast and dynamic.
Processing and storing data streams is highly challenging.
A few well-known applications of data streams are:
Google: around 20 PB/day
Facebook: around 5 PB, at approx. 50 TB/day
eBay: around 8 PB, at approx. 100 TB/day
Data Volume
Data volumes of this size require parallel programs running on several thousand servers at the same time.
This affects internet search, finance and business informatics.
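To see why volumes like these call for thousands of servers, the rough arithmetic can be sketched as follows. The 20 PB/day figure is the one quoted for Google above; the per-server throughput of 100 MB/s is purely an illustrative assumption, as are the function names, and decimal units (1 PB = 10^15 bytes) are assumed throughout.

```python
import math

PB = 10**15                      # decimal petabyte in bytes (assumed unit convention)
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(petabytes_per_day: float) -> float:
    """Average throughput in GB/s needed to keep up with a daily volume."""
    return petabytes_per_day * PB / SECONDS_PER_DAY / 10**9


def servers_needed(petabytes_per_day: float, per_server_mb_per_s: float) -> int:
    """Minimum number of servers if each one sustains per_server_mb_per_s.

    The per-server rate is a hypothetical parameter, not a measured value.
    """
    total_mb_per_s = petabytes_per_day * PB / SECONDS_PER_DAY / 10**6
    return math.ceil(total_mb_per_s / per_server_mb_per_s)


if __name__ == "__main__":
    print(f"{sustained_rate_gb_per_s(20):.1f} GB/s sustained")
    print(f"{servers_needed(20, 100)} servers at an assumed 100 MB/s each")
```

Under these assumptions, 20 PB/day works out to a few hundred GB/s on average, which no single machine can sustain; spreading the work over a few thousand servers brings the per-server load down to an ordinary disk-level rate.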
Hadoop
Also known as Apache Hadoop, it was developed by Doug Cutting and Mike Cafarella to handle the Big Data paradigm.
Open-source software that supports data-intensive operations.
Derived from Google's MapReduce and Google File System.
Provides high reliability and data motion to applications using the MapReduce framework.
Uses the Hadoop Distributed File System (HDFS), which enables applications to work with several petabytes of data.
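The MapReduce model mentioned above can be illustrated with a word count, the usual introductory example. The following is a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel on many nodes, reading its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["big data", "Big streams of data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both steps can be distributed across servers independently, which is the property the framework relies on to scale.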
HDFS Architecture
Clarify results
Efficient data visualization for creating dynamic presentations and reports.
Provides clarity on current and future trends and their implications.
Thank You