BIG DATA

AJAY S PIYUSH SAINI ASMITA BARDHAN ANIRUDH SAHIL YADAV MANAS DWIVEDI RANJITH TUSHAR KESHAV

Introduction
Big Data is essentially a large amount of data that is being stored and warehoused
Large amounts of data are being stored in databases all around the world due to:
- E-commerce
- E-banking
- Social networking
- E-business

Retrieving, processing and storing high-speed data streams is now a challenging task for many business organizations
Around 100 PB of data is processed every day all around the world

Big Data Landscape

Data Streams
Continuous streams of data that interact with a large number of applications
They are huge, fast and dynamic
Processing and storing data streams is highly challenging
A few well-known applications of data streams are:
- Google: around 20 PB/day
- Facebook: around 5 PB, at approx. 50 TB/day
- eBay: around 8 PB, at approx. 100 TB/day
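To put the daily volumes quoted in these slides in perspective, they can be converted into an average sustained rate. The sketch below is only back-of-the-envelope arithmetic; it assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a perfectly even load over 24 hours, neither of which the slides state. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pb_per_day_to_gb_per_second(pb_per_day: float) -> float:
    """Convert a daily volume in petabytes to an average rate in GB/s.

    Assumes decimal units: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes.
    """
    bytes_per_day = pb_per_day * 10**15
    return bytes_per_day / SECONDS_PER_DAY / 10**9


if __name__ == "__main__":
    # Figures taken from the slides: 20 PB/day and 100 PB/day.
    for label, volume in [("Google", 20), ("Worldwide", 100)]:
        rate = pb_per_day_to_gb_per_second(volume)
        print(f"{label}: {volume} PB/day is about {rate:.1f} GB/s on average")
```

Under these assumptions, 20 PB/day works out to roughly 231 GB/s sustained, which is far more than a single machine can read or write.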

Data Volume

Challenges posed by Big Data


The major challenge is the difficulty of performing the following operations on data in real time:
- Searching
- Sharing
- Capturing
- Storage
- Transferring
- Analysis
- Visualization

These operations require parallel programs running on several thousand servers at the same time
This affects internet search, finance and business informatics

Hadoop
Also known as Apache Hadoop; developed by Doug Cutting and Mike Cafarella to handle the Big Data paradigm
Open-source software that supports data-intensive operations
Derived from Google's MapReduce and Google File System

Provides high reliability and data motion to applications using the MapReduce framework
Uses the Hadoop Distributed File System (HDFS), which enables applications to work with several petabytes of data
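The MapReduce model mentioned above can be illustrated with a word count. This is a minimal single-process sketch of the map, shuffle and reduce steps in plain Python, not Hadoop's actual Java API; the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big streams"]))
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be distributed across servers without coordination beyond the shuffle.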

Hadoop Distributed File System


File system written in Java for the Hadoop framework
Distributable, scalable and portable
Organizes the cluster into several racks, each containing nodes, and stores data on those nodes
Nodes communicate with each other to reposition, replicate and rebalance data to achieve high reliability
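The idea of storing data on nodes grouped into racks and replicating it for reliability can be sketched as follows. This is a toy illustration, not HDFS's real placement policy: the block size, rack layout, node names and round-robin strategy are all assumptions made for the example. The replication factor of 3 matches the HDFS default.

```python
from itertools import zip_longest


def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def interleave_racks(racks):
    """Order nodes so that neighbours in the list come from different racks."""
    ring = []
    for row in zip_longest(*racks.values()):
        ring.extend(node for node in row if node is not None)
    return ring


def place_replicas(num_blocks, racks, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    ring = interleave_racks(racks)
    if replication > len(ring):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: [ring[(block_id + k) % len(ring)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    # Hypothetical cluster: two racks with two nodes each.
    racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    for block_id, nodes in place_replicas(len(blocks), racks).items():
        print(block_id, blocks[block_id], nodes)
```

With every block held on three different nodes, losing a single node leaves at least two copies of each block available.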

HDFS Architecture

Users of data analytics


A few of the many users of data analytics and Hadoop are:
- Yahoo!
- Facebook
- AOL
- Amazon
- eBay
- LinkedIn
- Twitter

Benefits of Big Data Analytics


Identify new business opportunities
- Increased market penetration
- Locating new areas of growth, including untapped customer demographics

Improve data management processes


- Facilitates ease of access, storage and retrieval
- Ease of analysis of market, growth and financial statistics

Expand data sources


- Gather new data from disparate sources in various formats
- Ease of collation and consolidation
- Increased access to new data

Clarify results
- Efficient data visualization for creating dynamic presentations and reports
- Provide clarity on current and future trends and implications

Big Data Growth

Thank You