Jnaneshwar Bohara
Mobile devices
(tracking all objects all the time)
Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
How much data?
500 Million Tweets sent each day!
More than 4 Million Hours of content uploaded to YouTube every day!
3.6 Billion Instagram Likes each day.
4.3 BILLION Facebook messages posted daily!
5.75 BILLION Facebook likes every day.
40 Million Tweets shared each day!
6 BILLION daily Google Searches!
https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
Introduction to Big Data|JBohara
What is Big Data?
It is a new set of approaches for analysing data sets that
were not previously accessible because they posed
challenges across one or more of the 3 Vs of Big Data.
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes (ZB) to 35 ZB
Data volume is increasing exponentially
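To make "exponential" concrete, the slide's two data points can be turned into an implied yearly growth rate. This is a small arithmetic sketch that assumes smooth compound growth between 2009 and 2020; real growth is not that even, and the variable names are ours.

```python
# Figures taken from the slide: 0.8 ZB in 2009, 35 ZB projected for 2020.
start_zb = 0.8
end_zb = 35.0
years = 2020 - 2009  # number of yearly growth steps

# Overall multiple (the "44x" on the slide).
growth_factor = end_zb / start_zb

# Compound annual growth rate: the constant yearly rate r with
# start_zb * (1 + r) ** years == end_zb (assumes smooth compounding).
cagr = growth_factor ** (1 / years) - 1

print(f"overall growth: {growth_factor:.1f}x")
print(f"implied annual growth: {cagr:.1%}")
```

Under that assumption the rate comes out at roughly 41% per year, which means the volume roughly doubles every two years.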
Exponential increase in collected/generated data
100s of millions of GPS-enabled devices sold annually
? TBs of data every day
25+ TBs of log data every day
2+ billion people on the Web by end of 2011
76 million smart meters in 2009; 200M by 2014
Variety (Complexity)
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF)
Streaming Data
You can only scan the data once
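Because a stream can be scanned only once, algorithms for it must work in a single pass with bounded memory. One classic example is reservoir sampling (Algorithm R), which we bring in here as an illustration; it is not taken from the slides. The sketch below keeps a uniform random sample of k items from a stream of unknown length; the function name and the seeded generator are our own choices.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of up to k items while reading the stream once."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # randint is inclusive, so j is uniform over 0..i;
            # item i therefore replaces a slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# A generator stands in for a stream: once an item is consumed it cannot be re-read.
events = (f"event-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5, rng=random.Random(42))
print(sample)
```

Memory stays at k items no matter how long the stream is, which is the property a scan-once setting needs.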
Challenges in Handling Big Data
Big Data Challenges
https://blogs.wsj.com/experts/2014/03/26/six-challenges-of-big-data/
Chapter 1: Introduction to Big Data
Integration
Integration involves consolidating data for
analysis.
Retrieving relevant data from various sources for analysis
Eliminating redundant data or clustering data to obtain a smaller
representative sample
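A minimal sketch of the integration step, assuming records from each source are dictionaries that share a key field. The function `consolidate`, the sources `crm` and `web_signups`, and the "first non-empty value wins" merge policy are all hypothetical choices for illustration. Normalizing the key before comparing is what lets near-duplicates (different casing, stray spaces) collapse into one record.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def consolidate(sources: Iterable[Iterable[Record]], key: str) -> List[Record]:
    """Merge records from several sources, keeping one record per normalized key."""
    merged: Dict[str, Record] = {}
    for records in sources:
        for record in records:
            # Normalize the key so "Ana@Example.com" and "ana@example.com " match.
            k = record[key].strip().lower()
            target = merged.setdefault(k, {key: k})
            for field, value in record.items():
                # Hypothetical policy: the first non-empty value seen for a field wins.
                if field != key and value and field not in target:
                    target[field] = value
    return list(merged.values())


# Hypothetical sources holding overlapping customer data.
crm = [{"email": "Ana@Example.com", "name": "Ana", "city": ""},
       {"email": "bo@example.com", "name": "Bo", "city": "Oslo"}]
web_signups = [{"email": "ana@example.com ", "name": "", "city": "Lisbon"},
               {"email": "cy@example.com", "name": "Cy", "city": "Lima"}]

customers = consolidate([crm, web_signups], key="email")
for customer in customers:
    print(customer)
```

Four input records reduce to three customers, and Ana's record combines her name from one source with her city from the other.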
Analysis
Searching for relationships between data items in a database, or
exploring data in search of classifications or associations.
Analysis can yield descriptions or predictions.
Based on their interpretation of the analysis, organizations can determine
whether and how to act on the results.
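Searching for associations between data items is often measured with support (how often items occur together) and confidence (how often the second item appears given the first), the standard measures in association-rule mining such as Apriori. The sketch below is a simplified brute-force pair counter, not Apriori itself and not a method from the slides; the function name, the thresholds, and the basket data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs above both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))            # sort so each pair is counted under one ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n               # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
rules = pair_rules(baskets)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Here every basket with butter also has bread, so the rule butter -> bread has confidence 1.0. Counting every pair works for small data, while Apriori-style pruning is what keeps this tractable at larger scale.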
Data Analytics Process: Discovery
Interpretation
Analytic processes are reviewed by data scientists
to understand results and how they were
determined.
Interpretation involves retracing methods,
understanding choices made throughout the
process and critically examining the quality of the
analysis.
It provides the foundation for decisions about
whether analytic outcomes are trustworthy.
Data Analytics Process: Application
Application
Associations discovered amongst data in the
knowledge phase of the analytic process are
incorporated into an algorithm and applied.
In the application phase, organizations reap the
benefits of knowledge discovery.
Through application of derived algorithms,
organizations make determinations upon which
they can act.
A Brief History of Big Data
c. 18,000 BCE
Humans use tally sticks to record data for
the first time. These are used to track
trading activity and record inventory.
c. 2400 BCE
The abacus is developed, and the first
libraries are built in Babylonia.
300 BCE to 48 AD
The Library of Alexandria is the world's largest
data storage center until it is destroyed by
the Romans.
100 AD to 200 AD
The Antikythera Mechanism, the first mechanical
computer, is developed in Greece.
1663
John Graunt conducts the first recorded statistical-analysis
experiments in an attempt to curb the
spread of the bubonic plague in Europe.
1865
The term "business intelligence" is used by Richard
Millar Devens in his Encyclopedia of Commercial
and Business Anecdotes.
1881
Herman Hollerith creates the Hollerith Tabulating
Machine, which uses punch cards to vastly reduce
the workload of the US Census.
1926
Nikola Tesla predicts that in the future, a man will
be able to access and analyze vast amounts of data
using a device small enough to fit in his pocket.
1944
Fremont Rider speculates that by 2040 Yale Library will
contain 200 million books stored on 6,000 miles of
shelves.
1958
Hans Peter Luhn defines Business Intelligence as "the
ability to apprehend the interrelationships of
presented facts in such a way as to guide action
towards a desired goal."
1965
The US Government plans the world's first data center
to store 742 million tax returns and 175 million sets of
fingerprints on magnetic tape.
1970
The relational database model is developed by IBM mathematician
Edgar F. Codd. The hierarchical file system allows records to be
accessed using a simple index system. This means anyone can
use databases, not just computer scientists.
1976
Material Requirements Planning (MRP) systems are commonly
used in business. Computers and data storage are used for
everyday routine tasks.
1989
An early use of the term "Big Data" appears in a magazine article by author
Erik Larson, commenting on advertisers' use of data to target
customers.
1991
The birth of the internet. Anyone can now go
online and upload their own data, or analyze data
uploaded by other people.
1996
The price of digital storage falls to the point
where it is more cost-effective than paper.
1997
Google launches its search engine, which will
quickly become the most popular in the world.
1999
First use of the term "Big Data" in an academic paper: "Visually
Exploring Gigabyte Datasets in Real-time" (ACM).
2008
Globally 9.57 zettabytes (9.57 trillion gigabytes) of information is
processed by the world's CPUs.
2011
The McKinsey report states that by 2018 the US will
face a shortfall of between 140,000 and 190,000
professional data scientists, and warns that issues
including privacy, security and intellectual property will
have to be resolved before the full value of Big Data
will be realized.
2014
Mobile internet use overtakes desktop for the first
time.
88% of executives responding to an international
survey by GE say that big data analysis is a top priority.