
Big Data

An Introduction

Big Data

What is big data?


What makes data big?

What is Big Data

Difficult to define
There is no single, definitive definition

What is Big Data

From www.gartner.com
Diverse, high-volume, high-velocity information assets
that require new forms of processing to enable
enhanced decision making, insight discovery, and
process optimization.

What is Big Data

From www.the-bigdatainstitute.com
Exhibits variety;
Includes structured, unstructured, and semi-structured data;
Is generated at high velocity with an uncertain pattern;
Does not fit neatly into traditional, structured, relational databases;
Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.

What Constitutes Big Data

Big Data Generally Consists of:

Traditional enterprise data
Machine-generated/sensor data
Social data
Images captured by billions of devices located around the world
Digital cameras, camera phones, medical scanners, and security cameras

What Constitutes Big Data

Is Data Big?

Google
Processes more than 24 petabytes/day (2013)

Facebook
10 million photos in one hour
3 billion likes or comments in one day

Twitter
450 million tweets in one day (mid-2013)
Grows 200% every year

Big Data Characteristics

Volume
Velocity
Variety

Big Data: 3Vs


Characteristics: Volume
Data volume is increasing exponentially
From 1 zettabyte (ZB) in 2008 to 44 ZB in 2020: a 44x increase
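Taking the slide's 1 ZB (2008) and 44 ZB (2020) figures at face value, the implied compound annual growth rate is 44^(1/12) - 1, roughly 37% per year, so the volume doubles about every 2.2 years. The short Python sketch below only reproduces that arithmetic; the function names are illustrative and not from any library.

```python
import math


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate g such that start * (1 + g) ** years == end."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)


# Figures taken from the slide: 1 ZB in 2008, 44 ZB in 2020 (12 years).
g = implied_annual_growth(1.0, 44.0, 2020 - 2008)
print(f"implied growth: {g:.1%} per year")            # roughly 37% per year
print(f"doubling time: {doubling_time(g):.1f} years")  # roughly 2.2 years
```

A constant rate is a simplification: the real yearly growth was certainly uneven, but the compound rate is a convenient way to read what "exponential" means for these two data points.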


Characteristics: Variety

Various formats, types, and structures
Text, numerical, multimedia (images, audio, video), and social media data
A single application may have to handle many types of data from multiple sources
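To make "many types of data from multiple sources" concrete, here is a minimal Python sketch that brings a structured CSV row, a semi-structured JSON document, and an unstructured sentence into one common record shape. The inputs, field names, and helper functions are hypothetical, and the naive word split stands in for real text processing.

```python
import csv
import io
import json

# Hypothetical inputs describing the same kind of event in different shapes.
csv_text = "user,action\nalice,like\n"  # structured: fixed named columns
json_text = '{"user": "bob", "action": "comment", "meta": {"device": "phone"}}'  # semi-structured
free_text = "carol posted a photo"  # unstructured: plain prose


def from_csv(text):
    """Structured: every row has the same named columns."""
    return [dict(row, source="csv") for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Semi-structured: keys are named, but nesting and extra fields vary."""
    doc = json.loads(text)
    return [{"user": doc["user"], "action": doc["action"], "source": "json"}]


def from_text(text):
    """Unstructured: a naive split stands in for real text processing."""
    user, action, *_ = text.split()
    return [{"user": user, "action": action, "source": "text"}]


# One list of uniform records, regardless of where each came from.
records = from_csv(csv_text) + from_json(json_text) + from_text(free_text)
for record in records:
    print(record)
```

Each source needs its own adapter, and the unstructured one is the least reliable. That asymmetry is why variety is listed as a challenge alongside volume and velocity.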

Characteristics: Velocity
Data is being generated fast and needs to be processed fast
Transactions, GPS, posts, sensors, chats

Online Data Analytics

Faster decisions mean a competitive advantage
Examples
Push promotions (e.g., a nearby store) to your mobile phone based on your current location and profile
Social traffic monitoring: sensors track people's routes (and more) to update routes in real time.
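One common way to keep up with high-velocity data is to update a result incrementally as each event arrives, instead of re-scanning all stored data. The Python sketch below counts events, for example readings from a road sensor, within a sliding time window. The class name and window size are illustrative assumptions, not a technique prescribed by the slides.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window` seconds, updated per event."""

    def __init__(self, window: float):
        self.window = window
        self.times = deque()  # event timestamps, oldest first

    def add(self, t: float) -> int:
        """Record an event at time t (timestamps must not decrease) and return the count."""
        self.times.append(t)
        # Evict events that have fallen out of the window (t - window, t].
        while self.times and self.times[0] <= t - self.window:
            self.times.popleft()
        return len(self.times)


counter = SlidingWindowCounter(window=60.0)
for t in [0, 10, 20, 65, 70, 130]:
    print(t, counter.add(t))
```

Each event costs amortized constant work, so the count stays current no matter how much history has accumulated; a route-updating service could react as soon as the count for a road segment crosses a threshold.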


Who's Generating Big Data

Mobile devices
Scientific instruments

Social networks
Sensor technology and networks


Who's Generating Big Data

The Model Has Changed
Old Model: A few companies generate data; everyone else consumes it
New Model: Everyone generates data; everyone consumes data


What is collecting all this data?

Web Browsers
Microsoft's Internet Explorer
Google's Chrome
Mozilla's Firefox
(Non-profit foundation, used to be Netscape)
Apple's Safari

Search Engines
Google
Microsoft
Yahoo
IAC Search
Time Warner's AOL

What is collecting all this data?


Smartphones & Apps
Apple's iPhone (Apple O/S)
Samsung, HTC, Nokia, Motorola (Android O/S)
RIM Corp.'s BlackBerry (BlackBerry O/S)

Tablet Computers & Apps
Apple's iPad
Samsung's Galaxy
Amazon's Kindle Fire

What is collecting all this data?

Game Boxes and GPS Systems

Internet Service Providers

What is collecting all this data?


HDTVs and Blu-Ray Players with
built-in Internet connectivity

Movie Rental Sites

What is collecting all this data?


Pharmacies

Hospitals & Other Medical Systems


Laboratories
Imaging Centers
Emergency Medical Services (EMS)
Hospital Information Systems
Doc-in-a-Box
Electronic Medical Records
Blood Banks
Birth & Death Records

Banking & Phone Systems


Can you hear me now?
(Heh heh heh!)


What are they collecting?


Restaurant reservations (OpenTable)
Weather in L.A. in 3 days (Weather+)
Side effects of medications (MedWatcher)
3-star hotels in New Orleans (Priceline)
Which PC should I buy, and where? (PriceCheck)

Who is collecting all of this data?


Government Agencies

Corporations

Who is collecting all this data?


Corporations

Consumer Products Companies

Big Box Stores

Who is collecting what?

Credit Card Companies

What data are they getting?

Airline ticket
Restaurant check
Grocery bill
Hotel bill

Why are they collecting all this data?

Target Marketing
To send you catalogs for exactly the merchandise you typically purchase.
To suggest medications that precisely match your medical history.
To push television channels to your set instead of you pulling them in.

Targeted Information
To know what you need before you even know you need it, based on past purchasing habits!
To notify you of your expiring driver's license or credit cards, or of the last refill on a prescription (Rx), etc.
To give you turn-by-turn directions to a shelter in case of emergency.

What's driving Big Data?

OLTP: Online Transaction Processing (DBMSs)


OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
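The difference between the first two workloads can be sketched with Python's built-in sqlite3 module. OLTP means many short transactions that each touch one row; OLAP means queries that aggregate across many rows. The `sales` table and its values are hypothetical, and a single in-memory SQLite database stands in for what would normally be a separate operational DBMS and data warehouse. RTAP is not shown.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)")

# OLTP-style: many short transactions, each inserting a single row.
for store, amount in [("north", 20.0), ("south", 15.5), ("north", 4.5)]:
    with con:  # commits this one insert as its own transaction
        con.execute("INSERT INTO sales (store, amount) VALUES (?, ?)", (store, amount))

# OLAP-style: a read-only query that summarizes across all rows.
totals = con.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)  # [('north', 24.5), ('south', 15.5)]
```

In practice the two workloads are split because they stress a database differently: OLTP favors fast single-row writes, while OLAP favors scanning large volumes of history.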
What else could we need?

What's driving Big Data

Big data analytics:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time

Traditional analytics:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets


Challenges?


Challenges in Handling Big Data

Technology
New architectures, tools, and techniques are required

Analysis
Intuition and new algorithms are needed
Data scientist is a hot role

Visualization
Conveying Information


Any Technology out there?


Technology


Analysis

When properly analyzed, big data can reveal valuable patterns and information:
Segmenting Populations to Customize Actions
Replacing/Supporting Human Decision Making with Automated Algorithms
Innovating New Business Models, Products, and Services
Organizations Can Analyze Far More Data
Enabling Experimentation

Analysis

Value of Data

Relevance of Data

Visualization

How to visualize big data?


What is relevant, what is not?
How to convey a message effectively?

Multiple ways

Dashboards

At-a-glance view of relevant indicators
Provides access to timely information and reports
Uses sophisticated visualizations
Displays a blend of historical and real-time data
Data update intervals are getting shorter: 1h, 30m, 15m, ...

Team Analysis

Management Cockpit

Future?

Future? What You Don't Know, Yet
