
Big Data Analytics for Business

About the course

 Course outline
 Project
 Chapters of Book / Any topic: Data Mining Applications in R [RF1]
 Exploration of topics related to Big Data Analytics

Topics
 Sectoral Analysis
Big Data Analytics in Banking
Big Data Analytics in Retail
Big Data Analytics in Supply Chain
Big Data Analytics in Telecommunications
Big Data Analytics in E-governance
Big Data Analytics in Healthcare

Topics
 Role of Big Data Analytics in marketing
 Big data and cloud analytics
 Big data analytical frameworks
 Privacy issues in Big Data

Acknowledgement
Cloudera
Hortonworks
Teradata University Network
Big Data University
Data Science Central
IBM
IBM IBV/MIT Sloan Management Review Study 2011
McKinsey / Gartner / IDC reports
Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (Author: Bill Franks)
 Big Data (Authors: Viktor Mayer-Schönberger and Kenneth Cukier)
 Internet (for generic search results)










What will be covered in the course





Exploration of Big Data Analytics


Unstructured Data Analysis
Hadoop Environment
Applications
 Recommendation Systems
 Network Analysis
 Sentiment Analysis

Need for Analytics?

 Café Terazza is looking to offer a few discount coupons
 Amazon
 H1N1 Flu
 Aviva Insurance company
 Target
 Smart Grid
 IoT

Three pillars of Analytics

 Business
 Methodology
 Tools / Technology

Steps in Analytics





Data Generation
Data Capturing
Data Storing
Data Processing
Reporting and Visualization

Big Data ???


We are surrounded by machines

We are surrounded by DATA

Competing on the 3rd Platform

From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes*

The investment per gigabyte during that period will drop from $2.00 to $0.20*

 Currently a quarter of the information in the Digital Universe would be useful for big data if it were tagged and analyzed. Only 3% of the potentially useful data is tagged, and even less is analyzed*
*The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East (sponsored by EMC)
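
As a quick check on the figures quoted above, the short Python sketch below recomputes the growth factor and derives the annual growth rate it implies. The annual rate is a derived number, not one taken from the slide, and the variable names are illustrative only.

```python
# Figures quoted on the slide (IDC Digital Universe study, sponsored by EMC)
start_eb, end_eb = 130, 40_000        # exabytes in 2005 and in 2020
years = 2020 - 2005

# Overall growth factor; the slide rounds this to "a factor of 300"
growth_factor = end_eb / start_eb

# Compound annual growth rate implied by that factor (derived, not on the slide)
cagr = growth_factor ** (1 / years) - 1

# Investment per gigabyte in USD at the start and end of the period
cost_start, cost_end = 2.00, 0.20
cost_ratio = cost_start / cost_end

print(f"growth factor: {growth_factor:.0f}x")
print(f"implied annual growth: {cagr:.1%}")
print(f"cost per GB falls by a factor of {cost_ratio:.0f}")
```

The point of the exercise is that volume grows far faster than the per-gigabyte cost falls, so total spending on storage and processing still rises.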

3 Vs of Big Data

Big Data : Some thoughts


What is Big Data?
How is it generated?
Why Big Data Analytics?
How will Big Data Analytics help?
How to do Big Data Analytics?
What will be the cost of Big Data Analytics?
 Do I need to generate Big Data first to do Big Data Analytics?







Myths about Big Data / Data Analytics

 Only Big (Large scale) Organizations have Big Data
 Big Data Analytics are useful for Large Enterprises only
 It is not possible for Small or Medium Enterprises to do Big Data Analytics
 I know my business well, no need for Big Data Analytics or any kind of Analytics
 Big Data Analytics / Data Analytics cannot give me Competitive Advantage

How MSMEs can benefit from Big Data Analytics





R is open source
Hadoop is open source
RHadoop Packages are open source
Application Areas
 Sensor data from machines
 Social Network data analysis for
promotion of products
 Trend analysis on Twitter

Other Platforms
 Hortonworks Sandbox
 Cloudera
 SAS Data Loader ( SAS Cloudera)

What is supposed to be discussed
 Generation of Big Data in the organisation
 Processing it
 Reporting / Using it for organizational performance


Memory Unit


Jargons of Big Data


Big Data Facts


According to McKinsey, a retailer using big data to the fullest could increase its operating margins by more than 60%

Google's Eric Schmidt claims that every two days now we create as much information as we did from the dawn of civilization until 2003

Bad data or poor quality data costs US businesses $600 billion annually

According to Zuckerberg, 1 billion pieces of content are shared via Facebook's Open Graph daily

By 2015, 4.4 million IT jobs will be created to support Big Data, generating 1.9 million jobs in the United States

According to Gartner, Big Data will drive $232 billion in spending through 2016

Data never sleeps

How Much Data Is Generated Every Minute? 24/7/365
Google Receives Over 2,000,000 Search Queries
Email Users Send 204,166,667 Emails
Apple Receives About 47,000 App Downloads
Brands on Facebook Get 34,722 Likes

https://www.aabacosmallbusiness.com/advisor/big-data-biggerfacts-132520713.html
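
To get a feel for the scale, the per-minute figures above can be turned into daily totals. The sketch below is plain arithmetic on the quoted numbers; the dictionary keys are illustrative labels, not terms from the source.

```python
# Per-minute figures as quoted on the slide
per_minute = {
    "Google search queries": 2_000_000,
    "emails sent": 204_166_667,
    "Apple app downloads": 47_000,
    "Facebook brand likes": 34_722,
}

MINUTES_PER_DAY = 24 * 60

# Scale every per-minute figure up to a full day
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")
```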

30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
2+ billion people on the Web by end 2011
76 million smart meters in 2009, 200M by 2014
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day

Digital Data is Exploding

According to IBM, 90% of the world's information was created in the last 2 years

Is the "Big" Part or the "Data" Part More Important?
(1) The "big" part
(2) The "data" part
(3) Both
(4) Neither
The answer is choice (4): what matters is what organizations do with big data


Key sectors for big Data








Financial
Healthcare
Communications
Digital Media
Real Estate







Manufacturing
Travel
Retailing
Government
Energy

Demand for analytical skills

140,000 to 190,000 people with deep analytical skills will be needed by 2018

Demand for general big data skills

1,500,000 managers and analysts will be needed to fill jobs in Big Data by 2018

McKinsey Institute on Big Data Jobs

There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

The number of organizations that see analytics as a competitive advantage is growing.

[Figure: share of organizations that see analytics as a competitive advantage, 2010 to 2012 (63% label shown); analytics moves from "business initiative" to "business imperative"]

Studies show that organizations competing on analytics substantially outperform their peers

1.6x Revenue Growth
2.5x Stock Price Appreciation
2.0x EBITDA Growth

IBM IBV/MIT Sloan Management Review Study 2011
Copyright Massachusetts Institute of Technology 2011

What do revenues look like?

Big Data Analytics: Helped Chennai Express tap social media

Shah Rukh Khan's Chennai Express, one of the biggest Bollywood grossers of 2013, used Big Data & Analytics solutions by IT services firm Persistent Systems to drive social media and digital marketing campaigns.

"Chennai Express related tweets generated over 1 billion cumulative impressions and the total number of tweets across all hashtags was over 750 thousand over the 90-day campaign period," Persistent Systems claimed in a release.

Introduction to Big Data


What is Big Data?
What makes data, Big Data?


Finally...
Big Data is similar to small data, but bigger, faster and multi-structured
... But because the data is bigger, it requires different approaches:
Techniques, tools, architecture

with an aim to solve new problems, or old problems in a better way

A Few More Myths About Big Data

 Big Data Is New
 Big Data Is Only About Massive Data Volume
 Big Data Means Hadoop
 Big Data Needs A Data Warehouse
 Big Data Means Unstructured Data
 Big Data Is for Social Media & Sentiment Analysis
Source: http://mashable.com/2012/06/19/big-data-myths/

Big Data Definition

 No single standard definition
Big Data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it

SAS defines Big Data Analytics as

Big data analytics is the process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used to make better decisions. With big data analytics, data scientists and others can analyze huge volumes of data that conventional analytics and business intelligence solutions can't touch.

Consider this; it's possible that your organization could accumulate (if it hasn't already) billions of rows of data with hundreds of millions of data combinations in multiple data stores and abundant formats. High-performance analytics is necessary to process that much data in order to figure out what's important and what isn't. Enter big data analytics.

What Is Big Data?

"Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population." - Teradata Magazine article, 2011
"Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze." - The McKinsey Global Institute, 2011

What Is Big Data?

IOPS (Input/Output Operations Per Second)

Big Data Analytics

Big Data Will Transform Your Business


New Sources of Customer, Product, Market and Operational
Insights

Today's Decision-making

Rearview Mirror hindsight


Less than 10% of available data
Incomplete, disjointed, inaccurate
Business Monitoring

Big Data Will Transform Your Business


New Sources of Customer, Product, Market and Operational
Insights

Big Data Decision-making

Forward-looking recommendations
Exploit all data from diverse sources
Real-time, correlated, governed
Business Optimization

Complementary Approaches for Different Use Cases

Traditional Approach: structured, analytical, logical
Traditional sources: Data Warehouse, Internal App Data, Mainframe Data, OLTP System Data, ERP data, Transaction Data
Characteristics: structured, repeatable, linear
Examples: monthly sales reports, profitability analysis, customer surveys

New Approach: creative, holistic thought, intuition
New sources: Web Logs, Social Data, Text Data: emails, Sensor data: images, RFID
Characteristics: unstructured, exploratory, iterative
Examples: brand sentiment, product strategy, maximum asset utilization

Platforms shown: Hadoop, Streams, Enterprise Integration

Big Data vis-à-vis Existing Communities


Variety
Machine Learning
NLP

Big Data
Databases

Volume
Velocity

Complex Event Processing

Big Data: 3Vs


Characteristics of Big Data: 1-Scale (Volume)

Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes to 35 zettabytes

Data volume is increasing exponentially

Exponential increase in collected/generated data

Characteristics of Big Data: 2-Complexity (Variety)

Various formats, types, and structures
Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc.
Static data vs. streaming data
A single application can be generating/collecting many types of data

To extract knowledge, all these types of data need to be linked together

Characteristics of Big Data: 3-Speed (Velocity)
 Data is being generated fast and needs to be processed fast
 Online Data Analytics
 Late decisions mean missed opportunities
 Examples

E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you

Healthcare monitoring: sensors monitor your activities and body; any abnormal measurements require immediate reaction
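
The healthcare monitoring example can be made concrete with a small streaming check. The Python sketch below is only an illustration under stated assumptions: the rolling-window rule, the window size, the threshold and the sample heart-rate readings are all hypothetical and not taken from the slides. It flags a reading as abnormal when it lies far from the mean of the most recent normal readings, without waiting for the full data set.

```python
from collections import deque
from statistics import mean, stdev


def monitor(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    Hypothetical rule: a reading is abnormal when it is more than
    `threshold` standard deviations away from the last `window` normal values.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Hypothetical sensor stream: heart rate in beats per minute
heart_rate = [72, 74, 71, 73, 72, 75, 73, 140, 74, 72]
alerts = list(monitor(heart_rate))
print(alerts)
```

Because the function is a generator, each reading is judged as it arrives, which is the property that velocity demands: the decision is made while it can still trigger an immediate reaction.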

Some Make it 4Vs


With Big Data, We've Moved into a New Era of Analytics

Volume: 12+ terabytes of Tweets created daily
Velocity: 5+ million trade events per second
Variety: 100s of different types of data
Veracity: only 1 in 3 decision makers trust their information

Harnessing Big Data

OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)

Who's Generating Big Data

Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)

Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion

The Model Has Changed

The Model of Generating/Consuming Data has Changed

Old Model: few companies are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming data

What's driving Big Data

Optimizations and predictive analytics
Complex statistical analysis
All types of data, and many sources
Very large datasets
More of a real-time

Ad-hoc querying and reporting
Data mining techniques
Structured data, typical sources
Small to mid-size datasets

Analytics With Data-In-Motion & Data At Rest

[Diagram labels: Data Ingest, Bootstrap, Enrich, Adaptive Analytics Model, Forecast, Nowcast, "Opportunity Cost Starts Here"]

Value of Big Data Analytics

Big data is more real-time in nature than traditional DW applications
Traditional DW architectures are not well-suited for big data apps
Massively parallel processing, scale-out architectures are well-suited for big data apps

Challenges in Handling Big Data

The bottleneck is in technology
New architecture, algorithms, techniques are needed

Also in technical skills
Experts in using the new technology and dealing with big data

Web Data: A source of Big Data

Web Data Classification

 Web Content, Web Structure and Web Usage Mining
 Data in Web Usage Mining:
 Web server logs
 Site contents
 Data about the visitors, gathered from external channels
 Further application data

 Not all these data are always available.
 When they are, they must be integrated.
 A large part of Web usage mining is about processing usage/clickstream data.
 After that, various data mining algorithms can be applied.
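
One common way of processing clickstream data before mining it is to group hits into visitor sessions. The Python sketch below is a minimal illustration, not a method prescribed by the slides: the 30-minute inactivity cutoff is an assumed convention, identifying a visitor by IP address alone is a simplification, and the third hit for 1.2.3.4 is a hypothetical record added to show a session split. The other hits reuse IPs and URLs from the sample server log later in this deck.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff


def sessionize(hits):
    """Group (visitor_ip, timestamp, url) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for ip, ts, url in sorted(hits, key=lambda h: (h[0], h[1])):
        sessions = by_visitor[ip]
        # Extend the current session if the gap since its last hit is small
        if sessions and ts - sessions[-1][-1][0] <= SESSION_TIMEOUT:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return dict(by_visitor)


hits = [
    ("1.2.3.4", datetime(2006, 2, 1, 0, 8, 43), "/classes/cs589/papers.html"),
    ("1.2.3.4", datetime(2006, 2, 1, 0, 8, 46), "/classes/cs589/papers/cms-tai.pdf"),
    # Hypothetical later visit, added only to demonstrate a second session
    ("1.2.3.4", datetime(2006, 2, 1, 9, 0, 0), "/classes/cs589/papers.html"),
    ("2.3.4.5", datetime(2006, 2, 1, 8, 1, 28), "/classes/ds575/papers/hyperlink.pdf"),
]

sessions = sessionize(hits)
for ip, visitor_sessions in sessions.items():
    print(ip, [len(s) for s in visitor_sessions])
```

Once hits are grouped this way, each session becomes one record that pattern mining, clustering or classification algorithms can work on.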

360-Degree View
 Organizations have talked about a 360-degree view of their customers
 What is a 360-degree view?
Names & Addresses

What Are You Missing?

 About 2% of browsing sessions complete a purchase
 If only transactions are tracked, information is missing on more than 98% of web sessions

98% of Information

Importance of Missing Information

 For every purchase transaction
 There might be dozens or hundreds of specific actions
 That information needs to be collected and analyzed
Action flow

New Ways of Communicating

You have visibility into the entire buying process instead of seeing just the results

[Diagram labels: Motivation 1, Motivation 2, Intention 1, Intention 2, Preference 1, Preference 2, etc.]

Data That Should Be Collected

 Collect detailed event history from any customer touch point
Web sites
Kiosks
Mobile apps
Social media
Etc.

Behaviors That Can Be Captured
Purchases
Requesting help
Product views
Forwarding a link
Shopping basket additions
Posting a comment
Watching a video
Registering for a webinar
Accessing a download
Executing a search
Reading / writing a review
And many more!

Shopping Behaviors
 How customers come to a site to begin shopping
 What search engine do they use?
 What specific search terms are entered?
 Do they use a bookmark they created previously?
 Associated with higher sales rates
Search keywords

Shopping Behaviors (cont.)

 Start to examine all the products they explore
 Who looked at a product landing page?
 Who drilled down further?
 Who looked at detailed product specifications?
 Who looked at shipping information?

Shopping Behaviors (cont.)

 Start to examine all the products they explore
 Who took advantage of any other information?
 Which products were added to, or later removed from, a wish list or basket?
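
The "who looked at..." questions on the shopping-behavior slides amount to counting how many customers reach each step of a browsing funnel. The Python sketch below shows one way to do that. The step names, customer IDs and events are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Funnel steps in the order the slides ask about them (names are hypothetical)
FUNNEL = ["landing_page", "drill_down", "specifications", "shipping_info"]

# Hypothetical captured events: (customer_id, action)
events = [
    ("c1", "landing_page"), ("c1", "drill_down"),
    ("c1", "specifications"), ("c1", "shipping_info"),
    ("c2", "landing_page"), ("c2", "drill_down"),
    ("c3", "landing_page"),
    ("c4", "landing_page"), ("c4", "drill_down"), ("c4", "specifications"),
]

# Collect the set of actions seen for each customer
seen = defaultdict(set)
for customer, action in events:
    seen[customer].add(action)

# A customer counts for a step only if every earlier step was also completed
funnel_counts = {}
for depth, step in enumerate(FUNNEL, start=1):
    required = set(FUNNEL[:depth])
    funnel_counts[step] = sum(1 for actions in seen.values() if required <= actions)

for step, count in funnel_counts.items():
    print(f"{step}: {count} customers")
```

The drop between consecutive steps shows where customers lose interest, which is the information that transaction-only tracking cannot provide.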

Research Behaviors
 Understanding how customers utilize the research content can lead to tremendous insights into
 How to interact with each individual customer
 How different aspects of the site do or do not add value

Research Behaviors - An Example

 An organization may see an unusual number of customers dropping a specific product

Detailed specification

Feedback Behaviors
 Some of the best information is
 Detailed feedback on products and services

 By using text mining, we can understand
 Tone
 Intent
 Topic
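
As a very rough illustration of reading tone from feedback text, the Python sketch below counts words from two small word lists. This is a deliberately simple lexicon approach chosen for illustration, not the text-mining method the slides have in mind. The word lists and the sample reviews are hypothetical, and real text mining would also need to handle negation, intent and topic.

```python
import re

# Hypothetical mini-lexicons; a real system would use a much larger resource
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"broken", "slow", "terrible", "refund", "disappointed"}


def tone(review):
    """Classify a review as positive, negative or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical customer reviews
reviews = [
    "Love the phone, delivery was fast and support was helpful",
    "Screen arrived broken and support was slow, I want a refund",
    "It is a phone",
]
tones = [tone(r) for r in reviews]
print(tones)
```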

Feedback Behaviors - Examples

 Some customers post reviews on a regular basis
 It is smart to give special incentives to keep the good words coming

 By parsing the questions and comments via online help
 It is possible to get a feel for what each specific customer is asking about

Customers in general
Each specific customer

The Next Best Offer

 A common marketing analysis is to predict what the next best offer is for each customer
 To maximize the chances of success

 Having web behavior data can be very useful

The Next Best Offer - An Example

 At a bank, information about Mr. Smith
He has four accounts: checking, savings, credit card, and a car loan
He makes five deposits and 25 withdrawals per month
He never visits a branch in person
He has a total of $50,000 in assets deposited
He owes a total of $15,000 between his credit card and car loan

A lower credit card interest rate
An offer of a CD for his sizable cash holdings

The Next Best Offer - An Example (cont.)
 We have nothing that says it is remotely relevant
 If Mr. Smith's web behavior is examined, we get additional information
He browsed mortgage rates five times in the past month
He viewed information about homeowners insurance
He viewed information about flood insurance
He explored home loan options (i.e., fixed versus variable, 15- versus 30-year) twice in the past month

Attrition Modeling
 In the telecommunications industry,
 Companies have invested massive amounts of time and effort in churn models

 It is critical to understand patterns of customer usage and profitability

Attrition modeling: an example

 Mrs. Smith
 A customer of telecom Provider 101
Search query: How do I cancel my Provider 101 contract?
Page visited: Provider 101's cancellation policies page

Response Modeling
 It is similar to attrition modeling
 The goal is predicting a positive behavior (purchase or response) rather than a negative behavior

 In a response model, all customers are scored and ranked
 In theory, every customer has a unique score
 In practice, a small number of variables define most models
 Many customers end up with identical or nearly identical scores
 Web data can help increase differentiation among customers

Response Modeling - An Example

 4 customers scored by a response model
Each has the exact same score, 0.62, due to having the same values:
Last purchase was within 90 days
Six purchases in the past year
Spent $200 to $300 in total
Homeowner with estimated household income of $100,000 to $150,000
Member of the loyalty program
Has purchased the featured product category in the past year

 Using web data, the scores change drastically
 Customer 1 has never browsed your site: 0.62 to 0.54
 Customer 2 viewed the product category featured in the offer within the past month: 0.62 to 0.67
 Customer 3 viewed the specific product featured in the offer within the past month: 0.62 to 0.78
 Customer 4 browsed the specific product featured 3 times last week, added it to a basket once, abandoned the basket, then viewed the product again later: 0.62 to 0.86
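
The effect described above can be sketched with a logistic score to which web-behavior signals add or subtract log-odds. This is only an illustration under stated assumptions: the slides do not say what kind of model produced the scores, and the signal names and weights below are hypothetical values picked for the example, not fitted coefficients. Only the shared base score of 0.62 comes from the slide.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))


BASE_SCORE = 0.62  # identical score of all four customers (from the slide)

# Hypothetical log-odds adjustments for web-behavior signals
WEB_WEIGHTS = {
    "never_browsed": -0.3,
    "viewed_category": 0.2,
    "viewed_product": 0.8,
    "abandoned_basket": 0.5,
}


def rescore(base, web_signals):
    """Adjust a base response score using the customer's web signals."""
    z = logit(base) + sum(WEB_WEIGHTS[s] for s in web_signals)
    return sigmoid(z)


customers = {
    "Customer 1": ["never_browsed"],
    "Customer 2": ["viewed_category"],
    "Customer 3": ["viewed_product"],
    "Customer 4": ["viewed_product", "abandoned_basket"],
}

scores = {name: rescore(BASE_SCORE, signals) for name, signals in customers.items()}
for name, score in scores.items():
    print(f"{name}: {BASE_SCORE:.2f} to {score:.2f}")
```

Four customers who were indistinguishable on purchase history alone now receive four different scores, which is the differentiation that web data adds.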

Customer Segmentation
 Web data makes it possible to segment customers based upon typical browsing patterns
Dreamer

Web server logs

2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) http://dataminingresources.blogspot.com/

2006-02-01 00:08:46 1.2.3.4 - GET /classes/cs589/papers/cms-tai.pdf - 200 4096 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) http://maya.cs.depaul.edu/~classes/cs589/papers.html

2006-02-01 08:01:28 2.3.4.5 - GET /classes/ds575/papers/hyperlink.pdf - 200 318814 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) http://www.google.com/search?hl=en&lr=&q=hyperlink+analysis+for+the+web+survey

2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/announce.html - 200 3794 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/

2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/styles2.css - 200 1636 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/announce.html

2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/header.gif - 200 6027 HTTP/1.1 maya.cs.depaul.edu Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1) http://maya.cs.depaul.edu/~classes/cs480/announce.html

Courtesy: Bing Liu
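
Each log entry above is a sequence of whitespace-separated fields, so a first preprocessing step can be sketched in a few lines of Python. The field names below are assumptions inferred from the sample entries (the slide does not label the columns), and the parser only handles entries with exactly these 13 fields.

```python
from datetime import datetime

# Assumed field names, inferred from the sample entries (not given on the slide)
FIELDS = [
    "date", "time", "client_ip", "user", "method", "uri_stem", "uri_query",
    "status", "bytes", "protocol", "host", "user_agent", "referrer",
]


def parse_line(line):
    """Parse one whitespace-separated log entry into a dictionary."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["timestamp"] = datetime.strptime(
        f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M:%S"
    )
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    # Spaces in the user agent are encoded as "+" in these entries
    record["user_agent"] = record["user_agent"].replace("+", " ")
    return record


# First entry of the sample log
line = (
    "2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221 "
    "HTTP/1.1 maya.cs.depaul.edu "
    "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) "
    "http://dataminingresources.blogspot.com/"
)

record = parse_line(line)
print(record["client_ip"], record["timestamp"], record["uri_stem"], record["status"])
```

Parsed records like this are the input to later steps such as removing requests for images and style sheets, identifying visitors and splitting their hits into sessions.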

Web usage mining process

Bing Liu
