
Web Analytics: A Brief Tutorial

by
Dr. Robert J. Boncella
Professor of Information Systems & Technology
School of Business
Washburn University
Presented
March 2008
To
SAIS 2008

Introduction
Web analytics is the study of the behavior of
website visitors.
In a commercial context, web analytics refers
to the use of data collected from a website to
determine which aspects of the website achieve
the business objectives.
Tutorial Outline
Web Analytics: Context
Web Analytics: Technology & Terminology
Web Analytics: Tools and Case Studies

Context for Web Analytics


DSS - Decision Support System
A conceptual framework for a process of supporting managerial
decision-making, usually by modeling problems and employing
quantitative models for solution analysis

BI - Business Intelligence (subset of DSS)
An umbrella term that combines architectures, tools, databases,
applications, and methodologies

BA - Business Analytics (subset of BI)
The application of models directly to business data
Assists in making strategic decisions

WA - Web Analytics (subset of BA)
The application of business analytics activities to Web-based
processes, including e-commerce

Web Analytics - Details


Relevant Technology
Internet & TCP/IP
Client/Server Computing
HTTP (HyperText Transfer Protocol)
Server Log Files & Cookies
Web Bugs

Data Collection
The Clickstream
Server Log Files
Page Tagging

Data Analysis
Data Preparation
Pattern Discovery
Pattern Analysis

Client/Server Computing
[Figure: the client sends a request to the server, and the server returns a response to the client.]

Internet & TCP/IP


The Internet
The infrastructure that provides for the
delivery of data between computer based
processes

TCP/IP
The protocols that provide for reliable
delivery of data on the Internet

HTTP Protocol
Client sends a request to a server
Server sends a response to client
Connectionless
Client:
Opens connection to server
Sends request
Server:
Responds to request
Closes connection

Stateless
Client and server have no memory of prior connections
Server cannot distinguish one client's request from
another client's

Cookies
Used to work around the statelessness of the HTTP
protocol
Used to store and retrieve user-specific
information on the web
When an HTTP server responds to a request, it
may send additional information (state information)
that is stored by the client
When the client next makes a request to this server, the
client returns the cookie that contains its state
information
State information may be a client ID that can be
used as an index to a client data record on the
server
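
The cookie exchange described above can be sketched with Python's standard `http.cookies` module. This is a minimal illustration rather than a real web server: the cookie name `client_id`, the `CLIENT_RECORDS` dictionary, and the `handle_request` function are hypothetical names chosen for the example.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical server-side store: client ID -> client data record.
CLIENT_RECORDS = {}

def handle_request(cookie_header):
    """Return (client_id, set_cookie_value) for one request.

    cookie_header is the raw value of the request's Cookie header,
    or None when the client sent no cookie.
    """
    cookie = SimpleCookie(cookie_header or "")
    if "client_id" in cookie and cookie["client_id"].value in CLIENT_RECORDS:
        # Returning client: the cookie value indexes the stored record.
        client_id = cookie["client_id"].value
        CLIENT_RECORDS[client_id]["requests"] += 1
        return client_id, None
    # First request: create a record and ask the client to store its ID.
    client_id = uuid.uuid4().hex
    CLIENT_RECORDS[client_id] = {"requests": 1}
    response_cookie = SimpleCookie()
    response_cookie["client_id"] = client_id
    return client_id, response_cookie["client_id"].OutputString()

# First request carries no cookie, so the server issues one.
first_id, set_cookie = handle_request(None)
# The client stores the cookie and returns it on its next request.
second_id, no_cookie = handle_request(set_cookie)
```

Because the client returns the same ID on the second request, the server can tie both requests to one record even though HTTP itself keeps no state.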

Web Bug Process


[Figure: Web bug process. The client browser (My_Brwsr) requests Page_A.html from Server A, Page_B.html from Server B, and Page_C.html from Server C. Each returned page contains URLs, image sources, and a web bug image hosted at WBS.TRKSTRM.COM.]

For each page:
1. The browser renders the page and requests the web bug image from the web bug server (WBS), sending the Referer header and any cookie for TRKSTRM.com.
2. WBS responds with the web bug image, and sends a cookie to the client browser on the first request.
3. The user clicks on a URL, and the process repeats for the next page.

The WBS record for cookie My_Brwsr then reads: Pg A - Server A, Pg B - Server B, Pg C - Server C.
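
The bookkeeping a web bug server might do can be sketched as follows. This is an illustrative model only, with no real networking: the `TRAIL` store, the `serve_web_bug` function, the `browser-N` ID format, and the `server-a.example` style URLs are all assumptions made for the example.

```python
import itertools

# Hypothetical in-memory store on the tracking server:
# browser ID -> list of pages (Referer values) that loaded the web bug.
TRAIL = {}
_ids = itertools.count(1)

def serve_web_bug(referer, cookie_id=None):
    """Handle one request for the web bug image.

    referer   -- value of the Referer header (the page embedding the image)
    cookie_id -- browser ID from a previously set cookie, or None
    Returns the browser ID; on the first request the response would
    send this ID to the browser as a cookie.
    """
    if cookie_id is None:
        cookie_id = f"browser-{next(_ids)}"  # first request: issue a cookie
    TRAIL.setdefault(cookie_id, []).append(referer)
    return cookie_id

# The browser loads pages A, B, and C from three different servers;
# each page embeds the same web bug image.
bid = serve_web_bug("http://server-a.example/Page_A.html")
bid = serve_web_bug("http://server-b.example/Page_B.html", bid)
bid = serve_web_bug("http://server-c.example/Page_C.html", bid)
```

After the three requests, `TRAIL[bid]` holds the browser's path across all three servers, which is the cross-site record the web bug process is designed to build.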

Common Clickstream Data Sources


Server Log Files
Passive data collection
Normal part of the web browser/web server
transaction

Page Tagging
Active data collection
Often requires a third party (a vendor) to
implement

Server Log Files


Each time a client requests a resource, the server hosting
that resource may record the following in its log files:

The name & IP address of the client computer


The time of the request
The URL that was requested
The time it took to send the resource
If HTTP authentication is used, the username of
the client's user
Any errors that occurred
The referer link
The kind of web browser that was used

Server Log Files


Example
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

127.0.0.1 - remote host
frank - user name
[10/Oct/2000:13:55:36 -0700] - date & time
"GET /apache_pb.gif HTTP/1.0" - request
200 - status
2326 - bytes
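
The example entry follows the layout commonly called the Common Log Format, so its fields can be pulled apart with a regular expression. The sketch below is one possible parser, not a complete one: the `CLF_PATTERN` and `parse_clf` names are hypothetical, and real logs often carry extra fields (referer, browser) that this pattern ignores.

```python
import re
from datetime import datetime

# Layout assumed: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields into typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_clf(line)
```

Here `entry["user"]` is `"frank"`, `entry["status"]` is `200`, and `entry["bytes"]` is `2326`, matching the field breakdown above.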


Server Log Files


Technical issues for server log data
Data Preparation
Pageview Identification
User Identification
Session Identification
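
Session identification is often approached with an inactivity timeout. The sketch below shows that one heuristic, assuming a 30-minute timeout; the timeout value and the `split_sessions` function are assumptions for illustration and are not taken from this tutorial.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's request times into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (30 minutes is an assumed value).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # gap too long: start a new session
    return sessions

# Hypothetical request times for a single user.
requests = [
    datetime(2008, 3, 1, 9, 0),
    datetime(2008, 3, 1, 9, 10),
    datetime(2008, 3, 1, 11, 0),
]
sessions = split_sessions(requests)
```

With these inputs the first two requests fall into one session and the third, nearly two hours later, starts a second one.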


Page Tags as Data Source


Provided by Third Party - Vendor
Vendor Supplies Page Tags
Vendor Collects the Data
Vendor Analyzes the Data
Business Accesses the Data
Online, or
Through reports sent to the business


Web Data Abstractions


Abstractions concerning Web usage, Content,
and Structure
Establishes precise semantics for the concepts

Web site
Users or Visitors
User Sessions
Server Sessions or Visits
Pageviews
Clickstreams

Data Abstractions
Web Site - a collection of interlinked Web pages,
including a host page, residing at the same
network location.
User or Visitor - a principal using a client to
interactively retrieve and render resources or
resource manifestations; that is, an individual
accessing files from a Web server using a browser.
User Session - a delimited set of user clicks
across one or more Web servers.

Data Abstractions
Server Session or Visit - a collection of user
clicks to a single Web server during a user
session.
Pageview - the visual rendering of a Web page
in a specific environment at a specific point in
time. A pageview consists of several items
(frames, text, graphics, and scripts) that together
construct a single Web page.

Clickstream - a sequential series of pageview
requests made by a single user.

Web Data Abstractions (High Level)

Abstractions concerning Visitors
Establishes precise semantics for the concepts

Unique Visitor
Conversion Rate
Abandonment Rate
Attrition
Loyalty
Frequency
Recency

Data Abstractions
Unique Visitor
A unique visitor is counted when a human being uses
a web browser to visit a web site.
A visitor may be unique for different periods of time
(for example, a day, a week, or a month).
The individual is identified by a cookie in the visitor's
web browser.
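
The dependence of the unique-visitor count on the reporting period can be shown with a small sketch. The `visits` data and the `unique_visitors` function are hypothetical; visitor IDs stand in for cookie values.

```python
from datetime import date

# Hypothetical visit log: (visit date, visitor ID taken from the cookie).
visits = [
    (date(2008, 3, 1), "a"),
    (date(2008, 3, 1), "b"),
    (date(2008, 3, 2), "a"),
    (date(2008, 3, 2), "c"),
]

def unique_visitors(visits, start, end):
    """Count distinct cookie IDs seen between start and end, inclusive."""
    return len({visitor for day, visitor in visits if start <= day <= end})

day_one = unique_visitors(visits, date(2008, 3, 1), date(2008, 3, 1))
day_two = unique_visitors(visits, date(2008, 3, 2), date(2008, 3, 2))
both_days = unique_visitors(visits, date(2008, 3, 1), date(2008, 3, 2))
```

Each day has two unique visitors, but the two-day period has three rather than four, because visitor "a" is counted once per period. Daily unique counts therefore cannot simply be added to obtain a longer-period count.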


Data Abstractions
Conversion Rate
A conversion rate is the number of completers
divided by the number of starters for any online
activity that is more than one logical step in length
Starting and finishing any activity
Purchase
Download a research article
Etc.
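
As a small worked example of the definition above, the conversion rate can be computed directly from the two counts. The function name and the figures (200 starters, 30 completers) are made up for illustration.

```python
def conversion_rate(starters, completers):
    """Completers divided by starters for one multi-step activity."""
    if starters <= 0:
        raise ValueError("starters must be positive")
    return completers / starters

# Hypothetical figures: 200 visitors began checkout, 30 finished it.
rate = conversion_rate(starters=200, completers=30)
```

With these figures the rate comes to 15 percent.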


Data Abstractions
Abandonment Rate
The abandonment rate for any step in a multi-step
process is one minus the number of units that make it
to step n+1 divided by the number at step n
The formula is 1 - (units at step n+1) / (units at step n)
Consider a 10-step process to acquire a resource
How many quit after step 1, or 2, or 3, or 4, or ...?

Consider a 5-step process to acquire a resource
How many quit after step 1, or 2, or 3, or 4, or ...?
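
The per-step calculation can be sketched for a whole funnel at once. The `abandonment_rates` function and the step counts are hypothetical; each rate is one minus the count at step n+1 divided by the count at step n.

```python
def abandonment_rates(step_counts):
    """Return the abandonment rate after each step except the last.

    step_counts[i] is the number of units that reached step i+1.
    """
    return [
        1 - following / current
        for current, following in zip(step_counts, step_counts[1:])
    ]

# Hypothetical 5-step process: units reaching steps 1 through 5.
counts = [1000, 600, 450, 360, 324]
rates = abandonment_rates(counts)
```

For these counts the rates are roughly 40%, 25%, 20%, and 10%, which shows at a glance that the first step loses the largest share of visitors.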


Data Abstractions
Attrition
Attrition is a measurement of the people you have
successfully converted but are unable to retain to
convert again
Consider the eBay web site vs. a web site for technical
information


Data Abstractions
Loyalty
Loyalty is a measure of the number of visits any
visitor is likely to make over their lifetime as a visitor
Reported as number of visits per visitor
For example: 100 visitors made 3 visits each, 87 visitors made 4 visits each, etc.
Avoid double counting (i.e., do not count the 87 in with the
100)
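
One way to produce such a report without double counting is to bucket each visitor by their exact number of visits. The sketch below assumes a hypothetical mapping from visitor ID to visit count; the names are illustrative.

```python
from collections import Counter

def loyalty_distribution(visits_per_visitor):
    """Map a visit count to the number of visitors with exactly that count."""
    return Counter(visits_per_visitor.values())

# Hypothetical data: visitor ID -> total visits so far.
visits_per_visitor = {"a": 3, "b": 3, "c": 4, "d": 1}
distribution = loyalty_distribution(visits_per_visitor)
```

Each visitor lands in exactly one bucket, so a visitor with 4 visits is not also counted among those with 3.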


Data Abstractions
Frequency
Frequency is a measure of the activity a visitor
generates on a web site in terms of time between
visits
Measured in terms of days between visits
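
Days between visits can be derived from a visitor's visit dates, as in this minimal sketch. The `days_between_visits` function and the dates are hypothetical.

```python
from datetime import date

def days_between_visits(visit_dates):
    """Return the gaps, in days, between consecutive visits."""
    ordered = sorted(visit_dates)
    return [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]

# Hypothetical visit dates for one visitor.
gaps = days_between_visits(
    [date(2008, 3, 1), date(2008, 3, 4), date(2008, 3, 11)]
)
average_gap = sum(gaps) / len(gaps)
```

This visitor's gaps are 3 and 7 days, an average of 5 days between visits.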


Data Abstractions
Recency
Recency is the number of days since the last visit (or
purchase)
Reported as the number of visitors who returned after
n days.
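
A recency report of this kind can be sketched by counting returning visitors by the number of days since their previous visit. The `recency_report` function, the visitor IDs, and the dates are assumptions for illustration.

```python
from collections import Counter
from datetime import date

def recency_report(previous_visit, today):
    """Map n days -> number of visitors who returned today after n days.

    previous_visit maps each returning visitor's ID to the date of
    the visit before today's.
    """
    return Counter((today - prev).days for prev in previous_visit.values())

today = date(2008, 3, 15)
previous_visit = {
    "a": date(2008, 3, 14),
    "b": date(2008, 3, 14),
    "c": date(2008, 3, 8),
}
report = recency_report(previous_visit, today)
```

Here two visitors returned after 1 day and one returned after 7 days.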


Pyramid Model of Web Analytics Data

[Figure: pyramid with levels, from top to bottom: Uniquely Identified Visitors, Unique Visitors, Visits, Page Views, Hits. Side label: Increasing Value of Data. Base label: Volume of Available Data.]

Web Usage Mining


Web usage mining applies statistical and data
mining techniques to the processed server log
data in order to discover useful patterns
Data mining methods and algorithms that have
been adapted for the Web domain include:

Association rules
Sequential pattern discovery
Clustering
Classification
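
To make the first of these methods concrete, the sketch below derives simple two-page association rules from session data using support and confidence. It is a toy illustration, not a full algorithm such as Apriori: the `pair_rules` function, the 0.5 support threshold, and the session contents are all assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules "a implies b".

    support    = share of sessions containing both pages
    confidence = share of sessions containing a that also contain b
    """
    total = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = sorted(set(pages))          # count each page once per session
        page_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / total
        if support >= min_support:
            rules[(a, b)] = (support, count / page_counts[a])
            rules[(b, a)] = (support, count / page_counts[b])
    return rules

# Hypothetical sessions: the pages viewed in each visit.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sessions)
```

In this data every session that includes `/cart` also includes `/products`, so the rule from `/cart` to `/products` has confidence 1.0, while the reverse rule is weaker because some `/products` sessions never reach the cart.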

Web Usage Data Mining


After discovering patterns from usage data, further
analysis has to be conducted.
Common ways of analyzing such patterns:
Using a query mechanism on a database where the
results are stored
Loading the results into a data cube and then
performing OLAP operations
Using visualization techniques for easier
interpretation of the results

Using these results together with content and
structure information about the Web site, useful
knowledge can be extracted for modifying the site
according to the correlation between user and
content groups.

Web Analytics:
Tools and Case Studies
Tools
VisiStat - www.visistat.com

Web Analytics Case Studies

Communications Provider - TuVox.com


Online Retailer - TicketsByInternet.com
Winery & Entertainment Venue - The Mountain Winery
Non-Profit Organization - SFBallet.org
Public Relations & Media Agency - BLASTmedia
Technology Provider for Real Estate Professionals - Pullan.com
Real Estate Agency - Intero Real Estate
Start-Up Online Business - GuruPrint.com
