
Web Analysis

Abstract

Companies maintaining web sites today have an incredible wealth of customer information available to them in the form of implicit clicks, actions and behaviors. Web analytics tools are widely used to track these metrics; 69% of companies track conversions resulting from their site visitors.
Web analytics does not focus purely on the amount of traffic, which might only be helpful in evaluating your bandwidth usage and server's capabilities. Instead, it focuses on in-depth comparison of available visitor data, referral data, and site navigation patterns, while still being able to tell us the amount of traffic we receive over any specified period of time.
This paper focuses on open source web analytics tools and on a comparison between different web analytics tools. The comparison might help users select the best tool according to their requirements.

Key words: Visitors, Page visits, Visits, traffic resource, hits, logs

I. Introduction:

According to Wikipedia, web analytics is the measurement, collection, analysis and reporting of internet data for purposes of understanding and optimizing web usage.

Web analytics provides data on the number of visitors, page views, etc. to gauge traffic and popularity trends, which helps in doing market research.

There are two categories of web analytics: off-site and on-site web analytics.
Off-site web analytics refers to web measurement and analysis irrespective of whether you own or maintain a website. It includes the measurement of a website's potential audience, share of voice, and buzz that is happening on the Internet as a whole.

On-site web analytics measures a visitor's journey once on your website. This includes its drivers and conversions; for example, which landing pages encourage people to make a purchase. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators, and used to improve a web site or marketing campaign's audience response.

II. Basic Units of Measure

While generating and discussing web metrics there are several units that are commonly referred to. It is important to know the distinction between these so that you know what you are looking at. Web analytics allows you to go much deeper, without losing this basic information.

A. Hits and impression: In most web analytics discussions a 'hit' is defined as a single request for any item on your website. This can include images, animations, audio, video, downloads, PDF or Word documents or anything else that you allow visitors to access.
B. Page views: The number of times a page (an analyst-definable unit of content) was viewed.
C. Graphics hits: Other types of hits are important too. "Graphics hits" are the number of requests for images, animations or other graphics.
D. Downloads: These can be programs, archives, zip files, or PDF documents that users download from your site.
E. Errors: One of the great features of Summary is that these metrics are counted in the context of many of the reports.
F. Bytes: The count of bytes in a period is very useful for tracking the bandwidth usage on your network. The byte count does accurately reflect the amount of data requested from your server.

III. Advanced Units of Measure

A. Users: The units above are metrics that are helpful in relating content to user experience, but you are probably wondering just how many users experienced your content. The most accurate way of counting users is to require them to log in.

B. Unique hosts: The number of inferred individual people (filtered for spiders and robots) within a designated reporting timeframe, with activity consisting of one or more visits to a site. Each individual is counted only once in the unique visitor measure for the reporting period.
C. Visits and sessions: A visit is an interaction, by an individual, with a website consisting of one or more requests for an analyst-definable unit of content (i.e. a "page view").
D. Visit tracking with cookies: One of the most common techniques for improving visit metrics and request data in general is to configure your web server to send out a 'session cookie'. A session cookie has no expiration date set, so the browser deletes it as soon as the session completes (usually when the user closes the browser). Session cookies are therefore unique to each visit.
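The basic and advanced units of measure described above can be illustrated with a minimal Python sketch. It assumes the log records have already been parsed into fields; the `Request` type, its field names, and the file-extension lists that decide what counts as a page, a graphic, or a download are all hypothetical choices made for illustration, not part of any particular tool. Visits are counted as distinct session cookie values, following the session cookie technique, and unique hosts are counted without any spider or robot filtering.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One already-parsed log record (field names are assumptions)."""
    host: str        # client IP address or domain name
    path: str        # requested URL path
    status: int      # HTTP return code
    bytes_sent: int  # bytes sent by the server to the client
    session: str     # session cookie value, "" if none was sent


# Assumed, analyst-defined classification of requested items.
PAGE_EXT = (".html", ".htm")
GRAPHIC_EXT = (".gif", ".jpg", ".png")
DOWNLOAD_EXT = (".zip", ".pdf", ".exe")


def summarize(requests):
    """Compute the basic units of measure for a list of Request records."""
    ok = [r for r in requests if r.status < 400]  # successful requests only
    return {
        "hits": len(requests),  # every request is a hit
        "page_views": sum(r.path.lower().endswith(PAGE_EXT) for r in ok),
        "graphics_hits": sum(r.path.lower().endswith(GRAPHIC_EXT) for r in ok),
        "downloads": sum(r.path.lower().endswith(DOWNLOAD_EXT) for r in ok),
        "errors": sum(r.status >= 400 for r in requests),
        "bytes": sum(r.bytes_sent for r in requests),
        "unique_hosts": len({r.host for r in requests}),
        # One session cookie value per visit, so distinct values = visits.
        "visits": len({r.session for r in requests if r.session}),
    }


if __name__ == "__main__":
    sample = [
        Request("65.70.31.3", "/index.html", 200, 4199, "s1"),
        Request("65.70.31.3", "/logo.gif", 200, 512, "s1"),
        Request("10.0.0.7", "/missing.html", 404, 0, "s2"),
    ]
    print(summarize(sample))
```

Counting page views only for successful requests is a design choice of this sketch; an analyst could equally decide to count every request for a page.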
IV. Collecting Web Activity Data:

There are two main technological approaches to collecting the data. The first method, log file analysis, reads the log files in which the web server records all its transactions. The second method, page tagging or client-side tagging, uses JavaScript on each page to notify a third-party server when a page is rendered by a web browser. Both collect data that can be processed to produce web traffic reports.

A. Using Web Server Logs

Traditionally, web site analysis has relied on web server log files to provide insightful data on web activity. Web servers record some of their transactions in a log file.

Each time a visitor attempts to view something on your web site, download a file from your site, or in some other way requests something from your site, the web server, which holds and delivers the content for your site, adds a record to a log file. This record contains some basic information about the request the visitor made.

Some of this information is known directly by the server, such as the time, date, what is requested, and the size of what is requested. Other information is obtained through a cooperative and heavily standardized relationship between the browser and the server, in which the visitor's browser is programmed to send certain information, such as the IP address of the computer it is running on and specifics about the browser version and operating system of the visitor's computer.

Most web server log files are text files that contain the following pieces of information:

The date and time that the visitor asked for something from the web server; the IP address (Internet Protocol address) or domain name of the visitor's computer; the web server's name; the web server's IP address; the method used in the request (GET, POST, HEAD); the URL of the requested content; any query parameter; the return code; the number of bytes sent by the web server to the client; the number of bytes sent by the client to the web server; the amount of time (in milliseconds) to fulfill the request; the port on the client machine used to send requests and receive the requested data; the client machine's browser type and version number (also known as "the user agent"); cookie information, if the client machine has a cookie for your site; and referrer information, if the visitor was sent to your site from an external site.

Each log entry appears as information on one very long line in the file. For example:

2002-09-16 00:01:58 65.70.31.3 W3SVC82 HERC 209.224.1.170 GET /products/thingamajigger.html 200 4199 363 266 80 HTTP/1.0 Mozilla/4.72+[en]C-SBI-NC472++(Windows+NT+5.0;+U) WEBTRENDS_ID=192.168.32.180-3425858080.29527895 http://www.awebsite.com/thingamajiggerad.html

Your log file can vary from this example, because you can configure your server to include the information you want. Also, the information available may vary according to the brand of server software (for example, IIS, Sun Java System, or Apache).

B. Using Client-Side Tagging:

A second and increasingly popular method of collecting web activity data is through the use of client-side tagging. A tag is a small segment of code, called a script, which contains instructions that you can put on the web page you want to track and analyze.

Client-side tagging works like this. When a visitor makes a request for a page that is being tracked with a tag, one of two things happens: either a web server plug-in automatically embeds a tiny script in the page as it is delivered to the visitor, or the web site manager manually embeds a small script in any page that he or she intends to track. Either way, the page delivered to the client contains some JavaScript code.

The key to data collection is an HTTP request for a transparent 1 pixel by 1 pixel image. In reality, the image request is just a transport vehicle for the variable, which contains the visit information. The information in the variable gets transported to the data collection server in the request. At the data collection server, the information in the variable is used to add a new record to a web activity file that you can use for web site analysis.

The data collection for the tagging method can be hosted externally, or you can host it on site. Typically, if you want deeper analysis capabilities, you would handle the data collection internally to keep the data on hand. Most external hosting companies do not hold your data for an extended period; they simply offer you standard reports on summary web activity data.

The basic steps of the tagging process:

1. A visitor wants to view a page on your site. This initiates a page request to your web server.
2. Your server sends the page to the visitor, and this page contains a JavaScript tag.
3. The tag triggers a request for a GIF with parameters attached.
4. The GIF file is sent to the visitor.
5. The request with the parameters is analyzed.

The tags put information into a web activity file for analysis. A typical web activity file record might look like this:
2001-03-04 00:08:18 proxy7.hotmail.com W3SVC3 web1 192.168.1.1 GET /ads/default.aspredir=products&ad=http%3A//www.boatdealer.com&WT.mc_n=Boat%20Dealer%20Campaign&WT.mc_t=Banner&WT.mc_s=3/3/2001&WT.mc_c=60&WT.ad=P-32,%20P-58,%20P-72%20Options%20Offer&WT.sv=Web%20Server%201&WT.ti=Advertising%20Redirect&WT.tz=420&WT.ul=en&WT.cd=32&WT.sr=1024x768&WT.jo=Yes&WT.js=Yes&WT.co=Yes 200 0 1 75 1 80 HTTP/1.1 Microsoft+Internet+Explorer/4.40.305beta+(Windows+95) WEBTRENDS_ID=192.168.16.148-1615253808.29527727 http://www.boatdealer.com/dealers/pacific/dealerlist.htm

V. Tools available for web analysis

There are plenty of web analytics applications available, including well-known ones such as Google Analytics and Crazy Egg, and remote-site services such as Alexa and Compete.

I have used the following tools for web analysis.

Open source tools:

For this case study I tried to install four widely used open source tools: Analog, AWStats, W3Perl, and Webalizer. All of these tools use log files for analysis.

The details of these tools are as follows:

TABLE I

Open source tools

Name        Platform   Supported databases   Latest stable release   License
Analog      C          Log file-based        6.0                     GPL
AWStats     Perl       Log file-based        6.9                     GPL
W3Perl      Perl       Log file-based        3.09                    GPL
Webalizer   C          Log file-based        2.20-01                 GPL

Performance of the tools mentioned above:

A. Analog:

Analog is a program which analyses log files from WWW servers. It works on almost any operating system. It is designed to be fast and to produce accurate and attractive statistics, and combined with Report Magic, you can generate even prettier reports. It is free software and comes with no warranty.

Here is a short summary of how to make Analog ready for analysis of your website:

1. Edit analog.cfg and make changes accordingly.
2. Run analog.exe (a DOS window flashes up).
3. Read Report.html, in which the report is displayed.

You can configure Analog by putting commands in the configuration file, analog.cfg. One command you will need straight away is

LOGFILE logfilename # to set where your logfile lives

The log file must be stored locally.

I configured analog.cfg with the host name "http://oscex-en.url.trendmicro.com", the path of the folder where the log files are stored, and the name of the file in which the report is stored, i.e. report_oscex.html.

Then analog.exe was run, and it ran successfully. The report_oscex.html file was also generated, but the report was not as expected. Although there were many entries for "http://oscex-en.url.trendmicro.com" in the log file given as input, the generated report showed the number of hits as 0. The likely reason is that the configuration file was not configured properly, because the log file available for the analysis was not in a standard format that Analog knows. We cannot give a "data type" in the log format, and it is very difficult to configure the format of the log file. If the log file is available in a standard format, then this tool can be useful. One needs to study the configuration file properly and understand it thoroughly in order to get a proper report.

B. AWStats:

AWStats is short for Advanced Web Statistics. AWStats is a powerful log analyzer which creates advanced web, ftp, mail and streaming server statistics reports based on the rich data contained in server logs. Data is graphically presented in easy to read web pages.
Requirements:
You must have access to the server logs for the reporting you want to perform (web/ftp/mail).
You must be able to run Perl scripts (.pl files) from the command line and/or as a CGI. If not, you can solve this by downloading the latest Perl version at ActivePerl (Win32) or Perl.com (Unix/Linux/Other).
Even a user who knows Perl may not know how to use ActivePerl. My use of this tool stalled because I didn't know how to use ActivePerl.

C. W3Perl:

W3Perl is an open source log file analyzer written in Perl. It
can deliver output in an HTML file with graphs and a sortable
text-data table. This program comes with a graphical web
interface, and we can retrieve anywhere from a single-page
report of statistics to a multi-page collection of hundreds of
reports. These reports can be scheduled for hourly, daily,
weekly, and monthly processing.

W3Perl can parse Web, FTP, and Mail log files, and can do
page tagging if you don't have log file access. Cross-platform
support allows you to install W3Perl on any machine that runs
Perl, and Windows users have a special installation for
different server types, so it's really easy for them. But after
installation, configuration is very difficult. Although a configuration interface is provided, we may not be able to understand the terms that have to be filled in while configuring the tool.

D. Webalizer:

The Webalizer is a web server log file analysis program which produces usage statistics in HTML format for viewing with a browser. The results are presented in both columnar and graphical format, which facilitates interpretation. Yearly, monthly, daily and hourly usage statistics are presented, along with the ability to display usage by site, URL, referrer, user agent (browser) and country (user agent and referrer are only available if your web server produces combined log format files).

Free Tools

E. Google Analytics:

Google Analytics is a free website tracker from Google. Using Google Analytics you can track your visitors and other related details. Google Analytics is an enterprise-class web analytics solution that gives you rich insights into your website traffic and marketing effectiveness. Powerful, flexible and easy-to-use features let us see and analyze our traffic data in an entirely new way.

I used Google Analytics for tracking the website http://www.afterhsc.com. The registration process is very simple, and the user need not have programming knowledge. We just have to enter the name of the site we want to track. After successful registration, a JavaScript snippet is generated automatically. We need to insert that snippet in the code of the pages of our site, before the closing </head> tag, and Google Analytics will then automatically track the website. Once this is done, the report is generated.

The generated report contains details such as page views, visits, pages/visit, bounce rate, average time on site, and % new visits.

There are many options available for customizing the report.

1. Google Analytics and MS-Excel:

Excellent Analytics is a simple Excel plug-in that lets you import web analytics data from Google Analytics into a spreadsheet. It is an open source project and 100% free to download and use for individuals and businesses.

[Figure: Excel Analytics, showing the fetched data in Excel]

Excellent Analytics functionality:

• Build queries with all dimensions and metrics available in Google Analytics
• Apply filters to create advanced queries
• All queries are stored in the spreadsheet

Benefits of reporting and analyzing Google Analytics data in Microsoft Excel:

• Get one less tool to keep track of
• Use a familiar interface
• Combine data from multiple data sources
• Use Excel formulas, charts, and pivot tables
• Define and calculate customized KPIs
• Build dashboards just the way you like them
• Share workbooks with other Excel users
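The Google Analytics report metrics mentioned in this section (page views, visits, pages/visit, bounce rate, average time on site, and % new visits) are simple aggregates over visit data. The following minimal Python sketch shows how such KPIs could be calculated from exported data. The input format (one dictionary per visit with the keys `pages`, `seconds`, and `new`) is a hypothetical layout chosen for illustration, and the bounce rate is taken under the commonly used assumption that a bounce is a visit with exactly one page view; Google Analytics itself may compute these values differently.

```python
def visit_kpis(visits):
    """Compute summary KPIs from a list of visit dictionaries.

    Each visit is assumed (hypothetically) to look like
    {"pages": int, "seconds": float, "new": bool}.
    """
    n = len(visits)
    if n == 0:
        # Avoid division by zero when there is no traffic in the period.
        return {"visits": 0, "page_views": 0, "pages_per_visit": 0.0,
                "bounce_rate": 0.0, "avg_time_on_site": 0.0,
                "pct_new_visits": 0.0}
    page_views = sum(v["pages"] for v in visits)
    return {
        "visits": n,
        "page_views": page_views,
        "pages_per_visit": page_views / n,
        # Assumed definition: share of visits that viewed a single page.
        "bounce_rate": sum(v["pages"] == 1 for v in visits) / n,
        "avg_time_on_site": sum(v["seconds"] for v in visits) / n,
        "pct_new_visits": 100 * sum(bool(v["new"]) for v in visits) / n,
    }


if __name__ == "__main__":
    example = [
        {"pages": 1, "seconds": 0, "new": True},
        {"pages": 5, "seconds": 300, "new": False},
    ]
    print(visit_kpis(example))
```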

VI. Conclusion:

The tools used above can be compared on the basis of the following criteria.
Tools which use log files for analysis:

TABLE II

Comparison of tools for web analytics

Features                             AWStats            Analog        Webalizer
Available on all platforms           Yes                Yes           Yes
Works with Apache combined (XLF)     Yes                Yes           Yes
Works with IIS log format (W3C)      Yes                Yes           Need a patch
Works with personalized log format   Yes                Yes           No
Analyze Web/Ftp/Mail log files       Yes/yes/yes        Yes/no/no     Yes/no/no
Report unique "human" visitors       Yes                No            No
Report session duration              Yes                No            No
Report countries                     From IP location   Domain name   Domain name
Report entry/exit page               Yes/yes            No/no         Yes/yes
Link for download                    http://awstats.sourceforge.net/   http://www.analog.cx/   http://www.webalizer.org/download.html
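Since the purpose of the comparison is to help a user select a tool according to requirements, the table can also be treated as data. The Python sketch below encodes a subset of the rows of Table II and returns the tools that satisfy every required feature. The feature keys (such as `custom_log_format`) are hypothetical names introduced for this illustration, and Webalizer's "Need a patch" entry for the IIS log format is treated as not supported out of the box.

```python
# Subset of Table II; feature key names are assumptions for this sketch.
TOOLS = {
    "AWStats": {
        "iis_w3c": True, "custom_log_format": True, "ftp_mail_logs": True,
        "unique_visitors": True, "session_duration": True,
        "entry_exit_pages": True,
    },
    "Analog": {
        "iis_w3c": True, "custom_log_format": True, "ftp_mail_logs": False,
        "unique_visitors": False, "session_duration": False,
        "entry_exit_pages": False,
    },
    "Webalizer": {
        "iis_w3c": False,  # "Need a patch" in Table II
        "custom_log_format": False, "ftp_mail_logs": False,
        "unique_visitors": False, "session_duration": False,
        "entry_exit_pages": True,
    },
}


def matching_tools(required):
    """Return the sorted names of tools that support every required feature."""
    return sorted(
        name for name, features in TOOLS.items()
        if all(features.get(feature, False) for feature in required)
    )


if __name__ == "__main__":
    print(matching_tools(["custom_log_format"]))
    print(matching_tools(["session_duration", "entry_exit_pages"]))
```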

Google Analytics, which uses JavaScript for analysis, has the following advantages over the tools used above:

1. Excellent graphical representation of data
2. Provides for marketing-based analysis
3. No knowledge of programming is required
4. Able to find out how your visitors locate your website
5. Able to identify which pages and links your visitors click the most
6. Visitor segmentation
7. Analyze and compare your visitors' keyword usage
8. Beta version for Intelligence
9. It has all the features you need to improve online marketing performance
10. It is easy to learn and easy to use
11. It is scalable; as your site grows, Google Analytics will grow with it
12. It can be integrated with your Google AdWords account
13. It has the ability to track campaigns from multiple sources

Latest Features

1. Advanced Segmentation. Isolate and analyze subsets of your traffic.
2. Motion Charts. Motion Charts add sophisticated multi-dimensional analysis to most Google Analytics reports.
3. Custom Reports. Create, save, and edit custom reports that present the information you want to see organized in the way you want to see it.
4. Benchmarking. Find out whether your site usage metrics underperform or outperform those of your industry vertical.
5. GeoTargeting. Find out where your visitors come from and identify your most lucrative geographic markets.

References:

[1] http://www.wikipedia.com
[2] http://awstats.sourceforge.net/
[3] http://www.analog.cx/
[4] http://www.webalizer.org/download.html
[5] http://www.webanalytics20.com/
[6] http://www.kaushik.net/avinash/2009/09/web-analytics-books.html
