
A STUDY ON WEB ANALYTICS

WITH REFERENCE TO SELECT SPORTS WEBSITES

A project report submitted to GITAM Institute of Management, GITAM University,
in partial fulfillment for the award of the degree of

BACHELOR OF BUSINESS ADMINISTRATION


(BUSINESS ANALYTICS)

Submitted by
Y. Bhanu Prakash,
Regd.No: 1214415127

Under the guidance of


Dr. D. Vijaya Geeta
Associate Professor

GITAM INSTITUTE OF MANAGEMENT


GITAM UNIVERSITY
VISAKHAPATNAM
2015-2018

1 | Page
Declaration By Student
I, Y. Bhanu Prakash, Regd. No: 1214415127, hereby declare that the project titled "A Study
on Web Analytics with Reference to Select Sports Websites", submitted to GITAM Institute
of Management, GITAM University, is an original work done by me and is not being
submitted to any other University for the award of any degree or diploma.

Y. Bhanu Prakash
Regd.No: 1214415127

Certificate By Guide
This is to certify that the project titled "A Study on Web Analytics with Reference to Select
Sports Websites" is a project work undertaken by Y. Bhanu Prakash, Regd. No: 1214415127,
under my guidance.

Place: Visakhapatnam
Date:

Dr. D. Vijaya Geeta
Associate Professor

ACKNOWLEDGEMENT

It is a genuine pleasure to express my deep sense of thanks and gratitude to my principal,
Prof. P. Sheela, GITAM Institute of Management, GITAM University, Visakhapatnam,
Andhra Pradesh, for her continuous support and guidance throughout my project. Her
dedication, keen interest and, above all, her overwhelming willingness to help her students
have been mainly responsible for the completion of my work. Her timely advice,
meticulous scrutiny and scholarly guidance have helped me to a very great extent to
accomplish my task.

I also take this moment to thank my guide, Dr. D. Vijaya Geeta, Associate Professor,
GITAM Institute of Management, GITAM University, Visakhapatnam, Andhra Pradesh.
Her prompt inspiration, timely and kind suggestions, enthusiasm and dynamism
have enabled me to complete my project. I perceive this opportunity as a big milestone
in my career development. I will strive to use the skills and knowledge I have gained in the
best possible way, and I will continue to work on improving them in order to attain my
desired career objectives. I hope to continue cooperating with all of you in the future.

Y. Bhanu Prakash

CONTENTS Pg. No.

CHAPTER 1:
INTRODUCTION TO ANALYTICS
DATA ANALYTICS 8-16
WEB ANALYTICS 17-19

CHAPTER 2:
PROFILE OF ALEXA.COM
ALEXA INTERNET 21-23

CHAPTER 3:
METHODOLOGY
RESEARCH PROBLEM 25
OBJECTIVES OF THE STUDY 25
METHODOLOGY OF THE STUDY 25-26
SCOPE OF THE STUDY 26
LIMITATIONS OF THE STUDY 26

CHAPTER 4:
ANALYSIS AND DATA INTERPRETATION 28-40

CHAPTER 5:
OBSERVATIONS AND CONCLUSION
OBSERVATIONS 42
CONCLUSION 43
BIBLIOGRAPHY 44
ANNEXURE 45-46

LIST OF TABLES
Table No.   Title                                                                      Page No.
1           List Of Top 10 Websites (Global)                                           29
2           List Of Top 10 Websites In India                                           30
3           List Of Summary Statistics Of The 50 Selected Sports (Cricket) Websites    31
4           Count Of The Websites With Reference To Their Summary Statistics           32

LIST OF CHARTS
Chart No.   Title                                                                      Page No.
1           Percentage Of Visits By Indian User For Top 10 Websites (Global)           29
2           Percentage Of Visits By Indian User For Top 10 Websites In India           30

LIST OF FIGURES
Figure No.  Title                                                                      Page No.
1           The Websites Which Has Highest Trend During April-July Months              33-35
2           The Websites Which Has Highest Trend During August-October Months          36-38
3           The Websites Which Has Highest Trend During October-December Months        39
4           The Websites Which Has Highest Trend During January-March Months           40

CHAPTER I
INTRODUCTION TO ANALYTICS

Data Analytics

1. Introduction

Imagine a world without data storage; a place where every detail about a person or
organization, every transaction performed, or every aspect which can be documented is
lost directly after use. Organizations would thus lose the ability to extract valuable
information and knowledge, perform detailed analyses, as well as provide new
opportunities and advantages. Data is an essential part of our lives, and the ability to store
and access such data has become a crucial task which we cannot live without. Anything
ranging from customer names and addresses, to products available, to purchases made, to
employees hired, etc. has become essential for day to day continuity. Data is the building
block upon which any organization thrives.

2. Big Data

The term "Big Data" has recently been applied to datasets that grow so large that they
become awkward to work with using traditional on-hand database management tools.
They are data sets whose size is beyond the ability of commonly used software tools and
storage systems to capture, store, manage, as well as process the data within a tolerable
elapsed time. Big data also refers to databases which are measured in terabytes and above,
and are too complex and large to be effectively used on conventional systems.

Big data sizes are a constantly moving target, currently ranging from a few dozen
terabytes to many petabytes of data in a single data set. Consequently, some of the
difficulties related to big data include capture, storage, search, sharing, analytics, and
visualization. Today, enterprises are exploring large volumes of highly detailed data so as to
discover facts they did not know before. Business benefit can commonly be derived from
analyzing larger and more complex data sets that require real-time or near-real-time
capabilities; however, this leads to a need for new data architectures, analytical methods,
and tools. In this section, we will discuss the characteristics of big data as well as the issues
surrounding the storage and analysis of such data.

2.1. Big Data Characteristics

Big data is data whose scale, distribution, diversity, and/or timeliness require the use of
new technical architectures, analytics, and tools in order to enable insights that unlock
new sources of business value. Big data is characterized by three main features: volume,
variety, and velocity. The volume of the data is its size, and how enormous it is. Velocity
refers to the rate with which data is changing, or how often it is created. Finally, variety
includes the different formats and types of data, as well as the different kinds of uses and
ways of analyzing the data.

2.2. Importance of Managing Big Data

There are five broad ways in which using big data can create value. First, big data
can unlock significant value by making information transparent and usable at a much
higher frequency. Second, as organizations create and store more and more
transactional data in digital form, they can collect more accurate and detailed
performance information on everything from product inventories to sick days. This can
expose variability in the data and boost performance. Third, big data
allows a narrower segmentation of customers and therefore much more precisely tailored
products or services to meet their needs and requirements. Fourth, sophisticated
analytics performed on big data can substantially improve decision making. Finally, big
data can also be used to improve the development of the next generation of products and
services. For example, manufacturers are currently using data obtained from sensors
embedded in products to create innovative after-sales service offerings such as
proactive maintenance, that is, preventive measures that take place before a failure
occurs or is even noticed by the customer.

Nowadays, along with the increasing ubiquity of technology comes an increase in the
amount of electronic data. Only a few years ago, corporate databases tended to be
measured in the range of tens to hundreds of gigabytes. Now, however, multi-terabyte
(TB) or even petabyte (PB) databases have become normal. According to Longbottom,
the World Data Center for Climate (WDCC) stores over 6PB of data overall and the
National Energy Research Scientific Computing Center (NERSC) has over 2.8PB of
available data around atomic energy research, physics projects and so on. These are only a
couple of examples of the enormous amounts of data which must be dealt with nowadays.

Furthermore, even companies such as Amazon are running with databases in the tens of
terabytes, and companies which would not be expected to have to worry about such
massive systems are dealing with databases with sizes of hundreds of terabytes.
Additionally, other companies with large databases in place include telecom companies
and service providers, as well as social media sites. For telecom companies, just dealing
with log files of all the events happening and call logs can easily build up database sizes.
Moreover, social media sites, even those that are primarily text, such as Twitter or
Facebook, have big enough problems; and sites such as YouTube have to deal with
massively expanding datasets. With such increasing amounts of big data, there arises an
essential need to be able to analyze the datasets. Thus, big data analytics will be discussed
in the subsequent section.

3. Big Data Analytics

Big data analytics is where advanced analytic techniques operate on big data sets.
Analytics based on large data samples reveals and leverages business change. However,
the larger the set of data, the more difficult it becomes to manage. Sophisticated analytics
can substantially improve decision making, minimize risks, and unearth valuable insights
from the data that would otherwise remain hidden. Sometimes decisions do not
necessarily need to be automated, but rather augmented by analyzing huge, entire datasets
using big data techniques and technologies instead of just smaller samples that individuals
with spreadsheets can handle and understand. Therefore, decision making may never be
the same. Some organizations are already making better decisions by analyzing entire
datasets from customers, employees, or even sensors embedded in products. In this
section, we will discuss the data analytics lifecycle, followed by some advanced data
analytics methods, as well as some possible tools and methods for big data analytics in
particular.

3.1. Advanced Data Analytics Methods

With the evolution of technology and the increased volumes of data flowing in and out
of organizations daily, there has arisen a need for faster and more efficient ways of
analyzing such data. Having piles of data on hand is no longer enough to make efficient
decisions at the right time. The acquired data must not only be accurate, consistent, and
sufficient to base decisions upon, but it must also be integrated and subject-oriented,
as well as non-volatile and time-variant. New tools and algorithms have
been designed to aid decision makers in automatically filtering and analyzing these
diverse pools of data.

Data analytics is the process of applying algorithms in order to analyze sets of data and
extract previously unknown, useful, valid, and hidden patterns and information from them,
as well as to detect important relationships among the stored variables. Analytics has
therefore had a significant impact on research and technologies, since decision makers
have become more and more interested in learning from previous data and thus gaining
competitive advantage.

Nowadays, people do not just want to collect data; they want to understand the meaning
and importance of the data, and use it to aid them in making decisions. Data analytics
have gained a great amount of interest from organizations throughout the years, and have
been used for many diverse applications. Some of the applications of data analytics
include science, such as particle physics, remote sensing, and bioinformatics, while other
applications focus on commerce, such as customer relationship management, consumer
finance, and fraud detection.

In this section, we will take a look at some of the most common data analytics methods,
how they can be applied, and which algorithms are frequently used. Three different data
analytics approaches will be discussed: association rules, clustering, and decision trees.

3.2. Association Rules

Association rules are one of the most popular data analytics tasks for discovering
interesting relations between variables in large databases. It is an approach for pattern
detection which finds the most common combinations of categorical variables. Using
association rules shows relationships between data items by identifying patterns of their
co-occurrence. Since so many association rules can be derived from even a tiny
dataset, the interest in such rules is restricted to those that apply to a reasonably large
number of instances and have a reasonably high accuracy on the instances to which they
apply.

Association rule analytics discovers interesting correlations between attributes of a
database by using two measures, support and confidence. Support is the probability that
two different attributes occur together in a single event, or the frequency of co-occurrence,
while confidence is the probability that when one attribute occurs, the other will also
occur in the same event. Association rules are normally used in business applications to
determine the items which are usually purchased together. An example of an association
rule would be the statement that people who buy cars also buy CDs 80% of the time,
written as Car → CD. In this case the two attributes being associated are the car and the
CD, the confidence value is 80%, and the support value is how often in the
database both a car and a CD were bought together. If a rule passes the minimum support
threshold, it is considered a frequent rule, while rules which pass both the minimum support
and minimum confidence thresholds are considered strong rules.
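
The two measures can be illustrated with a minimal Python sketch. The transaction data and the helper names `support` and `confidence` are hypothetical, chosen only so that the Car → CD rule comes out at 80% confidence as in the example above; they are not taken from any particular library.

```python
# Toy purchase data (hypothetical); each set is one transaction.
transactions = [
    {"car", "cd"},
    {"car", "cd", "insurance"},
    {"car", "cd"},
    {"car", "cd", "insurance"},
    {"car"},
    {"cd"},
    {"insurance"},
    {"cd", "insurance"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


rule_support = support({"car", "cd"}, transactions)
rule_confidence = confidence({"car"}, {"cd"}, transactions)
print(f"support(car -> cd)    = {rule_support:.2f}")
print(f"confidence(car -> cd) = {rule_confidence:.2f}")
```

Here a car and a CD appear together in 4 of the 8 transactions and a car appears in 5, so the support is 0.50 and the confidence is 4/5 = 0.80.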

One of the most common algorithms for association rule analytics is the Apriori
algorithm. Like most association rule algorithms, it splits the problem into two major
tasks. The first task is frequent itemset generation, in which the objective is to find all the
itemsets which satisfy the minimum support threshold and are thus frequent itemsets. The
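formula for calculating support is:

$$\mathrm{support}(A \rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}}$$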

The second task is rule generation, in which the objective is to extract the high confidence,
or strong, rules from the previously found frequent itemsets. The formula for calculating
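confidence is:

$$\mathrm{confidence}(A \rightarrow B) = \frac{\mathrm{support}(A \rightarrow B)}{\mathrm{support}(A)}$$

where support(A) is the fraction of transactions that contain A.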

Since the first step is computationally expensive and requires the generation of all
combinations of itemsets, the Apriori algorithm provides a principle for guiding itemset
generation and reducing computational requirements.

The Apriori principle states that a subset of a frequent itemset must also be frequent. In
this case, if an itemset is not frequent, then it will be discarded and will not be used as a
subset for the generation of another itemset. The algorithm uses a breadth first search
strategy and a tree structure, to count candidate itemsets efficiently.

Each level in the tree contains all the k-itemsets, where k is the number of items in the
itemset. For example level 1 contains all 1-itemsets, level 2 all 2-itemsets, and so forth.
Instead of ending up with so many itemsets through all possible combinations of items,
the Apriori algorithm only considers the frequent itemsets. So in the first level, the
algorithm calculates the support of each itemset. Frequent itemsets which pass the
minimum support are taken to the next level, and all possible 2-itemset combinations are
made only out of these frequent sets, while all others are discarded. Finally, rules are
extracted from the frequent itemsets in the form of A → B (if A then B). The confidence
for each rule is calculated, and rules which pass the minimum confidence are taken as
strong rules.
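
The level-wise procedure described above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: it uses plain sets instead of the tree structure mentioned in the text, and the function name `apriori`, the toy transactions, and the two thresholds are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Level-wise search for frequent itemsets, followed by rule extraction."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every single item is a candidate 1-itemset.
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    frequent = {}  # maps each frequent itemset to its support
    k = 1
    while level:
        # Keep only the candidates that pass the minimum support.
        survivors = {c: support(c) for c in level}
        survivors = {c: s for c, s in survivors.items() if s >= min_support}
        frequent.update(survivors)
        # Build (k+1)-candidates from the survivors only; by the Apriori
        # principle, drop any candidate that has an infrequent k-subset.
        k += 1
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]

    # Rule generation: split each frequent itemset into A -> B.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), s, conf))
    return frequent, rules


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
frequent, rules = apriori(transactions, min_support=0.5, min_confidence=0.7)
for antecedent, consequent, s, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"support={s:.2f} confidence={conf:.2f}")
```

With these thresholds the three single items and the three pairs are frequent, while the 3-itemset (support 0.4) is generated as a candidate but discarded at level 3.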

3.3. Clustering

Data clustering is a technique which uses unsupervised learning, or in other words
discovers unknown structures. Clustering is the process of grouping sets of objects
together into classes based on similarity measures and the behavior of the group. Instances
within the same group are similar to each other, and are dissimilar to instances in other
groups. Clustering is similar to classification in that it groups data into classes; however
the main difference is that clustering is unsupervised, and the classes are defined by the
data alone, hence they are not predefined. Therefore, data to be analyzed is not compared
to a model built from training data, but is rather compared to other data and clustered
according to the level of similarity between them. Several representations of clusters are
depicted.
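
As an illustration of grouping by similarity, the sketch below uses k-means, one common clustering algorithm; the text above does not name a specific algorithm, so this choice is an assumption. The data points, (bounce rate in %, daily time on site in minutes) for six hypothetical websites, and the starting centroids are invented for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group 2-D points around the nearest centroid (Euclidean distance)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical sites: (bounce rate %, daily time on site in minutes).
points = [(30, 6.0), (35, 5.5), (28, 7.0), (70, 1.5), (75, 1.0), (68, 2.0)]
centroids, clusters = k_means(points, centroids=[(30, 6.0), (70, 1.5)])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No class labels are supplied; the two groups emerge from the distances between the points alone. In practice the features would usually be rescaled first so that one measure does not dominate the distance.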

3.4. Decision Trees

Another type of data analytics technique is the decision tree. Decision trees are used as
predictive models to map observations about an attribute to conclusions about the
attribute's target value. A decision tree is a hierarchical structure of nodes and directed
edges which consists of three types of nodes. The root node is a node with no incoming
edges and zero or more outgoing edges to other nodes. An internal node is a node in the
middle levels of the tree, and has one incoming edge and two or more outgoing
edges. Finally, a leaf node has exactly one incoming edge and no outgoing edges, and is
assigned a class label which provides the decision of the tree.

Each of the tree's nodes specifies a test of a certain attribute of the instance, and each
descending branch from the node corresponds to one of the attribute's possible values. An
instance is classified by moving down the tree by starting at the root node, testing the
attribute specified by that node, and moving down the branch which corresponds to the
value of the given attribute to a new node. The same process is repeated at that node, until
a leaf node providing a decision is finally reached.
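
The classification walk described above can be sketched as follows. The tree is hand-built and hypothetical (it is not learned from data), and the attribute names, branch values, and class labels are invented for illustration.

```python
# Internal nodes test one attribute; leaf nodes carry a class label.
tree = {
    "attribute": "bounce_rate",
    "branches": {
        "high": {"label": "low engagement"},
        "low": {
            "attribute": "time_on_site",
            "branches": {
                "short": {"label": "medium engagement"},
                "long": {"label": "high engagement"},
            },
        },
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, following the branch for each tested value."""
    while "label" not in node:
        value = instance[node["attribute"]]  # test the attribute at this node
        node = node["branches"][value]       # move down the matching branch
    return node["label"]


print(classify(tree, {"bounce_rate": "low", "time_on_site": "long"}))
print(classify(tree, {"bounce_rate": "high", "time_on_site": "short"}))
```

The second instance reaches a leaf directly from the root, so its `time_on_site` value is never tested.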

4. Big Data Analytics Tools and Methods

Big data is too large to be handled by conventional means, and the larger the data grows,
the more organizations purchase more powerful hardware and computational resources.
However, the data keeps on growing and performance needs increase, but the available
resources have a maximum capacity and capability. The MapReduce paradigm is based on
adding more computers or resources, rather than increasing the power or storage capacity
of a single computer; in other words, scaling out rather than scaling up. The fundamental
idea of MapReduce is breaking a task down into stages and executing the stages in parallel
in order to reduce the time needed to complete the task.

MapReduce is a parallel programming model which is suitable for big data processing.
Hadoop is a concrete platform which implements MapReduce. In
MapReduce, data is split into distributable chunks, which are called shards. The steps to
process those chunks are defined, and the big data processing is run in parallel on the
chunks. This model is scalable, in that the bigger the data processing becomes, or the more
computational resources are required, the more machines can be added to process the
chunks.

The first phase of the MapReduce job is to map input values to a set of key/value pairs as
output. Thus, unstructured data such as text can be mapped to a structured key/value pair,
where, in this case, the key could be the word in the text and the value is the number of
occurrences of the word. This output is then the input to the "Reduce" function. Reduce
then performs the collection and combination of this output. Suppose we have
millions of text documents and would like to count the occurrences of a certain word. The
text documents would be divided among several workers, or machines, which perform
parallel processing. These workers act as mappers, each mapping the desired word to the
number of its occurrences in the text documents given to that worker for processing. The
reducers will then aggregate these counts, thus giving the total count in the millions of text
documents.
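
The word-count example can be simulated on a single machine with the sketch below. It only mimics the map, group, and reduce stages in plain Python and does not use Hadoop or any real MapReduce API; the function names and the three sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key so each reducer sees one word's counts."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all counts for one word into a total."""
    return key, sum(values)


documents = ["big data needs big tools", "data tools scale out", "big clusters"]
# On a real cluster each document (shard) would go to a different worker;
# here the mappers simply run one after another.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts["big"], counts["data"], counts["tools"])
```

Because each mapper only needs its own document and each reducer only needs the values for its own key, both stages can be spread across as many machines as are available.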

Hadoop is a framework for performing big data analytics which provides reliability,
scalability, and manageability by providing an implementation for the MapReduce
paradigm as well as gluing the storage and analytics together. Hadoop consists of two
main components: the Hadoop Distributed File System (HDFS) for the big data storage,
and MapReduce for big data analytics. The HDFS storage function provides a redundant
and reliable distributed file system which is optimized for large files. Data is stored in
replicated file blocks across the multiple Data Nodes, and the Name Node acts as a
regulator between the client and the Data Node, directing the client to the particular Data
Node which contains the requested data. Additionally, the data processing and analytics
functions are performed by MapReduce, which consists of a Java API as well as software
in order to implement the services which Hadoop needs to function.

The MapReduce function within Hadoop depends on two different kinds of nodes: the Job Tracker
and the Task Tracker nodes. The Job Tracker nodes are responsible for
distributing the Mapper and Reducer functions to the available Task Trackers, as well as
monitoring the results. The Task Tracker nodes, on the other hand, actually run the jobs and
communicate results back to the Job Tracker. That communication between nodes is often
through files and directories in HDFS, so inter-node communication is minimized.

5. Big Data Challenges

Several issues will have to be addressed in order to capture the full potential of big data.
Policies related to privacy, security, intellectual property, and even liability all need to be
addressed in a big data world. Organizations need to put the right talent and technology in
place, and to structure workflows and incentives to optimize the use of big
data. Access to data is critical, and companies will increasingly need to integrate
information from multiple data sources, often from third parties or different locations.
Furthermore, questions on how to store and analyze data with volume, variety, and
velocity have arisen, and current research lacks the capability for providing an answer.

Consequently, the biggest problem has become not only the sheer volume of data, but the
fact that the type of data companies must deal with is changing. In order to accommodate
the change in data, the approaches for storing data have changed throughout the years.
Data storage started with data warehouses, data marts, data cubes, and then moved on to
master data management, data federation and other techniques such as in-memory
databases. However, database suppliers are still struggling to cope with enormous amounts
of data, and the emergence of interest in big data has led to a need for storing and
managing such large amounts of data.

Web Analytics

Web analytics is the reporting and analysis of data on website visitor activity. It is not only a
tool to measure web traffic but can also be used as a tool for business and market research.
It covers techniques used to assess and improve the contribution of e-marketing to a business,
such as referrals, click streams, online research data, customer satisfaction surveys, and leads
and sales. Marketers thus explore web analytics data and reports to build their
knowledge of customers' preferences and behavior, such as the types of sites they visit and
the areas they click most often when they are online. This helps marketers understand their
customers better and improve their business performance.

There are three stages that marketers need to consider when setting up a web analytics tool.
Analysis is their ticket for moving from Setupland to Actionland: it is the isolation of
meaningful and actionable insights in data and reports that, when acted upon by your
organization, can drive business value.

Alignment Stage: At this early planning stage, marketers need to gather
their business objectives and decide, through their online measurement strategy, how to
capture stakeholders' online behavior. A clearly understood measurement strategy and a sound
analysis of visitors are critical to success. Marketers therefore have to handle carefully the
relevant and meaningful data which will directly affect the business in the long term.

Collection Stage: At this stage, large companies may spend a considerable amount of time
on technical implementation, for example across multiple web domains and online marketing
initiatives.

Reporting Stage: This is the last stage in a company's move from Setupland to
Actionland. At this stage, you create reports and distribute them to the
organization using a manual or, preferably, automated approach.

TOOLS AND METHODS USED TO HELP MARKETERS:-


There are two types of web analytics, on-site and off-site web analytics.

ON-SITE ANALYTICS

On-site web analytics is used by marketers to measure a visitor's activity while he or she
browses your website. This includes its drivers and conversions, for example which
ads on a landing page encourage more people to purchase and which titles
visitors click most. This data is used to analyze visitors' online behavior and can be used
to improve the audience response to a website or marketing campaign.

Put simply, on-site web analytics tools are used to analyze and measure visitors'
journeys and the actual visitor traffic arriving on your website: for example, which landing
pages encourage visitors to make a purchase, how visitors reached the site (through a link
from a search engine or directly), and how much time they spent
on a given page. On-site web analytics therefore measures the performance of a website in a
commercial context.

For businesses, the website has become more important than ever before, as it handles more
information. Companies also need to know whether their internet-based marketing campaigns
are working.

OFF-SITE ANALYTICS

Off-site analytics data can be obtained for any website, including those of your competitors and
partners. It means analyzing the internet as a whole rather than a single website. The key
off-site measures are a website's potential audience (opportunity),
share of voice (visibility), and buzz (comments).

On-site web analytics, by contrast, only captures what happens when visitors visit and
engage with your website, using various technologies to monitor and analyze
the website in order to produce meaningful actions and results. However, as the social web becomes
a more popular and ascendant channel for internet users, and everything becomes more
transparent on it, information about organizations is shared and spread there. Through
this platform, marketers are able to measure the latest buzz about a website or organization.
It is important for marketers to monitor not only what happens on the website but also
what happens outside it, learning from what other people are saying about the company
and providing products and services that match customers' requirements. Off-site web analytics
solutions can help businesses stay on the leading edge of overall trends.

CHAPTER- II
PROFILE OF ALEXA.COM

Alexa Internet

Alexa Internet, Inc. is a California-based company that provides commercial web traffic
data and analytics. It is a wholly owned subsidiary of Amazon.com.

Founded as an independent company in 1996, Alexa was acquired by Amazon in 1999.


Its toolbar collects data on browsing behavior and transmits them to the Alexa website,
where they are stored and analyzed, forming the basis for the company's web traffic
reporting. According to its website, Alexa provides traffic data, global rankings and other
information on 30 million websites, and as of 2015 its website is visited by over 6.5
million people monthly.

Operation & History

Alexa Internet was founded in April 1996 by American web entrepreneurs Brewster Kahle
and Bruce Gilliat. The company's name was chosen in homage to the Library of
Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of
knowledge in the ancient world and the potential of the Internet to become a similar store
of knowledge.

Alexa initially offered a toolbar that gave Internet users suggestions on where to go next,
based on the traffic patterns of its user community. The company also offered context for
each site visited: to whom it was registered, how many pages it had, how many other sites
pointed to it, and how frequently it was updated. Alexa's operations grew to include
archiving of web pages as they are crawled. This database served as the basis for the
creation of the Internet Archive accessible through the Wayback Machine. In 1998, the
company donated a copy of the archive, two terabytes in size, to the Library of Congress.
Alexa continues to supply the Internet Archive with Web crawls.

In 1999, as the company moved away from its original vision of providing an "intelligent"
search engine, Alexa was acquired by Amazon.com for approximately US$250 million in
Amazon stock. Alexa began a partnership with Google in early 2002 and with the web
directory DMOZ in January 2003. In May 2006, Amazon replaced Google with Bing (at
the time known as Windows Live Search) as a provider of search results. In December
2006, Amazon released Alexa Image Search. Built in-house, it was the first major
application built on the company's Web platform.

In December 2005, Alexa opened its extensive search index and Web-crawling facilities to
third party programs through a comprehensive set of web services and APIs. These could
be used, for instance, to construct vertical search engines that could run on Alexa's own
servers or elsewhere. In May 2007, Alexa changed their API to limit comparisons to three
websites, reduce the size of embedded graphs in Flash, and add mandatory embedded
BritePic advertisements.

On November 27, 2008, Amazon announced that Alexa Web Search was no longer
accepting new customers, and that the service would be deprecated or discontinued for
existing customers on January 26, 2009. Thereafter, Alexa became a purely analytics-
focused company.

On March 31, 2009, Alexa launched a major website redesign. The redesigned site
provided new web traffic metrics, including average page views per individual user,
bounce rate, and user time on site. In the following weeks, Alexa added more features,
including visitor demographics, click stream and search traffic statistics. Alexa introduced
these new features to compete with other web analytics services.

Toolbar

Alexa ranks sites based primarily on tracking a sample set of internet traffic: users of its
toolbar for the Internet Explorer, Firefox and Google Chrome web browsers. The Alexa
Toolbar includes a popup blocker, a search box, links to Amazon.com and the Alexa
homepage, and the Alexa ranking of the site that the user is visiting. It also allows the user
to rate the site and view links to external, relevant sites. In early 2005, Alexa stated that
there had been 10 million downloads of the toolbar, though the company did not provide
statistics about active usage.

Originally, web pages were only ranked amongst users who had the Alexa Toolbar
installed, and could be biased if a specific audience subgroup was reluctant to take part in
the rankings. This caused some controversies over how representative Alexa's user base
was of typical Internet behavior, especially for less-visited sites.

Until 2007, a third-party-supplied plugin for the Firefox Browser served as the only option
for Firefox users after Amazon abandoned its A9 toolbar. On July 16, 2007, Alexa released
an official toolbar for Firefox called Sparky.

On 16 April 2008, many users reported drastic shifts in their Alexa rankings. Alexa
confirmed this later in the day with an announcement that they had released an updated
ranking system, claiming that they would now take into account more sources of data
"beyond Alexa Toolbar users".

Certified Statistics

Using the Alexa Pro service, website owners can sign up for "certified statistics," which
gives Alexa more access to a site's traffic data. Site owners place JavaScript code on each
page of their website that, if permitted by the user's security and privacy settings, runs and
sends traffic data to Alexa. This allows Alexa to display (or not display, depending on the
owner's preference) more accurate statistics such as total pageviews and unique pageviews.

CHAPTER III
METHODOLOGY

RESEARCH PROBLEM:- Study on web analytics with reference to select sports
websites.

OBJECTIVES OF THE STUDY:-


The objectives of the study are:
To identify the sports (cricket) websites with the highest number of visitors from India.
To determine the rank of each website in India as well as globally.
To determine the bounce rate, daily pageviews per visitor and daily time on site of the
selected websites.

METHODOLOGY OF THE STUDY:-


The top 50 sports websites related to cricket are taken for the study. Website traffic
rank is a combined measure of page views and users. So, although a website may have more
reach (i.e., a larger number of users visiting it), its rank may differ based on the number
of unique pages that were visited on the website.
The main aim of the study is to identify the most popular cricket websites in India and
globally, as reported by alexa.com, and to determine the bounce rate, daily pageviews per
visitor and daily time on site of the selected websites.
The metrics used to measure the popularity of the websites are as follows:
Bounce Rate:- The percentage of visitors to a particular website who navigate
away from the site after viewing only one page. A rising bounce rate is commonly
read as a sign that the landing page is boring or off-putting.
Daily Pageviews Per Visitor:- The average number of pages viewed by each
visitor to the website per day. When this number is higher, the website is
considered to have more engaging information.
Daily Time On Site:- The average number of minutes spent on the
website by each visitor per day. As with Daily Pageviews Per Visitor, when
this number is higher, the website is considered to have more engaging
information.
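
The three metrics defined above can be illustrated with a small calculation. The following is a minimal Python sketch, assuming a hypothetical one-day page-view log in which every visitor makes a single visit. The log records and the function name daily_metrics are invented for illustration; they are not Alexa's data or Alexa's method.

```python
from collections import defaultdict

# Hypothetical one-day page-view log: (visitor_id, seconds_spent_on_page).
# These records are invented for illustration and are not Alexa data.
PAGE_VIEWS = [
    ("v1", 40), ("v1", 95), ("v1", 65),
    ("v2", 30),
    ("v3", 120), ("v3", 60),
    ("v4", 15),
]


def daily_metrics(page_views):
    """Return (bounce rate %, pageviews per visitor, minutes on site per visitor)."""
    pages = defaultdict(int)    # number of pages viewed by each visitor
    seconds = defaultdict(int)  # total seconds spent by each visitor
    for visitor, spent in page_views:
        pages[visitor] += 1
        seconds[visitor] += spent

    visitors = len(pages)
    # A bounce is a visitor who viewed exactly one page.
    bounces = sum(1 for count in pages.values() if count == 1)

    bounce_rate = 100.0 * bounces / visitors
    pageviews_per_visitor = sum(pages.values()) / visitors
    minutes_on_site = sum(seconds.values()) / visitors / 60.0
    return bounce_rate, pageviews_per_visitor, minutes_on_site


bounce_rate, pageviews_per_visitor, minutes_on_site = daily_metrics(PAGE_VIEWS)
print(f"Bounce rate: {bounce_rate:.1f}%")
print(f"Daily pageviews per visitor: {pageviews_per_visitor:.2f}")
print(f"Daily time on site: {minutes_on_site:.2f} minutes")
```

In this invented log, two of the four visitors leave after one page, so a lower bounce rate together with higher pageviews and time per visitor would indicate a more engaging site in the sense used in this study.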

SCOPE OF THE STUDY:-
The Web traffic data for this study are collected from alexa.com, which estimates traffic
primarily from a sample of users who have installed its toolbar.
Only the top 50 sports (cricket) websites as ranked by alexa.com as on 31st May 2016 are
taken for the study.

LIMITATIONS OF THE STUDY:- The study is limited to the sites and data available on
the alexa.com website, and only to the top 50 sports (cricket) websites. The rank of each
website is taken only globally and for India; ranks in other countries are not considered
in the study.

CHAPTER IV
ANALYSIS AND INTERPRETATION

In this section, the results are shown along with their interpretations.
The top 10 websites by global rank as on 31st May 2016, along with their rank in India and
percentage of visitors in India, are shown in Table 1 and Chart 1. The top 10 websites by
rank in India are shown in Table 2 and Chart 2. These tables also include the bounce rate,
daily pageviews per visitor and daily time on site of the selected sports (cricket) websites.

Table 1: List Of Top 10 Websites (Global)

S.No  Website                                  Global  Rank In  % Of Visitors  Bounce  Daily Pageviews  Daily Time
                                               Rank    India    In India       Rate    Per Visitor      On Site
1     Https://www.youtube.com/user/CricketICC  2       3        9.3            33.30   13.42            23:42
2     Msn.com/en-in/sports/cricket             15      -        -              43.30   4.19             11:36
3     Bbc.com/sport/cricket                    126     131      7.1            52.20   2.75             4:30
4     Telegraph.co.uk/sport/cricket/           327     293      8.4            71.20   2.28             3:10
5     Espncricinfo.com                         370     75       47.6           33.30   3.44             5:46
6     Cricbuzz.com                             623     110      83.6           27.10   3.52             5:47
7     Skysports.com/cricket                    1110    1035     8.9            44.10   2.89             4:12
8     Iplt20.com                               1522    8028     64.7           27.50   4.01             6:36
9     Smh.com.au/sport/cricket/                1914    4627     4              30.10   2.16             5:32
10    Stuff.co.nz/sport/cricket                3242    7930     4.5            41.00   3.78             7:07

Chart 1: Percentage Of Visits By Indian Users For Top 10 Websites (Global)

[Pie chart of % Of Visitors In India. Slices: Https://www.youtube.com/user/CricketICC 4%,
Bbc.com/sport/cricket 3%, Telegraph.co.uk/sport/cricket/ 4%, Espncricinfo.com 20%,
Cricbuzz.com 35%, Skysports.com/cricket 4%, Iplt20.com 27%, Smh.com.au/sport/cricket/ 2%,
Stuff.co.nz/sport/cricket 2%. Msn.com/en-in/sports/cricket has no value.]

The above chart shows the percentage of visits by Indian users for the top 10
sports (cricket) websites by global rank as on 31st May 2016. Among the top 10 websites
globally, cricbuzz.com has the highest percentage of visitors from India (83.6% in Table 1),
which is 35% of the chart total. Msn.com ranks 15th globally, but it does not
have any rank in India because it deals mainly with England cricket.
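
The 35% quoted for cricbuzz.com appears to be its slice of the pie chart rather than its Table 1 value of 83.6%. The following is a minimal Python sketch of that conversion, assuming each slice is the site's % Of Visitors In India divided by the sum over the charted sites. The function name pie_shares is chosen here for illustration.

```python
# Percentage of visitors from India, copied from Table 1.
# Msn.com/en-in/sports/cricket is left out because its cell is blank.
PCT_VISITORS_IN_INDIA = {
    "Https://www.youtube.com/user/CricketICC": 9.3,
    "Bbc.com/sport/cricket": 7.1,
    "Telegraph.co.uk/sport/cricket/": 8.4,
    "Espncricinfo.com": 47.6,
    "Cricbuzz.com": 83.6,
    "Skysports.com/cricket": 8.9,
    "Iplt20.com": 64.7,
    "Smh.com.au/sport/cricket/": 4.0,
    "Stuff.co.nz/sport/cricket": 4.5,
}


def pie_shares(values):
    """Return each site's share of the column total, rounded to a whole percent."""
    total = sum(values.values())
    return {site: round(100 * value / total) for site, value in values.items()}


shares = pie_shares(PCT_VISITORS_IN_INDIA)
for site, share in shares.items():
    print(f"{site}: {share}%")
```

Under this assumption the rounded shares reproduce the labels on Chart 1, with cricbuzz.com at 35%, iplt20.com at 27% and espncricinfo.com at 20%.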

Table 2: List Of Top 10 Websites In India

S.No  Website                                  Rank In  % Of Visitors  Global  Bounce  Daily Pageviews  Daily Time
                                               India    In India       Rank    Rate    Per Visitor      On Site
1     Https://www.youtube.com/user/CricketICC  3        9.3            2       33.3    13.42            23:42
2     Espncricinfo.com                         75       47.6           370     33.3    3.44             5:46
3     Cricbuzz.com                             110      83.6           623     27.1    3.52             5:47
4     Bbc.com/sport/cricket                    131      7.1            126     52.2    2.75             4:30
5     Telegraph.co.uk/sport/cricket/           293      8.4            327     71.2    2.28             3:10
6     Skysports.com/cricket                    1035     8.9            1110    44.1    2.89             4:12
7     Icc-cricket.com/cricket-world-cup        2635     57.4           18158   56.5    1.95             2:52
8     Cricket.com.au                           4249     37.3           14784   53.9    2.23             3:39
9     Smh.com.au/sport/cricket/                4627     4              1914    30.1    2.16             5:32
10    Bcci.tv                                  4955     83.5           71275   45.5    2.59             2:49

Chart 2: Percentage Of Visits By Indian Users For Top 10
Websites In India

The above chart shows the percentage of visits by Indian users for the top 10
sports (cricket) websites by rank in India as on 31st May 2016. Among the top 10
websites in India, cricbuzz.com and bcci.tv have the highest percentage of visitors from
India (83.6% and 83.5% in Table 2), about 24% of the chart total each. Smh.com.au has the
lowest percentage of visitors from India, as it deals mainly with Australian cricket.
SUMMARY STATISTICS

Table 3: List Of Summary Statistics Of The 50 Selected Sports (Cricket) Websites

                    Global  Rank In  % Of Visitors  Bounce  Daily Pageviews  Daily Time
                    Rank    India    In India (%)   Rate    Per Visitor      On Site
Maximum             773118  256703   92.9           75      24               42.18
Minimum             2       3        1              4       1.16             1.11
Range               773116  256700   91.9           71      22.84            41.16
Mean                221274  70601    34.34          43.73   3.72             6.61
Median              172102  44956    21.5           42.8    2.78             3.83
Standard Deviation  220738  71308    29.51          15.22   3.62             8.29
The above table gives the summary statistics of the data. The following inferences can be
drawn from the table:
Websites with a bounce rate lower than the mean bounce rate are said to be more
popular sites.
Websites with daily pageviews per visitor greater than the mean value are said
to be more popular sites.
Similarly, websites with a daily time on site greater than the mean value are said
to be more popular sites.
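
The statistics in Table 3 and the comparison against the mean can be sketched in a few lines. This is a minimal Python illustration that uses only the ten bounce-rate values from Table 1, so its results differ from the 50-site figures in Table 3. The use of the sample standard deviation is an assumption, since the report does not state which formula was used, and the function name summarize is chosen for illustration.

```python
import statistics

# Bounce rates (%) of the ten websites in Table 1, in table order.
BOUNCE_RATES = {
    "Https://www.youtube.com/user/CricketICC": 33.30,
    "Msn.com/en-in/sports/cricket": 43.30,
    "Bbc.com/sport/cricket": 52.20,
    "Telegraph.co.uk/sport/cricket/": 71.20,
    "Espncricinfo.com": 33.30,
    "Cricbuzz.com": 27.10,
    "Skysports.com/cricket": 44.10,
    "Iplt20.com": 27.50,
    "Smh.com.au/sport/cricket/": 30.10,
    "Stuff.co.nz/sport/cricket": 41.00,
}


def summarize(values):
    """Return the summary statistics reported in Table 3 for one metric."""
    values = list(values)
    return {
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); this choice is an assumption.
        "stdev": statistics.stdev(values),
    }


summary = summarize(BOUNCE_RATES.values())

# Sites whose bounce rate is lower than the mean, i.e. "more popular"
# in the sense used by the inferences above.
below_mean = [site for site, rate in BOUNCE_RATES.items() if rate < summary["mean"]]

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print("Bounce rate below mean:", below_mean)
```

The same function can be applied to the daily pageviews and daily time on site columns, where values greater than the mean are the favourable ones.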

Table 4: Count Of The Websites With Reference To Their Summary Statistics

                                 Global  Rank In  % Of Visitors  Bounce  Daily Pageviews
                                 Rank    India    In India (%)   Rate    Per Visitor
Greater Than Mean                19      16       16             24      14
Lesser Than Mean                 31      23       23             26      36
Greater Than Median              25      19       19             25      25
Lesser Than Median               25      19       19             25      25
Greater Than Standard Deviation  19      16       17             49      14
Lesser Than Standard Deviation   31      23       22             1       36

From the above table, 38% of the websites (19 of 50) had a good global rank, and 32% of
the websites (16 of 50) had a good rank in India; the same share had a good percentage of
visitors in India. 52% of the websites (26 of 50) had a bounce rate lower than the mean,
which means their visitors go deeper into the website for more information. Only 28% of
the websites (14 of 50) have daily pageviews per visitor greater than the mean. So, even
though the bounce rate is good for many websites, daily pageviews per visitor is not so good.
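
The percentages quoted above follow from dividing the Table 4 counts by the 50 websites in the study. The following is a minimal Python sketch of that arithmetic, assuming every share is taken out of all 50 websites even where a column has fewer than 50 entries. The function name share_of_websites is chosen for illustration.

```python
TOTAL_WEBSITES = 50

# Counts copied from the "Greater Than Mean" and "Lesser Than Mean" rows of Table 4.
GREATER_THAN_MEAN = {
    "Global Rank": 19,
    "Rank In India": 16,
    "% Of Visitors In India": 16,
    "Bounce Rate": 24,
    "Daily Pageviews Per Visitor": 14,
}
LESSER_THAN_MEAN = {
    "Global Rank": 31,
    "Rank In India": 23,
    "% Of Visitors In India": 23,
    "Bounce Rate": 26,
    "Daily Pageviews Per Visitor": 36,
}


def share_of_websites(count, total=TOTAL_WEBSITES):
    """Return a count of websites as a percentage of all websites in the study."""
    return 100.0 * count / total


for metric in GREATER_THAN_MEAN:
    above = share_of_websites(GREATER_THAN_MEAN[metric])
    below = share_of_websites(LESSER_THAN_MEAN[metric])
    print(f"{metric}: {above:.0f}% greater than mean, {below:.0f}% lesser than mean")
```

For example, the 26 websites with a bounce rate lesser than the mean give 52%, and the 14 websites with daily pageviews per visitor greater than the mean give 28%.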

INTERPRETATION OF WEBSITES CATEGORIZED BY THE TREND


Category 1: The Websites Which Have Their Highest Trend During April-July Months
1. Espncricinfo.com
2. Cricbuzz.com
3. Msn.com/en-in/sports/cricket
4. Icc-cricket.com/cricket-world-cup
5. Iplt20.com
6. Pakpassion.net
7. Batsman.com
8. Cricketnmore.com
9. Caribbeancricket.com
10. Cricwaves.com
11. Cricketworld.com
12. Cricketweb.net
13. Pcb.com.pk
14. Lastmanstands.com
15. Cricketweb.net
16. Kkr.in
17. Cricket365.com
18. Cricruns.com
19. www.youtube.com/user/CricketICC
20. Windiescricket.com
21. Royalchallengers.in
22. Cricketfresh.in

The websites listed above have the highest point of their trend line during the
April-July months. Many of the sites are from India, and they peak because of the IPL
season. Some sites from other countries, such as msn.com, caribbeancricket.com and
windiescricket.com, also peak in April because it was the time of the T20 World Cup
playoffs.

Category 2: The Websites Which Have Their Highest Trend During August-October Months
1. Ecb.co.uk
2. Skysports.com/cricket
3. Islandcricket.lk
4. Lords.org
5. Foxsports.com.au/cricket/
6. Kiaoval.com
7. Yorkshireccc.com
8. Lccc.co.uk
9. Telegraph.co.uk/sport/cricket
10. Srilankacricket.lk
11. Mcc.org.au
12. Glamorgancricket.com
13. Smh.com.au/sport/cricket
14. Middlesexccc.com
15. Kentcricket.co.uk
16. Wccc.co.uk
17. Indiancricketfans.com
18. Islandcricket.lk

The websites listed above have the highest point of their trend line during the months
of August-October. Most of the sites in this category relate to England cricket. County
cricket was under way and England played most of their matches during that time, so these
sites peaked. Two sites, namely srilankacricket.lk and indiancricketfans.com, also peak
during those months because their countries played some international cricket during
that time.

Category 3: The Websites Which Have Their Highest Trend During October-December Months
1. Bbc.com/sport/cricket
2. Supersport.com/cricket/
3. Sport24.co.za/Cricket/
4. Stuff.co.nz/sport/cricket
5. Iol.co.za/sport/cricket/
6. Cricbay.com

The websites listed above have the highest point of their trend line during the months
of October-December. Most of the above websites belong to Australia and New Zealand.
These sites peak during these months because it was the time of the Big Bash
League.

Category 4: The Websites Which Have Their Highest Trend During January-March Months
1. Cricket.com.au
2. Cricketfanforum.net
3. Cricket.co.za
4. Bcci.tv

The websites listed above have the highest point of their trend line during the
January-March months. These websites belong to India. Since India played more
international cricket during this period, these sites peaked.

CHAPTER V
OBSERVATIONS AND CONCLUSION

OBSERVATIONS
The findings from the above inferences are given below:
Globally, www.youtube.com/user/CricketICC holds the first rank among cricket
websites.
In India also, www.youtube.com/user/CricketICC holds the top rank.
The websites can be categorized into 4 categories by using their trend line graphs.
1. April-July Months:- The websites under this category have their highest
point during that time because of the IPL and the T20 World Cup playoffs.
2. August-October Months:- The websites under this category have their
highest point during that time because of county cricket in England.
3. October-December Months:- The websites under this category have their
highest point during that time because of the Big Bash League in Australia.
4. January-March Months:- The websites under this category have their
highest point because India played more international cricket during that
time.
32% (16) of the websites have a good percentage of visitors in India.
52% (26) of the websites have a good bounce rate (lower than the mean).
28% (14) of the websites have good daily pageviews per visitor (greater than the mean).
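
The trend-based categorization in the findings above can be expressed as a lookup from a site's peak month to the four categories. The following is a minimal Python sketch, assuming a hypothetical monthly traffic index for one site (the numbers are invented, not taken from alexa.com). Because October appears in two of the report's categories, the sketch returns every matching category instead of choosing one.

```python
# Inclusive month ranges of the four trend categories named in the report.
CATEGORIES = {
    "April-July": (4, 7),
    "August-October": (8, 10),
    "October-December": (10, 12),
    "January-March": (1, 3),
}


def peak_month(monthly_traffic):
    """Return the month number (1-12) with the highest traffic value."""
    return max(monthly_traffic, key=monthly_traffic.get)


def categories_for_peak(month):
    """Return every category whose month range contains the given month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [name for name, (start, end) in CATEGORIES.items() if start <= month <= end]


# Hypothetical monthly traffic index for one website (invented values).
EXAMPLE_TRAFFIC = {
    1: 40, 2: 42, 3: 55, 4: 80, 5: 95, 6: 60,
    7: 45, 8: 41, 9: 39, 10: 38, 11: 37, 12: 36,
}

month = peak_month(EXAMPLE_TRAFFIC)
print(f"Peak month: {month}, categories: {categories_for_peak(month)}")
```

A site peaking in May falls into the April-July category, while a site peaking in October matches both the August-October and October-December categories and would need a manual decision.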

CONCLUSION
This study has been undertaken to understand the popularity and usage of cricket
websites in India. Categorization of the 50 websites that are preferred by users has given
an insight into the kind of information that users seek on the Internet. Specifically, four
categories have been identified by looking at the websites' trend lines. With this
categorization, we can easily see when a website is most popular and the reason for it.
The Web traffic analysis method covers a large population but fails to capture more
information about individual users. A combination of a survey and Web traffic analysis
can be adopted to get more information about the users.

BIBLIOGRAPHY
Avinash Kaushik, Web Analytics 2.0, SYBEX, A Wiley Brand, New Delhi, 2015.
http://www.uniassignment.com/essay-samples/information-technology/big-data-analytics-opportunities-and-challenges-information-technology-essay.php
https://www.ukessays.com/essays/information-technology/what-are-web-analytics-information-technology-essay.php
http://www.alexa.com/topsites/category/Top/Sports/Cricket

ANNEXURE

S.No  Website                            Global  Rank In  % Of Visitors  Bounce  Daily Pageviews  Daily Time
                                         Rank    India    In India       Rate    Per Visitor      On Site
1     Espncricinfo.com                   370     75       47.6           33.30   3.44             5:46
2     Cricbuzz.com                       623     110      83.6           27.10   3.52             5:47
3     Cricket.com.au                     14784   4249     37.3           53.90   2.23             3:39
4     Bbc.com/sport/cricket              126     131      7.1            52.20   2.75             4:30
5     Msn.com/en-in/sports/cricket       15      -        -              43.30   4.19             11:36
6     Bcci.tv                            71275   4955     83.5           45.50   2.59             2:49
7     Icc-cricket.com/cricket-world-cup  18158   2635     57.4           56.50   1.95             2:52
8     Iplt20.com                         1522    8028     64.7           27.50   4.01             6:36
9     Supersport.com/cricket/            6185    -        -              34.80   3.99             5:20
10    Ecb.co.uk                          76612   17529    32.4           50.90   2.65             2:51
11    Sport24.co.za/Cricket/             12579   68318    1.4            55.60   2.48             3:50
12    Skysports.com/cricket              1110    1035     8.9            44.10   2.89             4:12
13    Pakpassion.net                     72568   26045    21.5           38.00   3.90             8:12
14    Batsman.com                        121160  -        2.6            29.80   8.40             17:00
15    Cricketnmore.com                   120204  11835    92.9           25.00   24.00            40:52
16    Islandcricket.lk                   95949   116395   4.8            63.30   1.93             0:02:26
17    Lords.org                          177820  141290   9.8            36.70   3.60             3:34
18    Foxsports.com.au/cricket/          9447    8667     8.6            59.90   2.06             3:43
19    Cricwaves.com                      91438   36022    52.7           59.80   1.59             1:45
20    Cricketworld.com                   193051  35991    52.9           65.10   1.76             1:35
21    Kiaoval.com                        300467  209357   8.5            46.90   2.14             3:20
22    Yorkshireccc.com                   301917  167856   9.9            28.30   3.20             3:50
23    Stuff.co.nz/sport/cricket          3242    7930     4.5            41.00   3.78             7:07
24    Cricketweb.net                     194708  104355   28.7           21.70   6.00             10:09
25    Pcb.com.pk                         211355  256703   -              49.00   3.10             3:15
26    Lccc.co.uk                         319616  111051   18             32.00   3.60             3:51
27    Caribbeancricket.com               400535  171264   13.8           29.70   4.10             6:25
28    Lastmanstands.com                  342004  -        -              41.00   4.70             4:36
