
Creating our own analytics:

Research techniques for creating our own analytics, to get user/visitor data
and statistics from all our websites.
A website can be hosted on an external server.
Examples of data we should have: 404 errors, errors on a page, incoming channels
(organic, ads, direct, etc.), time on website, number of pages visited, recurring visitors,
conversion rates, browser, device type, etc.
Possible technique: a crawling method with our own data mining + event
observers + algorithms to extract the data we need.
Possible technique: what can we do with, and how can we access, browser
fingerprints to get data about browsers and user behaviour?
How can we remember users when they return, to continue the tracking, combined
with conversion rate? Also when they have a dynamic IP. Possible technique: create and/or
read cookies?
Can we find out who is visiting us by checking whether we can find a Facebook profile, e.g. by reading its cookies?

To provide analytics information about clients' sites, we propose to create our own
analytics script. We can divide that information into two groups:
Information about the whole site (site state (up/down), 404 pages,
page states, etc.);
To extract general site information (status, list of errors such as page not found, redirects,
critical errors, etc.) which is common for all users, we can create a crawler that collects all this
information by crawling all of a site's public pages. The advantage of this method is that we do
not depend on which pages users visit: we crawl all public pages reachable by a simple visitor
(via anchor tags); a minimal crawler sketch follows below. The main disadvantage of this
method is that we cannot access pages which are not public or which require authentication.
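
A rough illustration of this approach, as a Node.js (18+, for its global fetch) sketch; the seed URL and the report format are assumptions, not a final design:

    // Minimal crawler sketch: starts from a seed URL, follows same-origin
    // <a href> links reachable by a simple visitor, and records the HTTP
    // status of every page (404s and redirects end up in the report).
    const seed = 'https://example-client-site.nl/'; // hypothetical client site

    async function crawl(startUrl) {
      const origin = new URL(startUrl).origin;
      const queue = [startUrl];
      const seen = new Set(queue);
      const report = [];

      while (queue.length > 0) {
        const url = queue.shift();
        let res;
        try {
          // redirect: 'manual' lets us record 3xx statuses instead of following them
          res = await fetch(url, { redirect: 'manual' });
        } catch {
          report.push({ url, status: 'unreachable' });
          continue;
        }
        report.push({ url, status: res.status });
        if (!(res.headers.get('content-type') || '').includes('text/html')) continue;
        const html = await res.text();
        // Naive link extraction; a production crawler would use a real HTML parser.
        for (const [, href] of html.matchAll(/<a[^>]+href="([^"#]+)"/g)) {
          const next = new URL(href, url).href;
          if (next.startsWith(origin) && !seen.has(next)) {
            seen.add(next);
            queue.push(next);
          }
        }
      }
      return report;
    }

    crawl(seed).then((report) => console.table(report));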

Information about user activity (user's device, operating system, referrer page, etc.).
To extract activity information about every user, we need to include our script in every
page loaded by our client's users; the script will collect information and send it to our main server.
Such a script can be included on either side of the application:
BackEnd
A backend analytics script can be written in a backend language and included in the whole
project. We chose PHP because we know that all clients' applications are written in PHP
and we are sure that the server supports PHP. We can get the user's device, OS, incoming
parameters (POST, GET), etc. The advantage of this kind of analytics script is that we can get
server stats for that moment, and see the server load and POST parameters (information we
cannot get from the front end). The disadvantage of this method is that we need to include the
script manually in the code of every application. Our analytics application can be included in
the client's application, but this creates problems:
We include our logic in the client's application. This means that we
raise the client's application load time. We cannot avoid this problem entirely, but
we can minimize load time with good architecture and good code quality. Also, to
reduce the application's load time, we can create local analytics storage on every
domain and ship it to our main server with a cron job. This means that we collect
users' analytics information on the client's file system and extract it hourly
(the interval can be changed). We lose data freshness (on our main server) but
we gain load speed.

Our script will send data to our main server. If our server is
overloaded or down, the load time of the client's page will rise. PHP is a synchronous
language: the user cannot load the page until we have collected and sent the
information to our server, so every user has to wait for the client's application to load
plus the time of our backend analytics script. There is a possibility to run PHP
asynchronously by spawning a new process with the exec function, but this
method is not flexible and can introduce security leaks, which is why it is not
recommended.

BackEnd analytics work scheme.

As a result, we have a PHP script which should be included in the code of every application. We
can streamline the process of including it, but we cannot avoid it entirely. For WP
applications, a good idea is to write the analytics script as a plugin; in this case we simplify the
installation process. This script can affect the client's load speed, and we cannot avoid this
entirely, only minimize it. The advantage of a backend analytics system is the possibility to
extract server information and POST parameters. We think that this method has too many
problems for too little benefit: practically all of the data can be received from the frontend
script described below.

FrontEnd
Frontend analytics means a script included on the client's site as a frontend file, which
is able to interact with our main server. We have two possible frontend technologies which
provide interaction with an external server:
Flash
Flash provides the possibility to send HTTP queries with parameters. This technology
provides all the functionality we require, but it needs an additional browser plugin, and some
browsers have stopped supporting it because of security leaks. That is why we propose not to
use it.
JavaScript
JS is the standard browser language, and it has all the required functionality. It is able
to extract user-agent details (the user's device and OS), send cross-domain AJAX requests (to
our external server), manage the user's cookies, and create and register custom events on a
page. Using JS we can write a script which analyzes the user's actions and sends all required
information to our main server, which will work as an API. The main question is: what data do
we need? The whole process of gathering information reduces to listening for predefined
events, collecting the needed information, and sending it to our main server's API, as sketched
below.
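
A minimal sketch of this collect-and-send pattern; the endpoint URL and event names are illustrative assumptions, not a final API. Later sketches in this document reuse this send() helper.

    // Hypothetical collection endpoint on our main server.
    const ANALYTICS_ENDPOINT = 'https://best4u.nl/api/collect';

    function send(eventName, payload) {
      const body = JSON.stringify({ event: eventName, url: location.href, ...payload });
      // sendBeacon queues the request without blocking navigation or unload;
      // fall back to fetch with keepalive where it is unavailable.
      if (navigator.sendBeacon) {
        navigator.sendBeacon(ANALYTICS_ENDPOINT, body);
      } else {
        fetch(ANALYTICS_ENDPOINT, { method: 'POST', body, keepalive: true });
      }
    }

    // Example of a predefined event: report a pageview once the DOM is ready.
    document.addEventListener('DOMContentLoaded', () => {
      send('pageview', {
        referrer: document.referrer,
        userAgent: navigator.userAgent,
      });
    });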
The whole installation process is reduced to adding an external script from our main server, for example:
<script src="http://best4u.nl/analytics.js"></script>
The complexity of that script depends on our needs. Example algorithms for extracting each item:
Referrer page (where the user comes from) - we can get this information from the
referrer property of the document object. We cannot access the user's history; developers have
no access to it.
Browser and device - we can get these from the user-agent property, for example:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100
Safari/537.36
In this example we can see that the user uses Chrome 54.0.2840.100 and a Linux OS. We can
extract these details using a regexp (the most flexible tool for string parsing), as sketched below.
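
A simplified regexp-based extraction could look like this (the patterns are illustrative; real user-agent strings are messy, so a parsing library may be preferable):

    // Simplified user-agent parsing with regexps (patterns are illustrative).
    const ua = navigator.userAgent;
    const browser = ua.match(/(Chrome|Firefox|Edg|Safari)\/([\d.]+)/);
    const os = ua.match(/\((Windows NT [\d.]+|Macintosh|X11; Linux[^);]*|Android [\d.]+)/);

    console.log(browser && `${browser[1]} ${browser[2]}`); // e.g. "Chrome 54.0.2840.100"
    console.log(os && os[1]);                              // e.g. "X11; Linux x86_64"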
User's IP - when the user requests our external analytics script, we have the possibility to
get the user's IP on the server by accessing the superglobal array $_SERVER.
Remembering users - to determine whether a user has already used a web app, we usually
use an authentication system with a login and password. But in the case of analytics it is not
reasonable to force the user to register and log in to our analytics system; no one will do it.
That is why the process of answering "was this user here before?" should run in the
background, without user intervention.
If the user is logged in to Facebook (or another social network), we could
extract the user's unique ID, save it to our main database, and next time simply
check by ID whether this user has been here before. Social networks provide the
possibility to extract a unique ID, but we cannot do it without the user's confirmation:
he needs to confirm that he shares that data with us (OAuth). That is too many actions
for users; too low a percentage of users will do it, which is why we propose to reject
this idea.
Determining the user by IP - this method has too many inaccuracies.
Within one session a user can switch Wi-Fi (his IP will change), or it can be a
network with several users sharing one IP, so every user will have the same
IP. There are too many inaccuracies, which is why we propose to reject this idea.
Browser cookies & local storage - these are methods of data
storage on the client side. The differences between them are:
1. Cookies travel in the HTTP headers between server
and client on every request, while local storage saves data to the browser's local
database. Local storage stores information per domain for each user. We
have no access to the local storage or cookies of another domain.
2. Cookies are tied to the server, while local storage is
tied to the domain name. That gives us the possibility to save a user's
session on our external server and to use that session across
domains to identify the same user on different domains.
Analyzing these techniques, we propose to use standard browser cookies, which are able to
emulate a simple user session; a minimal sketch follows below. We are then able to see whether
the same user enters different domains supported by our analytics script. The disadvantage of
this method is that if the user switches devices or deletes his cookies (the overwhelming
majority of users do not), our system will interpret that user as a new one next time. We cannot
avoid this bottleneck.
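
A minimal sketch of this idea, assuming a hypothetical cookie name and a one-year lifetime; cross-domain matching would additionally rely on a cookie set by our analytics server itself:

    // Remember a visitor with a first-party cookie (name and lifetime assumed).
    function getOrCreateVisitorId() {
      const match = document.cookie.match(/(?:^|;\s*)b4u_visitor=([^;]+)/);
      if (match) return match[1]; // returning visitor
      const id = crypto.randomUUID(); // new visitor gets a fresh ID
      const maxAge = 60 * 60 * 24 * 365; // one year, in seconds
      document.cookie = `b4u_visitor=${id}; max-age=${maxAge}; path=/; SameSite=Lax`;
      return id;
    }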

Time on website & number of pages visited - we are able to write a script which
periodically sends the user's information to our server. We collect this data locally (in the
user's browser), and when the user reloads the page or closes the window we send the
information to our server. The bottleneck here is that if the computer is forcibly turned off,
we cannot catch that moment. That is why we propose to create an interval on the client side
which periodically sends a request to our main server with information about time on the page;
a sketch follows below. If the user's computer was turned off, then the next time he visits the
client's site we will read the information from local storage and synchronize it with our main
database. In this case the most we can lose is one interval's worth of analytics data, and only
when the user's computer was turned off and the user never returns to the site, which will
happen extremely rarely.
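
A sketch of this heartbeat idea, assuming a 15-second interval and a hypothetical localStorage key, and reusing the send() helper sketched earlier:

    // Track time on page with a periodic heartbeat; localStorage preserves
    // the counter across crashes or forced shutdowns.
    const HEARTBEAT_MS = 15000;
    let secondsOnPage = Number(localStorage.getItem('b4u_time') || 0);

    setInterval(() => {
      secondsOnPage += HEARTBEAT_MS / 1000;
      localStorage.setItem('b4u_time', String(secondsOnPage));
      send('heartbeat', { secondsOnPage });
    }, HEARTBEAT_MS);

    // Flush on normal navigation away and reset the local counter.
    window.addEventListener('pagehide', () => {
      send('time-on-page', { secondsOnPage });
      localStorage.removeItem('b4u_time');
    });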
FrontEnd analytics work scheme.

Information from social networks


Unfortunately, we have no possibility to extract Facebook profile info (or that of any other social
network) without the user's confirmation and actions. A few years ago it was possible, but now
all social networks use the OAuth protocol (the "log in with a social network" flow). We cannot
extract a user's information without his interaction. To receive that information we would need:
1. to register our application on the social network;
2. to implement it in our analytics;
3. to redirect the user to Facebook when he enters our client's site;
4. the user then needs to confirm that he wants to share his information with us;
5. after that, Facebook redirects the user back to our site with the needed params.
That is too many actions to get this information: the user needs to perform all these steps, yet it
is we and our client who need the information, not the user. Users simply will not do this; they
are not interested in it.
As a result, our client-side analytics consists of three applications:
a JS script, which collects data and sends information to the server;
a PHP script on the server side, which receives that data and puts it into our main DB;
an application which shows the results.
The first and second applications are not complicated, because they just collect data and put it
into the database with the right syntax. Great attention should be paid to the second part: all
user requests will be sent to this page. This part should be optimized as much as possible; it
should use no frameworks and should be written in native, clean PHP. We need to think about
every millisecond of loading time.
We also need to think about the third application part. The main logic of the whole analytics
lives here, because in the first and second parts we just collect elementary pieces of information,
while in the third part we need to show it to the user as charts or in another user-friendly way.
We need to know the exact list of requirements for what we want to show on that resource. Only
then will we see the other potential problems of building it.

Additional possibilities
Time on website, number of pages visited, recurring visitors, browser, device type, etc. - all of
this is standard analytics information, and we have no problems collecting it. It amounts to
simple JS listeners and intervals which collect information and periodically send the needed data
to our main server. We also want to propose some unusual tools. These tools are too complicated
to describe fully in this document; if we are interested in them, we need to build a prototype and
see the practical problems, because in theory we can do it. Below is just a proposal, not a
technical description:

Client's custom event handler (for any event on any element of a page) - we can
create a constructor for generating the client's own analytics. For example, a client registers his
application in our system, and we give him the possibility to add custom events through a
user-friendly interface on our main server.
It can be realized in the following way:
The client logs in to his account details page on our main analytics site. He goes to the domain
details page and starts to generate his own analytics listeners. We open his site in a new tab
through our server (our site will work as a proxy). As a result, he sees his site and can navigate it
like a simple user, but our proxy adds a configuration panel to it. Here he can construct his
analytics listener. We can build a user-friendly interface for adding custom listeners. For example,
the client wants to see how many times users click "Decline" on a page.
It could look like this (just a sketch):
Select object on page - when the client clicks this, he is able to select (click on) an object on the
page (in our case the "Decline" button). Using this button, the client selects the object on which
the event will be initialized.
Select event - after he chooses the element on the page, he should select the event which will
trigger the collection of analytics information. In our case it is "click" (when a user clicks the
"Decline" button).
Generate params - used if the client needs to pass some params to our main server. In this case
it is not necessary, so it is not required.

That was step 1: registering the event listener. After that, the client needs to complete step 2,
where he selects and configures how the information from step 1 should be displayed (different
chart templates and configuration of the incoming params from the "Generate params" fields).

After that configuration, the client will see his custom analytics for his domain name. We will
merge this configuration into the analytics JS file which every user requests from our common
server; it will be a dynamically generated script, as sketched below.
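
A sketch of what one entry of such a dynamically generated script could look like; the selector, event, and event name stand in for the client's configuration, and send() is the helper sketched earlier:

    // Hypothetical listener configuration produced from the client's settings.
    const customListeners = [
      { selector: '#decline-button', event: 'click', name: 'decline_clicked' },
    ];

    for (const { selector, event, name } of customListeners) {
      // Event delegation: keeps working even if the element is re-rendered.
      document.addEventListener(event, (e) => {
        if (e.target instanceof Element && e.target.closest(selector)) send(name, {});
      });
    }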

Heatmap of clicks - JS is able to track every user's mouse clicks and to show the
most-clicked zones of the site. It is a very useful tool for clients: they are able to see the site's
hottest zones.
Example of a heatmap-of-clicks report.

We are able to save the mouse coordinates at the moment the user clicks on something on the
page and send those coords to our main server; a sketch follows below. After that we can
aggregate the clicks on a page and open the site through our service, which will be able to draw
the heatmap of clicks.
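
A sketch of the capture side (field names are assumptions; send() is the helper sketched earlier):

    // Record page coordinates of every click for the heatmap.
    document.addEventListener('click', (e) => {
      send('click', {
        x: e.pageX, // page coordinates include the scroll offset,
        y: e.pageY, // so clicks map onto the full document
        vw: window.innerWidth,  // viewport size lets the renderer
        vh: window.innerHeight, // normalize across screens
      });
    });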

User's mouse path - in the same way as the heatmap of clicks, we are able to record the mouse
path of a user. We can save the user's mouse positions and replay them in our prepared service,
which is able to show the current client's site and emulate the user's mouse actions.
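
A sketch of the recording side, with throttled sampling and batching so we do not flood our main server (the sampling rate and batch size are assumptions; send() is the helper sketched earlier):

    // Sample the mouse position at most 10 times per second, send in batches.
    const path = [];
    let lastSample = 0;

    document.addEventListener('mousemove', (e) => {
      const now = performance.now();
      if (now - lastSample < 100) return; // throttle to ~10 samples/second
      lastSample = now;
      path.push({ x: e.pageX, y: e.pageY, t: Math.round(now) });
      if (path.length >= 50) send('mouse-path', { points: path.splice(0) });
    });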
