MANUAL Urchin v5x

Urchin 5.000
Urchin Administration/User Guide
Copyright 2003 Urchin Software Corporation. All rights reserved.
Printed Date: 01/25/2005 03:03:02
Modified Date: 01/23/2005 11:41:26
Table of Contents

Chapter 1: Getting Started
  Welcome to Urchin!
  System Requirements
    Supported Platforms and Hardware Requirements
    Urchin Setup Requirements
  Installation
    Quickstart Installation Guide
    Installation Guide (Windows)
    Installation Guide (UNIX)
    Installation Guide (Mac OS X 10.2.x)
    Installation Guide (Sun Cobalt)
    Uninstalling Urchin 5
    Troubleshooting Install Problems
  Upgrades
    Upgrading Urchin 4
    Upgrading Urchin 3
    Upgrading Urchin 3 on Sun Cobalt
    Urchin 3, 4, & 5 Reporting Differences
    Upgrading Urchin 5
  Initial Configuration
    Ecommerce Reporting
    Setup Recommendations

Chapter 2: Visitor Tracking
  Using UTM with Ecommerce
  Visitor Identification Methods
  Urchin Traffic Monitor (UTM)
  SessionID Identification
  UTM QuickInstall (Apache)
  Installing UTM On Every Page (Apache)
  UTM QuickInstall (IIS)
  Using UTM with Domain Aliases
  Using UTM with Multiple Sites
  Tracking Flash and Browser Events (UTM5 only)
  Tracking Banner Ad Exits and Other Outbound Links

Chapter 3: Urchin Administration
  Administration Overview
  Profiles
    Importing Profiles (Windows)
    Working with Profiles
  Log Files
    Working with Log Sources
    Log Management
    Log Rotation Best Practices
    Logging Apache and IIS
    Logging iPlanet
    Logging: Tomcat (Apache Jakarta Project)
    Logging Other Webservers
    Wildcard & Date Substitution in Log Path
    Processing Historical Logs
    Log Reprocessing
  Filtering
    Filtering Overview
    Filter Fields
    Exclude/Include Filters
    Decode URL Filters
    Search & Replace
    Lookup Table Filters
    Advanced Filters
    DynamicURL Filters (deprecated)
    Regular Expression Overview
  Affiliations, Users & Groups
    Working with Affiliations
    Working with Users & Groups
  Scheduling Tasks
    Working with the Task Scheduler
  System Settings
    Changing the Port Number
    Licensing Urchin
    DNS Database Update

Chapter 4: Reporting Interface
  Report-Side Filtering
  Reporting Interface Overview
  Exporting Data
  Date Range

Chapter 5: Ecommerce Module
  Ecommerce Overview
  ELF & ELF2 Log Formats
  Custom Ecommerce Logs
  Visitor Correlation
  Cancelling Ecommerce Transactions

Chapter 6: Campaign Tracking Module
  Campaign Tracking Overview
  The Five Dimensions of Campaign Tracking
  Step 1: Track Campaign Data (Set up UTM3)
  Step 2: Install and License Campaign Tracking
  Step 3: Define a Conversion Goal
  Tagging Your Online Links 123
  Import Cost Data from Google
  Import Cost Data from Overture
  Adding Cost and Impression Data
  How To Analyze Keyword Buying
  How To Track Content-Targeted Ads
  How To Track Email Campaigns
  How To Use Master Tracking Codes
  URL Builder
  Implementation Checklist

Chapter 7: Advanced Topics
  Utilities
    Administration Utilities Overview
    geoupdate: DNS Database Update Utility
    inspector: Urchin Installation Integrity Checker
    u3importer: Urchin 3 Data Import Utility
    uconfdriver: Configuration Management Utility
    uconfexport: Text-based Configuration Export Utility
    uconfimport: Text-based Configuration Import Utility
    uconfschedule: Global Scheduling Utility
    udbsanitizer: Database Maintenance Utility
    urchinctl: Urchin Services Control Utility
    urchin: Urchin Log Processing Engine
  Integration
    NFS locking requirement
    Overview of Urchin Integration Capabilities
    Changing the Location of the Urchin Data Directory
    Using an Existing Apache Webserver (UNIX-type Platforms)
    Using an Existing IIS Webserver (Windows Platforms)
    Using External Authentication or Authentication Bypass
    Linking Directly to Urchin Reports
    Script-based Configuration Management Overview
    Data Export
  Customization
    Custom Log Formats
    Custom Navigation
    Custom Reports
    Custom Date/Time Formats
    Custom DNS Entries
    Custom Lookup Tables
    Co-branding Urchin
  Hosting Automation Solutions
    How are H-Sphere and Urchin 5 Integrated?
    Using Urchin with Plesk PSA 5.0
    Ensim Webppliance
    Sphera's HostingDirector
  Performance & Tuning
    Global Filtering of Hits from Monitoring Software
    Reducing Disk Storage for Urchin Profile Monthly Databases
  Security Features
    Activating SSL on the Urchin Webserver

Chapter 8: Reference
  Integer Field List
  Regular Field List
  Regular Report List
  Configuration Table and Directive List
  Error code list for failed FTP and HTTP remote webserver log transfers
Chapter 1: Getting Started
Welcome to Urchin!
Urchin 5 represents 7 years of development, and is in our view the most advanced web analytics package available today. Combining proven datacenter-class performance with unprecedented ease of use, Urchin 5 is the best choice for businesses and hosting providers of all sizes.

What is Urchin?
Urchin is a web analytics system designed to enable businesses to easily analyze the traffic to their website(s) and create detailed, insightful, and intuitive reports. Basically, Urchin is a log-analysis program, but its sophisticated unique visitor reporting goes far beyond what was available up until now.
How Does Urchin Work?
Urchin consists of 4 primary components:
  - The Admin Server
  - The log-processing and DNS resolution engine
  - The Visitor Interaction Data Architecture (VIDA) database
  - The Scheduler

The Admin Server is Urchin's nerve center. It is a web-based control panel system, powered by a customized Apache web server, that controls all the other Urchin components. With the Admin Server, you can access and control the Urchin system from any computer on the Internet (by turning on remote access and reporting).

The log-processing and DNS resolution engine does the heavy lifting in the Urchin system, converting large raw log files into meaningful data, translating IP addresses to domains, and entering that information into the VIDA database.

The VIDA system is our highly specialized, optimized, proprietary database for quickly entering and extracting web analytics data. This analytics-specific database is a significant part of Urchin's speed advantage over the competition.

The Scheduler regularly checks the configuration database for scheduled tasks that need to be run, and executes Urchin to process them at their scheduled times.

Who should use Urchin?
Urchin is ideal for any individual or business that has access to its website's log file(s) and HTML. If you do not have access to your site's log file(s), ask your hosting provider to install Urchin. It is very popular among hosts. Contact sales@urchin.com.
System Requirements
Supported Platforms and Hardware Requirements
Urchin runs on numerous architectures and operating systems. An Urchin installation is only needed on a system that will be processing logs. For viewing reports, only a web browser is required.

Supported Platforms
Windows
  - Windows 2003 Server
  - Windows XP
  - Windows 2000 (Professional and Server)
  - Windows NT 4.x
UNIX-type Systems
  - Mac OS X (10.1 and higher)
  - Mac OS X Server (10.1 and higher)
  - Linux x86
      - RedHat Enterprise 3.0, RedHat 9, RedHat 8, RedHat 7.x, RedHat 6.x
      - Fedora Core 2, Fedora Core 1
      - SuSE 9
      - Other Linux OSes should be compatible; see the list in the Non-Explicitly Supported Platforms section
  - FreeBSD 5.2, FreeBSD 4.x
  - Solaris 2.6, 7, 8, 9 (SPARC)
  - Solaris 9 (x86)
  - Sun Cobalt RaQ550, Qube3, RaQ4, RaQ3

Anticipated OS Support
The following OSes should have a native build of Urchin released in the timeframe noted for each one:
  - FreeBSD 5.3: first quarter 2005
  - Solaris 10: first quarter 2005
If you don't see your OS listed, and a substitute cannot be found in the compatibility list in the next section, contact us to suggest it as a possible inclusion.

Non-Explicitly Supported Platforms
We strive to make Urchin available natively on as many platforms as is economically reasonable. If there is no specific Urchin distribution for your platform, you may find an available Urchin distribution that is compatible with your OS, as explained below.
  - Windows 98, Windows 3.x: Urchin cannot be installed on Windows 98 or 3.x, but these platforms can be used to view reports with Internet Explorer 4.x and newer.
  - Linux: There are many different variants of Linux, and we don't build an individual Urchin distribution for all of them. However, there is typically a high degree of compatibility across Linux flavors, so one of our distributions will almost certainly work on your machine. Some known compatible distributions are:
      - RedHat Enterprise Linux 2.1: use the RedHat 7.2 distribution of Urchin
      - SuSE Linux 8: use the RedHat 7.2 distribution of Urchin
    For all other x86-based Linux variants, you can determine which Urchin distribution to use by looking at our FAQ article on this topic.
  - Solaris: For SPARC systems, any OS release prior to Solaris 2.6 is not supported. For x86 systems, any OS release prior to Solaris 9 (x86) is not supported.

Urchin 5 System Requirements
Urchin's superior performance allows you to get more from less hardware investment. For instance, an older Pentium II might be too slow for desktop use, but will make a fine Urchin server. And Urchin's unmatched portability means you can use whichever operating system you like. Below, we provide a recommended level of hardware for high performance.

Recommended Systems
Single Small to Medium Website Analysis
  - 500 MHz or better processor
  - 128 MB RAM
  - 10 GB+ IDE hard disk
  - Ethernet interface
Service Provider / Enterprise Installations
  - 1 GHz Pentium IV / 500 MHz UltraSPARC / PPC, MIPS, etc. in a similar MHz range
  - 256 MB RAM
  - Ultra2/Wide SCSI hard disk (such as a Seagate Cheetah)
  - 100baseT Ethernet
  - Backup system

Memory/System/Disk Usage
  - Urchin memory (RAM) usage can be configured to between 20 and 500 MB
  - Urchin can be configured to run at low, normal, or high priority
  - Urchin's data storage will use approximately 10% of the size of the raw logs
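
As a rough planning aid, the 10% storage figure above can be turned into a quick disk-space estimate. The following is a minimal sketch only: the function name estimate_urchin_storage_mb and the sample log size are assumptions made for illustration, not part of Urchin, and actual usage will vary with your profiles and traffic.

```shell
#!/bin/sh
# Hypothetical helper (not an Urchin utility): estimate database disk usage
# from the guide's rule of thumb that storage is roughly 10% of raw log size.
estimate_urchin_storage_mb() {
    raw_log_mb=$1
    # Integer arithmetic; the result is rounded down to whole megabytes.
    echo $(( $raw_log_mb / 10 ))
}

# Example: 5000 MB of raw logs (an assumed figure).
estimate_urchin_storage_mb 5000
```

Because the estimate rounds down and ignores growth over time, it is best treated as a lower bound when choosing an installation location.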
Urchin Setup Requirements
This article lists the operational issues that should be anticipated prior to installing and running Urchin. Some of the information is required to operate Urchin successfully. Other items are important for using Urchin most effectively once the software is installed.

Basic Urchin Installation Considerations
  - On Windows you must install while logged in as the Administrator. On UNIX-type systems you may install as any user, but if you do not install as the superuser, you will be restricted in which areas of the file system you may install to.
  - Urchin comes bundled with an Apache webserver binary for configuration and report delivery. Your systems administrators should be aware that this new web service will be running after Urchin is installed.
  - Although the Urchin distribution itself is small, taking up only about 25 megabytes, you should install in a disk location that has plenty of room (e.g. several hundred megabytes at least) to allow for the growth of the Urchin databases over time. See the Performance and Management Issues section for additional considerations.
  - If you are upgrading from Urchin 3, you will need to import your databases into Urchin 5 using the u3importer utility. Running the Urchin 5 installer alone does not upgrade Urchin 3 to Urchin 5. See Upgrades in the Getting Started section of the Documentation Center.
  - Upgrading from Urchin 3 or Urchin 4 to Urchin 5 requires relicensing your product.

Basic Urchin Processing Considerations
  - Access to webserver logs: you must know the path to the log files for a given site, and you must have permission to access these files. If the logs are on a remote system, then you will also need an account name and password to use when retrieving the logs.
  - Properly configured log format: although Urchin can process custom log formats, you will simplify management if you configure your webserver to log in a standard format. It is recommended that you use either Extended Combined Log Format (e.g. NCSA or Apache logs), or W3C Extended Log Format (e.g. IIS logs). For IIS sites, logging of Process Accounting should be turned off. See the Advanced Configuration section for additional considerations.
  - Unique user account for Urchin processes: on UNIX-type systems it is desirable to enhance security by having Urchin programs run as a special user id that is used exclusively for Urchin and has only limited privileges. Setting up such an account will require that you have elevated or superuser privileges on the system in question.
  - Scheduling: you will need to choose a run schedule for Urchin processing that delivers reports in a timely fashion and also allows for the time needed to process large data sets.

Advanced Urchin Processing Considerations
If you desire Unique Visitor tracking, then you will have to perform the following basic steps:
  1. Install the UTM sensor code in the web pages on your site
  2. Activate cookie logging in the log format for your webserver
  3. Set the tracking methodology in the Urchin Profile for the website to be UTM
If you choose not to use Unique Visitor tracking, then you should consider what level of granularity you desire for visitor or session reporting, and select the appropriate alternative Visitor Tracking Method for each site. Besides UTM, the choices are IP only, IP/UserAgent (the default), Session ID, or Username.

Performance and Management Issues
  - Log rotation: if you do not have some external mechanism for archiving or removing webserver logs after they have been processed by Urchin, you can configure Urchin to perform this task in the Advanced Settings for each Log Source.
  - Retaining past Urchin databases for historical reporting: once the databases for a given month are created, they are available from then on for historical analysis. Users should consider how far back they need to keep historical data so they can plan for purging unnecessary data to save disk space. Urchin can be configured to compress databases that are older than a certain date.
  - Memory requirements: Urchin has configuration controls to limit the amount of RAM it utilizes when processing logs. The default is set to 20 MB, which may be too conservative for sites with logs greater than 10 MB in size. Plan to have sufficient system RAM so that you may increase Urchin memory usage as needed and tune the software's memory settings for maximum processing performance.
  - Location of Urchin data storage: utilizing the etc/urchin.conf file, Urchin can be configured so that the report databases are stored in a file system area outside the Urchin distribution. This allows you to allocate sufficient dedicated file system space for database growth where it's most convenient.

Remote Access and Integration Issues
  - Using SSL for Urchin administration and reporting: the webserver that is bundled with Urchin is compiled with support for SSL. SSL is not activated in the default configuration, but it can be turned on as desired by the user.
  - Firewall configuration: if your network topology includes firewalls, proxy servers, or other elements that sit between the Urchin processing server and either the users trying to view reports or the systems that hold logs to be retrieved, then those devices will have to be configured so that they don't interfere with Urchin's remote access. This typically can be done without subverting the security that such a topology is intended to provide.
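
Before setting a Profile to UTM tracking, it can help to confirm that cookie logging is actually reaching your log files. The sketch below is an illustration under stated assumptions, not an Urchin tool: it assumes an NCSA/Apache combined-style line with three double-quoted fields (request, referrer, user agent) and assumes cookie logging appends a fourth quoted field. The function names, sample log lines, and cookie value are all hypothetical, and lines containing embedded quotes would be miscounted.

```shell
#!/bin/sh
# Sketch only: count the double-quoted fields in one access-log line.
# Assumes fields are delimited by plain double quotes with no embedded quotes.
count_quoted_fields() {
    printf '%s\n' "$1" | awk -F'"' '{ print int((NF - 1) / 2) }'
}

# Succeeds (exit status 0) when a fourth quoted field is present,
# which this sketch assumes to be the logged cookie.
has_cookie_field() {
    [ "$(count_quoted_fields "$1")" -ge 4 ]
}

# Made-up sample lines for illustration.
plain='192.0.2.1 - - [25/Jan/2005:03:03:02 -0700] "GET /index.html HTTP/1.0" 200 1043 "http://www.example.com/" "Mozilla/4.0"'
with_cookie="$plain \"session=abc123\""

has_cookie_field "$plain" || echo "no cookie field"
has_cookie_field "$with_cookie" && echo "cookie field present"
```

Running a check like this against a few recent lines from each log source is a cheap way to catch a log format that was never updated; the exact field layout for your webserver should be taken from its logging documentation.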
Installation
Quickstart Installation Guide
This Quickstart article is for first time installers of Urchin. If you have an existing installation, read the Upgrades section. When you have completed the installation steps, login to the Urchin administration interface to perform configuration. The initial username and password are:
Username: admin Password: urchin Reset the password during your initial configuration in the Setup Wizard. If you require unique visitor and session tracking, complete the steps in this Quickstart Guide and continue with the UTM QuickInstall article in the Visitor Tracking section. Installing on Windows Systems Go to www.urchin.com and click the Download link. Download the Urchin for Windows installer to your desktop. Once the download has completed, doubleclick the installer file to start the InstallShield wizard. Follow the onscreen instructions. The defaults should be acceptable for most installations. Once the installer has completed, go to Start > Programs > Urchin > Urchin Administration and login. You will get a License Urchin screen. Click on "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping. When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile. Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites. To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed. Installing on UNIXtype Systems Go to www.urchin.com and click the Download link. Select the installer for the OS type that most closely matches your platform. The name of the installer image will include the Urchin version and the operating system type (e.g. 
urchin5000_freebsd4x.sh, urchin5000_redhat9.tar.gz).
If necessary, upload the installer to a temporary location on the system on which you are installing Urchin.
If you are not on the system's console, telnet (or use ssh if available) to the system and cd to the directory where the installer is located.
Installers will have either a .sh or a .tar.gz suffix. Depending on the type of installer, do one of the following:

For a shell archive (e.g. urchin5000_freebsd4x.sh), type the name of the file like so: ./urchin5000_freebsd4x.sh. This will unpack several files that comprise the installation kit.
For a .tar.gz image (e.g. urchin5000_redhat9.tar.gz), uncompress and unpack the installation files with the commands:
gunzip urchin5000_redhat9.tar.gz
tar xf urchin5000_redhat9.tar

From the command line, execute the main installation script by typing:

./install.sh

The script will prompt you for input as needed; just follow the instructions.
When the installer has finished, you will be given the URL to access the Urchin administration interface, as well as the default admin password. Copy and paste the URL into a browser window, and enter the admin username and password to start configuring Urchin.
You will get a License Urchin screen. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.

Installing on Mac OS X 10.2.x Systems

Go to www.urchin.com and click the Download link.
Download the Urchin installation archive for Mac OS X 10.2.x.
If the installer is downloaded directly via a browser to the system where it will be installed, an Urchin 5 folder will automatically be created on the desktop. If it is downloaded via some other mechanism such as ftp, double-click the installation archive icon, which will unpack the archive and create the desktop folder.
Open the Urchin 5 folder and double-click the Urchin.mpkg file, which will launch an interactive installation process.
You must use an account with administration privileges to install.
At the end of the installation a browser will launch and take you to the Urchin administration screen.
You will get a License Urchin screen. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
Once licensing is completed you will be presented with a Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.

Installing on Sun Cobalt Systems
Use a web browser from your desktop system to connect to http://www.urchin.com/download/urchin5
On the download page, select the installer that most closely matches your platform. The name of the installer will include the Urchin version and the Sun Cobalt system type. For example: urchinc5.0.00_cobalt_raq550.i386.pkg
Save the .pkg file to your desktop or to a temporary folder.
Using your browser, connect to the main Site Administrator's page for your Cobalt box.
Navigate to the section of the interface for installing new third-party software. The location of this area in the Cobalt interface will be platform specific:

RaQ 3, RaQ 4: click on Maintenance in the left-hand frame, then click Install Software in the top row.
Qube 3, RaQ 550: click on the BlueLinQ tab, then click Third Party Software, then click the Install Manually button.
XTR: click on the BlueLinQ tab, then click New Software, then click the Install Manually button.

Prepare and launch the package installer:

RaQ 3, RaQ 4: In the Software Package box select the Upload radio button, then click the Browse button to the right and navigate to the location on your desktop system where you saved the .pkg file you downloaded. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box. Then click the "Install a pkg Package" button. When the installation is finished an Urchin link will appear in the lower box for installed software.

Qube 3, XTR, RaQ 550: In the Location box select the Upload radio button, then click the Browse button to the right and navigate to the location on your desktop system where you saved the .pkg file you downloaded. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box. Then click the Prepare button. Once the package has been prepared, an Install Software window will appear. In this window click the Install button.
When the installation is finished, Urchin 5 will be listed under the Programs tab.

Click on the Urchin link in your Cobalt administration interface and the Urchin administration login window will appear. Enter the admin username and password to start configuring Urchin.
You will get a License Urchin screen. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process.
Once licensing is completed you will be presented with the Setup Wizard. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile.
Once you have created the appropriate Profiles, you're ready to start processing logs so that you can view Report data for your sites.

To access the administration interface remotely or for users to see their individual reports, use the URL http://yourhost:9999, where yourhost should be replaced by the name of the system where Urchin is installed.
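For the UNIX-type installers described earlier in this Quickstart, the choice between the two unpack procedures depends only on the file suffix. The following is a minimal sketch of that decision, not part of the Urchin distribution: the function name unpack_commands is hypothetical, and the function only prints the commands so you can review them before running anything. It prints the sh form of the shell-archive command, which the UNIX Installation Guide gives as the fallback when direct execution is denied.

```shell
#!/bin/sh
# Hypothetical helper: print the commands needed to unpack an Urchin
# installer, chosen by its suffix. It prints only; it runs nothing.
unpack_commands() {
    installer=$1
    case "$installer" in
        *.sh)
            # Shell archive: run it through sh to unpack the kit.
            echo "sh ./$installer"
            ;;
        *.tar.gz)
            # Compressed tar image: uncompress, then unpack.
            echo "gunzip $installer"
            echo "tar xf ${installer%.gz}"
            ;;
        *)
            echo "unrecognized installer type: $installer" >&2
            return 1
            ;;
    esac
}

# Example: show the steps for the Red Hat image named in this guide.
unpack_commands urchin5000_redhat9.tar.gz
```

After the kit is unpacked by either route, the next step is the same: run ./install.sh from the directory that now contains it.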
Installation Guide (Windows)
The installer is an executable which guides you through all the steps necessary to install Urchin. The basic components of the Urchin 5 installation process are:

Creating the distribution directory and unpacking the files
Installing and starting an Apache webserver as an NT service to allow web-based configuration and report delivery
Installing and launching the Urchin task scheduler, which manages log processing jobs, as an NT service
Initial configuration and demo licensing of Urchin via the administration interface

Installation Preparation

You must be logged in as Administrator on the console of your system in order to install Urchin.
By default the Urchin webserver service will use port 9999 when it launches. You will have the option of choosing a different port number during installation. Please verify that any port you choose does not conflict with existing operational services on your system.
You will need access to the Internet from your machine. Internet access is required to complete the demo licensing and activate your Urchin distribution once it is installed.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.
Double-click the urchin5XXX_win_setup.exe (e.g. urchin5000_win_setup.exe) icon to launch the installer, and follow the instructions in the dialog screens.

Initial Configuration Using the Administration Interface

Once Urchin is installed you can connect to your Urchin administration interface by going to the Start Menu and selecting Programs > Urchin > Urchin Administration. Alternatively, you can enter the direct URL http://localhost:port_number into your browser, where port_number is either 9999 or the number you chose during the installation. When you initially connect to the configuration interface, you will be presented with a License Urchin wizard. Click "Obtain Demo License".
The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.

Remote Access Configuration
If you connect to the Urchin configuration interface by using the hostname in the URL (e.g. http://yourhost:9999 instead of http://localhost:9999), the program will detect this as a remote access (even if you are on the console of the machine you're connecting to) and will prompt you for a username and password. The default settings for logging in with administration privileges are:

Username: admin
Password: urchin

Managing Urchin Services

There are two programs that are installed as NT services, the Urchin Task Scheduler and the Urchin Webserver. These services may be manually stopped and started by using the Disable Services and Enable Services shortcuts under Start Menu > Programs > Urchin. When these shortcuts are used, both services are turned off or on together.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Advanced Reporting Options

If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing ecommerce data, please see the Ecommerce Module section as well.
Installation Guide (UNIX)
The basic components of the Urchin 5 installation process are:

Creating the distribution directory, unpacking the files, and setting appropriate ownership and file permissions
Configuring and launching an Apache webserver to allow web-based configuration and report delivery
Launching the Urchin task scheduler daemon, which manages log processing jobs
Initial configuration and demo licensing of Urchin via the administration interface

The installer image you download is in the form of an archive, which will unpack into an install script, some support files, and the Urchin distribution. Urchin can be installed by any legitimate user on your system. It neither expects nor requires any special system privileges to install or operate, and is specifically designed to run as a non-root user for security reasons.

Installation Preparation
You may install as any user, with the exceptions that you will have to install as the superuser if you install in a directory that has write access restrictions, or if you configure your webserver to respond to requests on a port number that is lower than 1025. Only the superuser can configure the webserver with a port number lower than 1025. Please verify that the port you choose does not conflict with existing operational services on your system. The installation process will attempt to check for conflicts.

If you are installing as root, you will also be asked for a user account name and a group name, which are used in the configuration file for the webserver, and also used to set the ownership on the installed Urchin distribution. The user and group names you select must be valid logins recognized by your system; you cannot choose arbitrary names for these. For system security reasons, you are also not allowed to use root as the login that owns the Urchin files.

If you are not logged in as root while installing, you will typically not have the privileges to set the ownership of the files to the user of your choice. The install script will automatically detect this and install the distribution with your login as the owner of the files.

Lastly, you will need access to the Internet from your machine, since you must connect to the urchin.com site to complete the demo licensing and activate your Urchin distribution once it is installed.

Installation Instructions

The installer archive will be either a .tar.gz or a .sh archive, depending on your OS, and will be labeled with a name that identifies it for your OS type (e.g. urchin5000_freebsd4x.sh, urchin5000_redhat9.tar.gz). Copy the archive to any writeable area on your system and, depending on your install image type, do one of the following:

For a shell archive (e.g.
urchin5000_freebsd4x.sh), type the name of the file like so:

./urchin5000_freebsd4x.sh

If you get a "Permission Denied" error, then run the command in this fashion:

sh ./urchin5000_freebsd4x.sh

For a .tar.gz image (e.g. urchin5000_redhat9.tar.gz), uncompress and unpack the installation files with the commands:

gunzip urchin5000_redhat9.tar.gz
tar xf urchin5000_redhat9.tar

Once the archive has been unpacked you should have the following files:

install.sh (the installation script)
install.txt (instructions similar to this document)
license.txt (legal restrictions, licensing, and purchasing info)
inspector (verifies the installed distribution)
gunzip (supplied to unpack urchin.tar.gz)
urchin.tar.gz (a tarred and compressed Urchin distribution)

To install, type:
./install.sh

and follow the instructions.

Initial Configuration Using the Administration Interface

The installation script will start the Urchin webserver and Task Scheduler daemons. Once they are started you can connect to your Urchin administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your system. If you have changed the default port number from 9999 to some other port during the installation, then you should use that port number in the URL. You will get a login screen. Use these initial login values:

Username: admin
Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.

Managing Urchin 5 Services

There are 2 daemons, urchind and urchinwebd, that need to be running in order for log processing, reporting, and configuration administration to occur. These daemons are stopped and started by the urchinctl program in the bin subdirectory of your Urchin distribution. To start or stop both daemons, use:

./urchinctl start
./urchinctl stop

You can also start or stop only one daemon at a time by using the -w option for the webserver or the -s option for the scheduler. To see all of the available options, execute urchinctl with the -h option. Any errors encountered when one of the daemons is launched should be reported on the command line. For the urchinwebd daemon, once you think it is running successfully, you should also check the var/error_log file for any startup problems.
At install time, the install.sh script will create a bootup/shutdown script that you can use in conjunction with your system rc files to cause the Urchin services daemons to be started at boot time and halted at shutdown. The script is named urchin_daemons and is located in the util subdirectory of your Urchin distribution.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Advanced Reporting Options
If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing ecommerce data, please see the Ecommerce Module section as well.
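The port chosen during a UNIX install affects two things described in this guide: whether the install must be run as the superuser, and the URL that users enter to reach their reports. The sketch below illustrates both. The function names port_requires_root and report_url are hypothetical, and the 1025 threshold is simply the one stated under Installation Preparation; your operating system may draw the line elsewhere.

```shell
#!/bin/sh
# Sketch: succeeds (exit status 0) when the chosen webserver port is one
# that this guide says only the superuser can configure.
port_requires_root() {
    [ "$1" -lt 1025 ]
}

# Sketch: build the report URL, defaulting to the installer's port 9999.
report_url() {
    host=$1
    port=${2:-9999}
    echo "http://$host:$port"
}

# Example with a placeholder hostname and a non-default port.
if port_requires_root 8080; then
    echo "port 8080 needs a root install"
else
    echo "port 8080 can be used by a non-root install"
fi
report_url yourhost 8080
```

Neither function checks whether the port is already in use; the installer attempts that check itself.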
Installation Guide (Mac OS X 10.2.x)
These installation notes pertain to installing Urchin 5 on systems running a minimum of Mac OS X 10.2. For older Mac OS X versions please see the general instructions for UNIX-type installations. The Mac OS X 10.2 installer is a point-and-click package-style installer that is downloaded in the form of a disk image. The basic components of the Urchin 5 installation process are:

Download Urchin and unpack the installation archive
Double-click the Urchin.mpkg file, which will launch an interactive installation process

The installer will install 3 distinct parts:

Urchin binaries, utilities, and support files, including an Apache webserver for administration and report delivery
Urchin StartupItems
Urchin Preference Pane

Installation Preparation

The Mac OS X installer requires Mac OS X 10.2 or higher. Users of older Mac OS X systems need to use the Mac OS X 10.1.x shell archive installer.
The installing user must be able to authenticate using an account that has administrative privileges on the system, since the installer will be installing files in restricted locations.
During installation, a dialog will ask which disk you want to install on. Currently, you must install on the Startup volume.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.
If the installation disk image is downloaded to your system via a browser, it will automatically unpack and create an Urchin 5 folder on the desktop. If the installer image is downloaded via ftp or another mechanism, double-clicking the disk image will uncompress it and create the desktop folder. Inside the folder the contents will be as follows:
Urchin.mpkg
Readme.rtf
Install.rtf
License.rtf
uninstall_urchin.sh
Packages folder

Double-click the Urchin.mpkg icon and follow the instructions in the dialog boxes to complete your installation. The dialogs will prompt you

Initial Configuration Using the Administration Interface

The installer will start the Urchin webserver and Task Scheduler daemons and launch a browser to connect you to the Urchin administration interface. You will get a login screen. Use these initial login values:

Username: admin
Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin 5 administration interface, where you will be led through a Setup wizard that will set some required initial configuration parameters.
At any time in the future you can connect to your Urchin 5 administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your system.

Managing Urchin 5 Services

The Urchin 5 services can be controlled or monitored by launching System Preferences and clicking the Urchin icon.

User Access to Reports

Users should use the URL http://yourhost:9999, where yourhost is the name of the system where Urchin is installed.

Advanced Reporting Options

If you require unique visitor and session tracking, continue with the Visitor Tracking section of the Documentation Library. If you would like to know about processing ecommerce data, please see the Ecommerce Module section as well.
Installation Guide (Sun Cobalt)
The installer is a .pkg file, installed via the Cobalt Administration interface. The basic tasks of the Urchin 5 installer for Sun Cobalt are:

Create the /home/urchinc distribution directory, unpack the files, and set appropriate ownership and file permissions
Configure and launch a light Apache webserver to allow web-based configuration and report delivery
Launch the Urchin task scheduler daemon, which manages log processing jobs
Permit initial configuration and demo licensing of Urchin via the administration interface

Installation Preparation

You must have root access to install Urchin on a Sun Cobalt system. Although Urchin itself does not require any special system privileges to operate, and is specifically designed to run as a non-root user for security reasons, installation requires superuser access to some areas of your system.

You should download the appropriate package file for your system. This can be done in one of 2 ways:

Use a web browser from your desktop system to download from http://www.urchin.com/download/urchin5 and save the .pkg file on your local machine until you're ready to install
Use ftp directly from your Cobalt system to ftp.urchin.com/pub/urchin5, and put the downloaded .pkg file into the /home/packages directory

Your Cobalt system will need access to the Internet, since you must connect to the urchin.com site to complete the demo licensing and activate your Urchin distribution once it is installed.
RaQ550 owners should read and understand the information on RaQ550 web.log permissions issues when installing Urchin.

Installation Instructions

If you are upgrading an existing installation of Urchin, please consult the Upgrades section of the Documentation Center for relevant details.
Begin by connecting with your browser to the main Site Administrator's page for your Cobalt box, and navigate to the section of the Sun Cobalt administration interface used to install new third-party software.
The location of this area in the Cobalt interface will be platform specific:

RaQ 3 or RaQ 4: click on Maintenance in the left-hand frame, then click Install Software in the top row
RaQ 550 or Qube 3: click on the BlueLinQ tab, then click Third Party Software, then click the Install Manually button

In the new software area, prepare and launch the package installer using the directions appropriate for your platform:

RaQ 3 or RaQ 4
If you downloaded the Urchin package by using a browser and saving the .pkg file on your desktop system, then in the Software Package box select the Upload radio button, then click the Browse button to the right and navigate through your local filesystem until you locate the file. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box.
If you copied the software into your Cobalt system's /home/packages directory, then select the radio button labeled Loaded. Then choose your package installer from the drop-down box to the right of this button.
Click the "Install a pkg Package" button. When the installation is finished, an Urchin link will appear in the lower section labeled Software on the Sun Cobalt Server.

RaQ 550 or Qube 3

If you downloaded the Urchin package by using a browser and saving the .pkg file on your desktop system, then in the Location box select the Upload radio button, then click the Browse button to the right and navigate through your local filesystem until you locate the file. Once you click the Open button in the browse window, the pkg filename will be entered into the Upload text box.
If you copied the software into your Cobalt system's /home/packages directory, then in the Location box, select the radio button labeled "Packages in /home/packages", and choose your package installer from the drop-down box to the right of this button.
Click the Prepare button. Once the package has been prepared, an Install Software window will appear. In this window click the Install button. When the installation is finished, Urchin will be listed under the Programs tab.

Initial Configuration Using the Administration Interface

Click on the Urchin link in your Cobalt administration interface and the Urchin administration login window will appear.
Alternatively, you can connect directly to your Urchin administration interface without going through the Cobalt administration interface by using the URL http://yourhost:9999, where yourhost is the DNS hostname for your Cobalt system. Enter the admin username and password to start configuring Urchin. Use these initial login values:

Username: admin
Password: urchin

Upon initial login, the interface will take you to a License Urchin wizard. Click "Obtain Demo License". The interface will connect to the licensing server at the Urchin Software Corporation website and walk you through the process. When finished with the license process, you will be returned to the Urchin administration interface, where you will be led through the Setup Wizard, which will set some required initial configuration parameters. Follow the instructions to complete your initial configuration. Please make sure to reset the password for the admin account and to record this password somewhere for safekeeping.
When the Setup Wizard has completed you'll be taken to the Profile configuration screen. Click Add to create a new Profile. You will see a list of some sample Cobalt profiles and Log Sources that you can use as templates. Once you have a Profile you're ready to start processing logs and viewing Report data.

Managing Urchin Services

There are 2 daemons, "urchind" and "urchinwebd", that need to be running in order for log processing, reporting, and configuration administration to occur. These daemons are automatically launched by the
installation process and are configured for your system so that they should always restart if the system is rebooted. However, you may need to control these processes manually. The daemons are stopped and started by the urchinctl program in the bin subdirectory of your Urchin distribution in /home/urchinc. To start or stop both daemons, use:

/home/urchinc/bin/urchinctl start
/home/urchinc/bin/urchinctl stop

You can also start or stop only one daemon at a time by using the -w option for the webserver or the -s option for the scheduler. To see all of the available options, execute urchinctl with the -h option. Any errors encountered when one of the daemons is launched should be reported on the command line. For the urchinwebd daemon, once you think it is running successfully, you should also check the var/error_log file for any startup problems.

User Access to Reports

Users should use the URL http://yourhost:portnumber, where yourhost is the name of the system where Urchin is installed and portnumber is the port number for the Urchin webserver (9999, unless you specified a different number during installation).

Urchin Traffic Monitor

On Sun Cobalt systems, due to the combination of the default webserver logging format, the automated webserver log splitting mechanism, and the built-in statistics gathering software, it is currently not possible to use the Urchin UTM.
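The manual daemon control described under Managing Urchin Services can be wrapped so that the right urchinctl command line is assembled for you. The following is a sketch under stated assumptions: the function name urchinctl_command is hypothetical, the single-daemon options are taken to be written -w and -s, and they are assumed to go before the action. Confirm both with the urchinctl help option on your own system. The function prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: print (not run) the urchinctl command for an action and daemon.
# URCHIN_HOME defaults to the Cobalt install location named in this guide.
# Assumption: options are spelled -w / -s and precede the action.
URCHIN_HOME=${URCHIN_HOME:-/home/urchinc}

urchinctl_command() {
    action=$1           # start or stop
    daemon=${2:-both}   # both, webserver, or scheduler
    case "$daemon" in
        both)      opt="" ;;
        webserver) opt="-w " ;;
        scheduler) opt="-s " ;;
        *)
            echo "unknown daemon: $daemon" >&2
            return 1
            ;;
    esac
    echo "$URCHIN_HOME/bin/urchinctl ${opt}${action}"
}

# Example: the command that would stop only the webserver daemon.
urchinctl_command stop webserver
```

On a generic UNIX install, set URCHIN_HOME to your own installation directory before calling the function.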
Uninstalling Urchin 5
Windows

Uninstalling on a Windows system can be done in two ways.

Using the Add/Remove Programs control panel: go to Start > Settings > Control Panel and double-click Add/Remove Programs. Highlight Urchin and click the Change/Remove button. An InstallShield window should launch and present you with a dialog box with 3 radio button choices: Modify, Repair, and Remove. Select the Remove button and click Next, then follow the remaining dialog boxes to complete.

Re-running an Urchin installer: running the setup.exe you installed Urchin with should detect that Urchin is already installed and present you with the dialog box with the Modify, Repair, and Remove radio buttons.

When the uninstall process is completed there will be an Urchin data folder left in its original installation location. This folder contains Urchin report and configuration data, and is not removed during uninstallation. If you are completely removing Urchin from your system, you may remove the Urchin folder to reclaim disk space.
UNIX-type Systems

Using the urchinctl program, stop the Urchin webserver and Urchin task scheduler services like so:
/path/to/urchin/bin/urchinctl stop
Once this is done you can remove the entire Urchin installation directory. If you have installed the urchin_daemons boot script that causes the Urchin services to start/stop when the system is rebooted, you should remove this script from the startup initialization area of your system.

Mac OS X 10.2.x and later

In the Urchin installer disk image (e.g. the urchin5.XXXX_macosx102.dmg), there is a script that can be used to automate removing Urchin.
Mount the Urchin installer disk image by double-clicking on it. This will mount a new volume on your desktop named Urchin 5.XXX (where 5.XXX is the Urchin version number, e.g. 5.702).
Open a Terminal window by launching the Finder and selecting Applications from the Go menu. Navigate into the Utilities folder and double-click Terminal.
In your terminal window run the command:
sudo "/Volumes/Urchin 5.XXX/uninstall_urchin.sh"
where 5.XXX is the Urchin version number. The uninstall_urchin.sh script will remove all of the Urchin binaries and support files, but leave your configuration and report data intact. If you want to remove all data as well, then you should manually delete the /usr/local/urchin directory with the command:
sudo rm -r /usr/local/urchin
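Because the mounted volume name contains a space, the path to the uninstall script has to be quoted when it is passed to the shell. The following sketch shows one way to build that path from the version number; the function name uninstall_script_path is hypothetical, and 5.XXX remains a placeholder for your actual version. It only prints the resulting command.

```shell
#!/bin/sh
# Sketch: build the path to uninstall_urchin.sh on the mounted disk image.
# The version argument is supplied by you; 5.XXX below is a placeholder.
uninstall_script_path() {
    version=$1
    echo "/Volumes/Urchin $version/uninstall_urchin.sh"
}

# Print the command rather than run it. The double quotes keep the
# space in the volume name from splitting the path into two arguments.
printf 'sudo "%s"\n' "$(uninstall_script_path 5.XXX)"
```

Escaping the space with a backslash instead of quoting the whole path would work equally well in Terminal.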
Sun Cobalt

Connect to the main Site Administration page for your Cobalt system and follow the directions below for your system type:

RaQ 3 or RaQ 4: Click on Maintenance in the left-hand frame, then click Install Software in the top row. In the list of installed software on the system, click the Urchin link. In the Urchin management screen, click Uninstall Urchin, then click to confirm that you want to uninstall.

Qube 3, XTR, or RaQ 550: Select the BlueLinQ tab, then click Installed Software in the left-hand frame. In the software list you will see an entry for Urchin. The right-hand column of the Urchin entry has an uninstall icon. Click the icon and then click OK to confirm that you want to uninstall.

When the uninstall has completed, the Cobalt Administration interface should refresh and any entry for Urchin should be gone. Once this is done you can remove the /home/urchinc directory. Please note that removing /home/urchinc will irretrievably delete any remaining configuration and report data.
Troubleshooting Install Problems
All Platforms

Please pay close attention to the output from your installer. Read all dialog boxes, requests for user input, and output text carefully. If your installer fails, please record the complete and exact error messages that are generated, including any error codes. This information is required for full analysis of your problem.

Windows

To create debugging output of what's happening during a Windows installation, you can have the setup.exe program log its activities to a file. This is particularly useful when you have some unknown error during installation. To trigger logging, you'll have to launch setup.exe by using the Run Command mechanism. Go to the Start menu and select Run... and in the Open: entry box, enter the full path to the setup.exe along with the appropriate logging options like so:

C:\temp\setup.exe /v"/Lv C:\temp\installer.log"

Pay close attention to the syntax of this command. The spaces, quotes, slashes, and backslashes should all be entered exactly as shown. C:\temp\setup.exe should be replaced by the real path to setup.exe on your system. The file C:\temp\installer.log is where the execution logging output will be stored.

UNIX-type Systems

If you have problems executing the shell archive installer, you may download a compressed tar archive of the installation kit, which you simply gunzip and untar. Then you can run the install.sh installation script and complete the install as described in the installation guide notes for UNIX.
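On UNIX-type systems, one way to keep the complete and exact installer output requested above is to copy everything the installer prints into a file while still seeing it on screen. The sketch below uses the standard tee utility for this. The function name run_logged and the log file name are hypothetical, and the approach has not been verified against install.sh specifically; interactive prompts should still appear because output is passed through as it is written.

```shell
#!/bin/sh
# Sketch: run a command, showing its output and saving a copy to a log.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    logfile=$1
    shift
    # Merge stderr into stdout so error messages land in the log too.
    "$@" 2>&1 | tee "$logfile"
}

# Intended use (left as a comment so nothing is installed by this sketch):
#   run_logged install.log ./install.sh
```

Note that the exit status seen by the caller is that of tee, not of the installer, so rely on the saved log rather than the status to judge whether the install succeeded.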
Upgrades
Upgrading Urchin 4
Overview

Upgrading from Urchin 4 to Urchin 5 requires using the procedure that is specific to both your operating system and the Urchin version you are running. This document contains upgrade sections that cover all supported platforms and Urchin versions. Please verify that you are following the appropriate instructions for your situation.

Before performing any upgrade, please make sure you do the following:
1. Shut down the Urchin services and back up your entire existing Urchin installation. Having the services disabled guarantees that there is no database activity while the backup is in progress.
2. Have a record of the existing installation location and port number of the webserver.

Considerations When Upgrading

Licensing: Urchin 4 licenses are not compatible with Urchin 5. You will have to upgrade your Urchin 4 licenses. Speak with your sales rep for assistance with this.

Differences in report numbers: After upgrading to Urchin 5, you may notice some changes in the session counts in your reports compared to Urchin 4. Please see the article entitled Urchin 3, 4, &5 Reporting Differences in this section for details on these differences.

Visitor tracking with UTM1: If you are using UTM1 with Urchin 4 to track unique visitors to your site, no website changes are required. Urchin 5 will process and report on UTM1 data. Once you have upgraded to Urchin 5, it is advisable, although not required, to edit the Profiles for any UTM-enabled sites. Go to the Profile Settings tab and select UTM Enabled All for the Default Report Set.

It is strongly advised that you upgrade to at least Urchin 5.6 and update your website to use UTM4. UTM4 improves visitor tracking metrics and options for campaign tracking. If you do not need the campaign tracking capability UTM4 provides, you can reduce log space overhead by editing the __utm.js file and setting __utmctm=0.

Procedures

Windows with Urchin 4.10x

1. Double-click the urchin5xxx_win_setup.exe file and follow the instructions in the Welcome and License Agreement dialog screens.
2. In the dialog screen labeled Preparing to Upgrade Urchin Installation, the installer lists the directory location and webserver port number it has determined for your existing installation. It will use these parameters for your upgrade.
3. If you decide you do not want to use these installer settings, you may exit the installation by clicking the Cancel button. You cannot alter the settings of your current installation during the upgrade, since the upgrade has to match the configuration information stored in your Urchin 4 databases.
4. Click the Next button and the installer will proceed with converting your installation to the new version. Your report and configuration data will automatically be preserved during this process.

Windows with Urchin 4.00x

To migrate properly from Urchin 4.00x installations to Urchin 5, you will have to save some of your existing configuration and data files.

1. Disable your currently running Urchin services by going to Start>Programs>Urchin>Disable Services.
2. Navigate to your Urchin installation folder (e.g. C:\Program Files\Urchin) and rename the data folder to datasaved.
3. Copy etc\httpd.conf to var\urchinwebd.conf.
4. Launch the urchin5xxx_win_setup.exe installer. The installer should detect your previous installation and determine the configuration parameters. Just follow the instructions in the dialog windows of the installer.
5. After the installer has finished, go to Start>Programs>Urchin>Disable Services to deactivate your currently running Urchin 5 services.
6. In the Urchin installation folder (e.g. C:\Program Files\Urchin), rename the new data folder to datanotused, and rename the datasaved folder from step 2 back to data.
7. Launch the Urchin services by going to Start>Programs>Urchin>Enable Services.

UNIX with all versions of Urchin 4

The Urchin 5 install.sh script will properly handle all existing installations of Urchin 4 on UNIX-type systems. When running install.sh, be sure to select Upgrade as the installation type when prompted. Please see the Installation Guide (UNIX) section of the Documentation Center for full instructions.

Mac OS X 10.2.x

Note: Mac OS X 10.1 users should use the section on upgrades for UNIX-type systems.

In the majority of cases the Urchin 5 package installer for Mac OS X 10.2 systems will automatically detect existing Urchin 4 installations and upgrade them. You simply use the instructions in the Installation Guide for Mac OS X 10.2.

The one exception to this standard package installer upgrade procedure is if you previously installed using the shell archive installer but did not install in the default location of /usr/local/urchin4, and are now using the package installer to upgrade. In this case you should take the following steps:

1. Turn off your existing Urchin services using the urchinctl program (i.e. ./urchinctl stop).
2. Install normally using the package installer, but do not do any initial configuration using the admin interface. When the browser launches at the end of the install, simply close the window.
3. After the package installer has completed, go to the Urchin Preference Pane in the System Preferences and stop all running Urchin services.
4. In the new Urchin 5 installation directory, /usr/local/urchin, rename the data subdirectory to datasaved.
5. Move the entire data subdirectory from your old Urchin version install directory into /usr/local/urchin.
6. In /usr/local/urchin, update the ownership of the data subdirectory using the command:

   chown -R www:www data

7. Launch your new Urchin 5 services by using the Urchin preference pane in System Preferences to start the scheduler and webserver.

Your old configuration and report data should now be available in your new Urchin installation. Once you have confirmed that the configuration and processing are normal, you can remove your old Urchin 4 distribution, as well as the /usr/local/urchin/datasaved directory.

Sun Cobalt

Log in to your Sun Cobalt administration interface and navigate to the section where Urchin is listed. Uninstall Urchin 4 by clicking the Uninstall Urchin 4 link. Once the system reports that the uninstall process is complete, you can install the Urchin 5 pkg installer normally per the instructions in the installation section of this guide. The Urchin 5 installation will detect the existing Urchin 4 data and move it into place as part of the Urchin 5 installation.
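The data directory swap in the Mac OS X procedure above (set the freshly installed data directory aside as datasaved, then move the old data directory into the new installation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name swap_in_old_data is hypothetical, the demonstration runs on throwaway temporary directories rather than a real installation, and stopping the services and fixing ownership with chown still have to be done as described in the steps.

```python
import shutil
import tempfile
from pathlib import Path


def swap_in_old_data(new_install: Path, old_install: Path) -> None:
    """Set aside the fresh data directory and move the old one into place."""
    new_data = new_install / "data"
    old_data = old_install / "data"
    saved = new_install / "datasaved"
    if not old_data.is_dir():
        raise FileNotFoundError(f"no data directory in {old_install}")
    if saved.exists():
        raise FileExistsError(f"{saved} already exists; refusing to overwrite")
    new_data.rename(saved)                      # data -> datasaved
    shutil.move(str(old_data), str(new_data))   # old data -> new installation


# Demonstration on temporary directories (not a real Urchin installation).
with tempfile.TemporaryDirectory() as tmp:
    new_install = Path(tmp) / "urchin"
    old_install = Path(tmp) / "urchin4"
    (new_install / "data").mkdir(parents=True)
    (old_install / "data").mkdir(parents=True)
    (old_install / "data" / "reports.db").write_text("old report data")
    swap_in_old_data(new_install, old_install)
    print(sorted(p.name for p in new_install.iterdir()))  # ['data', 'datasaved']
```

Refusing to run when datasaved already exists keeps a repeated run from overwriting the directory that was set aside the first time.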
Upgrading Urchin 3
Overview

Urchin 5 is an entirely new product with thoroughly revised internal workings and data formats that are not compatible with Urchin 3. Therefore an existing Urchin 3 installation cannot be upgraded by simply installing Urchin 5 in its place. However, it is possible to install Urchin 5 side by side with Urchin 3 so that you may migrate report and configuration data from one to the other.

The basic Urchin 3 to Urchin 5 upgrade process consists of:

1. Installing and licensing Urchin 5
2. Deactivating Urchin 3 log processing
3. Running a migration tool to import Urchin 3 data into Urchin 5
4. Post-migration configuration of Urchin 5 processing

There are some special circumstances to consider for Urchin 3 to Urchin 5 migrations:

Not all Urchin 3 configurations can be migrated. In particular, existing configurations that rely on the Urchin 3 SubreportMode directive cannot be imported directly into Urchin 5, which does not support SubreportMode.

u3importer cannot be used to migrate Urchin 3 data between differing platform types as part of the import process. You cannot, for example, take Urchin 3 databases created on a Windows platform and import them into an Urchin 5 installation on a Sun system. u3importer must be run on a platform of the type where the Urchin 3 databases were created.

If you are upgrading a Sun Cobalt server, you should use the instructions in the dedicated section of the Upgrades documentation on Upgrading Urchin 3 on Sun Cobalt.

Procedure

You should already have downloaded the Urchin 5 installer appropriate for your system. You will also need to know the full path to your Urchin 3 config file to complete the upgrade process. Proceed as follows:

1. Install Urchin 5 as appropriate for your platform per the instructions in the Installation section of the Documentation Center.
2. Obtain a license for Urchin 5 and perform basic configuration of global settings, such as assigning an admin password, but do not create any Profiles.
3. Run the inspector program in the Urchin 5 util subdirectory to verify that your installation is correct. If any errors are reported, correct them before proceeding with your Urchin 3 migration.
4. If necessary to guarantee that no changes are made to your Urchin 3 databases during migration, deactivate your Urchin 3 log processing as follows:
   Windows: launch the Urchin 3 configuration interface and set reports to Off as appropriate.
   UNIX-type systems: edit your crontab and comment out the line that controls Urchin processing.
5. Run the u3importer program located in the Urchin 5 util subdirectory. This program will prompt you for the full path to your Urchin 3 config file, then prompt you to indicate which sites you want to import into Urchin 5.
6. Once u3importer has finished, your Urchin 3 report and configuration data should be established in Urchin 5. Connect to the Urchin 5 administration interface and verify that you have correct Profiles for all your websites.

Ecommerce Processing

Urchin 5 has the ability to process ELF logs and correlate the data with access logs. The ELF log source can be added to the regular log source for a profile if the Urchin 5 Ecommerce module is installed.
Upgrading Urchin 3 on Sun Cobalt
Overview

Please read and understand all of these instructions before proceeding with your migration from Urchin 3 to Urchin 5 on Sun Cobalt systems.

The recommended way to upgrade on Sun Cobalt is to start with a fresh installation of Urchin 5 which has not gone through any configuration other than having the initial Setup Wizard run. You should not manually configure any Profiles, Log Sources, Users, etc. after installing. The migration utilities will create these as needed while importing your Urchin 3 data.

The basic steps in upgrading a Sun Cobalt system are:

1. Back up your entire Urchin 3 distribution
2. Deactivate Urchin 3 log processing
3. Install Urchin 5, connect to the administration interface, and run the Setup Wizard
4. Run u3importer to import Urchin 3 report configurations as Urchin 5 profiles and convert Urchin 3 data to Urchin 5 format
5. Download and run the u5_cobalt_import.pl script to import other Urchin 3 for Cobalt config settings, such as Customers and Users, into your Urchin 5 configuration
6. Schedule tasks to process logs for each newly created Profile

Procedure

You will have to telnet or ssh into your Cobalt system as root to perform some of these instructions. You should keep a terminal window open so that you can move back and forth between the command line and the graphical interfaces as necessary.

1. Back up the Urchin 3 distribution: this is suggested strictly as a standard precaution. The process of importing your Urchin 3 data into Urchin 5 does not alter your Urchin 3 installation.
2. Deactivate Urchin 3 log processing: on Cobalt systems this requires that you move the two scripts that manage the daily execution of Urchin. On the command line in your terminal window execute these commands:

   mv /etc/cron.daily/urchin /home/urchin3da/admin/bin
   mv /etc/cron.daily/urchin_purge_weblogs /home/urchin3da/admin/bin

3. Install Urchin 5: follow the instructions for Sun Cobalt in the Installation section of this guide, and run the Setup Wizard to do the initial configuration. Important: in the Admin Settings screen of the Setup Wizard, you must set Data Center Mode to On.

4. Run u3importer: this program, located in the util subdirectory of your Urchin 5 installation, will prompt you to import report directives from your Urchin 3 config file. When prompted for the location of your config file, enter:

   /home/urchin3da/config

   Subsequently you will see additional prompts that say Import Urchin 3 Configurations and then Import Urchin 3 Data. Just hit the return key to accept the default response of all for these last two steps. When u3importer has finished, you should verify that correct Profiles were created by examining the configuration via the Urchin administration interface.

5. Download and run u5_cobalt_import.pl: since u3importer only deals with importing Urchin 3 databases and creating Urchin 5 Profiles for your existing Urchin 3 reports, other configuration information that is specific to Cobalt installations has to be imported separately using this tool. You can download u5_cobalt_import.pl from ftp://ftp.urchin.com/urchin5/support. Put this script in the /home/urchin/util directory on your Cobalt system. Then in your terminal window execute the program like so:

   ./u5_cobalt_import.pl

   The script will prompt you for input as needed. When the script has finished, the configuration import portion of the migration process will be complete.

6. Schedule tasks to process logs for the Urchin 5 profiles: when you import Urchin 3 data using u3importer, a task is created in the Scheduler for each Profile, but the scheduled time to run is not set. You can either set each schedule manually via the Urchin 5 administration interface, or use the uconfschedule utility to set all tasks simultaneously.
Urchin 3, 4, &5 Reporting Differences
Differences to Note When Migrating Between Urchin Products

This document is an overview describing basic differences in data analysis as well as certain migration issues when moving from one major Urchin version to another. Each major Urchin version is listed along with a summary explaining key elements of how data is analyzed. The latter portion of this page covers issues to anticipate for particular migration scenarios.

Urchin 3

Visitor tracking is done by incoming IP address only. There is no distinction between a visitor and a session.

All MIME types except images (gif/jpg/png) are treated as pageviews.

Pageview hits with a HEAD request type are treated as actual pageviews.

Pageviews are not required to count a visitor, so a request for a single image file could be counted as a new visitor.

Hits with error codes of 404 or 5xx are considered legitimate visits and could increment the visitor count.

Traffic>Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a monthly basis, therefore the only report granularity is a single-month date range.

Urchin 4

The default visitor tracking method uses a combination of IP address plus the UserAgent field from log entries. Other tracking options include UTM1, session id, and IP-only (i.e. Urchin 3 style tracking).

Urchin 4 provides UTM1 to enable optimal visitor and session tracking. UTM1 utilizes client-side cookies to identify unique individuals as opposed to relying on IP addresses, which are not necessarily unique to a particular person or system.

By default a session requires a legitimate pageview to be counted. A request for an image is not considered a pageview, nor is a request with a status code other than 2xx, 302, or 304. This will typically reduce counts for visitors, sessions, pageviews, and related reports in Urchin 4 using IP-only tracking when compared to Urchin 3 reports for the same data.
The pageview requirement is configurable by the Urchin administrator for sites whose design makes counting images as pageviews desirable.

When using UTM1 tracking, sessions without UTM cookie info will be processed using the default of IP+UserAgent.

Traffic>Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a daily basis, therefore the report granularity is any time period of a day or greater.

Urchin 5

The default visitor tracking method uses a combination of IP address plus the UserAgent field from log entries. Other tracking options include UTM (either UTM2 or UTM1), session id, username, and IP-only (i.e. Urchin 3 style tracking).
Urchin 5 provides improved visitor and session tracking based on UTM2, which uses client-side cookies with a configurable session timeout. With this technology, hits with the same cookies spread out over a long period of time can be counted as multiple sessions rather than a single long session. This produces more meaningful averages in the reports.

For both UTM1 and UTM2 tracking, the processing logic has changed so that only hits with UTM cookie information are processed when counting visitors, sessions, and pageviews. Hits without UTM info do not fall back to processing using IP+UserAgent as in Urchin 4. Such non-cookie sessions are tracked only for reports that are based on hits and bytes. This can lower counts for visitors, sessions, pageviews, and related reports when compared to Urchin 4 because it significantly reduces the effect of robot traffic on your statistics.

An explicit include or exclude MIME type list is now used to define what a pageview is. By default, Urchin 5 excludes the following MIME types from the pageview list:

gif,jpg,jpeg,png,js,css,cur,ico,ida

All other MIME types are considered to be pageviews or downloads.

Pageview hits which use HEAD as the request type only cause the Hits count for that page to be incremented; the pageview count is not.

By default a session requires a legitimate pageview to be counted. A request for an image is not considered a pageview, nor is a request with a status code other than 2xx, 302, or 304. This will typically reduce counts for visitors, sessions, pageviews, and related reports in Urchin 5 using IP-only tracking when compared with older Urchin version reports for the same data.

The pageview requirement is configurable by the Urchin administrator for sites whose design makes counting images as pageviews desirable.

With the exception of the Status and Errors report, all reports that graph vs. hits are based on valid hits. Previously, such graphs were based on all hits (i.e. valid hits and hits with errors).

Traffic>Hourly report and Tracking reports (e.g. Top Entrances, Top Exits) data is stored on a daily basis, therefore the report granularity is any time period of a day or greater.

Migrating from Urchin 3 to Urchin 5

Reporting

Since in Urchin 3 the Tracking reports and Traffic>Hourly Graph data is only stored on a monthly basis, and in Urchin 4 and Urchin 5 this data is stored on a daily basis, a side-by-side comparison of these reports requires that you set the date range to one month in the newer products. Also, when importing Urchin 3 data there is no way to break out the monthly data for these reports into individual days, so all data for these specific reports for a given month will be placed into the first day of the month in the newer Urchin version. Here too, setting the date range to one month will allow the imported historical data to be viewed in the correct context.

Administration

Administration is primarily via graphical interface and is based on a binary configuration database. However, command line tools and the ability to import a flat file configuration are available for those who are used to and prefer the config file approach of Urchin 3.

Migrating from Urchin 4 to Urchin 5

Reporting
Urchin 4 databases are fully compatible with Urchin 5. Report data will be immediately available once you upgrade. As noted above in the product descriptions, Urchin 5 uses a different logic for processing hits, so once you upgrade you will initially see a difference in report numbers compared to recent historical data generated with Urchin 4. These variances will differ depending on which visitor tracking method you have been using.

Log Tracking

Logtracking data in Urchin 4 is kept in a single tracking file. In Urchin 5 this data is kept in individual monthly databases. When an Urchin 4 installation is upgraded to Urchin 5, the old logtracking data is converted into equivalent Urchin 5 monthly logtracking databases, and the Urchin 4 logtrack file is archived.

Migrating from UTM1 to UTM2

Reporting

IMPORTANT: UTM2 cannot be used with Urchin 4. You must be running Urchin 5 before switching your website to use UTM2.

The improved accuracy in identifying unique visitors that UTM2 provides means that you may see some differences in reported numbers compared to what you have been seeing using UTM1. These differences should be on the order of 10% or less.
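The default Urchin 5 pageview rules described earlier in this article (the exclusion list, the 2xx/302/304 status requirement, and the handling of HEAD requests) can be summarized in a small Python sketch. This is not Urchin's code: the function names are hypothetical, and the sketch assumes that the entries in the exclusion list can be matched against the file extension of the requested URL.

```python
from urllib.parse import urlsplit

# Default exclusion list quoted in the Urchin 5 description.
EXCLUDED_EXTENSIONS = {"gif", "jpg", "jpeg", "png", "js", "css", "cur", "ico", "ida"}


def is_valid_status(status: int) -> bool:
    """Status codes the text treats as countable: 2xx, 302, and 304."""
    return 200 <= status <= 299 or status in (302, 304)


def is_pageview(method: str, url: str, status: int) -> bool:
    """Return True if a hit would count as a pageview under the default rules."""
    if method.upper() == "HEAD":
        return False  # HEAD requests increment Hits only
    if not is_valid_status(status):
        return False
    # Assumption: the type is inferred from the extension of the URL path.
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    return extension not in EXCLUDED_EXTENSIONS


print(is_pageview("GET", "/products/index.html", 200))  # True
print(is_pageview("GET", "/images/logo.gif", 200))      # False
print(is_pageview("HEAD", "/products/index.html", 200)) # False
print(is_pageview("GET", "/missing.html", 404))         # False
```

Applying the same function to a sample of log lines is a quick way to estimate how much a report total may shift after moving from Urchin 3 style counting.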
Upgrading Urchin 5
Overview

Upgrading Urchin 5 is a straightforward process. The installers typically deal automatically with upgrading existing installations while leaving your configuration and report data intact. This document contains upgrade sections that cover all supported platforms. Please verify that you are following the appropriate instructions for your situation.

Before performing any upgrade, please make sure you do the following:

1. Shut down the Urchin services. Having the services disabled guarantees that there is no database activity while the backup is in progress.
2. Back up your entire existing Urchin installation, in particular any customized configuration files.
3. Have a record of the existing installation location and port number of the webserver.

Considerations When Upgrading

It is always advisable to install on your website the latest __utm.js provided with the current release when upgrading Urchin. In addition, as of Urchin 5.7 there is a new UTM, and all users of Urchin 5.x products are encouraged to upgrade to this UTM version even if they do not upgrade to 5.7 at this time.
Campaign Tracking Module users who download Google CPC data must modify their Google download process when upgrading to Urchin 5.6 or 5.7. Please see the help article on importing Google cost data in the Campaign Tracking Module section.

Visitor tracking with UTM1, UTM2, or UTM3: Urchin 5.6 and newer versions are backwards compatible when processing all older versions of UTM data. Although not required, it is strongly advised that you upgrade your website to UTM4 or later regardless of the Urchin 5 version you are using.

Optimizing UTM4 settings: UTM4 improves visitor tracking metrics and options for campaign tracking. If you do not need the campaign tracking capability, you can reduce log space overhead by editing the __utm.js file and setting __utmctm=0. This will still allow you to benefit from the improved UTM4 visitor tracking.

Procedures

Windows

1. Double-click the urchin5xxx_win_setup.exe file and follow the instructions in the Welcome and License Agreement dialog screens.
2. In the dialog screen labeled Preparing to Upgrade Urchin Installation, the installer lists the directory location and webserver port number it has determined for your existing installation. It will use these parameters for your upgrade. You cannot alter the settings of your current installation during the upgrade, since the upgrade has to match the configuration information stored in your Urchin 5 databases.
3. Click the Next button and the installer will proceed with converting your installation to the new version. Your report and configuration data will automatically be preserved during this process.

UNIX

The install.sh installation script, which is bundled as part of all UNIX-type installers, will properly upgrade any older version of Urchin 5 installed on UNIX-type systems. When running install.sh, be sure to select Upgrade as the installation type when prompted.
Otherwise the upgrade procedure for UNIX is identical to a new installation. When using install.sh interactively to do an upgrade, at one point you will be presented with the prompt:
Please select the installation type [Default: 1]
1. New
2. Upgrade
>
Be sure to select 2 to trigger an upgrade. If you are using install.sh in non-interactive mode by specifying command line options, be sure to use the -m option to specify an upgrade. Please see the Installation Guide (UNIX) section of the Documentation Center for full instructions on using install.sh.

Mac OS X 10.2.x

Note: Mac OS X 10.1 users should use the section on upgrades for UNIX-type systems.
If you have previously used an Urchin 5 package installer for Mac OS X 10.2, then a newer package installer will automatically detect your existing Urchin 5 installation and upgrade it. In this case the instructions for a new installation and an upgrade are the same: simply follow the Installation Guide for Mac OS X 10.2 and skip the rest of this subsection.

If you did not previously use the package installer, but installed using the install.sh installation script in a location other than the default of /usr/local/urchin, then you must use a modified procedure to upgrade. The package installer will only install in /usr/local/urchin, so it cannot automatically upgrade another install location. In this situation, you have two choices:

1. Do not use the package installer to upgrade. Instead, download and use the same type of installer you used previously. This means you can follow the standard upgrade instructions for UNIX-type systems detailed in the previous subsection.

2. If you prefer to start using the package installer to upgrade, you can take the steps listed below, but realize that this procedure will relocate your Urchin installation to the default of /usr/local/urchin:

   Turn off your existing Urchin services using the urchinctl program (i.e. ./urchinctl stop).

   Move the current Urchin installation directory to /usr/local/urchin. You will need to move the entire directory structure starting with the top-level directory of your current Urchin installation. For example, if you previously installed Urchin in /applications/urchin, you would use the following command:

   mv /applications/urchin /usr/local/urchin

   You should verify that you have enough disk space in the /usr/local file system for your current Urchin installation before doing the move.

   Once you have relocated Urchin to the proper location, launch the Urchin 5 pkg installer and follow the interactive instructions to upgrade.

Your old configuration and report data should now be available in your updated Urchin installation.

Sun Cobalt

The Urchin 5 pkg installers for Sun Cobalt systems automatically detect existing installations and upgrade the Urchin 5 files as needed. Simply follow the instructions for a new Sun Cobalt installation to perform an upgrade.
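For the Mac OS X relocation step above, the disk space check recommended before moving an installation can be sketched in Python. The function names are hypothetical and the sketch is only an approximation: it sums regular file sizes and compares them with the free space reported for the destination file system. A move within a single file system is normally just a rename and needs no extra space, so the check matters mainly when the old and new locations are on different file systems.

```python
import os
import shutil
import tempfile
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size in bytes of the regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = Path(dirpath) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def has_room_for(source: Path, destination_fs: Path) -> bool:
    """True if the file system holding destination_fs can take a copy of source."""
    return shutil.disk_usage(destination_fs).free >= directory_size(source)


# Demonstration on a temporary directory standing in for an installation.
with tempfile.TemporaryDirectory() as tmp:
    install = Path(tmp) / "urchin"
    (install / "data").mkdir(parents=True)
    (install / "data" / "reports.db").write_bytes(b"x" * 2048)
    print(directory_size(install))  # 2048
    print(has_room_for(install, Path(tmp)))
```

On a real system the call would look like has_room_for(Path("/applications/urchin"), Path("/usr/local")), using the example paths from the procedure.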
Initial Configuration
Ecommerce Reporting
Urchin is capable of extensive ecommerce reporting in conjunction with its standard web traffic reports. To accomplish this, two basic elements are required:

Shopping cart software that produces activity logs in the ELF/ELF2 format (many can be configured to do so).

The Urchin Ecommerce Module, which is available as an add-on to any Urchin 5.x license.

To set up a Profile for ELF/ELF2 processing, use the Profile Setup Wizard in the Urchin admin interface and choose Profile type Ecommerce. In the Log Source Wizard (which you will be taken through in the Profile setup process), you will need to specify two Log Sources: the standard website access log, and the ELF log.

ELF: To process existing ELF logs, Urchin requires only that you set LogFormat in the Log Source to ELF (or auto), and that the Visitor Tracking method in the Profile for the site be set to IPONLY.

ELF2: To use ELF2 you must configure your shopping cart software to generate log entries formatted as shown below. The ELF2 log format is based on the ELF log format and specification. Some additional fields were added to improve visitor tracking. Any fields containing internal tab characters must be quoted.

The transaction line starts with an exclamation character '!' and contains the following fields separated by tabs:
!orderid
remote host IP (as given by %h in NCSA extended/combined log format)
time (as given by %t in NCSA extended/combined log format)
store
sessionid
total
tax
shipping
billcity
billstate
billzip
billcountry
cs_useragent
cs_cookie
The item line does not start with an exclamation character and contains the following fields separated by tabs:
orderid
remote host IP (as given by %h in NCSA extended/combined log format)
time (as given by %t in NCSA extended/combined log format)
productcode
productname
variation
price
quantity
upsold
cs_useragent
cs_cookie
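To make the two ELF2 line layouts concrete, here is a hedged Python sketch of how shopping cart software might assemble them. The function names and all sample values are hypothetical. Wrapping a field in double quotes when it contains a tab is an assumption about what "quoted" means, since the text does not specify the quoting character.

```python
def quote_if_needed(value) -> str:
    """Quote a field containing a tab (double quotes are an assumption)."""
    text = str(value)
    return f'"{text}"' if "\t" in text else text


def elf2_transaction_line(orderid, host, time, store, sessionid, total, tax,
                          shipping, billcity, billstate, billzip, billcountry,
                          useragent, cookie) -> str:
    """Build the tab-separated transaction line, which starts with '!'."""
    fields = [orderid, host, time, store, sessionid, total, tax, shipping,
              billcity, billstate, billzip, billcountry, useragent, cookie]
    return "!" + "\t".join(quote_if_needed(f) for f in fields)


def elf2_item_line(orderid, host, time, productcode, productname, variation,
                   price, quantity, upsold, useragent, cookie) -> str:
    """Build the tab-separated item line (no leading '!')."""
    fields = [orderid, host, time, productcode, productname, variation,
              price, quantity, upsold, useragent, cookie]
    return "\t".join(quote_if_needed(f) for f in fields)


# Sample values are invented for illustration only.
stamp = "[25/Jan/2005:03:03:02 -0800]"
agent = "Mozilla/4.0 (compatible; MSIE 6.0)"
cookie = "__utma=example"
print(elf2_transaction_line("1001", "192.0.2.10", stamp, "Main Store", "abc123",
                            "59.90", "4.50", "5.40", "San Diego", "CA", "92101",
                            "USA", agent, cookie))
print(elf2_item_line("1001", "192.0.2.10", stamp, "SKU-1", "Blue Widget",
                     "Large", "25.00", "2", "0", agent, cookie))
```

Each order would produce one transaction line followed by one item line per product, all sharing the same orderid.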
Setup Recommendations
Overview

Once Urchin is installed, there are some initial operational parameters that will have to be configured. This is done via a Setup Wizard that runs when you connect to your Urchin administration interface for the first time, and during the first stages while you are establishing Profiles. These initial configuration actions include:

Licensing Urchin
Configuring Admin Settings for remote report and administration access, as well as establishing Data Center Mode operation
Setting the Urchin Administrator account password
Scheduling tasks for each of your profiles to process data
Log management

Procedure

Connect to the Urchin administration interface. On Windows systems you can go to Start>Programs>Urchin>Urchin Administration. On UNIX-type systems, Sun Cobalt, and Mac OS X you can use the URL http://hostname:9999, where hostname is the registered hostname of your Urchin system. For a new installation you should use the following to log in:
Username: admin
Password: urchin
You will be presented with an Urchin Setup Wizard welcome screen. Click Continue to proceed through each of the following wizard screens. Note that the choices you make in this initial configuration can always be altered later on.

License Urchin

You have to choose one of the links under the Action Items area of this screen to license Urchin before you can proceed with configuring and using the software. Click Buy License to purchase and install a license via the web right away. If you purchased a license via a sales rep prior to installing Urchin, then click Activate Pre Purchased License. Otherwise, click Obtain Demo License to install an expiring license.

Admin Settings

Remote Access Settings: select On for each case if you want to allow remote browser connections. If you select Off for either of these, then the only allowed access is on the console of the system where Urchin is installed.

Data Center Mode: this setting determines whether Urchin is configured to allow creation of Affiliations, which allow you to logically organize Profiles, Groups, and Users into restricted access categories. If you are undecided, it is best to set this to On, as it adds no overhead and gives you the flexibility to use it in the future.
Administrative User

Reset the password for the admin account and record this password for safekeeping.

Scheduling Tasks

When you create profiles you are given the option to schedule what time to run the task that processes the data for that profile. You should check the settings for each profile to be sure that the timing of your task makes sense in terms of when the log data will be available, how long it will take to process that data, and when you want the updated reports available to your users.

Log Management

Urchin includes a log tracking module which keeps track of how far into each log it has processed. Thus, log file rotation does not necessarily need to be coordinated with Urchin operation. However, Urchin does provide automation for log rotation or removal under the Advanced Settings for each Log Source.
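The general idea behind log tracking, remembering how far into a log has already been read so that only new entries are processed on the next run, can be illustrated with a short Python sketch. This is not how Urchin's log tracking module is implemented; the function name, the JSON state file, and the rule for detecting a rotated log are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def read_new_lines(log_path: Path, state_path: Path) -> list:
    """Return complete lines appended to log_path since the previous call."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    key = str(log_path)
    offset = state.get(key, 0)
    if offset > log_path.stat().st_size:
        offset = 0  # assumption: a smaller file means the log was rotated
    with log_path.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only complete lines; a partial last line waits for the next run.
    complete = data[: data.rfind(b"\n") + 1]
    state[key] = offset + len(complete)
    state_path.write_text(json.dumps(state))
    return complete.decode("utf-8", errors="replace").splitlines()


# Demonstration with a temporary log file.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "access.log"
    state = Path(tmp) / "offsets.json"
    log.write_text("hit 1\nhit 2\n")
    print(read_new_lines(log, state))  # ['hit 1', 'hit 2']
    with log.open("a") as handle:
        handle.write("hit 3\n")
    print(read_new_lines(log, state))  # ['hit 3']
```

Because the position is stored outside the log itself, the log can be rotated or appended to independently of when processing runs.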
Chapter 2: Visitor Tracking
Using UTM with Ecommerce
Overview

Since the key aspect of UTM is the ability to identify and correlate visitor activity, when utilized on an ecommerce site in conjunction with the Ecommerce Module, visitor activity that generates revenue can be tracked across your sites and reported on collectively. Transactions on the server that hosts your shopping cart software can be correlated with sessions on your other webserver, allowing session variables, such as referrals and keywords, to be reported on versus the revenue they generate.

When using the Campaign Tracking Module, the UTM provides multi-session tracking that tracks the visitor from source to purchase or goal. Conversion ratios and ROI reports in the Campaign Tracking Module provide detailed results of online marketing efforts including keyword buying, email campaigns, and link exchanges.

Same Domain Configuration

If the frontend website and secure ecommerce site use the same domain, installing the UTM on your ecommerce site is no different than installing on other types of websites. The information in the other areas of this section on UTM installation will provide the specifics of installation. Special attention should be paid to the areas explaining how to set the UTM domain appropriately for your ecommerce and other sites. Further information on ecommerce transaction log formats is provided in the Ecommerce section of the documentation.

Cross Domain Configuration

It is increasingly common for web sites with ecommerce shopping carts to outsource the ecommerce component to another organization such as Amazon, PayPal, or Yahoo Stores. This can create a problem for UTM tracking, as the domain for UTM changes when a visitor goes from the main website to the secure store. In order for the secure store to use the same UTM visitor ID as the main website, the visitor ID must be passed in the link to the secure store. The UTM contains a __utmLinker function that will wrap the link with the appropriate ID before sending the visitor to the store. Instead of linking directly to the store, simply pass the link to the __utmLinker function. Here are the specific instructions for using __utmLinker:

1. Edit the __utm.js file in the document root of both web sites and set the __utmdn variable to "none".
2. Set the UTM Domain for the profile to nothing (blank).
3. Change the links from the main site to the secure site to the form:
<a href='javascript:__utmLinker("https://previous_link?with_parameters");'>link to shopping cart</a>
Visitor Identification Methods
Overview

Urchin has five different methods for identifying visitors and sessions, depending on available information. Of these, the patent-pending Urchin Traffic Monitor (UTM) is a highly accurate system that was specifically designed to identify unique visitors, sessions, exact paths, and return frequency behavior. A number of visitor loyalty and client reports are only available when using the UTM System. The UTM System is easy to install and is highly recommended for all businesses. In addition to the UTM System, Urchin can use IP addresses, User-Agents, Usernames, and Session IDs to identify sessions. The following table compares the abilities of each of the five identification techniques (X = supported, - = not supported):

Ability                             IP Only  IP+User-Agent  Username  Session ID  UTM
Identifies non-proxied sessions        X           X           X          X        X
Identifies some proxied sessions       -           X           X          X        X
Uniquely identifies each session       -           -           -          X        X
Defeats session IP proxying            -           -           -          X        X
Defeats most provider caching          -           -           -          X        X
Defeats browser caching                -           -           -          -        X
Uniquely identifies visitors           -           -           -          -        X
Captures exact path sequence           -           -           -          -        X
Captures visitor loyalty metrics       -           -           -          -        X
Captures browser capabilities          -           -           -          -        X

Data Model
The underlying model within Urchin for handling unique visitors is based on a hierarchy: a set of unique visitors interacts with the website through one or more sessions. Each session can contain one or more hits and pageviews. Pageviews are kept in order so that the path through the website for each session is understood. As shown in the diagram, the Visitor represents an individual's interaction with the website over time. Each unique visitor has one or more sessions, and each session contains zero or more pageviews that make up the path the visitor took during that session.
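The hierarchy described above can be sketched as a small set of data classes. This is only an illustration of the visitor, session, and pageview relationship; the class and field names are assumptions made for the example and are not Urchin's internal structures. Hits are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pageview:
    timestamp: int  # seconds from an arbitrary starting point (illustrative)
    page: str


@dataclass
class Session:
    session_id: str
    pageviews: List[Pageview] = field(default_factory=list)

    def path(self) -> List[str]:
        # Pageviews are kept in time order, so the path is the ordered page list.
        ordered = sorted(self.pageviews, key=lambda pv: pv.timestamp)
        return [pv.page for pv in ordered]


@dataclass
class Visitor:
    visitor_id: str
    sessions: List[Session] = field(default_factory=list)

    def is_returning(self) -> bool:
        # A visitor with more than one session has come back at least once.
        return len(self.sessions) > 1


# Hypothetical sample data: one visitor, two sessions.
visitor = Visitor("v1", [
    Session("s1", [Pageview(0, "/"), Pageview(40, "/products"), Pageview(95, "/cart")]),
    Session("s2", []),  # a session may contain zero pageviews
])

print(visitor.sessions[0].path())
print(visitor.is_returning())
```

Keeping the path as a method on the session mirrors the statement that a path belongs to one session, while loyalty questions such as "is this a returning visitor" are answered at the visitor level.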
Proxying and Caching

In attempting to identify and track unique visitors and sessions, we are working against the nature of the web, which is anonymous interaction. Particularly troublesome for tracking visitors are the increasingly common proxying and caching techniques used by service providers and by the browsers themselves.

Proxying hides the actual IP address of the visitor and can use one IP address to represent more than one web user. A user's IP address can change between sessions, and in some cases multiple IP addresses will be used to represent a cluster of users. Thus, it is possible that one visitor will have different IP addresses for each hit and/or different IP addresses between sessions.

Caching of pages can occur at several locations. Large providers look to decrease the load on their network by caching (remembering) commonly viewed pages and images. For example, if thousands of users from a particular provider are viewing the CNN website, the provider may benefit from caching the static pages and images of the website and delivering those pieces to the users from within the provider's network. The effect is that pages are delivered without the knowledge of the actual website.

Browser caching adds to the problem. Most browsers are configured to check content only once per session. If a visitor lands on the home page of a particular website, clicks to a subpage, and then uses the back button to return to the home page, the second request for the home page is most likely never sent to the web server, but pulled from the browser's memory. An analysis of paths may therefore produce an incomplete path that is missing the cached pages.
In the above diagram, the actual path taken through the website by the client is shown at the top, while the apparent path from the server's point of view is shown at the bottom. In this case, before proceeding to Page3 the user goes back to Page1. The server never sees this request, and from its point of view it appears the user went directly from Page2 to Page3. There may not even be a link from Page2 to Page3.

Visitor Identification Methods

As mentioned previously, Urchin has five different methods for identifying visitors, sessions, and paths. The more sophisticated methods, which can address the above issues, may require special configuration of your website. The following descriptions explain the workings of each method in more detail.

1. IP-Only: The IP-Only method is provided for backward compatibility with Urchin 3, and for basic IT reporting where uniquely identifying sessions is not needed. This method uses only the IP address to identify visitor sessions. Thirty minutes of inactivity constitutes a new session. The only data required for this method is a timestamp and the IP address of the visitor.

2. IP-Agent: The default method, which requires no additional configuration, uses the IP address and user-agent (browser) information to determine unique sessions. A configurable thirty-minute timeout is used to identify the beginning of a new session for a visitor. While this method is still susceptible to proxying and caching, the addition of the user-agent information can help detect multiple users from one IP address. In addition, this method includes a special AOL filter, which attempts to reduce the impact of AOL's round-robin proxying techniques.

3. Usernames: This method is provided for secure sites that require logins, such as intranets and extranets. Websites that are only partially protected should not use this method. The username identification is taken directly from the username field in the log file. This information is generally logged if the website is configured to require authentication. This method uses a thirty-minute period of inactivity to separate sessions from the same username.

4. Session ID: The fourth visitor identification method available in Urchin is the Session ID method, which can use pre-existing unique session identifiers to uniquely identify each session. Many content delivery applications and web servers provide session IDs to manage user interaction with the web server. These session IDs are typically located in the URI query or stored in a cookie. As long as this information is available in the log data, Urchin can be configured to take advantage of these identifiers. Using session IDs provides a much more accurate measurement of unique sessions, but still does not identify returning unique visitors. This method is also susceptible to some forms of caching, including the above example. In many cases, the ability to use session IDs may already be available, and thus the time required to configure this feature may be short. For dynamically generated sites, taking advantage of this feature should be straightforward. The result is more accurate visitor session and path analysis.

5. Urchin Traffic Monitor (UTM): The last method for visitor identification available in Urchin is the Urchin Traffic Monitor. This system was specifically designed to negate the effects of caching and proxying and allow the server to see every unique click from every visitor without significantly increasing the load on the server. The UTM system tracks return visitor behavior, loyalty, and frequency of use. The client-side data collection also provides information on browser capabilities. The UTM is installed by including a small amount of JavaScript code in each of your web pages. This can be done manually or automatically via server-side includes and other template systems. Complete details on installing UTM are covered in the articles later in this section. Once installed, the Urchin Traffic Monitor is triggered each time someone views a page from the website. The UTM Sensor uniquely identifies each visitor and sends one extra hit for each pageview. This additional hit is very lightweight, and most systems will not see any additional load. The Urchin engine identifies these extra hits in the normal log file and uses this additional data to create an exact picture of every step taken by the users. This method also identifies visitors and sessions uniquely so that return visitation behavior can be properly analyzed. While this method takes a little extra time to configure, it is highly recommended for comprehensive, detailed analytics.
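To make the difference between the first two methods concrete, the following is a minimal sketch of timeout-based session counting. It is not Urchin's implementation: the function name, the tuple layout, and the sample hits are assumptions for illustration, and the AOL filter mentioned above is not modeled.

```python
from typing import Dict, Iterable, List, Tuple

SESSION_TIMEOUT = 30 * 60  # thirty minutes of inactivity, in seconds

Hit = Tuple[int, str, str]  # (timestamp in seconds, IP address, user-agent)


def count_sessions(hits: Iterable[Hit], use_agent: bool = True) -> int:
    """Count sessions, keying on the IP alone or on IP plus user-agent."""
    last_seen: Dict[Tuple[str, ...], int] = {}
    sessions = 0
    for timestamp, ip, agent in sorted(hits):
        key = (ip, agent) if use_agent else (ip,)
        previous = last_seen.get(key)
        # A key never seen before, or one idle past the timeout, starts a session.
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[key] = timestamp
    return sessions


# Hypothetical hits arriving through a single proxy address.
hits: List[Hit] = [
    (0, "10.0.0.1", "BrowserA"),
    (600, "10.0.0.1", "BrowserA"),   # ten minutes later, same session
    (700, "10.0.0.1", "BrowserB"),   # same IP, different browser
    (4000, "10.0.0.1", "BrowserA"),  # BrowserA idle for more than 30 minutes
]

print(count_sessions(hits, use_agent=False))  # IP only
print(count_sessions(hits, use_agent=True))   # IP plus user-agent
```

With this sample data the IP-only key merges the second browser into the first session, while the combined key separates it, which is the effect the IP-Agent description refers to when it mentions detecting multiple users from one IP address.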
Urchin Traffic Monitor (UTM)
Overview

The patent-pending Urchin Traffic Monitor (UTM), originally available in Urchin 4, was specifically designed to provide the most accurate measurements of unique website visitors. For businesses looking to get a deeper understanding of their online visitor behavior, the UTM is an extremely valuable technology that combines the best of client-side and server-side information while letting you control the data. Easy to install, this technology allows business owners to identify exactly unique visitors, click paths, and return loyalty metrics including first-time visitors, returning visitors, and frequency of use.

The second version of UTM, UTM2, released with Urchin 5, expands these capabilities, capturing additional browser parameters and loyalty metrics. UTM3, released with Urchin 5.5, adds a powerful campaign tracking capability. Subsequent versions of the UTM released with Urchin 5.6 and Urchin 5.7 contain a number of enhancements to the campaign tracking capability.

There are two components to the Urchin Traffic Monitor System: the UTM Sensor, which is a lightweight module installed into the content of the website; and the UTM Engine, which is part of the log processing Urchin Engine. The UTM Sensor enables client-side data collection, which is then funneled back through the web server, augmenting the normal logfile. The client-side information is combined with the existing server-side data by the UTM Engine to provide a more accurate and complete picture of website activity.

The UTM Sensor is a small amount of JavaScript code that accomplishes two important functions. First, the Sensor negates the effects of caching by forcing at least one hit to reach the original web server for each pageview. The impact on the server is minimal, and the details about the additional hit are logged into the normal web logfiles, resulting in a more complete data set. Second, the UTM Sensor uniquely identifies each visitor by using client-side "1st party" cookies to keep track of first-time and returning visitors. This cookie identifier is a communication tag only viewable to your web server, in the same way as session IDs. It is not a third-party cookie, which provides information outside your system, violating many privacy policies.
The above diagram illustrates the operation of the UTM System. The web server in the middle of the diagram provides two basic functions: content delivery and logging. The content of the website includes the UTM Sensor, which is delivered to the user's browser, shown on the left. The UTM Sensor sets unique identifiers and sends an additional request back to the same web server. This additional request is logged into the normal log file along with all of the normal traffic. The UTM Engine, which is part of the Urchin log processing engine, understands this additional data and merges the two types of data together, providing an accurate and more complete picture of visitor behavior.

UTM Sensor

The UTM Sensor increases the accuracy and completeness of logfile data by negating the effects of caching and proxying. The following example illustrates how the UTM Sensor handles caching. As shown in the figure below, the user receives the content of a pageview from the cached memory of the browser. This typically occurs when the user goes back to a previously viewed page. The same model applies if the caching is provided by a service provider. In the example, the content for page "X" is not delivered from the web server, but from the cached memory of the browser. At this point, the web server has no knowledge of the pageview because it never sees the request. However, the UTM Sensor triggers an additional unique hit that forces at least one small record back to the original web server. This information is logged in the normal logfile, which now has knowledge of the originating "X" pageview.
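The effect of the extra hit on path analysis can be shown with a toy model. The sketch below assumes only that every pageview, cached or not, produces one sensor hit that tells the server which page was viewed; the actual format of that request is not described here, and the page names and tuple layout are hypothetical.

```python
# Each step the visitor takes: (page, served_from_cache)
actual_steps = [
    ("Page1", False),
    ("Page2", False),
    ("Page1", True),   # back button: content comes from the browser cache
    ("Page3", False),
]

# Without the sensor, the server logs only the pages it actually delivered.
apparent_path = [page for page, cached in actual_steps if not cached]

# With the sensor, each pageview sends one extra hit to the server,
# whether or not the page content itself came from a cache.
sensor_path = [page for page, _cached in actual_steps]

print("Server view without sensor:", apparent_path)
print("Server view with sensor:   ", sensor_path)
```

The first list jumps from Page2 to Page3 even though no such click happened, while the second list preserves the return to Page1.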
The second important function of the UTM Sensor is to uniquely identify both sessions and unique visitors. Through a patent-pending combination of browser cookies, the Sensor detects and initializes the unique visitor and session identifiers, allowing exact monitoring of new and returning visitors regardless of service provider proxy behavior. Most service providers take advantage of proxying by recycling IP addresses and clustering users behind firewalls. This can cause problems with normal logfile tracking, which typically uses the IP address as an identifier of the user. In the example shown in the figure below, the UTM Sensor is able to pierce the veil of the proxy by using the cookie identifiers instead of the IP addresses.

In the figure, a first-time unique visitor accesses the website through a firewall with IP address #1. The delivered pageview contains the UTM Sensor, which sets the identifier on the visitor's browser. On the return visit by the same visitor, shown at the bottom of the figure, the unique ID is passed to the web server along with each request. So even if the user is now assigned a second IP address, the UTM technology properly identifies the visitor with the original ID. In addition to negating the effects of complex proxying techniques, this also tracks visitors who travel and may use their laptops from several locations and through several providers.
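A short sketch shows why counting by cookie identifier gives a different answer than counting by IP address. The addresses and visitor IDs below are invented for illustration (the addresses come from documentation-only ranges), and the pairing of one IP with one cookie value per request is an assumption about what the log provides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (IP address seen by the server, visitor id carried in the cookie)
requests: List[Tuple[str, str]] = [
    ("192.0.2.10", "visitor-A"),    # first visit, behind proxy address #1
    ("192.0.2.20", "visitor-A"),    # return visit, proxy assigned address #2
    ("198.51.100.7", "visitor-A"),  # same laptop, different provider
    ("192.0.2.10", "visitor-B"),    # a different person behind proxy address #1
]

visitors_by_ip = len({ip for ip, _ in requests})
visitors_by_cookie = len({visitor for _, visitor in requests})

# Group the addresses each cookie identifier was seen from.
ips_per_visitor: Dict[str, Set[str]] = defaultdict(set)
for ip, visitor in requests:
    ips_per_visitor[visitor].add(ip)

print("Unique visitors counted by IP:    ", visitors_by_ip)
print("Unique visitors counted by cookie:", visitors_by_cookie)
print("Addresses used by visitor-A:      ", sorted(ips_per_visitor["visitor-A"]))
```

Counting by IP both overcounts (one traveller appears as three visitors) and undercounts (two people share one proxy address); the cookie identifier avoids both errors in this sample.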
Once the additional UTM data is recorded in the normal web server log, the UTM Engine will recognize and process these additional hits in order to create an exact analysis of each click of the user. During installation, it is important to check that the logging format includes both referral and cookie logging so that all of the appropriate data is stored.

Installation
There are four steps to installing the UTM system, which can be accomplished in a very short amount of time. Complex sites may be able to take advantage of existing server-side includes or centralized delivery methods to shorten the installation. During installation, you will need access and permissions to modify the content of the website. You may also need to modify the logging of the web server, which may require a different set of permissions. The following four steps do not necessarily need to be performed in order.

Upgrade Note: UTM2, which ships with Urchin 5, is not recognized by Urchin 4. Once UTM2 is installed, you will no longer be able to run Urchin 4. All versions of Urchin 5 recognize both UTM1 and UTM2. In addition, as of Urchin 5.5 there is UTM3. Only Urchin 5.5 and up can process UTM3 data. Therefore, when upgrading, it is important to migrate to the appropriate version of Urchin before installing a more recent version of the UTM sensor. UTM4 data, however, can be processed by any Urchin 5.x version.

1. Install UTM Sensor into content: The first step in installing the UTM is to include the JavaScript and GIF components of the UTM Sensor in the content of the site. The two pieces necessary for completing this step are included in the util/utm/ folder within the Urchin distribution. It is important that the names of these two files are not changed and that they are copied to the document root directory of the website. Either drag and drop, upload, or copy the __utm.js and the __utm.gif files into the main directory of your website.
Once these files are in place, you will need to include the __utm.js file at the beginning of each web page in the website. If your site uses server-side includes and you use a header include for each file, it is possible to include the UTM at the beginning of this include file only. It will then automatically be a part of each web page. For dynamic sites that use a content generation engine, the UTM can be included at the beginning of the template that is delivered to the customer. For static HTML sites that do not use includes, you will need to edit each page individually. In every case, the following line of code should appear at the beginning of each HTML page that is delivered to the end user, before the rest of the HTML content but after any META tags.
<script src="/__utm.js" type="text/javascript"></script>
For sites that undergo regular maintenance or have multiple authors, be sure to build the addition of this line into your internal website authoring procedures, guidelines, and QA processes. If you are using a package like "HTML Tidy", you may want to include the JavaScript line in the HEAD area of your page to make it more palatable, for instance:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<script src="/__utm.js" type="text/javascript"></script>
...
</head>
2. Set UTM Domain (if necessary): The UTM (beginning with UTM2) has a domain setting that controls the scope of the cookies. For single websites, the default setting, "auto", can be left alone. If you have multiple websites that share a common root domain and you wish to process them together, then the domain should be set to the common root domain. To set the domain setting, edit the __utm.js file that was copied into your document root in step 1. Towards the top, you will see the line:
var __utmdn="auto"; /* ...
Change the word "auto" to the domain that the cookies should apply to. The domain must be part or all of the actual URL for this site. Example:
var __utmdn="urchin.com"; /* ...
3. Activate cookies in the logging: The third step in installing the UTM System is to verify and, if necessary, modify the logging format of your web server. For the UTM to function properly, both referral and cookie information must be logged. You will need access to the configuration of the web server. The following general guidelines should work for most IIS and Apache users; however, you should check with your system administrator to ensure proper formats.

For Apache Users: Apache servers typically use a configuration file called "httpd.conf". Within this file, configuration directives determine the format and location of logfiles. By default, most Apache configurations will log in the NCSA Extended Combined format, which includes referrals and user-agents but is missing the cookie information. Be sure that your logfiles contain the "%{Cookie}i" field specification. To modify your logging format from the default, a "special" LogFormat directive can be added, and the log files can then reference this format using the CustomLog directive.
The above example is provided as a reference and does not apply to all possible Apache settings. Please refer to the Apache documentation and consult with your system administrator on the actual directives needed to activate cookie logging. The LogFormat directive specifies the format of the log file. The example shows the addition of the cookie information to the end of the log file. This format is then named "special" so that it can be identified in the virtual host configurations. The CustomLog directive in the virtual host specification identifies the location of the log file and the format to use. The example uses the "special" format as defined previously.

For Microsoft IIS users: The Internet Services Manager provides a point-and-click interface for adjusting the web server configuration. To access this manager, you will need to log in to the web server with the appropriate administrator privileges. Click on the "Start" menu > Settings > Control Panel, then double-click on the Administrative Tools folder and then on the Internet Services Manager icon to open the manager.
Modifications can be made either to each website individually or to the entire server. In the left window, either right-click on the server name to modify the entire server, or right-click on the website name, such as "mysite1.com". Select the "Properties" option to open the properties dialog box. For the entire server, click on the "Edit" button with "WWW Services" selected in the menu to bring up the Properties dialog box shown on the left below.
As shown in the above figure, be sure that logging is enabled and set to the "W3C Extended Log File Format". Then select the "Properties" button to configure the log file format specifics.
The window shown above will appear. Click on the "Extended Properties" tab, scroll down, and make sure both the "Cookie" and "Referrer" boxes are checked. If not, check these boxes and "Apply" the changes to the site. Whether you use IIS, Apache, or another web server, please refer to your server documentation for more information on configuring logfile formats. All major web servers support the logging of cookies and are easily modified to activate this feature.

4. Set Urchin configuration to UTM: The final step in configuring the UTM for your site is to enable UTM tracking in the Urchin Configuration. This is done either when the Profile is created or afterwards by editing the Profile. Open the Urchin Configuration either directly on the machine or by logging in remotely as the "admin" user. Your installation instructions will provide more details on how to access the configuration. Once it is open, click on the "Configuration" icon to the left to display a list of the existing Profiles in the configuration. To enable UTM tracking for a particular Profile, click the "Edit" button to the right of the profile name. (Note: if you have not already added the profile, do so now using the "Add" button.) After clicking on the "Edit" button, click on the "Reporting" tab to bring up the Reporting Settings window.
Under the "Visitor Tracking Options" section, use the menu to select "Urchin Traffic Monitor (UTM)" for the Visitor Tracking Method. If you explicitly set the UTM Domain in step 2, then set the UTM Domain setting in the above figure to the same value as in step 2. If you did not specify the domain in step 2, then set the UTM Domain to the address of your website without the "www.". If your website domain does not start with "www.", then use the whole address. Click the "Update" button to save your settings.

That's it. The installation is complete, and future traffic will be tracked by the UTM System.
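The domain rules in steps 2 and 4 can be summarized in a few lines of code. This is a sketch under stated assumptions: both function names are hypothetical, the first implements only the "drop a leading www." rule given above, and the second models ordinary cookie domain scoping (a cookie set for a domain applies to that host and its subdomains), not anything specific to Urchin.

```python
def utm_domain_for(host: str) -> str:
    """Return the site address without a leading "www.", per step 4."""
    host = host.strip().lower()
    if host.startswith("www."):
        return host[len("www."):]
    return host


def cookie_applies(cookie_domain: str, host: str) -> bool:
    """True if a cookie scoped to cookie_domain would be sent to host."""
    domain = cookie_domain.lower().lstrip(".")
    host = host.lower()
    # Exact match, or host is a subdomain of the cookie domain.
    return host == domain or host.endswith("." + domain)


print(utm_domain_for("www.urchin.com"))
print(utm_domain_for("shop.example.com"))
print(cookie_applies("urchin.com", "www.urchin.com"))
print(cookie_applies("urchin.com", "urchin.net"))
```

The second function also illustrates why the domain in step 2 must be part or all of the site's actual host name: a value that does not match the host would never be returned by the browser.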
SessionID Identification
Overview

Many application servers, including ASP pages, use a unique session number to identify individuals currently on the site. While this information doesn't usually contain any historical tracking, it does provide an accurate way of identifying unique sessions. Session IDs are typically located either in the URL query parameters or in a cookie that is assigned to the user. As long as this information is logged into the log file, Urchin can use it to uniquely identify each session. Using Session IDs increases the accuracy of reporting by defeating the effects of proxy servers. Using Session IDs does not provide unique visitor tracking like the UTM system, but if you already have Session IDs in place, it can be an easy way to increase session accuracy immediately.

Session ID Location

Before configuring Urchin to use Session IDs, check your log file to make sure the IDs are coming through, and make a note of the field and format. If the Session IDs are in the request, then the 'request_query' field will contain the variable string. If they are in the cookie field, then the 'cs_cookie' field should be used.
Make a note of the field and the variable name used to mark the identifier. If you don't see the ID in the log file and you are sure you are using Session IDs, check to see that the logging format contains the appropriate field.

Urchin Configuration

Once you have the Session ID information, you can easily set your Profile in Urchin to use this for visitor identification. Bring up the Urchin Administration Interface and, under Configuration, click on Edit next to the Profile you wish to configure (or click Add if you don't have the Profile configured yet). Click on the Reporting tab to bring up the "Visitor Identification" settings:
As shown in the above image, change the Visitor Tracking Method to "Session ID" and set the Session Field to either request_query or cs_cookie as determined above. Then enter what comes before and after the Session ID in the two Parsing boxes. For the first example provided at the beginning of this document, sid=12345, enter "sid=" and "" in the two boxes. Click Update and you are ready to go.
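One way to picture the two Parsing boxes is as a "before" marker and an "after" marker around the identifier. The sketch below is an interpretation for illustration only: the function name is hypothetical, and the rule that an empty or missing "after" marker means "read to the end of the field" is an assumption, not documented Urchin behavior.

```python
from typing import Optional


def extract_session_id(field_value: str, before: str, after: str) -> Optional[str]:
    """Return the text between the before and after markers, or None."""
    start = field_value.find(before)
    if start == -1:
        return None  # the marker is not present in this log field
    start += len(before)
    end = field_value.find(after, start) if after else -1
    if end == -1:
        end = len(field_value)  # no closing marker: read to the end
    return field_value[start:end]


# The example from the text: the field holds only the identifier.
print(extract_session_id("sid=12345", "sid=", ""))

# Hypothetical request_query and cs_cookie values with other parameters.
print(extract_session_id("page=2&sid=12345&lang=en", "sid=", "&"))
print(extract_session_id("lang=en; SESSIONID=abc123; theme=dark", "SESSIONID=", ";"))
```

Checking a few real lines from your own log this way before changing the Profile is a quick method of confirming that the markers you plan to enter isolate the identifier and nothing else.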
UTM QuickInstall (Apache)
The following is intended as a quick run-through on installing the UTM for websites running on an Apache server on all platforms except Sun Cobalt. For more detailed information on the UTM, please see the article entitled "Urchin Traffic Monitor (UTM)" found in this section.

Step 1: Copy UTM files to website document root. The files __utm.js and __utm.gif are located in the "util/utm" directory in the Urchin distribution. Copy these two files to the main directory of your website content. IMPORTANT: the filenames start with two underscore characters.

Step 2: Reference UTM in your HTML. Enter the following line in all of your HTML pages. While it can go anywhere in the pages, we recommend putting it in the <head> section. If you use a common include or template, you can enter it there. IMPORTANT: the filename starts with two underscores.

<script src="/__utm.js" type="text/javascript"></script>
If you are using a package like "HTML Tidy", you may want to include the Javascript line in the HEAD area of your page to make it more palatable, for instance:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<script src="/__utm.js" type="text/javascript"></script>
...
</head>
Step 3: Enable cookies in your Apache logging. If not already enabled, you can use the following httpd.conf example to enable cookie logging:
Step 4: Set Urchin Profile to use UTM. In the Urchin Administration interface, edit the profile in question and click on the Reporting tab. Set the Visitor Tracking Method to UTM. Set the UTM Domain to the address of your website without the www. When done, click the Update button. Then click on the Profile Settings tab and choose UTMEnable All for the Default Report Set, then click Update again.

That's it! Your website will now begin logging UTM data into your normal log file, which will be identified the next time you run Urchin.

Is it working?

To see if the UTM is successfully making entries in your log file, examine the log after you have installed the UTM and clicked on a few pages of the site. You should see an entry similar to the following at the end of the log file:
... "GET /__utm.gif?..." 200 ..."__utma=..."
If you don't see the __utma entries, check that cookie logging was enabled properly. If the status code is not 200, check that the files were properly copied to your document root.
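The manual check described above can also be scripted. The following sketch assumes an Apache-style log in which the request is quoted, the status code follows it, and the cookie field is appended to the line; the sample lines, including their query strings and cookie values, are made up for the example, and the function name is hypothetical.

```python
import re
from typing import Dict, Iterable

# A quoted GET request for /__utm.gif followed by a three-digit status code.
UTM_REQUEST = re.compile(r'"GET /__utm\.gif\?[^"]*" (\d{3}) ')


def check_utm_entries(lines: Iterable[str]) -> Dict[str, int]:
    """Count sensor hits, successful hits, and hits carrying a __utma cookie."""
    result = {"utm_hits": 0, "status_200": 0, "with_utma_cookie": 0}
    for line in lines:
        match = UTM_REQUEST.search(line)
        if not match:
            continue  # not a sensor hit
        result["utm_hits"] += 1
        if match.group(1) == "200":
            result["status_200"] += 1
        if "__utma=" in line:
            result["with_utma_cookie"] += 1
    return result


# Hypothetical log lines for demonstration only.
sample_log = [
    '192.0.2.1 - - [25/Jan/2005:03:03:02 -0800] "GET /__utm.gif?demo=1 HTTP/1.1" 200 35 "-" "Mozilla/4.0" "__utma=demo"',
    '192.0.2.2 - - [25/Jan/2005:03:03:05 -0800] "GET /__utm.gif?demo=1 HTTP/1.1" 404 0 "-" "Mozilla/4.0" "-"',
    '192.0.2.1 - - [25/Jan/2005:03:03:09 -0800] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0" "__utma=demo"',
]

print(check_utm_entries(sample_log))
```

If utm_hits is greater than status_200, the files are probably missing from the document root; if it is greater than with_utma_cookie, cookie logging is the likelier cause.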
Installing UTM On Every Page (Apache)
Installing the UTM sensor on every page of a website allows Urchin to provide the most accurate analytics possible. This article describes how to easily install the UTM sensor on every page of a large site.

How can I install UTM on every page?

mod_layout is an Apache module that provides both a Footer and a Header directive to automatically include output from other URIs at the beginning and end of every web page. You can use it to include the __utm.js calls on every page of a site. It is an invaluable tool for service providers who do not wish to modify their clients' web pages, as well as for single sites with a large number of web pages.

To install mod_layout:
1. Download mod_layout from tangent.org.
2. Extract the compressed file and read the README.
3. Install mod_layout as described in INSTALL.
4. Create an HTML file called utm.html.
5. Add <script src="/__utm.js" type="text/javascript"></script> to utm.html.
6. Modify your current Apache configuration file to include the utm.html file. Example:
<VirtualHost 63.212.171.4>
  ServerName urchin.com
  ServerAlias www.urchin.com
  LayoutHeader /path/to/file/utm.html
  ...
</VirtualHost>
UTM QuickInstall (IIS)
The following is intended as a quick run-through on installing the UTM for websites running on a Microsoft IIS server on any Windows platform. For more detailed information on the UTM, please see the article entitled "Urchin Traffic Monitor (UTM)" found in this section.

Step 1: Copy UTM files to website document root. The files __utm.js and __utm.gif are located in the "utils\utm" folder in the Urchin distribution. Copy these two files to the main folder of your website content. IMPORTANT: the filenames start with two underscore characters.

Step 2: Reference UTM in your HTML. Enter the following line in all of your HTML pages. While it can go anywhere in the pages, we recommend putting it in the <head> section. If you use a common include or template, you can enter it there. IMPORTANT: the filename starts with two underscores.

<script src="/__utm.js" type="text/javascript"></script>
If you are using a package like "HTML Tidy", you may want to include the Javascript line in the HEAD area of your page to make it more palatable, for instance:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<script src="/__utm.js" type="text/javascript"></script>
...
</head>
Step 3: Enable cookies in your IIS logging. Open the IIS Manager and bring up the Properties window for your website. Make sure that logging is enabled and set to the W3C Extended format. Click the Properties button next to the format and, under the Extended Properties tab, check the box next to Cookie.

Step 4: Set Urchin Profile to use UTM. In the Urchin Administration interface, edit the profile in question and click on the Reporting tab. Set the Visitor Tracking Method to UTM. Set the UTM Domain to the address of your website without the www. When done, click the Update button. Then click on the Profile Settings tab and choose UTMEnable All for the Default Report Set, then click Update again.

That's it! Your website will now begin logging UTM data into your normal log file, which will be identified the next time you run Urchin.

Is it working?

To see if the UTM is successfully making entries in your log file, examine the log after you have installed the UTM and clicked on a few pages of the site. You should see an entry similar to the following at the end of the log file:
... "GET /__utm.gif?..." 200 ..."__utma=..."
If you don't see the __utma entries, check that cookie logging was enabled properly. If the status code is not 200, check that the files were properly copied to your document root.
Using UTM with Domain Aliases
Background
Because cookies are domain-based objects, there are some important considerations when a site has multiple domains. A cookie that is set under a domain, "mysite.com", will be passed to all subdomains such as "www.mysite.com". However, this cookie will not be passed to "mysite.net" or any other root domain. If your website only has one domain responding to "mysite.com" and "www.mysite.com", you can follow the standard UTM installation. However, if you have a website with one or many aliases, it is recommended to redirect traffic from the aliases to the primary site. This ensures that the UTM visitor tracking is set under the primary domain and that all visitors are tracked consistently. Without the redirect, a visitor may appear as two visitors if they access the same site through two separate domains. The following instructions provide an example of how to redirect aliased domains to the primary domain in Apache and IIS servers.
Redirecting Aliases in Apache
If you are using an Apache webserver, the configuration can be easily modified to redirect all traffic originating under one of the aliases to the primary site. One way to do this is to create two VirtualHost entries. The first will be the primary domain, which will include your normal configuration; the second VirtualHost will be for all the aliases and will redirect to the primary. Example:
#primary virtualhost
<VirtualHost 1.2.3.4>
ServerName www.mysite.com
ServerAlias mysite.com
...
</VirtualHost>
#second virtualhost
<VirtualHost 1.2.3.4>
ServerName mysite.org
ServerAlias www.mysite.org mysite.net www.mysite.net
RewriteEngine on
RewriteRule ^(.*) http://www.mysite.com$1 [R=301]
</VirtualHost>
The second VirtualHost uses a rewrite rule with a 301 (Moved Permanently) redirect code to forward all traffic to the original site. A single 301 hit will still be recorded in the log file, which is useful for tracking which domains people are entering on, but all remaining traffic will be forced under the one domain. At this point, as far as the UTM is concerned, the site appears to be a one-domain site and is ready for normal UTM installation. Note: please work with your administrator and consult the apache.org site for details on configuration parameters.
Redirecting Aliases in IIS
If you are using a Microsoft IIS webserver, the configuration can be easily modified to redirect all traffic originating under one of the aliases to the primary site. One way to do this is to create two websites in the IIS configuration. The first will be the primary domain (www.mysite.com), which will include your normal configuration; the second will be for all the aliases (mysite.net, mysite.org, etc.) and will redirect to the primary. In the IIS Manager, right-click on one of the websites and bring up the properties dialog. On the "Web Site" tab, click the "Advanced..." button. This brings up the window where additional domains can be assigned to the website using the "Host Header" field. Set the primary domain in the primary website, and use the second website to house all of the aliases. Once the second website housing all of the aliases is configured and enabled, create a blank homepage with the following redirect code:
<head>
<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://www.mysite.com/">
</head>
This will instruct the visitor's browser to immediately redirect to the primary URL. At this point, the primary website appears to be a simple one-domain configuration, and normal UTM installation can proceed with default settings.
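To make the intended behavior concrete, here is a small Python sketch of the redirect decision. It models the Apache rewrite rule in this article (which preserves the requested path) rather than the IIS homepage refresh (which always lands on the primary homepage). The function and constant names are hypothetical, and the host names are the example domains used in this article.

```python
PRIMARY_HOST = "www.mysite.com"                     # primary domain from the example
PRIMARY_DOMAINS = {"mysite.com", "www.mysite.com"}  # hosts served without redirect

def redirect_target(host, path):
    """Return the URL an aliased request should be sent to, or None if no redirect is needed."""
    host = host.lower().split(":")[0]  # ignore case and any port number
    if host in PRIMARY_DOMAINS:
        return None
    # Same effect as: RewriteRule ^(.*) http://www.mysite.com$1 [R=301]
    return "http://" + PRIMARY_HOST + path
```

For example, a request for /foo.html on mysite.net maps to http://www.mysite.com/foo.html, while a request on www.mysite.com is served as-is.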
Using UTM with Multiple Sites
Multiple Sites Same Root Domain
Multiple sites with the same domain (e.g., www.urchin.com and help.urchin.com) can either be processed together or separately, depending on the UTM Domain setting of the two sites. If the UTM Domain is set to the default, "auto", then the two sites will be processed separately. This means that Visitor tracking information will be kept separate for each site. Visitor reporting for one site will not be affected by visitor traffic to the other site.
Process Together
If you wish to process the sites together, sharing Visitor tracking information, then the UTM Domain can be explicitly set to the common domain (e.g., urchin.com). You will need to set this in the UTM code and the Urchin configuration. To set this in UTM code, edit the __utm.js file in the document root of each site. Towards the top, you will see the line:
var __utmdn="auto"; /* ...
Change the word "auto" to the common domain:

var __utmdn="urchin.com"; /* ...
Next, in the Urchin configuration, create a single Profile with UTM activated, and set the UTM Domain to the common domain. You will be processing the logs from both sites in the same Profile. When processing the logs for two sites together, it is recommended to apply a Filter to one of the logs in order to distinguish pages and paths. For the www.urchin.com and help.urchin.com example, inserting '/help' in front of the URLs for the help.urchin.com log will allow you to distinguish between http://www.urchin.com/foo.html and http://help.urchin.com/foo.html. The resulting pages will be referenced as "/foo.html" and "/help/foo.html", respectively. Create a search and replace filter on the 'request_stem' field with the following settings:
Filter Field: request_stem
Search String: ^/
Replace String: /help/
In our example, this filter would then be applied to the log file for help.urchin.com. Running the two separate Log Sources together will require an additional Load Balanced Server Module in the license. Please contact your sales representative for details.
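The effect of this filter can be previewed outside Urchin. The sketch below uses Python's re module as a stand-in for Urchin's filter engine, so the exact regular expression dialect is an assumption, and apply_filter is a hypothetical helper name.

```python
import re

def apply_filter(value, search, replace):
    """Apply a search-and-replace filter to one field value (first match only)."""
    return re.sub(search, replace, value, count=1)

# Settings from the example: Search String "^/" and Replace String "/help/".
stem = apply_filter("/foo.html", r"^/", "/help/")
print(stem)
```

Because the search string is anchored with ^, only the leading slash is replaced; slashes later in the path are left alone.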
Tracking Flash and Browser Events (UTM5 only)
You can track any browser-based event, including Flash and JavaScript events, if you have installed the UTM5 (available at ftp://ftp.urchin.com/urchin5/utm5/) on your website. To track an event, call the urchinTracker JavaScript function with an argument specifying a name for the event. For example, calling:
javascript:urchinTracker('/homepage/flashbuttons/button1');
will cause each occurrence of the calling Flash event to be logged as though it were a pageview under the name /homepage/flashbuttons/button1. The argument must begin with a forward slash. The event names may be organized into any directory-style structure you wish. For example, if you wish to organize Flash events by page and by type of event, you might organize a hierarchy along these lines:
'/homepage/flashbuttons/button1'
'/homepage/clips/clip1'
Flash Code Examples
on (release) {
    // Track with no action
    getURL("javascript:urchinTracker('/folder/file');");
}
on (release) {
    // Track with action
    getURL("javascript:urchinTracker('/folder/file');");
    _root.gotoAndPlay(3);
    myVar = "Flash Track Test";
}
onClipEvent (enterFrame) {
    getURL("javascript:urchinTracker('/folder/file');");
}
HTML Code Examples The following illustrates how to log an onClick event:
<a href="javascript:void(0);" onClick="javascript:urchinTracker('/folder/file');">
The following illustrates how to log a rollover event:

<a href="javascript:void(0);" onMouseOver="javascript:urchinTracker('/folder/file');">
Tracking Banner Ad Exits and Other Outbound Links
If you publish advertising banners on your site, there is an easy way for you to track which banners visitors click on to leave your site and which advertisers they visit. First, make sure that you have installed the UTM5 (available at ftp://ftp.urchin.com/urchin5/utm5/) on your website. Next, you will need to add some code to each of the banners. For an animated GIF or other type of static banner ad, you would add the following code:
<a href="http://www.advertisersite.com" onClick="javascript:urchinTracker ('/bannerads/advertisername/bannername');">
This code causes each click on the banner to be logged as though it were a pageview named /bannerads/advertisername/bannername. It is a good idea to log all of your advertising banners into a logical directory structure such as /bannerads/the name of the advertiser/the name of the banner. This way, you will be able to easily identify the number of referrals to each advertiser. The equivalent code for a Flash banner is provided below:
on (release) { getURL("javascript:urchinTracker('/bannerads/advertisername/bannername');"); getURL("http://www.advertisersite.com"); }
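With banner clicks logged under a /bannerads/advertiser/banner structure, referrals per advertiser are easy to total. The following Python sketch shows the idea on a plain list of logged pageview names. It is an illustration only: the function name is hypothetical, and Urchin's own reports provide these totals without any scripting.

```python
from collections import Counter

def count_banner_clicks(paths, prefix="/bannerads/"):
    """Count logged banner clicks per advertiser from pageview names."""
    by_advertiser = Counter()
    for path in paths:
        if not path.startswith(prefix):
            continue  # an ordinary pageview, not a banner click
        advertiser = path[len(prefix):].split("/")[0]
        if advertiser:
            by_advertiser[advertiser] += 1
    return by_advertiser
```

Keeping the advertiser name as the first directory level is what makes this grouping possible, which is the reason for the recommended naming scheme.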
Chapter 3: Urchin Administration
Administration Overview
Introduction
The Urchin Administration Interface is a browser-based command center from which you can control virtually everything related to running Urchin, including setting up Profiles, scheduling log processing events, managing Users and Groups, configuring Filters, and much more.
To get started, log in to your Urchin system using a browser. If the default port was used during installation, then the URL should be http://your.server.com:9999/, replacing 'your.server.com' with the actual name of the system Urchin is running on. Alternatively, http://localhost:9999/ can be used if you are directly on the system. On Windows platforms, there is an 'Urchin Administration' shortcut in the Start menu. The default password for the 'admin' account is 'urchin'. Be sure to change this to a more secure password.
Controls
After logging into the system and proceeding through the startup wizard, you will see the administration screen with a menu in the left-side navigation. The three primary buttons are 'View Reports', 'Configuration', and 'Preferences'. Click on the 'Configuration' button to begin configuring Urchin.
This menu provides access to all of the critical configuration controls. Click on the arrows to expand a particular section. The darkened color indicates which control is currently being displayed. When first clicking on one of the configuration sections, a list of existing entries may be shown with appropriate 'edit' buttons next to each entry.
Clicking on the 'Edit' button next to a particular entry will allow you to modify the configuration for the entry. To add new entries, click the 'Add' button in the upper right shown in the above figure. After clicking 'Edit' on a particular entry, the set of configuration screens available for that entry is shown using tabs across the top to select the particular configuration subject.
Click on a particular tab to access the configuration settings under the tab. After changing any settings, be sure to click the 'Update' button provided at the bottom of each screen. Once you have a long list of entries in a particular area, there are some additional controls that make it easier to find those entries. The Next/Previous buttons, located just above and below the list of entries, scroll through the entries. The number shown can also be used to increase how many entries are displayed at one time.
Shown in the above image, the + Filter option can help you quickly find a particular entry. Simply enter all or part of the entry's name and press return. Details about each section are provided further in this manual and by clicking the 'Help' link provided at the bottom of each admin screen. Definitions about each configuration parameter are generally found by clicking the 'Help' link.
Profiles
Importing Profiles (Windows)
Overview
Urchin's Import Profiles function is a convenient way for users with systems running the Microsoft Internet Information Server to set up Profiles for each of their IIS sites quickly. Urchin can read the IIS configuration, determine what websites are running on the server, and then build basic Profiles for each website that use the IIS logs as their Log Sources. You can then customize the Profiles or add additional Profiles as desired for the imported sites.
How to Import Profiles
To get started importing Profiles, log in to the Urchin administration system as admin and click on the Configuration button at left. Click the Import button at top-right and you will be taken to the Import Profiles screen. This screen allows you to select which, if any, websites to import. Once you've checked the sites to import, click the Import button. Click Done when you've finished with all your import choices.
Recommendations
It's a good idea to create at least one Profile for each website on the server so that you get a complete picture of traffic to the server via Urchin's Summary Report. The Summary Report gives you overall traffic information for the server, as well as a ranking of each site by various traffic parameters. This is very handy if you are a host and bill according to bandwidth usage. Note that the Summary Report only shows data based on Profiles that have been configured; sites without functioning Profiles are not included.
Working with Profiles
Overview
A Profile is the term used for a set of reports for a website and the configuration settings needed to create those reports. In general, you will need to set up a Profile for each website for which you want reporting. If needed, multiple Profiles can be used for the same website with different filtering options. The configuration of a Profile includes information about the website, log file sources, filters, and the schedule for processing. Once a Profile is created and configured, it needs to be 'Run' in order to process raw log file data.
Licensing Information
The Urchin base license includes 100 Profiles. If you need more Profiles, the license can be upgraded by contacting your sales representative or by clicking on the Settings > License > Upgrade link within the configuration. The base license also includes one server per Profile. If you need additional Load Balanced Servers, you will need to upgrade your license.
Creating a Profile
To get started creating a Profile, log in to the Urchin administration interface as an Urchin admin and click on the Configuration button at left. To create a new Profile, click the Add button at top-right as shown in the image below. You will be taken to the Add Profile Wizard. This is a simple series of steps designed to help you get the Profile set up in basic form quickly and easily. Each screen in the Wizard has explicit help information that is available by clicking on the ? icon.
Once a Profile is created, the configuration can be modified by clicking the 'Edit' button next to that entry in the list. Tabs are provided at the top of the configuration area to easily access the different configuration screens.
Recommendations
Urchin has several different methods for identifying visitors and sessions, depending on available information. Of these, the patent-pending Urchin Traffic Monitor (UTM) is a highly accurate system that was specifically designed to identify unique visitors, sessions, exact paths, and return frequency behavior. There are a number of visitor loyalty and client reports that are only available when using the UTM System. The UTM System is easy to install and is highly recommended for all businesses. To install UTM, please refer to the UTM install instructions in the Visitor Tracking section of this documentation. If you intend to set up one or more Filters in conjunction with your Profile, it is advisable to have more than one Profile for that website or part thereof. We recommend having one Profile that is the "master"; it contains everything. If you wish, for example, to filter out spiders or robots, it's a good idea to put these Filters in a second Profile so you can easily compare the results of the Filters to the master Profile.
Log Files
Working with Log Sources
Overview
You will generally add a Log Source in the course of creating a Profile. A Log Source is Urchin's way of identifying the characteristics of an access log (sometimes called a transfer log) for one of your websites. Access logs contain all the hits, or requests for web documents, that are made to your website. Some of the log file characteristics that are associated with a Log Source are the path to the log file, the format of the log file (e.g. W3C or NCSA), whether the log is local or on a remote system, and whether a filter should be applied to the log file during processing. An important concept to understand is that Log Sources exist independently of Profiles. Every Profile must have at least one Log Source associated with it to obtain reporting. However, several Profiles could conceivably use the same Log Source. For example, you may want to create multiple Profiles using the same Log Source, but give each Profile a different filter to produce varying report results. So there is not necessarily a 1:1 ratio between Log Sources and Profiles.
Configuring Log Sources
To get started adding a Log Source to the system, log in to the Urchin administrative system as the administrator and click on the Configuration button at left. Next, click the Log Manager button. To create a new Log Source, click the Add button at the top right of the screen. You will be taken to the Add Log Source Wizard. This is a simple series of steps designed to help you get the Log Source set up quickly and easily. Each screen in the Wizard has explicit help information to explain the configuration information displayed on that screen. In the Log Settings screen you will note that you have to choose a Log Format. This setting tells Urchin how the data in your log file is arranged. It is important that you select the correct format for your log or Urchin will not be able to produce meaningful report data.
Urchin understands a default set of log formats that you can choose from via a drop-down menu. They are:
Auto: Urchin uses this format to automatically detect NCSA, W3C, Netscape, ELF, and ELF2 log formats. Instead of explicitly selecting one of these, you may choose Auto and Urchin will correctly deduce how to read the data if your log format is in this list.
NCSA: Apache modified Extended/Combined format (see Logging Apache and IIS for a description of this format).
W3C: Microsoft IIS servers typically use this format, although other webservers can also be configured to produce W3C logs.
Netscape: Netscape and iPlanet servers use this format by default.
ELF/ELF2: ECommerce Log Format; see the specification in the Ecommerce Module section for details.
Google: If you have licensed the Campaign Tracking Module, use this format for logs containing Google cost-per-click spending data. Note that the Google log format cannot be auto-detected.
Overture: If you have licensed the Campaign Tracking Module, use this format for logs containing Overture cost-per-click spending data. Note that the Overture log format cannot be auto-detected.
Custom: Although not initially listed in the drop-down menu, you can create your own custom log formats, which will automatically appear in the drop-down menu when properly configured. Please refer to the "Custom Log Formats" article in the Advanced Topics > Customization section of the Documentation Center.
If you don't believe your webserver currently produces logs in one of the recognized default formats, you can either reconfigure your webserver to log in one of these formats, or create a custom log format that conforms to how your webserver currently logs. If you want to reconfigure your webserver logging, it is recommended that you choose W3C or NCSA style logging.
Load Balancing and Parallel Log Processing
If you have purchased a Load Balancing License, the Log Source Wizard provides a Parallel Log Processing option. When Parallel Log Processing is enabled, Urchin opens all of the log files at once and reads them in a rotating fashion, one section at a time, each section corresponding to 15 minutes of log activity. Enabling Parallel Log Processing significantly increases performance on load balanced sites.
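The rotating read order can be pictured with a short Python sketch. This is not Urchin's implementation, only an illustration of the idea under simplifying assumptions: each log is represented as a time-sorted sequence of (timestamp in seconds, line) pairs, and read_in_sections is a hypothetical name.

```python
import heapq

SECTION_SECONDS = 15 * 60  # one section = 15 minutes of log activity

def read_in_sections(*sources):
    """Interleave time-sorted (timestamp, line) streams one 15-minute section at a time."""
    def section(entry):
        return entry[0] // SECTION_SECONDS  # index of the 15-minute window

    # Entries are ordered by section only. Within one section, CPython's
    # heapq.merge resolves ties in argument order, so each source's section
    # is read in full before moving on to the next source.
    return heapq.merge(*sources, key=section)
```

Reading this way keeps entries from the different servers roughly in time order without loading any whole log into memory.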
Log Management
Overview
Log management is an important concern when running software such as Urchin. Because busy sites will build up large log files fairly quickly (up to several gigabytes in one month in some cases), log management should be considered carefully. It is recommended that a standard log rotation practice be established. Compressing and otherwise archiving files offline are standard practices. Please see the article on Log Rotation Best Practices in this section for further information on establishing such a procedure. Log management is necessary only for disk resource usage considerations, not for purposes of avoiding reprocessing data. Urchin does not need any sort of log rotation to avoid data duplication, as it is equipped with a log tracking capability that ensures that previously read log data is not reprocessed. Because Urchin should never need to reread a log file once it has been processed, at your discretion you may delete the log(s) after each processing run. However, it is not uncommon to keep old logs for a specified amount of time for historical or auditing reasons.
Managing Logs via Urchin
Each Log Source has a Log Destiny setting with the options Don't Touch, Archive/Compress, and Delete. Once all Profiles that are utilizing a Log Source have finished their processing, Urchin uses the Log Destiny setting to determine the disposition of the Log Source. The Log Destiny setting is accessible under the Advanced Settings tab for a given Log Source. It is recommended to set Log Destiny to Archive/Compress so that you save disk space if you want to keep your logs for some period of time. If you are comfortable with a log being removed once it has been processed, you can choose a Log Destiny of Delete. However, realize that this means you will not have the option of rerunning Urchin against that log in the future unless you have a backup elsewhere.
Considerations
A few special situations should be noted:
Do not use the Archive/Compress or Delete options with a Log Source if you are processing live logs. A live log is one that is being actively written to by a webserver. Using these settings with a live log will cause a loss of data.
If Log Destiny for a remotely retrieved Log Source is set to Don't Touch, then that log will grow continually unless there is some process external to Urchin that is handling log management on the machine where the log is created. Since Urchin must transfer a copy of the remote logfile to the local system before processing, as the log file grows it will take Urchin longer and longer to transfer the file. This will have the side effect of lengthening your overall Urchin run time.
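For sites that handle log disposal outside Urchin, the three Log Destiny behaviors can be approximated with a short script. The Python sketch below is an assumption-laden illustration, not Urchin's code: the function name and the option strings are hypothetical, and gzip is simply one reasonable choice of compression.

```python
import gzip
import os
import shutil

def apply_log_destiny(path, destiny):
    """Dispose of a fully processed, rotated log. Returns the resulting path, or None if deleted."""
    if destiny == "dont_touch":
        return path
    if destiny == "archive":
        archived = path + ".gz"
        with open(path, "rb") as src, gzip.open(archived, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the log into the compressed copy
        os.remove(path)                   # remove the original only after compressing
        return archived
    if destiny == "delete":
        os.remove(path)
        return None
    raise ValueError("unknown destiny: " + destiny)
```

As with the built-in setting, a script like this should only be pointed at rotated logs, never at a live log that the webserver is still writing.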
Log Rotation Best Practices
Overview
It is very typical in most operating environments for the system services and applications such as webservers to generate logfiles that record actions and events related to those services. In most cases, it is also standard practice for the operating system and/or applications to perform regular maintenance on the logfiles to keep the size of the logfiles in check. This prevents the logfiles from growing without bounds and eventually exhausting disk space. A common approach to managing logs is to have a regularly scheduled log rotation task that renames the existing logs with a timestamp and then restarts the service or application with a new, zero-length logfile. It is also a standard practice for the log rotation task to compress the old logfiles, and to delete logfiles after a certain age or rotation cycle threshold has been reached. In the specific case of webserver logs, the rotation is usually handled on a daily basis to ensure that the logs remain at a manageable size. In addition, a daily rotation schedule is generally a good granularity to facilitate post-processing of webserver logs with an analysis tool like Urchin. Some webservers such as Microsoft's IIS have built-in log rotation functionality, which, when enabled, will rotate logs on a daily basis by default. Other webservers such as Apache have no explicit log rotation handler, but provide tools for easily restarting the webserver (without loss of web service) to accommodate the log rotation operation (e.g. apachectl restart).
Log Rotation in Previous Versions of Urchin
Unlike Urchin 4 and 5, previous versions of Urchin have no built-in log tracking mechanism to determine which logs have already been processed, so those earlier Urchin versions depend heavily on a reliable log rotation scheme to ensure that logs are only processed a single time. As such, pre-Urchin 4 versions have the option of providing simple log rotation functionality and the ability to restart the webserver as part of the overall processing duties. If this Urchin log rotation mechanism is not utilized, the responsibility of reliable log rotation must be handled completely by an external log management mechanism. This has traditionally been the function of a larger overall system log management scheme provided as part of the operating system (e.g. the open-source "logrotate" found in many Linux distributions).
Log Rotation Practices with Urchin 5
With the advent of Urchin 4, the need for log rotation to avoid duplicate processing of logs has been eliminated thanks to Urchin's Log Tracking technology. This allows Urchin 5 much greater flexibility in processing of logs, such as the ability to process "live" logs that are still being written by the webserver, or to process logs that are rotated on a manual or irregular basis. Important Note: Unlike previous versions of Urchin, Urchin 5 does not provide hooks for invoking a log rotation procedure or restarting a webserver after log rotation tasks have been performed, although certain post-log-processing actions are possible as described below. While Urchin 5 operation does not require that webserver logs be rotated regularly or at all, it is recommended that a standard log rotation scheme be implemented to ensure smooth operation and to keep the Log Tracking utility from having to do a lot of unnecessary processing. It is much more efficient from both a system and application standpoint to manage several smaller logs than one very large log, as file operations tend to slow considerably as files get larger.
Smaller files are also much easier to back up and restore in the event of a disk failure or other system failure. Log rotation mechanisms needn't be overly complex; in most cases, a simple shell script or Perl script run daily from cron on UNIX-type systems is all that is necessary. The script merely needs to rotate the existing webserver log and timestamp it (using the %Y%m%d or YYYYMMDD formats is recommended), and restart the webserver. Additional logic can be added to prune old logfiles to keep disk space usage in check. A sample log rotation script written in Perl can be downloaded from http://www.urchin.com/support in the Helper Scripts area. This script rotates one or more logs and timestamps them appropriately, then removes logs that are older than a certain number of days (configurable). Note: If you are running IIS on a Windows system, the log rotation functionality is included as part of the IIS management and no external script is needed.
Configuring Urchin 5 for Use with Log Rotation
Once you have your log rotation scheme in place, it is a simple matter to configure Urchin to process your rotated log. When configuring a Log Source, you can either set up the Log File Path specification to use a wildcard which matches the timestamped log filename pattern (e.g. accesslog.* for Apache logs or ex*.log for IIS logs) or you can use Urchin's built-in timestamp pattern matching (e.g. accesslog.%Y%m%d for Apache, ex%y%m%d.log for IIS). When Urchin encounters this pattern, it will substitute yesterday's date for the %Y%m%d pattern and process the log with the resulting filename (e.g. accesslog.20020617). For further information on the date matching pattern, please see the article in this section entitled Wildcard & Date Substitution in Log Paths.
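To see which filenames a given Log File Path would select, the two approaches can be imitated in a few lines of Python. This is a sketch under the assumption that the date codes behave like standard strftime codes and that wildcards behave like shell-style patterns; both function names are hypothetical and the script is not part of Urchin.

```python
import datetime
import fnmatch

def resolve_log_path(pattern, today=None):
    """Substitute yesterday's date into a strftime-style log path pattern."""
    today = today or datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    return yesterday.strftime(pattern)

def match_wildcard(pattern, filenames):
    """Return the filenames that a shell-style wildcard pattern would select."""
    return sorted(fnmatch.filter(filenames, pattern))

# A run on 18 June 2002 would look for the log dated 17 June 2002.
print(resolve_log_path("accesslog.%Y%m%d", datetime.date(2002, 6, 18)))
```

The date pattern names exactly one file per run, while the wildcard selects every matching file in the directory, which is the trade-off discussed in this article.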
The wildcard specification has the advantage of allowing you to place a number of unprocessed logs in a single directory and have Urchin process them the next time it runs. This is especially convenient for handling situations where the expected logfiles are not in place when Urchin runs, e.g. due to a remote webserver being down or loss of network connectivity. The disadvantage is that Urchin must open up the directory and search each log file to determine if it has already been processed, and this can induce significant overhead when many log files are resident in the directory. If you deem your log rotation scheme to be reliable, using the YYYYMMDD pattern matching scheme is a more efficient method. You may also wish to have Urchin 5 delete or archive/compress the log once it has been processed. Different Log Destiny options can be set in the Advanced Settings of a Log Source. For more information on these Log Destiny settings, please see the Log Management document in the Log Files section of the Urchin Administration area. Important! Log Destiny options should not be used with live logs that have not been rotated!
Configuring Log Rotation on UNIX-type systems
Due to the large variation in operating system functionality and webserver configurations, and the high likelihood that log rotation procedures are highly site-specific, there is no cookbook method for establishing webserver log rotation on UNIX-type systems. However, a sample log rotation script called WebLogRotate is available from the Urchin web site in the Helper Scripts area. This script is written in Perl to make it as portable as possible, and is typically invoked from cron on a daily basis.
Configuring Log Rotation for Windows IIS Webservers
As mentioned above, the management functions of IIS allow for automatic log rotation of webserver logs, though this functionality is not enabled by default. Please follow the steps below to configure an IIS webserver for proper log rotation.
It is recommended that the logs be rotated daily, and that the log rotation be set to happen in relation to local time. By default, IIS will rotate logs at midnight GMT rather than local time. Under Windows 2000, you should ensure that the IIS webserver is configured properly to do log rotation. This is accomplished using the Computer Management function of Windows 2000. Windows NT, Windows XP and Windows 2003 Server utilize a similar procedure. To open Computer Management and establish log rotation, perform the following actions:
Click Start > Settings > Control Panel
Double-click Administrative Tools
Double-click Computer Management
Double-click on Internet Information Services
Right-click on Default Web Site and select Properties
In the pop-up window, select the Web Site tab
At the bottom of the window, click on the Properties button
Click the Daily radio button under the New Log Time Period heading
Click the Use local time for file naming and rollover checkbox
This will ensure that IIS rotates the webserver logs on a daily basis just after midnight.
Logging Apache and IIS
Overview
It is critical to set up your webserver logging in a format that allows Urchin to properly interpret the data and produce fully detailed reporting. This article explains the process for the most common webservers, Apache and Microsoft IIS. For maximum reporting depth, it is important to enable logging to include Referral and User Agent information. To enable unique visitor reporting when using the Urchin Traffic Monitor (UTM), it is additionally required to enable cookie logging. UTM-based tracking is the only way to get true unique visitor reporting. It's advisable, although not required, that you decide whether you want to use UTM prior to changing your webserver logging. If so, you should enable cookies in your logs now. It will not hurt if you enable cookies but do not install UTM on your website immediately. You may want to look over the section on Visitor Tracking to familiarize yourself with the UTM installation before proceeding.
Configuration
Apache
By default, Apache generally logs in what's called common log format, and also provides an option to log in a more detailed format known as NCSA extended/combined log format. For optimal reporting, Urchin requires a variation of the NCSA extended/combined format. To configure Apache to use the appropriate format do the following:
1. Make a backup copy of your httpd.conf file. Then use a text editor to open your original httpd.conf.
2. Locate the section containing lines that begin with the word LogFormat.
3. Insert a new LogFormat line using one of the forms shown below, depending on whether you will be using UTM or not. The LogFormat entry must be added to your configuration file as a single line without carriage returns or line breaks. Pay close attention to entering all the characters correctly.
For websites that will not use UTM:
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" urchin
For UTM-enabled websites:
LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" urchin
The word "urchin" at the end of the LogFormat line is a nickname that will be used elsewhere in your httpd.conf to apply this format to a log file. This string can be anything you choose. Using "urchin" will help identify that this entry was created to accommodate Urchin processing.
4. Examine the <VirtualHost> entry for which you wish to enable this new logging format. Deactivate any existing TransferLog or CustomLog entries within a <VirtualHost></VirtualHost> group by inserting a # in front (e.g. TransferLog becomes #TransferLog). Then insert the following new CustomLog entry, replacing the string path_to_log with the appropriate path to your log location:
CustomLog path_to_log/access.log urchin
If you chose some identifier other than "urchin" as the nickname for your LogFormat entry earlier, use that nickname in place of "urchin" in the CustomLog entry.
5. Save the edits to your httpd.conf file.
6. IMPORTANT! Check the syntax of your new httpd.conf by running the command:
apachectl configtest
This should produce the response Syntax OK. If not, double-check your httpd.conf file and fix any errors. If you cannot get the correct response, do not continue with this procedure. Instead, make a backup copy of your edited file, then restore the original by overwriting this version with the copy of httpd.conf you saved at the start of this procedure. This will ensure that your webserver continues to work normally while you figure out what is wrong with your changes.
7. Once you have confirmed the syntax of your httpd.conf, restart Apache. The preferred method is by calling the apachectl script, which is typically installed with Apache:
apachectl restart
8. Check the logging. Open a browser and hit the site in question a few times. Then examine the last few lines of the log file specified in your CustomLog entry. You should see that several recent hits have been written to the log.
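The check in step 8 can also be scripted. The following is a minimal Python sketch, not part of Urchin, that tests whether a log line has the field layout produced by the LogFormat entries in step 3. The names URCHIN_LINE and parse_line are hypothetical, and the sketch assumes that quoted fields contain no embedded double quotes.

```python
import re

# Field layout of the "urchin" LogFormat: host, virtual host, user, time,
# request, status, bytes, referer, user agent, and an optional cookie field.
URCHIN_LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"(?: "(?P<cookie>[^"]*)")?$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = URCHIN_LINE.match(line.strip())
    return match.groupdict() if match else None

if __name__ == "__main__":
    sample = (
        '64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] '
        '"GET /index.html HTTP/1.1" 200 3017 "http://www.urchin.com/" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
    )
    fields = parse_line(sample)
    # cookie is None when cookie logging is not enabled
    print(fields["vhost"], fields["status"], fields["cookie"])
```

A line that returns None most likely comes from a log still written in the common or unmodified combined format, which lacks the virtual host field.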
For the Urchin modified extended/combined log format, a log line will look similar to this:
64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] "GET //var/www/urchin_helptest/images/urchin_header_logo.gif HTTP/1.1" 200 3017 "http://www.urchin.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
If you have configured UTM on your site and have turned on cookie logging, a log line will look similar to this:
64.40.51.27 www.urchin.com - [28/Aug/2002:15:11:01 -0700] "GET //var/www/urchin_helptest/images/urchin_header_logo.gif HTTP/1.1" 200 3017 "http://www.urchin.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "__utma=171060324.1378004559.1063331913.1063334677.1063521838.3; __utmb=171060324; __utmc=171060324"
Note the additional UTM cookie information at the end of the line.
Microsoft Internet Information Server (IIS)
Note: Microsoft IIS uses a W3C logging format.
Urchin can provide very basic reporting if your IIS log files have, at the very least, the following fields:
  date
  time
  c-ip
  cs-uri-stem
  sc-status
  sc-bytes
These are required fields. Without them you will not get meaningful reporting. However, this minimal logging does not provide enough information for Referral and Browser reporting. Therefore it is advisable to set more detailed logging properties for your IIS server.
IIS logging properties are configured either separately for each domain on the server, or globally. For servers with more than a few domains, the global option is recommended. The following steps will ensure that the required log file fields are being recorded. If you elect to log additional fields, Urchin will simply ignore them at processing time. However, logging unneeded fields will increase the size of your log files, so it is best to log only the fields needed by Urchin.
1. Launch the IIS services management tool by going to Start > Programs > Administrative Tools > Computer Management.
2. Expand the Services and Applications tree, then select Internet Information Services, which should bring up a list of websites (except on Windows 2003 Server, which requires that you further expand the Web Sites folder to get a listing of sites).
3. Right-click on the entry for the site you want to modify and select Properties.
4. Select the Web Site tab and, in the section at the bottom of this screen, verify that the Enable Logging checkbox is checked. Then from the Active Log Format dropdown menu choose W3C Extended Log File Format.
5. Click on the Properties button next to the Active Log Format box.
6. Select the Extended Properties tab.
7. Check the boxes for the following fields:
  Date [ date ]
  Time [ time ]
  Client IP Address [ c-ip ]
  User Name [ cs-username ]
  Method [ cs-method ]
  URI Stem [ cs-uri-stem ]
  URI Query [ cs-uri-query ]
  Protocol Status [ sc-status ]
  Bytes Sent [ sc-bytes ]
  User Agent [ cs(User-Agent) ]
  Referer [ cs(Referer) ]
  Cookie [ cs(Cookie) ] (This field is only required for UTM tracking)
8. Make sure the Process Accounting box is unchecked, as it does not provide useful web access activity information.
9. Select Apply and OK on each window to save your settings.
10. It is not necessary to restart IIS. Your logs should immediately begin logging according to the new settings.
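To confirm that the new settings took effect, the "#Fields:" directive that IIS writes at the top of each W3C extended log can be compared against the field lists above. The Python sketch below is illustrative only; check_fields and the three set names are hypothetical, and the field names are the hyphenated W3C names as IIS writes them in the log.

```python
# Fields Urchin needs for basic reporting, per the list of required fields.
REQUIRED = {"date", "time", "c-ip", "cs-uri-stem", "sc-status", "sc-bytes"}
# Fields recommended for full reporting (step 7).
RECOMMENDED = {"cs-username", "cs-method", "cs-uri-query",
               "cs(User-Agent)", "cs(Referer)"}
# Only needed when UTM tracking is in use.
UTM_ONLY = {"cs(Cookie)"}

def check_fields(header_line, utm=False):
    """Return (missing_required, missing_recommended) for a '#Fields:' line."""
    prefix = "#Fields:"
    if not header_line.startswith(prefix):
        raise ValueError("not a W3C #Fields directive")
    present = set(header_line[len(prefix):].split())
    wanted = RECOMMENDED | (UTM_ONLY if utm else set())
    return REQUIRED - present, wanted - present

if __name__ == "__main__":
    header = "#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes"
    missing_required, missing_recommended = check_fields(header, utm=True)
    print("missing required:", sorted(missing_required))
    print("missing recommended:", sorted(missing_recommended))
```

An empty first set means basic reporting is possible. Anything in the second set points to a checkbox that still needs to be enabled for Referral, Browser, or UTM reporting.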
Logging iPlanet
Overview
This article provides a brief overview of how to configure logging for an iPlanet webserver to facilitate proper processing and reporting for Urchin. Use the "Netscape" type for the Log Source setting.
There is a set of minimally required fields necessary for Urchin to produce reports. They are:
  date
  time
  hostname or IP address of requesting system
  request (i.e. what document did the requesting system ask from your webserver)
  status code generated by request (numeric)
  bytes (bytes transferred from server to client)
In addition, for the most complete reporting you need the following fields:
  referral
  user agent
  cookies (if the Urchin Traffic Monitor is installed on your site)
Configuration
Init fn=flex-init access="$accesslog" format.access="%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%] \"%Req->reqpb.clf-request%\" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length% \"%Req->headers.user-agent%\" \"%Req->headers.referer%\" \"%Req->headers.cookie%\""
Logging: Tomcat (Apache Jakarta Project)
Overview
This article describes how to configure the Tomcat webserver for use with Urchin.
Standard logging format without cookies:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="access_log" suffix=".log" pattern="%h %v %u %t &quot;%r&quot; %s %b &quot;%{Referer}i&quot; &quot;%{User-Agent}i&quot;" resolveHosts="false"/>
You must have Tomcat 5 to log cookies:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="access." suffix=".log" pattern="%h %v %u %t %r %s %b %{Referer}i %{User-Agent}i %{Cookie}i" resolveHosts="false" />
Logging Other Webservers
Overview
This article provides a brief overview of how to configure logging for webservers other than Apache and IIS to facilitate proper processing and reporting for Urchin. Urchin will process any webserver log as long as it can understand how the data is organized in each log file entry. The information in this article applies only to access logs. If you are interested in details of ecommerce logging, please see the Ecommerce Module section of the Documentation Center.
Regardless of your webserver type or logging format, there is a set of minimally required fields necessary for Urchin to produce reports. They are:
  date
  time
  hostname or IP address of requesting system
  request (i.e. what document did the requesting system ask from your webserver)
  status code generated by request (numeric)
  bytes (bytes transferred from server to client)
In addition, for the most complete reporting you need the following fields:
  referral
  user agent
  cookies (if the Urchin Traffic Monitor is installed on your site)
Configuration
The specifics of how to change logging characteristics for every webserver would be too cumbersome to list. In general the easiest approach is to configure your logging to conform to either Urchin's NCSA or W3C form, then choose the appropriate default format from the Log Format dropdown menu in the Log Source. If your webserver can support this approach, see the document Logging Apache and IIS. The information there on how the necessary data fields are set up may be useful to you, even though the details of the methods for making the changes will not necessarily apply to your webserver.
Wildcard & Date Substitution in Log Path
Overview
Urchin 5 allows you to specify wildcard and date matching variables in the path to a log file. When an Urchin task is executed and the log path is read, these variables are converted and compared for matches with the directories and filenames on your system. The date matching capabilities in Urchin 5 are more extensive than those provided in previous versions of Urchin, namely:
  Date substitution may happen at any point in the pathname of the logfile; previous versions of Urchin only allowed substitutions in the actual filename specification.
  A more robust and flexible date pattern matching algorithm has been implemented, although the previous YYYYMMDD-style pattern matching is still supported for backward compatibility.
The most commonly used time matching variables and formats are shown next. The full set of all supported time formatting variables is listed at the end of this article.
  *     an asterisk matches zero or more consecutive characters
  DD    is replaced by the 2 digit numeric day of the month, e.g. 01-31
  %d    is equivalent to DD
  MM    is replaced by the 2 digit numeric month, e.g. 01-12
  %m    is equivalent to MM
  YY    is replaced by the 2 digit numeric year, e.g. 00-99
  YYYY  is replaced by the 4 digit numeric year, e.g. 0001-2003
  %Y    is equivalent to YYYY
Note that the asterisk in this context behaves like filename matching in a UNIX or DOS command shell, not like regular expression matching, where this character would match zero or more instances of the preceding character.
These variables can be combined in any way the user chooses. The list below shows how instances of these variables would translate on 08/13/2003. Note that the day specifiers DD and %d are converted to 12, the day before the 13th.
  YYYYMMDD would translate into 20030812
  %Y%m%d would translate into 20030812
  %Y/%m/%d would translate into 2003/08/12 (note that this has implications in a path)
  *YYYYMMDD would match any filename ending in the string 20030812
The DD and %d day specifiers are converted to the previous day by default because of the way webserver logs and Urchin processing are typically managed. Your logs will usually be rotated daily to keep them from growing too large and so that each log contains primarily data for a single day. This rotation happens most frequently just before midnight. Urchin processing would usually occur after this, when the clock has moved past midnight to the next day of the month. If you were adding a YYYYMMDD-style timestamp to your log file name as it is rotated, then that date and Urchin's run time would differ by one day. Evaluating the day conversion against the date on which Urchin is run would result in a failure to find the correct log name, since the log timestamp would read 20030812 but Urchin would be executed on 20030813.
Although this is the most common model for log management, it is not the only option, so Urchin has a configuration parameter that controls the manner in which these variables are resolved to a particular day. The Date/Time Wildcard Substitution in Log Path Name setting can be used to adjust a time offset that controls how DD and %d are evaluated. This setting is explained in greater detail at the end of this article.
The year, month, and day variables can be used either in the log file name or in the directory/folder path to the log file. The asterisk can only be used in the filename portion of the log file path. In addition, time format variables can be repeated within a log source path, but the asterisk may only be used once. The examples in the Procedure section will help clarify this.
Procedure
When creating or editing a Log Source, use the time variables in the path you enter in the Log File Path box under the Log Settings tab. As an example, a typical daily Apache webserver log rotation scheme creates a log with the datestamp indicating the date of the log entries, e.g. at 1 minute after midnight on 07/16/2002 the log rotation mechanism archives the log:
/var/log/httpd/access.log
and saves it as
/var/log/httpd/access.log.20020715
To match this pattern in the log source for an Urchin Profile, you'd simply specify
/var/log/httpd/access.log.YYYYMMDD
in the Log File Path and Urchin will automatically look for the previous day's log when it runs that day. As another example, when Microsoft's IIS webserver is configured to rotate logs daily, it will name the logfile and include the current date as part of the filename, e.g. ex021127.log. Therefore, to process a daily IIS log, you would use a logfile specification something like:
C:\WINNT\System32\LogFiles\W3SVC1\exYYMMDD.log
in the Log File Path field of the Log Source for the Profile. To allow Urchin to process logs that are rotated more frequently than just a daily basis, you can use a combination of the YYYYMMDD syntax and wildcards to match all logfiles created the previous day. To do this, you would need to ensure that the rotated log file was named consistently, e.g. with an hour appended to the filename. In the Log File Path specification, you'd then use a pattern such as:
/var/log/httpd/access.log.YYYYMMDD*
or
C:\WINNT\System32\LogFiles\W3SVC1\exYYMMDD*.log
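The substitution and wildcard behavior described so far can be approximated in a few lines of Python. This is a sketch of the documented behavior, not Urchin's implementation: expand_log_path and matching_logs are hypothetical names, only the YYYY, YY, MM, DD, %Y, %m and %d variables are handled, and the naive string replacement would also rewrite a literal "MM" or "DD" that happens to appear elsewhere in a path.

```python
import fnmatch
from datetime import datetime, timedelta

def expand_log_path(pattern, now, offset_hours=-24):
    """Replace date variables using the time `offset_hours` away from `now`.

    The default of -24 hours mirrors the documented default of resolving
    to the previous day.
    """
    ref = now + timedelta(hours=offset_hours)
    substitutions = [
        ("YYYY", f"{ref.year:04d}"), ("%Y", f"{ref.year:04d}"),
        ("YY", f"{ref.year % 100:02d}"),   # handled after YYYY
        ("MM", f"{ref.month:02d}"), ("%m", f"{ref.month:02d}"),
        ("DD", f"{ref.day:02d}"), ("%d", f"{ref.day:02d}"),
    ]
    for token, value in substitutions:
        pattern = pattern.replace(token, value)
    return pattern

def matching_logs(pattern, filenames, now):
    """Expand the date variables, then apply shell-style '*' matching."""
    expanded = expand_log_path(pattern, now)
    return [name for name in filenames if fnmatch.fnmatchcase(name, expanded)]

if __name__ == "__main__":
    run_time = datetime(2003, 8, 13, 0, 5)  # Urchin runs just after midnight
    print(expand_log_path("/var/log/httpd/access.log.YYYYMMDD", run_time))
    hourly = ["access.log.2003081200", "access.log.2003081223",
              "access.log.2003081300"]
    print(matching_logs("access.log.YYYYMMDD*", hourly, run_time))
```

Because fnmatch implements shell-style matching, the asterisk here behaves as the text describes: it stands for any run of characters rather than repeating the preceding character.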
A more complex usage would be one where logs are stored in directories named so that they reflect the year, month, and day. Suppose you had the following directory paths for storing logs:
  /logs/2003/07
  /logs/2003/08
  /logs/2003/09
and you kept all logs for a given month in their respective directories, and each log had the day of the month appended to it (e.g. access.log.01, access.log.02). To allow Urchin to figure out which logs to process you could use one of the following log path formats:
  /logs/YYYY/MM/access.log.DD
  /logs/%Y/%m/access.log.%d
At log processing times, Urchin will then process all logs matching yesterday's date pattern, with any suffix. As with any use of wildcards in the Log File Path field specification, it is important that Log Tracking for the Profile be enabled to ensure that Urchin does not reprocess logs.
Considerations
To determine the date for the replacement pattern, Urchin subtracts 24 hours from the current time, based on the local time. It will properly handle month and year boundaries. However, this can be modified using the Date/Time Wildcard Substitution in Log Path Name setting under the Advanced Settings tab of a log source. You can select either Localtime or GMT time as the basis for your time adjustments, then use the Hours edit box to specify a plus or minus offset in hours.
Complete Date and Time Format Reference
This is the full list of supported time format variables, which follows conventions used in the Standard C Library strftime() routine:
  %A = national representation of the full weekday name.
  %a = national representation of the abbreviated weekday name.
  %B = national representation of the full month name.
  %b = national representation of the abbreviated month name.
  %d = the day of the month as a decimal number (01-31).
  %e = the day of month as a decimal number (1-31); single digits are preceded by a blank.
  %H = the hour (24-hour clock) as a decimal number (00-23).
  %I = the hour (12-hour clock) as a decimal number (01-12).
  %j = the day of the year as a decimal number (001-366).
  %k = the hour (24-hour clock) as a decimal number (0-23); single digits are preceded by a blank.
  %l = the hour (12-hour clock) as a decimal number (1-12); single digits are preceded by a blank.
  %M = the minute as a decimal number (00-59).
  %m = the month as a decimal number (01-12).
  %p = national representation of either "ante meridiem" or "post meridiem" as appropriate.
  %S = the second as a decimal number (00-60).
  %s = the number of seconds since the Epoch, UTC (see mktime(3)).
  %w = the weekday (Sunday as the first day of the week) as a decimal number (0-6).
  %Y = the year with century as a decimal number.
  %y = the year without century as a decimal number (00-99).
  %z = the time zone offset from UTC; a leading plus sign stands for east of UTC, a minus sign for west of UTC, hours and minutes follow with two digits each and no delimiter between them (common form for RFC 822 date headers).
  %% = `%' (for use when a literal percent sign is needed inside a date/time entry).
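The effect of the time basis and the hours offset can be illustrated with Python's own strftime(), which follows the same C library conventions. This is a sketch under stated assumptions rather than a description of Urchin internals: resolve_pattern and its parameters are hypothetical stand-ins for the Localtime/GMT choice and the Hours edit box, and only portable directives (%Y, %m, %d) are used because support for others such as %e, %k, %l and %s varies by platform.

```python
from datetime import datetime, timedelta, timezone

def resolve_pattern(pattern, now_utc, use_gmt=False, offset_hours=-24,
                    local_tz=timezone.utc):
    """Format `pattern` for the moment `offset_hours` away from now.

    now_utc      -- timezone-aware current time in UTC
    use_gmt      -- True to base the calculation on GMT, False for local time
    offset_hours -- plus or minus offset in hours (default: previous day)
    local_tz     -- the server's local timezone (hypothetical parameter)
    """
    base = now_utc if use_gmt else now_utc.astimezone(local_tz)
    reference = base + timedelta(hours=offset_hours)
    return reference.strftime(pattern)

if __name__ == "__main__":
    # 05:30 GMT on Aug 13 is 22:30 on Aug 12 for a server at UTC-7.
    now = datetime(2003, 8, 13, 5, 30, tzinfo=timezone.utc)
    server_tz = timezone(timedelta(hours=-7))
    print(resolve_pattern("access.log.%Y%m%d", now, local_tz=server_tz))
    print(resolve_pattern("access.log.%Y%m%d", now, use_gmt=True))
    print(resolve_pattern("access.log.%Y%m%d", now, offset_hours=0,
                          local_tz=server_tz))
```

The three calls resolve to different dates for the same instant, which is why the basis and offset should be chosen to match the way the webserver names its rotated logs.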
Processing Historical Logs
Overview
You may wish to process your historical logs after installing Urchin. This is easily accomplished. Simply specify a directory and a partial filename and/or wildcard (including regular expressions) in the Log Manager's Log Settings screens.
NOTE: You may not use wildcards on remote HTTP and HTTPS log sources.
How to Process Historical Logs
First, add a Log Source to the system. Click on the Configuration button at left, and then the Log Manager button. On the main screen, click on the Add button at top right. On the first screen, select Add Local Log Source, and continue. On the next screen, click Browse, which will bring up the File Browser. Locate the correct directory in the left-side window. The right-side window will display the files in the directory, and the left side will display any other directories. When you are in the correct directory, enter a partial filename and an asterisk (or other regular expression), and click the Verify button. A window will open showing all the matches to your pattern. Click any of the filenames to get information on the file location, size, modification date, and file permissions. If the pattern match is correct, click OK, and then OK again in the File Browser window.
Next, if it has not been done already, associate this log file with a Profile by clicking the Configuration button at left, and then the Profiles button. Once the association has been completed (see Working with Profiles), click the Run/Schedule button next to the Profile in the main Profiles listing and schedule the execution of the Profile, or click the Run Now button for immediate processing.
Urchin does not need any sort of log rotation to avoid data duplication. Urchin is equipped with a log tracking capability that ensures only new hits are processed. However, as mentioned above, logs can quickly consume large volumes of disk space, so it is a good idea to periodically compress and archive log files.
Because Urchin never needs to reread log files once they have been processed, it is perfectly acceptable to delete the log(s) after each processing run. However, many people keep logs for a specified amount of time in case they are needed for some reason, such as when a new Profile is created for that site and historical analysis is desired.
Recommendations
Log management is not essential from the outset, but as logs grow, it becomes important. We recommend deciding on a log management plan when you initially deploy Urchin.
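One possible log management plan is to compress rotated logs once they reach a certain age. The Python sketch below shows the idea using only the standard library. The function name, the default filename pattern and the seven-day threshold are all assumptions chosen for illustration, and it should only be pointed at logs that Urchin has already processed.

```python
import gzip
import shutil
import time
from pathlib import Path

def compress_old_logs(log_dir, pattern="access.log.*", min_age_days=7, now=None):
    """Gzip rotated logs older than `min_age_days` and remove the originals.

    Returns the list of archive paths that were created.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    archived = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.suffix == ".gz" or not path.is_file():
            continue  # already compressed, or not a regular file
        if path.stat().st_mtime > cutoff:
            continue  # too recent; may not have been processed yet
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()
        archived.append(target)
    return archived

if __name__ == "__main__":
    for archive in compress_old_logs("/var/log/httpd"):
        print("archived", archive)
```

Keeping the compressed copies rather than deleting them preserves the option of historical analysis if a new Profile is created later.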
Log Reprocessing
Overview
Certain circumstances may warrant reprocessing of log data, such as a DNS server being down when the processing was run, incorrectly applied filters, and so on. This document describes the procedure for backing out and reprocessing webserver log data. Please note that reprocessing logs requires the use of Urchin utilities that are only available from a command line shell environment. It is not possible to do the complete procedure exclusively from the Urchin web-based administrative GUI.
Reprocessing a Single Day:
1. In the Urchin admin GUI, edit the Profile and turn off Log Tracking under the Storage/DB tab. Be sure to click Update to save your change.
2. Under the Log Sources tab, ensure that the proper log file(s) to be reprocessed are specified. The log data should only contain hits for the date(s) for which you are zeroing out the statistics.
3. Invoke a command shell on the Urchin system.
4. Run the udb-sanitizer utility in the 'util' directory/folder of the Urchin distribution with the command
udb-sanitizer -p profilename -d YYYYMM
where YYYYMM is the year and month containing the day you wish to reprocess.
5. Select option 5, Zero out one or more days. The utility will prompt you for the correct day and will zero out the statistics for that particular day. If you have a range of contiguous days you would like to zero out, you can specify that range by using the numbers of the start and end days separated by a hyphen (e.g. 5-10 to zero out days 5 through 10 of the month). If necessary, re-invoke the utility to zero out statistics for additional days in that month if you cannot use a range.
6. Click the Run Now button under the Run/Schedule tab for the Profile to reprocess the log data.
7. Reset the Log Source by changing the Log File Path back to its original setting.
8. Under the Storage/DB tab in the profile edit area, turn Log Tracking back on.
Reprocessing an Entire Month:
The procedure for reprocessing an entire month's worth of data is identical to the single day procedure above, except that when invoking the udb-sanitizer utility you select Option 2, Delete this month entirely, instead of Option 5.
Additional information:
The udb-sanitizer utility provides additional functionality for managing Urchin databases. Please see the udb-sanitizer article in the Advanced Topics > Utilities section for further information about its capabilities and usage.
Filtering
Filtering Overview
This article describes data processing filters, which are applied before reports are generated. In addition to data processing filters, Urchin provides report filtering on the reporting interface. Read Reporting Interface > Report Side Filtering for information. To create a filter, click the Add button in the Filter Manager screen.
Filtering Sequence
Each time the scheduler runs a profile, each entry in the log files passes through the steps shown in the figure below. Before any of the report tables are updated, the 'raw' fields in the log file entry are parsed, which creates a number of 'auto' calculated fields. For example, the browser and platform fields are calculated from the raw cs_useragent field.
Filtering is applied once all of the fields have been populated, and before any entries are made in the report tables. Filters can be applied to any type of field, including calculated fields. No additional parsing occurs after filters are applied. Thus, it is important to apply the Filter to the correct field. A list of the purpose of each available field is provided in the next section.
Filters are applied in the following order:
1. Advanced Filters, Search & Replace Filters, and Dynamic URL Filters
2. Decode URL and Japanese Encoding Filters
3. Lookup Tables
4. Include and Exclude Filters
For example, if an Exclude Filter is applied to the same field as the Decode URL Filter, the Exclude Filter must take into account that encoded characters, such as %20, will have already been translated.
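The consequence of this ordering can be shown with a small Python sketch. It models only two of the four stages (Decode URL, then Exclude) on a single field. The function name process_hit, the use of a dict to represent a hit, and the use of Python regular expressions for the filter pattern are assumptions made for illustration.

```python
import re
from urllib.parse import unquote

def process_hit(hit, exclude_pattern):
    """Apply a Decode URL filter, then an Exclude filter, to request_stem.

    Returns the (decoded) hit, or None if the Exclude filter drops it.
    """
    hit = dict(hit)  # leave the caller's data untouched
    # Stage 2: Decode URL runs first, so %20 becomes a space.
    hit["request_stem"] = unquote(hit["request_stem"])
    # Stage 4: the Exclude filter sees the decoded value.
    if re.search(exclude_pattern, hit["request_stem"]):
        return None
    return hit

if __name__ == "__main__":
    raw = {"request_stem": "/my%20docs/report.html"}
    # A pattern written against the decoded text excludes the hit...
    print(process_hit(raw, r"/my docs/"))
    # ...while a pattern written against the encoded text never matches.
    print(process_hit(raw, r"%20"))
```

An Exclude pattern written against the encoded form silently fails to match, which is the situation the example in the text warns about.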
Filter Types
Exclude Pattern: This type of filter excludes log file lines (hits) that match the Filter Pattern. Matching lines are ignored in their entirety; for example, a filter that excludes Netscape will also exclude all other information in that log line, such as visitor, path, referral, and domain information.
Include Pattern: This type of filter includes log file lines (hits) that match the Filter Pattern. All non-matching hits will be ignored and any data in non-matching hits is unavailable to the Urchin reports.
Decode URL: This is a predefined filter that decodes URL-encoded characters back to their original form. For example, '%20' in a URL is replaced with a space. Apply this filter to URI stems and queries to see the original text.
Japanese Encode (UTF-8): This is a predefined filter, generally applied to the keywords field or other potentially multi-encoded field, that looks for Japanese encoded words and converts the encoding to UTF-8 format for consistent storage and display.
Search & Replace: This is a simple filter that can be used to search for a pattern within a field and replace the found pattern with an alternate form. See the section on Search & Replace Filters for more information.
Dynamic URL (deprecated): This type of filter is used to translate arcane dynamically generated URLs into more human-readable page names. Note: the new Page Query Terms Report duplicates the bulk of this function, and the Advanced Filter encompasses all of the Dynamic URL filter's features and more. It is strongly recommended that you either eliminate old Dynamic URL filters if possible or else convert them to one of the newer forms of filter.
Advanced: This type of filter allows you to build a field from one or two other fields. The filtering engine will apply the expressions in the two Extract fields to the specified fields and then construct a field using the Constructor expression. Read the Advanced Filters article for more information.
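The Advanced filter is the least obvious of these types, so a rough Python sketch of the idea may help. It is not Urchin's filter engine or syntax: advanced_filter is a hypothetical function, the extract expressions are Python regular expressions, the constructor uses Python format placeholders ({a[0]} for the first group extracted from field A, {b[0]} for field B), and leaving the hit unchanged when either expression fails to match is an assumption.

```python
import re

def advanced_filter(hit, field_a, extract_a, field_b, extract_b,
                    output_field, constructor):
    """Build `output_field` from groups extracted out of two other fields."""
    match_a = re.search(extract_a, hit.get(field_a, ""))
    match_b = re.search(extract_b, hit.get(field_b, ""))
    if not match_a or not match_b:
        return dict(hit)  # assumption: no match leaves the hit as it was
    result = dict(hit)
    result[output_field] = constructor.format(a=match_a.groups(),
                                              b=match_b.groups())
    return result

if __name__ == "__main__":
    hit = {"request_stem": "/shop/view.cgi", "request_query": "id=42&x=1"}
    rewritten = advanced_filter(
        hit,
        "request_stem", r"^/(\w+)/",      # Extract A: first path component
        "request_query", r"id=(\d+)",     # Extract B: product id
        "request_stem", "/{a[0]}/item-{b[0]}",
    )
    print(rewritten["request_stem"])
```

The same pattern of extracting from two fields and writing to one covers what the deprecated Dynamic URL filter was typically used for, namely turning a script name plus query parameters into a readable page name.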
Choosing Where To Apply a Filter
Filters can be applied either to profiles or to individual log sources. The scope of the filter can be different for each of these cases. A filter applied to a profile will affect all log sources processed for that profile. A filter applied to a log source will always affect that specific log source, even if multiple profiles are using the same log source. In general, you should apply filters to the profile unless one of the following cases occurs:
  You have multiple log sources for a profile and you do not want the filter to apply to all of the log sources.
  You have multiple profiles using the same log source, and you want all of the profiles to use the same filter.
In these two cases, apply the filter to the specific log source; otherwise, it is recommended to apply the filter to the profile.
Creating and Managing Filters
In the Urchin administration interface, click Configuration, then Urchin Profile > Filter Manager. Click the Add button to launch the Filter Wizard. Once you have created a filter, edit the profiles or log sources to which you wish to apply the filter, and add the filter.
To create a filter while editing a profile or log source, click the Profile Filters tab or Log Filters tab. A window appears showing the currently active filters. Click the Add button on this window to launch the Filter Wizard. The filter creation screen has a drop-down menu at the top with selectable built-in filters for common filtering tasks, such as filtering out robot traffic to your site. These built-in filters also serve as examples of how to set up various kinds of filters.
Filter Fields
Overview When a hit or line in a log file is read during processing, the hit is broken down into 'Raw Fields'. Fields are generally separated by spaces, tabs, or commas. The Log Format as chosen in the Log Source>Log Settings screen determines how these Raw Fields are assigned internally. Once the Raw Fields are read, Urchin automatically calculates the 'Auto Fields', using the values in the 'Raw Fields'. Most reports use data in these Auto Fields for updating. Filters can be applied to either Raw or Auto Fields. The following two tables provides insight into the purpose of each Field. The first table lists the Fields used for standard reports. A dash in the Fields Used column means that the report in question summarizes numbers generated in other reports and therefore is not tied specifically to the data in particular fields. The second table lists all available Fields and their purpose. Report Field List Report Name Traffic Sessions Graph Pageviews Graph Hits Graph Bytes Graph Summary Visitors &Sessions Visitors by Day Sessions by Day Unique Visitors Unique Sessions Visitor Loyalty Session Frequency Summary Pages &Files Chapter 3: Urchin Administration utm_session_number Fields Used
76
Requested Pages Downloads All Files Directory by Pages Drilldown Directory by Files Drilldown Directory by Bytes Drilldown File Types by Hits File Types by Bytes Page Query Terms Posted Forms Status and Errors Navigation Entrance Pages Exit Pages Click Paths Click To and From Length of Pageview Depth of Session Length of Session Click To and From Report Referrals Referrals Referral Drilldown Search Terms Search Engines Referral Errors Domains &Users Domains Domain Drilldown Countries IP Addresses IP Drilldown Usernames by Hits Usernames by Bytes Usernames by Sessions Browsers &Robots Browsers by Sessions Drilldown Browsers by Hits Drilldown Browsers by Bytes Drilldown
request_stem request_stem request_origfilepath request_stem request_origfilepath request_stem request_origmime request_origmime request_stem|request_query request_stem sc_status|request_errordetail request_stem request_stem request_stem request_stem request_stem request_stem referral_domainandstem referral_domainandstem referral_domain|referral_keywords referral_domain|referral_keywords referral_errordetail|referral_domainandstem domain_primary|domain_complete domain_primary|domain_complete domain_primary|domain_complete c_ip c_ip cs_username cs_username cs_username useragent_complete useragent_complete useragent_complete
77
Platforms by Sessions Drilldown Platforms by Hits Drilldown Platforms by Bytes Drilldown Combos by Sessions Robots by Hits Drilldown Robots by Bytes Drilldown Client Parameters Screen Resolution Screen Colors Languages Java Enabled Timezone Offset Javascript Version ECommerce Revenue Number of Transactions Products by Revenue Products by Quantity Products by Revenue Drilldown Products by Quantity Drilldown ECommerce Summary Revenue Source Revenue by Region Drilldown Revenue by City Revenue by Referrals Revenue by Search Terms Revenue by Search Engines Drilldown Revenue by Domains Drilldown Complete Field List id 1 2 3 4 5 6 7 8 Field iis_date iis_time apache_time c_ip cs_username selected>cs_request cs_method cs_uristem Type (RAW) (RAW) (RAW) (RAW) (RAW) (RAW) (RAW) (RAW)
useragent_complete useragent_complete useragent_complete useragent_complete browser_base browser_base utm_screen_resolution utm_screen_colors utm_language utm_java_enabled utm_timezone_offset utm_js_version elf_productname|elf_productcode elf_productname|elf_productcode elf_productname|elf_productcode elf_productname|elf_productcode elf_region elf_region referral_domainandstem referral_domain|referral_keywords referral_domain|referral_keywords domain_primary|domain_complete
Purpose IIS raw date of hit field. IIS raw time of hit field. Apache raw date &time of hit field. Client IP Address. Client username (if any) Apache raw entire request field. IIS raw request method field. IIS raw request stem field. 78
ID   Field                    Type     Purpose
9    cs_uriquery              (RAW)    IIS raw request query field.
10   sc_status                (RAW)    Return status code from server.
11   sc_bytes                 (RAW)    Number of bytes transferred for request.
12   c_host                   (RAW)    Client hostname (converts to c_ip if necessary).
13   cs_useragent             (RAW)    Browser useragent information.
14   cs_cookie                (RAW)    Cookies sent by browser.
15   cs_referer               (RAW)    Raw Referral information (could be internal).
16   custom_date              (RAW)    Used for datestamp in Custom Logs.
17   custom_time              (RAW)    Used for timestamp in Custom Logs.
19   cs_host                  (RAW)    Requested virtualhost by Client.
20   s_port                   (RAW)    Server port number.
21   cs_version               (RAW)    IIS Raw HTTP version.
22   s_sitename               (RAW)    IIS Server site name.
23   s_computername           (RAW)    IIS Computer name.
24   s_ip                     (RAW)    IIS Server IP address.
25   elf_orderid              (RAW)    Ecommerce order id number.
26   elf_store                (RAW)    Ecommerce store name.
27   elf_sessionid            (RAW)    Ecommerce session id.
28   elf_total                (RAW)    Ecommerce transaction amount.
29   elf_tax                  (RAW)    Ecommerce tax amount.
30   elf_shipping             (RAW)    Ecommerce shipping amount.
31   elf_billcity             (RAW)    Ecommerce customer city.
32   elf_billstate            (RAW)    Ecommerce customer state.
33   elf_billzip              (RAW)    Ecommerce customer zip code.
34   elf_billcountry          (RAW)    Ecommerce customer country.
35   elf_productcode          (RAW)    Ecommerce product code.
36   elf_productname          (RAW)    Ecommerce product name.
37   elf_variation            (RAW)    Ecommerce product variation.
38   elf_price                (RAW)    Ecommerce product price.
39   elf_quantity             (RAW)    Ecommerce product quantity.
40   elf_upsold               (RAW)    Ecommerce upsold variable.
76   referral_protocol        (AUTO)   Referral protocol (http/https/etc.).
77   referral_host            (AUTO)   Referral complete hostname.
78   referral_domain          (AUTO)   Referral domain name.
79   referral_port            (AUTO)   Referral port number (if any).
80   referral_url             (AUTO)   Referral complete URL (includes host).
81   referral_uri             (AUTO)   Referral complete URI (no host).
82   referral_stem            (AUTO)   Referral URI stem without query info.
83   referral_query           (AUTO)   Referral Query info by itself.
84   referral_anchor          (AUTO)   Referral information after # tag.
85   referral_directory       (AUTO)   Referral directory up to filename.
86   referral_filename        (AUTO)   Referral filename without directory.
87   referral_mime            (AUTO)   Referral mime type (file extension).
88   referral_keywords        (AUTO)   Referral search engine keywords.
89   referral_domainandstem   (AUTO)   Referral domain and URI stem together.
90   referral_errordetail     (AUTO)   Referral error detail information.
91   request_method           (AUTO)   Request method (GET/POST/etc.).
92   request_url              (AUTO)   Request complete URL (if provided).
93   request_version          (AUTO)   Request protocol version.
94   request_protocol         (AUTO)   Request protocol (HTTP/etc.).
95   request_host             (AUTO)   Request hostname (if any).
96   request_port             (AUTO)   Request port number (if any).
97   request_uri              (AUTO)   Request URI with query.
98   request_stem             (AUTO)   Request URI without query.
99   request_query            (AUTO)   Request query information (e.g., after ?).
100  request_anchor           (AUTO)   Request information after # tag.
101  request_directory        (AUTO)   Request directory without filename.
102  request_filename         (AUTO)   Request filename without directory.
103  request_mime             (AUTO)   Request mime type (file extension).
104  request_origfilepath     (AUTO)   Request original uri stem if UTM.
105  request_origmime         (AUTO)   Request original mime type if UTM.
106  request_errordetail      (AUTO)   Request detail for error hits.
107  useragent_complete       (AUTO)   Complete user agent.
108  browser_base             (AUTO)   Browser name (e.g., Netscape).
109  browser_version          (AUTO)   Browser version.
110  platform_base            (AUTO)   Platform (e.g., Windows).
111  platform_version         (AUTO)   Platform version.
112  domain_primary           (AUTO)   First level domain (e.g., com).
113  domain_complete          (AUTO)   Complete domain (e.g., urchin.com).
114  sid                      (AUTO)   Session id (if any).
115  utm_cookiea              (AUTO)   UTM2 cookiea
116  utm_cookieb              (AUTO)   UTM2 cookieb
117  utm_cookiec              (AUTO)   UTM2 cookiec
119  utm_cookie1              (AUTO)   UTM1 cookie1
120  utm_cookie2              (AUTO)   UTM2 cookie2
121  utm_cookie3              (AUTO)   UTM3 cookie3
122  utm_unique_id            (AUTO)   UTM unique visitor id.
124  utm_page                 (AUTO)   UTM page variable (used for request_ variables).
125  utm_referral             (AUTO)   UTM Referral (used for referral_ variables).
126  utm_screen_resolution    (AUTO)   Screen resolution (e.g., 800x600).
127  utm_screen_available     (AUTO)   Available screen resolution in pixels.
128  utm_browser_size         (AUTO)   Browser size in pixels.
129  utm_screen_colors        (AUTO)   Screen color bit depth.
130  utm_language             (AUTO)   Browser language code setting.
131  utm_java_enabled         (AUTO)   yes|no if java is enabled.
132  utm_cookies_enabled      (AUTO)   yes|no if cookies are enabled.
133  utm_timezone_offset      (AUTO)   +/-HHMM timezone offset value of browser.
134  utm_js_version           (AUTO)   Javascript version info.
135  utm_session_number       (AUTO)   Number of sessions for this visitor.
145  elf_region               (AUTO)   ECommerce region drilldown information.
Exclude/Include Filters
Introduction
Exclude and Include Filters, set up in the admin interface and applied to a log source or profile, are used to eliminate unwanted hits when processing a log file. The filters use POSIX regular expressions when matching against data in the fields of a hit. If you are unfamiliar with regular expressions, please read the Regular Expression Overview document in this section before proceeding.

How Urchin Uses Exclude/Include Filters
These filters are applied after the Decode URL, Japanese Encode, Dynamic URL, Search & Replace and Advanced filters. Urchin applies the Exclude/Include filters in succession. If the filter being applied is an Exclude Filter and the pattern matches, the hit is thrown away and Urchin continues with the next hit. If the pattern does not match, Urchin applies the next filter to the hit. This means that you can create either a single Exclude Filter with multiple patterns separated by '|' or you can create multiple Exclude Filters with a single pattern each.
Include Filters are applied with the reverse logic. When an Include Filter is applied, the hit is thrown away if the pattern does not match the data. If multiple Include Filters are applied, the hit must match every applied Include Filter in order for the hit to be saved. To include multiple patterns for a specific field, create a single include filter that contains all of the individual expressions separated by '|'.

Using Exclude/Include Filters
In the figure above, the exclude filter requires a filter expression and a filter field. During processing, the filter expression is compared with data in the filter field and the hit is thrown away if the filter matches. See the Filter Fields article for a complete list of fields that are available. The above example illustrates how to filter out image hits by filtering out all mime types that match gif, jpg, png, jpeg, and ico. This list can be customized to match any mime type.
In the figure above, the include filter requires a filter expression and a filter field. During processing, the filter expression is compared with data in the filter field and the hit is thrown away if the filter does not match. See the Filter Fields article for a complete list of fields that are available. This example shows how to filter in only html pages by requiring the mime type of the request to be html.

Controls
The 'Case Sensitive' control allows you to specify whether the filter should be applied with or without case sensitivity.
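
The exclude and include logic described in this article can be sketched in Python. This is only an illustration under stated assumptions, not Urchin's implementation: the function name keep_hit, the dictionary used to represent a hit, and the filter tuples are invented for the example, and Python's re module stands in for the POSIX regular expression engine.

```python
import re

def keep_hit(hit, filters):
    """Return True if the hit survives every filter, applied in order.

    hit     -- dict mapping a filter field name to its value (assumed shape)
    filters -- list of (kind, field, pattern, case_sensitive) tuples, where
               kind is "exclude" or "include" (assumed shape)
    """
    for kind, field, pattern, case_sensitive in filters:
        flags = 0 if case_sensitive else re.IGNORECASE
        matched = re.search(pattern, hit.get(field, ""), flags) is not None
        if kind == "exclude" and matched:
            return False  # exclude filter matched: throw the hit away
        if kind == "include" and not matched:
            return False  # include filter did not match: throw the hit away
    return True

# Hypothetical filter set: drop image hits, keep only one host.
filters = [
    ("exclude", "request_mime", r"gif|jpg|png|jpeg|ico", False),
    ("include", "request_host", r"example\.com$", False),
]
page = {"request_mime": "html", "request_host": "www.example.com"}
image = {"request_mime": "GIF", "request_host": "www.example.com"}
print(keep_hit(page, filters))   # True
print(keep_hit(image, filters))  # False
```

Because every include filter must match, a second include filter narrows the result rather than widening it; alternatives for one field belong in a single pattern joined with '|'.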
Decode URL Filters
Introduction
The Decode URL filter is used to convert data from a URL-encoded form to a more readable form. Encodings such as %20, for example, are converted into spaces.
Using Decode URL Filters
To use the Decode URL filter, select a Filter Field. During processing, the data in the Filter Field is decoded and stored back in that field. The field can then be displayed in the reports. Refer to the Filter Fields article for a complete field list. Although the above example illustrates how to use a Decode URL filter to decode referral keywords, the filter is also useful for decoding request stem and request query.
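
The effect of decoding a field can be approximated with Python's standard library. This is a sketch only: the helper name decode_field and the dictionary representing a hit are assumptions, and the sample keyword string is invented.

```python
from urllib.parse import unquote

def decode_field(hit, field):
    """Decode the URL-encoded value in `field` and store it back in place."""
    hit[field] = unquote(hit[field])
    return hit

# Hypothetical hit with URL-encoded referral keywords.
hit = {"referral_keywords": "web%20analytics%20software"}
print(decode_field(hit, "referral_keywords")["referral_keywords"])
# web analytics software
```

Note that unquote leaves '+' characters untouched; whether Urchin treats '+' as a space is not stated in this guide.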
Search & Replace
Introduction
Use the Search & Replace filters to replace a matched expression with another string. This type of filter is a simplified version of an advanced filter.

Using Search & Replace Filters
The search & replace filter requires a filter field, an expression to search for, and a replace expression. The search expression is a POSIX regular expression. The replace expression is any text that you wish to substitute for the matched part. Refer to the Filter Fields article for a complete field list. The above example illustrates how to use a search & replace filter to remove a leading directory from the path of a page. Another use for this type of filter is to replace category id numbers with descriptive words in the query string of a request. For example, suppose that samples of the requested file with attached queries look as follows:
/docs/document.cgi?id=1000
/docs/document.cgi?id=2000
Using the search and replace filter, you could convert the 1000 or 2000 ids to their equivalents. For example, 1000 could be changed to books and 2000 to magazines. This would make the pages report more useful for people who are not familiar with the codes used to identify the individual items.
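
The id-to-word replacement can be sketched as follows. The helper name search_replace and the rule list are assumptions made for this example, and Python's re module is used in place of the POSIX engine.

```python
import re

def search_replace(value, search, replace):
    """Replace every match of the `search` expression in `value`."""
    return re.sub(search, replace, value)

# Hypothetical rules mapping category ids to descriptive words.
rules = [(r"id=1000$", "id=books"), (r"id=2000$", "id=magazines")]

query = "id=1000"
for search, replace in rules:
    query = search_replace(query, search, replace)
print(query)  # id=books

# Removing a leading directory from a page path.
print(search_replace("/docs/document.cgi", r"^/docs", ""))  # /document.cgi
```

Anchoring each search expression with $ keeps id=1000 from also matching a longer id such as id=10005.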
Lookup Table Filters
The Lookup Table filter is available beginning with Urchin 5.6. The Lookup Table filter can be used to:
- implement master tracking codes for campaign tracking. Read Campaign Tracking Module > How To Use Master Tracking Codes.
- implement an external data table to look up and replace character field values when a match occurs. Lookup tables can match against a single field and update multiple fields. Read Advanced Topics > Customization > Custom Lookup Tables.
- map Japanese phone manufacturer/model abbreviations to full names. See below.
To apply the Japanese phone filter:
1. In the Filter Wizard: Settings screen, enter your desired filter name (Filter Name field).
2. Select Lookup Table as shown in the screen image below.
3. Select platform_version (AUTO) from the Filter Field drop-down menu, as shown in the screen image below.
4. Select phone models from the Table Name drop-down menu, as shown below.
Advanced Filters
Introduction
The Advanced Filter option allows you to construct Fields for reporting from one or two existing Fields. POSIX regular expressions and corresponding variables can be used to capture all or parts of Fields and combine the result in any order you wish. For general information on how filtering works and a list of what each Field is used for, see the Filtering Overview and Filter Fields articles at the beginning of this section.

Using Advanced Filters
As shown in the figure above, the Advanced Filter takes up to two fields, Field A and Field B, and constructs the Output Field. The construction occurs in the following manner. The Extract A expression is applied to Field A, and the Extract B expression is applied to Field B. These expressions can use complete or partial text matches and include wildcards. The expressions conform to POSIX regular expressions. The following is a list of the most common wildcards and their meanings.
Wildcard   Meaning
.          match any single character
*          match zero or more of the previous item
+          match one or more of the previous item
?          match zero or one of the previous item
()         remember contents of parentheses as item
[]         match one item in this list
-          create a range in a list
|          or
^          match to the beginning of the field
$          match to the end of the field
\          escape any of the above
Use parentheses () to capture parts of the Fields. These can be referenced in the Constructor using the $A1, $A2, $B1, $B2 notation. The A or B refers to the Field, and the number refers to which set of parentheses to grab. In the above example, the entire A Field and the entire B Field are captured and assembled as the new field. The Output Field can be a separate field or the same field as Field A or Field B.

Controls
The 'Override Output Field' control allows you to decide what to do if the Output Field already exists. The 'Required Field' control allows you to decide what to do if one of the expressions does not match.
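
The way the Constructor combines captured parts can be illustrated with a short Python sketch. The function name advanced_filter, its parameters, and the dictionary representing a hit are assumptions for this example. Leaving the hit unchanged when an expression fails to match is just one possible policy; in Urchin that behavior is governed by the 'Required Field' control.

```python
import re

def advanced_filter(hit, field_a, extract_a, field_b, extract_b,
                    constructor, output_field):
    """Build output_field from captured parts of field_a and field_b."""
    match_a = re.search(extract_a, hit.get(field_a, ""))
    match_b = re.search(extract_b, hit.get(field_b, ""))
    if match_a is None or match_b is None:
        return hit  # assumed policy: leave the hit untouched on a failed match

    def substitute(token):
        # token looks like $A1 or $B2: pick the field, then the group number
        match = match_a if token.group(1) == "A" else match_b
        return match.group(int(token.group(2)))

    hit[output_field] = re.sub(r"\$([AB])(\d)", substitute, constructor)
    return hit

# Capture the whole of both fields and join them, as in the example above.
hit = {"request_host": "www.example.com", "request_stem": "/index.html"}
result = advanced_filter(hit, "request_host", r"(.*)",
                         "request_stem", r"(.*)",
                         "$A1$B1", "request_stem")
print(result["request_stem"])  # www.example.com/index.html
```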
DynamicURL Filters (deprecated)
Note
Urchin 5 and later display query terms in a separate report by default. Unlike versions 4 and earlier, it is not necessary to create a filter to display query strings. However, unlike versions 4 and earlier, the data is not displayed in the 'Pages' report. Urchin 5 displays this data in a drilldown report titled 'Page Query Terms.' This report can be found under the 'Pages & Files' report menu.

Many sites today use a CGI, ASP, or other scripting mechanism to provide dynamic content. Often, a single script is used to deliver multiple pages of information. While this can be a handy way to track user sessions or provide "live" content, it poses an additional challenge for meaningful reporting. By default, Urchin strips all the parameters associated with a page request (e.g., those that would typically be used with a CGI or ASP) and stores only the pathname of the page requested in its database. The DynamicURL filtering feature allows you to use regular expressions to selectively capture these parameters and present them in an intuitive way. As an example, a CGI script might be used to deliver information about all products in a catalog. The script draws from a database, and uses parameters passed through the request to determine which product to display. The resulting hit in the webserver log for this request might look like:
/cgi-bin/showProduct.cgi?sessionId=123456789
|______________________| |_________________________________|
Under normal operation, Urchin will record that the showProduct.cgi page was requested, and everything from the "?" onward will be stripped. By using a DynamicURL filter, Urchin can store some or all of the parameters and produce a unique page record based on the parameter list. In this example, we don't necessarily want to capture the entire second part of the request because of the "sessionId." Let's assume that this parameter changes for each visit and we get 30,000 visits per day. Including this piece of information would create far too many unique pages and render the Pages reporting useless. Instead we just want to capture the "productId" and report only on that information.
/cgi-bin/showProduct.cgi?sessionId=123456789
We may still want to know which script was used as well as which product was implicated in the request. By using a DynamicURL filter, we can capture multiple parts of the request and recombine them into a new, formatted request ready for reporting. Here is an example of a filter that could be used with the page request above:
(/cgi-bin/showProduct.cgi\?).*productId=(.*)
This regular expression will match the above request no matter what the value of the sessionId or productId was. The parentheses capture the parts of the request that we want to keep for reporting. The effective request of the above example would look like:
/cgi-bin/showProduct.cgi/knobs
Up to 5 sets of parentheses can be used, and multiple filters can be applied. If a request does not match the DynamicURL filter, it is left unmodified, but still included in the reporting. This allows you to use multiple DynamicURL filters for each area of a site. Keep in mind there is a slight performance hit for each filter used. Note that DynamicURL filters can only be applied to the base URL and query string that form the page request. They cannot be used to filter referrals or any other fields in the log file. Also, when DynamicURLs and FilterIn/FilterOut are used together, the DynamicURL will be applied after the other filters, so consideration must be given to how one set of filters affects the others when choosing what to filter.

Examples
Example 1: We want to capture all the specific Knowledgebase article IDs in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:
GET /knowledge.cgi?cmd=2
The proper Dynamic URL filter to extract the article ID is:
(/knowledge\.cgi\?)cmd=2
and this produces Top Pages reports that look like:
1. /knowledge.cgi            1,081   46.43%
2. /knowledge.cgi/id=767       244   10.48%
3. /knowledge.cgi/id=807       136    5.84%
4. /knowledge.cgi/id=768        50    2.15%
5. /knowledge.cgi/id=777        40    1.72%
Example 2: We want to capture all the search keywords used in the Urchin 4 report for help.urchin.com. Here's a sample of what the Request portion of the hit looks like in the log file:
GET /knowledge.cgi?cmd=1PE=0= utm
The proper Dynamic URL filter to extract the keyword information is:
(/knowledge.cgi\?).*s_(keyword=[^
and this produces Top Pages reports that look like:
1. /knowledge.cgi                        1,373   68.65%
2. /knowledge.cgi/keyword=utm               29    1.45%
3. /knowledge.cgi/keyword=default+page      18    0.90%
4. /knowledge.cgi/keyword=no+referral       11    0.55%
5. /knowledge.cgi/keyword=scheduler         10    0.50%
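
The capturing behavior of a DynamicURL expression can be tried out in Python before a filter is created. This is a sketch under assumptions: the helper name capture_parts is invented, the sample request (including its productId parameter) is hypothetical, the dot in the script name is escaped here, and Python's re module stands in for the POSIX engine. It shows only which parts the parentheses capture, not how Urchin joins them into the effective request.

```python
import re

def capture_parts(pattern, request):
    """Return the captured groups, or None if the request does not match."""
    match = re.search(pattern, request)
    return match.groups() if match else None

pattern = r"(/cgi-bin/showProduct\.cgi\?).*productId=(.*)"

# Hypothetical request carrying both a session id and a product id.
request = "/cgi-bin/showProduct.cgi?sessionId=123456789&productId=knobs"
print(capture_parts(pattern, request))
# ('/cgi-bin/showProduct.cgi?', 'knobs')

# A request that does not match would be left unmodified by the filter.
print(capture_parts(pattern, "/index.html"))  # None
```

The session id never appears in the captured parts, so every visit to the same product collapses into one page record.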
Regular Expression Overview
Introduction
POSIX regular expressions are used to match or capture portions of a field using wildcards and metacharacters. They are often used for text manipulation tasks. Most of the filters included in Urchin use these expressions to match the data and perform an action when a match is achieved. For instance, an exclude filter is designed to exclude the hit if the regular expression in the filter matches the data contained in the field specified by the filter. Regular expressions are text strings that contain characters, numbers, and wildcards. A list of common wildcards is contained in the table below. Note that these wildcard characters can be used literally by escaping them with a backslash '\'.
Wildcard   Meaning
.          match any single character
*          match zero or more of the previous item
+          match one or more of the previous item
?          match zero or one of the previous item
()         remember contents of parentheses as item
[]         match one item in this list
-          create a range in a list
|          or
^          match to the beginning of the field
$          match to the end of the field
\          escape any of the above

Tips for Regular Expressions
1. Make the regular expression as simple as possible. Complex expressions take longer to process or match than simple expressions.
2. Avoid the use of .* if possible, since this expression matches everything and may slow down processing. For instance, if you need to match index.html, use index\.html, not .*index\.html.*
3. Try to group patterns together when possible. For instance, if you wish to match a file suffix of .gif, .jpg, and .png, use "\.(gif|jpg|png)" not "\.gif|\.jpg|\.png".
4. Be sure to escape the regular expression wildcards or metacharacters if you wish to match those literal characters.
5. Use anchors whenever possible. The anchor characters are ^ and $, which match the beginning and the end of the field, respectively. Using these when possible will speed up processing. For instance, to match the foo directory in /foo/bar, use ^/foo/ instead of /foo/. Using the ^ will force the expression to match at the beginning and will improve processing speed.
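
The tips on grouping, escaping, and anchoring can be checked with a few test strings. The sketch below uses Python's re module as a stand-in for the POSIX engine, and the sample paths are invented for illustration.

```python
import re

# Tip 3: group the alternatives after a single escaped dot.
suffix_grouped = re.compile(r"\.(gif|jpg|png)$")
print(bool(suffix_grouped.search("/images/logo.png")))   # True
print(bool(suffix_grouped.search("/images/logo.html")))  # False

# Tip 4: an escaped dot matches only a literal ".", an unescaped one matches anything.
print(bool(re.search(r"index\.html", "/indexZhtml")))  # False
print(bool(re.search(r"index.html", "/indexZhtml")))   # True

# Tip 5: ^ only accepts /foo/ at the start of the field.
print(bool(re.search(r"^/foo/", "/foo/bar")))      # True
print(bool(re.search(r"^/foo/", "/baz/foo/bar")))  # False
print(bool(re.search(r"/foo/", "/baz/foo/bar")))   # True
```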
Affiliations, Users & Groups

Working with Affiliations
Overview
An Affiliation is a high-level association that is used to group together related Profiles, Log Sources, Users and Groups under a single identifying label, which is typically a corporate or client name. For Urchin installations where there is a need to support multiple complex client organizations, creating an Affiliation allows the Urchin administrator to easily keep track of all the Urchin reporting components for a particular client or corporate entity. Access rights to Urchin reports can be controlled via an Affiliation association, and within an Affiliation even more granular access rights to certain reports can be assigned to Groups or Users, thereby protecting your data as desired at multiple levels. In addition, Affiliation-level administration rights can be assigned to a user who can then act as a local Urchin administrator for the Affiliation. This allows distribution of the responsibility for managing, configuring, and maintaining the Urchin reports within an Affiliation.
It is important to note that since Affiliations are the highest-level organizational "element" in the Urchin administration interface, you should create the Affiliations before you create any Profiles, Log Sources, Users, or Groups that will be associated with the Affiliation. The choice of Affiliation must be made when an Urchin element is created, and it cannot be changed afterwards. If you do not choose a specific Affiliation at creation time, the default Affiliation of (NONE) will be set. This tells Urchin that the element in question has no Affiliation.

Creating Affiliations
To create an Affiliation, go to the Configuration > Users & Groups > Affiliation screen and click the Add button. Only the Affiliation Name is required; Contact and Contact Email are not used by Urchin and are strictly informational fields for the benefit of the administrator.
Report Data Location (optional) specifies where to store the report data for all the Profiles that belong to the Affiliation. The default location is the data/reports directory within the Urchin distribution. Changing the Report Data Location allows you to physically separate within your file system the report data for different organizations.
Directory Browsing Location (optional) is provided as a security measure when giving an Affiliation administrator access to create Profiles. A directory entered into this field will limit the Affiliation admin's ability to browse for log files to only that directory. By default, there are no restrictions on where an Affiliation admin can browse for log files.

Using Affiliations
To assign a Profile, Log Source, User, or Group to an Affiliation, use the drop-down menu labeled Optional Affiliation in the initial screen of the setup wizard when creating the given element. Once an Affiliation is assigned to a Profile, Log Source, User, etc., the Urchin admin interface will restrict modification choices to those elements that are associated with the Affiliation. In this way the Affiliation acts to control access rights at a high level so that you can isolate organizations from one another without the need to set specific access permissions on every report.
Affiliations also aid in distributing management responsibilities for the Urchin configuration. Within an Affiliation, the primary Urchin administrator can assign local admin privileges to Affiliation users. There are three administrative levels of control that can be assigned to a User. See the Working with Users & Groups article in this section for details.
When viewing the admin screens for Urchin configuration parameters, you may filter the entries to show only those with a particular Affiliation name by using the Affiliation drop-down menu in the top bar of the table. Although (NONE) means no affiliation, for purposes of filtering (NONE) is shown as an option in the Affiliation drop-down so that you may view only entries that are not affiliated.
Working with Users & Groups
Overview
Urchin's Users & Groups functionality allows Urchin administrators to easily set up any number of users and grant them access to whichever reports are deemed appropriate. These users can then be put into groups to expedite and simplify management of large numbers of users. If a group of users is granted report access, all users in that group will have access upon logging in to the system. Users never have report access unless it is specifically granted.

How to Use Users
1. Log in to your Urchin system as an administrator. Note: the URL for accessing the Urchin system is identical regardless of the user type.
2. Click on Configuration in the main left-side navigation.
3. Click on Users & Groups.
4. Click the Add button at upper right to enter the User Wizard.
5. Select a username (it should be lowercase and must not have spaces) and choose a password.
6. Enter the user's real name (this will be displayed in the Urchin system). Click the Next button.
7. Determine what level of control the user should have. Unless you are running Urchin in Datacenter mode, the only choices will be User and Super Admin. If you are running in Datacenter mode, you also have the choice of Affiliate Admin (see the embedded help by clicking on the help link for specifics on affiliate admin settings). Click the Next button.
8. Once the user has been created, click on the Edit icon next to that user.
9. Click on the Report Access tab. The available Profiles will be shown in the box at left.
10. Select one or more Profiles. To select multiple Profiles, use the command key or control key depending on your platform.
11. Click the right-facing arrow to move the Profile(s) to the Access Granted box.
12. Click Update to save changes.
How to Use Groups
1. Log in to your Urchin system as an administrator and click on Configuration at left.
2. Click on Users & Groups.
3. Click on Groups.
4. Click the Add button at top right.
5. Enter the Group Name (this can be anything descriptive).
6. Enter the Group Description (this might be something to do with the location or composition of the group).
7. Click Finish.
8. Click Done and then click Edit next to the group name.
9. Click the Users in Group tab to select users to add to the group. To select multiple users, use the command key or control key depending on your platform.
10. Click the right-facing arrow to move users to the Users in Group box.
11. Click the Update button to save changes.
12. To add users to the group, click the Users in Group tab and add users as described in the procedure above.

Recommendations
Any "Super Admin" level user has complete control over the Urchin system, so it is advisable to grant that privilege to only one person. Passwords should contain one or more capital letters and/or symbols to make them difficult to guess.
Scheduling Tasks
Working with the Task Scheduler
Overview
The Task Scheduler is the nerve center of Urchin: it is responsible for the actual scheduling and execution of Urchin log processing events for all Profiles. From the Scheduler, you can run tasks immediately or add them to the list of Urchin events for repeated execution at nearly any interval desired.

How to Use the Scheduler
1. Log in to your Urchin Administration Interface and click Configuration in the main left-side navigation, then Urchin Profiles.
2. Locate the Profile you wish to schedule and click Edit.
3. Click the Run/Schedule tab.
4. Under Task Settings, select the desired interval. Daily is recommended.
5. Set the time of day for the task.
6. Click Update to save changes.
7. To run the task immediately, click Run Now. Subsequent scheduled tasks will occur according to the schedule you have set.

Recommendations
Most tasks should be scheduled for daily execution, since that is the log rotation schedule for many webservers. However, Urchin's log tracking facility makes it possible to read the same log multiple times without doubling data, so this is not required.

Notes on the Scheduler's Operation
All tasks are handled sequentially by Urchin, so multiple tasks given the same time of execution will still be processed one at a time. To see the results of all tasks that have been executed, see the Task History screen in the Scheduler navigation section.
System Settings
Changing the Port Number
The default port number that the Urchin webserver listens on is 9999. Changing this number consists of two basic steps:
- Changing the port number in the Server Settings screen
- Stopping and starting the Urchin services, which is a slightly different process for Windows versus Unix-type systems
The detailed process is as follows:
1. Log in to the Urchin administration interface.
2. Navigate to Configuration > Settings > Access Settings and click on the Server Settings tab.
3. Set your new port number in the Server Port Number box.
4. Click on the Update button.
Now you must restart the Urchin services. On Unix-type systems, go to the bin directory of your Urchin distribution and run:
./urchinctl restart
On Windows systems, from the console, go to Start > Programs > Urchin and choose Disable Services, then choose Enable Services.
The webserver should now be listening for connection requests on the new port number. This means that the URL used to view reports and configure the Urchin software has changed, and your users should be notified of the new URL.

Notes
Please note that on many systems, root privileges may be required to use port numbers less than 1024. Also, if another service is already running on the port specified, Urchin will fail to start.
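
Before choosing a new port, it can help to confirm that nothing else is bound to it. The following Python sketch is not part of Urchin; the helper names port_is_free and needs_root are assumptions, and the check only reflects the state of the machine at the moment it runs.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # port in use, or binding not permitted
    return True

def needs_root(port):
    """Ports below 1024 are privileged on many Unix-type systems."""
    return port < 1024

candidate = 9999  # the default Urchin port
print(needs_root(candidate))    # False
print(port_is_free(candidate))  # result depends on the machine
```

A port that is free when checked can still be taken by another service before the Urchin services are restarted, so treat the result as a hint rather than a guarantee.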
Licensing Urchin
Overview
Urchin must be licensed in one of three ways before it can be used:
- Obtain Demo License
- Buy License
- Activate Pre-Purchased License
If you are trying out Urchin for the first time, you will want to install a demo license. This is a free 15-day evaluation with no limitations on Urchin's function.

Installing a Demo License
To install a demo license, log in to your Urchin Administration Interface with a web browser (usually http://your.server.com:9999), click "Install Demo License", and follow the on-screen steps, including entering your contact information. It's important to enter your real information, as it will be necessary if and when you decide to purchase Urchin later. Click the Install Demo License link to complete the process.

Buy License
To purchase a license, log in to the Urchin system as an administrator and click on Configuration at left. Next, click on the Settings button and then the License button. On the main screen, click the Buy License link, which will take you to our online licensing center. Once you have completed the purchase, the Urchin system will be fully operational in perpetuity.

Activate Pre-Purchased License
To activate a pre-purchased license (for example, if you purchased Urchin on CD or you have moved the Urchin installation to a new server), log in to the Urchin system as an administrator and click on Configuration at left. Next, click on the Settings button and then the License button. On the main screen, click the Activate Pre-Purchased License link, which will take you to our online licensing center. Once you have completed the process, the Urchin system will be fully operational in perpetuity.

Installing a License Without Internet Access
It is possible to license Urchin without internet access, such as behind a firewall. To accomplish this, you will need to run the "inspector" utility, which is bundled with Urchin and found in the "util" directory. Attach its output to an email and send it to sales@urchin.com for assistance.

Recommendations
Please enter your real contact information when activating your demo, as it will be necessary for billing purposes if you decide to buy. We will also need to know who you are in order to provide support.
DNS Database Update
Urchin includes a DNS database which provides the information used in creating the Domain Reports, including the conversion of IP addresses to domain names. These databases are stored in the Urchin data directory and need to be updated on a periodic basis. Urchin includes geoupdate, a utility that checks for updates and downloads new updates when they are available. The utility is scheduled to check for updates once a month; the user can set the day and time for the download or disable the downloads. The geoupdate utility can also be used to import custom entries into the DNS databases. For more information, see the geoupdate utility article in the Advanced Topics > Utilities section.

Considerations
The geoupdate utility needs an internet connection to be able to check for and download new updates. The utility uses port 80 to communicate with the webserver providing the updates. Proxy servers and firewalls can interfere with Urchin's ability to download updates.
Chapter 4: Reporting Interface
Report-Side Filtering
Urchin is capable of sophisticated filtering of any text-based report via the reporting interface. To filter in or out any text string, enter it into the Filter box and click the "+" (include) or "-" (exclude) button. The Urchin reporting system will re-query the database and display only the corresponding results. To conduct more complex filtering operations, POSIX regular expressions can be used (see the Regular Expression Overview article for an introduction).
Reporting Interface Overview
Welcome to the Urchin Reporting Interface!
Overview
The Urchin Reporting Interface is the system that displays the actual Urchin reports. To access the Reporting Interface, login to your Urchin Administration Interface and select a report to view. If you are the Administrator, you will have access to all reports. If you are a User, you will have access to those reports specified by the Administrator. Each Profile that has been configured has its own set of reports. Click the magnifying glass icon next to a profile to view the reports for that profile.
Controls
Note the Date Range at the top of any report. All data shown is for that time period only. To change the timeframe, just select a different date range from the controls at the bottom left of the screen. See the Date Range article in this section for more information.
- Standard/SVG: Urchin can display reports in either standard HTML, or via Adobe's Scalable Vector Graphics (SVG) format. By default, Urchin will attempt to determine if the browser in use has the SVG plug-in installed. If so, reports will be displayed in SVG format. If not, Urchin will use standard HTML. If the user attempts to select SVG with a browser that does not have the SVG plug-in, a link will be provided to a web page with information on getting the plug-in.
- Search: To instantly find any item in list-type reports, enter a search term or phrase into the Search box and press Enter. The list will be updated with any matches.
- Filter: To filter in or out any items in list-type reports, enter a string of text into the Filter box and click the "+" (include) or "-" (exclude) button. The list will be changed accordingly.
- # Shown: To show a different number of items in the report being viewed, simply select the desired number from the pull-down menu.
- Go To #: If you know the position number of the entry you would like to see, enter it here and press Enter.
- Export:
  Tab: click the "T" button to export data in tab-delimited format.
  Word: click the Word icon to export data in Microsoft Word native format.
  Excel: click the Excel icon to export data in Microsoft Excel native format.
  Printing: click the printer icon to get a print-friendly view of the data; click the Print Page link from that screen to actually print the report.

Recommendations
Try different Date Range settings to see how your data changes over time. For low-traffic sites, a month may be a better timeframe than a week, since traffic might not be statistically significant over so short a period.

See Also
Glossary of Terms
Exporting Data
Overview
Urchin's data export function makes it easy to extract data from any Urchin report. This is useful for bringing report data into a spreadsheet, word processor, database, etc. for further analysis.
How to Use Export Data
To export data from any report, select the export type that matches the application you plan to use to manipulate the data. For general database importing, use tab-separated format. For Word and Excel export, the application should launch automatically after the data is exported, and the new document should be populated with the exported data.
  Tab: click the "T" button to export data in tab-delimited format.
  Word: click the Word icon to export data in Microsoft Word native format.
  Excel: click the Excel icon to export data in Microsoft Excel native format.
  Printing: click the printer icon to get a print-friendly view of the data; click the Print Page link from that screen to actually print the report.
Recommendations
To export data to a database, tab-separated is usually the preferred format.
Date Range
Overview
Urchin's Date Range function allows you to view report data for any timeframe, from a single day to the entire period for which data exists, or any part thereof. The Date Range feature makes it easy to specify either a standard timeframe (such as a week, month, or year) or any custom timeframe.
Using the Date Range Function
Standard Date Range: To view data by a standard timeframe such as a day, week, month, or year, click the desired period in the Date Range navigation area, and the report's data will change accordingly. The Date Range Calendar is clickable in several ways:
  Year: click the year to display data for the entire calendar year.
  Month: click the name of the month.
  Week: click the arrow to the left of the calendar for the week you are interested in.
  Date: click the date you are interested in.
  Day: to show data only for every instance of a particular day of the week in the currently selected Date Range, click the day name.
Custom: click the Enter Range button, which brings up the Urchin Calendar. Select the starting date in the calendar at left and the ending date in the calendar at right. Click Apply Date Range, and the report will change to show data for the selected timeframe.
After selecting a custom Date Range, all reports you examine that are compatible with the selected timeframe will display data for that period until either the browser is closed or a different Date Range is specified.
Recommendations
If you are examining a low-traffic site, try looking at a longer timeframe to get more meaningful data. If you are interested in traffic trends over the life of your site (and the data exists), try analyzing a year or more worth of data. Urchin will adjust the size of bar graph elements to accommodate the selection.
Urchin 5.6 feature: You can also see the data displayed hourly, daily, or monthly over your selected date range. Select hourly, daily, or monthly from the Date View pull-down, as shown in the image below.
Chapter 5: Ecommerce Module
Ecommerce Overview
Introduction
Urchin's Ecommerce reporting module expands the power of Urchin's reporting to allow you to follow visitors all the way to the point of conversion and measure your ROI on various aspects of your website and marketing campaigns. The module enables two sections of reporting. The first section, ECommerce, provides trend analysis of online revenue, transactions, and product detail. The second section, Revenue Source, correlates revenue against visitor parameters including keywords and search engines.
This valuable reporting capability allows you to exploit cross-system online business resources and optimize online campaigns. You can easily calculate your ROI from CPC and organic search engine placements.
System Overview
When a visitor to a website makes a purchase, the shopping cart software makes an entry in a transaction log file which, when processed along with normal web traffic logs, creates a complete picture of the Ecommerce system.
Urchin processes these logs together and correlates the web site session with the Ecommerce transaction. The purchase and product information is stored in the Urchin databases, ready for viewing in the ECommerce reports.
Configuration
There are three key elements for configuring Urchin's ecommerce processing:
Establish a usable ecommerce log format: Urchin needs to understand how your ecommerce logs are constructed. The choices are ELF2, ELF, or a custom log format. See the ELF & ELF2 Log Formats or Custom Ecommerce Log Formats articles in this section for details.
Coordinate processing of ecommerce and webserver access logs: typically ecommerce transactions are tracked separately from normal webserver activity, and frequently the sites are not even hosted on the same machine. You must make sure that both sets of logs are available to Urchin so that they can be processed together. Both logs should be listed as log sources in the single profile that is set up to handle your ecommerce reporting.
Choose a visitor tracking method: the visitor tracking method determines how well Urchin can correlate ecommerce activity with normal website activity. You should decide which visitor tracking method will yield the level of analysis you desire. The more accurate UTM method requires some simple modifications to your website documents to achieve the most complete analysis. These modifications should be made to all websites involved in your online business. See the Visitor Tracking section of the Documentation Center for details on setting up UTM.
Considerations
It is strongly advised to have your shopping cart software log in ELF2 format if possible. This reduces some of the Urchin administration overhead in setting up your ecommerce reporting, since Urchin has a built-in capability to handle this format automatically.
ELF & ELF2 Log Formats
Overview
The Ecommerce log formats (ELF & ELF2) were designed to record information about customer transactions from online shopping sites. ELF was originally created for use with Urchin 3 and may be used with Urchin 5 when processing data with the IP-Only visitor method. ELF2 is similar to ELF and includes additional fields that allow for visitor correlation using the IP+UserAgent, UTM, and other visitor tracking methods. Logging in the ELF2 format is recommended because it provides better visitor correlation with your webserver data. If you cannot set up your shopping cart software to log in ELF/ELF2, then you must configure your own Urchin custom log format before attempting to process your ecommerce data.
This document describes the format of the ELF and ELF2 log files that are created by the shopping cart software and explains how to configure Urchin for processing of ecommerce logs.
Configuring Urchin for ELF/ELF2 Log Files
You must select specific Urchin configuration parameters depending on your ecommerce log type.
ELF processing
  In the Log Source > Log Settings screen, set the Log Format to either elf or auto.
  In the Profile > Reporting screen, set the Visitor Tracking Method to IP-ONLY, which is the only method supported when using the ELF ecommerce log format.
ELF2 processing
  In the Log Source > Log Settings screen, set the Log Format to either elf2 or auto.
  In the Profile > Reporting screen, set the Visitor Tracking Method to any of the choices, which are all supported when using ELF2.
Your ecommerce log should be listed as a second log source along with your main website log in the profile that is created to handle your ecommerce reporting. The logs are processed sequentially by Urchin.
ELF/ELF2 Log Format Description
Both ELF and ELF2 are tab-separated multi-line log formats. The first line begins with an '!' exclamation character and contains overall information about the purchase.
Subsequent lines contain detailed information about the items purchased. The first line is referred to as the transaction and the subsequent lines are referred to as items. Blank fields should contain a '-' character. Since tabs are used to separate fields, the tab character is not allowed within a field. A typical ELF/ELF2 log file has the following general form:
!transaction_1
item1
item2
item3
!transaction_2
item1
item2
...
ELF2 Log Format
ELF2 Transaction Line
The ELF2 transaction line begins with an '!' exclamation character and contains the following tab-separated fields (empty fields should contain a '-' character):
!%{ORDERID} %{REMOTE_HOST} %{DATE/TIME} %{STORE} %{SESSIONID} %{TOTAL} %{TAX} %{SHIPPING} %{BILL_CITY} %{BILL_STATE} %{BILL_ZIP} %{BILL_COUNTRY} %{USER_AGENT} %{COOKIES}
where:
%{ORDERID} is the order number
%{REMOTE_HOST} is the hostname/IP address of the remote machine
%{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
%{STORE} is the name/id of the storefront
%{SESSIONID} is the unique session identifier of the customer
%{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
%{TAX} is the amount of tax charged to the subtotal
%{SHIPPING} is the amount of shipping charges
%{BILL_CITY} is the billing city of the customer
%{BILL_STATE} is the billing state of the customer
%{BILL_ZIP} is the billing zip code of the customer
%{BILL_COUNTRY} is the billing country of the customer
%{USER_AGENT} is the user agent of the customer's browser
%{COOKIES} are the incoming cookies contained in the headers from the customer's browser
ELF2 Item Line
The ELF2 item line contains the following tab-separated fields (empty fields should contain a '-' character):
%{ORDERID} %{REMOTE_HOST} %{DATE/TIME} %{PRODUCT_CODE} %{PRODUCT_NAME} %{VARIATION} %{PRICE} %{QUANTITY} %{UPSOLD} %{USER_AGENT} %{COOKIES}
where:
%{ORDERID} is the order number
%{REMOTE_HOST} is the hostname/IP address of the remote machine
%{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
%{PRODUCT_CODE} is the identifier of the product
%{PRODUCT_NAME} is the name of the product
%{VARIATION} is an optional variation of the product for colors, sizes, etc.
%{PRICE} is the unit price of the product (decimal only, no '$' signs)
%{QUANTITY} is the quantity ordered of this product
%{UPSOLD} is a boolean (0|1) indicating whether the product was on sale
%{USER_AGENT} is the user agent of the customer's browser
%{COOKIES} are the incoming cookies contained in the headers from the customer's browser
ELF2 Log File Example
The following 2 lines demonstrate a transaction and corresponding item entry in an ELF2 log:
!36530 123.123.123.123 [21/Aug/2003:11:31:45 -0800] - - 895.00 - - Virginia Beach VA 23452 US "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "__utma=171060324.2002410569.1061216915.1061216915.1061490246.2; __utmb=171060324;__utmc=171060324"
36530 123.123.123.123 [21/Aug/2003:11:31:45 -0800] U5BASE Urchin 5 Base License - 895.00 1 - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "__utma=171060324.2002410569.1061216915.1061216915.1061490246.2; __utmb=171060324;__utmc=171060324"
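As a rough illustration of the ELF2 layout described above, the following Python sketch splits one ELF2 line into named fields. It is not Urchin's own parser: the function names are hypothetical, the sample line uses shortened values, and the sketch assumes that fields are separated by tab characters and that a blank field is written as a single '-'.

```python
from datetime import datetime

# Field order taken from the ELF2 transaction and item line descriptions.
ELF2_TRANSACTION_FIELDS = [
    "ORDERID", "REMOTE_HOST", "DATE/TIME", "STORE", "SESSIONID", "TOTAL",
    "TAX", "SHIPPING", "BILL_CITY", "BILL_STATE", "BILL_ZIP", "BILL_COUNTRY",
    "USER_AGENT", "COOKIES",
]
ELF2_ITEM_FIELDS = [
    "ORDERID", "REMOTE_HOST", "DATE/TIME", "PRODUCT_CODE", "PRODUCT_NAME",
    "VARIATION", "PRICE", "QUANTITY", "UPSOLD", "USER_AGENT", "COOKIES",
]


def parse_elf2_line(line):
    """Return (kind, record) for one tab-separated ELF2 line (hypothetical helper)."""
    line = line.rstrip("\r\n")
    if line.startswith("!"):
        kind, names, line = "transaction", ELF2_TRANSACTION_FIELDS, line[1:]
    else:
        kind, names = "item", ELF2_ITEM_FIELDS
    values = line.split("\t")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    # Assumption: a lone '-' marks a blank field; represent it as None.
    record = {n: (None if v == "-" else v) for n, v in zip(names, values)}
    return kind, record


def parse_clf_time(text):
    """Parse a common-log-format timestamp such as [21/Aug/2003:11:31:45 -0800]."""
    return datetime.strptime(text.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


# Sample transaction line with shortened user agent and cookie values.
sample = "\t".join([
    "!36530", "123.123.123.123", "[21/Aug/2003:11:31:45 -0800]", "-", "-",
    "895.00", "-", "-", "Virginia Beach", "VA", "23452", "US",
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '"__utmc=171060324"',
])
kind, txn = parse_elf2_line(sample)
when = parse_clf_time(txn["DATE/TIME"])
```

Splitting on tabs alone keeps multi-word values such as "Virginia Beach" intact, which is why the format forbids tab characters inside a field.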
ELF Log Format
ELF Transaction Line
The ELF transaction line begins with an '!' exclamation character and contains the following tab-separated fields (empty fields should contain a '-' character):
!%{ORDERID} %{REMOTE_HOST} %{STORE} %{SESSIONID} %{DATE/TIME} %{TOTAL} %{TAX} %{SHIPPING} %{BILL_CITY} %{BILL_STATE} %{BILL_ZIP} %{BILL_COUNTRY}
where:
%{ORDERID} is the order number
%{REMOTE_HOST} is the hostname/IP address of the remote machine
%{STORE} is the name/id of the storefront
%{SESSIONID} is the unique session identifier of the customer
%{DATE/TIME} is the time in the common log format [dd/mmm/yyyy:HH:MM:SS +/-ZZZZ]
%{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
%{TAX} is the amount of tax charged to the subtotal
%{SHIPPING} is the amount of shipping charges
%{BILL_CITY} is the billing city of the customer
%{BILL_STATE} is the billing state of the customer
%{BILL_ZIP} is the billing zip code of the customer
%{BILL_COUNTRY} is the billing country of the customer
ELF Item Line
The ELF item line contains the following tab-separated fields (empty fields should contain a '-' character):
%{ORDERID} %{PRODUCT_CODE} %{PRODUCT_NAME} %{VARIATION} %{PRICE} %{QUANTITY} %{UPSOLD}
where:
%{ORDERID} is the order number
%{PRODUCT_CODE} is the identifier of the product
%{PRODUCT_NAME} is the name of the product
%{VARIATION} is an optional variation of the product for colors, sizes, etc.
%{PRICE} is the unit price of the product (decimal only, no '$' signs)
%{QUANTITY} is the quantity ordered of this product
%{UPSOLD} is a boolean (0|1) indicating whether the product was on sale
ELF Log File Example
The following lines demonstrate 2 transactions and corresponding item entries in an ELF log:
!12313 ppp46.miatc2.netrox.net ZongStore 1102323131 [27/Jul/1999:11:43:02 -0700] 198.12 8.12 10.00 Cedar Rapids Iowa 52403 US
12313 102 T Shirt XL 10.00 10 0
12313 103 Boxers L 9.00 10 0
!12314 213.12.54.123 - 110123413 [27/Jul/1999:11:43:02 -0700] 11.75 0.75 1.00 Santa Ana CA 92705 US
12314 102 T Shirt S 10.00 1 0
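The multi-line structure of the example above (one '!' transaction line followed by its item lines) can be sketched in Python as follows. This is only an illustration under stated assumptions: group_elf_orders is a hypothetical helper, the lines are assumed to be tab-separated, and the '-' placeholder and the '-0700' time zone sign in the sample data are assumptions about how the log is written.

```python
def group_elf_orders(lines):
    """Group multi-line ELF/ELF2 records: each '!' line starts a new order."""
    orders = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        if line.startswith("!"):
            fields[0] = fields[0][1:]  # drop the leading '!'
            orders.append({"transaction": fields, "items": []})
        elif orders:
            # An item line belongs to the most recent transaction line.
            orders[-1]["items"].append(fields)
        else:
            raise ValueError("item line found before any transaction line")
    return orders


log = [
    "!12313\tppp46.miatc2.netrox.net\tZongStore\t1102323131\t"
    "[27/Jul/1999:11:43:02 -0700]\t198.12\t8.12\t10.00\tCedar Rapids\tIowa\t52403\tUS",
    "12313\t102\tT Shirt\tXL\t10.00\t10\t0",
    "12313\t103\tBoxers\tL\t9.00\t10\t0",
    "!12314\t213.12.54.123\t-\t110123413\t"
    "[27/Jul/1999:11:43:02 -0700]\t11.75\t0.75\t1.00\tSanta Ana\tCA\t92705\tUS",
    "12314\t102\tT Shirt\tS\t10.00\t1\t0",
]
orders = group_elf_orders(log)
```

Each entry in orders then holds one transaction field list and the item field lists that followed it in the log.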
Custom Ecommerce Logs
Overview
Many shopping carts capture and log valuable information about purchases in formats other than ELF or ELF2, and these logs therefore cannot be processed automatically by Urchin. This article explains how to create a custom log format for your Ecommerce log file if you cannot alter your shopping cart to generate ELF/ELF2. Before continuing, please read the article titled "Custom Log Formats" in the Advanced Topics > Customization section of the Document Library, which explains the creation of custom logs in detail.
Ecommerce Log Format Types
Shopping carts can log information about purchases and the items purchased in either a single-line format or a multi-line format. In a single-line format, each line contains all the information necessary to completely describe a transaction and the items purchased, and all lines have the same layout. In multi-line formats, multiple lines are used to describe a purchase, with one format for the transaction lines and another format for the items purchased. ELF/ELF2 logs are multi-line formats. You must examine your Ecommerce logs to determine whether the data is single-line or multi-line, as this affects how you set up your custom log format. Please follow the instructions below for your type of log format.
General Ecommerce Logging Requirements
Regardless of the format of the log entries your shopping cart produces, each entry must contain the date and time and at least one of the following fields to provide visitor correlation:
Remote Host or IP Address (for IP-Only or IP+UserAgent visitor methods)
Useragent (for IP+UserAgent visitor method)
Cookies (for UTM or SID visitor method)
Session ID (for SID visitor method)
If the fields required by your chosen visitor method are missing, Urchin will not produce meaningful analysis of your revenue.
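The logging requirements above can be expressed as a small check. The following Python sketch reports which visitor methods a given set of log fields could support. The field labels (date_time, remote_host, user_agent, cookies, session_id) and the function name are illustrative names chosen for this sketch, not Urchin identifiers.

```python
# Fields each visitor method needs in every ecommerce log entry, following
# the requirements list. Each method maps to a list of alternatives; any
# one alternative being fully present is enough.
REQUIRED_FIELDS = {
    "IP-Only": [{"remote_host"}],
    "IP+UserAgent": [{"remote_host", "user_agent"}],
    "UTM": [{"cookies"}],
    "SID": [{"cookies"}, {"session_id"}],
}


def usable_visitor_methods(available_fields):
    """Return the visitor methods that the available log fields can support."""
    available = set(available_fields)
    if "date_time" not in available:
        return []  # every entry must carry the date and time
    return [
        method
        for method, alternatives in REQUIRED_FIELDS.items()
        if any(needed <= available for needed in alternatives)
    ]


basic = usable_visitor_methods({"date_time", "remote_host"})
full = usable_visitor_methods({"date_time", "remote_host", "user_agent", "cookies"})
```

Here basic contains only IP-Only, while full also includes IP+UserAgent, UTM, and SID.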
Urchin also defines the following Ecommerce fields:
%{ORDERID} is the order number
%{STORE} is the name/id of the storefront
%{SESSIONID} is the unique session identifier of the customer
%{TOTAL} is the transaction total including tax and shipping (decimal only, no '$' characters)
%{TAX} is the amount of tax charged to the subtotal
%{SHIPPING} is the amount of shipping charges
%{BILL_CITY} is the billing city of the customer
%{BILL_STATE} is the billing state of the customer
%{BILL_ZIP} is the billing zip code of the customer
%{BILL_COUNTRY} is the billing country of the customer
%{PRODUCT_CODE} is the identifier of the product
%{PRODUCT_NAME} is the name of the product
%{VARIATION} is an optional variation of the product for colors, sizes, etc.
%{PRICE} is the unit price of the product (decimal only, no '$' characters)
%{QUANTITY} is the quantity ordered of this product
%{UPSOLD} is a boolean (0|1) indicating whether the product was on sale
Single-line Format Logs
Follow these instructions if every hit in your Ecommerce log file has the same line format, as explained above.
1. Create a new custom log format in the lib/custom/logformats directory by making a copy of the custom.lf.sample logformat file. Name your copy with a .lf suffix.
2. Edit your new custom log format file and set the following entries based on the recommendations below:
PrimaryPositions: This entry specifies the order of fields in your log file. Create a comma-separated list of field ids that describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file. See the example below.
SecondaryPositions: Leave this as '-' since it is not used for single-line format log files.
PrimaryKey: Leave this as '-' since it is not used for single-line format log files.
SecondaryKey: Leave this as '-' since it is not used for single-line format log files.
PrimaryContent: Valid entries for this field are TRANSACTION or ITEM. If the hits in your log file describe the purchase of each individual product, set this to ITEM. If the hits in the log file describe the entire purchase, set this to TRANSACTION.
SecondaryContent: Leave this as '-' since it is not used for single-line format log files.
CommentKey: If some of the lines in your log file are comments or are not considered hits and begin with a specific character, enter the character here.
FieldSeparator1: The field separators define which characters are considered field separators. Typical entries are tabs (\t) and spaces (\s). Set these appropriately based on the characters between the fields in your log file.
FieldSeparator2: See FieldSeparator1 above.
QuotesEscapeSep: This specifies whether field separators will be ignored inside a field that is enclosed in quote "" characters. This should probably be left as YES.
BracketsEscapeSep: This specifies whether field separators will be ignored inside a field that is enclosed in bracket [] characters. This should probably be left as YES.
MergSuccessiveSep: This specifies whether to consider two separator characters in a row as one separator. This can probably be left as NO.
CleanWhiteSpace: This specifies whether to remove white space from the ends of the fields when they are parsed. This can probably be left as NO.
StatusRequired: Leave this set to NO unless your hits contain web server type status codes.
CustomDateFormat: If your log format contains a custom date format, set the appropriate strptime format that describes the entry.
CustomTimeFormat: If your log format contains a custom time format, set the appropriate strptime format that describes the entry.
3. Save your custom log format in the lib/custom/logformats directory.
4. Select the custom log format for your log source in the Urchin Admin interface.
5. Process your log file(s) with Urchin.
Single-line Format Example
The following example is a single hit from a log that only has transaction data.
12345 123.123.123.123 "Urchin Store" [26/Aug/2003:11:43:02 -0700] 192.73 "San Diego" "CA" 92101 "US" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)" "__utma=171060324.2734232095.1061444425.1061444425.1061444763.2"
The list below shows each field name with the id number obtained from the lib/reporting/logformats/fieldslist.txt file. These id numbers are used in the PrimaryPositions entry in your custom log format file.
    Field                          ID
 1. Transaction ID                 25
 2. Remote Host or IP Address      12
 3. Store Name                     26
 4. Apache Date/Time                3
 5. Total Cost                     28
 6. Bill City                      31
 7. Bill State                     32
 8. Bill Zip                       33
 9. Bill Country                   34
10. User Agent                     13
11. Cookies                        14
Based on the list above, you would set the following entries in the custom logformat file:
PrimaryPositions: "25, 12, 26, 3, 28, 31, 32, 33, 34, 13, 14"
SecondaryPositions: -
PrimaryKey: -
SecondaryKey: -
PrimaryContent: TRANSACTION
SecondaryContent: -
CommentKey: #
FieldSeparator1: \s
FieldSeparator2: \t
QuotesEscapeSep: YES
BracketsEscapeSep: YES
MergSuccessiveSep: NO
CleanWhiteSpace: NO
StatusRequired: NO
CustomDateFormat:
CustomTimeFormat:
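To make the effect of these settings more concrete, the following Python sketch splits the sample hit on spaces and tabs while ignoring separators inside quotes and brackets, then pairs each value with its field id from PrimaryPositions. It only approximates the behavior the settings describe and is not Urchin's parser: split_fields is a hypothetical helper, the surrounding quotes are kept in the values, and the '-0700' time zone sign in the sample is an assumption.

```python
def split_fields(line, separators=" \t"):
    """Split a log line on separator characters, ignoring separators inside
    "..." or [...] (as with QuotesEscapeSep and BracketsEscapeSep set to YES).
    Successive separators are not merged (as with MergSuccessiveSep set to NO)."""
    fields, current = [], []
    in_quotes = in_brackets = False
    for ch in line:
        if ch == '"' and not in_brackets:
            in_quotes = not in_quotes
        elif ch == "[" and not in_quotes:
            in_brackets = True
        elif ch == "]" and not in_quotes:
            in_brackets = False
        if ch in separators and not (in_quotes or in_brackets):
            fields.append("".join(current))  # separator ends the current field
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Field ids in log order, as listed in the PrimaryPositions entry.
PRIMARY_POSITIONS = [25, 12, 26, 3, 28, 31, 32, 33, 34, 13, 14]

line = ('12345 123.123.123.123 "Urchin Store" [26/Aug/2003:11:43:02 -0700] '
        '192.73 "San Diego" "CA" 92101 "US" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)" '
        '"__utma=171060324.2734232095.1061444425.1061444425.1061444763.2"')
values = split_fields(line)
record = dict(zip(PRIMARY_POSITIONS, values))
```

With this input, record[28] holds the total cost and record[31] the billing city, still wrapped in its quotes.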
The PrimaryPositions entry specifies the field order, and the PrimaryContent entry tells Urchin that this log contains transactions (general information about purchases). The field separators are set to space and tab because the fields are separated by white space. The custom date/time formats are not specified because the date/time is formatted as an Apache date.
Multi-line Format Logs
Urchin can read multi-line formats as long as the first character of each line identifies which format is being used. For example, ELF/ELF2 log files contain an '!' exclamation character as the first character of the transaction line. The item lines do NOT contain a leading '!' character.
Follow these instructions if your Ecommerce log file contains two different line formats, one for the transaction and the other for product or item details.
1. Create a new custom log format in the lib/custom/logformats directory by making a copy of the custom.lf.sample logformat file. Name your copy with a .lf suffix.
2. Edit the new custom log format file and set the following entries based on the recommendations below:
PrimaryPositions: This entry specifies the order of fields in the lines identified by the PrimaryKey. Create a comma-separated list of field ids that describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file.
SecondaryPositions: This entry specifies the order of fields in the lines identified by the SecondaryKey. Create a comma-separated list of field ids that describes your field order. The field names and ids are found in the lib/reporting/logformats/fieldlist.txt file.
PrimaryKey: Set the primary key to the character that identifies a log file line as having the format described by PrimaryPositions.
SecondaryKey: Set the secondary key to the character that identifies a log file line as having the format described by SecondaryPositions.
PrimaryContent: Valid entries for this field are TRANSACTION or ITEM. If the primary lines in your log file describe the purchase of each individual product, set this to ITEM. If they describe the entire purchase, set this to TRANSACTION.
SecondaryContent: See PrimaryContent above.
CommentKey: If some of the lines in your log file are comments or are not considered hits and begin with a specific character, enter the character here.
FieldSeparator1: The field separators define which characters are considered field separators. Typical entries are tabs (\t) and spaces (\s). Set these appropriately based on the characters between the fields in your log file.
FieldSeparator2: See FieldSeparator1 above.
QuotesEscapeSep: This specifies whether field separators will be ignored inside a field that is enclosed in quote "" characters. This should probably be left as YES.
BracketsEscapeSep: This specifies whether field separators will be ignored inside a field that is enclosed in bracket [] characters. This should probably be left as YES.
MergSuccessiveSep: This specifies whether to consider two separator characters in a row as one separator. This can probably be left as NO.
CleanWhiteSpace: This specifies whether to remove white space from the ends of the fields when they are parsed. This can probably be left as NO.
StatusRequired: Leave this set to NO unless your hits contain web server type status codes.
CustomDateFormat: If your log format contains a custom date format, set the appropriate strptime format that describes the entry.
CustomTimeFormat: If your log format contains a custom time format, set the appropriate strptime format that describes the entry.
3. Save your custom log format in the lib/custom/logformats directory.
4. Select the custom log format for your log source in the Urchin Admin interface.
5. Process your log file(s) with Urchin.
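The role of the PrimaryKey, SecondaryKey, and CommentKey entries can be sketched in Python as a simple line classifier. This is an illustration only: classify_line is a hypothetical helper, the sample lines are invented, and treating every unmarked line as a secondary line when no secondary key is given (as with ELF item lines) is an assumption. The last line shows a strptime-style format string using Python's datetime.strptime; which directives Urchin itself accepts is not stated here.

```python
from datetime import datetime


def classify_line(line, primary_key="!", secondary_key=None, comment_key="#"):
    """Decide which line format applies, based on the line's first character."""
    if not line.strip():
        return "blank"
    if comment_key and line.startswith(comment_key):
        return "comment"
    if line.startswith(primary_key):
        return "primary"
    # Assumption: with no secondary key, any other line is a secondary line.
    if secondary_key is None or line.startswith(secondary_key):
        return "secondary"
    return "unknown"


# Invented sample lines from a hypothetical shopping cart log.
lines = [
    "# exported by a hypothetical cart",
    "!1001\t26/08/2003\t192.73",
    "1001\tU5BASE\t1",
]
kinds = [classify_line(entry) for entry in lines]

# A custom date such as 26/08/2003 described with strptime directives.
custom_date = datetime.strptime("26/08/2003", "%d/%m/%Y")
```

Lines classified as primary would then be split according to PrimaryPositions and secondary lines according to SecondaryPositions.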
Visitor Correlation
Overview
Visitor correlation is the process of identifying visitor behavior even if the sessions come from different log files or from independent web or Ecommerce servers. Urchin uses data from the various log sources to analyze relationships between sessions and transactions and then correlates this information to provide a clear picture of how visitor activity relates to purchases, thereby providing valuable return on investment reporting. For example, referrals from search engines and specific keywords can be correlated with the amount purchased from your Ecommerce site to tell you which referrals and search terms are yielding the most revenue. Typically you will have one log for your public-facing website and another log from your secure transaction website. To correlate disparate data sources, Urchin must use the same visitor identification method for each site.
Types of Visitor Correlation
Urchin can correlate visitors using several methods. These methods are described in the Visitor Identification Methods article in the Visitor Tracking section. Your choice of visitor tracking method directly affects the accuracy of the information Urchin has at its disposal to correlate Ecommerce transactions with other website activity. Please choose the method that is suitable for the level of detail you desire. The following data is required in the Ecommerce log file for each of the visitor methods:
UTM: cookies
SID: cookies or SID field
Username: username
IP+UserAgent: remote host or IP address and user agent (i.e. browser type)
IP-Only: remote host or IP address
These items need to be considered when you examine your Ecommerce logging format. Please review the ELF & ELF2 Log Formats and Custom Ecommerce Log Formats documents in this section for more detail on how to ensure the proper data is in your logs.
Configuration
This section presents a general overview of ecommerce visitor correlation configuration issues.
Non-UTM Sites
If you do not have the UTM sensor installed on your sites, then your visitor correlation configuration depends on which ecommerce format you are using.
ELF: Use IP-Only as your visitor tracking method. Urchin will then automatically correlate the various sessions as long as all logs contain the IP fields. Simply choose IP-Only for the Visitor Tracking Method in the Profile's Reporting screen.
ELF2: Use IP+UserAgent as your visitor tracking method. Urchin will then automatically correlate the various sessions as long as all logs contain the IP and UserAgent fields. Simply choose IP+UserAgent for the Visitor Tracking Method in the Profile's Reporting screen.
UTM-Enabled Sites
For UTM-enabled sites, the same version of the UTM sensor must be installed on all the pages you want to track. In the Profile Reporting screen, set Visitor Tracking Method to Urchin Traffic Monitor. In general, you would set the UTM Domain to the domain that is common to the sites you are processing. For example, when processing web logs from ads.urchin.com along with logs from secure.urchin.com, the UTM Domain would be set to urchin.com. For specific details on installing and configuring UTM, please see the Visitor Tracking section of the Documentation Library.
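One way to find the domain that is common to a set of sites, as in the ads.urchin.com and secure.urchin.com example, is to compare hostnames label by label from the right. The Python sketch below is a hypothetical helper for working out the value by hand, not part of Urchin.

```python
def common_domain(hostnames):
    """Return the longest run of trailing DNS labels shared by every hostname."""
    # Reverse each hostname's labels so comparison starts at the top-level domain.
    label_lists = [h.lower().rstrip(".").split(".")[::-1] for h in hostnames]
    shared = []
    for labels in zip(*label_lists):
        if len(set(labels)) != 1:
            break  # first position where the hostnames differ
        shared.append(labels[0])
    return ".".join(reversed(shared))


utm_domain = common_domain(["ads.urchin.com", "secure.urchin.com"])
```

Note that for unrelated sites this returns only a top-level domain such as "com", or an empty string, neither of which is a useful UTM Domain.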
Cancelling Ecommerce Transactions
It is sometimes necessary to back out or cancel an ecommerce transaction. Cancelling orders that did not go through or that were disallowed for one reason or another ensures that your Urchin reports, including Campaign Tracking reports, provide accurate information. To cancel an order or transaction, find the transaction in your ELF or ELF2 log. Then create a duplicate entry containing a negative transaction total that cancels out the original transaction. For example, if the original transaction total is $699, add a duplicate entry with -699 as the transaction total. Read ELF & ELF2 Log Formats to understand the Ecommerce log format that applies to you.
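The cancellation step can be sketched in Python as follows. The helper name and the sample line are hypothetical, and the sketch assumes a tab-separated transaction line in which %{TOTAL} is the sixth field, as it is in both the ELF and ELF2 transaction layouts. Only the total is negated, following the description above; whether tax, shipping, or item lines also need adjusting is not covered here.

```python
def cancellation_line(transaction_line, total_index=5):
    """Return a copy of a tab-separated transaction line with TOTAL negated."""
    fields = transaction_line.rstrip("\r\n").split("\t")
    total = fields[total_index]
    if total == "-":
        raise ValueError("transaction has no total to cancel")
    # Flip the sign textually so the decimal formatting is preserved.
    fields[total_index] = total[1:] if total.startswith("-") else "-" + total
    return "\t".join(fields)


# Invented ELF2 transaction line with a total of 699.00.
original = "\t".join([
    "!1001", "10.0.0.1", "[26/Aug/2003:11:43:02 -0700]", "-", "-",
    "699.00", "-", "-", "San Diego", "CA", "92101", "US", "-", "-",
])
reversal = cancellation_line(original)
```

Appending reversal to the ecommerce log produces a second entry for order 1001 whose total of -699.00 offsets the original.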
Chapter 6: Campaign Tracking Module
Campaign Tracking Overview
The Urchin Campaign Tracking Module accurately tracks visitors from a source, such as a search engine or email link, to a conversion or transaction on your site.
With the Urchin Campaign Tracking Module, you gain the benefits of:
Multi-Session Tracking: Track visitors from lead to conversion across multiple sessions.
ROI Analysis: Buy the keywords that convert. Cut those that don't.
Goal Conversion: Verify conversion to purchase or any other goal.
A/B Testing: Test content and go with what works.
Click Fraud Reporting: Identify and take action against click fraud.
Day Parts Reporting: Don't waste money when your audience is away.
Multi-Dimensional Comparisons: Marketing campaigns, advertising channels, e-mail blasts, search engines, specific keywords, organic searches, and more.
How does it work?
The Urchin Campaign Tracking Module tracks data from a variety of sources to provide closed-loop ROI analysis. Let's look at the steps.
Step 1: From Link to Web Page
Each visitor to your site enters via a link indicating where they clicked from, the keywords they used, if any, as well as campaign and medium information. The patent-pending Urchin Traffic Monitor (UTM3 and UTM4), which is part of the Campaign Tracking Module, parses the link to obtain this information.
The UTM is a small amount of JavaScript code in each of your web pages. You can install the UTM3 or UTM4 manually in each web page or automatically via server-side includes and other template systems. Once installed, the UTM is triggered each time a visitor views the page. The UTM performs three tasks: it ensures that a page hit is registered in the web log even if the page was cached or proxied, it parses the link to obtain and log campaign information, and it updates visitor activity information.
Step 2: Parsing the Link
The UTM parses the incoming link to obtain the campaign information. For example, http://www....com/?utm_source=googleperclick indicates that the visitor clicked on a cost-per-click link on the Google search engine. (UTM4 automatically detects the keywords that the visitor searched on.) Although this particular link uses only two variables, utm_source and utm_medium, which indicate the source, Google, and the medium, "cost-per-click", your links may incorporate three additional variables: utm_campaign, utm_content, and utm_term. These three variables indicate a specific marketing initiative, ad content, and a paid search term (necessary for UTM3), respectively. Information on these variables and on how to set up your Urchin Campaign Tracking Module software is provided in the article Step 1: Track Campaign Data.
The UTM is not limited to parsing links that you embed in emails or paid keywords; it also parses keyword information from organic links. This is important because it enables you to make side-by-side comparisons of paid versus unpaid search results. The UTM recognizes links from the top search engines and parses out the source and keyword information. In addition, the Campaign Tracking Module can be configured to recognize and parse links from custom organic search engines, if required. Information on how to do this is provided in the article Adding a Custom Search Engine.
Step 3: Logging Campaign Information and User Activity
The UTM does two things with the campaign information it parses from the links: it formats a web document request that allows the web server to make a special entry in the web log, and it updates the client first-party cookie. The UTM formats the information it parses from the link into the appropriate web document request, which results in the web server adding the referral information to the web log. The UTM also reads the client's first-party cookie, updating user tracking information as required. For example, if this is the user's first visit to your site, the UTM adds the campaign tracking information to the cookie. If the user previously found and visited your site, the UTM increments the session counter in the cookie. Regardless of how many sessions or how much time has passed, the UTM "remembers" the original referral. This gives the Campaign Tracking Module true multi-session tracking capability.
Step 4: Adding Goal and CPC Data
For the purposes of campaign tracking and ROI calculation, the Urchin Database receives a conversion goal via the Urchin Admin interface (optional), search engine cost-per-click data (optional), and data from the web log. Once a page in your web site has been defined as a conversion goal, the Urchin Campaign Tracking Module can calculate metrics indicating how successful your site is at converting visitors. By comparing referrals, sessions, and visitor activity to conversions, the Urchin Campaign Tracking Module can report on the effectiveness of your keywords, mediums, campaigns, and content. The system can also report latency metrics such as time to goal and sessions to goal. To learn how to define a conversion goal, read Step 3: Define a Conversion Goal.
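The link parsing of Step 2 and the cookie bookkeeping of Step 3 can be illustrated with a short Python sketch. The actual UTM is JavaScript running in the visitor's browser, so this is only a model of the logic: the function names, the dictionary standing in for the first-party cookie, and the example URLs on www.example.com are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# The five campaign variables named in Step 2.
CAMPAIGN_VARIABLES = ("utm_source", "utm_medium", "utm_campaign",
                      "utm_content", "utm_term")


def campaign_info(url):
    """Extract whichever campaign variables are present in a link's query string."""
    query = parse_qs(urlsplit(url).query)
    return {name: query[name][0] for name in CAMPAIGN_VARIABLES if name in query}


def update_visitor_state(state, url):
    """Model the cookie update: record the referral on the first visit,
    afterwards only increment the session counter."""
    if state is None:
        return {"sessions": 1, "original_referral": campaign_info(url)}
    return {"sessions": state["sessions"] + 1,
            "original_referral": state["original_referral"]}


first = "http://www.example.com/?utm_source=google&utm_medium=cpc"
later = "http://www.example.com/?utm_source=newsletter&utm_medium=email"
state = update_visitor_state(None, first)
state = update_visitor_state(state, later)
```

After the second visit the session counter is 2, while the stored referral still points to the first link, which is the multi-session behavior described in Step 3.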
The Campaign Tracking Module allows you to import your cost-per-click data directly from your Google and Overture spending accounts. This allows the system to report ROI at all levels of granularity, from per-keyword/per-search-engine to per-campaign aggregates. To learn how to import spending data from Google, read Import Cost Data from Google. To learn how to import spending data from Overture, read Import Cost Data from Overture. Updates from the web log to the Urchin Database occur according to the schedule that you establish for your profile, as part of your Urchin base product configuration.

Step 5: Closing the Loop: Reporting and ROI

Once the Urchin database has been updated with visitor activity, a conversion goal, and cost-per-click data, the Urchin Reporting Engine is able to create over fifty campaign tracking reports. Among these reports is the following report excerpt, which compares ROI for the keyword "analytics system architecture" for each search engine (both "cpc", cost-per-click, and "organic") on which visitors searched for the keyword.
In this case, the keyword "analytics system architecture" was purchased on Google Adwords (google[cpc]). Visitors clicked on the sponsored link 102 times, for a total cost of $6.34 to the advertiser. A revenue amount of $89.15 resulted from these clicks, for an ROI of 1306.15%. The average value of each click indicates that the advertiser should bid a maximum of 88 cents per click on this keyword. There was also one click on an organic (unpaid) search link, but it did not result in any revenue.

The Next Step

With visitor tracking and referral link parsing by the UTM, and with Ecommerce revenue and cost-per-click data import, the Campaign Tracking Module can accurately correlate conversions to specific campaigns and keywords, provide side-by-side comparisons of paid versus unpaid keywords, and calculate ROI and conversion ratios for keyword buys. To begin realizing these benefits, read Step 1: Track Campaign Data.
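The figures in the report excerpt can be checked by hand. The sketch below assumes that ROI is computed as (revenue - cost) / cost, which reproduces the percentage quoted above, and that average value is total revenue divided by clicks. The function names are illustrative and are not part of Urchin.

```python
def roi_percent(revenue: float, cost: float) -> float:
    """Return on investment as a percentage: (revenue - cost) / cost * 100."""
    return (revenue - cost) / cost * 100


def average_value_per_click(revenue: float, clicks: int) -> float:
    """Average revenue per click: total revenue divided by clicks."""
    return revenue / clicks


# Figures from the report excerpt: 102 clicks, $6.34 cost, $89.15 revenue.
roi = roi_percent(revenue=89.15, cost=6.34)
avg_value = average_value_per_click(revenue=89.15, clicks=102)

print(f"ROI: {roi:.2f}%")
print(f"Average value per click: ${avg_value:.4f}")
```

Under these assumptions the ROI works out to 1306.15%, and the average value to roughly 87.4 cents per click, in line with the 88-cent maximum bid quoted in the excerpt.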
The Five Dimensions of Campaign Tracking
Effective campaign tracking uses a combination of the following five marketing dimensions:

Source
Medium
Term
Content
Campaign

This article describes how these marketing dimensions are used in the Urchin Campaign Tracking Module to track campaign referrals.
Source

Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.

Medium

The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be "costperclick", indicating a sponsored link for which the advertiser paid, or "organic", indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include "email" and "print".

Term

The term or keyword is the word or phrase that a user types into a search engine.

Content

The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.

Campaign

The campaign dimension differentiates product promotions such as "Spring Ski Sale" or slogan campaigns such as "Get Fit For Summer".

To Learn More

To learn how to use Urchin Campaign Tracking Management software to track your referrals along the five dimensions of campaign tracking, read Step 1: Track Campaign Data.
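To make the five dimensions concrete, the sketch below pulls them out of a tagged link's query string. It is written in Python for illustration only; the UTM itself is JavaScript, and this is not how the UTM is implemented. The function name, the mapping table, and the sample link are assumptions; only the utm_* variable names come from this chapter.

```python
from urllib.parse import urlsplit, parse_qs

# Map each UTM query variable to the dimension it carries.
UTM_DIMENSIONS = {
    "utm_source": "source",
    "utm_medium": "medium",
    "utm_term": "term",
    "utm_content": "content",
    "utm_campaign": "campaign",
}


def campaign_dimensions(url: str) -> dict:
    """Return the campaign dimensions present in a tagged link."""
    query = parse_qs(urlsplit(url).query)
    return {
        dimension: query[variable][0]
        for variable, dimension in UTM_DIMENSIONS.items()
        if variable in query
    }


# Hypothetical tagged link used only for this illustration.
link = "http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale"
print(campaign_dimensions(link))
```

Dimensions that are absent from the link are simply left out of the result, which mirrors the point that a link need not carry all five variables.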
Step 1: Track Campaign Data (Set up UTM3)
In order to track campaign data, you need to:

copy the UTM files to your web site document root,
reference the UTM in your HTML and enable cookies in your logging,
pass the UTM variables in your links.

Copy the UTM files to your web site document root

Copy the files __utm.js and __utm.gif from the util/utm directory of your Urchin distribution to your web site document root. Important: Do not change the names of these files.

Reference the UTM in your HTML and enable cookies in your logging
In the Visitor Tracking section of the Urchin documentation, find the QuickInstall article that applies to your environment. Follow the instructions in Step 2 and Step 3 of the article to reference the UTM in your HTML and enable cookies in your logging.

Pass the UTM variables in your links

The UTM variables provide a way of tracking your referrals along the five dimensions of campaign tracking by attaching campaign data to your links. The UTM parses this campaign data to determine the referral source, the keywords used, and other campaign tracking information. To pass the UTM variables in a link, add a question mark (?) to the URL, followed by the variables and values you would like to assign. Values may be any string containing letters, numbers, underscore (_), and plus (+). Use underscores to separate multiple words in a value (e.g. utm_campaign=think_different); your URL may not contain spaces.

Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

The following link indicates that the visitor was referred by a paid Google link and that the visitor had searched on the keywords "running shoes". It also indicates that the medium was cost-per-click.

Example
http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_term=running+shoes

How you use the variables in your links will depend upon your campaign tracking objectives. For recommendations on how to use the UTM variables for Search Engine Marketing, read How To Analyze Keyword Buying. For recommendations on how to use the UTM variables for A/B testing, read How To Perform A/B Testing. For recommendations on how to use the UTM variables for content-targeted advertising, read How To Track Content-Targeted Ads. Learn how to use each variable from the table below.

Variable Name: utm_source
Description: Required. Use utm_source to identify a search engine or other source.
Example: utm_source=google

Variable Name: utm_medium
Description: Recommended. Use utm_medium to identify a medium such as email, cost-per-click (cpc), or cpccontent.
Example: utm_medium=cpc

Variable Name: utm_term
Description: Required for keyword analysis using Urchin 5.5/UTM3. (Urchin 5.6/UTM4 and later versions automatically detect keywords from cost-per-click referrals.) Use utm_term to identify the keywords that the visitor searched on to get your link. If you specify a utm_term with Urchin 5.6/UTM4 and higher, your specified term overrides the detected term.
Example: utm_term=running+shoes

Variable Name: utm_content
Description: Required for content-targeted advertising and A/B testing. Use utm_content to differentiate ads or links that point to the same URL.
Example: utm_content=logolink

Variable Name: utm_campaign
Description: Required for keyword analysis. Use utm_campaign to identify a specific product promotion or strategic campaign.
Example: utm_campaign=spring_sale
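The link syntax described in this article can be sketched in a few lines of Python. This is not Urchin's URL builder tool; build_tagged_url is a hypothetical helper, and it assumes the base URL does not already carry a query string. Only the utm_* variable names and the sample values come from this article.

```python
from urllib.parse import urlencode


def build_tagged_url(base_url, source, medium=None, term=None,
                     content=None, campaign=None):
    """Append UTM variables to base_url, skipping any that are not set."""
    variables = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_term": term,
        "utm_content": content,
        "utm_campaign": campaign,
    }
    # urlencode writes spaces as "+", e.g. "running shoes" -> "running+shoes".
    query = urlencode({k: v for k, v in variables.items() if v is not None})
    return f"{base_url}?{query}"


print(build_tagged_url("http://www.mycompany.com/", source="google",
                       medium="cpc", term="running shoes"))
```

Generating links from one function (or from the URL builder) keeps the variable names and separators consistent, which matters because the UTM treats differently spelled values as different campaigns.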
Step 2: Install and License Campaign Tracking
You must have an Urchin base product license and a Campaign Tracking Module license in order to use the Campaign Tracking Module. If you wish to perform ROI calculations, you must also have an Ecommerce Module License. If you have not yet installed the Urchin base product, read the Getting Started>System Requirements>Urchin Setup Requirements article and the Getting Started>Installation section now. If you are already using the Urchin base product, follow these steps to obtain a Campaign Tracking Module License:

1. Sign into Urchin as Administrator.
2. In the Urchin Admin Interface, click Configuration.
3. Click Settings>License. The License Information screen appears with a link entitled Upgrade License.
4. Click Upgrade License. The Upgrade License wizard appears.
5. Follow the steps as indicated in the Wizard to purchase and install the license.
Step 3: Define a Conversion Goal
A goal is a web site page which a visitor reaches once she or he has made a purchase or performed some other desired action, such as a download or user registration. Before Urchin can calculate goal conversion metrics, you must define one or more goals within your campaign profile. What is my campaign profile? Your campaign profile is the profile from which you intend to run campaign reports. If you have never used Urchin before, you will first need to create a basic profile. Follow the instructions in Urchin Administration>Profiles>Working with Profiles, then follow the instructions in this article.
If you have already defined a profile, follow the instructions in this article to enable the profile for campaign tracking and create a conversion goal.

Enable a profile for campaign tracking and create a conversion goal

1. Sign on as Administrator and, in the Admin Interface, click Configuration.
2. Click Urchin Profiles>Profiles.
3. Click the Edit key next to the profile you wish to edit.
4. In the Profile Settings tab, click the Campaign Website radio button and click Update.
5. On the Profile Filters tab, make sure that the following filters are applied:
   Decode UTM Campaign Content
   Decode UTM Campaign Name
   Decode UTM Campaign Source (Medium)
   Decode UTM Campaign Source (Medium) Term
   Decode UTM Campaign Term
   If these filters are not applied, click Add. The Filter Wizard appears. Select the PreConfigured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. In the Reporting tab, add a Primary Goal Match, a Primary Goal Field, and click Update.

Urchin logs a goal completion each time that the Primary Goal Field matches the value specified in Primary Goal Match. Any POSIX regular expression may be entered in the Primary Goal Match field. For example, if you select "request_stem" from the dropdown menu as the Primary Goal Field and enter "/downloads" as the Primary Goal Match, Urchin logs a goal completion each time the request_stem (i.e. request URI without query information) has a value of "/downloads". The field "request_stem" is the most common Primary Goal Field; however, other fields may be used as well. For a complete description of fields, read Reference>Regular Field List.

Setting Multiple Goals

It is possible to set multiple goals in the Primary Goal Match field. For example, entering the following in the Primary Goal Match field and "request_stem" in the Primary Goal Field:

/((forms/(downloadarea|registerarea)_confirmation)|(special/profile_form))\.asp

will tag the following pages as goals:

/forms/downloadarea_confirmation.asp
/forms/registerarea_confirmation.asp
/special/profile_form.asp

You may enter any POSIX regular expression, up to 255 characters in length, in the Primary Goal Match field.
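Before saving a multiple-goal expression, it can help to try it against a few sample request stems. The sketch below does this with Python's re module, which is not the POSIX engine Urchin uses; the pattern relies only on grouping, alternation, and an escaped dot, which behave the same way in both. The pattern is written with balanced parentheses, and the is_goal helper, the sample non-goal page, and the choice to match anywhere in the string are assumptions made for illustration.

```python
import re

# Goal pattern from the Setting Multiple Goals example, with balanced parentheses.
GOAL_MATCH = re.compile(
    r"/((forms/(downloadarea|registerarea)_confirmation)|(special/profile_form))\.asp"
)


def is_goal(request_stem: str) -> bool:
    """True if the goal pattern matches anywhere in the request stem."""
    return GOAL_MATCH.search(request_stem) is not None


for stem in [
    "/forms/downloadarea_confirmation.asp",
    "/forms/registerarea_confirmation.asp",
    "/special/profile_form.asp",
    "/forms/contact.asp",  # hypothetical non-goal page
]:
    print(stem, is_goal(stem))
```

The first three stems match and the last does not, which is the behavior the example describes.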
Tagging Your Online Links
If you are using the Urchin Profit Suite or the Urchin Campaign Tracking Module, you'll want to make sure that you've got a comprehensive strategy for tagging your online ads. This is an important prerequisite to allowing Urchin to show you which marketing activities are really paying off. Fortunately, the tagging process goes smoothly once you understand how to differentiate your campaigns. Here is a three-step process to help you get started.

1. Tag only what you need to. Generally speaking, you need to tag all of your paid keyword links (such as those on Google Adwords and Overture), your banners and other ads, and the links inside your promotional email messages. There are certain links that you don't need to, and many times will not be able to, tag. You should not attempt to tag organic (unpaid) keyword links from search engines, and it isn't necessary to tag links that come from referral sites, such as portals and affiliate sites. Urchin automatically detects the search engine and keyword from organic (unpaid) keyword referrals, and you'll see metrics for these referrals in your Urchin reports, typically under "Organic" listings. Urchin also detects referrals from other websites and displays them in your reports, whether or not you have tagged them.

2. Create your links using the URL Builder. Campaign links consist of a URL address followed by a question mark and your campaign variables. But you won't need to worry about link syntax if you fill out the URL Builder form and press the Generate URL button. A tagged link will be generated for you and you'll be able to copy and paste it to your ad. If you are asking "which fields should I fill in?", you're ready for Step 3.
3. Use only the campaign variables you need. Urchin's link tagging capabilities allow you to uniquely identify virtually any campaign you can think of. But don't think that you must use all six fields in the URL Builder form in each of your links. On the contrary, you should usually only need to use three: Source, Medium, and Campaign. Let's look at the best ways to tag the three most common kinds of online campaigns: banner ads, email campaigns, and paid keywords.

                          Campaign      Campaign   Campaign   Campaign   Campaign
                          Source        Medium     Term       Content    Name
Banner Ad                 citysearch    banner                           productxyz
Email Campaign            newsletter1   email                            productxyz
Pay Per Click Keywords    google        ppc                              productxyz
You'll notice that Campaign Term isn't used for any of these links, even the Pay Per Click Keyword campaign. That's because Term is no longer necessary as long as you are using Urchin 5.6/UTM4 and later. (Campaign Term IS necessary if you are still using Urchin 5.5 and UTM3.) What about Campaign Content? We're only covering the most common scenarios in this article, but if you are interested in Campaign Content, read the article How To Perform A/B Testing. What about the Campaign ID/Master Tracking Code? If you want to hide the tagging information that you put in your links, Urchin gives you a way of creating a table that keeps all the information private. To read more about this, see the article How To Use Master Tracking Codes. So get started tagging your links and tracking your way to online success!
Import Cost Data from Google
Importing cost data from Google is easy. Just perform the following steps:

Download your Google AdWords spending into a log file. Download your spending data on a daily or weekly basis, prior to the regularly scheduled run of your profile (or before manually running the profile).

Modify your profile to read the log file. You will only need to modify your profile once, as part of your initial setup.

Download Google AdWords spending into a log file

1. On adwords.google.com, log in to your Google AdWords account.
2. In the Reports tab, click Custom Report.
3. Fill out the URL Report fields and click Create Report.
View: Check the Daily Metrics radio button.

Date Range: If this is the first time you are downloading data for campaign tracking, enter a date range beginning with the date you started tracking campaign data and ending with yesterday's date. (The date you started tracking campaign data is the date you completed implementing the instructions in the article Campaign Tracking Module>Step 1: Track Campaign Data.) If you have already downloaded historical data, enter a date range beginning with the day after your previous download end date, and ending with yesterday's date. For daily downloads (recommended), enter a date range beginning with yesterday's date and ending with yesterday's date.

Detail Level: Click Show options. Check Keyword names and Include all keywords. Check Campaign names and select All Campaigns. Check Ad Group names and select All Ad Groups.

Values

Ad Text: Check Destination URL.

Conversions: (Optional. Check the following if you have enabled conversion tracking on your Google Adwords account. If you do not have conversion tracking enabled, skip this step.)

Report name: Enter a report name.

Scheduling: Checkmarking this box means that you will not have to define this report again. The report format will be saved and will be automatically run each day, week, or month (according to your selection).

Email: Checkmarking this box tells Google to email you when the report has run.
4. Click the "Create Report" button.
5. Once you have manually run the report, or once it has been automatically scheduled and run by Google, the report will appear in the Download Center. (Click the Download Center link at the top of the page.) Select the report for the day or week you want and download it as a .tsv file.

Modify your profile to read the log file

You will only need to modify your profile once, as part of your initial setup.

1. In the Urchin Admin interface, click Configuration, and then Urchin Profiles>Profiles.
2. Click the Edit icon for your campaign profile. Your campaign profile is the profile that you configured as part of Step 2:Configure Urchin>Define a Conversion Goal.
3. On the Profile Settings tab, make sure that Profile Type is Campaign with E-Commerce Website.
4. On the Reporting tab, under Campaign Options, make sure that Primary Goal Match and a Primary Goal Field are filled in. If they are not, read Step 2:Configure Urchin>Define a Conversion Goal.
5. On the Profile Filters tab, make sure that the following filters are applied:
   Decode UTM Campaign Content
   Decode UTM Campaign Name
   Decode UTM Campaign Source (Medium)
   Decode UTM Campaign Source (Medium) Term
   Decode UTM Campaign Term
   If these filters are not applied, click Add. The Filter Wizard appears. Select the PreConfigured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. On the Log Sources tab, click Add. The Log Source Wizard appears.
7. Select PreConfigured Log Source and click Next.
8. In the Available Log Sources area, select a log source that contains your Google AdWords spending data, move it to the Log Sources to Process area, and click Finish.
9. On the Log Sources tab, click Update.
Import Cost Data from Overture
Importing cost data from Overture is easy. Just perform the following steps:

Download your Overture spending into a log file. Download your spending data on a daily or weekly basis, prior to the regularly scheduled run of your profile (or before manually running the profile).

Modify your profile to read the log file. You will only need to modify your profile once, as part of your initial setup.

Download Overture spending into a log file

1. On www.overture.com, click Advertiser Login and log in to your Overture account.
2. Click the Reports tab on the top of the page.
3. In the Select a Report Type dropdown menu, select Account Activity Detail (Match Type).
4. Specify a filter and a date range, and click Create Report. Select the Overture Results filter. Enter a date range. If this is the first time you are downloading data for campaign tracking, enter a date range beginning with the date you started tracking campaign data and ending with yesterday's date. (The date you started tracking campaign data is the date you completed implementing the instructions in the article Campaign Tracking Module>Step 1: Track Campaign Data.) If you have already downloaded historical data, enter a date range beginning with the day after your previous download end date, and ending with yesterday's date. For daily downloads (recommended), enter a date range beginning with yesterday's date and ending with yesterday's date.
5. When the report appears in your browser, scroll to the bottom of the page and click Download as Spreadsheet.
6. Save the file to your logsource directory.
7. Convert the report file to the UTF-8 format, using one of the two methods below.

   Open the file in Excel, select File>Save As, and choose the tab-delimited format.

   or

   In the util directory of your Urchin distribution, execute the following command:

   iconv -f UTF-16 -t UTF-8 filename > newlogsourcename

Modify your profile to read the log file

You will only need to modify your profile once, as part of your initial setup.

1. In the Urchin Admin interface, click Configuration, and then Urchin Profiles>Profiles.
2. Click the Edit icon for your campaign profile. Your campaign profile is the profile that you configured as part of Step 2:Configure Urchin>Define a Conversion Goal.
3. On the Profile Settings tab, make sure that Profile Type is Campaign with E-Commerce Website.
4. On the Reporting tab, under Campaign Options, make sure that Primary Goal Match and a Primary Goal Field are filled in. If they are not, read Step 2:Configure Urchin>Define a Conversion Goal.
5. On the Profile Filters tab, make sure that the following filters are applied:
   Decode UTM Campaign Content
   Decode UTM Campaign Name
   Decode UTM Campaign Source (Medium)
   Decode UTM Campaign Source (Medium) Term
   Decode UTM Campaign Term
   If these filters are not applied, click Add. The Filter Wizard appears. Select the PreConfigured Filter radio button and press Next. Select the filters listed above in the Available Filters area, move them to the Applied Filters area, and click Finish. On the Profile Filters tab, click Update.
6. On the Log Sources tab, click Add. The Log Source Wizard appears.
7. Select PreConfigured Log Source and click Next.
8. In the Available Log Sources area, select a log source that contains your Overture spending data, move it to the Log Sources to Process area, and click Finish.
9. On the Log Sources tab, click Update.
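If neither Excel nor iconv is at hand, the encoding conversion in step 7 of the download procedure can also be done with a short script. The following is a minimal sketch in Python, not a tool shipped with Urchin; the function name is hypothetical, and it assumes the downloaded report is UTF-16 text with a byte-order mark.

```python
from pathlib import Path


def convert_utf16_to_utf8(source: Path, destination: Path) -> None:
    """Re-encode a UTF-16 text file as UTF-8."""
    # The "utf-16" codec uses the byte-order mark to detect endianness.
    text = source.read_text(encoding="utf-16")
    destination.write_text(text, encoding="utf-8")
```

It would be called with the downloaded report as the source and the log source file name as the destination, in the same roles as filename and newlogsourcename in the iconv command.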
Adding Cost and Impression Data
The Urchin Campaign Tracking Module (beginning with version 5.6) allows you to add fixed advertising costs and impression data to campaigns. If, for example, you have a cost associated with search engine optimization, website development, or an email campaign, you can enter this cost and see it reflected in Urchin reports, including campaign ROI calculations. The cost and impression data you enter is aggregated for the date range you specify when viewing reports. For example, if you enter 10,000 impressions for a campaign for January 1 and 5,000 impressions for February 1, Urchin will report 15,000 impressions for the reporting date range of January through February. You may also enter negative numbers, thereby adjusting cost and impression data. Using the same example, if you enter 10,000 impressions for January 1 and -5,000 impressions for February 1, Urchin will report 5,000 impressions for January through February.

How To Add Cost and Impression Data

1. In the Admin interface, click Configuration.
2. Edit the profile to which you wish to add data.
3. In the Storage/DB tab, click the Add Cost Data button. The Add CTM Entry Wizard appears.
4. Enter the date as of which the cost and/or number of impressions should apply.
5. Enter the CTM variable(s) that describe the campaign for which you are entering data. For example, to apply the data towards all organic Google referrals, specify the Source as google and the Medium as organic. To apply the data towards the summer newsletter (and assuming that you tag summer newsletter referrals with utm_source=summer_news), specify the Source as summer_news.
6. Enter the cost amount and/or number of impressions that you want to associate with this campaign.
7. Click Add to Next Run. Urchin adds the cost/impression data to the Urchin database the next time that this profile is run.

Example: How To Add Non-Search Engine Specific SEO Costs

If you wish to enter a cost that applies to all organic search (i.e. the cost is not specific to Google or Yahoo, etc), enter a "" for Source and "organic" for Medium, as shown below.
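The aggregation rule described at the start of this article, including the effect of a negative adjustment, can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not of Urchin's storage; the function name is illustrative and the year is an arbitrary assumption.

```python
from datetime import date


def total_impressions(entries, start, end):
    """Sum impression entries whose date falls within [start, end]."""
    return sum(count for day, count in entries if start <= day <= end)


# Entries as (date, impressions). The year is arbitrary.
entries = [(date(2005, 1, 1), 10_000), (date(2005, 2, 1), 5_000)]
adjusted = [(date(2005, 1, 1), 10_000), (date(2005, 2, 1), -5_000)]

january_to_february = (date(2005, 1, 1), date(2005, 2, 28))
print(total_impressions(entries, *january_to_february))
print(total_impressions(adjusted, *january_to_february))
```

The first total is 15,000 and the second is 5,000, matching the two cases in the article.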
How To Analyze Keyword Buying
How does keyword buying analysis help me?

Which keywords should I invest in? How much should I bid for a keyword? How much do I make on keywords? At which times of the day should I maximize my search engine exposure? How can I identify click fraud? You can answer these and other questions by analyzing your keyword buying with the Urchin Campaign Tracking Module. This article provides a walkthrough of each step, from collecting the data to analyzing the reports.

What are the steps to analyze keyword buying?

License and Install Urchin

You will need to purchase and install the Urchin base product and the Campaign Tracking Module. If you need keyword ROI metrics, you should license the Profit Suite, which includes the Urchin base product, the Campaign Tracking Module, and the E-Commerce Module. Read Step 2:Install and License Campaign Tracking for more information.

Define a Conversion Goal

Read Step 3:Define a Conversion Goal to learn how to specify a goal for your site.

Purchase Your Keywords
For each purchased keyword from a pay-per-click search engine (such as Google or Overture), you will need to set up a referral link to your site and embed UTM variables. This article describes the best way to use the UTM variables for keyword buying analysis, below.

Track Campaign Data

You will need to install the UTM, enable cookie logging, and embed UTM variables in your referral links. The article Step 1:Track Campaign Data contains information on how to do all three of these things, and additional information on how to use the UTM variables for keyword buying analysis is provided in this article, below.

Import Keyword Spending Data (only necessary for ROI reporting)

Read Import Cost Data from Google and/or Import Cost Data from Overture.

Import Ecommerce Data (only necessary for ecommerce ROI analysis reports)

Optimize Your Keyword Buys

Information on how to use the keyword reports to optimize your keyword buying is provided in this article, below.

Which UTM variables should I use for keyword analysis?

If you are using UTM4 (Urchin 5.6) and later versions

For paid search engine links, such as Google AdWords, use utm_source, utm_medium, and utm_campaign. If you are using broad matching, and you wish to see metrics for the broad matched keyword (rather than letting UTM detect the specific keyword), you may wish to use utm_term. The following link is an example of how you would use the UTM variables in a Google AdWords link. It indicates that the referral came from a paid Google search term and that the medium was cost-per-click. It also indicates that the visitor clicked on your adidas promotion link. The UTM4 automatically detects the keywords used to find your site.

Example
http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_campaign=adidas

Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.
If you are using UTM3 (Urchin 5.5 and 5.501)

For paid search engine links, such as Google AdWords, use utm_source, utm_medium, utm_term, and utm_campaign. The following link is an example of how you would use the UTM3 variables in a Google AdWords link. It indicates that the referral came from a paid Google search term, that the medium was cost-per-click, and that the visitor had searched on the keywords "running shoes". It also indicates that the visitor clicked on your adidas promotion link.

Example
http://www.mycompany.com/?utm_source=google&utm_medium=cpc&utm_term=running+shoes&utm_campaign=adidas
Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

What about unsponsored links? (aka unpaid, free, or organic listings in search engines)

You only embed variables in sponsored links: links for which you paid on a search engine, or links over which you otherwise have control, such as links in an email that you send to customers. You don't have to worry about unsponsored links because Urchin Campaign Tracking automatically determines which search engine the referral came from and which keywords the visitor used.

Be consistent with UTM variables.

It is important that you use consistent names and spellings for all of your campaign variable values. For example, choose a code or name that indicates cost-per-click and use it consistently. To Urchin Campaign Tracking, utm_medium=cpc and utm_medium=cost_per_click are different mediums. Beginning with Urchin 5.6, a master tracking code feature is available that significantly reduces the possibility of consistency errors. Read How To Use Master Tracking Codes.
Optimize your keyword buys.

Which keywords should I buy? How much should I pay for a keyword? How much do I make on a keyword? At which times of the day should I maximize keyword exposure? How can I identify click fraud?

Which keywords should I buy?

You should buy keywords that return the highest number of transactions and/or goals, or yield the highest revenue. Begin by looking at the Keyword Comparison>Conversion report. Which keywords deliver the highest goal conversion and/or sales conversion rates? In this example, the highest goal conversion and sales conversion rates (1.8% and 1.75%, respectively) come from the third item. (Note: Sales conversion rate appears only if you have licensed the Ecommerce module.)
Next, look at the Keyword Analysis>Conversion report and drill down to see keyword-by-keyword detail inside each of your organic search engines. Keywords that deliver high conversion rates on organic search engines are often good keywords to buy. If you have licensed the Ecommerce module, look at the Keyword Comparison>ROI report and the Keyword Analysis>ROI report. Which keywords perform the best on each search engine? Again, organic search engine results can often give a good indication of how a keyword will perform as a sponsored link on a particular search engine.

How much should I pay for a keyword?

To answer this question, you will need to have licensed the Ecommerce module. Look at the Keyword Comparison>ROI report and drill down on the keyword you are analyzing. The Avg. Value metric will tell you the average value per click, or total revenue divided by clicks. This is the maximum amount you should bid on the keyword. Note that the average value does not take into account production costs or other business expenses. In the example below, the keyword "analytics system architecture" on the "google [cpc]" (Google cost-per-click) search engine is yielding an average value of 33 cents.

How much do I make on a keyword?

If you have licensed the Ecommerce module, look at the Keyword Comparison>ROI report or the Keyword Analysis>ROI report. Both of these reports show your Return on Investment for each keyword on each search engine, for all keywords across a single search engine, and for all search engines across a single keyword. In the example below, the cost-per-click ROI for "analytics system architecture" on Google is 763%.
At which times of the day should I maximize a keyword's exposure?

Look at the Day Parts Breakdown>Goal Conversion by Hour or Sales Conversion by Hour. Drill down on the keyword you are analyzing. The report will display the number of goals or transactions and a conversion rate by hour of the day. The timezone is controlled by your administrator using the Time Offset field on the Reporting tab of Configuration>Urchin Profiles>Profiles>Edit. By default, this is set to "Local Time".

How can I identify click fraud?

Look at the Click Fraud Watch>Repeat Clicks by IP report. Drill down on any IPVisitor ID to view search engines. If any cost-per-click search engines appear with Repeat Clicks, click on those search engines to determine the keyword(s) on which Repeat Clicks occurred. You can ignore any Repeat Clicks on organic search engines since these clicks are harmless. Note that Repeat Clicks do not necessarily indicate hostile activity; Repeat Clicks often occur naturally as a result of visitors going back and forth between the referral and your site. Look for a high number of Repeat Clicks (10 or more) per day over a period of several days on specific paid keywords. You can click on an IPVisitor ID in the Repeat Clicks by IP report to obtain information on the click originator. Note that this data may contain information on the ISP and not the individual visitor. For additional information on click fraud, visit Alchemist Media, Inc. Click Fraud Guidelines or contact Jessie Stricchiola (jessie@alchemistmedia.com).

Acquisition report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. For example, in the following report, the report displays referrals from three source[medium] combinations.
Drill down on the source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. The CTR (Click Through Rate) tells you the percentage of impressions (ad displays) that resulted in clicks. The %New field tells you the percentage of clicks that are new leads.

Which versions of my advertisements refer the visitors most interested in my site?
Look at the Content (A/B) Testing>Quality report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Depth is a measure of interest which tells you the average number of pages on your site that each visitor viewed. Loyalty indicates the average number of times visitors returned to your site.

Which versions of my advertisements refer the visitors most likely to reach a conversion goal on my site?
Look at the Content (A/B) Testing>Conversion report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Goal Conv. (Goal Conversion Rate) is the percentage of referrals that reached a goal on your site. If you have licensed the Ecommerce Module, Sales Conv. (Sales Conversion Rate), the percentage of referrals that made a purchase on your site, is also displayed.

Which versions of my advertisements provide the biggest return?
If you have licensed the Ecommerce module, you will be able to see the revenue associated with each version of content. Look at the Content (A/B) Testing>ROI report. In the Filter field, type "referral|none|organic" and click the minus button. The report now displays only referrals that were tagged with UTM variables. Drill down on a source[medium] for the content you would like to examine. The report now displays the different versions of content within that source[medium]. Revenue is the gross revenue associated with referrals from the content within the source[medium]. For Content Analysis, the Cost column will always be 0 and the ROI column will be equal to Revenue.
How To Track Content-Targeted Ads
If you currently purchase keywords on Google or Overture, you should also consider participating in the Google and Overture content-targeted advertising programs. These programs place your cost-per-click search ads on content sites that are published by Google and Overture partners. For example, if you sell vacation packages to France, your ad might appear in an article on Parisian restaurants. Participation in the Google and Overture programs is free, and you pay the same cost-per-click that you pay for search engine referrals.
To track your content-targeted ad referrals, you will need to:
- sign up for content-targeted advertising on your Google and/or Overture cost-per-click account
- edit your links to track content-targeted ad referrals
Once you have signed up for content-targeted advertising, each of your cost-per-click ads will have two links associated with it: one link for search referrals and one link for content-targeted referrals. You will need to edit the link used for content-targeted referrals so that you can track search engine referrals and content-targeted referrals separately.

Which UTM variables should I use to track content-targeted ad referrals?
For content-targeted ad referrals, you should use utm_source, utm_medium, and utm_content. Use utm_source to indicate the search engine. Use utm_medium to indicate a cost-per-click content-targeted ad. For example, you might use "utm_medium=cpccontent" to differentiate from your search referrals, which say "utm_medium=cpc". Use utm_content to specify which ad referred the visitor.
If you have multiple types of products or multiple campaigns, you should also use utm_campaign. For example, if you have a spring sale campaign and an adidas promotion, you should indicate the appropriate campaign in your link.
The following link illustrates how you would use the UTM variables for a content-targeted ad referral:
Example
http://www.mycompany.com/buy_page?utm_source=googleontent
Urchin provides a URL builder tool that creates links for you, embedding the campaign information that you specify. Using this tool ensures that your links contain the correct syntax.

To sign up for content-targeted ad placements on Google
1. Log in to your AdWords account.
2. In the Campaign Summary table, click the appropriate ad campaign.
3. Click Edit Campaign Settings above the Ad Groups table.
4. At the bottom of the Edit Campaign Settings table, locate the distribution preferences checkboxes, and:
5. Click the checkbox next to content sites in Google's network to check this option. Your ad will be included on additional content sites in the expanded network (if you click again to remove the check, your ad will not be included on these sites).
6. Click Save All Changes at the bottom of the page to finish.

To edit your links to track content-targeted referrals from Google
1. Log in to your AdWords account.
2. Navigate to your campaign and Ad Group.
3. Scroll to the bottom of your keyword list to edit your Ad(s).
4. Click Edit on one of your ads (or click Create New Ad).
5. Create a link according to the guidelines described above, in the section "Which UTM variables should I use to track content-targeted ad referrals?"

To sign up for content-targeted ad placements on Overture
1. Log in to your Overture account.
2. Click the Account SetUp link on the Account tab.
3. Under Content Match Advertising, select On and click Submit.

To edit your links to track content-targeted referrals from Overture
1. Log in to your Overture account.
2. Click the Manage Products tab. "Pay-For-Performance Search | Content Match" displays on the top margin of the page.
3. Click Content Match and select Manage Listings from the drop down menu.
4. Click next to the search term you wish to edit and press Edit Listings.
5. Click Modify Listings in the pop up.
6. Create a link according to the guidelines described above, in the section "Which UTM variables should I use to track content-targeted ad referrals?"
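The tagging scheme described in this article can be sketched in a few lines of Python. This is only an illustration, not the Urchin URL builder: the helper name tag_link, the ad name "ad1", and the campaign name "spring_sale" are assumptions made for the example. The utm_* variable names and the "cpc" and "cpccontent" medium values are the ones used in the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url, source, medium, content=None, campaign=None):
    """Append UTM campaign variables to a landing-page URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep any query parameters the landing page already has.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [("utm_source", source), ("utm_medium", medium)]
    if content:
        params.append(("utm_content", content))
    if campaign:
        params.append(("utm_campaign", campaign))
    return urlunsplit(parts._replace(query=urlencode(params)))


landing = "http://www.mycompany.com/buy_page"
# One link for search referrals, one for content-targeted referrals.
search_link = tag_link(landing, "google", "cpc", campaign="spring_sale")
content_link = tag_link(landing, "google", "cpccontent",
                        content="ad1", campaign="spring_sale")
print(search_link)
print(content_link)
```

Because the two links differ only in utm_medium (and utm_content), reports can separate search referrals from content-targeted referrals for the same ad.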
How To Track Email Campaigns
The Urchin Campaign Tracking Module (beginning with version 5.6 and UTM4) allows you to track email campaign impressions, clickthroughs, and conversions. An email impression is registered when the email recipient opens the email message. A clickthrough is registered when the recipient clicks on a link inside the email message. A conversion is registered when the recipient reaches a goal page on your site or completes a purchase.
This article describes how to:
- create the email message, and
- interpret the email campaign results

Creating the Email Message
You will need to create your email message as described in this section, so that the Urchin Campaign Tracking Module can accurately track impressions (opened emails) and referrals.
1. To track the email impressions, embed the __utm.gif image anywhere in the message as illustrated below.
Example 1 (Tracking the email impressions using a master tracking code)
<img src="http://www.mysite.com/__utm.gif?utmt=imp
Reference the __utm.gif installed on your site. utmt=imp is required. Ad campaign tracking codes follow.
Example 1 illustrates how your reference to __utm.gif should look if you are using master tracking codes. (To learn how to use master tracking codes, read How To Use Master Tracking Codes.) In Example 1, the email impressions will be credited to the master tracking code of 10.
Example 2 (Tracking the email impressions without a master tracking code)
<img src="http://www.mysite.com/__utm.gif?utmt=impcmd=email">
Reference the __utm.gif installed on your site. utmt=imp is required. Ad campaign tracking codes follow.
Example 2 illustrates how your reference to __utm.gif should look if you do not use master tracking codes, and therefore explicitly state your campaign tracking variables in the reference. In Example 2, the email impressions would be credited to the source "news1" and the medium "email".
Explanation of Variables
The reference to __utm.gif includes campaign variables that are different from the campaign variables you use in links. In your reference to __utm.gif, use:
- utmccn instead of utm_campaign
- utmcsr instead of utm_source
- utmcmd instead of utm_medium
- utmcct instead of utm_content
- utmcid instead of utm_id (for use with master tracking codes)
2. To track the email referrals, create links to your site in the email message. Tag these links using the utm_medium, utm_source, utm_content, utm_campaign, and utm_id campaign variables. Continuing with the example above of an email message which you track using the source "news1" and the medium "email":
<a href="http://www.mysite.com/?utm_source=news1">ad text</a>
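As a rough sketch of the two steps above, the following Python builds an impression image reference and a tagged link for the same mailing. The helper names impression_tag and tagged_link are hypothetical, and the parameter order is arbitrary; the variable names (utmt=imp, utmcsr, utmcmd, utm_source, utm_medium) and the values "news1" and "email" are taken from this article. Ampersands are written as &amp; because the URLs sit inside HTML attributes.

```python
from html import escape
from urllib.parse import urlencode


def impression_tag(gif_url, **campaign):
    """Build an <img> reference to __utm.gif with utmt=imp (hypothetical helper)."""
    # utmt=imp comes first; campaign variables follow in sorted order.
    query = urlencode([("utmt", "imp")] + sorted(campaign.items()))
    return '<img src="%s">' % escape(gif_url + "?" + query, quote=True)


def tagged_link(url, text, **campaign):
    """Build an <a> element whose href carries utm_* variables (hypothetical helper)."""
    query = urlencode(sorted(campaign.items()))
    return '<a href="%s">%s</a>' % (escape(url + "?" + query, quote=True), escape(text))


# Impression reference uses the utmc* names; the link uses the utm_* names.
tag = impression_tag("http://www.mysite.com/__utm.gif", utmcsr="news1", utmcmd="email")
link = tagged_link("http://www.mysite.com/", "ad text",
                   utm_source="news1", utm_medium="email")
print(tag)
print(link)
```

Keeping the source and medium values identical in both elements is what lets impressions and clickthroughs be reported against the same mailing.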
Analyzing the Results
To see the number of clicks, impressions, and the clickthrough rate for email campaigns, look at the Medium Comparison>Acquisition report under Campaign Tracking. For example, if you have tagged your links with utm_medium=email, this report will show all your email activity summarized under "email". You can drill down on "email" to view each individual newsletter or mailing (source). If you are conducting A/B testing in conjunction with an email campaign, look at the Content (A/B) Testing>Acquisition report. For information on A/B Testing, read How To Perform A/B Testing.
How To Use Master Tracking Codes
The Urchin Campaign Tracking Module (beginning with version 5.6 and UTM4) allows you to tag your links using master tracking codes (a utm_id) instead of individual variables. You simply use utm_id in your links, and define the meaning of each utm_id in a table. For example, instead of
http://www.hostsite.com/?utm_source=overturetm_campaign=springpromo
you can use the UTM4 variable utm_id as follows.
http://www.hostsite.com/?utm_id=2
Using utm_id hides your campaign tracking variables from web surfers and makes your tagging process less error-prone, since campaign tracking variable values are specified in a table, where corrections and changes are easily made.
To use master tracking codes, you:
- Define your codes in a table
- Apply the table as a filter to your profile
- Use your master tracking codes in your links

Defining Your Codes
To define your codes:
1. Create a table in Excel that maps your codes to a set of campaign variables. An example is shown below. The first row of the file must begin with "#Fields:", followed by UTM variable names in any order. Each row defines the campaign variable settings for the master tracking code you place in the #Fields column. Use a hyphen (-) to indicate no value. You may omit any fields from the table for which there are no values for any code. For example, utm_term has been omitted from the spreadsheet, below.
2. Save the Excel table as a tab-delimited plain text file in the lib/custom/lookuptables directory of your Urchin distribution. You must save the file with an extension of ".lt".
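If you prefer to generate the lookup table from a script rather than from Excel, something like the following Python sketch could produce the tab-delimited text. The exact column layout is an assumption inferred from the description above ("#Fields:" heading the code column, a hyphen for no value), and the codes and campaign values are invented for illustration; compare the output with a table saved from Excel before relying on it.

```python
import csv
import io

# Hypothetical column choice and rows; utm_term is omitted because no code uses it.
FIELDS = ["utm_source", "utm_medium", "utm_campaign"]
CODES = {
    "1": {"utm_source": "google", "utm_medium": "cpc", "utm_campaign": "springpromo"},
    "2": {"utm_source": "overture", "utm_medium": "cpc"},  # no campaign value
}


def render_lookup_table(fields, codes):
    """Return tab-delimited lookup table text (assumed layout, see lead-in)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # First row: "#Fields:" followed by the UTM variable names.
    writer.writerow(["#Fields:"] + fields)
    for code, values in codes.items():
        # A hyphen marks a field that has no value for this code.
        writer.writerow([code] + [values.get(name, "-") for name in fields])
    return buf.getvalue()


table_text = render_lookup_table(FIELDS, CODES)
print(table_text)
```

The resulting text would then be written to a file ending in ".lt" in the lib/custom/lookuptables directory, as step 2 describes.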

Applying the Table to Your Profile
To apply the table to a profile:
1. In the Admin tool, click Configuration.
2. Edit the profile to which you wish to apply the master tracking codes.
3. In the Profile Filters tab, click Add.
4. In the Filter Wizard:Options screen, select Add New Filter and click Next.
5. In the Filter Wizard:Settings screen, select Lookup Table. The Table Name field appears in the wizard.
6. From the Table Name drop down list, select the name of the table you created in the section Defining Your Codes, above. If your table does not appear in the drop down list, make sure that the table file name ends with .lt and that it has been saved in the lib/custom/lookuptables directory of your Urchin distribution.
7. From the Filter Field drop down list, select utm_id (AUTO).
8. Click Finish.

Using Your Codes in Your Links
Use the values in the Fields column of your lookup table as utm_id values. For example, using the lookup table created in this article, above, you might tag a link as follows:
http://www.hostsite.com/?utm_id=1

Additional Notes
If you use utm_id in conjunction with other UTM variables in a link (such as utm_source), the values of the other variables will be overwritten with the values in your lookup table. For example, using the lookup table shown in this article, above, the utm_source for the following link would be overwritten with the value of "google":
http://www.hostsite.com/?utm_id=1ure
However, the utm_term in the following link would not be overwritten, since no value for utm_term was provided in the lookup table. (utm_source and utm_medium would be overwritten with values from the table.)
http://www.hostsite.com/?utm_id=1ureutm_medium=cpc
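The overwrite rule in the Additional Notes can be modeled in Python as follows. This is a sketch of the behavior as described, not Urchin's filter code; the LOOKUP contents, the function name resolve_campaign, and the sample link values ("overture", "shoes") are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical lookup table: code 1 sets source and medium but has no utm_term.
LOOKUP = {"1": {"utm_source": "google", "utm_medium": "cpc", "utm_term": "-"}}


def resolve_campaign(url, lookup):
    """Return the campaign variables for a link after applying the lookup table."""
    params = dict(parse_qsl(urlsplit(url).query))
    row = lookup.get(params.get("utm_id"), {})
    for name, value in row.items():
        # A hyphen means "no value", so the variable from the link is kept.
        if value != "-":
            params[name] = value
    return params


resolved = resolve_campaign(
    "http://www.hostsite.com/?utm_id=1&utm_source=overture&utm_term=shoes",
    LOOKUP,
)
print(resolved)
```

Here utm_source from the link is replaced by the table value, utm_medium is filled in from the table, and utm_term survives because the table supplies no value for it.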
URL Builder
Urchin CTM URL Generator
Fill in the form information and click the "Generate URL" button, below. If you are new to tagging links or this is your first time using this tool, read Tagging Your Online Links 123.
Step 1: Type the URL of your website.
Website URL: * (e.g. http://www.urchin.com/download.html)
Step 2: If you are using a Master Tracking Code, type the code and go to Step 3.
Campaign ID/Master Tracking Code: * (e.g. 1003)
Step 2: Or, if you are not using a Master Tracking Code, fill in the fields below and go to Step 3.
Campaign Source: * (referrer: google, citysearch, newsletter4)
Campaign Medium: * (marketing medium: ppc, banner, email)
Campaign Term: (keywords. Not necessary beginning with Urchin 5.6)
Campaign Content: (use to differentiate ads)
Campaign Name: * (product, promo code, or slogan)
Step 3: Click the "Generate URL" button.
Help Information
Campaign ID/Master Tracking Code (utm_id): Available for 5.6/UTM4 and later versions. If you are using master tracking codes, enter the code with which to tag this link. If you do not enter a Campaign ID, you must enter a Campaign Source and a Campaign Medium.
Campaign Source (utm_source): Required. Use utm_source to identify a search engine, newsletter name, or other source. Example: utm_source=google
Campaign Medium (utm_medium): Required. Use utm_medium to identify a medium such as email or cost-per-click. Example: utm_medium=cpc
Campaign Term (utm_term): Required for keyword analysis on pre-Urchin 5.6 and UTM3 tracked sites. Use utm_term to identify the keywords that the visitor searched on to get your link. Example: utm_term=running+shoes
Campaign Content (utm_content): Required for A/B testing and content-targeted ads. Use utm_content to differentiate ads or links that point to the same URL. Examples: utm_content=logolink or utm_content=textlink
Campaign Name (utm_campaign): Required for keyword analysis. Use utm_campaign to identify a specific product promotion or strategic campaign. Example: utm_campaign=spring_sale
* Required Fields. You must enter a Campaign ID or enter a Campaign Source and Campaign Medium.
Implementation Checklist
Print this checklist and use it to implement campaign tracking.
help.urchin.com Campaign Tracking Module Articles: The Five Dimensions of Campaign Tracking, Step 1: Track Campaign Data, How To Use Master Tracking Codes; How To Analyze Keyword Buying, Step 1: Track Campaign Data; How To Track Content-Targeted Ads; Step 1: Track Campaign Data, How To Perform A/B Testing, How To Track Email Campaigns
Task Status
Tasks
Create a plan to ensure consistent use of campaign tracking variables in referral links
Tag paid search engine keywords
Tag Google and Overture content-targeted ads
Tag newsletter and advertising links
Determine your license requirements (Profit Suite, Base License+CTM, or Campaign Tracking Module only, if you currently license Urchin 5.0)
Purchase and download software
Step 2: Install and License Campaign Tracking; Getting Started>Installation, Getting Started>Initial Configuration, Step 1: Track Campaign Data; Visitor Tracking>Quick Install; Step 3: Define a Conversion Goal; Import Cost Data from Google, Import Cost Data from Overture; Ecommerce Module ELF & ELF2 Log Formats; Import Cost Data from Google, Import Cost Data from Overture
Install software
Copy the UTM3 or UTM4 to web site and reference it in all HTML pages
Enable cookie logging on webserver
Set up a campaign profile with a goal
Add cost-per-click data log sources to campaign profile for Google and Overture spending data
Add ecommerce (ELF2) log sources to campaign profile for ecommerce revenue data
Create a schedule for downloading keyword spending data from Google and Overture
Chapter 7: Advanced Topics
Utilities
Administration Utilities Overview
Overview
Urchin ships with a number of utility programs that are used for diagnostic and configuration purposes. These utilities are located in the util directory of the Urchin distribution. This document is intended as an introductory overview of these utilities. It is not a comprehensive guide to their usage. Please consult the specific documentation for each utility in the Utilities section of the Advanced Topics area of the Urchin Documentation Center at http://help.urchin.com for detailed information about the capabilities and usage of each of these programs.
All utilities support "-h" and "-v" command line arguments. Invoking a utility with the "-h" argument will give a summary of usage and the available options for that tool. Invoking with the "-v" argument prints the Urchin version of the utility.
geoupdate
This utility is used to check for updates to Urchin's internal DNS database files and download the updates if they are available. The utility can also be used to import custom entries into the DNS databases by using the domain.local file or another specified text file.
inspector
This utility performs basic sanity checks on your installed Urchin distribution, ensuring that the overall structure of the distribution is intact, that all the binaries shipped with the product are the proper version, and that the underlying permissions are correct (on UNIX-type platforms). The utility also reports on the operational status of the Urchin Scheduler (urchind) and the bundled Apache web server (urchinwebd).
u3importer
This utility is used to migrate existing Urchin 3 config file information and report databases into Urchin. It runs interactively and prompts the user for the location of the Urchin 3 config file. The process then imports the Urchin 3 data without disturbing the existing Urchin 3 installation. u3importer cannot import all configuration data specified by Urchin 3 config file directives. Some Urchin 3 directives, such as subreport mode, are not supported in Urchin 5, and have no equivalent. Others, such as filters, are organized significantly differently in Urchin 5, so they cannot be imported exactly as they were specified in Urchin 3. The main objective of this tool is to get all your Urchin 3 report blocks and data imported in a basic fashion so that you have Urchin 5 reporting operational for all your sites, and past report data is available.
uconfexport/uconfimport
These utilities use an XML-style text format to represent the contents of the Urchin 5 configuration database in a human-readable intermediate form. An Urchin configuration can therefore be exported and saved, or imported to restore the state of an Urchin configuration.
Saved configurations can also be modified with an editor, or configuration files can be constructed from scratch, before being imported back into the Urchin configuration database. You can therefore mimic the "config" file functionality that existed with Urchin 3 if desired. It is recommended that you use uconfexport on a regular basis to save your current configuration state as a backup.
uconfdriver
This utility provides a command line interface for administering the Urchin configuration. All functionality present in the Urchin administration interface is available in this utility, so it can completely replace the administration interface for managing all aspects of Urchin. uconfdriver is intended for use in situations where scriptable actions for managing the Urchin configuration are desired. This makes it ideal for environments such as large shared hosting operations, where the amount of data that must be managed makes it impractical to manage Urchin via the web-based administrative interface.
uconfschedule
This global task scheduling utility allows you to schedule all configured Profiles to run at a certain time, including scheduling them all to run immediately. The Urchin Task Scheduler (urchind) must be running for uconfschedule to work.
udbsanitizer
The udbsanitizer program is used to effect repairs on your Profile databases when there is a problem that leads to database inconsistency or corruption. The Urchin log processing engine routinely does database consistency checks while processing logs. When it detects a database that needs repair it will report the need for udbsanitizer to be run. In addition to database repair, this utility allows removal of a single day or a month's worth of data in the event that webserver logs need to be reprocessed.
urchinctl
(Note: this utility lives in the bin directory, not the util directory.) This utility provides a means of starting and stopping the Urchin Scheduler and Urchin Webserver services. On UNIX-type systems, urchinctl is typically called from one of the system's boot-time scripts to automatically start up or shut down Urchin services.
urchin_daemons
On UNIX-type machines Urchin runs a scheduler daemon (urchind) and an Apache webserver (urchinwebd). These programs should be started at boot time. The urchin_daemons script can be added to the system initialization scripts in the location appropriate for your UNIX-type OS, and it will cause the Urchin service daemons to be launched properly.
geoupdate: DNS Database Update Utility
Overview
The geoupdate utility is used to check for updates to Urchin's internal DNS database files and download the updates if they are available. The utility can also be used to import custom entries into the DNS databases by using the domain.local file or another specified text file.
Usage
The geoupdate utility is most often executed by the Urchin scheduler (urchind) based on the __domaindb task. This task is set to check for new updates once per month and can be configured by the user to occur at a certain time using the admin interface. There are two main functions for this utility:
1. Download new versions of the domain databases if they are available
2. Import custom domain entries
The default behavior is to check for updates, download the new databases and overwrite the existing ones, then import the local data from domain.local. The domain.local file must be located in the data/geodata subdirectory of your Urchin installation. The default behaviors can be overridden by running geoupdate from the command line with the appropriate options. The usage is:
geoupdate [-Hhv] [-D | -F] [-i file]
-H    log output to History file instead of stdout
-h    print help information
-v    print version information for this utility
-D    disable download of domain databases (Cannot be used with -F)
-F    force download of domain databases (Cannot be used with -D)
-i    import domain data from specified file rather than domain.local
Using the -D option will disable downloading new databases. This is most often useful when importing local changes into the database without causing a complete update. Using the -F option will force the download of new databases even if the databases are already up to date. This feature is useful if you imported incorrect custom domain information or otherwise damaged your domain database and wish to start over with a fresh copy. Logging output to a history file with the -H option will make the run information available when viewing the Task History screen in the Configuration>Scheduler area of the Urchin administration interface.
Examples
To force a download of the latest domains database:
geoupdate -F
To import custom entries from the domain.local file without downloading a new database:
geoupdate -D
To import custom entries from the file mydomains without downloading a new database:
geoupdate -D -i mydomains
Custom DNS Entries
When using either the domain.local file in the data/geodata directory or some other custom file, the format of the entries should be one entry per line, starting with the IP or network address followed by a space or tab, and then the domain for that address. Spaces are not allowed in the domain name. The allowed forms include the following:
192.168.10.100 somehost.somedomain.net (Explicit hostname IP)
192.168.10.16/24 somedomain.net (IP address with network prefix)
192.168.10.0/24 somedomain.net (subnet with network prefix)
When processing, Urchin will check for specific IPs first and then look for encompassing network ranges.
Considerations
Since geoupdate will completely overwrite any existing Urchin domains database each time it updates, it is advisable that you always keep your local modifications in the domain.local file so they will automatically be added. Otherwise, after an update you will have to add local adjustments manually. Be sure to keep a backup copy of your domain.local file elsewhere.
The geoupdate utility needs an internet connection to be able to check for and download new updates. The utility uses port 80 to communicate with the webserver providing the updates. It is possible that you will have problems when going through firewalls and proxy servers when doing updates. Please consult your network administrator if this is the case.
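To check a domain.local file before importing it, the matching order described under Custom DNS Entries (specific IPs first, then encompassing network ranges) can be approximated with Python's standard ipaddress module. This is a sketch for validating your own entries, not a description of how geoupdate or the Urchin engine is implemented; the function names and the sample text are assumptions.

```python
import ipaddress

# Sample entries in the one-entry-per-line format described above.
SAMPLE = """\
192.168.10.100 somehost.somedomain.net
192.168.10.0/24 somedomain.net
"""


def parse_entries(text):
    """Split entries into explicit host addresses and network ranges."""
    hosts, networks = {}, []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        address, domain = fields
        if "/" in address:
            # strict=False also accepts forms such as 192.168.10.16/24.
            networks.append((ipaddress.ip_network(address, strict=False), domain))
        else:
            hosts[ipaddress.ip_address(address)] = domain
    return hosts, networks


def lookup(ip, hosts, networks):
    """Return the domain for ip: explicit hosts first, then network ranges."""
    addr = ipaddress.ip_address(ip)
    if addr in hosts:
        return hosts[addr]
    for network, domain in networks:
        if addr in network:
            return domain
    return None


hosts, networks = parse_entries(SAMPLE)
print(lookup("192.168.10.100", hosts, networks))
print(lookup("192.168.10.7", hosts, networks))
```

A malformed address raises ValueError during parsing, which makes the sketch usable as a quick sanity check on a custom file.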
inspector: Urchin Installation Integrity Checker
Overview
The Urchin Installation Integrity Checker, inspector, provides a means of checking the Urchin 5 distribution and reporting if there are problems with permissions, missing files, or certain problems with the Urchin configuration itself. By default, it also provides a summary of information about the Urchin installation and the type of platform it is installed on.
The types of operations that inspector can perform are:
- Perform a sanity check on the Urchin distribution, including existence of files and proper permissions
- Check availability of a network port (for Urchin's webserver)
- Reset proper permissions on the Urchin distribution
Usage
inspector is located in the util directory of the Urchin 5 distribution. Usage of the utility is as follows:
inspector [-h] (prints usage message and exits)
inspector [-v] (prints version and exits)
inspector [-v] [-p port] [-r]
where:
-p    checks a specified port to determine availability
-r    directs inspector to fix permissions on the distribution files
When called with no command line arguments, inspector will perform a number of sanity checks on the Urchin installation to ensure integrity of the installation. Specifically, it performs the following operations:
1. Prints the Urchin version and Server information such as the operating system version and hostname
2. Verifies that all the proper Urchin binaries and utilities are in place and are the correct version. On UNIX-type platforms, discrepancies in the permissions of the binaries and files in the Urchin distribution are also noted.
3. Checks and reports the status of the Urchin Apache webserver (urchinwebd)
4. Checks and reports the status of the Urchin Scheduler (urchind)
5. Checks the Urchin configuration and reports the total number of records and some summary licensing information.
When invoked with the "-p" argument, inspector will check for the availability of the specified network port. This option is intended primarily for use by the Urchin installation process, and provides no information about the actual Urchin distribution itself.
The "-r" (repair) option will cause the utility to attempt to repair the permissions of any files in the Urchin distribution that are not consistent with the original installation. This is typically only useful on UNIX-type platforms, and in most cases will require that the utility be run as the "root" user.
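For readers who want to see what a port availability check involves, here is a minimal Python sketch. It is not how inspector is implemented; it only illustrates the general idea of testing whether a TCP port can be bound on the local machine. The function name port_is_available and the default host are assumptions.

```python
import socket


def port_is_available(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Binding fails with OSError if another process holds the port.
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

A result of True only reflects the moment of the check; another process could claim the port before the webserver starts.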
u3importer: Urchin 3 Data Import Utility
Overview
The u3importer is a command line utility found in the util directory of the Urchin distribution. This utility allows for the importing of report configurations from an existing Urchin 3 config file as well as the data associated with each Urchin 3 report. The u3importer has two modes: an interactive mode, which prompts the user for the required information to perform the import operations, and a noninteractive mode that requires no dialog (more suited to unattended scripting operations).
When run in interactive mode without any command line arguments, for each <Report> block and its associated directives in the Urchin 3 config file the utility will:
1. Create an Urchin 5 Profile, Log Source, and Task
2. Create any associated Filter entries
3. Read the Urchin 3 databases and write the data into new Urchin 5 databases in the appropriate location in the Urchin 5 distribution
In noninteractive mode using the "-c" option (explained below), u3importer will simply take Urchin 3 databases and import them into new Urchin 5 database files without making any changes to the Urchin 5 configuration.
Important Note: The utility must be run on the same operating system platform that originally created the Urchin 3 report data. It is unable to read and convert data created on a different operating system platform, and attempting to do so may cause unexpected termination of the utility. If you wish to upgrade from an Urchin 3 installation on one operating system to a new Urchin 5 installation on another platform type, you should first install a temporary copy of Urchin 5 on the old platform. Next, run the u3importer to create an interim Urchin 5 configuration and profile data. Since the resulting Urchin 5 profile data is platform-independent, this data can then be moved over to your permanent Urchin 5 installation on the new platform. Please note that it is not necessary to license the Urchin 5 distribution on your old platform in order to use the u3importer utility.
Please be sure to read the Limitations section below for a more detailed explanation of the limitations of importing Urchin 3 data. Usage Usage of the utility is as follows:
u3importer [-v] (prints usage and exits)
u3importer (interactive mode)
u3importer [-c urchin3reportdatapath urchin5reportdatapath] (noninteractive mode)
Data Importing Procedure
When upgrading from Urchin 3, u3importer should be run before creating any new Urchin 5 profiles, if possible, since it must create a new profile for each Urchin 3 report it reads. The utility will not add data to an existing profile. Instead, if a profile with the same name as the one it is trying to import already exists, it will create a profile with a similar name, with the number 2 appended, and import the data into that profile. Therefore, it is strongly advised to run u3importer before normal operation begins under Urchin 5.
It is also recommended to disable the automation of any existing Urchin 3 processing, so that new log files are not discarded or lost during the upgrade process. This will also ensure that the Urchin 3 data is not changing while running u3importer to import your databases.
Interactive Mode Operation
UNIX
Inside a command line shell on the system where Urchin is installed, change directory to the Urchin util directory and execute u3importer like so:
./u3importer
Windows
On the system where Urchin is installed, open a Command Shell by going to Start>Run..., enter "cmd" and hit Enter. Once the shell window launches, type:
C:
cd \Program Files\Urchin\util
u3importer.exe
Step 1: Locate the Urchin 3 configuration
The u3importer utility will prompt for the location of the Urchin 3 configuration file. A suggested location is provided. To accept the suggestion, simply press return. Otherwise, enter the complete path to the config file located in the Urchin 3 folder. Wherever Urchin 3 was installed, there should be a config file located in that Urchin 3 folder. On UNIX systems, this could be /usr/local/urchin3/config. On Windows systems, this could be C:\Program Files\Urchin3\config. If you cannot find the Urchin 3 installation, please contact your system administrator for details.
Step 2: Import Urchin 3 configuration profiles
Once the utility locates the Urchin 3 configuration, it will list all of the sites that exist in the configuration and prompt you for which ones to import. To import all profiles, press enter. To import only select profiles, type Y or N as each profile is prompted. Before continuing to the next step, you may verify that the configurations were imported correctly by inspecting the Urchin 5 Configuration interface.
Step 3: Import Urchin 3 data
After importing the configurations, the utility will then prompt to import the data associated with each profile.
Importing the data will allow you to view Urchin 3.x historical reports under the new Urchin interface. To import data for all profiles, press Enter. To import data for only selected profiles, type Y or N as each profile is prompted.

Non-interactive Mode Operation

In non-interactive mode, u3importer simply reads existing Urchin 3 databases and creates the associated new Urchin 5 databases. It is up to the user to make sure the Urchin 5 databases are located properly within the Urchin distribution and that the necessary Profiles, Log Sources, and Tasks are created to fully configure the site. To launch u3importer in non-interactive mode without additional dialog, use the "-c" option like so:

u3importer -c /path/to/urchin3_udata /path/to/urchin5_databases
The path to your Urchin 3 data should be the same directory path as shown in the ReportDirectory directive for the given site in your Urchin 3 config file. The Urchin 5 databases can be created anywhere, but if you want them to become part of your Urchin 5 configuration automatically when you later add a profile for the site, point the path at the data/reports subdirectory of your Urchin distribution. As an example, if you have a site named test.urchin.com in your Urchin 3 configuration, then the report block for that site will contain a ReportDirectory directive similar to:
ReportDirectory: /urchin3/test.urchin.com/
Assuming your Urchin 5 installation is in the default location of /usr/local/urchin5, then to convert the Urchin 3 databases for this site and have them put in the proper location in your Urchin 5 installation, you would run the following command (line breaks added for readability, this command would be invoked on a single line):
u3importer -c \
    /urchin3/test.urchin.com \
    /usr/local/urchin5/data/reports/test.urchin.com
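When many sites have to be converted, the per-site command can be generated from the ReportDirectory lines of the Urchin 3 config file. The following is only a sketch under stated assumptions: the helper name u3_import_commands is hypothetical, the option is written as "-c", and each Urchin 5 database directory is assumed to take the name of the last component of the Urchin 3 path, as in the example above. It prints the commands rather than running them, so the result can be reviewed first.

```shell
#!/bin/sh
# Print (not run) one u3importer command per ReportDirectory line found
# in an Urchin 3 config file read from standard input.
# Assumptions: the option is spelled "-c", and the destination directory
# is named after the last component of the Urchin 3 report path.
u3_import_commands() {
    dest_root=$1    # e.g. /usr/local/urchin5/data/reports
    sed -n 's/^[[:space:]]*ReportDirectory:[[:space:]]*//p' |
    while IFS= read -r src; do
        src=${src%/}                  # drop a trailing slash, if any
        [ -n "$src" ] || continue
        echo "u3importer -c $src $dest_root/${src##*/}"
    done
}

# Example with inline sample input instead of a real config file
u3_import_commands /usr/local/urchin5/data/reports <<'EOF'
ReportDirectory: /urchin3/test.urchin.com/
EOF
```

In practice the function would be fed the real config file (for example with input redirected from /usr/local/urchin3/config) and its output piped to sh once it has been checked.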
In a scripted environment, after running this command you would typically use the uconfimport or uconfdriver utilities to create the associated Profile, Log Source, and Task configuration records to complete your migration for this site.

Limitations

Urchin 3 provided only a subset of the reports that are available by default in Urchin 5, and in many cases the reporting data was maintained only on a monthly basis. Therefore, not all Urchin 5 reports will be populated when importing Urchin 3 data. The following outlines the limitations of importing data from Urchin 3:

1. Unpopulated reports: the following reports were not present in Urchin 3, so they will be blank in Urchin 5 after importing Urchin 3 data.
   i. Traffic > (hourly view for all graph reports)
   ii. Visitors & Sessions (all reports)
   iii. Pages & Files > Downloads
   iv. Pages & Files > All Files
   v. Pages & Files > Directory by Files Drilldown
   vi. Pages & Files > Page Query Terms
   vii. Navigation > Click Paths
   viii. Navigation > Length of Pageview
   ix. Referrals > Referral Errors
   x. Domains & Users > IP Addresses
   xi. Domains & Users > IP Drilldown
   xii. Domains & Users > Usernames by Bytes
   xiii. Browsers & Robots > Browsers by Bytes Drilldown
   xiv. Browsers & Robots > Platforms by Bytes Drilldown
   xv. Browsers & Robots > Robots by Hits Drilldown
   xvi. Browsers & Robots > Robots by Bytes Drilldown
   xvii. Client Parameters (all reports)
2. Navigation data in Urchin 3 is stored on a monthly basis, whereas in Urchin 5 it is stored daily. When importing Urchin 3 data, the data is all consolidated into the first of the month for the following reports:
   i. Navigation > Entrance Pages
   ii. Navigation > Exit Pages
3. No imported Profiles are scheduled to run, since there was no concept of scheduled tasks in Urchin 3. Use the uconfschedule utility to globally schedule all imported Profiles to run at a certain time.
4. Unique Log Sources are created for every TransferLog entry in the Urchin 3 config file, even if multiple Urchin 3 reports use that same TransferLog. You may want to consolidate the Log Sources after importing your Urchin 3 data if multiple Profiles share the same log file.
5. Unique Filters are created for each FilterIn, FilterOut, and DynamicURL directive encountered in the Urchin 3 config file. Again, you may want to consolidate the imported Filters if you have multiple Profiles or Log Sources that use the same filter.
6. The u3importer does not perform any type of disk space checking before running, so you will want to ensure that you have enough disk space to perform the import operation. Generally speaking, you will need to allocate at least as much additional disk space as is currently required for all your Urchin 3 report directories.

Considerations

1. No checking is done to see if previous data exists in the Urchin databases for the time range covered by the imported data. Therefore, if you run the u3importer tool again, you will double your statistics for that period.
2. Care should be taken when using the u3importer tool to import data from months that have already been populated with native Urchin 5 data. While the importing process does not overwrite existing data, it does not check whether data already exists for a given day and will add (possibly duplicate) data to the databases.
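Since u3importer does no disk space checking of its own, a script can make a rough check before starting an import. The sketch below follows the rule of thumb given in the limitations above (at least as much free space as the Urchin 3 report directory occupies). The function name has_room_for_import is hypothetical, and the check relies only on the standard du and df utilities.

```shell
#!/bin/sh
# Return 0 if the filesystem holding DEST has at least as many free
# kilobytes as the Urchin 3 report directory SRC currently uses.
has_room_for_import() {
    src=$1
    dest=$2
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # "df -P" guarantees one output line per filesystem; field 4 is the
    # available space, reported in kilobytes because of "-k".
    free_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$need_kb" ]
}
```

A wrapper could then call, for example, has_room_for_import /urchin3/test.urchin.com /usr/local/urchin5/data/reports and skip the import with a warning when the function fails.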
uconfdriver: Configuration Management Utility
Overview
The Configuration Management Utility, uconfdriver, provides a command line interface for administering the Urchin 5 configuration. All functionality present in the Urchin 5 administration interface is available in this utility, so it can completely replace the administration interface for managing any facet of the Urchin 5 configuration.

The uconfdriver utility is intended for situations where the Urchin 5 configuration is to be managed through automated/unattended scripts. This makes it ideal for environments such as web hosting or large corporations, where the amount of data makes it impractical to manage Urchin via the standard web-based administration interface.

The uconfdriver utility is located in the util directory of the Urchin 5 distribution.

Usage

Usage of the utility is as follows:
uconfdriver [-h]     (prints usage message and exits)
uconfdriver [-v]     (prints version and exits)
uconfdriver [-f file] [-d path] command parameters...
where:
-f   specifies the path to a file containing uconfdriver commands
-d   specifies an alternative path to the Urchin configuration database
-e   specifies printing of the "entry=" field (Urchin 5.100 and later)
and command is one of the following (note that each command is all on one line, line breaks added for readability):

action=ntotalrecords
action=nrecords table=tablename
action=list table=tablename [start=startnum] [n=count]
action=get ident
action=add table=tablename name=recordname \
    directive=value [directive=value ...]
action=edit ident directive=value \
    [directive=value ...]
action=delete ident
action=get_parameter ident parameter=directive
action=set_parameter ident directive=value

and ident is one of:

recnum=recordnum
table=tablename entry=tblentry
table=tablename name=recordname

and tablename is one of:

Global
Machine
Filter
Logfile
Profile
Task
Affiliation
Group
User
The format of a uconfdriver command is a series of name/value pairs passed as command line arguments that define the action to be taken. In particular, the action command line argument directs the behavior of uconfdriver. The nine possible actions are described in the Usage section above.

Valid user-specified configuration directives begin with the prefixes "ct_", "cr_", or "cs_". Urchin also uses directives with a "cx_" prefix for internal purposes. These "cx_" directives should never be modified with the uconfdriver utility.

Beginning with Urchin 5.100, you must use the "-e" option to have the "entry=" field printed for uconfdriver commands that return an entire record. This was done to improve uconfdriver performance with very large Urchin configurations. Calculating the "entry" field requires uconfdriver to search sequentially through all records in the database, which is slow on Urchin configurations with tens of thousands of records. When used without the "-e" argument, uconfdriver actions are very quick even with Urchin configuration databases containing millions of records.

When invoked, uconfdriver will execute either the single action specified by the command line arguments or the multiple actions contained in a file specified with the "-f" option. If no command line arguments are given and no "-f" option is specified, uconfdriver will read actions from stdin. When specifying multiple commands via stdin or the "-f" option, each line of input should represent one uconfdriver command.

A complete list of valid Urchin configuration directives is available in the Reference section of the Urchin Document Center at http://help.urchin.com and is entitled "Configuration Table and Directive List".

Overview of uconfdriver Actions

Several of the uconfdriver actions require you to identify the record that is to be retrieved or modified. This identification can be accomplished in three ways.
One of the argument lists below should be substituted for the ident string (see "Usage" above) wherever a command requires a record identifier.

1. If the internal record number is known, substitute "recnum=recordnum" for ident as the command line argument. Note that this record number is not necessarily sequential within a table.
2. If it is desired to loop through all of the records in a particular table, specify "entry=tblentry" and "table=tablename" as the ident command line argument. The first entry defaults to "1".
3. The exact name of the record can be specified using the "name=recordname" argument together with "table=tablename". Be sure to use quotes if the name may contain white space or characters that may be interpreted as metacharacters by the shell running your command.

Description of Actions

Retrieving Data: these commands provide the ability to extract records, partial records, and record counts from the Urchin configuration.

"ntotalrecords" outputs the total number of records in all tables in the Urchin configuration database.

"nrecords table=tablename" outputs the number of records in the specified table.

"list table=tablename [start=startnum] [n=count]" retrieves multiple records from a particular table. Each record is printed on a separate line. The optional "start=startnum" argument specifies the table entry at which to begin, and the "n=count" argument specifies how many entries to retrieve. For example, if a particular table has 44 records and you want to grab records 20-29, specify "start=20 n=10".

"get ident" prints all information associated with the particular ident argument on a single line.

"get_parameter" retrieves the value of a specific directive from the record matching ident in the configuration database, and requires the "parameter=directive" argument.

Modifying Data: the "add", "edit" and "delete" functions provide comprehensive editing of records in the Urchin configuration database. A directive list should also be passed along with these actions. The "set_parameter" function sets a particular directive within a record.

"add" requires both the "table=tablename" and "name=recordname" parameters and inserts a new record into the database with the specified set of "directive=value" pairs.

"edit" is similar to the "add" command, except that the directives of the record matching ident are replaced with the new list of "directive=value" pairs. This function clears all previous directives for the record and adds all new directives specified on the command line.

"delete" deletes the record matching ident from the table.

"set_parameter" sets the specified "directive=value" directive in the configuration for the record matching ident.

Diagnostics Returned and Exit status

Upon completion of a command, uconfdriver will print out one of the following diagnostics based on the action parameter that was specified:

Action          Output        Description
(any)           [usage msg]   command line not in recognized format
(any)           [no msg]      command didn't perform any action, perhaps due to an out of range entry, incorrect table name, etc.
(any)           l             command line parameters invalid for request type
add             [recnum]      record number of record that was created
delete          [recnum]      record number of record that was deleted
edit            [recnum]      record number of record that was edited
get             [record]      complete set of name/value directives for specified record
get_parameter   [param]       value of requested directive
list            [records]     complete list of all name/value directives for all records in the specified table
nrecords        [count]       count of records in the specified table
ntotalrecords   [count]       count of all records in all tables
set_parameter   0             always prints 0 on success
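The "start" and "n" arguments of the list action make it possible to walk a large table in fixed-size pages. The following sketch only computes the argument pairs. It assumes that table entries are numbered from 1, as the ident description above suggests, and trims the last page so that it does not ask for entries past the end of the table. The function name list_pages is hypothetical.

```shell
#!/bin/sh
# Print the "start=... n=..." argument pairs needed to walk a table of
# TOTAL records in pages of at most SIZE entries (entries counted from 1).
list_pages() {
    total=$1
    size=$2
    start=1
    while [ "$start" -le "$total" ]; do
        n=$size
        remaining=$((total - start + 1))
        if [ "$remaining" -lt "$n" ]; then
            n=$remaining          # shorten the final page
        fi
        echo "start=$start n=$n"
        start=$((start + size))
    done
}

# A table of 44 records in pages of 10
list_pages 44 10
```

Each printed pair can be appended to a command such as "uconfdriver action=list table=profile", with the total taken from the nrecords action.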
Please note that you must parse the runtime output from uconfdriver to determine if the command was successful. At present, the utility always exits with a status code of 0, so the exit status cannot be used to determine whether the command succeeded.

Examples
Here is a set of example commands using the uconfdriver utility. Please note that all commands are on a single line; line breaks have been added for readability.
# Extract the total number of records from the Urchin configuration database
uconfdriver action=ntotalrecords

# Extract the number of records in the "profile" table
uconfdriver action=nrecords table="profile"

# List records 6-8 in the "logfile" table
uconfdriver action=list table="logfile" start=6 n=3

# Extract the record for user "joe" from the "user" table
uconfdriver action=get table=user name="joe"

# Add a regular non-privileged user to the configuration
uconfdriver action=add table=user name="bob" ct_fullname="Bob Jones" \
    ct_password="b0bz@pw" cs_adminlevel=3

# Set the default language for the Urchin reports to English
uconfdriver action=set_parameter table=user name="bob" cs_language=en

# Change the network port that the Urchin webserver uses
uconfdriver action=set_parameter recnum=1 ct_port=1234
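Because uconfdriver always exits with status 0, a script has to decide from the printed output whether a command worked. For the add, edit, and delete actions, which print a record number, one possible check is sketched below. The helper names is_recnum and run_add are hypothetical and not part of Urchin. The check assumes that a valid record number is a positive integer.

```shell
#!/bin/sh
# Succeed only if the argument looks like a record number: a non-empty
# string of digits whose value is greater than zero.
is_recnum() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -gt 0 ]
}

# Run the given command, capture what it prints, and fail unless the
# output is a record number. On success the record number is printed.
run_add() {
    out=$("$@")
    if is_recnum "$out"; then
        echo "$out"
    else
        echo "command failed: $* (output: '$out')" >&2
        return 1
    fi
}
```

A script could then write p_recnum=$(run_add ./uconfdriver action=add table=profile name=mysite.com ct_name=mysite.com) || exit 1 and stop as soon as a step fails instead of carrying an empty record number into later commands.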
Sample Bourne Shell script to add a Profile/Task/LogSource/User using uconfdriver

This is a sample working script that demonstrates how uconfdriver could be embedded in a script to automate the creation of an entire new Profile, a Log Source for it to process, a scheduled Task, and a User with rights to view the Profile.
#!/bin/sh
#
# Proof-of-concept Bourne shell script for adding a Profile,
# Task, Log Source and User record set to the Urchin configuration.
# The Profile will be set to run at 01:05am daily.
#
# NOTE: Line breaks have been added for readability
#

# Define the pertinent information here. Obviously, this should really be
# stuff that's parsed from the command line.
domain=mysite.com
logfile=/path/to/webserverlogs/mysiteaccess.log
username=userjoe
password=joepasswd
language=en
region=us

cd /path/to/urchin/util

# Add Profile
p_recnum=`./uconfdriver action=add table=profile name=$domain \
    ct_name=$domain ct_website=http://www.$domain \
    ct_reportdomains=$domain,www.$domain`

# Add Task
t_recnum=`./uconfdriver action=add table=task name=$domain \
    ct_name=$domain cr_frequency=5 cr_enabled=on cs_hour=01 \
    cs_minute=05 cs_rid=$p_recnum`

# Set proper cross reference from Profile to Task
recnum=`./uconfdriver action=set_parameter recnum=$p_recnum cs_taskid=$t_recnum`
if [ "$recnum" != "$p_recnum" ]; then
    echo "Failed to associate profile with task"
fi

# Add Log Source
l_recnum=`./uconfdriver action=add table=logfile name=$domain \
    cr_action=2 ct_name=$domain cr_type=local ct_loglocation=$logfile \
    cs_logformat=auto cs_rlist=!$p_recnum!`

# Set proper cross reference from Profile to Log Source
recnum=`./uconfdriver action=set_parameter recnum=$p_recnum cs_llist=!$l_recnum!`
if [ "$recnum" != "$p_recnum" ]; then
    echo "Failed to associate profile with log source"
fi

# Add regular non-privileged user with access to this Profile
u_recnum=`./uconfdriver action=add table=user name=$username \
    ct_name=$username ct_password=$password ct_fullname="$domain user" \
    cs_language=$language cs_region=$region cs_adminlevel=3 \
    cs_rlist=!$p_recnum!`

# Set proper cross reference from Profile to User
recnum=`./uconfdriver action=set_parameter recnum=$p_recnum cs_ulist=!$u_recnum!`
if [ "$recnum" != "$p_recnum" ]; then
    echo "Failed to associate user with profile"
fi

exit

##
## END OF SCRIPT
##
Here is the same script, written using DOS commands.

@echo off
REM Proof-of-concept Windows batch file for adding a Profile,
REM Task, Log Source and User record set to the Urchin configuration.
REM The Profile will be set to run at 01:05am daily.
REM
REM This should work on Windows 2000, XP and 2003 Server.
REM
REM NOTE: Line breaks (marked with a trailing \) have been added for
REM readability; you will need to ensure that all your commands appear
REM on one line in the script
REM

REM
REM Prompt the user for the information we need. This section could
REM be replaced using values from command line arguments instead.
REM
set /p domain=Enter domain:
set /p logfile=Enter webserver log pathname:
set /p username=Enter username:
set /p password=Enter password:
set /p language=Enter language:
set /p region=Enter region:

cd c:\program files\urchin\util

REM
REM Add Profile
REM
uconfdriver action=add table=profile name=%domain% ct_name=%domain% \
    ct_website=http://www.%domain% \
    ct_reportdomains=%domain%,www.%domain% > #tmp
set /p p_recnum= <#tmp

REM
REM Add Task
REM
uconfdriver action=add table=task name=%domain% ct_name=%domain% \
    cr_frequency=5 cr_enabled=on cs_hour=01 cs_minute=05 \
    cs_rid=%p_recnum% > #tmp
set /p t_recnum= <#tmp

REM
REM Set proper cross reference from Profile to Task
REM
uconfdriver action=set_parameter recnum=%p_recnum% cs_taskid=%t_recnum% > #tmp
set /p recnum= <#tmp
if not %recnum%==%p_recnum% echo Failed to associate profile with task

REM
REM Add Log Source
REM
uconfdriver action=add table=logfile name=%domain% cr_action=2 \
    ct_name=%domain% cr_type=local ct_loglocation=%logfile% \
    cs_logformat=auto cs_rlist=!%p_recnum%! > #tmp
set /p l_recnum= <#tmp

REM
REM Set proper cross reference from Profile to Log Source
REM
uconfdriver action=set_parameter recnum=%p_recnum% cs_llist=!%l_recnum%! > #tmp
set /p recnum= <#tmp
if not %recnum%==%p_recnum% echo Failed to associate profile with log source

REM
REM Add regular non-privileged user with access to this Profile
REM
uconfdriver action=add table=user name=%username% ct_name=%username% \
    ct_password=%password% ct_fullname="%domain% user" \
    cs_language=%language% cs_region=%region% cs_adminlevel=3 \
    cs_rlist=!%p_recnum%! > #tmp
set /p u_recnum= <#tmp

REM
REM Set proper cross reference from Profile to User
REM
uconfdriver action=set_parameter recnum=%p_recnum% cs_ulist=!%u_recnum%! > #tmp
set /p recnum= <#tmp
if not %recnum%==%p_recnum% echo Failed to associate user with profile

del #tmp

REM
REM END OF SCRIPT
REM
Here is the same script, written using VBS. On a Windows system, you would need to use "cscript" to run this script, e.g. "cscript add_records.vbs".
'
' Proof-of-concept win32 cscript for adding a Profile,
' Task, Log Source and User record set to the Urchin configuration.
' The Profile will be set to run at 01:05am daily.
'
' NOTE: Line breaks have been added for readability
'
' Define the pertinent information here. Obviously, this should really be
' stuff that's parsed from the command line.
'
Option Explicit
'On Error Resume Next

'
' Declare needed vars
'
Dim driverpath
Dim tmpfilepath
Dim domain
Dim logfile
Dim username
Dim password
Dim language
Dim region
Dim calls

driverpath  = "C:\Program Files\Urchin\util\uconfdriver.exe"
tmpfilepath = "C:\Program Files\Urchin\var\udtemp"
domain      = "mysite.com"
logfile     = "/path/to/webserverlogs/mysiteaccess.log"
username    = "userjoe"
password    = "joepasswd"
language    = "en"
region      = "us"

'
' Declare objects
'
Dim shell, fso, execo, tfile
Dim recnum, p_recnum, t_recnum, l_recnum, u_recnum
Set shell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")

'
' create temporary empty file
'
Set tfile = fso.CreateTextFile(tmpfilepath)
tfile.Close

'
' add profile
'
calls = """" & driverpath & """ action=add table=profile name=""" & domain & _
    """ ct_name=""" & domain & """ ct_website=""http://www." & domain & _
    """ ct_reportdomains=""" & domain & ",www." & domain & """ -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
While execo.StdOut.AtEndOfStream <> True
    p_recnum = p_recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(p_recnum) = False Or p_recnum <= 0) Then
    WScript.Echo "Add Profile Failed " & p_recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (p_recnum) & IsNumeric(p_recnum) & vbCRLf

'
' add Task
'
calls = """" & driverpath & """ action=add table=task name=""" & domain & _
    """ ct_name=""" & domain & """ cr_frequency=5 cr_enabled=on cs_hour=01" & _
    " cs_minute=05 cs_rid=" & p_recnum & " -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
While execo.StdOut.AtEndOfStream <> True
    t_recnum = t_recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(t_recnum) = False Or t_recnum <= 0) Then
    WScript.Echo "Add Task Failed " & t_recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (t_recnum) & IsNumeric(t_recnum) & vbCRLf

'
' Associate task back to profile
'
calls = """" & driverpath & """ action=set_parameter recnum=" & p_recnum & _
    " cs_taskid=" & t_recnum & " -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
recnum = ""
While execo.StdOut.AtEndOfStream <> True
    recnum = recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(recnum) = False Or recnum <= 0 Or recnum <> p_recnum) Then
    WScript.Echo "Failed to associate profile to task " & recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (recnum) & IsNumeric(recnum) & vbCRLf

'
' Add log source
'
calls = """" & driverpath & """ action=add table=logfile name=""" & domain & _
    """ cr_action=2 ct_name=""" & domain & """ cr_type=local ct_loglocation=""" & _
    logfile & """ cs_logformat=auto cs_rlist=!" & p_recnum & "! -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
While execo.StdOut.AtEndOfStream <> True
    l_recnum = l_recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(l_recnum) = False Or l_recnum <= 0) Then
    WScript.Echo "Failed to Add Log Source " & l_recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (l_recnum) & IsNumeric(l_recnum) & vbCRLf

'
' Associate Log Source to Profile
'
calls = """" & driverpath & """ action=set_parameter recnum=" & p_recnum & _
    " cs_llist=!" & l_recnum & "! -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
recnum = ""
While execo.StdOut.AtEndOfStream <> True
    recnum = recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(recnum) = False Or recnum <= 0 Or recnum <> p_recnum) Then
    WScript.Echo "Failed to associate profile with log source " & recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (recnum) & IsNumeric(recnum) & vbCRLf

'
' Add regular non-privileged user with access to this Profile
'
calls = """" & driverpath & """ action=add table=user name=""" & username & _
    """ ct_name=""" & username & """ ct_password=""" & password & _
    """ ct_fullname=""" & domain & """ cs_language=""" & language & _
    """ cs_region=""" & region & """ cs_adminlevel=3 cs_rlist=!" & p_recnum & _
    "! -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
While execo.StdOut.AtEndOfStream <> True
    u_recnum = u_recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(u_recnum) = False Or u_recnum <= 0) Then
    WScript.Echo "Failed to Add User " & u_recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (u_recnum) & IsNumeric(u_recnum) & vbCRLf

'
' Associate user to profile
'
calls = """" & driverpath & """ action=set_parameter recnum=" & p_recnum & _
    " cs_ulist=!" & u_recnum & "! -f " & tmpfilepath
WScript.Echo calls
Set execo = shell.Exec(calls)
recnum = ""
While execo.StdOut.AtEndOfStream <> True
    recnum = recnum & (execo.StdOut.ReadLine())
Wend
If (IsNumeric(recnum) = False Or recnum <= 0 Or recnum <> p_recnum) Then
    WScript.Echo "Failed to associate profile with user " & recnum
    WScript.Quit(1)
End If
WScript.StdOut.Write (recnum) & IsNumeric(recnum) & vbCRLf

'
' delete temp file and exit
'
If (fso.FileExists(tmpfilepath) = True) Then
    fso.DeleteFile(tmpfilepath)
End If
WScript.Quit (0)

'
' END OF SCRIPT
'
Special Conditions of Use

Passwords

When adding or editing the "ct_password" directive for either a User or a Remote Log Source, uconfdriver will automatically encrypt the password before writing it to the configuration database to ensure that passwords are not stored in clear text. For portability reasons, the encryption is in a proprietary format that is not compatible with other password encryption formats such as "crypt" on UNIX-type systems.

Managing associations between various records

Note that uconfdriver is a lower-level utility that does not map any of the directives for you. When using the utility to script operations, be aware that many of the tables contain directives that refer to other records, or lists of records. These directives are: ct_ulist, ct_glist, ct_llist, ct_flist, and ct_rlist, which refer to the user, group, logfile, filter, and profile tables respectively. These lists are represented as exclamation point delimited lists of record numbers:
ct_flist="!13!36!56!"
where each entry represents the recnum value of a record and is surrounded with exclamation points.

Important! Be sure to keep the cross references intact. For example, a Filter record has a "ct_rlist" which lists all of the profiles that the filter applies to, and a Profile record has a "ct_flist" which lists all of the filters that apply to that profile.

Important! The uconfdriver utility uses these exclamation point delimited lists of record numbers, whereas the uconfexport and uconfimport utilities use comma-delimited lists of names. Be sure to use the appropriate list specification depending on which utility you are using.

Considerations

The uconfdriver utility is intended for advanced users who are comfortable with command line scripting, and as such it provides only minimal error/sanity checking. Exercise caution when using the utility, as improper usage could result in an unusable Urchin configuration. It is strongly recommended that the Urchin configuration be backed up using the uconfexport utility before making changes with uconfdriver.

See Also:

Script-based Configuration Management, in the Integration section of the Advanced Topics area of the Urchin documentation, provides a general overview of using a script-based, unattended/automated approach to managing Urchin's configuration.

Configuration Table and Directive List, in the Reference area of the Urchin documentation, has a complete list of all the valid tables and parameters that can be used with the uconfdriver utility.
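When a script maintains the exclamation point delimited lists itself, it helps to keep the string handling in one place. The following sketch works purely on list strings such as "!13!36!56!" and does not call uconfdriver. The function names list_has and list_add are hypothetical.

```shell
#!/bin/sh
# Helpers for "!"-delimited record-number lists such as "!13!36!56!".

# list_has LIST RECNUM: succeed if RECNUM is already a member of LIST.
list_has() {
    case $1 in
        *"!$2!"*) return 0 ;;
        *) return 1 ;;
    esac
}

# list_add LIST RECNUM: print LIST with RECNUM appended, avoiding
# duplicates. An empty LIST yields a one-entry list.
list_add() {
    if [ -z "$1" ]; then
        echo "!$2!"
    elif list_has "$1" "$2"; then
        echo "$1"
    else
        echo "$1$2!"
    fi
}

list_add '!13!36!' 56
```

To extend a list that already exists on a record, a script would read the current value with the get_parameter action, pass it through list_add, and write the result back with set_parameter, remembering to update the matching list on the record at the other end of the cross reference.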
uconfexport: Text-based Configuration Export Utility
Overview

The Text-based Configuration Export Utility, uconfexport, provides a command line interface for reading the Urchin configuration database and exporting it in a human-readable text format. The exported data is in an XML-type record format, which is directly compatible with the format expected by the uconfimport utility. Each record in the exported data corresponds to a configuration record in the Urchin configuration database.

The uconfexport utility is located in the util directory of the Urchin 5 distribution.

Usage

Usage of the utility is as follows:
uconfexport [-h]     (prints usage message and exits)
uconfexport [-v]     (prints version and exits)
uconfexport [-f file]
where:
-f   specifies the path to a file that uconfexport will write to
If no command line arguments are given, uconfexport will write to the standard output.

Output format for uconfexport

The utility exports configuration records in an XML-style format. Each record begins with a <Table Name="RecordName"> line and ends with a </Table> line. The configuration directives associated with the record are printed, one per line, between the record begin/end lines. For instance, a Profile record would look something like this:
<Profile Name="help.urchin.com">
ct_name=help.urchin.com
ct_website=http://help.urchin.com
ct_reportdomains=help.urchin.com,www.help.urchin.com
cs_llist="help.urchin.com daily log"
ct_defaultpage=index.html
cs_referrallevel=3
cs_timeoffset=localtime
cr_logtracking=on
cr_processpath=on
cs_pathlevel=3
cs_vmethod=0
cs_visitortimeout=1800
cr_sessionpageview=on
cs_ulist="webmaster"
ct_affiliation=(NONE)
cr_profiletype=Standard_Website
cs_reportset=Basic_All
</Profile>
The valid list of Table names is:

Global
Machine
Filter
Logfile
Profile
Task
Affiliation
Group
User
Considerations

The uconfexport utility cannot export a subset of the configuration data. If you wish to extract only certain records from the output of uconfexport, use an external script to parse the output, or use the uconfdriver utility to extract single records.

The output of uconfexport is constructed so that it can be directly imported into the Urchin configuration using the uconfimport utility. This allows you to use uconfexport to create a text-based backup of your Urchin configuration, which can then be easily restored using uconfimport.

See Also:

Script-based Configuration Management, in the Integration section of the Advanced Topics area of the Urchin documentation, provides a general overview of using a script-based, unattended/automated approach to managing Urchin's configuration.

Configuration Table and Directive List, in the Reference area of the Urchin documentation, has a complete list of all the valid tables and parameters that may appear in the output of the uconfexport utility.
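Since uconfexport always writes the whole configuration, selecting a subset is left to an external filter. One possible filter is sketched below. It assumes that the record begin and end lines appear exactly as shown in the output format above, starting in the first column. The function name extract_records is hypothetical.

```shell
#!/bin/sh
# Print only the records of table TABLE (for example Profile) from
# uconfexport output read on standard input. With a second argument,
# keep only the record whose Name matches that argument exactly.
extract_records() {
    awk -v table="$1" -v name="$2" '
        # Record begin line: <Table Name="RecordName">
        index($0, "<" table " Name=\"") == 1 {
            want = (name == "" || $0 == "<" table " Name=\"" name "\">")
        }
        want { print }
        # Record end line: </Table>
        $0 == "</" table ">" { want = 0 }
    '
}
```

For example, piping the output of uconfexport through extract_records Profile help.urchin.com would print just that Profile record, which can be saved to a file and edited.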
uconfimport: Text-based Configuration Import Utility
Overview

The Text-based Configuration Import Utility, uconfimport, provides a command line interface for importing text-based configuration records into the Urchin configuration database. The imported data is in an XML-type record format, which is directly compatible with the format exported by the uconfexport utility.

The uconfimport utility is located in the util directory of the Urchin 5 distribution.

Usage

Usage of the utility is as follows:
uconfimport [-h]     (prints usage message and exits)
uconfimport [-v]     (prints version and exits)
uconfimport [-o|-r] [-f file]
where:
-f   specifies the path to a file that uconfimport will read from
-o   overwrites existing records with the same name with the data being imported
-r   removes all existing configuration data and replaces it with the data being imported
If neither the "-o" nor the "-r" argument is specified, only new records will be written into the configuration database; no existing records will be modified or overwritten. If no file is specified, uconfimport will read from the standard input.

Input format for uconfimport
The utility imports configuration records in an XML-style format. Each record begins with a <Table Name="RecordName"> line and ends with a </Table> line. The configuration directives associated with the record should be listed, one per line, between the record begin/end lines. For instance, a Profile record would look something like this:
<Profile Name="help.urchin.com">
ct_name=help.urchin.com
ct_website=http://help.urchin.com
ct_reportdomains=help.urchin.com,www.help.urchin.com
cs_llist="help.urchin.com daily log"
ct_defaultpage=index.html
cs_referrallevel=3
cs_timeoffset=localtime
cr_logtracking=on
cr_processpath=on
cs_pathlevel=3
cs_vmethod=0
cs_visitortimeout=1800
cr_sessionpageview=on
cs_ulist="webmaster"
ct_affiliation=(NONE)
cr_profiletype=Standard_Website
cs_reportset=Basic_All
</Profile>
The valid list of Table names is:

Global
Machine
Filter
Logfile
Profile
Task
Affiliation
Group
User
Cross Linking of Records

Many records in the Urchin configuration contain directives that cross reference records in other tables. For instance, a Profile record will have a cross reference to the Logfile table for each specified Log Source; likewise, a record in the Logfile table will have a cross reference to the Profiles which use that Log Source. Directives of this type usually have a name like ct_Xlist, and consist of a string of comma-separated names of the records in other tables that the record references.

When using uconfimport, the utility will verify the input data and build the proper internal cross-reference links. For instance, uconfimport will properly cross-reference the following two records when it imports them. Notice the use of cs_llist in the Profile record, and cs_rlist in the Logfile record.
<Profile Name="download.urchin.com">
ct_name=download.urchin.com
ct_website=http://download.urchin.com
ct_reportdomains=download.urchin.com,www.download.urchin.com
cs_llist="download.urchin.com daily log"
cs_timeoffset=localtime
cr_logtracking=on
cr_processpath=on
cs_pathlevel=3
cs_vmethod=0
cs_visitortimeout=1800
cr_sessionpageview=on
cr_profiletype=Standard_Website
cs_reportset=Basic_All
</Profile>
<Logfile Name="download.urchin.com daily log">
ct_name=download.urchin.com daily log
ct_loglocation=/bigdisk/rawlogs/download.urchin.com/access.log
cr_type=local
cs_rlist="download.urchin.com"
</Logfile>
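The relationship between cs_llist and cs_rlist in the two records above can be sketched in Python as a consistency check that a script might run on its own input before importing. This is only an illustration: uconfimport builds the internal links itself, and the function names and the dictionary layout (record name mapped to a dictionary of directives) are hypothetical.

```python
def split_list(value):
    """Split a comma-separated cross-reference directive into record names."""
    return [item.strip() for item in value.strip('"').split(",") if item.strip()]


def missing_backlinks(profiles, logfiles):
    """Return (profile, logfile) pairs where the Logfile record is absent
    or does not name the Profile in its cs_rlist directive."""
    problems = []
    for profile_name, directives in profiles.items():
        for logfile_name in split_list(directives.get("cs_llist", "")):
            logfile = logfiles.get(logfile_name, {})
            if profile_name not in split_list(logfile.get("cs_rlist", "")):
                problems.append((profile_name, logfile_name))
    return problems
```

An empty result means every Log Source named by a Profile refers back to that Profile.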
Record Ordering Requirement with "-r" argument
When using uconfimport with the "-r" argument, the first four records in the imported configuration must be as follows:
<Global Name="Access Settings"> ... </Global>
<Machine Name="Process Settings"> ... </Machine>
<Affiliation Name="(NONE)"> ... </Affiliation>
<User Name="(admin)"> ... </User>
If uconfimport does not find the first four records to be of the type and order specified above, the utility will print a warning diagnostic and exit without performing the import.
Saving and Restoring the Urchin Configuration
Using the capabilities of the uconfexport and uconfimport utilities, it is a simple process to save the Urchin configuration and later restore it to a known good state. You merely need to save the output of the uconfexport utility in a file, and then reimport it at some later date with uconfimport. This allows you to recover easily from corruption caused by a server crash or other catastrophic event. For example, consider the following scenario:
As a standard operational policy, you save your Urchin configuration regularly with the command:
uconfexport -f /path/to/backupdir/urchin_saved_config.txt
Time goes by, bad stuff happens and the server disk crashes, requiring a complete reinstall of Urchin. You restore your Urchin configuration with the command:
uconfimport -r -f /path/to/backupdir/urchin_saved_config.txt
In addition, the "-r" argument of uconfimport can be used to clean out old unreferenced records in the Urchin configuration database. The following procedure clears out all unused records in the database, thereby compacting it:
uconfexport -f /path/to/tempdir/urchinconfig.txt
uconfimport -r -f /path/to/tempdir/urchinconfig.txt
Upon importing, uconfimport will also do a certain amount of sanity checking to ensure the critical records are in the proper order and that the proper record cross-references are in place.
Considerations
When a ct_password directive for a Logfile or User record is imported, the supplied password must be in cleartext format or already encrypted in Urchin's proprietary encryption format. The utility will automatically encrypt cleartext passwords as it stores them in the configuration database. To allow Urchin configuration data to be moved among different operating system platforms, Urchin uses an internal encryption format that is not compatible with standard password formats such as those generated by crypt on UNIX-type systems.
The "-r" argument should be used with caution. When using this mode, you must ensure that the input data is both comprehensive and in the correct format, since the Urchin configuration is being completely overwritten by the data supplied to uconfimport. It is safest to use "-r" importing functionality only with configuration data created by the uconfexport utility.
See Also:
Script-based Configuration Management in the Integration section of the Advanced Topics area of the Urchin documentation provides a general overview of using a script-based, unattended/automated approach to managing Urchin's configuration.
Configuration Table and Directive List in the Reference area of the Urchin documentation. This document has a complete list of all the valid tables and directives that may be used with the uconfimport utility.
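Because a "-r" import overwrites the whole configuration, a script may want to confirm the record ordering requirement described earlier before calling uconfimport. The Python sketch below shows one way to do that. It is an illustration only; the function names are hypothetical, and it looks only at record begin lines rather than validating whole records.

```python
import re

# The four records that must come first for a "-r" import, in order.
REQUIRED_FIRST = [
    ("Global", "Access Settings"),
    ("Machine", "Process Settings"),
    ("Affiliation", "(NONE)"),
    ("User", "(admin)"),
]

# Matches record begin lines such as <Global Name="Access Settings">
_OPEN = re.compile(r'<(\w+)\s+Name="([^"]*)">')


def leading_records(text, count=4):
    """Return (table, name) for the first `count` record begin lines."""
    return [match.groups() for match in _OPEN.finditer(text)][:count]


def ordered_for_replace(text):
    """True if the first four records match the required types and order."""
    return leading_records(text) == REQUIRED_FIRST
```

A backup script could refuse to run the restore step when ordered_for_replace returns False for the saved file.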
uconfschedule: Global Scheduling Utility
Overview
The Global Scheduler utility, uconfschedule, provides a means of scheduling Urchin 5 tasks to run immediately or at some regularly scheduled interval without having to make individual scheduling changes to each Urchin Profile. The types of scheduling operations that uconfschedule can perform are:
- schedule all profiles to run immediately
- schedule all profiles to run at a particular time and/or frequency
- disable scheduling for all profiles
The Global Scheduling Utility is one of the Urchin 5 utilities that is typically used in an automated/scripted environment. For a synopsis of the entire suite of scripting utilities and an in-depth description of their usage, please see the Script-based Configuration Management document in the Integration section of the Advanced Topics area on http://help.urchin.com.
Usage
uconfschedule is located in the util directory of the Urchin 5 distribution. Usage of the utility is as follows:
uconfschedule [-v]        (prints version and exits)
uconfschedule [-k] [-r]
In the default mode, uconfschedule will interactively prompt for the type of global scheduling operation to apply, and any other parameters necessary (such as the time of day, date, etc.). If the "-k" option is supplied, the scheduling is done without modifying the existing schedule for the profiles. If the "-r" option is specified, the utility will simply schedule all profiles to run immediately, with no further input required. Running "uconfschedule -k -r" is a useful way of scheduling all profiles to run immediately without changing the normally scheduled run times for the Profiles.
Considerations
The uconfschedule utility does not actually run jobs itself. It merely applies a global scheduling change to all profiles, which in turn is picked up by the Urchin Scheduler (urchind). Therefore, the Urchin Scheduler must be running in order for events scheduled by uconfschedule to be invoked, including RunNow events.
udbsanitizer: Database Maintenance Utility
Overview
The Urchin Database Maintenance Utility, udbsanitizer, provides a means of checking the Urchin 5 profile databases and performing various maintenance operations on these databases. The types of operations that udbsanitizer can perform are:
- Check the integrity of Urchin monthly profile databases
- Rebuild Urchin monthly profile database headers and indexes
- Roll back databases to a previous saved backup state
- Delete profile data for a day, multiple days, or an entire month
Usage
udbsanitizer is located in the util directory of the Urchin 5 distribution. Usage of the utility is as follows:
udbsanitizer [-h]        (prints usage message and exits)
udbsanitizer [-v]        (prints version and exits)
udbsanitizer -p profile [-d YYYYMM[DD]] [-bfhiprqx] [-z [-e DD]]
where:
-b    go directly to rollback option
-d    specifies year-month and optionally the day to operate on
-e    with -z and -d options, zero multiple days (range d->e) in same month
-f    force action to occur without confirmation
-h    print this help information
-i    go directly to rebuild-index option
-p    specifies name of profile (required)
-r    go directly to remove option
-q    quiet mode, suppress output except for critical user confirmation
-x    go directly to rebuild-header option
-z    go directly to zero-day option
Note: When udbsanitizer is called with options that do not completely describe what action to take, it will prompt the user as needed for additional input. You can cause an action to be performed without any user interaction by using the "-d" option in conjunction with any of the -b, -i, -r, -x, or -z options.
Operation
In normal operation, udbsanitizer is invoked from a command shell and interactively prompts the user for the actions to take. For each month of Urchin reporting data that the utility finds, it will present the following interactive menu:
Options:
1. Rollback data to state before last run
2. Delete this month entirely
3. Rebuild header to match data
4. Rebuild indexes
5. Zero out one or more days
Please choose 1-5 or press return to do nothing:
If no action for the currently selected month is desired, pressing the Enter/Return key will cause the utility to move forward to the next chronological month where data is present and present the same menu choices. Actions associated with the options presented above are:
1. Data rollback
The utility will revert all reporting data for a profile to that contained in a ZIP archive. The user is presented with a list of ZIP archive backups to choose from. The ZIP archives are named with the following convention "YYYYMM-backup-YYYYMMDDHHMMSS.zip", where the first YYYYMM refers to the month of data being backed up (e.g. 200309 refers to September 2003), and the YYYYMMDDHHMMSS portion is the timestamp of when the ZIP archive was created. This timestamp should be helpful in determining which ZIP archive you want to roll back to. Please note that there is no way to invoke udbsanitizer to do a rollback based solely on command line arguments; it will always prompt for the ZIP archive to roll back to if any exist. If no ZIP backup archives exist, the utility prints a diagnostic to that effect and exits.
2. Delete monthly data
All data for a particular profile for the specified month is removed. This option is useful for zeroing out the statistics for a month if the data is incorrect, e.g. the wrong filters were applied or the wrong logs were processed; or if some of the advanced profile parameters, such as the click path depth or referral level, were changed and that month's Urchin reporting data should be updated to reflect the change. This action can be performed without user interaction by invoking udbsanitizer with the "-f", "-r" and "-d" arguments, e.g.
udbsanitizer -f -r -d 200309 -p mysite.com
3. Rebuild database headers
This causes the utility to read the Urchin database tables directly and rebuild the database headers based on the data found. This should only be done if udbsanitizer finds a discrepancy between the headers and the data. WARNING: if the database headers do not match the data, this is typically indicative of some type of database corruption; in this case, the prudent course of action is to completely remove the data for that month and reprocess the corresponding webserver logs. This may not be possible for various reasons, so rebuilding the headers may be the only way to resuscitate the databases so that the Urchin log processing and reporting engines are able to work with them, but this is not guaranteed to fix corruption. This action can be performed without user interaction by invoking udbsanitizer with the "-f", "-x" and "-d" arguments, e.g.
udbsanitizer -f -x -d 200309 -p mysite.com
4. Rebuild database indexes
This causes the utility to read the Urchin database tables directly and rebuild the database indexes based on the data found. This should only be done if udbsanitizer finds a discrepancy between the headers and the data. NOTE: the same warning given about corruption in the database headers applies to this option as well. This action can be performed without user interaction by invoking udbsanitizer with the "-f", "-i" and "-d" arguments, e.g.
udbsanitizer -f -i -d 200309 -p mysite.com
5. Zero data for one or more days
This option allows data for selected days within the month to be zeroed out, thereby allowing Urchin log processing to be rerun for those days only (e.g. urchin -p profile -d YYYYMMDD). This action can be performed without user interaction to zero out a single day by invoking udbsanitizer with the "-f", "-z" and "-d" arguments, e.g.
udbsanitizer -f -z -d 20030907 -p mysite.com
and for multiple days by including the "-e" argument as well to specify an end date, e.g.
udbsanitizer -f -z -d 20030907 -e 10 -p mysite.com
which will zero out data for September 7th through the 10th. This is more efficient than invoking multiple instances of udbsanitizer to zero out a single day at a time, as the database indexes and headers are checked only once. The index/header checking operation can require a noticeable amount of time on profiles with a lot of data.
Considerations
Invoking udbsanitizer without specific dates on profiles with a lot of historical data can be time consuming, as the utility must open up the databases for each month, perform sanity checks, and then present the menu of actions.
Actions that delete daily or monthly data cannot be undone! The only recourse is to reprocess the webserver logs for that time period to repopulate the profile databases. Use these options with care.
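Since the zero-day action cannot be undone, a script that drives udbsanitizer may want to validate its dates before building the command. The Python sketch below assembles the argument list for the non-interactive zero-day invocation described above. The function name is hypothetical, the hyphenated option spelling is assumed from standard UNIX convention, and the sketch only builds the argument list; it does not run the utility.

```python
import calendar


def zero_days_command(profile, first_day, last_day=None):
    """Build the udbsanitizer argument list for zeroing one or more days.

    first_day is a YYYYMMDD string; last_day, if given, is a day of the
    same month (an integer) marking the end of the range.
    """
    if len(first_day) != 8 or not first_day.isdigit():
        raise ValueError("first_day must be in YYYYMMDD format")
    year, month, start = int(first_day[:4]), int(first_day[4:6]), int(first_day[6:])
    days_in_month = calendar.monthrange(year, month)[1]
    if not 1 <= start <= days_in_month:
        raise ValueError("first_day is not a valid day of that month")
    argv = ["udbsanitizer", "-f", "-z", "-d", first_day]
    if last_day is not None:
        # The end day must fall in the same month, on or after the start day.
        if not start <= last_day <= days_in_month:
            raise ValueError("last_day must be in the same month, on or after first_day")
        argv += ["-e", f"{last_day:02d}"]
    argv += ["-p", profile]
    return argv
```

The returned list can be passed to a process launcher such as subprocess.run once the caller is satisfied that the dates are the intended ones.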
urchinctl: Urchin Services Control Utility
Overview
The Urchin Services Control utility, urchinctl, provides a means of starting and stopping the Urchin Scheduler and Urchin Webserver services. On UNIX-type systems, urchinctl is typically called from one of the system's boot-time scripts to automatically start up or shut down Urchin services. The types of operations that urchinctl can perform are:
- Start, stop, or restart the scheduler or webserver (or both)
- Start the webserver on an alternate port
- Start the webserver with SSL encryption
Usage
urchinctl is located in the bin directory of the Urchin distribution. Usage of the utility is as follows:
urchinctl [-h]        (prints usage message and exits)
urchinctl [-v]        (prints version and exits)
urchinctl [-e] [-p port] [-s | -w] action
where:
-e    activates encryption (SSL) in the webserver
-p    specifies the port for the webserver to listen on
-s    performs the action on the Urchin scheduler ONLY
-w    performs the action on the Urchin webserver ONLY
and action is one of:

start      (starts the service(s))
stop       (stops the service(s))
restart    (stops and then starts the service(s))
status     (displays webserver/scheduler runtime status)
By default, the action is performed on both the webserver and the scheduler unless the "-s" or "-w" command line arguments are specified. Note that these arguments are mutually exclusive.
Considerations
On UNIX-type systems, urchinctl should be run as the user/UID that Urchin is installed as to ensure that the urchinwebd and urchind processes are started as that UID.
Starting up the Urchin webserver with SSL encryption initially requires additional configuration steps. Please see the document titled Activating SSL on the Urchin Webserver in the Security Features section of the Advanced Topics area of the Urchin Documentation Library.
urchin: Urchin Log Processing Engine
Overview
The Urchin Log Processing Engine, urchin, is the core log processing component of Urchin. Ordinarily, the log processing engine is invoked from the Urchin Scheduler (urchind) when a task is run. However, it is possible to execute urchin directly from a command shell to run a specific profile. This is useful in highly scripted environments where Urchin tasks are run from an external source such as the Windows Task Scheduler or cron on UNIX-type systems. It is also useful for running a profile under special circumstances, such as to process only hits for a particular day, or to do some type of debugging. urchin is not truly a utility; it is documented here because it possesses some limited command-line capabilities that may prove useful in certain environments.
Usage
urchin is located in the bin directory of the Urchin distribution. Usage of the utility is as follows:
urchin [-h]        (prints usage message and exits)
urchin [-v]        (prints version and exits)
urchin [-DHt] [-d YYYYMMDD] -p profile
where:
-D    runs urchin in debug mode
-H    causes the runtime output of urchin to be logged to the standard history directory for the profile
-t    runs urchin in test mode only to output runtime parameters
-d    specifies that Urchin should only process hits from logs that match the specified date (YYYYMMDD format)
-p    specifies the profile to run
Considerations
On UNIX-type systems, urchin should be run as the user/UID that Urchin is installed as to ensure that the databases for the profile are owned by that UID, since urchin will create them if they do not already exist.
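When urchin is driven from cron or another external scheduler, a common need is to reprocess a span of days one day at a time using the date argument described above. The Python sketch below generates one argument list per day. The function name is hypothetical, the hyphenated option spelling is assumed from standard UNIX convention, and the sketch does not run urchin itself.

```python
from datetime import date, timedelta


def daily_urchin_commands(profile, start, end):
    """Return one urchin argument list per day from start to end inclusive."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    commands = []
    day = start
    while day <= end:
        # The date argument uses the YYYYMMDD format.
        commands.append(["urchin", "-d", day.strftime("%Y%m%d"), "-p", profile])
        day += timedelta(days=1)
    return commands
```

Each list would then be executed in order, as the user Urchin is installed as, by whatever process launcher the script uses.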
Integration
NFS locking requirement
Urchin configuration, which uses data files located in the 'data/conf' folder of the installation, sets read and write locks during setup and administration. If the data/conf folder is mounted over NFS, the NFS server must be running the appropriate locking daemons to handle remote locking. If the locking does not work properly, the Urchin installation may hang indefinitely.
Technical Notes: Urchin uses the fcntl() function on UNIX to perform advisory and exclusive (read/write) locking. Some older systems may not support all of the RPC locking requests of different platforms. Be sure you are running recent versions of your OS with the latest patches. Make sure that the NFS server is running 'statd' and 'lockd' or equivalent.
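One way to check whether a directory supports fcntl()-style locking before installing Urchin on it is to request a lock on a scratch file there. The Python sketch below does this with the standard fcntl module on UNIX-type systems. It is a rough probe, not a reproduction of Urchin's own locking code, and the function name and scratch file prefix are hypothetical. On a mount where the lock daemons are not responding, the lock request itself may block, so it is best run with an external timeout.

```python
import fcntl
import os
import tempfile


def can_lock(directory):
    """Try to take and release an exclusive advisory lock in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest-")
    try:
        try:
            # lockf() issues an fcntl()-based lock request; LOCK_NB asks for
            # an immediate error instead of waiting if the lock is held.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, whether or not locking worked.
        os.close(fd)
        os.unlink(path)
```

Calling can_lock on the intended data/conf location (or its parent mount) gives a quick indication of whether remote locking is functional.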
Overview of Urchin Integration Capabilities
Introduction
The Urchin 5 front end is built on modular components and industry standard techniques to allow administrators to integrate Urchin report access into their existing or planned infrastructures. As part of our commitment to hosting providers and data centers, we test and support three integration points and six different functions which should allow Urchin to integrate into most architectures. The following diagram illustrates the primary components of the Urchin front end and the three integration points. Both administrative and reporting functions are web-based and delivered from this system.
All content from the Urchin system is delivered via an Apache server that is installed as "urchinwebd", shown on the left side of the diagram. Requests are handled by the Apache server and passed using the CGI interface to a Session Controller application (session.cgi). The Session Controller performs authentication and, depending on the action, passes the request on to either the Admin Engine (admin) or the Urchin Report Engine (urchin.cgi). Content with embedded session identifiers is passed back to the user. The three points labeled "A", "B", and "C" are the three integration points mentioned previously. Point "A" allows the administrator to either replace or bypass the Urchin web server. Point "B" allows for external or no authentication. And point "C" allows for direct access to the reporting via a wrapper or portal.
Point A: Web Server Integration
Many hosting companies will already have a running web server or have a specially compiled web server that runs within their systems. As long as a web server is providing basic CGI operations, you may replace the Apache binary or use an existing web server to provide access to Urchin. This integration point keeps the rest of the Urchin system intact. The Urchin interface will be used for administration, authentication, and report delivery. Users and Profiles will need to be configured within the Urchin system so that when a user enters the system, Urchin will know which reports to allow them access to. For complete details on the requirements of a replacement binary or an example of using an existing web server, please see the appropriate document for the type of webserver you are running under the Integration section of the Advanced Topics area of the Urchin Documentation Center at http://help.urchin.com.
Point B: External Authentication or Authentication Bypass
Complex hosting environments will often have existing centralized controls for providing user authentication such as LDAP.
With a simple configuration change, the Session Controller can call an authentication routine of your choice. A simple interface is provided for returning successful or failed logins. Existing authentication routines can be easily wrapped to provide the correct framework for both systems. Using integration at point "B", it is also possible to bypass the authentication with a dummy routine and link directly to the landing page.
By using integration at point "B", the overall Urchin system remains intact with the exception that authentication is performed by an external application. Users and Profiles will need to be configured within the Urchin system so that when a user enters the system, Urchin will know which reports to allow them access to. For complete details on the interface for using external authentication and an example of how to bypass authentication, please see the associated Integration document entitled Using External Authentication or Authentication Bypass.
Point C: Link Directly to Reports from Wrapper or Portal
Many of our customers have a one-to-one relation between a customer and a website, and wish to link to reporting for a particular website directly from the website's administration area. This area is typically already authenticated. Point C integration makes it easy to link directly to reporting either via a wrapper script or from an existing portal. In this scenario, the Urchin authentication and initial report selection screen are bypassed as users are taken directly to the Urchin reports. By using integration at point "C", it is assumed that the service provider has control over who gets to view what, either by a one-to-one relation or by an existing configuration database. Urchin will still need to be configured for generating Profile reports, but this can be automated within or outside of the administrative interface. Urchin provides the capability to link directly to an Urchin report with a specific URL when "Direct Report Linking" is enabled in the administration interface. Access to the Urchin report is controlled via a ".report.conf" file embedded in the directory specified by this URL. For a complex portal integration, the Urchin reporting engine will propagate session and other portal variables in order to keep the session operating.
For details on how to use a wrapper or portal to access Urchin reporting directly, and on proper configuration of the ".report.conf" file, please see the Integration document entitled Linking Directly to Urchin Reports.
Changing the Location of the Urchin Data Directory
Overview
For simplicity, the default Urchin 5 installation is contained in a single umbrella directory that contains all Urchin applications and utilities, library files, documentation, etc. All data processed by Urchin is also stored in this installation directory under the data subdirectory. In many cases, it is desirable to configure Urchin to store its data on another disk partition or drive that is better suited to dynamic data and allows for greater storage capacity. Urchin can easily be configured to do this.
Procedure
The location where Urchin stores its report data can be changed in the urchin.conf file, which is located in the etc directory of the main Urchin installation directory.
For UNIX-type systems:
1. Open a command shell as the user that Urchin was installed as
2. cd to the directory where Urchin is installed
3. Stop the Urchin services with the command:
   bin/urchinctl stop
4. Using a text editor, open the urchin.conf file in the etc directory of the Urchin distribution
5. Uncomment the following line by removing the leading '#' character:
   #dataDirectory: ./data/
   and substitute the full directory path you want to use instead of the default directory, e.g.
   dataDirectory: /bigdisk/urchin/data/
6. If necessary, create the new data directory. Important note: this directory must be readable and writable by the user Urchin runs as.
7. Copy the existing data to the new location. To ensure that all permissions and data are properly preserved, it is recommended that you use the following command:
   cd data; tar cf - . | (cd /bigdisk/urchin/data; tar xpf -)
8. Rename the data directory to data.old. You can remove it completely if you wish, but you may want to ensure that everything is working properly before doing this.
9. For ease of administration, you may want to create a symlink in the main Urchin directory pointing at this new location, e.g.
   ln -s /bigdisk/urchin/data ./data
10. Restart the Urchin services with the command:
   bin/urchinctl start
11. Log in to Urchin as the admin user. You should be presented with the License Urchin screen. Simply click on the Reactivate License link and you are finished.
For Windows systems:
1. Stop the Urchin Services: Start>Programs>Urchin>Disable Urchin Services
2. Open Windows Explorer and navigate to the etc folder of the Urchin distribution. By default, this is C:\Program Files\Urchin\etc
3. Using a text editor, open the urchin.conf file
4. Uncomment the following line by removing the leading '#' character:
   #dataDirectory: ./data/
   and substitute the drive letter and full pathname you want to use instead of the default folder, e.g.
   dataDirectory: E:\Urchin\data
5. If necessary, create the new data folder.
Important note: the permissions on this folder must allow read and write access to the Urchin service.
6. Copy the contents of the data folder to the new folder.
7. Rename the data folder to data.old. You can remove it completely if you wish, but you may want to ensure that everything is working properly before doing this.
8. Restart the Urchin Services: Start>Programs>Urchin>Enable Urchin Services
Considerations
Due to the way licensing is implemented on UNIX-type systems to prevent tampering, moving Urchin's data directory will require the Urchin license to be reactivated. However, the relicense operation is extremely simple: you merely need to log in as the Urchin admin user and click Reactivate License.
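For sites that script their installations, the urchin.conf edit in the procedure above (uncommenting the dataDirectory line and substituting a new path) can be sketched in Python as follows. This is an illustration that works on the file's text only; the function name is hypothetical, and stopping the services, copying the data, and reactivating the license remain separate steps.

```python
def set_data_directory(conf_text, new_path):
    """Return conf_text with the dataDirectory line uncommented and set."""
    out = []
    replaced = False
    for line in conf_text.splitlines():
        # Ignore a leading '#' and whitespace when looking for the directive.
        stripped = line.lstrip("# \t")
        if not replaced and stripped.startswith("dataDirectory:"):
            out.append(f"dataDirectory: {new_path}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # Refuse to guess where the directive belongs if it is absent.
        raise ValueError("no dataDirectory line found in configuration text")
    return "\n".join(out) + "\n"
```

A script would read etc/urchin.conf, pass its contents through this function, and write the result back while the Urchin services are stopped.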
Using an Existing Apache Webserver (UNIX-type Platforms)
By default, Urchin 5 administration and reporting are done using a standalone Apache server that is bundled with the Urchin product. In the vast majority of Urchin installations, this is the preferred method for delivering Urchin admin and reporting interfaces. However, in rare instances it may be necessary to utilize an existing Apache installation. This may be due to site requirements that a localized version of Apache be used throughout the organization, or that all web services be controlled via a single Apache configuration. The information below describes two different models that can be employed to meet these requirements.
DISCLAIMER: These modifications to the Urchin installation are unsupported and fall outside the scope of the standard Urchin free and paid support plans. Any assistance rendered to set up or debug these configurations will be done at Urchin Software Corporation's standard Hourly Support rate.
Option 1: Utilizing an existing site-specific Apache httpd binary to run Urchin services as a separate instance
Side effects: Urchin upgrades that depend on features added and/or configuration changes to the bundled Apache may not work properly if the existing Apache binary doesn't support these changes/features.
Configuration changes
Ensure that your httpd includes support for the following modules:
mod_access
mod_cgi
mod_dir
mod_mime
Install Urchin 5 in the normal fashion, choosing the desired port for Urchin's admin and reporting interfaces to run on. Once Urchin is installed, do the following:
cd /path/to/urchin/bin
./urchinctl stop
mv urchinwebd urchinwebd.orig
ln -s /path/to/your/httpd urchinwebd
./urchinctl start
This will start up a separate instance of Apache that uses your Apache binary, but runs independently from your standard web services.
Option 2: Running Urchin services from an existing Apache configuration
Side effects:
- The entire Urchin distribution must be owned by the same UID that httpd runs as.
- The admin GUI cannot be used to change the port Urchin runs on.
- Use the bin/urchinctl command with the -s argument exclusively to start/stop only the Urchin scheduler (urchind).
Configuration changes
Add the following lines to your existing httpd.conf file. You will need to supply the IP address for your server, the port number for Urchin to use, as well as the path to the location where you've installed Urchin 5.
## Support for Urchin administration and reporting services
Listen [port#]
<VirtualHost [serverip]:[port#]>
    ErrorLog /path/to/urchindistribution/var/error.log
    DocumentRoot /path/to/urchindistribution/htdocs/
    <Directory /path/to/urchindistribution/htdocs/>
        AddHandler cgi-script .cgi
        Options ExecCGI
        DirectoryIndex session.cgi
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
Once these configuration changes have been made, perform the following tasks:
Change the ownership of the Urchin distribution to the UID that your Apache webserver runs as:
chown -R apacheuser /path/to/urchin/bin
Ensure that the Apache bundled with Urchin is stopped:
cd /path/to/urchin/bin
./urchinctl -w stop
Restart your Apache server to enable Urchin reporting and administration.
Edit your Urchin boot-time startup script(s) and replace any instances of
urchinctl start
with
urchinctl -s start
This will cause only the Urchin scheduler to be run at boot time, rather than both the scheduler and Apache server.
Using an Existing IIS Webserver (Windows Platforms)
By default, Urchin 5 administration and reporting are done using a standalone Apache server that is bundled with the Urchin product. In the vast majority of Urchin installations, this is the preferred method for delivering Urchin admin and reporting interfaces. However, in rare instances it may be necessary to utilize an existing IIS webserver. This may be due to site requirements that disallow the use of a third party webserver product on the server, or the need to set up Urchin reporting as a virtual host on an existing IIS server.
DISCLAIMER: These modifications to the Urchin installation are unsupported and fall outside the scope of the standard Urchin free and paid support plans. Any assistance rendered to set up or debug these configurations will be done at Urchin Software Corporation's standard Hourly Support rate.
Procedure
Note: this procedure assumes that Urchin has been installed in the default location of C:\Program Files\Urchin. If you have installed Urchin elsewhere, please be sure to substitute the proper location in the example below.
Step 1: Create a new user for the Urchin web interface
1. Go to "Administrative Tools" > "Computer Management"
2. On the left hand side of the Computer Management screen, click "Local Users and Groups"
3. Right-click on the Users folder and select "New User..."
4. Enter "IUSR_URCHIN" in the "User name:" field
5. Uncheck the "User must change password at next logon" box
6. Check the "User cannot change password" box
7. Click "Create" and then "Close"
8. Double-click on the "Users" folder on the left
9. Right-click on IUSR_URCHIN on the right and select "Properties..."
10. Under the "Member of" tab, remove all existing entries, and then click the "Add..." button and choose "Guests" in the popup window, and click "Add" again, then click "OK"
11. Click "Apply" and "Close" to save your changes
Step 2: Install Urchin (if not already done)
Step 3: Disable the Urchin Apache web server
1. Go to "Administrative Tools" > "Services"
2. Under "Services" find the "Urchin Webserver" record
3. Right-click on "Urchin Webserver" and select "Stop"
4. Right-click on "Urchin Webserver" and select "Properties", then change the "Startup type" to "Disabled"
5. Click "OK"
Step 4: Add a new web site to IIS
1. Go to "Administrative Tools" > "Internet Services Manager"
2. Right-click on your server's name and select "New" and then "Web Site"
3. In the "Description:" field type "Urchin" and click Next
4. Select the IP address and port number (typically 9999) and click Next
5. In the "Path:" field browse to the location where Urchin is installed (typically C:\Program Files\Urchin\htdocs) and click Next
6. Add a check mark in "Execute:" and click Next and then Finish
7. Right-click on the new Urchin web site and go to "Properties"
   a. Under the "Web Site" tab, uncheck "Enable Logging"
   b. Click on the "Home Directory" tab and check "Script source access"
   c. Click on the "Documents" tab and Remove both Default entries, then click Add and enter "session.cgi" in the popup window, then click OK.
   d. Click on the "Directory Security" tab, and then click Edit in the "Anonymous access and authentication control" area. Ensure that the "Anonymous access" box is checked, then click Edit... to change the "Account used for anonymous access". In the popup window, select "IUSR_URCHIN" for the Username. Click OK and then OK again to get back to the Properties window.
   e. Click OK to save your changes and exit the Properties window
Step 5: Set up directory permissions
1. Right-click on "Start" and select "Explore"
2. Navigate to the location where Urchin is installed (typically C:\Program Files\Urchin)
3. Right-click on the "Urchin" folder and select "Properties"
4. Click on the "Security" tab
5. Uncheck "Allow inheritable permissions from parent to propagate to this object" and then click "Remove" in the popup window.
6. Click "Add" and select the Administrator user and then click "Add"
7. Click "Add" and select IUSR_URCHIN and then click "Add"
8. In the "Name:" field ensure that only the Administrator and IUSR_URCHIN entries are there
9. Ensure that only the following permissions are allowed for both the Administrator and IUSR_URCHIN users: Read & Execute, List Folder Contents, Read
10. Click OK to save the permissions.
11. Click into the "Urchin" folder in the Windows Explorer window.
12. Right-click on the "data" folder and select "Properties"
13. Click on the "Security" tab.
14. Uncheck "Allow inheritable permissions from parent to propagate to this object" and then click "Remove" in the popup window.
15. Click "Add" and select the Administrator user and then click "Add"
16. Click "Add" and select IUSR_URCHIN and then click "Add"
17. In the "Name:" field ensure that only the Administrator and IUSR_URCHIN entries are there
18. Ensure that the following permissions are allowed for both the Administrator and IUSR_URCHIN users: Full Control, Modify, Read & Execute, List Folder Contents, Read, Write
19. Click "OK"
Step 6: IIS 6 only
1. Go to the "Web Service Extensions Manager"
2. Click on "Add new web service extension.."
3. Enter "Urchin CGI" in the "Extension Name:" field
4. Click "Add"
5. Browse to and highlight report.cgi and session.cgi. Default location: C:\Program Files\Urchin\htdocs
6. Check "Set Extension Status to Allow"
7. Click "OK"
8. Go to the main IIS entry for your server: "Hostname"
9. Right-click and select "Properties"
10. Click on "Mime Types"
11. Click "New"
12. Enter ".cgi" in the "Extension:" field.
13. Enter "application/octet-stream" in the "Mime Type" field.
14. Click "OK"
15. Click "OK"
Your IIS webserver should now be set up to serve the Urchin web interface when you connect to it using the URL http://my.server.com:port, where port is what you set Urchin to use when you installed it (default is 9999).
Using External Authentication or Authentication Bypass
Overview
By default, Urchin authentication is performed when the Urchin Session Controller (session.cgi) calls the auth binary located in the bin directory of the Urchin installation. This binary queries the configuration database and compares the username and password provided with those stored in the configuration. An exit code signifying either success or failure is returned to the Session Controller. The location of the authentication binary can be controlled with a configuration change. This modular design allows administrators to call an external authentication program instead of the default auth binary.
As shown in the above diagram, this external authentication program could perform any desired authentication function, including LDAP and other database calls. As long as the program is executable by the Urchin user and conforms to the input/output requirements, Urchin can easily be configured to use a different form of authentication.
Specifying the Authentication Routine
To configure which authentication routine the Session Controller calls, edit the etc/session.conf file located in the Urchin installation. This file contains configurable parameters that control the behavior of the Session Controller, including which routine to call for authentication. Edit the line:
AUTHENTICATION: ../bin/auth
Replace ../bin/auth with the path to your authentication routine. Be sure that the authentication routine is executable by the same user that urchinwebd (Urchin's Apache web server) runs as.
Input/Output Requirements
When the Session Controller calls the authentication routine, it passes the username, password, and remote IP address of the user as command line arguments:
argv[1] = username
argv[2] = password
argv[3] = remote_addr
The external authentication routine may ignore any or all of these parameters, but typical routines will look at least at the first two. After performing the desired authentication, the routine should exit with a code of zero for success and minus one for failure.
Exit Code
0 = successful authentication
-1 = authentication failed
The above authentication interface allows administrators to easily customize their own routines for validating user logins.
Bypassing Authentication
Using the above techniques, the Urchin authentication can be purposefully bypassed.
If a hosting provider wants to use the entire Urchin System for controlling users and groups but has already authenticated the user by the time the user arrives at Urchin, bypassing the authentication avoids a double login. As long as the host can guarantee that access to the Urchin System is controlled from an authenticating portal and that the username cannot be tampered with, the host can bypass authentication using the following technique.
To bypass the authentication, create a dummy external authentication routine that always exits with a zero. For example, Perl code might look like:
#!/usr/bin/perl
exit(0);
Point the Session Controller at this dummy authentication routine by editing the etc/session.conf file as described above. Next, simply provide a link that looks like:
http://hostname:9999/session.cgi?action=login
Modify the above link to point to your actual hostname and port, and modify the user to point to the desired username or variable. The dummy authentication routine will automatically approve this login. Please use this method with care to avoid security problems.
Note for Windows Users
To provide similar functionality in Windows environments where Perl is not installed, a simple noauth.exe binary is available from the Helper Scripts area of the Urchin Support web site. This binary is merely a "no-op": it simply returns a successful status when called. Be sure you understand the security implications of this before implementing this solution.
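The input/output contract described under Input/Output Requirements can be sketched as a small external routine. This is a minimal illustration only, written in Python as an assumption (any executable that honors the argument order and exit codes should do). The USERS table and the function name are hypothetical; a real routine would query LDAP or another store.

```python
"""Sketch of an external authentication routine for the Session Controller."""
import hmac

# Hypothetical credential store, for illustration only. A real routine
# would query LDAP, a database, or another service instead.
USERS = {"johng": "s3cret"}

AUTH_OK = 0       # exit code for successful authentication
AUTH_FAILED = -1  # "minus one" per the manual; a POSIX shell reports it as 255


def authenticate(argv):
    """Return AUTH_OK or AUTH_FAILED for the given command-line arguments."""
    if len(argv) < 4:
        return AUTH_FAILED  # username, password and remote_addr are expected
    username, password = argv[1], argv[2]
    # argv[3] (remote_addr) is ignored here; a real routine could restrict by IP
    expected = USERS.get(username)
    if expected is None:
        return AUTH_FAILED
    # compare_digest avoids leaking timing information about the password
    if hmac.compare_digest(expected.encode(), password.encode()):
        return AUTH_OK
    return AUTH_FAILED

# As a script entry point one would end the file with:
#   import sys; sys.exit(authenticate(sys.argv))
```

Saved as an executable script and named on the AUTHENTICATION line of etc/session.conf, a routine of this shape would stand in for ../bin/auth. Note that the password arrives as a command line argument, so it may be visible to other local users in the process list.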
Linking Directly to Urchin Reports
Overview
In a standard Urchin installation, delivery of Urchin reports is controlled via the embedded session controller and Apache webserver that ship with Urchin. Users view their Urchin reports by authenticating themselves via an Urchin-controlled login process, and are then presented with a list of Urchin reports that they are authorized to view. It is possible, however, to bypass Urchin's authentication and session controller and provide users with Urchin reports via a direct link from a portal or other external web site. Urchin 5 provides this capability as one of its standard integration points.
If all Urchin report data is in /data/reports, read Basic Scenario, below.
If Urchin report data is in a location other than /data/reports, read Advanced Scenario, below.
If you are supporting multiple users with multiple profiles, and each user has his/her reports in his/her own directory, read Large Configuration Scenario, below.
Basic Scenario
This section applies to a standard Urchin installation, where all Urchin reporting data is located in the data/reports directory of the Urchin distribution and the Apache webserver supplied with Urchin (urchinwebd) delivers the content for the Urchin reports.
Step 1: Enable "Direct Report Linking"
Log in to the Urchin administrative interface as the "admin" user. Navigate to the Settings > Access Settings screen.
Set the "Direct Report Linking" field to "on".
Alternatively, you can enable direct report linking by using the "uconfdriver" utility in the "util" directory of the Urchin distribution:
Start a command shell on the Urchin system
Change directory to the "util" directory of the Urchin distribution
Set the direct report linking by typing:
uconfdriver action=set_parameter recnum=1 cr_directlink=on
Step 2: Configure links to Urchin reports
For each profile (report) to which you want to provide a link, create a link in the following format:
(baseurl)/report.cgi?profile=(profilename)&user=(username)
where (baseurl) is everything before session.cgi in the current URL, (profilename) is the URI-escaped name of the profile, and (username) is a user that has access to the report. The user= setting is optional and controls the language and localization preferences; if not specified, the admin user is assumed. For example,
http://www.hollywoodweb.com/report.cgi?profile=www.hollywoodweb.com
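A portal that generates these links needs to URI-escape the profile name. The following sketch shows one way to do that in Python. The function name is hypothetical, and appending the optional user with "&user=" is an assumption based on ordinary query-string syntax and the user= setting named in the text.

```python
from urllib.parse import quote


def build_report_link(base_url, profile, user=None):
    """Build a direct link to report.cgi for one profile."""
    # quote(..., safe="") escapes every reserved character, including "/"
    link = base_url.rstrip("/") + "/report.cgi?profile=" + quote(profile, safe="")
    if user is not None:
        # Assumption: the optional user is passed as a second query parameter
        link += "&user=" + quote(user, safe="")
    return link
```

For the example above, build_report_link("http://www.hollywoodweb.com", "www.hollywoodweb.com") reproduces the link shown; a profile name containing a space would have it escaped as %20.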
Advanced Scenario
Step 1: Enable "Direct Report Linking"
Follow Step 1 in Basic Scenario, above.
Step 2: Set urchin.cgi permissions
Make sure that bin/urchin.cgi and util/uconfdriver in your Urchin installation are accessible and executable by your web server user.
Step 3: Copy htdocs/report.cgi to the directory(ies) from which you wish to run it.
Step 4: Link/alias uicons, ujs, usvg, and ucss folders
The reporting interface needs access to certain JavaScript files and icons. From the directory that will contain report.cgi, create links or aliases to these folders. This can be done with symbolic links or web server aliases, or you can simply copy the folders into the location. To set up symbolic links:
cd [report.cgi location]
ln -s [urchin path]/htdocs/uicons uicons
ln -s [urchin path]/htdocs/ujs ujs
ln -s [urchin path]/htdocs/usvg usvg
ln -s [urchin path]/htdocs/ucss ucss
When using symbolic links, make sure that your web server is configured to allow the following of symbolic links. For Apache, this is the FollowSymLinks option.
Step 5: Ensure that the web server is configured to execute CGI applications. For Apache, enable ExecCGI. Do not use script aliases.
Step 6: Edit htdocs/.report.conf
Uncomment the "Profile =" and "User =" lines if they are commented, and specify the profile and user. For example:
Profile = www.hollywoodweb.com
User = johng
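When many report directories have to be prepared, the symbolic links from Step 4 can be created by a script. This is a sketch under the assumption that symbolic links are the chosen method; the function name is hypothetical and the two paths correspond to [urchin path] and [report.cgi location] above.

```python
import os

# The four folders named in Step 4
ASSET_FOLDERS = ("uicons", "ujs", "usvg", "ucss")


def link_report_assets(urchin_path, report_dir):
    """Create symbolic links in report_dir to the htdocs asset folders."""
    created = []
    for name in ASSET_FOLDERS:
        target = os.path.join(urchin_path, "htdocs", name)
        link = os.path.join(report_dir, name)
        if not os.path.lexists(link):  # leave existing links or folders alone
            os.symlink(target, link)
            created.append(link)
    return created
```

The function returns the links it created, so running it a second time on the same directory returns an empty list rather than failing.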
Large Configuration Scenario
This scenario allows you to easily support multiple users with multiple profiles. Instead of creating a .report.conf file for each user-profile combination, you create a single .report.conf file that allows the report engine to dynamically extract the user and profile name from the URI stem. The steps for establishing the Large Configuration Scenario are identical to the Advanced Scenario, above, except that instead of explicitly setting Profile and User in a .report.conf file, you specify a Regular Expression (as is commonly used in Tcl, Perl, C#, VB.NET, and other languages) in the URIMatch field. For example, if the user johng types in the following URL to get his reports:
http://mach1.net/johng/www.hollywoodweb.com/report.cgi
then specifying the following fields in the .report.conf file would extract "johng" as the user and "www.hollywoodweb.com" as the profile:
URIMatch = ^/(.*)/(.*)/report.cgi
Profile = $2
User = $1
Explanation: The URI stem is "/johng/www.hollywoodweb.com/report.cgi". The URIMatch field is a Regular Expression that parses the user and profile out of the URI stem. The first part of the URIMatch field, "^/", indicates that the URI stem must begin with a "/". The last part, "report.cgi", indicates that the URI stem must contain "report.cgi" after the two extracted strings. The first "(.*)/" means "extract the string up to the next "/" and save it" (argument 1); thus, the string "johng" is assigned to $1. The next "(.*)/" means "extract the string up to the next "/" and save it" (argument 2); thus, the string "www.hollywoodweb.com" is assigned to $2. "Profile = $2" assigns the contents of argument 2 to Profile, and "User = $1" assigns the contents of argument 1 to User.
Therefore, the steps for the Large Configuration Scenario are:
Steps 1-5: Follow steps 1 through 5 in the Advanced Scenario.
Step 6: Create a .report.conf file that uses Regular Expressions to dynamically extract the user and profile name from the URL string that the user types in.
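Before deploying a URIMatch expression, it can help to try it against a few sample URI stems. The sketch below does this with Python's re module, which is an assumption: the manual does not say which regular expression engine Urchin uses, so results for unusual patterns may differ. The function name is hypothetical.

```python
import re

# The URIMatch pattern from the example. The unescaped "." in
# "report.cgi" matches any character, exactly as written in the manual.
URI_MATCH = re.compile(r"^/(.*)/(.*)/report.cgi")


def extract_user_and_profile(uri_stem):
    """Return (user, profile) from a URI stem, or None if it does not match."""
    match = URI_MATCH.search(uri_stem)
    if match is None:
        return None
    return match.group(1), match.group(2)  # ($1, $2)
```

For "/johng/www.hollywoodweb.com/report.cgi" this yields the user "johng" and the profile "www.hollywoodweb.com". Because "(.*)" is greedy in this engine, a stem with an extra path segment such as "/a/b/c/report.cgi" assigns "a/b" to the first group; a stricter pattern such as "[^/]*" in place of ".*" would avoid that, if the engine supports it.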
Script-based Configuration Management Overview
Overview
Urchin 5 provides utilities and functionality to allow all administrative operations to be performed via unattended scripts. Only report viewing requires the web-based interface to be operational. All configuration and log processing activities can be scripted using the following utilities and techniques. For first-time users, it is helpful to run the web-based administrative interface first, in order to get familiar with the terminology and capabilities of the Urchin administrative system. Urchin 5 includes several utilities for modifying the Urchin configuration database without using the web-based administrative interface. Located in the util directory of the distribution, these utilities are:
uconfdriver
uconfexport
uconfimport
uconfschedule
Each of these utilities must be run from a command shell or a script; they cannot be executed from the web-based Urchin administrative interface. Complete documentation for each of these utilities is available in the Utilities section of the Advanced Topics area of the Urchin Documentation Center at http://help.urchin.com. Here is a summary of the functional use of these utilities:
1. The uconfexport utility exports the entire configuration into a file, or to standard output (stdout) if no file is specified. The format of the exported data is an XML-type format defined in the documentation for uconfexport. Each record in the exported data corresponds to a configuration record in the configuration database.
2. The uconfimport utility imports the same XML-type formatted data used by uconfexport into the configuration database. This tool can import or edit single records, or replace the entire Urchin configuration with the contents of a text-based configuration file.
3. The uconfdriver utility performs specific actions on individual records. All parameters can be passed on one line as arguments to the script, or a file with multiple commands (one per line) can be used.
4. The uconfschedule utility updates task scheduling directives on a global basis for all configured profiles; for example, to set all Profiles to run at 1:00am daily. It can also run Profiles immediately, with or without permanently changing the scheduled time; see the documentation for uconfschedule for additional details on these features.
Note that you can use the uconfexport and uconfimport utilities to easily back up or restore your Urchin configuration. This provides a very quick method of recovering an Urchin installation after a disk failure or other system problem. An example of this functionality:
Save configuration for safekeeping:
uconfexport > /path/to/savedconfigurations/urchincfg.save
Restore Urchin configuration from a known good backup:

uconfimport -r -f /path/to/savedconfigurations/urchincfg.save
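The save step lends itself to a scheduled script that keeps dated copies. The following is a sketch only: it relies on the behavior stated above (the export utility writes the configuration to standard output when no file is given), while the function name and the timestamped file name are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def backup_config(export_cmd, dest_dir, stamp=None):
    """Run the export command and save its stdout to a timestamped file."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"urchincfg-{stamp}.save"
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed export never leaves a misleading backup file behind
    result = subprocess.run(export_cmd, capture_output=True, check=True)
    dest.write_bytes(result.stdout)
    return dest
```

A call such as backup_config(["uconfexport"], "/path/to/savedconfigurations"), run from the util directory, would write one new file per run; restoring remains a manual uconfimport as shown above.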
Intended Usage
The uconfexport and uconfimport utilities are intended to provide a simple method for importing and exporting data from the Urchin configuration database using regular text files in an XML-type format. These utilities also allow you to specify the names of profiles, log sources, filters, users, etc. in configuration directives which specify access lists, rather than the more cryptic record number lists that are used by the uconfdriver utility. The uconfimport utility can be used to add new records or modify existing records, but it cannot remove old records (unless the database is completely reset with the "-r" option).
The uconfdriver utility is very powerful and can be used for very specific scripting operations that may change only a few parameters in a database record, as well as for complete record additions, modifications and deletions. It can also be used for querying the configuration database for several parameters. This utility is best suited to environments where scripting all administrative functions of Urchin is desired, such as automated provisioning systems or very large hosting environments where use of the Urchin administrative GUI is impractical.
Note that uconfdriver is a lower-level utility that does not automatically maintain associations between the various database tables when working with directives that maintain cross-reference lists. When using uconfdriver to script configuration operations, please be aware that many of the tables contain directives that refer to other records, or lists of records. These directives are ct_ulist, ct_glist, ct_llist, ct_flist, and ct_rlist, which refer to the user, group, logfile, filter, and profile tables respectively. These lists are represented as exclamation-point-delimited lists of recnums, as demonstrated by this list of filter records:
ct_flist="!13!36!56!"
where each entry represents the recnum value of a record and is surrounded with exclamation points. For the uconfimport and uconfexport utilities, this directive would be specified as:
ct_flist="filter1,filter2,filter3"
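Scripts that drive both kinds of utility have to read and write both list syntaxes. The helpers below are a minimal sketch of that conversion; the function names are hypothetical, and representing an empty list as an empty string is an assumption, since the manual does not show one.

```python
def parse_recnum_list(value):
    """Parse an exclamation-point-delimited list such as "!13!36!56!"."""
    return [int(part) for part in value.split("!") if part]


def format_recnum_list(recnums):
    """Format record numbers as "!13!36!56!" (assumed "" for an empty list)."""
    if not recnums:
        return ""
    return "!" + "!".join(str(n) for n in recnums) + "!"


def parse_name_list(value):
    """Parse a comma-delimited list of names such as "filter1,filter2"."""
    return [name.strip() for name in value.split(",") if name.strip()]
```

Translating between record numbers and names still requires a lookup in the configuration database, which these helpers do not attempt.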
Important! Regardless of which utility you use to manipulate the configuration, you must be careful to keep cross-references intact. For example, a Filter record has a ct_rlist which details all of the profiles that the filter applies to, and a Profile record has a ct_flist which details all of the filters that apply to this profile. Note that the uconfimport and uconfexport utilities translate and verify the lists specified in the directive for you; uconfdriver does not.
Special Usage Notes
The uconfdriver utility uses exclamation-point-delimited lists of record numbers for directives that maintain associations with other tables (e.g. ct_flist), whereas the uconfexport and uconfimport utilities use comma-delimited lists of names in these directives. Be sure to use the appropriate list specification syntax for the utility you are using.
If a Profile is added but no corresponding Task is added, scheduling of the Profile cannot be managed within the Urchin admin GUI interface. In addition, the Profile cannot be scheduled to run with the Urchin Task Scheduler.
When adding or editing the "ct_password" directive for use with either a User or Remote Log Source password, uconfdriver and uconfimport will automatically encrypt the password before writing it to the Urchin configuration database to ensure that passwords are not stored in clear text. For portability reasons, the encryption is in a proprietary format that is not compatible with other password encryption formats such as "crypt" on UNIX-type systems.
Examples of pseudocode scripts to perform tasks
This section gives examples of using uconfdriver and uconfimport in scripting pseudocode, which could be easily translated into a UNIX-type shell script, a Perl script or a Visual Basic script depending on the needs of the application. Additional examples are given in the documentation for each specific utility.
Apply a German language setting to all users
$user_count = `uconfdriver action=nrecords table=user`;
for ($i = 1 ; $i <= $user_count ; $i++) {
    uconfdriver action=set_parameter table=user entry=$i ct_language=ge;
}
Add a new profile/task/logsource/user
Note that in this example, uconfimport takes care of building the proper mappings so that the new profile, task, log source and user will all be properly cross-referenced, e.g. the log source is associated with the profile and vice versa, the user will be in the profile's list of authorized users and vice versa, etc.
It should also be noted that the Task is associated with the Profile because they share the same "Name=" tag.
Step 1: Some automated provisioning application creates the following text file (/path/to/textfile):
<Profile Name="www.newdomain.com">
ct_name=www.newdomain.com
ct_affiliation=(NONE)
ct_website=http://www.newdomain.com
ct_reportdomains=www.newdomain.com,newdomain.com
cs_llist=newdomain.comaccesslog
ct_defaultpage=index.html
cs_vmethod=0
cs_ulist=newuser
</Profile>
<Task Name="www.newdomain.com">
ct_name=www.newdomain.com
ct_affiliation=(NONE)
cr_frequency=0
cr_runnow=0
cr_enabled=off
</Task>
<Logfile Name="newdomain.comaccesslog">
ct_name=newdomain.comaccesslog
ct_affiliation=(NONE)
ct_loglocation=/path/to/logs/newdomain.comaccess.log
cs_logformat=auto
cr_type=local
cs_rlist=www.newdomain.com
</Logfile>
<User Name="newuser">
ct_affiliation=(NONE)
ct_fullname="New User"
ct_name=newuser
ct_password=change$me
cs_rlist=www.newdomain.com
</User>
Step 2: Call the uconfimport utility to import the new profile and user into Urchin:
uconfimport -f /path/to/textfile
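An automated provisioning application could generate the text file from a template. The sketch below renders a reduced version of the four records using only directive names that appear in the example; it omits several of them (such as ct_website and ct_password) for brevity, and it assumes each directive sits on its own line. The function names are hypothetical.

```python
def render_record(tag, name, directives):
    """Render one record block: opening tag, key=value lines, closing tag."""
    lines = [f'<{tag} Name="{name}">']
    lines += [f"{key}={value}" for key, value in directives.items()]
    lines.append(f"</{tag}>")
    return "\n".join(lines)


def render_new_site(profile, log_name, log_path, user):
    """Render cross-referenced Profile, Task, Logfile and User records."""
    none = "(NONE)"
    records = [
        ("Profile", profile, {"ct_name": profile, "ct_affiliation": none,
                              "cs_llist": log_name, "cs_ulist": user}),
        # The Task shares the Profile's Name so the two are associated
        ("Task", profile, {"ct_name": profile, "ct_affiliation": none,
                           "cr_enabled": "off"}),
        ("Logfile", log_name, {"ct_name": log_name, "ct_affiliation": none,
                               "ct_loglocation": log_path,
                               "cs_logformat": "auto", "cr_type": "local",
                               "cs_rlist": profile}),
        ("User", user, {"ct_name": user, "ct_affiliation": none,
                        "cs_rlist": profile}),
    ]
    return "\n".join(render_record(*record) for record in records) + "\n"
```

The returned text would be written to a file and passed to uconfimport as in Step 2. Generating all four records from the same arguments keeps the names used in cs_llist, cs_ulist and cs_rlist consistent with the Name= tags.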
Data Export
There are two ways to export data from Urchin:
using the buttons on the upper right of any report screen (easy for any Urchin user)
using a database export script (advanced option for programmers only)
Using the Export buttons
Urchin's data export function makes it easy to extract data from any Urchin report. This is useful for bringing report data into a spreadsheet, word processor, database, etc. for further analysis. To export data from any report, select the appropriate type based on the application you plan to use to manipulate the data. For general database importing, use tab-separated format. For Word and Excel export, the application should launch automatically after the data is exported, and the new document should be populated with the data you have exported.
Tab: click the "T" button to export data in tab-delimited format.
Word: click the Word icon to export data in Microsoft Word native format.
Excel: click the Excel icon to export data in Microsoft Excel native format.
Printing: click the printer icon to get a print-friendly view of the data; click the Print Page link from that screen to actually print the report.
Recommendation: To export data to a database, tab-separated is usually the preferred format.
Using a Database Export Script
The following Perl script, U5DataExtractor, queries the Urchin databases for a particular profile and produces text-based reports that are suitable for sending via email. Before use, the script should be configured with the proper path to the Urchin distribution and the default profile name. Executing the script with the "help" option displays the usage.
Important Note: This script is strictly provided as-is, with no warranty expressed or implied. This is an unsupported script. Use this script for Urchin 5 only.
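Tab-delimited exports are straightforward to load programmatically before inserting the rows into a database. The following sketch parses tab-delimited text with Python's csv module. The function name is hypothetical, and the exact layout of an Urchin export (title or header lines, column order) is not described here, so any header handling is left to the caller.

```python
import csv
import io


def read_tab_export(text):
    """Parse tab-delimited text into a list of rows (lists of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row]  # skip completely empty lines
```

All values come back as strings; numeric columns need to be converted before loading them into typed database columns.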
Customization
Custom Log Formats
Introduction
Urchin can process virtually any log file format. Given the necessary information about the log format, Urchin will read and parse the raw data according to your configuration. This article describes a step-by-step method for creating custom log formats. Once a custom format is created, it can be selected in the Administration Interface as the format for any log file.
It should be noted that certain log data is required in order to create certain reports. Be sure your log contains the minimum required fields for Urchin to process it. If you are unsure, see the Log Files Logging Other Webservers document in the Urchin Administration section of the Documentation Center.
Creating Custom Formats
A custom log format is created by creating a format specification file in the
[urchin install dir]/lib/custom/logformats/
folder. A sample file is provided called 'custom.lf'. Multiple custom files can be created. Each one needs the '.lf' extension for Urchin to recognize it. The default built-in log formats such as apache, w3c, netscape, etc. are located in
[urchin install dir]/lib/reporting/logformats/
In each of the above directories, the fieldlist.txt file lists the available fields. The custom folder holds the custom fields list and the reporting folder holds the standard fields list. Custom log formats can refer to fields in either list by using the field id numbers. Once a custom format is created, it is available for selection in the Administration Interface. Here are the basic steps for creating a custom format:
Step 1: Copy the example
In the 'lib/custom/logformats' folder of the Urchin distribution, make a copy of the 'custom.lf' file. The new filename should not use spaces, and it needs the '.lf' extension.
Step 2: Set the primary positions
Edit the file created in the above step. The file contains a lot of useful information about each variable. The first step is to set the PrimaryPositions variable. This comma-separated list identifies each field (by id number) in the log format and its relative position. Use the fieldlist.txt files in the two directories mentioned above to determine which field ids to use. If you have custom date and time formats that are different from Apache and IIS formats, use fields 16 and 17 respectively. If the date and time are together in one field, just use field 16.
Step 3: Check the fields separator
Check the FieldSeparator1 variable, and set this to the separator between your fields. Use \s for a space and \t for a tab.
Step 4: Is the HTTP status field available?
If the HTTP status field is available, then leave the StatusRequired variable set to YES. This will separate valid and error hits appropriately. If there is no status in the log, then set this to NO, so that all hits will be considered valid.
Step 5: Are you using a custom Date/Time format?
If so, then edit the CustomDateFormat for field 16 and the CustomTimeFormat for field 17. The format is specified using the % variables defined in the Custom Date Format article later in this section of the documentation.
Step 6: Check the other variables
Most of the other variables will be OK for most custom log formats, but check the comments provided in the file on each variable to see if it applies to your situation.
Step 7: Do you have custom calculated fields?
In addition to the format, you can specify custom calculated fields in the specification file. An example with comments is provided at the bottom of the file. The custom calculated field works the same way as the Advanced Filter in the filtering section, except that custom calculated fields are processed first. Please see the article on custom calculated fields for more information.
Save the file and you are ready to assign it as the format for a log file in the Administrative Interface. It will automatically show up as one of the format options in the pull-down menu for log formats.
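Step 1 can be scripted when several custom formats are needed. The sketch below copies custom.lf under a new name while enforcing the two naming rules from the text (no spaces, '.lf' extension). The function name is hypothetical, and the remaining steps still require editing the copied file by hand.

```python
import shutil
from pathlib import Path


def create_log_format(urchin_dir, new_name):
    """Copy custom.lf to a new format specification file and return its path."""
    if " " in new_name:
        raise ValueError("format filename must not contain spaces")
    if not new_name.endswith(".lf"):
        new_name += ".lf"  # Urchin only recognizes files with this extension
    folder = Path(urchin_dir) / "lib" / "custom" / "logformats"
    dest = folder / new_name
    if dest.exists():
        raise FileExistsError(dest)  # never overwrite an existing format
    shutil.copyfile(folder / "custom.lf", dest)
    return dest
```

Refusing to overwrite an existing file protects formats that have already been edited and assigned to log sources.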
Custom Navigation
Introduction
The list of reports in the navigation of the reporting interface can be completely customized. Urchin ships with a number of default "Report Sets" which control the navigation and list of reports. Additional Report Sets can easily be added and, once created, are automatically selectable in the Report Set configuration setting for the Profile. Different Report Sets can be set for different users as well.
Using Report Sets, you can modify the list of reports, turn off entire sections, change colors, change text and move reports around. This is all done by creating a Report Set definition file. Keep in mind that a Report Set definition is created for a particular Profile Type such as "Standard Website". Furthermore, adding a Profile can automatically select a custom Report Set as the default if desired.
Creating a Custom Report Set
A custom Report Set is created by creating a Report Set definition file in the
[urchin install dir]/lib/custom/profiletypes/<ProfileType>/
folder, where <ProfileType> is "Standard_Website", "ECommerce_Website", or another Profile Type. Multiple custom Report Sets can be created. Each one will need a '.rs' extension for Urchin to recognize it. Be sure to use underscore "_" characters in the directory and filenames instead of spaces. Once a custom Report Set is created, it is available for selection in the Administration Interface. Here is a step-by-step procedure for creating a custom Report Set.
Step 1: Copy an existing Report Set
Sample Report Set definition files are included in the Profile Type folders that ship with the product. The first step is to copy the sample and rename the file with a ".rs" extension. Be sure to use underscores instead of spaces in the name. For example, if creating a Report Set for the Standard Website Profile Type, then in the directory "lib/custom/profiletypes/Standard_Website/" within the Urchin installation, copy the file "All_Reports.rs.sample" and rename it to "my_report_set.rs" or some other descriptor.
Step 2: Modify the Report Set
Edit the new Report Set definition file. Each line in the definition file can represent a report or a menu. All the configuration fields for each line are described at the top of the file. In general, to turn off reports or entire sections, enter a '#' character in front of the entries you wish to turn off. The names of reports can be changed by modifying the third field. Names and help text can either reference an entry in the dictionary system or you can enter the text directly into the field in this file. Colors can be changed by editing fields five and six. Menus can be created by copying one of the existing menu lines and editing the text entry appropriately. The order of the entries in the file mirrors the order seen in the navigation. Each entry will need a unique ID, which is in the first field. For a list of all Regular Reports, see the Reference Section of this manual.
Step 3: Is this the default Report Set for Profiles of this type?
If you want the newly created Report Set to be the default setting when creating new Profiles of this type, then copy the 'default.config' file from the 'lib/reporting/profiletypes/Standard_Website/' folder to the 'lib/custom/profiletypes/Standard_Website/' folder, where 'Standard_Website' is the Profile Type that we are creating the Report Set for. Edit the new default.config file in the custom area and add a line that contains 'cs_reportset=my_report_set', where 'my_report_set' is the name of the Report Set we just created, without the .rs extension.
Save the file(s) and you are ready to assign it as the Report Set for existing Profiles. In the Administrative Interface, edit a profile and under the Profile Settings tab, use the Report Set pull-down menu to select your new Report Set.
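Step 3 can also be automated, for example when the same default is rolled out to several installations. This sketch copies default.config into the custom area if it is not there yet and adds the cs_reportset line once. The function name is hypothetical, and it assumes default.config is a plain text file with one directive per line.

```python
import shutil
from pathlib import Path


def set_default_report_set(urchin_dir, profile_type, report_set):
    """Make report_set the default Report Set for a Profile Type."""
    base = Path(urchin_dir) / "lib"
    source = base / "reporting" / "profiletypes" / profile_type / "default.config"
    target_dir = base / "custom" / "profiletypes" / profile_type
    target = target_dir / "default.config"
    target_dir.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copyfile(source, target)  # start from the shipped defaults
    lines = target.read_text().splitlines()
    entry = f"cs_reportset={report_set}"  # name without the .rs extension
    if entry not in lines:
        lines.append(entry)
    target.write_text("\n".join(lines) + "\n")
    return target
```

The sketch does not remove an earlier cs_reportset line that names a different Report Set; whether a later line overrides an earlier one is not stated in the text, so such a file should be checked by hand.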
Custom Reports
Overview
Urchin's internal data processing and reporting engine is very powerful and can be configured to process virtually any type of data. This document is an overview and guide for creating custom reports. As shown in the figure below, there are two separate processes within Urchin that control the creation of reports. The first is the Log Processing step, which parses the raw log file data and populates a number of report data tables. Log processing is triggered by the Urchin Scheduler or when you click the 'Run Now' button in the Administration interface. The second process is triggered when someone actually clicks to view a report. All reports are created dynamically. Clicking on a particular report creates a query to the data tables; the results are compiled by the Reporting Engine and delivered to the user as a viewable report.
The key to creating valuable custom reports is understanding the three controls that affect reporting. The first control is the Log Format, which tells the Urchin Engine what is in the log file and how to process it. To create custom log formats, please see the appropriate article in this section of the manual. The second control is the Data Map. The Data Map controls which fields are stored in which data tables. To create a custom report, you may need to create an additional entry in the Data Map as discussed below. The Data Map is critical to defining what information is available for reporting. The last control is the Report Set. The Report Set contains the listing of reports that is seen in the navigation when viewing reports. The Report Set entries also contain the information necessary for making the query to one of the data tables, including the table number, and how to display the data. Creating a Report Set entry is discussed below.
Step 1: Select your Fields
You may need to reference the Regular Field List and Regular Report List in the Reference section of this guide to see exactly which fields are available and what existing reports are storing. Urchin data tables can optionally correlate two fields together. In general, the data tables store text data versus numeric data versus time. You will need to define the text fields and the numeric fields to use. See the examples at the end of this document.
Step 2: Create a new Profile Type
In the Urchin installation folder, copy one of the existing Profile Types in 'lib/reporting/profiletypes/' to 'lib/custom/profiletypes/'. Simply create a copy of the entire folder, such as 'Standard_Website', and place the copy into the 'lib/custom/profiletypes/' folder. The copy should include all of the files contained in the original folder. Be sure to rename the folder to give it a unique Profile Type name. Use underscores in the name instead of spaces.
Step 3: Edit the Data Map
In your newly created folder, there will be a "datamap.dm" file which contains all of the Data Table entries. The format of the file is described at the beginning of the file itself. Edit the file and create a new entry at the end of the file. Here are some notes on setting each field in the new entry:
TABLE: Make sure this is a unique entry between 1-200. This is the table id number that will be referenced during the query.
AFIELD: Use the Regular Field List in the reference section to look up the id value of the field you wish to report on.
BFIELD: If you are correlating two fields, enter the second field here and set the SEP field to a pipe symbol (|), otherwise leave these two fields as dashes (-).
IFIELD: Use the Integer Field List in the reference section to look up the id value of the field you wish to report versus.
REQUIRED: Set this to A, B, or BOTH to require the field(s) to exist before making an entry.
Step 4: Edit the Report Set
In the same folder, edit one of the Report Sets (.rs files) to include the reference to your new table. You can also create a new Report Set if desired. This file contains entries in the same order as the navigation for the reporting. Each section of reports begins with a menu. Determine in which section you wish to place your new report and copy one of the existing report entries that matches the format and type of report you are creating. Edit your newly created entry as follows:
VID: The first field is the unique View ID. Modify this so that it is a unique value for this report.
NAVNAME/FNAME/NAME: Set all three of these fields to the name of your report. Replace the dictionary entry with the name in quotes.
IFIELD: Set this to one of (VISITORS,SESSIONS,PAGEVIEWS,HITS,BYTES,TIME,REVENUE,TRANS,ITEMS)
INAME: Set this to the text name of the Integer field.
TABLE: Set this to the table number created in the previous step.
HELP: Set to "" for now.
FPART: If you used only Field A in the previous step, then set this to CFIELD1, otherwise you can grab BOTH.
FILTYPE/FILTER/FILCON/SOPT: Set all of these to "".
Step 5: Configure and Process
Create or edit a Profile and set the Profile Type to the newly created Profile Type and the Report Set to the one created/edited in the previous step. Process log data and view your new report.
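The folder copy in Step 2 can be scripted. The following Python sketch is only an illustration: the function name copy_profile_type and the example Profile Type name "My Custom Site" are hypothetical, and the demonstration runs against a throwaway directory rather than a real Urchin installation. Only the two folder paths and the 'Standard_Website' name come from the steps above.

```python
import shutil
import tempfile
from pathlib import Path


def copy_profile_type(install_dir, source_name, new_name):
    """Copy a stock Profile Type folder into lib/custom/profiletypes/."""
    # Step 2 asks for underscores instead of spaces in the folder name.
    safe_name = new_name.strip().replace(" ", "_")
    source = Path(install_dir) / "lib" / "reporting" / "profiletypes" / source_name
    target = Path(install_dir) / "lib" / "custom" / "profiletypes" / safe_name
    if target.exists():
        raise FileExistsError(f"Profile Type name is not unique: {target}")
    # copytree copies every file in the original folder and creates any
    # missing parent directories of the target.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demonstration against a temporary tree, not a real installation.
    with tempfile.TemporaryDirectory() as demo_root:
        stock = Path(demo_root) / "lib" / "reporting" / "profiletypes" / "Standard_Website"
        stock.mkdir(parents=True)
        (stock / "datamap.dm").write_text("# placeholder data map\n")
        created = copy_profile_type(demo_root, "Standard_Website", "My Custom Site")
        print(created.name)                                 # My_Custom_Site
        print(sorted(p.name for p in created.iterdir()))    # ['datamap.dm']
```

The sketch refuses to overwrite an existing folder, which is one simple way to keep the Profile Type name unique.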
Custom Date/Time Formats
Introduction
Urchin can process virtually any date or time format contained in a log file. The only requirement is to provide Urchin with a date or time format that matches the pattern of the date contained in the log file. This article describes the variables used to specify the date or time format. These formats are specified inside a custom log format file. Please see the "Custom Log Formats" article above in this section.
How Date/Time Parsing Works
Urchin determines the date/time by comparing a specified format against the date/time field(s) in the log file. For example, an IIS log contains the date in the following form:
2002-11-12
Urchin is able to determine the year, month, and day by using the following format:
%Y-%m-%d
Creating A Date/Time Format
To create a custom date/time format, first look at the order and pattern of the date/time data contained in your log file. Then, select from the Date/Time variables listed below to make up the format.
For example, if your log file contains the time as "07:01:47", then you need to create a pattern to match this. The first thing to note is that the pattern is hours:minutes:seconds. Looking at the variable list below, you will note that %H is the variable for hours, %M is the variable for minutes, and %S is the variable for seconds. Putting these together yields a format of "%H:%M:%S". If you have a literal '%' character in the date or time format field, you can specify the literal % as %%. The most common variables are: %Y, %m, %d, %H, %M, and %S.
Date/Time Variable Definitions
%A = national representation of the full weekday name.
%a = national representation of the abbreviated weekday name.
%B = national representation of the full month name.
%b = national representation of the abbreviated month name.
%d = the day of the month as a decimal number (01-31).
%e = the day of month as a decimal number (1-31); single digits are preceded by a blank.
%H = the hour (24-hour clock) as a decimal number (00-23).
%I = the hour (12-hour clock) as a decimal number (01-12).
%j = the day of the year as a decimal number (001-366).
%k = the hour (24-hour clock) as a decimal number (0-23); single digits are preceded by a blank.
%l = the hour (12-hour clock) as a decimal number (1-12); single digits are preceded by a blank.
%M = the minute as a decimal number (00-59).
%m = the month as a decimal number (01-12).
%p = national representation of either "ante meridiem" or "post meridiem" as appropriate.
%S = the second as a decimal number (00-60).
%s = the number of seconds since the Epoch, UTC (see mktime(3)).
%w = the weekday (Sunday as the first day of the week) as a decimal number (0-6).
%Y = the year with century as a decimal number.
%y = the year without century as a decimal number (00-99).
%z = the time zone offset from UTC; a leading plus sign stands for east of UTC, a minus sign for west of UTC, hours and minutes follow with two digits each and no delimiter between them (common form for RFC 822 date headers).
%% = `%'.
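Python's datetime.strptime uses the same style of format variables for the common cases (%Y, %m, %d, %H, %M, %S), so it can serve as a quick way to try a format against a sample log value before putting it into a custom log format file. This is a sketch under that assumption; it is not Urchin's parser, Python does not necessarily accept every variable in the list above, and the helper name parse_log_timestamp and the sample values are illustrative only.

```python
from datetime import datetime


def parse_log_timestamp(date_text, time_text,
                        date_format="%Y-%m-%d", time_format="%H:%M:%S"):
    """Combine separate date and time log fields into one datetime value."""
    combined_text = f"{date_text} {time_text}"
    combined_format = f"{date_format} {time_format}"
    # strptime raises ValueError when the text does not match the format.
    return datetime.strptime(combined_text, combined_format)


if __name__ == "__main__":
    stamp = parse_log_timestamp("2002-11-12", "07:01:47")
    print(stamp.isoformat())  # 2002-11-12T07:01:47
    try:
        # A date written in a different order does not match the format.
        parse_log_timestamp("11/12/2002", "07:01:47")
    except ValueError:
        print("format does not match this log line")
```

A mismatch raising an error is the useful part: it shows immediately when the order or separators in the format differ from the log data.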
Custom DNS Entries
Overview
The geoupdate utility is used to check for updates to Urchin's internal DNS database files and download the updates if they are available. This utility is run regularly via the __domaindb task that should be listed in the Configuration>Scheduler screen of the Urchin administration interface. The utility can also be used to import custom entries into the DNS databases. See the section on the geoupdate program under Advanced Topics>Utilities for complete instructions on the options to use for creating custom DNS entries.
Custom Lookup Tables
Beginning with version 5.6, Urchin allows you to define custom lookup tables. One useful application of a lookup table is to substitute human readable text for the often cryptic request parameters used with dynamic URLs. For example, consider a web site in which the Pages & Files > Page Query Terms report is used to rank the popularity of requested documents. In the report (shown below), the document id is displayed instead of the full document name. The numeric id is shown because the report simply ranks the popularity of requests of the form http://www.hostsite.com/index.cgi?id=1001
Applying a lookup table which maps document names to numeric ids allows us to view the same information in Pages & Files > Requested Pages, with the full document name displayed.
This article illustrates how to create and apply a lookup table for this example. The details of your lookup table and filters may differ according to your particular application; however, the basic steps will still apply.
Defining Your Lookup Table
To define your table:
1. Create a table in Excel that maps your codes to text labels. An example is shown below. The first row of the file must begin with "#Fields:", followed by "request_stem" in column 2.
2. Save the Excel table as a tab-delimited plain text file in the lib/custom/lookuptables directory of your Urchin distribution. You must save the file with an extension of ".lt".
Applying the Table
1. Apply the following Advanced filter to your profile. This filter tells Urchin to look in the request_uri for the string "id=", extract the id, and write the id into the request_stem. (Note that request_stem is the title of the second column of the lookup table.)
2. Apply the following Lookup Table filter (applied on the request_stem field) to your profile. Select your lookup table in the Table Name drop down list. If your lookup table does not appear as an option in the drop down list, make sure that your lookup table file name ends with .lt and that it has been saved in the lib/custom/lookuptables directory of your Urchin distribution.
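If you prefer not to use Excel, a tab-delimited lookup table can also be generated with a short script. The Python sketch below makes several assumptions that should be checked against a working table: that data rows carry the code in column 1 and the text label in column 2, that the document names shown are placeholders, and that a simple regular expression is a fair stand-in for the Advanced filter's extraction of the id. The function names are hypothetical.

```python
import csv
import io
import re


def build_lookup_table(code_to_label):
    """Return tab-delimited lookup table text for the given mapping."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # First row: "#Fields:" in column 1 and "request_stem" in column 2.
    writer.writerow(["#Fields:", "request_stem"])
    for code, label in code_to_label.items():
        # Assumed row layout: code in column 1, text label in column 2.
        writer.writerow([code, label])
    return buffer.getvalue()


def extract_id(request_uri):
    """Approximate the Advanced filter: return the value that follows "id="."""
    match = re.search(r"[?&]id=([^&]+)", request_uri)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Placeholder document names, for illustration only.
    table_text = build_lookup_table({"1001": "Annual Report", "1002": "Product Catalog"})
    print(table_text, end="")
    print(extract_id("/index.cgi?id=1001"))  # 1001
```

The returned text would then be saved with a ".lt" extension in the lib/custom/lookuptables directory, as described in step 2 of "Defining Your Lookup Table".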
Cobranding Urchin
Overview
Urchin accommodates cobranding in the administration interface, the reporting interface, and, beginning with Urchin 5.6, the login screen. There are two files to edit in order to include HTML at the top of the interface (three files with version 5.6). If a complete portal integration is being done, then the Urchin reporting can be delivered within a frameset or table by your application server. (Please see the article on Portal Integration in the Integration Section.) Otherwise, follow the instructions below to cobrand your interface. Please note that your license agreement may prohibit obscuring or changing the Urchin Logo and Reports, beyond what is provided in this article.
Cobranding Instructions
To cobrand your interface, you will need to edit the following files located in the Urchin installation:
[urchin install dir]/lib/custom/cobrands/cobrand_admin.tpl
[urchin install dir]/lib/custom/cobrands/cobrand_report.tpl
[urchin install dir]/lib/custom/cobrands/cobrand_session.tpl (version 5.6 only)
The first file controls the cobranding on the admin interface and the second controls the cobranding on the reporting interface. The third file, available beginning with version 5.6, controls the cobranding on the login screen. Add HTML content to these files as necessary to include your branding. The HTML provided in these files will be placed on top of the Urchin interface as shown in the example below.
Hosting Automation Solutions

How are HSphere and Urchin 5 Integrated?
An unlicensed copy of Urchin 5 is now integrated into Positive Software Corporation's HSphere. Psoft customers who wish to integrate Urchin 5 can now download HSphere to enable Urchin 5: http://www.psoft.net/HSdocumentation/new_features.html#231
Note: Urchin 5 comes unlicensed, so customers may activate the 15-day demo license, then purchase an Urchin license via the standard methods: either in the Urchin 5 admin interface, on the urchin.com website, or by contacting sales@urchin.com.
Detailed download and installation information is available here: http://www.psoft.net/HSdocumentation/sysadmin/urchin4.html
Using Urchin with Plesk PSA 5.0
Note: the following information has been provided by Plesk technical personnel.
Instructions
Log into your PSA interface as Admin and select the Extras button. This takes you to MyPlesk.com and allows MyPlesk.com to know that you are the Admin of a PSA license. Under the Server Tools tab you will see the Urchin offering. Below are some additional instructions that you will need when installing Urchin:
For PSA 5.0 only
Use Urchin install instructions found at MyPlesk.com
PSA log rotation feature should be turned off
Configure and use Urchin log archiving/deleting
Everything in Urchin should be set as described in documentation with only one difference: Log File path should point to /path/to/vhosts/domain.com/logs/access_log.processed and, if ssl is enabled, /path/to/vhosts/domain.com/logs/access_ssl_log.processed. The .processed files are created by the PSA statistics utility (after calculations by internal PSA stats), and processing should be scheduled to run daily at 5:00am.
Here are some examples:
If you are running the standard version of PSA, your path will be /usr/local/psa/home/vhosts/DOMAIN.NAME/logs/access_log.processed
If you are running the RPM version of PSA, your path will be /home/httpd/vhosts/DOMAIN.NAME/logs/access_log.processed
If you began using PSA version 1.3.x and have upgraded to PSA version 5.x, your path will be /usr/local/plesk/apache/vhosts/DOMAIN.NAME/logs/access_log.processed
Affiliate Program Info
Please note that if your server is registered with MyPlesk.com (and you join the Affiliate Program), your MyPlesk.com account will be credited with an amount equal to 10% of the amount you pay for the Urchin Solution.
Ensim Webppliance
How are Ensim Webppliance and Urchin 5 Integrated? An unlicensed copy of Urchin 5 is currently integrated into Webppliance 3.6 for Windows. Ensim has not announced plans to integrate Urchin into their Webppliance for Linux/Unix products. They are interested in determining demand, though. If you would like Ensim to support this integration, please contact sales@ensim.com. You may attempt to run Urchin 5 outside of your Ensim environment, but it is unsupported.
Sphera's HostingDirector
How are Sphera's HostingDirector and Urchin 5 Integrated? An unlicensed copy of Urchin 5 is being integrated into HostingDirector. Sphera customers who wish to upgrade to Urchin 5 will be able to do so when HostingDirector 3.8 is available toward the end of 2003. With that release Urchin 5 will become a shared service on the server, so customers will no longer have to license a separate copy of Urchin for each VPS.
Performance & Tuning
Global Filtering of Hits from Monitoring Software
Overview
Most Hosting environments provide some sort of monitoring of customer webservers in order to maintain Service Level Agreements (SLAs). As a side effect, however, the hits from this monitoring can skew the Urchin reporting for the monitored web sites, artificially inflating session, pageview, hit and byte counts.
Recommendation
In Hosting environments that employ such monitoring, it is highly recommended that a standard/global Urchin filter be applied to each customer's configured Urchin profiles to strip out the hits generated by monitoring software. This is easily done in an environment where a centralized Urchin installation (managed by the hosting company) provides reporting for each customer's website(s). In dedicated/colocation environments where the customers themselves maintain an instance of Urchin on their server(s), the Hosting company should provide a sample filter that is appropriate for the monitoring being used. To aid in the implementation of Urchin filtering, the Host and the customer should work together to create a specific page on the customer website that only the monitoring software utilizes, e.g. something like:
http://www.customerdomain.com/healthpage.html
Examples
Example 1: Filter out the IP address for the monitoring system
Filter Type: Exclude
Filter Field: IP
Filter Spec: 172\.16\.1\.1
This will strip any hits with the IP address 172.16.1.1 out of the webserver log as Urchin is processing it.
Example 2: Filter out specific page that the monitoring system hits
Filter Type: Exclude
Filter Field: REQUEST
Filter Spec: ^/healthpage.html
This will strip any hits with a request for /healthpage.html out of the webserver log as Urchin is processing it.
Considerations
1. It may be desirable to create additional, non-filtered Profiles for the customers so they can see the actual traffic load (including the monitoring hits) on the webserver(s).
2. The Hosting company may want to provide a Profile that provides reporting exclusively for the monitoring hits, e.g. it filters in only hits from the monitoring software. This profile could be used to show that the proper monitoring is being done and that SLAs are being met.
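Before applying a Filter Spec to every customer profile, it can help to try the pattern against a few sample values. The Python sketch below uses the two Filter Specs from the examples above and assumes that an exclude filter drops a hit whenever the pattern matches anywhere in the field; that matching behavior, the dictionary representation of a hit, and the sample addresses are assumptions made for illustration and do not describe Urchin's internal filter engine.

```python
import re

# (Filter Field, Filter Spec) pairs taken from Example 1 and Example 2.
EXCLUDE_FILTERS = [
    ("IP", r"172\.16\.1\.1"),
    ("REQUEST", r"^/healthpage.html"),
]


def is_excluded(hit):
    """Return True if any exclude filter matches the hit's field value."""
    return any(
        re.search(spec, hit.get(field, "")) is not None
        for field, spec in EXCLUDE_FILTERS
    )


if __name__ == "__main__":
    # Hypothetical hits, each reduced to the two fields being filtered.
    hits = [
        {"IP": "172.16.1.1", "REQUEST": "/index.html"},
        {"IP": "10.0.0.5", "REQUEST": "/healthpage.html"},
        {"IP": "10.0.0.5", "REQUEST": "/index.html"},
    ]
    kept = [hit for hit in hits if not is_excluded(hit)]
    print(len(kept))  # 1
```

Under this matching assumption, the unanchored IP pattern would also exclude an address such as 172.16.1.10. If that is not intended, a pattern anchored at both ends, such as ^172\.16\.1\.1$, is one way to limit the match to the single monitoring host.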
Reducing Disk Storage for Urchin Profile Monthly Databases
Overview
Urchin reporting data is stored in independent monthly databases for each Profile configured within Urchin. These databases typically reside in the data/reports directory of the Urchin distribution. By default, Urchin will keep an unlimited number of these monthly Profile databases. For most small and medium sized sites, the storage requirements are modest. Because Urchin reporting does not require access to the raw webserver logs once they've been processed, there is no need to keep the webserver logs. The processed Urchin monthly databases will be approximately 5-10% of the size of the raw webserver logs that were processed to populate the Urchin databases, and in most cases this will represent a very minimal amount of disk space even if all Urchin databases are kept indefinitely.
For large sites, however, which produce hundreds or thousands of megabytes worth of webserver logs per day, or hosting providers who have a very large number of Profiles configured, it may be desirable to reduce Urchin's ongoing data storage requirement. This can be accomplished in one of the following ways:
1. Set the profile to automatically delete the raw tracking data after processing the logs
2. Set the profile to archive historic data
3. Limit the number of months of historical reporting data that are retained
Instructions for each of these methods are provided at the end of this article.
Technical Overview of Urchin Database Storage
For each Urchin profile, Urchin maintains a set of nine monthly databases that provide data for the reporting engine. The databases are named after the month for which they store data. The complete list of databases is:
YYYYMMhdata.und  ->  hash table data
YYYYMMhdata.uni  ->  hash table index
YYYYMMhdata.uns  ->  hash table string data
YYYYMMldata.und  ->  log tracking data
YYYYMMldata.uni  ->  log tracking indexes
YYYYMMpdata.und  ->  path data
YYYYMMsdata.und  ->  session data
YYYYMMtdata.und  ->  totals data
YYYYMMudata.unf  ->  header for the database
YYYYMMvdata.und  ->  visitor data
YYYYMMvdata.uni  ->  visitor index
Each set of databases is complete for the month of data that it contains. Since there is no interdependency between the monthly database sets, archiving and pruning operations can be performed independently on each database set without affecting any other month. Under normal operation, the entire set of nine monthly database files is retained for each month. However, four of these database files are used only by the Urchin log processing engine. These database files are:
YYYYMMpdata.und
YYYYMMsdata.und
YYYYMMvdata.und
YYYYMMvdata.uni
These databases contain information about paths, sessions and visitors and can account for a substantial percentage of the total storage space required for the month, on the order of 10-50%. Thus a significant amount of disk space can be saved by setting the Keep Raw Tracking Data option to off in the Storage/DB screen of the Profile configuration.
Important Note: If you plan to upgrade to a future major release of Urchin, this raw tracking data will be used for linking records together. Absence of this data will affect certain new visitor-centric drill down reports that are planned for Urchin. Therefore, it is recommended that the keeping of raw tracking data be disabled only for extremely high traffic sites where it represents a disk or CPU resource consumption issue.
Other potential disk space savings can be obtained by compressing historic Urchin monthly databases into ZIP archives. The resulting archives are typically only 20-30% of the size of the uncompressed database set. While the Urchin reporting engine cannot read the ZIP archives directly, it has the ability to extract the databases it needs from the ZIP archives on the fly. This is completely transparent to a person viewing Urchin reports, other than a slight delay while the databases are being unpacked. The reporting engine does not remove the databases it has unpacked; this allows quicker access to data while the person is viewing the Urchin reports. However, the original ZIP archive is left in place, so a periodic cleanup operation can simply remove the unpacked databases to regain the disk space.
The last avenue for reducing Urchin storage requirements is to establish a policy for the duration of historical reporting that Urchin is to provide. For instance, in environments where Urchin is provided as a reporting service with a hosting package, it is very common to provide Urchin historical reporting for a period of one year. Due to the monthly organization of Urchin databases, it is very easy for scripting mechanisms to automatically remove old monthly databases that have aged past a certain threshold. When a historical reporting length policy is implemented, Urchin's data storage requirement will typically stabilize or only increase slightly once the historical retention limit has been reached.
Methods for Reducing Data Storage
How To
Method 1: Delete the Raw Tracking Data after Log Processing
You can configure the profile to delete raw visitor and session information after processing. For large sites, this improves performance and reduces the amount of data stored.
Note: Sessions that overlap days appear as two sessions (one for each day) instead of one session when this configuration is selected. The difference in results will be negligible for most sites.
To configure the profile to delete raw visitor and session information after processing:
1. In the Admin interface, click Configuration, then Urchin Profiles>Profiles.
2. Edit the desired profile.
3. In the Storage/DB tab, turn the Keep Raw Tracking Data field "off".
4. Click Update.
Method 2: Auto-Archive Historic Data
You can configure the profile to compress historic monthly data into an archive. The reports can view the archived data, but no additional hits may be processed for the archived months.
To configure the profile to archive historic data:
1. In the Admin interface, click Configuration, then Urchin Profiles>Profiles.
2. Edit the desired profile.
3. In the Storage/DB tab, turn the Archive DB field "on".
4. Specify a number of months for the Archive DB After field.
5. Click Update.
Method 3: Limit Retention of Databases for Historical Reporting
For each Urchin Profile, simply remove any databases in the data/reports/profilename directory that begin with a YYYYMM prefix and have aged past the threshold needed for historical reporting. For example, if you wish to retain a one-year reporting history and the current month is February 2004, you would remove any databases named 200301*data.un* to delete the reporting data from January 2003 for that Urchin profile. This would be repeated for all databases older than January 2003.
For an example of a ready-to-run Perl script that will automatically prune the Urchin databases after a certain period of time, please see the PruneUrchinData script at http://www.urchin.com/support/scripts/purge_udata.pl
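The selection logic behind Method 3 can be sketched in a few lines of Python. This is not the Perl script referenced above; it is a separate illustration that only decides which file names are past the retention threshold and deletes nothing. The function name, the sample file names, and the assumption that every database follows the YYYYMM?data.un? naming shown in the database list are all illustrative.

```python
import re
from datetime import date

# YYYYMM prefix, one letter, "data.un", one letter (for example 200301hdata.und).
DB_NAME = re.compile(r"^(\d{4})(\d{2})[a-z]data\.un[a-z]$")


def databases_to_prune(filenames, today, keep_months=12):
    """Return the database file names older than the retention threshold."""
    current_index = today.year * 12 + today.month
    prunable = []
    for name in filenames:
        match = DB_NAME.match(name)
        if match is None:
            continue  # not a monthly database file; leave it alone
        file_index = int(match.group(1)) * 12 + int(match.group(2))
        # Keep the current month plus the keep_months months before it.
        if current_index - file_index > keep_months:
            prunable.append(name)
    return sorted(prunable)


if __name__ == "__main__":
    names = [
        "200212tdata.und", "200301hdata.und", "200301vdata.uni",
        "200302tdata.und", "200402udata.unf", "notes.txt",
    ]
    # With February 2004 as the current month, January 2003 and older are selected.
    print(databases_to_prune(names, date(2004, 2, 15)))
    # ['200212tdata.und', '200301hdata.und', '200301vdata.uni']
```

Returning a list rather than deleting files makes it possible to review the selection first; archived ZIP files are deliberately not handled in this sketch.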
Security Features
Activating SSL on the Urchin Webserver
The Urchin webserver that ships with Urchin 4.100 and later is capable of encrypting communication via SSL. To enable SSL, you will need either a valid certificate signed by a certificate authority or a self-signed certificate. The steps for enabling SSL in the Urchin webserver are as follows:
1. Copy your SSL certificate file into the Urchin var directory and name it server.crt
2. Copy your SSL key file into the Urchin var directory and name it server.key
3. Edit the urchinwebd.conf.template file located in the Urchin var directory. Change the ServerName directive from localhost to the name of your webserver. For instance:
ServerName: www.urchin.com
NOTE: The ServerName in the urchinwebd.conf.template file needs to match the name of the server that is in the certificate file.
4. Start or restart the webserver using urchinctl with the "-e" option. Urchinctl is located in the Urchin bin directory. The "-e" option instructs urchinctl to enable SSL in the webserver. For example, to restart the webserver with SSL enabled, use:
urchinctl -e restart
To start the server without SSL enabled, just remove the "-e" option from the urchinctl command.
You should now be able to access your SSL enabled server using https://servername.domain.com:port/
NOTE: Customizing the SSL settings in the urchinwebd.conf.template may result in problems that could prohibit the webserver from starting.
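The file-related steps above lend themselves to a quick check before restarting the webserver. The Python sketch below is a hypothetical helper, not part of Urchin: it only verifies that server.crt, server.key and urchinwebd.conf.template exist in the given var directory and that a ServerName directive is present and no longer set to localhost. It assumes the directive may be written with or without a trailing colon, and it does not verify that the name matches the certificate.

```python
import re
from pathlib import Path

# Accepts "ServerName: host" as well as "ServerName host" (assumed variants).
SERVER_NAME = re.compile(r"^\s*ServerName:?\s+(\S+)", re.MULTILINE)


def ssl_preflight(var_dir):
    """Return a list of problems found; an empty list means the checks passed."""
    var_dir = Path(var_dir)
    problems = []
    for required in ("server.crt", "server.key"):
        if not (var_dir / required).is_file():
            problems.append(f"missing {required}")
    template = var_dir / "urchinwebd.conf.template"
    if not template.is_file():
        problems.append("missing urchinwebd.conf.template")
        return problems
    names = SERVER_NAME.findall(template.read_text())
    if not names:
        problems.append("no ServerName directive found")
    elif names[-1] == "localhost":
        problems.append("ServerName is still localhost")
    return problems


if __name__ == "__main__":
    # Pass the path of the Urchin var directory; "." is only a placeholder.
    for problem in ssl_preflight("."):
        print(problem)
```

An empty result only means these basic checks passed; whether the certificate and key are valid is still decided by the webserver when it starts.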
Chapter 8: Reference
Integer Field List
Overview
When a hit is processed by Urchin, certain integer fields are available, including whether the hit is a pageview, a new session, how many bytes were transferred, etc. These integer values are used in updating many of the tables. In particular, the Data Map, which maps all of the text-type data tables, references these integer fields by number.
Integer Field List
The following table lists all of the available integer fields and their corresponding id number.
IFIELD id  Field Name
1          Session
2          Pageview
3          NonPageviews
4          Hits
5          Valid Hits
6          Error Hits
7          UTM Hits
8          Non UTM Hits
9          Robot Hits
10         Non Robot Hits
11         Bytes
12         Robot Bytes
13         Non Robot Bytes
14         Forms
15         Responses
16         Transactions
17         Items
18         Transaction Revenue
19         Item Revenue
20         Downloads
21         Repeat Responses
22         Cost
23         Primary Goals
24         Clicks
25         Impressions
Regular Field List
Overview
When a hit or entry in a log file is read during processing, the hit is broken down into 'Raw Fields'. Fields are generally separated by spaces, tabs, or commas. The Log Format determines how these Raw Fields are assigned internally. Once the Raw Fields are read, Urchin calculates a number of 'Auto Fields' based on the 'Raw Fields'. Most reports use these Auto Fields for updating. Filters can be applied to either Raw or Auto Fields. The following table lists all available Fields and their purpose.
Regular Field List
id   Field                   Type    Purpose
1    iis_date                (RAW)   IIS raw date of hit field.
2    iis_time                (RAW)   IIS raw time of hit field.
3    apache_time             (RAW)   Apache raw date & time of hit field.
4    c_ip                    (RAW)   Client IP Address.
5    cs_username             (RAW)   Client username (if any)
6    cs_request              (RAW)   Apache raw entire request field.
7    cs_method               (RAW)   IIS raw request method field.
8    cs_uristem              (RAW)   IIS raw request stem field.
9    cs_uriquery             (RAW)   IIS raw request query field.
10   sc_status               (RAW)   Return status code from server.
11   sc_bytes                (RAW)   Number of bytes transferred for request.
12   c_host                  (RAW)   Client hostname (converts to c_ip if necessary).
13   cs_useragent            (RAW)   Browser useragent information.
14   cs_cookie               (RAW)   Cookies sent by browser.
15   cs_referer              (RAW)   Raw Referral information (could be internal).
16   custom_date             (RAW)   Used for datestamp in Custom Logs.
17   custom_time             (RAW)   Used for timestamp in Custom Logs.
19   cs_host                 (RAW)   Requested virtualhost by Client.
20   s_port                  (RAW)   Server port number.
21   cs_version              (RAW)   IIS Raw HTTP version.
22   s_sitename              (RAW)   IIS Server site name.
23   s_computername          (RAW)   IIS Computer name.
24   s_ip                    (RAW)   IIS Server IP address.
25   elf_orderid             (RAW)   Ecommerce order id number.
26   elf_store               (RAW)   Ecommerce store name.
27   elf_sessionid           (RAW)   Ecommerce session id.
28   elf_total               (RAW)   Ecommerce transaction amount.
29   elf_tax                 (RAW)   Ecommerce tax amount.
30   elf_shipping            (RAW)   Ecommerce shipping amount.
31   elf_billcity            (RAW)   Ecommerce customer city.
32   elf_billstate           (RAW)   Ecommerce customer state.
33   elf_billzip             (RAW)   Ecommerce customer zip code.
34   elf_billcountry         (RAW)   Ecommerce customer country.
35   elf_productcode         (RAW)   Ecommerce product code.
36   elf_productname         (RAW)   Ecommerce product name.
37   elf_variation           (RAW)   Ecommerce product variation.
38   elf_price               (RAW)   Ecommerce product price.
39   elf_quantity            (RAW)   Ecommerce product quantity.
40   elf_upsold              (RAW)   Ecommerce upsold variable.
76   referral_protocol       (AUTO)  Referral protocol (http/https/etc.)
77   referral_host           (AUTO)  Referral complete hostname.
78   referral_domain         (AUTO)  Referral domain name.
79   referral_port           (AUTO)  Referral port number (if any).
80   referral_url            (AUTO)  Referral complete URL. (includes host)
81   referral_uri            (AUTO)  Referral complete URI. (no host)
82   referral_stem           (AUTO)  Referral URI stem without query info.
83   referral_query          (AUTO)  Referral Query info by itself.
84   referral_anchor         (AUTO)  Referral information after # tag.
85   referral_directory      (AUTO)  Referral directory up to filename.
86   referral_filename       (AUTO)  Referral filename without directory.
87   referral_mime           (AUTO)  Referral mime type (file extension)
88   referral_keywords       (AUTO)  Referral search engine keywords
89   referral_domainandstem  (AUTO)  Referral domain and URI stem together.
90   referral_errordetail    (AUTO)  Referral error detail information.
91   request_method          (AUTO)  Request method (GET/POST/etc.).
92   request_url             (AUTO)  Request complete URL (if provided).
93   request_version         (AUTO)  Request protocol version.
94   request_protocol        (AUTO)  Request protocol (HTTP/etc.).
95   request_host            (AUTO)  Request hostname (if any).
96   request_port            (AUTO)  Request port number (if any).
97   request_uri             (AUTO)  Request URI with query.
98   request_stem            (AUTO)  Request URI without query.
99   request_query           (AUTO)  Request query information (e.g., after ?)
100  request_anchor          (AUTO)  Request information after # tag
101  request_directory       (AUTO)  Request directory without filename.
102  request_filename        (AUTO)  Request filename without directory.
103  request_mime            (AUTO)  Request mime type (file extension).
104  request_origfilepath    (AUTO)  Request original uri stem if UTM.
105  request_origmime        (AUTO)  Request original mime type if UTM.
106  request_errordetail     (AUTO)  Request detail for error hits.
107  useragent_complete      (AUTO)  Complete user agent.
108  browser_base            (AUTO)  Browser name (e.g., Netscape).
109  browser_version         (AUTO)  Browser version.
110  platform_base           (AUTO)  Platform (e.g., Windows).
111  platform_version        (AUTO)  Platform version.
112  domain_primary          (AUTO)  First level domain. (e.g. com).
113  domain_complete         (AUTO)  Complete domain. (e.g. urchin.com).
114  sid                     (AUTO)  Session id (if any).
115  utm_cookiea             (AUTO)  UTM2 cookiea
116  utm_cookieb             (AUTO)  UTM2 cookieb
117  utm_cookiec             (AUTO)  UTM2 cookiec
119  utm_cookie1             (AUTO)  UTM1 cookie1
120  utm_cookie2             (AUTO)  UTM2 cookie2
121  utm_cookie3             (AUTO)  UTM3 cookie3
122  utm_unique_id           (AUTO)  UTM unique visitor id.
123  utm_new_campaign        (AUTO)  new campaign variables detected
124  utm_page                (AUTO)  UTM page variable (used for request_ variables).
125  utm_referral            (AUTO)  UTM Referral (used for referral_ variables).
126  utm_screen_resolution   (AUTO)  Screen resolution (e.g., 800x600).
127  utm_screen_available    (AUTO)  Available screen resolution in pixels.
128  utm_browser_size        (AUTO)  Browser size in pixels.
129  utm_screen_colors       (AUTO)  Screen color bit depth.
130  utm_language            (AUTO)  Browser language code setting.
131  utm_java_enabled        (AUTO)  yes|no if java is enabled.
132  utm_cookies_enabled     (AUTO)  yes|no if cookies are enabled.
133  utm_timezone_offset     (AUTO)  +/-HHMM timezone offset value of browser.
134  utm_js_version          (AUTO)  Javascript version info.
135  utm_session_number      (AUTO)  Number of sessions for this visitor.
136  utm_repeat_campaign     (AUTO)  Repeat campaign detected.
137  utm_campaign            (AUTO)  same as utm_campaign in a link.
138  utm_medium              (AUTO)  same as utm_medium in a link.
139  utm_source              (AUTO)  same as utm_source in a link.
140  utm_term                (AUTO)  same as utm_term in a link.
141  utm_content             (AUTO)  same as utm_content in a link.
142  utm_campaign_session    (AUTO)  session number of this campaign.
143  utm_campaign_number     (AUTO)  Number of responses in __utmz.
144  utm_campaign_time       (AUTO)  Time in seconds of the current campaign.
145  elf_region              (AUTO)  ECommerce region drilldown information.
146  utm_campaign_srcmedium  (AUTO)  utm_campaign [utm_medium].
147  utm_campaign_srcmedtrm  (AUTO)  utm_campaign [utm_medium] | utm_term.
148  utm_campaign_sesdelta   (AUTO)  difference in current session number and campaign session number.
149  utm_campaign_daysdelta  (AUTO)  difference in days between the hit viewtime and the campaign time.
150  utm_campaign_hour       (AUTO)  hour of the day the campaign occurred.
151  utm_campaign_goal       (AUTO)  campaign goal that was met.
152  log_source_name         (AUTO)  Log source name in Log Source Wizard.
153  utm_ipandvisitorid      (AUTO)  IP address or host visitor key.
154  utm_id                  (AUTO)  same as utm_id in a link.
155  utm_type                (AUTO)  used for email impressions.
Regular Report List
Overview
During processing, each hit in the log file is separated and calculated into different fields. These fields are then used to update data tables which are queried for report views. Some reports have special storage and are not included in the data tables. The following table lists all of the predefined reports and which data table is queried for each.
Regular Report List
View#: 1100 1102 1103 1104 1105 1110 1900 1903 1907 1901 1905 1904 1906 1902 1200 1201 1211 1206 1202 1207 1208 1203 1209
Report Name: Traffic, Sessions Graph, Pageviews Graph, Hits Graph, Bytes Graph, Summary, Visitors & Sessions, Visitors by Day, Sessions by Day, Unique Visitors, Unique Sessions, Visitor Loyalty, Session Frequency, Summary, Pages & Files, Requested Pages, Downloads, All Files, Directory by Pages Drilldown, Directory by Files Drilldown, Directory by Bytes Drilldown, File Types by Hits, File Types by Bytes
Data Table: 37 7 17 12 7 12 18 13 19
Fields Used: utm_session_number vs. sessions; request_stem vs. pageviews; request_stem vs. downloads; request_origfilepath vs. hits; request_stem vs. pageviews; request_origfilepath vs. hits; request_stem vs. bytes; request_origmime vs. hits; request_origmime vs. bytes
View#  Report Name
1210   Page Query Terms
1205   Posted Forms
1204   Status and Errors
1600   Navigation
1601   Entrance Pages
1602   Exit Pages
1609   Click Paths
1603   Click To and From
1610   Length of Pageview
1604   Depth of Session
1608   Length of Session
1606   Click To and From Report
1300   Referrals
1301   Referrals
1303   Referral Drilldown
1302   Search Terms
1304   Search Engines
1305   Referral Errors
1400   Domains & Users
1401   Domains
1402   Domain Drilldown
1403   Countries
1404   IP Addresses
1405   IP Drilldown
1406   Usernames by Hits
1407   Usernames by Bytes
1409   Usernames by Sessions
1500   Browsers & Robots
1501   Browsers by Sessions Drilldown
1504   Browsers by Hits Drilldown
1505   Browsers by Bytes Drilldown
1502   Platforms by Sessions Drilldown
1506   Platforms by Hits Drilldown
1507   Platforms by Bytes Drilldown
1503   Combos by Sessions
1510   Robots by Hits Drilldown
1511   Robots by Bytes Drilldown
1800   Client Parameters
1801   Screen Resolution

Data Table  Fields Used
11          request_stem|request_query vs. hits
8           request_stem vs. hits
14          sc_status|request_errordetail vs. hits
20          request_stem vs. pageviews
20          request_stem vs. pageviews
21          request_stem vs. sessions
7           request_stem vs. pageviews
22          request_stem vs. time
20          request_stem vs. pages
1           referral_domainandstem vs. sessions
1           referral_domainandstem vs. sessions
2           referral_domain|referral_keywords vs. sessions
2           referral_domain|referral_keywords vs. sessions
23          referral_errordetail|referral_domainandstem vs. hits
4           domain_primary|domain_complete
4           domain_primary|domain_complete vs. sessions
4           domain_primary|domain_complete vs. sessions
6           c_ip vs. sessions
6           c_ip vs. sessions
10          cs_username vs. hits
16          cs_username vs. bytes
5           cs_username vs. sessions
3           useragent_complete vs. sessions
9           useragent_complete vs. hits
15          useragent_complete vs. bytes
3           useragent_complete vs. sessions
9           useragent_complete vs. hits
15          useragent_complete vs. bytes
3           useragent_complete vs. sessions
24          browser_base vs. hits
25          browser_base vs. bytes
31          utm_screen_resolution vs. sessions
View#  Report Name                           Data Table  Fields Used
1804   Screen Colors                         32          utm_screen_colors vs. sessions
1805   Languages                             33          utm_language vs. sessions
1806   Java Enabled                          34          utm_java_enabled vs. sessions
1808   Timezone Offset                       35          utm_timezone_offset vs. sessions
1809   Javascript Version                    36          utm_js_version vs. sessions
2100   ECommerce
2101   Revenue
2102   Number of Transactions
2103   Products by Revenue                   42          elf_productname|elf_productcode vs. revenue
2104   Products by Quantity                  41          elf_productname|elf_productcode vs. items
2105   Products by Revenue Drilldown         42          elf_productname|elf_productcode vs. revenue
2106   Products by Quantity Drilldown        41          elf_productname|elf_productcode vs. items
2107   ECommerce Summary
2100   Revenue Source
2101   Revenue by Region Drilldown           43          elf_region vs. revenue
2102   Revenue by City                       43          elf_region vs. revenue
2103   Revenue by Referrals                  44          referral_domainandstem vs. revenue
2104   Revenue by Search Terms               45          referral_domain|referral_keywords vs. revenue
2105   Revenue by Search Engines Drilldown   45          referral_domain|referral_keywords vs. revenue
2106   Revenue by Domains Drilldown          46          domain_primary|domain_complete vs. revenue
2200   Campaign Tracking
2201   Lead Source - Acquisition             56, 57, 53  source(medium) vs clicks, impressions, new leads
2202   Lead Source - Quality                 56, 52, 51  source(medium) vs clicks, pages, sessions
2203   Lead Source - Conversion              56, 55, 81  source(medium) vs clicks, goals, transactions
2204   Lead Source - ROI                     56, 82, 54  source(medium) vs clicks, revenue, cost
2206   Lead Source - Conversion              56, 55      source(medium) vs clicks, goals
2206   Lead Source - cost breakdown          54          source(medium) vs cost
2211   Keyword Analysis - Acquisition        70, 71, 67  source(medium) vs clicks, impressions, new leads
2212   Keyword Analysis - Quality            70, 66, 65  source(medium) | term vs clicks, pages, sessions
2213   Keyword Analysis - Conversion         70, 69, 85  source(medium) | term vs clicks, goals, transactions
2214   Keyword Analysis - ROI                70, 86, 68  source(medium) | term vs clicks, revenue, cost
2215   Keyword Analysis - Conversion         70, 69      source(medium) | term vs clicks, goals
2216   Keyword Analysis - cost breakdown     68          source(medium) | term vs clicks
2221   Keyword Comparison - Acquisition      70, 71, 67  source(medium) vs clicks, impressions, new leads
2222   Keyword Comparison - Quality          70, 66, 65  source(medium) | term vs clicks, pages, sessions
2223   Keyword Comparison - Conversion       70, 69, 85  source(medium) | term vs clicks, goals, transactions
2224   Keyword Comparison - ROI              70, 86, 68  source(medium) | term vs clicks, revenue, cost
2225   Keyword Comparison - Conversion       70, 69      source(medium) | term vs clicks, goals
2226   Keyword Comparison - cost breakdown   68          source(medium) | term vs clicks
2231   Campaign Comparison - Acquisition     56, 57, 53  campaign name | source(medium) vs clicks, impressions, leads
2232   Campaign Comparison - Quality         56, 52, 51  campaign name | source(medium) vs clicks, pages, sessions
2233   Campaign Comparison - Conversion      56, 55, 81  campaign name | source(medium) vs clicks, goals, transactions
2234   Campaign Comparison - ROI             56, 82, 54  campaign name | source(medium) vs clicks, revenue, cost
2235   Campaign Comparison - Conversion      56, 55      campaign name | source(medium) vs clicks, goals
2236   Campaign Comparison - cost breakdown  54          campaign name | source(medium) vs cost
2241   Medium Comparison - Acquisition       56, 57, 53  campaign name | source(medium) vs clicks, impressions, leads
2242   Medium Comparison - Quality           56, 52, 51  campaign name | source(medium) vs clicks, pages, sessions
2243   Medium Comparison - Conversion        56, 55, 81  campaign name | source(medium) vs clicks, goals, transactions
2244   Medium Comparison - ROI               56, 82, 54  campaign name | source(medium) vs clicks, revenue, cost
2245   Medium Comparison - Conversion        56, 55      campaign name | source(medium) vs clicks, goals
2246   Medium Comparison - cost breakdown    54          campaign name | source(medium) vs cost
2251   Content Testing - Acquisition         63, 64, 60  campaign name | source(medium) vs clicks, impressions, leads
2252   Content Testing - Quality             63, 59, 58  campaign name | source(medium) vs clicks, pages, sessions
2253   Content Testing - Conversion          63, 62, 83  campaign name | source(medium) vs clicks, goals, transactions
2254   Content Testing - ROI                 63, 84, 61  campaign name | source(medium) vs clicks, revenue, cost
2255   Content Testing - Conversion          63, 62      campaign name | source(medium) vs clicks, goals
2256   Content Testing - cost breakdown      61          campaign name | source(medium) vs cost
2265   Goal Conversion by Hour               73, 72      term | hour vs goals, clicks
2266   Sales Conversion by Hour              87, 72      term | hour vs transactions, clicks
2267   Repeat Clicks by IP                   76          IPVisitorID | source (medium) vs |term repeat responses
2268   Repeat Clicks by IP                   76          IPVisitorID | source (medium) vs |term repeat responses
2261   Time To Goal                          75          days delta vs goals
2262   Sessions To Goal                      74          session delta vs goals
2263   Time To Transaction                   89          days delta vs transactions
2264   Sessions To Transaction               88          session delta vs transactions
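Because several report views query the same data table, it can help to look the relationship up in the other direction. The following is a minimal sketch, assuming a few rows of the report list have been copied by hand into a Python dictionary; the dictionary layout and the function name `views_using_table` are illustrative and are not part of Urchin.

```python
# Hypothetical in-memory copy of a few rows from the report list above.
# Keys are View# values; values are (report name, data tables, fields used).
REPORTS = {
    2261: ("Time To Goal", (75,), "days delta vs goals"),
    2262: ("Sessions To Goal", (74,), "session delta vs goals"),
    2263: ("Time To Transaction", (89,), "days delta vs transactions"),
    2264: ("Sessions To Transaction", (88,), "session delta vs transactions"),
    2265: ("Goal Conversion by Hour", (73, 72), "term | hour vs goals, clicks"),
    2266: ("Sales Conversion by Hour", (87, 72), "term | hour vs transactions, clicks"),
}


def views_using_table(table_id):
    """Return the sorted View# values whose report queries the given data table."""
    return sorted(
        view for view, (_name, tables, _fields) in REPORTS.items()
        if table_id in tables
    )


if __name__ == "__main__":
    # Both "by Hour" reports list data table 72 in the table above.
    for view in views_using_table(72):
        print(view, REPORTS[view][0])
```

The same structure could be extended with the remaining rows if a complete lookup is needed.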
Configuration Table and Directive List
Overview

The following matrices provide exact details on the table names, directives, and meanings for each database table in the Urchin 5 configuration. The first matrix defines each of the table names and what that table is used for. Then, for each database table, a comprehensive list of directives is provided.

Please note that most records will not specify a value for every possible directive for the table to which the record belongs. In some cases the directives may not be applicable to that particular record. Also, Urchin will use default values if there is no explicit definition for a directive. Directives may be manipulated by the web-based Urchin administration interface, or by scripts that use the uconf-driver or uconf-import utilities.

It should also be noted that this reference guide does not contain verbose descriptions of the directives and how they are to be used. In many cases, the intended usage of the directive may not be immediately obvious from the directive name and description provided. You should consult the appropriate sections in the Urchin Documentation Center at http://help.urchin.com to gain more insight about the capabilities of the product (e.g. filtering, backups, archiving, report view customization, etc.) and how the capabilities can be controlled with the configuration directives detailed below.

Note: Where applicable, default values are printed in bold typeface.

Table Name Definitions

Table Name   Meaning/Purpose
global       General settings including licensing and remote access
machine      Process settings including database sizing, memory usage and process priority
filter       Specifies log and profile runtime filter parameters
logfile      Specifies the location and format of a log source
profile      Log Processing and Reporting settings for a particular website
task         Runtime schedule settings for a particular profile
affiliation  Enterprise-level management of profiles, log sources, filters, groups and users
group        Group-level management of users including profile access
user         Individual user settings including password, language and locality
Directive List: global table

Directive          Meaning/Purpose
cr_dcmode          datacenter mode (on|off)
cr_directlink      allow direct web links to Urchin reports (on|off)
cr_remoteaccess    allow remote access (on|off)
cr_remoteadmin     allow remote administration (on|off)
cs_region          two-letter global region code
                   fr=France ge=Germany it=Italy ja=Japan ko=Korea po=Portugal sp=Spain sw=Sweden uk=United Kingdom
ct_license         license code used by Urchin Licensing
ct_name            identifier for record in global table
ct_port            port that Apache runs on (default: 9999)
ct_serial          serial code used by Urchin Licensing
ct_schedulers      [internal use only]
cr_setupwizard     run setup wizard first time (on|off)
ct_var             VAR code used by Urchin licensing
ct_schedulersleep  length of time in seconds that the scheduler waits before checking for the next task (default: 3)

Directive List: machine table

Directive        Meaning/Purpose
cr_priority      run priority of Urchin log processing engine (low|normal|high)
cs_preset        [internal use only]
cs_limitdbtable  maximum number of records allowed in database tables (default: 10000)
ct_dbuffsize     data buffer size in MB (default: 13)
ct_pbuffsize     path buffer size in MB (default: 1)
ct_name          identifier for record in machine table
ct_sbuffsize     session buffer size in MB (default: 3)
ct_tbuffsize     text buffer size in MB (default: 1)
ct_vbuffsize     visitor buffer size in MB (default: 2)
Directive List: filter table

Unless otherwise noted, directives in this table apply to all filter types.

Directive         Meaning/Purpose
cr_action         [internal use only]
cr_casesensitive  filter is case sensitive (yes|no)
                  applies to advanced|exclude|include|replace filter types
cr_filtertype     type of filter:
                  advanced=Advanced filter built from two other fields
                  decode=Decode URL-encoded characters back to their original form
                  dynamicurl=Dynamic-URL filter from Urchin 3 and Urchin 4 (deprecated)
                  exclude=Exclude pattern filter
                  include=Include pattern filter
                  jaconv=Convert various Japanese encodings into UTF-8 encoding
                  replace=Pattern search and replace filter
cr_override       overwrite data in the output field if it is already populated (yes|no)
                  applies to advanced filter type
cs_filterfield    ID number of field to apply filter to, from the Regular Field List reference table
                  applies to decode|exclude|include|jaconv|replace filter types
cs_infielda       ID number of first field to apply filter to, from the Regular Field List reference table
                  applies to advanced filter type
cs_infieldb       ID number of second field to apply filter to, from the Regular Field List reference table
                  applies to advanced filter type
cs_outfield       ID number of the field to output filter results to, from the Regular Field List reference table
                  applies to advanced filter type
cs_llist          exclamation-point delimited list of log source recnums to which this filter is applied (uconf-driver)
                  comma delimited list of log source names to which this filter is applied (uconf-import)
cs_rlist          exclamation-point delimited list of profile recnums to which this filter is applied (uconf-driver)
                  comma delimited list of profile names to which this filter is applied (uconf-import)
ct_affiliation    optional affiliation
ct_filter         filter pattern (simple pattern or POSIX regular expression)
                  applies to include|exclude filter types
ct_inexpa         regular expression pattern for first filter
                  applies to advanced filter type
ct_inexpb         regular expression pattern for second filter
                  applies to advanced filter type
ct_name           identifier for record in filter table
ct_outexp         expression defined explicitly or constructed from saved pattern parts of input expressions, e.g. ($A1, $B2)
                  applies to advanced filter type
ct_replace        replacement string pattern
                  applies to replace filter type
ct_search         search string pattern
                  applies to replace filter type
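The advanced filter directives above describe two input patterns whose saved parts can be reused in an output expression such as ($A1, $B2). The sketch below illustrates one plausible reading of that behaviour in Python; it is not Urchin's implementation. The function name, the use of field names instead of numeric field IDs, and the rule that both patterns must match are all assumptions made for illustration.

```python
import re


def apply_advanced_filter(record, in_a, exp_a, in_b, exp_b, out_field, out_exp,
                          override=True, case_sensitive=True):
    """Illustrative model of an advanced filter (assumed semantics, not Urchin code).

    record    -- dict of field name -> value (Urchin itself uses numeric field IDs)
    in_a/in_b -- input fields, in the role of cs_infielda / cs_infieldb
    exp_a/b   -- patterns, in the role of ct_inexpa / ct_inexpb
    out_exp   -- output expression using $A<n> / $B<n>, in the role of ct_outexp
    override  -- in the role of cr_override
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    match_a = re.search(exp_a, record.get(in_a, ""), flags)
    match_b = re.search(exp_b, record.get(in_b, ""), flags)
    if match_a is None or match_b is None:
        return record  # assumption: the filter applies only when both patterns match
    if record.get(out_field) and not override:
        return record  # output field already populated and overwriting is disabled

    def substitute(placeholder):
        source = match_a if placeholder.group(1) == "A" else match_b
        index = int(placeholder.group(2))
        if index > source.re.groups:
            return ""  # the pattern has no such saved part
        return source.group(index) or ""

    result = dict(record)
    result[out_field] = re.sub(r"\$([AB])(\d+)", substitute, out_exp)
    return result


if __name__ == "__main__":
    hit = {"request_stem": "/products/shoes.html", "request_query": "color=red"}
    print(apply_advanced_filter(
        hit,
        "request_stem", r"^/products/(\w+)\.html$",
        "request_query", r"color=(\w+)",
        "request_stem", "/$A1/$B1",
    ))
```

Returning a new dictionary instead of modifying the record in place keeps the sketch easy to test; a real log processor would more likely update fields in place for speed.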
Directive List: logfile table

Directive          Meaning/Purpose
cr_action          [internal use only]
cr_logdestiny      disposition of log after processing (1=don't touch, 2=archive/compress, 3=delete)
cr_protocol        remote log transfer protocol (ftp|http)
cr_type            location of log (local|remote)
cr_uristemtolower  convert the URI stem to lower case when reading log (on|off)
cs_flist           exclamation-point delimited list of filter recnums which are applied to this log source (uconf-driver)
                   comma delimited list of filter names which are applied to this log source (uconf-import)
cs_logformat       logging format for logfile (auto|elf|elf2|ncsa|netscape|w3c)
cs_rlist           exclamation-point delimited list of profile recnums using this log source (uconf-driver)
                   comma delimited list of profile names using this log source (uconf-import)
ct_affiliation     optional affiliation
ct_loglocation     local log pathname/location (e.g. /logs/access.log)
ct_name            identifier for record in logfile table
ct_password        password for ftp/http remote log access and UNC pathnames in Windows environments
ct_pathtimeoffset  offset in hours from local time when using date matching patterns in the logfile specification (e.g. +8)
ct_pathtimegmt     substitute GMT time for local time when using date matching patterns in the logfile specification (on|off)
ct_port            port number (e.g. 21 for ftp, 80 for http)
ct_querytoken      specify the query token separating the URI stem from the query (default: ?)
ct_remotelocation  remote log pathname/location
ct_separator       single character field separator character (\s, \t are escaped characters for space and tab)
ct_server          fully qualified domain name or IP address of remote host/server for remote log downloads
ct_username        username to use for ftp/http remote log access and UNC pathnames in Windows environments (default: anonymous)

Directive List: profile table

Directive               Meaning/Purpose
cr_archivedata          enable automatic ZIP archiving of older Urchin monthly databases (on|off)
cr_autorollback         enable automatic rollback of Urchin databases after failed log processing (on|off)
cr_cleanbackups         enable automatic removal of outdated Urchin database ZIP backups (on|off)
cr_createbackups        enable automatic creation of Urchin database ZIP backups to allow rollback functionality (on|off)
cr_includemimes         specify whether ct_mimes list of pageview suffix/MIME types should be an include or exclude list (exclude|include)
cr_includeparameters    specify whether ct_parameters list of URI query terms types should be an include or exclude list (exclude|include)
cr_keeprawtrackingdata  specify whether raw tracking data should be retained (on|off)
cr_logtracking          turn log tracking (on|off)
cr_pgoalcasesensitive   campaign primary goal match case sensitive (yes|no)
cr_processpath          turn visitor tracking (on|off)
cr_processvisitors      specify whether to keep visitor information between log processing runs (on|off)
cr_profiletype          profile type (Standard_Website|ECommerce_Website)
cr_sessionpageview      session requires a pageview (on|off)
cs_archivenmonths       create monthly ZIP archives of Urchin databases after n months (default: 12)
cs_flist                exclamation-point delimited list of filter recnums applied to this profile (uconf-driver)
                        comma delimited list of filter names applied to this profile (uconf-import)
cs_glist                exclamation-point delimited list of group recnums granted access to this profile (uconf-driver)
                        comma delimited list of group names granted access to this profile
cs_keepnbackups         specify number of ZIP backups to keep (0-10, default: 2)
cs_limitdbtable         maximum number of database records to keep for any database table for this profile (overrides cs_limitdbtable global value; default: 10000)
cs_llist                exclamation-point delimited list of log source recnums associated with this profile (uconf-driver)
                        comma delimited list of log source names associated with this profile (uconf-import)
cs_pathlevel            depth of path reporting (default: 3)
cs_pgoalfield           internal numeric id of field in ct_pgoalfield
cs_referrallevel        referral level to report (default: 3)
cs_reportset            report view template for this profile, specified as one of the six built-in templates:
                        Basic All|Basic Lite|Basic IT|UTMEnabled All|UTMEnabled Nopaths|UTMEnabled Webdesign
                        or a User-Specified reporting template that matches a custom ".rs" reporting template file
cs_sidfield             ID number for field where session ID is contained, from the Regular Field List reference table
cs_taskid               recnum for associated task in Task table (uconf-driver only)
cs_timeoffset           time offset (in seconds) for data in log (default: 0=GMT)
cs_ulist                exclamation-point delimited list of user recnums granted access to this profile (uconf-driver)
                        comma delimited list of user names granted access to this profile (uconf-import)
cs_vmethod              visitor tracking (0=IPUserAgent, 1=Session ID, 2=UTM, 3=IPOnly)
cs_visitortimeout       session timeout in seconds (default: 3600)
ct_affiliation          optional affiliation
ct_defaultpage          default page for site (e.g. index.html)
ct_downloads            comma separated list of download page suffix/MIME-types to match
                        (default: dmg,doc,exe,gz,pdf,pkg,ppt,sh,tar,xls,zip)
ct_keywords             comma separated list of search engine referral keywords to match
                        (default: general,key,kw,mt,p,q,qs,qt,query,search,search_string,text,word,words)
ct_lasthit              time of most recent hit processed for this profile in seconds since 1970 [read-only, set by log processing engine]
ct_mimes                comma separated list of pageview suffixes/MIME types to match or exclude
                        (default: css,cur,gif,ico,ida,jpeg,jpg,js,png)
ct_name                 identifier for record in profile table
ct_parameters           comma separated list of URI query terms to include or exclude in the Page Query Terms report (default: sid)
ct_pgoalexp             campaign primary goal expression to match
ct_pgoalfield           field name to match expression in ct_pgoalexp against
ct_reportdomains        comma delimited list of site domains
                        (e.g. urchin.com,www.urchin.com,quantified.net,www.quantified.net)
ct_sidpre               text pattern that precedes session id pattern being matched
ct_sidpost              text pattern that terminates the session id pattern being matched
ct_utmdomain            domain name to be used for UTM tracking (must match that set in __utm.js file in the document root of the website itself)
ct_website              URL for website associated with this profile (e.g. http://www.urchin.com)

Directive List: task table
Directive       Meaning/Purpose
cd_btime        start time of last run for this task in seconds since 1970 [read-only, set by log processing engine]
cd_etime        finish time of last run for this task in seconds since 1970 [read-only, set by log processing engine]
cd_lastrun      time of last initiation for this task in seconds since 1970 [read-only, set by log processing engine]
cd_nextrun      time of next run for this task in seconds since 1970
cr_dow          day of week to run task (0=Sun,1=Mon,2=Tue,3=Wed,4=Thu,5=Fri,6=Sat)
cr_enabled      [internal use only]
cr_frequency    task frequency (0=never,3=once,4=hourly,5=daily,6=weekly,7=monthly)
cr_runnow       [internal use only]
cs_dom          day of month to run task (monthly scheduling option) [1-31]
cs_hour         hour of day to run task [0-23]
cs_minute       minute of hour to run task [0-59]
cs_rid          recnum for associated profile in Profile table (uconf-driver only)
ct_affiliation  optional affiliation
ct_application  [internal use only]
ct_completed    percent of log processing completed [read-only, set by log processing engine]
ct_day          day of month to run task (run-once option) [1-31]
ct_lockid       [internal use only]
ct_month        month to run task (run-once option) [1-12]
ct_pid          [internal use only]
ct_name         identifier for record in task table
ct_runstatus    current runtime status of task (0=processing logs,1=processing DNS,2=completed,3=error,4=queued) [read-only, set by log processing engine]
ct_status       current scheduling status of task (0=disabled,1=not scheduled,2=scheduled,3=running,4=completed,5=error) [read-only, set by log processing engine]
ct_year         year for task to run (run-once option) [4-digit CCYY format]

Directive List: affiliation table

Directive           Meaning/Purpose
ct_browselocation   pathname specification for top-level directory allowed for browsing for logs in Log Source
ct_cachedirectory   pathname specification of directory used to store temporary cache files used in display of reports for an affiliation
ct_contact          descriptive name for the affiliation's contact person
ct_email            email address for affiliation's contact person
ct_name             identifier for record in affiliation table
ct_reportdirectory  pathname specification for top-level directory where Urchin reporting databases will live for the affiliation

Directive List: user table

Directive          Meaning/Purpose
cr_changelanguage  user may change language preference (no|yes)
cr_changepassword  user may change password (no|yes)
cr_changeregion    user may change region preference (no|yes)
cr_leveltype       affiliation admin privilege level
                   0=manage users/groups/tasks
                   1=manage users/groups/tasks/filters
                   2=manage users/groups/tasks/filters/log sources/profiles
cs_adminlevel      admin level (1=admin, 2=affiliate admin, 3=user)
cs_glist           exclamation-point delimited list of group recnums the user belongs to (uconf-driver)
                   comma delimited list of group names the user belongs to (uconf-import)
cs_language        two-letter report language code for user
                   en=English fr=French ge=German ja=Japanese sp=Spanish
cs_region          two-letter region code for user
                   us=United States ch=China fr=France ge=Germany it=Italy ja=Japan ko=Korea po=Portugal sp=Spain sw=Sweden uk=United Kingdom
cs_rlist           exclamation-point delimited list of profile recnums the user has access to (uconf-driver)
                   comma delimited list of profile names the user has access to (uconf-import)
cs_rslist          exclamation-point delimited set of "recnum|ReportSetName" pairs that optionally controls the report view for this user for a particular report (e.g. !79|Basic_All!83|Basic_Lite!)
ct_affiliation     optional affiliation for user
ct_fullname        full name of user
ct_name            identifier for record in user table
ct_password        user password (automatically encrypted on input by uconf-driver or uconf-import)

Directive List: group table

Directive       Meaning/Purpose
cs_rlist        exclamation-point delimited list of profile recnums the group has access to (uconf-driver)
                comma delimited list of profile names the group has access to (uconf-import)
cs_rslist       exclamation-point delimited set of "recnum|ReportSetName" pairs that optionally controls the report view for this group for a particular report (e.g. !79|Basic_All!83|Basic_Lite!)
cs_ulist        exclamation-point delimited list of user recnums assigned to the group (uconf-driver)
                comma delimited list of user names assigned to the group (uconf-import)
ct_affiliation  optional affiliation
ct_groupdesc    description of the group
ct_name         identifier for record in group table
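Scripts that read these directives need to split the exclamation-point delimited values. The following Python sketch shows one way to do so, based only on the cs_rslist example given above (!79|Basic_All!83|Basic_Lite!). The function names are hypothetical, and the assumption that plain recnum lists carry the same leading and trailing exclamation points (e.g. !4!12!) is not confirmed by this guide.

```python
def parse_recnum_list(value):
    """Split an exclamation-point delimited recnum list into integers.

    Assumes a form such as "!4!12!"; empty segments produced by leading or
    trailing delimiters are ignored.
    """
    return [int(part) for part in value.split("!") if part]


def parse_rslist(value):
    """Split a cs_rslist value into a {recnum: report set name} dictionary."""
    pairs = {}
    for item in value.split("!"):
        if not item:
            continue  # skip the empty strings around leading/trailing "!"
        recnum, _sep, report_set = item.partition("|")
        pairs[int(recnum)] = report_set
    return pairs


if __name__ == "__main__":
    print(parse_rslist("!79|Basic_All!83|Basic_Lite!"))
    print(parse_recnum_list("!4!12!"))
```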
Error code list for failed FTP and HTTP remote webserver log transfers
Overview

An Urchin Log Source can be configured to collect a webserver log from a remote server via FTP or HTTP. Under normal circumstances, the transfer will be successful and no errors appear in the runtime log. However, if an error is encountered during the transfer (e.g. an invalid username/password, remote server unreachable, remote log unreadable, etc.), Urchin will log an error code in the runtime output, as viewable in the Task History for the Profile. This error code appears in parentheses next to the "failed" message after the webserver log transfer is attempted, e.g. (9)
The error codes are listed below along with a text message explaining the problem that was encountered.

Error Code List

1   Unsupported protocol. This build of curl has no support for this protocol.
2   Failed to initialize.
3   URL malformat. The syntax was not correct.
4   URL user malformatted. The user-part of the URL syntax was not correct.
5   Couldn't resolve proxy. The given proxy host could not be resolved.
6   Couldn't resolve host. The given remote host was not resolved.
7   Failed to connect to host.
8   FTP weird server reply. The server sent data curl couldn't parse.
9   FTP access denied. The server denied login.
10  FTP user/password incorrect. Either one or both were not accepted by the server.
11  FTP weird PASS reply. Curl couldn't parse the reply sent to the PASS request.
12  FTP weird USER reply. Curl couldn't parse the reply sent to the USER request.
13  FTP weird PASV reply, Curl couldn't parse the reply sent to the PASV request.
14  FTP weird 227 format. Curl couldn't parse the 227-line the server sent.
15  FTP can't get host. Couldn't resolve the host IP we got in the 227-line.
16  FTP can't reconnect. Couldn't connect to the host we got in the 227-line.
17  FTP couldn't set binary. Couldn't change transfer method to binary.
18  Partial file. Only a part of the file was transferred.
19  FTP couldn't download/access the given file, the RETR (or similar) command failed.
20  FTP write error. The transfer was reported bad by the server.
21  FTP quote error. A quote command returned error from the server.
22  HTTP not found. The requested page was not found. This return code only appears if --fail is used.
23  Write error. Curl couldn't write data to a local filesystem or similar.
24  Malformat user. User name badly specified.
25  FTP couldn't STOR file. The server denied the STOR operation.
26  Read error. Various reading problems.
27  Out of memory. A memory allocation request failed.
28  Operation timeout. The specified timeout period was reached according to the conditions.
29  FTP couldn't set ASCII. The server returned an unknown reply.
30  FTP PORT failed. The PORT command failed.
31  FTP couldn't use REST. The REST command failed.
32  FTP couldn't use SIZE. The SIZE command failed. The command is an extension to the original FTP spec RFC 959.
33  HTTP range error. The range "command" didn't work.
34  HTTP post error. Internal post-request generation error.
35  SSL connect error. The SSL handshaking failed.
36  FTP bad download resume. Couldn't continue an earlier aborted download.
37  FILE couldn't read file. Failed to open the file. Permissions?
38  LDAP cannot bind. LDAP bind operation failed.
39  LDAP search failed.
40  Library not found. The LDAP library was not found.
41  Function not found. A required LDAP function was not found.
42  Aborted by callback. An application told curl to abort the operation.
43  Internal error. A function was called with a bad parameter.
44  Internal error. A function was called in a bad order.
45  Interface error. A specified outgoing interface could not be used.
46  Bad password entered. An error was signaled when the password was entered.
47  Too many redirects. When following redirects, curl hit the maximum amount.
48  Unknown TELNET option specified.
49  Malformed telnet option.
51  The remote peer's SSL certificate wasn't ok
52  The server didn't reply anything, which here is considered an error.
53  SSL crypto engine not found
54  Cannot set SSL crypto engine as default
55  Failed sending network data
56  Failure in receiving network data
57  Share is in use (internal error)
58  Problem with the local certificate
59  Couldn't use specified SSL cipher
60  Problem with the CA cert (path? permission?)
61  Unrecognized transfer encoding
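When reviewing many Task History entries, it can be convenient to translate the code in parentheses automatically. The sketch below is a minimal Python illustration, assuming only that the runtime output contains the word "failed" followed by the code in parentheses, as described in the overview. The sample log line, the function name, and the abbreviated lookup table are hypothetical; the exact layout of Urchin's runtime log is not specified here.

```python
import re

# Abbreviated copy of the error code list above; extend as needed.
TRANSFER_ERRORS = {
    6: "Couldn't resolve host. The given remote host was not resolved.",
    7: "Failed to connect to host.",
    9: "FTP access denied. The server denied login.",
    10: "FTP user/password incorrect. Either one or both were not accepted by the server.",
    19: "FTP couldn't download/access the given file, the RETR (or similar) command failed.",
    22: "HTTP not found. The requested page was not found.",
    28: "Operation timeout. The specified timeout period was reached according to the conditions.",
}


def explain_failure(log_line):
    """Return (code, message) for a 'failed (N)' log line, or None if no code is present."""
    match = re.search(r"failed\s*\((\d+)\)", log_line, re.IGNORECASE)
    if match is None:
        return None
    code = int(match.group(1))
    return code, TRANSFER_ERRORS.get(code, "Not in the abbreviated table; see the error code list.")


if __name__ == "__main__":
    # Hypothetical runtime log line; only the "failed (9)" part follows the guide.
    print(explain_failure("Retrieving remote log ... failed (9)"))
```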

