Website Statistics Strategies

Carlton Lovegrove

An important component of any ecommerce initiative is tracking the effectiveness of the marketing effort. Careful analysis of a web site's statistics yields information that can be used to fine-tune advertising, web site content, and customer relationship management strategies and policies. These are all important elements of Internet marketing plans, and they can ultimately dictate the success or failure of an ecommerce initiative.

Surfing the World Wide Web involves traversing the connections among hyperlinked documents, and it is one of the most common ways of accessing web pages. Theories and models are beginning to explain how observed patterns of surfing behavior emerge from fundamental human information search processes. The ability to predict surfing patterns could therefore help solve many problems facing producers and consumers of web page content. For instance, web site designs can be evaluated and optimized by predicting how users will surf through their structures. Web client and server applications can reduce user-perceived network latency by pre-fetching content predicted to be on the surfing path of individual users, or of groups of users with similar surfing patterns. Systems and user interfaces can be enhanced by recommending content of interest to users, or by displaying information in a way that best matches users' interests. Proper analysis of a web site's activity therefore supports a better-informed design of the site.

A common source of tracking data and statistics for any website is the log file on the web server. Most web servers can record every request for a web site object to a log file. The data in the log file indicates which objects were requested, when they were requested, and who or what requested them. With appropriate software to process this data, company managers and executives can measure the success of their websites and develop strategies to address weaknesses. Three qualities are typically assessed: visibility (the ease with which customers can locate the site), navigability (the paths that customers use to move through the site), and usability (how easy it is for customers to use the site).

However, complete reliance on data collected in log files has its pitfalls, some of which are discussed in this article. Other tools, such as tracking counters, help overcome some of the problems encountered with log file analysis. Selecting site statistics software intelligently therefore requires recognizing the strengths and weaknesses of each tool and striking a balance that serves the mission and goals of your organization. Understanding the statistics that web site analysis software provides is critical to interpreting and evaluating them properly and to designing subsequent marketing strategies.

Log file data

While web servers can record vast amounts of information, relatively few fields are typically recorded. Several formats have evolved from the Common Logfile Format (CLF), including the Extended Common Logfile Format (ECLF) as well as a variety of customized formats. For the most part, web servers record the following fields:

  • the time of the request, recorded to the second,
  • the machine making the request, recorded as either a domain name or an IP address,
  • the requested URL as specified by the client,
  • the size of the transferred resource in bytes, and
  • various HTTP-related information such as version number, method, and return status.

Various web servers also enable other fields to be recorded, the most common of which are:

  • the URL of the previously viewed page (the “referrer” field),
  • the identity of the software used to make the request (the “user agent” field), and
  • a unique identifier issued by the server to each client (typically a “cookie”).
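
To make the field lists above concrete, here is a minimal Python sketch that parses one log line into named fields. It assumes the widely used "combined" layout (the CLF fields followed by a quoted referrer and a quoted user agent); the sample line, host address, and URLs are invented for illustration, and a cookie field is not handled because its position depends on how a given server is configured.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method url version" status size,
# optionally followed by "referrer" "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700 (one-second resolution).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was transferred.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # Treat "-" or an absent optional field as missing data.
    for field in ("referrer", "user_agent"):
        if record[field] in (None, "", "-"):
            record[field] = None
    return record


# Invented sample line, for illustration only.
SAMPLE_LINE = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /products/index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/" "Mozilla/4.08 [en] (Win98; I)"'
)

if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
```

Lines that do not fit the assumed layout return None rather than raising an error, so a report can count them separately instead of silently dropping them.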

Understanding how this data is interpreted and displayed in a readable format for subsequent decision making is an important part of any statistical analysis. Users should be aware that statistical analysis software can present the same data in different ways. The following sections address some of the important decisions that such software must make when creating reports on your web site activity.

URLs and Referrer Fields

While these fields are useful and support reasonable characterizations of site traffic, several limitations complicate attempts at path reconstruction. The URL recorded is the URL as requested by the user, not the location of the file returned by the server. This can cause pages to be tabulated incorrectly when the request involves relative hyperlinks, symbolic links, or hard-coded expansion and translation rules; for example, directories do not always translate to “index.html.” It can also lead to two paths being treated as different when they actually contain the same content. While both pieces of information are useful, the canonical file system-based URL returned by the server would arguably be more useful, because it removes the ambiguity about which resource was returned to the user.
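
One way an analysis tool might reduce this double counting is to map requested URLs to a canonical form before tallying. The Python sketch below is only an approximation under stated assumptions: it assumes directory requests are served by a default document named index.html (the source notes this is not always true), it ignores query strings, and it cannot detect symbolic links or server-side rewrite rules, which are visible only on the server. The function name and sample URLs are hypothetical.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

# Assumption: directory requests are answered with this default document.
DEFAULT_DOCUMENT = "index.html"


def canonical_path(requested_url):
    """Approximate the served file path for a requested URL."""
    path = urlsplit(requested_url).path or "/"
    ends_with_slash = path.endswith("/")
    # Resolve "." and ".." segments; normpath also drops a trailing slash.
    path = posixpath.normpath(path)
    if ends_with_slash and path != "/":
        path += "/"
    # A path ending in "/" is a directory request under our assumption.
    if path.endswith("/"):
        path += DEFAULT_DOCUMENT
    return path


# Hypothetical requests that a log would record as three different URLs.
requests = [
    "/products/",
    "/products/index.html",
    "/products/./index.html?ref=home",
    "/support/faq.html",
]
hits = Counter(canonical_path(url) for url in requests)

if __name__ == "__main__":
    for path, count in hits.items():
        print(path, count)
```

A request for "/products" without the trailing slash is left unchanged, because the log alone does not reveal whether that name refers to a file or a directory.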

In addition, the contents of the referrer field vary considerably. Some browsers and proxies do not send this information to the server, for privacy and other reasons. The value of the referrer field is also undefined when the user requests a page by typing in the URL, selects a page from a Favorites/Bookmarks list, or uses other navigational aids such as the history list. Furthermore, several browsers provide conflicting values for the referrer field. To illustrate, suppose a user selects a listing for the Dell Corporation on Yahoo. In requesting the Dell splash page, the browser provides the URL of the Yahoo page as the value of the referrer field. Now suppose the user clicks through to the Products page, returns to the Dell splash page, and reloads the splash page. In several popular browsers, the second request for the Dell splash page again carries the Yahoo URL as the referrer, although the last page viewed on the user's surfing path was the Products page on the Dell site. If one reconstructs paths by relying on the referrer field, two paths, apparently from two users, may be identified instead of one. Given these limitations, strong reliance on the information in the referrer field may be more problematic than one would initially expect.
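
The splash page scenario can be reproduced with a deliberately naive path reconstruction routine. The Python sketch below is not how any particular analysis package works; it simply appends a request to a path whose last page equals the request's referrer and otherwise starts a new path. The URLs use placeholder hosts (directory.example and shop.example) standing in for the directory listing and the vendor site.

```python
def reconstruct_paths(requests):
    """Group (url, referrer) pairs, given in time order, into surfing paths.

    Naive rule: extend the first path whose last page matches the referrer;
    if none matches, assume a new path has begun.
    """
    paths = []
    for url, referrer in requests:
        for path in paths:
            if path[-1] == referrer:
                path.append(url)
                break
        else:
            paths.append([url])
    return paths


# Placeholder URLs for the scenario described above.
LISTING = "http://directory.example/listing"
SPLASH = "http://shop.example/"
PRODUCTS = "http://shop.example/products"

requests = [
    (SPLASH, LISTING),    # arrives from the directory listing
    (PRODUCTS, SPLASH),   # clicks through to the products page
    (SPLASH, LISTING),    # goes back and reloads; browser resends the old referrer
]
paths = reconstruct_paths(requests)

if __name__ == "__main__":
    for number, path in enumerate(paths, start=1):
        print(number, " -> ".join(path))
```

The single visit is split into two paths, because the reloaded request names the directory listing as its referrer rather than the products page that was actually viewed last.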

User Agent Fields

The user agent field also suffers from imprecise semantics, differing implementations, and missing data. This can partially be attributed to browser vendors' use of the field for content negotiation. Because the rendering of HTML differs from browser to browser, servers can alter the HTML they send based on which browser is on the other end. Consequently, the user agent field may contain the names of multiple browsers. Some proxies also append information to this field. In addition, the value of the user agent field can vary across requests made by the same user with the same Web browser. Adding to the confusion, there is no standardized way to determine whether requests are made by autonomous agents (e.g., robots), semi-autonomous agents acting on behalf of users (e.g., tools that copy a set of pages for off-line reading), or humans following hyperlinks in real time. Clearly, it is important to distinguish these classes of requests when attempting to model surfing behaviors.
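
Because no standard exists, analysis tools generally fall back on heuristics. The Python sketch below shows one such heuristic based on substring matching. The hint lists and the list of browser names are illustrative assumptions, not an authoritative catalogue, and a client can send any user agent string it likes, so the result is a guess rather than a fact.

```python
# Illustrative hint lists (assumptions); real tools maintain much longer ones.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")
OFFLINE_HINTS = ("wget", "httrack", "offline")
KNOWN_BROWSER_NAMES = ("Mozilla", "MSIE", "Opera", "Netscape")


def classify_user_agent(user_agent):
    """Guess which class of client sent a request, from its user agent string."""
    if not user_agent:
        return "unknown"  # missing data: the field was empty or not sent
    lowered = user_agent.lower()
    if any(hint in lowered for hint in ROBOT_HINTS):
        return "robot"
    if any(hint in lowered for hint in OFFLINE_HINTS):
        return "offline-copier"
    return "presumed-human"


def browser_names(user_agent):
    """List every known browser name that appears in the string."""
    lowered = user_agent.lower()
    return [name for name in KNOWN_BROWSER_NAMES if name.lower() in lowered]


if __name__ == "__main__":
    samples = [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Wget/1.12 (linux-gnu)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for sample in samples:
        print(classify_user_agent(sample), browser_names(sample))
```

The third sample illustrates the multiple-name problem: the string names both Mozilla and MSIE, so counting browsers by the first token alone would misattribute the request.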
