Hack 79. Accurately Measure Downloads

Many sites need to know not just who has requested a downloadable file but whether that file was successfully delivered. Fortunately, in many instances, you can use your web measurement application and server logfiles to make this determination.

Measuring downloads can be crucial for many types of web sites, including those belonging to software vendors, content publishers, and computer gaming companies. Accurately measuring downloads from such sites can be challenging. Many common approaches to web analytics do not allow downloads to be measured at all. Page-tagging data collection methods are oriented toward measuring web page views only, not the download of the thousands of other file types that may be distributed through web sites. If tracking downloads is important for your site, you will need to take this into special consideration when selecting a web analytics product or service. In addition, tracking downloads has been complicated by the proliferation of "download managers" that are used to speed download times for users.

5.13.1. Multiple Levels of Measurement Can Be Used to Track Downloads

Downloads can be tracked to a lesser or greater degree of granularity and accuracy, depending on how involved you want to get. The first level of granularity includes determining that a visitor requested a download and that the particular download began. The second level of granularity includes determining that a download began and was actually completed, and the third includes determining that a download was completed in multiple parts by a download manager.

5.13.1.1 Mine basic HTTP requests.

The first level of download measurement requires a web analytics product that collects HTTP request data directly from your web and application servers, either into logfiles or into a database. For example, if a web site is distributing trial versions of multiple software products, you must meet the following minimum requirements to measure that a download was started by a visitor to the site.

  • You must have a web analytics product that allows you to define downloads as a metric separate from your page views metrics. Many web analytics products do not have a construct for downloads; they require that you look at a download as if it were a page view. Though you will be able to see how many downloads were started, your page view counts will be skewed.

  • Make sure your web analytics product is set up to extract requests for the types of files that you publish for download. Different web analytics products require that this be done in different ways, but almost all of them require that you do this proactively. For instance, define .zip, .pdf, .exe, and other file types as "downloads." You may also need to configure your web analytics product so that it does not filter out requests made for these file types. Some products will filter out these requests by default.

Understand that this first level of download measurement is not "accurate" with regard to the completion of a download. It will tell you only that a visitor requested and started a download.
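
As a rough illustration of this first level, the following Python sketch separates requests for downloadable file types from all other requests so that download starts can be counted as their own metric. The extension list and the function names are assumptions made for this example, not settings from any particular web analytics product.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical list of extensions to treat as downloads; adjust to the files you publish.
DOWNLOAD_EXTENSIONS = {".zip", ".pdf", ".exe"}

def classify_request(uri_stem):
    """Return "download" if the URI stem ends in a download extension, else "other"."""
    suffix = PurePosixPath(uri_stem).suffix.lower()
    return "download" if suffix in DOWNLOAD_EXTENSIONS else "other"

def count_requests(uri_stems):
    """Tally download starts separately from every other request."""
    return Counter(classify_request(stem) for stem in uri_stems)

counts = count_requests([
    "/index.html",
    "/downloads/product1.zip",
    "/docs/Product1Docs.PDF",
    "/about/",
])
print(counts["download"], counts["other"])
```

A count produced this way says only that the download was requested; it says nothing about whether the file arrived.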

5.13.1.2 Mine basic HTTP requests for download completion.

The second level of downloads measurement has the same requirements as the first level, plus a few additional requirements. The objective of the second level of download measurement is to determine whether the download was completed. The following steps can be taken to make this determination.

  • Create a delimited file listing all of the URIs corresponding to the downloadable files you host for distribution through your web site. For each downloadable file, the list needs a label, the URI stem of the file, and the actual byte count of the file. The following is an example of such a delimited file:

     Product1TrialVersion|downloads/product1.zip|48947000
     Product2TrialVersion|downloads/product2.zip|59936897
     Product1Docs|downloads/product1docs.pdf|7893457
     Product2Docs|downloads/product2docs.pdf|7864509

  • Configure your web server to capture the actual bytes sent when a request for a particular download is made. This configuration varies by web server. In Microsoft Internet Information Server, this field is called bytes sent or sc-bytes and can be enabled with a checkbox in the extended logging properties dialog box within the Microsoft IIS web site properties.

  • When you process your log data, have your web measurement product look for the URI stems defined in your delimited file. When a log record matches one, look up the actual byte count of that download in your list of downloadable files and compare it with the bytes sent for that request. If the bytes sent equal or exceed the actual byte count for that downloadable file, increment your count of successful downloads of that file by one. If the bytes sent are less than the actual byte count, increment the count of unsuccessful downloads of that file by one.
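
The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: the catalog uses the label|URI stem|byte count layout described above, the log has already been reduced to (cs-uri-stem, sc-bytes) pairs, and the request byte counts in the sample are invented for the example. The function names are hypothetical.

```python
from collections import Counter

# Sample catalog in the label|URI stem|byte count layout (values are illustrative).
CATALOG_TEXT = """\
Product1TrialVersion|downloads/product1.zip|48947000
Product1Docs|downloads/product1docs.pdf|7893457
"""

def load_catalog(text):
    """Map each URI stem (without a leading slash) to its actual byte count."""
    catalog = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _label, uri_stem, byte_count = line.split("|")
        catalog[uri_stem.lstrip("/")] = int(byte_count)
    return catalog

def tally_downloads(requests, catalog):
    """Count successful and unsuccessful downloads per file.

    requests: iterable of (cs_uri_stem, sc_bytes) pairs taken from the log.
    """
    results = Counter()
    for uri_stem, sc_bytes in requests:
        key = uri_stem.lstrip("/")
        if key not in catalog:
            continue  # not one of the tracked downloads
        outcome = "successful" if sc_bytes >= catalog[key] else "unsuccessful"
        results[(key, outcome)] += 1
    return results

catalog = load_catalog(CATALOG_TEXT)
results = tally_downloads(
    [
        ("/downloads/product1.zip", 48947211),  # full file sent
        ("/downloads/product1.zip", 1203400),   # aborted early
        ("/index.html", 5120),                  # ignored: not in the catalog
    ],
    catalog,
)
print(results)
```

Comparing each request on its own, as this sketch does, will misreport downloads that a download manager splits across several requests.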

5.13.1.3 Compensate for download managers.

The third level of accurate download measurement compensates for the use of download managers. Your site visitors may use one of the many available download managers to speed the downloading of files. FlashGet, GetRight, Go!Zilla, Fresh Download, and Internet Download Manager are among the commonly used download management applications. While these applications ease the process of downloading files for your visitors, they may add complexity to the measurement of downloads from your site.

Consider an example in which FlashGet was used to download a large .zip file (48947000 bytes). Here are the relevant log records; FlashGet shows up in the logs with a cs(User-Agent) of Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98):

 date time c-ip cs-username s-sitename s-computername s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent) cs(Cookie) cs(Referer)
 2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 200 10420488 249 32000 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
 2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9961776 273 30703 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
 2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9961776 273 30968 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
 2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9929568 273 31031 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
 2004-11-22 12:49:17 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 10027312 273 31656 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test

The log records show that FlashGet made five GET requests for /product1.zip, and the sc-bytes for the five requests add up to 50,300,920. The first request has HTTP status code 200 (the request succeeded), and the second through fifth requests have status code 206 (partial content; the partial GET request succeeded). FlashGet accepted a persistent cookie and let the user set cs(Referer), which is Download+Test in this example; most often you will find cs(Referer) to be null.

To track a successful download for a particular download manager user, identified by a particular cookie (cs(Cookie)), add up the sc-bytes from the initial 200 status record for the download file and the subsequent 206 records carrying the same cs(Cookie). The download counts as successful once that sum reaches or exceeds the downloadable file's actual byte count. The sum will generally exceed the actual byte count, because most download managers request more than a proportional part of the file with each request, as an intentional overlap that allows the parts to be concatenated correctly. In an unsuccessful multi-part download, the sum of the sc-bytes for the 200 request and the 206 requests is less than the actual byte count of the file.

Download managers other than the one used here may behave differently, so similar but not identical rules may need to be applied to them.
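
Applied to the FlashGet example, the summing rule can be sketched in Python as follows. The tuples are reduced from the five log records shown earlier (cookie, cs-uri-stem, sc-status, sc-bytes); the function name and the choice to group by cookie and URI stem are assumptions made for this illustration.

```python
from collections import defaultdict

FILE_SIZE = 48947000  # actual byte count of product1.zip in the example

# (cs(Cookie), cs-uri-stem, sc-status, sc-bytes) from the five FlashGet records
records = [
    ("c=419389F68E854973", "/product1.zip", 200, 10420488),
    ("c=419389F68E854973", "/product1.zip", 206, 9961776),
    ("c=419389F68E854973", "/product1.zip", 206, 9961776),
    ("c=419389F68E854973", "/product1.zip", 206, 9929568),
    ("c=419389F68E854973", "/product1.zip", 206, 10027312),
]

def bytes_by_cookie(records):
    """Sum sc-bytes of 200 and 206 responses per (cookie, URI stem)."""
    totals = defaultdict(int)
    for cookie, uri_stem, status, sc_bytes in records:
        if status in (200, 206):
            totals[(cookie, uri_stem)] += sc_bytes
    return dict(totals)

totals = bytes_by_cookie(records)
for (cookie, uri_stem), total in totals.items():
    verdict = "successful" if total >= FILE_SIZE else "unsuccessful"
    print(cookie, uri_stem, total, verdict)
# prints: c=419389F68E854973 /product1.zip 50300920 successful
```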


It is highly unlikely that you will want to invest the time and resources to investigate your log data in this manner for every download attempt made from your web site. Certain web measurement products can automate this process for you, as long as they can consider multiple log records that arrive over time and operate on them together to determine whether a download was successful or unsuccessful. Your vendor's solution may define a "rule" to produce a list of cookies, or IP address and user agent combinations, that make multiple GET requests for the same object (starting with a 200 record followed by multiple 206 records) within some time span. With this information, you could assess the behavior of various download managers and set your web analytics product's rules appropriately for tracking successful and unsuccessful downloads attempted by download managers.
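
A rule of this kind might look something like the following Python sketch. It is not any vendor's implementation: the function name, the ten-minute window, and the minimum of two 206 records are hypothetical values chosen for illustration, and the client key can be a cookie or an IP address and user agent combination.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_multipart_clients(records, window=timedelta(minutes=10), min_partials=2):
    """List (client_key, uri_stem) pairs that look like download-manager traffic.

    records: iterable of (timestamp, client_key, uri_stem, sc_status).
    A pair is flagged when a 200 response is followed by at least
    min_partials 206 responses for the same object within the window.
    """
    grouped = defaultdict(list)
    for timestamp, client_key, uri_stem, status in records:
        grouped[(client_key, uri_stem)].append((timestamp, status))

    flagged = []
    for key, hits in grouped.items():
        hits.sort()  # by timestamp, then status, so a 200 precedes same-second 206s
        for i, (start, status) in enumerate(hits):
            if status != 200:
                continue
            partials = sum(
                1 for ts, st in hits[i + 1:] if st == 206 and ts - start <= window
            )
            if partials >= min_partials:
                flagged.append(key)
                break
    return flagged

t = datetime(2004, 11, 22, 12, 49, 16)
sample = [
    (t, "c=419389F68E854973", "/product1.zip", 200),
    (t, "c=419389F68E854973", "/product1.zip", 206),
    (t + timedelta(seconds=1), "c=419389F68E854973", "/product1.zip", 206),
    (t, "c=0000000000000000", "/product1.zip", 200),  # hypothetical single-request client
]
flagged = find_multipart_clients(sample)
print(flagged)
```

The flagged pairs are a starting point for inspecting how a given download manager behaves before writing a success rule for it.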

Jim MacIntyre and Eric T. Peterson



    Web Site Measurement Hacks: Tips & Tools to Help Optimize Your Online Business
    ISBN: 0596009887
    Year: 2005
    Pages: 157