Hack 79. Accurately Measure Downloads
Many sites need to know not just who has requested a downloadable file but whether that file was successfully delivered. Fortunately, in many instances, you can use your web measurement application and server logfiles to make this determination.
Measuring downloads can be crucial for many types of web sites, including those belonging to software vendors, content publishers, and computer gaming companies. Accurately measuring downloads from such sites can be challenging. Many common approaches to web analytics do not allow downloads to be measured at all. Page-tagging data collection methods are oriented toward measuring web page views only, not the download of the thousands of other file types that may be distributed through web sites. If tracking downloads is important for your site, you will need to take this into special consideration when selecting a web analytics product or service. In addition, tracking downloads has been complicated by the proliferation of "download managers" that are used to speed download times for users.
5.13.1. Multiple Levels of Measurement Can Be Used to Track Downloads
Downloads can be tracked to a lesser or greater degree of granularity and accuracy, depending on how involved you want to get. The first level of granularity includes determining that a visitor requested a download and that the particular download began. The second level of granularity includes determining that a download began and was actually completed, and the third includes determining that a download was completed in multiple parts by a download manager.
5.13.1.1 Mine basic HTTP requests.
The first level of measurement of downloads requires that you use a web analytics product that collects the HTTP request data directly from your web and application servers, either into logfiles or into a database.
For example, if a web site is distributing trial versions of multiple software products, then you must meet the following minimum requirements to measure "that a download was started by a visitor to the site."
Understand that this first level of download measurement is not "accurate" with regard to the completion of a download. It will tell you only that a visitor requested and started a download.
5.13.1.2 Mine basic HTTP requests for download completion.
The second level of download measurement has the same requirements as the first level, plus a few additional requirements. The objective of the second level of download measurement is to determine whether the download was completed. The following steps can be taken to make this determination.
5.13.1.3 Compensate for download managers.
The third level of accurate download measurement compensates for the use of download managers. Your site visitors may use one of the many available download managers to speed the downloading of files. FlashGet, GetRight, Go!Zilla, Fresh Download, and Internet Download Manager are among the commonly used download management applications. While these applications ease the process of downloading files for your visitors, they may add complexity to the measurement of downloads from your site. Consider an example in which FlashGet was used to download a large .zip file (48,947,000 bytes). Here are the relevant log records (FlashGet shows up in the logs with a cs(User-Agent) of Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98)):
date time c-ip cs-username s-sitename s-computername s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent) cs(Cookie) cs(Referer)
2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 200 10420488 249 32000 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9961776 273 30703 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9961776 273 30968 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
2004-11-22 12:49:16 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 9929568 273 31031 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
2004-11-22 12:49:17 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET /product1.zip - 206 10027312 273 31656 Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98) c=419389F68E854973 Download+Test
The log records show that FlashGet made five GET requests for /product1.zip, and the sc-bytes for the five requests add up to 50,300,920.
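The arithmetic on these records can be checked with a short script. The following is a minimal sketch, not part of any particular web measurement product: it assumes the log fields are separated by whitespace, that empty fields appear as a standalone hyphen, and that the field order matches the header shown with the records. The names FIELDS, TEMPLATE, and parse_record are hypothetical and exist only for this illustration.

```python
# Field names in the order given by the header shown with the log records.
FIELDS = ("date time c-ip cs-username s-sitename s-computername s-ip s-port "
          "cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes "
          "time-taken cs(User-Agent) cs(Cookie) cs(Referer)").split()

USER_AGENT = "Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98)"

# The five FlashGet records differ only in time, status, and byte counts,
# so they are rebuilt here from a template to keep the listing short.
TEMPLATE = ("2004-11-22 {time} 192.168.10.38 - W3SVC1 VS 10.2.1.90 80 GET "
            "/product1.zip - {status} {sc} {cs} {taken} " + USER_AGENT +
            " c=419389F68E854973 Download+Test")

ROWS = [
    ("12:49:16", 200, 10420488, 249, 32000),
    ("12:49:16", 206, 9961776, 273, 30703),
    ("12:49:16", 206, 9961776, 273, 30968),
    ("12:49:16", 206, 9929568, 273, 31031),
    ("12:49:17", 206, 10027312, 273, 31656),
]

LOG_LINES = [
    TEMPLATE.format(time=t, status=status, sc=sc, cs=cs, taken=taken)
    for t, status, sc, cs, taken in ROWS
]


def parse_record(line):
    """Split one whitespace-separated log line into a field-name dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


records = [parse_record(line) for line in LOG_LINES]
total_sc_bytes = sum(int(r["sc-bytes"]) for r in records)
statuses = [r["sc-status"] for r in records]

print(total_sc_bytes)  # 50300920
print(statuses)        # ['200', '206', '206', '206', '206']
```

Real logfiles may quote or encode fields differently, so the strict field-count check is there to fail loudly rather than misalign columns.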
The first request has HTTP status code 200 (the request succeeded), and the second through fifth requests have status code 206 (partial content; the partial GET request succeeded). FlashGet accepted a persistent cookie and let the user set cs(Referer); most often you will find cs(Referer) to be null. To track a successful download for a particular download manager user with a particular cookie (cs(Cookie)), add up the sc-bytes from the initial 200 status record for the download file and the subsequent 206 records from the same cs(Cookie) until the total exceeds the downloadable file's actual byte count. The total will generally exceed the file's actual byte count because most download managers request more than a proportional part of the file with each request, an intentional overlap that allows the parts to be concatenated correctly. In the FlashGet example, the 50,300,920 bytes sent exceed the 48,947,000-byte file, so the download completed. In an unsuccessful multi-part download, the sum of the sc-bytes counts for the 200 and 206 requests would be less than the actual byte count of the file.
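The byte-count comparison described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a vendor's method: the Hit tuple, the download_outcomes function, and the cookie value c=HYPOTHETICAL are invented for the example, the file size table is assumed to be maintained by you, and the sketch neither enforces that the 200 record arrives before the 206 records nor separates repeated attempts by the same cookie.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """The four log fields this check needs (hypothetical structure)."""
    cookie: str     # cs(Cookie)
    uri: str        # cs-uri-stem
    status: int     # sc-status
    sc_bytes: int   # sc-bytes


def download_outcomes(hits, file_sizes):
    """Map (cookie, uri) to True when the bytes sent cover the whole file.

    Only downloads that include a 200 record are reported. A total equal
    to the file size is treated as complete, which is an assumption; the
    text only distinguishes totals above and below the file size.
    """
    sent = defaultdict(int)
    started = set()
    for hit in hits:
        if hit.uri not in file_sizes:
            continue  # not a tracked download
        key = (hit.cookie, hit.uri)
        if hit.status == 200:
            started.add(key)
        if hit.status in (200, 206):
            sent[key] += hit.sc_bytes
    return {key: sent[key] >= file_sizes[key[1]] for key in started}


FILE_SIZES = {"/product1.zip": 48947000}

HITS = [
    # The five FlashGet requests from the example.
    Hit("c=419389F68E854973", "/product1.zip", 200, 10420488),
    Hit("c=419389F68E854973", "/product1.zip", 206, 9961776),
    Hit("c=419389F68E854973", "/product1.zip", 206, 9961776),
    Hit("c=419389F68E854973", "/product1.zip", 206, 9929568),
    Hit("c=419389F68E854973", "/product1.zip", 206, 10027312),
    # An invented visitor whose download stopped after two parts.
    Hit("c=HYPOTHETICAL", "/product1.zip", 200, 10420488),
    Hit("c=HYPOTHETICAL", "/product1.zip", 206, 9961776),
]

outcomes = download_outcomes(HITS, FILE_SIZES)
for (cookie, uri), completed in outcomes.items():
    print(cookie, uri, "completed" if completed else "incomplete")
```

Because sc-bytes typically includes response headers and overlapping ranges, the total is a rough signal of completion rather than proof that every byte of the file arrived.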
It is highly unlikely that you will want to invest time and resources to investigate your log data in this manner for all of the download attempts that occur from your web site. Certain web measurement products can automate this process for you, as long as they can consider multiple log records that come in over time and operate on them together to determine whether a download was successful or unsuccessful. Your vendor's solution may define a "rule" to produce a list of cookies or IP and user agent combinations that make multiple GET requests for the same object (starting with a 200 record followed by multiple 206 records) within some time span. With this information, you could assess the behavior of various download managers and set your web analytics product's rules appropriately for tracking successful and unsuccessful downloads attempted by download managers.
Jim MacIntyre and Eric T. Peterson
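A rule of the kind described above, one that lists cookies or IP and user agent combinations making a 200 request followed by multiple 206 requests for the same object within a time span, might look like the following. This is a rough sketch and does not reflect how any specific product defines its rules: the Hit tuple, client_key, multipart_clients, the ten-minute window, and the threshold of two 206 records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Hit(NamedTuple):
    when: datetime
    ip: str          # c-ip
    user_agent: str  # cs(User-Agent)
    cookie: str      # cs(Cookie), "-" when no cookie was sent
    uri: str         # cs-uri-stem
    status: int      # sc-status


def client_key(hit):
    """Identify the client by cookie, else by IP and user agent."""
    if hit.cookie != "-":
        return ("cookie", hit.cookie)
    return ("ip+ua", hit.ip, hit.user_agent)


def multipart_clients(hits, window=timedelta(minutes=10), min_parts=2):
    """List (client key, uri) pairs with a 200 record followed by at
    least min_parts 206 records for the same object inside the window."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[(client_key(hit), hit.uri)].append(hit)

    flagged = []
    for (key, uri), group in grouped.items():
        for first in (h for h in group if h.status == 200):
            parts = [h for h in group if h.status == 206
                     and timedelta(0) <= h.when - first.when <= window]
            if len(parts) >= min_parts:
                flagged.append((key, uri))
                break
    return flagged


UA = "Mozilla/4.0+(compatible;+MSIE+5.00;+Windows+98)"
T0 = datetime(2004, 11, 22, 12, 49, 16)

HITS = [
    # The FlashGet example: one 200 and four 206 records, same cookie.
    Hit(T0, "192.168.10.38", UA, "c=419389F68E854973", "/product1.zip", 200),
    Hit(T0, "192.168.10.38", UA, "c=419389F68E854973", "/product1.zip", 206),
    Hit(T0, "192.168.10.38", UA, "c=419389F68E854973", "/product1.zip", 206),
    Hit(T0, "192.168.10.38", UA, "c=419389F68E854973", "/product1.zip", 206),
    Hit(T0 + timedelta(seconds=1), "192.168.10.38", UA,
        "c=419389F68E854973", "/product1.zip", 206),
    # An invented cookieless visitor making a single ordinary request.
    Hit(T0, "192.168.10.99", "ExampleBrowser/1.0", "-", "/product1.zip", 200),
]

flagged = multipart_clients(HITS)
print(flagged)
```

The window and threshold would need tuning against your own logs, since different download managers split files into different numbers of parts.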