Hack 76. Hacking into Page-Level Details for Language
Generate a page-by-language text file that can be mined for deep and rich information.

While most web measurement applications do a pretty good job reporting on the percentage distribution of languages used to browse your site, very few go so far as to let you segment your visitor activity [Hack #48] by language. Fortunately, if you have even a modest amount of control over your web site, you can write a relatively simple hack to provide this information.

5.10.1. The Code

The following code is written in VBScript for Microsoft's Active Server Pages, but could quite easily be adapted to PHP, Perl, or Java. You should save the following code as language_by_page.inc and include it in your header files.

    <%
    Dim fso, lf, df
    df = "en" ' Default language for site is English (en)

    ' Test to see if current visitor is using other than the default language (df)
    if (Left(REQUEST.SERVERVARIABLES("HTTP_ACCEPT_LANGUAGE"), 2) <> df AND _
        REQUEST.SERVERVARIABLES("HTTP_ACCEPT_LANGUAGE") <> "") then

      ' If so, open the language_by_page.txt file for appending
      ' (8 = append mode; True = create the file if it does not exist)
      Set fso = CreateObject("Scripting.FileSystemObject")
      Set lf = fso.OpenTextFile("c:\websites\webanalytics\cgi-data\language_by_page.txt", 8, True)

      ' Append the name of the script, the language and the time stamp
      lf.WriteLine(REQUEST.SERVERVARIABLES("SCRIPT_NAME") & "|" & _
          REQUEST.SERVERVARIABLES("HTTP_ACCEPT_LANGUAGE") & "|" & now())

      ' Don't forget to close the "lf" object
      lf.Close
    end if
    %>

5.10.2. Running the Code

Here is what you need to do to have language_by_page.inc track the pages your non-English visitors are viewing:
That's it. As soon as a visitor whose browser language is set to something other than English requests a page, the request is saved in the language_by_page.txt file for future analysis. To treat a different language as the site default, change the df variable; consult http://www.w3.org/WAI/ER/IG/ert/iso639.htm for a complete list of the ISO 639 two-letter language codes used by HTTP_ACCEPT_LANGUAGE.
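To make the logging rule concrete, here is a minimal sketch of the same test written in Python rather than the hack's VBScript. The function name should_log and the sample header values are illustrative assumptions, not part of the original hack; the logic simply mirrors the two conditions in the code above (a non-empty header whose first two characters differ from the default language).

```python
def should_log(accept_language, default="en"):
    """Return True when the hack would write a line to language_by_page.txt.

    Mirrors the VBScript condition: the header is not empty and its
    first two characters differ from the default language code.
    """
    return accept_language != "" and accept_language[:2] != default


# Hypothetical Accept-Language header values, for illustration only
print(should_log("fr-fr,fr;q=0.8,en;q=0.5"))  # French listed first
print(should_log("en-us,en;q=0.5"))           # English listed first
print(should_log(""))                         # header missing
```

Note that only the first language in the header is examined, so a visitor who lists English first and French second is treated as an English-language visitor.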
5.10.3. The Results

What you'll end up with is a text file looking something like Figure 5-15: a list of filenames, language codes, and times, all separated by a pipe character.

Figure 5-15. The language_by_page.txt file

If you import this list into Microsoft Excel or any reasonable database, you can then begin to take a closer look at which pages your non-English speaking visitors are most interested in. These pages are good candidates for translation!

5.10.4. An Alternative to Hacking for Page-Level Details

If you're shy about writing code or you get a funny look from your system administrator when you show him this hack, don't despair. Some web measurement vendors provide the ability to segment visitors by browser language. While this functionality is far from ubiquitous, all you need to ask your vendor is, "Can you show me a report of all activity, including page views, referrals, and conversion events, by visitor browser language?" If you get a blank stare (or silence on the phone), you may want to point them toward this hack and have them do their homework!

Regardless of how you get the data, knowing what your non-English reading visitors are most interested in on your web site, and in what numbers they are visiting, will help you to define a future plan for having a truly global Internet presence.
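If you would rather script the analysis described in "The Results" than use Excel or a database, the following is a minimal sketch in Python. It assumes each line has exactly the page|language|timestamp layout written by the hack; the function name count_pages_by_language and the sample lines are hypothetical and are not taken from Figure 5-15.

```python
from collections import Counter


def count_pages_by_language(lines):
    """Count logged requests per (two-letter language, page) pair."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\r\n").split("|")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        page, accept_language, _timestamp = parts
        # Like the hack's Left(..., 2), keep only the first two characters
        language = accept_language.strip()[:2].lower()
        counts[(language, page)] += 1
    return counts


# Hypothetical sample lines in the page|language|timestamp layout
sample = [
    "/index.asp|fr|1/15/2005 10:02:11 AM",
    "/products.asp|de-de,de;q=0.8,en;q=0.5|1/15/2005 10:03:40 AM",
    "/index.asp|fr-ca,fr;q=0.9|1/15/2005 10:05:02 AM",
    "/index.asp|ja|1/15/2005 10:07:19 AM",
    "not a log line",
]

counts = count_pages_by_language(sample)

# List the most requested (language, page) pairs first
for (language, page), hits in counts.most_common():
    print(language, page, hits)
```

To run this against real data, pass an open file instead of the sample list, for example `with open("language_by_page.txt") as f: counts = count_pages_by_language(f)`. The pages with the highest counts for a given language are the translation candidates the hack is looking for.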