13.4 Cookies

Internet-Enabled Business Intelligence
By William A. Giovinazzo
Chapter 13.  Swimming in the Clickstream


James Joyce once said that the media could turn a bicycle accident into the fall of western civilization. Something like this happened in the late 1990s. The evening news was in a flurry. The White House was putting files on the computers of people who accessed the White House Web site. They did this to track who visited their site and what they did there! How insidious! How corrupt! How totally appropriate for tracking customer behavior. The White House was using a tool designed for just that purpose: cookies.

Regardless of the nefarious media spin put on cookies, they are a safe and simple way of maintaining state over the Internet. A cookie is a simple text file stored either in the memory of the browser or on the client system. These files are totally within the control of the user. If you are concerned about privacy, remember that on the Internet you can maintain as much or as little anonymity as you choose.

Cookies are a tool to help us establish an identity for our customers. We can establish this identity both within an individual session and between sessions. We can view a cookie as the Internet version of a Fred's card, the example used earlier in the chapter. The cookie provides us with a means to link individual visits by an individual customer to our store, just as the Fred's card did for Fred. Whether the user accesses our Web site via a proxy server, a new IP address, or a system shared with other users, we can link that individual with a specific set of sessions. We can also link that customer with sessions from different clients. We discuss how this works later when we examine the use of cookies.

Cookies also allow us to look at the market basket and beyond. When a customer makes a purchase, we can link all of those purchases together via the cookie. We can also use that same cookie to track the customer's path throughout the Web site and to track which products may have been put into the basket and then removed. This was something we were unable to do in the traditional brick-and-mortar world.

Most important to the customer-driven e-enterprise, we can now associate an identity with demographics and behaviors. The ability to analyze and understand this behavior assists the e-enterprise in better servicing the customer. Ultimately, the consumer profits from the use of cookies. As consumers, we need to recognize that the use of personal information in the hands of a trusted supplier is of great personal benefit. As suppliers, we need to recognize that if customers are loath to provide us with personal information, we have a real problem with our customer relationships. Remember, the point is to develop a mutually beneficial relationship. As with all relationships, if there is no trust, there is no real relationship. Let's take a moment to examine how cookies are used.

13.4.1 BAKING COOKIES

Browsers create cookies at the request of a Web server. The cookie resides in memory while the browser is open. When the browser is closed, persistent cookies are written to disk. The Web server requests creation of the cookie with a Set-Cookie HTTP Response Header. The format of the Set-Cookie Response Header is as follows:

 Set-Cookie: Name=value; expires=date; path=path; domain=domain; secure

where

  • Name=value This is the only required attribute of the Set-Cookie command. The name can be any sequence of characters with the exception of semicolons, commas, and white space. Cookies can store up to 4K of data, but typically are between 50 and 150 bytes.

  • expires=date The browser deletes the cookie from the system on this date.

  • path=path The path tells the browser to which Uniform Resource Locators (URLs) the cookie is to be sent. A cookie with the specification path=/hobbits will be sent to URLs /hobbits, /hobbits/sam, and /hobbits/primula/frodo. This path is relative to the domain described next.

  • domain=domain The domain identifies for which domain name the cookie is valid. A comparison is first made of the domain name to see if there is a tail match. A tail match means the name matches the tail of a fully qualified domain name. For example, we may have domain=wizards.org as our specification. This would tail match the domains gandalf.wizards.org, saruman.wizards.org, and dumbledore.wizards.org. Only hosts within a domain can specify that domain name. The default domain value is the host name of the server generating the Set-Cookie command.

  • secure A secure cookie is sent only over HTTPS (HTTP over SSL) connections. In order for the cookie to be sent, the communications must be secure.
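
The path and domain rules above can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular browser. The function names are ours, the path check is the plain prefix match described above, and the domain check assumes that the tail must begin at a dot boundary so that a host such as evilwizards.org does not match wizards.org.

```python
def path_matches(cookie_path: str, request_path: str) -> bool:
    """Prefix match: path=/hobbits covers /hobbits/sam."""
    return request_path.startswith(cookie_path)


def domain_matches(cookie_domain: str, host: str) -> bool:
    """Tail match: the host equals the domain or ends with '.' + domain.

    The dot-boundary requirement is an assumption of this sketch.
    """
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == cookie_domain or host.endswith("." + cookie_domain)


def should_send(cookie: dict, host: str, path: str, is_https: bool) -> bool:
    """Decide whether a stored cookie accompanies a request."""
    if cookie.get("secure") and not is_https:
        return False  # secure cookies travel only over HTTPS
    return (domain_matches(cookie["domain"], host)
            and path_matches(cookie["path"], path))
```

With these definitions, a cookie stored as {"domain": "wizards.org", "path": "/hobbits", "secure": False} would accompany a request to gandalf.wizards.org for /hobbits/sam, but not a request for /elves.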

Since there are so few limitations on cookies, Web servers can do a great deal with them. An individual server can send multiple Set-Cookie HTTP Response Headers at one time. A few rules apply, however. When sending multiple cookies, you must be careful about the order in which you send them. Cookies with more specific path names should be sent before cookies with less specific path names. Cookies with the same name and path overwrite one another, with the latest taking priority.

To delete a cookie, the server simply resends the cookie with the expiration date sometime in the past. This does not necessarily mean that the deletion occurs immediately. The browser can delete a cookie whenever it chooses. If the server sends a request for a cookie to be deleted, the browser may delete it at that time or some time thereafter. The browser can also choose to delete a cookie if the total number of cookies exceeds a maximum number. In most cases, the browser deletes the least recently used cookie even if the cookie has not expired. Users can also delete cookies through their browser or in some cases by simply removing the text file from the directory in which the cookies are saved.
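
As a rough sketch of the deletion request, the following Python uses the standard http.cookies module to build a Set-Cookie value whose expiration date lies in the past. The helper name deletion_header and the placeholder value "deleted" are our own choices for illustration. The only points taken from the text are that the cookie is resent and that its expiration date is in the past.

```python
from http.cookies import SimpleCookie


def deletion_header(name: str, path: str) -> str:
    """Build a Set-Cookie value asking the browser to drop a cookie."""
    cookie = SimpleCookie()
    cookie[name] = "deleted"          # the value no longer matters
    cookie[name]["path"] = path       # should match the original cookie's path
    cookie[name]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"  # in the past
    return cookie[name].OutputString()


print("Set-Cookie:", deletion_header("user", "/freds_books"))
```

As noted above, the browser decides when the cookie actually disappears, so the server should not assume the deletion has taken effect on the very next request.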

Figure 13.6 demonstrates how cookies are created. The process begins when the user sends a request to a Web server. A CGI script, for example, will detect that the request does not contain a cookie, at which point the server assumes that this is the first visit to this site by the user. The CGI program then sends a Set-Cookie HTTP Response Header to the client. Depending on the setting of the browser, the client may or may not accept the cookie. For our purposes, we assume that the cookie is accepted. When subsequent requests are sent to a matching URL in that domain, the cookie is sent with the request. The Web server receives this cookie and knows with whom the server is communicating.

Figure 13.6. Cookie creation

graphics/13fig06.gif
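
The exchange in Figure 13.6 might look something like the following on the server side. This is a hedged sketch rather than a real CGI program: the function handle_request, the cookie name user, and the path /freds_books are assumptions made for illustration, and the code uses Python's standard http.cookies module to read the Cookie request header and build the Set-Cookie response value.

```python
from http.cookies import SimpleCookie


def handle_request(cookie_header: str, new_id: str):
    """Return (user_id, set_cookie_value) for one request.

    cookie_header is the raw Cookie request header ('' when absent).
    set_cookie_value is None when the browser already holds our cookie.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    if "user" in jar:
        # Returning browser: the cookie tells us who is calling.
        return jar["user"].value, None
    # No cookie: assume a first visit and ask the browser to create one.
    out = SimpleCookie()
    out["user"] = new_id
    out["user"]["path"] = "/freds_books"
    return new_id, out["user"].OutputString()
```

A first request, with an empty Cookie header, yields a Set-Cookie value to send back. A later request that carries the cookie yields the stored ID and no new header.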

13.4.2 MUNCHING ON COOKIES

Now that we understand how cookies work, let's look at how they solve some of our problems. Let's see how we can turn a cookie into a Web-based Fred's card. Let's think back to our discussion of market-basket analysis. First, we want to provide continuity between visits. We want to know when a customer returns to our store. Second, we want to detect patterns in his or her purchases. We would also like to perform some analyses with cookies that we were unable to do with our Fred's card. We want to know the path the customer traveled through our store, how the customer arrived at our store, and what items, if any, he or she put back on the shelf.

Let's see if we can create some sort of identity for the customer. The first step in the process is to create our Set-Cookie HTTP Response Header. In this instance, all we want to do is generate a cookie that assigns a new customer an identity. In our example, a customer, Barney, visits our store, Fred's Books, for the first time. When this happens, we respond with the following Set-Cookie HTTP Response Header:

 Set-Cookie: user=foo_0710_109; expires=01-Jan-2100; path=/freds_books; secure 

Since Barney is a first-time user, we don't know who he is, so we give him a temporary ID of foo_0710_109. The temporary ID consists of the date, July 10, and a sequential counter of temporary IDs. This cookie tells us this user is the 109th prospective customer to access our site on July 10. After Barney browses the site for a while, he shuts down his system and leaves the office for the rest of the day.
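
One way to produce temporary IDs of this shape is sketched below. The class name TempIdIssuer is hypothetical, and we assume the counter restarts each day and is not zero-padded, which is consistent with the example ID but not stated outright. A real site would also need to keep the counter somewhere that survives a server restart.

```python
from collections import defaultdict
from datetime import date


class TempIdIssuer:
    """Hands out IDs such as foo_0710_109: prefix, MMDD, daily counter."""

    def __init__(self):
        self._counts = defaultdict(int)  # one counter per calendar day

    def next_id(self, today: date) -> str:
        self._counts[today] += 1
        return f"foo_{today:%m%d}_{self._counts[today]}"
```

Calling next_id 109 times with a July 10 date would hand the 109th caller the ID foo_0710_109.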

Later that afternoon, Barney's coworker, Betty, comes into the office to do some market research on her system. Their office uses DHCP, and the server assigns Barney's old IP address to Betty. As she searches the Web, she is referred to Fred's Books by one of Fred's partner sites, Duresly Manufacturing & Drill Supply. This is Betty's first visit to Fred's site, so she receives the following cookie:

 Set-Cookie: user=foo_0710_146; expires=01-Jan-2100; path=/freds_books; secure 

As we can see, another 37 people have visited this site since Barney's visit. Betty browses the recommended pages and decides to purchase what seems to be a very important book, Object-Oriented Data Warehouse Design: Building the Star Schema by William A. Giovinazzo. As she makes her purchase, we gather her personal information. We now have an identity linked with the user foo_0710_146. At this point, we send her the following cookie:

 Set-Cookie: user=17898; expires=01-Jan-2100; path=/freds_books; secure 

The next time that Betty comes to Fred's Web site, the cookie provides her identification to Fred. Fred's Books is a customer-driven organization. When it sees that Betty is a returning customer, a personalized welcome is put on her home page with a list of recommended books. We may also elect to delete the cookie with the old temporary customer ID by sending a Set-Cookie command with an expiration date in the past.

The following week, Betty sees Barney in the hall and tells him about the fascinating book she bought. Barney is so enthusiastic, he rushes to his system so that he can purchase his own copy from Fred's Books. When he sends his request to the Web server, the cookie that was assigned to him last week is sent with the request. The system immediately recognizes him as a returning customer. When he orders his own copy, we send him the following Set-Cookie HTTP Response Header that identifies him in our system:

 Set-Cookie: user=18003; expires=01-Jan-2100; path=/freds_books; secure 

As did Betty, Barney gets his own identification with Fred's Books. In addition, we link his previous visit to our Web site with his current purchase. Any future purchases he makes are also linked with these visits.

Two weeks later, Barney's wife tells him that he needs to get a birthday gift for his mother-in-law. Being very fond of his mother-in-law, Barney decides to share with her his recent discovery, Object-Oriented Data Warehouse Design: Building the Star Schema by William A. Giovinazzo. Using his home system, he returns to Fred's Books. Since this is the first time he has accessed Fred's from his home system, there is no cookie. He goes through the process of being assigned a temporary user cookie. Not only does he purchase the book for his mother-in-law, but he decides to pick up a few things for himself. He quickly decides on Building the Data Warehouse by W. H. Inmon and Convergence Marketing by Yoram Wind and Vijay Mahajan. When he actually makes his purchase, he is asked if he is a returning customer. Since Fred's Books has a frequent buyer program, Barney gladly identifies himself as a returning customer. The Web server sends his home system a cookie with the same customer identification as his system at work. In the future, if Barney decides to browse from his home or office, all the customer behavior will be associated with his individual account.

Now that we have described the clickstream from the customer perspective, let's examine how we can capture all this information and use it. We can save the cookies in a number of ways, such as with a Perl or CGI script. We can also use a Java program. The simplest method for our purposes is to capture the cookie as part of the transaction log. To do this, we must configure the Web server. Since there are a number of popular Web servers on the market, we will not attempt to define how to perform this configuration.

Typically, we will see the cookie appended to the transaction log record. Just as we did earlier, we take these log entries, parse them, and load them into our database. The sort we performed on the last clickstream was ordered by IP address, date, and time. Now that we have cookie information, we sort the data according to cookie, date, and time. These results are shown in a table similar to what we see in Figure 13.7. Compare this data with the data provided in Figure 13.4.

Figure 13.7. Parsed transaction log with cookies.

graphics/13fig07.gif
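
To make the parse-and-sort step concrete, here is a small sketch. The log layout is an assumption: we suppose the server writes a combined-style record with the cookie appended as a final quoted field, which is one common configuration but not the only one. The names LOG_RE, parse_line, and sort_by_cookie are ours.

```python
import re
from datetime import datetime

# Assumed layout: ip - - [timestamp] "request" status bytes
#                 "referrer" "user agent" "cookie"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)


def parse_line(line: str):
    """Turn one log line into a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def sort_by_cookie(records):
    """Order records by cookie, then by date and time."""
    return sorted(records, key=lambda r: (r["cookie"], r["ts"]))
```

Two requests from the same IP address but with different cookies end up in separate groups after the sort, which is the point of replacing the IP address with the cookie as the leading sort key.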

Another important action we need to take is to define attributes for the Web pages on our site. As customers travel through our site, we need to know more than the specific pages they visit. We need to understand what types of pages they are visiting. Consider the many types of pages that make up a Web site. There are home pages, personalized home pages, and individual product pages. Some pages are dedicated to product categories such as power tools, music, or electronics. We can drill down from these pages and find even more specific pages: circular saws, classical music, and Personal Digital Assistants (PDAs). Figure 13.8 provides an overview of a typical Web site. We can see that it would be impossible to really understand the type of page the customer visited simply by the location of the page within the Web site's structure.

Figure 13.8. Typical Web site layout.

graphics/13fig08.gif

To solve this problem, we list the attributes of the Web page within the HTML header. We can then extract the page name and its attributes from the Web server for incorporation into the data warehouse. Our data mining tool can then evaluate the effect not only of individual pages on customer behavior, but of entire classes of pages. We can determine if personalized home pages ultimately lead to sales. We can ask which product classifications are more effective than others. We may find that pages listing all types of power tools are more effective than pages listing specific categories of power tools.
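
As an illustration of pulling page attributes out of the HTML header, the sketch below reads meta tags with Python's standard html.parser module. The use of meta tags and the attribute names page-type and category are assumptions for the example. The text says only that the attributes are listed within the HTML header.

```python
from html.parser import HTMLParser


class PageAttributeParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.attributes[a["name"]] = a["content"]


def page_attributes(html: str) -> dict:
    """Return the name/content pairs declared in the page's meta tags."""
    parser = PageAttributeParser()
    parser.feed(html)
    return parser.attributes
```

The resulting dictionary for each page can be loaded into the data warehouse alongside the page name, so that clicks can be grouped by class of page as well as by URL.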

The actions of Betty and Barney should generate a clickstream similar to the one shown in Figure 13.7. It is also interesting to note that this is the same clickstream data that was presented in Figure 13.4. Adding the cookie dimension to the data provides us with a much clearer image of what has been happening on our Web site. We can now separate the actions of Betty and Barney much more reliably. We can also link the actions of the individual customers to understand their individual behaviors.

One of the first things we want to do with our cookies is to provide a link between visits. Barney demonstrates how, with cookies, we can link multiple visits. The same cookie links his visits from his office and his home. To retrieve all of his visits from our database, we would simply use the key 18003. In addition, using the IDs established by the cookies, we can link the browsing done prior to any visits that resulted in a purchase. All we need to do is include any entries with the cookie foo_0710_109.
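
A minimal sketch of this linkage follows. It assumes we keep a table, here a plain dictionary called aliases, that records which temporary cookie IDs were later replaced by which permanent customer ID. The function name and the record layout are hypothetical.

```python
def visits_for_customer(records, customer_id, aliases):
    """Return every record that belongs to one customer.

    records: iterable of dicts with a 'user' key holding the cookie value.
    aliases: maps temporary cookie IDs to the permanent customer ID.
    """
    wanted = {customer_id}
    wanted.update(temp for temp, cust in aliases.items() if cust == customer_id)
    return [r for r in records if r["user"] in wanted]


aliases = {"foo_0710_109": "18003", "foo_0710_146": "17898"}
records = [
    {"user": "foo_0710_109", "url": "/freds_books/index.html"},
    {"user": "foo_0710_146", "url": "/freds_books/index.html"},
    {"user": "18003", "url": "/freds_books/checkout.html"},
]
barney = visits_for_customer(records, "18003", aliases)
```

Here barney holds both the anonymous browsing done under foo_0710_109 and the later activity recorded under 18003.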

We would also like to see how the different customers and prospective customers move through our Web site. We can reconstruct the visits made by both Betty and Barney by extracting the clickstream for each and then rebuilding the path according to the time of the page request. In order to derive the path, we must answer some questions. First, when does a visit begin and end? If there is a gap of a day or more between clicks, we could easily assume that these represent separate visits. What if the time span is a matter of hours or minutes? We need to decide the length of inactivity before we call a visit a new visit. We also need to eliminate superfluous entries. As we said earlier, a hit includes many items, such as graphics images and sound files. We need to eliminate these from our clickstream as well.
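
The two decisions described above, where one visit ends and which hits to discard, can be sketched as follows. The 30-minute inactivity threshold and the list of file suffixes are assumptions chosen for illustration. Each site needs to pick its own values.

```python
from datetime import datetime, timedelta

# Hits on these file types are treated as page furniture, not page views.
ASSET_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".wav", ".css", ".js")


def split_into_visits(clicks, max_gap=timedelta(minutes=30)):
    """Group one cookie's clicks into visits.

    clicks: list of (timestamp, url) tuples for a single cookie.
    Returns a list of visits, each a time-ordered list of (timestamp, url).
    """
    pages = sorted(c for c in clicks
                   if not c[1].lower().endswith(ASSET_SUFFIXES))
    visits = []
    for ts, url in pages:
        if visits and ts - visits[-1][-1][0] <= max_gap:
            visits[-1].append((ts, url))   # still the same visit
        else:
            visits.append([(ts, url)])     # gap too long: new visit
    return visits
```

Lowering max_gap splits the clickstream into more, shorter visits. Raising it merges them, so the threshold is worth revisiting once real traffic is available.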

We can now use this path information to link which pages are most frequently associated with a sale and which are not. By following the path of a particular customer, we can see when the path actually leads to the sale of product. We can also see which pages on our site are least frequently visited, that is, where the dead spots are in the store. We can then explore why certain product pages are dead. Is it that the product is truly unpopular? Can we combine this product with more popular items to help improve sales?
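
A simple version of this analysis is sketched below. It assumes each visit has already been reduced to a list of page URLs and that a sale can be recognized by the presence of one known page, here passed in as sale_page. Both function names are ours, and a production data mining tool would do considerably more.

```python
from collections import Counter


def page_conversion(visits, sale_page):
    """For each page, the share of visits containing it that reached a sale."""
    seen = Counter()
    converted = Counter()
    for path in visits:
        made_sale = sale_page in path
        for page in set(path) - {sale_page}:
            seen[page] += 1
            if made_sale:
                converted[page] += 1
    return {page: converted[page] / seen[page] for page in seen}


def dead_spots(visits, all_pages, threshold=0):
    """Pages visited in no more than `threshold` visits."""
    counts = Counter(page for path in visits for page in set(path))
    return sorted(p for p in all_pages if counts[p] <= threshold)
```

For the dead-spot check we need the full list of pages on the site, since a page that nobody visited never appears in the clickstream at all.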

We can also tell the origin of customer visits through the referrer field in the transaction log. We can use this information to find our virtual location on the Web.
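
Tallying the referrer field is straightforward. The sketch below uses Python's standard urllib.parse module to reduce each referrer to its host name and counts the results. Treating an empty or "-" referrer as a direct visit is our assumption about how the log records a missing value.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_hosts(referrers):
    """Count visits by the host name of the referring page."""
    hosts = Counter()
    for ref in referrers:
        host = urlsplit(ref).hostname   # None when no host is present
        hosts[host or "(direct)"] += 1
    return hosts
```

The hosts with the highest counts show which partner sites and search pages are actually sending us traffic.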

As we discussed in the previous section, the use of raw IP addresses can be misleading. The stateless nature of the connection, proxy servers, and dynamic IP address assignments all contribute to the IP address not being a very reliable indicator of the individual user accessing our site. We have proposed that cookies are a preferred method of customer identification. We should note, however, that cookies have their own set of problems.

Cookies identify a particular system, not a person. We noted how Barney purchased products from Fred's Books on his home system. If the next day, Mrs. Barney uses that same system to browse The Art of Doily Design for Fun and Profit, the activity would be attributed to Barney's identification. If she makes a purchase under her husband's account, as far as Fred is concerned, it is the same person. This may not necessarily be a problem. It could be argued that Barney would certainly want to know the types of products in which Mrs. Barney is interested. Being a good husband, if our recommendation engine sent him a notice of a product that is of interest to his wife, he would certainly wish to purchase it. This may or may not be the case. It is really dependent on the individual circumstance. The system designer needs to understand this problem and decide whether it is an issue for his or her analysis.

Users can also actively thwart the system. They may choose to refuse all cookies. They may manually delete their cookies on a regular basis. In other instances, the user may simply not care. In our example, Barney identified himself when he used his home system to browse. He may have chosen not to give us his identity. If he didn't purchase an item, we would not have known that he was a returning customer. He also could have purchased the item using a different credit card, in which case only a house-holding application in our data warehouse would have identified him as the same user.

While there is little we can do to prevent users from deleting cookies from their systems, we can certainly understand their concerns. Although cookies are limited to one domain, third-party companies can use them to follow an individual's path through the Internet. This is shown in Figure 13.9. As you will recall, a server sends a Set-Cookie HTTP Response Header to a client browser making a request of the server. This request could be for a Web page, a file, or even a Graphics Interchange Format (GIF) file embedded within a Web page. In step 1 of Figure 13.9, we see that a user visits the site sports_stuff.com. The page contains the graphic image microimage.gif from a completely different server, sneakyguys.com.

Figure 13.9. Cross-selling with cookies.

graphics/13fig09.gif

In step 2, we see that when the browser receives the HTML page, it makes a request to sneakyguys.com for the GIF file. Since graphics images can be as small as a single pixel, which is the case in our example, few of even the most observant users would notice the additional request. In step 3, the sneakyguys.com server sends a Set-Cookie HTTP Response Header to the client browser along with the GIF file. The browser follows the preferences set by the user and creates the cookie. In our example, when Dad decides to buy that new laser range finder he has been eyeing for the past few months, sports_stuff.com is able to link his identity with his visits to its Web site. We described this process earlier in the chapter.

Later that day, Junior is left at home unsupervised. Being a 15-year-old, tightly wound sack of raging hormones, he sees his chance while the folks are away and is happily clicking his way to hot_nasty_monkey_love.com before the car is even out of the driveway. He thinks he is getting away with murder. The problem is that he doesn't know about sneakyguys.com. In step 4, when he downloads his first page, there it is again: microimage.gif. This time, however, the cookie that Dad got when he bought his range finder is sent along with the request for the GIF file. We see this in step 6. Sneakyguys.com, however, can't differentiate between Junior and Dad. As far as sneakyguys.com is concerned, Dad is interested in something a bit more nefarious than golf. Sneakyguys.com is more than glad to market to Dad's seemingly varied interests.

You might wonder how we can do this if a cookie applies to a single domain. In our example, the cookie does apply to a single domain. A Web page, however, can request images from any number of domains; it is not limited to making requests of servers within its own domain. In this instance, sneakyguys.com has agreements with both sports_stuff and hot_nasty_monkey_love to include its image on their Web pages. Both sites reference a single common server in its own domain.
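
The mechanism can be simulated in a few lines. This is a toy model, not real networking code: TrackerServer stands in for sneakyguys.com, Browser stands in for the family computer, and the page URLs and the ID format t1, t2, and so on are invented for the example. We also assume the image request carries the address of the page that embedded it, which is what lets the tracker tell the two sites apart.

```python
import itertools


class TrackerServer:
    """Serves one tiny image and logs who asked for it, and from where."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.log = []  # (tracker cookie, page that embedded the image)

    def serve_image(self, cookie, referrer):
        tracker_id = cookie or f"t{next(self._ids)}"
        self.log.append((tracker_id, referrer))
        # Only a browser without our cookie needs a Set-Cookie in reply.
        return None if cookie else tracker_id


class Browser:
    """Keeps one cookie per domain and sends it back to that domain only."""

    def __init__(self):
        self.cookies = {}

    def visit(self, page_url, tracker, tracker_domain="sneakyguys.com"):
        sent = self.cookies.get(tracker_domain)
        new_cookie = tracker.serve_image(sent, referrer=page_url)
        if new_cookie:
            self.cookies[tracker_domain] = new_cookie


tracker = TrackerServer()
family_pc = Browser()
family_pc.visit("http://sports_stuff.com/rangefinders.html", tracker)
family_pc.visit("http://second-site.example/index.html", tracker)
```

After the two visits, both entries in tracker.log carry the same cookie value even though the pages belong to different sites. The cookie never left its own domain; the tracker simply sits behind both pages.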

Both Dad and Junior are unaware of the agreement that sneakyguys.com has with sports_stuff.com to purchase their customer information along with the clickstream data. Two weeks later, Dad starts getting email from the Young Norwegian Hedonist Association inviting him to a clothing-optional golf weekend in the Cayman Islands. Fortunately for sports_stuff.com, Dad can't figure out how the heck he got on the mailing list.

The moral of the story? First, as users, we need to be careful which sites we visit. You never know who is watching. My dear sainted Sicilian mother taught me at an early age never to put anything in writing. This is especially true of where you go on the Internet.

Second, as system designers, we need to establish trust with our customers. We discussed how the point of CRM is to establish a mutually beneficial relationship. A critical element of this relationship is trust. One of the ways we can easily lose that trust is to provide customer information to a third party without the customer's express consent. Whenever customer data is collected, the policy on the use of that information should be clearly stated on the Web site and stringently followed.

Finally, we also learn a lesson in permission marketing. While our example shows an abuse of collected customer information, the same techniques need not be abusive. We could apply them to cross-selling. The second site could just as easily have been a golf resort or an online ticketing site. We could provide information on local golf courses and golf excursions, cross-selling these additional services. The difference between a campaign being an invasion of privacy and being helpful advice is permission.

Before we start any sort of marketing campaign, sending customers an avalanche of emails, it is common courtesy to simply ask their permission. According to Tom Osenton in Customer Share Marketing, "capturing a prospect's identity and permission represents the marketing intersection where mass marketing and direct marketing meet. It is at this level that an ongoing, extremely efficient and relevant permission-granted relationship can begin, one that can uniquely complement the work that mass marketing must continue to do in launching new products in helping to build brand equity and in motivating prospects and customers to take action." As we can see, cookies and their use in capturing a customer's behavior can be an extremely powerful tool if used properly.




Internet-Enabled Business Intelligence
ISBN: 0130409510
EAN: 2147483647
Year: 2002
Pages: 113
