Chapter 12: Privacy Protection and Anonymity Services | Security Technologies for the World Wide Web, Second Edition

In this chapter, we focus on the increasingly important field of privacy protection and anonymity services for the WWW. More specifically, we introduce the topic in Section 12.1, elaborate on some early work (mainly in the field of providing anonymity services for electronic mail) in Section 12.2, discuss cookies and their privacy implications in Section 12.3, address technologies to anonymously browse and anonymously publish on the Web in Sections 12.4 and 12.5, ^[1] elaborate on voluntary privacy standards in Section 12.6, and draw some conclusions in Section 12.7. Note that parts of this chapter are taken from [1]. Also note that many countries have data privacy or data protection laws that must be taken into account when personal data are stored, processed, or transmitted. These laws and their implications are not addressed in this book. You may refer to [2] for further information about the legal situation in your country. Last but not least, [3] provides another source of information.

12.1 Introduction

Many users think that browsing the Web is an anonymous activity, because it is not immediately visible to them that many computer systems behind the scenes busily collect information about or related to them. For example, each Web server has a log file that is usually configured to add an entry for every HTTP request message that is received and processed. A fictitious entry may look as follows:

proxy.esecurity.ch - - [13/May/2002:15:04:31 +0200]
"GET /esecurity.html HTTP/1.0"
200 1369 "http://www.esecurity.ch/"
"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

In this example, a client machine with DNS name proxy.esecurity.ch anonymously ^[2] requested the resource http://www.esecurity.ch/esecurity.html using HTTP version 1.0 on the afternoon of May 13, 2002. The Web server accepted the request (indicated by status code 200) and sent back the HTML file esecurity.html, which is 1,369 bytes long. In addition to the information needed to serve the request, the client also sent some information that the server logged: the referring page (http://www.esecurity.ch/) and details of the client platform and software in use. In this example, the client was running Microsoft Internet Explorer version 5.5 on Windows 2000 (which identifies itself as Windows NT 5.0).
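To make the fields of such an entry concrete, the following minimal sketch splits the fictitious entry into its parts. It assumes that the entry is written on a single line in the widely used "combined" log format, with fields separated by spaces; the names LOG_PATTERN, parse_log_entry, and ENTRY are chosen for this illustration only and do not belong to any particular Web server.

```python
import re
from datetime import datetime

# Assumed layout: host, identity, user name, [timestamp], "request line",
# status code, response size, "referring page", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_entry(line):
    """Return the fields of one combined-format log entry as a dict."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("not a combined-format log entry")
    fields = match.groupdict()
    # Convert the textual fields that have a natural Python type.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The fictitious entry from the text, joined onto one line.
ENTRY = (
    'proxy.esecurity.ch - - [13/May/2002:15:04:31 +0200] '
    '"GET /esecurity.html HTTP/1.0" '
    '200 1369 "http://www.esecurity.ch/" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"'
)

if __name__ == "__main__":
    for name, value in parse_log_entry(ENTRY).items():
        print(f"{name:9} {value}")
```

A user field of "-" corresponds to the anonymous request discussed above, and the agent field carries the platform and browser information that the client volunteers.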

Any interested reader may refer to the analyzing service of Privacy.net ^[3] to learn about the information his or her browser provides when it connects to a Web server. ^[4] For example, Figure 12.1 illustrates a corresponding Web page rendered by the Opera browser. In this example, the server correctly recognizes that the client is running Opera version 6.0 (English version) on a Windows NT 4.0 platform. The server also learns about the browser settings related to JavaScript, cookies, plug-ins, and many other features that are not shown in Figure 12.1. Note, for example, that the browser reveals that it is running the Shockwave Flash plug-in version 5.0. From a security point of view, this means that the browser may be attacked using a vulnerability and corresponding exploit related to this specific version of the plug-in. From a privacy point of view, it also reveals that the browser is used to display animated Web sites.

Figure 12.1: Privacy.net's dynamically created Web page to illustrate the client-side settings rendered by the Opera browser. (© 2002 Opera Software.)

Even more information is available to local network administrators and Internet service providers (ISPs). Their internetworking devices are usually configured to log relevant information. Most importantly, their HTTP proxy servers keep track of every Web site and URL that is requested by a user. Consequently, local network administrators and ISPs are the ones most likely to be able to establish user profiles. These profiles may threaten the privacy of users, and how far administrators and ISPs may go in establishing them is the subject of an ongoing (legal) discussion.
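How little effort such profiling takes can be illustrated with a minimal sketch. It assumes that a proxy log has already been reduced to pairs of client address and requested URL; the function name build_profiles, the sample records, and the example.com host names are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def build_profiles(requests):
    """Map each client to a Counter of the Web sites it requested."""
    profiles = defaultdict(Counter)
    for client, url in requests:
        host = urlsplit(url).hostname
        if host is not None:
            profiles[client][host] += 1
    return profiles


# Hypothetical proxy log records: (client address, requested URL).
SAMPLE_REQUESTS = [
    ("10.0.0.5", "http://www.example.com/health/diabetes.html"),
    ("10.0.0.5", "http://www.example.com/health/diet.html"),
    ("10.0.0.5", "http://shop.example.org/cart"),
    ("10.0.0.7", "http://news.example.net/"),
]

if __name__ == "__main__":
    for client, sites in build_profiles(SAMPLE_REQUESTS).items():
        print(client, sites.most_common())
```

Even this crude count per Web site hints at the interests of the user behind an address; a real profile could additionally draw on the full URLs and the request times that the proxy records.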

The mechanism of choice to establish user profiles is traffic analysis. According to RFC 2828 [4], the term traffic analysis refers to the "inference of information from observable characteristics of data flow(s), even when the data is encrypted or otherwise not directly available. Such characteristics include the identities and locations of the source(s) and destination(s), and the presence, amount, frequency, and duration of occurrence." Outside the military, the threat of traffic analysis has largely been ignored. But traffic analysis is becoming a significant threat to the privacy of Web users, and the browsing behavior of Web users is increasingly subject to observation. As Web-based applications and services become more prevalent, this behavior includes the shopping habits and spending patterns of individual users, as well as other personal data that have traditionally been considered private. Similarly, the Web is becoming an important source for information and intelligence gathering. In a competitive environment, a company may wish to protect its current research topics. However, monitoring HTTP data traffic may reveal the company's primary focus. By keeping Web browsing characteristics private, the company's interests are adequately protected. We saw in Chapter 9 that some electronic payment systems (e.g., anonymous electronic cash systems) allow secure financial transactions over the Internet while preserving the untraceability and anonymity that normal cash allows. However, if electronic cash is transmitted over a channel that identifies both the payer and the payee, the transaction may no longer stay anonymous.

Unfortunately, traffic analysis is a threat that is very difficult to protect against, given the architecture of the Internet and WWW. ^[5] For example, simply encrypting the data exchanged between a browser and a Web server (e.g., using the SSL/TLS protocol) does not protect against traffic analysis (i.e., the analysis still reveals that the browser and the Web server are sending IP packets back and forth). Consequently, other security mechanisms are required to protect communicating peers against traffic analysis and to provide corresponding anonymity services.
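The following minimal sketch illustrates why encryption alone does not help. It assumes that an observer records only the time, the source and destination addresses, and the size of each packet, and never looks at the (possibly encrypted) payload; the names FlowSummary and summarize_flows and the sample packets, which use addresses reserved for documentation, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """What an observer learns about one pair of peers without any payload."""
    packets: int = 0
    total_bytes: int = 0
    first_seen: float = float("inf")
    last_seen: float = float("-inf")

    @property
    def duration(self):
        return self.last_seen - self.first_seen


def summarize_flows(packets):
    """Aggregate (timestamp, source, destination, size) records per peer pair."""
    flows = {}
    for timestamp, src, dst, size in packets:
        # Both directions belong to the same conversation.
        key = tuple(sorted((src, dst)))
        flow = flows.setdefault(key, FlowSummary())
        flow.packets += 1
        flow.total_bytes += size
        flow.first_seen = min(flow.first_seen, timestamp)
        flow.last_seen = max(flow.last_seen, timestamp)
    return flows


# Hypothetical observations: (seconds, source, destination, bytes on the wire).
OBSERVED = [
    (0.0, "192.0.2.10", "198.51.100.20", 310),
    (0.2, "198.51.100.20", "192.0.2.10", 1460),
    (0.3, "198.51.100.20", "192.0.2.10", 1460),
    (5.0, "192.0.2.10", "203.0.113.30", 280),
]

if __name__ == "__main__":
    for peers, flow in summarize_flows(OBSERVED).items():
        print(peers, flow.packets, flow.total_bytes, flow.duration)
```

The summary contains the characteristics named in the RFC 2828 definition of traffic analysis (identities of the peers, and the presence, amount, frequency, and duration of their communication), although no packet content was inspected.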

According to [5], there are three types of anonymous communication properties that can be provided individually or in combination:

Sender anonymity;
Receiver anonymity;
Unlinkability of sender and receiver (i.e., connection anonymity).

In short, sender anonymity means that the identity of the party who sent a particular message is hidden, while its receiver and the message itself might not be. Similarly, receiver anonymity means that the identity of the receiver is hidden, while its sender and the message itself might not be. Finally, unlinkability of sender and receiver (also referred to as connection anonymity) means that though the sender and receiver can each be identified individually as participating in some communication, they cannot be identified as communicating with each other.

All three types of anonymous communication properties may be relevant for the WWW. For example, sender anonymity is relevant if somebody wants to publish anonymously on the Web. Refer to Section 12.5 for corresponding technologies. Similarly, receiver anonymity is relevant if somebody wants to browse anonymously through the Web. Refer to Section 12.4 for corresponding technologies. Last but not least, connection anonymity is relevant if somebody wants to hide the fact that he or she is participating in some Web traffic. In Section 12.4.4, we will learn about a technology called onion routing that can be used to implement anonymous connections.

^[1] In some literature, these technologies are also referred to as privacy enhancing technologies (PETs).

^[2] The fact that the request was anonymous is represented by the empty user name that would follow the client name in the log file.

^[3] http://privacy.net/analyze

^[4] German-speaking readers may also refer to http://www.datenschutz.ch for an analysis of the browser's privacy settings.

^[5] In leased lines and circuit-switched networks, traffic padding may be used to protect against traffic analysis.