Section 26.2. Usability Is Even More Important for Privacy | Security and Usability: Designing Secure Systems That People Can Use

26.2. Usability Is Even More Important for Privacy

Usability affects security in systems that aim to protect data confidentiality. But when the goal is privacy, usability can become even more important. A large category of anonymizing networks, such as Tor, JAP, Mixminion, and Mixmaster, aim to hide not only what is being said, but also who is communicating with whom, which users are using which web sites, and so on. These systems have a broad range of users, including ordinary citizens who want to avoid being profiled for targeted advertisements, corporations that don't want to reveal information to their competitors, and law enforcement and government intelligence agencies that need to do operations on the Internet without being noticed.

Anonymizing networks work by hiding users among users. An eavesdropper might be able to tell that Alice, Bob, and Carol are all using the network, but should not be able to tell which of them is talking to Dave. This property is summarized in the notion of an anonymity set: the total set of people who, as far as the attacker can tell, might be the one engaging in some activity of interest. The larger the set, the more anonymous the participants.^[2] When more users join the network, existing users become more secure, even if the new users never talk to the existing ones!^[3] Thus, "anonymity loves company."^[4]

^[2] Assuming that all participants are equally plausible, of course. If the attacker suspects Alice, Bob, and Carol equally, Alice is more anonymous than if the attacker is 98% suspicious of Alice and 1% suspicious of Bob and Carol, even though the anonymity sets are the same size. Because of this imprecision, recent research is moving beyond simple anonymity sets to more sophisticated measures based on the attacker's confidence.

^[3] Alessandro Acquisti, Roger Dingledine, and Paul Syverson, "On the Economics of Anonymity," in Rebecca N. Wright (ed.), Financial Cryptography (Springer-Verlag, LNCS 2742, Jan. 2003); Adam Back, Ulf Möller, and Anton Stiglic, "Traffic Analysis Attacks and Trade-offs in Anonymity Providing Systems," in Ira S. Moskowitz (ed.), Information Hiding (IH 2001), (Springer-Verlag, LNCS 2137, Apr. 2001).

^[4] This catch phrase was first made popular in our context by Michael Reiter and Aviel Rubin, "Crowds: Anonymity for Web Transactions," ACM Transactions on Information and System Security 1:1 (June 1998).
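Footnote 2's point can be made concrete with one of the confidence-based measures it alludes to. Entropy-based measures from the anonymity literature treat the attacker's suspicion as a probability distribution over candidates and compute its Shannon entropy. Two raised to that entropy then works as an "effective" anonymity set size. The sketch below illustrates that idea only; it is not a metric this chapter prescribes, and the function names are illustrative.

```python
import math
from typing import Sequence


def anonymity_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy, in bits, of the attacker's suspicion over candidates."""
    total = sum(probs)
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    # Candidates the attacker has ruled out (p == 0) contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def effective_set_size(probs: Sequence[float]) -> float:
    """Size of a uniform anonymity set that would have the same entropy."""
    return 2 ** anonymity_entropy(probs)


# Footnote 2's example: Alice, Bob, and Carol as the three suspects.
print(f"{effective_set_size([1/3, 1/3, 1/3]):.2f}")     # 3.00
print(f"{effective_set_size([0.98, 0.01, 0.01]):.2f}")  # 1.12
```

Both cases have an anonymity set of three. The skewed one, however, behaves like a set of barely more than one.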

In a data confidentiality system like PGP, Alice and Bob can decide by themselves that they want to get security. As long as they both use the software properly, no third party can intercept the traffic and break their encryption. However, Alice and Bob can't get anonymity by themselves: they need to participate in an infrastructure that coordinates users to provide cover for each other.

No organization can build this infrastructure for its own sole use. If a single corporation or government agency were to build a private network to protect its operations, any connections entering or leaving that network would be obviously linkable to the controlling organization. The members and operations of that agency would be easier, not harder, to distinguish.

Thus, to provide anonymity to any of its users, the network must accept traffic from external users, so the various user groups can blend together.

In practice, existing commercial anonymity solutions (like Anonymizer.com) are based on a set of single-hop proxies. In these systems, each user connects to a single proxy, which then relays the user's traffic. Single proxies provide comparatively weak security, because a compromised proxy can trivially observe all of its users' actions, and an eavesdropper needs to watch only a single proxy to perform timing correlation attacks against all its users' traffic. Worse, all users need to trust the proxy company to have good security itself as well as to not reveal user activities.

The solution is distributed trust: an infrastructure made up of many independently controlled proxies that work together to make sure that no transaction's privacy relies on any single proxy. With distributed-trust anonymizing networks like the ones discussed in this chapter, users build tunnels or circuits through a series of servers. They encrypt their traffic in multiple layers of encryption, and each server removes a single layer of encryption. No single server knows the entire path from the user to the user's chosen destination. Therefore, an attacker can't break the user's anonymity by compromising or eavesdropping on any one server.
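The layering can be sketched in a few lines. This is a toy model that makes two assumptions. First, it uses a hypothetical three-server path. Second, a hash-based XOR keystream stands in for a real cipher. The sketch is not secure, and it is not Tor's actual cell format. It shows only the structural point. Each server's layer reveals the next hop and an opaque inner blob. The first server sees who connected to it but learns only the second server's name. The last server learns the destination but not who sent the traffic.

```python
import hashlib
import json


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def build_onion(path, destination: str, payload: str) -> bytes:
    """Wrap payload for a path of (server_name, key) pairs, innermost layer first."""
    names = [name for name, _ in path]
    keys = [key for _, key in path]
    # Innermost layer: only the last server learns the destination and payload.
    inner = {"next": destination, "data": payload}
    onion = xor_layer(keys[-1], json.dumps(inner).encode())
    # Wrap outward: each layer tells its server only the next server's name.
    for i in range(len(path) - 2, -1, -1):
        layer = {"next": names[i + 1], "data": onion.hex()}
        onion = xor_layer(keys[i], json.dumps(layer).encode())
    return onion


def peel(key: bytes, onion: bytes):
    """What one server does: strip its layer and learn (next hop, contents)."""
    layer = json.loads(xor_layer(key, onion))
    return layer["next"], layer["data"]


# Hypothetical three-server path and destination.
path = [("relay-a", b"key-a"), ("relay-b", b"key-b"), ("relay-c", b"key-c")]
onion = build_onion(path, "dave.example", "hello, Dave")
hop, data = peel(b"key-a", onion)                # relay-a learns only "relay-b"
hop, data = peel(b"key-b", bytes.fromhex(data))  # relay-b learns only "relay-c"
hop, data = peel(b"key-c", bytes.fromhex(data))  # relay-c learns the destination
print(hop, data)                                 # dave.example hello, Dave
```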

Despite their increased security, distributed-trust anonymizing networks have their disadvantages. Because traffic needs to be relayed through multiple servers, performance is often (but not always) worse. Also, the software to implement a distributed-trust anonymizing network is significantly more difficult to design and implement.

Beyond these issues of the architecture and ownership of the network, however, there is another catch. For users to keep the same anonymity set, they need to act like each other. If Alice's client acts completely unlike Bob's client, or if Alice's messages leave the system acting completely unlike Bob's, the attacker can use this information. In the worst case, Alice's messages stand out entering and leaving the network, and the attacker can treat Alice and those like her as if they were on a separate network of their own. But even if Alice's messages are recognizable only as they leave the network, an attacker can use this information to break exiting messages into "messages from User1," "messages from User2," and so on, and can now get away with linking messages to their senders as groups, instead of trying to guess from individual messages. Some of this partitioning is inevitable: if Alice speaks Arabic and Bob speaks Bulgarian, we can't force them both to learn English in order to mask each other.

What does this imply for usability? More so than with encryption systems, users of anonymizing networks may need to choose their systems based on how usable others will find them, in order to get the protection of a larger anonymity set.

26.2.1. Case Study: Usability Means Users, Users Mean Security

Let's consider an example. Practical anonymizing networks fall into two broad classes: high latency and low latency.

High-latency networks. Networks like Mixminion and Mixmaster can resist strong attackers who can watch the whole network and control a large part of the network infrastructure. To prevent this "global attacker" from linking senders to recipients by correlating when messages enter and leave the system, high-latency networks introduce large delays into message delivery times, and are thus suitable only for applications like email and bulk data delivery: most users aren't willing to wait half an hour for their web pages to load.
Low-latency networks. Networks like Tor, on the other hand, are fast enough for web browsing, secure shell, and other interactive applications, but have a weaker threat model. An attacker who watches or controls both ends of a communication can trivially correlate message timing and link the communicating parties.
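A toy end-to-end correlation shows why watching both ends is enough against a low-latency network. The attacker records packet timestamps for each client entering the network and for each flow leaving it. It then buckets the timestamps into time windows. Finally, it pairs each exit flow with the client whose traffic pattern correlates best. The traffic data, window size, and Pearson-correlation scoring below are illustrative assumptions. They do not model any particular deployed attack.

```python
import math
from typing import Dict, List


def bucket(timestamps: List[float], window: float, n_windows: int) -> List[int]:
    """Count packets per time window."""
    counts = [0] * n_windows
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < n_windows:
            counts[i] += 1
    return counts


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation of two equal-length count vectors (0.0 if flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def link_flows(entries: Dict[str, List[float]], exits: Dict[str, List[float]],
               window: float = 1.0, n_windows: int = 10) -> Dict[str, str]:
    """Map each exit flow to the entering client whose timing correlates best."""
    entry_counts = {c: bucket(ts, window, n_windows) for c, ts in entries.items()}
    result = {}
    for flow, ts in exits.items():
        counts = bucket(ts, window, n_windows)
        result[flow] = max(entry_counts, key=lambda c: pearson(entry_counts[c], counts))
    return result


# Hypothetical observations: exits lag entries by about 0.2 s of network latency.
entries = {
    "alice": [0.1, 0.3, 4.2, 4.4, 4.6, 8.0],
    "bob": [1.5, 2.1, 2.3, 6.7, 6.9, 7.1],
}
exits = {
    "flow-1": [t + 0.2 for t in entries["bob"]],
    "flow-2": [t + 0.2 for t in entries["alice"]],
}
print(link_flows(entries, exits))  # {'flow-1': 'bob', 'flow-2': 'alice'}
```

Other users on the network never enter this calculation. Against this attacker, a larger anonymity set therefore buys nothing.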

Clearly, users who need to resist strong attackers must choose high-latency networks or nothing at all, and users who need to anonymize interactive applications must choose low-latency networks or nothing at all. But what should flexible users choose? Against an unknown threat model, with a noninteractive application (such as email), is it more secure to choose security or usability?

Security, we might decide, is more important than usability. If the attacker turns out to be strong, then we'll prefer the high-latency network, and if the attacker is weak, then the extra protection doesn't hurt.

But because many users might find the high-latency network inconvenient, suppose that it gets few actual users: so few, in fact, that its maximum anonymity set is too small for our needs. In this case, we need to pick the low-latency system, because the high-latency system, although it always protects us, never protects us enough, whereas the low-latency system can give us enough protection against at least some attackers.
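The argument in the last two paragraphs can be phrased as a small decision rule. The sketch below is a toy model built on assumed inputs: a required anonymity-set size, an expected user count for each network, and a guessed probability that the attacker is strong. As the next paragraph explains, nobody actually knows these numbers. The network names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    users: int                     # expected size of the anonymity set
    resists_strong_attacker: bool


def chance_protected(net: Network, needed: int, p_strong: float) -> float:
    """Probability this network gives 'enough' anonymity, under the toy model."""
    if net.users < needed:
        return 0.0                 # never protects us enough, against anyone
    return 1.0 if net.resists_strong_attacker else 1.0 - p_strong


def choose(networks, needed: int, p_strong: float) -> str:
    return max(networks, key=lambda n: chance_protected(n, needed, p_strong)).name


high = Network("high-latency", users=50, resists_strong_attacker=True)
low = Network("low-latency", users=5000, resists_strong_attacker=False)
print(choose([high, low], needed=20, p_strong=0.3))   # high-latency
print(choose([high, low], needed=200, p_strong=0.3))  # low-latency
```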

This decision is especially messy because even the developers who implement these anonymizing networks can't recommend which approach is safer, since they can't predict how many users each network will get, and they can't predict the capabilities of the attackers we might see in the wild. Worse, the anonymity research field is still young, and doesn't have many convincing techniques for measuring and comparing the protection we get from various situations. So even if the developers or users could somehow divine what level of anonymity they require and what their expected attacker can do, the researchers still don't know what parameter values to recommend.

26.2.2. Case Study: Against Options

Too often, designers faced with a security decision bow out, and instead leave the choice as an option: protocol designers leave implementors to decide, and implementors leave the choice for their users. This approach can be bad for security systems, and is nearly always bad for privacy systems. With security:

Extra options often delegate security decisions to those least able to understand what they imply. If the protocol designer can't decide whether the AES encryption algorithm is better than the Twofish encryption algorithm, how is the end user supposed to pick?
Options make code harder to audit by increasing the volume of code, by increasing the number of possible configurations exponentially, and by guaranteeing that nondefault configurations will receive little testing in the field. If AES is always the default, even with several independent implementations of your protocol, how long will it take to notice if the Twofish implementation is wrong?
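The exponential growth in the second point is easy to see, because independent options multiply. Here is a minimal sketch with a hypothetical set of protocol options. The option names and values are illustrative only.

```python
from itertools import product
from math import prod

# Hypothetical protocol options; each list holds that option's allowed values.
OPTIONS = {
    "cipher": ["AES", "Twofish"],
    "compression": [False, True],
    "padding": [False, True],
    "keepalive": [False, True],
}
DEFAULTS = {"cipher": "AES", "compression": False, "padding": False, "keepalive": False}


def configurations(options):
    """Yield every combination of option values as a dict."""
    names = list(options)
    for values in product(*(options[n] for n in names)):
        yield dict(zip(names, values))


configs = list(configurations(OPTIONS))
print(len(configs))                            # 16
print(prod(len(v) for v in OPTIONS.values()))  # 16, computed directly

# If the field mostly exercises the defaults, every other combination,
# including every Twofish path, gets little real-world testing.
untested = [c for c in configs if c != DEFAULTS]
print(len(untested))                           # 15
```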

Most users stay with default configurations as long as they work, and reconfigure their software only as necessary to make it usable. For example, suppose the developers of a web browser can't decide whether to support a given extension with unknown security implications, so they leave it as a user-adjustable option, thinking that users can enable or disable the extension based on their security needs. In reality, however, if the extension is enabled by default, nearly all users will leave it on whether it's secure or not; and if the extension is disabled by default, users will tend to enable it based on their perceived demand for the extension rather than on their security needs. Thus, only the most savvy and security-conscious users (the ones who know more about web security than the developers themselves) will actually wind up understanding the security implications of their decision.

The real issue here is that designers often end up with a situation where they need to choose between "insecure" and "inconvenient" as the default configuration, meaning that they've already made a mistake in designing their application.

Of course, when end users do know more about their individual security requirements than application designers do, adding options is beneficial, especially when users describe their own situation (home or enterprise; shared versus single-user host) instead of trying to specify what the program should do about their situation.

In privacy applications, superfluous options are even worse. When there are many different possible configurations, eavesdroppers and insiders can often tell users apart by which settings they choose. For example, the Type I or "Cypherpunk" anonymous email network uses the OpenPGP encrypted message format, which supports many symmetric and asymmetric ciphers. Because different users prefer different ciphers, and because different versions of encryption programs implementing OpenPGP (such as PGP and GnuPG) use different cipher suites, users with uncommon preferences and versions stand out from the rest, and get little privacy at all.
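The partitioning effect can be sketched by grouping users on whatever an observer can see. The client names and cipher pairings below are hypothetical and make no claim about any real program's defaults. The point is only that a user's effective anonymity set shrinks to the users who share every observable setting.

```python
from collections import Counter

# Hypothetical observable traits of each user's outgoing messages.
users = {
    "u1": ("client-A", "CAST5"),
    "u2": ("client-A", "CAST5"),
    "u3": ("client-A", "CAST5"),
    "u4": ("client-B", "AES256"),
    "u5": ("client-B", "AES256"),
    "u6": ("client-C", "IDEA"),
}


def anonymity_sets(observed):
    """Map each user to the number of users indistinguishable from them."""
    group_sizes = Counter(observed.values())
    return {user: group_sizes[traits] for user, traits in observed.items()}


print(anonymity_sets(users))
# {'u1': 3, 'u2': 3, 'u3': 3, 'u4': 2, 'u5': 2, 'u6': 1}
```

User u6, the only one with an uncommon configuration, ends up with an anonymity set of one, however large the network is.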

Similarly, the Type I network allows users to pad their messages to a fixed size so that an eavesdropper can't correlate the sizes of messages passing through the network, but it forces the user to decide what size of padding to use! Unless a user can guess which padding size will happen to be most popular, the option provides attackers with another way to tell users apart.
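Padding itself is simple. The trouble is that the target size becomes one more observable choice. The sketch below uses a generic length-prefixed padding scheme, not the Type I network's actual format. It assumes, hypothetically, that each user picks a padding size from a few allowed values.

```python
import os


def pad(message: bytes, size: int) -> bytes:
    """Pad to exactly `size` bytes: 2-byte length prefix, message, random filler."""
    if len(message) + 2 > size:
        raise ValueError("message too long for this padding size")
    filler = os.urandom(size - 2 - len(message))
    return len(message).to_bytes(2, "big") + message + filler


def unpad(padded: bytes) -> bytes:
    """Recover the original message from its length prefix."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]


# Hypothetical user choices: everyone pads, but not everyone to the same size.
choices = {"alice": 10240, "bob": 10240, "carol": 10240, "dave": 32768}
sizes = {user: len(pad(b"hello", size)) for user, size in choices.items()}
print(sizes)  # {'alice': 10240, 'bob': 10240, 'carol': 10240, 'dave': 32768}
```

Message sizes no longer leak message lengths. They do still show an eavesdropper which traffic belongs to the one user who chose an unusual size.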

Even when users' needs genuinely vary, adding options does not necessarily serve their privacy. In anonymizing networks, for example, because the default option usually prevails for casual users, the default option actually provides more security for security-conscious users, even when it would not otherwise be their best choice! Consider an anonymizing network that allows user-selected message latency (as the Type I network does): most users tend to use whichever setting is the default, so long as it works. Of the fraction of users who change the default at all, most will not, in fact, understand the security implications; those few who do will need to decide whether the increased traffic-analysis resistance that comes with more variable latency is worth the decreased anonymity that comes from splitting away from the bulk of the user base.

26.2.3. Case Study: Mixminion and MIME

We've argued that providing too many observable options can hurt privacy, but we've also argued that focusing too hard on privacy over usability can hurt privacy itself. What happens when these principles conflict?

We encountered such a situation when designing how the Mixminion anonymous email network^[5] should handle MIME-encoded data. MIME (Multipurpose Internet Mail Extensions) is the way a mail client tells the receiving mail client about attachments, which character set was used, and so on. As a standard, MIME is so permissive and flexible that different email programs are almost always distinguishable by which subsets of the format, and which types of encodings, they choose to generate. Trying to "normalize" MIME by converting all mail to a standard works only up to a point: it's trivial to convert all encodings to quoted-printable, for example, or to impose a standard order for multipart/alternative parts, but demanding a uniform list of formats for multipart/alternative messages, normalizing HTML, stripping identifying information from Microsoft Office documents, or imposing a single character encoding on each language would likely be an impossible task.

^[5] George Danezis, Roger Dingledine, and Nick Mathewson, "Mixminion: Design of a Type III Anonymous Remailer Protocol," 2003 IEEE Symposium on Security and Privacy, IEEE CS (May 2003), 2–15.
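The easy part of normalization is forcing a single transfer encoding, and Python's standard `email` package can sketch it. This is a minimal sketch, not Mixminion's implementation. It assumes a single-part text message and re-encodes the body as UTF-8 quoted-printable. Two clients that sent the same text with different encodings then produce identical output. The hard cases listed above, such as HTML and Office documents, are exactly what this approach cannot reach.

```python
import base64
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def normalize_text_message(raw: bytes) -> EmailMessage:
    """Re-encode a single-part text message as UTF-8 quoted-printable.

    Only the decoded body and its subtype are carried over; every other
    header of the original message is dropped.
    """
    original = message_from_bytes(raw, policy=default)
    if original.is_multipart() or original.get_content_maintype() != "text":
        raise ValueError("this sketch only handles single-part text messages")
    text = original.get_content()  # undoes the transfer encoding and charset
    normalized = EmailMessage()
    normalized.set_content(
        text,
        subtype=original.get_content_subtype(),
        charset="utf-8",
        cte="quoted-printable",
    )
    return normalized


# The same body, as two hypothetical clients might encode it.
body = "Grüße aus Dresden\n"
as_base64 = (b"Content-Type: text/plain; charset=utf-8\n"
             b"Content-Transfer-Encoding: base64\n\n"
             + base64.b64encode(body.encode("utf-8")) + b"\n")
as_8bit = (b"Content-Type: text/plain; charset=utf-8\n"
           b"Content-Transfer-Encoding: 8bit\n\n" + body.encode("utf-8"))
msg_a = normalize_text_message(as_base64)
msg_b = normalize_text_message(as_8bit)
print(msg_a.as_bytes() == msg_b.as_bytes())  # True
```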

Other possible solutions to this problem could include limiting users to a single email client or simply banning email formats other than plain 7-bit ASCII. But these procrustean approaches would limit usability and turn users away from the Mixminion network. Because fewer users means less anonymity, we must ask whether users would be better off in a larger network where their messages are more likely to be distinguishable based on the email client, or in a smaller network where everyone's email formats look the same.
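The question can be put into rough numbers. Assume, purely for illustration, that a user's anonymity set is the number of senders who share her email client. The user counts below are invented. The sketch shows only how one might compare the two options, not which one wins in practice.

```python
def expected_set_size(client_counts):
    """Average anonymity set seen by a randomly chosen user, given users per client."""
    total = sum(client_counts.values())
    # A user of client c hides among the n_c users of that client, and a
    # randomly chosen user is one of them with probability n_c / total.
    return sum(n * n for n in client_counts.values()) / total


# Hypothetical large network in which message formats reveal the client.
large_mixed = {"client-A": 6000, "client-B": 3000, "client-C": 900, "client-D": 100}
# Hypothetical small network that forces a single uniform format.
small_uniform = {"plain-ascii": 2000}

print(expected_set_size(large_mixed))    # 4582.0
print(expected_set_size(small_uniform))  # 2000.0
print(min(large_mixed.values()))         # 100, the worst-off group
```

With these made-up numbers the large network wins on average. The users of the rare client, though, are worse off than they would be in the small one, and the rest of this case study works through exactly that tension.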

Some distinguishability is inevitable anyway, because users differ in their interests, languages, and writing styles: if Alice writes about astronomy in Amharic, her messages are unlikely to be mistaken for Bob's, who writes about botany in Basque. Also, any attempt to restrict formats is likely to backfire. If we limited Mixminion to 7-bit ASCII, users wouldn't stop sending each other images, PDF files, and messages in Chinese. Instead, they would follow the same evolutionary path that led to MIME in the first place, and encode their messages in a variety of distinguishable formats, with each client software implementation having its own ad hoc favorites. So, imposing uniformity in this case would not only drive away users, it would probably fail in the long run, and lead to fragmentation at least as dangerous as that we were trying to avoid.

With Mixminion, we also had to consider threat models. To take advantage of format distinguishability, an attacker needs to observe messages leaving the network, and either exploit prior knowledge of suspected senders ("Alice is the only user who owns a 1995 copy of Eudora"), or feed message format information into traffic analysis approaches ("Because half of the messages to Alice are written in English, I'll assume they mostly come from different senders than the ones in Amharic"). Neither attack is certain or easy for all attackers; even if we can't defeat them in the worst possible case (where the attacker knows, for example, that only one copy of LeetMailPro was ever sold), we can provide vulnerable users with protection against weaker attackers.

In the end, we compromised: we perform as much normalization as we can, and warn the user about document types such as Microsoft Word that are likely to reveal identifying information, but we do not forbid any particular format or client software. This way, users are informed about how to blend with the largest possible anonymity set, but users who prefer to use distinguishable formats rather than nothing at all still receive and contribute protection against certain attackers.

26.2.4. Case Study: Tor Installation, Marketing, and GUI

Usability and marketing have also proved important in the development of Tor, a low-latency anonymizing network for TCP traffic. The technical challenges Tor has solved, and the ones it still needs to address, are described in its design paper,^[6] but at this point, many of the most crucial challenges are in adoption and usability.

^[6] Roger Dingledine, Nick Mathewson, and Paul Syverson, "Tor: The Second-Generation Onion Router," Proceedings of the 13th USENIX Security Symposium (August 2004).

While Tor was in its earliest stages, its user base was a small number of fairly sophisticated privacy enthusiasts with experience running Unix services, who wanted to experiment with the network (or so they say; by design, we don't track our users). As the project gained more attention from venues including security conferences and articles on Slashdot.org and Wired News, we added more users with less technical expertise. These users can now provide a broader base of anonymity for high-needs users, but only when they receive good support themselves.

For example, it has proven difficult to educate less sophisticated users about DNS issues. Anonymizing TCP streams (as Tor does) does no good if applications reveal where they are about to connect by first performing a nonanonymized hostname lookup. To stay anonymous, users had to do one of the following:

Configure their applications to pass hostnames to Tor directly by using SOCKS4a or the hostname-based variant of SOCKS5
Manually resolve hostnames with Tor and pass the resulting IPs to their applications
Direct their applications to application-specific proxies that handle each protocol's needs independently
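The first option works because SOCKS4a lets the client hand the proxy a hostname instead of an address. The sketch below builds a SOCKS4a CONNECT request following the protocol's published layout. The fields are, in order: version 4, command 1, the destination port, a deliberately invalid IP of the form 0.0.0.x with x nonzero, a NUL-terminated user ID, and the NUL-terminated hostname. The hostname is an example, and the code performs no network I/O.

```python
import struct


def socks4a_connect_request(hostname: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4a CONNECT request that carries a hostname, not an IP."""
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    header = struct.pack(
        ">BBH4s",
        4,                    # SOCKS version 4
        1,                    # command: CONNECT
        port,                 # destination port, network byte order
        bytes([0, 0, 0, 1]),  # 0.0.0.x with x != 0 means "hostname follows"
    )
    # The proxy resolves the name, so the client itself makes no DNS query.
    return header + user_id + b"\x00" + hostname.encode("idna") + b"\x00"


req = socks4a_connect_request("www.example.com", 80)
print(req.hex())
```

The leaky alternative looks harmless. The application first calls a resolver such as `socket.gethostbyname()` and then passes the resulting IP to the proxy. That first call is precisely the nonanonymized lookup described above.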

None of these is easy for an unsophisticated user, and when users misconfigure their systems, they not only compromise their own privacy, but also provide no cover for the users who are configured correctly: if Bob leaks a DNS request whenever he is about to connect to a web site, an observer can tell that anybody connecting to Alice's web site anonymously must not be Bob. Thus, experienced users have an interest in making sure that inexperienced users can use the system correctly. Having Tor be hard to configure is a weakness for everybody.

We've tried a few solutions that have not worked as well as we had hoped. Improving documentation helped only the users who read it. We changed Tor to warn users who provided an IP address rather than a hostname, but this warning usually resulted in several email exchanges to explain DNS to the casual user, who typically had no idea how to solve his problem.

At the time of this writing, the most important solutions for these users have been to:

Improve Tor's documentation for how to configure various applications to use Tor
Change the warning messages to refer users to a description of the solution ("You are insecure; see this web page") rather than a description of the problem ("Your application is sending IPs instead of hostnames, which may leak information; consider using SOCKS4a instead")
Bundle Tor with the support tools that it needs, instead of relying on users to find and configure them on their own

26.2.5. Case Study: JAP and its Anonym-o-meter

The Java Anon Proxy (JAP) is a low-latency anonymizing network for web browsing developed and deployed by the Technical University of Dresden in Germany.^[7] Unlike Tor, which uses a free-route topology where each user can choose where to enter the network and where to exit, JAP has fixed-route cascades that aggregate user traffic into a single entry point and a single exit point. The JAP client includes a GUI, shown in Figure 26-1.

^[7] Oliver Berthold, Hannes Federrath, and Stefan Köpsell, "Web MIXes: A System for Anonymous and Unobservable Internet Access," in Hannes Federrath (ed.), Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability (Springer-Verlag, LNCS 2009, July 2000).

Figure 26-1. JAP client GUI

Notice the "anonymity meter," which gives the user an impression of the level of protection for his current traffic.

How do we decide the value that the anonym-o-meter should report? In JAP's case, it's based on the number of other users traveling through the cascade at the same time. But alas, because JAP aims for quick transmission of bytes from one end of the cascade to the other, it falls prey to the same end-to-end timing correlation attacks as we described earlier. That is, an attacker who can watch both ends of the cascade won't actually be distracted by the other users.^[8] The JAP team has plans to implement full-scale padding from every user (sending and receiving packets all the time, even when they have nothing to send), but, for usability reasons, they haven't gone forward with these plans.

^[8] George Danezis, "The Traffic Analysis of Continuous-Time Mixes," in David Martin and Andrei Serjantov (eds.), Privacy Enhancing Technologies (PET 2004) (May 2004).
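A meter like JAP's can be sketched as a function that maps the number of concurrent users to a display level. The log scale, the 0-100 range, and the `full_scale` parameter are assumptions for illustration, not JAP's actual formula. The second function makes the caveat explicit. Against an attacker who watches both ends of the cascade, the honest reading stays the same no matter how busy the cascade is.

```python
import math


def meter_reading(concurrent_users: int, full_scale: int = 1000) -> int:
    """Map an anonymity-set size to 0..100 on a log scale (hypothetical formula)."""
    if concurrent_users <= 1:
        return 0
    level = math.log(concurrent_users) / math.log(full_scale)
    return round(100 * min(level, 1.0))


def effective_set(concurrent_users: int, attacker_sees_both_ends: bool) -> int:
    # End-to-end timing correlation ignores the other users entirely.
    return 1 if attacker_sees_both_ends else concurrent_users


for users in (1, 10, 100, 5000):
    naive = meter_reading(users)
    honest = meter_reading(effective_set(users, attacker_sees_both_ends=True))
    print(users, naive, honest)
```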

As the system is now, anonymity sets don't provide a real measure of security for JAP, since any attacker who can watch both ends of the cascade wins, and the number of users on the network is no obstacle to this attack. However, we think the anonym-o-meter is a great way to present security information to the user, and we hope to see a variant of it deployed one day for a high-latency system like Mixminion, where the amount of current traffic in the system is more directly related to the protection it offers.