Whether you keep your data encrypted or not, you also need to worry about software that might give others access to it when you didn't explicitly intend for them to have that access. Software that you run typically has all the privileges that you have as a user, and so it is allowed to do anything that you are allowed to do. This means that it has the same access to sensitive data stored on your computer that you, as a user, have. Usually, this is a good thing: you don't want to have to access each and every bit of data on your computer through one single privileged application. Instead, you want to be able to open a word processor (any word processor, preferably) and read any text document you've constructed. You also want to be able to access these documents through your email client so that you can mail them to other people, access them with cryptography programs so that you can encrypt them, and so on. Unfortunately, this means that every one of these programs, once you've invoked it, can access these files (or any of your other files) even when it isn't under your direct and immediate control, and this ability is sometimes used for purposes that are not explicitly conveyed to you as a user.
Sometimes such capabilities and functions exist as an intentionally designed feature of the software. For example, while you are accessing a Web page, there is usually a large amount of data exchanged between your computer and a Web server that you are not party to. From your point of view, you send a query to the Web server that says, "Hello www.apple.com, please send me the file index.html," and the server responds with a Web page. Behind the scenes, the dialog is much more complex, and your computer sends along quite a bit more information than you probably know. Most of this data is simply part of a dialog intended to allow the server to better understand what you're requesting, but even so, it may contain information that you're not intending to divulge. The following output shows some of the data that any request to a Web server makes available to any software that is called as a result of that request:
HTTP_UA_CPU="PPC"
HTTP_UA_OS="MacOS"
HTTP_USER_AGENT="Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)"
REMOTE_ADDR="188.8.131.52"
REMOTE_PORT="-1911"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv.cgi"
SCRIPT_FILENAME="/usr/local/httpd/cgi-bin/printenv.cgi"
SCRIPT_NAME="/cgi-bin/printenv.cgi"
Notice that it knows what type of CPU I have, what operating system I'm using, my IP address and port, and also my browser. All that I knowingly sent was a request for the file /cgi-bin/printenv.cgi (a little program I've written for my Web server so that I can check these things). Without my knowledge or permission, the browser has added some additional information to the exchange.
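To make it concrete how little effort this takes on the server side, here is a minimal sketch of a script in the spirit of printenv.cgi. It is written in Python purely for illustration and is not the actual printenv.cgi used above; the function name describe_request and the REQUEST_KEYS list are hypothetical. It assumes a Web server that follows the usual CGI convention of passing request details to the script as environment variables, with browser-supplied headers appearing under names that begin with HTTP_.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for a printenv-style CGI script (illustration only)."""
import os

# Variables a CGI-capable Web server conventionally derives from the request.
REQUEST_KEYS = ("REMOTE_ADDR", "REMOTE_PORT", "REQUEST_METHOD",
                "REQUEST_URI", "SCRIPT_FILENAME", "SCRIPT_NAME")


def describe_request(environ):
    """Return sorted KEY="value" lines for the request-related variables."""
    lines = []
    for key in sorted(environ):
        # HTTP_* entries are copied from headers the browser chose to send.
        if key.startswith("HTTP_") or key in REQUEST_KEYS:
            lines.append('{}="{}"'.format(key, environ[key]))
    return lines


if __name__ == "__main__":
    # A CGI response is a header block, a blank line, and then the body.
    print("Content-Type: text/plain")
    print()
    print("\n".join(describe_request(os.environ)))
```

Nothing here asks the browser for anything: the script only reports what arrived with the request, which is the point of the exercise.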
Although none of this looks to be particularly threatening information, the fact that it can be, and is, divulged without your control should get you thinking. If I am using an email application that supports HTML email (such as Apple's mail client, and in fact most email clients other than Mail in the terminal, Mailsmith, and Powermail), and the HTML mail includes an embedded image linked to a remote Web site, simply reading that email is going to send information such as that shown above to the remote site, without my knowledge, without my permission, and without my having any significant control.
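One way to see which messages would trigger such a request is to look for image tags whose source is a remote URL. The following is a minimal sketch, assuming you already have the HTML body of a message as a string; the class and function names are hypothetical, and the host tracker.example in the usage note is a placeholder. It uses only Python's standard html.parser and urllib.parse modules, and it deliberately ignores other ways a message can cause a remote fetch (style sheets, frames, scheme-less URLs, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect the src of every <img> that points at an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        src = dict(attrs).get("src") or ""
        # Images embedded in the message itself (cid:, data:) are skipped.
        if urlparse(src).scheme in ("http", "https"):
            self.remote_images.append(src)


def find_remote_images(html_body):
    """Return the remote image URLs a mail client would fetch on display."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_images
```

For example, calling find_remote_images on a body containing `<img src="http://tracker.example/pixel.gif?id=42" width="1" height="1">` returns a list holding that one URL, while an image referenced as `cid:logo` is left out because it travels inside the message.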
GETTING WHAT YOU ASKED FOR
In fact, this very technique is what has a large portion of the Internet in an uproar over "Web bugs" at this point. The misguided masses who clamored for HTML-capable email clients, so that they could bloat their email messages with ugly fonts and useless images, have, in the last year, suddenly realized that for their email client (or Web browser) to be able to read HTML email with images that come from remote sites, the client must contact that site, and thereby divulge the client's IP, and potentially other interesting facts about the user. Of course, the companies that provide these clients, and the companies that send HTML email, had figured this out somewhat sooner than the consumers, and so have been sending email containing links to invisible "one pixel" images for years now. The reader's HTML-capable email client dutifully contacts the remote server to retrieve the image, divulging the reader's IP, OS, CPU, and other interesting facts.
What's worse, because you're accessing a Web server using a Web browser (never mind that it's called an email client; unless your email client specifically doesn't do HTML, nowadays it's a Web browser), the Web server can set and retrieve cookies from your browser. Cookies are small files containing information sent by Web servers, and a source of great consternation to users who understand that they contain possibly private information, but are confused about how it got there and who can access it. A cookie can contain only information that's sent by a Web server, so it's unlikely that a Web bug will be able to divulge the contents of your private files through this mechanism, but the information that could be available is quite varied. In fact, because most people don't know what cookies have been sent to their computers from what Web servers (and most users usually have hundreds of them stored), there could be almost anything in there: anything that any Web server you've ever visited has known might be stored in the cookies on your computer, from your user ID on eBay, to your search preferences for Google, to the credit card number you entered on an online merchant's ordering page. If you've entered your actual name and address on some Web page that cooperates with the numerous Web-bug client tracking agencies, every time you read an HTML-containing email in an HTML-capable email client (or browse a Web-bugged page in a Web browser), the advertising company may be getting a record of what email you've read, what Web pages you've visited, and what products you've ordered, all linked to you, the physical consumer at your home snail-mail address. As more software becomes automatically HTML enabled and allows the embedding of links and Web content, this technology for tracking the documents you access becomes even more pervasive.
According to The Privacy Foundation's report on document bugging (http://www.privacyfoundation.org/privacywatch/report.asp?id=39&action=0 and http://www.privacyfoundation.org/resources/docbug.asp), Microsoft Word, Excel, and PowerPoint have been susceptible to being bugged since the '97 versions. (The Privacy Foundation report also includes a nice Flash animation showing the mechanism of Web bugs, and a fairly comprehensive FAQ.)
Companies that record in a database the fact that you read the mail, opened the document, or clicked on the Web page, and that elicit cooperation among a multitude of Web sites and commercial emailers, can use this data to track consumer response to various types of email advertising, chart consumers' browsing habits around the Web, and build considerably more powerful customer profiles than would be possible if they used only what a customer would willingly divulge in a survey.
Of course, consumers are now acting indignant over the fact that the technology that they unthinkingly embraced is doing exactly what it was designed to do, and want assurances that companies won't, or preferably can't, use it for that. It's like asking for a super-heavy macho-man hammer for pounding nails, and then complaining that it's not soft enough when you hit your thumb. So long as they're capable of gathering the information, and so long as it appears to be of economic benefit for them to do so, businesses are going to take every opportunity to collect and analyze information regarding their potential customers. Historically, this collection has been paid for by the businesses and has consisted of sending surveys and collecting demographics based on targeted advertising. The Web has made the arrangement much sweeter for businesses because it's the consumers themselves who pay for the opportunity to send the information necessary to build the databases. If you're a business, what could be better?
If you don't want companies to know that you've accessed their computers, don't access them. It's as simple as that.
If I were using one of those newfangled Pentium chips with the built-in digital serial number, and Microsoft's newfangled XP operating system with its irrevocably linked copy of Explorer, would my CPU's serial number and my OS's registration number be embedded in exchanges of information with Web servers as well? Short of setting up my own Web server and figuring out how to write the software to extract the content of the dialog, would there be any way for me to tell whether they were? Don't put your trust in some misguided notion that only fly-by-night companies would mine your personal information in this fashion. It's been well-documented that Microsoft's Windows Media Player phones home with information regarding every DVD that you play in your computer, and Microsoft's disclosure of this fact has been notably lacking (http://www.computerbytesman.com/privacy/wmp8dvd.htm). You might be inclined to think, "Well, if you've got nothing to hide, why would it bother you?" I challenge you to try that argument on the next woman you meet who's carrying a purse. Grab it, dump the contents and start rifling through them. Against her protestations, use the argument "If you've got nothing to hide, you shouldn't be bothered." She'll explain to you, in much more graphic detail than I can convey here, why that argument doesn't fly.
A probably larger problem is that, because I'm running it, my browser has access to any information I've entered in its preferences, my system settings, and actually all my files as well, all with my permissions, and all without my necessarily knowing that it's accessing or divulging this information. With the advent of cookies, practically anything I've ever entered or seen on a Web page could be sent back and forth to sites that I access, without my knowledge or consent.
Some of this information exchange is completely necessary for the applications to function. It would be impossible for a Web server to send you a Web page that you've requested without it getting the information regarding your IP address to know where to send the page. Likewise, cookies have an important purpose in making your Web experience seamless and convenient because without them there's no convenient and relatively secure method for a Web server to keep track of things such as your shopping cart at an online merchant. Unfortunately, it's exceedingly difficult to separate those exchanges that are in your best interest from those that are taking advantage of the fact that you're unaware of the dialog going on.
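The round trip that makes a shopping cart work can be sketched in a few lines. This is only an illustration, assuming a server that sends a single Set-Cookie header; the function name cookie_header_for_next_request and the cookie name cart_id are hypothetical. It uses Python's standard http.cookies module to parse what the server sent and to build what a browser would send back on its next request to that site.

```python
from http.cookies import SimpleCookie


def cookie_header_for_next_request(set_cookie_value):
    """Return the Cookie header value a browser would send back.

    set_cookie_value is the text of a Set-Cookie header as the server
    sent it, for example 'cart_id=abc123; Path=/'.
    """
    jar = SimpleCookie()
    # Attributes such as Path are stored on the cookie, not echoed back.
    jar.load(set_cookie_value)
    # Only name=value pairs travel back to the server.
    return "; ".join(
        "{}={}".format(name, morsel.value) for name, morsel in jar.items()
    )
```

Given 'cart_id=abc123; Path=/', the function returns 'cart_id=abc123'. The browser contributes nothing of its own here; it simply stores the value and hands the same value back, which is why a cookie can hold only what some server already knew.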
Our best advice is that it serves most people well to be cautious, but not overly concerned about the information being transmitted in this fashion. The majority of reputable commercial entities on the Web work hard to avoid storing cookies with sensitive data, or to make certain that the cookies are destroyed when the browser leaves the site. Those sites that use Web bugs can track some pages you visit, or emails that you read, but what are they going to do with the information? They're going to use it to more precisely target annoying pop-up ads at you, so that you get fewer ads for junk you are completely uninterested in, and more ads for junk that you might possibly be interested in, but probably aren't. The technology can be misused, and it is misused frequently, but unless you have a real reason to be worried about information you've sent to a Web site, or about other people knowing what pages you've browsed or spam email you've read, it's unlikely that these technologies are going to significantly endanger your machine.
On the other hand, it turns out to be absurdly easy to write software containing either accidental bugs or deliberate features that can be abused to make it perform functions that aren't on its list of intended features. Some of these allow unintended execution of programs or other types of direct software compromises of your system, but others, more insidiously, can result in your system divulging, at your own command, information that is either random or, worse, exactly the data that you thought you were protecting in the process.
By way of example, for quite a while there was a problem with Microsoft Office applications, in that the documents had a tendency to absorb random pieces of information from around a user's system into themselves, without the user's permission or desire (http://news.zdnet.co.uk/story/0,,t281-s2109785,00.html). At least a few people claim that this is an ongoing problem (http://filebox.vt.edu/users/sears/bloated.html), but Microsoft issued a patch for the '98 releases, and to the best of our ability to discern has fixed this in the Mac OS X versions of their software.
Another example, unfortunately with the same suite of software, is a certain sloppiness about managing the data stored within its files. For some time, it was not uncommon for a Word document to contain portions of many different earlier versions of the document.
This may not appear to be a critical misfeature, and in fact I have on occasion come to rely upon this unintentional archive of data to recover the data from Word documents that have become corrupted and refuse to open, or in which I've accidentally deleted important information. With a 10-month-old daughter in the house, I would have lost many pages of Mac OS X Unleashed without the ability to recover data from files that had mysteriously changed into something much less useful. However, we just recently heard from one of OSU's computer-savvy legal staff that this misfeature has cost at least one vendor (one that's incredibly apropos in this instance) a lucrative contract with the University. The vendor in this instance sent its business proposal to OSU as a Microsoft Word document. Unfortunately (for the vendor), its people were less knowledgeable than they should have been regarding the potential pitfalls of their software of choice, and when the OSU lawyers carefully examined the file, they discovered the remnants of an earlier version of the contract lurking in the file that had been sent. In the earlier version of the contract, the company was offering the same services to another university, but at a much better price. Needless to say, this seemingly insignificant misfeature cost someone a fair sum of money, though we don't know whether it ended up costing anyone his or her job.
Based on a quick scan through the files we've used to compose this book, this problem doesn't seem to be as prevalent as it once was. However, with only a few iterations of opening and closing files under Word version X, Service Release 1 on OS X 10.2, I managed to produce a file that looks like Figure 4.14 in Word, but that, when examined with the strings command from the command line, contains all the data shown in Listing 4.2. The dangerous part is near the bottom of the listing. I constructed this document as though I were mailing off instructions to meet for a surprise party for some friends, and then reused the driving directions part of the document by deleting the party announcement and replacing it with a request to my friend for help. That last line of the message body in the listing is the original text from the beginning of the document that would have gone to the people attending the party. I deleted it from the Word document, and it doesn't show up anywhere in what Word shows me on the screen, or in what prints out when I print it, yet it obviously remains in the binary contents of the file itself. If I had really constructed the document this way and sent it off as a Word document to some friends, then done a bit of editing and sent a different version off to my rather computer-literate friend Adam, I might have been rather embarrassed.
Adam, remember, I need some help this evening chasing that wiring reroute down at Marijan. I've got a tracer rented until 7:30, but I can't get there with it before 7, so we're going to have to work fast.
Directions to Marijan park:
North on 315 to 161. Get off the highway heading west on 161 and take the first left onto the road paralleling the highway. Head south a mile or so until you see the sign for the park entrance on the left. Take both right turns on the park road to get down to the pavilion area.
See you there!
Will
. . .
Everybody, tonight's Adam's surprise party - I'll get Adam there and keep him busy 'til 7:30
Will Ray
Will Ray
Microsoft Word 10.1
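If you'd like to run the same kind of check from a script, the core of what the strings command does can be approximated in a few lines. This is a rough sketch rather than a reimplementation of strings: it assumes the leftover text is stored as plain ASCII, it treats a run of four or more printable characters as a string, and the function name printable_strings is hypothetical. Text that an application stores in a two-byte encoding would not be found this way.

```python
import re
import sys


def printable_strings(data, min_length=4):
    """Return runs of printable ASCII at least min_length bytes long."""
    # \x20 through \x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(data)]


if __name__ == "__main__":
    # Usage: python3 this_script.py path/to/document
    with open(sys.argv[1], "rb") as handle:
        for text in printable_strings(handle.read()):
            print(text)
```

Running it against a document you are about to send, and reading the output before you send it, is a cheap way to find out what else is traveling in the file.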
The danger, of course, is not unique to Microsoft products. These problems are inherent in the notion that software that you run can act upon the system with your permission and authorization. If a programmer makes a simple error that accidentally causes an application to store data it wasn't supposed to, or to read another part of the disk space you control, there's nothing in the action of the system to either inform you of the error or to allow you to prevent it. Your most useful defense is to pay attention to the warnings that people send to security newsgroups and Web sites, and to act upon them by avoiding or patching software that displays problems of this nature. It doesn't hurt to take a look in files that you think you know the contents of, either: you'll probably be surprised at what you sometimes find there.