3.6. Summary

Zeroconf's Multicast DNS, like RFC 3927 link-local addressing, provides a safety net when the equivalent conventional infrastructure is not present or working. When there's no DHCP server, link-local addressing gets you an address that's at least good for the local link. When there's no DNS server, or there is one but you have no way to add your own hostnames to it, Multicast DNS gives you a way of referring to devices by name that at least works on the local link. This gets us to a useful level of functionality: even when DHCP and DNS are broken, link-local addressing and Multicast DNS mean that you can give your devices names, refer to them from other computers using those names, and establish working TCP connections so that you can do useful networking.

Zeroconf doesn't stop there, though. With the technology described so far, you can do useful networking, but you need to know the hostname, you need to remember it, and you need to type it in correctly. If you mistype it, misremember it, or just don't know the name of the printer, you're in trouble. Wouldn't it be better if you didn't have to know the name of the printer in advance? Wouldn't it be better if you could just say, "I need to print a document. Is there anything on the network that can help me with that?" That's Zeroconf's DNS Service Discovery technology, and that's the subject of the next chapter.

UTF-8

The American Standard Code for Information Interchange (ASCII) has long been the standard way most computers represent text in their memories and on their disks. Each letter, digit, and symbol is represented by a different seven-bit binary code. Rather than write the codes in binary, they are usually written in numerical form, as numbers from 0 to 127. For example, the code for A is 65.
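As a quick illustration (a minimal Python sketch, not part of the original text), these numeric codes are easy to inspect directly:

    # Each ASCII character corresponds to a code in the range 0-127.
    print(ord("A"))                    # 65
    print(chr(65))                     # A
    print([ord(c) for c in "Hello"])   # [72, 101, 108, 108, 111]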

The problem is that while 128 values are enough to represent English uppercase letters, lowercase letters, digits, and punctuation, they're not enough to represent all the accented characters used in European languages, and definitely not enough to represent all the characters used in Japanese, Hebrew, Indian languages, and so on.

A new standard for representing text, called the Universal Character Set (UCS), or Unicode, solves this by having literally millions of possible codes. The problem is that to work directly with Unicode data, software that manipulates text has to be rewritten, which takes a long time. Also, Unicode is less efficient than ASCII for English text, which made it slow to gain popularity among many English speakers, who see little benefit in all those extra characters they're not using. The word "Hello" in Unicode can take two or even four times as much space to store as the same word stored using ASCII.
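To put rough numbers behind that claim, here is a small Python sketch comparing two common fixed-width Unicode encodings with ASCII (the codec names are standard Python spellings):

    word = "Hello"
    print(len(word.encode("ascii")))      # 5 bytes
    print(len(word.encode("utf-16-le")))  # 10 bytes -- two bytes per character
    print(len(word.encode("utf-32-le")))  # 20 bytes -- four bytes per character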

UCS Transformation Format 8 (UTF-8) solves this problem in a simple, elegant way. A single eight-bit byte in computer memory can hold 256 possible values, from 0 to 255, but ASCII requires only 128 values, leaving the other 128 unused. The ingenious solution is that UTF-8 uses the values 0-127 to represent exactly the same characters as ASCII, so the word "Hello" stored using UTF-8 and the word "Hello" stored using ASCII are exactly the same in memory. UTF-8 is a compatible superset of ASCII. So we can declare, by fiat, as it were, that every single ASCII string stored in memory or on disk anywhere in the world is actually a UTF-8 string, and not a single line of software has had to change.
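A short Python sketch (illustrative only) shows the byte-for-byte compatibility the paragraph describes:

    word = "Hello"
    print(word.encode("ascii"))                          # b'Hello'
    print(word.encode("utf-8"))                          # b'Hello'
    print(word.encode("ascii") == word.encode("utf-8"))  # True -- identical bytes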

The second step is how UTF-8 represents all the additional non-roman characters. For these, UTF-8 uses the byte values in the range 128-255 that ASCII leaves unused. Depending on the character, it is represented in memory as a consecutive sequence of two, three, or more bytes in the 128-255 range.
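Again as an illustrative Python sketch, a non-ASCII character encodes to a run of bytes that all fall in the 128-255 range, so they can never be mistaken for plain ASCII bytes:

    for ch in ("é", "日"):
        data = ch.encode("utf-8")
        print(ch, list(data))
    # é [195, 169]         -- two bytes, both 128 or above
    # 日 [230, 151, 165]   -- three bytes, all 128 or above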

The beauty of this is that almost all software that works with ASCII text can work without modification with the new UTF-8 text. Of course, if you want to display UTF-8 text on the screen or print it on paper, you need software that knows how to properly decode UTF-8 and draw the right characters, but most software never needs to do this. DNS code is concerned with putting data into packets and reading it out, not with what those characters look like to humans. The little bit of user-interface code responsible for showing text on the screen has to draw UTF-8 characters correctly, but the rest of the DNS protocol code (the vast bulk of it) can just pass the data around as raw data, unconcerned with how that data might eventually be presented to the human user.

UTF-8 is popular in the United States, because it allows non-roman characters to be represented using multibyte sequences in otherwise standard ASCII files. In some places outside the USA, where most characters need to be represented using multibyte sequences, UTF-8 is less popular, and many people prefer to use 16-bit Unicode characters (UTF-16) directly. Multicast DNS adopts UTF-8 as the best way to maintain compatibility with existing ASCII names, while at the same time providing the capability to represent non-roman characters, too.




