Visual Equivalence Attacks and the Homograph Attack
In early 2002, two researchers, Evgeniy Gabrilovich and Alex Gontmakher, released an interesting paper entitled The Homograph Attack, available at http://www.cs.technion.ac.il/~gabr/pubs.html. The crux of their paper is that some characters look the same as others, but they are in fact different. Take a look at Figure 11-2.
Figure 11-2. Looks like localhost, doesn't it? However, it's not. The word localhost has a special Cyrillic character o that looks like an ASCII o.
The problem is that the last letter o in localhost is not a Latin letter o; it's a Cyrillic character o (U+043E). While the two are visually equivalent, they are semantically different. Even though the user thinks she is accessing her machine, she is not; she is accessing a remote server on the network. Other Cyrillic examples include a, c, e, p, y, x, H, T, and M; they all look like Latin characters, but in fact they are not.
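One way to catch this kind of spoofing is to check whether a name mixes letters from more than one script. The following Python sketch is only a heuristic, not a complete defense: it assumes the first word of each character's Unicode name (LATIN, CYRILLIC, and so on) is a good enough stand-in for the script, and the function names letter_scripts and is_mixed_script are my own, not part of any library.

```python
import unicodedata

def letter_scripts(text):
    """Return the set of script labels for the letters in text.

    Assumption: the first word of the Unicode character name
    (for example LATIN or CYRILLIC) approximates the script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # The default avoids a ValueError for characters with no name.
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts

def is_mixed_script(text):
    """True if the letters in text come from more than one script."""
    return len(letter_scripts(text)) > 1

genuine = "localhost"
spoofed = "localh\u043Est"  # last o replaced by Cyrillic o (U+043E)

print(genuine == spoofed)        # False: the strings differ
print(is_mixed_script(genuine))  # False: Latin letters only
print(is_mixed_script(spoofed))  # True: Latin and Cyrillic letters
```

A check like this flags the spoofed name, but it would miss a name written entirely in look-alike characters from a single script, so treat it as one signal among several.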
Another example is the fraction slash, ⁄ (U+2044), and the slash character, / (U+002F). Once again, they look the same. There are many others in the Unicode repertoire; I've outlined some in Chapter 14, Internationalization Issues.
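When two characters look the same on screen, printing their code points and Unicode names makes the difference obvious. The short Python sketch below does this with the standard unicodedata module; the helper name describe is hypothetical and exists only for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# The ASCII slash followed by the fraction slash.
for code_point, name in describe("/\u2044"):
    print(code_point, name)
# U+002F SOLIDUS
# U+2044 FRACTION SLASH
```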
The oldest mix-up is the number zero and the uppercase letter O.
The problem with visual equivalence is that users may see a URL that looks like it will perform a given action, when in fact it would perform another action. Who would have thought a link to localhost would have accessed a remote computer named localhost?