For reasons of compatibility with legacy character sets, as well as out-and-out mistakes, a number of characters have more than one representation in Unicode. For example, the character ü (u with an umlaut) can be represented either as the single precomposed character ü (U+00FC) or as a u followed by a combining diaeresis (U+0308). XML 1.0 treats these two forms as distinct. For example, München written with the precomposed ü is not the same string as München written with a u followed by a combining diaeresis, even though the two look identical on screen. You can see that this might be a bit of a problem.
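To make the difference concrete, here is a minimal Python sketch. The variable names are illustrative only; the escapes spell out the two representations described above so that the code points are visible even when the rendered text is not.

```python
# Two spellings of "München" that render identically but differ in code points.
precomposed = "M\u00FCnchen"   # ü as the single character U+00FC
decomposed = "Mu\u0308nchen"   # u followed by U+0308 COMBINING DIAERESIS

# A plain string comparison treats them as different strings.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 7 8

# Listing the code points shows where the two strings diverge.
print([f"U+{ord(c):04X}" for c in precomposed[:2]])  # ['U+004D', 'U+00FC']
print([f"U+{ord(c):04X}" for c in decomposed[:3]])   # ['U+004D', 'U+0075', 'U+0308']
```

Any comparison, hash lookup, or index built on the raw strings will see two different values here.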
While such differences are not significant to XML parsing, they may very well be significant to applications that build on top of XML. You should normalize all text that comes into your program before acting on it. Unicode defines four normalization forms (NFC, NFD, NFKC, and NFKD), suitable for different needs. Probably the most generally useful is Normalization Form C (NFC). This tends to produce text that is best displayed by existing software. However, for sorting, searching, indexing, and so forth, Normalization Form KC (NFKC) is usually more appropriate. It's similar to NFC except that it's a little more aggressive in unifying characters. In particular, stylistic variants such as the ﬁ ligature (U+FB01) are replaced by the two letters f and i, whereas NFC leaves the ligature as it is. Both NFC and NFKC unify canonically equivalent sequences and characters, such as ü and a u followed by a combining diaeresis.
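As one illustration of normalizing text on the way in, the sketch below uses the `normalize` function from Python's standard `unicodedata` module, which accepts the form names "NFC", "NFD", "NFKC", and "NFKD". The helper name `normalize_input` is hypothetical and stands in for whatever entry point your application uses; the choice of NFC as the default simply follows the recommendation above.

```python
import unicodedata

def normalize_input(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize incoming text before acting on it."""
    return unicodedata.normalize(form, text)

precomposed = "M\u00FCnchen"   # ü as U+00FC
decomposed = "Mu\u0308nchen"   # u + U+0308 COMBINING DIAERESIS

# Both NFC and NFKC unify the two canonically equivalent spellings.
print(normalize_input(precomposed) == normalize_input(decomposed))                    # True
print(normalize_input(precomposed, "NFKC") == normalize_input(decomposed, "NFKC"))    # True

# Only NFKC replaces the fi ligature (U+FB01) with the letters f and i.
ligature = "\uFB01le"
print(normalize_input(ligature, "NFC") == ligature)   # True: NFC keeps the ligature
print(normalize_input(ligature, "NFKC"))              # file
```

A reasonable pattern is to apply NFC to text you intend to store or display, and NFKC to the keys you build for searching or indexing, so that the stored text keeps its stylistic distinctions.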
Actually implementing the various normalization algorithms is relatively tricky, although it mostly involves table lookups. It is a task best left to the experts. Fortunately, high-quality open source and public domain code is available that can do the job for you.
A Google search will turn up numerous other options. Normalization is still something of an esoteric subject. Few developers realize how much they need it, so support was slow to reach the standard libraries of major programming languages. Indeed, it may not be necessary in pure ASCII environments. However, as soon as you move beyond the ASCII character set and the English language, normalization of strings becomes very important.