WordprocessingML is Microsoft's XML format for Word documents. It's what you get when you select Save As... and choose "XML Document." WordprocessingML is a lossless format, which means that it contains all the information that Word needs to re-open a document, just as if it had been saved in the traditional .doc format: all text, formatting, styles, document metadata, images, macros, revision history, Smart Tags, etc. (The one exception is that WordprocessingML does not embed TrueType fonts, which is a disadvantage only if the users opening the document do not have the needed font installed on their system.) Indicative of Word's tremendous size and legacy, the WordprocessingML schema file approaches 7,000 lines in length. Fortunately, a little bit of knowledge about WordprocessingML can go a long way.
To gain an advanced understanding of WordprocessingML, you'll need to first understand the fundamentals of Word itself. While this chapter briefly touches on Word's global architecture and design, books such as the following can provide a more solid foundation:
In this chapter, we'll examine several increasingly detailed examples of WordprocessingML. First, we'll take a look at the definitive "Hello, World" example for WordprocessingML. Next, after learning some tips for working with WordprocessingML, we'll take a tour through an example WordprocessingML document as output by Word. Then, we'll systematically cover Word's primary formatting constructs: runs, paragraphs, tables, lists, sections, etc. Finally, we'll take another look at one of Word's most important features: the style. Understanding how styles work (how they interact with direct formatting and how they relate to document templates) is essential to an overall understanding of WordprocessingML and Word in general.
2.1.1 A Simple Example
Example 2-1 shows a WordprocessingML document that one might create by hand in a plain text editor. This example represents the simplest non-empty WordprocessingML document possible.
Example 2-1. A simple WordprocessingML document created by hand
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
The first thing to note about this example is the mso-application processing instruction (PI). This is a generic PI used by various applications within the Microsoft Office System. Its purpose is to associate the given .xml file with a particular application in the Office suite. In this case, the file is associated with Microsoft Word. This has two effects: not only is the Word application launched when a user double-clicks the file, but Windows Explorer also renders the file using a special Word XML icon. This behavior is enabled through an Explorer shell extension that is automatically installed with Office 2003. All XML documents saved by Word will include this PI. We'll see more uses of the mso-application PI in Chapter 7 and Chapter 10.
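To make the role of the PI concrete, the following Python sketch reads the mso-application PI from a document's prolog. It is an illustration only, not part of Word or Office: the function name office_application and the SAMPLE string are hypothetical, and the sample assumes the PI carries progid="Word.Document", the value Word 2003 writes for its own XML files. The standard xml.dom.minidom module is used because it keeps processing instructions that appear before the root element.

```python
from xml.dom import Node, minidom

# Hypothetical sample; assumes the progid value Word 2003 writes for its XML files.
SAMPLE = '''<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body/>
</w:wordDocument>'''


def office_application(xml_text):
    """Return the data of the mso-application PI, or None if it is absent."""
    doc = minidom.parseString(xml_text)
    # PIs in the prolog are direct children of the document node,
    # alongside the root element.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            return node.data
    return None


if __name__ == "__main__":
    print(office_application(SAMPLE))
```

A file without the PI is still well-formed XML; under this sketch it simply yields None, which mirrors the fact that the PI affects file association rather than document content.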
As mentioned above, Example 2-1 shows the simplest non-empty WordprocessingML document possible. The w:body element is the only required child element of the w:wordDocument root element. It technically can be empty, but that would make for a pretty boring first example. The w:p element stands for "paragraph," w:r stands for "run," and w:t stands for "text." The namespace prefix w maps to the primary WordprocessingML namespace: http://schemas.microsoft.com/office/word/2003/wordml.
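The element structure just described can also be produced programmatically. The sketch below builds the same minimal document with Python's standard xml.etree.ElementTree module. The function name build_hello_world is hypothetical, and the prolog (XML declaration plus the mso-application PI with an assumed progid of Word.Document) is prepended as a plain string, since ElementTree does not write processing instructions ahead of the root element on its own.

```python
import xml.etree.ElementTree as ET

# The primary WordprocessingML namespace, as given in the text.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Ask ElementTree to serialize the namespace with the conventional "w" prefix.
ET.register_namespace("w", W_NS)

# Assumption: the progid value Word 2003 uses for its own XML files.
PROLOG = '<?xml version="1.0"?>\n<?mso-application progid="Word.Document"?>\n'


def build_hello_world(text="Hello, World!"):
    """Return a minimal WordprocessingML document containing one paragraph."""
    root = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(root, f"{{{W_NS}}}body")   # only required child of the root
    para = ET.SubElement(body, f"{{{W_NS}}}p")      # paragraph
    run = ET.SubElement(para, f"{{{W_NS}}}r")       # run
    t = ET.SubElement(run, f"{{{W_NS}}}t")          # text
    t.text = text
    return PROLOG + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hello_world())
```

Building the tree through an XML library rather than string concatenation means characters such as & and < in the text are escaped automatically.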
With few exceptions, all text in a given document is contained within a w:t element that's contained within a w:r element that's contained within a w:p element. A final thing to note is that, except for the w:wordDocument element, none of the elements in Example 2-1 (w:body, w:p, w:r, and w:t) can have attributes. As we'll see, properties are instead assigned (to paragraphs and runs) using child elements.
Figure 2-1 shows the result of opening our example document in Word. We see "Hello, World!" in the default font and font size, in the default view. Word supplies these defaults, because they are not explicitly specified in our WordprocessingML document.
Figure 2-1. Our hand-edited WordprocessingML file, opened in Word
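The paragraph, run, and text nesting described in this section also suggests a simple way to read text back out of a document. The following Python sketch is a rough illustration under stated assumptions: it only looks at w:p elements that are direct children of w:body, so it ignores the exceptions the text alludes to, and the names paragraph_texts, elements_with_attributes, and SAMPLE are hypothetical. The second function checks the observation that, apart from the root, none of the elements in the example carries attributes.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree path expressions.
W = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}

# Hypothetical sample mirroring the structure of Example 2-1.
SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p><w:r><w:t>Hello, World!</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraph_texts(xml_text):
    """Return one string per top-level paragraph, joining the text of its runs."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.findall("w:body/w:p", W):
        pieces = [t.text or "" for t in p.findall("w:r/w:t", W)]
        paragraphs.append("".join(pieces))
    return paragraphs


def elements_with_attributes(xml_text):
    """Return the tags of all non-root elements that carry attributes."""
    root = ET.fromstring(xml_text)
    # Namespace declarations are not reported as attributes by ElementTree.
    return [el.tag for el in root.iter() if el is not root and el.attrib]


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE))
    print(elements_with_attributes(SAMPLE))
```

For the sample above, the first function yields a single paragraph string and the second yields an empty list.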