2.1 Introduction to WordprocessingML


WordprocessingML is Microsoft's XML format for Word documents. It's what you get when you select Save As... and choose "XML Document." WordprocessingML is a lossless format, which means that it contains all the information that Word needs to re-open a document, just as if it had been saved in the traditional .doc format: all text, formatting, styles, document metadata, images, macros, revision history, Smart Tags, etc. (The one exception is that WordprocessingML does not embed TrueType fonts, which is only a disadvantage if the users opening the document do not have the needed font installed on their system.) Indicative of Word's tremendous size and legacy, the WordprocessingML schema file approaches 7,000 lines in length. Fortunately, a little bit of knowledge about WordprocessingML can go a long way.

Microsoft only recently began calling Word's XML format "WordprocessingML"; previously it was called, simply, "WordML" (as still reflected in the schema's namespace URI). Why they decided to adopt this new name isn't entirely clear, though it certainly is wordier.


To gain an advanced understanding of WordprocessingML, you'll need to first understand the fundamentals of Word itself. While this chapter briefly touches on Word's global architecture and design, books such as the following can provide a more solid foundation:

Word Pocket Guide, by Walter Glenn (O'Reilly)
Word 2000 in a Nutshell, by Walter Glenn (O'Reilly)

In this chapter, we'll examine several increasingly detailed examples of WordprocessingML. First, we'll take a look at the definitive "Hello, World" example for WordprocessingML. Next, after learning some tips for working with WordprocessingML, we'll take a tour through an example WordprocessingML document as output by Word. Then, we'll systematically cover Word's primary formatting constructs: runs, paragraphs, tables, lists, sections, etc. Finally, we'll take another look at one of Word's most important features: the style. Understanding how styles work (how they interact with direct formatting and how they relate to document templates) is essential to an overall understanding of WordprocessingML and Word in general.

2.1.1 A Simple Example

Example 2-1 shows a WordprocessingML document that one might create by hand in a plain text editor. This example represents the simplest non-empty WordprocessingML document possible.

Example 2-1. A simple WordprocessingML document created by hand
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>

The first thing to note about this example is the mso-application processing instruction (PI). This is a generic PI used by various applications within the Microsoft Office System. Its purpose is to associate the given .xml file with a particular application in the Office suite. In this case, the file is associated with Microsoft Word. This has a double effect: not only is the Word application launched when a user double-clicks the file, but Windows Explorer renders the file using a special Word XML icon. This behavior is enabled through an Explorer shell extension that is automatically installed with Office 2003. All XML documents saved by Word will include this PI. We'll see more uses of the mso-application PI in Chapter 7 and Chapter 10.

As mentioned above, Example 2-1 shows the simplest non-empty WordprocessingML document possible. The w:body element is the only required child element of the w:wordDocument root element. It technically can be empty, but that would make for a pretty boring first example. The w:p element stands for "paragraph," w:r stands for "run," and w:t stands for "text." The namespace prefix w maps to the primary WordprocessingML namespace: http://schemas.microsoft.com/office/word/2003/wordml.
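
To illustrate the point about the empty body, here is a sketch of what such a document might look like. It is adapted by hand from Example 2-1 rather than taken from Word's output, and it assumes the mso-application PI carries progid="Word.Document", the value Word writes when it saves a document as XML.

```xml
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<!-- Hand-written sketch: the root element with its one required child, left empty -->
<w:wordDocument
  xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body/>
</w:wordDocument>
```

Opening a file like this in Word should yield a blank document, since w:body is the only child that w:wordDocument requires.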

Beware the default namespace! Word, in its longstanding attempt to be everything to everybody, does something funny when you try to open a WordprocessingML document that uses a default namespace, rather than the w (or some other) prefix, for elements in the WordprocessingML namespace. It sees the naked (un-prefixed) body element and thinks "This must be HTML!" The easiest way to avoid this problem is to always use an XML declaration (e.g., <?xml version="1.0"?>) at the beginning of an XML document that will be opened by Word. Word will consistently recognize the document as XML if the XML declaration is present.
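
As a sketch of the situation this note describes, the following hand-written variant of Example 2-1 declares the WordprocessingML namespace as the default namespace, so that body, p, r, and t appear without a prefix. To an XML parser it is equivalent to the prefixed version. The XML declaration on the first line is what, per the advice above, should keep Word from mistaking the file for HTML. The progid value on the PI is assumed to be the one Word writes for its own XML files.

```xml
<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<!-- Default-namespace form: every element below is still in the WordprocessingML namespace -->
<wordDocument
  xmlns="http://schemas.microsoft.com/office/word/2003/wordml">
  <body>
    <p>
      <r>
        <t>Hello, World!</t>
      </r>
    </p>
  </body>
</wordDocument>
```

Removing the first line is the quickest way to reproduce the HTML misidentification described above. Keeping the w prefix, as in Example 2-1, sidesteps the issue altogether.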


With few exceptions, all text in a given document is contained within a w:t element that's contained within a w:r element that's contained within a w:p element. A final thing to note is that, except for the w:wordDocument element, none of the elements in Example 2-1 (w:body, w:p, w:r, and w:t) can have attributes. As we'll see, properties are instead assigned (to paragraphs and runs) using child elements. Figure 2-1 shows the result of opening our example document in Word. We see "Hello, World!" in the default font and font size, in the default view. Word supplies these defaults, because they are not explicitly specified in our WordprocessingML document.

Figure 2-1. Our hand-edited WordprocessingML file, opened in Word
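
The point that properties are assigned through child elements, not attributes, can be previewed with a small fragment. This is only a sketch: the w:pPr, w:rPr, w:jc, and w:b elements have not been introduced in this section and belong to the formatting constructs covered later in the chapter. The fragment is meant to take the place of the w:p element in Example 2-1, not to stand alone as a document.

```xml
<!-- Replaces the w:p element inside w:body in Example 2-1 -->
<w:p>
  <w:pPr>
    <!-- Paragraph property: center the paragraph -->
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r>
    <w:rPr>
      <!-- Run property: bold -->
      <w:b/>
    </w:rPr>
    <w:t>Hello, World!</w:t>
  </w:r>
</w:p>
```

Here the paragraph is centered and the run is bold. Notice that w:p and w:r still carry no attributes. The only attribute, w:val, sits on the w:jc property element.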




Office 2003 XML
ISBN: 0596005385
Year: 2003
Pages: 135
