Office 2007: Open XML Format


As I write this, Office 2007 has just gone to Beta 2 and should be commercially available by the time this book is on the shelves. Apart from the ribbon and other highly visible changes to the Office user interface, the biggest change for XML developers is the file format. The native file format for most Office documents is now XML, or rather, a number of XML files bound together in a ZIP archive. Figure 25-14 shows the contents of a simple DOCX file.

Figure 25-14: Contents of a simple DOCX file
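A DOCX file is an ordinary ZIP archive, so any ZIP library can list the parts it contains. The Python sketch below uses the standard zipfile module. To keep it self-contained, it builds a hypothetical, stripped-down stand-in package in memory rather than opening a real document. The part contents are placeholders, not what Word actually writes.

```python
import io
import zipfile


def list_parts(docx):
    """Return the names of the parts stored in a DOCX package.

    `docx` may be a path or a binary file-like object; because the package
    is a plain ZIP archive, zipfile can open it directly.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


# Hypothetical stand-in package with placeholder contents (not real Word output).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("_rels/.rels", "<Relationships/>")
    package.writestr("word/document.xml", "<document/>")

for name in list_parts(demo):
    print(name)
```

With a real document you would pass a path instead, for example list_parts("report.docx"), where the file name is hypothetical.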

The files stored in the package contain the text of the document as well as its formatting and other elements. The most commonly used files are:

  • [Content_Types].xml: A manifest that records the content (MIME) type of the parts in the package, either for an individual part (an Override entry) or for all parts with a given file extension (a Default entry). For example, the content type of document.xml is defined as application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml.

  • document.xml: The actual text of the document, in XML format (WordprocessingML for Word documents; Excel workbooks store their content in SpreadsheetML). Note that this part contains only the content that made up the <w:body> element of a Word 2003 WordML document (see Figure 25-15), albeit in a different schema.

    Figure 25-15

  • .rels, document.xml.rels: Relationship files. Any part that refers to other parts has a matching .rels file in a _rels subfolder next to it. Relationships are pointers to other required files. For example, the document.xml.rels file contains pointers to the settings.xml, theme1.xml, styles.xml, fontTable.xml, numbering.xml, and webSettings.xml files (see Listing 25-13), because Word needs these to render the document correctly. Similarly, the root .rels file (_rels/.rels) has pointers to the document.xml file and to the files in the docProps folder. If the document contained images or hyperlinks, these would also be listed in the relationships file: images are stored as separate parts, and hyperlinks are recorded as external targets. Keeping these items out of document.xml helps to reduce its overall size.
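The lookup rule for [Content_Types].xml can be sketched with xml.etree.ElementTree. An Override entry for the exact part name wins; otherwise the Default entry for the file extension applies. The case-insensitive comparison reflects the packaging conventions as I understand them and is worth checking against the specification. The sample manifest is a hypothetical, abbreviated one; a real Word document contains more entries.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def content_type(content_types_xml, part_name):
    """Return the content type recorded for part_name (e.g. "/word/document.xml").

    An <Override> for the exact part name takes precedence; otherwise the
    <Default> for the part's file extension applies. Comparisons ignore case
    (assumed to match the packaging rules). Returns None if nothing matches.
    """
    root = ET.fromstring(content_types_xml)
    for override in root.iter(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    # Take the extension from the last path segment, so "/_rels/.rels" -> "rels".
    file_name = part_name.rsplit("/", 1)[-1]
    extension = file_name.rsplit(".", 1)[1].lower() if "." in file_name else ""
    for default in root.iter(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


# Hypothetical, abbreviated manifest.
SAMPLE_TYPES = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
    ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml"
    ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

print(content_type(SAMPLE_TYPES, "/word/document.xml"))
print(content_type(SAMPLE_TYPES, "/_rels/.rels"))
```

Because Override entries are checked first, a part such as document.xml gets its specific type even though a Default entry also exists for the .xml extension.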

The files described above are the only required parts of a Word 2007 document. In addition, a number of optional files may be present:

  • theme1.xml: Contains information about the font, color, and format schemes applied to the document, if any.

  • settings.xml: Configuration settings defined for the document, such as the document template attached to the file and whether revision marks are turned on.

  • webSettings.xml: Configuration settings specific to opening the document in Internet Explorer.

  • styles.xml: The styles available in the document.

  • custom.xml: Any custom, user-defined metadata applied to the document.

  • app.xml: Application-specific metadata. For Word, this includes the number of pages and characters, whether document protection is enabled, and so on.

  • core.xml: Basic metadata about the document, such as the author and the date it was last saved.

  • fontTable.xml: A list of the fonts used in the document, along with their attributes. These attributes can be used to identify a replacement font if the original is not present.

  • numbering.xml: The numbering definitions for the document. These define how numbered and bulleted lists are displayed, and the document references them when displaying lists.

  • media: A subfolder in which embedded media files, such as images, are stored. Each of these files is referenced by an entry in the document.xml.rels file.
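As an example of reading one of the optional parts, the sketch below pulls a few fields out of core.xml (stored in the docProps folder). It assumes the Dublin Core and package core-properties namespaces that Word uses for this part. The sample content and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical output keys mapped to the elements they are read from.
FIELDS = {
    "title": "dc:title",
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "modified": "dcterms:modified",
}


def core_properties(core_xml):
    """Return selected core properties; a missing element yields None."""
    root = ET.fromstring(core_xml)
    result = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        result[key] = element.text if element is not None else None
    return result


# Hypothetical sample part (invented values).
SAMPLE_CORE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>Jane Author</dc:creator>
  <cp:lastModifiedBy>Jane Author</cp:lastModifiedBy>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2006-06-01T10:00:00Z</dcterms:modified>
</cp:coreProperties>"""

print(core_properties(SAMPLE_CORE))
```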

Listing 25-13: document.xml.rels

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <Relationships
      xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <!-- The Id values shown here are illustrative; Word assigns its own. -->
      <Relationship Id="rId1"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
        Target="settings.xml" />
      <Relationship Id="rId2"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
        Target="styles.xml" />
      <Relationship Id="rId3"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering"
        Target="numbering.xml" />
      <Relationship Id="rId4"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
        Target="theme/theme1.xml" />
      <Relationship Id="rId5"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable"
        Target="fontTable.xml" />
      <Relationship Id="rId6"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"
        Target="webSettings.xml" />
    </Relationships>
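Relationships like those in Listing 25-13 can be read with a few lines of standard-library Python. This sketch finds the .rels part that belongs to a given part, then resolves each internal Target relative to the folder of the source part. External targets, such as hyperlinks, are returned unchanged. The function names and the sample relationships file are illustrative assumptions, not part of any Office API.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def rels_part_for(part_name):
    """Return the .rels part holding part_name's relationships,
    e.g. "word/document.xml" -> "word/_rels/document.xml.rels"."""
    folder, file_name = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", file_name + ".rels")


def resolve_targets(part_name, rels_xml):
    """Map each relationship type's last path segment (e.g. "styles")
    to the list of targets of that type."""
    base = posixpath.dirname(part_name)
    targets = {}
    for rel in ET.fromstring(rels_xml).iter(REL_NS + "Relationship"):
        kind = rel.get("Type", "").rsplit("/", 1)[-1]
        target = rel.get("Target", "")
        if rel.get("TargetMode") != "External":
            # Internal targets are relative to the source part's folder;
            # ZIP entry names carry no leading slash.
            target = posixpath.normpath(posixpath.join(base, target)).lstrip("/")
        targets.setdefault(kind, []).append(target)
    return targets


# Hypothetical sample in the style of Listing 25-13, plus an external hyperlink.
SAMPLE_RELS = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
    Target="styles.xml"/>
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
    Target="theme/theme1.xml"/>
  <Relationship Id="rId3"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
    Target="http://www.example.com/" TargetMode="External"/>
</Relationships>"""

print(rels_part_for("word/document.xml"))
print(resolve_targets("word/document.xml", SAMPLE_RELS))
```

The function collects targets into lists rather than single values, because a document can hold several relationships of the same type (one per image, for example).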

The basic steps for processing a document in the Open XML format are as follows:

  1. Read the _rels/.rels file to determine which part contains the main document. Typically, this is the relationship identified as rId1, but you should not rely on that. Instead, look for the relationship whose Type is http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument:

          <Relationship Id="rId1"
            Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
            Target="word/document.xml" />

  2. Open the document part named by the Target attribute (word/document.xml here) and process it.

  3. If you need additional information, refer to the document.xml.rels file (stored as word/_rels/document.xml.rels) to locate the other parts you need. Its Target paths are relative to the folder containing document.xml. All of the relationship types currently used in that file begin with the URI http://schemas.openxmlformats.org/officeDocument/2006/relationships.
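Steps 1 and 2 can be sketched in Python as follows. main_part_name follows the officeDocument relationship in _rels/.rels rather than assuming rId1. paragraph_texts then joins the <w:t> text of each <w:p> paragraph in the main part. The demo package is a hypothetical, minimal one built in memory. A production reader would also need to handle nested paragraphs (for example, inside text boxes), fields, and other run content that this sketch ignores.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


def main_part_name(package):
    """Step 1: follow the officeDocument relationship in _rels/.rels."""
    root = ET.fromstring(package.read("_rels/.rels"))
    for rel in root.iter(REL_NS + "Relationship"):
        if rel.get("Type") == OFFICE_DOCUMENT:
            # Package-level targets are relative to the package root;
            # ZIP entry names carry no leading slash.
            return posixpath.normpath(rel.get("Target")).lstrip("/")
    raise ValueError("no officeDocument relationship in _rels/.rels")


def paragraph_texts(docx):
    """Step 2: open the main part and return the text of each <w:p>."""
    with zipfile.ZipFile(docx) as package:
        document = ET.fromstring(package.read(main_part_name(package)))
    return ["".join(t.text or "" for t in p.iter(W_NS + "t"))
            for p in document.iter(W_NS + "p")]


# Hypothetical minimal package, built in memory for the demo.
RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    f'<Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="word/document.xml"/>'
    '</Relationships>'
)
DOCUMENT = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body>'
    '<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Open XML</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("_rels/.rels", RELS)
    package.writestr("word/document.xml", DOCUMENT)

print(paragraph_texts(demo))
```

For step 3, the same pattern applies: read word/_rels/document.xml.rels and resolve each internal Target against the word/ folder.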

The Open XML specification defines not only Word documents but also Excel and PowerPoint documents, and the format is designed to be extensible and flexible. See the References section that follows for the current specification.




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
