Office Open XML File Formats


The remainder of this chapter discusses how to generate documents by using the new Office Open XML file formats that are being introduced with the 2007 Microsoft Office System. As you will see, these new file formats make it possible to generate Office Word 2007 documents and Office Excel 2007 workbooks on the server without running a desktop version of Word or Excel. Along the way, we will describe what makes using the Office Open XML file formats especially attractive when creating custom solutions for WSS and MOSS.

The Office Open XML file format specification was approved in 2006 by Ecma International as the Ecma 376 standard and was under review by the ISO international standards body at press time. You can download the Office Open XML specification as well as other great online resources at http://openxmldeveloper.com. Our goal in this chapter is to get you up and running with the programming techniques required to create simple Word documents within a WSS solution.

Motivation for the Office Open XML File Format

It used to be challenging to write and deploy server-side applications that could read, modify, and generate documents used by the Microsoft Office suite of applications. The older binary file format used by Word, Excel, and Office PowerPoint was introduced in 1997 and remained the default file format up through the Microsoft Office 2003 editions. Experience has shown that this binary file format is too tricky for most companies to work with directly, because one small mistake in the code that generates or modifies an Office document typically corrupts the entire file. The vast majority of production applications that read and write documents using the Microsoft Office 2003 editions (or earlier) do so by going through the object model of the hosting application.

Custom applications and components that use the object model of an application such as Word or Excel run much better on the desktop than they do in server-side scenarios. Anyone who has spent time writing the extra infrastructure code required to make a desktop application behave reliably on the server will tell you it’s a hack, because desktop applications such as Word and Excel were never designed to run on the server. They require a custom utility program to terminate and restart them whenever they encounter a modal dialog box that requires human intervention.

What is far more desirable in a server-side scenario is the ability to read and write documents without going through the object model of the hosting application. The Microsoft Office 2000 and Microsoft Office 2003 editions introduced some modest capabilities for using XML to create the content for Excel workbooks and Word documents. This advancement introduced the possibility of writing portions of a document by using an XML parser, such as the one that the .NET Framework provides through the System.Xml namespace.

With the 2007 Microsoft Office System, Microsoft has taken this idea much further by adopting the Office Open XML file formats for documents used by Word, Excel, and PowerPoint. Office Open XML file formats are an exciting advancement for WSS and MOSS developers because these formats provide the ability to read, write, and generate a Word document, Excel workbook, or PowerPoint presentation on the server without requiring the hosting Web server to run a desktop application.

Word 2007 Document Internals

Let’s begin by examining the structure of a simple Word document based on the Office Open XML file formats. Office Open XML file formats are based on standard ZIP file technology. Each top-level file is saved as a ZIP archive, which means you are able to open a Word document just as you would any other ZIP file and snoop around inside by using the ZIP file support built into Windows Explorer.

You should note that the 2007 Microsoft Office suite of applications, such as Word and Excel, introduced new file extensions for documents that use the new formats. For example, the .docx extension is used for Word documents stored in the Office Open XML file format, whereas the older and more familiar .doc extension continues to be used for Word documents stored in the older binary format.

Once Word 2007 is installed, you can start by creating a new Word document and adding the text “Hello World.” Save the document using the default format to a new file named Hello.docx and close Word. Next, locate Hello.docx in the file system by using Windows Explorer. Rename it Hello.zip. This enables Windows Explorer to recognize the file as a ZIP archive. You can now open the Hello.zip archive and see the structure of folders and files that Word created, as shown in Figure 7-6.

Figure 7-6: A .docx file is a ZIP archive known as a package that contains parts and items.

After this quick look inside a .docx file, it’s time to introduce some basic concepts and terminology involved with documents conforming to the Office Open XML file formats. The top-level file (such as Hello.docx) is known as a package. Because a package is implemented as a standard ZIP archive, it automatically provides compression and makes its contents instantly accessible to many existing utilities and APIs on Windows platforms and non-Windows platforms alike.

Inside a package are two kinds of internal components: parts and items. In general, parts contain content and items contain metadata describing the parts. Items can be further subdivided into relationship items and content-type items. We will now dive into a bit more detail on each type of component.

A part is an internal component containing content that is persisted inside the package. The majority of parts are simple text files serialized as XML with an associated XML schema. However, parts can also be serialized as binary data when necessary, such as when a Word document contains a graphic image or media file.

A part is named by using a uniform resource identifier (URI) that contains its relative path within the package file combined with the part file name. For example, the main part within the package for a Word document is named /word/document.xml. The following list presents examples of typical part names that you will find inside the package for a simple Word document.

    /docProps/app.xml
    /docProps/core.xml
    /word/document.xml
    /word/fontTable.xml
    /word/settings.xml
    /word/styles.xml
    /word/theme/theme1.xml

Office Open XML file formats use relationships to define associations between a source and a target part. A package relationship defines an association between the top-level package and a part. A part relationship defines an association between a parent part and a child part.

Relationships are important because they make these associations discoverable without examining the content within the parts in question. Relationships are independent of content-specific schemas and are, therefore, faster to resolve. An additional benefit is that you can establish a relationship between two parts without modifying either of them.

Relationships are defined in internal components known as relationship items. A relationship item is stored inside the package just like a part, although a relationship item is not actually considered a part. For consistency, relationship items are always created inside folders named _rels.

For example, a package contains exactly one package relationship item named /_rels/.rels. The package relationship item contains XML elements to define package relationships, such as the one between the top-level package for a .docx file and the internal part /word/document.xml.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Relationships xmlns="../package/2006/relationships">
      <Relationship
        Type="../officeDocument/2006/relationships/officeDocument"
        Target="word/document.xml"/>
    </Relationships>

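As you can see, a Relationship element defines a type and a target part. It also carries an Id attribute, not shown in this abbreviated listing, that gives the relationship a name. The leading ../ in the listing abbreviates http://schemas.openxmlformats.org/. You should also observe that the type name for a relationship is defined by using the same conventions used to create XML namespaces.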

In addition to a single package relationship item, a package can also contain one or more part relationship items. For example, you define relationships between /word/document.xml and its child parts inside a part relationship item located at the URI /word/_rels/document.xml.rels. Note that the Target attribute for a relationship in a part relationship item is a URI relative to the parent part and not the top-level package.
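
To make the relative-target rule concrete, here is a minimal sketch that resolves a Target value against its parent part by using the PackUriHelper class from the packaging API. The target value styles.xml is an illustrative assumption, chosen to match the part names listed earlier.

```csharp
using System;
using System.IO.Packaging;

class ResolveTarget {
  static void Main() {
    // Parent part, and a relative Target value as it might appear in
    // /word/_rels/document.xml.rels (illustrative value, not read from a file).
    Uri parent = new Uri("/word/document.xml", UriKind.Relative);
    Uri target = new Uri("styles.xml", UriKind.Relative);

    // Resolve the target against the parent part to obtain the
    // absolute part name within the package.
    Uri resolved = PackUriHelper.ResolvePartUri(parent, target);
    Console.WriteLine(resolved);
  }
}
```

Because the target is resolved against the folder of the parent part, the result here is the part name /word/styles.xml rather than /styles.xml.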

Every part inside a package is defined in terms of a specific content type. Don’t confuse these content types with a content type in WSS because the two are distinct. A content type within a package is metadata that defines a part’s media type, a subtype, and a set of optional parameters. Any content type used within a package must be explicitly defined inside a component known as a content type item. Each package has exactly one content type item named /[Content_Types].xml. The following is an example of content type definitions inside the /[Content_Types].xml item of a typical Word document.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
      <Default
        Extension="rels"
        ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
      <Default
        Extension="xml"
        ContentType="application/xml"/>
      <Override
        PartName="/word/document.xml"
        ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
    </Types>

Content types are used by the consumer of a package to interpret how to read and render the content within its parts. As you can see in the previous listing, a default content type is typically associated with a file extension, such as .rels or .xml. Override content types are used to define a specific part in terms of a content type that differs from the default content type associated with its file extension. For example, /word/document.xml is associated with an Override content type that differs from the default content type used for files with an .xml extension.
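
If you would like to see the content type that a package assigns to each of its parts, the following sketch opens an existing file read-only and prints every part name with its content type. It uses the packaging API that is introduced in the next section. The default path is a placeholder assumption, so pass the path of any .docx file as the first argument instead.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

class ListParts {
  static void Main(string[] args) {
    // Placeholder path (an assumption); supply a real file as the first argument.
    string path = args.Length > 0 ? args[0] : @"c:\Data\Hello.docx";

    // Open the package read-only; the using block closes it when done.
    using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read)) {
      foreach (PackagePart part in package.GetParts()) {
        // Print the part name followed by its content type.
        Console.WriteLine("{0}  [{1}]", part.Uri, part.ContentType);
      }
    }
  }
}
```

Depending on the file, the output may also include entries under the _rels folders, because the packaging API can surface relationship items alongside ordinary parts.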

Generating Your First .docx File

Although there are several existing libraries that you can use to read and write to ZIP archives, you should prefer using the new packaging API that is part of the WindowsBase.dll assembly that ships with the .NET 3.0 Framework because the packaging API is aware of the Office Open XML file formats. For example, certain convenience methods make it easy to add relationship elements to a relationship item and add content type elements to a content type item. The packaging API makes things easier because you never have to touch relationship or content type items directly.

One of the nice things about developing for WSS 3.0 is that it has a dependency on the .NET 3.0 Framework. You can be certain that the WindowsBase assembly and packaging API will always be available on any Web server running WSS 3.0 or MOSS.

To begin programming against the packaging API in a Microsoft Visual Studio 2005 project, you should add a reference to the WindowsBase assembly, as shown in Figure 7-7.

Figure 7-7: Add a reference to the WindowsBase.dll assembly to begin programming against the new packaging API.

Let’s begin by creating a simple console application that generates a .docx file using the Office Open XML file formats. We will then modify the code to generate a .docx file on the server from a custom application page in the DocumentManager solution in response to a client request from the browser.

The classes that make up the packaging API are contained inside the System.IO.Packaging namespace. When working with packages, you are also frequently programming against older (and hopefully familiar) classes in the System.IO and System.Xml namespaces. Examine the following code that shows the skeleton for creating a new package:

    using System;
    using System.IO;
    using System.IO.Packaging;
    using System.Xml;

    namespace HelloDocx {
      class Program {
        static void Main() {
          // (1) create a new package
          Package package = Package.Open(@"c:\Data\Hello.docx",
                                         FileMode.Create,
                                         FileAccess.ReadWrite);
          // (2) WRITE CODE HERE TO CREATE PARTS AND ADD CONTENT
          // (3) close package
          package.Close();
        }
      }
    }

The System.IO.Packaging namespace contains the Package class that exposes a static method named Open that can be used to create new packages as well as to open existing packages. As with many other classes that deal with file I/O, a call to Open should always be complemented with a call to Close.
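
One way to guarantee that every Open is paired with a Close, even when an exception is thrown, is the C# using statement, which works here because the Package class implements IDisposable. The following is a minimal sketch of that alternative rather than the approach used in the chapter's sample project. It assumes that the c:\Data folder already exists.

```csharp
using System.IO;
using System.IO.Packaging;

class UsingSketch {
  static void Main() {
    // Same path as the chapter's example; the c:\Data folder must exist.
    using (Package package = Package.Open(@"c:\Data\Hello.docx",
                                          FileMode.Create,
                                          FileAccess.ReadWrite)) {
      // Create parts and relationships here.
    } // Leaving the block disposes the package, which closes it.
  }
}
```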

Once you create the new package, the next step is to create one or more parts and serialize content into them. In our next example, we follow the official guidelines for a “hello world” application that requires the creation of a single part named /word/document.xml. You can create a part by calling the CreatePart method on an open Package object and passing parameters for a URI and a string-based content type.

    // create main document part (document.xml) ...
    Uri uri = new Uri("/word/document.xml", UriKind.Relative);
    string partContentType;
    partContentType = "application/vnd.openxmlformats" +
                      "-officedocument.wordprocessingml.document.main+xml";
    PackagePart part = package.CreatePart(uri, partContentType);
    // get stream for document.xml
    StreamWriter streamPart;
    streamPart = new StreamWriter(part.GetStream(FileMode.Create,
                                                 FileAccess.Write));

The call to CreatePart passes a URI based on the path /word/document.xml and the content type that’s required by the Office Open XML file formats for the part in a word processing document containing the main story. Once you create a part, you must serialize your content into it by using standard stream-based programming techniques. The previous code opens a stream on the part by calling the GetStream method and uses this stream object to initialize a StreamWriter object.

The StreamWriter object is used to serialize the “hello world” XML document into document.xml. However, it’s important that you understand what the resulting XML is going to look like. Examine the following XML that represents the simplest of XML documents that can be serialized into document.xml.

    <?xml version="1.0" encoding="utf-8"?>
    <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
      <w:body>
        <w:p>
          <w:r>
            <w:t>Hello Open XML</w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:document>

Note that all elements within this XML document are defined within the http://schemas.openxmlformats.org/wordprocessingml/2006/main namespace as required by the Office Open XML file formats. The XML document contains a high-level document element, and within the document element is a body element that contains the main story of the Word document itself.

Within the body element is a <p> element for each paragraph. Within the <p> element is an <r> element that defines a run. A run is a region of elements that share the same set of characteristics. Within the run is a <t> element that defines a range of text.

It is now time to generate this XML document with code by using the XmlWriter class from the System.Xml namespace. Examine how the following code creates these elements within the proper structure and by using the appropriate namespace.

    // define string variable for the WordprocessingML namespace
    string nsWP = "http://schemas.openxmlformats.org" +
                  "/wordprocessingml/2006/main";
    // write elements into XML document...
    XmlWriter writer = XmlWriter.Create(streamPart);
    writer.WriteStartDocument();
    writer.WriteStartElement("w", "document", nsWP);
    writer.WriteStartElement("body", nsWP);
    writer.WriteStartElement("p", nsWP);
    writer.WriteStartElement("r", nsWP);
    writer.WriteStartElement("t", nsWP);
    // write hello world text into Word Text element
    writer.WriteValue("Hello Open XML");
    // close all elements (t, r, p, body, document)
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndDocument();
    // close XmlWriter object
    writer.Close();
    // close the StreamWriter to release the stream on the part
    streamPart.Close();

We are through writing the XML content into document.xml. The final step is to create a relationship between the package and document.xml by calling the CreateRelationship method of the Package object. This is an easy process as long as you know the correct string value for the relationship type and can come up with a unique name (such as “rId1”) for the relationship being created.

    // create the package relationship to document.xml
    string relationshipType;
    relationshipType = "http://schemas.openxmlformats.org" +
                       "/officeDocument/2006/relationships/officeDocument";
    package.CreateRelationship(uri,
                               TargetMode.Internal,
                               relationshipType,
                               "rId1");
    package.Flush();

You should observe the call to Flush after the call to CreateRelationship. This call forces the packaging API to update the package relationship item with the proper Relationship element. The final call to the Close method of the Package object completes the package serialization and releases the file handle on Hello.docx.

You have viewed all of the necessary steps to generate a simple .docx file from a console application written in C#. If you want to step through this console application, it is available on the companion Web site in the project named HelloDocx.

Generating .docx Files on the Server

The first thing to consider when modifying code from the console application to run on a Web server is that you probably want to avoid saving the package as a physical file on the host computer. Instead, it is faster to write the contents of the package file into a MemoryStream object created within the hosting IIS worker process. Once you write the package file contents into memory, you can then write the file back to the client by using the OutputStream object of the ASP.NET Response object.

Let’s start by changing the code to use a MemoryStream object instead of a physical file. Examine the following code that has been added to a server-side event handler inside an ASP.NET 2.0 page in the Visual Studio Web site project named HelloDocxFromASPNET.

    // create in-memory stream as buffer
    MemoryStream bufferStream = new MemoryStream();
    // create new package in memory stream
    Package package = Package.Open(bufferStream,
                                   FileMode.Create,
                                   FileAccess.ReadWrite);
    // this calls same code shown in previous HelloDocx example
    WriteContentToPackage(package);
    // save/close package object leaving DOCX file in MemoryStream
    package.Close();
    // (1) TODO - SET UP HTTP HEADERS FOR RESPONSE
    // (2) TODO - WRITE PACKAGE CONTENT INTO RESPONSE BODY

Note that the first parameter to Package.Open has changed from the string-based file path to a MemoryStream object. This approach enables you to reuse the same code for generating the package file and its parts that was shown in the previous console application example. In addition, you need not worry about creating and naming an OS-level file. This approach also provides faster response times and better throughput in high-traffic scenarios because it eliminates any need for disk I/O.

The previous code in this section creates a MemoryStream object and then serializes a .docx file into it just as you would serialize a .docx file into a physical file. The code inside the custom WriteContentToPackage method was taken directly from the HelloDocx console application project shown earlier, but it’s now creating a package and serializing it into a buffer in memory instead of into a physical file.
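
The in-memory technique can be tried outside of ASP.NET. The following console sketch creates a package in a MemoryStream object, closes it, and then reopens the buffered bytes to confirm that the part was stored. The class name and the placeholder part content are assumptions for illustration only; the placeholder is not valid WordprocessingML, so the result is not a document that Word can open.

```csharp
using System;
using System.IO;
using System.IO.Packaging;

class InMemoryRoundTrip {
  static void Main() {
    Uri uri = new Uri("/word/document.xml", UriKind.Relative);
    string contentType = "application/vnd.openxmlformats" +
                         "-officedocument.wordprocessingml.document.main+xml";

    // Create the package in memory instead of in a physical file.
    MemoryStream buffer = new MemoryStream();
    Package package = Package.Open(buffer, FileMode.Create, FileAccess.ReadWrite);
    PackagePart part = package.CreatePart(uri, contentType);
    using (StreamWriter partWriter =
             new StreamWriter(part.GetStream(FileMode.Create, FileAccess.Write))) {
      // Placeholder content only (an assumption for this sketch).
      partWriter.Write("<placeholder/>");
    }
    // Closing the package completes the ZIP archive inside the buffer.
    package.Close();
    Console.WriteLine("Package size in bytes: {0}", buffer.Length);

    // Reopen a copy of the buffered bytes to check what was written.
    using (Package reopened = Package.Open(new MemoryStream(buffer.ToArray()),
                                           FileMode.Open, FileAccess.Read)) {
      Console.WriteLine("Part exists: {0}", reopened.PartExists(uri));
    }
  }
}
```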

Once you write the package into the MemoryStream object and call Close on the package object, you are done with the packaging API. Let’s assume that we don’t want to store the newly created package file on the server in this case. Instead, we simply want to send it back to the client for viewing and editing. This also gives the user the option to save the package file locally or back into a WSS document library. All that’s left to do is set up the appropriate HTTP headers and write the package content into the body of the response that is being sent back to the client.

Let’s start with the HTTP headers. You should call methods on the ASP.NET Response object to clear any existing headers and add a content disposition header specifying that the response contains an attachment with the file name Hello.docx.

    Response.ClearHeaders();
    Response.AddHeader("content-disposition",
                       "attachment; filename=Hello.docx");

Next, you must set up the encoding and Multipurpose Internet Mail Extensions (MIME) content type for the HTTP response and then write the binary content for the package into the body of the HTTP response. This can be accomplished with the following code:

    Response.ClearContent();
    Response.ContentEncoding = System.Text.Encoding.UTF8;
    // registered MIME content type for .docx files
    Response.ContentType = "application/vnd.openxmlformats" +
                           "-officedocument.wordprocessingml.document";
    // write package to response stream
    bufferStream.Position = 0;
    BinaryWriter writer = new BinaryWriter(Response.OutputStream);
    BinaryReader reader = new BinaryReader(bufferStream);
    writer.Write(reader.ReadBytes((int)bufferStream.Length));
    reader.Close();
    writer.Close();
    bufferStream.Close();
    // flush and close the response object
    Response.Flush();
    Response.Close();

Note that the last two calls to Response.Flush and Response.Close are required to make sure the entire package is completely written into the HTTP response before it is transmitted back to the caller. If you would like to test the code you have just seen, the sample code for this chapter contains a directory named HelloDocxFromASPNET that contains an ASP.NET 2.0 Web site with a functioning version. Using Visual Studio, you can use the Open Web Site command and open the directory for this project through the file system.

When you run the code behind default.aspx to generate the .docx file, it executes all of the code you have just seen and transmits the file back to the user. As long as the client machine is configured to understand the MIME content type associated with a .docx file, the user is presented with the dialog box shown in Figure 7-8, which provides the option to open the document immediately within Word or to save it to a location on the local hard drive. The correct MIME type is configured whenever you install Word 2007, and it is also configured to work correctly with earlier versions of Word as long as you install the converter for .docx files that can be downloaded for free from the Microsoft public Web site.

Figure 7-8: When you set up the appropriate MIME content type, the user is given a chance to open a server-side–generated .docx file directly in Word.

If the user clicks the Open button, the .docx file opens automatically in Word. It is interesting to note that, up to this point, the package file has never been saved as a physical file to the file system, but has only been stored in memory both on the Web server and on the client desktop computer. If the user closes the document without saving it, it is as if it never existed at all. This approach of generating documents on the server is ideal for creating letters, memos, customer lists, and other types of documents based on data in WSS lists and databases.

Although the Word document we just created was perhaps a little boring, the concepts and possibilities of what you can do with the Office Open XML file formats are far more exciting. With a little imagination, you can start pulling data from WSS lists as well as backend databases and Web services to create Word documents that make your users and their managers happy. Once again, we encourage you to frequent the community site at http://openxmldeveloper.com to view some great examples and expand your horizons even further.

Saving a .docx File in a Document Library

The last two examples demonstrated how to generate a .docx file from both a console application and a standard ASP.NET 2.0 page. Now that you know the basic concepts, we will move ahead and provide some examples in a custom application page that’s part of the DocumentManager solution.

The DocumentManager4.aspx application page, which is accessible through the Site Actions menu, enables a user to select a customer and a letter template, as shown in Figure 7-9. Note that the DocumentManager feature adds a Customers list and a Letter Templates list and populates each of them with some sample data during feature activation. Once the user selects a customer and letter template, two command buttons with event handlers demonstrate two different methods to create a .docx file. Although the two event handlers use different techniques to create a customer letter, both save the resulting .docx file to a document library named Customer Letters.

Figure 7-9: DocumentManager4.aspx demonstrates two ways to generate a .docx file from WSS list data.

Open the DocumentManager4.aspx application page in Visual Studio and look at the event handler named cmdGenerateLetter1_Click, which provides the code to determine the selected customer and letter templates. It then uses this data to retrieve the required information to generate a customer letter from the Customers and Letter Templates lists. It accomplishes this task by creating a package file in a MemoryStream object and dynamically generating the document.xml part as you saw in the last example.

This example differs from the previous one because it saves the generated .docx file into a document library named Customer Letters. To accomplish this, the code must determine the path to the document library as well as the path to be used to save the .docx file.

    SPDocumentLibrary targetLibrary;
    targetLibrary = (SPDocumentLibrary)site.Lists["Customer Letters"];
    string libraryRelativePath = targetLibrary.RootFolder.ServerRelativeUrl;
    string libraryPath = siteCollection.MakeFullUrl(libraryRelativePath);
    string documentName = "CustomerLetter01.docx";
    string documentPath = libraryPath + "/" + documentName;

Once you determine the document path and write the package for the .docx file into a memory stream, you can call the Add method of the current site’s Files collection.

    MemoryStream documentStream = new MemoryStream();
    // create new package in memory stream
    Package package = Package.Open(documentStream,
                                   FileMode.Create,
                                   FileAccess.ReadWrite);
    // Code to write package for .docx file omitted for clarity
    package.Close();
    site.Files.Add(documentPath, documentStream, true);
    Response.Redirect(libraryPath);

If you examine the code inside the cmdGenerateLetter1_Click event handler that generates the WordProcessingML content for document.xml, you see that it is more involved than the Hello World example shown earlier. The code demonstrates the creation of multiple paragraphs in a document as well as using line breaks and paragraph formatting.
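
The event handler's own code is not reproduced here, but the general idea can be sketched as follows. This is a hypothetical helper, not the code from the DocumentManager solution: it writes one paragraph per call, optionally adds a paragraph properties element with a justification setting, and separates lines of text with line break elements. The letter text is made up, and the output goes to the console instead of into a package part.

```csharp
using System;
using System.Text;
using System.Xml;

class ParagraphSketch {
  const string nsWP =
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

  // Hypothetical helper: writes one w:p element. Each string in lines
  // becomes a w:t element, separated by w:br line breaks in a single run.
  static void WriteParagraph(XmlWriter writer, string alignment,
                             params string[] lines) {
    writer.WriteStartElement("w", "p", nsWP);
    if (alignment != null) {
      // Paragraph properties must precede the runs of the paragraph.
      writer.WriteStartElement("w", "pPr", nsWP);
      writer.WriteStartElement("w", "jc", nsWP);
      writer.WriteAttributeString("w", "val", nsWP, alignment);
      writer.WriteEndElement(); // jc
      writer.WriteEndElement(); // pPr
    }
    writer.WriteStartElement("w", "r", nsWP);
    for (int i = 0; i < lines.Length; i++) {
      if (i > 0) {
        writer.WriteStartElement("w", "br", nsWP); // line break
        writer.WriteEndElement();
      }
      writer.WriteElementString("w", "t", nsWP, lines[i]);
    }
    writer.WriteEndElement(); // r
    writer.WriteEndElement(); // p
  }

  static void Main() {
    StringBuilder output = new StringBuilder();
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    using (XmlWriter writer = XmlWriter.Create(output, settings)) {
      writer.WriteStartDocument();
      writer.WriteStartElement("w", "document", nsWP);
      writer.WriteStartElement("w", "body", nsWP);
      // Made-up letter content for illustration.
      WriteParagraph(writer, "center", "Customer Letter");
      WriteParagraph(writer, null, "Dear customer,",
                     "Thank you for your recent order.");
      writer.WriteEndElement(); // body
      writer.WriteEndElement(); // document
      writer.WriteEndDocument();
    }
    Console.WriteLine(output.ToString());
  }
}
```

To write into a real package, you would pass the StreamWriter for the document.xml part to XmlWriter.Create in place of the StringBuilder, as in the Hello World example.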

A Closer Look at Relationships

The structure of a package in the Office Open XML file formats is heavily dependent on relationships. If you create parts but fail to associate them with the package through relationships, then consumer applications (such as Word) are not able to recognize them because every part must have a relationship or a chain of relationships that associates it with its containing package.

As discussed earlier in this chapter, package relationships define an association between a package and its top-level parts. Figure 7-10 shows that the typical top-level parts in a .docx file created with Word 2007 are /docProps/app.xml, /docProps/core.xml, and /word/document.xml. Part relationships define a parent-child relationship between two parts within the same package. The part /word/document.xml typically has relationships to several different child parts, such as /word/settings.xml and /word/styles.xml.

Figure 7-10: A package defines a hierarchy of relationships in which the package is always the root.

The Office Open XML file format specification states that every part inside a package must be associated either directly or indirectly with the package itself. A part, such as /word/document.xml, is directly associated with the package through a package relationship. Another part, such as /word/styles.xml, is associated with the package indirectly because it is associated with the top-level part /word/document.xml that is, in turn, associated with the package.

A very important concept to understand is that a consumer application must be able to discover any part within a package by enumerating through its relationships. In fact, when you are writing your own applications that read packages created by other applications, such as Word 2007 and Excel 2007, you are also encouraged to discover the existing parts by enumerating relationships.
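
Discovery by enumeration can be sketched in a few lines. The following console program starts from the package relationships and follows part relationships recursively, printing each target part with the relationship type that leads to it. The class name and default path are assumptions, and the list of visited part names is a precaution that we have added in case a package contains relationships that point back to a part that was already visited.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

class RelationshipWalker {
  // Part names already expanded, so that circular relationships
  // cannot cause endless recursion.
  static List<string> visited = new List<string>();

  static void Main(string[] args) {
    // Placeholder path (an assumption); supply a real file as the first argument.
    string path = args.Length > 0 ? args[0] : @"c:\Data\Hello.docx";
    using (Package package = Package.Open(path, FileMode.Open, FileAccess.Read)) {
      // Package relationships lead to the top-level parts.
      foreach (PackageRelationship rel in package.GetRelationships()) {
        Visit(package, rel, 0);
      }
    }
  }

  static void Visit(Package package, PackageRelationship rel, int depth) {
    string indent = new string(' ', depth * 2);
    if (rel.TargetMode == TargetMode.External) {
      // External targets (such as hyperlinks) are not parts in the package.
      Console.WriteLine("{0}(external) {1}", indent, rel.TargetUri);
      return;
    }
    // Targets are relative to the source, so resolve them to a part name.
    Uri partUri = PackUriHelper.ResolvePartUri(rel.SourceUri, rel.TargetUri);
    Console.WriteLine("{0}{1} [{2}]", indent, partUri, rel.RelationshipType);

    string key = partUri.ToString();
    if (visited.Contains(key) || !package.PartExists(partUri)) {
      return;
    }
    visited.Add(key);

    // Part relationships lead to the child parts.
    PackagePart part = package.GetPart(partUri);
    foreach (PackageRelationship child in part.GetRelationships()) {
      Visit(package, child, depth + 1);
    }
  }
}
```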

Package Viewer Sample Application

As mentioned earlier, the specification for the Office Open XML file format states that all parts within a package must be discoverable through relationships. Therefore, it’s possible to write an application that inspects a package and displays all of the parts inside it.

This chapter is accompanied by a sample Windows Forms application named the Package Viewer, which is available on the companion Web site. You can open this project in Visual Studio and run it to test out the code. The Package Viewer application enables the user to open a file structured using the Office Open XML file formats. When the user opens a package file, the Package Viewer populates a TreeView control with nodes that show a package and all of its parts nested within a hierarchy of relationships, as shown in Figure 7-11. The application also provides more information about the package and specific parts when you click a node in the tree view. For each part, you can see its content type, its parent, and the relationship type that associates it with its parent.

Figure 7-11: The Package Viewer sample application enables you to inspect the parts within a package.

The Package Viewer application enables the user to select a package file by using the standard Open File dialog box. Once the user selects a file, the application enumerates through all of the package relationships to discover the top-level parts.

The Package Viewer also provides the functionality to display the contents of any XML-based parts within the package by reopening the package and acquiring stream-based access to the target part. The XML-based content of the part is then written to a temporary file. Finally, the file is loaded into the Windows Forms WebBrowser control, which displays the XML content with color coding and collapsible sections. Hopefully, you will find this sample application useful as you begin learning about the parts and the XML required to work with Office Open XML file format documents.

From our discussion so far, you should have a better idea of what’s required to generate documents for Word, Excel, and PowerPoint using the Office Open XML file formats. In theory, it’s quite simple. All that you must do is create a new package file, add the required parts, and fill them with XML content structured in accordance with the appropriate XML schemas. Although the theory is simple, getting up to speed on the practical details takes some time. It also requires that you read through the relevant sections of the Office Open XML file format specification, which can be downloaded from http://openxmldeveloper.org.

If you are going to work with Word documents, you must learn what types of parts go inside a package and how they must be structured in terms of content types and relationships. You also must learn how to generate the WordProcessingML that goes inside each of these parts. If you want to work with Excel spreadsheets, the content types and relationships are different. Instead of using WordProcessingML, you need to learn SpreadsheetML. It takes an investment on your part if you want to generate the XML required to create documents from scratch that contain things such as tables, graphics, and fancy formatting. However, the value of programmatically generating rich documents directly from backend data is often well worth the investment.

Data Binding to Word Content Controls

We conclude this chapter by showing you one more technique for creating professional-looking Word 2007 documents. However, this technique does not require you to write any code to generate WordProcessingML at run time.

We start by introducing two new features of Word 2007 that can be used when working with documents stored in the new Office Open XML file formats. The first feature is the XML Data Store, which enables you to embed one or more user-defined XML documents as parts inside a .docx file. The second feature is Content Controls, which are user interface components defined inside the /word/document.xml part that support data entry and data binding.

If you want to experiment with using Content Controls, you should first go to the Word Options dialog box in Word 2007 and enable the Show Developer Tab In The Ribbon option.

Then navigate to the Developer tab to find the set of controls that can be added to a Word document, as shown in Figure 7-12.

Figure 7-12: The Developer tab in Word 2007 provides a set of Content Controls.

Note that you can add Content Controls only to Word documents that are stored in the new .docx format. Content Controls are not supported in .doc files because they cannot be defined by using the older binary format. However, when working in the new format, you can add Content Controls to provide user input elements in a Word document. For example, you can construct a Word document that prompts the user for the pieces of information needed to complete a business document, as shown in Figure 7-13.

Figure 7-13: Content Controls can provide user input capabilities to a Word document.

Keep in mind that Content Controls have two different modes: edit mode and display mode. Edit mode enables the user to do things such as type text, pick a date from a date picker, or select an item from a drop-down list. Display mode is optimized for displaying and printing. In other words, the editing aspects of Content Controls are invisible to a user who is no longer in edit mode.

Separating Data from Presentation in Word 2007

Although the XML Data Store and Content Controls are two separate features of Word 2007 that can be used independently of one another, they provide a powerful level of synergy when used together. For example, you can embed an XML document containing customer data or an invoice inside a Word document. You can then bind Content Controls to data inside this XML document by using XPath expressions. This effectively enables you to separate your data from the formatting instructions that tell Word how to display it.

The second example of generating a .docx file in the DocumentManager4.aspx custom application page doesn’t involve writing code to generate WordProcessingML. Instead, it works with a standard XML document that holds the data required to create a customer letter. The following example demonstrates what one of these XML document instances looks like.

  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  <LitwareLetter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                 xmlns="http://litware.com/2006/letters">
    <Customer>
      <FirstName>Brian</FirstName>
      <LastName>Cox</LastName>
      <Company>Litware</Company>
      <Address>1000 Madison Ave</Address>
      <City>Las Vegas</City>
      <State>NV</State>
      <Zip>32145</Zip>
    </Customer>
    <Date>December 25, 2006</Date>
    <Body>We miss you. Please give us a call.</Body>
    <Employee>
      <Name>Angela Barbariol</Name>
    </Employee>
  </LitwareLetter>

Using the technique discussed here usually involves manually manipulating the contents of the .docx file that you intend to use as the template. When you begin building a template from a new .docx file, you should change its file extension to .zip so that you can open the package file directly and add parts by dragging them into the package by using Windows Explorer.

Inside the sample code for this chapter is a directory named LitwareLetterTemplate. Inside this directory is a sample template file named LetterTemplate.docx that contains all of the necessary parts and relationships for binding Content Controls to an XML document in the XML Data Store. We encourage you to inspect LetterTemplate.docx with the Package Viewer application because this allows you to see quickly how all of the pieces fit together.

The XML document shown in this section is embedded as a part inside the LetterTemplate.docx file using the URI /customXml/item1.xml. Note that you can name this part something other than item1.xml. However, in this example, we wanted to be consistent with the names that Word 2007 uses when it creates and renames parts in the XML Data Store.

The next thing you must do is create a way to identify the content within the customer XML document. You do this by defining a datastoreItem with an identifying GUID, which is accomplished by creating a part named /customXml/itemProps1.xml and establishing a relationship between it and its parent part /customXml/item1.xml. The following example displays what the contents of /customXml/itemProps1.xml should look like.

  <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
  <ds:datastoreItem ds:item
    xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml">
    <ds:schemaRefs>
      <ds:schemaRef ds:uri="http://www.w3.org/2001/XMLSchema" />
      <ds:schemaRef ds:uri="http://litware.com/2006/letters" />
    </ds:schemaRefs>
  </ds:datastoreItem>
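If you would rather generate this part in code than type it by hand, the following sketch builds an equivalent document with LINQ to XML. It is an illustration under stated assumptions: the class and method names are hypothetical, the identifying GUID is assumed to be carried in a ds:itemID attribute written in braces, and LINQ to XML requires .NET 3.5 or later rather than the .NET 3.0 used elsewhere in this chapter.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the book's sample code.
public static class DataStoreItemBuilder
{
    static readonly XNamespace Ds =
        "http://schemas.openxmlformats.org/officeDocument/2006/customXml";

    // Builds the itemProps document for one datastoreItem.
    // itemId is the GUID that data bindings will later refer to.
    public static XDocument BuildItemProps(Guid itemId, params string[] schemaUris)
    {
        XElement schemaRefs = new XElement(Ds + "schemaRefs");
        foreach (string uri in schemaUris)
        {
            schemaRefs.Add(new XElement(Ds + "schemaRef",
                                        new XAttribute(Ds + "uri", uri)));
        }

        return new XDocument(
            new XDeclaration("1.0", "UTF-8", "no"),
            new XElement(Ds + "datastoreItem",
                // Declare the ds prefix so the output uses ds: qualified names.
                new XAttribute(XNamespace.Xmlns + "ds", Ds.NamespaceName),
                // Assumption: the GUID is written in {XXXXXXXX-...} form.
                new XAttribute(Ds + "itemID",
                               itemId.ToString("B").ToUpperInvariant()),
                schemaRefs));
    }
}
```

Calling BuildItemProps(Guid.NewGuid(), "http://www.w3.org/2001/XMLSchema", "http://litware.com/2006/letters") and saving the result into the /customXml/itemProps1.xml part would give a fresh identifier for a new template. Keep the GUID, because the data-binding elements must reference the same value.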

The next step is to create a part relationship between /word/document.xml and /customXml/item1.xml. The relationship should define the parent part as /word/document.xml and the child part as /customXml/item1.xml, and the relationship type should be established using the following string.

 http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml
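As an alternative to editing the .rels files by hand, both relationships can be created with the packaging API. The sketch below assumes that the three parts already exist in the package. The class name is hypothetical, and the customXmlProps relationship type used for the link from item1.xml to itemProps1.xml is our assumption about what Word expects, because the text above names only the customXml type.

```csharp
using System;
using System.IO.Packaging; // WindowsBase.dll on .NET 3.0; a separate package on newer runtimes

// Hypothetical helper, not part of the book's sample code.
public static class CustomXmlRelationships
{
    public const string CustomXmlRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml";

    // Assumption: relationship type linking a data store item to its properties part.
    public const string CustomXmlPropsRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXmlProps";

    public static void Link(Package package)
    {
        Uri documentUri = new Uri("/word/document.xml", UriKind.Relative);
        Uri itemUri = new Uri("/customXml/item1.xml", UriKind.Relative);
        Uri propsUri = new Uri("/customXml/itemProps1.xml", UriKind.Relative);

        // Parent: /word/document.xml, child: /customXml/item1.xml
        package.GetPart(documentUri).CreateRelationship(
            itemUri, TargetMode.Internal, CustomXmlRelType);

        // Parent: /customXml/item1.xml, child: /customXml/itemProps1.xml
        package.GetPart(itemUri).CreateRelationship(
            propsUri, TargetMode.Internal, CustomXmlPropsRelType);
    }
}
```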

After you have completed these steps to set up a user-defined XML document as an identifiable datastoreItem, you can now access its data from /word/document.xml. For example, you can write VBA code behind a macro-enabled Word 2007 document that retrieves a CustomXmlPart object from the ActiveDocument object and pulls the required customer data out of the embedded XML document. In our case, we are going to create Content Controls in /word/document.xml and bind them to particular elements within the XML document with the customer data.

Unfortunately, Word 2007 provides no way through its user interface to bind Content Controls to elements within the XML Data Store. Therefore, you are required to make direct edits to the /word/document.xml part by using an XML editor, such as Visual Studio 2005. The following example shows the WordprocessingML element that you can add to the /word/document.xml part to create a binding to the FirstName element within the Customer element of the user-defined XML document.

  <w:sdtPr>
    <w:dataBinding
      w:prefixMappings="xmlns:ns0='http://litware.com/2006/letters'"
      w:xpath="/ns0:LitwareLetter[1]/ns0:Customer[1]/ns0:FirstName"
      w:storeItem />
  </w:sdtPr>

The data-binding element contains a storeItemID attribute that references the GUID of the datastoreItem that holds the customer data. The data-binding element also contains an xpath attribute that defines an XPath expression to bind the Content Control to a specific element within the user-defined XML file. Once you have updated the /word/document.xml file to contain all of the data-binding elements you need to display the relevant data from the user-defined XML file, you can construct a Word document with bound controls that can read from and write back to the item1.xml part.
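When a template contains many bound controls, typing each data-binding element by hand becomes error-prone. The following sketch shows one way to generate the element with LINQ to XML (.NET 3.5 or later). The class and method names are hypothetical, and the GUID passed in is assumed to be the same value stored in the datastoreItem, written in braces.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the book's sample code.
public static class DataBindingBuilder
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a w:sdtPr element that holds a single w:dataBinding child.
    public static XElement BuildSdtPr(string prefixMappings, string xpath,
                                      Guid storeItemId)
    {
        return new XElement(W + "sdtPr",
            // Declare the w prefix so the fragment serializes with w: names.
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XElement(W + "dataBinding",
                new XAttribute(W + "prefixMappings", prefixMappings),
                new XAttribute(W + "xpath", xpath),
                // Assumption: the GUID is written in {XXXXXXXX-...} form.
                new XAttribute(W + "storeItemID",
                               storeItemId.ToString("B").ToUpperInvariant())));
    }
}
```

For the FirstName binding, the arguments would be the prefix mapping string xmlns:ns0='http://litware.com/2006/letters', the XPath expression /ns0:LitwareLetter[1]/ns0:Customer[1]/ns0:FirstName, and the GUID of the datastoreItem. The resulting element still has to be placed inside the Content Control markup in /word/document.xml.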

Although it might take some time to become fluent with the technique of manually constructing Word documents that contain user-defined XML files and bound Content Controls, it is often well worth the effort. This design approach provides an elegant solution to separating your data from the presentation of that data. Once you have created a minimal Word document with bound Content Controls, you can do the rest of the work in making the document look professional directly within Word. For example, you can add literal text and a logo just like any other Word user. You can create new tables and sections and simply drag the Content Controls where you would like them to appear.

It’s also important to keep in mind that bound Content Controls can provide two-way synchronization with data in a user-defined XML file. Although users can see whatever data you have added to the user-defined XML file, they can also update this data. When a user updates the data in a Content Control and saves the document, those changes are written back to the user-defined XML file. This makes it very easy to extract the updated data in an XML format that can conform to any XML schema that you would like to work with.
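Reading the updated data back out takes only a few lines with the packaging API. This sketch assumes that the data is still stored in a part named /customXml/item1.xml after the user saves the document. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET 3.0; a separate package on newer runtimes
using System.Xml;

// Hypothetical helper, not part of the book's sample code.
public static class DataStoreReader
{
    // Opens a .docx held in a readable, seekable stream and returns the
    // user-defined XML document stored in the XML Data Store.
    public static XmlDocument LoadCustomXml(Stream docxStream)
    {
        using (Package package = Package.Open(docxStream, FileMode.Open,
                                              FileAccess.Read))
        {
            // Assumption: the part name was not changed when the file was saved.
            Uri itemUri = new Uri("/customXml/item1.xml", UriKind.Relative);
            PackagePart part = package.GetPart(itemUri);
            using (Stream partStream = part.GetStream(FileMode.Open,
                                                      FileAccess.Read))
            {
                XmlDocument data = new XmlDocument();
                data.Load(partStream);
                return data;
            }
        }
    }
}
```

Because the elements in the letter example live in the http://litware.com/2006/letters namespace, any XPath query against the returned document needs an XmlNamespaceManager that maps a prefix to that namespace.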

Updating the XML Data Store Programmatically

You have seen how to bind Content Controls to a user-defined XML document in the XML Data Store. The final step is to write code that generates a new instance of that XML document with the data for a particular customer letter and then overwrites the XML document inside an existing Word document template that is set up with the proper bound Content Controls.

Now let’s return to the custom application page named DocumentManager4.aspx and discuss the code behind the second command button’s OnClick event handler. This code loads the pre-defined document template named LetterTemplate.docx into a MemoryStream object and then opens it with the .NET 3.0 packaging API.

  // copy the template from the document library into a memory stream
  Stream documentStream = new MemoryStream();
  SPFile DocumentTemplate =
         site.GetFile(libraryPath + "/Forms/LetterTemplate.docx");
  Stream documentTemplateStream = DocumentTemplate.OpenBinaryStream();
  BinaryReader DocumentTemplateReader;
  DocumentTemplateReader = new BinaryReader(documentTemplateStream);
  BinaryWriter DocumentWriter = new BinaryWriter(documentStream);
  DocumentWriter.Write(
      DocumentTemplateReader.ReadBytes((int)documentTemplateStream.Length));
  DocumentWriter.Flush();
  DocumentTemplateReader.Close();
  documentTemplateStream.Dispose();

  // open .docx file in memory stream as package file
  Package package = Package.Open(documentStream,
                                 FileMode.Open,
                                 FileAccess.ReadWrite);

  // retrieve package part with XML data
  Uri uriData = new Uri("/customXml/item1.xml", UriKind.Relative);
  PackagePart LetterXmlPart = package.GetPart(uriData);

  // get stream for item1.xml and delete any content that's inside it
  Stream LetterXmlPartStream = LetterXmlPart.GetStream();
  LetterXmlPartStream.SetLength(0);

Once LetterTemplate.docx is loaded into a MemoryStream object, the code in this listing obtains stream-based access to the /customXml/item1.xml part and deletes any existing content inside. All that is now required is to write a new XML document with content for a new customer letter into this part. In our example, we create the new XML document instance by using a schema-generated class named LitwareLetter that is defined in the source file LitwareLetter.cs. Once the code creates an instance of the LitwareLetter class and initializes it with data for a specific customer letter, it then serializes the XML document into item1.xml by using an XmlSerializer object.

  // create objects from schema-generated types
  LitwareLetter letter = new LitwareLetter();
  // code to populate LitwareLetter object omitted for clarity
  XmlSerializer serializer = new XmlSerializer(typeof(LitwareLetter));
  serializer.Serialize(LetterXmlPartStream, letter);

  // close package file to finalize writing to memory buffer
  LetterXmlPartStream.Close();
  package.Close();

  // add new .docx file into target library
  site.Files.Add(documentPath, documentStream, true);

As you can see, there is also code to close the stream used to access item1.xml and close the package file itself. Finally, a call to the Add method of the current SPWeb object’s Files collection is used to actually save the new .docx file into the document library named Customer Letters. The resulting Word document produced by this code is shown in Figure 7-14. The resulting document is richly formatted, but our code does not need to deal with that aspect because the formatting is part of the document template and is separated from the letter content.
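The LitwareLetter class itself is not listed in this chapter. To make the serialization step concrete, here is a hand-written approximation of what a schema-generated class for the letter XML shown earlier might look like. The member names are inferred from that XML instance. The LetterSchema and LetterWriter helpers and the nested class names are hypothetical, and the real LitwareLetter.cs may differ.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical holder for the namespace used by the letter documents.
public static class LetterSchema
{
    public const string Ns = "http://litware.com/2006/letters";
}

// Approximation of a schema-generated root class; fields serialize
// as child elements in declaration order.
[XmlType(AnonymousType = true, Namespace = LetterSchema.Ns)]
[XmlRoot("LitwareLetter", Namespace = LetterSchema.Ns, IsNullable = false)]
public class LitwareLetter
{
    public LitwareLetterCustomer Customer;
    public string Date;
    public string Body;
    public LitwareLetterEmployee Employee;
}

[XmlType(AnonymousType = true, Namespace = LetterSchema.Ns)]
public class LitwareLetterCustomer
{
    public string FirstName;
    public string LastName;
    public string Company;
    public string Address;
    public string City;
    public string State;
    public string Zip;
}

[XmlType(AnonymousType = true, Namespace = LetterSchema.Ns)]
public class LitwareLetterEmployee
{
    public string Name;
}

// Hypothetical helper that mirrors the serialization call in the listing above.
public static class LetterWriter
{
    public static void Write(Stream target, LitwareLetter letter)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LitwareLetter));
        serializer.Serialize(target, letter);
    }
}
```

With classes shaped like these, populating a letter amounts to creating the nested Customer and Employee objects and assigning their fields before the call to Serialize. Every element is written in the letters namespace, which is what the XPath expressions in the data bindings expect.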

This concludes our introduction to generating documents in the 2007 Microsoft Office System by using the Office Open XML file formats. Although the examples shown here are solely presented as an introduction, you can see that quite a bit of potential exists for creating rich business solutions and that you can pack quite a punch by using the Office Open XML file formats together with WSS and MOSS. Once you create your documents, you have a place to store and manage them while utilizing all of the valuable aspects of document management in SharePoint Technologies such as collaboration, search, and archiving.

Figure 7-14: The DocumentManager4.aspx custom application page demonstrates the power of using Content Controls together with the XML Data Store to generate formatted Word documents.




Inside Microsoft Windows Sharepoint Services Version 3
ISBN: 735623201
EAN: N/A
Year: 2007
Pages: 92
