5.2 Creating a Smart Document Solution

The document shown in Figures 5-1 and 5-2 was created using a fairly simple Smart Document solution. The remainder of the chapter walks through each of the steps involved in building a similar application. While far from robust, it touches on each of the major capabilities of the Smart Document API and will hopefully set your imagination in motion.

Figure 5-1. Article created with Smart Document solution

Figure 5-2. Article with XML Tag View on

Smart Document solutions can be created using Visual Basic 6, Visual Basic .NET, C++ 6, or C# .NET. The examples in this section are all written using VB .NET; however, the Microsoft Office 2003 Smart Document SDK includes examples in all four languages.

A number of articles relating to the creation of Smart Documents can be found on the MSDN web site at http://msdn.microsoft.com/office.

The following components (schemas, XML instance, templates, and styles) will be used as the basis for the examples in this section. The schema is fairly simplistic and included for demonstration purposes only; the intended usage is for magazine article submissions.

5.2.1 Schemas

As mentioned in Chapter 2, Microsoft Office 2003 supports only W3C XML schemas. If you're working with existing SGML or XML document instances, it's likely that you'll have DTDs rather than schemas associated with these instances. This section provides some insight into migrating or extending existing XML environments to Microsoft Word. While this is far from an exact science, the following guidelines will help you avoid problem areas. Only through experimentation will you be able to determine what works best for your particular applications.

5.2.1.1 Existing Word environments

Chances are good that if your users are already using Microsoft Word to author, revise, and maintain their documents, you'll be able to create a schema and build a suitable XML-based Smart Document solution. Documents that incorporate information from external sources can take advantage of database connectivity and web services to automatically populate information and ensure that it is always current.

5.2.1.2 Existing XML (or SGML) environments

Numerous organizations already take advantage of structured markup for document authoring, editing, and delivery. If you are planning to develop a smart document solution using an existing DTD or schema, give careful consideration to the applicability of such schemas to the goals of the tasks to be designed in Word. If you are working with a schema that is fairly complex, it might be more appropriate to create subsets and build a suite of solutions focused on specific tasks.

Characteristics of a complex schema include, but are not limited to:

More than a handful of elements allowed at common insertion points
Numerous elements with similar meaning that can easily be confused
Elements rarely used
Deep structures (common to DTDs and schemas that are designed to produce multi-volume information sets)

If your organization has an existing repository of XML documents, an analysis of the markup actually used versus what is allowed by the schema can be a valuable resource. Not only will this aid in the development of any Smart Document solutions, it can also serve to simplify any other tools already in existence that must be supported.
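The usage analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory documents and the `allowed` set are hypothetical stand-ins for a real repository and for the element names declared in a real schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def element_usage(documents):
    """Count how often each element name occurs across a list of XML strings."""
    counts = Counter()
    for text in documents:
        for node in ET.fromstring(text).iter():
            counts[local_name(node.tag)] += 1
    return counts


def unused_elements(allowed, counts):
    """Return, sorted, the allowed element names that never occur."""
    return sorted(name for name in allowed if counts[name] == 0)


# Hypothetical repository of two documents and a hypothetical allowed list.
docs = [
    "<Article><Title>A</Title><Para>x</Para><Para>y</Para></Article>",
    "<Article><Title>B</Title><Para>z</Para></Article>",
]
allowed = {"Article", "Title", "Para", "Sidebar", "Footnote"}

usage = element_usage(docs)
print(usage.most_common())
print(unused_elements(allowed, usage))
```

Elements that turn up in the unused list are candidates for leaving out of a task-focused subset.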

Refer to Appendix D for information on converting already-existing DTDs to schemas.

5.2.1.3 Starting from scratch

If your organization doesn't already have DTDs or schemas in place, you will need to either create your own, or find schemas in the public domain suited to the task at hand. It's important to keep the goals of the project in mind while developing schemas; these goals will play an important role in determining the level of granularity to be supported and the specificity of the markup itself, while ensuring that markup will be interchangeable with any other known processes.

5.2.1.3.1 Customer-specific DTDs or schemas

The process of performing an analysis on an information set for the purpose of creating schemas can be quick and easy, drawn out and complex, or anywhere in between. It depends on several factors:

Depth of information set (very complex markup models to support external processes, such as reference works or aircraft documentation versus newspaper articles or consumer-oriented user guides)
Breadth of information set (multiple information delivery types from a single source, such as user guides, administrative manuals, reference manuals, training materials, and marketing materials versus single-purpose documents such as sales proposals or white papers)
External information sets (compatibility with other data sets that will become inputs, integrated with, or accept result data from the solution)

Any and all potential users should have some input into the analysis; it is common for different departments to use different names for similar components, or to view data in very different ways from each other. These differences do not need to be reconciled; instead, unique Smart Document solutions can be created that are targeted to the various groups. One common information set with XML at its core; different frontend applications designed to meet the needs of the individual information worker: that's the power of XML!

5.2.1.3.2 DTDs or schemas developed by committee

Organizations often use industry-standard or consortia-developed schemas. One mistake to avoid is choosing an existing schema rather than developing your own because it seems like the easier thing to do. Before making this decision, it is important to analyze your organization's information set and to document the goals and objectives of your overall project. The results can then be evaluated against the existing schema to determine whether or not the chosen schema is appropriate for your organization. Chances are that once you've done the analysis work, you'll discover that creating the actual schema is a simple task, and your organization won't be dependent on an outside group for maintenance and revisions.

When using a particular schema in order to meet governmental or corporate requirements, it is usually possible to create a simplified subset for your particular application. The subset will be valid within the overall schema, yet the developers and end users will not need to deal with some of the inherent complexities of these behemoths. Another alternative is to create a mapping from your internal schema to the one required. This will allow your end users to work in an environment that is familiar to them, yet still enable your organization to meet the stated requirements by transforming the resulting information set.
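As a rough illustration of the mapping alternative, the following Python sketch renames elements from an internal vocabulary to a required one. The element names in `NAME_MAP` and the sample memo are hypothetical, and a production mapping would more likely be written as an XSLT transformation that also handles structural differences, not just names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from internal element names to the required schema's names.
NAME_MAP = {"Memo": "document", "Subject": "title", "Body": "section"}


def rename_elements(root, name_map):
    """Rename elements in place; names without a mapping are left unchanged."""
    for node in root.iter():
        node.tag = name_map.get(node.tag, node.tag)
    return root


internal = ET.fromstring(
    "<Memo><Subject>Q3</Subject><Body><Para>Hi</Para></Body></Memo>"
)
external = rename_elements(internal, NAME_MAP)
print(ET.tostring(external, encoding="unicode"))
```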

Microsoft Word has always been suited to a certain class of documents, and this hasn't really changed in Office 2003. If you are currently using Word to produce your documents, then chances are that you'll be able to build an XML-enabled smart document solution to accommodate it. If your documents currently require a more sophisticated composition tool, such as Adobe's FrameMaker, or a full-time dedicated XML editor, such as Arbortext's EpicEditor, then Word, even with Smart Documents, most likely will not be able to support your requirements unless you are creating a simplified solution.

5.2.1.4 The SDArticle schema

The SDArticle schema is by no means comprehensive, but it is enough to show how Smart Documents work. As shown in Example 5-1, it consists of an article root element, followed by a title and introductory paragraphs. From there the article is divided into four levels of sections, which contain a mix of paragraphs, lists, warnings, notes, and code blocks. Inline elements consist of emphasis, subscript, superscript, and code.

Example 5-1. The SDArticle example schema (whitespace added for readability)

<xs:schema targetNamespace="http://www.office-xml.com/ns/sdarticle"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.office-xml.com/ns/sdarticle"
    elementFormDefault="qualified">

 <xs:element name='Article'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='ArticleTitle'/>
    <xs:choice maxOccurs='unbounded'>
     <xs:element ref='Para'/>
     <xs:element ref='Section1'/>
    </xs:choice>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='ArticleTitle'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='BulletList'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Item' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Code'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='CodeExample'>
  <xs:complexType mixed='true'>
   <xs:choice minOccurs='0' maxOccurs='unbounded'>
    <xs:element ref='Emphasis'/>
    <xs:element ref='Superscript'/>
    <xs:element ref='Subscript'/>
   </xs:choice>
  </xs:complexType>
 </xs:element>

 <xs:element name='Definition'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Para' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Emphasis'>
  <xs:complexType mixed='true'>
   <xs:attribute name='CDATA' default='italic'>
    <xs:simpleType>
     <xs:restriction base='xs:string'>
      <xs:enumeration value='bold'/>
      <xs:enumeration value='italic'/>
      <xs:enumeration value='underscore'/>
     </xs:restriction>
    </xs:simpleType>
   </xs:attribute>
  </xs:complexType>
 </xs:element>

 <xs:element name='Heading1'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Heading2'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Heading3'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Heading4'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Item'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Para' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Note'>
  <xs:complexType>
   <xs:choice maxOccurs='unbounded'>
    <xs:element ref='Para'/>
    <xs:element ref='NumberList'/>
    <xs:element ref='BulletList'/>
   </xs:choice>
  </xs:complexType>
 </xs:element>

 <xs:element name='NumberList'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Item' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Para'>
  <xs:complexType mixed='true'>
   <xs:choice minOccurs='0' maxOccurs='unbounded'>
    <xs:element ref='Code'/>
    <xs:element ref='Emphasis'/>
    <xs:element ref='Superscript'/>
    <xs:element ref='Subscript'/>
   </xs:choice>
  </xs:complexType>
 </xs:element>

 <xs:element name='Section1'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Heading1'/>
    <xs:choice minOccurs='0' maxOccurs='unbounded'>
     <xs:element ref='Para'/>
     <xs:element ref='CodeExample'/>
     <xs:element ref='VariableList'/>
     <xs:element ref='NumberList'/>
     <xs:element ref='BulletList'/>
     <xs:element ref='Note'/>
     <xs:element ref='Warning'/>
    </xs:choice>
    <xs:element ref='Section2' minOccurs='0' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Section2'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Heading2'/>
    <xs:choice minOccurs='0' maxOccurs='unbounded'>
     <xs:element ref='Para'/>
     <xs:element ref='CodeExample'/>
     <xs:element ref='VariableList'/>
     <xs:element ref='NumberList'/>
     <xs:element ref='BulletList'/>
     <xs:element ref='Note'/>
     <xs:element ref='Warning'/>
    </xs:choice>
    <xs:element ref='Section3' minOccurs='0' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Section3'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Heading3'/>
    <xs:choice minOccurs='0' maxOccurs='unbounded'>
     <xs:element ref='Para'/>
     <xs:element ref='CodeExample'/>
     <xs:element ref='VariableList'/>
     <xs:element ref='NumberList'/>
     <xs:element ref='BulletList'/>
     <xs:element ref='Note'/>
     <xs:element ref='Warning'/>
    </xs:choice>
    <xs:element ref='Section4' minOccurs='0' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Section4'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Heading4'/>
    <xs:choice minOccurs='0' maxOccurs='unbounded'>
     <xs:element ref='Para'/>
     <xs:element ref='CodeExample'/>
     <xs:element ref='VariableList'/>
     <xs:element ref='NumberList'/>
     <xs:element ref='BulletList'/>
     <xs:element ref='Note'/>
     <xs:element ref='Warning'/>
    </xs:choice>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Subscript'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Superscript'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='Term'>
  <xs:complexType mixed='true'>
  </xs:complexType>
 </xs:element>

 <xs:element name='VariableEntry'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='Term'/>
    <xs:element ref='Definition'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='VariableList'>
  <xs:complexType>
   <xs:sequence>
    <xs:element ref='VariableEntry' maxOccurs='unbounded'/>
   </xs:sequence>
  </xs:complexType>
 </xs:element>

 <xs:element name='Warning'>
  <xs:complexType>
   <xs:choice maxOccurs='unbounded'>
    <xs:element ref='Para'/>
    <xs:element ref='NumberList'/>
    <xs:element ref='BulletList'/>
   </xs:choice>
  </xs:complexType>
 </xs:element>

</xs:schema>

While having a schema is important, it is also a good idea to create a sample instance for development and testing that incorporates each of the elements (and their possible attribute values) and the context in which they can occur. This helps to ensure that you don't leave anything out, whether in your style setup, your actions pane, or your transformations. Example 5-2 shows just such a sample instance.

Example 5-2. A sample document conforming to the SDArticle schema

<?xml version="1.0" encoding="UTF-8"?>
<Article xmlns="http://www.office-xml.com/ns/sdarticle">
 <ArticleTitle>Article Title</ArticleTitle>
 <Para>This is the introductory paragraph.</Para>
 <Section1>
  <Heading1>Heading 1</Heading1>
  <Para>This is a paragraph. ... This is a paragraph.
    <Emphasis CDATA="italic">This sentence is in italics.</Emphasis>
    This is a paragraph.<Superscript>1</Superscript>
  </Para>
  <CodeExample>Code Example Code Example Code Example
   Code Example Code Example Code Example
   Code Example Code Example Code Example</CodeExample>
  <VariableList>
   <VariableEntry>
    <Term>Term1</Term>
    <Definition>
     <Para>Definition of term1.</Para>
    </Definition>
   </VariableEntry>
   <VariableEntry>
    <Term>Term2</Term>
    <Definition>
     <Para>Definition of term2.</Para>
    </Definition>
   </VariableEntry>
  </VariableList>
  <NumberList>
   <Item>
    <Para>Numbered list item 1</Para>
   </Item>
   <Item>
    <Para>Numbered list item 2</Para>
   </Item>
   ...
  </NumberList>
  <BulletList>
   <Item>
    <Para>Bulleted list item 1</Para>
   </Item>
   <Item>
    <Para>Bulleted list item 2</Para>
   </Item>
   ...
  </BulletList>
  <Note>
   <Para>This is a note. ... This is a note.</Para>
   <NumberList>
    <Item>
     <Para>Numbered list inside a note - item 1.</Para>
    </Item>
    <Item>
     <Para>Numbered list inside a note - item 2.</Para>
    </Item>
   </NumberList>
  </Note>
  <Warning>
   <Para>This is a warning. ... This is a warning.</Para>
   <BulletList>
    <Item>
     <Para>Bulleted list inside a warning - item 1</Para>
    </Item>
    <Item>
     <Para>Bulleted list inside a warning - item 2</Para>
    </Item>
   </BulletList>
  </Warning>
  <Section2>
   <Heading2>Heading 2</Heading2>
   <Para>This is a paragraph. <Emphasis CDATA="bold">This sentence
     is bold.</Emphasis> This is a paragraph.<Superscript>2</Superscript>
   </Para>
   ...
   <Section3>
    <Heading3>Heading 3</Heading3>
    <Para>This is a paragraph. <Emphasis CDATA="underscore">This sentence
      is underscored.</Emphasis> This is a paragraph.
     <Superscript>3</Superscript>
    </Para>
    ...
    <Section4>
     <Heading4>Heading 4</Heading4>
     <Para>This is a paragraph. <Code>This is inline code.</Code>
       This is a paragraph.<Superscript>4</Superscript>
     </Para>
     ...
    </Section4>
   </Section3>
  </Section2>
 </Section1>
</Article>

Figure 5-3 shows Example 5-2 loaded into Word 2003.

Figure 5-3. Sample instance in Word 2003
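One way to confirm that a sample instance really does exercise every element is to compare the names declared in the schema with the names that occur in the instance. The Python sketch below is illustrative only: `SCHEMA` and `SAMPLE` are deliberately cut-down, hypothetical stand-ins for Examples 5-1 and 5-2, and the check looks at element names alone, not at attribute values or context.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def declared_elements(schema_text):
    """Return the names of all elements declared with name='...' in a schema."""
    root = ET.fromstring(schema_text)
    return {el.get("name") for el in root.iter(XS + "element") if el.get("name")}


def used_elements(instance_text):
    """Return the local names of all elements that occur in an instance."""
    root = ET.fromstring(instance_text)
    return {node.tag.rsplit("}", 1)[-1] for node in root.iter()}


def missing_from_sample(schema_text, instance_text):
    """List declared elements that the sample instance never uses."""
    return sorted(declared_elements(schema_text) - used_elements(instance_text))


# Cut-down stand-ins for the real schema and sample instance.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.office-xml.com/ns/sdarticle"
    xmlns="http://www.office-xml.com/ns/sdarticle"
    elementFormDefault="qualified">
  <xs:element name="Article">
    <xs:complexType><xs:sequence>
      <xs:element ref="ArticleTitle"/>
      <xs:element ref="Para" maxOccurs="unbounded"/>
    </xs:sequence></xs:complexType>
  </xs:element>
  <xs:element name="ArticleTitle"><xs:complexType mixed="true"/></xs:element>
  <xs:element name="Para"><xs:complexType mixed="true"/></xs:element>
  <xs:element name="Note"><xs:complexType mixed="true"/></xs:element>
</xs:schema>"""

SAMPLE = """<Article xmlns="http://www.office-xml.com/ns/sdarticle">
  <ArticleTitle>Title</ArticleTitle><Para>Text</Para>
</Article>"""

print(missing_from_sample(SCHEMA, SAMPLE))
```

Here the stand-in sample never uses Note, so that name is reported as missing.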

5.2.2 Templates

Templates are keepers of styles. It is not uncommon to have several templates, each using the same set of style names, but with different formatting characteristics and page layouts defined in each. This allows the same XML schema, transformations, and Smart Document code to be used to create multiple document types.

5.2.3 Styles

The most common way to associate formatting characteristics with XML elements is through the use of styles. A style is merely shorthand for any number of individual traits, such as font, point size, leading, indent, pre-space, post-space, widow/orphan rules, hyphenation rules, and the like. While it is possible to use individual codes (often referred to as primitives) to achieve the desired visual appearance, doing so is typically avoided.

When creating a Smart Document solution, a set of styles should be created that conforms to the desired look. You will need to create a separate style for each level of heading, for various types of paragraphs, and for any other unique components that are part of your document set. You should also create character styles to apply inline formatting characteristics such as bold, bold italic, superscripts, and the like. Office 2003 allows styles to be protected; by creating named styles for each type of formatting required you can prevent the end user from creating new styles, modifying existing styles, and using the formatting icons on the toolbar, ensuring a consistent appearance for your documents.

For more information on creating styles and templates, refer to Walter Glenn's Word 2000 in a Nutshell and Word Pocket Guide (O'Reilly).

The sample application will need several styles. Each of these styles will be applied to the document based on the particular element. Elements alone will not be sufficient to identify the appropriate style; instead, we'll need to evaluate the element in the context of its surroundings: its parent, ancestors, and siblings. The paragraph style names (and their associated schema elements) are listed in Table 5-1.

Table 5-1. Paragraph styles
Paragraph style name	Element-in-context
ArticleTitle	ArticleTitle
SectionHead1	Heading1
SectionHead2	Heading2
SectionHead3	Heading3
SectionHead4	Heading4
ParagraphDefault	Para
NumberListItem	<NumberList><Item><Para>
BulletListItem	<BulletList><Item><Para>
Note	<Note><Para>
NoteNumberListItem	<Note><NumberList><Item><Para>
NoteBulletListItem	<Note><BulletList><Item><Para>
Warning	<Warning><Para>
WarningNumberListItem	<Warning><NumberList><Item><Para>
WarningBulletListItem	<Warning><BulletList><Item><Para>
VariableListEntry	<VariableEntry>
CodeBlock	<CodeExample>
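The element-in-context lookup for Para can be sketched as a table of ancestor paths, tried from most specific to least specific. This Python sketch is an illustration and not the chapter's actual implementation: it covers only the Para rows of Table 5-1, ignores namespaces, and the `styled_paragraphs` helper and sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# Most specific contexts first; style names follow Table 5-1.
PARAGRAPH_STYLES = [
    (("Note", "NumberList", "Item", "Para"), "NoteNumberListItem"),
    (("Note", "BulletList", "Item", "Para"), "NoteBulletListItem"),
    (("Warning", "NumberList", "Item", "Para"), "WarningNumberListItem"),
    (("Warning", "BulletList", "Item", "Para"), "WarningBulletListItem"),
    (("NumberList", "Item", "Para"), "NumberListItem"),
    (("BulletList", "Item", "Para"), "BulletListItem"),
    (("Note", "Para"), "Note"),
    (("Warning", "Para"), "Warning"),
    (("Para",), "ParagraphDefault"),
]


def style_for(path):
    """Return the first style whose context matches the end of the element path."""
    for context, style in PARAGRAPH_STYLES:
        if tuple(path[-len(context):]) == context:
            return style
    return None


def styled_paragraphs(root):
    """Yield (style, text) for every Para, tracking ancestors during the walk."""
    def walk(node, path):
        path = path + [node.tag]
        if node.tag == "Para":
            yield style_for(path), (node.text or "").strip()
        for child in node:
            yield from walk(child, path)
    yield from walk(root, [])


# Hypothetical fragment (namespace omitted to keep the sketch short).
fragment = ET.fromstring(
    "<Section1><Para>a</Para>"
    "<Note><Para>b</Para><NumberList><Item><Para>c</Para></Item></NumberList></Note>"
    "<BulletList><Item><Para>d</Para></Item></BulletList></Section1>"
)
result = list(styled_paragraphs(fragment))
print(result)
```

Ordering the table from longest context to shortest matters: a Para inside a list inside a Note would otherwise be claimed by the plain list rule.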

Character styles, listed in Table 5-2, are also necessary. Note that several styles are determined by an attribute value rather than by an element's positioning within the overall structure of the document instance.

Table 5-2. Character styles
Character style name	Element-in-context
Italic	<Emphasis CDATA="italic">
Bold	<Emphasis CDATA="bold">
Underscore	<Emphasis CDATA="underscore">
Superscript	<Superscript>
Subscript	<Subscript>
InlineCode	<Code>
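Selecting a character style from an attribute value is a simple dictionary lookup. The sketch below assumes the attribute is named CDATA, as it is in the schema in Example 5-1, and applies the schema's default of italic by hand, because ElementTree does not read schema defaults. The function name and the sample paragraph are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute-driven styles for Emphasis, element-driven styles for the rest.
EMPHASIS_STYLES = {"italic": "Italic", "bold": "Bold", "underscore": "Underscore"}
ELEMENT_STYLES = {
    "Superscript": "Superscript",
    "Subscript": "Subscript",
    "Code": "InlineCode",
}


def character_style(element, attr="CDATA"):
    """Return the character style name for an inline element, or None."""
    if element.tag == "Emphasis":
        # Fall back to the schema default when the attribute is absent.
        return EMPHASIS_STYLES.get(element.get(attr, "italic"))
    return ELEMENT_STYLES.get(element.tag)


para = ET.fromstring(
    '<Para>x <Emphasis CDATA="bold">b</Emphasis><Emphasis>i</Emphasis>'
    "<Code>c</Code><Superscript>1</Superscript></Para>"
)
styles = [character_style(child) for child in para]
print(styles)
```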

As long as you keep the names of your styles consistent, you will be able to use the same transformations and smart document solution code with multiple styles and templates.

Our sample XML instance in Word 2003, with styles associated as indicated above and with the Styles and Formatting task pane displayed, is shown in Figure 5-4.

Figure 5-4. Sample XML instance in Word 2003

5.2.4 Shell Instance

Many Word templates contain placeholder text; that is, text that describes to the end user the type of content that is to be inserted at a particular location within the document. When creating a template for a Smart Document solution, the template should include a shell XML instance, containing at least the top-level element that will be used for the particular document type as well as any required elements and structure guidelines. When tags are turned off (which is anticipated to be the default mode for most Smart Document applications), the user will see, instead, placeholder text. Not only does this serve as a form of help, it also ensures that the information worker knows exactly where content is allowed within the XML document structure. Once the shell is in place, the Document Actions Task Pane will take over the job of displaying the various options that are allowed at any particular point within the instance.

5.2.5 Boilerplate

Another common feature of a template is boilerplate text. This may be default header/footer content, legal notices, company descriptions, or any other information that is routinely included as part of the particular document type. Storing the content directly in the template means it will be included automatically each time the template is used and also provides a single location for updating.

A template, shown in Figure 5-5, has been created that contains the requisite page layout information along with the styles listed in Tables 5-1 and 5-2. Since styles are linked to specific XML elements, the styles have been protected, meaning that no additional styles can be added to the document, the styles cannot be changed, and only those styles listed can be used. A minimal document instance is included as part of the template to get the end user started.

Note the placeholder text (the shaded gray areas) as well as the grayed out areas on the toolbar. Since the styles have been protected, the user does not have the option of selecting the bold, italic, justification, or other formatting icons. Placeholder text is only displayed when tags are turned off, the anticipated mode for most end users.

Figure 5-5. Smart document authoring template with protected styles

5.2.6 XSL Transformations

XSLT plays a vital role in any Smart Document solution. As illustrated in Chapter 4, transformations are used to integrate external schemas with WordprocessingML in order to create formatted Word documents. Transformations can also be invoked when saving a document, including the built-in transform that removes all Word-related markup, leaving only the external schema-related markup in the result instance. A third use for transformations may not be quite as obvious: transformations can be invoked as part of any action called from the Document Actions Task Pane. This allows styles to be applied as markup is inserted in the instance.

Only the InsertXML method, available on both Selection and Range objects, supports running transformations within document actions. This can be very handy when inserting blocks of XML markup, associated styles, and placeholder text. For instance:

Range.InsertXML("<VariableList></VariableList> ", "path\transform.xsl")

will insert the element VariableList and then call the named XSLT file. InsertXML must return a valid WordprocessingML document; upon matching the root, all the necessary WordprocessingML markup is inserted down to the opening w:body element. At that point, the appropriate template is selected (matching the element VariableList). Rather than executing the numerous steps involved one by one, the transform, as shown in Example 5-3, performs all steps in a single pass.

Example 5-3. An XSLT transformation for applying style to markup inserted in an instance

<xsl:template match="/">
  <w:wordDocument>
    <w:body>
      <xsl:apply-templates select="*"/>
    </w:body>
  </w:wordDocument>
</xsl:template>

<xsl:template match="VariableList">
  <w:p/>
  <ns0:VariableList>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="VariableListEntry"/>
      </w:pPr>
      <ns0:VariableEntry>
        <ns0:Term w:placeholder="Enter term here">
          <w:r>
            <w:rPr>
              <w:rStyle w:val="Term"/>
            </w:rPr>
            <w:r>
              <w:t/>
            </w:r>
          </w:r>
        </ns0:Term>
        <w:r>
          <w:tab/>
        </w:r>
        <ns0:Definition>
          <ns0:Para w:placeholder="Enter description or definition of term">
            <w:r>
              <w:t/>
            </w:r>
          </ns0:Para>
        </ns0:Definition>
      </ns0:VariableEntry>
    </w:p>
  </ns0:VariableList>
  <w:p/>
</xsl:template>

When creating action transformations, keep your WordprocessingML markup to a minimum; that is, only use what is required to create a valid WordprocessingML document instance. Otherwise you may suffer from performance issues.
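To give a sense of how little markup a minimal result needs, the following Python sketch builds a WordprocessingML document containing one styled paragraph and nothing else. It is an illustration, not the output of the transform in Example 5-3: the `minimal_document` helper is hypothetical, and whether Word accepts a given fragment should be confirmed by experiment.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2003.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)


def w(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return "{%s}%s" % (W, name)


def minimal_document(style, text):
    """Build wordDocument/body/p with one paragraph style and one text run."""
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    para = ET.SubElement(body, w("p"))
    ppr = ET.SubElement(para, w("pPr"))
    ET.SubElement(ppr, w("pStyle"), {w("val"): style})
    run = ET.SubElement(para, w("r"))
    ET.SubElement(run, w("t")).text = text
    return doc


xml_text = ET.tostring(
    minimal_document("ParagraphDefault", "Hello"), encoding="unicode"
)
print(xml_text)
```

Everything beyond the wordDocument, body, paragraph, and run elements (document properties, font tables, style definitions) is left out here, on the assumption that the styles already exist in the template.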

See Chapter 4 and Appendix B for more information on using XSLT with Word.