3.1 Information capture and reuse
For all the abstract data that is managed in database systems, there is even more that is hidden in rendered word processing documents. That fact represents an intellectual property loss for the enterprise, of course, but it also represents a nuisance and a waste of time for the information workers who work with those documents.
Consider the articles written for a company's newsletters and other publications. Every one is likely to contain a title, author, and date, but more often than not that information has to be retyped, or individually copied and pasted, to get it into a catalog entry. That's because there is no reliable way for a computer to recognize those data items in order to extract them.
3.1.1 Word processing
In contrast, look at Figure 3-1, which shows an article being edited in Microsoft Word.
Figure 3-1. Word document showing optional tag icons and task pane with XML structure
The article is actually an XML document that conforms to a custom schema. The user has chosen to display icons that represent the start- and end-tags. Note that there are distinct elements for the title, author, and date.
Solution developers can use the XML elements to check and normalize information as it is entered, whether or not the tag icons are displayed. An application, for example, could notify the user if the text entered for a date element isn't really a valid date. Or it could automatically supply the current year if none was entered.
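The kind of check and normalization described here can be sketched in a few lines of Python. This is only an illustration, not how Word or any Office solution implements it: the `date` element name, the "day month year" input style, and the function names are all assumptions made for the example.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Assumed input style, e.g. "4 March 2003" (hypothetical house convention).
DATE_FORMAT = "%d %B %Y"


def normalize_date(text, current_year):
    """Return the date as an ISO string, or None if it is not a valid date.

    If the text has no year, the supplied current year is appended first.
    """
    cleaned = " ".join(text.split())  # collapse stray whitespace
    for candidate in (cleaned, f"{cleaned} {current_year}"):
        try:
            parsed = datetime.strptime(candidate, DATE_FORMAT)
        except ValueError:
            continue  # try the next candidate, if any
        return parsed.date().isoformat()
    return None


def check_article_date(xml_text, current_year):
    """Return a normalized date, or a message the application could show."""
    root = ET.fromstring(xml_text)
    elem = root.find("date")  # hypothetical element name
    if elem is None or not (elem.text or "").strip():
        return "missing date"
    normalized = normalize_date(elem.text, current_year)
    if normalized is None:
        return f"not a valid date: {elem.text!r}"
    return normalized
```

For instance, `check_article_date("<article><date>4 March</date></article>", 2024)` supplies the year and yields an ISO-formatted date, while text such as "next week" produces a message that could be shown to the user.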
The right-hand pane is called the task pane; it can be used for various purposes. In the figure, the top of the task pane shows the XML structure of the document. At the bottom is a list of the types of element that are valid at the current point in the document, according to the schema.
The document is also a normal Word document, so Word's formatting features can be used in the usual way.
There are three ways to save this document as XML:
WordML is Word's native XML file format. It represents the Word document just as the DOC format would, including its formatting. However, it doesn't include any of the custom schema markup, so we won't discuss this option further here. (We cover it in Chapter 5, "Rendering and presenting XML documents", on page 86.)
The document can be saved as an XML document conforming to a custom schema. A custom schema would normally be defined by an enterprise, or by a committee set up by an industry to which the enterprise belongs. For that reason, it would be designed to preserve the abstract data needed for the user's applications. For example, the title, author, and date of an article can easily be identified by software and extracted for use in a catalog of articles.
The saved document could contain both WordML and the custom schema markup, since the two are in different namespaces. This option preserves the formatting applied by the user, while still retaining the abstract data and distinguishing it from the formatting.
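To make the namespace point concrete, here is a minimal Python sketch that pulls catalog data out of a document mixing the two vocabularies. The sample is hand-written and heavily simplified, not actual Word output; the `urn:example:article` namespace, the element names, and the `catalog_entry` function are assumptions made for illustration. Only the WordML namespace URI is the one Word 2003 uses.

```python
import xml.etree.ElementTree as ET

WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ARTICLE_NS = "urn:example:article"  # hypothetical custom-schema namespace

# Simplified, hand-written sample: custom elements wrap WordML paragraphs.
SAMPLE = f"""<w:wordDocument xmlns:w="{WORDML_NS}" xmlns:a="{ARTICLE_NS}">
  <w:body>
    <a:article>
      <a:title><w:p><w:r><w:t>XML in the office</w:t></w:r></w:p></a:title>
      <a:author><w:p><w:r><w:t>Pat Example</w:t></w:r></w:p></a:author>
      <a:date><w:p><w:r><w:t>2003-10-21</w:t></w:r></w:p></a:date>
      <a:body><w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold</w:t></w:r>
        <w:r><w:t> and plain text.</w:t></w:r></w:p></a:body>
    </a:article>
  </w:body>
</w:wordDocument>"""


def catalog_entry(xml_text):
    """Extract title, author, and date by looking only at custom elements."""
    root = ET.fromstring(xml_text)
    article = root.find(f".//{{{ARTICLE_NS}}}article")
    if article is None:
        raise ValueError("no article element found")
    entry = {}
    for field in ("title", "author", "date"):
        elem = article.find(f"{{{ARTICLE_NS}}}{field}")
        # itertext() gathers the text inside the nested formatting markup.
        entry[field] = "".join(elem.itertext()).strip() if elem is not None else None
    return entry
```

Because the lookup is keyed on the custom namespace, the formatting markup can change freely without affecting the extraction.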
In our example, the article is the entire Word document, but that isn't a requirement. It is possible to intersperse short XML documents within a larger Word document. For example, a travel guide might include multiple XML structures that describe hotels, with subelements for the address, number of rooms, rates, etc.
Using XML with Word documents enables companies to capture more of the intellectual property that is created informally by individuals and work groups, and that typically remains inaccessible to enterprise information systems. As XML, that property becomes a portable asset that can be reused as needed.
3.1.2 Forms
For many purposes, a data entry form is more suitable for information capture than a typically larger and less constrained word processing document. InfoPath lets you design and use forms that are really XML documents that conform to your own custom schemas.
Figure 3-2 shows the layout of an order form in InfoPath's design mode. The structure of the form's schema is shown in the task pane on the right, from which element types can be dragged onto the form.
Figure 3-2. InfoPath design interface with data source in task pane
Note that there is only one item line in the form design. Because the schema allows the corresponding elements to be repeated, a user entering data will be able to add item lines as needed. Had the customer elements been repeatable, the form would likewise expand to allow insertion of additional sets of customer information fields.
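The difference between repeatable and non-repeatable elements can be sketched as follows. This is not InfoPath's mechanism; it is a minimal Python illustration in which the element names `order`, `customer`, and `item`, the `ORDER_MODEL` table, and the `add_child` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: child name -> maximum occurrences
# (None means the element may repeat without limit).
ORDER_MODEL = {"customer": 1, "item": None}


def add_child(order, name):
    """Append a child element if the content model still allows one."""
    if name not in ORDER_MODEL:
        raise ValueError(f"{name!r} is not allowed in an order")
    limit = ORDER_MODEL[name]
    existing = len(order.findall(name))
    if limit is not None and existing >= limit:
        raise ValueError(f"{name!r} may occur at most {limit} time(s)")
    return ET.SubElement(order, name)


order = ET.Element("order")
add_child(order, "customer")
add_child(order, "item")  # the single item line in the design
add_child(order, "item")  # the user adds another line while filling in the form
```

A second call to `add_child(order, "customer")` raises `ValueError`, which mirrors a form that offers no way to insert another set of customer fields.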
Unlike Word, InfoPath generates an XSLT stylesheet to control the rendering of the form. The formatting can even be based on the data entered in the form. For example, the dialog box in Figure 3-3 specifies that negative prices should be formatted differently from other prices.
Figure 3-3. InfoPath conditional formatting dialog
InfoPath is described in detail in Chapter 9, "Designing and using forms", on page 180.
3.1.3 Relational data
XML elements, whether captured in Word or Excel or InfoPath (or any other way, for that matter), are as predictable in structure as the tables of a database. XML documents of all kinds are therefore a source of information as rich as any other operational data store. Companies can aggregate, parse, search, manage, and reuse the data in documents in the same way they do the transactional data that is typically captured for relational databases.
They can also import the document data into a database and use it in conjunction with data from other sources. In addition, they can export DBMS data as XML documents.
Figure 3-4, for example, shows the options Access offers when exporting data as XML. You can specify which tables and records to export and whether to transform them.
Figure 3-4. Access dialog for exporting data as XML
Figure 3-5 shows the options for exporting a schema as XML. You can choose whether or not to export the schema, and whether it should be exported within the data document or as an independent schema document.
Figure 3-5. Access dialog for exporting schema as XML
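The import and export round trip described in this section can be sketched outside Access as well. The example below uses Python's built-in SQLite module as a stand-in for the DBMS; the table layout, the element names, and the functions `import_articles` and `export_articles` are assumptions made for illustration and say nothing about how Access itself structures its XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document data, e.g. gathered from saved articles.
ARTICLES_XML = """<articles>
  <article><title>XML in the office</title><author>Pat Example</author><date>2003-10-21</date></article>
  <article><title>Forms that validate</title><author>Lee Sample</author><date>2003-11-02</date></article>
</articles>"""

FIELDS = ("title", "author", "date")


def import_articles(conn, xml_text):
    """Load each article element into a relational table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT, author TEXT, date TEXT)"
    )
    rows = [
        tuple(article.findtext(field) for field in FIELDS)
        for article in ET.fromstring(xml_text).iter("article")
    ]
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", rows)
    return len(rows)


def export_articles(conn, author=None):
    """Export all records, or only those by one author, as an XML string."""
    query = "SELECT title, author, date FROM articles"
    params = ()
    if author is not None:
        query += " WHERE author = ?"
        params = (author,)
    query += " ORDER BY date"
    root = ET.Element("articles")
    for row in conn.execute(query, params):
        article = ET.SubElement(root, "article")
        for field, value in zip(FIELDS, row):
            ET.SubElement(article, field).text = value
    return ET.tostring(root, encoding="unicode")
```

With `conn = sqlite3.connect(":memory:")`, calling `import_articles(conn, ARTICLES_XML)` and then `export_articles(conn, author="Lee Sample")` returns a document containing only the matching record, loosely analogous to choosing which records to export.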