The ServerDocument Object Model | Visual Studio Tools for Office: Using Visual Basic 2005 with Excel, Word, Outlook, and InfoPath

The ServerDocument object model enables you to read and write all the deployment information and cached data stored inside a customized document. This section goes through all the data properties and methods in this object model, describing what they do, their purpose, and why they look the way they do. Chapter 20, "Deployment," describes the deployment portions of the object model.

Warning

Before we begin, note that the ServerDocument object model is what we like to call an "enough rope" object model. Because this object model enables you to modify all the information about the customization, it is quite possible to create documents with inconsistent cached data or nonsensical deployment information. The VSTO runtime engine does attempt to detect malformed customization information and throw the appropriate exceptions, but still exercise caution when using this object model.

ServerDocument Class Constructors

The ServerDocument class has seven constructors, but five of them are mere syntactic sugar for these two:

Public Sub New(ByVal bytes As Byte(), ByVal fileType As String)
Public Sub New(ByVal documentPath As String, _
  ByVal onClient As Boolean, ByVal access As FileAccess)

These correspond to the two primary ServerDocument scenarios: You want to read/edit a document either in memory or on disk. Note that these two scenarios cannot be mixed; if you start by opening a file on disk, you cannot treat it as an array of bytes in memory, and vice versa.

The in-memory version of the constructor takes a string that indicates the type of the file. Because all you are giving it is the bytes of the file, as opposed to the name of the file, the constructor does not know whether this is an .XLS, .XLT, .DOC, .DOT, or .XML. Pass in one of those strings to indicate what kind of document this is. If you pass in .XML, the document you pass must be in the WordprocessingML (WordML) format supported by Word. ServerDocument cannot read documents saved in the Excel XML format.

The byte array passed in must be an image of a customized document. The ServerDocument object model does not support in-memory manipulation of not-yet-customized documents.

The on-disk version takes the document path, from which it can deduce the file type. The onClient flag indicates whether your code is running in a client scenario (such as the document viewer sample above) or a server scenario (such as the customized data-island-generation example at the beginning of this chapter).

Why does the ServerDocument care whether it is running on a client or a server? Most of the time, it does not care. There is one important scenario, however: What if you pass in a document that does not yet have a customization?

In that case, the ServerDocument object attempts to add customization information to the uncustomized document. Adding the customization information requires the ServerDocument class to start Word or Excel, load the document into the application, and manipulate it using the Office object model. Because doing that is a very bad idea in server scenarios, the ServerDocument throws an exception if given an uncustomized document on the server.

The file access parameter can be FileAccess.Read or FileAccess.ReadWrite. If it is read-only, attempts to change the document will fail. (Opening an uncustomized document on the client in read-only mode is not a very good idea; the attempt to customize the document will fail.)

The other in-memory constructor is provided for convenience; it simply reads the entire stream into a byte array for you:

Public Sub New(ByVal stream As Stream, ByVal fileType As String)

Finally, the three remaining on-disk constructors act just like the three-argument constructor above, with the onClient flag defaulting to False if omitted and the file access defaulting to ReadWrite if omitted:

Public Sub New(ByVal documentPath As String, _
  ByVal onClient As Boolean)
Public Sub New(ByVal documentPath As String, _
  ByVal access As FileAccess)
Public Sub New(ByVal documentPath As String)
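As a rough illustration of the on-disk constructors, the sketch below opens the same document three ways. The document path is hypothetical, and the Imports line assumes that ServerDocument lives in the Microsoft.VisualStudio.Tools.Applications.Runtime namespace of the VSTO 2005 runtime; adjust it if your installation differs. The file is assumed to be customized already, because opening an uncustomized document with onClient set to False throws an exception.

```vb
Imports System
Imports System.IO
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module OpenOnDiskSketch
    Sub Main()
        ' Hypothetical path to an already-customized workbook.
        Dim documentPath As String = "C:\Documents\budget.xls"

        ' Short form: per the text, onClient defaults to False
        ' and the file access defaults to ReadWrite.
        Dim shortForm As New ServerDocument(documentPath)
        shortForm.Close()

        ' The same request spelled out with the three-argument constructor.
        Dim longForm As New ServerDocument(documentPath, False, FileAccess.ReadWrite)
        longForm.Close()

        ' Read-only access: attempts to change the document will fail.
        Dim reader As New ServerDocument(documentPath, FileAccess.Read)
        Try
            Console.WriteLine("Opened {0} for reading.", documentPath)
        Finally
            ' Always release the file lock.
            reader.Close()
        End Try
    End Sub
End Module
```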

Saving and Closing Documents

The ServerDocument object has two important methods and one property used to shut down a document:

Public Sub Save()
Public ReadOnly Property Document As Byte()
Public Sub Close()

If you opened the ServerDocument object with an on-disk document, the Save method writes the changes you have made to the application manifest, cached data manifest, or data island to disk. If you opened the document using a byte array or stream, the changes are saved into a memory buffer that you can access with the Document property. Note that it is an error to read the Document property if the file was opened on disk.

It is good programming practice to close the ServerDocument object explicitly when you have finished with it. Large byte arrays and file locks are both potentially expensive resources that will not be reclaimed by the operating system until the object is closed (or, equivalently, disposed by either the garbage collector or an explicit call to IDisposable.Dispose).

Server-side users of ServerDocument are cautioned to be particularly careful when opening on-disk documents for read-write access. It is a bad idea to have multiple writers (or a single writer and one or more readers) trying to access the same file at the same time. The ServerDocument class will do its best in this situation; it will make "shadow copy" backups of the file so that readers can continue to read the file without interference while writers write. Making shadow copies of large files can prove time-consuming, however.

If you do find yourself in this situation, consider doing what we did in the first example in this chapter: Read the file into memory, and edit it in memory rather than on disk. As long as the on-disk version is only read, it will never need to be shadow-copied and runs no risk of multiple writers overwriting one another's changes.
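The read-into-memory pattern might be sketched as follows. This is only an outline under a few assumptions: the function name is hypothetical, the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace, and the ".XLS" file type string follows the list given earlier for the in-memory constructor.

```vb
Imports System.IO
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module InMemoryEditSketch
    ' Hypothetical helper: returns an edited copy of the document
    ' without ever writing to the file on disk.
    Function EditCopyInMemory(ByVal documentPath As String) As Byte()
        ' The on-disk file is only read, so no shadow copy is needed.
        Dim bytes As Byte() = File.ReadAllBytes(documentPath)

        Dim sd As New ServerDocument(bytes, ".XLS")
        Try
            ' ... change the cached data or manifests here ...

            ' Save writes the changes into the in-memory buffer.
            sd.Save()

            ' Document is valid only for in-memory ServerDocument objects.
            Return sd.Document
        Finally
            ' Release the byte buffers promptly.
            sd.Close()
        End Try
    End Function
End Module
```

The returned byte array can then be streamed to the client or written to a different file, leaving the original untouched.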

Static Helper Methods

A few scenarios come up repeatedly with the ServerDocument object model, so the class exposes some handy static helper methods that save you from writing the boring boilerplate code. All these scenarios work only with on-disk files, not with "in-memory" files. The following static methods are associated with ServerDocument:

Public Shared Function AddCustomization( _
  ByVal documentPath As String, ByVal assemblyName As String, _
  ByVal deploymentManifestPath As String, _
  ByVal applicationVersion As String, _
  ByVal makePathsRelative As Boolean, _
  ByRef nonpublicCachedDataMembers As String()) As String
Public Shared Sub RemoveCustomization( _
  ByVal documentPath As String)
Public Shared Function IsCustomized( _
  ByVal documentPath As String) As Boolean
Public Shared Function IsCacheEnabled( _
  ByVal documentPath As String) As Boolean

AddCustomization

AddCustomization takes an uncustomized document and adds customization information to it. It creates a new application manifest and cached data manifest. If AddCustomization is given an already-customized document, the customization information is destroyed and replaced with the new information. This allows you to create new customized documents on a machine without Visual Studio; you could create the customization assemblies on a development box and then apply the customizations to documents on a different machine.

Note

AddCustomization should be called only on client machines, never on servers, because it always starts Word or Excel to embed the customization information in the uncustomized document.

The document and assembly paths are required; the deployment manifest path may be Nothing or empty if you do not want to use a deployment manifest to manage updating your customization.

The application version string must be a standard version string of the form "1.2.3.4". Note that this is the version number of the customization itself, not the version number of the assembly. (It may be wise, however, to use the version number of the assembly as the version number of your customized document application.)

If the makePathsRelative flag is set to True, the assembly location written into the customization information will be relative to the document location. If the document location is a UNC path such as \\accounting\documents\budget.doc, for example, and the assembly location is \\accounting\documents\dlls\budget.dll, the assembly location written into the document will be dlls\budget.dll, not the full path. If makePathsRelative is False, the assembly location is written exactly as it is passed in.

The AddCustomization method loads the assembly and scans it for document/worksheet classes that contain members marked with the Cached attribute so that it can emit information into the cached data manifest indicating that these members need to be filled when the customization starts for the first time. Because the VSTO runtime will be unable to fill in nonpublic members of these classes, the AddCustomization method returns the names of such members to help you catch this mistake early.
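A call to AddCustomization could look roughly like the sketch below. It follows the signature listed above and reuses the UNC paths from the relative-path example; the version string and the module name are made up for illustration, and the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace. Run something like this only on a client machine, because the method starts Word or Excel.

```vb
Imports System
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module AddCustomizationSketch
    Sub Main()
        ' Receives the names of cached members that the runtime cannot fill.
        Dim nonpublicMembers As String() = Nothing

        ' No deployment manifest (Nothing), illustrative version "1.0.0.0",
        ' and the assembly path stored relative to the document.
        ServerDocument.AddCustomization( _
            "\\accounting\documents\budget.doc", _
            "\\accounting\documents\dlls\budget.dll", _
            Nothing, _
            "1.0.0.0", _
            True, _
            nonpublicMembers)

        ' Report any members marked Cached that are not public.
        If nonpublicMembers IsNot Nothing Then
            For Each memberName As String In nonpublicMembers
                Console.WriteLine("Nonpublic cached member: {0}", memberName)
            Next
        End If
    End Sub
End Module
```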

RemoveCustomization

RemoveCustomization removes all customization information from a document, including all the cached data in the data island. It also starts Word/Excel, so do not call it on a server. Calling RemoveCustomization on an uncustomized document results in an invalid operation exception.

IsCustomized and IsCacheEnabled

IsCustomized and IsCacheEnabled are similar but subtly different because of a somewhat obscure scenario. Suppose that you have a customized document that contains cached data in the data island, and you use the ServerDocument object model to remove all information about what document/worksheet classes need to be started. In this odd scenario, the document will not run any customization code when it starts; therefore, there is no way for the document to access the data island at runtime. Essentially, the document has become an uncustomized document with no code behind it, but all the data is still sitting in the data island. The VSTO designers anticipated that someone might want to remove information about the code while keeping the data island intact for later extraction via the ServerDocument object model.

IsCustomized returns True if the document is customized and will attempt to run code when it starts. IsCacheEnabled returns True if the document is customized at all and, therefore, has a data island, regardless of whether the customization information says what classes to start when the document is loaded. (Note that IsCacheEnabled says nothing about whether the data island actually contains any data, just whether the document supports caching.)
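To make the distinction concrete, here is a small sketch that classifies a document by calling both helpers. The path is hypothetical, and the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace.

```vb
Imports System
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module CustomizationCheckSketch
    Sub Main()
        ' Hypothetical path to the document being inspected.
        Dim documentPath As String = "C:\Documents\budget.xls"

        Dim runsCode As Boolean = ServerDocument.IsCustomized(documentPath)
        Dim hasDataIsland As Boolean = ServerDocument.IsCacheEnabled(documentPath)

        If runsCode Then
            Console.WriteLine("Customized: code will run when the document starts.")
        ElseIf hasDataIsland Then
            ' The obscure case: startup information removed, data island kept.
            Console.WriteLine("No code will run, but a data island is present.")
        Else
            Console.WriteLine("Not customized.")
        End If
    End Sub
End Module
```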

Cached Data Objects, Methods, and Properties

As you saw in our handy utility above, a customized document's data island contains a small XML document called the cached data manifest, which describes the classes and properties in the cache (or, if the document is being run for the first time, the properties that need to be filled). The cached data is organized hierarchically; the manifest consists of a collection of view class elements, each of which contains a collection of items corresponding to cached members of the class. Here is a cached data manifest that has one cached member of one view class. The cached data member contains a typed DataSet:

<cdm:cachedDataManifest cdm:revision="1">
  <cdm:view cdm:view>
    <cdm:dataInstance cdm:data
       cdm:dataType="ExcelCached.NorthwindDataSet,
       ExcelCached, Version=1.0.1854.30463, Culture=neutral,
       PublicKeyToken=null" />
  </cdm:view>
</cdm:cachedDataManifest>

Having a collection of collections is somewhat more complex than just having a collection of cached items. The cached data manifest was designed this way to avoid the ambiguity of having two host item classes (such as Sheet1 and Sheet2) each with a cached property named the same thing. Because each item is fully qualified by its class, there is no possibility of name collisions.

The actual serialized data is stored in the data island, not in the cached data manifest. In the object model, however, it is more convenient to associate each data instance in the cached data manifest with its serialized state.

The Cached Data Object Model

To get at the cached data manifest and any serialized data in the data island, the place to start is the CachedData property of the ServerDocument class. The CachedData object returns the CachedDataHostItemCollection, which contains a CachedDataHostItem for each host item in your customized document. A CachedDataHostItem is a collection of CachedDataItem objects that correspond to each class member variable that has been marked with the Cached attribute. Figure 18.3 shows an object model diagram for the objects returned for the example in Figure 18.1.

Figure 18.3. The cached data object model for the example in Figure 18.1.

There are no constructors for any of the types we will be discussing. The CachedData class has four handy helper methods (Clear, FromXml, ToXml, and ClearData) and a collection of CachedDataHostItem objects:

Public Sub Clear()
Public Sub FromXml(ByVal cachedDataManifest As String)
Public Function ToXml() As String
Public Sub ClearData()
Public ReadOnly Property HostItems As _
   CachedDataHostItemCollection

These methods work just as their counterparts on the application manifest do: the Clear method throws away all information in the cached data manifest; the FromXml method clears the manifest and repopulates it from the XML state; and the ToXml method serializes the manifest as an XML string.

The ClearData method throws away all information in the data island but leaves all the entries in the cached data manifest. When the document is started in the client, all the corresponding members will be marked as needing to be filled.
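As a sketch of how these helpers might be used together, the following code prints the cached data manifest and then empties the data island so that every cached member is refilled the next time the document starts. The path is hypothetical, and the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace.

```vb
Imports System
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module ResetCacheSketch
    Sub Main()
        ' Hypothetical path to a customized document.
        Dim documentPath As String = "C:\Documents\budget.xls"

        Dim sd As New ServerDocument(documentPath)
        Try
            ' Inspect the manifest before changing anything.
            Console.WriteLine(sd.CachedData.ToXml())

            ' Discard the serialized data but keep the manifest entries.
            sd.CachedData.ClearData()

            ' Write the change back to the file on disk.
            sd.Save()
        Finally
            sd.Close()
        End Try
    End Sub
End Module
```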

The CachedDataHostItem Collection

The HostItems collection is a straightforward extension of CollectionBase that provides a simple strongly typed collection of CachedDataHostItem objects. (It is called "host items" because these always correspond to items provided by the hosting application, such as Sheet1, Sheet2, or ThisDocument.) The cached data host item collection has the following methods and properties:

Public Function Add(ByVal hostItemId As String) _
  As CachedDataHostItem
Public Function Contains(ByVal hostItemId As String) As Boolean
Public Sub CopyTo(ByVal hostItems As CachedDataHostItem(), _
  ByVal index As Integer)
Public Function IndexOf(ByVal hostItem As CachedDataHostItem) _
  As Integer
Public Sub Insert(ByVal index As Integer, _
  ByVal value As CachedDataHostItem)
Public Sub Remove(ByVal hostItem As CachedDataHostItem)
Public Sub Remove(ByVal hostItemId As String)
Public ReadOnly Property Item(ByVal index As Integer) _
  As CachedDataHostItem
Public ReadOnly Property Item(ByVal hostItemId As String) _
  As CachedDataHostItem

The hostItemId argument corresponds to the namespace-qualified name of the host item class. Be careful when creating new items to ensure that the class identifier is fully qualified.
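A lookup by identifier might be sketched like this. The helper name is hypothetical, the identifier you pass should be the namespace-qualified class name (for example, "ExcelCached.Sheet1" for the manifest shown earlier), and the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace.

```vb
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module HostItemLookupSketch
    ' Hypothetical helper: returns the host item with the given
    ' namespace-qualified identifier, or Nothing if it is absent.
    Function FindHostItem(ByVal sd As ServerDocument, _
                          ByVal hostItemId As String) As CachedDataHostItem
        Dim hostItems As CachedDataHostItemCollection = sd.CachedData.HostItems

        ' Check first rather than relying on the behavior of a failed lookup.
        If hostItems.Contains(hostItemId) Then
            Return hostItems.Item(hostItemId)
        End If

        Return Nothing
    End Function
End Module
```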

The CachedDataHostItem Object

Each CachedDataHostItem object corresponds to a host item in your document and has a CachedData property that returns a collection of CachedDataItem objects that correspond to cached members of the customized host item class:

Public Function Add(ByVal dataId As String, _
  ByVal dataType As String) As CachedDataItem
Public Function Contains(ByVal dataId As String) As Boolean
Public Sub CopyTo(ByVal items As CachedDataItem(), _
  ByVal index As Integer)
Public Function GetEnumerator() As CachedDataItemEnumerator
Public Function IndexOf(ByVal data As CachedDataItem) _
  As Integer
Public Sub Insert(ByVal index As Integer, ByVal item _
  As CachedDataItem)
Public Sub Remove(ByVal data As CachedDataItem)
Public Sub Remove(ByVal dataId As String)
Public ReadOnly Property Item(ByVal dataId As String) _
  As CachedDataItem
Public ReadOnly Property Item(ByVal index As Integer) _
  As CachedDataItem

You may wonder why it is that you must specify the type of the property when adding a new element via the Add method. If you have a host item class declared like the following lines of code, surely the name of the class and property is sufficient to deduce the type, right?

Public Class Sheet1
  <Cached()> Public myData As NorthwindDataSet

In this case, it would be sufficient to deduce the compile-time type, but it would not be if the compile-time type were Object. When the document is run in the client, and the cached members are deserialized and populated, the deserialization code in the VSTO runtime needs to know whether the runtime type of the member is a dataset, datatable, or other serializable type.

The CachedDataItem Object

The identifier of a CachedDataItem is the name of the property or field on the host item class that was marked with the Cached attribute. The CachedDataItem itself exposes the type and identifier properties

Public Property DataType As String
Public Property Id As String

as well as two other interesting properties and a helper method:

Public Property Schema As String
Public Property Xml As String
Public Sub SerializeDataInstance(ByVal value As Object)

Setting the Xml and Schema properties correctly can be slightly tricky; the SerializeDataInstance method takes an Object and sets the Xml and Schema properties for you. If you do not have an instance of the object on the server, however, and want to manipulate just the serialized XML strings, you must understand the rules for how to set these properties correctly.
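When you do have an instance of the object on the server, filling a cached item could look something like the sketch below. The helper name and its parameters are hypothetical; the identifiers you pass must match the host item class and cached member in your own customization (such as "ExcelCached.Sheet1" and "myData"). The namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace.

```vb
Imports System.Data
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module FillCachedItemSketch
    ' Hypothetical helper: stores a DataSet in an existing cached item.
    ' The caller remains responsible for calling Save and Close on sd.
    Sub FillCachedItem(ByVal sd As ServerDocument, _
                       ByVal hostItemId As String, _
                       ByVal dataId As String, _
                       ByVal data As DataSet)
        Dim hostItem As CachedDataHostItem = _
            sd.CachedData.HostItems.Item(hostItemId)
        Dim item As CachedDataItem = hostItem.CachedData.Item(dataId)

        ' Sets both the Xml and Schema properties from the live object.
        item.SerializeDataInstance(data)
    End Sub
End Module
```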

The first thing to note is that the Schema property is ignored if the DataType is not a DataTable or DataSet (or subclass thereof). If you are serializing out another type via XML serialization, there is no schema, so leave it blank. On the other hand, if you are writing out a DataSet or DataTable, you must specify the schema.

Second, the data island may contain DataSets and DataTables in either regular raw XML form or diffgram form. The regular format that you are probably used to seeing XML-serialized DataSets in looks something like this:

<DataSet1 xmlns="http://www.foocorp.org/schemas/customers.xsd">
  <dbo_Customers>
    <Name>Maria Anders</Name>
    <Address>Obere Str. 57</Address>
  </dbo_Customers>
  <dbo_Customers>
    <Name>Ana Trujillo</Name>
    <Address>Avda. de la Constitución 2222</Address>
  </dbo_Customers>

And so on. A similar DataSet in diffgram form looks different:

<diffgr:diffgram>
  <NorthwindDataSet
    xmlns="http://www.foocorp.org/schemas/NorthwindDataSet.xsd">
    <Customers diffgr: msdata:rowOrder="0">
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <ContactName>Maria Anders</ContactName>

You can store cached DataSets and DataTables by setting the Xml property to either format. By default, the VSTO runtime saves them in diffgram format. Why? Because the diffgram format not only captures the current state of the DataSet or DataTable, but also records how the object has changed since it was filled in by the data adapter. That means that when the object's data is poured back into the database, the adapter can update only the rows that have changed instead of having to update all of them.
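The two formats are easy to compare with plain ADO.NET, without ServerDocument at all. The sketch below builds a small DataSet by hand (the table and column names are made up), calls AcceptChanges to stand in for the state just after a data adapter fill, edits one row, and then writes the DataSet in both formats along with its schema.

```vb
Imports System
Imports System.Data
Imports System.IO

Module DiffgramSketch
    ' Serializes a DataSet to a string in the requested XML mode.
    Function Serialize(ByVal ds As DataSet, ByVal mode As XmlWriteMode) As String
        Using writer As New StringWriter()
            ds.WriteXml(writer, mode)
            Return writer.ToString()
        End Using
    End Function

    Sub Main()
        ' Illustrative data only; not the book's Northwind DataSet.
        Dim ds As New DataSet("CustomersDataSet")
        Dim table As DataTable = ds.Tables.Add("Customers")
        table.Columns.Add("CustomerID", GetType(String))
        table.Columns.Add("ContactName", GetType(String))
        table.Rows.Add("ALFKI", "Maria Anders")

        ' Mark every row Unchanged, as it would be right after a fill.
        ds.AcceptChanges()

        ' Edit one value; the row is now in the Modified state.
        table.Rows(0)("ContactName") = "Maria A."

        ' Regular form: current values only.
        Console.WriteLine(Serialize(ds, XmlWriteMode.IgnoreSchema))

        ' Diffgram form: current values plus the original row state.
        Console.WriteLine(Serialize(ds, XmlWriteMode.DiffGram))

        ' The schema is written separately from the data.
        Console.WriteLine(ds.GetXmlSchema())
    End Sub
End Module
```

In the diffgram output, the edited row should be flagged as modified and its original values carried in a separate "before" section, which is the information an adapter needs to update only the changed rows.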

Be Careful

One final caution about using the ServerDocument object model to manipulate the cache: The cache should be all or nothing. Either the cached data manifest should have no data items with serialized XML or they should all have XML. The VSTO runtime does not support scenarios in which some cached data items need to be filled and others do not. If, when the client runtime starts, it detects that the cache is filled inconsistently, it will assume that the data island is corrupted and start fresh, refilling everything. If you need to remove some cached data from a document, remove the entire data item from the host item collection; do not just set the Xml property to an empty string.
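Removing an item outright, rather than blanking its Xml property, might be sketched as follows. The helper name and its parameters are hypothetical, and the namespace in the Imports line is assumed to be the VSTO 2005 runtime namespace.

```vb
' Assumed namespace for the VSTO 2005 runtime; requires a reference to its assembly.
Imports Microsoft.VisualStudio.Tools.Applications.Runtime

Module RemoveCachedItemSketch
    ' Hypothetical helper: removes one cached member from the manifest
    ' so that the remaining items stay consistently filled.
    ' The caller remains responsible for calling Save and Close on sd.
    Sub RemoveCachedItem(ByVal sd As ServerDocument, _
                         ByVal hostItemId As String, _
                         ByVal dataId As String)
        Dim hostItems As CachedDataHostItemCollection = sd.CachedData.HostItems
        If Not hostItems.Contains(hostItemId) Then
            Return
        End If

        Dim hostItem As CachedDataHostItem = hostItems.Item(hostItemId)
        If hostItem.CachedData.Contains(dataId) Then
            ' Remove the whole entry; do not set item.Xml to "".
            hostItem.CachedData.Remove(dataId)
        End If
    End Sub
End Module
```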