Populating a Document with Data on the Server


Consider the following portion of an all-too-common server scenario. An authenticated user, perhaps a salesperson, requests an Excel spreadsheet from a server. The spreadsheet is an expense report, and the server is an ASP, ASP.NET, or SharePoint server. The server code looks up some information about the user from a database, Active Directory, or Web service. Perhaps the server has a list of recent corporate credit card activity that it will prepopulate into the expense list. The server starts Excel but keeps it "invisible" because there is no interactive user on the server. Then it uses the Excel object model to insert the data into the appropriate cells, saves the result, and serves the resulting file to the user.

This is a very suboptimal document life cycle for two reasons. First, it is completely unsupported and strongly recommended against by Microsoft. Word and Excel were designed to be run interactively on client machines with perhaps a few instances of each running at the same time. They were not designed to be scalable and robust in the face of thousands of Web-server hits creating many instances on "headless" servers that allow no graphical user interfaces.

Second, this process thoroughly conflates the "view" with the data. The server needs to know exactly how the document is laid out visually so that it can insert and remove the right fields in the right places. A simple change in the document format can necessitate many tricky changes in the server code.

But automatically serving up documents full of a user's data is such a compelling scenario that many organizations have ignored Microsoft's guidelines and built solutions around server-side manipulation of Word and Excel documents. Those solutions tend to have serious scalability and robustness problems.

What can we do to mitigate these two problems?

Data-Bound VSTO Documents

As discussed in Chapter 17, "VSTO Data Programming," one way to solve this problem is to move the processing onto the client. Just serve up a blank document that detects whether there is no cached data in its data island and fills its datasets from the database server if so. When the client is ready to send the data back to the database, it connects again and updates the database. No special document customization has to happen on the server at all, and the database server is doing exactly what it was designed to do.

This solution has a major drawback, however: It requires that every user have access to the database. From a security perspective, it might be smarter to give only the document server access to the database, thereby decreasing the "attack surface" exposed to malicious hackers. What we really want to do is have the document ready to go with the user data in it from the moment the user obtains the document but without having to start Word or Excel on the server.

XML File Formats

Avoiding the necessity of starting a client application on the server is key. Consider the first half of the scenario above: The server takes an existing on-disk document and uses Excel to produce a modified version of the document. Excel is just a means to an end; if you know what changes need to be made to the bits of the document and how to manipulate the file format, you have no need to start the client application.

The Word and Excel binary file formats are opaque, but Word and Excel now support persisting documents in a much more transparent XML format. It is not too hard to write a program that manipulates the XML document without ever starting Word or Excel.

The XML file formats have some drawbacks, however. Although it certainly is faster and easier to manipulate the XML format directly, parsing large XML files is still not blazingly fast. XML files tend to be quite a bit larger than the corresponding binary files. And worst, although the Word XML format is full fidelity, the Excel format is not. Excel loses information about the VSTO customization when it saves a document as XML.

Furthermore, unfortunately, the Word XML file format does not store the data island in human-readable, editable XML. Rather, it serializes out the binary state that would have gone into the binary-file-format data island.

Also, we have not addressed the second problem that we identified earlier. Now we are not just manipulating the view; we are manipulating the persisted state of the view to insert or extract data. It would be much cleaner if we could simply get at the data island.

We need a way to solve these additional problems; we need a solution that works on binary non-human-readable files, works with VSTO-customized documents, and cleanly separates view from data.

Accessing the Data Island

Chapter 17, "VSTO Data Programming," showed how to cache the state of public host item class members that contain data in a "data island" so that they could be persisted into the document as XML, independent of their user-interface representation. The VSTO 2005 runtime library comes with a class, ServerDocument, that can read and write the data island directly; it does not need to start Word or Excel on the server. The ServerDocument object can read and write Word documents in binary or XML format and Excel documents in binary format.

Let's re-create the above document life cycle using the data island. Then we describe the advanced features of the ServerDocument object model in more detail.




Visual Studio Tools for Office(c) Using Visual Basic 2005 with Excel, Word, Outlook, and InfoPath
Visual Studio Tools for Office: Using Visual Basic 2005 with Excel, Word, Outlook, and InfoPath
ISBN: 0321411757
EAN: 2147483647
Year: N/A
Pages: 221

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net