HTML, XML, and CSS


Microsoft Excel and Word both have the ability to save a document as HTML. They can also reopen that document and it generally looks exactly like the original, even if HTML does not provide the level of granularity required to produce the original document in standard HTML. This extra functionality is provided by additional custom attributes on standard HTML tags, embedded XML with the documents, and extended use of less-common properties as well as custom Office-specific properties.

This extra information is commonly referred to as junk by most Web developers who have ever seen the results of an Office application's Save As HTML command. However, for our particular needs, this extra information is vital in reproducing custom Word and Excel files that take advantage of functionality not supported by straight HTML and CSS. In this section, we'll review examples of creating Word and Excel files through this technique. We'll also examine some of the Office-specific attributes, XML, and CSS properties, and identify a few drawbacks or areas of concern that can arise with this technique.

NOTE

Microsoft PowerPoint, too, has the ability to Save as HTML and reproduce the presentation from HTML. However, for the purpose of generating new slideshows, this technique is less useful because the generated HTML is extremely complex and makes extensive use of other technologies, including VRML. PowerPoint documents always produce multifile HTML renderingseven a simple two-slide presentation can produce 34 different files as part of the HTML version.


To generate an Excel or Word document through HTML, we have to produce HTML just as in a standard browser request and then append the <cfheader> and <cfcontent> tags shown earlier in the chapter to the end of the request. For simple content or when fine-grained control is not necessary, developers can stick to standard HTML and CSS without using anything Office-specific and still generate valid Office documents.

All Office-specific XML elements and attributes are namespace qualified. In order to use them, the namespaces must be declared; this is usually done on the root <html> element. All XML content is placed inside an actual <xml> element that itself is enclosed within comments, as in the following example:

 <html xmlns:w="urn:schemas-microsoft-com:office:word"> <head> <!--[if gte mso 9]> <xml>  <w:WordDocument>   <w:View>Print</w:View>  </w:WordDocument> </xml> <![endif]--> </head> 

This common code we'll see at the top of all HTML generatedWord examples and has a lot of new information. First, at the top is the "urn:schemas-microsoft-com:office:word" namespace declaration. This is required because we're including Word-specific XML content. The namespace for Office common content is "urn:schemas-microsoft-com:office:office", and for Excel-specific content it's "urn:schemas-microsoft-com:office:excel".

Within the <head> element there is a comment. In this case the comment is conditional, meaning the content is processed in Office version 9 (2000) and above. Office 97 applications do have the ability to save and load as HTML, but they don't understand the new custom content. You can choose to leave the conditional comment out of your documents if you expect all users will have Office 2000 or higher. You can also use this comment technique to display an alternate message to Office 97 users who may not be able to view your documents correctly.

The actual custom content starts with an <xml> element and then Word namespacequalified <w:WordDocument> and <w:View> elements. Notice that the <xml> element is truly an element and not a declaration, so it is not enclosed in question marks and does not specify the version. The <w:WordDocument> element contains Word-specific content. The <w:View> element sets the initial view to Print Layout view, which makes the document look like a standard Word document.

Listing 27.5 provides a complete example of a generated Word document. This code produces the same result as the earlier example (Listing 27.2) in which we merged customer data into an existing confirmation letter. Here we're generating the letter from scratch and merging the same data.

Listing 27.5. MailMerge2.cfmCreate Confirmation Letter for Word Through HTML, XML, and CSS
 <!---   Filename: MailMerge2.cfm   Purpose: Creates a mail-merge letter via HTML ---> <!--- set up properties to merge into the document ---> <cfscript>   OrderDate = "November 23, 2004";   FirstName = "Samuel";   LastName = "Neff";   Address = "215 East Road";   City = "Rockville";   State = "MD";   Zip = "20850"; </cfscript> <!--- set up word standard Word document ---> <html xmlns:w="urn:schemas-microsoft-com:office:word"> <head>   <!--[if gte mso 9]>   <xml>    <w:WordDocument>     <w:View>Print</w:View>     <w:DontDisplayPageBoundaries/>    </w:WordDocument>   </xml>   <![endif]-->   <!--- use styles to remove margin between paragraphs and set the font --->   <style>     p {        margin:0in;        font-size:12.0pt;        font-family:"Times New Roman";        }     <!--- style for indented return address --->     .ReturnAddress {       margin-left: 3in;       }   </style> </head> <body>   <div >     <p>300 Main Street</p>     <p>Washington, DC 20012</p>     <p><cfoutput>#OrderDate#</cfoutput></p>   </div>   <p>&nbsp;</p>   <cfoutput>   <p>#FirstName# #LastName#</p>   <p>#Address#</p>   <p>#City#, #State# #Zip#</p>   </cfoutput>   <p>&nbsp;</p>   <p><cfoutput>#FirstName#</cfoutput>:</p>   <p>&nbsp;</p>   <!---     notice Office specific style property to preserve     whitespace between sentences   --->   <p>Thank you for purchasing Advanced ColdFusion MX 7.<span   style='mso-spacerun:yes'>   </span>Your purchase will be mailed   to you within two business days.</p>   <p>&nbsp;</p>   <p>Sincerely,</p>   <p>&nbsp;</p>   <p>CFBookstore</p> </body> </html> <cfheader   name="Content-Disposition"   value="attachment; filename=""Confirmation.doc"""> <cfcontent   reset="no"   type="application/msword"> 

Notice that the listing is almost completely standard HTML and has only a small amount of Office-specific XML and CSS mixed in. We use the same XML discussed earlier to display the document in Print Layout view, and we also turn off page boundaries to simplify reading onscreen. Toward the end, we use an Office-specific CSS property mso-spacerun to override the default whitespace collapsing and preserve a double space between paragraphs. At the end are the <cfheader> and <cfcontent> tags that are now very familiar.

Listing 27.6 shows a variation on our earlier order-details example. It creates a single-worksheet Excel document that has all customer orders in a table, formatted, with a formula to calculate the total.

Listing 27.6. OrderDetails2.cfmCustomer Order Details in a Single-Sheet Workbook via HTML
 <!---   Filename: OrderDetails2.cfm   Purpose: Dump customer order details to a single-sheet workbook via HTML   Requires: IntegratingOffice datasource ---> <!--- Select all order details for analysis ---> <cfquery name="orders" dataSource="IntegratingOffice">   SELECT     C.CustomerName,     O.OrderDate,     P.ProductName,     OD.Quantity,     P.UnitCost,     OD.Quantity * P.UnitCost AS TotalCost   FROM     ((Customers C         INNER JOIN     Orders O         ON       C.CustomerID = O.CustomerID)         INNER JOIN     OrderDetails OD         ON       O.OrderID = OD.OrderID)         INNER JOIN     Products P         ON       OD.ProductID = P.ProductID   ORDER BY     C.CustomerName,     O.OrderDate,     P.ProductName </cfquery> <!---   we're using a lot of Excel XML and custom attributes, so we need the   Excel namespace declaration ---> <html xmlns:x="urn:schemas-microsoft-com:office:excel"> <head>   <!--- use XML to specify the Worksheet name and turn on gridlines --->   <xml>    <x:ExcelWorkbook>     <x:ExcelWorksheets>      <x:ExcelWorksheet>       <x:Name>Orders</x:Name>       <x:WorksheetOptions>         <x:GridlineColorIndex>64</x:GridlineColorIndex>       </x:WorksheetOptions>      </x:ExcelWorksheet>     </x:ExcelWorksheets>    </x:ExcelWorkbook>   </xml>   <!---     declare styles we'll use throughout the document with a mix     of standard CSS and Excel-specific properties.   --->   <style>     body {        font-size:10.0pt;        font-family:Arial;       }     .header {        font-weight:700;        border-bottom:.5pt solid black;        }     .orderDate {        mso-number-format:"Short Date";       }     .number {        mso-number-format:"\#\,\#\#0";       }     .grandTotal {        font-weight:700;        mso-number-format:"\#\,\#\#0";        border-top:.5pt solid black;       }     .grandText {        font-weight:700;       }   </style> </head> <body> <!---   notice the x:str attribute.  This specifies that the cell content   is text if not otherwise specified. ---> <table x:str> <!---   the <col> element is standard HTML but is less commonly used.   Note that the number in the span attribute specifies the number   of columns affected and does NOT represent spanned or merged   columns the way the colspan attribute does on a <td> element.   Excel uses a custom CSS property to specify the exact width.  The   unit is 1/256th of a character. ---> <col span="3" style="mso-width-alt:3072"> <col span="3" style="mso-width-alt:2560"> <!---   we're using <td> elements for headers--Excel doesn't process   <th> elements correctly ---> <tr>   <td >Customer</td>   <td >Order Date</td>   <td >Product</td>   <td >Quantity</td>   <td >Unit Cost</td>   <td >Total Cost</td> </tr> <cfoutput query="orders">   <!---     x:num attribute specifies that the content is numeric, overriding     the default x:str attribute that we specified on the <table> element.     We leave the cell contents blank since Excel will fill them with the     value appropriately formatted according to the style.     Notice that we use JavaCast to convert the date to a number--the     number of days since 12/31/1899.   --->   <tr>     <td>#orders.CustomerName#</td>     <td              x:num="#JavaCast('long', orders.OrderDate)#"></td>     <td>Dryer</td>     <td  x:num="#orders.Quantity#"></td>     <td  x:num="#orders.UnitCost#"></td>     <td  x:num="#orders.TotalCost#"></td>   </tr> </cfoutput> <!---   for a blank row, specify an empty <td> element for every cell.  If   we had specified a single row with a colspan attribute then Excel   would merge the cells and give a result different then what we're after. ---> <tr>   <td></td>   <td></td>   <td></td>   <td></td>   <td></td>   <td></td> </tr> <!---   notice the grand total is a number but no value is specified.  Instead   a formula is specified and the value will be calculated when Excel opens   the worksheet. ---> <tr>   <td >Grand Total</td>   <td></td>   <td></td>   <td></td>   <td></td>   <cfoutput>     <td              x:num       x:fmla="=SUM(F2:F#orders.RecordCount + 1#)"></td>   </cfoutput> </tr> </table> </body> </html> <!--- tell the browser this is an Excel file ---> <cfheader   name="Content-Disposition"   value="attachment; filename=""orders.xls"""> <cfcontent  reset="no"  type="application/vnd.ms-excel"> 

Notice that the document starts with the Excel namespace declaration and an XML block that defines the worksheet name and turns on gridlines. (Gridlines are off by default when opening an HTML file in Excel.)

The next section declares the styles we'll use in the spreadsheet. Most of the styles are standard CSS, but we also use the Excel-specific mso-number-format style to specify number formats, which can be named or custom formats.

On the opening <table> element we added the Excel-specific x:str attribute. Excel uses attributes to specify the data type of all values. Supported attributes are x:bool, x:num, and x:str. When specified on a table cell, these attributes can either be used as a marker to identify the data type, where no value is provided for the attribute, or to provide a more exact value than is displayed. Think of the text within the cell as the displayed and formatted data, and the value for the attribute as the original unformatted and complete data. When the attributes are specified on a <table> element, they define the default data type for the table.

Excel understands the <col> element that we use to specify column widths. Although the element itself is standard HTML, we use an Excel-specific style property to specify the width in units native to Excel: of a character. The values here3072 and 2560correspond to 12 and 10 characters, respectively.

Notice that within the query loop we specify dates as numbersinternally, Excel stores dates as numeric values representing the number of days since 12/31/1899. We can easily calculate this value using the ColdFusion DateDiff function.

At the end of the table we provide a grand total through a formula using the x:fmla attribute. This is a simple SUM function, and we're basing the range on the number of records in the query. Finally, we end the template with the same <cfheader> and <cfcontent> tags used in previous listings.

The examples here use mostly standard HTML, with a small amount of Office-specific content mixed in. This is fairly typical of generated Excel and Word documents. However, some features require substantial additional work. For example, adding headers, footers, or images to a Word document requires the developer to generate multiple HTML files and a separate XML file that binds them together. Similarly, an Excel workbook with multiple worksheets will require multiple filesone for each sheet, an index, and a file list in XML.

This multifile requirement poses a stumbling block for developers, because all files must exist on the client machine in order for Excel and Word to process themthe developer cannot refer to external URLs. Since a single browser request can only respond with a single file, we must employ another technology for combining the files. This is done with what Microsoft refers to within Office as a "Web archive." It's another name for multipart MIME messagesthe same format used to send attachments with email. ColdFusion developers can take advantage of a free MultipartMIME component for generating these responses correctly. It is available for download from http://www.rewindlife.com/archives/000033.cfm.

TIP

Microsoft Excel and Word 2002 and higher support a pure XML syntax, as well as the hybrid HTML, XML, CSS syntax discussed here. If you know your users will have this version, you can opt to use this format. The advantage is that as a single technology it's cleaner than the hybrid approach, and the pure XML technique has greater support for Word and Excel features without ever requiring a multifile document.


NOTE

Generating Excel and Word documents through HTML, XML, and CSS is not widely documented. There is some information available online, such as the "Microsoft Office HTML and XML Reference," http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnoffxml/html/ofxml2k.asp. This provides documentation on every custom XML element and attribute, but it does not provide information on the custom CSS properties. It offers very few examples and not much "How-to" information. It's purely a reference.

The best source of information can be accessed by using Excel's and Word's own Save As command. For example, if you want to know how to create a Word document with columns, use the Word user interface to create a multicolumn document and save it as HTML. Then open the HTML in Dreamweaver code view to see what custom content was added to support the columns. Through this technique you can see how to reproduce most Excel and Word functionality.

Unfortunately, some features cannot be re-created through HTML, XML, and CSS. These include charts, forms, Visual Basic for Applications modules, and some mailing labels.




Advanced Macromedia ColdFusion MX 7 Application Development
Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
EAN: 2147483647
Year: 2006
Pages: 240
Authors: Ben Forta, et al

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net