As I've noted, Excel 2007 uses a new XML format for its workbooks, templates, and add-ins. These files are actually ZIP-compressed files. As such, they can be "unzipped" and examined.
Previous versions of Excel used a binary file format. Although the binary file format specifications are known, working with binary files is not easy. The Excel 2007 XML file format, on the other hand, is an "open format." As such, these files can be created and manipulated using other software.
In this section, I describe the various parts within a typical Excel XLSM (macro-enabled) workbook file. The workbook, named sample.xlsm, is shown in Figure 4-4. It has one worksheet, one chart sheet, and a simple VBA macro. The worksheet contains a table, a button (from the Forms controls), a SmartArt diagram, and a photo of a flower.
CD | The sample.xlsm workbook is available on the companion CD-ROM. |
To view the innards of an Excel 2007 file, you need to open an Explorer window and add a ZIP extension to the filename. So the sample.xlsm file is renamed to sample.xlsm.zip. You can then open the file by using any unzipping program. I use the zip feature built into Windows XP.
Tip | You may prefer to extract the zipped files into an uncompressed directory. Doing so makes it easier to view the files. In Windows, right-click the filename and choose Extract All. |
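If you would rather inspect the package from code than from Explorer, the following is a minimal sketch using Python's standard zipfile module. The function name list_package_parts is my own invention for illustration. Because zipfile looks at the file contents rather than the extension, this approach should not require renaming the workbook.

```python
import zipfile


def list_package_parts(path):
    """Return a sorted list of every part stored in the workbook package.

    `path` may be a file name such as "sample.xlsm" or an open binary
    file object. The name of this helper is hypothetical.
    """
    # zipfile checks the ZIP signature, not the file extension,
    # so the .xlsm file can be opened without adding .zip to its name.
    with zipfile.ZipFile(path) as package:
        return sorted(package.namelist())
```

Calling list_package_parts("sample.xlsm") and printing each entry gives a flat listing of the same parts that the Explorer view shows as folders.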
The first thing that you notice is that the file contains a directory structure. The left panel of Figure 4-5 shows the fully expanded directory structure for the workbook file. The actual directories will vary with the workbook.
With a few exceptions, all of the files are text files. More specifically, they are XML files. You can view them in a text file editor, an XML editor, a Web browser, or even in Excel. Figure 4-6 shows one of these files viewed in the Firefox browser. The non-XML files include graphic images and VBA projects (these are stored in binary format).
This workbook file has three root-level folders, and some of these have subfolders. You'll notice that many of the folders contain a _rels folder. These folders contain XML files that define the relationships to other parts within the package.
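To make the idea of relationship files concrete, here is a small sketch that reads one of them and returns each relationship's Id and Target. It assumes the relationship file uses the standard Open Packaging Conventions namespace shown in the code. The helper name read_relationships and its default argument are mine, chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts in Open Packaging Conventions files.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_relationships(path, rels_part="_rels/.rels"):
    """Return (Id, Target) pairs from one relationship part.

    The default `rels_part` is the package-level relationship file;
    pass a name such as "xl/_rels/workbook.xml.rels" to read another.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read(rels_part))
    return [
        (rel.get("Id"), rel.get("Target"))
        for rel in root.iter(RELS_NS + "Relationship")
    ]
```

Each Target is a path to another part, given relative to the folder that contains the _rels folder.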
Following is a list of the folders in the sample.xlsm workbook:
_rels: Contains information about the package relationships.
docProps: Contains XML files that describe the file properties and application settings.
xl: This folder holds the meat of the file. The name varies with the Office document type (xl, ppt, word, and so on). You'll find several XML files that contain settings for the workbook. And if your workbook contains VBA code, it will be in a binary file with a BIN extension. The xl folder has several subfolders (some workbooks may have more or fewer subfolders, depending on the content):
charts: Contains an XML file for each chart. This file contains the chart settings.
chartsheets: Contains an XML file with data for each chart sheet in the workbook.
diagrams: Contains XML files that describe the diagrams (SmartArt) in the workbook.
drawings: Contains an XML file with data for each "drawing." Drawings include items such as buttons, charts, and images.
media: Contains embedded media, such as GIF and JPG files.
tables: Contains an XML file with data for each table.
theme: Contains an XML file with data about the workbook's theme.
worksheets: Contains an XML file for each worksheet in the workbook.
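The folder list above can also be produced from code. The following sketch counts how many parts live in each folder of a package, which is a quick way to see which of the optional xl subfolders a given workbook actually has. The function name count_parts_by_folder and the "(root)" label are assumptions of mine.

```python
import posixpath
import zipfile
from collections import Counter


def count_parts_by_folder(path):
    """Return a Counter mapping each folder in the package to its part count."""
    with zipfile.ZipFile(path) as package:
        # Skip explicit directory entries, which end with a slash.
        names = [n for n in package.namelist() if not n.endswith("/")]
    # Names inside a ZIP archive always use forward slashes, so posixpath
    # is used here. Parts at the top level are grouped under "(root)".
    return Counter(posixpath.dirname(n) or "(root)" for n in names)
```

For a workbook like sample.xlsm, you would expect keys such as xl/charts, xl/media, and xl/worksheets to appear in the result.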
Tip | If you add a ZIP extension to an Excel file, you can still open it in Excel. Excel doesn't care what the file's extension is. Also, you can save a workbook with a ZIP extension. In the Save As dialog box, add a ZIP extension and then place double quotation marks around the entire file name. For example: "Myworkbook.xlsx.zip". |
The new "open" XML file formats for Microsoft Office represent a significant step for the computing community. For the first time, it's relatively easy to read and write Excel workbooks using software other than Excel. For example, it's possible to write a program to modify thousands of Excel workbook files without even opening Excel. Such a program could insert a new worksheet into every file. The programmer, of course, would need to have excellent knowledge of the XML file structures, but such a task is definitely doable.
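As a rough illustration of modifying a workbook without opening Excel, the sketch below copies a package to a new file and swaps in new contents for a single part. It is only a building block: a task such as inserting a worksheet would need coordinated changes to several parts, and nothing here validates the new XML. The name replace_part and its parameters are hypothetical.

```python
import zipfile


def replace_part(src_path, dst_path, part_name, new_data):
    """Copy a workbook package, replacing the bytes of one part.

    Raises KeyError if `part_name` is not present in the source package.
    This helper does not check that `new_data` is valid for that part.
    """
    with zipfile.ZipFile(src_path) as src:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                # Use the new bytes for the chosen part, the original otherwise.
                if info.filename == part_name:
                    data = new_data
                else:
                    data = src.read(info.filename)
                dst.writestr(info.filename, data)
```

Running this function in a loop over a folder of workbooks is one way such a batch job could be structured. Writing to a new file rather than overwriting the original leaves the source workbook intact if something goes wrong.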
Importantly, the new file formats are much less prone to corruption (compared to the old binary formats). I saved a workbook file and then deleted one of the worksheet XML files. When I tried to reopen it in Excel, I got the message shown in Figure 4-7. Excel was able to tell that the file was damaged by comparing the information in the .rels files with what's actually in the file. In this case, Excel was able to repair the file and open it. The deleted worksheet was re-inserted, but it was empty.
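The same kind of consistency check can be approximated in code. The sketch below walks every relationship file in a package and reports targets that do not exist. This is my own approximation and not a description of how Excel detects or repairs damage. The name find_missing_parts is hypothetical, and the code assumes the standard relationships namespace.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def find_missing_parts(path):
    """Return sorted (rels_file, missing_target) pairs for a package."""
    missing = []
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        for rels_name in names:
            if not rels_name.endswith(".rels"):
                continue
            # Targets in xl/_rels/workbook.xml.rels are relative to xl/,
            # the folder that holds the _rels folder.
            base = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(package.read(rels_name))
            for rel in root.iter(RELS_NS + "Relationship"):
                # External targets (hyperlinks, for example) are not parts.
                if rel.get("TargetMode") == "External":
                    continue
                target = rel.get("Target")
                if target.startswith("/"):
                    resolved = posixpath.normpath(target.lstrip("/"))
                else:
                    resolved = posixpath.normpath(posixpath.join(base, target))
                if resolved not in names:
                    missing.append((rels_name, resolved))
    return sorted(missing)
```

An empty list means every internal relationship points to a part that is present. Deleting a worksheet XML file, as in the experiment above, should produce one entry naming that worksheet.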
In addition, the zipped XML files are usually smaller than comparable binary files. And, finally, the structured nature of the files makes it possible to extract individual elements (for example, all graphic images).
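Extracting all graphic images, for instance, comes down to copying everything in the xl/media folder. The following is a minimal sketch. The function name extract_media and the output folder argument are my own, and the code assumes that embedded images live under xl/media, as in the sample workbook.

```python
import zipfile
from pathlib import Path


def extract_media(path, out_dir):
    """Copy every file under xl/media in the package into `out_dir`.

    Returns the sorted list of file names that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Skip directory entries and anything outside xl/media.
            if name.startswith("xl/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(package.read(name))
                saved.append(target.name)
    return sorted(saved)
```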
The typical Excel user won't need to examine or modify the XML components of a workbook file. But, as a developer, you may want to write code that changes Excel's Ribbon user interface. If that's the case, you will need to be at least somewhat familiar with the structure of a workbook XML file.
CROSS-REFERENCE | Refer to Chapter 22 for more information about modifying Excel's Ribbon. |