Inside an Excel File


As I've noted, Excel 2007 uses a new XML format for its workbooks, templates, and add-ins. These files are actually ZIP-compressed archives, so they can be "unzipped" and examined.

Previous versions of Excel used a binary file format. Although the binary file format specifications are known, working with binary files is not easy. The Excel 2007 XML file format, on the other hand, is an "open format." As such, these files can be created and manipulated using other software.

Dissecting a file

In this section, I describe the various parts within a typical Excel XLSM (macro-enabled) workbook file. The workbook, named sample.xlsm, is shown in Figure 4-4. It has one worksheet, one chart sheet, and a simple VBA macro. The worksheet contains a table, a button (from the Forms controls), a SmartArt diagram, and a photo of a flower.

Figure 4-4: A simple workbook.
CD  

The sample.xlsm workbook is available on the companion CD-ROM.

To view the innards of an Excel 2007 file, open an Explorer window and add a ZIP extension to the filename. For example, the sample.xlsm file is renamed to sample.xlsm.zip. You can then open the file by using any unzipping program. I use the zip feature built into Windows XP.
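If you would rather not rename the file, a short script can list the package contents directly. The following is a minimal sketch in Python using the standard zipfile module; the function name list_parts is hypothetical, and the only assumption is that the workbook is a valid ZIP archive.

```python
import zipfile


def list_parts(workbook_path):
    """Return the names of all entries stored inside the workbook package."""
    # zipfile looks at the file's contents, not its extension,
    # so an .xlsm or .xlsx file can be opened without renaming it.
    with zipfile.ZipFile(workbook_path) as package:
        return package.namelist()
```

Calling list_parts("sample.xlsm") should return one name per file in the package, with folder names separated by forward slashes.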

Tip  

You may prefer to extract the zipped files into an uncompressed directory. Doing so makes it easier to view the files. In Windows, right-click the filename and choose Extract All.
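The same extraction can be scripted. This is a rough Python sketch, not part of the original workflow; extract_workbook is a hypothetical helper name, and the target directory is whatever you choose.

```python
import zipfile
from pathlib import Path


def extract_workbook(workbook_path, target_dir):
    """Unpack every file in the workbook package into target_dir."""
    target = Path(target_dir)
    with zipfile.ZipFile(workbook_path) as package:
        package.extractall(target)
    # Report what was written, as paths relative to target_dir.
    return sorted(
        p.relative_to(target).as_posix()
        for p in target.rglob("*")
        if p.is_file()
    )
```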

The first thing that you notice is that the file contains a directory structure. The left panel of Figure 4-5 shows the fully expanded directory structure for the workbook file. The actual directories will vary with the workbook.

Figure 4-5: The directory structure of the workbook file.

With a few exceptions, all of the files are text files. More specifically, they are XML files. You can view them in a text file editor, an XML editor, a Web browser, or even in Excel. Figure 4-6 shows one of these files viewed in the Firefox browser. The non-XML files include graphic images and VBA projects (these are stored in binary format).

Figure 4-6: Viewing an XML file in a Web browser.
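Because the parts are plain XML, you can also read one straight out of the package and indent it for easier reading. The sketch below uses Python's standard library; pretty_part is a hypothetical name, and the part name you pass in (for example "xl/workbook.xml") must exist in your particular workbook.

```python
import xml.dom.minidom
import zipfile


def pretty_part(workbook_path, part_name):
    """Return one XML part of the workbook as indented text."""
    with zipfile.ZipFile(workbook_path) as package:
        raw = package.read(part_name)  # bytes of the stored XML file
    # Parse and re-serialize with indentation for readability.
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

This works only for the XML parts. The binary parts (images and the VBA project) will not parse.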

The workbook file has three root-level folders, and some of these have subfolders. You'll notice that many of the folders contain a _rels folder. These folders contain XML files that define the relationships to other parts within the package.
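To see what a relationship file holds, you can parse it and list its entries. The following Python sketch assumes the standard package-relationships XML namespace used by Office Open XML files; read_relationships is a hypothetical helper, and the default part name is the root-level relationship file.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by relationship (.rels) parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_relationships(workbook_path, rels_part="_rels/.rels"):
    """Return (Id, Type, Target) tuples from one relationship part."""
    with zipfile.ZipFile(workbook_path) as package:
        root = ET.fromstring(package.read(rels_part))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.iter(REL_NS + "Relationship")
    ]
```

Each tuple names a target part (such as the main workbook part) and the kind of relationship that points to it.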

Following is a list of the folders in the sample.xlsm workbook:

  • _rels: Contains information about the package relationships.

  • docProps: Contains XML files that describe the file properties and application settings.

  • xl: This folder holds the meat of the file. The name varies with the Office document type (xl, ppt, word, and so on). You'll find several XML files that contain settings for the workbook. And if your workbook contains VBA code, it will be in a binary file with a BIN extension. The xl folder has several subfolders (some workbooks may have more or fewer subfolders, depending on the content):

    • charts: Contains an XML file for each chart. This file contains the chart settings.

    • chartsheets: Contains an XML file with data for each chart sheet in the workbook.

    • diagrams: Contains XML files that describe the diagrams (SmartArt) in the workbook.

    • drawings: Contains an XML file with data for each "drawing." Drawings include items such as buttons, charts, and images.

    • media: Contains embedded media, such as GIF and JPG files.

    • tables: Contains an XML file with data for each table.

    • theme: Contains an XML file with data about the workbook's theme.

    • worksheets: Contains an XML file for each worksheet in the workbook.
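The folder list above can be reproduced for any workbook with a few lines of code. This is a minimal Python sketch; parts_by_folder is a hypothetical name, and the "(root)" label for files stored at the top level of the package is my own convention.

```python
import posixpath
import zipfile
from collections import defaultdict


def parts_by_folder(workbook_path):
    """Group the files in a workbook package by the folder that holds them."""
    folders = defaultdict(list)
    with zipfile.ZipFile(workbook_path) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            # ZIP entry names always use forward slashes.
            folder = posixpath.dirname(name) or "(root)"
            folders[folder].append(posixpath.basename(name))
    return dict(folders)
```

The folders you get back will vary with the content of the workbook, as noted earlier.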

Tip  

If you add a ZIP extension to an Excel file, you can still open it in Excel. Excel doesn't care what the file's extension is. Also, you can save a workbook with a ZIP extension. In the Save As dialog box, add a ZIP extension and then place double quotation marks around the entire file name. For example: "Myworkbook.xlsx.zip".

Why is the file format important?

The new "open" XML file formats for Microsoft Office represent a significant step for the computing community. For the first time, it's relatively easy to read and write Excel workbooks using software other than Excel. For example, it's possible to write a program to modify thousands of Excel workbook files without even opening Excel. Such a program could insert a new worksheet into every file. The programmer, of course, would need to have excellent knowledge of the XML file structures, but such a task is definitely doable.
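Inserting a worksheet involves several parts of the package and is beyond a short example, but the basic mechanics of changing a workbook without Excel are simple. The Python sketch below copies a package while passing one part through a caller-supplied function. The names rewrite_part and transform are hypothetical, and the sketch assumes that your transform produces XML Excel will accept.

```python
import zipfile


def rewrite_part(src_path, dst_path, part_name, transform):
    """Copy a workbook package, passing one part's bytes through transform().

    Returns True if part_name was found and rewritten.
    """
    found = False
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                data = transform(data)  # caller decides how the part changes
                found = True
            dst.writestr(info.filename, data)
    return found
```

To process thousands of files, you would call this function in a loop over the workbook paths, writing each result to a new file rather than overwriting the original.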

Importantly, the new file formats are much less prone to corruption (compared to the old binary formats). I saved a workbook file and then deleted one of the worksheet XML files. When I tried to reopen it in Excel, I got the message shown in Figure 4-7. Excel was able to tell that the file was damaged by comparing the information in the .rels files with what's actually in the file. In this case, Excel was able to repair the file and open it. The deleted worksheet was re-inserted, but it was empty.

Figure 4-7: Excel can often repair a damaged workbook file.
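You can run a similar consistency check yourself. The Python sketch below is only an illustration of the idea, not Excel's actual repair logic: it reads every relationship file in the package and reports internal targets that are not present. The name missing_targets is hypothetical, and the path handling assumes the usual convention that targets are relative to the folder containing the _rels folder.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by relationship (.rels) parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def missing_targets(workbook_path):
    """Return (rels_part, target) pairs whose target is not in the package."""
    missing = []
    with zipfile.ZipFile(workbook_path) as package:
        names = set(package.namelist())
        for rels_name in sorted(names):
            if not rels_name.endswith(".rels"):
                continue
            # "xl/_rels/workbook.xml.rels" describes targets relative to "xl".
            base = posixpath.dirname(posixpath.dirname(rels_name))
            root = ET.fromstring(package.read(rels_name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # hyperlinks and the like live outside the file
                target = rel.get("Target")
                if target.startswith("/"):
                    resolved = posixpath.normpath(target.lstrip("/"))
                else:
                    resolved = posixpath.normpath(posixpath.join(base, target))
                if resolved not in names:
                    missing.append((rels_name, resolved))
    return sorted(missing)
```

An empty list means every internal relationship points to a part that exists. A deleted worksheet file would show up as a missing target of the workbook's relationship file.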

In addition, the zipped XML files are usually smaller than comparable binary files. And, finally, the structured nature of the files makes it possible to extract individual elements (for example, all graphic images).
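As an example of extracting individual elements, the following Python sketch copies every file from the workbook's xl/media folder into a directory of your choice. The name extract_media is hypothetical, and the sketch assumes the images are stored under xl/media, as in the sample workbook described earlier.

```python
import posixpath
import zipfile
from pathlib import Path


def extract_media(workbook_path, target_dir):
    """Save every file under xl/media into target_dir; return the file names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(workbook_path) as package:
        for name in package.namelist():
            if name.startswith("xl/media/") and not name.endswith("/"):
                out_path = target / posixpath.basename(name)
                out_path.write_bytes(package.read(name))
                saved.append(out_path.name)
    return sorted(saved)
```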

The typical Excel user won't need to examine or modify the XML components of a workbook file. But, as a developer, you may want to write code that changes Excel's Ribbon user interface. If that's the case, you will need to be at least somewhat familiar with the structure of a workbook XML file.

CROSS-REFERENCE  

Refer to Chapter 22 for more information about modifying Excel's Ribbon.




Excel 2007 Power Programming with VBA
Excel 2007 Power Programming with VBA (Mr. Spreadsheets Bookshelf)
ISBN: 0470044012
Year: 2007
Pages: 319
