OpenOffice: The OpenDocument Format


Although Microsoft Office is by far the most popular suite of applications for editing common documents, it is not the only one. Recently, a competitor has grown in popularity: OpenOffice.org, whose code also forms the basis of Sun's commercial StarOffice suite. Part of its appeal is simply that it is an alternative to Microsoft Office, but part comes from the file format its applications use. OpenOffice stores its data in the OpenDocument Format (ODF), a fully documented, open XML format. Like Open XML, ODF splits a document into multiple XML files, separating the content from the formatting. These files are stored together in a single ZIP archive, which is the document file that the OpenOffice applications create.

Because an OpenDocument file is actually a ZIP archive, you can open it with WinZip or a similar tool and view the files it contains. Figure 25-16 shows the files created for a simple OpenOffice Writer document.
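The same inspection can be done programmatically. The following is a minimal sketch using Python's standard zipfile module; the helper names list_package and build_sample_package are hypothetical, and the tiny in-memory package merely stands in for a real .odt file so the example runs on its own.

```python
import io
import zipfile


def list_package(source):
    """Return (entry names, MIME type) for an OpenDocument package.

    `source` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
        # The mimetype entry holds the document's MIME type as ASCII text.
        mime_type = package.read("mimetype").decode("ascii")
    return names, mime_type


def build_sample_package():
    """Build a tiny stand-in for a Writer file in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # ODF packages store mimetype first and uncompressed.
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                         compress_type=zipfile.ZIP_STORED)
        package.writestr(
            "content.xml",
            '<office:document-content xmlns:office='
            '"urn:oasis:names:tc:opendocument:xmlns:office:1.0" office:version="1.0"/>',
            compress_type=zipfile.ZIP_DEFLATED)
    buffer.seek(0)
    return buffer


names, mime_type = list_package(build_sample_package())
print(names)
print(mime_type)
```

Pointing list_package at a real file name instead of the in-memory buffer should list the same kinds of entries shown in the figure.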

Figure 25-16

The files stored within an ODF package contain not only the content of the document, but also its formatting and the application configuration used. The typical files are the following:

  • mimetype: A text file containing the MIME type of the document. For Writer documents, this is application/vnd.oasis.opendocument.text.

  • content.xml: An XML file containing the actual text of the document, along with references to the styles applied to it. Listing 25-14 shows part of this file.

  • styles.xml: An XML file containing the description of the styles used by the document.

  • meta.xml: An XML file containing the document's metadata, expressed largely with Dublin Core elements. This includes the author, the creation date, and similar information.

  • thumbnail.png: A graphics file showing the first page of the document, stored in the Thumbnails folder of the package. File managers and other tools use it to display a preview of the file.

  • settings.xml: An XML file that contains application settings for this document. This includes information such as the size and position of the window, printer settings, and so on.

  • manifest.xml: An XML file, stored in the META-INF folder, that lists the files stored in the package (see Figure 25-17). Each file is described by a manifest:file-entry element, which gives the MIME type of the file as well as its path within the ZIP package.
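Because meta.xml uses Dublin Core elements, reading it mostly means collecting the children of office:meta that live in the Dublin Core namespace. The sketch below does this with Python's xml.etree.ElementTree; read_metadata is a hypothetical helper, the namespace URIs are the ones declared in Listing 25-14, and the sample title and date are made-up values for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs, as declared in the content.xml listing for this chapter.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_metadata(meta_xml):
    """Return Dublin Core fields plus the creation date from meta.xml."""
    root = ET.fromstring(meta_xml)
    office_meta = root.find("office:meta", NS)
    result = {}
    if office_meta is None:
        return result
    dc_prefix = "{" + NS["dc"] + "}"
    for child in office_meta:
        # Keep every Dublin Core element (dc:title, dc:creator, dc:date, ...).
        if child.tag.startswith(dc_prefix):
            result[child.tag[len(dc_prefix):]] = child.text or ""
    # The creation date is an ODF-specific element, not a Dublin Core one.
    creation = office_meta.find("meta:creation-date", NS)
    if creation is not None:
        result["creation-date"] = creation.text or ""
    return result


# Hypothetical sample meta.xml content (values invented for illustration).
SAMPLE_META = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    office:version="1.0">
  <office:meta>
    <dc:title>Contact sheet</dc:title>
    <dc:creator>Foo deBar</dc:creator>
    <meta:creation-date>2006-05-01T10:00:00</meta:creation-date>
  </office:meta>
</office:document-meta>"""

print(read_metadata(SAMPLE_META))
```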

Figure 25-17

Listing 25-14: content.xml file

      <?xml version="1.0" encoding="utf-8"?>
      <office:document-content
      xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
      xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
      xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
      xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
      xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
      xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
      xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
      xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
      xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
      xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
      xmlns:math="http://www.w3.org/1998/Math/MathML"
      xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
      xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
      xmlns:ooo="http://openoffice.org/2004/office"
      xmlns:ooow="http://openoffice.org/2004/writer"
      xmlns:oooc="http://openoffice.org/2004/calc"
      xmlns:dom="http://www.w3.org/2001/xml-events"
      xmlns:xforms="http://www.w3.org/2002/xforms"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      office:version="1.0">
        <office:scripts />
        <office:font-face-decls>
        </office:font-face-decls>
        <office:automatic-styles>
        </office:automatic-styles>
        <office:body>
          <office:text>
            <text:sequence-decls>
            </text:sequence-decls>
            <table:table table:name="Table1" table:style-name="Table1">
              <table:table-column table:style-name="Table1.A" />
              <table:table-column table:style-name="Table1.B" />
              <table:table-column table:style-name="Table1.C" />
              <table:table-row table:style-name="Table1.1">
                <table:table-cell table:style-name="Table1.A1"
                table:number-columns-spanned="3"
                office:value-type="string">
                  <text:p text:style-name="P1">Foo deBar</text:p>
                  <text:p text:style-name="Standard">123 Any Drive, Some Place, PA, 12345</text:p>
                  <text:p text:style-name="Standard">+1 (111) 555-1212</text:p>
                  <text:p text:style-name="E-mail_20_address">foo@debar.com</text:p>
      ...
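To pull the text out of a file like the one in Listing 25-14, you can walk every text:p element under office:body/office:text and note the style each paragraph references. The following is a minimal Python sketch using xml.etree.ElementTree; paragraphs is a hypothetical helper, and SAMPLE_CONTENT is a trimmed, well-formed excerpt modeled on the listing rather than a complete content.xml.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}
STYLE_NAME = "{%s}style-name" % NS["text"]


def paragraphs(content_xml):
    """Yield (style name, text) for every text:p in the document body."""
    root = ET.fromstring(content_xml)
    body = root.find("office:body/office:text", NS)
    if body is None:
        return
    # iter() also descends into nested elements, so paragraphs inside
    # table cells (as in the listing) are found too.
    for p in body.iter("{%s}p" % NS["text"]):
        # itertext() gathers text from child elements such as text:span.
        yield p.get(STYLE_NAME), "".join(p.itertext())


# Trimmed excerpt modeled on Listing 25-14, closed so that it parses.
SAMPLE_CONTENT = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    office:version="1.0">
  <office:body>
    <office:text>
      <table:table table:name="Table1">
        <table:table-row>
          <table:table-cell office:value-type="string">
            <text:p text:style-name="P1">Foo deBar</text:p>
            <text:p text:style-name="E-mail_20_address">foo@debar.com</text:p>
          </table:table-cell>
        </table:table-row>
      </table:table>
    </office:text>
  </office:body>
</office:document-content>"""

for style, text in paragraphs(SAMPLE_CONTENT):
    print(style, "->", text)
```

The style names returned here are only references; their definitions still have to be looked up elsewhere in the package.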

OpenDocument versus Open XML

The battle over the format of your documents has begun once again: the OpenDocument and Open XML formats both promise to turn your word processing documents, spreadsheets, presentations, and other documents into cross-platform XML documents. OpenDocument is supported by Sun, IBM, the OASIS consortium, and others, and it is an ISO standard (ISO/IEC 26300). Open XML is supported by Microsoft and has been standardized by Ecma International as ECMA-376; as of this writing, it is also being proposed as an ISO standard.

Choosing between these two formats on technical merit is difficult. Both store one or more XML files in a ZIP archive. Between them, they build on existing work and standards such as XML namespaces, VML, XSD, XLink, and SVG. Both rely heavily on references to connect the parts of a document. Open XML takes slightly more work here, because you often have to follow two references: first to the appropriate .rels file, and then to the file containing the data.
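The two-step lookup in Open XML can be sketched as follows: an element in a part such as word/document.xml carries a relationship ID, and that ID is resolved through the part's .rels file (here, word/_rels/document.xml.rels) to the actual target. This is a minimal Python sketch under those assumptions; resolve_relationship is a hypothetical helper, and the IDs and targets in SAMPLE_RELS are invented for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def resolve_relationship(rels_xml, rel_id, source_dir):
    """Follow a relationship ID to the target's path inside the package.

    source_dir is the folder of the part that holds the ID, for example
    "word" for word/document.xml.
    """
    root = ET.fromstring(rels_xml)
    for rel in root.iter("{%s}Relationship" % REL_NS):
        if rel.get("Id") == rel_id:
            target = rel.get("Target")
            if rel.get("TargetMode") == "External":
                # External targets (such as hyperlinks) are not package parts.
                return target
            # Internal targets are relative to the source part's folder.
            return posixpath.normpath(posixpath.join(source_dir, target)).lstrip("/")
    raise KeyError(rel_id)


# Hypothetical word/_rels/document.xml.rels content.
SAMPLE_RELS = b"""<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId4"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="media/image1.png"/>
  <Relationship Id="rId5"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
    Target="mailto:foo@debar.com" TargetMode="External"/>
</Relationships>"""

print(resolve_relationship(SAMPLE_RELS, "rId4", "word"))
```

In ODF, by contrast, an attribute such as text:style-name names its target directly, so only one lookup is needed.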

In practice, the choice between these two formats is likely to be a business decision. Do you need to work with Word, Excel, and the rest of Microsoft Office 2007? Then use Open XML and Microsoft Office. Would you rather align yourself with open source software, or with products such as Lotus Notes, which will support ODF in the future? Then use ODF and OpenOffice. Alternatively, because both formats are XML, you may be able to use XSLT to transform the XML parts of one format into the other (after extracting them from the ZIP archive), allowing you to support both standards.


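Just as with the Open XML format, much of the work of processing ODF involves following references. For example, the style name E-mail_20_address used in content.xml is defined within the styles.xml file. (The _20_ sequence encodes the space in the style's display name, E-mail address.)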
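Following a style reference can be sketched as a search for the style:style element whose style:name matches. Automatic styles such as P1 are defined in the office:automatic-styles section of content.xml, while named styles such as E-mail_20_address live in styles.xml, so the sketch below searches content.xml first and then styles.xml. find_style is a hypothetical helper, and the sample style definitions (including the parent style) are invented for illustration.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
S = "{%s}" % STYLE_NS


def find_style(name, *documents):
    """Return the style:style element called `name`, searching documents in order."""
    for xml_bytes in documents:
        root = ET.fromstring(xml_bytes)
        for style in root.iter(S + "style"):
            if style.get(S + "name") == name:
                return style
    return None


# Hypothetical styles.xml excerpt.
SAMPLE_STYLES = b"""<office:document-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    office:version="1.0">
  <office:styles>
    <style:style style:name="Standard" style:family="paragraph"/>
    <style:style style:name="E-mail_20_address" style:display-name="E-mail address"
                 style:family="paragraph" style:parent-style-name="Standard"/>
  </office:styles>
</office:document-styles>"""

# Hypothetical content.xml excerpt holding an automatic style.
SAMPLE_CONTENT = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    office:version="1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph"
                 style:parent-style-name="Standard"/>
  </office:automatic-styles>
</office:document-content>"""

style = find_style("E-mail_20_address", SAMPLE_CONTENT, SAMPLE_STYLES)
print(style.get(S + "display-name"), "inherits from", style.get(S + "parent-style-name"))
```

A fuller resolver would keep following style:parent-style-name until it reaches a style with no parent, merging properties along the way.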

The basic flow for processing an ODF file is to open the META-INF/manifest.xml file to locate the files you need. The bulk of the information is located in the content.xml file.
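That flow might look like the following minimal Python sketch: open the package, read META-INF/manifest.xml, map each manifest:full-path to its manifest:media-type, confirm that content.xml is listed, and then parse it. The helper names read_manifest, load_content, and build_sample_package are hypothetical, and the in-memory package is a bare-bones stand-in for a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
M = "{%s}" % MANIFEST_NS


def read_manifest(package):
    """Map each full-path in META-INF/manifest.xml to its media type."""
    root = ET.fromstring(package.read("META-INF/manifest.xml"))
    return {entry.get(M + "full-path"): entry.get(M + "media-type")
            for entry in root.iter(M + "file-entry")}


def load_content(source):
    """Open an ODF package, consult the manifest, then parse content.xml."""
    with zipfile.ZipFile(source) as package:
        manifest = read_manifest(package)
        if "content.xml" not in manifest:
            raise ValueError("manifest does not list content.xml")
        # The entry whose full-path is "/" describes the package itself.
        return manifest.get("/"), ET.fromstring(package.read("content.xml"))


def build_sample_package():
    """Assemble a minimal in-memory package (illustration only)."""
    manifest = (
        '<manifest:manifest xmlns:manifest="%s">'
        '<manifest:file-entry manifest:full-path="/" '
        'manifest:media-type="application/vnd.oasis.opendocument.text"/>'
        '<manifest:file-entry manifest:full-path="content.xml" '
        'manifest:media-type="text/xml"/>'
        '</manifest:manifest>' % MANIFEST_NS
    )
    content = ('<office:document-content xmlns:office='
               '"urn:oasis:names:tc:opendocument:xmlns:office:1.0" office:version="1.0"/>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("META-INF/manifest.xml", manifest)
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


doc_type, content_root = load_content(build_sample_package())
print(doc_type)
print(content_root.tag)
```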




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
