Parsing XML

This chapter introduces two ways of parsing XML data, available from Qt's XML module. We demonstrate event-driven parsing with SAX, the Simple API for XML, and tree-style parsing with DOM, the Document Object Model.

Table of contents:

14.1 The Qt XML Module
14.2 Event-Driven Parsing
14.3 XML, Tree Structures, and DOM

XML is an acronym for eXtensible Markup Language. It is a markup language similar to HTML (HyperText Markup Language), but with stricter syntax and no semantics (i.e., no meanings associated with the tags).

XML's stricter syntax is in strong contrast to HTML. For example:

  • Each XML <tag> must have a closing </tag>, or be self-closing, like this: <br/>.
  • XML tags are case sensitive: <tag> is not the same as <Tag>.
  • Characters such as > and < that are not actually part of a tag must be replaced by passive equivalents such as &gt; and &lt; in an XML document to avoid confusing the parser.
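
The last rule can be sketched in a few lines of standard C++. This is only an illustration, not code from the Qt XML module: escapeXmlText is a hypothetical helper name, and it handles just the three characters that matter in element text.

```cpp
#include <string>

// Replace the characters that XML reserves for markup with their
// predefined entity references. escapeXmlText is a hypothetical helper
// name chosen for this sketch; it is not part of Qt.
std::string escapeXmlText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            case '&': out += "&amp;"; break;  // '&' starts every entity, so it needs escaping too
            default:  out += c;       break;
        }
    }
    return out;
}
```

Escaping the ampersand as well is a deliberate choice here: without it, text that already contains "&lt;" would be read back by a parser as a literal < character.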

Example 14.1 is an HTML document that does not conform to XML rules.

Example 14.1. src/xml/html/testhtml.html

This is a title

This is a paragraph. What do you think of that?

Html makes use of unterminated line-breaks:
And those do not make XML parsers happy.

  • HTML is not very strict.
  • An unclosed tag doesn't bother HTML parsers one bit.
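
To see why unterminated tags upset an XML parser, consider the following rough sketch of a tag-balance check. It is a simplification under stated assumptions, not how Qt's parsers are implemented: tagsAreBalanced is a hypothetical name, and the function ignores comments, CDATA sections, processing instructions, and DOCTYPE declarations. It also assumes that attribute values contain no > character and that closing tags contain no whitespace.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true when every opening tag in `markup` has a
// matching closing tag in the right order, or is self-closing (ends in "/>").
// Assumptions: no comments, CDATA, processing instructions, or DOCTYPE;
// attribute values contain no '>'; closing tags contain no whitespace.
bool tagsAreBalanced(const std::string& markup) {
    std::vector<std::string> open;  // stack of currently open element names
    std::size_t pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::size_t end = markup.find('>', pos);
        if (end == std::string::npos) return false;   // tag never finished
        const std::string body = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (body.empty()) return false;               // "<>" is not a tag

        if (body[0] == '/') {
            // Closing tag: it must match the most recently opened element.
            const std::string name = body.substr(1);
            if (open.empty() || open.back() != name) return false;  // case-sensitive
            open.pop_back();
        } else if (body.back() != '/') {
            // Opening tag that is not self-closing: remember its name,
            // which ends at the first whitespace (attributes follow).
            open.push_back(body.substr(0, body.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();  // anything left on the stack was never closed
}
```

With this check, a fragment such as "<p>one<br>two</p>" is rejected because the line break is never closed, while "<p>one<br/>two</p>" is accepted.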

If we combined XML syntax with HTML element semantics, we would get a language called XHTML. Example 14.2 shows Example 14.1 rewritten as XHTML.

Example 14.2. src/xml/html/testxhtml.html

This is a title

This is a paragraph. What do you think of that?

Html self-terminating linebreaks are ok:
They don't confuse the XML parser.

  • This is proper list item
  • This is another list item

XML is a whole class of file formats that is understandable and editable by humans as well as by programs. XML has become a popular format for storing and exchanging data from Web applications. It is also a natural language for representing hierarchical (tree-like) information, which includes most documentation.

Many applications (e.g., Qt Designer, Umbrello, Dia) use an XML file format for storing data. Qt Designer's .ui files use XML to describe the layout of Qt widgets in a GUI. The book you are reading now is written in a flavor of XML called Slacker's DocBook.[1] It's like DocBook,[2] an XML language for writing books, but it adds some shorthand tags from XHTML and custom tags for describing courseware.

[1] http://slackerdoc.tigris.org/

[2] http://www.docbook.org

An XML document is composed of nodes. Elements are nodes and look like this: <tag> text or elements </tag>. An opening tag can contain attributes. An attribute has the form: name="value". Elements nested inside one another form a parent-child tree structure.
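
The parent-child structure described above can be modeled with a small recursive type. The following is a minimal sketch in standard C++, not the DOM classes from Qt's XML module: Element, toXml, and makeSampleList are hypothetical names, and the id="demo" attribute is made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory model of an element node: a name, attributes of the
// form name="value", optional text, and child elements in document order.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;
    std::vector<Element> children;
};

// Write the element and its subtree as XML, indenting by nesting depth.
// Assumes text and attribute values are already escaped.
std::string toXml(const Element& e, std::size_t depth = 0) {
    const std::string indent(depth * 2, ' ');
    std::string out = indent + "<" + e.name;
    for (const auto& attr : e.attributes)
        out += " " + attr.first + "=\"" + attr.second + "\"";
    if (e.text.empty() && e.children.empty())
        return out + "/>\n";              // no content: self-closing form
    out += ">" + e.text;
    if (!e.children.empty()) {
        out += "\n";
        for (const Element& child : e.children)
            out += toXml(child, depth + 1);
        out += indent;
    }
    return out + "</" + e.name + ">\n";
}

// Build a parent element with one attribute and two child elements.
Element makeSampleList() {
    Element list{"ul", {{"id", "demo"}}, "", {}};
    list.children.push_back(Element{"li", {}, "first item", {}});
    list.children.push_back(Element{"li", {}, "second item", {}});
    return list;
}
```

Calling toXml(makeSampleList()) yields a ul element containing two indented li children, and an element with neither text nor children is written in the self-closing form.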

Example 14.3. src/xml/sax1/samplefile.xml

Intro to XML This is a paragraph

  • This is an unordered list item.
  • This only shows up in the textbook

Look at this example code below:

In Example 14.3, the <ul> has two <li> children and is itself a child of the element that encloses it. Elements with no children can be self-terminated with a />. Some elements have attributes. Indenting nested elements helps readability, but extra whitespace is ignored by most parsers.

How many direct children does the parent of the <ul> have?

XML Editors

There are several open-source XML editors available. You are encouraged to try them before you go with a commercial solution.

1. jEdit[3] has an XML plugin that works quite well. Be sure to use this option: "insert closing tag when
2. For KDE users, there is quanta.[4] Like kdevelop, this is based on Kate, the KDE advanced text editor. If you are accustomed to using emacs keys, be sure to get this Kate plugin: ktexteditor-emacsextensions.[5]

[3] http://www.jedit.org

[4] http://quanta.kdewebdev.org/

[5] http://www.kde-apps.org/content/show.php?content=21706

XMLLINT

The free tool xmllint is very handy for checking an XML file for errors. It reports very descriptive error messages (mismatched start/end tags, missing characters, etc.) and points out where the errors are. It can also be used to indent/"pretty print" a well-formed XML document.



An Introduction to Design Patterns in C++ with Qt 4
ISBN: 0131879057
EAN: 2147483647
Year: 2004
Pages: 268
