SGML and XML



The syntax of traditional HTML (2.0, 3.0, 3.2, and 4.0) is defined in SGML (Standard Generalized Markup Language) notation. SGML is a meta-language, a language that is used to define other languages. Although HTML is the best-known SGML-defined language, SGML itself has been used successfully to define special document types ranging from aviation maintenance manuals to scholarly texts. SGML is used to define the various elements, attributes, and entities of a markup language and the ways they can be used together. The various rules of the language are represented in a file called a document type definition, or DTD. We reference the DTD for HTML using a doctype statement like this:

  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"   "http://www.w3.org/TR/html4/loose.dtd">  

The actual DTD file contains a variety of statements in SGML that define the syntax of HTML. For example, the syntax for a br element is defined in SGML by the following:

 <!ELEMENT BR - O EMPTY                 -- forced line break -->
 <!ATTLIST BR
   %coreattrs;                          -- id, class, style, title --
   clear       (left|all|right|none) none -- control of text flow --
   >

The SGML syntax indicates that the br element is empty; that is, it encloses no content or markup. It has the clear attribute to control text flow, along with the core attributes id, class, style, and title. The rest of the HTML elements are similarly defined. If you are interested in reading the specification directly, you can learn how to read a DTD in Appendix F.
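To make the br declaration more concrete, here is a minimal Python sketch that checks br elements in a fragment against the clear values listed in the DTD. It relies on the standard library's html.parser module; the names CLEAR_VALUES, BrCollector, and check_br_clear are illustrative assumptions, not part of any specification, and the check covers only this one attribute rather than full DTD validation.

```python
from html.parser import HTMLParser

# Allowed values for br's clear attribute, copied from the DTD fragment above.
CLEAR_VALUES = {"left", "all", "right", "none"}


class BrCollector(HTMLParser):
    """Collects the clear value of every br element it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.clear_values = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # A missing clear attribute falls back to the DTD default, "none".
            self.clear_values.append(dict(attrs).get("clear", "none"))


def check_br_clear(markup):
    """Return a (value, is_allowed) pair for each br element in the markup."""
    collector = BrCollector()
    collector.feed(markup)
    collector.close()
    return [(value, value in CLEAR_VALUES) for value in collector.clear_values]


# br is EMPTY, so it appears as a lone start tag with no closing tag.
results = check_br_clear('one<br>two<br clear="all">three<br clear="both">')
print(results)
```

The last br uses clear="both", a value that is not in the enumerated list, so the sketch flags it as not allowed.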

SGML is used to specify languages, so unless you are writing your own language, you probably will not use SGML directly but rather one of its applications, such as HTML. However, if you are writing your own tags for a specialized application, you might wonder: why not use SGML? SGML can scale to represent very complex information structures and seems like a reasonable candidate for increasing HTML's flexibility, but it is overly complex at times and wasn't built with today's online applications in mind. The language first appeared in the late 1970s, the golden age of batch processing, and simply wasn't designed to be used in networked, interactive applications.

XML is, in fact, an attempt to define a subset of SGML specifically designed for use in a Web context. With XML, we generally define application languages using either a document type definition, just as in SGML, or a more powerful grammar mechanism called a schema. Consider the doctype statement for XHTML, which looks very similar to the traditional HTML doctype:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">  

The only differences we see between the two doctype statements are that the root element html is now lowercased and the identifier and URL reference the XHTML transitional specification instead of the HTML transitional specification.
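The comparison of the two doctype statements can also be done mechanically. The following Python sketch splits each declaration into its root element, public identifier, and system identifier (the DTD URL) and reports which parts differ. The regular expression and the names DOCTYPE_RE and parse_doctype are assumptions made for illustration; the pattern handles only PUBLIC doctypes written in the form shown here, not the full doctype grammar.

```python
import re

# Matches only the simple form: <!DOCTYPE root PUBLIC "public id" "system id">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+'
    r'"(?P<public_id>[^"]+)"\s+"(?P<system_id>[^"]+)"\s*>'
)


def parse_doctype(declaration):
    """Split a PUBLIC doctype declaration into its three parts."""
    match = DOCTYPE_RE.fullmatch(declaration.strip())
    if match is None:
        raise ValueError("not a PUBLIC doctype declaration in the expected form")
    return match.groupdict()


html4 = parse_doctype(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
)
xhtml = parse_doctype(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

# Keep only the parts whose values differ between the two declarations.
differences = {
    part: (html4[part], xhtml[part]) for part in html4 if html4[part] != xhtml[part]
}
for part, (old, new) in differences.items():
    print(f"{part}: {old!r} -> {new!r}")
```

All three parts differ: the root element changes only in case, while the public identifier and the URL point at the XHTML DTD instead of the HTML one.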

If you look at the XHTML DTD, you'll see that even its syntax is extremely similar to that of the HTML DTD, as demonstrated by the declaration of the br element in the XHTML transitional specification, presented here.

 <!ELEMENT br EMPTY>   <!-- forced line break -->
 <!ATTLIST br
   %coreattrs;
   clear       (left|all|right|none) "none"
   >

Both SGML and XML are meta-languages used to define markup languages. Over the years, various application languages have been defined using each.

[Figure: application languages defined using SGML and XML]

Of course, HTML and XHTML are by far the most popular languages defined with these technologies.

At this point, you might wonder why you should care about XML. You might imagine you won't need to write a language of your own, and if you do, you really wouldn't want to learn all that nasty XML syntax anyway. Writing your own language, however, does have advantages for exchanging information with other sites, and the strictness that XML provides actually makes parsing the data much easier. So read on: you'll find out that you can indeed get started with XML with very little difficulty!
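As a small illustration of how XML's strictness helps parsing, the Python sketch below uses the standard library's xml.etree.ElementTree module. A strict parser either returns a clean tree or rejects the input outright, so the receiving code never has to guess at the author's intent. The is_well_formed helper and the order markup with its sku and qty attributes are hypothetical examples invented here, not a real exchange format.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML closes the empty br element explicitly; traditional HTML does not.
xhtml_style = "<p>line one<br />line two</p>"
html_style = "<p>line one<br>line two</p>"
print(is_well_formed(xhtml_style))
print(is_well_formed(html_style))

# A hypothetical home-grown language for exchanging order data between sites.
order = ET.fromstring('<order id="42"><item sku="A1" qty="2" /></order>')
quantities = {item.get("sku"): int(item.get("qty")) for item in order.iter("item")}
print(quantities)
```

The HTML-style fragment is rejected because the unclosed br leaves the closing p tag mismatched, while the XHTML-style fragment and the order document parse without any special handling.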





HTML & XHTML: The Complete Reference (Osborne Complete Reference Series)
ISBN: 007222942X
EAN: 2147483647
Year: 2003
Pages: 252
Authors: Thomas Powell
