Automatic Conversion from HTML to XHTML

You may already have a huge Web site full of HTML pages, and you might be reading all this with some trepidation. How are you going to convert all those pages to the far stricter XHTML? In fact, there's a utility out there that can do it for you: the Tidy utility, created by Dave Raggett. This utility is available for a wide variety of platforms, and you can download it for free from http://tidy.sourceforge.net/. There's also a complete set of instructions on that page. (Other options include the HTML Kit at http://www.chami.com/html-kit/.)

Here's an example: I'll use Tidy in Windows to convert a file from HTML to XHTML. In this case, I'll use the example HTML file we developed earlier, as saved in a file named ch16_01.html:

    <HTML>
        <HEAD>
            <TITLE>
                Welcome to my page
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Welcome to XHTML!
            </H1>
        </BODY>
    </HTML>

After downloading Tidy, you run it at the command prompt. Here are the command-line switches, or options, you can use with Tidy:

  • -config file Uses the configuration file named file

  • -indent or -i Indents element content

  • -omit or -o Omits optional end tags

  • -wrap 72 Wraps text at column 72 (default is 68)

  • -upper or -u Forces tags to uppercase (default is lowercase)

  • -clean or -c Replaces font, nobr, and center tags with Cascading Style Sheets (CSS)

  • -raw Doesn't substitute entities for characters 128 to 255

  • -ascii Uses ASCII for output, Latin-1 for input

  • -latin1 Uses Latin-1 for both input and output

  • -utf8 Uses UTF-8 for both input and output

  • -iso2022 Uses ISO2022 for both input and output

  • -numeric or -n Outputs numeric rather than named entities

  • -modify or -m Modifies original files

  • -errors or -e Shows only error messages

  • -quiet or -q Suppresses nonessential output

  • -f file Writes errors to file

  • -xml Use this when input is in XML

  • -asxml Converts HTML to XML

  • -slides Bursts into slides on h2 elements

  • -help Lists command-line options

  • -version Shows release date

In this example, I'll use three switches: -m to indicate that I want Tidy to modify the file I pass to it, which will be ch16_01.html; -i to indicate that I want it to indent the resulting XHTML elements; and -config to indicate that I want to use a configuration file named ch16_02.txt. Here's how I use Tidy from the command line:

 %tidy -m -i -config ch16_02.txt ch16_01.html 
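If you have many pages to convert, the same command can be scripted. The following is a minimal Python sketch, not part of Tidy itself: the helper names build_tidy_command and convert_files are hypothetical, and it assumes that the tidy executable is on your PATH and that ch16_02.txt is the configuration file passed with -config. Because -m overwrites the files it is given, try it on copies first.

```python
import shutil
import subprocess


def build_tidy_command(html_path, config_path="ch16_02.txt"):
    """Return the argument list for the Tidy invocation shown above."""
    # -m modifies the file in place, -i indents the output,
    # and -config names the configuration file to use.
    return ["tidy", "-m", "-i", "-config", config_path, html_path]


def convert_files(html_paths, config_path="ch16_02.txt"):
    """Run Tidy on each file; return a dict mapping each path to Tidy's exit status."""
    # Assumption: the executable is installed under the name "tidy".
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    results = {}
    for path in html_paths:
        completed = subprocess.run(build_tidy_command(path, config_path))
        results[path] = completed.returncode
    return results
```

Calling convert_files(["ch16_01.html"]) runs the same command as the one above. Tidy's documentation describes an exit status of 0 for success, 1 when warnings were issued, and 2 when errors were found; check the behavior of your version before relying on those values.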

Tidy is actually a utility that cleans up HTML, as you might gather from its name. To make it create XHTML, you have to use a configuration file, which I've named ch16_02.txt here. You can see all the configuration file options on the Tidy Web site. Here are the contents of ch16_02.txt, which I'll use to convert ch16_01.html to XHTML:

Listing ch16_02.txt
    output-xhtml: yes
    add-xml-pi: yes
    doctype: loose

Here, output-xhtml indicates that I want Tidy to create XHTML output. Using add-xml-pi indicates that the output should also include an XML declaration, and doctype: loose means that I want to use the transitional XHTML DTD. If you don't specify what DTD to use, Tidy will guess, based on your HTML.

Here's the resulting XHTML document:

    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="generator" content="HTML Tidy, see www.w3.org" />
        <title>Welcome to my page</title>
      </head>
      <body>
        <h1> Welcome to XHTML!</h1>
      </body>
    </html>
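Because XHTML is XML, one quick sanity check on Tidy's output is to feed it to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name check_well_formed is hypothetical, and the document is held in a string purely for illustration. Note that this only confirms that the document is well-formed; it does not validate it against the transitional DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The Tidy output shown above, stored in a string for illustration.
XHTML_TEXT = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />
    <title>Welcome to my page</title>
  </head>
  <body>
    <h1> Welcome to XHTML!</h1>
  </body>
</html>
"""


def check_well_formed(text):
    """Parse text as XML; return the root element or raise ET.ParseError."""
    return ET.fromstring(text)


root = check_well_formed(XHTML_TEXT)
# Element names carry the XHTML namespace declared on the html element.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(root.tag)
print(title.text)
```

A document with mismatched or unclosed tags makes check_well_formed raise ET.ParseError, which is a convenient signal that a page needs another look.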

You can even teach Tidy about new XHTML tags that you've added. If you're ever stuck and want a quick way of translating HTML into XHTML, check out Tidy; it's fast, it's effective, and it's free.



Real World XML (2nd Edition)
ISBN: 0735712867
Year: 2005
Pages: 440
Authors: Steve Holzner
