Cleaning Up HTML Documents with tidy



If you ever have to develop HTML documents (when developing personal Web sites, completing a class project, or creating Web pages on the job), the tidy utility can be a handy resource. If you're creating HTML pages by hand, you'll likely make occasional errors. These errors probably won't cause significant problems when people use the pages, but they can make the pages harder to read, harder to maintain, and harder to subject to the scrutiny of your peers. Not to worry; tidy can help!

tidy is not usually included with Linux or Unix distributions, but you can download it from http://tidy.sourceforge.net and install it using the instructions in Chapter 14.
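Before downloading anything, it can be worth checking whether tidy is already on your system. The following is a minimal sketch that assumes a POSIX-style shell; the variable name tidy_status is ours, purely for illustration.

```shell
# Report whether a tidy executable can be found on the PATH.
# tidy_status is a hypothetical variable name used only in this sketch.
if command -v tidy >/dev/null 2>&1; then
    tidy_status="installed"
else
    tidy_status="missing"
fi
echo "tidy is $tidy_status"
```

If the result is "missing", download and install tidy as described above.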

To Clean Up HTML Documents with tidy:

1.

vi sampledoc.html

Use the editor of your choice to create an HTML document. Our sample document is called, well, sampledoc.html (Figure 17.1). Don't worry about getting the tagging or syntax exactly right; tidy will take care of the details. Save and close your document.

Figure 17.1. Even a flawed HTML document, like this one, can be fixed by tidy.
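If you would rather not type a test file in an editor, a here-document can create one. The markup below is a hypothetical stand-in for the figure, guessed from the titles and list items that appear in tidy's output; it is not the actual file. It deliberately omits closing tags and includes a stray </ul> so that tidy has something to fix.

```shell
# Hypothetical stand-in for the sample document: hand-written HTML with
# missing closing tags and one extra </ul> (an assumption, not the real file).
cat > sampledoc.html <<'EOF'
<title>Jdoe's Home Page</title>
<h1>Making Unix Work, One Day at a Time</h1>
<p>Read these tips, when I get around to writing them, and weep.
<ul>
<li>To be written
<li>To be written later
<li>To be written next week
</ul>
</ul>
<address>jdoe@example.com</address>
EOF
```

The quoted 'EOF' delimiter keeps the shell from interpreting anything inside the document, including the apostrophe in the title.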


2.

tidy sampledoc.html

The tidy utility will apply HTML formatting rules and then output a massaged version of your document that is technically correct (Code Listing 17.1). Cool, huh?

Code Listing 17.1. The tidy command is handy for cleaning up HTML documents.

[jdoe@frazz public_html]$ tidy sampledoc.html
Tidy (vers 4th August 2000) Parsing "sampledoc.html"
line 10 column 6 - Warning: discarding unexpected </ul>
sampledoc.html: Document content looks like HTML 2.0
1 warnings/errors were found!
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title>Jdoe's Home Page</title>
</head>
<body>
<h1>Making Unix Work, One Day at a Time</h1>
<p>Read these tips, when I get around to writing them, and weep.</p>
<ul>
<li>To be written</li>
<li>To be written later</li>
<li>To be written next week</li>
</ul>
<address>jdoe@example.com</address>
</body>
</html>
HTML & CSS specifications are available from http://www.w3.org/
To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
Lobby your company to join W3C, see http://www.w3.org/Consortium
[jdoe@frazz public_html]$

3.

tidy sampledoc.html > fixedupdoc.html

If you like the results, redirect the output to a new filename, as shown here, or use tidy -m sampledoc.html to replace the original document.
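The redirect can be wrapped in a small function that keeps the cleaned-up copy only when tidy did not report errors. This is a sketch under assumptions: the function name tidy_to and the .log suffix are ours, and it relies on the exit status convention given in HTML Tidy's documentation (0 for clean input, 1 for warnings, 2 for errors), which may differ in older builds.

```shell
# tidy_to SRC OUT: write the cleaned-up copy of SRC to OUT, leaving SRC untouched.
# tidy_to and the OUT.log file name are hypothetical, chosen for this sketch.
tidy_to() {
    src=$1
    out=$2
    # Markup goes to OUT; tidy's warnings and summary go to OUT.log.
    tidy "$src" > "$out" 2> "$out.log"
    status=$?
    # Assumed convention: 0 = clean, 1 = warnings, 2 = errors.
    if [ "$status" -ge 2 ]; then
        echo "tidy reported errors; see $out.log" >&2
        rm -f "$out"
        return 1
    fi
    return 0
}
```

For example, tidy_to sampledoc.html fixedupdoc.html produces the same fixedupdoc.html as the redirect in this step, and leaves the diagnostics in fixedupdoc.html.log.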

Tips

  • For even spiffier results, we like using tidy -indent -quiet --doctype loose -modify sampledoc.html, which suppresses the informative messages from tidy, makes the output an HTML 4 document, tidily indents the output, and replaces the original with the modified file (Code Listing 17.2). All that, and only one command.

  • Consider using tidy with the sed script (described in the next section) to do a lot of cleanup at once.


Code Listing 17.2. The tidy command, with the appropriate flags, performs miracles (almost).

[jdoe@frazz public_html]$ tidy -indent -quiet --doctype loose sampledoc.html
line 10 column 6 - Warning: discarding unexpected </ul>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org">
    <title>
      Jdoe's Home Page
    </title>
  </head>
  <body>
    <h1>
      Making Unix Work, One Day at a Time
    </h1>
    <p>
      Read these tips, when I get around to writing them, and weep.
    </p>
    <ul>
      <li>
        To be written
      </li>
      <li>
        To be written later
      </li>
      <li>
        To be written next week
      </li>
    </ul>
    <address>
      jdoe@example.com
    </address>
  </body>
</html>
HTML & CSS specifications are available from http://www.w3.org/
To learn more about Tidy see http://www.w3.org/People/Raggett/tidy/
Please send bug reports to Dave Raggett care of <html-tidy@w3.org>
Lobby your company to join W3C, see http://www.w3.org/Consortium
[jdoe@frazz public_html]$
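The same flags can be applied to a whole directory of pages with a short loop. This is only a sketch: the function name tidy_all and the .bak backup suffix are our own inventions, and the backup copy is there because -modify overwrites each file in place.

```shell
# tidy_all [DIR]: run tidy with the flags from the tip on every .html file
# in DIR (default: the current directory). tidy_all is a hypothetical name.
tidy_all() {
    dir=${1:-.}
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue      # skip when the glob matches nothing
        cp "$f" "$f.bak"             # keep a backup, since -modify overwrites
        tidy -indent -quiet --doctype loose -modify "$f" ||
            echo "tidy flagged $f (exit status $?)" >&2
    done
}
```

Running tidy_all public_html would clean up every page in that directory and leave a .bak copy of each original next to it. A nonzero exit status is only reported, not treated as fatal, because tidy may return one for mere warnings.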




Unix(c) Visual Quickstart Guide
UNIX, Third Edition
ISBN: 0321442458
Year: 2006
Pages: 251
