One area where Unix really shines is the Internet. Whether it's running a fast server from under your desk or simply surfing the Web intelligently and efficiently, there's precious little you can't embed in a shell script when it comes to Internet interaction.
Internet tools are scriptable, even though you might never have thought of them that way. For example, ftp, a program that is perpetually trapped in debug mode, can be scripted in some very interesting ways, as Script #59 explores. While it's not universally true, shell scripting can improve the performance and output of most command-line utilities that work with some facet of the Internet.
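To give a taste of the idea, scripting ftp usually means feeding it a command sequence on standard input, typically via a here document. The sketch below is not taken from Script #59: the server name, account, and filename are invented placeholders, and the command stream is simply printed (in a real script you would pipe it into `ftp -n`) so the sketch runs without a live FTP server.

```shell
#!/bin/sh
# Sketch: driving ftp from a script with a here document.
# ftp.example.com, the anonymous login, and archive.tar.gz are
# placeholders, not real values from this chapter's scripts.

ftp_commands() {
  cat << 'EOF'
open ftp.example.com
user anonymous me@example.com
binary
get archive.tar.gz
bye
EOF
}

# In a real script you would run:  ftp_commands | ftp -n
# Here we just show the command stream ftp would receive:
ftp_commands
```

The here document is the key trick: it turns an interactive program into a batch one by pretending to be the user's keyboard.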
Perhaps the best tool in the Internet scripter's toolbox is lynx, a powerful text-only web-browsing tool. Sites don't look glamorous when you strip out all the graphics, but lynx can grab website content and dump it to standard output, making it a breeze to use grep and sed to extract specific snippets of information from any website, be it Yahoo!, the Federal Reserve, or even the ESPN.com home page.
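To see the flavor of this technique: assuming lynx is installed, `lynx -dump URL` emits a plain-text rendering of a page on standard output, ready for grep and sed. The snippet below simulates that dump with a small canned sample (the weather page and its temperature line are invented) so the extraction pipeline itself can be seen in isolation.

```shell
#!/bin/sh
# Sketch: extracting one value from a lynx -dump of a web page.
# The text below is an invented stand-in for the output of, say:
#   lynx -dump 'http://www.example.com/weather'

page_dump() {
  cat << 'EOF'
   Weather for Springfield

   Current conditions: Partly cloudy
   Temperature: 57 degrees
   Humidity: 31%
EOF
}

# Pull out just the number after "Temperature:" with grep and sed:
temp=$(page_dump | grep 'Temperature:' |
       sed 's/.*Temperature: *\([0-9]*\).*/\1/')
echo "Current temperature: $temp"
```

Swap `page_dump` for a real `lynx -dump` invocation and the same grep/sed pipeline scrapes the live site; that one-line substitution is the pattern behind most of the scripts in this chapter.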
Figure 7-1 shows how my own website (http://www.intuitive.com/) looks in the spartan lynx browser.
An alternative browser that's nearly synonymous with lynx is links, which offers a similar text-only browsing environment with rich possibilities for use in shell scripting. Of the two, lynx is more stable and more widely distributed.
If you don't have either browser available, you'll need to download and install one or the other before you proceed with the scripts in this chapter. You can get lynx from http://lynx.browser.org/ and links from http://links.browser.org/ . The scripts in this chapter use lynx, but if you prefer links, it is sufficiently similar that you can switch the scripts over without much effort.
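Because both browsers accept a -dump flag for non-interactive output, one pragmatic approach, offered here as a sketch rather than as anything from the chapter's scripts, is to probe at startup for whichever one is installed and complain gracefully if neither is found:

```shell
#!/bin/sh
# Sketch: pick whichever text-only browser is available.
# Both lynx and links support -dump for plain-text page output.

if command -v lynx >/dev/null 2>&1; then
  dumper="lynx -dump"
elif command -v links >/dev/null 2>&1; then
  dumper="links -dump"
else
  dumper=""
fi

if [ -n "$dumper" ]; then
  echo "Using: $dumper"
else
  echo "Neither lynx nor links found; please install one." >&2
fi
```

A script built this way keeps the browser choice in one variable (`$dumper`), so switching between lynx and links touches a single line.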
One limitation of the website scraper scripts in this chapter is that if a website that a script depends on changes its layout, the script breaks until you go back and ascertain what's changed with the site. If any of the website layouts have changed since November 2003, when this chapter was completed, you'll need to be comfortable reading HTML (even if you don't understand it all) to fix these scripts. This problem of tracking other sites is exactly why the Extensible Markup Language (XML) was created: it lets site developers provide the content of a web page separately from the rules for its layout.