20.10. Exercises

20-1.

urllib Module and Files. Update the friends3.py script so that it stores names and corresponding number of friends into a two-column text file on disk and continues to add names each time the script is run.

Extra Credit: Add code to dump the contents of such a file to the Web browser (in HTML format). Additional Extra Credit: Create a link that clears all the names in this file.

20-2.

urllib Module. Write a program that takes a user-input URL (either a Web page or an FTP file, e.g., http://python.org or ftp://ftp.python.org/pub/python/README) and downloads it to your machine with the same filename (or a modified name similar to the original if it is invalid on your system). Web pages (HTTP) should be saved as .htm or .html files, and FTP'd files should retain their extensions.
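
A minimal sketch of one way to derive the local filename before downloading (Python 2); the name-mangling policy and the 'index.html' fallback are assumptions, not the only reasonable choices:

import os
import urllib
from urlparse import urlparse

def download(url):
    path = urlparse(url)[2]                  # the path component
    fname = os.path.basename(path)
    if not fname:                            # e.g., http://python.org
        fname = 'index.html'                 # assumed fallback name
    if url.startswith('http') and not \
            (fname.endswith('.htm') or fname.endswith('.html')):
        fname = fname + '.html'
    urllib.urlretrieve(url, fname)
    return fname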

20-3.

urllib Module. Rewrite the grabWeb.py script of Example 11.4, which downloads a Web page and displays the first and last non-blank lines of the resulting HTML file, so that you use urlopen() instead of urlretrieve() to process the data directly (as opposed to downloading the entire file first before processing it).
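
A sketch of processing the page directly with urlopen() (Python 2); note that no intermediate file is written to disk:

import urllib

f = urllib.urlopen('http://python.org')     # any URL will do
first = last = None
for line in f.readlines():
    line = line.strip()
    if line:
        if first is None:
            first = line
        last = line
f.close()
print 'first non-blank line:', first
print 'last  non-blank line:', last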

20-4.

URLs and Regular Expressions. Your browser may save your favorite Web site URLs as a "bookmarks" HTML file (Mozilla-flavored browsers do this) or as a set of .URL files in a "favorites" directory (IE does this). Find your browser's method of recording your "hot links," along with where and how they are stored. Without altering any of the files, strip out the URLs and names of the corresponding Web sites (if given), produce a two-column list of names and links as output, and store this data in a disk file. Truncate site names or URLs as necessary to keep each line of output within 80 columns.
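
A sketch for the Mozilla-style bookmarks file (Python 2); the file location and the deliberately simple RE are assumptions you will likely need to adapt to your own browser:

import re

pattern = re.compile(r'<A HREF="([^"]+)"[^>]*>([^<]*)</A>', re.I)

data = open('bookmarks.html').read()          # hypothetical location
for url, name in pattern.findall(data):
    print '%-39s %s' % (name[:39], url[:40])  # keep within 80 columns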

20-5.

URLs, urllib Module, Exceptions, and REs. As a follow-up problem to the previous one, add code to your script to test each of your favorite links. Report back a list of dead links (and their names), i.e., Web sites that are no longer active or a Web page that has been removed. Only output and save to disk the still-valid links.
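
One way to probe each link (Python 2): urllib2.urlopen() raises an exception on HTTP errors, which urllib's urlopen() would silently swallow by returning the error page instead:

import urllib2

def is_alive(url):
    try:
        f = urllib2.urlopen(url)
        f.close()
        return True
    except (urllib2.URLError, ValueError):   # HTTPError subclasses URLError
        return False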

20-6.

Error Checking. The friends3.py script reports an error if no radio button was selected to indicate the number of friends. Update the CGI script to also report an error if no name (e.g., blank or whitespace) is entered.

Extra Credit: We have so far explored only server-side error checking. Explore JavaScript programming and implement client-side error checking by creating JavaScript code to check for both error situations so that these errors are stopped before they reach the server.
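
For the server-side portion, a minimal sketch of the blank-name check (Python 2); 'who' is an assumed field name and would need to match the actual friends3.py form:

import cgi

form = cgi.FieldStorage()
if not form.has_key('who') or not form['who'].value.strip():
    print 'Content-Type: text/html\n'
    print '<H3>ERROR: no name was entered.</H3>'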

Problems 20-7 to 20-10 below pertain to Web server access log files and regular expressions. Web servers (and their administrators) generally have to maintain an access log file (usually logs/access_log under the main Web server directory) that tracks incoming file requests. Over a period of time, such files get large and need to be either archived or truncated. Why not save only the pertinent information and delete the files to conserve disk space? The exercises below are designed to give you some practice with REs and show how they can be used to help archive and analyze Web server data.
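
As a starting point for the four exercises that follow, here is a sketch of parsing the common log format with a (simplified) RE; real log lines may need a more forgiving pattern:

import re

logre = re.compile(
    r'(\S+) \S+ \S+ \[[^]]+\] "(\S+) (\S+)[^"]*" (\d+) \S+')

for line in open('access_log'):              # assumed log location
    m = logre.match(line)
    if m:
        host, method, link, status = m.groups()
        # ... tally method (20-7), status/link (20-8, 20-9), host (20-10)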

20-7.

Count how many of each type of request (GET versus POST) exist in the log file.

20-8.

Count the successful page/data downloads: Display all links that resulted in a return code of 200 (OK [no error]) and how many times each link was accessed.

20-9.

Count the errors: Show all links that resulted in errors (return codes in the 400s or 500s) and how many times each link was accessed.

20-10.

Track IP addresses: For each IP address, output a list of each page/data downloaded and how many times that link was accessed.

20-11.

Simple CGI. Create a "Comments" or "Feedback" page for a Web site. Take user feedback via a form, process the data in your script, and return a "thank you" screen.
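
A bare-bones sketch of the processing script (Python 2); field names such as 'comments' are assumptions tied to whatever form you build:

import cgi

form = cgi.FieldStorage()
comments = form.getvalue('comments', '').strip()

print 'Content-Type: text/html\n'
if comments:
    # save or mail the feedback here, then thank the user
    print '<H3>Thank you for your feedback!</H3>'
else:
    print '<H3>Please go back and enter a comment.</H3>'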

20-12.

Simple CGI. Create a Web guestbook. Accept a name, an e-mail address, and a journal entry from a user and log it to a file (format of your choice). Like the previous problem, return a "thanks for filling out a guestbook entry" page. Also provide a link that allows users to view guestbooks.

20-13.

Web Browser Cookies and Web Site Registration. Update your solution to Exercise 20-4 so that your user-password information now pertains to Web site registration instead of a simple text-based menu system.

Extra Credit: Familiarize yourself with setting Web browser cookies, and maintain a login session for four hours from the last successful login.
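
A sketch of issuing a session cookie good for four hours (Python 2); the cookie name and opaque session value are assumptions:

import Cookie

c = Cookie.SimpleCookie()
c['session'] = 'some-opaque-session-id'
c['session']['max-age'] = 4 * 3600           # four hours, in seconds

print 'Content-Type: text/html'
print c.output()                             # emits the Set-Cookie header
print                                        # blank line ends the headers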

20-14.

Web Clients. Port Example 20.1, crawl.py, the Web crawler, to use the HTMLParser module or the BeautifulSoup parsing system.
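
A sketch of the parsing half using the standard HTMLParser module (Python 2); crawl.py's downloading and queueing logic is untouched here:

from HTMLParser import HTMLParser

class LinkGrabber(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':                       # collect anchor targets
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

p = LinkGrabber()
p.feed(open('page.html').read())             # or data from urlopen()
print p.links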

20-15.

Errors. What happens when a CGI script crashes? How can the cgitb module be helpful?
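
With cgitb enabled, an uncaught exception is rendered as a detailed HTML traceback in the browser instead of a bare "Internal Server Error":

import cgitb
cgitb.enable()      # place at the very top of the CGI script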

20-16.

CGI, File Upload, and Zip Files. Create a CGI application that not only saves files to the server's disk, but also intelligently unpacks Zip files (or other archives) into a subdirectory named after the archive file.
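
A sketch of the unpacking step only (extractall() requires Python 2.6+); the upload-handling code that produces fname is assumed:

import os
import zipfile

def unpack(fname):
    if zipfile.is_zipfile(fname):
        subdir = os.path.splitext(fname)[0]  # archive name sans ".zip"
        if not os.path.isdir(subdir):
            os.mkdir(subdir)
        zf = zipfile.ZipFile(fname)
        zf.extractall(subdir)
        zf.close()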

20-17.

Zope, Plone, TurboGears, Django. Investigate each of these complex Web development platforms and create one simple application in each.

20-18.

Web Database Application. Think of a database schema you want to provide as part of a Web database application. For this multi-user application, you want to provide everyone read access to the entire contents of the database, but write access only to each individual's own entry. One example may be an "address book" for your family and relatives. Each family member, once successfully logged in, is presented with a Web page with several options: add an entry, view my entry, update my entry, remove or delete my entry, and view all entries (the entire database).

Design a UserEntry class and create a database entry for each instance of this class. You may use any solution created for any previous problem to implement the registration framework. Finally, you may use any type of storage mechanism for your database, either a relational database such as MySQL or some of the simpler Python persistent storage modules such as anydbm or shelve.
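
A minimal sketch of the persistence layer using shelve (Python 2); the UserEntry fields shown are assumptions for an "address book" schema:

import shelve

class UserEntry(object):
    def __init__(self, name, email, address):
        self.name = name
        self.email = email
        self.address = address

db = shelve.open('addrbook.db')
entry = UserEntry('Wesley', 'wesley@example.com', '123 Main St.')
db[entry.name] = entry                       # keyed by (unique) name
db.close()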

20-19.

Electronic Commerce Engine. Use the classes created for your solution to Exercise 13-11 and add some product inventory to create a potential electronic commerce Web site. Be sure your Web application also supports multiple customers and provides registration for each user.

20-20.

Dictionaries and cgi module. As you know, the cgi.FieldStorage() method returns a dictionary-like object containing the key-value pairs of the submitted CGI variables. You can use methods such as keys() and has_key() for such objects. In Python 1.5, a get() method was added to dictionaries; it returns the value of the requested key, or a default value if the key does not exist. FieldStorage objects do not have such a method. Let's say we grab the form in the usual manner of:

form = cgi.FieldStorage()


Add a similar get() method to the class definition in cgi.py (you can rename it to mycgi.py or something like that) such that code that looks like this:

if form.has_key('who'):
    who = form['who'].value
else:
    who = '(no name submitted)'


... can be replaced by a single line which makes forms even more like a dictionary:

who = form.get('who', '(no name submitted)')
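
One way to avoid editing cgi.py itself is to subclass FieldStorage in your own mycgi.py; a sketch (it ignores the case of multiple values per key):

import cgi

class FieldStorage(cgi.FieldStorage):
    def get(self, key, default=None):
        if self.has_key(key):
            return self[key].value
        return default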


20-21.

Creating Web Servers. Our code for myhttpd.py in Section 20.7 is only able to read HTML files and return them to the calling client. Add support for plain text files with the ".txt" ending. Be sure that you return the correct MIME type of "text/plain."

Extra credit: add support for JPEG files ending with either ".jpg" or ".jpeg" and having a MIME type of "image/jpeg."
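
A sketch of dispatching on the file extension; the surrounding handler and file-reading code follow the myhttpd.py structure only loosely:

import os

MIME_TYPES = {
    '.html': 'text/html',  '.htm': 'text/html',
    '.txt':  'text/plain',
    '.jpg':  'image/jpeg', '.jpeg': 'image/jpeg',   # extra credit
}

def content_type(path):
    ext = os.path.splitext(path)[1].lower()
    return MIME_TYPES.get(ext, 'application/octet-stream')

# remember to open image files in binary ('rb') mode before sending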

20-22.

Advanced Web Clients. URLs given as input to crawl.py must have the leading "http://" protocol indicator and top-level URLs must contain a trailing slash, i.e., http://www.prenhallprofessional.com/. Make crawl.py more robust by allowing the user to input just the hostname (without the protocol part [make it assume HTTP]) and also make the trailing slash optional. For example, www.prenhallprofessional.com should now be acceptable input.
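
One sketch of normalizing the user's input before crawling (Python 2); both fix-ups below match the behavior the exercise asks for:

from urlparse import urlparse

def normalize(url):
    if '://' not in url:                 # bare hostname: assume HTTP
        url = 'http://' + url
    parts = urlparse(url)
    if not parts[2]:                     # empty path: add trailing slash
        url += '/'
    return url

print normalize('www.prenhallprofessional.com')
# http://www.prenhallprofessional.com/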

20-23.

Advanced Web Clients. Update the crawl.py script in Section 20.3 to also download links that use the "ftp:" scheme. All "mailto:" links are ignored by crawl.py. Add support to ensure that it also ignores "telnet:", "news:", "gopher:", and "about:" links.

20-24.

Advanced Web Clients. The crawl.py script in Section 20.3 only downloads .html files via links found in Web pages at the same site and does not handle/save images that are also valid "files" for those pages. It also does not handle servers that are sensitive to URLs missing the trailing slash ( / ). Add a pair of classes to crawl.py to deal with these problems.

A My404UrlOpener class should subclass urllib.FancyURLopener and consist of a single method, http_error_404(), which determines whether a 404 error was reached using a URL without a trailing slash. If so, it adds the slash and retries the request (once only). If it still fails, it should return a genuine 404 error. You must set urllib._urlopener to an instance of this class so that urllib uses it.

Create another class called LinkImageParser, which derives from htmllib.HTMLParser. This class should contain a constructor to call the base class constructor as well as initialize a list for the image files parsed from Web pages. The handle_image() method should be overridden to add image filenames to the image list (instead of discarding them as the current base class method does).
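
A rough skeleton of the two classes (Python 2); note that urllib passes the scheme-less URL into the error handler, hence the self.type reconstruction below, and the retry naturally happens only once because the retried URL already ends in a slash:

import urllib
import htmllib, formatter

class My404UrlOpener(urllib.FancyURLopener):
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        if not url.endswith('/'):
            # retry once with the slash added; if that also 404s, the
            # retried URL ends in '/' and falls through to the default
            return self.open('%s:%s/' % (self.type, url))
        return urllib.FancyURLopener.http_error_default(
            self, url, fp, errcode, errmsg, headers)

urllib._urlopener = My404UrlOpener()         # make urllib use our opener

class LinkImageParser(htmllib.HTMLParser):
    def __init__(self):
        htmllib.HTMLParser.__init__(self, formatter.NullFormatter())
        self.images = []                     # image files found in the page

    def handle_image(self, src, alt, *args):
        self.images.append(src)              # keep, rather than discard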


