Introduction


Credit: Guido van Rossum, creator of Python

Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python Standard Library, including the socket and select extension modules and most of the protocol client modules (such as ftplib). I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?

Python's roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late 1980s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn't very useful for writing Amoeba system administration scripts. Of course, I designed Python to be platform independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.

This approach proved to be a great early testimony to Python's strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, you don't have to write separate error-handling code for each call, which makes the logic of the code stand out much more clearly. You can also learn about sockets by experimenting in an interactive Python shell, where misconceptions about the proper order of calls and the argument values that each call requires are cleared up quickly through Python's immediate error messages.
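The ordering question can be checked on a loopback connection. The sketch below is written for Python 3 and is only an illustration: the helper names serve_once and loopback_echo are made up here, and port 0 simply asks the operating system for any free port. A server calls bind, then listen, then accept; connect belongs to the client side.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept a single connection and echo its data back in upper case."""
    conn, _peer = server_sock.accept()          # step 3: accept
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def loopback_echo(payload=b"hello"):
    """Run one server/client exchange on the local machine."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)                        # avoid hanging forever
    server.bind(("127.0.0.1", 0))               # step 1: bind (port 0 = any free port)
    server.listen(1)                            # step 2: listen
    host, port = server.getsockname()

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # The client is the side that calls connect (done here by create_connection).
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(payload)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(loopback_echo())
```

No per-call error checking appears anywhere: a failing call raises an exception, so the normal path stays readable.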

Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib, and third-party extensions such as the Twisted framework, whose popularity keeps growing. The examples in this chapter are a varied bunch: some construct and send complex email messages, while others dwell on lower-level issues such as tunneling. My favorite is Recipe 13.11, which implements PyHeartBeat: it's useful, it uses the socket module, and it's simple enough to be an educational example. I do note, with that mixture of pride and sadness that always accompanies a parent's observation of children growing up, that, since the Python Cookbook's first edition, even PyHeartBeat has acquired an alternative server implementation based on Twisted!

Nevertheless, my own baby, the socket module itself, is still the foundation of all network operations in Python. It's a plain transliteration of the socket APIs (first introduced in BSD Unix and now widespread on all platforms) into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then you call methods on these objects to perform typical low-level network operations. You don't have to worry about allocating and freeing memory for buffers and the like; Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name ('www.python.org') notation. As you can see, even low-level modules in Python aren't as low level as all that.
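As a small illustration of socket objects and (host, port) pairs, here is a sketch that sends one UDP datagram between two sockets on the local machine. It assumes Python 3, where payloads are bytes; the function name udp_roundtrip and the payload are invented for the example.

```python
import socket


def udp_roundtrip(payload=b"ping"):
    """Send one datagram to a local socket and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.settimeout(5)
        receiver.bind(("127.0.0.1", 0))       # dotted-quad host, OS-chosen port
        address = receiver.getsockname()      # a (host, port) pair
        sender.sendto(payload, address)
        data, source = receiver.recvfrom(1024)
    finally:
        # No buffers to allocate or free; closing the sockets is all the cleanup.
        sender.close()
        receiver.close()
    return data, address, source


if __name__ == "__main__":
    data, address, source = udp_roundtrip()
    print(data, "received at", address, "from", source)
```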

Despite the various conveniences, the socket module still exposes the actual underlying functionality of your operating system's network sockets. If you're at all familiar with sockets, you'll quickly get the hang of Python's socket module, using Python's own Library Reference. You'll then be able to play with sockets interactively in Python to become a socket expert, if that is what you want. The classic, highly recommended work on this subject is W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, 2d ed. (Prentice-Hall). For many practical uses, however, higher-level modules will serve you better.

The Internet uses a sometimes dazzling variety of protocols and formats, and the Python Standard Library supports many of them. In the Python Standard Library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support the SMTP protocol to send mail and nntplib to support the Network News Transfer Protocol (NNTP) to send and receive Network News). In addition, you'll find about as many modules that support specific Internet formats (such as htmllib to parse HTML data and the email package to parse and compose various formats related to email, including attachments and encoding).
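For instance, composing and re-parsing a message with an attachment might look like the following sketch. It assumes the EmailMessage API found in the email package of recent Python 3 releases, which postdates this text; the addresses, subject, and file name are placeholders.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message():
    """Compose a plain-text message carrying one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"        # placeholder address
    msg["To"] = "recipient@example.com"       # placeholder address
    msg["Subject"] = "Report attached"
    msg.set_content("The report is attached.")
    msg.add_attachment(
        b"col1,col2\n1,2\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",                # placeholder file name
    )
    return msg


def list_attachments(raw_bytes):
    """Parse serialized message bytes and return (filename, payload) pairs."""
    parsed = message_from_bytes(raw_bytes, policy=policy.default)
    return [(part.get_filename(), part.get_content())
            for part in parsed.iter_attachments()]


if __name__ == "__main__":
    wire_format = build_message().as_bytes()  # what would go over SMTP
    print(list_attachments(wire_format))
```

The package takes care of the MIME structure and the base64 encoding of the attachment in both directions.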

I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example; if that is the case, you don't need to study nntplib. But it is still reassuring to know it's there (part of the "batteries included" approach of the Python Standard Library).

Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each of these two modules can deal with several protocols through the magic of URLs: those now-familiar strings, such as http://www.python.org/index.html, that identify a protocol (such as http), a host and port (such as www.python.org, port 80 being the default for the HTTP protocol), and a specific resource at that address (such as /index.html). urllib is very simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP. In many cases, you'll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
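The parts of a URL named above can be pulled apart programmatically. This sketch assumes Python 3, where the functionality of urllib and urllib2 was reorganized into the urllib.request, urllib.parse, and urllib.error modules; the describe helper and its small table of default ports are illustrative assumptions, not part of the library.

```python
from urllib.parse import urlsplit

# Assumed defaults for illustration only; not an exhaustive registry.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def describe(url):
    """Return (protocol, host, port, resource) for a URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not spell out a port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


if __name__ == "__main__":
    print(describe("http://www.python.org/index.html"))
    # → ('http', 'www.python.org', 80, '/index.html')
```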

To illustrate, I'd like to conclude with a cookbook example of my own. It's similar to Recipe 13.2, but, rather than a program fragment, it's a little script. I call it wget.py because it does everything for which I've ever needed wget. (In fact, I originally wrote this script on a system where wget wasn't installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)

import sys, urllib
def reporthook(*a): print a
for url in sys.argv[1:]:
    i = url.rfind('/')
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)

Pass this script one or more URLs as command-line arguments; the script retrieves them into local files whose names match the last components of the URLs. The script also prints progress information of the form:

(block number, block size, total size)

Obviously, it's easy to improve on this script; but it's only seven lines, it's readable, and it works, and that's what's so cool about Python.
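For readers on Python 3, a rough equivalent of the script might look like this. It is a sketch, not the original: urlretrieve now lives in urllib.request (where the documentation lists it under the legacy interface), print is a function, and the local_name and fetch helpers, along with the index.html fallback for URLs ending in a slash, are assumptions added here.

```python
import sys
import urllib.request


def reporthook(*a):
    """Print (block number, block size, total size) as each block arrives."""
    print(a)


def local_name(url):
    """Use the last component of the URL as the local file name."""
    i = url.rfind('/')
    # Assumed fallback: a URL ending in '/' has no last component.
    return url[i + 1:] or "index.html"


def fetch(url, file=None):
    """Retrieve url into a local file and return that file's name."""
    file = file or local_name(url)
    print(url, "->", file)
    urllib.request.urlretrieve(url, file, reporthook)
    return file


if __name__ == "__main__":
    for url in sys.argv[1:]:
        fetch(url)
```

Splitting the work into small functions makes the script a few lines longer, but lets each piece be tried out on its own in an interactive session.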

Another cool thing about Python is that you can incrementally improve a program like this, and after it's grown by two or three orders of magnitude, it's still readable, and it still works! To see what this particular example might evolve into, check out Tools/webchecker/websucker.py in the Python source distribution. Enjoy!



Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420
