The standard library contains a variety of Internet-related modules whose specific usage is not covered here. There are two general aspects to writing Internet applications. The first aspect is the parsing, processing, and generation of messages that conform to various protocol requirements. These tasks are solidly inside the realm of text processing and belong in this book. The second aspect, however, is the business of actually sending a message "over the wire": choosing ports and network protocols, handshaking, validation, and so on. While these tasks are important, they are outside the scope of this book. The synopses below will nonetheless point you towards appropriate modules; the standard documentation, Python interactive help, or other texts can help with the details.
A second issue also arises. As Internet standards (usually canonicalized in RFCs) have evolved, and as Python libraries have become more versatile and robust, some newer modules have superseded older ones, much as the re module replaced the older regex module. In the interest of backwards compatibility, Python has not dropped any Internet modules from its standard distributions. Nonetheless, the email module represents the current "best practice" for most tasks related to email and newsgroup message handling. The modules mimify, mimetools, MimeWriter, multifile, and rfc822 are likely to appear in existing code, but new applications should use the capabilities in email instead.
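As a rough illustration of why email covers the ground of the older modules, the following minimal sketch parses a message, reads its headers and body, and then regenerates it with a modified header. The message text and addresses are hypothetical, and the lowercase email.utils spelling assumes a reasonably recent Python.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message text, for illustration only
raw = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Lunch?\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(raw)
subject = msg["Subject"]               # header lookup is case-insensitive
name, addr = parseaddr(msg["From"])    # split display name from address
body = msg.get_payload()               # body of a non-multipart message

# Generate a modified message: replace a header, then serialize again
del msg["Subject"]
msg["Subject"] = "Re: " + subject
serialized = msg.as_string()
```

The same Message object serves for both parsing and generation, which is much of what the older rfc822 and MimeWriter modules split between them.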
In addition to the standard library modules, a few third-party tools deserve special mention (at the end of this section). Many Python developers have created tools for various Internet-related tasks, but only a small number of projects have reached a high degree of sophistication and widespread usage.
5.3.1 Standard Internet-Related Tools
Asynchronous socket service clients and servers.
Manage Web browser cookies. Cookies are a common mechanism for managing state in Web-based applications. RFC-2109 and RFC-2068 describe the encoding used for cookies, but in practice MSIE is not very standards compliant, so the parsing is relaxed in the Cookie module.
SEE ALSO: cgi 376; httplib 396;
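A short sketch of cookie handling with the Cookie module follows. The cookie names and values are made up for illustration. The import falls back to http.cookies, which is where the same SimpleCookie class lives in Python 3.

```python
try:
    from Cookie import SimpleCookie        # Python 2 name, as used in this text
except ImportError:
    from http.cookies import SimpleCookie  # same class in Python 3

# Parse a hypothetical Cookie: header value sent by a browser
jar = SimpleCookie()
jar.load("session=abc123; theme=dark")
session_id = jar["session"].value
theme = jar["theme"].value

# Generate a Set-Cookie header for a response
out = SimpleCookie()
out["session"] = "abc123"
out["session"]["path"] = "/"
header = out.output()
```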
Work with character set encodings at a fine-tuned level. Other modules within the email package utilize this module to provide higher-level interfaces. If you need to dig deeply into character set conversions, you might want to use this module directly.
SEE ALSO: email 345; email.Header 351; unicode 423; codecs 189;
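Assuming the entry above describes the character set module of the email package (email.Charset in older Pythons, spelled email.charset in later ones), a minimal sketch of direct use might look like this. It only inspects how two well-known character sets are normalized and encoded.

```python
from email.charset import Charset, QP, BASE64

latin = Charset("latin_1")      # an alias; normalized on construction
utf8 = Charset("utf-8")

canonical = latin.input_charset        # canonical name of the character set
latin_header = latin.header_encoding   # encoding applied to headers
utf8_body = utf8.body_encoding         # encoding applied to message bodies

uses_quoted_printable = (latin_header == QP)
uses_base64 = (utf8_body == BASE64)
```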
Support for implementing custom File Transfer Protocol (FTP) clients. This protocol is detailed in RFC-959. For a full FTP application, ftplib provides a very good starting point; for the simple capability to retrieve publicly accessible files over FTP, urllib.urlopen() is more direct.
SEE ALSO: urllib 388; urllib2 398;
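As a starting point of the kind ftplib offers, here is a minimal sketch of an anonymous directory listing. The function name list_ftp_directory is my own invention, and the function is defined but deliberately not called, since it needs a reachable server.

```python
from ftplib import FTP

def list_ftp_directory(host, directory="/"):
    """Return the raw listing lines for `directory` on an anonymous FTP server."""
    lines = []
    ftp = FTP(host)            # connects to the default FTP port
    try:
        ftp.login()            # no arguments means anonymous login
        ftp.cwd(directory)
        ftp.retrlines("LIST", lines.append)   # callback receives each line
    finally:
        ftp.quit()
    return lines
```

A call such as list_ftp_directory("ftp.example.org") would return one string per directory entry; the host name here is a placeholder.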
Gopher protocol client interface. As much as I am still personally fond of the gopher protocol, it is used so rarely that it is not worth documenting here.
Support for implementing custom Web clients. Provides higher-level access to the HTTP and HTTPS protocols than raw sockets on ports 80 or 443, but is lower-level and more communications-oriented than urllib, which accesses Web resources in a file-like way.
SEE ALSO: urllib 388; socket 397;
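To show the level at which httplib operates, the sketch below issues a single GET request and reports the status line and one header. The helper name fetch_status is hypothetical, and it is defined without being called because it requires network access. The import falls back to http.client, the Python 3 name for the module.

```python
try:
    from httplib import HTTPConnection        # Python 2 name, as in this text
except ImportError:
    from http.client import HTTPConnection    # Python 3 location

def fetch_status(host, path="/"):
    """Issue a GET request and return (status, reason, content_type)."""
    conn = HTTPConnection(host, 80)
    try:
        conn.request("GET", path)              # the connection opens here
        response = conn.getresponse()
        content_type = response.getheader("Content-Type")
        response.read()                        # drain the body before closing
        return response.status, response.reason, content_type
    finally:
        conn.close()
```

Unlike a file-like urllib call, the request and the response are separate steps here, so status codes and headers can be examined before any body is read.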
Internet access configuration (Macintosh).
Internet Config replacement for open() (Macintosh).
Recognize image file formats based on their first few bytes.
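The general technique of recognizing a format from its leading bytes can be sketched by hand. This is not the standard library's implementation, only an illustration with a few well-known signatures; the function name sniff_image_type is hypothetical.

```python
def sniff_image_type(header):
    """Guess an image format from the first bytes of a file, or return None."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header.startswith(b"\xff\xd8"):
        return "jpeg"
    if header.startswith(b"BM"):
        return "bmp"
    return None
```

In practice one would read the first few dozen bytes of a file opened in binary mode and pass them to a function like this.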
Examine the mailcap file on Unix-like systems. The files /etc/mailcap, /usr/etc/mailcap, /usr/local/etc/mailcap, and $HOME/.mailcap are typically used to configure MIME capabilities in client applications like mail readers and Web browsers (but less so now than a few years ago). See RFC-1524.
Interface to MH mailboxes. The MH format consists of a directory structure that mirrors the folder organization of messages. Each message is contained in its own file. While the MH format is in many ways better, the Unix mailbox format seems to be more widely used. Basic access to a single folder in an MH hierarchy can be achieved with the mailbox.MHMailbox class, which satisfies most working requirements.
SEE ALSO: mailbox 372; email 345;
Various tools used by MIME-reading or MIME-writing programs.
Generic MIME writer.
Mimification and unmimification of mail messages.
Examine the netrc file on Unix-like systems. The file $HOME/.netrc is typically used to configure FTP clients.
SEE ALSO: ftplib 395; urllib 388;
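A minimal sketch of reading such a file with the netrc module follows. To stay self-contained it writes a hypothetical entry to a temporary file instead of touching $HOME/.netrc; the host, login, and password are made up.

```python
import netrc
import os
import tempfile

# Write a hypothetical netrc file to a temporary location for the demo
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as fh:
    fh.write("machine ftp.example.org login alice password s3cret\n")

try:
    info = netrc.netrc(path)
    # authenticators() returns a (login, account, password) tuple
    entry = info.authenticators("ftp.example.org")
    login, password = entry[0], entry[2]
    # With no matching entry and no default entry, the result is None
    missing = info.authenticators("unknown.example.org")
finally:
    os.remove(path)
```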
Support for Network News Transfer Protocol (NNTP) client applications. This protocol is defined in RFC-977. Although Usenet has a different distribution system from email, NNTP messages still follow the format defined in RFC-822. In particular, the email package and the rfc822 module are useful for creating and modifying news messages.
SEE ALSO: email 345; rfc822 397;
Wrapper around Netscape OSA modules (Macintosh).
RFC-822 message manipulation class. The email package is intended to supersede rfc822, and it is better to use email for new application development.
SEE ALSO: email 345; poplib 368; mailbox 372; smtplib 370;
Wait on I/O completion, such as sockets.
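Assuming the entry above refers to the standard select module, waiting on I/O completion can be sketched with a connected pair of sockets. The variable names are my own, and socket.socketpair() is assumed to be available on the platform.

```python
import select
import socket

left, right = socket.socketpair()
try:
    # Nothing has been written yet, so a zero-timeout poll reports no readers
    ready_before, _, _ = select.select([right], [], [], 0)

    left.sendall(b"ping")
    # Now data is pending; wait up to one second for it to become readable
    ready_after, _, _ = select.select([right], [], [], 1.0)
    data = right.recv(4)
finally:
    left.close()
    right.close()
```

The value of select is that the list of watched objects can hold many sockets at once, so a single thread can service whichever one becomes ready first.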
Recognize sound file formats based on their first few bytes.
Low-level interface to BSD sockets. Used to communicate with IP addresses at the level underneath protocols like HTTP, FTP, POP3, Telnet, and so on.
SEE ALSO: ftplib 395; gopherlib 395; httplib 396; imaplib 366; nntplib 397; poplib 368; smtplib 370; telnetlib 397;
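To make "the level underneath protocols" concrete, here is a small sketch that runs both ends of a TCP conversation over the loopback interface in one process. The "HELLO" request and "200" reply are a made-up protocol, not any real standard, and the sketch assumes that short messages arrive in a single recv() call, which real code should not rely on.

```python
import socket

# Server side: listen on an OS-assigned port on the loopback interface
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Client side: connect, then send one line of the made-up protocol
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"HELLO\r\n")

# Server accepts the queued connection and replies
conn, addr = server.accept()
request = conn.recv(1024)
conn.sendall(b"200 " + request)
reply = client.recv(1024)

for s in (conn, client, server):
    s.close()
```

Everything that modules like httplib or ftplib do ultimately reduces to exchanges of this kind, with the message formats supplied by the relevant RFC.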
Asynchronous I/O on sockets. Under Unix, pipes can also be monitored with select. socket supports SSL in recent Python versions.
Support for implementing custom telnet clients. This protocol is detailed in RFC-854. While possibly useful for intranet applications, Telnet is an entirely unsecured protocol and should not really be used on the Internet. Secure Shell (SSH) is an encrypted protocol that otherwise is generally similar in capability to Telnet. There is no support for SSH in the Python standard library, but third-party options exist, such as pyssh. At worst, you can script an SSH client using a tool like the third-party pyexpect.
An enhanced version of the urllib module that adds specialized classes for a variety of protocols. The main focus of urllib2 is the handling of authentication and encryption methods.
SEE ALSO: urllib 388;
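The authentication focus of urllib2 can be sketched without touching the network by building an opener that knows a set of credentials. The URL, user name, and password below are hypothetical. The import falls back to urllib.request, where the same classes live in Python 3.

```python
try:
    import urllib2 as request_lib            # Python 2 name, as in this text
except ImportError:
    import urllib.request as request_lib     # Python 3 location of the same classes

# Hypothetical URL and credentials, for illustration only
passwords = request_lib.HTTPPasswordMgrWithDefaultRealm()
passwords.add_password(None, "http://www.example.org/private/", "alice", "s3cret")

auth_handler = request_lib.HTTPBasicAuthHandler(passwords)
opener = request_lib.build_opener(auth_handler)

# Credentials are looked up by URL prefix when a server demands authentication
found = passwords.find_user_password(
    None, "http://www.example.org/private/page.html")
```

Calling opener.open() on a URL under that prefix would then answer a basic authentication challenge automatically; that step is omitted here because it requires a live server.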
Remote-control interfaces to some browsers.
5.3.2 Third-Party Internet Related Tools
There are many very fine Internet-related tools that this book cannot discuss, but to which no slight is intended. A good index to such tools is the relevant page at the Vaults of Parnassus:
In brief, Quixote is a templating system for HTML delivery. More so than systems like PHP, ASP, and (to an extent) JSP, Quixote emphasizes Web application structure over page appearance. The home page for Quixote is <http://www.mems-exchange.org/software/quixote/>.
To describe Twisted, it is probably best simply to quote from Twisted Matrix Laboratories' Web site <http://www.twistedmatrix.com/>:
While Twisted overlaps significantly in purpose with Zope, Twisted is generally lower-level and more modular (which has both pros and cons). Some protocols supported by Twisted (usually both server and client, and implemented in pure Python) are SSH; FTP; HTTP; NNTP; SOCKSv4; SMTP; IRC; Telnet; POP3; AOL's instant messaging TOC; OSCAR, used by AOL-IM as well as ICQ; DNS; MouseMan; finger; Echo, discard, chargen, and friends; Twisted Perspective Broker, a remote object protocol; and XML-RPC.
Zope is a sophisticated, powerful, and just plain complicated Web application server. It incorporates everything from dynamic page generation, to database interfaces, to Web-based administration, to back-end scripting in several styles and languages. While the learning curve is steep, experienced Zope developers can develop and manage Web applications more easily, more reliably, and more quickly than users of pretty much any other technology.
The home page for Zope is <http://zope.org/>.