
Recipe 14.5. Checking Content Type via HTTP

Credit: Bob Stockwell

Problem

You need to determine whether a URL, or an open pseudo-file obtained by calling urllib.urlopen on a URL, is of a particular content type (such as 'text' for HTML or 'image' for GIF).

Solution

The content type of any resource can easily be checked through the pseudo-file that urllib.urlopen returns for the resource. Here is a function to show how to perform such checks:

import urllib

def isContentType(URLorFile, contentType='text'):
    """ Tells whether the URL (or pseudofile from urllib.urlopen) is of
        the required content type (default 'text').
    """
    try:
        if isinstance(URLorFile, str):
            thefile = urllib.urlopen(URLorFile)
        else:
            thefile = URLorFile
        result = thefile.info().getmaintype() == contentType.lower()
        if thefile is not URLorFile:
            thefile.close()
    except IOError:
        result = False    # if we couldn't open it, it's of _no_ type!
    return result
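
The function above targets Python 2. As a hedged sketch only, here is how the same check might look in Python 3, where urlopen lives in urllib.request and the headers object offers get_content_maintype instead of getmaintype. The name is_content_type and the data: URLs used to exercise it are illustrative assumptions, not part of the recipe; data: URLs are used because they carry their own media type and need no network access.

```python
import urllib.request


def is_content_type(url_or_file, content_type="text"):
    """Tell whether a URL (or a response from urlopen) has the given
    main content type (default 'text'). Python 3 sketch of the recipe."""
    try:
        if isinstance(url_or_file, str):
            response = urllib.request.urlopen(url_or_file)
        else:
            response = url_or_file
        try:
            # .headers is the same object that response.info() returns.
            main_type = response.headers.get_content_maintype()
            result = main_type == content_type.lower()
        finally:
            # Close only what this function opened itself.
            if response is not url_or_file:
                response.close()
    except OSError:
        # urllib.error.URLError subclasses OSError (IOError is an alias).
        result = False
    return result


print(is_content_type("data:text/plain,hello"))            # True
print(is_content_type("data:image/gif,GIF89a", "image"))   # True
print(is_content_type("data:image/gif,GIF89a"))            # False
```

Note that in Python 3 a string without a recognizable URL scheme makes urlopen raise ValueError rather than an OSError, so this sketch does not turn that case into False.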

Discussion

For greater flexibility, this recipe accepts either the result of a previous call to urllib.urlopen, or a URL in string form. In the latter case, the Solution opens the URL with urllib and, at the end, closes the resulting pseudo-file again. If the attempt to open the URL fails, the recipe catches the IOError and returns a result of False, considering that a URL that cannot be opened is of no type at all, and therefore in particular is not of the type the caller was checking for. (Alternatively, you might prefer to propagate the exception; if that's what you want, remove the try and except clause headers and the result = False assignment that is the body of the except clause.)
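
The exception-propagating alternative mentioned in the parenthetical can be sketched as follows. This is a Python 3 adaptation under stated assumptions, not the recipe's own code: the name is_content_type_strict is hypothetical, and a with statement stands in for the explicit close call.

```python
import urllib.request


def is_content_type_strict(url_or_file, content_type="text"):
    """Like the recipe's check, but lets failures to open propagate."""
    if isinstance(url_or_file, str):
        # The with-block closes the response that this function opened.
        with urllib.request.urlopen(url_or_file) as response:
            main_type = response.headers.get_content_maintype()
            return main_type == content_type.lower()
    # A caller-supplied response stays open; closing it is the caller's job.
    return url_or_file.headers.get_content_maintype() == content_type.lower()


try:
    is_content_type_strict("file:///no/such/path/for-this-recipe")
except OSError as err:
    # urllib.error.URLError is a subclass of OSError.
    print("could not open:", err)
```

With this variant the caller decides what an unopenable URL means, instead of having it silently reported as "not of this type".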

Whether the pseudo-file was passed in or opened locally from a URL string, the info method of the pseudo-file gives as its result an instance of mimetools.Message (which doesn't mean you need to import mimetools yourself; urllib does all that's needed). On that object, we can call any of several methods to get the content type, depending on what exactly we want: gettype to get both main and subtype with a slash in between (as in 'text/plain'), getmaintype to get the main type (as in 'text'), or getsubtype to get the subtype (as in 'plain'). In this recipe, we want the main content type.
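
To make the three interrogation methods concrete, here is a small sketch using their Python 3 counterparts. In Python 3 the mimetools module no longer exists, and the headers object is an email.message.Message (or a subclass) whose get_content_type, get_content_maintype, and get_content_subtype methods play the roles of gettype, getmaintype, and getsubtype. The data: URL is an assumption chosen so that the example runs without network access.

```python
import urllib.request

# A data: URL declares its own media type, so no server is involved.
with urllib.request.urlopen("data:text/plain,hello") as response:
    headers = response.headers  # the same object response.info() returns

full_type = headers.get_content_type()      # main type and subtype
main_type = headers.get_content_maintype()  # main type only
sub_type = headers.get_content_subtype()    # subtype only

print(full_type)  # text/plain
print(main_type)  # text
print(sub_type)   # plain
```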

The string result from all of the type interrogation methods is always lowercase, so we take the precaution of calling the lower method on parameter contentType as well, before comparing for equality.
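
The effect of that precaution can be illustrated with a short Python 3 sketch. The header text below is made up for the example, and email.message_from_string is used only as a convenient way to build a headers object without opening a URL.

```python
import email

# A header whose content type is written in uppercase (made-up input).
headers = email.message_from_string("Content-Type: TEXT/HTML\n\n")

main_type = headers.get_content_maintype()
wanted = "Text"  # what a caller might pass as the content type

print(main_type)                    # text
print(main_type == wanted)          # False
print(main_type == wanted.lower())  # True
```

Lowercasing only the caller's argument is enough, because the value coming from the headers object is already normalized.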
