MIME-Tools | Network Programming with Perl

	Network Programming with Perl, by Lincoln D. Stein

	Chapter 7. SMTP: Sending Mail


Net::SMTP and MailTools provide the basic functionality to create simple text-only e-mail messages. The MIME-Tools package takes this a step further by allowing you to compose multipart messages that contain text and nontext attachments. You can also parse MIME-encoded messages to extract the attachments, add or remove attachments, and resend the modified messages.

A Brief Introduction to MIME

The Multipurpose Internet Mail Extensions, or MIME, are described in detail in RFCs 1521, 2045, 2046, and 2049. Essentially, MIME adds three major extensions to standard Internet mail:

Every message body has a type. In the MIME world, the body of every message has a type that describes its nature; this type is given in the Content-Type: header field. MIME uses a type/subtype nomenclature in which type indicates the category of document, and subtype gives its specific format. Table 7.4 lists some common types and subtypes. The major media categories are "audio," "video," "text," and "image." The "message" category is used for e-mail enclosures, such as when you forward an e-mail onward to someone else, and the "application" category is a hodgepodge of things that could not be classified otherwise. We'll talk about "multipart" momentarily.
Every message body has an encoding. Internet e-mail was originally designed to handle messages consisting entirely of 7-bit ASCII text broken into relatively short lines; some parts of the e-mail system are still limited to this type of message. However, as the Internet became global, it became necessary to accommodate non-English character sets that have 8- or even 16-bit characters. Another problem was binary attachments such as image files, which are not even text-oriented.

To accommodate the full range of messages that people want to send without rewriting the SMTP protocol and all supporting software, MIME provides several standard encoding algorithms that can encapsulate binary data in a text form that conventional mailers can handle. Each header has a Content-Transfer-Encoding: field that describes the message body's encoding. Table 7.5 lists the five standard encodings.

If you are dealing with 8-bit data, only the quoted-printable and base64 encodings are guaranteed to make it through e-mail gateways.
Any message may have multiple parts. The multipart/* MIME types designate messages that have multiple parts. Each part has its own content type and MIME headers. It's even possible for a part to have its own subparts. The multipart/alternative MIME type is used when the various subparts correspond to the same document repeated in different formats. For example, some browser-based mailers send their messages in both text-only and HTML form. multipart/mixed is used when the parts are not directly related to each other, for example an e-mail message and a JPEG enclosure.

Table 7.4. Common MIME Types

Type	Description
audio/*	A sound
audio/basic	Sun Microsystems' audio "au" format
audio/mpeg	An MP3 file
audio/midi	A MIDI file
audio/x-aiff	AIFF sound format
audio/x-wav	Microsoft's "wav" format
image/*	An image
image/gif	CompuServe GIF format
image/jpeg	JPEG format
image/png	Portable network graphics format
image/tiff	TIFF format
message/*	An e-mail message
message/news	Usenet news message format
message/rfc822	Internet e-mail message format
multipart/*	A message containing multiple parts
multipart/alternative	The same information in alternative forms
multipart/mixed	Unrelated pieces of information mixed together
text/*	Human-readable text
text/html	Hypertext Markup Language
text/plain	Plain text
text/richtext	Enriched text in RFC 1523 format
text/tab-separated-values	Tables
video/*	Moving video or animation
video/mpeg	MPEG movie format
video/quicktime	QuickTime movie format
video/msvideo	Microsoft "avi" movie format
application/*	None of the above
application/msword	Microsoft Word Format
application/news-message-id	News posting format
application/octet-stream	A raw binary stream
application/postscript	PostScript
application/rtf	Microsoft rich text format
application/wordperfect5.1	Word Perfect 5.1 format
application/gzip	Gzip file compression format
application/zip	PKZip file compression format

Table 7.5. MIME Encodings

Encoding	Description
7bit	The body is not actually encoded. This value simply asserts that text is 7-bit ASCII, with no line longer than 1,000 characters.
8bit	The body is not actually encoded. This value asserts that the text may contain 8-bit characters, but has no line longer than 1,000 characters.
binary	The body is not actually encoded. This value asserts that the text may contain 8-bit characters and may have lines longer than 1,000 characters.
quoted-printable	This encoding is used for text-oriented messages that may contain 8-bit characters (such as messages in non-English character sets). All 8-bit characters are encoded into 7-bit escape sequences, and long lines are folded so that no encoded line exceeds 76 characters.
base64	This encoding is used for arbitrary binary data such as audio and images. Each group of three 8-bit bytes is encoded as four printable 7-bit ASCII characters, in a manner similar in spirit to uuencode. The resulting text is then broken into lines of at most 76 characters.
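
To get a feel for the two encodings that actually transform the body, here is a minimal sketch that uses the MIME::Base64 and MIME::QuotedPrint modules from the standard Perl distribution. These modules are not discussed in this section, and the sample data is made up for illustration.

```perl
use strict;
use warnings;
use MIME::Base64      qw(encode_base64 decode_base64);
use MIME::QuotedPrint qw(encode_qp decode_qp);

# Hypothetical sample data: 100 arbitrary bytes standing in for a binary file.
my $binary = join '', map { chr($_ * 7 % 256) } 1 .. 100;

# base64 turns the bytes into printable ASCII and breaks the result into lines.
my $b64       = encode_base64($binary);
my @b64_lines = split /\n/, $b64;
printf "base64: %d bytes became %d lines of ASCII\n",
    length($binary), scalar(@b64_lines);

# quoted-printable leaves ASCII alone and escapes 8-bit bytes as =XX.
my $latin1 = "caf\xE9 au lait\n";    # \xE9 is e-acute in ISO-8859-1
my $qp     = encode_qp($latin1);
print "quoted-printable: $qp";

# Both encodings are reversible.
print "round trips OK\n"
    if decode_base64($b64) eq $binary and decode_qp($qp) eq $latin1;
```

In practice you will rarely call these functions yourself when using MIME-Tools, because the encoding is selected per part and applied when the message is written out.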

Any part of a multipart MIME message may contain a Content-Disposition: header, which is a hint to the mail reader as to how to handle the part. Possible dispositions include attachment, which tells the reader to treat the part's body as an enclosure to be saved to disk, and inline, which tells the reader to try to display the part as a component of the document. For example, a mail reader application may be able to display an inline image in the same window as the textual part of the message. The Content-Disposition: field can also suggest a filename to store attachments under. Another field, Content-Description:, provides an optional human-readable description of the part.

Notice that an e-mail message with a JPEG attachment is really a multipart MIME message containing two parts, one for the text of the message and the other for the JPEG image.

Without going into the format of a MIME message in detail, Figure 7.5 shows a sample multipart message to give you a feel for the way they work. This message has four parts: a 7-bit text message that appears at the top of the message, a base64-encoded audio file that uses the Microsoft WAV format, a base64-encoded JPEG file, and a final 7-bit part that contains some parting words and the e-mail signature. (The binary enclosures have been truncated to save space.)

Figure 7.5. A sample multipart MIME message


Notice that each part of the message has its own header and body, and that the parts are delimited by a short unique boundary string beginning with a pair of hyphens. The message as a whole has its own header, which is a superset of the RFC 822 Internet mail header, and includes a Content-Type: field of multipart/mixed.

This is pretty much all you need to know about MIME. The MIME modules will do all the rest of the work for you.

Organization of the MIME::* Modules

MIME-Tools has four major parts.

MIME::Entity

MIME::Entity is a MIME message. It contains a MIME::Head (the message header) and a MIME::Body (the message body). In multipart messages, the body may contain other MIME::Entities, and any of these may contain their own MIME::Entities, ad infinitum.

Among other things, MIME::Entity has methods for turning the message into a text string and for mailing the message.

MIME::Head: MIME::Head is the header part of a MIME message. It has methods for getting and setting the various fields.

MIME::Body: MIME::Body represents the body part of a message. Because MIME bodies can get quite large (e.g., audio files), MIME::Body has methods for storing data to disk and reading and writing it in a filehandle-like fashion.

MIME::Parser: The MIME::Parser recursively parses a MIME-encoded message from a file, a filehandle, or in-memory data, and returns a MIME::Entity. You can then extract the parts, or modify and remail the message.

Figure 7.6 is a short example of using MIME::Entity to build a simple message that consists of a text greeting and an audio enclosure.

Figure 7.6. Sending an audio attachment with MIME tools


Lines 1-3: Load modules. We turn on strict checking and load the MIME::Entity module. It brings in the other modules it needs, including MIME::Head and MIME::Body.

Lines 4-8: Create top-level MIME::Entity. Using the MIME::Entity->build() method, we create a "top-level" multipart MIME message that contains the two subparts. The arguments to build() include the From: and To: fields, the Subject: line, and a MIME Type of multipart/mixed. This returns a MIME::Entity object.

Lines 9-18: Attach the text of the message. We create the text of the message and store it in a scalar variable. Then, using the top-level MIME entity's attach() method, we incorporate the text data into the growing multipart message, specifying a MIME Type of text/plain, an Encoding of 7bit, and the message text as the Data.

Lines 19-23: Attach the audio file. We again call attach(), but this time specify a Type of audio/wav and an Encoding of base64. We don't want to read the whole audio file into memory, so we use the Path argument to direct MIME::Entity to the file where the audio data can be found. The Description argument adds a human-readable description of the attachment to the outgoing message.

Lines 24-25: Sign the message. We call the MIME entity object's sign() method to append our signature file to the text of the message.

Lines 26-27: Send the message. We call the send() method to format and mail the completed message using the smtp method.
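
The listing for Figure 7.6 is not reproduced here, so the following is only a sketch that follows the steps of the walkthrough; its line numbers will not match the ones cited above. The addresses, the message text, and the MIME_DEMO_SEND environment switch are all hypothetical. So that the sketch runs without a real recording or signature file, a small temporary file stands in for the audio data and the signature is given as a literal string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Entity;
use File::Temp qw(tempfile);

# Stand-in for the audio file: a few placeholder bytes in a temporary file.
my ($tmp_fh, $audio_path) = tempfile(SUFFIX => '.wav', UNLINK => 1);
binmode $tmp_fh;
print $tmp_fh "RIFF placeholder audio data";
close $tmp_fh;

# Create the top-level multipart entity (addresses are made up).
my $msg = MIME::Entity->build(
    From    => 'sender@example.com',
    To      => 'recipient@example.com',
    Subject => 'A greeting with sound',
    Type    => 'multipart/mixed',
);

# Attach the text of the message.
my $greeting = <<'END';
Hi there,

The attached recording should explain everything.
END
$msg->attach(
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => $greeting,
);

# Attach the audio file by path rather than reading it into a variable.
$msg->attach(
    Type        => 'audio/wav',
    Encoding    => 'base64',
    Path        => $audio_path,
    Description => 'A short recording',
);

# Sign the message (literal signature instead of a signature file).
$msg->sign(Signature => "The Sender\n");

# Mail the message only when explicitly asked to; otherwise print it.
if ($ENV{MIME_DEMO_SEND}) {
    $msg->send('smtp');
} else {
    $msg->print(\*STDOUT);
}
```

Printing the message instead of mailing it is a convenient way to inspect the headers and boundaries that MIME::Entity generates before you commit to sending anything.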

That's all there is to it. In the next sections we will look at the MIME modules more closely.

MIME::Entity

MIME::Entity is a subclass of Mail::Internet and, like it, represents an entire e-mail message. However, there are some important differences between Mail::Internet and MIME::Entity. Whereas Mail::Internet contains just a single header and body, the body of a MIME::Entity can be composed of multiple parts, each of which may be composed of subparts. Each part and subpart is itself a MIME::Entity (Figure 7.7). Because of these differences, MIME::Entity adds several methods for manipulating the message's body in an object-oriented fashion.

Figure 7.7. A MIME message can contain an unlimited number of nested attachments


This summary omits some obscure methods. See the MIME::Entity POD documentation for the full details.

The main constructor for MIME::Entity is build():

$entity = MIME::Entity->build(arg1 => $val1, arg2 => $val2, ...)

The build() method is the main constructor for MIME::Entity. It takes a series of named arguments and returns an initialized MIME::Entity object. The following arguments are the most common.

Field name. Any of the RFC 822 or MIME-specific fields can be used as arguments, and the provided value will be incorporated into the message header. As in Mail::Header, you can use an array reference to pass a multivalued field. You should probably confine yourself to using RFC 822 fields, such as From: and To:, because any MIME fields that you provide will override those generated by MIME::Entity.

Data. For single-part entities only, the data to use as the message body. This can be a scalar or an array reference containing lines to be joined to form the body.

Path. For single-part entities only, the path to a file where the data for the body can be found. This can be used to attach to the outgoing message a file that is larger than you could store in main memory.

Boundary. The boundary string to place between parts of a multipart message. MIME::Entity will choose a good default for you; ordinarily you won't want to use this argument.

Description. A human-readable description of the body used as the value of the Content-Description: field.

Disposition. This argument becomes the value of the header's Content-Disposition: field. It may be either attachment or inline, defaulting to inline if the argument is not specified.

Encoding. The value of this argument becomes the Content-Transfer-Encoding: field. You should provide one of 7bit, 8bit, binary, quoted-printable, or base64. Include this argument even if you are sending a simple text message because, if you don't, MIME::Entity defaults to binary. You may also provide a special value of -SUGGEST to have MIME::Entity make a guess based on a byte-by-byte inspection of the entire body.

Filename. The recommended filename for the mail reader to use when saving this entity to disk. If not provided, the recommended filename will be derived from the value of Path.

Type. The MIME type of the entity, text/plain by default. MIME::Entity makes no attempt to guess the MIME type from the file name indicated by the Path argument or from the contents of the Data argument.

Here's the idiom for creating a single-part entity (which may later be attached to a multipart entity):

 $part = MIME::Entity->build(To       => 'jdoe@acme.org',
                             Type     => 'image/jpeg',
                             Encoding => 'base64',
                             Path     => '/tmp/pictures/oranges.jpg');

And here's the idiom for creating a multipart entity, to which subparts will be added:

 $multipart = MIME::Entity->build(To    => 'jdoe@acme.org',
                                  Type  => 'multipart/mixed');

Notice that single-part entities should have a body specified using either the Data or the Path arguments. Multipart entities should not.

Once the MIME::Entity is created, you will attach new components to it using add_part() or attach():

$part = $entity->add_part($part [,$offset])

The add_part() method adds a subpart to the multipart MIME::Entity contained in $entity. The $part argument must be a MIME::Entity object. Each multipart MIME::Entity object maintains an array of its subparts, and by default, the new part is appended to the end of the current array. You can modify this by providing an offset argument. The method returns the newly added part.

If you attempt to add a part to a single-part entity, MIME::Entity automagically converts the entity into type multipart/mixed, and reattaches the original contents as a subpart. The entity you are adding then becomes the second subpart on the list. This feature allows you to begin to compose a single-part message and later add attachments without having to start anew.

$part = $entity->attach(arg1 => $val1, arg2 => $val2, ...)

The attach() method is a convenience function that first creates a new MIME::Entity object using build(), and then calls $entity->add_part() to insert the newly created part into the message. The arguments are identical to those of build(). If successful, the method returns the new MIME::Entity.
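
As a small illustration of the difference between the two methods, the sketch below starts from a multipart entity, appends one part with attach(), and then inserts a separately built part at offset 0 with add_part(). The address and text are made up.

```perl
use strict;
use warnings;
use MIME::Entity;

# Hypothetical address and text throughout.
my $top = MIME::Entity->build(
    To      => 'jdoe@acme.org',
    Subject => 'Two notes',
    Type    => 'multipart/mixed',
);

# attach() builds a new part from its arguments and appends it.
my $second = $top->attach(
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => "This part was attached.\n",
);

# add_part() takes a ready-made entity; an offset of 0 puts it first.
my $first = MIME::Entity->build(
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => "This part was built separately.\n",
);
$top->add_part($first, 0);

print scalar($top->parts), " parts\n";
print $top->parts(0)->bodyhandle->as_string;
```

Use attach() when the part does not exist yet, and add_part() when you already hold a MIME::Entity, for example one taken from another message.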

Several methods provide access to the contents of the entity:

$head = $entity->head([$newhead])

The head() method returns the MIME::Head object associated with the entity. You can then call methods in the head object to examine and change fields. The optional $newhead argument, if provided, can be used to replace the header with a different MIME::Head object.

$body = $entity->bodyhandle([$newbody])

The bodyhandle() method gets or sets the MIME::Body object associated with the entity. You can then use this object to retrieve or modify the unencoded contents of the body. The optional $newbody argument can be used to replace the body with a different MIME::Body object. Don't confuse this method with body(), which returns an array ref containing the text representation of the encoded body.

If the entity is multipart, then there will be no body, in which case bodyhandle() returns undef. Before trying to fetch the body, you can use the is_multipart() method to check for this possibility.

$pseudohandle = $entity->open($mode)

The open() method opens the body of the entity for reading or writing, and returns a MIME pseudohandle. As described later in the section on the MIME::Body class, MIME pseudohandles have object methods similar to those in the IO::Handle class (e.g., read(), getline(), and print()), but they are not handles in the true sense of the word. The pseudohandle can be used to retrieve or change the contents of the entity's body.

 $mode is one of "r" for reading, or "w" for writing.

@parts = $entity->parts

$part = $entity->parts($index)

@parts = $entity->parts(\@parts)

The parts() method returns the list of MIME::Entity parts in a multipart entity. If called with no arguments, the method returns the entire list of parts; if called with an integer index, it returns the designated part. If passed the reference to an array of parts, the method replaces the current parts with the contents of the array. This allows you to delete parts or rearrange their order.

For example, this code fragment reverses the order of the parts in the entity:

 $entity->parts([reverse $entity->parts])

If the entity is not multipart, parts() returns an empty list.
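
The same idiom can be used to delete a part. The following sketch, built on made-up text parts, fetches one part by index and then replaces the part list with a filtered copy that omits it.

```perl
use strict;
use warnings;
use MIME::Entity;

# A multipart entity with three small text parts (hypothetical content).
my $msg = MIME::Entity->build(To => 'jdoe@acme.org', Type => 'multipart/mixed');
$msg->attach(Type => 'text/plain', Encoding => '7bit', Data => "$_\n")
    for qw(first second third);

# Fetch one part by index, then the whole list.
my $middle = $msg->parts(1);
my @all    = $msg->parts;

# Replace the list with a filtered copy to drop the middle part.
$msg->parts([ grep { $_ != $middle } @all ]);

# Collect the unencoded text of the parts that remain.
my @remaining = map { $_->bodyhandle->as_string } $msg->parts;
print @remaining;
```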

A variety of methods return information about the entity:

$type = $entity->mime_type

$type = $entity->effective_type

The mime_type() and effective_type() methods both return the MIME type of the entity's body. Although the two methods usually return the same value, there are some error conditions in which MIME::Parser cannot decode the entity and is therefore unable to return the body in its native form. In this case, mime_type() returns the type that the body is supposed to be, and effective_type() returns the type of the data that you actually get when you retrieve or save the body (most probably application/octet-stream). To be safe, use effective_type() when retrieving the body of an entity created by MIME::Parser. For entities you created yourself with MIME::Entity->build(), there's no difference.

$boolean = $entity->is_multipart

The is_multipart() method is a convenience routine that returns true if the entity is multipart, false if it contains a single part only.
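
Because parts can nest, code that examines a message usually has to recurse. The sketch below shows one way to do this with is_multipart(), parts(), mime_type(), and bodyhandle(); the describe() helper and the sample message are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use MIME::Entity;

# Recursively describe an entity: multipart entities have parts but no
# body, while single-part entities have a body but no parts.
sub describe {
    my ($entity, $depth) = @_;
    my $indent = '  ' x $depth;
    my @lines;
    if ($entity->is_multipart) {
        push @lines, $indent . $entity->mime_type . " (container)";
        push @lines, describe($_, $depth + 1) for $entity->parts;
    } else {
        my $body  = $entity->bodyhandle;
        my $bytes = defined $body ? length($body->as_string) : 0;
        push @lines, $indent . $entity->mime_type . " ($bytes bytes)";
    }
    return @lines;
}

# A made-up message: a greeting plus a nested multipart/alternative part.
my $msg = MIME::Entity->build(To => 'jdoe@acme.org', Type => 'multipart/mixed');
$msg->attach(Type => 'text/plain', Encoding => '7bit', Data => "Hello\n");
my $alt = $msg->attach(Type => 'multipart/alternative');
$alt->attach(Type => 'text/plain', Encoding => '7bit', Data => "plain\n");
$alt->attach(Type => 'text/html',  Encoding => '7bit', Data => "<p>html</p>\n");

my @lines = describe($msg, 0);
print "$_\n" for @lines;
```

For entities that come from MIME::Parser, substituting effective_type() for mime_type() in the helper would follow the advice given above.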

$entity->sign(arg1 => $val1, arg2=> $val2, ...)

The sign() method attaches a signature to the message. If the message contains multiple parts, MIME::Entity searches for the first text entity and attaches the signature to that.

The method adds some improvements to the version implemented in Mail::Internet; however, you must provide at least one set of named arguments. Possibilities include:

File. This argument allows you to use the signature text contained in a file. Its value should be the path to a local file.

Signature. This argument uses the indicated text as the signature. Its value can be a scalar or a reference to an array of lines.

Force. Sign the entity even if its content type isn't text/*. The value is treated as a Boolean.

Remove. Call remove_sig() to scan for an existing signature and remove it before adding the new signature. The value of this argument is passed to remove_sig(). Provide 0 to disable signature removal entirely.

For example, here's how to add a signature using a scalar value:

 $entity->sign(Signature => "That's all folks!");

$entity->remove_sig([$nlines])

Remove_sig() scans the last $nlines of the message body as it looks for a line consisting of the characters "--". The line and everything below it is removed. $nlines defaults to 10.

$entity->dump_skeleton([FILEHANDLE])

Dump_skeleton() is a debugging utility. It dumps a text representation of the structure of the entity and its subparts to the indicated filehandle, or, if no filehandle is provided, to standard output.
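
Here is a brief sketch that combines sign() and dump_skeleton() on a single-part text message. The message and the signature lines are made up; the signature is passed as a reference to an array of lines rather than a scalar.

```perl
use strict;
use warnings;
use MIME::Entity;

# A simple single-part text message (hypothetical content).
my $msg = MIME::Entity->build(
    To       => 'jdoe@acme.org',
    Subject  => 'Status',
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => "Everything is on schedule.\n",
);

# Append a signature given as an array of lines.
$msg->sign(Signature => [ "J. Random Hacker\n", "Example Corp.\n" ]);

# Show the signed body, then the structure of the entity.
my $signed = $msg->bodyhandle->as_string;
print $signed;
$msg->dump_skeleton(\*STDOUT);
```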

Finally, several methods are involved in exporting the entity as text and mailing it:

$entity->print([FILEHANDLE])

$entity->print_header([FILEHANDLE])

$entity->print_body([FILEHANDLE])

These three methods, inherited from Mail::Internet, print the encoded text representations of the whole message, the header, or the body, respectively. The parts of a multipart entity are also printed. If no filehandle is provided, they print to STDOUT.

$arrayref = $entity->header

The header() method, which is inherited from Mail::Internet, returns the text representation of the header as a reference to an array of lines. Don't confuse this with the head() method, which returns a MIME::Head object.

$arrayref = $entity->body

This method, which is inherited from Mail::Internet, returns the body of the message as a reference to an array of lines. The lines are encoded in a form suitable for passing to a mailer. Don't confuse this method with bodyhandle() (discussed earlier), which returns a MIME::Body object.

$string = $entity->as_string

$string = $entity->stringify_body

$string = $entity->stringify_header

The as_string() method converts the message into a string, encoding any parts that need to be. The stringify_body() and stringify_header() methods respectively operate on the body and header only.
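
The distinction between the encoded text and the native body is easiest to see with a body that actually needs encoding. In this sketch (made-up addresses and text), a message containing one 8-bit byte is built with the quoted-printable encoding; stringify_body() returns the encoded form, while the MIME::Body object still holds the original bytes.

```perl
use strict;
use warnings;
use MIME::Entity;

# Made-up message whose body contains one 8-bit byte (\xE9).
my $raw = "Meet me at the caf\xE9.\n";
my $msg = MIME::Entity->build(
    From     => 'sender@example.com',
    To       => 'jdoe@acme.org',
    Subject  => 'Coffee',
    Type     => 'text/plain',
    Encoding => 'quoted-printable',
    Data     => $raw,
);

my $header = $msg->stringify_header;    # header fields only
my $body   = $msg->stringify_body;      # body as it would be mailed
my $whole  = $msg->as_string;           # the complete encoded message

print $header, "\n", $body;

# The MIME::Body object still holds the unencoded text.
my $native = $msg->bodyhandle->as_string;
```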

$result = $entity->send([$method])

The send() method, which is inherited from Mail::Internet, sends off the message using the selected method. I have noticed that some versions of the UNIX mail program have problems with MIME headers, and so it's best to set $method explicitly to either "sendmail" or "smtp".

$entity->purge

If you have received the MIME::Entity object from MIME::Parser, it is likely that the body of the entity or one of its subparts is stored in a temporary file on disk. After you are finished using the object, you should call purge() to remove these temporary files, reclaiming the disk space. This does not happen automatically when the object is destroyed.

MIME::Head

The MIME::Head class contains information about a MIME entity's header. It is returned by the MIME::Entity head() method.

MIME::Head is a subclass of Mail::Header and inherits most of its methods from there. It is a historical oddity that one module is called "Head" and the other "Header." MIME::Head adds a few utility methods to Mail::Header, the most useful of which are read() and from_file():

$head = MIME::Head->read(FILEHANDLE)

In addition to creating a MIME::Head object manually by calling add() for each header field, you can create a fully initialized header from an open filehandle by calling the read() method. This supplements Mail::Header's read() method, which allows you to read a file only into a previously created object.

$head = MIME::Head->from_file($file)

The from_file() constructor creates a MIME::Head object from the indicated file by opening it and passing the resulting filehandle to read().

All other functions behave as they do in Mail::Header. For example, here is one way to retrieve and change the subject line in a MIME::Entity object:

 $old_subject = $entity->head->get('Subject');
 $new_subject = "Re: $old_subject";
 $entity->head->replace(Subject => $new_subject);

Like Mail::Header's get(), MIME::Head->get() leaves the trailing newline on the field values it retrieves.
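
If the trailing newline is unwelcome, one option is to chomp() the value before reusing it. The following is a minimal sketch on a made-up message; whether you need the chomp() depends on what you do with the value afterward.

```perl
use strict;
use warnings;
use MIME::Entity;

# A made-up single-part message to work on.
my $entity = MIME::Entity->build(
    From     => 'sender@example.com',
    To       => 'jdoe@acme.org',
    Subject  => 'lunch on Friday',
    Type     => 'text/plain',
    Encoding => '7bit',
    Data     => "Are you free?\n",
);

my $head        = $entity->head;
my $old_subject = $head->get('Subject');
chomp $old_subject;                      # drop the newline that get() keeps

$head->replace(Subject => "Re: $old_subject");
print "New subject: ", $head->get('Subject');
```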

MIME::Body

The MIME::Body class contains information on the body part of a MIME::Entity. MIME::Body objects are returned by the MIME::Entity bodyhandle() method, and are created as needed by the MIME::Entity build() and attach() methods. You will need to interact with MIME::Body objects when parsing incoming MIME-encoded messages.

Because MIME-encoded data can be quite large, an important feature of MIME::Body is its ability to store the data on disk or in memory ("in core" as the MIME-Tools documentation calls it). The methods available in MIME::Body allow you to control where the body data is stored, to read and write it, and to create new MIME::Body objects.

MIME::Body has three subclasses, each specialized for storing data in a different manner:

MIME::Body::File: This subclass stores its body data in a disk file. This is suitable for large binary objects that wouldn't easily fit into main memory.

MIME::Body::Scalar: This subclass stores its body data in a scalar variable in main memory. It's suitable for small pieces of data such as the text part of an e-mail message.

MIME::Body::InCore: This subclass stores its body data in an array reference kept in main memory. It's suitable for larger amounts of text on which you will perform multiple reads or writes.

Normally MIME::Parser creates MIME::Body::File objects to store body data on disk while it is parsing.

$body = MIME::Body::File->new($path)

To create a new MIME::Body object that stores its data to a file, call the MIME::Body::File->new() method with the path to the file. The file doesn't have to exist, but will be created when you open the body for writing.

$body = MIME::Body::Scalar->new(\$string)

The MIME::Body::Scalar->new() method returns a body object that stores its data in a scalar reference.

$body = MIME::Body::InCore->new($string)

$body = MIME::Body::InCore->new(\$string)

$body = MIME::Body::InCore->new(\@string)

The MIME::Body::InCore class has the most flexible constructor. Internally it stores its data in an array reference, but it can be initialized from a scalar, a reference to a scalar, or a reference to an array.

Once you have a MIME::Body object, you can access its contents by opening it with the open() method.

$pseudohandle = $body->open($mode)

This method takes a single argument that indicates whether to open the body for reading ("r") or writing ("w"). The returned object is a pseudohandle that implements the IO::Handle methods read(), print(), and getline(). However, it is not a true filehandle, so be careful not to pass the returned pseudohandle to any of the built-in procedures such as <> or read().

The following code fragment illustrates how to read the contents of a large MIME::Body stored in a MIME::Entity object and print it to STDOUT. The contents recovered in this way are in their native form, free of any MIME encoding:

 $body = $entity->bodyhandle or die "no body";
 $handle = $body->open("r");
 print $data while $handle->read($data,1024);

For line-oriented data, we would have used the getline() method instead.

Another code fragment illustrates how to write a MIME::Body's contents using its print() method. If the body is attached to a file, the data is written there. Otherwise, it is written to an in-memory data structure:

 $body = $entity->bodyhandle or die "no body";
 $handle = $body->open("w");
 $handle->print($_) while <>;

MIME::Body provides a number of convenience methods:

@lines = $body->as_lines

$string = $body->as_string

as_lines() and as_string() are convenience functions that return the entire contents of the body in a single operation. as_lines() opens the body and calls getline() repeatedly, returning an array of newline-terminated lines. as_string() reads the entire body into a scalar. Because either method can read a large amount of data into memory, you should exercise some caution before calling them.

$path = $body->path([$newpath])

If the body object is attached to a file, as in MIME::Body::File, then path() returns the path to the file or sets it if the optional $newpath argument is provided. If the data is kept in memory, then path() returns undef.

$body->print([FILEHANDLE])

The print() method prints the unencoded body to the indicated filehandle, or, if none is provided, to the currently selected filehandle. Do not confuse this with the print() method provided by the pseudohandles returned by the open() method, which is used to write data into the body object.

$body->purge

Purge unlinks the file associated with the body object, if any. It is not called automatically when the object is destroyed.
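
To tie these methods together, here is a small sketch that exercises an in-memory body on its own, without a MIME::Entity around it. The text is made up, and the sketch assumes that loading MIME::Body also makes the MIME::Body::InCore subclass available and that opening a body for writing discards its previous contents.

```perl
use strict;
use warnings;
use MIME::Body;

# An in-memory body initialized from an array of lines (made-up text).
my $body = MIME::Body::InCore->new([ "line one\n", "line two\n" ]);

# Read it back line by line through a pseudohandle.
my $in = $body->open("r") or die "Can't open body for reading";
my $count = 0;
while (defined(my $line = $in->getline)) {
    $count++;
    print "$count: $line";
}
$in->close;

# Write new contents through a second pseudohandle.
my $out = $body->open("w") or die "Can't open body for writing";
$out->print("replacement text\n");
$out->close;

# Fetch everything at once, and confirm there is no backing file.
print $body->as_string;
print "stored in memory\n" unless defined $body->path;
```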

MIME::Parser

The last major component of MIME-Tools is the MIME::Parser class, which parses the text representation of a MIME message into its various components. The class is simple enough to use, but has a large number of options that control various aspects of its operation. The short example in Figure 7.8 will give you the general idea.

Figure 7.8. Using MIME::Parser


Lines 1-3: Load modules. We turn on strict checking and load the MIME::Parser module. It brings in the other modules it needs, including MIME::Entity.

Lines 4-5: Open a message. We recover the name of a file from the command line, which contains a MIME-encoded message, and open it. This filehandle will be passed to the parser later.

Lines 6-8: Create and configure the parser. We create a new parser object by calling MIME::Parser->new(). We then call the newly created object's output_dir() method to set the directory where the parser will write the body data of extracted enclosures.

Lines 9-10: Parse the file. We pass the open filehandle to the parser's parse() method. The value returned from the method is a MIME::Entity object corresponding to the top level of the message.

Lines 11-14: Print information about the top-level entity. To demonstrate that we parsed the message, we recover and print the From: and Subject: lines of the header, calling the entity's head() method to get the MIME::Head object each time. We also print the MIME type of the whole message, and the number of subparts, which we derive from the entity's parts() method.

Lines 15-17: Print information about the parts. We loop through each part of the message. For each, we call its mime_type() method to retrieve the MIME type, and the path() method of the corresponding MIME::Body to get the name of the file that contains the data.

Line 18: Clean up. When we are finished, we call purge() to remove all the parsed body data files.
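
The listing for Figure 7.8 is not reproduced here, so the following is only a sketch that follows the same steps; its line numbers will not match the ones cited above. Two things are assumptions added for convenience: the output directory is a temporary directory rather than a fixed path, and when no file is named on the command line the script falls back to a tiny made-up message.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;
use File::Temp qw(tempdir);

# Scratch directory for extracted bodies (removed when the script exits).
my $dir = tempdir(CLEANUP => 1);

# Take the message file from the command line, or write a small sample.
my $file = shift @ARGV;
unless (defined $file) {
    $file = "$dir/sample.msg";
    open my $out, '>', $file or die "Can't write $file: $!";
    print $out <<'END';
From: Sender <sender@example.com>
To: recipient@example.com
Subject: parser demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="demo-boundary"

--demo-boundary
Content-Type: text/plain

Hello.
--demo-boundary
Content-Type: text/plain
Content-Disposition: attachment; filename="note.txt"

An attached note.
--demo-boundary--
END
    close $out;
}
open my $fh, '<', $file or die "Can't open $file: $!";

# Create and configure the parser, then parse the message.
my $parser = MIME::Parser->new;
$parser->output_dir($dir);
my $entity = $parser->parse($fh);
close $fh;

# Print information about the top-level entity.
print "From      = ", $entity->head->get('From');
print "Subject   = ", $entity->head->get('Subject');
print "MIME type = ", $entity->mime_type, "\n";
print "Parts     = ", scalar($entity->parts), "\n";

# Print information about each part.
my @report;
for my $part ($entity->parts) {
    my $body = $part->bodyhandle;          # undef for a nested multipart
    my $path = $body ? $body->path : '(nested multipart)';
    push @report, [ $part->mime_type, $path ];
    print "\t", $part->mime_type, "\t", $path, "\n";
}

# Clean up the extracted body files.
$entity->purge;
```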

When I ran the program on a MIME message stored in the file mime.test, this was the result:

 % simple_parse.pl ~/mime.test
 From       = Lincoln Stein <lstein@cshl.org>
 Subject    = testing mime parser
 MIME type  = multipart/mixed
 Parts      = 5
         text/plain   /tmp/msg-1857-1.dat
         audio/wav    /tmp/assimilated.wav
         image/jpeg   /tmp/aw-2-19.jpg
         audio/mpeg   /tmp/NorthwestPassage.mp3
         text/plain   /tmp/msg-1857-2.dat

This multipart message contains five parts. The first and last parts contain text data and correspond to the salutation and the signature. The remaining parts are enclosures, consisting of an audio/wav sound file, a JPEG image, and a ripped MP3 track.

We will walk through a more complex example of MIME::Parser in Chapter 8, where we deal with writing Post Office Protocol clients. The example developed there will spawn external viewers to view image and audio attachments.

Because MIME files can be quite large, MIME::Parser's default is to store the parsed MIME::Body parts as files using the MIME::Body::File class. You can control where these files are stored using either the output_dir() or the output_under() methods. The output_dir() method tells MIME::Parser to store the parts directly inside a designated directory. output_under(), on the other hand, creates a two-tier directory. For each parsed e-mail message, MIME::Parser creates a subdirectory under the base directory specified by output_under(), and then writes the MIME::Body::File data there.

In either case, all the temporary files are cleared when you call the top-level MIME::Entity's purge() method. You can instead keep some or all of the parts. To keep some parts, step through the message parts and call purge() selectively on those that you don't want to keep. You can either leave the other parts where they are or move them to a different location for safekeeping. To keep all parsed parts, don't call purge() at all.

Parsing is complex, and the parse() method may die if it encounters any of a number of exceptions. You can catch such exceptions and attempt to perform some error recovery by wrapping the call to parse() in an eval{} block:

 $entity = eval { $parser->parse(\*F) };
 warn $@ if $@;

Here is a brief list of the major functions in MIME::Parser, starting with the constructor.

$parser = MIME::Parser->new

The new() method creates a new parser object with default settings. It takes no arguments.

$dir = $parser->output_dir

$previous = $parser->output_dir($newdir)

The output_dir() method gets or sets the output directory for the parse. This is the directory in which the various parts and enclosures of the parsed message are (temporarily) stored.

If called with no arguments, it returns the current value of the output directory. If called with a directory path, it sets the output directory and returns its previous value. The default setting is ".", the current directory.

$dir = $parser->output_under

$parser->output_under($basedir [,DirName=>$dir [,Purge=>$purge]])

output_under() changes the temporary file strategy to use a two-tier directory. MIME::Parser creates a subdirectory inside the specified base directory and then places the parsed MIME::Body::File data in the newly created subdirectory.

In addition to $basedir, output_under() accepts two optional named arguments:

DirName. By default, the subdirectory is named by concatenating the current time, process ID, and a sequence number. If you would like a more predictable directory name, you can use DirName to provide a subdirectory name explicitly.

Purge. If you use the same subdirectory name each time you run the program, you might want to set Purge to a true value, in which case output_under() will remove anything in the subdirectory before beginning the parse.

Called with no arguments, output_under() returns the current base directory name. Here are two examples:

 # store enclosures in ~/mime_enclosures
 $parser->output_under("$ENV{HOME}/mime_enclosures");

 # store enclosures under /tmp in subdirectory "my_mime"
 $parser->output_under("/tmp", DirName=>'my_mime', Purge=>1);

The main methods are parse(), parse_data(), and parse_open():

$entity = $parser->parse(\*FILEHANDLE)

The parse() method parses a MIME message by reading its text from an open filehandle. If successful, it returns a MIME::Entity object. Otherwise, parse() can throw any number of run-time exceptions. To catch those exceptions, wrap parse() in an eval{} block as described earlier.

$entity = $parser->parse_data($data)

The parse_data() method parses a MIME message that is contained in memory. $data can be a scalar holding the text of the message, a reference to a scalar, or a reference to an array of scalars. The last form is intended for an array of the message's lines, but it can be any array that, when concatenated, yields the text of the message. If successful, parse_data() returns a MIME::Entity object. Otherwise, like parse(), it can throw any of a number of run-time exceptions.
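The three accepted forms of $data can be compared side by side. The sketch below is illustrative only: the message text and address are made up, and output_to_core() is switched on simply so that the demonstration writes no temporary files.

```perl
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);   # keep bodies in memory for this small demo

# The same message as an array of lines and as a single string.
my @lines = (
    "From: sender\@example.com\n",
    "Subject: parse_data demo\n",
    "Content-Type: text/plain\n",
    "\n",
    "Hello, world.\n",
);
my $text = join '', @lines;

# Parse it as a plain scalar, a scalar reference, and an array reference.
my @subjects;
for my $data ($text, \$text, \@lines) {
    my $entity  = $parser->parse_data($data);
    my $subject = $entity->head->get('Subject');
    chomp $subject;                 # header values keep their trailing newline
    push @subjects, $subject;
}

print "$_\n" for @subjects;
```

All three calls should yield the same header information, since the parser sees the same text in each case.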

$entity = $parser->parse_open($file)

The parse_open() method is a convenience function. It opens the file provided, and then passes the resulting filehandle to parse(). It is equivalent to:

 open (F,$file);
 $entity = $parser->parse(\*F);

Because parse_open() uses Perl's open() function, you can play the usual tricks with pipes. For example:

 $entity = $parser->parse_open("zcat ./mailbox.gz |");

This uncompresses the compressed mailbox using the zcat program and pipes the result to parse().

Several other methods control the way the parse operates:

$flag = $parser->output_to_core

$parser->output_to_core($flag)

The output_to_core() method controls whether MIME::Parser creates files to hold the decoded body data of MIME::Entity parts, or attempts to keep the data in memory. If $flag is false (the default), then the parts are parsed into disk files. If $flag is true, then MIME::Parser stores the body parts in main memory as MIME::Body::InCore objects.

Since enclosures can be quite large, you should be cautious about doing this. With no arguments, this method returns the current setting of the flag.
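For small messages, in-memory storage can be tried out as follows. This is a sketch with a made-up single-part message; it relies only on output_to_core(), parse_data(), and the body object's as_string() method.

```perl
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # store decoded bodies in memory, not on disk

my $entity = $parser->parse_data(<<'END');
From: sender@example.com
Subject: in-core demo
Content-Type: text/plain

A short body that is safe to hold in memory.
END

my $body = $entity->bodyhandle;
print "flag is ", ($parser->output_to_core ? "on" : "off"), "\n";
print "body class: ", ref($body), "\n";
print $body->as_string;        # the decoded body text
```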

$flag = $parser->ignore_errors

$parser->ignore_errors($flag)

The ignore_errors() method controls whether MIME::Parser tolerates certain syntax errors in the MIME message during parsing. If true (the default), then errors generate warnings, but if not, they cause a fatal exception during parse().

$error = $parser->last_error

$head = $parser->last_head

These two methods are useful for dealing with unparseable MIME messages. last_error() returns the last error message generated during the most recent parse. It is set when an error was encountered, and either ignore_errors() is true, or the call to parse() was wrapped in an eval{}.

last_head() returns the top-level MIME::Head object from the last stream we attempted to parse. Even though the body of the message wasn't successfully parsed, we can use the header returned by this method to salvage some information, such as the subject line and the name of the sender.
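One way to combine eval{}, last_error(), and last_head() is sketched below. The helper name parse_or_salvage() and the sample message are hypothetical, and which malformed inputs actually make the parser die depends on the MIME-Tools version, so the sketch makes no claim about that. It only shows where the salvage logic would go.

```perl
use strict;
use warnings;
use MIME::Parser;

# Hypothetical helper: return the entity on success. On failure, report the
# error and whatever the top-level header can still tell us, then return
# nothing.
sub parse_or_salvage {
    my ($parser, $text) = @_;

    my $entity = eval { $parser->parse_data($text) };
    my $died   = $@;                       # save before any other call
    return $entity if $entity;

    my $error = $parser->last_error || $died || 'unknown error';
    warn "parse failed: $error\n";

    if (my $head = $parser->last_head) {
        for my $field (qw(From Subject)) {
            my $value = $head->get($field);
            next unless defined $value;
            chomp $value;
            warn "  salvaged $field: $value\n";
        }
    }
    return;
}

my $parser = MIME::Parser->new;
$parser->output_to_core(1);                # no temporary files for the demo

my $good = <<'END';
From: sender@example.com
Subject: salvage demo
Content-Type: text/plain

Body text.
END

my $entity = parse_or_salvage($parser, $good);
print "parsed: ", $entity->head->get('Subject') if $entity;
```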

MIME Example: Mailing Recent CPAN Entries

In this section, we develop an application that combines the Net::FTP module from Chapter 19 with the Mail and MIME modules from this chapter. The program will log into the CPAN FTP site at ftp.perl.org, read the RECENT file that contains a list of modules and packages recently contributed to the site, download them, and incorporate them as attachments into an outgoing e-mail message. The idea is to run the script at weekly intervals to get automatic notification of new CPAN uploads.

Figure 7.9 shows the listing for the application, called mail_recent.pl.

Figure 7.9. The mail_recent.pl script


Lines 1–4: Load modules. We turn on strict syntax checking and load the Net::FTP and MIME::Entity modules.

Lines 5–9: Define constants. We set constants corresponding to the FTP site to connect to, the CPAN directory, and the name of the RECENT file itself. We also declare a constant with the e-mail address of the recipient of the message (in this case, my local username), and a DEBUG constant to turn on verbose progress messages.

Lines 10–11: Declare globals. The %RETRIEVE global contains the list of files to retrieve from CPAN. $TMPDIR contains the path of a directory in which to store the downloaded files temporarily before mailing them. This is derived from the TMPDIR environment variable, or, if not otherwise specified, from /usr/tmp. Windows and Macintosh users have to check and modify this for their systems.

Lines 12–15: Log into CPAN and fetch the RECENT file. We create a new Net::FTP object and log into the CPAN mirror. If successful, we change to the directory that contains the archive and call the FTP object's retr() method to return a filehandle from which we can read the RECENT file.

Lines 17–23: Parse the RECENT file. RECENT contains a list of all files on the CPAN archive that are new or have changed recently, but we don't want to download them all. The files we're interested in have lines that look like this:

 modules/by-module/Apache/Apache-Filter-1.011.tar.gz
 modules/by-module/Apache/Apache-iNcom-0.09.tar.gz
 modules/by-module/Audio/Audio-Play-MPG123-0.04.tar.gz
 modules/by-module/Bundle/Bundle-WWW-Search-ALL-1.09.tar.gz

We open the file for reading and scan through it one line at a time, looking for lines that match the appropriate pattern. We store the filename and its CPAN path in %RETRIEVE.

After processing the filehandle, we close it.
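The filtering step can be approximated with a single regular expression. The pattern below is an assumption made for illustration and is not necessarily the one used in Figure 7.9; the two non-matching sample lines are also made up. The four matching lines are the ones shown above.

```perl
use strict;
use warnings;

# Sample lines standing in for the contents of RECENT.
my @recent = (
    'modules/by-module/Apache/Apache-Filter-1.011.tar.gz',
    'modules/by-module/Apache/Apache-iNcom-0.09.tar.gz',
    'modules/by-module/Audio/Audio-Play-MPG123-0.04.tar.gz',
    'modules/by-module/Bundle/Bundle-WWW-Search-ALL-1.09.tar.gz',
    'modules/by-module/Apache/Apache-Filter-1.011.readme',   # made up, skipped
    'MIRRORED.BY',                                           # made up, skipped
);

# Map each archive's bare filename to its full path within CPAN.
my %RETRIEVE;
for my $line (@recent) {
    chomp $line;
    # Assumed pattern: a gzipped tarball somewhere under modules/by-module/.
    next unless $line =~ m{^modules/by-module/\S+/([^/\s]+\.tar\.gz)$};
    $RETRIEVE{$1} = $line;
}

print "$_ => $RETRIEVE{$_}\n" for sort keys %RETRIEVE;
```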

Lines 24–32: Begin the mail message. We begin the outgoing mail message with a short text message that gives the number of enclosures. We create a new MIME::Entity object by calling the build() constructor with the introduction as its initial contents.

Notice that the arguments we pass to build() create a single-part document of type text/plain. Later, when we add the enclosures, we rely on MIME::Entity's ability to convert the message into a multipart message when needed.

Lines 33–44: Retrieve modules and attach them to the mail. We loop through the filenames stored in %RETRIEVE. For each one, we call the FTP object's get() method to download the file to the temporary directory. If successful, we use the Filename argument to attach the file to the outgoing mail message by calling the top-level entity's attach() method. Other attach() arguments set the encoding to base64, and the MIME type to application/x-gzip. CPAN files are gzipped by convention. We also add a short description to the attachment; currently it is just a copy of the filename.

Line 45: Add signature to the outgoing mail. If there is a file named .signature in the current user's home directory, we call the MIME entity's sign() method to attach it to the end of the message.

Lines 46–49: Send the mail. We call the entity's send() method to MIME-encode the message and send it via the SMTP protocol. When this is done, we call the entity's purge() method, deleting the downloaded files in the temporary directory. This works because the files became the basis for the MIME-entity bodies via the MIME::Body::File subclass when they were attached to the outgoing message, and purge() recursively deletes these files.

Note that the send() method relies on libnet being correctly configured to find a working SMTP server. If this is not the case, check and fix the Libnet.cfg file.

Line 51: Close FTP connection. Our last step is to close the FTP connection by calling the FTP object's quit() method.
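The build-then-attach sequence described in the walkthrough can be tried without an FTP or SMTP server. The sketch below is not the mail_recent.pl listing: the addresses, the Example-0.01.tar.gz name, and the placeholder file contents are all made up, and the message is rendered with stringify() instead of being sent.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MIME::Entity;

# Stand-in for a downloaded archive; the contents are not a real gzip file.
my ($fh, $path) = tempfile(SUFFIX => '.tar.gz', UNLINK => 1);
binmode $fh;
print {$fh} "placeholder bytes";
close $fh or die "close failed: $!";

# Start with a single-part text/plain message.
my $mail = MIME::Entity->build(
    From    => 'sender@example.com',
    To      => 'recipient@example.com',
    Subject => 'Recent CPAN submissions',
    Type    => 'text/plain',
    Data    => "1 enclosure follows.\n",
);
my $type_before = $mail->mime_type;

# Attaching a part converts the message to multipart as needed.
$mail->attach(
    Path        => $path,
    Filename    => 'Example-0.01.tar.gz',
    Type        => 'application/x-gzip',
    Encoding    => 'base64',
    Description => 'Example-0.01.tar.gz',
);
my $type_after = $mail->mime_type;

print "before attach: $type_before\n";
print "after attach:  $type_after\n";

# Render the complete MIME-encoded message as a string.
my $wire = $mail->stringify;
print length($wire), " bytes in the encoded message\n";
```

Rendering to a string first is a convenient way to inspect the headers and boundaries before handing the message to a mailer.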

Figure 7.10 shows a screenshot of Netscape Navigator displaying the resulting MIME message. Clicking on one of the enclosures will prompt you to save it to disk so that you can unpack and build the module.

Figure 7.10. A mail message sent from mail_recent.pl


A deficiency in the program is that the CPAN filenames can be cryptic, and it isn't always obvious what a package does. A nice enhancement to this script would be to unpack the package, scan through its contents looking for the POD documentation, and extract the description line following the NAME heading. This information could then be used as the MIME::Entity Description: field rather than the filename itself. A simpler alternative would be to enclose the .readme file that frequently (but not always) accompanies a package's .tar.gz file.
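The description-extraction idea might look something like this once the POD text is in hand. The helper name pod_description() and the sample POD are hypothetical, and unpacking the .tar.gz archive to find the POD is left out of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: given a string of POD, return the description that
# follows the "=head1 NAME" heading, or undef if none is found.
sub pod_description {
    my ($pod) = @_;

    # Capture the first non-blank line after the NAME heading.
    return unless $pod =~ /^=head1[ \t]+NAME[ \t]*\n(?:[ \t]*\n)*([^\n]+)/m;
    my $line = $1;
    return if $line =~ /^=/;               # NAME section was empty
    $line =~ s/^\s+|\s+$//g;               # trim surrounding whitespace

    # By convention the line reads "Module::Name - short description".
    my (undef, $desc) = split /\s+-+\s+/, $line, 2;
    return defined $desc ? $desc : $line;
}

my $sample_pod = <<'END';
=head1 NAME

Example::Module - a made-up module used to test the helper

=head1 SYNOPSIS

  use Example::Module;

=cut
END

my $description = pod_description($sample_pod);
print defined $description ? "$description\n" : "(no description)\n";
```

The returned string could then be passed as the Description argument to attach() in place of the filename.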
