16.4. ReceivedMailExaminer | Prefactoring: Extreme Abstraction, Extreme Separation, Extreme Readability


The responsibility of the ReceivedMailExaminer is to examine the mail and determine if it is acceptable. The ReceivedMailExaminer delegates the actual examination to objects implementing the MailExaminer interface ("Separating Concerns Makes Smaller Concerns"), shown in Example 16-4.

Example 16-4. MailExaminer interface

    interface MailExaminer
        UnacceptableRating examine_mail(MailDTO a_mail_dto,
            MailReport a_report)

The UnacceptableRating returned by the examine_mail() method of MailExaminer is compared by the ReceivedMailExaminer to a configuration setting to see if the mail should be considered spam. If so, the appropriate response is returned to the ConnectionHandler.
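The comparison described above can be sketched in Python. This is a minimal sketch, not the book's implementation: it assumes that UnacceptableRating can be modeled as a plain integer, that MailDTO and MailReport can be stood in for by a dict and a list, and that mail counts as spam only when the rating exceeds the configured value. FixedRatingExaminer, is_spam(), and maximum_acceptable_rating are hypothetical names introduced for illustration.

```python
from typing import Protocol


class MailExaminer(Protocol):
    """Python rendering of the MailExaminer interface from Example 16-4."""

    def examine_mail(self, a_mail_dto: dict, a_report: list) -> int: ...


class FixedRatingExaminer:
    """Stand-in examiner (hypothetical) that always reports the same rating."""

    def __init__(self, rating: int) -> None:
        self.rating = rating

    def examine_mail(self, a_mail_dto: dict, a_report: list) -> int:
        # Record the finding, then hand the rating back to the caller.
        a_report.append(("FixedRatingExaminer", self.rating))
        return self.rating


class ReceivedMailExaminer:
    """Delegates the examination, then compares the rating to a setting."""

    def __init__(self, examiner: MailExaminer,
                 maximum_acceptable_rating: int) -> None:
        self._examiner = examiner
        self._maximum_acceptable_rating = maximum_acceptable_rating

    def is_spam(self, a_mail_dto: dict, a_report: list) -> bool:
        rating = self._examiner.examine_mail(a_mail_dto, a_report)
        # Assumption: spam only when the rating exceeds the configured value.
        return rating > self._maximum_acceptable_rating
```

Because ReceivedMailExaminer sees only the MailExaminer interface, the stand-in can be swapped for a real examiner without touching the comparison.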

On construction, the ReceivedMailExaminer calls the MailExaminerFactory. The MailExaminerFactory method checks configuration information to see which examiners are requested, creates the appropriate examiners, and returns a reference to an examiner. This reference might be to a single examiner, to a composite examiner, or to a DefaultMailExaminer.

The composite pattern (see Design Patterns by Erich Gamma et al.) is used if multiple MailExaminers are desired. If the configuration calls for multiple examiners, MailExaminerFactory creates a MultipleMailExaminer:

    MultipleMailExaminer implements MailExaminer
        add_examiner(MailExaminer an_examiner)
        // plus methods in MailExaminer

The MailExaminerFactory adds each specified MailExaminer to the MultipleMailExaminer object.

When an examine method is called, the MultipleMailExaminer object calls each MailExaminer that has been added and returns the highest UnacceptableRating that is reported by any of them.^[*]

^[*] There are at least two types of mail examiners: those that examine a message for spam and those that accept a message unconditionally. An example of the latter is checking the sender against a whitelist of senders that a user is willing to accept mail from, regardless of its contents. A MailExaminer of that type can return a value corresponding to UnconditionallyAccept. If one did so, further MailExaminers need not be called.
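The factory and the composite can be sketched together in Python. This is an illustration under stated assumptions, not the book's code: ratings are modeled as integers, UNCONDITIONALLY_ACCEPT is a hypothetical sentinel value, the MailDTO is stood in for by a dict, and create_mail_examiner(), WhitelistExaminer, and KeywordExaminer are invented names.

```python
ACCEPTABLE = 0
UNCONDITIONALLY_ACCEPT = -1  # hypothetical sentinel, not from the source


class DefaultMailExaminer:
    """Used when no examiners are configured; accepts everything."""

    def examine_mail(self, a_mail_dto, a_report):
        return ACCEPTABLE


class WhitelistExaminer:
    """Accepts mail unconditionally when the sender is whitelisted."""

    def __init__(self, whitelist):
        self._whitelist = set(whitelist)

    def examine_mail(self, a_mail_dto, a_report):
        if a_mail_dto["sender"] in self._whitelist:
            return UNCONDITIONALLY_ACCEPT
        return ACCEPTABLE


class KeywordExaminer:
    """Toy spam examiner: reports a fixed rating if a keyword appears."""

    def __init__(self, keyword, rating):
        self._keyword = keyword
        self._rating = rating

    def examine_mail(self, a_mail_dto, a_report):
        if self._keyword in a_mail_dto["text"]:
            return self._rating
        return ACCEPTABLE


class MultipleMailExaminer:
    """Composite: calls each examiner and returns the highest rating."""

    def __init__(self):
        self._examiners = []

    def add_examiner(self, an_examiner):
        self._examiners.append(an_examiner)

    def examine_mail(self, a_mail_dto, a_report):
        highest = ACCEPTABLE
        for examiner in self._examiners:
            rating = examiner.examine_mail(a_mail_dto, a_report)
            if rating == UNCONDITIONALLY_ACCEPT:
                # Further examiners need not be called.
                return UNCONDITIONALLY_ACCEPT
            highest = max(highest, rating)
        return highest


def create_mail_examiner(configured_examiners):
    """Return a default, the only, or a composite examiner."""
    if not configured_examiners:
        return DefaultMailExaminer()
    if len(configured_examiners) == 1:
        return configured_examiners[0]
    composite = MultipleMailExaminer()
    for examiner in configured_examiners:
        composite.add_examiner(examiner)
    return composite
```

In this sketch an unconditional acceptance overrides ratings reported by examiners that ran earlier, so the order in which examiners are added matters; placing whitelist-style examiners first avoids needless work.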

16.4.1. Alternative Interface

An alternative interface for MailExaminer is to have multiple methods, as shown in Example 16-5. Each method is called based on the_status of the MailDTO.

Example 16-5. Alternative MailExaminer interface

    interface MailExaminer
        UnacceptableRating examine_greeting(Greeting a_greeting,
            MailReport a_report)
        UnacceptableRating examine_envelope_sender(Greeting a_greeting,
            EmailAddress sender, MailReport a_report)
        UnacceptableRating examine_envelope_recipient(Greeting a_greeting,
            EmailAddress sender, EmailAddress recipient, MailReport a_report)

The two approaches to the MailExaminer interface (Examples 16-4 and 16-5) are another example of the "Spreadsheet Conundrum." In Example 16-4, the single method is passed an object that has information on how much data it contains. In Example 16-5, each individual method designates how much data is being used.

16.4.2. Alternative Responsibility Assignment

An alternative assignment of responsibilities is to make the MailDTO class more active. For example, the method that adds a recipient to an Envelope could also call the examine_mail() method in ReceivedMailExaminer. This would tie MailDTO to ReceivedMailExaminer. This coupling makes MailDTO less usable in other contexts that do not require spam checking ("Obey the Class Maxims"). In addition, MailDTO, along with the associated ADTs, has enough work to do to ensure the validity of the message format ("Do a Little Job Well and You May Be Called Upon Often").

This separation also makes testing easier ("Plan for Testing"). The ReceivingMailServer can be tested without any MailExaminers. The individual MailExaminers can be called with MailDTOs that are assembled from a collection of messages. The performance of each MailExaminer in identifying spam can be analyzed easily in a context other than in the act of receiving mail.

16.4.3. MailReport

A MailReport represents the examination methods' findings. Each MailExaminer adds an ExaminerResult instance to the MailReport. These classes look like this:

    class ExaminerResult
        CommonString examiner_name
        UnacceptableRating unacceptable_rating
        CommonString [] explanation
        HeaderField [] headers_to_insert
    class MailReport
        ExaminerResult [] results

The MailReport is stored along with the message in the ReceivedMailQueue (to be described shortly). When the message is delivered to the recipient's user agent, the MailReport can be appended to it, based on user configuration settings. A recipient can use this information for further message filtering.
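As a rough illustration, the two classes map naturally onto Python dataclasses. The sketch below assumes that ratings are integers and that a HeaderField can be modeled as a (name, value) pair. The highest_rating() and header_lines() helpers are hypothetical additions showing how a report might be summarized or turned into header lines for the delivered message.

```python
from dataclasses import dataclass, field


@dataclass
class ExaminerResult:
    examiner_name: str
    unacceptable_rating: int  # assumption: rating modeled as an integer
    explanation: list[str] = field(default_factory=list)
    # Assumption: a HeaderField is modeled as a (name, value) pair.
    headers_to_insert: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class MailReport:
    results: list[ExaminerResult] = field(default_factory=list)

    def highest_rating(self) -> int:
        """Hypothetical helper: the worst rating any examiner reported."""
        return max((r.unacceptable_rating for r in self.results), default=0)

    def header_lines(self) -> list[str]:
        """Hypothetical helper: header lines to add on delivery."""
        return [
            f"{name}: {value}"
            for result in self.results
            for name, value in result.headers_to_insert
        ]
```

A recipient's filter could then act on the added header lines without knowing which examiner produced them.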

16.4.4. MailExaminers

Each method in the MailExaminer interface determines whether an email message is spam, based on the information currently available. The following examples list some ways that a MailExaminer can examine a MailDTO to determine if a message is spam. The list is based on current spam filters (such as sendmail milters) plus comments made in an antispam mailing list (spamtools).

The Greeting can be examined to:

Check that the sending_ip_address is not listed in a black-hole list. Black-hole lists are maintained by several organizations. They contain the IPAddresses of hosts that have been reported as sending spam.
Check that the hello_host_name is formatted properly.
See if the reverse name lookup for an IPAddress matches the hello_host_name.
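The three Greeting checks listed above might look like the following in Python. This is a sketch under several assumptions: the black-hole list is an in-memory set rather than a query to a list maintainer, the reverse lookup is passed in as a function so that no network access is needed, the numeric ratings are arbitrary, and the host name pattern ignores address literals. The function name examine_greeting and its return shape are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# One DNS label: letters, digits, and inner hyphens, at most 63 characters.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOST_NAME = re.compile(_LABEL + r"(?:\." + _LABEL + r")*")


@dataclass
class Greeting:
    sending_ip_address: str
    hello_host_name: str


def examine_greeting(
    a_greeting: Greeting,
    black_hole_list: set,
    reverse_lookup: Callable[[str], Optional[str]],
) -> tuple:
    """Return (rating, findings); the rating values are illustrative only."""
    rating = 0
    findings = []

    if a_greeting.sending_ip_address in black_hole_list:
        rating = max(rating, 100)
        findings.append("sending address is in a black-hole list")

    if not _HOST_NAME.fullmatch(a_greeting.hello_host_name):
        rating = max(rating, 50)
        findings.append("hello host name is badly formatted")

    looked_up = reverse_lookup(a_greeting.sending_ip_address)
    if looked_up is None or (
        looked_up.lower() != a_greeting.hello_host_name.lower()
    ):
        rating = max(rating, 30)
        findings.append("reverse lookup does not match hello host name")

    return rating, findings
```

Injecting the lookup function keeps the examiner testable in isolation, in line with the testing argument made earlier in this section.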

The Envelope can be examined to:

Check that the sender is not on a blacklist that is global to the ReceivingMailServer.
Keep track of how many recipients were invalid. If the number exceeds MAXIMUM_NUMBER_INVALID_RECIPIENTS, return a response with an appropriate result.

For each recipient:

See if the sending_ip_address or the sender is on a blacklist for the recipient. If either is the case, the recipient can be removed from the_envelope.recipients.
See if the recipient is a honey pot. A honey pot is an email address that is placed on web pages or in news groups specifically to attract spammers who use automated tools to generate mailing lists. Appropriate action can be taken.
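The per-recipient Envelope checks can be sketched as follows. The value of MAXIMUM_NUMBER_INVALID_RECIPIENTS, the dict-of-sets shape of the per-recipient blacklists, and the names filter_recipients and InvalidRecipientCounter are assumptions made for illustration. What the server does once a honey pot is hit is left to the caller.

```python
from dataclasses import dataclass, field

MAXIMUM_NUMBER_INVALID_RECIPIENTS = 3  # assumed value for illustration


@dataclass
class Envelope:
    sender: str
    recipients: list = field(default_factory=list)


def filter_recipients(sending_ip_address, an_envelope,
                      recipient_blacklists, honey_pots):
    """Return (kept_recipients, hit_honey_pot).

    recipient_blacklists maps a recipient address to the set of sender
    addresses and IP addresses that recipient refuses (hypothetical shape).
    """
    kept = []
    hit_honey_pot = False
    for recipient in an_envelope.recipients:
        if recipient in honey_pots:
            hit_honey_pot = True
        blocked = recipient_blacklists.get(recipient, set())
        if sending_ip_address in blocked or an_envelope.sender in blocked:
            continue  # drop this recipient from the envelope
        kept.append(recipient)
    return kept, hit_honey_pot


class InvalidRecipientCounter:
    """Tracks invalid recipients seen on one connection."""

    def __init__(self):
        self.count = 0

    def record_invalid(self) -> bool:
        """Count one invalid recipient; True once the maximum is exceeded."""
        self.count += 1
        return self.count > MAXIMUM_NUMBER_INVALID_RECIPIENTS
```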

The Message can be examined to:

Check for spam by executing a program that uses rules or Bayesian analysis to determine a spam score and take appropriate action ("The Easiest Code to Debug Is That Which Is Not Written"). SpamAssassin is an example of a rule-based spam judge.
Check the headers to see if the "Received" headers form a proper sequence from user agent through final delivery.
Check for viruses by executing a virus checker ("Don't Reinvent the Wheel").
Check the_envelope.recipients and the Greeting to make a user-based spam determination.

Note: One aspect of SMTP with regard to spam is that the DATA command can be accepted or rejected only as a whole. If users were allowed to create their own configuration commands (e.g., accept anything), some users might reject a message while others accept it. In that case, the mail administrator has a quandary: whether to return Success or PermanentFailure. A common policy for messages that are accepted (Success returned) is to tag the message with a header line and deliver it to the users who wanted to reject it. An alternative is to send back a PermanentFailure response with a textual explanation that the message was not delivered to all recipients, but to deliver the message to any recipients that did not reject it ("Never Remain Silent").

To check on individual users, the RecipientCollection is accessed:

    interface RecipientCollection
        Recipient find_recipient(EmailAddress recipient)
    interface SpamConfiguration
        should_this_message_be_accepted(Greeting a_greeting, Envelope an_envelope)
    class Recipient
        EmailAddress email_address
        SpamConfiguration spam_configuration

These interfaces are implemented by accessing the appropriate user information on a particular operating system ("Adapt or Adopt"). For example, on Linux/Unix, a datafile (e.g., /etc/passwd on an unsecured system) is searched for the recipient by the find_recipient( ) method. The SpamConfiguration information can be read from a file in the user's home directory.
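A Unix-flavored RecipientCollection might be sketched as below. To stay self-contained, the sketch parses passwd-format text handed to it instead of opening /etc/passwd, and it records only where a user's spam configuration would live. The class name PasswdRecipientCollection, the domain parameter, and the .spam_configuration file name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipient:
    email_address: str
    # Hypothetical: path from which a SpamConfiguration would be loaded.
    spam_configuration_path: str


class PasswdRecipientCollection:
    """Finds recipients in passwd-format text (name:pw:uid:gid:gecos:home:shell)."""

    def __init__(self, passwd_text: str, domain: str) -> None:
        self._domain = domain.lower()
        self._home_directories = {}
        for line in passwd_text.splitlines():
            if not line or line.startswith("#"):
                continue
            fields = line.split(":")
            if len(fields) >= 7:
                self._home_directories[fields[0]] = fields[5]

    def find_recipient(self, recipient: str) -> Optional[Recipient]:
        local_part, _, domain = recipient.rpartition("@")
        if domain.lower() != self._domain:
            return None  # not a local address
        home = self._home_directories.get(local_part)
        if home is None:
            return None  # no such user
        return Recipient(recipient, home + "/.spam_configuration")
```

An implementation for another operating system would supply a different class with the same find_recipient() method, leaving the examiners unchanged.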

16.4.5. Efficiency Considerations

It is possible that the MailExaminers are slow to examine a message. SendingMailServers do not wait forever for a response. The time it takes to determine a response to most commands is short. However, if the amount of data in the Message is large, it could take longer to check for spam than the SendingMailServer is willing to wait. In that case, the sender might terminate the connection and attempt to resend the email later.

If an examination of the logs shows that a faster response to the DATA command is needed, several solutions might present themselves ("Don't Speed Until You Know Where You Are Going"). The MailExaminers could be given a fixed amount of time to create an UnacceptableRating. If that time were exceeded, the rating could default to an acceptable one. As another alternative, a listener interface could be implemented. At particular points within the receipt of the email message, the DataCommand process() method could call DataCommandListeners. The interface is as follows:

    DataCommandListener
        UnacceptableRating on_received_line(CommonString line, Byte [] bytes)
        UnacceptableRating on_end_of_header(MailDTO mail_dto)
        UnacceptableRating on_end_of_content_part(MailDTO mail_dto)
    DataCommand
        DataCommandListener data_command_listener

The DataCommand.process( ) method creates a listener by calling the following:

    DataCommandListener data_command_listener =
        DataCommandListenerFactory.get_instance()

The DataCommandListenerFactory method checks the configuration file, and creates a listener, a composite listener, or a DefaultDataCommandListener (if there are no listeners in the configuration file). "Consistency Is Simplicity" suggests that this pattern should follow the MailExaminer pattern.

At each appropriate point in receiving data, the DataCommand.process() method executes the appropriate method in DataCommandListener. If the UnacceptableRating exceeds a configuration setting, the process() method is terminated and an appropriate Response is returned.^[*]

^[*] A composite DataCommandListener that calls multiple DataCommandListeners might need to organize those calls so that implementations that are quick to execute are called first.
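The early termination described here can be sketched in Python. The sketch makes several simplifying assumptions: ratings are integers, the MailDTO is a dict of header and body lines, the whole body is treated as a single content part, and process() is a free function returning True for acceptance instead of a Response. SubjectListener is a hypothetical listener used only to demonstrate the control flow.

```python
class DefaultDataCommandListener:
    """Listener that never objects; used when none is configured."""

    def on_received_line(self, line, raw_bytes):
        return 0

    def on_end_of_header(self, mail_dto):
        return 0

    def on_end_of_content_part(self, mail_dto):
        return 0


class SubjectListener(DefaultDataCommandListener):
    """Hypothetical listener: rates mail by a phrase in the Subject header."""

    def __init__(self, phrase, rating):
        self._phrase = phrase
        self._rating = rating
        self.lines_seen = 0

    def on_received_line(self, line, raw_bytes):
        self.lines_seen += 1
        return 0

    def on_end_of_header(self, mail_dto):
        for header in mail_dto["header_lines"]:
            if header.startswith("Subject:") and self._phrase in header:
                return self._rating
        return 0


def process(lines, listener, maximum_rating):
    """Feed lines to the listener; return False as soon as a rating is too high."""
    mail_dto = {"header_lines": [], "body_lines": []}
    in_header = True
    for line in lines:
        raw_bytes = line.encode("ascii", "replace")
        if listener.on_received_line(line, raw_bytes) > maximum_rating:
            return False
        if in_header and line == "":
            # A blank line ends the header section.
            in_header = False
            if listener.on_end_of_header(mail_dto) > maximum_rating:
                return False
        elif in_header:
            mail_dto["header_lines"].append(line)
        else:
            mail_dto["body_lines"].append(line)
    return listener.on_end_of_content_part(mail_dto) <= maximum_rating
```

With a listener that objects at the end of the header, the body lines are never examined, which is the source of the time saving.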
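The other option raised in this section, giving an examiner a fixed amount of time and defaulting to an acceptable rating, can be sketched with Python's standard concurrent.futures module. The wrapper name TimeLimitedMailExaminer, the integer ACCEPTABLE rating, and the SleepingExaminer stand-in are assumptions. Note that a timed-out examination keeps running in its worker thread in this sketch, so it may still write to the report after the default rating has been returned.

```python
import concurrent.futures
import time

ACCEPTABLE = 0  # assumption: ratings modeled as integers


class SleepingExaminer:
    """Stand-in examiner (hypothetical) that takes a fixed time to answer."""

    def __init__(self, seconds, rating):
        self._seconds = seconds
        self._rating = rating

    def examine_mail(self, a_mail_dto, a_report):
        time.sleep(self._seconds)
        return self._rating


class TimeLimitedMailExaminer:
    """Wraps an examiner and defaults to ACCEPTABLE when it is too slow."""

    def __init__(self, examiner, time_limit_seconds, executor):
        self._examiner = examiner
        self._time_limit_seconds = time_limit_seconds
        # A shared, long-lived executor: shutting one down per call would
        # wait for the slow examination and defeat the time limit.
        self._executor = executor

    def examine_mail(self, a_mail_dto, a_report):
        future = self._executor.submit(
            self._examiner.examine_mail, a_mail_dto, a_report
        )
        try:
            return future.result(timeout=self._time_limit_seconds)
        except concurrent.futures.TimeoutError:
            # cancel() only helps if the task has not started running yet.
            future.cancel()
            return ACCEPTABLE
```

Because the wrapper itself has an examine_mail() method, it can be handed to a ReceivedMailExaminer or added to a composite like any other examiner.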

16.4.6. Separation of Concerns

The main method of ReceivingMailServer creates a listening socket on TCP port 25, the SMTP port. When a connection is received, it creates a thread that handles the connection. The logic for the ConnectionHandler, as shown in Example 16-1, handles a single connection. Each connection from a SendingMailServer could be handled serially. However, that would tie up the ReceivingMailServer for long periods.

So ConnectionHandler.handler( ) runs in the context of a thread. Each connection from a SendingMailServer operates in a different thread. The logic of ConnectionHandler.handler( ) does not change regardless of whether it runs serially or in a thread.

The portion of the program that creates the separate threads can keep track of the number of connections from a particular SendingMailServer. If that number exceeds MAXIMUM_NUMBER_OF_CONNECTIONS_FROM_SENDING_MAIL_SERVER, subsequent connection attempts could be dropped immediately and handle_connection() would not be called.

This separates the processing of a single thread from the processing of a group of threads ("Do a Little Job Well and You May Be Called Upon Often").
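The split between per-connection logic and thread management can be sketched in Python. The names ConnectionTracker and dispatch, and the value chosen for the maximum, are assumptions. The sketch allows up to the maximum number of simultaneous connections per sending address and drops the rest, and the handler passed in knows nothing about the counting.

```python
import threading
from collections import Counter

MAXIMUM_NUMBER_OF_CONNECTIONS_FROM_SENDING_MAIL_SERVER = 2  # assumed value


class ConnectionTracker:
    """Counts open connections per sending address (thread-safe)."""

    def __init__(
        self, maximum=MAXIMUM_NUMBER_OF_CONNECTIONS_FROM_SENDING_MAIL_SERVER
    ):
        self._maximum = maximum
        self._open = Counter()
        self._lock = threading.Lock()

    def try_open(self, sending_ip_address) -> bool:
        with self._lock:
            if self._open[sending_ip_address] >= self._maximum:
                return False
            self._open[sending_ip_address] += 1
            return True

    def close(self, sending_ip_address) -> None:
        with self._lock:
            if self._open[sending_ip_address] > 1:
                self._open[sending_ip_address] -= 1
            else:
                self._open.pop(sending_ip_address, None)


def dispatch(tracker, sending_ip_address, handle_connection):
    """Run the handler in its own thread, or return None if over the limit."""
    if not tracker.try_open(sending_ip_address):
        return None  # dropped; the handler is never called

    def run():
        try:
            handle_connection(sending_ip_address)
        finally:
            tracker.close(sending_ip_address)

    thread = threading.Thread(target=run)
    thread.start()
    return thread
```

The handler could be run serially by calling it directly instead of through dispatch(), which mirrors the point that its logic does not change with the threading model.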
