16.1. The Context | Prefactoring: Extreme Abstraction, Extreme Separation, Extreme Readability


Spam mail needs no introduction. Everyone has probably received some that has leaked through spam filters. This chapter's case study looks at a program designed to check for and trap incoming spam. Many programs receive mail and many programs analyze mail for spam. So why create another one? This design shows some alternative ways of assigning responsibilities in a system. It demonstrates the guidelines in a different context.

16.1.1. The Environment

Several processes are involved in email delivery. The user agent (e.g., Outlook or Eudora) interacts with the user to create messages and to display received messages. The user agent sends outgoing messages to a mail server. The mail server (SendingMailServer) transmits the messages to the receiving mail server (ReceivingMailServer). The receiving server queues the received mail for a particular user. The user agent for the recipient picks up the received mail and displays it to the user. Figure 16-1 shows the email delivery process.

Figure 16-1. Process of delivering email

Mail also can be transmitted by programs that send messages to a list of people (e.g., Mailman). These programs, often called list servers, typically send the messages to a SendingMailServer for delivery.

A spamming program acts as a SendingMailServer. It contacts ReceivingMailServers to deliver the spam to the end user. It is not easy to control the senders, so the bulk of the responsibility for spam prevention falls on the receiver. In this chapter, we describe a ReceivingMailServer that identifies spam and reacts to it. The ReceivingMailServer must try to distinguish between a legitimate message and an illegitimate message. The server can examine a message to see if it resembles spam. The server can also identify SendingMailServers that transmit mostly spam and establish by policy that all messages from those servers are illegitimate.

It is also incumbent on legitimate SendingMailServers not to send spam. At the end of this chapter, we examine a design for a SendingMailServer that reuses the spam detection classes of a ReceivingMailServer.

16.1.2. SMTP

A SendingMailServer and a ReceivingMailServer communicate using the Simple Mail Transfer Protocol (SMTP). SMTP is described in RFC 2821 (http://www.ietf.org/rfc/rfc2821.txt). The basic protocol consists of a sequence of commands and responses between the sender and the receiver. SMTP is a text-based protocol ("To Text or Not to Text"). To keep this description within reasonable bounds, I show only the basic sequence of SMTP commands needed to transfer a message. All ReceivingMailServers must support the commands in this sequence, and all SendingMailServers must be prepared to send to receivers that understand only these commands. ^[*]

^[*] You also can use SMTP as the protocol between a user agent and the SendingMailServer. To avoid being classified as a spammer, a SendingMailServer should authenticate user agents that use it to transmit email messages and should perform other operations to ensure that spam is not transmitted through it.

Here is the basic SMTP sequence of commands and responses:

ReceivingMailServer:
    Awaits connection from sender.
    On connection, sends greeting.

SendingMailServer:
    Sends HELO SendingHost.

ReceivingMailServer:
    Sends response.

SendingMailServer:
    Sends MAIL FROM:<SendingUser>.

ReceivingMailServer:
    Sends response.

SendingMailServer:
    Sends RCPT TO:<Recipient>.
    Multiple recipients are established by executing this command repeatedly.

ReceivingMailServer:
    Sends response.

SendingMailServer:
    Sends DATA.

ReceivingMailServer:
    Sends response.
    This intermediate response indicates a readiness to accept data.

SendingMailServer:
    Sends data (the email message).
    Data ends with EndofData (a period [.] all alone on a single line).

ReceivingMailServer:
    Sends response.

SendingMailServer:
    Sends QUIT.

ReceivingMailServer:
    Sends response.
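The sender's half of this exchange can be sketched as a function that lists the lines a SendingMailServer would transmit for one message. This is only an illustration: the name `sender_lines` and its parameters are hypothetical, the receiver's responses are ignored, and the doubling of a leading period (dot-stuffing) is the transparency rule from RFC 2821 rather than something described in the steps above.

```python
from typing import List

END_OF_DATA = "."  # a period all alone on a single line


def sender_lines(sending_host: str, sending_user: str,
                 recipients: List[str], message: str) -> List[str]:
    """Return the lines a sender transmits, in order, for one message.

    Hypothetical helper: a real sender waits for a response after each
    command and stops if that response reports a failure.
    """
    lines = [f"HELO {sending_host}", f"MAIL FROM:<{sending_user}>"]
    # RCPT TO: is repeated once per recipient.
    lines += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing (from RFC 2821, not from the text above): a message
        # line that starts with a period gets an extra period so that it is
        # not mistaken for EndofData.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(END_OF_DATA)
    lines.append("QUIT")
    return lines


if __name__ == "__main__":
    for line in sender_lines("mail.example.org", "alice@example.org",
                             ["bob@example.com", "carol@example.com"],
                             "Subject: Hi\n\nHello.\n.hidden"):
        print(line)
```

Listing the lines up front keeps the sketch short; an actual sender would interleave each command with a read of the receiver's response.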

Each response consists of a numeric value and optional text. Response values fall into three main categories: Success, PermanentFailure, and TemporaryFailure. If the response is PermanentFailure, the SendingMailServer usually sends a message back to the SendingUser. If the response is TemporaryFailure, the SendingMailServer attempts the transfer again later. It might also send a message back to the SendingUser. If the response to the end of the data for the DATA command is Success, the ReceivingMailServer has accepted responsibility for delivering the message to the recipients.
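One way to picture the categories is a small classifier keyed on the first digit of the numeric value. The first-digit convention (2 for success, 3 for an intermediate reply such as the one to DATA, 4 for a temporary failure, 5 for a permanent failure) comes from RFC 2821, not from this chapter, and the names `ResponseCategory` and `categorize` are hypothetical.

```python
from enum import Enum


class ResponseCategory(Enum):
    SUCCESS = "Success"
    INTERMEDIATE = "Intermediate"  # e.g., the reply to DATA
    TEMPORARY_FAILURE = "TemporaryFailure"
    PERMANENT_FAILURE = "PermanentFailure"


# First digit of the numeric value -> category (convention from RFC 2821).
_BY_FIRST_DIGIT = {
    2: ResponseCategory.SUCCESS,
    3: ResponseCategory.INTERMEDIATE,
    4: ResponseCategory.TEMPORARY_FAILURE,
    5: ResponseCategory.PERMANENT_FAILURE,
}


def categorize(response_line: str) -> ResponseCategory:
    """Classify a response such as '250 OK' by its three-digit value."""
    code = int(response_line[:3])  # raises ValueError if not numeric
    try:
        return _BY_FIRST_DIGIT[code // 100]
    except KeyError:
        raise ValueError(f"unexpected response value: {code}") from None


if __name__ == "__main__":
    for line in ("250 OK", "354 Start mail input",
                 "451 Try again later", "550 No such user"):
        print(line, "->", categorize(line).value)
```

A sender built on this would retry later on TemporaryFailure and report back to the SendingUser on PermanentFailure, as described above.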

The SendingUser and the Recipient are email addresses. The combination of the SendingUser and the list of Recipients is termed the Envelope. The email addresses in the Envelope do not have to agree with the "From:" or "To:" addresses listed in the message header.
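The distinction between the Envelope and the message header can be made concrete with a short sketch. The `Envelope` class and the `headers_agree` function below are hypothetical illustrations, not the design developed later in the chapter; the header parsing relies on Python's standard `email` package.

```python
from dataclasses import dataclass, field
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr
from typing import List


@dataclass
class Envelope:
    """SendingUser plus the list of Recipients given in the SMTP commands."""
    sending_user: str = ""
    recipients: List[str] = field(default_factory=list)

    def clear(self) -> None:
        self.sending_user = ""
        self.recipients.clear()


def headers_agree(envelope: Envelope, message: Message) -> bool:
    """True if From: matches the SendingUser and To: covers all Recipients."""
    header_from = parseaddr(message.get("From", ""))[1]
    header_to = {addr for _, addr in getaddresses(message.get_all("To", []))}
    return (header_from == envelope.sending_user
            and set(envelope.recipients) <= header_to)


# Example data (made up): the envelope and the header name different people.
envelope = Envelope("bulk@sender.example", ["bob@example.com"])
message = message_from_string(
    "From: Alice <alice@example.org>\n"
    "To: Carol <carol@example.net>\n"
    "Subject: Lunch\n"
    "\n"
    "See you at noon.\n"
)

if __name__ == "__main__":
    print(headers_agree(envelope, message))
```

A mismatch is not a protocol error, so the function only makes the difference visible; whether it should count as evidence of spam is a separate policy decision.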

To complete the basic protocol, the SendingMailServer can send four additional commands.

  EHLO

This requests that the ReceivingMailServer respond with additional (extended) commands that it supports. For the purposes of the design in this chapter, this works the same as HELO.

  VRFY

This requests that the ReceivingMailServer verify an email address. Many ReceivingMailServers do not support this command.

  RSET

This clears the Envelope. MAIL FROM: also does this.

  NOOP

This is sent just to get a response from the ReceivingMailServer.

16.1.3. State Diagram

A state diagram can clarify the sequence of states for a ReceivingMailServer ("See What Condition Your Condition Is In"). The states shown in Figure 16-2 are AWAITING_HELO, AWAITING_MAIL_FROM, AWAITING_RCPT_TO_OR_DATA, INPUT_DATA, and TERMINATE.

Figure 16-2. State transition for SMTP for the ReceivingMailServer

The transition events are Helo (a HELO command), MailFrom (a MAIL FROM: command), RcptTo (a RCPT TO: command), Data (a DATA command), Rset (an RSET command), and Unrecognized (an unrecognized command).

I was originally tempted to name these events Hello, ReceiptTo, etc., to make them real words. However, the "Use the Client's Language" and "Consistency Is Simplicity" guidelines suggested that they should be named corresponding to the commands.
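A minimal sketch of how a received command line might be turned into one of these events follows. The `Event` enumeration and the `event_for` function are hypothetical, and the case-insensitive matching is an assumption taken from SMTP practice. EHLO maps to Helo because, for this design, it works the same as HELO. QUIT, VRFY, and NOOP are not in the event list above, so in this sketch they come back as Unrecognized; a fuller version would presumably give them events of their own.

```python
from enum import Enum


class Event(Enum):
    HELO = "Helo"
    MAIL_FROM = "MailFrom"
    RCPT_TO = "RcptTo"
    DATA = "Data"
    RSET = "Rset"
    UNRECOGNIZED = "Unrecognized"


# Command text -> event, named after the commands themselves.
_COMMAND_EVENTS = [
    ("HELO", Event.HELO),
    ("EHLO", Event.HELO),  # treated the same as HELO in this design
    ("MAIL FROM:", Event.MAIL_FROM),
    ("RCPT TO:", Event.RCPT_TO),
    ("DATA", Event.DATA),
    ("RSET", Event.RSET),
]


def event_for(command_line: str) -> Event:
    """Map one command line to a transition event."""
    upper = command_line.strip().upper()
    for prefix, event in _COMMAND_EVENTS:
        if prefix.endswith(":"):
            # MAIL FROM: and RCPT TO: are followed directly by an address.
            matched = upper.startswith(prefix)
        else:
            # A bare keyword, optionally followed by a space and arguments.
            matched = upper == prefix or upper.startswith(prefix + " ")
        if matched:
            return event
    return Event.UNRECOGNIZED


if __name__ == "__main__":
    for line in ("HELO mail.example.org", "MAIL FROM:<alice@example.org>",
                 "rcpt to:<bob@example.com>", "DATA", "BOGUS"):
        print(line, "->", event_for(line).value)
```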

The INPUT_DATA state ends when the EndofData is received. No transition events are recognized within this state. Upon completion of INPUT_DATA, the state is set to AWAITING_MAIL_FROM. When the TERMINATE state is reached, the ReceivingMailServer terminates the connection to the SendingMailServer. If the amount of data received while in INPUT_DATA exceeds a maximum limit, the ReceivingMailServer might go to TERMINATE.

If an unknown command is received, the state does not change. A count of unknown commands (across all states) is kept, and when that count exceeds MAXIMUM_UNKNOWN_COMMANDS_ALLOWED, the state goes to TERMINATE.
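The rules in this section can be sketched as a table-driven state machine. Figure 16-2 is the authority on the transitions; the table below is an assumption reconstructed from the command sequence, as is the choice to leave the state unchanged when an event has no listed transition. The class name `Connection`, the method names, and the value 3 for MAXIMUM_UNKNOWN_COMMANDS_ALLOWED are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    AWAITING_HELO = auto()
    AWAITING_MAIL_FROM = auto()
    AWAITING_RCPT_TO_OR_DATA = auto()
    INPUT_DATA = auto()
    TERMINATE = auto()


class Event(Enum):
    HELO = auto()
    MAIL_FROM = auto()
    RCPT_TO = auto()
    DATA = auto()
    RSET = auto()
    UNRECOGNIZED = auto()


MAXIMUM_UNKNOWN_COMMANDS_ALLOWED = 3  # hypothetical value

# Assumed transitions, inferred from the basic command sequence.
TRANSITIONS = {
    (State.AWAITING_HELO, Event.HELO): State.AWAITING_MAIL_FROM,
    (State.AWAITING_MAIL_FROM, Event.MAIL_FROM): State.AWAITING_RCPT_TO_OR_DATA,
    (State.AWAITING_RCPT_TO_OR_DATA, Event.RCPT_TO): State.AWAITING_RCPT_TO_OR_DATA,
    (State.AWAITING_RCPT_TO_OR_DATA, Event.DATA): State.INPUT_DATA,
    (State.AWAITING_MAIL_FROM, Event.RSET): State.AWAITING_MAIL_FROM,
    (State.AWAITING_RCPT_TO_OR_DATA, Event.RSET): State.AWAITING_MAIL_FROM,
}


class Connection:
    """Tracks the state of one connection from a SendingMailServer."""

    def __init__(self) -> None:
        self.state = State.AWAITING_HELO
        self.unknown_commands = 0  # counted across all states

    def handle(self, event: Event) -> State:
        if self.state in (State.INPUT_DATA, State.TERMINATE):
            # No transition events are recognized in these states.
            return self.state
        if event is Event.UNRECOGNIZED:
            self.unknown_commands += 1
            if self.unknown_commands > MAXIMUM_UNKNOWN_COMMANDS_ALLOWED:
                self.state = State.TERMINATE
            return self.state
        # Assumption: an event with no listed transition changes nothing.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

    def end_of_data(self) -> State:
        """Called when EndofData arrives while in INPUT_DATA."""
        if self.state is State.INPUT_DATA:
            self.state = State.AWAITING_MAIL_FROM
        return self.state
```

Keeping the transitions in a dictionary means the table can be compared line by line against the state diagram, and a correction to the diagram becomes a one-line change.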
