1.1 Mail Basics

The Internet's SMTP mail delivers a message from a sender to one or more recipients. The sender and recipients are usually people, but may also be mailing lists or other software agents. From the point of view of the mail system, the sender and each recipient are addresses. The message is a sequence of lines of text. (RFC 2821 uses the word "mailbox" as a synonym for "address" and "content" for the message.)

1.1.1 Addresses

All email addresses have the simple form local-part@domain. The domain, the part after the at-sign, indirectly identifies a host to which mail should be delivered (although the host rarely has the same name as the domain). The local-part, the part before the at-sign, identifies a mailbox within that domain.

The set of valid domains is maintained by the Internet's Domain Name System (DNS). Every domain is a sequence of names separated by dots, such as example.com. The names in email domains consist of letters, digits, and hyphens. (If current efforts to internationalize domain names ever settle down, the set of valid characters will probably become larger.)

The local-part is interpreted only by the host that handles the address's domain. In principle, the local-part can contain any characters other than an at-sign and angle brackets, but in practice, it is usually limited to letters, digits, and a small set of punctuation such as dots, hyphens, and underscores. Upper- and lowercase letters are equivalent in domains. It's up to the receiving mail host whether upper- and lowercase are equivalent in local-parts, although most mail software, including qmail, treats them as equivalent.
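The address rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from qmail; the function names split_address, is_plausible_domain, and normalize_address are hypothetical. It lowercases the domain, because case never matters there, and leaves the local-part alone, because only the receiving host can say whether case matters in it.

```python
import re

# One domain name component: letters, digits, and hyphens only.
LABEL = re.compile(r"[A-Za-z0-9-]+")


def split_address(address: str) -> tuple[str, str]:
    """Split local-part@domain at the last at-sign."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not of the form local-part@domain: {address!r}")
    return local_part, domain


def is_plausible_domain(domain: str) -> bool:
    """Check that every dot-separated name uses only the allowed characters.

    This is a syntax check only; it does not ask the DNS whether the
    domain actually exists.
    """
    return all(LABEL.fullmatch(label) for label in domain.split("."))


def normalize_address(address: str) -> str:
    """Lowercase the domain and leave the local-part untouched."""
    local_part, domain = split_address(address)
    return f"{local_part}@{domain.lower()}"
```

For example, normalize_address("Fred.Smith@Example.COM") gives Fred.Smith@example.com, with the capitalization of the local-part preserved.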

Addresses appear in two different contexts: in the "envelope" data that is part of an SMTP transaction, defined by RFC 2821, or in the header of a message, defined by RFC 2822. In an SMTP envelope, addresses are always enclosed in angle brackets and do not use quoting characters or permit comments. In message headers, the address syntax is considerably more flexible. An address like "Fred.Smith"@example.com (Fred Smith) is valid in message headers but not in SMTP. (The form Fred.Smith@example.com is valid in either.)[2]

[2] Sendmail has often confused the two address contexts and has accepted message header formats in SMTP, both causing and masking a variety of bugs.

1.1.2 Envelopes

Every message handled by SMTP has an envelope containing the addresses of the sender and recipients. Often the envelope addresses match the addresses in the To: and From: headers in the message, but they don't have to match. There are plenty of legitimate reasons why they might not.

The envelope sender address is primarily used as the place to send failure reports (usually called bounce messages) if the message can't be delivered. If the sender address is null (usually written in angle brackets as <>), any failure reports are discarded. Bounce messages are sent with null envelope senders to avoid mail loops if the bounce message can't be delivered. The sender address doesn't affect normal mail delivery.
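The bounce rule can be written as a tiny decision function. This is a sketch under stated assumptions, not how any particular MTA implements it: the name bounce_envelope is hypothetical, and the null sender is represented as an empty string.

```python
NULL_SENDER = ""  # assumption: the null sender <> is modeled as an empty string


def bounce_envelope(original_sender: str):
    """Return the (sender, recipients) envelope for a failure report.

    Returns None when the original message had a null sender, in which
    case the failure report is discarded rather than sent.
    """
    if original_sender == NULL_SENDER:
        return None
    # The bounce goes to the original envelope sender and itself carries
    # a null sender, so a failed bounce can never trigger another bounce.
    return NULL_SENDER, [original_sender]
```

Because a bounce always carries the null sender, feeding a bounce's own envelope sender back into the function yields None, which is what breaks the loop.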

The envelope recipient address(es) control where a message is to be delivered. Usually a message starts out with the envelope recipients matching the ones on the To: and Cc: lines, but as a message is routed through the network, the addresses change. If, for example, a message is sent to able@example.com and baker@domain.com, the copy sent to the host handling example.com will only have able's address in the envelope and the one sent to the host handling domain.com will only have baker's address. In many cases a user has an internal address that differs from the external one. For example, mail to john.q.public@example.com might be delivered to jqpublic@example.com, in which case the envelope recipient address is rewritten to the new address at the host that receives mail for the original address.
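The way envelope recipients are divided among destination hosts can be illustrated with a short Python sketch. The function name group_by_domain is hypothetical, and real MTAs make this decision per delivery attempt using DNS lookups rather than a simple dictionary; the point is only that each copy of the message carries just the recipients for its own domain, while the To: and Cc: headers stay unchanged.

```python
from collections import defaultdict


def group_by_domain(recipients):
    """Group envelope recipient addresses by (lowercased) domain.

    Each resulting group is the recipient list for the copy of the
    message sent to the host handling that domain.
    """
    groups = defaultdict(list)
    for address in recipients:
        # The domain is everything after the last at-sign.
        _, _, domain = address.rpartition("@")
        groups[domain.lower()].append(address)
    return dict(groups)
```

Given the two recipients from the example above, the result has one entry for example.com containing only able's address and one for domain.com containing only baker's.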

1.1.3 Messages

An Internet email message has a well-specified format defined in RFC 2822. The message consists of lines of text, each ended by a carriage-return line-feed pair. All of the text must be seven-bit ASCII. (The 8BITMIME extension to SMTP permits characters with the high bit set as well, but still doesn't permit arbitrary binary data. If you want to send binary material as email, you must encode it using MIME encodings.)

The first part of the message is the header. Each header line starts with a tag that says what kind of header it is, followed by a colon, usually some whitespace, and then the contents of the header line. If a header is too long to fit on one line, it can be split into multiple lines. The second and subsequent lines start with whitespace to identify them as continuations. Every message must have From: and Date: header lines, and most have other headers such as To:, Cc:, Subject:, and Received:. The contents of some headers (such as Date:) are in a strictly defined format, while the contents of others (such as Subject:) are entirely arbitrary.
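A minimal Python sketch of header parsing shows how continuation lines work. The names parse_headers and missing_required are hypothetical, and the sketch simplifies the real rules: it joins a continuation to the previous line with a single space, whereas RFC 2822 unfolding keeps the original whitespace. The input is assumed to be the header lines with their CR/LF endings already stripped.

```python
def parse_headers(header_lines):
    """Return a list of (tag, contents) pairs with continuations unfolded."""
    headers = []
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            # Leading whitespace marks a continuation of the previous header.
            if not headers:
                raise ValueError("continuation line before any header")
            tag, contents = headers[-1]
            headers[-1] = (tag, contents + " " + line.strip())
        else:
            tag, colon, contents = line.partition(":")
            if not colon:
                raise ValueError(f"header line without a colon: {line!r}")
            headers.append((tag, contents.strip()))
    return headers


def missing_required(headers):
    """List which of the mandatory From: and Date: headers are absent."""
    present = {tag.lower() for tag, _ in headers}
    return [tag for tag in ("From", "Date") if tag.lower() not in present]
```

A header split as "Subject: a long" followed by "  subject line" comes back as the single pair ("Subject", "a long subject line").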

Some mail programs are more careful than others to create correct headers. (Many, for example, put invalid time zones in Date: headers.) Qmail is quite careful when it creates headers at the time a new message is injected into the mail system, but doesn't look at or change message headers on messages that are transported through the system. The only change it makes to existing messages is to add Received: and Delivered-To: headers at the top, to chronicle the message's path through the system.

The headers are separated from the body of the message by an empty line. The body can contain any arbitrary text, subject to a rarely enforced limit of 998 characters per line. The message must end with CR/LF, that is, no partial line at the end.
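The overall layout of a message can be checked with a few lines of Python. This is a sketch with hypothetical function names, operating on the raw bytes of a message that already uses CR/LF line endings; it is not how qmail processes messages.

```python
def split_message(raw: bytes):
    """Split a message into (header_lines, body) at the first empty line."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        # No empty line at all: treat everything as header, with no body.
        head, body = raw, b""
    return head.split(b"\r\n"), body


def overlong_lines(raw: bytes, limit: int = 998):
    """Return the 1-based numbers of lines longer than the limit.

    The length counted here excludes the terminating CR/LF.
    """
    return [
        number
        for number, line in enumerate(raw.split(b"\r\n"), start=1)
        if len(line) > limit
    ]


def ends_with_complete_line(raw: bytes) -> bool:
    """True if the message has no partial line at the end."""
    return raw.endswith(b"\r\n")
```

Since the 998-character limit is rarely enforced, a checker like overlong_lines is more useful for reporting than for rejecting mail.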

1.1.4 Lines

Every line in a message must end with CR/LF, the two hex bytes 0D 0A. This simple-sounding requirement has caused a remarkable amount of confusion and difficulty over the years. Different computer operating systems use different conventions for line endings. Some use CR/LF, including all of Microsoft's systems and a string of predecessors from CP/M to the 1960s-era TOPS-10. Unix and Unix-like systems use LF. Macintoshes use CR, just to be different.

Regardless of the local line-ending convention, messages sent and received via SMTP have to use CR/LF, and the MTA has to translate from local to CR/LF when sending mail and back from CR/LF to local when receiving mail. Unfortunately, a common bug in some MTAs has been to forget to make this translation, typically sending bare LFs rather than CR/LF. Furthermore, RFC 822 said nothing about what a bare CR or LF in a mail message means. Some MTAs (sendmail, notably) treat a bare LF the same as CR/LF. Others treat it as any other data character. Qmail rejects incoming SMTP mail containing a bare CR or LF on the theory that it's impossible to tell what the sender's intent was, and RFC 2822 agrees with qmail that a bare CR or LF is forbidden. (It's easy enough to tweak qmail's SMTP daemon to accept bare LF, of course, if you really want to. See Chapter 6.)
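Both the translation and the bare CR or LF check are easy to express in Python. The following is a sketch with hypothetical function names, assuming a Unix-style local convention of LF; it is not qmail's implementation, which is written in C.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_bare_cr_or_lf(data: bytes) -> bool:
    """True if the data contains a CR or LF that is not part of a CR/LF pair."""
    return BARE_CR_OR_LF.search(data) is not None


def local_to_wire(text: bytes) -> bytes:
    """Translate local LF line endings to CR/LF for sending.

    Only LFs that are not already preceded by CR are changed, so text
    that is already in CR/LF form passes through unaltered.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", text)


def wire_to_local(data: bytes) -> bytes:
    """Translate CR/LF line endings back to local LF after receiving."""
    return data.replace(b"\r\n", b"\n")
```

An MTA that forgets the local_to_wire step sends exactly the kind of data that has_bare_cr_or_lf flags.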


