D. J. Bernstein
Internet mail
Internet mail message header format

Tokenizable field values

A field value may be structured as a series of tokens, comments, spaces, and tabs.

The semantics of a tokenizable field value depend only on its tokens. A writer can freely insert comments, spaces, and tabs between tokens. For example, some writers will insert spaces inside long addresses:

     To: cryptographic-cookie-940d3af4f1357d203c7afd5162e9d06e
         @heaven.af.mil
Readers should identify tokens as discussed below, and ignore spaces and comments during parsing.

Unfortunately, many readers use ad-hoc parsers that do not extract tokens correctly. It is a bad idea to put spaces, tabs, or comments at unusual locations. 822bis has a huge number of new rules prohibiting or discouraging various spaces, tabs, and comments.

Tokens

There are three types of tokens:

An atom is a string of one or more characters terminated by the end of the field value or by any of these characters: space, tab, @, <, >, [, left parenthesis, comma, semicolon, colon, dot, double quote. (Note that adjacent atom tokens must be separated by at least one comment, space, or tab.) An atom represents itself as a string. For example,
     heaven
is an atom representing the 6-byte string "heaven". A string containing an unusual character, such as space or semicolon, cannot be encoded as an atom. (822 prohibits all control characters, byte 127 and bytes 0 through 31, in atoms, as well as ], backslash, and right parenthesis.)
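
The atom rule can be sketched as a scan that stops at the first terminating character. This is a minimal illustration, not a complete parser: the names ATOM_TERMINATORS and scan_atom are hypothetical, and the sketch does not reject the characters that 822 prohibits in atoms.

```python
# Characters that terminate an atom, taken from the list in the text:
# space, tab, @, <, >, [, left parenthesis, comma, semicolon, colon,
# dot, double quote.
ATOM_TERMINATORS = set(' \t@<>[(,;:."')


def scan_atom(value, start=0):
    """Return (atom, end), where end is the index just past the atom.

    The atom is the empty string if value[start] is already a terminator.
    This sketch does not check for characters that 822 prohibits in atoms.
    """
    end = start
    while end < len(value) and value[end] not in ATOM_TERMINATORS:
        end += 1
    return value[start:end], end
```

For example, scan_atom("heaven.af.mil") returns ("heaven", 6): the scan stops at the first dot.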

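A quoted string is a double quote, zero or more quoted string chunks, and another double quote. A quoted string chunk represents a single character; it can be a backslash followed by any character, representing that character, or any character other than a backslash or double quote, representing itself.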

A quoted string represents the concatenation of the characters represented by the quoted string chunks. For example,
     "heaven"
and
     "h\e\ave\n"
are two quoted strings, each representing the 6-byte string "heaven"; and
     "\\\\\\"
is a quoted string representing three backslashes. Any string can be encoded as a quoted string.
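
Decoding and encoding a quoted string might look like the following sketch. It assumes the chunk rules suggested by the examples: a backslash followed by any character represents that character, and any other character except a double quote represents itself. The function names are hypothetical.

```python
def decode_quoted_string(token):
    """Return the string represented by a quoted string token."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted string")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\':
            # A backslash quotes the next character, whatever it is.
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            out.append(body[i + 1])
            i += 2
        elif ch == '"':
            raise ValueError("unquoted double quote inside quoted string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode_quoted_string(s):
    """Encode any string as a quoted string, quoting only \\ and "."""
    return '"' + "".join('\\' + c if c in '\\"' else c for c in s) + '"'
```

Both "heaven" and "h\e\ave\n" decode to the same 6-byte string, and encoding followed by decoding returns the original string.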

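A domain literal is a left bracket, zero or more domain literal chunks, and a right bracket. A domain literal chunk represents a single character; it can be a backslash followed by any character, representing that character, or any character other than a backslash or bracket, representing itself.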

A domain literal represents the concatenation of (1) a left bracket, (2) the characters represented by the domain literal chunks, and (3) a right bracket. For example,
     [127.0.0.1]
and
     [\1\2\7\.\0\.\0\.\1]
are two domain literals, each representing the 11-byte string "[127.0.0.1]". Any string starting with [ and ending with ] can be encoded as a domain literal.
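
A sketch of domain literal decoding, under the assumption (suggested by the examples) that a backslash followed by any character represents that character and that other characters represent themselves. Unlike a quoted string, the represented string keeps the surrounding brackets. The function name is hypothetical.

```python
def decode_domain_literal(token):
    """Return the string represented by a domain literal token.

    The result includes the enclosing brackets, as described in the text.
    """
    if len(token) < 2 or token[0] != '[' or token[-1] != ']':
        raise ValueError("not a domain literal")
    body = token[1:-1]
    out = ['[']
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\':
            # A backslash quotes the next character.
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            out.append(body[i + 1])
            i += 2
        elif ch in '[]':
            raise ValueError("unquoted bracket inside domain literal")
        else:
            out.append(ch)
            i += 1
    out.append(']')
    return "".join(out)
```

Both example literals decode to the same 11-byte string, brackets included.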

Several clients (reportedly: AMS and various IMAP servers) cannot handle domain literals containing colons:

     [FF02::3492:A98F]

Comments

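A comment is a left parenthesis, zero or more comment chunks, and a right parenthesis. A comment chunk can be a backslash followed by any character, a nested comment, or any character other than a backslash or parenthesis. Some examples of comments: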
     (D. J. Bernstein)
     (comment (nested (deeply)) (and (oh no!) again))
     (\)\\)
     (by way of Whatever <redir@my.org>)    (generated by Eudora)
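
Because comments nest, a reader cannot simply search for the next right parenthesis. The following sketch tracks nesting depth and treats a backslash as quoting the next character, which is what the examples above suggest. The function name skip_comment is hypothetical.

```python
def skip_comment(value, start=0):
    """Return the index just past the comment that opens at value[start]."""
    if start >= len(value) or value[start] != '(':
        raise ValueError("no comment starts here")
    depth = 0
    i = start
    while i < len(value):
        ch = value[i]
        if ch == '\\':
            # The quoted character never opens or closes a comment.
            i += 2
            continue
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")
```

Applied to the last example line, skip_comment stops after the first comment; the spaces and the second comment remain for the caller to handle.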

Examples

The field value
     ":sysmail"@  group. org, Muhammed.(the greatest) Ali @(the)Vegas.WBA
contains
  1. token: quoted string representing the 8-byte string ":sysmail"
  2. token: at sign
  3. space
  4. space
  5. token: atom representing the 5-byte string "group"
  6. token: dot
  7. space
  8. token: atom representing the 3-byte string "org"
  9. token: comma
  10. space
  11. token: atom representing the 8-byte string "Muhammed"
  12. token: dot
  13. comment (the greatest)
  14. space
  15. token: atom representing the 3-byte string "Ali"
  16. space
  17. token: at sign
  18. comment (the)
  19. token: atom representing the 5-byte string "Vegas"
  20. token: dot
  21. token: atom representing the 3-byte string "WBA"
This is the first example in RFC 822, but most mail-reading programs can't handle it. Pine 3.91 can't even handle
     God@heaven. af.mil
correctly; it truncates the address after the first dot.
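
The 21-item breakdown above can be reproduced with a small tokenizer. This is a sketch under stated assumptions, not a full 822 parser: it treats each of the characters @ < > , ; : . as a one-character token (the breakdown confirms this only for the at sign, dot, and comma), it assumes a backslash quotes the next character inside quoted strings, domain literals, and comments, and it does not reject characters that 822 prohibits in atoms. All names are hypothetical.

```python
SPECIALS = set('@<>,;:.')               # assumed one-character tokens
ATOM_END = SPECIALS | set(' \t[("')     # characters that terminate an atom


def scan_delimited(value, start, close, nests=False):
    """Return the index just past the construct opening at value[start]."""
    opener = value[start]
    depth = 1
    i = start + 1
    while i < len(value):
        ch = value[i]
        if ch == '\\':
            i += 2                      # backslash quotes the next character
            continue
        if ch == close:
            depth -= 1
            if depth == 0:
                return i + 1
        elif nests and ch == opener:
            depth += 1                  # only comments nest
        i += 1
    raise ValueError("unterminated " + opener)


def unquote(body):
    """Drop each quoting backslash, keeping the character it quotes."""
    out, i = [], 0
    while i < len(body):
        if body[i] == '\\' and i + 1 < len(body):
            i += 1
        out.append(body[i])
        i += 1
    return "".join(out)


def tokenize(value):
    """Split a field value into (kind, text) items, in order."""
    items = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch in ' \t':
            items.append(('space' if ch == ' ' else 'tab', ch))
            i += 1
        elif ch == '(':
            end = scan_delimited(value, i, ')', nests=True)
            items.append(('comment', value[i:end]))
            i = end
        elif ch == '"':
            end = scan_delimited(value, i, '"')
            items.append(('quoted string', unquote(value[i + 1:end - 1])))
            i = end
        elif ch == '[':
            end = scan_delimited(value, i, ']')
            body = unquote(value[i + 1:end - 1])
            items.append(('domain literal', '[' + body + ']'))
            i = end
        elif ch in SPECIALS:
            items.append(('special', ch))
            i += 1
        else:
            end = i
            while end < len(value) and value[end] not in ATOM_END:
                end += 1
            items.append(('atom', value[i:end]))
            i = end
    return items


def tokens_only(value):
    """Return only what carries meaning: the strings the tokens represent."""
    return [text for kind, text in tokenize(value)
            if kind not in ('space', 'tab', 'comment')]
```

Calling tokenize on the RFC 822 example yields 21 items in the order listed above. tokens_only drops the spaces and comments, so "God@heaven. af.mil" and "God@heaven.af.mil" give the same token list, which is the behavior a reader is expected to have.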