D. J. Bernstein
Internet publication
FTP: File Transfer Protocol

Files, usernames, and pathnames

An FTP server provides access to a collection of files. Each file is identified by a server-defined username and a server-defined pathname.

Many servers provide public files under the standard username anonymous. Most of these servers demand a password, but allow any password that ends with @. The Netscape FTP client uses the password mozilla@, for example.
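The anonymous-login convention amounts to a small predicate. This is a minimal sketch, not any particular server's code. The function name `anonymous_login_ok` is hypothetical, and the sketch assumes the username is compared exactly.

```python
def anonymous_login_ok(username: str, password: str) -> bool:
    # Hypothetical check: only the standard public username is handled here;
    # a real server would consult its own account database for other users.
    if username != "anonymous":
        return False
    # Any password that ends with "@" is accepted, e.g. Netscape's "mozilla@".
    return password.endswith("@")
```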

FTP defines three types of files: text files, binary files, and directories. Text files and binary files are collectively known as regular files.

In theory, the client needs to use different requests to retrieve different types of files. In practice, servers store all regular files internally as binary files; when the client asks for a text file, the server reads a series of lines from the binary file in the server's favorite text format, and sends the lines separated by \015\012.
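Here is a minimal sketch of that text-mode conversion. It assumes that the server's favorite text format is the UNIX one, in which each line ends with \012. The function name `text_lines_to_wire` is hypothetical. The sketch sends every line followed by \015\012, including a final line that lacks its \012 in the file. That handling of an unterminated last line is a choice made for this sketch only.

```python
def text_lines_to_wire(data: bytes) -> bytes:
    # Assumption: the stored file uses UNIX lines, each ending with \012.
    lines = data.split(b"\n")
    # A trailing \012 ends the last line rather than starting a new empty
    # one, so drop the empty piece that split() leaves after it.
    if lines and lines[-1] == b"":
        lines.pop()
    # Send each line followed by \015\012.
    return b"".join(line + b"\r\n" for line in lines)
```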

Pathnames and encoded pathnames

A pathname is any string of bytes beginning with a slash and not containing \000.
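The definition translates directly into a check on byte strings. The helper name `is_pathname` is hypothetical. A pathname may contain \012, but it may not contain \000.

```python
def is_pathname(name: bytes) -> bool:
    # A pathname begins with a slash and contains no \000 byte.
    # \012 is permitted; it is \000 that is excluded.
    return name.startswith(b"/") and b"\x00" not in name
```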

An encoded pathname is a string of bytes not containing \012. It normally represents the pathname obtained by replacing each \000 in the encoded pathname with \012. However, if it does not start with a slash, it represents the pathname obtained by concatenating the name prefix, a slash, and the encoded pathname with each \000 replaced by \012.

For example, if the name prefix is /home/joe, the encoded pathname /public represents the pathname /public; the encoded pathname tex represents the pathname /home/joe/tex; the encoded pathname ab\000c represents the pathname /home/joe/ab\012c.
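The decoding rule and the example above fit in a short function. This is a sketch, not a definitive implementation. The name `decode_pathname` is hypothetical, and the name prefix is passed in explicitly. For a relative encoded pathname, the function places exactly one slash between the prefix and the decoded name.

```python
def decode_pathname(encoded: bytes, name_prefix: bytes) -> bytes:
    # An encoded pathname never contains \012.
    if b"\n" in encoded:
        raise ValueError("encoded pathname contains \\012")
    # Each \000 in the encoded form stands for \012 in the real pathname.
    decoded = encoded.replace(b"\x00", b"\n")
    if encoded.startswith(b"/"):
        return decoded
    # Relative form: name prefix, a slash, then the decoded name.
    return name_prefix + b"/" + decoded
```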

In practice, several bytes cause problems when they appear in pathnames:

Adding further to the problems is a widespread document that recommends encoding \015 as \015\000. I strongly recommend against this; it breaks current use of \015 without fixing anything.

Pathname display

By convention, any FTP pathname that is a valid UTF-8 string is displayed as a UTF-8 string. In particular, any 7-bit FTP pathname is displayed as an ASCII string.
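A client could apply this convention by using Python's UTF-8 decoder. In this sketch, the name `display_pathname` is hypothetical. The treatment of pathnames that are not valid UTF-8 is also an assumption, because the convention above says nothing about them. The sketch shows each undecodable byte as a \xNN escape, using the standard `backslashreplace` error handler.

```python
def display_pathname(pathname: bytes) -> str:
    # Valid UTF-8, including any 7-bit ASCII pathname, decodes unchanged.
    # Assumption: bytes that are not valid UTF-8 are shown as \xNN escapes.
    return pathname.decode("utf-8", errors="backslashreplace")
```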

RFC 959 generally requires 7-bit requests and responses except in TELNET strings. This requirement is obsolete.

Some FTP servers provide access to local file collections in which file names are, by convention, displayed as ISO-8859-1. These servers can easily translate file names from ISO-8859-1 to UTF-8 before providing them as pathnames to FTP clients. When the FTP client asks for a file under a pathname p, the server can behave as follows:

  1. Go to step 4 if p is not a valid UTF-8 string, or if some of the characters in the pathname cannot be expressed in ISO-8859-1.
  2. Translate p from UTF-8 to ISO-8859-1. If the translation fails temporarily (because of, e.g., insufficient memory), print an error and stop.
  3. Attempt to find the file under the translated name. If the attempt fails temporarily, print an error and stop. If the attempt succeeds, operate on the file and stop. If the attempt fails permanently, continue with step 4.
  4. Attempt to find the file under the name p. If the attempt fails, whether temporarily or permanently, print an error and stop. If the attempt succeeds, operate on the file and stop.

This provides a reasonable level of backwards compatibility with clients that use ISO-8859-1 pathnames.
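Steps 1 to 4 above can be sketched as follows. This sketch rests on several assumptions. The exception classes `TemporaryFailure` and `PermanentFailure` are hypothetical, as is the `lookup` callback, which stands in for whatever the server uses to find a file under a byte-string name. An exception that escapes `find_file` takes the place of "print an error and stop".

```python
class TemporaryFailure(Exception):
    """Hypothetical: a lookup failed for a transient reason, such as low memory."""


class PermanentFailure(Exception):
    """Hypothetical: no file exists under the name that was tried."""


def find_file(p: bytes, lookup):
    """Follow steps 1 to 4 for the requested pathname p.

    lookup(name) is a hypothetical callback that returns the file found
    under name, or raises TemporaryFailure or PermanentFailure.
    """
    # Step 1: p must be valid UTF-8, and every character must fit in ISO-8859-1.
    try:
        text = p.decode("utf-8")
        # Step 2: translate to ISO-8859-1. MemoryError is not caught here,
        # so a temporary failure of the translation stops the request.
        translated = text.encode("iso-8859-1")
    except (UnicodeDecodeError, UnicodeEncodeError):
        translated = None  # go to step 4

    if translated is not None:
        # Step 3: try the translated name. TemporaryFailure propagates;
        # only a permanent failure moves on to step 4.
        try:
            return lookup(translated)
        except PermanentFailure:
            pass

    # Step 4: try the name p itself; any failure propagates as the error.
    return lookup(p)
```

The sketch shows why the translated name is tried first. A client that sends UTF-8 reaches files stored under ISO-8859-1 names. Names that are not valid UTF-8, or that contain characters outside ISO-8859-1, still reach files stored under those exact bytes.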