D. J. Bernstein
Internet publication
FTP: File Transfer Protocol
The LIST and NLST verbs
Easily Parsed LIST Format
An EPLF response to LIST is a series of lines,
each line specifying one file.
Each line contains
- a plus sign (\053);
- a series of facts about the file;
- a tab (\011);
- an abbreviated pathname; and
- \015\012.
Note that the terminating \015\012 does not depend on the
binary flag.
Each fact is zero or more bytes of information,
terminated by a comma and not containing any tabs
(or, since the comma is the terminator, any commas).
Facts may appear in any order.
Each fact appears at most once.
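The fact syntax above can be walked with a few lines of C.
The following is a minimal sketch, not part of EPLF itself:
the name eplf_next_fact and its interface are hypothetical,
and it assumes the line has already been checked for the leading plus sign.

```c
#include <stddef.h>

/*
 * Hypothetical helper: step through the facts of one EPLF line.
 * *pos must initially point just past the leading '+'.
 * On success, stores the start and length of the next fact
 * (without its terminating comma), advances *pos past the comma,
 * and returns 1.
 * Returns 0 at the tab that introduces the pathname,
 * or if a fact is missing its comma; *pos is then left unchanged.
 */
int eplf_next_fact(const char **pos, const char **fact, size_t *len)
{
    const char *p = *pos;
    const char *q = p;

    if (*p == '\t' || *p == '\0') return 0;      /* no more facts */
    while (*q != ',') {
        if (*q == '\t' || *q == '\0') return 0;  /* fact lacks its comma */
        q++;
    }
    *fact = p;
    *len = (size_t)(q - p);
    *pos = q + 1;
    return 1;
}
```

Because facts may appear in any order,
a caller would typically loop until the function returns 0
and dispatch on the first bytes of each fact.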
Facts have the general format xy,
where x is one of the following strings:
- r:
If this file's pathname is supplied as a RETR parameter,
the RETR may succeed.
The server is required to use an empty y.
The server must supply this fact
unless it is confident that
(because of file type problems, permission problems, etc.)
there's no point in the client sending RETR.
Mirroring clients can save time
by issuing RETR requests only for files where this fact is supplied.
The presence of r does not guarantee that RETR will succeed:
for example, the file may be removed or renamed,
or the RETR may suffer a temporary failure.
- /:
If this file's pathname is supplied as a CWD parameter,
the CWD may succeed.
The server is required to use an empty y.
As with r,
the server must supply this fact
unless it is confident that there's no point in the client sending CWD.
Indexing clients can save time
by issuing CWD requests only for files where this fact is supplied.
The presence of / does not guarantee that CWD will succeed.
- s:
The size of this file is y.
The server is required to provide a sequence of one or more ASCII digits
in y, specifying a number.
If the file is retrieved as a binary file
and is not modified, it will contain exactly y bytes.
This fact is optional;
it should not be supplied for files that can never be retrieved,
or for files whose size is constantly changing.
Clients can use this fact to preallocate space.
- m:
This file was last modified at y.
The server is required to provide a sequence of one or more ASCII digits
in y,
specifying a number of seconds, real time,
since the UNIX epoch at the beginning of 1970 GMT.
This fact is optional;
it should not be supplied by servers that do not know the time in GMT,
and it should not be supplied
for files that have been modified more recently than one minute ago.
(It also cannot be supplied for files last modified before 1970.)
Mirroring clients can save time
by skipping files whose modification time has not changed
since the previous mirror.
- i:
This file has identifier y.
If two files on the same FTP server
(not necessarily in the same LIST response)
have the same identifier,
then they have the same contents:
for example,
a RETR of each file will produce the same results,
and a CWD to each file will produce the same results
in a subsequent RETR or LIST.
(Under UNIX, for example,
the server could use dev.ino as an identifier,
where dev and ino
are the device number and inode number of the file
as returned by stat().
Note that lstat() is not a good idea for FTP directory listings,
since it describes a symbolic link itself
rather than the file the link points to.)
Indexing clients can use this fact
to avoid searching the same directory twice;
mirroring clients can use this fact
to avoid retrieving the same file twice.
This fact is optional,
but high-quality servers will always supply it at least for directories
so that indexing programs can avoid CWD loops.
- up:
The client may use SITE CHMOD
to change the UNIX permission bits
of this file.
The server must provide three ASCII digits in y, in octal,
showing the current permission bits.
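On the server side, producing a line from these facts is mostly string formatting.
The following is a minimal sketch under stated assumptions:
the struct eplf_entry and the function eplf_format are hypothetical names,
the server is assumed to have decided already which facts apply,
and the local character set is assumed to be ASCII
so that \t, \r and \n are \011, \015 and \012.
The up fact is omitted for brevity.

```c
#include <stdio.h>

/* Hypothetical description of one file, filled in by the server. */
struct eplf_entry {
    int can_retr;            /* nonzero: supply the r fact */
    int can_cwd;             /* nonzero: supply the / fact */
    int have_size;           /* nonzero: supply the s fact */
    unsigned long size;
    int have_mtime;          /* nonzero: supply the m fact */
    unsigned long mtime;     /* seconds since the beginning of 1970 GMT */
    int have_id;             /* nonzero: supply the i fact */
    unsigned long dev, ino;  /* for example from stat() under UNIX */
    const char *name;        /* abbreviated pathname */
};

/*
 * Write one EPLF line for e into buf, including the final \015\012.
 * Returns the number of bytes written (not counting the trailing \0),
 * or -1 if buf is too small.
 */
int eplf_format(char *buf, size_t buflen, const struct eplf_entry *e)
{
    char facts[128];         /* large enough for five facts of 64-bit numbers */
    size_t n = 0;
    int k;

    facts[0] = '\0';
    if (e->have_id)
        n += (size_t)sprintf(facts + n, "i%lu.%lu,", e->dev, e->ino);
    if (e->have_mtime)
        n += (size_t)sprintf(facts + n, "m%lu,", e->mtime);
    if (e->can_cwd)
        n += (size_t)sprintf(facts + n, "/,");
    if (e->can_retr)
        n += (size_t)sprintf(facts + n, "r,");
    if (e->have_size)
        n += (size_t)sprintf(facts + n, "s%lu,", e->size);

    k = snprintf(buf, buflen, "+%s\t%s\r\n", facts, e->name);
    if (k < 0 || (size_t)k >= buflen) return -1;
    return k;
}
```

The order of the facts is arbitrary here, since clients must accept any order.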
Further facts may be defined in the future
to support other client operations.
Servers are permitted to use new facts;
clients must skip unrecognized facts.
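The rule that clients must skip unrecognized facts can be sketched as follows.
This is an illustration only:
struct eplf_facts, eplf_digits and eplf_parse are hypothetical names,
the line is assumed to have had its \015\012 replaced by \000,
and overflow of unsigned long is not checked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the facts this client understands. */
struct eplf_facts {
    int retr;                 /* r fact present */
    int cwd;                  /* / fact present */
    int have_size;            /* s fact present and numeric */
    unsigned long size;
    int have_mtime;           /* m fact present and numeric */
    unsigned long mtime;
    const char *id;           /* start of the i fact's y, or NULL */
    size_t idlen;             /* number of bytes in the identifier */
    const char *name;         /* abbreviated pathname */
};

/* Parse the decimal digits in s[0..len-1]; returns 0 if empty or non-numeric. */
static int eplf_digits(const char *s, size_t len, unsigned long *out)
{
    unsigned long v = 0;
    size_t i;

    if (len == 0) return 0;
    for (i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return 0;
        v = v * 10 + (unsigned long)(s[i] - '0');
    }
    *out = v;
    return 1;
}

/* Returns 1 and fills in *f on success, 0 if the line is not EPLF. */
int eplf_parse(const char *line, struct eplf_facts *f)
{
    const char *p, *comma;
    size_t len;

    memset(f, 0, sizeof *f);
    f->id = NULL;
    f->name = NULL;
    if (*line != '+') return 0;
    p = line + 1;
    while (*p != '\t') {
        comma = p;
        while (*comma != ',') {
            if (*comma == '\t' || *comma == '\0') return 0;
            comma++;
        }
        len = (size_t)(comma - p);
        if (len == 1 && *p == 'r') f->retr = 1;
        else if (len == 1 && *p == '/') f->cwd = 1;
        else if (len > 1 && *p == 's')
            f->have_size = eplf_digits(p + 1, len - 1, &f->size);
        else if (len > 1 && *p == 'm')
            f->have_mtime = eplf_digits(p + 1, len - 1, &f->mtime);
        else if (len >= 1 && *p == 'i') { f->id = p + 1; f->idlen = len - 1; }
        /* anything else, including up and future facts, is skipped */
        p = comma + 1;
    }
    f->name = p + 1;
    return 1;
}
```

A fact this client does not know simply matches none of the branches,
so a server that adds new facts does not break it.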
Examples
Here is a typical EPLF response,
with \011 displayed as a space,
and with \015\012 at the end of each line:
+i8388621.48594,m825718503,r,s280, djb.html
+i8388621.50690,m824255907,/, 514
+i8388621.48598,m824253270,r,s612, 514.html
An EPLF-aware client (in the Pacific time zone)
might display the following human-readable listing:
                 Tue Feb 13 15:58:27 1996 514/
       612 bytes Tue Feb 13 15:14:30 1996 514.html
       280 bytes Fri Mar  1 14:15:03 1996 djb.html
More examples:
+/,m824255907,i!#@$%^&*(), 514
+r,up644, This file name has spaces, commas, etc.
+up000, secret
Sample code
The following C function takes
a pointer to a string containing one line of an EPLF response.
It assumes that the original response did not contain \000,
and that the trailing \015\012 has been replaced by \000.
It returns a pointer to the filename,
or 0 if the line does not appear to be an EPLF response.
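char *eplf_name(char *line)
{
  /* a line not starting with a plus sign (\053) is not EPLF */
  if (*line != 43) return 0;
  /* the filename starts just after the first tab (\011) */
  while (*line) if (*line++ == 9) return line;
  return 0;
}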
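The assumption that each \015\012 has been replaced by \000
can be met by a small line splitter.
The following is a sketch only:
eplf_next_line is a hypothetical name,
and the whole response (or at least the complete lines of it)
is assumed to be in one writable buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper: buf holds len bytes of a LIST response.
 * Starting at offset *off, find the next complete line,
 * overwrite the \015 of its \015\012 with \000,
 * advance *off past the line, and return a pointer to the line.
 * Returns 0 when no complete line remains.
 */
char *eplf_next_line(char *buf, size_t len, size_t *off)
{
    size_t i;
    char *start;

    for (i = *off; i + 1 < len; i++) {
        if (buf[i] == '\015' && buf[i + 1] == '\012') {
            start = buf + *off;
            buf[i] = '\0';
            *off = i + 2;
            return start;
        }
    }
    return 0;
}
```

Each returned pointer can then be handed to a function such as eplf_name.
Bytes after the last complete line are left untouched,
so a caller reading from the network can keep them until more data arrives.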
The following C function takes
a pointer to a string containing one line of an EPLF response as above,
and prints a human-readable listing.
It assumes that the local character set is ASCII,
that file modification times fit into a local time_t,
and that file sizes fit into a local unsigned long.
It also assumes that
time_t is interpreted as a number of seconds since the beginning of 1970 GMT.
(A more portable function could use mktime()
to discover the time_t representation of 1970 GMT.)
Note that its output is not machine-readable,
since the file name might contain the local newline sequence.
#include <stdio.h>
#include <time.h>

int eplf_readable(char *line)
{
  int flagcwd = 0;
  time_t when = 0;
  int flagsize = 0;
  unsigned long size = 0;

  if (*line++ != '+') return 0;
  while (*line)
    switch (*line) {
      case '\t':
        /* end of facts: print size, time, and name in fixed-width columns */
        if (flagsize) printf("%10lu bytes ",size);
        else printf("                 ");
        if (when) printf("%24.24s",ctime(&when));
        else printf("                        ");
        printf(" %s%s\n",line + 1,flagcwd ? "/" : "");
        return 1;
      case 's':
        flagsize = 1;
        size = 0;
        /* stops at the comma, which the default case then consumes */
        while (*++line && (*line != ','))
          size = size * 10 + (*line - '0');
        break;
      case 'm':
        while (*++line && (*line != ','))
          when = when * 10 + (*line - '0');
        break;
      case '/':
        flagcwd = 1;
        /* fall through to skip the rest of the fact */
      default:
        /* unrecognized fact: skip past its terminating comma */
        while (*line) if (*line++ == ',') break;
    }
  return 0;
}
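The function above trusts the server to send only digits in s and m,
and assumes that the values fit.
A client that prefers to check could use something like the following sketch;
eplf_number is a hypothetical name, and ASCII digits are assumed.

```c
#include <limits.h>

/*
 * Hypothetical helper: parse the y part of an s or m fact.
 * s points at the first byte after the fact name.
 * On success, stores the value in *out
 * and returns a pointer to the terminating comma.
 * Returns 0 if there are no digits, if a non-digit appears before the comma,
 * or if the value does not fit into an unsigned long.
 */
const char *eplf_number(const char *s, unsigned long *out)
{
    unsigned long v = 0;
    unsigned long d;

    if (*s < '0' || *s > '9') return 0;
    while (*s >= '0' && *s <= '9') {
        d = (unsigned long)(*s - '0');
        if (v > (ULONG_MAX - d) / 10) return 0;  /* v * 10 + d would overflow */
        v = v * 10 + d;
        s++;
    }
    if (*s != ',') return 0;
    *out = v;
    return s;
}
```

On failure the caller can treat the fact as unrecognized and skip it,
which keeps the rest of the line usable.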
Design principles
EPLF was designed to
- reliably communicate the information needed by clients;
- make client and server implementation as easy as possible; and
- be readable to humans,
when readability does not complicate implementations.
For example,
modification times are expressed as second counters
rather than calendar dates and times,
because second counters are much easier to generate and parse,
making it more likely that browsers will display times
in the viewer's time zone and native language.
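As an illustration of that last point,
a client can turn the m value into a local, localized string
with the standard C library alone.
This is a sketch under the same assumption as the sample code,
namely that time_t counts seconds since the beginning of 1970 GMT;
eplf_local_time is a hypothetical name.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: format the value of an m fact
 * in the viewer's time zone, using the current locale's
 * preferred date and time representation.
 * Returns the number of bytes stored in out (not counting the \0),
 * or 0 if the time cannot be converted or out is too small.
 * Assumes time_t counts seconds since the beginning of 1970 GMT.
 */
size_t eplf_local_time(unsigned long secs, char *out, size_t outlen)
{
    time_t t = (time_t)secs;
    struct tm *tm = localtime(&t);

    if (!tm) return 0;
    return strftime(out, outlen, "%c", tm);
}
```

For the native language to take effect,
the program would call setlocale(LC_TIME, "") from <locale.h> once at startup;
without that call the output is in the default "C" locale.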
I designed EPLF in March 1996.
The documentation was improved by suggestions from
Scott Schwartz and Benjamin Riefenstahl.