D. J. Bernstein
Internet publication
FTP: File Transfer Protocol

How an indexing client works

Here is how a typical FTP connection works from an indexing client.

The client wants to see what regular files and directories are available on an FTP server. It has an initial list of pathnames of directories to search.

The client connects to the FTP server, waits for the server's greeting, and sends a USER request and, if the server asks for a password, a PASS request, the same way a browser does:

     220 ftp.heaven.af.mil ready.
     USER anonymous
     331 Please identify yourself in a password.
     PASS supermirror@
     230 Thanks.

The client now removes the first pathname from its list, and asks the server to send that directory:

     PASV
     227 =10,1,2,3,10,6
     CWD /pub/software/security
     250 Okay.
     LIST
     150 Opening data connection...
     226 Data transfer complete.

The directory is transferred over a separate TCP connection. The client made that connection to the address given in the server's PASV response (here IP address 10.1.2.3, port 10*256+6 = 2566) before sending LIST.
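Here is a minimal sketch, in Python, of how a client might extract the address from a 227 response. The function name parse_pasv is hypothetical. The sketch assumes that the response contains six comma-separated decimal numbers somewhere after the code; servers differ in the text they put around those numbers.

```python
import re
from typing import Tuple

def parse_pasv(reply: str) -> Tuple[str, int]:
    """Return (host, port) from a 227 reply such as '227 =10,1,2,3,10,6'."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply")
    # Skip the reply code, then look for six numbers: h1,h2,h3,h4,p1,p2.
    match = re.search(r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})",
                      reply[3:])
    if match is None:
        raise ValueError("no address found in 227 reply")
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("address component out of range")
    h1, h2, h3, h4, p1, p2 = numbers
    # The first four numbers are the IP address; the last two are the
    # high and low bytes of the TCP port.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2
```

The client would then open a TCP connection to the returned host and port before sending LIST.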

The directory includes abbreviated pathnames of some regular files and directories. The client figures out the complete pathname for each file, records each pathname in its index, and adds each new directory pathname to its list.
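This bookkeeping might be sketched as follows. The sketch assumes that the directory has already been parsed into (abbreviated name, is-directory) pairs; parsing LIST output is not shown. The names complete_pathname and record_listing are hypothetical, and skipping the names . and .. is an assumption of this sketch.

```python
from typing import Iterable, List, Tuple

def complete_pathname(directory: str, name: str) -> str:
    """Join a directory pathname and an abbreviated name."""
    if directory.endswith("/"):
        return directory + name
    return directory + "/" + name

def record_listing(directory: str,
                   entries: Iterable[Tuple[str, bool]],
                   index: List[str],
                   pending: List[str]) -> None:
    """Record every entry in index; queue new directories in pending."""
    for name, is_directory in entries:
        if name in (".", ".."):
            continue  # assumption: these lead nowhere new, so skip them
        path = complete_pathname(directory, name)
        index.append(path)
        if is_directory and path not in pending:
            pending.append(path)
```

For example, an entry named README in /pub/software/security is recorded as /pub/software/security/README.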

The client then tries the next pathname in its list:

     PASV
     227 =10,1,2,3,10,7
     CWD /pub/software/crypto
     250 Okay.
     LIST
     150 Opening data connection...
     226 Data transfer complete.

The client repeats this process until the list is empty. Then it quits.

Directory loops

The server may have two directories, each listed in the other. To avoid searching them forever, the client keeps track of the pathnames it has already searched, and does not search those pathnames again.
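Here is a sketch of the search loop with this bookkeeping. The FTP conversation is hidden behind a hypothetical callback, list_directory, which is assumed to return (complete pathname, is-directory) pairs for one directory. The name crawl is also hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Listing = Iterable[Tuple[str, bool]]

def crawl(start_dirs: Iterable[str],
          list_directory: Callable[[str], Listing]) -> List[str]:
    """Search directories until the list is empty; return the index."""
    pending = deque(start_dirs)   # pathnames still to search
    searched = set()              # pathnames already searched
    index = []                    # every pathname found
    while pending:
        directory = pending.popleft()
        if directory in searched:
            continue              # do not search this pathname again
        searched.add(directory)
        for path, is_directory in list_directory(directory):
            index.append(path)
            if is_directory:
                pending.append(path)
    return index
```

With two directories /a and /b that list each other, the loop searches each one once and then stops.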

The server may have many pathnames for the same directory. For example, /, /pub, /pub/pub, /pub/pub/pub, and so on may all refer to the same directory; the client will find each of these names if this directory includes the abbreviated name pub. Keeping track of pathnames does not help here, because each of these pathnames is new. The client can detect this situation by recording the identifiers provided by EPLF, and skipping any directory whose identifier it has already recorded.
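Here is a sketch of that check. It assumes that each line of the directory is in EPLF: a plus sign, a series of facts each followed by a comma, a tab, and the abbreviated pathname, where a fact beginning with i carries the identifier and the fact / marks a directory. The names parse_eplf_line and is_new_directory are hypothetical.

```python
from typing import Optional, Set, Tuple

def parse_eplf_line(line: str) -> Tuple[Optional[str], bool, str]:
    """Return (identifier or None, is_directory, abbreviated name)."""
    if not line.startswith("+"):
        raise ValueError("not an EPLF line")
    facts_text, tab, name = line[1:].rstrip("\r\n").partition("\t")
    if not tab:
        raise ValueError("missing tab before name")
    identifier = None
    is_directory = False
    for fact in facts_text.split(","):
        if fact.startswith("i"):
            identifier = fact[1:]
        elif fact == "/":
            is_directory = True
    return identifier, is_directory, name

def is_new_directory(identifier: Optional[str], seen: Set[str]) -> bool:
    """Record the identifier; report whether it was seen before."""
    if identifier is None:
        return True   # no identifier given, so the client cannot tell
    if identifier in seen:
        return False  # same directory under another pathname
    seen.add(identifier)
    return True
```

The client would call is_new_directory before adding a directory pathname to its list, and skip the directory when the result is false.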