D. J. Bernstein

The io library interface

This page starts with an introduction to the io library. It continues by describing the io functions for creating descriptors, reading data, writing data, setting time limits on descriptors, waiting for ready descriptors, and using descriptors created in other ways.

Introduction to the io library

A UNIX program receives information from external sources through ``file descriptors.'' Despite the word ``file,'' these sources are not necessarily files on disk; a file descriptor might be a ``socket'' receiving information from the cnn.com web server, for example, or a ``tty'' receiving information from the user's keyboard.

Similarly, a UNIX program provides information to external sinks through file descriptors.

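The io (input/output) library manages file descriptors. It can create descriptors, read data from descriptors, write data to descriptors, set time limits on descriptors, wait for descriptors to be ready, and work with descriptors created in other ways.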

The descriptors are identified by nonnegative integers (0, 1, 2, etc.) stored in int64 variables.

For example, a program that wants to read a file from disk can use io_readfile() to create a descriptor for the file, io_tryread() repeatedly to receive information from the file, and then io_close() to eliminate the descriptor.

The following program uses the io library to copy standard input (descriptor 0) to standard output (descriptor 1), with a 65536-byte buffer in the middle:

     #include "io.h"
     
     char buf[65536];
     int64 readpos = 0;
     int64 writepos = 0;
     int flageof = 0;
     
     int main()
     {
       int64 r;
     
       if (!io_fd(0)) return 111;
       if (!io_fd(1)) return 111;
     
       for (;;) {
         if (flageof && writepos >= readpos) return 0;
     
         if (!flageof && readpos < sizeof buf) {
           r = io_tryread(0,buf + readpos,sizeof buf - readpos);
           if (r <= -2) return 111; /* read error other than EAGAIN */
           if (r == 0) flageof = 1;
           if (r > 0) readpos += r;
         }
     
         if (writepos < readpos) {
           r = io_trywrite(1,buf + writepos,readpos - writepos);
           if (r <= -2) return 111; /* write error other than EAGAIN */
           if (r > 0) {
             writepos += r;
             if (writepos == readpos) readpos = writepos = 0;
             /* if writepos is big, might want to left-shift buffer here */
           }
         }
       }
     }
io_fd makes file descriptors 0 and 1 visible to the io library. io_tryread(0,...) reads data if descriptor 0 is readable, and io_trywrite(1,...) writes data if descriptor 1 is writable. If the buffer is partially full, the program will check both descriptors, rather than delaying further reads until the current data is written.

If descriptor 0 is unreadable (or there is no space in the buffer for new data) and descriptor 1 is unwritable (or the buffer is empty), the above program busy-loops: it calls io_tryread() and io_trywrite() repeatedly, making no progress until the situation changes. Here is a smarter program that instead goes to sleep until the situation changes, leaving the CPU free for other programs:

     #include "io.h"
     
     char buf[65536];
     int64 readpos = 0;
     int64 writepos = 0;
     int flageof = 0;
     
     int main()
     {
       int64 r;
     
       if (!io_fd(0)) return 111;
       if (!io_fd(1)) return 111;
     
       for (;;) {
         if (flageof && writepos >= readpos) return 0;
     
         if (!flageof && readpos < sizeof buf) io_wantread(0);
         if (writepos < readpos) io_wantwrite(1);
         io_wait();
         if (!flageof && readpos < sizeof buf) io_dontwantread(0);
         if (writepos < readpos) io_dontwantwrite(1);
     
         if (!flageof && readpos < sizeof buf) {
           r = io_tryread(0,buf + readpos,sizeof buf - readpos);
           if (r <= -2) return 111; /* read error other than EAGAIN */
           if (r == 0) flageof = 1;
           if (r > 0) readpos += r;
         }
     
         if (writepos < readpos) {
           r = io_trywrite(1,buf + writepos,readpos - writepos);
           if (r <= -2) return 111; /* write error other than EAGAIN */
           if (r > 0) {
             writepos += r;
             if (writepos == readpos) readpos = writepos = 0;
             /* if writepos is big, might want to left-shift buffer here */
           }
         }
       }
     }
io_wait watches the descriptors previously indicated by io_wantread and io_wantwrite.

Creating descriptors

     #include "io.h"
     char s[];
     int64 d;
     io_readfile(&d,s);
io_readfile sets d to the number of a new descriptor reading from the disk file named s, and returns 1.

If something goes wrong, io_readfile sets errno to indicate the error, and returns 0; it does not create a new descriptor, and it does not touch d.


     #include "io.h"
     char s[];
     int64 d;
     io_createfile(&d,s);
io_createfile sets d to the number of a new descriptor writing to the disk file named s, and returns 1. If s already existed, it is truncated to length 0; otherwise, it is created, with mode 0600.

If something goes wrong, io_createfile sets errno to indicate the error, and returns 0; it does not create a new descriptor, and it does not touch d. (However, it may have truncated or created the file.)
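
The behavior described above can be approximated with the POSIX open() call. The following is only a sketch, not the io library's code: sketch_createfile is a hypothetical name, int64_t stands in for int64, and the descriptor is not registered with any io-style list.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper, not part of the io library.
   Opens the file named s for writing, truncating it if it exists and
   creating it with mode 0600 otherwise.
   Returns 1 and stores the descriptor number in *d on success.
   Returns 0 with errno set by open() on failure, leaving *d untouched. */
int sketch_createfile(int64_t *d, const char *s)
{
  int fd = open(s, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd == -1) return 0;
  *d = fd;
  return 1;
}
```

Note that open() itself may truncate or create the file before a later step fails, which is the reason for the parenthetical warning above.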


     #include "io.h"
     int64 d[2];
     io_pipe(d);
io_pipe creates a new UNIX ``pipe.'' The pipe can receive data and provide data; any bytes written to the pipe can then be read from the pipe in the same order.

A pipe is typically stored in an 8192-byte memory buffer; the exact number depends on the UNIX kernel. Bytes are written to the end of the buffer and read from the beginning of the buffer. Once a byte has been read, it is eliminated from the buffer, making space for another byte to be written; readers cannot ``rewind'' a pipe to read old data. Once 8192 bytes have been written to the buffer, the pipe will not be ready for further writing until some of the bytes have been read. Once all the bytes written have been read, the pipe will not be ready for further reading until more bytes are written.

io_pipe sets d[0] to the number of a new descriptor reading from the pipe, and sets d[1] to the number of a new descriptor writing to the pipe. It then returns 1 to indicate success. If something goes wrong, io_pipe returns 0, setting errno to indicate the error; in this case it frees any memory that it allocated for the new pipe, and it leaves d alone.
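
The pipe behavior described above can be observed directly with the POSIX pipe() call. This sketch does not use the io library; pipe_preserves_order and pipe_capacity are hypothetical names, and the capacity reported depends on the kernel.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write "abc" to a new pipe and read it back.
   Returns 1 if the bytes come out in the order they went in, else 0. */
int pipe_preserves_order(void)
{
  int p[2];
  char out[3];
  int ok;

  if (pipe(p) == -1) return 0;
  ok = write(p[1], "abc", 3) == 3
    && read(p[0], out, 3) == 3
    && memcmp(out, "abc", 3) == 0;
  close(p[0]);
  close(p[1]);
  return ok;
}

/* Fill a new pipe without reading from it.
   Returns the number of bytes accepted before the pipe stopped being
   ready for writing, or -1 on an unexpected error. */
long pipe_capacity(void)
{
  int p[2];
  char chunk[512];
  long total = 0;
  ssize_t w;

  if (pipe(p) == -1) return -1;
  /* Non-blocking mode makes a full pipe report EAGAIN instead of hanging. */
  if (fcntl(p[1], F_SETFL, O_NONBLOCK) == -1) {
    close(p[0]);
    close(p[1]);
    return -1;
  }
  memset(chunk, 'x', sizeof chunk);
  for (;;) {
    w = write(p[1], chunk, sizeof chunk);
    if (w > 0) { total += w; continue; }
    if (w == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) break;
    total = -1;
    break;
  }
  close(p[0]);
  close(p[1]);
  return total;
}
```

Calling pipe_capacity() on a given system shows how large the kernel's pipe buffer actually is there.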


     #include "io.h"
     int64 d;
     io_close(d);
io_close eliminates the descriptor numbered d. This usually does not mean eliminating the object that the descriptor is talking to. (For example, if a descriptor writes to a named disk file, closing the descriptor will not remove the file; it simply removes one way of writing to the file. On the other hand, a pipe disappears as soon as no descriptors refer to it.)

io_close has no return value; it always succeeds in deallocating the memory used for the descriptor. If d is not the number of a descriptor, io_close has no effect.


Reading data

     #include "io.h"
     int64 d;
     char buf[];
     int64 len;
     int64 result;
     result = io_tryread(d,buf,len);
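io_tryread tries to read len bytes of data from descriptor d into buf[0], buf[1], ..., buf[len-1]. (The effects are undefined if len is 0 or smaller.) There are several possible results. As in the example programs above, a positive result is the number of bytes read, 0 means end of input, -1 with errno set to EAGAIN means that the descriptor is not ready, and a result of -2 or smaller indicates some other error.

io_tryread does not pause waiting for a descriptor that is not ready. If you want to pause, use io_waitread or io_wait.
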
     #include "io.h"
     int64 d;
     char buf[];
     int64 len;
     int64 result;
     result = io_waitread(d,buf,len);
io_waitread tries to read len bytes of data from descriptor d into buf[0], buf[1], ..., buf[len-1], pausing if necessary so that the descriptor is ready. (The effects are undefined if len is 0 or smaller.) There are several possible results:

Writing data

     #include "io.h"
     int64 d;
     const char buf[];
     int64 len;
     int64 result;
     result = io_trywrite(d,buf,len);
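io_trywrite tries to write len bytes of data from buf[0], buf[1], ..., buf[len-1] to descriptor d. (The effects are undefined if len is 0 or smaller.) There are several possible results. As in the example programs above, a positive result is the number of bytes written, -1 with errno set to EAGAIN means that the descriptor is not ready, and a result of -2 or smaller indicates some other error.

io_trywrite does not pause waiting for a descriptor that is not ready. If you want to pause, use io_waitwrite or io_wait.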

Once upon a time, many UNIX programs neglected to check the success of their writes. They would often encounter EPIPE, and would blithely continue writing, rather than exiting with an appropriate exit code. The UNIX kernel developers decided to send a SIGPIPE signal, which terminates the process by default, along with returning EPIPE. This papers over the problem without fixing it: the same programs ignore other errors such as EIO. One hopes that the programs have been fixed by now; kernels nevertheless continue to generate the SIGPIPE signal.

The first time io_trywrite or io_waitwrite is called, it arranges for SIGPIPE to be ignored. (Technically, for SIGPIPE to be caught by an empty signal handler, so this doesn't affect child processes.) Do not use SIGPIPE elsewhere in the program.


     #include "io.h"
     int64 d;
     const char buf[];
     int64 len;
     int64 result;
     result = io_waitwrite(d,buf,len);
io_waitwrite tries to write len bytes of data from buf[0], buf[1], ..., buf[len-1] to descriptor d, pausing (perhaps repeatedly) until the descriptor is ready. (The effects are undefined if len is 0 or smaller.) There are several possible results:

Setting time limits on descriptors

     #include "io.h"
     int64 d;
     tai6464 t;
     io_timeout(d,t);
The io library keeps track of an optional ``timeout'' for each descriptor. The timeout is a specific moment in time, stored in a tai6464 variable.

io_timeout(d,t) sets the timeout for descriptor d to t.

io_timeout has no return value; it always succeeds. (Space to store the timeout was already allocated as part of the descriptor.) It has no effect if d is not the number of a descriptor.


     #include "io.h"
     int64 d;
     char buf[];
     int64 len;
     int64 result;
     result = io_tryreadtimeout(d,buf,len);
io_tryreadtimeout is identical to io_tryread, with the following exception: if the descriptor is not ready and its timeout has passed, then io_tryreadtimeout instead returns -2, with errno set to ETIMEDOUT.


     #include "io.h"
     int64 d;
     const char buf[];
     int64 len;
     int64 result;
     result = io_trywritetimeout(d,buf,len);
io_trywritetimeout is identical to io_trywrite, with the following exception: if the descriptor is not ready and its timeout has passed, then io_trywritetimeout instead returns -2, with errno set to ETIMEDOUT.
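
The timeout exception can be pictured as a small adjustment applied to the result of the plain operation. The sketch below rests on assumptions that are not stated on this page: that the exception applies when the plain operation would report "not ready" as -1, and that the descriptor's timeout has already passed. The name apply_timeout is hypothetical, and times are plain int64_t second counts rather than tai6464 values.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper, not part of the io library.
   result:      what the plain try-operation returned (-1 assumed to mean "not ready")
   now:         current time in seconds
   has_timeout: nonzero if the descriptor has a timeout set
   deadline:    the timeout, in the same units as now */
int64_t apply_timeout(int64_t result, int64_t now, int has_timeout, int64_t deadline)
{
  if (result == -1 && has_timeout && now > deadline) {
    errno = ETIMEDOUT;
    return -2;
  }
  return result; /* every other outcome passes through unchanged */
}
```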

Waiting for ready descriptors

     #include "io.h"
     int64 d;
     io_wantread(d);
     io_wantwrite(d);
     io_dontwantread(d);
     io_dontwantwrite(d);
For each descriptor, the io library keeps track of the number of parts of the current program interested in reading the descriptor. The number is incremented by io_wantread and decremented by io_dontwantread. (The effects are undefined if the number is decremented below 0.) The number is 0 when the descriptor is created. Closing a descriptor implicitly sets the number to 0.

Similar comments apply to io_wantwrite and io_dontwantwrite.

These functions have no return value; they always succeed. (Space to store the numbers was already allocated as part of the descriptor.) They have no effect if d is not the number of a descriptor.

You do not have to indicate interest in a descriptor before using the descriptor. The importance of io_wantread and io_wantwrite is their interaction with io_wait, io_waituntil, and io_check.


     #include "io.h"
     tai6464 t;
     io_wait();
     io_waituntil(t);
     io_check();
io_wait() checks the descriptors that the program is interested in to see whether any of them are ready. If none of them are ready, io_wait() tries to pause until one of them is ready, so that it does not take time away from other programs running on the same computer.

io_wait pays attention to timeouts: if a descriptor reaches its timeout, and the program is interested in reading or writing that descriptor, io_wait will return promptly.

Under some circumstances, io_wait will return even though no interesting descriptors are ready. Do not assume that a descriptor is ready merely because io_wait has returned.

io_wait is not interrupted by the delivery of a signal. Programs that expect interruption are unreliable: they will block if the same signal is delivered a moment before io_wait. The correct way to handle signals is with the self-pipe trick.

io_waituntil(t) is like io_wait() but does not wait (noticeably) past time t. io_check() is like io_wait() but does not wait at all; its importance is its interaction with io_canread and io_canwrite.


     #include "io.h"
     int64 d;
     d = io_canread();
     d = io_canwrite();
io_wait saves a list of numbers of the descriptors that are ready to be read (or that have timed out) and that the program is interested in reading. io_canread returns the next descriptor number on the list, or -1 if there are no more descriptor numbers on the list. The list is reset by the next call to io_wait.

io_waituntil and io_check interact with io_canread in the same way that io_wait does. io_wait, io_waituntil, and io_check share one list of descriptors.

Do not assume that data can actually be read merely because io_canread has pointed to a descriptor. Data that was readable a moment ago could already have been read by another program. Even worse, the low-level UNIX routines used by io_wait could have failed to allocate memory; in this case, io_canread has no choice but to indicate that all descriptors are ready. You must check the results of io_tryread.

Similar comments apply to io_canwrite.

You do not need to use io_canread and io_canwrite. You can call io_wait, try io_tryread and io_trywrite for every descriptor, and repeat. However, if you have thousands of descriptors of which only a few are ready, most of the effort of trying descriptors will be wasted. It is faster to focus on the descriptors indicated by io_canread and io_canwrite.

(The io_wait implementation and underlying UNIX kernel routines might have their own speed problems with thousands of descriptors, but this problem can be solved without any change in programs that use io_wait.)
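
The idea of a ready list can be sketched with the POSIX poll() call, which is one kernel routine of the kind mentioned above. This is an illustration under assumptions, not the io library's implementation: readable_now and MAXFDS are hypothetical names, and the sketch checks readiness once without waiting.

```c
#include <poll.h>

#define MAXFDS 64

/* Hypothetical helper: check n descriptors once, without waiting.
   Stores the numbers of the descriptors that poll() reports as readable
   (or as closed or in error, so that a read would not block) in ready[].
   Returns how many were stored, or -1 on error. */
int readable_now(const int *fds, int n, int *ready)
{
  struct pollfd x[MAXFDS];
  int count = 0;
  int i;

  if (n < 0 || n > MAXFDS) return -1;
  for (i = 0; i < n; ++i) {
    x[i].fd = fds[i];
    x[i].events = POLLIN;
    x[i].revents = 0;
  }
  if (poll(x, (nfds_t) n, 0) == -1) return -1;
  for (i = 0; i < n; ++i)
    if (x[i].revents) ready[count++] = fds[i];
  return count;
}
```

The caller then attempts reads only on the descriptors in ready[], and still has to check the result of each read, for the reasons given above.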


Using descriptors created in other ways

     #include "io.h"
     int64 d;
     io_fd(d);
There is a slight difference between the io library's view of descriptors and the UNIX view of descriptors: the io library has its own list of descriptors, and will not work with UNIX descriptors that aren't on the list. In particular, descriptors created in other ways, such as the standard input and standard output descriptors in the examples above, have to be added to the list before the io library can use them.

The io_fd function adds the UNIX descriptor numbered d to the io list, and returns 1. If the descriptor is already on the list, io_fd returns 1 without doing anything. If something goes wrong (such as running out of memory), io_fd returns 0, setting errno to indicate the error.


     #include "io.h"
     int64 d;
     io_nonblock(d);
io_nonblock puts UNIX descriptor d into ``non-blocking mode.'' Calling io_nonblock(d) before io_fd(d) saves some time in io_tryread and io_trywrite.

Actually, current UNIX kernels do not support non-blocking descriptors; they support non-blocking open files. The mode belongs to the open file, which is shared by every descriptor, in every program, that refers to it. Furthermore, many programs will break if they encounter non-blocking mode. This means that you must not use io_nonblock for a descriptor inherited from another program.

io_nonblock has no return value; it always succeeds. If d is not the number of a UNIX descriptor, io_nonblock has no effect.
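
The remark about non-blocking open files can be checked with POSIX fcntl() and dup(). In this sketch, which does not use the io library and whose function name is hypothetical, non-blocking mode is set through one descriptor and then observed through a duplicate that refers to the same open file. Descriptors inherited from another program share open files in the same way.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK set through one descriptor is visible through
   a duplicate of it, 0 if not, -1 if a system call failed. */
int nonblock_is_shared(void)
{
  int p[2];
  int copy;
  int flags;
  int result = -1;

  if (pipe(p) == -1) return -1;
  copy = dup(p[0]); /* second descriptor, same open file */
  if (copy != -1) {
    flags = fcntl(p[0], F_GETFL);
    if (flags != -1 && fcntl(p[0], F_SETFL, flags | O_NONBLOCK) != -1) {
      flags = fcntl(copy, F_GETFL); /* look through the other descriptor */
      if (flags != -1) result = (flags & O_NONBLOCK) != 0;
    }
    close(copy);
  }
  close(p[0]);
  close(p[1]);
  return result;
}
```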

If io_fd is given a descriptor in blocking mode, io_tryread and io_trywrite go through the following contortions to avoid blocking:

  1. Stop if poll says that the descriptor is not ready. Otherwise there's a good chance, but not a guarantee, that the operation will not block: even if poll says the descriptor is ready, the descriptor might not be ready a moment later. (Furthermore, poll can fail on some systems.)
  2. Catch SIGALRM. SIGALRM must not be blocked, and must not be used elsewhere in the program.
  3. Set an interval timer so that any blocking call will be interrupted by SIGALRM within 10 milliseconds. (Current UNIX kernels do not allow any shorter interval.) Of course, this may still mean a 10-millisecond delay.

If io_fd is given a descriptor in non-blocking mode (or a descriptor for a regular disk file), io_tryread and io_trywrite avoid these contortions.

Future versions of the io library may use kernel extensions to avoid these contortions.


     #include "io.h"
     int64 d;
     io_closeonexec(d);
io_closeonexec arranges for UNIX descriptor d to not be inherited by children of the current process. You should do this for any descriptor obtained from a UNIX function such as socket and open; it's an unfortunate historical accident that the UNIX functions don't do this. (If you want a descriptor to be inherited by a child process, arrange that when you create the child process.)

io_closeonexec has no return value; it always succeeds. If d is not the number of a UNIX descriptor, io_closeonexec has no effect.
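
On POSIX systems the close-on-exec property is a per-descriptor flag set with fcntl(). The following is a sketch of one way to set it, not necessarily what io_closeonexec does internally; sketch_closeonexec is a hypothetical name, and unlike io_closeonexec it reports failure so that the behavior can be checked.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: mark fd so that it is closed when the process
   runs a new program with exec. Returns 1 on success, 0 if fd is not
   an open descriptor or fcntl() fails. */
int sketch_closeonexec(int fd)
{
  int flags = fcntl(fd, F_GETFD);
  if (flags == -1) return 0;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}
```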