D. J. Bernstein
Data structures and program structures
cdb

The cdbmake and cdbdump programs

     cdbmake f ftmp
cdbmake reads a series of encoded records from its standard input and writes a constant database to f.

cdbmake ensures that f is updated atomically, so programs reading f never have to wait for cdbmake to finish. It does this by first writing the database to ftmp and then moving ftmp on top of f. If ftmp already exists, it is destroyed. The directories containing ftmp and f must be writable by cdbmake; they must also be on the same filesystem.

cdbmake always makes sure that ftmp is safely written to disk before it replaces f. If the input is in a bad format or if cdbmake has any trouble writing ftmp to disk, cdbmake complains and leaves f alone.
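The write-then-rename pattern described above can be sketched in a few lines of Python. This is only an illustration of the technique, not cdbmake's own code: the function name replace_atomically and its payload argument are hypothetical, and the sketch assumes a POSIX system where renaming within one filesystem is atomic.

```python
import os

def replace_atomically(f, ftmp, payload):
    """Write payload to ftmp, force it to disk, then rename ftmp over f.

    Hypothetical helper for illustration; it is not part of cdb.
    """
    # Opening with "wb" truncates any existing ftmp, so stale contents are lost.
    with open(ftmp, "wb") as out:
        out.write(payload)
        out.flush()
        # Ask the OS to push the file contents to disk before f is touched.
        os.fsync(out.fileno())
    # On POSIX this rename is atomic when both paths are on one filesystem,
    # so a reader opening f sees either the old file or the new one.
    os.replace(ftmp, f)
```

If the write or the fsync raises an exception, the rename never runs and f is left alone, which mirrors the failure behavior described above.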

     cdbdump
cdbdump reads a constant database from its standard input and prints the database contents, in cdbmake format, on standard output.

Record format

Records are indexed by keys. A key is a string. f is structured so that another program, starting from a key, can quickly find the relevant record. cdbmake allows several records with the same key. cdbmake and cdbdump preserve the order of records.

A record is encoded for cdbmake as +klen,dlen:key->data followed by a newline. Here klen is the number of bytes in key and dlen is the number of bytes in data. The end of the input is indicated by an extra newline after the last record. For example:

     +3,5:one->Hello
     +3,7:two->Goodbye

key and data may contain any characters, including colons, dashes, newlines, and nulls.
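Because the lengths are given up front, a reader never has to search for delimiters, which is why keys and data can hold arbitrary bytes. The following Python sketch shows one way to produce and parse this format. The names encode_records and decode_records are hypothetical helpers, not part of cdb, and the parser is a simplified illustration that holds each key and data value in memory.

```python
import io

def encode_records(records):
    """Encode (key, data) byte pairs in the cdbmake input format."""
    lines = [b"+%d,%d:%s->%s\n" % (len(k), len(d), k, d) for k, d in records]
    # An extra newline after the last record marks the end of the input.
    return b"".join(lines) + b"\n"

def _read_number(stream, terminator):
    """Read decimal digits up to the given one-byte terminator."""
    digits = b""
    while True:
        c = stream.read(1)
        if c == terminator and digits:
            return int(digits)
        if not c.isdigit():  # also catches end of file, where c == b""
            raise ValueError("bad length field")
        digits += c

def _read_exact(stream, n):
    """Read exactly n bytes or fail."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ValueError("truncated input")
    return chunk

def decode_records(stream):
    """Yield (key, data) pairs from a binary stream in cdbmake format."""
    while True:
        lead = stream.read(1)
        if lead == b"\n":  # the extra newline: end of input
            return
        if lead != b"+":
            raise ValueError("record must start with '+'")
        klen = _read_number(stream, b",")
        dlen = _read_number(stream, b":")
        key = _read_exact(stream, klen)
        if _read_exact(stream, 2) != b"->":
            raise ValueError("missing '->' after key")
        data = _read_exact(stream, dlen)
        if _read_exact(stream, 1) != b"\n":
            raise ValueError("missing newline after data")
        yield key, data
```

For instance, list(decode_records(io.BytesIO(encode_records(pairs)))) returns pairs unchanged, even when a key contains a colon, "->", a newline, or a null byte.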

Keys and data do not have to fit into memory, but cdbmake needs roughly 16 bytes of memory per record. A database cannot exceed 4 gigabytes.
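As a rough aid for planning, the two limits above can be turned into simple arithmetic. The sketch below is an assumption-laden estimate: it takes the 16-bytes-per-record figure at face value, reads "4 gigabytes" as 2^32 bytes, and counts only key and data bytes, so it ignores whatever overhead the database file itself adds. The function names are hypothetical.

```python
BYTES_PER_RECORD = 16          # rough per-record memory figure quoted above
MAX_DATABASE_BYTES = 2 ** 32   # assumption: "4 gigabytes" read as 2^32 bytes

def estimated_memory(record_count):
    """Approximate memory, in bytes, that cdbmake needs for record_count records."""
    return record_count * BYTES_PER_RECORD

def payload_exceeds_limit(records):
    """True if key and data bytes alone already exceed the size limit.

    A False result does not guarantee that the database fits, because
    file overhead is not counted here.
    """
    payload = sum(len(key) + len(data) for key, data in records)
    return payload > MAX_DATABASE_BYTES
```

Under these assumptions, one million records would need roughly 16,000,000 bytes of memory during the build.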

f is portable across machines.