D. J. Bernstein
Fast arithmetic

Integer multiplication benchmarks

These are some timings for multiplication of two b-bit integers, for various values of b, using various libraries.

This best place to discuss these timings is the BNIs mailing list.

Benchmark details

My current benchmark machine is thoth, an Athlon-900 running FreeBSD 4.5 with gcc 2.95.3. Note that the choice of compiler and compiler options can make a big difference in speed.

Timings are expressed in clock cycles. On an Athlon-900 there are approximately 900 million clock cycles per second. The Athlon has a built-in cycle counter; timing overhead is around 15 cycles.

Some older timings are for a Pentium II-400 running Linux 2.2 with gcc (for C) and egcs 2.90.29 (for C++). A few timings are for a Pentium-133 running Linux 2.0 with egcs 2.90.21.

I ran several repetitions of each multiplication, timing each repetition separately. For example, the line

     256:       1485        502        445        445        445        445 
means that six consecutive 256-bit-by-256-bit multiplications took 1485 cycles, 502 cycles, 445 cycles, 445 cycles, 445 cycles, and 445 cycles respectively. The first repetition is generally somewhat slower than the rest when it has to allocate memory, read new data into the cache, or read new code into the cache.

Note that these libraries support many more functions than integer multiplication. Measurements of multiplication speed should not be interpreted as summaries of overall performance.

A warning

One popular multiplication strategy, using complex FFTs, has a quite noticeable gap between the parameter bounds that have been proven to work and the parameter bounds that seem to work.

Some implementors, for the sake of speed, set parameters close to the edge of what works in tests. Sometimes failures are reported, and the implementors move the parameters a bit farther away. There is, of course, no guarantee that the new parameters will work.

One fix is to use hash127-type techniques to check whether the product is correct modulo a random 128-bit prime, and automatically redo the computation with safe parameters if the check fails.

Anyway, I have made no attempts to see whether the benchmarked libraries actually work.