D. J. Bernstein

Fast arithmetic
# Integer multiplication benchmarks

apfloat

BN

CLN

FreeLIP

giantint

GNU MP

libI

MIRACL

MPFUN

NTL LIP

Piologie

Zmult

These are some timings
for multiplication of two *b*-bit integers,
for various values of *b*,
using various libraries.
This best place to discuss these timings is the
BNIs mailing list.

## Benchmark details

My current benchmark machine is `thoth`,
an Athlon-900 running FreeBSD 4.5 with gcc 2.95.3.
Note that the choice of compiler and compiler options
can make a big difference in speed.
Timings are expressed in clock cycles.
On an Athlon-900 there are approximately 900 million clock cycles per second.
The Athlon has a built-in cycle counter;
timing overhead is around 15 cycles.

Some older timings are for a Pentium II-400 running Linux 2.2
with gcc 2.7.2.3 (for C) and egcs 2.90.29 (for C++).
A few timings are for a Pentium-133 running Linux 2.0 with egcs 2.90.21.

I ran several repetitions of each multiplication,
timing each repetition separately.
For example, the line

256: 1485 502 445 445 445 445

means that six consecutive 256-bit-by-256-bit multiplications
took 1485 cycles, 502 cycles, 445 cycles, 445 cycles, 445 cycles,
and 445 cycles respectively.
The first repetition is generally somewhat slower than the rest
when it has to allocate memory,
read new data into the cache, or
read new code into the cache.
Note that these libraries
support many more functions than integer multiplication.
Measurements of multiplication speed
should not be interpreted as summaries of overall performance.

## A warning

One popular multiplication strategy,
using
complex FFTs,
has a quite noticeable gap
between the parameter bounds that have been *proven* to work
and the parameter bounds that *seem* to work.
Some implementors, for the sake of speed,
set parameters close to the edge of what works in tests.
Sometimes failures are reported,
and the implementors move the parameters a bit farther away.
There is, of course, no guarantee
that the new parameters will work.

One fix is to use
hash127-type techniques
to check whether the product is correct
modulo a random 128-bit prime,
and automatically redo the computation
with safe parameters if the check fails.

Anyway,
I have made no attempts to see
whether the benchmarked libraries actually work.