This best place to discuss these timings is the BNIs mailing list.
Timings are expressed in clock cycles. On an Athlon-900 there are approximately 900 million clock cycles per second. The Athlon has a built-in cycle counter; timing overhead is around 15 cycles.
Some older timings are for a Pentium II-400 running Linux 2.2 with gcc 188.8.131.52 (for C) and egcs 2.90.29 (for C++). A few timings are for a Pentium-133 running Linux 2.0 with egcs 2.90.21.
I ran several repetitions of each multiplication, timing each repetition separately. For example, the line
256: 1485 502 445 445 445 445means that six consecutive 256-bit-by-256-bit multiplications took 1485 cycles, 502 cycles, 445 cycles, 445 cycles, 445 cycles, and 445 cycles respectively. The first repetition is generally somewhat slower than the rest when it has to allocate memory, read new data into the cache, or read new code into the cache.
Note that these libraries support many more functions than integer multiplication. Measurements of multiplication speed should not be interpreted as summaries of overall performance.
Some implementors, for the sake of speed, set parameters close to the edge of what works in tests. Sometimes failures are reported, and the implementors move the parameters a bit farther away. There is, of course, no guarantee that the new parameters will work.
One fix is to use hash127-type techniques to check whether the product is correct modulo a random 128-bit prime, and automatically redo the computation with safe parameters if the check fails.
Anyway, I have made no attempts to see whether the benchmarked libraries actually work.