Fast arithmetic

djbfft

Unfortunately, **the Frigo-Johnson benchmarks are highly deceptive**:

- FFTW has never been competitive with the state of the art on the Pentium or Pentium MMX. Frigo and Johnson did not include any Pentium-optimized libraries in their original benchmarks; they removed the Pentium and Pentium MMX from their benchmark pages in 1998.
- djbfft (starting with version 0.60 in 1997) is faster than FFTW on the Pentium Pro, Pentium II, and Pentium III. However, when Frigo and Johnson added djbfft to their benchmarks in 1998, they **slowed it down by changing the compiler options despite my explicit instructions**. I have asked them repeatedly to remove djbfft from their benchmarks if they aren't willing to follow the installation instructions; they have ignored my requests.
- djbfft is a work in progress; version 0.70 was released in 1998, and version 0.75 in 1999. The current version is substantially faster than FFTW on the UltraSPARC. (In fact, as far as I know, the current version is competitive with FFTW on every common CPU.) But Frigo and Johnson report results for djbfft 0.60. I have asked them repeatedly to remove djbfft from their benchmarks if they aren't willing to keep up with the latest version; they have ignored my requests.

The FFTW benchmark methodology has several other problems:

- For complex FFTs, the authors benchmark either the forward transform or the inverse transform, and assume that the other transform runs at the same speed. This is not always true for in-order transforms (including FFTW), and it is rarely true for out-of-order transforms.
- For real FFTs, the authors benchmark the forward transform plus the inverse transform. However, in many applications, the forward transform is invoked more often than the inverse transform.
- The authors combine multiple libraries on the same CPU into one graph. However, they don't combine multiple compiler options into one graph. This is inconsistent.
- The authors distinguish between in-order and out-of-order transforms, but they don't distinguish between scaled and unscaled transforms. This is inconsistent. (The cynical explanation is that FFTW produces results in order but doesn't scale.)
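
The first of these points is easy to check for yourself. The following sketch times the forward and inverse complex transforms separately rather than assuming they run at the same speed; it uses numpy.fft purely for illustration (the libraries under discussion are C libraries), and the size 4096 is an arbitrary choice.

```python
# Time the forward and the inverse complex FFT separately, instead of
# benchmarking one and assuming the other runs at the same speed.
# numpy.fft is used purely for illustration; n = 4096 is arbitrary.
import timeit
import numpy as np

n = 4096
rng = np.random.default_rng(1)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Best of 5 runs of 200 calls each, in seconds per call.
fwd = min(timeit.repeat(lambda: np.fft.fft(x), number=200, repeat=5)) / 200
inv = min(timeit.repeat(lambda: np.fft.ifft(x), number=200, repeat=5)) / 200

print(f"forward: {fwd * 1e6:.1f} us, inverse: {inv * 1e6:.1f} us")
```

On many platforms the two numbers differ noticeably, which is exactly why reporting only one of them is misleading.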

A useful speed comparison answers several obvious questions:
- ``How fast is this library for this computation?'' Answered by a table showing each library's time for each computation.
- ``How fast is the fastest library for this computation?'' Answered by a table showing the smallest time for each computation.
- ``What are the fastest libraries for this computation?'' Answered by a table showing each library's time divided by the best time.
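
The third table is a matter of a few lines of code. Here is a minimal sketch; the library names, sizes, and timings are made up for illustration.

```python
# Build a relative-performance table: each library's time for each
# computation, divided by the best time for that computation.
# Library names, sizes, and timings below are hypothetical.
times = {
    "libA": {256: 4.1, 512: 9.0},
    "libB": {256: 3.8, 512: 9.9},
}

sizes = sorted({n for t in times.values() for n in t})
best = {n: min(t[n] for t in times.values()) for n in sizes}

print("       " + "  ".join(f"{n:>5}" for n in sizes))
for lib, t in times.items():
    row = "  ".join(f"{t[n] / best[n]:5.2f}" for n in sizes)
    print(f"{lib:6} {row}")
```

A ratio of 1.00 marks the fastest library for that computation; every other entry says directly how much slower that library is.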

The FFTW graphs also present their results poorly:
- The results are expressed as inverse time, rather than time. Inverse time is unnecessarily difficult to use. The time for a convolution, for example, is a straightforward sum of transform times and multiplication times; the inverse time, in contrast, is the inverse of the sum of the inverses of the inverse times. (The FFTW authors claim, incorrectly, that the fast FFTs are more ``huddled'' in a time graph than in an inverse-time graph.)
- The results are scaled by 5n lg n for a size-n transform. This, too, makes the numbers more difficult to use. Referring to the result as ``mflops'' is misleading: a split-radix FFT uses only about 4n(lg n - 1.5) floating-point operations. (The FFTW authors claim, incorrectly, that scaling is essential to fit the results onto one graph.)
- The results for FFTW itself are represented as large, navy blue dots, while the results for the next library are represented as thin, light blue stars that are almost invisible on a printout. Is the goal here to provide information, or to advertise FFTW?
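
Both arithmetic points above can be checked in a few lines. The timings in the first part are hypothetical; the second part simply compares the 5n lg n scaling against the approximate split-radix operation count.

```python
# 1. Times add; inverse times don't. Suppose a convolution is a forward
#    transform, a pointwise multiplication, and an inverse transform,
#    with hypothetical timings in microseconds:
t_fft, t_mult, t_ifft = 10.0, 2.0, 10.0
t_conv = t_fft + t_mult + t_ifft                      # a straightforward sum

r_fft, r_mult, r_ifft = 1 / t_fft, 1 / t_mult, 1 / t_ifft   # inverse times
r_conv = 1 / (1 / r_fft + 1 / r_mult + 1 / r_ifft)    # inverse of the sum of
assert abs(1 / r_conv - t_conv) < 1e-9                # the inverses

# 2. The 5n lg n "mflops" scaling overstates the work actually done:
#    split-radix uses only about 4n(lg n - 1.5) floating-point operations.
import math

n = 1024
claimed = 5 * n * math.log2(n)            # 51200 "operations"
actual = 4 * n * (math.log2(n) - 1.5)     # 34816 operations
print(claimed / actual)                   # about 1.47 at n = 1024
```

So at n = 1024 the ``mflops'' figure inflates the operation count by roughly 47 percent, and combining inverse times requires the clumsy harmonic formula rather than a simple sum.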