Complex convolution library interface

djbfft currently supports sizes 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, and 8192. This document focuses on 256 for simplicity.

Single-precision transforms

     #include <fftc4.h>

     complex4 a[256];

     fftc4_256(a);
     fftc4_scale256(a);
     fftc4_un256(a);

fftc4_256 computes a 256-point complex discrete Fourier transform. It evaluates the polynomial

     a[0] + a[1] x + a[2] x^2 + ... + a[255] x^255

at all the 256th roots of 1, and puts the values into a, overwriting the input. (Beware that the results are stored in an unusual order.) Each a[n] is a complex number with 4-byte real part a[n].re and 4-byte imaginary part a[n].im.

To compute the inverse transform, reconstructing a polynomial from its values, call fftc4_scale256 and then fftc4_un256. (fftc4_scale256 multiplies each a[n] by 1/256.)

Note that the position of a in memory can affect performance.

Single-precision convolutions

     #include <fftc4.h>

     complex4 a[256];
     complex4 b[256];

     fftc4_mul256(a,b);

fftc4_mul256 multiplies each a[n] by b[n] and puts the result into a[n].

The sequence of operations

     fftc4_256(a);
     fftc4_256(b);
     fftc4_mul256(a,b);
     fftc4_scale256(a);
     fftc4_un256(a);

convolves a with b: it multiplies the polynomial

     a[0] + a[1] x + a[2] x^2 + ... + a[255] x^255

     b[0] + b[1] x + b[2] x^2 + ... + b[255] x^255

modulo x^256-1 and puts the result back into a. The sequence of operations

     fftc4_256(b);
     fftc4_scale256(b);
     fftc4_256(a);
     fftc4_mul256(a,b);
     fftc4_un256(a);

has the same effect. If you have many polynomials to multiply by the same b, you can save time by reusing the transformed (and scaled) b.

Double-precision transforms and convolutions

     #include <fftc8.h>

     complex8 a[256];
     complex8 b[256];

     fftc8_256(a);
     fftc8_scale256(a);
     fftc8_un256(a);

     fftc8_mul256(a,b);

The fftc8 functions are just like the fftc4 functions except that they work with 8-byte floating-point numbers instead of 4-byte floating-point numbers.

WARNING: Some compilers, notably gcc without the -malign-double option, do not guarantee 8-byte alignment for 8-byte floating-point variables. The Pentium, Pentium II, et al. will slow down dramatically if your arrays are not aligned to 8-byte boundaries.

Frequencies

     #include <fftfreq.h>

     unsigned int n;
     unsigned int f;

     f = fftfreq_c(n,256);

What fftc4_256 and fftc8_256 put into a[n] is the value of the input polynomial at exp(2 pi if/256) where f = fftfreq_c(n,256).