D. J. Bernstein
Fast arithmetic

Multiplication of polynomials over F_2

Minimum number of bit operations

The Gao-Mateer multiplication algorithm

Shuhong Gao and Todd Mateer have introduced a beautiful new multiplication algorithm for binary polynomials, i.e., polynomials over fields of characteristic 2; in particular, for polynomials over F_2 = {0,1}.

My understanding is that Gao and Mateer are preparing a paper on the algorithm. In the meantime you can find the algorithm in Section 3.5 of Mateer's thesis:

• (PDF) Todd Mateer. Fast Fourier transform algorithms with applications. Ph.D. Thesis, Clemson University, August 2008.
Mateer can be reached as tmateer at howardcc dot edu.

The Gao-Mateer algorithm is a streamlined, and asymptotically much faster, version of a well-known multiplication algorithm for binary polynomials published in 1988 by Yao Wang and Xuelong Zhu and independently in 1989 by David Cantor. The central step in each algorithm is an "additive FFT" that quickly computes the unique k[t]-algebra map from k[t]/(t^q+t) to prod_{α in F_q} k[t]/(t+α) = k[t]/t x k[t]/(t+1) x ..., where q is a power of 2 and k is a field containing F_q. The Wang-Zhu-Cantor additive FFT uses approximately (1/2) q lg q multiplications and (1/2) q (lg q)^1.585 additions in k. What's new in the Gao-Mateer multiplication algorithm is a new additive FFT algorithm, using approximately (1/2) q lg q multiplications and (1/4) q lg q lg lg q additions in k.

More generally, for each β in k, a size-q Gao-Mateer additive FFT quickly maps k[t]/(t^q+t+β^q+β) to prod_{α in F_q} k[t]/(t+β+α). Gao and Mateer very efficiently reduce a size-q^2 additive FFT to 2q size-q additive FFTs, which they compute recursively. The base case, q=2, simply evaluates a polynomial f+ft at t=β, obtaining f+fβ with one multiplication and one addition, and at t=β+1, obtaining f+f+fβ with one extra addition.

The central Gao-Mateer trick maps k[t]/(t^(q^2)+t+γ^q+γ) to prod_{α in F_q} k[t]/(t^q+t+γ+α) as follows:

• Write the input, a polynomial f of degree below q^2, in base t^q+t: i.e., find polynomials f,f,...,f[q-1] of degree below q such that f = f + (t^q+t)f + ... + (t^q+t)^(q-1)f[q-1]. Do this in the usual asymptotically fast way: first compute f mod (t^q+t)^(q/2) and floor(f/(t^q+t)^(q/2)), and then handle each half recursively. Cost: only (1/2) q^2 lg q additions, thanks to the extreme sparseness of (t^q+t)^2 = t^2q+t^2, (t^q+t)^4 = t^4q+t^4, etc.
• Recall that a size-q additive FFT maps k[u]/(u^q+u+γ^q+γ) to prod_{α in F_q} k[u]/(u+γ+α). Tensor with the k-module k+kt+...+kt^(q-1): q parallel size-q additive FFTs map (k+kt+...+kt^(q-1))[u]/(u^q+u+γ^q+γ) to prod_{α in F_q} (k+kt+...+kt^(q-1))[u]/(u+γ+α). Apply this map to f+fu+...+f[q-1]u^(q-1) to compute f+f(γ+α)+...+f[q-1](γ+α)^(q-1); this is exactly the desired f mod t^q+t+γ+α. Cost: q recursive size-q additive FFTs.
Gao introduced the special case γ=0 of this trick, mapping k[t]/(t^(q^2)+t) to prod_{α in F_q} k[t]/(t^q+t+α), in unpublished work in 2001, and built an additive FFT from this special case; see Section 3.4 of Mateer's thesis. However, the Gao-Mateer additive FFT is considerably more efficient.