D. J. Bernstein
Fast arithmetic

Multiplication of polynomials over F_2

Minimum number of bit operations

The Gao-Mateer multiplication algorithm

Shuhong Gao and Todd Mateer have introduced a beautiful new multiplication algorithm for binary polynomials, i.e., polynomials over fields of characteristic 2; in particular, for polynomials over F_2 = {0,1}.

My understanding is that Gao and Mateer are preparing a paper on the algorithm. In the meantime you can find the algorithm in Section 3.5 of Mateer's thesis:

Mateer can be reached as tmateer at howardcc dot edu.

The Gao-Mateer algorithm is a streamlined, and asymptotically much faster, version of a well-known multiplication algorithm for binary polynomials published in 1988 by Yao Wang and Xuelong Zhu and independently in 1989 by David Cantor. The central step in each algorithm is an "additive FFT" that quickly computes the unique k[t]-algebra map from k[t]/(t^q+t) to prod_{α in F_q} k[t]/(t+α) = k[t]/t x k[t]/(t+1) x ..., where q is a power of 2 and k is a field containing F_q. The Wang-Zhu-Cantor additive FFT uses approximately (1/2) q lg q multiplications and (1/2) q (lg q)^1.585 additions in k. What's new in the Gao-Mateer multiplication algorithm is a new additive FFT algorithm, using approximately (1/2) q lg q multiplications and (1/4) q lg q lg lg q additions in k.

More generally, for each β in k, a size-q Gao-Mateer additive FFT quickly maps k[t]/(t^q+t+β^q+β) to prod_{α in F_q} k[t]/(t+β+α). Gao and Mateer very efficiently reduce a size-q^2 additive FFT to 2q size-q additive FFTs, which they compute recursively. The base case, q=2, simply evaluates a polynomial f[0]+f[1]t at t=β, obtaining f[0]+f[1]β with one multiplication and one addition, and at t=β+1, obtaining f[0]+f[1]+f[1]β with one extra addition.

The central Gao-Mateer trick maps k[t]/(t^(q^2)+t+γ^q+γ) to prod_{α in F_q} k[t]/(t^q+t+γ+α) as follows:

Gao introduced the special case γ=0 of this trick, mapping k[t]/(t^(q^2)+t) to prod_{α in F_q} k[t]/(t^q+t+α), in unpublished work in 2001, and built an additive FFT from this special case; see Section 3.4 of Mateer's thesis. However, the Gao-Mateer additive FFT is considerably more efficient.

The Gao-Mateer size-q^2 additive FFT starts with the Gao-Mateer trick, and then recursively handles each k[t]/(t^q+t+γ+α) with a size-q additive FFT. Cost for a size-q^2 additive FFT: (1/2) q^2 lg q additions, plus 2q size-q additive FFTs. Replace q by q^(1/2) to see the cost for a size-q additive FFT: (1/4) q lg q additions, plus 2 q^(1/2) size-q^(1/2) additive FFTs. A size-q additive FFT thus uses (1/4) q lg q additions at the top level, (1/4) q lg q additions at the second level, etc., for lg lg q levels; plus (1/2) q lg q size-2 additive FFTs, each of which involves 1 multiplication and 2 additions.

One way to multiply polynomials in F_2[x] of degree below n is to multiply polynomials in F_q[y] of degree approximately 2n/lg q. Specifically, write F_q as F_2[x]/phi, where phi is an irreducible polynomial of degree lg q; map F_2[x] to F_2[x][y]/(x^m-y), where m < 1+(1/2)lg q; lift to F_2[x][y], obtaining polynomials of x-degree at most m-1 and of y-degree at most ceil((n-1)/m); map to (F_2[x]/phi)[y] = k[y]; multiply in k[y]; recover the product in F_2[x][y], using the fact that the product polynomials have x-degree at most 2(m-1) < lg q. If q and m are chosen so that ceil((n-1)/m) < q then the multiplication in k[y] can be performed with the Gao-Mateer multiplication algorithm, using approximately (1/4) q lg q lg lg q additions in k, each of which takes lg q bit operations, and approximately (1/2) q lg q multiplications in k.

For example, to multiply polynomials in F_2[x] of degree below 2^36, one can multiply polynomials in F_q[y] of degree below q, where q = 2^32. The Gao-Mateer additive FFT uses approximately 5 2^35 additions in F_q and approximately 2^36 multiplications in F_q, so multiplication in F_2[x] uses approximately 15 2^35 additions in F_q and approximately 3 2^36 multiplications in F_q.