
Smaller decoding exponents: ball-collision decoding

[ballcoll] 26pp. Daniel J. Bernstein, Tanja Lange, Christiane Peters. Smaller decoding exponents: ball-collision decoding. Document ID: 0e8c929565e20cf63e6a19794e570bb1. URL: https://cr.yp.to/papers.html#ballcoll. Date: 2011.03.07. Supersedes: Ball-collision decoding. 2010.11.17.

Reference implementation

bcd.cpp is a complete reference implementation of ball-collision decoding. The implementation generates 10000 challenge problems, solves each problem with ball-collision decoding, and prints various statistics. Here are more comments on exactly what the program does and on the resulting statistics.

Each challenge consists of a random full-rank parity-check matrix H and a syndrome s for a secret random weight-w target vector. The minimum distance of the code defined by H can be small, so it's possible for the decoding algorithm to find other vectors with the same syndrome, but the program doesn't stop with the first output; it continues iterations for the challenge until finding an output that happens to match the secret vector. (This is also what the paper analyzes. In modern list-decoding variants of McEliece the secret vector is recognizable from OAEP-style encoding. The distinction doesn't arise in classic McEliece, since there's a unique weight-w vector with the given syndrome.)
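
For concreteness, here is a minimal sketch of how one such challenge could be generated: pick a random (n-k) x n binary matrix, retry until it has full rank, pick a random weight-w vector e, and compute the syndrome s = He. This is only illustrative; the matrix representation, names, and retry strategy are assumptions, not the code from bcd.cpp.

    // Sketch of challenge generation: random full-rank H, random weight-w e, s = He.
    // Illustrative only; bcd.cpp represents and generates these differently.
    #include <algorithm>
    #include <cstdlib>
    #include <vector>
    using std::vector;

    static const int n = 30, k = 11, w = 8;
    typedef vector< vector<int> > matrix;      // (n-k) x n over F_2, entries 0/1

    static int rank_f2(matrix m) {             // rank over F_2, computed on a copy
      int r = 0;
      for (int c = 0; c < n && r < n - k; ++c) {
        int piv = -1;
        for (int i = r; i < n - k; ++i) if (m[i][c]) { piv = i; break; }
        if (piv < 0) continue;
        std::swap(m[piv], m[r]);
        for (int i = 0; i < n - k; ++i)
          if (i != r && m[i][c])
            for (int j = c; j < n; ++j) m[i][j] ^= m[r][j];
        ++r;
      }
      return r;
    }

    void challenge(matrix &H, vector<int> &e, vector<int> &s) {
      do {                                     // random H, retried until rank n-k
        H.assign(n - k, vector<int>(n));
        for (int i = 0; i < n - k; ++i)
          for (int j = 0; j < n; ++j) H[i][j] = random() & 1;
      } while (rank_f2(H) < n - k);
      e.assign(n, 0);                          // secret vector of weight exactly w
      for (int t = 0; t < w; ) {
        int j = random() % n;
        if (!e[j]) { e[j] = 1; ++t; }
      }
      s.assign(n - k, 0);                      // syndrome s = He
      for (int i = 0; i < n - k; ++i)
        for (int j = 0; j < n; ++j) s[i] ^= H[i][j] & e[j];
    }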

The challenge sizes are set at compile time: n=30, k=11, and w=8, with attack parameters k1=6, k2=5, p1=2, p2=1, q1=1, q2=2, l1=4, l2=5. These aren't even close to optimized parameters, but they achieve a reasonable level of code coverage and have various deliberate asymmetries, making them a much better test case than optimized parameters would be. We have also tried other sizes, of course; all of the sizes can be easily edited at the top of bcd.cpp. The implementation excludes the extreme cases p1=0 and p2=0, and obviously will overflow RAM and various counters when parameters are large; but within these constraints the implementation shows the full flexibility of ball-collision decoding.
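
As a rough illustration of how the sizes fit together (the identifiers and checks below are assumptions for this sketch, not copied from bcd.cpp): ball-collision decoding splits the k information positions into blocks of k1 and k2, uses l1 and l2 of the n-k redundant positions for collisions, and searches for weights p1, p2, q1, q2 within those blocks, leaving weight w-p1-p2-q1-q2 for the remaining positions.

    // Illustrative parameter block and sanity checks; names are assumptions,
    // not copied from bcd.cpp.
    #include <cassert>

    static const int n = 30, k = 11, w = 8;            // challenge sizes
    static const int k1 = 6, k2 = 5;                   // split of the k information positions
    static const int l1 = 4, l2 = 5;                   // redundant positions used for collisions
    static const int p1 = 2, p2 = 1, q1 = 1, q2 = 2;   // weights searched in each block

    void check_parameters() {
      assert(k1 + k2 == k);                    // the information positions are split exactly
      assert(l1 + l2 <= n - k);                // the collision positions fit in the redundancy
      assert(p1 >= 1 && p1 <= k1);             // bcd.cpp excludes the extreme case p1 = 0
      assert(p2 >= 1 && p2 <= k2);             // bcd.cpp excludes the extreme case p2 = 0
      assert(q1 <= l1 && q2 <= l2);
      assert(p1 + p2 + q1 + q2 <= w);          // weight w-p1-p2-q1-q2 remains for the rest
    }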

The third paragraph of the program output considers 10000 numbers: the number of iterations actually used to solve the first challenge, the number of iterations actually used to solve the second, etc. The first line (with the sizes shown above and the Linux random() seeding; one can of course use different random numbers) says "average iterations to target = 43.2722 +- 43.1045"; this says that the 10000 numbers have average 43.2722 and deviation 43.1045. Iterations are independent, so it isn't surprising that the deviation is as large as the average.
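
The reasoning behind that last remark: with independent iterations and a fixed per-iteration success probability b, the number of iterations to success is geometrically distributed, with mean 1/b and standard deviation sqrt(1-b)/b, and for small b those two are nearly equal. A small sketch, plugging in the predicted per-iteration success probability for the sizes above:

    // Sketch: mean and deviation of a geometrically distributed iteration count
    // with per-iteration success probability b; for small b they nearly coincide.
    #include <cmath>
    #include <cstdio>

    int main() {
      double b = 1 / 43.355;                   // per-iteration success probability (the predicted value below)
      double mean = 1 / b;                     // expected iterations: 43.355
      double dev = std::sqrt(1 - b) / b;       // deviation: about 42.85
      std::printf("mean %.3f deviation %.3f\n", mean, dev);
      return 0;
    }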

The third paragraph continues by comparing this observed iteration effectiveness to the predicted iteration effectiveness: "predicted = 43.355; actual/predicted = 0.99809". This "predicted" number is computed as 1/b(p1,p2,q1,q2,l1,l2) where b is defined in Section 5 of the paper under "Success probability." One expects 10000 experiments from a 43.355 +- 43.355 distribution to produce an average of approximately 43.355 +- 0.433, so 43.2722 (0.2% smaller) is unsurprising. A 16-minute run with 1000000 instead of 10000 produces 43.369 (0.03% larger), again unsurprising.
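
As a cross-check on that 43.355 figure, the sketch below computes 1/b directly, assuming b is the product-of-binomials success probability of Section 5 (the chance that a random weight-w vector puts weight p1, p2, q1, q2 in the blocks of sizes k1, k2, l1, l2 and the remaining weight w-p1-p2-q1-q2 in the remaining n-k-l1-l2 positions); for the sizes above this reproduces 43.355.

    // Sketch: predicted iterations 1/b, assuming
    //   b = C(k1,p1) C(k2,p2) C(l1,q1) C(l2,q2) C(n-k-l1-l2, w-p1-p2-q1-q2) / C(n,w).
    #include <cstdio>

    double C(int a, int b) {                   // binomial coefficient, as a double
      if (b < 0 || b > a) return 0;
      double r = 1;
      for (int i = 1; i <= b; ++i) r = r * (a - b + i) / i;
      return r;
    }

    int main() {
      int n = 30, k = 11, w = 8;
      int k1 = 6, k2 = 5, l1 = 4, l2 = 5, p1 = 2, p2 = 1, q1 = 1, q2 = 2;
      double b = C(k1, p1) * C(k2, p2) * C(l1, q1) * C(l2, q2)
               * C(n - k - l1 - l2, w - p1 - p2 - q1 - q2) / C(n, w);
      std::printf("predicted iterations = %.3f\n", 1 / b);   // prints 43.355
      return 0;
    }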

Similar comments apply to other choices of sizes that we have tried. To summarize, the actual number of iterations used by the program is fully consistent with the 1/b(...) formula in Section 5 of our paper.

The program also keeps track of the cost of each iteration, divided into four components: Gaussian elimination (asymptotically negligible when parameters are optimized), building S, building T, and finding outputs. The cost model is discussed in our paper and is standard for the literature on this topic; it counts bit operations for arithmetic while ignoring memory use, copies, and other communication costs. One of the virtues of a C++ implementation is that this cost accounting is encapsulated at a very low level; see the "bit" and "lbits" classes. It is not difficult to see from code inspection that the total number of machine instructions executed by the program is within a small factor of this cost. (For a typical loop this is immediately obvious: there is a cost increment inside the loop. The outer loop in Step 8 deviates from this, since there is a cost increment only when head[T[j].sum.index()] is nonnegative; but this deviation adds only O(Tlen) instructions to each iteration, and the T construction, Step 7, has more than Tlen cost increments.)
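
A minimal sketch of the idea behind that low-level encapsulation (illustrative; the actual "bit" and "lbits" classes in bcd.cpp are more elaborate): a bit wrapper whose arithmetic operators charge a global counter, so the bit-operation count accumulates automatically wherever the algorithm does arithmetic.

    // Sketch of bit-operation accounting via a wrapper class; illustrative only.
    #include <cstdio>

    static long long bitops = 0;               // global bit-operation counter

    struct bit {
      int v;                                   // the underlying bit, 0 or 1
      bit(int x = 0) : v(x & 1) {}
      bit operator^(bit o) const { ++bitops; return bit(v ^ o.v); }   // charge 1 per XOR
      bit operator&(bit o) const { ++bitops; return bit(v & o.v); }   // charge 1 per AND
    };

    int main() {
      bit a(1), b(0), c(1);
      bit d = (a & b) ^ c;                     // costs 2 bit operations
      std::printf("value %d, bit operations %lld\n", d.v, bitops);
      return 0;
    }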

We found it simplest to implement a form of Gaussian elimination slightly less naive than the form mentioned in our paper, although this obviously has no impact on the asymptotic performance of the algorithm. The fourth paragraph of program output compares (1) the observed average cost of Gaussian elimination per iteration; (2) a slightly pessimistic prediction, namely (n-k-1)(n-k)(n+1)/2; and (3) the formula (n-k)(n-k)(n+k)/2 from our paper. For our sample sizes n=30 etc., the observed cost is 5270, the prediction is 5301, and the paper's formula is 7400.5. Similar comments apply to other sizes that we have tried: the actual cost of Gaussian elimination per iteration in the program is slightly below (n-k-1)(n-k)(n+1)/2, which is obviously below (n-k)(n-k)(n+k)/2.
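
For the record, the two formulas can be checked by direct arithmetic; nothing here depends on the elimination code itself.

    // Sketch: the slightly pessimistic prediction (n-k-1)(n-k)(n+1)/2 versus the
    // paper's formula (n-k)(n-k)(n+k)/2, evaluated at n = 30, k = 11.
    #include <cstdio>

    int main() {
      double n = 30, k = 11;
      double predicted = (n - k - 1) * (n - k) * (n + 1) / 2;   // 5301
      double paper = (n - k) * (n - k) * (n + k) / 2;           // 7400.5
      std::printf("prediction %.1f, paper %.1f\n", predicted, paper);
      return 0;
    }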

For S we again found it simplest to implement something marginally better than what the paper says: instead of generating all p1 choices out of k1 from all p1-1 choices out of k1 (and so on recursively), we generate all p1 choices out of k1 from all p1-1 choices out of k1-1 (and so on recursively). This reduces the cost stated in the paper, namely (l1+l2)(binomial(k1,p1)+binomial(k1,p1-1)+...+binomial(k1,2)) + min(1,q1)binomial(k1,p1)(binomial(l1,q1)+...+binomial(l1,1)), down to (l1+l2)(binomial(k1,p1)+binomial(k1-1,p1-1)+...+binomial(k1-p1+2,2)) + min(1,q1)binomial(k1,p1)(binomial(l1,q1)+...+binomial(l1-q1+1,1)). Of course, for p1<=2 and q1<=1 there's no difference; the paper's version has some expository advantages; and the difference is asymptotically irrelevant in any case. Anyway, the fifth paragraph of program output confirms that the observed cost of generating S is exactly this prediction, and that the prediction is at most the formula in the paper. The sixth paragraph is a similar confirmation for T.
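
To see the (small) difference between the two expressions, one can simply evaluate both; the sketch below writes out the quoted binomial sums directly, and for the sizes used here (where p1=2 and q1=1, so the two coincide) both come out to 195.

    // Sketch: the paper's cost formula for building S versus the marginally
    // improved version, written out directly from the expressions quoted above.
    #include <cstdio>

    double C(int a, int b) {                   // binomial coefficient, as a double
      if (b < 0 || b > a) return 0;
      double r = 1;
      for (int i = 1; i <= b; ++i) r = r * (a - b + i) / i;
      return r;
    }

    int main() {
      int k1 = 6, p1 = 2, q1 = 1, l1 = 4, l2 = 5;
      double paper = 0, impl = 0;
      for (int i = p1; i >= 2; --i) paper += C(k1, i);            // C(k1,p1)+...+C(k1,2)
      for (int i = p1; i >= 2; --i) impl += C(k1 - p1 + i, i);    // C(k1,p1)+C(k1-1,p1-1)+...+C(k1-p1+2,2)
      paper *= l1 + l2;
      impl *= l1 + l2;
      double tp = 0, ti = 0;
      for (int i = q1; i >= 1; --i) tp += C(l1, i);               // C(l1,q1)+...+C(l1,1)
      for (int i = q1; i >= 1; --i) ti += C(l1 - q1 + i, i);      // C(l1,q1)+...+C(l1-q1+1,1)
      paper += (q1 ? 1 : 0) * C(k1, p1) * tp;                     // min(1,q1) factor
      impl += (q1 ? 1 : 0) * C(k1, p1) * ti;
      std::printf("paper %.0f, implementation %.0f\n", paper, impl);   // 195 and 195 here
      return 0;
    }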

For finding outputs, the implementation does exactly what the paper says, but the actual cost is slightly below the paper's formula, because the term 2(w-p1-p2-q1-q2+1) is slightly pessimistic: it is easy to see that the actual average number of bits considered in an early-abort weight calculation is slightly smaller than that formula. The final paragraph of program output confirms that the actual average cost is below the paper's prediction.
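
A minimal sketch of such an early-abort weight calculation (illustrative; the loop and cost accounting in bcd.cpp differ in details): the scan adds up bits and stops as soon as the running weight exceeds the remaining allowance w-p1-p2-q1-q2, so it never examines more than w-p1-p2-q1-q2+1 one-bits, and on average it stops well before reaching the end of the vector.

    // Sketch of an early-abort weight check: return 1 if x[0..len-1] has weight
    // at most limit, aborting as soon as the running weight exceeds limit.
    int weight_at_most(const int *x, int len, int limit) {
      int wt = 0;
      for (int j = 0; j < len; ++j) {
        wt += x[j];                            // one addition per bit considered
        if (wt > limit) return 0;              // early abort: already too heavy
      }
      return 1;
    }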