From: "D. J. Bernstein"
To: hash-forum@nist.gov
Cc: keccak@noekeon.org
Subject: Second preimages for 6 (7? (8??)) rounds of Keccak?

At the Crypto 2010 rump session, Dinur and Shamir announced an algebraic
attack on Hamsi-256. They've now posted their paper:

   http://eprint.iacr.org/2010/602

It turns out that Dinur and Shamir are looking through 2^256 possible
second preimages, but they argue that this still qualifies as an attack
because they've reduced the number of bit operations in the inner loop.

The simplest form of the attack is as follows. Split a possible preimage
into a 224-bit prefix and a 32-bit suffix. The structure of Hamsi allows
(e.g.) output bits 5, 156, 214, 221, 249 to be written as polynomials of
degree at most 6 in the suffix bits, where the polynomial coefficients
are particular functions of the prefix. Compute these polynomials, and
then evaluate them on all 2^32 suffixes; only about 2^27 suffixes will
match the target in the selected 5 output bits. Standard fast algorithms
for multipoint evaluation take only about 5*32*2^31 bit operations per
prefix, for a total of 80*2^256 bit operations across the 2^224
prefixes. Computing the polynomials in the first place is not a
bottleneck, since the polynomials have low degree. The number of
compression-function evaluations is reduced by a factor of almost 2^5.

Now let's think about what this means for Keccak. Out of the five SHA-3
candidates specified in the Keccak submission, let's focus on the
maximum-security candidate keccakc1024: Keccak[r=576,c=1024,nr=24] with
512-bit output. Each 576-bit input block is fed through 24 rounds; let's
weaken Keccak by reducing the 24 rounds to 6 rounds. Because the block
size is already above 512 bits, we can expect to find a single-block
second preimage. Here's how we'll do that:

   Choose 64 bits (b[1],...,b[64]) not matching the first preimage.
   For each pattern of 128 bits (c[1],...,c[128]):
     Compute a polynomial representation of the function f that maps
     (x[1],...,x[384]) to the first 10 bits of
     Keccak6(b[1],...,b[64],c[1],...,c[128],x[1],...,x[384]).
     Use fast multipoint evaluation to compute f for all
     (x[1],...,x[384]).
     For each (x[1],...,x[384]) where f matches the target:
       Compute Keccak6(b[1],...,b[64],c[1],...,c[128],x[1],...,x[384]).

This procedure searches through 2^128*2^384 = 2^512 Keccak inputs, so it
has an excellent chance of finding a second preimage, and adding early
termination with a restart effectively increases this chance to 1. So
the only question is how long the procedure takes.

Each Keccak round is quadratic in the input bits, so Keccak6 has degree
at most 2^6 = 64, so each bit of f has degree at most 64 in
x[1],...,x[384]; i.e., f can be written as sum_S c_S prod_{i in S} x_i,
where S ranges over subsets of {1,2,...,384} having size at most 64, and
each c_S is a 10-bit string.

We'll compute c_S using a standard identity, namely higher-order
differentiation modulo 2: c_S = sum_{T subset S} f(T). There are
sum(j=0,64,binomial(384,j)) < 2^246 choices of T, so the initial
computation of the f(T) values uses only 2^246 evaluations of Keccak6.
Across the whole attack that's just 2^128*2^246 = 2^374 evaluations of
Keccak6.

Separate computation of c_S = sum_{T subset S} f(T) for each S now uses
sum(j=0,64,2^j*binomial(384,j)) < 2^310 additions of 10-bit strings.
Across the whole attack that's a total of 10*2^438 bit operations. This
is clearly highly suboptimal---standard optimizations (CSE, Paar, etc.)
make a huge difference in the number of additions, and this becomes
important as the degree increases---but I'll stick to it for simplicity.

Fast multipoint evaluation now takes just 384*2^383 additions of 10-bit
strings. Across the whole attack that's a total of 1920*2^512 bit
operations. This is again suboptimal, but it's already much faster than
2^512 evaluations of Keccak6.
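Both tools used above---coefficient extraction via c_S = sum_{T subset S}
f(T), and evaluation of the resulting polynomial at all points---are
instances of the subset-XOR (Moebius) transform over GF(2), which costs
n*2^(n-1) XORs for n variables (the 384*2^383 figure). Here is a toy
illustration of my own, not part of the attack description, with n = 10
variables instead of 384 and 1-bit instead of 10-bit values:

```python
import random

def moebius_transform(table):
    # Subset-XOR butterfly over GF(2): returns t with
    # t[S] = XOR of table[T] over all subsets T of S
    # (subsets encoded as bitmasks). Cost: n*2^(n-1) XORs.
    t = list(table)
    n = len(t).bit_length() - 1
    for i in range(n):
        bit = 1 << i
        for x in range(len(t)):
            if x & bit:
                t[x] ^= t[x ^ bit]
    return t

n = 10
random.seed(1)
f = [random.randrange(2) for _ in range(1 << n)]  # truth table: f(T) for all T

# Coefficient extraction, i.e. higher-order differentiation mod 2:
# coeffs[S] = c_S = sum_{T subset S} f(T).
coeffs = moebius_transform(f)

# Spot-check the identity directly for one subset S, enumerating
# all T subset of S with the standard (T - 1) & S trick:
S = 0b1011001101
acc, T = 0, S
while True:
    acc ^= f[T]
    if T == 0:
        break
    T = (T - 1) & S
assert acc == coeffs[S]

# Evaluation at all points is the same transform in the other
# direction: f(x) = XOR of c_S over S subset of supp(x), and over
# GF(2) the Moebius transform is an involution, so applying it to
# the coefficients recovers the whole truth table at once.
assert moebius_transform(coeffs) == f
```

The involution property is why one routine serves for both steps; the
attack's actual multipoint evaluation would of course work with 10-bit
strings and heavy common-subexpression sharing rather than this naive
loop.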
This initial filter eliminates most of the 2^384 input strings, leaving
just 2^384/2^10 = 2^374 input strings that require full Keccak6
evaluations. Across the whole attack that's just 2^128*2^374 = 2^502
Keccak6 evaluations.

Overall the attack involves 2^502 Keccak6 evaluations plus 1920*2^512
bit operations plus negligible extra work. Can someone double-check the
details and see whether I've calculated all this correctly?

For comparison, there are several papers using Keccak degree bounds to
"distinguish" more rounds of Keccak from a random oracle, but nobody has
managed to give a proper definition of a hash-function "distinguisher"
(a definition that's mathematically clear and that doesn't allow every
function to be trivially distinguished), and it's certainly not possible
to turn these "distinguishers" into attacks against accepted security
notions such as preimage resistance. The SHA-3 Zoo doesn't list any
preimage attacks against reduced-round Keccak, and the only occurrences
of "preimage" on the Keccak cryptanalysis page point to

   * Lathrop finding partial preimages in 4 rounds of Keccak and
     speculating that the attacks could be pushed to more rounds, and

   * Morawiecki and Srebrny finding more restricted partial preimages
     in 3 rounds of Keccak.

I think that this very simple 6-round Keccak attack can be pushed to 7
rounds (possibly even 8?), if CSE is used to accelerate the computation
of c_S; if that's correct then Keccak will need at least 8 (or even 9?)
rounds to meet its preimage-resistance goals. Full 24-round Keccak
therefore has <=200% extra computation as a security margin.

For people who evaluate attack cost by counting compression-function
evaluations, this 2^10 reduction in the cost of second preimages can
easily be improved to a reduction factor of 2^100 or more. Of course,
this would actually make the attack much slower; to me this is yet
another clear demonstration of how silly it is to evaluate attack cost
by counting compression-function evaluations.
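Regarding the double-check request: the binomial-sum bounds in the cost
estimates can be verified exactly with big-integer arithmetic. A quick
sanity check of my own, using Python's math.comb:

```python
from math import comb

deg, nvars = 64, 384  # Keccak6 degree bound 2^6 = 64; 384 free input bits

# Subsets T of {1,...,384} of size at most 64: this bounds the number
# of f(T) evaluations needed before coefficient extraction.
n_points = sum(comb(nvars, j) for j in range(deg + 1))
assert n_points < 2**246

# Naive coefficient extraction: each subset S costs 2^|S| additions.
n_xors = sum(2**j * comb(nvars, j) for j in range(deg + 1))
assert n_xors < 2**310

# The 10-bit filter: 2^384 candidates leave about 2^374 survivors per
# (c[1],...,c[128]) pattern, hence 2^(128+374) = 2^502 full Keccak6
# evaluations across the whole attack.
assert 2**128 * 2**(384 - 10) == 2**502
```

Both sums are within a bit or two of the stated powers of 2, so the
bounds are tight but correct.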
In the opposite direction, for people who have even the slightest
understanding of the physical reality of attack cost, this attack makes
no sense: it's inherently memory-intensive and communication-intensive,
and for the same machine size it's clearly much slower than parallel
exhaustive search, even though it has somewhat fewer bit operations. The
same is true for the Dinur--Shamir attack on Hamsi-256.

One can also reasonably ask why anyone should care about the difference
between 2^520 bit operations and 2^525 bit operations, or between 2^260
bit operations and 2^270 bit operations. If we're really trying to
maximize the epsilon in 2^(512+epsilon), shouldn't we be selecting the
SHA-3 candidate with the largest number of bit operations in its
compression function and with no known inner-loop optimizations? Is
there any evidence that this type of superficial "security" metric has
any value in predicting real-world security?

---D. J. Bernstein
   Research Professor, Computer Science, University of Illinois at Chicago

P.S. I'm also curious whether anyone has tried message modification
against Keccak. I wouldn't expect anything to get past 12 rounds, but
the lack of reduced-round collision examples is a bit embarrassing!