D. J. Bernstein
Authenticators and signatures
hash127

Frequently asked questions

How can I securely authenticate messages?

Say you have a message m = (m[0],m[1],...,m[l-1]) consisting of signed 32-bit integers, i.e., integers between -2^31 and 2^31-1.

Compute the authenticator t = (h_r(m)+k[n]) mod p where

Now you can transmit n,m,t through an untrusted channel to a receiver who also knows k and r. The receiver checks whether t = (h_r(m)+k[n]) mod p and discards n,m,t otherwise.

The hash127 library computes t given m, r, and k[n].

Exactly how secure is this message authentication system?

If the secret numbers are independent and uniformly distributed, then an attacker has chance at most (C+D(L+2))/2^127 of successfully forging a message that you didn't authenticate, where See Theorem 9.1 of my paper on h_r.

For example, if messages are at most 4 gigabytes long, and the attacker tries 1000000000000000 forgeries, then his chance of success is at most 0.0000000000000063.

What if I can't afford to share 128 truly random bits per message?

You can use the authenticator (h_r(m)+E_k(n)) mod p, where E is your favorite 128-bit-to-128-bit cryptographic function, using a secret (uniform random) key k of some length, independent of r. Nonces here can be up to 128 bits long.

As an alternative, you can use the authenticator F_k(n,h_r(m)), where F is your favorite 256-bit-to-128-bit cryptographic function, using a secret (uniform random) key k of some length, independent of r.

Here are some good choices for F:

Exactly how secure are those systems?

The attacker's forgery chance for (h_r(m)+E_k(n)) mod p is at most (C+D(L+2))/2^127 + epsilon_E, where epsilon_E is the attacker's chance of distinguishing x |-> E_k(x) from a uniform random 128-bit-to-128-bit function.

A standard cryptographic assumption about E, used throughout the cryptographic literature to justify a variety of secret-key constructions, is that E is unpredictable, i.e., epsilon_E is negligible. Beware that proving such an assumption is far beyond the state of the art in theoretical computer science; it is possible that epsilon_E is actually nonnegligible for every easily computable low-entropy function E. On the other hand, the best known attacks on widely used functions E have not achieved a noticeable value of epsilon_E.

The attacker's forgery chance for F_k(n,h_r(m)) is at most D(2L+5)/2^128 + epsilon_F, where epsilon_F is the attacker's chance of distinguishing x |-> F_k(x) from a uniform random 256-bit-to-128-bit function. See Theorem 9.3 of my paper on h_r.

What if I don't have unique message numbers?

You could use the authenticator E_k(h_r(m)). If you do, the attacker's forgery chance is at most (C(C-1+2D)(L+2)+D)/2^128 + epsilon_E. See Theorem 9.2 of my paper on h_r. Beware that this chance grows quadratically with the number of messages.

Note that many protocols already need unique message numbers for other purposes, such as replay prevention. Those message numbers can be used directly as n's.

Are there machine-specific variants of h_r?

No. There are several different implementations, but they all compute the same function. The mathematical structure of h_r allows excellent performance on a wide variety of computer architectures:

How does this authentication system compare to CBC MACs and XOR MACs?

CBC MACs and XOR MACs are like E_k(h_r(m)) in that they can be built from any 128-bit-to-128-bit cryptographic function E, and have a proof of security if E is unpredictable. (Beware, however, that the security proof for CBC MACs assumes fixed-length messages. CBC MACs must be changed slightly for variable-length messages. Beware also that security degrades quadratically with the number of messages.)

CBC MACs and XOR MACs differ from E_k(h_r(m)) in that they apply the cryptographic function to every block of m. This is much slower than applying h_r.

How does this authentication system compare to HMAC?

HMAC-MD5 and HMAC-SHA are the same type of authenticator as E_k(h_r(m)), but using MD5(r,m) or SHA(r,m) instead of h_r(m).

Compare my performance measurements for h_r with the best available results for MD5 and SHA: for 2048-byte messages, currently about

h_r is very fast for short messages too.

Are there other functions that can be used in place of h_r?

Yes. The crucial property of h_r is that it has low collision probability: for any two different inputs, the outputs will collide with probability on the scale of 1/2^127 if you choose r randomly. There are many other functions with the same property. See section 10 of my paper on h_r for a survey of the literature.

What distinguishes h_r is its speed. h_r is the first high-security system to break the MD5 speed barrier.

All of these ``universal hash functions'' live in the pleasant world of mathematically guaranteed security. Other parts of a cryptographic system may fail and need replacement, but h_r can be used forever; it will never be broken. Effort invested in h_r will not be wasted.