[Talk notes, D. J. Bernstein, 22 February 2005.]
[This is approximately the talk I gave on 21 February 2005.]
[It took slightly under 25 minutes.]
[Start while still setting up computer.]
Thank you.
Let me start with something that doesn't really need a slide.
The _type_ of Poly1305-AES.
It is, first of all, a function.
What inputs does it have?
There's a 16-byte string r,
and a 16-byte string k,
and a 16-byte string n,
[Visual: title slide on computer right]
and finally a variable-length string m.
[Visual: ``The Poly1305-AES function'' on computer right]
Ah, here we go.
The output of Poly1305-AES is also 16 bytes,
which I'm writing Poly1305 sub r of m comma AES sub k of n.
This tells you something about how k and n are used.
I did not write the definition of Poly1305-AES up here
but it's really very simple.
You take the message m and split it as you heard this morning,
split it into 16-byte chunks
plus maybe a final chunk between 1 and 15 bytes.
You stick a 1 bit onto each chunk,
view the resulting numbers in little-endian form
as the coefficients of a polynomial,
and evaluate that polynomial modulo 2 to the 130 minus 5.
The evaluation point is exactly r,
also treated in little-endian form.
You then add AES sub k of n modulo 2 to the 128
and that's your output.
So it's a really easy definition.
[Visual: ``Poly1305-AES authenticators'' on computer right]
Now how do we use this for message authentication?
Well, what I've written here
is just one of the standard authentication protocols.
The input m is a message being authenticated.
The string n is a nonce,
a unique message number.
You heard some criticism of nonces this morning,
which I will respond to in a few minutes.
The string r is a secret shared by the sender and receiver,
and the string k is also a secret shared by the sender and the receiver.
So the secret key is 32 bytes overall.
The sender takes the output of Poly1305-AES as a 16-byte authenticator
which is transmitted along with the message.
The receiver does the same computation and throws the message away
if the authenticator doesn't match.
So this is again just a completely standard authentication protocol.
Now, how secure is Poly1305-AES?
[Visual: ``Poly1305-AES security guarantee'' on computer right]
Well, I'm using the standard notion of security here,
what users actually care about,
which is whether forgeries get through.
What I've said at the bottom here is that
all forgeries are rejected with probability very close to 1.
There's a lot of parameters here.
First of all there's the number of messages authenticated.
I'm calling that C for chosen messages.
The attacker might choose the messages sent.
Of course, if the attacker can actually choose messages
that the sender will authenticate,
then the attacker doesn't need to forge authenticators,
but the security guarantee applies anyway.
No matter how much power the attacker has to control the sender's messages,
between no control and this complete control,
the attacker won't be able to come up with authenticators
on any messages that the sender didn't authenticate.
Okay.
Then there's D, the number of forgery attempts.
Of course, the more forgery attempts you have,
the more chance there is of the attacker succeeding.
Let's also put a limit on message lengths, call that L,
for example L might be 1536 for Internet applications.
And then there's one parameter we can't control,
maybe the attacker can control it,
which is delta, the chance of the attacker breaking AES.
I'm using the usual notion of an attack here,
which is the chance of distinguishing AES from
a completely random permutation of the set of 16-byte strings.
Hopefully delta is small.
Of course, this was one of the security goals
in the original call for AES proposals.
In terms of all that,
the probability that all forgeries are rejected
is at least 1 minus delta minus blah blah blah.
Okay, here's an example to say what I really mean here.
[Visual: ``Example'' on computer right]
Let's assume that L is 1536.
Let's assume that delta is at most 2 to the minus 40.
So that would be all the computers in the world
working for 100 years on a brute-force attack on AES.
Of course, there might be a better attack,
but hopefully AES is secure.
Let's assume that the sender authenticates at most 2 to the 64 messages,
and let's assume that the attacker tries at most 2 to the 64 forgeries.
Then all of the forgeries are rejected with probability
at least 0.999 lots of nines,
which is a comfortable security level.
The rest of this slide is about birthday attacks,
which you heard about this morning.
Lots of standard nonceless authenticators
are breakable if you take this much effort.
Of course, that's still a huge amount of effort.
What I'm saying here is that everyone satisfied
with the standard CBC security level
should also be satisfied with the Poly1305-AES security level.
By the way,
those authenticators also have security guarantees like this one
but if you look at the details
then you see that there's a quadratic term,
which is bad.
[Visual: ``Do nonces require'' on computer right]
Now you heard this morning that nonces
require additional message expansion overhead.
That's not true.
I think there's a communication problem here
between the people doing cryptographic applications
and the cryptographers.
Let's look at a real network protocol,
an Internet protocol, like TCP.
Let's imagine a huge amount of data being transmitted,
say 2 to the 64 bytes being transmitted,
or you could have fewer bytes.
Of course, these bytes are split into separate messages.
Maybe you transmit x0 x1 x2 in one packet,
and then you transmit x3 x4 et cetera in another packet.
Each message has a nonce associated with it,
which is the position of that packet in the stream of data.
The sender and receiver both know this nonce.
Even without cryptography you need to know these nonces
to deal with networks dropping packets,
transmitting packets out of order, and so on.
So the nonces are already there.
We might as well use them in cryptography.
Okay.
There are of course many protocols,
many specific functions
that you can use to authenticate messages.
Some of the functions do not achieve a good security level
so we throw them away.
And then the functions that are left
compete on the basis of speed.
A few days ago I released my fast public-domain Poly1305-AES software.
Here are some timings for verifying an authenticator.
These are expressed as usual in CPU cycles.
I should emphasize that these are not cycles per byte,
these are actual times for various numbers of bytes.
For example on an Athlon, a 16-byte authenticator takes 712 clock cycles.
On an UltraSPARC III, a 1024-byte authenticator takes 5601 clock cycles.
Just for comparison,
Alpha-MAC, which you heard about this morning,
the paper says something like 1000 cycles plus 10 cycles per byte.
[Actually, what the Alpha-MAC paper says (for a Pentium III)
is an ``estimate'' of 1448 cycles plus 10.7 cycles per byte.
Obviously this can be improved,
but it's very far behind Poly1305-AES.]
Now some of you might look at what I've written here
and complain that I'm assuming all data is aligned,
which in some applications is inconvenient.
Sometimes a message comes in from the network
and it's not naturally aligned.
Also all the message lengths l that I'm looking at
are multiples of 16.
Well, what if you have a non-multiple of 16?
[Visual: ``Unaligned messages'' on computer right]
Here are some timings where the data is not aligned
and the message length is not a multiple of 16.
You can see that there's some penalty, but not terrible.
There are lots more questions you might ask.
For example, what happens if the message is not in cache?
Or the key is not in cache?
I have a web page that has the answers to these questions,
all sorts of numbers that you might want to know.
Okay.
I'd like to take the time that I have left
to focus on an issue that I don't think has received enough attention,
namely benchmarking.
I'm going to take the attacker's perspective here.
There's some information being communicated,
namely how much time a function takes to compute.
The sender of the information
is a computer that knows how much time the function takes by trying it out.
The receiver of the information
is the reader of a cryptographic paper saying how much time the function takes.
The attacker is the author of the cryptographic paper
saying how much time the function takes.
[Visual: ``The art of benchmarking'' on computer right]
I hope everyone here is comfortable taking the attacker's perspective.
I'm going to tell you five attacks
that have actually been carried out,
successfully,
by authors of cryptographic papers.
Bait-and-switch timings,
guesses reported as timings,
et cetera.
I'm going to go through these in detail.
The end result of these attacks is that readers think the functions are fast
even though they are not actually fast.
[Visual: ``Bait-and-switch'' on computer right]
The first attack is an interleaving attack.
There's a two-step protocol going on here.
The reader says ``How fast is your function?''
and you say how fast the function is.
Then the reader says ``How secure is your function?''
and you say how secure the function is.
Now the reader repeats this protocol many times
for many different functions,
asking the same questions.
Here I have two functions,
one function is fast and breakable,
the other function is slow and secure.
The attack is to interleave two runs of the protocol.
I'm sure you've seen pictures of these interleaving attacks
from formal protocol analysis.
The reader says ``How fast is your function?''
and you say ``ZMAC''---hypothetical example---``ZMAC is very fast,''
referring to ZMAC-32.
Then the reader says ``How secure is your function?''
and you say ``ZMAC is very strong,''
referring to ZMAC-128.
Of course, to get away with this,
as a psychological matter,
you have to give two functions the same name,
for example ``ZMAC.''
You have to say that ZMAC is a _family_ of functions,
because that's what the cryptographic user wants.
He doesn't want to just type ssh.
He wants to type ssh minus zmac security level equals 112
minus cipher equals quadruple DES blah blah blah.
Anyway, once you have given two functions the same name,
you can proceed with this attack.
[Visual: ``Guesses reported'' on computer right]
Here's another attack you can carry out.
Usually the attacker looks at a message from the sender
and then modifies the message.
But the attacker can also simply make up a sender.
Try forging a message without even waiting for the real sender.
That's what's happening in my example here.
On page 1 of a paper,
the authors say that their function on a Pentium II
achieves speeds of 2.2 cycles per byte.
I wonder if the authors are here in the audience.
On page 12 of the paper,
the authors say that they haven't actually finished implementing the function.
They _think_ it will run at 2.2 cycles per byte,
and maybe that's a good guess,
but it's still just a guess.
It's not an actual timing.
I can tell from the giggles in the audience
that some of you have seen this attack before.
[Visual: ``My-favorite-CPU'' on computer right]
Here's attack number 3.
This is a denial-of-service attack.
This is where you have
two computers on your desk,
a Pentium 4 and an UltraSPARC,
and maybe also a Pentium III and so on.
The UltraSPARC is sending a message that you don't like,
namely that your function is very slow on the UltraSPARC.
So you take the sender and shoot it.
You take the UltraSPARC outside and smash it into little pieces
[miming a baseball bat]
so that the only computer left in your office is a Pentium 4.
And then you say
``All speeds were measured on a Pentium 4
because that's the only computer we have in the office.''
Great.
Let me emphasize that all of these attacks _have_ been carried out.
[Visual: ``Long-message'' on computer right]
Attack number 4 is also a denial-of-service attack.
This is where you talk about cycles per byte for _long_ messages.
You don't shoot the sender
but you discard some of the information from the sender.
You measure the function for 1024 bytes
and 1024 plus 32 bytes
and then take the _difference_ and divide by 32.
My example is someone saying that a function takes 2 cycles per byte.
Plus 2000 cycles---but that's the sort of thing
that doesn't appear until page 12.
For a 16-byte message this function takes 32 cycles!
Plus 2000 cycles.
For a 64-byte message this function takes 128 cycles!
Plus 2000 cycles.
You're really misleading the reader
as to how fast this function is.
[Visual: ``Timings after precomputation'' on computer right]
Attack number 5 is a more subtle denial-of-service attack,
not very subtle but more subtle than the previous ones.
This is an attack that I've carried out.
For this attack you exclude everything that happens just once.
You look at the time taken by your function
and you say that, well, some of that time is kind of annoying,
like precomputation and loading data from cache
and priming the branch-prediction unit,
so you _subtract_ that time.
You run the function _twice_ and report the time for the second run.
Of course, this is not what a cryptographic user actually does.
Cryptographic users perform each computation just _once_.
I carried out this attack several years ago,
with hash127.
Several of us broke the MD5 speed barrier at about the same time,
but we all used big tables.
When you have an application
that deals with a huge number of keys simultaneously,
those tables don't fit into level-1 cache,
they don't even fit into level-2 cache,
and because the tables are so big you need a lot of time to load them from DRAM.
But I ignored all that in my benchmarks.
[Visual: ``The honest alternative'' on computer right]
The big difference with Poly1305-AES
is that it achieves high speed _without_ precomputation.
The CPU cycles that I reported before
_include_ all precomputation.
They _include_ key expansion.
And if you look on my web pages
then you can see all sorts of timings for what happens if,
for example,
the key is not in cache.
The performance is still quite good.
A large part of the design of Poly1305-AES
was aimed at exactly this feature,
achieving high performance without precomputation.
In particular, the prime 2 to the 130 minus 5,
it's really important that it's a couple of bits bigger
than 2 to the 128,
and it's really important that it's very close to a power of 2,
so divisions are easy.
Anyway, I hope you've all learned from this list of attacks.
Maybe we'll be able to see the attacks being carried out in subsequent talks.
Thank you for your attention.