AES timing variability at a glance D. J. Bernstein
Authenticators and signatures
A state-of-the-art message-authentication code

AES timing variability at a glance

Introduction
Understanding the pictures
AMD Athlon
Intel Pentium III
Intel Pentium M
IBM PowerPC RS64 IV
Sun UltraSPARC III

Introduction

The Rijndael designers and the NIST AES selectors claimed that it was easy to make AES---in particular, the AES table lookups---run in constant time. That claim was, and is, incorrect. The following pictures demonstrate, in a visually obvious way, that common AES implementations have data-dependent timings on a wide variety of platforms, even when the timing software makes no special effort to knock AES table entries out of cache.

These pictures also include my own AES software, which goes to a tremendous amount of CPU-specific effort to reduce the timing variability. I say ``reduce'' rather than ``eliminate'' for three reasons: first, without help from the operating system, it's impossible to guarantee that the AES tables are in cache throughout the AES computation; second, even if the AES tables are in cache, there may be timing variability too small to appear in these pictures and too obscure to be documented by the CPU manufacturers; third, even if the in-cache timings really are data-independent for these CPUs, other CPUs may pose new problems. On the bright side, my software clearly has much less timing variability than other AES libraries.

To understand why this is important, read my paper Cache-timing attacks on AES.

Understanding the pictures

Each picture is a 256x256 array of 4x3 blocks, occupying 1024x768 pixels overall. The inline images on this page are the first 512x384 pixels; click to see the full pictures.

Each row of the picture is an AES key. Each column of the picture is an AES input. Each key was applied repeatedly to each input, 3100 times on average, in a random order. Each 4x3 block reflects the distribution of AES cycle counts for one key and one input.

The 36 color intensities in each block (3 colors for 4x3 pixels) are assigned, in a complicated order, to 36 AES cycle counts surrounding a typical AES cycle count. Cycle counts outside this 36-cycle range are clipped to the bottom and top.

``Good'' pictures are regular 256x256 lattices, showing that every (key,input) pair has the same distribution of cycle counts. Warning: browsers that scale pictures to the screen size are likely to spoil the pattern; make sure to view each picture at 1 dot per pixel.

``Bad'' pictures are irregular, showing that different (key,input) pairs have different distributions of cycle counts. Three examples, in increasing order of irregularity: OpenSSL on the UltraSPARC has varying colors in each column, showing key-dependent timings, and occasional variance within rows, showing data-dependent timings for each key; OpenSSL on the PowerPC RS64 IV is a complicated red-green pattern, showing that half of the (key,input) pairs have one cycle count and the other half have another; the OpenSSL and Gladman implementations on the Athlon are random jumbles of dots, showing a tremendous amount of data-dependent variability.

Here's how to generate your own pictures:

     ./v1 > v1.out
     pnmcolormap 256 < v1.out > v1.palette
     pnmremap -map=v1.palette < v1.out | ppmtogif > v1.gif

AMD Athlon

OpenSSL:

Gladman:

Gladman, using 1K tables:

My aes_athlon:

Intel Pentium III

OpenSSL:

My aes_ppro:

Intel Pentium M

OpenSSL:

Gladman:

Gladman, using 1K tables:

My aes_ppro:

IBM PowerPC RS64 IV

OpenSSL:

My aes_aix:

Sun UltraSPARC III

OpenSSL:

My aes_sparc: