[Talk notes, D. J. Bernstein, 24 June 2004.]
[This is approximately the talk I gave on Thursday 24 June 2004.]
[It took 25 minutes.]
[Visual: title slide on computer right]
Thank you.
My objective for this talk
is to make sure that you understand certain connections
between
computational geometry,
primality proving,
the generalized Riemann hypothesis,
doubly focused enumeration, whatever _that's_ supposed to mean,
and the design of warp engines.
[Only a few laughs at this point. Uh-oh, tough crowd.]
[Visual: a-taste slide on computer right, shifting]
Let me begin by showing you one of the simplest problems
in computational geometry.
This is the near-neighbor problem.
You're given a bunch of points u1 through um
in d-dimensional Euclidean space.
Also points v1 through vn
in d-dimensional Euclidean space.
I've insisted that the coordinates all be integers
to avoid any questions about how you represent real numbers inside a computer.
[People point out that a nearby microphone is producing feedback,
so I toss it across the room.]
Okay.
So you're given these points.
You're also given a radius or squared radius R.
The problem
is to find which _pairs_ of points are close together
as determined by that radius.
Of course,
you can simply _try_ all mn possibilities for i and j,
and maybe the answer is _all_ of them.
But _typically_ the output size is _much_ smaller than mn.
So you can ask for a faster algorithm.
[Visual: sort-and-merge slide on computer right, shifting]
Here are some standard algorithms.
A really good algorithm in dimension 1
is to sort and merge.
You take your points u1 through um along a line,
remember dimension 1,
and you sort the points into increasing order.
Similarly you sort the points v into increasing order.
Now you take the first u,
and figure out all the v's close to that u.
Then you slide along to the next u,
and slide along to the appropriate range of v's,
and then move on to the next u after that,
and slide the v range along,
and so on.
Once you reach the end of the lists you're done.
You've identified all the v's for each u.
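[A minimal Python sketch of this dimension-1 sort-and-merge algorithm; the function name and list-based interface are illustrative, not from the talk:]

```python
def near_neighbors_1d(us, vs, R):
    """Find all pairs (u, v) with |u - v| <= R by sorting and merging."""
    us = sorted(us)
    vs = sorted(vs)
    pairs = []
    lo = 0  # index of the first v that might still be within R of the current u
    for u in us:
        # slide the left edge of the v window past v's that are too small
        while lo < len(vs) and vs[lo] < u - R:
            lo += 1
        # scan v's from the window start until they are too large
        j = lo
        while j < len(vs) and vs[j] <= u + R:
            pairs.append((u, vs[j]))
            j += 1
    return pairs
```

[Since the window's left edge only moves forward, the work is roughly the sorting cost plus the output size.]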
Now the second algorithm here,
partitioning,
is slightly different,
and might seem slightly worse,
but it has the advantage of generalizing immediately to any dimension.
You have your points in space.
Let's say two-dimensional space.
You cover them with a grid.
You choose the distance between grid lines
to be comparable to the radius that was specified in the problem.
Now for each box in the grid
you look at the u's in that box,
and you look at the v's in that box,
and also the u's in boxes nearby,
and compare those u's and v's to see if they're close enough.
Of course,
distributing the points into these big boxes
is kind of like sorting.
In one way it's less informative,
because you don't have a _total_ ordering of points.
In another way it's _more_ informative,
because extracting the top digits of points
to put them into the right box
is saying something _more_ than simply comparing coordinates.
So this is a slightly different algorithm,
which again has the advantage of working easily
for say dimension 2.
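[The same idea in Python for dimension 2: a sketch of the partitioning algorithm, with grid boxes of side R, so only the box containing each u and its eight neighbors need to be searched. Names are illustrative:]

```python
from collections import defaultdict

def near_neighbors_grid(us, vs, R):
    """Find pairs (u, v) of 2-d integer points with squared distance <= R**2."""
    # Bucket the v's by grid box; box side R means any v within
    # distance R of u lies in u's box or one of the 8 adjacent boxes.
    boxes = defaultdict(list)
    for v in vs:
        boxes[(v[0] // R, v[1] // R)].append(v)
    pairs = []
    for u in us:
        bx, by = u[0] // R, u[1] // R
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for v in boxes.get((bx + dx, by + dy), []):
                    if (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2 <= R * R:
                        pairs.append((u, v))
    return pairs
```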
Okay.
That's all you need to know about computational geometry.
[Visual: proving-primality slide on computer right, shifting]
Let me move on
to computational number theory,
specifically proving primality.
Say you want to know whether a 100-bit integer n is a prime.
If n is a _uniform_ random integer,
then people will typically just check
whether 2 to the n minus 1 power is 1 modulo n.
This takes a hundred multiplications mod n,
and it'll tell you whether or not n is prime.
Except that occasionally it makes mistakes.
There are certainly composite numbers n
for which 2 to the n minus 1 is 1 mod n.
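[In Python, the quick check is one line; `pow` with a third argument does fast modular exponentiation, so this really is about a hundred squarings and multiplications mod n for a 100-bit n:]

```python
def fermat_base2(n):
    # The quick test from the talk: is 2^(n-1) congruent to 1 mod n?
    # Every odd prime passes, by Fermat's little theorem.
    return pow(2, n - 1, n) == 1

# But some composites pass too: 341 = 11 * 31 is the smallest
# base-2 Fermat pseudoprime.
```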
What do you do
if somebody feeds you a bad value of n?
What do you do
if you don't want to make _any_ mistakes?
Well,
there's a huge literature on _proving_ primality,
and here's the best algorithm from the literature
for 100-bit primes.
Instead of just taking a power of 2
modulo n,
you take powers of all primes up through 367,
you check the conditions listed here on my slide.
This is maybe a couple hundred times slower
than the simple algorithm,
but this _never_ makes a mistake.
[Visual: proof-relies slide on computer right, shifting]
The proof of this theorem
involved a huge computation,
which was _not_
checking every integer up through 2 to the 100.
The computation checked
that every nonsquare positive integer up to 2 to the 80
is nonsquare in the p-adics for some prime p up through 367.
The number 367 here is the same as the number 367
in this primality-proving theorem.
The number 80 is 100 minus 20.
What you're supposed to be thinking here
is that if an integer is a square in _all_ the completions of the rationals,
then the integer is a square in the _rationals_.
This _computation_ says
that if the integer is _small_
then you just have to check a _few_ completions.
If you assume the generalized Riemann hypothesis
then you can prove another explicit bound here,
look for example at Eric Bach's thesis.
But that bound for 2 to the 80 is considerably larger than this 367.
The computation produces better results.
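[A simplified Python sketch of the witness search behind that computation: given a nonsquare n, look for a small prime p that reveals it, via Euler's criterion. The real verification also has to handle p = 2 and primes dividing n, which is to say squareness in the p-adics rather than just mod p; this sketch skips those cases:]

```python
def nonsquare_witness(n, primes):
    """Return a prime p from the list with n a non-residue mod p, or None."""
    for p in primes:
        # Euler's criterion: for odd p not dividing n,
        # n^((p-1)/2) is p-1 mod p exactly when n is a nonsquare mod p.
        if n % p != 0 and pow(n, (p - 1) // 2, p) == p - 1:
            return p
    return None
```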
Now 2 to the 80
might still sound like a lot of computation.
But the actual work is below 2 to the 60,
which is much more reasonable.
In fact,
one might say that it's 2 to the 20 times more reasonable.
There are a couple of big speedups here,
both of which I will explain later.
One of the speedups
is focusing,
which actually you already know,
it means for example focusing on the integers congruent to 1 mod 8.
The other speedup
is _double_ focusing,
which most people _don't_ seem to know.
It's quite simple,
it could easily have been discovered two hundred years ago,
it applies to all sorts of sieving problems,
but I have not been able to find it in the literature.
Anyway my main concern is not priority,
my main concern is _advertising_ this technique
so that everyone realizes you can speed up your sieving computations.
Now, I still haven't said what double focusing _is_,
but I can say that the technique I used a few years ago
works only in dimension 1.
Does that sound familiar?
Anyway, in the context of other primality-proving theorems,
Hugh Williams asked me if I could
achieve the same speedup for dimension 2.
The point of this talk is that the answer is yes.
Here's an example
[Visual: in-ring slide on computer right, shifting]
of the results of a computation.
I'm working here with the Gaussian integers,
i is the square root of -1.
On the second line here,
I've written down a number 837947981 plus 2833822740 i,
which first of all is not a square,
because it is not a square modulo 269,
but it looks locally like a square
at a whole lot of smaller primes.
For example,
mod 13,
I claim that this number is the square of a unit,
which is a fairly stringent condition:
since 13 is the product of two primes,
each of norm 13,
there's only,
13 minus 1 over 2,
that is 6 out of 13 chances of working mod each prime,
so under one quarter chance of working mod 13,
and then I've written down a whole bunch more primes,
almost all of which are congruent to 1 mod 4.
So the chance of a random number satisfying these conditions
is extremely small.
You can easily _check_
that this number satisfies all these conditions.
You can also easily build a _big_ number
satisfying these conditions.
But there's no obvious way to find a _small_ number
satisfying these conditions
other than to read it off my slide.
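[A Python sketch of that easy check, for primes p congruent to 1 mod 4: such p split in Z[i], so sending i to a square root t of -1 mod p, and to -t, identifies Z[i]/p with two copies of F_p, and a + bi is a unit square mod p exactly when both images are nonzero squares. The helper names are mine:]

```python
def components(a, b, p):
    """Map a + b*i into F_p x F_p, for a prime p that is 1 mod 4."""
    # Brute-force a square root t of -1 mod p; fine for small p.
    t = next(x for x in range(2, p) if x * x % p == p - 1)
    return (a + b * t) % p, (a - b * t) % p

def is_unit_square(a, b, p):
    """Is a + b*i the square of a unit mod p?  (p prime, p = 1 mod 4)"""
    # Both components must be nonzero quadratic residues,
    # tested with Euler's criterion.
    return all(c != 0 and pow(c, (p - 1) // 2, p) == 1
               for c in components(a, b, p))
```

[Running this on the number from the slide confirms both claims: it is the square of a unit mod 13, and it is not a square mod 269.]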
Now the second paragraph
is something that isn't even easy to _check_.
I claim
that if you allow real and imaginary parts up to 2 to the 32,
there are exactly 12 numbers satisfying these conditions.
[Here they are:]
[1833789311]
[2759203409]
[837947981 + 2833822740i]
[1833802225 + 2823930792i]
[869472789 + 3277372900i]
[2993144889 + 1967806280i]
[3875176031]
[287210499 + 3918061420i]
[3929710199]
[4151897881]
[4292632971 + 1655841460i]
[2500597255 + 3928736112i]
[Anyone want to double-check? I wrote this code at the last minute.]
Exactly 12 Gaussian integers that are not squares
but that are unit squares at all of these primes.
All of them are non-squares mod 269.
So any non-square Gaussian integer in this range
is non-square at one of these small primes.
This computation covered 2 to the 64 numbers.
But the computation took only about 2 to the 50
clock cycles.
Those of you who don't think in terms of clock cycles
can translate this to your local units of time,
it was about 250 microcenturies.
The point is,
this was not a serious use of computer time.
You could easily go beyond this,
push it up to say 2 to the 80.
So what is doubly focused enumeration?
Well,
let me start
by pointing out the limits
of focused enumeration.
[Visual: focused-enumeration slide on computer right, shifting]
We're starting with 2 to the 64
Gaussian integers.
Now suppose we focus on the unit squares modulo 5.
In other words,
instead of stepping horizontally by 1 and vertically by 1,
instead of stepping through the _entire_ lattice,
we step by distance 5 horizontally and vertically,
starting from 1, -1, 2i, and -2i.
We end up seeing only 4 out of every 25 numbers,
so that makes the whole algorithm about six times faster.
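[A brute-force Python check of those four unit squares mod 5; the function name is mine:]

```python
from math import gcd

def unit_squares(m):
    """Brute-force the set of unit squares in Z[i]/m, for small m."""
    # a + b*i is a unit mod m exactly when its norm a^2 + b^2
    # is coprime to m; square every unit and collect the results.
    squares = set()
    for a in range(m):
        for b in range(m):
            if gcd(a * a + b * b, m) == 1:
                # (a + b*i)^2 = (a^2 - b^2) + (2ab)i
                squares.add(((a * a - b * b) % m, (2 * a * b) % m))
    return squares
```

[For m = 5 this returns exactly {1, -1, 2i, -2i}, written as pairs of residues: 4 elements out of 25.]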
Well, 5 is certainly not the optimal modulus.
A much more reasonable modulus
is 8 times 3 times 5 times 7 times 13 times 17 times 29 times 37.
You simply focus
on each of the 224 billion unit squares mod this thing,
one at a time of course.
For each one you enumerate the small Gaussian integers
congruent to that square.
Now this is about as far as focused enumeration can be pushed.
It would be nice
to add more primes.
Each new prime
reduces the number of elements that we have to consider,
but that would also make the number of possibilities here,
the number of unit squares mod whatever,
substantially larger.
It doesn't take any time to write down these 4 possibilities mod 5,
but as you add more primes this turns into a serious problem.
Enumerating these 224 billion possibilities is already a bottleneck,
never mind enumerating the small Gaussian integers that are congruent
to each of those possibilities.
So this is about as far as focused enumeration can go.
[Visual: doubly-focused-enumeration slide on computer right, shifting]
_Doubly_ focused enumeration
goes considerably beyond this.
It allows almost twice as many primes.
Here's a concrete example.
Let's define m1 and m2
as the moduli shown here.
Let's choose a fundamental domain
mod the lattice of multiples of m1m2,
a set of remainders mod m1m2,
let's just take the usual remainders,
real and imaginary parts
between 0 inclusive and m1m2 exclusive.
Now since m1 and m2 are coprime,
we can write any Gaussian integer
as the difference between u and v,
where u is a multiple of m1
and v is a multiple of m2.
Then by subtracting off some multiple of m1m2
from both u and v
we can put v inside the fundamental domain.
Of course that preserves
u being a multiple of m1 and v being a multiple of m2.
Now, in terms of u and v,
we can recognize the unit squares modulo m1m2
as having u working mod m2
and -v working mod m1:
u-v is a unit square mod m2 for example
if and only if u is a unit square mod m2
since v is 0 mod m2.
This is really advanced number theory.
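[A Python sketch of that decomposition, done one coordinate at a time by the Chinese remainder theorem; the function name and interface are mine, and the modular inverse via pow needs Python 3.8 or later:]

```python
def split(a, b, m1, m2):
    """Write a + b*i as u - v, with m1 dividing u, m2 dividing v,
    and v in the fundamental domain [0, m1*m2) x [0, m1*m2).
    m1 and m2 must be coprime."""
    M = m1 * m2
    inv = pow(m1, -1, m2)              # m1^{-1} mod m2
    def coord(c):
        u = m1 * ((c * inv) % m2)      # u = 0 mod m1 and u = c mod m2
        v = (u - c) % M                # v = 0 mod m2, reduced into [0, M)
        return v + c, v                # shift u by a multiple of M so u - v = c
    (ua, va), (ub, vb) = coord(a), coord(b)
    return (ua, ub), (va, vb)
```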
Good.
At this point
it's supposed to be blazingly obvious
that we're facing a computational geometry problem,
a near-neighbor problem.
[Visual: want-near-neighbors slide on computer right, shifting]
The v's that we're interested in,
the middle of this slide,
are exactly the multiples of m2
that are in the fundamental domain
and that work mod m1.
The u's that we're interested in
are exactly the multiples of m1
that are close to the fundamental domain
and that work mod m2.
What we're looking for are near neighbors.
The small unit squares mod m1m2
are exactly the differences of near neighbors between these two sets.
For example,
the number I showed you earlier,
8379 et cetera,
is the difference between u and v
where u is this multiple of m1
and v is this multiple of m2.
These numbers u and v are close together,
the difference is small.
Also u is a unit square mod m2
and -v is a unit square mod m1.
[Visual: m 2^51 slide on computer right, shifting]
So the computation I did
was to partition these u's and v's
into about 2 to the 36 boxes,
each box having size comparable to the radius I'm interested in.
There were several u's and v's in the average box,
and by comparing those I found several near neighbors,
for a total of about 2 to the 39
small unit squares mod m1m2,
each of which I quickly checked
to see how many _more_ primes worked beyond m1 and m2.
Let's look in more detail
at the generation of u's and v's.
Each box has something like
2 to the 15 multiples of m1 and m2,
out of which only a small fraction
are elements of S and T.
I could have just checked each multiple
to see which ones work mod m2 and m1,
in which case it would have been better
to balance the parameters a little differently,
but anyway there's a standard method
to write down the elements of S more efficiently
in a random-looking order,
which I then sorted into a useful order.
I did _not_ want to write down _all_ of S at once,
that would have gotten into disk sorting which is not incredibly fast,
but there are techniques to use quite a lot less memory.
This part is quite standard;
I don't think I need to go into any more detail here.
I do want to mention that
this time-memory tradeoff is a good idea
only because our current computer architectures,
conventional computers,
make computation seem more expensive than it actually is,
while memory is quite inexpensive.
For a better-balanced computer,
with more computational units,
you should _not_ use this tradeoff;
you should simply check each multiple of m1 to see if it's in S.
Okay.
[Visual: advantage slide on computer right, shifting]
Last slide.
One way to understand the benefit of doubly focused enumeration
is from these asymptotic formulas
which say that _focused_ enumeration,
when optimized,
saves a factor of roughly N to the 1 over lig log N,
lig is log base 2,
log is log base e,
so in other words that's
2 to the log N over log log N,
where N is the number of points you're sieving.
_Doubly_ focused enumeration
saves the _square_ of the same factor.
I'm ignoring various 1 plus o(1) powers here.
As N grows,
the advantage of doubly focused enumeration
also grows.
I said in the abstract
that it saves a factor of roughly 1000,
which is typical for interesting sizes of N these days.
But instead of 2 to the 10
you may prefer to remember 2 to the log N over log log N.
Let me close
with a quick advertisement
regarding my backup plan for this talk.
Ahem.
Many pieces of white paper,
one dollar.
Tape to attach white paper to the wall,
five dollars.
[Showing computer projector to the audience:]
InFocus LP70+ computer projector,
a couple thousand dollars.
Not having to worry
about whether the official conference equipment works properly,
priceless.
[The audience was laughing at this point,]
[so I skipped the followup:]
[There are some things money can't buy.]
[For everything else there's MasterCard.]
Thank you for your attention.