D. J. Bernstein
Data structures and program structures
MCS 501 and CS 501, Spring 2002

What I did in class

2002.01.07: Can sort n numbers with O(n log n) comparisons. Time may be much larger; account for length of numbers and overhead. Another cost measurement: dollars = product of time and cost of computer. Can sort n small numbers for n^(2+o(1)) dollars. Easier way to achieve same result: odd-even transposition sort. Example of odd-even transposition sort. General statement of odd-even transposition sort. Better result: Schimmler sort. Example of Schimmler sort. Time analysis, by induction. Cost analysis: only n^(1.5+o(1)) dollars.
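
A Python sketch of odd-even transposition sort (a serial simulation; the class version runs one comparison per processor per step on a 1-dimensional mesh):

    def odd_even_transposition_sort(x):
        # n rounds suffice to sort n numbers; rounds alternate between
        # comparing pairs (x1,x2),(x3,x4),... and pairs (x2,x3),(x4,x5),...
        x = list(x)
        n = len(x)
        for k in range(n):
            for i in range(k % 2, n - 1, 2):
                if x[i] > x[i + 1]:
                    x[i], x[i + 1] = x[i + 1], x[i]
        return x

    assert odd_even_transposition_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]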

2002.01.09: [Read chapter 27 of textbook.] Summary of situation for 1-dimensional mesh, 2-dimensional mesh, 3-dimensional mesh, hypercube. Specify your model of computation and cost measure. Pictorial proof that odd-even transposition sort works for 0-1 sequences. Three cases in inductive proof: nth number is 1; nth number is 0 and n is odd; nth number is 0 and n is even. Example of zero-one law: odd-even transposition sort works for all sequences. Schimmler sort for a typical 0-1 sequence.

2002.01.11: Formal definition of odd(x). Example: odd(3,1,4,1,5) = (1,3,1,4,5). Formal definition of even(x). Example: even(2,7,1,8,2) = (2,1,7,2,8). Formal definition of oddeven_k(x). oddeven_k(x) is a permutation of x. Examples of oddeven_k(2,7,1,8,2). Can compute oddeven_k in k steps on a size-n mesh. Crucial fact: oddeven_n(x[1],...,x[n]) is sorted. Diagram: algorithm, trace of computation, proof. Exercise: oddeven_n(x[1],...,x[n-1],1) is sorted if oddeven_{n-1}(x[1],...,x[n-1]) is sorted and all x[k]'s are in {0,1}. A similar fact holds with a final 0 in place of the final 1. Thus oddeven_n(x[1],...,x[n]) is sorted if all x[k]'s are in {0,1}. The [] notation. If [c<y1] <= [c<y2] <= ... <= [c<yn] for all c then y1 <= y2 <= ... <= yn. Proof. The zero-one strategy to prove that y1,y2,...,yn is sorted. Proof that oddeven_n(x[1],...,x[n]) is sorted.
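
The definitions transcribed into Python (positions are 1-based in class, 0-based here; this sketch assumes oddeven_k starts with an odd step):

    def odd(x):
        # sort each pair (x1,x2), (x3,x4), ...
        y = list(x)
        for i in range(0, len(y) - 1, 2):
            if y[i] > y[i + 1]:
                y[i], y[i + 1] = y[i + 1], y[i]
        return y

    def even(x):
        # sort each pair (x2,x3), (x4,x5), ...
        y = list(x)
        for i in range(1, len(y) - 1, 2):
            if y[i] > y[i + 1]:
                y[i], y[i + 1] = y[i + 1], y[i]
        return y

    def oddeven(k, x):
        # oddeven_k(x): k alternating steps
        for j in range(k):
            x = odd(x) if j % 2 == 0 else even(x)
        return x

    assert odd([3, 1, 4, 1, 5]) == [1, 3, 1, 4, 5]
    assert even([2, 7, 1, 8, 2]) == [2, 1, 7, 2, 8]
    assert oddeven(5, [2, 7, 1, 8, 2]) == [1, 2, 2, 7, 8]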

2002.01.14: [Skim chapters 30 and 31 of textbook. Look at description of "counting sort" in textbook.] Schimmler's algorithm works for any matrix of 0's and 1's.

2002.01.16: Sorting on a Pentium III-1000. LSD radix sort. Stable counting sort.
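
A sketch of both routines in Python, using base-256 digits (the radix is an assumption; any convenient digit size works):

    def stable_counting_sort(a, key):
        # key(v) in {0,...,255}; stability preserves the order of equal keys
        count = [0] * 257
        for v in a:
            count[key(v) + 1] += 1
        for d in range(256):
            count[d + 1] += count[d]       # count[d]: first output slot for digit d
        out = [None] * len(a)
        for v in a:
            out[count[key(v)]] = v
            count[key(v)] += 1
        return out

    def lsd_radix_sort(a, nbytes):
        # least-significant digit first; stability makes each pass
        # preserve the work of the previous passes
        for shift in range(0, 8 * nbytes, 8):
            a = stable_counting_sort(a, lambda v: (v >> shift) & 255)
        return a

    assert lsd_radix_sort([513, 2, 77, 258, 2], 2) == [2, 2, 77, 258, 513]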

2002.01.18: LSD radix sort, continued. MSD radix sort. Unstable counting sort without extra space.

2002.01.23: Standard asymptotic model of sequential computation: multitape Turing machine. Standard representation of an integer as a string of bits. Time to add two integers. Time to divide an integer by 2. Recursive algorithm to multiply two integers. Quadratic time. Can do much better: fast multiplication, essentially linear time. Straightforward algorithm to multiply many integers. Analysis of straightforward algorithm. Divide-and-conquer algorithm. Divide-and-conquer algorithm takes essentially linear time, using fast multiplication.
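
The two algorithms for multiplying many integers, sketched in Python (the built-in * stands in for fast multiplication):

    def product_straightforward(a):
        r = 1
        for x in a:
            r = r * x          # the accumulated product keeps growing
        return r

    def product_tree(a):
        # divide and conquer: multiply the two halves recursively
        if len(a) == 1:
            return a[0]
        m = len(a) // 2
        return product_tree(a[:m]) * product_tree(a[m:])

    assert product_tree([2, 3, 5, 7, 11]) == 2310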

2002.01.25: Examples of several trees to multiply 2 3 5 7 11. Underlying multiplication properties: associativity; commutativity. Balanced tree in divide-and-conquer algorithm has logarithmic height, so its size is only logarithmically larger than the original numbers. Can do better when sizes of numbers are unbalanced. Example: one large number, many small numbers; left-to-right tree, balanced tree, right-to-left tree. Strassen's Huffman-tree multiplication algorithm: replace two smallest numbers with their product; repeat. Need fast priority queue, such as heap, so that overhead does not dominate multiplications. Can multiply many 2x2 matrices quickly. Beware that matrix multiplication is generally not commutative.
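
A sketch of the smallest-two-first strategy in Python, with heapq as the priority queue (safe here because integer multiplication is commutative):

    import heapq

    def product_smallest_first(a):
        # repeatedly replace the two smallest numbers by their product
        h = list(a)
        heapq.heapify(h)
        while len(h) > 1:
            x = heapq.heappop(h)
            y = heapq.heappop(h)
            heapq.heappush(h, x * y)
        return h[0]

    assert product_smallest_first([2, 3, 5, 7, 11]) == 2310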

2002.01.28: Class cancelled.

2002.01.30: Comments on homework 1 and homework 2. Using product of 2x2 integer matrices to compute sums of rational numbers. Using product of 2x2 integer matrices to compute numerator of 1+1/2!+1/3!+...+1/k!.
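
One way to set up the factorial sum as a matrix product (a reconstruction; the matrices used in class may differ): the numerator p_k of 1+1/2!+...+1/k! over k! satisfies p_j = j p_{j-1} + 1, i.e., (p_j, 1) = [[j,1],[0,1]] (p_{j-1}, 1).

    def mat_mul(A, B):
        # 2x2 integer matrix product
        return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
                [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

    def factorial_sum_numerator(k):
        # (p_k, 1) = M_k M_{k-1} ... M_1 (0, 1), so p_k is the
        # top-right entry of the matrix product
        P = [[1, 0], [0, 1]]
        for j in range(1, k + 1):
            P = mat_mul([[j, 1], [0, 1]], P)
        return P[0][1]

    assert factorial_sum_numerator(5) == 206    # 206/5! = 1 + 1/2! + ... + 1/5!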

2002.02.01: Using product of 2x2 integer matrices to compute continued fractions. Exercise: using product of 2x2 integer matrices for a 2-term recurrence. Divide-and-conquer algorithm to multiply many 2x2 matrices quickly. Associativity of 2x2 matrix multiplication. Size of product of two 2x2 matrices. Size of product of many 2x2 matrices. Final multiplication in divide-and-conquer algorithm takes essentially linear time.
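
For the continued-fraction case, the classical matrices are [[a_i,1],[1,0]]: their product carries the numerator and denominator of a_0 + 1/(a_1 + 1/(a_2 + ...)). A sketch, tracking only the relevant entries of the product:

    def convergent(a):
        # first column of [[a0,1],[1,0]] [[a1,1],[1,0]] ... is (p, q)
        p, pp, q, qq = 1, 0, 0, 1
        for ai in a:
            p, pp = ai * p + pp, p
            q, qq = ai * q + qq, q
        return p, q

    assert convergent([1, 2, 2, 2]) == (17, 12)    # 1+1/(2+1/(2+1/2)) = 17/12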

2002.02.04: Newton's method; why fast multiplication implies fast division.
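
A floating-point sketch of the idea (real fast division runs Newton's method in fixed point, doubling the precision each iteration):

    def reciprocal(b):
        # Newton's method for 1/b: x <- x*(2 - b*x); the error
        # 1 - b*x squares at every step
        x = 0.01                      # crude start; assumes 1 <= b <= 100
        for _ in range(60):
            x = x * (2 - b * x)
        return x

    # division reduced to multiplication: a/b = a * (1/b)
    assert abs(7 * reciprocal(3) - 7 / 3) < 1e-9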

2002.02.06: Cook's division algorithm. Proof that Cook's division algorithm works. Proof that Cook's division algorithm has at most 33 adjustment steps. Time for Cook's division algorithm. Example.

2002.02.08: Exercise: computing x/x1, x/x2, ..., x/xn with n^{1+o(1)} multiplications, where x = x1 x2 ... xn. Recursive algorithm to compute a mod b1, a mod b2, ..., a mod bk. Example: 61 mod 2, 3, 5, 7. Product tree picture for k=8. Example of 61 again using a left-to-right product tree. What the Chinese remainder theorem says. Definition of coprime. Output of Chinese remainder theorem as linear combination of inputs. Example.
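
A remainder-tree sketch in Python (a serious implementation builds the product tree once, bottom up, instead of recomputing products):

    def remainders(a, moduli):
        # a mod b for every b in moduli: reduce a modulo the product
        # of each half, then recurse into that half
        if len(moduli) == 1:
            return [a % moduli[0]]
        m = len(moduli) // 2
        pl = pr = 1
        for b in moduli[:m]: pl *= b
        for b in moduli[m:]: pr *= b
        return remainders(a % pl, moduli[:m]) + remainders(a % pr, moduli[m:])

    assert remainders(61, [2, 3, 5, 7]) == [1, 1, 1, 5]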

2002.02.11: Can compute (b/b1) mod b1, (b/b2) mod b2, ..., (b/bk) mod bk, where b = b1 b2 ... bk, in essentially linear time: use (b/bi) mod bi = (b mod bi^2) / bi. Example. Computing inverse of (b/bi) mod bi in essentially linear time: later. Total time so far is essentially linear. Using fast addition of rational numbers to finish Chinese-remainder-theorem computation. There are other fast Chinese-remainder-theorem algorithms. Using division to isolate small prime factors of many integers.
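
The (b/bi) mod bi trick in one line of Python:

    def cofactor_mod(b, bi):
        # b mod bi^2 equals bi * ((b/bi) mod bi), so one division recovers it
        return (b % (bi * bi)) // bi

    b = 3 * 5 * 7
    assert cofactor_mod(b, 5) == (b // 5) % 5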

2002.02.13: Recursive use of division to isolate small prime factors of many integers. Example: 75 68 70 82 over 2,3,5,7. Algorithm produces all small prime factors of each integer. Proof. Tree for trace of algorithm; example for 75 68 70 82 over 2,3,5,7; why algorithm takes essentially linear time.
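
A simplified serial stand-in for the class algorithm (the real one replaces the loop over integers with a remainder tree; the point is that gcd(a, P mod a) = gcd(a, P) collects exactly the listed primes dividing a):

    from math import gcd

    def small_prime_divisors(integers, primes):
        # for each integer a > 1, the listed primes dividing a
        P = 1
        for p in primes:
            P *= p
        result = []
        for a in integers:
            g = gcd(a, P % a)
            result.append([p for p in primes if g % p == 0])
        return result

    assert small_prime_divisors([75, 68, 70, 82], [2, 3, 5, 7]) == [[3, 5], [2], [2, 5, 7], [2]]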

2002.02.15: Problem of computing all prime factors below k of a1,...,ak. Solution using previous recursive algorithm. Total time, if each ai has roughly (log k)^2 bits. Time per integer. Each ai is independent in problem, but not in solution. Speed of previous solutions: trial division; Pollard's method (1974); Lenstra's elliptic-curve method (1987); the Lenstra-Pila-Pomerance hyperelliptic-curve method (1993). Comments on size of exp sqrt(log k log log k). Some fast-multiplication history: time n^(lg 3+o(1)) by Karatsuba (1963); time n^(1+o(1)) by Toom (1963); time n (log n)^(O(1)) by Schoenhage and Strassen, independently Pollard, independently Nicholson (1971); recursive division by Moenck and Borodin (1974); recursive factorization by Bernstein (2000). Philosophical note on time needed to perform many computations.

2002.02.18: Reducing integer multiplication to polynomial multiplication.
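
A sketch of the reduction in Python: split each integer into digits, multiply the digit polynomials, then substitute the radix back in (schoolbook polynomial multiplication stands in for the fast method):

    def digits(a, radix=256):
        # little-endian digits of a
        d = []
        while a:
            d.append(a % radix)
            a //= radix
        return d or [0]

    def poly_mul(f, g):
        h = [0] * (len(f) + len(g) - 1)
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                h[i + j] += fi * gj
        return h

    def int_mul(a, b, radix=256):
        # evaluate the product polynomial at x = radix;
        # the evaluation handles all the carries at once
        h = poly_mul(digits(a, radix), digits(b, radix))
        result, power = 0, 1
        for c in h:
            result, power = result + c * power, power * radix
        return result

    assert int_mul(12345, 6789) == 12345 * 6789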

2002.02.20: Algebraic complexity of polynomial multiplication is subquadratic. Idea of the fast Fourier transform.

2002.02.22: Details of the fast Fourier transform.
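
A recursive sketch of the transform in Python, for length a power of 2 (complex floating-point roots of unity here; the class details may work in a different ring):

    import cmath

    def fft(f):
        # evaluate the polynomial f at all n-th roots of unity by
        # splitting f into even-index and odd-index coefficients
        n = len(f)
        if n == 1:
            return list(f)
        ev, od = fft(f[0::2]), fft(f[1::2])
        out = [0] * n
        for k in range(n // 2):
            w = cmath.exp(-2j * cmath.pi * k / n)
            out[k] = ev[k] + w * od[k]
            out[k + n // 2] = ev[k] - w * od[k]
        return out

    vals = fft([1, 2, 3, 4])                        # values of 1 + 2x + 3x^2 + 4x^3
    assert abs(vals[0] - (1 + 2 + 3 + 4)) < 1e-9    # value at x = 1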

2002.02.25: Implementing the sieve of Eratosthenes on a mesh.
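
The underlying serial sieve, for reference (the lecture's content is how to lay this computation out on a mesh, which this sketch does not show):

    def sieve(n):
        # cross out the multiples of each prime up to n
        is_prime = [False, False] + [True] * (n - 1)
        for p in range(2, n + 1):
            if is_prime[p]:
                for m in range(p * p, n + 1, p):
                    is_prime[m] = False
        return [p for p in range(2, n + 1) if is_prime[p]]

    assert sieve(20) == [2, 3, 5, 7, 11, 13, 17, 19]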

2002.02.27: Implementing general PRAM algorithms on a mesh.

2002.03.01: More on fast multiplication.

2002.03.04: Saving log factors in fast multiplication. Euclid's algorithm for modular inverses.
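
A sketch of the extended Euclid computation in Python:

    def modular_inverse(a, m):
        # maintain r = u*a + v*m (v not stored); when the remainder
        # reaches gcd(a,m) = 1, u is the inverse of a mod m
        r0, r1, u0, u1 = m, a % m, 0, 1
        while r1:
            q = r0 // r1
            r0, r1 = r1, r0 - q * r1
            u0, u1 = u1, u0 - q * u1
        assert r0 == 1, "a and m must be coprime"
        return u0 % m

    assert (3 * modular_inverse(3, 7)) % 7 == 1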

2002.03.06: Fast modular inverses.

2002.03.08: Typical parallelization problem: finding collisions in a 128-bit-to-128-bit function. Trying 2^128 random pairs. Hashing or sorting 2^64 random values.

2002.03.11: Trying 2^108 pairs on each of 2^20 computers independently. Trying 2^54 values on each of 2^20 computers independently, with 2^54 memory per computer. Trying 2^44 values on each of 2^20 computers, with 2^44 memory per computer and with massive communication. A few comparisons that reflect all collisions: Pollard's rho method. Pollard's rho method is inherently serial.
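
A toy sketch of the rho method in Python: Floyd's cycle-finding keeps only two values in memory, and the two predecessors of the point where the tail meets the cycle form a collision (assumes the start is not already on the cycle; the 8-bit function below is a stand-in for the 128-bit function):

    def rho_collision(f, x0):
        # phase 1: tortoise and hare detect the cycle
        tortoise, hare = f(x0), f(f(x0))
        while tortoise != hare:
            tortoise, hare = f(tortoise), f(f(hare))
        # phase 2: restart the tortoise; the two walks first agree
        # one step after the colliding pair
        tortoise = x0
        while f(tortoise) != f(hare):
            tortoise, hare = f(tortoise), f(hare)
        return tortoise, hare

    f = lambda x: (x * x + 1) % 256
    a, b = rho_collision(f, 0)
    assert a != b and f(a) == f(b)     # finds the pair (26, 90)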

2002.03.13: Trying 2^54 values on each of 2^20 computers independently, with very little memory per computer, by Pollard's rho method. Even fewer comparisons that reflect all collisions: distinguished points. Parallel collision search: trying 2^44 values on each of 2^20 computers, with very little memory per computer, and not much communication.

2002.03.15: Homework on parallel breadth-first search.

2002.03.18: No class.

2002.03.20: No class.

2002.03.22: No class.

2002.03.25: Parallel prefix sums.
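
A serial simulation of the logarithmic-depth scan in Python; each list comprehension is one parallel step, and there are about log2 n of them:

    def prefix_sums(x):
        # doubling scheme: after the round with distance d, position i
        # holds the sum of the last 2d entries ending at i (clipped at the start)
        y = list(x)
        d = 1
        while d < len(y):
            y = [y[i] + (y[i - d] if i >= d else 0) for i in range(len(y))]
            d *= 2
        return y

    assert prefix_sums([3, 1, 4, 1, 5]) == [3, 4, 8, 9, 14]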

2002.03.27: Using parallel prefix sums.

2002.03.29: Integer addition with parallel prefix sums.
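
A sketch of the carry computation as a prefix operation: the (generate, propagate) pairs combine associatively, so the serial scan below could be replaced by parallel prefix.

    def combine(hi, lo):
        # carry leaves the combined segment if the high part generates one,
        # or propagates one generated by the low part
        return (hi[0] or (hi[1] and lo[0]), hi[1] and lo[1])

    def add_bits(a, b):
        # a, b: little-endian 0/1 lists of equal length
        gp = [(ai & bi, ai ^ bi) for ai, bi in zip(a, b)]
        carries, acc = [0], None
        for x in gp:
            acc = x if acc is None else combine(x, acc)
            carries.append(int(acc[0]))       # carry out of this position
        return [p ^ c for (g, p), c in zip(gp, carries)] + [carries[-1]]

    assert add_bits([1, 0, 1], [1, 1, 1]) == [0, 0, 1, 1]    # 5 + 7 = 12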

2002.04.01: Parallel prefix for linked lists.
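
A serial simulation of pointer jumping in Python: every round, each node adds in its successor's value and doubles the span of its pointer, so about log n parallel rounds finish the whole list (this version sums toward the end of the list; the other direction gives prefix sums):

    def list_suffix_sums(nxt, val):
        # nxt[i]: index of the next node, or None at the end of the list;
        # returns, for each node, the sum from that node to the end
        nxt, val = list(nxt), list(val)
        active = True
        while active:
            active = False
            newv, newn = val[:], nxt[:]
            for i in range(len(val)):         # one parallel round
                if nxt[i] is not None:
                    newv[i] = val[i] + val[nxt[i]]
                    newn[i] = nxt[nxt[i]]
                    active = True
            val, nxt = newv, newn
        return val

    assert list_suffix_sums([1, 2, None], [1, 2, 3]) == [6, 5, 3]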

2002.04.03: Data compression with Huffman trees. Average path length in Huffman tree is at least entropy.
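
A sketch of the tree construction in Python, with heapq as the priority queue (it returns code lengths, i.e., depths in the tree, rather than the tree itself):

    import heapq

    def huffman_depths(freqs):
        # repeatedly merge the two lightest subtrees; each merge pushes
        # every symbol in them one level deeper
        h = [(f, [s]) for s, f in freqs.items()]
        heapq.heapify(h)
        depth = {s: 0 for s in freqs}
        while len(h) > 1:
            f1, s1 = heapq.heappop(h)
            f2, s2 = heapq.heappop(h)
            for s in s1 + s2:
                depth[s] += 1
            heapq.heappush(h, (f1 + f2, s1 + s2))
        return depth

    assert huffman_depths({'a': 5, 'b': 2, 'c': 1, 'd': 1}) == {'a': 1, 'b': 2, 'c': 3, 'd': 3}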

2002.04.08: Huffman tree achieves shortest possible average path length among all binary trees.

2002.04.10: Universal compression.

2002.04.12: Universal compression, continued. Example of move-to-front.

2002.04.15: Move-to-front examples. Characterization of each move-to-front output byte as the number of distinct bytes seen since the previous occurrence of the same byte. Time for move-to-front encoding on a serial computer. Exercise: parallel. Time for move-to-front decoding on a serial computer. Matrix version to apply parallel prefix.
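
Encoder and decoder in Python (a sketch; plain lists stand in for whatever table structure was used in class):

    def mtf_encode(data, alphabet):
        # emit each byte's current position, then move it to the front;
        # recently used bytes therefore get small numbers
        table, out = list(alphabet), []
        for c in data:
            i = table.index(c)
            out.append(i)
            table.insert(0, table.pop(i))
        return out

    def mtf_decode(codes, alphabet):
        table, out = list(alphabet), []
        for i in codes:
            out.append(table[i])
            table.insert(0, table.pop(i))
        return "".join(out)

    codes = mtf_encode("bananaa", "abn")
    assert codes == [1, 1, 2, 1, 1, 1, 0]
    assert mtf_decode(codes, "abn") == "bananaa"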

2002.04.17: The Burrows-Wheeler transform.
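
The forward transform in a few lines of Python (a quadratic-space sketch; "$" is an assumed end-of-string marker that sorts before the other characters):

    def bwt(s):
        # sort all rotations of s; output the last character of each
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(r[-1] for r in rotations)

    assert bwt("banana$") == "annb$aa"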