D. J. Bernstein
Data structures and program structures

# MCS 501 and CS 501, Algorithms II, Fall 2005

## 7 May 2005

The prerequisite for this course is a baby algorithms course, such as CS 401 or MCS 401. In a baby algorithms course, you saw how to save time in some fundamental computations, such as sorting small integers and finding shortest paths. Some algorithm-design techniques, such as dynamic programming, are helpful in a wide variety of computations; other techniques are specific to particular problems.

There are several ways to continue your study of algorithms:

• Look at mathematical techniques helpful in understanding algorithm speed. This is the subject of MCS 503.
• Look at algorithms that perform the same computations in even less time. Of course, as the algorithms become faster and faster, you have to look at them more and more closely to spot any further speedups.
• Look at other complexity measures: for example, space instead of time. When I taught MCS 501 in Spring 2002, I started by explaining how to sort n small integers in time n^(1/2+o(1)) on a 2-dimensional parallel machine of size n^(1+o(1)). Parallelism is an increasingly important topic in algorithm research and was a major theme of the course.
• Look at other computational problems. For example, in Spring 2002, I explained the Burrows-Wheeler data-compression algorithm; this is the algorithm used in bzip2. One of the major course topics was essentially-linear-time arithmetic.

Here are some questions that I wrote for the MCS preliminary exam in 2002 based on the material I had taught in MCS 501:

• Assume that two n-digit integers, represented as strings in the usual way, can be multiplied in time n^{1+o(1)}. Explain how to compute 1 x 2 x 3 x ... x n in time n^{1+o(1)}.
• State an algebraic algorithm to multiply two degree-n polynomials using n^{1+o(1)} coefficient operations.
• State a parallel algorithm for n computers that computes the sum of two n-bit numbers in time (log n)^{O(1)}.
• Let f be a function from 128-bit strings to 128-bit strings. Explain how 2^{20} computers, each computer evaluating f at approximately 2^{44} inputs, can reasonably attempt to find a collision in f. (A collision is a pair (x,y) such that x!=y but f(x)=f(y).)
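To illustrate what the last question is asking for: here is a serial, toy-scale collision finder that simply remembers every output (function name is mine). The exam question asks for the interesting part, namely how 2^{20} computers can cooperate without storing 2^{64} values; this sketch only shows what a collision is.

```python
def find_collision(f, bits):
    # Evaluate f on all `bits`-bit inputs, remembering every output;
    # stop at the first repeated output value.
    seen = {}
    for x in range(2 ** bits):
        y = f(x)
        if y in seen:
            return (seen[y], x)   # (earlier input, current input)
        seen[y] = x
    return None                    # f is injective on this domain
```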

There's no reason for me to cover the same topics in Fall 2005. My own interests in algorithms are quite broad, and I'm happy to select course topics to match student interests. Let me know what you're interested in hearing about!

## 25 June 2005

The lecture is scheduled for 14:00-14:58 MWF; note the 8 extra minutes. There will be no lectures on the following dates:
• Monday 19 September.
• Wednesday 21 September.
• Friday 23 September.
• Monday 7 November.
• Wednesday 9 November.
• Friday 11 November.
This information overrides the information in the schedule online.

The class location is listed as 2AH, which I think means Addams Hall.

Call numbers for registration: 12364 for MCS 501; 21116 for CS 501.

My office hours will be Monday 15:10-16:00, 16:10-17:00, and 17:10-18:00 in 410 SEO.

## 22 August 2005

The class location was 306 Addams Hall but is now changing to 303 Stevenson Hall. Please go directly to 303 Stevenson Hall starting Wednesday 24 August 2005.

Today's class material: How fast is sorting? Several difficulties with an answer such as ``n log n comparisons'': comparison time isn't specified; time for other operations isn't specified; the sorting machine isn't specified. More precise question: a machine sorts n integers in {0,1,2,...,n^2}, each integer represented in the usual way as a string of bits in binary; how long does this machine take? Sorting machine #1: single-tape Turing machine using insertion sort. Machine finishes in n^(2+o(1)) seconds; machine costs n^(1+o(1)) dollars. Sorting machine #2: two-dimensional RAM using merge sort. Machine finishes in n^(1.5+o(1)) seconds; machine costs n^(1+o(1)) dollars.
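The two sorting algorithms mentioned above, sketched in Python with the machine models ignored (what the machines make precise is how much each comparison and each data movement costs):

```python
def insertion_sort(a):
    # Theta(n^2) comparisons in the worst case; what machine #1 runs.
    a = list(a)
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0 and a[j - 1] > x:   # shift larger elements right
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a

def merge_sort(a):
    # Theta(n log n) comparisons; what machine #2 runs.
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```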

## 25 August 2005

Yesterday's class material: Physical constraints requiring high latency for RAM. Sorting machine #3: pipelined two-dimensional RAM using radix-2 sort. Machine finishes in n^(1+o(1)) seconds; machine costs n^(1+o(1)) dollars. Beginning of description of sorting machine #4: two-dimensional mesh.

## 28 August 2005

One good reference for Schimmler sort is Schimmler's 1987 article ``Fast sorting on the instruction systolic array.'' The second column-sorting step in Schimmler's algorithm uses a slightly more complicated algorithm than I explained in class, saving sqrt(n) parallel compare-exchange steps out of about 8 sqrt(n).

Class material from Friday: Odd-even transposition sort of n^(0.5) integers on a one-dimensional mesh. Example of odd-even transposition sort. Sorting machine #4: two-dimensional mesh using Schimmler sort. Example of Schimmler sort. Time analysis of Schimmler sort. Machine finishes in n^(0.5+o(1)) seconds; machine costs n^(1+o(1)) dollars. What's covered in a prerequisite course such as MCS/CS 401:

• Some reductions of ``time'' for a few important computational problems, such as sorting and finding shortest paths.
• Some algorithm design techniques, such as divide-and-conquer and dynamic programming.
Research directions after the prerequisite course.

## 31 August 2005

Class material from Monday: This is not a course about constant factors. This is not a course about mathematical methods of algorithm analysis. This course will consider more computational problems. This course will consider more cost measures. Examples of algorithm cost measures. Will prove that n steps of odd-even transposition sort are enough for n numbers. Enough to prove that small numbers come before large numbers. Enough to prove that all arrays of 0's and 7's are sorted. Inductive proof, case 1: last number is a 7. Inductive proof, case 2: last number is a 0.
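A serial simulation of odd-even transposition sort on a one-dimensional mesh (round r performs, in parallel on the mesh, the compare-exchanges at positions with the parity of r), together with an exhaustive check of the claim from Monday's class for small n: n rounds sort every array of 0's and 7's of length n. Function names are mine.

```python
from itertools import product

def odd_even_transposition_sort(a):
    # n rounds; round r compare-exchanges pairs (i, i+1) with i = r mod 2
    a = list(a)
    n = len(a)
    for r in range(n):
        for i in range(r % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

def check_all_07_arrays(n):
    # every length-n array of 0's and 7's is sorted after n rounds
    return all(odd_even_transposition_sort(a) == sorted(a)
               for a in product((0, 7), repeat=n))
```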

Today's class material: Schimmler sort on typical balanced array of 0's and 7's. Schimmler sort on typical unbalanced array of 0's and 7's. The parallel price-performance myth. Reality. Example: Schimmler sort. Example: dual-core CPUs.

## 9 September 2005

Class material from last Friday: How expensive is addition of two n-bit integers in the usual binary representation, producing an (n+1)-bit integer in the usual binary representation? Answer depends on addition machine and cost measure. First cost measure: number of instructions; ``time'' in a baby algorithms course. Algorithm performing Theta(n) instructions. Second cost measure: wall-clock time, i.e., real time. Previous algorithm takes time n^(1.5+o(1)). Pipelined algorithm takes time n^(1+o(1)). Third cost measure: circuit depth. Converting carry-chain recurrence to product of many matrices. Parallel multiplication of many matrices.

Class material from Wednesday: Example of integer addition. Matrix view of same example. Simplest algorithm taking depth n^(o(1)) is very bad in other cost measures: number of instructions, mesh price-performance ratio, etc. Fourth cost measure: circuit gates times circuit depth. Serial algorithm has cost n^(2+o(1)). Parallel algorithm has cost n^(2+o(1)). Example for n=6. Better parallel algorithm, merging intermediate results. Cost n^(1+o(1)); exponent 1 is optimal. Constant factors: look up ``parallel prefix.'' Increasing cost measure by expanding circuit depth to circuit delay, accounting for distance in circuit; raises cost to n^(3/2+o(1)). Further increasing cost measure by expanding number of gates to length of wires, accounting for price of wiring; again cost n^(3/2+o(1)). Mesh price-performance ratio is also n^(3/2+o(1)).
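The ``parallel prefix'' view of carrying, sketched serially (names are mine): position i has a generate bit g[i] = a[i] AND b[i] and a propagate bit p[i] = a[i] XOR b[i]; the (g,p) pairs combine under the associative operator below, and the prefix values of that operator are exactly the carries. A circuit evaluates the prefix scan in depth O(log n); this sketch only checks the algebra.

```python
def combine(low, high):
    # (g, p) for a low bit-span followed by a high bit-span:
    # the joint span generates a carry if the high span does, or if the
    # high span propagates a carry generated by the low span.
    gl, pl = low
    gh, ph = high
    return (gh | (ph & gl), ph & pl)

def add_bits(a_bits, b_bits):
    # a_bits, b_bits: equal-length little-endian 0/1 lists;
    # returns the (n+1)-bit little-endian sum.
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]
    carries = [0]        # carry into position 0
    span = (0, 1)        # identity: empty span generates 0, propagates 1
    for g, p in gp:      # serial prefix scan over the combine operator
        span = combine(span, (g, p))
        carries.append(span[0])
    return [p ^ c for (g, p), c in zip(gp, carries)] + [carries[-1]]
```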

Today's class material: Why use price-performance ratio, i.e., price-time product? Minimizing time without regard to price is unrealistic. But why price-performance ratio rather than price + time, for example, or price times time^2, or some other combination? Using price-performance ratio as cost has two nice features: k independent computations in serial are, together, k times as expensive as one computation; and k independent computations in parallel are, together, k times as expensive as one computation. Can also graph (price,time) pairs. Advantages and disadvantages of various time measures. Advantages and disadvantages of various price measures. Back to addition: can reduce mesh price-performance ratio to n^(1+o(1)) by changing the problem, using redundant representation of integers. Moving on to integer multiplication: How expensive is multiplication of two n-bit integers in the usual binary representation, producing a 2n-bit integer in the usual binary representation? Simple solution, n^{2+o(1)} instructions: multiply each bit of the first integer u by the second integer v, and add the results. 5-bit example.
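The simple n^{2+o(1)}-instruction solution above, sketched for integers in binary (function name is mine): each set bit of u contributes a shifted copy of v.

```python
def schoolbook_multiply(u, v, n):
    # multiply each bit of the n-bit integer u by v, and add the results
    result = 0
    for i in range(n):
        if (u >> i) & 1:
            result += v << i   # bit i of u times v, shifted into place
    return result
```

For a 5-bit example, schoolbook_multiply(21, 19, 5) adds 19, 19 shifted by 2, and 19 shifted by 4, since 21 is 10101 in binary.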

## 12 September 2005

Today's class material: Multiplication algorithm parallelizes easily. Tree of sums achieves depth n^{o(1)}. Simple tree layout: area n^{2+o(1)}, including wires; depth n^{0+o(1)}, not counting wire delay; time n^{1+o(1)}, counting wire delay. If massively parallel computation is so great, why don't existing CPUs provide it? Answer: for multiplication, they do! Notes on Duron multiplication performance. Atrubin 1-dimensional mesh: area n^{1+o(1)}, no long wires; time n^{1+o(1)}. We'll see Preparata 2-dimensional mesh: area n^{1+o(1)}, no long wires; time n^{0.5+o(1)}. The Brent-Kung ``area-time theorem'': every circuit for n-bit integer multiplication has area times depth^2 larger than a constant times n^2; here area includes wire area, while depth does not include wire delay. Optimality of the Preparata mesh.

If you're interested in learning more about the Brent-Kung area-time theorem, read the original paper.

If you're interested in the Atrubin mesh, look for it in Section 4.3.3 of Knuth's Art of Computer Programming.

If you're interested in achieving multiplication depth Theta(lg n) rather than Theta((lg n)^2), look up ``Wallace trees.''

## 25 September 2005

There are several ways to view the fast Fourier transform (FFT). I presented it as a method of multiplying polynomials: specifically, an algorithm to multiply two polynomials with complex coefficients mod x^n-1, where n is a power of 2.

The FFT is an ``algebraic algorithm'': it produces the output coefficients using complex additions, complex subtractions, and complex multiplications starting from the input coefficients and various constants. I took ``algebraic complexity'' as my cost measure; this is the total number of additions, subtractions, and multiplications. This cost measure avoids the question of how complex numbers are represented inside a computer, and the question of how much time is used by a complex operation.

The FFT is given two size-n polynomials u,v and produces the size-n polynomial w = uv mod x^n-1. In more detail, the FFT has two inputs: a sequence (u[0],u[1],u[2],...,u[n-1]) of n complex numbers representing the polynomial u = u[0] + u[1]x + u[2]x^2 + ... + u[n-1]x^{n-1}, and a sequence (v[0],v[1],v[2],...,v[n-1]) of n complex numbers representing the polynomial v = v[0] + v[1]x + v[2]x^2 + ... + v[n-1]x^{n-1}. The FFT has one output: a sequence (w[0],w[1],w[2],...,w[n-1]) of n complex numbers representing the unique polynomial w = w[0] + w[1]x + w[2]x^2 + ... + w[n-1]x^{n-1} of degree below n such that w-uv is a multiple of x^n-1.

The FFT output can be described without polynomials as the ``cyclic convolution'' of the inputs:

• the first output w[0] is u[0]v[0]+u[1]v[n-1]+u[2]v[n-2]+...+u[n-1]v[1];
• the second output w[1] is u[0]v[1]+u[1]v[0]+u[2]v[n-1]+...+u[n-1]v[2];
• the third output w[2] is u[0]v[2]+u[1]v[1]+u[2]v[0]+...+u[n-1]v[3];
• ...
• the last output w[n-1] is u[0]v[n-1]+u[1]v[n-2]+u[2]v[n-3]+...+u[n-1]v[0].

For example, for n=2, the inputs are two complex vectors (u[0],u[1]) and (v[0],v[1]) representing the linear polynomials u = u[0]+u[1]x and v = v[0]+v[1]x. The output is the complex vector (w[0],w[1])=(u[0]v[0]+u[1]v[1],u[0]v[1]+u[1]v[0]) representing the linear polynomial w = w[0]+w[1]x = (u[0]v[0]+u[1]v[1])+(u[0]v[1]+u[1]v[0])x = uv - u[1]v[1](x^2-1).

As a larger example, for n=4, the inputs are two complex vectors (u[0],u[1],u[2],u[3]) and (v[0],v[1],v[2],v[3]) representing the polynomials u = u[0]+u[1]x+u[2]x^2+u[3]x^3 and v = v[0]+v[1]x+v[2]x^2+v[3]x^3. The product uv is (u[0]v[0]) + (u[0]v[1]+u[1]v[0])x + (u[0]v[2]+u[1]v[1]+u[2]v[0])x^2 + (u[0]v[3]+u[1]v[2]+u[2]v[1]+u[3]v[0])x^3 + (u[1]v[3]+u[2]v[2]+u[3]v[1])x^4 + (u[2]v[3]+u[3]v[2])x^5 + (u[3]v[3])x^6, so uv mod x^4-1 is uv - ((u[1]v[3]+u[2]v[2]+u[3]v[1]) + (u[2]v[3]+u[3]v[2])x + (u[3]v[3])x^2)(x^4-1) = (u[0]v[0]+u[1]v[3]+u[2]v[2]+u[3]v[1]) + (u[0]v[1]+u[1]v[0]+u[2]v[3]+u[3]v[2])x + (u[0]v[2]+u[1]v[1]+u[2]v[0]+u[3]v[3])x^2 + (u[0]v[3]+u[1]v[2]+u[2]v[1]+u[3]v[0])x^3. The output is the vector (u[0]v[0]+u[1]v[3]+u[2]v[2]+u[3]v[1],u[0]v[1]+u[1]v[0]+u[2]v[3]+u[3]v[2],u[0]v[2]+u[1]v[1]+u[2]v[0]+u[3]v[3],u[0]v[3]+u[1]v[2]+u[2]v[1]+u[3]v[0]).
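The obvious algorithm for the cyclic convolution, directly from the bullet list above (function name is mine): n^2 multiplications, with u[i]v[j] contributing to w[(i+j) mod n].

```python
def cyclic_convolution(u, v):
    # w[k] = sum of u[i]*v[j] over all (i, j) with i+j = k (mod n)
    n = len(u)
    w = [0] * n
    for i in range(n):
        for j in range(n):
            w[(i + j) % n] += u[i] * v[j]
    return w
```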

The idea of the FFT, in a nutshell, is to compute uv mod x^n-1 from uv mod x^(n/2)-1 and uv mod x^(n/2)+1, each of which is computed recursively. In more detail:

1. If n=1: Compute u[0]v[0] and stop.
2. Compute u mod x^(n/2)-1.
3. Compute v mod x^(n/2)-1.
4. Recursively multiply mod x^(n/2)-1, obtaining (u mod x^(n/2)-1)(v mod x^(n/2)-1) mod x^(n/2)-1 = uv mod x^(n/2)-1.
5. Compute u mod x^(n/2)+1.
6. Compute v mod x^(n/2)+1.
7. Recursively multiply mod x^(n/2)+1, obtaining (u mod x^(n/2)+1)(v mod x^(n/2)+1) mod x^(n/2)+1 = uv mod x^(n/2)+1.
8. Compute ((uv mod x^(n/2)-1)+(uv mod x^(n/2)+1))/2 + x^(n/2)((uv mod x^(n/2)-1)-(uv mod x^(n/2)+1))/2, which is exactly uv mod x^n-1.
There's a gap in this recursion: how does one multiply mod x^(n/2)+1? One solution is to generalize the FFT to x^n-c^2 for any nonzero complex number c:
1. If n=1: Compute u[0]v[0] and stop.
2. Compute u mod x^(n/2)-c. This polynomial is represented by the vector (u[0]+u[n/2]c,u[1]+u[n/2+1]c,...,u[n/2-1]+u[n-1]c); computing this vector takes n/2 multiplications by c and n/2 additions.
3. Compute v mod x^(n/2)-c.
4. Recursively multiply mod x^(n/2)-c, obtaining uv mod x^(n/2)-c.
5. Compute u mod x^(n/2)+c. This polynomial is represented by the vector (u[0]-u[n/2]c,u[1]-u[n/2+1]c,...,u[n/2-1]-u[n-1]c); computing this vector takes n/2 subtractions, reusing the previous products u[n/2]c etc.
6. Compute v mod x^(n/2)+c.
7. Recursively multiply mod x^(n/2)+c, obtaining uv mod x^(n/2)+c.
8. Compute ((uv mod x^(n/2)-c)+(uv mod x^(n/2)+c))/2 + x^(n/2)((uv mod x^(n/2)-c)-(uv mod x^(n/2)+c))/2c, which is exactly uv mod x^n-c^2. This takes n/2 additions, n/2 subtractions, n/2 divisions by 2, and n/2 divisions by 2c.
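The generalized recursion above, sketched with complex floating-point arithmetic (function name is mine; a serious implementation would precompute the roots rather than extract square roots on the fly): multiply_mod(u, v, c) returns uv mod x^n-c^2 for coefficient vectors of power-of-2 length n, so multiplication mod x^n-1 is multiply_mod(u, v, 1).

```python
def multiply_mod(u, v, c):
    # returns uv mod x^n - c^2, where n = len(u) = len(v) is a power of 2
    n = len(u)
    if n == 1:
        return [u[0] * v[0]]                        # step 1
    h = n // 2
    uc = [x * c for x in u[h:]]                     # shared products u[h+i]c
    vc = [x * c for x in v[h:]]
    u_minus = [u[i] + uc[i] for i in range(h)]      # u mod x^h - c
    v_minus = [v[i] + vc[i] for i in range(h)]
    u_plus = [u[i] - uc[i] for i in range(h)]       # u mod x^h + c
    v_plus = [v[i] - vc[i] for i in range(h)]
    r = complex(c) ** 0.5                           # r^2 = c
    s = 1j * r                                      # s^2 = -c
    a = multiply_mod(u_minus, v_minus, r)           # uv mod x^h - c
    b = multiply_mod(u_plus, v_plus, s)             # uv mod x^h + c
    return ([(a[i] + b[i]) / 2 for i in range(h)] +         # step 8
            [(a[i] - b[i]) / (2 * c) for i in range(h)])
```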

The FFT has algebraic complexity 5n lg n + n:

• (3/2)n lg n additions;
• (3/2)n lg n subtractions;
• n lg n multiplications by constants (many of which are 1 and thus can be eliminated);
• n lg n divisions by nonzero constants (half of which are divisions by 2, i.e., multiplications by 1/2, which can be combined with subsequent multiplications); and
• n multiplications at the bottom of the recursion.
For comparison, the obvious algorithm has algebraic complexity 2n^2-n: specifically, n^2 multiplications and n^2-n additions.
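As a check on the total 5n lg n + n: each size-n call in the recursion uses n multiplications by c and n additions (steps 2-3), n subtractions (steps 5-6), and n/2 additions, n/2 subtractions, and n divisions (step 8), i.e., 5n operations plus two size-n/2 recursive calls. Encoding that recurrence (function name is mine):

```python
def fft_mul_ops(n):
    # total algebraic operations for the recursion above; n a power of 2
    if n == 1:
        return 1                        # the one multiplication at the bottom
    return 5 * n + 2 * fft_mul_ops(n // 2)
```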

Homework due 28 September 2005 is to analyze in more detail the n lg n multiplications by constants. How many of the n lg n constants are primitive 4th roots of 1, namely i or -i? How many of the n lg n constants are primitive 8th roots of 1, namely sqrt(i), -sqrt(i), sqrt(-i), or -sqrt(-i)? (For example, there are 4 multiplications by primitive 8th roots of 1 if n=8, and 12 multiplications by primitive 8th roots of 1 if n=16; of course, your answer should cover all values of n.) Same question for 16th, 32nd, etc.

I also described another solution to the problem of handling x^(n/2)+1 recursively: namely, to ``twist'' the multiplication mod x^(n/2)+1 into a multiplication mod x^(n/2)-1. A careful combination of these two solutions produces the ``split-radix FFT,'' which is about 20% faster than the original solutions, and which held the speed records for decades; see, e.g., Section 2 of my paper ``Fast multiplication and its applications.'' Last year, Van Buskirk introduced what I call the ``tangent FFT,'' which is about 5% faster than the split-radix FFT; see the paper ``A modified split-radix FFT with reduced arithmetic complexity'' by Johnson and Frigo.

I then explained one way to use the FFT for integer multiplication. The problem is to multiply the integers u[0] + 2u[1] + 4u[2] + ... and v[0] + 2v[1] + 4v[2] + ..., given bits u[0],v[0],u[1],v[1],...; the basic idea is to multiply the polynomials u[0] + u[1]x + u[2]x^2 + ... and v[0] + v[1]x + v[2]x^2 + ... using the FFT. There are three complications in making this work:

• The FFT computes uv mod x^n-1. What we actually want is the polynomial product uv. Solution (easy): choose n as a power of 2 known to be larger than the degree of uv; then uv mod x^n-1 is exactly uv.
• The polynomial product has O(lg n) bits in each coefficient; what we actually want is 1 bit in each coefficient. In other words, we still have to do carries. Solution (fairly easy): split the polynomial coefficients into O(lg n) integers, each having O(n) bits; add those integers.
• The cost measure is different. The FFT uses n^(1+o(1)) additions, subtractions, and multiplications; what we actually want is n^(1+o(1)) low-level machine instructions. Solution (natural but difficult): choose a floating-point precision, let's say 3 lg n bits; represent each real number inside the computer as a nearby floating-point number; prove, by careful numerical analysis, that the floating-point FFT output is within 0.25 of the integers u[0]v[0], u[0]v[1]+u[1]v[0], etc., allowing those integers to be computed; and observe that an n^(o(1))-bit floating-point approximation to a complex operation uses n^(o(1)) machine instructions.
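The carrying step (the second complication above), sketched with exact integer arithmetic standing in for the floating-point FFT (function name is mine): multiply the bit polynomials, then evaluate the coefficient polynomial at x = 2.

```python
def multiply_via_polynomials(u, v):
    if u == 0 or v == 0:
        return 0
    ub = [(u >> i) & 1 for i in range(u.bit_length())]   # bits of u
    vb = [(v >> i) & 1 for i in range(v.bit_length())]   # bits of v
    # polynomial product: each coefficient is a sum of bit products,
    # so it is O(lg n) bits long rather than 1 bit
    coeffs = [0] * (len(ub) + len(vb) - 1)
    for i, x in enumerate(ub):
        for j, y in enumerate(vb):
            coeffs[i + j] += x * y
    # the carries: evaluate the coefficient polynomial at x = 2
    return sum(c << k for k, c in enumerate(coeffs))
```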
Next time I'll explain better solutions to the third complication.