The Fast Fourier Transform: Elegance in O(n log n)
The Fast Fourier Transform is one of those algorithms that makes you reconsider what’s possible. It takes a problem that seems inherently $O(n^2)$ — evaluating a polynomial at $n$ points — and solves it in $O(n \log n)$. The trick is choosing the right points.
The problem
Given a polynomial of degree $n - 1$:

$$A(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1}$$

we want to evaluate $A$ at $n$ specific points. Naively, evaluating at a single point takes $O(n)$ multiplications (using Horner’s method), so evaluating at $n$ points costs $O(n^2)$.
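Horner’s method itself is a one-liner; a minimal sketch (the name `horner_eval` is mine, not from any library):

```python
def horner_eval(coeffs, x):
    """Evaluate a polynomial at x using Horner's method.

    coeffs lists coefficients lowest degree first; the loop does
    n - 1 multiplications and n - 1 additions for n coefficients.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# 1 + 2x + 3x^2 at x = 2  ->  1 + 4 + 12 = 17
print(horner_eval([1, 2, 3], 2))  # 17
```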
This matters because polynomial multiplication reduces to evaluation and interpolation. If $A$ and $B$ are degree-$n$ polynomials, their product $C = AB$ has degree $2n$. If we can evaluate both at $2n + 1$ points, we get $C(x) = A(x)B(x)$ at each point in $O(1)$, then interpolate to recover the coefficients of $C$. The bottleneck is evaluation.
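A tiny concrete instance of this reduction, with made-up polynomials $A(x) = 1 + x$ and $B(x) = 2 + x$ (their degree-2 product needs 3 evaluation points):

```python
# Evaluation/interpolation view of multiplication: evaluate both
# factors at 3 points, multiply pointwise, and the results are
# exactly the values of the product C(x) = 2 + 3x + x^2.
xs = [0, 1, 2]
A_vals = [1 + x for x in xs]                  # [1, 2, 3]
B_vals = [2 + x for x in xs]                  # [2, 3, 4]
C_vals = [a * b for a, b in zip(A_vals, B_vals)]

assert C_vals == [2 + 3 * x + x * x for x in xs]  # [2, 6, 12]
```

Interpolating those three values back to coefficients would complete the multiplication; the FFT makes both directions fast.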
Roots of unity
The FFT’s insight: evaluate at the $n$-th roots of unity. These are the complex numbers:

$$\omega_n^k = e^{2\pi i k / n}, \quad k = 0, 1, \ldots, n - 1$$

They’re evenly spaced around the unit circle in the complex plane. The primitive root $\omega_n = e^{2\pi i / n}$ generates all of them: $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$.
These roots have a critical property. When $n$ is even, the squares of the $n$-th roots of unity are the $(n/2)$-th roots of unity:

$$(\omega_n^k)^2 = \omega_n^{2k} = \omega_{n/2}^{k}$$

and the roots pair up: $\omega_n^k$ and $\omega_n^{k + n/2}$ share the same square, since $\omega_n^{k + n/2} = -\omega_n^k$. This is what makes divide-and-conquer possible.
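Both properties are easy to check numerically; a quick sketch for $n = 8$:

```python
import cmath

n = 8
roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

for k in range(n // 2):
    # Roots k and k + n/2 share the same square...
    assert abs(roots[k] ** 2 - roots[k + n // 2] ** 2) < 1e-12
    # ...and that square is the k-th (n/2)-th root of unity.
    assert abs(roots[k] ** 2 - cmath.exp(2j * cmath.pi * k / (n // 2))) < 1e-12

print("all pairing checks passed")
```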
The divide-and-conquer structure
Split $A(x)$ by even and odd coefficients. Define two polynomials of half the degree:

$$A_{\text{even}}(y) = a_0 + a_2 y + a_4 y^2 + \cdots, \qquad A_{\text{odd}}(y) = a_1 + a_3 y + a_5 y^2 + \cdots$$

Then $A(x) = A_{\text{even}}(x^2) + x \cdot A_{\text{odd}}(x^2)$.
Now evaluate at $x = \omega_n^k$ for $k = 0, \ldots, n - 1$. For the first half ($k < n/2$):

$$A(\omega_n^k) = A_{\text{even}}(\omega_{n/2}^k) + \omega_n^k \, A_{\text{odd}}(\omega_{n/2}^k)$$

For the second half, using $\omega_n^{k + n/2} = -\omega_n^k$:

$$A(\omega_n^{k + n/2}) = A_{\text{even}}(\omega_{n/2}^k) - \omega_n^k \, A_{\text{odd}}(\omega_{n/2}^k)$$
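A numerical spot-check of both identities, on an arbitrary example coefficient list:

```python
import cmath

def poly_eval(coeffs, x):
    """Plain Horner evaluation, lowest-degree coefficient first."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

a = [3, 1, 4, 1, 5, 9, 2, 6]        # example coefficients of A(x)
a_even, a_odd = a[0::2], a[1::2]
n = len(a)

for k in range(n // 2):
    w = cmath.exp(2j * cmath.pi * k / n)   # omega_n^k
    e = poly_eval(a_even, w * w)           # A_even at (omega_n^k)^2
    o = poly_eval(a_odd, w * w)            # A_odd at the same point
    assert abs(poly_eval(a, w) - (e + w * o)) < 1e-9    # first half
    assert abs(poly_eval(a, -w) - (e - w * o)) < 1e-9   # second half

print("split identities hold")
```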
Both halves reuse the same evaluations of $A_{\text{even}}$ and $A_{\text{odd}}$ at the $(n/2)$-th roots of unity. Two subproblems of size $n/2$, combined in $O(n)$.
The recurrence is $T(n) = 2T(n/2) + O(n)$, which gives $T(n) = O(n \log n)$.
The butterfly operation
The combination step is called a “butterfly” because of how data flows. At each level, pairs of values are combined:

$$y_k = e_k + \omega^k o_k, \qquad y_{k + n/2} = e_k - \omega^k o_k$$

where $e_k$ comes from the even subproblem and $o_k$ from the odd. The factor $\omega^k$ is called a twiddle factor. Each butterfly does one complex multiply and two complex adds.
Implementation
The recursive version is clean but allocates memory at each level. Here’s the straightforward recursive FFT in Python:
```python
import cmath

def fft_recursive(a):
    """Compute FFT of coefficient list a. len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return a
    # Split into even and odd
    even = fft_recursive(a[0::2])
    odd = fft_recursive(a[1::2])
    # Combine
    omega = cmath.exp(2j * cmath.pi / n)
    w = 1
    result = [0] * n
    for k in range(n // 2):
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
        w *= omega
    return result
```
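When testing an FFT, a naive $O(n^2)$ DFT with the same sign convention makes a handy reference; a sketch (`naive_dft` is my name, not a library function):

```python
import cmath

def naive_dft(a):
    """Direct O(n^2) DFT using omega = e^{+2*pi*i/n}, the same
    convention as fft_recursive above; useful as a test oracle."""
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

# Known case: the DFT of a unit impulse is all ones.
out = naive_dft([1, 0, 0, 0])
assert all(abs(x - 1) < 1e-12 for x in out)
```

Comparing `fft_recursive` against `naive_dft` on random inputs catches sign and indexing mistakes quickly.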
The iterative version avoids recursion overhead by processing levels bottom-up. It requires a bit-reversal permutation to reorder the input:
```python
def bit_reverse(x, log_n):
    """Reverse the last log_n bits of integer x."""
    result = 0
    for _ in range(log_n):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result

def fft_iterative(a):
    """In-place iterative FFT. len(a) must be a power of 2."""
    n = len(a)
    log_n = n.bit_length() - 1
    # Bit-reversal permutation
    a = [a[bit_reverse(i, log_n)] for i in range(n)]
    # Butterfly passes
    length = 2
    while length <= n:
        omega = cmath.exp(2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = w * a[start + k + length // 2]
                a[start + k] = u + v
                a[start + k + length // 2] = u - v
                w *= omega
        length *= 2
    return a
```
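For intuition about the permutation, here is the reorder for $n = 8$ computed by reversing each index’s binary representation as a string (equivalent to the bitwise version above):

```python
n, log_n = 8, 3
# Reverse the 3-bit binary form of each index: 1 = 001 -> 100 = 4, etc.
order = [int(format(i, f"0{log_n}b")[::-1], 2) for i in range(n)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Note that the permutation is its own inverse: applying it twice restores the original order.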
For comparison, here’s a C implementation that works on separate real and imaginary arrays — the kind of thing you’d actually see in a systems context:
```c
#include <math.h>

void fft(double *re, double *im, int n) {
    int log_n = 0;
    for (int temp = n; temp > 1; temp >>= 1) log_n++;

    // Bit-reversal permutation
    for (int i = 0; i < n; i++) {
        int j = 0;
        for (int b = 0; b < log_n; b++)
            if (i & (1 << b)) j |= (1 << (log_n - 1 - b));
        if (i < j) {
            double tmp;
            tmp = re[i]; re[i] = re[j]; re[j] = tmp;
            tmp = im[i]; im[i] = im[j]; im[j] = tmp;
        }
    }

    // Butterfly passes
    for (int len = 2; len <= n; len *= 2) {
        double angle = 2.0 * M_PI / len;
        double w_re = cos(angle), w_im = sin(angle);
        for (int start = 0; start < n; start += len) {
            double cur_re = 1.0, cur_im = 0.0;
            for (int k = 0; k < len / 2; k++) {
                int u = start + k;
                int v = start + k + len / 2;
                double t_re = cur_re * re[v] - cur_im * im[v];
                double t_im = cur_re * im[v] + cur_im * re[v];
                re[v] = re[u] - t_re;
                im[v] = im[u] - t_im;
                re[u] += t_re;
                im[u] += t_im;
                double new_re = cur_re * w_re - cur_im * w_im;
                cur_im = cur_re * w_im + cur_im * w_re;
                cur_re = new_re;
            }
        }
    }
}
```
Polynomial multiplication via FFT
Now we can multiply two polynomials in $O(n \log n)$:

- Pad both coefficient vectors to length $n \ge \deg A + \deg B + 1$ (next power of 2).
- FFT both vectors: $\hat{a} = \text{FFT}(a)$, $\hat{b} = \text{FFT}(b)$.
- Pointwise multiply: $\hat{c}_k = \hat{a}_k \hat{b}_k$ for each $k$.
- Inverse FFT to recover coefficients: $c = \text{FFT}^{-1}(\hat{c})$.

The inverse FFT is almost identical to the forward FFT. The only differences: use $\omega^{-1}$ instead of $\omega$, and divide the result by $n$:
```python
def ifft(a):
    """Inverse FFT via conjugation trick."""
    n = len(a)
    # Conjugate, apply forward FFT, conjugate, scale
    a_conj = [x.conjugate() for x in a]
    result = fft_recursive(a_conj)
    return [x.conjugate() / n for x in result]

def poly_multiply(a, b):
    """Multiply two polynomials given as coefficient lists."""
    n = 1
    while n < len(a) + len(b):
        n *= 2
    # Pad to length n
    a_padded = a + [0] * (n - len(a))
    b_padded = b + [0] * (n - len(b))
    # Transform, pointwise multiply, inverse transform
    fa = fft_recursive(a_padded)
    fb = fft_recursive(b_padded)
    fc = [x * y for x, y in zip(fa, fb)]
    c = ifft(fc)
    return [round(x.real) for x in c]
```
Testing it:
```python
# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
# [4, 13, 22, 15, 0, 0, 0, 0]
```
Why this matters
The FFT shows up everywhere. Audio processing (decomposing signals into frequencies), image compression (JPEG uses the closely related DCT), convolution in neural networks, multiplying large integers (Schönhage–Strassen), solving PDEs with spectral methods, and even string matching algorithms.
The deeper lesson is about structure. The $O(n^2)$ bound seemed fundamental until someone asked: what if we choose evaluation points with algebraic structure that lets subproblems overlap? The roots of unity aren’t just convenient — they’re the only choice that makes the recursion work, because they’re the only points whose squares collapse $n$ evaluations into $n/2$.
Cooley and Tukey published the modern FFT in 1965, but the idea traces back to Gauss around 1805. It’s one of the most important algorithms ever discovered, and it fits in 30 lines of code.