
The Fast Fourier Transform: Elegance in O(n log n)

The Fast Fourier Transform is one of those algorithms that makes you reconsider what’s possible. It takes a problem that seems inherently $O(n^2)$ — evaluating a polynomial at $n$ points — and solves it in $O(n \log n)$. The trick is choosing the right points.

The problem

Given a polynomial of degree $n - 1$:

$$P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} = \sum_{k=0}^{n-1} a_k x^k$$

we want to evaluate $P$ at $n$ specific points. Naively, evaluating at a single point takes $O(n)$ multiplications (using Horner’s method), so evaluating at $n$ points costs $O(n^2)$.
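Horner’s method itself is worth seeing: it evaluates $\sum a_k x^k$ in the nested form $a_0 + x(a_1 + x(a_2 + \cdots))$, one multiply-add per coefficient. A minimal sketch (the helper name `horner` is mine, not from any library):

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... with one multiply-add per term."""
    result = 0
    for c in reversed(coeffs):  # start from the highest-degree coefficient
        result = result * x + c
    return result

# P(x) = 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12 = 17
assert horner([1, 2, 3], 2) == 17
```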

This matters because polynomial multiplication reduces to evaluation and interpolation. If $A(x)$ and $B(x)$ are degree-$n$ polynomials, their product $C(x) = A(x) \cdot B(x)$ has degree $2n$. If we can evaluate both at $2n + 1$ points, we get $C(x_i) = A(x_i) \cdot B(x_i)$ at each point in $O(1)$, then interpolate to recover the coefficients of $C$. The bottleneck is evaluation.

Roots of unity

The FFT’s insight: evaluate at the $n$-th roots of unity. These are the $n$ complex numbers:

$$\omega_n^k = e^{2\pi i k / n}, \quad k = 0, 1, \ldots, n - 1$$

They’re evenly spaced around the unit circle in the complex plane. The primitive root $\omega_n = e^{2\pi i / n}$ generates all of them: $\omega_n^0 = 1, \omega_n^1, \omega_n^2, \ldots, \omega_n^{n-1}$.

These roots have a critical property. When $n$ is even, the squares of the $n$-th roots of unity are the $n/2$-th roots of unity:

$$(\omega_n^k)^2 = \omega_{n/2}^k$$

and the roots pair up: $\omega_n^k$ and $\omega_n^{k + n/2}$ share the same square. This is what makes divide-and-conquer possible.
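The squaring property is easy to confirm numerically; the throwaway check below (not part of the algorithm) takes the exponent on the right mod $n/2$, since exponents of $\omega_{n/2}$ wrap around:

```python
import cmath

n = 8
for k in range(n):
    w_n_k = cmath.exp(2j * cmath.pi * k / n)                       # omega_n^k
    w_half = cmath.exp(2j * cmath.pi * (k % (n // 2)) / (n // 2))  # omega_{n/2}^{k mod n/2}
    assert abs(w_n_k ** 2 - w_half) < 1e-12
```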

The divide-and-conquer structure

Split $P(x)$ by even and odd coefficients:

$$P(x) = (a_0 + a_2 x^2 + a_4 x^4 + \cdots) + x(a_1 + a_3 x^2 + a_5 x^4 + \cdots)$$

Define two polynomials of half the degree:

$$P_{\text{even}}(y) = a_0 + a_2 y + a_4 y^2 + \cdots$$

$$P_{\text{odd}}(y) = a_1 + a_3 y + a_5 y^2 + \cdots$$

Then $P(x) = P_{\text{even}}(x^2) + x \cdot P_{\text{odd}}(x^2)$.
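This identity is the whole trick, so a quick numeric spot-check is worthwhile (`evaluate` is an ad-hoc helper for checking, not part of the FFT):

```python
def evaluate(coeffs, x):
    """Direct polynomial evaluation, for checking only."""
    return sum(c * x**k for k, c in enumerate(coeffs))

a = [5, 3, 2, 7, 1, 4]              # P(x) = 5 + 3x + 2x^2 + 7x^3 + x^4 + 4x^5
even_c, odd_c = a[0::2], a[1::2]    # [5, 2, 1] and [3, 7, 4]
x = 1.5
lhs = evaluate(a, x)
rhs = evaluate(even_c, x**2) + x * evaluate(odd_c, x**2)
assert abs(lhs - rhs) < 1e-9
```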

Now evaluate $P$ at $\omega_n^k$ for $k = 0, \ldots, n - 1$. For the first half ($k < n/2$):

$$P(\omega_n^k) = P_{\text{even}}(\omega_{n/2}^k) + \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^k)$$

For the second half, using $\omega_n^{k + n/2} = -\omega_n^k$:

$$P(\omega_n^{k + n/2}) = P_{\text{even}}(\omega_{n/2}^k) - \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^k)$$

Both halves reuse the same evaluations of $P_{\text{even}}$ and $P_{\text{odd}}$ at the $n/2$-th roots of unity. Two subproblems of size $n/2$, combined in $O(n)$.

The recurrence is $T(n) = 2T(n/2) + O(n)$: there are $\log_2 n$ levels of halving, and each level does $O(n)$ combination work in total, so $T(n) = O(n \log n)$.

The butterfly operation

The combination step is called a “butterfly” because of how data flows. At each level, pairs of values are combined:

$$\begin{aligned} y_k &= u_k + \omega_n^k \cdot v_k \\ y_{k + n/2} &= u_k - \omega_n^k \cdot v_k \end{aligned}$$

where $u_k$ comes from the even subproblem and $v_k$ from the odd. The factor $\omega_n^k$ is called a twiddle factor. Each butterfly does one complex multiply and two complex adds.

Implementation

The recursive version is clean but allocates memory at each level. Here’s the straightforward recursive FFT in Python:

import cmath

def fft_recursive(a):
    """Compute FFT of coefficient list a. len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return a

    # Split into even and odd
    even = fft_recursive(a[0::2])
    odd = fft_recursive(a[1::2])

    # Combine
    omega = cmath.exp(2j * cmath.pi / n)
    w = 1
    result = [0] * n
    for k in range(n // 2):
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
        w *= omega

    return result

The iterative version avoids recursion overhead by processing levels bottom-up. It requires a bit-reversal permutation to reorder the input:

def bit_reverse(x, log_n):
    """Reverse the last log_n bits of integer x."""
    result = 0
    for _ in range(log_n):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def fft_iterative(a):
    """In-place iterative FFT. len(a) must be a power of 2."""
    n = len(a)
    log_n = n.bit_length() - 1

    # Bit-reversal permutation
    a = [a[bit_reverse(i, log_n)] for i in range(n)]

    # Butterfly passes
    length = 2
    while length <= n:
        omega = cmath.exp(2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = w * a[start + k + length // 2]
                a[start + k] = u + v
                a[start + k + length // 2] = u - v
                w *= omega
        length *= 2

    return a
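One way to sanity-check either implementation is against a direct $O(n^2)$ evaluation at the roots of unity, using the same $e^{+2\pi i jk/n}$ sign convention as the code above. `naive_dft` is a test helper I’m introducing here, not part of the algorithm:

```python
import cmath

def naive_dft(a):
    """O(n^2) reference: evaluate the polynomial at every n-th root of unity."""
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

# An impulse has a flat spectrum; a constant concentrates everything in bin 0.
assert all(abs(x - 1) < 1e-12 for x in naive_dft([1, 0, 0, 0]))
flat = naive_dft([1, 1, 1, 1])
assert abs(flat[0] - 4) < 1e-12 and all(abs(x) < 1e-12 for x in flat[1:])
```

Comparing `fft_recursive` or `fft_iterative` against `naive_dft` on random inputs catches sign-convention and indexing mistakes early.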

For comparison, here’s a C implementation that works on interleaved real/imaginary pairs in a flat array — the kind of thing you’d actually see in a systems context:

#include <math.h>

/* M_PI is POSIX, not ISO C; define it if the headers don't. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

void fft(double *re, double *im, int n) {
    int log_n = 0;
    for (int temp = n; temp > 1; temp >>= 1) log_n++;

    // Bit-reversal permutation
    for (int i = 0; i < n; i++) {
        int j = 0;
        for (int b = 0; b < log_n; b++)
            if (i & (1 << b)) j |= (1 << (log_n - 1 - b));
        if (i < j) {
            double tmp;
            tmp = re[i]; re[i] = re[j]; re[j] = tmp;
            tmp = im[i]; im[i] = im[j]; im[j] = tmp;
        }
    }

    // Butterfly passes
    for (int len = 2; len <= n; len *= 2) {
        double angle = 2.0 * M_PI / len;
        double w_re = cos(angle), w_im = sin(angle);

        for (int start = 0; start < n; start += len) {
            double cur_re = 1.0, cur_im = 0.0;
            for (int k = 0; k < len / 2; k++) {
                int u = start + k;
                int v = start + k + len / 2;

                double t_re = cur_re * re[v] - cur_im * im[v];
                double t_im = cur_re * im[v] + cur_im * re[v];

                re[v] = re[u] - t_re;
                im[v] = im[u] - t_im;
                re[u] += t_re;
                im[u] += t_im;

                double new_re = cur_re * w_re - cur_im * w_im;
                cur_im = cur_re * w_im + cur_im * w_re;
                cur_re = new_re;
            }
        }
    }
}

Polynomial multiplication via FFT

Now we can multiply two polynomials in $O(n \log n)$:

  1. Pad both coefficient vectors to length $2n$ (rounded up to the next power of 2).
  2. FFT both vectors: $\hat{A} = \text{FFT}(a)$, $\hat{B} = \text{FFT}(b)$.
  3. Pointwise multiply: $\hat{C}_k = \hat{A}_k \cdot \hat{B}_k$ for each $k$.
  4. Inverse FFT to recover coefficients: $c = \text{IFFT}(\hat{C})$.

The inverse FFT is almost identical to the forward FFT. The only differences: use $\omega_n^{-1} = e^{-2\pi i / n}$ instead of $\omega_n$, and divide the result by $n$:

$$a_k = \frac{1}{n} \sum_{j=0}^{n-1} \hat{A}_j \cdot \omega_n^{-jk}$$

def ifft(a):
    """Inverse FFT via conjugation trick."""
    n = len(a)
    # Conjugate, apply forward FFT, conjugate, scale
    a_conj = [x.conjugate() for x in a]
    result = fft_recursive(a_conj)
    return [x.conjugate() / n for x in result]


def poly_multiply(a, b):
    """Multiply two polynomials given as coefficient lists."""
    n = 1
    while n < len(a) + len(b):
        n *= 2

    # Pad to length n
    a_padded = a + [0] * (n - len(a))
    b_padded = b + [0] * (n - len(b))

    # Transform, pointwise multiply, inverse transform
    fa = fft_recursive(a_padded)
    fb = fft_recursive(b_padded)
    fc = [x * y for x, y in zip(fa, fb)]
    c = ifft(fc)

    # Rounding assumes integer inputs; it strips the small floating-point error.
    return [round(x.real) for x in c]

Testing it:

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
# [4, 13, 22, 15, 0, 0, 0, 0]

Why this matters

The FFT shows up everywhere: audio processing (decomposing signals into frequencies), image compression (JPEG uses the closely related DCT), convolution in neural networks, multiplying large integers (Schönhage–Strassen), solving PDEs with spectral methods, and even string matching algorithms.
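The large-integer connection is concrete: a base-10 number is a polynomial evaluated at $x = 10$, so integer multiplication is digit convolution followed by carry propagation. A sketch with the convolution done naively for clarity; in FFT-based big-integer multiplication, the transform/pointwise-multiply/inverse pipeline above takes its place (`multiply_via_digits` is an illustrative name):

```python
def multiply_via_digits(x, y):
    """Multiply nonnegative integers by convolving their base-10 digits."""
    a = [int(d) for d in str(x)[::-1]]  # least significant digit first
    b = [int(d) for d in str(y)[::-1]]

    # Digit convolution: exactly polynomial multiplication, with the variable = 10.
    c = [0] * (len(a) + len(b) - 1)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            c[i + j] += da * db

    # Propagate carries to get back to single digits.
    result, carry = 0, 0
    for k, digit in enumerate(c):
        total = digit + carry
        result += (total % 10) * 10**k
        carry = total // 10
    return result + carry * 10**len(c)

assert multiply_via_digits(1234, 5678) == 1234 * 5678
```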

The deeper lesson is about structure. The $O(n^2)$ bound seemed fundamental until someone asked: what if we choose evaluation points with algebraic structure that lets subproblems overlap? The roots of unity aren’t just convenient — squaring the $n$-th roots yields exactly the $n/2$-th roots, so the collapse from $n$ evaluations to $n/2$ repeats at every level of the recursion.

Cooley and Tukey published the modern FFT in 1965, but the idea traces back to Gauss around 1805. It’s one of the most important algorithms ever discovered, and it fits in 30 lines of code.