
The Fast Fourier Transform: Elegance in O(n log n)

The Fast Fourier Transform is one of those algorithms that makes you reconsider what’s possible. It takes a problem that seems inherently $O(n^2)$ — evaluating a polynomial at $n$ points — and solves it in $O(n \log n)$. The trick is choosing the right points.

The problem

Given a polynomial of degree $n - 1$:

$$P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} = \sum_{k=0}^{n-1} a_k x^k$$

we want to evaluate $P$ at $n$ specific points. Naively, evaluating at a single point takes $O(n)$ multiplications (using Horner’s method), so evaluating at $n$ points costs $O(n^2)$.
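Horner’s method itself is worth seeing: it evaluates $\sum a_k x^k$ in the nested form $a_0 + x(a_1 + x(a_2 + \cdots))$, one multiply-add per coefficient. A minimal sketch (the helper name `horner` is mine, not from any library):

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... with one multiply-add per term."""
    result = 0
    for c in reversed(coeffs):  # start from the highest-degree coefficient
        result = result * x + c
    return result

# P(x) = 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12 = 17
assert horner([1, 2, 3], 2) == 17
```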

This matters because polynomial multiplication reduces to evaluation and interpolation. If $A(x)$ and $B(x)$ are degree-$n$ polynomials, their product $C(x) = A(x) \cdot B(x)$ has degree $2n$. If we can evaluate both at $2n + 1$ points, we get $C(x_i) = A(x_i) \cdot B(x_i)$ at each point in $O(1)$, then interpolate to recover the coefficients of $C$. The bottleneck is evaluation.

Roots of unity

The FFT’s insight: evaluate at the $n$-th roots of unity. These are the $n$ complex numbers:

$$\omega_n^k = e^{2\pi i k / n}, \quad k = 0, 1, \ldots, n - 1$$

They’re evenly spaced around the unit circle in the complex plane. The primitive root $\omega_n = e^{2\pi i / n}$ generates all of them: $\omega_n^0 = 1, \omega_n^1, \omega_n^2, \ldots, \omega_n^{n-1}$.

These roots have a critical property. When $n$ is even, the squares of the $n$-th roots of unity are the $n/2$-th roots of unity:

$$(\omega_n^k)^2 = \omega_{n/2}^k$$

and the roots pair up: $\omega_n^k$ and $\omega_n^{k + n/2}$ share the same square. This is what makes divide-and-conquer possible.
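The squaring property is easy to confirm numerically; the throwaway check below (not part of the algorithm) takes the exponent on the right mod $n/2$, since exponents of $\omega_{n/2}$ wrap around:

```python
import cmath

n = 8
for k in range(n):
    w_n_k = cmath.exp(2j * cmath.pi * k / n)                       # omega_n^k
    w_half = cmath.exp(2j * cmath.pi * (k % (n // 2)) / (n // 2))  # omega_{n/2}^{k mod n/2}
    assert abs(w_n_k ** 2 - w_half) < 1e-12
```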

The divide-and-conquer structure

Split $P(x)$ by even and odd coefficients:

$$P(x) = (a_0 + a_2 x^2 + a_4 x^4 + \cdots) + x(a_1 + a_3 x^2 + a_5 x^4 + \cdots)$$

Define two polynomials of half the degree:

$$P_{\text{even}}(y) = a_0 + a_2 y + a_4 y^2 + \cdots$$

$$P_{\text{odd}}(y) = a_1 + a_3 y + a_5 y^2 + \cdots$$

Then $P(x) = P_{\text{even}}(x^2) + x \cdot P_{\text{odd}}(x^2)$.
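This identity is the whole trick, so a quick numeric spot-check is worthwhile (`evaluate` is an ad-hoc helper for checking, not part of the FFT):

```python
def evaluate(coeffs, x):
    """Direct polynomial evaluation, for checking only."""
    return sum(c * x**k for k, c in enumerate(coeffs))

a = [5, 3, 2, 7, 1, 4]              # P(x) = 5 + 3x + 2x^2 + 7x^3 + x^4 + 4x^5
even_c, odd_c = a[0::2], a[1::2]    # [5, 2, 1] and [3, 7, 4]
x = 1.5
lhs = evaluate(a, x)
rhs = evaluate(even_c, x**2) + x * evaluate(odd_c, x**2)
assert abs(lhs - rhs) < 1e-9
```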

Now evaluate $P$ at $\omega_n^k$ for $k = 0, \ldots, n - 1$. For the first half ($k < n/2$):

$$P(\omega_n^k) = P_{\text{even}}(\omega_{n/2}^k) + \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^k)$$

For the second half, using $\omega_n^{k + n/2} = -\omega_n^k$:

$$P(\omega_n^{k + n/2}) = P_{\text{even}}(\omega_{n/2}^k) - \omega_n^k \cdot P_{\text{odd}}(\omega_{n/2}^k)$$

Both halves reuse the same evaluations of $P_{\text{even}}$ and $P_{\text{odd}}$ at the $n/2$-th roots of unity. Two subproblems of size $n/2$, combined in $O(n)$.

The recurrence is $T(n) = 2T(n/2) + O(n)$: there are $\log_2 n$ levels of halving, and each level does $O(n)$ combination work in total, so $T(n) = O(n \log n)$.

The butterfly operation

The combination step is called a “butterfly” because of how data flows. At each level, pairs of values are combined:

$$\begin{aligned} y_k &= u_k + \omega_n^k \cdot v_k \\ y_{k + n/2} &= u_k - \omega_n^k \cdot v_k \end{aligned}$$

where $u_k$ comes from the even subproblem and $v_k$ from the odd. The factor $\omega_n^k$ is called a twiddle factor. Each butterfly does one complex multiply and two complex adds.

Implementation

The recursive version is clean but allocates memory at each level. Here’s the straightforward recursive FFT in Python:

import cmath

def fft_recursive(a):
    """Compute FFT of coefficient list a. len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return a

    # Split into even and odd
    even = fft_recursive(a[0::2])
    odd = fft_recursive(a[1::2])

    # Combine
    omega = cmath.exp(2j * cmath.pi / n)
    w = 1
    result = [0] * n
    for k in range(n // 2):
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
        w *= omega

    return result

The iterative version avoids recursion overhead by processing levels bottom-up. It requires a bit-reversal permutation to reorder the input:

def bit_reverse(x, log_n):
    """Reverse the last log_n bits of integer x."""
    result = 0
    for _ in range(log_n):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def fft_iterative(a):
    """In-place iterative FFT. len(a) must be a power of 2."""
    n = len(a)
    log_n = n.bit_length() - 1

    # Bit-reversal permutation
    a = [a[bit_reverse(i, log_n)] for i in range(n)]

    # Butterfly passes
    length = 2
    while length <= n:
        omega = cmath.exp(2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = w * a[start + k + length // 2]
                a[start + k] = u + v
                a[start + k + length // 2] = u - v
                w *= omega
        length *= 2

    return a
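One way to sanity-check either implementation is against a direct $O(n^2)$ evaluation at the roots of unity, using the same $e^{+2\pi i jk/n}$ sign convention as the code above. `naive_dft` is a test helper I’m introducing here, not part of the algorithm:

```python
import cmath

def naive_dft(a):
    """O(n^2) reference: evaluate the polynomial at every n-th root of unity."""
    n = len(a)
    return [sum(a[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

# An impulse has a flat spectrum; a constant concentrates everything in bin 0.
assert all(abs(x - 1) < 1e-12 for x in naive_dft([1, 0, 0, 0]))
flat = naive_dft([1, 1, 1, 1])
assert abs(flat[0] - 4) < 1e-12 and all(abs(x) < 1e-12 for x in flat[1:])
```

Comparing `fft_recursive` or `fft_iterative` against `naive_dft` on random inputs catches sign-convention and indexing mistakes early.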

For comparison, here’s a C implementation that works on interleaved real/imaginary pairs in a flat array — the kind of thing you’d actually see in a systems context:

#include <math.h>

/* M_PI is POSIX, not ISO C; define it if the headers don't. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

void fft(double *re, double *im, int n) {
    int log_n = 0;
    for (int temp = n; temp > 1; temp >>= 1) log_n++;

    // Bit-reversal permutation
    for (int i = 0; i < n; i++) {
        int j = 0;
        for (int b = 0; b < log_n; b++)
            if (i & (1 << b)) j |= (1 << (log_n - 1 - b));
        if (i < j) {
            double tmp;
            tmp = re[i]; re[i] = re[j]; re[j] = tmp;
            tmp = im[i]; im[i] = im[j]; im[j] = tmp;
        }
    }

    // Butterfly passes
    for (int len = 2; len <= n; len *= 2) {
        double angle = 2.0 * M_PI / len;
        double w_re = cos(angle), w_im = sin(angle);

        for (int start = 0; start < n; start += len) {
            double cur_re = 1.0, cur_im = 0.0;
            for (int k = 0; k < len / 2; k++) {
                int u = start + k;
                int v = start + k + len / 2;

                double t_re = cur_re * re[v] - cur_im * im[v];
                double t_im = cur_re * im[v] + cur_im * re[v];

                re[v] = re[u] - t_re;
                im[v] = im[u] - t_im;
                re[u] += t_re;
                im[u] += t_im;

                double new_re = cur_re * w_re - cur_im * w_im;
                cur_im = cur_re * w_im + cur_im * w_re;
                cur_re = new_re;
            }
        }
    }
}

Polynomial multiplication via FFT

Now we can multiply two polynomials in $O(n \log n)$:

  1. Pad both coefficient vectors to length $2n$ (rounded up to the next power of 2).
  2. FFT both vectors: $\hat{A} = \text{FFT}(a)$, $\hat{B} = \text{FFT}(b)$.
  3. Pointwise multiply: $\hat{C}_k = \hat{A}_k \cdot \hat{B}_k$ for each $k$.
  4. Inverse FFT to recover coefficients: $c = \text{IFFT}(\hat{C})$.

The inverse FFT is almost identical to the forward FFT. The only differences: use $\omega_n^{-1} = e^{-2\pi i / n}$ instead of $\omega_n$, and divide the result by $n$:

$$a_k = \frac{1}{n} \sum_{j=0}^{n-1} \hat{A}_j \cdot \omega_n^{-jk}$$

def ifft(a):
    """Inverse FFT via conjugation trick."""
    n = len(a)
    # Conjugate, apply forward FFT, conjugate, scale
    a_conj = [x.conjugate() for x in a]
    result = fft_recursive(a_conj)
    return [x.conjugate() / n for x in result]


def poly_multiply(a, b):
    """Multiply two polynomials given as coefficient lists."""
    n = 1
    while n < len(a) + len(b):
        n *= 2

    # Pad to length n
    a_padded = a + [0] * (n - len(a))
    b_padded = b + [0] * (n - len(b))

    # Transform, pointwise multiply, inverse transform
    fa = fft_recursive(a_padded)
    fb = fft_recursive(b_padded)
    fc = [x * y for x, y in zip(fa, fb)]
    c = ifft(fc)

    # Rounding assumes integer inputs; it strips the small floating-point error.
    return [round(x.real) for x in c]

Testing it:

# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(poly_multiply([1, 2, 3], [4, 5]))
# [4, 13, 22, 15, 0, 0, 0, 0]

Why this matters

The FFT shows up everywhere: audio processing (decomposing signals into frequencies), image compression (JPEG uses the closely related DCT), convolution in neural networks, multiplying large integers (Schönhage–Strassen), solving PDEs with spectral methods, and even string matching algorithms.
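The large-integer connection is concrete: a base-10 number is a polynomial evaluated at $x = 10$, so integer multiplication is digit convolution followed by carry propagation. A sketch with the convolution done naively for clarity; in FFT-based big-integer multiplication, the transform/pointwise-multiply/inverse pipeline above takes its place (`multiply_via_digits` is an illustrative name):

```python
def multiply_via_digits(x, y):
    """Multiply nonnegative integers by convolving their base-10 digits."""
    a = [int(d) for d in str(x)[::-1]]  # least significant digit first
    b = [int(d) for d in str(y)[::-1]]

    # Digit convolution: exactly polynomial multiplication, with the variable = 10.
    c = [0] * (len(a) + len(b) - 1)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            c[i + j] += da * db

    # Propagate carries to get back to single digits.
    result, carry = 0, 0
    for k, digit in enumerate(c):
        total = digit + carry
        result += (total % 10) * 10**k
        carry = total // 10
    return result + carry * 10**len(c)

assert multiply_via_digits(1234, 5678) == 1234 * 5678
```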

The deeper lesson is about structure. The $O(n^2)$ bound seemed fundamental until someone asked: what if we choose evaluation points with algebraic structure that lets subproblems overlap? The roots of unity aren’t just convenient — squaring the $n$-th roots yields exactly the $n/2$-th roots, so the collapse from $n$ evaluations to $n/2$ repeats at every level of the recursion.

Cooley and Tukey published the modern FFT in 1965, but the idea traces back to Gauss around 1805. It’s one of the most important algorithms ever discovered, and it fits in 30 lines of code.