Markov Chains, Eigenvectors, and PageRank

PageRank is a beautiful collision of linear algebra and probability. The core idea — a webpage is important if important pages link to it — is circular, and the math that resolves that circularity is the theory of Markov chains.

Random walks on graphs

Model the web as a directed graph. Each page is a node; each hyperlink is an edge. A “random surfer” starts at some page and, at each step, follows a uniformly random outgoing link.

If page $i$ has $L_i$ outgoing links and one of them points to page $j$, the probability of transitioning from $i$ to $j$ in one step is:

$$P_{ij} = \frac{1}{L_i}$$

This defines a transition matrix $P \in \mathbb{R}^{n \times n}$, where $n$ is the number of pages. Each row of $P$ sums to 1 — it’s a row-stochastic matrix. The entry $P_{ij}$ is the probability of going from state $i$ to state $j$.
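To make this concrete, here is the transition matrix for a hypothetical three-page web (page 0 links to 1 and 2, page 1 links to 2, page 2 links back to 0):

```python
import numpy as np

# Hypothetical 3-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
P = np.array([
    [0.0, 0.5, 0.5],  # page 0 has 2 outlinks, each taken with prob 1/2
    [0.0, 0.0, 1.0],  # page 1 links only to page 2
    [1.0, 0.0, 0.0],  # page 2 links only to page 0
])

# Row-stochastic: each row is a probability distribution over next pages
assert np.allclose(P.sum(axis=1), 1.0)
```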

If the surfer’s current position is described by a probability distribution (row vector) $\mathbf{x}^{(t)} \in \mathbb{R}^n$, then after one step:

$$\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} P$$

The question PageRank asks: does this process converge? Is there a stationary distribution $\boldsymbol{\pi}$ such that $\boldsymbol{\pi} = \boldsymbol{\pi} P$?

Stationary distributions

A stationary distribution $\boldsymbol{\pi}$ satisfies:

$$\boldsymbol{\pi} P = \boldsymbol{\pi}, \quad \sum_{i=1}^{n} \pi_i = 1, \quad \pi_i \geq 0$$

This is a left eigenvector equation: $\boldsymbol{\pi}$ is a left eigenvector of $P$ with eigenvalue 1. The Perron–Frobenius theorem guarantees that every stochastic matrix has eigenvalue 1, and if the matrix is irreducible and aperiodic, the eigenvalue 1 has multiplicity 1 and the corresponding eigenvector is strictly positive.

Irreducible means the graph is strongly connected — you can get from any page to any other. Aperiodic means the GCD of cycle lengths through any state is 1. Under both conditions, the stationary distribution exists, is unique, and the random walk converges to it regardless of where it starts:

$$\lim_{t \to \infty} \mathbf{x}^{(0)} P^t = \boldsymbol{\pi}$$
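We can watch this happen numerically. For a small made-up graph that is strongly connected and aperiodic, walks started from two different pages converge to the same distribution:

```python
import numpy as np

# Toy strongly connected, aperiodic graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

a = np.array([1.0, 0.0, 0.0])  # walk started at page 0
b = np.array([0.0, 0.0, 1.0])  # walk started at page 2

for _ in range(200):
    a = a @ P
    b = b @ P

# Both walks forget their starting point...
assert np.allclose(a, b)
# ...and land on the stationary distribution: pi = pi P
assert np.allclose(a, a @ P)
```

For this particular graph the fixed point works out to $(0.4, 0.2, 0.4)$, which you can confirm by solving $\boldsymbol{\pi} = \boldsymbol{\pi} P$ by hand.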

The problem with the raw web

The real web graph isn’t strongly connected. There are pages with no outgoing links (dangling nodes), and there are clusters of pages that link only to each other (absorbing components). The raw transition matrix $P$ isn’t even stochastic (a dangling node’s row sums to 0), and the chain is neither irreducible nor aperiodic, so convergence isn’t guaranteed.

PageRank fixes this with a teleportation mechanism. With probability $\alpha$ (the damping factor), the surfer follows a random link. With probability $1 - \alpha$, the surfer jumps to a uniformly random page:

$$M = \alpha P + (1 - \alpha) \frac{1}{n} \mathbf{1}\mathbf{1}^T$$

where $\mathbf{1}$ is the all-ones vector. The matrix $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ has every entry equal to $1/n$ — it represents a uniform random jump to any page.

For dangling nodes (rows of $P$ that are all zeros because the page has no outlinks), we replace the zero row with $\frac{1}{n}\mathbf{1}^T$ before applying the formula. The adjusted matrix $M$ is:

  • Stochastic: every row sums to 1.
  • Irreducible: teleportation makes every state reachable from every other.
  • Aperiodic: the self-loop probability from teleportation is nonzero.

So Perron–Frobenius applies, and a unique stationary distribution $\boldsymbol{\pi}$ exists. This $\boldsymbol{\pi}$ is the PageRank vector.

Google originally used $\alpha = 0.85$.
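As a sanity check on all three properties, here is the construction on a hypothetical three-page graph with one dangling node:

```python
import numpy as np

alpha, n = 0.85, 3

# Toy link structure: page 0 -> 1, page 1 -> {0, 2}, page 2 dangling
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0],  # dangling: no outlinks
])

P[2, :] = 1.0 / n                # dangling fix: jump uniformly anywhere
M = alpha * P + (1 - alpha) / n  # teleportation adds (1-alpha)/n everywhere

assert np.allclose(M.sum(axis=1), 1.0)  # stochastic
assert (M > 0).all()                    # every transition possible:
                                        # irreducible and aperiodic
```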

Computing PageRank: the power method

The simplest algorithm: start with any distribution $\mathbf{x}^{(0)}$ (usually uniform: $x_i^{(0)} = 1/n$) and iterate:

$$\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} M = \alpha \, \mathbf{x}^{(t)} P + (1 - \alpha) \frac{1}{n} \mathbf{1}^T$$

This is the power method for finding the dominant eigenvector. The convergence rate is determined by the spectral gap — the difference between the largest eigenvalue (1) and the magnitude of the second largest, $|\lambda_2|$. For the PageRank matrix:

$$|\lambda_2| \leq \alpha$$

so the number of iterations to reach precision $\epsilon$ is:

$$t = O\left(\frac{\log(1/\epsilon)}{\log(1/\alpha)}\right)$$

With $\alpha = 0.85$, this is about $O(\log(1/\epsilon) / 0.163)$, roughly 50–100 iterations for practical precision. Each iteration is a sparse matrix–vector product — $O(\text{nnz})$, where $\text{nnz}$ is the number of nonzero entries (links), which scales linearly with the size of the web.
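The bound $|\lambda_2| \leq \alpha$ is easy to check numerically. Here is a quick sketch using a random row-stochastic matrix (synthetic data standing in for a web graph):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 6, 0.85

A = rng.random((n, n))
P = A / A.sum(axis=1, keepdims=True)  # random row-stochastic matrix
M = alpha * P + (1 - alpha) / n       # teleported (Google) matrix

# Eigenvalue magnitudes, largest first
mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
assert np.isclose(mags[0], 1.0)  # dominant eigenvalue is 1
assert mags[1] <= alpha + 1e-12  # spectral gap of at least 1 - alpha
```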

Implementation

Here’s a complete implementation in Python. First, building the transition matrix from a link graph:

import numpy as np
from scipy import sparse


def build_transition_matrix(adjacency):
    """
    Build sparse row-stochastic transition matrix from adjacency list.
    adjacency[i] is a list of pages that page i links to.
    """
    n = len(adjacency)
    rows, cols, data = [], [], []
    dangling = []

    for i, neighbors in enumerate(adjacency):
        if len(neighbors) == 0:
            dangling.append(i)
        else:
            prob = 1.0 / len(neighbors)
            for j in neighbors:
                rows.append(i)
                cols.append(j)
                data.append(prob)

    P = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))
    return P, dangling

The power iteration with teleportation and dangling node handling:

def pagerank(adjacency, alpha=0.85, tol=1e-10, max_iter=200):
    """
    Compute PageRank via power iteration.
    Returns the stationary distribution as a numpy array.
    """
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)

    x = np.ones(n) / n  # uniform initial distribution

    for iteration in range(max_iter):
        # Dangling node mass: redistributed uniformly
        dangling_mass = sum(x[d] for d in dangling)

        # Power iteration step:
        # x_new = alpha * (x @ P + dangling_mass/n * ones) + (1-alpha)/n * ones
        x_new = alpha * (x @ P + dangling_mass / n) + (1 - alpha) / n

        # Check convergence (L1 norm)
        delta = np.abs(x_new - x).sum()
        x = x_new

        if delta < tol:
            print(f"Converged in {iteration + 1} iterations (delta={delta:.2e})")
            break

    return x

Testing on a small graph:

# 4-page web:
#   0 -> 1, 2
#   1 -> 2
#   2 -> 0
#   3 -> 0, 2  (page 3 links to 0 and 2)
adjacency = [
    [1, 2],     # page 0
    [2],        # page 1
    [0],        # page 2
    [0, 2],     # page 3
]

pr = pagerank(adjacency, alpha=0.85)
for page, score in sorted(enumerate(pr), key=lambda t: -t[1]):
    print(f"  Page {page}: {score:.6f}")

Output (omitting the convergence message):

  Page 2: 0.383879
  Page 0: 0.379734
  Page 1: 0.198887
  Page 3: 0.037500

Page 2 ranks highest — it receives links from three other pages. Page 3 ranks lowest — nothing links to it, so it only gets teleportation traffic.

The eigenvalue perspective

We can verify PageRank by computing the actual dominant left eigenvector of $M$:

def pagerank_eigen(adjacency, alpha=0.85):
    """Compute PageRank via direct eigenvalue decomposition."""
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)
    P_dense = P.toarray()

    # Fix dangling nodes
    for d in dangling:
        P_dense[d, :] = 1.0 / n

    # Build Google matrix
    M = alpha * P_dense + (1 - alpha) / n * np.ones((n, n))

    # Left eigenvectors: solve pi @ M = pi, i.e., M^T @ pi^T = pi^T
    eigenvalues, eigenvectors = np.linalg.eig(M.T)

    # Find eigenvector for eigenvalue closest to 1
    idx = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, idx])
    pi = pi / pi.sum()  # normalize to probability distribution

    return pi

Both methods produce the same vector up to numerical precision, confirming that the stationary distribution is the dominant left eigenvector.
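The same agreement can be checked on synthetic data. Here is a self-contained cross-check using a random row-stochastic matrix in place of a real link graph:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.85

# Random row-stochastic matrix standing in for the link structure
A = rng.random((n, n))
P = A / A.sum(axis=1, keepdims=True)
M = alpha * P + (1 - alpha) / n

# Method 1: power iteration
x = np.ones(n) / n
for _ in range(500):
    x = x @ M

# Method 2: dominant left eigenvector of M
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()  # normalize to a probability distribution

assert np.allclose(x, pi)
```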

Rate of convergence

The spectral gap controls how fast the power method converges. The second eigenvalue of the PageRank matrix satisfies $|\lambda_2| \leq \alpha$. To see why, write $M = \alpha P' + (1 - \alpha) E$, where $P'$ is the adjusted stochastic matrix and $E = \frac{1}{n}\mathbf{1}\mathbf{1}^T$. Every row of $M$ sums to 1, so $M\mathbf{1} = \mathbf{1}$. If $\mathbf{v}$ is a left eigenvector of $M$ with eigenvalue $\lambda \neq 1$, then $\mathbf{v}M\mathbf{1} = \lambda\,\mathbf{v}\mathbf{1}$ together with $M\mathbf{1} = \mathbf{1}$ forces $\mathbf{v}\mathbf{1} = 0$. That makes $\mathbf{v}E = \frac{1}{n}(\mathbf{v}\mathbf{1})\mathbf{1}^T = 0$, so $\mathbf{v}M = \alpha\,\mathbf{v}P'$: every non-dominant eigenvalue of $M$ is $\alpha$ times an eigenvalue of the stochastic matrix $P'$, whose spectral radius is 1. Hence:

$$|\lambda_k(M)| = \alpha\,|\lambda_k(P')| \leq \alpha \quad \text{for } k \geq 2$$

The error therefore decays geometrically. After $t$ iterations:

$$\|\mathbf{x}^{(t)} - \boldsymbol{\pi}\|_1 \leq C \cdot \alpha^t$$

For $\alpha = 0.85$ and $\epsilon = 10^{-10}$: $t \geq \frac{10 \ln 10}{\ln(1/0.85)} \approx 142$ iterations. In practice it converges faster because $|\lambda_2|$ is usually well below $\alpha$.
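The arithmetic behind those iteration counts, as a quick sketch:

```python
import math

alpha = 0.85
for eps in (1e-6, 1e-10):
    # iterations needed so that alpha**t < eps
    t = math.log(1 / eps) / math.log(1 / alpha)
    print(f"eps = {eps:g}: about {math.ceil(t)} iterations")
```

The `eps = 1e-10` case reproduces the ~142 iterations above; looser tolerances land in the 50–100 range quoted earlier.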

Beyond PageRank

The mathematical framework here — stochastic matrices, stationary distributions, spectral gaps — shows up throughout computer science. MCMC methods (Metropolis–Hastings, Gibbs sampling) are Markov chains engineered to have a target stationary distribution. Mixing times of random walks on expander graphs underpin randomized algorithms. The conductance of a Markov chain connects, via the Cheeger inequality, spectral gaps to graph connectivity.

PageRank itself is a specific instance of eigenvector centrality on directed graphs. The same idea — defining importance recursively and resolving the circularity with linear algebra — appears in recommendation systems, citation analysis, and social network ranking.

The math is classical. The insight was applying it to the web.