Markov Chains, Eigenvectors, and PageRank

PageRank is a beautiful collision of linear algebra and probability. The core idea — a webpage is important if important pages link to it — is circular, and the math that resolves that circularity is the theory of Markov chains.

Random walks on graphs

Model the web as a directed graph. Each page is a node; each hyperlink is an edge. A “random surfer” starts at some page and, at each step, follows a uniformly random outgoing link.

If page $i$ has $L_i$ outgoing links and one of them points to page $j$, the probability of transitioning from $i$ to $j$ in one step is:

$$P_{ij} = \frac{1}{L_i}$$

This defines a transition matrix $P \in \mathbb{R}^{n \times n}$, where $n$ is the number of pages. Each row of $P$ sums to 1 — it’s a row-stochastic matrix. The entry $P_{ij}$ is the probability of going from state $i$ to state $j$.
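To make this concrete, here is the transition matrix for a hypothetical three-page web (page 0 links to 1 and 2, page 1 links to 2, page 2 links back to 0):

```python
import numpy as np

# Hypothetical 3-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
P = np.array([
    [0.0, 0.5, 0.5],  # page 0 has 2 outlinks, each taken with prob 1/2
    [0.0, 0.0, 1.0],  # page 1 links only to page 2
    [1.0, 0.0, 0.0],  # page 2 links only to page 0
])

# Row-stochastic: each row is a probability distribution over next pages
assert np.allclose(P.sum(axis=1), 1.0)
```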

If the surfer’s current position is described by a probability distribution (row vector) $\mathbf{x}^{(t)} \in \mathbb{R}^n$, then after one step:

$$\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} P$$

The question PageRank asks: does this process converge? Is there a stationary distribution $\boldsymbol{\pi}$ such that $\boldsymbol{\pi} = \boldsymbol{\pi} P$?

Stationary distributions

A stationary distribution $\boldsymbol{\pi}$ satisfies:

$$\boldsymbol{\pi} P = \boldsymbol{\pi}, \quad \sum_{i=1}^{n} \pi_i = 1, \quad \pi_i \geq 0$$

This is a left eigenvector equation: $\boldsymbol{\pi}$ is a left eigenvector of $P$ with eigenvalue 1. The Perron–Frobenius theorem guarantees that every stochastic matrix has eigenvalue 1, and if the matrix is irreducible and aperiodic, the eigenvalue 1 has multiplicity 1 and the corresponding eigenvector is strictly positive.

Irreducible means the graph is strongly connected — you can get from any page to any other. Aperiodic means the GCD of cycle lengths through any state is 1. Under both conditions, the stationary distribution exists, is unique, and the random walk converges to it regardless of where it starts:

$$\lim_{t \to \infty} \mathbf{x}^{(0)} P^t = \boldsymbol{\pi}$$
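We can watch this happen numerically. For a small made-up graph that is strongly connected and aperiodic, walks started from two different pages converge to the same distribution:

```python
import numpy as np

# Toy strongly connected, aperiodic graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

a = np.array([1.0, 0.0, 0.0])  # walk started at page 0
b = np.array([0.0, 0.0, 1.0])  # walk started at page 2

for _ in range(200):
    a = a @ P
    b = b @ P

# Both walks forget their starting point...
assert np.allclose(a, b)
# ...and land on the stationary distribution: pi = pi P
assert np.allclose(a, a @ P)
```

For this particular graph the fixed point works out to $(0.4, 0.2, 0.4)$, which you can confirm by solving $\boldsymbol{\pi} = \boldsymbol{\pi} P$ by hand.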

The problem with the raw web

The real web graph isn’t strongly connected. There are pages with no outgoing links (dangling nodes), and there are clusters of pages that link only to each other (absorbing components). The raw transition matrix $P$ isn’t even stochastic (a dangling node’s row sums to 0), and the chain is neither irreducible nor aperiodic, so convergence isn’t guaranteed.

PageRank fixes this with a teleportation mechanism. With probability $\alpha$ (the damping factor), the surfer follows a random link. With probability $1 - \alpha$, the surfer jumps to a uniformly random page:

$$M = \alpha P + (1 - \alpha) \frac{1}{n} \mathbf{1}\mathbf{1}^T$$

where $\mathbf{1}$ is the all-ones vector. The matrix $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ has every entry equal to $1/n$ — it represents a uniform random jump to any page.

For dangling nodes (rows of $P$ that are all zeros because the page has no outlinks), we replace the zero row with $\frac{1}{n}\mathbf{1}^T$ before applying the formula. The adjusted matrix $M$ is:

  • Stochastic: every row sums to 1.
  • Irreducible: teleportation makes every state reachable from every other.
  • Aperiodic: the self-loop probability from teleportation is nonzero.

So Perron–Frobenius applies, and a unique stationary distribution $\boldsymbol{\pi}$ exists. This $\boldsymbol{\pi}$ is the PageRank vector.

Google originally used $\alpha = 0.85$.
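As a sanity check on all three properties, here is the construction on a hypothetical three-page graph with one dangling node:

```python
import numpy as np

alpha, n = 0.85, 3

# Toy link structure: page 0 -> 1, page 1 -> {0, 2}, page 2 dangling
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0],  # dangling: no outlinks
])

P[2, :] = 1.0 / n                # dangling fix: jump uniformly anywhere
M = alpha * P + (1 - alpha) / n  # teleportation adds (1-alpha)/n everywhere

assert np.allclose(M.sum(axis=1), 1.0)  # stochastic
assert (M > 0).all()                    # every transition possible:
                                        # irreducible and aperiodic
```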

Computing PageRank: the power method

The simplest algorithm: start with any distribution $\mathbf{x}^{(0)}$ (usually uniform: $x_i^{(0)} = 1/n$) and iterate:

$$\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)} M = \alpha \, \mathbf{x}^{(t)} P + (1 - \alpha) \frac{1}{n} \mathbf{1}^T$$

This is the power method for finding the dominant eigenvector. The convergence rate is determined by the spectral gap — the difference between the largest eigenvalue (1) and the magnitude of the second largest, $|\lambda_2|$. For the PageRank matrix:

$$|\lambda_2| \leq \alpha$$

so the number of iterations to reach precision $\epsilon$ is:

$$t = O\left(\frac{\log(1/\epsilon)}{\log(1/\alpha)}\right)$$

With $\alpha = 0.85$, this is about $O(\log(1/\epsilon) / 0.163)$, roughly 50–100 iterations for practical precision. Each iteration is a sparse matrix–vector product — $O(\text{nnz})$, where $\text{nnz}$ is the number of nonzero entries (links), which scales linearly with the size of the web.
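The bound $|\lambda_2| \leq \alpha$ is easy to check numerically. Here is a quick sketch using a random row-stochastic matrix (synthetic data standing in for a web graph):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 6, 0.85

A = rng.random((n, n))
P = A / A.sum(axis=1, keepdims=True)  # random row-stochastic matrix
M = alpha * P + (1 - alpha) / n       # teleported (Google) matrix

# Eigenvalue magnitudes, largest first
mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
assert np.isclose(mags[0], 1.0)  # dominant eigenvalue is 1
assert mags[1] <= alpha + 1e-12  # spectral gap of at least 1 - alpha
```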

Implementation

Here’s a complete implementation in Python. First, building the transition matrix from a link graph:

import numpy as np
from scipy import sparse


def build_transition_matrix(adjacency):
    """
    Build sparse row-stochastic transition matrix from adjacency list.
    adjacency[i] is a list of pages that page i links to.
    """
    n = len(adjacency)
    rows, cols, data = [], [], []
    dangling = []

    for i, neighbors in enumerate(adjacency):
        if len(neighbors) == 0:
            dangling.append(i)
        else:
            prob = 1.0 / len(neighbors)
            for j in neighbors:
                rows.append(i)
                cols.append(j)
                data.append(prob)

    P = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))
    return P, dangling

The power iteration with teleportation and dangling node handling:

def pagerank(adjacency, alpha=0.85, tol=1e-10, max_iter=200):
    """
    Compute PageRank via power iteration.
    Returns the stationary distribution as a numpy array.
    """
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)

    x = np.ones(n) / n  # uniform initial distribution

    for iteration in range(max_iter):
        # Dangling node mass: redistributed uniformly
        dangling_mass = sum(x[d] for d in dangling)

        # Power iteration step:
        # x_new = alpha * (x @ P + dangling_mass/n * ones) + (1-alpha)/n * ones
        x_new = alpha * (x @ P + dangling_mass / n) + (1 - alpha) / n

        # Check convergence (L1 norm)
        delta = np.abs(x_new - x).sum()
        x = x_new

        if delta < tol:
            print(f"Converged in {iteration + 1} iterations (delta={delta:.2e})")
            break

    return x

Testing on a small graph:

# 4-page web:
#   0 -> 1, 2
#   1 -> 2
#   2 -> 0
#   3 -> 0, 2  (page 3 links to 0 and 2)
adjacency = [
    [1, 2],     # page 0
    [2],        # page 1
    [0],        # page 2
    [0, 2],     # page 3
]

pr = pagerank(adjacency, alpha=0.85)
for page, score in sorted(enumerate(pr), key=lambda t: -t[1]):
    print(f"  Page {page}: {score:.6f}")

Output (omitting the convergence message):

  Page 2: 0.383879
  Page 0: 0.379734
  Page 1: 0.198887
  Page 3: 0.037500

Page 2 ranks highest — it receives links from three other pages. Page 3 ranks lowest — nothing links to it, so it only gets teleportation traffic.

The eigenvalue perspective

We can verify PageRank by computing the actual dominant left eigenvector of $M$:

def pagerank_eigen(adjacency, alpha=0.85):
    """Compute PageRank via direct eigenvalue decomposition."""
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)
    P_dense = P.toarray()

    # Fix dangling nodes
    for d in dangling:
        P_dense[d, :] = 1.0 / n

    # Build Google matrix
    M = alpha * P_dense + (1 - alpha) / n * np.ones((n, n))

    # Left eigenvectors: solve pi @ M = pi, i.e., M^T @ pi^T = pi^T
    eigenvalues, eigenvectors = np.linalg.eig(M.T)

    # Find eigenvector for eigenvalue closest to 1
    idx = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, idx])
    pi = pi / pi.sum()  # normalize to probability distribution

    return pi

Both methods produce the same vector up to numerical precision, confirming that the stationary distribution is the dominant left eigenvector.
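The same agreement can be checked on synthetic data. Here is a self-contained cross-check using a random row-stochastic matrix in place of a real link graph:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.85

# Random row-stochastic matrix standing in for the link structure
A = rng.random((n, n))
P = A / A.sum(axis=1, keepdims=True)
M = alpha * P + (1 - alpha) / n

# Method 1: power iteration
x = np.ones(n) / n
for _ in range(500):
    x = x @ M

# Method 2: dominant left eigenvector of M
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()  # normalize to a probability distribution

assert np.allclose(x, pi)
```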

Rate of convergence

The spectral gap controls how fast the power method converges. The second eigenvalue of the PageRank matrix satisfies $|\lambda_2| \leq \alpha$. To see why, write $M = \alpha P' + (1 - \alpha) E$, where $P'$ is the adjusted stochastic matrix and $E = \frac{1}{n}\mathbf{1}\mathbf{1}^T$. Every row of $M$ sums to 1, so $M\mathbf{1} = \mathbf{1}$. If $\mathbf{v}$ is a left eigenvector of $M$ with eigenvalue $\lambda \neq 1$, then $\mathbf{v}M\mathbf{1} = \lambda\,\mathbf{v}\mathbf{1}$ together with $M\mathbf{1} = \mathbf{1}$ forces $\mathbf{v}\mathbf{1} = 0$. That makes $\mathbf{v}E = \frac{1}{n}(\mathbf{v}\mathbf{1})\mathbf{1}^T = 0$, so $\mathbf{v}M = \alpha\,\mathbf{v}P'$: every non-dominant eigenvalue of $M$ is $\alpha$ times an eigenvalue of the stochastic matrix $P'$, whose spectral radius is 1. Hence:

$$|\lambda_k(M)| = \alpha\,|\lambda_k(P')| \leq \alpha \quad \text{for } k \geq 2$$

The error therefore decays geometrically. After $t$ iterations:

$$\|\mathbf{x}^{(t)} - \boldsymbol{\pi}\|_1 \leq C \cdot \alpha^t$$

For $\alpha = 0.85$ and $\epsilon = 10^{-10}$: $t \geq \frac{10 \ln 10}{\ln(1/0.85)} \approx 142$ iterations. In practice it converges faster because $|\lambda_2|$ is usually well below $\alpha$.
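The arithmetic behind those iteration counts, as a quick sketch:

```python
import math

alpha = 0.85
for eps in (1e-6, 1e-10):
    # iterations needed so that alpha**t < eps
    t = math.log(1 / eps) / math.log(1 / alpha)
    print(f"eps = {eps:g}: about {math.ceil(t)} iterations")
```

The `eps = 1e-10` case reproduces the ~142 iterations above; looser tolerances land in the 50–100 range quoted earlier.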

Beyond PageRank

The mathematical framework here — stochastic matrices, stationary distributions, spectral gaps — shows up throughout computer science. MCMC methods (Metropolis–Hastings, Gibbs sampling) are Markov chains engineered to have a target stationary distribution. Mixing times of random walks on expander graphs underpin randomized algorithms. The conductance of a Markov chain connects, via the Cheeger inequality, spectral gaps to graph connectivity.

PageRank itself is a specific instance of eigenvector centrality on directed graphs. The same idea — defining importance recursively and resolving the circularity with linear algebra — appears in recommendation systems, citation analysis, and social network ranking.

The math is classical. The insight was applying it to the web.