Markov Chains, Eigenvectors, and PageRank
PageRank is a beautiful collision of linear algebra and probability. The core idea — a webpage is important if important pages link to it — is circular, and the math that resolves that circularity is the theory of Markov chains.
Random walks on graphs
Model the web as a directed graph. Each page is a node; each hyperlink is an edge. A “random surfer” starts at some page and, at each step, follows a uniformly random outgoing link.
If page $i$ has $d_i$ outgoing links and one of them points to page $j$, the probability of transitioning from $i$ to $j$ in one step is:

$$P_{ij} = \frac{1}{d_i} \qquad \text{(and } P_{ij} = 0 \text{ if } i \text{ does not link to } j\text{)}$$
This defines an $n \times n$ transition matrix $P$, where $n$ is the number of pages. Each row of $P$ sums to 1 (it's a row-stochastic matrix). The entry $P_{ij}$ is the probability of going from state $i$ to state $j$.
If the surfer's current position is described by a probability distribution (row vector) $x_t$, then after one step:

$$x_{t+1} = x_t P$$
The question PageRank asks: does this process converge? Is there a stationary distribution $\pi$ such that $\pi = \pi P$?
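To make the update concrete, here is a minimal sketch of a single step on a hypothetical three-page graph (the graph and numbers are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
P = np.array([
    [0.0, 0.5, 0.5],  # page 0 splits probability over its two links
    [0.0, 0.0, 1.0],  # page 1 has a single link
    [1.0, 0.0, 0.0],  # page 2 links back to page 0
])

x = np.array([1.0, 0.0, 0.0])  # surfer starts on page 0
x = x @ P                      # one step: x_{t+1} = x_t P
print(x)                       # [0.  0.5 0.5]
```

After one step, the surfer's mass has split evenly over page 0's two outgoing links.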
Stationary distributions
A stationary distribution satisfies:

$$\pi P = \pi, \qquad \pi_i \ge 0, \qquad \sum_i \pi_i = 1$$
This is a left eigenvector equation: $\pi$ is a left eigenvector of $P$ with eigenvalue 1. The Perron–Frobenius theorem guarantees that every stochastic matrix has eigenvalue 1, and if the matrix is irreducible and aperiodic, the eigenvalue 1 has multiplicity 1 and the corresponding eigenvector is strictly positive.
Irreducible means the graph is strongly connected: you can get from any page to any other. Aperiodic means the GCD of cycle lengths through any state is 1. Under both conditions, the stationary distribution exists, is unique, and the random walk converges to it regardless of where it starts:

$$\lim_{t \to \infty} x_0 P^t = \pi \quad \text{for any initial distribution } x_0$$
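This convergence is easy to check numerically. A quick sketch on a small chain that is strongly connected and aperiodic (the matrix is made up for illustration):

```python
import numpy as np

# A strongly connected, aperiodic 3-state chain (illustrative).
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

# Iterate x -> x P from two very different starting distributions.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 1.0])
for _ in range(200):
    a = a @ P
    b = b @ P

print(a)      # both runs land on the same stationary distribution,
print(b)      # approximately [0.4 0.2 0.4]
print(a @ P)  # which is fixed by P: pi P = pi
```

Both starting points converge to the same $\pi$, as the theory promises.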
The problem with the raw web
The real web graph isn’t strongly connected. There are pages with no outgoing links (dangling nodes), and there are clusters of pages that link only to each other (absorbing components). The raw transition matrix is neither irreducible nor aperiodic, so convergence isn’t guaranteed.
PageRank fixes this with a teleportation mechanism. With probability $\alpha$ (the damping factor), the surfer follows a random link; with probability $1 - \alpha$, the surfer jumps to a uniformly random page:

$$G = \alpha P + (1 - \alpha) \frac{1}{n} \mathbf{1}\mathbf{1}^T$$
where $\mathbf{1}$ is the all-ones column vector. The matrix $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ has every entry equal to $1/n$: it represents a uniform random jump to any page.
For dangling nodes (rows of $P$ that are all zeros because the page has no outlinks), we replace the zero row with the uniform row $\frac{1}{n}\mathbf{1}^T$ before applying the formula. Calling the result $S$, the adjusted matrix $G = \alpha S + (1 - \alpha)\frac{1}{n}\mathbf{1}\mathbf{1}^T$ is:
- Stochastic: every row sums to 1.
- Irreducible: teleportation makes every state reachable from every other.
- Aperiodic: the self-loop probability from teleportation is nonzero.
So Perron–Frobenius applies, and a unique stationary distribution $\pi$ exists. This is the PageRank vector.
Google originally used $\alpha = 0.85$.
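As a sketch of the construction just described (graph invented for illustration), one can build $G$ for a tiny graph with a dangling page and check the three properties listed above:

```python
import numpy as np

alpha = 0.85
# Raw transitions for a hypothetical 3-page graph; page 2 is dangling.
P = np.array([
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],  # no outlinks
])
n = P.shape[0]

S = P.copy()
S[2, :] = 1.0 / n                # dangling row -> uniform distribution
G = alpha * S + (1 - alpha) / n  # adds (1-alpha)/n to every entry

print(G.sum(axis=1))  # each row sums to 1 (stochastic)
print((G > 0).all())  # every entry positive (irreducible and aperiodic)
```

Every entry of $G$ is at least $(1-\alpha)/n > 0$, which is what makes the chain irreducible and aperiodic at once.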
Computing PageRank: the power method
The simplest algorithm: start with any distribution (usually uniform: $x_0 = \frac{1}{n}\mathbf{1}^T$) and iterate:

$$x_{t+1} = x_t G$$
This is the power method for finding the dominant eigenvector. The convergence rate is determined by the spectral gap: the difference between the largest eigenvalue (1) and the magnitude of the second largest, $|\lambda_2|$. For the PageRank matrix:

$$|\lambda_2| \le \alpha$$
so the number of iterations to reach precision $\epsilon$ is:

$$t \approx \frac{\log \epsilon}{\log \alpha}$$
With $\alpha = 0.85$, this is about 14 iterations per decimal digit of precision, roughly 50–100 iterations for practical precision. Each iteration is a sparse matrix-vector product, $O(\mathrm{nnz})$, where nnz is the number of nonzero entries (links), which scales linearly with the size of the web.
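A quick back-of-envelope check of that estimate:

```python
import math

alpha = 0.85
for eps in (1e-3, 1e-6, 1e-10):
    t = math.log(eps) / math.log(alpha)  # iterations to reach precision eps
    print(f"eps={eps:.0e}: ~{t:.0f} iterations")
```

For $\epsilon = 10^{-6}$ this gives about 85 iterations, squarely in the 50–100 range quoted above.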
Implementation
Here’s a complete implementation in Python. First, building the transition matrix from a link graph:
```python
import numpy as np
from scipy import sparse


def build_transition_matrix(adjacency):
    """
    Build sparse row-stochastic transition matrix from adjacency list.
    adjacency[i] is a list of pages that page i links to.
    """
    n = len(adjacency)
    rows, cols, data = [], [], []
    dangling = []
    for i, neighbors in enumerate(adjacency):
        if len(neighbors) == 0:
            dangling.append(i)
        else:
            prob = 1.0 / len(neighbors)
            for j in neighbors:
                rows.append(i)
                cols.append(j)
                data.append(prob)
    P = sparse.csr_matrix((data, (rows, cols)), shape=(n, n))
    return P, dangling
```
The power iteration with teleportation and dangling node handling:
```python
def pagerank(adjacency, alpha=0.85, tol=1e-10, max_iter=200):
    """
    Compute PageRank via power iteration.
    Returns the stationary distribution as a numpy array.
    """
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)
    x = np.ones(n) / n  # uniform initial distribution
    for iteration in range(max_iter):
        # Dangling node mass: redistributed uniformly
        dangling_mass = sum(x[d] for d in dangling)
        # Power iteration step:
        # x_new = alpha * (x @ P + dangling_mass/n * ones) + (1-alpha)/n * ones
        x_new = alpha * (x @ P + dangling_mass / n) + (1 - alpha) / n
        # Check convergence (L1 norm)
        delta = np.abs(x_new - x).sum()
        x = x_new
        if delta < tol:
            print(f"Converged in {iteration + 1} iterations (delta={delta:.2e})")
            break
    return x
```
Testing on a small graph:
```python
# 4-page web:
# 0 -> 1, 2
# 1 -> 2
# 2 -> 0
# 3 -> 0, 2
adjacency = [
    [1, 2],  # page 0
    [2],     # page 1
    [0],     # page 2
    [0, 2],  # page 3
]

pr = pagerank(adjacency, alpha=0.85)
for page, score in sorted(enumerate(pr), key=lambda t: -t[1]):
    print(f" Page {page}: {score:.6f}")
```
Output (after the convergence message, whose exact iteration count depends on the tolerance):

```
 Page 2: 0.383879
 Page 0: 0.379734
 Page 1: 0.198887
 Page 3: 0.037500
```
Page 2 ranks highest — it receives links from three other pages. Page 3 ranks lowest — nothing links to it, so it only gets teleportation traffic.
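That last point can be made exact: a page with no inbound links receives only teleportation mass, so its score should be precisely $(1-\alpha)/n = 0.0375$. A self-contained sketch that rebuilds the same 4-page example with dense matrices (not reusing the functions above):

```python
import numpy as np

alpha, n = 0.85, 4
P = np.zeros((n, n))
P[0, [1, 2]] = 0.5  # page 0 -> 1, 2
P[1, 2] = 1.0       # page 1 -> 2
P[2, 0] = 1.0       # page 2 -> 0
P[3, [0, 2]] = 0.5  # page 3 -> 0, 2

G = alpha * P + (1 - alpha) / n  # no dangling nodes in this graph

x = np.ones(n) / n
for _ in range(500):
    x = x @ G

print(x[3], (1 - alpha) / n)  # page 3's score equals the teleportation mass
```

Stationarity forces $\pi_3 = \sum_i \pi_i \cdot \frac{1-\alpha}{n} = \frac{1-\alpha}{n}$, since every entry in column 3 of $G$ is the teleportation term alone.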
The eigenvalue perspective
We can verify PageRank by computing the actual dominant left eigenvector of $G$:
```python
def pagerank_eigen(adjacency, alpha=0.85):
    """Compute PageRank via direct eigenvalue decomposition."""
    n = len(adjacency)
    P, dangling = build_transition_matrix(adjacency)
    P_dense = P.toarray()
    # Fix dangling nodes
    for d in dangling:
        P_dense[d, :] = 1.0 / n
    # Build Google matrix
    M = alpha * P_dense + (1 - alpha) / n * np.ones((n, n))
    # Left eigenvectors: solve pi @ M = pi, i.e., M^T @ pi^T = pi^T
    eigenvalues, eigenvectors = np.linalg.eig(M.T)
    # Find eigenvector for eigenvalue closest to 1
    idx = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, idx])
    pi = pi / pi.sum()  # normalize to probability distribution
    return pi
```
Both methods produce the same vector (up to numerical precision), confirming that the stationary distribution is the dominant left eigenvector.
Rate of convergence
The spectral gap controls how fast the power method converges. The second eigenvalue of the PageRank matrix satisfies $|\lambda_2(G)| \le \alpha$. To see why, write $G = \alpha S + (1 - \alpha) E$, where $S$ is the adjusted stochastic matrix and $E = \frac{1}{n}\mathbf{1}\mathbf{1}^T$. The rank-1 matrix $E$ maps every probability distribution to the uniform one, and a result of Haveliwala and Kamvar shows that the non-unit eigenvalues of $G$ are exactly $\alpha$ times the non-unit eigenvalues of $S$; since every eigenvalue of a stochastic matrix has modulus at most 1:

$$|\lambda_2(G)| \le \alpha$$
The error after $t$ iterations decays as $O(\alpha^t)$. More precisely:

$$\|x_t - \pi\|_1 \le \alpha^t \, \|x_0 - \pi\|_1$$
For $\alpha = 0.85$ and $\epsilon = 10^{-10}$: $t \approx \log(10^{-10}) / \log(0.85) \approx 142$ iterations. In practice it often converges faster because $|\lambda_2|$ can be well below $\alpha$.
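Both points, the $\alpha^t$ worst-case bound and the faster decay in practice, can be observed empirically. A sketch on a random dense stochastic matrix (the construction is arbitrary, chosen only so the chain is irreducible):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 50, 0.85

# Random row-stochastic S as a stand-in for the adjusted web matrix.
S = rng.random((n, n))
S /= S.sum(axis=1, keepdims=True)
G = alpha * S + (1 - alpha) / n

# High-precision reference for pi.
pi = np.ones(n) / n
for _ in range(2000):
    pi = pi @ G

# Measure the L1 error of the power method against the alpha^t bound.
x = np.ones(n) / n
for t in range(1, 31):
    x = x @ G
    err = np.abs(x - pi).sum()
    if t % 10 == 0:
        print(f"t={t:2d}  error={err:.1e}  alpha^t bound={alpha**t:.1e}")
```

For a dense random $S$ the subdominant eigenvalues are tiny, so the measured error sits far below the $\alpha^t$ bound, illustrating the "faster in practice" remark above.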
Beyond PageRank
The mathematical framework here — stochastic matrices, stationary distributions, spectral gaps — shows up throughout computer science. MCMC methods (Metropolis–Hastings, Gibbs sampling) are Markov chains engineered to have a target stationary distribution. Mixing times of random walks on expander graphs underpin randomized algorithms. The conductance of a Markov chain connects spectral gaps to graph connectivity via the Cheeger inequality.
PageRank itself is a specific instance of eigenvector centrality on directed graphs. The same idea — defining importance recursively and resolving the circularity with linear algebra — appears in recommendation systems, citation analysis, and social network ranking.
The math is classical. The insight was applying it to the web.