Persistent Homology
Key Points
- Persistent homology tracks how topological features (components, loops, voids) appear and disappear as a scale parameter grows over a filtered simplicial complex.
- A filtration is an increasing sequence of complexes in which every simplex enters at a specific scale, but only after all of its faces.
- The core computation reduces a boundary matrix over a field (often \(\mathbb{Z}_2\)) to obtain birth–death pairs and barcodes/diagrams.
- Vietoris–Rips filtrations are built from point clouds by inserting simplices when all pairwise distances fall below a threshold; the filtration value of a simplex is its maximum edge length.
- Zero columns in the reduced boundary matrix create features; nonzero columns kill previously created features, forming birth–death pairs.
- Long bars often indicate meaningful shape while short bars are more likely noise, and stability theorems quantify this robustness.
- Computing only \(H_0\) (connected components) is very fast with union–find and is equivalent to Kruskal's MST process.
- Full persistent homology can be cubic in the number of simplices, so careful filtration design and low-dimensional truncation are crucial.
Prerequisites
- Basic linear algebra — Understanding vector spaces, matrices, kernels, images, and rank is needed for chain groups and boundary matrix reduction.
- Graph theory and combinatorics — Simplicial complexes generalize graphs; counting and constructing simplices relies on combinatorial reasoning.
- Point-set topology (introductory) — Concepts like connectedness and loops clarify the meaning of H0 and H1 features.
- Algorithms and data structures — Efficient implementations need sorting, hash maps, union–find, and sparse representations.
- Metric spaces and distances — Rips filtrations depend on pairwise distances; properties like the triangle inequality matter.
- Computational geometry (basics) — Alternative filtrations (alpha complexes) use Delaunay/Voronoi structures and geometric primitives.
Detailed Explanation
Overview
Persistent homology is a method from topological data analysis (TDA) that summarizes the shape of data across scales. Given data (for example, a point cloud), we do not fix a single scale; instead, we build a family of shapes, called a filtration, by steadily relaxing a scale parameter. As the scale grows, new connections form, loops appear, and voids are filled. Persistent homology records when each feature is born and when it dies, producing a multiscale signature of the data such as a barcode (a set of intervals) or a persistence diagram (points in the plane).

Practically, we first turn data into a simplicial complex (e.g., Vietoris–Rips or alpha complex), assign each simplex a filtration value that respects the face condition (faces appear no later than the simplex), sort simplices by this order, build a boundary matrix over a field, and perform a matrix reduction. The pivot structure of the reduced matrix reveals birth–death pairs across homology dimensions (\(H_0\) for components, \(H_1\) for loops, \(H_2\) for voids, etc.). The result is robust: small perturbations of the input cause small changes in the diagram. Persistent homology has become a standard tool for understanding structure in high-dimensional and noisy data.
Intuition & Analogies
Imagine slowly flooding a landscape. At first, isolated puddles form in the lowest areas; these are connected components (\(H_0\)). As the water level rises, puddles merge, and some components die: the moment two pools connect, one of their identities is lost. Occasionally, water can surround a hill, creating a loop (\(H_1\)); later, as the level keeps rising, that loop is filled in and dies. In higher dimensions, the same story repeats with voids (\(H_2\)), like air pockets in a sponge, which appear and later disappear. Persistent homology is the logbook of this flood, noting the water levels (scales) at which each feature appears (birth) and vanishes (death). For point clouds, replace the water with growing balls around each point. Where two balls touch, an edge appears; cycles of edges not yet filled by triangles form loops; as the balls keep growing, triangles spanned by mutually touching triplets fill in and kill those loops. If a loop persists over a large range of radii, it is like a ring-shaped canyon, hard to destroy, so it shows as a long bar and likely reflects real structure. Short bars often correspond to small, local irregularities, ripples in the terrain, more likely caused by noise. The beauty is that we don't have to pick a single radius (which can be arbitrary and fragile); instead, we read structure across all radii and trust the features that endure.
Formal Definition
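The body of this section is missing; a compact version of the standard definitions, using the same notation as the Key Formulas section below, reads as follows.

```latex
% Filtration: a nested family of simplicial complexes indexed by scale,
K_{t_0} \subseteq K_{t_1} \subseteq \cdots \subseteq K_{t_m} = K,
\qquad t_0 \le t_1 \le \cdots \le t_m.

% Homology at each scale t, over a field (here \mathbb{Z}_2):
H_k(K_t) \;=\; \ker \partial_k \,/\, \operatorname{im} \partial_{k+1}.

% Inclusions K_s \hookrightarrow K_t for s \le t induce linear maps
\varphi_s^t : H_k(K_s) \longrightarrow H_k(K_t).

% A class is born at b if it appears in H_k(K_b) but not in the image of any
% \varphi_{b'}^{\,b} with b' < b; it dies at d when its image under
% \varphi_b^d becomes zero or merges into the image of an older class.
% The multiset of intervals [b, d) over all classes is the barcode,
% equivalently plotted as points (b, d) in the persistence diagram.
```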
When to Use
Use persistent homology when you need scale-robust shape descriptors of data. Typical scenarios include: (1) point clouds in \(\mathbb{R}^d\), where you suspect clusters, loops, or voids (e.g., sensor trajectories, molecular conformations); (2) images or voxel data, where cubical complexes capture tunnels and cavities; (3) graphs and networks, where \(H_0\) persistence provides a principled multiscale clustering (via Kruskal/MST) and higher-dimensional features can be studied on clique complexes; (4) time series, via sliding-window embeddings that turn dynamics into loops/tori; (5) model comparison, where persistence diagrams allow quantitative comparison through distances such as the bottleneck distance. Choose Vietoris–Rips for generic metric data (simple to build, but combinatorially heavy), alpha complexes for Euclidean point clouds (geometrically faithful and sparser), or cubical complexes for gridded data. Compute only as high a dimension as needed (often \(H_0\)–\(H_1\)), and restrict the filtration range to control complexity. Persistent homology is especially valuable when you need invariance to isometries and robustness to noise, and when conventional clustering or dimensionality reduction misses global structure like loops.
⚠️ Common Mistakes
- Violating the face condition: inserting a simplex before its faces leads to invalid filtrations and wrong pairings. Always ensure every \(k\)-simplex enters no earlier than all of its \((k-1)\)-faces.
- Mishandling ties: when multiple simplices share the same filtration value, inconsistent ordering can change pairings. Break ties deterministically, e.g., by increasing dimension, then lexicographic vertex order.
- Ignoring coefficient fields: working over \(\mathbb{Z}_2\) simplifies signs; using other fields changes results (especially in the presence of torsion). State and fix the field upfront.
- Overbuilding Rips complexes: they blow up combinatorially; computing up to dimension 1 or 2 and capping the radius often suffices.
- Misinterpreting infinite bars: the infinite \(H_0\) bar corresponds to the final connected component (the whole space), not a special feature.
- Numerical instability: distances that are nearly equal can cause large tie sets; snap values with a tolerance and use a stable sort.
- Memory blowups: dense boundary matrices are huge; prefer sparse column representations and avoid constructing unnecessary higher-dimensional simplices.
- Conflating homology dimensions: births in column reduction happen at the column's own dimension; deaths recorded by a column of dimension \(k\) correspond to features in \(H_{k-1}\). Keep this dimension shift clear.
Key Formulas
Face Condition for Filtrations
\[ \tau \subseteq \sigma \;\Longrightarrow\; f(\tau) \le f(\sigma) \]
Explanation: A simplex can only appear after all of its faces. This guarantees each filtration level is a valid simplicial complex and enables correct persistence pairing.
Vietoris–Rips Complex
\[ \mathrm{VR}_t(X) = \{\, \sigma \subseteq X : d(x,y) \le t \ \text{for all}\ x, y \in \sigma \,\} \]
Explanation: A simplex is included if all its edges are short. In a filtration, one sets the filtration value of \(\sigma\) to the maximum edge length inside it.
Chain Groups and Boundary Maps
\[ \partial_k : C_k \to C_{k-1}, \qquad \partial_{k-1} \circ \partial_k = 0 \]
Explanation: Chains are vectors over simplices; boundary maps send k-chains to (k−1)-chains and compose to zero. This structure defines cycles and boundaries.
Homology at Scale t
\[ H_k(K_t) = \ker \partial_k \,/\, \operatorname{im} \partial_{k+1} \]
Explanation: Homology groups capture k-dimensional holes present at filtration level t by modding out boundaries from cycles.
Betti Numbers
\[ \beta_k(t) = \dim H_k(K_t) \]
Explanation: The number of independent k-dimensional features at time t. These counts can change only when a simplex enters the filtration.
Persistence Module Maps
\[ \varphi_s^t : H_k(K_s) \to H_k(K_t), \qquad s \le t \]
Explanation: Inclusion-induced maps track how homology classes evolve across scales. Their structure decomposes into interval modules, giving barcodes.
Stability Theorem (Bottleneck Distance)
\[ d_B\big(\mathrm{Dgm}(f), \mathrm{Dgm}(g)\big) \le \|f - g\|_\infty \]
Explanation: Small perturbations of the filtration function move the persistence diagram by at most the perturbation size. This justifies the robustness of persistent homology.
Combinatorial Growth
\[ \#\{k\text{-simplices on } n \text{ vertices}\} = \binom{n}{k+1} \]
Explanation: The number of simplices grows rapidly with n and k. This limits practical computations to low dimensions and moderate n.
Boundary Matrix Reduction Cost
\[ O(m^3) \ \text{worst case for an } m \times m \text{ boundary matrix} \]
Explanation: Reducing an m-by-m boundary matrix densely can be cubic. Sparse methods and heuristics often make it much faster in practice.
H0 via Union–Find
\[ O\big(E \log E + E\,\alpha(n)\big) \ \text{for } E \text{ edges on } n \text{ points} \]
Explanation: Zero-dimensional persistence using Kruskal’s process is near-linear in the number of edges, where \(\alpha\) is the inverse Ackermann function.
Complexity Analysis
Building a Vietoris–Rips filtration up to dimension k on n points can produce up to \(\binom{n}{k+1}\) simplices, and reducing the resulting \(m \times m\) boundary matrix costs \(O(m^3)\) in the worst case, though sparse column representations and reduction heuristics are typically far faster. Zero-dimensional persistence alone is much cheaper: sorting the edges dominates at \(O(E \log E)\), followed by near-linear union–find merges. In practice one caps the maximum radius, truncates the dimension (often at 1 or 2), and prefers sparser complexes such as alpha complexes for Euclidean data.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// This educational example builds a Vietoris–Rips filtration up to dimension 2
// from a small 2D point cloud, reduces the boundary matrix over Z2, and prints
// persistence intervals for H0 and H1. It is NOT optimized and intended for small inputs.

struct Simplex {
    vector<int> verts; // sorted vertex indices
    int dim;           // dimension = verts.size()-1
    double filt;       // filtration value (for Rips: max edge length in the simplex)
};

// Key encoding for a simplex's vertex set (sorted) to map to its index
static string keyOf(const vector<int>& v){
    string s; s.reserve(v.size()*3);
    for(size_t i=0;i<v.size();++i){
        if(i) s.push_back(',');
        s += to_string(v[i]);
    }
    return s;
}

// Symmetric difference (XOR) of two sorted vectors of ints
static vector<int> symdiff(const vector<int>& a, const vector<int>& b){
    vector<int> out; out.reserve(a.size()+b.size());
    size_t i=0, j=0;
    while(i<a.size() || j<b.size()){
        if(j==b.size() || (i<a.size() && a[i]<b[j])){ out.push_back(a[i++]); }
        else if(i==a.size() || b[j]<a[i]){ out.push_back(b[j++]); }
        else { // equal -> cancel in Z2
            ++i; ++j;
        }
    }
    return out;
}

// Compute Euclidean distances
static vector<vector<double>> pairwiseDistances(const vector<pair<double,double>>& pts){
    int n = (int)pts.size();
    vector<vector<double>> D(n, vector<double>(n, 0.0));
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            double dx = pts[i].first - pts[j].first;
            double dy = pts[i].second - pts[j].second;
            double d = sqrt(dx*dx + dy*dy);
            D[i][j]=D[j][i]=d;
        }
    }
    return D;
}

int main(){
    // Example point cloud: noisy circle-like points
    vector<pair<double,double>> P = {
        {1.0,0.0},{0.707,0.707},{0.0,1.0},{-0.707,0.707},{-1.0,0.0},
        {-0.707,-0.707},{0.0,-1.0},{0.707,-0.707},{1.2,0.1},{-1.1,-0.1}
    };
    int n = (int)P.size();
    auto D = pairwiseDistances(P);

    // Build Vietoris–Rips filtration up to dim=2 with a radius cap to keep size small
    double maxR = 1.1; // cap (adjust to control size). Triangles/edges with filt > maxR are skipped.

    vector<Simplex> simplices;
    simplices.reserve(n + n*(n-1)/2);

    // 0-simplices (vertices) with filtration 0
    for(int i=0;i<n;i++){
        simplices.push_back({{i}, 0, 0.0});
    }

    // 1-simplices (edges)
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            double w = D[i][j];
            if(w <= maxR){
                simplices.push_back({{i,j}, 1, w});
            }
        }
    }

    // 2-simplices (triangles): include if all three edges <= maxR (i.e., max edge length <= maxR)
    for(int i=0;i<n;i++){
        for(int j=i+1;j<n;j++){
            if(D[i][j] > maxR) continue;
            for(int k=j+1;k<n;k++){
                double m = max({D[i][j], D[i][k], D[j][k]});
                if(m <= maxR){
                    simplices.push_back({{i,j,k}, 2, m});
                }
            }
        }
    }

    // Sort by (filtration, dimension, lex vertices) to enforce a consistent order
    stable_sort(simplices.begin(), simplices.end(), [](const Simplex& a, const Simplex& b){
        if (a.filt != b.filt) return a.filt < b.filt;
        if (a.dim != b.dim) return a.dim < b.dim; // faces first on ties
        return a.verts < b.verts;
    });

    int m = (int)simplices.size();

    // Map simplex key -> index in sorted order
    unordered_map<string,int> indexOf; indexOf.reserve(m*2);
    for(int i=0;i<m;i++){
        indexOf[keyOf(simplices[i].verts)] = i;
    }

    // Build boundary columns: store each column as a sorted list of row indices with 1s (Z2)
    vector<vector<int>> boundary(m);
    for(int col=0; col<m; col++){
        const auto& s = simplices[col];
        if(s.dim == 0) continue; // vertex has empty boundary
        // faces by removing one vertex at a time
        for(int t=0; t<(int)s.verts.size(); t++){
            vector<int> face = s.verts;
            face.erase(face.begin()+t);
            auto it = indexOf.find(keyOf(face));
            if(it == indexOf.end()){
                cerr << "Error: face not found in filtration (violates face condition)\n";
                return 1;
            }
            boundary[col].push_back(it->second);
        }
        // Sort rows to enable XOR via symmetric difference
        sort(boundary[col].begin(), boundary[col].end());
    }

    // Column reduction over Z2
    vector<int> low(m, -1);          // low[j] = pivot row of reduced column j, or -1 if zero column
    unordered_map<int,int> pivotCol; // pivot row -> column having that low
    pivotCol.reserve(m*2);

    for(int j=0; j<m; j++){
        auto& colVec = boundary[j];
        // compute low(j)
        auto getLow = [&](){ return colVec.empty() ? -1 : colVec.back(); };
        int lj = getLow();
        while(lj != -1){
            auto it = pivotCol.find(lj);
            if(it == pivotCol.end()) break;
            int k = it->second; // existing column with same low
            // XOR columns: col_j = col_j + col_k over Z2
            colVec = symdiff(colVec, boundary[k]);
            lj = getLow();
        }
        low[j] = lj;
        if(lj != -1){
            pivotCol[lj] = j; // j kills the class born at row lj
        }
    }

    // Collect intervals:
    // - If column j is zero after reduction (low[j]==-1), it creates an H_{dim(s_j)} class at
    //   filt(s_j) with infinite death, unless killed later via another column's pairing.
    // - If low[j]==i, then column j (dim d) kills a class in H_{d-1} born at column i's filtration.

    struct Interval { int dim; double birth, death; bool infinite; };
    vector<Interval> intervals;

    // Create finite intervals from nonzero columns
    for(int j=0;j<m;j++){
        if(low[j] != -1){
            int i = low[j];
            int d = simplices[j].dim - 1; // killed feature dimension
            if(d >= 0){
                intervals.push_back({d, simplices[i].filt, simplices[j].filt, false});
            }
        }
    }
    // Create infinite intervals from zero columns that were never used as a pivot row.
    // A zero column at dim d births an H_d class; if its index appears as some low[j],
    // it is killed; otherwise it persists to infinity.
    vector<char> isKilled(m, 0);
    for(int j=0;j<m;j++) if(low[j]!=-1) isKilled[low[j]] = 1;
    for(int i=0;i<m;i++){
        // Only zero columns are births; nonzero columns correspond to deaths of lower-dim classes
        if(!isKilled[i] && low[i] == -1){
            int d = simplices[i].dim;
            intervals.push_back({d, simplices[i].filt, numeric_limits<double>::infinity(), true});
        }
    }

    // Print results grouped by dimension
    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << "Intervals (dimension 0):\n";
    for(const auto& I: intervals){
        if(I.dim==0){
            cout << " [" << I.birth << ", " << (I.infinite? string("inf"): to_string(I.death)) << ")\n";
        }
    }
    cout << "Intervals (dimension 1):\n";
    for(const auto& I: intervals){
        if(I.dim==1){
            cout << " [" << I.birth << ", " << (I.infinite? string("inf"): to_string(I.death)) << ")\n";
        }
    }

    return 0;
}
This program constructs a Vietoris–Rips filtration up to dimension 2 from a small 2D point set. Each 0-simplex has filtration value 0; each edge's filtration value is its length; each triangle's is the maximum of its three edge lengths. After sorting simplices by (filtration, dimension, lexicographic vertices), the program builds a sparse boundary matrix over \(\mathbb{Z}_2\), performs the standard column-reduction algorithm, and extracts intervals. A nonzero reduced column of dimension d pairs with its pivot row to give an \(H_{d-1}\) interval [birth_of_pivot, filtration_of_column). A zero column of dimension d creates an \(H_d\) class, which persists unless a later column kills it (reflected as that column's pivot). The output lists H0 and H1 intervals.
#include <bits/stdc++.h>
using namespace std;

// Compute zero-dimensional persistence (H0) from a point cloud by
// sorting all edges by length and unioning components (Kruskal).
// Each union corresponds to a death time for one H0 bar; one bar persists to infinity.

struct DSU {
    vector<int> p, r;
    DSU(int n=0){ init(n); }
    void init(int n){ p.resize(n); r.assign(n,0); iota(p.begin(), p.end(), 0); }
    int find(int x){ return p[x]==x ? x : p[x]=find(p[x]); }
    bool unite(int a, int b){
        a=find(a); b=find(b);
        if(a==b) return false;
        if(r[a]<r[b]) swap(a,b);
        p[b]=a;
        if(r[a]==r[b]) r[a]++;
        return true;
    }
};

int main(){
    vector<pair<double,double>> P = {
        {0,0},{1,0},{2,0},{0,1},{1,1},{2,1}
    }; // 6 points in a grid-like pattern
    int n = (int)P.size();

    // Build all edges with their lengths
    struct Edge{ int u, v; double w; };
    vector<Edge> E;
    for(int i=0;i<n;i++) for(int j=i+1;j<n;j++){
        double dx=P[i].first-P[j].first, dy=P[i].second-P[j].second;
        double d=hypot(dx,dy);
        E.push_back({i,j,d});
    }
    sort(E.begin(), E.end(), [](const Edge& a, const Edge& b){ return a.w < b.w; });

    // Initially each vertex births an H0 class at time 0.
    // Each successful union kills one class at the current edge weight.
    DSU dsu(n);
    vector<pair<double,double>> intervals; // [birth, death)
    int components = n;
    for(const auto& e : E){
        if(dsu.unite(e.u, e.v)){
            intervals.push_back({0.0, e.w}); // one bar dies now (birth=0)
            components--;
            if(components==1) break; // remaining bar is infinite
        }
    }

    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << "H0 intervals from Kruskal/Union-Find:\n";
    for(const auto& I : intervals){
        cout << " [" << I.first << ", " << I.second << ")\n";
    }
    cout << " [0.0000, inf)\n"; // final component persists
    return 0;
}
This code computes H0 persistence by sorting all pairwise edges and performing union–find merges, which is exactly Kruskal's MST process. Each time the smallest available edge merges two components, one H0 bar dies at that edge length; every bar is born at 0 because each vertex enters the filtration at value 0. One component survives forever, yielding the infinite interval [0, ∞). This approach scales to large datasets and is the standard fast method for connected-component persistence.