📚TheoryIntermediate

Topological Data Analysis (TDA)

Key Points

•
Topological Data Analysis (TDA) studies the shape of data using tools from algebraic topology, producing summaries like Betti numbers, barcodes, and persistence diagrams.
•
A Vietoris–Rips complex builds a simplicial complex from a point cloud by connecting points whose pairwise distances are within a threshold.
•
Persistent homology tracks how connected components, loops, and voids appear and disappear as a scale parameter grows, yielding robust multiscale features.
•
For 0-dimensional persistence, a Union–Find (disjoint set) structure with Kruskal’s algorithm computes barcodes efficiently from pairwise distances.
•
Euler characteristic curves computed from the 2-skeleton (vertices, edges, triangles) give quick shape proxies and can hint at loop counts.
•
Time-series data can be analyzed via lower-star filtrations on graph signals to obtain 0D persistence that identifies basins and merges.
•
Exact higher-dimensional homology via matrix reduction can be expensive, so practitioners cap the complex dimension or sparsify data.
•
Careful choice of distance metric, scale convention (radius vs. distance), and numerical stability is crucial for valid results.

Prerequisites

→Metric spaces and distance functions — TDA constructions like Rips and Čech complexes rely on pairwise distances and triangle inequalities.
→Basic graph theory — Understanding vertices, edges, connected components, and spanning trees is essential for 0D persistence and filtrations on graphs.
→Linear algebra — Homology is computed via kernels and images of boundary matrices; concepts like rank and nullity are fundamental.
→Algebraic topology (introductory) — Concepts such as simplices, homology groups, and Betti numbers form the theoretical backbone of TDA.
→Disjoint Set Union (Union–Find) — Efficient computation of 0D persistent homology relies on DSU with near-constant-time union/find operations.
→Sorting and algorithm analysis — Edge sorting (Kruskal) and complexity understanding are needed for practical TDA implementations.
→Computational geometry (basic) — Building complexes from point clouds requires distance computation and sometimes geometric structures like Delaunay triangulations.
→C++ vectors, iterators, and STL algorithms — Implementations use standard containers, sorting, and numeric utilities for performance and clarity.

Detailed Explanation

Tap terms for definitions

01Overview

Topological Data Analysis (TDA) brings ideas from algebraic topology to data science. Instead of focusing on specific coordinates or distances alone, TDA captures the global shape of data: how many connected pieces it has, how many loops or holes exist, and how these features persist across scales. The core workflow begins with a dataset embedded in a metric space, often a high-dimensional point cloud. We build a family of combinatorial objects called simplicial complexes (e.g., Vietoris–Rips complexes) that expand as a scale parameter increases. Homology provides a way to count k-dimensional features: connected components (k=0), loops (k=1), voids (k=2), and so on. Persistent homology tracks when such features appear (birth) and disappear (death) while sweeping the scale parameter, producing a barcode (collections of intervals) or a persistence diagram (points in the plane). These summaries are robust to noise, have theoretical stability guarantees, and often reveal structures that standard statistics or dimensionality reduction can miss. TDA applies broadly: clustering via 0D persistence, finding loops in sensor networks, detecting periodicity in time series, and characterizing shapes in images or molecular structures. While general persistent homology can be computationally heavy, practical pipelines use efficient data structures (like union–find for 0D), limit the complex’s dimension, or use sparsification to keep computations tractable.

02Intuition & Analogies

Imagine placing identical, growing balls around each data point. At a very small radius, every ball is isolated—many separate islands. As you inflate the balls, nearby balls touch and islands merge into continents. Keep inflating and tunnels (loops) might form—think of two peninsulas connected by narrow bridges, creating a loop around a lake. Inflate more, and those tunnels fill in, leaving solid land with no holes. This movie of shapes across scales is what persistent homology records: when each island appears and when it merges, when each loop opens and when it gets filled. Another analogy is turning up the volume on a 3D scanner fog: initially you only see scattered specks (points), but as density increases, shapes appear, holes open, and eventually everything becomes a solid blob. TDA asks: which features are real structures and which are fleeting mirages? Long-lived features (those that persist over a wide range of scales) are usually meaningful; short-lived blips are often noise. For 0D features (connected components), you can picture Kruskal’s minimum spanning tree process: start with isolated points and add the shortest edges first, merging clusters as edges appear. Each merge represents the death of a component at the edge length where it connected to an older one. For signals on a line (like a time series), imagine slowly flooding a landscape from the lowest valley upwards: each basin (local minimum) starts a new pool, and as water rises, pools connect; the younger pool dies at the height where the connection happens. These stories—growing balls, fog density, and rising water—match the precise filtrations used in TDA and help explain why persistence captures multi-scale shape robustly.

03Formal Definition

Let (X, d) be a finite metric space (e.g., a point cloud with Euclidean distance). For \(

ε

\geq

0\), the Vietoris–Rips complex \(

VR_{ε}

(X)\) is the abstract simplicial complex whose k-simplices are subsets \(

σ

\subseteq

X\) with all pairwise distances \(d(

x_{i}

x_{j}

)

\leq

ε

\). As \(

ε

\) increases, we obtain a filtration \(

VR_{ε_{1}}

(X)

\subseteq

VR_{ε_{2}}

(X)

\subseteq

\dots

\). Given a simplicial complex K, its k-th chain group over a field \(

\Bbb

F\) is \(

C_{k}

(K;

\Bbb

F)\), with boundary operators \(

\partial_{k}

C_{k}

\to

C_{k - 1}

\) satisfying \(

\partial_{k}

\circ

\partial_{k + 1}

= 0\). The k-th homology group is \(

H_{k}

(K;

\Bbb

F) =

ker

(

\partial_{k}

) /

im

(

\partial_{k + 1}

)\), and the k-th Betti number is \(

β_{k}

dim

H_{k}

\). Persistent homology studies the induced maps on homology along a filtration: for indices \(i

\leq

j\), the linear maps \(

H_{k}

(

K_{i}

)

\to

H_{k}

(

K_{j}

)\). A feature’s birth is the smallest index where it appears in homology, and its death is the first index where it maps to zero (e.g., gets filled or merged). The collection of intervals $[b, d)$ is the barcode; equivalently, the persistence diagram is the multiset of points $(b, d)$ in the plane. For 0D persistence on a complete graph weighted by distances, Kruskal’s algorithm yields birth at 0 for each vertex and deaths at the edge weights when components merge. The Euler characteristic of a finite complex is \(

χ

\sum_{k \geq 0}

(-1)^k

f_{k}

\sum_{k \geq 0}

(-1)^k

β_{k}

\), where \(

f_{k}

\) is the number of k-simplices.

04When to Use

Use TDA when you care about the shape of data beyond local distances or averages, especially under noise and across scales. Typical scenarios include: (1) Point clouds in moderate-to-high dimensions where cluster structure and loops matter—e.g., analyzing sensor trajectories, word embeddings, or single-cell RNA-seq manifolds. Here, 0D persistence supports hierarchical clustering (single linkage), and 1D persistence reveals cycles such as periodic behavior or nontrivial topology in embeddings. (2) Time series or scalar fields on graphs/grids via lower-star or sublevel filtrations, where 0D persistence identifies basins and merge levels (useful for peak/valley detection and denoising). (3) Image or volumetric data through cubical complexes, to detect voids and tunnels in materials science or medical imaging. (4) Quick global summaries—Euler characteristic curves or transforms—to compare shapes or monitor topology across thresholds. Choose Vietoris–Rips for simplicity and ease on metric data; choose \v{C}ech or alpha complexes for geometric fidelity when you have Euclidean points and want better approximation guarantees. Limit the complex’s maximal dimension or sparsify the data (e.g., witness complexes) when scalability is a concern. Use union–find for fast 0D computations; resort to matrix reduction (boundary matrix algorithms) for k ≥ 1 when exact homology is required.

⚠️Common Mistakes

• Confusing radius versus distance thresholds: some definitions use balls of radius r (edges when distance ≤ 2r) while others use a direct distance cutoff ε. Mixing these yields off-by-2 errors in barcodes. Always state the convention you adopt. • Ignoring normalization of features: Euclidean distance on unscaled coordinates can drown meaningful structure; standardize or choose an appropriate metric. • Building too large a complex: naive Vietoris–Rips scales combinatorially (edges O(n^2), triangles O(n^3), higher simplices explode). Cap the dimension and/or sparsify the dataset. • Misinterpreting short bars: not every short interval is noise; context matters. Conversely, long bars in very sparse data may reflect sampling artifacts. • Overlooking field dependence: Betti numbers can depend on the coefficient field (e.g., (\Bbb Z_2) vs. (\Bbb Q)). Be explicit about coefficients. • Assuming (\beta_1) of a graph equals (\beta_1) of the Rips complex: triangles in the complex fill cycles. If you only use the 1-skeleton, you may overcount loops. • Poor tie-breaking and numeric precision: equal distances or nearly equal function values require deterministic rules to avoid unstable pairings; use stable sorting and tolerances. • Treating infinite bars as finite: surviving features (e.g., the single global component) should be recorded with death at (+\infty) or a clear sentinel, not capped arbitrarily without noting it.

Key Formulas

Vietoris–Rips Complex

VR_{ε} (X) = {σ \subseteq X : d (x_{i}, x_{j}) \leq ε \forall x_{i}, x_{j} \in σ}

Explanation: A simplex is included when all its vertices are pairwise within distance $ε .$ Increasing ε yields a filtration capturing shape at multiple scales.

Čech Complex

\overset{ˇ}{C}_{r} (X) = {σ \subseteq X : x \in σ ⋂ B (x, r) \neq = \emptyset}

Explanation: A simplex is included if radius-r balls around its vertices have a common intersection. In Euclidean space, the Čech complex recovers the union-of-balls topology.

Homology Group

H_{k} (K; F) = ker (\partial_{k}) \/ im (\partial_{k + 1})

Explanation: The k-th homology is the space of k-cycles modulo k-boundaries. Its dimension gives the number of independent k-dimensional holes.

Betti Number

β_{k} = dim H_{k} (K; F)

Explanation: The Betti number is the rank (dimension) of the k-th homology group. It counts independent features like components ( $β 0),$ loops ( $β 1),$ and voids ( $β 2) .$

Euler Characteristic

χ (K) = k \geq 0 \sum (- 1)^{k} f_{k} = k \geq 0 \sum (- 1)^{k} β_{k}

Explanation: The Euler characteristic equals the alternating sum of simplex counts or Betti numbers. It provides a compact summary linking combinatorics and topology.

Filtration

K_{1} \subseteq K_{2} \subseteq \dots \subseteq K_{m}

Explanation: A nested sequence of complexes indexed by a parameter (e.g., $ε) .$ Persistent homology studies how homology changes along this chain.

Persistence Diagram

D = {(b_{i}, d_{i})}_{i = 1}^{N}

Explanation: A multiset of birth–death pairs encoding the lifetime of topological features. Points far from the diagonal indicate persistent, likely meaningful features.

Stability Theorem (Bottleneck Distance)

d_{B} (D_{f}, D_{g}) \leq ∥ f - g ∥_{\infty}

Explanation: The bottleneck distance between diagrams from functions f and g is bounded by the sup norm of their difference. Persistence is robust to small perturbations.

Cyclomatic Number (Graph Only)

β_{1} (G) = ∣ E ∣ - ∣ V ∣ + c

Explanation: For a pure graph (1-skeleton), the first Betti number equals edges minus vertices plus number of connected components. In a Rips complex, filled triangles reduce $β 1.$

Lower-Star Filtration Time

t (u) = f (u), t (uv) = max {f (u), f (v)}

Explanation: Vertices enter at their function values; edges at the max over endpoints. This underlies 0D persistent homology on graphs and time series.

Rank–Nullity Form of Betti

β_{k} = dim ker (\partial_{k}) - dim im (\partial_{k + 1})

Explanation: By linear algebra, the Betti number equals cycles minus boundaries. This connects homology computation to matrix ranks.

Boundary Matrix Reduction Cost

T (m) = O (m^{3}) in worst case

Explanation: Computing full persistent homology with matrix reduction can be cubic in the number of simplices m in the worst case, motivating sparsification and truncation.

Euler Characteristic of 2-Skeleton

χ (ε) = f_{0} - f_{1} (ε) + f_{2} (ε)

Explanation: For a Rips 2-skeleton at scale $ε,$ χ equals vertices minus edges plus triangles. As ε grows, tracking χ( $ε)$ yields an Euler characteristic curve.

0D Persistence–MST Connection

0D deaths = {edge weights added by Kruskal’s MST}

Explanation: In single-linkage clustering, each merge corresponds to an MST edge. Thus 0D death times are exactly the MST edge lengths (under distance filtration).

Complexity Analysis

Building Vietoris–Rips complexes naively is combinatorial: for n points, there are O(

n^{2}

) potential edges, O(

n^{3}

) potential triangles, and in general O(

n^{k + 1}

) k-simplices. For 0D persistence on a complete graph weighted by pairwise distances, we avoid enumerating higher simplices and use Kruskal’s algorithm with a Union–Find (DSU) structure. Constructing all edges costs O(

n^{2}

) time and space; sorting edges costs O(

n^{2}

lo g

n). DSU unions and finds are effectively constant-time amortized, O(

α

(n)) per operation, where \(

α

\) is the inverse Ackermann function. Thus, 0D persistence via MST runs in O(

n^{2}

lo g

n) time and O(

n^{2}

) space dominated by edges. For time-series 0D persistence using a lower-star filtration on a path graph of length n, we sort vertices by value in O(n

lo g

n) and perform at most n unions, for an overall O(n

lo g

n) time (or O(n) if values are pre-ordered) and O(n) space. Computing Euler characteristic curves using the 2-skeleton requires counting edges and triangles at each scale

ε .

A naive approach per ε takes O(

n^{2}

) for edges and O(

n^{3}

) for triangles, so with S scales the total is O(S

n^{3}

); this is practical only for small n or small S. In contrast, exact higher-dimensional persistent homology via boundary matrix reduction has worst-case O(

m^{3}

) time and O(

m^{2}

) space in terms of the number of simplices m, which can itself be exponential in n. Therefore, practical pipelines cap the maximal simplex dimension (e.g., up to 2), sparsify data

\frac{l an d ma r k s}{w i t n esses}

, or use approximate neighbors (k-NN graphs) to reduce edge counts, leading to significant savings at the cost of approximation.

Code Examples

0D Persistent Homology (Vietoris–Rips on a Point Cloud) via Union–Find

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct DSU {
5     vector<int> parent, rnk;
6     vector<double> birth; // birth time of component at the root
7     DSU(int n, double initBirth = 0.0) : parent(n), rnk(n, 0), birth(n, initBirth) {
8         iota(parent.begin(), parent.end(), 0);
9     }
10     int find(int x) {
11         if (parent[x] == x) return x;
12         parent[x] = find(parent[x]);
13         return parent[x];
14     }
15     // Union by rank; return pair (survivor_root, dead_root)
16     pair<int,int> unite(int a, int b) {
17         a = find(a); b = find(b);
18         if (a == b) return {a, -1};
19         // Older component survives: smaller birth time; tie-break by id for determinism
20         if (birth[a] > birth[b] || (birth[a] == birth[b] && a > b)) swap(a, b);
21         if (rnk[a] < rnk[b]) swap(a, b);
22         parent[b] = a;
23         if (rnk[a] == rnk[b]) rnk[a]++;
24         // survivor a, dead b
25         return {a, b};
26     }
27 };
28 
29 struct Point { vector<double> x; };
30 
31 double dist(const Point &a, const Point &b) {
32     double s = 0.0; size_t d = a.x.size();
33     for (size_t i = 0; i < d; ++i) {
34         double t = a.x[i] - b.x[i]; s += t * t;
35     }
36     return sqrt(s);
37 }
38 
39 struct Edge { int u, v; double w; };
40 
41 int main() {
42     ios::sync_with_stdio(false);
43     cin.tie(nullptr);
44 
45     // Example: read n d, then points; or generate a simple dataset if none provided
46     int n = 8, d = 2;
47     vector<Point> pts(n);
48     // Two clusters for demonstration
49     vector<pair<double,double>> seeds = {{0,0},{0.1,0.1},{-0.1,0.05},{0.05,-0.05},{3,3},{3.1,3.05},{2.9,3.1},{3.05,2.95}};
50     for (int i = 0; i < n; ++i) pts[i].x = {seeds[i].first, seeds[i].second};
51 
52     // Build complete graph edges with Euclidean distances
53     vector<Edge> edges;
54     edges.reserve((size_t)n * (n - 1) / 2);
55     for (int i = 0; i < n; ++i) {
56         for (int j = i + 1; j < n; ++j) {
57             edges.push_back({i, j, dist(pts[i], pts[j])});
58         }
59     }
60     sort(edges.begin(), edges.end(), [](const Edge &a, const Edge &b){ return a.w < b.w; });
61 
62     // 0D persistent homology: all births at 0 (distance-based Rips convention)
63     DSU dsu(n, 0.0);
64     vector<pair<double,double>> intervals; // (birth, death). use +inf for survivors
65 
66     for (const auto &e : edges) {
67         auto [survivor, dead] = dsu.unite(e.u, e.v);
68         if (dead != -1) {
69             // The younger component (dead) dies at scale = edge length (distance threshold ε)
70             double b = dsu.birth[dead]; // here 0.0, but kept for clarity/generalization
71             intervals.push_back({b, e.w});
72         }
73     }
74     // Remaining components persist to infinity
75     double INF = numeric_limits<double>::infinity();
76     vector<int> seen(n, 0);
77     for (int i = 0; i < n; ++i) {
78         int r = dsu.find(i);
79         if (!seen[r]) { seen[r] = 1; intervals.push_back({dsu.birth[r], INF}); }
80     }
81 
82     // Output barcode (birth death)
83     cout.setf(std::ios::fixed); cout << setprecision(6);
84     cout << "0D barcode intervals (birth death), ε uses distance threshold convention:\n";
85     for (auto &bd : intervals) {
86         cout << bd.first << ' ' << (isinf(bd.second) ? 1e9 : bd.second) << "\n"; // print large number for +inf
87     }
88 
89     return 0;
90 }
91

Build a complete weighted graph on the point cloud using Euclidean distances and process edges in ascending order. Using DSU, each edge that connects two different components kills the younger one at the edge weight, producing a 0D persistence interval [0, weight). Survivors are recorded with death at +∞. This matches single-linkage clustering and the MST-based characterization of 0D persistence under a distance-threshold Rips filtration.

Time: O(n^2 log n)Space: O(n^2)

Euler Characteristic Curve of a Vietoris–Rips 2-Skeleton

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct Point { vector<double> x; };
5 
6 double dist(const Point &a, const Point &b) {
7     double s = 0.0; for (size_t i = 0; i < a.x.size(); ++i) { double t = a.x[i] - b.x[i]; s += t * t; }
8     return sqrt(s);
9 }
10 
11 struct DSU { vector<int> p, r; DSU(int n): p(n), r(n,0){ iota(p.begin(), p.end(), 0);} int f(int x){ return p[x]==x?x:p[x]=f(p[x]); } void u(int a,int b){ a=f(a); b=f(b); if(a==b) return; if(r[a]<r[b]) swap(a,b); p[b]=a; if(r[a]==r[b]) r[a]++; } };
12 
13 int main(){
14     ios::sync_with_stdio(false);
15     cin.tie(nullptr);
16 
17     // Generate noisy circle points
18     int n = 60; int d = 2; double R = 1.0; double noise = 0.03;
19     std::mt19937 rng(42); std::normal_distribution<double> N(0.0, noise);
20     vector<Point> P(n); for(int i=0;i<n;++i){ double t = 2*M_PI*i/n; double cx = R*cos(t)+N(rng); double cy = R*sin(t)+N(rng); P[i].x = {cx, cy}; }
21 
22     // Precompute pairwise distances (symmetric matrix)
23     vector<vector<double>> D(n, vector<double>(n, 0.0)); double dmax = 0.0;
24     for(int i=0;i<n;++i){ for(int j=i+1;j<n;++j){ double dij = dist(P[i], P[j]); D[i][j]=D[j][i]=dij; dmax = max(dmax, dij); } }
25 
26     // Sample S scales between 0 and a fraction of dmax
27     int S = 20; double eps_max = 0.6 * dmax; // enough to see loops and fillings on the circle
28     cout.setf(std::ios::fixed); cout << setprecision(6);
29     cout << "# eps V E T chi beta0 est_beta1\n";
30     for(int s=0; s<=S; ++s){
31         double eps = eps_max * s / S;
32         // Count edges and triangles in Rips 2-skeleton at scale eps
33         int V = n; long long E = 0; long long T = 0;
34         vector<vector<char>> A(n, vector<char>(n, 0));
35         for(int i=0;i<n;++i){ for(int j=i+1;j<n;++j){ if(D[i][j] <= eps){ A[i][j]=A[j][i]=1; ++E; } } }
36         // Triangle count: i<j<k with all edges present
37         for(int i=0;i<n;++i){ for(int j=i+1;j<n;++j){ if(!A[i][j]) continue; for(int k=j+1;k<n;++k){ if(A[i][k] && A[j][k]) ++T; } } }
38         long long chi = (long long)V - E + T;
39 
40         // Also compute beta0 via DSU on edges at this eps
41         DSU dsu(n);
42         for(int i=0;i<n;++i){ for(int j=i+1;j<n;++j){ if(D[i][j] <= eps) dsu.u(i,j); } }
43         unordered_set<int> roots; roots.reserve(n*2);
44         for(int i=0;i<n;++i) roots.insert(dsu.f(i));
45         long long beta0 = (long long)roots.size();
46         // If β2≈0 (often true for planar samples and moderate eps), estimate β1 = β0 − χ
47         long long est_beta1 = beta0 - chi;
48 
49         cout << eps << ' ' << V << ' ' << E << ' ' << T << ' ' << chi << ' ' << beta0 << ' ' << est_beta1 << "\n";
50     }
51 
52     return 0;
53 }
54

This program builds the Vietoris–Rips 2-skeleton (vertices, edges, triangles) of a noisy circle at multiple ε values. For each ε, it counts edges (pairs within ε) and triangles (3-cliques) to compute the Euler characteristic χ = V − E + T. It also computes β0 using DSU on the ε-graph. When β2 is negligible (typical for planar shapes and moderate ε), an estimate of β1 is β0 − χ. This gives an Euler characteristic curve and a rough loop count without full matrix reduction. It is cubic in n per ε, so intended for small datasets.

Time: O(S n^3)Space: O(n^2)

0D Persistence of a 1D Signal via Lower-Star Filtration

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 struct DSU {
5     vector<int> parent, rnk; 
6     vector<double> comp_birth; // birth time for component roots
7     vector<char> active; // whether vertex is active (entered filtration)
8     DSU(int n): parent(n), rnk(n,0), comp_birth(n, 0.0), active(n, 0) {
9         iota(parent.begin(), parent.end(), 0);
10     }
11     int find(int x){ return parent[x]==x?x:parent[x]=find(parent[x]); }
12     // activate a vertex at its birth time
13     void activate(int v, double b){ active[v]=1; comp_birth[v]=b; parent[v]=v; }
14     // merge components; return dead root (or -1 if none). Survivor is older (smaller birth), tie by index.
15     int unite(int a, int b){
16         a = find(a); b = find(b); if (a==b) return -1;
17         // older survives
18         if (comp_birth[a] > comp_birth[b] || (comp_birth[a]==comp_birth[b] && a>b)) swap(a,b);
19         if (rnk[a] < rnk[b]) swap(a,b);
20         parent[b]=a; if (rnk[a]==rnk[b]) rnk[a]++;
21         return b; // dead
22     }
23 };
24 
25 int main(){
26     ios::sync_with_stdio(false);
27     cin.tie(nullptr);
28 
29     // Example 1D signal: mixture of sinusoids with noise
30     int n = 64; vector<double> f(n);
31     std::mt19937 rng(0); std::normal_distribution<double> N(0.0, 0.05);
32     for(int i=0;i<n;++i){ double t = (2*M_PI*i)/n; f[i] = sin(2*t) + 0.4*sin(5*t) + N(rng); }
33 
34     // Lower-star 0D persistence on a path graph: vertices i, edges (i, i+1)
35     vector<int> order(n); iota(order.begin(), order.end(), 0);
36     stable_sort(order.begin(), order.end(), [&](int a, int b){ if (f[a]==f[b]) return a<b; return f[a] < f[b]; });
37 
38     DSU dsu(n);
39     vector<pair<double,double>> intervals; // (birth, death), survivors to +inf
40 
41     vector<char> is_active(n, 0);
42     for(int idx : order){
43         double birth = f[idx];
44         dsu.activate(idx, birth);
45         is_active[idx] = 1;
46         // Neighbors in path graph
47         vector<int> nbrs; if (idx-1>=0 && is_active[idx-1]) nbrs.push_back(idx-1); if (idx+1<n && is_active[idx+1]) nbrs.push_back(idx+1);
48         // When idx activates, edges to active neighbors appear at time max(f(nei), f(idx)) = f(idx)
49         // Merge all neighboring components; each merge kills the younger one at current time f[idx]
50         for(int nb : nbrs){
51             int dead = dsu.unite(idx, nb);
52             if (dead != -1){
53                 double b = dsu.comp_birth[dead];
54                 intervals.push_back({b, birth}); // younger component dies now
55             }
56         }
57     }
58 
59     // Survivors persist to +inf
60     double INF = numeric_limits<double>::infinity();
61     vector<int> seen(n, 0);
62     for(int i=0;i<n;++i){ if (!is_active[i]) continue; int r = dsu.find(i); if(!seen[r]){ seen[r]=1; intervals.push_back({dsu.comp_birth[r], INF}); } }
63 
64     // Output intervals
65     cout.setf(std::ios::fixed); cout << setprecision(6);
66     cout << "Lower-star 0D barcode (birth death); death=+inf printed as large number:\n";
67     for(auto &bd : intervals){ cout << bd.first << ' ' << (isinf(bd.second)? 1e9 : bd.second) << "\n"; }
68 
69     return 0;
70 }
71

Implements 0D persistent homology on a 1D path graph under the lower-star filtration of a scalar signal. Vertices are added from low to high function values; when a vertex activates, edges to already-active neighbors enter immediately. Each edge merge kills the younger component, producing an interval [birth_younger, current_value). Surviving basins persist to +∞. This detects minima and their merge levels, a useful summary for denoising and peak/valley analysis.

Time: O(n log n)Space: O(n)

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	struct DSU {
5	vector<int> parent, rnk;
6	vector<double> birth; // birth time of component at the root
7	DSU(int n, double initBirth = 0.0) : parent(n), rnk(n, 0), birth(n, initBirth) {
8	iota(parent.begin(), parent.end(), 0);
9	}
10	int find(int x) {
11	if (parent[x] == x) return x;
12	parent[x] = find(parent[x]);
13	return parent[x];
14	}
15	// Union by rank; return pair (survivor_root, dead_root)
16	pair<int,int> unite(int a, int b) {
17	a = find(a); b = find(b);
18	if (a == b) return {a, -1};
19	// Older component survives: smaller birth time; tie-break by id for determinism
20	if (birth[a] > birth[b] \|\| (birth[a] == birth[b] && a > b)) swap(a, b);
21	if (rnk[a] < rnk[b]) swap(a, b);
22	parent[b] = a;
23	if (rnk[a] == rnk[b]) rnk[a]++;
24	// survivor a, dead b
25	return {a, b};
26	}
27	};
28
29	struct Point { vector<double> x; };
30
31	double dist(const Point &a, const Point &b) {
32	double s = 0.0; size_t d = a.x.size();
33	for (size_t i = 0; i < d; ++i) {
34	double t = a.x[i] - b.x[i]; s += t * t;
35	}
36	return sqrt(s);
37	}
38
39	struct Edge { int u, v; double w; };
40
41	int main() {
42	ios::sync_with_stdio(false);
43	cin.tie(nullptr);
44
45	// Example: read n d, then points; or generate a simple dataset if none provided
46	int n = 8, d = 2;
47	vector<Point> pts(n);
48	// Two clusters for demonstration
49	vector<pair<double,double>> seeds = {{0,0},{0.1,0.1},{-0.1,0.05},{0.05,-0.05},{3,3},{3.1,3.05},{2.9,3.1},{3.05,2.95}};
50	for (int i = 0; i < n; ++i) pts[i].x = {seeds[i].first, seeds[i].second};
51
52	// Build complete graph edges with Euclidean distances
53	vector<Edge> edges;
54	edges.reserve((size_t)n * (n - 1) / 2);
55	for (int i = 0; i < n; ++i) {
56	for (int j = i + 1; j < n; ++j) {
57	edges.push_back({i, j, dist(pts[i], pts[j])});
58	}
59	}
60	sort(edges.begin(), edges.end(), [](const Edge &a, const Edge &b){ return a.w < b.w; });
61
62	// 0D persistent homology: all births at 0 (distance-based Rips convention)
63	DSU dsu(n, 0.0);
64	vector<pair<double,double>> intervals; // (birth, death). use +inf for survivors
65
66	for (const auto &e : edges) {
67	auto [survivor, dead] = dsu.unite(e.u, e.v);
68	if (dead != -1) {
69	// The younger component (dead) dies at scale = edge length (distance threshold ε)
70	double b = dsu.birth[dead]; // here 0.0, but kept for clarity/generalization
71	intervals.push_back({b, e.w});
72	}
73	}
74	// Remaining components persist to infinity
75	double INF = numeric_limits<double>::infinity();
76	vector<int> seen(n, 0);
77	for (int i = 0; i < n; ++i) {
78	int r = dsu.find(i);
79	if (!seen[r]) { seen[r] = 1; intervals.push_back({dsu.birth[r], INF}); }
80	}
81
82	// Output barcode (birth death)
83	cout.setf(std::ios::fixed); cout << setprecision(6);
84	cout << "0D barcode intervals (birth death), ε uses distance threshold convention:\n";
85	for (auto &bd : intervals) {
86	cout << bd.first << ' ' << (isinf(bd.second) ? 1e9 : bd.second) << "\n"; // print large number for +inf
87	}
88
89	return 0;
90	}
91