⚙️ Algorithm · Intermediate

Principal Component Analysis (PCA)

Key Points

  • Principal Component Analysis (PCA) finds new orthogonal axes (principal components) that capture the maximum variance in your data.
  • You must first center the data by subtracting the mean of each feature; otherwise PCA gives wrong directions.
  • Computationally, PCA is an eigendecomposition of the covariance matrix or, equivalently, an SVD of the centered data matrix.
  • Projecting data onto the top k principal components reduces dimensionality while preserving most variance and often denoises.
  • Explained variance ratios from eigenvalues help you choose k to meet a variance retention target (e.g., 95%).
  • For large datasets, avoid forming the covariance explicitly; use power iteration with matrix–vector products via X_c and X_c^T.
  • Store the fitted mean and components to transform future data consistently and to reconstruct approximations if needed.
  • PCA is linear; for curved manifolds or nonlinear patterns consider kernel PCA or nonlinear methods.

Prerequisites

  • Vectors and matrices — PCA manipulates matrices, performs matrix–vector products, and relies on orthogonality and norms.
  • Variance and covariance — Understanding covariance clarifies why PCA maximizes variance along components.
  • Eigenvalues and eigenvectors — PCA is defined by the eigendecomposition of the covariance matrix.
  • Iterative eigenmethods (power iteration) — Efficient top-k PCA often uses power/orthogonal iteration instead of full eigendecomposition.
  • Feature scaling and standardization — When features have different units or scales, standardization affects PCA results.

Detailed Explanation


01 Overview

Principal Component Analysis (PCA) is a linear technique for reducing the dimensionality of a dataset while preserving as much variability (information) as possible. It re-expresses the data in terms of new, orthogonal axes called principal components. Each component is a direction in feature space along which the data varies the most, with the first component capturing the maximum variance, the second capturing the next largest variance subject to being orthogonal to the first, and so on. Practically, PCA helps compress data, visualize high-dimensional patterns, remove noise, and decorrelate features.

Mathematically, PCA is performed by computing the covariance matrix of mean-centered data and then finding its eigenvalues and eigenvectors. The eigenvectors define the principal directions, and the eigenvalues quantify how much variance each direction captures. Alternatively, PCA can be obtained via the Singular Value Decomposition (SVD) of the centered data matrix, which is often numerically robust.

After fitting PCA, we can project original samples onto the top k components to obtain a lower-dimensional representation, and we can approximately reconstruct original data by reversing the projection. Choosing k usually involves examining cumulative explained variance. Because PCA is linear and sensitive to scale and outliers, thoughtful preprocessing (centering, optionally standardization) is essential.

02 Intuition & Analogies

Imagine a cloud of points representing your data as a flock of birds in 3D space. If you want to understand the flock’s main flight direction, you’d look for the longest axis of the cloud. That is the first principal component: the direction where the flock spreads out the most. Next, you’d look for the second longest axis, as long as it’s perpendicular to the first. Together, these axes describe the main shape of the cloud. If someone told you to summarize the flock with a simpler description—say, a 2D shadow—you would choose the best viewing plane to capture as much of the flock’s shape as possible. Projecting onto the first two principal components is exactly picking that best plane. In photography, this is like finding the angle where the object appears most detailed; in audio, it’s like separating loud, informative tones from faint background noise. PCA automates this by calculating the directions of maximum spread (variance) in your data. Each principal component is a recipe (a weighted mixture) of your original features that points along these informative directions. By keeping only the top few components, you keep the essence of the data while discarding directions that look mostly like noise. That’s why PCA is great for compression and denoising. But because it is a straight-line (linear) summary, it can miss curved patterns—like trying to approximate a spiral with straight rulers.

03 Formal Definition

Given a data matrix X ∈ R^{n×d} with rows x_i^T (samples) and columns as features, define the mean-centered matrix X_c = X − 1μ^T, where μ = (1/n) ∑_{i=1}^{n} x_i and 1 is an n-dimensional vector of ones. The sample covariance matrix is C = (1/(n−1)) X_c^T X_c ∈ R^{d×d}. PCA seeks an orthonormal matrix V = [v_1, …, v_d] such that C v_i = λ_i v_i with eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d ≥ 0 and V^T V = I. The principal components are the eigenvectors v_i; their variances are the eigenvalues λ_i. For dimensionality reduction to k dimensions, form V_k = [v_1, …, v_k] and compute the low-dimensional representation (scores) Y = X_c V_k ∈ R^{n×k}. Approximate reconstruction is X̂ = Y V_k^T + 1μ^T. Equivalently, if X_c = U Σ V^T is the SVD with singular values σ_i, then the principal directions are the right singular vectors V, and λ_i = σ_i² / (n−1). The explained variance ratio for component i is r_i = λ_i / ∑_{j=1}^{d} λ_j, and the cumulative explained variance for the top k components is R_k = ∑_{i=1}^{k} r_i.

04 When to Use

Use PCA when you need to reduce dimensionality while retaining most variance: for visualization (projecting to 2D/3D), compressing features before downstream tasks (e.g., clustering, regression), denoising by discarding low-variance directions, and decorrelating features to simplify models. It is well-suited when relationships are approximately linear, noise is roughly isotropic, and features are commensurate (or standardized if on different scales). PCA is also helpful when d is large and you want to mitigate the curse of dimensionality by capturing structure with far fewer components. Consider PCA before k-means clustering to improve cluster separation, or before linear regression to reduce multicollinearity (principal component regression). Use SVD-based approaches or covariance-free matrix–vector products for large n or d to avoid materializing huge covariance matrices. Do not rely on PCA when patterns are strongly nonlinear, when interpretability in original feature space is critical, or when outliers dominate variance. For heterogeneous units (e.g., kilograms vs meters), standardize features first. In supervised settings, PCA ignores labels; confirm that variance correlates with predictive signal or prefer supervised dimensionality reduction.

⚠️Common Mistakes

  • Skipping centering: Not subtracting feature means causes the first component to align with the mean offset instead of true variation. Always center; standardize if units differ.
  • Forgetting to sort eigenpairs: Some libraries return unsorted eigenvalues. Ensure components are ordered by decreasing eigenvalue.
  • Mixing up shapes: Confusing whether samples are rows or columns leads to wrong covariance. In our convention, samples are rows and features are columns.
  • Data leakage: Fitting PCA (means, components) on the full dataset before splitting leaks test information. Fit on training data; apply to validation/test with the stored mean and components.
  • Keeping too many/few components: Arbitrary k may underfit or include mostly noise. Use explained variance plots or thresholds (e.g., 95%).
  • Numerical instability: Explicit covariance of very high-dimensional or ill-conditioned data can be unstable. Prefer SVD or covariance-free multiplies X_c^T(X_c v).
  • Outliers: PCA is variance-driven and can chase outliers. Consider robust scaling or outlier removal.
  • Misinterpretation: Loadings (eigenvectors) are linear combinations of features; large coefficients do not always imply causality.

Key Formulas

Feature mean

μ = (1/n) ∑_{i=1}^{n} x_i

Explanation: The mean vector averages each feature across all samples. Subtracting this vector centers the data for PCA.

Mean-centering

X_c = X − 1μ^T

Explanation: Center the data matrix by subtracting the feature means from every row. PCA requires centered data to measure variance about zero.

Sample covariance

C = (1/(n−1)) X_c^T X_c

Explanation: This symmetric matrix captures pairwise covariances between features. PCA performs an eigendecomposition of C.

Eigendecomposition

C = V Λ V^T,  V^T V = I,  Λ = diag(λ_1, …, λ_d)

Explanation: The covariance decomposes into orthonormal eigenvectors and nonnegative eigenvalues. Eigenvectors define principal directions; eigenvalues give variances along them.

SVD relation

X_c = U Σ V^T,  C = (1/(n−1)) V Σ² V^T

Explanation: The right singular vectors of centered data equal PCA loadings. Squared singular values (scaled by 1/(n−1)) are the eigenvalues of the covariance.

Projection (scores)

Y = X_c V_k

Explanation: Multiplying centered data by the top k eigenvectors yields k-dimensional coordinates that preserve maximal variance.

Reconstruction

X̂ = Y V_k^T + 1μ^T

Explanation: Map low-dimensional scores back to the original feature space using the top components, then add back the mean.

Explained variance

r_i = λ_i / ∑_{j=1}^{d} λ_j,  R_k = ∑_{i=1}^{k} λ_i / ∑_{j=1}^{d} λ_j

Explanation: r_i is the fraction of variance captured by component i; R_k is the cumulative fraction for the first k components. Use these to choose k.

Optimal reconstruction error

(1/(n−1)) ‖X − X̂‖_F² = ∑_{i=k+1}^{d} λ_i

Explanation: The squared Frobenius error of the best rank-k PCA reconstruction, scaled by 1/(n−1) to match the sample covariance, equals the sum of the discarded eigenvalues. PCA is the optimal linear rank-k approximation.

Power iteration

v^{(t+1)} = C v^{(t)} / ‖C v^{(t)}‖_2

Explanation: Iteratively multiplying by C and normalizing converges to the dominant eigenvector. Orthogonalize to obtain subsequent components.

Whitening transform

Z = Y Λ_k^{−1/2}

Explanation: Scaling scores by inverse square roots of eigenvalues yields features with approximately identity covariance (unit variance and zero covariance).

Dual (n ≪ d) trick

G = X_c X_c^T,  V = X_c^T U Σ^{−1}

Explanation: When features far outnumber samples, compute eigenpairs of the sample Gram matrix G. Map left singular vectors back to feature space to obtain principal directions.

Complexity Analysis

Let n be the number of samples and d the number of features. Centering the data costs O(nd) time and O(d) extra space for the mean. Explicitly forming the covariance matrix C = (1/(n−1)) X_c^T X_c costs O(n d²) time and O(d²) space, which becomes prohibitive when d is large. If you then compute a full eigendecomposition of C, the time is O(d³) and the space is O(d²).

However, many applications only need the top k components (k ≪ d). In that case, iterative methods like power iteration or orthogonal iteration avoid full eigendecomposition. Each power iteration step requires a matrix–vector product with C. If C is explicitly formed, this is O(d²) per iteration. A better approach is to avoid C and compute C v via X_c^T(X_c v)/(n−1), which costs O(nd) per iteration and uses only O(nd) space to store the centered data and O(dk) to store k components. With t iterations per component (typically tens to a few hundred, depending on the eigengap), the total time is roughly O(nd) for centering plus O(k t n d) for the iterations, with O(nd + dk) space. Projecting all samples onto k components costs O(ndk), and reconstruction costs O(ndk) as well.

If n ≪ d, the dual approach via the Gram matrix G = X_c X_c^T (size n×n) reduces the eigendecomposition to O(n³), and mapping back to feature space costs O(ndk). SVD-based PCA on X_c takes O(n d min(n, d)) time and O(nd) space, often giving the best numerical stability. Overall, the choice between explicit covariance, covariance-free multiplies, dual methods, and SVD depends on the relative sizes of n and d and the required number of components k.

Code Examples

PCA via power iteration without forming the covariance matrix (orthogonal iteration)
#include <bits/stdc++.h>
using namespace std;

// Helper to index row-major matrices stored in a flat vector
inline size_t idx(size_t r, size_t c, size_t ncols) { return r * ncols + c; }

// Compute column means (size d) and center X in place (n x d)
vector<double> center_columns(vector<double>& X, size_t n, size_t d) {
    vector<double> mu(d, 0.0);
    for (size_t j = 0; j < d; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += X[idx(i, j, d)];
        mu[j] = s / static_cast<double>(n);
        for (size_t i = 0; i < n; ++i) X[idx(i, j, d)] -= mu[j];
    }
    return mu;
}

// y = X * v where X is (n x d), v is (d)
void matVec_X_v(const vector<double>& X, const vector<double>& v, vector<double>& y, size_t n, size_t d) {
    fill(y.begin(), y.end(), 0.0);
    for (size_t i = 0; i < n; ++i) {
        double s = 0.0;
        const double* row = &X[idx(i, 0, d)];
        for (size_t j = 0; j < d; ++j) s += row[j] * v[j];
        y[i] = s;
    }
}

// z = X^T * u where X is (n x d), u is (n)
void matTVec_XT_u(const vector<double>& X, const vector<double>& u, vector<double>& z, size_t n, size_t d) {
    fill(z.begin(), z.end(), 0.0);
    for (size_t j = 0; j < d; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i) s += X[idx(i, j, d)] * u[i];
        z[j] = s;
    }
}

// w = (1/(n-1)) * X^T * (X * v) (covariance times vector) without forming C
void covMatVec_via_data(const vector<double>& Xc, const vector<double>& v, vector<double>& w, size_t n, size_t d) {
    static thread_local vector<double> tmp; // reusable buffer for Xc * v
    tmp.assign(n, 0.0);
    matVec_X_v(Xc, v, tmp, n, d);   // tmp = Xc * v (n)
    matTVec_XT_u(Xc, tmp, w, n, d); // w = Xc^T * tmp (d)
    double scale = 1.0 / static_cast<double>(n > 1 ? (n - 1) : 1);
    for (size_t j = 0; j < d; ++j) w[j] *= scale;
}

// Dot product and L2 norm
inline double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
inline double norm2(const vector<double>& a) { return sqrt(max(0.0, dot(a, a))); }

// Orthogonalize v against the first m columns of W (d x k, row-major)
// using modified Gram-Schmidt. Note: the column stride is k, not m.
void orthogonalize(vector<double>& v, const vector<double>& W, size_t d, size_t m, size_t k) {
    for (size_t c = 0; c < m; ++c) {
        double proj = 0.0;
        for (size_t j = 0; j < d; ++j) proj += v[j] * W[idx(j, c, k)];
        for (size_t j = 0; j < d; ++j) v[j] -= proj * W[idx(j, c, k)];
    }
}

// Power iteration with orthogonalization to find top-k eigenvectors of covariance
struct PCAResult { vector<double> mean; vector<double> components; vector<double> eigenvalues; size_t n, d, k; };

PCAResult pca_power_iteration(const vector<double>& X_in, size_t n, size_t d, size_t k,
                              size_t max_iter = 500, double tol = 1e-6, uint64_t seed = 42) {
    vector<double> X = X_in; // copy to center in place
    vector<double> mean = center_columns(X, n, d);

    // Storage for components (d x k, row-major) and eigenvalues
    vector<double> W(d * k, 0.0);
    vector<double> lambdas(k, 0.0);

    mt19937_64 rng(seed);
    normal_distribution<double> dist(0.0, 1.0);

    vector<double> v(d), w(d);

    for (size_t comp = 0; comp < k; ++comp) {
        // Random initialization
        for (size_t j = 0; j < d; ++j) v[j] = dist(rng);
        // Orthogonalize against previous components
        if (comp > 0) orthogonalize(v, W, d, comp, k);
        double nv = norm2(v);
        if (nv > 0) for (size_t j = 0; j < d; ++j) v[j] /= nv;

        double prev_val = 0.0;
        for (size_t it = 0; it < max_iter; ++it) {
            // Multiply by covariance via data and orthogonalize
            covMatVec_via_data(X, v, w, n, d);
            if (comp > 0) orthogonalize(w, W, d, comp, k);
            double nw = norm2(w);
            if (nw == 0.0) break; // degenerate
            for (size_t j = 0; j < d; ++j) v[j] = w[j] / nw;
            // Rayleigh quotient as eigenvalue estimate
            covMatVec_via_data(X, v, w, n, d);
            double val = dot(v, w);
            if (fabs(val - prev_val) < tol * max(1.0, fabs(prev_val))) { prev_val = val; break; }
            prev_val = val;
        }
        // Save component v into W(:, comp)
        for (size_t j = 0; j < d; ++j) W[idx(j, comp, k)] = v[j];
        // Final eigenvalue estimate
        covMatVec_via_data(X, v, w, n, d);
        lambdas[comp] = dot(v, w);
    }

    return {mean, W, lambdas, n, d, k};
}

// Project data onto components: Y = Xc * W_k (n x k)
vector<double> project_scores(const vector<double>& X, const vector<double>& mean,
                              const vector<double>& W, size_t n, size_t d, size_t k) {
    vector<double> Y(n * k, 0.0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t c = 0; c < k; ++c) {
            double s = 0.0;
            for (size_t j = 0; j < d; ++j) s += (X[idx(i, j, d)] - mean[j]) * W[idx(j, c, k)];
            Y[idx(i, c, k)] = s;
        }
    }
    return Y;
}

// Reconstruct: X_hat = Y * W^T + 1 * mean^T
vector<double> reconstruct(const vector<double>& Y, const vector<double>& W, const vector<double>& mean,
                           size_t n, size_t d, size_t k) {
    vector<double> Xhat(n * d, 0.0);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < d; ++j) {
            double s = 0.0;
            for (size_t c = 0; c < k; ++c) s += Y[idx(i, c, k)] * W[idx(j, c, k)];
            Xhat[idx(i, j, d)] = s + mean[j];
        }
    }
    return Xhat;
}

int main() {
    // Create synthetic data: 2D latent structure embedded in 3D
    size_t n = 600, d = 3, k = 2;
    vector<double> X(n * d);
    mt19937_64 rng(123);
    normal_distribution<double> g1(0.0, 3.0);    // main variance
    normal_distribution<double> g2(0.0, 1.0);    // secondary variance
    normal_distribution<double> noise(0.0, 0.2);

    // True latent 2D, then mix into 3D via a loading matrix A (3x2)
    array<array<double, 2>, 3> A = {{{0.8, 0.1}, {0.6, -0.2}, {0.0, 0.98}}};
    for (size_t i = 0; i < n; ++i) {
        double z1 = g1(rng), z2 = g2(rng);
        double x = A[0][0] * z1 + A[0][1] * z2 + noise(rng);
        double y = A[1][0] * z1 + A[1][1] * z2 + noise(rng);
        double z = A[2][0] * z1 + A[2][1] * z2 + noise(rng);
        X[idx(i, 0, d)] = x + 5.0; // add mean shift to test centering
        X[idx(i, 1, d)] = y - 2.0;
        X[idx(i, 2, d)] = z + 1.0;
    }

    auto res = pca_power_iteration(X, n, d, k, 300, 1e-7);

    cout << fixed << setprecision(6);
    cout << "Top-" << k << " eigenvalues (variances):\n";
    for (size_t i = 0; i < k; ++i) cout << "  lambda_" << (i + 1) << " = " << res.eigenvalues[i] << "\n";

    // Project, reconstruct, and compare the first 3 samples
    auto Y = project_scores(X, res.mean, res.components, n, d, k);
    auto Xhat = reconstruct(Y, res.components, res.mean, n, d, k);

    cout << "\nOriginal vs Reconstructed (first 3 samples):\n";
    for (size_t i = 0; i < 3; ++i) {
        cout << "Sample " << i << ":\n  orig:  ";
        for (size_t j = 0; j < d; ++j) cout << setw(9) << X[idx(i, j, d)] << ' ';
        cout << "\n  recon: ";
        for (size_t j = 0; j < d; ++j) cout << setw(9) << Xhat[idx(i, j, d)] << ' ';
        cout << "\n";
    }

    // Show first component loadings
    cout << "\nFirst component loadings (v1): ";
    for (size_t j = 0; j < d; ++j) cout << res.components[idx(j, 0, k)] << ' ';
    cout << "\n";
    return 0;
}

This program implements PCA using orthogonal power iteration without explicitly forming the covariance matrix. It centers the data, repeatedly multiplies a vector by the covariance via X_c^T(X_c v)/(n−1), orthogonalizes against previously found components, and normalizes. The Rayleigh quotient estimates each eigenvalue (variance). It then projects samples onto the top k components and reconstructs approximate originals. This approach scales to larger problems by avoiding O(d^2) covariance storage and uses only matrix–vector products on the centered data.

Time: centering O(nd); power iteration per step via data O(nd); for k components and t iterations O(k t n d); projection/reconstruction O(n d k). Space: O(nd) to store the centered data, O(dk) for components, O(d) for the mean and working vectors.
Streaming (online) covariance with Welford’s algorithm + PCA of the covariance
#include <bits/stdc++.h>
using namespace std;

inline size_t idx(size_t r, size_t c, size_t ncols) { return r * ncols + c; }

struct OnlineCov {
    size_t d = 0;        // dimensionality
    size_t n = 0;        // count
    vector<double> mean; // running mean (d)
    vector<double> M2;   // sum of outer products of deviations (d x d)

    explicit OnlineCov(size_t dim) : d(dim), n(0), mean(d, 0.0), M2(d * d, 0.0) {}

    void update(const vector<double>& x) {
        if (x.size() != d) throw runtime_error("dimension mismatch");
        n++;
        vector<double> delta(d), delta2(d);
        for (size_t j = 0; j < d; ++j) delta[j] = x[j] - mean[j];
        // update mean
        for (size_t j = 0; j < d; ++j) mean[j] += delta[j] / static_cast<double>(n);
        for (size_t j = 0; j < d; ++j) delta2[j] = x[j] - mean[j];
        // M2 += outer(delta, delta2)
        for (size_t r = 0; r < d; ++r) {
            double dr = delta[r];
            for (size_t c = 0; c < d; ++c) M2[idx(r, c, d)] += dr * delta2[c];
        }
    }

    vector<double> covariance() const {
        vector<double> C = M2;
        double denom = (n > 1) ? (static_cast<double>(n) - 1.0) : 1.0;
        for (double& v : C) v /= denom;
        return C; // d x d
    }
};

inline double dot(const vector<double>& a, const vector<double>& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}
inline double norm2(const vector<double>& a) { return sqrt(max(0.0, dot(a, a))); }

// w = C * v for symmetric C (d x d)
void symMatVec(const vector<double>& C, const vector<double>& v, vector<double>& w, size_t d) {
    fill(w.begin(), w.end(), 0.0);
    for (size_t r = 0; r < d; ++r) {
        const double* row = &C[idx(r, 0, d)];
        double s = 0.0;
        for (size_t c = 0; c < d; ++c) s += row[c] * v[c];
        w[r] = s;
    }
}

// Orthogonalize v against the first m columns of W (d x k, row-major); stride is k
void orthogonalize(vector<double>& v, const vector<double>& W, size_t d, size_t m, size_t k) {
    for (size_t c = 0; c < m; ++c) {
        double proj = 0.0;
        for (size_t j = 0; j < d; ++j) proj += v[j] * W[idx(j, c, k)];
        for (size_t j = 0; j < d; ++j) v[j] -= proj * W[idx(j, c, k)];
    }
}

struct EigenResult { vector<double> vecs; vector<double> vals; size_t d, k; };

EigenResult topk_eigs_power(const vector<double>& C, size_t d, size_t k,
                            size_t max_iter = 500, double tol = 1e-6, uint64_t seed = 7) {
    vector<double> W(d * k, 0.0), vals(k, 0.0);
    mt19937_64 rng(seed);
    normal_distribution<double> dist(0.0, 1.0);
    vector<double> v(d), w(d);
    for (size_t comp = 0; comp < k; ++comp) {
        for (size_t j = 0; j < d; ++j) v[j] = dist(rng);
        if (comp > 0) orthogonalize(v, W, d, comp, k);
        double nv = norm2(v);
        if (nv > 0) for (size_t j = 0; j < d; ++j) v[j] /= nv;
        double prev = 0.0;
        for (size_t it = 0; it < max_iter; ++it) {
            symMatVec(C, v, w, d);
            if (comp > 0) orthogonalize(w, W, d, comp, k);
            double nw = norm2(w);
            if (nw == 0) break;
            for (size_t j = 0; j < d; ++j) v[j] = w[j] / nw;
            symMatVec(C, v, w, d);
            double val = dot(v, w);
            if (fabs(val - prev) < tol * max(1.0, fabs(prev))) { prev = val; break; }
            prev = val;
        }
        for (size_t j = 0; j < d; ++j) W[idx(j, comp, k)] = v[j];
        symMatVec(C, v, w, d);
        vals[comp] = dot(v, w);
    }
    return {W, vals, d, k};
}

int main() {
    // Stream synthetic 5D data (two strong directions + noise)
    size_t d = 5;
    size_t n = 2000;
    OnlineCov oc(d);
    mt19937_64 rng(2024);
    normal_distribution<double> z1(0.0, 4.0), z2(0.0, 2.0), noise(0.0, 0.4);
    // Loading matrix A (5 x 2)
    array<array<double, 2>, 5> A = {{{0.7, 0.2}, {0.5, -0.3}, {0.3, 0.8}, {0.0, 0.4}, {0.1, -0.1}}};

    for (size_t i = 0; i < n; ++i) {
        double u1 = z1(rng), u2 = z2(rng);
        vector<double> x(d);
        for (size_t r = 0; r < d; ++r) x[r] = A[r][0] * u1 + A[r][1] * u2 + noise(rng);
        // Add drift to test centering
        if (i % 2 == 0) x[0] += 5.0;
        oc.update(x);
    }

    auto C = oc.covariance();
    size_t k = 2;
    auto er = topk_eigs_power(C, d, k, 300, 1e-7);

    // Report eigenvalues and explained variance
    double total_var = 0.0;
    for (size_t j = 0; j < d; ++j) total_var += C[idx(j, j, d)];
    cout << fixed << setprecision(6);
    cout << "Top-" << k << " eigenvalues:\n";
    for (size_t i = 0; i < k; ++i) cout << "  lambda_" << (i + 1) << " = " << er.vals[i] << "\n";
    double cum = 0.0;
    for (size_t i = 0; i < k; ++i) cum += er.vals[i];
    cout << "Cumulative explained variance (top-" << k << ") = " << (cum / total_var) * 100.0 << "%\n";

    // Print first component loadings
    cout << "First component loadings: ";
    for (size_t j = 0; j < d; ++j) cout << er.vecs[idx(j, 0, k)] << ' ';
    cout << "\n";
    return 0;
}

This program computes a covariance matrix online using Welford’s algorithm, which updates the mean and the sum of outer-product deviations per arriving sample. After streaming all samples, it forms the sample covariance, then finds the top k eigenpairs using power iteration on the explicit symmetric covariance. It reports eigenvalues and cumulative explained variance, demonstrating how to handle data that do not fit in memory at once while still performing PCA at the end.

Time: O(d²) per update (outer product); O(n d²) total for streaming; O(k t d²) for the eigen solve with k components and t iterations. Space: O(d²) for M2/covariance and O(d) for the mean and working vectors.
Tags: principal component analysis, pca c++, eigendecomposition, covariance matrix, dimensionality reduction, power iteration, svd, explained variance, whitening, projection, orthogonal iteration, rayleigh quotient, deflation, mean centering