Math · Intermediate

Pseudoinverse (Moore-Penrose)

Key Points

  • The Moore–Penrose pseudoinverse generalizes matrix inversion to rectangular or singular matrices and is denoted A⁺.
  • For tall matrices with full column rank (m ≥ n), the pseudoinverse is A⁺ = (AᵀA)⁻¹Aᵀ and solves least-squares problems.
  • Geometrically, A⁺ maps a vector to the best-fit solution and projects data onto A’s column space.
  • Numerically, forming AᵀA squares the condition number; QR- or SVD-based methods are usually more stable.
  • The pseudoinverse yields the minimum-norm solution among all least-squares solutions.
  • You can compute A⁺b without forming A⁺ explicitly, by solving the normal equations or via QR: Rx = Qᵀb.
  • When A is rank-deficient, use the SVD with singular-value thresholding or Tikhonov regularization instead of (AᵀA)⁻¹.
  • Time to compute A⁺ for an m × n full-rank tall A is roughly O(mn²); memory is O(mn + n²).

Prerequisites

  • Matrix multiplication and transpose — Understanding AᵀA and Aᵀb requires comfort with basic matrix operations.
  • Linear independence and rank — Full column rank is essential for using A⁺ = (AᵀA)⁻¹Aᵀ and guarantees invertibility of AᵀA.
  • Least squares and projections — The pseudoinverse solves least-squares problems and represents orthogonal projections.
  • Cholesky and triangular solves — Efficiently applying (AᵀA)⁻¹ without forming an explicit inverse uses these techniques.
  • QR and SVD basics — Stable computation of least-squares solutions uses QR or SVD instead of the normal equations.

Detailed Explanation


01 Overview

The Moore–Penrose pseudoinverse extends the idea of a matrix inverse to cases where the matrix is not square or not invertible. For a tall matrix A with more rows than columns (m ≥ n) and full column rank, the pseudoinverse A⁺ acts like a “best possible” inverse: it provides the least-squares solution to Ax ≈ b and projects vectors onto the column space of A. Concretely, when A has full column rank, A⁺ = (AᵀA)^{-1}Aᵀ. This formula appears naturally from the normal equations of least squares. The pseudoinverse is unique and satisfies four defining equations (the Moore–Penrose conditions) that make it the canonical choice among generalized inverses. In data fitting, signal processing, control, and robotics, A⁺ is used to recover parameters from noisy measurements, compute minimum-norm solutions, and project onto subspaces. Practically, while A⁺ can be formed explicitly using the formula above, it is often better numerically to avoid explicit inversion and instead solve linear systems using QR or SVD factorizations. This improves stability, especially when the columns of A are nearly linearly dependent. Understanding the pseudoinverse helps unify linear regression, projections, and constrained optimization under a single mathematical tool.

02 Intuition & Analogies

Imagine A as a machine that takes an input vector x (parameters) and produces an output Ax (measurements). If A were a square, invertible machine, you could reverse it exactly with A⁻¹. But in real life, we often have more measurements than parameters (tall matrices), and they may be noisy or inconsistent. There is no exact reverse button because many measurement vectors b are not exactly producible by any x. The pseudoinverse A⁺ is the next best thing: it is the “undo as well as possible” button. It takes b and returns the x that makes Ax as close as possible to b in the least-squares sense (minimizes the sum of squared errors). Geometrically, think of A’s columns spanning some plane or subspace inside a higher-dimensional space. Given b, you first drop a perpendicular from b onto that plane to find the closest point p (this is the projection onto the column space). Then you find the coordinates x of p in the basis given by A’s columns. The composition of “project then express coordinates” is exactly what A⁺ does. If the machine A slightly squashes or stretches space (some directions stronger than others), A⁺ unsquashes appropriately, but it refuses to chase the parts of b that lie outside the subspace (that would require guessing information that A never produced). When the columns of A are nearly redundant (almost in the same direction), reversing those squashes becomes numerically risky—small measurement noise can blow up. That’s why stable methods (QR, SVD) are often preferred over directly inverting AᵀA.

03 Formal Definition

For any real matrix A ∈ ℝ^{m×n}, the Moore–Penrose pseudoinverse A⁺ ∈ ℝ^{n×m} is the unique matrix satisfying the four Penrose conditions: (1) AA⁺A = A, (2) A⁺AA⁺ = A⁺, (3) (AA⁺)ᵀ = AA⁺, and (4) (A⁺A)ᵀ = A⁺A. These conditions enforce that A⁺ behaves like a true inverse on A’s range and gives orthogonal projections onto the column and row spaces. When A has full column rank (rank(A) = n, m ≥ n), AᵀA is symmetric positive definite and invertible, and A⁺ can be written as A⁺ = (AᵀA)⁻¹Aᵀ. In general, using the singular value decomposition (SVD) A = UΣVᵀ, with Σ containing singular values σ₁ ≥ … ≥ σᵣ > 0, the pseudoinverse is A⁺ = VΣ⁺Uᵀ, where Σ⁺ replaces each nonzero σᵢ by 1/σᵢ and transposes the rectangular shape. For overdetermined systems, x⋆ = A⁺b is the minimum-norm least-squares solution to min_x ‖Ax − b‖₂. The orthogonal projectors AA⁺ and A⁺A project onto col(A) and row(A), respectively.

04 When to Use

Use the pseudoinverse when you need to solve Ax ≈ b in a least-squares sense, especially for tall full-column-rank matrices. Typical applications include linear regression (fitting models to data), parameter estimation in system identification, and signal reconstruction from noisy measurements. In geometry and graphics, A⁺ is used to compute best-fit transformations and projections. In control and robotics, it appears in inverse kinematics to compute joint configurations that achieve a desired end-effector motion while minimizing joint effort (minimum-norm solutions). In statistics and machine learning, it provides closed-form solutions for ordinary least squares and ridge regression variants. Use the closed-form A⁺ = (AᵀA)^{-1}Aᵀ when A has full column rank and when the problem is reasonably well conditioned; otherwise, prefer numerically stable algorithms such as QR or SVD to compute x = A⁺b without explicitly forming A⁺. For rank-deficient or nearly singular problems, use SVD with small singular value thresholding or Tikhonov regularization (ridge) to control noise amplification.

⚠️Common Mistakes

  • Blindly forming (AᵀA)⁻¹ when A is rank-deficient or nearly so. AᵀA then becomes singular or ill-conditioned, leading to massive numerical errors. Use QR/SVD or add regularization.
  • Computing a literal matrix inverse instead of solving triangular systems. Inverse formation is slower and less stable. Use Cholesky/QR solves to apply (AᵀA)⁻¹Aᵀ.
  • Ignoring scaling and conditioning. Poorly scaled columns can inflate the condition number, making solutions sensitive to noise. Normalize features or use QR/SVD.
  • Dimension mismatches. For m × n tall A, A⁺ must be n × m; AA⁺ is m × m (a projector), while A⁺A is n × n.
  • Assuming A⁺A equals the identity for any A. It acts as the identity only on A’s row space (for tall full-rank A, A⁺A = Iₙ but AA⁺ ≠ Iₘ).
  • Using float instead of double. Accumulating dot products in low precision increases round-off error. Prefer double and use tolerances when checking identities.
  • Forgetting that normal equations square the condition number. Forming AᵀA can worsen numerical stability by roughly κ(A)²; prefer QR/SVD for ill-conditioned problems.

Key Formulas

Full-column-rank pseudoinverse

A⁺ = (AᵀA)⁻¹Aᵀ

Explanation: If A has full column rank (m≥n, rank n), AᵀA is invertible and this closed form holds. It is the classic formula used to compute least-squares solutions via normal equations.

Least-squares solution

x⋆ = argmin_x ‖Ax − b‖₂ = A⁺b

Explanation: The pseudoinverse maps b to the minimum‑residual, minimum‑norm solution. This connects the geometric projection view with an explicit algebraic solution.

Normal equations

AᵀAx = Aᵀb

Explanation: Stationary condition for minimizing the squared 2‑norm of residuals. Solving this SPD system yields the least-squares solution when A has full column rank.

SVD pseudoinverse

A = UΣVᵀ,  A⁺ = VΣ⁺Uᵀ

Explanation: By inverting nonzero singular values and swapping U and V, we obtain A⁺ that satisfies the Moore–Penrose conditions even when A is rectangular or rank deficient.

Spectral condition number

κ₂(A) = σmax(A) / σmin(A)

Explanation: The ratio of the largest to smallest singular values measures sensitivity to perturbations. Squaring occurs when forming AᵀA, worsening numerical stability.

Orthogonal projectors

P = AA⁺,  Q = A⁺A

Explanation: P projects onto the column space of A; Q projects onto the row space. They are idempotent (P² = P) and symmetric for real matrices.

Symmetry of projectors

(AA⁺)ᵀ = AA⁺,  (A⁺A)ᵀ = A⁺A

Explanation: These are two of the Moore–Penrose conditions enforcing that the projectors are orthogonal (symmetric) projections.

Tikhonov-regularized pseudoinverse

(AᵀA + λI)⁻¹Aᵀ

Explanation: A damped inverse used when AᵀA is ill-conditioned. The parameter λ > 0 reduces noise amplification at the cost of bias.

Complexity Analysis

Let A be m × n with m ≥ n and full column rank. Forming the Gram matrix G = AᵀA costs O(mn²) time: there are n(n+1)/2 unique entries, each a dot product of length m. Computing a Cholesky factorization G = LLᵀ requires O(n³) time and O(n²) memory. To build the explicit pseudoinverse A⁺ = (AᵀA)⁻¹Aᵀ without forming the inverse, we solve two triangular systems with multiple right-hand sides: solve LY = Aᵀ (n × m) and then LᵀX = Y to obtain X = A⁺. Each triangular solve with m right-hand sides costs O(n²m). The overall time is O(mn² + n³ + n²m) = O(mn² + n³); for tall problems (m ≫ n), the O(mn²) term dominates. Memory usage is O(mn) to store A plus O(n²) for G and O(nm) for A⁺. If you only need x = A⁺b, you can avoid forming A⁺ entirely by solving Gx = Aᵀb using Cholesky in O(mn + n²) per right-hand side, saving O(nm) memory. Using QR decomposition (Householder) to solve least squares has time O(mn²) and avoids squaring the condition number, making it preferable for ill-conditioned A. In contrast, explicitly inverting G (forming G⁻¹) costs O(n³) and increases numerical error; it is better to solve via triangular systems.

Code Examples

Compute A⁺ for tall full-column-rank A using Cholesky without explicit inverse
#include <bits/stdc++.h>
using namespace std;

// Utility functions for small dense matrices using vector<vector<double>>
using Matrix = vector<vector<double>>;

Matrix zeros(int r, int c) { return Matrix(r, vector<double>(c, 0.0)); }
Matrix identity(int n) { Matrix I = zeros(n, n); for (int i = 0; i < n; ++i) I[i][i] = 1.0; return I; }

Matrix transpose(const Matrix &A) {
    int m = (int)A.size(), n = (int)A[0].size();
    Matrix AT = zeros(n, m);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) AT[j][i] = A[i][j];
    return AT;
}

Matrix matmul(const Matrix &A, const Matrix &B) {
    int m = (int)A.size(), k = (int)A[0].size();
    int n = (int)B[0].size();
    Matrix C = zeros(m, n);
    for (int i = 0; i < m; ++i) {
        for (int t = 0; t < k; ++t) {
            double a = A[i][t];
            if (a == 0.0) continue;
            for (int j = 0; j < n; ++j) {
                C[i][j] += a * B[t][j];
            }
        }
    }
    return C;
}

// Cholesky factorization for SPD matrix G: returns lower triangular L such that G = L L^T
Matrix cholesky(const Matrix &G) {
    int n = (int)G.size();
    Matrix L = zeros(n, n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = G[i][j];
            for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0) throw runtime_error("Matrix not SPD (Cholesky failed)");
                L[i][i] = sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return L;
}

// Solve L Y = B for Y, where L is lower triangular; B can have multiple RHS (columns)
Matrix solveLower(const Matrix &L, const Matrix &B) {
    int n = (int)L.size();
    int m = (int)B[0].size();
    Matrix Y = zeros(n, m);
    for (int col = 0; col < m; ++col) {
        for (int i = 0; i < n; ++i) {
            double sum = B[i][col];
            for (int k = 0; k < i; ++k) sum -= L[i][k] * Y[k][col];
            Y[i][col] = sum / L[i][i];
        }
    }
    return Y;
}

// Solve L^T X = Y for X (upper triangular solve)
Matrix solveUpperFromLowerT(const Matrix &L, const Matrix &Y) {
    int n = (int)L.size();
    int m = (int)Y[0].size();
    Matrix X = zeros(n, m);
    for (int col = 0; col < m; ++col) {
        for (int i = n - 1; i >= 0; --i) {
            double sum = Y[i][col];
            for (int k = i + 1; k < n; ++k) sum -= L[k][i] * X[k][col]; // using L^T entries
            X[i][col] = sum / L[i][i];
        }
    }
    return X;
}

Matrix pseudoinverse_full_column_rank(const Matrix &A) {
    // A: m x n with full column rank, m >= n
    Matrix AT = transpose(A);              // n x m
    Matrix G = matmul(AT, A);              // n x n (Gram matrix)
    Matrix L = cholesky(G);                // n x n lower triangular
    // Compute A^+ = (A^T A)^{-1} A^T by solving twice: X = G^{-1} A^T = L^{-T} L^{-1} A^T
    Matrix Y = solveLower(L, AT);          // solve L Y = A^T
    Matrix X = solveUpperFromLowerT(L, Y); // solve L^T X = Y -> X = A^+
    return X;                              // X is n x m
}

void printMatrix(const Matrix &A, const string &name) {
    cout << name << " (" << A.size() << "x" << A[0].size() << ")\n";
    cout.setf(ios::fixed); cout << setprecision(6);
    for (const auto &row : A) {
        for (double v : row) cout << setw(12) << v << ' ';
        cout << '\n';
    }
}

int main() {
    // Example: A is 4x2 with full column rank (tall matrix)
    Matrix A = {
        {1.0, 1.0},
        {1.0, 2.0},
        {1.0, 3.0},
        {1.0, 4.0}
    }; // columns: [1,1,1,1]^T and [1,2,3,4]^T

    Matrix Aplus = pseudoinverse_full_column_rank(A);

    // Verify properties approximately: A^+ A ≈ I_2 and A A^+ A ≈ A
    Matrix ApA = matmul(Aplus, A);  // should be 2x2 identity
    Matrix AAp = matmul(A, Aplus);  // projector 4x4
    Matrix AApA = matmul(AAp, A);   // should recover A

    printMatrix(Aplus, "A^+");
    printMatrix(ApA, "A^+ A");
    printMatrix(AApA, "A A^+ A");

    return 0;
}

This program computes the pseudoinverse A⁺ of a tall full-column-rank matrix using the identity A⁺ = (AᵀA)^{-1}Aᵀ. It avoids forming an explicit inverse by using Cholesky factorization of G = AᵀA and two triangular solves, which is both faster and more stable than inverting G directly. The demo prints A⁺, checks that A⁺A ≈ I (on Rⁿ), and verifies AA⁺A ≈ A.

Time: O(mn² + n³) to form AᵀA, factor it, and solve two triangular systems with m right-hand sides.
Space: O(mn + n²) to store A, AᵀA, and intermediate factors, plus O(nm) for A⁺.

Solve least squares x = A⁺b using QR (Modified Gram–Schmidt) without forming A⁺
#include <bits/stdc++.h>
using namespace std;
using Matrix = vector<vector<double>>;
using Vec = vector<double>;

Matrix zeros(int r, int c) { return Matrix(r, vector<double>(c, 0.0)); }

Matrix transpose(const Matrix &A) {
    int m = (int)A.size(), n = (int)A[0].size();
    Matrix AT = zeros(n, m);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) AT[j][i] = A[i][j];
    return AT;
}

// Modified Gram-Schmidt QR for m x n with m >= n
void qr_mgs(const Matrix &A, Matrix &Q, Matrix &R) {
    int m = (int)A.size(), n = (int)A[0].size();
    Q = zeros(m, n); R = zeros(n, n);
    Matrix V = A; // copy columns to orthogonalize
    for (int j = 0; j < n; ++j) {
        // Compute norm of current column
        double norm = 0.0;
        for (int i = 0; i < m; ++i) norm += V[i][j] * V[i][j];
        R[j][j] = sqrt(max(0.0, norm));
        if (R[j][j] == 0.0) throw runtime_error("Rank deficiency detected in QR");
        for (int i = 0; i < m; ++i) Q[i][j] = V[i][j] / R[j][j];
        // Orthogonalize remaining columns against Q[:,j]
        for (int k = j + 1; k < n; ++k) {
            double r = 0.0;
            for (int i = 0; i < m; ++i) r += Q[i][j] * V[i][k];
            R[j][k] = r;
            for (int i = 0; i < m; ++i) V[i][k] -= Q[i][j] * r;
        }
    }
}

// Solve upper triangular R x = y (n x n)
Vec back_substitute(const Matrix &R, const Vec &y) {
    int n = (int)R.size();
    Vec x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        double sum = y[i];
        for (int k = i + 1; k < n; ++k) sum -= R[i][k] * x[k];
        x[i] = sum / R[i][i];
    }
    return x;
}

// Compute x = argmin ||Ax - b||_2 using QR: R x = Q^T b
Vec least_squares_qr(const Matrix &A, const Vec &b) {
    int m = (int)A.size(), n = (int)A[0].size();
    if ((int)b.size() != m) throw runtime_error("Dimension mismatch");
    Matrix Q, R; qr_mgs(A, Q, R);
    // y = Q^T b
    Vec y(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double dot = 0.0;
        for (int i = 0; i < m; ++i) dot += Q[i][j] * b[i];
        y[j] = dot;
    }
    // Solve R x = y
    return back_substitute(R, y);
}

int main() {
    // Fit a line y = c0 + c1 * x to noisy data via least squares
    vector<double> xs = {0, 1, 2, 3, 4, 5};
    vector<double> ys = {1.1, 2.0, 2.9, 4.2, 5.1, 5.9};
    int m = (int)xs.size();
    // Build A with columns [1, x]
    Matrix A(m, vector<double>(2));
    for (int i = 0; i < m; ++i) { A[i][0] = 1.0; A[i][1] = xs[i]; }
    // b is observations
    Vec b = ys;

    // Solve x = [c0, c1]
    Vec coeff = least_squares_qr(A, b);

    cout.setf(ios::fixed); cout << setprecision(6);
    cout << "c0 = " << coeff[0] << ", c1 = " << coeff[1] << "\n";

    // Predict and print residual norm
    double rss = 0.0;
    for (int i = 0; i < m; ++i) {
        double pred = coeff[0] + coeff[1] * xs[i];
        double r = pred - ys[i];
        rss += r * r;
    }
    cout << "Residual sum of squares = " << rss << "\n";
    return 0;
}

This code solves a least-squares problem using QR decomposition via Modified Gram–Schmidt, computing x such that Ax ≈ b without ever forming A⁺ explicitly. It is numerically more stable than normal equations because it avoids forming AᵀA and reduces sensitivity to conditioning. The example fits a straight line to noisy data.

Time: O(mn²) for the QR factorization and O(mn) for forming Qᵀb; the O(mn²) term dominates.
Space: O(mn) to store Q plus O(n²) for R.
#pseudoinverse#moore-penrose#least squares#normal equations#qr decomposition#cholesky#svd#projection#full column rank#overdetermined#condition number#ridge regression#minimum norm#tall matrix#orthogonal projector