Positive Definite Matrices
Key Points
- A real symmetric matrix A is positive definite if x^T A x > 0 for every nonzero vector x, and positive semidefinite if x^T A x ≥ 0 for all x.
- All eigenvalues of a positive definite matrix are strictly positive; for semidefinite they are nonnegative.
- A positive definite matrix admits a unique Cholesky factorization A = L L^T with L lower triangular and positive diagonal.
- Sylvester’s criterion says A is positive definite iff all leading principal minors det(A_k) are positive.
- Cholesky-based solves are faster and more stable than generic LU for symmetric positive definite systems Ax = b.
- Any Gram matrix A = B^T B is positive semidefinite; it is positive definite if B has full column rank.
- Numerically, tiny negative pivots can appear due to roundoff; use tolerances and prefer Cholesky to naive minor tests.
- In optimization, covariance, kernels, and energy functions naturally produce (semi)definite matrices, enabling efficient algorithms.
Prerequisites
- Matrices and vectors — Positive definiteness is defined via quadratic forms x^T A x and requires familiarity with matrix–vector operations.
- Eigenvalues and eigenvectors — Eigenvalue characterizations (λ_i > 0 or ≥ 0) are central to understanding (semi)definiteness.
- Matrix factorizations (LU/QR) — Cholesky is a specialized factorization; comparing it with LU/QR clarifies when and why to use it.
- Convexity and Hessians — Definiteness of the Hessian determines local curvature and convexity in optimization problems.
Detailed Explanation
01 Overview
Positive definite (PD) and positive semidefinite (PSD) matrices are special classes of real symmetric matrices that behave like positive numbers in many algebraic and geometric ways. They guarantee strictly positive (or nonnegative) quadratic forms x^T A x, which implies important properties: invertibility for PD matrices, all real eigenvalues, and the existence of efficient factorizations such as the Cholesky decomposition A = L L^T.

These properties are fundamental in numerical linear algebra, optimization, statistics, and machine learning. For example, covariance matrices are PSD; Hessians of strictly convex functions are PD at minima; kernel Gram matrices are PSD by construction. Algorithmically, PD structure enables faster and more stable linear solves and log-determinant computations.

The eigenvalue characterization (all eigenvalues > 0 for PD, ≥ 0 for PSD) ties geometry (curvature) to algebra. Sylvester’s criterion provides a determinant-based test using leading principal minors. In practice, we often test PD by attempting a Cholesky factorization; success with positive diagonals confirms PD. Understanding (semi)definiteness helps diagnose model validity (e.g., ensuring covariance matrices are well-posed) and choose efficient computational strategies.
02 Intuition & Analogies
Think of a matrix as a machine that bends space. Feed in a vector x, and measure the “energy” it creates as x^T A x. For a PD matrix, no matter which nonzero direction you push, the energy is strictly positive—like compressing a perfect spring that always resists and stores energy. For PSD, the energy is never negative; it may be zero for some special directions (like pushing along a hinge that moves freely). The eigenvectors are the principal directions of bending, and the eigenvalues tell you how strongly each direction is resisted. PD means every direction has strictly positive resistance; PSD allows some “flat” directions with zero resistance.

Cholesky factorization is like finding a coordinate system (L) where this energy is just the sum of squared coordinates: x^T A x = ||L^T x||_2^2. This looks like Pythagoras—no cross terms—so computations become simpler. Gram matrices A = B^T B encode dot products between columns of B; energies become squared lengths after a linear transform, which can never be negative.

Sylvester’s criterion resembles checking whether every top-left submachine behaves like a good spring—if each smaller system stores positive energy, the whole one does. In practice, attempting Cholesky is like trying to take a stable square root of the matrix: if the attempt fails (you need the square root of a negative number), the matrix wasn’t PD.
03 Formal Definition
Let A ∈ R^{n×n} be symmetric (A = A^T). Then A is positive definite if x^T A x > 0 for every nonzero x ∈ R^n, and positive semidefinite if x^T A x ≥ 0 for every x ∈ R^n. Equivalently, A is PD iff all eigenvalues λ_i > 0 (PSD iff all λ_i ≥ 0), iff A = L L^T for a unique lower triangular L with positive diagonal (Cholesky), and iff all leading principal minors det(A_k) are positive (Sylvester’s criterion).
04 When to Use
- Solving symmetric positive definite linear systems Ax = b: use Cholesky instead of LU to gain speed (about half the flops) and numerical stability.
- Statistical modeling: covariance matrices must be PSD (often PD for strict invertibility). Cholesky is used to simulate correlated normals and compute log-likelihoods via log det(A).
- Optimization: Hessians of strictly convex quadratic objectives are PD; more general convex Hessians are PSD. Newton and trust-region methods exploit PD structure.
- Machine learning: kernel (Gram) matrices from inner products are PSD; enforcing PSD ensures valid kernels and stable training.
- Control and signal processing: Lyapunov and Riccati equations rely on PD solutions; energy and stability analyses often reduce to checking definiteness.
- Regularization: adding λI (λ > 0) to a PSD matrix yields PD, improving conditioning and guaranteeing unique solutions.

Choose PD methods when your matrix is symmetric and you can argue x^T A x is strictly positive (or enforce it via regularization). Use PSD-aware logic when zero-energy directions may exist (e.g., underdetermined problems, rank deficiency).
⚠️ Common Mistakes
- Confusing elementwise positivity (all a_{ij} > 0) with positive definiteness; they are unrelated. A matrix can have negative entries and still be PD, or have all positive entries and be indefinite.
- Forgetting symmetry: x^T A x depends only on the symmetric part (1/2)(A + A^T). Non-symmetric A cannot be PD in the standard sense; always symmetrize before testing.
- Numerical pitfalls in Sylvester’s criterion: determinants of leading minors can under/overflow and are extremely sensitive to roundoff. Prefer Cholesky as a practical PD test.
- Ignoring tolerances: tiny negative pivots during Cholesky can arise from rounding even for theoretically PD A. Use a small ε and treat pivots below ε as failure.
- Using generic LU/QR to solve SPD systems: this wastes structure and may reduce stability. Cholesky is simpler, faster, and more accurate for SPD.
- Misinterpreting PSD vs PD: PSD allows zero eigenvalues, so A may be singular; attempting to invert or Cholesky-factor a merely PSD matrix (without pivoting or jitter) will fail.
- Assuming positive trace or determinant implies PD; they are necessary (det > 0 for PD) but not sufficient alone. Always use rigorous tests (Cholesky, eigenvalues, or Sylvester’s full criterion).
Key Formulas
PD via quadratic form
x^T A x > 0 for all x ≠ 0 (x^T A x ≥ 0 for semidefinite)
Explanation: This is the defining property of positive definite matrices: every nonzero direction yields strictly positive energy; the inequality relaxes to ≥ 0 for semidefinite.
Symmetry requirement
A = A^T
Explanation: Definiteness is defined for symmetric matrices; the quadratic form depends only on the symmetric part (1/2)(A + A^T) of A.
Eigenvalue characterization
A is PD iff all eigenvalues λ_i > 0; PSD iff all λ_i ≥ 0
Explanation: All eigenvalues of a PD matrix are strictly positive; for PSD they are nonnegative. This ties curvature to spectral properties.
Cholesky factorization
A = L L^T, with L lower triangular and L_ii > 0
Explanation: Every PD matrix has a unique Cholesky factor L with positive diagonal. It is a numerically stable and efficient factorization.
Sylvester’s criterion
det(A_k) > 0 for k = 1, …, n, where A_k is the leading k×k principal submatrix
Explanation: All leading principal minors must be positive for positive definiteness. It provides a determinant-based test, though less stable numerically.
Rayleigh quotient bounds
λ_min ≤ (x^T A x) / (x^T x) ≤ λ_max for all x ≠ 0
Explanation: The Rayleigh quotient lies between the smallest and largest eigenvalues. It provides variational characterizations of eigenvalues.
Energy bounds
λ_min ||x||^2 ≤ x^T A x ≤ λ_max ||x||^2
Explanation: For PD A, the quadratic form is bounded above and below by eigenvalues times the squared norm, ensuring coercivity and stability.
Gram representation
A = B^T B is PSD for any real m×n matrix B; PD if rank(B) = n
Explanation: Any matrix of the form B^T B is PSD. If B has full column rank n, then A is PD.
Cholesky complexity
≈ n^3/3 flops, O(n^2) storage
Explanation: Cholesky requires roughly n^3/3 floating-point operations and stores O(n^2) numbers. It is about twice as fast as LU for SPD.
Log-determinant via Cholesky
det(A) = (∏_i L_ii)^2, so log det(A) = 2 Σ_i log L_ii
Explanation: For A = L L^T with positive diagonal L, the determinant is the square of the product of the diagonal entries. Taking logs turns products into sums for numerical stability.
Condition number (2-norm)
κ_2(A) = λ_max / λ_min
Explanation: For PD matrices, the 2-norm condition number is the ratio of extreme eigenvalues. It predicts error amplification in linear solves.
Complexity Analysis
- Cholesky factorization: ≈ n^3/3 flops for an n×n SPD matrix, about half the cost of LU.
- Forward and back substitution with the triangular factors: O(n^2) each.
- Log-determinant from an existing Cholesky factor: O(n).
- Storage: O(n^2); only the lower triangle of L is needed.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Check if a matrix is (approximately) symmetric within tolerance
bool isSymmetric(const vector<vector<double>>& A, double tol = 1e-10) {
    int n = (int)A.size();
    for (int i = 0; i < n; ++i) {
        if ((int)A[i].size() != n) return false;
        for (int j = i + 1; j < n; ++j) {
            if (fabs(A[i][j] - A[j][i]) > tol * (1.0 + max(fabs(A[i][j]), fabs(A[j][i]))))
                return false;
        }
    }
    return true;
}

// Computes the Cholesky factorization A = L * L^T for SPD A.
// Returns true on success and fills L (lower triangular).
// On failure (not PD), returns false and L is partial.
bool choleskyDecompose(const vector<vector<double>>& A,
                       vector<vector<double>>& L,
                       double eps = 1e-12) {
    int n = (int)A.size();
    L.assign(n, vector<double>(n, 0.0));

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            long double sum = A[i][j];
            for (int k = 0; k < j; ++k) sum -= (long double)L[i][k] * L[j][k];

            if (i == j) {
                // Diagonal element: must be positive for PD
                if (sum <= eps) return false; // not PD (or numerically singular)
                L[i][i] = sqrt((double)sum);
            } else {
                L[i][j] = (double)(sum / L[j][j]);
            }
        }
    }
    return true;
}

int main() {
    // Example SPD matrix
    vector<vector<double>> A = {
        { 4.0,  2.0, 2.0 },
        { 2.0, 10.0, 4.0 },
        { 2.0,  4.0, 9.0 }
    };

    if (!isSymmetric(A)) {
        cerr << "Matrix is not symmetric; definiteness not defined in the standard sense.\n";
        return 0;
    }

    vector<vector<double>> L;
    bool ok = choleskyDecompose(A, L);
    if (!ok) {
        cout << "Cholesky failed: A is not positive definite (or ill-conditioned).\n";
        return 0;
    }

    cout << "Cholesky factor L (lower triangular):\n";
    cout.setf(std::ios::fixed); cout << setprecision(6);
    for (int i = 0; i < (int)L.size(); ++i) {
        for (int j = 0; j < (int)L.size(); ++j) cout << setw(10) << L[i][j] << ' ';
        cout << '\n';
    }

    // Optional: compute log(det(A)) = 2 * sum log L_{ii}
    double logdet = 0.0;
    for (int i = 0; i < (int)L.size(); ++i) logdet += 2.0 * log(L[i][i]);
    cout << "log(det(A)) = " << logdet << "\n";

    return 0;
}
Attempts a Cholesky factorization A = L L^T. If any diagonal pivot becomes nonpositive, A is not PD (or is numerically singular), and the routine reports failure. On success, L has positive diagonal entries and can be used for efficient solves and determinant computations.
#include <bits/stdc++.h>
using namespace std;

bool choleskyDecompose(const vector<vector<double>>& A,
                       vector<vector<double>>& L,
                       double eps = 1e-12) {
    int n = (int)A.size();
    L.assign(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            long double sum = A[i][j];
            for (int k = 0; k < j; ++k) sum -= (long double)L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= eps) return false;
                L[i][i] = sqrt((double)sum);
            } else {
                L[i][j] = (double)(sum / L[j][j]);
            }
        }
    }
    return true;
}

// Solve L y = b (L lower triangular with positive diagonal)
vector<double> forwardSubstitution(const vector<vector<double>>& L, const vector<double>& b) {
    int n = (int)L.size();
    vector<double> y(n, 0.0);
    for (int i = 0; i < n; ++i) {
        long double sum = b[i];
        for (int j = 0; j < i; ++j) sum -= (long double)L[i][j] * y[j];
        y[i] = (double)(sum / L[i][i]);
    }
    return y;
}

// Solve L^T x = y (back substitution)
vector<double> backSubstitutionLt(const vector<vector<double>>& L, const vector<double>& y) {
    int n = (int)L.size();
    vector<double> x(n, 0.0);
    for (int i = n - 1; i >= 0; --i) {
        long double sum = y[i];
        for (int j = i + 1; j < n; ++j) sum -= (long double)L[j][i] * x[j];
        x[i] = (double)(sum / L[i][i]);
    }
    return x;
}

int main() {
    // SPD system Ax = b
    vector<vector<double>> A = {
        {25.0, 15.0, -5.0},
        {15.0, 18.0,  0.0},
        {-5.0,  0.0, 11.0}
    };
    vector<double> b = {35.0, 33.0, 6.0};

    vector<vector<double>> L;
    if (!choleskyDecompose(A, L)) {
        cerr << "Matrix is not PD; cannot use Cholesky.\n";
        return 0;
    }

    // Solve A x = b via L L^T x = b
    vector<double> y = forwardSubstitution(L, b);
    vector<double> x = backSubstitutionLt(L, y);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Solution x:\n";
    for (double xi : x) cout << xi << " ";
    cout << "\n";

    // Residual check
    int n = (int)A.size();
    vector<double> r(n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            r[i] += A[i][j] * x[j];
    for (int i = 0; i < n; ++i) r[i] -= b[i];

    double rnorm = 0.0;
    for (double ri : r) rnorm = max(rnorm, fabs(ri));
    cout << "Max absolute residual: " << rnorm << "\n";

    return 0;
}
Factors A = L L^T, then solves L y = b and L^T x = y to obtain x. This is the standard SPD solve pipeline and is both efficient and numerically stable compared to generic methods.
#include <bits/stdc++.h>
using namespace std;

// Multiply A = B^T * B for B of size m x n (rows m, cols n)
vector<vector<double>> gramMatrix(const vector<vector<double>>& B) {
    int m = (int)B.size();
    int n = (int)B[0].size();
    vector<vector<double>> A(n, vector<double>(n, 0.0));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            long double sum = 0.0L;
            for (int k = 0; k < m; ++k) sum += (long double)B[k][i] * B[k][j];
            A[i][j] = A[j][i] = (double)sum;
        }
    }
    return A;
}

int main() {
    // Build B (m x n) with m >= n, then A = B^T B is PSD; PD if B has full column rank
    int m = 5, n = 3;
    vector<vector<double>> B(m, vector<double>(n));
    mt19937 rng(42);
    normal_distribution<double> N(0.0, 1.0);
    for (int i = 0; i < m; ++i) for (int j = 0; j < n; ++j) B[i][j] = N(rng);

    vector<vector<double>> A = gramMatrix(B);

    // Empirical check: sample random x and confirm x^T A x >= 0
    bool ok = true;
    for (int t = 0; t < 1000; ++t) {
        vector<double> x(n);
        for (int i = 0; i < n; ++i) x[i] = N(rng);
        // Compute q = x^T A x
        long double q = 0.0L;
        for (int i = 0; i < n; ++i) {
            long double row = 0.0L;
            for (int j = 0; j < n; ++j) row += (long double)A[i][j] * x[j];
            q += row * x[i];
        }
        if (q < -1e-10L) { ok = false; break; }
    }

    cout << (ok ? "A appears PSD by sampling (x^T A x >= 0)."
                : "Potential issue: negative quadratic form found.") << "\n";

    // If B has full column rank, A is PD. Add small ridge to enforce PD if needed.
    for (int i = 0; i < n; ++i) A[i][i] += 1e-8; // regularization (optional)

    return 0;
}
Constructs a Gram matrix A = B^T B, which is always PSD by theory. It then empirically verifies nonnegativity of the quadratic form by random sampling. Adding a small diagonal “ridge” can enforce strict PD in practice.