Math · Intermediate

Multivariate Gaussian Distribution

Key Points

  • A multivariate Gaussian (normal) distribution models a vector of real-valued variables with a bell-shaped probability hill in many dimensions.
  • It is defined by a mean vector that sets the center and a covariance matrix that sets the shape, size, and orientation of the probability ellipsoid.
  • Its probability density function uses the determinant and inverse (or Cholesky factor) of the covariance, so numerically stable linear algebra is essential.
  • Distances are measured by the Mahalanobis distance, which rescales and rotates space to account for correlations between variables.
  • Linear transformations of a multivariate Gaussian remain Gaussian, and marginals and conditionals are also Gaussian with closed-form parameters.
  • Maximum likelihood estimation of the mean and covariance has simple formulas, but the covariance must be positive definite; regularization often helps.
  • In code, avoid explicitly inverting the covariance; use Cholesky decomposition to compute log-densities, sample, and solve systems robustly.
  • Computationally, precompute a Cholesky factor once in O(d³) and then evaluate many log-densities or draw samples in O(d²) each.

Prerequisites

  • Linear algebra (vectors, matrices, transpose, matrix multiplication) — The PDF and all computations rely on matrix operations, eigenvalues, and determinants.
  • Probability and random variables — Understanding distributions, expectation, and independence is necessary to interpret μ and Σ.
  • Variance and covariance — The covariance matrix encodes variances and covariances; their meaning drives model interpretation.
  • Numerical stability and conditioning — Stable computation of log-densities and sampling requires avoiding explicit matrix inverses.
  • Cholesky decomposition — Core tool for factoring Σ to compute Mahalanobis distances, log-determinants, and sampling.
  • Random number generation — Sampling needs standard normal draws and reproducible RNGs.

Detailed Explanation


01 Overview

The multivariate Gaussian distribution is a cornerstone of probability and statistics for modeling continuous, real-valued vectors. Think of it as a higher-dimensional extension of the familiar bell curve. Instead of a single mean and variance, it uses a mean vector to specify the center of mass and a covariance matrix to capture spread and correlation among components. Correlation matters: if two dimensions tend to increase together, the probability mass stretches along that shared direction. This distribution is popular because it is mathematically elegant and stable under many operations: linear transformations of a Gaussian remain Gaussian, and the marginals (subsets of coordinates) and conditionals (given others) are also Gaussian with tractable formulas.

In applications, multivariate Gaussians approximate sensor noise, model feature vectors in computer vision, underlie Kalman filters in tracking, and appear in Bayesian inference, PCA, and Gaussian process models. Computationally, the covariance matrix must be symmetric positive definite; this ensures the distribution is well-defined. Implementation-wise, computing probabilities and drawing samples efficiently and stably requires matrix decompositions—especially the Cholesky factorization—rather than direct matrix inversion. The result is a model that is both interpretable (through means, variances, and correlations) and practical for a wide range of engineering and scientific tasks.

02 Intuition & Analogies

Picture a pile of sand poured onto a flat floor: in one dimension it forms a bell; in two dimensions it forms a smooth, mound-like hill; in more dimensions, it becomes a higher-dimensional hill you can’t see but can describe mathematically. The peak of the hill is at the mean vector μ, which says where the center of the mound sits. The shape of the hill is not necessarily circular; it can be stretched and tilted. That stretching and tilting are encoded by the covariance matrix Σ. If two coordinates tend to rise and fall together, the hill becomes elongated along that diagonal direction. If they vary independently, the hill is more circular (or spherical in higher dimensions).

A helpful mental model is to start with a spherical cloud of points (like a ball-shaped fog) where each direction has the same variance and no correlations. Apply a stretch-and-rotate operation (an affine transform): stretch more in some directions (larger variance), less in others, and rotate to align with preferred axes. The result is an ellipsoid-shaped cloud—the multivariate Gaussian. Distances inside this cloud are not best measured by plain Euclidean distance; instead, you measure how many standard deviations away a point is after undoing the stretch and rotation. That special distance is the Mahalanobis distance. If you rescale and rotate the space just right (by Σ^{-1/2}), the ellipsoid becomes a unit sphere and the distribution looks like a standard normal again.

In everyday terms: if you predict height and weight together, knowing someone is taller than average suggests they might also be heavier than average—those variables correlate. The multivariate Gaussian captures this idea by letting probability ‘lean’ along the joint trend rather than treating features independently.

03 Formal Definition

A d-dimensional random vector X has a multivariate normal distribution with mean vector μ ∈ R^d and covariance matrix Σ ∈ R^{d×d} (symmetric positive definite) if its probability density function is f(x) = (1 / ((2π)^{d/2} det(Σ)^{1/2})) exp(−(1/2) (x − μ)^⊤ Σ^{−1} (x − μ)). Here, det(Σ) is the determinant and Σ^{−1} is the inverse; the quadratic form (x − μ)^⊤ Σ^{−1} (x − μ) is the squared Mahalanobis distance. Equivalently, X can be written as X = μ + A Z where Z ∼ N(0, I_d) and A A^⊤ = Σ (e.g., A can be the Cholesky factor). Key closure properties include: (1) linear transformations of a Gaussian are Gaussian (AX + b ∼ N(Aμ + b, A Σ A^⊤)); (2) any subset of coordinates (marginal) is Gaussian with the corresponding subvector and submatrix of μ and Σ; (3) conditionals are Gaussian and can be computed from block partitions of μ and Σ. The covariance must be symmetric positive definite (all eigenvalues strictly positive) to ensure the density exists and integrates to 1. Estimation via maximum likelihood from independent samples {x_i} yields μ̂ = (1/n) ∑ x_i and Σ̂ = (1/n) ∑ (x_i − μ̂)(x_i − μ̂)^⊤ for the MLE (or 1/(n−1) for the unbiased estimator).

04 When to Use

Use a multivariate Gaussian when you need to model real-valued vectors that cluster around a central mean and exhibit elliptical, correlated spread. Typical scenarios include sensor fusion and tracking (Kalman filters assume Gaussian process and measurement noise), feature modeling in vision and audio (where covariance captures feature correlations), anomaly detection (large Mahalanobis distance flags outliers), and generative classification (LDA/QDA assume class-conditionals are Gaussian). It is also appropriate when linear combinations of many small, independent effects drive the data; by a multivariate central limit effect, the Gaussian becomes a good approximation.

It is especially effective if scatter plots of variable pairs look roughly elliptical and if residuals appear unimodal and symmetric. If you must repeatedly evaluate densities or sample, and you can precompute a factorization of the covariance, the model is computationally tractable. However, if the data are multimodal (several separate clusters) or strongly non-elliptical (heavy tails, skewness), a single Gaussian may be too restrictive; consider mixtures of Gaussians or other distributions. In high dimensions with limited data, covariance estimation becomes ill-conditioned; then use regularization (e.g., add λI), shrinkage estimators, or low-rank structure (factor analysis) to remain robust.

⚠️ Common Mistakes

• Inverting Σ directly: Numerical inversion is unstable and slow. Prefer a Cholesky decomposition Σ = L L^⊤ to compute log-densities and Mahalanobis distances without forming Σ^{−1}.
• Forgetting positive definiteness: A sample covariance can be singular (especially when n ≤ d or features are collinear). Add a small ridge term λI to ensure Σ is positive definite and Cholesky succeeds.
• Mixing biased and unbiased estimators: The MLE uses 1/n while the unbiased sample covariance uses 1/(n−1). Be consistent with your statistical goals when reporting or regularizing.
• Ignoring units and scaling: Variables with wildly different scales can dominate distances. Standardize or carefully interpret Σ; Mahalanobis distance already accounts for scale, but an ill-conditioned Σ still harms numerics.
• Dropping the normalization constant: When computing true probabilities (or comparing non-nested models), include the determinant term; for relative comparisons within the same Σ the constant cancels, but not across different Σ.
• Misinterpreting correlation as causation: A large off-diagonal covariance indicates association, not causal influence.
• Assuming mixtures are Gaussian: A mixture of two different Gaussians is not Gaussian; normality is preserved by linear transformations and by sums of independent Gaussian vectors, not by mixing densities.
• Failing to use log-space: Products of small densities underflow; sum log-densities instead and use the log-pdf.

Key Formulas

Multivariate Gaussian PDF

f(x) = (1 / ((2π)^{d/2} det(Σ)^{1/2})) exp(−(1/2) (x − μ)^⊤ Σ^{−1} (x − μ))

Explanation: This gives the probability density at point x. It depends on the squared Mahalanobis distance and the determinant of the covariance for proper normalization.

Log-PDF

log f(x) = −(1/2) (d log(2π) + log det(Σ) + (x − μ)^⊤ Σ^{−1} (x − μ))

Explanation: Working in log-space avoids numerical underflow and turns products into sums. It is the standard way to compute likelihoods in high dimensions.

Mahalanobis Distance Squared

D_M²(x) = (x − μ)^⊤ Σ^{−1} (x − μ)

Explanation: This measures distance in units of standard deviation along principal directions. Equal-density contours correspond to constant Mahalanobis distance.

Mean Estimator (MLE)

μ̂ = (1/n) ∑_{i=1}^{n} x_i

Explanation: The maximum likelihood estimate of the mean is the sample average. It centers the Gaussian at the empirical mean of the data.

Covariance Estimator (MLE)

Σ̂_MLE = (1/n) ∑_{i=1}^{n} (x_i − μ̂)(x_i − μ̂)^⊤

Explanation: This is the MLE for covariance assuming independent samples. Using 1/(n-1) instead yields the unbiased sample covariance.

Affine Transform of a Gaussian

Y = AX + b, X ∼ N(μ, Σ) ⇒ Y ∼ N(Aμ + b, A Σ A^⊤)

Explanation: Applying a linear map and shift to a Gaussian preserves normality. The mean and covariance transform predictably.

Conditional Gaussian

[X₁; X₂] ∼ N([μ₁; μ₂], [[Σ₁₁, Σ₁₂], [Σ₂₁, Σ₂₂]]) ⇒ X₁ | X₂ = x₂ ∼ N(μ₁ + Σ₁₂ Σ₂₂^{−1} (x₂ − μ₂), Σ₁₁ − Σ₁₂ Σ₂₂^{−1} Σ₂₁)

Explanation: Conditioning one block of variables on another yields a Gaussian. The conditional mean is an affine function of the observed block, and the conditional covariance shrinks accordingly.

Sum of Independent Gaussians

X + Y ∼ N(μ_X + μ_Y, Σ_X + Σ_Y) if X ⊥⊥ Y

Explanation: The sum of independent Gaussian vectors is Gaussian. Covariances add and means add when independence holds.

Differential Entropy

H(X) = (1/2) log((2πe)^d det(Σ))

Explanation: Entropy of a multivariate normal depends only on the determinant of the covariance. Larger spread (determinant) means higher uncertainty.

KL Divergence Between Gaussians

D_KL(N(μ₀, Σ₀) ∥ N(μ₁, Σ₁)) = (1/2) (tr(Σ₁^{−1} Σ₀) + (μ₁ − μ₀)^⊤ Σ₁^{−1} (μ₁ − μ₀) − d + log(det(Σ₁) / det(Σ₀)))

Explanation: A closed-form measure of how one Gaussian differs from another. Useful in variational methods and model comparison.

Determinant via Eigenvalues

det(Σ) = ∏_{i=1}^{d} λ_i,  log det(Σ) = ∑_{i=1}^{d} log λ_i

Explanation: For symmetric positive definite Σ with eigenvalues λi​, the determinant is their product. Log-determinant is the sum of their logs and is numerically stable.

Complexity Analysis

Core operations on a d-dimensional multivariate Gaussian revolve around linear algebra. A numerically stable approach uses the Cholesky factorization Σ = L L^⊤. Computing the Cholesky factor costs O(d³) time and O(d²) space. Once L is available, many repeated tasks become cheaper: evaluating the log-density for a single point requires one triangular solve (computing y = L^{−1}(x − μ)) and the inner product y^⊤y, which costs O(d²) time; the log-determinant is 2 ∑ log L_ii, available in O(d) after factorization. Thus, for m points with a fixed Σ, total time is O(d³ + m d²).

Sampling k points from N(μ, Σ) can be done by drawing z ∼ N(0, I) and setting x = μ + L z. Each sample requires a matrix–vector multiply with L, costing O(d²), so total time is O(d³ + k d²) including the one-time Cholesky. Memory usage remains O(d²) for L and O(d) per vector.

Estimating μ and Σ from n samples requires O(n d) to compute the mean and O(n d²) to accumulate the covariance (outer products). If needed, a Cholesky factorization of the estimate adds O(d³). In high dimensions (large d) with limited n, Σ may be ill-conditioned or singular; adding a small ridge λI preserves positive definiteness and enables stable O(d³) factorizations. Avoid explicit matrix inversion (also O(d³), but numerically fragile); instead, reuse L to solve linear systems and compute Mahalanobis distances in O(d²).

Code Examples

Stable log-pdf and Mahalanobis distance using Cholesky (no explicit inverse)
#include <bits/stdc++.h>
using namespace std;

// Simple matrix/vector aliases
using Vec = vector<double>;
using Mat = vector<vector<double>>;

// Create an identity matrix of size d
Mat eye(int d) { Mat I(d, Vec(d, 0.0)); for (int i = 0; i < d; ++i) I[i][i] = 1.0; return I; }

// Cholesky decomposition: Sigma = L * L^T for symmetric positive definite Sigma.
// Returns true on success, false if Sigma is not SPD.
bool cholesky(const Mat &Sigma, Mat &L) {
    int d = (int)Sigma.size();
    L.assign(d, Vec(d, 0.0));
    for (int i = 0; i < d; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = Sigma[i][j];
            for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0) return false; // not SPD
                L[i][i] = sqrt(max(sum, 0.0));
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return true;
}

// Solve L * y = b for y (L is lower triangular)
Vec solveLower(const Mat &L, const Vec &b) {
    int d = (int)L.size();
    Vec y(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double sum = b[i];
        for (int k = 0; k < i; ++k) sum -= L[i][k] * y[k];
        y[i] = sum / L[i][i];
    }
    return y;
}

// Solve L^T * x = y for x (L is lower triangular)
Vec solveUpperFromLowerT(const Mat &L, const Vec &y) {
    int d = (int)L.size();
    Vec x(d, 0.0);
    for (int i = d - 1; i >= 0; --i) {
        double sum = y[i];
        for (int k = i + 1; k < d; ++k) sum -= L[k][i] * x[k];
        x[i] = sum / L[i][i];
    }
    return x;
}

// Compute log(det(Sigma)) from its Cholesky factor L: log det = 2 * sum log L_ii
double logDetFromCholesky(const Mat &L) {
    double s = 0.0;
    for (int i = 0; i < (int)L.size(); ++i) s += log(L[i][i]);
    return 2.0 * s;
}

// Mahalanobis distance squared using Cholesky: y = L^{-1}(x-mu); D^2 = y^T y
double mahalanobis2_from_cholesky(const Vec &x, const Vec &mu, const Mat &L) {
    int d = (int)x.size();
    Vec xm(d);
    for (int i = 0; i < d; ++i) xm[i] = x[i] - mu[i];
    Vec y = solveLower(L, xm); // y = L^{-1}(x-mu)
    double d2 = 0.0;
    for (double v : y) d2 += v * v;
    return d2;
}

// Log-pdf of N(mu, Sigma) using precomputed Cholesky L of Sigma
double logpdf_mvn(const Vec &x, const Vec &mu, const Mat &L) {
    int d = (int)x.size();
    double d2 = mahalanobis2_from_cholesky(x, mu, L);
    double logdet = logDetFromCholesky(L);
    const double log2pi = log(2.0 * M_PI);
    return -0.5 * (d * log2pi + logdet + d2);
}

int main() {
    // Example: 2D Gaussian with correlation
    Vec mu = {1.0, -2.0};
    Mat Sigma = {{2.0, 0.8},
                 {0.8, 1.0}}; // SPD if 2*1 - 0.8^2 > 0

    Mat L;
    if (!cholesky(Sigma, L)) {
        cerr << "Covariance is not SPD; consider adding a small ridge to the diagonal.\n";
        return 1;
    }

    Vec x = {0.5, -1.0};

    double d2 = mahalanobis2_from_cholesky(x, mu, L);
    double lp = logpdf_mvn(x, mu, L);

    cout << fixed << setprecision(6);
    cout << "Mahalanobis^2: " << d2 << "\n";
    cout << "Log-pdf: " << lp << "\n";

    // Demonstrate batched log-likelihood for multiple points
    vector<Vec> data = { {0.5, -1.0}, {1.2, -2.1}, {2.0, -2.5} };
    double total_loglik = 0.0;
    for (const auto &xi : data) total_loglik += logpdf_mvn(xi, mu, L);
    cout << "Total log-likelihood of 3 points: " << total_loglik << "\n";
    return 0;
}

This program computes the Mahalanobis distance and the log-pdf of a multivariate Gaussian using a Cholesky factor of the covariance. It avoids forming the inverse by solving triangular systems, which is more stable and efficient. The log-determinant is read directly from the Cholesky diagonal.

Time: O(d³) to factor once, then O(d²) per log-pdf evaluation. Space: O(d²) for the Cholesky factor plus O(d) temporaries.
Sampling from N(μ, Σ) using Cholesky and standard normals
#include <bits/stdc++.h>
using namespace std;

using Vec = vector<double>;
using Mat = vector<vector<double>>;

bool cholesky(const Mat &Sigma, Mat &L) {
    int d = (int)Sigma.size();
    L.assign(d, vector<double>(d, 0.0));
    for (int i = 0; i < d; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = Sigma[i][j];
            for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0) return false;
                L[i][i] = sqrt(max(sum, 0.0));
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return true;
}

Vec matvec_lower(const Mat &L, const Vec &z) {
    int d = (int)L.size();
    Vec x(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double sum = 0.0;
        for (int j = 0; j <= i; ++j) sum += L[i][j] * z[j];
        x[i] = sum;
    }
    return x;
}

int main() {
    // Define a 3D Gaussian
    Vec mu = {0.0, 1.0, -1.0};
    Mat Sigma = {
        {1.0, 0.5, 0.2},
        {0.5, 2.0, 0.3},
        {0.2, 0.3, 1.5}
    };

    Mat L;
    if (!cholesky(Sigma, L)) {
        cerr << "Sigma not SPD. Add ridge: Sigma[i][i] += 1e-6 and retry.\n";
        return 1;
    }

    // Random engine and standard normals
    std::mt19937_64 rng(42);
    std::normal_distribution<double> stdnorm(0.0, 1.0);

    auto sample = [&](void) {
        int d = (int)mu.size();
        Vec z(d);
        for (int i = 0; i < d; ++i) z[i] = stdnorm(rng); // z ~ N(0, I)
        Vec Lz = matvec_lower(L, z);                     // L z
        Vec x(d);
        for (int i = 0; i < d; ++i) x[i] = mu[i] + Lz[i]; // x = mu + L z
        return x;
    };

    // Draw and print 5 samples
    cout << fixed << setprecision(5);
    for (int k = 0; k < 5; ++k) {
        Vec x = sample();
        cout << "sample " << k << ": ";
        for (int i = 0; i < (int)x.size(); ++i) cout << x[i] << (i + 1 == (int)x.size() ? "\n" : " ");
    }

    return 0;
}

This program samples from a multivariate Gaussian by factoring Σ = L L^T and transforming standard normal draws z via x = μ + L z. Using Cholesky ensures the correct covariance and numerical stability.

Time: O(d³) preprocessing for Cholesky, then O(d²) per sample. Space: O(d²) for storing L and O(d) per vector.
Fit μ and Σ by MLE from data with ridge regularization, then evaluate log-likelihood
#include <bits/stdc++.h>
using namespace std;

using Vec = vector<double>;
using Mat = vector<vector<double>>;

Mat zeros(int r, int c) { return Mat(r, Vec(c, 0.0)); }

bool cholesky(const Mat &Sigma, Mat &L) {
    int d = (int)Sigma.size();
    L.assign(d, Vec(d, 0.0));
    for (int i = 0; i < d; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = Sigma[i][j];
            for (int k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0) return false;
                L[i][i] = sqrt(max(sum, 0.0));
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return true;
}

Vec solveLower(const Mat &L, const Vec &b) {
    int d = (int)L.size();
    Vec y(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double s = b[i];
        for (int k = 0; k < i; ++k) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }
    return y;
}

double logDetFromCholesky(const Mat &L) {
    double s = 0.0;
    for (int i = 0; i < (int)L.size(); ++i) s += log(L[i][i]);
    return 2.0 * s;
}

double logpdf_mvn(const Vec &x, const Vec &mu, const Mat &L) {
    int d = (int)x.size();
    Vec xm(d);
    for (int i = 0; i < d; ++i) xm[i] = x[i] - mu[i];
    Vec y = solveLower(L, xm);
    double d2 = 0.0;
    for (double v : y) d2 += v * v;
    double logdet = logDetFromCholesky(L);
    return -0.5 * (d * log(2.0 * M_PI) + logdet + d2);
}

// Fit mean (MLE) and covariance (MLE) with optional ridge lambda on the diagonal
pair<Vec, Mat> fit_gaussian_mle(const vector<Vec> &data, double ridge_lambda = 0.0) {
    int n = (int)data.size();
    int d = (int)data[0].size();
    // Mean
    Vec mu(d, 0.0);
    for (const auto &x : data) for (int j = 0; j < d; ++j) mu[j] += x[j];
    for (int j = 0; j < d; ++j) mu[j] /= n;
    // Covariance MLE (1/n)
    Mat S = zeros(d, d);
    for (const auto &x : data) {
        for (int i = 0; i < d; ++i) {
            double xi = x[i] - mu[i];
            for (int j = 0; j <= i; ++j) {
                double xj = x[j] - mu[j];
                S[i][j] += xi * xj;
            }
        }
    }
    for (int i = 0; i < d; ++i) {
        for (int j = 0; j <= i; ++j) { S[i][j] /= n; S[j][i] = S[i][j]; }
    }
    // Ridge regularization
    for (int i = 0; i < d; ++i) S[i][i] += ridge_lambda;
    return {mu, S};
}

int main() {
    // Synthetic 2D dataset (e.g., two correlated features)
    vector<Vec> data = {
        {2.1, 1.0}, {1.9, 1.2}, {2.0, 1.1}, {2.2, 0.9}, {1.8, 1.3},
        {2.3, 0.8}, {1.7, 1.4}, {2.1, 1.0}, {2.0, 0.95}, {2.2, 1.05}
    };

    // Fit parameters with a small ridge for numerical safety
    double lambda = 1e-6;
    auto [mu, Sigma] = fit_gaussian_mle(data, lambda);

    // Factor for fast log-likelihood
    Mat L;
    if (!cholesky(Sigma, L)) { cerr << "Failed Cholesky; increase ridge.\n"; return 1; }

    // Compute average log-likelihood on the training data
    double sum_loglik = 0.0;
    for (const auto &x : data) sum_loglik += logpdf_mvn(x, mu, L);
    double avg_loglik = sum_loglik / (double)data.size();

    cout << fixed << setprecision(6);
    cout << "Estimated mu: [" << mu[0] << ", " << mu[1] << "]\n";
    cout << "Estimated Sigma:\n";
    cout << "[" << Sigma[0][0] << ", " << Sigma[0][1] << "]\n";
    cout << "[" << Sigma[1][0] << ", " << Sigma[1][1] << "]\n";
    cout << "Average log-likelihood: " << avg_loglik << "\n";

    // Score a new point
    Vec x_new = {2.4, 0.7};
    double lp_new = logpdf_mvn(x_new, mu, L);
    cout << "Log-pdf of new point: " << lp_new << "\n";
    return 0;
}

This program estimates the mean and covariance by maximum likelihood from data, adds a small ridge to ensure positive definiteness, factors the covariance via Cholesky, and evaluates average log-likelihood and the log-pdf of a new point. It demonstrates end-to-end modeling and scoring.

Time: O(n d²) to accumulate covariance, O(d³) for Cholesky, O(d²) per log-pdf. Space: O(d²) for covariance and Cholesky, O(n d) for data storage.
Tags: multivariate normal · gaussian distribution · covariance matrix · cholesky decomposition · mahalanobis distance · log likelihood · pdf evaluation · sampling · precision matrix · regularization · kalman filter · qda · lda · matrix determinant