
Hamiltonian Monte Carlo (HMC)

Key Points

  • Hamiltonian Monte Carlo (HMC) uses gradients of the log-density to propose long-distance moves that still land in high-probability regions.
  • It treats sampling as simulating a frictionless physical system with position (parameters) and momentum (auxiliary variables).
  • The leapfrog integrator makes proposals that are reversible and volume-preserving, enabling a Metropolis correction with high acceptance rates.
  • Per-iteration cost is proportional to the number of leapfrog steps times the cost of computing the gradient of the log-density.
  • HMC excels in moderate-to-high dimensions and for strongly correlated posteriors where random-walk MCMC struggles.
  • Proper tuning of step size, number of steps, and mass matrix dramatically affects performance and acceptance rate.
  • HMC is not suitable for discrete or non-differentiable parameters because it requires gradients.
  • NUTS (No-U-Turn Sampler) builds on HMC by automatically choosing the trajectory length, reducing manual tuning.

Prerequisites

  • Multivariable calculus (gradients) — HMC requires computing gradients of the log-density and understanding how they guide motion.
  • Linear algebra (vectors, matrices) — Leapfrog updates and mass matrices require basic vector/matrix operations.
  • Probability and Bayesian inference — Understanding target distributions, posteriors, priors, and likelihoods is essential.
  • Markov Chain Monte Carlo (MCMC) basics — HMC is an MCMC method; knowledge of Metropolis-Hastings and detailed balance helps.
  • Numerical integration of ODEs — HMC uses symplectic integrators (leapfrog) to approximate Hamiltonian dynamics.
  • C++ programming and random number generation — Implementing samplers requires writing efficient, reproducible code.
  • Data preprocessing and scaling — Standardization improves the geometry for HMC and allows larger stable step sizes.

Detailed Explanation


01 Overview

Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo (MCMC) algorithm that uses gradient information to explore probability distributions efficiently. Instead of taking short, random steps (like a random walk), HMC simulates the motion of a particle moving through a landscape defined by the negative log probability (potential energy). By introducing auxiliary momentum variables and integrating Hamilton’s equations with a symplectic integrator (typically leapfrog), HMC proposes distant states that still have high probability, thereby reducing autocorrelation and improving mixing.

The core idea is to alternate between sampling a fresh momentum from a Gaussian distribution and then following a nearly energy-conserving path through the parameter space for several steps before applying a Metropolis acceptance test. Because the leapfrog method is time-reversible and volume-preserving, the proposals are accepted with relatively high probability if the step size is tuned properly. This makes HMC particularly powerful in higher dimensions and for posteriors with strong correlations where random-walk proposals are inefficient.

HMC requires the gradient of the log target density, which restricts its use to differentiable models but yields large gains when available. Practical performance depends on selecting the step size, number of leapfrog steps, and an appropriate mass matrix that matches the geometry of the target. Variants like the No-U-Turn Sampler (NUTS) automate trajectory length selection and often include adaptation of both step size and mass matrix during warm-up.

02 Intuition & Analogies

Imagine hiking across a mountainous landscape where altitude equals negative log probability: valleys are high-probability regions, and peaks are low-probability regions. A basic random-walk sampler takes small, jittery steps in random directions; it gets stuck meandering in valleys, repeatedly proposing moves uphill that are frequently rejected. Progress is slow and inefficient.

HMC gives you a map with slope directions (the gradient) and a way to build momentum so you can glide along the valley floor. You start by giving yourself a push (sample momentum), then follow the terrain’s shape using the gradient to steer. Because you move with momentum, you travel longer distances along contours of nearly constant altitude, visiting many plausible points before losing speed. This produces proposals that are far apart but still land in promising areas, leading to faster exploration and lower autocorrelation.

The leapfrog integrator is like a careful sequence of mini-steps that keeps total energy nearly constant as you move. Reversibility (you can exactly retrace your path backward) and volume preservation (you don’t squish or stretch space) ensure that, after your flight, a simple Metropolis check can correct for small numerical errors. If the energy barely changes, your proposal is accepted most of the time. Intuitively, HMC replaces clumsy, uphill-prone stumbling with smooth, physics-informed gliding that respects the landscape’s geometry.

03 Formal Definition

Let \(\pi(x)\) be a target density on \(\mathbb{R}^d\), assumed differentiable. Define the potential energy \(U(x) = -\log \pi(x)\) and introduce an auxiliary momentum \(p \in \mathbb{R}^d\) with kinetic energy \(K(p) = \frac{1}{2} p^\top M^{-1} p\), where \(M\) is a symmetric positive definite mass matrix. The joint density over \((x,p)\) is proportional to \(\exp(-H(x,p))\) with Hamiltonian \(H(x,p) = U(x) + K(p)\). Sampling \((x,p)\) from this joint and marginalizing out \(p\) leaves \(x\) distributed according to \(\pi\). A single HMC transition proceeds by: (1) resampling \(p \sim N(0, M)\); (2) approximately integrating Hamilton's equations \(\dot{x} = \nabla_p H = M^{-1} p\), \(\dot{p} = -\nabla_x H = -\nabla U(x)\) for \(L\) steps with step size \(\varepsilon\) using a symplectic integrator (leapfrog), yielding a proposal \((x', p')\); and (3) performing a Metropolis accept/reject with probability \(\alpha = \min\{1, \exp(-H(x',p') + H(x,p))\}\). If accepted, set \(x \leftarrow x'\); otherwise keep the old state. The leapfrog method updates as \(p_{t+\varepsilon/2} = p_t - \frac{\varepsilon}{2} \nabla U(x_t)\), \(x_{t+\varepsilon} = x_t + \varepsilon M^{-1} p_{t+\varepsilon/2}\), and \(p_{t+\varepsilon} = p_{t+\varepsilon/2} - \frac{\varepsilon}{2} \nabla U(x_{t+\varepsilon})\). This scheme is reversible and volume-preserving, ensuring detailed balance when combined with the Metropolis correction.

04 When to Use

  • Continuous, differentiable targets: Use HMC when your log-density is differentiable and gradients are available or cheaply computable (e.g., models with smooth priors/likelihoods).
  • Moderate-to-high dimensions: HMC shines as dimension grows, where random-walk proposals degrade severely. It manages correlated, elongated posteriors well by moving along level sets.
  • Complex posteriors with strong correlations: Hierarchical Bayesian models or logistic/Poisson regression with informative priors benefit significantly.
  • Expensive likelihoods where fewer, higher-quality proposals are better than many cheap, low-quality ones.
  • When you can precondition: If you can estimate scaling/rotation (mass matrix) from warm-up, HMC performance improves dramatically.

Avoid or modify when: parameters are discrete or piecewise constant (no gradients), the log-density is very rough or non-differentiable, or you cannot compute/approximate gradients. If hand-tuning the trajectory length L is difficult, prefer NUTS, which adapts it automatically. If curvature varies widely, consider Riemannian HMC or adapt a dense mass matrix. For extremely multimodal targets with distant, isolated modes, HMC can still struggle to jump between modes; tempered methods may help.

⚠️Common Mistakes

  • Wrong sign for gradients: In HMC, the potential is U(x) = −log π(x), so the leapfrog momentum update subtracts (ε/2) ∇U, which is the same as adding (ε/2) ∇log π. Mixing these signs leads to poor acceptance or divergence. Define one convention and stick to it.
  • Step size too large: Large ε causes energy drift and rejections; too small wastes computation. Aim for an average acceptance around 60–90%, often near 65–80% in higher dimensions.
  • Too few or too many leapfrog steps: Too few makes proposals local (like a random walk); too many can loop back (wasting computation) or accumulate numerical error. NUTS mitigates this automatically.
  • Ignoring the mass matrix: Using M = I for anisotropic targets forces tiny stable step sizes. Estimate a diagonal or dense M from warm-up to match posterior scales/correlations.
  • Not centering/scaling parameters: Poorly scaled parameters amplify stiffness, forcing tiny ε. Standardize predictors and reparameterize when possible.
  • Using non-symplectic integrators: Forward Euler breaks reversibility/volume preservation, biasing the sampler even with a Metropolis correction. Use leapfrog or higher-order symplectic schemes.
  • Handling momentum inconsistently at accept/reject: Ensure the accept/reject step uses the correct Hamiltonian and that proposals are constructed reversibly (conventionally the proposal momentum is negated, which is harmless since K is symmetric).
  • Insufficient warm-up: Without step size and mass adaptation, chains can diverge or mix poorly. Use dual averaging for ε and estimate M during warm-up.
  • Miscomputing the logistic regression gradient: Remember ∇U = −∇log π. The sigmoid and prior terms must enter with the correct signs.

Key Formulas

Potential Energy

U(x) = −log π(x)

Explanation: Defines the energy landscape from the target density. Lower U corresponds to higher probability regions; HMC moves along this landscape.

Kinetic Energy

K(p) = ½ pᵀ M⁻¹ p

Explanation: Relates momentum to energy via the mass matrix. With M = I, it reduces to half the squared Euclidean norm of p.

Hamiltonian

H(x, p) = U(x) + K(p)

Explanation: Total energy used in the Metropolis correction. Leapfrog approximately conserves H, so proposals are often accepted.

Hamilton’s Equations

ẋ = ∇_p H(x, p) = M⁻¹ p,   ṗ = −∇_x H(x, p) = −∇U(x)

Explanation: Continuous-time dynamics that HMC discretizes. They define how position and momentum evolve along the trajectory.

Leapfrog Half-Step (Momentum)

p_{t+ε/2} = p_t − (ε/2) ∇U(x_t)

Explanation: First half-step update of momentum using the gradient of the potential. It keeps the integrator time-reversible and stable.

Leapfrog Full-Step (Position)

x_{t+ε} = x_t + ε M⁻¹ p_{t+ε/2}

Explanation: Update position using the intermediate momentum. This is the central position update of leapfrog.

Leapfrog Half-Step (Momentum 2)

p_{t+ε} = p_{t+ε/2} − (ε/2) ∇U(x_{t+ε})

Explanation: Second half-step update of momentum using the new position. Together, the three updates form a reversible, volume-preserving step.

Metropolis Acceptance

α = min{1, exp(−H(x′, p′) + H(x, p))}

Explanation: Accept or reject the proposal based on the change in total energy. Small integration error (small ΔH) leads to high acceptance.

Logistic Sigmoid

σ(z) = 1 / (1 + e⁻ᶻ)

Explanation: Maps real numbers to (0,1); used in logistic regression likelihoods. Its derivative is σ(z)(1 − σ(z)).

Logistic Regression Potential Gradient

∇U(w) = −∇log π(w) = −Σᵢ₌₁ⁿ (yᵢ − σ(xᵢᵀ w)) xᵢ + (1/σ₀²) w

Explanation: For a Gaussian prior w ∼ N(0, σ₀² I), the gradient of the potential includes the data-fit term and the prior penalty term.

Per-Iteration Complexity

Cost per iteration ≈ L · C∇,  where C∇ = O(d) to O(nd)

Explanation: Each leapfrog step requires a gradient. For simple targets C_∇ is O(d); for generalized linear models it’s O(nd).

Complexity Analysis

Per HMC iteration, the dominant work is computing gradients of the potential U(x) for each leapfrog step. If L is the number of leapfrog steps and C∇ is the cost of one gradient evaluation, the time per iteration is O(L · C∇). For a multivariate Gaussian target, the gradient is Σ⁻¹(x − μ): with a diagonal precision this costs O(d), while a dense precomputed Σ⁻¹ costs O(d²) per matrix–vector product. For Bayesian logistic regression with n data points and d features, evaluating the gradient requires n inner products and sums, so C∇ = O(nd) and the iteration cost becomes O(L n d). The leapfrog integrator is second-order accurate: the local energy error per step is O(ε³), and over L steps the accumulated energy error is roughly O(L ε³), which drives the Metropolis rejection rate. To maintain a desired acceptance probability as dimension grows, the step size ε often must decrease; preconditioning with a good mass matrix M can restore larger stable ε and reduce L.

Space complexity is modest. Storing the current position x and momentum p takes O(d). For models with data, memory is dominated by the dataset, O(nd), plus any precomputed structures (e.g., a covariance inverse for Gaussians). Temporary workspaces for gradients and intermediate leapfrog states are O(d). If a dense mass matrix is used, storing M and possibly its Cholesky factor adds O(d²). In practice a diagonal M is common, keeping extra memory at O(d). Overall, HMC is memory-light compared to many optimization routines, but time can be substantial if gradients are expensive or many leapfrog steps are needed.

Code Examples

HMC sampler for a 2D correlated Gaussian
#include <bits/stdc++.h>
using namespace std;

struct HMC {
    int d;          // dimension
    double eps;     // step size
    int L;          // number of leapfrog steps
    mt19937 rng;
    normal_distribution<double> stdnorm{0.0, 1.0};
    uniform_real_distribution<double> unif{0.0, 1.0};

    vector<double> mu;                  // mean
    vector<vector<double>> invSigma;    // inverse covariance (precomputed)

    HMC(int dim, double step, int steps, const vector<double>& mu_, const vector<vector<double>>& invS)
        : d(dim), eps(step), L(steps), mu(mu_), invSigma(invS) {
        random_device rd; rng.seed(rd());
    }

    // Vector utilities
    static vector<double> add(const vector<double>& a, const vector<double>& b) {
        vector<double> c(a.size());
        for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
        return c;
    }

    static vector<double> sub(const vector<double>& a, const vector<double>& b) {
        vector<double> c(a.size());
        for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] - b[i];
        return c;
    }

    static vector<double> scal(double s, const vector<double>& a) {
        vector<double> c(a.size());
        for (size_t i = 0; i < a.size(); ++i) c[i] = s * a[i];
        return c;
    }

    static double dot(const vector<double>& a, const vector<double>& b) {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x) const {
        vector<double> y(A.size(), 0.0);
        for (size_t i = 0; i < A.size(); ++i) {
            double s = 0.0;
            for (size_t j = 0; j < A[i].size(); ++j) s += A[i][j] * x[j];
            y[i] = s;
        }
        return y;
    }

    // Potential U(x) = 0.5 * (x-mu)^T invSigma (x-mu) (constant dropped)
    double U(const vector<double>& x) const {
        vector<double> xm = sub(x, mu);
        vector<double> t = matvec(invSigma, xm);
        return 0.5 * dot(xm, t);
    }

    // Gradient of U: grad_U = invSigma * (x - mu)
    vector<double> gradU(const vector<double>& x) const {
        vector<double> xm = sub(x, mu);
        return matvec(invSigma, xm);
    }

    // Single HMC transition from state x0
    vector<double> step(const vector<double>& x0, bool& accepted, double& dH) {
        // Sample momentum p ~ N(0, I)
        vector<double> p0(d);
        for (int i = 0; i < d; ++i) p0[i] = stdnorm(rng);

        // Cache initial energy
        double H0 = U(x0) + 0.5 * dot(p0, p0);

        // Initialize position and momentum
        vector<double> x = x0;
        vector<double> p = p0;

        // Leapfrog integration: initial half-step for momentum
        vector<double> g = gradU(x);
        for (int i = 0; i < d; ++i) p[i] -= 0.5 * eps * g[i];
        // L full position steps, with full momentum steps in between
        for (int t = 0; t < L; ++t) {
            // Full position step (M = I)
            for (int i = 0; i < d; ++i) x[i] += eps * p[i];
            g = gradU(x);
            // Full momentum step, except after the last position step
            if (t != L - 1) {
                for (int i = 0; i < d; ++i) p[i] -= eps * g[i];
            }
        }
        // Final half-step for momentum
        for (int i = 0; i < d; ++i) p[i] -= 0.5 * eps * g[i];

        // Compute proposed energy
        double H1 = U(x) + 0.5 * dot(p, p);
        dH = H1 - H0;

        double loga = -dH;  // log acceptance ratio
        if (log(unif(rng)) < loga) { accepted = true; return x; }
        accepted = false;
        return x0;
    }
};

int main() {
    // Define a 2D Gaussian with mean [0,0] and correlation 0.8
    int d = 2;
    vector<double> mu = {0.0, 0.0};
    double rho = 0.8, s1 = 1.0, s2 = 0.5;
    // Sigma = [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]
    double s11 = s1*s1, s22 = s2*s2, s12 = rho*s1*s2;
    double det = s11*s22 - s12*s12;
    vector<vector<double>> invSigma = {{ s22/det, -s12/det }, { -s12/det, s11/det }};

    // HMC parameters
    double eps = 0.05;  // step size
    int L = 25;         // leapfrog steps

    HMC hmc(d, eps, L, mu, invSigma);

    // Initial state
    vector<double> x = {2.0, -2.0};

    // Run sampler
    int n_iter = 3000;
    int burn = 500;
    int accepted_count = 0;

    cout << fixed << setprecision(4);
    for (int it = 0; it < n_iter; ++it) {
        bool acc; double dH;
        x = hmc.step(x, acc, dH);
        accepted_count += acc ? 1 : 0;
        if (it % 500 == 0) {
            cout << "iter=" << it << ", x=[" << x[0] << ", " << x[1] << "]";
            cout << ", accepted=" << acc << ", dH=" << dH << "\n";
        }
        if (it >= burn && it % 200 == 0) {
            // Print a posterior draw occasionally after burn-in
            cout << "post-sample: " << x[0] << ", " << x[1] << "\n";
        }
    }
    cout << "Acceptance rate: " << (double)accepted_count / n_iter << "\n";
    return 0;
}

This program samples from a 2D correlated Gaussian using HMC. The potential U and its gradient are analytic and cheap: U(x) = 0.5 (x-μ)^T Σ^{-1} (x-μ), ∇U = Σ^{-1}(x-μ). We perform L leapfrog steps with step size ε, compute the Hamiltonian difference, and accept or reject the proposal. Because M = I and the target is Gaussian, well-tuned ε and L yield high acceptance and long, efficient moves along the correlated directions.

Time: O(L · d²) per iteration (the gradient is a matrix–vector product with the precomputed dense Σ⁻¹; for d = 2 this is effectively constant). Space: O(d) for state and momentum; O(d²) for storing Σ⁻¹.
HMC for Bayesian logistic regression (Gaussian prior)
#include <bits/stdc++.h>
using namespace std;

struct LogisticHMC {
    int n, d;
    double eps;     // step size
    int L;          // leapfrog steps
    double sigma0;  // prior std for weights

    vector<vector<double>> X;  // n x d
    vector<int> y;             // n labels in {0,1}

    mt19937 rng;
    normal_distribution<double> stdnorm{0.0, 1.0};
    uniform_real_distribution<double> unif{0.0, 1.0};

    LogisticHMC(const vector<vector<double>>& X_, const vector<int>& y_, double eps_, int L_, double sigma0_)
        : n((int)X_.size()), d((int)X_[0].size()), eps(eps_), L(L_), sigma0(sigma0_), X(X_), y(y_) {
        random_device rd; rng.seed(rd());
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

    // U(w) = -log posterior = -sum_i (y_i * x_i^T w - log(1 + exp(x_i^T w))) + (1/(2 sigma0^2)) ||w||^2
    double U(const vector<double>& w) const {
        double loss = 0.0;
        for (int i = 0; i < n; ++i) {
            double z = 0.0;
            for (int j = 0; j < d; ++j) z += X[i][j] * w[j];
            // log(1 + exp(z)) computed stably via log1p
            double log1pexp = (z > 0) ? z + log1p(exp(-z)) : log1p(exp(z));
            loss += -(y[i] * z - log1pexp);
        }
        double prior = 0.0;
        for (int j = 0; j < d; ++j) prior += 0.5 * (w[j] * w[j]) / (sigma0 * sigma0);
        return loss + prior;
    }

    // grad U(w) = -sum_i (y_i - sigmoid(x_i^T w)) x_i + (1/sigma0^2) w
    vector<double> gradU(const vector<double>& w) const {
        vector<double> g(d, 0.0);
        for (int i = 0; i < n; ++i) {
            double z = 0.0;
            for (int j = 0; j < d; ++j) z += X[i][j] * w[j];
            double p = sigmoid(z);
            double coeff = p - (double)y[i];  // derivative of the negative log-likelihood
            for (int j = 0; j < d; ++j) g[j] += coeff * X[i][j];
        }
        double invs2 = 1.0 / (sigma0 * sigma0);
        for (int j = 0; j < d; ++j) g[j] += invs2 * w[j];
        return g;
    }

    // One HMC step
    vector<double> step(const vector<double>& w0, bool& accepted, double& dH) {
        vector<double> p0(d);
        for (int j = 0; j < d; ++j) p0[j] = stdnorm(rng);  // M = I

        double H0 = U(w0) + 0.5 * inner_product(p0.begin(), p0.end(), p0.begin(), 0.0);

        vector<double> w = w0;
        vector<double> p = p0;

        // Half-step momentum
        vector<double> g = gradU(w);
        for (int j = 0; j < d; ++j) p[j] -= 0.5 * eps * g[j];

        // Leapfrog steps
        for (int t = 0; t < L; ++t) {
            for (int j = 0; j < d; ++j) w[j] += eps * p[j];
            g = gradU(w);
            if (t != L - 1) {
                for (int j = 0; j < d; ++j) p[j] -= eps * g[j];
            }
        }
        // Final half-step momentum
        for (int j = 0; j < d; ++j) p[j] -= 0.5 * eps * g[j];

        double H1 = U(w) + 0.5 * inner_product(p.begin(), p.end(), p.begin(), 0.0);
        dH = H1 - H0;
        if (log(unif(rng)) < -dH) { accepted = true; return w; }
        accepted = false;
        return w0;
    }
};

// Generate a simple synthetic dataset for logistic regression
void make_dataset(int n, int d, vector<vector<double>>& X, vector<int>& y, mt19937& rng) {
    normal_distribution<double> N01(0.0, 1.0);
    vector<double> w_true(d, 0.0);
    for (int j = 0; j < d; ++j) w_true[j] = (j % 2 == 0 ? 1.5 : -0.8);  // true weights

    X.assign(n, vector<double>(d));
    y.assign(n, 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) X[i][j] = N01(rng);
        double z = 0.0;
        for (int j = 0; j < d; ++j) z += X[i][j] * w_true[j];
        double p = 1.0 / (1.0 + exp(-z));
        bernoulli_distribution Bern(p);
        y[i] = Bern(rng);
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Create synthetic data
    mt19937 rng(42);
    int n = 300, d = 3;
    vector<vector<double>> X;
    vector<int> y;
    make_dataset(n, d, X, y, rng);

    // Standardize columns (improves HMC stability)
    for (int j = 0; j < d; ++j) {
        double mean = 0.0;
        for (int i = 0; i < n; ++i) mean += X[i][j];
        mean /= n;
        double var = 0.0;
        for (int i = 0; i < n; ++i) { double v = X[i][j] - mean; var += v * v; }
        var /= n;
        double sd = sqrt(var) + 1e-9;
        for (int i = 0; i < n; ++i) X[i][j] = (X[i][j] - mean) / sd;
    }

    double eps = 0.01;    // step size (tune for your data)
    int L = 50;           // leapfrog steps
    double sigma0 = 5.0;  // prior std

    LogisticHMC hmc(X, y, eps, L, sigma0);

    vector<double> w(d, 0.0);  // initial weights
    int n_iter = 2000, burn = 500;
    int accepted = 0;

    cout << fixed << setprecision(4);
    for (int it = 0; it < n_iter; ++it) {
        bool acc; double dH;
        w = hmc.step(w, acc, dH);
        accepted += acc ? 1 : 0;
        if (it % 200 == 0) {
            cout << "iter=" << it << ", dH=" << dH << ", acc=" << acc << ", w=[";
            for (int j = 0; j < d; ++j) cout << w[j] << (j + 1 < d ? ", " : "");
            cout << "]\n";
        }
        if (it >= burn && it % 200 == 0) {
            cout << "posterior draw: ";
            for (int j = 0; j < d; ++j) cout << w[j] << (j + 1 < d ? ", " : "\n");
        }
    }
    cout << "Acceptance rate: " << (double)accepted / n_iter << "\n";
    return 0;
}

This example performs HMC for Bayesian logistic regression with a Gaussian prior on weights. The potential U is the negative log posterior, and gradU is its gradient. We standardize features to improve geometry, use M = I, and run leapfrog steps to propose new weights. The acceptance probability corrects for integration error. This shows how HMC handles data-dependent models where each gradient evaluation costs O(n d).

Time: O(L · n · d) per iteration due to the gradient over all data points. Space: O(n · d) for the dataset, O(d) for parameters and momentum.
#hamiltonian monte carlo#hmc#mcmc#leapfrog integrator#bayesian inference#logistic regression#mass matrix#no-u-turn sampler#gradient-based sampling#metropolis acceptance#symplectic integrator#potential energy#kinetic energy#posterior sampling