Langevin Dynamics & Score-Based Sampling
Key Points
- •Langevin dynamics is a noisy gradient-ascent method that moves particles toward high probability regions while adding Gaussian noise to ensure proper exploration.
- •Unadjusted Langevin (ULA) uses the update x_{t+1} = x_t + \eta \nabla \log p(x_t) + \sqrt{2\eta}\,\epsilon_t and targets p approximately; MALA adds a Metropolis–Hastings accept step to obtain exact samples from p.
- •Score-based sampling replaces the true score \nabla \log p(x) with a learned score s_\theta that estimates gradients of log-densities at different noise levels.
- •Annealed Langevin sampling runs Langevin steps from large to small noise scales, gradually sharpening samples toward the data distribution.
- •In practice, stability depends heavily on the step size \eta, preconditioning, and whether you include a Metropolis correction.
- •Computational cost per step is dominated by evaluating the score/gradient; MALA roughly doubles this cost due to proposal density evaluations.
- •SGLD uses stochastic gradients and the same noise scaling as ULA, enabling scalable Bayesian posterior sampling from minibatches.
- •C++ implementations require careful random number handling, numerical stability, and consistent noise scaling.
Prerequisites
- →Multivariable calculus (gradients, Jacobians) — Computing and interpreting the score \nabla \log p(x) requires gradient operations.
- →Probability and Gaussian distributions — Understanding log-densities, Gaussian noise, and mixture models is essential for Langevin updates and proposal densities.
- →Markov chains and Metropolis–Hastings — MALA’s correctness and acceptance rules rely on MH theory and detailed balance.
- →Stochastic differential equations (basics) — Langevin dynamics originates from an SDE; discretization and stability follow from SDE numerics.
- →Numerical methods (Euler–Maruyama) — ULA is a specific discretization; tuning step sizes relates to numerical stability.
- →Neural networks (for score-based models) — Learning s_\theta requires understanding training objectives like denoising score matching.
- →Random number generation — Implementing Gaussian and uniform RNGs correctly is critical for Langevin noise and acceptance tests.
- →Bayesian inference (posterior, priors, likelihood) — Common application of Langevin methods is sampling from posteriors with gradient-accessible log-likelihoods.
Detailed Explanation
01 Overview
Langevin dynamics is a way to sample from a probability distribution by combining gradient information (which points toward higher probability regions) with random noise (which keeps the process exploring). Imagine you want to draw samples from a complicated density p(x) in high dimensions. Classic random-walk Metropolis can be slow because it ignores the shape of p. Langevin dynamics instead follows the gradient of the log-density, often called the score, to move more intelligently. The continuous-time formulation is an SDE whose stationary distribution is p. A simple time-discretization gives the Unadjusted Langevin Algorithm (ULA), which is easy to implement but introduces discretization bias. Adding a Metropolis–Hastings correction yields MALA, which removes this bias at the cost of more computation. In modern machine learning, score-based generative models learn the score function with a neural network and then use annealed Langevin sampling to synthesize data. This connects Bayesian sampling, energy-based modeling, and diffusion-style generation. Practically, you choose a step size, compute a score/gradient at the current point, add Gaussian noise scaled by the step, and iterate. You also manage burn-in and thinning to reduce correlation between samples. Langevin methods are powerful when gradients are available or can be learned, and they scale better than gradient-free MCMC in high dimensions.
02 Intuition & Analogies
Suppose you are hiking in fog toward the most likely places to find water (high probability). If you only walk uphill using the steepest ascent (pure gradient), you’ll converge to a peak and stop exploring, missing other lakes. If you wander completely randomly (random walk), you might eventually find lakes but very slowly. Langevin dynamics blends these: you mostly head uphill (following the score, the gradient of log probability) but with a calibrated jitter that keeps you exploring. The size of your steps is like the step size \eta; the jitter’s intensity is scaled as \sqrt{2\eta}. With small \eta, your motion better approximates a physical process (a particle buffeted by thermal noise) whose long-run position frequency matches p(x). Now picture a blurred photograph that becomes sharper as you clean your glasses. Score-based sampling uses a learned guide (the score network) that tells you, for a given blur level, which direction to nudge pixels to look more like real photos. You start with a lot of blur (large noise), make many Langevin-like adjustments (following the score plus noise), then reduce blur gradually (annealing), refining details at each level. This staircase of blurs prevents you from getting stuck and helps you home in on realistic images. In Bayesian inference, think of the score as a tug from the posterior density; the noise prevents overconfidence and maintains proper uncertainty. The Metropolis correction in MALA is like a quality-check that sometimes rejects steps that would otherwise miscalibrate the distribution due to discretization errors.
03 Formal Definition
Langevin dynamics is the diffusion dX_t = \nabla \log p(X_t)\,dt + \sqrt{2}\,dW_t, where W_t is standard Brownian motion; under mild regularity conditions, p is its stationary distribution. The Unadjusted Langevin Algorithm (ULA) is the Euler–Maruyama discretization x_{t+1} = x_t + \eta \nabla \log p(x_t) + \sqrt{2\eta}\,\epsilon_t with \epsilon_t \sim N(0, I), which targets p only approximately for \eta > 0. MALA uses this update as a proposal q(y|x) = N(x + \eta \nabla \log p(x),\ 2\eta I) and accepts with probability \min\{1,\ p(y)q(x|y)\,/\,p(x)q(y|x)\}, restoring p as the exact stationary distribution.
04 When to Use
Use Langevin dynamics when you can compute (or approximate) gradients of the log-density. Typical settings include Bayesian posterior sampling where the log posterior’s gradient is available up to a constant, energy-based models where p(x) \propto \exp(-U(x)) and \nabla \log p(x) = -\nabla U(x), and high-dimensional problems where random-walk proposals mix poorly. Choose ULA for quick approximate sampling when small bias is acceptable and you want simplicity. Choose MALA when exactness matters or when larger steps are desired while maintaining correctness. Use SGLD when datasets are large and you can only afford minibatch gradients; the injected noise compensates for stochastic gradient noise. Use score-based sampling (annealed Langevin) when you have a learned score network s_\theta that estimates gradients across noise levels, as in modern generative modeling (images, audio, 3D). It is preferred when you can’t write p(x) explicitly but can train s_\theta via denoising score matching. Avoid plain Langevin if gradients are unavailable or extremely noisy without control variates, or when the target is multimodal with narrow modes and you cannot tune steps/temperatures. In very complex geometries with strong correlations, Hamiltonian Monte Carlo or Riemannian preconditioning may be more efficient.
⚠️Common Mistakes
- Wrong noise scaling: using \sqrt{\eta} instead of \sqrt{2\eta} breaks the stationary distribution. Always ensure noise standard deviation is \sqrt{2\eta}.
- Sign errors: if p(x) \propto \exp(-U(x)), then \nabla \log p(x) = -\nabla U(x). Using +\nabla U(x) causes divergence.
- Step size too large: large \eta leads to instability and low acceptance in MALA. Start small, increase cautiously, and monitor acceptance rates (aim roughly 50–70% in moderate d).
- Ignoring burn-in and autocorrelation: early samples haven’t mixed; later samples can be highly correlated. Discard burn-in and thin if necessary.
- Forgetting Metropolis correction for exactness: ULA is biased; use MALA if unbiased stationary sampling is required.
- Inconsistent random seeds across threads: correlated noise can bias results; use independent RNG streams.
- Poor preconditioning: isotropic steps in highly anisotropic targets cause slow mixing. Use a mass/preconditioner matrix approximating the target’s covariance or Hessian.
- For score-based sampling: mismatch between noise schedule and step sizes can cause oversmoothing or artifacts; use steps proportional to \sigma^2 and enough iterations per level.
Key Formulas
Langevin SDE
dX_t = \nabla \log p(X_t)\, dt + \sqrt{2}\, dW_t
Explanation: This stochastic differential equation has p as a stationary distribution under mild conditions. The drift follows the score and the diffusion adds isotropic Gaussian noise.
Unadjusted Langevin (ULA)
x_{t+1} = x_t + \eta \nabla \log p(x_t) + \sqrt{2\eta}\, \epsilon_t, \quad \epsilon_t \sim N(0, I)
Explanation: Euler–Maruyama discretization of Langevin diffusion. Small step size reduces bias but increases computation.
MALA Acceptance Probability
\alpha(x, y) = \min\left\{1,\ \frac{p(y)\, q(x \mid y)}{p(x)\, q(y \mid x)}\right\}
Explanation: Given proposal q(y|x) = N(x + \eta\, s(x),\ 2\eta I), this accept/reject rule corrects discretization error to target p exactly.
Gaussian Proposal Density
q(y \mid x) = N\big(y;\ x + \eta\, s(x),\ 2\eta I\big)
Explanation: The MALA proposal is Gaussian with mean x + \eta\, s(x) and covariance 2\eta I. Only differences of \log q matter in the acceptance ratio.
Fokker–Planck for Langevin
\partial_t \rho_t = \nabla \cdot \left( -\rho_t \nabla \log p + \nabla \rho_t \right)
Explanation: This PDE describes how densities evolve under Langevin dynamics. Setting \rho = p makes the right-hand side zero, showing p is stationary.
Preconditioned Langevin
x_{t+1} = x_t + \eta\, M \nabla \log p(x_t) + \sqrt{2\eta}\, M^{1/2} \epsilon_t
Explanation: A positive-definite matrix M adapts step sizes along different directions to improve mixing on anisotropic targets.
SGLD Update
\theta_{t+1} = \theta_t + \eta \left( \nabla \log p(\theta_t) + \frac{N}{|B|} \sum_{i \in B} \nabla \log p(y_i \mid \theta_t) \right) + \sqrt{2\eta}\, \epsilon_t
Explanation: Uses a stochastic (minibatch) estimate of the score. The added Gaussian noise maintains the correct stationary behavior asymptotically.
Denoising Score Matching
\mathcal{L}(\theta) = \mathbb{E}_{\sigma}\, \mathbb{E}_{x \sim p_{\text{data}}}\, \mathbb{E}_{\tilde{x} \sim N(x, \sigma^2 I)} \left[ \lambda(\sigma) \left\| s_\theta(\tilde{x}, \sigma) - \frac{x - \tilde{x}}{\sigma^2} \right\|^2 \right]
Explanation: Trains a score network to predict the score of Gaussian-smoothed data at multiple noise levels. Minimizing this recovers scores needed for annealed Langevin sampling.
Annealed Step Size Rule
\eta_l = \epsilon \cdot \sigma_l^2 / \sigma_L^2
Explanation: A practical heuristic sets the Langevin step at level l proportional to the squared noise level \sigma_l^2 to balance drift and diffusion.
KL Descent Under Langevin Flow
\frac{d}{dt} \mathrm{KL}(\rho_t \,\|\, p) = -\int \rho_t \left\| \nabla \log \frac{\rho_t}{p} \right\|^2 dx \le 0
Explanation: In continuous time, Langevin diffusion decreases KL divergence to p monotonically, justifying its convergence in distribution.
Complexity Analysis
Per iteration, ULA costs one score evaluation plus O(d) work for the noise and update; the score/gradient evaluation dominates. MALA roughly doubles the per-step cost because it evaluates the log-density and score at both the current and proposed points for the acceptance ratio. SGLD replaces the full-data gradient with a minibatch estimate, reducing the per-step gradient cost from O(Nd) to O(Bd) for batch size B. The total cost is the per-step cost times the number of iterations needed to mix, which grows with the conditioning of the target; annealed Langevin multiplies this by the number of noise levels times the steps per level.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Simple 2D vector utilities
struct Vec2 {
    double x, y;
};

Vec2 operator+(const Vec2 &a, const Vec2 &b){ return {a.x+b.x, a.y+b.y}; }
Vec2 operator-(const Vec2 &a, const Vec2 &b){ return {a.x-b.x, a.y-b.y}; }
Vec2 operator*(double s, const Vec2 &a){ return {s*a.x, s*a.y}; }
Vec2 operator*(const Vec2 &a, double s){ return {s*a.x, s*a.y}; }

double dot(const Vec2 &a, const Vec2 &b){ return a.x*b.x + a.y*b.y; }

typedef vector<Vec2> Vec2List;

// Target: mixture of K isotropic Gaussians N(mu_k, s2 I)
struct GaussianMixture2D {
    Vec2List means;    // component means
    vector<double> w;  // weights (sum to 1)
    double s2;         // component variance (scalar)

    // log unnormalized density at x
    double logp(const Vec2 &x) const {
        // log sum_k w_k * N(x|mu_k, s2 I)
        // N(x|mu, s2 I) = (2pi s2)^(-d/2) * exp(-||x-mu||^2/(2 s2))
        // constants cancel in gradients and MH ratios, but we keep them for clarity
        double log_norm = -0.5 * 2 * log(2*M_PI*s2); // d=2
        vector<double> logs;
        logs.reserve(means.size());
        for(size_t k=0;k<means.size();++k){
            Vec2 diff = x - means[k];
            double quad = dot(diff, diff);
            logs.push_back(log(w[k]) + log_norm + (-0.5 * quad / s2));
        }
        // log-sum-exp for numerical stability
        double m = *max_element(logs.begin(), logs.end());
        double sumexp = 0.0;
        for(double v: logs) sumexp += exp(v - m);
        return m + log(sumexp);
    }

    // score: grad_x log p(x)
    Vec2 score(const Vec2 &x) const {
        // \nabla log p(x) = sum_k r_k(x) * (mu_k - x)/s2, where r_k are posterior weights
        vector<double> logs;
        logs.reserve(means.size());
        double log_norm = -0.5 * 2 * log(2*M_PI*s2);
        for(size_t k=0;k<means.size();++k){
            Vec2 diff = x - means[k];
            double quad = dot(diff, diff);
            logs.push_back(log(w[k]) + log_norm + (-0.5 * quad / s2));
        }
        double m = *max_element(logs.begin(), logs.end());
        vector<double> r(means.size());
        double Z = 0.0;
        for(size_t k=0;k<means.size();++k){ r[k] = exp(logs[k]-m); Z += r[k]; }
        for(size_t k=0;k<means.size();++k){ r[k] /= Z; }
        Vec2 g{0.0, 0.0};
        for(size_t k=0;k<means.size();++k){
            Vec2 term = means[k] - x; // (mu_k - x)
            g = g + (r[k] / s2) * term;
        }
        return g;
    }
};

// Draw standard normal N(0,1)
double randn(std::mt19937_64 &rng){ static normal_distribution<double> nd(0.0,1.0); return nd(rng); }

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Define a simple 2-component mixture
    GaussianMixture2D mix;
    mix.means = { Vec2{2.0, 2.0}, Vec2{-2.0, -2.0} };
    mix.w = {0.5, 0.5};
    mix.s2 = 0.5; // component variance

    // Langevin parameters
    double eta = 0.02;          // step size (must be small enough for stability)
    size_t burn_in = 2000;      // discard initial samples
    size_t total_iters = 12000; // total iterations

    // RNG
    std::random_device rd;
    std::mt19937_64 rng(rd());

    // Initialize at the origin
    Vec2 x{0.0, 0.0};

    // Collect some samples after burn-in
    vector<Vec2> samples;
    for(size_t t=0; t<total_iters; ++t){
        // Compute score (gradient of log-density)
        Vec2 g = mix.score(x);
        // Gaussian noise with variance 2*eta per coordinate
        Vec2 noise{ randn(rng), randn(rng) };
        // ULA update: x <- x + eta * score + sqrt(2*eta) * noise
        x = x + eta * g + sqrt(2.0*eta) * noise;
        if(t >= burn_in && (t - burn_in) % 10 == 0){ // thin every 10 steps
            samples.push_back(x);
        }
    }

    // Print a few samples
    cout.setf(std::ios::fixed); cout<<setprecision(4);
    for(size_t i=0; i<min<size_t>(samples.size(), 20); ++i){
        cout << samples[i].x << "," << samples[i].y << "\n";
    }
    cerr << "Collected " << samples.size() << " samples after burn-in.\n";
    return 0;
}
```
Implements ULA for a 2D Gaussian mixture with known score. The update moves toward nearby mixture means while adding Gaussian noise scaled by \sqrt{2\eta}. After burn-in and thinning, printed points should cluster around the two modes near (2,2) and (-2,-2).
```cpp
#include <bits/stdc++.h>
using namespace std;

struct Vec2 { double x, y; };
Vec2 operator+(const Vec2 &a, const Vec2 &b){ return {a.x+b.x, a.y+b.y}; }
Vec2 operator-(const Vec2 &a, const Vec2 &b){ return {a.x-b.x, a.y-b.y}; }
Vec2 operator*(double s, const Vec2 &a){ return {s*a.x, s*a.y}; }
Vec2 operator*(const Vec2 &a, double s){ return {s*a.x, s*a.y}; }
double dot(const Vec2 &a, const Vec2 &b){ return a.x*b.x + a.y*b.y; }

typedef vector<Vec2> Vec2List;

struct GaussianMixture2D {
    Vec2List means; vector<double> w; double s2;
    double logp(const Vec2 &x) const {
        double log_norm = -0.5 * 2 * log(2*M_PI*s2);
        vector<double> logs; logs.reserve(means.size());
        for(size_t k=0;k<means.size();++k){
            Vec2 d = {x.x - means[k].x, x.y - means[k].y};
            double quad = dot(d, d);
            logs.push_back(log(w[k]) + log_norm + (-0.5 * quad / s2));
        }
        double m = *max_element(logs.begin(), logs.end());
        double sumexp = 0.0; for(double v: logs) sumexp += exp(v-m);
        return m + log(sumexp);
    }
    Vec2 score(const Vec2 &x) const {
        vector<double> logs; logs.reserve(means.size());
        double log_norm = -0.5 * 2 * log(2*M_PI*s2);
        for(size_t k=0;k<means.size();++k){
            Vec2 d = {x.x - means[k].x, x.y - means[k].y};
            double quad = dot(d, d);
            logs.push_back(log(w[k]) + log_norm + (-0.5 * quad / s2));
        }
        double m = *max_element(logs.begin(), logs.end());
        double Z = 0.0; vector<double> r(means.size());
        for(size_t k=0;k<means.size();++k){ r[k] = exp(logs[k]-m); Z += r[k]; }
        for(size_t k=0;k<means.size();++k){ r[k] /= Z; }
        Vec2 g{0.0, 0.0};
        for(size_t k=0;k<means.size();++k){ Vec2 term = {means[k].x - x.x, means[k].y - x.y}; g = g + (r[k]/s2) * term; }
        return g;
    }
};

double randn(mt19937_64 &rng){ static normal_distribution<double> nd(0.0,1.0); return nd(rng); }
double randu(mt19937_64 &rng){ static uniform_real_distribution<double> ud(0.0,1.0); return ud(rng); }

double log_q_gaussian(const Vec2 &to, const Vec2 &from, const Vec2 &score_from, double eta){
    // q(to | from) = N(from + eta * score_from, 2 eta I)
    Vec2 mean = from + eta * score_from;
    Vec2 diff = {to.x - mean.x, to.y - mean.y};
    double quad = dot(diff, diff);
    // log density up to constant: -||diff||^2 / (4 eta) + C
    return - quad / (4.0 * eta);
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    GaussianMixture2D mix;
    mix.means = { Vec2{2.0, 2.0}, Vec2{-2.0, -2.0} };
    mix.w = {0.5, 0.5};
    mix.s2 = 0.5;

    double eta = 0.05; // MALA can often use larger steps due to MH correction; still tune carefully
    size_t iters = 15000, burn_in = 3000;

    mt19937_64 rng(random_device{}());

    Vec2 x{0.0, 0.0};
    size_t accepted = 0, proposed = 0;
    vector<Vec2> samples;

    for(size_t t=0; t<iters; ++t){
        Vec2 g_x = mix.score(x);
        Vec2 noise{ randn(rng), randn(rng) };
        // Propose y
        Vec2 y = x + eta * g_x + sqrt(2.0*eta) * noise;

        // Compute acceptance log ratio
        double logp_x = mix.logp(x);
        double logp_y = mix.logp(y);
        Vec2 g_y = mix.score(y);
        double log_q_yx = log_q_gaussian(x, y, g_y, eta);
        double log_q_xy = log_q_gaussian(y, x, g_x, eta);
        double log_alpha = (logp_y - logp_x) + (log_q_yx - log_q_xy);

        double u = randu(rng);
        bool accept = (log(u) < log_alpha);
        ++proposed;
        if(accept){ x = y; ++accepted; }

        if(t >= burn_in && (t-burn_in)%10==0){ samples.push_back(x); }
    }

    cout.setf(std::ios::fixed); cout<<setprecision(4);
    for(size_t i=0;i<min<size_t>(samples.size(),20);++i){ cout<<samples[i].x<<","<<samples[i].y<<"\n"; }
    cerr << "Acceptance rate: " << (100.0*accepted/proposed) << "%\n";
    return 0;
}
```
Implements MALA on the same Gaussian mixture. The proposal follows a Langevin step and is accepted/rejected with the Metropolis–Hastings rule based on target and proposal densities. This yields exact samples from the target in the limit of long runs.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct Vec2 { double x, y; };
Vec2 operator+(const Vec2 &a, const Vec2 &b){ return {a.x+b.x, a.y+b.y}; }
Vec2 operator-(const Vec2 &a, const Vec2 &b){ return {a.x-b.x, a.y-b.y}; }
Vec2 operator*(double s, const Vec2 &a){ return {s*a.x, s*a.y}; }
Vec2 operator*(const Vec2 &a, double s){ return {s*a.x, s*a.y}; }
double dot(const Vec2 &a, const Vec2 &b){ return a.x*b.x + a.y*b.y; }

// Mixture model used as "data"; we will assume the oracle provides the score of the noise-convolved density p_\sigma
struct Mixture2D {
    vector<Vec2> means; vector<double> w; double s2; // intrinsic variance of components

    // Score of p_\sigma(x) = (p * N(0, sigma^2 I))(x) for an isotropic mixture
    // Each component's covariance becomes s2 + sigma^2
    Vec2 score_sigma(const Vec2 &x, double sigma) const {
        double v = s2 + sigma*sigma; // inflated variance
        double log_norm = -0.5 * 2 * log(2*M_PI*v);
        vector<double> logs; logs.reserve(means.size());
        for(size_t k=0;k<means.size();++k){
            Vec2 d = {x.x - means[k].x, x.y - means[k].y};
            double quad = dot(d, d);
            logs.push_back(log(w[k]) + log_norm + (-0.5 * quad / v));
        }
        double m = *max_element(logs.begin(), logs.end());
        double Z = 0.0; vector<double> r(means.size());
        for(size_t k=0;k<means.size();++k){ r[k] = exp(logs[k]-m); Z += r[k]; }
        for(size_t k=0;k<means.size();++k){ r[k] /= Z; }
        Vec2 g{0.0, 0.0};
        for(size_t k=0;k<means.size();++k){ Vec2 term = {means[k].x - x.x, means[k].y - x.y}; g = g + (r[k] / v) * term; }
        return g; // equals \nabla_x log p_\sigma(x)
    }
};

double randn(mt19937_64 &rng){ static normal_distribution<double> nd(0.0,1.0); return nd(rng); }

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Define a simple "data" mixture; treat score_sigma as an oracle (e.g., from a trained network)
    Mixture2D data;
    data.means = { {2.0, 0.0}, {-2.0, 0.0}, {0.0, 2.5} };
    data.w = {0.4, 0.4, 0.2};
    data.s2 = 0.3; // intrinsic variance

    // Noise schedule (from large to small)
    vector<double> sigmas; // geometric schedule
    double sigma_max = 2.0, sigma_min = 0.1; int L = 8;
    for(int i=0;i<L;++i){
        double p = (double)i/(L-1);
        double s = sigma_max * pow(sigma_min/sigma_max, p); // geometric interpolation
        sigmas.push_back(s);
    }

    // Steps per level and step-size rule: eta_l = c * sigma_l^2
    int steps_per_level = 100;
    double c = 0.1; // tune for stability/quality

    // RNG and initialization from pure noise
    mt19937_64 rng(random_device{}());
    Vec2 x{ randn(rng)*sigma_max, randn(rng)*sigma_max };

    for(int l=0; l<L; ++l){
        double sigma = sigmas[l];
        double eta = c * sigma * sigma;
        for(int k=0; k<steps_per_level; ++k){
            Vec2 g = data.score_sigma(x, sigma); // oracle score at this noise level
            Vec2 z{ randn(rng), randn(rng) };
            x = x + eta * g + sqrt(2.0*eta) * z;
        }
        // Optional: small printout to track progress
        cerr << "Level " << l+1 << "/" << L << ", sigma=" << sigma << ", x=(" << x.x << "," << x.y << ")\n";
    }

    cout.setf(std::ios::fixed); cout<<setprecision(4);
    cout << x.x << "," << x.y << "\n"; // final synthesized sample
    return 0;
}
```
Demonstrates annealed Langevin sampling using an oracle score \nabla_x log p_\sigma(x) for a noise-convolved target. In practice, the oracle would be a trained score network s_\theta(x,\sigma). The schedule starts with large noise (easy landscape) and gradually sharpens, guiding x toward realistic samples.