How I Study AI - Learn AI Papers & Lectures the Easy Way

Math · Advanced

Stochastic Differential Equations for Generation

Key Points

  • A forward stochastic differential equation (SDE) models a state that drifts deterministically and is shaken by random Brownian noise over time.
  • In diffusion-based generation, the forward SDE gradually corrupts data with a controlled noise schedule so the distribution becomes easy (often Gaussian).
  • The reverse-time SDE, driven by the score (the gradient of the log-density), inverts this corruption to generate new samples from noise.
  • Euler–Maruyama is the basic C++-friendly integrator: add drift × Δt and noise × √Δt at each step.
  • Choosing g(t) (the diffusion strength) is a design choice called the noise schedule; Variance Preserving (VP) and Variance Exploding (VE) are two popular families.
  • Numerical stability needs small time steps; forgetting the √Δt factor or the sign in reverse-time integration is a common pitfall.
  • Time complexity for simulating M paths with N steps in d dimensions is O(M N d), and space can be O(d) per path if streamed.
  • A probability flow ODE provides a deterministic alternative to the reverse SDE that yields the same marginal distributions under ideal conditions.

Prerequisites

  • →Calculus (differentiation and integration) — Understand drift terms, integrals over time, and step-size effects.
  • →Probability and random variables — Interpret Gaussian noise, variance, and distributions over time.
  • →Gaussian distributions — Compute analytic scores and variances used in examples.
  • →Numerical methods for ODEs/SDEs — Apply Euler–Maruyama and reason about stability and convergence.
  • →Linear algebra (vectors, gradients) — Generalize to multi-dimensional SDEs and compute scores.
  • →Itô calculus (basic familiarity) — Know why noise scales as √Δt and how SDEs differ from ODEs.
  • →Random number generation — Implement high-quality Gaussian sampling in C++.

Detailed Explanation


01 Overview

Hook → We often want to turn simple noise into rich, realistic data like images or audio. How can random motion become creative generation?

Concept → Stochastic differential equations (SDEs) provide a continuous-time way to add or remove noise from data. A forward SDE has two parts: a drift f(x,t) that nudges the state in a predictable direction, and a diffusion g(t) that injects random Brownian kicks. Over time, this process transforms a complex data distribution into a simpler one (usually Gaussian). Crucially, there exists a reverse-time SDE that, if we know the score (the gradient of the log-density at each time), runs the process backward to synthesize new data from noise.

Example → In diffusion models, the VP SDE uses a time-varying diffusion strength so that, starting from a clean signal at t=0 and moving forward to t=1, the sample ends up as nearly pure Gaussian noise. Sampling then draws x(1) from that Gaussian and numerically integrates the reverse SDE back to t=0, producing a fresh sample that looks like the training data.

02 Intuition & Analogies

Hook → Imagine a boat drifting down a river while random waves bump it around. If the bumps get stronger over time, the boat’s final location becomes mostly random.

Concept → The forward SDE is that boat ride: the drift f(x,t) is the river’s current guiding the boat, while the diffusion g(t) dW is the choppy water adding random jolts. If we slowly turn up g(t), the randomness overwhelms the initial condition, washing away detailed structure. In generative modeling, this is exactly what we want: convert complicated data into standardized noise in a controlled way. Now flip the story. If you had a perfect map of how crowded each part of the river is at each time (the probability density), you could steer a similar boat against the flow of randomness. That steering force is the score ∇ log p_t(x). The reverse SDE says: combine the original current with a correction proportional to this score, and you can backtrack from noise to plausible data.

Example → Think of a camera that gradually blurs an image over time. The reverse SDE is like a smart deblurring process that knows, at every blur level, which direction sharpens the image most, because it has the gradient of the log-likelihood of clean images at that blur level.

03 Formal Definition

Hook → To move from analogy to math, we write down the continuous-time stochastic dynamics.

Concept → A forward Itô SDE in one dimension is dx_t = f(x_t, t) dt + g(t) dW_t, where W_t is standard Brownian motion. In d dimensions, dx_t = f(x_t, t) dt + G(t) dW_t with W_t ∈ R^d and diffusion matrix G. The probability density p_t(x) of x_t satisfies the Fokker–Planck (forward Kolmogorov) equation, linking sample paths to distribution evolution. In diffusion models, two canonical choices are Variance Preserving (VP): f(x,t) = −(1/2) β(t) x, g(t) = √β(t), and Variance Exploding (VE): f(x,t) = 0, with g(t) chosen to grow the variance exponentially. The reverse-time SDE (Anderson, 1982; Song et al., 2021) that recovers p_t(x) from p_{t+dt}(x) reads dx_t = [f(x_t, t) − G(t)G(t)^⊤ ∇_x log p_t(x_t)] dt + G(t) dW̄_t when integrating with t decreasing.

Example → For the scalar driftless diffusion dx = g(t) dW, the variance evolves as Var[x_t] = Var[x_0] + ∫_0^t g(s)^2 ds. The associated reverse SDE uses the score ∂_x log p_t(x) = −x / Var[x_t] to drive samples from high-variance noise back to the target variance.

04 When to Use

Hook → When should you trade deterministic models for noisy ones?

Concept → Use SDEs when randomness is inherent (thermal noise, finance) or when you intentionally inject noise to make learning and sampling easier (diffusion generative models). Forward SDEs define a smooth corruption process from data to a simple prior so that likelihoods and training objectives are well-behaved. Reverse SDEs are used for sampling, turning easy noise into structured outputs, especially when you can approximate the score with a neural network.

Example use cases →
  • Diffusion models for images/audio/text: VP/VE SDEs with learned score networks generate high-quality samples.
  • Physics/chemistry: simulate stochastic dynamics (e.g., Langevin or Ornstein–Uhlenbeck) for Monte Carlo estimation.
  • Uncertainty modeling: propagate distributions through noisy systems using many sample paths.
  • Toy pedagogical setups: choose g(t) to get analytic Gaussian marginals and demonstrate forward/backward dynamics without training a network.

⚠️ Common Mistakes

Hook → Most bugs in SDE code are silent: results look plausible but are wrong.

Concept → Common errors include:
  • Forgetting the √Δt factor for noise in Euler–Maruyama, which makes the variance scale incorrectly.
  • Using the wrong sign or time direction in reverse-time integration; remember you integrate from T down to 0 and keep dt negative or index backward.
  • Confusing Itô with Stratonovich calculus; standard diffusion-model discretizations use Itô.
  • Choosing g(t) or Δt too large, causing instability or bias.
  • Ignoring dimensionality: in d dimensions, the diffusion is a matrix G and the Brownian noise is vector-valued.
  • Poor random number generation: reusing seeds incorrectly or coupling streams across threads can bias Monte Carlo estimates.

Example → In a VP SDE, if you implement x_{k+1} = x_k + f Δt + g ξ (omitting √Δt), each step adds noise variance g² instead of g² Δt, i.e. 1/Δt times too much, blowing up the distribution; scaling the noise by Δt instead of √Δt makes the variance Δt times too small, collapsing it. In reverse SDE code, if you step forward in time by mistake (t: 0→T) with the reverse drift, samples diverge rather than denoise.

Key Formulas

Forward Itô SDE (scalar)

dx_t = f(x_t, t) dt + g(t) dW_t

Explanation: The state x_t changes by a deterministic drift f and a random Brownian term scaled by g(t). This is the basic model for the diffusion processes used in generation.

Fokker–Planck (scalar diffusion)

∂_t p_t(x) = −∇·( f(x,t) p_t(x) ) + (1/2) g(t)^2 Δ p_t(x)

Explanation: The density evolves under drift-induced transport and diffusion-induced spreading. With scalar g(t), the diffusion term is g(t)^2/2 times the Laplacian.

Euler–Maruyama update

x_{k+1} = x_k + f(x_k, t_k) Δt + g(t_k) √Δt ξ_k,   ξ_k ~ N(0, 1)

Explanation: This is the simplest numerical integrator for SDEs. At each step, add deterministic drift times step size and random noise scaled by the square root of the step.

Reverse-time SDE

dx_t = [ f(x_t, t) − g(t)^2 ∇_x log p_t(x_t) ] dt + g(t) dW̄_t   (time runs backward)

Explanation: To sample from the data distribution, integrate this SDE from T to 0 using the score. The extra term subtracts the diffusion times the score to reverse the corruption.

Probability flow ODE

dx_t = [ f(x_t, t) − (1/2) g(t)^2 ∇_x log p_t(x_t) ] dt

Explanation: A deterministic flow with the same marginals as the forward SDE when the score is exact. It can be integrated with standard ODE solvers.

VP SDE coefficients

f_VP(x, t) = −(1/2) β(t) x,   g_VP(t) = √β(t)

Explanation: This choice preserves (approximately) the signal’s variance across time while adding noise. It is standard in DDPM-like diffusion models.

VP marginal scaling

α(t) = exp( −(1/2) ∫_0^t β(s) ds ),   Var[x_t] ≈ 1 − α(t)^2 + α(t)^2 Var[x_0]

Explanation: For the VP SDE, the mean shrinks by the factor α(t) and the variance follows this expression. If Var[x_0] = 1, the variance remains near 1.

Driftless variance growth

Var[x_t] = Var[x_0] + ∫_0^t g(s)^2 ds

Explanation: When f=0, the variance increases by the time-integral of the diffusion strength squared. This lets you design g(t) to achieve a target variance schedule.

Itô isometry

E[ ( ∫_0^t g(s) dW_s )^2 ] = ∫_0^t g(s)^2 ds

Explanation: The variance of a stochastic integral equals the integral of g(s)^2. This underlies the variance growth in driftless SDEs.

Euler–Maruyama convergence rates

E[ |X_T − X_T^Δ| ] = O(Δt^{1/2}),   | E[φ(X_T)] − E[φ(X_T^Δ)] | = O(Δt)

Explanation: Euler–Maruyama has strong order 1/2 and weak order 1. Pathwise errors decrease with the square root of the step size; errors in expectations decrease linearly.

Monte Carlo standard error

SE(μ̂) ≈ σ / √M

Explanation: Averaging M independent samples reduces estimation error like 1/√M. This guides how many paths to simulate for reliable statistics.

Complexity Analysis

Simulating an SDE numerically with Euler–Maruyama requires sampling one Gaussian noise value per dimension per time step and evaluating the drift and diffusion. For M independent paths, N time steps, and a d-dimensional state, the time complexity is O(M N d): each step generates d normal variates and performs O(d) arithmetic for the drift/diffusion. If the drift or score evaluation is more expensive (e.g., a neural network with cost C_net), time becomes O(M N (d + C_net)). Random number generation is often a bottleneck; high-quality RNGs (e.g., mt19937_64 with std::normal_distribution) add constant factors but keep the overall scaling linear.

Space complexity depends on whether you store full trajectories. Streaming a single path in place uses O(d) memory (plus RNG state). Storing all time steps for analysis uses O(N d) per path, and O(M N d) if you retain all paths.

For reverse-time SDEs in generative modeling, time runs from T to 0 with the same complexity order; however, computing the score ∇ log p_t typically dominates if it is a neural network. Deterministic probability-flow ODE solvers (e.g., Runge–Kutta) also scale as O(M N d) but often require smaller steps for stability when the drift (including the −(1/2) g² score term) is stiff. In high dimensions, choosing a diagonal or low-rank diffusion matrix can reduce per-step cost. Overall, performance tuning focuses on vectorizing drift/score evaluations, batching RNG calls, and balancing step size (accuracy) against the linear step-count cost.

Code Examples

Forward SDE simulation (VP SDE) with Euler–Maruyama and empirical statistics
#include <bits/stdc++.h>
using namespace std;

// Simulate the 1D Variance Preserving (VP) SDE:
// dx = f(x,t) dt + g(t) dW, with f(x,t) = -0.5 * beta(t) * x, g(t) = sqrt(beta(t)).
// Beta schedule: linear between beta_min and beta_max.

struct VPSDE {
    double beta_min, beta_max; // schedule parameters
    explicit VPSDE(double bmin, double bmax) : beta_min(bmin), beta_max(bmax) {}
    inline double beta(double t) const { return beta_min + (beta_max - beta_min) * t; }
    inline double drift(double x, double t) const { return -0.5 * beta(t) * x; }
    inline double diffusion(double t) const { return sqrt(max(0.0, beta(t))); }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Simulation parameters
    const int M = 20000;   // number of independent paths
    const int N = 1000;    // number of time steps
    const double T = 1.0;  // final time
    const double dt = T / N;

    // VP SDE parameters (typical diffusion-model values)
    VPSDE sde(0.1, 20.0);  // beta_min, beta_max

    // RNG setup
    random_device rd;
    mt19937_64 gen(rd());
    normal_distribution<double> normal01(0.0, 1.0);

    // Initialize x0 ~ N(0,1) to mimic standardized data
    vector<double> x(M);
    for (int i = 0; i < M; ++i) x[i] = normal01(gen);

    // Euler–Maruyama integration forward from t=0 to t=T
    double t = 0.0;
    for (int k = 0; k < N; ++k) {
        double g = sde.diffusion(t);
        double sqdt = sqrt(dt);
        for (int i = 0; i < M; ++i) {
            double drift = sde.drift(x[i], t);
            double noise = g * sqdt * normal01(gen);
            x[i] += drift * dt + noise; // EM step
        }
        t += dt;
    }

    // Empirical mean and variance at time T
    double mean = 0.0, var = 0.0;
    for (double xi : x) mean += xi;
    mean /= M;
    for (double xi : x) var += (xi - mean) * (xi - mean);
    var /= (M - 1);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Empirical mean at T: " << mean << "\n";
    cout << "Empirical variance at T: " << var << "\n";

    // For the VP SDE with Var[x0]=1, the variance stays close to 1 theoretically.
    // We can compute alpha(T) = exp(-0.5 * \int_0^T beta(s) ds).
    auto integral_beta = [&](double Tfin){
        // For linear beta(t) = bmin + (bmax-bmin) t: integral = bmin*T + 0.5*(bmax-bmin)*T^2
        return sde.beta_min * Tfin + 0.5 * (sde.beta_max - sde.beta_min) * Tfin * Tfin;
    };
    double alpha = exp(-0.5 * integral_beta(T));
    double var_theory = (1 - alpha * alpha) + (alpha * alpha) * 1.0; // Var[x0]=1
    cout << "Theoretical variance (VP, Var[x0]=1): " << var_theory << "\n";

    return 0;
}

This program simulates the 1D Variance Preserving (VP) SDE using Euler–Maruyama. The drift shrinks x proportionally to β(t) while diffusion injects √β(t) noise. We initialize x0 ~ N(0,1), step forward to T=1, and report the empirical mean and variance. For the VP SDE, if the initial variance is 1, the variance stays about 1 across time, which the program confirms numerically.

Time: O(M N) · Space: O(M)
Reverse-time SDE sampling in a driftless Gaussian case with analytic score
#include <bits/stdc++.h>
using namespace std;

// Demonstrate reverse-time sampling without a neural net by choosing a driftless forward SDE
// with analytically Gaussian marginals. Forward SDE: dx = g(t) dW, f = 0.
// Then Var[x_t] = Var[x_0] + \int_0^t g(s)^2 ds. Choose g(t) = sqrt(s), constant in t
// (here, s is just a scalar), so Var[x_t] = Var[x_0] + s t.
// We'll set Var[x_0] = 1 and s = 9, so Var[x_1] = 10. The score is ∂_x log p_t(x) = -x / Var[x_t].
// Reverse-time SDE: dx = [f - g(t)^2 * score] dt + g(t) d\bar W (integrated from t=T down to 0).
// With f = 0 and g^2 = s, the reverse drift is (s / Var[x_t]) * x.

struct DriftlessGaussian {
    double var0; // Var at t=0
    double s;    // diffusion power: g^2 = s (constant)
    DriftlessGaussian(double var0_, double s_) : var0(var0_), s(s_) {}
    inline double g(double /*t*/) const { return sqrt(max(0.0, s)); }
    inline double var_t(double t) const { return var0 + s * t; }
    inline double score(double x, double t) const { return -x / var_t(t); } // ∂_x log p_t(x)
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Model: Var[x_0] = 1, s = 9 -> Var[x_1] = 10
    DriftlessGaussian model(1.0, 9.0);

    // Reverse-time integration settings
    const int M = 20000;         // number of samples to generate
    const int N = 1000;          // steps
    const double T = 1.0;        // start from t=T down to 0
    const double dt_pos = T / N; // positive step magnitude

    // RNG
    random_device rd; mt19937_64 gen(rd());
    normal_distribution<double> normal01(0.0, 1.0);

    // Initialize from the forward terminal distribution: x_T ~ N(0, Var[x_T]) with Var[x_T]=10
    double varT = model.var_t(T);
    normal_distribution<double> normal0_varT(0.0, sqrt(varT));
    vector<double> x(M);
    for (int i = 0; i < M; ++i) x[i] = normal0_varT(gen);

    // Integrate the reverse SDE from t=T to t=0 with negative dt
    double t = T;
    for (int k = 0; k < N; ++k) {
        double dt = -dt_pos;        // negative because we go backward in time
        double g = model.g(t);
        double sqdt = sqrt(dt_pos); // magnitude for the Brownian increment
        for (int i = 0; i < M; ++i) {
            // reverse drift: -g^2 * score = s * x / Var[x_t]
            double rev_drift = (model.s / model.var_t(t)) * x[i];
            double noise = g * sqdt * normal01(gen); // uses d\bar W with variance |dt|
            x[i] += rev_drift * dt + noise;          // Euler–Maruyama backward step
        }
        t += dt; // decrease time
    }

    // Empirical mean/variance of generated samples (target: Var[x_0] ~ 1)
    double mean = 0.0, var = 0.0;
    for (double xi : x) mean += xi;
    mean /= M;
    for (double xi : x) var += (xi - mean) * (xi - mean);
    var /= (M - 1);

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Generated mean at t=0: " << mean << "\n";
    cout << "Generated variance at t=0 (should be ~1): " << var << "\n";

    return 0;
}

We pick a driftless forward SDE with constant diffusion power s so Var[x_t] = Var[x_0] + s t is analytic. This yields a simple, exact score: −x / Var[x_t]. Starting from x_T ~ N(0, 10) (with Var[x_0]=1 and s=9), we integrate the reverse SDE backward to t=0. The empirical variance of generated samples is close to 1, demonstrating score-based denoising without a neural network.

Time: O(M N) · Space: O(M)
#stochastic differential equation #diffusion model #euler maruyama #reverse-time sde #score matching #fokker planck #variance preserving #variance exploding #brownian motion #probability flow ode #noise schedule #ito calculus #monte carlo #gaussian score #generative modeling