How I Study AI - Learn AI Papers & Lectures the Easy Way
⚙️ Algorithm · Intermediate

Monte Carlo Estimation

Key Points

  • Monte Carlo estimation approximates an expected value by averaging function values at random samples drawn from a probability distribution.
  • The estimator \(\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} f(X_i)\) is unbiased and its variance shrinks like \(\mathrm{Var}(f(X))/N\).
  • By the Law of Large Numbers, the Monte Carlo average converges to the true expectation as the number of samples increases.
  • By the Central Limit Theorem, the error is approximately normal with standard error \(\sigma/\sqrt{N}\), enabling confidence intervals.
  • Importance sampling and antithetic variates can dramatically reduce variance without changing the estimator’s expectation.
  • Monte Carlo is especially powerful for high-dimensional integrals and problems with complex or unknown analytic solutions.
  • The time complexity is linear in the number of samples, and the memory cost is usually constant.
  • Careful seeding, correct weighting, and support coverage are essential to avoid bias and numerical instability.

Prerequisites

  • Basic probability and random variables — Understanding expectations, independence, and distributions is essential for why sample averages estimate means.
  • Integral calculus — Expectations can be written as integrals; Monte Carlo approximates these integrals by sampling.
  • Variance and Central Limit Theorem — Variance quantifies estimator spread, and the CLT justifies confidence intervals for Monte Carlo estimates.
  • C++ random number generation — You must know how to seed and use std::mt19937_64 and distribution objects to implement Monte Carlo.
  • Numerical stability and floating point — Log-domain computations and stable online variance are critical for rare events and long runs.

Detailed Explanation


01 Overview

Monte Carlo estimation is a general technique to approximate expectations (averages) of random quantities using randomness itself. Suppose you want E[f(X)] for some function f and random variable X with distribution P, but you cannot compute it analytically. Monte Carlo draws independent samples X_1, X_2, ..., X_N from P and computes the average (1/N) Σ f(X_i). This sample mean is a random number that, on average, equals the desired expectation and gets closer to it as you use more samples. The strength of Monte Carlo is its generality: it works for many distributions, dimensions, and functions—even when traditional calculus or closed-form solutions fail. It is widely used in physics (particle simulations), finance (option pricing), computer graphics (global illumination), machine learning (Bayesian inference), and engineering (reliability analysis). The accuracy improves as you increase N, but only at a rate proportional to 1/√N unless you apply variance reduction. You can also quantify uncertainty with confidence intervals using the Central Limit Theorem. Practically, Monte Carlo needs a high-quality pseudorandom number generator, a correct way to sample from P, and careful numerical handling when probabilities are very small or very large.

02 Intuition & Analogies

Imagine you want to know the average height of trees in a huge forest but measuring all trees is impossible. A practical approach is to randomly pick N trees, measure their heights, and average them. If your sampling is fair, this average is a good estimate of the forest’s true average height, and as you measure more trees, the estimate becomes more reliable. Monte Carlo estimation does the same for mathematical expectations: instead of trees, you randomly sample outcomes X_i from a distribution P, measure f(X_i), and average. The function f could be simple (like the square) or complicated (like a simulation that returns 1 if a system survives one year). Each sample is like a new tree measurement contributing to the overall average. The uncertainty in your final average is like the randomness in which trees you happened to pick; more samples reduce this uncertainty. Sometimes the forest has rare giant trees that dominate the average—if you rarely sample them, your average is noisy. Importance sampling is like deliberately walking to where giant trees are more common, then compensating with weights to keep the estimate fair. Antithetic variates are like measuring pairs of trees in opposite directions from a central point to cancel some randomness. Overall, Monte Carlo turns hard integrals and expectations into a counting-and-averaging game with a clear rule: sample fairly, average carefully, and use tricks to reduce noise.

03 Formal Definition

Let (Ω, F, P) be a probability space and X ∼ P a random variable taking values in 𝒳. For a measurable function f: 𝒳 → ℝ with finite variance \(\sigma^2 = \mathrm{Var}(f(X)) < \infty\), the Monte Carlo estimator of \(\mu = E[f(X)]\) using i.i.d. samples \(X_1, \dots, X_N \sim P\) is \(\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} f(X_i)\). Then \(E[\hat{\mu}_N] = \mu\) (unbiasedness) and \(\mathrm{Var}(\hat{\mu}_N) = \sigma^2/N\). By the Strong Law of Large Numbers, \(\hat{\mu}_N \to \mu\) almost surely as N → ∞. By the Central Limit Theorem, if \(\sigma^2 < \infty\), then \(\sqrt{N}(\hat{\mu}_N - \mu) \xrightarrow{d} N(0, \sigma^2)\). In practice, \(\sigma^2\) is unknown and we estimate it with the sample variance \(s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(f(X_i) - \hat{\mu}_N\bigr)^2\), leading to an estimated standard error \(\mathrm{SE}(\hat{\mu}_N) \approx s/\sqrt{N}\) and approximate confidence intervals. For importance sampling with proposal Q having density q and target density p, if p is absolutely continuous with respect to q and w(x) = p(x)/q(x), the estimator is \(\hat{\mu}_{IS} = \frac{1}{N}\sum_{i=1}^{N} f(Y_i)\, w(Y_i)\) with \(Y_i \sim Q\), which remains unbiased when \(E_Q[\lvert f(Y)w(Y)\rvert] < \infty\).

04 When to Use

Use Monte Carlo estimation when the expectation or integral has no closed form or is too complicated for analytic techniques, particularly in high dimensions where grid-based numerical integration becomes infeasible. It is well-suited for: (1) High-dimensional integrals, such as computing volumes or Bayesian posterior expectations; (2) Stochastic simulations where f(X) encapsulates a complex process (e.g., queuing systems, reliability, epidemic spread); (3) Financial engineering tasks like pricing options and risk measures when payoff expectations under a risk-neutral measure are not solvable analytically; (4) Computer graphics global illumination, where the rendering equation is an integral over light paths; (5) Rare-event probabilities when combined with variance reduction like importance sampling; (6) Situations demanding error bars: Monte Carlo naturally provides standard errors and confidence intervals. Avoid naive Monte Carlo when the variance of f(X) is very large or infinite, or when sampling from P is itself extremely expensive; in such cases, use variance reduction (importance sampling, antithetic or control variates, stratification) or quasi–Monte Carlo. If the integrand is smooth and low-dimensional (say, 1–3 dimensions), deterministic quadrature (Simpson’s, Gauss–Legendre) may converge much faster than Monte Carlo.

⚠️Common Mistakes

  1. Biased sampling: drawing from the wrong distribution or with dependence between samples without accounting for it, which breaks unbiasedness and LLN assumptions.
  2. Incorrect weighting in importance sampling: forgetting the weight w = p/q, using unnormalized densities inconsistently, or choosing a proposal with lighter tails than the target so that weights explode, causing enormous variance.
  3. Ignoring support mismatch: if q(x) = 0 where p(x)f(x) ≠ 0, the importance sampler is biased or undefined.
  4. Underestimating uncertainty: reporting the sample mean without a standard error or confidence interval, or computing variance with divisor N instead of N−1 for the unbiased estimator.
  5. Numerical instability: computing weights in probability scale rather than log-scale for extreme events, leading to overflow/underflow.
  6. Poor random number handling: not seeding reproducibly when needed, reusing the same seed across parallel workers, or mixing different PRNGs improperly.
  7. Inefficient variance: using plain Monte Carlo for rare events where almost all samples contribute zero; importance sampling or stratification is needed.
  8. Overinterpreting small N: early runs can look accurate by chance; always check convergence with increasing N or replicate runs to assess variability.

Key Formulas

Monte Carlo Estimator

\(\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} f(X_i), \qquad X_i \overset{\text{i.i.d.}}{\sim} P\)

Explanation: Average the function f over N independent samples from P. This unbiased estimator converges to the true expectation as N grows.

Unbiasedness and Variance

\(E[\hat{\mu}_N] = \mu, \qquad \mathrm{Var}(\hat{\mu}_N) = \frac{\mathrm{Var}(f(X))}{N}\)

Explanation: The estimator’s expected value equals the true mean, and its variance decreases proportionally to 1/N. Doubling N reduces the standard deviation by a factor of \(\sqrt{2}\).

Central Limit Theorem

\(\sqrt{N}\,(\hat{\mu}_N - \mu) \xrightarrow{d} N(0, \sigma^2)\)

Explanation: For large N, the scaled error is approximately normal, enabling confidence intervals using an estimate of \(σ\).

Sample Variance

\(s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\bigl(f(X_i) - \hat{\mu}_N\bigr)^2\)

Explanation: Empirical variance with N−1 in the denominator to remove small-sample bias. Use \(s/\sqrt{N}\) as the standard error estimate.

Importance Sampling Estimator

\(\hat{\mu}_{IS} = \frac{1}{N}\sum_{i=1}^{N} f(Y_i)\,\frac{p(Y_i)}{q(Y_i)}, \qquad Y_i \sim Q\)

Explanation: Draw from a proposal distribution Q but reweight by the likelihood ratio p/q to remain unbiased with potentially lower variance.

Weight in Log-Scale

\(w(y) = \frac{p(y)}{q(y)} = \exp\bigl(\log p(y) - \log q(y)\bigr)\)

Explanation: Computing weights via logs avoids underflow/overflow for extreme values. This is essential in rare-event estimation.

Bias-Variance Decomposition

\(\mathrm{MSE}(\hat{\mu}) = \mathrm{Var}(\hat{\mu}) + \mathrm{Bias}(\hat{\mu})^2\)

Explanation: Overall error splits into variance and squared bias. Plain Monte Carlo is unbiased, so MSE equals variance.

Chebyshev's Inequality

\(P\bigl(\lvert\hat{\mu}_N - \mu\rvert \ge \varepsilon\bigr) \le \frac{\mathrm{Var}(f(X))}{N\varepsilon^2}\)

Explanation: A nonasymptotic bound on the probability of large deviations. It shows the \(1/N\) variance decay explicitly.

Sample Size for Target Error

\(N \approx \frac{z_{1-\alpha/2}^2\, \sigma^2}{\varepsilon^2}\)

Explanation: To achieve half-width \(ε\) with confidence \(1-α\), choose N using the normal quantile and variance estimate.

Antithetic Variates

\(\hat{\mu}_{\text{anti}} = \frac{2}{N}\sum_{i=1}^{N/2} \frac{f(Z_i) + f(-Z_i)}{2}\)

Explanation: Pairing symmetric samples induces negative correlation, often reducing variance without bias when the distribution is symmetric.

Complexity Analysis

The computational cost of plain Monte Carlo estimation is linear in the number of samples N. Each iteration draws one random sample and evaluates f at that sample, so the time complexity is O(N · C_f), where C_f is the cost of a single function evaluation. If f is O(1) to compute, the overall time is O(N). The space complexity is typically O(1) because you can maintain running sums (and sums of squares) without storing all samples; retaining all samples would increase space to O(N), but this is unnecessary for the mean and variance. Variance-reduction techniques preserve the same big-O time while improving the constant factors in accuracy. For example, antithetic variates still require O(N) operations but can reduce the variance by a factor depending on the induced negative correlation. Importance sampling also runs in O(N), though each iteration includes evaluating a weight w(x) = p(x)/q(x); the effective per-sample cost may rise slightly due to extra density computations. The key performance metric is error versus compute: the standard Monte Carlo error decays as \(O(1/\sqrt{N})\). Hence, to reduce error by a factor of 10, you need 100 times more samples if you do not use variance reduction. In parallel environments, Monte Carlo scales almost linearly because samples are independent: with P processors, the ideal time is roughly O(N/P). Practical bottlenecks include PRNG throughput, synchronization (e.g., reductions), and numerical stability (e.g., catastrophic cancellation when summing many values), which can be mitigated with techniques like Kahan summation. Importance sampling can suffer from heavy-tailed weights, effectively reducing the usable sample size; monitoring the effective sample size (ESS) is recommended.

Code Examples

Plain Monte Carlo: Estimate ∫_0^1 e^{-x^2} dx with confidence interval
#include <bits/stdc++.h>
using namespace std;

// Estimate I = E[f(U)] with U ~ Uniform(0,1), f(x) = exp(-x^2).
// Also compute a 95% confidence interval using the sample variance.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const size_t N = 1'000'000; // number of samples

    // Seed PRNG with the high-resolution clock for variability
    std::mt19937_64 rng(static_cast<uint64_t>(
        chrono::high_resolution_clock::now().time_since_epoch().count()));
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    auto f = [](double x) {
        return exp(-x * x);
    };

    // Online mean and variance via Welford's algorithm for numerical stability
    long double mean = 0.0L;
    long double M2 = 0.0L; // sum of squares of differences from the current mean

    for (size_t i = 1; i <= N; ++i) {
        double x = unif(rng);
        double fx = f(x);
        long double delta = fx - mean;
        mean += delta / (long double)i;
        long double delta2 = fx - mean;
        M2 += delta * delta2; // accumulates squared deviations
    }

    long double sample_mean = mean;                      // \hat{\mu}
    long double sample_var = M2 / (long double)(N - 1);  // unbiased s^2
    long double se = sqrtl(sample_var / (long double)N); // standard error

    // 95% normal-approximation CI with z = 1.96
    const long double z = 1.96L;
    long double lo = sample_mean - z * se;
    long double hi = sample_mean + z * se;

    cout.setf(std::ios::fixed); cout << setprecision(8);
    cout << "Estimate of ∫_0^1 e^{-x^2} dx: " << (double)sample_mean << "\n";
    cout << "Standard error: " << (double)se << "\n";
    cout << "95% CI: [" << (double)lo << ", " << (double)hi << "]\n";

    // For reference, the true value is (sqrt(pi)/2) * erf(1) ≈ 0.746824
    return 0;
}

We sample U ~ Uniform(0,1) and average f(U)=exp(-U^2), which equals the integral over [0,1]. Welford’s algorithm provides a numerically stable running variance, from which we compute a 95% confidence interval using the CLT.

Time: O(N) · Space: O(1)
Antithetic variates: Estimate E[e^Z] with Z ~ N(0,1)
#include <bits/stdc++.h>
using namespace std;

// Compare plain Monte Carlo vs. antithetic variates for f(z)=exp(z), Z~N(0,1).
// True E[exp(Z)] = exp(1/2) ≈ 1.6487212707.

struct Stats { long double mean; long double var; };

// Compute mean and unbiased variance from a vector
// (for a fair comparison of per-evaluation variability)
Stats summarize(const vector<long double>& v) {
    size_t n = v.size();
    long double mean = 0.0L, M2 = 0.0L;
    for (size_t i = 0; i < n; ++i) {
        long double x = v[i];
        long double d = x - mean;
        mean += d / (long double)(i + 1);
        long double d2 = x - mean;
        M2 += d * d2;
    }
    long double var = (n > 1) ? M2 / (long double)(n - 1) : 0.0L;
    return {mean, var};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const size_t N = 1'000'000; // total function-evaluation budget

    std::mt19937_64 rng(123456789ULL); // fixed seed for reproducibility
    std::normal_distribution<double> normal(0.0, 1.0);

    auto f = [](double z) { return exp(z); };

    // Plain MC: use N evaluations
    vector<long double> vals_plain; vals_plain.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        double z = normal(rng);
        vals_plain.push_back(f(z));
    }
    Stats S_plain = summarize(vals_plain);
    long double mean_plain = S_plain.mean;
    long double se_plain = sqrtl(S_plain.var / (long double)N);

    // Antithetic: use pairs (Z, -Z); each pair contributes (f(Z)+f(-Z))/2.
    // Keep the overall function-evaluation budget the same by using N/2 pairs.
    size_t pairs = N / 2;
    vector<long double> pair_means; pair_means.reserve(pairs);
    for (size_t i = 0; i < pairs; ++i) {
        double z = normal(rng);
        long double g = 0.5L * ((long double)f(z) + (long double)f(-z));
        pair_means.push_back(g);
    }
    Stats S_anti = summarize(pair_means);
    long double mean_anti = S_anti.mean; // estimator using N evaluations total
    long double se_anti = sqrtl(S_anti.var / (long double)pairs);

    cout.setf(std::ios::fixed); cout << setprecision(8);
    cout << "True E[e^Z] = exp(1/2) ≈ 1.64872127\n";
    cout << "Plain MC:   mean = " << (double)mean_plain << ", SE ≈ " << (double)se_plain << "\n";
    cout << "Antithetic: mean = " << (double)mean_anti << ", SE ≈ " << (double)se_anti << "\n";
    cout << "SE reduction factor (plain/anti): " << (double)(se_plain / se_anti) << "\n";

    return 0;
}

We estimate E[exp(Z)] using plain Monte Carlo and with antithetic variates by pairing Z with −Z from the symmetric normal distribution. Averaging f(Z) and f(−Z) induces negative correlation, reducing variance without changing the expectation. We report standard errors under an equal total function-evaluation budget.

Time: O(N) · Space: O(N) for storing samples (can be reduced to O(1) with online updates)
Importance sampling: Estimate P(Z > 5) with Z ~ N(0,1)
#include <bits/stdc++.h>
using namespace std;

// Estimate a rare-event probability p = P(Z > 5) where Z ~ N(0,1).
// Naive Monte Carlo would require ~1/p samples; here we use importance
// sampling with Q = N(mu,1), mu = 5.
// Weight: w(y) = p(y)/q(y) = exp(-mu*y + 0.5*mu*mu), computed in the
// log-domain for stability.

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const size_t N = 500'000; // number of importance samples
    const double mu = 5.0;    // shift toward the rare-event region

    std::mt19937_64 rng(2024ULL);
    std::normal_distribution<double> q_dist(mu, 1.0);  // proposal Q = N(mu,1)
    std::normal_distribution<double> p_dist(0.0, 1.0); // for naive reference

    // Importance sampling estimator
    long double mean_is = 0.0L, M2_is = 0.0L; // online stats for the weighted indicator

    for (size_t i = 1; i <= N; ++i) {
        double y = q_dist(rng);
        // log w = -mu * y + 0.5 * mu^2 (derived from the Gaussian density ratio)
        long double logw = -mu * (long double)y + 0.5L * mu * mu;
        long double w = expl(logw);
        long double h = (y > 5.0) ? w : 0.0L; // indicator * weight

        long double delta = h - mean_is;
        mean_is += delta / (long double)i;
        long double delta2 = h - mean_is;
        M2_is += delta * delta2;
    }
    long double var_is = M2_is / (long double)(N - 1);
    long double se_is = sqrtl(var_is / (long double)N);

    // Naive MC for comparison (same N). Likely returns zero due to rarity.
    size_t count_naive = 0;
    for (size_t i = 0; i < N; ++i) {
        double z = p_dist(rng);
        if (z > 5.0) ++count_naive;
    }
    long double est_naive = (long double)count_naive / (long double)N;

    cout.setf(std::ios::fixed); cout << setprecision(12);
    cout << "Importance sampling estimate P(Z>5): " << (double)mean_is << "\n";
    cout << "Standard error (IS): " << (double)se_is << "\n";
    cout << "Naive MC estimate (same N): " << (double)est_naive << "\n";
    cout << "Reference true value ≈ 2.867e-7 (for N(0,1))\n";

    return 0;
}

The rare event Z>5 has probability ≈ 2.87e−7, so naive Monte Carlo with moderate N will almost always see zero hits. Importance sampling draws from N(5,1), where such samples are common, and corrects with the weight w = p/q. We maintain online mean and variance to report a standard error. We compute weights in log-space for numerical stability.

Time: O(N) · Space: O(1)
#monte carlo#expectation#variance reduction#importance sampling#antithetic variates#confidence interval#central limit theorem#law of large numbers#pseudorandom#standard error#effective sample size#rare event#integration#high-dimensional#sample mean