⚙️ Algorithm · Intermediate

Numerical Integration & Monte Carlo

Key Points

  • Numerical integration approximates the area under a curve when an exact antiderivative is unknown, using deterministic quadrature rules or random sampling (Monte Carlo).
  • Composite trapezoid and Simpson’s rules sample a function at evenly spaced points and combine them with weights to approximate the integral, with errors O(h²) and O(h⁴), respectively.
  • Adaptive quadrature refines the step size where the function changes quickly, balancing accuracy and efficiency automatically.
  • Monte Carlo integration estimates integrals by averaging random samples; its error shrinks like O(1/√N) and does not explode with dimension.
  • Variance reduction techniques such as importance sampling and stratified sampling can dramatically improve Monte Carlo accuracy for the same number of samples.
  • Deterministic quadrature excels for smooth low-dimensional integrals; Monte Carlo is often superior in high dimensions or with irregular domains.
  • Always monitor error: use error bounds for quadrature and confidence intervals (via sample variance) for Monte Carlo.
  • Function evaluation cost often dominates; when the integrand is expensive, choose methods that minimize evaluations.

Prerequisites

  • →Definite integrals and the Fundamental Theorem of Calculus — Understanding what an integral represents and why antiderivatives are not always available motivates numerical methods.
  • →Derivatives and smoothness — Error bounds for quadrature rules depend on second and fourth derivatives.
  • →Big-O notation — To interpret convergence rates and computational complexity.
  • →Basic probability (expectation, variance) — Monte Carlo views integrals as expectations and relies on variance for error estimation.
  • →Random number generation — Implementing Monte Carlo requires i.i.d. pseudo-random samples and proper seeding.
  • →Floating-point arithmetic — Numerical stability, precision, and accumulation errors affect accuracy.
  • →Recursion and divide-and-conquer — Adaptive quadrature uses recursion to refine intervals.
  • →Change of variables (Jacobian) — Essential for mapping complex or infinite domains to simpler ones in both quadrature and Monte Carlo.
  • →Basic C++ standard library — Using std::function, <random>, and numeric utilities for implementations.

Detailed Explanation


01 Overview

Numerical integration is about estimating definite integrals when finding an exact antiderivative is impractical or impossible. Two major families are deterministic quadrature rules and stochastic (Monte Carlo) methods. Quadrature rules, like the trapezoidal and Simpson’s rules, approximate the integrand with simple polynomials over subintervals and sum the resulting areas. They are very effective in one or a few dimensions, especially for smooth functions, and offer predictable error behavior as the mesh is refined.

Monte Carlo integration approaches the same goal differently: it treats the integral as an expectation and approximates it by averaging random samples. Its hallmark is a dimension-agnostic convergence rate: the standard error decreases as 1/√N regardless of how many dimensions you integrate over. This makes Monte Carlo particularly attractive in high-dimensional problems, where grid-based quadrature suffers from the curse of dimensionality.

Adaptive methods further refine deterministic rules by allocating more samples where the function changes rapidly and fewer where it is smooth. On the stochastic side, variance reduction techniques like importance sampling and stratified sampling can dramatically reduce the number of samples needed for a desired accuracy. Choosing between these methods depends on the integrand’s smoothness, dimensionality, domain shape, and whether reliable error bars are needed.

02 Intuition & Analogies

Imagine you want to measure the area of a curved, uneven garden bed. One approach is to lay a row of planks across it, measuring the width at many spots and averaging—the trapezoidal rule. If you fit small parabolic arches between neighboring measurements, you can follow the curve more closely—this is Simpson’s rule. If you notice that one patch is very wiggly while others are flat, you’d take more measurements in the wiggly area and fewer elsewhere—adaptive quadrature.

Monte Carlo is like estimating the garden’s area by tossing random pebbles over a bounding rectangle and counting how many land inside the bed. The fraction that lands inside, multiplied by the rectangle’s area, estimates the garden’s area. The more pebbles you toss, the better your estimate, and the error shrinks with the square root of the number of pebbles. If the garden is oddly shaped or occupies only a small fraction of the rectangle, you can aim your tosses more cleverly—importance sampling—so more pebbles land where they matter.

In one or two dimensions with smooth edges, careful tape measurements (deterministic quadrature) are fast and precise. But if your garden is actually a maze with many twists in ten dimensions (think probabilities over many variables), grid-based measurements explode in cost, while pebble tossing (Monte Carlo) keeps the same reliable convergence rhythm. The art is choosing the right tool and, when using random pebbles, throwing them smartly to reduce wasted effort.
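The pebble-tossing picture translates almost directly into code. Below is a minimal hit-or-miss sketch, assuming a quarter circle of radius 1 as the “garden” and the unit square as the bounding rectangle (illustrative choices, not one of this lesson’s worked examples):

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Hit-or-miss Monte Carlo: estimate the area of the quarter circle
    // x^2 + y^2 <= 1 inside the unit square. Exact area: pi/4.
    mt19937_64 gen(42); // fixed seed for reproducibility
    uniform_real_distribution<double> U(0.0, 1.0);

    const uint64_t N = 1'000'000;
    uint64_t hits = 0;
    for (uint64_t i = 0; i < N; ++i) {
        double x = U(gen), y = U(gen);
        if (x * x + y * y <= 1.0) ++hits; // pebble landed inside the bed
    }
    double frac = double(hits) / double(N);
    double area = frac * 1.0; // fraction of hits times bounding-square area
    cout << "Estimated quarter-circle area: " << area
         << " (exact: " << acos(-1.0) / 4.0 << ")\n";
    return 0;
}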

03 Formal Definition

Given an integrable function f on a domain D, the goal is to approximate I = ∫_D f(x) dx. A (deterministic) quadrature rule approximates I by a weighted sum of function values at selected nodes: Q_n(f) = ∑_{i=1}^{n} w_i f(x_i), where the nodes x_i ∈ D and the weights w_i depend on the rule. Examples include the trapezoidal rule and Simpson’s rule on intervals [a,b], and Gaussian quadrature, which chooses nodes and weights so that polynomials up to degree 2n−1 are integrated exactly. Composite rules partition [a,b] into m subintervals, apply a base rule on each, and sum the results. For smooth f, error bounds relate the step size h to derivatives of f; e.g., the trapezoid error is O(h²) and Simpson’s is O(h⁴).

Monte Carlo treats I as an expectation. If X ∼ Unif(D), then I = vol(D) E[f(X)]. The estimator Î_N = vol(D) · (1/N) ∑_{k=1}^{N} f(X_k) is unbiased, with variance Var(Î_N) = vol(D)² Var(f(X))/N. By the Central Limit Theorem, Î_N is approximately normal for large N, enabling confidence intervals. More generally, with importance sampling from a density g whose support covers D, I = E_g[f(X)/g(X)], estimated by (1/N) ∑_k f(X_k)/g(X_k) when X_k ∼ g.
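Every deterministic rule below is an instance of the weighted-sum form Q_n(f) = ∑ w_i f(x_i). A minimal sketch of that generic form, using trapezoid nodes and weights as one illustrative instantiation:

#include <bits/stdc++.h>
using namespace std;

// Generic quadrature: a weighted sum of function values at given nodes.
double quadrature(const function<double(double)>& f,
                  const vector<double>& nodes, const vector<double>& weights) {
    double sum = 0.0;
    for (size_t i = 0; i < nodes.size(); ++i) sum += weights[i] * f(nodes[i]);
    return sum;
}

int main() {
    // Trapezoid rule on [0,1] with h = 0.5, expressed as nodes/weights:
    // w = {h/2, h, h/2} at x = {0, 0.5, 1}.
    vector<double> x = {0.0, 0.5, 1.0};
    vector<double> w = {0.25, 0.5, 0.25};
    auto f = [](double t) { return t * t; }; // exact integral on [0,1] is 1/3
    cout << quadrature(f, x, w) << "\n";     // prints 0.375, the trapezoid estimate
    return 0;
}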

04 When to Use

  • Smooth, low-dimensional (1–3D) integrals over simple intervals: choose composite Simpson’s rule or Gaussian quadrature for high accuracy with few evaluations.
  • Functions with localized sharp features or boundary layers: adaptive quadrature allocates effort where needed, often outperforming fixed-step methods.
  • High-dimensional integrals (d ≥ 5), integrals over complex domains, or integrals defined as expectations in probability/statistics: Monte Carlo or quasi–Monte Carlo typically scale better than grid-based quadrature.
  • When reliable error bars are required (e.g., in uncertainty quantification): Monte Carlo provides natural standard error estimates; deterministic rules need problem-specific error bounds or a refinement study.
  • Expensive integrand evaluations (e.g., simulations): prefer methods requiring fewer evaluations for a given accuracy—often adaptive quadrature in 1D or variance-reduced Monte Carlo in higher dimensions.
  • Infinite or semi-infinite domains: use change-of-variables, specialized quadrature rules, or importance sampling (e.g., exponential proposals for decaying tails); a change-of-variables sketch follows this list.
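As a sketch of the change-of-variables option above: the map x = t/(1−t) sends [0,1) to [0,∞) with Jacobian 1/(1−t)². The integrand x²e^(−x), whose exact integral is 2, is an illustrative choice, and the composite Simpson helper follows the same scheme as the full example further below:

#include <bits/stdc++.h>
using namespace std;

// Composite Simpson on [a,b] with even n (no validation; sketch only).
double simpson(const function<double(double)>& f, double a, double b, int n) {
    double h = (b - a) / n, sum = f(a) + f(b);
    for (int i = 1; i < n; ++i) sum += ((i % 2) ? 4.0 : 2.0) * f(a + i * h);
    return h / 3.0 * sum;
}

int main() {
    // I = int_0^inf x^2 e^{-x} dx = 2. Substitute x = t/(1-t), dx = dt/(1-t)^2,
    // turning it into an integral over [0,1). We stop just short of t = 1,
    // where the transformed integrand has already decayed to 0.
    auto g = [](double t) {
        double x = t / (1.0 - t);
        double jac = 1.0 / ((1.0 - t) * (1.0 - t)); // |dx/dt|
        return x * x * exp(-x) * jac;
    };
    cout << fixed << setprecision(10)
         << simpson(g, 0.0, 1.0 - 1e-9, 1000) << "\n"; // ~2.0000000000
    return 0;
}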

⚠️Common Mistakes

  • Using Simpson’s rule with an odd number of subintervals (it requires an even count). Always check that n is even for composite Simpson.
  • Ignoring smoothness assumptions: applying high-order rules to non-smooth or discontinuous functions can degrade accuracy; prefer adaptive or Monte Carlo in such cases.
  • Forgetting Jacobians in change-of-variables: when mapping domains (e.g., infinite to finite intervals), always multiply by the absolute derivative of the transformation.
  • Mismanaging random numbers: not seeding the RNG properly, using low-quality RNGs, or unintentionally correlating samples (e.g., reusing RNG state across threads without care) biases or inflates variance.
  • Reporting Monte Carlo means without uncertainty: always compute sample variance and provide confidence intervals; otherwise, results can be misleading.
  • Overlooking the curse of dimensionality in deterministic grids: cost explodes as m^d; switch to Monte Carlo or sparse grids for higher d.
  • Numerical issues: using float instead of double for delicate integrals, summing in naive order (catastrophic cancellation; see the compensated-summation sketch after this list), or failing to guard adaptive recursion against infinite loops or stack overflow.
  • Poor importance sampling choices: choosing g too small where f is large causes huge weights and variance; design g to mimic the shape of |f|.
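For the naive-summation pitfall above, Kahan compensated summation is one cheap defense, relevant to both large-n quadrature and large-N Monte Carlo running sums. A minimal sketch (the repeated-0.1 test data is illustrative):

#include <bits/stdc++.h>
using namespace std;

// Kahan compensated summation: tracks the low-order bits lost at each add.
double kahan_sum(const vector<double>& xs) {
    double sum = 0.0, c = 0.0; // c compensates for lost low-order bits
    for (double x : xs) {
        double y = x - c;      // subtract the previous compensation
        double t = sum + y;    // low-order digits of y may be lost here
        c = (t - sum) - y;     // recover what was lost
        sum = t;
    }
    return sum;
}

int main() {
    // One million copies of 0.1. Since 0.1 is not exactly representable in
    // binary, even the exact sum of the stored doubles differs slightly from
    // 100000; Kahan matches that correctly rounded sum, naive accumulation drifts.
    vector<double> xs(1'000'000, 0.1);
    double naive = accumulate(xs.begin(), xs.end(), 0.0);
    cout << setprecision(17)
         << "naive: " << naive << "\n"
         << "kahan: " << kahan_sum(xs) << "\n";
    return 0;
}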

Key Formulas

Definite integral

I = \int_a^b f(x)\,dx

Explanation: The area under the curve f between a and b. This is what numerical integration seeks to approximate.

General quadrature

\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} w_i\, f(x_i)

Explanation: Any quadrature rule computes a weighted sum of function values at specific nodes. Choosing nodes and weights defines the method.

Composite trapezoidal rule

\int_a^b f(x)\,dx \approx \frac{h}{2}\left( f(a) + 2\sum_{i=1}^{n-1} f(a+ih) + f(b) \right)

Explanation: Approximates the integral by trapezoids over n subintervals of width h=(b−a)/n. Error decreases quadratically with h for smooth functions.

Composite Simpson’s rule

\int_a^b f(x)\,dx \approx \frac{h}{3}\left( f(a) + 4\sum_{\substack{i=1\\ i\ \mathrm{odd}}}^{n-1} f(a+ih) + 2\sum_{\substack{i=2\\ i\ \mathrm{even}}}^{n-2} f(a+ih) + f(b) \right)

Explanation: Uses parabolic fits over pairs of subintervals (n must be even). Error decreases as h⁴ for sufficiently smooth f.

Trapezoid error (global)

E_T = -\frac{(b-a)}{12}\, h^2 f''(\xi)

Explanation: For some ξ in [a,b], the trapezoid error behaves like a constant times h² times the second derivative. Smaller h or smaller curvature reduces error.

Simpson error (global)

E_S = -\frac{(b-a)}{180}\, h^4 f^{(4)}(\xi)

Explanation: For some ξ in [a,b], Simpson’s error is proportional to h⁴ and the fourth derivative. It drops much faster than the trapezoid error for smooth f.

Integral as expectation

I = \mathrm{vol}(D)\,\mathbb{E}[f(X)], \qquad X \sim \mathrm{Unif}(D)

Explanation: Rewrites the integral over D as the volume times the average of f over uniformly drawn points. This enables Monte Carlo estimation.

Monte Carlo estimator

\hat{I}_N = \mathrm{vol}(D)\,\frac{1}{N}\sum_{k=1}^{N} f(X_k)

Explanation: A simple unbiased estimator of the integral using N i.i.d. uniform samples on D. Its variance shrinks as 1/N.

Monte Carlo variance

\mathrm{Var}(\hat{I}_N) = \frac{\mathrm{vol}(D)^2\,\mathrm{Var}(f(X))}{N}

Explanation: Shows that the standard error scales as 1/√N. Doubling accuracy typically requires quadrupling the samples.

CLT for Monte Carlo

\hat{I}_N \overset{\text{approx.}}{\sim} \mathcal{N}\big(I,\ \mathrm{Var}(\hat{I}_N)\big)

Explanation: By the Central Limit Theorem, the estimator is approximately normal for large N, supporting confidence intervals.

Importance sampling identity

I = \int_D f(x)\,dx = \int_D \frac{f(x)}{g(x)}\, g(x)\,dx = \mathbb{E}_g\!\left[\frac{f(X)}{g(X)}\right]

Explanation: Sampling from a density g that resembles f and reweighting by f/g can reduce variance dramatically if g is well chosen.

Change of variables

\int_D f(x)\,dx = \int_{T^{-1}(D)} f(T(u))\,\lvert \det J_T(u)\rvert\, du

Explanation: Transforming the domain requires multiplying by the absolute Jacobian determinant. This is essential for mapping infinite to finite intervals.

Grid growth (curse of dimensionality)

m^d

Explanation: Using m points per dimension in d dimensions requires m^d function evaluations, which becomes impractical as d grows.

Gaussian quadrature exactness

Gauss–Legendre exactness: degree ≤2n−1

Explanation: An n-point Gauss–Legendre rule integrates any polynomial up to degree 2n−1 exactly, explaining its high accuracy on smooth functions.
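A quick numerical check of this exactness, using the standard 3-point Gauss–Legendre nodes and weights on [−1,1] and an illustrative degree-5 polynomial whose exact integral is 28/15:

#include <bits/stdc++.h>
using namespace std;

int main() {
    // 3-point Gauss-Legendre on [-1,1]: exact for polynomials of degree <= 5.
    const double x[3] = {-sqrt(3.0 / 5.0), 0.0, sqrt(3.0 / 5.0)};
    const double w[3] = {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0};

    // Degree-5 test polynomial; its exact integral over [-1,1] is 28/15.
    auto f = [](double t) { return pow(t, 5) + 3.0 * pow(t, 4) + t * t; };

    double Q = 0.0;
    for (int i = 0; i < 3; ++i) Q += w[i] * f(x[i]);

    cout << setprecision(16)
         << "Gauss-Legendre (n=3): " << Q << "\n"
         << "Exact (28/15):        " << 28.0 / 15.0 << "\n";
    return 0;
}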

Complexity Analysis

For 1D deterministic quadrature on a uniform grid with n subintervals, both composite trapezoid and Simpson’s rules require O(n) function evaluations and O(1) auxiliary memory. Their computational cost is dominated by evaluating f; arithmetic overhead is linear and typically negligible by comparison. The global error scales as O(h²) for trapezoid and O(h⁴) for Simpson, where h = (b−a)/n. Achieving a target error ε thus requires n = O(ε^(−1/2)) for trapezoid and n = O(ε^(−1/4)) for Simpson when f is sufficiently smooth, translating into O(ε^(−1/2)) and O(ε^(−1/4)) evaluations, respectively.

Adaptive quadrature attempts to minimize total evaluations by concentrating effort where the function is difficult. While worst-case bounds resemble those of fixed rules, practical performance often improves substantially. The recursion and bookkeeping add small constant factors in time and space; with an explicit stack or bounded recursion, memory remains O(depth), typically modest.

Monte Carlo integration with N samples requires O(N) evaluations of f and O(1) additional memory (tracking running sums and possibly variance). Its standard error decreases as O(1/√N), independent of dimension, which is crucial for high-dimensional problems. However, the constant factor (the variance of the integrand under the sampling distribution) can be large. Variance reduction (importance sampling, stratification, antithetic variates) reduces this constant, effectively improving accuracy at the same O(N) cost. In contrast, deterministic tensor-product quadrature in d dimensions requires O(m^d) evaluations for m points per dimension, making it impractical as d grows (the curse of dimensionality). Thus, method selection hinges on dimensionality, integrand smoothness, and evaluation cost.
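To make the ε-scaling concrete, here is the arithmetic for an illustrative target accuracy of ε = 10⁻⁸ (constants and smoothness assumptions omitted):

\varepsilon \sim n^{-2} \;\Rightarrow\; n_{\mathrm{trap}} \sim \varepsilon^{-1/2} = (10^{-8})^{-1/2} = 10^{4},
\qquad
\varepsilon \sim n^{-4} \;\Rightarrow\; n_{\mathrm{Simp}} \sim \varepsilon^{-1/4} = (10^{-8})^{-1/4} = 10^{2}

Simpson thus needs roughly a hundredth of the trapezoid’s evaluations at this tolerance, while Monte Carlo, with error ∼ σ/√N, would need N ∼ 10¹⁶ samples for σ ≈ 1; this is why it is reserved for settings where smooth low-dimensional quadrature does not apply.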

Code Examples

Composite Trapezoidal and Simpson’s Rules in 1D
#include <bits/stdc++.h>
using namespace std;

// Composite trapezoidal rule on [a,b] with n subintervals (n >= 1)
double composite_trapezoid(const function<double(double)>& f, double a, double b, int n) {
    if (n <= 0) throw invalid_argument("n must be positive");
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) {
        sum += f(a + i * h);
    }
    return h * sum;
}

// Composite Simpson's rule on [a,b] with n subintervals (n must be even)
double composite_simpson(const function<double(double)>& f, double a, double b, int n) {
    if (n <= 0 || (n % 2 != 0)) throw invalid_argument("n must be positive and even for Simpson's rule");
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    // Odd indices get weight 4; even interior indices get weight 2
    for (int i = 1; i < n; ++i) {
        double coeff = (i % 2 == 1) ? 4.0 : 2.0;
        sum += coeff * f(a + i * h);
    }
    return (h / 3.0) * sum;
}

int main() {
    // Example: integrate sin(x) on [0, pi]; exact value is 2
    auto f = [](double x) { return sin(x); };
    double a = 0.0, b = acos(-1.0); // pi (portable; M_PI is a non-standard extension)

    for (int n : {10, 20, 40, 80}) {
        double T = composite_trapezoid(f, a, b, n);
        double S = composite_simpson(f, a, b, (n % 2 == 0) ? n : n + 1); // ensure even n for Simpson
        cout.setf(ios::fixed); cout << setprecision(10);
        cout << "n=" << n << ": Trapezoid= " << T << ", Simpson= " << S
             << ", Error_T= " << fabs(T - 2.0) << ", Error_S= " << fabs(S - 2.0) << "\n";
    }
    return 0;
}

This program implements composite trapezoidal and Simpson’s rules for a given function on an interval. It integrates sin(x) over [0, π], whose exact integral is 2, and prints the approximations and absolute errors for increasing n. Simpson’s rule converges much faster than trapezoid for this smooth function, as predicted by their error orders.

Time: O(n) function evaluations · Space: O(1) extra space
Adaptive Simpson’s Rule with Error Control
#include <bits/stdc++.h>
using namespace std;

// Recursive adaptive Simpson with cached function values fa, fm, fb
static double adaptive_simpson_rec(const function<double(double)>& f, double a, double b,
                                   double eps, double S, double fa, double fm, double fb,
                                   int depth, int maxDepth, int& evals) {
    double c = 0.5 * (a + b);
    double left_m = 0.5 * (a + c);
    double right_m = 0.5 * (c + b);

    double f_left_m = f(left_m); ++evals;
    double f_right_m = f(right_m); ++evals;

    double Sleft = (c - a) * (fa + 4.0 * f_left_m + fm) / 6.0;
    double Sright = (b - c) * (fm + 4.0 * f_right_m + fb) / 6.0;
    double S2 = Sleft + Sright;

    // Standard Simpson error estimate: the error of S2 is about |S2 - S|/15
    if (depth >= maxDepth || fabs(S2 - S) <= 15.0 * eps) {
        // Richardson extrapolation for an improved estimate
        return S2 + (S2 - S) / 15.0;
    }
    // Recurse on left and right halves with halved tolerance (balanced split)
    double left = adaptive_simpson_rec(f, a, c, eps * 0.5, Sleft, fa, f_left_m, fm, depth + 1, maxDepth, evals);
    double right = adaptive_simpson_rec(f, c, b, eps * 0.5, Sright, fm, f_right_m, fb, depth + 1, maxDepth, evals);
    return left + right;
}

// Public driver: returns {integral estimate, number of function evaluations}
pair<double,int> adaptive_simpson(const function<double(double)>& f, double a, double b, double eps, int maxDepth = 20) {
    int evals = 0;
    double c = 0.5 * (a + b);
    double fa = f(a); ++evals;
    double fb = f(b); ++evals;
    double fm = f(c); ++evals;
    double S = (b - a) * (fa + 4.0 * fm + fb) / 6.0; // basic Simpson on [a,b]
    double res = adaptive_simpson_rec(f, a, b, eps, S, fa, fm, fb, 0, maxDepth, evals);
    return {res, evals};
}

int main() {
    // Integrate exp(-x^2) from 0 to 1; reference: 0.5*sqrt(pi)*erf(1)
    auto f = [](double x) { return exp(-x * x); };
    double a = 0.0, b = 1.0;
    double eps = 1e-10; // desired absolute error

    auto [I, evals] = adaptive_simpson(f, a, b, eps, 30);

    // Reference value via the error function
    double ref = 0.5 * sqrt(acos(-1.0)) * erf(1.0);

    cout.setf(ios::fixed); cout << setprecision(12);
    cout << "Adaptive Simpson result = " << I << "\n";
    cout << "Reference               = " << ref << "\n";
    cout << "Abs error               = " << fabs(I - ref) << "\n";
    cout << "Function evaluations    = " << evals << "\n";
    return 0;
}

This implementation of adaptive Simpson’s rule refines intervals until the estimated local error is below a given tolerance or a maximum depth is reached. It caches function values to avoid recomputation and returns both the integral estimate and the number of function evaluations. The example integrates e^{−x^2} on [0,1] and compares with a reference using the error function.

Time: O(m) function evaluations, where m depends on tolerance and integrand smoothness · Space: O(depth) due to recursion (typically small)
Monte Carlo Integration with Confidence Intervals and Importance Sampling
#include <bits/stdc++.h>
using namespace std;

// Note: the member is named std_err rather than stderr, since stderr is a macro from <cstdio>.
struct MCResult { double estimate; double std_err; double ci_low; double ci_high; };

// Simple Monte Carlo for an integral over [0,1]: I = int_0^1 f(x) dx
MCResult mc_unit_interval(const function<double(double)>& f, uint64_t N, mt19937_64& gen) {
    uniform_real_distribution<double> U(0.0, 1.0);
    long double sum = 0.0L, sum2 = 0.0L;
    for (uint64_t i = 0; i < N; ++i) {
        double x = U(gen);
        double y = f(x);
        sum += y;
        sum2 += (long double)y * y;
    }
    long double mean = sum / (long double)N;
    long double var = (sum2 / (long double)N) - mean * mean; // population variance estimate
    if (var < 0) var = 0; // guard against numerical noise
    double se = sqrt((double)var / (double)N); // standard error of the mean
    double est = (double)mean; // volume of [0,1] is 1
    double z = 1.96; // ~95% CI
    return {est, se, est - z * se, est + z * se};
}

// Importance sampling on [0, +inf) for I = int_0^inf f(x) dx using proposal g(x) = Exp(1)
MCResult mc_importance_exp(const function<double(double)>& f_over_g, uint64_t N, mt19937_64& gen) {
    // f_over_g(x) must equal f(x)/g(x); for g(x) = e^{-x} on x >= 0, sample X = -ln(1-U)
    uniform_real_distribution<double> U(0.0, 1.0);
    long double sum = 0.0L, sum2 = 0.0L;
    for (uint64_t i = 0; i < N; ++i) {
        double u = U(gen);
        double x = -log(1.0 - u); // inverse-CDF sampling for Exp(1); 1-u > 0 since u is in [0,1)
        double w = f_over_g(x);   // equals f(x)/g(x)
        sum += w;
        sum2 += (long double)w * w;
    }
    long double mean = sum / (long double)N;
    long double var = (sum2 / (long double)N) - mean * mean;
    if (var < 0) var = 0;
    double se = sqrt((double)var / (double)N);
    double est = (double)mean; // no volume factor: this is an expectation under g
    double z = 1.96;
    return {est, se, est - z * se, est + z * se};
}

int main() {
    random_device rd; mt19937_64 gen(rd());

    // Example 1: I = int_0^1 sin(pi x) dx = 2/pi
    auto f1 = [](double x) { return sin(acos(-1.0) * x); }; // sin(pi*x)
    uint64_t N1 = 1'000'000;
    auto R1 = mc_unit_interval(f1, N1, gen);
    cout.setf(ios::fixed); cout << setprecision(8);
    cout << "Integral of sin(pi x) on [0,1]:\n";
    cout << "MC estimate = " << R1.estimate << ", SE = " << R1.std_err
         << ", 95% CI = [" << R1.ci_low << ", " << R1.ci_high << "]\n";
    cout << "Exact = " << (2.0 / acos(-1.0)) << "\n\n";

    // Example 2: Importance sampling for I = int_0^inf x^2 e^{-x} dx = 2
    // With g(x) = Exp(1), f(x)/g(x) = x^2 e^{-x} / e^{-x} = x^2 (low variance vs naive)
    auto f_over_g = [](double x) { return x * x; };
    uint64_t N2 = 500'000;
    auto R2 = mc_importance_exp(f_over_g, N2, gen);
    cout << "Integral of x^2 e^{-x} on [0,inf):\n";
    cout << "IS estimate = " << R2.estimate << ", SE = " << R2.std_err
         << ", 95% CI = [" << R2.ci_low << ", " << R2.ci_high << "]\n";
    cout << "Exact = 2.0\n";

    return 0;
}

The first routine estimates ∫_0^1 f(x)dx by averaging f(U) for U∼Uniform(0,1) and reports a 95% confidence interval using the sample variance. The example integrates sin(πx). The second routine performs importance sampling for I=∫_0^∞ x² e^(−x) dx using g(x)=Exp(1). Because f(x)/g(x)=x², the estimator has much smaller variance than naive sampling over a truncated domain. Both use C++’s Mersenne Twister RNG and report standard errors, with confidence intervals justified by the CLT.

Time: O(N) for N samples · Space: O(1) extra space
#numerical integration#quadrature#trapezoidal rule#simpson's rule#adaptive quadrature#monte carlo integration#importance sampling#variance reduction#confidence interval#high-dimensional integration#law of large numbers#central limit theorem#gaussian quadrature#change of variables