How I Study AI - Learn AI Papers & Lectures the Easy Way
Algorithm · Intermediate

Stratified & Latin Hypercube Sampling

Key Points

  • Stratified sampling reduces Monte Carlo variance by dividing the domain into non-overlapping regions (strata) and sampling within each region.
  • Latin Hypercube Sampling (LHS) ensures uniform coverage in every dimension by taking exactly one sample from each 1/n interval per dimension and permuting independently across dimensions.
  • Both methods keep marginal distributions uniform while improving coverage, leading to more stable estimates for the same number of samples.
  • In practice, stratified sampling needs correct weighting when strata have different sizes; otherwise estimates become biased.
  • LHS is especially effective in moderate dimensions when the function varies smoothly or monotonically along each coordinate.
  • Time complexity for generating n samples in d dimensions is O(n d) and space complexity is typically O(n d).
  • Random jitter inside each stratum prevents grid artifacts and preserves unbiasedness.
  • Use independent shuffles per dimension for LHS; reusing the same permutation can reintroduce unwanted correlations.

Prerequisites

  • Uniform random number generation — Sampling within strata and jittering rely on drawing uniform random variables.
  • Basic probability and expectation — Understanding unbiased estimators and variance reduction requires the concepts of expectation and variance.
  • Monte Carlo integration — Stratification and LHS are alternative designs for Monte Carlo estimators.
  • Permutations and shuffling (Fisher–Yates) — LHS construction needs independent random permutations per dimension.
  • Asymptotic notation — To analyze time and space complexity and compare methods using O(·) notation.
  • Numerical integration over intervals and boxes — Provides context for what is being estimated and why coverage matters.

Detailed Explanation


01 Overview

Hook: Imagine surveying a city: if you only ask people downtown at noon, you miss commuters and night-shift workers. A better plan is to split the city into neighborhoods and sample each one. That way, your average reflects the whole city.

Concept: Stratified sampling formalizes this idea for Monte Carlo—split the domain into strata, then sample within each stratum and combine results with appropriate weights. Latin Hypercube Sampling (LHS) goes a step further for multi-dimensional problems by enforcing even coverage along every coordinate, so you don’t accidentally cluster samples in the same x- or y-interval.

Example: To estimate an integral over [0,1], we can divide it into n equal subintervals and pick one random point per subinterval. In 2D, stratified sampling might use an s×s grid; in d dimensions, LHS chooses exactly one point from each of the n subintervals along every axis, pairing them via independent permutations to keep marginals uniform but reduce clumping. The result is lower estimator variance for many functions compared to naive Monte Carlo with the same sample budget.

02 Intuition & Analogies

Hook: Filling a chocolate box evenly is easier if you place one piece in each slot rather than tossing them randomly and hoping every slot gets filled. Stratified and Latin hypercube sampling do exactly that for the space of possibilities.

Concept: Random sampling can bunch points in some regions and leave gaps in others purely by chance. Stratification is like drawing a grid over the space and promising yourself you’ll visit every cell. That promise reduces randomness where it hurts (coverage) while keeping enough randomness (jitter) to avoid bias. LHS adds a clever twist for high dimensions: instead of a full grid (which explodes combinatorially), it ensures one sample per 1/n slice of each axis and then shuffles those slices independently per dimension. That way, each coordinate is well spread, but you still get n total samples, not n^d.

Example: Suppose you’re testing a video game across screen sizes and frame rates. Pure random tests might accidentally cluster around mid-range screens and 60 FPS. Stratified sampling would split screen sizes and frame rates into bins and test each bin. LHS would ensure that across n tests, every screen-size bin and every FPS bin appears exactly once, paired randomly, giving broad coverage without an exponential blowup of combinations.

03 Formal Definition

Hook: Coverage without explosion.

Concept: Let D be a measurable domain with measure μ. A stratification is a collection of disjoint sets {S_h}_{h=1}^{H} such that D = ⋃_{h=1}^{H} S_h. If we draw n_h independent samples X_{h,i} uniformly from S_h, the stratified estimator for the mean of f over D is μ̂ = ∑_{h=1}^{H} W_h f̄_h, where W_h = μ(S_h)/μ(D) and f̄_h = (1/n_h) ∑_{i=1}^{n_h} f(X_{h,i}). Under standard assumptions, μ̂ is unbiased for μ(f) = (1/μ(D)) ∫_D f(x) dμ(x). For equal-sized strata and one sample per stratum, this simplifies to the average of f evaluated at stratified points. LHS in d dimensions with n samples constructs points X_i = (X_{i1}, …, X_{id}) in [0,1]^d such that for each coordinate j, the set {X_{1j}, …, X_{nj}} contains exactly one element from each interval ((k−1)/n, k/n], k = 1, …, n. One implementation samples U_{ij} ~ Uniform(0,1), draws independent permutations π_j of {1, …, n}, and sets X_{ij} = (π_j(i) − U_{ij})/n. This guarantees uniform marginals and improved space-filling properties while keeping the total number of samples at n.

04 When to Use

Hook: When randomness alone leaves holes, force coverage.

Concept: Use stratified sampling whenever your function’s behavior differs across regions and you can define meaningful strata (e.g., time-of-day segments, spatial tiles, parameter bins). It excels in Monte Carlo integration and simulation where variance matters: rendering (anti-aliasing, soft shadows), rare-event estimation with region-aware allocation, and survey sampling with known subpopulations. LHS is ideal for moderate-dimensional (say d up to a few dozen) uncertainty quantification and design of experiments—calibration, sensitivity analysis, and surrogate modeling—because it spreads samples evenly in each coordinate without an exponential grid.

Example:
  • Estimating ∫_{[0,1]^d} f(x) dx when f changes faster near boundaries—use more/unequal strata there.
  • Running n expensive simulations that depend on d uncertain inputs—use LHS to cover each input’s range well with just n runs.
  • Generating blue-noise-like sample sets for 2D images—use a jittered grid (stratified 2D) for aesthetically uniform points.

⚠️Common Mistakes

Hook: Good coverage can be undone by small implementation slips.

Concept: Common pitfalls include (1) forgetting weights when strata have different sizes, which biases the estimator; (2) using too few samples per stratum, making within-stratum variance estimates unstable; (3) reusing the same permutation across dimensions in LHS, which creates artificial correlations; (4) skipping random jitter and placing points on cell centers, which can bias estimates if f is aligned with the grid; (5) not randomizing across repeated trials—LHS without re-randomization can look deceptively precise; (6) mismatched boundaries where samples fall outside strata because of floating-point rounding; and (7) assuming LHS samples are independent—they are not, and this affects variance formulas.

Example fixes:
  • Always compute W_h from geometric/measure proportions and multiply stratum means accordingly.
  • For unequal strata or heterogeneous variability, use Neyman allocation (n_h ∝ W_h σ_h) rather than equal n_h.
  • In LHS, generate an independent shuffle for each dimension and a fresh set of jitters per run.
  • Use half-open intervals to avoid overlap, and clamp numerically to [0,1].

Key Formulas

Population mean over a domain

μ(f) = (1/μ(D)) ∫_D f(x) dμ(x)

Explanation: This defines the true mean value of f over domain D with respect to measure μ. Monte Carlo methods aim to estimate this quantity.

Stratified estimator

μ̂_strat = ∑_{h=1}^{H} W_h f̄_h,   f̄_h = (1/n_h) ∑_{i=1}^{n_h} f(X_{h,i}),   W_h = μ(S_h)/μ(D)

Explanation: Average within each stratum and combine using stratum weights. This estimator is unbiased if the X_{h,i} are sampled uniformly within each S_h.

Variance of stratified estimator

Var(μ̂_strat) = ∑_{h=1}^{H} W_h² σ_h² / n_h

Explanation: The total variance is the weighted sum of within-stratum variances divided by their sample counts. It is typically smaller than naive Monte Carlo when strata isolate variability.

Neyman allocation

n_h* = n · W_h σ_h / ∑_{k=1}^{H} W_k σ_k

Explanation: Given total budget n, allocating samples proportional to W_h times the stratum standard deviation σ_h minimizes the estimator variance.

Baseline Monte Carlo mean and variance

Ȳ_MC = (1/n) ∑_{i=1}^{n} f(X_i),   Var(Ȳ_MC) = σ²/n

Explanation: With i.i.d. uniform samples, the sample mean is unbiased and its variance decreases as 1/n. Stratification aims to reduce the constant factor compared to this baseline.

LHS construction

X_{ij} = (π_j(i) − U_{ij}) / n,   U_{ij} ~ Uniform(0,1)

Explanation: For each dimension j, pick an independent permutation π_j and jitter U_{ij}. This guarantees exactly one sample in each 1/n interval of each coordinate.

Variance comparison (monotone case)

Var(μ̂_LHS) ≤ Var(Ȳ_MC)   for functions monotone in each coordinate

Explanation: For many monotone functions, LHS does not increase variance and often reduces it compared to i.i.d. Monte Carlo. The inequality formalizes this for a common class.

Complexity of LHS/stratified generation

T(n, d) = O(n d),   S(n, d) = O(n d)

Explanation: Generating n points in d dimensions with stratification or LHS takes time proportional to the number of coordinates produced and linear space to store them.

Approximate confidence interval

μ̂ ± z_{α/2} √Var(μ̂)

Explanation: Using an estimate of the stratified variance and the normal quantile z, you can form an approximate (1-α) confidence interval for the mean.

Weights for unequal strata

W_h = μ(S_h) / ∑_{k=1}^{H} μ(S_k)

Explanation: When strata have different sizes, compute weights from their measures to keep the estimator unbiased.

Complexity Analysis

Generating stratified or Latin Hypercube samples is linear in the number of coordinates you produce. For n samples in d dimensions, you create n·d uniform jitters, perform d independent permutations of size n (each O(n)), and combine them, yielding time complexity O(n d). The dominant costs are the random number generation and the d shuffles, which are typically implemented in O(n) using Fisher–Yates, with high constant factors due to RNG calls.

Space complexity depends on whether you stream points or store them. If you allocate the full sample matrix, it is O(n d). If memory is constrained, you can generate per dimension on the fly: store one permutation of size n at a time (O(n)) and write points column by column, reducing peak memory to O(max(n, d)). For 2D s×s jittered grids (a common stratified case), time is O(s²) and space is O(s²) if you store points, or O(1) if you stream them.

Estimator computation (e.g., evaluating f at each point and averaging) adds O(n d) for coordinate handling plus the cost of evaluating f, often O(n) if f is O(1) per point. Compared to naive Monte Carlo (also O(n d)), stratified/LHS methods keep the same asymptotic complexity but typically achieve lower variance per sample, meaning they can reach a target error tolerance with fewer samples in practice.

Code Examples

1D Stratified Sampling to Estimate an Integral on [0,1]
#include <bits/stdc++.h>
using namespace std;

// Function to integrate: f(x) = exp(-x^2)
static inline double f(double x) { return exp(-x * x); }

// Plain Monte Carlo estimator with n samples on [0,1]
double mc_integral_1d(size_t n, mt19937_64 &rng) {
    uniform_real_distribution<double> U(0.0, 1.0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double x = U(rng);
        sum += f(x);
    }
    return sum / static_cast<double>(n);
}

// Stratified estimator with n strata (one sample per stratum) on [0,1]
double stratified_integral_1d(size_t n, mt19937_64 &rng) {
    uniform_real_distribution<double> U(0.0, 1.0);
    double sum = 0.0;
    for (size_t k = 0; k < n; ++k) {
        // Stratum interval: [k/n, (k+1)/n); jitter u in [0,1)
        double u = U(rng);
        double x = (k + u) / static_cast<double>(n); // uniform in stratum
        sum += f(x);
    }
    // All strata are equal length => weight is 1/n for each
    return sum / static_cast<double>(n);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    const size_t n = 10000;           // number of samples/strata
    mt19937_64 rng(123456789ULL);     // fixed seed for reproducibility

    double mc = mc_integral_1d(n, rng);
    // Use a separate RNG stream for an independent run
    mt19937_64 rng2(987654321ULL);
    double strat = stratified_integral_1d(n, rng2);

    // The true integral has no elementary closed form; compare approximations only
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Plain MC estimate:   " << mc << "\n";
    cout << "Stratified estimate: " << strat << "\n";
    cout << "(Both estimate ∫_0^1 exp(-x^2) dx)\n";
    return 0;
}

We approximate ∫_0^1 e^{-x^2} dx. The plain Monte Carlo uses i.i.d. uniform x. The stratified version divides [0,1] into n equal subintervals and draws one jittered point from each, averaging f(x). Because coverage is enforced, stratified sampling typically shows lower variance than plain Monte Carlo for the same n.

Time: O(n) · Space: O(1)
2D Stratified (Jittered Grid) Point Generator
#include <bits/stdc++.h>
using namespace std;

struct Pt { double x, y; };

// Generate an s x s jittered grid of points over [0,1]^2
vector<Pt> jittered_grid_2d(size_t s, mt19937_64 &rng) {
    uniform_real_distribution<double> U(0.0, 1.0);
    vector<Pt> pts; pts.reserve(s * s);
    for (size_t i = 0; i < s; ++i) {
        for (size_t j = 0; j < s; ++j) {
            // Cell [i/s, (i+1)/s) x [j/s, (j+1)/s)
            double ux = U(rng), uy = U(rng);
            double x = (i + ux) / static_cast<double>(s);
            double y = (j + uy) / static_cast<double>(s);
            pts.push_back({x, y});
        }
    }
    return pts;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    size_t s = 16; // 16x16 = 256 points
    mt19937_64 rng(42);
    auto pts = jittered_grid_2d(s, rng);

    cout.setf(std::ios::fixed); cout << setprecision(5);
    for (size_t k = 0; k < min<size_t>(pts.size(), 20); ++k) {
        cout << pts[k].x << ", " << pts[k].y << "\n";
    }
    cerr << "Generated " << pts.size() << " stratified points over [0,1]^2\n";
    return 0;
}

This generates an s×s stratified (jittered) grid in 2D. Each cell contributes exactly one random point, providing even spatial coverage suitable for rendering, sampling textures, or initializing k-means centers. You can stream the points without storing them if memory is tight.

Time: O(s²) · Space: O(s²) (O(1) if streamed)
Latin Hypercube Sampling (LHS) in d Dimensions with Integral Estimation
#include <bits/stdc++.h>
using namespace std;

// Generate n d-dimensional LHS samples in [0,1]^d
vector<vector<double>> lhs(size_t n, size_t d, mt19937_64 &rng) {
    uniform_real_distribution<double> U(0.0, 1.0);
    vector<vector<double>> X(n, vector<double>(d, 0.0));

    // For each dimension, make a permutation and jitter within bins
    for (size_t j = 0; j < d; ++j) {
        vector<size_t> pi(n);
        iota(pi.begin(), pi.end(), 0);      // 0..n-1
        shuffle(pi.begin(), pi.end(), rng); // independent permutation per dimension
        for (size_t i = 0; i < n; ++i) {
            double u = U(rng); // jitter in [0,1)
            // Map to [k/n, (k+1)/n) with k = pi[i]
            X[i][j] = (static_cast<double>(pi[i]) + u) / static_cast<double>(n);
            // Clamp (defensive): ensure value stays below 1.0
            if (X[i][j] >= 1.0) X[i][j] = nextafter(1.0, 0.0);
        }
    }
    return X;
}

// Test function: f(x) = exp(-sum_j x_j). True mean over [0,1]^d is (1 - e^{-1})^d
static inline double f(const vector<double> &x) {
    double s = 0.0;
    for (double v : x) s += v;
    return exp(-s);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    size_t n = 5000; // number of samples
    size_t d = 5;    // dimensionality
    mt19937_64 rng(2024);

    auto X = lhs(n, d, rng);

    double sum = 0.0;
    for (const auto &x : X) sum += f(x);
    double estimate = sum / static_cast<double>(n);

    // True value for comparison: (1 - e^{-1})^d
    double truth = pow(1.0 - exp(-1.0), static_cast<int>(d));

    cout.setf(std::ios::fixed); cout << setprecision(8);
    cout << "LHS estimate: " << estimate << "\n";
    cout << "True value:   " << truth << "\n";
    cout << "Abs error:    " << fabs(estimate - truth) << "\n";
    return 0;
}

This implements Latin Hypercube Sampling by using an independent random permutation per dimension and jittering within each 1/n bin. We estimate E[f(X)] for f(x) = exp(-∑ x_j) over [0,1]^d, whose true value is (1 - e^{-1})^d. LHS provides uniform marginals and improved coverage versus plain i.i.d. sampling for many functions.

Time: O(n d) · Space: O(n d)
Tags: stratified sampling, latin hypercube sampling, variance reduction, monte carlo, jittered grid, design of experiments, space-filling, permutation shuffle, neyman allocation, uniform marginals, randomized sampling, integration, uncertainty quantification, blue-noise