Probability Axioms & Rules
Key Points
- Kolmogorov's axioms define probability as a measure on events: non-negativity, normalization, and countable additivity.
- From these axioms you can derive key rules like complements, inclusion–exclusion, conditional probability, and Bayes' theorem.
- Events are subsets of a sample space, and probabilities add for disjoint (mutually exclusive) events.
- Conditional probability reweights the sample space to focus on a given condition, enabling the chain rule and Bayesian inference.
- Independence means intersection probabilities factor into products; it is not the same as mutual exclusivity.
- In code, finite probability spaces can be represented with maps and sets; Monte Carlo can approximate probabilities via simulation.
- Always check normalization (sum to 1) and guard against floating-point error when verifying axioms numerically.
- Use these rules to model games of chance, reliability, classification, and risk analysis, and to reason from data to causes.
Prerequisites
- Set theory basics – Events are subsets; unions, intersections, and complements are central.
- Functions and measures – Probability is a measure mapping sets to [0, 1].
- Series and summations – Countable additivity involves infinite sums.
- Basic combinatorics – Counting outcomes to build discrete probability models.
- Conditional statements and logic – Understanding implications and case splits for total probability and Bayes.
- C++ STL: vectors, sets, maps, RNG – Implementing finite spaces and Monte Carlo simulations.
- Floating-point arithmetic – Handling rounding errors and tolerances in probability computations.
Detailed Explanation
01 Overview
Hook: Imagine running a fair game night (rolling dice, shuffling cards, or flipping coins) and needing a consistent way to assign chances to outcomes. You want rules that don't contradict each other, regardless of the game or how many events you combine.

Concept: Probability axioms, introduced by Kolmogorov, give a rigorous foundation for assigning and combining probabilities. They treat probability as a measure on a collection of events (subsets of outcomes). With only three axioms (non-negativity, normalization, and countable additivity) you can derive most of the everyday rules students use: the complement rule, inclusion–exclusion, conditional probability, the chain rule, independence, and Bayes' theorem.

Example: For a fair die, each face has probability 1/6. The event "even number" has probability 1/6 + 1/6 + 1/6 = 1/2, and the event "at least 5" has probability 1/3. The probability of their union can be computed by inclusion–exclusion: 1/2 + 1/3 − 1/6 = 2/3. These axioms matter beyond games: they underpin machine learning (Bayesian updating), risk assessment (union bounds), quality control (independence assumptions), and randomized algorithms. In programming, we encode finite probability spaces with sets and maps, or approximate them by Monte Carlo simulation. The axioms act as invariants we can test in code (sum to 1, non-negative, additive over disjoint events), ensuring our models are coherent.
02 Intuition & Analogies
Hook: Think of probability as spreading a fixed amount of paint (exactly one bucket) over all possible outcomes. You can't have negative paint, the total paint is one bucket, and if two patches don't overlap, the paint on their union is just the sum on each patch.

Concept: The sample space Ω is the canvas of all outcomes. Events are regions on the canvas (subsets of Ω). Probability P is the amount of paint on an event. Non-negativity means you never owe paint: P(A) ≥ 0. Normalization says all paint is used: P(Ω) = 1. Countable additivity says that if events don't overlap, the paint on their union equals the sum of their paints. From this, intuition about complements and overlaps follows naturally: if A takes up some paint, what's left over (A's complement) must be 1 − P(A). If two events overlap, adding their paints double-counts the overlap, so we subtract it once (inclusion–exclusion).

Example: Picture two translucent stickers on a window: A (even roll) and B (≥ 4) for a die. The combined tint from sticking both equals the tint of A plus the tint of B minus the area where they overlap. If the stickers never overlap (mutually exclusive events), their tints just add. Conditioning is like putting on polarized glasses: you restrict your view to B and then re-measure A's tint relative to what you can still see; this is P(A | B). Independence is when the overlap's tint is exactly the product of the two individual tints, reflecting no informative alignment between the events. This mental model helps explain why the rules look the way they do.
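To make the polarized-glasses analogy concrete, here is a minimal sketch on the fair-die model, reusing the events A (even) and B (at least 4) from the example above:

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Fair die: P(E) for an event E is |E| / 6
    auto P = [](const set<int>& E) { return E.size() / 6.0; };

    set<int> A = {2, 4, 6}; // even roll
    set<int> B = {4, 5, 6}; // at least 4
    set<int> AB;            // A ∩ B
    set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                     inserter(AB, AB.begin()));

    // Conditioning: restrict the view to B, then re-measure A
    double pAgivenB = P(AB) / P(B); // requires P(B) > 0
    cout << "P(A|B) = " << pAgivenB << "\n"; // 2/3

    // Independence check: does P(A ∩ B) equal P(A) * P(B)?
    cout << "P(A∩B) = " << P(AB)
         << " vs P(A)*P(B) = " << P(A) * P(B) << "\n"; // 1/3 vs 1/4: not independent
    return 0;
}
```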
03 Formal Definition
A probability space is a triple (Ω, F, P): Ω is the sample space of all outcomes, F is a collection of events (subsets of Ω) closed under complements and countable unions (a σ-algebra), and P: F → [0, 1] assigns probabilities subject to three axioms:
- Non-negativity: P(A) ≥ 0 for every event A in F.
- Normalization: P(Ω) = 1.
- Countable additivity: for pairwise disjoint events A₁, A₂, …, P(A₁ ∪ A₂ ∪ ⋯) = P(A₁) + P(A₂) + ⋯.
All the rules in this section (complement, monotonicity, inclusion–exclusion, conditioning, Bayes) are consequences of these three axioms.
04 When to Use
Use the axioms whenever you model uncertainty in a principled way. They define the legal moves for building complex probabilities from simpler ones.
- Finite models: Cards, dice, lotteries, or any discrete system with countable outcomes. Represent Ω explicitly, assign a probability mass function (PMF), and use additivity for unions and intersections.
- Risk aggregation: Estimating the chance of at least one failure among many components. Apply the union bound or inclusionâexclusion; assume independence only when justified.
- Classification and diagnostics: Apply conditional probability, the law of total probability, and Bayes' theorem to fuse evidence with prior knowledge (e.g., medical tests with known sensitivity/specificity and disease prevalence).
- Randomized algorithms and simulations: Use Monte Carlo to approximate probabilities that are hard to compute exactly; check normalization and consistency numerically to detect implementation errors.
- Probabilistic reasoning pipelines: Use the chain rule to factor joint probabilities as products of conditionals (the basis of Bayesian networks and hidden Markov models). The axioms ensure the factors combine into a coherent joint distribution. Example: To compute the probability a password is compromised by either phishing or brute force, model each event, decide if they're independent, and compute P(phishing ∪ brute) by inclusion–exclusion (a small sketch follows this list). To interpret a positive spam filter result, use Bayes' theorem to compute the posterior probability of spam given the classifier's scores.
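A small sketch of the password example, under a loudly stated independence assumption; the probabilities p_phish and p_brute are made-up illustrative numbers, not from the original text:

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    // Hypothetical inputs: illustrative values, not real attack statistics.
    double p_phish = 0.05; // assumed P(phishing succeeds)
    double p_brute = 0.01; // assumed P(brute force succeeds)

    // Under an *assumed* independence of the two attacks:
    double p_both   = p_phish * p_brute;            // P(phish ∩ brute)
    double p_either = p_phish + p_brute - p_both;   // inclusion–exclusion
    double p_bound  = p_phish + p_brute;            // union bound: needs no assumption

    cout << "P(compromised) = " << p_either
         << " (union bound: <= " << p_bound << ")\n";
    return 0;
}
```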
⚠️ Common Mistakes
- Confusing independence with mutual exclusivity: Mutually exclusive non-trivial events cannot be independent, because P(A ∩ B) = 0 while P(A)P(B) > 0 if both have positive probability.
- Forgetting to subtract overlaps: Using P(A ∪ B) = P(A) + P(B) even when A and B are not disjoint. Always apply inclusion–exclusion or compute intersections explicitly.
- Ignoring base rates in Bayes' theorem: Overestimating P(disease | positive) by focusing only on sensitivity/specificity and neglecting low prevalence.
- Misusing conditional probability when P(B) = 0: P(A | B) is undefined if P(B) = 0; handle such cases explicitly in code.
- Over-trusting small-sample simulations: Monte Carlo estimates with small N can be noisy; report confidence intervals and use enough samples.
- Floating-point pitfalls: Sums like P(A) + P(A^c) may come out as 0.9999999 due to rounding; use tolerances, and normalize PMFs to sum to 1.
- Hidden assumptions of independence: Applying P(A ∩ B) = P(A)P(B) without justification can severely bias results. Validate with data or domain knowledge. Example fixes: For unions, compute P(A) + P(B) − P(A ∩ B). For Bayes, include the prior P(disease). In code, check non-negativity, check sum-to-one within an epsilon (e.g., 1e−12), and guard P(B) > 0 before dividing, as in the sketch after this list.
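A minimal sketch of those defensive checks; the helper name safe_conditional and the epsilon values are illustrative choices:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Guarded conditional probability: returns an empty optional when P(B) is
// (numerically) zero, since P(A | B) is undefined in that case.
optional<double> safe_conditional(double pAB, double pB, double eps = 1e-12) {
    if (pB <= eps) return nullopt;
    return pAB / pB;
}

int main() {
    // Tolerance-based validation of a PMF: non-negative and sums to 1.
    vector<double> pmf = {0.1, 0.2, 0.3, 0.4};
    double sum = accumulate(pmf.begin(), pmf.end(), 0.0);
    bool valid = fabs(sum - 1.0) <= 1e-12
        && all_of(pmf.begin(), pmf.end(), [](double p) { return p >= 0.0; });
    cout << boolalpha << "Valid PMF? " << valid << "\n";

    // Division guard in action: conditioning on a zero-probability event
    if (auto p = safe_conditional(0.2, 0.0)) cout << "P(A|B) = " << *p << "\n";
    else cout << "P(A|B) undefined: P(B) = 0\n";
    return 0;
}
```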
Key Formulas
Non-negativity Axiom
P(A) ≥ 0 for every event A
Explanation: Every event has a probability that is zero or positive. Negative probabilities are not allowed.
Normalization Axiom
P(Ω) = 1
Explanation: The total probability over the entire sample space is 1, meaning something in the sample space must occur.
Countable Additivity Axiom
P(A₁ ∪ A₂ ∪ ⋯) = P(A₁) + P(A₂) + ⋯ for pairwise disjoint A₁, A₂, …
Explanation: If events do not overlap, the probability of their union is the sum of their probabilities, even for countably many events.
Complement Rule
P(A^c) = 1 − P(A)
Explanation: The probability of not-A is the leftover mass after accounting for A. Useful when A is complicated but its complement is simple.
Monotonicity
If A ⊆ B, then P(A) ≤ P(B)
Explanation: B includes at least all outcomes in A, so its probability cannot be smaller. This follows from the axioms.
Inclusion–Exclusion (Two Sets)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Explanation: Adds probabilities of A and B but corrects for double-counting the intersection. Essential for overlapping events.
Union Bound (Boole)
P(A₁ ∪ A₂ ∪ ⋯) ≤ P(A₁) + P(A₂) + ⋯
Explanation: Provides a simple upper bound on the probability of at least one event happening. Useful when intersections are hard to compute.
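Worked example: if each of 10 components fails with probability at most 0.01, the probability that at least one fails is at most 10 × 0.01 = 0.1, no matter how the failures are correlated.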
Conditional Probability
P(A | B) = P(A ∩ B) / P(B), defined when P(B) > 0
Explanation: Reweights the sample space to the condition B. Foundational for reasoning with partial information.
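Worked example (fair die): with A = even and B = {4, 5, 6}, P(A | B) = P(A ∩ B) / P(B) = (1/3) / (1/2) = 2/3.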
Multiplication Rule
P(A ∩ B) = P(A | B) P(B)
Explanation: Expresses a joint probability via a conditional and a marginal. Basis for the chain rule.
Law of Total Probability
P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ), where B₁, B₂, … partition Ω
Explanation: Decomposes P(A) across disjoint cases that cover the sample space. Useful for case analysis.
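Worked example (fair die): splitting on B₁ = {1, 2, 3} and B₂ = {4, 5, 6}, P(even) = P(even | B₁)P(B₁) + P(even | B₂)P(B₂) = (1/3)(1/2) + (2/3)(1/2) = 1/2.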
Bayes' Theorem (Finite Partition)
P(Bⱼ | A) = P(A | Bⱼ) P(Bⱼ) / Σᵢ P(A | Bᵢ) P(Bᵢ)
Explanation: Updates prior beliefs P(Bⱼ) after seeing evidence A using likelihoods P(A | Bⱼ). Core to Bayesian inference.
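Worked example (matching the third code example below): with prevalence P(D) = 0.01, sensitivity 0.95, and specificity 0.90, P(D | +) = (0.95 × 0.01) / (0.95 × 0.01 + 0.10 × 0.99) = 0.0095 / 0.1085 ≈ 0.088, far below the 0.95 many people expect.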
Independence Criterion
A and B are independent if and only if P(A ∩ B) = P(A) P(B)
Explanation: Defines independence: knowing one event doesn't change the probability of the other.
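Worked example: for two fair coin flips, A = first is heads and B = second is heads give P(A ∩ B) = 1/4 = P(A)P(B), so they are independent; for the die events A = even and B = {4, 5, 6}, P(A ∩ B) = 1/3 ≠ 1/4 = P(A)P(B), so those are not.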
Chain Rule
P(A₁ ∩ ⋯ ∩ Aₙ) = P(A₁) P(A₂ | A₁) P(A₃ | A₁ ∩ A₂) ⋯ P(Aₙ | A₁ ∩ ⋯ ∩ Aₙ₋₁)
Explanation: Breaks a joint probability into a product of conditional probabilities. Fundamental in graphical models and HMMs.
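Worked example: drawing two cards without replacement, P(both aces) = P(first ace) × P(second ace | first ace) = (4/52) × (3/51) = 1/221.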
Inclusion–Exclusion (Three Sets)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
Explanation: Adds singles, subtracts pairwise overlaps, and adds back the triple overlap. Extends to larger finite unions.
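Worked example (fair die): with A = {2, 4, 6}, B = {4, 5, 6}, C = {1, 2}, the formula gives 1/2 + 1/2 + 1/3 − 1/3 − 1/6 − 0 + 0 = 5/6, matching P({1, 2, 4, 5, 6}) computed directly.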
Complexity Analysis
For the finite-space code below, validating the PMF is O(|Ω|) and computing P(A) is O(|A|) expected time with hashed lookups; the union, intersection, and disjointness helpers run in O(|A| + |B|), O(min(|A|, |B|)), and O(min(|A|, |B|)) expected time, respectively. The Monte Carlo programs cost O(N) time and O(1) extra space for N trials, with estimation error shrinking on the order of 1/√N.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// A simple finite probability space with outcomes of type int
struct ProbabilitySpace {
    unordered_map<int, double> pmf; // P({w}) for each outcome w

    // Check normalization and non-negativity within a tolerance
    bool validate(double eps = 1e-12) const {
        double sum = 0.0;
        for (auto &kv : pmf) {
            if (kv.second < -eps) return false; // negative mass
            sum += kv.second;
        }
        return fabs(sum - 1.0) <= 1e-10; // allow small floating error
    }

    // Compute probability of an event represented as a set of outcomes
    double prob(const unordered_set<int> &event) const {
        double p = 0.0;
        for (int w : event) {
            auto it = pmf.find(w);
            if (it != pmf.end()) p += it->second;
        }
        return p;
    }
};

// Set utilities
unordered_set<int> set_union_us(const unordered_set<int>& A, const unordered_set<int>& B) {
    unordered_set<int> U; U.reserve(A.size() + B.size());
    for (int x : A) U.insert(x);
    for (int x : B) U.insert(x);
    return U;
}

unordered_set<int> set_intersection_us(const unordered_set<int>& A, const unordered_set<int>& B) {
    const unordered_set<int> *S = &A, *L = &B;
    if (A.size() > B.size()) { S = &B; L = &A; } // iterate over the smaller set
    unordered_set<int> I; I.reserve(S->size());
    for (int x : *S) if (L->count(x)) I.insert(x);
    return I;
}

bool disjoint(const unordered_set<int>& A, const unordered_set<int>& B) {
    const unordered_set<int> *S = &A, *L = &B;
    if (A.size() > B.size()) { S = &B; L = &A; }
    for (int x : *S) if (L->count(x)) return false;
    return true;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Build a fair die probability space: outcomes 1..6 each with 1/6
    ProbabilitySpace ps;
    for (int w = 1; w <= 6; ++w) ps.pmf[w] = 1.0/6.0;

    cout << boolalpha;
    cout << "Valid PMF? " << ps.validate() << "\n"; // should be true

    // Define events
    unordered_set<int> A = {2,4,6}; // even
    unordered_set<int> B = {4,5,6}; // >= 4
    unordered_set<int> C = {1};     // singleton
    unordered_set<int> D = {2,3};   // disjoint with C

    // Compute basic probabilities
    double pA = ps.prob(A);
    double pB = ps.prob(B);
    auto AUB = set_union_us(A, B);
    auto AIB = set_intersection_us(A, B);
    double pAUB = ps.prob(AUB);
    double pAIB = ps.prob(AIB);

    cout << fixed << setprecision(6);
    cout << "P(A)=" << pA << ", P(B)=" << pB << "\n";
    cout << "P(A∩B)=" << pAIB << ", P(A∪B)=" << pAUB << "\n";

    // Verify inclusion–exclusion: P(A∪B) = P(A)+P(B)-P(A∩B)
    double lhs = pAUB;
    double rhs = pA + pB - pAIB;
    cout << "Inclusion–Exclusion holds? diff=" << fabs(lhs - rhs) << "\n";

    // Verify additivity for disjoint events C and D
    cout << "C and D disjoint? " << disjoint(C, D) << "\n";
    auto CUD = set_union_us(C, D);
    double pC = ps.prob(C), pD = ps.prob(D), pCUD = ps.prob(CUD);
    cout << "P(C∪D)=" << pCUD << ", P(C)+P(D)=" << (pC + pD) << ", diff=" << fabs(pCUD - (pC + pD)) << "\n";

    // Complement rule for A
    unordered_set<int> Omega = {1,2,3,4,5,6};
    unordered_set<int> Ac;
    for (int w : Omega) if (!A.count(w)) Ac.insert(w);
    double pAc = ps.prob(Ac);
    cout << "P(A^c)=" << pAc << ", 1-P(A)=" << (1.0 - pA) << ", diff=" << fabs(pAc - (1.0 - pA)) << "\n";

    return 0;
}
```
We construct a finite probability space for a fair die and implement utilities to compute probabilities of events, unions, intersections, and complements. We validate the PMF (non-negativity and normalization), then check inclusion–exclusion for overlapping events and additivity for disjoint events. Small differences may arise only from floating-point rounding.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Tallies of event occurrences across trials
struct Estimator {
    uint64_t countA = 0, countB = 0, countAB = 0, trials = 0;
    void add(bool A, bool B) {
        trials++;
        if (A) countA++;
        if (B) countB++;
        if (A && B) countAB++;
    }
    tuple<double,double,double> probs() const {
        double N = (double)trials;
        return {countA/N, countB/N, countAB/N};
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // RNG setup
    std::random_device rd;
    std::mt19937_64 gen(rd());
    std::uniform_int_distribution<int> die(1,6);

    const uint64_t N = 5'000'000ULL; // 5 million trials
    Estimator est;

    for (uint64_t i = 0; i < N; ++i) {
        int x = die(gen);      // roll a fair die
        bool A = (x % 2 == 0); // even
        bool B = (x >= 4);     // at least 4
        est.add(A, B);
    }

    auto [pA, pB, pAB] = est.probs();
    double pAUB_via_counts = pA + pB - pAB; // inclusion–exclusion

    cout << fixed << setprecision(6);
    cout << "Estimated P(A)=" << pA << ", P(B)=" << pB << ", P(A∩B)=" << pAB << "\n";
    cout << "Estimated P(A∪B) via inclusion–exclusion = " << pAUB_via_counts << "\n";

    // Theoretical values for comparison
    cout << "Theoretical: P(A)=1/2, P(B)=1/2, P(A∩B)=1/3, P(A∪B)=2/3\n";

    return 0;
}
```
This Monte Carlo simulation estimates probabilities by relative frequencies over 5 million die rolls. It verifies inclusion–exclusion numerically by computing P(A) + P(B) − P(A ∩ B) and compares the result with theoretical values. Increasing N tightens the estimates, by the law of large numbers.
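A useful rule of thumb for judging the noise: the standard error of a frequency estimate is about √(p(1 − p)/N); for p = 2/3 and N = 5,000,000 that is roughly 2 × 10⁻⁴, so the printed estimates should match theory to about three decimal places.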
```cpp
#include <bits/stdc++.h>
using namespace std;

struct BayesResult {
    double prior;    // P(D)
    double sens;     // P(+ | D)
    double spec;     // P(- | D^c)
    double post_pos; // P(D | +)
    double post_neg; // P(D | -)
};

BayesResult compute_exact(double prior, double sens, double spec) {
    double pD = prior;
    double pDc = 1.0 - pD;
    double pPosGivenD = sens;
    double pPosGivenDc = 1.0 - spec; // false-positive rate

    // Law of total probability for P(+)
    double pPos = pPosGivenD * pD + pPosGivenDc * pDc;
    double pNeg = 1.0 - pPos;

    // Bayes' theorem (assumes pPos > 0 and pNeg > 0)
    double pDgivenPos = (pPosGivenD * pD) / pPos;
    double pDgivenNeg = ((1.0 - pPosGivenD) * pD) / pNeg; // P(D | -)

    return {pD, sens, spec, pDgivenPos, pDgivenNeg};
}

pair<double,double> simulate(uint64_t N, double prior, double sens, double spec, std::mt19937_64 &gen) {
    bernoulli_distribution hasDisease(prior);
    bernoulli_distribution pos_if_D(sens);
    bernoulli_distribution pos_if_notD(1.0 - spec); // positive with (1 - specificity) if healthy

    uint64_t countPos = 0, countDandPos = 0;
    uint64_t countNeg = 0, countDandNeg = 0;

    for (uint64_t i = 0; i < N; ++i) {
        bool D = hasDisease(gen);
        bool testPos = D ? pos_if_D(gen) : pos_if_notD(gen);
        if (testPos) {
            countPos++;
            if (D) countDandPos++;
        } else {
            countNeg++;
            if (D) countDandNeg++;
        }
    }

    // Relative-frequency estimates; NaN if a denominator is empty
    double pDgivenPos_hat = countPos ? (double)countDandPos / (double)countPos : numeric_limits<double>::quiet_NaN();
    double pDgivenNeg_hat = countNeg ? (double)countDandNeg / (double)countNeg : numeric_limits<double>::quiet_NaN();

    return {pDgivenPos_hat, pDgivenNeg_hat};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example parameters: low prevalence disease
    double prior = 0.01; // 1% prevalence
    double sens = 0.95;  // sensitivity
    double spec = 0.90;  // specificity

    auto exact = compute_exact(prior, sens, spec);

    cout << fixed << setprecision(6);
    cout << "Exact P(D|+)= " << exact.post_pos << ", P(D|-)= " << exact.post_neg << "\n";

    std::random_device rd; std::mt19937_64 gen(rd());
    uint64_t N = 10'000'000ULL; // 10 million patients
    auto sim = simulate(N, prior, sens, spec, gen);
    cout << "Simulated P(D|+)= " << sim.first << ", P(D|-)= " << sim.second << "\n";

    return 0;
}
```
We compute the posteriors P(D | +) and P(D | −) exactly using Bayes' theorem and then validate them via Monte Carlo simulation. With low prevalence, even accurate tests yield a modest P(D | +), demonstrating the base-rate effect. Large N reduces simulation noise.