๐ŸŽ“How I Study AIHISA
๐Ÿ“–Read
๐Ÿ“„Papers๐Ÿ“ฐBlogs๐ŸŽฌCourses
๐Ÿ’กLearn
๐Ÿ›ค๏ธPaths๐Ÿ“šTopics๐Ÿ’กConcepts๐ŸŽดShorts
๐ŸŽฏPractice
โฑ๏ธCoach๐ŸงฉProblems๐Ÿง Thinking๐ŸŽฏPrompts๐Ÿง Review
SearchSettings
How I Study AI - Learn AI Papers & Lectures the Easy Way

Concepts (532)

Groups

๐Ÿ“Linear Algebra15๐Ÿ“ˆCalculus & Differentiation10๐ŸŽฏOptimization14๐ŸŽฒProbability Theory12๐Ÿ“ŠStatistics for ML9๐Ÿ“กInformation Theory10๐Ÿ”บConvex Optimization7๐Ÿ”ขNumerical Methods6๐Ÿ•ธGraph Theory for Deep Learning6๐Ÿ”ตTopology for ML5๐ŸŒDifferential Geometry6โˆžMeasure Theory & Functional Analysis6๐ŸŽฐRandom Matrix Theory5๐ŸŒŠFourier Analysis & Signal Processing9๐ŸŽฐSampling & Monte Carlo Methods10๐Ÿง Deep Learning Theory12๐Ÿ›ก๏ธRegularization Theory11๐Ÿ‘๏ธAttention & Transformer Theory10๐ŸŽจGenerative Model Theory11๐Ÿ”ฎRepresentation Learning10๐ŸŽฎReinforcement Learning Mathematics9๐Ÿ”„Variational Methods8๐Ÿ“‰Loss Functions & Objectives10โฑ๏ธSequence & Temporal Models8๐Ÿ’ŽGeometric Deep Learning8

Category

🔷 All · ∑ Math · ⚙️ Algo · 🗂️ DS · 📚 Theory

Level

All · Beginner · Intermediate · Advanced
๐Ÿ“šTheoryIntermediate

Minimum Description Length (MDL)

Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M).

#minimum description length · #mdl · #bic (+12 more)
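As a toy illustration of the two-part code L(M) + L(D|M), the sketch below compares a zero-parameter "fair coin" model against a one-parameter Bernoulli model on biased flip data. The (1/2) log₂ n bits-per-parameter cost is a standard MDL convention, assumed here for illustration; the function names are my own.

```python
import math

def entropy_bits(p):
    """Binary entropy in bits; H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mdl_cost(n, heads, model):
    """Total code length L(M) + L(D|M) in bits for n coin flips."""
    if model == "fair":
        return 0.0 + n * 1.0            # no parameters; 1 bit per flip
    p_hat = heads / n
    param_cost = 0.5 * math.log2(n)     # ~(1/2) log n bits per parameter
    return param_cost + n * entropy_bits(p_hat)

n, heads = 1000, 900                     # strongly biased data
fair = mdl_cost(n, heads, "fair")
biased = mdl_cost(n, heads, "biased")
best = "biased" if biased < fair else "fair"
```

On near-fair data the extra parameter is not worth its code length and MDL keeps the simpler model.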
โˆ‘MathIntermediate

Rรฉnyi Entropy & Divergence

Rรฉnyi entropy generalizes Shannon entropy by measuring uncertainty with a tunable emphasis on common versus rare outcomes.

#renyi entropy · #renyi divergence · #shannon entropy (+12 more)
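A minimal sketch of the definition H_α(p) = (1/(1−α)) log Σᵢ pᵢ^α, with the α → 1 limit handled as Shannon entropy; the helper name is my own.

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(p) in nats for a discrete distribution p."""
    if alpha == 1.0:                      # limit alpha -> 1: Shannon entropy
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.log(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

p = [0.7, 0.2, 0.1]
h_half = renyi_entropy(p, 0.5)   # alpha < 1 emphasizes rare outcomes
h_one  = renyi_entropy(p, 1.0)   # Shannon entropy
h_two  = renyi_entropy(p, 2.0)   # collision entropy, emphasizes common outcomes
```

H_α is non-increasing in α, which is the "tunable emphasis" the card describes: small α weights rare outcomes up, large α weights them down.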
โˆ‘MathAdvanced

f-Divergences

An f-divergence measures how different two probability distributions P and Q are by averaging a convex function f of the density ratio p(x)/q(x) under Q.

#f-divergence#csiszar divergence#kullbackโ€“leibler+11
โˆ‘MathAdvanced

Copulas & Dependency Structures

A copula is a function that glues together marginal distributions to form a multivariate joint distribution while isolating dependence from the margins.

#copula · #sklar's theorem · #gaussian copula (+12 more)
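A toy sketch of the Gaussian copula: draw correlated standard normals via a 2×2 Cholesky factor, then push each coordinate through the normal CDF so the margins become Uniform(0,1) while the dependence survives. All names are my own; a real implementation would use scipy.

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_copula_sample(rho, rng):
    """Draw one (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2   # 2x2 Cholesky correlation
    return std_normal_cdf(z1), std_normal_cdf(x2)     # each margin is Uniform(0,1)

rng = random.Random(0)
pts = [gaussian_copula_sample(0.9, rng) for _ in range(20000)]
m1 = sum(u for u, _ in pts) / len(pts)
m2 = sum(v for _, v in pts) / len(pts)
cov = sum((u - m1) * (v - m2) for u, v in pts) / len(pts)
```

Transforming each u through any inverse marginal CDF then yields a joint distribution with those margins and this dependence structure, which is exactly the "gluing" in Sklar's theorem.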
โˆ‘MathIntermediate

Law of Large Numbers

The Weak Law of Large Numbers (WLLN) says that the sample average of independent, identically distributed (i.i.d.) random variables with finite mean gets close to the true mean with high probability as the sample size grows.

#law of large numbers · #weak law · #sample mean (+12 more)
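A quick illustrative simulation, assuming Bernoulli(0.3) draws and a fixed seed: the sample mean's error shrinks as n grows, as the WLLN promises.

```python
import random

def sample_mean(n, p, rng):
    """Mean of n i.i.d. Bernoulli(p) draws."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

rng = random.Random(42)
err_small = abs(sample_mean(100, 0.3, rng) - 0.3)      # typically a few percent off
err_large = abs(sample_mean(100_000, 0.3, rng) - 0.3)  # concentrates near the mean
```

The typical error scales like 1/√n (about 0.046 at n = 100 versus 0.0015 at n = 100,000 here), which is the quantitative content behind "gets close with high probability".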
โš™๏ธAlgorithmIntermediate

Mixed Precision Training

Mixed precision training stores and computes tensors in low precision (FP16/BF16) for speed and memory savings while keeping a master copy of weights in FP32 for accurate updates.

#mixed precision · #fp16 · #bf16 (+10 more)
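A toy sketch of why the FP32 master copy matters, simulating FP16 storage with `struct`'s half-precision format: an update smaller than half the FP16 spacing near 1.0 (~0.00049) rounds away every step, while the FP32 master weight accumulates it. The scenario and names are illustrative, not any framework's API.

```python
import struct

def to_fp16(x):
    """Round a float to IEEE half precision and back (simulated FP16 storage)."""
    return struct.unpack('e', struct.pack('e', x))[0]

update = 1e-4                    # lr * grad, below half the FP16 gap at 1.0

# Pure FP16 weights: each add rounds straight back to 1.0, so training stalls.
w16 = to_fp16(1.0)
for _ in range(10):
    w16 = to_fp16(w16 + update)

# Mixed precision: apply updates to an FP32 master weight, and cast to FP16
# only for the (fast, memory-light) forward/backward compute.
master = 1.0
for _ in range(10):
    master += update
w16_compute = to_fp16(master)    # 1.001 is representable in FP16 (~1.0009766)
```

This is the "master weights" half of the recipe; real mixed-precision training also adds loss scaling to keep small FP16 gradients from underflowing.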
โš™๏ธAlgorithmIntermediate

Distributed & Parallel Optimization

Data parallelism splits the training data across workers that compute gradients in parallel on a shared model.

#data parallelism · #synchronous sgd · #asynchronous sgd (+12 more)
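A minimal single-process sketch of synchronous data parallelism on a 1-D least-squares model: each "worker" holds a shard, computes its local gradient, and an averaging step stands in for the all-reduce. Everything here (model, shard layout, names) is illustrative.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Data generated by the true model y = 3x, split round-robin across 4 workers.
data = [(x, 3.0 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]

w, lr = 0.0, 0.01
for _ in range(200):
    grads = [grad_mse(w, shard) for shard in shards]  # computed in parallel in practice
    g = sum(grads) / len(grads)                       # all-reduce: average the gradients
    w -= lr * g                                       # every worker applies the same update
```

Because each worker applies the identical averaged gradient, all model replicas stay bit-for-bit in sync, which is the defining property of synchronous SGD.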
โš™๏ธAlgorithmIntermediate

Lion Optimizer

Lion (Evolved Sign Momentum) is a first-order, sign-based optimizer discovered through automated program search.

#lion optimizer · #sign-based optimization · #momentum (+12 more)
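A scalar sketch of the Lion update rule (interpolate momentum and gradient, step by the sign, then update the momentum EMA), with default-style β₁ = 0.9, β₂ = 0.99 and optional decoupled weight decay; the function name is my own.

```python
def lion_step(w, m, g, lr=0.1, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update on a scalar weight; returns (new_w, new_m)."""
    c = beta1 * m + (1 - beta1) * g          # interpolation used only for the update
    sign = 1.0 if c > 0 else (-1.0 if c < 0 else 0.0)
    w = w - lr * (sign + wd * w)             # sign step (+ decoupled weight decay)
    m = beta2 * m + (1 - beta2) * g          # momentum EMA
    return w, m

# The step magnitude is gradient-scale invariant: a huge and a tiny gradient
# of the same sign produce exactly the same weight update.
w_big,  m_big  = lion_step(5.0, 0.0, 1000.0)
w_tiny, m_tiny = lion_step(5.0, 0.0, 0.001)
```

That scale invariance is the practical signature of sign-based methods: the learning rate alone sets the per-step movement, so Lion typically uses a smaller lr (and larger weight decay) than Adam.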
โš™๏ธAlgorithmIntermediate

Sharpness-Aware Minimization (SAM)

Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed, seeking flatter minima that generalize better.

#sharpness-aware minimization · #sam optimizer · #robust optimization (+11 more)
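A scalar sketch of SAM's two-step update: climb to the (approximately) worst point within radius ρ along the gradient direction, take the gradient there, and descend with it. In 1-D the normalized ascent direction reduces to the gradient's sign; the names and the quadratic test function are my own.

```python
def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on a scalar weight."""
    g = grad_fn(w)
    if g == 0.0:
        return w
    eps = rho * (1.0 if g > 0 else -1.0)   # rho * g / |g| in one dimension
    g_sam = grad_fn(w + eps)               # gradient at the perturbed weights
    return w - lr * g_sam                  # descend using the worst-case gradient

f_grad = lambda w: 2.0 * w                 # gradient of f(w) = w^2
w = 3.0
for _ in range(100):
    w = sam_step(w, f_grad)
```

Each update costs two gradient evaluations instead of one, which is the price paid for steering toward flat regions where small weight perturbations barely change the loss.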
โˆ‘MathIntermediate

Pseudoinverse (Moore-Penrose)

The Mooreโ€“Penrose pseudoinverse generalizes matrix inversion to rectangular or singular matrices and is denoted Aโบ.

#pseudoinverse · #moore-penrose · #least squares (+12 more)
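A toy sketch of the full-column-rank case, where A⁺ = (AᵀA)⁻¹Aᵀ and A⁺y is the least-squares solution of the overdetermined system Ax = y. The hand-rolled matrix helpers are illustrative; real code would use `numpy.linalg.pinv` (SVD-based, which also handles rank deficiency).

```python
def matmul(A, B):
    """Naive matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_full_col_rank(A):
    """A+ = (A^T A)^-1 A^T, valid when A has full column rank."""
    At = transpose(A)
    return matmul(inv2(matmul(At, A)), At)

# Overdetermined system: fit y = c0 + c1 * x to three points on the line y = 1 + 2x.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # columns: intercept, slope
y = [[1.0], [3.0], [5.0]]
coef = matmul(pinv_full_col_rank(A), y)    # least-squares coefficients
```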
โš™๏ธAlgorithmIntermediate

Sparse Matrices & Computation

A sparse matrix stores only its nonzero entries, saving huge amounts of memory when most entries are zero.

#sparse matrix · #csr · #csc (+12 more)
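A minimal sketch of the CSR (compressed sparse row) layout the tags mention: three flat arrays (values, column indices, row pointers) plus a matrix-vector product that touches only the stored nonzeros; the function names are my own.

```python
def dense_to_csr(A):
    """Convert a dense matrix to CSR arrays (values, col_indices, row_ptr)."""
    values, cols, row_ptr = [], [], [0]
    for row in A:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                cols.append(j)
        row_ptr.append(len(values))      # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, cols, row_ptr

def csr_matvec(values, cols, row_ptr, x):
    """y = A @ x using only the stored nonzeros."""
    return [sum(values[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]

A = [[0, 0, 3],
     [1, 0, 0],
     [0, 2, 0]]
vals, cols, ptr = dense_to_csr(A)
y = csr_matvec(vals, cols, ptr, [1, 1, 1])
```

CSC is the same idea with the roles of rows and columns swapped, which makes column slicing cheap instead of row slicing.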
โˆ‘MathIntermediate

Kronecker Product & Vec Operator

The Kronecker product A โŠ— B expands a small matrix into a larger block matrix by multiplying every entry of A with the whole matrix B.

#kronecker product · #vec operator · #block matrix (+12 more)
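A minimal sketch of the block structure described above: entry (i, j) of A ⊗ B is a_{i//rB, j//cB} · b_{i%rB, j%cB}, so each entry of A scales a full copy of B. The implementation is illustrative; real code would use `numpy.kron`.

```python
def kron(A, B):
    """Kronecker product A ⊗ B as a list-of-lists block matrix."""
    rA, cA = len(A), len(A[0])
    rB, cB = len(B), len(B[0])
    return [[A[i // rB][j // cB] * B[i % rB][j % cB]
             for j in range(cA * cB)]
            for i in range(rA * rB)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
K = kron(A, B)   # 4x4: top-left block is 1*B, top-right is 2*B, and so on
```

Paired with the vec operator (stacking a matrix's columns into one vector), this gives the workhorse identity vec(AXB) = (Bᵀ ⊗ A) vec(X), which turns matrix equations into ordinary linear systems.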