Minimum Description Length (MDL) picks the model that compresses the data best by minimizing L(M) + L(D|M): the bits needed to describe the model plus the bits needed to describe the data given the model.
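A minimal two-part-code sketch with made-up numbers: comparing a parameter-free fair-coin model against a fitted Bernoulli model on 100 flips, charging the fitted model the standard (1/2)·log₂ n bits for its parameter.

```python
import math

def nll_bits(data, p):
    # L(D|M): negative log-likelihood in bits under a Bernoulli(p) model.
    return sum(-math.log2(p if x else 1 - p) for x in data)

data = [1] * 75 + [0] * 25  # 100 coin flips, 75 heads (toy data)

# Model A: fair coin, nothing to encode -> L(M) = 0 bits.
cost_fair = 0 + nll_bits(data, 0.5)

# Model B: fitted bias p_hat, parameter encoded at (1/2) log2(n) bits.
p_hat = sum(data) / len(data)
cost_biased = 0.5 * math.log2(len(data)) + nll_bits(data, p_hat)

best = "biased" if cost_biased < cost_fair else "fair"
```

Here the fitted model wins: its shorter data code (about 81 bits versus 100) more than pays for the few bits spent describing the parameter.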
Rényi entropy generalizes Shannon entropy by measuring uncertainty with a tunable order parameter α that shifts emphasis between common and rare outcomes.
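A small sketch of the formula H_α(p) = (1/(1−α))·log₂ Σᵢ pᵢ^α on a toy distribution, with the Shannon case (α → 1) handled as its limit:

```python
import math

def renyi_entropy(p, alpha):
    # H_alpha(p) = 1/(1 - alpha) * log2(sum_i p_i^alpha).
    # alpha = 1 is the Shannon limit, computed directly.
    if abs(alpha - 1.0) < 1e-12:
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return math.log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
h0 = renyi_entropy(p, 0.0)  # Hartley entropy: log2(#outcomes) = 2 bits
h1 = renyi_entropy(p, 1.0)  # Shannon entropy: 1.75 bits
h2 = renyi_entropy(p, 2.0)  # collision entropy, weights common outcomes more
```

Larger α discounts rare outcomes, so the entropies are ordered h0 ≥ h1 ≥ h2 for any distribution.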
An f-divergence measures how different two probability distributions P and Q are by averaging a convex function f of the density ratio p(x)/q(x) under Q.
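A direct sketch of the definition D_f(P‖Q) = Σₓ q(x)·f(p(x)/q(x)); choosing f(t) = t·log t recovers KL divergence, and f(t) = ½|t − 1| recovers total variation (toy distributions, discrete case only):

```python
import math

def f_divergence(p, q, f):
    # D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)), for discrete p, q.
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

kl_f = lambda t: t * math.log(t) if t > 0 else 0.0  # KL divergence (nats)
tv_f = lambda t: 0.5 * abs(t - 1)                   # total variation

p = [0.4, 0.6]
q = [0.5, 0.5]
kl = f_divergence(p, q, kl_f)
tv = f_divergence(p, q, tv_f)
```

Convexity of f with f(1) = 0 is what guarantees D_f ≥ 0 with equality exactly when P = Q.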
A copula is a function that glues together marginal distributions to form a multivariate joint distribution while isolating dependence from the margins.
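A sketch of Sklar's theorem using the Clayton copula with arbitrarily chosen margins (an exponential and a logistic CDF, both picked only for illustration):

```python
import math

def clayton_copula(u, v, theta=2.0):
    # Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta);
    # theta > 0 induces lower-tail dependence.
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)

# Margins (arbitrary choices): exponential(1) and standard logistic CDFs.
F = lambda x: 1 - math.exp(-x)
G = lambda y: 1 / (1 + math.exp(-y))

# Sklar's theorem: H(x, y) = C(F(x), G(y)) is a valid joint CDF whose
# margins are exactly F and G, whatever copula C is plugged in.
H = lambda x, y: clayton_copula(F(x), G(y))

joint = H(1.0, 0.0)
```

Swapping in a different copula changes only the dependence structure; the margins F and G are untouched, which is the "isolating dependence" point in the definition.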
The Weak Law of Large Numbers (WLLN) says that the sample average of independent, identically distributed (i.i.d.) random variables with finite mean gets close to the true mean with high probability as the sample size grows.
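A quick simulation of the WLLN with Uniform(0, 1) draws (true mean 0.5): the fraction of runs whose sample mean lands within a fixed tolerance of the mean climbs toward 1 as n grows.

```python
import random

random.seed(0)  # fixed seed so the demo is reproducible

def sample_mean(n):
    # Mean of n i.i.d. Uniform(0, 1) draws; the true mean is 0.5.
    return sum(random.random() for _ in range(n)) / n

def hit_rate(n, runs=200, eps=0.02):
    # Empirical P(|X_bar_n - 0.5| < eps); the WLLN says this -> 1.
    return sum(abs(sample_mean(n) - 0.5) < eps for _ in range(runs)) / runs

small_n = hit_rate(10)
large_n = hit_rate(10_000)
```

With n = 10 the sample mean still misses the ±0.02 band most of the time; with n = 10,000 essentially every run lands inside it.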
Mixed precision training stores and computes tensors in low precision (FP16/BF16) for speed and memory savings while keeping a master copy of weights in FP32 for accurate updates.
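A crude stdlib-only simulation of why the FP32 master copy matters (real frameworks use hardware FP16/BF16; here low precision is faked by rounding the mantissa to 10 bits, FP16's mantissa width):

```python
import math

def to_low_precision(x, mantissa_bits=10):
    # Crude stand-in for FP16: keep only mantissa_bits bits of mantissa.
    if x == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    scale = 2.0 ** (mantissa_bits - exp)
    return round(x * scale) / scale

lr, grad = 1e-4, 0.5
update = lr * grad  # 5e-5, far below FP16's spacing near 1.0 (~1e-3)

# Naive low-precision accumulation: the tiny update is rounded away.
w_low = 1.0
for _ in range(100):
    w_low = to_low_precision(w_low - to_low_precision(update))

# Master-weight scheme: accumulate updates in full precision, and cast
# down only the working copy used for the (simulated) forward pass.
w_master = 1.0
for _ in range(100):
    working_copy = to_low_precision(w_master)  # what compute would see
    w_master -= update                          # precise accumulation
```

After 100 steps the naive low-precision weight has not moved at all, while the master copy has absorbed the full 100 × 5e-5 of training signal.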
Data parallelism splits the training data across workers that each hold a replica of the model, compute gradients on their own shard in parallel, and average the gradients so every replica applies the same update.
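A toy sketch of the scheme (sequential here; real systems run the workers concurrently and average with an all-reduce): two workers fit a shared scalar model y = w·x by least squares on their own shards.

```python
def local_grad(w, shard):
    # Each worker's gradient of mean((w*x - y)^2) over its own shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

data = [(x, 3.0 * x) for x in range(1, 9)]  # toy data, true w = 3
shards = [data[0:4], data[4:8]]             # two equal-sized workers

w = 0.0
for _ in range(200):
    grads = [local_grad(w, s) for s in shards]  # computed in parallel
    g = sum(grads) / len(grads)                 # "all-reduce": average
    w -= 0.01 * g                               # identical update everywhere
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient, so the parallel run traces the same trajectory as single-worker training.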
Lion (Evolved Sign Momentum) is a first-order, sign-based optimizer discovered through automated program search.
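A minimal scalar sketch of the Lion update rule as published (sign of a β₁-blend of momentum and gradient for the step; β₂ for the momentum update; the toy quadratic and hyperparameters are illustrative choices):

```python
def lion_step(w, g, m, lr=0.1, beta1=0.9, beta2=0.99, wd=0.0):
    # Lion: the step direction is the SIGN of an interpolation of the
    # momentum and the current gradient, so every step has size lr.
    sign = lambda x: (x > 0) - (x < 0)
    update = sign(beta1 * m + (1 - beta1) * g)
    w = w - lr * (update + wd * w)       # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g      # momentum tracks the gradient
    return w, m

# Toy problem: minimize f(w) = (w - 2)^2, gradient 2*(w - 2).
w, m = 0.0, 0.0
for _ in range(300):
    w, m = lion_step(w, g=2 * (w - 2.0), m=m)
```

Because only the sign survives, Lion's updates have uniform magnitude lr regardless of gradient scale, which is why it ends up oscillating in a small band around the minimum rather than converging exactly.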
Sharpness-Aware Minimization (SAM) trains models to perform well even when their weights are slightly perturbed, seeking flatter minima that generalize better.
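A sketch of SAM's two-step update on a toy quadratic (the loss, learning rate, and radius ρ are illustrative): first climb to the worst point within a ball of radius ρ, then descend using the gradient taken there.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    # Step 1: ascend to w + eps, with eps the gradient rescaled to norm rho
    # (the first-order approximation of the worst nearby perturbation).
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: update the REAL weights with the gradient at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy loss f(w) = w0^2 + 5*w1^2, gradient (2*w0, 10*w1).
grad_fn = lambda w: [2 * w[0], 10 * w[1]]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w, grad_fn)
```

The extra gradient evaluation at the perturbed point is what makes SAM roughly twice the cost of plain SGD per step.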
The Moore–Penrose pseudoinverse generalizes matrix inversion to rectangular or singular matrices and is denoted A⁺.
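A sketch for the easy case: when A is tall with full column rank, A⁺ = (AᵀA)⁻¹Aᵀ, and for two columns the 2×2 inverse can be written out by hand (libraries instead use the SVD, which also covers rank-deficient matrices).

```python
def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def pinv_tall(A):
    # A+ = (A^T A)^{-1} A^T, valid when A has full column rank (2 cols here).
    At = transpose(A)
    (a, b), (c, d) = matmul(At, A)       # 2x2 Gram matrix A^T A
    det = a * d - b * c
    G_inv = [[d / det, -b / det], [-c / det, a / det]]
    return matmul(G_inv, At)

A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
A_pinv = pinv_tall(A)      # 2x3: a left inverse of the 3x2 matrix A

I2 = matmul(A_pinv, A)     # should be the 2x2 identity
```

For a tall full-rank A, A⁺b is exactly the least-squares solution of Ax = b, which is the main reason the pseudoinverse shows up in practice.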
A sparse matrix stores only its nonzero entries, saving huge amounts of memory when most entries are zero.
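A minimal dictionary-of-keys (DOK) sketch of the idea: a 1000×1000 matrix with three nonzeros stores three dictionary entries instead of a million floats, and a matrix-vector product touches only the stored entries.

```python
class SparseMatrix:
    # Dictionary-of-keys storage: {(row, col): nonzero value}.
    def __init__(self, shape):
        self.shape = shape
        self.data = {}

    def __setitem__(self, ij, v):
        if v != 0:
            self.data[ij] = v
        else:
            self.data.pop(ij, None)  # writing 0 deletes the entry

    def matvec(self, x):
        # y = A @ x in O(nnz) time, never visiting the implicit zeros.
        y = [0.0] * self.shape[0]
        for (i, j), v in self.data.items():
            y[i] += v * x[j]
        return y

A = SparseMatrix((1000, 1000))
A[0, 0] = 2.0
A[0, 999] = 1.0
A[500, 3] = -4.0

y = A.matvec([1.0] * 1000)
```

Production libraries layer faster formats (CSR, CSC) on the same principle, trading easy updates for cache-friendly row or column traversal.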
The Kronecker product A ⊗ B expands a small matrix into a larger block matrix by multiplying every entry of A with the whole matrix B.
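A short sketch of the block structure via its index formula: entry (i·p + k, j·q + l) of A ⊗ B equals A[i][j]·B[k][l], where B is p × q.

```python
def kron(A, B):
    # (A ⊗ B)[i*p + k][j*q + l] = A[i][j] * B[k][l], with B of size p x q:
    # each entry a_ij of A is replaced by the whole block a_ij * B.
    p, q = len(B), len(B[0])
    m, n = len(A), len(A[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(n * q)] for i in range(m * p)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
K = kron(A, B)  # 4x4 block matrix: [[1*B, 2*B], [3*B, 4*B]]
```

So an m×n times p×q product yields an mp×nq matrix; the identity (A ⊗ B)(C ⊗ D) = AC ⊗ BD is what makes it useful for building big structured operators from small factors.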