
Normalizing Flows

Key Points

  • Normalizing flows transform a simple base distribution (like a standard Gaussian) into a complex target distribution using a chain of invertible functions.
  • The change-of-variables formula links densities through the Jacobian determinant of the inverse transform: p_X(x) = p_Z(f^{-1}(x)) |det J_{f^{-1}}(x)|.
  • Designs like affine coupling layers (e.g., RealNVP) ensure easy inversion and cheap log-determinant computation by making the Jacobian triangular.
  • Training maximizes the log-likelihood, which becomes the base log-density plus a sum of per-layer log-determinants.
  • Efficient implementations avoid O(d^3) Jacobian determinants by using structures with O(d) or O(d^2) cost per layer (triangular, LU-parameterized, or 1×1 convolutions).
  • Sampling draws z from the base and pushes it forward through the flow; density evaluation pulls x back through the inverse.
  • Numerical stability requires summing log-scales instead of multiplying scales, clamping scales, and avoiding direct determinants.
  • C++ implementations can model flows with plain linear algebra: coupling layers for nonlinearity and triangular or LU layers for fast log-dets.

Prerequisites

  • Multivariable calculus (Jacobian and chain rule) — Flows rely on Jacobians and the change-of-variables theorem to relate densities.
  • Linear algebra (matrices, determinants, LU) — Efficient log-determinant computation and invertibility constraints depend on matrix properties.
  • Probability densities and Gaussian distributions — Flows start from and evaluate against simple base densities like Gaussians.
  • Numerical stability and log-sum-exp tricks — Stable accumulation of log-determinants and scaling factors is essential.
  • Neural networks basics (MLPs, activations) — Scale/shift functions in coupling layers are typically small neural networks.
  • C++ programming and basic linear algebra implementation — Implementing flow layers requires vector/matrix operations and careful memory handling.

Detailed Explanation


1. Overview

Normalizing flows are probabilistic models that build complex probability distributions by warping a simple, known distribution through a sequence of invertible transformations. Imagine starting with a standard multivariate Gaussian (easy to sample from and to evaluate) and then repeatedly bending and stretching space using functions that can be undone exactly. Because each step is invertible and differentiable, we can track how volumes (and therefore densities) change via the Jacobian determinant of each transformation. The change-of-variables theorem tells us how probability densities transform under such mappings, making exact likelihood computation possible. A typical flow composes many simple, carefully designed layers to achieve both expressivity and computational efficiency. Popular designs include affine coupling layers (RealNVP), invertible 1×1 convolutions (Glow), and triangular/LU-parameterized linear transforms. Crucially, these designs keep the Jacobian determinant easy to compute—ideally as a sum over diagonal entries—so the overall log-likelihood reduces to a base log-density plus the sum of per-layer log-determinants. Flows are attractive because they support both fast sampling (forward pass) and exact density evaluation (inverse pass), unlike many generative models that only do one efficiently.

2. Intuition & Analogies

Picture molding a lump of clay. Initially it’s a perfect sphere—simple, symmetric, and easy to describe. Now, with each gentle press and twist of your hands, you form ridges, cavities, and intricate shapes. If each press is carefully controlled so you can reverse it exactly, you could reconstruct the original sphere by applying the reverse presses in the reverse order. Normalizing flows use this idea for probability distributions. We start with a simple “sphere-like” distribution (a standard Gaussian). Each flow layer is like a reversible press that deforms space. Where you compress, probability density increases (points are packed tighter); where you stretch, density decreases. The amount of compression or stretching at a point is measured by the Jacobian determinant of the transformation—the multiplier that tells you how a tiny volume changes. By chaining many reversible presses (layers), you can sculpt extremely detailed, multimodal distributions from a simple base. Affine coupling layers are like pressing only half the clay while holding the other half fixed; you decide how to press the second half based on the first half. Because only part moves at a time, the math remains simple and you always know exactly how much volume changed (you just add up some log-scales). LU and triangular layers are like using tools that only push along certain directions, keeping the “volume change tracker” easy to compute. In the end, to say how probable a sculpture point x is, you just unpress it back to the sphere, account for all the presses’ volume changes, and read off the sphere’s density at the recovered point.

3. Formal Definition

Let Z be a random vector with known density p_Z(z) on R^d, and let f: R^d → R^d be a bijective, differentiable mapping with inverse f^{-1}. Define X = f(Z). The change-of-variables theorem gives the density of X as p_X(x) = p_Z(f^{-1}(x)) |det J_{f^{-1}}(x)|, where J_{f^{-1}}(x) is the Jacobian matrix of f^{-1} evaluated at x. In practice, flows compose K invertible functions: f = g_1 ∘ g_2 ∘ ⋯ ∘ g_K. Writing h_0 = x and h_i = g_i^{-1}(h_{i-1}) for i = 1..K, we get z = h_K and log p_X(x) = log p_Z(z) + Σ_{i=1}^{K} log |det J_{g_i^{-1}}(h_{i-1})|. The art of flow design is choosing each g_i so that both g_i and g_i^{-1} are fast to compute and the log-determinant is tractable. Affine coupling layers partition variables into two blocks; one block is transformed using scale and shift functions that depend only on the other block. This yields a triangular Jacobian, so the log-determinant reduces to a sum over the scale outputs. Other layers include invertible linear transforms parameterized via LU (the log-determinant is the sum of the logs of U's diagonal entries) and invertible 1×1 convolutions for channel mixing in images.

4. When to Use

Use normalizing flows when you need a generative model that supports both fast sampling and exact likelihoods. They excel in density estimation tasks, anomaly detection (low likelihood indicates outliers), and as flexible priors in Bayesian models where you must evaluate densities precisely. Flows are also effective for modeling continuous-valued data such as audio waveforms, tabular data, and image pixels (after dequantization). If you can exploit structure (e.g., triangular Jacobians, LU factorization, or channel-wise couplings), you can scale to high dimensions while keeping computation tractable. They are particularly appealing when you want interpretability of likelihoods (unlike some implicit models), or need likelihood-based training objectives (e.g., minimizing negative log-likelihood or bits-per-dimension in images). Consider them when re-parameterizable sampling is important (e.g., variational inference) and when invertibility can be guaranteed by design. However, if your data are discrete without a suitable continuous relaxation, or if you require extremely global dependencies that are difficult for coupling flows to capture without very deep stacks, you may prefer autoregressive models or diffusion models.

⚠️Common Mistakes

  • Forgetting the absolute value in |det J|. The determinant can be negative; omitting |·| gives wrong densities. Always work with log|det J| to maintain numerical stability.
  • Computing a full Jacobian and its determinant naively. This is O(d^3) per layer and defeats the purpose. Use structures with triangular, block-triangular, or LU forms so log-dets are O(d) or O(d^2).
  • Ignoring invertibility constraints. For planar or radial flows, parameters must satisfy conditions to ensure invertibility. For affine coupling, only the overall layer must be invertible; the s and t networks need not be.
  • Exponentiating large scales directly. Using exp(s) can overflow. Clamp s (e.g., via tanh and a multiplier) and accumulate log-dets as sums, never by forming determinants explicitly.
  • Mixing up forward and inverse. Sampling uses forward (z → x), while likelihood evaluation uses inverse (x → z). Make sure the sign of the log-determinant matches the chosen direction.
  • Poor mixing between dimensions. Using the same mask repeatedly in coupling layers limits expressivity. Alternate masks and add permutation or invertible linear mixing between layers.
  • Forgetting data preprocessing. For images, dequantize and rescale; for continuous data, standardize. Flows are sensitive to scale.
  • Not handling batch shapes carefully in implementations. Mismatched dimensions or incorrectly broadcasted scale/shift vectors can silently corrupt results.

Key Formulas

Change of Variables

p_X(x) = p_Z(f^{-1}(x)) |det J_{f^{-1}}(x)|

Explanation: Relates the density pX​ at x to the base density evaluated at the inverse image, scaled by how volumes change under the inverse map. This is the core identity enabling exact likelihoods in flows.

Composed Flow Log-Likelihood

log p_X(x) = log p_Z(z) + Σ_{i=1}^{K} log |det J_{g_i^{-1}}(h_{i-1})|,  where h_0 = x, h_i = g_i^{-1}(h_{i-1}), z = h_K

Explanation: For a composition of K invertible layers, the total log-likelihood is the base log-density plus the sum of per-layer log-determinants along the inverse path.

Multivariate Gaussian Log-Density

log p_N(z; μ, Σ) = −(1/2) (d log(2π) + log det Σ + (z − μ)^T Σ^{-1} (z − μ))

Explanation: This gives the log-probability of a Gaussian vector. With Σ = I and μ = 0, it simplifies to a constant minus half the squared norm of z.

Affine Coupling Layer

y_a = x_a
y_b = x_b ⊙ exp(s(x_a)) + t(x_a)
log|det J| = Σ_j s_j(x_a)

Explanation: Keeping part of the vector fixed yields a triangular Jacobian. The log-determinant is the sum of the scale outputs, making it cheap to compute and perfectly invertible.

Matrix Determinant Lemma

det(I + u ϕ(z)^T) = 1 + u^T ϕ(z)

Explanation: Used by planar flows where J=I+uϕ(z)^⊤. The determinant reduces to a simple scalar expression, enabling fast log-determinant computation.

LU Log-Determinant

log|det W| = Σ_{i=1}^{d} log|u_{ii}|,  if W = P L U

Explanation: For an LU-parameterized invertible linear map, the log-determinant is the sum of logs of U’s diagonal, independent of x. This underpins efficient invertible 1×1 convolutions.

Negative Log-Likelihood Objective

NLL(θ) = − Σ_{n=1}^{N} log p_X(x^{(n)}; θ)

Explanation: Training flows by maximum likelihood minimizes the total negative log-likelihood over data. Gradients backpropagate through inverses and log-determinants.

Inverse Function Theorem (Local)

detJf​(z)=0⟺f is locally invertible at z

Explanation: A nonzero Jacobian determinant guarantees a local inverse. Flow layers must maintain this condition everywhere to ensure valid densities.

Complexity Analysis

Let d be the data dimension, K the number of layers, and B the batch size. Naively computing a full Jacobian and its determinant per layer costs O(d^3), which is prohibitive. Flow architectures avoid this by structuring transformations so that log|det J| is cheap:

  • Affine coupling layers (RealNVP): The Jacobian is block-triangular. Computing s(x_a) and t(x_a) typically dominates; if implemented as linear maps or small MLPs on roughly d/2 inputs producing d/2 outputs, each layer costs O(d^2) per example (matrix–vector multiplications), while the log-determinant reduces to summing O(d) elements. Across K layers, time is O(K d^2) (or less with sparse/low-rank nets). Space is O(d + P), where P is the number of parameters; with batch size B, O(B d + P).
  • Triangular/LU linear layers: Applying an upper (or lower) triangular map and its inverse uses forward/back substitution in O(d^2). The log-determinant is just the sum of logs of the diagonal entries, O(d). Combining with a permutation improves mixing at the same asymptotic cost.
  • Invertible 1×1 convolutions (for images with c channels): Per spatial site, application costs O(c^2) and the log-determinant O(c) via LU; the total scales with image size.

Therefore, a K-layer flow with efficient structures typically runs in O(B K d^2) time and O(B d + P) space. Computing general determinants instead balloons the cost to O(B K d^3). Numerical stability requires storing and summing log-scales rather than forming products, and using parameterizations (e.g., exp of unconstrained diagonals) that ensure invertibility without costly constraints.

Code Examples

RealNVP-style affine coupling layer: forward, inverse, and log-likelihood
#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <numeric>
#include <algorithm>

// Utility types and functions for simple vector/matrix ops
using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

static Vec matvec(const Mat &W, const Vec &x) {
    size_t m = W.size();
    size_t n = x.size();
    Vec y(m, 0.0);
    for (size_t i = 0; i < m; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += W[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

static Vec add(const Vec &a, const Vec &b) {
    Vec y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = a[i] + b[i];
    return y;
}

static Vec exp_vec(const Vec &x) {
    Vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = std::exp(x[i]);
    return y;
}

static Vec hadamard(const Vec &a, const Vec &b) {
    Vec y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = a[i] * b[i];
    return y;
}

// A tiny linear layer: y = W x + b
struct Linear {
    Mat W; // (out x in)
    Vec b; // (out)
    Linear() {}
    Linear(size_t out_dim, size_t in_dim, std::mt19937 &rng) {
        std::normal_distribution<double> N(0.0, 0.1);
        W.assign(out_dim, Vec(in_dim));
        b.assign(out_dim, 0.0);
        for (size_t i = 0; i < out_dim; ++i) {
            for (size_t j = 0; j < in_dim; ++j) W[i][j] = N(rng);
            b[i] = N(rng);
        }
    }
    Vec forward(const Vec &x) const { return add(matvec(W, x), b); }
};

// Affine coupling layer with a binary mask: mask[i]=1 means x[i] is kept (a), 0 means transformed (b)
struct AffineCoupling {
    std::vector<int> mask; // size d
    // s(x_a) and t(x_a) parameterized as small linear layers for demo
    Linear s_lin;
    Linear t_lin;
    size_t d, da, db;

    AffineCoupling() : d(0), da(0), db(0) {}
    AffineCoupling(const std::vector<int> &mask_, std::mt19937 &rng) : mask(mask_) {
        d = mask.size();
        da = std::accumulate(mask.begin(), mask.end(), 0);
        db = d - da;
        s_lin = Linear(db, da, rng);
        t_lin = Linear(db, da, rng);
    }

    // Split x into x_a (kept) and x_b (transformed) according to mask
    void split(const Vec &x, Vec &xa, Vec &xb) const {
        xa.clear(); xb.clear();
        for (size_t i = 0; i < d; ++i) {
            if (mask[i]) xa.push_back(x[i]); else xb.push_back(x[i]);
        }
    }

    // Merge (ya from kept, yb from transformed) back into a full vector
    Vec merge(const Vec &ya, const Vec &yb) const {
        Vec y(d);
        size_t ia = 0, ib = 0;
        for (size_t i = 0; i < d; ++i) {
            if (mask[i]) { y[i] = ya[ia++]; } else { y[i] = yb[ib++]; }
        }
        return y;
    }

    // Forward: y = f(x); returns y and log|det J_f(x)|
    Vec forward(const Vec &x, double &logdet) const {
        Vec xa, xb; split(x, xa, xb);
        // Compute s and t from xa
        Vec s_raw = s_lin.forward(xa);
        // Clamp scale with tanh to avoid extreme exp; scale_factor tunes capacity
        const double scale_factor = 1.5;
        for (double &v : s_raw) v = scale_factor * std::tanh(v);
        Vec t = t_lin.forward(xa);
        // Transform xb -> yb
        Vec exp_s = exp_vec(s_raw);
        Vec yb = add(hadamard(xb, exp_s), t);
        Vec y = merge(xa, yb);
        // log|det J| = sum(s)
        logdet = std::accumulate(s_raw.begin(), s_raw.end(), 0.0);
        return y;
    }

    // Inverse: x = f^{-1}(y); returns x and log|det J_{f^{-1}}(y)| = -sum(s)
    Vec inverse(const Vec &y, double &logdet_inv) const {
        Vec ya, yb; split(y, ya, yb);
        Vec s_raw = s_lin.forward(ya);
        const double scale_factor = 1.5;
        for (double &v : s_raw) v = scale_factor * std::tanh(v);
        Vec t = t_lin.forward(ya);
        // Invert: xb = (yb - t) * exp(-s)
        Vec xb(yb.size());
        for (size_t i = 0; i < yb.size(); ++i) xb[i] = (yb[i] - t[i]) * std::exp(-s_raw[i]);
        Vec x = merge(ya, xb);
        logdet_inv = -std::accumulate(s_raw.begin(), s_raw.end(), 0.0);
        return x;
    }
};

// Simple flow: a stack of affine coupling layers
struct Flow {
    std::vector<AffineCoupling> layers;
    size_t d;
    Flow(size_t d_) : d(d_) {}

    // Forward pass: z -> x (sampling). Returns x and total log|det J_f(z)|
    Vec forward(const Vec &z, double &logdet) const {
        Vec h = z; logdet = 0.0;
        for (const auto &layer : layers) {
            double ld = 0.0; h = layer.forward(h, ld); logdet += ld;
        }
        return h;
    }

    // Inverse pass: x -> z (likelihood). Returns z and total log|det J_{f^{-1}}(x)|
    Vec inverse(const Vec &x, double &logdet_inv) const {
        Vec h = x; logdet_inv = 0.0;
        for (int i = (int)layers.size() - 1; i >= 0; --i) {
            double ld = 0.0; h = layers[i].inverse(h, ld); logdet_inv += ld;
        }
        return h;
    }
};

// Base distribution: standard Gaussian N(0, I)
double log_prob_standard_normal(const Vec &z) {
    const double LOG2PI = std::log(2.0 * std::acos(-1.0));
    double quad = 0.0; for (double v : z) quad += v * v;
    return -0.5 * (z.size() * LOG2PI + quad);
}

Vec sample_standard_normal(size_t d, std::mt19937 &rng) {
    std::normal_distribution<double> N(0.0, 1.0);
    Vec z(d); for (size_t i = 0; i < d; ++i) z[i] = N(rng); return z;
}

int main() {
    std::mt19937 rng(42);
    const size_t d = 4; // dimension

    // Build a flow with two coupling layers and complementary masks
    Flow flow(d);
    std::vector<int> mask1 = {1, 1, 0, 0};
    std::vector<int> mask2 = {0, 0, 1, 1};
    flow.layers.emplace_back(mask1, rng);
    flow.layers.emplace_back(mask2, rng);

    // Example: compute the log-likelihood of a data point x
    Vec x = {0.5, -1.0, 0.2, 1.2};
    double logdet_inv = 0.0;
    Vec z = flow.inverse(x, logdet_inv); // x -> z
    double logpz = log_prob_standard_normal(z);
    double logpx = logpz + logdet_inv; // change of variables

    std::cout << "z = ["; for (size_t i = 0; i < d; ++i) std::cout << z[i] << (i + 1 < d ? ", " : "]\n");
    std::cout << "log p(z) = " << logpz << "\n";
    std::cout << "sum log|det J_inv| = " << logdet_inv << "\n";
    std::cout << "log p(x) = " << logpx << "\n\n";

    // Example: sample from the model by pushing forward z -> x
    Vec z_samp = sample_standard_normal(d, rng);
    double logdet_fwd = 0.0;
    Vec x_samp = flow.forward(z_samp, logdet_fwd);
    std::cout << "sampled x = ["; for (size_t i = 0; i < d; ++i) std::cout << x_samp[i] << (i + 1 < d ? ", " : "]\n");
    std::cout << "sum log|det J_fwd| = " << logdet_fwd << "\n";

    return 0;
}

This program implements a minimal RealNVP-style affine coupling layer in C++. The layer keeps some coordinates fixed and uses them to compute per-dimension scale and shift for the remaining coordinates. The Jacobian is block-triangular, so the log-determinant is the sum of scale outputs. The Flow stacks two such layers with complementary masks, making the overall transform expressive and invertible. The code demonstrates both likelihood evaluation (x → z with sum of inverse log-determinants) and sampling (z → x with sum of forward log-determinants), using a standard Gaussian base.

Time: For d dimensions and L coupling layers with linear s, t maps, each layer costs O((d/2)·(d/2)) = O(d^2) per example; the total is O(L d^2). Sampling and likelihood evaluation have the same asymptotic cost.
Space: O(d) working memory per pass plus O(P) for parameters (weights/biases), where P ≈ 2L·((d/2)·(d/2) + d/2).
Invertible triangular linear flow with fast log-determinant via back/forward substitution
#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <numeric>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

struct TriangularLinear {
    size_t d;
    Mat L; // lower-triangular with ones on the diagonal
    Mat U; // upper-triangular with positive diagonal

    TriangularLinear(size_t d_, std::mt19937 &rng) : d(d_) {
        std::normal_distribution<double> N(0.0, 0.2);
        L.assign(d, Vec(d, 0.0));
        U.assign(d, Vec(d, 0.0));
        for (size_t i = 0; i < d; ++i) {
            for (size_t j = 0; j < d; ++j) {
                if (i > j) L[i][j] = N(rng); // strictly lower
                if (i < j) U[i][j] = N(rng); // strictly upper
            }
            L[i][i] = 1.0;                         // unit diagonal
            U[i][i] = std::exp(std::tanh(N(rng))); // positive diagonal via exp(tanh(.))
        }
    }

    // y = L*(U*x)
    Vec forward(const Vec &x) const {
        // temp = U*x (upper-triangular matvec)
        Vec temp(d, 0.0);
        for (int i = (int)d - 1; i >= 0; --i) {
            double s = 0.0;
            for (size_t j = i; j < d; ++j) s += U[i][j] * x[j];
            temp[i] = s;
        }
        // y = L*temp (lower-triangular matvec)
        Vec y(d, 0.0);
        for (size_t i = 0; i < d; ++i) {
            double s = 0.0;
            for (size_t j = 0; j <= i; ++j) s += L[i][j] * temp[j];
            y[i] = s;
        }
        return y;
    }

    // x = (L*U)^{-1} y: solve L z = y (forward substitution), then U x = z (back substitution)
    Vec inverse(const Vec &y) const {
        // Solve L z = y
        Vec z(d, 0.0);
        for (size_t i = 0; i < d; ++i) {
            double s = y[i];
            for (size_t j = 0; j < i; ++j) s -= L[i][j] * z[j];
            z[i] = s; // L[i][i] = 1.0
        }
        // Solve U x = z
        Vec x(d, 0.0);
        for (int i = (int)d - 1; i >= 0; --i) {
            double s = z[i];
            for (size_t j = i + 1; j < d; ++j) s -= U[i][j] * x[j];
            x[i] = s / U[i][i];
        }
        return x;
    }

    // log|det(L*U)| = log|det L| + log|det U| = sum log diag(U) (since det L = 1)
    double logabsdet() const {
        double s = 0.0;
        for (size_t i = 0; i < d; ++i) s += std::log(U[i][i]);
        return s;
    }
};

// Base: standard Gaussian
double log_prob_standard_normal(const Vec &z) {
    const double LOG2PI = std::log(2.0 * std::acos(-1.0));
    double quad = 0.0; for (double v : z) quad += v * v;
    return -0.5 * (z.size() * LOG2PI + quad);
}

int main() {
    std::mt19937 rng(7);
    size_t d = 5;
    TriangularLinear layer(d, rng);

    // Likelihood of a data point x under the linear flow x = W z
    Vec x = {0.1, -0.3, 1.0, 0.7, -0.2};
    Vec z = layer.inverse(x);
    double logpz = log_prob_standard_normal(z);
    double logdet_inv = -layer.logabsdet(); // log|det J| of the inverse map
    double logpx = logpz + logdet_inv;

    std::cout << "log p(z) = " << logpz << "\n";
    std::cout << "log|det J_inv| = " << logdet_inv << "\n";
    std::cout << "log p(x) = " << logpx << "\n";

    // Sampling: z -> x; the forward log-determinant is +log|det W|
    Vec z_samp(d, 0.0);
    std::normal_distribution<double> N(0.0, 1.0);
    for (size_t i = 0; i < d; ++i) z_samp[i] = N(rng);
    Vec x_samp = layer.forward(z_samp);
    std::cout << "sample x[0] = " << x_samp[0] << " (of " << d << ")\n";

    return 0;
}

This example builds an invertible linear flow parameterized as a product of a unit-lower-triangular L and an upper-triangular U with positive diagonal. The forward map is y = L(Ux); inversion uses forward- then back-substitution. The log-determinant is just the sum of logs of U’s diagonal—computed in O(d). Although purely linear flows are limited in expressivity, this layer is a key building block (and mirrors LU/triangular tricks used in Glow for efficient channel mixing).

Time: Forward and inverse each cost O(d^2) due to triangular matrix operations; the log-determinant is O(d).
Space: O(d^2) for storing L and U; O(d) working memory during solves.
Tags: normalizing flows, change of variables, Jacobian determinant, affine coupling, RealNVP, Glow, invertible transformation, LU factorization, triangular Jacobian, density estimation, log-likelihood, sampling, invertible neural networks, 1×1 convolution, planar flow