Normalizing Flows
Key Points
- Normalizing flows transform a simple base distribution (like a standard Gaussian) into a complex target distribution using a chain of invertible functions.
- The change-of-variables formula links densities through the Jacobian determinant of the inverse transform: p_X(x) = p_Z(f^{-1}(x)) · |det J_{f^{-1}}(x)|.
- Designs like affine coupling layers (e.g., RealNVP) ensure easy inversion and cheap log-determinant computation by making the Jacobian triangular.
- Training maximizes the log-likelihood, which becomes the base log-density plus a sum of per-layer log-determinants.
- Efficient implementations avoid O(d^3) Jacobian determinants by using structures with O(d) or O(d^2) cost per layer (triangular, LU-parameterized, or 1×1 convolutions).
- Sampling is done by drawing z from the base and pushing it forward through the flow; density evaluation pulls x back through the inverse.
- Numerical stability requires summing log-scales instead of multiplying scales, clamping scales, and avoiding direct determinants.
- C++ implementations can model flows with plain linear algebra: coupling layers for nonlinearity and triangular or LU layers for fast log-dets.
Prerequisites
- Multivariable calculus (Jacobian and chain rule) — Flows rely on Jacobians and the change-of-variables theorem to relate densities.
- Linear algebra (matrices, determinants, LU) — Efficient log-determinant computation and invertibility constraints depend on matrix properties.
- Probability densities and Gaussian distributions — Flows start from and evaluate against simple base densities like Gaussians.
- Numerical stability and log-sum-exp tricks — Stable accumulation of log-determinants and scaling factors is essential.
- Neural networks basics (MLPs, activations) — Scale/shift functions in coupling layers are typically small neural networks.
- C++ programming and basic linear algebra implementation — Implementing flow layers requires vector/matrix operations and careful memory handling.
Detailed Explanation
01 Overview
Normalizing flows are probabilistic models that build complex probability distributions by warping a simple, known distribution through a sequence of invertible transformations. Imagine starting with a standard multivariate Gaussian (easy to sample from and to evaluate) and then repeatedly bending and stretching space using functions that can be undone exactly. Because each step is invertible and differentiable, we can track how volumes (and therefore densities) change via the Jacobian determinant of each transformation. The change-of-variables theorem tells us how probability densities transform under such mappings, making exact likelihood computation possible. A typical flow composes many simple, carefully designed layers to achieve both expressivity and computational efficiency. Popular designs include affine coupling layers (RealNVP), invertible 1×1 convolutions (Glow), and triangular/LU-parameterized linear transforms. Crucially, these designs keep the Jacobian determinant easy to compute—ideally as a sum over diagonal entries—so the overall log-likelihood reduces to a base log-density plus the sum of per-layer log-determinants. Flows are attractive because they support both fast sampling (forward pass) and exact density evaluation (inverse pass), unlike many generative models that only do one efficiently.
02 Intuition & Analogies
Picture molding a lump of clay. Initially it’s a perfect sphere—simple, symmetric, and easy to describe. Now, with each gentle press and twist of your hands, you form ridges, cavities, and intricate shapes. If each press is carefully controlled so you can reverse it exactly, you could reconstruct the original sphere by applying the reverse presses in the reverse order. Normalizing flows use this idea for probability distributions. We start with a simple “sphere-like” distribution (a standard Gaussian). Each flow layer is like a reversible press that deforms space. Where you compress, probability density increases (points are packed tighter); where you stretch, density decreases. The amount of compression or stretching at a point is measured by the Jacobian determinant of the transformation—the multiplier that tells you how a tiny volume changes. By chaining many reversible presses (layers), you can sculpt extremely detailed, multimodal distributions from a simple base. Affine coupling layers are like pressing only half the clay while holding the other half fixed; you decide how to press the second half based on the first half. Because only part moves at a time, the math remains simple and you always know exactly how much volume changed (you just add up some log-scales). LU and triangular layers are like using tools that only push along certain directions, keeping the “volume change tracker” easy to compute. In the end, to say how probable a sculpture point x is, you just unpress it back to the sphere, account for all the presses’ volume changes, and read off the sphere’s density at the recovered point.
03 Formal Definition
Let z ∈ R^d have a known density p_Z (the base distribution), and let f = f_K ∘ … ∘ f_1 be a composition of invertible, differentiable maps with x = f(z). The change-of-variables theorem gives p_X(x) = p_Z(f^{-1}(x)) · |det J_{f^{-1}}(x)|, or in log form, log p_X(x) = log p_Z(z) + Σ_{k=1}^{K} log|det J_{f_k^{-1}}|, where the log-determinants are evaluated at the intermediate values along the inverse path. Training maximizes this exact log-likelihood over the data.
04 When to Use
Use normalizing flows when you need a generative model that supports both fast sampling and exact likelihoods. They excel in density estimation tasks, anomaly detection (low likelihood indicates outliers), and as flexible priors in Bayesian models where you must evaluate densities precisely. Flows are also effective for modeling continuous-valued data such as audio waveforms, tabular data, and image pixels (after dequantization). If you can exploit structure (e.g., triangular Jacobians, LU factorization, or channel-wise couplings), you can scale to high dimensions while keeping computation tractable. They are particularly appealing when you want interpretability of likelihoods (unlike some implicit models), or need likelihood-based training objectives (e.g., minimizing negative log-likelihood or bits-per-dimension in images). Consider them when re-parameterizable sampling is important (e.g., variational inference) and when invertibility can be guaranteed by design. However, if your data are discrete without a suitable continuous relaxation, or if you require extremely global dependencies that are difficult for coupling flows to capture without very deep stacks, you may prefer autoregressive models or diffusion models.
⚠️ Common Mistakes
- Forgetting the absolute value in |det J|. The determinant can be negative; omitting |·| gives wrong densities. Always work with log|det J| to maintain numerical stability.
- Computing a full Jacobian and its determinant naively. This is O(d^3) per layer and defeats the purpose. Use structures with triangular, block-triangular, or LU forms so log-dets are O(d) or O(d^2).
- Ignoring invertibility constraints. For planar or radial flows, parameters must satisfy conditions to ensure invertibility. For affine coupling, only the overall layer must be invertible; the s and t networks need not be.
- Exponentiating large scales directly. Using exp(s) can overflow. Clamp s (e.g., via tanh and a multiplier) and accumulate log-dets as sums, never by forming determinants explicitly.
- Mixing up forward and inverse. Sampling uses forward (z → x), while likelihood evaluation uses inverse (x → z). Make sure the sign of the log-determinant matches the chosen direction.
- Poor mixing between dimensions. Using the same mask repeatedly in coupling layers limits expressivity. Alternate masks and add permutation or invertible linear mixing between layers.
- Forgetting data preprocessing. For images, dequantize and rescale; for continuous data, standardize. Flows are sensitive to scale.
- Not handling batch shapes carefully in implementations. Mismatched dimensions or incorrectly broadcasted scale/shift vectors can silently corrupt results.
Key Formulas
Change of Variables
p_X(x) = p_Z(f^{-1}(x)) · |det J_{f^{-1}}(x)|
Explanation: Relates the density at x to the base density evaluated at the inverse image, scaled by how volumes change under the inverse map. This is the core identity enabling exact likelihoods in flows.
Composed Flow Log-Likelihood
log p_X(x) = log p_Z(z) + Σ_{k=1}^{K} log|det J_{f_k^{-1}}|, where z = f^{-1}(x)
Explanation: For a composition of K invertible layers, the total log-likelihood is the base log-density plus the sum of per-layer log-determinants along the inverse path.
Multivariate Gaussian Log-Density
log N(z; μ, Σ) = −(1/2) [ d·log(2π) + log|Σ| + (z − μ)^T Σ^{-1} (z − μ) ]
Explanation: This gives the log-probability of a Gaussian vector. With Σ = I and μ = 0, it simplifies to a constant minus half the squared norm of z.
Affine Coupling Layer
y_a = x_a,  y_b = x_b ⊙ exp(s(x_a)) + t(x_a);  log|det J| = Σ_i s(x_a)_i
Explanation: Keeping part of the vector fixed yields a triangular Jacobian. The log-determinant is the sum of the scale outputs, making it cheap to compute and perfectly invertible.
Matrix Determinant Lemma
det(I + u v^T) = 1 + v^T u
Explanation: Used by planar flows, where f(z) = z + u h(w^T z + b), so the Jacobian is the identity plus a rank-one term. The determinant reduces to a simple scalar expression, enabling fast log-determinant computation.
LU Log-Determinant
log|det(L U)| = Σ_i log|U_ii|  (L unit-diagonal lower-triangular, U upper-triangular)
Explanation: For an LU-parameterized invertible linear map, the log-determinant is the sum of logs of U’s diagonal, independent of x. This underpins efficient invertible 1×1 convolutions.
Negative Log-Likelihood Objective
L(θ) = −Σ_{n=1}^{N} log p_X(x_n; θ)
Explanation: Training flows by maximum likelihood minimizes the total negative log-likelihood over data. Gradients backpropagate through inverses and log-determinants.
Inverse Function Theorem (Local)
det J_f(x₀) ≠ 0  ⇒  f is invertible in a neighborhood of x₀
Explanation: A nonzero Jacobian determinant guarantees a local inverse. Flow layers must maintain this condition everywhere to ensure valid densities.
Complexity Analysis
Per affine coupling layer, the forward and inverse passes cost O(d) on top of evaluating the s and t networks, and the log-determinant is an O(d) sum of scale outputs. Triangular and LU-parameterized linear layers cost O(d^2) per matrix-vector product or triangular solve, with an O(d) log-determinant read off the diagonal. A naive full-Jacobian determinant would cost O(d^3) per layer, which these structured designs are built to avoid.
Code Examples
```cpp
#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <numeric>
#include <algorithm>

// Utility functions for simple vector/matrix ops
using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

static double dot(const Vec &a, const Vec &b) {
    double s = 0.0; for (size_t i = 0; i < a.size(); ++i) s += a[i]*b[i]; return s;
}

static Vec matvec(const Mat &W, const Vec &x) {
    size_t m = W.size();
    size_t n = x.size();
    Vec y(m, 0.0);
    for (size_t i = 0; i < m; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < n; ++j) s += W[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

static Vec add(const Vec &a, const Vec &b) {
    Vec y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = a[i] + b[i];
    return y;
}

static Vec tanh_vec(const Vec &x) {
    Vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = std::tanh(x[i]);
    return y;
}

static Vec exp_vec(const Vec &x) {
    Vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = std::exp(x[i]);
    return y;
}

static Vec hadamard(const Vec &a, const Vec &b) {
    Vec y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = a[i] * b[i];
    return y;
}

// A tiny linear layer: y = W x + b
struct Linear {
    Mat W; // (out x in)
    Vec b; // (out)
    Linear() {}
    Linear(size_t out_dim, size_t in_dim, std::mt19937 &rng) {
        std::normal_distribution<double> N(0.0, 0.1);
        W.assign(out_dim, Vec(in_dim));
        b.assign(out_dim, 0.0);
        for (size_t i = 0; i < out_dim; ++i) {
            for (size_t j = 0; j < in_dim; ++j) W[i][j] = N(rng);
            b[i] = N(rng);
        }
    }
    Vec forward(const Vec &x) const { return add(matvec(W, x), b); }
};

// Affine coupling layer with a binary mask: mask[i]=1 means x[i] is kept (a),
// 0 means transformed (b)
struct AffineCoupling {
    std::vector<int> mask; // size d
    // s(x_a) and t(x_a) parameterized as small linear layers for demo
    Linear s_lin;
    Linear t_lin;
    size_t d, da, db;

    AffineCoupling() : d(0), da(0), db(0) {}
    AffineCoupling(const std::vector<int> &mask_, std::mt19937 &rng) : mask(mask_) {
        d = mask.size();
        da = std::accumulate(mask.begin(), mask.end(), 0);
        db = d - da;
        s_lin = Linear(db, da, rng);
        t_lin = Linear(db, da, rng);
    }

    // Splits x into x_a (kept) and x_b (transformed) according to mask
    void split(const Vec &x, Vec &xa, Vec &xb) const {
        xa.clear(); xb.clear();
        for (size_t i = 0; i < d; ++i) {
            if (mask[i]) xa.push_back(x[i]); else xb.push_back(x[i]);
        }
    }

    // Merge (ya from kept, yb from transformed) back into full vector
    Vec merge(const Vec &ya, const Vec &yb) const {
        Vec y(d);
        size_t ia = 0, ib = 0;
        for (size_t i = 0; i < d; ++i) {
            if (mask[i]) { y[i] = ya[ia++]; } else { y[i] = yb[ib++]; }
        }
        return y;
    }

    // Forward: y = f(x); returns y and log|det J_f(x)|
    Vec forward(const Vec &x, double &logdet) const {
        Vec xa, xb; split(x, xa, xb);
        // Compute s and t from xa
        Vec s_raw = s_lin.forward(xa);
        // Clamp scale with tanh to avoid extreme exp; scale_factor can tune capacity
        const double scale_factor = 1.5;
        for (double &v : s_raw) v = scale_factor * std::tanh(v);
        Vec t = t_lin.forward(xa);
        // Transform xb -> yb
        Vec exp_s = exp_vec(s_raw);
        Vec yb = add(hadamard(xb, exp_s), t);
        Vec y = merge(xa, yb);
        // log|det J| = sum(s)
        logdet = std::accumulate(s_raw.begin(), s_raw.end(), 0.0);
        return y;
    }

    // Inverse: x = f^{-1}(y); returns x and log|det J_{f^{-1}}(y)| = -sum(s)
    Vec inverse(const Vec &y, double &logdet_inv) const {
        Vec ya, yb; split(y, ya, yb);
        Vec s_raw = s_lin.forward(ya);
        const double scale_factor = 1.5;
        for (double &v : s_raw) v = scale_factor * std::tanh(v);
        Vec t = t_lin.forward(ya);
        // Invert: xb = (yb - t) * exp(-s)
        Vec xb(yb.size());
        for (size_t i = 0; i < yb.size(); ++i)
            xb[i] = (yb[i] - t[i]) * std::exp(-s_raw[i]);
        Vec x = merge(ya, xb);
        logdet_inv = -std::accumulate(s_raw.begin(), s_raw.end(), 0.0);
        return x;
    }
};

// Simple flow: a stack of affine coupling layers
struct Flow {
    std::vector<AffineCoupling> layers;
    size_t d;
    Flow(size_t d_) : d(d_) {}

    // Forward pass: z -> x (sampling). Returns x and total log|det J_f(z)|
    Vec forward(const Vec &z, double &logdet) const {
        Vec h = z; logdet = 0.0;
        for (const auto &L : layers) {
            double ld = 0.0; h = L.forward(h, ld); logdet += ld;
        }
        return h;
    }

    // Inverse pass: x -> z (likelihood). Returns z and total log|det J_{f^{-1}}(x)|
    Vec inverse(const Vec &x, double &logdet_inv) const {
        Vec h = x; logdet_inv = 0.0;
        for (int i = (int)layers.size() - 1; i >= 0; --i) {
            double ld = 0.0; h = layers[i].inverse(h, ld); logdet_inv += ld;
        }
        return h;
    }
};

// Base distribution: standard Gaussian N(0, I)
double log_prob_standard_normal(const Vec &z) {
    const double LOG2PI = std::log(2.0 * std::acos(-1));
    double quad = 0.0; for (double v : z) quad += v*v;
    return -0.5 * (z.size() * LOG2PI + quad);
}

Vec sample_standard_normal(size_t d, std::mt19937 &rng) {
    std::normal_distribution<double> N(0.0, 1.0);
    Vec z(d); for (size_t i = 0; i < d; ++i) z[i] = N(rng);
    return z;
}

int main() {
    std::mt19937 rng(42);
    const size_t d = 4; // dimension

    // Build a flow with two coupling layers and complementary masks
    Flow flow(d);
    std::vector<int> mask1 = {1,1,0,0};
    std::vector<int> mask2 = {0,0,1,1};
    flow.layers.emplace_back(mask1, rng);
    flow.layers.emplace_back(mask2, rng);

    // Example: compute log-likelihood of a data point x
    Vec x = {0.5, -1.0, 0.2, 1.2};
    double logdet_inv = 0.0;
    Vec z = flow.inverse(x, logdet_inv); // x -> z
    double logpz = log_prob_standard_normal(z);
    double logpx = logpz + logdet_inv;   // change-of-variables

    std::cout << "z = ["; for (size_t i = 0; i < d; ++i) std::cout << z[i] << (i+1<d?", ":"]\n");
    std::cout << "log p(z) = " << logpz << "\n";
    std::cout << "sum log|det J_inv| = " << logdet_inv << "\n";
    std::cout << "log p(x) = " << logpx << "\n\n";

    // Example: sample from the model by pushing forward z -> x
    Vec z_samp = sample_standard_normal(d, rng);
    double logdet_fwd = 0.0;
    Vec x_samp = flow.forward(z_samp, logdet_fwd);
    std::cout << "sampled x = ["; for (size_t i = 0; i < d; ++i) std::cout << x_samp[i] << (i+1<d?", ":"]\n");
    std::cout << "sum log|det J_fwd| = " << logdet_fwd << "\n";

    return 0;
}
```
This program implements a minimal RealNVP-style affine coupling layer in C++. The layer keeps some coordinates fixed and uses them to compute per-dimension scale and shift for the remaining coordinates. The Jacobian is block-triangular, so the log-determinant is the sum of scale outputs. The Flow stacks two such layers with complementary masks, making the overall transform expressive and invertible. The code demonstrates both likelihood evaluation (x → z with sum of inverse log-determinants) and sampling (z → x with sum of forward log-determinants), using a standard Gaussian base.
```cpp
#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <numeric>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

struct TriangularLinear {
    size_t d;
    Mat L; // lower-triangular with ones on diagonal
    Mat U; // upper-triangular with positive diagonal

    TriangularLinear(size_t d_, std::mt19937 &rng) : d(d_) {
        std::normal_distribution<double> N(0.0, 0.2);
        L.assign(d, Vec(d, 0.0));
        U.assign(d, Vec(d, 0.0));
        for (size_t i = 0; i < d; ++i) {
            for (size_t j = 0; j < d; ++j) {
                if (i > j) L[i][j] = N(rng); // strictly lower
                if (i < j) U[i][j] = N(rng); // strictly upper
            }
            L[i][i] = 1.0; // unit diag
            U[i][i] = std::exp(std::tanh(N(rng))); // positive diag via exp(tanh(.))
        }
    }

    // y = L*(U*x)
    Vec forward(const Vec &x) const {
        Vec y(d, 0.0);
        // temp = U*x (upper-triangular matvec)
        Vec temp(d, 0.0);
        for (int i = (int)d - 1; i >= 0; --i) {
            double s = 0.0;
            for (size_t j = i; j < d; ++j) s += U[i][j] * x[j];
            temp[i] = s;
        }
        // y = L*temp (lower-triangular matvec)
        for (size_t i = 0; i < d; ++i) {
            double s = 0.0;
            for (size_t j = 0; j <= i; ++j) s += L[i][j] * temp[j];
            y[i] = s;
        }
        return y;
    }

    // x = (L*U)^{-1} y: solve L z = y (forward substitution), then U x = z (back substitution)
    Vec inverse(const Vec &y) const {
        // Solve L z = y
        Vec z(d, 0.0);
        for (size_t i = 0; i < d; ++i) {
            double s = y[i];
            for (size_t j = 0; j < i; ++j) s -= L[i][j] * z[j];
            // L[i][i] = 1.0
            z[i] = s;
        }
        // Solve U x = z
        Vec x(d, 0.0);
        for (int i = (int)d - 1; i >= 0; --i) {
            double s = z[i];
            for (size_t j = i + 1; j < d; ++j) s -= U[i][j] * x[j];
            x[i] = s / U[i][i];
        }
        return x;
    }

    // log|det(L*U)| = log|det L| + log|det U| = sum log diag(U) (since det L = 1)
    double logabsdet() const {
        double s = 0.0; for (size_t i = 0; i < d; ++i) s += std::log(U[i][i]);
        return s;
    }
};

// Base: standard Gaussian
double log_prob_standard_normal(const Vec &z) {
    const double LOG2PI = std::log(2.0 * std::acos(-1));
    double quad = 0.0; for (double v : z) quad += v*v;
    return -0.5 * (z.size() * LOG2PI + quad);
}

int main() {
    std::mt19937 rng(7);
    size_t d = 5;
    TriangularLinear layer(d, rng);

    // Likelihood of a data point x under the linear flow x = W z
    Vec x = {0.1, -0.3, 1.0, 0.7, -0.2};
    Vec z = layer.inverse(x);
    double logpz = log_prob_standard_normal(z);
    double logdet_inv = -layer.logabsdet(); // inverse Jacobian log-determinant
    double logpx = logpz + logdet_inv;

    std::cout << "log p(z) = " << logpz << "\n";
    std::cout << "log|det J_inv| = " << logdet_inv << "\n";
    std::cout << "log p(x) = " << logpx << "\n";

    // Sampling: z -> x, logdet forward is +log|det|
    Vec z_samp(d, 0.0);
    std::normal_distribution<double> N(0.0, 1.0);
    for (size_t i = 0; i < d; ++i) z_samp[i] = N(rng);
    Vec x_samp = layer.forward(z_samp);
    std::cout << "sample x[0] = " << x_samp[0] << " (of " << d << ")\n";

    return 0;
}
```
This example builds an invertible linear flow parameterized as a product of a unit-lower-triangular L and an upper-triangular U with positive diagonal. The forward map is y = L(Ux); inversion uses forward- then back-substitution. The log-determinant is just the sum of logs of U’s diagonal—computed in O(d). Although purely linear flows are limited in expressivity, this layer is a key building block (and mirrors LU/triangular tricks used in Glow for efficient channel mixing).