
Normalizing Flow Variational Inference

Key Points

  • Normalizing-flow variational inference enriches the variational family by transforming a simple base distribution through a sequence of invertible, differentiable mappings.
  • The log-density after K flow steps equals the base log-density minus the sum of log absolute Jacobian determinants of each step.
  • This approach keeps exact likelihood evaluation tractable while increasing expressiveness to better approximate complex posteriors.
  • The ELBO is estimated via Monte Carlo by sampling from the flow (base sample plus transformations) and computing log p(x, z) − log q(z|x).
  • Planar flows are simple and cheap (O(D) per layer) but may require many layers; coupling flows like RealNVP offer fast triangular Jacobians.
  • Reparameterization enables low-variance gradient-based optimization by pushing randomness into the base distribution.
  • Care must be taken with numerical stability of log-determinants and invertibility constraints.
  • In C++, you can implement flows with standard-library random generators and explicit Jacobian terms; training needs gradients from an autodiff library or careful finite-difference approximations.

Prerequisites

  • →Multivariate Calculus — Understanding Jacobians and change-of-variables requires derivatives, determinants, and the chain rule.
  • →Probability Density Functions — Flows manipulate continuous densities; you must know how PDFs transform under mappings.
  • →KL Divergence and ELBO — NFVI optimizes ELBO, which includes KL and expected log-likelihood terms.
  • →Linear Algebra — Vectors, matrices, determinants, and norms appear in Jacobians and Gaussian log-densities.
  • →Stochastic Estimation (Monte Carlo) — ELBO is typically estimated via sampling from the variational distribution.
  • →Optimization and Gradients — Training flows needs gradients of ELBO with respect to flow parameters.

Detailed Explanation


01 Overview

Normalizing Flow Variational Inference (NFVI) is a technique to make variational inference more flexible by transforming a simple, tractable distribution (like a standard Gaussian) through a chain of invertible, differentiable functions, called flows. Each flow reshapes the probability mass while keeping the ability to compute exact log-densities via the change-of-variables formula. The result is a rich family of variational distributions that can capture multimodality, skewness, and complex dependencies—far beyond the limitations of mean-field Gaussians. In practice, we start with a base variable z_0 sampled from a simple distribution p_0, then apply K transformations z_k = f_k(z_{k-1}) to obtain z_K. The log-density of z_K is computable using the base density and the sum of log absolute Jacobian determinants across layers. This property enables unbiased Monte Carlo estimates of the Evidence Lower Bound (ELBO), allowing us to train the flow parameters by maximizing the ELBO. NFVI gracefully blends exact likelihood computation (enabled by invertibility) with deep, learnable transformations, achieving both tractability and expressivity.

02 Intuition & Analogies

Imagine squeezing and stretching a lump of clay to make it fit a complex mold. If you only push in one direction (like a diagonal Gaussian), you’ll miss the finer creases and curves. But if you repeatedly apply gentle, carefully designed deformations—press here, twist there—you can make the clay take on almost any shape. Normalizing flows are those sequences of gentle deformations for probability distributions. You start with a simple brick-shaped blob of probability (a standard normal). Each flow layer nudges and warps this blob slightly but in a way that you can exactly reverse and exactly quantify how volume changed. The “exactly quantify” part is crucial: when you squeeze part of the blob, density increases; when you stretch, density decreases. The Jacobian determinant of the transformation measures precisely how much squeezing or stretching you did. By chaining many such steps, you can sculpt the probability density to match complicated posterior shapes that arise in real-world Bayesian problems—like multiple plausible explanations for an image or correlated latent factors in time series. Because every step is invertible and its volume change is known, you can both sample (push forward) and score (compute log-density) efficiently. This is like having a reversible recipe with exact calorie tracking for every ingredient and step.

03 Formal Definition

Let z_0 ∼ p_0(z) be a base distribution that is easy to sample and evaluate. A normalizing flow is a sequence of invertible, differentiable mappings f_k: R^D → R^D for k = 1, …, K. Define z_k = f_k(z_{k−1}) and z_K = f_K ∘ ⋯ ∘ f_1(z_0). By the change-of-variables formula, the resulting density satisfies log q_K(z_K) = log p_0(z_0) − Σ_{k=1}^{K} log |det J_{f_k}(z_{k−1})|, where J_{f_k} is the Jacobian of f_k. In variational inference for a model p(x, z) = p(x|z) p(z), we define q_φ(z|x) using flows (possibly with parameters φ produced by an encoder) and optimize the Evidence Lower Bound (ELBO): L(φ) = E_{q_φ(z|x)}[log p(x, z) − log q_φ(z|x)]. Using reparameterization, we write z as a deterministic function of base noise ε ∼ p_0 and parameters φ, enabling low-variance gradient estimates of ∇_φ L. Specific flow families (planar, radial, coupling-based like RealNVP) are designed so that log |det J| is efficient to compute.

04 When to Use

Use NFVI when a simple variational family (e.g., mean-field Gaussian) underfits the posterior, leading to biased inferences or poor predictive performance. Typical use cases include: (1) complex posteriors with strong correlations, skewness, or multimodality; (2) amortized inference in deep generative models (e.g., VAEs) where flows refine the encoder’s approximate posterior; (3) Bayesian neural networks requiring expressive approximate posteriors over weights; (4) latent-variable time-series models with intricate latent dynamics; and (5) probabilistic programs where tractable, flexible posteriors accelerate inference. If exact likelihood evaluation of q is needed (e.g., for ELBO or importance sampling), normalizing flows are attractive because they preserve tractability. Prefer coupling flows (RealNVP, Glow) when dimensionality is high and you need O(D) Jacobian determinants; use planar/radial flows for lightweight, plug-in expressivity boosts in moderate dimensions. If your problem only needs sampling (not density evaluation), you might consider implicit models, but for variational inference, flows’ exact log-density is a key advantage.

⚠️Common Mistakes

  • Ignoring invertibility constraints: Some flows (e.g., planar) need parameter constraints to ensure invertibility. Without them, log-determinants can become undefined or numerically unstable. Use stabilized parameterizations (e.g., u-hat trick) or regularization.
  • Numerical instability in log-determinants: Summing many log |det J| terms can suffer from precision loss. Use log1p when appropriate, clamp scales in coupling layers, and avoid extreme activations.
  • Mismatch between forward and inverse: For coupling flows, ensure the forward transform and inverse are implemented consistently; errors silently corrupt density calculations.
  • Overly complex s/t networks: In RealNVP, large neural nets for scale/shift increase compute and may overfit. Start simple, add capacity gradually, and monitor ELBO generalization.
  • Too few Monte Carlo samples: ELBO estimates with very few samples can have high variance. Use mini-batching and average over multiple samples per data point when feasible.
  • Forgetting base density contribution: The base log-density log p_0(z_0) is essential in log q; omitting it leads to incorrect likelihoods.
  • Treating flows as black boxes: Inspect intermediate z_k and log-dets to debug; many issues show up as exploding/vanishing log-dets or wildly distributed activations.

Key Formulas

ELBO

L(φ) = E_{q_φ(z|x)}[ log p(x, z) − log q_φ(z|x) ]

Explanation: The ELBO measures how well the variational distribution matches the true posterior while rewarding data fit. Maximizing it brings q closer to p and increases the model’s evidence lower bound.

Flow Composition

z_k = f_k(z_{k−1}),   z_0 ∼ p_0(z)

Explanation: A normalizing flow is a composition of invertible functions applied to a base sample. Each step incrementally reshapes the distribution.

Change of Variables across K layers

log q_K(z_K) = log p_0(z_0) − Σ_{k=1}^{K} log |det J_{f_k}(z_{k−1})|

Explanation: The log-density of the transformed variable equals the base log-density minus the sum of log absolute Jacobian determinants. This is the core tractability property of flows.

KL Divergence

KL(q ‖ p) = E_{q(z)}[ log q(z) − log p(z) ]

Explanation: KL divergence quantifies how different q is from p. Minimizing it (equivalently maximizing the ELBO) drives q toward p.

Monte Carlo ELBO

L ≈ (1/S) Σ_{s=1}^{S} ( log p(x, z^{(s)}) − log q(z^{(s)}|x) ),   z^{(s)} ∼ q

Explanation: We estimate the ELBO by averaging over samples from the variational distribution. As S increases, the estimate concentrates around the true ELBO.

Matrix Determinant Lemma (Planar Flow)

log |det(I + u ψ(z)^⊤)| = log |1 + u^⊤ ψ(z)|

Explanation: Planar flows have Jacobians of the form I + u ψ^⊤, whose determinant reduces to a scalar expression. This makes log-determinant computation O(D) instead of O(D³).

RealNVP Affine Coupling

y_A = x_A,   y_B = x_B ⊙ exp(s(x_A)) + t(x_A)

Explanation: A coupling layer leaves part of the vector unchanged and affinely transforms the rest conditioned on it. The Jacobian is triangular, making the log-determinant equal to the sum of s(.) outputs.

RealNVP Log-Det

log |det J_{RealNVP}| = Σ_i s_i(x_A)

Explanation: Because the Jacobian is triangular with diagonal entries exp(s_i), the log-determinant is the sum of the scale terms. This yields O(D) complexity for the log-determinant.

Reparameterization Gradient

∇_φ L = E_{ε ∼ p_0}[ ∇_φ ( log p(x, z(ε; φ)) − log q_φ(z(ε; φ)|x) ) ]

Explanation: Expressing z as a function of base noise ϵ and parameters ϕ enables gradients to flow through the sampling process, reducing variance compared to score-function estimators.

Standard Gaussian Log-Density

log p_0(z_0) = −(1/2) ( D log(2π) + ‖z_0‖² )

Explanation: The log-density of a D-dimensional standard normal depends on the squared norm of z_0. This term appears in the flow log-density via the change of variables.

Complexity Analysis

Let D be the latent dimension, K the number of flow layers, and S the number of Monte Carlo samples per ELBO estimate. For general flows where the Jacobian is dense, computing log |det J| naïvely costs O(D³) using a determinant, which is prohibitive. Flow architectures are therefore designed to yield cheaper log-determinant computations: planar and radial flows use the matrix determinant lemma, making each layer O(D); coupling flows such as RealNVP/Glow have triangular Jacobians, so the log-determinant reduces to a sum over D/2 scale terms, also O(D) per layer. Thus, evaluating a single transformed sample typically costs O(KD) for these popular flows, not counting the cost of computing the s(.) and t(.) networks in coupling layers. If s and t are neural nets with P parameters and cost O(C) per evaluation, then each layer costs O(C + D), and a full pass is O(K(C + D)). Space complexity is O(D) for holding current activations and, in training with backpropagation, O(KD) to store intermediate z_k (or recompute them with checkpointing to trade time for memory). If s/t are neural nets, an additional O(P) parameters are stored. A Monte Carlo ELBO over S samples scales time by a factor of S, i.e., O(SKD) for pure planar/triangular layers, or O(SK(C + D)) with networks. For minibatch size B, multiply by B. Overall, NFVI offers linear complexity in dimension and layers for well-designed flows, enabling practical inference in moderate to high dimensions, provided the s/t networks are not excessively large.

Code Examples

Planar Flow: sampling and exact log-density via change of variables
#include <bits/stdc++.h>
using namespace std;

// Utility: sample from standard normal N(0,1)
double randn(std::mt19937 &rng) {
    static thread_local std::normal_distribution<double> dist(0.0, 1.0);
    return dist(rng);
}

// Dot product of two vectors
double dot(const vector<double> &a, const vector<double> &b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Planar flow: f(z) = z + u * h(w^T z + b), with h = tanh
struct PlanarFlow {
    vector<double> u, w; // size D
    double b;            // scalar

    PlanarFlow(size_t D, std::mt19937 &rng) : u(D), w(D) {
        // Initialize small random params for stability
        std::normal_distribution<double> dist(0.0, 0.1);
        for (size_t i = 0; i < D; ++i) {
            u[i] = dist(rng);
            w[i] = dist(rng);
        }
        b = dist(rng);
    }

    // Forward transform: returns z_new and adds log|det J| to log_det_J
    vector<double> forward(const vector<double> &z, double &log_det_J) const {
        double a = dot(w, z) + b;    // pre-activation
        double h = tanh(a);          // nonlinearity
        double hprime = 1.0 - h * h; // derivative of tanh

        // psi(z) = h'(a) * w
        vector<double> psi(w.size());
        for (size_t i = 0; i < w.size(); ++i) psi[i] = hprime * w[i];

        // Matrix determinant lemma: det(I + u psi^T) = 1 + u^T psi
        double det_term = 1.0 + dot(u, psi);
        // Numerical safety: avoid log(<= 0)
        log_det_J += log(fabs(det_term) + 1e-8);

        // z_new = z + u * h
        vector<double> z_new = z;
        for (size_t i = 0; i < z.size(); ++i) z_new[i] += u[i] * h;
        return z_new;
    }
};

// Sample z0 ~ N(0, I)
vector<double> sample_base(size_t D, std::mt19937 &rng) {
    vector<double> z0(D);
    for (size_t i = 0; i < D; ++i) z0[i] = randn(rng);
    return z0;
}

// Log-density of standard Gaussian N(0, I)
double log_std_normal(const vector<double> &z) {
    double quad = 0.0;
    for (double v : z) quad += v * v;
    double D = static_cast<double>(z.size());
    return -0.5 * (D * log(2.0 * M_PI) + quad);
}

// Apply a sequence of planar flows to a base sample, returning (zK, log q(zK)).
// Note: z0 must be retained, since log q(zK) = log p0(z0) - sum_k log|det J_k|.
pair<vector<double>, double> sample_and_log_q_planar(
    const vector<PlanarFlow> &flows, size_t D, std::mt19937 &rng) {

    vector<double> z0 = sample_base(D, rng);
    vector<double> z = z0;
    double log_det_sum = 0.0;

    // Push forward through all flows
    for (const auto &f : flows) z = f.forward(z, log_det_sum);

    double logq = log_std_normal(z0) - log_det_sum;
    return {z, logq};
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    std::random_device rd;
    std::mt19937 rng(rd());

    size_t D = 4, K = 5; // 5 planar layers in 4D
    vector<PlanarFlow> flows;
    flows.reserve(K);
    for (size_t k = 0; k < K; ++k) flows.emplace_back(D, rng);

    // Draw a sample and compute its log-density under the flow q
    auto [zK, logq] = sample_and_log_q_planar(flows, D, rng);

    cout << fixed << setprecision(4);
    cout << "zK: ";
    for (double v : zK) cout << v << ' ';
    cout << "\nlog q(zK): " << logq << "\n";

    return 0;
}

This program defines a planar flow layer f(z) = z + u tanh(w^T z + b). Using the matrix determinant lemma, the log-determinant reduces to log|1 + u^T ψ(z)| with ψ(z) = h'(a) w. We sample from a standard Gaussian base, push the sample forward through K planar layers, and compute the exact log-density using the base log-density minus the accumulated log-determinants. The key point is retaining z0 to compute log p0(z0).

Time: O(KD) per sample for K layers and D dimensions. Space: O(D) working memory; O(KD) if storing all intermediates for backpropagation.
RealNVP coupling flow and Monte Carlo ELBO for a toy Gaussian likelihood
#include <bits/stdc++.h>
using namespace std;

// ---------- Random utils ----------
struct RNG {
    std::mt19937 gen;
    RNG() : gen(std::random_device{}()) {}
    double randn() {
        static normal_distribution<double> nd(0.0, 1.0);
        return nd(gen);
    }
};

// ---------- Linear algebra helpers (naive) ----------
vector<double> matvec(const vector<vector<double>>& W, const vector<double>& x) {
    size_t R = W.size(), C = x.size();
    vector<double> y(R, 0.0);
    for (size_t i = 0; i < R; ++i) {
        double s = 0.0;
        for (size_t j = 0; j < C; ++j) s += W[i][j] * x[j];
        y[i] = s;
    }
    return y;
}

vector<double> add(const vector<double>& a, const vector<double>& b) {
    vector<double> c(a.size());
    for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

vector<double> tanh_vec(const vector<double>& a) {
    vector<double> y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = tanh(a[i]);
    return y;
}

// Elementwise operations
vector<double> exp_vec(const vector<double>& a) {
    vector<double> y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = exp(a[i]);
    return y;
}

vector<double> hadamard(const vector<double>& a, const vector<double>& b) {
    vector<double> y(a.size());
    for (size_t i = 0; i < a.size(); ++i) y[i] = a[i] * b[i];
    return y;
}

// ---------- RealNVP layer ----------
// Partition: first D/2 dims are A, remaining are B (mask alternates across layers)
struct RealNVPLayer {
    size_t D, Dh; // D total, D/2 per partition
    bool flip;    // if true, swap A and B roles
    // Simple linear functions for s(.) and t(.) for demonstration purposes
    vector<vector<double>> W_s, W_t; // (Dh x Dh)
    vector<double> b_s, b_t;         // (Dh)

    RealNVPLayer(size_t D_, bool flip_, RNG &rng)
        : D(D_), Dh(D_ / 2), flip(flip_),
          W_s(Dh, vector<double>(Dh)), W_t(Dh, vector<double>(Dh)),
          b_s(Dh, 0.0), b_t(Dh, 0.0) {
        normal_distribution<double> nd(0.0, 0.2);
        for (size_t i = 0; i < Dh; ++i)
            for (size_t j = 0; j < Dh; ++j) {
                W_s[i][j] = nd(rng.gen) * 0.1; // keep small for stability
                W_t[i][j] = nd(rng.gen) * 0.1;
            }
    }

    // Forward pass: returns y and adds log|det J| to log_det_J
    vector<double> forward(const vector<double>& x, double &log_det_J) const {
        vector<double> y(D);
        // Split x into A and B halves
        vector<double> xA(Dh), xB(Dh);
        if (!flip) {
            for (size_t i = 0; i < Dh; ++i) xA[i] = x[i];
            for (size_t i = 0; i < Dh; ++i) xB[i] = x[Dh + i];
        } else {
            for (size_t i = 0; i < Dh; ++i) xA[i] = x[Dh + i];
            for (size_t i = 0; i < Dh; ++i) xB[i] = x[i];
        }
        // s = tanh(W_s xA + b_s) keeps scales bounded; t = W_t xA + b_t
        vector<double> s = tanh_vec(add(matvec(W_s, xA), b_s));
        vector<double> t = add(matvec(W_t, xA), b_t);
        vector<double> yB = add(hadamard(xB, exp_vec(s)), t);

        // Accumulate log-det: triangular Jacobian gives sum of s
        for (double si : s) log_det_J += si;

        // Merge halves back
        if (!flip) {
            for (size_t i = 0; i < Dh; ++i) y[i] = xA[i];
            for (size_t i = 0; i < Dh; ++i) y[Dh + i] = yB[i];
        } else {
            for (size_t i = 0; i < Dh; ++i) y[i] = yB[i];
            for (size_t i = 0; i < Dh; ++i) y[Dh + i] = xA[i];
        }
        return y;
    }
};

// Base sampling and log-density (standard normal)
vector<double> sample_base(size_t D, RNG &rng) {
    vector<double> z(D);
    for (size_t i = 0; i < D; ++i) z[i] = rng.randn();
    return z;
}

double log_std_normal(const vector<double>& z) {
    double quad = 0.0;
    for (double v : z) quad += v * v;
    double D = static_cast<double>(z.size());
    return -0.5 * (D * log(2.0 * M_PI) + quad);
}

// Flow sampling: return (zK, log q(zK)); z0 is retained for log p0(z0)
pair<vector<double>, double> sample_and_log_q_realnvp(
    const vector<RealNVPLayer>& layers, size_t D, RNG &rng) {
    vector<double> z0 = sample_base(D, rng);
    vector<double> z = z0;
    double log_det_sum = 0.0;
    for (const auto &layer : layers) z = layer.forward(z, log_det_sum);
    double logq = log_std_normal(z0) - log_det_sum;
    return {z, logq};
}

// Toy model: p(z) = N(0, I), p(x|z) = N(W z, sigma^2 I)
// log p(x|z) = -0.5 [ M log(2 pi sigma^2) + ||x - W z||^2 / sigma^2 ]
double log_likelihood_gaussian(const vector<double>& x, const vector<vector<double>>& W,
                               const vector<double>& z, double sigma) {
    vector<double> mu = matvec(W, z);
    double quad = 0.0;
    for (size_t i = 0; i < x.size(); ++i) { double d = x[i] - mu[i]; quad += d * d; }
    return -0.5 * (x.size() * log(2.0 * M_PI * sigma * sigma) + quad / (sigma * sigma));
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    RNG rng;
    const size_t D = 4; // latent dim (even)
    const size_t K = 4; // RealNVP layers

    // Build alternating-mask RealNVP stack
    vector<RealNVPLayer> layers;
    layers.reserve(K);
    for (size_t k = 0; k < K; ++k) layers.emplace_back(D, (k % 2 == 1), rng);

    // Toy data and likelihood parameters
    const size_t M = 3; // observation dim
    vector<vector<double>> W(M, vector<double>(D, 0.0));
    // Simple W: first M dims pick from z
    for (size_t i = 0; i < M; ++i) if (i < D) W[i][i] = 1.0;
    vector<double> x = {0.5, -1.2, 0.3};
    double sigma = 0.5;

    // Monte Carlo ELBO estimate with S samples
    size_t S = 1000;
    double elbo_sum = 0.0;
    for (size_t s = 0; s < S; ++s) {
        auto [z, logq] = sample_and_log_q_realnvp(layers, D, rng);
        double logp_z = log_std_normal(z); // prior
        double logp_x_given_z = log_likelihood_gaussian(x, W, z, sigma);
        elbo_sum += (logp_x_given_z + logp_z - logq);
    }
    double elbo_mc = elbo_sum / static_cast<double>(S);

    cout << fixed << setprecision(6);
    cout << "Monte Carlo ELBO estimate: " << elbo_mc << "\n";

    return 0;
}

This example implements a lightweight RealNVP with alternating affine coupling layers. Scale s(.) and shift t(.) are simple linear functions (with tanh for s to keep it bounded). The log-determinant is the sum of s’s outputs, thanks to a triangular Jacobian. We then compute a Monte Carlo ELBO for a toy model with a standard normal prior and Gaussian likelihood p(x|z) = N(Wz, σ²I). Each ELBO term uses exact log-densities: log p(x|z), log p(z), and log q(z) from the flow.

Time: O(SKD) for S samples, K layers, and D dimensions (excluding the small linear s/t cost). Space: O(D) for activations; parameters are O(KD²) here due to the naive linear s/t, though practical implementations use neural nets with their own parameter counts.
#normalizing flows#variational inference#elbo#change of variables#jacobian determinant#planar flow#realnvp#reparameterization#affine coupling#bayesian inference#density estimation#monte carlo#pathwise gradients#gaussian likelihood#invertible transforms