Equivariance & Invariance
Key Points
- Equivariance means that applying a transformation before a function is the same as applying a corresponding transformation after the function.
- Invariance is a special case of equivariance where the output does not change under the group’s transformations.
- Convolution is translation-equivariant; shifting the input shifts the output by the same amount.
- Pooling operations like global sum or max are permutation-invariant; reordering elements does not change the result.
- For sets, every linear permutation-equivariant layer has the form y_i = A x_i + B sum_j x_j, which commutes with any reordering.
- Equivariance encodes symmetry so models generalize better with fewer parameters and less data.
- Boundary conditions and discretization (e.g., padding choice) can break exact equivariance in practice.
- Composing equivariant layers with invariant pooling yields invariant models, a common design in CNNs, GNNs, and DeepSets.
Prerequisites
- Basics of group theory — Understanding groups and actions is essential to define and reason about equivariance and invariance.
- Linear algebra — Matrix representations of group actions, convolution as a linear operation, and permutation matrices rely on linear algebra.
- Discrete signals and convolution — Translation equivariance is most easily seen through convolution on discrete signals.
- Modular arithmetic — Circular shifts and periodic boundary conditions use modulo indexing.
- Vector norms and numerical stability — Testing equivariance numerically requires tolerance-based comparisons and awareness of floating-point error.
- Graphs and sets representation — Permutation equivariance/invariance is defined over sets and graphs, requiring row-wise data handling.
Detailed Explanation
01 Overview
Equivariance and invariance are principles that capture how functions should behave when inputs are transformed by symmetries—like shifts, rotations, or permutations. Intuitively, a function is equivariant if transforming the input leads to a predictable transformation of the output. For example, if you shift an image to the right and then run a convolution, the output also shifts right by the same amount. Invariance is the special case where the output does not change at all under the transformation; for instance, a classifier’s label for an object should not depend on where the object appears in an image. These ideas arise from group theory: a group represents a set of transformations that can be combined (like doing two shifts in a row) and have an identity (doing nothing). Designing algorithms and models that respect these symmetries leads to better generalization and efficiency, because we build the symmetry knowledge directly into the computation instead of relying on data to teach it. In practice, this shows up in convolutional neural networks (translation equivariance), graph neural networks (permutation equivariance), and physics-inspired models (rotation/SE(3) equivariance).
02 Intuition & Analogies
Imagine you’re reading text on a sliding whiteboard. If someone slides the board to the right, the sentence is still the same—just shifted. A tool like a highlighter that marks nouns should mark the same words no matter where the board sits. If the highlighting also shifts with the text, the tool is equivariant to shifts. If the tool simply counts how many nouns appear and returns a number that doesn’t move with the board, it’s invariant to shifts. Another analogy: think of a bag of marbles. The order in which you pull marbles out is irrelevant to the bag’s total weight. Any computation that just depends on the collection (not the order) is permutation-invariant. If you label each marble with a score and then reorder the marbles, the list of scores should reorder the same way—this is permutation equivariance. In image processing, convolution is like sliding a stencil over an image to detect patterns. If you slide the image, the detected pattern map slides the same way—so convolution is translation-equivariant. Finally, consider renaming nodes in a social network: a meaningful analysis should not depend on the node IDs. A graph algorithm that simply relabels its outputs when you relabel inputs is permutation-equivariant; if it produces a single graph-level summary that is unaffected by relabeling, it’s permutation-invariant. These intuitions guide us to design computations that respect the problem’s inherent symmetries.
03 Formal Definition
Let a group G act on the input space via transformations T_g and on the output space via transformations T'_g. A function f is equivariant if f(T_g x) = T'_g f(x) for all g in G and all inputs x. It is invariant in the special case where every T'_g is the identity, i.e., f(T_g x) = f(x). Concretely, for circular shifts S_s on periodic signals, translation equivariance of convolution reads (S_s x) * k = S_s(x * k); for permutation matrices P acting on sets of n elements, permutation equivariance reads f(PX) = P f(X), and permutation invariance reads f(PX) = f(X).
04 When to Use
Use equivariance when the task’s outputs should transform in lockstep with the inputs under known symmetries. Examples: feature maps in CNNs (translation symmetry), node features in GNNs under node permutations, vector fields under rotations in physics simulations, and molecular force prediction under SE(3) (rotation/translation) symmetry. Use invariance when the final output should not depend on the transformation: image classification that ignores object location (translation invariance via global pooling), graph classification that ignores node labels (permutation invariance via pooling over nodes), or set regression where order does not matter. In practice, you often stack equivariant layers (to propagate local structure consistently across the domain) and then apply an invariant readout (like sum/mean/max pooling) to produce a stable global summary. Choose the group carefully: circular padding enables exact shift equivariance on periodic signals; reflection or rotation groups can be used for crystals or microscopy; permutation groups are natural for sets and graphs. If your data has approximate symmetries (e.g., small perspective distortions), equivariance can still help, but you may need data augmentation or relaxed architectures.
⚠️Common Mistakes
1. Confusing equivariance with invariance: equivariant outputs move with inputs; invariant outputs do not change. Be explicit about T_g and T'_g.
2. Breaking equivariance with boundary handling: zero padding in convolution is not strictly translation-equivariant; circular padding is.
3. Ignoring representation choices: if the group action on outputs is wrong (e.g., using the identity when outputs should rotate), the equivariance claim fails.
4. Assuming any nonlinearity preserves equivariance: pointwise nonlinearities do preserve many equivariances (same action on each coordinate), but operations that couple coordinates in an action-dependent way may break it.
5. Mishandling permutations: forgetting to permute both features and adjacency in graphs ruins permutation equivariance.
6. Over-constraining models: forcing invariance too early (e.g., pooling at the input) can discard useful information needed to solve the task.
7. Floating-point expectations: numerical tests of equivariance should allow small tolerances due to rounding.
8. Using the wrong group: approximate or missing symmetries can mislead model design; validate that the symmetry holds for your data and objective.
Key Formulas
Equivariance Definition
f(T_g x) = T'_g f(x) for all g in G
Explanation: Applying the group action to the input and then f equals applying f and then the corresponding action on the output. This captures how outputs should transform predictably with inputs.
Invariance Definition
f(T_g x) = f(x) for all g in G
Explanation: The function f is unchanged under the group action on inputs. This is the special case of equivariance where the output action is the identity.
Group Action Axioms
T_e x = x and T_g(T_h x) = T_{gh} x
Explanation: These properties ensure the transformations form a consistent action of the group on a space. The identity does nothing, and composition matches group multiplication.
Discrete Convolution
(x * k)[t] = sum_j k[j] x[t - j]
Explanation: Convolution at time t sums products of input x with a reversed, shifted kernel k. On finite signals we restrict the sum and handle boundaries by padding.
Translation Equivariance of Convolution
(S_s x) * k = S_s(x * k), where (S_s x)[t] = x[t - s]
Explanation: Shifting the input by s and then convolving equals convolving first and then shifting by s. This explains why CNN feature maps move with the input.
Permutation Equivariance
f(PX) = P f(X) for every n x n permutation matrix P
Explanation: For a set or graph with n elements, reordering the elements by permutation P reorders the outputs in the same way. This is essential for set-/graph-structured data.
General Linear Permutation-Equivariant Map
f(X)_i = A x_i + B sum_j x_j
Explanation: Every linear, permutation-equivariant map on element features has this form. The shared A acts locally and B couples to the global sum, ensuring commutation with any permutation.
DeepSets Invariant Form
f(X) = ρ( sum_i φ(x_i) )
Explanation: A universal form for permutation-invariant functions on sets. First map each element by φ, sum them, and then apply ρ to obtain an order-agnostic result.
Convolution Theorem
F(x * k) = F(x) · F(k) (pointwise product of DFTs)
Explanation: In the frequency domain, convolution becomes pointwise multiplication. This diagonalization explains why convolution commutes with translations.
Invariant Readout from Equivariant Features
h(x) = pool(f(x)); if f is equivariant and pool is invariant, then h(T_g x) = h(x)
Explanation: Composing an equivariant feature extractor with an invariant pooling yields an invariant overall function. This is the standard CNN/GNN design pattern.
Complexity Analysis
Circular 1D convolution of a length-n signal with a length-m kernel costs O(n·m) time and O(n) extra space (O(n log n) via the FFT and the convolution theorem). The linear permutation-equivariant layer y_i = A x_i + B sum_j x_j costs O(n·d²) for n elements with d features: one O(n·d) pass for the global sum plus two d × d matrix-vector products per element. Invariant readouts (sum/mean/max over n elements) cost O(n·d). Numerically checking equivariance adds only a second forward pass plus an O(n) comparison.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Circular index helper: ensures (i mod n) is in [0, n-1]
static inline int mod_idx(int i, int n) {
    int r = i % n;
    return r < 0 ? r + n : r;
}

// Compute circular 1D convolution y = x * k with length(x) = n, length(k) = m
// y[t] = sum_j k[j] * x[(t - j) mod n]
vector<double> circular_convolve(const vector<double>& x, const vector<double>& k) {
    int n = (int)x.size();
    int m = (int)k.size();
    vector<double> y(n, 0.0);
    for (int t = 0; t < n; ++t) {
        double acc = 0.0;
        for (int j = 0; j < m; ++j) {
            int idx = mod_idx(t - j, n);
            acc += k[j] * x[idx];
        }
        y[t] = acc;
    }
    return y;
}

// Circularly shift a signal by s: (S_s x)[t] = x[(t - s) mod n]
vector<double> circular_shift(const vector<double>& x, int s) {
    int n = (int)x.size();
    vector<double> y(n);
    for (int t = 0; t < n; ++t) {
        y[t] = x[mod_idx(t - s, n)];
    }
    return y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Create a random signal and kernel
    int n = 128; // signal length
    int m = 9;   // kernel length (small for locality)
    int s = 17;  // shift amount

    std::mt19937 rng(42);
    std::normal_distribution<double> dist(0.0, 1.0);

    vector<double> x(n), k(m);
    for (int i = 0; i < n; ++i) x[i] = dist(rng);
    for (int j = 0; j < m; ++j) k[j] = dist(rng);

    // Compute y = x * k
    vector<double> y = circular_convolve(x, k);

    // Compute shifted signal and its convolution
    vector<double> x_shift = circular_shift(x, s);
    vector<double> y_from_shifted = circular_convolve(x_shift, k);

    // Shift the original convolution output
    vector<double> y_shift = circular_shift(y, s);

    // Compare: translation equivariance means y_from_shifted == y_shift
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i) {
        max_abs_diff = max(max_abs_diff, fabs(y_from_shifted[i] - y_shift[i]));
    }

    cout << fixed << setprecision(6);
    cout << "Max |(S_s x)*k - S_s(x*k)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) {
        cout << "Equivariance verified (up to numerical precision).\n";
    } else {
        cout << "Equivariance broken (likely by implementation or precision).\n";
    }

    return 0;
}
```
This program implements circular 1D convolution and a circular shift. It demonstrates translation equivariance by showing that convolving a shifted input equals shifting the convolved output. Circular padding ensures exact equivariance on a periodic domain. The maximum absolute difference should be near machine precision.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Multiply a d x d matrix M by a d-vector v
vector<double> matvec(const vector<vector<double>>& M, const vector<double>& v) {
    int d = (int)v.size();
    vector<double> out(d, 0.0);
    for (int i = 0; i < d; ++i) {
        double acc = 0.0;
        for (int j = 0; j < d; ++j) acc += M[i][j] * v[j];
        out[i] = acc;
    }
    return out;
}

// Add two d-vectors
vector<double> add_vec(const vector<double>& a, const vector<double>& b) {
    int d = (int)a.size();
    vector<double> out(d);
    for (int i = 0; i < d; ++i) out[i] = a[i] + b[i];
    return out;
}

// Sum over n elements of d-vectors
vector<double> sum_rows(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> s(d, 0.0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            s[j] += X[i][j];
    return s;
}

// Permute rows of X by permutation p: Y[i] = X[p[i]]
vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

// Equivariant layer: y_i = A x_i + B * sum_j x_j
vector<vector<double>> pe_layer(const vector<vector<double>>& X,
                                const vector<vector<double>>& A,
                                const vector<vector<double>>& B) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    vector<double> S = sum_rows(X); // global sum (shared context)
    vector<vector<double>> Y(n, vector<double>(d, 0.0));
    vector<double> BS = matvec(B, S);
    for (int i = 0; i < n; ++i) {
        vector<double> Ax = matvec(A, X[i]);
        for (int j = 0; j < d; ++j) Y[i][j] = Ax[j] + BS[j];
    }
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 8; // number of elements in the set
    int d = 4; // feature dimension per element

    std::mt19937 rng(123);
    std::normal_distribution<double> dist(0.0, 1.0);

    // Random set X (n x d)
    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random parameters A, B (d x d)
    vector<vector<double>> A(d, vector<double>(d)), B(d, vector<double>(d));
    for (int i = 0; i < d; ++i)
        for (int j = 0; j < d; ++j) {
            A[i][j] = dist(rng) * 0.5;
            B[i][j] = dist(rng) * 0.5;
        }

    // Random permutation p
    vector<int> p(n);
    iota(p.begin(), p.end(), 0);
    shuffle(p.begin(), p.end(), rng);

    // Forward on original X
    auto Y1 = pe_layer(X, A, B);
    // Permute inputs and forward
    auto Xp = permute(X, p);
    auto Y2 = pe_layer(Xp, A, B);
    // Permute Y1 to compare: P Y1
    auto PY1 = permute(Y1, p);

    // Compute max absolute difference between Y2 and P Y1
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            max_abs_diff = max(max_abs_diff, fabs(Y2[i][j] - PY1[i][j]));

    cout << fixed << setprecision(6);
    cout << "Max |f(PX) - P f(X)| = " << max_abs_diff << "\n";
    if (max_abs_diff < 1e-9) cout << "Permutation equivariance verified.\n";
    else cout << "Equivariance broken (check implementation).\n";

    return 0;
}
```
This code builds a linear permutation-equivariant layer for sets: y_i = A x_i + B sum_j x_j. It verifies f(PX) = P f(X) for a random permutation P by comparing outputs. The global sum couples elements identically, ensuring commutation with any reordering.
```cpp
#include <bits/stdc++.h>
using namespace std;

struct Aggregates {
    vector<double> sum, mean, mx;
};

Aggregates aggregate(const vector<vector<double>>& X) {
    int n = (int)X.size();
    int d = (int)X[0].size();
    Aggregates aggr;
    aggr.sum.assign(d, 0.0);
    aggr.mx.assign(d, -numeric_limits<double>::infinity());

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < d; ++j) {
            aggr.sum[j] += X[i][j];
            aggr.mx[j] = max(aggr.mx[j], X[i][j]);
        }
    }
    aggr.mean = aggr.sum;
    for (int j = 0; j < d; ++j) aggr.mean[j] /= max(1, (int)X.size());
    return aggr;
}

vector<vector<double>> permute(const vector<vector<double>>& X, const vector<int>& p) {
    int n = (int)X.size();
    vector<vector<double>> Y(n, vector<double>(X[0].size()));
    for (int i = 0; i < n; ++i) Y[i] = X[p[i]];
    return Y;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n = 10, d = 3;
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);

    vector<vector<double>> X(n, vector<double>(d));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            X[i][j] = dist(rng);

    // Random permutation
    vector<int> p(n); iota(p.begin(), p.end(), 0); shuffle(p.begin(), p.end(), rng);

    auto A1 = aggregate(X);
    auto A2 = aggregate(permute(X, p));

    auto cmp = [](const vector<double>& a, const vector<double>& b){
        double m = 0.0;
        for (size_t i = 0; i < a.size(); ++i) m = max(m, fabs(a[i] - b[i]));
        return m;
    };

    cout << fixed << setprecision(6);
    cout << "Max |sum - sum(PX)| = " << cmp(A1.sum, A2.sum) << "\n";
    cout << "Max |mean - mean(PX)| = " << cmp(A1.mean, A2.mean) << "\n";
    cout << "Max |max - max(PX)| = " << cmp(A1.mx, A2.mx) << "\n";

    return 0;
}
```
This example computes sum, mean, and max over element features and shows these aggregations are invariant to permutations. Such invariant readouts are used in DeepSets and GNNs to produce order-independent graph- or set-level summaries.