
E(n) Equivariant Neural Networks

Key Points

  • E(n)-equivariant neural networks are models whose outputs transform predictably when inputs are rotated, translated, or reflected in n-dimensional Euclidean space.
  • They enforce physics-like symmetries by building layers from relative vectors and distance-based scalars, which guarantees correct transformation behavior.
  • Equivariance is stronger than invariance: invariant outputs stay the same under transformations, while equivariant outputs change in a controlled, known way.
  • Common E(n)-equivariant designs include EGNN-style message passing layers and force fields that depend only on pairwise distances.
  • Using only relative positions (x_i − x_j) and distance scalars ‖x_i − x_j‖ preserves translation and rotation/reflection symmetries.
  • These models shine on molecules, point clouds, robotics, and physical simulations where geometry matters.
  • Mistakes such as injecting absolute coordinates, adding arbitrary biases to vector channels, or mixing scalars with vectors break equivariance.
  • You can test equivariance numerically by applying a random rotation/translation to inputs and checking that outputs rotate/translate in the same way.

Prerequisites

  • →Linear algebra (vectors, matrices, orthogonality) — Rotations/reflections are represented by orthogonal matrices and vectors transform linearly.
  • →Basic group theory — Equivariance and invariance are defined using group actions and representations.
  • →Neural networks (MLPs, activations) — Equivariant layers are built from standard neural components with geometric constraints.
  • →Graph neural networks — Point-cloud equivariant models commonly use message passing on k-NN graphs.
  • →Floating-point arithmetic and numerical stability — Equivariance tests require tolerances, and stable implementations avoid numerical artifacts.
  • →Euclidean geometry and distances — Using invariants like distances and relative positions is central to E(n) symmetry.

Detailed Explanation


01Overview

Imagine a function that understands shapes the way we do: if you rotate a 3D molecule or slide a 2D point cloud, the meaning shouldn't change unpredictably. E(n)-equivariant neural networks (where E(n) is the Euclidean group in n dimensions) are built so their outputs transform in a predictable, geometrically correct way under rotations, translations, and reflections. Instead of learning this behavior from data (which is sample-inefficient), the symmetry is enforced directly by design. This typically happens by computing with relative positions (differences between points) and invariant scalars (like distances), and by carefully defining how outputs should transform (as scalars, vectors, or higher-order tensors). The payoff is better generalization, fewer training samples, and outputs that respect physical laws and geometric constraints.

Hook: If you rotate a map, roads are still connected; your GPS should not get confused.
Concept: An E(n)-equivariant model changes its internal vector outputs by exactly the same rotation when you rotate the inputs, and it stays unchanged by translations when you translate the inputs.
Example: A force-prediction model for atoms can be built so that if you rotate the molecule, all predicted forces rotate by the same amount, guaranteeing consistent physics.

In practice, this is done using message passing over a graph of points, where messages depend on distances and relative directions, and updates are arranged to preserve the symmetry.

02Intuition & Analogies

Think of Lego structures built from connectors: only the distances between studs and how pieces plug together matter, not where the structure sits on your desk or how it's oriented. If you pick the whole build up and rotate it, all the internal relations stay the same. E(n)-equivariant networks imitate this idea by focusing on relationships (relative vectors and distances) rather than absolute coordinates. Translations disappear when you subtract positions (x_i − x_j). Rotations act as simple rotations of every vector quantity, while scalars like distances stay unchanged.

Another analogy: consider a compass and a ruler. The ruler measures distance (invariant under rotation/translation), and the compass arrow points north (a vector that rotates if you turn the map). A good geometric model understands which things are "ruler-like" (scalars) and which are "compass-like" (vectors). E(n)-equivariant layers keep scalars as scalars and ensure vectors transform like vectors under any rotation or reflection. If you reflect a drawing in a mirror, a proper vector arrow flips direction accordingly; equivariant models guarantee the same for their vector outputs.

Finally, think of consistent instructions: "Walk toward your friend by the distance between you." If you both rotate the whole scene, the instruction remains valid and results rotate consistently. Message-passing layers built on distances and relative directions are like these instructions — they produce outputs that move exactly as the inputs are moved.

03Formal Definition

Let G = E(n) be the Euclidean group of rigid motions in ℝⁿ: all transformations g(x) = Rx + t with R ∈ O(n) (orthogonal matrices, allowing rotations and reflections) and t ∈ ℝⁿ. A function f between spaces with group actions is equivariant if applying g before f equals applying a corresponding action after f. Formally, given representations ρ_X: G → Aut(X) and ρ_Y: G → Aut(Y), f is G-equivariant if f(ρ_X(g) x) = ρ_Y(g) f(x) for all g ∈ G and x ∈ X. In E(n)-equivariant neural nets for point clouds, inputs are point sets X = {x_i}_{i=1}^N ⊂ ℝⁿ with optional features h_i. The standard action of g ∈ E(n) on coordinates is g · x_i = R x_i + t; scalar features remain unchanged (ρ(h_i) = h_i), vector features are multiplied by R, and higher-order tensors transform via appropriate tensor representations. Equivariant layers are constructed from invariants like ‖x_i − x_j‖ and covariants like (x_i − x_j). By ensuring that scalar channels depend only on invariants and that vector channels multiply relative vectors by scalar functions of invariants, the layer satisfies f(g · X) = g · f(X).

04When to Use

  • Molecular modeling: Predict energies (invariant) and forces (equivariant vectors) for atoms in 3D without data augmentation over orientations.
  • Point-cloud perception: Classify objects (invariant label) or predict normals and flow fields (equivariant vectors) in robotics and 3D vision.
  • Physics simulation: Learn interaction laws where outputs (like accelerations or forces) must rotate consistently with the scene and be insensitive to global translation.
  • Robotics and SLAM: Build models that don’t change their belief just because the robot turned its head or moved the camera; vector outputs (e.g., directions) should rotate with the robot.
  • Medical imaging and materials science: Analyze structures where orientation is arbitrary, but geometric relations are essential.

Use E(n)-equivariant networks when geometry and rigid motions are part of the problem definition and you want sample-efficient, symmetry-respecting predictions. Prefer simpler invariant heads (pooling) when you only need an orientation-independent scalar. Use full equivariance when predicting geometric quantities (vectors/tensors) or when downstream tasks require consistent directional behavior.

⚠️Common Mistakes

  • Confusing invariance with equivariance: Invariance means f(gx) = f(x); equivariance means f(gx) = g f(x). Predicting a vector field requires equivariance, not invariance.
  • Using absolute coordinates: Injecting raw x_i into scalar networks breaks translation equivariance; always use relative positions (x_i − x_j) and invariant scalars ‖x_i − x_j‖.
  • Mixing scalars and vectors incorrectly: Adding a learned bias to vector channels or combining them with scalars without geometry-aware rules destroys equivariance. Vector updates must be linear in relative vectors with scalar coefficients that depend only on invariants.
  • Ignoring reflections: Many tasks require O(n) (rotations + reflections), not just SO(n). Using cross products or pseudoscalar constructions can flip incorrectly under reflections.
  • Normalization layers and biases: Applying per-dimension biases or anisotropic normalization to vector channels breaks symmetry; prefer scalar-only normalization or channel-wise isotropic operations.
  • Floating-point tests: Numerical checks of equivariance will show tiny nonzero errors due to finite precision; use tolerances, not exact equality.
  • Overly dense graphs: Full O(N^2) connectivity grows quickly; use k-NN neighborhoods to keep computation tractable while preserving locality.

Key Formulas

Equivariance condition

f(ρ_X(g) x) = ρ_Y(g) f(x)

Explanation: A function f is equivariant if transforming the input by the group action ρ_X and then applying f equals applying f and then transforming the output by ρ_Y. This is the core requirement for E(n)-equivariant networks.

Structure of the Euclidean group

E(n) = ℝⁿ ⋊ O(n)

Explanation: The Euclidean group combines translations ℝⁿ and orthogonal transformations O(n) via a semidirect product. It formalizes that rigid motions are rotations/reflections followed by translations.

Group action on coordinates

g · x = R x + t,   R ∈ O(n), t ∈ ℝⁿ

Explanation: A Euclidean transformation maps coordinates by rotation/reflection and translation. This is how inputs are transformed in E(n)-equivariant models.

Distance invariance

‖R(x_i − x_j)‖² = ‖x_i − x_j‖²

Explanation: Squared distances between points are unchanged by orthogonal transformations. This justifies using distances as invariant inputs to scalar functions.

Feature update (EGNN-style)

h_i′ = ϕ_h(h_i, Σ_{j∈N(i)} w_ij h_j),   w_ij = ϕ_e(h_i, h_j, ‖x_i − x_j‖²)

Explanation: Node features are updated by aggregating neighbor features scaled by scalar weights that depend only on invariants. This preserves E(n) symmetry for scalar features.

Coordinate update (EGNN-style)

x_i′ = x_i + Σ_{j∈N(i)} (x_i − x_j) s_ij,   s_ij = ϕ_x(h_i, h_j, ‖x_i − x_j‖²)

Explanation: Coordinates are updated by linear combinations of relative vectors with scalar coefficients depending on invariants. This guarantees E(n)-equivariance of vector outputs.

Invariant pooling

y = Pool_i ψ(h_i)   (invariant head; Pool is e.g. sum or mean)

Explanation: Graph-level scalar predictions can be made invariant by pooling node-wise scalar features. Pooling commutes with E(n) actions because features are scalars.

Equivariant force field

F_i = Σ_{j≠i} k(‖x_i − x_j‖²)(x_i − x_j)

Explanation: A force constructed from relative vectors scaled by distance-based scalars is E(n)-equivariant. Under rigid motions, forces rotate/reflect exactly like coordinates.

EGNN layer complexity

O(Nk(F² + dF))

Explanation: For N nodes, k neighbors, feature size F, and dimension d, the main costs are computing scalar weights and feature MLPs per edge and aggregating vectors. This is the typical time complexity of one forward pass.

Numerical equivariance test

‖f(g · X) − ρ_Y(g) f(X)‖ ≤ ε

Explanation: In floating-point arithmetic, we test equivariance up to a small tolerance ε. This accounts for rounding errors and implementation noise.

Complexity Analysis

Consider an EGNN-style layer over a graph of N nodes in dimension d with k neighbors per node and scalar feature width F. For each edge (i, j), we compute invariants (e.g., squared distance), pass them with features through small MLPs to get scalar weights, and aggregate messages. The per-edge cost includes O(F²) multiplications for feature mixing (when ϕ_h is a two-layer MLP or a linear map) and O(d) work to accumulate coordinate updates via relative vectors. Summed over all O(Nk) edges, the time complexity is O(Nk(F² + dF)), dominated by feature transformations and vector aggregations; with full connectivity k ≈ N, this becomes O(N²(F² + dF)). Neighbor search (e.g., k-NN) can add O(N log N) to O(Nk) preprocessing depending on the method, but is often amortized or approximated.

Space complexity consists of storing features O(NF), coordinates O(Nd), and adjacency O(Nk). Temporary buffers for messages add O(NkF) if explicitly materialized; streaming aggregations can reduce peak memory to O(NF + Nd + Nk). Parameter memory is small compared to activations when using shallow MLPs per edge. For the force-field example that uses only distance-based scalars and relative vectors, time is O(Nkd) (or O(N²d) if dense) and space is O(Nd + Nk). In practice, choosing modest k (e.g., 16–64) balances accuracy and efficiency, and batching multiple graphs scales memory linearly with batch size.

Code Examples

EGNN-style layer with equivariance test (3D)
#include <bits/stdc++.h>
using namespace std;

// Simple utilities for vectors/matrices in 3D
struct Vec {
    double x, y, z;
};

Vec operator+(const Vec& a, const Vec& b){ return {a.x+b.x, a.y+b.y, a.z+b.z}; }
Vec operator-(const Vec& a, const Vec& b){ return {a.x-b.x, a.y-b.y, a.z-b.z}; }
Vec operator*(const Vec& a, double s){ return {a.x*s, a.y*s, a.z*s}; }
Vec& operator+=(Vec& a, const Vec& b){ a.x+=b.x; a.y+=b.y; a.z+=b.z; return a; }

double dot(const Vec& a, const Vec& b){ return a.x*b.x + a.y*b.y + a.z*b.z; }
double norm2(const Vec& a){ return dot(a,a); }

// 3x3 matrix for rotations/reflections
struct Mat3 {
    double m[3][3];
};

Vec mul(const Mat3& R, const Vec& v){
    return {
        R.m[0][0]*v.x + R.m[0][1]*v.y + R.m[0][2]*v.z,
        R.m[1][0]*v.x + R.m[1][1]*v.y + R.m[1][2]*v.z,
        R.m[2][0]*v.x + R.m[2][1]*v.y + R.m[2][2]*v.z
    };
}

Mat3 gram_schmidt_orthogonal(mt19937& rng, bool allow_reflection=true){
    normal_distribution<double> nd(0.0, 1.0);
    // Random matrix A
    double a[3][3];
    for(int i=0;i<3;i++) for(int j=0;j<3;j++) a[i][j] = nd(rng);
    // Gram-Schmidt orthonormalization of the columns of A
    vector<array<double,3>> v(3), u(3);
    for(int i=0;i<3;i++){ v[i] = {a[0][i], a[1][i], a[2][i]}; }
    auto proj = [&](array<double,3> u, array<double,3> v){
        double udotv = u[0]*v[0] + u[1]*v[1] + u[2]*v[2];
        double udotu = u[0]*u[0] + u[1]*u[1] + u[2]*u[2];
        double s = udotv / max(udotu, 1e-12);
        return array<double,3>{u[0]*s, u[1]*s, u[2]*s};
    };
    for(int i=0;i<3;i++){
        array<double,3> w = v[i];
        for(int j=0;j<i;j++){
            auto p = proj(u[j], v[i]);
            w[0] -= p[0]; w[1] -= p[1]; w[2] -= p[2];
        }
        double n = sqrt(max(1e-12, w[0]*w[0] + w[1]*w[1] + w[2]*w[2]));
        u[i] = {w[0]/n, w[1]/n, w[2]/n};
    }
    Mat3 R{};
    for(int i=0;i<3;i++) for(int j=0;j<3;j++) R.m[i][j] = u[j][i]; // columns are u
    // Flip one column if a pure rotation (det = +1) is required
    double det =
        R.m[0][0]*(R.m[1][1]*R.m[2][2]-R.m[1][2]*R.m[2][1]) -
        R.m[0][1]*(R.m[1][0]*R.m[2][2]-R.m[1][2]*R.m[2][0]) +
        R.m[0][2]*(R.m[1][0]*R.m[2][1]-R.m[1][1]*R.m[2][0]);
    if(!allow_reflection && det < 0){
        for(int i=0;i<3;i++) R.m[i][2] *= -1.0;
    }
    return R;
}

struct MLP {
    // Simple 2-layer MLP: in -> hidden -> out with ReLU
    vector<vector<double>> W1, W2;
    vector<double> b1, b2;
    MLP(){}
    MLP(int in_dim, int hid, int out_dim, mt19937& rng){
        normal_distribution<double> nd(0.0, 0.5);
        W1.assign(hid, vector<double>(in_dim));
        W2.assign(out_dim, vector<double>(hid));
        b1.assign(hid, 0.0);
        b2.assign(out_dim, 0.0);
        for(auto& r: W1) for(auto& v: r) v = nd(rng) / sqrt(in_dim);
        for(auto& r: W2) for(auto& v: r) v = nd(rng) / sqrt(hid);
    }
    vector<double> forward(const vector<double>& x) const {
        int hid = (int)W1.size(), in_dim = (int)W1[0].size();
        vector<double> h(hid, 0.0);
        for(int i=0;i<hid;i++){
            double s = b1[i];
            for(int j=0;j<in_dim;j++) s += W1[i][j]*x[j];
            h[i] = max(0.0, s); // ReLU
        }
        int out_dim = (int)W2.size();
        vector<double> y(out_dim, 0.0);
        for(int i=0;i<out_dim;i++){
            double s = b2[i];
            for(int j=0;j<hid;j++) s += W2[i][j]*h[j];
            y[i] = s; // linear output
        }
        return y;
    }
};

struct EGNNLayer {
    int F;       // feature size
    MLP phi_e;   // edge weight scalar
    MLP phi_h;   // feature update
    MLP phi_x;   // coordinate scalar
    double coord_scale;
    EGNNLayer(int F_, mt19937& rng, int hidden=32): F(F_), coord_scale(0.1){
        // Inputs: [h_i (F), h_j (F), d2 (1)] -> scalar
        phi_e = MLP(2*F + 1, hidden, 1, rng);
        phi_x = MLP(2*F + 1, hidden, 1, rng);
        // Inputs to phi_h: [h_i (F), m_i (F)] -> h_i' (F)
        phi_h = MLP(2*F, hidden, F, rng);
    }

    void forward(const vector<Vec>& X, const vector<vector<double>>& H,
                 vector<Vec>& Xout, vector<vector<double>>& Hout) const {
        int N = (int)X.size();
        Xout.assign(N, {0,0,0});
        Hout.assign(N, vector<double>(F, 0.0));
        // Precompute messages
        vector<vector<double>> m(N, vector<double>(F, 0.0));
        vector<Vec> dx(N, {0,0,0});
        for(int i=0;i<N;i++){
            for(int j=0;j<N;j++) if(i!=j){
                Vec rij = X[i] - X[j];
                double d2 = norm2(rij);
                // Edge scalar weights from invariant inputs only
                vector<double> in;
                in.reserve(2*F + 1);
                in.insert(in.end(), H[i].begin(), H[i].end());
                in.insert(in.end(), H[j].begin(), H[j].end());
                in.push_back(d2);
                double wij = phi_e.forward(in)[0];
                double sij = phi_x.forward(in)[0];
                // Feature message: scale neighbor feature by scalar weight
                for(int f=0; f<F; f++) m[i][f] += wij * H[j][f];
                // Coordinate update: linear in relative vector with scalar coeff
                dx[i] += rij * (coord_scale * sij);
            }
        }
        for(int i=0;i<N;i++){
            // Feature update
            vector<double> hin;
            hin.reserve(2*F);
            hin.insert(hin.end(), H[i].begin(), H[i].end());
            hin.insert(hin.end(), m[i].begin(), m[i].end());
            Hout[i] = phi_h.forward(hin);
            // Coordinate update is equivariant by design
            Xout[i] = X[i] + dx[i];
        }
    }
};

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    mt19937 rng(42);
    uniform_real_distribution<double> ud(-1.0, 1.0);

    int N = 6; int F = 4;
    vector<Vec> X(N);
    vector<vector<double>> H(N, vector<double>(F));
    for(int i=0;i<N;i++){
        X[i] = {ud(rng), ud(rng), ud(rng)};
        for(int f=0; f<F; f++) H[i][f] = ud(rng);
    }

    EGNNLayer layer(F, rng, 32);

    vector<Vec> X1; vector<vector<double>> H1;
    layer.forward(X, H, X1, H1);

    // Apply a random orthogonal transform and translation
    Mat3 R = gram_schmidt_orthogonal(rng, /*allow_reflection=*/true);
    Vec t = {0.7, -0.3, 0.2};

    vector<Vec> Xg(N);
    for(int i=0;i<N;i++) Xg[i] = mul(R, X[i]) + t; // g·X

    vector<Vec> X2; vector<vector<double>> H2;
    layer.forward(Xg, H, X2, H2); // same H (scalar features don't transform)

    // Transform original output: g·(X1)
    vector<Vec> X1g(N);
    for(int i=0;i<N;i++) X1g[i] = mul(R, X1[i]) + t;

    // Compare X2 vs X1g and H2 vs H1 (features are scalars -> invariant)
    double pos_err = 0.0, feat_err = 0.0;
    for(int i=0;i<N;i++){
        pos_err = max(pos_err, fabs(X2[i].x - X1g[i].x));
        pos_err = max(pos_err, fabs(X2[i].y - X1g[i].y));
        pos_err = max(pos_err, fabs(X2[i].z - X1g[i].z));
        for(int f=0; f<F; f++) feat_err = max(feat_err, fabs(H2[i][f] - H1[i][f]));
    }

    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Max coordinate equivariance error: " << pos_err << "\n";
    cout << "Max feature invariance error: " << feat_err << "\n";
    cout << "(Small nonzero errors are expected due to floating-point and random weights.)\n";

    return 0;
}

This example implements a minimal EGNN-style layer. Scalar edge weights depend only on invariant inputs (features and squared distances). Coordinate updates are linear in relative vectors with scalar coefficients, guaranteeing E(n)-equivariance. We numerically test equivariance by applying a random orthogonal transform (rotation/reflection) and a translation to the inputs and verifying that the outputs transform accordingly. Features are treated as scalars (unchanged by E(n)); coordinates transform by R and t.

Time: O(N²(F² + dF)) for dense connectivity; with k-NN it becomes O(Nk(F² + dF)).
Space: O(NF + Nd + N²) in dense mode (adjacency implicit); can be reduced to O(NF + Nd + Nk) with sparse neighborhoods.
E(n)-equivariant force field from distance-based kernel
#include <bits/stdc++.h>
using namespace std;

struct Vec { double x, y, z; };
Vec operator+(const Vec& a, const Vec& b){ return {a.x+b.x, a.y+b.y, a.z+b.z}; }
Vec operator-(const Vec& a, const Vec& b){ return {a.x-b.x, a.y-b.y, a.z-b.z}; }
Vec operator*(const Vec& a, double s){ return {a.x*s, a.y*s, a.z*s}; }
Vec& operator+=(Vec& a, const Vec& b){ a.x+=b.x; a.y+=b.y; a.z+=b.z; return a; }

double dot(const Vec& a, const Vec& b){ return a.x*b.x + a.y*b.y + a.z*b.z; }
double norm2(const Vec& a){ return dot(a,a); }

struct Mat3 { double m[3][3]; };
Vec mul(const Mat3& R, const Vec& v){
    return { R.m[0][0]*v.x + R.m[0][1]*v.y + R.m[0][2]*v.z,
             R.m[1][0]*v.x + R.m[1][1]*v.y + R.m[1][2]*v.z,
             R.m[2][0]*v.x + R.m[2][1]*v.y + R.m[2][2]*v.z };
}

Mat3 random_orthogonal(mt19937& rng, bool allow_reflection=true){
    normal_distribution<double> nd(0.0, 1.0);
    // Gram-Schmidt on the columns of a random matrix
    double a[3][3]; for(int i=0;i<3;i++) for(int j=0;j<3;j++) a[i][j] = nd(rng);
    array<double,3> u[3];
    auto norm = [&](array<double,3> v){ return sqrt(max(1e-12, v[0]*v[0]+v[1]*v[1]+v[2]*v[2])); };
    auto proj = [&](array<double,3> u, array<double,3> v){
        double s = (u[0]*v[0]+u[1]*v[1]+u[2]*v[2]) / max(1e-12, u[0]*u[0]+u[1]*u[1]+u[2]*u[2]);
        return array<double,3>{u[0]*s, u[1]*s, u[2]*s}; };
    array<double,3> v[3] = { array<double,3>{a[0][0],a[1][0],a[2][0]},
                             array<double,3>{a[0][1],a[1][1],a[2][1]},
                             array<double,3>{a[0][2],a[1][2],a[2][2]} };
    for(int i=0;i<3;i++){
        auto w = v[i];
        for(int j=0;j<i;j++){ auto p = proj(u[j], v[i]); w[0]-=p[0]; w[1]-=p[1]; w[2]-=p[2]; }
        double n = norm(w); u[i] = {w[0]/n, w[1]/n, w[2]/n};
    }
    Mat3 R{}; for(int i=0;i<3;i++) for(int j=0;j<3;j++) R.m[i][j] = u[j][i];
    double det = R.m[0][0]*(R.m[1][1]*R.m[2][2]-R.m[1][2]*R.m[2][1])
               - R.m[0][1]*(R.m[1][0]*R.m[2][2]-R.m[1][2]*R.m[2][0])
               + R.m[0][2]*(R.m[1][0]*R.m[2][1]-R.m[1][1]*R.m[2][0]);
    if(!allow_reflection && det < 0){ for(int i=0;i<3;i++) R.m[i][2] *= -1.0; }
    return R;
}

struct RadialMLP {
    // k(d2) = w2 * ReLU(w1 · [d2, sqrt(d2), 1] + b1) + b2 (scalar output)
    array<double,3> w1; double b1; double w2; double b2;
    RadialMLP(mt19937& rng){
        normal_distribution<double> nd(0.0, 0.5);
        for(int i=0;i<3;i++) w1[i] = nd(rng);
        b1 = 0.0; w2 = nd(rng); b2 = 0.0;
    }
    double forward(double d2) const {
        double d = sqrt(max(1e-12, d2));
        double h = w1[0]*d2 + w1[1]*d + w1[2]*1.0 + b1;
        h = max(0.0, h); // ReLU hidden activation
        return w2 * h + b2;
    }
};

vector<Vec> compute_forces(const vector<Vec>& X, const RadialMLP& phi){
    int N = (int)X.size();
    vector<Vec> F(N, {0,0,0});
    for(int i=0;i<N;i++){
        Vec fi{0,0,0};
        for(int j=0;j<N;j++) if(i!=j){
            Vec rij = X[i] - X[j];
            double kij = phi.forward(norm2(rij));
            fi += rij * kij; // linear in relative vector, scalar depends on invariants
        }
        F[i] = fi;
    }
    return F;
}

int main(){
    mt19937 rng(7);
    uniform_real_distribution<double> ud(-1.0, 1.0);
    int N = 8; vector<Vec> X(N);
    for(int i=0;i<N;i++) X[i] = {ud(rng), ud(rng), ud(rng)};

    RadialMLP phi(rng);
    vector<Vec> F = compute_forces(X, phi);

    // Apply a random rigid motion g(x) = R x + t
    Mat3 R = random_orthogonal(rng, true);
    Vec t = {0.3, -0.4, 0.1};
    vector<Vec> Xg(N);
    for(int i=0;i<N;i++) Xg[i] = mul(R, X[i]) + t;
    vector<Vec> Fg = compute_forces(Xg, phi);

    // Transform original forces by R and compare to Fg
    double max_err = 0.0;
    for(int i=0;i<N;i++){
        Vec RF = mul(R, F[i]);
        max_err = max(max_err, fabs(RF.x - Fg[i].x));
        max_err = max(max_err, fabs(RF.y - Fg[i].y));
        max_err = max(max_err, fabs(RF.z - Fg[i].z));
    }
    cout.setf(std::ios::fixed); cout << setprecision(6);
    cout << "Max force equivariance error: " << max_err << "\n";
    cout << "(Small nonzero errors expected due to floating point.)\n";
    return 0;
}

This program defines an equivariant force field on a set of 3D points. Forces are sums of relative vectors scaled by a distance-based scalar kernel, which is invariant under E(n). Under any rigid motion (rotation/reflection + translation), the forces rotate or reflect exactly like the inputs, which we verify numerically.

Time: O(N²d) for dense all-pairs interactions; with k-NN neighbors, O(Nkd).
Space: O(Nd).