ADMM (Alternating Direction Method of Multipliers)
Key Points
- ADMM splits a hard optimization problem into two easier subproblems that communicate through simple averaging-like steps.
- It alternates minimizing with respect to each block of variables and then updates a running "penalty-adjusted" dual variable.
- The method is especially effective for large, structured, and distributed convex problems, such as L1-regularized learning (LASSO) and consensus averaging.
- ADMM's core engine is the augmented Lagrangian, which stabilizes constraints with a quadratic penalty controlled by a parameter \(\rho\).
- Stopping is based on primal and dual residuals that measure constraint satisfaction and update stability.
- Per-iteration cost is usually dominated by linear algebra in each subproblem; warm starts and precomputed factorizations make ADMM fast in practice.
- ADMM is robust and easy to implement with proximal operators (like soft-thresholding) but typically converges sublinearly (often \(O(1/k)\)).
- Choosing \(\rho\) and good scaling strongly affects speed; heuristic adjustments of \(\rho\) during the iterations can help convergence.
Prerequisites
- Convex optimization basics — ADMM assumes convexity for convergence guarantees and uses concepts like convex functions and optimality.
- Linear algebra (matrix factorizations) — Efficient x-updates rely on solving linear systems using Cholesky/QR and understanding \(A^\top A\).
- Lagrangian duality — The augmented Lagrangian and dual variables are central to ADMM's design and convergence.
- Proximal operators — Many ADMM subproblems reduce to evaluating proximal mappings like soft-thresholding.
- Numerical stability and conditioning — Choosing formulations and factorizations that are stable is key for performance and accuracy.
Detailed Explanation
Overview
The Alternating Direction Method of Multipliers (ADMM) is a powerful algorithm for solving optimization problems that can be decomposed into parts. Think of problems where an objective naturally splits into two terms, each depending on a different block of variables, tied together by linear constraints. ADMM leverages this structure by alternating between optimizing each block while keeping the other fixed, and then enforcing agreement through a dual update. This combines the decomposability of dual ascent with the stability of the augmented Lagrangian. In practice, ADMM shines in large-scale convex optimization, especially when the cost function involves non-smooth penalties like the L1 norm. Examples include LASSO for sparse regression, total variation denoising, and distributed consensus problems. Because each subproblem can often be solved with simple operations (like soft-thresholding) or fast linear algebra (like solving a system with a precomputed factorization), ADMM can handle very large datasets efficiently and can be distributed across machines. While its theoretical convergence rate is typically sublinear, its robustness, simplicity, and ability to exploit problem structure make it a go-to method in modern optimization and machine learning.
Intuition & Analogies
Imagine two teams trying to write a joint report. Team X writes the content (x), Team Z designs the layout (z). They must agree on a final version (the constraint ties x and z). Working together on every sentence is slow, so they adopt a routine: Team X drafts content given the current layout, Team Z refines the layout given the new draft, and a coordinator checks how different their versions are, adding a “penalty note” if they disagree. Over time, the fear of accumulating penalties pushes both teams toward agreement while each focuses on their specialty. That is ADMM. More concretely, ADMM introduces a soft “rubber band” between the two variable blocks: if x and z drift apart (violating the constraint), the rubber band (quadratic penalty) pulls them back. The dual variable u is like an accountant tracking the disagreement; when disagreement persists, u tightens the rubber band by increasing the effective cost of misalignment. Because each team only solves their own subproblem (often easy), the overall method scales well. In distributed settings, think of multiple sensors estimating a common quantity: each computes a local update using its data, then everyone averages their answers and adjusts their internal bias (u). Repeat this, and the network reaches consensus without any single node solving the entire problem. The alternating, penalty-aided negotiation is what makes ADMM both stable and modular.
Formal Definition
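ADMM targets problems that split into two blocks of variables coupled by a linear constraint:

\[ \min_{x,\,z}\; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c, \]

where \(f\) and \(g\) are convex (possibly non-smooth), \(x \in \mathbb{R}^n\), \(z \in \mathbb{R}^m\), \(A \in \mathbb{R}^{p \times n}\), \(B \in \mathbb{R}^{p \times m}\), and \(c \in \mathbb{R}^p\). The method works with the augmented Lagrangian

\[ L_\rho(x, z, y) = f(x) + g(z) + y^\top (Ax + Bz - c) + \tfrac{\rho}{2}\,\|Ax + Bz - c\|_2^2, \]

alternating exact minimization over \(x\) (with \(z\) fixed), then over \(z\) (with the new \(x\) fixed), followed by a gradient-ascent step on the dual variable \(y\) with step size \(\rho\). With the scaled dual variable \(u = y/\rho\), the iteration reads

\[ x^{k+1} = \arg\min_x\; f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^k - c + u^k\|_2^2, \]
\[ z^{k+1} = \arg\min_z\; g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^k\|_2^2, \]
\[ u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c. \]

Under convexity and standard assumptions (existence of a saddle point of the Lagrangian), the iterates satisfy objective convergence \(f(x^k) + g(z^k) \to p^\star\) and residual convergence \(Ax^k + Bz^k - c \to 0\).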
When to Use
Use ADMM when your problem naturally splits into two (or more) parts connected by simple linear constraints, and each part is easy to optimize on its own. Classic scenarios include: (1) Sparse learning, like LASSO, where the smooth least-squares term couples with a non-smooth L1 penalty; (2) Total variation and fused lasso problems where proximal operators are simple; (3) Distributed or federated optimization where data is partitioned across machines—ADMM allows local updates and lightweight global coordination; (4) Consensus and sharing problems where many local variables must agree on a common value; (5) Large-scale problems where precomputations (like a Cholesky factor) can be reused across many iterations. ADMM is less ideal when subproblems are themselves expensive or ill-conditioned without a good preconditioner, when ultra-fast high-accuracy is needed (second-order interior-point methods may win there), or when the problem does not decompose in a way that yields simple proximal steps. It is a strong default for convex, structured, and separable objectives where modest accuracy suffices and scalability matters.
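As a concrete instance of scenario (1), LASSO fits the canonical form by introducing a copy \(z\) of the coefficient vector:

\[ \min_{x,\,z}\; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 \quad \text{subject to} \quad x - z = 0. \]

The x-subproblem becomes a ridge-like linear solve and the z-subproblem becomes elementwise soft-thresholding, so both steps are cheap even at large scale.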
⚠️ Common Mistakes
- Ignoring scaling and normalization: Poorly scaled A, variables, or penalties can slow convergence dramatically. Normalize features and tune \(\rho\) (or adapt it during iterations).
- Not precomputing reusable factorizations: Re-solving the same linear system from scratch each iteration wastes time. Factorize \(A^\top A + \rho I\) once and reuse triangular solves.
- Loose or incorrect stopping criteria: Failing to monitor both primal and dual residuals can cause premature stopping or endless looping. Use standard absolute/relative tolerances that scale with problem size.
- Using too small or too large \(\rho\): Too small makes constraints weakly enforced (slow primal convergence); too large causes tiny steps (slow dual convergence). Implement heuristic \(\rho\) adjustment based on residual magnitudes.
- Forgetting warm starts: ADMM benefits from warm-starting x, z, and u, especially when solving a sequence of related problems (e.g., along a regularization path).
- Overlooking numerical stability: For ill-conditioned matrices, prefer numerically stable factorizations (Cholesky for SPD systems, QR otherwise) and avoid forming \(A^\top A\) explicitly when it worsens conditioning (use QR on A or an iterative method like CG instead).
Key Formulas
ADMM canonical form
\[ \min_{x,\,z}\; f(x) + g(z) \quad \text{subject to} \quad Ax + Bz = c \]
Explanation: This is the standard problem structure ADMM targets: a sum of two convex functions with a linear constraint tying the variables together. Many practical problems can be written this way.
Augmented Lagrangian
\[ L_\rho(x, z, y) = f(x) + g(z) + y^\top(Ax + Bz - c) + \tfrac{\rho}{2}\,\|Ax + Bz - c\|_2^2 \]
Explanation: Adds a quadratic penalty to the classical Lagrangian to stabilize dual ascent. The parameter \(\rho > 0\) controls how strongly constraint violations are penalized.
Scaled ADMM updates
\[ x^{k+1} = \arg\min_x\; f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^k - c + u^k\|_2^2 \]
\[ z^{k+1} = \arg\min_z\; g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^k\|_2^2 \]
\[ u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c \]
Explanation: These are the core iteration steps using the scaled dual variable \(u = y/\rho\). Each step solves a simpler subproblem focusing on one variable block while the other is fixed.
Residuals (general form)
\[ r^{k+1} = Ax^{k+1} + Bz^{k+1} - c, \qquad s^{k+1} = \rho\, A^\top B\,(z^{k+1} - z^k) \]
Explanation: The primal residual r measures constraint violation; the dual residual s measures dual feasibility progress. They are used to decide when to stop.
Residuals (x = z splitting)
\[ r^{k+1} = x^{k+1} - z^{k+1}, \qquad s^{k+1} = -\rho\,(z^{k+1} - z^k) \]
Explanation: For problems written with A = I, B = -I, c = 0, the residuals simplify; only the norm of \(s\) matters, so the sign is often dropped. This is the common form used in LASSO and many proximal splitting problems.
Soft-thresholding
\[ S_\kappa(v)_i = \operatorname{sign}(v_i)\,\max(|v_i| - \kappa,\ 0) \]
Explanation: The proximal operator for the L1 norm. Apply it elementwise with threshold \(\kappa = \lambda/\rho\) in LASSO-like problems.
Stopping tolerances
\[ \epsilon^{\mathrm{pri}} = \sqrt{p}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \max\{\|Ax^k\|_2,\ \|Bz^k\|_2,\ \|c\|_2\} \]
\[ \epsilon^{\mathrm{dual}} = \sqrt{n}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\,\|A^\top y^k\|_2 \]
Explanation: Absolute and relative tolerances scale with the dimensions (\(p\) constraints, \(n\) variables) to make stopping criteria robust. Stop when the residual norms are below these thresholds.
LASSO x-update normal equations
\[ x^{k+1} = (A^\top A + \rho I)^{-1}\big(A^\top b + \rho\,(z^k - u^k)\big) \]
Explanation: For the LASSO splitting with x = z, the x-subproblem reduces to solving a linear system. Precomputing a factorization of \(A^\top A + \rho I\) makes each iteration cheap.
Consensus z-update
\[ z^{k+1} = \frac{1}{N}\sum_{i=1}^{N}\big(x_i^{k+1} + u_i^{k}\big) \]
Explanation: In consensus ADMM, the global variable is the average of local primal-plus-dual terms. This step communicates minimal information across workers.
Ergodic convergence rate
\[ f(\bar{x}^k) + g(\bar{z}^k) - p^\star = O(1/k), \qquad \bar{x}^k = \frac{1}{k}\sum_{t=1}^{k} x^{t} \]
Explanation: Under convexity, averages of iterates converge at a rate proportional to 1/k. This describes the typical sublinear convergence of first-order splitting methods.
Complexity Analysis
Per iteration, cost is dominated by the subproblem linear algebra. In the LASSO example below, forming \(A^\top A\) costs \(O(mn^2)\) and its Cholesky factorization \(O(n^3)\), but both are one-time precomputations; each iteration then needs only two triangular solves (\(O(n^2)\)) plus \(O(n)\) vector work for the z- and u-updates. In consensus ADMM, each worker's update is \(O(d)\) and the global averaging step is \(O(Nd)\) per round. With a typical \(O(1/k)\) convergence rate, reaching accuracy \(\epsilon\) takes on the order of \(1/\epsilon\) iterations, which is why warm starts, good scaling, and \(\rho\) tuning matter in practice.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Utility: 2-norm of a vector
static double norm2(const vector<double>& v){
    double s=0; for(double x: v) s += x*x; return sqrt(s);
}

// Matrix-vector multiply: y = A x, A is m x n
static vector<double> matvec(const vector<vector<double>>& A, const vector<double>& x){
    int m = (int)A.size(), n = (int)A[0].size();
    vector<double> y(m,0.0);
    for(int i=0;i<m;++i){
        double s=0.0; for(int j=0;j<n;++j) s += A[i][j]*x[j];
        y[i]=s;
    }
    return y;
}

// Compute AtA (n x n) and Atb (n)
static void AtA_Atb(const vector<vector<double>>& A, const vector<double>& b,
                    vector<vector<double>>& AtA, vector<double>& Atb){
    int m = (int)A.size(), n = (int)A[0].size();
    AtA.assign(n, vector<double>(n, 0.0));
    Atb.assign(n, 0.0);
    for(int i=0;i<m;++i){
        for(int j=0;j<n;++j){
            Atb[j] += A[i][j]*b[i];
            for(int k=0;k<n;++k){
                AtA[j][k] += A[i][j]*A[i][k];
            }
        }
    }
}

// Cholesky decomposition for SPD matrix M (n x n): M = L L^T
static bool cholesky(vector<vector<double>> M, vector<vector<double>>& L){
    int n = (int)M.size();
    L.assign(n, vector<double>(n, 0.0));
    for(int i=0;i<n;++i){
        for(int j=0;j<=i;++j){
            double s = M[i][j];
            for(int k=0;k<j;++k) s -= L[i][k]*L[j][k];
            if(i==j){
                if(s <= 0.0) return false;
                L[i][i] = sqrt(s);
            }else{
                L[i][j] = s / L[j][j];
            }
        }
    }
    return true;
}

// Solve (L L^T) x = b
static vector<double> chol_solve(const vector<vector<double>>& L, const vector<double>& b){
    int n = (int)L.size();
    vector<double> y(n,0.0), x(n,0.0);
    // Forward solve: L y = b
    for(int i=0;i<n;++i){
        double s = b[i];
        for(int k=0;k<i;++k) s -= L[i][k]*y[k];
        y[i] = s / L[i][i];
    }
    // Backward solve: L^T x = y
    for(int i=n-1;i>=0;--i){
        double s = y[i];
        for(int k=i+1;k<n;++k) s -= L[k][i]*x[k];
        x[i] = s / L[i][i];
    }
    return x;
}

// Soft-thresholding (prox of L1): z_i = sign(v_i) * max(|v_i| - kappa, 0)
static void soft_threshold(vector<double>& v, double kappa){
    for(double &x : v){
        double a = fabs(x) - kappa;
        x = (a > 0) ? copysign(a, x) : 0.0;
    }
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Example data: small LASSO instance (m samples, n features)
    int m = 6, n = 4;
    vector<vector<double>> A = {
        { 0.50, -0.10,  0.30,  0.00},
        { 0.20,  0.40, -0.20,  0.10},
        { 0.00,  0.10,  0.50, -0.30},
        { 0.40, -0.30,  0.10,  0.20},
        {-0.10,  0.20,  0.00,  0.50},
        { 0.30,  0.10, -0.40,  0.10}
    };
    vector<double> b = { 0.9, 0.1, 0.4, 0.7, -0.2, 0.3 };

    // ADMM parameters
    double lambda = 0.2;   // L1 regularization strength
    double rho = 1.0;      // augmented Lagrangian penalty
    double abs_tol = 1e-4, rel_tol = 1e-3;
    int max_iters = 200;

    // Precompute normal equations matrix (A^T A + rho I) and A^T b
    vector<vector<double>> AtA; vector<double> Atb;
    AtA_Atb(A, b, AtA, Atb);
    for(int i=0;i<n;++i) AtA[i][i] += rho;   // add rho I

    // Cholesky factor of (A^T A + rho I), reused every iteration
    vector<vector<double>> L;
    if(!cholesky(AtA, L)){
        cerr << "Matrix not SPD; try increasing rho or using QR/CG.\n";
        return 0;
    }

    // Initialize x, z, u (warm start with zeros)
    vector<double> x(n,0.0), z(n,0.0), u(n,0.0), z_old(n,0.0);

    cout << fixed << setprecision(6);
    for(int k=0;k<max_iters;++k){
        // x-update: solve (A^T A + rho I) x = A^T b + rho (z - u)
        vector<double> rhs(n,0.0);
        for(int i=0;i<n;++i) rhs[i] = Atb[i] + rho*(z[i] - u[i]);
        x = chol_solve(L, rhs);

        // z-update: soft-thresholding of x + u with kappa = lambda/rho
        z_old = z;
        z = x;
        for(int i=0;i<n;++i) z[i] += u[i];
        soft_threshold(z, lambda / rho);

        // u-update: u = u + x - z
        for(int i=0;i<n;++i) u[i] += x[i] - z[i];

        // Residuals for stopping
        // Primal: r = x - z; Dual: s = rho * (z - z_old)
        vector<double> r(n,0.0), s(n,0.0);
        for(int i=0;i<n;++i){ r[i] = x[i] - z[i]; s[i] = rho*(z[i] - z_old[i]); }
        double r_norm = norm2(r);
        double s_norm = norm2(s);

        // Tolerances
        double x_norm = norm2(x), z_norm = norm2(z), u_norm = norm2(u);
        double eps_pri  = sqrt((double)n)*abs_tol + rel_tol*max(x_norm, z_norm);
        double eps_dual = sqrt((double)n)*abs_tol + rel_tol*u_norm*rho;   // ||A^T y|| with A = I, y = rho*u

        if(k % 10 == 0){
            cout << "iter="      << setw(3)  << k
                 << " r_norm="   << setw(10) << r_norm
                 << " s_norm="   << setw(10) << s_norm
                 << " eps_pri="  << setw(10) << eps_pri
                 << " eps_dual=" << setw(10) << eps_dual << "\n";
        }
        if(r_norm <= eps_pri && s_norm <= eps_dual){
            cout << "Converged at iter " << k << "\n";
            break;
        }
    }

    // Output solution
    cout << "x (sparse): ";
    for(double xi: x) cout << xi << ' ';
    cout << "\n";
    cout << "z (prox result): ";
    for(double zi: z) cout << zi << ' ';
    cout << "\n";

    return 0;
}
```
Solves the LASSO problem by splitting x and z with the constraint x = z. The x-update solves a linear system with a precomputed Cholesky factor of (A^T A + ρ I); the z-update applies elementwise soft-thresholding (the proximal operator of the L1 norm). The dual variable u accumulates constraint disagreement. Stopping uses primal and dual residual norms relative to absolute/relative tolerances. Precomputing the factorization makes each iteration fast and numerically stable for moderately sized n.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Compute 2-norm
static double norm2(const vector<double>& v){ double s=0; for(double x: v) s+=x*x; return sqrt(s); }

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Suppose N workers each hold a d-dimensional observation a_i.
    int N = 5, d = 3;
    vector<vector<double>> a = {
        { 1.0,  2.0,  0.0},
        { 0.5,  1.5, -1.0},
        {-0.5,  2.5,  1.0},
        { 2.0, -1.0,  0.5},
        { 1.5,  0.0, -0.5}
    };

    // ADMM variables per worker: x_i, u_i; and global z
    vector<vector<double>> x(N, vector<double>(d, 0.0));
    vector<vector<double>> u(N, vector<double>(d, 0.0));
    vector<double> z(d, 0.0), z_prev(d, 0.0);

    double rho = 1.0;   // penalty parameter
    double abs_tol = 1e-5, rel_tol = 1e-3;
    int max_iters = 200;

    for(int k=0;k<max_iters;++k){
        // x-update (local, closed-form): x_i = (a_i + rho z - u_i) / (1 + rho)
        for(int i=0;i<N;++i)
            for(int j=0;j<d;++j)
                x[i][j] = (a[i][j] + rho*z[j] - u[i][j]) / (1.0 + rho);

        // z-update (global averaging): z = (1/N) * sum_i (x_i + u_i)
        z_prev = z;
        vector<double> sum(d, 0.0);
        for(int i=0;i<N;++i)
            for(int j=0;j<d;++j) sum[j] += x[i][j] + u[i][j];
        for(int j=0;j<d;++j) z[j] = sum[j] / (double)N;

        // u-update: u_i = u_i + x_i - z
        for(int i=0;i<N;++i)
            for(int j=0;j<d;++j) u[i][j] += x[i][j] - z[j];

        // Residuals: primal stacks r_i = x_i - z; dual is rho*(z - z_prev) per worker
        double r_sq = 0.0, x_sq = 0.0, u_sq = 0.0;
        for(int i=0;i<N;++i){
            for(int j=0;j<d;++j){
                double ri = x[i][j] - z[j];
                r_sq += ri*ri; x_sq += x[i][j]*x[i][j]; u_sq += u[i][j]*u[i][j];
            }
        }
        vector<double> s_vec(d, 0.0);
        for(int j=0;j<d;++j) s_vec[j] = rho * (z[j] - z_prev[j]);
        double r_norm = sqrt(r_sq);
        double s_norm = sqrt((double)N) * norm2(s_vec);   // stacked over N workers

        // Tolerances: the stacked constraint compares each x_i to a copy of z
        double eps_pri  = sqrt((double)(N*d))*abs_tol
                        + rel_tol*max(sqrt(x_sq), sqrt((double)N)*norm2(z));
        double eps_dual = sqrt((double)(N*d))*abs_tol + rel_tol*rho*sqrt(u_sq);

        if(k % 10 == 0){
            cout << "iter=" << setw(3) << k
                 << " r_norm=" << r_norm
                 << " s_norm=" << s_norm << "\n";
        }
        if(r_norm <= eps_pri && s_norm <= eps_dual){
            cout << "Converged at iter " << k << "\n";
            break;
        }
    }

    // Report result and compare to direct average of a_i
    vector<double> avg(d, 0.0);
    for(int i=0;i<N;++i) for(int j=0;j<d;++j) avg[j] += a[i][j];
    for(int j=0;j<d;++j) avg[j] /= (double)N;

    cout << fixed << setprecision(6);
    cout << "Consensus z: "; for(double v: z) cout << v << ' '; cout << "\n";
    cout << "Direct avg : "; for(double v: avg) cout << v << ' '; cout << "\n";

    return 0;
}
```
Solves a consensus problem where each worker i wants x_i close to its local vector a_i, with the constraint that all x_i equal a global consensus z. ADMM yields simple closed-form local updates, a global averaging step, and dual accumulation. The final z matches the direct average of the local data, illustrating distributed averaging with lightweight communication.