KKT Conditions
Key Points
- KKT conditions generalize Lagrange multipliers to handle inequality constraints in constrained optimization problems.
- They consist of four parts: stationarity, primal feasibility, dual feasibility, and complementary slackness.
- For convex problems with a regularity condition (like Slater's condition), KKT conditions are not just necessary but also sufficient for optimality.
- Lagrange multipliers (dual variables) can be interpreted as shadow prices that measure how much the objective would improve if a constraint were relaxed.
- You can numerically check KKT residuals to assess solution quality and stopping criteria in algorithms.
- For quadratic programs with linear constraints, KKT reduces to solving a linear system; active-set methods enumerate active constraints.
- Numerical implementations must tolerate small violations using tolerances because floating-point arithmetic is inexact.
- KKT underpins modern algorithms such as interior-point methods and SQP, and is used widely in machine learning (e.g., SVMs).
Prerequisites
- Multivariable calculus (gradients): KKT stationarity uses gradients of the objective and constraints.
- Linear algebra: Solving KKT systems requires comfort with matrices, permutations, and linear independence (LICQ).
- Convex analysis: Sufficiency of KKT relies on convexity of objective and constraints and conditions like Slater.
- Lagrange multipliers (equality constraints): KKT extends this classical method to inequalities; the Lagrangian is central.
- Numerical linear algebra: Stable and efficient solution of KKT systems (conditioning, pivoting, factorization).
- Quadratic forms and positive (semi)definiteness: QP structure and properties of the KKT matrix depend on Q being PSD/PD.
- Optimization duality basics: Understanding dual feasibility, strong duality, and shadow prices.
- Floating-point arithmetic and tolerances: Practical KKT checks use tolerances; understanding rounding errors avoids false failures.
Detailed Explanation
01 Overview
Imagine trying to minimize a cost while respecting a set of do-not-violate rules. Karush-Kuhn-Tucker (KKT) conditions are the rulebook that tells you whether you have successfully minimized the cost without breaking any rules. They extend the classical Lagrange multiplier method (which handles equality constraints) to also support inequality constraints, thus covering a large class of real-world optimization problems. KKT conditions combine information from the objective function and the constraints into a single structure, the Lagrangian, and articulate four requirements (stationarity, primal feasibility, dual feasibility, and complementary slackness) that a candidate solution must satisfy at an optimum. When the problem is convex and mild regularity conditions hold (such as Slater's condition), these requirements are not only necessary but also sufficient, meaning if they hold, you have found a global optimum. Hook: If you have ever pushed a box against a wall and felt the wall pushing back, you already have the right intuition: constraints push back with forces, and KKT tells you when all pushes are perfectly balanced. Concept: KKT characterizes optimal points via gradients and multipliers attached to constraints, ensuring no direction can reduce the cost without violating constraints. Example: Minimizing distance to a point subject to staying on or beyond a line leads to a solution exactly on the line, with a positive multiplier that balances the gradient of the objective and the constraint's normal.
02 Intuition & Analogies
Hook: Think of hiking downhill (minimizing elevation) in a valley with fences (constraints). You want the lowest point you can reach without crossing any fence. If you are in the interior, you just walk until the slope is zero. If you hit a fence, you can only move along it. At the bottom, the slope that wants to pull you downhill is exactly canceled by the fence pushing back. Concept: The gradient of the objective points in the direction of steepest increase; the negative gradient points downhill. Constraints act like walls that block motion in certain directions. Lagrange multipliers are the magnitudes of the pushes from each active wall. Stationarity says "sum of downhill pulls plus pushes from active walls equals zero." Primal feasibility means you did not cross any fence; dual feasibility says walls can only push (nonnegative multipliers), never pull; complementary slackness says a wall pushes you only if you are exactly touching it (if you are away from the wall, that multiplier must be zero). Example: Suppose you want the closest point to (1,2) but must stay in the region x + y ≥ 4. The unconstrained best is (1,2) (already the closest to itself), but it violates x + y ≥ 4. Projecting onto the line x + y = 4 gives (1.5, 2.5). The gradient of the squared distance at that point is (1,1), and the constraint's normal is (-1,-1). A positive multiplier of 1 on that constraint makes (1,1) + 1·(-1,-1) = (0,0): perfect balance.
03 Formal Definition
Consider minimizing f(x) subject to g_i(x) \le 0 for i = 1, ..., m and h_j(x) = 0 for j = 1, ..., p, with f, g_i, h_j differentiable. A point x* is a KKT point if there exist multipliers \mu* and \lambda* such that: (1) Stationarity: \nabla f(x*) + \sum_i \mu_i* \nabla g_i(x*) + \sum_j \lambda_j* \nabla h_j(x*) = 0. (2) Primal feasibility: g_i(x*) \le 0 and h_j(x*) = 0 for all i, j. (3) Dual feasibility: \mu_i* \ge 0 for all i. (4) Complementary slackness: \mu_i* g_i(x*) = 0 for all i. Under a constraint qualification (e.g., LICQ, or Slater's condition for convex problems), every local minimizer satisfies these conditions; for convex problems they are also sufficient for global optimality.
04 When to Use
Use KKT conditions when you need optimality certificates or stopping criteria in constrained optimization. They are indispensable in: (1) Convex optimization with inequalities (e.g., quadratic programs, logistic regression with constraints, LASSO variants), to both certify and compute solutions. (2) Machine learning models such as SVMs, where KKT yields the dual problem and support vector characterization (complementary slackness identifies active constraints). (3) Engineering design and control (trajectory optimization with bounds, resource allocation), where multipliers quantify trade-offs (shadow prices). (4) Algorithm design, e.g., interior-point methods and sequential quadratic programming (SQP), where solving KKT linear systems is the core step. (5) Simple small-scale problems where active sets can be enumerated and KKT equations solved directly. Hook: If you want a numeric "green light" that your candidate is optimal, check KKT residuals. Example: In a small quadratic program with a handful of linear inequalities, testing every subset of possibly active constraints and solving the KKT system gives you the exact optimizer and multipliers.
⚠️ Common Mistakes
- Ignoring constraint qualifications: KKT may fail to be necessary if CQs (e.g., LICQ, Slater) do not hold; test feasibility and regularity before concluding optimality.
- Wrong sign conventions: With inequalities g_i(x) \le 0, multipliers must satisfy \mu_i \ge 0 and the Lagrangian uses "+ \mu_i g_i(x)". Reversing signs breaks stationarity or dual feasibility.
- Mixing active/inactive logic: Complementary slackness requires \mu_i g_i(x)=0. If g_i(x) < 0 substantially but \mu_i > 0, or if g_i(x) \approx 0 but \mu_i < 0, KKT is violated.
- Numerical tolerance issues: Floating-point noise means exact zeros are rare. Use tolerances (e.g., 1e-8) for feasibility and complementarity checks.
- Treating KKT as sufficient in nonconvex problems: For nonconvex objectives, KKT are necessary (under CQs) but not sufficient; they can certify stationary points that are not global minima.
- Singular KKT systems: Equality-constrained QPs can yield singular KKT matrices if constraints are linearly dependent; check rank or use regularization.
- Forgetting equality constraints are unrestricted in sign: Their multipliers (\lambda) can be positive or negative; only inequality multipliers (\mu) must be nonnegative.
- Overlooking scaling: Poor scaling of Q, A, and gradients can cause unstable solves; scale variables/constraints for numerical robustness.
Key Formulas
Primal problem
minimize_x f(x) subject to g_i(x) \le 0, i = 1, ..., m, and h_j(x) = 0, j = 1, ..., p
Explanation: This is the general nonlinear program with inequality and equality constraints. It describes the optimization landscape KKT conditions apply to.
Lagrangian
L(x, \mu, \lambda) = f(x) + \sum_{i=1}^{m} \mu_i g_i(x) + \sum_{j=1}^{p} \lambda_j h_j(x)
Explanation: The Lagrangian blends the objective and constraints using multipliers. It is central to expressing KKT stationarity and deriving the dual.
Stationarity
\nabla_x L(x*, \mu*, \lambda*) = \nabla f(x*) + \sum_i \mu_i* \nabla g_i(x*) + \sum_j \lambda_j* \nabla h_j(x*) = 0
Explanation: At an optimal point, the gradient of the Lagrangian with respect to x is zero. This means no feasible first-order descent direction exists.
Primal feasibility
g_i(x*) \le 0 for all i; h_j(x*) = 0 for all j
Explanation: The candidate solution must satisfy all inequality and equality constraints. Violating these invalidates optimality.
Dual feasibility
\mu_i* \ge 0 for all i
Explanation: Inequality multipliers cannot be negative; constraints can only push back, not pull. Equality multipliers are unrestricted in sign.
Complementary slackness
\mu_i* g_i(x*) = 0 for all i
Explanation: Each inequality is either active (g_i(x*) = 0, possibly positive multiplier) or inactive (g_i(x*) < 0, zero multiplier). The slack -g_i(x*) and the multiplier \mu_i* cannot both be positive simultaneously.
Dual function
q(\mu, \lambda) = \inf_x L(x, \mu, \lambda)
Explanation: The dual function gives a lower bound on the primal optimal value for any feasible multipliers. Maximizing it yields the dual problem.
Dual problem
maximize_{\mu \ge 0, \lambda} q(\mu, \lambda)
Explanation: The dual problem searches for the best lower bound via multipliers. Under strong duality, its optimal value equals the primal's.
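As a worked instance (derived here for the running example of this lesson, not stated in the original formulas), the dual of minimizing f(x) = (x_1 - 1)^2 + (x_2 - 2)^2 subject to g(x) = 4 - x_1 - x_2 \le 0 has a closed form:

```latex
L(x, \mu) = (x_1 - 1)^2 + (x_2 - 2)^2 + \mu\,(4 - x_1 - x_2)
\nabla_x L = 0 \;\Rightarrow\; x_1(\mu) = 1 + \tfrac{\mu}{2}, \quad x_2(\mu) = 2 + \tfrac{\mu}{2}
q(\mu) = \inf_x L(x, \mu) = \mu - \tfrac{\mu^2}{2}
```

Maximizing q(\mu) over \mu \ge 0 gives \mu* = 1 and q(1) = 1/2, which equals the primal optimum f(1.5, 2.5) = 0.5: strong duality holds, and \mu* is exactly the KKT multiplier that appears in the code examples.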
Slater's condition
There exists a strictly feasible point \bar{x} with g_i(\bar{x}) < 0 for all i and h_j(\bar{x}) = 0 (with h affine)
Explanation: This strict feasibility condition (for convex problems) ensures strong duality and that KKT conditions are necessary and sufficient.
KKT system for equality-constrained QP
[Q A_act^T; A_act 0] [x; \lambda] = [-c; b_act]
Explanation: For minimizing (1/2) x^T Q x + c^T x with active constraints A_act x = b_act, this linear system yields x and the active-constraint multipliers. It is the core of active-set methods.
KKT residual norm
r(x, \mu, \lambda) = ||\nabla_x L(x, \mu, \lambda)||_2 + ||max(g(x), 0)||_2 + ||h(x)||_2 + ||\mu \circ g(x)||_2
Explanation: A practical stopping metric: near zero indicates approximate satisfaction of KKT. The max(g, 0) term measures inequality violations only when positive.
Saddle-point characterization
L(x*, \mu, \lambda) \le L(x*, \mu*, \lambda*) \le L(x, \mu*, \lambda*) for all x, all \mu \ge 0, and all \lambda
Explanation: At optimality for convex problems, (x*, \mu*, \lambda*) forms a saddle point of the Lagrangian. This underlies strong duality and sufficiency of KKT.
Complexity Analysis
Checking the KKT conditions at a candidate point costs one gradient evaluation per function plus O(m + p) arithmetic for the feasibility and complementarity residuals. Solving an equality-constrained KKT system of size n + k by dense Gaussian elimination costs O((n + k)^3). Active-set enumeration, as in the second example below, solves one such system per subset of the m inequality constraints, i.e., O(2^m) solves in the worst case; practical active-set and interior-point methods avoid this exponential enumeration.
Code Examples
#include <bits/stdc++.h>
using namespace std;

// Helper: L2 norm of a vector
double norm2(const vector<double>& v) {
    double s = 0.0; for (double x : v) s += x * x; return sqrt(s);
}

// Helper: vector add
vector<double> add(const vector<double>& a, const vector<double>& b) {
    vector<double> c(a.size());
    for (size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}

// Helper: scalar * vector
vector<double> scal(double alpha, const vector<double>& a) {
    vector<double> c(a.size());
    for (size_t i = 0; i < a.size(); ++i) c[i] = alpha * a[i];
    return c;
}

int main() {
    // Problem: minimize f(x) = (x1-1)^2 + (x2-2)^2
    // subject to g(x) = 4 - x1 - x2 <= 0 (equivalently x1 + x2 >= 4)
    // No equality constraints.

    auto f = [](const vector<double>& x) {
        return (x[0]-1)*(x[0]-1) + (x[1]-2)*(x[1]-2);
    };
    auto grad_f = [](const vector<double>& x) {
        // ∇f = 2*(x - [1,2])
        return vector<double>{2.0*(x[0]-1.0), 2.0*(x[1]-2.0)};
    };

    auto g = [](const vector<double>& x) {
        // g(x) = 4 - x1 - x2
        return vector<double>{4.0 - x[0] - x[1]};
    };
    auto grad_g = [](const vector<double>& x) {
        // ∇g = [-1, -1]
        (void)x; // unused
        return vector<vector<double>>{ {-1.0, -1.0} };
    };

    // Analytic solution: x* = (1.5, 2.5), μ* = 1
    vector<double> x = {1.5, 2.5};
    vector<double> mu = {1.0}; // inequality multiplier ≥ 0
    vector<double> lambda;     // no equalities

    // Compute KKT residuals
    vector<double> st = grad_f(x); // stationarity accumulator
    auto G = grad_g(x);
    for (size_t i = 0; i < mu.size(); ++i) {
        // add μ_i ∇g_i
        st = add(st, scal(mu[i], G[i]));
    }

    // Inequality feasibility
    vector<double> gv = g(x);
    // For reporting, compute max(0, g)
    vector<double> pos_g(gv.size());
    for (size_t i = 0; i < gv.size(); ++i) pos_g[i] = max(0.0, gv[i]);

    // Complementary slackness residuals: μ_i * g_i(x)
    vector<double> comp(gv.size());
    for (size_t i = 0; i < gv.size(); ++i) comp[i] = mu[i] * gv[i];

    double tol = 1e-8;

    cout << fixed << setprecision(8);
    cout << "f(x*) = " << f(x) << "\n";
    cout << "Stationarity residual ||∇f + Σ μ_i ∇g_i||_2 = " << norm2(st) << "\n";
    cout << "Inequality feasibility max(g,0) L2 norm = " << norm2(pos_g) << "\n";
    cout << "Dual feasibility (all μ_i >= 0): "
         << ((all_of(mu.begin(), mu.end(), [](double v){ return v >= -1e-12; })) ? "OK" : "FAIL") << "\n";
    cout << "Complementary slackness ||μ ∘ g(x)||_2 = " << norm2(comp) << "\n";

    bool kkt_ok = (norm2(st) <= tol) && (norm2(pos_g) <= tol)
        && all_of(mu.begin(), mu.end(), [](double v){ return v >= -1e-12; })
        && (norm2(comp) <= tol);
    cout << "KKT satisfied within tol? " << (kkt_ok ? "YES" : "NO") << "\n";
    return 0;
}
We define a 2D convex problem: minimize squared distance to (1,2) subject to x1 + x2 ≥ 4. We compute the gradient of f, the constraint and its gradient, and the analytic solution (x*, μ*) = ((1.5, 2.5), 1). The program assembles the KKT residuals: stationarity, feasibility, dual feasibility, and complementary slackness. All residuals are zero (up to numerical tolerance), confirming KKT optimality.
#include <bits/stdc++.h>
using namespace std;

// Basic dense linear algebra helpers for small systems
using Mat = vector<vector<double>>;
using Vec = vector<double>;

Vec matVec(const Mat& A, const Vec& x) {
    size_t n = A.size(), m = x.size();
    Vec y(n, 0.0);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j) y[i] += A[i][j] * x[j];
    return y;
}

Mat transpose(const Mat& A) {
    size_t n = A.size(), m = A[0].size();
    Mat AT(m, vector<double>(n));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j) AT[j][i] = A[i][j];
    return AT;
}

// Solve A x = b via Gaussian elimination with partial pivoting
bool solveLinearSystem(Mat A, Vec b, Vec& x) {
    const double EPS = 1e-12;
    int n = (int)A.size();
    for (int i = 0; i < n; ++i) A[i].push_back(b[i]); // augment
    // Forward elimination
    for (int col = 0, row = 0; col < n && row < n; ++col, ++row) {
        // Pivot
        int piv = row;
        for (int i = row+1; i < n; ++i)
            if (fabs(A[i][col]) > fabs(A[piv][col])) piv = i;
        if (fabs(A[piv][col]) < EPS) return false; // singular
        swap(A[piv], A[row]);
        // Normalize
        double div = A[row][col];
        for (int j = col; j <= n; ++j) A[row][j] /= div;
        // Eliminate
        for (int i = 0; i < n; ++i) if (i != row) {
            double f = A[i][col];
            for (int j = col; j <= n; ++j) A[i][j] -= f * A[row][j];
        }
    }
    x.assign(n, 0.0);
    for (int i = 0; i < n; ++i) x[i] = A[i][n];
    return true;
}

// Build KKT system for equality-constrained QP: minimize 1/2 x^T Q x + c^T x s.t. Aeq x = beq
// Returns (x, lambda) by solving [Q A^T; A 0][x; lambda] = [-c; beq]
bool solveEqualityConstrainedQP(const Mat& Q, const Vec& c, const Mat& Aeq, const Vec& beq,
                                Vec& x, Vec& lambda) {
    int n = (int)Q.size();
    int k = (int)Aeq.size();
    int N = n + k;
    Mat KKT(N, vector<double>(N, 0.0));
    // Top-left: Q
    for (int i = 0; i < n; ++i) for (int j = 0; j < n; ++j) KKT[i][j] = Q[i][j];
    // Top-right: A^T
    for (int i = 0; i < n; ++i) for (int j = 0; j < k; ++j) KKT[i][n + j] = Aeq[j][i];
    // Bottom-left: A
    for (int i = 0; i < k; ++i) for (int j = 0; j < n; ++j) KKT[n + i][j] = Aeq[i][j];
    // Bottom-right: zeros
    // Build RHS: [-c; beq]
    Vec rhs(N, 0.0);
    for (int i = 0; i < n; ++i) rhs[i] = -c[i];
    for (int i = 0; i < k; ++i) rhs[n + i] = beq[i];
    Vec sol;
    if (!solveLinearSystem(KKT, rhs, sol)) return false;
    x.assign(sol.begin(), sol.begin() + n);
    lambda.assign(sol.begin() + n, sol.end());
    return true;
}

// Active-set enumeration for Ax <= b (inequalities). We assume convex Q (symmetric PSD).
struct QP {
    Mat Q; // n x n (symmetric PSD)
    Vec c; // n
    Mat A; // m x n
    Vec b; // m
};

int main() {
    // Example: minimize ||x - a||^2 with a = (1, 2), subject to x1 + x2 >= 4
    // Write as: minimize 1/2 x^T Q x + c^T x, with Q = 2I, c = -2a
    // Constraint g(x) = A x - b <= 0 as -x1 - x2 + 4 <= 0 -> A = [-1 -1], b = -4

    QP prob;
    int n = 2, m = 1;
    prob.Q = {{2.0, 0.0}, {0.0, 2.0}}; // 2I
    prob.c = {-2.0*1.0, -2.0*2.0};     // -2a
    prob.A = {{-1.0, -1.0}};
    prob.b = {-4.0};

    double bestObj = numeric_limits<double>::infinity();
    Vec bestX, bestMu;
    const double tol = 1e-8;

    // Enumerate all subsets of constraints as candidate active sets
    for (int mask = 0; mask < (1<<m); ++mask) {
        // Build A_active, b_active from active constraints (those with bit = 1)
        vector<int> activeIdx;
        for (int i = 0; i < m; ++i) if (mask & (1<<i)) activeIdx.push_back(i);
        Mat Aeq; Vec beq;
        for (int idx : activeIdx) { Aeq.push_back(prob.A[idx]); beq.push_back(prob.b[idx]); }

        Vec x, lambda_act;
        if (!solveEqualityConstrainedQP(prob.Q, prob.c, Aeq, beq, x, lambda_act)) continue; // singular -> skip

        // Check primal feasibility: Ax - b <= 0 with tolerance
        bool feasible = true;
        for (int i = 0; i < m; ++i) {
            double gi = prob.A[i][0]*x[0] + prob.A[i][1]*x[1] - prob.b[i];
            if (gi > tol) { feasible = false; break; }
        }
        if (!feasible) continue;

        // Dual feasibility for active inequality multipliers: mu >= 0
        // Here, the Lagrangian uses + mu^T (A x - b), so mu = lambda_act for active constraints
        bool dual_ok = true;
        for (double mu_i : lambda_act) if (mu_i < -1e-10) { dual_ok = false; break; }
        if (!dual_ok) continue;

        // Complementary slackness for inactive constraints: inactive mu = 0 by construction,
        // so it suffices that g <= 0, which feasibility already ensured.

        // Objective value: 1/2 x^T Q x + c^T x
        double obj = 0.0;
        for (int i = 0; i < n; ++i) for (int j = 0; j < n; ++j) obj += 0.5 * x[i] * prob.Q[i][j] * x[j];
        for (int i = 0; i < n; ++i) obj += prob.c[i] * x[i];

        if (obj < bestObj) {
            bestObj = obj;
            bestX = x;
            // Map active multipliers into full mu vector (inactive = 0)
            bestMu.assign(m, 0.0);
            for (size_t k = 0; k < activeIdx.size(); ++k) bestMu[activeIdx[k]] = lambda_act[k];
        }
    }

    cout << fixed << setprecision(8);
    if (bestX.empty()) {
        cout << "No feasible KKT point found.\n";
        return 0;
    }

    cout << "Best solution found by active-set enumeration:\n";
    cout << "x* = [" << bestX[0] << ", " << bestX[1] << "]\n";
    cout << "Objective f(x*) = " << bestObj << "\n";
    cout << "Multipliers mu* = [" << bestMu[0] << "] (inactive constraints have mu=0)\n";

    // Verify KKT stationarity: Q x + c + A^T mu = 0
    Vec gradL = { prob.Q[0][0]*bestX[0] + prob.Q[0][1]*bestX[1] + prob.c[0],
                  prob.Q[1][0]*bestX[0] + prob.Q[1][1]*bestX[1] + prob.c[1] };
    // Add A^T mu
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) gradL[j] += prob.A[i][j] * bestMu[i];

    // g(x)
    Vec gx(m, 0.0);
    for (int i = 0; i < m; ++i) gx[i] = prob.A[i][0]*bestX[0] + prob.A[i][1]*bestX[1] - prob.b[i];

    // Residuals
    auto norm2 = [](const Vec& v){ double s=0; for(double z:v) s+=z*z; return sqrt(s); };
    Vec comp(m); for (int i = 0; i < m; ++i) comp[i] = bestMu[i] * gx[i];
    Vec pos_g = gx; for (double& z : pos_g) z = max(0.0, z);

    cout << "KKT check:\n";
    cout << "||stationarity|| = " << norm2(gradL) << "\n";
    cout << "||max(g,0)|| = " << norm2(pos_g) << "\n";
    cout << "dual feasibility (mu >= 0): "
         << (all_of(bestMu.begin(), bestMu.end(), [](double v){ return v >= -1e-12; }) ? "OK" : "FAIL") << "\n";
    cout << "||mu ∘ g|| = " << norm2(comp) << "\n";

    return 0;
}
This program solves a tiny convex quadratic program with linear inequalities by enumerating all active sets. For each subset, it solves the equality-constrained KKT system [Q A^T; A 0][x; λ] = [-c; b] to obtain x and multipliers for those constraints, then checks primal and dual feasibility. The best feasible candidate is returned, and KKT residuals are reported. For our example, the solution is x* = (1.5, 2.5) with μ* = 1, matching the analytical KKT solution.