Implicit Differentiation & Implicit Function Theorem
Key Points
- Implicit differentiation lets you find slopes and higher derivatives even when y is given indirectly by an equation F(x,y)=0.
- The key formula for a single equation is dy/dx = -F_x/F_y, provided F_y ≠ 0 at the point of interest.
- The Implicit Function Theorem guarantees a local function y = g(x) exists if F(x0,y0)=0 and the partial derivative with respect to y is nonzero (or det(D_yF) ≠ 0 in higher dimensions).
- Numerically, you can compute slopes using finite differences for the partials and evaluate -F_x/F_y; good step sizes and checks for F_y ≈ 0 are crucial.
- Newton's method can solve for y at a fixed x by iterating y ← y − F/F_y; in multiple variables it becomes solving the linear system D_yF Δy = -F.
- In higher dimensions, the derivative matrix is Dg(x) = -(D_yF)^{-1} D_xF; computing it requires building Jacobians and solving a linear system.
- Geometrically, F(x,y)=0 is a level curve; the tangent direction is perpendicular to the gradient ∇F, and its slope encodes dy/dx.
- Watch out for points where F_y = 0 (vertical tangents or cusps); the implicit formula can fail there, or you may need to switch to solving for x as a function of y.
Prerequisites
- Single-variable calculus (derivatives, chain rule) — Implicit differentiation relies on the chain rule and understanding slopes.
- Multivariable calculus (partial derivatives, gradients) — Formulas use partial derivatives and gradients/Jacobians.
- Linear algebra (matrices, determinants, solving linear systems) — The Implicit Function Theorem and multivariate case require Jacobians and matrix inverses/solves.
- Numerical methods (finite differences, Newton's method) — Code computes partials numerically and solves nonlinear equations iteratively.
- Floating-point arithmetic and stability — Choosing step sizes and handling ill-conditioning affect accuracy and robustness.
- C++ functions and std::function — We pass functions as arguments to evaluators and solvers.
- Gaussian elimination with pivoting — Needed to solve D_yF Δy = −F efficiently and stably.
Detailed Explanation
01 Overview
Hook: Suppose you know a curve only by a rule like x^2 + y^2 = 1 (a circle). You can see the shape, but there’s no single formula y = f(x) for the entire circle. How do you still get the slope at a point?
Concept: Implicit differentiation is the technique for differentiating relationships where variables are tied together by an equation F(x, y) = 0 rather than an explicit y = f(x). It works by differentiating both sides with respect to x and using the chain rule to peel out dy/dx. This idea generalizes beautifully to many variables and many equations.
Example: For the circle F(x,y) = x^2 + y^2 - 1 = 0, differentiating gives 2x + 2y (dy/dx) = 0, so dy/dx = -x/y. At the point (0.6, 0.8), the slope is -0.75.
02 Intuition & Analogies
Hook: Think of a hiking trail defined by a painted line on the ground. You don’t have the trail’s elevation as a neat y = f(x) formula, but at each step you can still tell how steeply you’re going up or down.
Concept: An equation F(x,y)=0 defines a level set—the set of all points where F takes the same value. The gradient ∇F points in the direction of steepest increase of F and is perpendicular to this level set. Because the curve is perpendicular to ∇F, the slope of the tangent line along the curve is determined by how F changes with x versus how it changes with y. The ratio of these changes, -F_x/F_y, is exactly the slope dy/dx when moving along the curve.
Example: Imagine F as a “temperature map” on a floor, and you’re walking along the – say – 0-degree isotherm line (F=0). If increasing x makes F go up by F_x and increasing y makes it go up by F_y, then to stay on the 0-degree line, a small step Δx must be balanced by a step Δy that keeps F unchanged: F_x Δx + F_y Δy ≈ 0. Solving gives Δy/Δx ≈ -F_x/F_y, which is the intuitive heart of implicit differentiation. For multiple outputs y in terms of x, the same balancing act holds, but you use matrices (Jacobians) instead of scalars.
03 Formal Definition
Let F: R^n × R^m → R^m be continuously differentiable, and suppose F(x0, y0) = 0 with det(D_yF(x0, y0)) ≠ 0. Then there exist neighborhoods U of x0 and V of y0 and a unique continuously differentiable function g: U → V such that g(x0) = y0 and F(x, g(x)) = 0 for all x in U. Moreover, Dg(x) = -(D_yF(x, g(x)))^{-1} D_xF(x, g(x)). In the scalar case (n = m = 1), the condition is F_y(x0, y0) ≠ 0 and the conclusion is dy/dx = -F_x/F_y.
04 When to Use
Hook: Whenever it’s tough or impossible to solve for y explicitly, but you still need slopes, tangents, or sensitivity to x, implicit differentiation is the tool of choice.
Concept: Use implicit differentiation when a relationship is naturally defined as a constraint F(x,y)=0 (or F(x,y)=c). It is especially useful near points where the constraint surface behaves nicely (nonzero F_y or invertible D_y F). In modeling, engineering, and graphics, constraints often define curves and surfaces; differentiating them implicitly yields tangents, normals, and rates of change.
Examples:
- Geometry: Find the slope of the tangent to a conic (circle, ellipse) without solving for y.
- Physics: Constraints like conservation laws or contact constraints define implicit relations; velocities and accelerations follow from differentiating constraints.
- Optimization: Karush–Kuhn–Tucker systems or equation constraints define optimal solutions; sensitivity of solutions to parameters can be obtained via implicit differentiation.
- Numerical computing: When you solve F(x,y)=0 for y with Newton’s method, the linearization F(x+Δx,y+Δy)≈F + D_xF Δx + D_yF Δy gives Δy in terms of Δx via D_yF, which is precisely the implicit derivative.
⚠️ Common Mistakes
Hook: Most errors come from forgetting what is held constant and where formulas are valid.
Concept and fixes:
- Ignoring the condition F_y ≠ 0 (or det(D_yF) ≠ 0). If F_y=0 at a point, dy/dx = -F_x/F_y blows up or is undefined. Fix: Check the condition; if it fails, consider solving x as a function of y (swap roles) or analyze singular behavior.
- Differentiating as if y were independent of x. Remember y=y(x); apply the chain rule so terms with y pick up factors of dy/dx (or Dg).
- Using points not on the curve. The formula applies at points satisfying F(x,y)=0. Fix: First verify F≈0 numerically before computing derivatives.
- Numerical step sizes that are too big or too small. Finite differences with a bad h cause truncation or rounding error. Fix: Use central differences with a scaled h (e.g., 10^{-6} times variable scale) and test sensitivity.
- Poor Newton initial guesses. Newton’s method can diverge if the starting y is far from the solution or if D_yF is ill-conditioned. Fix: Use damping, bracketed methods for 1D, or better initial guesses.
- Confusing partials with totals. F_x and F_y are partial derivatives holding the other variable fixed; dy/dx is a total derivative along the constraint. Keep roles clear in formulas and code.
Key Formulas
Implicit derivative (scalar)
dy/dx = -F_x / F_y
Explanation: For a single equation F(x,y)=0 with F_y ≠ 0, the slope of y as a function of x is the negative ratio of partial derivatives. This comes from differentiating F(x, y(x)) ≡ 0 and applying the chain rule.
Second derivative (scalar)
y'' = -(F_xx F_y^2 - 2 F_xy F_x F_y + F_yy F_x^2) / F_y^3
Explanation: Differentiating F_x + F_y y' = 0 again and using equality of mixed partials yields this expression. It gives curvature information of the implicitly defined curve where F_y ≠ 0.
Implicit Function Theorem (existence)
F(x0, y0) = 0 and det(D_yF(x0, y0)) ≠ 0  ⇒  there is a local function y = g(x) with F(x, g(x)) = 0
Explanation: When the Jacobian with respect to y is invertible at a solution, there exists a local function y = g(x) solving the constraint. The statement is local: g is guaranteed only near (x0, y0).
Implicit Function Theorem (derivative)
Dg(x) = -(D_yF)^{-1} D_xF
Explanation: The derivative (Jacobian) of the implicit function g is obtained by solving D_yF · Dg + D_xF = 0 for Dg. This generalizes -F_x/F_y to higher dimensions.
First-order linearization
F(x + Δx, y + Δy) ≈ F(x, y) + D_xF Δx + D_yF Δy
Explanation: Near a point, F behaves like an affine map composed of its Jacobians. Setting this approximation to zero under the constraint provides the linear relationship between small changes in x and y.
Newton iteration (fixed x)
y_{k+1} = y_k - (D_yF(x, y_k))^{-1} F(x, y_k)
Explanation: Holding x constant, Newton's method updates y by solving a linear system with the Jacobian D_yF. This converges quadratically near well-behaved solutions.
Nonsingularity condition
det(D_yF(x0, y0)) ≠ 0
Explanation: This condition ensures D_yF is invertible, a prerequisite for the Implicit Function Theorem and for stable numerical solution of y given x.
Orthogonality of gradient and tangent
∇F · t = 0 for every tangent vector t of the level set F = 0
Explanation: The tangent vector lies in the kernel of the Jacobian of F; equivalently, it is orthogonal to the gradient. This geometric fact underlies the slope formula.
Complexity Analysis
- A central-difference Jacobian costs two evaluations of F per column: O(m) evaluations for the m×m block D_yF and O(n) for the m×n block D_xF.
- Gaussian elimination with partial pivoting on the m×m system costs O(m^3) arithmetic operations; each additional right-hand side (each column of Dg) adds O(m^2) for the combined elimination and substitution.
- Newton's method converges quadratically near a well-behaved root, so the iteration count is typically small; each iteration costs one Jacobian build plus one O(m^3) linear solve.
Code Examples
```cpp
#include <bits/stdc++.h>
using namespace std;

// Type alias for a scalar implicit function F(x,y)
using ImplicitF = function<double(double,double)>;

// Central difference partial derivative with respect to x at (x,y)
double partial_x(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(x));
    double h = 1e-6 * scale; // scale-aware step
    return (F(x + h, y) - F(x - h, y)) / (2.0 * h);
}

// Central difference partial derivative with respect to y at (x,y)
double partial_y(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(y));
    double h = 1e-6 * scale;
    return (F(x, y + h) - F(x, y - h)) / (2.0 * h);
}

// Compute dy/dx = -F_x / F_y at (x,y). Throws if F_y is too small.
double implicit_slope(const ImplicitF& F, double x, double y) {
    double Fx = partial_x(F, x, y);
    double Fy = partial_y(F, x, y);
    if (!isfinite(Fx) || !isfinite(Fy)) {
        throw runtime_error("Non-finite partial derivative encountered");
    }
    if (abs(Fy) < 1e-12) {
        throw runtime_error("F_y is near zero; slope may be undefined or vertical.");
    }
    return -Fx / Fy;
}

int main() {
    // Example: unit circle F(x,y) = x^2 + y^2 - 1 = 0
    ImplicitF F = [](double x, double y){ return x*x + y*y - 1.0; };

    double x = 0.6;
    double y = sqrt(max(0.0, 1.0 - x*x)); // upper branch y>0

    // Verify the point is on the curve (approximately)
    double val = F(x, y);
    cout << fixed << setprecision(6);
    cout << "F(x,y) = " << val << " (should be ~0)\n";

    try {
        double slope = implicit_slope(F, x, y);
        cout << "dy/dx at (" << x << ", " << y << ") ≈ " << slope << "\n";
        cout << "Expected (analytic) slope = -x/y = " << -x/y << "\n";
    } catch (const exception& e) {
        cerr << "Error: " << e.what() << "\n";
    }
    return 0;
}
```
We approximate the partial derivatives F_x and F_y by central differences and apply the implicit formula dy/dx = -F_x/F_y. On the unit circle, the analytic slope is -x/y, which matches the numerical estimate when the step size is reasonable and the point is not near F_y=0 (vertical tangent).
```cpp
#include <bits/stdc++.h>
using namespace std;

using ImplicitF = function<double(double,double)>;

// Central difference derivative wrt y at fixed x
double partial_y(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(y));
    double h = 1e-6 * scale;
    return (F(x, y + h) - F(x, y - h)) / (2.0 * h);
}

// Newton iteration to find y s.t. F(x, y)=0, starting from y0.
// Returns pair(y, converged)
pair<double,bool> newton_solve_y_at_x(const ImplicitF& F, double x, double y0,
                                      int max_iter=50, double tol=1e-12) {
    double y = y0;
    for (int it = 0; it < max_iter; ++it) {
        double Fy = partial_y(F, x, y); // Jacobian (scalar) wrt y
        double Fval = F(x, y);          // residual at the current iterate
        if (!isfinite(Fy) || !isfinite(Fval)) return {y, false};
        if (abs(Fval) < tol) return {y, true};
        if (abs(Fy) < 1e-14) return {y, false}; // avoid division by near-zero
        double step = Fval / Fy;
        // Optional damping for stability
        double lambda = 1.0;
        double y_new = y - lambda * step;
        if (abs(y_new - y) < tol * max(1.0, abs(y))) {
            y = y_new; return {y, true};
        }
        y = y_new;
    }
    return {y, false};
}

int main(){
    // Example: circle F(x,y)=x^2 + y^2 - 1. For x=0.6, find y>0.
    ImplicitF F = [](double x, double y){ return x*x + y*y - 1.0; };

    double x = 0.6;
    double y0 = 0.5; // initial guess near the upper branch

    auto [y, ok] = newton_solve_y_at_x(F, x, y0);
    cout << fixed << setprecision(12);
    if (ok) {
        cout << "Converged y ≈ " << y << "\n";
        cout << "Residual F(x,y) ≈ " << F(x,y) << "\n";
        cout << "True y = sqrt(1 - x^2) = " << sqrt(1.0 - x*x) << "\n";
    } else {
        cout << "Newton did not converge. Last y = " << y << ", residual = " << F(x,y) << "\n";
    }
    return 0;
}
```
Holding x fixed, Newton’s method updates y by y_{k+1} = y_k − F/F_y. We approximate F_y with a central difference. For the circle, this converges quickly if the initial guess is on the correct branch and not near a vertical tangent where F_y≈0.
```cpp
#include <bits/stdc++.h>
using namespace std;

// Vector type aliases
using Vec = vector<double>;
using Mat = vector<vector<double>>; // row-major

// F: R^{n+m} -> R^{m}, given as F(x,y) -> vector of size m
using VectorF = function<Vec(const Vec&, const Vec&)>;

// Central difference Jacobian wrt y: m x m
Mat jacobian_y(const VectorF& F, const Vec& x, const Vec& y) {
    int m = (int)y.size();
    Mat J(m, Vec(m, 0.0));
    for (int j = 0; j < m; ++j) {
        Vec yh = y;
        double scale = max(1.0, abs(y[j]));
        double h = 1e-6 * scale;
        yh[j] = y[j] + h; Vec Fp = F(x, yh);
        yh[j] = y[j] - h; Vec Fm = F(x, yh);
        for (int i = 0; i < m; ++i) J[i][j] = (Fp[i] - Fm[i]) / (2.0*h);
    }
    return J;
}

// Central difference Jacobian wrt x: m x n
Mat jacobian_x(const VectorF& F, const Vec& x, const Vec& y) {
    int m = (int)y.size();
    int n = (int)x.size();
    Mat J(m, Vec(n, 0.0));
    for (int j = 0; j < n; ++j) {
        Vec xh = x;
        double scale = max(1.0, abs(x[j]));
        double h = 1e-6 * scale;
        xh[j] = x[j] + h; Vec Fp = F(xh, y);
        xh[j] = x[j] - h; Vec Fm = F(xh, y);
        for (int i = 0; i < m; ++i) J[i][j] = (Fp[i] - Fm[i]) / (2.0*h);
    }
    return J;
}

// Solve A X = B for X, where A is m x m and B is m x n,
// via Gaussian elimination with partial pivoting
Mat solve_multi_rhs(Mat A, Mat B) {
    int m = (int)A.size();
    int n = (int)B[0].size();
    // Augment A with B (m x (m+n))
    Mat Aug(m, Vec(m + n, 0.0));
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < m; ++j) Aug[i][j] = A[i][j];
        for (int j = 0; j < n; ++j) Aug[i][m + j] = B[i][j];
    }
    // Forward elimination
    for (int k = 0; k < m; ++k) {
        // Pivot
        int piv = k; double best = abs(Aug[k][k]);
        for (int i = k+1; i < m; ++i) if (abs(Aug[i][k]) > best) { best = abs(Aug[i][k]); piv = i; }
        if (best < 1e-14) throw runtime_error("Singular or ill-conditioned Jacobian D_yF");
        if (piv != k) swap(Aug[piv], Aug[k]);
        // Normalize row k
        double diag = Aug[k][k];
        for (int j = k; j < m + n; ++j) Aug[k][j] /= diag;
        // Eliminate below
        for (int i = k+1; i < m; ++i) {
            double factor = Aug[i][k];
            if (factor == 0.0) continue;
            for (int j = k; j < m + n; ++j) Aug[i][j] -= factor * Aug[k][j];
        }
    }
    // Back substitution
    for (int k = m-1; k >= 0; --k) {
        for (int i = 0; i < k; ++i) {
            double factor = Aug[i][k];
            if (factor == 0.0) continue;
            for (int j = k; j < m + n; ++j) Aug[i][j] -= factor * Aug[k][j];
        }
    }
    // Extract X
    Mat X(m, Vec(n, 0.0));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            X[i][j] = Aug[i][m + j];
    return X;
}

// Compute Dg = -(D_yF)^{-1} D_xF at (x,y)
Mat implicit_derivative(const VectorF& F, const Vec& x, const Vec& y) {
    Mat Jy = jacobian_y(F, x, y); // m x m
    Mat Jx = jacobian_x(F, x, y); // m x n
    // Solve Jy * D = -Jx  =>  D = -Jy^{-1} Jx
    for (auto& row : Jx) for (double& v : row) v = -v;
    Mat D = solve_multi_rhs(Jy, Jx); // D is m x n, representing Dg
    return D;
}

int main(){
    cout << fixed << setprecision(8);
    // Example system (m=2, n=2):
    // F1(x,y) = y1 + x1^2 - 1 = 0  =>  y1 = 1 - x1^2
    // F2(x,y) = y2 - exp(x2) = 0   =>  y2 = exp(x2)
    VectorF F = [](const Vec& x, const Vec& y){
        Vec out(2);
        out[0] = y[0] + x[0]*x[0] - 1.0;
        out[1] = y[1] - exp(x[1]);
        return out;
    };

    Vec x = {0.6, 0.0};
    Vec y = {1.0 - x[0]*x[0], exp(x[1])}; // exact solution at x

    // Check residual
    Vec r = F(x,y);
    cout << "Residuals: [" << r[0] << ", " << r[1] << "]\n";

    Mat Dg = implicit_derivative(F, x, y); // Dg is 2 x 2
    cout << "Dg = -(D_yF)^{-1} D_xF at (x,y):\n";
    for (int i = 0; i < (int)Dg.size(); ++i) {
        for (int j = 0; j < (int)Dg[0].size(); ++j) cout << setw(12) << Dg[i][j] << ' ';
        cout << '\n';
    }
    cout << "Expected: [[-2*x1, 0],[0, exp(x2)]] = [[" << -2*x[0] << ", 0], [0, " << exp(x[1]) << "]]\n";
    return 0;
}
```
For F: R^{n+m}→R^{m}, we compute Jacobians D_yF and D_xF via central differences and solve D_yF·Dg = −D_xF using Gaussian elimination with partial pivoting. The example has analytical Dg = [[-2x1, 0], [0, e^{x2}]], which the numerical code reproduces. This generalizes −F_x/F_y to higher dimensions.