How I Study AI - Learn AI Papers & Lectures the Easy Way
Math · Intermediate

Implicit Differentiation & Implicit Function Theorem

Key Points

  • Implicit differentiation lets you find slopes and higher derivatives even when y is given indirectly by an equation F(x,y) = 0.
  • The key formula for a single equation is dy/dx = −F_x/F_y, provided F_y ≠ 0 at the point of interest.
  • The Implicit Function Theorem guarantees a local function y = g(x) exists if F(x0,y0) = 0 and the partial derivative with respect to y is nonzero (or det(D_y F) ≠ 0 in higher dimensions).
  • Numerically, you can compute slopes using finite differences for the partials and evaluate −F_x/F_y; good step sizes and checks for F_y ≈ 0 are crucial.
  • Newton’s method can solve for y at a fixed x by iterating y_{k+1} = y_k − F/F_y; in multiple variables it becomes solving J_y Δy = −F.
  • In higher dimensions, the derivative matrix is Dg(x) = −(D_y F)^{-1} D_x F; computing it requires building Jacobians and solving a linear system.
  • Geometrically, F(x,y) = 0 is a level curve; the tangent direction is perpendicular to the gradient ∇F, and its slope encodes dy/dx.
  • Watch out for points where F_y = 0 (vertical tangents or cusps); the implicit formula can fail there, or you can switch to solving for x as a function of y.

Prerequisites

  • →Single-variable calculus (derivatives, chain rule) — Implicit differentiation relies on the chain rule and understanding slopes.
  • →Multivariable calculus (partial derivatives, gradients) — Formulas use partial derivatives and gradients/Jacobians.
  • →Linear algebra (matrices, determinants, solving linear systems) — The Implicit Function Theorem and multivariate case require Jacobians and matrix inverses/solves.
  • →Numerical methods (finite differences, Newton’s method) — Code computes partials numerically and solves nonlinear equations iteratively.
  • →Floating-point arithmetic and stability — Choosing step sizes and handling ill-conditioning affect accuracy and robustness.
  • →C++ functions and std::function — We pass functions as arguments to evaluators and solvers.
  • →Gaussian elimination with pivoting — Needed to solve D_yF Δy = −F efficiently and stably.

Detailed Explanation


01 Overview

Hook: Suppose you know a curve only by a rule like x^2 + y^2 = 1 (a circle). You can see the shape, but there’s no single formula y = f(x) for the entire circle. How do you still get the slope at a point?
Concept: Implicit differentiation is the technique for differentiating relationships where variables are tied together by an equation F(x, y) = 0 rather than an explicit y = f(x). It works by differentiating both sides with respect to x and using the chain rule to peel out dy/dx. This idea generalizes beautifully to many variables and many equations.
Example: For the circle F(x,y) = x^2 + y^2 - 1 = 0, differentiating gives 2x + 2y (dy/dx) = 0, so dy/dx = -x/y. At the point (0.6, 0.8), the slope is -0.75.

02 Intuition & Analogies

Hook: Think of a hiking trail defined by a painted line on the ground. You don’t have the trail’s elevation as a neat y = f(x) formula, but at each step you can still tell how steeply you’re going up or down.
Concept: An equation F(x,y)=0 defines a level set—the set of all points where F takes the same value. The gradient ∇F points in the direction of steepest increase of F and is perpendicular to this level set. Because the curve is perpendicular to ∇F, the slope of the tangent line along the curve is determined by how F changes with x versus how it changes with y. The ratio of these changes, -F_x/F_y, is exactly the slope dy/dx when moving along the curve.
Example: Imagine F as a “temperature map” on a floor, and you’re walking along the 0-degree isotherm (the curve where F = 0). If increasing x makes F go up by F_x and increasing y makes it go up by F_y, then to stay on the 0-degree line, a small step Δx must be balanced by a step Δy that keeps F unchanged: F_x Δx + F_y Δy ≈ 0. Solving gives Δy/Δx ≈ −F_x/F_y, which is the intuitive heart of implicit differentiation. For multiple outputs y in terms of x, the same balancing act holds, but you use matrices (Jacobians) instead of scalars.

03 Formal Definition

Hook: The algebraic trick you used for the circle applies generally and leads to a clean formula.
Concept: Let F: R^{n+m} → R^m be continuously differentiable (C^1). Split the variables as x ∈ R^n, y ∈ R^m. At a point (x0, y0) with F(x0, y0) = 0 and the Jacobian with respect to y, D_y F(x0, y0), invertible (det(D_y F) ≠ 0), the Implicit Function Theorem states that there are neighborhoods U of x0 and V of y0 and a unique C^1 function g: U → V such that F(x, g(x)) = 0 for all x ∈ U. Moreover, its derivative is Dg(x) = −(D_y F(x, g(x)))^{-1} D_x F(x, g(x)).
Example (m = 1, n = 1): If F(x, y) = 0 and F_y(x0, y0) ≠ 0, then near (x0, y0) there exists y = g(x) with g'(x0) = −F_x(x0, y0) / F_y(x0, y0). Differentiating F(x, g(x)) ≡ 0 via the chain rule gives F_x + F_y g' = 0, and hence the formula.

04 When to Use

Hook: Whenever it’s tough or impossible to solve for y explicitly, but you still need slopes, tangents, or sensitivity to x, implicit differentiation is the tool of choice.
Concept: Use implicit differentiation when a relationship is naturally defined as a constraint F(x,y)=0 (or F(x,y)=c). It is especially useful near points where the constraint surface behaves nicely (nonzero F_y or invertible D_y F). In modeling, engineering, and graphics, constraints often define curves and surfaces; differentiating them implicitly yields tangents, normals, and rates of change.
Examples:

  • Geometry: Find the slope of the tangent line to a conic (circle, ellipse) without solving for y.
  • Physics: Constraints like conservation laws or contact constraints define implicit relations; velocities and accelerations follow from differentiating constraints.
  • Optimization: Karush–Kuhn–Tucker systems or equation constraints define optimal solutions; sensitivity of solutions to parameters can be obtained via implicit differentiation.
  • Numerical computing: When you solve F(x,y)=0 for y with Newton’s method, the linearization F(x+Δx,y+Δy)≈F + D_xF Δx + D_yF Δy gives Δy in terms of Δx via D_yF, which is precisely the implicit derivative.

⚠️ Common Mistakes

Hook: Most errors come from forgetting what is held constant and where formulas are valid.
Concept and fixes:

  • Ignoring the condition F_y ≠ 0 (or det(D_yF) ≠ 0). If F_y=0 at a point, dy/dx = -F_x/F_y blows up or is undefined. Fix: Check the condition; if it fails, consider solving x as a function of y (swap roles) or analyze singular behavior.
  • Differentiating as if y were independent of x. Remember y=y(x); apply the chain rule so terms with y pick up factors of dy/dx (or Dg).
  • Using points not on the curve. The formula applies at points satisfying F(x,y)=0. Fix: First verify F≈0 numerically before computing derivatives.
  • Numerical step sizes that are too big or too small. Finite differences with a bad h cause truncation or rounding error. Fix: Use central differences with a scaled h (e.g., 10^{-6} times variable scale) and test sensitivity.
  • Poor Newton initial guesses. Newton’s method can diverge if the starting y is far from the solution or if D_yF is ill-conditioned. Fix: Use damping, bracketed methods for 1D, or better initial guesses.
  • Confusing partials with totals. F_x and F_y are partial derivatives holding the other variable fixed; dy/dx is a total derivative along the constraint. Keep roles clear in formulas and code.

Key Formulas

Implicit derivative (scalar)

dy/dx = −F_x(x,y) / F_y(x,y)

Explanation: For a single equation F(x,y) = 0 with F_y ≠ 0, the slope of y as a function of x is the negative ratio of partial derivatives. This comes from differentiating F(x, y(x)) ≡ 0 and applying the chain rule.

Second derivative (scalar)

y'' = −( F_xx + 2 F_xy y' + F_yy (y')^2 ) / F_y

Explanation: Differentiating F_x + F_y y' = 0 again and using equality of mixed partials yields this expression. It gives curvature information of the implicitly defined curve where F_y ≠ 0.

Implicit Function Theorem (existence)

If F(x0,y0) = 0, F ∈ C^1, and det(D_y F(x0,y0)) ≠ 0, then ∃ g: U → V with F(x, g(x)) = 0

Explanation: When the Jacobian with respect to y is invertible at a solution, there exists a local function y=g(x) solving the constraint. The statement is local: g is guaranteed only near (x0​,y0​).

Implicit Function Theorem (derivative)

Dg(x) = −(D_y F(x, g(x)))^{-1} D_x F(x, g(x))

Explanation: The derivative (Jacobian) of the implicit function g is obtained by solving D_y F · Dg + D_x F = 0 for Dg. This generalizes −F_x/F_y to higher dimensions.

First-order linearization

F(x + Δx, y + Δy) ≈ F(x,y) + D_x F(x,y) Δx + D_y F(x,y) Δy

Explanation: Near a point, F behaves like an affine map composed of its Jacobians. Setting this approximation to zero under the constraint provides the linear relationship between small changes in x and y.

Newton iteration (fixed x)

y_{k+1} = y_k − (D_y F(x, y_k))^{-1} F(x, y_k)

Explanation: Holding x constant, Newton’s method updates y by solving a linear system with the Jacobian D_y F. This converges quadratically near well-behaved solutions.

Nonsingularity condition

det(D_y F(x0,y0)) ≠ 0

Explanation: This condition ensures D_y F is invertible, a prerequisite for the Implicit Function Theorem and for stable numerical solution of y given x.

Orthogonality of gradient and tangent

∇F(x,y) = (F_x, F_y) ⊥ tangent of {F = 0}

Explanation: The tangent vector lies in the kernel of the Jacobian of F; equivalently, it is orthogonal to the gradient. This geometric fact underlies the slope formula.

Complexity Analysis

Computationally, implicit differentiation in code involves two main tasks: evaluating partial derivatives (to form Jacobians) and solving small linear systems. If partials are provided analytically, each evaluation costs O(1). When approximated by central finite differences, computing a single partial for a scalar F costs two function evaluations. For F: R^{n+m} → R^m, building D_y F (m×m) and D_x F (m×n) via central differences needs O(m·(m+n)) function evaluations. If the cost of one F evaluation is C_F, Jacobian construction costs O(m·(m+n)·C_F).

Once D_y F is formed, computing Dg = −(D_y F)^{-1} D_x F reduces to solving a linear system with multiple right-hand sides. Using Gaussian elimination with partial pivoting, solving A X = B (A is m×m, B is m×n) costs O(m^3 + m^2·n) time and O(m^2) space (to store A and its factors), plus O(m·n) for B. In the common scalar case (m = 1), this is O(1).

For Newton’s method at fixed x, each iteration requires evaluating F and D_y F and solving D_y F Δy = −F, so the per-iteration cost is O(m·(m+n)·C_F + m^3). Convergence is typically quadratic near a well-conditioned root, so the number of iterations is small (often < 10), but an ill-conditioned D_y F (large condition number) can slow or stall convergence and amplify floating-point errors. Space usage is dominated by storing the Jacobians (O(m^2)) and any factorization (also O(m^2)). For robust performance, use central differences with scale-aware step sizes, pivoted elimination, and checks for small pivots; these reduce numerical error and improve stability without changing the big-O complexity.

Code Examples

Compute dy/dx for F(x,y)=0 using finite differences (scalar case)
#include <bits/stdc++.h>
using namespace std;

// Type alias for a scalar implicit function F(x,y)
using ImplicitF = function<double(double,double)>;

// Central difference partial derivative with respect to x at (x,y)
double partial_x(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(x));
    double h = 1e-6 * scale; // scale-aware step
    return (F(x + h, y) - F(x - h, y)) / (2.0 * h);
}

// Central difference partial derivative with respect to y at (x,y)
double partial_y(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(y));
    double h = 1e-6 * scale;
    return (F(x, y + h) - F(x, y - h)) / (2.0 * h);
}

// Compute dy/dx = -F_x / F_y at (x,y). Throws if F_y is too small.
double implicit_slope(const ImplicitF& F, double x, double y) {
    double Fx = partial_x(F, x, y);
    double Fy = partial_y(F, x, y);
    if (!isfinite(Fx) || !isfinite(Fy)) {
        throw runtime_error("Non-finite partial derivative encountered");
    }
    if (abs(Fy) < 1e-12) {
        throw runtime_error("F_y is near zero; slope may be undefined or vertical.");
    }
    return -Fx / Fy;
}

int main() {
    // Example: Unit circle F(x,y) = x^2 + y^2 - 1 = 0
    ImplicitF F = [](double x, double y){ return x*x + y*y - 1.0; };

    double x = 0.6;
    double y = sqrt(max(0.0, 1.0 - x*x)); // upper branch y > 0

    // Verify the point is on the curve (approximately)
    double val = F(x, y);
    cout << fixed << setprecision(6);
    cout << "F(x,y) = " << val << " (should be ~0)\n";

    try {
        double slope = implicit_slope(F, x, y);
        cout << "dy/dx at (" << x << ", " << y << ") ≈ " << slope << "\n";
        cout << "Expected (analytic) slope = -x/y = " << -x/y << "\n";
    } catch (const exception& e) {
        cerr << "Error: " << e.what() << "\n";
    }
    return 0;
}

We approximate the partial derivatives F_x and F_y by central differences and apply the implicit formula dy/dx = -F_x/F_y. On the unit circle, the analytic slope is -x/y, which matches the numerical estimate when the step size is reasonable and the point is not near F_y=0 (vertical tangent).

Time: O(1) per slope evaluation (a constant number of F calls)
Space: O(1)
Newton’s method to solve F(x, y)=0 for y at fixed x (scalar y)
#include <bits/stdc++.h>
using namespace std;

using ImplicitF = function<double(double,double)>;

// Central difference derivative wrt y at fixed x
double partial_y(const ImplicitF& F, double x, double y) {
    double scale = max(1.0, abs(y));
    double h = 1e-6 * scale;
    return (F(x, y + h) - F(x, y - h)) / (2.0 * h);
}

// Newton iteration to find y s.t. F(x, y)=0, starting from y0.
// Returns pair(y, converged)
pair<double,bool> newton_solve_y_at_x(const ImplicitF& F, double x, double y0,
                                      int max_iter=50, double tol=1e-12) {
    double y = y0;
    for (int it = 0; it < max_iter; ++it) {
        double Fy   = partial_y(F, x, y);
        double Fval = F(x, y);              // residual at the current iterate
        if (!isfinite(Fy) || !isfinite(Fval)) return {y, false};
        if (abs(Fval) < tol) return {y, true};
        if (abs(Fy) < 1e-14) return {y, false}; // avoid division by near-zero
        double step = Fval / Fy;
        // Optional damping for stability
        double lambda = 1.0;
        double y_new = y - lambda * step;
        if (abs(y_new - y) < tol * max(1.0, abs(y))) {
            y = y_new;
            return {y, true};
        }
        y = y_new;
    }
    return {y, false};
}

int main(){
    // Example: circle F(x,y) = x^2 + y^2 - 1. For x = 0.6, find y > 0.
    ImplicitF F = [](double x, double y){ return x*x + y*y - 1.0; };

    double x = 0.6;
    double y0 = 0.5; // initial guess near the upper branch

    auto [y, ok] = newton_solve_y_at_x(F, x, y0);
    cout << fixed << setprecision(12);
    if (ok) {
        cout << "Converged y ≈ " << y << "\n";
        cout << "Residual F(x,y) ≈ " << F(x,y) << "\n";
        cout << "True y = sqrt(1 - x^2) = " << sqrt(1.0 - x*x) << "\n";
    } else {
        cout << "Newton did not converge. Last y = " << y << ", residual = " << F(x,y) << "\n";
    }
    return 0;
}

Holding x fixed, Newton’s method updates y by y_{k+1} = y_k − F/F_y. We approximate F_y with a central difference. For the circle, this converges quickly if the initial guess is on the correct branch and not near a vertical tangent where F_y≈0.

Time: O(I) iterations, each O(1) with numeric partials (so O(I) overall)
Space: O(1)
Multivariate implicit derivative: compute Dg = -(D_yF)^{-1} D_xF
#include <bits/stdc++.h>
using namespace std;

// Vector/matrix type aliases
using Vec = vector<double>;
using Mat = vector<vector<double>>; // row-major

// F: R^{n+m} -> R^m, given as F(x,y) -> vector of size m
using VectorF = function<Vec(const Vec&, const Vec&)>;

// Central difference Jacobian wrt y: m x m
Mat jacobian_y(const VectorF& F, const Vec& x, const Vec& y) {
    int m = (int)y.size();
    Mat J(m, Vec(m, 0.0));
    for (int j = 0; j < m; ++j) {
        Vec yh = y;
        double scale = max(1.0, abs(y[j]));
        double h = 1e-6 * scale;
        yh[j] = y[j] + h; Vec Fp = F(x, yh);
        yh[j] = y[j] - h; Vec Fm = F(x, yh);
        for (int i = 0; i < m; ++i) J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h);
    }
    return J;
}

// Central difference Jacobian wrt x: m x n
Mat jacobian_x(const VectorF& F, const Vec& x, const Vec& y) {
    int m = (int)y.size();
    int n = (int)x.size();
    Mat J(m, Vec(n, 0.0));
    for (int j = 0; j < n; ++j) {
        Vec xh = x;
        double scale = max(1.0, abs(x[j]));
        double h = 1e-6 * scale;
        xh[j] = x[j] + h; Vec Fp = F(xh, y);
        xh[j] = x[j] - h; Vec Fm = F(xh, y);
        for (int i = 0; i < m; ++i) J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h);
    }
    return J;
}

// Solve A X = B for X, where A is m x m and B is m x n,
// via Gaussian elimination with partial pivoting
Mat solve_multi_rhs(Mat A, Mat B) {
    int m = (int)A.size();
    int n = (int)B[0].size();
    // Augment A with B (m x (m+n))
    Mat Aug(m, Vec(m + n, 0.0));
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < m; ++j) Aug[i][j] = A[i][j];
        for (int j = 0; j < n; ++j) Aug[i][m + j] = B[i][j];
    }
    // Forward elimination
    for (int k = 0; k < m; ++k) {
        // Pivot: pick the largest entry in column k at or below the diagonal
        int piv = k; double best = abs(Aug[k][k]);
        for (int i = k + 1; i < m; ++i)
            if (abs(Aug[i][k]) > best) { best = abs(Aug[i][k]); piv = i; }
        if (best < 1e-14) throw runtime_error("Singular or ill-conditioned Jacobian D_yF");
        if (piv != k) swap(Aug[piv], Aug[k]);
        // Normalize row k
        double diag = Aug[k][k];
        for (int j = k; j < m + n; ++j) Aug[k][j] /= diag;
        // Eliminate below
        for (int i = k + 1; i < m; ++i) {
            double factor = Aug[i][k];
            if (factor == 0.0) continue;
            for (int j = k; j < m + n; ++j) Aug[i][j] -= factor * Aug[k][j];
        }
    }
    // Back substitution (eliminate above the diagonal)
    for (int k = m - 1; k >= 0; --k) {
        for (int i = 0; i < k; ++i) {
            double factor = Aug[i][k];
            if (factor == 0.0) continue;
            for (int j = k; j < m + n; ++j) Aug[i][j] -= factor * Aug[k][j];
        }
    }
    // Extract X
    Mat X(m, Vec(n, 0.0));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            X[i][j] = Aug[i][m + j];
    return X;
}

// Compute Dg = -(D_yF)^{-1} D_xF at (x,y)
Mat implicit_derivative(const VectorF& F, const Vec& x, const Vec& y) {
    Mat Jy = jacobian_y(F, x, y); // m x m
    Mat Jx = jacobian_x(F, x, y); // m x n
    // Solve Jy * D = -Jx  =>  D = -Jy^{-1} Jx
    for (auto& row : Jx) for (double& v : row) v = -v;
    Mat D = solve_multi_rhs(Jy, Jx); // D is m x n, representing Dg
    return D;
}

int main(){
    cout << fixed << setprecision(8);
    // Example system (m=2, n=2):
    // F1(x,y) = y1 + x1^2 - 1 = 0  =>  y1 = 1 - x1^2
    // F2(x,y) = y2 - exp(x2) = 0   =>  y2 = exp(x2)
    VectorF F = [](const Vec& x, const Vec& y){
        Vec out(2);
        out[0] = y[0] + x[0]*x[0] - 1.0;
        out[1] = y[1] - exp(x[1]);
        return out;
    };

    Vec x = {0.6, 0.0};
    Vec y = {1.0 - x[0]*x[0], exp(x[1])}; // exact solution at x

    // Check residual
    Vec r = F(x, y);
    cout << "Residuals: [" << r[0] << ", " << r[1] << "]\n";

    Mat Dg = implicit_derivative(F, x, y); // Dg is 2 x 2
    cout << "Dg = -(D_yF)^{-1} D_xF at (x,y):\n";
    for (int i = 0; i < (int)Dg.size(); ++i) {
        for (int j = 0; j < (int)Dg[0].size(); ++j) cout << setw(12) << Dg[i][j] << ' ';
        cout << '\n';
    }
    cout << "Expected: [[-2*x1, 0], [0, exp(x2)]] = [[" << -2*x[0] << ", 0], [0, " << exp(x[1]) << "]]\n";
    return 0;
}

For F: R^{n+m}→R^{m}, we compute Jacobians D_yF and D_xF via central differences and solve D_yF·Dg = −D_xF using Gaussian elimination with partial pivoting. The example has analytical Dg = [[-2x1, 0], [0, e^{x2}]], which the numerical code reproduces. This generalizes −F_x/F_y to higher dimensions.

Time: O(m·(m+n)·C_F + m^3 + m^2·n) per evaluation of Dg
Space: O(m^2 + m·n)
#implicit differentiation#implicit function theorem#jacobian#partial derivatives#newton method#finite differences#tangent line#level set#nonsingular#gaussian elimination#numerical differentiation#stability#condition number#local linearization#hessian