How I Study AI - Learn AI Papers & Lectures the Easy Way
Math · Intermediate

Inner Products & Norms

Key Points

  • An inner product measures how much two vectors point in the same direction; in ℝ^n it is the dot product.
  • A norm measures the length of a vector; common norms include L1 (Manhattan), L2 (Euclidean), and L∞ (max).
  • The Cauchy–Schwarz inequality bounds the inner product by the product of lengths, implying the cosine angle formula.
  • Lp norms generalize length: ‖x‖_p = (∑ |x_i|^p)^{1/p} for p ≥ 1, and ‖x‖_∞ = max |x_i|.
  • In finite dimensions, all norms are equivalent up to constant factors, but they emphasize different geometric aspects.
  • Dot products and norms power key operations like projection, normalization, angle computation, and distances.
  • Numerically, careful accumulation (e.g., Kahan summation) improves accuracy for dot products and L2 norms.
  • Computing dot products and norms over n-dimensional vectors takes O(n) time and O(1) extra space.

Prerequisites

  • →Basic algebra and absolute value — Lp norms rely on powers and absolute values; manipulating sums and products is essential.
  • →Vectors in ℝ^n — Understanding coordinates, vector addition, and scalar multiplication is required to define inner products and norms.
  • →Trigonometry (cosine) — The angle between vectors is defined through the cosine of the angle via the normalized dot product.
  • →Inequalities — Cauchy–Schwarz, triangle inequality, Hölder, and Minkowski are inequality results central to proofs and bounds.
  • →Floating-point arithmetic — Implementations are numerical; understanding precision, rounding, and overflow/underflow avoids bugs.
  • →C++ functions, vectors, and exceptions — To implement and safely use dot products and norms with correct error handling.

Detailed Explanation


01 Overview

Inner products and norms are fundamental tools for measuring similarity and size in vector spaces. The inner product, often the dot product in real Euclidean space, tells you how aligned two vectors are: if it is large and positive, the vectors point in a similar direction; if zero, they are orthogonal (perpendicular); if negative, they point in opposing directions. A norm, on the other hand, assigns a nonnegative length to a vector. Common norms include the L1 norm (sum of absolute values), the L2 norm (Euclidean length), and the L∞ norm (largest absolute component). These measurements underpin geometry, data analysis, machine learning, optimization, and numerical computing.

The Cauchy–Schwarz inequality is a cornerstone result that relates inner products to norms: the absolute inner product cannot exceed the product of the vectors' lengths. This leads directly to defining the angle between vectors via the cosine formula. Lp norms generalize the notion of length and let us emphasize different aspects of data: L1 emphasizes total magnitude, L2 weights outliers more heavily, and L∞ focuses on the worst coordinate. In practice, these ideas translate into algorithms for projecting vectors onto directions, normalizing to unit length, computing distances, and building robust models. All of these operations can be implemented efficiently in C++ with linear-time passes over the data.

02 Intuition & Analogies

Imagine two arrows on a map. The dot product tells you how much of arrow A goes in the direction of arrow B—like asking, “How much does A help me move along B’s path?” If A is perfectly aligned with B, the contribution is maximal; if A is perpendicular, it contributes nothing in B’s direction; if A points the opposite way, it subtracts. That’s why the dot product captures similarity in direction. Now think about norms as different ways to measure the length of a path:

  • L2 (Euclidean) is the straight-line distance you’d measure with a ruler between the start and end of an arrow.
  • L1 (Manhattan) is like walking city blocks: you measure distance by summing how much you go east-west and north-south.
  • L∞ (Chebyshev or max norm) is the number of king moves in chess: because the king can move diagonally, the move count is determined by the larger coordinate difference; effectively the longest coordinate dominates.

These norms change what you consider “close.” Under L1, two points are close if the total of the coordinate differences is small; under L∞, they’re close if no single coordinate differs much. Cauchy–Schwarz is like a fairness rule: no matter how you shuffle components, the inner product’s magnitude can never exceed the product of the lengths. Geometrically, it’s the reason the cosine of the angle between two vectors is between −1 and 1. Practically, it protects algorithms from producing impossible angles or similarities. Finally, projection is the idea of dropping a perpendicular shadow from one arrow onto another: how much of u lies along v? This shadow is essential in decomposing motion, fitting lines (least squares), and filtering signals. Normalizing is just scaling an arrow to unit length so you compare only direction, not magnitude.

03 Formal Definition

Let V be a real vector space. An inner product is a function ⟨·,·⟩: V × V → ℝ such that for all x, y, z ∈ V and all scalars α ∈ ℝ: (1) Symmetry: ⟨x, y⟩ = ⟨y, x⟩; (2) Linearity in each argument (bilinearity over ℝ): ⟨αx + y, z⟩ = α⟨x, z⟩ + ⟨y, z⟩, and similarly in the second slot; (3) Positive-definiteness: ⟨x, x⟩ ≥ 0 with equality iff x = 0. In ℝ^n, the standard inner product (dot product) is ⟨x, y⟩ = ∑_{i=1}^n x_i y_i. Given an inner product, its induced norm is defined by ‖x‖ = √⟨x, x⟩, satisfying: (i) Nonnegativity and definiteness: ‖x‖ ≥ 0 and ‖x‖ = 0 iff x = 0; (ii) Homogeneity: ‖αx‖ = |α|·‖x‖; (iii) Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖. In ℝ^n with the standard inner product, this induced norm is the L2 norm. More generally, for p ≥ 1, the Lp norm on ℝ^n is ‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p}, and the L∞ norm is ‖x‖_∞ = max_{1≤i≤n} |x_i|. While L2 is induced by the standard inner product, the Lp norms for p ≠ 2 are not induced by any inner product. The Cauchy–Schwarz inequality states |⟨x, y⟩| ≤ ‖x‖·‖y‖, with equality iff x and y are linearly dependent (one is a scalar multiple of the other).

04 When to Use

Use the dot product when you need to measure similarity or alignment, compute projections, find angles, or perform least-squares fitting. For example, cosine similarity in information retrieval is just the normalized dot product, making it robust to differences in document lengths. Choose an L2 norm when Euclidean geometry is appropriate or when you want smooth optimization landscapes (e.g., ridge regression). It penalizes large coordinates quadratically, making outliers more influential. Use L1 when robustness and sparsity matter (e.g., LASSO, median-related estimators); it treats all deviations linearly and tends to drive many components to exactly zero in optimization. Select L∞ to minimize worst-case deviation or to enforce uniform bounds across all coordinates (e.g., Chebyshev approximation, max-error constraints). Lp with other p values interpolate these behaviors to fit application-specific error models. In algorithms:

  • Distances for nearest-neighbor search can use any Lp metric depending on problem geometry and noise.
  • Normalization to unit length (under L2) is standard before computing cosine similarity to compare directions only.
  • Projection decomposes motion/variables into components along a direction and orthogonal to it, critical in Gram–Schmidt, PCA, and signal processing.
  • Error bounds and stability often rely on Cauchy–Schwarz and triangle inequality to estimate accumulated errors.

⚠️ Common Mistakes

  • Confusing the dot product with elementwise (Hadamard) multiplication; the dot product sums the products across coordinates to obtain a scalar.
  • Using p < 1 in an “Lp norm.” For p < 1, the expression fails the triangle inequality; it is not a norm and changes algorithmic guarantees.
  • Forgetting absolute values in L1/Lp definitions. Norms always use absolute values to ensure nonnegativity and orientation independence.
  • Ignoring zero vectors when normalizing or computing angles. Division by ‖x‖ can fail if x is zero or nearly zero; add checks and tolerances.
  • Assuming all norms are induced by inner products. Only L2 (under the standard inner product) is; L1 and L∞ are not inner-product-induced.
  • Numerical instability in long dot products or L2 norms with mixed magnitudes (catastrophic cancellation). Use double precision and compensated summation (e.g., Kahan), or reorder terms.
  • Overflow/underflow when squaring large or small values for L2. Consider scaling or hypot-like techniques.
  • Mismatched vector sizes in implementations. Validate dimensions before computing dot products or distances.
  • Misinterpreting the cosine similarity scale. It lies in [−1, 1]; do not compare raw dot products across vectors of different lengths; normalize first.
  • Overgeneralizing norm equivalence: in finite dimensions norms are equivalent up to constants depending on n, but those constants can be large; algorithmic behavior may still differ materially for finite n.

Key Formulas

Dot Product

⟨x, y⟩ = ∑_{i=1}^n x_i y_i

Explanation: The inner product in Euclidean space equals the sum of coordinate-wise products. It measures alignment of two vectors.

Induced Norm

‖x‖ = √⟨x, x⟩

Explanation: The length of a vector under an inner product is the square root of the inner product of the vector with itself. This is the L2 norm under the standard dot product.

Lp Norm

‖x‖_p = (∑_{i=1}^n |x_i|^p)^{1/p},  p ≥ 1

Explanation: Generalized family of norms parameterized by p. Larger p emphasizes bigger coordinates more strongly.

Infinity Norm

‖x‖_∞ = max_{1≤i≤n} |x_i|

Explanation: The L∞ norm is the size of the largest-magnitude coordinate. It captures worst-case deviation.

Cauchy–Schwarz Inequality

|⟨x, y⟩| ≤ ‖x‖ ‖y‖

Explanation: The absolute inner product is at most the product of lengths, ensuring the cosine of the angle is within [−1, 1].

Angle Between Vectors

cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖)

Explanation: Relates inner product and norms to the cosine of the angle between two nonzero vectors. Useful for similarity measures.

Projection

proj_v(u) = (⟨u, v⟩ / ⟨v, v⟩) v

Explanation: The component of u along v is given by scaling v by the ratio of inner products. Requires v ≠ 0.

Triangle Inequality

∥x+y∥≤∥x∥+∥y∥

Explanation: A defining property of norms; it ensures distances satisfy the triangle property and bounds sums of vectors.

Hölder Inequality

∑_{i=1}^n |a_i b_i| ≤ (∑_{i=1}^n |a_i|^p)^{1/p} (∑_{i=1}^n |b_i|^q)^{1/q},  1/p + 1/q = 1

Explanation: Generalizes Cauchy–Schwarz (the case p=q=2). It bounds the sum of products using Lp and Lq norms.

Minkowski Inequality

‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p,  p ≥ 1

Explanation: Triangle inequality specialized to Lp norms. Fundamental in proving that Lp is a norm.

Lp Limit

lim_{p→∞} ‖x‖_p = ‖x‖_∞

Explanation: As p grows, the largest coordinate dominates the p-norm, converging to the infinity norm.

Finite-Dimensional Norm Equivalence

‖x‖_q ≤ n^{1/q − 1/p} ‖x‖_p,  1 ≤ q ≤ p ≤ ∞

Explanation: Relates different Lp norms by dimension-dependent constants, showing all norms are equivalent up to scaling in finite dimensions.

Lp Distance

d_p(x, y) = ‖x − y‖_p

Explanation: Defines a metric from an Lp norm. It measures how far two vectors are under the chosen geometry.

Complexity Analysis

For vectors of length n stored in contiguous arrays (e.g., std::vector<double>), computing the dot product, any Lp norm (with fixed p), and the L∞ norm all take O(n) time because each requires a single pass over the components and a constant amount of work per element. Memory overhead is O(1) extra space beyond the input vectors because we maintain only a few scalars (accumulators) and possibly a running maximum. When p is not fixed and pow is used for |x_i|^p, the hidden constant factor increases due to exponentiation, but the asymptotic time remains O(n). Specialized implementations for p = 1, 2, and ∞ avoid pow and are significantly faster. The projection of u onto v consists of two dot products and one vector scaling, which is still O(n) time and O(1) extra space.

Numerical stability can affect performance marginally. Kahan summation adds a few arithmetic operations per iteration (still O(n)) but can reduce floating-point error dramatically for sums with disparate magnitudes.

If vectors are sparse (most entries zero) and stored in compressed form, complexity is O(nnz), where nnz is the number of nonzero entries, which can be much less than n. Caching and memory bandwidth matter in practice: these operations are memory-bound for large n, so laying out data contiguously and enabling compiler vectorization (e.g., -O3) may give large speedups. Parallelization across cores (e.g., with OpenMP) can reduce wall-clock time but not asymptotic complexity. Space complexity remains O(1) auxiliary in all of these basic operations because no data structures proportional to n are created.

Code Examples

Dot Product and Common Norms (L1, L2, Lp, L∞) with Basic Numerical Care
#include <iostream>
#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>
#include <limits>

// Compute dot product with Kahan compensated summation for better numerical stability
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("dot: size mismatch");
    double sum = 0.0;
    double c = 0.0; // compensation for lost low-order bits
    for (size_t i = 0; i < a.size(); ++i) {
        double prod = a[i] * b[i];
        double y = prod - c;
        double t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}

// L1 norm: sum of absolute values
double norm1(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += std::abs(v);
    return s;
}

// L2 norm: Euclidean length using Kahan on squares
double norm2(const std::vector<double>& x) {
    double sumsq = 0.0, c = 0.0;
    for (double v : x) {
        double term = v * v;
        double y = term - c;
        double t = sumsq + y;
        c = (t - sumsq) - y;
        sumsq = t;
    }
    return std::sqrt(sumsq);
}

// L-infinity norm: maximum absolute value
double norm_inf(const std::vector<double>& x) {
    double m = 0.0;
    for (double v : x) m = std::max(m, std::abs(v));
    return m;
}

// General Lp norm for p >= 1 (uses pow; slower than specialized versions)
double norm_p(const std::vector<double>& x, double p) {
    if (!(p >= 1.0)) throw std::invalid_argument("norm_p: p must be >= 1");
    if (std::isinf(p)) return norm_inf(x);

    double s = 0.0;
    // Basic accumulation; for extreme values, consider scaling to reduce overflow/underflow
    for (double v : x) s += std::pow(std::abs(v), p);
    return std::pow(s, 1.0 / p);
}

int main() {
    std::vector<double> x = {3.0, -1.0, 2.0, 0.5};
    std::vector<double> y = {1.0, 4.0, -2.0, 2.0};

    std::cout << "dot(x, y) = " << dot(x, y) << "\n";

    std::cout << "||x||_1 = " << norm1(x) << "\n";
    std::cout << "||x||_2 = " << norm2(x) << "\n";
    std::cout << "||x||_3 = " << norm_p(x, 3.0) << "\n";
    std::cout << "||x||_inf = " << norm_inf(x) << "\n";

    // Simple distance example under L2
    std::vector<double> diff(x.size());
    for (size_t i = 0; i < x.size(); ++i) diff[i] = x[i] - y[i];
    std::cout << "d_2(x, y) = ||x - y||_2 = " << norm2(diff) << "\n";

    return 0;
}

This program implements the dot product with Kahan summation for stability and computes common vector norms: L1, L2, Lp, and L∞. It demonstrates their values on sample vectors and shows how to get an L2 distance by taking the norm of a difference vector. Specialized versions for p = 1, 2, and ∞ avoid pow and are faster and more stable.

Time: O(n) · Space: O(1)
Cauchy–Schwarz Check, Cosine Similarity, and Projection
#include <iostream>
#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("dot: size mismatch");
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

double norm2(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v * v;
    return std::sqrt(s);
}

// Projection of u onto v: (u·v / v·v) * v
std::vector<double> project_u_onto_v(const std::vector<double>& u, const std::vector<double>& v) {
    if (u.size() != v.size()) throw std::invalid_argument("project: size mismatch");
    double vv = dot(v, v);
    if (vv == 0.0) throw std::invalid_argument("project: v must be nonzero");
    double scale = dot(u, v) / vv;
    std::vector<double> proj(v.size());
    for (size_t i = 0; i < v.size(); ++i) proj[i] = scale * v[i];
    return proj;
}

int main() {
    std::vector<double> u = {2.0, -1.0, 3.0};
    std::vector<double> v = {1.0, 2.0, 0.0};

    double duv = dot(u, v);
    double nu = norm2(u);
    double nv = norm2(v);

    // Cosine similarity and angle (in degrees)
    double cos_theta = (nu > 0 && nv > 0) ? duv / (nu * nv) : 1.0; // treat the angle as 0 if either vector is zero
    // Clamp to [-1, 1] to avoid NaNs from slight numerical error
    cos_theta = std::max(-1.0, std::min(1.0, cos_theta));
    double theta_rad = std::acos(cos_theta);
    const double pi = std::acos(-1.0); // M_PI is POSIX, not guaranteed by standard C++
    double theta_deg = theta_rad * 180.0 / pi;

    // Verify Cauchy–Schwarz: |u·v| <= ||u||*||v||
    bool cs_ok = std::abs(duv) <= (nu * nv) + 1e-12; // small tolerance

    std::vector<double> proj = project_u_onto_v(u, v);

    std::cout << "u·v = " << duv << "\n";
    std::cout << "||u||_2 = " << nu << ", ||v||_2 = " << nv << "\n";
    std::cout << "cos(theta) = " << cos_theta << ", theta ≈ " << theta_deg << " degrees\n";
    std::cout << "Cauchy–Schwarz holds? " << (cs_ok ? "yes" : "no") << "\n";

    std::cout << "proj_v(u) = [";
    for (size_t i = 0; i < proj.size(); ++i) {
        std::cout << proj[i] << (i + 1 == proj.size() ? "" : ", ");
    }
    std::cout << "]\n";

    // Residual (orthogonal component): u - proj_v(u)
    std::vector<double> w(u.size());
    for (size_t i = 0; i < u.size(); ++i) w[i] = u[i] - proj[i];
    std::cout << "Residual norm ||u - proj_v(u)||_2 = " << norm2(w) << "\n";

    return 0;
}

This program computes cosine similarity and the angle between vectors and verifies the Cauchy–Schwarz inequality numerically. It also projects u onto v and reports the orthogonal residual’s norm, illustrating geometric decomposition into parallel and perpendicular components.

Time: O(n) · Space: O(1)
Lp and L∞ Distances, and Robust Unit Normalization
#include <iostream>
#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>
#include <limits>

void check_same_size(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
}

double norm_p(const std::vector<double>& x, double p) {
    if (!(p >= 1.0)) throw std::invalid_argument("p must be >= 1");
    if (std::isinf(p)) {
        double m = 0.0;
        for (double v : x) m = std::max(m, std::abs(v));
        return m;
    }
    double s = 0.0;
    for (double v : x) s += std::pow(std::abs(v), p);
    return std::pow(s, 1.0 / p);
}

// Lp distance: ||a - b||_p
double distance_p(const std::vector<double>& a, const std::vector<double>& b, double p) {
    check_same_size(a, b);
    if (!(p >= 1.0)) throw std::invalid_argument("p must be >= 1");
    if (std::isinf(p)) {
        double m = 0.0;
        for (size_t i = 0; i < a.size(); ++i) m = std::max(m, std::abs(a[i] - b[i]));
        return m;
    }
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += std::pow(std::abs(a[i] - b[i]), p);
    return std::pow(s, 1.0 / p);
}

// Normalize x to unit Lp norm. Throws if the norm is (near) zero.
std::vector<double> normalize_lp(const std::vector<double>& x, double p, double eps = 1e-15) {
    double nx = norm_p(x, p);
    if (nx <= eps) throw std::invalid_argument("cannot normalize near-zero vector");
    std::vector<double> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] / nx;
    return y;
}

int main() {
    std::vector<double> a = {1.0, -2.0, 3.0, 0.0};
    std::vector<double> b = {0.0, 2.0, 1.0, 4.0};

    std::cout << "d_1(a, b) = " << distance_p(a, b, 1.0) << "\n";
    std::cout << "d_2(a, b) = " << distance_p(a, b, 2.0) << "\n";
    std::cout << "d_inf(a, b) = " << distance_p(a, b, std::numeric_limits<double>::infinity()) << "\n";

    try {
        std::vector<double> a_unit = normalize_lp(a, 2.0);
        std::cout << "||normalize_2(a)||_2 = " << norm_p(a_unit, 2.0) << "\n";
    } catch (const std::exception& e) {
        std::cout << "Normalization failed: " << e.what() << "\n";
    }

    return 0;
}

This program implements Lp and L∞ distances between two vectors and a robust unit-normalization routine that guards against division by (near) zero. It demonstrates how different norms induce different distances and how to safely create unit vectors.

Time: O(n) · Space: O(1)
#inner product#dot product#norm#l1 norm#l2 norm#lp norm#linfinity norm#cauchy schwarz#cosine similarity#projection#triangle inequality#holder inequality#minkowski inequality#euclidean distance#manhattan distance