Matrix Calculus
1. What is Matrix Calculus?
Matrix calculus extends ordinary derivatives to functions whose inputs or outputs are vectors or matrices.
It is widely used in machine learning, optimization, computer vision, and statistics.
- Scalar Function of a Vector
Consider a scalar-valued function of a vector:
\[f(x) \in \mathbb{R} ,\quad x \in \mathbb{R}^n\]
The derivative of $f$ with respect to $x$ is called the gradient:
\[\nabla_x f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}\]
The gradient points in the direction of steepest increase of $f$.
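One quick way to sanity-check a gradient formula is to compare it with a finite-difference approximation. The sketch below assumes NumPy. The helper `numerical_gradient`, the step size `h`, and the example function are illustrative choices and are not part of any library. The helper approximates each partial derivative $\partial f / \partial x_i$ with a central difference. The sketch also checks the steepest-increase property: among unit directions $u$, the directional derivative $\nabla_x f^T u$ is largest for $u = \nabla_x f / \|\nabla_x f\|$.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at a 1-D point x.

    Hypothetical helper using central differences:
    df/dx_i ~ (f(x + h e_i) - f(x - h e_i)) / (2h).
    """
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h  # perturb only the i-th coordinate
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Illustrative function f(x) = x_1^2 + 3 x_2, with exact gradient [2 x_1, 3]
def f(x):
    return x[0] ** 2 + 3 * x[1]

x0 = np.array([1.0, 2.0])
g = numerical_gradient(f, x0)
print(g)  # close to [2, 3]

# The directional derivative along a unit vector u is g @ u. By Cauchy-Schwarz
# it is maximized by the normalized gradient, so compare against random directions.
u_best = g / np.linalg.norm(g)
rng = np.random.default_rng(0)
random_dirs = rng.standard_normal((1000, 2))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
print(g @ u_best, np.max(random_dirs @ g))
```

Central differences are used instead of one-sided differences because their truncation error is $O(h^2)$ rather than $O(h)$.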
- Derivative of a Linear Function
Let
\[f(x) = a^T x ,\quad a \in \mathbb{R}^n\]
Then
\[\frac{\partial}{\partial x}(a^T x) = a ,\quad \frac{\partial}{\partial x}(x^T a) = a\]
- Derivative of a Quadratic Form
Let
\[f(x) = x^T A x ,\quad A \in \mathbb{R}^{n \times n}\]
The derivative is
\[\frac{\partial}{\partial x}(x^T A x) = (A + A^T)x\]
When $A$ is symmetric, this reduces to $2Ax$.
- Norm Derivative
The squared Euclidean norm is the quadratic form with $A = I$:
\[f(x) = \|x\|^2 = x^T x\]
The derivative becomes
\[\frac{\partial}{\partial x}(x^T x) = 2x\]
- Least Squares Derivative
Consider
\[f(x) = \|Ax - b\|^2\]
where $A \in \mathbb{R}^{m\times n},\; x \in \mathbb{R}^n,\; b \in \mathbb{R}^m$.
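Expanding the square gives
\[f(x) = (Ax - b)^T(Ax - b) = x^T A^T A x - 2 b^T A x + b^T b\]
Apply the quadratic-form rule with the symmetric matrix $A^T A$ and the linear rule with $a = A^T b$. This gives the gradient with respect to $x$:
\[\nabla_x f = 2A^TAx - 2A^Tb\]
Setting the gradient to zero gives the normal equations: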
\[A^TAx = A^Tb\]
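The gradient rules above can be checked numerically in the same spirit. This sketch assumes NumPy and uses random test data. It re-defines a hypothetical central-difference helper `numerical_gradient` so that it runs on its own. `A_quad` plays the role of the square matrix $A$ in the quadratic-form rule, and `A` is the $m \times n$ least-squares matrix. The sketch compares each closed-form gradient with the numerical one. It then solves the normal equations and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Hypothetical helper: central-difference gradient of scalar f at 1-D x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(42)
m, n = 6, 3
a = rng.standard_normal(n)             # a in R^n (linear rule)
A_quad = rng.standard_normal((n, n))   # non-symmetric n x n matrix (quadratic rule)
A = rng.standard_normal((m, n))        # A in R^{m x n} (least squares)
b = rng.standard_normal(m)             # b in R^m
x = rng.standard_normal(n)             # point where the gradients are compared

# name -> (scalar function, closed-form gradient evaluated at x)
checks = {
    "linear a^T x": (lambda v: a @ v, a),
    "quadratic x^T A x": (lambda v: v @ A_quad @ v, (A_quad + A_quad.T) @ x),
    "norm ||x||^2": (lambda v: v @ v, 2 * x),
    "least squares": (lambda v: np.sum((A @ v - b) ** 2), 2 * A.T @ A @ x - 2 * A.T @ b),
}
for name, (func, analytic) in checks.items():
    err = np.max(np.abs(numerical_gradient(func, x) - analytic))
    print(f"{name:<18} max abs error: {err:.1e}")

# Normal equations A^T A x = A^T b (A^T A is invertible when A has full column rank)
x_star = np.linalg.solve(A.T @ A, A.T @ b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))
# The least-squares gradient should vanish at the solution
print(np.allclose(2 * A.T @ A @ x_star - 2 * A.T @ b, 0))
```

Solving the normal equations directly is fine for small, well-conditioned problems. Forming $A^T A$ squares the condition number of $A$, however, so library least-squares solvers such as `np.linalg.lstsq` avoid it.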
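- Trace Trick
Many matrix derivatives are easier to derive with trace identities.
Example: a scalar equals its own trace, so
\[x^T A x = \text{tr}(x^T A x) = \text{tr}(A x x^T)\]
where the last step uses the cyclic property $\text{tr}(XYZ) = \text{tr}(YZX)$. Rewriting scalars as traces and permuting the factors cyclically simplifies many derivations in machine learning.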
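The trace trick pays off when differentiating with respect to a matrix. As an illustration, by the cyclic property of the trace, $x^T A x = \text{tr}(A x x^T)$. Combined with the standard rule $\partial\,\text{tr}(AB)/\partial A = B^T$, this gives $\partial (x^T A x)/\partial A = (x x^T)^T = x x^T$. This worked example is an assumption of this edit, not a derivation from the original text. The sketch assumes NumPy and checks both the trace identity and the matrix gradient entrywise with central differences. `numerical_matrix_gradient` is a hypothetical helper.

```python
import numpy as np

def numerical_matrix_gradient(f, A, h=1e-6):
    """Hypothetical helper: entrywise central-difference gradient of scalar f(A)."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        step = np.zeros_like(A)
        step[idx] = h  # perturb only the single entry A[idx]
        G[idx] = (f(A + step) - f(A - step)) / (2 * h)
    return G

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# A scalar equals its own trace; the cyclic property rewrites tr(x^T A x) as tr(A x x^T)
quad = x @ A @ x
quad_trace = np.trace(A @ np.outer(x, x))
print(np.isclose(quad, quad_trace))

# d/dA tr(A B) = B^T with B = x x^T (symmetric), so d/dA (x^T A x) = x x^T
G_numeric = numerical_matrix_gradient(lambda M: x @ M @ x, A)
G_closed = np.outer(x, x)
print(np.max(np.abs(G_numeric - G_closed)))
```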
This post is licensed under CC BY 4.0 by the author.