Matrix Calculus

1. What is Matrix Calculus?

Matrix calculus extends ordinary derivatives to functions of vectors and matrices.
It is widely used in machine learning, optimization, computer vision, and statistics.

  2. Scalar Function of a Vector
\[f(x) \in \mathbb{R} ,\quad x \in \mathbb{R}^n\]

The derivative of $f$ with respect to $x$ is called the gradient. In the column-vector (denominator-layout) convention used throughout this post, it is:

\[\nabla_x f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}\]

The gradient points in the direction of steepest increase of $f$, and its norm $\|\nabla_x f\|$ is the rate of increase in that direction.
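Every rule below can be checked numerically by perturbing one coordinate at a time. The following is a minimal sketch, assuming NumPy is available; `numerical_gradient` is a hypothetical helper name chosen here (not a library function), and it uses central differences $\frac{f(x + \varepsilon e_i) - f(x - \varepsilon e_i)}{2\varepsilon}$ for each component.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Hypothetical helper: central-difference estimate of the gradient of a scalar f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps  # perturb only coordinate i
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# f(x) = x_1^2 + 3 x_1 x_2 has gradient [2 x_1 + 3 x_2, 3 x_1]
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

x0 = np.array([1.0, 2.0])
print(numerical_gradient(f, x0))  # close to the exact gradient [8, 3]
```

Central differences are used instead of one-sided differences because their truncation error shrinks like $\varepsilon^2$ rather than $\varepsilon$.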


  3. Derivative of a Linear Function
\[f(x) = a^T x ,\quad a \in \mathbb{R}^n\]

Then

\[\frac{\partial}{\partial x}(a^T x) = a ,\quad \frac{\partial}{\partial x}(x^T a) = a\]
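Because $a^T x$ is linear, $f(x + h e_i) - f(x) = h\,a_i$ exactly for any step $h$, so even a one-sided difference quotient recovers $a$. A small sketch, assuming NumPy and randomly generated $a$ and $x$ (the seed and sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
x = rng.standard_normal(n)

def f(v):
    return a @ v  # a^T v; for 1-D arrays this equals v^T a

# One-sided difference quotient along each basis vector e_i
h = 1e-3
grad = np.array([(f(x + h * e_i) - f(x)) / h for e_i in np.eye(n)])

print(np.allclose(grad, a))         # True: the gradient of a^T x is a
print(np.isclose(a @ x, x @ a))     # True: a^T x and x^T a are the same scalar
```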

  4. Derivative of a Quadratic Form
\[f(x) = x^T A x ,\quad A \in \mathbb{R}^{n \times n}\]

The derivative is

\[\frac{\partial}{\partial x}(x^T A x) = (A + A^T)x\]
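When $A$ is symmetric ($A = A^T$), this reduces to $2Ax$; for a non-symmetric $A$ the shortcut $2Ax$ is wrong. The sketch below, assuming NumPy and a hand-picked $2 \times 2$ matrix, compares both formulas against a central-difference estimate (`numerical_gradient` is again a hypothetical helper defined here):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Hypothetical helper: central-difference estimate of the gradient of scalar f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])   # deliberately NOT symmetric
x = np.array([1.0, 1.0])

def quad(v):
    return v @ A @ v   # x^T A x = x_1^2 + 2 x_1 x_2 + 3 x_2^2

numeric = numerical_gradient(quad, x)
analytic = (A + A.T) @ x    # rule from the text: [4. 8.]
naive = 2 * A @ x           # 2Ax, valid only when A is symmetric: [6. 6.]

print(analytic, naive)
print(np.allclose(numeric, analytic))  # True
print(np.allclose(numeric, naive))     # False
```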

  5. Norm Derivative
\[f(x) = \|x\|^2 = x^T x\]

The derivative becomes

\[\frac{\partial}{\partial x}(x^T x) = 2x\]
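The same result follows componentwise, and also as the special case $A = I$ of the quadratic-form rule, since $(I + I^T)x = 2x$:

\[\frac{\partial}{\partial x_k}\left(x^T x\right) = \frac{\partial}{\partial x_k}\sum_{i=1}^{n} x_i^2 = 2x_k \quad\Longrightarrow\quad \nabla_x\left(x^T x\right) = 2x\]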

  6. Least Squares Derivative
\[f(x) = \|Ax - b\|^2\]

where

  • $A \in \mathbb{R}^{m\times n},\; x \in \mathbb{R}^n,\; b \in \mathbb{R}^m$

Expanding the square, and using $x^T A^T b = b^T A x$ (a scalar equals its own transpose):

\[f(x) = (Ax-b)^T(Ax-b) = x^T A^T A x - 2b^T A x + b^T b\]

Now differentiate with respect to $x$:

\[\nabla_x f = 2A^TAx - 2A^Tb\]
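Term by term, this follows from the rules earlier in the post: $A^T A$ is symmetric, so the quadratic-form rule gives $2A^TAx$; $b^T A x = (A^T b)^T x$ is linear with $a = A^T b$; and $b^T b$ does not depend on $x$:

\[\begin{aligned}
\nabla_x\left(x^T A^T A x\right) &= \left(A^T A + (A^T A)^T\right)x = 2A^T A x \\
\nabla_x\left(-2\,b^T A x\right) &= -2A^T b \\
\nabla_x\left(b^T b\right) &= 0
\end{aligned}\]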

Setting the gradient to zero gives the normal equations:

\[A^TAx = A^Tb\]
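If $A$ has full column rank, $A^TA$ is invertible and the normal equations have a unique solution. A minimal sketch, assuming NumPy and a random tall matrix (sizes and seed are arbitrary): it solves the normal equations directly, compares the result with `np.linalg.lstsq`, and checks that the gradient $2A^TAx - 2A^Tb$ vanishes there.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 3
A = rng.standard_normal((m, n))   # tall random matrix; full column rank with probability 1
b = rng.standard_normal(m)

# Solve the normal equations A^T A x = A^T b directly
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# The gradient 2 A^T A x - 2 A^T b should vanish at the minimizer
grad = 2 * A.T @ A @ x_normal - 2 * A.T @ b

print(np.allclose(x_normal, x_lstsq))   # True
print(np.allclose(grad, 0))             # True
```

Forming $A^TA$ squares the condition number of $A$, so for ill-conditioned problems a QR- or SVD-based solver such as `lstsq` is usually the safer choice; the explicit normal-equations solve is shown here because it mirrors the derivation.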

  7. Trace Trick

Matrix derivatives are often easier to compute with trace identities.

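For example, a scalar equals its own trace, so

\[x^T A x = \text{tr}(x^T A x) = \text{tr}(A x x^T)\]

where the last step uses the cyclic property $\text{tr}(XYZ) = \text{tr}(YZX)$.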

Using trace rules simplifies many derivations in machine learning.
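As a worked example of the technique, the quadratic-form gradient can be rederived with differentials: write $df$, use $\text{tr}(M) = \text{tr}(M^T)$ to move every $dx$ to the right, and read off the gradient from $df = \text{tr}\left((\nabla_x f)^T dx\right)$:

\[\begin{aligned}
df &= d\,\text{tr}(x^T A x) = \text{tr}(dx^T A x) + \text{tr}(x^T A\, dx) \\
   &= \text{tr}(x^T A^T dx) + \text{tr}(x^T A\, dx) \\
   &= \text{tr}\left(x^T (A + A^T)\, dx\right) \\
\Longrightarrow\quad \nabla_x f &= (A + A^T)\,x
\end{aligned}\]

This matches the quadratic-form rule stated earlier, without expanding any sums by index.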

This post is licensed under CC BY 4.0 by the author.