Post

Support Vector Machine (SVM)

Support Vector Machine (SVM)

๐Ÿง โœจ Support Vector Machine (SVM)

๐ŸŽฏ Goal: Find the maximum-margin hyperplane that separates two classes.


๐Ÿ“ฆ 1. Problem Setup

Training data:

\[\{(x_i, y_i)\}_{i=1}^n,\quad x_i \in \mathbb{R}^p,\quad y_i \in \{-1, +1\}\]

Hyperplane:

\[\beta^T x + \beta_0 = 0\]

Classifier:

\[\hat y = \mathrm{sign}(\beta^T x + \beta_0)\]

๐Ÿ“ 2. Distance from a Point to a Hyperplane (Full Derivation)

We compute the signed distance from a point $x$ to the hyperplane.

๐Ÿ”น Step 1 โ€” Closest point on the hyperplane

The closest point must lie along the normal vector $\beta$:

\[x_\perp = x - t\beta\]

for some scalar $t$.

๐Ÿ”น Step 2 โ€” Enforce hyperplane constraint

\[\beta^T x_\perp + \beta_0 = 0\]

Substitute:

\[\beta^T (x - t\beta) + \beta_0 = 0\] \[\beta^T x - t\beta^T\beta + \beta_0 = 0\]

Solve for $t$:

\[t = \frac{\beta^T x + \beta_0}{\|\beta\|^2}\]

๐Ÿ”น Step 3 โ€” Distance

\[\|x - x_\perp\| = \|t\beta\| = |t| \|\beta\|\] \[\|x - x_\perp\| = \frac{|\beta^T x + \beta_0|}{\|\beta\|}\]

Signed distance:

\[\boxed{ \mathrm{dist}(x,H) = \frac{\beta^T x + \beta_0}{\|\beta\|} }\]

๐Ÿš€ 3. Functional Margin and Geometric Margin

Functional margin:

\[y_i(\beta^T x_i + \beta_0)\]

Geometric margin:

\[\gamma_i = \frac{y_i(\beta^T x_i + \beta_0)}{\|\beta\|}\]

Dataset margin:

\[\gamma = \min_i \gamma_i = \min_i \frac{y_i(\beta^T x_i + \beta_0)}{\|\beta\|}\]

๐ŸŽฏ Goal: maximize margin.


๐Ÿ” 4. Max-Margin Optimization (Full Transformation)

We want:

\[\max_{\beta,\beta_0} \gamma\]

๐Ÿ”น Scaling invariance

\[\gamma(\beta,\beta_0) = \gamma(c\beta, c\beta_0)\]

So enforce normalization:

\[y_i(\beta^T x_i + \beta_0) \ge 1\]

Then:

\[\gamma = \frac{1}{\|\beta\|}\]

Thus:

\[\max \gamma \Longleftrightarrow \min \|\beta\|\]

Smooth objective:

\[\boxed{ \min_{\beta,\beta_0} \frac12 \|\beta\|^2 \quad \text{s.t.} \quad y_i(\beta^T x_i + \beta_0) \ge 1 }\]

๐Ÿ‘‰ This is the Hard-Margin SVM Primal Problem.


๐Ÿงฎ 5. Lagrangian Formulation

Introduce multipliers $\alpha_i \ge 0$:

\[\mathcal{L}(\beta,\beta_0,\alpha) = \frac12 \|\beta\|^2 + \sum_{i=1}^n \alpha_i \left(1 - y_i(\beta^T x_i + \beta_0)\right)\]

๐Ÿ“‰ 6. Stationarity Conditions

๐Ÿ”น Derivative w.r.t. $\beta$

\[\frac{\partial \mathcal{L}}{\partial \beta} = \beta - \sum_{i=1}^n \alpha_i y_i x_i = 0\] \[\boxed{ \beta = \sum_{i=1}^n \alpha_i y_i x_i }\]

๐Ÿ”น Derivative w.r.t. $\beta_0$

\[\frac{\partial \mathcal{L}}{\partial \beta_0} = -\sum_{i=1}^n \alpha_i y_i = 0\] \[\boxed{ \sum_{i=1}^n \alpha_i y_i = 0 }\]

๐Ÿ”„ 7. Dual Derivation (No Steps Skipped)

Substitute

\[\beta = \sum_{i=1}^n \alpha_i y_i x_i\]

into the Lagrangian.

๐Ÿ”น Expand $|\beta|^2$

\[\|\beta\|^2 = \left(\sum_{i=1}^n \alpha_i y_i x_i\right)^T \left(\sum_{j=1}^n \alpha_j y_j x_j\right)\]

$$

\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j $$

Thus:

\[\frac12 \|\beta\|^2 = \frac12 \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j\]

๐Ÿ”น Expand middle term

\[\sum_{i=1}^n \alpha_i y_i \beta^T x_i = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j\]

๐Ÿ”น Combine

\[\mathcal{L} = \sum_{i=1}^n \alpha_i - \frac12 \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j\]

๐ŸŽฏ 8. Dual Optimization

\[\boxed{ \max_{\alpha} \sum_{i=1}^n \alpha_i - \frac12 \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j }\]

subject to:

\[\alpha_i \ge 0, \quad \sum_{i=1}^n \alpha_i y_i = 0\]

๐Ÿง  9. Final Decision Function

Optimal weight:

\[\beta = \sum_{i=1}^n \alpha_i y_i x_i\]

Decision function:

\[f(x) = \sum_{i=1}^n \alpha_i y_i x_i^T x + \beta_0\]

Prediction:

\[\boxed{\hat y = \mathrm{sign}(f(x))}\]

Only $\alpha_i > 0$ matter โ†’ Support Vectors.


๐Ÿ“Œ 10. Complementary Slackness

\[\alpha_i \left(1 - y_i(\beta^T x_i + \beta_0)\right) = 0\]
  • If $\alpha_i > 0$ โ‡’ point lies on margin
  • If point outside margin โ‡’ $\alpha_i = 0$

Only support vectors determine the boundary.


๐Ÿ”— 11. Kernel Form

Replace inner product:

\[x_i^T x \rightarrow K(x_i, x)\]

Decision function:

\[f(x) = \sum_{i=1}^n \alpha_i y_i K(x_i, x) + \beta_0\]

โœจ Key Insights

  • ๐Ÿง  SVM maximizes geometric margin
  • ๐Ÿ“ Primal minimizes weight norm
  • ๐Ÿ”Ž Dual depends only on inner products
  • ๐Ÿงท Solution is sparse (support vectors)
  • ๐Ÿ”ฅ Kernel trick enables nonlinear boundaries
This post is licensed under CC BY 4.0 by the author.