Loss Function

📘 Margin-Based Loss Functions


🎯 What is a Loss Function?

A loss function measures how wrong a model prediction is.

Given prediction $\hat{y}$ and label $y$:

\[\mathcal{L}(\hat{y}, y)\]
  • If $\hat{y} = y$ → small loss (≈ 0)
  • If $\hat{y} \ne y$ → larger loss
  • Larger error → larger penalty

🧠 Binary Classification Setup

Labels:

\[y \in \{-1, +1\}\]

Model outputs a score:

\[f(x) = w^T x + b\]

Prediction:

\[\hat{y} = \text{sign}(f(x))\]

Define margin:

\[m = y f(x) = y(w^T x + b)\]
  • $m > 0$ → correct classification
  • $m < 0$ → wrong classification

A loss that depends on an example only through its margin $m$ is called a margin‑based loss.
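As a minimal sketch of the margin definition above: the weights, bias, and data points are made-up toy values, and the helper names `score` and `margin` are illustrative only.

```python
def score(w, b, x):
    """Linear score f(x) = w^T x + b for a single example."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin(w, b, x, y):
    """Margin m = y * f(x), with the label y in {-1, +1}."""
    return y * score(w, b, x)


# Hypothetical toy model and data, chosen only for illustration.
w, b = [2.0, -1.0], 0.5
examples = [([1.0, 1.0], +1), ([0.0, 2.0], +1), ([-1.0, 0.0], -1)]

margins = [margin(w, b, x, y) for x, y in examples]
for (x, y), m in zip(examples, margins):
    status = "correct" if m > 0 else "wrong"
    print(f"x={x}, y={y:+d}, m={m:+.2f} -> {status}")
```

The second example has a negative margin, so the sign of $f(x)$ disagrees with its label.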


📉 0/1 Loss

Definition:

\[\mathcal{L}_{0/1}(m) = \begin{cases} 0 & m > 0 \\ 1 & m \le 0 \end{cases}\]

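Discontinuous and non‑convex, with zero gradient wherever it is differentiable → gradient‑based optimization gets no signal. The losses below are convex surrogates for it.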
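A small sketch of why the 0/1 loss is hard to optimize: a central finite difference (the helper `finite_diff` is an illustrative name, not part of any library) returns zero slope on both sides of the step, so a gradient method has nothing to follow.

```python
def zero_one_loss(m):
    """0/1 loss on the margin: 0 if m > 0, else 1."""
    return 0.0 if m > 0 else 1.0


def finite_diff(loss, m, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dm at m."""
    return (loss(m + eps) - loss(m - eps)) / (2 * eps)


# The slope is zero both for a misclassified and a correctly classified point.
for m in (-2.0, 0.5):
    print(f"m={m:+.1f}  loss={zero_one_loss(m)}  slope={finite_diff(zero_one_loss, m)}")
```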


📈 Logistic (Log) Loss: Full Derivation

Logistic model:

\[P(y=1|x) = \sigma(f(x)) = \frac{1}{1 + e^{-f(x)}}\]

Likelihood:

\[P(y|x) = \sigma(f(x))^{\frac{1+y}{2}} (1-\sigma(f(x)))^{\frac{1-y}{2}}\]

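Since $1 - \sigma(z) = \sigma(-z)$, the likelihood collapses to $P(y|x) = \sigma(y f(x)) = \sigma(m)$, so the negative log-likelihood is: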

\[\mathcal{L}_{log}(m) = \log(1 + e^{-m})\]

Properties:

  • Smooth & differentiable
  • Probabilistic interpretation
  • Penalizes wrong predictions smoothly: for very negative margins the loss grows roughly linearly, $\log(1+e^{-m}) \approx -m$

Gradient:

\[\frac{d}{dm}\log(1+e^{-m}) = -\frac{1}{1+e^{m}}\]

📊 Exponential Loss: AdaBoost

Used in AdaBoost:

\[\mathcal{L}_{exp}(m) = e^{-m}\]

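Boosting objective:

AdaBoost greedily minimizes the total exponential loss of the ensemble score $f$ over the training set (an upper bound on the number of training errors, since $e^{-m} \ge \mathbf{1}(m \le 0)$):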

\[\sum_i e^{-y_i f(x_i)}\]

Properties:

  • Very sensitive to outliers
  • Large penalty for misclassified points
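One way to see the exponential loss at work is a single round of discrete AdaBoost. The sketch below assumes example weights that sum to 1, a weak learner whose weighted error is strictly between 0 and 1, and made-up toy labels; `adaboost_round` is an illustrative name.

```python
import math


def adaboost_round(weights, y, h):
    """One discrete AdaBoost round.

    weights: current example weights (assumed to sum to 1)
    y, h:    true labels and weak-learner predictions, each in {-1, +1}
    Returns the learner's coefficient alpha and the renormalized weights.
    """
    # Weighted error of the weak learner (assumed to satisfy 0 < eps < 1).
    eps = sum(w for w, yi, hi in zip(weights, y, h) if yi != hi)
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Each weight is multiplied by exp(-alpha * y_i * h_i), i.e. by the
    # exponential loss of the newly added term alpha * h on that example.
    new = [w * math.exp(-alpha * yi * hi) for w, yi, hi in zip(weights, y, h)]
    total = sum(new)
    return alpha, [w / total for w in new]


# Hypothetical toy round: four examples, the weak learner misses the second.
y = [+1, +1, -1, -1]
h = [+1, -1, -1, -1]
alpha, new_weights = adaboost_round([0.25] * 4, y, h)
print(alpha, new_weights)
```

After the update the misclassified example carries half of the total weight, which is the "strong focus on hard samples" that also makes the loss sensitive to outliers.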

📏 Hinge Loss: SVM

Definition:

\[\mathcal{L}_{hinge}(m) = \max(0, 1 - m)\]

From Soft‑Margin SVM:

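Primal problem:

\[\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i f(x_i) \ge 1 - \xi_i,\ \xi_i \ge 0\]

At the optimum each slack takes its smallest feasible value:

\[\xi_i = \max(0, 1 - y_i f(x_i))\]

Substitute →

\[\min_{w, b} \frac{1}{2}\|w\|^2 + C\sum_i \max(0, 1 - y_i f(x_i))\]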

👉 SVM = Hinge Loss + L2 Regularization
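The unconstrained form above can be minimized directly. The following is a minimal sketch using plain subgradient descent with a fixed step size, which is an assumption made for illustration and not how production SVM solvers typically work; the toy data, `C`, and `lr` are made-up values.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def svm_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i f(x_i))."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge


def subgradient_step(w, b, data, C, lr):
    """One subgradient step on the hinge + L2 objective."""
    grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
    grad_b = 0.0
    for x, y in data:
        # Only points with margin below 1 contribute to the hinge term.
        if y * (dot(w, x) + b) < 1.0:
            grad_w = [g - C * y * xj for g, xj in zip(grad_w, x)]
            grad_b -= C * y
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b


# Hypothetical, linearly separable toy data.
data = [([2.0, 2.0], +1), ([1.0, 3.0], +1), ([-2.0, -1.0], -1), ([-1.0, -3.0], -1)]
w, b = [0.0, 0.0], 0.0
start = svm_objective(w, b, data, C=1.0)
for _ in range(200):
    w, b = subgradient_step(w, b, data, C=1.0, lr=0.01)
end = svm_objective(w, b, data, C=1.0)
print(f"objective: {start:.3f} -> {end:.3f}, w={w}, b={b:.3f}")
```

Points whose margin is already at least 1 drop out of the update entirely, which is why only a subset of the data (the support vectors) ends up determining the solution.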


📐 Geometric Interpretation

Margin:

\[m = y f(x)\]
  • $m \ge 1$ → on or outside the margin, no loss
  • $0 < m < 1$ → correctly classified but inside the margin, loss $1 - m$
  • $m < 0$ → misclassified, loss greater than 1

🔬 Comparison of Loss Functions

| Loss | Formula | Smooth | Robust to Noise |
|---|---|---|---|
| 0/1 | $\mathbf{1}(m \le 0)$ | ❌ | Medium |
| Logistic | $\log(1+e^{-m})$ | ✔ | Good |
| Exponential | $e^{-m}$ | ✔ | ❌ Sensitive |
| Hinge | $\max(0,1-m)$ | Piecewise | Good |

⚖️ Logistic vs Hinge vs Exponential

Logistic

  • Smooth optimization
  • Probabilistic
  • More robust to outliers than exponential loss (penalty grows only linearly for very negative margins)

Hinge

  • Sparse solution (Support Vectors)
  • Margin maximization
  • Efficient: points with $m \ge 1$ contribute zero loss and zero gradient

Exponential

  • Strong focus on hard samples
  • Sensitive to outliers

🔥 Unified Risk Minimization

The classifiers above all fit one template, regularized empirical risk minimization (for the SVM, $\lambda = 1/C$ after rescaling):

\[\min_w \frac{\lambda}{2}\|w\|^2 + \sum_i \mathcal{L}(y_i f(x_i))\]

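Different loss → different algorithm: logistic loss gives logistic regression, hinge loss gives the SVM, and exponential loss gives AdaBoost.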
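To make the template concrete, here is a hedged sketch of one training loop in which only the loss derivative changes. It assumes a linear score $f(x) = w^T x + b$, plain (sub)gradient descent with a fixed step size, and made-up toy data; `fit` and the `d_*` helpers are illustrative names. AdaBoost itself builds $f$ from weak learners instead, so the exponential case here only illustrates the loss, not the boosting procedure.

```python
import math


def d_logistic(m):
    """Derivative of log(1 + e^{-m}) with respect to m."""
    return -1.0 / (1.0 + math.exp(m))


def d_exponential(m):
    """Derivative of e^{-m} with respect to m."""
    return -math.exp(-m)


def d_hinge(m):
    """A subgradient of max(0, 1 - m) with respect to m."""
    return -1.0 if m < 1.0 else 0.0


def fit(data, d_loss, lam=0.1, lr=0.01, steps=500):
    """Minimize (lam/2)*||w||^2 + sum_i L(y_i f(x_i)) by gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in data:
            m = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = d_loss(m) * y  # chain rule: dL/df = L'(m) * y
            grad_w = [gw + g * xj for gw, xj in zip(grad_w, x)]
            grad_b += g
        w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical, linearly separable toy data.
data = [([2.0, 2.0], +1), ([1.0, 3.0], +1), ([-2.0, -1.0], -1), ([-1.0, -3.0], -1)]
models = {
    name: fit(data, d)
    for name, d in [("logistic", d_logistic), ("exponential", d_exponential), ("hinge", d_hinge)]
}
for name, (w, b) in models.items():
    print(name, [round(wi, 3) for wi in w], round(b, 3))
```

Swapping `d_loss` is the only change between the three runs; the regularizer and the update rule stay the same.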


🌈 Shapes of Loss Functions

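  • 0/1 → step at $m = 0$
  • Logistic → smooth, convex, decreasing curve (close to $-m$ for very negative $m$, close to 0 for large positive $m$)
  • Exponential → penalty grows exponentially as $m$ becomes more negative
  • Hinge → piecewise linear, with a kink at $m = 1$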
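The shapes listed above can be checked numerically. This is a small sketch that evaluates the four formulas from the post on a handful of margins; the grid of margins is an arbitrary choice.

```python
import math

# The four margin-based losses from this post, as functions of m.
LOSSES = {
    "0/1": lambda m: 0.0 if m > 0 else 1.0,
    "logistic": lambda m: math.log1p(math.exp(-m)),
    "exponential": lambda m: math.exp(-m),
    "hinge": lambda m: max(0.0, 1.0 - m),
}

for m in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    row = "  ".join(f"{name}={fn(m):.3f}" for name, fn in LOSSES.items())
    print(f"m={m:+.1f}  {row}")
```

At $m = -3$ the exponential loss is already far larger than the hinge and logistic losses, which both grow about linearly there.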

🚀 Summary

  • Loss depends on margin $m = y f(x)$
  • Logistic → probabilistic smooth loss
  • Exponential → boosting loss
  • Hinge → SVM loss
  • All are special cases of regularized risk minimization
This post is licensed under CC BY 4.0 by the author.