Loss Function
Margin-Based Loss Functions
What is a Loss Function?
A loss function measures how wrong a model prediction is.
Given prediction $\hat{y}$ and label $y$:
\[\mathcal{L}(\hat{y}, y)\]
- If $\hat{y} = y$ → small loss (≈ 0)
- If $\hat{y} \ne y$ → larger loss
- Larger error → larger penalty
Binary Classification Setup
Labels:
\[y \in \{-1, +1\}\]
Model outputs a score:
\[f(x) = w^T x + b\]
Prediction:
\[\hat{y} = \text{sign}(f(x))\]
Define margin:
\[m = y f(x) = y(w^T x + b)\]
- $m > 0$ → correct classification
- $m < 0$ → wrong classification
Loss depends only on margin $m$ → margin-based loss
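
To make the margin concrete, here is a minimal sketch in plain Python. The weights, bias, and the three labeled points are made-up illustration values, not taken from any dataset.

```python
def score(w, b, x):
    """Linear score f(x) = w^T x + b, with w and x as plain lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin(w, b, x, y):
    """Margin m = y * f(x); y is expected to be -1 or +1."""
    return y * score(w, b, x)


# Hypothetical classifier and points (assumed values for illustration).
w, b = [2.0, -1.0], 0.5
points = [([1.0, 1.0], +1), ([0.0, 2.0], +1), ([-1.0, 0.0], -1)]

margins = [margin(w, b, x, y) for x, y in points]
for (x, y), m in zip(points, margins):
    verdict = "correct" if m > 0 else "wrong"
    print(x, y, m, verdict)
```

The second point has a positive label but a negative score, so its margin is negative. The third has a negative label and a negative score, so its margin is positive.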
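0/1 Loss
Definition:
\[\mathcal{L}_{0/1}(m) = \begin{cases} 0 & m > 0 \\ 1 & m \le 0 \end{cases}\]
Discontinuous at $m = 0$ and flat everywhere else, so the gradient is zero wherever it exists → gradient-based optimizers get no signal, which is why smooth or convex surrogate losses are used instead.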
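
A small sketch of why the 0/1 loss gives an optimizer nothing to work with: nudging every margin slightly leaves the average loss unchanged. The margins and the size of the nudge are assumed values.

```python
def zero_one_loss(m):
    """0/1 loss on a margin: 1 if m <= 0, else 0."""
    return 1 if m <= 0 else 0


def training_error(margins):
    """Average 0/1 loss, i.e. the fraction of points counted as errors."""
    return sum(zero_one_loss(m) for m in margins) / len(margins)


margins = [1.5, -1.5, 1.5, 0.2]        # hypothetical margins
nudged = [m + 0.01 for m in margins]   # effect of a tiny model change

print(training_error(margins), training_error(nudged))
```

Both calls return the same error rate, because no margin crossed zero.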
Logistic (Log) Loss - Full Derivation
Logistic model:
\[P(y=1|x) = \sigma(f(x)) = \frac{1}{1 + e^{-f(x)}}\]
Likelihood:
\[P(y|x) = \sigma(f(x))^{\frac{1+y}{2}} (1-\sigma(f(x)))^{\frac{1-y}{2}}\]
Since $1 - \sigma(z) = \sigma(-z)$, both cases $y = \pm 1$ collapse to $P(y|x) = \sigma(y f(x)) = \sigma(m)$.
Negative log-likelihood:
\[\mathcal{L}_{log}(m) = -\log \sigma(m) = \log(1 + e^{-m})\]
Properties:
- Smooth & differentiable
- Probabilistic interpretation
- Penalizes wrong predictions smoothly
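
As a sanity check on the derivation, the sketch below compares the negative log-likelihood computed from the two-exponent form with $\log(1 + e^{-m})$. The rearranged formula in `logistic_loss` is one common way to avoid overflow; the function names and test scores are my own choices.

```python
import math


def sigmoid(z):
    """Logistic function, branching on the sign of z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(m):
    """log(1 + exp(-m)), rewritten so exp() only sees non-positive inputs."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def nll(y, f):
    """-log P(y|x) using the exponents (1+y)/2 and (1-y)/2 on the score f."""
    p = sigmoid(f)
    return -(((1 + y) / 2) * math.log(p) + ((1 - y) / 2) * math.log(1 - p))


# Hypothetical (label, score) pairs covering both labels and both signs.
for y, f in [(+1, 0.7), (-1, 0.7), (+1, -2.0), (-1, -2.0)]:
    assert abs(nll(y, f) - logistic_loss(y * f)) < 1e-12
```

The naive `math.log(1 + math.exp(-m))` raises `OverflowError` for very negative margins, which is the reason for the rearrangement.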
Gradient:
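\[\frac{d}{dm}\log(1+e^{-m}) = -\frac{1}{1+e^{m}} = -\sigma(-m)\]
Exponential Loss - AdaBoost
Used in AdaBoost:
\[\mathcal{L}_{exp}(m) = e^{-m}\]
Derivation (Boosting objective):
AdaBoost can be read as stagewise minimization of the empirical exponential loss of an additive model $f$, which upper-bounds the training classification error because $e^{-m} \ge \mathbf{1}(m \le 0)$:
\[\sum_i e^{-y_i f(x_i)}\]
Properties: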
- Very sensitive to outliers
- Penalty grows exponentially as the margin becomes more negative, so badly misclassified points dominate the sum
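
To put a number on the outlier sensitivity, this sketch compares how much each loss grows when a misclassified point moves from margin $-3$ to margin $-6$. The two margins are arbitrary illustration values, and the logistic loss is included only as a reference point.

```python
import math


def exp_loss(m):
    """Exponential loss e^{-m}."""
    return math.exp(-m)


def logistic_loss(m):
    """Overflow-safe log(1 + e^{-m}), used here for comparison."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


# How much does the penalty grow when the violation doubles?
ratio_exp = exp_loss(-6.0) / exp_loss(-3.0)
ratio_log = logistic_loss(-6.0) / logistic_loss(-3.0)

print(f"exponential grows by x{ratio_exp:.1f}, logistic by x{ratio_log:.2f}")
```

The exponential penalty grows by a factor of about 20, while the logistic penalty roughly doubles. A single mislabeled point far on the wrong side can therefore dominate an exponential-loss objective.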
Hinge Loss - SVM
Definition:
\[\mathcal{L}_{hinge}(m) = \max(0, 1 - m)\]
From Soft-Margin SVM:
Primal problem:
\[\min_{w, b, \xi} \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i f(x_i) \ge 1 - \xi_i, \; \xi_i \ge 0\]
At the optimum each slack takes its smallest feasible value:
\[\xi_i = \max(0, 1 - y_i f(x_i))\]
Substitute →
\[\min_{w, b} \frac{1}{2}\|w\|^2 + C\sum_i \max(0, 1 - y_i f(x_i))\]
SVM = Hinge Loss + L2 Regularization
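
The unconstrained form can be evaluated and minimized directly. The sketch below uses plain full-batch subgradient descent, a deliberately simple stand-in for a real SVM solver. The toy data, `C`, learning rate, and step count are all assumed values.

```python
def f(w, b, x):
    """Linear score w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i f(x_i))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    hinge_sum = sum(max(0.0, 1.0 - y * f(w, b, x)) for x, y in data)
    return reg + C * hinge_sum


def subgradient_step(w, b, data, C, lr):
    """One subgradient step; only points with margin < 1 contribute."""
    gw = list(w)   # gradient of the 0.5 * ||w||^2 term is w itself
    gb = 0.0
    for x, y in data:
        if y * f(w, b, x) < 1.0:
            gw = [g - C * y * xi for g, xi in zip(gw, x)]
            gb -= C * y
    return [wi - lr * g for wi, g in zip(w, gw)], b - lr * gb


# Hypothetical linearly separable toy data.
data = [([2.0, 2.0], +1), ([1.0, 3.0], +1),
        ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]

w, b = [0.0, 0.0], 0.0
start = svm_objective(w, b, data, C=1.0)
for _ in range(200):
    w, b = subgradient_step(w, b, data, C=1.0, lr=0.01)
end = svm_objective(w, b, data, C=1.0)

print(start, end, w, b)
```

With zero weights every margin is 0, so the starting objective is just `C` times the number of points. With a fixed step size the iterates hover near the minimum rather than converging exactly, which is enough for illustration.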
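Geometric Interpretation
Margin:
\[m = y f(x)\]
- $m \ge 1$ → on or outside the margin, no loss
- $0 < m < 1$ → correctly classified but inside the margin, loss $1 - m$
- $m \le 0$ → misclassified (or on the decision boundary), loss at least 1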
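
The three regions map directly onto a small helper. This is only an illustrative sketch: the region labels and the sample margins are my own, and a margin of exactly 0 is grouped with the misclassified case.

```python
def hinge(m):
    """Hinge loss max(0, 1 - m)."""
    return max(0.0, 1.0 - m)


def region(m):
    """Name the geometric region a margin value falls into."""
    if m >= 1:
        return "on or outside the margin"
    if m > 0:
        return "inside the margin"
    return "misclassified or on the boundary"


for m in [1.7, 0.4, -0.8]:   # hypothetical margins
    print(m, region(m), hinge(m))
```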
Comparison of Loss Functions
| Loss | Formula | Smooth | Robust to Noise |
|---|---|---|---|
| 0/1 | $\mathbf{1}(m \le 0)$ | No | Medium |
| Logistic | $\log(1+e^{-m})$ | Yes | Good |
| Exponential | $e^{-m}$ | Yes | Sensitive |
| Hinge | $\max(0,1-m)$ | Piecewise | Good |
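
The formulas in the table can be evaluated side by side. This sketch prints each loss at a handful of margins; the grid of margins is an arbitrary choice.

```python
import math

# Each loss as a function of the margin m, matching the table formulas.
LOSSES = {
    "0/1": lambda m: 1.0 if m <= 0 else 0.0,
    "logistic": lambda m: math.log1p(math.exp(-m)),
    "exponential": lambda m: math.exp(-m),
    "hinge": lambda m: max(0.0, 1.0 - m),
}

GRID = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]   # hypothetical margins

for m in GRID:
    row = "  ".join(f"{name}={fn(m):.3f}" for name, fn in LOSSES.items())
    print(f"m={m:+.1f}  {row}")
```

On this grid the hinge and exponential values are never below the 0/1 loss. The exponential column grows fastest as the margin becomes more negative.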
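Logistic vs Hinge vs Exponential
Logistic
- Smooth optimization
- Probabilistic
- Fairly robust: the penalty grows only linearly for large negative margins
Hinge
- Sparse solution (Support Vectors)
- Margin maximization
- Efficient: points with $m \ge 1$ contribute zero loss and zero gradient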
Exponential
- Strong focus on hard samples
- Sensitive to outliers
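Unified Risk Minimization
Margin-based classifiers can be written as minimizing a regularized empirical risk:
\[\min_{w, b} \frac{\lambda}{2}\|w\|^2 + \sum_i \mathcal{L}(y_i f(x_i))\]
Different loss → different algorithm. For the soft-margin SVM, dividing the objective by $C$ gives $\lambda = 1/C$.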
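
One way to see the "different loss, different algorithm" point is a single objective function that takes the loss as an argument. The sketch below is an assumed minimal form, with hypothetical data, weights, and $\lambda$.

```python
import math


def regularized_risk(w, b, data, loss, lam):
    """(lam / 2) * ||w||^2 + sum_i loss(y_i * (w^T x_i + b))."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    total = 0.0
    for x, y in data:
        m = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += loss(m)
    return reg + total


# Swapping the loss changes the objective, not the surrounding code.
losses = {
    "logistic": lambda m: math.log1p(math.exp(-m)),
    "exponential": lambda m: math.exp(-m),
    "hinge": lambda m: max(0.0, 1.0 - m),
}

data = [([2.0, 2.0], +1), ([-1.0, -2.0], -1)]   # hypothetical points
w, b = [0.5, 0.5], 0.0                          # hypothetical parameters

risks = {name: regularized_risk(w, b, data, fn, lam=0.1)
         for name, fn in losses.items()}
print(risks)
```

Both points here have margin at least 1, so the hinge objective reduces to the regularizer alone. The logistic and exponential objectives still charge a small penalty.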
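Shapes of Loss Functions
- 0/1 → step
- Logistic → smooth, convex, decreasing curve (roughly linear for $m \ll 0$, approaching 0 for $m \gg 0$)
- Exponential → exponentially steep penalty for $m < 0$
- Hinge → piecewise linear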
Summary
- Loss depends on margin $m = y f(x)$
- Logistic → probabilistic smooth loss
- Exponential → boosting loss
- Hinge → SVM loss
- All are special cases of regularized risk minimization
This post is licensed under CC BY 4.0 by the author.