Classification: Linear Regression vs Logistic Regression
Overview
In classification problems, our goal is to predict a discrete class label rather than a continuous value.
This post explains:
- Why Linear Regression is not suitable for classification
- Why Logistic Regression is the correct probabilistic model
- The mathematical intuition behind both approaches
Classification Basics
What is Classification?
Given:
- Feature vector: \(\mathbf{x}\)
- Class label: \(y \in C\)
We learn a function:
\[f(\mathbf{x}) \in C\]
Often, we prefer probabilities instead of hard labels:
\[P(y = c \mid \mathbf{x})\]
Why Probabilities Matter
Probabilistic outputs enable:
- Risk-based decision making
- Threshold tuning
- Cost-sensitive classification
- Confidence estimation
Example: in fraud detection, a predicted probability is more valuable than a binary decision, because the alert threshold can be set according to the cost of a missed fraud versus a false alarm.
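To make the threshold and cost points concrete, here is a minimal sketch. The probabilities, the cost figures, and the helper name `decide` are illustrative assumptions, not values from a real fraud system.

```python
def decide(p_fraud, cost_false_alarm, cost_missed_fraud):
    """Flag a transaction when ignoring it is expected to cost more than flagging it.

    Expected cost of flagging: (1 - p) * cost_false_alarm
    Expected cost of ignoring: p * cost_missed_fraud
    Flagging is cheaper when p > cost_false_alarm / (cost_false_alarm + cost_missed_fraud).
    """
    threshold = cost_false_alarm / (cost_false_alarm + cost_missed_fraud)
    return p_fraud > threshold, threshold


# Hypothetical model outputs P(fraud | x) for three transactions.
probabilities = [0.02, 0.15, 0.60]

# Assumed costs: a false alarm costs 1 unit, a missed fraud costs 9 units.
for p in probabilities:
    flag, threshold = decide(p, cost_false_alarm=1.0, cost_missed_fraud=9.0)
    print(f"p={p:.2f}  threshold={threshold:.2f}  flag={flag}")
```

With the assumed costs the threshold works out to 0.10, so the transaction scored 0.15 is flagged even though a fixed 0.5 cutoff would let it through. A model that only returned hard labels would leave no room for this adjustment.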
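Can Linear Regression Be Used for Classification?
Binary Encoding
\[y = \begin{cases} 0 & \text{No} \\ 1 & \text{Yes} \end{cases}\]
One might fit a linear regression to this 0/1 target and threshold the fitted value:
\[\hat{y} > 0.5 \Rightarrow \text{Class 1}\]
Why It Sometimes Works
For a 0/1 target, the conditional mean equals the class-1 probability:
\[\mathbb{E}[y \mid \mathbf{x}] = 0 \cdot P(y=0 \mid \mathbf{x}) + 1 \cdot P(y=1 \mid \mathbf{x}) = P(y=1 \mid \mathbf{x})\]
Least squares estimates this conditional mean, so linear regression can approximate probabilities in limited cases.
Major Problems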
1. Predictions outside [0,1]
Linear regression may produce:
- Negative fitted values
- Fitted values greater than 1
Neither is a valid probability.
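A small sketch shows how this happens in practice. The data below are made up, and `fit_simple_ols` is a hypothetical helper that applies the closed-form least-squares solution for a single feature.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up data: one feature, binary label encoded as 0/1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

b0, b1 = fit_simple_ols(xs, ys)
for x in [0.0, 1.0, 3.5, 6.0, 8.0]:
    print(f"x={x:.1f}  fitted value={b0 + b1 * x:.3f}")
```

On this toy data the fitted line already falls below 0 at x = 1 and above 1 at x = 6, both inside the training range, and it drifts further out as x moves away from the data.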
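2. Multiclass Problem
Numeric coding introduces a fake ordering:
\[1=\text{stroke},\quad 2=\text{overdose},\quad 3=\text{seizure}\]
This coding implies that overdose lies between stroke and seizure and that the two gaps are equal. These distance relationships are meaningless, so the model is given an incorrect structure.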
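One way to see that the ordering is arbitrary is to fit the same data under two different codings. This is a sketch on made-up data; `fit_simple_ols`, the feature values, and the second coding are assumptions for illustration.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = b0 + b1 * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Made-up feature values and diagnoses.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
diagnoses = ["stroke", "stroke", "overdose", "overdose", "seizure", "seizure"]

codings = {
    "A": {"stroke": 1, "overdose": 2, "seizure": 3},
    "B": {"overdose": 1, "stroke": 2, "seizure": 3},  # same classes, different order
}

slopes = {}
for name, coding in codings.items():
    ys = [coding[d] for d in diagnoses]
    b0, b1 = fit_simple_ols(xs, ys)
    slopes[name] = b1
    print(f"coding {name}: intercept={b0:.3f}  slope={b1:.3f}")
```

The two codings describe the same three classes, yet the fitted slopes differ (8/17.5 versus 4/17.5 here), so the regression depends on a labeling choice that carries no meaning.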
Conclusion
Linear regression is not suitable for classification.
Better alternatives:
- Logistic Regression
- Softmax / Multinomial Logistic Regression
- LDA / QDA
- Probabilistic classifiers
Logistic Regression
Goal
Model probability:
\[P(y=1 \mid \mathbf{x})\]
We need a function mapping:
\[(-\infty,+\infty) \rightarrow (0,1)\]
Sigmoid Function
\[\sigma(s) = \frac{1}{1+e^{-s}}\]
Properties:
- Smooth & monotonic
- Valid probability output
- Basis of Logistic Regression
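The sigmoid is short enough to write by hand. The sketch below is one possible implementation; splitting on the sign of `s` is a choice made here so that `math.exp` never receives a large positive argument, which would otherwise raise `OverflowError` for very negative `s`.

```python
import math


def sigmoid(s):
    """Logistic sigmoid 1 / (1 + exp(-s)), arranged to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    # For negative s, use the equivalent form exp(s) / (1 + exp(s)).
    z = math.exp(s)
    return z / (1.0 + z)


for s in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"s={s:+.1f}  sigmoid={sigmoid(s):.4f}")
```

The printed values increase with `s` and stay between 0 and 1, and `sigmoid(s) + sigmoid(-s)` equals 1, which matches the two class probabilities summing to one.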
Logistic Model
\[P(y=1 \mid \mathbf{x}) = \frac{1}{1+e^{-\boldsymbol{\beta}^T\mathbf{x}}}\]
\[P(y=0 \mid \mathbf{x}) = 1 - P(y=1 \mid \mathbf{x})\]
Interpretation
Log-Odds (Logit)
With \(p = P(y=1 \mid \mathbf{x})\):
\[\log \frac{p}{1-p} = \boldsymbol{\beta}^T \mathbf{x}\]
Meaning:
- Logistic regression is linear in log-odds, not probability.
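The log-odds form follows from the logistic model in two short steps. Write \(s = \boldsymbol{\beta}^T\mathbf{x}\) and \(p = \sigma(s)\), where \(s\) is just shorthand for the sigmoid's argument:
\[1 - p = \frac{e^{-s}}{1+e^{-s}}, \qquad \frac{p}{1-p} = \frac{1}{e^{-s}} = e^{s}\]
\[\log \frac{p}{1-p} = s = \boldsymbol{\beta}^T \mathbf{x}\]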
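Coefficient Meaning
If:
\[\hat{\beta}_1 > 0\]
then increasing the feature \(x_1\) increases the probability of class 1.
Each one-unit increase in \(x_1\), with the other features held fixed, increases the log-odds by \(\beta_1\), which multiplies the odds by \(e^{\beta_1}\).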
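A quick numeric check of this interpretation, as a sketch. The coefficients `beta_0` and `beta_1` are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical fitted coefficients: intercept beta_0 and slope beta_1.
beta_0, beta_1 = -3.0, 0.8


def log_odds(x1):
    """Linear predictor beta_0 + beta_1 * x1, i.e. the log-odds of class 1."""
    return beta_0 + beta_1 * x1


for x1 in [1.0, 2.0, 5.0, 6.0]:
    p = sigmoid(log_odds(x1))
    odds = p / (1.0 - p)
    print(f"x1={x1:.0f}  log-odds={log_odds(x1):+.2f}  odds={odds:.3f}  p={p:.3f}")

print(f"odds multiplier per unit of x1: {math.exp(beta_1):.3f}")
```

Going from x1 = 1 to 2 and from x1 = 5 to 6 adds the same amount to the log-odds and multiplies the odds by the same factor, but the change in probability is different in the two cases. That is what "linear in log-odds, not probability" means in practice.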
Maximum Likelihood Estimation (MLE)
Likelihood
For a binary outcome, write:
\[p_i = P(y_i=1 \mid \mathbf{x}_i)=\sigma(\boldsymbol{\beta}^T\mathbf{x}_i)\]
Dataset likelihood, assuming independent observations:
\[\mathcal{L}(\boldsymbol{\beta})=\prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}\]
Log-Likelihood
\[\ell(\boldsymbol{\beta})=\sum_{i=1}^n \left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]\]
Maximizing \(\ell(\boldsymbol{\beta})\) is equivalent to minimizing the cross-entropy loss \(-\ell(\boldsymbol{\beta})\).
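The log-likelihood can be computed and maximized directly. The sketch below uses plain gradient ascent, chosen here for simplicity rather than the Newton / IRLS approach; the data, learning rate, and step count are made-up assumptions, and the gradient used is \(\sum_i (y_i - p_i)\,\mathbf{x}_i\).

```python
import math


def sigmoid(s):
    """Logistic sigmoid, arranged to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


def log_likelihood(beta, X, y):
    """Sum of y_i*log(p_i) + (1-y_i)*log(1-p_i), with p_i = sigmoid(beta . x_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p_i = sigmoid(sum(b * v for b, v in zip(beta, x_i)))
        total += y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i)
    return total


def fit_gradient_ascent(X, y, learning_rate=0.1, steps=5000):
    """Maximize the log-likelihood by repeatedly stepping along its gradient."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x_i, y_i in zip(X, y):
            p_i = sigmoid(sum(b * v for b, v in zip(beta, x_i)))
            for j, v in enumerate(x_i):
                grad[j] += (y_i - p_i) * v
        beta = [b + learning_rate * g for b, g in zip(beta, grad)]
    return beta


# Made-up, non-separable data; the leading 1.0 in each row is the intercept term.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 1]

beta_start = [0.0, 0.0]
beta_hat = fit_gradient_ascent(X, y)

print("log-likelihood at start:", log_likelihood(beta_start, X, y))
print("log-likelihood after fit:", log_likelihood(beta_hat, X, y))
print("mean cross-entropy after fit:", -log_likelihood(beta_hat, X, y) / len(y))
```

The log-likelihood rises from its starting value as the coefficients are updated, and the mean cross-entropy printed at the end is just the negative log-likelihood divided by the number of observations. The toy labels overlap on purpose: with perfectly separable data the maximum-likelihood coefficients grow without bound.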
Linear vs Logistic: Key Differences
| Feature | Linear Regression | Logistic Regression |
|---|---|---|
| Output | Real value | Probability (0 to 1) |
| Task | Regression | Classification |
| Valid Probabilities | No | Yes |
| Decision Boundary | Linear | Linear (in log-odds) |
| Optimization | Least Squares | MLE / Cross-Entropy |
| Multiclass Extension | No | Softmax |
Key Takeaways
- Linear regression on 0/1 labels can approximate class probabilities, but its outputs are not guaranteed to be valid probabilities
- Logistic regression produces valid probabilities by passing a linear score through the sigmoid
- The model is linear in log-odds
- It is estimated by Maximum Likelihood
- It is the foundation of many modern classification methods
(Optional Extensions)
- Regularization (L1 / L2)
- Multiclass Softmax Regression
- Decision boundary geometry
- Gradient / Hessian derivation
- Newton / IRLS optimization