Bagging & Ensemble Methods
1. What is Bagging (Bootstrap Aggregation)?
Bagging creates B bootstrap datasets by sampling the original training data with replacement, then trains a separate model on each one, giving B models in total.
Each model:
\[\hat{f}^{(b)}(x), \quad b = 1,2,...,B\]
Final prediction (Regression):
\[\hat{f}_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{(b)}(x)\]
Final prediction (Classification):
\[\hat{y} = \text{majority vote of } \hat{y}^{(1)},...,\hat{y}^{(B)}\]
Bagging averages predictions, not parameters → it works for any model.
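The regression rule above can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: the base learner (a 1-nearest-neighbour regressor), the toy data, and the helper names `bootstrap_sample`, `fit_1nn`, and `bagged_regressor` are hypothetical choices, not part of these notes.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_1nn(train):
    """Return a 1-nearest-neighbour regressor fitted on (x, y) pairs.
    (Assumed base learner; any model with a predict function would do.)"""
    def predict(x):
        nearest = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

def bagged_regressor(data, B, seed=0):
    """Train B base models on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = [fit_1nn(bootstrap_sample(data, rng)) for _ in range(B)]
    def predict(x):
        return sum(m(x) for m in models) / B
    return predict

# Hypothetical toy data: y = x^2 plus Gaussian noise.
noise = random.Random(42)
data = [(x / 10, (x / 10) ** 2 + noise.gauss(0, 0.1)) for x in range(-20, 21)]

f_bag = bagged_regressor(data, B=50)
print(f_bag(1.0))  # should land near the true value 1.0
```

For classification, the same structure applies with the average replaced by a majority vote over the B predicted labels.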
2. Why Bagging Works (Variance Reduction)
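Bagging works like the wisdom of the crowd: averaging many noisy opinions cancels much of the individual noise.
If we average n independent estimators, each with variance $\sigma^2$:
\[Var(\bar{Z}) = \frac{\sigma^2}{n}\]
Thus averaging reduces variance.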
Total error:
\[MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2\]
Bagging reduces the variance term, typically without increasing bias, so the total error falls.
Cost: B models must be trained instead of one.
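The $\sigma^2/n$ rule from this section can be checked numerically. The sketch below assumes Gaussian draws and uses a hypothetical helper `variance_of_mean`; it estimates the variance of the mean of n independent draws and prints it next to the theoretical value.

```python
import random
import statistics

def variance_of_mean(n, sigma=1.0, trials=10000, seed=0):
    """Empirical variance of the mean of n independent N(0, sigma^2) draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

for n in (1, 5, 25):
    # The empirical value should be close to sigma^2 / n.
    print(n, round(variance_of_mean(n), 4), "theory:", 1.0 / n)
```

Bootstrap models are trained on overlapping data, so they are not fully independent and the reduction seen in practice is smaller than this idealized case.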
3. Mathematical Analysis of Bagging
Let:
\[y_b(x) = h(x) + \epsilon_b(x)\]
Where:
- $h(x)$ = true function
- $\epsilon_b(x)$ = error of model $b$
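Error of a single model (averaged over the B models)
\[E_{single} = \frac{1}{B}\sum_{b=1}^{B} E_x[(y_b(x) - h(x))^2] = \frac{1}{B}\sum_{b=1}^{B} E_x[\epsilon_b(x)^2]\]
Error of combined model
\[E_{comb} = E_x\left[\left(\frac{1}{B}\sum_{b=1}^{B} y_b(x) - h(x)\right)^2\right]\]
Theorem 1: Ensemble is never worse than the average single model
The expected error of the ensemble is ≤ the average expected error of its individual models. (It can still be worse than the best individual model.)
Squaring is convex, so Jensen's inequality gives $\left(\frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)\right)^2 \le \frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)^2$ at every $x$. Taking $E_x$ of both sides:
\[E_{single} \ge E_{comb}\]
Theorem 2: Error can shrink by 1/B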
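Expand:
\[E_{comb} = E_x\left[\left(\frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)\right)^2\right]\]
\[= E_x\left[\frac{1}{B^2}\sum_{b=1}^{B}\epsilon_b(x)^2 + \frac{2}{B^2}\sum_{j<k}\epsilon_j(x)\epsilon_k(x)\right]\]
If the model errors have zero mean and are uncorrelated (for example, because the models are independent):
\[E_x[\epsilon_j(x)\epsilon_k(x)] = 0, \quad j \ne k\]
Then, with $E_{single}$ averaged over the $B$ models:
\[E_{comb} = \frac{1}{B^2}\sum_{b=1}^{B}E_x[\epsilon_b(x)^2] = \frac{1}{B}E_{single}\]
If models are identical → no gain
More independence → better ensemble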
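Both extremes of Theorem 2 can be illustrated with simulated errors. The sketch below is an assumption-laden toy: errors are drawn as standard Gaussians, and the helper `ensemble_errors` is hypothetical. It compares the case where every model has its own independent error with the case where all models share one error.

```python
import random

def ensemble_errors(B, identical, n_points=20000, seed=0):
    """Return (E_single, E_comb) for B models with simulated N(0, 1) errors.

    identical=True:  all models share the same error at each x (no diversity).
    identical=False: errors are drawn independently for every model.
    """
    rng = random.Random(seed)
    single_sq, comb_sq = 0.0, 0.0
    for _ in range(n_points):
        if identical:
            shared = rng.gauss(0.0, 1.0)
            eps = [shared] * B
        else:
            eps = [rng.gauss(0.0, 1.0) for _ in range(B)]
        single_sq += sum(e * e for e in eps) / B  # average single-model squared error
        comb_sq += (sum(eps) / B) ** 2            # squared error of the averaged model
    return single_sq / n_points, comb_sq / n_points

e_single, e_comb = ensemble_errors(B=10, identical=False)
print("independent:", e_single, e_comb)  # e_comb should be close to e_single / 10

e_single_id, e_comb_id = ensemble_errors(B=10, identical=True)
print("identical:  ", e_single_id, e_comb_id)  # the two should match: no gain
```

Real ensembles sit between these extremes, because models trained on related data make partly correlated errors.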
4. General Ensemble Learning
Ensemble = Combine multiple models to improve prediction.
Types
1. Bagging
- Parallel models
- Reduce variance
- Example: Random Forest
2. Boosting
- Sequential models
- Mainly reduces bias (can also reduce variance)
- Example: AdaBoost / Gradient Boosting
3. Stacking
- Combine different model types using a meta-model
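Of the three types, stacking is the least obvious to picture, so here is a minimal sketch. Everything in it is an assumption made for illustration: the two base models `model_a` and `model_b`, the held-out data, and the choice of a linear meta-model without intercept fitted by least squares (`fit_linear_meta`). In practice the meta-model is usually trained on predictions for data the base models did not see.

```python
def fit_linear_meta(base_preds, targets):
    """Least-squares weights (w1, w2) so that target ~ w1*p1 + w2*p2.

    base_preds is a list of (p1, p2) pairs, one pair per held-out example.
    Solves the 2x2 normal equations with Cramer's rule.
    """
    a11 = sum(p1 * p1 for p1, _ in base_preds)
    a12 = sum(p1 * p2 for p1, p2 in base_preds)
    a22 = sum(p2 * p2 for _, p2 in base_preds)
    b1 = sum(p1 * t for (p1, _), t in zip(base_preds, targets))
    b2 = sum(p2 * t for (_, p2), t in zip(base_preds, targets))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base predictions are collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical, already-trained base models of different types.
def model_a(x):
    return x

def model_b(x):
    return x * x

# Hypothetical held-out data generated from y = 0.3*x + 0.7*x^2.
held_out_x = [0.0, 1.0, 2.0, 3.0, 4.0]
held_out_y = [0.3 * x + 0.7 * x * x for x in held_out_x]

base_preds = [(model_a(x), model_b(x)) for x in held_out_x]
w1, w2 = fit_linear_meta(base_preds, held_out_y)

def stacked_predict(x):
    """Meta-model output: a learned combination of the base predictions."""
    return w1 * model_a(x) + w2 * model_b(x)

print(w1, w2, stacked_predict(5.0))
```

Unlike bagging, the combination weights here are learned from data rather than fixed at 1/B.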
5. Can Different Models Be Ensembled?
Yes: this is a heterogeneous ensemble.
You can combine:
- Linear model + Tree + Neural Net
- SVM + Random Forest + Logistic regression
- Any models that produce a prediction output
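Because a heterogeneous ensemble only consumes each model's predictions, the combination step needs no knowledge of the models themselves. A minimal sketch, assuming the predictions are already available as plain lists; the helper names `weighted_average` and `majority_vote` are hypothetical:

```python
from collections import Counter

def weighted_average(predictions, weights=None):
    """Combine regression outputs; uses equal weights if none are given."""
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(w * p for w, p in zip(weights, predictions))

def majority_vote(labels):
    """Return the most common predicted label (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs from three different model types for one input.
print(weighted_average([2.0, 3.0, 4.0]))
print(weighted_average([2.0, 3.0, 4.0], weights=[0.5, 0.25, 0.25]))
print(majority_vote(["cat", "dog", "cat"]))
```

Non-uniform weights are one way to favour the more accurate models; how to choose them is left open here.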
Common combination methods:
Averaging (Regression)
\[\hat{y} = \sum_{m=1}^{M} w_m \hat{y}_m\]
with weights $w_m \ge 0$ that sum to 1 (plain averaging uses $w_m = 1/M$).
Majority Vote (Classification)
\[\hat{y} = \arg\max_k \sum_{m=1}^{M} I(\hat{y}_m = k)\]
Stacking (Meta-Learning)
Train a second model $g$ on the base models' predictions:
\[\hat{y} = g(\hat{y}_1, \hat{y}_2, ..., \hat{y}_M)\]
6. When Ensemble Works Best
An ensemble improves when:
- Models are individually reasonably accurate
- Models are diverse (uncorrelated errors)
- Individual models are not identical
Key idea:
\[Var(\text{ensemble}) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2\]
Where:
- $\sigma^2$ = variance of each individual model
- $\rho$ = pairwise correlation between models
- Lower $\rho$ → stronger ensemble
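Plugging numbers into the variance formula shows how correlation caps the benefit of adding models. The function name `ensemble_variance` is a hypothetical helper that simply evaluates the expression above.

```python
def ensemble_variance(sigma2, rho, B):
    """Variance of the average of B estimators, each with variance sigma2
    and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) / B * sigma2

for rho in (0.0, 0.5, 1.0):
    row = [round(ensemble_variance(1.0, rho, B), 3) for B in (1, 10, 100)]
    print("rho =", rho, "->", row)
```

With $\sigma^2 = 1$, going from B = 1 to B = 100 takes the variance from 1.0 to 0.01 when $\rho = 0$, only to 0.505 when $\rho = 0.5$, and leaves it at 1.0 when $\rho = 1$. As B grows, the variance approaches the floor $\rho\sigma^2$.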
7. Limitations
- High computation cost
- Harder to interpret
- Little gain if models are highly correlated
8. Summary
- Bagging reduces variance
- Ensemble error ≤ average single-model error
- Independence between models is critical
- Can combine same or different model types
- Foundation of Random Forest, Boosting, Stacking