Bagging & Ensemble

Posted Feb 8, 2026 Updated Mar 25, 2026

🌲 Bagging & Ensemble Methods

🎯 1. What is Bagging (Bootstrap Aggregation)?

Bagging creates B bootstrap datasets, each drawn with replacement from the original training data, and trains one model on each, giving B separate models.

Each model:

\[\hat{f}^{(b)}(x), \quad b = 1,2,...,B\]

Final prediction (Regression):

\[\hat{f}_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{(b)}(x)\]

Final prediction (Classification):

\[\hat{y} = \text{majority vote of } \hat{y}^{(1)},...,\hat{y}^{(B)}\]

📌 Bagging averages predictions, not parameters → works for ANY model.
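The regression case above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the base model (a degree-5 polynomial fitted with `np.polyfit`), the synthetic sine data, and the helper name `bagged_predict` are all assumptions chosen for the demo.

```python
import numpy as np

def bagged_predict(x_train, y_train, x_test, B=50, degree=5, seed=0):
    """Fit B polynomial models on bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((B, len(x_test)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # n indices drawn with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds[b] = np.polyval(coeffs, x_test)
    return preds.mean(axis=0), preds       # bagged prediction, individual predictions

# Synthetic data (assumed for illustration): noisy sine curve
rng = np.random.default_rng(42)
x_train = np.sort(rng.uniform(-1, 1, 40))
y_train = np.sin(3 * x_train) + rng.normal(0, 0.3, size=40)
x_test = np.linspace(-0.9, 0.9, 25)

y_bag, all_preds = bagged_predict(x_train, y_train, x_test)
print(y_bag.shape)       # one averaged prediction per test point
print(all_preds.shape)   # B rows, one per bootstrap model
```

Only predictions are averaged, so the polynomial fit could be swapped for any other model that exposes a fit step and a predict step.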

🧠 2. Why Bagging Works (Variance Reduction)

Bagging is similar to the wisdom of crowds.

If we average $n$ independent estimators $Z_1, ..., Z_n$, each with variance $\sigma^2$:

\[Var(\bar{Z}) = \frac{\sigma^2}{n}\]

Thus averaging reduces variance.
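A quick simulation can confirm the $\sigma^2/n$ rule. The values of `sigma`, `n`, and `trials` below are arbitrary choices for the demo, and the draws are assumed to be independent Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 25, 100_000

# Each row holds n independent draws, each with variance sigma**2
Z = rng.normal(loc=0.0, scale=sigma, size=(trials, n))

empirical = Z.mean(axis=1).var()   # variance of the row averages
theoretical = sigma**2 / n

print(empirical, theoretical)      # the two values should be close
```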

Total error:

\[MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2\]
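The decomposition can be checked numerically. The sketch below assumes a deliberately biased estimator (the sample mean shrunk by a factor of 0.8) and a true parameter `theta = 3.0`; both are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 3.0                                   # true parameter (assumed for the demo)
samples = rng.normal(theta, 1.0, size=(50_000, 10))

# A deliberately biased estimator: shrink the sample mean toward zero
theta_hat = 0.8 * samples.mean(axis=1)

mse = np.mean((theta_hat - theta) ** 2)
var = np.var(theta_hat)                       # population variance (ddof=0)
bias = np.mean(theta_hat) - theta

print(mse, var + bias**2)                     # the two numbers agree
```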

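👉 Bagging reduces variance while leaving bias roughly unchanged, because every model is the same kind of estimator trained on a bootstrap sample.

Cost: B models must be trained instead of one.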

📊 3. Mathematical Analysis of Bagging

Let:

\[y_b(x) = h(x) + \epsilon_b(x)\]

Where:

$h(x)$ = true function
$\epsilon_b(x)$ = error of model $b$

Error of single model

\[E_{single} = E_x[(y_b(x) - h(x))^2] = E_x[\epsilon_b(x)^2]\]

Error of combined model

\[E_{comb} = E_x\left[\left(\frac{1}{B}\sum_{b=1}^{B} y_b(x) - h(x)\right)^2\right]\]

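📌 Theorem 1: Ensemble is never worse than the average member

The expected error of the ensemble is at most the average expected error of its members. A particular member can still beat the ensemble.

Using Jensen’s inequality, with $E_{single}$ taken as the average over the B models:

\[E_{single} = \frac{1}{B}\sum_{b=1}^{B}E_x[\epsilon_b(x)^2] \ge E_{comb}\]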
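A short sketch of the Jensen step, using the $\epsilon_b(x)$ notation from above. Because squaring is a convex function, at every $x$:

\[\left(\frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)\right)^2 \le \frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)^2\]

Taking the expectation over $x$ on both sides:

\[E_{comb} \le \frac{1}{B}\sum_{b=1}^{B}E_x[\epsilon_b(x)^2]\]

The right-hand side is the average error of the B individual models. No independence assumption is needed for this step.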

📌 Theorem 2: Error can shrink by a factor of 1/B

Expand:

\[E_{comb} = E_x\left[\left(\frac{1}{B}\sum_{b=1}^{B}\epsilon_b(x)\right)^2\right]\]

\[= E_x\left[\frac{1}{B^2}\sum_{b=1}^{B}\epsilon_b(x)^2 + \frac{2}{B^2}\sum_{j<k}\epsilon_j(x)\epsilon_k(x)\right]\]

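If the errors have zero mean and are uncorrelated across models, then for $j \ne k$:

\[E_x[\epsilon_j(x)\epsilon_k(x)] = 0\]

The cross terms vanish. With $E_{single}$ taken as the average error of the individual models:

\[E_{comb} = \frac{1}{B^2}\sum_{b=1}^{B}E_x[\epsilon_b(x)^2] = \frac{1}{B}E_{single}\]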

📌 If models identical → no gain
📌 More independence → better ensemble
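Both extremes can be simulated directly. The sketch below assumes Gaussian errors with unit variance and compares independent errors against B identical copies of one model's error; the helper name `errors` is introduced only for this demo.

```python
import numpy as np

rng = np.random.default_rng(1)
B, n_points = 10, 100_000

# Case 1: independent zero-mean errors, one row per model
eps_indep = rng.normal(0.0, 1.0, size=(B, n_points))
# Case 2: identical models, every row repeats the same error
eps_same = np.tile(eps_indep[0], (B, 1))

def errors(eps):
    """Return (average single-model error, error of the averaged prediction)."""
    e_single = np.mean(eps ** 2)
    e_comb = np.mean(eps.mean(axis=0) ** 2)
    return e_single, e_comb

print(errors(eps_indep))   # combined error is roughly e_single / B
print(errors(eps_same))    # combined error equals e_single: no gain
```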

🌐 4. General Ensemble Learning

Ensemble = Combine multiple models to improve prediction.

Types

1. Bagging

Parallel models
Reduce variance
Example: Random Forest 🌲

2. Boosting

Sequential models
Mainly reduce bias (can also reduce variance)
Example: AdaBoost / Gradient Boosting ⚡

3. Stacking

Combines different model types using a meta‑model

🔀 5. Can Different Models Be Ensembled?

YES — Heterogeneous Ensemble

You can combine:

Linear model + Tree + Neural Net
SVM + Random Forest + Logistic
Any models with prediction output

Common combination methods:

Weighted averaging (Regression), with weights $w_m \ge 0$ that sum to 1 (plain averaging uses $w_m = 1/M$)

\[\hat{y} = \sum_{m=1}^{M} w_m \hat{y}_m\]
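As a small worked example, the weighted sum is a single matrix-vector product. The prediction values and the weights below are made up for illustration.

```python
import numpy as np

# Hypothetical predictions from M = 3 regressors on 4 inputs
preds = np.array([
    [2.0, 3.1, 4.2, 5.0],   # model 1
    [1.8, 3.0, 4.0, 5.4],   # model 2
    [2.4, 2.9, 4.4, 4.9],   # model 3
])
weights = np.array([0.5, 0.3, 0.2])   # assumed weights: non-negative, summing to 1

y_hat = weights @ preds               # weighted average per input
print(y_hat)
```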

Majority Vote (Classification)

\[\hat{y} = \arg\max_k \sum_{m=1}^{M} I(\hat{y}_m = k)\]
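The vote count can be sketched as follows. The function name `majority_vote` and the label matrix are assumptions for the demo; classes are assumed to be encoded as integers 0 to K-1.

```python
import numpy as np

def majority_vote(votes, n_classes):
    """votes: (M, N) integer labels from M models -> (N,) most frequent label."""
    # counts[k, i] = number of models that predicted class k for input i
    counts = np.stack([(votes == k).sum(axis=0) for k in range(n_classes)])
    return counts.argmax(axis=0)   # ties resolve to the smallest class index

votes = np.array([
    [0, 1, 2, 1],   # model 1
    [0, 2, 2, 0],   # model 2
    [1, 1, 2, 0],   # model 3
])
print(majority_vote(votes, n_classes=3))   # [0 1 2 0]
```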

Stacking (Meta‑Learning)

Train a second model $g$ on the base models' predictions:

\[\hat{y} = g(\hat{y}_1, \hat{y}_2, ..., \hat{y}_M)\]
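A minimal sketch of the idea, assuming the meta-model $g$ is a linear regression fitted with `np.linalg.lstsq`. The base predictions are synthetic noisy copies of the target. For brevity the meta-model is fitted and evaluated on the same data here; in practice it is usually fitted on held-out (out-of-fold) predictions to avoid overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)

# Hypothetical predictions of M = 3 base models (noisy or shrunken copies of y)
base_preds = np.column_stack([
    y + rng.normal(0, 0.5, n),
    y + rng.normal(0, 1.0, n),
    0.5 * y + rng.normal(0, 0.3, n),
])

# Meta-model g: linear regression with intercept on the base predictions
X_meta = np.column_stack([np.ones(n), base_preds])
coef, *_ = np.linalg.lstsq(X_meta, y, rcond=None)
y_stack = X_meta @ coef

mse_base = ((base_preds - y[:, None]) ** 2).mean(axis=0)   # one value per base model
mse_stack = ((y_stack - y) ** 2).mean()
print(mse_base, mse_stack)
```

On the fitting data the linear meta-model can do no worse than any single base model, because copying one base prediction is one of the solutions it can choose.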

📌 6. When Ensemble Works Best

An ensemble improves on its members when:

Each model is reasonably accurate on its own
Models are diverse (their errors are weakly correlated)
Models are not identical copies of each other

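Key idea, for B models that each have variance $\sigma^2$ and pairwise correlation $\rho$:

\[Var(\text{ensemble}) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2\]

As B grows the second term vanishes, so the variance cannot drop below $\rho\sigma^2$; lowering the correlation $\rho$ is what lets a larger ensemble keep helping.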
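The formula can be checked against a simulation. The sketch below assumes B estimators with common variance `sigma2` and common pairwise correlation `rho`, built from a shared component plus an independent component; the function name `ensemble_variance` and the parameter values are chosen for the demo.

```python
import numpy as np

def ensemble_variance(rho, sigma2, B):
    """Variance of the average of B estimators with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) / B * sigma2

rho, sigma2, B = 0.3, 1.0, 20
rng = np.random.default_rng(0)

# Equicorrelated errors: a shared part (weight sqrt(rho)) plus an independent part
shared = rng.normal(size=(100_000, 1))
own = rng.normal(size=(100_000, B))
Z = np.sqrt(sigma2) * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own)

print(Z.mean(axis=1).var(), ensemble_variance(rho, sigma2, B))   # should be close

# Adding models helps less and less: the variance approaches rho * sigma2
for n_models in (1, 10, 100, 1000):
    print(n_models, ensemble_variance(rho, sigma2, n_models))
```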