Statistical Interpret for MLE
Statistical Interpret for MLE
📊 Bias, Variance, and Mean Squared Error (MSE)
1. Definitions
Bias
Bias is the systematic error of an estimator.
\[\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta\]Variance
Variance measures the expected squared deviation from the mean.
\[\mathrm{Var}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\right]\]Mean Squared Error (MSE)
Mean Squared Error measures the expected squared difference between the estimator and the true value.
\[\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\left[\|\hat{\theta} - \theta\|^2\right] = \mathbb{E}\left[\sum_{j=1}^{d} (\hat{\theta}_j - \theta_j)^2 \right]\]2. Bias–Variance Decomposition Theorem
Theorem: The MSE of an estimator equals the sum of its variance and squared bias.
\[\mathrm{MSE}(\hat{\theta}) = \mathrm{tr}(\mathrm{Var}(\hat{\theta})) + \|\mathrm{Bias}(\hat{\theta})\|^2\]3. Proof (Scalar Case)
\[\mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \mathbb{E}\left[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\right] + (\mathbb{E}[\hat{\theta}] - \theta)^2\]Expand:
\[\mathbb{E}\left[(\hat{\theta} - \theta)^2\right] = \mathbb{E}\left[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\right]\] \[= \mathbb{E}\left[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2 + 2(\hat{\theta} - \mathbb{E}[\hat{\theta}])(\mathbb{E}[\hat{\theta}] - \theta) + (\mathbb{E}[\hat{\theta}] - \theta)^2 \right]\] \[= \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2] + 2\mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])](\mathbb{E}[\hat{\theta}] - \theta) + (\mathbb{E}[\hat{\theta}] - \theta)^2\]Since:
\[\mathbb{E}[\hat{\theta} - \mathbb{E}[\hat{\theta}]] = 0\] \[\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2\]4. MSE of Linear Regression
Using bias–variance decomposition:
\[\mathrm{MSE}(\hat{\beta} \mid X) = \mathrm{Var}(\hat{\beta} \mid X) + \mathrm{Bias}(\hat{\beta} \mid X)^2\]For Ordinary Least Squares (OLS):
- Linear regression is unbiased:
- Variance:
Therefore:
\[\mathrm{MSE}(\hat{\beta} \mid X) = \sigma^2 (X^T X)^{-1}\]Notes
- Linear regression estimator is unbiased.
- As sample size increases, variance decreases.
- With infinite data, variance converges to 0.
- OLS (MLE) is the Best Linear Unbiased Estimator (BLUE) under Gauss–Markov assumptions.
This post is licensed under CC BY 4.0 by the author.