Bias-variance Decomposition
The notation used is as follows:
| Symbol | Meaning |
|---|---|
| \(\mathcal D\) | the dataset |
| \(x\) | the sample |
| \(y_\mathcal D\) | the observation of \(x\) in \(\mathcal D\), affected by noise |
| \(y\) | the true value of \(x\) |
| \(\bar y\) | the mean of the true values |
| \(f\) | the model learned from \(\mathcal D\) |
| \(f(x)\) | the prediction of \(f\) for \(x\) |
| \(\bar f(x)\) | the expected prediction of \(f\) for \(x\) |
| \(l(f(x), y_\mathcal D)\) | the loss function, chosen to be squared error |
Assuming that the observation noise averages to \(0\), i.e. \(E[y - y_\mathcal D] = 0\), and that the two factors of each cross term are uncorrelated, so that the cross term factors into a product of expectations, the expected error is

\[
\begin{aligned}
E_{x \sim \mathcal D}&[l(f(x), y_\mathcal D)] = E[(f(x) - y_\mathcal D)^2] = E\{[(f(x) - \bar f(x)) + (\bar f(x) - y_\mathcal D)]^2\} \\
&= E[(f(x) - \bar f(x))^2] + E[(\bar f(x) - y_\mathcal D)^2] + 2E[(f(x) - \bar f(x))(\bar f(x) - y_\mathcal D)] \\
&= E[(f(x) - \bar f(x))^2] + E\{[(\bar f(x) - y) + (y - y_\mathcal D)]^2\} \\
&\quad + 2\underbrace{E[f(x) - \bar f(x)]}_0 E[\bar f(x) - y_\mathcal D] \\
&= E[(f(x) - \bar f(x))^2] + E[(\bar f(x) - y)^2] + E[(y - y_\mathcal D)^2] + 2E[(\bar f(x) - y)(y - y_\mathcal D)] \\
&= E[(f(x) - \bar f(x))^2] + E[(y - y_\mathcal D)^2] + E\{[(\bar f(x) - \bar y) + (\bar y - y)]^2\} \\
&\quad + 2E[\bar f(x) - y] \underbrace{E[y - y_\mathcal D]}_0 \\
&= E[(f(x) - \bar f(x))^2] + E[(y - y_\mathcal D)^2] + E\{[(\bar f(x) - \bar y) + (\bar y - y)]^2\} \\
&= E[(f(x) - \bar f(x))^2] + E[(y - y_\mathcal D)^2] + E[(\bar f(x) - \bar y)^2] + E[(\bar y - y)^2] + 2E[(\bar f(x) - \bar y)(\bar y - y)] \\
&= E[(f(x) - \bar f(x))^2] + E[(y - y_\mathcal D)^2] + E[(\bar f(x) - \bar y)^2] + E[(\bar y - y)^2] \\
&\quad + 2E[\bar f(x) - \bar y]\underbrace{E[\bar y - y]}_0 \\
&= \underbrace{E[(f(x) - \bar f(x))^2]}_{\text{variance}} + \underbrace{E[(\bar f(x) - \bar y)^2]}_{\text{bias}^2} + \underbrace{E[(y - y_\mathcal D)^2]}_{\text{noise}} + \underbrace{E[(\bar y - y)^2]}_{\text{scatter}}
\end{aligned}
\]

The three underbraced zeros hold because \(\bar f(x)\) is the expectation of \(f(x)\), the noise has zero mean, and \(\bar y\) is the mean of \(y\). The last factorization is the strongest assumption: \(\bar f(x)\) and \(y\) both depend on \(x\), and in general \(E[(\bar f(x) - \bar y)(\bar y - y)] = -\operatorname{Cov}(\bar f(x), y)\), which vanishes only when the expected prediction is uncorrelated with the true value. Without that assumption, the decomposition stops at the variance, \(E[(\bar f(x) - y)^2]\), and the noise.

5 ways to achieve right balance of Bias and Variance in ML model | by Niwratti Kasture | Analytics Vidhya | Medium
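The decomposition can be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes a toy setup chosen purely for illustration: true values \(y = \sin(x)\), Gaussian noise, and a cubic polynomial as the model \(f\). The function name `bias_variance_demo` and all of its parameters are hypothetical. It estimates the variance, \(E[(\bar f(x) - y)^2]\), and the noise by Monte Carlo over many training sets, and also reports the pieces of the final split into \((\bar f(x) - \bar y)\) and \((\bar y - y)\), including its cross term.

```python
import numpy as np


def bias_variance_demo(n_datasets=500, n_train=30, degree=3, noise_std=0.3, seed=0):
    """Monte Carlo estimate of each term for a toy regression problem."""
    rng = np.random.default_rng(seed)

    # Fixed evaluation samples x and their true values y (assumption: y = sin(x)).
    x_test = np.linspace(0.0, np.pi, 50)
    y_true = np.sin(x_test)

    # Train one model f per dataset D and record its predictions f(x).
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x_train = rng.uniform(0.0, np.pi, n_train)
        y_train = np.sin(x_train) + rng.normal(0.0, noise_std, n_train)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[i] = np.polyval(coeffs, x_test)

    f_bar = preds.mean(axis=0)  # \bar f(x): prediction averaged over datasets
    y_bar = y_true.mean()       # \bar y: mean of the true values

    # Noisy observations y_D at the evaluation samples,
    # drawn independently of the training noise.
    y_obs = y_true + rng.normal(0.0, noise_std, preds.shape)

    a = f_bar - y_bar   # (\bar f(x) - \bar y)
    b = y_bar - y_true  # (\bar y - y)
    return {
        "total": np.mean((preds - y_obs) ** 2),
        "variance": np.mean((preds - f_bar) ** 2),
        "bias_sq": np.mean((f_bar - y_true) ** 2),
        "noise": np.mean((y_true - y_obs) ** 2),
        "centered_bias_sq": np.mean(a ** 2),
        "scatter": np.mean(b ** 2),
        "cross": 2.0 * np.mean(a * b),
    }


if __name__ == "__main__":
    for name, value in bias_variance_demo().items():
        print(f"{name:>17}: {value:.4f}")
```

Under these assumptions, `total` should agree with `variance + bias_sq + noise` up to Monte Carlo error, while `bias_sq` equals `centered_bias_sq + scatter + cross` exactly. In this particular setup `cross` should come out clearly negative, because the averaged fit closely tracks \(\sin(x)\), so the scatter term only separates cleanly when that cross term is negligible.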