Bias-variance Decomposition

The notation used is as follows:

| Symbol | Meaning |
|---|---|
| $D$ | the dataset |
| $x$ | a sample |
| $y_D$ | the observed value of $x$ in $D$, affected by noise |
| $y$ | the real value of $x$ |
| $\bar{y}$ | the mean of the real values |
| $f$ | the model learned from $D$ |
| $f(x)$ | the prediction of $f$ on $x$ |
| $\bar{f}(x)$ | the expected prediction of $f$ on $x$, i.e. $\mathbb{E}_D[f(x)]$ |
| $\ell(f(x), y_D)$ | the loss function, chosen to be the squared error |

By assuming that the observation errors average to 0, i.e. $\mathbb{E}[y - y_D] = 0$, the expectation of the error decomposes as

$$
\begin{aligned}
\mathbb{E}_D\big[\ell(f(x), y_D)\big]
&= \mathbb{E}\big[(f(x)-y_D)^2\big] \\
&= \mathbb{E}\Big\{\big[(f(x)-\bar{f}(x)) + (\bar{f}(x)-y_D)\big]^2\Big\} \\
&= \mathbb{E}\big[(f(x)-\bar{f}(x))^2\big] + \mathbb{E}\big[(\bar{f}(x)-y_D)^2\big] + 2\,\mathbb{E}\big[(f(x)-\bar{f}(x))(\bar{f}(x)-y_D)\big] \\
&= \mathbb{E}\big[(f(x)-\bar{f}(x))^2\big] + \mathbb{E}\Big\{\big[(\bar{f}(x)-y) + (y-y_D)\big]^2\Big\} + 2\,\underbrace{\mathbb{E}\big[f(x)-\bar{f}(x)\big]}_{0}\,\mathbb{E}\big[\bar{f}(x)-y_D\big] \\
&= \mathbb{E}\big[(f(x)-\bar{f}(x))^2\big] + \mathbb{E}\big[(\bar{f}(x)-y)^2\big] + \mathbb{E}\big[(y-y_D)^2\big] + 2\,\mathbb{E}\big[(\bar{f}(x)-y)(y-y_D)\big] \\
&= \mathbb{E}\big[(f(x)-\bar{f}(x))^2\big] + \mathbb{E}\big[(y-y_D)^2\big] + \mathbb{E}\Big\{\big[(\bar{f}(x)-\bar{y}) + (\bar{y}-y)\big]^2\Big\} + 2\,\mathbb{E}\big[\bar{f}(x)-y\big]\,\underbrace{\mathbb{E}\big[y-y_D\big]}_{0} \\
&= \mathbb{E}\big[(f(x)-\bar{f}(x))^2\big] + \mathbb{E}\big[(y-y_D)^2\big] + \mathbb{E}\big[(\bar{f}(x)-\bar{y})^2\big] + \mathbb{E}\big[(\bar{y}-y)^2\big] + 2\,\mathbb{E}\big[\bar{f}(x)-\bar{y}\big]\,\underbrace{\mathbb{E}\big[\bar{y}-y\big]}_{0} \\
&= \underbrace{\mathbb{E}\big[(f(x)-\bar{f}(x))^2\big]}_{\text{variance}} + \underbrace{\mathbb{E}\big[(\bar{f}(x)-\bar{y})^2\big]}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(y-y_D)^2\big]}_{\text{noise}} + \underbrace{\mathbb{E}\big[(\bar{y}-y)^2\big]}_{\text{scatter}}
\end{aligned}
\tag{1}
$$
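The decomposition can be checked numerically by Monte Carlo: train a model on many independently drawn datasets, then compare the averaged squared error against the sum of the estimated terms. The sketch below is illustrative only — the sine ground truth, Gaussian noise, and degree-3 polynomial fit are all assumed choices, not anything prescribed above. It evaluates at a single fixed test point, where $y$ is a constant, so $\bar{y} = y$ and the scatter term vanishes, leaving variance + bias² + noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # assumed ground-truth function y = sin(x) (illustrative choice)
    return np.sin(x)

sigma = 0.3        # std dev of the observation noise in y_D
n_train = 30       # samples per dataset D
n_datasets = 5000  # number of independent datasets D
x0 = 1.5           # fixed test point x

preds = np.empty(n_datasets)   # f(x0) for each learned model f
errors = np.empty(n_datasets)  # squared error (f(x0) - y_D)^2
for i in range(n_datasets):
    # draw a fresh dataset D: noisy observations y_D of the real values y
    x = rng.uniform(0.0, np.pi, n_train)
    y_obs = true_fn(x) + rng.normal(0.0, sigma, n_train)
    # learn f from D: degree-3 polynomial least-squares fit
    coef = np.polyfit(x, y_obs, deg=3)
    preds[i] = np.polyval(coef, x0)
    # squared error against a fresh noisy observation at x0
    y0_obs = true_fn(x0) + rng.normal(0.0, sigma)
    errors[i] = (preds[i] - y0_obs) ** 2

f_bar = preds.mean()                      # \bar{f}(x0)
variance = ((preds - f_bar) ** 2).mean()  # E[(f(x) - \bar{f}(x))^2]
bias2 = (f_bar - true_fn(x0)) ** 2        # (\bar{f}(x) - y)^2, here ȳ = y
noise = sigma ** 2                        # E[(y - y_D)^2]

lhs = errors.mean()
rhs = variance + bias2 + noise
print(f"E[error] = {lhs:.4f}   variance + bias² + noise = {rhs:.4f}")
```

With enough datasets the two printed values agree up to Monte Carlo error, and shrinking `deg` (more bias) or `n_train` (more variance) shifts weight between the terms without breaking the equality.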
