Fisher's Linear Discriminant
Two-class
Assume we have a set of \(N\)-dimensional samples \(D = \{x^{(i)}\}^M_{i=1}\) , \(M_1\) of which belong to Class \(1\), and \(M_2\) to Class \(2\) (\(M_1 + M_2 = M\)). We seek to obtain a scalar \(z\) by projecting \(x\) onto a unit vector \(v\): \(z^{(i)} = v^Tx^{(i)}\). Of all the possibilities we would like to select the one that maximizes the separability of the scalars between classes.
The mean of each class in the \(x\)-space and the \(z\)-space is \[ \mu_c = \frac{\sum_{i=1}^M\mathbb I[y^{(i)} = c]x^{(i)}}{\sum_{i=1}^M\mathbb I[y^{(i)} = c]} \\ \tilde \mu_c = \frac{\sum_{i=1}^M\mathbb I[y^{(i)} = c]v^T x^{(i)}}{\sum_{i=1}^M\mathbb I[y^{(i)} = c]} = v^T\mu_c\\ \] The scatter of each class in the \(x\)-space and the \(z\)-space is defined as \[ \begin{aligned} S_c &= \sum_{i=1}^M \mathbb I[y^{(i)} = c](x^{(i)} - \mu_c)(x^{(i)} - \mu_c)^T \\ \tilde s^2_c &= \sum_{i=1}^M\ \mathbb I[y^{(i)} = c](z^{(i)} - \tilde u_c)^2 \\ &= \sum_{i=1}^M\ \mathbb I[y^{(i)} = c](v^Tx^{(i)} - v^Tu_c)^2 \\ &= \sum_{i=1}^M\ \mathbb I[y^{(i)} = c]v^T(x^{(i)} - u_c)v^T(x^{(i)} - u_c) \\ &= \sum_{i=1}^M\ \mathbb I[y^{(i)} = c]v^T(x^{(i)} - u_c)(x^{(i)} - u_c)^Tv \\ &= v^TS_cv \end{aligned} \] FLD suggests in \(z\)-space maximizing the distance among the means of classes (inter-class scatter) and minimizing the variance over each class (within-class scatter), i.e. \[ \begin{aligned} \max_v J(v) &\coloneq \frac{(\tilde \mu_1 - \tilde \mu_2)^2}{\tilde s^2_1 + \tilde s^2_2} \\ &= \frac{(v^T\mu_1 - v^T\mu_2)^2}{v^TS_1v + v^TS_2v} \\ &= \frac{v^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tv}{v^T(S_1 + S_2)v} \\ \end{aligned} \] \(S_W = S_1 + S_2\) is called within-class scatter, \(S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T\) is called inter-class scatter. Then, \[ J(v) = \frac{v^TS_Bv}{v^TS_Wv} \\ \]
Take derivative w.r.t. \(v\) and make it zero to give \[ \begin{aligned} \frac{\partial J}{\partial v} &= v^TS_Wv \frac{\partial v^TS_Bv}{\partial v} - v^TS_Bv \frac{\partial v^TS_Wv}{\partial v}& \\ &= (v^TS_Wv) 2S_Bv- (v^TS_Bv) 2S_Wv \\ \end{aligned} \]
\[ \begin{aligned} (v^TS_Wv) 2S_Bv- (v^TS_Bv) 2S_Wv &= 0 \\ S_Bv - \frac{(v^TS_Bv)}{(v^TS_Wv)} S_Wv &= 0 \\ S_Bv &= J(v) S_Wv \end{aligned} \]
If \(S_W\) is invertible, \(v\) is some eigenvector of \(S^{-1}_WS_B\): \[ S^{-1}_WS_Bv = J(v) v \]
\(S_Bv = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^Tv = \alpha(\mu_1 - \mu_2)\) always points to the same direction as \((\mu_1 - \mu_2)\). Thus we can immediately give a \(v\) and verify: \[ v = S^{-1}_W(\mu_1 - \mu_2) \] \[ S^{-1}_WS_B\underbrace{S^{-1}_W(\mu_1 - \mu_2)}_v = S^{-1}_W\alpha(\mu_1 - \mu_2) = \underbrace{\alpha}_{J(v)} \underbrace{S^{-1}_W(\mu_1 - \mu_2)}_v \\ \]