$f$-divergence

The \(f\)-divergence can be viewed as a generalization of the KL-divergence. It is defined as \[ D_f (p||q) = \int f(\frac{p(x)}{q(x)}) q(x)\ \d x \] where \(f\) must be convex and satisfy \(f(1) = 0\). These two constraints guarantee that \[ \begin{gather} D_f (p||q) = 0 \text{ when $p=q$} \\ \forall p, q,\ D_f (p||q) \ge 0 \end{gather} \] Indeed, by Jensen's inequality, \(D_f(p||q) = \E_q[f(\frac{p}{q})] \ge f(\E_q[\frac{p}{q}]) = f(1) = 0\). When \(f(x) = x\log x\), the \(f\)-divergence becomes the KL-divergence.
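As a quick sanity check, the definition above can be computed directly for discrete distributions, where the integral becomes a sum. This is a minimal sketch; the function name `f_divergence` and the example distributions are my own choices, not from the text:

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence: D_f(p||q) = sum_x q(x) * f(p(x)/q(x))."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# f(x) = x log x recovers the KL-divergence.
f_kl = lambda x: x * math.log(x)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

d = f_divergence(p, q, f_kl)
# Matches the usual KL formula sum_x p(x) log(p(x)/q(x)),
# and f_divergence(p, p, f_kl) is 0, as the constraints require.
kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))
```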

Variational \(f\)-divergence

When \(p\) and \(q\) have no closed-form density, the \(f\)-divergence is difficult to compute directly. In practice it is therefore estimated through a variational representation: \[ D_f (p||q) = \sup_{T:\mathcal X \to \R} \{ \E_p[T(x)] - \E_q[f^* \circ T(x)] \} \] where \(f^*(t) = \sup_u \{ut - f(u)\}\) is the convex conjugate of \(f\). Since \(f\) is convex (and lower semicontinuous), it equals its biconjugate \(f^{**}\), which gives the derivation: \[ \begin{aligned} &D_f (p||q) = \int f(\frac{p(x)}{q(x)}) q(x)\ \d x \\ &= \int f^{**}(\frac{p(x)}{q(x)}) q(x)\ \d x \\ &= \int \sup_t [\frac{p(x)}{q(x)} t - f^*(t)] q(x)\ \d x \\ &= \int \sup_t[p(x) t - f^*(t) q(x)]\ \d x \\ &\Downarrow_{T(x) = \arg \sup_t[p(x) t - f^*(t) q(x)]} \\ &= \sup_{T:\mathcal X \to \R} \int [p(x) T(x) - f^*(T(x)) q(x)]\ \d x \\ &= \sup_{T:\mathcal X \to \R} \{ \E_p[T(x)] - \E_q[f^* \circ T(x)] \} \end{aligned} \] Any fixed \(T\) gives a lower bound on \(D_f(p||q)\); restricting \(T\) to a parametric family (e.g. a neural network) and maximizing over it yields a tractable estimate.
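The derivation above can be checked numerically for discrete distributions. For \(f(x) = x\log x\), the conjugate is \(f^*(t) = e^{t-1}\), and the supremum is attained at \(T(x) = f'(\frac{p(x)}{q(x)}) = 1 + \log\frac{p(x)}{q(x)}\), at which the variational value equals the KL-divergence; any other \(T\) gives a strictly smaller value. The function names below are my own for illustration:

```python
import math

def variational_value(p, q, T, f_star):
    """E_p[T(x)] - E_q[f*(T(x))]: a lower bound on D_f(p||q) for any T."""
    return (sum(px * T(i) for i, px in enumerate(p))
            - sum(qx * f_star(T(i)) for i, qx in enumerate(q)))

# For f(x) = x log x, the convex conjugate is f*(t) = exp(t - 1).
f_star = lambda t: math.exp(t - 1)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Optimal witness: T(x) = 1 + log(p(x)/q(x)); the bound is tight here.
T_opt = lambda i: 1 + math.log(p[i] / q[i])
bound_opt = variational_value(p, q, T_opt, f_star)

# A suboptimal T (constant) yields a strictly smaller value.
T_subopt = lambda i: 0.5
bound_sub = variational_value(p, q, T_subopt, f_star)

kl = sum(px * math.log(px / qx) for px, qx in zip(p, q))
```

Here `bound_opt` coincides with `kl` up to floating-point error, while `bound_sub` stays below it, illustrating that the variational expression is a supremum over lower bounds.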
