\(f\)-divergence
\(f\)-divergence can be treated as a generalization of the KL-divergence. For continuous random variables, it is defined as
\[ D_f (p||q) = \int f(\frac{p(x)}{q(x)}) q(x)\ \d x \]
where \(f\) must be a convex function with \(f(1) = 0\). These two constraints guarantee that
\[ \begin{gather} D_f (p||q) = 0 \text{ when $p=q$} \\ \forall p, q,\ D_f (p||q) \ge 0 \end{gather} \]
The first property follows directly from \(f(1) = 0\). The second follows from Jensen's inequality, since \(f\) is convex and \(q\) is a probability density:
\[ D_f (p||q) = \int f(\frac{p(x)}{q(x)}) q(x)\ \d x \ge f(\int \frac{p(x)}{q(x)} q(x)\ \d x) = f(1) = 0 \]
When \(f(x) = x\log x\), the \(f\)-divergence becomes the KL-divergence (verified below). The log sum inequality can be derived with the same convexity argument, taking \(f(x) = x\log x\), \(a = \sum_{i=1}^n a_i\), and \(b = \sum_{i=1}^n b_i\):
\[ \begin{aligned} \sum_{i=1}^n a_i \log \frac{a_i}{b_i} &= \sum_{i=1}^n b_i \underbrace{\frac{a_i}{b_i} \log \frac{a_i}{b_i}}_{f(\frac{a_i}{b_i})} \\ &= b \sum_{i=1}^n \frac{b_i}{b} f(\frac{a_i}{b_i}) \\ &\ge b f(\sum_{i=1}^n \frac{b_i}{b} \frac{a_i}{b_i}) \\ &= b f(\frac{a}{b}) \\ &= a \log \frac{a}{b} \end{aligned} \]
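To verify the KL special case mentioned above, substitute \(f(x) = x\log x\) into the definition:
\[ D_f (p||q) = \int \frac{p(x)}{q(x)} \log \frac{p(x)}{q(x)}\, q(x)\ \d x = \int p(x) \log \frac{p(x)}{q(x)}\ \d x = D_{KL} (p||q) \]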
Variational \(f\)-divergence
When the densities \(p\) and \(q\) have no closed-form expression (for example, only samples from them are available), the \(f\)-divergence is difficult to compute directly. Therefore in practice it is estimated with a variational expression:
\[ D_f (p||q) = \sup_{T:\mathcal X \to \R} \{ \E_p[T(x)] - \E_q[f^* \circ T(x)] \} \]
where \(f^*(t) = \sup_u \{ut - f(u)\}\) is the convex conjugate of \(f\). Since \(f\) is convex (and lower semicontinuous), \(f^{**} = f\), and the derivation is as follows:
\[ \begin{aligned} D_f (p||q) &= \int f(\frac{p(x)}{q(x)}) q(x)\ \d x \\ &= \int f^{**}(\frac{p(x)}{q(x)}) q(x)\ \d x \\ &= \int \sup_t [\frac{p(x)}{q(x)} t - f^*(t)] q(x)\ \d x \\ &= \int \sup_t[p(x) t - f^*(t) q(x)]\ \d x \\ &\Downarrow_{T(x) = \arg \sup_t[p(x) t - f^*(t) q(x)]} \\ &= \sup_{T:\mathcal X \to \R} \int [p(x) T(x) - f^*(T(x)) q(x)]\ \d x \\ &= \sup_{T:\mathcal X \to \R} \{ \E_p[T(x)] - \E_q[f^* \circ T(x)] \} \end{aligned} \]
Restricting \(T\) to a parametric family instead of all measurable functions turns this equality into a lower bound, which can be maximized over the parameters using samples from \(p\) and \(q\).
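As a concrete example, for the KL case \(f(u) = u\log u\) the convex conjugate has a closed form. Setting the derivative of \(ut - u\log u\) with respect to \(u\) to zero gives \(t - \log u - 1 = 0\), i.e. \(u = e^{t-1}\), so
\[ f^*(t) = e^{t-1} t - e^{t-1}(t-1) = e^{t-1} \]
and the variational expression becomes
\[ D_{KL} (p||q) = \sup_{T:\mathcal X \to \R} \{ \E_p[T(x)] - \E_q[e^{T(x)-1}] \} \]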