Information Theory

  • Entropy

    The entropy of a discrete distribution \(p\) (probability mass function) is defined as \[ H(p) = -\mathrm{E}_{x \sim p}\log p(x) \] The entropy reaches its maximum when the underlying distribution \(p\) is the uniform distribution; for a uniform distribution over \(n\) outcomes the maximum value is \(\log n\).
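
    A minimal sketch of this in code (the helper name `entropy` and the example distributions are just illustrative):

      import numpy as np

      def entropy(p, base=2):
          # Entropy H(p) = -sum_x p(x) log p(x); zero-probability terms contribute 0.
          p = np.asarray(p, dtype=float)
          p = p[p > 0]
          return -np.sum(p * np.log(p)) / np.log(base)

      # The uniform distribution over 4 outcomes attains the maximum log2(4) = 2 bits.
      print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
      print(entropy([0.70, 0.10, 0.10, 0.10]))  # ~1.357, strictly less than 2.0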

  • Conditional Entropy

    The conditional entropy measures the amount of information needed to describe the outcome of a random variable \(Y\) given that the value of another random variable \(X\) is known.
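
    In the notation used for entropy above, it can be written as \[ H(Y \mid X) = -\mathrm{E}_{(x, y) \sim p(x, y)} \log p(y \mid x) \] and it satisfies the chain rule \(H(X, Y) = H(X) + H(Y \mid X)\).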

  • Cross Entropy

    The cross entropy between two distributions \(p\) and \(q\) over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if the coding scheme used for the set is optimized for the probability distribution \(q\) instead of the true distribution \(p\).
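
    With \(p\) as the true distribution and \(q\) as the coding distribution, it is defined as \[ H(p, q) = -\mathrm{E}_{x \sim p} \log q(x) \] and it decomposes as \(H(p, q) = H(p) + D_{KL}(p\|q)\), where \(D_{KL}\) is the KL-divergence discussed below.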

  • Mutual Information

    Mutual information of two random variables \(X\) and \(Y\) is a measure of the mutual dependence between them. It quantifies the amount of information obtained about one random variable by observing the other random variable.
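
    It can be written as \[ I(X; Y) = \mathrm{E}_{(x, y) \sim p(x, y)} \log \frac{p(x, y)}{p(x)\,p(y)} = H(Y) - H(Y \mid X) \] and it equals zero exactly when \(X\) and \(Y\) are independent.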

  • KL-divergence

    KL-divergence, denoted as \(D_{KL}(p\|q)\), is a statistical distance measuring how a probability distribution \(q\) differs from a reference probability distribution \(p\), both defined for a random variable \(X\) taking values in \(\mathcal{X}\). In information theory, it measures the relative entropy from \(q\) to \(p\), which is the average number of extra bits required to represent a message drawn from \(p\) with a code optimized for \(q\) instead of one optimized for \(p\).
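
    For discrete distributions it is defined as \[ D_{KL}(p\|q) = \mathrm{E}_{x \sim p} \log \frac{p(x)}{q(x)} = H(p, q) - H(p) \] It is always non-negative, equals zero only when \(p = q\), and is not symmetric in \(p\) and \(q\).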

  • f-divergence

    \(f\)-divergence can be treated as a generalization of the KL-divergence. It is defined as \[ D_f(p\|q) = \int f\!\left(\frac{p(x)}{q(x)}\right) q(x)\, \mathrm{d}x \] where \(f\) is a convex function satisfying \(f(1) = 0\).
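
    For example, choosing \(f(t) = t \log t\) recovers the KL-divergence \(D_{KL}(p\|q)\), while \(f(t) = \frac{1}{2}\lvert t - 1 \rvert\) gives the total variation distance.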

  • Jensen-Shannon Divergence

    In probability theory and statistics, the Jensen-Shannon divergence is another method of measuring the distance between two distributions. It is based on the KL-divergence, with some notable differences: the KL-divergence by itself does not make a good measure of distance between distributions, in the first place because it is not symmetric, whereas the Jensen-Shannon divergence is symmetric in the two distributions.
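
    It is defined via the KL-divergence to the mixture of the two distributions, \[ D_{JS}(p\|q) = \frac{1}{2} D_{KL}(p\|m) + \frac{1}{2} D_{KL}(q\|m), \quad m = \frac{1}{2}(p + q) \] which makes it symmetric and always finite; with base-2 logarithms it is bounded above by 1.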

  • Overview

    Both cross entropy and KL-divergence describe the relationship between two distributions over the same set of events, while conditional entropy, mutual information, and joint entropy describe the relationship between two random variables.
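
    As a rough numerical illustration of these relationships, the sketch below computes the quantities above for a small, made-up joint distribution and checks the identities \(I(X; Y) = H(Y) - H(Y \mid X)\) and \(H(p, q) = H(p) + D_{KL}(p\|q)\) (the array values and helper names are arbitrary):

      import numpy as np

      def H(p):
          # Entropy of a discrete distribution given as an array of probabilities.
          p = np.asarray(p, dtype=float).ravel()
          p = p[p > 0]
          return -np.sum(p * np.log(p))

      def kl(p, q):
          # KL-divergence D_KL(p || q); assumes q(x) > 0 wherever p(x) > 0.
          p = np.asarray(p, dtype=float).ravel()
          q = np.asarray(q, dtype=float).ravel()
          mask = p > 0
          return np.sum(p[mask] * np.log(p[mask] / q[mask]))

      # A small, made-up joint distribution p(x, y) on a 2 x 3 grid.
      pxy = np.array([[0.10, 0.20, 0.10],
                      [0.25, 0.05, 0.30]])
      px, py = pxy.sum(axis=1), pxy.sum(axis=0)

      joint_H = H(pxy)                    # joint entropy H(X, Y)
      cond_H = joint_H - H(px)            # conditional entropy H(Y | X)
      mi = kl(pxy, np.outer(px, py))      # mutual information I(X; Y)
      assert np.isclose(mi, H(py) - cond_H)

      # Cross entropy between two distributions p and q over three outcomes.
      p, q = np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.3, 0.2])
      cross_H = -np.sum(p * np.log(q))
      assert np.isclose(cross_H, H(p) + kl(p, q))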