Information Theory
-
Entropy
The entropy of a discrete distribution with probability mass function \(p\) is defined as \[ H(p) = -\mathrm{E}_{x \sim p}\log p(x) \] Over a fixed finite set of outcomes, the entropy reaches its maximum when \(p\) is the uniform distribution.
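For example, over a finite set of \(n\) outcomes the maximum is attained by the uniform distribution, \[ H(p) = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n \] so a fair coin has an entropy of \(1\) bit when the logarithm is taken base \(2\).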
-
Conditional Entropy
The conditional entropy measures the amount of information needed to describe the outcome of a random variable \(Y\) given that the value of another random variable \(X\) is known.
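In the same notation as the entropy above, it can be written as \[ H(Y \mid X) = -\mathrm{E}_{(x, y) \sim p(x, y)} \log p(y \mid x) \] which equals the joint entropy minus the entropy of the conditioning variable, \(H(Y \mid X) = H(X, Y) - H(X)\).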
-
Cross Entropy
The cross entropy between two distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when the coding scheme is optimized for the probability distribution \(q\) instead of the true distribution \(p\).
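Taking the expectation under the true distribution \(p\), \[ H(p, q) = -\mathrm{E}_{x \sim p} \log q(x) \] which decomposes as \(H(p, q) = H(p) + D_{KL}(p \| q)\): the optimal code length plus the extra bits paid for using a code optimized for \(q\).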
-
Mutual Information
Mutual information of two random variables \(X\) and \(Y\) is a measure of the mutual dependence between them. It quantifies the amount of information obtained about one random variable by observing the other.
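It can be expressed through the entropies above, or as a KL-divergence between the joint distribution and the product of the marginals: \[ I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = D_{KL}\left(p(x, y) \,\|\, p(x)\,p(y)\right) \] It equals zero exactly when \(X\) and \(Y\) are independent.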
-
KL-divergence
KL-divergence, denoted as \(D_{KL}(p\|q)\), is a statistical distance measuring how a probability distribution \(q\) differs from a reference probability distribution \(p\), both defined on \(X \in \mathcal{X}\). In information theory, it is the relative entropy from \(q\) to \(p\): the average number of extra bits required to represent a message using a code optimized for \(q\) instead of \(p\).
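For discrete distributions it takes the form \[ D_{KL}(p \| q) = \mathrm{E}_{x \sim p} \log \frac{p(x)}{q(x)} = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \] It is always non-negative and equals zero only when \(p = q\), but it is not symmetric in \(p\) and \(q\).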
-
f-divergence
\(f\)-divergence can be treated as a generalization of the KL-divergence. For continuous random variables, it is defined as \[ D_f(p \| q) = \int f\!\left(\frac{p(x)}{q(x)}\right) q(x) \, \mathrm{d}x \] where \(f\) must be a convex function satisfying \(f(1) = 0\).
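For example, choosing \(f(t) = t \log t\) recovers the KL-divergence \(D_{KL}(p \| q)\), while \(f(t) = \frac{1}{2}|t - 1|\) gives the total variation distance; both choices are convex and vanish at \(t = 1\).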
-
Jensen-Shannon Divergence
In probability theory and statistics, the Jensen-Shannon divergence is another method of measuring the distance between two distributions. It is based on the KL-divergence, with some notable differences: the KL-divergence on its own does not make a good measure of distance between distributions, since in the first place it is not symmetric.
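It symmetrizes the KL-divergence by comparing both distributions to their mixture \(m = \frac{1}{2}(p + q)\): \[ D_{JS}(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m) \] Unlike the KL-divergence, it is symmetric and always finite, and with base-\(2\) logarithms it is bounded between \(0\) and \(1\).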
-
Overview
Both cross entropy and KL-divergence describe the relationship between two different probability distributions over the same event space. Conditional entropy, mutual information, and joint entropy all describe the relationship between two different random variables.
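As a concrete worked example (the two distributions are an arbitrary illustrative choice, not taken from the notes above), let \(p = (0.5, 0.5)\) be a fair coin and \(q = (0.25, 0.75)\) a model of it. With base-\(2\) logarithms, \[ H(p) = 1, \qquad H(p, q) = -0.5 \log_2 0.25 - 0.5 \log_2 0.75 \approx 1.2075, \qquad D_{KL}(p \| q) \approx 0.2075 \] so the cross entropy is exactly the entropy plus the KL-divergence, matching the decomposition above.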