Information Theory
-
Entropy
The entropy of a discrete distribution with probability mass function \(p\) is defined as \[ H(p) = -\mathrm{E}_{x \sim p}\log p(x) \] Over a fixed finite set of outcomes, the entropy reaches its maximum when \(p\) is the uniform distribution.
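For example, over a finite set of \(n\) outcomes the maximum is attained by the uniform distribution, \[ H(p) = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n \] so a fair coin has an entropy of \(1\) bit when the logarithm is taken base \(2\).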
-
Conditional Entropy
The conditional entropy measures the amount of information needed to describe the outcome of a random variable \(Y\) given that the value of another random variable \(X\) is known.
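In the same notation as the entropy above, it can be written as \[ H(Y \mid X) = -\mathrm{E}_{(x, y) \sim p(x, y)} \log p(y \mid x) \] which equals the joint entropy minus the entropy of the conditioning variable, \(H(Y \mid X) = H(X, Y) - H(X)\).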
-
Cross Entropy
The cross entropy between two distributions over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when the coding scheme is optimized for the probability distribution \(q\) instead of the true distribution \(p\).
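Taking the expectation under the true distribution \(p\), \[ H(p, q) = -\mathrm{E}_{x \sim p} \log q(x) \] which decomposes as \(H(p, q) = H(p) + D_{KL}(p \| q)\): the optimal code length plus the extra bits paid for using a code optimized for \(q\).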
-
Mutual Information
Mutual information of two random variables \(X\) and \(Y\) is a measure of the mutual dependence between them. It quantifies the amount of information obtained about one random variable by observing the other.
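It can be expressed through the entropies above, or as a KL-divergence between the joint distribution and the product of the marginals: \[ I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = D_{KL}\left(p(x, y) \,\|\, p(x)\,p(y)\right) \] It equals zero exactly when \(X\) and \(Y\) are independent.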
-
KL-divergence
KL-divergence, denoted as \(D_{KL}(p\|q)\), is a statistical distance measuring how a probability distribution \(q\) differs from a reference probability distribution \(p\), both defined on \(X \in \mathcal{X}\). In information theory, it is the relative entropy from \(q\) to \(p\): the average number of extra bits required to represent a message using a code optimized for \(q\) instead of \(p\).
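For discrete distributions it takes the form \[ D_{KL}(p \| q) = \mathrm{E}_{x \sim p} \log \frac{p(x)}{q(x)} = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \] It is always non-negative and equals zero only when \(p = q\), but it is not symmetric in \(p\) and \(q\).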
-
f-divergence
\(f\)-divergence can be treated as a generalization of the KL-divergence. For continuous random variables, it is defined as \[ D_f(p \| q) = \int f\!\left(\frac{p(x)}{q(x)}\right) q(x) \, \mathrm{d}x \] where \(f\) must be a convex function satisfying \(f(1) = 0\).
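For example, choosing \(f(t) = t \log t\) recovers the KL-divergence \(D_{KL}(p \| q)\), while \(f(t) = \frac{1}{2}|t - 1|\) gives the total variation distance; both choices are convex and vanish at \(t = 1\).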
-
Jensen-Shannon Divergence
In probability theory and statistics, the Jensen-Shannon divergence is another method of measuring the distance between two distributions. It is based on the KL-divergence, with some notable differences: the KL-divergence on its own does not make a good measure of distance between distributions, since in the first place it is not symmetric.
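It symmetrizes the KL-divergence by comparing both distributions to their mixture \(m = \frac{1}{2}(p + q)\): \[ D_{JS}(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m) \] Unlike the KL-divergence, it is symmetric and always finite, and with base-\(2\) logarithms it is bounded between \(0\) and \(1\).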
-
Overview
Both cross entropy and KL-divergence describe the relationship between two different probability distributions over the same event space. Conditional entropy, mutual information, and joint entropy all describe the relationship between two different random variables.
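As a concrete worked example (the two distributions are an arbitrary illustrative choice, not taken from the notes above), let \(p = (0.5, 0.5)\) be a fair coin and \(q = (0.25, 0.75)\) a model of it. With base-\(2\) logarithms, \[ H(p) = 1, \qquad H(p, q) = -0.5 \log_2 0.25 - 0.5 \log_2 0.75 \approx 1.2075, \qquad D_{KL}(p \| q) \approx 0.2075 \] so the cross entropy is exactly the entropy plus the KL-divergence, matching the decomposition above.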