Introduction
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point (Claude Shannon, 1948).
Similar to that stated above, the fundamental subject of machine learning is about with best effort recovering the original data using the so-called model or algorithm.
The recovery can be done by memoizing. In communication system, this is exactly what we want: we need to try our best to convey the message with the achievable highest fidelity. But it may be overwhelmingly costly when the volume of data becomes larger, such as in the case faced by machine learning. Thus, a better way would be to summarizing, i.e. using a function to reveal the correlation (here I don’t mean the mathematical term correlation) among data.
This book reviews the classic concepts and methods of the recovery process in communication system field. But it would also be interesting for machine learning folks, as the two fields meet the same capacity limit of the recovery.
A System View
Let’s make the life easier by introducing a few concepts. In communication, we have source message to convey. After being passed to and encoded by the encoder, the message travels along the channel to be finally decoded by the decoder and received by the receiver.
The encoding process includes transforming the data into digital form and introducing redundant information for the actual transmission. The redundancy is necessary to keep the underlying data from the noisy channel, which may alter bits, or add/drop bits (these two cases are very rare though) to the bit flow transmitting on it.
Information theory is concerned with the theoretical limit and potential for such systems. The coding theory is concerned with designing such a coding theme as to reach the limit/potential.
Error-correcting Codes
By the name of “error-correcting”, we mean that we want to be able to detect and correct errors; the retransmission is not an option.
The way we do this, as stated above, is to add redundancy. But that the redundancy needs to be cost-effective. The rate is the ratio between the amount of source data and that of the transmitted data; the higher the better.
There are some examples of coding scheme in this genre:
- repetition code
- block code
- Hamming code
The Best Performance?
There seems to be a tradeoff between the error probability (which we want to minimize) and the rate (which we want to maximize), which can be seen mathematically in the case of repetition code and Hamming code.
It seems that the error-rate curve, no matter for which kind of coding scheme, would pass the \((0, 0)\) point: in order to achieve a vanishingly small error probability, one would have to reduce the rate correspondingly to zero.
However, Shannon proved that this curve may intersect with the rate axis at some nonzero point! And this maximum rate at which the communication is possible with arbitrarily small error probability is called the capacity of the channel. The capacity is only related to the noise level of this channel. For any noisy binary symmetric channel with noise level of \(f\) (i.e., the probability that a bit is flipped), its capacity is \[ C(f) = 1 - H_2(f) = 1 - \left[ f \log \frac{1}{f} + (1-f) \log \frac{1}{1-f} \right] \]