Maximum Entropy Property of the Gaussian Distribution
Entropy of a Probability Distribution
The entropy of a probability density function (PDF) $p(x)$ is defined as [48]

$$
h(p) \triangleq \int_{-\infty}^{\infty} p(x)\,\lg\frac{1}{p(x)}\,dx, \qquad\text{(D.29)}
$$
where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:

$$
h(p) = E\left\{\lg\frac{1}{p(x)}\right\}. \qquad\text{(D.30)}
$$
The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)
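To make this concrete, here is a minimal numerical sketch of (D.30) in Python, using a hypothetical four-symbol distribution chosen only for illustration (not from the text): each value is charged $\lg[1/p(x)]$ bits, and the entropy is the probability-weighted average of those bit counts.

```python
import numpy as np

# Hypothetical four-symbol source (probabilities chosen only for illustration).
p = np.array([1/2, 1/4, 1/8, 1/8])

code_lengths = np.log2(1.0 / p)      # lg[1/p(x)]: bits assigned to each value
entropy = np.sum(p * code_lengths)   # h(p) = E{lg[1/p(x)]}, as in (D.30)

print(code_lengths)  # [1. 2. 3. 3.]
print(entropy)       # 1.75 bits per symbol
```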
Example: Random Bit String
Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always $1/2$. The corresponding probability density function is

$$
p(x) = \frac{1}{2}, \qquad x\in\{0,1\}, \qquad\text{(D.31)}
$$
and the entropy is

$$
h(p) = \frac{1}{2}\lg 2 + \frac{1}{2}\lg 2 = 1\ \text{bit}. \qquad\text{(D.32)}
$$
Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.
If instead the probability of a 0 is $1/4$ and that of a 1 is $3/4$, we get

$$
h(p) = \frac{1}{4}\lg 4 + \frac{3}{4}\lg\frac{4}{3} \approx 0.81\ \text{bits},
$$

and the sequence can be compressed about 19%.
In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$$
h(p) = 0\cdot\lg\frac{1}{0} + 1\cdot\lg 1 = 0\ \text{bits}
$$

(using the convention $0\cdot\lg(1/0) \triangleq \lim_{p\to 0} p\,\lg(1/p) = 0$).
Thus, the entropy is 0 when the sequence is perfectly predictable.
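A short Python check of these three cases (an illustrative sketch; the helper name bit_entropy is not from the text):

```python
import numpy as np

def bit_entropy(p0):
    """Entropy in bits of a binary source with P(0) = p0 and P(1) = 1 - p0."""
    h = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0:              # convention: a zero-probability value contributes 0 bits
            h -= p * np.log2(p)
    return h

print(bit_entropy(1/2))  # 1.0    -> incompressible
print(bit_entropy(1/4))  # ~0.811 -> compressible by roughly 19%
print(bit_entropy(0.0))  # 0.0    -> perfectly predictable
```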
Maximum Entropy Distributions
Uniform Distribution
Among probability distributions $p(x)$ which are nonzero over a finite range of values $x\in[a,b]$, the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,

$$
h(p) = \int_a^b p(x)\,\lg\frac{1}{p(x)}\,dx, \qquad\text{(D.33)}
$$
with respect to $p(x)$, subject to the constraints

$$
p(x) \ge 0, \qquad \int_a^b p(x)\,dx = 1.
$$
Using the method of Lagrange multipliers for optimization in the presence of constraints [86], we may form the objective function

$$
J(p) = \int_a^b p(x)\,\lg\frac{1}{p(x)}\,dx + \lambda\left[\int_a^b p(x)\,dx - 1\right] \qquad\text{(D.34)}
$$
and differentiate with respect to $p(x)$ (and renormalize by dropping the $dx$ factor multiplying all terms) to obtain

$$
\frac{\partial}{\partial p(x)} J(p) = -\lg p(x) - \lg e + \lambda. \qquad\text{(D.35)}
$$
Setting this to zero and solving for $p(x)$ gives

$$
p(x) = 2^{\lambda - \lg e}, \qquad\text{(D.36)}
$$

which is a constant, independent of $x$.
(Setting the partial derivative with respect to $\lambda$ to zero merely restates the constraint.)
Choosing $\lambda$ to satisfy the unit-area constraint gives $2^{\lambda-\lg e} = 1/(b-a)$, yielding

$$
p(x) = \frac{1}{b-a}, \qquad a\le x\le b. \qquad\text{(D.37)}
$$
That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $p(x) > 0$:

$$
\frac{\partial^2}{\partial p(x)^2} J(p) = -\frac{\lg e}{p(x)}. \qquad\text{(D.38)}
$$
Since the solution spontaneously satisfied $p(x) > 0$, it is a maximum.
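As a numerical sanity check of this result (not part of the derivation above), the following Python sketch compares the differential entropy of the uniform density on an interval with that of a triangular density on the same interval; the helper entropy_bits and the particular interval $[0,2]$ are arbitrary choices for illustration.

```python
import numpy as np

def entropy_bits(pdf, a, b, n=200001):
    """Approximate -integral of p(x) lg p(x) over [a, b] by a Riemann sum."""
    x, dx = np.linspace(a, b, n, retstep=True)
    p = pdf(x)
    return np.sum(np.where(p > 0, -p * np.log2(p), 0.0)) * dx

a, b = 0.0, 2.0
uniform  = lambda x: np.full_like(x, 1.0 / (b - a))
# Triangular density on [a, b] peaking at the midpoint (unit area).
triangle = lambda x: (1.0 - np.abs(x - (a + b) / 2.0) / ((b - a) / 2.0)) / ((b - a) / 2.0)

print(entropy_bits(uniform, a, b))   # lg(b - a) = 1.0 bit
print(entropy_bits(triangle, a, b))  # ~0.72 bit -- strictly smaller, as expected
```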
Exponential Distribution
Among probability distributions $p(x)$ which are nonzero over a semi-infinite range of values $x\in[0,\infty)$ and having a finite mean $\mu$, the exponential distribution has maximum entropy.
To the previous case, we add the new constraint

$$
\int_0^{\infty} x\,p(x)\,dx = \mu, \qquad\text{(D.39)}
$$
resulting in the objective function

$$
J(p) = \int_0^{\infty} p(x)\,\lg\frac{1}{p(x)}\,dx
+ \lambda_0\left[\int_0^{\infty} p(x)\,dx - 1\right]
+ \lambda_1\left[\int_0^{\infty} x\,p(x)\,dx - \mu\right].
$$
Now the partial derivatives with respect to $p(x)$ are

$$
\frac{\partial}{\partial p(x)} J(p) = -\lg p(x) - \lg e + \lambda_0 + \lambda_1 x,
$$

and setting this to zero shows that $p(x)$ is of the form $p(x) = c_1 e^{c_2 x}$. The unit-area and finite-mean constraints result in $c_1 = 1/\mu$ and $c_2 = -1/\mu$, yielding

$$
p(x) = \frac{1}{\mu}\,e^{-x/\mu}, \qquad x\ge 0. \qquad\text{(D.40)}
$$
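Again as an illustrative check rather than a proof, this sketch compares the numerically integrated entropy of an exponential density of mean $\mu$ with that of a Gamma(2) density scaled to the same mean; the helper name, the competitor density, and the truncation of the integral at $50\mu$ are assumptions made for this example.

```python
import numpy as np

def entropy_bits(pdf, a, b, n=400001):
    """Approximate -integral of p(x) lg p(x) over [a, b] by a Riemann sum."""
    x, dx = np.linspace(a, b, n, retstep=True)
    p = pdf(x)
    return np.sum(np.where(p > 0, -p * np.log2(p), 0.0)) * dx

mu = 1.0
exponential = lambda x: np.exp(-x / mu) / mu              # exponential with mean mu
theta = mu / 2.0
gamma2 = lambda x: x * np.exp(-x / theta) / theta**2      # Gamma(shape 2, scale mu/2): also mean mu

print(entropy_bits(exponential, 0.0, 50.0 * mu))  # lg(e * mu) ~ 1.443 bits
print(entropy_bits(gamma2,      0.0, 50.0 * mu))  # ~1.28 bits -- smaller, as expected
```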
Gaussian Distribution
The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $x\in(-\infty,\infty)$ but having a finite mean $\mu$ and finite variance $\sigma^2$.
Proceeding as before, we obtain the objective function

$$
J(p) = \int_{-\infty}^{\infty} p(x)\,\lg\frac{1}{p(x)}\,dx
+ \lambda_0\left[\int_{-\infty}^{\infty} p(x)\,dx - 1\right]
+ \lambda_1\left[\int_{-\infty}^{\infty} x\,p(x)\,dx - \mu\right]
+ \lambda_2\left[\int_{-\infty}^{\infty} (x-\mu)^2 p(x)\,dx - \sigma^2\right],
$$

and partial derivatives

$$
\frac{\partial}{\partial p(x)} J(p) = -\lg p(x) - \lg e + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2
$$

(the partials with respect to $\lambda_0$, $\lambda_1$, and $\lambda_2$ again restate the constraints), leading to

$$
p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}. \qquad\text{(D.41)}
$$
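The sketch below (an illustrative numerical check, not part of the text) compares the entropy of a Gaussian with that of a Laplace and a uniform density sharing the same mean and variance; the particular competitor densities and the integration range are arbitrary choices.

```python
import numpy as np

def entropy_bits(pdf, a, b, n=400001):
    """Approximate -integral of p(x) lg p(x) over [a, b] by a Riemann sum."""
    x, dx = np.linspace(a, b, n, retstep=True)
    p = pdf(x)
    return np.sum(np.where(p > 0, -p * np.log2(p), 0.0)) * dx

mu, sigma = 0.0, 1.0
gauss   = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
b_lap   = sigma / np.sqrt(2)        # Laplace scale giving variance sigma^2
laplace = lambda x: np.exp(-np.abs(x - mu) / b_lap) / (2 * b_lap)
half_w  = np.sqrt(3) * sigma        # uniform half-width giving variance sigma^2
uniform = lambda x: np.where(np.abs(x - mu) <= half_w, 1.0 / (2 * half_w), 0.0)

for name, pdf in [("Gaussian", gauss), ("Laplace", laplace), ("uniform", uniform)]:
    print(name, entropy_bits(pdf, mu - 30 * sigma, mu + 30 * sigma))
# Gaussian ~2.05 bits > Laplace ~1.94 bits > uniform ~1.79 bits:
# the Gaussian attains the largest entropy for the given mean and variance.
```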
For more on entropy and maximum-entropy distributions, see [48].