Maximum Entropy Property of the Gaussian
The entropy of a probability density function $p(x)$ may be defined as

$$h(p) \triangleq \int_{-\infty}^{\infty} p(x)\,\lg\frac{1}{p(x)}\,dx,$$

where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:

$$h(p) = E_p\!\left\{\lg\frac{1}{p(x)}\right\}$$
The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)
Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always $1/2$. The corresponding probability density function is

$$p(x) = \frac{1}{2}\,\delta(x) + \frac{1}{2}\,\delta(x-1),$$

and the entropy is

$$h(p) = \frac{1}{2}\,\lg 2 + \frac{1}{2}\,\lg 2 = 1 \hbox{ bit}.$$
Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.
If instead the probability of a 0 is $1/4$ and that of a 1 is $3/4$, we get

$$h(p) = \frac{1}{4}\,\lg 4 + \frac{3}{4}\,\lg\frac{4}{3} = \frac{1}{2} + \frac{3}{4}\,(2 - \lg 3) \approx 0.81 \hbox{ bits},$$

and the sequence can be compressed about $19\%$.
In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$$h(p) = \lim_{p\to 0}\, p\,\lg\frac{1}{p} + 1\cdot\lg 1 = 0.$$
Thus, the entropy is 0 when the sequence is perfectly predictable.
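These three binary-source entropies are easy to verify numerically. The following sketch (the helper name `bit_entropy` is ours, not from the text) skips zero-probability terms, consistent with the limit $p\,\lg(1/p)\to 0$ as $p\to 0$:

```python
import math

def bit_entropy(p1):
    """Entropy in bits of a binary source emitting a 1 with probability p1.

    Zero-probability terms are skipped, consistent with p*lg(1/p) -> 0
    as p -> 0.
    """
    return sum(p * math.log2(1.0 / p) for p in (p1, 1.0 - p1) if p > 0)

print(bit_entropy(0.5))    # fair coin: 1.0 bit per symbol (incompressible)
print(bit_entropy(0.75))   # biased coin: ~0.811 bits (~19% compressible)
print(bit_entropy(1.0))    # deterministic: 0.0 bits (perfectly predictable)
```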
Among probability distributions $p(x)$ which are nonzero over a finite range of values $x\in[a,b]$, the maximum-entropy distribution is the uniform distribution

$$p(x) = \frac{1}{b-a}, \quad x\in[a,b].$$

To show this, we must maximize the entropy,

$$h(p) = \int_a^b p(x)\,\lg\frac{1}{p(x)}\,dx,$$

with respect to $p(x)$, subject to the constraints

$$\int_a^b p(x)\,dx = 1, \qquad p(x) \ge 0.$$
Using the method of Lagrange multipliers for optimization in the presence of constraints, we may form the objective function

$$J(p) = \int_a^b \left[\, p(x)\,\lg\frac{1}{p(x)} + \lambda\,p(x) \right] dx$$

and differentiate with respect to $p(x)$ (and renormalize by dropping the $1/\ln 2$ factor multiplying all terms) to obtain

$$\frac{\partial}{\partial p(x)}\,J(p) = -\ln p(x) - 1 + \lambda.$$
Setting this to zero and solving for $p(x)$ gives

$$p(x) = e^{\lambda - 1}.$$
(Setting the partial derivative with respect to $\lambda$ to zero merely restates the constraint $\int_a^b p(x)\,dx = 1$.)
Choosing $\lambda$ to satisfy the unit-area constraint gives $\lambda = 1 - \ln(b-a)$, yielding

$$p(x) = \frac{1}{b-a}, \quad x\in[a,b].$$
That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $p(x)$:

$$\frac{\partial^2 J(p)}{\partial p(x)^2} = -\frac{1}{p(x)}.$$
Since the solution spontaneously satisfied $p(x) > 0$, it is a maximum.
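As a sanity check on this result (a numerical sketch of our own, not part of the derivation), one can discretize the interval into $N$ bins and confirm that no randomly drawn distribution exceeds the uniform entropy $\lg N$:

```python
import math
import random

def discrete_entropy(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

N = 32                                        # bins discretizing [a, b]
h_uniform = discrete_entropy([1.0 / N] * N)   # = lg N = 5 bits

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(N)]
    total = sum(w)
    p = [wi / total for wi in w]              # random unit-area distribution
    assert discrete_entropy(p) <= h_uniform + 1e-9

print(h_uniform)                              # 5.0
```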
To the previous case, we add the new constraint

$$\int_{-\infty}^{\infty} x\,p(x)\,dx = \mu < \infty,$$

where the range is now taken to be the nonnegative reals $x\in[0,\infty)$, resulting in the objective function

$$J(p) = \int_0^{\infty} \left[\, p(x)\,\lg\frac{1}{p(x)} + \lambda_0\,p(x) + \lambda_1\,x\,p(x) \right] dx.$$

Now the partials with respect to $p(x)$ are

$$\frac{\partial}{\partial p(x)}\,J(p) = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x,$$

and $p(x)$ is of the form $p(x) = e^{\lambda_0 - 1}\,e^{\lambda_1 x}$. The unit-area and finite-mean constraints result in $\lambda_1 = -1/\mu$ and $e^{\lambda_0 - 1} = 1/\mu$, yielding the exponential distribution

$$p(x) = \frac{1}{\mu}\,e^{-x/\mu}, \quad x \ge 0.$$
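The exponential maximum-entropy result, $p(x) = e^{-x/\mu}/\mu$, can also be checked numerically. The following sketch (the midpoint-rule helper and integration limits are our choices) estimates the differential entropy, compares it against the closed form $\lg(e\mu)$ bits, and confirms that a same-mean uniform density on $[0, 2\mu]$ has less entropy:

```python
import math

def diff_entropy(pdf, a, b, n=200000):
    """Differential entropy in bits of pdf over [a, b] via the midpoint rule."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        p = pdf(a + (i + 0.5) * dx)
        if p > 0:
            h += p * math.log2(1.0 / p) * dx
    return h

mu = 2.0
exp_pdf = lambda x: math.exp(-x / mu) / mu      # exponential, mean mu
uni_pdf = lambda x: 1.0 / (2.0 * mu)            # uniform on [0, 2*mu], mean mu

h_exp = diff_entropy(exp_pdf, 0.0, 40.0 * mu)   # tail beyond 40*mu is negligible
h_uni = diff_entropy(uni_pdf, 0.0, 2.0 * mu)

print(abs(h_exp - math.log2(math.e * mu)) < 1e-3)   # matches lg(e*mu)
print(h_uni < h_exp)                                 # exponential wins at equal mean
```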
Next, we replace the finite-mean constraint with a zero-mean, finite-variance constraint,

$$\int_{-\infty}^{\infty} x^2\,p(x)\,dx = \sigma^2,$$

with the range restored to $x\in(-\infty,\infty)$. Proceeding as before, we obtain the objective function

$$J(p) = \int_{-\infty}^{\infty} \left[\, p(x)\,\lg\frac{1}{p(x)} + \lambda_0\,p(x) + \lambda_1\,x^2\,p(x) \right] dx$$

and partial derivatives

$$\frac{\partial}{\partial p(x)}\,J(p) = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x^2,$$

so that $p(x) = e^{\lambda_0 - 1}\,e^{\lambda_1 x^2}$. The unit-area and finite-variance constraints give $\lambda_1 = -1/(2\sigma^2)$ and $e^{\lambda_0 - 1} = 1/\sqrt{2\pi\sigma^2}$, yielding the Gaussian

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/(2\sigma^2)}.$$

Thus, among all distributions with zero mean and finite variance $\sigma^2$, the Gaussian has maximum entropy.
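The Gaussian's maximum-entropy property can likewise be spot-checked numerically (again a sketch of our own): the estimated differential entropy should match the closed form $\frac{1}{2}\lg(2\pi e\sigma^2)$ bits and exceed that of a Laplacian with the same variance:

```python
import math

def diff_entropy(pdf, a, b, n=400000):
    """Differential entropy in bits of pdf over [a, b] via the midpoint rule."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        p = pdf(a + (i + 0.5) * dx)
        if p > 0:
            h += p * math.log2(1.0 / p) * dx
    return h

sigma = 1.5
gauss = lambda x: math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
b_lap = sigma / math.sqrt(2)                      # Laplacian scale for variance sigma^2
laplace = lambda x: math.exp(-abs(x) / b_lap) / (2 * b_lap)

h_gauss = diff_entropy(gauss, -12 * sigma, 12 * sigma)
h_closed = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
h_lap = diff_entropy(laplace, -40 * b_lap, 40 * b_lap)

print(abs(h_gauss - h_closed) < 1e-3)   # matches the closed form
print(h_lap < h_gauss)                  # Gaussian wins at equal variance
```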
For more on entropy and maximum-entropy distributions, see .
Gaussian Probability Density Function