## Maximum Entropy Property of the Gaussian Distribution

### Entropy of a Probability Distribution

The entropy of a probability density function (PDF) $p(x)$ is defined as

$$h(p) \triangleq \int_x p(x)\,\lg\left[\frac{1}{p(x)}\right] dx \tag{D.29}$$

where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:

$$h(p) = E_p\left\{\lg\left[\frac{1}{p(x)}\right]\right\} \tag{D.30}$$

The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)
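The bit-assignment interpretation can be illustrated with a discrete source, where the integral becomes a sum over symbols. The following is a minimal sketch, not part of the original derivation: the four-symbol source and the function names are hypothetical, chosen so that every probability is a power of 1/2.

```python
import math

def code_lengths_bits(probs):
    """Ideal code length lg(1/p), in bits, for each symbol with p > 0."""
    return {sym: math.log2(1.0 / p) for sym, p in probs.items() if p > 0}

def entropy_bits(probs):
    """Average code length: sum of p * lg(1/p) over symbols with p > 0."""
    return sum(p * math.log2(1.0 / p) for p in probs.values() if p > 0)

# Hypothetical four-symbol source (assumed for illustration only).
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = code_lengths_bits(source)
average = entropy_bits(source)

for sym, bits in lengths.items():
    print(f"symbol {sym}: p = {source[sym]:.3f}, ideal length = {bits:.1f} bits")
print(f"average bits per symbol = {average:.2f}")
```

For this source the most probable symbol is assigned 1 bit and the two rarest are assigned 3 bits each, giving an average of 1.75 bits per symbol.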

### Example: Random Bit String

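Consider a random sequence of 1s and 0s in which the probability of a 0 or 1 is always $P(0) = P(1) = 1/2$. The corresponding probability density function is

$$p_b(x) = \frac{1}{2}\delta(x) + \frac{1}{2}\delta(x-1) \tag{D.31}$$

and the entropy is

$$h(p_b) = \frac{1}{2}\lg(2) + \frac{1}{2}\lg(2) = \lg(2) = 1. \tag{D.32}$$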

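Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed; there is no redundancy.

If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get

$$\begin{aligned} p_b(x) &= \frac{1}{4}\delta(x) + \frac{3}{4}\delta(x-1) \\ h(p_b) &= \frac{1}{4}\lg(4) + \frac{3}{4}\lg\left(\frac{4}{3}\right) \approx 0.81128 \end{aligned}$$

and the sequence can be compressed by about 19%.

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$$\begin{aligned} p_b(x) &= \lim_{\epsilon\to 0}\left[\epsilon\,\delta(x) + (1-\epsilon)\,\delta(x-1)\right] \\ h(p_b) &= \lim_{\epsilon\to 0}\left[\epsilon\,\lg\left(\frac{1}{\epsilon}\right) + (1-\epsilon)\,\lg\left(\frac{1}{1-\epsilon}\right)\right] = 0. \end{aligned}$$

Thus, the entropy is 0 when the sequence is perfectly predictable.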
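The three bit-string cases can be checked numerically. This is a small sketch under the assumption that successive bits are independent; the function name `binary_entropy_bits` is introduced here for illustration, and the "redundancy" it prints is simply one minus the entropy per bit.

```python
import math

def binary_entropy_bits(p0):
    """Entropy in bits per symbol of a bit source with P(0) = p0, P(1) = 1 - p0."""
    total = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0.0:  # p * lg(1/p) tends to 0 as p -> 0, so zero terms are skipped
            total += p * math.log2(1.0 / p)
    return total

for p0 in (0.5, 0.25, 0.0):
    h = binary_entropy_bits(p0)
    print(f"P(0) = {p0:4.2f}: entropy = {h:.5f} bits, redundancy = {1.0 - h:.1%}")
```

The case $P(0) = 1/4$ gives roughly 0.811 bits per symbol, which is where the figure of about 19% compressibility comes from.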

### Maximum Entropy Distributions

#### Uniform Distribution

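Among probability distributions $p(x)$ which are nonzero over a finite range of values $x\in[a,b]$, the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,

$$h(p) \triangleq -\int_a^b p(x)\,\lg p(x)\,dx \tag{D.33}$$

with respect to $p(x)$, subject to the constraints

$$p(x) \geq 0, \qquad \int_a^b p(x)\,dx = 1.$$

Using the method of Lagrange multipliers for optimization in the presence of constraints, we may form the objective function

$$J(p) \triangleq -\int_a^b p(x)\,\ln p(x)\,dx + \lambda_0\left(\int_a^b p(x)\,dx - 1\right) \tag{D.34}$$

(The natural logarithm replaces $\lg$ here; the two differ only by the constant factor $\ln 2$, which does not change the maximizing distribution.)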

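and differentiate with respect to $p(x)$ (and renormalize by dropping the $dx$ factor multiplying all terms) to obtain

$$\frac{\partial}{\partial p(x)} J(p) = -\ln p(x) - 1 + \lambda_0. \tag{D.35}$$

Setting this to zero and solving for $p(x)$ gives

$$p(x) = e^{\lambda_0 - 1}. \tag{D.36}$$

(Setting the partial derivative with respect to $\lambda_0$ to zero merely restates the constraint.) Choosing $\lambda_0$ to satisfy the unit-area constraint gives $\lambda_0 = 1 - \ln(b-a)$, yielding

$$p(x) = \begin{cases} \dfrac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise.} \end{cases} \tag{D.37}$$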

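That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $x$:

$$\frac{\partial^2}{\partial p(x)^2} J(p) = -\frac{1}{p(x)} \tag{D.38}$$

Since the solution spontaneously satisfied $p(x) > 0$, the second derivative is negative everywhere on $[a,b]$, and the solution is a maximum.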
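As a numerical sanity check on the uniform result, the differential entropy of the uniform density can be compared with that of another density on the same interval. The sketch below is only an illustration: the interval $[0, 2]$, the symmetric triangular comparison density, and the midpoint-rule integrator are all assumptions made here, not part of the derivation. Entropies are in nats to match the natural logarithm in the objective function.

```python
import math

def differential_entropy_nats(pdf, a, b, n=20_000):
    """Midpoint-rule estimate of -integral of p(x) ln p(x) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        p = pdf(a + (i + 0.5) * dx)
        if p > 0.0:  # points where p = 0 contribute nothing
            total -= p * math.log(p) * dx
    return total

a, b = 0.0, 2.0  # hypothetical interval

def uniform_pdf(x):
    return 1.0 / (b - a)

def triangular_pdf(x):
    # Symmetric triangle on [a, b] peaking at the midpoint; integrates to 1.
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return max(0.0, 1.0 - abs(x - mid) / half) / half

h_uniform = differential_entropy_nats(uniform_pdf, a, b)
h_triangular = differential_entropy_nats(triangular_pdf, a, b)

print(f"uniform:    {h_uniform:.4f} nats (ln(b - a) = {math.log(b - a):.4f})")
print(f"triangular: {h_triangular:.4f} nats")
```

The uniform density attains $\ln(b-a)$, and the triangular density, which concentrates probability near the midpoint, comes out lower.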

#### Exponential Distribution

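Among probability distributions $p(x)$ which are nonzero over a semi-infinite range of values $x\in[0,\infty)$ and having a finite mean $\mu$, the exponential distribution has maximum entropy. To the previous case, we add the new constraint

$$\int_0^\infty x\,p(x)\,dx = \mu < \infty \tag{D.39}$$

resulting in the objective function

$$J(p) \triangleq -\int_0^\infty p(x)\,\ln p(x)\,dx + \lambda_0\left(\int_0^\infty p(x)\,dx - 1\right) + \lambda_1\left(\int_0^\infty x\,p(x)\,dx - \mu\right).$$

Now the partials with respect to $p(x)$ are

$$\begin{aligned} \frac{\partial}{\partial p(x)} J(p) &= -\ln p(x) - 1 + \lambda_0 + \lambda_1 x \\ \frac{\partial^2}{\partial p(x)^2} J(p) &= -\frac{1}{p(x)} \end{aligned}$$

and $p(x)$ is of the form $p(x) = e^{(\lambda_0 - 1) + \lambda_1 x}$, where $\lambda_1$ must be negative for the integrals to converge. The unit-area and finite-mean constraints result in $e^{\lambda_0 - 1} = 1/\mu$ and $\lambda_1 = -1/\mu$, yielding

$$p(x) = \begin{cases} \dfrac{1}{\mu}\,e^{-x/\mu}, & x \geq 0 \\ 0, & \text{otherwise.} \end{cases} \tag{D.40}$$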
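The exponential result can be spot-checked by comparing it against another density on $[0,\infty)$ with the same mean. The sketch below assumes a mean of 2, uses a gamma density with shape 2 as the comparison, and truncates the infinite range at 60 times the mean; these choices and all function names are illustrative assumptions rather than part of the derivation.

```python
import math

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule integral of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

def entropy_nats(pdf, lo, hi):
    """-integral of p ln p over [lo, hi], skipping points where p is zero."""
    def integrand(x):
        p = pdf(x)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate(integrand, lo, hi)

mu = 2.0           # hypothetical mean
upper = 60.0 * mu  # truncation point standing in for infinity (assumption)

def exponential_pdf(x):
    return math.exp(-x / mu) / mu

def gamma2_pdf(x):
    # Gamma density with shape 2 and scale mu/2, so its mean is also mu.
    theta = mu / 2.0
    return x * math.exp(-x / theta) / theta**2

results = {}
for name, pdf in (("exponential", exponential_pdf), ("gamma shape 2", gamma2_pdf)):
    area = integrate(pdf, 0.0, upper)
    mean = integrate(lambda x, pdf=pdf: x * pdf(x), 0.0, upper)
    results[name] = (area, mean, entropy_nats(pdf, 0.0, upper))
    print(f"{name:14s} area = {area:.4f}  mean = {mean:.4f}  "
          f"entropy = {results[name][2]:.4f} nats")
```

Both densities satisfy the unit-area and mean constraints, but the exponential has the larger entropy, matching the closed form $1 + \ln\mu$ nats.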

#### Gaussian Distribution

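The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $x\in(-\infty,\infty)$ but having a finite mean $\mu$ and finite variance $\sigma^2$. Proceeding as before, we obtain the objective function

$$\begin{aligned} J(p) \triangleq{} & -\int_{-\infty}^{\infty} p(x)\,\ln p(x)\,dx + \lambda_0\left(\int_{-\infty}^{\infty} p(x)\,dx - 1\right) \\ & + \lambda_1\left(\int_{-\infty}^{\infty} x\,p(x)\,dx - \mu\right) + \lambda_2\left(\int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx - \sigma^2\right) \end{aligned}$$

and partial derivatives

$$\begin{aligned} \frac{\partial}{\partial p(x)} J(p) &= -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 \\ \frac{\partial^2}{\partial p(x)^2} J(p) &= -\frac{1}{p(x)} \end{aligned}$$

Setting the first partial to zero gives $p(x) = e^{(\lambda_0 - 1) + \lambda_1 x + \lambda_2 (x-\mu)^2}$, and the three constraints require $\lambda_1 = 0$, $\lambda_2 = -1/(2\sigma^2)$, and $e^{\lambda_0 - 1} = 1/\sqrt{2\pi\sigma^2}$, leading to

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{D.41}$$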
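To see the Gaussian property concretely, one can compare the differential entropies of a few densities that share the same variance. The sketch below uses the standard closed-form entropies of the Gaussian, Laplace, and uniform densities; the choice of comparison densities, the value of `sigma`, and the function names are assumptions made for illustration.

```python
import math

sigma = 1.0  # hypothetical standard deviation; the mean does not affect entropy

def gaussian_entropy_nats(sigma):
    # Gaussian with variance sigma^2: (1/2) ln(2 pi e sigma^2)
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)

def laplace_entropy_nats(sigma):
    b = sigma / math.sqrt(2.0)       # Laplace scale with variance 2 b^2 = sigma^2
    return 1.0 + math.log(2.0 * b)

def uniform_entropy_nats(sigma):
    width = sigma * math.sqrt(12.0)  # uniform width with variance width^2 / 12
    return math.log(width)

entropies = {
    "gaussian": gaussian_entropy_nats(sigma),
    "laplace": laplace_entropy_nats(sigma),
    "uniform": uniform_entropy_nats(sigma),
}

for name, h in entropies.items():
    print(f"{name:8s}: {h:.4f} nats = {h / math.log(2.0):.4f} bits")
```

With equal variance, the Gaussian entropy is the largest of the three, followed by the Laplace and then the uniform.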