Maximum Entropy Property of the
Gaussian Distribution

Entropy of a Probability Distribution

The entropy of a probability density function (PDF) $ p(x)$ is defined as [48]

$\displaystyle \zbox {h(p) \isdef \int_x p(x) \cdot \lg\left[\frac{1}{p(x)}\right] dx}$ (D.29)

where $ \lg$ denotes the logarithm base 2. The entropy of $ p(x)$ can be interpreted as the average number of bits needed to specify values of the random variable $ x$ drawn according to $ p(x)$ :

$\displaystyle h(p) = {\cal E}_p\left\{\lg \left[\frac{1}{p(x)}\right]\right\}$ (D.30)

The term $ \lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $ x$ . (The most common values of $ x$ should be assigned the fewest bits, while rare values can be assigned many bits.)

Example: Random Bit String

Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always $ P(0)=P(1)=1/2$ . The corresponding probability density function is

$\displaystyle p_b(x) = \frac{1}{2}\delta(x) + \frac{1}{2}\delta(x-1)$ (D.31)

and the entropy is

$\displaystyle h(p_b) = \frac{1}{2}\lg(2) + \frac{1}{2}\lg(2) = \lg(2) = 1$ (D.32)

Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.

If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get

$\displaystyle p_b(x) = \frac{1}{4}\delta(x) + \frac{3}{4}\delta(x-1)$

$\displaystyle h(p_b) = \frac{1}{4}\lg(4) + \frac{3}{4}\lg\left(\frac{4}{3}\right) = 0.81128\ldots$

and the sequence can be compressed by about $ 19\%$ .
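These entropies are easy to check numerically. Below is a minimal sketch in plain Python (the helper name `entropy_bits` is ours) that evaluates the defining sum for both the fair and the biased bit:

```python
import math

def entropy_bits(probs):
    """Discrete entropy in bits: sum of p * lg(1/p), with 0 * lg(1/0) taken as 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair bit  -> 1.0
print(entropy_bits([0.25, 0.75]))  # biased bit ~ 0.81128
```

The biased case reproduces the value above: about 0.81 bits per symbol, hence roughly 19% compressible.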

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$\displaystyle p_b(x) = \lim_{\epsilon\to0}\left[\epsilon\,\delta(x) + (1-\epsilon)\,\delta(x-1)\right]$

$\displaystyle h(p_b) = \lim_{\epsilon\to0}\left[\epsilon\,\lg\left(\frac{1}{\epsilon}\right)\right] + 1\cdot\lg(1) = 0.$

Thus, the entropy is 0 when the sequence is perfectly predictable.
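The limit $ \epsilon\,\lg(1/\epsilon)\to 0$ used above is easy to tabulate numerically (a quick check in plain Python):

```python
import math

# epsilon * lg(1/epsilon) shrinks to 0 as epsilon -> 0
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(eps, eps * math.log2(1.0 / eps))
```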

Maximum Entropy Distributions

Uniform Distribution

Among probability distributions $ p(x)$ which are nonzero over a finite range of values $ x\in[a,b]$ , the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,

$\displaystyle h(p) \isdef -\int_a^b p(x)\, \lg p(x)\, dx$ (D.33)

with respect to $ p(x)$ , subject to the constraints

$\displaystyle p(x) \geq 0$

$\displaystyle \int_a^b p(x)\,dx = 1.$

Using the method of Lagrange multipliers for optimization in the presence of constraints [86], we may form the objective function (switching from $ \lg$ to the natural logarithm, which scales the entropy by a constant and therefore does not change the maximizing $ p$ )

$\displaystyle J(p) \isdef -\int_a^b p(x) \, \ln p(x) \,dx + \lambda_0\left(\int_a^b p(x)\,dx - 1\right)$ (D.34)

and differentiate with respect to $ p(x)$ (and renormalize by dropping the $ dx$ factor multiplying all terms) to obtain

$\displaystyle \frac{\partial}{\partial p(x)\,dx} J(p) = - \ln p(x) - 1 + \lambda_0.$ (D.35)

Setting this to zero and solving for $ p(x)$ gives

$\displaystyle p(x) = e^{\lambda_0-1}.$ (D.36)

(Setting the partial derivative with respect to $ \lambda_0$ to zero merely restates the constraint.)

Choosing $ \lambda_0$ to satisfy the constraint gives $ \lambda_0 = 1-\ln(b-a)$ , yielding

$\displaystyle p(x) = \left\{\begin{array}{ll} \frac{1}{b-a}, & a\leq x \leq b \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right.$ (D.37)

That this solution is a maximum rather than a minimum or inflection point can be verified by checking that the second partial derivative is negative for all $ x$ :

$\displaystyle \frac{\partial^2}{\partial p(x)^2dx} J(p) = - \frac{1}{p(x)}$ (D.38)

Since the solution automatically satisfies $ p(x)>0$ , the second derivative is negative everywhere, and the solution is a maximum.
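As a sanity check on this result, one can discretize: among probability vectors on $ N$ bins, the uniform vector should maximize the discrete entropy, attaining $ \lg N$ bits. A quick numerical experiment (plain Python; the helper name and the choice $ N=16$ are ours):

```python
import math
import random

def entropy_bits(probs):
    """Discrete entropy in bits of a probability vector."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

N = 16
h_max = entropy_bits([1.0 / N] * N)   # uniform distribution: lg(16) = 4 bits

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(N)]
    total = sum(w)
    p = [wi / total for wi in w]      # random distribution on N bins
    assert entropy_bits(p) <= h_max + 1e-12  # never beats the uniform

print(h_max)  # -> 4.0
```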

Exponential Distribution

Among probability distributions $ p(x)$ which are nonzero over the semi-infinite range of values $ x\in[0,\infty)$ and having a finite mean $ \mu$ , the exponential distribution has maximum entropy.

To the previous case, we add the new constraint

$\displaystyle \int_0^\infty x\,p(x)\,dx = \mu < \infty$ (D.39)

resulting in the objective function

$\displaystyle J(p) \isdef -\int_0^\infty p(x) \, \ln p(x)\,dx + \lambda_0\left(\int_0^\infty p(x)\,dx - 1\right) + \lambda_1\left(\int_0^\infty x\,p(x)\,dx - \mu\right)$
Now the partials with respect to $ p(x)$ are

$\displaystyle \frac{\partial}{\partial p(x)\,dx} J(p) = - \ln p(x) - 1 + \lambda_0 + \lambda_1 x$

$\displaystyle \frac{\partial^2}{\partial p(x)^2\,dx} J(p) = - \frac{1}{p(x)}$

and $ p(x)$ is of the form $ p(x) = e^{(\lambda_0-1)+\lambda_1x}$ . The unit-area and finite-mean constraints result in $ \exp(\lambda_0-1) = 1/\mu$ and $ \lambda_1=-1/\mu$ , yielding

$\displaystyle p(x) = \left\{\begin{array}{ll} \frac{1}{\mu} e^{-x/\mu}, & x\geq 0 \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right.$ (D.40)
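This claim can be spot-checked numerically: the exponential with mean $ \mu$ has differential entropy $ 1+\ln\mu$ nats, and any other mean-$ \mu$ density on $ [0,\infty)$ should have less. Below is a sketch (plain Python; the midpoint-rule integrator and the comparison density, uniform on $ [0,2\mu]$ , are our choices):

```python
import math

def diff_entropy(pdf, a, b, n=200000):
    """Differential entropy in nats of pdf on [a, b], via the midpoint rule."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        p = pdf(a + (i + 0.5) * dx)
        if p > 0:
            h -= p * math.log(p) * dx
    return h

mu = 1.0
expo = lambda x: math.exp(-x / mu) / mu   # exponential with mean mu
unif = lambda x: 1.0 / (2 * mu)           # uniform on [0, 2*mu], also mean mu

print(diff_entropy(expo, 0, 50))          # ~ 1 + ln(mu) = 1.0
print(diff_entropy(unif, 0, 2 * mu))      # ~ ln(2) = 0.693..., smaller as expected
```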

Gaussian Distribution

The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $ x\in(-\infty,\infty)$ but having a finite mean $ \mu$ and finite variance $ \sigma^2$ .

Proceeding as before, we obtain the objective function

$\displaystyle J(p) \isdef -\int_{-\infty}^\infty p(x) \, \ln p(x)\,dx + \lambda_0\left(\int_{-\infty}^\infty p(x)\,dx - 1\right) + \lambda_1\left(\int_{-\infty}^\infty x\,p(x)\,dx - \mu\right) + \lambda_2\left(\int_{-\infty}^\infty x^2\,p(x)\,dx - \sigma^2\right)$

and partial derivatives

$\displaystyle \frac{\partial}{\partial p(x)\,dx} J(p) = - \ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2$

$\displaystyle \frac{\partial^2}{\partial p(x)^2\,dx} J(p) = - \frac{1}{p(x)}$

leading to

$\displaystyle p(x) = e^{(\lambda_0-1)+\lambda_1 x + \lambda_2 x^2} = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}.$ (D.41)

The second equality follows after choosing the multipliers to satisfy the constraints, taking $ \mu=0$ for simplicity (so that $ \lambda_1=0$ ); a nonzero mean merely shifts the exponent to $ -(x-\mu)^2/2\sigma^2$ .

For more on entropy and maximum-entropy distributions, see [48].
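The Gaussian's maximality can be illustrated the same way as in the exponential case: its differential entropy is $ \frac{1}{2}\ln(2\pi e\sigma^2)$ nats, which should exceed that of any other density with the same variance. Below is a sketch comparing against a Laplace density matched to variance $ \sigma^2$ (the integrator and the choice of comparison distribution are ours):

```python
import math

def diff_entropy(pdf, a, b, n=400000):
    """Differential entropy in nats of pdf on [a, b], via the midpoint rule."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        p = pdf(a + (i + 0.5) * dx)
        if p > 0:
            h -= p * math.log(p) * dx
    return h

sigma = 1.0
gauss = lambda x: math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
b_lap = sigma / math.sqrt(2)              # Laplace scale giving variance sigma^2
laplace = lambda x: math.exp(-abs(x) / b_lap) / (2 * b_lap)

h_gauss = diff_entropy(gauss, -10, 10)    # ~ 0.5 * ln(2*pi*e) = 1.4189...
h_lap = diff_entropy(laplace, -10, 10)    # ~ 1 + ln(2 * b_lap) = 1.3466...
print(h_gauss, h_lap)
```

The Laplace density, despite matching the variance, comes out with strictly less entropy, as the maximum-entropy property predicts.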
