Maximum Entropy Property of the Gaussian Distribution

Free Books Spectral Audio Signal Processing

Maximum Entropy Property of the
Gaussian Distribution

Entropy of a Probability Distribution

The entropy of a probability density function (PDF) is defined as [48]

$\displaystyle \zbox {h(p) \isdef \int_x p(x) \cdot \lg\left[\frac{1}{p(x)}\right] dx}$

(D.29)

where $\lg$ denotes the logarithm base 2. The entropy of can be interpreted as the average number of bits needed to specify random variables drawn at random according to :

$\displaystyle h(p) = {\cal E}_p\left\{\lg \left[\frac{1}{p(x)}\right]\right\}$

(D.30)

The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value . (The most common values of should be assigned the fewest bits, while rare values can be assigned many bits.)

Example: Random Bit String

Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always . The corresponding probability density function is

$\displaystyle p_b(x) = \frac{1}{2}\delta(x) + \frac{1}{2}\delta(x-1)$

(D.31)

and the entropy is

$\displaystyle h(p_b) = \frac{1}{2}\lg(2) + \frac{1}{2}\lg(2) = \lg(2) = 1$

(D.32)

Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.

If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get

$\begin{eqnarray*} p_b(x) &=& \frac{1}{4}\delta(x) + \frac{3}{4}\delta(x-1)\\ h(p_b) &=& \frac{1}{4}\lg(4) + \frac{3}{4}\lg\left(\frac{4}{3}\right) = 0.81128\ldots \end{eqnarray*}$

and the sequence can be compressed about $19\%$ .

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$\begin{eqnarray*} p_b(x) &=& \lim_{\epsilon \to0}\left[\epsilon \delta(x) + (1-\epsilon )\delta(x-1)\right]\\ h(p_b) &=& \lim_{\epsilon \to0}\epsilon \cdot\lg\left(\frac{1}{\epsilon }\right) + 1\cdot\lg(1) = 0. \end{eqnarray*}$

Thus, the entropy is 0 when the sequence is perfectly predictable.

Maximum Entropy Distributions

Uniform Distribution

Among probability distributions which are nonzero over a finite range of values $x\in[a,b]$ , the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,

$\displaystyle H(p) \isdef -\int_a^b p(x)\, \lg p(x)\, dx$

(D.33)

with respect to , subject to the constraints

$\begin{eqnarray*} p(x) &\geq& 0\\ \int_a^b p(x)\,dx &=& 1. \end{eqnarray*}$

Using the method of Lagrange multipliers for optimization in the presence of constraints [86], we may form the objective function

$\displaystyle J(p) \isdef -\int_a^b p(x) \, \ln p(x) \,dx + \lambda_0\left(\int_a^b p(x)\,dx - 1\right)$

(D.34)

and differentiate with respect to (and renormalize by dropping the factor multiplying all terms) to obtain

$\displaystyle \frac{\partial}{\partial p(x)\,dx} J(p) = - \ln p(x) - 1 + \lambda_0.$

(D.35)

Setting this to zero and solving for gives

$\displaystyle p(x) = e^{\lambda_0-1}.$

(D.36)

(Setting the partial derivative with respect to $\lambda_0$ to zero merely restates the constraint.)

Choosing $\lambda_0$ to satisfy the constraint gives $\lambda_0 =1-\ln(b-a)$ , yielding

$\displaystyle p(x) = \left\{\begin{array}{ll} \frac{1}{b-a}, & a\leq x \leq b \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right.$

(D.37)

That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all :

$\displaystyle \frac{\partial^2}{\partial p(x)^2dx} J(p) = - \frac{1}{p(x)}$

(D.38)

Since the solution spontaneously satisfied , it is a maximum.

Exponential Distribution

Among probability distributions which are nonzero over a semi-infinite range of values $x\in[0,\infty]$ and having a finite mean $\mu$ , the exponential distribution has maximum entropy.

To the previous case, we add the new constraint

$\displaystyle \int_{-\infty}^\infty x\,p(x)\,dx = \mu < \infty$

(D.39)

resulting in the objective function

$\begin{eqnarray*} J(p) &\isdef & -\int_0^\infty p(x) \, \ln p(x)\,dx + \lambda_0\left(\int_0^\infty p(x)\,dx - 1\right).\\ & & + \lambda_1\left(\int_0^\infty x\,p(x)\,dx - \mu\right) \end{eqnarray*}$

Now the partials with respect to are

$\begin{eqnarray*} \frac{\partial}{\partial p(x)\,dx} J(p) &=& - \ln p(x) - 1 + \lambda_0 + \lambda_1 x\\ \frac{\partial^2}{\partial p(x)^2 dx} J(p) &=& - \frac{1}{p(x)} \end{eqnarray*}$

and is of the form $p(x) = e^{(\lambda_0-1)+\lambda_1x}$ . The unit-area and finite-mean constraints result in $\exp(\lambda_0-1) = 1/\mu$ and $\lambda_1=-1/\mu$ , yielding

$\displaystyle p(x) = \left\{\begin{array}{ll} \frac{1}{\mu} e^{-x/\mu}, & x\geq 0 \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right.$

(D.40)

Gaussian Distribution

The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $x\in(-\infty,\infty)$ but having a finite mean $\mu$ and finite variance $\sigma^2$ .

Proceeding as before, we obtain the objective function

$\begin{eqnarray*} J(p) &\isdef & -\int_{-\infty}^\infty p(x) \, \ln p(x)\,dx + \lambda_0\left(\int_{-\infty}^\infty p(x)\,dx - 1\right)\\ &+& \lambda_1\left(\int_{-\infty}^\infty x\,p(x)\,dx - \mu\right) + \lambda_2\left(\int_{-\infty}^\infty x^2\,p(x)\,dx - \sigma^2\right) \end{eqnarray*}$

and partial derivatives

$\begin{eqnarray*} \frac{\partial}{\partial p(x)\,dx} J(p) &=& - \ln p(x) - 1 + \lambda_0 + \lambda_1 x\\ \frac{\partial^2}{\partial p(x)^2 dx} J(p) &=& - \frac{1}{p(x)} \end{eqnarray*}$

leading to

$\displaystyle p(x) = e^{(\lambda_0-1)+\lambda_1 x + \lambda_2 x^2} = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}.$

(D.41)

For more on entropy and maximum-entropy distributions, see [48].

Next Section:
Gaussian Moments
Previous Section:
Gaussian Probability Density Function