Gaussian Function Properties

This appendix collects together various facts about the fascinating Gaussian function--the classic bell curve'' that arises repeatedly in science and mathematics. As already seen in §B.17.1, only the Gaussian achieves the minimum time-bandwidth product among all smooth (analytic) functions.

Gaussian Window and Transform

The Gaussian window for FFT analysis was introduced in §3.11, and complex Gaussians (chirplets'') were utilized in §10.6. For reference in support of these topics, this appendix derives some additional properties of the Gaussian, defined by

 (D.1)

and discusses some interesting applications in spectral modeling (the subject of §10.4). The basic mathematics rederived here are well known (see, e.g., [202,5]), while the application to spectral modeling of sound remains a topic under development.

Gaussians Closed under Multiplication

Define

where are arbitrary complex numbers. Then by direct calculation, we have

Completing the square, we obtain

 (D.2)

with

Note that this result holds for Gaussian-windowed chirps ( and complex).

Product of Two Gaussian PDFs

For the special case of two Gaussian probability densities,

the product density has mean and variance given by

Gaussians Closed under Convolution

In §D.8 we show that

• the Fourier transform of a Gaussian is Gaussian, and in §D.2 that
• the product of any two Gaussians is Gaussian.
Therefore, it follows from the convolution theorem for Fourier transforms (§B.7) that the convolution of any two Gaussians is Gaussian.

Fitting a Gaussian to Data

When fitting a single Gaussian to data, one can take a log and fit a parabola. In matlab, this can be carried out as in the following example:

x = -1:0.1:1;
sigma = 0.01;
y = exp(-x.*x) + sigma*randn(size(x)); % test data:
[p,s] = polyfit(x,log(y),2); % fit parabola to log
yh = exp(polyval(p,x)); % data model
norm(y-yh) % ans =  1.9230e-16 when sigma=0
plot(abs([y',yh']));

In practice, it is good to avoid zeros in the data. For example, one can fit only to the middle third or so of a measured peak, restricting consideration to measured samples that are positive and look Gaussian'' to a reasonable extent.

Infinite Flatness at Infinity

The Gaussian is infinitely flat at infinity. Equivalently, the Maclaurin expansion (Taylor expansion about ) of

 (D.3)

is zero for all orders. Thus, even though is differentiable of all orders at , its series expansion fails to approach the function. This happens because has an essential singularity at (also called a non-removable singularity''). One can think of an essential singularity as an infinite number of poles piled up at the same point ( for ). Equivalently, above has an infinite number of zeros at , leading to the problem with Maclaurin series expansion. To prove this, one can show

 (D.4)

for all . This follows from the fact that exponential growth or decay is faster than polynomial growth or decay. An exponential can in fact be viewed as an infinite-order polynomial, since

 (D.5)

We may call infinitely flat at in the Padé sense'':

Another interesting mathematical property of essential singularities is that near an essential singular point the inequality

 (D.6)

is satisfied at some point in every neighborhood of , however small. In other words, comes arbitrarily close to every possible value in any neighborhood about an essential singular point. This was first proved by Weierstrass [42, p. 270].

Integral of a Complex Gaussian

Theorem:

 (D.7)

Proof: Let denote the integral. Then

where we needed re to have as . Thus,

 (D.8)

as claimed.

Area Under a Real Gaussian

Corollary: Setting in the previous theorem, where is real, we have

 (D.9)

Therefore, we may normalize the Gaussian to unit area by defining

 (D.10)

Since

 and (D.11)

it satisfies the requirements of a probability density function.

Gaussian Integral with Complex Offset

Theorem:

 (D.12)

Proof: When , we have the previously proved case. For arbitrary and real number , let denote the closed rectangular contour , depicted in Fig.D.1.

Clearly, is analytic inside the region bounded by . By Cauchy's theorem [42], the line integral of along is zero, i.e.,

 (D.13)

This line integral breaks into the following four pieces:

where and are real variables. In the limit as , the first piece approaches , as previously proved. Pieces and contribute zero in the limit, since as . Since the total contour integral is zero by Cauchy's theorem, we conclude that piece 3 is the negative of piece 1, i.e., in the limit as ,

 (D.14)

Making the change of variable , we obtain

 (D.15)

as desired.

Fourier Transform of Complex Gaussian

Theorem:

 (D.16)

Proof: [202, p. 211] The Fourier transform of is defined as

 (D.17)

Completing the square of the exponent gives

Thus, the Fourier transform can be written as

 (D.18)

using our previous result.

Alternate Proof

The Fourier transform of a complex Gaussian can also be derived using the differentiation theorem and its dual (§B.2).D.1

Proof: Let

 (D.19)

Then by the differentiation theorem (§B.2),

 (D.20)

By the differentiation theorem dual (§B.3),

 (D.21)

Differentiating gives

 (D.22)

Therefore,

 (D.23)

or

 (D.24)

Integrating both sides with respect to yields

 (D.25)

In §D.7, we found that , so that, finally, exponentiating gives

 (D.26)

as expected.

The Fourier transform of complex Gaussians (chirplets'') is used in §10.6 to analyze Gaussian-windowed chirps'' in the frequency domain.

Why Gaussian?

This section lists some of the points of origin for the Gaussian function in mathematics and physics.

Central Limit Theorem

The central limit theoremD.2provides that many iterated convolutions of any sufficiently regular'' shape will approach a Gaussian function.

Iterated Convolutions

Any reasonable'' probability density function (PDF) (§C.1.3) has a Fourier transform that looks like near its tip. Iterating convolutions then corresponds to , which becomes [2]

 (D.27)

for large , by the definition of [264]. This proves that the th power of approaches the Gaussian function defined in §D.1 for large .

Since the inverse Fourier transform of a Gaussian is another Gaussian (§D.8), we can define a time-domain function as being sufficiently regular'' when its Fourier transform approaches in a sufficiently small neighborhood of . That is, the Fourier transform simply needs a sufficiently smooth peak'' at that can be expanded into a convergent Taylor series. This obviously holds for the DTFT of any discrete-time window function (the subject of Chapter 3), because the window transform is a finite sum of continuous cosines of the form in the zero-phase case, and complex exponentials in the causal case, each of which is differentiable any number of times in .

Binomial Distribution

The last row of Pascal's triangle (the binomial distribution) approaches a sampled Gaussian function as the number of rows increases.D.3 Since Lagrange interpolation (elementary polynomial interpolation) is equal to binomially windowed sinc interpolation [301,134], it follows that Lagrange interpolation approaches Gaussian-windowed sinc interpolation at high orders.

GaussianProbability Density Function

Any non-negative function which integrates to 1 (unit total area) is suitable for use as a probability density function (PDF) (§C.1.3). The most general Gaussian PDF is given by shifts of the normalized Gaussian:

 (D.28)

The parameter is the mean, and is the variance of the distribution (we'll show this in §D.12 below).

Maximum Entropy Property of the Gaussian Distribution

Entropy of a Probability Distribution

The entropy of a probability density function (PDF) is defined as [48]

 (D.29)

where denotes the logarithm base 2. The entropy of can be interpreted as the average number of bits needed to specify random variables drawn at random according to :

 (D.30)

The term can be viewed as the number of bits which should be assigned to the value . (The most common values of should be assigned the fewest bits, while rare values can be assigned many bits.)

Example: Random Bit String

Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always . The corresponding probability density function is

 (D.31)

and the entropy is

 (D.32)

Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.

If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get

and the sequence can be compressed about .

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

Thus, the entropy is 0 when the sequence is perfectly predictable.

Maximum Entropy Distributions

Uniform Distribution

Among probability distributions which are nonzero over a finite range of values , the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,

 (D.33)

with respect to , subject to the constraints

Using the method of Lagrange multipliers for optimization in the presence of constraints [86], we may form the objective function

 (D.34)

and differentiate with respect to (and renormalize by dropping the factor multiplying all terms) to obtain

 (D.35)

Setting this to zero and solving for gives

 (D.36)

(Setting the partial derivative with respect to to zero merely restates the constraint.)

Choosing to satisfy the constraint gives , yielding

 (D.37)

That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all :

 (D.38)

Since the solution spontaneously satisfied , it is a maximum.

Exponential Distribution

Among probability distributions which are nonzero over a semi-infinite range of values and having a finite mean , the exponential distribution has maximum entropy.

To the previous case, we add the new constraint

 (D.39)

resulting in the objective function

Now the partials with respect to are

and is of the form . The unit-area and finite-mean constraints result in and , yielding

 (D.40)

Gaussian Distribution

The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line but having a finite mean and finite variance .

Proceeding as before, we obtain the objective function

and partial derivatives

 (D.41)

For more on entropy and maximum-entropy distributions, see [48].

Gaussian Moments

Gaussian Mean

The mean of a distribution is defined as its first-order moment:

 (D.42)

To show that the mean of the Gaussian distribution is , we may write, letting ,

since .

Gaussian Variance

The variance of a distribution is defined as its second central moment:

 (D.43)

where is the mean of .

To show that the variance of the Gaussian distribution is , we write, letting ,

where we used integration by parts and the fact that as .

Higher Order Moments Revisited

Theorem: The th central moment of the Gaussian pdf with mean and variance is given by

 (D.44)

where denotes the product of all odd integers up to and including (see double-factorial notation''). Thus, for example, , , , and .

Proof: The formula can be derived by successively differentiating the moment-generating function with respect to and evaluating at ,D.4 or by differentiating the Gaussian integral

 (D.45)

successively with respect to [203, p. 147-148]:

for . Setting and , and dividing both sides by yields

 (D.46)

for . Since the change of variable has no affect on the result, (D.44) is also derived for .

Moment Theorem

Theorem: For a random variable ,

 (D.47)

where is the characteristic function of the PDF of :

 (D.48)

(Note that is the complex conjugate of the Fourier transform of .)

Proof: [201, p. 157] Let denote the th moment of , i.e.,

 (D.49)

Then

where the term-by-term integration is valid when all moments are finite.

Gaussian Characteristic Function

Since the Gaussian PDF is

 (D.50)

and since the Fourier transform of is

 (D.51)

It follows that the Gaussian characteristic function is

 (D.52)

Gaussian Central Moments

The characteristic function of a zero-mean Gaussian is

 (D.53)

Since a zero-mean Gaussian is an even function of , (i.e., ), all odd-order moments are zero. By the moment theorem, the even-order moments are

 (D.54)

In particular,

Since and , we see , , as expected.

A Sum of Gaussian Random Variables is a Gaussian Random Variable

A basic result from the theory of random variables is that when you sum two independent random variables, you convolve their probability density functions (PDF). (Equivalently, in the frequency domain, their characteristic functions multiply.)

That the sum of two independent Gaussian random variables is Gaussian follows immediately from the fact that Gaussians are closed under multiplication (or convolution).

Next Section:
Bilinear Frequency-Warping for Audio Spectrum Analysis over Bark and ERB Frequency Scales
Previous Section:
Beginning Statistical Signal Processing