# Gaussian Function Properties

This appendix collects together various facts about the fascinating *Gaussian* function--the classic ``bell curve'' that arises repeatedly in science and mathematics. As already seen in §B.17.1, only the Gaussian achieves the *minimum time-bandwidth product* among all smooth (analytic) functions.

## Gaussian Window and Transform

The *Gaussian window* for FFT analysis was introduced in §3.11, and complex Gaussians (``chirplets'') were utilized in §10.6. For reference in support of these topics, this appendix derives some additional properties of the Gaussian, defined by

$$f(t) \triangleq e^{-\frac{t^2}{2\sigma^2}}, \qquad\text{(D.1)}$$

and discusses some interesting applications in spectral modeling (the subject of §10.4). The basic mathematics rederived here are well known (see, *e.g.*, [202,5]), while the application to spectral modeling of sound remains a topic under development.

## Gaussians Closed under Multiplication

Define

$$f_p(t) \triangleq e^{-p t^2} \qquad\text{(D.2)}$$

with $p \triangleq \alpha + j\beta$, $\alpha > 0$. These are *Gaussian-windowed chirps* ($\alpha$ and $\beta$ real, $p$ complex). Since

$$f_{p_1}(t)\,f_{p_2}(t) = e^{-(p_1+p_2)t^2} = f_{p_1+p_2}(t),$$

the product of any two such Gaussians is again a Gaussian of the same form.

### Product of Two Gaussian PDFs

For the special case of two Gaussian probability densities, the product is a scaled Gaussian whose variance $\sigma^2$ satisfies $1/\sigma^2 = 1/\sigma_1^2 + 1/\sigma_2^2$, and whose mean is $\mu = \sigma^2(\mu_1/\sigma_1^2 + \mu_2/\sigma_2^2)$.

## Gaussians Closed under Convolution

In §D.8 we show that

- the Fourier transform of a Gaussian is Gaussian, and in §D.2 that
- the product of any two Gaussians is Gaussian.

It therefore follows from the *convolution theorem* for Fourier transforms (§B.7) that the *convolution of any two Gaussians is Gaussian*.
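This closure property is easy to spot-check numerically. The following Python/NumPy sketch (an illustration added here, not part of the derivation) convolves two sampled unit-area Gaussians and compares the result against a Gaussian whose variance is the sum of the input variances:

```python
import numpy as np

def gaussian(t, sigma):
    """Unit-area Gaussian with standard deviation sigma."""
    return np.exp(-t**2 / (2*sigma**2)) / (sigma*np.sqrt(2*np.pi))

dt = 0.001
t = np.arange(-10, 10 + dt/2, dt)   # odd number of samples, centered on t=0
g1 = gaussian(t, 0.5)
g2 = gaussian(t, 1.2)

# Discrete approximation of the continuous convolution (scaled by dt)
conv = np.convolve(g1, g2, mode="same") * dt

# Closure under convolution predicts a Gaussian with variance 0.5^2 + 1.2^2
predicted = gaussian(t, np.sqrt(0.5**2 + 1.2**2))

print(np.max(np.abs(conv - predicted)))  # tiny discretization error
```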

## Fitting a Gaussian to Data

When fitting a single Gaussian to data, one can take a log and fit a parabola. In matlab, this can be carried out as in the following example:

```matlab
x = -1:0.1:1;
sigma = 0.01;
y = exp(-x.*x) + sigma*randn(size(x)); % test data
[p,s] = polyfit(x,log(y),2);           % fit parabola to log
yh = exp(polyval(p,x));                % data model
norm(y-yh)                             % ans = 1.9230e-16 when sigma=0
plot(abs([y',yh']));
```

In practice, it is good to avoid zeros in the data. For example, one can fit only to the middle third or so of a measured peak, restricting consideration to measured samples that are positive and ``look Gaussian'' to a reasonable extent.

## Infinite Flatness at Infinity

The Gaussian is *infinitely flat* at infinity. Equivalently, the Maclaurin expansion (Taylor expansion about $t=0$) of

$$f(t) \triangleq e^{-\frac{1}{t^2}} \qquad\text{(D.3)}$$

is *zero* for all orders. Thus, even though $f(t)$ is differentiable of all orders at $t=0$, its series expansion fails to approach the function. This happens because $f(t)$ has an *essential singularity* at $t=0$ (also called a ``non-removable singularity''). One can think of an essential singularity as an infinite number of *poles* piled up at the same point ($t=0$ for $e^{-1/t^2}$). Equivalently, $f(t)$ above has an infinite number of zeros at $t=0$, leading to the problem with Maclaurin series expansion. To prove this, one can show

$$\lim_{t\to 0} \frac{e^{-\frac{1}{t^2}}}{t^n} = 0 \qquad\text{(D.4)}$$

for all integers $n \ge 0$. This follows from the fact that exponential growth or decay is faster than polynomial growth or decay. An exponential can in fact be viewed as an infinite-order polynomial, since

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \qquad\text{(D.5)}$$
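The limit in Eq.(D.4) can be sampled numerically; a short Python sketch (added for illustration) shows the ratio collapsing toward zero even for a large polynomial order $n$:

```python
import math

# e^{-1/t^2} / t^n as t -> 0: the essential singularity's decay
# overwhelms any fixed polynomial order n.
n = 10
ratios = [math.exp(-1.0/t**2) / t**n for t in (0.5, 0.2, 0.1)]
print(ratios)  # roughly 18.8, 1.4e-4, 3.7e-34
```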

We may call $e^{-1/t^2}$ *infinitely flat* at $t=0$ in the ``Padé sense'':

- Padé approximation is *maximally flat* approximation, and seeks to use all degrees of freedom in the approximation to match the leading terms of the Taylor series expansion.
- Butterworth filters (IIR) are maximally flat at dc [263].
- Lagrange interpolation (FIR) is maximally flat at dc [266].
- Thiran allpass interpolation has maximally flat group delay at dc [266].

In every neighborhood of an *essential singular point*, the inequality

$$\left|f(t) - c\right| < \epsilon \qquad\text{(D.6)}$$

is satisfied at some point in *every neighborhood* of the singularity, however small. In other words, $f(t)$ comes arbitrarily close to *every possible value* $c$ in *any neighborhood* about an essential singular point. This was first proved by Weierstrass [42, p. 270].

## Integral of a Complex Gaussian

**Theorem:**

$$\int_{-\infty}^{\infty} e^{-pt^2}\,dt = \sqrt{\frac{\pi}{p}}, \qquad \mathrm{re}\left\{p\right\} > 0 \qquad\text{(D.7)}$$

*Proof:* Let $I$ denote the integral. Then

$$I^2 = \int_{-\infty}^{\infty} e^{-px^2}dx \int_{-\infty}^{\infty} e^{-py^2}dy
     = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-p(x^2+y^2)}\,dx\,dy
     = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-pr^2}\,r\,dr\,d\theta
     = 2\pi\left[-\frac{e^{-pr^2}}{2p}\right]_{0}^{\infty} = \frac{\pi}{p},$$

where we needed $\mathrm{re}\left\{p\right\} > 0$ to have $e^{-pr^2}\to 0$ as $r\to\infty$. Thus,

$$I = \int_{-\infty}^{\infty} e^{-pt^2}\,dt = \sqrt{\frac{\pi}{p}} \qquad\text{(D.8)}$$

as claimed.
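The theorem can be spot-checked numerically for a particular complex $p$ (a Python/NumPy sketch, added for illustration; `np.sqrt` returns the principal root, as appropriate for $\mathrm{re}\{p\}>0$):

```python
import numpy as np

p = 1.0 + 0.7j                      # any p with re{p} > 0
t = np.linspace(-20.0, 20.0, 400001)

# Brute-force quadrature (Riemann sum) of the complex Gaussian
numeric = np.exp(-p*t**2).sum() * (t[1] - t[0])
closed_form = np.sqrt(np.pi / p)    # sqrt(pi/p), principal branch

print(abs(numeric - closed_form))   # negligibly small
```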

### Area Under a Real Gaussian

**Corollary:** Setting $p = 1/(2\sigma^2)$ in the previous theorem, where $\sigma$ is real, we have

$$\int_{-\infty}^{\infty} e^{-\frac{t^2}{2\sigma^2}}\,dt = \sigma\sqrt{2\pi}. \qquad\text{(D.9)}$$

Therefore, we may normalize the Gaussian to *unit area* by defining

$$f(t) \triangleq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{t^2}{2\sigma^2}}. \qquad\text{(D.10)}$$

Since

$$f(t) > 0\ \ \forall t \qquad\text{and}\qquad \int_{-\infty}^{\infty} f(t)\,dt = 1, \qquad\text{(D.11)}$$

it satisfies the requirements of a *probability density* function.
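As a quick sanity check (an added Python sketch, not from the derivation), the normalized Gaussian of Eq.(D.10) integrates to one for any $\sigma$:

```python
import numpy as np

def area(sigma):
    """Numerically integrate the unit-area Gaussian of width sigma."""
    t = np.linspace(-12*sigma, 12*sigma, 200001)
    f = np.exp(-t**2 / (2*sigma**2)) / (sigma*np.sqrt(2*np.pi))
    return f.sum() * (t[1] - t[0])   # Riemann-sum quadrature

areas = [area(s) for s in (0.3, 1.0, 2.5)]
print(areas)  # each value is very close to 1.0
```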

## Gaussian Integral with Complex Offset

**Theorem:**

$$\int_{-\infty}^{\infty} e^{-p(t+c)^2}\,dt = \sqrt{\frac{\pi}{p}}, \qquad \mathrm{re}\left\{p\right\} > 0,\ c \in \mathbb{C} \qquad\text{(D.12)}$$

*Proof:* When $\mathrm{im}\left\{c\right\} = 0$, the real shift $t \to t - c$ leaves the integral unchanged, and we have the previously proved case. For arbitrary $c = a + jb$ and real number $R$, let $\Gamma_R$ denote the closed rectangular contour with corners $-R$, $R$, $R+jb$, $-R+jb$, depicted in Fig.D.1. Clearly, $e^{-pz^2}$ is analytic inside the region bounded by $\Gamma_R$. By Cauchy's theorem [42], the line integral of $e^{-pz^2}$ along $\Gamma_R$ is zero, *i.e.*,

$$\oint_{\Gamma_R} e^{-pz^2}\,dz = 0. \qquad\text{(D.13)}$$

This line integral breaks into four pieces, one per edge of the rectangle. The two integrals along the vertical edges vanish as $R \to \infty$ (since $\mathrm{re}\left\{p\right\} > 0$ forces $e^{-pz^2} \to 0$ there), so the integrals along the two horizontal edges must cancel, *i.e.*, in the limit as $R \to \infty$,

$$\int_{-\infty}^{\infty} e^{-p(t+jb)^2}\,dt = \int_{-\infty}^{\infty} e^{-pt^2}\,dt = \sqrt{\frac{\pi}{p}}. \qquad\text{(D.14)}$$

Making the change of variable $t \to t + a$ (a real shift, which again leaves the integral unchanged), we obtain

$$\int_{-\infty}^{\infty} e^{-p(t+c)^2}\,dt = \sqrt{\frac{\pi}{p}}, \qquad\text{(D.15)}$$

as desired.

## Fourier Transform of Complex Gaussian

**Theorem:**

$$\int_{-\infty}^{\infty} e^{-pt^2} e^{-j\omega t}\,dt = \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}}, \qquad \mathrm{re}\left\{p\right\} > 0 \qquad\text{(D.16)}$$

*Proof:* [202, p. 211] The Fourier transform of $e^{-pt^2}$ is defined as

$$X(\omega) \triangleq \int_{-\infty}^{\infty} e^{-pt^2} e^{-j\omega t}\,dt. \qquad\text{(D.17)}$$

Completing the square of the exponent gives

$$X(\omega) = e^{-\frac{\omega^2}{4p}} \int_{-\infty}^{\infty} e^{-p\left(t + \frac{j\omega}{2p}\right)^2}\,dt
           = e^{-\frac{\omega^2}{4p}} \sqrt{\frac{\pi}{p}}, \qquad\text{(D.18)}$$

using our previous result.
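This transform pair can also be verified numerically at a few frequencies (a Python/NumPy sketch, added for illustration):

```python
import numpy as np

p = 2.0 + 1.5j                      # re{p} > 0
t = np.linspace(-15.0, 15.0, 300001)
dt = t[1] - t[0]
x = np.exp(-p*t**2)

for w in (0.0, 1.0, 3.0):
    numeric = (x * np.exp(-1j*w*t)).sum() * dt       # brute-force FT
    closed = np.sqrt(np.pi/p) * np.exp(-w**2/(4*p))  # closed form
    print(w, abs(numeric - closed))  # all negligibly small
```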

### Alternate Proof

The Fourier transform of a complex Gaussian can also be derived using the differentiation theorem and its dual (§B.2).^{D.1}

*Proof:* Let

$$x(t) \triangleq e^{-pt^2}. \qquad\text{(D.19)}$$

Then by the differentiation theorem (§B.2),

$$\frac{d}{dt}x(t) \;\leftrightarrow\; j\omega X(\omega). \qquad\text{(D.20)}$$

By the differentiation theorem dual (§B.3),

$$t\,x(t) \;\leftrightarrow\; j\frac{d}{d\omega}X(\omega). \qquad\text{(D.21)}$$

Differentiating $x(t)$ gives

$$\frac{d}{dt}x(t) = -2pt\,e^{-pt^2} = -2p\,t\,x(t). \qquad\text{(D.22)}$$

Therefore,

$$j\omega X(\omega) = -2p\,j\frac{d}{d\omega}X(\omega), \qquad\text{(D.23)}$$

or

$$\frac{X'(\omega)}{X(\omega)} = -\frac{\omega}{2p}. \qquad\text{(D.24)}$$

Integrating both sides with respect to $\omega$ yields

$$\ln X(\omega) = -\frac{\omega^2}{4p} + \ln X(0). \qquad\text{(D.25)}$$

In §D.7, we found that $X(0) = \sqrt{\pi/p}$, so that, finally, exponentiating gives

$$X(\omega) = \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}}, \qquad\text{(D.26)}$$

as expected. The Fourier transform of complex Gaussians (``chirplets'') is used in §10.6 to analyze Gaussian-windowed ``chirps'' in the frequency domain.

as expected. The Fourier transform of complex Gaussians (``chirplets'') is used in §10.6 to analyze Gaussian-windowed ``chirps'' in the frequency domain.

## Why Gaussian?

This section lists some of the points of origin for the Gaussian function in mathematics and physics.

### Central Limit Theorem

The central limit theorem^{D.2} provides that many iterated convolutions of any ``sufficiently regular'' shape will approach a Gaussian function.

### Iterated Convolutions

Any ``reasonable'' *probability density function* (PDF) (§C.1.3) has a Fourier transform that looks like $1 - c\omega^2$ near its tip, for some $c > 0$. Iterating $n$ convolutions then corresponds to $\left(1 - c\omega^2\right)^n$, which becomes [2]

$$\left(1 - c\omega^2\right)^n \approx e^{-nc\omega^2} \qquad\text{(D.27)}$$

for large $n$, by the definition of $e$ [264]. This proves that the $n$th power of $1 - c\omega^2$ approaches the Gaussian function defined in §D.1 for large $n$. Since the inverse Fourier transform of a Gaussian is another Gaussian (§D.8), we can define a time-domain function as being ``sufficiently regular'' when its Fourier transform approaches $1 - c\omega^2$ in a sufficiently small neighborhood of $\omega = 0$. That is, the Fourier transform simply needs a ``sufficiently smooth peak'' at $\omega = 0$ that can be expanded into a convergent Taylor series. This obviously holds for the DTFT of any discrete-time window function (the subject of Chapter 3), because the window transform is a finite sum of continuous cosines of the form $\cos(\omega nT)$ in the zero-phase case, and complex exponentials in the causal case, each of which is differentiable any number of times in $\omega$.

### Binomial Distribution

The last row of Pascal's triangle (the binomial distribution) approaches a sampled Gaussian function as the number of rows increases.^{D.3} Since Lagrange interpolation (elementary polynomial interpolation) is equal to binomially windowed sinc interpolation [301,134], it follows that Lagrange interpolation approaches Gaussian-windowed sinc interpolation at high orders.
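The convergence of a binomial row to a sampled Gaussian (mean $n/2$, variance $n/4$) can be checked numerically; a small Python sketch, added for illustration:

```python
import math

def max_gap(n):
    """Max difference between the normalized n-th binomial row and a
    sampled Gaussian with matching mean n/2 and variance n/4."""
    mu, var = n/2, n/4
    return max(
        abs(math.comb(n, k)/2**n
            - math.exp(-(k - mu)**2/(2*var)) / math.sqrt(2*math.pi*var))
        for k in range(n + 1))

print(max_gap(10), max_gap(100))  # the gap shrinks as n grows
```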

## Gaussian Probability Density Function

Any non-negative function which integrates to 1 (unit total area) is suitable for use as a probability density function (PDF) (§C.1.3). The most general Gaussian PDF is given by shifts of the normalized Gaussian:

$$p(x) \triangleq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \qquad\text{(D.28)}$$

The parameter $\mu$ is the *mean*, and $\sigma^2$ is the *variance* of the distribution (we'll show this in §D.12 below).

## Maximum Entropy Property of the Gaussian Distribution

### Entropy of a Probability Distribution

The *entropy* of a probability density function (PDF) $p(x)$ is defined as [48]

$$h(p) \triangleq \int_{-\infty}^{\infty} p(x)\lg\frac{1}{p(x)}\,dx, \qquad\text{(D.29)}$$

where $\lg$ denotes the logarithm base 2. The entropy of $p$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:

$$h(p) = E_p\left\{\lg\frac{1}{p(x)}\right\}. \qquad\text{(D.30)}$$

The term $\lg\frac{1}{p(x)}$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)

### Example: Random Bit String

Consider a random sequence of 1s and 0s, *i.e.*, the probability of a 0 or 1 is always $1/2$. The corresponding probability density function is

$$p(x) = \frac{1}{2}\,\delta(x) + \frac{1}{2}\,\delta(x-1), \qquad\text{(D.31)}$$

and the entropy is

$$h(p) = \frac{1}{2}\lg 2 + \frac{1}{2}\lg 2 = 1. \qquad\text{(D.32)}$$

Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy. If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get

$$h(p) = \frac{1}{4}\lg 4 + \frac{3}{4}\lg\frac{4}{3} = 2 - \frac{3}{4}\lg 3 \approx 0.8113,$$

so the sequence is compressible down to about 0.8113 bits per binary digit.
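These entropies are easy to compute directly; a small Python sketch (added for illustration, using base-2 logarithms as above):

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a discrete distribution."""
    return sum(p * math.log2(1.0/p) for p in probs if p > 0)

h_fair = entropy_bits([0.5, 0.5])     # 1 bit per symbol: incompressible
h_skew = entropy_bits([0.25, 0.75])   # about 0.8113 bits per symbol
print(h_fair, h_skew)
```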

### Maximum Entropy Distributions

#### Uniform Distribution

Among probability distributions $p(x)$ which are nonzero over a *finite* range of values $x \in [a,b]$, the maximum-entropy distribution is the *uniform* distribution. To show this, we must maximize the entropy,

$$h(p) \triangleq \int_a^b p(x)\lg\frac{1}{p(x)}\,dx, \qquad\text{(D.33)}$$

with respect to $p(x)$, subject to the constraints $p(x) \ge 0$ and $\int_a^b p(x)\,dx = 1$. Using the method of *Lagrange multipliers* for optimization in the presence of constraints [86], we may form the *objective function*

$$J(p) \triangleq \int_a^b p(x)\lg\frac{1}{p(x)}\,dx + \lambda\left[\int_a^b p(x)\,dx - 1\right] \qquad\text{(D.34)}$$

and differentiate with respect to $p(x)$ (and renormalize by dropping the $\lg e$ factor multiplying all terms) to obtain

$$\frac{\partial}{\partial p(x)} J(p) = -\ln p(x) - 1 + \lambda. \qquad\text{(D.35)}$$

Setting this to zero and solving for $p(x)$ gives

$$p(x) = e^{\lambda - 1}. \qquad\text{(D.36)}$$

(Setting the partial derivative with respect to $\lambda$ to zero merely restates the constraint.) Choosing $\lambda$ to satisfy the unit-area constraint gives $\lambda = 1 - \ln(b-a)$, yielding

$$p(x) = \frac{1}{b-a}, \qquad x \in [a,b]. \qquad\text{(D.37)}$$

That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $p(x) > 0$:

$$\frac{\partial^2}{\partial p(x)^2} J(p) = -\frac{1}{p(x)}. \qquad\text{(D.38)}$$

Since the solution spontaneously satisfied $p(x) > 0$, it is a maximum.
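The maximality can also be probed numerically: discretize densities on $[0,1]$ and compare the uniform density's entropy against randomly perturbed unit-area densities (a Python sketch, added for illustration; the uniform density on $[0,1]$ has differential entropy exactly 0 bits):

```python
import numpy as np

rng = np.random.default_rng(1)

m = 1000
dx = 1.0 / m

def entropy_bits(p):
    """Differential entropy (bits) of a sampled density on [0, 1]."""
    return float(np.sum(p * np.log2(1.0/p)) * dx)

uniform = np.full(m, 1.0)          # p(x) = 1 on [0, 1]
h_uniform = entropy_bits(uniform)  # 0 bits

# Random positive perturbations, renormalized to unit area, always lose:
h_perturbed = []
for _ in range(5):
    q = np.clip(uniform + 0.5*rng.standard_normal(m), 1e-6, None)
    q /= np.sum(q) * dx            # restore unit area
    h_perturbed.append(entropy_bits(q))

print(h_uniform, max(h_perturbed))  # 0.0 versus something negative
```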

#### Exponential Distribution

Among probability distributions $p(x)$ which are nonzero over a *semi-infinite* range of values $x \in [0,\infty)$ and having a finite mean $\mu$, the *exponential* distribution has maximum entropy. To the previous case, we add the new constraint

$$\int_0^{\infty} x\,p(x)\,dx = \mu < \infty, \qquad\text{(D.39)}$$

resulting in the objective function

$$J(p) \triangleq \int_0^{\infty} p(x)\lg\frac{1}{p(x)}\,dx
+ \lambda_0\left[\int_0^{\infty} p(x)\,dx - 1\right]
+ \lambda_1\left[\int_0^{\infty} x\,p(x)\,dx - \mu\right]. \qquad\text{(D.40)}$$

Proceeding as in the uniform case then yields the exponential distribution $p(x) = \frac{1}{\mu}e^{-x/\mu}$.
#### Gaussian Distribution

The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $x \in (-\infty,\infty)$ but having a finite mean $\mu$ and *finite variance* $\sigma^2$. Proceeding as before, we obtain the objective function

$$J(p) \triangleq \int_{-\infty}^{\infty} p(x)\lg\frac{1}{p(x)}\,dx
+ \lambda_0\left[\int_{-\infty}^{\infty} p(x)\,dx - 1\right]
+ \lambda_1\left[\int_{-\infty}^{\infty} x\,p(x)\,dx - \mu\right]
+ \lambda_2\left[\int_{-\infty}^{\infty} (x-\mu)^2 p(x)\,dx - \sigma^2\right], \qquad\text{(D.41)}$$

whose maximization leads to the Gaussian PDF of Eq.(D.28). For more on entropy and maximum-entropy distributions, see [48].

## Gaussian Moments

### Gaussian Mean

The *mean of a distribution* $p(x)$ is defined as its *first-order moment*:

$$\mu \triangleq \int_{-\infty}^{\infty} x\,p(x)\,dx. \qquad\text{(D.42)}$$

To show that the mean of the Gaussian distribution is $\mu$, we may write, letting $\tilde{x} \triangleq x - \mu$,

$$\int_{-\infty}^{\infty} x\,\frac{e^{-\frac{(x-\mu)^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,dx
= \int_{-\infty}^{\infty} (\tilde{x}+\mu)\,\frac{e^{-\frac{\tilde{x}^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,d\tilde{x}
= 0 + \mu = \mu,$$

since $\tilde{x}\,e^{-\tilde{x}^2/(2\sigma^2)}$ is an odd function of $\tilde{x}$.

### Gaussian Variance

The *variance of a distribution* $p(x)$ is defined as its *second central moment*:

$$\sigma^2 \triangleq \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx, \qquad\text{(D.43)}$$

where $\mu$ is the mean of $p(x)$. To show that the variance of the Gaussian distribution is $\sigma^2$, we write, letting $\tilde{x} \triangleq x - \mu$,

$$\int_{-\infty}^{\infty} \tilde{x}^2\,\frac{e^{-\frac{\tilde{x}^2}{2\sigma^2}}}{\sigma\sqrt{2\pi}}\,d\tilde{x} = \sigma^2,$$

which follows from integration by parts (with $u = \tilde{x}$ and $dv = \tilde{x}\,e^{-\tilde{x}^2/(2\sigma^2)}\,d\tilde{x}$) together with Eq.(D.9).

### Higher Order Moments Revisited

**Theorem:** The $n$th central moment of the Gaussian pdf $p(x)$ with mean $\mu$ and variance $\sigma^2$ is given by

$$E\left\{(x-\mu)^n\right\} =
\begin{cases}
(n-1)!!\,\sigma^n, & n\ \text{even} \\
0, & n\ \text{odd,}
\end{cases} \qquad\text{(D.44)}$$

where $(n-1)!!$ denotes the product of all odd integers up to and including $n-1$ (see ``*double-factorial* notation''). Thus, for example, $E\{(x-\mu)^2\} = \sigma^2$, $E\{(x-\mu)^4\} = 3\sigma^4$, $E\{(x-\mu)^6\} = 15\sigma^6$, and $E\{(x-\mu)^8\} = 105\sigma^8$.

*Proof:* The formula can be derived by successively differentiating the *moment-generating function* $E\{e^{t(x-\mu)}\} = e^{\sigma^2 t^2/2}$ with respect to $t$ and evaluating at $t=0$,^{D.4} or by differentiating the Gaussian integral

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}} \qquad\text{(D.45)}$$

successively with respect to $\alpha$ [203, p. 147-148]:

$$\int_{-\infty}^{\infty} x^{2k}\,e^{-\alpha x^2}\,dx = \frac{(2k-1)!!}{(2\alpha)^k}\sqrt{\frac{\pi}{\alpha}}, \qquad k = 1, 2, 3, \ldots \qquad\text{(D.46)}$$

Setting $\alpha = 1/(2\sigma^2)$ and normalizing establishes (D.44) for $\mu = 0$. Since the change of variable $\tilde{x} = x - \mu$ has no effect on the result, (D.44) is also derived for $\mu \neq 0$.
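The double-factorial pattern of Eq.(D.44) can be confirmed by numerical integration (a Python/NumPy sketch, added for illustration):

```python
import math
import numpy as np

mu, sigma = 1.5, 0.8
x = np.linspace(mu - 12*sigma, mu + 12*sigma, 400001)
dx = x[1] - x[0]
p = np.exp(-(x - mu)**2/(2*sigma**2)) / (sigma*math.sqrt(2*math.pi))

def double_factorial(m):
    """Product of every other positive integer down from m."""
    return math.prod(range(m, 0, -2))

for n in (2, 4, 6, 8):
    moment = ((x - mu)**n * p).sum() * dx        # numerical central moment
    closed = double_factorial(n - 1) * sigma**n  # (n-1)!! sigma^n
    print(n, moment, closed)  # pairs agree to high precision
```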

### Moment Theorem

**Theorem:** For a random variable $x$ with PDF $p(x)$,

$$E\left\{x^n\right\} = \frac{1}{j^n}\left.\frac{d^n}{d\nu^n}\Phi(\nu)\right|_{\nu=0}, \qquad\text{(D.47)}$$

where $\Phi(\nu)$ is the *characteristic function* of the PDF $p(x)$ of $x$:

$$\Phi(\nu) \triangleq E\left\{e^{j\nu x}\right\} = \int_{-\infty}^{\infty} p(x)\,e^{j\nu x}\,dx. \qquad\text{(D.48)}$$

(Note that $\Phi(\nu)$ is the *complex conjugate* of the *Fourier transform* of $p(x)$.)

*Proof:* [201, p. 157] Let $m_n$ denote the $n$th moment of $x$, *i.e.*,

$$m_n \triangleq E\left\{x^n\right\} = \int_{-\infty}^{\infty} x^n\,p(x)\,dx. \qquad\text{(D.49)}$$

Then

$$\Phi(\nu) = E\left\{e^{j\nu x}\right\} = E\left\{\sum_{n=0}^{\infty}\frac{(j\nu x)^n}{n!}\right\} = \sum_{n=0}^{\infty} m_n\,\frac{(j\nu)^n}{n!},$$

where the term-by-term expectation is valid provided all moments $m_n$ are *finite*. Differentiating $n$ times with respect to $\nu$ and setting $\nu = 0$ isolates $j^n m_n$, which establishes (D.47).

### Gaussian Characteristic Function

Since the Gaussian PDF is

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad\text{(D.50)}$$

and since the Fourier transform of $e^{-\alpha x^2}$ is (§D.8)

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,e^{-j\omega x}\,dx = \sqrt{\frac{\pi}{\alpha}}\, e^{-\frac{\omega^2}{4\alpha}}, \qquad\text{(D.51)}$$

it follows that the *Gaussian characteristic function* is

$$\Phi(\nu) = e^{j\mu\nu}\, e^{-\frac{\sigma^2\nu^2}{2}}. \qquad\text{(D.52)}$$

### Gaussian Central Moments

The characteristic function of a zero-mean Gaussian is

$$\Phi(\nu) = e^{-\frac{\sigma^2\nu^2}{2}}. \qquad\text{(D.53)}$$

Since a zero-mean Gaussian $p(x)$ is an even function of $x$ (*i.e.*, $p(-x) = p(x)$), all odd-order moments are zero. By the moment theorem, the even-order moments are

$$E\left\{x^n\right\} = (n-1)!!\,\sigma^n, \qquad n\ \text{even}. \qquad\text{(D.54)}$$

In particular, $E\{x^2\} = \sigma^2$ and $E\{x^4\} = 3\sigma^4$.

## A Sum of Gaussian Random Variables is a Gaussian Random Variable

A basic result from the theory of random variables is that when you *sum* two independent random variables, you *convolve* their probability density functions (PDFs). (Equivalently, in the frequency domain, their *characteristic functions multiply*.) That the sum of two independent Gaussian random variables is Gaussian then follows immediately from the fact that Gaussians are closed under convolution (equivalently, that their characteristic functions--themselves Gaussian--are closed under multiplication).
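A simulation makes the point concrete (a Python/NumPy sketch, added for illustration; seeded for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(1.0, 2.0, n)    # Gaussian, mean 1, std 2
y = rng.normal(-3.0, 1.5, n)   # independent Gaussian, mean -3, std 1.5
s = x + y

# The sum is Gaussian with mean 1 + (-3) = -2 and variance 2^2 + 1.5^2 = 6.25
print(s.mean(), s.var())  # close to -2 and 6.25
```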
