# Gaussian Function Properties

This appendix collects together various facts about the
fascinating *Gaussian* function--the classic ``bell curve'' that
arises repeatedly in science and mathematics. As already seen
in §B.17.1, only the Gaussian achieves the *minimum
time-bandwidth product* among all smooth (analytic) functions.

## Gaussian Window and Transform

The *Gaussian window* for FFT analysis was introduced in
§3.11, and complex Gaussians (``chirplets'') were utilized
in §10.6. For reference in support of these topics,
this appendix derives some additional properties of the Gaussian,
defined by

$$
f(t) \;\triangleq\; e^{-\frac{t^2}{2\sigma^2}} \tag{D.1}
$$

and discusses some interesting applications in spectral modeling (the subject of §10.4). The basic mathematics rederived here is well known (see, *e.g.*, [202,5]), while the application to spectral modeling of sound remains a topic under development.

## Gaussians Closed under Multiplication

Define

$$
f_i(t) \;\triangleq\; e^{-p_i(t-c_i)^2}, \qquad i = 1, 2,
$$

where $p_i$ and $c_i$ are arbitrary complex numbers. Then by direct calculation, we have

$$
f_1(t)\,f_2(t)
\;=\; e^{-p_1(t-c_1)^2}\,e^{-p_2(t-c_2)^2}
\;=\; e^{-(p_1+p_2)t^2 \,+\, 2(p_1c_1+p_2c_2)t \,-\, (p_1c_1^2+p_2c_2^2)}.
$$

Completing the square, we obtain

$$
f_1(t)\,f_2(t) \;=\; C\,e^{-p(t-c)^2} \tag{D.2}
$$

with

$$
p \;=\; p_1 + p_2, \qquad
c \;=\; \frac{p_1c_1 + p_2c_2}{p_1 + p_2}, \qquad
C \;=\; e^{-\frac{p_1p_2}{p_1+p_2}(c_1-c_2)^2}.
$$

Note that this result holds for *Gaussian-windowed chirps* ($p_i$ and $c_i$ complex).

### Product of Two Gaussian PDFs

For the special case of two Gaussian probability densities,

$$
f_i(t) \;=\; \frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-\frac{(t-\mu_i)^2}{2\sigma_i^2}}, \qquad i = 1, 2,
$$

the product density has mean and variance given by

$$
\mu \;=\; \frac{\sigma_2^2\,\mu_1 + \sigma_1^2\,\mu_2}{\sigma_1^2 + \sigma_2^2},
\qquad
\sigma^2 \;=\; \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\;=\; \frac{1}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}.
$$
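As a quick numerical sanity check, one can multiply two Gaussian PDFs on a dense grid, renormalize the product to unit area, and measure its mean and variance (a NumPy sketch, not from the text; all parameter values are arbitrary examples):

```python
import numpy as np

# Multiply two Gaussian PDFs on a grid, renormalize to unit area, and
# compare the result's mean/variance to the closed-form predictions.
# All parameter values are arbitrary examples.
mu1, var1 = 1.0, 2.0
mu2, var2 = -0.5, 0.5

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def gauss(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

prod = gauss(x, mu1, var1) * gauss(x, mu2, var2)
prod /= np.sum(prod) * dx                          # renormalize to unit area

mu_p = (var2 * mu1 + var1 * mu2) / (var1 + var2)   # predicted mean
var_p = var1 * var2 / (var1 + var2)                # predicted variance

mean_est = np.sum(x * prod) * dx
var_est = np.sum((x - mean_est)**2 * prod) * dx
```

Note that the product of two PDFs is not itself a PDF until renormalized; the renormalization constant absorbs the scale factor $C$ from the general product formula.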

## Gaussians Closed under Convolution

We show in §D.8 that the Fourier transform of a Gaussian is Gaussian, and in §D.2 that the product of any two Gaussians is Gaussian. It therefore follows from the *convolution theorem* for Fourier transforms (§B.7) that the *convolution of any two Gaussians is Gaussian*.
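Closure under convolution is easy to verify numerically: convolving two Gaussian PDFs on a grid reproduces a Gaussian whose mean and variance are the sums of the inputs' (a NumPy sketch with arbitrary example parameters, not from the text):

```python
import numpy as np

# Convolve two Gaussian PDFs numerically; the result should again be
# Gaussian, with the means and variances adding.  Parameters arbitrary.
dx = 0.01
x = np.arange(-10.0, 10.0, dx)

def gauss(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

g1 = gauss(x, 0.5, 1.0)
g2 = gauss(x, -1.0, 2.0)

conv = np.convolve(g1, g2) * dx            # Riemann-sum convolution
xc = 2 * x[0] + np.arange(len(conv)) * dx  # support of the full convolution

expected = gauss(xc, 0.5 + (-1.0), 1.0 + 2.0)
err = np.max(np.abs(conv - expected))
```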

## Fitting a Gaussian to Data

When fitting a single Gaussian to data, one can take the log of the data and fit a parabola. In matlab, this can be carried out as in the following example:

```matlab
x = -1:0.1:1;
sigma = 0.01;
y = exp(-x.*x) + sigma*randn(size(x)); % test data
[p,s] = polyfit(x,log(y),2);           % fit parabola to log
yh = exp(polyval(p,x));                % data model
norm(y-yh)                             % ans = 1.9230e-16 when sigma=0
plot(abs([y',yh']));
```

In practice, it is good to avoid zeros in the data. For example, one can fit only to the middle third or so of a measured peak, restricting consideration to measured samples that are positive and ``look Gaussian'' to a reasonable extent.

## Infinite Flatness at Infinity

The Gaussian is *infinitely flat* at infinity. Equivalently, the
Maclaurin expansion (Taylor expansion about $x=0$) of

$$
f(x) \;\triangleq\;
\begin{cases}
e^{-1/x^2}, & x \neq 0,\\
0, & x = 0,
\end{cases}
\tag{D.3}
$$

is *zero* for all orders. Thus, even though $f(x)$ is differentiable of all orders at $x=0$, its series expansion fails to approach the function. This happens because $f(x)$ has an *essential singularity* at $x=0$ (also called a ``non-removable singularity''). One can think of an essential singularity as an infinite number of *poles* piled up at the same point ($1/x^n \to \infty$ as $x\to 0$, for every $n$). Equivalently, $f(x)$ above has an infinite number of zeros at $x=0$, leading to the problem with the Maclaurin series expansion. To prove this, one can show

$$
\lim_{x\to 0}\frac{e^{-1/x^2}}{x^n} \;=\; 0 \tag{D.4}
$$

for all $n \geq 0$. This follows from the fact that exponential growth or decay is faster than polynomial growth or decay. An exponential can in fact be viewed as an infinite-order polynomial, since

$$
e^x \;=\; \sum_{n=0}^{\infty}\frac{x^n}{n!}. \tag{D.5}
$$
We may call $f(x) = e^{-1/x^2}$ *infinitely flat* at $x=0$ in the ``Padé sense'':

- Padé approximation is *maximally flat* approximation, and seeks to use all degrees of freedom in the approximation to match the leading terms of the Taylor series expansion.
- Butterworth filters (IIR) are maximally flat at dc [263].
- Lagrange interpolation (FIR) is maximally flat at dc [266].
- Thiran allpass interpolation has maximally flat group delay at dc [266].

Another interesting mathematical property of essential singularities is
that near an *essential singular point* $z_0$, the inequality

$$
\left|f(z) - w\right| \;<\; \epsilon \tag{D.6}
$$

is satisfied for any complex number $w$ and any $\epsilon > 0$ at some point $z$ in *every neighborhood* of $z_0$, however small. In other words, $f(z)$ comes arbitrarily close to *every possible value* in *any neighborhood* about an essential singular point. This was first proved by Weierstrass [42, p. 270].

## Integral of a Complex Gaussian

**Theorem: **

$$
\int_{-\infty}^{\infty} e^{-pt^2}\,dt \;=\; \sqrt{\frac{\pi}{p}},
\qquad \mbox{re}\{p\} > 0 \tag{D.7}
$$

*Proof: *Let $I$ denote the integral. Then

$$
I^2 \;=\; \int_{-\infty}^{\infty} e^{-px^2}\,dx \int_{-\infty}^{\infty} e^{-py^2}\,dy
\;=\; \int_{0}^{2\pi}\!\!\int_{0}^{\infty} e^{-pr^2}\,r\,dr\,d\theta
\;=\; 2\pi\left[-\frac{e^{-pr^2}}{2p}\right]_{0}^{\infty}
\;=\; \frac{\pi}{p},
$$

where we needed **re**$\{p\} > 0$ to have $e^{-pr^2}\to 0$ as $r\to\infty$. Thus,

$$
I \;=\; \sqrt{\frac{\pi}{p}} \tag{D.8}
$$

as claimed.
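The theorem can be spot-checked numerically for a particular complex $p$ (a NumPy sketch, not from the text; the value of $p$ is an arbitrary example):

```python
import numpy as np

# Numerically verify  integral of exp(-p t^2) dt = sqrt(pi/p)  for a
# complex p with re{p} > 0.  The value of p is an arbitrary example.
p = 1.0 + 2.0j
dt = 0.001
t = np.arange(-20.0, 20.0, dt)
I = np.sum(np.exp(-p * t * t)) * dt      # Riemann-sum approximation
err = abs(I - np.sqrt(np.pi / p))        # principal square root
```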

### Area Under a Real Gaussian

**Corollary: **
Setting $p = 1/(2\sigma^2)$ in the previous theorem, where $\sigma > 0$ is real, we have

$$
\int_{-\infty}^{\infty} e^{-\frac{t^2}{2\sigma^2}}\,dt \;=\; \sigma\sqrt{2\pi}. \tag{D.9}
$$

Therefore, we may normalize the Gaussian to *unit area* by defining

$$
f(t) \;\triangleq\; \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{t^2}{2\sigma^2}}. \tag{D.10}
$$

Since

$$
f(t) \;\geq\; 0 \quad \forall t \qquad \mbox{and} \qquad \int_{-\infty}^{\infty} f(t)\,dt \;=\; 1, \tag{D.11}
$$

it satisfies the requirements of a *probability density* function.

## Gaussian Integral with Complex Offset

**Theorem: **

$$
\int_{-\infty}^{\infty} e^{-p(t+c)^2}\,dt \;=\; \sqrt{\frac{\pi}{p}},
\qquad \mbox{re}\{p\} > 0, \quad c \in {\bf C} \tag{D.12}
$$

*Proof: *
When $\mbox{im}\{c\} = 0$, we have the previously proved case (a real shift of the integration variable does not change the integral). For arbitrary $c$ and real number $R$, let $\Gamma_R$ denote the closed rectangular contour with corners $(-R,0)$, $(R,0)$, $(R,y_c)$, and $(-R,y_c)$, where $y_c \triangleq \mbox{im}\{c\}$, depicted in Fig.D.1.

Clearly, $e^{-pz^2}$ is analytic inside the region bounded
by $\Gamma_R$. By Cauchy's theorem [42],
the line integral of $e^{-pz^2}$ along $\Gamma_R$ is zero, *i.e.*,

$$
\oint_{\Gamma_R} e^{-pz^2}\,dz \;=\; 0. \tag{D.13}
$$

This line integral breaks into the following four pieces:

$$
\underbrace{\int_{-R}^{R} e^{-px^2}\,dx}_{1}
\;+\; \underbrace{\int_{0}^{y_c} e^{-p(R+jy)^2}\,j\,dy}_{2}
\;+\; \underbrace{\int_{R}^{-R} e^{-p(x+jy_c)^2}\,dx}_{3}
\;+\; \underbrace{\int_{y_c}^{0} e^{-p(-R+jy)^2}\,j\,dy}_{4}
$$

where $x$ and $y$ are real variables. In the limit as $R\to\infty$,
the first piece approaches $\sqrt{\pi/p}$, as previously proved.
Pieces $2$ and $4$ contribute zero in the limit, since
$e^{-p(\pm R + jy)^2} \to 0$ as $R\to\infty$ when $\mbox{re}\{p\} > 0$. Since the total contour integral is
zero by Cauchy's theorem, we conclude that piece 3 is the negative of
piece 1, *i.e.*, in the limit as $R\to\infty$,

$$
\int_{-\infty}^{\infty} e^{-p(x+jy_c)^2}\,dx \;=\; \sqrt{\frac{\pi}{p}}. \tag{D.14}
$$

Making the change of variable $x = t + \mbox{re}\{c\}$, we obtain

$$
\int_{-\infty}^{\infty} e^{-p(t+c)^2}\,dt \;=\; \sqrt{\frac{\pi}{p}} \tag{D.15}
$$

as desired.
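Again this can be spot-checked numerically for particular complex values of $p$ and $c$ (a NumPy sketch with arbitrary example values, not from the text):

```python
import numpy as np

# Check that a complex offset c leaves the Gaussian integral unchanged:
# integral of exp(-p (t+c)^2) dt = sqrt(pi/p).  p and c are arbitrary
# example values with re{p} > 0.
p = 0.5 + 1.0j
c = 1.5 - 0.5j
dt = 0.001
t = np.arange(-25.0, 25.0, dt)
I = np.sum(np.exp(-p * (t + c)**2)) * dt
err = abs(I - np.sqrt(np.pi / p))
```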

## Fourier Transform of Complex Gaussian

**Theorem: **

$$
e^{-pt^2} \;\longleftrightarrow\; \sqrt{\frac{\pi}{p}}\;e^{-\frac{\omega^2}{4p}},
\qquad \mbox{re}\{p\} > 0 \tag{D.16}
$$

*Proof: *[202, p. 211]
The Fourier transform of $e^{-pt^2}$ is defined as

$$
F(\omega) \;\triangleq\; \int_{-\infty}^{\infty} e^{-pt^2} e^{-j\omega t}\,dt. \tag{D.17}
$$

Completing the square of the exponent gives

$$
-pt^2 - j\omega t \;=\; -p\left(t + \frac{j\omega}{2p}\right)^2 - \frac{\omega^2}{4p}.
$$

Thus, the Fourier transform can be written as

$$
F(\omega) \;=\; e^{-\frac{\omega^2}{4p}}\int_{-\infty}^{\infty} e^{-p\left(t + \frac{j\omega}{2p}\right)^2}dt
\;=\; \sqrt{\frac{\pi}{p}}\;e^{-\frac{\omega^2}{4p}} \tag{D.18}
$$

using our previous result (§D.6, with complex offset $c = j\omega/(2p)$).
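The closed form can be compared against a brute-force numerical Fourier transform (a NumPy sketch, not from the text; $p$ and the test frequencies are arbitrary examples):

```python
import numpy as np

# Compare a brute-force numerical Fourier transform of exp(-p t^2)
# with the closed form sqrt(pi/p) * exp(-w^2 / (4 p)).
p = 2.0 + 1.0j                       # re{p} > 0 (arbitrary example)
dt = 0.001
t = np.arange(-20.0, 20.0, dt)
g = np.exp(-p * t * t)

errs = []
for w in (0.0, 1.0, 3.5):            # arbitrary test frequencies
    Fw = np.sum(g * np.exp(-1j * w * t)) * dt     # Riemann-sum FT
    ref = np.sqrt(np.pi / p) * np.exp(-w * w / (4 * p))
    errs.append(abs(Fw - ref))
```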

### Alternate Proof

The Fourier transform of a complex Gaussian can also be derived using the
differentiation theorem and its dual (§B.2).^{D.1}

*Proof: *Let

$$
g(t) \;\triangleq\; e^{-pt^2} \;\longleftrightarrow\; G(\omega). \tag{D.19}
$$

Then by the differentiation theorem (§B.2),

$$
g^\prime(t) \;\longleftrightarrow\; j\omega\,G(\omega). \tag{D.20}
$$

By the differentiation theorem dual (§B.3),

$$
-jt\,g(t) \;\longleftrightarrow\; G^\prime(\omega),
\quad \mbox{i.e.,} \quad
t\,g(t) \;\longleftrightarrow\; j\,G^\prime(\omega). \tag{D.21}
$$

Differentiating $g(t)$ gives

$$
g^\prime(t) \;=\; -2pt\,e^{-pt^2} \;=\; -2p\,[t\,g(t)]. \tag{D.22}
$$

Therefore, taking the Fourier transform of both sides,

$$
j\omega\,G(\omega) \;=\; -2p\,j\,G^\prime(\omega), \tag{D.23}
$$

or

$$
\frac{G^\prime(\omega)}{G(\omega)} \;=\; -\frac{\omega}{2p}. \tag{D.24}
$$

Integrating both sides with respect to $\omega$ yields

$$
\ln G(\omega) \;=\; -\frac{\omega^2}{4p} + \ln G(0). \tag{D.25}
$$

In §D.7, we found that $G(0) = \int_{-\infty}^{\infty} e^{-pt^2}\,dt = \sqrt{\pi/p}$, so that, finally, exponentiating gives

$$
G(\omega) \;=\; \sqrt{\frac{\pi}{p}}\;e^{-\frac{\omega^2}{4p}}, \tag{D.26}
$$

as expected.

The Fourier transform of complex Gaussians (``chirplets'') is used in §10.6 to analyze Gaussian-windowed ``chirps'' in the frequency domain.

## Why Gaussian?

This section lists some of the points of origin for the Gaussian function in mathematics and physics.

### Central Limit Theorem

The central limit theorem^{D.2} states that many iterated convolutions of
any ``sufficiently regular'' shape will approach a Gaussian function.

### Iterated Convolutions

Any ``reasonable'' *probability density function* (PDF) (§C.1.3)
has a Fourier transform that looks like $1 - \frac{\sigma^2\omega^2}{2}$ near its tip ($\omega = 0$). Iterating $N$
convolutions then corresponds to $\left(1 - \frac{\sigma^2\omega^2}{2}\right)^N$, which becomes
[2]

$$
\left(1 - \frac{\sigma^2\omega^2}{2}\right)^{N} \;\approx\; e^{-\frac{N\sigma^2\omega^2}{2}} \tag{D.27}
$$

for large $N$, by the definition of $e$ [264]. This proves that the $N$th power of $1-\frac{\sigma^2\omega^2}{2}$ approaches the Gaussian function defined in §D.1 for large $N$.

Since the inverse Fourier transform of a Gaussian is another Gaussian (§D.8), we can define a time-domain function as being ``sufficiently regular'' when its Fourier transform approaches $1 - \frac{\sigma^2\omega^2}{2}$ in a sufficiently small neighborhood of $\omega = 0$. That is, the Fourier transform simply needs a ``sufficiently smooth peak'' at $\omega = 0$ that can be expanded into a convergent Taylor series. This obviously holds for the DTFT of any discrete-time window function (the subject of Chapter 3), because the window transform is a finite sum of continuous cosines of the form $\cos(\omega nT)$ in the zero-phase case, and complex exponentials of the form $e^{-j\omega nT}$ in the causal case, each of which is differentiable any number of times in $\omega$.
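The convergence of iterated convolutions toward a Gaussian is easy to demonstrate numerically. The following NumPy sketch (not from the text; the pulse shape, grid, and $N$ are arbitrary example choices) convolves a unit-area rectangular pulse with itself repeatedly and compares the result against a Gaussian with the empirically measured mean and variance:

```python
import numpy as np

# Convolve a unit-area rectangular pulse with itself N-1 times; the
# result (the PDF of a sum of N uniform random variables) approaches a
# Gaussian.  N and the grid are arbitrary example choices.
dx = 0.01
pulse = np.ones(100) / (100 * dx)      # unit-area rectangle on [0, 1)
N = 12
pdf = pulse.copy()
for _ in range(N - 1):
    pdf = np.convolve(pdf, pulse) * dx

x = np.arange(len(pdf)) * dx           # support of the iterated convolution
mu = np.sum(x * pdf) * dx              # empirical mean
var = np.sum((x - mu)**2 * pdf) * dx   # empirical variance
gauss = np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
err = np.max(np.abs(pdf - gauss))
```

With $N = 12$, the pointwise deviation from the Gaussian fit is already below one percent of the peak height.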

### Binomial Distribution

The $n$th row of Pascal's triangle (the binomial distribution)
approaches a sampled Gaussian function as $n$
increases.^{D.3} Since Lagrange
interpolation (elementary polynomial interpolation) is equal to
binomially windowed sinc interpolation
[301,134], it follows that Lagrange
interpolation approaches Gaussian-windowed sinc interpolation at high
orders.
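The binomial-to-Gaussian convergence can be checked directly (a Python sketch, not from the text; the row index is an arbitrary example): row $n$ of Pascal's triangle, normalized by $2^n$, closely matches a Gaussian sampled at the integers with mean $n/2$ and variance $n/4$:

```python
import math

# Row n of Pascal's triangle, normalized by 2^n, versus a Gaussian
# sampled at the integers, with matching mean n/2 and variance n/4.
n = 100
mu, var = n / 2, n / 4
pmf = [math.comb(n, k) / 2.0**n for k in range(n + 1)]
gauss = [math.exp(-(k - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)
         for k in range(n + 1)]
err = max(abs(p - g) for p, g in zip(pmf, gauss))
```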

## Gaussian Probability Density Function

Any non-negative function which integrates to 1 (unit total area) is suitable for use as a probability density function (PDF) (§C.1.3). The most general Gaussian PDF is given by shifts of the normalized Gaussian:

$$
f(x) \;\triangleq\; \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{D.28}
$$

The parameter $\mu$ is the *mean*, and $\sigma^2$ is the *variance* of the distribution (we'll show this in §D.12 below).

## Maximum Entropy Property of the Gaussian Distribution

### Entropy of a Probability Distribution

The *entropy* of a probability density function (PDF) $p(x)$ is
defined as [48]

$$
h(p) \;\triangleq\; \int_{-\infty}^{\infty} p(x)\,\lg\frac{1}{p(x)}\,dx, \tag{D.29}
$$

where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:

$$
h(p) \;=\; {\cal E}_p\left\{\lg\frac{1}{p(x)}\right\} \tag{D.30}
$$

The term $\lg\frac{1}{p(x)}$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)

### Example: Random Bit String

Consider a random sequence of 1s and 0s, *i.e.*, the probability of a 0 or
1 is always $1/2$. The corresponding probability density function
is

$$
p(x) \;=\; \frac{1}{2}\,\delta(x) + \frac{1}{2}\,\delta(x-1), \tag{D.31}
$$

and the entropy is

$$
h(p) \;=\; \frac{1}{2}\lg 2 + \frac{1}{2}\lg 2 \;=\; 1. \tag{D.32}
$$

Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.

If instead the probability of a 0 is $1/4$ and that of a 1 is $3/4$, we get

$$
h(p) \;=\; \frac{1}{4}\lg 4 + \frac{3}{4}\lg\frac{4}{3} \;\approx\; 0.81,
$$

and the sequence can be compressed about $19\%$.

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get

$$
h(p) \;=\; 0\cdot\lg\frac{1}{0} + 1\cdot\lg 1 \;=\; 0,
$$

interpreting $0\cdot\lg(1/0)$ as $\lim_{q\to 0^+} q\,\lg(1/q) = 0$.

Thus, the entropy is 0 when the sequence is perfectly predictable.
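The three cases above can be reproduced with a few lines of Python (a sketch, not from the text; `bit_entropy` is a hypothetical helper name):

```python
import math

def bit_entropy(q):
    """Entropy in bits/symbol of a binary source with P(0)=q, P(1)=1-q.
    (Hypothetical helper, not from the text.)"""
    h = 0.0
    for prob in (q, 1.0 - q):
        if prob > 0.0:                 # 0 * lg(1/0) is taken as 0
            h -= prob * math.log2(prob)
    return h

h_fair = bit_entropy(0.5)     # 1 bit per symbol: incompressible
h_biased = bit_entropy(0.25)  # ~0.81 bits per symbol: ~19% compressible
h_fixed = bit_entropy(0.0)    # 0 bits: perfectly predictable
```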

### Maximum Entropy Distributions

#### Uniform Distribution

Among probability distributions $p(x)$ which are nonzero over a
*finite* range of values $x\in[a,b]$, the maximum-entropy
distribution is the *uniform* distribution. To show this, we
must maximize the entropy,

$$
h(p) \;\triangleq\; \int_a^b p(x)\,\lg\frac{1}{p(x)}\,dx, \tag{D.33}
$$

with respect to $p(x)$, subject to the constraints

$$
p(x) \;\geq\; 0, \qquad \int_a^b p(x)\,dx \;=\; 1.
$$

Using the method of *Lagrange multipliers* for optimization in
the presence of constraints [86], we may form the
*objective function*

$$
J(p) \;\triangleq\; -\int_a^b p(x)\,\ln p(x)\,dx \;+\; \lambda_0\left(\int_a^b p(x)\,dx - 1\right) \tag{D.34}
$$

and differentiate with respect to $p(x)$ (and renormalize by dropping the constant factor $\lg e$ multiplying all terms, which converts $\lg$ to $\ln$) to obtain

$$
\frac{\partial J}{\partial p(x)} \;=\; -\ln p(x) - 1 + \lambda_0. \tag{D.35}
$$

Setting this to zero and solving for $p(x)$ gives

$$
p(x) \;=\; e^{\lambda_0 - 1}. \tag{D.36}
$$

(Setting the partial derivative with respect to $\lambda_0$ to zero merely restates the unit-area constraint.)

Choosing $\lambda_0$ to satisfy the constraint gives $\lambda_0 = 1 - \ln(b-a)$, yielding

$$
p(x) \;=\; \frac{1}{b-a}, \qquad x\in[a,b]. \tag{D.37}
$$

That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $x$:

$$
\frac{\partial^2 J}{\partial p(x)^2} \;=\; -\frac{1}{p(x)} \;<\; 0. \tag{D.38}
$$

Since the solution spontaneously satisfied $p(x) \geq 0$, it is a maximum.

#### Exponential Distribution

Among probability distributions $p(x)$ which are nonzero over a
*semi-infinite* range of values $x\in[0,\infty)$ and having a finite
mean $\mu$, the *exponential* distribution has maximum entropy.

To the previous case, we add the new constraint

$$
\int_0^{\infty} x\,p(x)\,dx \;=\; \mu \;<\; \infty, \tag{D.39}
$$

resulting in the objective function

$$
J(p) \;=\; -\int_0^{\infty} p(x)\,\ln p(x)\,dx
\;+\; \lambda_0\left(\int_0^{\infty} p(x)\,dx - 1\right)
\;+\; \lambda_1\left(\int_0^{\infty} x\,p(x)\,dx - \mu\right).
$$

Now the partial derivatives with respect to $p(x)$ are

$$
\frac{\partial J}{\partial p(x)} \;=\; -\ln p(x) - 1 + \lambda_0 + \lambda_1 x,
$$

and setting them to zero shows that $p(x)$ is of the form $p(x) = e^{(\lambda_0-1)}e^{\lambda_1 x}$. The unit-area and finite-mean constraints result in $e^{(\lambda_0-1)} = 1/\mu$ and $\lambda_1 = -1/\mu$, yielding

$$
p(x) \;=\; \frac{1}{\mu}\,e^{-\frac{x}{\mu}}, \qquad x \geq 0. \tag{D.40}
$$

#### Gaussian Distribution

The Gaussian distribution has maximum entropy relative to all
probability distributions covering the entire real line
$x\in(-\infty,\infty)$ but having a finite mean $\mu$ and *finite
variance* $\sigma^2$.

Proceeding as before, we obtain the objective function

$$
J(p) \;=\; -\int_{-\infty}^{\infty} p(x)\,\ln p(x)\,dx
\;+\; \lambda_0\left(\int_{-\infty}^{\infty} p(x)\,dx - 1\right)
\;+\; \lambda_1\left(\int_{-\infty}^{\infty} x\,p(x)\,dx - \mu\right)
\;+\; \lambda_2\left(\int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx - \sigma^2\right)
$$

and partial derivatives

$$
\frac{\partial J}{\partial p(x)} \;=\; -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2(x-\mu)^2,
$$

leading, after setting them to zero and solving for the $\lambda_i$ from the three constraints, to

$$
p(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{D.41}
$$
For more on entropy and maximum-entropy distributions, see [48].
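As a numerical illustration of the Gaussian's maximum-entropy property (a NumPy sketch, not from the text), one can compare the differential entropy of a Gaussian against another distribution with the same variance; the Laplacian competitor here is an arbitrary choice for illustration:

```python
import numpy as np

# Differential entropy (bits) of a Gaussian versus a Laplacian with the
# same variance.  The Laplacian is an arbitrary competitor chosen for
# illustration; the Gaussian should have the larger entropy.
dx = 0.001
x = np.arange(-40.0, 40.0, dx)
var = 2.0

gauss = np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
b = np.sqrt(var / 2.0)               # Laplacian scale: variance 2b^2 = var
lap = np.exp(-np.abs(x) / b) / (2 * b)

def entropy_bits(p):
    p = p[p > 0]                     # ignore underflowed-to-zero tails
    return -np.sum(p * np.log2(p)) * dx

h_gauss = entropy_bits(gauss)
h_lap = entropy_bits(lap)
```

The numerically computed Gaussian entropy also matches the closed form $\frac{1}{2}\lg(2\pi e\sigma^2)$ bits.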

## Gaussian Moments

### Gaussian Mean

The *mean of a distribution* $p(x)$ is defined as its
*first-order moment*:

$$
\mu \;\triangleq\; \int_{-\infty}^{\infty} x\,p(x)\,dx. \tag{D.42}
$$

To show that the mean of the Gaussian distribution (D.28) is $\mu$, we may write, letting $y \triangleq x - \mu$,

$$
\int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx
\;=\; \int_{-\infty}^{\infty} (y+\mu)\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2}{2\sigma^2}}\,dy
\;=\; 0 + \mu \;=\; \mu,
$$

since $\int_{-\infty}^{\infty} y\,e^{-\frac{y^2}{2\sigma^2}}\,dy = 0$ (odd integrand) and the normalized Gaussian has unit area.

### Gaussian Variance

The
*variance of a distribution* $p(x)$
is defined as its
*second central moment*:

$$
\sigma^2 \;\triangleq\; \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx, \tag{D.43}
$$

where $\mu$ is the mean of $p(x)$.

To show that the variance of the Gaussian distribution is $\sigma^2$, we write, letting $y \triangleq x - \mu$,

$$
\int_{-\infty}^{\infty} y^2\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2}{2\sigma^2}}\,dy
\;=\; \frac{1}{\sigma\sqrt{2\pi}}\left(\left[-\sigma^2\,y\,e^{-\frac{y^2}{2\sigma^2}}\right]_{-\infty}^{\infty}
+ \sigma^2\int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma^2}}\,dy\right)
\;=\; \sigma^2,
$$

where we used integration by parts ($u = y$, $dv = y\,e^{-y^2/(2\sigma^2)}\,dy$) and the fact that $y\,e^{-y^2/(2\sigma^2)} \to 0$ as $|y|\to\infty$.
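Both moment calculations can be confirmed by direct numerical integration (a NumPy sketch, not from the text; the values of $\mu$ and $\sigma$ are arbitrary examples):

```python
import numpy as np

# Confirm that mu and sigma^2 are the mean and variance of the Gaussian
# PDF by direct numerical integration.  Parameter values are arbitrary.
mu, sigma = 1.5, 0.8
dx = 0.001
x = np.arange(mu - 10 * sigma, mu + 10 * sigma, dx)
p = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

mean_est = np.sum(x * p) * dx
var_est = np.sum((x - mean_est)**2 * p) * dx
```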

### Higher Order Moments Revisited

**Theorem: **
The $n$th central moment of the Gaussian PDF $p(x)$
with mean $\mu$ and variance $\sigma^2$
is given by

$$
\int_{-\infty}^{\infty} (x-\mu)^n\,p(x)\,dx \;=\;
\begin{cases}
(n-1)!!\;\sigma^n, & n \mbox{ even},\\
0, & n \mbox{ odd},
\end{cases}
\tag{D.44}
$$

where $(n-1)!!$ denotes the product of all odd integers up to and including $n-1$ (see ``*double-factorial* notation''). Thus, for example, the central moments for $n = 2, 4, 6, 8$ are $\sigma^2$, $3\sigma^4$, $15\sigma^6$, and $105\sigma^8$.

*Proof: *
The formula can be derived by successively differentiating the
*moment-generating function* ${\cal E}\{e^{\beta(x-\mu)}\} = e^{\sigma^2\beta^2/2}$
with respect to $\beta$
and evaluating at $\beta = 0$,^{D.4} or by differentiating the
Gaussian integral

$$
\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx \;=\; \sqrt{\frac{\pi}{\alpha}} \tag{D.45}
$$

successively with respect to $\alpha$ [203, p. 147-148]:

$$
\int_{-\infty}^{\infty} x^{2k}\,e^{-\alpha x^2}\,dx \;=\; \frac{(2k-1)!!}{(2\alpha)^k}\,\sqrt{\frac{\pi}{\alpha}}
$$

for $k = 1, 2, 3, \ldots\,$. Setting $\alpha = 1/(2\sigma^2)$ and $\mu = 0$, and dividing both sides by $\sigma\sqrt{2\pi}$, yields

$$
\int_{-\infty}^{\infty} x^{2k}\,p(x)\,dx \;=\; (2k-1)!!\;\sigma^{2k} \tag{D.46}
$$

for $k = 1, 2, 3, \ldots\,$. Since the change of variable $x \to x - \mu$ has no effect on the result, (D.44) is also derived for $\mu \neq 0$. (The odd-order central moments vanish because their integrands are odd functions.)
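The double-factorial pattern can be verified by direct numerical integration of the even central moments (a NumPy sketch, not from the text; $\sigma$ is an arbitrary example):

```python
import numpy as np

# Verify the even central moments (n-1)!! * sigma^n of the Gaussian PDF
# by direct numerical integration.  sigma is an arbitrary example.
sigma = 1.3
dx = 0.0005
x = np.arange(-12 * sigma, 12 * sigma, dx)
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def double_factorial(m):
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

rel_errs = []
for n in (2, 4, 6, 8):
    moment = np.sum(x**n * p) * dx
    exact = double_factorial(n - 1) * sigma**n
    rel_errs.append(abs(moment - exact) / exact)
```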

### Moment Theorem

**Theorem: **
For a random variable $x$ with moments $m_n \triangleq {\cal E}\{x^n\}$,

$$
m_n \;=\; \frac{1}{j^n}\,\Phi^{(n)}(0), \tag{D.47}
$$

where $\Phi(\omega)$ is the
*characteristic function* of the PDF $p(x)$ of $x$:

$$
\Phi(\omega) \;\triangleq\; {\cal E}\{e^{j\omega x}\} \;=\; \int_{-\infty}^{\infty} p(x)\,e^{j\omega x}\,dx. \tag{D.48}
$$

(Note that $\Phi(\omega)$ is the *complex conjugate* of the *Fourier transform* of $p(x)$.)

*Proof: *[201, p. 157]
Let $m_n$ denote the $n$th moment of $x$, *i.e.*,

$$
m_n \;\triangleq\; {\cal E}\{x^n\} \;=\; \int_{-\infty}^{\infty} x^n\,p(x)\,dx. \tag{D.49}
$$

Then

$$
\Phi(\omega) \;=\; \int_{-\infty}^{\infty} p(x)\,e^{j\omega x}\,dx
\;=\; \int_{-\infty}^{\infty} p(x)\sum_{n=0}^{\infty}\frac{(j\omega x)^n}{n!}\,dx
\;=\; \sum_{n=0}^{\infty}\frac{(j\omega)^n}{n!}\,m_n,
$$

so that $\Phi^{(n)}(0) = j^n\,m_n$, where the term-by-term integration is valid when all moments $m_n$ are *finite*.

### Gaussian Characteristic Function

Since the Gaussian PDF is

$$
p(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{D.50}
$$

and since the Fourier transform of $e^{-\frac{x^2}{2\sigma^2}}$ is (by §D.8, with $p = 1/(2\sigma^2)$)

$$
\int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,e^{-j\omega x}\,dx \;=\; \sigma\sqrt{2\pi}\;e^{-\frac{\sigma^2\omega^2}{2}}, \tag{D.51}
$$

it follows (applying the shift theorem and conjugating) that the
*Gaussian characteristic function* is

$$
\Phi(\omega) \;=\; e^{j\mu\omega}\,e^{-\frac{\sigma^2\omega^2}{2}}. \tag{D.52}
$$
### Gaussian Central Moments

The characteristic function of a zero-mean Gaussian is

$$
\Phi(\omega) \;=\; e^{-\frac{\sigma^2\omega^2}{2}}. \tag{D.53}
$$

Since a zero-mean Gaussian $p(x)$ is an even function of $x$ (*i.e.*, $p(-x) = p(x)$), all odd-order moments are zero. By the moment theorem, the even-order moments are

$$
m_{2k} \;=\; \frac{1}{j^{2k}}\,\Phi^{(2k)}(0) \;=\; (2k-1)!!\;\sigma^{2k}. \tag{D.54}
$$

In particular,

$$
m_2 \;=\; \sigma^2, \qquad m_4 \;=\; 3\sigma^4.
$$

Since the mean is $m_1 = 0$ and the variance is $m_2 - m_1^2 = \sigma^2$, we see that the parameters of the zero-mean Gaussian are indeed its mean and variance, as expected.

## A Sum of Gaussian Random Variables is a Gaussian Random Variable

A basic result from the theory of random variables is that when you
*sum* two independent random variables, you *convolve* their
probability density functions (PDF). (Equivalently, in the frequency
domain, their *characteristic functions multiply*.)

That the sum of two independent Gaussian random variables is itself a Gaussian random variable thus follows immediately from the fact that Gaussians are closed under multiplication (§D.2), or equivalently, from the fact that Gaussians are closed under convolution (§D.4).
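A quick sample-based illustration (a NumPy sketch, not from the text; all parameter values are arbitrary examples): summing two independent Gaussian draws gives data whose means and variances add, and whose standardized fourth moment matches the Gaussian value of 3:

```python
import numpy as np

# Sample-based check: the sum of two independent Gaussian random
# variables has Gaussian statistics, with means and variances adding.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(loc=1.0, scale=2.0, size=n)   # variance 4
y = rng.normal(loc=-3.0, scale=3.0, size=n)  # variance 9
s = x + y

mean_s = s.mean()          # should be near 1 + (-3) = -2
var_s = s.var()            # should be near 4 + 9 = 13
# standardized fourth moment (kurtosis) of a Gaussian is 3
kurt_s = np.mean(((s - mean_s) / np.sqrt(var_s))**4)
```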
