Gaussian Function Properties
This appendix collects together various facts about the fascinating Gaussian function--the classic ``bell curve'' that arises repeatedly in science and mathematics. As already seen in §B.17.1, only the Gaussian achieves the minimum time-bandwidth product among all smooth (analytic) functions.
Gaussian Window and Transform
The Gaussian window for FFT analysis was introduced in §3.11, and complex Gaussians (``chirplets'') were utilized in §10.6. For reference in support of these topics, this appendix derives some additional properties of the Gaussian, defined by
\[
  f(t) \triangleq e^{-p t^2}, \qquad \mbox{re}\{p\} > 0,
\tag{D.1}\]
and discusses some interesting applications in spectral modeling (the subject of §10.4). The basic mathematics rederived here is well known (see, e.g., [202,5]), while the application to spectral modeling of sound remains a topic under development.
Gaussians Closed under Multiplication
Define
\[
  g_1(t) \triangleq e^{-p_1(t-c_1)^2}, \qquad
  g_2(t) \triangleq e^{-p_2(t-c_2)^2},
\]
where $p_1$, $p_2$, $c_1$, $c_2$ are arbitrary complex numbers. Then by direct calculation, we have
\[
  g_1(t)\,g_2(t)
  = e^{-p_1(t-c_1)^2 - p_2(t-c_2)^2}
  = e^{-(p_1+p_2)t^2 + 2(p_1 c_1 + p_2 c_2)t - (p_1 c_1^2 + p_2 c_2^2)}.
\]
Completing the square, we obtain
\[
  g_1(t)\,g_2(t) = g\, e^{-p(t-c)^2}
\tag{D.2}\]
with
\[
  p \triangleq p_1 + p_2, \qquad
  c \triangleq \frac{p_1 c_1 + p_2 c_2}{p_1 + p_2}, \qquad
  g \triangleq e^{-\frac{p_1 p_2}{p_1 + p_2}(c_1 - c_2)^2}.
\]
Note that this result holds for Gaussian-windowed chirps ($p_1$ and $p_2$ complex).
Product of Two Gaussian PDFs
For the special case of two Gaussian probability densities,
\[
  g_i(t) = \frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{(t-\mu_i)^2}{2\sigma_i^2}},
  \qquad i = 1, 2,
\]
the product density has mean and variance given by
\[
  \mu = \frac{\mu_1\sigma_2^2 + \mu_2\sigma_1^2}{\sigma_1^2 + \sigma_2^2},
  \qquad
  \sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2}
           = \frac{1}{\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}}.
\]
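These formulas are easy to check numerically. The following matlab sketch (with arbitrarily chosen example values for the means and variances) forms the pointwise product of two Gaussian densities, renormalizes it to unit area, and compares its mean and variance with the closed-form expressions above:

% Numerical check: the (renormalized) product of two Gaussian PDFs is a
% Gaussian whose mean and variance follow the formulas above.
mu1 = 0.3;  sig1 = 0.7;        % first Gaussian (arbitrary example values)
mu2 = -1.1; sig2 = 1.5;        % second Gaussian
x  = linspace(-10,10,20001);   % dense grid covering both densities
dx = x(2) - x(1);
g1 = exp(-(x-mu1).^2/(2*sig1^2))/(sig1*sqrt(2*pi));
g2 = exp(-(x-mu2).^2/(2*sig2^2))/(sig2*sqrt(2*pi));
gp = g1.*g2;                   % pointwise product (not unit area)
gp = gp/(sum(gp)*dx);          % renormalize to a PDF
mu_num  = sum(x.*gp)*dx;                 % numerical mean
var_num = sum((x-mu_num).^2.*gp)*dx;     % numerical variance
mu_th   = (mu1*sig2^2 + mu2*sig1^2)/(sig1^2 + sig2^2);
var_th  = sig1^2*sig2^2/(sig1^2 + sig2^2);
disp([mu_num mu_th; var_num var_th])     % rows should agree closely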
Gaussians Closed under Convolution
In §D.8 we show that

- the Fourier transform of a Gaussian is Gaussian,

and in §D.2 that

- the product of any two Gaussians is Gaussian.

It follows immediately that the convolution of any two Gaussians is Gaussian, since convolution in the time domain corresponds to multiplication in the frequency domain, and both operations map Gaussians to Gaussians.
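A numerical illustration of this closure property (a matlab sketch with arbitrarily chosen variances): convolving two zero-mean Gaussian densities yields, up to discretization error, the Gaussian whose variance is the sum of the two variances.

% Convolve two zero-mean Gaussian PDFs and compare with the Gaussian
% whose variance is the sum of the two input variances.
sig1 = 0.5; sig2 = 1.2;            % arbitrary example standard deviations
dx = 0.001;  x = -10:dx:10;
g1 = exp(-x.^2/(2*sig1^2))/(sig1*sqrt(2*pi));
g2 = exp(-x.^2/(2*sig2^2))/(sig2*sqrt(2*pi));
gc = conv(g1,g2)*dx;               % discrete approximation to the convolution
xc = linspace(x(1)+x(1), x(end)+x(end), length(gc)); % abscissa of result
sig = sqrt(sig1^2 + sig2^2);       % predicted standard deviation
gth = exp(-xc.^2/(2*sig^2))/(sig*sqrt(2*pi));
max(abs(gc - gth))                 % small (discretization error only)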
Fitting a Gaussian to Data
When fitting a single Gaussian to data, one can take a log and fit a parabola. In matlab, this can be carried out as in the following example:
x = -1:0.1:1;
sigma = 0.01;
y = exp(-x.*x) + sigma*randn(size(x)); % test data
[p,s] = polyfit(x,log(y),2);           % fit parabola to log
yh = exp(polyval(p,x));                % data model
norm(y-yh)                             % ans = 1.9230e-16 when sigma=0
plot(abs([y',yh']));

In practice, it is good to avoid zeros in the data. For example, one can fit only to the middle third or so of a measured peak, restricting consideration to measured samples that are positive and ``look Gaussian'' to a reasonable extent.
Infinite Flatness at Infinity
The Gaussian is infinitely flat at infinity. Equivalently, the Maclaurin expansion (Taylor expansion about $x=0$) of
\[
  f(x) \triangleq e^{-\frac{1}{x^2}}
\tag{D.3}\]
is zero for all orders. Thus, even though $f(x)$ is differentiable to all orders at $x=0$, its series expansion fails to approach the function. This happens because $f(x)$ has an essential singularity at $x=0$ (also called a ``non-removable singularity''). One can think of an essential singularity as an infinite number of poles piled up at the same point ($x=0$ for $f(x)$). Equivalently, $f(x)$ above has an infinite number of zeros at $x=0$, leading to the problem with the Maclaurin series expansion. To prove this, one can show
\[
  \lim_{x\to 0} \frac{e^{-1/x^2}}{x^n} = 0
\tag{D.4}\]
for all $n = 0, 1, 2, \ldots$. This follows from the fact that exponential growth or decay is faster than polynomial growth or decay. An exponential can in fact be viewed as an infinite-order polynomial, since
\[
  e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.
\tag{D.5}\]
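The limit in Eq. (D.4) is easy to observe numerically, as in the following matlab sketch (the polynomial order n and the sample points are arbitrary choices):

% exp(-1/x^2) goes to zero faster than any power of x as x -> 0:
n = 10;                        % arbitrary polynomial order
x = [0.5 0.4 0.3 0.2 0.1];     % points approaching zero
ratio = exp(-1./x.^2)./x.^n    % decreases rapidly toward zero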
We may call $f(x)=e^{-1/x^2}$ infinitely flat at $x=0$ in the ``Padé sense'':
- Padé approximation is maximally flat approximation, and seeks to use all degrees of freedom in the approximation to match the leading terms of the Taylor series expansion.
- Butterworth filters (IIR) are maximally flat at dc [263].
- Lagrange interpolation (FIR) is maximally flat at dc [266].
- Thiran allpass interpolation has maximally flat group delay at dc [266].
Another interesting mathematical property of essential singularities is that near an essential singular point $z_0$ the inequality
\[
  \left| f(z) - c \right| < \epsilon
\tag{D.6}\]
is satisfied at some point $z$ in every neighborhood of $z_0$, however small. In other words, $f(z)$ comes arbitrarily close to every possible value $c$ in any neighborhood about an essential singular point. This was first proved by Weierstrass [42, p. 270].
Integral of a Complex Gaussian
Theorem:
\[
  \int_{-\infty}^{\infty} e^{-p t^2}\, dt = \sqrt{\frac{\pi}{p}},
  \qquad \mbox{re}\{p\} > 0.
\tag{D.7}\]
Proof: Let $I(p) \triangleq \int_{-\infty}^{\infty} e^{-p t^2}\, dt$ denote the integral. Then
\[
  I^2(p)
  = \left(\int_{-\infty}^{\infty} e^{-p x^2}\, dx\right)
    \left(\int_{-\infty}^{\infty} e^{-p y^2}\, dy\right)
  = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-p(x^2+y^2)}\, dx\, dy
  = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-p r^2}\, r\, dr\, d\theta
  = 2\pi\left[-\frac{e^{-p r^2}}{2p}\right]_{0}^{\infty}
  = \frac{\pi}{p},
\]
where we needed $\mbox{re}\{p\}>0$ to have $e^{-p r^2}\to 0$ as $r\to\infty$. Thus,
\[
  I(p) = \sqrt{\frac{\pi}{p}}
\tag{D.8}\]
as claimed.
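This theorem can be spot-checked numerically for a particular complex $p$ with positive real part, as in the following matlab sketch (the value of p and the truncation of the integration interval are arbitrary choices):

% Check: the integral of exp(-p*t^2) over the real line equals sqrt(pi/p)
% for complex p with positive real part.
p = 1.3 + 0.9i;                   % arbitrary example with re{p} > 0
t = -20:1e-4:20;                  % integrand is negligible beyond |t| = 20
I = trapz(t, exp(-p*t.^2));       % numerical integration (trapezoidal rule)
[I  sqrt(pi/p)]                   % the two values should agree closely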
Area Under a Real Gaussian
Corollary: Setting $p = 1/(2\sigma^2)$ in the previous theorem, where $\sigma > 0$ is real, we have
\[
  \int_{-\infty}^{\infty} e^{-\frac{t^2}{2\sigma^2}}\, dt = \sigma\sqrt{2\pi}.
\tag{D.9}\]
Therefore, we may normalize the Gaussian to unit area by defining
\[
  f(t) \triangleq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{t^2}{2\sigma^2}}.
\tag{D.10}\]
Since
\[
  f(t) \ge 0 \;\mbox{for all}\; t
  \qquad\mbox{and}\qquad
  \int_{-\infty}^{\infty} f(t)\, dt = 1,
\tag{D.11}\]
it satisfies the requirements of a probability density function.
Gaussian Integral with Complex Offset
Theorem: For any complex constant $c$,
\[
  \int_{-\infty}^{\infty} e^{-p(t+c)^2}\, dt = \sqrt{\frac{\pi}{p}},
  \qquad \mbox{re}\{p\} > 0.
\tag{D.12}\]
Proof:
When $\mbox{im}\{c\} = 0$, we have the previously proved case, after a trivial real shift of the variable of integration. For arbitrary $c$ with $b \triangleq \mbox{im}\{c\}$, and real number $R>0$, let $\Gamma_R$ denote the closed rectangular contour $-R \to R \to R + jb \to -R + jb \to -R$, depicted in Fig. D.1.
Clearly, $e^{-p z^2}$ is analytic inside the region bounded by $\Gamma_R$. By Cauchy's theorem [42], the line integral of $e^{-p z^2}$ along $\Gamma_R$ is zero, i.e.,
\[
  \oint_{\Gamma_R} e^{-p z^2}\, dz = 0.
\tag{D.13}\]
This line integral breaks into the following four pieces:
\[
  \underbrace{\int_{-R}^{R} e^{-p t^2}\, dt}_{1}
  \;+\;
  \underbrace{\int_{0}^{b} e^{-p(R+jy)^2}\, j\, dy}_{2}
  \;+\;
  \underbrace{\int_{R}^{-R} e^{-p(t+jb)^2}\, dt}_{3}
  \;+\;
  \underbrace{\int_{b}^{0} e^{-p(-R+jy)^2}\, j\, dy}_{4}
\]
where $t$ and $y$ are real variables. In the limit as $R\to\infty$, the first piece approaches $\sqrt{\pi/p}$, as previously proved. Pieces 2 and 4 contribute zero in the limit, since $e^{-p z^2}\to 0$ as $|\mbox{re}\{z\}|\to\infty$ (using $\mbox{re}\{p\}>0$). Since the total contour integral is zero by Cauchy's theorem, we conclude that piece 3 is the negative of piece 1, i.e., in the limit as $R\to\infty$,
\[
  \int_{-\infty}^{\infty} e^{-p(t+jb)^2}\, dt = \sqrt{\frac{\pi}{p}}.
\tag{D.14}\]
Making the change of variable $t \to t + \mbox{re}\{c\}$, we obtain
\[
  \int_{-\infty}^{\infty} e^{-p(t+c)^2}\, dt = \sqrt{\frac{\pi}{p}}
\tag{D.15}\]
as desired.
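Like the previous theorem, this one is easy to verify numerically for particular values (a matlab sketch; p and c are arbitrary example choices):

% Check: the integral over real t of exp(-p*(t+c)^2) equals sqrt(pi/p),
% independent of the complex offset c.
p = 0.8 + 0.5i;                   % arbitrary example with re{p} > 0
c = -1.4 + 2.3i;                  % arbitrary complex offset
t = -30:1e-4:30;                  % wide grid; integrand negligible at the ends
I = trapz(t, exp(-p*(t+c).^2));
[I  sqrt(pi/p)]                   % should agree closely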
Fourier Transform of Complex Gaussian
Theorem:
\[
  e^{-p t^2} \;\longleftrightarrow\; \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}},
  \qquad \mbox{re}\{p\} > 0.
\tag{D.16}\]
Proof: [202, p. 211]
The Fourier transform of $e^{-p t^2}$ is defined as
\[
  X(\omega) \triangleq \int_{-\infty}^{\infty} e^{-p t^2} e^{-j\omega t}\, dt.
\tag{D.17}\]
Completing the square of the exponent gives
\[
  -p t^2 - j\omega t
  = -p\left(t + \frac{j\omega}{2p}\right)^2 - \frac{\omega^2}{4p}.
\]
Thus, the Fourier transform can be written as
\[
  X(\omega)
  = e^{-\frac{\omega^2}{4p}}
    \int_{-\infty}^{\infty} e^{-p\left(t+\frac{j\omega}{2p}\right)^2} dt
  = \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}}
\tag{D.18}\]
using our previous result (the Gaussian integral with a complex offset, Eq. (D.12)).
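Equation (D.18) can be spot-checked by evaluating the Fourier-transform integral numerically at a few frequencies and comparing with the closed form (a matlab sketch; p and the test frequencies are arbitrary examples):

% Spot-check the Fourier transform of exp(-p*t^2) at a few frequencies.
p = 2 + 1i;                        % arbitrary chirp parameter, re{p} > 0
w = [-3 -1 0 2 5];                 % arbitrary test frequencies (rad/s)
t = -20:1e-4:20;                   % integration grid
Xnum = zeros(size(w));
for k = 1:length(w)
  Xnum(k) = trapz(t, exp(-p*t.^2).*exp(-1i*w(k)*t)); % direct FT integral
end
Xth = sqrt(pi/p)*exp(-w.^2/(4*p)); % closed-form result (D.18)
max(abs(Xnum - Xth))               % should be very small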
Alternate Proof
The Fourier transform of a complex Gaussian can also be derived using the differentiation theorem (§B.2) and its dual (§B.3).
Proof: Let
\[
  g(t) \triangleq e^{-p t^2} \;\longleftrightarrow\; G(\omega).
\tag{D.19}\]
Then by the differentiation theorem (§B.2),
\[
  g'(t) \;\longleftrightarrow\; j\omega\, G(\omega).
\tag{D.20}\]
By the differentiation theorem dual (§B.3),
\[
  -j t\, g(t) \;\longleftrightarrow\; G'(\omega).
\tag{D.21}\]
Differentiating $g(t)$ gives
\[
  g'(t) = -2 p\, t\, g(t).
\tag{D.22}\]
Therefore,
\[
  j\omega\, G(\omega) = -2p\left[j\, G'(\omega)\right],
\tag{D.23}\]
or
\[
  \frac{G'(\omega)}{G(\omega)} = -\frac{\omega}{2p}.
\tag{D.24}\]
Integrating both sides with respect to $\omega$ yields
\[
  \ln G(\omega) = \ln G(0) - \frac{\omega^2}{4p}.
\tag{D.25}\]
In §D.7, we found that $G(0) = \int_{-\infty}^{\infty} e^{-p t^2}\, dt = \sqrt{\pi/p}$, so that, finally, exponentiating gives
\[
  G(\omega) = \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}}
\tag{D.26}\]
as expected.
The Fourier transform of complex Gaussians (``chirplets'') is used in §10.6 to analyze Gaussian-windowed ``chirps'' in the frequency domain.
Why Gaussian?
This section lists some of the points of origin for the Gaussian function in mathematics and physics.
Central Limit Theorem
The central limit theorem provides that many iterated convolutions of any ``sufficiently regular'' shape will approach a Gaussian function.
Iterated Convolutions
Any ``reasonable'' probability density function (PDF) (§C.1.3) has a Fourier transform that looks like $1 - a\omega^2$ near its tip ($\omega = 0$), for some $a > 0$. Iterating $N$ convolutions then corresponds to $(1 - a\omega^2)^N$, which becomes
\[
  \left(1 - a\omega^2\right)^N \;\approx\; e^{-a N \omega^2}
\tag{D.27}\]
for large $N$ (and $a\omega^2$ small), by the definition of $e^x$ [264]. This shows that the $N$th power of the transform approaches the Gaussian function defined in §D.1 for large $N$.
Since the inverse Fourier transform of a Gaussian is another Gaussian (§D.8), we can define a time-domain function as being ``sufficiently regular'' when its Fourier transform approaches $1 - a\omega^2$ in a sufficiently small neighborhood of $\omega = 0$. That is, the Fourier transform simply needs a ``sufficiently smooth peak'' at $\omega = 0$ that can be expanded into a convergent Taylor series. This obviously holds for the DTFT of any discrete-time window function (the subject of Chapter 3), because the window transform is a finite sum of continuous cosines of the form $\cos(\omega n T)$ in the zero-phase case, and complex exponentials of the form $e^{-j\omega n T}$ in the causal case, each of which is differentiable any number of times in $\omega$.
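This central-limit behavior is easy to observe numerically: convolving a rectangular pulse with itself a few times already gives a close fit to the Gaussian having the same mean and variance, as in the following matlab sketch (the pulse shape and the number of iterations are arbitrary choices):

% Iterated self-convolution of a rectangular pulse approaches a Gaussian.
N  = 8;                            % number of pulses convolved together
dx = 0.01;
x  = 0:dx:1;                       % one rectangular pulse on [0,1]
r  = ones(size(x));
r  = r/(sum(r)*dx);                % unit-area rectangle (uniform PDF)
g  = r;
for k = 2:N
  g = conv(g, r)*dx;               % convolve in another copy
end
xg  = linspace(0, N, length(g));   % support of the N-fold convolution
mu  = N/2;  sig = sqrt(N/12);      % means and variances add: N*(1/2), N*(1/12)
gth = exp(-(xg-mu).^2/(2*sig^2))/(sig*sqrt(2*pi));
plot(xg, g, xg, gth, '--');        % the two curves nearly coincide
legend(sprintf('%d-fold convolution', N), 'matched Gaussian');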
Binomial Distribution
The last row of Pascal's triangle (the binomial distribution) approaches a sampled Gaussian function as the number of rows increases. Since Lagrange interpolation (elementary polynomial interpolation) is equal to binomially windowed sinc interpolation [301,134], it follows that Lagrange interpolation approaches Gaussian-windowed sinc interpolation at high orders.
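For example, row $n$ of Pascal's triangle, normalized to unit sum, closely matches samples of a Gaussian with mean $n/2$ and variance $n/4$ (a matlab sketch; the row number is an arbitrary choice):

% Compare a (normalized) row of Pascal's triangle to a sampled Gaussian.
n = 32;                                  % row number (arbitrary)
k = 0:n;
b = zeros(1, n+1);
for i = 0:n, b(i+1) = nchoosek(n, i); end
b = b/2^n;                               % binomial distribution with p = 1/2
mu = n/2;  sig = sqrt(n/4);              % matching mean and standard deviation
g = exp(-(k-mu).^2/(2*sig^2))/(sig*sqrt(2*pi));
max(abs(b - g))                          % small, and decreases as n grows
plot(k, b, 'o', k, g, '-');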
Gaussian Probability Density Function
Any non-negative function which integrates to 1 (unit total area) is suitable for use as a probability density function (PDF) (§C.1.3). The most general Gaussian PDF is given by shifts of the normalized Gaussian:
\[
  p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
\tag{D.28}\]
The parameter $\mu$ is the mean, and $\sigma^2$ is the variance of the distribution (we'll show this in §D.12 below).
Maximum Entropy Property of the Gaussian Distribution
Entropy of a Probability Distribution
The entropy of a probability density function (PDF) $p(x)$ is defined as [48]
\[
  h(p) \triangleq -\int_{-\infty}^{\infty} p(x)\, \lg p(x)\, dx
\tag{D.29}\]
where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:
\[
  h(p) = E_p\left\{ \lg\frac{1}{p(x)} \right\}
\tag{D.30}\]
The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)
Example: Random Bit String
Consider a random sequence of 1s and 0s, i.e., the probability of a 0 or 1 is always $1/2$. The corresponding probability density function is
\[
  p(x) = \frac{1}{2}\,\delta(x) + \frac{1}{2}\,\delta(x-1),
\tag{D.31}\]
and the entropy is
\[
  h(p) = -\frac{1}{2}\lg\frac{1}{2} - \frac{1}{2}\lg\frac{1}{2} = \lg 2 = 1\;\mbox{bit}.
\tag{D.32}\]
Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.
If instead the probability of a 0 is $1/4$ and that of a 1 is $3/4$, we get
\[
  h(p) = -\frac{1}{4}\lg\frac{1}{4} - \frac{3}{4}\lg\frac{3}{4}
       = \frac{1}{4}\cdot 2 + \frac{3}{4}\cdot 0.415 \approx 0.81\;\mbox{bits},
\]
and the sequence can be compressed by about $19\%$.

In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get
\[
  h(p) = -0\cdot\lg 0 - 1\cdot\lg 1 = 0
\]
(taking $0\cdot\lg 0 = 0$).
Thus, the entropy is 0 when the sequence is perfectly predictable.
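All three cases follow from the entropy formula for a binary source, which can be written as a one-line matlab function (the handling of $0\cdot\lg 0 = 0$ below is one simple way to avoid taking the log of zero):

% Entropy (in bits) of a binary source with P(0) = q and P(1) = 1-q.
Hb = @(q) -sum([q 1-q].*log2([q 1-q] + ([q 1-q]==0))); % 0*log2(0) taken as 0
[Hb(1/2)  Hb(1/4)  Hb(0)]        % ans = 1.0000  0.8113  0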
Maximum Entropy Distributions
Uniform Distribution
Among probability distributions $p(x)$ which are nonzero over a finite range of values $x\in[a,b]$, the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,
\[
  h(p) = -\int_{a}^{b} p(x)\,\lg p(x)\, dx
\tag{D.33}\]
with respect to $p(x)$, subject to the constraints
\[
  p(x) \ge 0, \qquad \int_{a}^{b} p(x)\, dx = 1.
\]
Using the method of Lagrange multipliers for optimization in the presence of constraints [86], we may form the objective function
\[
  J(p) \triangleq -\int_{a}^{b} p(x)\,\lg p(x)\, dx
  + \lambda\left(\int_{a}^{b} p(x)\, dx - 1\right)
\tag{D.34}\]
and differentiate with respect to $p(x)$ (and renormalize by dropping the constant factor $1/\ln 2$ multiplying all terms) to obtain
\[
  \frac{\partial J}{\partial p(x)} = -\ln p(x) - 1 + \lambda.
\tag{D.35}\]
Setting this to zero and solving for $p(x)$ gives
\[
  p(x) = e^{\lambda - 1}.
\tag{D.36}\]
(Setting the partial derivative with respect to $\lambda$ to zero merely restates the unit-area constraint.) Choosing $\lambda$ to satisfy that constraint gives $e^{\lambda-1} = 1/(b-a)$, yielding
\[
  p(x) = \frac{1}{b-a}, \qquad a \le x \le b.
\tag{D.37}\]
That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $x$:
\[
  \frac{\partial^2 J}{\partial p(x)^2} = -\frac{1}{p(x)}
\tag{D.38}\]
Since the solution spontaneously satisfied $p(x) > 0$ for all $x\in[a,b]$, it is a maximum.
Exponential Distribution
Among probability distributions $p(x)$ which are nonzero over a semi-infinite range of values $x\in[0,\infty)$ and having a finite mean $\mu$, the exponential distribution has maximum entropy.

To the previous case, we add the new constraint
\[
  \int_{0}^{\infty} x\, p(x)\, dx = \mu,
\tag{D.39}\]
resulting in the objective function
\[
  J(p) \triangleq -\int_{0}^{\infty} p(x)\,\lg p(x)\, dx
  + \lambda_0\left(\int_{0}^{\infty} p(x)\, dx - 1\right)
  + \lambda_1\left(\int_{0}^{\infty} x\, p(x)\, dx - \mu\right).
\]
Now the partials with respect to $p(x)$ are
\[
  \frac{\partial J}{\partial p(x)} = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x,
  \qquad
  \frac{\partial^2 J}{\partial p(x)^2} = -\frac{1}{p(x)},
\]
and $p(x)$ is of the form $p(x) = e^{(\lambda_0-1) + \lambda_1 x}$. The unit-area and finite-mean constraints result in $\lambda_1 = -1/\mu$ and $e^{\lambda_0-1} = 1/\mu$, yielding
\[
  p(x) = \frac{1}{\mu}\, e^{-\frac{x}{\mu}}, \qquad x \ge 0.
\tag{D.40}\]
Gaussian Distribution
The Gaussian distribution has maximum entropy relative to all probability distributions covering the entire real line $x\in(-\infty,\infty)$ but having a finite mean $\mu$ and finite variance $\sigma^2$.

Proceeding as before, we obtain the objective function
\[
  J(p) \triangleq -\int_{-\infty}^{\infty} p(x)\,\lg p(x)\, dx
  + \lambda_0\left(\int_{-\infty}^{\infty} p(x)\, dx - 1\right)
  + \lambda_1\left(\int_{-\infty}^{\infty} x\, p(x)\, dx - \mu\right)
  + \lambda_2\left(\int_{-\infty}^{\infty} (x-\mu)^2 p(x)\, dx - \sigma^2\right)
\]
and partial derivatives
\[
  \frac{\partial J}{\partial p(x)}
  = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2,
  \qquad
  \frac{\partial^2 J}{\partial p(x)^2} = -\frac{1}{p(x)},
\]
leading to
\[
  p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
\tag{D.41}\]
For more on entropy and maximum-entropy distributions, see [48].
Gaussian Moments
Gaussian Mean
The mean of a distribution $p(x)$ is defined as its first-order moment:
\[
  \mu \triangleq \int_{-\infty}^{\infty} x\, p(x)\, dx
\tag{D.42}\]
To show that the mean of the Gaussian distribution (D.28) is $\mu$, we may write, letting $y \triangleq x - \mu$,
\[
  \int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,
    e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx
  = \frac{1}{\sigma\sqrt{2\pi}}
    \int_{-\infty}^{\infty} (y+\mu)\, e^{-\frac{y^2}{2\sigma^2}}\, dy
  = \frac{1}{\sigma\sqrt{2\pi}}\left[0 + \mu\,\sigma\sqrt{2\pi}\right]
  = \mu,
\]
since $\int_{-\infty}^{\infty} y\, e^{-y^2/(2\sigma^2)}\, dy = 0$ (the integrand is odd).
Gaussian Variance
The variance of a distribution $p(x)$ is defined as its second central moment:
\[
  \sigma^2 \triangleq \int_{-\infty}^{\infty} (x-\mu)^2\, p(x)\, dx
\tag{D.43}\]
where $\mu$ is the mean of $p(x)$.

To show that the variance of the Gaussian distribution (D.28) is $\sigma^2$, we write, letting $y \triangleq x-\mu$,
\[
  \int_{-\infty}^{\infty} (x-\mu)^2\,\frac{1}{\sigma\sqrt{2\pi}}\,
    e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx
  = \frac{1}{\sigma\sqrt{2\pi}}
    \int_{-\infty}^{\infty} y^2\, e^{-\frac{y^2}{2\sigma^2}}\, dy
  = \frac{1}{\sigma\sqrt{2\pi}}
    \left[\left.-\sigma^2 y\, e^{-\frac{y^2}{2\sigma^2}}\right|_{-\infty}^{\infty}
      + \sigma^2\int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma^2}}\, dy\right]
  = \sigma^2,
\]
where we used integration by parts and the fact that $y\, e^{-y^2/(2\sigma^2)} \to 0$ as $y\to\pm\infty$.
Higher Order Moments Revisited
Theorem:
The $n$th central moment of the Gaussian pdf $p(x)$ with mean $\mu$ and variance $\sigma^2$ is given by
\[
  m_n \triangleq E\left\{(x-\mu)^n\right\}
  = \left\{
    \begin{array}{ll}
      \sigma^n\,(n-1)!!, & n \mbox{ even} \\[2pt]
      0, & n \mbox{ odd}
    \end{array}
  \right.
\tag{D.44}\]
where $(n-1)!!$ denotes the product of all odd integers up to and including $n-1$ (see ``double-factorial notation''). Thus, for example, $m_2 = \sigma^2$, $m_4 = 3\sigma^4$, $m_6 = 15\sigma^6$, and $m_8 = 105\sigma^8$.
Proof:
The formula can be derived by successively differentiating the moment-generating function $E\{e^{s(x-\mu)}\} = e^{\sigma^2 s^2/2}$ with respect to $s$ and evaluating at $s=0$, or by differentiating the Gaussian integral
\[
  \int_{-\infty}^{\infty} e^{-\alpha x^2}\, dx = \sqrt{\frac{\pi}{\alpha}}
\tag{D.45}\]
successively with respect to $\alpha$ [203, p. 147-148]:
\[
  \int_{-\infty}^{\infty} x^2\, e^{-\alpha x^2}\, dx
    = \frac{1}{2}\sqrt{\frac{\pi}{\alpha^3}}, \qquad
  \int_{-\infty}^{\infty} x^4\, e^{-\alpha x^2}\, dx
    = \frac{1\cdot 3}{2^2}\sqrt{\frac{\pi}{\alpha^5}}, \qquad\ldots,\qquad
  \int_{-\infty}^{\infty} x^n\, e^{-\alpha x^2}\, dx
    = \frac{1\cdot 3\cdot 5\cdots(n-1)}{2^{n/2}}\sqrt{\frac{\pi}{\alpha^{n+1}}}
\]
for $n$ even. Setting $\alpha = 1/(2\sigma^2)$ and $\mu = 0$, and dividing both sides by $\sigma\sqrt{2\pi}$ yields
\[
  E\{x^n\}
  = \frac{1}{\sigma\sqrt{2\pi}}
    \int_{-\infty}^{\infty} x^n\, e^{-\frac{x^2}{2\sigma^2}}\, dx
  = \sigma^n\,(n-1)!!
\tag{D.46}\]
for $n$ even. Since the change of variable $x \to x-\mu$ has no effect on the result, (D.44) is also derived for $\mu \ne 0$.
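Equation (D.44) can be checked by numerical integration against the density (D.28), as in the following matlab sketch (the values of mu and sig are arbitrary examples):

% Check the central-moment formula E{(x-mu)^n} = sig^n*(n-1)!! for n even.
mu = 0.7;  sig = 1.3;                    % arbitrary example parameters
dx = 1e-3*sig;
x  = mu + (-12*sig:dx:12*sig);           % grid covering +/- 12 standard deviations
p  = exp(-(x-mu).^2/(2*sig^2))/(sig*sqrt(2*pi));
for n = 2:2:8
  m_num = sum((x-mu).^n .* p)*dx;        % numerical nth central moment
  m_th  = sig^n * prod(1:2:n-1);         % sig^n * (n-1)!!
  fprintf('n=%d  numerical=%.6g  formula=%.6g\n', n, m_num, m_th);
end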
Moment Theorem
Theorem:
For a random variable $x$ with PDF $p(x)$,
\[
  E\{x^n\} = \frac{1}{j^n}\left.\frac{d^n}{d\omega^n}\Phi(\omega)\right|_{\omega=0}
\tag{D.47}\]
where $\Phi(\omega)$ is the characteristic function of the PDF of $x$:
\[
  \Phi(\omega) \triangleq E\left\{e^{j\omega x}\right\}
  = \int_{-\infty}^{\infty} p(x)\, e^{j\omega x}\, dx
\tag{D.48}\]
(Note that $\Phi(\omega)$ is the complex conjugate of the Fourier transform of $p(x)$.)
Proof: [201, p. 157]
Let $m_n$ denote the $n$th moment of $p(x)$, i.e.,
\[
  m_n \triangleq E\{x^n\} = \int_{-\infty}^{\infty} x^n\, p(x)\, dx.
\tag{D.49}\]
Then
\[
  \Phi(\omega)
  = \int_{-\infty}^{\infty} p(x)\, e^{j\omega x}\, dx
  = \int_{-\infty}^{\infty} p(x)\sum_{n=0}^{\infty}\frac{(j\omega x)^n}{n!}\, dx
  = \sum_{n=0}^{\infty}\frac{(j\omega)^n}{n!}\int_{-\infty}^{\infty} x^n\, p(x)\, dx
  = \sum_{n=0}^{\infty}\frac{(j\omega)^n}{n!}\, m_n,
\]
where the term-by-term integration is valid when all moments $m_n$ are finite. Differentiating this series $n$ times with respect to $\omega$ and evaluating at $\omega=0$ isolates $j^n m_n$, which establishes (D.47).
Gaussian Characteristic Function
Since the Gaussian PDF is
\[
  p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\tag{D.50}\]
and since the Fourier transform of $e^{-\alpha x^2}$ is (§D.8)
\[
  e^{-\alpha x^2} \;\longleftrightarrow\; \sqrt{\frac{\pi}{\alpha}}\, e^{-\frac{\omega^2}{4\alpha}},
\tag{D.51}\]
it follows that the Gaussian characteristic function is
\[
  \Phi(\omega) = e^{j\mu\omega}\, e^{-\frac{\sigma^2\omega^2}{2}}.
\tag{D.52}\]
Gaussian Central Moments
The characteristic function of a zero-mean Gaussian is
\[
  \Phi(\omega) = e^{-\frac{\sigma^2\omega^2}{2}}.
\tag{D.53}\]
Since a zero-mean Gaussian $p(x)$ is an even function of $x$ (i.e., $p(-x)=p(x)$), all odd-order moments are zero. By the moment theorem, the even-order moments are
\[
  m_n = \frac{1}{j^n}\,\Phi^{(n)}(0), \qquad n \mbox{ even}.
\tag{D.54}\]
In particular,
\[
  \Phi'(\omega) = -\sigma^2\omega\, e^{-\frac{\sigma^2\omega^2}{2}},
  \qquad
  \Phi''(\omega) = \left[\sigma^4\omega^2 - \sigma^2\right] e^{-\frac{\sigma^2\omega^2}{2}}.
\]
Since $\Phi'(0) = 0$ and $\Phi''(0) = -\sigma^2$, we see $m_1 = 0$, $m_2 = \sigma^2$, as expected.
A Sum of Gaussian Random Variables is a Gaussian Random Variable
A basic result from the theory of random variables is that when you sum two independent random variables, you convolve their probability density functions (PDF). (Equivalently, in the frequency domain, their characteristic functions multiply.)
That the sum of two independent Gaussian random variables is itself a Gaussian random variable follows immediately from the fact that Gaussians are closed under convolution, or, equivalently, from the fact that their (Gaussian) characteristic functions are closed under multiplication.
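A quick Monte Carlo illustration in matlab (the means, variances, and sample size are arbitrary): the sample mean and variance of the sum match mu1+mu2 and sig1^2+sig2^2, and the histogram of the sum is well fit by the corresponding Gaussian density.

% Sum of two independent Gaussian random variables: empirical check.
N = 1e5;
mu1 = 1;  sig1 = 0.8;  mu2 = -2;  sig2 = 1.5;  % arbitrary example parameters
x1 = mu1 + sig1*randn(N,1);
x2 = mu2 + sig2*randn(N,1);
s  = x1 + x2;                                  % sum of independent Gaussians
[mean(s)  mu1+mu2]                             % sample mean vs. mu1+mu2
[var(s)   sig1^2+sig2^2]                       % sample variance vs. sig1^2+sig2^2
[counts, centers] = hist(s, 60);               % compare histogram with prediction
binw    = centers(2) - centers(1);
pdf_emp = counts/(N*binw);                     % empirical density estimate
mu  = mu1 + mu2;  sig = sqrt(sig1^2 + sig2^2);
pdf_th = exp(-(centers-mu).^2/(2*sig^2))/(sig*sqrt(2*pi));
plot(centers, pdf_emp, 'o', centers, pdf_th, '-');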