This appendix collects various facts about the fascinating Gaussian function, the classic ``bell curve'' that arises repeatedly in science and mathematics. As already seen in §B.17.1, only the Gaussian achieves the minimum time-bandwidth product among all smooth (analytic) functions.
Gaussian Window and Transform
The Gaussian window for FFT analysis was introduced in §3.11, and complex Gaussians (``chirplets'') were utilized in §10.6. For reference in support of these topics, this appendix derives some additional properties of the Gaussian, defined by
$$g(t) \triangleq e^{-p t^2},$$
where $p > 0$ (or, for complex Gaussians, $\operatorname{Re}\{p\} > 0$), and discusses some interesting applications in spectral modeling (the subject of §10.4). The basic mathematics rederived here are well known (see, e.g., [202,5]), while the application to spectral modeling of sound remains a topic under development.
Gaussians Closed under Multiplication
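Define
$$x_1(t) \triangleq e^{-p_1 (t-c_1)^2}, \qquad x_2(t) \triangleq e^{-p_2 (t-c_2)^2},$$
where $c_1, c_2, p_1, p_2$ are arbitrary complex numbers (with $p_1 + p_2 \neq 0$). Then by direct calculation, we have
$$x_1(t)\, x_2(t) = e^{-p_1 (t^2 - 2c_1 t + c_1^2) - p_2 (t^2 - 2c_2 t + c_2^2)} = e^{-(p_1+p_2)t^2 + 2(p_1 c_1 + p_2 c_2)t - (p_1 c_1^2 + p_2 c_2^2)}.$$
Completing the square, we obtain
$$x_1(t)\, x_2(t) = g\, e^{-p (t-c)^2},$$
with
$$p = p_1 + p_2, \qquad c = \frac{p_1 c_1 + p_2 c_2}{p_1 + p_2}, \qquad g = e^{-\frac{p_1 p_2}{p_1 + p_2}(c_1 - c_2)^2}.$$
Note that this result holds for Gaussian-windowed chirps ($p$ and $c$ complex).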
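The product formula can be checked numerically, including the complex (chirp) case. The Matlab/Octave sketch below multiplies $e^{-p_1(t-c_1)^2}$ and $e^{-p_2(t-c_2)^2}$ pointwise and compares the result with $g\,e^{-p(t-c)^2}$, where $p = p_1+p_2$, $c = (p_1c_1+p_2c_2)/(p_1+p_2)$, and $g = e^{-\frac{p_1p_2}{p_1+p_2}(c_1-c_2)^2}$. The parameter values are arbitrary examples, not taken from the text.

```octave
% Numerical check of Gaussian closure under multiplication (complex case).
% The parameter values below are arbitrary illustrative choices.
p1 = 1 + 2i;     c1 = 0.3 + 0.1i;
p2 = 0.5 - 0.3i; c2 = -0.2i;
t = linspace(-3, 3, 601);

x1 = exp(-p1 * (t - c1).^2);
x2 = exp(-p2 * (t - c2).^2);
lhs = x1 .* x2;                              % direct pointwise product

p = p1 + p2;                                 % combined Gaussian parameter
c = (p1*c1 + p2*c2) / (p1 + p2);             % combined center
g = exp(-(p1*p2/(p1 + p2)) * (c1 - c2)^2);   % constant gain factor
rhs = g * exp(-p * (t - c).^2);              % single-Gaussian form

relErr = max(abs(lhs - rhs)) / max(abs(lhs));
fprintf('max relative error = %g\n', relErr);
```

The reported error should be at round-off level, since the two sides are algebraically identical.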
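For the special case of two Gaussian probability densities (i.e., $p_i = 1/(2\sigma_i^2)$ and $c_i = \mu_i$ above),
$$x_1(t) = \frac{1}{\sqrt{2\pi\sigma_1^2}}\, e^{-\frac{(t-\mu_1)^2}{2\sigma_1^2}}, \qquad x_2(t) = \frac{1}{\sqrt{2\pi\sigma_2^2}}\, e^{-\frac{(t-\mu_2)^2}{2\sigma_2^2}},$$
the product density (the product renormalized to unit area) has mean and variance given by
$$\mu = \frac{\mu_1 \sigma_2^2 + \mu_2 \sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad \sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$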
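As a sanity check on the product-density mean $(\mu_1\sigma_2^2 + \mu_2\sigma_1^2)/(\sigma_1^2+\sigma_2^2)$ and variance $\sigma_1^2\sigma_2^2/(\sigma_1^2+\sigma_2^2)$, the following Matlab/Octave sketch multiplies two sampled Gaussian PDFs, renormalizes the product to unit area with `trapz`, and measures its mean and variance. The helper `gpdf` and the parameter values are our own illustrative choices.

```octave
% Mean and variance of the renormalized product of two Gaussian PDFs.
% mu1, s1, mu2, s2 are arbitrary example values.
mu1 = 1.0;  s1 = 2.0;
mu2 = -0.5; s2 = 1.0;
t = linspace(-30, 30, 60001);
gpdf = @(t, mu, s) exp(-(t - mu).^2 / (2*s^2)) / sqrt(2*pi*s^2);  % hypothetical helper

prod12 = gpdf(t, mu1, s1) .* gpdf(t, mu2, s2);
prod12 = prod12 / trapz(t, prod12);              % renormalize to unit area

muNum  = trapz(t, t .* prod12);                  % numerical mean
varNum = trapz(t, (t - muNum).^2 .* prod12);     % numerical variance

muPred  = (mu1*s2^2 + mu2*s1^2) / (s1^2 + s2^2); % -0.2 for these values
varPred = s1^2 * s2^2 / (s1^2 + s2^2);           % 0.8 for these values
fprintf('mean: %.6f (predicted %.6f)\n', muNum, muPred);
fprintf('variance: %.6f (predicted %.6f)\n', varNum, varPred);
```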
Gaussians Closed under Convolution
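This appendix establishes that
- the Fourier transform of a Gaussian is Gaussian (§D.8), and
- the product of any two Gaussians is Gaussian (§D.2).
By the convolution theorem, convolution in the time domain corresponds to multiplication in the frequency domain. It therefore follows that the convolution of any two Gaussians is Gaussian.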
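For centered real Gaussians, completing the square in the convolution integral (exactly as for the product) gives $e^{-p_1 t^2} * e^{-p_2 t^2} = \sqrt{\pi/(p_1+p_2)}\; e^{-\frac{p_1 p_2}{p_1+p_2} t^2}$. The Matlab/Octave sketch below checks this closed form against a Riemann-sum convolution computed with `conv(..., 'same')`; $p_1$, $p_2$, and the grid are arbitrary example choices.

```octave
% Numerical convolution of two centered Gaussians versus the closed form
% sqrt(pi/(p1+p2)) * exp(-p1*p2/(p1+p2) * t.^2).
p1 = 1;  p2 = 3;                 % example Gaussian parameters (real, positive)
N  = 2001;
t  = linspace(-10, 10, N);       % symmetric grid, spacing dt = 0.01
dt = t(2) - t(1);

x1 = exp(-p1 * t.^2);
x2 = exp(-p2 * t.^2);
y  = conv(x1, x2, 'same') * dt;  % Riemann-sum approximation of the convolution integral

yPred = sqrt(pi/(p1 + p2)) * exp(-(p1*p2/(p1 + p2)) * t.^2);
fprintf('max abs error = %g\n', max(abs(y - yPred)));
```

Because both inputs are odd-length and centered on the same grid, the `'same'` output is aligned with `t`.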
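Fitting a Gaussian to Data
When fitting a single Gaussian to data, one can take the logarithm of the data and fit a parabola, since the log of a Gaussian is a quadratic polynomial. In Matlab (or Octave), this can be carried out as in the following example:

    x = -1:0.1:1;
    sigma = 0.01;
    y = exp(-x.*x) + sigma*randn(size(x)); % test data
    [p,s] = polyfit(x,log(y),2);           % fit parabola to log
    yh = exp(polyval(p,x));                % data model
    norm(y-yh)                             % ans = 1.9230e-16 when sigma=0
    plot(abs([y',yh']));

In practice, it is good to avoid zeros in the data, since the logarithm of a zero (or a noisy negative) sample is undefined. For example, one can fit only to the middle third or so of a measured peak, restricting consideration to measured samples that are positive and ``look Gaussian'' to a reasonable extent.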
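Infinite Flatness at Infinity
The Gaussian is infinitely flat at infinity. Equivalently (substituting $x \to 1/x$ in $e^{-x^2}$), the Maclaurin expansion (Taylor expansion about $x=0$) of
$$f(x) = e^{-\frac{1}{x^2}}, \qquad f(0) \triangleq 0,$$
is zero for all orders. Thus, even though $f(x)$ is differentiable of all orders at $x=0$, its Maclaurin series is identically zero and therefore fails to represent the function at any $x \neq 0$. This happens because $e^{-1/x^2}$ has an essential singularity at $x=0$ (also called a ``non-removable singularity'') when $x$ is regarded as a complex variable. One can think of an essential singularity as an infinite number of poles piled up at the same point ($x=0$ for $e^{-1/x^2}$). Equivalently, $f(x)$ above has an infinite number of zeros at $x=0$, leading to the problem with Maclaurin series expansion. To prove this, one can show
$$\lim_{x\to 0} \frac{1}{x^k}\, e^{-\frac{1}{x^2}} = 0$$
for all $k = 1, 2, \ldots$. This follows from the fact that exponential growth or decay is faster than polynomial growth or decay. An exponential can in fact be viewed as an infinite-order polynomial, since
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$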
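The limit $x^{-k} e^{-1/x^2} \to 0$ can be seen numerically: even a tenth-order polynomial blow-up is crushed by the exponential as $x$ shrinks. The Matlab/Octave sketch below tabulates the expression for $k = 1,\ldots,10$ at a few sample points; the points are arbitrary choices, and this is an illustration rather than a proof.

```octave
% Tabulate x^(-k) * exp(-1/x^2) for several orders k and decreasing x.
k  = (1:10)';                         % polynomial orders to test
xs = [0.2, 0.1, 0.05];                % decreasing distances from x = 0
vals = zeros(numel(k), numel(xs));
for m = 1:numel(xs)
  vals(:, m) = xs(m).^(-k) .* exp(-1/xs(m)^2);
end
disp(vals);   % each row (fixed k) shrinks rapidly as x decreases
```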
We may call $e^{-1/x^2}$ infinitely flat at $x=0$ in the ``Padé sense'':
- Padé approximation is maximally flat approximation, and seeks to use all degrees of freedom in the approximation to match the leading terms of the Taylor series expansion.
- Butterworth filters (IIR) are maximally flat at dc.
- Lagrange interpolation (FIR) is maximally flat at dc.
- Thiran allpass interpolation has maximally flat group delay at dc.
Another interesting mathematical property of essential singularities is that near an essential singular point $z_0 \in \mathbb{C}$ the inequality
$$\left|f(z) - c\right| < \epsilon$$
is satisfied at some point $z \neq z_0$ in every neighborhood of $z_0$, however small, for every $c \in \mathbb{C}$ and every $\epsilon > 0$. In other words, $f(z)$ comes arbitrarily close to every possible value in any neighborhood about an essential singular point. This was first proved by Weierstrass [42, p. 270].
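For the specific function $f(z) = e^{-1/z^2}$ this property can be seen directly, in fact in a stronger form. For any nonzero target $c$, each solution $w_k = \log c + 2\pi j k$ of $e^{w} = c$ yields a point $z_k = (-w_k)^{-1/2}$ with $f(z_k) = c$ exactly, and $|z_k| \to 0$ as $k$ grows. The Matlab/Octave sketch below computes a few such points for an arbitrarily chosen $c$; it illustrates this one example and is not a proof of the general theorem.

```octave
% For f(z) = exp(-1/z^2), solve f(z) = c on successive branches of the log:
% -1/z^2 = log(c) + 2*pi*1i*k  =>  z_k = 1 ./ sqrt(-(log(c) + 2*pi*1i*k)).
% The target value c is an arbitrary nonzero example.
f = @(z) exp(-1 ./ z.^2);
c = -0.7 + 2.3i;
k = [1, 10, 100, 1000];
w = log(c) + 2*pi*1i*k;         % solutions of exp(w) = c
zk = 1 ./ sqrt(-w);             % then -1/zk^2 = w, so f(zk) = c
err = abs(f(zk) - c);           % should be at round-off level
fprintf('|z_k|        = %s\n', mat2str(abs(zk), 4));
fprintf('|f(z_k) - c| = %s\n', mat2str(err, 3));
```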
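Integral of a Complex Gaussian
Theorem:
$$\int_{-\infty}^{\infty} e^{-p t^2}\, dt = \sqrt{\frac{\pi}{p}}, \qquad \operatorname{Re}\{p\} > 0.$$
Proof: Let $I$ denote the integral. Then
$$I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-p(x^2+y^2)}\, dx\, dy = \int_0^{2\pi}\int_0^{\infty} e^{-p r^2}\, r\, dr\, d\theta = 2\pi \int_0^{\infty} e^{-p r^2}\, r\, dr = 2\pi \left[-\frac{1}{2p}\, e^{-p r^2}\right]_0^{\infty} = \frac{\pi}{p},$$
where we needed $\operatorname{Re}\{p\} > 0$ to have $e^{-p r^2} \to 0$ as $r \to \infty$. Thus,
$$I = \sqrt{\frac{\pi}{p}},$$
taking the square root with positive real part, which agrees with the positive result for real $p$ by continuity in $p$.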
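The theorem $\int e^{-pt^2}\,dt = \sqrt{\pi/p}$ also holds for complex $p$, where the integrand is a Gaussian-windowed chirp. The Matlab/Octave sketch below checks this with the trapezoidal rule (`trapz`) for one arbitrarily chosen $p$; the grid is fine enough to resolve the chirp oscillation where the envelope is non-negligible.

```octave
% Numerically check the integral of exp(-p t^2) for complex p with Re{p} > 0.
p  = 0.5 + 2i;                       % example value, Re{p} > 0
t  = linspace(-12, 12, 24001);       % spacing 1e-3 resolves the exp(-2i t^2) chirp
I  = trapz(t, exp(-p * t.^2));
Ip = sqrt(pi / p);                   % principal square root (positive real part)
fprintf('numerical: %s\nformula:   %s\n', num2str(I, 12), num2str(Ip, 12));
```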
Area Under a Real Gaussian
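Corollary: Setting $p = \frac{1}{2\sigma^2}$ in the previous theorem, where $\sigma > 0$ is real, we have
$$\int_{-\infty}^{\infty} e^{-\frac{t^2}{2\sigma^2}}\, dt = \sqrt{2\pi}\,\sigma.$$
Therefore, we may normalize the Gaussian to unit area by defining
$$f(t) \triangleq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{t^2}{2\sigma^2}}.$$
Since $f(t) > 0$ for all $t$ and $\int_{-\infty}^{\infty} f(t)\, dt = 1$, it satisfies the requirements of a probability density function.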
Gaussian Integral with Complex Offset
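Theorem:
$$\int_{-\infty}^{\infty} e^{-p(t+c)^2}\, dt = \sqrt{\frac{\pi}{p}}, \qquad p, c \in \mathbb{C},\ \operatorname{Re}\{p\} > 0.$$
Proof: When $c = 0$, we have the previously proved case. For arbitrary $c = a + jb \in \mathbb{C}$ and real $T > 0$, let $\Gamma_T$ denote the closed rectangular contour $(-T) \to T \to (T + jb) \to (-T + jb) \to (-T)$. Clearly, $f(z) \triangleq e^{-p z^2}$ is analytic inside the region bounded by $\Gamma_T$ (it is analytic everywhere). By Cauchy's theorem, the line integral of $f(z)$ along $\Gamma_T$ is zero, i.e.,
$$\oint_{\Gamma_T} f(z)\, dz = 0.$$
This line integral breaks into the following four pieces:
$$\oint_{\Gamma_T} f(z)\, dz = \underbrace{\int_{-T}^{T} f(x)\, dx}_{1} + \underbrace{\int_{0}^{b} f(T + jy)\, j\, dy}_{2} + \underbrace{\int_{T}^{-T} f(x + jb)\, dx}_{3} + \underbrace{\int_{b}^{0} f(-T + jy)\, j\, dy}_{4},$$
where $x$ and $y$ are real variables. In the limit as $T \to \infty$, the first piece approaches $\sqrt{\pi/p}$, as previously proved. Pieces 2 and 4 contribute zero in the limit, since $e^{-p(\pm T + jy)^2} \to 0$ as $T \to \infty$ (uniformly for $y$ between $0$ and $b$). Since the total contour integral is zero by Cauchy's theorem, we conclude that piece 3 is the negative of piece 1, i.e., in the limit as $T \to \infty$,
$$\int_{-\infty}^{\infty} f(x + jb)\, dx = \sqrt{\frac{\pi}{p}}.$$
Making the change of variable $x = t + a$, we obtain
$$\int_{-\infty}^{\infty} e^{-p(t+c)^2}\, dt = \sqrt{\frac{\pi}{p}},$$
as desired.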
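The claim that a complex offset $c$ leaves $\int e^{-p(t+c)^2}\,dt = \sqrt{\pi/p}$ unchanged is easy to spot-check. The Matlab/Octave sketch below integrates along the real axis with `trapz` for one arbitrarily chosen complex $p$ and $c$.

```octave
% Check that a complex offset c does not change the Gaussian integral.
p = 1 - 0.5i;                  % example value, Re{p} > 0
c = 0.7 + 1.2i;                % example complex offset
t = linspace(-15, 15, 30001);
I = trapz(t, exp(-p * (t + c).^2));
fprintf('|I - sqrt(pi/p)| = %g\n', abs(I - sqrt(pi/p)));
```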
Fourier Transform of Complex Gaussian
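Theorem:
$$e^{-p t^2} \leftrightarrow \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}}, \qquad \operatorname{Re}\{p\} > 0.$$
Proof: [202, p. 211] The Fourier transform of $e^{-p t^2}$ is defined as
$$\int_{-\infty}^{\infty} e^{-p t^2} e^{-j\omega t}\, dt = \int_{-\infty}^{\infty} e^{-(p t^2 + j\omega t)}\, dt.$$
Completing the square of the exponent gives
$$p t^2 + j\omega t = p\left(t + j\frac{\omega}{2p}\right)^2 + \frac{\omega^2}{4p}.$$
Thus, the Fourier transform can be written as
$$e^{-\frac{\omega^2}{4p}} \int_{-\infty}^{\infty} e^{-p\left(t + j\frac{\omega}{2p}\right)^2} dt = \sqrt{\frac{\pi}{p}}\, e^{-\frac{\omega^2}{4p}},$$
using our previous result (the Gaussian integral with complex offset $c = j\omega/(2p)$).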
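The transform pair $e^{-pt^2} \leftrightarrow \sqrt{\pi/p}\,e^{-\omega^2/(4p)}$ can be verified by evaluating the Fourier integral numerically at a few frequencies. In the Matlab/Octave sketch below, the complex $p$, the test frequencies, and the grid are arbitrary example choices.

```octave
% Compare a numerical Fourier integral of exp(-p t^2) with
% sqrt(pi/p) * exp(-w^2/(4p)) at a few frequencies.
p = 0.8 + 0.6i;                         % example complex p, Re{p} > 0
w = [0, 1, 2.5, 4];                     % test frequencies (rad per unit time)
t = linspace(-15, 15, 30001);
G = zeros(size(w));
for m = 1:numel(w)
  G(m) = trapz(t, exp(-p * t.^2) .* exp(-1i * w(m) * t));
end
Gpred = sqrt(pi/p) * exp(-w.^2 / (4*p));
fprintf('max |G - Gpred| = %g\n', max(abs(G - Gpred)));
```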
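Alternate Proof: For the real case $p = 1/2$, the same result can be derived from the differentiation theorem and its dual. Let
$$g(t) \triangleq e^{-\frac{t^2}{2}} \leftrightarrow G(\omega),$$
so that $g'(t) = -t\, g(t)$. Then by the differentiation theorem (§B.2),
$$g'(t) \leftrightarrow j\omega\, G(\omega).$$
By the differentiation theorem dual (§B.3),
$$-jt\, g(t) \leftrightarrow G'(\omega).$$
Since $g'(t) = -t\,g(t) = -j\,[-jt\,g(t)]$, these two transform pairs give $j\omega\, G(\omega) = -j\, G'(\omega)$, i.e.,
$$\frac{G'(\omega)}{G(\omega)} = -\omega.$$
Integrating both sides with respect to $\omega$ yields
$$\ln G(\omega) = -\frac{\omega^2}{2} + \ln G(0).$$
In §D.7, we found that $G(0) = \int_{-\infty}^{\infty} g(t)\, dt = \sqrt{2\pi}$, so that, finally, exponentiating gives
$$G(\omega) = \sqrt{2\pi}\, e^{-\frac{\omega^2}{2}},$$
in agreement with the theorem for $p = 1/2$.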
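Central Limit Theorem
The central limit theorem provides that many iterated convolutions of any ``sufficiently regular'' shape will approach a Gaussian function. A zero-mean probability density function (PDF) (§C.1.3) with variance $\sigma^2$ has a Fourier transform that looks like
$$\Phi(\omega) = 1 - \alpha\,\omega^2 + \cdots, \qquad \alpha = \frac{\sigma^2}{2},$$
near its peak at $\omega = 0$. Iterating $N$ convolutions raises the Fourier transform to the $N$th power, and rescaling frequency by $\sqrt{N}$ gives
$$\Phi^N\!\left(\frac{\omega}{\sqrt{N}}\right) = \left(1 - \frac{\alpha\,\omega^2}{N} + \cdots\right)^N \to e^{-\alpha\,\omega^2}$$
for large $N$, by the definition of $e^x = \lim_{N\to\infty}(1 + x/N)^N$. This proves that the $N$th power of $1 - \alpha\omega^2$ approaches the Gaussian function defined in §D.1 for large $N$.
Since the inverse Fourier transform of a Gaussian is another Gaussian (§D.8), we can define a time-domain function as being ``sufficiently regular'' when its Fourier transform approaches $1 - \alpha\omega^2$ in a sufficiently small neighborhood of $\omega = 0$. That is, the Fourier transform simply needs a ``sufficiently smooth peak'' at $\omega = 0$ that can be expanded into a convergent Taylor series. This holds for the DTFT of any discrete-time window function (the subject of Chapter 3), because the window transform is a finite sum of continuous cosines of the form $w(n)\cos(\omega n)$ in the zero-phase case, and of complex exponentials $w(n)e^{-j\omega n}$ in the causal case, each of which is differentiable any number of times in $\omega$.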
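The convergence to a Gaussian under repeated convolution is easy to watch numerically. The Matlab/Octave sketch below convolves a unit-area boxcar (a discrete uniform PMF) with itself $N$ times and compares the result with a sampled Gaussian of the same mean and variance. The boxcar length `M` and count `N` are arbitrary example values; the deviation shrinks as `N` grows.

```octave
% Central-limit illustration: repeatedly convolve a discrete uniform PMF
% (a boxcar of length M) and compare with a sampled Gaussian having the
% same mean and variance.  M and N are arbitrary example values.
M = 5;                      % boxcar length: uniform on {0, 1, ..., M-1}
N = 20;                     % number of boxcars convolved together
h = ones(1, M) / M;         % unit-area boxcar (a probability mass function)
y = 1;
for k = 1:N
  y = conv(y, h);           % N-fold convolution = PMF of a sum of N draws
end
n  = 0:numel(y)-1;
mu = N * (M - 1) / 2;       % mean of the sum
s2 = N * (M^2 - 1) / 12;    % variance of the sum
g  = exp(-(n - mu).^2 / (2*s2)) / sqrt(2*pi*s2);
relErr = max(abs(y - g)) / max(y);
fprintf('peak-normalized max deviation from Gaussian: %.4f\n', relErr);
```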
The last row of Pascal's triangle (the binomial distribution) approaches a sampled Gaussian function as the number of rows increases. Since Lagrange interpolation (elementary polynomial interpolation) is equal to binomially windowed sinc interpolation [301,134], it follows that Lagrange interpolation approaches Gaussian-windowed sinc interpolation at high orders.
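Row $n$ of Pascal's triangle, divided by $2^n$, is the binomial PMF with parameters $n$ and $1/2$, whose mean is $n/2$ and variance is $n/4$. The Matlab/Octave sketch below builds that row by repeated convolution with $[1\ 1]/2$ (which is how each row of the triangle arises from the previous one) and compares it with a sampled Gaussian; `n` is an arbitrary example value.

```octave
% Normalized row n of Pascal's triangle versus a sampled Gaussian with
% mean n/2 and variance n/4.  n is an arbitrary example value.
n = 64;
row = 1;
for r = 1:n
  row = conv(row, [1 1] / 2);    % next (normalized) row of Pascal's triangle
end
k  = 0:n;
g  = exp(-(k - n/2).^2 / (2*(n/4))) / sqrt(2*pi*(n/4));
relErr = max(abs(row - g)) / max(row);
fprintf('n = %d: peak-normalized max deviation = %.4f\n', n, relErr);
```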
Gaussian Probability Density Function
Any non-negative function which integrates to 1 (unit total area) is suitable for use as a probability density function (PDF) (§C.1.3). The most general Gaussian PDF is given by shifts of the normalized Gaussian:
$$f(t) \triangleq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}.$$
The parameter $\mu$ is the mean, and $\sigma^2$ is the variance of the distribution (we'll show this in §D.12 below).
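Maximum Entropy Property of the Gaussian Distribution
The entropy of a probability density function $p(x)$ is defined as
$$h(p) \triangleq \int_{-\infty}^{\infty} p(x) \lg\frac{1}{p(x)}\, dx,$$
where $\lg$ denotes the logarithm base 2. The entropy of $p(x)$ can be interpreted as the average number of bits needed to specify random variables $x$ drawn at random according to $p(x)$:
$$h(p) = \mathcal{E}_p\left\{\lg\frac{1}{p(x)}\right\}.$$
The term $\lg[1/p(x)]$ can be viewed as the number of bits which should be assigned to the value $x$. (The most common values of $x$ should be assigned the fewest bits, while rare values can be assigned many bits.)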
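Consider a random sequence of 1s and 0s in which the probability of a 0 or 1 is always $P(0) = P(1) = 1/2$. The corresponding probability density function is
$$p_b(x) = \frac{1}{2}\,\delta(x) + \frac{1}{2}\,\delta(x-1),$$
and the entropy is
$$h(p_b) = \frac{1}{2}\lg(2) + \frac{1}{2}\lg(2) = 1.$$
Thus, 1 bit is required for each bit of the sequence. In other words, the sequence cannot be compressed. There is no redundancy.
If instead the probability of a 0 is 1/4 and that of a 1 is 3/4, we get
$$p_b(x) = \frac{1}{4}\,\delta(x) + \frac{3}{4}\,\delta(x-1), \qquad h(p_b) = \frac{1}{4}\lg(4) + \frac{3}{4}\lg\frac{4}{3} \approx 0.81,$$
and the sequence can be compressed about $19\%$.
In the degenerate case for which the probability of a 0 is 0 and that of a 1 is 1, we get
$$p_b(x) = \delta(x-1), \qquad h(p_b) = 0 \cdot \lg\frac{1}{0} + 1 \cdot \lg(1) = 0,$$
where $0 \cdot \lg(1/0)$ is taken to be 0 (its limiting value). Thus, the entropy is 0 when the sequence is perfectly predictable.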
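The three bit-sequence entropies can be reproduced in a few lines. The Matlab/Octave sketch below evaluates $\sum_i P_i \lg(1/P_i)$ for each case, dropping zero-probability terms to implement the convention $0 \cdot \lg(1/0) = 0$.

```octave
% Entropy (in bits) of the three bit-sequence examples.
P0 = [1/2, 1/4, 0];                % probability of a 0 in each example
H  = zeros(size(P0));
for m = 1:numel(P0)
  P = [P0(m), 1 - P0(m)];          % probabilities of 0 and 1
  P = P(P > 0);                    % drop zero-probability terms (0*lg(1/0) := 0)
  H(m) = sum(P .* log2(1 ./ P));   % average bits per symbol
end
disp(H);                           % approximately [1, 0.8113, 0]
```

The middle value, about 0.811 bits per symbol, corresponds to the roughly 19% compression quoted for the 1/4 versus 3/4 case.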
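Among probability distributions which are nonzero over a finite range of values $x \in [a,b]$, the maximum-entropy distribution is the uniform distribution. To show this, we must maximize the entropy,
$$H(p) \triangleq -\int_a^b p(x) \ln p(x)\, dx,$$
with respect to $p(x)$, subject to the constraints
$$p(x) \ge 0, \qquad \int_a^b p(x)\, dx = 1.$$
(Here we use the natural logarithm; since $\lg p = \ln p / \ln 2$, changing the base only scales the entropy and does not change the maximizing $p(x)$.)
Using the method of Lagrange multipliers for optimization in the presence of constraints, we may form the objective function
$$J(p) \triangleq -\int_a^b p(x)\ln p(x)\, dx + \lambda_0\left(\int_a^b p(x)\, dx - 1\right)$$
and differentiate with respect to $p(x)$ (and renormalize by dropping the $dx$ factor multiplying all terms) to obtain
$$\frac{\partial}{\partial p(x)}\, J(p) = -\ln p(x) - 1 + \lambda_0.$$
Setting this to zero and solving for $p(x)$ gives
$$p(x) = e^{\lambda_0 - 1}.$$
(Setting the partial derivative with respect to $\lambda_0$ to zero merely restates the constraint.)
Choosing $\lambda_0$ to satisfy the constraint gives $\lambda_0 = 1 - \ln(b-a)$, yielding
$$p(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise}. \end{cases}$$
That this solution is a maximum rather than a minimum or inflection point can be verified by ensuring the sign of the second partial derivative is negative for all $x$:
$$\frac{\partial^2}{\partial p(x)^2}\, J(p) = -\frac{1}{p(x)}.$$
Since the solution spontaneously satisfied $p(x) > 0$, it is a maximum.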
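A quick numerical comparison makes the result concrete. The Matlab/Octave sketch below computes the differential entropy (in bits) of the uniform density on $[0,1]$ and of a symmetric triangular density on the same interval, an arbitrary competitor chosen for illustration. The uniform density should come out larger (0 bits versus about $-0.279$ bits). The helper `plogp` is our own, and implements $0 \cdot \lg 0 = 0$.

```octave
% Differential entropies (in bits) of two densities on [0, 1]:
% the uniform density and a symmetric triangular density (example choice).
x  = linspace(0, 1, 100001);
pu = ones(size(x));                       % uniform on [0, 1]
pt = 2 - 4*abs(x - 0.5);                  % triangular, peak 2 at x = 0.5, unit area

% p*lg(p) with the convention 0*lg(0) = 0 (hypothetical helper)
plogp = @(p) p .* log2(max(p, realmin)) .* (p > 0);
hu = -trapz(x, plogp(pu));
ht = -trapz(x, plogp(pt));
fprintf('uniform: %.5f bits, triangular: %.5f bits\n', hu, ht);
```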
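Among probability distributions which are nonzero over a semi-infinite range of values $x \in [0,\infty)$ and have a finite mean $\mu$, the maximum-entropy distribution is the exponential distribution. To the previous case, we add the new constraint
$$\int_0^\infty x\, p(x)\, dx = \mu < \infty,$$
resulting in the objective function
$$J(p) \triangleq -\int_0^\infty p(x)\ln p(x)\, dx + \lambda_0\left(\int_0^\infty p(x)\, dx - 1\right) + \lambda_1\left(\int_0^\infty x\, p(x)\, dx - \mu\right).$$
Now the partials with respect to $p(x)$ are
$$\frac{\partial}{\partial p(x)}\, J(p) = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x, \qquad \frac{\partial^2}{\partial p(x)^2}\, J(p) = -\frac{1}{p(x)},$$
and $p(x)$ is of the form $p(x) = e^{(\lambda_0 - 1) + \lambda_1 x}$. The unit-area and finite-mean constraints result in $e^{\lambda_0 - 1} = 1/\mu$ and $\lambda_1 = -1/\mu$, yielding
$$p(x) = \begin{cases} \dfrac{1}{\mu}\, e^{-x/\mu}, & x \ge 0, \\ 0, & \text{otherwise}. \end{cases}$$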
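To illustrate, the Matlab/Octave sketch below compares the entropy (in bits) of the exponential density with mean $\mu$ against a half-normal density scaled to have the same mean. The half-normal is an arbitrary competitor of our choosing. The exponential should win: $1/\ln 2 \approx 1.443$ bits versus about $1.373$ bits for $\mu = 1$. The helper `plogp` is hypothetical and implements $0 \cdot \lg 0 = 0$.

```octave
% Among densities on [0, inf) with mean mu, compare the exponential with a
% half-normal density of the same mean (example competitor).  Bits.
mu = 1;
x  = linspace(0, 60, 60001);
pe = exp(-x/mu) / mu;                          % exponential, mean mu
s  = mu * sqrt(pi/2);                          % half-normal scale giving mean mu
ph = 2 / (s*sqrt(2*pi)) * exp(-x.^2 / (2*s^2));

plogp = @(p) p .* log2(max(p, realmin)) .* (p > 0);   % hypothetical helper
he = -trapz(x, plogp(pe));
hh = -trapz(x, plogp(ph));
fprintf('means: %.5f, %.5f\n', trapz(x, x.*pe), trapz(x, x.*ph));
fprintf('exponential: %.5f bits, half-normal: %.5f bits\n', he, hh);
```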
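Finally, among probability distributions covering the entire real line $x \in (-\infty,\infty)$ and having a finite mean $\mu$ and finite variance $\sigma^2$, the maximum-entropy distribution is the Gaussian. Proceeding as before, we obtain the objective function
$$J(p) \triangleq -\int_{-\infty}^{\infty} p(x)\ln p(x)\, dx + \lambda_0\left(\int_{-\infty}^{\infty} p(x)\, dx - 1\right) + \lambda_1\left(\int_{-\infty}^{\infty} x\, p(x)\, dx - \mu\right) + \lambda_2\left(\int_{-\infty}^{\infty} (x-\mu)^2\, p(x)\, dx - \sigma^2\right)$$
and partial derivatives
$$\frac{\partial}{\partial p(x)}\, J(p) = -\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2, \qquad \frac{\partial^2}{\partial p(x)^2}\, J(p) = -\frac{1}{p(x)},$$
so that $p(x) = e^{(\lambda_0 - 1) + \lambda_1 x + \lambda_2 (x-\mu)^2}$. The mean and variance constraints force $\lambda_1 = 0$ and $\lambda_2 = -1/(2\sigma^2)$, and the unit-area constraint gives $e^{\lambda_0 - 1} = 1/(\sigma\sqrt{2\pi})$, yielding the Gaussian PDF
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$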
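The maximum-entropy property of the Gaussian can be checked against a couple of competitors with the same variance. The Matlab/Octave sketch below computes entropies (in bits) of zero-mean, unit-variance Gaussian, Laplace, and uniform densities. The expected values are $\frac{1}{2}\lg(2\pi e) \approx 2.047$, about $1.943$, and $\lg\sqrt{12} \approx 1.792$, respectively. The Laplace and uniform competitors, and the helpers `ent` and `vr`, are our own illustrative choices.

```octave
% Entropies (in bits) of three zero-mean, unit-variance densities on the
% real line: Gaussian, Laplace, and uniform (example competitors).
ent = @(x, p) -trapz(x, p .* log2(p));   % entropy; assumes p > 0 on the grid
vr  = @(x, p) trapz(x, x.^2 .* p);       % variance of a zero-mean density

x1 = linspace(-20, 20, 40001);
pg = exp(-x1.^2 / 2) / sqrt(2*pi);       % Gaussian, sigma = 1

b  = 1/sqrt(2);                          % Laplace scale: variance 2*b^2 = 1
x2 = linspace(-40, 40, 80001);
pl = exp(-abs(x2)/b) / (2*b);

w  = sqrt(12);                           % uniform width: variance w^2/12 = 1
x3 = linspace(-w/2, w/2, 10001);
pu = ones(size(x3)) / w;

H = [ent(x1, pg), ent(x2, pl), ent(x3, pu)];
V = [vr(x1, pg), vr(x2, pl), vr(x3, pu)];
fprintf('variances: %s\n', mat2str(V, 5));
fprintf('entropies (Gaussian, Laplace, uniform): %s\n', mat2str(H, 5));
```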
For more on entropy and maximum-entropy distributions, see .
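Gaussian Moments
To show that the mean of the Gaussian distribution is $\mu$, we may write, letting $f(t)$ denote the Gaussian PDF with mean $\mu$ and variance $\sigma^2$, and $g \triangleq 1/(\sigma\sqrt{2\pi})$,
$$\int_{-\infty}^{\infty} t\, f(t)\, dt = g\int_{-\infty}^{\infty} t\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\, dt = g\int_{-\infty}^{\infty} (t+\mu)\, e^{-\frac{t^2}{2\sigma^2}}\, dt = g\int_{-\infty}^{\infty} t\, e^{-\frac{t^2}{2\sigma^2}}\, dt + \mu = \mu,$$
since the remaining integrand $t\,e^{-t^2/(2\sigma^2)}$ is odd and therefore integrates to zero, and since $f(t)$ has unit area.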
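To show that the variance of the Gaussian distribution is $\sigma^2$, we write, letting $g \triangleq 1/(\sigma\sqrt{2\pi})$,
$$\int_{-\infty}^{\infty} (t-\mu)^2 f(t)\, dt = g\int_{-\infty}^{\infty} t^2\, e^{-\frac{t^2}{2\sigma^2}}\, dt = g\left[-\sigma^2\, t\, e^{-\frac{t^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + g\,\sigma^2\int_{-\infty}^{\infty} e^{-\frac{t^2}{2\sigma^2}}\, dt = \sigma^2,$$
where we used integration by parts and the fact that $t\,e^{-t^2/(2\sigma^2)} \to 0$ as $|t| \to \infty$.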
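The mean and variance derivations can be confirmed by direct numerical integration. The Matlab/Octave sketch below integrates a sampled Gaussian PDF with `trapz`; the values of `mu` and `sigma` are arbitrary examples.

```octave
% Numerically confirm the area, mean, and variance of a Gaussian PDF.
mu = 1.5;  sigma = 0.7;                       % example parameters
t  = linspace(mu - 12*sigma, mu + 12*sigma, 24001);
f  = exp(-(t - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
A    = trapz(t, f);                           % total area
mNum = trapz(t, t .* f);                      % mean
vNum = trapz(t, (t - mNum).^2 .* f);          % variance
fprintf('area %.8f, mean %.8f, variance %.8f (sigma^2 = %.8f)\n', ...
        A, mNum, vNum, sigma^2);
```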
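Theorem: The $n$th central moment of the Gaussian pdf $p(x)$ with mean $\mu$ and variance $\sigma^2$ is given by
$$m_n \triangleq \mathcal{E}\{(x-\mu)^n\} = \begin{cases} (n-1)!!\,\sigma^n, & n \text{ even}, \\ 0, & n \text{ odd}, \end{cases}$$
where $(n-1)!!$ denotes the product of all odd integers up to and including $n-1$ (see ``double-factorial notation''). Thus, for example, $m_2 = \sigma^2$, $m_4 = 3\sigma^4$, $m_6 = 15\sigma^6$, and $m_8 = 105\sigma^8$.
Proof: The formula can be derived by successively differentiating the moment-generating function $M(\alpha) = \mathcal{E}\{e^{\alpha(x-\mu)}\} = e^{\sigma^2\alpha^2/2}$ with respect to $\alpha$ and evaluating at $\alpha = 0$, or by differentiating the Gaussian integral
$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\, dx = \sqrt{\pi}\,\alpha^{-1/2}$$
successively with respect to $\alpha$ [203, p. 147-148]:
$$\int_{-\infty}^{\infty} (-x^2)\, e^{-\alpha x^2}\, dx = \sqrt{\pi}\left(-\tfrac{1}{2}\right)\alpha^{-3/2}, \qquad \int_{-\infty}^{\infty} (-x^2)^2\, e^{-\alpha x^2}\, dx = \sqrt{\pi}\left(-\tfrac{1}{2}\right)\left(-\tfrac{3}{2}\right)\alpha^{-5/2}, \qquad \ldots$$
$$\int_{-\infty}^{\infty} x^{2k}\, e^{-\alpha x^2}\, dx = \sqrt{\pi}\,\frac{(2k-1)!!}{2^k}\,\alpha^{-(2k+1)/2}$$
for $k = 1, 2, 3, \ldots$. Setting $\alpha = 1/(2\sigma^2)$ and $n = 2k$, and dividing both sides by $\sigma\sqrt{2\pi}$ yields
$$\int_{-\infty}^{\infty} x^n\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx = (n-1)!!\,\sigma^n$$
for $n = 2, 4, 6, \ldots$. For odd $n$, the integrand is odd, so the moment is zero. Since the change of variable $\tilde{x} = x - \mu$ has no effect on the result, the theorem also holds for $\mu \neq 0$.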
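The double-factorial formula can be checked for the first few orders by numerical integration. The Matlab/Octave sketch below computes $\mathcal{E}\{(x-\mu)^n\}$ for $n = 1,\ldots,8$ with `trapz` and prints it above the predicted value; `mu` and `sigma` are arbitrary examples, and `prod(1:2:n-1)` evaluates $(n-1)!!$.

```octave
% Numerically check m_n = (n-1)!! sigma^n (n even), 0 (n odd), for n = 1..8.
mu = -0.4;  sigma = 1.3;                        % example parameters
x  = linspace(mu - 15*sigma, mu + 15*sigma, 30001);
p  = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
mNum  = zeros(1, 8);
mPred = zeros(1, 8);                            % odd n: predicted 0
for n = 1:8
  mNum(n) = trapz(x, (x - mu).^n .* p);
  if mod(n, 2) == 0
    mPred(n) = prod(1:2:n-1) * sigma^n;         % (n-1)!! * sigma^n
  end
end
disp([mNum; mPred]);                            % row 1: numerical, row 2: formula
```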
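Moment Theorem
Theorem: For a random variable $x$,
$$m_n \triangleq \mathcal{E}\{x^n\} = \frac{1}{j^n}\,\Phi^{(n)}(0),$$
where $\Phi(\omega)$ is the characteristic function of the PDF $p(x)$ of $x$:
$$\Phi(\omega) \triangleq \mathcal{E}\{e^{j\omega x}\} = \int_{-\infty}^{\infty} p(x)\, e^{j\omega x}\, dx.$$
(Note that $\Phi(\omega)$ is the complex conjugate of the Fourier transform of $p(x)$.)
Proof: [201, p. 157] Let $m_i$ denote the $i$th moment of $x$, i.e.,
$$m_i \triangleq \mathcal{E}\{x^i\} = \int_{-\infty}^{\infty} x^i\, p(x)\, dx.$$
Then
$$\Phi(\omega) = \int_{-\infty}^{\infty} p(x)\left(1 + j\omega x + \frac{(j\omega x)^2}{2!} + \cdots\right) dx = 1 + j\omega\, m_1 + \frac{(j\omega)^2}{2!}\, m_2 + \cdots,$$
where the term-by-term integration is valid when all moments are finite. Comparing this with the Taylor series $\Phi(\omega) = \sum_n \Phi^{(n)}(0)\,\omega^n/n!$ gives $\Phi^{(n)}(0) = j^n m_n$.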
Gaussian Characteristic Function
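Since the Gaussian PDF is
$$p(t) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}},$$
and since the Fourier transform of $p(t)$ is
$$P(\omega) = e^{-j\mu\omega}\, e^{-\frac{\sigma^2\omega^2}{2}},$$
it follows that the Gaussian characteristic function is
$$\Phi(\omega) = \overline{P(\omega)} = e^{j\mu\omega}\, e^{-\frac{\sigma^2\omega^2}{2}}.$$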
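The closed form $\Phi(\omega) = e^{j\mu\omega}e^{-\sigma^2\omega^2/2}$ can be compared with a direct numerical evaluation of $\mathcal{E}\{e^{j\omega x}\}$. In the Matlab/Octave sketch below, `mu`, `sigma`, and the frequency grid are arbitrary examples.

```octave
% Numerically evaluate Phi(w) = E{exp(j w x)} for a Gaussian PDF and compare
% with exp(j mu w) * exp(-sigma^2 w^2 / 2).
mu = 0.8;  sigma = 0.5;                         % example parameters
x  = linspace(mu - 12*sigma, mu + 12*sigma, 12001);
p  = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));
w  = linspace(-10, 10, 41);
Phi = zeros(size(w));
for m = 1:numel(w)
  Phi(m) = trapz(x, p .* exp(1i * w(m) * x));
end
PhiPred = exp(1i*mu*w) .* exp(-sigma^2 * w.^2 / 2);
fprintf('max |Phi - PhiPred| = %g\n', max(abs(Phi - PhiPred)));
```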
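The characteristic function of a zero-mean Gaussian is
$$\Phi(\omega) = e^{-\frac{\sigma^2\omega^2}{2}}.$$
Since a zero-mean Gaussian $p(t)$ is an even function of $t$ (i.e., $p(-t) = p(t)$), all odd-order moments $m_i$ are zero. By the moment theorem, the even-order moments are
$$m_n = \frac{1}{j^n}\,\Phi^{(n)}(0) = (-1)^{n/2}\,\Phi^{(n)}(0).$$
Since $\Phi''(0) = -\sigma^2$ and $\Phi^{(4)}(0) = 3\sigma^4$, we see $m_2 = \sigma^2$, $m_4 = 3\sigma^4$, as expected.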
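The moment theorem can also be applied numerically: estimate derivatives of $\Phi(\omega)$ at $\omega = 0$ by central finite differences, then multiply by $(-1)^{n/2}$. The Matlab/Octave sketch below does this for $n = 2$ and $n = 4$. The step `h` is a tuning choice of ours: smaller steps reduce truncation error but amplify round-off, especially in the fourth difference.

```octave
% Estimate Phi''(0) and Phi''''(0) for a zero-mean Gaussian by central finite
% differences, then apply the moment theorem m_n = (-1)^(n/2) Phi^(n)(0).
sigma = 1.3;                                 % example standard deviation
Phi = @(w) exp(-sigma^2 * w.^2 / 2);
h   = 0.01;                                  % finite-difference step
d2  = (Phi(h) - 2*Phi(0) + Phi(-h)) / h^2;
d4  = (Phi(2*h) - 4*Phi(h) + 6*Phi(0) - 4*Phi(-h) + Phi(-2*h)) / h^4;
m2  = -d2;                                   % (-1)^1 * Phi''(0)
m4  =  d4;                                   % (-1)^2 * Phi''''(0)
fprintf('m2 = %.5f (sigma^2 = %.5f)\n', m2, sigma^2);
fprintf('m4 = %.5f (3 sigma^4 = %.5f)\n', m4, 3*sigma^4);
```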
A Sum of Gaussian Random Variables is a Gaussian Random Variable
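A basic result from the theory of random variables is that when you sum two independent random variables, you convolve their probability density functions (PDFs). (Equivalently, in the frequency domain, their characteristic functions multiply.)
That the sum of two independent Gaussian random variables is Gaussian follows immediately from the fact that Gaussians are closed under multiplication (or convolution). Specifically, multiplying the characteristic functions $e^{j\mu_1\omega - \sigma_1^2\omega^2/2}$ and $e^{j\mu_2\omega - \sigma_2^2\omega^2/2}$ gives $e^{j(\mu_1+\mu_2)\omega - (\sigma_1^2+\sigma_2^2)\omega^2/2}$, so the sum is Gaussian with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.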
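The same conclusion can be reached in the time domain by convolving the two PDFs directly. The Matlab/Octave sketch below convolves two sampled Gaussian PDFs (a Riemann-sum approximation of the convolution integral) and compares the result with the Gaussian PDF of mean $\mu_1+\mu_2$ and variance $\sigma_1^2+\sigma_2^2$. The parameter values and the helper `gpdf` are illustrative choices of ours.

```octave
% Convolve two Gaussian PDFs numerically and compare with the Gaussian PDF
% of mean mu1+mu2 and variance s1^2+s2^2.  Parameters are example values.
mu1 = 1.0;  s1 = 0.6;
mu2 = -2.5; s2 = 1.1;
gpdf = @(t, mu, s) exp(-(t - mu).^2 / (2*s^2)) / (s*sqrt(2*pi));  % hypothetical helper

L = 20;  N = 4001;
t  = linspace(-L, L, N);                 % grid for each input PDF (dt = 0.01)
dt = t(2) - t(1);
y  = conv(gpdf(t, mu1, s1), gpdf(t, mu2, s2)) * dt;   % PDF of the sum
ty = linspace(-2*L, 2*L, 2*N - 1);       % grid of the full convolution
yPred = gpdf(ty, mu1 + mu2, sqrt(s1^2 + s2^2));
fprintf('max abs error = %g\n', max(abs(y - yPred)));
```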