Spectrum Analysis of Noise

Spectrum analysis of noise is generally more advanced than the analysis of ``deterministic'' signals such as sinusoids, because the mathematical model for noise is a so-called stochastic process, which is defined as a sequence of random variables (see §C.1). More broadly, the analysis of signals containing noise falls under the subject of statistical signal processing [121]. Appendix C provides a short tutorial on this topic. In this chapter, we will cover only the most basic practical aspects of spectrum analysis applied to signals regarded as noise.

In particular, we will be concerned with estimating two functions from an observed noise sequence $ x(n)$ , $ n=0,1,2,\ldots,N-1$ :

  • the sample autocorrelation $ \hat{r}_x(l)$ , and
  • the sample power spectral density (PSD) $ {\hat S}_x(\omega)$ .

When the number $ N$ of observed samples of $ x(n)$ approaches infinity, we assume that the sample autocorrelation $ \hat{r}_x(l)$ approaches the true autocorrelation $ r_x(l)$ (defined formally in Appendix C). Note that we do not need to know anything about the true autocorrelation function--only that the sample autocorrelation approaches it in the limit as $ N\to\infty$ .

The PSD is the Fourier transform of the autocorrelation function:

$\displaystyle \zbox {S_x(\omega) = \hbox{\sc DTFT}_\omega(r_x)}$ (7.1)

We'll accept this as nothing more than the definition of the PSD. When the signal $ x$ is real, both $ r_x$ and $ S_x$ are real and even.

As indicated above, when estimating the true autocorrelation $ r_x(l)$ from observed samples of $ x$ , the resulting estimate $ \hat{r}_x(l)$ will be called a sample autocorrelation. Likewise, the Fourier transform of a sample autocorrelation will be called a sample PSD. It is assumed that the sample PSD $ {\hat S}_x(\omega)$ converges to the true PSD $ S_x(\omega)$ as $ N\to\infty$ .

We will also be concerned with two cases of the autocorrelation function itself:

  • biased autocorrelation
  • unbiased autocorrelation
The biased autocorrelation,7.1 or simply autocorrelation, will be taken to be the simplest case computationally: If $ x(n)$ is a discrete-time signal, where $ n$ ranges over all integers, then as described in §2.3.7, the autocorrelation of $ x$ at ``lag $ l$ '' is given by

$\displaystyle (x \star x)(l) = \sum_{n=-\infty}^{\infty} \overline{x(n)}x(n+l)$ (7.2)

Note that this definition of autocorrelation is workable only for signals having finite support (nonzero over a finite number of samples). As shown in §2.3.7, the Fourier transform of the autocorrelation of $ x$ is simply the squared-magnitude of the Fourier transform of $ x$ :

$\displaystyle \hbox{\sc DTFT}_\omega(x \star x) = \vert X(\omega)\vert^2$ (7.3)

This chapter is concerned with noise-like signals $ x$ that ``last forever'', i.e., they exhibit infinite support. As a result, we cannot work only with $ x\star x$ , and will introduce the unbiased sample autocorrelation function

$\displaystyle \hat{r}_x(l) \isdef \frac{1}{N-\vert l\vert} \sum_{n=0}^{N-1} \overline{x(n)}x(n+l). \protect$ (7.4)

Since this gives an unbiased estimator of the true autocorrelation (as will be discussed below), we see that the ``bias'' in $ x\star x$ consists of a multiplication of the unbiased sample autocorrelation by a Bartlett (triangular) window (see §3.5). This means we can convert the biased autocorrelation to unbiased form by simply ``dividing out'' this window:

$\displaystyle \hat{r}_x(l) = \left\{\begin{array}{ll} \frac{(x\star x)(l)}{N-\vert l\vert}, & \vert l\vert<N \\ [5pt] 0, & \vert l\vert\ge N \\ \end{array} \right.$ (7.5)

Since the Fourier transform of a Bartlett window is $ \hbox{asinc}^2$ (§3.5), we find that the DTFT of the biased autocorrelation is a smoothed version of the unbiased sample PSD (convolved with $ \hbox{asinc}^2$ ).
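The biased/unbiased relationship of (7.4)-(7.5) is easy to check numerically. The following Python sketch (function names are illustrative, not from the text) computes the raw autocorrelation $ x\star x$ of a short signal directly from the definition and then divides out the triangular weighting $ N-\vert l\vert$ :

```python
def acorr_biased(x):
    # (x ⋆ x)(l) = sum_n x(n) x(n+l) for a real finite-support signal,
    # over lags l = -(N-1), ..., N-1 (the "biased" autocorrelation)
    N = len(x)
    return {l: sum(x[n] * x[n + l] for n in range(max(0, -l), min(N, N - l)))
            for l in range(-(N - 1), N)}

def acorr_unbiased(x):
    # Divide out the implicit Bartlett (triangular) window N - |l|
    N = len(x)
    return {l: v / (N - abs(l)) for l, v in acorr_biased(x).items()}

x = [1.0, 1.0, 1.0, 1.0]
rb = acorr_biased(x)
ru = acorr_unbiased(x)
print([rb[l] for l in range(-3, 4)])  # [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
print([ru[l] for l in range(-3, 4)])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

For the constant test signal, the biased autocorrelation traces out the Bartlett triangle itself, while the unbiased version is flat at 1, as expected.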

To avoid terminology confusion below, remember that the ``autocorrelation'' of a signal $ x$ is defined here (and in §2.3.7) to mean the maximally simplified case $ x\star x = \sum_n \overline{x(n)}x(n+l)$ , i.e., without normalization of any kind. This definition of ``autocorrelation'' is adopted to correspond to everyday practice in digital signal processing. The term ``sample autocorrelation'', on the other hand, will refer to an unbiased autocorrelation estimate. Thus, the ``autocorrelation'' $ x\star x$ of a signal $ x$ can be viewed as a Bartlett-windowed (unbiased-)sample-autocorrelation. In the frequency domain, the autocorrelation transforms to the magnitude-squared Fourier transform, and the sample autocorrelation transforms to the sample power spectral density.

Introduction to Noise

Why Analyze Noise?

An example application of noise spectral analysis is denoising, in which noise is to be removed from some recording. On magnetic tape, for example, ``tape hiss'' is well modeled mathematically as a noise process. If we know the noise level in each frequency band (its power level), we can construct time-varying band gains to suppress the noise when it is audible. That is, the gain in each band is close to 1 when the music is louder than the noise, and close to 0 when the noise is louder than the music. Since tape hiss is well modeled as stationary (constant in nature over time), we can estimate the noise level during periods of ``silence'' on the tape.

Another application of noise spectral analysis is spectral modeling synthesis (the subject of §10.4). In this sound modeling technique, sinusoidal peaks are measured and removed from each frame of a short-time Fourier transform (sequence of FFTs over time). The remaining signal energy, whatever it may be, is defined as ``noise'' and resynthesized using white noise through a filter determined by the upper spectral envelope of the ``noise floor''.


What is Noise?

Consider the spectrum analysis of the following sequence:

x = [-1.55, -1.35, -0.33, -0.93, 0.39, 0.45, -0.45, -1.98]
In the absence of any other information, this is just a list of numbers. It could be temperature fluctuations in some location from one day to the next, or it could be some normalization of successive samples from a music CD. There is no way to know if the numbers are ``random'' or just ``complicated''.7.2 More than a century ago, before the dawn of quantum mechanics in physics, it was thought that there was no such thing as true randomness--given the positions and momenta of all particles, the future could be predicted exactly; now, ``probability'' is a fundamental component of all elementary particle interactions in the Standard Model of physics [59].

It so happens that, in the example above, the numbers were generated by the randn function in matlab, thereby simulating normally distributed random variables with unit variance. However, this cannot be definitively inferred from a finite list of numbers. The best we can do is estimate the likelihood that these numbers were generated according to some normal distribution. The point here is that any such analysis of noise imposes the assumption that the noise data were generated by some ``random'' process. This turns out to be a very effective model for many kinds of physical processes such as thermal motions or sounds from turbulent flow. However, we should always keep in mind that any analysis we perform is carried out in terms of some underlying signal model which represents assumptions we are making regarding the nature of the data. Ultimately, we are fitting models to data.

We will consider only one type of noise: the stationary stochastic process (defined in Appendix C). All such noises can be created by passing white noise through a linear time-invariant (stable) filter [263]. Thus, for purposes of this book, the term noise always means ``filtered white noise''.


Spectral Characteristics of Noise

As we know, the spectrum $ X(\omega)$ of a time series $ x(n)$ has both a magnitude $ \left\vert X(\omega)\right\vert$ and a phase $ \angle{X(\omega)}$ . The phase of the spectrum gives information about when the signal occurred in time. For example, if the phase is predominantly linear with slope $ -\tau$ , then the signal must have a prominent pulse, onset, or other transient, at time $ \tau$ in the time domain.

For stationary noise signals, the spectral phase is simply random, and therefore devoid of information. This happens because stationary noise signals, by definition, cannot have special ``events'' at certain times (other than their usual random fluctuations). Thus, an important difference between the spectra of deterministic signals (like sinusoids) and noise signals is that the concept of phase is meaningless for noise signals. Therefore, when we Fourier analyze a noise sequence $ x(n)$ , we will always eliminate phase information by working with $ \left\vert X(\omega)\right\vert^2$ in the frequency domain (the squared-magnitude Fourier transform), where $ X(\omega) \isdeftext \hbox{\sc DTFT}_\omega(x)$ .


White Noise

White noise may be defined as a sequence of uncorrelated random values, where correlation is defined in Appendix C and discussed further below. Perceptually, white noise is a wideband ``hiss'' in which all frequencies are equally likely. In Matlab or Octave, band-limited white noise can be generated using the rand or randn functions:

y = randn(1,100); % 100 samples of Gaussian white noise
                  % with zero mean and unit variance

x = rand(1,100);  % 100 white noise samples,
                  %   uniform between 0 and 1.
xn = 2*(x-0.5);   % Make it uniform between -1 and +1
True white noise is obtained in the limit as the sampling rate goes to infinity and as time goes to plus and minus infinity. In other words, we never work with true white noise, but rather a finite time-segment from a white noise which has been band-limited to less than half the sampling rate and sampled.

In making white noise, it doesn't matter how the amplitude values are distributed probabilistically (although that amplitude-distribution must be the same for each sample--otherwise the noise sequence would not be stationary, i.e., its statistics would be time-varying, which we exclude here). In other words, the relative probability of different amplitudes at any single sample instant does not affect whiteness, provided there is some zero-mean distribution of amplitude. It only matters that successive samples of the sequence are uncorrelated. Further discussion regarding white noise appears in §C.3.

Testing for White Noise

To test whether a set of samples can be well modeled as white noise, we may compute its sample autocorrelation and verify that it approaches an impulse in the limit as the number of samples becomes large; this is another way of saying that successive noise samples are uncorrelated. Equivalently, we may break the set of samples into successive blocks across time, take an FFT of each block, and average their squared magnitudes; if the resulting average magnitude spectrum is flat, then the set of samples looks like white noise. In the following sections, we will describe these steps in further detail, culminating in Welch's method for noise spectrum analysis, summarized in §6.9.


Sample Autocorrelation

The sample autocorrelation of a sequence $ v(n)$ , $ n=0,1,2,\ldots,N-1$ may be defined by

$\displaystyle \hat{r}_{v,N}(l) \isdef \frac{1}{N-\vert l\vert} \sum_{n=0}^{N-1}\overline{v(n)}v(n+l), \quad l=0,\pm1,\pm2, \ldots, \pm (N-1), \protect$ (7.6)

where $ v(n)$ is defined as zero outside of the range $ n\in[0,N-1]$ . (Note that this differs from the usual definition of indexing modulo $ N$ for the DFT.) In more explicit detail, (7.6) can be written out as

$\displaystyle \left\{\begin{array}{ll} \frac{1}{N-l}\sum_{n=0}^{N-1-l}\overline{v(n)}v(n+l), & l=0,1,2,\ldots,N-1 \\ [5pt] \frac{1}{N+l}\sum_{n=-l}^{N-1}\overline{v(n)}v(n+l), & l=-1,-2,\ldots,-N+1 \\ \end{array} \right. \protect$ (7.7)

and zero for $ \left\vert l\right\vert\geq N$ .

In matlab, the sample autocorrelation of a vector x can be computed using the xcorr function.7.3

Example:

octave:1> xcorr([1 1 1 1], 'unbiased')
ans =
  1   1   1   1   1   1   1
The xcorr function also performs cross-correlation when given a second signal argument, and offers additional features with additional arguments. Say help xcorr for details.

Note that $ \hat{r}_{v,N}(l)$ is the average of the lagged product $ \overline{v(n)}v(n+l)$ over all available data. For white noise, this average approaches zero for $ l\neq0$ as the number of terms in the average increases. That is, we must have

$\displaystyle \hat{r}_{v,N}(l) \approx \left\{\begin{array}{ll} \hat{\sigma}_{v,N}^2, & l=0 \\ [5pt] 0, & l\neq 0 \\ \end{array} \right. \isdef \hat{\sigma}_{v,N}^2 \delta(l)$ (7.8)

where

$\displaystyle \hat{\sigma}_{v,N}^2 \isdef \frac{1}{N}\sum_{n=0}^{N-1} \left\vert v(n)\right\vert^2$ (7.9)

is defined as the sample variance of $ v$ .7.4

The plot in the upper left corner of Fig.6.1 shows the sample autocorrelation obtained for 32 samples of pseudorandom numbers (synthetic random numbers). (For reasons to be discussed below, the sample autocorrelation has been multiplied by a Bartlett (triangular) window.) Proceeding down the column on the left, the results of averaging many such sample autocorrelations can be seen. It is clear that the average sample autocorrelation function is approaching an impulse, as desired by definition for white noise. (The right column shows the Fourier transform of each sample autocorrelation function, which is a smoothed estimate of the power spectral density, as discussed in §6.6 below.)

Figure 6.1: Averaged sample autocorrelations and their Fourier transforms.
\includegraphics[width=\twidth]{eps/twhite}

For stationary stochastic processes $ v(n)$ , the sample autocorrelation function $ \hat{r}_{v,N}(l)$ approaches the true autocorrelation function $ r_v(l)$ in the limit as the number of observed samples $ N$ goes to infinity, i.e.,

$\displaystyle \lim_{N\to\infty} \hat{r}_{v,N}(l) = r_v(l).$ (7.10)

The true autocorrelation function of a random process is defined in Appendix C. For our purposes here, however, the above limit can be taken as the definition of the true autocorrelation function for the noise sequence $ v(n)$ .

At lag $ l=0$ , the autocorrelation function of a zero-mean random process $ v(n)$ reduces to the variance:

$\displaystyle r_v(0) \isdef \lim_{N\to\infty}\frac{1}{N}\sum_{m=0}^{N-1} \left\vert v(m)\right\vert^2 = \sigma_v^2$

The variance can also be called the average power or mean square. The square root $ \sigma_v$ of the variance is called the standard deviation or root mean square (RMS).


Sample Power Spectral Density

The Fourier transform of the sample autocorrelation function $ \hat{r}_{v,N}$ (see (7.6)) is defined as the sample power spectral density (PSD):

$\displaystyle {\hat S}_{v,N}(\omega) \isdef \hbox{\sc DTFT}_\omega\{\hat{r}_{v,N}\} \isdef \sum_{n=-\infty}^\infty \hat{r}_{v,N}(n)e^{-j\omega n}$ (7.11)

We'll accept this as the definition of the sample PSD. The classical periodogram (introduced below) corresponds instead to the Bartlett-windowed (biased) form of $ \hat{r}_{v,N}$ .

Similarly, the true power spectral density of a stationary stochastic process $ v(n)$ is given by the Fourier transform of the true autocorrelation function $ r_v(l)$ , i.e.,

$\displaystyle S_{v}(\omega) = \hbox{\sc DTFT}_\omega\{r_v\}.$ (7.12)

For real signals, the autocorrelation function is always real and even, and therefore the power spectral density is real and even for all real signals. The area under the PSD over a narrow frequency interval, $ S_v(\omega)\cdot\Delta\omega$ , gives the contribution to the variance of $ v(n)$ from the interval $ [\omega,\omega+\Delta\omega]$ . The total integral of the PSD gives the total variance:

$\displaystyle \int_{-\pi}^\pi S_v(\omega) \frac{d\omega}{2\pi} = \int_{-0.5}^{0.5} S_v(2\pi f) df = r_v(0) = \sigma_v^2,$ (7.13)

again assuming $ v(n)$ is zero mean.7.5

Since the sample autocorrelation of white noise approaches an impulse, its PSD approaches a constant, as can be seen in Fig.6.1. This means that white noise contains all frequencies in equal amounts. Since white light is defined as light of all colors in equal amounts, the term ``white noise'' is seen to be analogous.
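The variance relation (7.13) has an exact discrete counterpart that can be verified directly: the average of the periodogram bins equals the sample mean square, by Parseval's theorem. A Python sketch (illustrative; the length and seed are arbitrary):

```python
import cmath, math, random

random.seed(1)
N = 64
v = [random.gauss(0.0, 1.0) for _ in range(N)]

# Periodogram bins: sample PSD values at the DFT frequencies omega_k
X = [sum(v[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
     for k in range(N)]
psd = [abs(Xk)**2 / N for Xk in X]

# Discrete counterpart of the PSD integral: the average of the PSD bins
# equals the sample mean square of v (Parseval's theorem), exactly.
mean_psd = sum(psd) / N
mean_sq = sum(x * x for x in v) / N
print(abs(mean_psd - mean_sq) < 1e-9)   # True
```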


Biased Sample Autocorrelation

The sample autocorrelation defined in (7.6) is not quite the same as the autocorrelation function for infinitely long discrete-time sequences defined in §2.3.6, viz.,

$\displaystyle (v\star v)_l \isdef \sum_{n=-\infty}^{\infty} \overline{v(n)} v(n+l) \;\longleftrightarrow\; \left\vert V(\omega)\right\vert^2 \protect$ (7.14)

where the signal $ v(n)$ is assumed to be of finite support (nonzero over a finite range of samples), and $ V(\omega)$ is the DTFT of $ v$ . The advantage of the definition of $ v\star v$ is that there is a simple Fourier theorem associated with it. The disadvantage is that it is biased as an estimate of the statistical autocorrelation. The bias can be removed, however, since

$\displaystyle \hat{r}_{v,N}(l) \isdef \frac{1}{N-\vert l\vert} (v\star v)(l), \quad \vert l\vert<N. \protect$ (7.15)

Thus, $ v\star v$ can be seen as a Bartlett-windowed sample autocorrelation:

$\displaystyle (v\star v)(l) = \left\{\begin{array}{ll} (N-\left\vert l\right\vert) \hat{r}_{v,N}(l), & l=0,\pm1,\pm2,\ldots,\pm (N-1) \\ [5pt] 0, & \vert l\vert\geq N. \\ \end{array} \right. \protect$ (7.16)

It is common in practice to retain the implicit Bartlett (triangular) weighting in the sample autocorrelation. It merely corresponds to smoothing of the power spectrum (or cross-spectrum) with the $ \hbox{asinc}^2$ kernel, and smoothing is necessary anyway for statistical stability. It also down-weights the less reliable large-lag estimates, weighting each lag by the number of lagged products that were summed, which seems natural.

The left column of Fig.6.1 in fact shows the Bartlett-biased sample autocorrelation. When the bias is removed, the autocorrelation appears noisier at higher lags (near the endpoints of the plot).


Smoothed Power Spectral Density

The DTFT of the Bartlett (triangular) window weighting in (7.16) is given by

$\displaystyle N^2\cdot\hbox{asinc}_N^2(\omega)\isdef \left[\frac{\sin(N\omega/2)}{\sin(\omega/2)}\right]^2,$ (7.17)

where $ N$ is again the number of samples of $ v(n)$ . We see that $ \left\vert V(\omega)\right\vert^2$ equals the sample power spectral density convolved with $ N^2\cdot\hbox{asinc}_N^2(\omega)$ , or

$\displaystyle \left\vert V(\omega)\right\vert^2 = N^2\cdot \hbox{asinc}_N^2 \ast {\hat S}_{v,N}(\omega).$ (7.18)

It turns out that even more smoothing than this is essential for obtaining a stable estimate of the true PSD, as discussed further in §6.11 below.

Since the Bartlett window has no effect on an impulse signal (other than a possible overall scaling), we may use the biased autocorrelation (7.14) in place of the unbiased autocorrelation (7.15) for the purpose of testing for white noise.

The right column of Fig.6.1 shows successively greater averaging of the Bartlett-smoothed sample PSD.


Cyclic Autocorrelation

For sequences of length $ N$ , the cyclic autocorrelation operator is defined by

$\displaystyle (v\star v)_l \isdefs \sum_{n=0}^{N-1} \overline{v(n)} v(n+l) \;\longleftrightarrow\;\left\vert V(\omega_k)\right\vert^2, \quad k=0,1,2,\ldots,N-1,$ (7.19)

where $ \omega_k\isdef 2\pi k/N$ and the index $ n+l$ is interpreted modulo $ N$ .

By using zero padding by a factor of 2 or more, cyclic autocorrelation also implements acyclic autocorrelation as defined in (7.14).

An unbiased cyclic autocorrelation is obtained, in the zero-mean case, by simply normalizing $ v\star v$ by the number of terms in the sum:

$\displaystyle \hat{r}_v(l) = \frac{1}{N}(v\star v)_l$ (7.20)


Practical Bottom Line

Since we must use the DFT in practice, preferring an FFT for speed, we typically compute the sample autocorrelation function for a length $ M$ sequence $ v(n)$ , $ n=0,1,2,\ldots,M-1$ as follows:

  1. Choose the FFT size $ N$ to be a power of 2 providing at least $ M-1$ samples of zero padding ( $ N\geq 2M-1$ ):

    $\displaystyle x \isdef [v(0),v(1),\ldots,v(M-1), \underbrace{0,\ldots,0}_{\hbox{$N-M$}}].$ (7.21)

  2. Perform a length $ N$ FFT to get $ X(\omega_k)=\hbox{\sc FFT}_k(x)$ .
  3. Compute the squared magnitude $ \left\vert X(\omega_k)\right\vert^2$ .
  4. Compute the inverse FFT to get $ (x\star x)(l)$ , $ l=0,\ldots,N-1$ .
  5. Remove the bias, if desired, by inverting the implicit Bartlett-window weighting to get

    $\displaystyle \hat{r}_{v,M}(l) \isdef \left\{\begin{array}{ll} \frac{1}{M-\vert l\vert} (x\star x)(l), & l=0,\,\pm1,\,\pm2,\,\ldots,\,\pm (M-1)\;\mbox{(mod $N$)} \\ [5pt] 0, & \vert l\vert\geq M\; \mbox{(mod $N$)}. \\ \end{array} \right.$ (7.22)

Often the sample mean (average value) of the $ M$ samples of $ v(n)$ is removed prior to taking an FFT. Some implementations also detrend the data, which means removing any linear ``tilt'' in the data.7.6
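The steps above can be sketched in Python as follows (illustrative only; a direct DFT stands in for the FFT, and the function names are made up here):

```python
import cmath, math

def dft(x, inverse=False):
    # Direct DFT/IDFT (an FFT would be used in practice for speed)
    N, s = len(x), (1 if inverse else -1)
    out = [sum(x[n] * cmath.exp(s * 2j * math.pi * n * k / N)
               for n in range(N)) for k in range(N)]
    return [o / N for o in out] if inverse else out

def sample_autocorr(v):
    M = len(v)
    N = 1 << (2 * M - 1).bit_length()        # power of 2, N >= 2M-1
    x = v + [0.0] * (N - M)                  # step 1: zero pad
    X = dft(x)                               # step 2: FFT
    S = [abs(Xk)**2 for Xk in X]             # step 3: squared magnitude
    xx = [c.real for c in dft(S, inverse=True)]  # step 4: inverse FFT
    # step 5: remove the bias by dividing out M - |l|
    # (negative lags wrap to the end of the length-N buffer)
    return {l: xx[l % N] / (M - abs(l)) for l in range(-(M - 1), M)}

r = sample_autocorr([1.0, 1.0, 1.0, 1.0])
print([round(r[l], 6) for l in range(-3, 4)])  # all ones, as with xcorr 'unbiased'
```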

It is important to note that the sample autocorrelation is itself a stochastic process. To stably estimate a true autocorrelation function, or its Fourier transform the power spectral density, many sample autocorrelations (or squared-magnitude FFTs) must be averaged together, as discussed in §6.12 below.


Why an Impulse is Not White Noise

Given the test for white noise that its sample autocorrelation must approach an impulse in the limit, one might suppose that the impulse signal $ \delta(n)$ is technically ``white noise'', because its sample autocorrelation function is a perfect impulse. However, the impulse signal fails the test of being stationary. That is, its statistics are not the same at every time instant. Instead, we classify an impulse as a deterministic signal. What is true is that the impulse signal is the deterministic counterpart of white noise. Both signals contain all frequencies in equal amounts. We will see that all approaches to noise spectrum analysis (that we will consider) effectively replace noise by its autocorrelation function, thereby converting it to deterministic form. The impulse signal is already deterministic.

We can modify our white-noise test to exclude obviously nonstationary signals by dividing the signal under analysis into $ K$ blocks and computing the sample autocorrelation in each block. The final sample autocorrelation is defined as the average of the block sample autocorrelations. However, we can also test to see that the blocks are sufficiently ``comparable''. A precise definition of ``comparable'' will need to wait, but intuitively, we expect that the larger the block size (the more averaging of lagged products within each block), the more nearly identical the results for each block should be. For the impulse signal, the first block gives an ideal impulse for the sample autocorrelation, while all other blocks give the zero signal. The impulse will therefore be declared nonstationary under any reasonable definition of what it means to be ``comparable'' from block to block.
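A crude block comparison suffices to catch the impulse. The sketch below (Python; block sizes and the 0.5 tolerance are arbitrary illustrative choices) compares the zero-lag sample autocorrelation, i.e., the mean square, across blocks:

```python
import random

def block_zero_lag(x, M):
    # r-hat(0) of each length-M block: the block's mean square
    return [sum(s * s for s in x[i:i + M]) / M for i in range(0, len(x), M)]

impulse = [1.0] + [0.0] * 31
print(block_zero_lag(impulse, 8))        # [0.125, 0.0, 0.0, 0.0]

random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
powers = block_zero_lag(noise, 1024)
# Stationary white noise: every block gives a comparable value near 1.0.
# The impulse: its blocks wildly disagree, so it is declared nonstationary.
print(max(powers) - min(powers) < 0.5)   # True
```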


The Periodogram

The periodogram is based on the definition of the power spectral density (PSD) (see Appendix C). Let $ x_w(n) = w(n)x(n)$ denote a windowed segment of samples from a random process $ x(n)$ , where the window function $ w$ (classically the rectangular window) contains $ M$ nonzero samples. Then the periodogram is defined as the squared-magnitude DTFT of $ x_w$ divided by $ M$ [120, p. 65]:7.7

$\displaystyle P_{x,M}(\omega) \isdef \frac{1}{M} \left\vert\hbox{\sc DTFT}_\omega(x_w)\right\vert^2 = \frac{1}{M} \left\vert\sum_{n=0}^{M-1} x_w(n)e^{-j\omega n}\right\vert^2 \;\longleftrightarrow\; \frac{1}{M} \sum_{n=0}^{M-1-\vert l\vert} x_w(n)x_w(n+\vert l\vert), \quad l\in{\bf Z}. \protect$ (7.23)

In the limit as $ M$ goes to infinity, the expected value of the periodogram equals the true power spectral density of the noise process $ x(n)$ . This is expressed by writing

$\displaystyle \lim_{M\to\infty} {\cal E}\left\{ P_{x,M}(\omega) \right\} = S_x(\omega)$ (7.24)

where $ S_x(\omega)$ denotes the power spectral density (PSD) of $ x$ . (``Expected value'' is defined in Appendix C.)

In terms of the sample PSD defined in §6.7, we have

$\displaystyle P_{x,M}(\omega) = M \hbox{asinc}_M^2 \ast {\hat S}_{x,M}(\omega).$ (7.25)

That is, the periodogram is equal to the smoothed sample PSD. In the time domain, the autocorrelation function corresponding to the periodogram is Bartlett windowed.

In practice, we of course compute a sampled periodogram $ P_{x,M}(\omega_k)$ , $ \omega_k = 2\pi k/N$ , replacing the DTFT with the length $ N\ge M$ FFT. Essentially, the steps of §6.9 include computation of the periodogram.

As mentioned in §6.9, a problem with the periodogram of noise signals is that it too is random for most purposes. That is, while the noise has been split into bands by the Fourier transform, it has not been averaged in any way that reduces randomness, and each band produces a nearly independent random value. In fact, it can be shown [120] that $ P_{x,M}(\omega)$ is a random variable whose standard deviation (square root of its variance) is comparable to its mean.

In principle, we should be able to recover from $ P_{x,M}(\omega)$ a filter $ H_x(\omega)$ which, when used to filter white noise, creates a noise indistinguishable statistically from the observed sequence $ x(n)$ . However, the DTFT is evidently useless for this purpose. How do we proceed?

The trick to noise spectrum analysis is that many sample power spectra (squared-magnitude FFTs) must be averaged to obtain a ``stable'' statistical estimate of the noise spectral envelope. This is the essence of Welch's method for spectrum analysis of stochastic processes, as elaborated in §6.12 below. The right column of Fig.6.1 illustrates the effect of this averaging for white noise.

Matlab for the Periodogram

Octave and the Matlab Signal Processing Toolbox have a periodogram function. Matlab for computing a periodogram of white noise is given below (see top-right plot in Fig.6.1):

M = 32;
v = randn(M,1);       % white noise
V = abs(fft(v)).^2/M; % periodogram


Welch's Method

Welch's method [296] (also called the periodogram method) for estimating power spectra is carried out by dividing the time signal into successive blocks, forming the periodogram for each block, and averaging.

Denote the $ m$ th windowed, zero-padded frame from the signal $ x$ by

$\displaystyle x_m(n)\isdef w(n)x(n+mR), \quad n=0,1,\ldots,M-1,\; m=0,1,\ldots,K-1,$ (7.26)

where $ R$ is defined as the window hop size, and let $ K$ denote the number of available frames. Then the periodogram of the $ m$ th block is given by

$\displaystyle P_{x_m,M}(\omega_k) = \frac{1}{M}\left\vert\hbox{\sc FFT}_{N,k}(x_m)\right\vert^2 \isdef \frac{1}{M}\left\vert\sum_{n=0}^{N-1} x_m(n) e^{-j2\pi nk/N}\right\vert^2 \protect$

as before, and the Welch estimate of the power spectral density is given by

$\displaystyle {\hat S}_x^W(\omega_k) \isdef \frac{1}{K}\sum_{m=0}^{K-1}P_{x_m,M}(\omega_k). \protect$ (7.27)

In other words, it's just an average of periodograms across time. When $ w(n)$ is the rectangular window, the periodograms are formed from non-overlapping successive blocks of data. For other window types, the analysis frames typically overlap, as discussed further in §6.13 below.

Welch Autocorrelation Estimate

Since $ \left\vert X_m\right\vert^2\;\longleftrightarrow\;x_m\star x_m$ is a circular (cyclic) correlation, we must use zero padding in each FFT in order to be able to compute the acyclic autocorrelation function as the inverse DFT of the Welch PSD estimate. There is no need to arrange the zero padding in zero-phase form, since all phase information is discarded when the magnitude squared operation is performed in the frequency domain.

The Welch autocorrelation estimate is biased. That is, as discussed in §6.6, it converges as $ K\to\infty$ to the true autocorrelation $ r_x(l)$ weighted by the Bartlett window $ (M-\vert l\vert)/M$ . The bias can be removed by simply dividing out this window, as in (7.15).
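The Welch estimate (7.27) can be sketched in a few lines of Python (illustrative only; rectangular window and no zero padding, so only the PSD is produced, and the function and parameter names are made up here):

```python
import cmath, math, random

def welch_psd(x, M, R, window=None):
    # Average of per-frame periodograms; rectangular window by default.
    w = window or [1.0] * M
    K = (len(x) - M) // R + 1            # number of complete frames
    S = [0.0] * M
    for m in range(K):
        xm = [w[n] * x[m * R + n] for n in range(M)]
        for k in range(M):
            Xk = sum(xm[n] * cmath.exp(-2j * math.pi * n * k / M)
                     for n in range(M))
            S[k] += abs(Xk)**2 / M / K   # accumulate periodogram average
    return S

random.seed(3)
x = [random.gauss(0.0, 1.0) for _ in range(2048)]
S = welch_psd(x, M=64, R=64)             # non-overlapping rectangular frames
print(sum(S) / len(S))                   # close to 1.0 for this white noise
```

To recover the acyclic Welch autocorrelation as discussed above, each frame would additionally be zero padded to at least twice its length before the FFT.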


Resolution versus Stability

A fundamental trade-off exists in Welch's method between spectral resolution and statistical stability. As discussed in §5.4.1, we wish to maximize the block size $ M$ in order to maximize spectral resolution. On the other hand, more blocks (larger $ K$ ) gives more averaging and hence greater spectral stability. A typical default choice is $ M\approx K\approx \sqrt{N_x}$ , where $ N_x$ denotes the number of available data samples.


Welch's Method with Windows

As usual, the purpose of the window function $ w(n)$ (Chapter 3) is to reduce side-lobe level in the spectral density estimate, at the expense of frequency resolution, exactly as in the case of sinusoidal spectrum analysis.

When using a non-rectangular analysis window, the window hop-size $ R$ cannot exceed half the frame length $ M/2$ without introducing a non-uniform sensitivity to the data $ x(n)$ over time. In the rectangular window case, we can set $ R=M$ and have non-overlapping windows. For Hamming, Hanning, and any other generalized Hamming window, one normally sees $ R=(M-1)/2$ for odd-length windows. For the Blackman window, $ R\approx M/3$ is typical. In general, the hop size $ R$ is chosen so that the analysis window $ w$ overlaps and adds to a constant at that hop size. This consideration is explored more fully in Chapter 8. An equivalent parameter used by most matlab implementations is the overlap parameter $ M-R$ .

Note that the number of blocks averaged in (7.27) increases as $ R$ decreases. If $ N_x\geq M$ denotes the total number of signal samples available, then the number of complete blocks available for averaging may be computed as

$\displaystyle K = \left\lfloor \frac{N_x-M}{R}\right\rfloor + 1,$ (7.28)

where the floor function $ \left\lfloor x\right\rfloor $ denotes the largest integer less than or equal to $ x$ .


Matlab for Welch's Method

Octave and the Matlab Signal Processing Toolbox have a pwelch function. Note that these functions also provide confidence intervals (not covered here). Matlab for generating the data shown in Fig.6.1 (employing Welch's method) appears in Fig.6.2.

Figure: Matlab listing for generating data plotted in Fig.6.1.

 
M = 32;
Ks=[1 8 32 128];
nkases = length(Ks);
for kase = 1:nkases
  K = Ks(kase);
  N = M*K;
  Nfft = 2*M; % zero pad for acyclic autocorrelation
  Sv = zeros(Nfft,1); % PSD "accumulator"
  for m=1:K
    v = randn(M,1);  % noise sample
    V = fft(v,Nfft);
    Vms = abs(V).^2; % same as conj(V) .* V
    Sv = Sv + Vms;   % sum scaled periodograms
  end
  Sv = Sv/K;     % average of all scaled periodograms
  rv = ifft(Sv); % Average Bartlett-windowed sample autocor.
  rvup = [rv(Nfft-M+1:Nfft)',rv(1:M)']; % unpack FFT
  rvup = rvup/M; % Normalize for no bias at lag 0
end


Filtered White Noise

When a white-noise sequence is filtered, successive samples generally become correlated.7.8 Some filtered-white-noise signals have standard names, such as pink noise (PSD rolling off as $ 1/f$ ) and brown noise ( $ 1/f^2$ ).

More generally, filtered white noise can be termed colored noise or correlated noise. As long as the filter is linear and time-invariant (LTI), and strictly stable (poles inside and not on the unit circle of the $ z$ plane), its output will be a stationary ``colored noise''. We will only consider stochastic processes of this nature.

In the preceding sections, we have looked at two ways of analyzing noise: the sample autocorrelation function in the time or ``lag'' domain, and the sample power spectral density (PSD) in the frequency domain. We now look at these two representations for the case of filtered noise.

Let $ x(n)$ denote a length $ N$ sequence we wish to analyze. Then the Bartlett-windowed acyclic sample autocorrelation of $ x$ is $ x\star x$ , and the corresponding smoothed sample PSD is $ \left\vert X(\omega)\right\vert^2$6.7, §2.3.6).

For filtered white noise, we can write $ x$ as a convolution of white noise $ v$ and some impulse response $ h$ :

$\displaystyle x(n) = (h\ast v)(n) \isdef \sum_{m=-\infty}^\infty v(m)h(n-m)$ (7.29)

The DTFT of $ x$ is then, by the convolution theorem (§2.3.5),

$\displaystyle X(\omega) = H(\omega)V(\omega)$ (7.30)

so that

\begin{eqnarray*}
x\star x &\longleftrightarrow&
\left\vert X(\omega)\right\vert^2
= \left\vert H(\omega)V(\omega)\right\vert^2
= \left\vert H(\omega)\right\vert^2\left\vert V(\omega)\right\vert^2\\
&\longleftrightarrow&
(h\star h)\ast (v\star v) \propto h\star h,
\end{eqnarray*}

since $ v\star v \propto \sigma_v^2\delta$ for white noise. Thus, we have derived that the autocorrelation of filtered white noise is proportional to the autocorrelation of the impulse response times the variance of the driving white noise.

Let's try to pin this down more precisely and find the proportionality constant. As the number $ M$ of observed samples of $ x(n) = (h\ast
v)(n)$ goes to infinity, the length-$ M$ Bartlett-window bias $ M-\vert l\vert$ in the autocorrelation $ x\star x$ converges to a constant scale factor $ M$ at lags such that $ M\gg \vert l\vert$ . Therefore, the unbiased autocorrelation can be expressed as

$\displaystyle \hat{r}_x(l) = \frac{1}{M-\vert l\vert} x\star x \;\to\; \frac{1}{M} x \star x, \quad \hbox{($M\gg \vert l\vert$)}.$ (7.31)

In the limit, we obtain

$\displaystyle \lim_{M\to\infty} \frac{1}{M} x \star x = r_x(l)$ (7.32)

In the frequency domain we therefore have

\begin{eqnarray*}
S_x(\omega) &=&
\lim_{M\to \infty}\frac{1}{M}\vert X(\omega)\vert^2 \;=\;
% = \frac{1}{M}\vert H(\omega)\,V(\omega)\vert^2
\vert H(\omega)\vert^2 \cdot \lim_{M\to \infty} \frac{\vert V(\omega)\vert^2}{M} \\ [5pt]
&=&
\vert H(\omega)\vert^2 S_v(\omega) \;=\;
\vert H(\omega)\vert^2\sigma_v^2 \;\longleftrightarrow\;
(h\star h) \sigma_v^2 .
\end{eqnarray*}

In summary, the autocorrelation of filtered white noise $ x=h\ast v$ is

$\displaystyle \zbox {r_x(l) = \sigma_v^2\cdot(h\star h)(l) \;\longleftrightarrow\;\sigma_v^2 \left\vert H(\omega)\right\vert^2}$ (7.33)

where $ \sigma_v^2$ is the variance of the driving white noise.

In words, the true autocorrelation of filtered white noise equals the autocorrelation of the filter's impulse response times the white-noise variance. (The filter is of course assumed LTI and stable.) In the frequency domain, we have that the true power spectral density of filtered white noise is the squared-magnitude frequency response of the filter scaled by the white-noise variance.

For a finite number of observed samples of a filtered white-noise process, the sample autocorrelation of filtered white noise is given by the autocorrelation of the filter's impulse response convolved with the sample autocorrelation of the driving white-noise sequence. For lags $ l$ much smaller than the number of observed samples $ M$ , the driver's sample autocorrelation approaches an impulse scaled by the white-noise variance. In the frequency domain, the sample PSD of filtered white noise is the squared-magnitude frequency response of the filter $ \vert H(\omega)\vert^2$ scaled by a sample PSD of the driving noise.

We reiterate that every stationary random process may be defined, for our purposes, as filtered white noise.7.9 As we see from the above, all correlation information is embodied in the filter used.

Example: FIR-Filtered White Noise

Let's estimate the autocorrelation and power spectral density of the ``moving average'' (MA) process

$\displaystyle x(n) = v(n) + v(n-1) + \cdots + v(n-7)$ (7.34)

where $ v(n)$ is unit-variance white noise.

Since $ h = [1,1,1,1,1,1,1,1]$ ,

$\displaystyle h\star h = [8,7,6,5,4,3,2,1,0,\ldots]$ (7.35)

for nonnegative lags ($ l\ge0$ ). More completely, we can write

$\displaystyle (h\star h)(l) = \left\{\begin{array}{ll} 8-\vert l\vert, & \vert l\vert<8 \\ [5pt] 0, & \vert l\vert\ge 8. \\ \end{array} \right.$ (7.36)

Thus, the autocorrelation of $ h$ is a triangular pulse centered on lag 0. The true (unbiased) autocorrelation is given by

$\displaystyle r_x(l) \isdef {\cal E}\{x(n)x(n+l)\} = \sigma_v^2 (h\star h)(l)$ (7.37)

The true power spectral density (PSD) is then

$\displaystyle \hbox{\sc DTFT}_\omega(h\star h) = 8^2\cdot\hbox{asinc}^2_{8}(\omega) = \frac{\sin^2(4\omega)}{\sin^2(0.5\omega)}$ (7.38)
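Both closed-form expressions are easy to check numerically. A small Python/NumPy sketch (my own, not from the text) confirms the triangular autocorrelation and the squared-asinc PSD at an arbitrary test frequency:

```python
import numpy as np

h = np.ones(8)                        # 8-tap moving-average impulse response
hh = np.correlate(h, h, mode="full")  # acyclic autocorrelation h (star) h, lags -7..7
# hh should be the triangular pulse [1,2,...,7,8,7,...,2,1]

# Check DTFT(h star h) = |H(w)|^2 = sin^2(4w) / sin^2(w/2) at a test frequency
w = 0.3
H = np.sum(h * np.exp(-1j * w * np.arange(8)))  # DTFT of h at w
lhs = np.abs(H) ** 2
rhs = np.sin(4 * w) ** 2 / np.sin(w / 2) ** 2
```

The two sides agree to machine precision, since $ 8\,\hbox{asinc}_8(\omega) = \sin(4\omega)/\sin(\omega/2)$ .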

Figure 6.3 shows a collection of measured autocorrelations together with their associated smoothed-PSD estimates.

Figure 6.3: Averaged sample autocorrelations (biased) and their Fourier transforms (smoothed PSD estimates), for FIR-filtered white noise.
\includegraphics[width=\twidth]{eps/tcolored}


Example: Synthesis of 1/F Noise (Pink Noise)

Pink noise7.10 or ``1/f noise'' is an interesting case because it occurs often in nature [294],7.11 is often preferred by composers of computer music, and there is no exact (rational, finite-order) filter which can produce it from white noise. This is because the ideal amplitude response of the filter must be proportional to the irrational function $ 1/\sqrt{f}$ , where $ f$ denotes frequency in Hz. However, it is easy enough to generate pink noise to any desired degree of approximation, including a perceptually exact one.

The following Matlab/Octave code generates pretty good pink noise:

Nx = 2^16;  % number of samples to synthesize
B = [0.049922035 -0.095993537 0.050612699 -0.004408786];
A = [1 -2.494956002   2.017265875  -0.522189400];
nT60 = round(log(1000)/(1-max(abs(roots(A))))); % T60 est.
v = randn(1,Nx+nT60); % Gaussian white noise: N(0,1)
x = filter(B,A,v);    % Apply 1/F roll-off to PSD
x = x(nT60+1:end);    % Skip transient response
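The claimed roll-off can be checked directly from the filter coefficients, without any noise at all: if $ \vert H(\omega)\vert^2\propto 1/\omega$ , then the squared magnitude should halve from one frequency to a frequency an octave higher. A Python/NumPy sketch (my own translation; the test frequency is an arbitrary choice within the filter's accurate band):

```python
import numpy as np

# Pink-filter coefficients from the Matlab listing above
B = np.array([0.049922035, -0.095993537, 0.050612699, -0.004408786])
A = np.array([1.0, -2.494956002, 2.017265875, -0.522189400])

def mag2(w):
    """Squared magnitude of B(z)/A(z) evaluated at z = e^{jw}."""
    z = np.exp(-1j * w * np.arange(4))
    return abs(np.dot(B, z) / np.dot(A, z)) ** 2

# Power ratio across one octave should be close to 2 (i.e., -3 dB/octave)
w = 0.05 * np.pi
ratio = mag2(w) / mag2(2 * w)
```

The ratio comes out very close to 2, confirming the $ 1/f$ shape of the PSD over this band.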

In the next section, we will analyze the noise produced by the above Matlab code and verify that its power spectrum rolls off at approximately 3 dB per octave.


Example: Pink Noise Analysis

Let's test the pink noise generation algorithm presented in §6.14.2. We might want to know, for example, does the power spectral density really roll off as $ 1/f$ ? Obviously such a shape cannot extend all the way to dc, so how far does it go? Does it go far enough to be declared ``perceptually equivalent'' to ideal 1/f noise? Can we get by with fewer bits in the filter coefficients? Questions like these can be answered by estimating the power spectral density of the noise generator output.

Figure 6.4 shows a single periodogram of the generated pink noise, and Figure 6.5 shows an averaged periodogram (Welch's method of smoothed power spectral density estimation). Also shown in each log-log plot is the true 1/f roll-off line. We see that indeed a single periodogram is quite random, although the overall trend is what we expect. The more stable smoothed PSD estimate from Welch's method (averaged periodograms) gives us much more confidence that the noise generator makes high quality 1/f noise.

Note that we do not have to test for stationarity in this example, because we know the signal was generated by LTI filtering of white noise. (We trust the randn function in Matlab and Octave to generate stationary white noise.)

Figure 6.4: Periodogram of pink noise.
\includegraphics[width=\twidth]{eps/noisepr}

Figure 6.5: Estimated power spectral density of pink noise.
\includegraphics[width=\twidth]{eps/noisep}


Processing Gain

A basic property of noise signals is that they add non-coherently. This means they sum on a power basis instead of an amplitude basis. Thus, for example, if you add two separate realizations of a random process together, the total energy rises by approximately 3 dB ( $ 10\log_{10}(2)$ ). In contrast to this, sinusoids and other deterministic signals can add coherently. For example, at the midpoint between two loudspeakers putting out identical signals, a sinusoidal signal is $ 6$ dB louder than the signal out of each loudspeaker alone (cf. $ 3$ dB for uncorrelated noise).

Coherent addition of sinusoids and noncoherent addition of noise can be used to obtain any desired signal to noise ratio in a spectrum analysis of sinusoids in noise. Specifically, for each doubling of the periodogram block size in Welch's method, the signal to noise ratio (SNR) increases by 3 dB (6 dB spectral amplitude increase for all sinusoids, minus 3 dB increase for the noise spectrum).

Consider a single complex sinusoid in white noise as introduced in (5.32):

$\displaystyle x(n) \isdef {\cal A}e^{j\omega_0 n} + v(n) \protect$ (7.39)

where $ {\cal A}= A e^{j\phi}$ is the complex amplitude. Then the length $ N$ DFT of the first block of $ x(n)$ is

\begin{eqnarray*}
X(\omega_k)
&=& \hbox{\sc DFT}_k(x) = \sum_{n=0}^{N-1}\left[{\cal A}e^{j\omega_0 n} + v(n)\right] e^{-j\omega_k n}\\
&=& {\cal A}\sum_{n=0}^{N-1}e^{j(\omega_0 -\omega_k) n} + \sum_{n=0}^{N-1}v(n) e^{-j\omega_k n}\\
&=& {\cal A}\frac{1-e^{j(\omega_0 -\omega_k)N}}{1-e^{j(\omega_0 -\omega_k)}} + V_N(\omega_k)\\
\end{eqnarray*}

For simplicity, let $ \omega_0 = \omega_l$ for some $ l$ . That is, suppose for now that $ \omega_0$ is one of the DFT frequencies $ 2\pi l/N$ , $ l=0,1,2,\ldots,N-1$ . Then

\begin{eqnarray*}
X(\omega_k) &=& \left\{\begin{array}{ll}
N{\cal A}+V_N(\omega_k), & k=l \\ [5pt]
V_N(\omega_k), & k\neq l \\
\end{array} \right.\\
&=& N{\cal A}\delta(k-l)+V_N(\omega_k)
\end{eqnarray*}

for $ k=0,1,\ldots,N-1$ . Squaring the absolute value gives

$\displaystyle \left\vert X(\omega_k)\right\vert^2 = N^2 A^2\delta(k-l) + \left\vert V_N(\omega_k)\right\vert^2 + 2\,\mbox{re}\left\{N{\cal A}\,\delta(k-l)\overline{V_N(\omega_k)}\right\}.$ (7.40)

Since $ v(n)$ is zero mean, so is $ V_N(\omega_k)$ for all $ k$ . Therefore, the cross term in Eq.(7.40) averages to zero, and the average over many length-$ N$ blocks will converge to

$\displaystyle {\cal E}\{\left\vert X(\omega_k)\right\vert^2\} = N^2 A^2\delta(k-l) + {\cal E}\{\left\vert V_N(\omega_k)\right\vert^2\}$ (7.41)

where $ {\cal E}\{\cdot\}$ denotes time averaging which, for stationary stochastic processes, is equivalent to taking the expected value (§C.1.6).

The final term can be expanded as

\begin{eqnarray*}
{\cal E}\{\left\vert V_N(\omega_k)\right\vert^2\}
&=&
{\cal E}\left\{\sum_{n=0}^{N-1}\overline{v(n)} e^{j\omega_k n} \sum_{m=0}^{N-1} v(m)e^{-j\omega_k m}\right\}\\
&=& \sum_{n=0}^{N-1}\sum_{m=0}^{N-1} {\cal E}\{\overline{v(n)}v(m)\} e^{j\omega_k(n-m)}\\
&=& \sum_{n=0}^{N-1} \sigma_v^2 = N\sigma_v^2
\end{eqnarray*}

since $ {\cal E}\{\overline{v(n)}v(m)\}=\sigma_v^2\delta(n-m)$ because $ v(n)$ is white noise.

In conclusion, we have derived that the average squared-magnitude DFT of $ N$ samples of a sinusoid in white noise is given by

$\displaystyle {\cal E}\{\left\vert X(\omega_k)\right\vert^2\} = N^2 A^2\delta(k-l) + N\sigma_v^2$ (7.42)

where $ A$ is the amplitude of the complex sinusoid, and $ \sigma_v^2$ is the variance (mean square) of the noise. We see that the signal to noise ratio is zero in every bin but the $ l$ th, and in that bin it is

$\displaystyle \hbox{\sc SNR}(l) = \frac{N^2 A^2}{N\sigma_v^2} = N\frac{A^2}{\sigma_v^2} .$ (7.43)

In the time domain, the mean square for the signal is $ A^2$ while the mean square for the noise is $ \sigma_v^2$ . Thus, the DFT gives a factor of $ N$ processing gain in the bin where the sinusoid falls. Each doubling of the DFT length adds 3 dB to the within-bin SNR. (Remember that we use $ 10\log_{10}$ for power ratios.)
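The $ N$ -fold processing gain is easy to see empirically by averaging periodograms of a complex sinusoid in white noise, as in Eq.(7.42). A Python/NumPy sketch (block count, seed, and bin choice are mine, for illustration):

```python
import numpy as np

N = 64                    # DFT length
l = 5                     # bin containing the sinusoid
A = 1.0                   # sinusoid amplitude (sigma_v^2 = 1)
rng = np.random.default_rng(1)

n = np.arange(N)
S = np.zeros(N)
nblocks = 400
for _ in range(nblocks):
    v = rng.standard_normal(N)                   # white noise, variance 1
    x = A * np.exp(2j * np.pi * l * n / N) + v   # complex sinusoid in noise
    S += np.abs(np.fft.fft(x)) ** 2
S /= nblocks                                     # averaged periodogram

peak = S[l]                          # ~ N^2 A^2 + N sigma_v^2
floor = np.mean(np.delete(S, l))     # ~ N sigma_v^2
snr = (peak - floor) / floor         # ~ N A^2 / sigma_v^2 = N
```

The measured within-bin SNR comes out near $ N=64$ , matching Eq.(7.43).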

Another way of viewing processing gain is to consider that the DFT performs a change of coordinates on the observations $ x$ such that all of the signal energy ``piles up'' in one coordinate $ X(\omega_l)$ , while the noise energy remains uniformly distributed among all of the coordinates.

A practical implication of the above discussion is that it is meaningless to quote a signal-to-noise ratio in the frequency domain without reporting the relevant bandwidth. In the above example, the SNR could be reported as $ N A^2/\sigma_v^2$ in a bandwidth of $ 2\pi/N$ radians per sample.

The above analysis also makes clear the effect of bandpass filtering on the signal-to-noise ratio (SNR). For example, consider a dc level $ A$ in white noise with variance $ \sigma_v^2$ . Then the SNR (mean-square level ratio) in the time domain is $ A^2/\sigma_v^2$ . Low-pass filtering at $ \omega=\pi/2$ cuts the noise energy in half but leaves the dc component unaffected, thereby increasing the SNR by $ 10\log_{10}(2)\approx 3$ dB. Each halving of the lowpass cut-off frequency adds another 3 dB to the SNR. Since the signal is a dc component (zero bandwidth), this process can be repeated indefinitely to achieve any desired SNR. The narrower the lowpass filter, the higher the SNR. Similarly, for sinusoids, the narrower the bandpass filter centered on the sinusoid's frequency, the higher the SNR.
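The dc-in-noise example can be simulated with an ideal half-band lowpass implemented by zeroing FFT bins (a rough sketch under my own choices of length and seed; a brick-wall FFT filter stands in for the lowpass described above):

```python
import numpy as np

N = 2 ** 14
A = 1.0
rng = np.random.default_rng(2)
x = A + rng.standard_normal(N)         # dc level in unit-variance white noise

X = np.fft.fft(x)
X[N // 4 + 1 : 3 * N // 4] = 0         # ideal lowpass: zero bins above w = pi/2
y = np.real(np.fft.ifft(X))            # dc bin untouched, so the mean survives

noise_in = np.var(x)                   # ~ 1
noise_out = np.var(y)                  # ~ 1/2: half the noise band removed
snr_gain_db = 10 * np.log10(noise_in / noise_out)  # ~ 3 dB
```

The dc component passes unchanged while the noise power halves, giving the predicted 3 dB SNR improvement.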


The Panning Problem

An interesting illustration of the difference between coherent and noncoherent signal addition comes up in the problem of stereo panning between two loudspeakers. Let $ x_L(t)$ and $ x_R(t)$ denote the signals going to the left and right loudspeakers, respectively, and let $ g_L$ and $ g_R$ denote their respective gain factors (the ``panning'' gains, between 0 and 1). When $ (g_L,g_R)=(1,0)$ , sound comes only from the left speaker, and when $ (g_L,g_R)=(0,1)$ , sound comes only from the right speaker. These are the easy cases. The harder question is what should the gains be for a sound directly in front? It turns out that the answer depends upon the listening geometry and the signal frequency content.

If the listener is sitting exactly between the speakers, the ideal ``front image'' channel gains are $ g_L = g_R = 1/2$ , provided that the shortest wavelength in the signal is much larger than the ear-to-ear separation of the listener. This restriction is necessary because only those frequencies (below a few kHz, say) will combine coherently from both speakers at each ear. At higher frequencies, the signals from the two speakers decorrelate at each ear because the propagation path lengths differ significantly in units of wavelengths. (In addition, ``head shadowing'' becomes a factor at frequencies this high.) In the perfectly uncorrelated case (e.g., independent white noise coming from each speaker), the energy-preserving gains are $ g_L = g_R = 1/\sqrt{2}$ . (This value is typically used in practice, since the listener may be anywhere in relation to the speakers.)

To summarize, in ordinary stereo panning, decorrelated high frequencies are attenuated by about 3 dB, on average, when using gains $ g_L = g_R = 1/2 = -6$ dB. At any particular high frequency, the actual gain at each ear can be anywhere between 0 and 1, but on average the signals combine on a power basis, providing a 3 dB boost on top of the $ -6$ dB cut and leaving an overall $ -3$ dB change in level at high frequencies.
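The arithmetic behind this $ -6$ dB / $ +3$ dB bookkeeping fits in a few lines of Python (a sketch of the level calculation only, not a simulation):

```python
import numpy as np

gL = gR = 0.5   # "front image" panning gains: -6 dB per channel

# Low frequencies: speaker signals combine coherently (amplitudes sum)
coherent_db = 20 * np.log10(gL + gR)      # 0 dB: no net level change

# High frequencies: decorrelated signals combine on a power basis (powers sum)
power_db = 10 * np.log10(gL**2 + gR**2)   # about -3 dB net
```

Coherent addition restores the full 0 dB level, while power addition recovers only 3 dB of the 6 dB cut, leaving the $ -3$ dB high-frequency change described above.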


Next Section:
Time-Frequency Displays
Previous Section:
Spectrum Analysis of Sinusoids