Elementary Spectrum Analysis

Before we do anything in the field of spectral modeling, we must be able to competently compute the spectrum of a signal. Since the spectrum is given by the Fourier transform of a signal, Chapter 2 begins with a review of elementary Fourier theory and the most generally useful Fourier theorems for practical spectrum analysis.

In Chapter 3, we look at a number of FFT windows used in practice. The Fourier theorems give us a good thinking vocabulary for understanding the properties of windows in the frequency domain. In addition to a tour of well-known windows, optimal custom window design is addressed.

In Chapter 4, we apply both the Fourier theorems of Chapter 2 and the FFT windows of Chapter 3 to the topic of FIR digital filter design--that is, the numerical design of finite-impulse response (FIR) filters for linear filtering in discrete time. We will need such filters in Chapter 10 when we implement FIR filters using FFT convolution in the framework of the short-time Fourier transform (STFT).

Chapter 5 is concerned with spectrum analysis of tonal signals, that is, signals having narrow-band peaks in their spectra. It turns out the ear is especially sensitive to spectral peaks (which is the basis for MPEG audio coding), and so it is often important to be able to accurately measure the amplitude and frequency of each prominent peak in the spectrum. (Sometimes we will also measure the phase of the spectrum at the peak.) This chapter also discusses resolution of spectral peaks, and how the choice of FFT window affects resolution.

Chapter 6 is concerned with spectrum analysis of noise, where, for our purposes, ``noise'' is defined as any ``filtered white noise,'' and white noise is defined as any uncorrelated sequence of samples. (These terms are defined in detail in Chapter 6.) Unlike the ``deterministic'' case, such as when analyzing tonal signals, we must average the squared-magnitude spectrum over several time frames in order to obtain a statistically ``stable'' estimate of the spectrum. This average is called a power spectral density, and the method of averaging is called Welch's Method. It is noteworthy that the power spectral density is a real and nonnegative function, so that it contains no phase information.
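The averaging idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method as developed in Chapter 6: the helper names (`dft`, `hann`, `welch_psd`), the Hann window, the frame length of 32, and the 50% hop are all choices made here for illustration, and a naive DFT stands in for the FFT so that only the standard library is needed.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hann(m):
    """Periodic Hann window of length m (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / m) for i in range(m)]


def welch_psd(x, frame_len=32, hop=16):
    """Average the squared-magnitude spectra of overlapping windowed frames."""
    w = hann(frame_len)
    w_power = sum(v * v for v in w)  # normalizes out the window's energy
    psd = [0.0] * frame_len
    n_frames = 0
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + i] * w[i] for i in range(frame_len)]
        for k, bin_value in enumerate(dft(frame)):
            psd[k] += abs(bin_value) ** 2 / w_power
        n_frames += 1
    return [p / n_frames for p in psd]


random.seed(0)
# Unit-variance white noise: an uncorrelated sequence of samples.
noise = [random.gauss(0.0, 1.0) for _ in range(2048)]
psd = welch_psd(noise)
```

Every entry of `psd` is a real, nonnegative number, since each term is a squared magnitude; the phase of each frame's spectrum is discarded by `abs`. For unit-variance white noise, the estimate should hover around one in every bin, with less scatter as more frames are averaged.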

The Short-Time Fourier Transform (STFT) and Time-Frequency Displays

Often we simply want to display sound as a spectrum that evolves through time. We know that this is what the brain ``sees'' when we hear sound. The classic spectrogram, developed at Bell Telephone Laboratories during World War II, has been used for decades to display the short-time spectrum of sound. There are even people who can ``read'' a spectrogram of speech. In Chapter 7, the classic spectrogram is reviewed, and the development of more refined ``loudness spectrograms,'' based on psychoacoustic research in loudness perception, is discussed. These more refined spectrograms come closer to the goal of ``what you see is what you hear''.

Since the proliferation of digital computers, spectrograms have been computed using the Short-Time Fourier Transform (STFT), which is simply a sequence of FFTs over time. In Chapter 7, the STFT is introduced.
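As a rough sketch of ``a sequence of FFTs over time,'' the following Python fragment computes one spectrum per hop of a windowed signal. The function names, the Hann window, the frame length of 16, and the hop of 8 are assumptions made for this illustration, and a naive DFT again stands in for the FFT.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stft(x, frame_len=16, hop=8):
    """Return one spectrum per hop, each computed from a Hann-windowed frame."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
         for i in range(frame_len)]
    return [
        dft([x[start + i] * w[i] for i in range(frame_len)])
        for start in range(0, len(x) - frame_len + 1, hop)
    ]


def peak_bin(spectrum):
    """Index of the largest-magnitude bin between dc and half the sampling rate."""
    half = spectrum[: len(spectrum) // 2 + 1]
    return max(range(len(half)), key=lambda k: abs(half[k]))


# Test signal: a sinusoid centered on bin 2 that jumps to bin 5 halfway through.
x = [
    math.cos(2 * math.pi * (2 if t < 32 else 5) * t / 16)
    for t in range(64)
]
frames = stft(x)
peaks = [peak_bin(f) for f in frames]
```

Plotting the magnitude of each entry of `frames` as a column of gray levels gives a crude spectrogram. The list `peaks` tracks the dominant bin in each frame, moving from the lower frequency to the higher one as the analysis frame slides past the change.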

Short-Time Analysis, Modification, and Resynthesis

The STFT is the main computational tool used in spectral modeling applications. As we discuss in detail in Chapters 8 and 9, the STFT may be viewed as either an

  1. overlap-add processor--a sequence of Fourier transforms of windowed data segments (a ``sliding'' or ``hopping'' FFT), or a
  2. filter-bank-sum processor--an implementation of a time-domain bandpass filter bank using an FFT to implement the filter bank.

Most STFT applications are based on the overlap-add point of view. However, the filter-bank-sum viewpoint coincides with precursors such as the phase vocoder (discussed in the applications chapter), and many audio coding schemes have been based on filter banks.
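The overlap-add point of view can be sketched as follows: window each frame, transform it, (optionally) modify the spectrum, inverse transform, and add the result back into the output at the frame's position. This is only an illustrative sketch, not the development given in Chapters 8 and 9. The function names, the periodic Hann window, and the hop of half a frame are assumptions chosen here because that window overlap-adds to a constant at that hop; a naive DFT stands in for the FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, or inverse DFT when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def overlap_add_identity(x, frame_len=16):
    """Analyze and resynthesize x by overlap-add with no spectral modification."""
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to one.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
         for i in range(frame_len)]
    y = [0.0] * len(x)
    for start in range(0, len(x) - frame_len + 1, hop):
        spectrum = dft([x[start + i] * w[i] for i in range(frame_len)])
        # A spectral modification (e.g., filtering) would be applied here.
        frame = dft(spectrum, inverse=True)
        for i in range(frame_len):
            y[start + i] += frame[i].real  # overlap-add into the output
    return y


x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(64)]
y = overlap_add_identity(x)
# Samples covered by two overlapping frames are recovered to within roundoff.
max_error = max(abs(y[i] - x[i]) for i in range(8, 56))
```

The first and last half-frames are covered by only one window and so are not reconstructed in this sketch; a practical implementation would typically zero-pad the ends of the signal.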

STFT Applications

Chapter 10 discusses various practical STFT applications such as fundamental frequency measurement, cross-synthesis, spectral envelope extraction, sinusoidal modeling, time-scale modification (frequency scaling), and audio FFT filter banks. It is a good idea to scan Chapter 10 now to get a feel for practical techniques in common use.

Multirate Polyphase and Wavelet Filter Banks

The last chapter is devoted to the relatively advanced topic of perfect-reconstruction filter banks. Much of the recent research literature concerns critically sampled filter banks, which are appropriate for audio compression applications. This book, however, is not focused on compression applications, and therefore so-called oversampled filter banks are of primary interest. After introducing the basic terminology and properties of elementary perfect-reconstruction filter banks, it is shown that the polyphase representation of a filter bank can be recognized as an overlap-add representation, such as we develop in Chapter 8 and apply in Chapter 10. Thus, there is a unified view of polyphase filter banks and STFT processing.


The appendices provide a summary of notation, continuous-time Fourier theorems, a primer on statistical signal processing (at a deeper level than in Chapter 6), more about the Gaussian function, frequency warping (a preprocessing technique that can greatly enhance the quality of an audio filter design), and programming examples in the matlab language (including software for spectrogram computation and inversion). Finally, an historical summary of spectral modeling from a ``computer music point of view'' appears in Appendix G.
