# Introduction and Overview

The ear is a kind of Fourier analyzer. That is, sound is spread out along the inner ear according to frequency, much like a prism separates light into various colors. As a result, hearing in the brain is based on a kind of "short-term spectrum analysis" of sound. This is useful for a variety of reasons:

- Perhaps most important, when the frequency content of one sound
is different from that of another sound, and the two sounds are
mixed (added) together, the sounds are largely separated out by the
hearing process. This allows us to mentally "unmix" the sounds,
enabling us to focus on one sound in the mix, excluding all
others. This is hard to do by computer, and it remains an active
research topic ("source separation") in the field of Music
Information Retrieval (MIR).
- The formant resonances that distinguish the vowels of speech
are separated in the auditory nerve, thereby facilitating vowel
recognition by the brain.
- Periodic sounds are more audible than random sounds in the same
frequency band. Throughout the animal kingdom, this fact provides a
basis for various "calls" that can be heard above the ambient
environmental noise.
- Sounds can be recognized and distinguished based on spectral
profile, such as the difference between "s" and "sh".
- Last but not least, we are able to appreciate tonal music!

As another example of the utility of spectrum analysis, the fields of chemistry, physics, astronomy, and cosmology were all advanced profoundly by the study of light spectra. To cite just one of many examples, the "red shift" (downward Doppler frequency shift) of light from distant galaxies led Edwin Hubble (in 1929) to find that the Universe is expanding, an observation that became a cornerstone of the Big Bang theory of cosmology (the farther apart two galaxies are, the faster they are receding from each other).

In summary, spectrum analysis provides a wealth of information about signals that can be used for detection, classification, and discrimination tasks. Since hearing is based on a spectral decomposition, spectrum analysis provides an important foundation for many audio signal processing applications.

## Organization

The chapters are organized in a progression from basic spectrum analysis to more advanced frequency-domain signal processing as follows:

- Fourier transforms and theorems
- Spectrum analysis windows and their design
- FIR digital filter design
- Spectrum analysis of sinusoids
- Spectrum analysis of noise
- Time-frequency displays
- The Short-Time Fourier Transform (STFT)
- Overlap-add STFT processing
- Filter-bank view of the STFT
- Applications of the STFT
- Multirate polyphase and wavelet filter banks

In addition, appendices are provided. Some extend and supplement various chapters, while others provide supporting background material:

- Notation
- Continuous-time Fourier theorems
- Statistical signal processing
- Gaussian function properties
- Bilinear audio frequency warping
- Examples in Matlab and Octave
- History of spectral modeling by topic

## Overview

### Elementary Spectrum Analysis

Before we do anything in the field of spectral modeling, we must be
able to competently compute the spectrum of a signal. Since the
spectrum is given by the Fourier transform of a
signal, Chapter 2 begins with a review of elementary Fourier
theory and the most generally useful *Fourier theorems* for practical
spectrum analysis.
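As a minimal illustration (not the book's own code), the spectrum of a length-N sequence can be computed directly from the DFT definition X[k] = sum over n of x[n] e^(-j 2 pi k n / N). The helper name `dft` is hypothetical, and the direct O(N^2) sum is used only to keep the sketch self-contained; in practice an FFT computes the same values much faster.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


# A cosine with exactly 4 cycles in N = 32 samples puts all its energy in bins 4 and N-4.
N = 32
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]
mags = [abs(Xk) for Xk in dft(x)]
peak_bin = max(range(N // 2 + 1), key=lambda k: mags[k])
print(peak_bin, round(mags[peak_bin], 6))  # 4 16.0
```

The peak magnitude N/2 = 16 reflects that a real cosine splits its amplitude equally between a positive- and a negative-frequency bin.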

In Chapter 3, we look at a number of FFT *windows* used in practice. The Fourier theorems give us a good
thinking vocabulary for understanding the properties of windows in the
frequency domain. In addition to a tour of well-known windows,
optimal custom window design is addressed.
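To see why the window matters, the sketch below analyzes a sinusoid lying halfway between two DFT bins, which is the worst case for leakage. It compares the level 9.5 bins away from the peak with and without a Hann window. The Hann window, the length `M = 64`, and the helper names are assumptions chosen for illustration. The Hann window's much lower sidelobes should show up as a far smaller leakage level.

```python
import cmath
import math


def hann(M):
    # Periodic Hann window (one common variant, assumed here): w[n] = 0.5 - 0.5*cos(2*pi*n/M)
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]


def dtft_mag(x, f):
    """|sum_n x[n] e^{-j 2 pi f n}| at normalized frequency f (cycles/sample)."""
    return abs(sum(xn * cmath.exp(-2j * math.pi * f * n) for n, xn in enumerate(x)))


M = 64
f0 = 10.5 / M  # sinusoid halfway between bins 10 and 11
x = [math.cos(2 * math.pi * f0 * n) for n in range(M)]
xw = [xn * wn for xn, wn in zip(x, hann(M))]

f_far = 20 / M  # 9.5 bins away from the sinusoid
rect_db = 20 * math.log10(dtft_mag(x, f_far) / dtft_mag(x, f0))    # no window (rectangular)
hann_db = 20 * math.log10(dtft_mag(xw, f_far) / dtft_mag(xw, f0))  # Hann window
print(round(rect_db, 1), round(hann_db, 1))
```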

In Chapter 4, we apply both the Fourier theorems of
Chapter 2 and the FFT windows of Chapter 3 to the topic
of *FIR digital filter design*--that is, the numerical design of
finite-impulse response (FIR) filters for linear filtering in discrete
time. We will need such filters in Chapter 10 when we implement FIR
filters using *FFT convolution* in the framework of the
*short-time Fourier transform* (STFT).
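A minimal sketch of the window method for FIR lowpass design follows. The ideal lowpass impulse response (a sampled sinc) is truncated by a window and then scaled for unity gain at DC. The Hann window, the odd length, the cutoff in cycles per sample, and the function names are all illustrative assumptions. The chapter itself may develop other windows and design methods.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Window-method lowpass FIR design (hypothetical helper).

    cutoff is in cycles/sample (0 < cutoff < 0.5); an odd num_taps gives a
    symmetric, linear-phase filter whose center tap sits at (num_taps - 1) / 2.
    """
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        # Ideal lowpass impulse response: 2*fc*sinc(2*fc*t)
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Symmetric Hann window, zero at both ends
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    dc_gain = sum(h)
    return [v / dc_gain for v in h]  # normalize to unity gain at DC


def freq_response_mag(h, f):
    """|H(f)| at normalized frequency f in cycles/sample."""
    return abs(sum(hn * cmath.exp(-2j * math.pi * f * n) for n, hn in enumerate(h)))


h = windowed_sinc_lowpass(51, 0.1)
print(freq_response_mag(h, 0.02), freq_response_mag(h, 0.4))  # passband vs. stopband
```

A longer filter narrows the transition band. A different window trades sidelobe (stopband) level against transition width.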

Chapter 5 is concerned with spectrum analysis of *tonal*
signals, that is, signals having *narrow-band peaks* in their
spectra. It turns out the ear is especially sensitive to spectral peaks
(which is the basis for MPEG audio coding), and so it is often
important to be able to accurately measure the amplitude and frequency
of each prominent peak in the spectrum. (Sometimes we will also
measure the *phase* of the spectrum at the peak.)
This chapter also discusses *resolution* of spectral peaks, and
how the choice of FFT window affects resolution.
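One common way to measure a peak's frequency more finely than the bin spacing is to fit a parabola through the dB magnitudes of the largest bin and its two neighbors and take the vertex as the peak. This sketch is not necessarily the exact procedure developed in Chapter 5. The Hann window, the test frequency of 10.3 bins, and the helper names are assumptions for illustration.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT bin X[k] of the sequence x."""
    N = len(x)
    return sum(xn * cmath.exp(-2j * math.pi * k * n / N) for n, xn in enumerate(x))


def interpolated_peak(x):
    """Return (frequency in bins, level in dB) of the largest spectral peak,
    refined by quadratic (parabolic) interpolation of dB magnitudes."""
    N = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann
    xw = [a * b for a, b in zip(x, w)]
    db = [20 * math.log10(abs(dft_bin(xw, k)) + 1e-300) for k in range(N // 2 + 1)]
    k = max(range(1, N // 2), key=lambda i: db[i])  # peak bin, excluding DC and Nyquist
    a, b, c = db[k - 1], db[k], db[k + 1]
    p = 0.5 * (a - c) / (a - 2 * b + c)  # vertex offset from bin k, within [-0.5, 0.5]
    return k + p, b - 0.25 * (a - c) * p


N = 64
x = [math.cos(2 * math.pi * 10.3 * n / N) for n in range(N)]
freq_bins, level_db = interpolated_peak(x)
print(freq_bins, level_db)
```

The estimate is slightly biased, and the bias depends on the window. Zero-padding the FFT reduces it.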

Chapter 6 is concerned with spectrum analysis of *noise*,
where, for our purposes, "noise" is defined as any "filtered white
noise," and white noise is defined as any uncorrelated sequence of
samples. (These terms are defined in detail in Chapter 6.)
Unlike the "deterministic" case, such as when analyzing tonal
signals, we must *average* the squared-magnitude spectrum over
several time frames in order to obtain a statistically "stable"
estimate of the spectrum. This average estimates the *power
spectral density*, and the method of averaging is called *Welch's
Method*. It is noteworthy that the power spectral density is a real
and nonnegative function, so that it contains no phase information.
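Welch's method can be sketched in a few lines. This version assumes a Hann window, 50% overlap, and normalization by the window's energy, so that unit-variance white noise yields an estimate near 1 in every bin. These parameter choices and the name `welch_psd` are illustrative assumptions; a library routine would use an FFT rather than the direct DFT used here.

```python
import cmath
import math
import random


def welch_psd(x, M, hop):
    """Welch PSD estimate: average of Hann-windowed periodograms |X_m[k]|^2,
    normalized by the window energy sum(w^2)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]
    w_energy = sum(v * v for v in w)
    # Precompute DFT twiddle factors e^{-j 2 pi k n / M}
    tw = [[cmath.exp(-2j * math.pi * k * n / M) for n in range(M)] for k in range(M)]
    num_frames = 1 + (len(x) - M) // hop
    psd = [0.0] * M
    for m in range(num_frames):
        frame = [x[m * hop + n] * w[n] for n in range(M)]
        for k in range(M):
            Xk = sum(frame[n] * tw[k][n] for n in range(M))
            psd[k] += abs(Xk) ** 2 / w_energy  # squared magnitude: real and nonnegative
    return [p / num_frames for p in psd]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]  # unit-variance white noise
psd = welch_psd(noise, M=32, hop=16)
print(min(psd), max(psd))  # for white noise, the estimate should be roughly flat
```

Averaging more frames lowers the variance of the estimate. A longer frame `M` improves frequency resolution but leaves fewer frames to average.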

### The Short-Time Fourier Transform (STFT) and Time-Frequency Displays

Often we simply want to display sound as a spectrum that evolves
through time. This is, roughly speaking, what the brain "sees" when we
hear sound. The classic *spectrogram*, developed at Bell
Telephone Laboratories during World War II, has been used for decades
to display the short-time spectrum of sound. There are even people
who can "read" a spectrogram of speech. In Chapter 7, the
classic spectrogram is reviewed, and the development of more refined
"loudness spectrograms" based on psychoacoustic research in
loudness perception is discussed. These more refined spectrograms
come closer to the goal of "what you see is what you hear."

Since the proliferation of digital computers, spectrograms have been computed using the Short-Time Fourier Transform (STFT), which is simply a sequence of FFTs over time. In Chapter 7, the STFT is introduced.
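The sketch below computes a basic STFT as a sequence of DFTs of Hann-windowed, overlapping segments and converts it to a dB spectrogram. The window, frame length `M`, hop size `hop`, and function names are assumptions for illustration. A direct DFT stands in for the FFT so the code has no dependencies.

```python
import cmath
import math


def stft(x, M, hop):
    """List of frames; frame m holds DFT bins 0..M/2 of the Hann-windowed
    segment x[m*hop : m*hop + M]."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]
    frames = []
    for start in range(0, len(x) - M + 1, hop):
        seg = [x[start + n] * w[n] for n in range(M)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
                       for k in range(M // 2 + 1)])
    return frames


def spectrogram_db(x, M, hop):
    """Magnitude of each STFT bin in dB (small offset avoids log of zero)."""
    return [[20 * math.log10(abs(Xk) + 1e-12) for Xk in frame] for frame in stft(x, M, hop)]


# A tone that jumps from bin 4 to bin 12 halfway through the signal
M = 32
x = [math.cos(2 * math.pi * (4 if n < 256 else 12) * n / M) for n in range(512)]
S = spectrogram_db(x, M, hop=16)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in S]
print(peak_bins[0], peak_bins[-1])  # 4 12
```

Each row of `S` is one time slice of the spectrogram. The peak bin moving from 4 to 12 is the frequency jump, seen as a spectrum evolving through time.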

### Short-Time Analysis, Modification, and Resynthesis

The STFT is the main computational tool used in spectral modeling applications. As we discuss in detail in Chapters 8 and 9, the STFT may be viewed as either

- an *overlap-add processor*, i.e., a sequence of Fourier transforms of windowed data segments (a "sliding" or "hopping" FFT), or
- a *filter-bank-sum processor*, i.e., a time-domain bandpass filter bank implemented using the FFT.
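The overlap-add view can be checked with a short sketch. It analyzes a signal with a periodic Hann window at a hop of half the frame length, inverse-transforms each frame unmodified, and overlap-adds the results. Shifted Hann windows at 50% overlap sum to exactly 1, so the input should be recovered wherever two frames overlap. The window, hop, and function names are illustrative assumptions. Any spectral processing would go where the comment marks it.

```python
import cmath
import math
import random


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def ola_identity(x, M):
    """Hann-windowed analysis at hop R = M/2, inverse DFT, then overlap-add.

    Shifted copies of a periodic Hann window at 50% overlap sum to exactly 1,
    so with no spectral modification the output reproduces x wherever two
    frames overlap."""
    R = M // 2
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]
    y = [0.0] * len(x)
    for start in range(0, len(x) - M + 1, R):
        X = dft([x[start + n] * w[n] for n in range(M)])
        # Spectral modifications (e.g., filtering) would be applied to X here.
        frame = idft(X)
        for n in range(M):
            y[start + n] += frame[n].real
    return y


random.seed(1)
x = [random.uniform(-1.0, 1.0) for _ in range(256)]
y = ola_identity(x, M=32)
err = max(abs(y[n] - x[n]) for n in range(16, 240))  # fully overlapped region
print(err)
```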

### STFT Applications

Chapter 10 discusses various practical STFT applications such as fundamental frequency measurement, cross-synthesis, spectral envelope extraction, sinusoidal modeling, time-scale modification, frequency scaling, and audio FFT filter banks. It is a good idea to scan Chapter 10 now to get a feel for practical techniques in common use.

### Multirate Polyphase and Wavelet Filter Banks

The last chapter is devoted to the relatively advanced topic of
perfect-reconstruction filter
banks. Much of the recent research literature concerns
*critically sampled* filter banks, which are appropriate
for *audio compression* applications. This book, however, is not
focused on compression applications, and therefore so-called
*oversampled filter banks* are of primary interest. After
introducing the basic terminology and properties of elementary
perfect-reconstruction filter banks, it is shown that the *polyphase
representation* of a filter bank can be recognized as an *overlap-add*
representation, such as we develop in Chapter 8 and apply in
Chapter 10. Thus, there is a unified view of polyphase filter banks
and STFT processing.
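The simplest perfect-reconstruction example is the two-channel Haar filter bank, sketched below. Unlike the oversampled banks this book emphasizes, it is *critically sampled*: the two half-rate channels together carry exactly as many samples as the input. It is chosen here only because it is short. It is written directly in terms of the even- and odd-indexed input samples, i.e., the two polyphase components. The function names are hypothetical.

```python
import math

SCALE = 1 / math.sqrt(2)


def haar_analysis(x):
    """Two-channel Haar analysis bank: lowpass (sum) and highpass (difference)
    filtering combined with downsampling by 2. Assumes len(x) is even."""
    even, odd = x[0::2], x[1::2]  # polyphase components
    low = [SCALE * (e + o) for e, o in zip(even, odd)]
    high = [SCALE * (e - o) for e, o in zip(even, odd)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis, in polyphase form: each (low, high) pair
    yields one even-indexed and one odd-indexed output sample."""
    y = []
    for a, d in zip(low, high):
        y.extend([SCALE * (a + d), SCALE * (a - d)])
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
low, high = haar_analysis(x)
y = haar_synthesis(low, high)
print(len(low) + len(high), y)  # same sample count as x; y matches x up to rounding
```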

### Appendices

The appendices provide a summary of notation, continuous-time Fourier theorems, a primer on statistical signal processing (at a deeper level than in Chapter 6), more about the Gaussian function, bilinear frequency warping (a preprocessing technique that can greatly enhance the quality of an audio filter design), and programming examples in Matlab and Octave (including software for spectrogram computation and inversion). Finally, a historical summary of spectral modeling from a "computer music point of view" appears in Appendix G.
