Multirate Filter Banks
The preceding chapters have been concerned essentially with the short-time Fourier transform and all that goes with it. After developing the overlap-add point of view in Chapter 7, we developed the alternative (dual) filter-bank point of view in Chapter 8. This chapter is concerned more broadly with filter banks, whether they are implemented using the FFT or by some other means. In the end, however, we will come full circle and look at the properly configured STFT as an example of a perfect reconstruction (PR) filter bank as defined herein. Moreover, filter banks in practice are normally implemented using the FFT.
The subject of PR filter banks is normally considered only in the context of systems for audio compression, and they are normally critically sampled in both time and frequency. This book, on the other hand, belongs to a tiny minority which is not concerned with compression at all, but rather with useful time-frequency decompositions for sound, and with corresponding applications in music and digital audio effects.
Perhaps the most important new topic introduced in this chapter is the polyphase representation for filter banks. This is both an important analysis tool and a basis for efficient implementation. We will see that it can be viewed as a generalization of the overlap-add approach discussed in Chapter 7.
The polyphase representation will make it straightforward to determine general conditions for perfect reconstruction in any filter bank. The STFT will provide some special cases, but there will be many more. In particular, the filter banks used in perceptual audio coding will be special cases as well. Polyphase analysis is used to derive classes of PR filter banks called ``paraunitary,'' ``cosine modulated,'' and ``pseudo-quadrature mirror'' filter banks, among others.
Another extension we will take up in this chapter is multirate systems. Multirate filter banks use different sampling rates in different channels, matched to different filter bandwidths. Multirate filter banks are very important in audio work because the filtering by the inner ear is similarly a variable-resolution ``filter bank'' using wider passbands at higher frequencies. Finally, the related subject of wavelet filter banks is briefly introduced, and further reading is recommended.
Upsampling and Downsampling
For the DTFT, we prove in §2.3.11 of Chapter 2 the stretch theorem (repeat theorem), which relates upsampling (``stretch'') to spectral copies (``images'') in the DTFT context; this is the discrete-time counterpart of the scaling theorem for continuous-time Fourier transforms (§B.1.4). Also, §2.3.12 discusses the downsampling theorem (aliasing theorem) for DTFTs, which relates downsampling to aliasing for discrete-time signals. In this section, we review the main results.
Upsampling (Stretch) Operator
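Figure 10.1 shows the graphical symbol for a digital upsampler by the factor $N$. To upsample by the integer factor $N \ge 1$, we simply insert $N-1$ zeros between $x(n)$ and $x(n+1)$ for all $n$. In other words, the upsampler implements the stretch operator defined in §2.3.9:
$$y(n) = \operatorname{Stretch}_{N,n}(x) \triangleq \begin{cases} x(n/N), & n/N \text{ an integer},\\ 0, & \text{otherwise.}\end{cases}$$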
In the frequency domain, we have, by the stretch (repeat) theorem for DTFTs:
$$Y(z) = \operatorname{Repeat}_{N,z}(X) \triangleq X(z^N), \quad z \in \mathbb{C}.$$
Plugging in $z = e^{j\omega}$, we see that the spectrum on $[-\pi, \pi)$ contracts by the factor $N$, and $N$ images appear around the unit circle. For $N=2$, this is depicted in Fig.10.2.
Downsampling (Decimation) Operator
Figure 10.3 shows the symbol for downsampling by the factor $N$. The downsampler selects every $N$th sample and discards the rest:
$$y(n) = \operatorname{Downsample}_{N,n}(x) \triangleq x(Nn), \quad n \in \mathbb{Z}.$$
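In the frequency domain, we have
$$Y(z) = \operatorname{Alias}_{N,z}(X) \triangleq \frac{1}{N}\sum_{m=0}^{N-1} X\left(z^{1/N}e^{-jm2\pi/N}\right), \quad z \in \mathbb{C}.$$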
Thus, the frequency axis is expanded by the factor $N$, wrapping $N$ times around the unit circle and adding to itself $N$ times. For $N=2$, two partial spectra are summed, as indicated in Fig.10.4.
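Using the common twiddle factor notation $W_N \triangleq e^{-j2\pi/N}$, the aliasing expression can be written as
$$Y(z) = \frac{1}{N}\sum_{m=0}^{N-1} X\left(W_N^m z^{1/N}\right).$$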
Example: Downsampling by 2
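As an example, when $N=2$, $y(n) = x(2n)$, and (since $W_2 \triangleq e^{-j2\pi/2} = e^{-j\pi} = -1$)
$$Y(z) = \frac{1}{2}\left[X\left(W_2^0 z^{1/2}\right) + X\left(W_2^1 z^{1/2}\right)\right] = \frac{1}{2}\left[X\left(z^{1/2}\right) + X\left(-z^{1/2}\right)\right].$$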
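The downsampling-by-2 result can be checked numerically. The sketch below evaluates DTFTs by direct summation (the helper `dtft` is introduced here for illustration only) and compares $Y(e^{j\omega})$ with $\frac{1}{2}[X(e^{j\omega/2}) + X(-e^{j\omega/2})]$, using $-e^{j\omega/2} = e^{j(\omega/2+\pi)}$. The test signal is an arbitrary random sequence.

```python
import numpy as np

def dtft(x, w):
    """DTFT of the finite sequence x(0), ..., x(len-1) at radian frequencies w."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
y = x[::2]                                   # y(n) = x(2n)

w = np.linspace(-np.pi, np.pi, 257)
Y = dtft(y, w)
# Aliasing theorem for N = 2: Y(e^{jw}) = (1/2)[X(e^{jw/2}) + X(e^{j(w/2 + pi)})]
Y_alias = 0.5 * (dtft(x, w / 2) + dtft(x, w / 2 + np.pi))
print(np.allclose(Y, Y_alias))
```

The odd-indexed terms of $x$ cancel between the two partial spectra, leaving exactly the DTFT of the even-indexed samples.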
Example: Upsampling by 2
When $N=2$, $y = [x(0), 0, x(1), 0, x(2), 0, \ldots]$, and
$$Y(z) = X(z^2).$$
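A companion check for the stretch theorem: after inserting a zero between successive samples, the DTFT of the result equals $X(e^{j2\omega})$, so the original spectrum is compressed into half the band and an image appears. The helpers `dtft` and `upsample` are illustrative names, not part of any library.

```python
import numpy as np

def dtft(x, w):
    """DTFT of the finite sequence x(0), ..., x(len-1) at radian frequencies w."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x

def upsample(x, N):
    """Stretch operator: insert N-1 zeros between successive samples."""
    y = np.zeros(N * len(x))
    y[::N] = x
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
y = upsample(x, 2)                     # [x(0), 0, x(1), 0, ...]

w = np.linspace(-np.pi, np.pi, 257)
Y = dtft(y, w)
X_stretched = dtft(x, 2 * w)           # X(z^2) evaluated on the unit circle
print(np.allclose(Y, X_stretched))
```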
Filtering and Downsampling
Because downsampling by $N$ will cause aliasing for any frequencies in the original signal above $\pi/N$ in magnitude, the input signal may need to be first lowpass filtered to prevent aliasing, as shown in Fig.10.5. Suppose we implement such an anti-aliasing lowpass filter $h(n)$ as an FIR filter of length $N$ with cutoff frequency $\pi/N$. This is drawn in direct form in Fig.10.6.
We do not need $N-1$ out of every $N$ filter output samples due to the $N$:1 downsampler. To realize this savings, we can commute the downsampler through the adders inside the FIR filter to obtain the result shown in Fig.10.7. The multipliers are now running at $1/N$ times the sampling frequency of the input signal $x(n)$. This reduces the computation requirements by a factor of $N$. The downsampler outputs are called polyphase signals. This is a summed polyphase filter bank in which each ``subphase filter'' is a constant scale factor $h(m)$.
The summed polyphase signals of Fig.10.7 can be interpreted in the following ways:
 A ``serial to parallel conversion'' from a stream of scalar samples $x(n)$ to a sequence of length-$N$ buffers $[x(Nn), x(Nn-1), \ldots, x(Nn-N+1)]$ every $N$ samples, followed by a dot product of each buffer with $[h(0), h(1), \ldots, h(N-1)]$.
 The overall system is equivalent to a round-robin demultiplexor, with a different gain $h(m)$ for each output, followed by an $N$-sample summer which adds the ``de-interleaved'' signals together:
$$y(n) = \sum_{m=0}^{N-1} h(m)\,x(Nn-m).$$
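The polyphase processing in the anti-aliasing filter of Fig.10.7 is as follows:
 The 0th subphase signal, $x(Nn) = [\ldots, x(0), x(N), x(2N), \ldots]$, is scaled by $h(0)$.
 Subphase signal 1, $x(Nn-1) = [\ldots, x(-1), x(N-1), x(2N-1), \ldots]$, is scaled by $h(1)$.
 $\vdots$
 Subphase signal $N-1$, $x(Nn-N+1) = [\ldots, x(-N+1), x(1), x(N+1), \ldots]$, is scaled by $h(N-1)$.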
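The summed polyphase form can be checked against ``filter first, then discard $N-1$ of every $N$ outputs.'' The sketch below is a minimal illustration assuming a length-$N$ FIR filter (random taps stand in for a real anti-aliasing design) and zero initial conditions; the function names are hypothetical.

```python
import numpy as np

def decimate_direct(x, h, N):
    """Filter by h at the full rate, then keep every Nth output."""
    return np.convolve(h, x)[: len(x)][::N]

def decimate_summed_polyphase(x, h, N):
    """Fig. 10.7 form for a length-N FIR h: weight subphase x(Nn - m) by h(m), then sum."""
    assert len(h) == N, "this structure assumes an FIR filter of length N"
    n_out = (len(x) + N - 1) // N
    xp = np.concatenate([np.zeros(N - 1), x])    # zero history: xp[i] = x(i - N + 1)
    y = np.zeros(n_out)
    for m in range(N):
        subphase = xp[N - 1 - m :: N][:n_out]    # x(Nn - m) for n = 0, ..., n_out - 1
        y += h[m] * subphase                     # multipliers run at 1/N the input rate
    return y

rng = np.random.default_rng(2)
N = 4
h = rng.standard_normal(N)      # stand-in taps; the identity holds for any length-N h
x = rng.standard_normal(50)
print(np.allclose(decimate_direct(x, h, N), decimate_summed_polyphase(x, h, N)))
```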
Polyphase Filtering
In multirate signal processing, it is often fruitful to split a signal or filter into its polyphase components.
Two-Channel Case
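Let's look first at the case $N=2$. We begin with the filter
$$H(z) = \sum_{n=-\infty}^{\infty} h(n)\,z^{-n} = E_0(z^2) + z^{-1}E_1(z^2),$$
where $E_0(z)$ and $E_1(z)$ are the polyphase components of the polyphase decomposition of $H(z)$ for $N=2$.
Now write $H(z)$ as the sum of the even and odd terms:
$$H(z) = \sum_{n} h(2n)\,z^{-2n} + z^{-1}\sum_{n} h(2n+1)\,z^{-2n},$$
so that $E_0(z) = \sum_n h(2n)\,z^{-n}$ and $E_1(z) = \sum_n h(2n+1)\,z^{-n}$.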
As a simple example, consider
Thus, $H(z)$ can be written as the sum of the following two polyphase components:
N-Channel Polyphase Decomposition
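For the general case of arbitrary $N$, the basic idea is to decompose $h(n)$ into its $N$ periodically interleaved subsequences, as indicated schematically in Fig.10.9. The polyphase decomposition into $N$ channels is given by
$$H(z) = \sum_{l=0}^{N-1} z^{-l}E_l(z^N), \qquad E_l(z) \triangleq \sum_{n=-\infty}^{\infty} h(Nn+l)\,z^{-n}.$$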
For , we have the system diagram shown in Fig.10.11.
Type II Polyphase Decomposition
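The preceding polyphase decomposition of $H(z)$ into $N$ channels $E_l(z^N)$ is called a ``Type I'' polyphase decomposition.
In the ``Type II'', or reverse polyphase decomposition, the powers of $z$ progress in the opposite direction:
$$H(z) = \sum_{l=0}^{N-1} z^{-(N-1-l)}R_l(z^N), \qquad R_l(z) = E_{N-1-l}(z).$$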
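As a concrete illustration of both decompositions, the sketch below splits a toy filter $h(n) = n+1$, $n = 0, \ldots, 11$ (an arbitrary example, not one from the text) into $N=3$ Type I components, reorders them into Type II components, and rebuilds $h$ from each form by upsampling and delaying. The helper names are ours.

```python
import numpy as np

def type1_components(h, N):
    """Type I: e_l(n) = h(Nn + l), so H(z) = sum_l z^{-l} E_l(z^N)."""
    return [h[l::N] for l in range(N)]

def type2_components(h, N):
    """Type II: R_l(z) = E_{N-1-l}(z), so H(z) = sum_l z^{-(N-1-l)} R_l(z^N)."""
    E = type1_components(h, N)
    return [E[N - 1 - l] for l in range(N)]

def rebuild(components, N, length, delay_of):
    """Upsample each component by N, delay it by delay_of(l) samples, and add."""
    h = np.zeros(length)
    for l, c in enumerate(components):
        h[delay_of(l) + N * np.arange(len(c))] += c
    return h

h = np.arange(1.0, 13.0)     # toy filter h(n) = n + 1
N = 3
E = type1_components(h, N)   # E_0 = [1, 4, 7, 10], E_1 = [2, 5, 8, 11], E_2 = [3, 6, 9, 12]
R = type2_components(h, N)   # R_0 = E_2, R_1 = E_1, R_2 = E_0
h_from_E = rebuild(E, N, len(h), lambda l: l)
h_from_R = rebuild(R, N, len(h), lambda l: N - 1 - l)
print(np.array_equal(h_from_E, h), np.array_equal(h_from_R, h))
```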
Filtering and Downsampling, Revisited
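As another example of polyphase filtering, let's return to the example of §10.1.3. This time, however, let the FIR lowpass filter $h(n)$ be of length $MN$, where $M$ is a positive integer. The polyphase filters, $E_l(z)$, with $e_l(n) = h(Nn+l)$, are each of length $M$. Recall that
$$H(z) = E_0(z^N) + z^{-1}E_1(z^N) + \cdots + z^{-(N-1)}E_{N-1}(z^N).$$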
Next, we commute the downsampler through the adders and through the upsampled polyphase filters, to obtain Fig.10.13.
Commuting the downsampler through the subphase filters $E_l(z^N)$ to get $E_l(z)$ is an example of a multirate noble identity.
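The same comparison for a longer filter, here of length $MN$, shows the efficient structure at work: every branch filter $E_l(z)$ runs at the low rate on its own subphase signal, and only the branch outputs are summed. This is a sketch under the assumptions of zero initial state and random stand-in taps; the function names are ours.

```python
import numpy as np

def decimate_direct(x, h, N):
    """Reference: full-rate FIR filtering followed by N:1 downsampling."""
    return np.convolve(h, x)[: len(x)][::N]

def decimate_polyphase(x, h, N):
    """Downsamplers commuted ahead of the polyphase filters E_l(z)."""
    n_out = (len(x) + N - 1) // N
    xp = np.concatenate([np.zeros(N - 1), x])    # xp[i] = x(i - N + 1), zero history
    y = np.zeros(n_out)
    for l in range(N):
        subphase = xp[N - 1 - l :: N][:n_out]    # x(Nn - l): branch input after the downsampler
        e_l = h[l::N]                            # E_l(z): e_l(n) = h(Nn + l)
        y += np.convolve(e_l, subphase)[:n_out]  # low-rate filtering, then sum over branches
    return y

rng = np.random.default_rng(3)
N, M = 3, 4
h = rng.standard_normal(M * N)     # stand-in for a length-MN lowpass
x = rng.standard_normal(60)
print(np.allclose(decimate_direct(x, h, N), decimate_polyphase(x, h, N)))
```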
Multirate Noble Identities
Figure 10.14 shows the two so-called ``noble identities'' for commuting downsamplers/upsamplers with ``sparse transfer functions'' which can be expressed as a function of $z^N$: filtering by $G(z^N)$ followed by $N$:1 downsampling is equivalent to $N$:1 downsampling followed by $G(z)$, and 1:$N$ upsampling followed by $G(z^N)$ is equivalent to $G(z)$ followed by 1:$N$ upsampling. Note that downsamplers and upsamplers are linear, time-varying operators. Therefore, operation order is very important. It is also important to note that adders or multipliers (any memoryless operators) can commute across downsamplers and upsamplers, as shown in Fig.10.15.
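Both noble identities can be verified on finite sequences with plain convolutions. In this sketch $G(z)$ is an arbitrary random FIR filter (our assumption), $G(z^N)$ is formed by inserting $N-1$ zeros between its taps, and `upsample` is an illustrative helper.

```python
import numpy as np

def upsample(x, N):
    """1:N upsampler: insert N-1 zeros after each sample."""
    y = np.zeros(N * len(x))
    y[::N] = x
    return y

rng = np.random.default_rng(4)
N = 3
g = rng.standard_normal(5)                           # arbitrary G(z)
g_sparse = upsample(g, N)[: N * (len(g) - 1) + 1]    # G(z^N), trailing zeros trimmed
x = rng.standard_normal(60)

# Identity 1: G(z^N) then N:1 downsampling == N:1 downsampling then G(z)
lhs1 = np.convolve(g_sparse, x)[::N]
rhs1 = np.convolve(g, x[::N])

# Identity 2: 1:N upsampling then G(z^N) == G(z) then 1:N upsampling
lhs2 = np.convolve(g_sparse, upsample(x, N))
rhs2 = upsample(np.convolve(g, x), N)

print(np.allclose(lhs1, rhs1), np.allclose(lhs2, rhs2))
```

Swapping in a filter that is not a function of $z^N$ (for example `g` itself in place of `g_sparse`) breaks both equalities, which is the sense in which the order of operations matters.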
Critically Sampled Perfect Reconstruction Filter Banks
A Perfect Reconstruction (PR) filter bank is any filter bank whose reconstruction is the original signal, possibly delayed, and possibly scaled by a constant. In this context, critical sampling (also called ``maximal downsampling'') means that the downsampling factor is the same as the number of filter channels. For the STFT, this implies a hop size $R$ equal to the FFT size $N$ (with window length $M > N$ for Portnoff windows).
As derived in Chapter 7, the Short-Time Fourier Transform (STFT) is a PR filter bank whenever the Constant-OverLap-Add (COLA) condition is met by the analysis window $w$ and the hop size $R$. However, only the rectangular window case with no zero-padding is critically sampled (OLA hop size = FBS downsampling factor = $N$). Advanced audio compression algorithms (``perceptual audio coding'') are based on critically sampled filter banks, since any oversampling increases the number of samples that must be coded.
Important Point: We normally do not require critical sampling for audio analysis, digital audio effects, and music applications. We normally only need it when compression is a requirement.
Two-Channel Critically Sampled Filter Banks
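Let's begin with a simple two-channel case, with lowpass analysis filter $H_0(z)$, highpass analysis filter $H_1(z)$, lowpass synthesis filter $F_0(z)$, and highpass synthesis filter $F_1(z)$. This system is diagrammed in Fig.10.16. The outputs of the two analysis filters are then
$$X_k(z) = H_k(z)X(z), \quad k = 0, 1.$$
After 2:1 downsampling followed by 1:2 upsampling, channel $k$ becomes $\frac{1}{2}\left[X_k(z) + X_k(-z)\right]$, which is then filtered by $F_k(z)$.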
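After substitutions and rearranging, the output is a filtered replica of the input plus an aliasing term:
$$\hat{X}(z) = \frac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right]X(z) + \frac{1}{2}\left[H_0(-z)F_0(z) + H_1(-z)F_1(z)\right]X(-z).$$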
We require the second term (the aliasing term) to be zero for perfect reconstruction. This is arranged if we set
$$F_0(z) = H_1(-z), \qquad F_1(z) = -H_0(-z).$$
Thus,
 The synthesis lowpass filter $F_0(z)$ is the $\pi$-rotation of the analysis highpass filter $H_1(z)$ on the unit circle. If $H_1(z)$ is highpass, cutting off at $\omega = \pi/2$, then $F_0(z)$ will be lowpass, cutting off at $\pi/2$.
 The synthesis highpass filter $F_1(z)$ is the negative of the $\pi$-rotation of the analysis lowpass filter $H_0(z)$.
For perfect reconstruction, we additionally need
$$\frac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right] = c\,z^{-d},$$
where $c\,z^{-d}$ is any constant $c$ times a linear-phase term corresponding to $d$ samples of delay.
Choosing $F_0$ and $F_1$ to cancel aliasing, this becomes
$$H_0(z)H_1(-z) - H_1(z)H_0(-z) = 2c\,z^{-d}.$$
Perfect reconstruction thus also imposes a constraint on the analysis filters, which is of course true for any band-splitting filter bank.
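Writing the filters evaluated at $-z$ explicitly, both constraints can be expressed in matrix form as
$$\begin{bmatrix}H_0(z) & H_1(z)\\ H_0(-z) & H_1(-z)\end{bmatrix}\begin{bmatrix}F_0(z)\\ F_1(z)\end{bmatrix} = \begin{bmatrix}2c\,z^{-d}\\ 0\end{bmatrix}.$$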
Amplitude-Complementary 2-Channel Filter Bank
Perhaps the most natural choice of analysis filters for our two-channel, critically sampled filter bank is an amplitude-complementary lowpass/highpass pair, i.e.,
$$H_1(z) = 1 - H_0(z).$$
Plugging the COLA constraint into the Filtering and Aliasing Cancellation constraint (10.4) gives
$$H_0(z)\left[1 - H_0(-z)\right] - \left[1 - H_0(z)\right]H_0(-z) = H_0(z) - H_0(-z) = 2c\,z^{-d}.$$
Points to note:
 Even-indexed terms of the impulse response are unconstrained, since they subtract out in the constraint.
 For perfect reconstruction, exactly one odd-indexed term must be nonzero in the lowpass impulse response $h_0(n)$, so that $d$ is odd and $h_0(d) = c$. The simplest choice is $d = 1$, i.e., $h_0(1) = c$.
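The above class of amplitude-complementary filters can be characterized in general as follows:
$$H_0(z) = E_0(z^2) + c\,z^{-d}, \quad d \text{ odd}, \qquad H_1(z) = 1 - H_0(z),$$
where $E_0(z)$ is arbitrary.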
In summary, we see that an amplitude-complementary lowpass/highpass analysis filter pair yields perfect reconstruction (aliasing and filtering cancellation) when there is exactly one nonzero odd-indexed term in the impulse response $h_0(n)$.
Unfortunately, the channel filters are so constrained in form that it is impossible to make a high-quality lowpass/highpass pair. This happens because $E_0(z^2)$ repeats twice around the unit circle. Since we assume real coefficients, the frequency response $E_0(e^{j2\omega})$ is magnitude-symmetric about $\omega = \pi/2$ as well as $\omega = \pi$. This is not good since we only have one degree of freedom, $c\,z^{-d}$, with which we can break the symmetry to reduce the high-frequency gain and/or boost the low-frequency gain. This class of filters cannot be expected to give high-quality lowpass or highpass behavior.
To achieve higher-quality lowpass and highpass channel filters, we will need to relax the amplitude-complementary constraint (and/or filtering cancellation and/or aliasing cancellation) and find another approach.
Haar Example
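Before we leave this case (amplitude-complementary, two-channel, critically sampled, perfect reconstruction filter banks), let's see what happens when $H_0(z)$ is the simplest possible lowpass filter having unity dc gain, i.e.,
$$H_0(z) = \frac{1}{2} + \frac{1}{2}z^{-1}.$$
The polyphase components of $H_0(z)$ are clearly $E_0(z) = E_1(z) = \frac{1}{2}$. The remaining filters follow as $H_1(z) = 1 - H_0(z) = \frac{1}{2} - \frac{1}{2}z^{-1}$, $F_0(z) = H_1(-z) = \frac{1}{2} + \frac{1}{2}z^{-1}$, and $F_1(z) = -H_0(-z) = -\frac{1}{2} + \frac{1}{2}z^{-1}$.
Thus, both the analysis and reconstruction filter banks are scalings of the familiar Haar filters (``sum and difference'' filters $(1 \pm z^{-1})/\sqrt{2}$).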
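The frequency responses are
$$H_0(e^{j\omega}) = \frac{1}{2} + \frac{1}{2}e^{-j\omega} = e^{-j\omega/2}\cos(\omega/2), \qquad H_1(e^{j\omega}) = \frac{1}{2} - \frac{1}{2}e^{-j\omega} = j\,e^{-j\omega/2}\sin(\omega/2),$$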
which are plotted in Fig.10.17.
Polyphase Decomposition of Haar Example
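Let's look at the polyphase representation for this example. Starting with the filter bank and its reconstruction (see Fig.10.18), the polyphase decomposition of the analysis filters is
$$\begin{bmatrix}H_0(z)\\ H_1(z)\end{bmatrix} = \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}\begin{bmatrix}1\\ z^{-1}\end{bmatrix}.$$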
The polyphase representation of the filter bank and its reconstruction can now be drawn as in Fig.10.19. Notice that the reconstruction filter bank is formally the transpose of the analysis filter bank [247]. A filter bank that is inverted by its own transpose is said to be an orthogonal filter bank, a subject to which we will return in §10.3.8.
Commuting the downsamplers (using the noble identities from §10.2.5), we obtain Figure 10.20. Since $\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}$ is the length-2 DFT matrix, this is simply the OLA form of an STFT filter bank for $N=2$, with $M=R=2$ and rectangular window $w=[1,1]$. That is, the DFT size, window length, and hop size are all 2, and both the DFT and its inverse are simply sum-and-difference operations.
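A short simulation confirms the Haar result: with the analysis and synthesis filters above, aliasing cancels exactly and the output is the input scaled by $\frac{1}{2}$ and delayed by one sample, since $\frac{1}{2}[H_0(z)F_0(z) + H_1(z)F_1(z)] = \frac{1}{2}z^{-1}$. The helper names are hypothetical, and $x(n)$ is an arbitrary random test signal.

```python
import numpy as np

def down_then_up(v, N=2):
    """N:1 downsampling followed by 1:N upsampling (zeros in the discarded slots)."""
    out = np.zeros_like(v)
    out[::N] = v[::N]
    return out

def two_channel_bank(x, h0, h1, f0, f1):
    """Analysis filters, 2:1 down, 1:2 up, synthesis filters, and summation."""
    y0 = np.convolve(f0, down_then_up(np.convolve(h0, x)))
    y1 = np.convolve(f1, down_then_up(np.convolve(h1, x)))
    return y0 + y1

h0 = np.array([0.5, 0.5])     # H0(z) = 1/2 + 1/2 z^-1
h1 = np.array([0.5, -0.5])    # H1(z) = 1 - H0(z)
f0 = np.array([0.5, 0.5])     # F0(z) = H1(-z)
f1 = np.array([-0.5, 0.5])    # F1(z) = -H0(-z)

rng = np.random.default_rng(7)
x = rng.standard_normal(16)
y = two_channel_bank(x, h0, h1, f0, f1)
# Expect y(n) = 0.5 * x(n - 1): one sample of delay, gain 1/2, no aliasing
print(np.allclose(y[1:17], 0.5 * x), np.allclose(y[[0, 17]], 0.0))
```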
Quadrature Mirror Filters (QMF)
The well-studied subject of Quadrature Mirror Filters (QMF) is entered by imposing the following symmetry constraint on the analysis filters:
$$H_1(z) = H_0(-z). \tag{10.6}$$
That is, the filter for channel 1 is constrained to be a $\pi$-rotation of filter 0 along the unit circle. In the time domain, $h_1(n) = (-1)^n h_0(n)$, i.e., all odd-index coefficients are negated. If $H_0(z)$ is a lowpass filter cutting off near $\omega = \pi/2$ (as is typical), then $H_1(z)$ is a complementary highpass filter. The exact cutoff frequency can be adjusted along with the roll-off rate to provide a maximally constant frequency-response sum.
Two-channel QMFs have been around since at least 1976 [51], and appear to be the first critically sampled perfect reconstruction filter banks. Historically, the term QMF applied only to two-channel filter banks having the QMF symmetry constraint (10.6). Today, the term ``QMF filter bank'' may refer to more general PR filter banks with any number of channels and not obeying (10.6) [266].
Combining the QMF symmetry constraint with the aliasing-cancellation constraints, given by
$$F_0(z) = H_1(-z) = H_0(z), \qquad F_1(z) = -H_0(-z) = -H_1(z),$$
the perfect reconstruction requirement reduces to
$$H_0^2(z) - H_0^2(-z) = 2c\,z^{-d}.$$
Now, all four filters are determined by $H_0(z)$.
It is easy to show using the polyphase representation of $H_0(z)$ (see [266]) that the only causal FIR QMF analysis filters yielding exact perfect reconstruction are two-tap FIR filters of the form
$$H_0(z) = c_0\,z^{-2n_0} + c_1\,z^{-(2n_1+1)},$$
where $c_0$ and $c_1$ are constants, and $n_0$ and $n_1$ are integers. Therefore, only weak channel filters are available in the QMF case ($H_1(z) = H_0(-z)$), as we saw in the amplitude-complementary case. On the other hand, very high-quality IIR solutions are possible. See [266, pp. 201-204] for details. In practice, approximate ``pseudo QMF'' filters, which give only approximate perfect reconstruction, are more practical. We'll return to this topic in §10.7.1.
The Haar filters, which we saw gave perfect reconstruction in the amplitude-complementary case, are also examples of a QMF filter bank:
$$H_0(z) = 1 + z^{-1}, \qquad H_1(z) = H_0(-z) = 1 - z^{-1}.$$
In this example, $c_0 = c_1 = 1$, and $n_0 = n_1 = 0$.
Linear Phase Quadrature Mirror Filter Banks
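Linear-phase filters delay all frequencies by equal amounts, and this is often a desirable property in audio and other applications. A filter phase response is linear in $\omega$ whenever its impulse response is symmetric about its midpoint, i.e., for a length-$L$ impulse response,
$$h(n) = h(L-1-n), \quad n = 0, 1, \ldots, L-1.$$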
Conjugate Quadrature Filters (CQF)
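A class of causal, FIR, two-channel, critically sampled, exact perfect-reconstruction filter banks is the set of so-called Conjugate Quadrature Filters (CQF). In the $z$-domain, the CQF relationships are
$$H_1(z) = -z^{-(L-1)}H_0(-z^{-1}), \qquad F_0(z) = z^{-(L-1)}H_0(z^{-1}), \qquad F_1(z) = z^{-(L-1)}H_1(z^{-1}),$$
where $L$ is the (even) length of $h_0(n)$. These choices satisfy the aliasing-cancellation conditions $F_0(z) = H_1(-z)$ and $F_1(z) = -H_0(-z)$.
That is, $f_0(n) = h_0(L-1-n)$ for the lowpass channel, and the highpass channel filters are a modulation of their lowpass counterparts by $(-1)^n$. Again, all four analysis and synthesis filters are determined by the lowpass analysis filter $H_0(z)$. It can be shown that this is an orthogonal filter bank. The analysis filters $H_0(z)$ and $H_1(z)$ are power complementary, i.e.,
$$\left|H_0(e^{j\omega})\right|^2 + \left|H_1(e^{j\omega})\right|^2 = 2c.$$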
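With the CQF constraints, Eq.(10.1) reduces to
$$H_0(z)H_0(z^{-1}) + H_0(-z)H_0(-z^{-1}) = 2c.$$
Let $P(z) \triangleq H_0(z)H_0(z^{-1})$, such that $H_0(z)$ is a spectral factor of the half-band filter $P(z)$ (i.e., $P(e^{j\omega}) = |H_0(e^{j\omega})|^2$ is a nonnegative power response which is lowpass, cutting off near $\omega = \pi/2$). Then, (10.8) reduces to
$$P(z) + P(-z) = 2c.$$
The problem of PR filter design has thus been reduced to designing one half-band filter, $P(z)$. It can be shown that any such half-band filter can be written in the form
$$P(z) = c + \sum_{n \text{ odd}} p(n)\,z^{-n}.$$
That is, all even-indexed values $p(2n)$ with $n \neq 0$ are zero.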
A simple design of an FIR half-band filter would be to window a sinc function:
$$p(n) = w(n)\,\frac{\sin(\pi n/2)}{\pi n}, \tag{11.10}$$
with $p(0) = w(0)/2$ by continuity, where $w(n)$ is any suitable window, such as the Kaiser window.
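The half-band property of the windowed sinc can be checked directly. In the sketch below, the half-length $K = 15$ and the Kaiser parameter $\beta = 8$ are arbitrary illustrative choices (our assumptions, not values from the text); `np.kaiser` supplies $w(n)$. Even-indexed taps other than $p(0)$ vanish, so $P(e^{j\omega}) + P(e^{j(\omega+\pi)}) = 2p(0)$ at every frequency.

```python
import numpy as np

K = 15                                  # p(n) is defined for n = -K, ..., K (illustrative)
n = np.arange(-K, K + 1)
w = np.kaiser(2 * K + 1, 8.0)           # Kaiser window, beta = 8 (illustrative choice)
p = w * 0.5 * np.sinc(n / 2)            # = w(n) sin(pi n / 2) / (pi n); p(0) = w(0) / 2

# Even-indexed taps other than p(0) vanish (up to round-off)
even_nonzero = (n % 2 == 0) & (n != 0)
print(np.max(np.abs(p[even_nonzero])))

# Half-band property: P(e^{jw}) + P(e^{j(w + pi)}) = 2 p(0) at every frequency
omega = np.linspace(0, np.pi, 101)
E = np.exp(-1j * np.outer(omega, n))
P = E @ p
P_shifted = (E * (-1.0) ** n) @ p       # evaluates P(e^{j(w + pi)})
print(np.allclose(P + P_shifted, 2 * p[K]))
```

Note that a windowed half-band $P(e^{j\omega})$ may dip slightly below zero in its stopband, so it may need adjustment before it can be spectrally factored into $H_0(z)H_0(z^{-1})$.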
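Note that as a result of (10.8), the CQF filters are power complementary. They are not, however, linear phase: apart from trivial two-tap cases such as the Haar filters, no two-channel, critically sampled, perfect-reconstruction filter bank with real coefficients can be simultaneously: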
 FIR
 orthogonal
 linear phase
By relaxing ``orthogonality'' to ``biorthogonality'', it becomes possible to obtain FIR linear phase filters in a critically sampled, perfect reconstruction filter bank. (See §10.9.)
Orthogonal Two-Channel Filter Banks
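Recall the reconstruction equation for the two-channel, critically sampled, perfect-reconstruction filter bank:
$$\hat{X}(z) = \frac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right]X(z) + \frac{1}{2}\left[H_0(-z)F_0(z) + H_1(-z)F_1(z)\right]X(-z).$$
This can be written in matrix form as
$$\hat{X}(z) = \frac{1}{2}\begin{bmatrix}X(z) & X(-z)\end{bmatrix}\begin{bmatrix}H_0(z) & H_1(z)\\ H_0(-z) & H_1(-z)\end{bmatrix}\begin{bmatrix}F_0(z)\\ F_1(z)\end{bmatrix},$$
where the $2\times 2$ matrix of analysis filters is called the alias component (AC) matrix.
It turns out that orthogonal filter banks give perfect reconstruction for any number of channels. Orthogonal filter banks are also called paraunitary filter banks, which we'll study in polyphase form in §10.5 below. The AC matrix is paraunitary if and only if the polyphase matrix (defined in the next section) is paraunitary [266].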
Perfect Reconstruction Filter Banks
We now consider filter banks with an arbitrary number of channels and ask under what conditions we obtain a perfect reconstruction filter bank. Polyphase analysis will give us the answer readily. Let's begin with the $N$-channel filter bank in Fig.10.21. The downsampling factor is $R$. For critical sampling, we set $R = N$.
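The next step is to expand each analysis filter $H_k(z)$ into its $N$-channel ``Type I'' polyphase representation:
$$H_k(z) = \sum_{l=0}^{N-1} z^{-l}E_{kl}(z^N).$$
Similarly, expand the synthesis filters in a Type II polyphase decomposition:
$$F_k(z) = \sum_{l=0}^{N-1} z^{-(N-1-l)}R_{lk}(z^N).$$
The polyphase representation can now be depicted as shown in Fig.10.22. When $R = N$, commuting the up/downsamplers gives the result shown in Fig.10.23. We call $\mathbf{E}(z) = [E_{kl}(z)]$ the polyphase matrix, and $\mathbf{R}(z) = [R_{lk}(z)]$ the synthesis polyphase matrix.
As we will show below, the above simplification can be carried out more generally whenever $R$ divides $N$ (e.g., $R = N/2$). In these cases, $\mathbf{E}(z)$ becomes $\mathbf{E}(z^{N/R})$ and $\mathbf{R}(z)$ becomes $\mathbf{R}(z^{N/R})$.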
Simple Examples of Perfect Reconstruction
If we can arrange to have
$$\mathbf{R}(z)\,\mathbf{E}(z) = \mathbf{I}_N,$$
then the polyphase matrices cancel, leaving only the delay chains, downsamplers, and upsamplers.
Thus, when $\mathbf{R}(z)\mathbf{E}(z) = \mathbf{I}_N$ and $R = N$, we have a simple parallelizer/serializer, which is perfect-reconstruction by inspection: Referring to Fig.10.24, think of the input samples as ``filling'' a length-$N$ delay line over $N$ sample clocks. At time 0, the downsamplers and upsamplers ``fire'', transferring $x(0)$ (and zeros) from the delay line to the output delay chain, summing with zeros. Over the next $N-1$ clocks, $x(0)$ makes its way toward the output, and zeros fill in behind it in the output delay chain. Simultaneously, the input buffer is being filled with samples of $x$. At time $N-1$, $x(0)$ makes it to the output. At time $N$, the downsamplers ``fire'' again, transferring a length-$N$ ``buffer'' $[x(1), x(2), \ldots, x(N)]$ to the upsamplers. On the same clock pulse, the upsamplers also ``fire'', transferring $N$ samples to the output delay chain. The bottommost sample, $x(1)$, goes out immediately at time $N$. Over the next $N-1$ sample clocks, the length-$N$ output buffer will be ``drained'' and refilled by zeros. Simultaneously, the input buffer will be replaced by new samples of $x$. At time $2N$, the downsamplers and upsamplers ``fire'', and the process goes on, repeating with period $N$. The output of the $N$-way parallelizer/serializer is therefore
$$y(n) = x(n-N+1).$$
Sliding Polyphase Filter Bank
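When $R = 1$, there is no downsampling or upsampling, and the system further reduces to the case shown in Fig.10.25. Working backward along the output delay chain, the output sum can be written as
$$Y(z) = \sum_{l=0}^{N-1} z^{-(N-1-l)}\,z^{-l}X(z) = N z^{-(N-1)}X(z).$$
Thus, when $R = 1$, the output is
$$y(n) = N\,x(n-N+1).$$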
Hopping Polyphase Filter Bank
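When $1 < R < N$ and $R$ divides $N$, we have, by a similar analysis (exactly $N/R$ of the $N$ branches deliver a sample at each output time),
$$y(n) = \frac{N}{R}\,x(n-N+1).$$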
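The three delay-chain results ($R = N$, $R = 1$, and general $R$ dividing $N$) can be checked with one small simulation. The sketch below assumes the identity polyphase matrix ($\mathbf{R}(z)\mathbf{E}(z) = \mathbf{I}_N$) and zero initial state, and uses a hypothetical helper `delay_chain_bank`; it is a direct sample-domain transcription of the block diagram, not an efficient implementation.

```python
import numpy as np

def delay_chain_bank(x, N, R):
    """Filter bank with R(z)E(z) = I: Type I analysis delays z^{-l}, R:1 downsamplers,
    1:R upsamplers, and Type II synthesis delays z^{-(N-1-l)}."""
    if N % R != 0:
        raise ValueError("R must divide N")
    x = np.asarray(x, dtype=float)
    total = len(x) + 2 * N                 # room for the N-1 sample overall delay
    y = np.zeros(total)
    for l in range(N):
        delayed = np.zeros(total)
        delayed[l:l + len(x)] = x          # analysis delay z^{-l}
        branch = np.zeros(total)
        branch[::R] = delayed[::R]         # R:1 downsample, then 1:R upsample
        d = N - 1 - l                      # synthesis delay z^{-(N-1-l)}
        y[d:] += branch[:total - d]
    return y

rng = np.random.default_rng(6)
x = rng.standard_normal(40)
N = 6
for R in (6, 3, 2, 1):
    y = delay_chain_bank(x, N, R)
    # Expect y(n) = (N/R) x(n - N + 1)
    print(R, np.allclose(y[N - 1:N - 1 + len(x)], (N / R) * x))
```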
Sufficient Condition for Perfect Reconstruction
Above, we found that, for any integer $R$ which divides $N$, a sufficient condition for perfect reconstruction is
$$\mathbf{R}(z)\,\mathbf{E}(z) = c\,z^{-K}\,\mathbf{I}_N,$$
where $c$ is any constant and $K$ is any nonnegative integer. In this case, the output signal is
$$y(n) = c\,\frac{N}{R}\,x(n - KN - N + 1).$$
Thus, given any polyphase matrix $\mathbf{E}(z)$, we can attempt to compute $\mathbf{R}(z) = \mathbf{E}^{-1}(z)$: if it is stable, we can use it to build a perfect-reconstruction filter bank. However, if $\mathbf{E}(z)$ is FIR, $\mathbf{R}(z)$ will typically be IIR. In §10.5 below, we will look at paraunitary filter banks, for which $\mathbf{R}(z)$ is FIR and paraunitary whenever $\mathbf{E}(z)$ is.
Necessary and Sufficient Conditions for Perfect Reconstruction
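It can be shown [266] that, for critical sampling ($R = N$), the most general conditions for perfect reconstruction are that
$$\mathbf{R}(z)\,\mathbf{E}(z) = c\,z^{-K}\begin{bmatrix}\mathbf{0} & \mathbf{I}_{N-r}\\ z^{-1}\mathbf{I}_r & \mathbf{0}\end{bmatrix}$$
for some constant $c$, integer $K \ge 0$, and integer $r$ with $0 \le r \le N-1$.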
Note that the more general form of above can be regarded as a (nonunique) square root of a vector unit delay, since
Polyphase View of the STFT
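As a familiar special case, set
$$\mathbf{E}(z) = \mathbf{W}, \qquad \mathbf{R}(z) = \mathbf{W}^{-1} = \frac{1}{N}\mathbf{W}^*,$$
where $\mathbf{W} = \left[W_N^{kl}\right]$ is the $N\times N$ DFT matrix and $W_N \triangleq e^{-j2\pi/N}$.
The channel analysis and synthesis filters are, respectively,
$$H_k(z) = \sum_{l=0}^{N-1} W_N^{kl}\,z^{-l}, \qquad F_k(z) = \frac{1}{N}\sum_{l=0}^{N-1} W_N^{-kl}\,z^{-(N-1-l)},$$
where $k = 0, 1, \ldots, N-1$.
Looking again at the polyphase representation of the $N$-channel filter bank with hop size $R$, $\mathbf{E}(z) = \mathbf{W}$, $\mathbf{R}(z) = \mathbf{W}^{-1}$, and $R$ dividing $N$, we have the system shown in Fig.10.26. Following the same analysis as in §10.4.1 leads to the following conclusion:
$$y(n) = \frac{N}{R}\,x(n-N+1).$$
Our analysis showed that the STFT using a rectangular window is a perfect reconstruction filter bank for every integer hop size $R$ that divides $N$. The same type of analysis can be applied to the STFT using the other windows we've studied, including Portnoff windows.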
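The STFT case can be simulated the same way, with the DFT and inverse DFT inserted between the downsamplers and upsamplers. The sketch below (hypothetical helper `rect_stft_polyphase`, zero initial state, `np.fft.fft`/`np.fft.ifft` playing the roles of $\mathbf{W}$ and $\mathbf{W}^{-1}$) reads the analysis delay line every $R$ samples and feeds the synthesis delay chain; the output should be $\frac{N}{R}x(n-N+1)$ for each hop size $R$ dividing $N$.

```python
import numpy as np

def rect_stft_polyphase(x, N, R):
    """E(z) = DFT matrix, R(z) = inverse DFT matrix, hop R dividing N (rectangular window)."""
    if N % R != 0:
        raise ValueError("R must divide N")
    x = np.asarray(x, dtype=float)
    total = len(x) + 2 * N
    xp = np.zeros(total)
    xp[:len(x)] = x
    y = np.zeros(total, dtype=complex)
    for n in range(0, total - N + 1, R):            # downsampler "firing" times
        # analysis delay line contents [x(n), x(n-1), ..., x(n-N+1)], zero before time 0
        buf = np.array([xp[n - l] if n - l >= 0 else 0.0 for l in range(N)])
        spectrum = np.fft.fft(buf)                  # analysis polyphase matrix: DFT
        u = np.fft.ifft(spectrum)                   # synthesis polyphase matrix: inverse DFT
        for l in range(N):
            y[n + N - 1 - l] += u[l]                # synthesis delay z^{-(N-1-l)}
    return y.real

rng = np.random.default_rng(9)
x = rng.standard_normal(48)
N = 8
for R in (8, 4, 2, 1):
    y = rect_stft_polyphase(x, N, R)
    print(R, np.allclose(y[N - 1:N - 1 + len(x)], (N / R) * x))
```

With $R = N$ this is the critically sampled, rectangular-window STFT; smaller hop sizes simply add $N/R$ identical copies.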
Example: Polyphase Analysis of the STFT with 50% Overlap, Zero-Padding, and a Non-Rectangular Window
Figure 10.27 illustrates how a window and a hop size other than $N$ can be introduced into the polyphase representation of the STFT. The constant-overlap-add of the window $w(n)$ is implemented in the synthesis delay chain (which is technically the transpose of a tapped delay line). The downsampling factor $R$ and window $w$ must be selected together to give constant overlap-add, independent of the choice of polyphase matrices $\mathbf{E}(z)$ and $\mathbf{R}(z)$ (shown here as the DFT and IDFT).
Example: Polyphase Analysis of the Weighted Overlap-Add Case: 50% Overlap, Zero-Padding, and a Non-Rectangular Window
We may convert the previous example to a weighted overlap-add (WOLA) (§7.6) filter bank by replacing each $w(n)$ by $\sqrt{w(n)}$ and introducing these gains also between the IDFT and the upsamplers.
Paraunitary Filter Banks
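Paraunitary filter banks form an interesting subset of perfect reconstruction (PR) filter banks. We saw above that we get a PR filter bank whenever the synthesis polyphase matrix $\mathbf{R}(z)$ times the analysis polyphase matrix $\mathbf{E}(z)$ is the identity matrix $\mathbf{I}$, i.e., when
$$\mathbf{R}(z)\,\mathbf{E}(z) = \mathbf{I}.$$
In a paraunitary filter bank, this is achieved by taking $\mathbf{R}(z)$ to be the paraconjugate of $\mathbf{E}(z)$.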
Paraconjugation
Paraconjugation is the generalization of the complex conjugate transpose operation from the unit circle to the entire $z$ plane. A paraunitary filter bank is therefore a generalization of an orthogonal filter bank. Recall that an orthogonal filter bank is one in which $\mathbf{E}(z)$ is an orthogonal (or unitary) matrix, to within a constant scale factor, and $\mathbf{R}(z)$ is its transpose (or Hermitian transpose).
Lossless Filters
To motivate the idea of paraunitary filters, let's first review some properties of lossless filters, progressing from the simplest cases up to paraunitary filter banks:
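 A linear, time-invariant filter $G(z)$ is said to be lossless (or allpass) if it preserves signal energy. That is, if the input signal is $x(n)$, and the output signal is $y(n) = (g * x)(n)$, then we have
$$\sum_{n=-\infty}^{\infty}|y(n)|^2 = \sum_{n=-\infty}^{\infty}|x(n)|^2.$$
 It is straightforward to show that losslessness implies
$$\left|G(e^{j\omega})\right| = 1, \quad \forall\omega.$$
 The paraconjugate of a transfer function may be defined as the analytic continuation of the complex conjugate from the unit circle to the whole $z$ plane:
$$\tilde{G}(z) \triangleq \overline{G}(z^{-1}),$$
where $\overline{G}$ denotes $G$ with its coefficients conjugated (but not $z$).
We refrain from conjugating $z$ in the definition of the paraconjugate because $\overline{z}$ is not analytic in the complex-variables sense. Instead, we invert $z$, which is analytic, and which reduces to complex conjugation on the unit circle.
The paraconjugate may be used to characterize allpass filters as follows:
 A causal, stable filter $G(z)$ is allpass if and only if
$$\tilde{G}(z)\,G(z) = 1.$$
To generalize lossless filters to the multi-input, multi-output (MIMO) case, we must generalize conjugation to MIMO transfer function matrices:
 A $p\times q$ transfer function matrix $\mathbf{H}(z)$ is said to be lossless if it is stable and its frequency-response matrix $\mathbf{H}(e^{j\omega})$ is unitary. That is,
$$\mathbf{H}^*(e^{j\omega})\,\mathbf{H}(e^{j\omega}) = \mathbf{I}_q, \quad \forall\omega,$$
where $\mathbf{H}^*$ denotes the conjugate transpose.
 Note that $\mathbf{H}^*(e^{j\omega})\mathbf{H}(e^{j\omega})$ is a $q\times q$ matrix product of a $q\times p$ times a $p\times q$ matrix. If $q > p$, then the rank must be deficient. Therefore, we must have $p \ge q$. (There must be at least as many outputs as there are inputs, but it's ok to have extra outputs.)
 A lossless transfer function matrix $\mathbf{H}(z)$ is paraunitary, i.e.,
$$\tilde{\mathbf{H}}(z)\,\mathbf{H}(z) = \mathbf{I}_q,$$
where the paraconjugate of a matrix is $\tilde{\mathbf{H}}(z) \triangleq \overline{\mathbf{H}}^T(z^{-1})$.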
Lossless Filter Examples
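 The simplest lossless filter is a unit-modulus gain
$$H(z) = e^{j\phi}.$$
 A lossless FIR filter can only consist of a single nonzero tap:
$$H(z) = e^{j\phi}z^{-K}.$$
 Every finite-order, single-input, single-output (SISO), lossless IIR filter (recursive allpass filter) can be written as
$$H(z) = e^{j\phi}z^{-K}\,\frac{z^{-N}\tilde{A}(z)}{A(z)},$$
where $A(z) = 1 + a_1z^{-1} + \cdots + a_Nz^{-N}$ has all of its roots inside the unit circle.
 The normalized DFT matrix is an order-zero paraunitary transformation. This is because the normalized DFT matrix, $\frac{1}{\sqrt{N}}\mathbf{W}$, where $\mathbf{W} = \left[W_N^{nk}\right]$ and $W_N \triangleq e^{-j2\pi/N}$, is a unitary matrix:
$$\left(\frac{1}{\sqrt{N}}\mathbf{W}\right)^*\left(\frac{1}{\sqrt{N}}\mathbf{W}\right) = \mathbf{I}_N.$$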
Properties of Paraunitary Systems
Paraunitary systems are essentially multi-input, multi-output (MIMO) allpass filters. Let $\mathbf{H}(z)$ denote the $p\times q$ matrix transfer function of a paraunitary system. In the square case ($p = q$), the matrix determinant, $\det\mathbf{H}(z)$, is an allpass filter. Therefore, if a square $\mathbf{H}(z)$ contains FIR elements, its determinant is a simple delay: $\det\mathbf{H}(z) = e^{j\phi}z^{-K}$ for some integer $K$.
Properties of Paraunitary Filter Banks
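An $N$-channel analysis filter bank can be viewed as an $N\times 1$ MIMO filter:
$$\mathbf{H}(z) = \begin{bmatrix}H_0(z)\\ H_1(z)\\ \vdots\\ H_{N-1}(z)\end{bmatrix}.$$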
We can note the following properties of paraunitary filter banks:
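 The synthesis filter bank is simply the paraconjugate of the analysis filter bank:
$$\mathbf{F}(z) = \tilde{\mathbf{H}}(z) = \left[\tilde{H}_0(z), \tilde{H}_1(z), \ldots, \tilde{H}_{N-1}(z)\right].$$
 The channel filters are power complementary:
$$\sum_{k=0}^{N-1}\left|H_k(e^{j\omega})\right|^2 = \text{constant}.$$
 When $\mathbf{H}(z)$ is FIR, the corresponding synthesis filter matrix $\mathbf{F}(z)$ is also FIR (after adding enough delay to make it causal).
 When $\mathbf{H}(z)$ is FIR, each synthesis filter, $F_k(z)$, is simply the flip of its corresponding analysis filter $H_k(z)$:
$$f_k(n) = \overline{h_k(L-1-n)},$$
where $L$ is the filter length (the conjugation has no effect for real filters).
 FIR analysis and synthesis filters in paraunitary filter banks have the same amplitude response. This follows from the fact that $\overline{h(-n)} \leftrightarrow \overline{H(e^{j\omega})}$, i.e., flipping an FIR filter impulse response conjugates the frequency response, which does not affect its amplitude response $|H(e^{j\omega})|$.
 The polyphase matrix $\mathbf{E}(z)$ for any FIR paraunitary perfect reconstruction filter bank can be written as the product of a paraunitary and a unimodular matrix, where a unimodular polynomial matrix $\mathbf{U}(z)$ is any square polynomial matrix having a constant nonzero determinant. For example,
$$\mathbf{U}(z) = \begin{bmatrix}1 & z^{-1}\\ 0 & 1\end{bmatrix}$$
is unimodular, since $\det\mathbf{U}(z) = 1$.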
Paraunitary Examples
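Consider the Haar filter bank discussed previously, for which
$$\mathbf{E}(z) = \frac{1}{2}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}.$$
Since $\tilde{\mathbf{E}}(z)\mathbf{E}(z) = \mathbf{E}^T\mathbf{E} = \frac{1}{2}\mathbf{I}_2$, the Haar filter bank is paraunitary to within a constant scale factor.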
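A quick numerical check of the Haar case: the constant polyphase matrix satisfies $\mathbf{E}^T\mathbf{E} = \frac{1}{2}\mathbf{I}_2$ (for a real constant matrix the paraconjugate is just the transpose), and the channel filters are power complementary, since $|H_0(e^{j\omega})|^2 + |H_1(e^{j\omega})|^2 = \cos^2(\omega/2) + \sin^2(\omega/2) = 1$.

```python
import numpy as np

E = 0.5 * np.array([[1.0, 1.0], [1.0, -1.0]])   # Haar polyphase matrix (constant in z)
print(E.T @ E)                                   # 0.5 times the 2x2 identity

omega = np.linspace(0, np.pi, 101)
H0 = 0.5 + 0.5 * np.exp(-1j * omega)             # H0(z) = 1/2 + 1/2 z^-1
H1 = 0.5 - 0.5 * np.exp(-1j * omega)             # H1(z) = 1/2 - 1/2 z^-1
print(np.allclose(np.abs(H0)**2 + np.abs(H1)**2, 1.0))   # power complementary
```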
For more about paraunitary filter banks, see Chapter 6 of [266].
Filter Banks Equivalent to STFTs
We now turn to various practical examples of perfect reconstruction filter banks, with emphasis on those using the FFT in their implementation (i.e., various STFT filter banks).
Figure 10.29 illustrates a generic filter bank with $N$ channels, much like we derived in §8.3. The analysis filters $H_k(z)$, $k = 0, \ldots, N-1$, are bandpass filters derived from a lowpass prototype $H_0(z)$ by modulation (e.g., $h_k(n) = h_0(n)e^{j\omega_k n}$, with $\omega_k = 2\pi k/N$), as shown in the right portion of the figure. The channel signals are given by the convolution of the input signal with the $k$th channel impulse response:
$$x_k(n) = (h_k * x)(n) = \sum_{m=-\infty}^{\infty} x(m)\,h_0(n-m)\,e^{j\omega_k(n-m)} = e^{j\omega_k n}\sum_{m=-\infty}^{\infty}\left[x(m)\,h_0(n-m)\right]e^{-j\omega_k m} = e^{j\omega_k n}X_n(\omega_k).$$
From Chapter 8, we recognize this expression as the sliding-window STFT, where $h_0(n-m)$ is the flip of a sliding window ``centered'' at time $n$, and $X_n(\omega_k)$ is the $k$th DFT bin at time $n$. We also know from that discussion that remodulating the DFT channel outputs and summing gives perfect reconstruction of $x(n)$ whenever $h_0(n)$ is Nyquist($N$) (the defining condition for Portnoff windows [202] in §8.7).
Suppose the analysis window $w$ (the flip of the baseband-filter impulse response $h_0$) is of length $L > N$. Then in the context of overlap-add processors (Chapter 7), $w$ is a Portnoff window, and implementing the window with a length-$N$ FFT requires that the windowed data frame be time-aliased down to length $N$ prior to taking the length-$N$ FFT (see §8.7). We can obtain this same result via polyphase analysis, as elaborated in the next section.
Polyphase Analysis of Portnoff STFT
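Consider the $k$th filter-bank channel filter
$$H_k(z) = \sum_{n=0}^{L-1} h_0(n)\,W_N^{-kn}\,z^{-n}, \qquad W_N \triangleq e^{-j2\pi/N}.$$
Writing $n = Nm + l$ and noting that $W_N^{-k(Nm+l)} = W_N^{-kl}$, we obtain
$$H_k(z) = \sum_{l=0}^{N-1} W_N^{-kl}\,z^{-l}\,E_l(z^N), \qquad E_l(z) \triangleq \sum_{m} h_0(Nm+l)\,z^{-m}.$$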
If $h_0$ is a good $N$th-band lowpass, the subband signals $x_k(n)$ are bandlimited to a region of width $2\pi/N$. As a result, there is negligible aliasing when we downsample each of the subbands by $N$. Commuting the downsamplers to get an efficient implementation gives Fig.10.30.
First note that if $E_l(z) = 1$ for all $l$, the system of Fig.10.30 reduces to a rectangularly windowed STFT in which the window length equals the DFT length $N$ ($N = 3$ in the figure). The downsamplers ``hold off'' the DFT until the length-3 delay line fills with new input samples, and then it ``fires'' to produce a spectral frame. A new spectral frame is produced after every third sample of input data is received.
In the more general case in which the $E_l(z)$ are nontrivial filters, such as $E_l(z) = h_0(l) + h_0(N+l)\,z^{-1}$, for example, they can be seen to compute the equivalent of a time-aliased windowed input frame, such as $h_0(l)\,x(n-l) + h_0(N+l)\,x(n-l-N)$. This follows because the filters operate on the downsampled input stream, so that the filter coefficients operate on signal samples separated by $N$ samples. The linear combination of these samples by the filter implements the time-aliased windowed data frame in a Portnoff-windowed overlap-add STFT. Taken together, the polyphase filters compute the appropriately time-aliased data frame windowed by $h_0$.
In the overlap-add interpretation of Fig.10.30, the window is hopped by $N$ samples. While this was the entire window length in the rectangular window case ($E_l(z) = 1$), it is only a portion of the effective frame length $L$ when the polyphase filters $E_l(z)$ have order 1 or greater.
MPEG Filter Banks
This section provides some highlights of the history of filter banks used for perceptual audio coding (MPEG audio). For a more complete introduction and discussion of MPEG filter banks, see [16].
Pseudo-QMF Cosine Modulation Filter Bank
Section 10.3.5 introduced two-channel quadrature mirror filter banks (QMF). We found that the quadrature mirror constraint on the analysis filters, $H_1(z) = H_0(-z)$, limits exact perfect-reconstruction FIR solutions to two-tap filters.
The Pseudo-QMF (PQMF) filter bank is a ``near perfect reconstruction'' filter bank in which aliasing cancellation occurs only between adjacent bands [183,266]. The PQMF filters commonly used in perceptual audio coders employ bandpass filters with stopband attenuation near 96 dB, so the neglected bands (which alias freely) are not significant. The design procedure is as follows:
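 Design a lowpass prototype window, $h(n)$, with length $L = KN$, where $K$ is a positive integer.
 The lowpass design is constrained to give aliasing cancellation in neighboring subbands:
$$\left|H(e^{j\omega})\right|^2 + \left|H(e^{j(\pi/N - \omega)})\right|^2 = 1, \quad 0 < \omega < \pi/N.$$
 The filter bank analysis filters are cosine modulations of $h(n)$:
$$h_k(n) = 2h(n)\cos\left[\frac{\pi}{N}\left(k+\frac{1}{2}\right)\left(n - \frac{L-1}{2}\right) + \phi_k\right], \qquad \phi_k = (-1)^k\frac{\pi}{4}, \qquad k = 0, \ldots, N-1.$$
 Since it is an orthogonal filter bank by construction, the synthesis filters are simply the time reverse of the analysis filters:
$$f_k(n) = h_k(L-1-n).$$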
Perfect Reconstruction Cosine Modulated Filter Banks
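By suitably changing the phases $\phi_k$, the pseudo-QMF filter bank can be made to yield perfect reconstruction.
If $L = 2N$, then this is the oddly-stacked Princen-Bradley filter bank, and the analysis filters are related by cosine modulations of the lowpass prototype:
$$h_k(n) = h(n)\cos\left[\frac{\pi}{N}\left(n + \frac{N+1}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1.$$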
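The sketch below checks perfect reconstruction for a critically sampled cosine-modulated bank of this kind with $L = 2N$ (an MDCT, also known as the modulated lapped transform). Two choices are our assumptions rather than the text's: the prototype is the sine window $h(n) = \sin[\pi(n+\frac{1}{2})/(2N)]$, which satisfies the Princen-Bradley condition $h^2(n) + h^2(n+N) = 1$, and each basis function is scaled by $\sqrt{2/N}$ so that analysis and synthesis share one matrix (synthesis by its transpose). Samples covered by two overlapping frames are reconstructed exactly; the first and last $N$ samples have no partner frame.

```python
import numpy as np

def mlt_matrix(N):
    """N x 2N analysis matrix with rows sqrt(2/N) h(n) cos[pi/N (n + (N+1)/2)(k + 1/2)]."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    h = np.sin(np.pi * (n + 0.5) / (2 * N))        # sine window (Princen-Bradley)
    return np.sqrt(2.0 / N) * h * np.cos(np.pi / N * (n + (N + 1) / 2) * (k + 0.5))

def analyze_and_resynthesize(x, N):
    """Hop N, frame length 2N: N coefficients per N new samples (critical sampling)."""
    T = mlt_matrix(N)
    y = np.zeros(len(x))
    for start in range(0, len(x) - 2 * N + 1, N):
        coeffs = T @ x[start:start + 2 * N]          # analysis
        y[start:start + 2 * N] += T.T @ coeffs       # synthesis (transpose) + overlap-add
    return y

N = 8
x = np.random.default_rng(8).standard_normal(10 * N)
y = analyze_and_resynthesize(x, N)
print(np.allclose(y[N:-N], x[N:-N]))   # time-domain aliasing cancels between frames
```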
MPEG Layer III Filter Bank
In MPEG 1& 2, Layer III (the popular ``MP3 format''),
The original MPEG 1&2, Layers I and II, based on the MUSICAM coder, contained only 32 subbands (each band approximately 650 Hz wide, implemented using a length-512 lowpass-prototype window, lapped (``time aliased'') by a factor of 512/32 = 16, thus yielding 32 real bands with 96 dB of stopband rejection, and having a hop size of 32 samples) [140, §4.1.1]. It was found, however, that a higher coding gain was obtained using a finer frequency resolution. As a result, the MPEG 1&2 Layer III coder (based on the ASPEC coder from AT&T) appended a Princen-Bradley filter bank [203] having 6 to 18 subbands to the output of each subband of the 32-channel PQMF cosine-modulated analysis filter bank [140, §4.1.2]. The number of subbands and window shape were chosen to be signal-dependent as follows:
 Transients use 6 subbands, corresponding to relatively high time resolution and low frequency resolution.
 Steady-state tones use 18 subbands, corresponding to higher frequency resolution and lower time resolution relative to transients.^{11.1}
 The encoder generates a function called the perceptual entropy (PE) which tells the coder when to switch resolutions.
The MPEG AAC coder is generally regarded as providing nearly twice the compression ratio of ``MP3'' (MPEG 1&2 Layer III) coding at the same quality level.^{11.2} MPEG AAC introduced a new MDCT filter bank that adaptively switched between 128 and 1024 bands (length 256 and 2048 FFT windows, using 50% overlap) [140, §4.1.6]. The nearly doubled number of frequency bands available for coding steady-state signal intervals contributed much to the increased coding gain of AAC over MP3. The 128/1024 MDCT filter bank in AAC is also considerably simpler than the hierarchical MP3 filter bank, without requiring the ``crosstalk aliasing reduction'' needed by the PQMF/MDCT hierarchical filter bank of MP3 [140, §4.1.6].
The MPEG-4 audio compression standard (there was no MPEG-3) included a new transform coder based on the AAC filter bank [140, §4.1.7].
See, e.g., [16] for much more on MPEG coders and related topics. Chapter 4 of the dissertation of Scott Levine [140] contains an excellent summary of MPEG, Sony ATRAC, and Dolby AC-n coders up to 1998.
Review of STFT Filterbanks
Let's take a look at some of the STFT processors we've seen before, now viewed as a polyphase filter bank.
Since they are all based on the FFT, they are all efficient, but most are oversampled as ``filter banks'' go. Some oversampling is usually preferred outside of a compression context.
The STFT also computes a uniform filter bank, but it can be used as the basis for a variety of nonuniform filter banks giving frequency resolution closer to that of hearing.
STFT, Rectangular Window, No Overlap
 Perfect reconstruction
 Critically sampled (aliasing cancellation)
 Poor channel isolation (13 dB)
 Not robust to filterbank modifications^{11.3}
STFT, Rectangular Window, 50% Overlap
 Perfect reconstruction
 Oversampled by 2 (aliasing cancellation)
 Poor channel isolation (13 dB)
 Not robust to filterbank modifications, but better
STFT, Triangular Window, 50% Overlap
 Perfect reconstruction
 Oversampled by 2
 Better channel isolation (26 dB)
 Moderately robust to filterbank modifications
STFT, Hamming Window, 75% Overlap
 Perfect reconstruction
 Oversampled by 4
 Aliasing from sidelobes only
 Good channel isolation (42 dB)
 Moderately robust to filterbank modifications
STFT, Kaiser Window, β = 10, 90% Overlap
 Approximate perfect reconstruction (sidelobes controlled by β)
 Oversampled by 10
 Excellent channel isolation (80 dB)
 Very robust to filterbank modifications
 Aliasing from sidelobes only
Sliding FFT (Maximum Overlap), Any Window, Zero-Padded by 5
 Perfect reconstruction (always true when hop size = 1)
 Oversampled by M × 5:
 M = window length [time-domain oversampling factor]
 5 = zero-padding factor [frequency-domain oversampling factor]
 Excellent channel isolation (set by window sidelobes)
 Extremely robust to filterbank modifications
 No aliasing to cancel
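The isolation and reconstruction entries above can be checked numerically. The sketch below is an illustration under stated assumptions (NumPy; a window length M = 80 chosen arbitrarily; periodic rectangular, triangular, and Hamming windows; NumPy's symmetric `np.kaiser` for the Kaiser case): it estimates each window's highest sidelobe from a heavily zero-padded FFT and measures how far overlap-added copies of the window deviate from a constant at the listed hop size. The helper names `highest_sidelobe_db` and `ola_ripple` are hypothetical.

```python
import numpy as np

M = 80                                    # window length, chosen for illustration
n = np.arange(M)
# key -> (window, hop size), mirroring the configurations listed above.
# Periodic windows are used so that overlap-add is exactly constant where claimed.
windows = {
    "rect": (np.ones(M), M),                                           # R = M
    "triangle": (1 - np.abs(n - M / 2) / (M / 2), M // 2),            # R = M/2
    "hamming": (0.54 - 0.46 * np.cos(2 * np.pi * n / M), M // 4),      # R = M/4
    "kaiser10": (np.kaiser(M, 10.0), M // 10),                         # R = M/10
}


def highest_sidelobe_db(w, pad=64):
    """Highest sidelobe level in dB relative to the main-lobe peak (at DC)."""
    W = np.abs(np.fft.rfft(w, pad * len(w)))
    W = W / W[0]
    i = 0
    while i < len(W) - 1 and W[i + 1] < W[i]:   # walk down the main lobe
        i += 1                                  # to its first local minimum
    return 20 * np.log10(W[i:].max())


def ola_ripple(w, hop, frames=32):
    """Peak-to-peak ripple of the overlap-added windows, relative to their mean."""
    M = len(w)
    total = np.zeros(hop * (frames - 1) + M)
    for m in range(frames):
        total[m * hop:m * hop + M] += w
    steady = total[M:-M]                  # skip the partial overlap at both ends
    return (steady.max() - steady.min()) / steady.mean()


for name, (w, hop) in windows.items():
    print(f"{name:9s} highest sidelobe {highest_sidelobe_db(w):6.1f} dB, "
          f"OLA ripple {ola_ripple(w, hop):.1e}")
```

The sidelobe levels printed here are the quantities behind the channel-isolation figures in the list; exact values depend on the window variant and length, and the Kaiser window's nonzero ripple reflects its approximate (rather than exact) reconstruction.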
Wavelet Filter Banks
We will now approach filterbank derivation from a ``Hilbert space'' (geometric) point of view. This is the most natural setting for the study of wavelet filter banks [270,266].
Geometric Signal Theory
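In general, signals can be expanded as a linear combination of orthonormal basis signals $\varphi_k$ [248]. In the discrete-time case, this can be expressed as
$$x(n) = \sum_{k} \langle x, \varphi_k \rangle\, \varphi_k(n),$$
where the coefficient of projection of $x$ onto $\varphi_k$ is given by
$$\langle x, \varphi_k \rangle = \sum_{n=-\infty}^{\infty} x(n)\, \overline{\varphi_k(n)}.$$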
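A pair of signal sets $\{\varphi_k\}$ and $\{\tilde{\varphi}_k\}$ is said to form a biorthogonal basis if $\langle \varphi_j, \tilde{\varphi}_k \rangle = \delta(j-k)$ and any signal $x$ can be represented as
$$x(n) = \sum_{k} \langle x, \tilde{\varphi}_k \rangle\, \varphi_k(n).$$
The orthonormal case is the special case $\tilde{\varphi}_k = \varphi_k$.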
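As a minimal sketch of both expansions, assuming NumPy and randomly generated bases for complex signals of length 8 (all names are illustrative), the code below expands a signal in an orthonormal basis and in a biorthogonal pair, using the inner product $\langle x, y\rangle = \sum_n x(n)\overline{y(n)}$. The dual set of a basis stored as the columns of a matrix $A$ is taken as the columns of $(A^{-1})^H$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)


def inner(x, y):
    """<x, y> = sum_n x(n) * conj(y(n))."""
    return np.vdot(y, x)


# Orthonormal basis: the columns of a unitary Q (QR of a random complex matrix).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
phi = [Q[:, k] for k in range(N)]
x_on = sum(inner(x, p) * p for p in phi)          # x = sum_k <x, phi_k> phi_k

# Biorthogonal basis: columns of a random (almost surely invertible) matrix A;
# the dual set is the columns of inv(A)^H, so that <phi_j, dual_k> = delta(j-k).
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
dual = np.linalg.inv(A).conj().T
x_bi = sum(inner(x, dual[:, k]) * A[:, k] for k in range(N))

gram = np.array([[inner(A[:, j], dual[:, k]) for k in range(N)] for j in range(N)])
print(np.allclose(x_on, x), np.allclose(x_bi, x), np.allclose(gram, np.eye(N)))
# prints: True True True
```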
The following examples illustrate the Hilbert space point of view for various familiar cases of the Fourier transform and STFT. A more detailed introduction appears in Book I [248].
Natural Basis
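The natural basis for a discrete-time signal $x$ is the set of shifted impulses $\varphi_k(n) = \delta(n-k)$, $k \in \mathbb{Z}$, for which $\langle x, \varphi_k \rangle = x(k)$, so that
$$x(n) = \sum_{k=-\infty}^{\infty} x(k)\, \delta(n-k).$$
This expansion was used in Book II [247] to derive the impulse-response representation of an arbitrary linear, time-invariant filter.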
Normalized DFT Basis for $\mathbb{C}^N$
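The Normalized Discrete Fourier Transform (NDFT) (introduced in Book I [248]) projects the signal $x \in \mathbb{C}^N$ onto $N$ discrete-time sinusoids of length $N$, where the sinusoids are normalized to have unit norm:
$$\tilde{s}_k(n) = \frac{e^{j\omega_k n}}{\sqrt{N}}, \quad \omega_k = \frac{2\pi k}{N}, \qquad X(\omega_k) = \langle x, \tilde{s}_k \rangle = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x(n)\, e^{-j\omega_k n},$$
and the expansion of $x$ in terms of the NDFT basis set is
$$x(n) = \sum_{k=0}^{N-1} X(\omega_k)\, \tilde{s}_k(n)$$
for $n = 0, 1, \ldots, N-1$.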
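The NDFT basis is easy to check numerically. This sketch (assuming NumPy; the length N = 8 and the random test signal are arbitrary) builds the matrix whose columns are the unit-norm sinusoids, verifies that they are orthonormal, and confirms that the projection coefficients coincide with NumPy's FFT computed with `norm="ortho"`, which applies the unitary $1/\sqrt{N}$ scaling.

```python
import numpy as np

N = 8
n = np.arange(N)
# Column k holds the unit-norm sinusoid s_k(n) = exp(j*2*pi*k*n/N) / sqrt(N).
S = np.exp(2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

x = np.random.default_rng(1).standard_normal(N)
X = S.conj().T @ x                  # X(w_k) = <x, s_k>
x_rec = S @ X                       # x(n) = sum_k X(w_k) s_k(n)

print(np.allclose(S.conj().T @ S, np.eye(N)))        # True: orthonormal basis
print(np.allclose(X, np.fft.fft(x, norm="ortho")))   # True: NDFT = unitary FFT
print(np.allclose(x_rec, x))                         # True: perfect reconstruction
```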
Normalized Fourier Transform Basis
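The Fourier transform projects a continuous-time signal $x(t)$ onto an infinite set of continuous-time complex sinusoids $e^{j\omega t}$, for $\omega \in (-\infty, \infty)$. These sinusoids all have infinite norm, but a simple normalization by $\sqrt{2\pi}$ can be chosen so that the inverse Fourier transform has the desired form of a superposition of projections:
$$\tilde{s}_\omega(t) = \frac{e^{j\omega t}}{\sqrt{2\pi}}, \qquad X(\omega) = \langle x, \tilde{s}_\omega \rangle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt, \qquad x(t) = \int_{-\infty}^{\infty} X(\omega)\, \tilde{s}_\omega(t)\, d\omega.$$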
Normalized DTFT Basis
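The Discrete Time Fourier Transform (DTFT) is similar to the Fourier transform case, except that time is discrete and frequency is confined to one period, $\omega \in [-\pi, \pi)$:
$$\tilde{s}_\omega(n) = \frac{e^{j\omega n}}{\sqrt{2\pi}}, \qquad X(\omega) = \langle x, \tilde{s}_\omega \rangle = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}, \qquad x(n) = \int_{-\pi}^{\pi} X(\omega)\, \tilde{s}_\omega(n)\, d\omega.$$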
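To see that the $1/\sqrt{2\pi}$ normalization turns the DTFT into a superposition of projections, the sketch below (assuming NumPy and an arbitrary four-sample test signal) evaluates the analysis sum on a uniform frequency grid and approximates the synthesis integral with the rectangle rule. For a finite-length signal the integrand is a trigonometric polynomial, so the rectangle rule with more grid points than signal samples is exact up to rounding; the same grid also confirms Parseval's relation.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0])      # arbitrary finite-length test signal
n = np.arange(len(x))
K = 64                                    # quadrature points on [-pi, pi); K > len(x)
w = -np.pi + 2 * np.pi * np.arange(K) / K
dw = 2 * np.pi / K

# X(w) = <x, s_w> with unit-normalized sinusoids s_w(n) = exp(j*w*n) / sqrt(2*pi)
X = np.exp(-1j * np.outer(w, n)) @ x / np.sqrt(2 * np.pi)

# x(n) = integral over [-pi, pi) of X(w) s_w(n) dw, evaluated by the rectangle rule
x_rec = (np.exp(1j * np.outer(n, w)) @ X) * dw / np.sqrt(2 * np.pi)
energy = np.sum(np.abs(X) ** 2) * dw      # Parseval: equals the sum of |x(n)|^2

print(np.allclose(x_rec, x), np.isclose(energy, np.sum(x ** 2)))   # True True
```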
Normalized STFT Basis
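The Short Time Fourier Transform (STFT) is defined as a time-ordered sequence of DTFTs, and implemented in practice as a sequence of FFTs (see §6.1). Thus, the signal basis functions are naturally defined as the DFT sinusoids multiplied by time-shifted windows, suitably normalized for unit norm:
$$\varphi_{mk}(n) = \frac{w(n-mR)\, e^{j\omega_k n}}{\|w\|}, \qquad \omega_k = \frac{2\pi k}{N}, \qquad X_m(\omega_k) = \langle x, \varphi_{mk} \rangle,$$
where $w$ is the analysis window of length $M$, $R$ is the hop size, $m$ is the frame number, and $k = 0, 1, \ldots, N-1$ is the frequency-bin number.
When successive windows overlap (i.e., the hop size $R$ is less than the window length $M$), the basis functions are not orthogonal. In this case, we may say that the basis set is overcomplete.
The basis signals are orthonormal when $R = M = N$ and the rectangular window is used ($w(n) = 1$ for $0 \le n < M$). That is, two rectangularly windowed DFT sinusoids are orthogonal when either the frequency bin numbers or the time frame numbers differ, provided that the window length equals the number of DFT frequencies (no zero padding). In other words, we obtain an orthogonal basis set in the STFT when the hop size, window length, and DFT length are all equal (in which case the rectangular window must be used to retain the perfect-reconstruction property). In this case, we can write
$$X_m(\omega_k) = \langle x, \varphi_{mk} \rangle = \frac{1}{\sqrt{N}} \sum_{n=mN}^{mN+N-1} x(n)\, e^{-j\omega_k n},$$
so that the signal expansion can be interpreted as a concatenation of inverse NDFTs of non-overlapping frames:
$$x(n) = \sum_{m} \sum_{k=0}^{N-1} X_m(\omega_k)\, \varphi_{mk}(n) = \sum_{m} w(n-mN)\, \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_m(\omega_k)\, e^{j\omega_k n}.$$
In the overcomplete case, we get a special case of weighted overlap-add (§7.6): when $M \le N$ and $w^2$ is constant-overlap-add for hop size $R$,
$$x(n) = \frac{R}{N} \sum_{m} \sum_{k=0}^{N-1} X_m(\omega_k)\, \varphi_{mk}(n) = \frac{R}{N\,\|w\|} \sum_{m} w(n-mR) \sum_{k=0}^{N-1} X_m(\omega_k)\, e^{j\omega_k n},$$
so each frame is windowed twice (once at analysis and once at synthesis), and the factor $N/R$ reflects the redundancy of the overcomplete basis.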
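The following sketch (NumPy; the circular signal model, the sizes L = 64 and N = 16, and the square-root Hann window are assumptions chosen for illustration; `stft_basis` is a hypothetical helper) stores the STFT basis vectors $\varphi_{mk}$ as matrix columns. With $R = M = N$ and a rectangular window the matrix is unitary; with 50% overlap and a window whose square is COLA, projecting onto the basis and resynthesizing returns the signal scaled by $N/R$.

```python
import numpy as np


def stft_basis(L, M, R, N, w):
    """Matrix whose columns are phi_mk(n) = w(n - mR) exp(j 2 pi k n / N) / ||w||.

    Frames wrap around modulo L (a circular signal of length L); this is an
    assumption made only to avoid edge effects at the ends of the signal.
    """
    n = np.arange(L)
    cols = []
    for m in range(L // R):
        win = np.zeros(L)
        win[(m * R + np.arange(M)) % L] = w          # w(n - mR), circularly
        for k in range(N):
            cols.append(win * np.exp(2j * np.pi * k * n / N) / np.linalg.norm(w))
    return np.array(cols).T


L, N = 64, 16            # L is a multiple of N, M, and R in both cases below
x = np.random.default_rng(2).standard_normal(L)

# Orthonormal case: R = M = N with a rectangular window.
Phi_on = stft_basis(L, M=16, R=16, N=N, w=np.ones(16))
print(np.allclose(Phi_on.conj().T @ Phi_on, np.eye(L)))

# Overcomplete case: R = M/2 and w = square root of a periodic Hann window,
# so w**2 is COLA for hop R; the expansion then returns (N/R) x.
M, R = 16, 8
w = np.sin(np.pi * np.arange(M) / M)
Phi_oc = stft_basis(L, M, R, N, w)
coeffs = Phi_oc.conj().T @ x                         # X_m(w_k) = <x, phi_mk>
x_rec = (R / N) * (Phi_oc @ coeffs)
print(Phi_oc.shape, np.allclose(x_rec, x))
```

In frame-theory terms, the overcomplete case is a tight frame with bound $N/R$, which is why the same basis vectors serve for both analysis and (rescaled) synthesis.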
Continuous Wavelet Transform
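In the present (Hilbert space) setting, we can now easily define the continuous wavelet transform in terms of its signal basis set:
$$\varphi_{s\tau}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t-\tau}{s}\right), \qquad X(s,\tau) = \langle x, \varphi_{s\tau} \rangle = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \overline{\psi\!\left(\frac{t-\tau}{s}\right)}\, dt.$$
The parameter $s$ is called a scale parameter (analogous to frequency). The normalization by $1/\sqrt{|s|}$ maintains energy invariance as a function of scale. We call $X(s,\tau)$ the wavelet coefficient at scale $s$ and time $\tau$. The kernel $\psi(t)$ of the wavelet transform is called the mother wavelet, and it typically has a bandpass spectrum. A qualitative example is shown in Fig.10.32.
The so-called admissibility condition for a mother wavelet is
$$C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < \infty,$$
where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$; in particular, it requires $\Psi(0) = 0$ (a zero-mean wavelet).
The Morlet wavelet is simply a Gaussian-windowed complex sinusoid:
$$\psi(t) = \pi^{-1/4}\, e^{j\omega_0 t}\, e^{-t^2/2}, \qquad \Psi(\omega) = \sqrt{2}\, \pi^{1/4}\, e^{-(\omega-\omega_0)^2/2}.$$
The scale factor $\pi^{-1/4}$ is chosen so that $\|\psi\| = 1$. The center frequency $\omega_0$ is typically chosen so that the real part at the second crest of the carrier ($t = 2\pi/\omega_0$) is half its value at the first ($t = 0$):
$$e^{-(2\pi/\omega_0)^2/2} = \frac{1}{2} \quad\Longrightarrow\quad \omega_0 = \pi\sqrt{\frac{2}{\ln 2}} \approx 5.336.$$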
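A small numerical check of the Morlet constants, assuming NumPy and the unit-norm form $\psi(t) = \pi^{-1/4} e^{j\omega_0 t} e^{-t^2/2}$ (the grid limits and spacing are arbitrary choices): it computes $\omega_0$, the wavelet's norm, the second-crest ratio, and the spectrum at DC. The last value shows that the Morlet wavelet satisfies the admissibility condition only approximately, since $\Psi(0) = \sqrt{2}\,\pi^{1/4} e^{-\omega_0^2/2}$ is on the order of $10^{-6}$ rather than exactly zero.

```python
import numpy as np

# Center frequency making the second carrier crest of the real part half the first:
# exp(-(2*pi/w0)**2 / 2) = 1/2  =>  w0 = pi * sqrt(2 / ln 2)
w0 = np.pi * np.sqrt(2 / np.log(2))


def morlet(t, w0=w0):
    """psi(t) = pi**(-1/4) * exp(j*w0*t) * exp(-t**2/2), scaled for unit L2 norm."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2)


t = np.linspace(-10, 10, 20001)
dt = t[1] - t[0]
psi = morlet(t)

norm = np.sqrt(np.sum(np.abs(psi) ** 2) * dt)                        # ~1
second_peak_ratio = morlet(2 * np.pi / w0).real / morlet(0.0).real   # 0.5
Psi0 = np.sum(psi) * dt        # spectrum at DC: tiny, but not exactly zero
print(round(w0, 4), round(norm, 6), round(second_peak_ratio, 6), abs(Psi0))
```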
Since the scale parameter of a wavelet transform is analogous to frequency in a Fourier transform, a wavelet transform display is often called a scalogram, in analogy with an STFT ``spectrogram'' (discussed in §6.2).
When the mother wavelet can be interpreted as a windowed sinusoid (such as the Morlet wavelet), the wavelet transform can be interpreted as a constant-Q Fourier transform.^{11.4} Before the theory of wavelets, constant-Q Fourier transforms (such as obtained from a classic third-octave filter bank) were not easy to invert, because the basis signals were not orthogonal. See Appendix F for related discussion.
Discrete Wavelet Transform
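The discrete wavelet transform is a discrete-time, discrete-frequency counterpart of the continuous wavelet transform of the previous section:
$$h_{jk}(t) = a^{-j/2}\, h\!\left(a^{-j}t - kT\right), \qquad X(j,k) = \langle x, h_{jk} \rangle = \int_{-\infty}^{\infty} x(t)\, \overline{h_{jk}(t)}\, dt,$$
where $j$ and $k$ range over the integers, $a > 1$ is the scale step, $T$ is the time step at scale $j = 0$, and $h$ is the mother wavelet, interpreted here as a (continuous) filter impulse response.
The inverse transform is, as always, the signal expansion in terms of the orthonormal basis set:
$$x(t) = \sum_{j} \sum_{k} X(j,k)\, h_{jk}(t).$$
We can show that discrete wavelet transforms are constant-Q by defining the center frequency of the $j$th basis signal as the geometric mean of its band limits. If the mother wavelet $h$ has passband $[\omega_1, \omega_2]$, then dilating it in time by $a^j$ compresses its spectrum by $a^{-j}$, so the $j$th basis signal has band limits $a^{-j}\omega_1$ and $a^{-j}\omega_2$, i.e.,
$$\omega_c(j) = \sqrt{(a^{-j}\omega_1)(a^{-j}\omega_2)} = a^{-j}\sqrt{\omega_1\omega_2}, \qquad Q = \frac{\omega_c(j)}{a^{-j}(\omega_2-\omega_1)} = \frac{\sqrt{\omega_1\omega_2}}{\omega_2-\omega_1},$$
which does not depend on $j$.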
Discrete Wavelet Filterbank
In a discrete wavelet filterbank, each basis signal is interpreted as the impulse response of a bandpass filter in a constant-Q filter bank:
$$h_j(t) = a^{-j/2}\, h_0\!\left(a^{-j}t\right) \quad\longleftrightarrow\quad H_j(\omega) = a^{j/2}\, H_0\!\left(a^{j}\omega\right).$$
Thus, the $j$th channel filter $h_j$ is obtained by frequency-scaling (and normalizing for unit energy) the zeroth channel filter $h_0$. The frequency scale factor is of course equal to the inverse of the time-scale factor.
Recall that in the STFT, channel filter $k$ is a frequency shift of the zeroth channel filter, $H_k(\omega) = H_0(\omega - \omega_k)$, which corresponds to modulation by a complex sinusoid at frequency $\omega_k$ in the time domain.
As the channel number $j$ increases, the channel impulse response lengthens by the factor $a^j$, while the passband of its frequency response narrows by the inverse factor $a^{-j}$.
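The scaling relations can be checked on a sampled example. In this sketch (NumPy; the Gaussian-windowed complex sinusoid used for $h_0$, its center frequency of 6 rad/s, and $a = 2$ are illustrative assumptions, and `h0`, `h`, `stats` are hypothetical names), each $h_j(t) = a^{-j/2}h_0(a^{-j}t)$ keeps the same energy, while its RMS duration grows by $a^j$ and its spectral centroid falls by $a^{-j}$.

```python
import numpy as np

a, w0 = 2.0, 6.0                          # scale step and h_0 center frequency (assumed)
t = np.linspace(-200.0, 200.0, 40001)     # fine grid, wide enough for j <= 3
dt = t[1] - t[0]


def h0(t):
    """Hypothetical zeroth channel filter: Gaussian-windowed complex sinusoid."""
    return np.exp(1j * w0 * t - t ** 2 / 2)


def h(j, t):
    """j-th channel filter h_j(t) = a^(-j/2) h_0(a^(-j) t)."""
    return a ** (-j / 2) * h0(a ** (-j) * t)


def stats(x):
    """Energy, RMS duration, and spectral centroid (rad/s) of a sampled signal."""
    energy = np.sum(np.abs(x) ** 2) * dt
    duration = np.sqrt(np.sum(t ** 2 * np.abs(x) ** 2) * dt / energy)
    P = np.abs(np.fft.fft(x)) ** 2
    w = 2 * np.pi * np.fft.fftfreq(len(x), d=dt)
    return energy, duration, np.sum(w * P) / np.sum(P)


results = [stats(h(j, t)) for j in range(4)]
for j, (e, d, wc) in enumerate(results):
    # energy stays sqrt(pi); duration = a^j / sqrt(2); centroid = w0 / a^j
    print(j, round(e, 4), round(d, 4), round(wc, 4))
```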
Figure 10.33 shows a block diagram of the discrete wavelet filter bank for $a = 2$ (the ``dyadic'' or ``octave filterbank'' case), and Fig.10.34 shows its time-frequency tiling as compared to that of the STFT. The synthesis filters $f_j$ may be used to make a biorthogonal filter bank. If the $h_j$ are orthonormal, then $f_j(n) = \overline{h_j(-n)}$.
Dyadic Filter Banks
A dyadic filter bank is any octave filter bank,^{11.5} as illustrated qualitatively in Figure 10.35. Note that the first channel filter is the top-octave bandpass filter, the second is the bandpass filter for the next octave down, the third is the octave bandpass below that, and so on. The optional scale factors result in the same sum of squares for each channel-filter impulse response.
A dyadic filter bank may be derived from the discrete wavelet filter bank by setting $a = 2$ and relaxing the exact orthonormality requirement on the channel-filter impulse responses. If they do happen to be orthonormal, we may call it a dyadic wavelet filter bank.
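For a dyadic filter bank, the center frequency of the $j$th channel-filter impulse response, with passband $[\pi/2^{j+1}, \pi/2^{j}]$ (in radians per sample, $j = 0$ being the top octave), can be defined as the geometric mean of its band edges:
$$\omega_c(j) = \sqrt{\frac{\pi}{2^{j+1}} \cdot \frac{\pi}{2^{j}}} = \frac{\pi}{2^{j+1/2}}.$$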
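For ideal octave bands $[\pi/2^{j+1}, \pi/2^{j}]$, with $j = 0$ taken as the top octave (an indexing assumed here), the geometric-mean definition gives the same Q in every channel. This short sketch (NumPy; ideal band edges assumed, `octave_band` is a hypothetical helper) tabulates the center frequency and Q = center frequency / bandwidth, which comes out to $\sqrt{2} \approx 1.41421$ for every octave.

```python
import numpy as np


def octave_band(j):
    """Ideal band edges (rad/sample) of dyadic channel j; j = 0 is the top octave."""
    return np.pi / 2 ** (j + 1), np.pi / 2 ** j


rows = []
for j in range(5):
    lo, hi = octave_band(j)
    wc = np.sqrt(lo * hi)                 # geometric-mean center frequency
    rows.append((j, wc, wc / (hi - lo)))  # Q = center frequency / bandwidth
    print(j, round(wc, 5), round(wc / (hi - lo), 5))
```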
Dyadic Filter Bank Design
Design of dyadic filter banks using the window method for FIR digital filter design (introduced in §E.4) is described in, e.g., [214, §6.2.3b].
A ``very easy'' method suggested in [266, §11.6] is to design a two-channel paraunitary QMF bank, and repeat recursively to split the lower half of the spectrum down to some desired depth.
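A minimal sketch of this recursive approach, using the Haar pair as the two-channel paraunitary QMF bank purely for illustration (it is the shortest such pair and has very poor frequency selectivity; a practical design would use longer filters). It assumes NumPy, and the function names are hypothetical. Each split halves the sampling rate of the lower band, so the channel lengths add up to the input length (critical sampling), and orthonormality of the Haar pair preserves energy.

```python
import numpy as np


def haar_split(x):
    """Two-channel paraunitary (Haar) QMF analysis with downsampling by 2."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)   # lowpass, highpass


def haar_merge(lo, hi):
    """Synthesis inverse of haar_split (perfect reconstruction)."""
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2)
    x[1::2] = (lo - hi) / np.sqrt(2)
    return x


def dyadic_analysis(x, depth):
    """Split the lower half-band recursively; returns [top octave, ..., lowpass residual]."""
    bands = []
    for _ in range(depth):
        x, hi = haar_split(x)
        bands.append(hi)
    bands.append(x)                       # remaining lowpass residual
    return bands


def dyadic_synthesis(bands):
    x = bands[-1]
    for hi in reversed(bands[:-1]):
        x = haar_merge(x, hi)
    return x


x = np.random.default_rng(3).standard_normal(64)    # length divisible by 2**depth
bands = dyadic_analysis(x, depth=3)
print([len(b) for b in bands])                       # [32, 16, 8, 8]
print(np.allclose(dyadic_synthesis(bands), x))       # True: perfect reconstruction
print(np.isclose(sum(np.sum(b ** 2) for b in bands), np.sum(x ** 2)))   # True
```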
Generalized STFT
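A generalized STFT may be defined by [266]
$$y_k(m) = \sum_{n=-\infty}^{\infty} x(n)\, h_k(mN_k - n), \qquad \hat{x}(n) = \sum_{k} \sum_{m} y_k(m)\, f_k(n - mN_k).$$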
This filter bank and its reconstruction are diagrammed in Fig.10.36.
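The analysis filter $h_k$ is typically complex bandpass (as in the STFT case). The integers $N_k$ give the downsampling factor for the output of the $k$th channel filter: for critical sampling without aliasing, using ideal complex bandpass filters whose passbands tile the frequency axis without overlap, we set $N_k = 2\pi/B_k$, where $B_k$ is the bandwidth of $h_k$ in radians per sample, so that $\sum_k 1/N_k = 1$. The shifted synthesis impulse responses $f_k(n - mN_k)$ can be regarded as the basis signals of the reconstruction. If these are orthonormal, perfect reconstruction is obtained with $f_k(n) = \overline{h_k(-n)}$. More generally, the analysis and synthesis filters form a biorthogonal basis.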
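The generalized STFT reduces to the ordinary critically sampled STFT when every $N_k = N$ and the synthesis filters are the normalized length-$N$ DFT sinusoids. The sketch below (NumPy; this filter choice and the name `gstft_roundtrip` are assumptions for illustration) implements the analysis as convolution followed by downsampling and the synthesis as upsampling followed by convolution. Because $\overline{f_k(-n)}$ is noncausal, the code uses the delayed causal version $h_k(n) = \overline{f_k(N-1-n)}$ and compensates by sampling the analysis output at $mN + N - 1$.

```python
import numpy as np


def gstft_roundtrip(x, N):
    """Uniform generalized STFT (N_k = N) with orthonormal DFT-sinusoid filters.

    Synthesis: f_k(n) = exp(j 2 pi k n / N) / sqrt(N) for 0 <= n < N.
    Analysis:  h_k(n) = conj(f_k(N - 1 - n)), i.e. conj(f_k(-n)) delayed by N - 1.
    len(x) is assumed to be a multiple of N.
    """
    L = len(x)
    n = np.arange(N)
    x_hat = np.zeros(L + N - 1, dtype=complex)
    channels = []
    for k in range(N):
        f_k = np.exp(2j * np.pi * k * n / N) / np.sqrt(N)   # synthesis filter
        h_k = np.conj(f_k[::-1])                              # analysis filter
        y_k = np.convolve(x, h_k)[N - 1::N]                   # filter, keep every Nth
        channels.append(y_k)
        u_k = np.zeros(L, dtype=complex)
        u_k[::N] = y_k                                        # upsample by N
        x_hat += np.convolve(u_k, f_k)                        # synthesis filtering
    return channels, x_hat[:L]


N = 8
x = np.random.default_rng(4).standard_normal(64)
channels, x_hat = gstft_roundtrip(x, N)
print(len(channels), sum(len(y) for y in channels), np.allclose(x_hat, x))
```

The total number of channel samples equals the input length, which is the critically sampled case; replacing the DFT sinusoids with other orthonormal filter sets changes the frequency decomposition but not the structure of the round trip.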
Further Reading
In addition to [266], which was frequently cited in this chapter, [270], [151], and [16] are excellent references (books). The original paper by Princen and Bradley on time-domain aliasing cancellation is [203]. The Princen-Bradley filter bank is a special case of Lapped Orthogonal Transforms (LOT) [151] for which the overlap is 50%. It is also an orthogonal perfect reconstruction filter bank with first-order FIR polyphase filters. Other papers related to perceptual audio coding include [23,264,53,152,184,17,24,18,190,16].