Multirate Filter Banks

The preceding chapters have been concerned essentially with the short-time Fourier transform and all that goes with it. After developing the overlap-add point of view in Chapter 8, we developed the alternative (dual) filter-bank point of view in Chapter 9. This chapter is concerned more broadly with filter banks, whether they are implemented using an FFT or by some other means. In the end, however, we will come full circle and look at the properly configured STFT as an example of a perfect reconstruction (PR) filter bank as defined herein. Moreover, filter banks in practice are normally implemented using FFT methods.

The subject of PR filter banks is normally considered only in the context of systems for audio compression, and they are normally critically sampled in both time and frequency. This book, on the other hand, belongs to a tiny minority which is not concerned with compression at all, but rather useful time-frequency decompositions for sound, and corresponding applications in music and digital audio effects.

Perhaps the most important new topic introduced in this chapter is the polyphase representation for filter banks. This is both an important analysis tool and a basis for efficient implementation. We will see that it can be seen as a generalization of the overlap-add approach discussed in Chapter 8.

The polyphase representation will make it straightforward to determine general conditions for perfect reconstruction in any filter bank. The STFT will provide some special cases, but there will be many more. In particular, the filter banks used in perceptual audio coding will be special cases as well. Polyphase analysis is used to derive classes of PR filter banks called ``paraunitary,'' ``cosine modulated,'' and ``pseudo-quadrature mirror'' filter banks, among others.

Another extension we will take up in this chapter is multirate systems. Multirate filter banks use different sampling rates in different channels, matched to different filter bandwidths. Multirate filter banks are very important in audio work because the filtering by the inner ear is similarly a variable resolution ``filter bank'' using wider pass-bands at higher frequencies. Finally, the related subject of wavelet filter banks is briefly introduced, and further reading is recommended.

Upsampling and Downsampling

For the DTFT, we proved in Chapter 2 (p. p. [*]) the stretch theorem (repeat theorem) which relates upsampling (``stretch'') to spectral copies (``images'') in the DTFT context; this is the discrete-time counterpart of the scaling theorem for continuous-time Fourier transformsB.4). Also, §2.3.12 discusses the downsampling theorem (aliasing theorem) for DTFTs which relates downsampling to aliasing for discrete-time signals. In this section, we review the main results.

Upsampling (Stretch) Operator

Figure: Upsampling by a factor of $ N$ : $ y \isdeftext
\hbox{\sc Stretch}_N(x)$ .

Figure 11.1 shows the graphical symbol for a digital upsampler by the factor $ N$ . To upsample by the integer factor $ N$ , we simply insert $ N-1$ zeros between $ x(n)$ and $ x(n+1)$ for all $ n$ . In other words, the upsampler implements the stretch operator defined in §2.3.9:

y(n) &=& \hbox{\sc Stretch}_{N,n}(x)\\
&\isdef & \left\{\begin{array}{ll}
x(n/N), & \frac{n}{N}\in{\bf Z} \\ [5pt]
0, & \hbox{otherwise}. \\
\end{array} \right.

In the frequency domain, we have, by the stretch (repeat) theorem for DTFTs:

Y(z) &=& \hbox{\sc Repeat}_{N,z}(X)\\
&\isdef & X(z^N), \quad z\in{\bf C}.

Plugging in $ z=e^{j\omega}$ , we see that the spectrum on $ [-\pi,\pi)$ contracts by the factor $ N$ , and $ N$ images appear around the unit circle. For $ N=2$ , this is depicted in Fig.11.2.

Figure: Illustration of $ \hbox {\sc Repeat}_2$ in the frequency domain.

Downsampling (Decimation) Operator

Figure: Downsampling by $ N$ .

Figure 11.3 shows the symbol for downsampling by the factor $ N$ . The downsampler selects every $ N$ th sample and discards the rest:

y(n) &=& \hbox{\sc Downsample}_{N,n}(x)\\
&\isdef & x(Nn), \quad n\in{\bf Z}

In the frequency domain, we have

Y(z) &=& \hbox{\sc Alias}_{N,z}(X)\\
&\isdef &
\frac{1}{N} \sum_{m=0}^{N-1} X\left(z^\frac{1}{N}e^{-jm\frac{2\pi}{N}} \right),
\quad z\in{\bf C}.

Thus, the frequency axis is expanded by the factor $ N$ , wrapping $ N$ times around the unit circle, adding to itself $ N$ times. For $ N=2$ , two partial spectra are summed, as indicated in Fig.11.4.

Figure: Illustration of $ \hbox {\sc Alias}_2$ in the frequency domain.

Using the common twiddle factor notation

$\displaystyle W_N \isdefs e^{-j2\pi/N},$ (12.1)

the aliasing expression can be written as

$\displaystyle Y(z) \eqsp \frac{1}{N} \sum_{m=0}^{N-1} X(W_N^m z^{1/N}).

Example: Downsampling by 2

For $ N=2$ , downsampling by 2 can be expressed as $ y(n) = x(2n)$ , so that (since $ W_2\isdef e^{-j2\pi/2}=-1$ )

Y(z) &=& \frac{1}{2}\left[X\left(W^0_2 z^{1/2}\right) + X\left(W^1_2 z^{1/2}\right)\right] \\ [5pt]
&=& \frac{1}{2}\left[X\left(e^{-j2\pi 0/2} z^{1/2}\right) + X\left(e^{-j2\pi 1/2}z^{1/2}\right)\right] \\ [5pt]
&=& \frac{1}{2}\left[X\left(z^{1/2}\right) + X\left(-z^{1/2}\right)\right] \\ [5pt]
&=& \frac{1}{2}\left[\hbox{\sc Stretch}_2(X) + \hbox{\sc Stretch}_2\left(\hbox{\sc Shift}_\pi(X)\right)\right].

Example: Upsampling by 2

For $ N=2$ , upsampling (stretching) by 2 can be expressed as
$ y=[x_0,0,x_1,0,\ldots]$ , so that

$\displaystyle Y(z) \eqsp X(z^2) \eqsp \hbox{\sc Repeat}_2(X),$ (12.2)

as discussed more fully in §2.3.11.

Filtering and Downsampling

Because downsampling by $ N$ causes aliasing of any frequencies in the original signal above $ \vert\omega\vert > \pi/N$ , the input signal may need to be first lowpass-filtered to prevent this aliasing, as shown in Fig.11.5.

Figure 11.5: Lowpass filtering followed by downsampling.

Suppose we implement such an anti-aliasing lowpass filter $ h(n)$ as an FIR filter of length $ M\le N$ with cut-off frequency $ \pi/N$ .12.1 This is drawn in direct form in Fig.11.6.

Figure 11.6: Direct-form implementation of an FIR anti-aliasing lowpass filter followed by a downsampler.

Figure 11.7: FIR lowpass filter with downsampler commuted inside the direct-form filter.

We do not need $ N-1$ out of every $ N$ filter output samples due to the $ N$ :$ 1$ downsampler. To realize this savings, we can commute the downsampler through the adders inside the FIR filter to obtain the result shown in Fig.11.7. The multipliers are now running at $ 1/N$ times the sampling frequency of the input signal $ x(n)$ . This reduces the computation requirements by a factor of $ 1/N$ . The downsampler outputs in Fig.11.7 are called polyphase signals. The overall system is a summed polyphase filter bank in which each ``subphase filter'' is a constant scale factor $ h(m)$ . As we will see, more general subphase filters can be used to implement time-domain aliasing as needed for Portnoff windows (§9.7).

We may describe the polyphase processing in the anti-aliasing filter of Fig.11.7 as follows:

  • Subphase signal 0

    $\displaystyle x(nN)\left\vert _{n=0}^{\infty}\right. \eqsp [x_0,x_N,x_{2N},\ldots]$ (12.3)

    is scaled by $ h(0)$ .

  • Subphase signal 1

    $\displaystyle x(nN-1)\left\vert _{n=0}^{\infty}\right.\eqsp [x_{-1},x_{N-1},x_{2N-1},\ldots]$ (12.4)

    is scaled by $ h(1)$ ,

  • $ \cdots$

  • Subphase signal $ M-1$

    $\displaystyle x(nN-m)\left\vert _{n=0}^{\infty}\right.\eqsp [x_{-m},x_{N-m},x_{2N-m},\ldots]

    is scaled by $ h(M-1)$ .
These scaled subphase signals are finally summed to form the output signal shown in Fig.11.7

$\displaystyle y(n) \eqsp \sum_{m=0}^{M-1} h(m)\, x(nN-m),$ (12.5)

which we recognize as a direct-form-convolution implementation of a length $ M$ FIR filter $ h$ , with its output downsampled by the factor $ N$ .

The summed polyphase signals of Fig.11.7 can be interpreted as ``serial to parallel conversion'' from an ``interleaved'' stream of scalar samples $ x(n)$ to a ``deinterleaved'' sequence of buffers (each length $ M$ ) every $ N$ samples, followed by an inner product of each buffer with $ h = [h(0),\ldots,h(M-1)]$ . The same operation may be visualized as a deinterleaving through variable gains into a running sum, as shown in Fig.11.8.

Figure: Demultiplex-and-sum interpretation of the polyphase signal sum of Fig.11.7.

Polyphase Decomposition

The previous section derived an efficient polyphase implementation of an FIR filter $ h$ whose output was downsampled by the factor $ N$ . The derivation was based on commuting the downsampler with the FIR summer. We now derive the polyphase representation of a filter of any length algebraically by splitting the impulse response $ h$ into $ N$ polyphase components.

Two-Channel Case

The simplest nontrivial case is $ N=2$ channels. Starting with a general linear time-invariant filter

$\displaystyle H(z) \eqsp \sum_{n=-\infty}^{\infty}h(n)z^{-n},$ (12.6)

we may separate the even- and odd-indexed terms to get

$\displaystyle H(z) \eqsp \sum_{n=-\infty}^{\infty}h(2n)z^{-2n} + z^{-1}\sum_{n=-\infty}^{\infty}h(2n+1)z^{-2n}.$ (12.7)

We define the polyphase component filters as follows:

E_0(z)&=&\sum_{n=-\infty}^{\infty}h(2n)z^{-n}\\ [5pt]

$ E_0(z)$ and $ E_1(z)$ are the polyphase components of the polyphase decomposition of $ H(z)$ for $ N=2$ .

Now write $ H(z)$ in terms of its polyphase components:

$\displaystyle \zbox {H(z) \eqsp E_0(z^2) + z^{-1}E_1(z^2)}$ (12.8)

As a simple example, consider

$\displaystyle H(z)\eqsp 1 + 2z^{-1} + 3z^{-2} + 4z^{-3}.$ (12.9)

Then the polyphase component filters are

E_0(z) &=& 1 + 3z^{-1}\\ [5pt]
E_1(z) &=& 2 + 4z^{-1}


$\displaystyle H(z) \eqsp E_0(z^2) + z^{-1}E_1(z^2) \eqsp (1 + 3z^{-2}) + (2z^{-1} + 4z^{-3}).$ (12.10)

N-Channel Polyphase Decomposition

Figure 11.9: Schematic illustration of three interleaved polyphase signal components.

For the general case of arbitrary $ N$ , the basic idea is to decompose $ x(n)$ into its periodically interleaved subsequences, as indicated schematically in Fig.11.9. The polyphase decomposition into $ N$ channels is given by

$\displaystyle H(z) \eqsp \sum_{l=0}^{N-1} z^{-l}E_l(z^N) \protect$ (12.11)

where the subphase filters are defined by

$\displaystyle E_l(z) \eqsp \sum_{n=-\infty}^{\infty}e_l(n)z^{-n},\; l=0,1,\ldots,N-1,$ (12.12)


$\displaystyle e_l(n) \isdefs h(Nn+l). \qquad\hbox{($l$th subphase filter)}.$ (12.13)

The signal $ e_l(n)$ can be obtained by passing $ h(n)$ through an advance of $ l$ samples, followed by downsampling by the factor $ N$ . as shown in Fig.11.10.

% latex2html id marker 29755\psfrag{M}{{\normalsize $N$}}\psfrag{ztl}{{\Large $z^l$}}\psfrag{h[n]}{{\Large $h(n)$}}\psfrag{eln}{{\Large $e_l(n)$}}\begin{figure}[htbp]
\caption{Advance by $l$\ samples followed by a downsampling by the factor $N$.}

Type II Polyphase Decomposition

The polyphase decomposition of $ H(z)$ into $ N$ channels in (11.11) may be termed a ``type I'' polyphase decomposition. In the ``type II'', or reverse polyphase decomposition, the powers of $ z$ progress in the opposite direction:

$\displaystyle H(z) \eqsp \sum_{l=0}^{N-1} z^{-(N-l-1)} R_{l}(z^{N})$ (12.14)

We will see that we need type I for analysis filter banks and type II for synthesis filter banks in a general ``perfect reconstruction filter bank'' analysis/synthesis system.

Filtering and Downsampling, Revisited

Let's return to the example of §11.1.3, but this time have the FIR lowpass filter h(n) be length $ M=LN$ , $ L\in{\bf Z}$ . In this case, the $ N$ polyphase filters, $ e_l(n)$ , are each length $ L$ .12.2 Recall that

$\displaystyle H(z) \eqsp E_0(z^N) + z^{-1}E_1(z^N) + \cdots + z^{-(N-1)}E_{N-1}(z^N)$ (12.15)

leading to the result shown in Fig.11.11.

Figure: Polyphase decomposition of a length $ M=LN$ FIR filter followed by a downsampler.

Figure: Polyphase decomposition of a length $ M=LN$ FIR filter followed by a downsampler.

Next, we commute the $ N$ :$ 1$ downsampler through the adders and upsampled (stretched) polyphase filters $ E_l(z^N)$ to obtain Fig.11.12. Commuting the downsampler through the subphase filters $ E_l(z^N)$ to obtain $ E_l(z)$ is an example of a multirate noble identity.

Multirate Noble Identities

Figure 11.13 illustrates the so-called noble identities for commuting downsamplers/upsamplers with ``sparse transfer functions'' that can be expressed a function of $ z^{-N}$ . Note that downsamplers and upsamplers are linear, time-varying operators. Therefore, operation order is important. Also note that adders and multipliers (any memoryless operators) may be commuted across downsamplers and upsamplers, as shown in Fig.11.14.

% latex2html id marker 29805\psfrag{nd}{ $N\downarrow$\ }\psfrag{hz}{ $H(z)$\ }\psfrag{hzn}{ $H(z^N)$\ }\psfrag{equal}{ $\equiv$\ }\begin{figure}[htbp]
\caption{Multirate noble identities}
\end{figure} % was 6in

Figure 11.14: Commuting of downsampler with adder and gains.

Critically Sampled Perfect Reconstruction Filter Banks

A Perfect Reconstruction (PR) filter bank is any filter bank whose reconstruction is the original signal, possibly delayed, and possibly scaled by a constant [287]. In this context, critical sampling (also called ``maximal downsampling'') means that the downsampling factor is the same as the number of filter channels. For the STFT, this implies $ R=M=N$ (with $ M>N$ allowed for Portnoff windows).

As derived in Chapter 8, the Short-Time Fourier Transform (STFT) is a PR filter bank whenever the Constant-OverLap-Add (COLA) condition is met by the analysis window $ w$ and the hop size $ R$ . However, only the rectangular window case with no zero-padding is critically sampled (OLA hop size = FBS downsampling factor = $ N$ ). Perceptual audio compression algorithms such as MPEG audio coding are based on critically sampled filter banks, for obvious reasons. It is important to remember that we normally do not require critical sampling for audio analysis, digital audio effects, and music applications; instead, we normally need critical sampling only when compression is a requirement. Thus, when compression is not a requirement, we are normally interested in oversampled filter banks. The polyphase representation is useful in that case as well. In particular, we will obtain some excellent insights into the aliasing cancellation that goes on in such downsampled filter banks (including STFTs with hop sizes $ R>1$ ), as the next section makes clear.

Two-Channel Critically Sampled Filter Banks

Figure 11.15 shows a simple two-channel band-splitting filter bank, followed by the corresponding synthesis filter bank which reconstructs the original signal (we hope) from the two channels. The analysis filter $ H_0(z)$ is a half-band lowpass filter, and $ H_1(z)$ is a complementary half-band highpass filter. The synthesis filters $ F_0(z)$ and $ F_1(z)$ are to be derived. Intuitively, we expect $ F_0(z)$ to be a lowpass that rejects the upper half-band due to the upsampler by 2, and $ F_1(z)$ should do the same but then also reposition its output band as the upper half-band, which can be accomplished by selecting the upper of the two spectral images in the upsampler output.

% latex2html id marker 29839\psfrag{x(n)}{\normalsize $x(n)$}\psfrag{x}{\normalsize $x$}\psfrag{(n)}{\normalsize $(n)$}\psfrag{y}{\normalsize $y$}\psfrag{v}{\normalsize $v$}\psfrag{H}{\normalsize $H$}\psfrag{F}{\normalsize $F$}\psfrag{z}{\normalsize $z$}\begin{figure}[htbp]
\caption{Two-channel critically sampled filter bank.}

The outputs of the two analysis filters in Fig.11.15 are

$\displaystyle X_k(z) \eqsp H_k(z)X(z), \quad k=0,1.$ (12.16)

Using the results of §11.1, the signals become, after downsampling,

$\displaystyle V_k(z) \eqsp \frac{1}{2}\left[X_k(z^{1/2}) + X_k(-z^{1/2})\right], \; k=0,1.$ (12.17)

After upsampling, the signals become
$\displaystyle Y_k(z) \eqsp V_k(z^2)$ $\displaystyle =$ $\displaystyle \frac{1}{2}[X_k(z) + X_k(-z)]$  
  $\displaystyle =$ $\displaystyle \frac{1}{2}[H_k(z)X(z) + H_k(-z)X(-z)],\; k=0,1.$  

After substitutions and rearranging, we find that the output $ \hat{x}$ is a filtered replica of the input signal plus an aliasing term:
$\displaystyle \hat{X}(z)$ $\displaystyle =$ $\displaystyle \frac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right]X(z)$  
  $\displaystyle +$ $\displaystyle \frac{1}{2}\left[H_0(-z)F_0(z) + H_1(-z)F_1(z)\right]X(-z)
\protect$ (12.18)

For perfect reconstruction, we require the aliasing term to be zero. For ideal half-band filters cutting off at $ \omega=\pi/2$ , we can choose $ F_0=H_0$ and $ F_1=H_1$ and the aliasing term is zero because there is no spectral overlap between the channels, i.e., $ H_0(-z)F_0(z)=H_1(z)H_0(z)=0$ , and $ H_1(-z)F_1(z)=H_0(z)H_1(z)=0$ . However, more generally (and more practically), we can force the aliasing to zero by choosing synthesis filters
$\displaystyle F_0(z)$ $\displaystyle =$ $\displaystyle \quad\! H_1(-z),$  
$\displaystyle F_1(z)$ $\displaystyle =$ $\displaystyle -H_0(-z).
\protect$ (12.19)

In this case, synthesis filter $ F_0(z)$ is still a lowpass, but the particular one obtained by $ \pi$ -rotating the highpass analysis filter around the unit circle in the $ z$ plane. Similarly, synthesis filter $ F_1(z)$ is the $ \pi$ -rotation (and negation) of the analysis lowpass filter $ H_0(z)$ on the unit circle. For this choice of synthesis filters $ F_0$ and $ F_1$ , aliasing is completely canceled for any choice of analysis filters $ H_0$ and $ H_1$ .

Referring again to (11.18), we see that we also need the non-aliased term to be of the form

$\displaystyle A(z)$ $\displaystyle =$ $\displaystyle H_0(z)F_0(z) + H_1(z)F_1(z)
\protect$ (12.20)

where $ A(z)$ is of the form

$\displaystyle A(z)\eqsp g\,z^{-d}.$ (12.21)

That is, for perfect reconstruction, we need, in addition to aliasing cancellation, that the non-aliasing term reduce to a constant gain $ g$ and/or delay $ d$ . We will call this the filtering cancellation constraint on the channel filters. Thus perfect reconstruction requires both aliasing cancellation and filtering cancellation.

Let $ {\tilde H}$ denote $ H(-z)$ . Then both constraints can be expressed in matrix form as follows:

$\displaystyle \left[\begin{array}{cc} H_0 & H_1 \\ [2pt] {\tilde H}_0 & {\tilde H}_1 \end{array}\right]\left[\begin{array}{c} F_0 \\ [2pt] F_1 \end{array}\right]\eqsp \left[\begin{array}{c} c \\ [2pt] 0 \end{array}\right]$ (12.22)

Substituting the aliasing-canceling choices for $ F_0$ and $ F_1$ from (11.19) into the filtering-cancellation constraint (11.20), we obtain

$\displaystyle g\,z^{-d}$ $\displaystyle =$ $\displaystyle H_0(z)H_1(-z) - H_1(z)H_0(-z).
\protect$ (12.23)

The filtering-cancellation constraint is almost satisfied by ideal zero-phase half-band filters cutting off at $ \pi/2$ , since in that case we have $ H_1(-z)=H_0(z)$ and $ H_0(-z)=H_1(z)$ . However, the minus sign in (11.23) means there is a discontinuous sign flip as frequency crosses $ \omega=\pi/2$ , which is not equivalent to a linear phase term. Therefore the filtering cancellation constraint fails for the ideal half-band filter bank! Recall from above, however, that ideal half-band filters did work using a different choice of synthesis filters, relying instead on their lack of spectral overlap. The presently studied case from (11.19) arose from so-called Quadrature Mirror Filters (QMF), which are discussed further below. First, however, we'll look at some simple special cases.

Amplitude-Complementary 2-Channel Filter Bank

A natural choice of analysis filters for our two-channel critically sampled filter bank is an amplitude-complementary lowpass/highpass pair, i.e.,

$\displaystyle H_1(z) \eqsp 1-H_0(z)$ (12.24)

where we impose the unity dc gain constraint $ H_0(1)=1$ . Note that amplitude-complementary means constant overlap-add (COLA) on the unit circle in the $ z$ plane.

Substituting the COLA constraint into the filtering and aliasing cancellation constraint (11.23) gives

g\,z^{-d} &=& H_0(z)\left[1-H_0(-z)\right] - \left[1-H_0(z)\right]H_0(-z) \\ [5pt]
&=& H_0(z) - H_0(-z)\\ [5pt]
\;\longleftrightarrow\;\quad a(n) &=& h_0(n) - (-1)^n h_0(n) \\ [5pt]
&=& \left\{\begin{array}{ll}
0, & \hbox{$n$\ even} \\ [5pt]
2h_0(n), & \hbox{$n$\ odd} \\
\end{array} \right.

Thus, we find that even-indexed terms of the impulse response are unconstrained, since they subtract out in the constraint, while, for perfect reconstruction, exactly one odd-indexed term must be nonzero in the lowpass impulse response $ h_0(n)$ . The simplest choice is $ h_0(1)\neq 0$ .

Thus, we have derived that the lowpass-filter impulse-response for channel 0 can be anything of the form

$\displaystyle h_0 \eqsp [h_0(0), \bold{h_0(1)}, h_0(2), 0, h_0(4), 0, h_0(6), 0, \ldots] \protect$ (12.25)


$\displaystyle h_0 \eqsp [h_0(0), 0, h_0(2), \bold{h_0(3)}, h_0(4), 0, h_0(6), 0, \ldots]$ (12.26)

etc. The corresponding highpass-filter impulse response is then

$\displaystyle h_1(n) \eqsp \delta(n) - h_0(n).$ (12.27)

The first example (11.25) above goes with the highpass filter

$\displaystyle h_1 \eqsp [1-h_0(0), -h_0(1), -h_0(2), 0, -h_0(4), 0, -h_0(6), 0, \ldots]$ (12.28)

and similarly for the other example.

The above class of amplitude-complementary filters can be characterized in general as follows:

H_0(z) &=& E_0(z^2) + h_0(o) z^{-o}, \quad E_0(1)+h_0(o)\eqsp 1, \, \hbox{$o$\ odd}\\ [5pt]
H_1(z) &=& 1-H_0(z) \eqsp 1 - E_0(z^2) - h_0(o) z^{-o}

In summary, we see that an amplitude-complementary lowpass/highpass analysis filter pair yields perfect reconstruction (aliasing and filtering cancellation) when there is exactly one odd-indexed term in the impulse response of $ h_0(n)$ .

Unfortunately, the channel filters are so constrained in form that it is impossible to make a high quality lowpass/highpass pair. This happens because $ E_0(z^2)$ repeats twice around the unit circle. Since we assume real coefficients, the frequency response, $ E_0(e^{j2\omega})$ is magnitude-symmetric about $ \omega=\pi/2$ as well as $ \pi$ . This is not good since we only have one degree of freedom, $ h_0(o) z^{-o}$ , with which we can break the $ \pi/2$ symmetry to reduce the high-frequency gain and/or boost the low-frequency gain. This class of filters cannot be expected to give high quality lowpass or highpass behavior.

To achieve higher quality lowpass and highpass channel filters, we will need to relax the amplitude-complementary constraint (and/or filtering cancellation and/or aliasing cancellation) and find another approach.

Haar Example

Before we leave the case of amplitude-complementary, two-channel, critically sampled, perfect reconstruction filter banks, let's see what happens when $ H_0(z)$ is the simplest possible lowpass filter having unity dc gain, i.e.,

$\displaystyle H_0(z) \eqsp \frac{1}{2} + \frac{1}{2}z^{-1}.$ (12.29)

This case is obtained above by setting $ E_0(z^2)=1/2$ , $ o=1$ , and $ h_0(1)=1/2$ . The polyphase components of $ H_0(z)$ are clearly

$\displaystyle E_0(z^2)\eqsp E_1(z^2)\eqsp 1/2.$ (12.30)

Choosing $ H_1(z)=1-H_0(z)$ , and choosing $ F_0(z)$ and $ F_1(z)$ for aliasing cancellation, the four filters become

H_0(z) &=& \frac{1}{2} + \frac{1}{2}z^{-1} \eqsp E_0(z^2)+z^{-1}E_1(z^2)\\ [5pt]
H_1(z) &=& 1-H_0(z) \eqsp \frac{1}{2} - \frac{1}{2}z^{-1} \eqsp E_0(z^2)-z^{-1}E_1(z^2)\\ [5pt]
F_0(z) &=& \;\;\, H_1(-z) \eqsp \frac{1}{2} + \frac{1}{2}z^{-1} \eqsp \;\;\,H_0(z)\\ [5pt]
F_1(z) &=& -H_0(-z) \eqsp -\frac{1}{2} + \frac{1}{2}z^{-1} \eqsp -H_1(z).

Thus, both the analysis and reconstruction filter banks are scalings of the familiar Haar filters (``sum and difference'' filters $ (1\pm z^{-1})/\sqrt{2}$ ). The frequency responses are

H_0(e^{j\omega}) &=&\;\;\,F_0(e^{j\omega}) \eqsp \frac{1}{2} + \frac{1}{2}e^{-j\omega}\eqsp e^{-j\frac{\omega}{2}} \cos\left(\frac{\omega}{2}\right)\\ [5pt]
H_1(e^{j\omega}) &=& -F_0(e^{j\omega}) \eqsp \frac{1}{2} - \frac{1}{2}e^{-j\omega}\eqsp j e^{-j\frac{\omega}{2}} \sin\left(\frac{\omega}{2}\right)

which are plotted in Fig.11.16.

Figure 11.16: Amplitude responses of the two channel filters in the Haar filter bank.

Polyphase Decomposition of Haar Example

% latex2html id marker 29979\psfrag{x(n)}{\normalsize $x(n)$}\psfrag{x}{\normalsize $x$}\psfrag{(n)}{\normalsize $(n)$}\psfrag{y}{\normalsize $y$}\psfrag{v}{\normalsize $v$}\psfrag{H}{\normalsize $H$}\psfrag{F}{\normalsize $F$}\psfrag{z}{\normalsize $z$}\begin{figure}[htbp]
\caption{Two-channel polyphase filter bank and inverse.}

Let's look at the polyphase representation for this example. Starting with the filter bank and its reconstruction (see Fig.11.17), the polyphase decomposition of $ H_0(z)$ is

$\displaystyle H_0(z) \eqsp E_0(z^2) + z^{-1}E_1(z^2) \eqsp \frac{1}{2}+\frac{1}{2}z^{-1}.$ (12.31)

Thus, $ E_0(z^2)=E_1(z^2)=1/2$ , and therefore

$\displaystyle H_1(z) \eqsp 1-H_0(z) \eqsp E_0(z^2)-z^{-1}E_1(z^2).$ (12.32)

We may derive polyphase synthesis filters as follows:

\hat{X}(z) &=& \left[F_0(z)H_0(z) + F_1(z)H_1(z)\right] X(z)\\
&=& \left[\left(\frac{1}{2} + \frac{1}{2}z^{-1}\right)H_0(z) + \left(-\frac{1}{2}+\frac{1}{2}z^{-1}\right)H_1(z)\right]X(z)\\
&=& \frac{1}{2}\left\{\left[H_0(z)-H_1(z)\right] + z^{-1}\left[H_0(z) + H_1(z)\right]\right\}X(z)

The polyphase representation of the filter bank and its reconstruction can now be drawn as in Fig.11.18. Notice that the reconstruction filter bank is formally the transpose of the analysis filter bank [263]. A filter bank that is inverted by its own transpose is said to be an orthogonal filter bank, a subject to which we will return §11.3.8.

Figure 11.18: Polyphase representation of the general two-channel, critically sampled filter bank and its inverse.

Figure 11.19: Figure 11.18 with downsamplers commuted inside the filter branches.

Commuting the downsamplers (using the noble identities from §11.2.5), we obtain Figure 11.19. Since $ E_0(z)=E_1(z)=1/2$ , this is simply the OLA form of an STFT filter bank for $ N=2$ , with $ N=M=R=2$ , and rectangular window $ w=[1/2,1/2]$ . That is, the DFT size, window length, and hop size are all 2, and both the DFT and its inverse are simply sum-and-difference operations.

Quadrature Mirror Filters (QMF)

The well studied subject of Quadrature Mirror Filters (QMF) is entered by imposing the following symmetry constraint on the analysis filters:

$\displaystyle H_1(z) \eqsp H_0(-z)\quad \hbox{(QMF Symmetry Constraint)} \protect$ (12.33)

That is, the filter for channel 1 is constrained to be a $ \pi$ -rotation of filter 0 along the unit circle. This of course makes perfect sense for a two-channel band-splitting filter bank, and can form the basis of a dyadic tree band splitting, as we'll look at in §11.9.1 below.

In the time domain, the QMF constraint (11.33) becomes $ h_1(n) =
(-1)^n h_0(n)$ , i.e., all odd-index coefficients are negated. If $ H_0$ is a lowpass filter cutting off near $ \omega=\pi/2$ (as is typical), then $ H_1$ is a complementary highpass filter. The exact cut-off frequency can be adjusted along with the roll-off rate to provide a maximally constant frequency-response sum.

Two-channel QMFs have been around since at least 1976 [51], and appear to be the first critically sampled perfect reconstruction filter banks. Moreover, the Princen-Bradley filter bank, the initial foundation of MPEG audio as we now know it, was conceived as the Fourier dual of QMFs [214]. Historically, the term QMF applied only to two-channel filter banks having the QMF symmetry constraint (11.33). Today, the term ``QMF filter bank'' may refer to more general PR filter banks with any number of channels and not obeying (11.33) [287].

Combining the QMF symmetry constraint with the aliasing-cancellation constraints, given by

F_0(z) &=& \quad\! H_1(-z) \eqsp \quad\! H_0(z)\\ [5pt]
F_1(z) &=& -H_0(-z) \eqsp -H_1(z),

the perfect reconstruction requirement reduces to

$\displaystyle g\,z^{-d} \eqsp H_0(z)F_0(z) + H_1(z)F_1(z) \eqsp H_0^2(z) - H_0^2(-z). \protect$ (12.34)

Now, all four filters are determined by $ H_0(z)$ .

It is easy to show using the polyphase representation of $ H_0(z)$ (see [287]) that the only causal FIR QMF analysis filters yielding exact perfect reconstruction are two-tap FIR filters of the form

H_0(z) &=& c_0 z^{-2n_0} + c_1 z^{-(2n_1+1)}\\
H_1(z) &=& c_0 z^{-2n_0} - c_1 z^{-(2n_1+1)}

where $ c_0$ and $ c_1$ are constants, and $ n_0$ and $ n_1$ are integers. Therefore, only weak channel filters are available in the QMF case [ $ H_1(z)=H_0(-z)$ ], as we saw in the amplitude-complementary case above. On the other hand, very high quality IIR solutions are possible. See [287, pp. 201-204] for details. In practice, approximate ``pseudo QMF'' filters are more practical, which only give approximate perfect reconstruction. We'll return to this topic in §11.7.1.

The scaled Haar filters, which we saw gave perfect reconstruction in the amplitude-complementary case, are also examples of a QMF filter bank:

H_0(z) &=& 1 + z^{-1}\\ [5pt]
H_1(z) &=& 1 - z^{-1}

In this example, $ c_0=c_1=1$ , and $ n_0=n_1=0$ .

Linear Phase Quadrature Mirror Filter Banks

Linear phase filters delay all frequencies by equal amounts, and this is often a desirable property in audio and other applications. A filter phase response is linear in $ \omega$ whenever its impulse response $ h_0(n)$ is symmetric, i.e.,

$\displaystyle h_0(-n) \eqsp h_0(n)$ (12.35)

in which case the frequency response can be expressed as

$\displaystyle H_0(e^{j\omega}) \eqsp e^{-j\omega N/2}\left\vert H_0(e^{j\omega})\right\vert.$ (12.36)

Substituting this into the QMF perfect reconstruction constraint (11.34) gives

$\displaystyle g\,e^{-j\omega d} \eqsp e^{-j\omega N}\left[ \left\vert H_0(e^{j\omega})\right\vert^2 - (-1)^N\left\vert H_0(e^{j(\pi-\omega)})\right\vert^2\right].$ (12.37)

When $ N$ is even, the right hand side of the above equation is forced to zero at $ \omega=\pi/2$ . Therefore, we will only consider odd $ N$ , for which the perfect reconstruction constraint reduces to

$\displaystyle g\,z^{-j\omega d} \eqsp e^{-j\omega N}\left[ \left\vert H_0(e^{j\omega})\right\vert^2 + \left\vert H_0(e^{j(\pi-\omega)}\right\vert^2\right].$ (12.38)

We see that perfect reconstruction is obtained in the linear-phase case whenever the analysis filters are power complementary. See [287] for further details.

Conjugate Quadrature Filters (CQF)

A class of causal, FIR, two-channel, critically sampled, exact perfect-reconstruction filter-banks is the set of so-called Conjugate Quadrature Filters (CQF). In the z-domain, the CQF relationships are

$\displaystyle H_1(z) \eqsp z^{-(L-1)}H_0(-z^{-1}).$ (12.39)

In the time domain, the analysis and synthesis filters are given by

h_1(n) &=& -(-1)^n h_0(L-1-n) \\ [5pt]
f_0(n) &=& h_0(L-1-n) \\ [5pt]
f_1(n) &=& -(-1)^n h_0(n) \eqsp - h_1(L-1-n).

That is, $ f_0=\hbox{\sc Flip}(h_0)$ for the lowpass channel, and each highpass channel filter is a modulation of its lowpass counterpart by $ (-1)^n$ . Again, all four analysis and synthesis filters are determined by the lowpass analysis filter $ H_0(z)$ . It can be shown that this is an orthogonal filter bank. The analysis filters $ H_0(z)$ and $ H_1(z)$ are power complementary, i.e.,

$\displaystyle \left\vert H_0{e^{j\omega}}\right\vert^2 + \left\vert H_1{e^{j\omega}}\right\vert^2 \eqsp 1$ (12.40)


$\displaystyle {\tilde H}_0(z) H_0(z) + {\tilde H}_1(z) H_1(z) \eqsp 1$ (12.41)

where $ {\tilde H}_0(z)\isdef \overline{H}_0(z^{-1})$ denotes the paraconjugate of $ H_0(z)$ (for real filters $ H_0$ ). The paraconjugate is the analytic continuation of $ \overline{H_0(e^{j\omega})}$ from the unit circle to the $ z$ plane. Moreover, the analysis filters $ H_0(z)$ are power symmetric, e.g.,

$\displaystyle {\tilde H}_0(z) H_0(z) + {\tilde H}_0(-z) H_0(-z) \eqsp 1 .$ (12.42)

The power symmetric case was introduced by Smith and Barnwell in 1984 [272]. With the CQF constraints, (11.18) reduces to

$\displaystyle \hat{X}(z) \eqsp \frac{1}{2}\left[H_0(z)H_0(z^{-1}) + H_0(-z)H_0(-z^{-1})\right]X(z) \protect$ (12.43)

Let $ P(z) = H_0(z)H_0(-z)$ , such that $ H_0(z)$ is a spectral factor of the half-band filter $ P(z)$ (i.e., $ P(e^{j\omega})$ is a nonnegative power response which is lowpass, cutting off near $ \omega=\pi/4$ ). Then, (11.43) reduces to

$\displaystyle \hat{X}(z) \eqsp \frac{1}{2}\left[P(z) + P(-z)\right]X(z) \eqsp -z^{-(L-1)}X(z)$ (12.44)

The problem of PR filter design has thus been reduced to designing one half-band filter $ P(z)$ . It can be shown that any half-band filter can be written in the form $ p(2n) = \delta(n)$ . That is, all non-zero even-indexed values of $ p(n)$ are set to zero.

A simple design of an FIR half-band filter would be to window a sinc function:

$\displaystyle p(n) \eqsp \frac{\hbox{sin}[\pi n/2]}{\pi n/2}w(n)$ (12.45)

where $ w(n)$ is any suitable window, such as the Kaiser window.

Note that as a result of (11.43), the CQF filters are power complementary. That is, they satisfy

$\displaystyle \left\vert H_0(e^{j \omega})\right\vert^2 + \left\vert H_1(e^{j \omega})\right\vert^2 \eqsp 2.$ (12.46)

Also note that the filters $ H_0$ and $ H_1$ are not linear phase. It can be shown that there are no two-channel perfect reconstruction filter banks that have all three of the following characteristics (except for the Haar filters):
  • FIR
  • orthogonal
  • linear phase
In this design procedure, we have chosen to satisfy the first two and give up the third.

By relaxing ``orthogonality'' to ``biorthogonality'', it becomes possible to obtain FIR linear phase filters in a critically sampled, perfect reconstruction filter bank. (See §11.9.)

Orthogonal Two-Channel Filter Banks

Recall the reconstruction equation for the two-channel, critically sampled, perfect-reconstruction filter-bank:

\hat{X}(z) &=& \frac{1}{2}[H_0(z)F_0(z) + H_1(z)F_1(z)]X(z)
\nonumber\\ [5pt]
&+& \frac{1}{2}[H_0(-z)F_0(z) + H_1(-z)F_1(z)]X(-z)

This can be written in matrix form as

$\displaystyle \hat{X}(z) \eqsp \frac{1}{2} \left[\begin{array}{c} F_0(z) \\ [2pt] F_1(z) \end{array}\right]^{T} \left[\begin{array}{cc} H_0(z) & H_0(-z) \\ [2pt] H_1(z) & H_1(-z) \end{array}\right] \left[\begin{array}{c} X(z) \\ [2pt] X(-z) \end{array}\right]$ (12.47)

where the above $ 2 \times 2$ matrix, $ \bold{H}_m(z)$ , is called the alias component matrix (or analysis modulation matrix). If

$\displaystyle {\tilde {\bold{H}}}_m(z)\bold{H}_m(z) \eqsp 2\bold{I}$ (12.48)

where $ {\tilde {\bold{H}}}_m(z)\isdef \bold{H}_m^T(z^{-1})$ denotes the paraconjugate of $ \bold{H}_m(z)$ , then the alias component (AC) matrix is lossless, and the (real) filter bank is orthogonal.

It turns out orthogonal filter banks give perfect reconstruction filter banks for any number of channels. Orthogonal filter banks are also called paraunitary filter banks, which we'll study in polyphase form in §11.5 below. The AC matrix is paraunitary if and only if the polyphase matrix (defined in the next section) is paraunitary [287].

Perfect Reconstruction Filter Banks

We now consider filter banks with an arbitrary number of channels, and ask under what conditions do we obtain a perfect reconstruction filter bank? Polyphase analysis will give us the answer readily. Let's begin with the $ N$ -channel filter bank in Fig.11.20. The downsampling factor is $ R\leq N$ . For critical sampling, we set $ R=N$ .

Figure: $ N$ -channel filter bank.

The next step is to expand each analysis filter $ H_k(z)$ into its $ N$ -channel ``type I'' polyphase representation:

$\displaystyle H_k(z) \eqsp \sum_{l=0}^{N-1} z^{-l} E_{kl}(z^N)$ (12.49)


$\displaystyle \underbrace{\left[\begin{array}{c} H_0(z) \\ [2pt] H_1(z) \\ [2pt] \vdots \\ [2pt] \!\!H_{N-1}(z)\!\!\end{array}\right]}_{\bold{h}(z)} = \underbrace{\left[\begin{array}{cccc} E_{0,0}(z^N) & E_{0,1}(z^N) & \cdots & E_{0,N-1}(z^N) \\ E_{1,0}(z^N) & E_{1,1}(z^N) & \cdots & E_{1,N-1}(z^N)\\ \vdots & \vdots & \cdots & \vdots\\ \!\!E_{N-1,0}(z^N) & E_{N-1,1}(z^N) & \cdots & E_{N-1,N-1}(z^N) \!\! \end{array}\right]}_{\bold{E}(z^N)} \underbrace{\left[\begin{array}{c} 1 \\ [2pt] z^{-1} \\ [2pt] \vdots \\ [2pt] \!\!z^{-(N-1)}\!\!\end{array}\right]}_{\bold{e}(z)}$ (12.50)

which we can write as

$\displaystyle \bold{h}(z) \eqsp \bold{E}(z^N)\bold{e}(z).$ (12.51)

Similarly, expand the synthesis filters in a type II polyphase decomposition:

$\displaystyle F_k(z) \eqsp \sum_{l=0}^{N-1} z^{-(N-l-1)}R_{lk}(z^N)$ (12.52)


$\displaystyle \underbrace{\left[\begin{array}{c} F_0(z) \\ [2pt] F_1(z) \\ [2pt] \vdots \\ [2pt] \!\!F_{N-1}(z)\!\!\end{array}\right]^T}_{\bold{f}^T(z)} \eqsp \underbrace{\left[\begin{array}{c} \!\!z^{-(N-1)}\!\! \\ [2pt] \!\!z^{-(N-2)}\!\! \\ [2pt] \vdots \\ [2pt] 1\end{array}\right]^T}_{{\tilde{\bold{e}}}(z)} \underbrace{\left[\begin{array}{cccc} R_{0,0}(z^N) & R_{0,1}(z^N) & \cdots & R_{0,N-1}(z^N) \\ R_{1,0}(z^N) & R_{1,1}(z^N) & \cdots & R_{1,N-1}(z^N)\\ \vdots & \vdots & \cdots & \vdots\\ \!\!R_{N-1,0}(z^N) & R_{N-1,1}(z^N) & \cdots & R_{N-1,N-1}(z^N)\!\! \end{array}\right]}_{\bold{R}(z^N)}$ (12.53)

which we can write as

$\displaystyle \bold{f}^T(z) \eqsp {\tilde{\bold{e}}}(z)\bold{R}(z^N).$ (12.54)

The polyphase representation can now be depicted as shown in Fig.11.21. When $ R=N$ , commuting the up/downsamplers gives the result shown in Fig.11.22. We call $ \bold{E}(z)$ the polyphase matrix.

Figure: Polyphase representation of the $ N$ -channel filter bank.

Figure: Efficient polyphase form of the $ N$ -channel filter bank.

As we will show below, the above simplification can be carried out more generally whenever $ R$ divides $ N$ (e.g., $ R=N/2, N/3,\ldots,
1$ ). In these cases $ \bold{E}(z)$ becomes $ \bold{E}(z^{N/R})$ and $ \bold{R}(z)$ becomes $ \bold{R}(z^{N/R})$ .

Simple Examples of Perfect Reconstruction

If we can arrange to have

$\displaystyle \zbox {\bold{R}(z)\bold{E}(z) = \bold{I}_N}$ (12.55)

then the filter bank will reduce to the simple system shown in Fig.11.23.

Figure: Simplified filter bank when $ R(z)$ inverts $ E(z)$ .

Thus, when $ R=N$ and $ \bold{R}(z)\bold{E}(z)=\bold{I}_N$ , we have a simple parallelizer/serializer, which is perfect-reconstruction by inspection: Referring to Fig.11.23, think of the input samples $ x(n)$ as ``filling'' a length $ N-1$ delay line over $ N-1$ sample clocks. At time 0 , the downsamplers and upsamplers ``fire'', transferring $ x(0)$ (and $ N-1$ zeros) from the delay line to the output delay chain, summing with zeros. Over the next $ N-1$ clocks, $ x(0)$ makes its way toward the output, and zeros fill in behind it in the output delay chain. Simultaneously, the input buffer is being filled with samples of $ x(n)$ . At time $ N-1$ , $ x(0)$ makes it to the output. At time $ N$ , the downsamplers ``fire'' again, transferring a length $ N$ ``buffer'' [$ x(1$ :$ N)$ ] to the upsamplers. On the same clock pulse, the upsamplers also ``fire'', transferring $ N$ samples to the output delay chain. The bottom-most sample [ $ x(n-N+1)=x(1)$ ] goes out immediately at time $ N$ . Over the next $ N-1$ sample clocks, the length $ N-1$ output buffer will be ``drained'' and refilled by zeros. Simultaneously, the input buffer will be replaced by new samples of $ x(n)$ . At time $ 2N$ , the downsamplers and upsamplers ``fire'', and the process goes on, repeating with period $ N$ . The output of the $ N$ -way parallelizer/serializer is therefore

$\displaystyle {\hat x}(n) \eqsp x(n-N+1)$ (12.56)

and we have perfect reconstruction.

Sliding Polyphase Filter Bank

Figure: Simplified filter bank when $ R(z)$ inverts $ E(z)$ and there are no downsamplers or upsamplers ($ R=1$ ).

When $ R=1$ , there is no downsampling or upsampling, and the system further reduces to the case shown in Fig.11.24. Working backward along the output delay chain, the output sum can be written as

\hat{X}(z) &=& \left[z^{-0}z^{-(N-1)} + z^{-1}z^{-(N-2)} + z^{-2}z^{-(N-3)} + \cdots \right.\\
& & \left. + z^{-(N-2)}z^{-1} + z^{-(N-1)}z^{-0} \right] X(z)\\
&=& N z^{-(N-1)} X(z).

Thus, when $ R=1$ , the output is

$\displaystyle {\hat x}(n) \eqsp N x(n-N+1)$ (12.57)

and we again have perfect reconstruction.

Hopping Polyphase Filter Bank

When $ 1<R<N$ and $ R$ divides $ N$ , we have, by a similar analysis,

$\displaystyle {\hat x}(n) \eqsp \frac{N}{R} x(n-N+1)$ (12.58)

which is again perfect reconstruction. Note the built-in overlap-add when $ R<N$ .

Sufficient Condition for Perfect Reconstruction

Above, we found that, for any integer $ 1\leq R\leq N$ which divides $ N$ , a sufficient condition for perfect reconstruction is

$\displaystyle \bold{P}(z)\isdefs \bold{R}(z)\bold{E}(z) \eqsp \bold{I}_N$ (12.59)

and the output signal is then

$\displaystyle {\hat x}(n) \eqsp \frac{N}{R} \, x(n-N+1).$ (12.60)

More generally, we allow any nonzero scaling and any additional delay:

$\displaystyle \bold{P}(z) \eqsp \bold{R}(z)\bold{E}(z) \eqsp c\, z^{-K}\, \bold{I}_N \protect$ (12.61)

where $ c\neq 0$ is any constant and $ K$ is any nonnegative integer. In this case, the output signal is

$\displaystyle {\hat x}(n) \eqsp c\,\frac{N}{R} \, x(n-N+1-K)$ (12.62)

Thus, given any polyphase matrix $ \bold{E}(z)$ , we can attempt to compute $ \bold{R}(z) = \bold{E}^{-1}(z)$ : If it is stable, we can use it to build a perfect-reconstruction filter bank. However, if $ \bold{E}(z)$ is FIR, $ \bold{R}(z)$ will typically be IIR. In §11.5 below, we will look at paraunitary filter banks, for which $ \bold{R}(z)$ is FIR and paraunitary whenever $ \bold{E}(z)$ is.

Necessary and Sufficient Conditions for Perfect Reconstruction

It can be shown [287] that the most general conditions for perfect reconstruction are that

$\displaystyle \zbox {\bold{R}(z)\bold{E}(z) \eqsp c z^{-K} \left[\begin{array}{cc} \bold{0}_{(N-L)\times L} & z^{-1}\bold{I}_{N-L} \\ [2pt] \bold{I}_L & \bold{0}_{L \times (N-L)} \end{array}\right]}$ (12.63)

for some constant $ c$ and some integer $ K\geq 0$ , where $ L$ is any integer between 0 and $ N-1$ .

Note that the more general form of $ \bold{R}(z)\bold{E}(z)$ above can be regarded as a (non-unique) square root of a vector unit delay, since

$\displaystyle \left[\begin{array}{cc} \bold{0}_{(N-L)\times L} & z^{-1}\bold{I}_{N-L} \\ [2pt] \bold{I}_L & \bold{0}_{L \times (N-L)} \end{array}\right]^2 \eqsp z^{-1}\bold{I}_N.$ (12.64)

Thus, the general case is the same thing as

$\displaystyle \bold{R}(z)\bold{E}(z) \eqsp c z^{-K} \bold{I}_N.$ (12.65)

except for some channel swapping and an extra sample of delay in some channels.

Polyphase View of the STFT

As a familiar special case, set

$\displaystyle \bold{E}(z) \eqsp \bold{W}_N^\ast$ (12.66)

where $ \bold{W}_N^\ast$ is the DFT matrix:

$\displaystyle \bold{W}_N^\ast[kn] \eqsp \left[e^{-j2\pi kn/N}\right]$ (12.67)

The inverse of this polyphase matrix is then simply the inverse DFT matrix:

$\displaystyle \bold{R}(z) \eqsp \frac{1}{N}\bold{W}_N$ (12.68)

Thus, the STFT (with rectangular window) is the simple special case of a perfect reconstruction filter bank for which the polyphase matrix is constant. It is also unitary; therefore, the STFT is an orthogonal filter bank.

The channel analysis and synthesis filters are, respectively,

H_k(z) &=& H_0(zW_N^k)\\ [5pt]
F_k(z) &=& F_0(zW_N^{-k})

where $ W_N\isdef e^{-j2\pi/N}$ , and

$\displaystyle F_0(z)\eqsp H_0(z)\eqsp \sum_{n=0}^{N-1}z^{-n}\;\longleftrightarrow\;[1,1,\ldots,1]$ (12.69)

corresponding to the rectangular window.

Figure 11.25: Polyphase representation of the STFT with a rectangular window.

Looking again at the polyphase representation of the $ N$ -channel filter bank with hop size $ R$ , $ \bold{E}(z)=\bold{W}_N^\ast$ , $ \bold{R}(z)=\bold{W}_N$ , $ R$ dividing $ N$ , we have the system shown in Fig.11.25. Following the same analysis as in §11.4.1 leads to the following conclusion:

$\displaystyle \zbox {\hbox{The polyphase representation is an \emph{overlap-add} representation.}}

Our analysis showed that the STFT using a rectangular window is a perfect reconstruction filter bank for all integer hop sizes in the set $ R\in\{N,N/2,N/3,\ldots,N/N\}$ . The same type of analysis can be applied to the STFT using the other windows we've studied, including Portnoff windows.

Example: Polyphase Analysis of the STFT with 50% Overlap, Zero-Padding, and a Non-Rectangular Window

Figure: Polyphase representation of the STFT with a more general window and $ M/2$ hop size ($ 50$ % overlap).

Figure 11.26 illustrates how a window and a hop size other than $ N$ can be introduced into the polyphase representation of the STFT. The constant-overlap-add of the window $ w(n)$ is implemented in the synthesis delay chain (which is technically the transpose of a tapped delay line). The downsampling factor and window must be selected together to give constant overlap-add, independent of the choice of polyphase matrices $ \bold{E}(z)$ and $ \bold{R}(z)$ (shown here as the $ \hbox{\sc DFT}$ and $ \hbox{\sc IDFT}$ ).

Example: Polyphase Analysis of the Weighted Overlap Add Case: 50% Overlap, Zero-Padding, and a Non-Rectangular Window

We may convert the previous example to a weighted overlap-add (WOLA) (§8.6) filter bank by replacing each $ w(m)$ by $ \sqrt{w(m)}$ and introducing these gains also between the $ \hbox{\sc IDFT}$ and upsamplers, as shown in Fig.11.27.

Figure 11.27: Polyphase representation of STFT using Weighted OverLap Add (WOLA).

Paraunitary Filter Banks

Paraunitary filter banks form an interesting subset of perfect reconstruction (PR) filter banks. We saw above that we get a PR filter bank whenever the synthesis polyphase matrix $ \bold{R}(z)$ times the analysis polyphase matrix $ \bold{E}(z)$ is the identity matrix $ \bold{I}$ , i.e., when

$\displaystyle \bold{P}(z) \isdefs \bold{R}(z)\bold{E}(z) \eqsp \bold{I}.$ (12.70)

In particular, if $ \bold{R}(z)$ is the paraconjugate of $ \bold{E}(z)$ , we say the filter bank is paraunitary.

Paraconjugation is the generalization of the complex conjugate transpose operation from the unit circle to the entire $ z$ plane. A paraunitary filter bank is therefore a generalization of an orthogonal filter bank. Recall that an orthogonal filter bank is one in which $ \bold{E}(e^{j\omega})$ is an orthogonal (or unitary) matrix, to within a constant scale factor, and $ \bold{R}(e^{j\omega})$ is its transpose (or Hermitian transpose).

Lossless Filters

To motivate the idea of paraunitary filters, let's first review some properties of lossless filters, progressing from the simplest cases up to paraunitary filter banks:

A linear, time-invariant filter $ H(z)$ is said to be lossless (or allpass) if it preserves signal energy. That is, if the input signal is $ x(n)$ , and the output signal is $ y(n) = (h\ast x)(n)$ , then we have

$\displaystyle \sum_{n=-\infty}^{\infty} \left\vert y(n)\right\vert^2 \eqsp \sum_{n=-\infty}^{\infty} \left\vert x(n)\right\vert^2.$ (12.71)

In terms of the $ L2$ signal norm $ \left\Vert\,\,\cdot\,\,\right\Vert _2$4.10.1), this can be expressed more succinctly as

$\displaystyle \left\Vert\,y\,\right\Vert _2^2 \eqsp \left\Vert\,x\,\right\Vert _2^2.$ (12.72)

Notice that only stable filters can be lossless since, otherwise, $ \left\Vert\,y\,\right\Vert=\infty$ . We further assume all filters are causal for simplicity.

It is straightforward to show that losslessness implies

$\displaystyle \left\vert H(e^{j\omega})\right\vert \eqsp 1, \quad \forall \omega.$ (12.73)

That is, the frequency response must have magnitude 1 everywhere on the unit circle in the $ z$ plane. Another way to express this is to write

$\displaystyle \overline{H(e^{j\omega})} H(e^{j\omega}) \eqsp 1, \quad\forall\omega,$ (12.74)

and this form generalizes to $ {\tilde H}(z)H(z)$ over the entire the $ z$ plane.

The paraconjugate of a transfer function may be defined as the analytic continuation of the complex conjugate from the unit circle to the whole $ z$ plane:

$\displaystyle {\tilde H}(z) \isdefs \overline{H}(z^{-1})$ (12.75)

where $ \overline{H}(z)$ denotes complex conjugation of the coefficients only of $ H(z)$ and not the powers of $ z$ . For example, if $ H(z)=1+jz^{-1}$ , then $ \overline{H}(z) = 1-jz^{-1}$ . We can write, for example,

$\displaystyle \overline{H}(z) \isdefs \overline{H\left(\overline{z}\right)}$ (12.76)

in which the conjugation of $ z$ serves to cancel the outer conjugation.

We refrain from conjugating $ z$ in the definition of the paraconjugate because $ \overline{z}$ is not analytic in the complex-variables sense. Instead, we invert $ z$ , which is analytic, and which reduces to complex conjugation on the unit circle.

The paraconjugate may be used to characterize allpass filters as follows:

A causal, stable, filter $ H(z)$ is allpass if and only if

$\displaystyle {\tilde H}(z) H(z) \eqsp 1.$ (12.77)

Note that this is equivalent to the previous result on the unit circle since

$\displaystyle {\tilde H}(e^{j\omega}) H(e^{j\omega}) \eqsp \overline{H}(1/e^{j\omega})H(e^{j\omega}) \eqsp \overline{H(e^{j\omega})}H(e^{j\omega}).$ (12.78)

To generalize lossless filters to the multi-input, multi-output (MIMO) case, we must generalize conjugation to MIMO transfer function matrices.

A $ p\times q$ transfer function matrix $ \bold{H}(z)$ is said to be lossless if it is stable and its frequency-response matrix $ \bold{H}(e^{j\omega})$ is unitary. That is,

$\displaystyle \bold{H}^*(e^{j\omega})\bold{H}(e^{j\omega}) \eqsp \bold{I}_q$ (12.79)

for all $ \omega$ , where $ \bold{I}_q$ denotes the $ q\times q$ identity matrix, and $ \bold{H}^\ast(e^{j\omega})$ denotes the Hermitian transpose (complex-conjugate transpose) of $ \bold{H}(e^{j\omega})$ :

$\displaystyle \bold{H}^*(e^{j\omega}) \isdefs \overline{\bold{H}^T(e^{j\omega})}$ (12.80)

Note that $ \bold{H}^*(e^{j\omega})\bold{H}(e^{j\omega})$ is a $ q\times q$ matrix product of a $ q\times p$ times a $ p\times q$ matrix. If $ q>p$ , then the rank must be deficient. Therefore, we must have $ p\geq q$ . (There must be at least as many outputs as there are inputs, but it's ok to have extra outputs.)

A lossless $ p\times q$ transfer function matrix $ \bold{H}(z)$ is paraunitary, i.e.,

$\displaystyle {\tilde {\bold{H}}}(z) \bold{H}(z) \eqsp \bold{I}_q$ (12.81)

Thus, every paraunitary matrix transfer function is unitary on the unit circle for all $ \omega$ . Away from the unit circle, paraunitary $ \bold{H}(z)$ is the unique analytic continuation of unitary $ \bold{H}(e^{j\omega})$ .

Lossless Filter Examples

The simplest lossless filter is a unit-modulus gain

$\displaystyle H(z) \eqsp e^{j\phi}$ (12.82)

where $ \phi$ can be any phase value. In the real case $ \phi$ can only be 0 or $ \pi$ , hence $ H(z)=\pm 1$ .

A lossless FIR filter can only consist of a single nonzero tap:

$\displaystyle H(z) \eqsp e^{j\phi} z^{-K} \protect$ (12.83)

for some fixed integer $ K$ , where $ \phi$ is again some constant phase, constrained to be 0 or $ \pi$ in the real-filter case. We consider only causal filters here, so $ K\geq 0$ .

Every finite-order, single-input, single-output (SISO), lossless IIR filter (recursive allpass filter) can be written as

$\displaystyle H(z) \eqsp e^{j\phi} z^{-K} \frac{z^{-N}{\tilde A}(z)}{A(z)}$ (12.84)

where $ K\geq 0$ , $ A(z) = 1 + a_1 z^{-1}+ a_2 z^{-2} + \cdots + a_N
z^{-N}$ , and $ {\tilde A}(z)\isdef \overline{A}(z^{-1})$ . The polynomial $ {\tilde A}(z)$ can be obtained by reversing the order of the coefficients in $ A(z)$ , conjugating them, and multiplying by $ z^N$ . (The factor $ z^{-N}$ above serves to restore negative powers of $ z$ and hence causality.) Such filters are generally called allpass filters.

The normalized DFT matrix is an $ N\times N$ order zero paraunitary transformation. This is because the normalized DFT matrix, $ \bold{W}=[W_N^{nk}]/\sqrt{N}$ , $ n,k=0,\ldots,N-1$ , where $ W_N\isdeftext
e^{-j2\pi/N}$ , is a unitary matrix:

$\displaystyle \frac{\bold{W}^\ast}{\sqrt{N}} \frac{\bold{W}}{\sqrt{N}} \eqsp \bold{I}_N$ (12.85)

Properties of Paraunitary Filter Banks

Paraunitary systems are essentially multi-input, multi-output (MIMO) allpass filters. Let $ \bold{H}(z)$ denote the $ p\times q$ matrix transfer function of a paraunitary system. In the square case ($ p=q$ ), the matrix determinant, $ \det[\bold{H}(z)]$ , is an allpass filter. Therefore, if a square $ \bold{H}(z)$ contains FIR elements, its determinant is a simple delay: $ \det[\bold{H}(z)]=z^{-K}$ for some integer $ K$ .

An $ N$ -channel analysis filter bank can be viewed as an $ N\times 1$ MIMO filter:

$\displaystyle \bold{H}(z) \eqsp \left[\begin{array}{c} H_1(z) \\ [2pt] H_2(z) \\ [2pt] \vdots \\ [2pt] H_N(z)\end{array}\right]$ (12.86)

A paraunitary filter bank must therefore satisfy

$\displaystyle {\tilde {\bold{H}}}(z)\bold{H}(z) \eqsp 1.$ (12.87)

More generally, we allow paraunitary filter banks to scale and/or delay the input signal:

$\displaystyle {\tilde {\bold{H}}}(z)\bold{H}(z) \eqsp c_K z^{-K}$ (12.88)

where $ K$ is some nonnegative integer and $ c_K\neq 0$ .

We can note the following properties of paraunitary filter banks:

  • The synthesis filter bank is simply the paraconjugate of the analysis filter bank:

    $\displaystyle \bold{F}(z) \eqsp {\tilde {\bold{H}}}(z)$ (12.89)

    That is, since the paraconjugate is the inverse of a paraunitary filter matrix, it is exactly what we need for perfect reconstruction.

  • The channel filters $ H_k(z)$ are power complementary:

    $\displaystyle \left\vert H_1(e^{j\omega})\right\vert^2 + \left\vert H_2(e^{j\omega})\right\vert^2 + \cdots + \left\vert H_N(e^{j\omega})\right\vert^2 \eqsp 1$ (12.90)

    This follows immediately from looking at the paraunitary property on the unit circle.

  • When $ \bold{H}(z)$ is FIR, the corresponding synthesis filter matrix $ {\tilde {\bold{H}}}(z)$ is also FIR.

  • When $ \bold{H}(z)$ is FIR, each synthesis filter, $ F_k(z) =
{\tilde {\bold{H}}}_k(z),\, k=1,\ldots,N$ , is simply the $ \hbox{\sc Flip}$ of its corresponding analysis filter $ H_k(z)=\bold{H}_k(z)$ :

    $\displaystyle f_k(n) \eqsp h_k(L-n)$ (12.91)

    where $ L$ is the filter length. (When the filter coefficients are complex, $ \hbox{\sc Flip}$ includes a complex conjugation as well.) This follows from the fact that paraconjugating an FIR filter amounts to simply flipping (and conjugating) its coefficients. As we observed in (11.83) above (§11.5.2), only trivial FIR filters of the form $ H(z) = e^{j\phi} z^{-K}$ can be paraunitary in the single-input, single-output (SISO) case. In the MIMO case, on the other hand, paraunitary systems can be composed of FIR filters of any order.

  • FIR analysis and synthesis filters in paraunitary filter banks have the same amplitude response. This follows from the fact that $ \hbox{\sc Flip}(h) \;\leftrightarrow\;\overline{H}$ , i.e., flipping an FIR filter impulse response $ h(n)$ conjugates the frequency response, which does not affect its amplitude response $ \vert H(e^{j\omega})\vert$ .

  • The polyphase matrix $ \bold{E}(z)$ for any FIR paraunitary perfect reconstruction filter bank can be written as the product of a paraunitary and a unimodular matrix, where a unimodular polynomial matrix $ \bold{U}(z)$ is any square polynomial matrix having a constant nonzero determinant. For example,

    $\displaystyle \bold{U}(z) \eqsp
\left[\begin{array}{cc} 1+z^3 & z^2 \\ [2pt] z & 1 \end{array}\right] $

    is unimodular. See [287, p. 663] for further details.


Consider the Haar filter bank discussed in §11.3.3, for which

$\displaystyle \bold{H}(z) \eqsp \frac{1}{\sqrt{2}}\left[\begin{array}{c} 1+z^{-1} \\ [2pt] 1-z^{-1} \end{array}\right].$ (12.92)

The paraconjugate of $ \bold{H}(z)$ is

$\displaystyle {\tilde {\bold{H}}}(z) \eqsp \frac{1}{\sqrt{2}}\left[\begin{array}{cc} 1+z & 1 - z \end{array}\right]$ (12.93)

so that

$\displaystyle {\tilde {\bold{H}}}(z) \bold{H}(z) \eqsp \frac{1}{2} \left[\begin{array}{cc} 1+z & 1 - z \end{array}\right] \left[\begin{array}{c} 1+z^{-1} \\ [2pt] 1-z^{-1} \end{array}\right] \eqsp 1.$ (12.94)

Thus, the Haar filter bank is paraunitary. This is true for any power-complementary filter bank, since when $ {\tilde {\bold{H}}}(z)$ is $ N\times 1$ , power-complementary and paraunitary are the same property. For more about paraunitary filter banks, see Chapter 6 of [287].

Filter Banks Equivalent to STFTs

We now turn to various practical examples of perfect reconstruction filter banks, with emphasis on those using FFTs in their implementation (i.e., various STFT filter banks).

Figure 11.28 illustrates a generic filter bank with $ N=3$ channels, much like we derived in §9.3. The analysis filters $ H_{k}(z)$ , $ k=1, \ldots, N$ are bandpass filters derived from a lowpass prototype $ H_{0}(z)$ by modulation (e.g., $ H_{k}(z) = H_{0}(zW_N^k),\; W_N \isdef e^{-j\frac{2 \pi}{N}}$ ), as shown in the right portion of the figure. The channel signals $ X_n(\omega_k)$ are given by the convolution of the input signal with the $ k$ th channel impulse response:

$\displaystyle X_n(\omega_k) \eqsp \sum_{m=-\infty}^{\infty} x(m) h_{0}(n-m)W_N^{-km} \protect$ (12.95)

From Chapter 9, we recognize this expression as the sliding-window STFT, where $ h_{0}(n-m)$ is the flip of a sliding window ``centered'' at time $ n$ , and $ X_{n}(\omega_k)$ is the $ k$ th DFT bin at time $ n$ . We also know from that discussion that remodulating the DFT channel outputs and summing gives perfect reconstruction of $ x(n)$ whenever $ h_{0}(n)$ is Nyquist(N) (the defining condition for Portnoff windows [213], §9.7).

% latex2html id marker 30457\psfrag{z^-1}{{\large $z^{-1}$}}\psfrag{X(z)}{{\large $X(z)$}}\psfrag{X0(z)}{{\large $X_0(z)$}}\psfrag{X1(z)}{{\large $X_1(z)$}}\psfrag{X2(z)}{{\large $X_2(z)$}}\psfrag{Xn[0]}{{\normalsize $X_n(\omega_0)$}}\psfrag{Xn[1]}{{\normalsize $X_n(\omega_1)$}}\psfrag{Xn[2]}{{\normalsize $X_n(\omega_2)$}}\psfrag{H0(z)}{{\large $H_0(z)$}}\psfrag{H1(z)}{{\large $H_1(z)$}}\psfrag{H2(z)}{{\large $H_2(z)$}}\psfrag{W(-0n,N)}{{\small $W_N^{0n}$}}\psfrag{W(-1n,N)}{{\small $W_N^{-1n}$}}\psfrag{W(-2n,N)}{{\small $W_N^{-2n}$}}\psfrag{W(0n,N)}{{\small $W_N^{0n}$}}\psfrag{W(1n,N)}{{\small $W_N^{1n}$}}\psfrag{W(2n,N)}{{\small $W_N^{2n}$}}\psfrag{STFT}{}\begin{figure}[htbp]
\caption{Portnoff analysis bank for $N=3$.}

Suppose the analysis window $ w$ (flip of the baseband-filter impulse response $ h_0$ ) is length $ L>N$ . Then in the context of overlap-add processors (Chapter 8), $ w$ is a Portnoff window, and implementing the window with a length $ N$ FFT requires that the windowed data frame be time-aliased down to length $ N$ prior to taking a length $ N$ FFT (see §9.7). We can obtain this same result via polyphase analysis, as elaborated in the next section.

Polyphase Analysis of Portnoff STFT

Consider the $ k$ th filter-bank channel filter

$\displaystyle H_{k}(z) \eqsp H_{0}(zW_N^k), \; k=1,\ldots,N-1\,.$ (12.96)

The impulse-response $ h_0(n)$ can be any length $ L$ sequence. Denote the $ N$ -channel polyphase components of $ H_0(z)$ by $ E_l(z)$ , $ l=0,1,\ldots,N-1$ . Then by the polyphase decomposition (§11.2.2), we have

H_{0}(z) & = & \sum_{l=0}^{N-1} z^{-l}E_l(z^{N}) \\ [5pt]
H_{k}(z) & = & \sum_{l=0}^{N-1} (zW_N^k)^{-l} E_l[(z W_N^k)^{N}] \\ [5pt]
& = & \sum_{l=0}^{N-1} z^{-l} E_l(z^{N}) W_N^{-kl}.


H_{k}(z)X(z) & = & \sum_{l=0}^{N-1} z^{-l} E_l(z^{N})X(z) W_N^{-kl} \\
H_{0}(z) \\
\ldots \\
H_{N-1}(z) \end{array} \right]
& = & \left[\begin{array}{ccc}
& & \\
& W_N^{-kl} & \\
& & \end{array} \right]
E_0(z^N) z^{-0} X(z) \\
\ldots \\
E_{N-1}(z^N) z^{-(N-1)} X(z) \end{array}

If $ H_{0}(z)$ is a good $ N$ th-band lowpass, the subband signals $ x_{k}(n)$ are bandlimited to a region of width $ 2\pi/N$ . As a result, there is negligible aliasing when we downsample each of the subbands by $ N$ . Commuting the downsamplers to get an efficient implementation gives Fig.11.29.

% latex2html id marker 30545\psfrag{X(z)}{{\large $X(z)$}}\psfrag{E0(z^3)}{{\large $E_0(z^3)$}}\psfrag{E1(z^3)}{{\large $E_1(z^3)$}}\psfrag{E2(z^3)}{{\large $E_2(z^3)$}}\psfrag{E0(z)}{{\large $E_0(z)$}}\psfrag{E1(z)}{{\large $E_1(z)$}}\psfrag{E2(z)}{{\large $E_2(z)$}}\psfrag{X0(z)}{{\large $X_0(z)$}}\psfrag{X1(z)}{{\large $X_1(z)$}}\psfrag{X2(z)}{{\large $X_2(z)$}}\begin{figure}[htbp]
\caption{Polyphase implementation of
Portnoff STFT filter bank for $N=3$.}

First note that if $ E_k(z) = 1$ for all $ k$ , the system of Fig.11.29 reduces to a rectangularly windowed STFT in which the window length $ M$ equals the DFT length $ N=3$ . The downsamplers ``hold off'' the DFT until the length 3 delay line fills with new input samples, then it ``fires'' to produce a spectral frame. A new spectral frame is produced after every third sample of input data is received.

In the more general case in which $ E_k(z)$ are nontrivial filters, such as $ E_k(z)=1+z^{-1}$ , for example, they can be seen to compute the equivalent of a time aliased windowed input frame, such as $ x(n) + x(n-N)$ . This follows because the filters operate on the downsampled input stream, so that the filter coefficients operate on signal samples separated by $ N$ samples. The linear combination of these samples by the filter implements the time-aliased windowed data frame in a Portnoff-windowed overlap-add STFT. Taken together, the polyphase filters $ E_k(z)$ compute the appropriately time-aliased data frame windowed by $ w=\hbox{\sc Flip}(h_0)\;\leftrightarrow\;H_{0}(z^{-1})$ .

In the overlap-add interpretation of Fig.11.29, the window is hopped by $ N=3$ samples. While this was the entire window length in the rectangular window case ($ E_k(z) = 1$ ), it is only a portion of the effective frame length $ L$ when the analysis filters have order 1 or greater.

MPEG Filter Banks

This section provides some highlights of the history of filter banks used for perceptual audio coding (MPEG audio). For a more complete introduction and discussion of MPEG filter banks, see, e.g., [16,273].

Pseudo-QMF Cosine Modulation Filter Bank

Section 11.3.5 introduced two-channel quadrature mirror filter banks (QMF). QMFs were shown to provide a particular class of perfect reconstruction filter banks. We found, however, that the quadrature mirror constraint on the analysis filters,

$\displaystyle H_1(z) \eqsp H_0(-z),$ (12.97)

was rather severe in that linear-phase FIR implementations only exist in the two-tap case $ H_k(z) = h_{0k}+h_{1k}z^{-1}$ , $ k=0,1$ . In addition to relaxing this constraint, we need to be able to design an $ N$ -channel filter bank for any $ N$ .

The Pseudo-QMF (PQMF) filter bank is a ``near perfect reconstruction'' filter bank in which aliasing cancellation occurs only between adjacent bands [194,287]. The PQMF filters commonly used in perceptual audio coders employ bandpass filters with stop-band attenuation near $ 96$ dB, so the neglected bands (which alias freely) are not significant. An outline of the design procedure is as follows:

  1. Design a lowpass prototype window, $ h(n)$ , with length $ M=LN$ , $ L,M,N \in {\bf Z}.$
  2. The lowpass design is constrained to give aliasing cancellation in neighboring subbands:

\vert H(e^{j\omega})\vert^2 + \vert H(e^{j(\pi/N)-\omega})\vert^2 &=& 2, \hspace{.5cm}0 < \vert\omega\vert <
\pi/{2N} \\
\vert H(e^{j\omega})\vert^2 &=& 0, \hspace{.5cm}\vert w\vert > \pi/N

  3. The filter bank analysis filters $ h_k(n)$ are cosine modulations of $ h(n)$ :

    $\displaystyle h_k(n) \eqsp h(n)\hbox{cos}\left[\left(k+\frac{1}{2}\right)\left(n-\frac{M-1}{2}\right)\frac{\pi}{N} + \phi_k\right],$ (12.98)

    $ k=0,\ldots,N-1$ , where the phases are restricted according to

    $\displaystyle \phi_{k+1} - \phi_k \eqsp (2r+1)\frac{\pi}{2}$ (12.99)

    again for aliasing cancellation.
  4. Since it is an orthogonal filter bank by construction, the synthesis filters are simply the time-reverse of the analysis filters:

    $\displaystyle f_k(n) \eqsp h_k(M-1-n)$ (12.100)

This PQMF filter bank is reportedly used in MPEG audio, layers I and II with $ N=32$ bands and $ M=512$ taps ($ L=8$ ).

Perfect Reconstruction Cosine Modulated Filter Banks

By changing the phases $ \phi_k$ , the pseudo-QMF filter bank can yield perfect reconstruction:

$\displaystyle \phi_k \eqsp \left(k+\frac{1}{2}\right)\left(L+1\right)\frac{\pi}{2}$ (12.101)

where $ L$ is the length of the polyphase filter ($ M=LN$ ).

If $ M=2N$ , then this is the oddly stacked Princen-Bradley filter bank and the analysis filters are related by cosine modulations of the lowpass prototype:

$\displaystyle f_k(n) \eqsp h(n)\hbox{cos}\left[\left(n+\frac{N+1}{2}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{N}\right],\quad k=0,\ldots,N-1$ (12.102)

However, the length of the filters $ M$ can be any even multiple of $ N$ :

$\displaystyle M\eqsp LN, \quad (L/2) \in \cal{Z}$ (12.103)

The parameter $ L$ is called the overlapping factor. These filter banks are also referred to as extended lapped transforms, when $ K \ge 2$ [159].

MPEG Layer III Filter Bank

MPEG 1 and 2, Layer III is popularly known as ``MP3 format.'' The original MPEG 1 and 2, Layers I and II, based on the MUSICAM coder, contained only 32 subbands, each band approximately 650 Hz wide, implemented using a length 512 lowpass-prototype window, lapped (``time aliased'') by factor of 512/32 = 16, thus yielding 32 real bands with 96 dB of stop-band rejection, and having a hop size of 32 samples [149, §4.1.1]. It was found, however, that a higher coding gain was obtained using a finer frequency resolution. As a result, the MPEG 1&2 Layer III coder (based on the ASPEC coder from AT&T), appended a Princen-Bradley filter bank [214] having 6 to 18 subbands to the output of each subband of the 32-channel PQMF cosine-modulated analysis filter bank [149, § 4.1.2]. The number of sub-bands and window shape were chosen to be signal-dependent as follows:

  • Transients use $ 32\times 6=192$ subbands, corresponding to relatively high time resolution and low frequency resolution.
  • Steady-state tones use $ 32\times 18=576$ subbands, corresponding to higher frequency resolution and lower time resolution relative to transients.12.3
  • The encoder generates a function called the perceptual entropy (PE) which tells the coder when to switch resolutions.

The MPEG AAC coder is often regarded as providing nearly twice the compression ratio of ``MP3'' (MPEG 1-2 Layer III) coding at the same quality level.12.4 MPEG AAC introduced a new MDCT filter bank that adaptively switches between 128 and 1024 bands (length 256 and 2048 FFT windows, using 50% overlap) [149, §4.1.6]. The nearly doubled number of frequency bands available for coding steady-state signal intervals contributed much to the increased coding gain of AAC over MP3. The 128-1024 MDCT filter bank in AAC is also considerably simpler than the hierarchical $ 32\times 6$ -$ 18$ MP3 filter bank, without requiring the ``cross-talk aliasing reduction'' needed by the PQMF/MDCT hierarchical filter bank of MP3 [149, §4.1.6].

The MPEG-4 audio compression standard (there was no MPEG-3), included a new transform coder based on the AAC filter bank [149, §4.1.7].

See, e.g., [16,273] for much more on MPEG coders and related topics. Chapter 4 of [149] contains an excellent summary of MPEG, Sony ATRAC, and Dolby AC-n coders up to 1998.

Review of STFT Filterbanks

Let's take a look at some of the STFT processors we've seen before, now viewed as polyphase filter banks. Since they all use FFTs to perform overlap-add decompositions of spectra, they are all efficient, but most are oversampled in time and/or frequency as ``filter banks'' go. Oversampling is usually preferred outside of a compression context, and normally required when spectral modifications are to be performed. The STFT also computes a uniform filter bank, but it can be used as the basis for a variety of non-uniform filter banks, as discussed in §10.7, to give frequency resolution more like that of hearing7.3).

For each selected STFT example below, a list of filter-bank properties is listed, followed by some discussion. Most of the properties are determined by the choice of FFT window $ w$ and FFT hop size $ R$ .

STFT, Rectangular Window, No Overlap

  • Perfect reconstruction
  • Critically sampled (relies on aliasing cancellation)
  • Poor channel isolation ( $ \approx 13$ dB)
  • Not robust to filter-bank modifications
This is the main critically sampled perfect-reconstruction filter bank implemented by an STFT (other examples involve Portnoff windows, §9.7). ``Filter-bank modifications'' here means modifications introduced as time-varying complex gains applied to the filter-bank channel signals prior to remodulation and summing to reconstruct the signal (Chapter 9). In contrast to this, as discussed in Chapter 8, multiplicative spectral modifications in overlap-add systems having sufficient time-domain zero-padding yield perfect reconstruction of the filtered signal, even when their filter-bank interpretation obviously involves aliasing cancellation among channels in the frequency domain.

STFT, Rectangular Window, 50% Overlap

  • Perfect reconstruction
  • Oversampled by 2 (less reliant on aliasing cancellation)
  • Poor channel isolation ( $ \approx 13$ dB)
  • Not very robust to filter-bank modifications, but better
Reducing the hop size to half the window length greatly reduces the amount of aliasing in the filter-bank output signals8.3.1). Recall that this happens because the folding frequency due to downsampling (by the hop size) doubles to coincide with the first zero-crossing of the window transform.

STFT, Triangular Window, 50% Overlap

  • Perfect reconstruction
  • Oversampled by 2
  • Better channel isolation ( $ \approx 26$ dB)
  • Moderately robust to filter-bank modifications
This case is essentially the no-overlap rectangular-window case with the window-length doubled and the window-transform squared, as derived in Chapter 3. The squaring doubles the channel isolation in dB. To move the folding frequency out to the first zero-crossing, 75% overlap should be used ($ 4\times$ oversampling in time).

STFT, Hamming Window, 75% Overlap

  • Perfect reconstruction
  • Oversampled by 4
  • Aliasing from side lobes only
  • Good channel isolation ( $ \approx 42$ dB)
  • Moderately robust to filter-bank modifications
This can be considered a ``telephone quality'' audio filter bank. It has been used many times to analyze/model speech signals.

STFT, Kaiser Window, Beta=10, 90% Overlap

  • Approximate perfect reconstruction (side lobes controlled by $ \beta $ )
  • Oversampled by $ 10$
  • Excellent channel isolation ( $ \approx 80$ dB)
  • Very robust to filter-bank modifications
  • Aliasing from side lobes only
Because the Kaiser window transform does not have harmonic nulls that can be tuned to harmonics of the frame rate (§8.3.1), we obtain only approximate perfect reconstruction. However, the reconstruction error is entirely under our control through the choice of the Kaiser-window $ \beta $ parameter (determining its time-bandwidth product). This filter bank does not rely on aliasing cancellation (when side-lobes are negligible), so it is very robust to spectral modifications.

Sliding FFT (Maximum Overlap), Any Window, Zero-Padded by 5

This example is practical in research applications. With powerful computers and large disks, why not set the FFT hop size to $ R=1$ and avoid all aliasing entirely? In the early days of computer music, this was the normal choice in phase-vocoder analysis for additive synthesisG.10), and it is of course far more affordable now. For aggressive spectral modifications, the sliding FFT ($ R=1$ ) generally yields the best quality results. Additionally, the signal analyzed can be oversampled so that the frequency domain has a large extended area where nonlinear distortion products can ``land'' without aliasing. As an example, ``tube distortion'' simulators routinely utilize $ 8\times$ or even $ 16\times$ oversampling in the input signal prior to distortion [300].

Wavelet Filter Banks

Let's now approach filter-bank derivation from a ``Hilbert space'' (geometric) point of view. This is the most natural setting for the study of wavelet filter banks [291,287].

Geometric Signal Theory

In general, signals can be expanded as a linear combination of orthonormal basis signals $ \varphi_k $ [264]. In the discrete-time case, this can be expressed as

$\displaystyle x(n)$ $\displaystyle =$ $\displaystyle \sum_{k=-\infty}^{\infty}\left<\varphi_k ,x\right> \varphi_k (n) \protect$ (12.104)
    $\displaystyle n\in(-\infty,\infty), \quad x(n), \varphi_k (n) \in {\cal C}$  

where the coefficient of projection of $ x$ onto $ \varphi_k $ is given by

$\displaystyle \left<\varphi_k ,x\right> \; \isdef \sum_{n=-\infty}^{\infty}\varphi_k ^\ast(n) x(n) \qquad \qquad \hbox{(inner product)}$ (12.105)

and the basis signals are orthonormal:

$\displaystyle \left<\varphi_k ,\varphi_l \right> \eqsp \delta(k-l) \eqsp \left\{\begin{array}{ll} 1, & k=l \\ 0, & k\neq l \\ \end{array} \right. \qquad \hbox{(orthonormal)}$ (12.106)

The signal expansion (11.104) can be interpreted geometrically as a sum of orthogonal projections of $ x$ onto $ \{\varphi_k \}$ , as illustrated for 2D in Fig.11.30.
Figure 11.30: Orthogonal projection in 2D.

A set of signals $ \{h_k,f_k\}_{k=1}^N$ is said to be a biorthogonal basis set if any signal $ x$ can be represented as

$\displaystyle x = \sum_{k=1}^N \alpha_k\left<x,h_k\right>f_k$ (12.107)

where $ \alpha_k$ is some normalizing scalar dependent only on $ h_k$ and/or $ f_k$ . Thus, in a biorthogonal system, we project onto the signals $ h_k$ and resynthesize in terms of the basis $ f_k$ .

The following examples illustrate the Hilbert space point of view for various familiar cases of the Fourier transform and STFT. A more detailed introduction appears in Book I [264].

Natural Basis

The natural basis for a discrete-time signal $ x(n)$ is the set of shifted impulses:

$\displaystyle \varphi_k \isdefs [\ldots, 0,\underbrace{1}_{k^{\hbox{\tiny th}}},0,\ldots],$ (12.108)


$\displaystyle \varphi_k (n) \isdefs \delta(n-k)$ (12.109)

for all integers $ k$ and $ n$ . The basis set is orthonormal since $ \left<\varphi_k ,\varphi_l \right>=\delta(k-l)$ . The coefficient of projection of $ x$ onto $ \varphi_k $ is given by

$\displaystyle \left<\varphi_k ,x\right> \eqsp x(k)$ (12.110)

so that the expansion of $ x$ in terms of the natural basis is simply

$\displaystyle x \eqsp \sum_{k=-\infty}^{\infty}x(k) \varphi_k ,$ (12.111)

$\displaystyle x(n)$ $\displaystyle =$ $\displaystyle \sum_{k=-\infty}^{\infty}x(k) \varphi_k (n)$  
  $\displaystyle =$ $\displaystyle \sum_{k=-\infty}^{\infty}x(k) \delta(n-k)
\isdefs (x \ast \delta)(n).$  

This expansion was used in Book II [263] to derive the impulse-response representation of an arbitrary linear, time-invariant filter.

Normalized DFT Basis for $ {\bf C}^N$

The Normalized Discrete Fourier Transform (NDFT) (introduced in Book I [264]) projects the signal $ x$ onto $ N$ discrete-time sinusoids of length $ N$ , where the sinusoids are normalized to have unit $ L2$ norm:

$\displaystyle \varphi_k (n) \isdefs \frac{e^{j\omega_k n}}{\sqrt{N}}, \quad \omega_k \isdef k\frac{2\pi}{N},$ (12.112)

and $ n,k \in [0,N-1]$ . The coefficient of projection of $ x$ onto $ \varphi_k $ is given by
$\displaystyle \left<\varphi_k ,x\right>$ $\displaystyle \isdef$ $\displaystyle \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x(n) e^{-j\omega_k n}$  
  $\displaystyle \isdef$ $\displaystyle \hbox{NDFT}_{N,k}(x)
\isdefs \frac{1}{\sqrt{N}}\hbox{DFT}_{N,k}(x) \isdefs \frac{X(\omega_k )}{\sqrt{N}}$  

and the expansion of $ x$ in terms of the NDFT basis set is
$\displaystyle x(n)$ $\displaystyle =$ $\displaystyle \sum_{k=0}^{N-1} \left<\varphi_k ,x\right> \varphi_k (n)$  
  $\displaystyle =$ $\displaystyle \frac{1}{N} \sum_{k=0}^{N-1} X(k)e^{j\omega_k n}$  
  $\displaystyle \isdef$ $\displaystyle \hbox{DFT}_{N,n}^{-1}(X)$  

for $ n=0,1,2,\ldots,N-1$ .

Normalized Fourier Transform Basis

The Fourier transform projects a continuous-time signal $ x(t)$ onto an infinite set of continuous-time complex sinusoids $ \exp(j\omega t)$ , for $ t,\omega\in(-\infty,\infty)$ . These sinusoids all have infinite $ L2$ norm, but a simple normalization by $ 1/\sqrt{2\pi}$ can be chosen so that the inverse Fourier transform has the desired form of a superposition of projections:

$\displaystyle \varphi _\omega (t) \isdef e^{j\omega t}/\sqrt{2\pi},\quad \omega , t\in (-\infty,\infty)$ (12.113)

$\displaystyle \displaystyle
\implies\quad \left<\varphi _\omega ,x\right>$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt$  
  $\displaystyle \isdef$ $\displaystyle \hbox{FT}_\omega (x)/\sqrt{2\pi} \isdef X(\omega )/\sqrt{2\pi}$  
$\displaystyle x(t)$ $\displaystyle =$ $\displaystyle \int_{-\infty}^{\infty} \left<\varphi _\omega ,x\right> \varphi _\omega (t) d\omega$  
  $\displaystyle =$ $\displaystyle \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega )e^{j\omega t} d\omega$  
  $\displaystyle \isdef$ $\displaystyle \hbox{FT}_t^{-1}(X)$  

Normalized DTFT Basis

The Discrete Time Fourier Transform (DTFT) is similar to the Fourier transform case:

$\displaystyle \varphi _\omega (n) \isdef e^{j\omega n}/\sqrt{2\pi},\quad \omega \in (-\pi,\pi], \quad n\in (-\infty,\infty)$ (12.114)

The inner product $ \left<\varphi _\omega ,x\right>$ and reconstruction of $ x(n)$ in terms of $ \{\varphi _\omega \}$ are left as exercises.

Normalized STFT Basis

The Short Time Fourier Transform (STFT) is defined as a time-ordered sequence of DTFTs, and implemented in practice as a sequence of FFTs (see §7.1). Thus, the signal basis functions are naturally defined as the DFT-sinusoids multiplied by time-shifted windows, suitably normalized for unit $ L2$ norm:

$\displaystyle \varphi_{mk}(n) \isdef \frac{w(n-mR)e^{j\omega_k n}}{\left\Vert\,\hbox{\sc Shift}_{mR}(w) e^{j\omega_k (\cdot)}\,\right\Vert} = \frac{w(n-mR) e^{j\omega_k n}}{\sqrt{\sum_n{w^2(n)}}},$ (12.115)

$\displaystyle \omega_k = \frac{2\pi k}{N}, \quad k \in [0,N-1], \quad n\in (-\infty,\infty),\quad w(n)\in{\cal R},$ (12.116)

and $ N$ is the DFT length.

When successive windows overlap (i.e., the hop size $ R$ is less than the window length $ M$ ), the basis functions are not orgthogonal. In this case, we may say that the basis set is overcomplete.

The basis signals are orthonormal when $ R=M=N$ and the rectangular window is used ($ w=w_R$ ). That is, two rectangularly windowed DFT sinusoids are orthogonal when either the frequency bin-numbers or the time frame-numbers differ, provided that the window length $ M$ equals the number of DFT frequencies $ N$ (no zero padding). In other words, we obtain an orthogonal basis set in the STFT when the hop size, window length, and DFT length are all equal (in which case the rectangular window must be used to retain the perfect-reconstruction property). In this case, we can write

$\displaystyle \varphi_{mk}= \hbox{\sc Shift}_{mN}\left[\hbox{\sc ZeroPad}_\infty\left(\varphi_k ^{\hbox{\tiny DFT}}\right)\right],$ (12.117)


$\displaystyle \varphi_{mk}(n) = \left\{\begin{array}{ll} \frac{e^{j\omega_k n}}{\sqrt{N}}, & mN \leq n \leq (m+1)N-1 \\ [5pt] 0, & \mbox{otherwise.} \\ \end{array} \right.$ (12.118)

The coefficient of projection can be written
$\displaystyle \displaystyle
\left<\varphi_{mk},x\right>$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{N}} \sum_{n=-\infty}^{\infty}
x(n) w_R(n-mN) e^{-j\omega_k n}$  
  $\displaystyle \isdef$ $\displaystyle \frac{\hbox{STFT}_{N,m,k}(x)}{\sqrt{N}} \isdefs \frac{X_m(\omega_k )}{\sqrt{N}}$  

so that the signal expansion can be interpreted as
$\displaystyle \displaystyle
x(n)$ $\displaystyle =$ $\displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n)$  
  $\displaystyle =$ $\displaystyle \sum_{m=-\infty}^{\infty}
w_R(n-mN)\frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k )e^{j\omega_k n}$  
  $\displaystyle =$ $\displaystyle \sum_{m=-\infty}^{\infty}
\hbox{\sc Shift}_{mN,n}\left\{\hbox{\sc ZeroPad}_\infty\left[\hbox{DFT}_N^{-1}(X_m)\right]\right\}$  
  $\displaystyle \isdef$ $\displaystyle \hbox{STFT}_{N,n}^{-1}(X)$  

In the overcomplete case, we get a special case of weighted overlap-add8.6):

$\displaystyle \displaystyle
x(n)$ $\displaystyle =$ $\displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n)$  
  $\displaystyle =$ $\displaystyle \sum_{m=-\infty}^{\infty} \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k ) w(n-mN)e^{j\omega_k n}$  

Continuous Wavelet Transform

In the present (Hilbert space) setting, we can now easily define the continuous wavelet transform in terms of its signal basis set:

$\displaystyle \displaystyle
\varphi_{s\tau}(t)$ $\displaystyle \isdef$ $\displaystyle \frac{1}{\sqrt{\vert s\vert}} f^\ast\left(\frac{\tau-t}{s}\right),
\qquad \tau,s,t \in (-\infty,\infty)$  
$\displaystyle X(s,\tau)$ $\displaystyle \isdef$ $\displaystyle \frac{1}{\sqrt{\vert s\vert}} \int_{-\infty}^{\infty} x(t)
f\left(\frac{t-\tau}{s}\right) dt$  

The parameter $ s$ is called a scale parameter (analogous to frequency). The normalization by $ 1/\sqrt{\vert s\vert}$ maintains energy invariance as a function of scale. We call $ X(s,\tau)$ the wavelet coefficient at scale $ s$ and time $ \tau$ . The kernel of the wavelet transform $ f(t)$ is called the mother wavelet, and it typically has a bandpass spectrum. A qualitative example is shown in Fig.11.31.

Figure: Typical qualitative appearance of first three wavelets when the scale parameter is $ s=2$ .

The so-called admissibility condition for a mother wavelet $ \psi(t)$ is

$\displaystyle C_\psi
= \int_{-\infty}^\infty \frac{\vert\Psi(\omega )\vert^2}{\vert\omega \vert}d\omega
< \infty .

Given sufficient decay with $ \omega$ , this reduces to $ \Psi(0)=0$ , that is, the mother wavelet must be zero-mean.

The Morlet wavelet is simply a Gaussian-windowed complex sinusoid:

$\displaystyle \psi(t)$ $\displaystyle \isdef$ $\displaystyle \frac{1}{\sqrt{2\pi}} e^{-j\omega _0 t} e^{-t^2/2}$  
$\displaystyle \;\longleftrightarrow\;\quad
\Psi(\omega )$ $\displaystyle =$ $\displaystyle e^{-(\omega -\omega _0)^2/2}$  

The scale factor is chosen so that $ \left\Vert\,\psi\,\right\Vert=1$ . The center frequency $ \omega_0$ is typically chosen so that second peak is half of first:

$\displaystyle \omega _0\eqsp \pi\sqrt{2/\hbox{ln}2} \;\approx\; 5.336$ (12.119)

In this case, we have $ \Psi(0)\approx 7\times10^{-7}\approx 0$ , which is close enough to zero-mean for most practical purposes.

Since the scale parameter of a wavelet transform is analogous to frequency in a Fourier transform, a wavelet transform display is often called a scalogram, in analogy with an STFT ``spectrogram'' (discussed in §7.2).

When the mother wavelet can be interpreted as a windowed sinusoid (such as the Morlet wavelet), the wavelet transform can be interpreted as a constant-Q Fourier transform.12.5Before the theory of wavelets, constant-Q Fourier transforms (such as obtained from a classic third-octave filter bank) were not easy to invert, because the basis signals were not orthogonal. See Appendix E for related discussion.

Discrete Wavelet Transform

The discrete wavelet transform is a discrete-time, discrete-frequency counterpart of the continuous wavelet transform of the previous section:

$\displaystyle X(k,n)$ $\displaystyle =$ $\displaystyle s^{-k/2} \int_{-\infty}^\infty x(t) h\left(nT-a^{-k}t\right) dt$  
  $\displaystyle =$ $\displaystyle \int_{-\infty}^\infty x(t) h\left(na^k T-t\right) dt$  

where $ n$ and $ k$ range over the integers, and $ h$ is the mother wavelet, interpreted here as a (continuous) filter impulse response.

The inverse transform is, as always, the signal expansion in terms of the orthonormal basis set:

$\displaystyle x(t) = \sum_k \sum_n X(k,n) \underbrace{\varphi _{kn}(t)}_{\hbox{basis}}$ (12.120)

We can show that discrete wavelet transforms are constant-Q by defining the center frequency of the $ k$ th basis signal as the geometric mean of its bandlimits $ \omega_1$ and $ \omega _2$ , i.e.,

$\displaystyle \omega _c(k) \isdef \sqrt{\omega _1(k)\,\omega _2(k)} \eqsp \sqrt{a^k\omega _1(0)\,a^k\omega _2(0)} \eqsp a^k\omega _c(0).$ (12.121)


$\displaystyle Q(k) \isdefs \frac{\omega _c(k)}{\omega _2(k) - \omega _1(k)} \eqsp \frac{a^k\omega _c(0)}{a^k\omega _2(0) - a^k\omega _1(0)} \eqsp Q(0)$ (12.122)

which does not depend on $ k$ .

Discrete Wavelet Filterbank

In a discrete wavelet filterbank, each basis signal is interpreted as the impulse response of a bandpass filter in a constant-Q filter bank:

$\displaystyle h_k(t)$ $\displaystyle =$ $\displaystyle \frac{1}{\sqrt{a^k}}\, h\left(\frac{t}{a^k}\right), \quad a>1$  
$\displaystyle \longleftrightarrow\quad H_k(\omega )$ $\displaystyle =$ $\displaystyle \sqrt{a^k}\, H(a^k\omega )$  

Thus, the $ k$ th channel-filter $ H_k(\omega )$ is obtained by frequency-scaling (and normalizing for unit energy) the zeroth channel filter $ H_0(\omega )$ . The frequency scale-factor is of course equal to the inverse of the time-scale factor.

Recall that in the STFT, channel filter $ H_k(\omega )$ is a shift of the zeroth channel-filter $ H_0(\omega )$ (which corresponds to ``cosine modulation'' in the time domain).

As the channel-number $ k$ increases, the channel impulse response $ h_k$ lengthens by the factor $ a^k$ ., while the pass-band of its frequency-response $ H_k$ narrows by the inverse factor $ a^{-k}$ .

Figure 11.32 shows a block diagram of the discrete wavelet filter bank for $ a=2$ (the ``dyadic'' or ``octave filter-bank'' case), and Fig.11.33 shows its time-frequency tiling as compared to that of the STFT. The synthesis filters $ f_k$ may be used to make a biorthogonal filter bank. If the $ h_k$ are orthonormal, then $ f_k=h_k$ .

Figure 11.32: Dyadic Biorthogonal Wavelet Filterbank

Figure 11.33: Time-frequency tiling for the Short-Time Fourier Transform (left) and dyadic wavelet filter bank (right).

Dyadic Filter Banks

Figure 11.34: Frequency response magnitudes for a dyadic filter bank (amplitude scalings optional).

A dyadic filter bank is any octave filter bank,12.6 as illustrated qualitatively in Figure 11.34. Note that $ H_0(\omega )$ is the top-octave bandpass filter, $ H_1(w) = \sqrt{2}
H_0(2\omega )$ is the bandpass filter for next octave down, $ H_2(w) =
2H_0(4\omega )$ is the octave bandpass below that, and so on. The optional scale factors result in the same sum-of-squares for each channel-filter impulse response.

A dyadic filter bank may be derived from the discrete wavelet filter bank by setting $ a=2$ and relaxing the exact orthonormality requirement on the channel-filter impulse responses. If they do happen to be orthonormal, we may call it a dyadic wavelet filter bank.

For a dyadic filter bank, the center-frequency of the $ k$ th channel-filter impulse response can be defined as

$\displaystyle \omega _c(k) \eqsp \sqrt{\omega _0(\omega _0+\hbox{bandwidth})} \eqsp \sqrt{2}\omega _0$ (12.123)

so that

$\displaystyle Q \eqsp \frac{\sqrt{2}\omega _0}{2\omega _0 - \omega _0} \eqsp \sqrt{2}.$ (12.124)

Thus, a dyadic filter bank is a special case of a constant-Q filter bank for which the $ Q$ is $ \sqrt{2}$ .

Dyadic Filter Bank Design

Design of dyadic filter banks using the window method for FIR digital filter design (introduced in §4.5) is described in, e.g., [226, §6.2.3b].

A ``very easy'' method suggested in [287, §11.6] is to design a two-channel paraunitary QMF bank, and repeat recursively to split the lower-half of the spectrum down to some desired depth.

Generalized STFT

A generalized STFT may be defined by [287]

$\displaystyle x_k(n)$ $\displaystyle =$ $\displaystyle (x\ast h_k)(nR_k) \eqsp \sum_{m=-\infty}^{\infty}x(m) \underbrace{h_k(nR_k - m)}_{\hbox{analysis filter}}$  
$\displaystyle x(n)$ $\displaystyle =$ $\displaystyle \sum_k (x_k\ast f_k)(n) \eqsp \sum_{k=0}^{N-1}\sum_{m=-\infty}^{\infty}x_k(m)
\underbrace{f_k(n-m R_k)}_{\hbox{synthesis filter}}$  

This filter bank and its reconstruction are diagrammed in Fig.11.35.

Figure 11.35: Generalized STFT

The analysis filter $ h_k$ is typically complex bandpass (as in the STFT case). The integers $ R_k$ give the downsampling factor for the output of the $ k$ th channel filter: For critical sampling without aliasing, we set $ R_k= \pi/\hbox{Width}(H_k)$ . The impulse response of synthesis filter $ f_k$ can be regarded as the $ k$ th basis signal in the reconstruction. If the $ \{f_k\}$ are orthonormal, then we have $ f_k(n) = h_k^\ast(-n)$ . More generally, $ \{h_k,f_k\}$ form a biorthogonal basis.

Further Reading

In addition to [287] which was frequently cited in this chapter, [291], [160], [16], [273] are excellent references (books). The original paper by Princen and Bradley on time-domain aliasing cancellation is [214]. The Princen-Bradley filter bank is a special case of Lapped Orthogonal Transforms (LOT) [160] for which the overlap is 50%. It is also an orthogonal perfect reconstruction filter bank with first-order FIR polyphase filters. Other papers related to perceptual audio coding include [24,284,53,161,195,17,25,18,200,16,273]. An approximately invertible constant-Q transform, with FOSS matlab code, is described in [244].


This chapter introduced various multirate filter banks. While the main purpose was to relate them to STFT processors (which the author found to be missing in the literature), it is difficult to avoid being drawn into the general topic of perfect-reconstruction filter banks!

Next Section:
Summary and Conclusions
Previous Section:
Applications of the STFT