FFT Filter Banks

Free Books Spectral Audio Signal Processing

This section, based on [265], describes how to make practical audio filter banks using the Short Time Fourier Transform (STFT). This material bridges the filter-bank interpretation of the STFT in Chapter 9 and the discussion of multirate filter banks in Chapter 11. The filter banks of this section are based entirely on the STFT, with consideration of the basic Fourier theorems introduced in Chapter 2. In Chapter 11, filter banks such as used in audio compression are addressed. However, when not doing compression, i.e., when each channel of the filter bank does not have to be critically sampled, the methods of this section may suffice in many applications.

Audio Filter Banks

It is well known that the frequency resolution of human hearing decreases with frequency [71,276]. As a result, any ``auditory filter bank'' must be a nonuniform filter bank in which the channel bandwidths increase with frequency over most of the spectrum. A classic approximate example is the third-octave filter bank.^11.14 A simpler (cruder) approximation is the octave filter bank,^11.15 also called a dyadic filter bank when implemented using a binary tree structure [287]. Both are examples of constant-Q filter banks [29,30,244], in which the bandwidth of each filter-bank channel is proportional to center frequency [263]. Approximate auditory filter banks, such as constant-Q filter banks, have extensive applications in computer music, audio engineering, and basic hearing research.

If the output signals from all channels of a constant-Q filter bank are all sampled at a particular time, we obtain what may be called a constant-Q transform [29]. A constant-Q transform can be efficiently implemented by smoothing the output of a Fast Fourier Transform (FFT) [30]. More generally, a multiresolution spectrogram can be implemented by combining FFTs of different lengths and advancing the FFTs forward through time (§7.3). Such nonuniform filter banks can also be implemented based on the Goetzel algorithm [33].

While the topic of filter banks is well developed in the literature, including constant-Q, nonuniform FFT-based, and wavelet filter banks, the simple, robust methods presented in this section appear to be new [265]. In particular, classic nonuniform FFT filter banks as described in [226] have not offered the perfect reconstruction property [287] in which the filter-bank sum yields the input signal exactly (to within a delay and/or scale factor) when the filter-band signals are not modified. The voluminous literature on perfect-reconstruction filter banks [287] addresses nonuniform filter banks, such as dyadic filter banks designed based on pseudo quadrature mirror filter designs, but simpler STFT methods do not yet appear to be incorporated. In the cosine-modulated filter-bank domain, subband DCTs have been used in a related way [302], but apparently without consideration for the possibility of a common time domain across multiple channels.^11.16

This section can be viewed as an extension of [30] to the FFT filter-bank case. Alternatively, it may be viewed as a novel method for nonuniform FIR filter-bank design and implementation, based on STFT methodology, with arbitrarily accurate reconstruction and controlled aliasing in the downsampled case. While we consider only auditory (approximately constant-Q) filter banks, the method works equally well for arbitrary nonuniform spectral partitions and overlap-add decompositions in the frequency domain.

Basic Idea

The basic idea is to partition FFT bins into the desired nonuniform bands, and perform smaller inverse FFTs on each subband to synthesize downsampled time-domain signals in each band. A simple example for a length 8 FFT octave filter bank is shown in Fig.10.30. We'll next extend this example to a practically useful level via a series of tutorially oriented matlab examples.

**Figure:** FFT implementation of one frame of the simple octave filter bank shown in Fig.10.31. Successive frames are non-overlapping (rectangular window advances its full length each frame).
$\includegraphics[width=0.8\twidth]{eps/fft8}$

Summing STFT Bins

In the Short-Time Fourier Transform, which implements a uniform FIR filter bank (Chapter 9), each FFT bin can be regarded as one sample of the filter-bank output in one channel. It is elementary that summing adjacent filter-bank signals sums the corresponding pass-bands to create a wider pass-band. Summing adjacent FFT bins in the STFT, therefore, synthesizes one sample from a wider pass-band implemented using an FFT. This is essentially how a constant-Q transform is created from an FFT in [30] (using a different frequency-weighting, or ``smoothing kernel''). However, when making a filter bank, as opposed to only a transform used for spectrographic purposes, we must be able to step the FFT through time and compute properly sampled time-domain filter-bank signals.

The wider pass-band created by adjacent-channel summing requires a higher sampling rate in the time domain to avoid aliasing. As a result, the maximum STFT ``hop size'' is limited by the widest pass-band in the filter bank. For audio filter banks, low-frequency channels have narrow bandwidths, while high-frequency channels are wider, thereby forcing a smaller hop size for the STFT. This means that the low-frequency channels are heavily oversampled when the high-frequency channels are merely adequately sampled (in time) [30,88]. In an octave filter-bank, for example, the top octave, occupying the entire upper half of the spectrum, requires a time-domain step-size of no more than two samples, if aliasing of the band is to be avoided. Each octave down is then oversampled (in time) by an additional factor of 2.

Inverse Transforming STFT Bin Groups

The solution proposed in [265] is to compute multiple time samples for each high-frequency channel, so that one hop of the FFT produces all needed samples in each band. That way all channels can use the same hop-size without redundancy. If the narrowest band gets one sample per hop, then a band times as wide must produce samples per hop.

A fast way to compute multiple time samples from the frequency-samples (FFT bins) of a given band is an inverse FFT, as shown in Fig.10.30. In matlab notation, let X(1:N) denote the FFT (length N) of the current frame of data in the STFT, and denote the lower and upper spectral samples for band k by lsn(k) and hsn(k), respectively. Then we may specify the matlab computation of the full-rate time-domain signal corresponding to band k as follows:

BandK = X(lsn(k):hsn(k));
z1 = zeros(1,lsn(k)-1);
z2 = zeros(1,N-hsn(k));
BandKzp = [z1, BandK, z2];     % (1)
x(k,:) = ifft(BandKzp);

where x(k,:) denotes the output signal vector (length N) for the kth filter-bank channel for the current time-domain STFT window.

Let $\texttt{Nk}= \texttt{hsn(k)} - \texttt{lsn(k)} + 1$ denote the number of FFT bins in band . When $\texttt{Nk}$ is a power of 2, we can apply an inverse FFT only to the nonzero samples in the band:

xd{k} = ifft(BandK);

where xd{k} now denotes a cell array for holding the ``downsampled'' signal vectors (since the downsampling factor is typically different for different bands).

We may relate x(k,:) and xd{k} by noting that, when Lk = N/Nk is an integer, we have that the relation

BandK == alias(BandKzp,Lk)     % (2)

is true, for each element, where alias(x,K) denotes aliasing of K equal partitions of the vector x (§2.3.12):^11.17

function y = alias(x,L)
  Nx = length(x);
  Np = Nx/L; % aliasing-partition length
  x = x(:);  % ensure column vector
  y = zeros(Np,1);
  for i=1:L
    y = y + x(1+(i-1)*Np : i*Np);
  end

By the aliasing theorem (downsampling theorem) for Discrete Fourier Transforms (DFTs) (§2.3.12), the relation (2) in the frequency domain corresponds to

xd{k} == Lk * x(k,1:Lk:end)

in the time domain, i.e., xd{k} is obtained from x(k,:) by means of downsampling by the factor Lk. This produces N/Lk == Nk samples. That is, for a band that is Nk bins wide, we obtain Nk time-domain samples for each STFT frame, when critically sampled. (At the full rate, we obtain N samples from each channel for each frame.)

We see that taking an inverse FFT of only the bins in a given channel computes the critically downsampled signal for that channel. Downsampling factors between 1 and Lk can be implemented by choosing a zero-padding factor for the band somewhere between 1 and Lk samples.

Note that this filter bank has the perfect reconstruction (PR) property [287]. That is, the original input signal can be exactly reconstructed (to within a delay and possible scale factor) from the channel signals. The PR property follows immediately from the exact invertibility of the discrete Fourier transform. Specifically, we can recover the original signal frame by taking an FFT of each channel-signal frame and abutting the spectral bins so computed to form the original frame spectrum. Of course, the underlying STFT (uniform filter bank) must be PR as well, as it routinely is in practice.

Improving the Channel Filters

Recall that each FFT bin can be viewed as a sample from a bandpass filter whose frequency response is a frequency-shift of the FFT-window Fourier transform (§9.3). Therefore, the frequency response of a channel filter obtained by summing Nk adjacent FFT bins is given by the sum of Nk window transforms, one for each FFT bin in the sum. As a result, the stop-band of the channel-filter frequency response is a sum of Nk window side lobes, and by controlling window side-lobe level, we may control the stop-band gain of the channel filters.

The transition width from pass-band to stop-band, on the other hand, is given by the main-lobe width of the window transform (§5.5.1). In the previous subsection, by zero-padding the band (line (1) above), we implicitly assumed a transition width of one bin. Only the length N rectangular window can be reasonably said to have a one-bin transition from pass-band to stop-band. Since the first side lobe of a rectangular window transform is only about 13 dB below the main lobe, the rectangular window gives poor stop-band performance, as illustrated in Fig.10.33. Moreover, we often need FFT data windows to be shorter than the FFT size N (i.e., we often need zero-padding in the time domain) so that the frame spectrum will be oversampled, enabling various spectral processing such as linear filtering (Chapter 8).

One might wonder how the length N rectangular window can be all that bad when it gives the perfect reconstruction property, as demonstrated in the previous subsection. The answer is that there is a lot of aliasing in the channel signals, when downsampled, but this aliasing is exactly canceled in the reconstruction, provided the channel signals were not modified in any way.

Going back to §10.7.3, we need to replace the zero-padded band (1) by a proper filtering operation in the frequency domain (a ``spectral window''):

BandK2 = Hk .* X;
x(k,:) = ifft(BandK2); % full rate
BandK2a = alias(BandK2,Nk);
xd{k} = ifft(BandK2a); % crit samp

where the channel filter frequency response Hk may be prepared in advance as the appropriate weighted sum of FFT-window transforms:

Hideal = [z1,ones(1,Nk),z2];
Hk = cconvr(W,Hideal); % circ. conv.

where z1 and z2 are the same zero vectors defined in §10.7.3, and cconvr(W,H) denotes the circular convolution of two real vectors having the same length [264]:

function [Y] = cconvr(W,X)
  wc=fft(W);  xc=fft(X);
  yc = wc .* xc;
  Y = real(ifft(yc));

Note that in this more practical case, the perfect reconstruction property no longer holds, since the operation

BandK2a = alias(Hk .* X, Nk);

is not exactly invertible in general.^11.18However, we may approach perfect reconstruction arbitrarily closely by only aliasing stop-band intervals onto the pass-band, and by increasing the stop-band attenuation of Hk as desired. In contrast to the PR case, we do not rely on aliasing cancellation, which is valuable when the channel signals are to be modified.

The band filters Hk can be said to have been designed by the window method for FIR filter design [224]. (See functions fir1 and fir2 in Octave and/or the Matlab Signal Processing Toolbox.)

Fast Octave Filter Banks

Let's now apply the above technique to the design of an octave filter bank.^11.19 At first sight, this appears to be a natural fit, because it is immediately easy to partition a power of 2 (typical for the FFT size N) into octaves, each having a width in bins that is a power of 2. For example, when N == 8, we have the following stack of frequency-response vectors for the rectangular-window, no-zero-padding, complex-signal case:

H = [ ...
   0   0   0   0   1   1   1   1 ; ...
   0   0   1   1   0   0   0   0 ; ...
   0   1   0   0   0   0   0   0 ; ...
   1   0   0   0   0   0   0   0 ];

Figure 10.31 depicts the resulting filter bank schematically in the frequency domain.^11.20 Thus, H(1,:) is the frequency-response for the top (first) octave, H(2,:) is the frequency-response for the next-to-top (second) octave, H(3,:) is the next octave down, and H(4,:) is the ``remainder'' of the spectrum. (In every octave filter bank, there is a final low-frequency band.

**Figure 10.31:** Simple octave filter bank for complex signals.
$\includegraphics[width=0.8\twidth]{eps/simple-partition}$

A schematic of the filter-bank implementation using FFTs is shown in Fig.10.30. The division by N called for by ifft(N) is often spread out among the ifft butterfly stages as divisions by 2 (one-bit right-shifts in fixed-point arithmetic) after each of them. For conservation of dynamic range, half of the right-shifts can be spread out among every other forward-FFT butterfly stage [28] as well.

The simple spectral partitioning of Fig.10.31 is appropriate for complex signals--those for which the spectrum is regarded as spanning 0 Hz to the sampling rate. For real signals, we need a spectral partition more like that in Fig.10.32.

**Figure 10.32:** Simple octave filter bank for real signals.
$\includegraphics[width=0.8\twidth]{eps/simple-real-partition}$

Unfortunately, the number of spectral samples in Fig.10.32 is 14--not a power of 2. (The previous length 8 complex case maps to length 14 real case because the dc and Nyquist-limit samples do not have complex-conjugate counterparts.) Discarding the sample at the Nyquist limit--number 8 in Fig.10.32--does not help, since that gives 13 samples--still not a power of 2. In summary, there is no obvious way to octave-partition the spectral samples of a real signal while maintaining the power-of-2 condition for each band and symmetrically partitioning positive and negative frequencies.

A real signal can of course be converted to its corresponding ``analytic signal'' by filtering out the negative-frequency components. This is normally done by designing a Hilbert transform filter [133,224]. However, such filters are large-order FIR filters, exactly like we are trying to design! If we design our filter bank properly, we can use it to zero the negative-frequency components.

Spectral Rotation of Real Signals

Note that if we rotate the spectrum of a real signal by half a bin, we obtain positive-frequency samples and negative-frequency samples, with no sample at dc or at the Nyquist limit. This is typically desirable for audio signals because dc is inaudible and the Nyquist limit is a degenerate point of the spectrum that, for example, cannot have a phase other than 0 or $\pi$ . If is a power of 2, then so is , and the octave-band partitioning of the previous subsection can be applied separately to each half of the spectrum:

N = 32;  x = randn(1,N); % Specific example
LN = round(log2(N)); % number of octave bands
shifter = exp(-j*pi*[0:N-1]/N); % half-bin
xs = x .* shifter; % apply spectral shift
X = fft(xs,N); % project xs onto rotated basis
XP = X(1:N/2); % positive-frequency components
XN = X(N:-1:N/2+1); % neg.-frequency components
YP = dcells(XP); % partition to octave bands
YN = dcells(XN); % ditto for neg. frequencies
YPe = dcells2spec(YP); % unpack "dyadic cells"
YNe = dcells2spec(YN); % unpack neg. freqs
YNeflr = fliplr(YNe);  % undo former flip
ys = ifft([YPe,YNeflr],N,2);
y = real(ones(LN,1)*conj(shifter) .* ys);
% = octave filter-bank signals (real)
yr = sum(y); % filter-bank sum (should equal x)
yrerr = x - yr;
disp(sprintf(...
  'Total filter-bank sum L2 error = %0.2f %%',...
   100*norm(yrerr)/norm(x)));

In the above listing, the function dcells^11.21simply forms a cell array in matlab containing the spectral partition (``dyadic cells''). The function dcells2spec^11.22is the inverse of dcells, taking a spectral partition and unpacking it to produce a usual spectrum vector.

Improving the Octave Band Filters

To see the true filter-bank frequency response corresponding to Fig.10.31, we may feed an impulse to the filter bank and take a long FFT (for zero padding) of the channel signals to produce the interpolated response shown in Fig.10.33.

**Figure:** Channel-frequency-response overlay for the octave filter bank shown in Fig.10.31.
$\includegraphics[width=0.8\twidth]{eps/simple-partition-interp}$

The horizontal line along 0 dB in Fig.10.33 was obtained by summing the channel responses, indicating that it is a perfect-reconstruction filter-bank, as expected. However, the stop-band performance of the channels is quite poor, being comparable to the side-lobes of a rectangular window transform (an aliased sinc function). In fact, the stop-band is identical to the rectangular-window side-lobes for the two lowest bands. Notice that the original eight samples of Fig.10.31 still lie along the 0-dB line, and there are still zeros in each channel response beneath the unity-gain point of every other channel's response, so Fig.10.31 can be obtained by sampling Fig.10.33 at those eight points. However, the interpolated response shows that the filter bank is quite poor by audio standards.

**Figure 10.34:** Channel-frequency-response overlay for a complex-signal octave filter bank, designed using a length 127 Dolph-Chebyshev window (80 dB stop-band attenuation) and length 256 FFT.
$\includegraphics[width=0.8\twidth]{eps/impulse-cheb127h-rect129x-N256-partition-interp}$

Figure 10.34 shows an octave filter bank (again for complex signals) that is better designed for audio usage. Instead of basing the channel filter prototype on the rectangular window, a Dolph-Chebyshev window was used (using the matlab function call chebwin(127,80)). The FFT size is about twice the filter length, thereby allowing for a data frame of equal length (to the filter) without incurring time aliasing, in the usual way for STFTs (Chapter 8). The data window is chosen to overlap-add to a constant, as typical in STFTs, so its choice does not affect the quality of the filter-bank output channel signals. We therefore may continue to use a rectangular data window that advances by its full length each frame. Choosing an odd filter length facilitates zero-phase offline STFT processing, in which the middle FIR sample is treated as occurring at time zero, so that there is no delay in any filter-bank channel [264].

The filter bank is PR in the full-rate case because the underlying STFT is PR, in the absence of modifications, and because the channel filter-bank is constant-overlap-add in the frequency domain (again in the absence of modifications) according to STFT theory (Chapter 8).

Aliasing on Downsampling

While the filter bank of Fig.10.34 gives good stop-band rejection, there is still a significant amount of aliasing when the bands are critically sampled. This happens because the transition bands are aliased about their midpoints. This can be seen in Fig.10.34 by noting that aliasing ``folding frequencies'' lie at the crossover point between each pair of bands. An overlay of the spectra of the downsampled filter-bank outputs, for an impulse input, is shown in Fig.10.35.

**Figure:** Same as Fig.10.34 obtained by critically downsampling each channel signal, zero-padding, and performing an FFT. All the observable stop-band error happens to cancel out in the filter-bank sum because the input signal is an impulse, in which case the reconstruction remains exact.
$\includegraphics[width=0.8\twidth]{eps/impulse-cheb127h-rect129x-N256-aliased-partition-interp}$

Figure 10.36 shows the aliased spectral signal bands (prior to inverse STFT) for a step input (same filter bank). (This type of plot looks ideal for an impulse input signal because the spectrum is constant, so the aliased bands are also constant.) Note the large slice of dc energy that has aliased from near the sampling rate to near half the sampling rate in the top octave band. The signal and error spectra are shown overlaid in Fig.10.37. In this case, the aliasing causes significant error in the reconstruction.

**Figure:** Same filter bank as in Fig.10.35 but driven by a step input.
$\includegraphics[width=0.8\twidth]{eps/step-cheb127h-rect129x-N256-aliased-unpacked}$

**Figure:** Signal spectrum (an impulse, since the time signal is a step) and error spectrum for the case of Fig.10.36. Note the large error near half the sampling rate.
$\includegraphics[width=0.8\twidth]{eps/step-cheb127h-rect129x-N256-aliased-error-interp}$

Restricting Aliasing to Stop-Bands

To eliminate the relatively heavy transition-band aliasing (when critically sampling the channel signals), we can define overlapping bands such that each band encompasses the transition bands on either side. However, unless a full $2\times\null$ oversampling is provided for each band (which is one easy solution), the bandwidth (in bins) is no longer a power of two, thereby thwarting use of radix-2 inverse-FFTs to compute the time-domain band signals.

To keep the channel bandwidths at powers of two while restricting aliasing to stop-band energy, the IFFT bands can be widened to include transition bands on either side. That is, the desired pass-band plus the two transition bands span a power-of-two bins. This results in overlapping channel IFFTs. Figure 10.38 shows how the example of Fig.10.34 is modified by this strategy.

**Figure 10.38:** Channel-frequency-response overlay for an octave filter bank designed using a length 127 Dolph-Chebyshev window (80 dB SBA) and length 256 FFT size.
$\includegraphics[width=0.8\twidth]{eps/impulse-cheb127h-rect128x-N256-qdcells}$

The basic principle of filter-bank band allocation is to enclose each filter band plus its transition bands within a wider band that is a power-of-two bins wide.^11.23 The band should roll off to reach its stop-band at the edge of the wider encompassing band. It is fine to have extra space in the wider band, and this may be filled with a continuation of the enclosed band's stop-band response (or some tapering of it--since we assume stop-band energy is negligible, the difference should be inconsequential). The desired bands may overlap each other by any amount, and may have any desired shape. The encompassing bands then overlap further to reach the next power of two (in width) for each overlapping extended band. (See the gammatone and gammachirp filter banks for examples of heavily overlapping bands in an audio filter bank [111].)

In this approach, pass-bands of arbitrary width are embedded in overlapping IFFT bands that are a power-of-2 wide. As a result of this flexibility, the frequency-rotation trick of §10.7.7 is no longer needed for real filter banks. Instead, we simply allocate any desired bands between dc and half the sampling rate, and then conjugate-symmetry dictates the rest. In addition to a left-over ``dc-Nyquist'' band, there is a similar residual ``Nyquist-limit'' band (a typically negligible band about half the sampling rate). In other words, since the pass-bands may be any width and the encompassing IFFT bands may overlap by any amount, they do not have to ``pack'' conveniently as power-of-two blocks.

The minimum channel bandwidth is defined as two transition bands plus one bin (i.e., the minimum pass-band width is zero, corresponding to one bin, or one spectral sample). For the Dolph-Chebyshev window, the transition bandwidth is known in closed form [155]. In our examples, we have a length 127 window with 80 dB stop-band attenuation in the lowpass prototype [chebwin(127,80)], corresponding to a transition width of 6.35 bins in a length 256 FFT, which was rounded up to 7 bins in software for simplicity of band allocation. Therefore, our minimum channel bandwidth is 15 bins (two transition bands plus one sample for the band center). The next highest power of two is 16, so that is our minimum encompassing IFFT length for any band.

The dc and Nyquist channels are combined into a single channel containing the left-over residual filter-bank response consisting of a low transition down from dc and a high-frequency transition up to the sampling rate (in the complex-signal case). When N is sufficiently large so that these bands contain no audible energy, they may be discarded. We include them in all examples here so as to preserve the (near) perfect reconstruction property of the filter bank. Thus, the 7-bin dc channel is combined with the 7-bin Nyquist channel to form a single 16-bin encompassing residual band that may be discarded in many audio applications (when the initial FFT size is sufficiently large for the sampling rate used).

In the example of Fig.10.38, the initial FFT size is 256, and the channel bandwidths (pass-bands only, excluding transitions), from top to bottom, are 121, 64, 32, 16, and 8 bins. The top band is reduced by 7 bins to leave a transition band to the sampling rate. Similarly, the lowest band lies above a transition band consisting of bins 0-6. The encompassing IFFTs (containing transitions) are lengths 256, 128, 64, 32, 32, for the interior bands, and a length 32 IFFT handles the dc and Nyquist bands (which are combined into a single 14-bin band about dc, which occupies 28 bins when the transition bands are appended). Letting [lo,hi] denote a band by its lower and upper bin limit, the non-overlapping adjacent pass-band edges in ``spectral samples''^11.24 of the interior bands are [8, 15], [16, 31], [32, 63], [64, 127], and [128, 248]; the overlapping encompassing IFFT band edges are then [1, 32], [9, 40], [25, 88], [57, 184], [1, 256], i.e., they each contain a pass-band and two transition bands, and have a power-of-2 length. The downsampling factor for each channel can be computed as the initial FFT size (256) divided by the IFFT size ( , , , , or ) for the channel.

Figure 10.39 shows the counterpart of Fig.10.35 for this example. In this case, the aliased signal energy comes only from channel-filter stop-bands. For narrow bands, the aliasing is suppressed by at least 80 dB (the side-lobe level of the chosen Dolph-Chebyshev window transform). For bands significantly wider than one bin (the minimum bandwidth in this example is the dc-Nyquist band at 14 bins), the stop-band consists of a sum of shifts of the window-transform side lobes, and these are found to be more than 80 dB down due to cancellation (more than 90 dB down in most bands of this example).

**Figure:** Same example as in Fig.10.38 but looking at the effects of aliasing due to channel-signal downsampling. Compare to Fig.10.35.
$\includegraphics[width=0.8\twidth]{eps/impulse-cheb127h-rect128x-N256-aliased-qdcells}$

Tightening the IFFTs

In this example the top band is not downsampled at all, and the interior bands are oversampled by approximately 2. This is because the desired pass-band widths started out at a power of 2, so that the addition of transition bands forced the next higher power of 2 for the IFFT size. Narrowing the width of the top band from 121 bins to $128-2\cdot 7 = 114$ bins would enable use of a length 128 IFFT for the top band, and similarly for the lower bands. In other words, when the desired spectral partition is that of an ideal octave filter bank, as sketched in Fig.10.31, narrowing each octave-band by twice the transition width of the lowpass prototype filter (and ``covering down'' to keep them adjacent) will produce a relatively ``tight'' FFT filterbank design in which the IFFT sizes remain the same length as in the heavily aliased case discussed above (Fig.10.35). When applied to the octave filter bank, the pass-bands become a little narrower than one octave. We may call this a quasi octave filter bank.

Real Filter Bank Example

Finally, Fig.10.40 shows the appearance of the octave filter bank for real signals. In this case, bands are constructed between dc and half the sampling rate, and conjugate symmetry is used to automatically construct the desired bands between half the sampling rate and the sampling rate. The band allocation algorithm therefore needs no modification for the real-signal case.

**Figure:** Channel-frequency-response overlay for a *real-signal* octave filter bank, designed using a length 127 Dolph-Chebyshev window and length 256 FFT size (no downsampling). Compare the complex-signal filter bank in Fig.10.38.
$\includegraphics[width=0.8\twidth]{eps/impulse-cheb127h-rect128x-N256-real-qdcells}$

Optimal Band Filters

In the filter-bank literature, one class of filter banks is called ``cosine modulated'' filter banks. DFT filter banks are similar. The lowpass-filter prototype in such filter banks can be used in place of the Dolph-Chebyshev window used here. Therefore, any result on optimal design of cosine-modulated filter banks can be adapted to this context. See, for example, [253,302]. Note, however, that in principle a separate optimization is needed for each different channel bandwidth. An optimal lowpass prototype only optimizes channels having a one-bin pass-band, since the prototype frequency-response is merely shifted in frequency (cosine-modulated in time) to create the channel frequency response. Wider channels are made by summing such channel responses, which alters the stop-bands.

In practice, the Dolph-Chebyshev window, used in the examples of this section, typically yields a filter bank magnitude frequency response that is optimal in the Chebyshev sense, when at least one channel is minimum width, because

there can be only one lowpass prototype filter in any modulated filter bank (such as the DFT filter bank),
the prototype itself is the optimal Chebyshev lowpass filter of minimum bandwidth, and
summing shifted copies of the prototype frequency response (to synthesize a wider pass-band) generally improves the stop-band rejection over that of the prototype, thereby meeting the Chebyshev optimality requirement for the filter-bank as a whole (keeping below the worst-case deviation of the prototype).

All channel bands, whatever their width, are constructed by some linear combination of shifted copies of the lowpass prototype frequency response. The Dolph-Chebyshev window is precisely optimal (in the Chebyshev sense) for any pass-band that is one bin wide. Summing shifts of the window transform to synthesize wider bands has been observed to invariably improve the stop-band rejection significantly. The examples shown above illustrate the margin beyond 80 dB stop-band rejection achieved for the octave filter bank.

The Dolph-Chebyshev window has faint impulsive endpoints on the order of the side-lobe level (about 50 dB down in the 80-dB-SBA examples above), and in some applications, this could be considered an undesirable time-domain characteristic. To eliminate them, an optimal Chebyshev window may be designed by means of linear programming with a time-domain monotonicity constraint (§3.13). Alternatively, of course, other windows can be used, such as the Kaiser, or three-term Blackman-Harris window, to name just two (Chapter 3).

Next Section:
FFT Filter-Bank Summary and Fourier Duality with OLA
Previous Section:
Gaussian Windowed Chirps (Chirplets)

FFT Filter Banks

Audio Filter Banks

Basic Idea

Summing STFT Bins

Inverse Transforming STFT Bin Groups

Improving the Channel Filters

Fast Octave Filter Banks

Spectral Rotation of Real Signals

Improving the Octave Band Filters

Aliasing on Downsampling

Restricting Aliasing to Stop-Bands

Tightening the IFFTs

Real Filter Bank Example

Optimal Band Filters

Sign in

About this Book

Spectral Audio Signal Processing

Blogs - Hall of Fame

Free PDF Downloads

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group