
Review of STFT Filterbanks

Let's take a look at some of the STFT processors we've seen before, now viewed as polyphase filter banks. Since they all use FFTs to perform overlap-add decompositions of spectra, they are all efficient, but most are oversampled in time and/or frequency, as ``filter banks'' go. Oversampling is usually preferred outside of a compression context, and it is normally required when spectral modifications are to be performed. The STFT computes a uniform filter bank, but it can serve as the basis for a variety of non-uniform filter banks, as discussed in §10.7, giving frequency resolution more like that of hearing (§7.3).

For each selected STFT example below, we list its main filter-bank properties and then discuss them briefly. Most of these properties are determined by the choice of FFT window $ w$ and FFT hop size $ R$ .
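The oversampling factors quoted below follow from simple bookkeeping. The Python sketch that follows assumes a window length $ M$, hop size $ R$, and FFT size $ N$ (with $ N=M$ unless zero-padding is used). It counts time oversampling as $ M/R$ and frequency oversampling as $ N/M$. Their product $ N/R$ is the overall redundancy, which equals 1 for a critically sampled filter bank. The function name `oversampling` and the window lengths used are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def oversampling(M, R, N=None):
    """Return (time, frequency, total) oversampling factors of an STFT filter bank.

    M: window length, R: hop size, N: FFT size (defaults to M, i.e., no zero-padding).
    """
    N = M if N is None else N
    time_factor = Fraction(M, R)   # frames per window length
    freq_factor = Fraction(N, M)   # zero-padding factor
    return time_factor, freq_factor, time_factor * freq_factor

# Illustrative window lengths (assumptions); M is chosen divisible by M/R.
cases = {
    "rectangular, no overlap":       oversampling(512, 512),
    "rectangular, 50% overlap":      oversampling(512, 256),
    "triangular, 50% overlap":       oversampling(512, 256),
    "Hamming, 75% overlap":          oversampling(512, 128),
    "Kaiser, 90% overlap":           oversampling(500, 50),
    "sliding FFT, zero-padded by 5": oversampling(512, 1, 5 * 512),
}

for name, (t, f, total) in cases.items():
    print(f"{name:30s} time x{t}, frequency x{f}, total x{total}")
```

Note that only the last case is oversampled in frequency. All the others use $ N=M$ and are oversampled (if at all) in time only.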

STFT, Rectangular Window, No Overlap

  • Perfect reconstruction
  • Critically sampled (relies on aliasing cancellation)
  • Poor channel isolation ( $ \approx 13$ dB)
  • Not robust to filter-bank modifications
This is the main critically sampled perfect-reconstruction filter bank implemented by an STFT (other examples involve Portnoff windows, §9.7). ``Filter-bank modifications'' here means modifications introduced as time-varying complex gains applied to the filter-bank channel signals prior to remodulation and summing to reconstruct the signal (Chapter 9). In contrast to this, as discussed in Chapter 8, multiplicative spectral modifications in overlap-add systems having sufficient time-domain zero-padding yield perfect reconstruction of the filtered signal, even when their filter-bank interpretation obviously involves aliasing cancellation among channels in the frequency domain.
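The following is a minimal Python/NumPy sketch of this critically sampled case. It assumes the signal length is a multiple of the window length $ M$. Analysis splits the input into non-overlapping frames and takes an FFT of each (rectangular window, $ R=M$). Synthesis inverse-transforms each frame and concatenates the results. The function names are hypothetical. With no modification of the channel signals, the output matches the input to rounding error, and the filter bank produces exactly as many FFT bins as there are input samples.

```python
import numpy as np

def stft_rect_no_overlap(x, M):
    """Analysis: non-overlapping length-M frames (rectangular window, hop R = M), FFT of each.
    Assumes len(x) is a multiple of M."""
    frames = x.reshape(-1, M)
    return np.fft.fft(frames, axis=1)   # one row of M bins per frame

def istft_rect_no_overlap(X):
    """Synthesis: inverse FFT of each frame, then concatenate (overlap-add with R = M)."""
    return np.fft.ifft(X, axis=1).real.reshape(-1)

rng = np.random.default_rng(0)
M = 64
x = rng.standard_normal(16 * M)
X = stft_rect_no_overlap(x, M)
y = istft_rect_no_overlap(X)

print(X.size == x.size)        # critically sampled: M bins per M input samples
print(np.max(np.abs(y - x)))   # rounding-level error: perfect reconstruction
```

Any per-channel gain change applied to `X` before synthesis acts independently on each block, which is why this bank is so sensitive to filter-bank modifications.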

STFT, Rectangular Window, 50% Overlap

  • Perfect reconstruction
  • Oversampled by 2 (less reliant on aliasing cancellation)
  • Poor channel isolation ( $ \approx 13$ dB)
  • Not very robust to filter-bank modifications, but better than with no overlap
Reducing the hop size to half the window length greatly reduces the amount of aliasing in the filter-bank output signals (§8.3.1). Recall that this happens because the folding frequency due to downsampling (by the hop size) doubles to coincide with the first zero-crossing of the window transform.
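The folding-frequency argument is simple arithmetic. For a length-$ M$ rectangular window, the first zero-crossing of its transform lies at $ 2\pi/M$. Downsampling by the hop size $ R$ puts the folding frequency at $ \pi/R$. The Python/NumPy sketch below uses an illustrative $ M=256$ and evaluates both hop sizes. It also checks that the rectangular window still overlap-adds to a constant (namely $ M/R=2$) at 50% overlap.

```python
import numpy as np

M = 256                          # rectangular window length (illustrative)
first_zero = 2 * np.pi / M       # first zero-crossing of the rectangular-window transform

for R in (M, M // 2):            # no overlap vs. 50% overlap
    folding = np.pi / R          # folding frequency after downsampling by the hop size
    print(f"R = {R:3d}: folding / first zero = {folding / first_zero:.2f}")
# R = 256: folding / first zero = 0.50
# R = 128: folding / first zero = 1.00

# Overlap-add eight rectangular windows at hop R = M/2
w = np.ones(M)
R = M // 2
ola = np.zeros(M + 7 * R)
for m in range(8):
    ola[m * R : m * R + M] += w
print(np.unique(ola[M:-M]))      # interior samples are all covered exactly twice
```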

STFT, Triangular Window, 50% Overlap

  • Perfect reconstruction
  • Oversampled by 2
  • Better channel isolation ( $ \approx 26$ dB)
  • Moderately robust to filter-bank modifications
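This case is essentially the no-overlap rectangular-window case with the window length doubled and the window transform squared, as derived in Chapter 3. The squaring doubles the channel isolation in dB. Because the window is now twice as long, 50% overlap leaves the folding frequency at only half the first zero-crossing of the window transform, just as in the no-overlap rectangular case. To move the folding frequency out to the first zero-crossing, 75% overlap should be used ($ 4\times$ oversampling in time).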
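The doubling of channel isolation in dB can be checked numerically. This Python/NumPy sketch builds a triangular window by convolving two length-$ M$ rectangular windows, so its transform is the rectangular-window transform squared. It then estimates the highest side-lobe level of each window from a heavily zero-padded FFT. The helper `highest_sidelobe_db` and its walk-down-the-main-lobe heuristic are illustrative assumptions. The heuristic is adequate for these symmetric, positive windows but is not a general-purpose routine.

```python
import numpy as np

def highest_sidelobe_db(w, pad_factor=64):
    """Highest side-lobe level (dB relative to the main-lobe peak) of window w,
    estimated from a zero-padded FFT."""
    W = np.abs(np.fft.rfft(w, pad_factor * len(w)))
    W = W / W[0]                       # normalize to the DC (main-lobe) peak
    k = 1
    while k < len(W) - 1 and W[k + 1] < W[k]:
        k += 1                         # walk down the main lobe to its first null
    return 20 * np.log10(W[k:].max())  # largest peak beyond the main lobe

M = 64
rect = np.ones(M)
tri = np.convolve(rect, rect)          # length 2M-1 triangle

rect_db = highest_sidelobe_db(rect)
tri_db = highest_sidelobe_db(tri)
print(round(rect_db, 1), round(tri_db, 1))   # about -13.3 and -26.5
```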

STFT, Hamming Window, 75% Overlap

  • Perfect reconstruction
  • Oversampled by 4
  • Aliasing from side lobes only
  • Good channel isolation ( $ \approx 42$ dB)
  • Moderately robust to filter-bank modifications
This can be considered a ``telephone quality'' audio filter bank. It has been used many times to analyze/model speech signals.
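The following Python/NumPy sketch checks two of the properties listed above. It assumes the ``periodic'' form of the Hamming window, $ w(n)=0.54-0.46\cos(2\pi n/M)$, $ n=0,\ldots,M-1$, with an illustrative $ M=256$. First, it confirms that hop size $ R=M/4$ gives a constant overlap-add. Second, it estimates the side-lobe level from a zero-padded FFT.

```python
import numpy as np

M = 256                                        # illustrative window length
n = np.arange(M)
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / M)    # periodic Hamming window

# Overlap-add at hop R = M/4: the four shifted cosines cancel,
# so every sample of the sum equals 4 * 0.54 = 2.16.
R = M // 4
ola = sum(w[(n + m * R) % M] for m in range(4))
print(ola.min(), ola.max())

# Side-lobe level: for this periodic window the main lobe ends
# exactly 2 bins (2 * pad samples of the padded FFT) from DC.
pad = 64
W = np.abs(np.fft.rfft(w, pad * M))
W /= W[0]
sidelobe_db = 20 * np.log10(W[2 * pad:].max())
print(f"highest side lobe: {sidelobe_db:.1f} dB")
```

Because the frame-rate harmonics $ 2\pi k/R$ fall on nulls of this window transform, reconstruction is exact. Only the side lobes between those nulls leak into the downsampled channel signals.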

STFT, Kaiser Window, Beta=10, 90% Overlap

  • Approximate perfect reconstruction (side lobes controlled by $ \beta $ )
  • Oversampled by $ 10$
  • Excellent channel isolation ( $ \approx 80$ dB)
  • Very robust to filter-bank modifications
  • Aliasing from side lobes only
Because the Kaiser window transform does not have harmonic nulls that can be tuned to harmonics of the frame rate (§8.3.1), we obtain only approximate perfect reconstruction. However, the reconstruction error is entirely under our control through the choice of the Kaiser-window parameter $ \beta $ (which determines its time-bandwidth product). When the side lobes are negligible, this filter bank does not rely on aliasing cancellation, so it is very robust to spectral modifications.
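The size of the reconstruction error can be estimated by overlap-adding shifted copies of the window. Any deviation of the sum from a constant appears directly as amplitude modulation of the reconstructed signal. This Python/NumPy sketch uses NumPy's `np.kaiser` (a symmetric Kaiser window) with $ \beta=10$, an illustrative length $ M=200$, and hop $ R=M/10$ (90% overlap).

```python
import numpy as np

M, R, beta = 200, 20, 10.0     # 90% overlap: hop is one tenth of the window length
w = np.kaiser(M, beta)         # NumPy's symmetric Kaiser window

# Steady-state overlap-add: for each phase n = 0..R-1, sum the window
# samples that land on the same output sample (w[n], w[n+R], ...).
ola = np.array([w[n::R].sum() for n in range(R)])
ripple = (ola.max() - ola.min()) / ola.mean()
print(f"relative OLA ripple: {ripple:.1e}")   # small but nonzero
```

Raising $ \beta$ lowers the side lobes and hence this ripple, at the cost of a wider main lobe, which in turn calls for a smaller hop size.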

Sliding FFT (Maximum Overlap), Any Window, Zero-Padded by 5

This example is practical in research applications. With powerful computers and large disks, why not set the FFT hop size to $ R=1$ and avoid all aliasing entirely? In the early days of computer music, this was the normal choice in phase-vocoder analysis for additive synthesis (§G.10), and it is of course far more affordable now. For aggressive spectral modifications, the sliding FFT ($ R=1$ ) generally yields the best quality results. Additionally, the analyzed signal can be oversampled so that the frequency domain has a large extended region where nonlinear distortion products can ``land'' without aliasing. As an example, ``tube distortion'' simulators routinely use $ 8\times$ or even $ 16\times$ oversampling of the input signal prior to distortion [300].
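With $ R=1$, every shifted copy of the window contributes to every output sample. The overlap-add of the windowed frames is therefore simply the input scaled by $ \sum_n w(n)$, for any window with a nonzero sum. The following Python/NumPy sketch computes a hop-1 STFT with FFT size $ N=5M$ ("zero-padded by 5"). It inverts the STFT by overlap-add followed by division by $ \sum_n w(n)$. The function names are hypothetical, and the code is written for clarity rather than speed, since it performs one FFT per input sample.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_stft(x, w, pad=5):
    """Hop-size-1 STFT: one length-N FFT (N = pad*M) per input sample."""
    M = len(w)
    xp = np.concatenate([np.zeros(M - 1), x, np.zeros(M - 1)])  # every sample fully overlapped
    frames = sliding_window_view(xp, M)          # frame m is xp[m:m+M]
    return np.fft.rfft(frames * w, pad * M, axis=1)

def sliding_istft(X, w, pad, length):
    """Overlap-add inverse; valid for any window with nonzero sum,
    since sum_m w[n-m] = sum(w) for every n when R = 1."""
    M = len(w)
    frames = np.fft.irfft(X, pad * M, axis=1)[:, :M]  # discard the zero-padded tail
    y = np.zeros(X.shape[0] + M - 1)
    for m, frame in enumerate(frames):
        y[m:m + M] += frame
    return y[M - 1:M - 1 + length] / w.sum()

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
w = rng.random(64)             # an arbitrary positive window
X = sliding_stft(x, w, pad=5)
y = sliding_istft(X, w, pad=5, length=len(x))
print(X.shape, np.max(np.abs(y - x)))
```

Since the channel signals are never downsampled, there is no aliasing to cancel. Each row of `X` is one time sample of all 161 channel signals.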
