Downsampled STFT Filter Banks

We now look at STFT filter banks that are downsampled by the factor $R>1$. The downsampling factor $R$ corresponds to a hop size of $R$ samples in the overlap-add view of the STFT. From the filter-bank point of view, the impact of $R>1$ is aliasing in the channel signals when the lowpass filter (analysis window) is less than ideal. When the conditions for perfect reconstruction are met, this aliasing will be canceled in the reconstruction (when the filter-bank channel signals are remodulated and summed).
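Downsampled STFT Filter Bank

So far we have considered only $R=1$ (the ``sliding'' DFT) in our filter-bank interpretation of the STFT. For $R>1$ we obtain a downsampled version of $X_m(\omega_k)$:

$$X_{mR}(\omega_k) = \sum_{n=-\infty}^{\infty} \left[x(n)e^{-j\omega_k n}\right] w(n-mR),$$

i.e., $X_m(\omega_k)$ is simply evaluated at every $R$th sample, as shown in Fig.9.17.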
Note that this can be considered an implementation of a phase vocoder filter bank. (See §G.5 for an introduction to the vocoder.)
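As a rough numerical check that the hop-$R$ STFT is the sliding channel signal kept at every $R$th sample, the following Matlab sketch computes one channel two ways: by heterodyning, filtering with the flipped window, and downsampling (filter-bank view), and by taking an FFT of each hopped frame (overlap-add view). The test signal, the sizes `M`, `R`, `N`, the channel index `k`, and the choice of a causal window starting at sample $mR$ are illustrative assumptions, not values from the text.

```matlab
% Sketch only: sizes and test signal are assumed for illustration.
M = 16;                 % window length
R = 4;                  % hop size = downsampling factor
N = 16;                 % DFT length (number of channels)
k = 3;                  % channel (bin) to inspect
wk = 2*pi*k/N;          % channel center frequency omega_k
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/M);   % periodic Hamming, written out
Nx = 64;
n = (0:Nx-1)';
x = cos(0.3*n) + 0.5*sin(1.1*n);        % arbitrary test signal

% Filter-bank view: heterodyne, lowpass filter with the window, downsample
xk = x .* exp(-1j*wk*n);        % shift omega_k down to dc
y = conv(xk, flipud(w));        % correlate with w (convolve with flipped w)
Xslide = y(M:Nx);               % sliding (R=1) channel, frames fully inside x
Xdec = Xslide(1:R:end);         % keep every R-th sample

% Overlap-add view: FFT of each hopped, windowed frame
starts = (0:R:Nx-M)';
Xfft = zeros(length(starts),1);
for i = 1:length(starts)
  s = starts(i);
  F = fft(x(s+1:s+M) .* w, N);
  Xfft(i) = exp(-1j*wk*s) * F(k+1);   % refer phase to absolute time n=0
end

err = max(abs(Xdec - Xfft));    % the two views should agree to round-off
```

The factor `exp(-1j*wk*s)` is needed only because the FFT measures phase relative to the start of each frame, whereas the heterodyned channel signal measures it relative to time zero.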
Since the channel signals are downsampled, we generally need interpolation in the reconstruction. Figure 9.18 indicates how we might pursue this. From studying the overlap-add framework, we know that the inverse STFT is exact when the window $w(n)$ is COLA($R$), that is, when $\sum_m w(n-mR)$ is constant in $n$. In only these cases can the STFT be considered a perfect reconstruction filter bank. From the Poisson Summation Formula in §8.3.1, we know that a condition equivalent to the COLA condition is that the window transform $W(\omega)$ have notches at all harmonics of the frame rate, i.e., $W(2\pi k/R)=0$ for $k=1,2,\dots,R-1$.

In the present context (filter-bank point of view), perfect reconstruction appears impossible for $R>1$, because for ideal reconstruction after downsampling, the channel anti-aliasing filter ($w$) and interpolation filter ($f$) have to be ideal lowpass filters. This is a true conclusion in any single channel, but not for the filter bank as a whole. We know, for example, from the overlap-add interpretation of the STFT that perfect reconstruction occurs for hop sizes greater than 1 as long as the COLA condition is met. This is an interesting paradox to which we will return shortly.

What we would expect in the filter-bank context is that the reconstruction can be made arbitrarily accurate given better and better lowpass filters $w$ and $f$ which cut off at $\omega_c = \pi/R$ (the folding frequency associated with downsampling by $R$). This is the right way to think about the STFT when spectral modifications are involved. In Chapter 11 we will develop the general topic of perfect reconstruction filter banks, and derive various STFT processors as special cases.
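The equivalence between the time-domain COLA condition and notches at the frame-rate harmonics can be checked numerically. The sketch below, in Matlab, folds a window into $R$ residue classes (the overlap-add over one hop) and separately evaluates the window transform at the frequencies $2\pi k/R$. The window (a periodic Hamming of assumed length `P = 16`) and the three hop sizes are illustrative choices; the hop `R = 6` is included as a case where neither condition is expected to hold.

```matlab
% Sketch only: window length and hop sizes are assumed for illustration.
P = 16;
n = (0:P-1)';
w = 0.54 - 0.46*cos(2*pi*n/P);    % periodic Hamming, written out
hops = [4 8 6];                   % two COLA hops and one non-COLA hop
ripple = zeros(size(hops));       % time-domain test: max-min of folded window
notch  = zeros(size(hops));       % frequency-domain test: max |W(2*pi*k/R)|
for i = 1:length(hops)
  R = hops(i);
  a = zeros(R,1);                 % a(r) = sum over m of w(r + m*R)
  for q = 0:P-1
    r = mod(q,R) + 1;
    a(r) = a(r) + w(q+1);
  end
  ripple(i) = max(a) - min(a);
  k = 1:R-1;                      % frame-rate harmonics, dc excluded
  Wk = (w.') * exp(-1j*2*pi*n*k/R);
  notch(i) = max(abs(Wk));
end
```

For the two COLA hops both `ripple` and `notch` should come out at round-off level, while for `R = 6` both should be clearly nonzero.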
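In FBS (filter-bank summation), $R$ is the downsampling factor in each of the filter-bank channels, and thus the window $w$ serves as the anti-aliasing filter (see Fig.9.19). We see that to avoid aliasing, $W(\omega)$ must be bandlimited to $(-\pi/R,\pi/R)$, as illustrated schematically in Fig.9.20.

For simplicity, take the band limits of the window transform to be the first zero crossings on either side of the main lobe. Given the first zero of $W(\omega)$ at $\omega = L\cdot 2\pi/M$, we obtain

$$\frac{\pi}{R} \ge L\,\frac{2\pi}{M} \quad\Longrightarrow\quad R_{\max} = \frac{M}{2L}.$$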
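The following table gives maximum hop sizes $R_{\max}$ for various window types in the Blackman-Harris family, where $L$ is both the number of constant-plus-cosine terms in the window definition (§3.3) and the half-main-lobe width in units of side-lobe widths $2\pi/M$. Also shown in the table is the maximum COLA hop size $R_C$ we determined in Chapter 8.

|$L$||Window Type (Length $M$)||$R_{\max}$||$R_C$|
|1||Rectangular ($w_R$)||M/2||M|
|2||Generalized Hamming ($w_H$)||M/4||M/2|
|3||Blackman Family ($w_B$)||M/6||M/3|

When the same window is applied at both the input and the output (weighted overlap-add), the effective window is the product of the two, and the table becomes:

|$L$||In and Out Window (Length $M$)||$R_{\max}$||$R_C$|
|1||Rectangular ($w_R$)||M/2||M|
|2||Generalized Hamming ($w_H$)||M/6||M/3|
|3||Blackman Family ($w_B$)||M/10||M/5|

In both tables:
- $R_{\max}$ is equal to $M$ divided by the main-lobe width in ``side lobes'', while
- $R_C$ is $M$ divided by the first notch frequency in the window transform, measured in side-lobe widths (lowest available frame rate at which all frame-rate harmonics are notched).
- For windows in the Blackman-Harris families, and with main-lobe widths defined from zero-crossing to zero-crossing, $R_{\max} = R_C/2$.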
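The hop sizes in both tables can be reproduced numerically, assuming the first zero of the window transform falls at bin $L$ of a length-$M$ DFT, so that $R_{\max}=M/(2L)$ and $R_C=M/L$. The Matlab sketch below locates that first zero for each window and for its square (the effective window when the same window is used on input and output). The length `M = 60` and the specific periodic Hamming and Blackman coefficients are assumptions chosen so that every hop size is an integer.

```matlab
% Sketch only: M and the window coefficients are assumed for illustration.
M = 60;
theta = 2*pi*(0:M-1)'/M;
wins = {ones(M,1), ...                                % rectangular
        0.54 - 0.46*cos(theta), ...                   % periodic Hamming
        0.42 - 0.5*cos(theta) + 0.08*cos(2*theta)};   % periodic Blackman
tol = 1e-9*M;            % threshold for calling a DFT bin "zero"
T = zeros(3,4);          % columns: Rmax, Rc (window), Rmax, Rc (window squared)
for i = 1:3
  w = wins{i};
  L1 = find(abs(fft(w))    < tol, 1) - 1;   % first zero bin of W
  L2 = find(abs(fft(w.^2)) < tol, 1) - 1;   % first zero bin for w squared
  T(i,:) = [M/(2*L1), M/L1, M/(2*L2), M/L2];
end
disp(T)
```

With these choices the first two columns give $M/2, M$; $M/4, M/2$; $M/6, M/3$ and the last two give $M/2, M$; $M/6, M/3$; $M/10, M/5$, since squaring an $L$-term cosine-series window produces $2L-1$ terms.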
- Weak COLA: Window transform has zeros at frame-rate harmonics: $W(2\pi k/R) = 0$, $k=1,2,\dots,R-1$
- Strong COLA: Window transform is bandlimited consistent with downsampling by the frame rate: $W(\omega) = 0$ for $|\omega| \ge \pi/R$
  - Perfect OLA reconstruction
  - No aliasing
  - Better for spectral modifications
  - Time-domain window infinitely long in ideal case
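A finite-length window can only approximate strong COLA. One rough way to gauge the approximation, offered here as an illustrative metric rather than anything defined in the text, is the fraction of window-transform energy lying at or beyond the folding frequency $\pi/R$. The Matlab sketch below compares two hop sizes for an assumed length-32 periodic Hamming window, both of which satisfy weak COLA.

```matlab
% Sketch only: window, FFT size, and the energy metric are assumptions.
P = 32;
w = 0.54 - 0.46*cos(2*pi*(0:P-1)'/P);   % periodic Hamming, written out
Nfft = 4096;                            % dense frequency grid via zero padding
W2 = abs(fft(w, Nfft)).^2;              % power of the window transform
omega = 2*pi*(0:Nfft-1)'/Nfft;
omega(omega > pi) = omega(omega > pi) - 2*pi;   % map to (-pi, pi]
hops = [16 8];                          % R = P/2 and R = P/4
frac = zeros(size(hops));
for i = 1:length(hops)
  R = hops(i);
  outband = abs(omega) >= pi/R;         % at or beyond the folding frequency
  frac(i) = sum(W2(outband)) / sum(W2); % energy that aliases on downsampling
end
```

At `R = 16` the folding frequency cuts through the main lobe, so a noticeable share of the energy aliases; at `R = 8` it sits at the main-lobe edge and only side lobes alias.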
    M = 33;          % window length
    w = hamming(M);
    R = (M-1)/2;     % maximum hop size
    w(M) = 0;        % 'periodic Hamming' (for COLA)
    %w(M) = w(M)/2;  % another solution,
    %w(1) = w(1)/2;  %  interesting to compare
    ff = 1/R;        % frame rate (fs=1)
    N = 6*M;         % no. samples to look at OLA
    sp = ones(N,1)*sum(w)/R; % dc term (COLA term)
    ubound = sp(1);  % try easy-to-compute upper bound
    lbound = ubound; % and lower bound
    n = (0:N-1)';
    for (k=1:R-1)    % traverse frame-rate harmonics
      f=ff*k;
      csin = exp(j*2*pi*f*n); % frame-rate harmonic
      % find exact window transform at frequency f
      Wf = w' * conj(csin(1:M));
      hum = Wf*csin; % contribution to OLA "hum"
      sp = sp + hum/R; % "Poisson summation" into OLA
      % Update lower and upper bounds:
      Wfb = abs(Wf);
      ubound = ubound + Wfb/R; % build upper bound
      lbound = lbound - Wfb/R; % build lower bound
    end

In this example, the overlap-add is theoretically a perfect constant (equal to `sum(w)/R` $= 1.08$) because the frame rate and all its harmonics coincide with nulls in the window transform (see Fig.9.24). A plot of the steady-state overlap-add and that computed using the Poisson Summation Formula (not shown) is constant to within numerical precision. The difference between the actual overlap-add and that computed using the PSF is shown in Fig.9.23. We verify that the difference is at the level of round-off error in double-precision (64-bit) floating-point computations. We thus verify that the overlap-add of a length $M=33$ Hamming window using a hop size of $R=(M-1)/2=16$ samples is constant to within machine precision.
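For comparison with the Poisson-summation computation, the overlap-add can also be formed directly in the time domain. The following Matlab sketch does this for the same $M=33$, $R=16$ case, once with the last sample zeroed (the ``periodic Hamming'' used above) and once with the unmodified symmetric window. The Hamming window is written out explicitly, assuming the usual definition $0.54-0.46\cos(2\pi n/(M-1))$, so that no toolbox function is needed; the variable names are illustrative.

```matlab
% Sketch only: direct time-domain overlap-add of shifted windows.
M = 33;
R = (M-1)/2;                            % hop size, 16 samples
n = (0:M-1)';
wsym = 0.54 - 0.46*cos(2*pi*n/(M-1));   % symmetric Hamming, written out
wper = wsym;
wper(M) = 0;                            % 'periodic Hamming' (for COLA)
wins = [wper wsym];
N = 6*M;                                % no. samples to look at OLA
ripple = zeros(1,2);
for i = 1:2
  ola = zeros(N,1);
  for s = 0:R:N-M                       % frame start times
    ola(s+1:s+M) = ola(s+1:s+M) + wins(:,i);
  end
  steady = ola(M:N-M);                  % skip start-up and ending transients
  ripple(i) = max(steady) - min(steady);
end
level = sum(wper)/R;                    % predicted constant (dc term)
```

The periodic window should give a ripple at round-off level around the constant `level`, while the symmetric window leaves a ripple equal to its end-point value, because the first and last samples of adjacent frames coincide.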
    M = 33;          % Window length
    beta = 8;
    w = kaiser(M,beta);
    R = floor(1.7*(M-1)/(beta+1)); % ROUGH estimate (gives R=6)

Figure 9.25 plots the overlap-added Kaiser windows, and Fig.9.26 shows the steady-state overlap-add (a time segment sometime after the first 30 samples). The ``predicted'' OLA is computed using the Poisson Summation Formula using the same matlab code as before. Note that the Poisson summation formula gives exact results to within numerical precision. The upper (lower) bound was computed by summing (subtracting) the window-transform magnitudes at all frame-rate harmonics to (from) the dc gain of the window. This is one example of how the PSF can be used to estimate upper and lower bounds on OLA error.

The difference between the measured steady-state overlap-add and that computed using the PSF is shown in Fig.9.27. Again the two methods agree to within numerical precision.

Figure 9.28 shows the Kaiser window transform, with marks indicating the folding frequency $\pi/R$ at the chosen hop size $R$, as well as the frame rate and twice the frame rate. We see that the frame rate (hop size) has been well chosen for this window, as the folding frequency lies very close to what would be called the ``stop band'' of the Kaiser window transform. The ``stop-band rejection'' can be read from Fig.9.28 as the height of the highest side lobe. We conclude that this example, a length 33 Kaiser window with $\beta=8$ and hop size $R=6$, represents a reasonably high-quality audio STFT that will be robust in the presence of spectral modifications. We expect such robustness whenever the folding frequency lies above the main lobe of the window transform.
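The bounds described above can be sketched in a few lines of Matlab. This version forms the Kaiser overlap-add directly, then builds the upper and lower bounds from the window-transform magnitudes at the frame-rate harmonics. The Kaiser window is written out with `besseli` using the standard definition, on the assumption that it matches `kaiser(M,beta)`; the variable names are illustrative.

```matlab
% Sketch only: Kaiser OLA ripple versus Poisson-summation bounds.
M = 33;
beta = 8;
n = (0:M-1)';
w = besseli(0, beta*sqrt(1 - (2*n/(M-1) - 1).^2)) / besseli(0, beta);
R = floor(1.7*(M-1)/(beta+1));    % ROUGH estimate (gives R=6)

% Direct overlap-add of shifted windows
N = 6*M;
ola = zeros(N,1);
for s = 0:R:N-M
  ola(s+1:s+M) = ola(s+1:s+M) + w;
end
steady = ola(M:N-M);              % skip start-up and ending transients

% Bounds from the window transform at the frame-rate harmonics
dc = sum(w)/R;                    % dc term of the Poisson sum
k = 1:R-1;
Wk = abs((w.') * exp(-1j*2*pi*n*k/R));
ubound = dc + sum(Wk)/R;          % all harmonics adding in phase
lbound = dc - sum(Wk)/R;          % all harmonics adding out of phase
relripple = (max(steady) - min(steady)) / dc;
```

Because the harmonics rarely line up in phase, the actual steady-state overlap-add should stay inside `[lbound, ubound]`, usually with some margin.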
STFT with Modifications