Downsampled STFT Filter Banks
We now look at STFT filter banks which are downsampled by the factor $R>1$. The downsampling factor $R$ corresponds to a hop size of $R$ samples in the overlap-add view of the STFT. From the filter-bank point of view, the impact of $R>1$ is aliasing in the channel signals when the lowpass filter (analysis window) is less than ideal. When the conditions for perfect reconstruction are met, this aliasing is canceled in the reconstruction (when the filter-bank channel signals are remodulated and summed).
Downsampled STFT Filter Bank
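So far we have considered only $R=1$ (the ``sliding'' DFT) in our filter-bank interpretation of the STFT. For $R>1$ we obtain a downsampled version of $X_m(\omega_k)$:

$$X_{mR}(\omega_k) = \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,e^{-j\omega_k n}$$

Let us define the downsampled time index as $\hat{m} = mR$ so that

$$X_{\hat{m}}(\omega_k) = \sum_{n=-\infty}^{\infty} x(n)\,w(n-\hat{m})\,e^{-j\omega_k n}, \qquad \hat{m} = 0, \pm R, \pm 2R, \ldots \qquad (10.25)$$

i.e., $X_{\hat{m}}(\omega_k)$ is simply $X_m(\omega_k)$ evaluated at every $R$th sample, as shown in Fig.9.17.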
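As a small numerical illustration of this definition, the following sketch computes one STFT channel as a sliding DFT ($R=1$) and compares every $R$th sample of it with the same channel computed directly at hop size $R$. The window, test signal, channel number, and all variable names are illustrative assumptions, not taken from the text.

```matlab
% Sketch: one STFT channel as a sliding DFT (R = 1), then downsampled,
% versus the same channel computed directly at hop size R.
M = 8;                          % window length (illustrative)
R = 4;                          % downsampling factor / hop size
k = 1;  Nfft = 8;               % channel (bin) number and DFT length
wk = 2*pi*k/Nfft;               % channel center frequency (rad/sample)
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/M);   % periodic Hamming window
x = cos(0.3*(0:63)');           % arbitrary test signal
N = length(x);

% Sliding STFT: X_m(wk) = sum_n x(n) w(n-m) exp(-j*wk*n), m = 0,1,2,...
nm = N - M + 1;                 % number of full frames at hop size 1
Xslide = zeros(nm,1);
for m = 0:nm-1
  n = (m:m+M-1)';               % absolute time indices covered by frame m
  Xslide(m+1) = sum(x(n+1) .* w .* exp(-1j*wk*n));
end

% Hop-size-R STFT computed directly at frame times 0, R, 2R, ...
mh = (0:R:nm-1)';
Xhop = zeros(length(mh),1);
for i = 1:length(mh)
  n = (mh(i):mh(i)+M-1)';
  Xhop(i) = sum(x(n+1) .* w .* exp(-1j*wk*n));
end

% Downsampling the sliding channel signal gives the hop-size-R channel
err = max(abs(Xhop - Xslide(1:R:end)));
```

Here `err` should be zero, since both loops evaluate the same sum at the same frame times; the point is only that hop size $R$ and downsampling by $R$ are the same operation.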
Note that this can be considered an implementation of a phase vocoder filter bank [212]. (See §G.5 for an introduction to the vocoder.)
Filter Bank Reconstruction
Since the channel signals are downsampled, we generally need interpolation in the reconstruction. Figure 9.18 indicates how we might pursue this. From studying the overlap-add framework, we know that the inverse STFT is exact when the window $w$ is COLA($R$), that is, when $\sum_m w(n-mR)$ is constant. In only these cases can the STFT be considered a perfect-reconstruction filter bank. From the Poisson Summation Formula in §8.3.1, we know that a condition equivalent to the COLA condition is that the window transform $W(\omega)$ have notches at all harmonics of the frame rate, i.e., $W(2\pi k/R)=0$ for $k=1,2,\ldots,R-1$. In the present context (filter-bank point of view), perfect reconstruction appears impossible for $R>1$, because for ideal reconstruction after downsampling, the channel anti-aliasing filter ($w$) and interpolation filter ($f$) have to be ideal lowpass filters. This is a true conclusion in any single channel, but not for the filter bank as a whole. We know, for example, from the overlap-add interpretation of the STFT that perfect reconstruction occurs for hop sizes greater than 1 as long as the COLA condition is met. This is an interesting paradox to which we will return shortly.
What we would expect in the filter-bank context is that the reconstruction can be made arbitrarily accurate given better and better lowpass filters $w$ and $f$ which cut off at $\omega_c = \pi/R$ (the folding frequency associated with downsampling by $R$). This is the right way to think about the STFT when spectral modifications are involved.
In Chapter 11 we will develop the general topic of perfect reconstruction filter banks, and derive various STFT processors as special cases.
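The equivalence between the COLA condition and notches at the frame-rate harmonics can be checked numerically. The sketch below does this for a periodic Hamming window; the length, hop size, and variable names are illustrative assumptions.

```matlab
% Sketch: check COLA(R) two ways for a periodic Hamming window.
M = 32;  R = 8;                          % illustrative length and hop size
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/M);    % periodic Hamming, length M

% (1) Time domain: overlap-add shifted windows, inspect the steady state
nframes = 12;
ola = zeros((nframes-1)*R + M, 1);
for m = 0:nframes-1
  idx = m*R + (1:M)';
  ola(idx) = ola(idx) + w;
end
ss = ola(M+1:end-M);                     % discard start-up and tail
ripple = max(ss) - min(ss);              % near zero when COLA(R) holds

% (2) Frequency domain: window transform at the frame-rate harmonics
n = (0:M-1)';
Wh = zeros(R-1,1);
for k = 1:R-1
  Wh(k) = sum(w .* exp(-1j*2*pi*k/R*n)); % W(2*pi*k/R)
end
notchdepth = max(abs(Wh));               % near zero when all are notched
```

Both `ripple` and `notchdepth` come out at rounding-error level, and the steady-state overlap-add equals `sum(w)/R`, the dc term of the Poisson summation.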
Downsampling with Anti-Aliasing
In OLA, the hop size $R$ is governed by the COLA constraint

$$\sum_{m=-\infty}^{\infty} w(n-mR) = \mbox{constant} \qquad (10.26)$$

In FBS, $R$ is the downsampling factor in each of the filter-bank channels, and thus the window $w$ serves as the anti-aliasing filter (see Fig.9.19). We see that to avoid aliasing, $W(\omega)$ must be bandlimited to $(-\pi/R, \pi/R)$, as illustrated schematically in Fig.9.20.
Properly Anti-Aliasing Window Transforms
For simplicity, define window-transform bandlimits at first zero-crossings about the main lobe. Given the first zero of $W(\omega)$ at $\omega = 2\pi L/M$, the requirement $2\pi L/M \le \pi/R$ gives

$$R_{\max} = \frac{M}{2L} \qquad (10.27)$$

The following table gives maximum hop sizes $R_{\max}$ for various window types in the Blackman-Harris family, where $L$ is both the number of constant-plus-cosine terms in the window definition (§3.3) and the half-main-lobe width in units of side-lobe widths $2\pi/M$. Also shown in the table is the maximum COLA hop size $R_C$ we determined in Chapter 8.

    L    Window Type (Length M)     R_max (alias-free)    R_C (COLA)
    1    Rectangular                M/2                   M
    2    Generalized Hamming        M/4                   M/2
    3    Blackman Family            M/6                   M/3
    L    L-term Blackman-Harris     M/(2L)                M/L
It is interesting to note that the maximum COLA hop size is double the maximum downsampling factor which avoids aliasing of the main lobe of the window transform in FFT-bin signals $X_m(\omega_k)$. At the maximum COLA hop size this aliasing is quite heavy (see Fig.9.21); yet, since the COLA constraint is a sufficient condition for perfect reconstruction, it is all canceled in the reconstruction. The general theory of aliasing cancellation in perfect reconstruction filter banks will be taken up in Chapter 11.
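The table entries for $L = 1, 2, 3$ can be spot-checked numerically. The sketch below uses periodic windows of an assumed length $M=48$ (with the classic Hamming and Blackman coefficients as representatives of their families), locates the first null of each window transform in units of $2\pi/M$, and measures the overlap-add ripple at hop size $M/L$. All names are illustrative.

```matlab
% Sketch: first-null bin and COLA check at hop size M/L, for L = 1, 2, 3.
M = 48;                                       % divisible by 1, 2, and 3
c = 2*pi*(0:M-1)'/M;
wins = cell(3,1);
wins{1} = ones(M,1);                          % L = 1: rectangular
wins{2} = 0.54 - 0.46*cos(c);                 % L = 2: periodic Hamming
wins{3} = 0.42 - 0.5*cos(c) + 0.08*cos(2*c);  % L = 3: periodic Blackman
firstnull = zeros(3,1);  ripple = zeros(3,1);
Rmax = zeros(3,1);  Rc = zeros(3,1);
nframes = 10;
for L = 1:3
  w = wins{L};
  Rmax(L) = M/(2*L);                          % main lobe below pi/R
  Rc(L)   = M/L;                              % maximum COLA hop size
  Wmag = abs(fft(w));                         % |W| sampled at 2*pi*k/M
  firstnull(L) = find(Wmag(2:M/2) < 1e-9, 1); % first null, in bins
  R = Rc(L);
  ola = zeros((nframes-1)*R + M, 1);
  for m = 0:nframes-1
    idx = m*R + (1:M)';
    ola(idx) = ola(idx) + w;
  end
  ss = ola(M+1:end-M);                        % steady-state portion
  ripple(L) = max(ss) - min(ss);              % near zero if COLA at M/L
end
```

In each case the first null falls at bin $L$ and the overlap-add at hop size $M/L$ is flat to within rounding error, even though the folding frequency $\pi/R_C$ then sits in the middle of the half main lobe.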
It is important to realize that aliasing cancellation is disturbed by FBS spectral modifications. For robustness in the presence of spectral modifications, it is advisable to keep the hop size at or below the maximum alias-free value, $R \le R_{\max} = M/(2L)$. For compression, it is common to use the maximum COLA hop size $R = R_C = M/L$ together with a ``synthesis window'' in a weighted overlap-add (WOLA) scheme (§8.6).
Hop Sizes for WOLA
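In the weighted overlap-add method, with the synthesis (output) window equal to the analysis (input) window, the window that is effectively overlap-added is $w^2$. The square of an $L$-term Blackman-Harris window has $2L-1$ constant-plus-cosine terms, so we have the following modification of the recommended maximum hop-size table:

    L    In and Out Window (Length M)    R_max (alias-free)    R_C (COLA)
    1    Rectangular                     M/2                   M
    2    Generalized Hamming             M/6                   M/3
    3    Blackman Family                 M/10                  M/5
    L    L-term Blackman-Harris          M/(4L-2)              M/(2L-1)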

The first hop-size column, $R_{\max}$, is equal to $M$ divided by the main-lobe width in ``side lobes'', while the second, $R_C$, is $M$ divided by the first notch frequency in the window transform, in the same units (lowest available frame rate at which all frame-rate harmonics are notched). For windows in the Blackman-Harris families, and with main-lobe widths defined from zero-crossing to zero-crossing, $R_{\max} = R_C/2$.
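The WOLA entry for the Hamming family can be checked by overlap-adding the squared window. The sketch below assumes a periodic Hamming window of length $M=36$ and compares the hop sizes $M/3$ (the WOLA table value) and $M/2$ (the value that suffices for plain OLA). All names are illustrative.

```matlab
% Sketch: WOLA with equal analysis and synthesis windows overlap-adds w.^2.
M = 36;                                   % divisible by 2 and 3
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/M);     % periodic Hamming (L = 2)
w2 = w.^2;                                % effective WOLA window
hops = [M/3, M/2];                        % M/(2L-1) and M/L for L = 2
ripple = zeros(2,1);
nframes = 12;
for i = 1:2
  R = hops(i);
  ola = zeros((nframes-1)*R + M, 1);
  for m = 0:nframes-1
    idx = m*R + (1:M)';
    ola(idx) = ola(idx) + w2;
  end
  ss = ola(M+1:end-M);                    % steady-state portion
  ripple(i) = max(ss) - min(ss);          % peak-to-peak OLA ripple
end
```

The squared window is constant-overlap-add at hop size $M/3$ (`ripple(1)` at rounding-error level) but not at $M/2$, where its second-harmonic cosine term survives the overlap-add.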
Constant-Overlap-Add (COLA) Cases
 - Weak COLA: Window transform has zeros at frame-rate harmonics: $W(2\pi k/R) = 0$, $k = 1, 2, \ldots, R-1$
    - Perfect OLA reconstruction
    - Relies on aliasing cancellation in frequency domain
    - Aliasing cancellation is disturbed by spectral modifications
    - See Portnoff for further details
 - Strong COLA: Window transform is bandlimited consistent with downsampling by the frame rate: $W(\omega) = 0$ for $|\omega| \ge \pi/R$
    - Perfect OLA reconstruction
    - No aliasing
    - Better for spectral modifications
    - Time-domain window infinitely long in ideal case
Hamming Overlap-Add Example
Matlab code:

    M = 33;          % window length
    w = hamming(M);
    R = (M-1)/2;     % maximum hop size
    w(M) = 0;        % 'periodic Hamming' (for COLA)
    %w(M) = w(M)/2;  % another solution,
    %w(1) = w(1)/2;  %  interesting to compare
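For reference, the overlap-add itself can be formed directly in the time domain. The following sketch builds the same modified Hamming window explicitly (so that it does not depend on the `hamming` function), overlap-adds several frames at hop size `R`, and measures the steady-state ripple. The frame count and the names `ola`, `ss`, `olaconst`, and `ripple` are illustrative assumptions.

```matlab
% Sketch: direct time-domain overlap-add of the modified Hamming window.
M = 33;                                    % window length
w = 0.54 - 0.46*cos(2*pi*(0:M-1)'/(M-1));  % same values as hamming(M)
R = (M-1)/2;                               % hop size (16)
w(M) = 0;                                  % 'periodic Hamming' (for COLA)

nframes = 10;                              % number of overlapping frames
ola = zeros((nframes-1)*R + M, 1);
for m = 0:nframes-1
  idx = m*R + (1:M)';                      % samples covered by frame m
  ola(idx) = ola(idx) + w;
end
ss = ola(M+1:end-M);                       % steady state (skip both ends)
olaconst = sum(w)/R;                       % expected constant, 1.08
ripple = max(ss) - min(ss);                % near zero when COLA holds
```

The steady-state overlap-add equals `sum(w)/R` $= 17.28/16 = 1.08$, with `ripple` at rounding-error level.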
Periodic-Hamming OLA from Poisson Summation Formula
Matlab code:

    ff = 1/R;        % frame rate (fs=1)
    N = 6*M;         % no. samples to look at OLA
    sp = ones(N,1)*sum(w)/R;  % dc term (COLA term)
    ubound = sp(1);  % try easy-to-compute upper bound
    lbound = ubound; % and lower bound
    n = (0:N-1)';
    for (k=1:R-1)    % traverse frame-rate harmonics
      f=ff*k;
      csin = exp(j*2*pi*f*n); % frame-rate harmonic
      % find exact window transform at frequency f
      Wf = w' * conj(csin(1:M));
      hum = Wf*csin;   % contribution to OLA "hum"
      sp = sp + hum/R; % "Poisson summation" into OLA
      % Update lower and upper bounds:
      Wfb = abs(Wf);
      ubound = ubound + Wfb/R; % build upper bound
      lbound = lbound - Wfb/R; % build lower bound
    end
In this example, the overlap-add is theoretically a perfect constant (equal to sum(w)/R $= 1.08$) because the frame rate and all its harmonics coincide with nulls in the window transform (see Fig.9.24). A plot of the steady-state overlap-add and that computed using the Poisson Summation Formula (not shown) is constant to within numerical precision. The difference between the actual overlap-add and that computed using the PSF is shown in Fig.9.23. We verify that the difference is on the order of the machine epsilon (about $10^{-16}$), which is close enough to zero in double-precision (64-bit) floating-point computations. We thus verify that the overlap-add of a length $M=33$ Hamming window (with its last sample zeroed) using a hop size of $R=(M-1)/2=16$ samples is constant to within machine precision.
Figure 9.24 shows the zero-padded DFT of the modified Hamming window we're using (w(M) = 0) with the frame-rate harmonics marked. In this example ($R=(M-1)/2=16$), the upper half of the main lobe aliases into the lower half of the main lobe. (In fact, all energy above the folding frequency aliases into the lower half of the main lobe.) While this window and hop size still give perfect reconstruction under the STFT, spectral modifications will disturb the aliasing cancellation during reconstruction. This ``undersampled'' configuration is suitable as a basis for compression applications.
Note that if we were to cut $R$ in half to $R=(M-1)/4=8$, then the folding frequency in Fig.9.24 would coincide with the first null in the window transform. Since the frame rate and all its harmonics continue to land on nulls in the window transform, overlap-add is still exact. At this reduced hop size, however, the STFT becomes much more robust to spectral modifications, because all aliasing in the effective downsampled filter bank is now weighted by the side lobes of the window transform, with no aliasing components coming from within the main lobe. This is the central result of [9].
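The two hop sizes can be compared numerically by evaluating the window transform at the folding frequency $\pi/R$ and at the frame-rate harmonics $2\pi k/R$. The sketch below does this for the modified Hamming window of the example; the variable names are illustrative assumptions.

```matlab
% Sketch: window transform at the folding frequency and at the frame-rate
% harmonics, for hop sizes R = 16 and R = 8.
M = 33;
n = (0:M-1)';
w = 0.54 - 0.46*cos(2*pi*n/(M-1));   % same values as hamming(M)
w(M) = 0;                            % 'periodic Hamming'
hops = [16 8];
Wfold = zeros(2,1);  notch = zeros(2,1);
for i = 1:2
  R = hops(i);
  Wfold(i) = abs(sum(w .* exp(-1j*pi/R*n)));    % |W| at folding freq pi/R
  Wh = zeros(R-1,1);
  for k = 1:R-1
    Wh(k) = abs(sum(w .* exp(-1j*2*pi*k/R*n))); % |W| at harmonic k
  end
  notch(i) = max(Wh);                % near zero: all harmonics are notched
end
relfold_dB = 20*log10(Wfold(1)/sum(w));  % folding-frequency level re dc, R=16
```

Both hop sizes notch every frame-rate harmonic, so both are COLA. For $R=16$ the folding frequency lies inside the main lobe, only about 7.4 dB below the dc gain, while for $R=8$ it lands on the first null.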
Kaiser Overlap-Add Example
Matlab code:

    M = 33;          % window length
    beta = 8;
    w = kaiser(M,beta);
    R = floor(1.7*(M-1)/(beta+1)); % ROUGH estimate (gives R=6)
Figure 9.25 plots the overlap-added Kaiser windows, and Fig.9.26 shows the steady-state overlap-add (a time segment sometime after the first 30 samples). The ``predicted'' OLA is computed using the Poisson Summation Formula using the same Matlab code as before. Note that the Poisson summation formula gives exact results to within numerical precision. The upper (lower) bound was computed by summing (subtracting) the window-transform magnitudes at all frame-rate harmonics to (from) the dc gain of the window. This is one example of how the PSF can be used to estimate upper and lower bounds on OLA error.
The difference between measured steady-state overlap-add and that computed using the Poisson summation formula is shown in Fig.9.27. Again the two methods agree to within numerical precision.
Finally, Fig.9.28 shows the Kaiser window transform, with marks indicating the folding frequency at the chosen hop size $R=6$, as well as the frame rate and twice the frame rate. We see that the frame rate (hop size) has been well chosen for this window, as the folding frequency lies very close to what would be called the ``stop band'' of the Kaiser window transform. The ``stop-band rejection'' can be read from the height of the highest side lobe in Fig.9.28. We conclude that this example (a length 33 Kaiser window with $\beta=8$ and hop size $R=6$) represents a reasonably high-quality audio STFT that will be robust in the presence of spectral modifications. We expect such robustness whenever the folding frequency lies above the main lobe of the window transform.
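The Kaiser case can be reproduced with the sketch below, which forms the window from its defining Bessel-function formula (assumed here to match `kaiser(M,beta)`), overlap-adds it at hop size `R`, and computes the Poisson-summation bounds described above. The frame count and variable names are illustrative assumptions.

```matlab
% Sketch: Kaiser overlap-add ripple and Poisson-summation bounds.
M = 33;  beta = 8;
n = (0:M-1)';
% Kaiser window from its definition (assumed equal to kaiser(M,beta))
w = besseli(0, beta*sqrt(1 - (2*n/(M-1) - 1).^2)) / besseli(0, beta);
R = floor(1.7*(M-1)/(beta+1));       % rough hop-size estimate (gives 6)

nframes = 20;
ola = zeros((nframes-1)*R + M, 1);
for m = 0:nframes-1
  idx = m*R + (1:M)';
  ola(idx) = ola(idx) + w;
end
ss = ola(M+1:end-M);                 % steady-state overlap-add

dc = sum(w)/R;                       % dc (COLA) term of the PSF
Wmag = zeros(R-1,1);
for k = 1:R-1                        % frame-rate harmonics
  Wmag(k) = abs(sum(w .* exp(-1j*2*pi*k/R*n)));
end
ubound = dc + sum(Wmag)/R;           % upper bound on the OLA
lbound = dc - sum(Wmag)/R;           % lower bound on the OLA
relripple = (max(ss) - min(ss))/dc;  % peak-to-peak ripple relative to dc
```

Unlike the Hamming case, the overlap-add is not exactly constant here, but it stays between `lbound` and `ubound`, and the relative ripple is small because every frame-rate harmonic falls in the side lobes of the window transform.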
Remember that, for robustness in the presence of spectral modifications, the frame rate should be more than twice the highest main-lobe frequency.