Downsampled STFT Filter Banks
We now look at STFT filter banks that are downsampled by the factor $R>1$. The downsampling factor $R$ corresponds to a hop size of $R$ samples in the overlap-add view of the STFT. From the filter-bank point of view, the impact of $R>1$ is aliasing in the channel signals whenever the lowpass filter (analysis window) is less than ideal. When the conditions for perfect reconstruction are met, this aliasing is canceled in the reconstruction (when the filter-bank channel signals are remodulated and summed).
Downsampled STFT Filter Bank
So far we have considered only $R=1$ (the ``sliding'' DFT) in our filter-bank interpretation of the STFT. For $R>1$ we obtain a downsampled version of $X_m(\omega_k)$:
$$
\begin{aligned}
X_{mR}(\omega_k) &= \sum_{n=-\infty}^{\infty} [x(n)e^{-j\omega_k n}]\,\tilde{w}(mR-n), \qquad \tilde{w} \triangleq \hbox{Flip}(w) \\
&= (x_k \ast \tilde{w})(mR)
\end{aligned}
$$
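Let us define the downsampled time index as $\tilde{m} \triangleq mR$, so that

$$X_{\tilde{m}}(\omega_k) = (x_k \ast \tilde{w})(\tilde{m}) \qquad (10.25)$$

i.e., $X_{\tilde{m}}$ is the sliding STFT $X_m$ retained only at every $R$th sample.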
Note that this can be considered an implementation of a phase vocoder filter bank [212]. (See §G.5 for an introduction to the vocoder.)
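The relation between the sliding STFT and its downsampled version can be checked numerically. The following is a minimal Matlab sketch, not code from the text: the test signal, the channel index `k`, the sizes `M`, `R`, `N`, and the choice to let the window start at time $m$ (rather than being centered) are all illustrative assumptions.

```matlab
% Sketch: one STFT channel computed at every sample (R = 1) and then
% decimated, versus the same channel computed directly at hop size R.
% All names and parameter values are illustrative assumptions.
M  = 16;                          % window length
R  = 4;                           % downsampling factor (hop size)
N  = 64;                          % signal length
k  = 3;  wk = 2*pi*k/M;           % channel center frequency omega_k
n  = (0:N-1)';
x  = cos(0.2*pi*n) + 0.5*sin(0.45*pi*n);      % arbitrary test signal
w  = 0.54 - 0.46*cos(2*pi*(0:M-1)'/(M-1));    % symmetric Hamming window

xk = x .* exp(-1j*wk*n);          % heterodyne: x_k(n) = x(n) e^{-j omega_k n}

% Sliding (R = 1) channel signal: (x_k * w~)(m), with w~ = Flip(w)
Xfull  = conv(xk, flipud(w));
Xslide = Xfull(M:N);              % keep m = 0..N-M (window fully inside x)
Xdown  = Xslide(1:R:end);         % retain every R-th sample: X_{mR}(omega_k)

% Hop-size-R computation: one windowed inner product per frame
mR   = 0:R:N-M;                   % frame start times
Xhop = zeros(length(mR),1);
for i = 1:length(mR)
  Xhop(i) = sum(xk(mR(i)+1 : mR(i)+M) .* w);
end

err = max(abs(Xdown - Xhop));     % expected to be at round-off level
```

Computing only the retained frames (the loop) is what an FFT-based STFT does in practice; the full convolution is shown only to make the filter-bank interpretation explicit.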
Filter Bank Reconstruction
Since the channel signals are downsampled, we generally need interpolation in the reconstruction. Figure 9.18 indicates how we might pursue this. From studying the overlap-add framework, we know that the inverse STFT is exact when the window $w(n)$ is $\hbox{COLA}(R)$, that is, when $\sum_m w(n-mR)$ is constant. In only these cases can the STFT be considered a perfect reconstruction filter bank. From the Poisson Summation Formula in §8.3.1, we know that a condition equivalent to the COLA condition is that the window transform $W(\omega)$ have notches at all harmonics of the frame rate, i.e., $W(2\pi k/R)=0$ for $k=1,2,\ldots,R-1$. In the present context (filter-bank point of view), perfect reconstruction appears impossible for $R>1$, because for ideal reconstruction after downsampling, the channel anti-aliasing filter (the analysis window $w$) and the interpolation filter would have to be ideal lowpass filters.
This conclusion is true for any single channel, but not for the filter bank as a whole. We know, for example, from the overlap-add interpretation of the STFT that perfect reconstruction occurs for hop sizes greater than 1 as long as the COLA condition is met. This is an interesting paradox to which we will return shortly.
What we would expect in the filter-bank context is that the reconstruction can be made arbitrarily accurate given better and better lowpass filters (analysis window and interpolation filter) which cut off at $\omega_c = \pi/R$ (the folding frequency associated with downsampling by $R$). This is the right way to think about the STFT when spectral modifications are involved.
In Chapter 11 we will develop the general topic of perfect reconstruction filter banks, and derive various STFT processors as special cases.
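The equivalence between the COLA condition and notches at the frame-rate harmonics can be illustrated numerically. The sketch below is an assumption-laden example rather than code from the text: it uses a periodic Hann window with $M=32$ and $R=8$ (values chosen only for illustration) and checks both conditions directly.

```matlab
% Sketch: COLA in the time domain versus notches in the window transform.
% Window type and sizes are illustrative assumptions.
M = 32;  R = 8;
n = (0:M-1)';
w = 0.5 - 0.5*cos(2*pi*n/M);      % periodic Hann window

% Time domain: overlap-add shifted windows
N   = 8*M;
ola = zeros(N,1);
for m = 0:R:N-M
  ola(m+1:m+M) = ola(m+1:m+M) + w;
end
ss     = ola(M+1:N-M);            % steady state (all overlaps present)
ripple = max(ss) - min(ss);       % ~0 when the window is COLA(R)

% Frequency domain: window transform at the frame-rate harmonics 2*pi*k/R
k  = 1:R-1;
Wk = exp(-1j*2*pi*n*k/R).' * w;   % W(2*pi*k/R), k = 1..R-1
notchDepth = max(abs(Wk));        % ~0 when all harmonics are notched
```

Both `ripple` and `notchDepth` should come out at round-off level here, and the steady-state overlap-add should equal `sum(w)/R`, the dc term of the Poisson summation.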
Downsampling with Anti-Aliasing
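In OLA, the hop size $R$ is governed by the COLA constraint

$$\sum_m w(n-mR) = \hbox{constant} \qquad (10.26)$$

In FBS, $R$ is the downsampling factor in each filter-bank channel, so the window serves as the anti-aliasing filter. To avoid aliasing, the window transform $W(\omega)$ must be bandlimited to $(-\pi/R,\ \pi/R)$.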
Properly Anti-Aliasing Window Transforms
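For simplicity, define window-transform bandlimits at the first zero-crossings about the main lobe. Given the first zero of $W(\omega)$ at $L\cdot 2\pi/M$, we obtain

$$R_{\max} = \frac{M}{2L} \qquad (10.27)$$

The following table gives maximum hop sizes for various window types in the Blackman-Harris family, where $L$ is the number of terms in the window definition (equivalently, the half-main-lobe width in units of $2\pi/M$), $R_{\max}$ is the largest hop size that avoids main-lobe aliasing, and $R_C$ is the maximum COLA hop size.
L | Window Type (Length $M$) | $R_{\max}$ | $R_C$ |
1 | Rectangular | M/2 | M |
2 | Generalized Hamming | M/4 | M/2 |
3 | Blackman Family | M/6 | M/3 |
L | $L$-term Blackman-Harris | M/(2L) | M/L |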
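Both columns of the table follow from where the first zero of the window transform sits. As a short worked derivation (a sketch that treats the window as the periodic form of an $L$-term Blackman-Harris window, so that its transform has zeros at all multiples of $2\pi/M$ beyond the main lobe):

```latex
% No main-lobe aliasing: the main-lobe edge must not exceed the folding frequency
L\,\frac{2\pi}{M} \;\le\; \frac{\pi}{R}
\quad\Longrightarrow\quad
R \;\le\; R_{\max} = \frac{M}{2L}

% Maximum COLA hop size: the frame rate lands on the first null,
% so every harmonic of the frame rate also lands on a null
\frac{2\pi}{R} \;=\; L\,\frac{2\pi}{M}
\quad\Longrightarrow\quad
R_C = \frac{M}{L} = 2\,R_{\max}
```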

It is interesting to note that the maximum COLA hop size is double the maximum downsampling factor that avoids aliasing of the main lobe of the window transform in the FFT-bin signals $X_m(\omega_k)$. At the maximum COLA hop size this aliasing is quite heavy (see Fig.9.21), yet, because the COLA constraint is a sufficient condition for perfect reconstruction, it is all canceled in the reconstruction. The general theory of aliasing cancellation in perfect reconstruction filter banks will be taken up in Chapter 11.
It is important to realize that aliasing cancellation is disturbed by FBS spectral modifications. For robustness in the presence of spectral modifications, it is advisable to keep $R \le M/(2L)$. For compression, it is common to use a larger hop size together with a ``synthesis window'' in a weighted overlap-add (WOLA) scheme (§8.6).
Hop Sizes for WOLA
In the weighted overlap-add method, with the synthesis (output) window equal to the analysis (input) window, we have the following modification of the recommended maximum hop-size table:
L | In and Out Window (Length $M$) | $R_{\max}$ | $R_C$ |
1 | Rectangular | M/2 | M |
2 | Generalized Hamming | M/6 | M/3 |
3 | Blackman Family | M/10 | M/5 |
L | $L$-term Blackman-Harris | M/(4L-2) | M/(2L-1) |
- $R_{\max}$ is equal to $M$ divided by the main-lobe width in ``side lobes'' (units of $2\pi/M$), while
- $R_C$ is $M$ divided by the first notch frequency in the window transform, in the same units (lowest available frame rate at which all frame-rate harmonics are notched).
- For windows in the Blackman-Harris families, and with main-lobe widths defined from zero-crossing to zero-crossing, $R_{\max} = R_C/2$.
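The $L=2$ row of the WOLA table can be checked numerically. In the sketch below (an illustration under assumed values, not code from the text), a periodic Hamming window of length $M=30$ is used for both analysis and synthesis, so the effective overlap-added window is $w^2$. The length is chosen so that $M/2$ and $M/3$ are both integers.

```matlab
% Sketch: WOLA with identical analysis and synthesis Hamming windows.
% The effective window w.^2 should overlap-add to a constant at R = M/3
% but not at R = M/2.  M = 30 is an illustrative assumption.
M  = 30;
n  = (0:M-1)';
w  = 0.54 - 0.46*cos(2*pi*n/M);   % periodic Hamming (L = 2 terms)
w2 = w.^2;                        % analysis window times synthesis window

hops   = [M/2, M/3];              % plain-OLA limit M/L versus WOLA limit M/(2L-1)
ripple = zeros(size(hops));
for i = 1:length(hops)
  R   = hops(i);
  N   = 10*M;
  ola = zeros(N,1);
  for m = 0:R:N-M
    ola(m+1:m+M) = ola(m+1:m+M) + w2;
  end
  ss = ola(M+1:N-M);              % steady-state segment
  ripple(i) = (max(ss) - min(ss)) / mean(ss);   % relative peak-to-peak ripple
end
```

Here `ripple(1)` (hop $M/2$) should be clearly nonzero, while `ripple(2)` (hop $M/3$) should be at round-off level, because squaring the window adds a second-harmonic cosine term and thereby widens the main lobe of its transform.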
Constant-Overlap-Add (COLA) Cases
- Weak COLA: Window transform has zeros at frame-rate harmonics: $W(2\pi k/R) = 0$, $k = 1, 2, \ldots, R-1$
  - Perfect OLA reconstruction
  - Relies on aliasing cancellation in frequency domain
  - Aliasing cancellation is disturbed by spectral modifications
  - See Portnoff for further details
- Strong COLA: Window transform is bandlimited consistent with downsampling by the frame rate: $W(\omega) = 0$ for $|\omega| \ge \pi/R$
  - Perfect OLA reconstruction
  - No aliasing
  - Better for spectral modifications
  - Time-domain window infinitely long in ideal case
Hamming Overlap-Add Example
Matlab code:
    M = 33;          % window length
    w = hamming(M);
    R = (M-1)/2;     % maximum hop size
    w(M) = 0;        % 'periodic Hamming' (for COLA)
    %w(M) = w(M)/2;  % another solution,
    %w(1) = w(1)/2;  %  interesting to compare
Periodic-Hamming OLA from Poisson Summation Formula
Matlab code:
    ff = 1/R;                  % frame rate (fs=1)
    N = 6*M;                   % no. samples to look at OLA
    sp = ones(N,1)*sum(w)/R;   % dc term (COLA term)
    ubound = sp(1);            % try easy-to-compute upper bound
    lbound = ubound;           % and lower bound
    n = (0:N-1)';
    for (k=1:R-1)              % traverse frame-rate harmonics
      f=ff*k;
      csin = exp(j*2*pi*f*n);  % frame-rate harmonic
      % find exact window transform at frequency f
      Wf = w' * conj(csin(1:M));
      hum = Wf*csin;           % contribution to OLA "hum"
      sp = sp + hum/R;         % "Poisson summation" into OLA
      % Update lower and upper bounds:
      Wfb = abs(Wf);
      ubound = ubound + Wfb/R; % build upper bound
      lbound = lbound - Wfb/R; % build lower bound
    end
In this example, the overlap-add is theoretically a perfect constant (equal to `sum(w)/R` $= 1.08$) because the frame rate and all its harmonics coincide with nulls in the window transform (see Fig.9.24). A plot of the steady-state overlap-add and that computed using the Poisson Summation Formula (not shown) is constant to within numerical precision. The difference between the actual overlap-add and that computed using the PSF is shown in Fig.9.23. The difference is at the level of round-off error in double-precision (64-bit) floating-point computations. We thus verify that the overlap-add of a length $M=33$ Hamming window (with its last sample zeroed) using a hop size of $R=(M-1)/2=16$ samples is constant to within machine precision.
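The ``actual'' overlap-add referred to above is not listed in the text. One way it might be computed is sketched below; the window is built from the Hamming formula instead of calling `hamming(M)` so that the snippet needs no toolbox, and the variable names `ola`, `ss`, `dc`, and `err` are illustrative assumptions.

```matlab
% Sketch: direct overlap-add of the periodic Hamming window, compared with
% the constant sum(w)/R predicted by the Poisson summation formula.
M = 33;
n = (0:M-1)';
w = 0.54 - 0.46*cos(2*pi*n/(M-1));   % same values as the symmetric hamming(M)
w(M) = 0;                            % 'periodic Hamming' (for COLA)
R = (M-1)/2;                         % hop size (16)

N   = 6*M;                           % no. samples to look at OLA
ola = zeros(N,1);
for m = 0:R:N-M                      % add one shifted window per frame
  ola(m+1:m+M) = ola(m+1:m+M) + w;
end

ss  = ola(M+1:N-M);                  % steady-state segment
dc  = sum(w)/R;                      % predicted COLA constant
err = max(abs(ss - dc));             % expected to be at round-off level
```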
Figure 9.24 shows the zero-padded DFT of the modified Hamming window we're using (`w(M) = 0`) with the frame-rate harmonics marked. In this example ($R=(M-1)/2$), the upper half of the main lobe aliases into the lower half of the main lobe. (In fact, all energy above the folding frequency $\pi/R$ aliases into the lower half of the main lobe.) While this window and hop size still give perfect reconstruction under the STFT, spectral modifications will disturb the aliasing cancellation during reconstruction. This ``undersampled'' configuration is suitable as a basis for compression applications.
Note that if we were to cut $R$ in half to $R=(M-1)/4=8$, then the folding frequency in Fig.9.24 would coincide with the first null in the window transform. Since the frame rate and all its harmonics continue to land on nulls in the window transform, overlap-add is still exact. At this reduced hop size, however, the STFT becomes much more robust to spectral modifications, because all aliasing in the effective downsampled filter bank is now weighted by the side lobes of the window transform, with no aliasing components coming from within the main lobe. This is the central result of [9].
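The position of the folding frequency relative to the main lobe can be checked for both hop sizes. The following sketch evaluates the window transform directly at $\pi/R$; the anonymous function `W` and the variable names are illustrative assumptions, and the window is again built from the Hamming formula to avoid toolbox calls.

```matlab
% Sketch: where does the folding frequency pi/R fall relative to the main
% lobe of the periodic Hamming window transform, for R = 16 and R = 8?
M = 33;
n = (0:M-1)';
w = 0.54 - 0.46*cos(2*pi*n/(M-1));
w(M) = 0;                                 % periodic Hamming, period M-1 = 32

W = @(om) abs(sum(w .* exp(-1j*om*n)));   % |W(omega)| by direct summation
firstNull = 2*(2*pi/(M-1));               % main-lobe edge: L = 2 bins of width 2*pi/32

for R = [16 8]
  fold = pi/R;                            % folding frequency for hop size R
  fprintf('R = %2d: fold/firstNull = %.2f, |W(fold)|/W(0) = %.3g\n', ...
          R, fold/firstNull, W(fold)/W(0));
end
```

At $R=16$ the folding frequency sits halfway up the main lobe, where the window transform is still a substantial fraction of its peak, whereas at $R=8$ it lands on the first null.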
Kaiser Overlap-Add Example
Matlab code:
    M = 33;          % Window length
    beta = 8;
    w = kaiser(M,beta);
    R = floor(1.7*(M-1)/(beta+1)); % ROUGH estimate (gives R=6)
Figure 9.25 plots the overlap-added Kaiser windows, and Fig.9.26 shows the steady-state overlap-add (a time segment sometime after the first 30 samples). The ``predicted'' OLA is computed from the Poisson Summation Formula using the same Matlab code as before. Note that the Poisson summation formula gives exact results to within numerical precision. The upper (lower) bound was computed by summing (subtracting) the window-transform magnitudes at all frame-rate harmonics to (from) the dc gain of the window. This is one example of how the PSF can be used to estimate upper and lower bounds on OLA error.
The difference between measured steady-state overlap-add and that computed using the Poisson summation formula is shown in Fig.9.27. Again the two methods agree to within numerical precision.
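The bound computation described above can be sketched compactly and checked against a directly computed overlap-add. This is an illustrative example, not the code behind the figures: the Kaiser window is built from its Bessel-function definition (assumed to match `kaiser(M,beta)`) so that only the core function `besseli` is needed, and the variable names are assumptions.

```matlab
% Sketch: Kaiser-window OLA ripple versus the PSF-based upper/lower bounds.
M = 33;  beta = 8;
n = (0:M-1)';
% Kaiser window from its definition (assumed equal to kaiser(M,beta))
w = besseli(0, beta*sqrt(1 - (2*n/(M-1) - 1).^2)) / besseli(0, beta);
R = floor(1.7*(M-1)/(beta+1));        % ROUGH estimate (gives R=6)

% Direct overlap-add, steady-state segment
N   = 6*M;
ola = zeros(N,1);
for m = 0:R:N-M
  ola(m+1:m+M) = ola(m+1:m+M) + w;
end
ss = ola(M+1:N-M);

% PSF bounds: dc term plus/minus the magnitudes at the frame-rate harmonics
k      = 1:R-1;
Wk     = abs(exp(-1j*2*pi*n*k/R).' * w);   % |W(2*pi*k/R)|
dc     = sum(w)/R;
ubound = dc + sum(Wk)/R;
lbound = dc - sum(Wk)/R;
relRipple = (max(ss) - min(ss)) / dc;      % measured relative OLA ripple
```

Because the Kaiser window transform has no exact nulls at the frame-rate harmonics, the overlap-add is not exactly constant. The bounds follow from the triangle inequality applied to the Poisson summation, so the measured steady-state values must lie between `lbound` and `ubound`.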
Finally, Fig.9.28 shows the Kaiser window transform, with marks indicating the folding frequency at the chosen hop size $R=6$, as well as the frame rate and twice the frame rate. We see that the frame rate (hop size) has been well chosen for this window, as the folding frequency lies very close to what would be called the ``stop band'' of the Kaiser window transform. The ``stop-band rejection'' can be read off as the height of the highest side lobe in Fig.9.28. We conclude that this example, a length 33 Kaiser window with $\beta=8$ and hop size $R=6$, represents a reasonably high-quality audio STFT that will be robust in the presence of spectral modifications. We expect such robustness whenever the folding frequency lies above the main lobe of the window transform.
Remember that, for robustness in the presence of spectral modifications, the frame rate should be more than twice the highest main-lobe frequency.