In the Short-Time Fourier Transform, which implements a uniform
FIR filter bank (Chapter 9), each FFT bin can be regarded as one
sample of the filter-bank output in one channel. It is elementary
that summing adjacent filter-bank signals sums the corresponding
pass-bands to create a wider pass-band. Summing adjacent FFT bins in
the STFT, therefore, synthesizes one sample from a wider pass-band
implemented using an FFT. This is essentially how a constant-Q
transform is created from an FFT in  (using a
different frequency-weighting, or ``smoothing kernel''). However,
when making a filter bank, as opposed to only a transform used for
spectrographic purposes, we must be able to step the FFT through time
and compute properly sampled time-domain filter-bank signals.
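The bin-summing idea above can be checked numerically. The following is a minimal sketch, not the book's implementation: it assumes NumPy, a Hann window, a frame length of 64, and an arbitrary bin group 8 through 15, and the helper names `band_sample` and `band_kernel` are hypothetical. It compares the sum of adjacent FFT bins of one windowed frame against a single inner product with the equivalent wide-band kernel (the window multiplied by the sum of the corresponding bin sinusoids).

```python
import numpy as np

def band_sample(frame, window, k_lo, k_hi):
    """Sum FFT bins k_lo..k_hi (inclusive) of one windowed frame."""
    X = np.fft.fft(frame * window)
    return X[k_lo:k_hi + 1].sum()

def band_kernel(window, k_lo, k_hi):
    """Equivalent wide-band kernel: window times the sum of the bin sinusoids."""
    N = len(window)
    n = np.arange(N)
    k = np.arange(k_lo, k_hi + 1)[:, None]
    return window * np.exp(-2j * np.pi * k * n / N).sum(axis=0)

# Illustrative values only (assumptions, not taken from the text).
rng = np.random.default_rng(0)
N = 64
frame = rng.standard_normal(N)
window = np.hanning(N)

y_sum = band_sample(frame, window, 8, 15)              # sum of adjacent bins
y_direct = np.dot(band_kernel(window, 8, 15), frame)   # one wide-band "filter tap"
```

The two values agree to within rounding error because the DFT is linear. This gives only one output sample for one frame; obtaining a properly sampled band signal still requires stepping the frame through time.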
Summing STFT Bins
The wider pass-band created by adjacent-channel summing requires a higher sampling rate in the time domain to avoid aliasing. As a result, the maximum STFT ``hop size'' is limited by the widest pass-band in the filter bank. In audio filter banks, low-frequency channels have narrow bandwidths while high-frequency channels are wider, so it is the high-frequency channels that force a small hop size for the STFT. Consequently, the low-frequency channels are heavily oversampled (in time) when the high-frequency channels are merely adequately sampled [30,88]. In an octave filter bank, for example, the top octave, which occupies the entire upper half of the spectrum, requires a time-domain step size of no more than two samples if aliasing of the band is to be avoided. Each octave down is then oversampled (in time) by an additional factor of 2.
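The octave example can be tabulated with a short sketch. It rests on simplifying assumptions that are not from the text: ideal (brick-wall) pass-bands, real band signals, an FFT size of 1024, and four octaves. The function name `octave_hops` is hypothetical. Each octave's maximum hop is taken as the ratio of the half-spectrum width to the band width, in bins.

```python
def octave_hops(n_fft, n_octaves):
    """Return the common STFT hop and per-octave details, top octave first.

    Assumes ideal pass-bands, so a band covering 1/2**m of the
    spectrum from 0 to fs/2 tolerates a hop of at most 2**m samples.
    """
    half = n_fft // 2                        # bins spanning 0 .. fs/2
    bands = []
    for m in range(1, n_octaves + 1):
        lo, hi = half >> m, half >> (m - 1)  # bin range [lo, hi) of octave m
        width = hi - lo
        bands.append({"bins": (lo, hi), "max_hop": half // width})
    # The widest band (top octave) dictates the hop for the whole STFT.
    common_hop = min(b["max_hop"] for b in bands)
    for b in bands:
        b["oversampling"] = b["max_hop"] // common_hop
    return common_hop, bands

common_hop, bands = octave_hops(1024, 4)
for b in bands:
    print(b["bins"], b["max_hop"], b["oversampling"])
```

Under these assumptions the common hop is 2 samples, and the oversampling factor doubles with each octave down (1, 2, 4, 8). A practical window has a nonzero transition bandwidth, so real designs would need somewhat smaller hops than this idealized count suggests.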
Inverse Transforming STFT Bin Groups