Figure 7.4 shows a multiresolution STFT for the same speech signal that was analyzed to produce Fig.7.2. The bandlimits in Hz for the five combined FFTs were , where the last two (in parentheses) were not used due to the signal sampling rate being only kHz. The corresponding window lengths in milliseconds were , where, again, the last two are not needed for this example. Our hop size is chosen to be 1 ms, giving 75% overlap in the highest-frequency channel, and more overlap in lower-frequency channels. Thus, all frequency channels are oversampled along the time dimension. Since many frequency channels from each FFT will be combined via smoothing to form the ``excitation pattern'' (see next section), temporal oversampling is necessary in all channels to avoid uneven weighting of data in the time domain due to the hop size being too large for the shortened effective time-domain windows.
A Note on Hop Size