Multiresolution STFT

Figure 7.4 shows a multiresolution STFT for the same speech signal that was analyzed to produce Fig. 7.2. The band limits in Hz for the five combined FFTs were $[0,80,500,1250,2540,4050,(10000,22050)]$, where the last two (in parentheses) were not used because the signal sampling rate was only $8$ kHz. The corresponding window lengths in milliseconds were $[64,32,16,8,4,(2,2)]$, where, again, the last two are not needed for this example. The hop size is chosen to be 1 ms, giving 75% overlap in the highest-frequency channel (4 ms window) and more overlap in the lower-frequency channels. Thus, all frequency channels are oversampled along the time dimension. Since many frequency channels from each FFT will be combined via smoothing to form the "excitation pattern" (see next section), temporal oversampling is necessary in all channels: without it, the hop size would be too large for the shortened effective time-domain windows, and the data would be weighted unevenly in the time domain.
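The overlap figures quoted above follow directly from the window lengths and the 1 ms hop. As a minimal sketch (not the code used to produce the figure), the following Python tabulates each band's window length in samples and its overlap fraction, taking the overlap to be $1 - \mbox{hop}/\mbox{window}$. The names `channel_table`, `band_limits_hz`, and `window_ms` are illustrative assumptions, and only the five bands actually used at the $8$ kHz sampling rate are included.

```python
# Sketch: per-band window length and overlap for the multiresolution STFT
# parameters quoted in the text. All names here are illustrative only.

fs = 8000                                        # sampling rate in Hz (from the text)
band_limits_hz = [0, 80, 500, 1250, 2540, 4050]  # limits of the five bands used
window_ms = [64, 32, 16, 8, 4]                   # window length per band, in ms
hop_ms = 1                                       # common hop size, in ms


def channel_table(fs, band_limits_hz, window_ms, hop_ms):
    """Return one dict per band with window/hop sizes and overlap fraction."""
    hop_samples = int(round(hop_ms * fs / 1000))
    rows = []
    for lo, hi, w_ms in zip(band_limits_hz[:-1], band_limits_hz[1:], window_ms):
        rows.append({
            "band_hz": (lo, hi),
            "window_samples": int(round(w_ms * fs / 1000)),
            "hop_samples": hop_samples,
            # Fraction of each window shared with its successor.
            "overlap": 1.0 - hop_ms / w_ms,
        })
    return rows


channels = channel_table(fs, band_limits_hz, window_ms, hop_ms)
for ch in channels:
    lo, hi = ch["band_hz"]
    print(f"{lo:5d}-{hi:5d} Hz: window {ch['window_samples']:4d} samples, "
          f"hop {ch['hop_samples']} samples, overlap {100 * ch['overlap']:.1f}%")
```

Under these assumptions the shortest (4 ms) window gives 75% overlap, matching the text, and the overlap rises as the windows lengthen toward 64 ms in the lowest band.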

Figure 7.4: Multiresolution Short-Time Fourier Transform (MRSTFT). Compare to the fixed-resolution STFT in Fig. 7.2.
