Two Dual Interpretations of the STFT

The STFT $ \tilde{X}_m^{w,z}(\omega_k )$ can be viewed as a function of either frame-time $ m$ or bin-frequency $ k$ . We will develop both points of view in this book.

At each frame time $ m$ , the STFT can be regarded as producing a Fourier transform centered around that time. As $ m$ advances, a sequence of spectral transforms is obtained. This is depicted graphically in Fig. 9.1, and it forms the basis of the overlap-add method for Fourier analysis, modification, and resynthesis [9]. It is also the basis for transform coders [16,284].

In an exact Fourier duality, each bin $ \tilde{X}_m^{w,z}(\omega_k )$ of the STFT can be regarded as a sample of the complex signal at the output of a lowpass filter whose input is $ \tilde{x}_m^{w,z}(n) e^{-j\omega_k n T}$ . As discussed in §9.1.2, this signal is obtained from $ \tilde{x}_m^{w,z}(n)$ by frequency-shifting it so that frequency $ \omega_k$ is translated down to 0 Hz. For each $ k=0,1,\ldots,N-1$ , the time-domain signal $ \tilde{X}_m^{w,z}(\omega_k )$ , for $ m=\ldots,-2,-1,0,1,2,\ldots$ , is the output of the $ k$ th "filter bank channel." In this "filter bank" interpretation, the hop size $ R$ can be interpreted as the downsampling factor applied to each bin-filter output, and the analysis window $ w(\,\cdot\,)$ is seen as the impulse response of the anti-aliasing filter used prior to downsampling. The window transform $ W(\omega)$ is also the frequency response of each channel filter (translated to dc). This point of view is depicted graphically in Fig. 9.2 and elaborated further in Chapter 9.
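The equivalence of the two views can be checked numerically. The following is a minimal sketch, not the book's implementation. It assumes a causal window of length $ M$ , an FFT size equal to $ M$ (no zero padding), $ T=1$ , and a phase reference at absolute time 0 rather than at the start of each frame. The function names `stft_fft_view` and `stft_filter_bank_view` are hypothetical. With a causal window indexed from 0, the channel filter's impulse response is the time-reversed window, which coincides with the window itself when the window is symmetric.

```python
import numpy as np

def stft_fft_view(x, w, R):
    """Fourier-transform view: one FFT per windowed frame.

    Assumes FFT size == len(w) and an absolute-time phase reference.
    """
    M = len(w)
    n_frames = (len(x) - M) // R + 1
    k = np.arange(M)
    X = np.empty((n_frames, M), dtype=complex)
    for m in range(n_frames):
        frame = x[m * R : m * R + M] * w
        # The FFT measures phase from the start of the frame; this rotation
        # moves the phase reference to time 0 of the whole signal.
        X[m] = np.fft.fft(frame) * np.exp(-2j * np.pi * k * m * R / M)
    return X

def stft_filter_bank_view(x, w, R):
    """Filter-bank view: heterodyne, lowpass filter, downsample by R."""
    M = len(w)
    n = np.arange(len(x))
    h = w[::-1]  # time-reversed window used as the lowpass impulse response
    channels = []
    for k in range(M):
        # Shift bin frequency omega_k = 2*pi*k/M down to dc.
        shifted = x * np.exp(-2j * np.pi * k * n / M)
        # Lowpass filter, keeping only fully overlapped output samples.
        lowpassed = np.convolve(shifted, h, mode="valid")
        # Keep every R-th sample: the hop size acts as the downsampling factor.
        channels.append(lowpassed[::R])
    return np.array(channels).T  # shape (n_frames, M), same as the FFT view

# Illustrative test signal and parameters (arbitrary choices)
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
w = np.hamming(32)
R = 8

X_fft = stft_fft_view(x, w, R)
X_fb = stft_filter_bank_view(x, w, R)
print("max abs difference:", np.max(np.abs(X_fft - X_fb)))
```

Each row of the result is one frame's spectrum (the Fourier-transform view), while each column is the downsampled output of one channel filter (the filter-bank view). The filter-bank routine is far more expensive, since it filters at the full rate before discarding all but every $ R$ th sample; it is meant only to make the interpretation concrete.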
