Filter-Bank Summation (FBS) Interpretation of the STFT

We can group the terms in the STFT definition differently to obtain the filter-bank interpretation:

$\displaystyle X_m(\omega_k) = \sum_{n=-\infty}^\infty \underbrace{\left[x(n)e^{-j\omega_k n}\right]}_{x_k(n)} w(n-m) = \left[x_k \ast \hbox{\sc Flip}(w)\right](m) \qquad (9.2)$
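Eq. (9.2) can be checked numerically. The sketch below assumes NumPy, a finite-length signal and window that are zero outside their arrays (the window starting at $n=0$), and channel frequencies $\omega_k = 2\pi k/N$; the function names `stft_sum` and `stft_conv` are hypothetical. It evaluates the defining sum directly and compares it with the heterodyne-then-convolve form.

```python
import numpy as np

def stft_sum(x, w, omega_k, m):
    """Direct sum X_m(omega_k) = sum_n x(n) e^{-j omega_k n} w(n - m).

    x and w are treated as zero outside their array bounds; w starts at n = 0.
    """
    total = 0j
    for n in range(len(x)):
        if 0 <= n - m < len(w):
            total += x[n] * np.exp(-1j * omega_k * n) * w[n - m]
    return total

def stft_conv(x, w, omega_k):
    """Filter-bank form: heterodyne x, then convolve with Flip(w).

    Output index j corresponds to frame time m = j - (len(w) - 1),
    because Flip(w) occupies times -(len(w) - 1), ..., 0.
    """
    n = np.arange(len(x))
    x_k = x * np.exp(-1j * omega_k * n)   # x_k(n) = x(n) e^{-j omega_k n}
    return np.convolve(x_k, w[::-1])      # (x_k * Flip(w))(m)

rng = np.random.default_rng(0)
x = rng.standard_normal(40)
w = np.hanning(8)
N, k = 8, 3
omega_k = 2 * np.pi * k / N
y = stft_conv(x, w, omega_k)

# Every nonzero frame time m agrees between the two forms.
for m in range(-(len(w) - 1), len(x)):
    assert np.isclose(stft_sum(x, w, omega_k, m), y[m + len(w) - 1])
```

The index offset `len(w) - 1` is the only bookkeeping needed: `np.convolve` starts its output at array index 0, whereas the flipped window begins at time $-(M-1)$ for a length-$M$ window.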

As explained further below (and illustrated in Figures 8.3, 8.4, and 8.5), under the filter-bank interpretation the spectrum of $ x$ is first rotated along the unit circle in the $ z$ plane so as to shift frequency $ \omega_k$ down to 0 (via modulation by $ e^{-j\omega_k n}$ in the time domain), forming the heterodyned signal $ x_k(n)\mathrel{\stackrel{\Delta}{=}}x(n)e^{-j\omega_k n}$. Next, the heterodyned signal $ x_k(n)$ is lowpass-filtered to a narrow band about frequency 0 by convolving it with the time-reversed window $ \hbox{\sc Flip}(w)$. The STFT is thus interpreted as a frequency-ordered collection of narrow-band time-domain signals, as depicted in Fig.8.2. In other words, the STFT can be seen as a uniform filter bank in which the input signal $ x(n)$ is converted to a set of $ N$ time-domain output signals $ X_m(\omega_k)$, $ k=0,1,\ldots,N-1$, one for each channel of the $ N$-channel filter bank.

[Figure 8.2: the STFT viewed as a filter bank.]

Expanding on the previous paragraph, the STFT (8.2) is computed by the following operations:

  • Frequency-shift $ x(n)$ by $ -\omega_k$ to get $ x_k(n) \mathrel{\stackrel{\Delta}{=}}e^{-j\omega_k n}x(n)$.
  • Convolve $ x_k(n)$ with $ {\tilde w}\mathrel{\stackrel{\Delta}{=}}\hbox{\sc Flip}(w)$ to get $ X_m(\omega_k)$:

    $\displaystyle X_m(\omega_k) = \sum_{n=-\infty}^\infty x_k(n){\tilde w}(m-n) = (x_k * {\tilde w})(m)$
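The two operations above can be sketched for all $N$ channels at once. This is a minimal illustration assuming NumPy, $\omega_k = 2\pi k/N$, and $N \ge M$ for a length-$M$ window; `fbs_stft` and `frame_fft` are hypothetical helper names. As a cross-check it compares each column with the FFT of a single windowed frame. Because the STFT here measures phase in absolute time $n$ while the FFT measures it from the frame start, the frame FFT is multiplied by $e^{-j\omega_k m}$.

```python
import numpy as np

def fbs_stft(x, w, N):
    """Sliding STFT (hop 1), computed one channel at a time.

    Row k is channel k; column j holds frame time m = j - (len(w) - 1).
    """
    n = np.arange(len(x))
    rows = []
    for k in range(N):
        omega_k = 2 * np.pi * k / N
        x_k = np.exp(-1j * omega_k * n) * x       # frequency-shift x(n) by -omega_k
        rows.append(np.convolve(x_k, w[::-1]))    # convolve with w~ = Flip(w)
    return np.array(rows)

def frame_fft(x, w, N, m):
    """X_m(omega_k), k = 0..N-1, from one windowed frame and an FFT.

    Assumes 0 <= m <= len(x) - len(w) and N >= len(w) (the FFT zero-pads to N).
    The factor e^{-j omega_k m} refers the frame-relative FFT phase back to
    absolute time n, as in the STFT definition used here.
    """
    frame = x[m:m + len(w)] * w
    k = np.arange(N)
    return np.exp(-2j * np.pi * k * m / N) * np.fft.fft(frame, N)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
w = np.hanning(16)
N = 16
X = fbs_stft(x, w, N)
m = 10
print(np.allclose(X[:, m + len(w) - 1], frame_fft(x, w, N, m)))   # True
```

The loop over channels mirrors the filter-bank view literally and costs $O(N)$ convolutions; in practice the FFT-per-frame form is far cheaper, which is why the two views are worth relating.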

The STFT output signal $ X_m(\omega_k)$ is regarded as a time-domain signal (time index $ m$) coming out of the $ k$th channel of an $ N$-channel filter bank. The center frequency of the $ k$th channel filter is $ \omega_k = 2\pi k/N$, $ k=0,1,\ldots,N-1$. Each channel output signal is a baseband signal; that is, it is centered about dc, with the "carrier term" $ e^{j\omega_k m}$ removed by "demodulation" (frequency-shifting). In particular, the $ k$th channel signal is constant whenever the input signal is a complex sinusoid $ Ae^{j\omega_k n}$ tuned exactly to frequency $ \omega_k$; for a real sinusoid, the negative-frequency component leaves a small residual ripple.
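The constant-output property can be seen in a few lines. This sketch assumes NumPy and arbitrary example values for $N$, $M$, the signal length, the channel index, and the complex amplitude $A$ (none of these come from the text). Demodulation turns the tuned complex sinusoid into the constant $A$, and the lowpass filter then outputs $A\sum_n w(n)$ at every frame where the window lies fully inside the signal.

```python
import numpy as np

N, M, L = 8, 16, 100        # channels, window length, signal length (assumed)
k, A = 2, 0.7 - 0.3j        # tuned channel index and complex amplitude (assumed)
omega_k = 2 * np.pi * k / N
n = np.arange(L)
w = np.hanning(M)

x = A * np.exp(1j * omega_k * n)      # complex sinusoid tuned exactly to omega_k
x_k = x * np.exp(-1j * omega_k * n)   # demodulated: x_k(n) = A (up to rounding)
y = np.convolve(x_k, w[::-1])         # channel-k output; index j <-> m = j - (M - 1)

steady = y[M - 1:L]                   # frames m = 0 .. L - M, window fully inside x
print(np.allclose(steady, A * w.sum()))   # True: constant output A * sum(w)
```

Replacing `x` with a real cosine at $\omega_k$ would add a component at $-2\omega_k$ after demodulation, which the window attenuates but does not remove, so the output would ripple slightly around $(A/2)\sum_n w(n)$.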

Note that the STFT analysis window $ w$ is now interpreted as (the flip of) a lowpass-filter impulse response. Since the analysis window $ w$ in the STFT is typically symmetric, we usually have $ \hbox{\sc Flip}(w)=w$. Equivalently, channel $ k$ applies this lowpass filter frequency-shifted to center frequency $ \omega_k$, i.e., a bandpass filter centered on $ \omega_k$, followed by demodulation to dc. If the cut-off frequency of the window transform is $ \omega_c$ (typically half a main-lobe width), then each channel signal is approximately confined to $ \vert\omega\vert\le\omega_c$ and can be downsampled by a factor of up to about $ \pi/\omega_c$ without significant aliasing. This downsampling factor is the FBS counterpart of the hop size $ R$ in the OLA context.
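A minimal sketch of the downsampling step, assuming NumPy; the helper names, the window length, and the choice $\omega_c = 4\pi/M$ (half the zero-to-zero main-lobe width of a length-$M$ Hann window, only approximate for the symmetric `np.hanning`) are illustrative assumptions. A complex channel signal confined to $|\omega|\le\omega_c$ occupies a band of width $2\omega_c$, so decimating by $R$ avoids aliasing when $2\pi/R \ge 2\omega_c$, i.e. $R \le \pi/\omega_c$. The code also shows that keeping every $R$th sample of each sliding channel reproduces exactly the STFT evaluated at hop size $R$.

```python
import numpy as np

def channel_outputs(x, w, N):
    """Sliding (R = 1) filter-bank STFT; column j holds frame time m = j - (len(w) - 1)."""
    n = np.arange(len(x))
    return np.array([np.convolve(np.exp(-2j * np.pi * k * n / N) * x, w[::-1])
                     for k in range(N)])

def stft_at_hop(x, w, N, R):
    """X_m(omega_k) evaluated directly at m = 0, R, 2R, ... (frames fully inside x)."""
    M = len(w)
    k = np.arange(N)
    cols = []
    for m in range(0, len(x) - M + 1, R):
        n = np.arange(m, m + M)                        # absolute time indices of the frame
        basis = np.exp(-2j * np.pi * np.outer(k, n) / N)
        cols.append(basis @ (x[m:m + M] * w))          # sum_n x(n) w(n - m) e^{-j omega_k n}
    return np.array(cols).T

M = N = 32
w = np.hanning(M)
omega_c = 4 * np.pi / M     # assumed cutoff: half the Hann main-lobe width
R = M // 4                  # equals floor(pi / omega_c) for this omega_c

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
X_slide = channel_outputs(x, w, N)
X_dec = X_slide[:, M - 1:len(x):R]      # keep channel samples at m = 0, R, 2R, ...
X_R = stft_at_hop(x, w, N, R)
print(np.allclose(X_dec, X_R))          # True
```

Decimation itself always yields the hop-$R$ STFT; the bound $R \le \pi/\omega_c$ only decides whether the decimated channel signals still capture each channel's band without significant aliasing.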

Figure 8.3 illustrates the filter-bank interpretation for $ R=1$ (the "sliding STFT"). The input signal $ x(n)$ is frequency-shifted by a different amount for each channel and lowpass-filtered by the (flipped) window.

[Figure 8.3: ... where $ x_k(n)\mathrel{\stackrel{\Delta}{=}}x(n)e^{-j\omega_k n}$.]
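The per-channel frequency shift and lowpass filtering of the sliding STFT can be illustrated numerically. This is a minimal sketch assuming NumPy, a Hann window from `np.hanning`, and arbitrary example values for $N$, $M$, and the signal length (all illustrative, not from the text). The input holds two complex tones centered on channels 3 and 8; each channel moves its own tone to dc, where it passes with gain $\sum_n w(n)$, while the other tone lands at $\pm(\omega_8-\omega_3)$ and is strongly attenuated by the window transform.

```python
import numpy as np

N, M, L = 16, 32, 256       # channels, window length, signal length (assumed values)
w = np.hanning(M)           # symmetric window, so Flip(w) equals w
n = np.arange(L)

def omega(k):
    """Center frequency of channel k."""
    return 2 * np.pi * k / N

def channel(x, w, omega_k):
    """One channel: shift omega_k down to dc, then lowpass with Flip(w)."""
    x_k = x * np.exp(-1j * omega_k * np.arange(len(x)))
    return np.convolve(x_k, w[::-1])

# Two unit-amplitude complex tones centered on channels 3 and 8.
x = np.exp(1j * omega(3) * n) + np.exp(1j * omega(8) * n)

for k in (3, 8):
    steady = channel(x, w, omega(k))[M - 1:L]   # frames with the window fully inside x
    # Relative deviation from the ideal single-tone output sum(w):
    # this is the leakage from the other tone through the window's sidelobes.
    rel_err = np.max(np.abs(steady - w.sum())) / w.sum()
    print(k, rel_err)
```

With these settings the printed deviations are small fractions of a percent, reflecting how far the other tone sits in the window transform's sidelobes; moving the two tones to adjacent channels would increase the leakage.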
