Downsampled STFT Filter Bank

So far we have considered only $ R=1$ (the "sliding" DFT) in our filter-bank interpretation of the STFT. For $ R>1$ we obtain a downsampled version of $ X_m(\omega_k)$:

\begin{eqnarray*}
X_{mR}(\omega_k) &=& \sum_{n=-\infty}^\infty [x(n)e^{-j\omega_kn}]\tilde{w}(mR-n)
\hspace{1.2cm} (\tilde{w} \mathrel{\stackrel{\Delta}{=}}\hbox{\sc Flip}(w)) \\
&=& (x_k \ast {\tilde w})(mR)
\end{eqnarray*}

Let us define the downsampled time index as $ \tilde{m} \mathrel{\stackrel{\Delta}{=}}mR$ so that

$\displaystyle X_{\tilde{m}}(\omega_k) = \sum_{n=-\infty}^\infty [x(n)e^{-j\omega_kn}]\tilde{w}(\tilde{m}-n) \mathrel{\stackrel{\Delta}{=}}\left(x_k \ast {\tilde w}\right)(\tilde{m})$ (10.25)

i.e., $ X_{\tilde{m}}$ is simply $ X_m$ evaluated at every $ R$th sample, as shown in Fig. 9.17.

[Figure 9.17: Downsampled STFT filter bank.]
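The statement that $ X_{\tilde{m}}$ is $ X_m$ kept at every $ R$th sample can be checked numerically. The following is a minimal sketch, not an implementation from the text: the function names `sliding_channel` and `hopped_channel`, the test signal, and the choice of a causal length-32 Hann window (support $ 0,\ldots,M-1$, which differs from a zero-centered window only by a time shift) are all assumptions made for illustration. One channel is computed with $ R=1$ by convolving the demodulated signal $ x_k$ with $ \hbox{\sc Flip}(w)$ and then downsampled, and the result is compared with a frame-by-frame computation at hop size $ R$.

```python
import numpy as np

def sliding_channel(x, w, omega_k):
    """X_m(omega_k) for m = 0..len(x)-1, i.e. hop size R = 1 (hypothetical helper)."""
    n = np.arange(len(x))
    x_k = x * np.exp(-1j * omega_k * n)      # demodulated signal x_k(n)
    full = np.convolve(x_k, w[::-1])         # x_k convolved with Flip(w), causal indexing
    return full[len(w) - 1:]                 # drop start-up samples so index 0 is m = 0

def hopped_channel(x, w, omega_k, R):
    """X_{mR}(omega_k) computed one frame at a time with hop size R (hypothetical helper)."""
    M = len(w)
    n = np.arange(len(x) + M)
    # Zero-pad so that frames running past the end of x are well defined.
    x_k = np.concatenate([x, np.zeros(M)]) * np.exp(-1j * omega_k * n)
    return np.array([np.dot(x_k[s:s + M], w) for s in range(0, len(x), R)])

# Assumed test setup: random signal, Hann window, channel k = 3 of N = 32, hop R = 8.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
w = np.hanning(32)
N, k, R = 32, 3, 8
omega_k = 2 * np.pi * k / N

X_sliding = sliding_channel(x, w, omega_k)   # R = 1
X_hopped = hopped_channel(x, w, omega_k, R)  # R = 8
max_err = np.max(np.abs(X_sliding[::R] - X_hopped))
```

Here `X_sliding[::R]` plays the role of the $ \downarrow R$ block in the figure, and `max_err` should be at the level of floating-point round-off.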

Note that this can be considered an implementation of a phase vocoder filter bank [212]. (See §G.5 for an introduction to the vocoder.)

Filter Bank Reconstruction

[Figure 9.18: Interpolated, remodulated, filter-bank sum.]

Since the channel signals are downsampled, we generally need interpolation in the reconstruction. Figure 9.18 indicates how we might pursue this. From studying the overlap-add framework, we know that the inverse STFT is exact when the window $ w(n)$ is $ \hbox{\sc Cola}(R)$, that is, when $ \hbox{\sc Alias}_R(w)$ is constant. Only in these cases can the STFT be considered a perfect reconstruction filter bank. From the Poisson Summation Formula in §8.3.1, we know that a condition equivalent to the COLA condition is that the window transform $ W(\omega)$ have notches at all harmonics of the frame rate, i.e., $ W(2\pi k/R)=0$ for $ k=1,2,\ldots,R-1$.

In the present context (filter-bank point of view), perfect reconstruction appears impossible for $ R>1$, because ideal reconstruction after downsampling requires the channel anti-aliasing filter ($ w$) and interpolation filter ($ f$) to be ideal lowpass filters. This conclusion is true for any single channel, but not for the filter bank as a whole. We know, for example, from the overlap-add interpretation of the STFT that perfect reconstruction occurs for hop sizes greater than 1 as long as the COLA condition is met. This is an interesting paradox to which we will return shortly.
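The equivalence between the COLA condition and the notch condition on $ W(\omega)$ can be illustrated numerically. The sketch below is only an example under stated assumptions: it uses a periodic Hann window of length $ M=64$ with hop size $ R=16$, and the helper names `alias` and `window_transform_at_frame_harmonics` are hypothetical. It forms $ \hbox{\sc Alias}_R(w)$ by summing the window's length-$ R$ segments, and evaluates $ W(2\pi k/R)$ directly from the definition of the DTFT for $ k=1,\ldots,R-1$.

```python
import numpy as np

def alias(w, R):
    """Alias_R(w): add up the length-R segments of w (hypothetical helper)."""
    pad = (-len(w)) % R                          # zero-pad to a multiple of R
    padded = np.concatenate([w, np.zeros(pad)])
    return padded.reshape(-1, R).sum(axis=0)

def window_transform_at_frame_harmonics(w, R):
    """W(2*pi*k/R) for k = 1..R-1, from the DTFT definition (hypothetical helper)."""
    n = np.arange(len(w))
    k = np.arange(1, R)
    return np.exp(-2j * np.pi * np.outer(k, n) / R) @ w

# Assumed example: periodic Hann window, length M = 64, hop size R = M/4 = 16.
M, R = 64, 16
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(M) / M)

a = alias(w, R)                                  # constant if w is COLA(R)
W_h = window_transform_at_frame_harmonics(w, R)  # zero if W has the required notches

is_cola = np.allclose(a, a[0])
has_notches = np.allclose(W_h, 0)
```

For this window both flags come out true, and the constant value of `a` is 2 (four overlapping Hann windows, each contributing an average of 1/2). Other windows or hop sizes can be tried by changing `w` and `R`.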

What we would expect in the filter-bank context is that the reconstruction can be made arbitrarily accurate given better and better lowpass filters $ w$ and $ f$ which cut off at $ \omega_c = \pi/R$ (the folding frequency associated with downsampling by $ R$). This is the right way to think about the STFT when spectral modifications are involved.
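As a rough illustration of "better and better lowpass filters," one can measure how much of a window's energy lies beyond the folding frequency $ \pi/R$, since that portion aliases when the channel is downsampled by $ R$. The sketch below is an assumption-laden example rather than a method from the text: the function name `out_of_band_energy`, the Hann window family, the hop size $ R=8$, and the FFT size are all illustrative choices, and the out-of-band energy fraction is used only as a simple proxy for aliasing.

```python
import numpy as np

def out_of_band_energy(w, R, nfft=1 << 14):
    """Fraction of the window's energy at |omega| > pi/R (hypothetical helper)."""
    W = np.fft.fft(w, nfft)                      # densely sampled window transform
    omega = 2 * np.pi * np.fft.fftfreq(nfft)     # radian frequencies in [-pi, pi)
    power = np.abs(W) ** 2
    return power[np.abs(omega) > np.pi / R].sum() / power.sum()

# Assumed example: Hann windows of increasing length, fixed hop size R = 8.
R = 8
lengths = [16, 32, 64, 128]
leak = [out_of_band_energy(np.hanning(M), R) for M in lengths]
```

Longer windows have narrower main lobes, so the entries of `leak` shrink as the window length grows relative to $ R$, in line with the expectation that the channel filter approaches an ideal lowpass filter cutting off at $ \pi/R$.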

In Chapter 11 we will develop the general topic of perfect reconstruction filter banks, and derive various STFT processors as special cases.
