Time Varying OLA Modifications

In the preceding sections, we assumed that the spectral modification $ H$ did not vary over time. We will now examine the implications of time-varying spectral modifications. The derivation below follows [9], except that we'll keep our previous notation:

\begin{eqnarray*}
X_m(\omega_k) &=& \hbox{sampled DTFT (FFT) of $m$th input frame, $k=0,1,\ldots,N-1$}\\
H_m(\omega_k) &=& \hbox{time varying spectral modification (new each frame)}\\
Y_m(\omega_k) &=& \hbox{$X_m(\omega_k) H_m(\omega_k) = m$th output spectrum}\\
\omega_k &=&\hbox{ $2\pi k / N$\ = $k$th spectral sample}\\
N &=& \hbox{FFT length}\\
M &=& \hbox{window $w$\ length: $x_m(n) = x(n)w(n-m)$}\\
L &=& \hbox{\emph{maximum} length of FIR filter $h_m$\ applied to each frame}\\
N &\ge& \hbox{ $M+L-1$\ to avoid time aliasing in $y_m$}
\end{eqnarray*}
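
As a worked illustration with assumed values (not taken from the text), suppose the window length is $M=512$ and the frame filters have at most $L=257$ taps. Then

\begin{eqnarray*}
N &\ge& M+L-1 \;=\; 512+257-1 \;=\; 768,
\end{eqnarray*}

so a power-of-two FFT length of $N=1024$ would satisfy the constraint, whereas $N=512$ would let the tail of each filtered frame wrap around (time alias) in $y_m$.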

Using $ H_m$ in our OLA formulation with a hop size $ R=1$ results in

\begin{eqnarray*}
y(n) &=& \sum_{m=-\infty}^\infty y_m(n) \\
&=& \sum_{m=-\infty}^\infty \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k) H_m(\omega_k) e^{j\omega_kn} \\
&=& \sum_{m=-\infty}^\infty \frac{1}{N}\sum_{k=0}^{N-1}
\left[ \sum_{l=-\infty}^\infty x(l) w(l-m)e^{-j\omega_kl} \right]
H_m(\omega_k) e^{j\omega_kn} \\
&=& \sum_{l=-\infty}^\infty x(l) \sum_{m=-\infty}^\infty w(l-m)
\frac{1}{N}\sum_{k=0}^{N-1} H_m(\omega_k)
e^{j\omega_k(n-l)} \\
&=& \sum_{l=-\infty}^\infty x(l)
\sum_{m=-\infty}^\infty w(l-m) h_m(n-l)
\end{eqnarray*}

Define $ r \mathrel{\stackrel{\Delta}{=}}n-l \;\Rightarrow\; l = n-r$ to get

$\displaystyle y(n)=\sum_{r=-\infty}^\infty x(n-r) \sum_{m=-\infty}^\infty h_m(r) w(n-r-m).$ (9.42)

Let's examine the term $\displaystyle\sum_{m=-\infty}^\infty h_m(r) w(n-r-m)$ in more detail:
  • $h_m(r)$, viewed as a function of the frame index $m$, describes the time variation of the $r$th tap.
  • $\sum_{m=-\infty}^\infty h_m(r) w[(n-r)-m] = [h_{(\cdot)}(r) \ast w](n-r)$ is a filtered version of the $r$th tap $h_m(r)$. It is lowpass-filtered by $w$ and delayed by $r$ samples.
  • Denote the $r$th time-varying, lowpass-filtered, delayed-by-$r$ filter tap by ${\hat h}_{n-r}(r)$. This can be interpreted as the weighting in the output at time $n$ of an impulse that entered the time-varying filter $r$ samples earlier, at time $n-r$.
Using this, we get

\begin{eqnarray*}
y(n) &=& \sum_{r=-\infty}^\infty x(n-r) {\hat h}_{n-r}(r) \\
&=& x(n) {\hat h}_n(0) \\
& & + x(n-1) {\hat h}_{n-1}(1) + x(n-2) {\hat h}_{n-2}(2) + \cdots \\
& & + x(n+1) {\hat h}_{n+1}(-1) + x(n+2) {\hat h}_{n+2}(-2) + \cdots
\end{eqnarray*}

This is a superposition sum for an arbitrary linear, time-varying filter $ {\hat h}_{n-r}(r) = [h_{(\cdot)}(r) \ast w](n-r)$ .
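
The equivalence derived above can be checked numerically. The following is a minimal sketch, not the author's implementation: it assumes small illustrative sizes, a causal test filter `h(m, r)` invented for the example, a periodic Hann window normalized to sum to one, and the usual practical convention of shifting each frame to the start of the FFT buffer. It compares hop-size-1 overlap-add with a new filter each frame against the direct superposition sum using ${\hat h}_{n-r}(r)$.

```python
import cmath
import math

def dft(v, inverse=False):
    """Naive length-N DFT (or inverse DFT); adequate for small N."""
    N = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [z / N for z in out] if inverse else out

M, L, N = 8, 4, 16   # window length, max filter length, FFT size (N >= M+L-1)
x = [math.sin(0.7 * n) + 0.1 * n for n in range(20)]   # arbitrary test input
w = [0.5 - 0.5 * math.cos(2 * math.pi * i / M) for i in range(M)]  # periodic Hann
total = sum(w)
w = [wi / total for wi in w]   # assumption: normalize so sum_m w(n-m) = 1 at R = 1

def h(m, r):
    """Hypothetical frame filter: tap r of the FIR filter applied to frame m."""
    return math.cos(0.3 * m + r) / (r + 1) if 0 <= r < L else 0.0

def win(i):
    return w[i] if 0 <= i < M else 0.0

def xs(n):
    return x[n] if 0 <= n < len(x) else 0.0

frames = range(-(M - 1), len(x))   # every frame m whose window overlaps x

# Overlap-add with hop size R = 1 and a new spectral modification H_m each frame.
y_ola = {}
for m in frames:
    buf = [xs(m + i) * win(i) for i in range(M)] + [0.0] * (N - M)
    Xm = dft(buf)
    Hm = dft([h(m, r) for r in range(L)] + [0.0] * (N - L))
    ym = dft([X * H for X, H in zip(Xm, Hm)], inverse=True)
    for i in range(N):
        y_ola[m + i] = y_ola.get(m + i, 0.0) + ym[i].real

def h_hat(n, r):
    """hat-h_{n-r}(r) = sum_m h_m(r) w(n-r-m): tap r smoothed by the window."""
    return sum(h(m, r) * win(n - r - m) for m in frames)

# Direct superposition sum y(n) = sum_r x(n-r) hat-h_{n-r}(r).
y_direct = {n: sum(xs(n - r) * h_hat(n, r) for r in range(L)) for n in y_ola}
max_err = max(abs(y_ola[n] - y_direct[n]) for n in y_ola)
print("max |OLA - direct| =", max_err)
```

Under these assumptions the two outputs should agree to within floating-point rounding, since $N \ge M+L-1$ rules out time aliasing in each frame.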

Block Diagram Interpretation of Time-Varying STFT Modifications

Assuming $ {\hat h}$ is causal gives

\begin{eqnarray*}
y(n) &=& \sum_{r=0}^\infty x(n-r) {\hat h}_{n-r}(r) \\
&=& x(n) {\hat h}_n(0) + x(n-1) {\hat h}_{n-1}(1) + x(n-2) {\hat h}_{n-2}(2) + \cdots
\end{eqnarray*}

This is depicted in Fig. 8.17.

\begin{psfrags}
\psfrag{zm1}{\large $z^{-1}$\ }
\psfrag{h(0,n)}{\large$ h_n(0) $}
\psfrag{h(1,n)}{\large$ h_{n-1}(1) $}
\psfrag{h(2,n)}{\large$ h_{n-L+1}(L-1) $}
\psfrag{+}{\large$\Sigma$}
\psfrag{w(n)}{\large$ w $}
\psfrag{y(n)}{\large$ y(n) $}
\begin{figure}[htbp]
\includegraphics[width=\twidth]{eps/olamods}
\caption{System diagram giving
an interpretation of the bandlimited time-varying filter coefficients
in the overlap-add STFT processor with a new filter each frame.}
\end{figure}
\end{psfrags}

The term $h_n(k)$ can be interpreted as the FIR filter tap $k$ at time $n$. Note how each tap is lowpass filtered by the FFT window $w$. The window thus bandlimits each filter tap to the bandwidth of the window's main lobe. For an $L$-term length-$M$ Blackman-Harris window, for example, the main lobe reaches zero at frequency $L\Omega_M=2\pi L/M$, where $L$ here counts the cosine terms of the window rather than the frame-filter length defined above (see Table 5.2 in §5.5.2 for other examples). This bandlimiting places a limit on the bandwidth expansion caused by time variation of the filter coefficients, which in turn places a limit on the maximum STFT hop size that can be used without frequency-domain aliasing. See Allen and Rabiner 1977 [9] for further details on the bandlimiting property.
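
The lowpass action of the window on a tap trajectory can be illustrated with a small sketch. It assumes a length-8 periodic Hann window (a two-term member of the Blackman-Harris family) normalized to unit sum; the helper names `smooth` and `gain` are invented for this example. A constant tap passes unchanged, a tap that alternates sign every frame is removed entirely, and the window's gain falls to zero at the main-lobe edge $2\Omega_M = 4\pi/M$.

```python
import cmath
import math

M = 8
w = [0.5 - 0.5 * math.cos(2 * math.pi * i / M) for i in range(M)]  # periodic Hann
total = sum(w)
w = [wi / total for wi in w]   # assumption: unit-sum window (unity gain at dc)

def smooth(g, n):
    """[g * w](n) = sum_m g(m) w(n-m): window-filtered tap trajectory at time n."""
    return sum(g(n - i) * w[i] for i in range(M))

def gain(omega):
    """Magnitude of the window transform at frequency omega (rad/sample)."""
    return abs(sum(w[i] * cmath.exp(-1j * omega * i) for i in range(M)))

Omega = 2 * math.pi / M                       # frequency-sampling interval 2*pi/M

const_tap = smooth(lambda m: 1.0, 5)          # tap that never changes
alt_tap = smooth(lambda m: (-1.0) ** m, 5)    # tap that flips sign every frame

print("constant tap after smoothing   :", const_tap)
print("alternating tap after smoothing:", alt_tap)
print("window gain at 0, Omega, 2*Omega:", gain(0.0), gain(Omega), gain(2 * Omega))
```

The evaluation time `5` is arbitrary; any other time index gives the same magnitudes for these two trajectories.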


Length L FIR Frame Filters

To avoid time aliasing, we restrict the filter length to a maximum of $L$ samples. Since $H_m(\omega_k)$ is an arbitrary multiplicative weighting of the $m$th spectral frame, the frame filter need not be causal. For odd $L$, the filter impulse response indices may run from $-L_h$ to $L_h$, where

$\displaystyle L_h \mathrel{\stackrel{\Delta}{=}}\frac{L-1}{2}.$ (9.43)

This gives

\begin{eqnarray*}
y(n) &=& \sum_{r=-L_h}^{L_h} x(n-r) {\hat h}_{n-r}(r) \\
&=& x(n) {\hat h}_n(0) \\
& & + x(n-1) {\hat h}_{n-1}(1) + \cdots + x(n-L_h) {\hat h}_{n-L_h}(L_h) \\
& & + x(n+1) {\hat h}_{n+1}(-1) + \cdots + x(n+L_h) {\hat h}_{n+L_h}(-L_h)
\end{eqnarray*}

This is the general length $ L$ time-varying FIR filter convolution sum for time $ n$ , when $ L$ is odd.
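
The convolution sum above can be sketched directly in code. This is a minimal illustration under stated assumptions: the function name `tv_fir`, the callable `h_hat(t, r)` (returning tap $r$ for an input sample entering at time $t$), and the example taps are all hypothetical, and the input is treated as zero outside its given range.

```python
def tv_fir(x, h_hat, L):
    """Time-varying, zero-centered FIR filter of odd length L.

    Computes y(n) = sum_{r=-Lh}^{Lh} x(n-r) * h_hat(n-r, r), where
    h_hat(t, r) is the weight applied, r samples later, to the input
    sample that entered at time t. Samples outside x count as zero.
    """
    if L % 2 != 1:
        raise ValueError("this sketch assumes odd L")
    Lh = (L - 1) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for r in range(-Lh, Lh + 1):
            t = n - r                      # time at which the input sample entered
            if 0 <= t < len(x):
                acc += x[t] * h_hat(t, r)
        y.append(acc)
    return y

# Hypothetical taps: a 3-tap smoother whose overall gain grows with time.
base = {-1: 0.25, 0: 0.5, 1: 0.25}

def example_taps(t, r):
    return (t + 1) * base[r]

# An impulse entering at time 4 is spread over output times 3, 4, 5
# using the taps in force at time 4.
impulse = [0.0] * 10
impulse[4] = 1.0
y = tv_fir(impulse, example_taps, L=3)
print(y)
```

Driving the filter with an impulse at time $t$ reads out ${\hat h}_t(r)$ at output time $t+r$, which matches the interpretation of ${\hat h}_{n-r}(r)$ given earlier.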

