Time-Varying OLA Modifications


In the preceding sections, we assumed that the spectral modification did not vary over time. We will now examine the implications of time-varying spectral modifications. The derivation below follows [9], except that we'll keep our previous notation:

$\begin{eqnarray*} X_m(\omega_k) &=& \hbox{sampled DTFT (FFT) of $m$th input frame, $k=0,1,\ldots,N-1$}\\ H_m(\omega_k) &=& \hbox{time varying spectral modification (new each frame)}\\ Y_m(\omega_k) &=& \hbox{$X_m(\omega_k) H_m(\omega_k) = m$th output spectrum}\\ \omega_k &=&\hbox{ $2\pi k / N$\ = $k$th spectral sample}\\ N &=& \hbox{FFT length}\\ M &=& \hbox{window $w$\ length: $x_m(n) = x(n)w(n-m)$}\\ L &=& \hbox{\emph{maximum} length of FIR filter $h_m$\ applied to each frame}\\ N &\ge& \hbox{ $M+L-1$\ to avoid time aliasing in $y_m$} \end{eqnarray*}$
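The condition $N \ge M+L-1$ can be checked numerically. Multiplying $N$-point FFTs computes a cyclic convolution, and the cyclic result matches the linear convolution $x_m \ast h_m$ only when $N$ is at least the linear-convolution length $M+L-1$. This is a minimal sketch that assumes Python with NumPy. The helper `frame_filter`, the lengths, and the random stand-in data are hypothetical illustrative choices.

```python
import numpy as np

def frame_filter(xm, hm, N):
    """Hypothetical helper: multiply the N-point FFTs of one windowed frame xm
    and one FIR frame filter hm, then inverse-FFT (an N-point cyclic convolution)."""
    Ym = np.fft.fft(xm, N) * np.fft.fft(hm, N)   # fft zero-pads each input to length N
    return np.fft.ifft(Ym).real

rng = np.random.default_rng(0)
M, L = 8, 5                      # example window length M and filter length L
xm = rng.standard_normal(M)      # stand-in for one windowed frame x_m
hm = rng.standard_normal(L)      # stand-in for one frame filter h_m
linear = np.convolve(xm, hm)     # linear convolution, length M + L - 1

ok = frame_filter(xm, hm, M + L - 1)   # N = M+L-1: no time aliasing
bad = frame_filter(xm, hm, M)          # N < M+L-1: the tail wraps onto the start
print(np.allclose(ok, linear))         # True
print(np.allclose(bad, linear[:M]))    # False
```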

Using $Y_m(\omega_k)$ in our OLA formulation with a hop size of one sample results in

$\begin{eqnarray*} y(n) &=& \sum_{m=-\infty}^\infty y_m(n) \\ &=& \sum_{m=-\infty}^\infty \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k) H_m(\omega_k) e^{j\omega_kn} \\ &=& \sum_{m=-\infty}^\infty \frac{1}{N}\sum_{k=0}^{N-1} \left[ \sum_{l=-\infty}^\infty x(l) w(l-m)e^{-j\omega_kl} \right] H_m(\omega_k) e^{j\omega_kn} \\ &=& \sum_{l=-\infty}^\infty x(l) \sum_{m=-\infty}^\infty w(l-m) \frac{1}{N}\sum_{k=0}^{N-1} H_m(\omega_k) e^{j\omega_k(n-l)} \\ &=& \sum_{l=-\infty}^\infty x(l) \sum_{m=-\infty}^\infty w(l-m) h_m(n-l) \\ \end{eqnarray*}$

Define $r \mathrel{\stackrel{\Delta}{=}}n-l \;\Rightarrow\; l = n-r$ to get

$\displaystyle y(n)=\sum_{r=-\infty}^\infty x(n-r) \sum_{m=-\infty}^\infty h_m(r) w(n-r-m). \qquad (9.42)$
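As a numerical check of the derivation leading to (9.42), the sketch below runs an OLA processor with a hop size of one sample, as implied by summing over every integer $m$. It applies a new random filter $h_m$ to each frame and compares the result with a direct evaluation of (9.42). It assumes a finite-length input, causal frame filters with taps $r=0,\ldots,L-1$, and a Hann window from `np.hanning`. Each frame's output is added back at its absolute time $m$. The helper names `ola_hop1`, `direct_eq_942`, and `h_of_m` are hypothetical.

```python
import numpy as np

def ola_hop1(x, w, h_of_m, L, N):
    """Hop-size-1 OLA: frame m is x(n) w(n-m); multiply its N-point FFT by the
    FFT of h_m, inverse-FFT, and add the result back at absolute time m."""
    M, Nx = len(w), len(x)
    assert N >= M + L - 1, "FFT too short: frame outputs would time-alias"
    off = M - 1                                   # buffer index = n + off
    y = np.zeros(Nx + M + N)
    for m in range(-(M - 1), Nx):                 # every frame that overlaps x
        n = np.arange(m, m + M)                   # absolute times covered by frame m
        inside = (n >= 0) & (n < Nx)
        xm = np.where(inside, x[np.clip(n, 0, Nx - 1)], 0.0) * w
        ym = np.fft.ifft(np.fft.fft(xm, N) * np.fft.fft(h_of_m(m), N)).real
        y[m + off : m + off + N] += ym
    return y[off : off + Nx + L - 1]

def direct_eq_942(x, w, h_of_m, L):
    """Eq. (9.42): y(n) = sum_r x(n-r) sum_m h_m(r) w(n-r-m), causal taps r = 0..L-1."""
    M, Nx = len(w), len(x)
    y = np.zeros(Nx + L - 1)
    for n in range(Nx + L - 1):
        for r in range(L):
            if 0 <= n - r < Nx:
                # only frames with 0 <= n-r-m < M see sample x(n-r) through the window
                s = sum(h_of_m(m)[r] * w[n - r - m] for m in range(n - r - M + 1, n - r + 1))
                y[n] += x[n - r] * s
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(30)
M, L = 8, 4
w = np.hanning(M)
taps = rng.standard_normal((len(x) + M - 1, L))   # a new random filter h_m for every frame

def h_of_m(m):
    return taps[m + M - 1]                        # frames start at m = -(M-1)

y_ola = ola_hop1(x, w, h_of_m, L, N=M + L - 1)
y_dir = direct_eq_942(x, w, h_of_m, L)
print(np.allclose(y_ola, y_dir))   # True
```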

Let's examine the term $\displaystyle\sum_{m=-\infty}^\infty h_m(r) w( n-r-m )$ in more detail:

$h_m(r)$, viewed as a function of the frame index $m$, describes the time variation of the $r^{th}$ tap.
$\sum_{m=-\infty}^\infty h_m(r) w[(n-r)-m] = [h_{(\cdot)}(r) \ast w](n-r)$ is a filtered version of the $r^{th}$ tap $h_m(r)$. It is lowpass-filtered by $w$ and delayed by $r$ samples.
Denote the $r$th time-varying, lowpass-filtered, delayed-by-$r$ filter tap by ${\hat h}_{n-r}(r)$. This can be interpreted as the weighting in the output at time $n$ of an impulse entering the time-varying filter at time $n-r$.
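The smoothed taps ${\hat h}_{n-r}(r) = [h_{(\cdot)}(r) \ast w](n-r)$ can be computed by convolving each tap trajectory $h_m(r)$, a sequence in $m$, with the window. This is a minimal sketch. It assumes causal taps $r=0,\ldots,L-1$, frames $m=0,\ldots,F-1$, and a Hann window. `smoothed_taps` is a hypothetical helper, and row $p$ of its output holds ${\hat h}_p(\cdot)$ with $p = n-r$.

```python
import numpy as np

def smoothed_taps(taps, w):
    """hhat[p, r] = sum_m taps[m, r] * w[p - m]: convolve each tap trajectory
    (column r, indexed by frame m) with the window w."""
    return np.stack([np.convolve(taps[:, r], w) for r in range(taps.shape[1])], axis=1)

rng = np.random.default_rng(1)
M, L, F = 6, 3, 20                          # window length, filter length, number of frames
w = np.hanning(M)
taps = rng.standard_normal((F, L))          # taps[m, r] = h_m(r)
hhat = smoothed_taps(taps, w)               # shape (F + M - 1, L)

# Compare one entry with the explicit sum over frames m
p, r = 7, 2
explicit = sum(taps[m, r] * w[p - m] for m in range(F) if 0 <= p - m < M)
print(np.isclose(hhat[p, r], explicit))     # True

# Time-invariant case: the same filter h in every frame
h = np.array([1.0, -0.5, 0.25])
hhat_ti = smoothed_taps(np.tile(h, (F, 1)), w)
print(np.allclose(hhat_ti[10], h * w.sum()))   # True
```

The second check shows that when the filter does not change from frame to frame, the smoothing only scales each tap by $\sum_n w(n)$ (away from the first and last frames).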

Using this, we get

$\begin{eqnarray*} y(n) &=& \sum_{r=-\infty}^\infty x(n-r) {\hat h}_{n-r}(r) \\ &=& x(n) {\hat h}_n(0) \\ & & + x(n-1) {\hat h}_{n-1}(1) + x(n-2) {\hat h}_{n-2}(2) + \cdots \\ & & + x(n+1) {\hat h}_{n+1}(-1) + x(n+2) {\hat h}_{n+2}(-2) + \cdots \end{eqnarray*}$

This is a superposition sum for an arbitrary linear, time-varying filter ${\hat h}_{n-r}(r) = [h_{(\cdot)}(r) \ast w](n-r)$ .

Block Diagram Interpretation of Time-Varying STFT Modifications

Assuming ${\hat h}$ is causal, i.e., ${\hat h}_{n-r}(r)=0$ for $r<0$, gives

$\begin{eqnarray*} y(n) &=& \sum_{r=0}^\infty x(n-r) {\hat h}_{n-r}(r) \\ &=& x(n) {\hat h}_n(0) + x(n-1) {\hat h}_{n-1}(1) + x(n-2) {\hat h}_{n-2}(2) + \cdots \end{eqnarray*}$

This is depicted in Fig.8.17.

[Fig. 8.17: System diagram giving an interpretation of the bandlimited time-varying filter coefficients in the overlap-add STFT processor with a new filter each frame. Diagram labels: delay elements $z^{-1}$, taps $h_n(0)$, $h_{n-1}(1)$, $\ldots$, $h_{n-L+1}(L-1)$, window $w$, summer $\Sigma$, and output $y(n)$.]
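The diagram can be mimicked with a tapped delay line in which the sample delayed by $r$ is weighted by ${\hat h}_{n-r}(r)$, so each input sample effectively carries the smoothed filter belonging to the time it entered. This sketch is illustrative only. The smoothed taps are random stand-ins supplied through a hypothetical callable `hhat(p, r)`, and `tdl_time_varying_fir` is a hypothetical name.

```python
from collections import deque

import numpy as np

def tdl_time_varying_fir(x, hhat, L):
    """Causal time-varying FIR as a tapped delay line.
    line[r] holds x(n-r); the sample delayed by r is weighted by hhat(n-r, r),
    the smoothed tap belonging to the time that sample entered."""
    line = deque([0.0] * L, maxlen=L)
    y = np.zeros(len(x) + L - 1)
    for n in range(len(y)):
        line.appendleft(x[n] if n < len(x) else 0.0)   # one z^{-1} shift per sample
        y[n] = sum(line[r] * hhat(n - r, r) for r in range(L) if 0 <= n - r < len(x))
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(12)
L = 4
H = rng.standard_normal((len(x), L))     # stand-in smoothed taps: H[p, r] = hhat_p(r)
y = tdl_time_varying_fir(x, lambda p, r: H[p, r], L)

# Reference: y(n) = sum_p x(p) hhat_p(n-p); sample p adds x(p) * hhat_p(.) at n = p..p+L-1
y_ref = np.zeros(len(x) + L - 1)
for p in range(len(x)):
    y_ref[p:p + L] += x[p] * H[p]
print(np.allclose(y, y_ref))   # True
```

Indexing the coefficients by entry time $p = n-r$ rather than by output time $n$ is what lets a sample keep the same filter as it moves down the line.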

The term ${\hat h}_{n-r}(r)$ can be interpreted as the $r$th FIR filter tap at time $n$. Note how each tap is lowpass filtered by the FFT window $w$. The window thus bandlimits each filter tap to the bandwidth of the window's main lobe. For example, for an $L$-term length-$M$ Blackman-Harris window (here $L$ counts the window's cosine terms, not the filter length), the main lobe reaches zero at frequency $L\Omega_M=2\pi L/M$ (see Table 5.2 in §5.5.2 for other examples). This bandlimiting limits the bandwidth expansion caused by time variation of the filter coefficients, which in turn limits the maximum STFT hop size that can be used without frequency-domain aliasing. See Allen and Rabiner 1977 [9] for further details on the bandlimiting property.
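The lowpass action of the window on a tap trajectory can be seen by letting one tap vary sinusoidally, $h_m(r) = \cos(k\,\Omega_M m)$, and measuring its amplitude after smoothing by $w$. This sketch assumes an $M$-periodic Hann window, $w(n) = 0.5 - 0.5\cos(2\pi n/M)$. This is a two-term cosine-sum window, so by the rule above its main lobe reaches zero at $2\Omega_M$. The value $M=16$ is arbitrary and `smoothed_amplitude` is a hypothetical helper.

```python
import numpy as np

M = 16
n = np.arange(M)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / M)   # periodic Hann: a two-term cosine-sum window
Omega_M = 2 * np.pi / M

def smoothed_amplitude(k, num_frames=10 * M):
    """Peak of [h_(.)(r) * w] when one tap varies as h_m(r) = cos(k * Omega_M * m)."""
    tap_track = np.cos(k * Omega_M * np.arange(num_frames))
    smoothed = np.convolve(tap_track, w, mode="valid")   # steady-state part only
    return float(np.max(np.abs(smoothed)))

for k in range(3):
    print(k, round(smoothed_amplitude(k), 6))
# 0 8.0   (constant tap: gain sum(w) = M/2)
# 1 4.0   (variation at Omega_M: gain |W(Omega_M)| = M/4)
# 2 0.0   (variation at 2*Omega_M falls on the main-lobe zero and is removed)
```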

Length L FIR Frame Filters

To avoid time aliasing, we restrict the filter length to a maximum of $L$ samples. Since $H_m(\omega_k)$ is an arbitrary multiplicative weighting of the $m$th spectral frame, the frame filter need not be causal. For odd $L$, the filter impulse-response indices may run from $-L_h$ to $L_h$, where

$\displaystyle L_h \isdef \frac{L-1}{2} \qquad (9.43)$

This gives

$\begin{eqnarray*} y(n) &=& \sum_{r=-L_h}^{L_h} x(n-r) {\hat h}_{n-r}(r) \\ &=& x(n) {\hat h}_n(0) \\ & & + x(n-1) {\hat h}_{n-1}(1) + \cdots + x(n-L_h) {\hat h}_{n-L_h}(L_h) \\ & & + x(n+1) {\hat h}_{n+1}(-1) + \cdots + x(n+L_h) {\hat h}_{n+L_h}(-L_h) \end{eqnarray*}$

This is the general length-$L$ time-varying FIR filter convolution sum for time $n$, when $L$ is odd.
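For odd $L$ the sum can be evaluated directly. This sketch assumes the smoothed taps are available through a hypothetical callable `hhat(p, r)`, with entry time $p = n-r$ and $r \in [-L_h, L_h]$. It treats $x$ as zero outside its stored samples and returns outputs only over the input's index range. As a sanity check, a time-invariant `hhat` should reproduce an ordinary centered convolution. `tv_fir_odd` is a hypothetical name.

```python
import numpy as np

def tv_fir_odd(x, hhat, L):
    """y(n) = sum_{r=-Lh}^{Lh} x(n-r) hhat(n-r, r) for odd L, with Lh = (L-1)/2.
    x is taken as zero outside 0..len(x)-1; outputs are returned for n = 0..len(x)-1."""
    if L % 2 != 1:
        raise ValueError("this form of the sum assumes odd L")
    Lh = (L - 1) // 2
    y = np.zeros(len(x))
    for n in range(len(x)):
        for r in range(-Lh, Lh + 1):        # taps r < 0 use future input samples
            p = n - r                        # time at which this input sample entered
            if 0 <= p < len(x):
                y[n] += x[p] * hhat(p, r)
    return y

# Sanity check: a time-invariant filter g (tap r stored at g[r + Lh])
rng = np.random.default_rng(3)
x = rng.standard_normal(20)
L = 5
g = rng.standard_normal(L)
y_tv = tv_fir_odd(x, lambda p, r: g[r + (L - 1) // 2], L)
print(np.allclose(y_tv, np.convolve(x, g, mode="same")))   # True
```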
