Free Books

Convolving with Long Signals

We saw that we can perform efficient convolution of two finite-length sequences using a Fast Fourier Transform (FFT). There are some situations, however, in which it is impractical to use a single FFT for each convolution operand:

  • One or both of the signals being convolved is very long.
  • The filter must operate in real time. (We can't wait until the input signal ends before providing an output signal.)
Direct convolution does not have these problems. For example, given a causal finite-impulse response (FIR) $ h$ of length $ L$ , we need only store the past $ L-1$ samples of the input signal $ x$ to calculate the next output sample, since

y(n) &=& (h\ast x)(n) = \sum_{m=0}^n h(m)x(n-m)\\
&=& h(0)x(n) + h(1)x(n-1)
+\cdots+ h(L-1) x(n-L+1)

Thus, at every time $ n$ , the output $ y(n)$ can be computed as a linear combination of the current input sample $ x(n)$ and the current filter state $ \{x(n-1),\ldots,x(n-L+1)\}$ .

To obtain the benefit of high-speed FFT convolution when the input signal is very long, we simply chop up the input signal $ x$ into blocks, and perform convolution on each block separately. The output is then the sum of the separately filtered blocks. The blocks overlap because of the ``ringing'' of the filter. For a zero-phase filter, each block overlaps with both of its neighboring blocks. For causal filters, each block overlaps only with its neighbor to the right (the next block in time). The fact that signal blocks overlap and must be added together (instead of simply abutted) is the source of the name overlap-add method for FFT convolution of long sequences [7,9].

The idea of processing input blocks separately can be extended also to both operands of a convolution (both $ x$ and $ h$ in $ x\ast h$ ). The details are a straightforward extension of the single-block-signal case discussed below.

When simple FFT convolution is being performed between a signal $ x$ and FIR filter $ h$ , there is no reason to use a non-rectangular window function on each input block. A rectangular window length of $ M$ samples may advance $ M$ samples for each successive frame (hop size $ =M$ samples). In this case, the input blocks do not overlap, while the output blocks overlap by the FIR filter length (minus one sample). On the other hand, if nonlinear and/or time-varying spectral modifications to be performed, then there are good reasons to use a non-rectangular window function and a smaller hop size, as we will develop below.

Overlap-Add Decomposition

Consider breaking an input signal $ x$ into frames using a finite, zero-phase, length $ M$ window $ w$ . Then we may express the $ m$ th windowed data frame as

$\displaystyle x_m(n) \isdefs x(n)w(n-mR),\quad n \in (-\infty, +\infty)$ (9.17)


$\displaystyle x_m \isdefs x \cdot \hbox{\sc Shift}_{mR}(w)$ (9.18)


R &\isdef & \hbox{frame step (hop size)}\\
m &\isdef & \hbox{frame index}

The hop size is the number of samples between the begin-times of adjacent frames. Specifically, it is the number of samples by which we advance each successive window.

Figure 8.8 shows an input signal (top) and three successive windowed data frames using a length $ M=128$ causal Hamming window and 50% overlap ($ R=M/2=64$ ).

Figure 8.8: Input signal (top) and three successive windowed data frames.
\includegraphics[width=\textwidth ]{eps/windsig}

For frame-by-frame spectral processing to work, we must be able to reconstruct $ x$ from the individual overlapping frames, ideally by simply summing them in their original time positions. This can be written as

x(n) &=& \sum_{m=-\infty}^{\infty} x_m(n) \\
&=& \sum_{m=-\infty}^{\infty} x(n) w(n-mR) \\
&=& x(n) \sum_{m=-\infty}^{\infty} w(n-mR)

Hence, $ x= \sum_{m} x_m$ if and only if

$\displaystyle \zbox {\sum_{m\in{\bf Z}} w(n-mR) = 1, \,\forall n\in{\bf Z}.}$ (9.19)

This is the constant-overlap-add (COLA)9.6 constraint for the FFT analysis window $ w$ . It has also been called the partition of unity property.

Figure 8.9: Example of 50% overlap-add for the Bartlett (triangular) window

Figure 8.9 illustrates the appearance of 50% overlap-add for the Bartlett (triangular) window. The Bartlett window is clearly COLA for a wide variety of hop sizes, such as $ M/2$ , $ M/3$ , and so on, provided $ M/k$ is an integer (otherwise the underlying continuous triangular window must be resampled). However, when using windows defined in a library, the COLA condition should be carefully checked. For example, the following Matlab/Octave script shows that there is a problem with the standard Hamming window:

M = 33;          % window length
R = (M-1)/2;     % hop size
N = 3*M;         % overlap-add span
w = hamming(M);  % window
z = zeros(N,1);  plot(z,'-k');  hold on;  s = z;
for so=0:R:N-M
  ndx = so+1:so+M;        % current window location
  s(ndx) = s(ndx) + w;    % window overlap-add
  wzp = z; wzp(ndx) = w;  % for plot only
  plot(wzp,'--ok');       % plot just this window
plot(s,'ok');  hold off;  % plot window overlap-add
The figure produced by this matlab code is shown in Fig.8.10. As can be seen, the equal end-points sum to form an impulse in each frame of the overlap-add.

Figure 8.10: Overlap-add example for the default Hamming window in Matlab.

The Matlab window functions (such as hamming) have an optional second argument which can be either 'symmetric' (the default), or 'periodic'. The periodic case is equivalent to

w = hamming(M+1); % symmetric case
w = w(1:M);       % delete last sample for periodic case
The periodic variant solves the non-constant overlap-add problem for even $ M$ and $ R=M/2$ , but not for odd $ M$ . The problem can be solved for odd $ M$ and $ R=(M-1)/2$ while retaining symmetry as follows:
w = hamming(M); % symmetric case
w(1) = w(1)/2;  % repair constant-overlap-add for R=(M-1)/2
w(M) = w(M)/2;
Since different window types may add or subtract 1 to/from $ M$ internally, it is best to check the result using test code as above to make sure the window is COLA at the desired hop size. E.g., in Matlab:
  • hamming(M) $ \isdef $
    .54 - .46*cos(2*pi*(0:M-1)'/(M-1));
    gives constant overlap-add for $ R=(M-1)/2$ , $ (M-1)/4$ , etc., when endpoints are divided by 2 or one endpoint is zeroed
  • hanning(M) $ \isdef $
    .5*(1 - cos(2*pi*(1:M)'/(M+1)));
    does not give constant overlap-add for $ R=(M-1)/2$ , but does for $ R=(M+1)/2$

  • blackman(M) $ \isdef $
    .42 - .5*cos(2*pi*m)' + .08*cos(4*pi*m)';
    where m = (0:M-1)/(M-1), gives constant overlap-add for $ R=(M-1)/3$ when $ M$ is odd and $ R$ is an integer, and $ R=M/3$ when $ M$ is even and $ R$ is integer.

In summary, all windows obeying the constant-overlap-add constraint will yield perfect reconstruction of the original signal $ x$ from the data frames $ x_m = x\cdot\hbox{\sc Shift}_{mR}(w)$ by overlap-add (OLA). There is no constraint on window type, only that the window overlap-adds to a constant for the hop size used. In particular, $ R=1$ always yields a constant overlap-add for any window function. We will learn later (§8.3.1) that there is also a simple frequency-domain test on the window transform for the constant overlap-add property.

To emphasize an earlier point, if simple time-invariant FIR filtering is being implemented, and we don't need to work with the intermediate STFT, it is most efficient to use the rectangular window with hop size $ R=M$ , and to set $ M=N-L+1$ , where $ L$ is the length of the filter $ h$ and $ N$ is a convenient FFT size. The optimum $ M$ for a given $ L$ is an interesting exercise to work out.

COLA Examples

So far we've seen the following constant-overlap-add examples:

  • Rectangular window at 0% overlap (hop size $ R$ = window size $ M$ )
  • Bartlett window at 50% overlap ( $ R\approx M/2$ ) (Since normally $ M$ is odd, `` $ R\approx M/2$ '' means ``R=(M-1)/2,'' etc.)
  • Hamming window at 50% overlap ( $ R\approx M/2$ )
In addition, we can mention the following cases (referring to window types discussed in Chapter 3):
  • Rectangular window at 50% overlap ( $ R\approx M/2$ )
  • Hamming window at 75% overlap ($ R=M/4=25$ % hop size)
  • Any member of the Generalized Hamming family at 50% overlap
  • Any member of the Blackman family at 2/3 overlap (1/3 hop size); e.g., blackman(33,'periodic'), $ R=11$
  • Any member of the $ L$ -term Blackman-Harris family with $ R\approx M/L$ .
  • Any window with R=1 (``sliding FFT'')
Recall from §3.2.6, that many audio coders use the MLT sine window. The window is applied twice: once before the FFT (the ``analysis window'') and secondly after the inverse FFT prior to reconstruction by overlap-add (the so-called ``synthesis window''). Since the window is effectively squared, it functions as a Hann window for overlap-add purposes (a member of the Generalized Hamming family). As such, it can be used with 50% overlap. More generally, any positive COLA window can be split into an analysis and synthesis window pair by taking its square root.

STFT of COLA Decomposition

To represent practical FFT implementations, it is preferable to shift the $ m^{th}$ frame back to the time origin:

$\displaystyle {\tilde x}_m(n) \isdef x_m(n+mR) \eqsp \hbox{\sc Shift}_{-mR,n}(x_m)$ (9.20)

This is summarized in Fig.8.11. Zero-based frames are needed because the leftmost input sample is assigned to time zero by FFT algorithms. In other words, a hopping FFT effectively redefines time zero on each hop. Thus, a practical STFT is a sequence of FFTs of the zero-based frames $ {\tilde x}_0, {\tilde x}_1, \ldots$ . On the other hand, papers in the literature (such as [7,9]) work with the fixed time-origin case ( $ x_0, x_1,
\ldots$ ). Since they differ only by a time shift, it is not hard to translate back and forth.

% latex2html id marker 21934\psfrag{x}{$x$}\psfrag{Zero-centered 3rd frame x_3: M = 64, R = M/2}%
{\normalsize Zero-centered 3rd frame $x_3$: $M = 64$, $R = M/2$}\psfrag{x_3}{$x_3$} % doesn't work\psfrag{xtilde_3}{${\tilde x}_3$}\begin{figure}[htbp]
\caption{Input signal $x$\ (top), third frame
$x_3$\ in its natural time location (middle), and the third frame
shifted to time 0, ${\tilde x}_3$\ (bottom).}

Note that we may sample the DTFT of both $ x_m$ and $ {\tilde x}_m$ , because both are time-limited to $ M$ nonzero samples. The minimum information-preserving sampling interval along the unit circle in both cases is $ \Omega_M \isdeftext 2\pi/M$ . In practice, we often oversample to some extent, using $ \Omega_N$ with $ N>M$ instead. For $ {\tilde x}_m$ , we get

$\displaystyle {\tilde X}_m(\omega_k)$ $\displaystyle \isdef$ $\displaystyle \hbox{\sc Sample}_{\Omega_N,k}\left(\hbox{\sc DTFT}\left({\tilde x}_m\right)\right)$  
  $\displaystyle =$ $\displaystyle \hbox{\sc DFT}_{N,k}({\tilde x}_m),
\protect$ (9.21)

where $ \omega_k \isdef 2\pi k/N = k\Omega_N$ . For $ x_m$ we have

X_m(\omega_k) &\isdef & \hbox{\sc Sample}_{\Omega_N,k}\left(\hbox{\sc DTFT}(x_m)\right)\\
&\longleftrightarrow& \hbox{\sc Alias}_N(x_m)\\
&\neq& {\tilde x}_m \; \hbox{(in general).}

Since $ {\tilde x}_m = \hbox{\sc Shift}_{-mR}(x_m)$ , their transforms are related by the shift theorem:

{\tilde X}_m(\omega_k) &=& e^{jmR\omega_k} X_m(\omega_k) \\
{\tilde x}_m(n) &=& \hbox{\sc Alias}_{N,n+mR}(x_m)\\
&=& x_m(n+mR)_N

where $ (n+mR)_N$ denotes modulo $ N$ indexing (appropriate since the DTFTs have been sampled at intervals of $ \Omega_N = 2\pi/N$ ).

Acyclic Convolution

Getting back to acyclic convolution, we may write it as

y(n) &=& (x * h)(n), \quad n\in{\bf Z}\\
&=& \left ( \left( \sum_m x_m \right) * h \right)(n) \\
&=& \sum_m (x_m * h)(n) \qquad \hbox{(by linearity of convolution)}\\
&=& \sum_m (( \hbox{\sc Shift}_{mR}({\tilde x}_m)) * h)(n) \\
&=& \sum_m \sum_l \hbox{\sc Shift}_{mR,l}({\tilde x}_m)h(n-l) \\
&=& \sum_m \sum_l {\tilde x}_m(l-mR)h(n-l) \\
&=& \sum_m \sum_{l^{\prime}} {\tilde x}_m(l^{\prime})h(n-mR-l^{\prime})) \\
&=& \sum_m \hbox{\sc Shift}_{mR,n}( {\tilde x}_m * h) \\
&=& \sum_m \hbox{\sc Shift}_{mR} ( \hbox{\sc DTFT}^{-1}( {\tilde X}_m \cdot H) ) \\

Since $ {\tilde x}_m$ is time limited to $ [0,\ldots,M-1]$ (or $ [-(M-1)/2,(M-1)/2]$ ), $ {\tilde X}_m$ can be sampled at intervals of $ \Omega_M = 2\pi/M$ without time aliasing. If $ h$ is time-limited to $ [0,L-1]$ , then $ {\tilde x}_m * h$ will be time limited to $ M+L-1$ . Therefore, we may sample $ {\tilde X}_m\cdot H$ at intervals of

$\displaystyle \Omega_{M+L-1} = \frac{2 \pi}{L+M-1}$ (9.22)

or less along the unit circle. This is the dual of the usual sampling theorem.

We conclude that practical FFT acyclic convolution may be carried out using an FFT of any length $ N$ satisfying

$\displaystyle N \ge M+L-1,$ (9.23)

where $ M$ is the frame size and $ L$ is the filter length. Our final expression for $ y(n)$ is

y(n) &=&
\sum_m \hbox{\sc Shift}_{mR,n} \left[\frac{1}{N} \sum_{k=0}^{N-1}
{\tilde H}(\omega_k) {\tilde X}_m(\omega_k) e^{j\omega_k n T}\right]\\
\sum_m \hbox{\sc Shift}_{mR,n}\left\{ \hbox{\sc IFFT}_N[\hbox{\sc FFT}_N({\tilde x}_m)\cdot \hbox{\sc FFT}_N(h)]\right\},

where $ {\tilde X}_m$ is the length $ N$ DFT of the zero-padded $ m^{\mbox{th}}$ frame $ {\tilde x}_m$ , and $ {\tilde H}$ is the length $ N$ DFT of $ h$ , also zero-padded out to length $ N$ , with $ N\geq L+M-1$ .

Note that the terms in the outer sum overlap when $ R<M$ even if $ H(\omega_k)\equiv1$ . In general, an LTI filtering by $ H$ increases the amount of overlap among the frames.

This completes our derivation of FFT convolution between an indefinitely long signal $ x(n)$ and a reasonably short FIR filter $ h(n)$ (short enough that its zero-padded DFT can be practically computed using one FFT).

The fast-convolution processor we have derived is a special case of the Overlap-Add (OLA) method for short-time Fourier analysis, modification, and resynthesis. See [7,9] for more details.

Example of Overlap-Add Convolution

Let's look now at a specific example of FFT convolution:

We will work through the matlab for this example and display the results. First, the simulation parameters:

L = 31;         % FIR filter length in taps
fc = 600;       % lowpass cutoff frequency in Hz
fs = 4000;      % sampling rate in Hz

Nsig = 150;     % signal length in samples
period = round(L/3); % signal period in samples
FFT processing parameters:
M = L;                  % nominal window length
Nfft = 2^(ceil(log2(M+L-1))); % FFT Length
M = Nfft-L+1            % efficient window length
R = M;                  % hop size for rectangular window
Nframes = 1+floor((Nsig-M)/R);  % no. complete frames
Generate the impulse-train test signal:
sig = zeros(1,Nsig);
sig(1:period:Nsig) = ones(size(1:period:Nsig));
Design the lowpass filter using the window method:
epsilon = .0001;     % avoids 0 / 0
nfilt = (-(L-1)/2:(L-1)/2) + epsilon;
hideal = sin(2*pi*fc*nfilt/fs) ./ (pi*nfilt);
w = hamming(L); % FIR filter design by window method
h = w' .* hideal; % window the ideal impulse response

hzp = [h zeros(1,Nfft-L)];  % zero-pad h to FFT size
H = fft(hzp);               % filter frequency response
Carry out the overlap-add FFT processing:
y = zeros(1,Nsig + Nfft); % allocate output+'ringing' vector
for m = 0:(Nframes-1)
    index = m*R+1:min(m*R+M,Nsig); % indices for the mth frame
    xm = sig(index);  % windowed mth frame (rectangular window)
    xmzp = [xm zeros(1,Nfft-length(xm))]; % zero pad the signal
    Xm = fft(xmzp);
    Ym = Xm .* H;               % freq domain multiplication
    ym = real(ifft(Ym))         % inverse transform
    outindex = m*R+1:(m*R+Nfft);
    y(outindex) = y(outindex) + ym; % overlap add

The time waveforms for the first three frames ($ m=0,1,2$ ) are shown in Figures 8.12 through 8.14. Notice how the causal linear-phase filtering results in an overall signal delay by half the filter length. Also, note how frames 0 and 2 contain four impulses, while frame 1 only happens to catch three; this causes no difficulty, and the filtered result remains correct by superposition.

Figure 8.12: OLA Example, Frame 0.
Figure 8.13: OLA Example, Frame 1.
Figure 8.14: OLA Example, Frame 2.

Summary of Overlap-Add FFT Processing

Overlap-add FFT processors provide efficient implementations for FIR filters longer than 100 or so taps on single CPUs. Specifically, we ended up with:

$\displaystyle y = \sum_{m=-\infty}^\infty \hbox{\sc Shift}_{mR} \left( \hbox{\sc DFT}_N^{-1} \left\{ H \cdot \hbox{\sc DFT}_N\left[\hbox{\sc Shift}_{-mR}(x)\cdot w \right]\right\}\right)$ (9.24)

where $ \hbox{\sc Shift}()$ is acyclic in this context. Stated as a procedure, we have the following steps in an overlap-add FFT processor:
Extract the $ m$ th length $ M$ frame of data at time $ mR$ .
Shift it to the base time interval $ [0,M-1]$ (or $ [-(M-1)/2,(M-1)/2]$ ).
Optionally apply a length $ M$ analysis window $ w$ (causal or zero phase, as preferred). For simple LTI filtering, the rectangular window is fine.
Zero-pad the windowed data out to the FFT size $ N$ (a power of 2), such that $ N\geq M+L-1$ , where $ L$ is the FIR filter length.
Take the $ N$ -point FFT.
Apply the filter frequency-response $ H=\hbox{\sc FFT}_N(h)$ as a windowing operation in the frequency domain.
Take the $ N$ -point inverse FFT.
Shift the origin of the $ N$ -point result out to sample $ mR$ where it belongs.
Sum into the output buffer containing the results from prior frames (OLA step).
The condition $ N\geq M+L-1$ is necessary to avoid time aliasing, i.e., to implement acyclic convolution using an FFT; this condition is equivalent to a minimum sampling-rate requirement in the frequency domain.

A second condition is that the analysis window be COLA at the hop size used:

$\displaystyle \sum_m w(n-mR) = 1, \, \forall n\in{\bf Z}.$ (9.25)

Next Section:
Dual of Constant Overlap-Add
Previous Section:
Convolution of Short Signals