
Convolution of Short Signals

Figure 8.1: System diagram for filtering an input signal $ x(n)$ by filter $ h(n)$ to produce output $ y(n)$ as the convolution of $ x$ and $ h$ .

Figure 8.1 illustrates the conceptual operation of filtering an input signal $ x(n)$ by a filter with impulse response $ h(n)$ to produce an output signal $ y(n)$ . By the convolution theorem for DTFTs (§2.3.5),

$\displaystyle (h*x) \;\longleftrightarrow\;H \cdot X$ (9.9)

or

$\displaystyle \hbox{\sc DTFT}_\omega(h*x) = H(\omega)X(\omega)$ (9.10)

where $ h$ and $ x$ are arbitrary real or complex sequences, and $ H$ and $ X$ are the DTFTs of $ h$ and $ x$ , respectively. The convolution of $ x$ and $ h$ is defined by

$\displaystyle y(n) = (x*h)(n) \triangleq \sum_{m=-\infty}^{\infty} x(m)h(n-m).$ (9.11)

In practice, we always use the DFT (preferably an FFT) in place of the DTFT, in which case we may write

$\displaystyle \hbox{\sc DFT}_k(h*x) = H(\omega_k)X(\omega_k)$ (9.12)

where now $ h,x,H,X\in {\bf C}^N$ (length $ N$ complex sequences). It is important to remember that the specific form of convolution implied in the DFT case is cyclic (also called circular) convolution [264]:

$\displaystyle y(n) = (x*h)(n) \triangleq \sum_{m=0}^{N-1} x(m)h(n-m)_N$ (9.13)

where $ (n-m)_N$ means ``$ (n-m)$ modulo $ N$ .''

Another way to look at convolution is as the inner product of $ x$ and $ \hbox{\sc Shift}_n[\hbox{\sc Flip}(h)]$ , where $ \hbox{\sc Flip}_n(h)\triangleq h(-n)=h(N-n)$ , i.e.,

$\displaystyle y(n) = \langle x, \hbox{\sc Shift}_n[\hbox{\sc Flip}(h)] \rangle.$ (9.14)

This form describes graphical convolution in which the output sample at time $ n$ is computed as an inner product of the impulse response after flipping it about time 0 and shifting time 0 to time $ n$ . See [264, p. 105] for an illustration of graphical convolution.
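The inner-product view can be sketched in a few lines of Matlab/Octave. This is only an illustration under stated assumptions: the test vectors are arbitrary, $ h$ is taken to be real and already zero-padded to the length of $ x$ , and the variable names (hflip, hshift) are made up for this sketch.

```matlab
% Cyclic convolution computed as an inner product of x with a
% flipped and cyclically shifted copy of h (h assumed real).
x = [1 2 3 4];
h = [1 2 0 0];                 % already padded to the same length N as x
N = length(x);

% Flip: hflip(m) = h(-m mod N), using 1-based Matlab indexing
hflip = h(mod(-(0:N-1), N) + 1);

y = zeros(1, N);
for n = 0:N-1
    hshift = circshift(hflip, [0, n]);  % Shift_n: cyclic delay by n samples
    y(n+1) = sum(x .* hshift);          % inner product <x, Shift_n[Flip(h)]>
end

% Cross-check against the frequency-domain form of cyclic convolution
yref = real(ifft(fft(x) .* fft(h)));
disp(y);                        % expected: 9 4 7 10
disp(max(abs(y - yref)));       % should be near machine precision
```

Each pass through the loop corresponds to one step of graphical convolution: the flipped impulse response is slid to time $ n$ and summed against $ x$ .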

Cyclic FFT Convolution

Thanks to the convolution theorem, we have two alternate ways to perform cyclic convolution in practice:

  1. Direct calculation in the time domain using (9.13)
  2. Frequency-domain convolution:
    1. Fourier Transform both signals
    2. Perform term by term multiplication of the transformed signals
    3. Inverse transform the result to get back to the time domain
For short convolutions (fewer than a hundred samples or so), method 1 is usually faster. However, for longer convolutions, method 2 is ultimately faster. This is because the computational complexity of direct cyclic convolution of two $ N$ -point signals is $ {\cal O}(N^2)$ , while that of FFT convolution is $ {\cal O}(N \lg N)$ . More precisely, direct cyclic convolution requires $ N^2$ multiplies and $ N(N-1)$ additions, while the exact FFT numbers depend on the particular FFT algorithm used [80,66,224,277]. Some specific cases are compared in §8.1.4 below.
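As a minimal sketch of the two methods (with arbitrary test data chosen for illustration), the following Matlab/Octave fragment evaluates the cyclic convolution sum directly and then via the FFT, and compares the results:

```matlab
x = [1 2 3 4];
h = [1 -1 2 0];
N = length(x);

% Method 1: direct evaluation of the cyclic convolution sum,
% y(n) = sum_m x(m) h((n-m) mod N): N^2 multiplies
y1 = zeros(1, N);
for n = 0:N-1
    for m = 0:N-1
        y1(n+1) = y1(n+1) + x(m+1) * h(mod(n-m, N) + 1);
    end
end

% Method 2: transform both signals, multiply term by term, inverse transform
y2 = real(ifft(fft(x) .* fft(h)));

disp(y1);                      % expected: 3 9 3 5
disp(max(abs(y1 - y2)));       % agreement to within rounding error
```

The double loop makes the $ N^2$ multiply count of method 1 explicit, while method 2 delegates all of the work to three FFT-sized transforms.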

Acyclic FFT Convolution

If we add enough trailing zeros to the signals being convolved, we can obtain acyclic convolution embedded within a cyclic convolution. How many zeros do we need to add? Suppose the signal $ x(n)$ consists of $ N_x$ contiguous nonzero samples at times 0 to $ N_x-1$ , preceded and followed by zeros, and suppose $ h(n)$ is nonzero only over a block of $ N_h$ samples starting at time 0. Then the acyclic convolution of $ x$ with $ h$ reduces to

$\displaystyle (x\ast h)(n) \triangleq \sum_{m=-\infty}^\infty x(m)h(n-m) = \sum_{m=0}^n x(m)h(n-m)$ (9.15)

which is zero for $ n<0$ and $ n>(N_x+N_h-1)-1$ . Thus, the acyclic convolution of $ N_x$ samples with $ N_h$ samples produces at most $ N_x+N_h-1$ nonzero samples.
The number $ N_x+N_h-1$ is easily checked for signals of length 1 since $ \delta\ast \delta = \delta$ , where $ \delta $ is 1 at time zero and 0 at all other times. Similarly,

$\displaystyle [\delta+\hbox{\sc Shift}_1(\delta)] \ast [\delta+\hbox{\sc Shift}_1(\delta)] = \delta + 2\hbox{\sc Shift}_1(\delta) + \hbox{\sc Shift}_2(\delta)$ (9.16)

and so on.

When $ N_x$ or $ N_h$ is infinite, the convolution result can be as short as one sample. For example, consider $ x=[1,r,r^2,r^3,\ldots]$ , with $ \left\vert r\right\vert<1$ , and $ h=[1,-r,0,0,\ldots]$ . Then $ x\ast h = [1, 0, 0, \ldots]$ . This is an example of what is called deconvolution. In the frequency domain, deconvolution always involves a pole-zero cancellation. Therefore, it is only possible when $ N_x$ or $ N_h$ is infinite. In practice, deconvolution can sometimes be accomplished approximately, particularly within narrow frequency bands [119].
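To see why an infinite length is needed, it can help to truncate $ x$ and look at what is left over. The sketch below assumes illustrative values $ r=0.9$ and a truncation length K of 20 samples (neither is from the text):

```matlab
r = 0.9;                 % assumed decay factor, |r| < 1
K = 20;                  % assumed number of retained samples of x
x = r .^ (0:K-1);        % truncated version of [1, r, r^2, ...]
h = [1, -r];

y = conv(x, h);          % acyclic convolution, length K+1

% All interior samples cancel; only the first sample and a
% residual tail term -r^K at the end survive.
disp(y(1));              % 1
disp(max(abs(y(2:K))));  % zero to within rounding error
disp(y(end));            % equals -r^K
```

The tail term $ -r^K$ is what prevents a finite-length $ x$ from collapsing to a single sample; it vanishes only in the limit as K goes to infinity.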

We thus conclude that, to embed acyclic convolution within a cyclic convolution (as provided by an FFT), we need to zero-pad both operands out to length $ N$ , where $ N$ is at least the sum of the operand lengths (minus one).

Acyclic Convolution in Matlab

In Matlab or Octave, the conv function implements acyclic convolution:

octave:1> conv([1 2],[3 4])
ans =
   3  10   8
Note that it returns an output vector which is long enough to accommodate the entire result of the convolution, unlike the filter primitive, which always returns an output signal equal in length to the input signal:
octave:2> filter([1 2],1,[3 4])
ans =
   3  10
octave:3> filter([1 2],1,[3 4 0])
ans =
   3  10   8

Pictorial View of Acyclic Convolution

Figure 8.2: Schematic depiction of the acyclic convolution of two signals.

Figure 8.2 shows schematically the result of convolving two zero-padded signals $ x$ and $ h$ . In this case, the signal $ x(n)$ starts some time after $ n=0$ , say at $ n=n_x$ . Since $ h(n)$ begins at time 0 , the output starts promptly at time $ n_x$ , but it takes some time to ``ramp up'' to full amplitude. (This is the transient response of the FIR filter $ h$ .) If the length of $ h$ is $ N_h$ , then the transient response is finished at time $ n=n_x+N_h-1$ . Next, when the input signal goes to zero at time $ n_x+N_x$ , the output reaches zero $ N_h-1$ samples later (after the filter ``decay time''), or time $ n_x+N_x+N_h-1$ . Thus, the total number of nonzero output samples is $ N_x+N_h-1$ .

If we don't add enough zeros, some of our convolution terms ``wrap around'' and add back upon others (due to modulo indexing). This can be called time-domain aliasing. Zero-padding in the time domain results in more samples (closer spacing) in the frequency domain, i.e., a higher `sampling rate' in the frequency domain. If we have a high enough spectral sampling rate, we can avoid time aliasing.
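The wrap-around can be observed directly in a small Matlab/Octave sketch. The signals here are arbitrary short test vectors, and the FFT length is deliberately chosen too small:

```matlab
x = [1 2 3 4];
h = [1 1 1];
ya = conv(x, h);                 % acyclic result: 1 3 6 9 7 4 (6 samples)

N = 4;                           % too short: N >= 4+3-1 = 6 is required
yc = real(ifft(fft(x, N) .* fft(h, N)));   % fft(.,N) zero-pads to length N

% Predict the time-aliased result by folding the overhanging
% tail of the acyclic result back onto its beginning.
ywrap = ya(1:N);
nover = length(ya) - N;          % number of samples that wrap around
ywrap(1:nover) = ywrap(1:nover) + ya(N+1:end);

disp(yc);                        % 8 7 6 9
disp(ywrap);                     % 8 7 6 9
```

The last two samples of the acyclic result (7 and 4) have been added onto the first two, which is exactly the time-domain aliasing described above. Repeating the computation with N = 8 recovers the acyclic result followed by two zeros.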

The motivation for implementing acyclic convolution using a zero-padded cyclic convolution is that we can use a Cooley-Tukey Fast Fourier Transform (FFT) to implement cyclic convolution when its length $ N$ is a power of 2.

Acyclic FFT Convolution in Matlab

The following example illustrates the implementation of acyclic convolution using a Cooley-Tukey FFT in Matlab:

x = [1 2 3 4];
h = [1 1 1];

nx = length(x);
nh = length(h);
nfft = 2^nextpow2(nx+nh-1)
xzp = [x, zeros(1,nfft-nx)];
hzp = [h, zeros(1,nfft-nh)];
X = fft(xzp);
H = fft(hzp);

Y = H .* X;
format bank;
y = real(ifft(Y)) % zero-padded result
yt = y(1:nx+nh-1) % trim and print
yc = conv(x,h)    % for comparison
Program output:

nfft = 8
y =
  1.00  3.00  6.00  9.00  7.00  4.00  0.00  0.00
yt =
  1.00  3.00  6.00  9.00  7.00  4.00
yc =
     1     3     6     9     7     4

FFT versus Direct Convolution

Using the Matlab test program in [264], FFT convolution was found to be faster than direct convolution starting at length $ N=2^6=64$ (looking only at powers of 2 for the length $ N$ ). FFT convolution was also never significantly slower at shorter lengths, for which ``calling overhead'' dominates.

Running the same test program in 2011, FFT convolution using the fft function was found to be faster than conv for all (power-of-2) lengths. The speed of FFT convolution divided by that of direct convolution started out at 14 for $ N=2$ , fell to a minimum of $ 11$ at $ N=2^7=128$ , above which it started to climb as expected, reaching $ 3,160$ at $ N=2^{16}=65,536$ . Note that this comparison is unfair because the Octave fft function is a dynamically linked, separately compiled module, while conv is written in the Matlab language and thus suffers more overhead from the Matlab interpreter.

An analysis reported in Strum and Kirk [279, p. 521], based on the number of real multiplies, predicts that the FFT is faster starting at length $ 2^7=128$ , and that direct convolution is significantly faster for very short convolutions (e.g., 16 operations for a direct length-4 convolution, versus 176 for the fft function).
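Readers who wish to try such a comparison themselves might start from a sketch like the following. It is not the test program of [264]; the length, the random test data, and the variable names are assumptions made for illustration, and the measured times depend strongly on the machine and on the Matlab or Octave version in use.

```matlab
N = 2^12;                          % assumed operand length (power of 2)
x = randn(1, N);
h = randn(1, N);
Nfft = 2^nextpow2(2*N - 1);        % long enough to avoid time aliasing

tic;
yd = conv(x, h);                   % direct (time-domain) convolution
td = toc;

tic;
yf = real(ifft(fft(x, Nfft) .* fft(h, Nfft)));
yf = yf(1:2*N-1);                  % trim the zero-padded result
tf = toc;

relerr = norm(yd - yf) / norm(yd); % the two results should agree closely
fprintf('direct: %.4g s, FFT: %.4g s, relative error: %.2e\n', td, tf, relerr);
```

A single tic/toc measurement is noisy, so averaging over many runs and sweeping N over a range of powers of 2 would give a more reliable picture.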

See [264] for further discussion of FFT algorithms and their applications.

In digital audio, FIR filters are often hundreds of taps long. For such filters, the FFT method is much faster than direct convolution in the time domain on single CPUs. On GPUs, FFT convolution is faster than direct convolution only for much longer FIR-filter lengths (in the thousands of taps [242]); this is because massively parallel hardware can perform an $ {\cal O}(N^2)$ algorithm (direct convolution) faster than a single CPU can perform an $ {\cal O}(N\,\lg N)$ algorithm (FFT convolution).

Audio FIR Filters

FIR filters shorter than the ear's ``integration time'' can generally be characterized by their magnitude frequency response (no perceivable ``delay effects''). The nominal ``integration time'' of the ear can be defined as the reciprocal of a critical bandwidth of hearing. Using Zwicker's definition of critical bandwidth [305], the smallest critical bandwidth of hearing is approximately 100 Hz (below 500 Hz). Thus, the nominal integration time of the ear is 10 ms below 500 Hz. (Using the equivalent-rectangular-bandwidth (ERB) definition of critical bandwidth [179,269], longer values are obtained.) At a 50 kHz sampling rate, 10 ms is 500 samples. Therefore, FIR filters shorter than the ear's ``integration time,'' i.e., perceptually ``instantaneous,'' can easily be hundreds of taps long (as discussed in the next section). FFT convolution is consequently an important implementation tool for FIR filters in digital audio applications.

Example 1: Low-Pass Filtering by FFT Convolution

In this example, we design and implement a length $ L=257$ FIR lowpass filter having a cut-off frequency at $ f_c = 600$ Hz. The filter is tested on an input signal $ x(n)$ consisting of a sum of sinusoidal components at frequencies $ (440, 880, 1000, 2000)$ Hz. We'll filter a single input frame of length $ M=256$ , which allows the FFT to be $ N=512$ samples (no wasted zero-padding).

% Signal parameters:
f = [ 440 880 1000 2000 ];      % frequencies
M = 256;                        % signal length
Fs = 5000;                      % sampling rate

% Generate a signal by adding up sinusoids:
x = zeros(1,M); % pre-allocate 'accumulator'
n = 0:(M-1);    % discrete-time grid
for fk = f     % loop over the sinusoid frequencies
    x = x + sin(2*pi*n*fk/Fs);
end

Next we design the lowpass filter using the window method:

% Filter parameters:
L = 257;    % filter length
fc = 600;   % cutoff frequency

% Design the filter using the window method:
hsupp = (-(L-1)/2:(L-1)/2);
hideal = (2*fc/Fs)*sinc(2*fc*hsupp/Fs);
h = hamming(L)' .* hideal; % h is our filter

Figure 8.3: FIR filter impulse response (top) and amplitude response (bottom).

Figure 8.3 plots the impulse response and amplitude response of our FIR filter designed by the window method. Next, the signal frame and filter impulse response are zero-padded out to the FFT size and transformed:

% Choose the smallest power of 2 not less than L+M-1
Nfft = 2^(ceil(log2(L+M-1))); % or 2^nextpow2(L+M-1)

% Zero pad the signal and impulse response:
xzp = [ x zeros(1,Nfft-M) ];
hzp = [ h zeros(1,Nfft-L) ];

X = fft(xzp); % signal
H = fft(hzp); % filter

Figure 8.4 shows the input signal spectrum and the filter amplitude response overlaid. We see that only one sinusoidal component falls within the pass-band.

Figure 8.4: Overlay of input signal spectrum and desired lowpass filter pass-band.

Figure 8.5: Output signal magnitude spectrum = magnitude of input spectrum times filter frequency response.

Now we perform cyclic convolution in the time domain using pointwise multiplication in the frequency domain:

Y = X .* H;
The modified spectrum is shown in Fig. 8.5.

The final acyclic convolution is the inverse transform of the pointwise product in the frequency domain. The imaginary part, which should be exactly zero for real $ x$ and $ h$ , is not quite zero due to finite numerical precision:

y = ifft(Y);
relrmserr = norm(imag(y))/norm(y) % check... should be zero
y = real(y);

Figure 8.6: Filtered output signal, with close-up showing the filter start-up transient (``pre-ring'').

Figure 8.6 shows the filter output signal in the time domain. As expected, it looks like a pure tone in steady state. Note the equal amounts of ``pre-ringing'' and ``post-ringing'' due to the use of a linear-phase FIR filter.

For an input signal approximately $ 4000$ samples long, this example is 2-3 times faster than the conv function in Matlab (which is precompiled C code implementing time-domain convolution).

Example 2: Time Domain Aliasing

Figure 8.7 shows the effect of insufficient zero padding, which can be thought of as undersampling in the frequency domain. The result is aliasing in the time domain.

The lowpass filter length is $ L= 65$ and the input signal consists of impulses at times $ 10$ and $ M-(L-1)/4 = 84$ (the 85th sample), where the data frame length is $ M=100$ . To avoid time aliasing (i.e., to implement acyclic convolution using an FFT), we must use an FFT size $ N$ at least as large as $ 85+65-1=149$ . In the figure, the FFT sizes $ 116$ , $ 132$ , and $ 165$ are used. Thus, the first case is heavily time aliased, the second only slightly time aliased (involving only some of the filter's ``ringing'' after the second pulse), and the third is free of time aliasing altogether.
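A rough reconstruction of this experiment is sketched below. Several details are assumptions rather than facts from the text: the cutoff frequency is not stated, so a normalized cutoff of 0.1 cycles per sample is used; the filter is designed by the Hamming-window method as in Example 1 (with the window and sinc written out explicitly to avoid toolbox dependencies); and the second impulse is placed at Matlab index 85, i.e., zero-based time $ M-(L-1)/4$ .

```matlab
L = 65;  M = 100;
fcn = 0.1;                         % ASSUMED normalized cutoff (cycles/sample)

% Window-method lowpass design, written without toolbox functions
k = -(L-1)/2 : (L-1)/2;
hideal = 2*fcn * ones(1, L);       % value of the ideal response at k = 0
nz = (k ~= 0);
hideal(nz) = sin(2*pi*fcn*k(nz)) ./ (pi*k(nz));
w = 0.54 - 0.46*cos(2*pi*(0:L-1)/(L-1));   % Hamming window
h = w .* hideal;

% Input: two impulses (1-based index = time + 1)
x = zeros(1, M);
x(10 + 1) = 1;
x(M - (L-1)/4 + 1) = 1;

ya = conv(x, h);                   % acyclic reference, length M+L-1
Nffts = [116 132 165];
err = zeros(size(Nffts));
for i = 1:length(Nffts)
    N = Nffts(i);
    y = real(ifft(fft(x, N) .* fft(h, N)));   % cyclic convolution of length N
    K = min(N, length(ya));
    err(i) = max(abs(y(1:K) - ya(1:K)));      % deviation caused by wrap-around
    fprintf('N = %3d: max deviation from acyclic result = %.2e\n', N, err(i));
end
```

Under these assumptions, the deviation is largest for the shortest FFT (the main lobe of the second filter response wraps around), much smaller for the middle size (only some post-ring wraps), and at the level of rounding error for the largest size.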

Figure 8.7: Illustration of FFT convolution with insufficient zero padding. From the top: (1) Input signal (two impulses) and lowpass-filter impulse response; (2) heavily time-aliased convolution in which the second filter impulse has wrapped around to low times; (3) slightly time-aliased result in which some of the filter ``post-ring'' from the second pulse wraps around; (4) result with no time aliasing.
