### FFT versus Direct Convolution

Using the Matlab test program in
[264],^{9.1} FFT convolution was found to be faster than direct convolution
starting at length
(looking only at powers of 2 for the
length
).^{9.2} FFT convolution was also never
significantly slower at shorter lengths for which ``calling overhead''
dominates.

Running the same test program in 2011,^{9.3} FFT convolution using the
`fft` function was found to be faster than `conv` for
*all* (power-of-2) lengths. The speed of FFT convolution divided
by that of direct convolution started out at 14 for
, fell to a
minimum of
at
, above which it started to climb as
expected, reaching
at
. Note that this
comparison is unfair because the Octave `fft` function is a
dynamically linked, separately compiled module, while `conv` is
written in the Matlab language and thus suffers more overhead from the
interpreter.
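A comparison of this kind can be sketched as follows. This is a minimal, hypothetical timing loop, not the test program from [264]; the variable names, the range of lengths, and the use of single-shot `tic`/`toc` timings are all assumptions, and measured ratios will vary with the machine and interpreter.

```matlab
% Hypothetical timing sketch (not the test program cited in the text).
% Compares direct convolution (conv) with FFT convolution at power-of-2 lengths.
maxerr = 0;                          % worst relative disagreement seen
for p = 2:12
    N = 2^p;                         % signal and filter length
    x = randn(1,N);
    h = randn(1,N);
    Nfft = 2^nextpow2(2*N-1);        % FFT size large enough to avoid time aliasing

    tic; yd = conv(x,h); td = toc;   % direct (time-domain) convolution

    tic;
    yf = real(ifft(fft(x,Nfft) .* fft(h,Nfft)));
    yf = yf(1:2*N-1);                % keep the acyclic part
    tf = toc;

    maxerr = max(maxerr, max(abs(yd-yf))/max(abs(yd)));
    fprintf('N = %5d: conv %.2e s, fft %.2e s\n', N, td, tf);
end
fprintf('max relative error: %.2e\n', maxerr);
```

Single-shot timings are noisy, so averaging over repeated runs would give more reliable ratios in practice.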

An analysis reported in Strum and Kirk [279, p. 521],
based on the number of real multiplies, predicts that the `fft`
is faster starting at length
, and that direct convolution is
significantly faster for very short convolutions (*e.g.*, 16 operations
for a direct length-4 convolution, versus 176 for the `fft`
function).
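One simple counting model reproduces the two figures quoted above. It is an assumption made here for illustration, not necessarily the model used by Strum and Kirk: direct convolution of two length-$N$ signals is charged $N^2$ real multiplies, while FFT convolution is charged three length-$2N$ FFTs at $(N_{\mathrm{fft}}/2)\log_2 N_{\mathrm{fft}}$ complex multiplies each, plus $N_{\mathrm{fft}}$ complex multiplies for the spectral product, at four real multiplies per complex multiply. The function handles `direct_mults` and `fft_mults` are hypothetical names.

```matlab
% Hypothetical multiply-count model (an assumption, not taken from [279]).
direct_mults = @(N) N.^2;                        % N^2 real multiplies
fft_mults    = @(N) 4*(3*N.*log2(2*N) + 2*N);    % 3 FFTs of size 2N + spectral product

fprintf('length 4: direct = %d, FFT = %d real multiplies\n', ...
        direct_mults(4), fft_mults(4));          % 16 and 176, as quoted in the text

% First power-of-2 length at which this model favors the FFT method
p = 1;
while fft_mults(2^p) >= direct_mults(2^p)
    p = p + 1;
end
Ncross = 2^p;
fprintf('model crossover length: %d\n', Ncross);
```

Counting only multiplies ignores additions, memory traffic, and call overhead, which is one reason measured crossover points can differ from such predictions.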

See [264]^{9.4} for further discussion of FFT algorithms and their applications.

In digital audio, FIR filters are often hundreds of taps long. For such filters, the FFT method is much faster than direct time-domain convolution on a single CPU. On GPUs, FFT convolution is faster than direct convolution only for much longer FIR filters (thousands of taps [242]), because massively parallel hardware can perform an $O(N^2)$ algorithm (direct convolution) faster than a single CPU can perform an $O(N\log N)$ algorithm (FFT convolution).

#### Audio FIR Filters

FIR filters shorter than the ear's ``integration time'' can generally
be characterized by their magnitude frequency response (no perceivable
``delay effects''). The nominal ``integration time'' of the ear can
be defined as the reciprocal of a critical bandwidth of hearing.
Using Zwicker's definition of critical bandwidth
[305], the smallest critical bandwidth of hearing
is approximately 100 Hz (below 500 Hz). Thus, the nominal integration
time of the ear is 10 ms below 500 Hz. (Using the
equivalent-rectangular-bandwidth (ERB) definition of critical
bandwidth [179,269], longer values are obtained.)
At a 50 kHz sampling rate, this is 500 samples. Therefore, FIR
filters shorter than the ear's ``integration time,'' *i.e.*, perceptually
``instantaneous,'' can easily be hundreds of taps long (as discussed
in the next section). FFT convolution is consequently an important
implementation tool for FIR filters in digital audio applications.
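The arithmetic behind the 500-sample figure can be checked in a few lines. In this sketch only the 100 Hz bandwidth and the 50 kHz rate come from the text; the other sampling rates are illustrative assumptions, and the variable names are hypothetical.

```matlab
% Nominal integration time as the reciprocal of the smallest critical bandwidth
bw_hz  = 100;                    % Zwicker critical bandwidth below 500 Hz (from the text)
t_int  = 1/bw_hz;                % integration time in seconds (10 ms)
fs_hz  = [44100 48000 50000];    % 50 kHz is from the text; the others are illustrative
n_taps = round(t_int*fs_hz);     % integration time in samples at each rate
for k = 1:length(fs_hz)
    fprintf('%6d Hz: %d samples\n', fs_hz(k), n_taps(k));
end
```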

#### Example 1: Low-Pass Filtering by FFT Convolution

In this example, we design and implement a length $L=257$ FIR lowpass filter having a cut-off frequency at $f_c=600$ Hz. The filter is tested on an input signal consisting of a sum of sinusoidal components at frequencies $(440, 880, 1000, 2000)$ Hz. We'll filter a single input frame of length $M=256$, which allows the FFT to be $N=512$ samples (no wasted zero-padding).

```matlab
% Signal parameters:
f = [ 440 880 1000 2000 ];      % frequencies
M = 256;                        % signal length
Fs = 5000;                      % sampling rate

% Generate a signal by adding up sinusoids:
x = zeros(1,M); % pre-allocate 'accumulator'
n = 0:(M-1);    % discrete-time grid
for fk = f;
    x = x + sin(2*pi*n*fk/Fs);
end
```

Next we design the lowpass filter using the window method:

```matlab
% Filter parameters:
L = 257;    % filter length
fc = 600;   % cutoff frequency

% Design the filter using the window method:
hsupp = (-(L-1)/2:(L-1)/2);
hideal = (2*fc/Fs)*sinc(2*fc*hsupp/Fs);
h = hamming(L)' .* hideal; % h is our filter
```
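As a quick sanity check on the design, the filter's gain can be evaluated directly at the four test frequencies. The sketch below repeats the window-method design with the `sinc` and Hamming-window formulas written out by hand, so that it runs without toolbox functions; that substitution, and the names `s`, `w`, and `G`, are assumptions made here for illustration.

```matlab
% Window-method lowpass design, written out without toolbox functions
Fs = 5000; L = 257; fc = 600;               % values from the example
f = [440 880 1000 2000];                    % test frequencies (Hz)

hsupp = -(L-1)/2:(L-1)/2;                   % symmetric time support
arg = 2*fc*hsupp/Fs;
s = ones(size(arg));                        % sinc(0) = 1
nz = (arg ~= 0);
s(nz) = sin(pi*arg(nz))./(pi*arg(nz));      % sinc(x) = sin(pi*x)/(pi*x)
hideal = (2*fc/Fs)*s;                       % ideal lowpass impulse response
w = 0.54 - 0.46*cos(2*pi*(0:L-1)/(L-1));    % symmetric Hamming window
h = w .* hideal;

% Amplitude response at each test frequency (direct DTFT evaluation)
G = zeros(size(f));
for k = 1:length(f)
    G(k) = abs(sum(h .* exp(-1j*2*pi*f(k)*hsupp/Fs)));
    fprintf('%5d Hz: gain %8.2f dB\n', f(k), 20*log10(G(k)));
end
```

The 440 Hz component should pass with a gain near 0 dB, while the other three fall in the stop-band.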

Figure 8.3 plots the impulse response and amplitude response of our FIR filter designed by the window method. Next, the signal frame and filter impulse response are zero-padded out to the FFT size and transformed:

```matlab
% Choose the smallest power of 2 greater than or equal to L+M-1
Nfft = 2^(ceil(log2(L+M-1))); % or 2^nextpow2(L+M-1)

% Zero pad the signal and impulse response:
xzp = [ x zeros(1,Nfft-M) ];
hzp = [ h zeros(1,Nfft-L) ];

X = fft(xzp); % signal
H = fft(hzp); % filter
```

Figure 8.4 shows the input signal spectrum and the filter amplitude response overlaid. We see that only one sinusoidal component falls within the pass-band.

Now we perform cyclic convolution in the time domain using pointwise multiplication in the frequency domain:

```matlab
Y = X .* H;
```

The modified spectrum is shown in Fig. 8.5.

The final acyclic convolution is the inverse transform of the pointwise product in the frequency domain. The imaginary part, which should be exactly zero, is slightly nonzero due to finite numerical precision:

```matlab
y = ifft(Y);
relrmserr = norm(imag(y))/norm(y) % check... should be zero
y = real(y);
```
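One way to confirm that the zero-padded FFT product really implements acyclic convolution is to compare it against `conv`. The following self-contained sketch uses the example's signal but, as an assumption made here to avoid toolbox dependencies, substitutes a random length-257 impulse response for the designed lowpass filter; the identity being checked holds for any filter of that length.

```matlab
% Check: zero-padded FFT convolution equals direct (acyclic) convolution
M = 256; L = 257; Fs = 5000;           % lengths and rate from the example
n = 0:(M-1);
x = zeros(1,M);
for fk = [440 880 1000 2000]
    x = x + sin(2*pi*n*fk/Fs);         % same test signal as in the example
end
h = randn(1,L);                        % stand-in filter (assumption)

Nfft = 2^nextpow2(L+M-1);              % 512: exactly L+M-1, so no wasted padding
y  = real(ifft(fft(x,Nfft) .* fft(h,Nfft)));   % fft(x,N) zero-pads to length N
yd = conv(x,h);                        % direct convolution, length L+M-1

relerr = max(abs(y(1:L+M-1) - yd)) / max(abs(yd));
fprintf('Nfft = %d, max relative error = %.2e\n', Nfft, relerr);
```

The two results should agree to within floating-point round-off.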

Figure 8.6 shows the filter output signal in the time
domain. As expected, it looks like a pure tone in steady state. Note
the equal amounts of ``pre-ringing'' and ``post-ringing'' due to the
use of a linear-phase FIR filter.^{9.5}

For an input signal approximately
samples long, this example is
2-3 times faster than the `conv` function in Matlab (which is
precompiled C code implementing time-domain convolution).

#### Example 2: Time Domain Aliasing

Figure 8.7 shows the effect of insufficient zero padding, which can
be thought of as *undersampling in the frequency domain*. We
will see *aliasing* in the time domain results.

The lowpass filter length is
and the input signal consists of
an impulse at times
and
, where the data frame
length is
. To avoid time aliasing (*i.e.*, to implement
acyclic convolution using an FFT), we must use an FFT size
at
least as large as
. In the figure, the FFT sizes
,
, and
are used. Thus, the first case is heavily time
aliased, the second only slightly time aliased (involving only some of
the filter's ``ringing'' after the second pulse), and the third is
free of time aliasing altogether.
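The same effect can be reproduced numerically. The sketch below does not use the figure's parameters; the filter length, frame length, impulse positions, FFT sizes, and the decaying-exponential impulse response are all hypothetical values chosen for illustration. For each FFT size it checks that the cyclic result equals the acyclic result wrapped around modulo the FFT size, which is exactly what time aliasing means.

```matlab
% Time aliasing from insufficient zero padding (all values hypothetical)
L = 33; M = 64;                      % filter length and data-frame length
h = 0.9.^(0:L-1);                    % stand-in "ringing" impulse response
x = zeros(1,M); x(1) = 1; x(M) = 1;  % impulses at times 0 and M-1
ya = conv(x,h);                      % acyclic reference, length L+M-1 = 96

Ns = [64 80 96];                     % FFT sizes: too small, slightly small, sufficient
aliaserr = zeros(size(Ns));          % deviation from the acyclic result
wraperr  = zeros(size(Ns));          % deviation from the wrapped prediction
for i = 1:length(Ns)
    N = Ns(i);
    yc = real(ifft(fft(x,N) .* fft(h,N)));   % cyclic convolution of length N

    yw = zeros(1,N);                 % acyclic result wrapped modulo N
    for k = 1:length(ya)
        idx = mod(k-1,N) + 1;
        yw(idx) = yw(idx) + ya(k);
    end

    nkeep = min(N, length(ya));
    aliaserr(i) = max(abs(yc(1:nkeep) - ya(1:nkeep)));
    wraperr(i)  = max(abs(yc - yw));
    fprintf('N = %3d: alias error %.3g, wrap mismatch %.3g\n', ...
            N, aliaserr(i), wraperr(i));
end
```

With these values the alias error shrinks as the FFT size grows and vanishes (to round-off) once the FFT size reaches `L+M-1`.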
