# Fast Fourier Transform (FFT) Algorithms

The term*fast Fourier transform*(FFT) refers to an efficient implementation of the discrete Fourier transform (DFT) for highly composite

^{A.1}transform lengths . When computing the DFT as a set of inner products of length each, the computational complexity is . When is an integer power of 2, a Cooley-Tukey FFT algorithm delivers complexity , where denotes the log-base-2 of , and means ``on the order of ''. Such FFT algorithms were evidently first used by Gauss in 1805 [30] and rediscovered in the 1960s by Cooley and Tukey [16].

In this appendix, a brief introduction is given for various FFT algorithms. A tutorial review (1990) is given in [22]. Additionally, there are some excellent FFT ``home pages'': Pointers to FFT software are given in §A.7.

## Mixed-Radix Cooley-Tukey FFT

When the desired DFT length can be expressed as a product of smaller integers, the Cooley-Tukey decomposition provides what is called a*mixed radix*Cooley-Tukey FFT algorithm.

^{A.2}Two basic varieties of Cooley-Tukey FFT are

*decimation in time*(DIT) and its Fourier dual,

*decimation in frequency*(DIF). The next section illustrates decimation in time.

### Decimation in Time

The DFT is defined bywhere and denote the even- and odd-indexed samples from . Thus, the length DFT is computable using two length DFTs. The complex factors are called

*twiddle factors*. The splitting into sums over even and odd time indexes is called

*decimation in time*. (For

*decimation in frequency*, the inverse DFT of the spectrum is split into sums over even and odd

*bin numbers*.)

### Radix 2 FFT

When is a power of , say where is an integer, then the above DIT decomposition can be performed times, until each DFT is length . A length DFT requires no multiplies. The overall result is called a*radix 2 FFT*. A different radix 2 FFT is derived by performing decimation in frequency. A

*split radix*FFT is theoretically more efficient than a pure radix 2 algorithm [73,31] because it minimizes real arithmetic operations. The term ``split radix'' refers to a DIT decomposition that combines portions of one radix 2 and two radix 4 FFTs [22].

^{A.3}On modern general-purpose processors, however, computation time is often not minimized by minimizing the arithmetic operation count (see §A.7 below).

#### Radix 2 FFT Complexity is N Log N

Putting together the length DFT from the length- DFTs in a radix-2 FFT, the only multiplies needed are those used to combine two small DFTs to make a DFT twice as long, as in Eq.(A.1). Since there are approximately (complex) multiplies needed for each stage of the DIT decomposition, and only stages of DIT (where denotes the log-base-2 of ), we see that the total number of multiplies for a length DFT is reduced from to , where means ``on the order of ''. More precisely, a complexity of means that given any implementation of a length- radix-2 FFT, there exist a constant and integer such that the computational complexity satisfies### Fixed-Point FFTs and NFFTs

Recall (*e.g.*, from Eq.(6.1)) that the inverse DFT requires a division by that the forward DFT does not. In fixed-point arithmetic (Appendix G), and when is a power of 2, dividing by may be carried out by right-shifting places in the binary word. Fixed-point implementations of the inverse Fast Fourier Transforms (FFT) (Appendix A) typically right-shift one place after each Butterfly stage. However, superior overall numerical performance may be obtained by right-shifting after every

*other*butterfly stage [8], which corresponds to dividing both the forward and inverse FFT by (

*i.e.*, is implemented by

*half*as many right shifts as dividing by ). Thus, in fixed-point, numerical performance can be improved, no extra work is required, and the normalization work (right-shifting) is spread equally between the forward and reverse transform, instead of concentrating all right-shifts in the inverse transform. The NDFT is therefore quite attractive for fixed-point implementations. Because signal amplitude can grow by a factor of 2 from one butterfly stage to the next, an extra guard bit is needed for each pair of subsequent NDFT butterfly stages. Also note that if the DFT length is not a power of , the number of right-shifts in the forward and reverse transform must differ by one (because is odd instead of even).

## Prime Factor Algorithm (PFA)

By the prime factorization theorem, every integer can be uniquely factored into a product of prime numbers raised to an integer power :*prime factor algorithm*(PFA), also called the

*Good-Thomas FFT algorithm*, can be used [26,80,35,43,10,83].

^{A.4}The Chinese Remainder Theorem is used to

*re-index*either the input or output samples for the PFA.

^{A.5}Since the PFA is only applicable to mutually prime factors of , it is ideally combined with a mixed-radix Cooley-Tukey FFT, which works for any integer factors. It is interesting to note that the PFA actually predates the Cooley-Tukey FFT paper of 1965 [16], with Good's 1958 work on the PFA being cited in that paper [83]. The PFA and

*Winograd transform*[43] are closely related, with the PFA being somewhat faster [9].

## Rader's FFT Algorithm for Prime Lengths

Rader's FFT algorithm can be used to compute DFTs of length in operations when is a prime number. For an introduction, see the Wikipedia page for Rader's FFT Algorithm:http://en.wikipedia.org/wiki/Rader's_FFT_algorithm

## Bluestein's FFT Algorithm

Like Rader's FFT,*Bluestein's FFT algorithm*(also known as the

*chirp -transform algorithm*), can be used to compute prime-length DFTs in operations [24, pp. 213-215].

^{A.6}However, unlike Rader's FFT, Bluestein's algorithm is not restricted to prime lengths, and it can compute other kinds of transforms, as discussed further below. Beginning with the DFT

^{A.7}we multiply and divide by to obtain

*i.e.*, an instantaneous frequency that increases linearly with time. Such signals are called

*chirp signals*. For this reason, Bluestein's algorithm is also called the chirp -transform algorithm [59]. In summary, Bluestein's FFT algorithm provides complexity for any positive integer DFT-length whatsoever, even when is prime. Other adaptations of the Bluestein FFT algorithm can be used to compute a contiguous subset of DFT frequency samples (any uniformly spaced set of samples along the unit circle), with complexity. It can similarly compute samples of the transform along a sampled spiral of the form , where is any complex number, and , again with complexity [24].

## Fast Transforms in Audio DSP

Since most audio signal processing applications benefit from*zero padding*(see §8.1), in which case we can always choose the FFT length to be a power of 2, there is almost never a need in practice for more ``exotic'' FFT algorithms than the basic ``pruned'' power-of-2 algorithms. (Here ``pruned'' means elimination of all unnecessary operations, such as when the input signal is real [74,21].) An exception is when processing exactly periodic signals where the period is known to be an exact integer number of samples in length.

^{A.8}In such a case, the DFT of one period of the waveform can be interpreted as a

*Fourier series*(§B.3) of the periodic waveform, and unlike virtually all other practical spectrum analysis scenarios, spectral interpolation is not needed (or wanted). In the exactly periodic case, the spectrum is truly zero between adjacent harmonic frequencies, and the DFT of one period provides spectral samples only at the harmonic frequencies. Adaptive FFT software (see §A.7 below) will choose the fastest algorithm available for any desired DFT length. Due to modern processor architectures, execution time is not normally minimized by minimizing arithmetic complexity [23].

## Related Transforms

There are alternatives to the Cooley-Tukey FFT which can serve the same or related purposes and which can have advantages in certain situations [10]. Examples include the fast discrete cosine transform (DCT) [5], discrete Hartley transform [21], and number theoretic transform [2]. The DCT, used extensively in image coding, is described in §A.6.1 below. The Hartley transform, optimized for processing real signals, does not appear to have any advantages over a ``pruned real-only FFT'' [74]. The number theoretic transform has special applicability for large-scale, high-precision calculations (§A.6.2 below).### The Discrete Cosine Transform (DCT)

In image coding (such as MPEG and JPEG), and many audio coding algorithms (MPEG), the*discrete cosine transform*(DCT) is used because of its nearly optimal asymptotic theoretical

*coding gain*.

^{A.9}For 1D signals, one of several DCT definitions (the one called DCT-II)

^{A.10}is given by

where

`dct`and

`dct2`are available for the 1D and 2D cases, respectively.

**Exercise:**Using Euler's identity, expand the cosine in the DCT defined by Eq.(A.2) above into a sum of complex sinusoids, and show that the DCT can be rewritten as the sum of two phase-modulated DFTs:

### Number Theoretic Transform

The*number theoretic transform*is based on generalizing the th primitive root of unity (see §3.12) to a ``quotient ring'' instead of the usual field of complex numbers. Let denote a primitive th root of unity. We have been using in the field of complex numbers, and it of course satisfies , making it a root of unity; it also has the property that visits all of the ``DFT frequency points'' on the unit circle in the plane, as goes from 0 to . In a number theory transform, is an

*integer*which satisfies

*in some order*, as goes from to . Since for all integers (another result from number theory), is also an th root of unity, where is the transform size. (More generally, can be any integer divisor of , in which case we use as the generator of the numbers participating in the transform.) When the number of elements in the transform is composite, a ``fast number theoretic transform'' may be constructed in the same manner as a fast Fourier transform is constructed from the DFT, or as the prime factor algorithm (or Winograd transform) is constructed for products of small mutually prime factors [43]. Unlike the DFT, the number theoretic transform does not transform to a meaningful ``frequency domain''. However, it has analogous theorems, such as the convolution theorem, enabling it to be used for fast convolutions and correlations like the various FFT algorithms. An interesting feature of the number theory transform is that all computations are

*exact*(integer multiplication and addition modulo a prime integer). There is no round-off error. This feature has been used to do fast convolutions to multiply extremely large numbers, such as are needed when computing to millions of digits of precision.

## FFT Software

For those of us in signal processing research, the built-in`fft`function in Matlab (or Octave) is what we use almost all the time. It is adaptive in that it will choose the best algorithm available for the desired transform size. For C or C++ applications, there are several highly optimized FFT variants in the

`FFTW`package (``Fastest Fourier Transform in the West'') [23].

`FFTW`is free for non-commercial or free-software applications under the terms of the GNU General Public License. For embedded DSP applications (software running on special purpose DSP chips), consult your vendor's software libraries and support website for FFT algorithms written in optimized assembly language for your DSP hardware platform. Nearly all DSP chip vendors supply free FFT software (and other signal processing utilities) as a way of promoting their hardware.

**Next Section:**

Fourier Transforms for Continuous/Discrete Time/Frequency

**Previous Section:**

Example Applications of the DFT