Derivation of the Discrete Fourier Transform (DFT)
This chapter derives the Discrete
Fourier Transform (
DFT) as a
projection of a length
signal 
onto the set of

sampled
complex
sinusoids generated by the

th
roots of unity.

Geometric Series
Recall that for any
complex number

, the
signal
defines a
geometric sequence,
i.e., each
term is obtained by multiplying the previous term by the (complex) constant

.
A
geometric series is the
sum of a geometric sequence:
If

, the sum can be expressed in
closed form:
Proof: We have
When

,

, by inspection of the definition of

.
A key property of
sinusoids is that they are
orthogonal at different
frequencies. That is,
This is true whether they are complex or real, and whatever amplitude
and phase they may have. All that matters is that the frequencies be
different. Note, however, that the durations must be infinity (in general).
For length
sampled sinusoidal signal segments, such as used
by the
DFT, exact orthogonality holds only for the
harmonics of
the sampling-rate-divided-by-
,
i.e., only for the frequencies (in Hz)
These are the only frequencies that have a
whole number
of periods in
samples (depicted in Fig.
6.2 for

).
6.1
The
complex sinusoids corresponding to the frequencies

are
These sinusoids are generated by the

th
roots of unity in the
complex plane.
As introduced in §
3.12, the
complex numbers
are called the
th roots of unity because each of them satisfies
In particular,

is called a
primitive
th root of unity.
6.2
The

th roots of unity are plotted in the
complex plane in
Fig.
6.1 for

. It is easy to find them graphically
by dividing the unit circle into

equal parts using

points, with
one point anchored at

, as indicated in Fig.
6.1. When

is even, there will be a point at

(corresponding to a
sinusoid
with frequency at exactly half the
sampling rate), while if

is
odd, there is no point at

.
Figure 6.1:
The
roots of unity for
.
![\includegraphics[width=\twidth]{eps/dftfreqs}](http://www.dsprelated.com/josimages_new/mdft/img1016.png) |
The sampled
sinusoids generated by integer powers of the
roots of
unity are plotted in Fig.
6.2. These are the sampled sinusoids

used by the
DFT. Note that taking successively higher integer powers of the
point

on the unit circle
generates samples of the

th
DFT sinusoid, giving
![$ [W_N^k]^n$](http://www.dsprelated.com/josimages_new/mdft/img1018.png)
,

. The

th sinusoid generator

is in turn
the

th

th root of unity (

th power of the primitive

th root
of unity

).
Note that in Fig.
6.2 the range of

is taken to be
![$ [-N/2,N/2-1] = [-4,3]$](http://www.dsprelated.com/josimages_new/mdft/img1020.png)
instead of
![$ [0,N-1]=[0,7]$](http://www.dsprelated.com/josimages_new/mdft/img1021.png)
. This is the most
``physical'' choice since it corresponds with our notion of ``
negative
frequencies.'' However, we may add any integer multiple of

to

without changing the sinusoid indexed by

. In other words,

refers to the same sinusoid

for all integers

.
We now show mathematically that the DFT
sinusoids are exactly
orthogonal.
Let
denote the

th DFT
complex-sinusoid, for

. Then
where the last step made use of the closed-form expression for the sum
of a
geometric series (§
6.1). If

, the
denominator is nonzero while the numerator is zero. This proves
While we only looked at unit amplitude,
zero-phase complex sinusoids, as
used by the DFT, it is readily verified that the (nonzero) amplitude and
phase have no effect on orthogonality.
For

, we follow the previous derivation to the next-to-last step to get
which proves
We can normalize the
DFT sinusoids to obtain an orthonormal set:
The orthonormal sinusoidal basis
signals satisfy
We call these the
normalized DFT sinusoids.
In §
6.10 below, we will project signals onto them to obtain
the
normalized DFT (NDFT).
Given a
signal

, its
DFT is defined
by
6.3
where
or, as it is most often written,
We may also refer to

as the
spectrum of

, and

is the

th
sample of the
spectrum at frequency

.
Thus, the

th sample

of the
spectrum of

is defined
as the
inner product of

with the

th DFT
sinusoid 
. This
definition is

times the
coefficient of projection of

onto

,
i.e.,
The projection of

onto

is
Since the

are
orthogonal and span

, using the main
result of the preceding chapter, we have that the inverse DFT is
given by the sum of the projections
or, as we normally write,
 |
(6.1) |
In summary, the DFT is proportional to the set of coefficients of
projection onto the
sinusoidal basis set, and the inverse DFT is the
reconstruction of the original signal as a superposition of its
sinusoidal projections. This basic ``architecture'' extends to all
linear orthogonal transforms, including wavelets,
Fourier transforms,
Fourier series, the
discrete-time Fourier transform (
DTFT), and
certain
short-time Fourier transforms (
STFT). See Appendix
B
for some of these.
We have defined the DFT from a geometric signal theory point of view,
building on the preceding chapter. See §
7.1.1 for
notation and terminology associated with the DFT.
Frequencies in the ``Cracks''
The
DFT is defined only for frequencies

. If we
are analyzing one or more
periods of an exactly
periodic signal, where the
period is exactly

samples (or some integer divisor of

), then these
really are the only frequencies present in the
signal, and the
spectrum is
actually zero everywhere but at

,
![$ k\in[0,N-1]$](http://www.dsprelated.com/josimages_new/mdft/img1046.png)
.
However, we use the
DFT to analyze arbitrary signals from nature. What happens when a
frequency

is present in a signal

that is not one of the
DFT-
sinusoid frequencies

?
To find out, let's project a length

segment of a
sinusoid at an
arbitrary frequency

onto the

th
DFT sinusoid:
The
coefficient of projection is proportional to
using the closed-form expression for a
geometric series sum once
again. As shown in §
6.3-§
6.4 above,
the sum is

if

and zero at

, for

. However,
the sum is
nonzero at all other frequencies 
.
Since we are only looking at

samples, any
sinusoidal segment can be
projected onto the

DFT sinusoids and be reconstructed exactly by a
linear combination of them. Another way to say this is that the DFT
sinusoids form a
basis for

, so that any length

signal
whatsoever can be expressed as a linear combination of them. Therefore, when
analyzing segments of recorded signals, we must interpret what we see
accordingly.
The typical way to think about this in practice is to consider the DFT
operation as a
digital filter for each

, whose input is

and whose output is

at time

.
6.4 The
frequency response of this
filter is what we just
computed,
6.5 and its magnitude is
(shown in Fig.
6.3a for

). At all other integer values of

, the frequency response is the same but shifted (circularly) left or right so
that the peak is centered on

. The secondary peaks away from

are called
sidelobes of the DFT response, while the main
peak may be called the
main lobe of the response. Since we are
normally most interested in
spectra from an audio perspective, the same
plot is repeated using a
decibel vertical scale in
Fig.
6.3b
6.6(clipped at
dB). We see that the sidelobes are really quite high
from an audio perspective. Sinusoids with frequencies near

, for example, are only attenuated approximately
dB in the DFT
output

.
Figure:
Magnitude frequency response of a particular DFT ``bin'' (where ``bin'' is defined in §6.8).
The solid curve shows the relative contribution of arbitrary frequency
components to the spectral bin at one-fourth the sampling rate.
![\includegraphics[width=\twidth]{eps/dftfilter}](http://www.dsprelated.com/josimages_new/mdft/img1059.png) |
We see that

is sensitive to
all frequencies between
dc
and the
sampling rate
except the other DFT-sinusoid frequencies

for

. This is sometimes called
spectral leakage
or
cross-talk in the
spectrum analysis. Again, there is
no
leakage when the signal being analyzed is truly
periodic and we can choose

to be exactly a period, or some multiple of a period. Normally,
however, this cannot be easily arranged, and spectral leakage can
be a problem.
Note that peak spectral leakage is not reduced by increasing

.
6.7 It can be thought of as being caused by abruptly
truncating a sinusoid at the beginning and/or end of the

-sample time window. Only the DFT sinusoids are not cut off at the
window boundaries. All other frequencies will suffer some truncation
distortion, and the spectral content of the abrupt cut-off or turn-on
transient can be viewed as the source of the sidelobes. Remember
that, as far as the DFT is concerned, the input signal

is the
same as its
periodic extension (more about this in
§
7.1.2). If we repeat

samples of a sinusoid at frequency

(for any

), there will be a ``glitch''
every

samples since the signal is not periodic in

samples.
This glitch can be considered a source of new energy over the entire
spectrum. See
Fig.
8.3 for an example waveform.
To reduce spectral leakage (cross-talk from far-away
frequencies), we typically use a
window
function, such as a
``raised cosine'' window, to
taper the data record gracefully
to zero at both endpoints of the window. As a result of the smooth
tapering, the
main lobe widens and the
sidelobes
decrease in the DFT response. Using no window is better viewed as
using a
rectangular window of length

, unless the signal is
exactly periodic in

samples. These topics are considered further
in Chapter
8.
Spectral Bin Numbers
Since the

th spectral sample

is properly regarded as
a measure of spectral amplitude over a
range of frequencies,
nominally

to

, this range is sometimes called a
frequency bin
(as in a ``storage bin'' for spectral energy).
The frequency index

is called the
bin number, and

can be regarded as the total energy in the

th
bin (see §
7.4.9).
Similar remarks apply to samples of any
bandlimited
function; however, the term ``bin'' is only used in the
frequency
domain, even though it could be assigned exactly the same meaning
mathematically in the time domain.
In the very special case of
truly periodic signals

, for all

, the
DFT may be regarded as
computing the
Fourier series coefficients of

from one
period of its sampled representation

,

. The
period of

must be exactly

seconds for this to work. For the
details, see §
B.3.
Normalized DFT
A more ``theoretically clean'' DFT is obtained by projecting onto the
normalized DFT sinusoids (§
6.5)
In this case, the
normalized DFT (NDFT) of

is
which is also precisely the
coefficient of projection of

onto

.
The inverse normalized DFT is then more simply
While this definition is much cleaner from a ``geometric
signal theory''
point of view, it is rarely used in practice since it requires slightly more
computation than the typical definition. However, note that the only
difference between the forward and inverse transforms in this case
is the sign of the exponent in the kernel. Advantages of the NDFT over the
DFT in
fixed-point implementations (Appendix
G) are
discussed in Appendix
A.
It can be said that only the NDFT provides a proper
change of
coordinates from the time-domain (shifted
impulse basis signals) to
the
frequency-domain (
DFT sinusoid basis signals). That is, only the
NDFT is a pure
rotation in

, preserving both
orthogonality and the unit-
norm
property of the basis functions. The DFT, in contrast, preserves
orthogonality, but the norms of the basis functions grow to

. Therefore, in the present context, the DFT coefficients can be
considered ``denormalized'' frequency-domain coordinates.
The length
DFT is particularly simple, since the basis
sinusoids
are real:
The
DFT sinusoid 
is a sampled constant
signal, while

is a
sampled
sinusoid at half the
sampling rate.
Figure
6.4 illustrates the graphical relationships for the length

DFT of the signal
![$ \underline{x}=[6,2]$](http://www.dsprelated.com/josimages_new/mdft/img1076.png)
.
Figure 6.4:
Graphical interpretation of the length 2 DFT.
![\includegraphics[width=\twidth]{eps/dft2}](http://www.dsprelated.com/josimages_new/mdft/img1077.png) |
Analytically, we compute the DFT to be
and the corresponding projections onto the DFT sinusoids are
Note the lines of
orthogonal projection illustrated in the figure. The
``time domain'' basis consists of the vectors

, and the
orthogonal projections onto them are simply the coordinate axis projections

and

. The ``
frequency domain''
basis vectors are

, and they provide an orthogonal basis set that is rotated

degrees relative to the time-domain basis vectors. Projecting
orthogonally onto them gives

and

, respectively.
The original signal

can be
expressed either as the vector sum of its coordinate projections
(0,...,x(i),...,0), (a time-domain representation), or as the
vector sum of its projections onto the DFT sinusoids (a
frequency-domain representation of the time-domain signal

).
Computing the
coefficients of projection is essentially ``taking the
DFT,'' and constructing

as the vector sum of its projections onto
the DFT sinusoids amounts to ``taking the inverse DFT.''
In summary, the oblique coordinates in Fig.
6.4 are interpreted as
follows:
Matrix Formulation of the DFT
The DFT can be formulated as a complex
matrix multiply, as we show in
this section. (This section can be omitted without affecting what
follows.) For basic definitions regarding
matrices, see
Appendix
H.
The DFT consists of
inner products of the input
signal

with sampled complex
sinusoidal sections

:
By collecting the DFT output samples into a column vector, we have
or
where

denotes the
DFT matrix
![$ \mathbf{S}^\ast_N[k,n]\isdef W_N^{-kn} \isdef
e^{-j2\pi k n/N}$](http://www.dsprelated.com/josimages_new/mdft/img1092.png)
,
i.e.,
The notation

denotes the
Hermitian transpose of the complex matrix

(transposition
and complex conjugation).
Note that the

th column of

is the

th DFT
sinusoid, so
that the

th row of the DFT matrix

is the
complex-conjugate of the

th
DFT sinusoid. Therefore, multiplying
the DFT matrix times a signal vector

produces a column-vector

in which the

th element
![$ \underline{X}[k]$](http://www.dsprelated.com/josimages_new/mdft/img1097.png)
is the inner
product of the

th DFT
sinusoid with

, or
![$ \underline{X}[k]=\underline{s}^\ast_k\underline{x}
= \left<\underline{x},s_k\right>$](http://www.dsprelated.com/josimages_new/mdft/img1098.png)
, as expected.
Computation of the DFT matrix in
Matlab is illustrated in §
I.4.3.
The
inverse DFT matrix is simply

. That is,
we can perform the inverse DFT operation as
 |
(6.2) |
Since the forward DFT is

,
substituting

from Eq.

(
6.2) into the forward DFT
leads quickly to the conclusion that
 |
(6.3) |
This equation succinctly states that the
columns of
are orthogonal, which, of course, we already knew.
I.e.,

for

, and

:
The
normalized DFT matrix is given by
and the corresponding
normalized
inverse DFT matrix
is simply

, so that Eq.

(
6.3) becomes
This implies that the columns of

are
orthonormal. Such a
complex matrix is said to be
unitary.
When a
real matrix

satisfies

, then

is said to be
orthogonal.
``Unitary'' is the generalization of ``orthogonal'' to
complex matrices.
See
http://ccrma.stanford.edu/~jos/mdftp/DFT_Problems.html
Next Section: Fourier Theorems for the DFTPrevious Section: Geometric Signal Theory