Mathematics of the DFT

In the signal processing literature, it is common to write the DFT and its inverse in the more pure form below, obtained by setting $ T=1$ in the previous definition:

X(k) &\isdef & \sum_{n=0}^{N-1}x(n)e^{-j{2\pi n k/N}}, \qquad ...
...\sum_{k=0}^{N-1}X(k)e^{j{2\pi n k/N}}, \qquad n=0,1,2,\ldots,N-1

where $ x(n)$ denotes the input signal at time (sample) $ n$, and $ X(k)$ denotes the $ k$th spectral sample. This form is the simplest mathematically, while the previous form is easier to interpret physically.

There are two remaining symbols in the DFT we have not yet defined:

j &\isdef & \sqrt{-1} \\
e &\isdef & \lim_{n\to\infty} \left(1+\frac{1}{n}\right)^{n}
= 2.71828182845905\ldots

The first, $ j=\sqrt{-1}$, is the basis for complex numbers.1.1 As a result, complex numbers will be the first topic we cover in this book (but only to the extent needed to understand the DFT).

The second, $ e = 2.718\ldots\,$, is a (transcendental) real number defined by the above limit. We will derive $ e$ and talk about why it comes up in Chapter 3.

Note that not only do we have complex numbers to contend with, but we have them appearing in exponents, as in

$\displaystyle s_k(n) \isdef e^{j{2\pi n k/N}}.

We will systematically develop what we mean by imaginary exponents in order that such mathematical expressions are well defined.

With $ e$, $ j$, and imaginary exponents understood, we can go on to prove Euler's Identity:

$\displaystyle e^{j\theta} = \cos(\theta) + j\sin(\theta)

Euler's Identity is the key to understanding the meaning of expressions like

$\displaystyle s_k(t_n) \isdef e^{j\omega_k t_n}= \cos(\omega_k t_n) + j\sin(\omega_k t_n).

We'll see that such an expression defines a sampled complex sinusoid, and we'll talk about sinusoids in some detail, particularly from an audio perspective.

Finally, we need to understand what the summation over $ n$ is doing in the definition of the DFT. We'll learn that it should be seen as the computation of the inner product of the signals $ x$ and $ s_k$ defined above, so that we may write the DFT, using inner-product notation, as

$\displaystyle X(k) \isdef \left<x,s_k\right>

where $ s_k(n) \isdef e^{j{2\pi n k/N}}$ is the sampled complex sinusoid at (normalized) radian frequency $ \omega_k T=2\pi k/N$, and the inner product operation $ \left<\,\cdot\,,\,\cdot\,\right>$ is defined by

$\displaystyle \left<x,y\right> \isdef \sum_{n=0}^{N-1}x(n) \overline{y(n)}.

We will show that the inner product of $ x$ with the $ k$th ``basis sinusoid'' $ s_k$ is a measure of ``how much'' of $ s_k$ is present in $ x$ and at ``what phase'' (since it is a complex number).

After the foregoing, the inverse DFT can be understood as the sum of projections of $ x$ onto $ \{s_k\}_{k=0}^{N-1}$; i.e., we'll show

$\displaystyle x(n) = \sum_{k=0}^{N-1}\tilde{X}_k s_k(n), \qquad n=0,1,2,\ldots,N-1


$\displaystyle \tilde{X}_k \isdef \frac{X(k)}{N}

is the coefficient of projection of $ x$ onto $ s_k$. Using the notation $ x\isdef x(\cdot)$ to mean the whole signal $ x(n)$ for all $ n\in [0,N-1]$, the IDFT can be written more simply as

$\displaystyle x = \sum_k \tilde{X}_k s_k.

Note that both the basis sinusoids $ s_k$ and their coefficients of projection $ \tilde{X}_k$ are complex valued in general.

Having completely understood the DFT and its inverse mathematically, we go on to proving various Fourier Theorems, such as the ``shift theorem,'' the ``convolution theorem,'' and ``Parseval's theorem.'' The Fourier theorems provide a basic thinking vocabulary for working with signals in the time and frequency domains. They can be used to answer questions such as

``What happens in the frequency domain if I do [operation x] in the time domain?''
Usually a frequency-domain understanding comes closest to a perceptual understanding of audio processing.

Finally, we will study a variety of practical spectrum analysis examples, using primarily the matlab programming language [67] to analyze and display signals and their spectra.

Next Section:
DFT Math Outline
Previous Section:
Inverse DFT