Geometric Signal Theory

This chapter provides an introduction to the elements of geometric signal theory, including vector spaces, norms, inner products, orthogonality, projection of one signal onto another, and elementary vector space operations. First, however, we will ``get our bearings'' with respect to the DFT.


For a length $ N$ complex sequence $ x(n)$, $ n=0,1,2,\ldots,N-1$, the discrete Fourier transform (DFT) is defined by

$\displaystyle X(\omega_k) \isdef \sum_{n=0}^{N-1}x(n) e^{-j\omega_k t_n} = \sum_{n=0}^{N-1}x(n) e^{-j 2\pi kn/N},
\quad k=0,1,2,\ldots,N-1

where

t_n &\isdef & nT = \mbox{$n$th sampling instant (sec)} \\
\omega_k &\isdef & k\Omega = \mbox{$k$th frequency sample (rad/sec)} \\
\Omega &\isdef & 2\pi f_s/N = \mbox{frequency sampling interval (rad/sec)}

We are now in a position to have a full understanding of the transform kernel:

$\displaystyle e^{-j\omega_k t_n} = \cos(\omega_k t_n) - j \sin(\omega_k t_n)

The kernel consists of samples of a complex sinusoid at $ N$ discrete frequencies $ \omega_k$ uniformly spaced between 0 and the sampling rate $ \omega_s \isdeftext 2\pi f_s$. All that remains is to understand the purpose and function of the summation over $ n$ of the pointwise product of $ x(n)$ times each complex sinusoid. We will learn that this can be interpreted as an inner product operation which computes the coefficient of projection of the signal $ x$ onto the complex sinusoid $ \cos(\omega_k t_n) + j \sin(\omega_k t_n)$. As such, $ X(\omega_k)$, the DFT at frequency $ \omega_k$, is a measure of the amplitude and phase of the complex sinusoid which is present in the input signal $ x$ at that frequency. This is the basic function of all linear transform summations (in discrete time) and integrals (in continuous time) and their kernels.
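The summation can also be checked numerically. Below is a minimal numpy sketch (the function name `dft` and the test signal are ours, for illustration only) that evaluates the DFT sum directly and compares it against a standard FFT routine:

```python
import numpy as np

# Direct evaluation of the DFT sum above:
#   X(w_k) = sum_{n=0}^{N-1} x(n) exp(-j 2 pi k n / N)
# (The function name `dft` is ours, for illustration only.)
def dft(x):
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0])
X = dft(x)

# The direct summation agrees with numpy's FFT implementation.
assert np.allclose(X, np.fft.fft(x))
```

Note that the $k=0$ (``dc'') bin is simply the sum of the samples, since the $k=0$ sinusoid is constant.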

Signals as Vectors

For the DFT, all signals and spectra are length $ N$. A length $ N$ sequence $ x$ can be denoted by $ x(n)$, $ n=0,1,2,\ldots,N-1$, where $ x(n)$ may be real ( $ x\in{\bf R}^N$) or complex ( $ x\in{\bf C}^N$). We now wish to regard $ x$ as a vector5.1 $ \underline{x}$ in an $ N$ dimensional vector space. That is, each sample $ x(n)$ is regarded as a coordinate in that space. A vector $ \underline{x}$ is mathematically a single point in $ N$-space represented by a list of coordinates $ (x_0,x_1,x_2,\ldots,x_{N-1})$ called an $ N$-tuple. (The notation $ x_n$ means the same thing as $ x(n)$.) It can be interpreted geometrically as an arrow in $ N$-space from the origin $ \underline{0}\isdef (0,0,\ldots,0)$ to the point $ \underline{x}\isdef (x_0,x_1,\ldots,x_{N-1})$.

We define the following as equivalent:

$\displaystyle x \isdef \underline{x}\isdef x(\cdot)
\isdef (x_0,x_1,\ldots,x_{N-1})
\isdef [x_0,x_1,\ldots,x_{N-1}]
\isdef [x_0\; x_1\; \cdots\; x_{N-1}]

where $ x_n \isdef x(n)$ is the $ n$th sample of the signal (vector) $ x$. From now on, unless specifically mentioned otherwise, all signals are length $ N$.

The reader comfortable with vectors, vector addition, and vector subtraction may skip to §5.6.

An Example Vector View: $ N=2$

Consider the example two-sample signal $ x=(2, 3)$ graphed in Fig.5.1.

Figure 5.1: A length 2 signal $ x=(2, 3)$ plotted as a vector in 2D space.

Under the geometric interpretation of a length $ N$ signal, each sample is a coordinate in the $ N$ dimensional space. Signals which are only two samples long are not terribly interesting to hear,5.2 but they are easy to plot geometrically.

Vector Addition

Given two vectors in $ {\bf R}^N$, say

\underline{x}&\isdef & (x_0,x_1,\ldots,x_{N-1})\\
\underline{y}&\isdef & (y_0,y_1,\ldots,y_{N-1}),

the vector sum is defined by elementwise addition. If we denote the sum by $ \underline{w}\isdef \underline{x}+\underline{y}$, then we have $ \underline{w}_n = x_n+y_n$ for $ n=0,1,2,\ldots,N-1$. We could also write $ \underline{w}(n) = x(n)+y(n)$ for $ n=0,1,2,\ldots,N-1$ if preferred.

The vector diagram for the sum of two vectors can be found using the parallelogram rule, as shown in Fig.5.2 for $ N=2$, $ \underline{x}=(2,3)$, and $ \underline{y}=(4,1)$.

Figure 5.2: Geometric interpretation of a length 2 vector sum.

Also shown are the lighter construction lines which complete the parallelogram started by $ \underline{x}$ and $ \underline{y}$, indicating where the endpoint of the sum $ \underline{x}+\underline{y}$ lies. Since it is a parallelogram, the two construction lines are congruent to the vectors $ \underline{x}$ and $ \underline{y}$. As a result, the vector sum is often expressed as a triangle by translating the origin of one member of the sum to the tip of the other, as shown in Fig.5.3.

Figure 5.3: Vector sum, translating one vector to the tip of the other.

In the figure, $ \underline{x}$ was translated to the tip of $ \underline{y}$. This depicts $ y+x$, since ``$ x$ picks up where $ y$ leaves off.'' It is equally valid to translate $ \underline{y}$ to the tip of $ \underline{x}$, because vector addition is commutative, i.e., $ \underline{x}+\underline{y}$ = $ \underline{y}+\underline{x}$.

Vector Subtraction

Figure 5.4 illustrates the vector difference $ \underline{w}=\underline{x}-\underline{y}$ between $ \underline{x}=(2,3)$ and $ \underline{y}=(4,1)$. From the coordinates, we compute $ \underline{w}= \underline{x}-\underline{y}= (-2, 2)$.

Figure 5.4: Geometric interpretation of a difference vector.

Note that the difference vector $ \underline{w}$ may be drawn from the tip of $ \underline{y}$ to the tip of $ \underline{x}$ rather than from the origin to the point $ (-2,2)$; this is a customary practice which emphasizes relationships among vectors, but the translation in the plot has no effect on the mathematical definition or properties of the vector. Subtraction, however, is not commutative.

To ascertain the proper orientation of the difference vector $ \underline{w}=\underline{x}-\underline{y}$, rewrite its definition as $ \underline{x}=\underline{y}+\underline{w}$, and then it is clear that the vector $ \underline{x}$ should be the sum of vectors $ \underline{y}$ and $ \underline{w}$, hence the arrowhead is on the correct endpoint. Or remember ``$ x-y$ points to $ x$,'' or ``$ x-y$ is $ x$ from $ y$.''

Scalar Multiplication

A scalar is any constant value used as a scale factor applied to a vector. Mathematically, all of our scalars will be either real or complex numbers.5.3 For example, if $ \underline{x}\in{\bf C}^N$ denotes a vector of $ N$ complex elements, and $ \alpha\in{\bf C}$ denotes a complex scalar, then

$\displaystyle \alpha\, \underline{x}\isdef (\alpha\,x_0, \alpha\,x_1, \ldots, \alpha\,x_{N-1})

denotes the scalar multiplication of $ \underline{x}$ by $ \alpha$. Thus, multiplication of a vector by a scalar is done in the obvious way, which is to multiply each coordinate of the vector by the scalar.

In signal processing, we think of scalar multiplication as applying some constant scale factor to a signal, i.e., multiplying each sample of the signal by the same constant number. For example, a 6 dB boost can be carried out by multiplying each sample of a signal by 2, in which case 2 is the scalar. When the scalar magnitude is greater than one, it is often called a gain factor, and when it is less than one, an attenuation.

Linear Combination of Vectors

A linear combination of vectors is a sum of scalar multiples of those vectors. That is, given a set of $ M$ vectors $ \underline{x}_i$ of the same type,5.4 such as $ {\bf R}^N$ (they must have the same number of elements so they can be added), a linear combination is formed by multiplying each vector by a scalar $ \alpha_i$ and summing to produce a new vector $ \underline{y}$ of the same type:

$\displaystyle \underline{y}= \alpha_1 \underline{x}_1 + \alpha_2 \underline{x}_2 + \cdots + \alpha_M \underline{x}_M

For example, let $ \underline{x}_1=(1,2,3)$, $ \underline{x}_2=(4,5,6)$, $ \alpha_1=2$, and $ \alpha_2=3$. Then the linear combination of $ \underline{x}_1$ and $ \underline{x}_2$ is given by

$\displaystyle \underline{y}= \alpha_1\underline{x}_1 + \alpha_2\underline{x}_2 = 2\cdot(1,2,3) + 3\cdot(4,5,6)
= (2,4,6)+(12,15,18) = (14,19,24).

In signal processing, we think of a linear combination as a signal mix. Thus, the output of a mixing console may be regarded as a linear combination of the input signal tracks.
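The worked example above can be checked in a few lines (a numpy sketch; the variable names are ours):

```python
import numpy as np

# Linear combination ("signal mix") from the example above.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])
alpha1, alpha2 = 2.0, 3.0

# Multiply each vector by its scalar, then add elementwise.
y = alpha1 * x1 + alpha2 * x2
assert np.array_equal(y, np.array([14.0, 19.0, 24.0]))
```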

Linear Vector Space

A set of vectors may be called a linear vector space if it is closed under linear combinations. That is, given any two vectors $ \underline{x}_1$ and $ \underline{x}_2$ from the set, the linear combination

$\displaystyle \underline{y}= \alpha_1\underline{x}_1 + \alpha_2\underline{x}_2

is also in the set, for all scalars $ \alpha_1$ and $ \alpha_2$. In our context, most generally, the vector coordinates and the scalars can be any complex numbers. Since complex numbers are closed under multiplication and addition, it follows that the set of all vectors in $ {\bf C}^N$ with complex scalars ( $ \alpha\in{\bf C}$) forms a linear vector space. The same can be said of real length-$ N$ vectors in $ {\bf R}^N$ with real scalars ( $ \alpha\in{\bf R}$). However, real vectors with complex scalars do not form a vector space, since scalar multiplication can take a real vector to a complex vector outside of the set of real vectors.

Signal Metrics

This section defines some useful functions of signals (vectors).

The mean of a signal $ x$ (more precisely the ``sample mean'') is defined as the average value of its samples:5.5

$\displaystyle \mu_x \isdef \frac{1}{N}\sum_{n=0}^{N-1}x_n$   $\displaystyle \mbox{(mean of $x$)}$

The total energy of a signal $ x$ is defined as the sum of squared moduli:

$\displaystyle {\cal E}_x \isdef \sum_{n=0}^{N-1}\left\vert x_n\right\vert^2$   $\displaystyle \mbox{(energy of $x$)}$

In physics, energy (the ``ability to do work'') and work are in units of ``force times distance,'' ``mass times velocity squared,'' or other equivalent combinations of units.5.6 In digital signal processing, physical units are routinely discarded, and signals are renormalized whenever convenient. Therefore, $ {\cal E}_x$ is defined above without regard for constant scale factors such as ``wave impedance'' or the sampling interval $ T$.

The average power of a signal $ x$ is defined as the energy per sample:

$\displaystyle {\cal P}_x \isdef \frac{{\cal E}_x}{N} = \frac{1}{N} \sum_{n=0}^{N-1}\left\vert x_n\right\vert^2$   $\displaystyle \mbox{(average power of $x$)}$

Another common description when $ x$ is real is the mean square. When $ x$ is a complex sinusoid, i.e., $ x(n) = Ae^{j(\omega nT +
\phi)}$, then $ {\cal P}_x = A^2$; in other words, for complex sinusoids, the average power equals the instantaneous power which is the amplitude squared. For real sinusoids, $ y_n =$   re$ \left\{x_n\right\}=A\cos(\omega nT+\phi)$, we have $ {\cal P}_y = A^2/2$.

Power is always in physical units of energy per unit time. It therefore makes sense to define the average signal power as the total signal energy divided by its length. We normally work with signals which are functions of time. However, if the signal happens instead to be a function of distance (e.g., samples of displacement along a vibrating string), then the ``power'' as defined here still has the interpretation of a spatial energy density. Power, in contrast, is a temporal energy density.

The root mean square (RMS) level of a signal $ x$ is simply $ \sqrt{{\cal P}_x}$. However, note that in practice (especially in audio work) an RMS level is typically computed after subtracting out any nonzero mean value.
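These metrics are straightforward to compute. The following numpy sketch (the function names are ours, not the text's) verifies the sinusoid power rules stated above:

```python
import numpy as np

# The signal metrics defined above (function names are ours).
def mean(x):   return np.sum(x) / len(x)
def energy(x): return np.sum(np.abs(x) ** 2)
def power(x):  return energy(x) / len(x)
def rms(x):    return np.sqrt(power(x))

N, A = 1000, 3.0
n = np.arange(N)

# Real sinusoid of amplitude A (a whole number of cycles in N samples):
# average power is A^2/2, so the RMS level is A/sqrt(2).
x = A * np.cos(2 * np.pi * 50 * n / N)
assert np.isclose(power(x), A**2 / 2)
assert np.isclose(rms(x), A / np.sqrt(2))

# Complex sinusoid of amplitude A: average power is exactly A^2.
z = A * np.exp(2j * np.pi * 50 * n / N)
assert np.isclose(power(z), A**2)
```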

The variance (more precisely the sample variance) of the signal $ x$ is defined as the power of the signal with its mean removed:5.7

$\displaystyle \sigma_x^2 \isdef \frac{1}{N}\sum_{n=0}^{N-1}\left\vert x_n - \mu_x\right\vert^2$   $\displaystyle \mbox{(sample variance of $x$)}$

It is quick to show that, for real signals, we have

$\displaystyle \sigma_x^2 = {\cal P}_x - \mu_x^2

which is the ``mean square minus the mean squared.'' We think of the variance as the power of the non-constant signal components (i.e., everything but dc). The terms ``sample mean'' and ``sample variance'' come from the field of statistics, particularly the theory of stochastic processes. The field of statistical signal processing [27,33,65] is firmly rooted in statistical topics such as ``probability,'' ``random variables,'' ``stochastic processes,'' and ``time series analysis.'' In this book, we will only touch lightly on a few elements of statistical signal processing in a self-contained way.
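The identity is easy to confirm numerically (a sketch; the test signal is ours):

```python
import numpy as np

# Check: sample variance = mean square minus the mean squared (real signals).
x = np.array([1.0, 2.0, 4.0, 7.0])   # arbitrary real test signal
N = len(x)

mu  = np.sum(x) / N                      # sample mean
P   = np.sum(x**2) / N                   # average power (mean square)
var = np.sum(np.abs(x - mu)**2) / N      # sample variance as defined above

assert np.isclose(var, P - mu**2)
```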

The norm (more specifically, the $ L2$ norm, or Euclidean norm) of a signal $ x$ is defined as the square root of its total energy:

$\displaystyle \Vert x\Vert \isdef \sqrt{{\cal E}_x} = \sqrt{\sum_{n=0}^{N-1}\left\vert x_n\right\vert^2}$   $\displaystyle \mbox{(norm of $x$)}$

We think of $ \Vert x\Vert$ as the length of the vector $ x$ in $ N$-space. Furthermore, $ \Vert x-y\Vert$ is regarded as the distance between $ x$ and $ y$. The norm can also be thought of as the ``absolute value'' or ``radius'' of a vector.5.8

Other Lp Norms

Since our main norm is the square root of a sum of squares,

$\displaystyle \Vert x\Vert \isdef \sqrt{{\cal E}_x} = \sqrt{\sum_{n=0}^{N-1}\left\vert x_n\right\vert^2}$   $\displaystyle \mbox{(norm of $x$)}$$\displaystyle ,

we are using what is called an $ L2$ norm and we may write $ \Vert x\Vert _2$ to emphasize this fact.

We could equally well have chosen a normalized $ L2$ norm:

$\displaystyle \Vert x\Vert _{\tilde{2}} \isdef \sqrt{{\cal P}_x} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\left\vert x_n\right\vert^2} \qquad \mbox{(normalized $L2$\ norm of $x$)}

which is simply the ``RMS level'' of $ x$ (``Root Mean Square'').

More generally, the (unnormalized) $ Lp$ norm of $ x\in{\bf C}^N$ is defined as

$\displaystyle \Vert x\Vert _p \isdef \left(\sum_{n=0}^{N-1}\left\vert x_n\right\vert^p\right)^{1/p}.

(The normalized case would include $ 1/N$ in front of the summation.) The most interesting $ Lp$ norms are
  • $ p=1$: The $ L1$, ``absolute value,'' or ``city block'' norm.
  • $ p=2$: The $ L2$, ``Euclidean,'' ``root energy,'' or ``least squares'' norm.
  • $ p=\infty$: The $ L\infty$, ``Chebyshev,'' ``supremum,'' ``minimax,'' or ``uniform'' norm.
Note that the case $ p=\infty$ is a limiting case which becomes

$\displaystyle \Vert x\Vert _\infty = \max_{0\leq n < N} \left\vert x_n\right\vert.
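A short numpy sketch of the $Lp$ norms (the helper name `lp_norm` is ours) shows the $L\infty$ case emerging as a limit:

```python
import numpy as np

# Unnormalized Lp norm as defined above (helper name is ours).
def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 0.0])

assert np.isclose(lp_norm(x, 1), 7.0)   # L1: sum of absolute values
assert np.isclose(lp_norm(x, 2), 5.0)   # L2: Euclidean length

# For large p, the Lp norm approaches the max absolute value (L-infinity).
assert np.isclose(lp_norm(x, 200), np.max(np.abs(x)), atol=1e-2)
```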

Norm Properties

There are many other possible choices of norm. To qualify as a norm on $ {\bf C}^N$, a real-valued signal-function $ f(\underline{x})$ must satisfy the following three properties:

  1. $ f(\underline{x})\ge 0$, with $ 0\Leftrightarrow \underline{x}=\underline{0}$
  2. $ f(\underline{x}+\underline{y})\leq f(\underline{x})+f(\underline{y})$
  3. $ f(c\underline{x}) = \left\vert c\right\vert f(\underline{x})$, $ \forall c\in{\bf C}$
The first property, ``positivity,'' says the norm is nonnegative, and only the zero vector has norm zero. The second property is ``subadditivity'' and is sometimes called the ``triangle inequality'' for reasons that can be seen by studying Fig.5.6. The third property says the norm is ``absolutely homogeneous'' with respect to scalar multiplication. (The scalar $ c$ can be complex, in which case the angle of $ c$ has no effect).

Banach Spaces

Mathematically, what we are working with so far is called a Banach space, which is a normed linear vector space. To summarize, we defined our vectors as any list of $ N$ real or complex numbers which we interpret as coordinates in the $ N$-dimensional vector space. We also defined vector addition (§5.3) and scalar multiplication (§5.5) in the obvious way. To have a linear vector space (§5.7), it must be closed under vector addition and scalar multiplication (linear combinations). I.e., given any two vectors $ \underline{x}\in{\bf C}^N$ and $ \underline{y}\in{\bf C}^N$ from the vector space, and given any two scalars $ \alpha\in{\bf C}$ and $ \beta\in{\bf C}$ from the field of scalars $ {\bf C}$, the linear combination $ \alpha \underline{x}+ \beta\underline{y}$ must also be in the space. Since we have used the field of complex numbers $ {\bf C}$ (or real numbers $ {\bf R}$) to define both our scalars and our vector components, we have the necessary closure properties so that any linear combination of vectors from $ {\bf C}^N$ lies in $ {\bf C}^N$. Finally, the definition of a norm (any norm) elevates a vector space to a Banach space.

The Inner Product

The inner product (or ``dot product'', or ``scalar product'') is an operation on two vectors which produces a scalar. Defining an inner product for a Banach space specializes it to a Hilbert space (or ``inner product space''). There are many examples of Hilbert spaces, but we will only need $ \{{\bf C}^N,{\bf C}\}$ for this book (complex length $ N$ vectors, and complex scalars).

The inner product between (complex) $ N$-vectors $ \underline{u}$ and $ \underline{v}$ is defined by5.9

$\displaystyle \zbox {\left<\underline{u},\underline{v}\right> \isdef \sum_{n=0}^{N-1}u(n)\overline{v(n)}.}

The complex conjugation of the second vector is done in order that a norm will be induced by the inner product:5.10

$\displaystyle \left<\underline{u},\underline{u}\right> = \sum_{n=0}^{N-1}u(n)\overline{u(n)} = \sum_{n=0}^{N-1}\left\vert u(n)\right\vert^2 \isdef {\cal E}_u = \Vert u\Vert^2

As a result, the inner product is conjugate symmetric:

$\displaystyle \left<\underline{v},\underline{u}\right> = \overline{\left<\underline{u},\underline{v}\right>}

Note that the inner product takes $ {\bf C}^N\times{\bf C}^N$ to $ {\bf C}$. That is, two length $ N$ complex vectors are mapped to a complex scalar.
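In numpy this is one line; note the conjugation on the *second* argument (the helper name `inner` is ours — `numpy.vdot` conjugates its first argument instead, so we write the sum out explicitly):

```python
import numpy as np

# Inner product as defined above: conjugate the *second* argument.
def inner(u, v):
    return np.sum(u * np.conj(v))

u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 1j])

# Conjugate symmetry: <v,u> = conj(<u,v>).
assert np.isclose(inner(v, u), np.conj(inner(u, v)))

# <u,u> is the energy of u, a nonnegative real number (||u||^2).
assert np.isclose(inner(u, u), np.sum(np.abs(u)**2))
```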

Linearity of the Inner Product

Any function $ f(\underline{u})$ of a vector $ \underline{u}\in{\bf C}^N$ (which we may call an operator on $ {\bf C}^N$) is said to be linear if for all $ \underline{u}\in{\bf C}^N$ and $ \underline{v}\in{\bf C}^N$, and for all scalars $ \alpha$ and $ \beta $ in $ {\bf C}$,

$\displaystyle f(\alpha \underline{u}+ \beta \underline{v}) = \alpha f(\underline{u}) + \beta f(\underline{v}).

A linear operator thus ``commutes with mixing.'' Linearity consists of two component properties:
  • additivity: $ f(\underline{u}+\underline{v}) = f(\underline{u}) + f(\underline{v})$
  • homogeneity: $ f(\alpha \underline{u}) = \alpha f(\underline{u})$
A function of multiple vectors, e.g., $ f(\underline{u},\underline{v},\underline{w})$ can be linear or not with respect to each of its arguments.

The inner product $ \left<\underline{u},\underline{v}\right>$ is linear in its first argument, i.e., for all $ \underline{u},\underline{v},\underline{w}\in{\bf C}^N$, and for all $ \alpha, \beta\in{\bf C}$,

$\displaystyle \left<\alpha \underline{u}+ \beta \underline{v},\underline{w}\right> = \alpha \left<\underline{u},\underline{w}\right> + \beta \left<\underline{v},\underline{w}\right>.

This is easy to show from the definition:

\left<\alpha \underline{u}+ \beta \underline{v},\underline{w}\right>
&\isdef & \sum_{n=0}^{N-1}\left[\alpha u(n) + \beta v(n)\right]\overline{w(n)} \\
&=& \alpha\sum_{n=0}^{N-1}u(n)\overline{w(n)} + \beta\sum_{n=0}^{N-1}v(n)\overline{w(n)} \\
&=& \alpha \left<\underline{u},\underline{w}\right> + \beta \left<\underline{v},\underline{w}\right>

The inner product is also additive in its second argument, i.e.,

$\displaystyle \left<\underline{u},\underline{v}+ \underline{w}\right> = \left<\underline{u},\underline{v}\right> + \left<\underline{u},\underline{w}\right>,

but it is only conjugate homogeneous (or antilinear) in its second argument, since

$\displaystyle \left<\underline{u},\alpha \underline{v}\right> = \overline{\alpha}\left<\underline{u},\underline{v}\right> \neq \alpha \left<\underline{u},\underline{v}\right>.

The inner product is strictly linear in its second argument with respect to real scalars $ a$ and $ b$:

$\displaystyle \left<\underline{u},a \underline{v}+ b \underline{w}\right> = a \left<\underline{u},\underline{v}\right> + b \left<\underline{u},\underline{w}\right>, \quad a,b\in{\bf R}

where $ \underline{u},\underline{v},\underline{w}\in{\bf C}^N$.

Since the inner product is linear in both of its arguments for real scalars, it may be called a bilinear operator in that context.
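These (anti)linearity rules can be spot-checked numerically (a sketch; all vectors and scalars below are ours):

```python
import numpy as np

def inner(u, v):
    return np.sum(u * np.conj(v))   # conjugation on the second argument

u = np.array([1 + 1j, 2.0])
v = np.array([0.5j, -1.0])
w = np.array([2.0, 1 - 1j])
a, b = 2 - 1j, 0.5 + 2j

# Linear in the first argument:
assert np.isclose(inner(a*u + b*v, w), a*inner(u, w) + b*inner(v, w))

# Additive but only conjugate-homogeneous (antilinear) in the second:
assert np.isclose(inner(u, v + w), inner(u, v) + inner(u, w))
assert np.isclose(inner(u, a*v), np.conj(a) * inner(u, v))
```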

Norm Induced by the Inner Product

We may define a norm on $ \underline{u}\in{\bf C}^N$ using the inner product:

$\displaystyle \zbox {\Vert\underline{u}\Vert \isdef \sqrt{\left<\underline{u},\underline{u}\right>}}

It is straightforward to show that properties 1 and 3 of a norm hold (see §5.8.2). Property 2 follows easily from the Schwarz Inequality which is derived in the following subsection. Alternatively, we can simply observe that the inner product induces the well known $ L2$ norm on $ {\bf C}^N$.

Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality (or ``Schwarz Inequality'') states that for all $ \underline{u}\in{\bf C}^N$ and $ \underline{v}\in{\bf C}^N$, we have

$\displaystyle \zbox {\left\vert\left<\underline{u},\underline{v}\right>\right\vert \leq \Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert}

with equality if and only if $ \underline{u}=c\underline{v}$ for some scalar $ c$.

We can quickly show this for real vectors $ \underline{u}\in{\bf R}^N$, $ \underline{v}\in{\bf R}^N$, as follows: If either $ \underline{u}$ or $ \underline{v}$ is zero, the inequality holds (as equality). Assuming both are nonzero, let's scale them to unit-length by defining the normalized vectors $ \underline{\tilde{u}}\isdeftext
\underline{u}/\Vert\underline{u}\Vert$, $ \underline{\tilde{v}}\isdeftext \underline{v}/\Vert\underline{v}\Vert$, which are unit-length vectors lying on the ``unit ball'' in $ {\bf R}^N$ (a hypersphere of radius $ 1$). We have

0 \leq \Vert\underline{\tilde{u}}-\underline{\tilde{v}}\Vert^2
&=& \left<\underline{\tilde{u}}-\underline{\tilde{v}},\underline{\tilde{u}}-\underline{\tilde{v}}\right> \\
&=& \Vert\underline{\tilde{u}}\Vert^2 - 2\left<\underline{\tilde{u}},\underline{\tilde{v}}\right> + \Vert\underline{\tilde{v}}\Vert^2 \\
&=& 2 - 2\left<\underline{\tilde{u}},\underline{\tilde{v}}\right>

which implies

$\displaystyle \left<\underline{\tilde{u}},\underline{\tilde{v}}\right> \leq 1

or, removing the normalization,

$\displaystyle \left<\underline{u},\underline{v}\right> \leq \Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert.

The same derivation holds if $ \underline{u}$ is replaced by $ -\underline{u}$ yielding

$\displaystyle -\left<\underline{u},\underline{v}\right> \leq \Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert.

The last two equations imply

$\displaystyle \left\vert\left<\underline{u},\underline{v}\right>\right\vert \leq \Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert.

In the complex case, let $ \left<\underline{u},\underline{v}\right>=R e^{j\theta}$, and define $ \underline{\tilde{v}}=\underline{v}e^{j\theta}$. Then $ \left<\underline{u},\underline{\tilde{v}}\right>$ is real and equal to $ \vert\left<\underline{u},\underline{\tilde{v}}\right>\vert=R>0$. By the same derivation as above,

$\displaystyle \left<\underline{u},\underline{\tilde{v}}\right>\leq\Vert\underline{u}\Vert\cdot\Vert\underline{\tilde{v}}\Vert = \Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert.

Since $ \left<\underline{u},\underline{\tilde{v}}\right>=R=\left\vert\left<\underline{u},\underline{\tilde{v}}\right>\right\vert=\left\vert\left<\underline{u},\underline{v}\right>\right\vert$, the result is established also in the complex case.
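A randomized numerical check of the inequality (a sketch; the random vectors and seed are ours):

```python
import numpy as np

def inner(u, v):
    return np.sum(u * np.conj(v))

rng = np.random.default_rng(0)

# |<u,v>| <= ||u|| ||v|| for random complex vectors.
for _ in range(100):
    u = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    v = rng.standard_normal(8) + 1j * rng.standard_normal(8)
    assert abs(inner(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v) + 1e-12

# Equality holds when u = c v for some scalar c.
c = 3 - 2j
u = c * v
assert np.isclose(abs(inner(u, v)), np.linalg.norm(u) * np.linalg.norm(v))
```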

Triangle Inequality

The triangle inequality states that the length of any side of a triangle is less than or equal to the sum of the lengths of the other two sides, with equality occurring only when the triangle degenerates to a line. In $ {\bf C}^N$, this becomes

$\displaystyle \zbox {\Vert\underline{u}+\underline{v}\Vert \leq \Vert\underline{u}\Vert + \Vert\underline{v}\Vert.}

We can show this quickly using the Schwarz Inequality:

\Vert\underline{u}+\underline{v}\Vert^2 &=& \left<\underline{u}+\underline{v},\underline{u}+\underline{v}\right> \\
&=& \Vert\underline{u}\Vert^2 + 2\,\mbox{re}\left\{\left<\underline{u},\underline{v}\right>\right\} + \Vert\underline{v}\Vert^2 \\
&\leq& \Vert\underline{u}\Vert^2 + 2\left\vert\left<\underline{u},\underline{v}\right>\right\vert + \Vert\underline{v}\Vert^2 \\
&\leq& \Vert\underline{u}\Vert^2 + 2\Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert + \Vert\underline{v}\Vert^2 \\
&=& \left(\Vert\underline{u}\Vert + \Vert\underline{v}\Vert\right)^2 \\
\Rightarrow\quad \Vert\underline{u}+\underline{v}\Vert &\leq& \Vert\underline{u}\Vert + \Vert\underline{v}\Vert

Triangle Difference Inequality

A useful variation on the triangle inequality is that the length of any side of a triangle is greater than the absolute difference of the lengths of the other two sides:

$\displaystyle \zbox {\Vert\underline{u}-\underline{v}\Vert \geq \left\vert\Vert\underline{u}\Vert - \Vert\underline{v}\Vert\right\vert}

Proof: By the triangle inequality,

\Vert\underline{v}+ (\underline{u}-\underline{v})\Vert &\leq & \Vert\underline{v}\Vert + \Vert\underline{u}-\underline{v}\Vert \\
\Rightarrow\quad \Vert\underline{u}-\underline{v}\Vert &\geq& \Vert\underline{u}\Vert - \Vert\underline{v}\Vert.

Interchanging $ \underline{u}$ and $ \underline{v}$ establishes the absolute value on the right-hand side.

Vector Cosine

The Cauchy-Schwarz Inequality can be written

$\displaystyle \frac{\left\vert\left<\underline{u},\underline{v}\right>\right\vert}{\Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert} \leq 1.

In the case of real vectors $ \underline{u},\underline{v}$, we can always find a real number $ \theta\in[0,\pi]$ which satisfies

$\displaystyle \zbox {\cos(\theta) \isdef \frac{\left<\underline{u},\underline{v}\right>}{\Vert\underline{u}\Vert\cdot\Vert\underline{v}\Vert}.}

We thus interpret $ \theta$ as the angle between two vectors in $ {\bf R}^N$.


The vectors (signals) $ x$ and $ y$5.11 are said to be orthogonal if $ \left<x,y\right>=0$, denoted $ x\perp y$. That is to say

$\displaystyle \zbox {x\perp y \Leftrightarrow \left<x,y\right>=0.}

Note that if $ x$ and $ y$ are real and orthogonal, the cosine of the angle between them is zero. In plane geometry ($ N=2$), the angle between two perpendicular lines is $ \pi/2$, and $ \cos(\pi/2)=0$, as expected. More generally, orthogonality corresponds to the fact that two vectors in $ N$-space intersect at a right angle and are thus perpendicular geometrically.

Example ($ N=2$):

Let $ x=[1,1]$ and $ y=[1,-1]$, as shown in Fig.5.8.

Figure 5.8: Example of two orthogonal vectors for $ N=2$.

The inner product is $ \left<x,y\right>=1\cdot \overline{1} + 1\cdot\overline{(-1)} = 0$. This shows that the vectors are orthogonal. As marked in the figure, the lines intersect at a right angle and are therefore perpendicular.

The Pythagorean Theorem in N-Space

In 2D, the Pythagorean Theorem says that when $ x$ and $ y$ are orthogonal, as in Fig.5.8, (i.e., when the vectors $ x$ and $ y$ intersect at a right angle), then we have

$\displaystyle \Vert x+y\Vert^2 = \Vert x\Vert^2 + \Vert y\Vert^2$   $\displaystyle \mbox{($x\perp y$)}$$\displaystyle . \protect$ (5.1)

This relationship generalizes to $ N$ dimensions, as we can easily show:
$\displaystyle \Vert x+y\Vert^2$ $\displaystyle =$ $\displaystyle \left<x+y,x+y\right>$  
  $\displaystyle =$ $\displaystyle \left<x,x\right>+\left<x,y\right>+\left<y,x\right>+\left<y,y\right>$  
  $\displaystyle =$ $\displaystyle \Vert x\Vert^2 + \left<x,y\right>+\overline{\left<x,y\right>} + \Vert y\Vert^2$  
  $\displaystyle =$ $\displaystyle \Vert x\Vert^2 + \Vert y\Vert^2 + 2\,\mbox{re}\left\{\left<x,y\right>\right\}$ (5.2)

If $ x\perp y$, then $ \left<x,y\right>=0$ and Eq.$ \,$(5.1) holds in $ N$ dimensions.

Note that the converse is not true in $ {\bf C}^N$. That is, $ \Vert x+y\Vert^2 = \Vert x\Vert^2 + \Vert y\Vert^2$ does not imply $ x\perp y$ in $ {\bf C}^N$. For a counterexample, consider $ x= (j,1)$, $ y=
(1, -j)$, in which case

$\displaystyle \Vert x+y\Vert^2 = \Vert(1+j,\,1-j)\Vert^2 =
4 = \Vert x\Vert^2 + \Vert y\Vert^2

while $ \left<x,y\right> = j\cdot 1 + 1 \cdot\overline{-j} = 2j$.

For real vectors $ x,y\in{\bf R}^N$, the Pythagorean theorem Eq.$ \,$(5.1) holds if and only if the vectors are orthogonal. To see this, note from Eq.$ \,$(5.2) that the Pythagorean theorem holds exactly when re$ \left\{\left<x,y\right>\right\}=0$, i.e., when $ \left<x,y\right>$ is zero or purely imaginary. Since the inner product of two real vectors is real, it cannot be purely imaginary, so it must be zero.
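Both the real case and the complex counterexample above can be verified directly (a numpy sketch; helper names are ours):

```python
import numpy as np

def inner(u, v):
    return np.sum(u * np.conj(v))

def sqnorm(u):
    return np.sum(np.abs(u)**2)

# Real orthogonal pair: the Pythagorean relation holds.
x = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
assert np.isclose(inner(x, y), 0)
assert np.isclose(sqnorm(x + y), sqnorm(x) + sqnorm(y))

# Complex counterexample from the text: the relation holds
# even though <x,y> = 2j is nonzero (only its real part is zero).
x = np.array([1j, 1.0])
y = np.array([1.0, -1j])
assert np.isclose(inner(x, y), 2j)
assert np.isclose(sqnorm(x + y), sqnorm(x) + sqnorm(y))
```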

Note that we also have an alternate version of the Pythagorean theorem:

$\displaystyle x\perp y\,\,\Rightarrow\,\,
\Vert x-y\Vert^2 = \Vert x\Vert^2 + \Vert y\Vert^2


The orthogonal projection (or simply ``projection'') of $ y\in{\bf C}^N$ onto $ x\in{\bf C}^N$ is defined by

$\displaystyle \zbox {{\bf P}_{x}(y) \isdef \frac{\left<y,x\right>}{\Vert x\Vert^2} x.}

The complex scalar $ \left<y,x\right>/\Vert x\Vert^2$ is called the coefficient of projection. When projecting $ y$ onto a unit length vector $ x$, the coefficient of projection is simply the inner product of $ y$ with $ x$.

Motivation: The basic idea of orthogonal projection of $ y$ onto $ x$ is to ``drop a perpendicular'' from $ y$ onto $ x$ to define a new vector along $ x$ which we call the ``projection'' of $ y$ onto $ x$. This is illustrated for $ N=2$ in Fig.5.9 for $ x= [4,1]$ and $ y=[2,3]$, in which case

$\displaystyle {\bf P}_{x}(y) \isdef \frac{\left<y,x\right>}{\Vert x\Vert^2} x
= \frac{2\cdot\overline{4} + 3\cdot\overline{1}}{4^2+1^2} x
= \frac{11}{17} x= \left[\frac{44}{17},\frac{11}{17}\right].

Figure: Projection of $ y$ onto $ x$ in 2D space.

Derivation: (1) Since any projection onto $ x$ must lie along the line collinear with $ x$, write the projection as $ {\bf P}_{x}(y)=\alpha
x$. (2) Since by definition the projection error $ y-{\bf P}_{x}(y)$ is orthogonal to $ x$, we must have

(y-\alpha x) & \perp & x\\
\Rightarrow\quad \left<y-\alpha x, x\right> &=& 0 \\
\Rightarrow\quad \left<y,x\right> &=& \alpha\left<x,x\right> \\
\Rightarrow\quad \alpha &=& \frac{\left<y,x\right>}{\Vert x\Vert^2}.


$\displaystyle {\bf P}_{x}(y) = \frac{\left<y,x\right>}{\Vert x\Vert^2} x.

See §I.3.3 for illustration of orthogonal projection in matlab.
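An analogous sketch in numpy (complementing the matlab illustration cited above; the helper name `project` is ours):

```python
import numpy as np

# Orthogonal projection of y onto x, with coefficient <y,x>/||x||^2.
def project(y, x):
    coeff = np.sum(y * np.conj(x)) / np.sum(np.abs(x)**2)
    return coeff * x

x = np.array([4.0, 1.0])
y = np.array([2.0, 3.0])

p = project(y, x)
assert np.allclose(p, [44/17, 11/17])   # matches the worked example above

# The projection error is orthogonal to x ("drop a perpendicular").
assert np.isclose(np.dot(y - p, x), 0)
```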

Signal Reconstruction from Projections

We now know how to project a signal onto other signals. We now need to learn how to reconstruct a signal $ x\in{\bf C}^N$ from its projections onto $ N$ different vectors $ \sv_k\in{\bf C}^N$, $ k=0,1,2,\ldots,N-1$. This will give us the inverse DFT operation (or the inverse of whatever transform we are working with).

As a simple example, consider the projection of a signal $ x\in{\bf C}^N$ onto the rectilinear coordinate axes of $ {\bf C}^N$. The coordinates of the projection onto the 0th coordinate axis are simply $ (x_0,0,\ldots,0)$. The projection along coordinate axis $ 1$ has coordinates $ (0,x_1,0,\ldots,0)$, and so on. The original signal $ x$ is then clearly the vector sum of its projections onto the coordinate axes:

x &=& (x_0,\ldots,x_{N-1})\\
&=& (x_0,0,\ldots,0) + (0,x_1,0,\ldots,0) + \cdots + (0,\ldots,0,x_{N-1})

To make sure the previous paragraph is understood, let's look at the details for the case $ N=2$. We want to project an arbitrary two-sample signal $ x= (x_0,x_1)$ onto the coordinate axes in 2D. A coordinate axis can be generated by multiplying any nonzero vector by scalars. The horizontal axis can be represented by any vector of the form $ \underline{e}_0=(\alpha,0)$, $ \alpha\neq0$ while the vertical axis can be represented by any vector of the form $ \underline{e}_1=(0,\beta)$, $ \beta\neq 0$. For maximum simplicity, let's choose

\underline{e}_0 &\isdef & [1,0], \\
\underline{e}_1 &\isdef & [0,1].

The projection of $ x$ onto $ \underline{e}_0$ is, by definition,

{\bf P}_{\underline{e}_0}(x) &\isdef & \frac{\left<x,\underline{e}_0\right>}{\Vert\underline{e}_0\Vert^2} \underline{e}_0
= \frac{x_0\cdot 1 + x_1\cdot 0}{1} \underline{e}_0
= x_0 \underline{e}_0\\ [5pt]
&=& [x_0,0].

Similarly, the projection of $ x$ onto $ \underline{e}_1$ is

{\bf P}_{\underline{e}_1}(x) &\isdef & \frac{\left<x,\underline{e}_1\right>}{\Vert\underline{e}_1\Vert^2} \underline{e}_1
= \frac{x_0\cdot 0 + x_1\cdot 1}{1} \underline{e}_1
= x_1 \underline{e}_1\\ [5pt]
&=& [0,x_1].

The reconstruction of $ x$ from its projections onto the coordinate axes is then the vector sum of the projections:

$\displaystyle x= {\bf P}_{\underline{e}_0}(x) + {\bf P}_{\underline{e}_1}(x) = x_0 \underline{e}_0 + x_1 \underline{e}_1
\isdef x_0 \cdot [1,0] + x_1 \cdot [0,1] = (x_0,x_1)

The projection of a vector onto its coordinate axes is in some sense trivial because the very meaning of the coordinates is that they are scalars $ x_n$ to be applied to the coordinate vectors $ \underline{e}_n$ in order to form an arbitrary vector $ x\in{\bf C}^N$ as a linear combination of the coordinate vectors:

$\displaystyle x\isdef x_0 \underline{e}_0 + x_1 \underline{e}_1 + \cdots + x_{N-1} \underline{e}_{N-1}

Note that the coordinate vectors are orthogonal. Since they are also unit length, $ \Vert\underline{e}_n\Vert=1$, we say that the coordinate vectors $ \{\underline{e}_n\}_{n=0}^{N-1}$ are orthonormal.

Changing Coordinates

What's more interesting is when we project a signal $ x$ onto a set of vectors other than the coordinate set. This can be viewed as a change of coordinates in $ {\bf C}^N$. In the case of the DFT, the new vectors will be chosen to be sampled complex sinusoids.

An Example of Changing Coordinates in 2D

As a simple example, let's pick the following pair of new coordinate vectors in 2D:

\sv_0 &\isdef & [1,1] \\
\sv_1 &\isdef & [1,-1]

These happen to be the DFT sinusoids for $ N=2$ having frequencies $ f_0=0$ (``dc'') and $ f_1=f_s/2$ (half the sampling rate). (The sampled complex sinusoids of the DFT reduce to real numbers only for $ N=1$ and $ N=2$.) We already showed in an earlier example that these vectors are orthogonal. However, they are not orthonormal, since the norm is $ \sqrt{2}$ in each case. Let's project $ x$ onto these vectors and see whether we can reconstruct $ x$ by summing the projections.

The projection of $ x$ onto $ \sv_0$ is, by definition,5.12

{\bf P}_{\sv_0}(x) &\isdef & \frac{\left<x,\sv_0\right>}{\Vert\sv_0\Vert^2} \sv_0
= \frac{(x_0 \cdot \overline{1} + x_1 \cdot \overline{1})}{2} \sv_0
= \frac{x_0 + x_1}{2}\sv_0.

Similarly, the projection of $ x$ onto $ \sv_1$ is

{\bf P}_{\sv_1}(x) &\isdef & \frac{\left<x,\sv_1\right>}{\Vert\sv_1\Vert^2} \sv_1
= \frac{(x_0 \cdot \overline{1} + x_1 \cdot \overline{(-1)})}{2} \sv_1
= \frac{x_0 - x_1}{2}\sv_1.

The sum of these projections is then

{\bf P}_{\sv_0}(x) + {\bf P}_{\sv_1}(x) &=&
\frac{x_0 + x_1}{2}(1,1) + \frac{x_0 - x_1}{2}(1,-1) \\ [5pt]
&=& \left(\frac{x_0 + x_1}{2} + \frac{x_0 - x_1}{2},\;
\frac{x_0 + x_1}{2} - \frac{x_0 - x_1}{2}\right) \\ [5pt]
&=& (x_0,x_1) \isdef x.

It worked!
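The same check runs numerically. A Python/NumPy sketch (the `proj` helper is ours, not the book's):

```python
import numpy as np

def proj(x, v):
    # <x, v>/||v||^2 * v; np.vdot conjugates its first argument.
    return (np.vdot(v, x) / np.vdot(v, v)) * v

x  = np.array([5.0, 3.0])
s0 = np.array([1.0, 1.0])     # dc sinusoid for N = 2
s1 = np.array([1.0, -1.0])    # sinusoid at half the sampling rate

p0 = proj(x, s0)              # (x0 + x1)/2 * s0 = [4, 4]
p1 = proj(x, s1)              # (x0 - x1)/2 * s1 = [1, -1]
assert np.allclose(p0 + p1, x)   # the orthogonal pair reconstructs x
```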

Projection onto Linearly Dependent Vectors

Now consider another example:

\sv_0 &\isdef & [1,1], \\
\sv_1 &\isdef & [-1,-1].

The projections of $ x=[x_0,x_1]$ onto these vectors are

{\bf P}_{\sv_0}(x) &=& \frac{x_0 + x_1}{2}\sv_0, \\ [5pt]
{\bf P}_{\sv_1}(x) &=& -\frac{x_0 + x_1}{2}\sv_1.

The sum of the projections is

{\bf P}_{\sv_0}(x) + {\bf P}_{\sv_1}(x) &=&
\frac{x_0 + x_1}{2} (1,1) - \frac{x_0 + x_1}{2} (-1,-1) \\ [5pt]
&=& \left(x_0+x_1,\,x_0+x_1\right) \neq x.

Something went wrong, but what? It turns out that a set of $ N$ vectors can be used to reconstruct an arbitrary vector in $ {\bf C}^N$ from its projections only if they are linearly independent. In general, a set of vectors is linearly independent if none of them can be expressed as a linear combination of the others in the set. What this means intuitively is that they must ``point in different directions'' in $ N$-space. In this example $ \sv_1 = -\sv_0$, so that they lie along the same line in $ 2$-space. As a result, they are linearly dependent: one is a linear combination of the other ( $ \sv_1 = (-1)\sv_0$).
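The failure is visible numerically: both projections land on the same line, so their sum stays on that line. A Python/NumPy sketch (the `proj` helper is ours):

```python
import numpy as np

def proj(x, v):
    # <x, v>/||v||^2 * v; np.vdot conjugates its first argument.
    return (np.vdot(v, x) / np.vdot(v, v)) * v

x  = np.array([5.0, 3.0])
s0 = np.array([1.0, 1.0])
s1 = np.array([-1.0, -1.0])       # s1 = -s0: linearly dependent on s0

total = proj(x, s0) + proj(x, s1)
# Both projections equal [(x0 + x1)/2, (x0 + x1)/2], so the sum is
# (x0 + x1, x0 + x1), which differs from x in general.
assert np.allclose(total, [8.0, 8.0])
assert not np.allclose(total, x)
```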

Projection onto Non-Orthogonal Vectors

Consider this example:

\sv_0 &\isdef & [1,1] \\
\sv_1 &\isdef & [0,1]

These point in different directions, but they are not orthogonal. What happens now? The projections are

{\bf P}_{\sv_0}(x) &=& \frac{x_0 + x_1}{2}\sv_0, \\ [5pt]
{\bf P}_{\sv_1}(x) &=& x_1\sv_1.

The sum of the projections is

{\bf P}_{\sv_0}(x) + {\bf P}_{\sv_1}(x) &=&
\frac{x_0 + x_1}{2}(1,1) + x_1 (0,1) \\ [5pt]
&=& \left(\frac{x_0 + x_1}{2},\;
\frac{x_0 + 3x_1}{2}\right) \\ [5pt]
&\neq& x.

So, even though the vectors are linearly independent, the sum of projections onto them does not reconstruct the original vector. Since the sum of projections worked in the orthogonal case, and since orthogonality implies linear independence, we might conjecture at this point that the sum of projections onto a set of $ N$ vectors will reconstruct the original vector only when the vector set is orthogonal, and this is true, as we will show.
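This failure mode can also be confirmed numerically. A Python/NumPy sketch (the `proj` helper is ours):

```python
import numpy as np

def proj(x, v):
    # <x, v>/||v||^2 * v; np.vdot conjugates its first argument.
    return (np.vdot(v, x) / np.vdot(v, v)) * v

x  = np.array([5.0, 3.0])
s0 = np.array([1.0, 1.0])
s1 = np.array([0.0, 1.0])    # independent of s0, but <s0, s1> = 1 != 0

total = proj(x, s0) + proj(x, s1)
# total = ((x0 + x1)/2, (x0 + 3*x1)/2) = (4, 7) here, not x = (5, 3):
# the projection onto s1 "double counts" the component shared with s0.
assert not np.allclose(total, x)
```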

It turns out that one can apply an orthogonalizing process, called Gram-Schmidt orthogonalization to any $ N$ linearly independent vectors in $ {\bf C}^N$ so as to form an orthogonal set which will always work. This will be derived in Section 5.10.4.

Obviously, there must be at least $ N$ vectors in the set. Otherwise, there would be too few degrees of freedom to represent an arbitrary $ x\in{\bf C}^N$. That is, given the $ N$ coordinates $ \{x(n)\}_{n=0}^{N-1}$ of $ x$ (which are scale factors relative to the coordinate vectors $ \underline{e}_n$ in $ {\bf C}^N$), we have to find at least $ N$ coefficients of projection (which we may think of as coordinates relative to new coordinate vectors $ \sv_k$). If we compute only $ M<N$ coefficients, then we would be mapping a set of $ N$ complex numbers to $ M<N$ numbers. Such a mapping cannot be invertible in general. It also turns out that $ N$ linearly independent vectors are always sufficient. The next section will summarize the general results along these lines.

General Conditions

This section summarizes and extends the above derivations in a more formal manner (following portions of chapter 4 of [47]). In particular, we establish that the sum of projections of $ x\in{\bf C}^N$ onto $ N$ vectors $ \sv_k\in{\bf C}^N$ will give back the original vector $ x$ whenever the set $ \{\sv_k\}_0^{N-1}$ is an orthogonal basis for $ {\bf C}^N$.

Definition: A set of vectors is said to form a vector space if, given any two members $ x$ and $ y$ from the set, the vectors $ x+y$ and $ cx$ are also in the set, where $ c\in{\bf C}$ is any scalar.

Definition: The set of all $ N$-dimensional complex vectors is denoted $ {\bf C}^N$. That is, $ {\bf C}^N$ consists of all vectors $ \underline{x}=
(x_0,\ldots,x_{N-1})$ defined as a list of $ N$ complex numbers $ x_i\in{\bf C}$.

Theorem: $ {\bf C}^N$ is a vector space under elementwise addition and multiplication by complex scalars.

Proof: This is a special case of the following more general theorem.

Theorem: Let $ M\le N$ be an integer greater than 0. Then the set of all linear combinations of $ M$ vectors from $ {\bf C}^N$ forms a vector space under elementwise addition and multiplication by complex scalars.

Proof: Let the original set of $ M$ vectors be denoted $ \sv_0,\ldots,\sv_{M-1}$. Form

$\displaystyle x_0 = \alpha_0 \sv_0 + \cdots + \alpha_{M-1}\sv_{M-1}

as a particular linear combination of $ \{\sv_i\}_{i=0}^{M-1}$. Then

$\displaystyle cx_0 = c\alpha_0 \sv_0 + \cdots + c\alpha_{M-1}\sv_{M-1}

is also a linear combination of $ \{\sv_i\}_{i=0}^{M-1}$, since complex numbers are closed under multiplication ( $ c\alpha_i\in{\bf C}$ for each $ i$). Now, given any second linear combination of $ \{\sv_i\}_{i=0}^{M-1}$,

$\displaystyle x_1 = \beta_0 \sv_0 + \cdots + \beta_{M-1}\sv_{M-1},

the sum is

x_0 + x_1 &=& (\alpha_0 \sv_0 + \cdots + \alpha_{M-1}\sv_{M-1})
+ (\beta_0 \sv_0 + \cdots + \beta_{M-1}\sv_{M-1}) \\ [5pt]
&=& (\alpha_0 + \beta_0) \sv_0 + \cdots + (\alpha_{M-1} + \beta_{M-1})\sv_{M-1}

which is yet another linear combination of the original vectors (since complex numbers are closed under addition). Since we have shown that scalar multiples and vector sums of linear combinations of the original $ M$ vectors from $ {\bf C}^N$ are also linear combinations of those same original $ M$ vectors from $ {\bf C}^N$, we have that the defining properties of a vector space are satisfied. $ \Box$

Note that the closure of vector addition and scalar multiplication are ``inherited'' from the closure of complex numbers under addition and multiplication.

Corollary: The set of all linear combinations of $ M$ real vectors $ x\in{\bf R}^N$, using real scalars $ \alpha_i\in{\bf R}$, forms a vector space.

Definition: The set of all linear combinations of a set of $ M$ complex vectors from $ {\bf C}^N$, using complex scalars, is called a complex vector space of dimension $ M$.

Definition: The set of all linear combinations of a set of $ M$ real vectors from $ {\bf R}^N$, using real scalars, is called a real vector space of dimension $ M$.

Definition: If a vector space consists of the set of all linear combinations of a finite set of vectors $ \sv_0,\ldots,\sv_{M-1}$, then those vectors are said to span the space.

Example: The coordinate vectors in $ {\bf C}^N$ span $ {\bf C}^N$ since every vector $ x\in{\bf C}^N$ can be expressed as a linear combination of the coordinate vectors as

$\displaystyle x= x_0 \underline{e}_0 + x_1 \underline{e}_1 + \cdots + x_{N-1}\underline{e}_{N-1}

where $ x_i\in{\bf C}$, and

\underline{e}_0 &=& (1,0,0,\ldots,0),\\
\underline{e}_1 &=& (0,1,0,\ldots,0), \mbox{ and so on up to }\\
\underline{e}_{N-1} &=& (0,\ldots,0,1).

Definition: The vector space spanned by a set of $ M<N$ vectors from $ {\bf C}^N$ is called an $ M$-dimensional subspace of $ {\bf C}^N$.

Definition: A vector $ \underline{s}\in{\bf C}^N$ is said to be linearly dependent on a set of $ M\le N$ vectors $ \sv_i\in{\bf C}^N$, $ i=0,\ldots,M-1$, if $ \underline{s}$ can be expressed as a linear combination of those $ M$ vectors.

Thus, $ \underline{s}$ is linearly dependent on $ \{\sv_i\}_{i=0}^{M-1}$ if there exist scalars $ \{\alpha_i\}_{i=0}^{M-1}$ such that $ \underline{s}= \alpha_0\sv_0 + \alpha_1\sv_1 + \cdots + \alpha_{M-1}\sv_{M-1}$. Note that the zero vector is linearly dependent on every collection of vectors.

Theorem: (i) If $ \sv_0,\ldots,\sv_{M-1}$ span a vector space, and if one of them, say $ \sv_m$, is linearly dependent on the others, then the same vector space is spanned by the set obtained by omitting $ \sv_m$ from the original set. (ii) If $ \sv_0,\ldots,\sv_{M-1}$ span a vector space, we can always select from these a linearly independent set that spans the same space.

Proof: Any $ x$ in the space can be represented as a linear combination of the vectors $ \sv_0,\ldots,\sv_{M-1}$. By expressing $ \sv_m$ as a linear combination of the other vectors in the set, the linear combination for $ x$ becomes a linear combination of vectors other than $ \sv_m$. Thus, $ \sv_m$ can be eliminated from the set, proving (i). To prove (ii), we can define a procedure for forming the required subset of the original vectors: First, assign $ \sv_0$ to the set. Next, check to see if $ \sv_0$ and $ \sv_1$ are linearly dependent. If so (i.e., $ \sv_1$ is a scalar times $ \sv_0$), then discard $ \sv_1$; otherwise assign it also to the new set. Next, check to see if $ \sv_2$ is linearly dependent on the vectors in the new set. If it is (i.e., $ \sv_2$ is some linear combination of $ \sv_0$ and $ \sv_1$) then discard it; otherwise assign it also to the new set. When this procedure terminates after processing $ \sv_{M-1}$, the new set will contain only linearly independent vectors which span the original space.
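The subset-selection procedure in the proof of (ii) can be sketched as a greedy rank test. In the Python/NumPy sketch below, the function name `independent_subset` is ours, and the matrix-rank check stands in for the pencil-and-paper linear-dependence test:

```python
import numpy as np

def independent_subset(vectors, tol=1e-10):
    """Greedily keep each vector that raises the rank of the set kept so far."""
    kept = []
    for v in vectors:
        candidate = kept + [v]
        # v is linearly independent of `kept` iff the stacked matrix has
        # rank equal to the number of candidate vectors.
        if np.linalg.matrix_rank(np.array(candidate), tol=tol) == len(candidate):
            kept.append(v)
    return kept

vecs = [np.array([1.0, 1.0]),
        np.array([2.0, 2.0]),    # dependent: 2 * [1, 1], gets discarded
        np.array([0.0, 1.0])]
basis = independent_subset(vecs)  # keeps [1, 1] and [0, 1]
```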

Definition: A set of linearly independent vectors which spans a vector space is called a basis for that vector space.

Definition: The set of coordinate vectors in $ {\bf C}^N$ is called the natural basis for $ {\bf C}^N$, where the $ n$th basis vector is

$\displaystyle \underline{e}_n = [\;0\;\;\cdots\;\;0\;\underbrace{1}_{\mbox{$n$th}}\;\;0\;\;\cdots\;\;0].

Theorem: The linear combination expressing a vector in terms of basis vectors for a vector space is unique.

Proof: Suppose a vector $ x\in{\bf C}^N$ can be expressed in two different ways as a linear combination of basis vectors $ \sv_0,\ldots,\sv_{N-1}$:

x&=& \alpha_0 \sv_0 + \cdots + \alpha_{N-1}\sv_{N-1} \\
&=& \beta_0 \sv_0 + \cdots + \beta_{N-1}\sv_{N-1}

where $ \alpha_i\neq\beta_i$ for at least one value of $ i\in[0,N-1]$. Subtracting the two representations gives

$\displaystyle \underline{0}= (\alpha_0 - \beta_0)\sv_0 + \cdots +
(\alpha_{N-1} - \beta_{N-1})\sv_{N-1}.

Since the vectors are linearly independent, it is not possible to cancel the nonzero vector $ (\alpha_i-\beta_i)\sv_i$ using some linear combination of the other vectors in the sum. Hence, $ \alpha_i=\beta_i$ for all $ i=0,1,2,\ldots,N-1$.

Note that while the linear combination relative to a particular basis is unique, the choice of basis vectors is not. For example, given any basis set in $ {\bf C}^N$, a new basis can be formed by rotating all vectors in $ {\bf C}^N$ by the same angle. In this way, an infinite number of basis sets can be generated.

As we will soon show, the DFT can be viewed as a change of coordinates from coordinates relative to the natural basis in $ {\bf C}^N$, $ \{\underline{e}_n\}_{n=0}^{N-1}$, to coordinates relative to the sinusoidal basis for $ {\bf C}^N$, $ \{\sv_k\}_{k=0}^{N-1}$, where $ \sv_k(n) = e^{j\omega_k t_n}$. The sinusoidal basis set for $ {\bf C}^N$ consists of length $ N$ sampled complex sinusoids at frequencies $ \omega_k = 2\pi k f_s/N$, $ k=0,1,2,\ldots,N-1$. Any scaling of these vectors in $ {\bf C}^N$ by complex scale factors could also be chosen as the sinusoidal basis (i.e., any nonzero amplitude and any phase will do). However, for simplicity, we will only use unit-amplitude, zero-phase complex sinusoids as the Fourier ``frequency-domain'' basis set. To summarize this paragraph, the time-domain samples of a signal are its coordinates relative to the natural basis for $ {\bf C}^N$, while its spectral coefficients are the coordinates of the signal relative to the sinusoidal basis for $ {\bf C}^N$.

Theorem: Any two bases of a vector space contain the same number of vectors.

Proof: Left as an exercise (or see [47]).

Definition: The number of vectors in a basis for a particular space is called the dimension of the space. If the dimension is $ N$, the space is said to be an $ N$ dimensional space, or $ N$-space.

In this book, we will only consider finite-dimensional vector spaces in any detail. However, the discrete-time Fourier transform (DTFT) and Fourier transform (FT) both require infinite-dimensional basis sets, because there is an infinite number of points in both the time and frequency domains. (See Appendix B for details regarding the FT and DTFT.)

Signal/Vector Reconstruction from Projections

We now arrive finally at the main desired result for this section:

Theorem: The projections of any vector $ x\in{\bf C}^N$ onto any orthogonal basis set for $ {\bf C}^N$ can be summed to reconstruct $ x$ exactly.

Proof: Let $ \{\sv_0,\ldots,\sv_{N-1}\}$ denote any orthogonal basis set for $ {\bf C}^N$. Then since $ x$ is in the space spanned by these vectors, we have

$\displaystyle x= \alpha_0\sv_0 + \alpha_1\sv_1 + \cdots + \alpha_{N-1}\sv_{N-1} \protect$ (5.3)

for some (unique) scalars $ \alpha_0,\ldots,\alpha_{N-1}$. The projection of $ x$ onto $ \sv_k$ is equal to

$\displaystyle {\bf P}_{\sv_k}(x) = \alpha_0{\bf P}_{\sv_k}(\sv_0) +
\alpha_1{\bf P}_{\sv_k}(\sv_1) + \cdots + \alpha_{N-1}{\bf P}_{\sv_k}(\sv_{N-1})

(using the linearity of the projection operator which follows from linearity of the inner product in its first argument). Since the basis vectors are orthogonal, the projection of $ \sv_l$ onto $ \sv_k$ is zero for $ l\neq k$:

$\displaystyle {\bf P}_{\sv_k}(\sv_l) \isdef \left\{
\begin{array}{ll}
\underline{0}, & l\neq k \\ [5pt]
\sv_k, & l=k
\end{array} \right.

We therefore obtain

$\displaystyle {\bf P}_{\sv_k}(x) = 0 + \cdots + 0 + \alpha_k{\bf P}_{\sv_k}(\sv_k) + 0 + \cdots + 0
= \alpha_k\sv_k.

Therefore, the sum of projections onto the vectors $ \sv_k$, $ k=0,1,\ldots,
N-1$, is just the linear combination of the $ \sv_k$ which forms $ x$:

$\displaystyle \sum_{k=0}^{N-1}
{\bf P}_{\sv_k}(x) = \sum_{k=0}^{N-1} \alpha_k \sv_k = x

by Eq.$ \,$(5.3). $ \Box$
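The theorem is easy to test numerically for an arbitrary orthogonal basis. In the Python/NumPy sketch below, the QR factorization merely manufactures a random orthonormal basis; the reconstruction uses only the sum-of-projections formula from the theorem:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
# Columns of Q from a QR factorization form an orthonormal basis for R^N
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
basis = [Q[:, k] for k in range(N)]

x = rng.standard_normal(N)
# Sum of projections <x, s_k>/||s_k||^2 * s_k over the whole basis
recon = sum((np.vdot(s, x) / np.vdot(s, s)) * s for s in basis)
assert np.allclose(recon, x)
```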

Gram-Schmidt Orthogonalization

Recall from the end of §5.10 above that an orthonormal set of vectors is a set of unit-length vectors that are mutually orthogonal. In other words, an orthonormal vector set is just an orthogonal vector set in which each vector $ \sv_i$ has been replaced by its unit-length normalization $ \sv_i/ \vert\vert\,\sv_i\,\vert\vert $.

Theorem: Given a set of $ N$ linearly independent vectors $ \sv_0,\ldots,\sv_{N-1}$ from $ {\bf C}^N$, we can construct an orthonormal set $ \underline{\tilde{s}}_0,\ldots,\underline{\tilde{s}}_{N-1}$ which are linear combinations of the original set and which span the same space.

Proof: We prove the theorem by constructing the desired orthonormal set $ \{\underline{\tilde{s}}_k\}$ sequentially from the original set $ \{\sv_k\}$. This procedure is known as Gram-Schmidt orthogonalization.

First, note that $ \sv_k\ne \underline{0}$ for all $ k$, since $ \underline{0}$ is linearly dependent on every vector. Therefore, $ \vert\vert\,\sv_k\,\vert\vert \ne 0$ for all $ k$, and the normalizations in the following steps are well defined.
  • Set $ \underline{\tilde{s}}_0 \isdef \frac{\sv_0}{\left\Vert\,\sv_0\,\right\Vert}$.
  • Define $ \underline{x}_1$ as $ \sv_1$ minus the projection of $ \sv_1$ onto $ \underline{\tilde{s}}_0$:

    $\displaystyle \underline{x}_1 \isdef \sv_1 - {\bf P}_{\underline{\tilde{s}}_0}(\sv_1)
= \sv_1 - \left<\sv_1,\underline{\tilde{s}}_0\right>\underline{\tilde{s}}_0

    The vector $ \underline{x}_1$ is orthogonal to $ \underline{\tilde{s}}_0$ by construction. (We subtracted out the part of $ \sv_1$ that wasn't orthogonal to $ \underline{\tilde{s}}_0$.) Also, since $ \sv_1$ and $ \sv_0$ are linearly independent, we have $ \vert\vert\,\underline{x}_1\,\vert\vert \ne 0$.

  • Set $ \underline{\tilde{s}}_1 \isdef \frac{\underline{x}_1}{\left\Vert\,\underline{x}_1\,\right\Vert}$ (i.e., normalize the result of the preceding step).
  • Define $ \underline{x}_2$ as $ \sv_2$ minus the projection of $ \sv_2$ onto $ \underline{\tilde{s}}_0$ and $ \underline{\tilde{s}}_1$:

    $\displaystyle \underline{x}_2 \;\isdef\; \sv_2 - {\bf P}_{\underline{\tilde{s}}_0}(\sv_2) - {\bf P}_{\underline{\tilde{s}}_1}(\sv_2)
= \sv_2 - \left<\sv_2,\underline{\tilde{s}}_0\right>\underline{\tilde{s}}_0 - \left<\sv_2,\underline{\tilde{s}}_1\right>\underline{\tilde{s}}_1

  • Normalize: $ \underline{\tilde{s}}_2 \isdef \frac{\underline{x}_2}{\left\Vert\,\underline{x}_2\,\right\Vert}$.
  • Continue this process until $ \underline{\tilde{s}}_{N-1}$ has been defined.

The Gram-Schmidt orthogonalization procedure will construct an orthonormal basis from any set of $ N$ linearly independent vectors. Obviously, by skipping the normalization step, we could also form simply an orthogonal basis. The key ingredient of this procedure is that each new basis vector is obtained by subtracting out the projection of the next linearly independent vector onto the vectors accepted so far into the set. We may say that each new linearly independent vector $ \sv_k$ is projected onto the subspace spanned by the vectors $ \{\underline{\tilde{s}}_0,\ldots,\underline{\tilde{s}}_{k-1}\}$, and any nonzero projection in that subspace is subtracted out of $ \sv_k$ to make the new vector orthogonal to the entire subspace. In other words, we retain only that portion of each new vector $ \sv_k$ which ``points along'' a new dimension. The first direction is arbitrary and is determined by whatever vector we choose first ($ \sv_0$ here). The second vector is forced to be orthogonal to the first. The third is forced to be orthogonal to the first two (and thus to the 2D subspace spanned by them), and so on.
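The procedure translates almost line for line into code. A minimal Python/NumPy sketch (assuming linearly independent input, so no residual norm is zero; updating the residual in place after each subtraction makes this the numerically better-behaved "modified" variant of the same algorithm):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        # Subtract the projection of v onto each accepted basis vector
        for s in basis:
            v = v - np.vdot(s, v) * s          # <v, s> s, with s unit length
        basis.append(v / np.linalg.norm(v))    # normalize the residual
    return basis

vecs = [np.array([1.0, 1.0, 0.0]),
        np.array([1.0, 0.0, 1.0]),
        np.array([0.0, 1.0, 1.0])]
ortho = gram_schmidt(vecs)
G = np.array(ortho)
assert np.allclose(G @ G.T, np.eye(3))   # rows are orthonormal
```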

This chapter can be considered an introduction to some important concepts of linear algebra. The student is invited to pursue further reading in any textbook on linear algebra, such as [47].5.13

Matlab/Octave examples related to this chapter appear in Appendix I.
