Geometric Signal Theory

Free Books Spectral Audio Signal Processing

In general, signals can be expanded as a linear combination of orthonormal basis signals $\varphi_k$ [264]. In the discrete-time case, this can be expressed as

$\displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_{k=-\infty}^{\infty}\left<\varphi_k ,x\right> \varphi_k (n) \protect$	(12.104)
		$\displaystyle n\in(-\infty,\infty), \quad x(n), \varphi_k (n) \in {\cal C}$

where the coefficient of projection of onto $\varphi_k$ is given by

$\displaystyle \left<\varphi_k ,x\right> \; \isdef \sum_{n=-\infty}^{\infty}\varphi_k ^\ast(n) x(n) \qquad \qquad \hbox{(inner product)}$

(12.105)

and the basis signals are orthonormal:

$\displaystyle \left<\varphi_k ,\varphi_l \right> \eqsp \delta(k-l) \eqsp \left\{\begin{array}{ll} 1, & k=l \\ 0, & k\neq l \\ \end{array} \right. \qquad \hbox{(orthonormal)}$

(12.106)

The signal expansion (11.104) can be interpreted geometrically as a sum of orthogonal projections of onto $\{\varphi_k \}$ , as illustrated for 2D in Fig.11.30.

**Figure 11.30:** Orthogonal projection in 2D.
$\includegraphics[width=0.5\twidth]{eps/proj}$

A set of signals $\{h_k,f_k\}_{k=1}^N$ is said to be a biorthogonal basis set if any signal can be represented as

$\displaystyle x = \sum_{k=1}^N \alpha_k\left<x,h_k\right>f_k$

(12.107)

where $\alpha_k$ is some normalizing scalar dependent only on and/or . Thus, in a biorthogonal system, we project onto the signals and resynthesize in terms of the basis .

The following examples illustrate the Hilbert space point of view for various familiar cases of the Fourier transform and STFT. A more detailed introduction appears in Book I [264].

Natural Basis

The natural basis for a discrete-time signal is the set of shifted impulses:

$\displaystyle \varphi_k \isdefs [\ldots, 0,\underbrace{1}_{k^{\hbox{\tiny th}}},0,\ldots],$

(12.108)

or,

$\displaystyle \varphi_k (n) \isdefs \delta(n-k)$

(12.109)

for all integers and . The basis set is orthonormal since $\left<\varphi_k ,\varphi_l \right>=\delta(k-l)$ . The coefficient of projection of onto $\varphi_k$ is given by

$\displaystyle \left<\varphi_k ,x\right> \eqsp x(k)$

(12.110)

so that the expansion of in terms of the natural basis is simply

$\displaystyle x \eqsp \sum_{k=-\infty}^{\infty}x(k) \varphi_k ,$

(12.111)

i.e.,

$\displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_{k=-\infty}^{\infty}x(k) \varphi_k (n)$
	$\displaystyle =$	$\displaystyle \sum_{k=-\infty}^{\infty}x(k) \delta(n-k) \isdefs (x \ast \delta)(n).$

This expansion was used in Book II [263] to derive the impulse-response representation of an arbitrary linear, time-invariant filter.

Normalized DFT Basis for ${\bf C}^N$

The Normalized Discrete Fourier Transform (NDFT) (introduced in Book I [264]) projects the signal onto discrete-time sinusoids of length , where the sinusoids are normalized to have unit norm:

$\displaystyle \varphi_k (n) \isdefs \frac{e^{j\omega_k n}}{\sqrt{N}}, \quad \omega_k \isdef k\frac{2\pi}{N},$

(12.112)

and $n,k \in [0,N-1]$ . The coefficient of projection of onto $\varphi_k$ is given by

$\displaystyle \left<\varphi_k ,x\right>$	$\displaystyle \isdef$	$\displaystyle \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x(n) e^{-j\omega_k n}$
	$\displaystyle \isdef$	$\displaystyle \hbox{NDFT}_{N,k}(x) \isdefs \frac{1}{\sqrt{N}}\hbox{DFT}_{N,k}(x) \isdefs \frac{X(\omega_k )}{\sqrt{N}}$

and the expansion of in terms of the NDFT basis set is

$\displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_{k=0}^{N-1} \left<\varphi_k ,x\right> \varphi_k (n)$
	$\displaystyle =$	$\displaystyle \frac{1}{N} \sum_{k=0}^{N-1} X(k)e^{j\omega_k n}$
	$\displaystyle \isdef$	$\displaystyle \hbox{DFT}_{N,n}^{-1}(X)$

for $n=0,1,2,\ldots,N-1$ .

Normalized Fourier Transform Basis

The Fourier transform projects a continuous-time signal onto an infinite set of continuous-time complex sinusoids $\exp(j\omega t)$ , for $t,\omega\in(-\infty,\infty)$ . These sinusoids all have infinite norm, but a simple normalization by $1/\sqrt{2\pi}$ can be chosen so that the inverse Fourier transform has the desired form of a superposition of projections:

$\displaystyle \varphi _\omega (t) \isdef e^{j\omega t}/\sqrt{2\pi},\quad \omega , t\in (-\infty,\infty)$

(12.113)

$\displaystyle \displaystyle \implies\quad \left<\varphi _\omega ,x\right>$	$\displaystyle =$	$\displaystyle \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt$
	$\displaystyle \isdef$	$\displaystyle \hbox{FT}_\omega (x)/\sqrt{2\pi} \isdef X(\omega )/\sqrt{2\pi}$

$\displaystyle x(t)$	$\displaystyle =$	$\displaystyle \int_{-\infty}^{\infty} \left<\varphi _\omega ,x\right> \varphi _\omega (t) d\omega$
	$\displaystyle =$	$\displaystyle \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega )e^{j\omega t} d\omega$
	$\displaystyle \isdef$	$\displaystyle \hbox{FT}_t^{-1}(X)$

Normalized DTFT Basis

The Discrete Time Fourier Transform (DTFT) is similar to the Fourier transform case:

$\displaystyle \varphi _\omega (n) \isdef e^{j\omega n}/\sqrt{2\pi},\quad \omega \in (-\pi,\pi], \quad n\in (-\infty,\infty)$

(12.114)

The inner product $\left<\varphi _\omega ,x\right>$ and reconstruction of in terms of $\{\varphi _\omega \}$ are left as exercises.

Normalized STFT Basis

The Short Time Fourier Transform (STFT) is defined as a time-ordered sequence of DTFTs, and implemented in practice as a sequence of FFTs (see §7.1). Thus, the signal basis functions are naturally defined as the DFT-sinusoids multiplied by time-shifted windows, suitably normalized for unit norm:

$\displaystyle \varphi_{mk}(n) \isdef \frac{w(n-mR)e^{j\omega_k n}}{\left\Vert\,\hbox{\sc Shift}_{mR}(w) e^{j\omega_k (\cdot)}\,\right\Vert} = \frac{w(n-mR) e^{j\omega_k n}}{\sqrt{\sum_n{w^2(n)}}},$

(12.115)

$\displaystyle \omega_k = \frac{2\pi k}{N}, \quad k \in [0,N-1], \quad n\in (-\infty,\infty),\quad w(n)\in{\cal R},$

(12.116)

and is the DFT length.

When successive windows overlap (i.e., the hop size is less than the window length ), the basis functions are not orgthogonal. In this case, we may say that the basis set is overcomplete.

The basis signals are orthonormal when and the rectangular window is used ( ). That is, two rectangularly windowed DFT sinusoids are orthogonal when either the frequency bin-numbers or the time frame-numbers differ, provided that the window length equals the number of DFT frequencies (no zero padding). In other words, we obtain an orthogonal basis set in the STFT when the hop size, window length, and DFT length are all equal (in which case the rectangular window must be used to retain the perfect-reconstruction property). In this case, we can write

$\displaystyle \varphi_{mk}= \hbox{\sc Shift}_{mN}\left[\hbox{\sc ZeroPad}_\infty\left(\varphi_k ^{\hbox{\tiny DFT}}\right)\right],$

(12.117)

i.e.,

$\displaystyle \varphi_{mk}(n) = \left\{\begin{array}{ll} \frac{e^{j\omega_k n}}{\sqrt{N}}, & mN \leq n \leq (m+1)N-1 \\ [5pt] 0, & \mbox{otherwise.} \\ \end{array} \right.$

(12.118)

The coefficient of projection can be written

$\displaystyle \displaystyle \left<\varphi_{mk},x\right>$	$\displaystyle =$	$\displaystyle \frac{1}{\sqrt{N}} \sum_{n=-\infty}^{\infty} x(n) w_R(n-mN) e^{-j\omega_k n}$
	$\displaystyle \isdef$	$\displaystyle \frac{\hbox{STFT}_{N,m,k}(x)}{\sqrt{N}} \isdefs \frac{X_m(\omega_k )}{\sqrt{N}}$

so that the signal expansion can be interpreted as

$\displaystyle \displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n)$
	$\displaystyle =$	$\displaystyle \sum_{m=-\infty}^{\infty} w_R(n-mN)\frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k )e^{j\omega_k n}$
	$\displaystyle =$	$\displaystyle \sum_{m=-\infty}^{\infty} \hbox{\sc Shift}_{mN,n}\left\{\hbox{\sc ZeroPad}_\infty\left[\hbox{DFT}_N^{-1}(X_m)\right]\right\}$
	$\displaystyle \isdef$	$\displaystyle \hbox{STFT}_{N,n}^{-1}(X)$

In the overcomplete case, we get a special case of weighted overlap-add (§8.6):

$\displaystyle \displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n)$
	$\displaystyle =$	$\displaystyle \sum_{m=-\infty}^{\infty} \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k ) w(n-mN)e^{j\omega_k n}$

Continuous Wavelet Transform

In the present (Hilbert space) setting, we can now easily define the continuous wavelet transform in terms of its signal basis set:

$\displaystyle \displaystyle \varphi_{s\tau}(t)$	$\displaystyle \isdef$	$\displaystyle \frac{1}{\sqrt{\vert s\vert}} f^\ast\left(\frac{\tau-t}{s}\right), \qquad \tau,s,t \in (-\infty,\infty)$
$\displaystyle X(s,\tau)$	$\displaystyle \isdef$	$\displaystyle \frac{1}{\sqrt{\vert s\vert}} \int_{-\infty}^{\infty} x(t) f\left(\frac{t-\tau}{s}\right) dt$

The parameter is called a scale parameter (analogous to frequency). The normalization by $1/\sqrt{\vert s\vert}$ maintains energy invariance as a function of scale. We call $X(s,\tau)$ the wavelet coefficient at scale and time $\tau$ . The kernel of the wavelet transform is called the mother wavelet, and it typically has a bandpass spectrum. A qualitative example is shown in Fig.11.31.

**Figure:** Typical qualitative appearance of first three wavelets when the scale parameter is .
$\includegraphics[width=0.8\twidth]{eps/wavelets}$

The so-called admissibility condition for a mother wavelet $\psi(t)$ is

$\displaystyle C_\psi = \int_{-\infty}^\infty \frac{\vert\Psi(\omega )\vert^2}{\vert\omega \vert}d\omega < \infty .$

Given sufficient decay with $\omega$ , this reduces to $\Psi(0)=0$ , that is, the mother wavelet must be zero-mean.

The Morlet wavelet is simply a Gaussian-windowed complex sinusoid:

$\displaystyle \psi(t)$	$\displaystyle \isdef$	$\displaystyle \frac{1}{\sqrt{2\pi}} e^{-j\omega _0 t} e^{-t^2/2}$
$\displaystyle \;\longleftrightarrow\;\quad \Psi(\omega )$	$\displaystyle =$	$\displaystyle e^{-(\omega -\omega _0)^2/2}$

The scale factor is chosen so that $\left\Vert\,\psi\,\right\Vert=1$ . The center frequency $\omega_0$ is typically chosen so that second peak is half of first:

$\displaystyle \omega _0\eqsp \pi\sqrt{2/\hbox{ln}2} \;\approx\; 5.336$

(12.119)

In this case, we have $\Psi(0)\approx 7\times10^{-7}\approx 0$ , which is close enough to zero-mean for most practical purposes.

Since the scale parameter of a wavelet transform is analogous to frequency in a Fourier transform, a wavelet transform display is often called a scalogram, in analogy with an STFT ``spectrogram'' (discussed in §7.2).

When the mother wavelet can be interpreted as a windowed sinusoid (such as the Morlet wavelet), the wavelet transform can be interpreted as a constant-Q Fourier transform.^12.5Before the theory of wavelets, constant-Q Fourier transforms (such as obtained from a classic third-octave filter bank) were not easy to invert, because the basis signals were not orthogonal. See Appendix E for related discussion.

Discrete Wavelet Transform

The discrete wavelet transform is a discrete-time, discrete-frequency counterpart of the continuous wavelet transform of the previous section:

$\displaystyle X(k,n)$	$\displaystyle =$	$\displaystyle s^{-k/2} \int_{-\infty}^\infty x(t) h\left(nT-a^{-k}t\right) dt$
	$\displaystyle =$	$\displaystyle \int_{-\infty}^\infty x(t) h\left(na^k T-t\right) dt$

where and range over the integers, and is the mother wavelet, interpreted here as a (continuous) filter impulse response.

The inverse transform is, as always, the signal expansion in terms of the orthonormal basis set:

$\displaystyle x(t) = \sum_k \sum_n X(k,n) \underbrace{\varphi _{kn}(t)}_{\hbox{basis}}$

(12.120)

We can show that discrete wavelet transforms are constant-Q by defining the center frequency of the th basis signal as the geometric mean of its bandlimits $\omega_1$ and $\omega _2$ , i.e.,

$\displaystyle \omega _c(k) \isdef \sqrt{\omega _1(k)\,\omega _2(k)} \eqsp \sqrt{a^k\omega _1(0)\,a^k\omega _2(0)} \eqsp a^k\omega _c(0).$

(12.121)

Then

$\displaystyle Q(k) \isdefs \frac{\omega _c(k)}{\omega _2(k) - \omega _1(k)} \eqsp \frac{a^k\omega _c(0)}{a^k\omega _2(0) - a^k\omega _1(0)} \eqsp Q(0)$

(12.122)

which does not depend on .

Discrete Wavelet Filterbank

In a discrete wavelet filterbank, each basis signal is interpreted as the impulse response of a bandpass filter in a constant-Q filter bank:

$\displaystyle h_k(t)$	$\displaystyle =$	$\displaystyle \frac{1}{\sqrt{a^k}}\, h\left(\frac{t}{a^k}\right), \quad a>1$
$\displaystyle \longleftrightarrow\quad H_k(\omega )$	$\displaystyle =$	$\displaystyle \sqrt{a^k}\, H(a^k\omega )$

Thus, the th channel-filter $H_k(\omega )$ is obtained by frequency-scaling (and normalizing for unit energy) the zeroth channel filter $H_0(\omega )$ . The frequency scale-factor is of course equal to the inverse of the time-scale factor.

Recall that in the STFT, channel filter $H_k(\omega )$ is a shift of the zeroth channel-filter $H_0(\omega )$ (which corresponds to ``cosine modulation'' in the time domain).

As the channel-number increases, the channel impulse response lengthens by the factor ., while the pass-band of its frequency-response narrows by the inverse factor $a^{-k}$ .

Figure 11.32 shows a block diagram of the discrete wavelet filter bank for (the ``dyadic'' or ``octave filter-bank'' case), and Fig.11.33 shows its time-frequency tiling as compared to that of the STFT. The synthesis filters may be used to make a biorthogonal filter bank. If the are orthonormal, then .

**Figure 11.32:** Dyadic Biorthogonal Wavelet Filterbank
$\includegraphics[width=\twidth]{eps/DyadicFilterbank}$

**Figure 11.33:** Time-frequency tiling for the Short-Time Fourier Transform (left) and dyadic wavelet filter bank (right).
$\includegraphics[width=0.8\twidth]{eps/DyadicTiling}$

Dyadic Filter Banks

**Figure 11.34:** Frequency response magnitudes for a dyadic filter bank (amplitude scalings optional).
$\includegraphics[width=0.7\twidth]{eps/dyadicFilters}$

A dyadic filter bank is any octave filter bank,^12.6 as illustrated qualitatively in Figure 11.34. Note that $H_0(\omega )$ is the top-octave bandpass filter, $H_1(w) = \sqrt{2} H_0(2\omega )$ is the bandpass filter for next octave down, $H_2(w) = 2H_0(4\omega )$ is the octave bandpass below that, and so on. The optional scale factors result in the same sum-of-squares for each channel-filter impulse response.

A dyadic filter bank may be derived from the discrete wavelet filter bank by setting and relaxing the exact orthonormality requirement on the channel-filter impulse responses. If they do happen to be orthonormal, we may call it a dyadic wavelet filter bank.

For a dyadic filter bank, the center-frequency of the th channel-filter impulse response can be defined as

$\displaystyle \omega _c(k) \eqsp \sqrt{\omega _0(\omega _0+\hbox{bandwidth})} \eqsp \sqrt{2}\omega _0$

(12.123)

so that

$\displaystyle Q \eqsp \frac{\sqrt{2}\omega _0}{2\omega _0 - \omega _0} \eqsp \sqrt{2}.$

(12.124)

Thus, a dyadic filter bank is a special case of a constant-Q filter bank for which the is $\sqrt{2}$ .

Dyadic Filter Bank Design

Design of dyadic filter banks using the window method for FIR digital filter design (introduced in §4.5) is described in, e.g., [226, §6.2.3b].

A ``very easy'' method suggested in [287, §11.6] is to design a two-channel paraunitary QMF bank, and repeat recursively to split the lower-half of the spectrum down to some desired depth.

Generalized STFT

A generalized STFT may be defined by [287]

$\displaystyle x_k(n)$	$\displaystyle =$	$\displaystyle (x\ast h_k)(nR_k) \eqsp \sum_{m=-\infty}^{\infty}x(m) \underbrace{h_k(nR_k - m)}_{\hbox{analysis filter}}$
$\displaystyle x(n)$	$\displaystyle =$	$\displaystyle \sum_k (x_k\ast f_k)(n) \eqsp \sum_{k=0}^{N-1}\sum_{m=-\infty}^{\infty}x_k(m) \underbrace{f_k(n-m R_k)}_{\hbox{synthesis filter}}$

This filter bank and its reconstruction are diagrammed in Fig.11.35.

**Figure 11.35:** Generalized STFT
$\includegraphics[width=\twidth]{eps/GenSTFT}$

The analysis filter is typically complex bandpass (as in the STFT case). The integers give the downsampling factor for the output of the th channel filter: For critical sampling without aliasing, we set $R_k= \pi/\hbox{Width}(H_k)$ . The impulse response of synthesis filter can be regarded as the th basis signal in the reconstruction. If the $\{f_k\}$ are orthonormal, then we have $f_k(n) = h_k^\ast(-n)$ . More generally, $\{h_k,f_k\}$ form a biorthogonal basis.