Recursive Digital Filter Design

The subject of digital filter design is enormous--much larger than we can hope to address in this book. However, a surprisingly large number of applications can be addressed using small filter sections which are easily designed by hand, as exemplified in Appendix B. This appendix describes some of the ``classic'' methods for IIR filter design based on the bilinear transformation of prototype analog filters, followed by the simple but powerful weighted equation error method for general purpose IIR design. For further information on digital filter design, see the documentation for the Matlab Toolboxes for Signal Processing and Filter Design, and/or [64,68,60,78].

Lowpass Filter Design

We have discussed in detail (Chapter 1) the simplest lowpass filter, $ y(n) = x(n) + x(n - 1)$ having the transfer function $ H(z)=1+z^{-1}$ with one zero at $ z = -1$ and one pole at $ z=0$. From the graphical method for visualizing the amplitude response8.2), we see that this filter totally rejects signal energy at half the sampling rate, while lower frequencies experience higher gains, reaching a maximum at $ \omega=0$. We also see that the pole at $ z=0$ has no effect on the amplitude response.

A high quality lowpass filter should look more like the ``box car'' amplitude response shown in Fig.1.1. While it is impossible to achieve this ideal response exactly using a finite-order filter, we can come arbitrarily close. We can expect the amplitude response to improve if we add another pole or zero to the implementation.

Perhaps the best known ``classical'' methods for lowpass filter designs are those derived from analog Butterworth, Chebyshev, and Elliptic Function filters [64]. These generally yield IIR filters with the same number of poles as zeros. When an FIR lowpass filter is desired, different design methods are used, such as the window method [68, p. 88] (Matlab functions fir1 and fir2), Remez exchange algorithm [68, pp. 136-140], [64, pp. 89-106] (Matlab functions remez and cremez), linear programming [93], [68, p. 140], and convex optimization [67]. This section will describe only Butterworth IIR lowpass design in some detail. For the remaining classical cases (Chebyshev, Inverse Chebyshev, and Elliptic), see, e.g., [64, Chapter 7] and/or Matlab/Octave functions butter, cheby1, cheby2, and ellip.


Butterworth Lowpass Design

Almost all methods for filter design are optimal in some sense, and the choice of optimality determines nature of the design. Butterworth filters are optimal in the sense of having a maximally flat amplitude response, as measured using a Taylor series expansion about dc [64, p. 162]. Of course, the trivial filter $ H(z)=1$ has a perfectly flat amplitude response, but that's an allpass, not a lowpass filter. Therefore, to constrain the optimization to the space of lowpass filters, we need constraints on the design, such as $ H(1)=1$ and $ H(-1)=0$. That is, we may require the dc gain to be 1, and the gain at half the sampling rate to be 0.

It turns out Butterworth filters (as well as Chebyshev and Elliptic Function filter types) are much easier to design as analog filters which are then converted to digital filters. This means carrying out the design over the $ s$ plane instead of the $ z$ plane, where the $ s$ plane is the complex plane over which analog filter transfer functions are defined. The analog transfer function $ H_a(s)$ is very much like the digital transfer function $ H(z)$, except that it is interpreted relative to the analog frequency axis $ s=j\omega_a$ (the ``$ j\omega$ axis'') instead of the digital frequency axis $ z=e^{j\omega_d T}$ (the ``unit circle''). In particular, analog filter poles are stable if and only if they are all in the left-half of the $ s$ plane, i.e., their real parts are negative. An introduction to Laplace transforms is given in Appendix D, and an introduction to converting analog transfer functions to digital transfer functions using the bilinear transform appears in §I.3.

Butterworth Lowpass Poles and Zeros

When the maximally flat optimality criterion is applied to the general (analog) squared amplitude response $ G_a^2(\omega_a)\isdef \left\vert H_a(j\omega_a)\right\vert^2$, a surprisingly simple result is obtained [64]:

$\displaystyle G_a^2(\omega_a) = \frac{1}{1+\omega_a^{2N}} \protect$ (I.1)

where $ N$ is the desired order (number of poles). This simple result is obtained when the response is taken to be maximally flat at $ \omega_a=\infty$ as well as dc (i.e., when both $ G_a^2(\omega_a)$ and $ G_a^2(1/\omega_a)$ are maximally flat at dc).I.1Also, an arbitrary scale factor for $ \omega_a$ has been set such that the cut-off frequency (-3dB frequency) is $ \omega_c = 1$ rad/sec.

The analytic continuationD.2) of $ G_a^2(\omega_a)$ to the whole $ s$-plane may be obtained by substituting $ \omega_a = s/j$ to obtain

$\displaystyle H_a(s)H_a(-s) = \frac{1}{1+\left(\frac{s}{j}\right)^{2N}} =
\frac{1}{1+(-1)^Ns^{2N}}
$

The $ 2N$ poles of this expression are simply the roots of unity when $ N$ is odd, and the roots of $ -1$ when $ N$ is even. Half of these poles $ s_k$ are in the left-half $ s$-plane ( re$ \left\{s_k\right\}<0$) and thus belong to $ H_a(s)$ (which must be stable). The other half belong to $ H_a(-s)$. In summary, the poles of an $ N$th-order Butterworth lowpass prototype are located in the $ s$-plane at $ s_k = \sigma_k +
j\omega_k = e^{-j\theta_k}$, where [64, p. 168]

\begin{displaymath}\begin{array}{rcrl} \sigma_k &=&-\!&\sin(\theta_k)\\ \omega_k &=&&\cos(\theta_k) \end{array} \protect\end{displaymath} (I.2)

with

$\displaystyle \theta_k \isdef \frac{(2k+1)\pi}{2N}
$

for $ k=0,1,2,\dots,N-1$. These poles may be quickly found graphically by placing $ 2N$ poles uniformly distributed around the unit circle (in the $ s$ plane, not the $ z$ plane--this is not a frequency axis) in such a way that each complex pole has a complex-conjugate counterpart.

A Butterworth lowpass filter additionally has $ N$ zeros at $ s=\infty$. Under the bilinear transform $ s = c(z-1)/(z+1)$, these all map to the point $ z = -1$, which determines the numerator of the digital filter as $ (1+z^{-1})^N$.

Given the poles and zeros of the analog prototype, it is straightforward to convert to digital form by means of the bilinear transformation.


Example: Second-Order Butterworth Lowpass

In the second-order case, we have, for the analog prototype,

$\displaystyle H_a(s) = \frac{1}{(s + a)(s + \overline{a})}
$

where, from Eq.$ \,$(I.2), $ a = e^{j\pi/4}$, so that

$\displaystyle H_a(s) = \frac{1}{(s + e^{j\pi/4})(s + e^{-j\pi/4})} = \frac{1}{s^2 + \sqrt{2}s + 1} \protect$ (I.3)

To convert this to digital form, we apply the bilinear transform

$\displaystyle s = c\frac{1-z^{-1}}{1+z^{-1}}
$

(from Eq.$ \,$(I.9)), where, as discussed in §I.3, we set

$\displaystyle c = \cot(\omega_cT/2) \isdef \frac{\cos(\omega_cT/2)}{\sin(\omega_cT/2)}
$

to obtain a digital cut-off frequency at $ \omega_c$ radians per second. For example, choosing $ \omega_c T = \pi/2$ (a cut off at one-fourth the sampling rate), we get

$\displaystyle c = \frac{\cos(\pi/4)}{\sin(\pi/4)} = 1
$

and the digital filter transfer function is
$\displaystyle H_d(z)$ $\displaystyle =$ $\displaystyle H_a\left(\frac{1-z^{-1}}{1+z^{-1}}\right) =
\frac{1}{\left(\frac{1-z^{-1}}{1+z^{-1}}\right)^2 + \sqrt{2}\left(\frac{1-z^{-1}}{1+z^{-1}}\right) + 1}$ (I.4)
  $\displaystyle =$ $\displaystyle \frac{(1+z^{-1})^2}{(1-2z^{-1}+z^{-2}) + (\sqrt{2} - \sqrt{2}z^{-2}) + (1+2z^{-1}+z^{-2})}$ (I.5)
  $\displaystyle =$ $\displaystyle \frac{(1+z^{-1})^2}{(2+\sqrt{2}) + (2-\sqrt{2})z^{-2}}$ (I.6)
  $\displaystyle =$ $\displaystyle \frac{1}{2+\sqrt{2}}\frac{(1+z^{-1})^2}{1 + \frac{2-\sqrt{2}}{2+\sqrt{2}}z^{-2}}$ (I.7)

Note that the numerator is $ (1+z^{-1})^2$, as predicted earlier. As a check, we can verify that the dc gain is 1:

$\displaystyle H_d(1) = \frac{2^2}{2+\sqrt{2} + 2-\sqrt{2}} = 1
$

It is also immediately verified that $ H_d(-1) = 0$, i.e., that there is a (double) notch at half the sampling rate.

In the analog prototype, the cut-off frequency is $ \omega_a=1$ rad/sec, where, from Eq.$ \,$(I.1), the amplitude response is $ G_a(j)=1/\sqrt{2}$. Since we mapped the cut-off frequency precisely under the bilinear transform, we expect the digital filter to have precisely this gain. The digital frequency response at one-fourth the sampling rate is

$\displaystyle H_d(j) = \frac{(1-j)^2}{2+\sqrt{2} - (2-\sqrt{2})} = -\frac{j}{\sqrt{2}}, \protect$ (I.8)

and $ 20\log_{10}(\left\vert H_d(j)\right\vert)=-3$ dB as expected.

Note from Eq.$ \,$(I.8) that the phase at cut-off is exactly -90 degrees in the digital filter. This can be verified against the pole-zero diagram in the $ z$ plane, which has two zeros at $ z = -1$, each contributing +45 degrees, and two poles at $ z=\pm
j\sqrt{\frac{2-\sqrt{2}}{2+\sqrt{2}}}$, each contributing -90 degrees. Thus, the calculated phase-response at the cut-off frequency agrees with what we expect from the digital pole-zero diagram.

In the $ s$ plane, it is not as easy to use the pole-zero diagram to calculate the phase at $ \omega_a=1$, but using Eq.$ \,$(I.3), we quickly obtain

$\displaystyle H_a(j\cdot 1) = \frac{1}{j^2 + \sqrt{2}j + 1} = -\frac{j}{\sqrt{2}},
$

and exact agreement with $ H_d(e^{j\pi/2})$ [Eq.$ \,$(I.8)] is verified.

A related example appears in §9.2.4.


Digitizing Analog Filters with the
Bilinear Transformation

The desirable properties of many filter types (such as lowpass, highpass, and bandpass) are preserved very well by the $ s\leftrightarrow z$ mapping called the bilinear transform.

Bilinear Transformation

The bilinear transform may be defined by

$\displaystyle s$ $\displaystyle =$ $\displaystyle c\frac{1-z^{-1}}{1+z^{-1}}\protect$ (I.9)
$\displaystyle z^{-1}$ $\displaystyle =$ $\displaystyle \frac{1-s/c}{1+s/c}\protect$ (I.10)

where $ c$ is an arbitrary positive constant that we may set to map one analog frequency precisely to one digital frequency. In the case of a lowpass or highpass filter, $ c$ is typically used to set the cut-off frequency to be identical in the analog and digital cases.


Frequency Warping

It is easy to check that the bilinear transform gives a one-to-one, order-preserving, conformal map [57] between the analog frequency axis $ s=j\omega_a$ and the digital frequency axis $ z=e^{j\omega_d T}$, where $ T$ is the sampling interval. Therefore, the amplitude response takes on exactly the same values over both axes, with the only defect being a frequency warping such that equal increments along the unit circle in the $ z$ plane correspond to larger and larger bandwidths along the $ j\omega$ axis in the $ s$ plane [88]. Some kind of frequency warping is obviously unavoidable in any one-to-one map because the analog frequency axis is infinite while the digital frequency axis is finite. The relation between the analog and digital frequency axes may be derived immediately from Eq.$ \,$(I.9) as

\begin{eqnarray*}
j\omega_a &=& c\frac{1-e^{-j\omega_d T}}{1+e^{-j\omega_d T}} =...
...sin(\omega_dT/2)}{\cos(\omega_dT/2)}\\
&=& jc\tan(\omega_dT/2).
\end{eqnarray*}

Given an analog cut-off frequency $ \omega_a=\omega_c$, to obtain the same cut-off frequency in the digital filter, we set

$\displaystyle c = \omega_c\cot(\omega_cT/2)
$


Analog Prototype Filter

Since the digital cut-off frequency may be set to any value, irrespective of the analog cut-off frequency, it is convenient to set the analog cut-off frequency to $ \omega_c = 1$. In this case, the bilinear-transform constant $ c$ is simply set to

$\displaystyle c = \cot(\omega_cT/2)
$

when carrying out mapping Eq.$ \,$(I.9) to convert the analog prototype to a digital filter with cut-off at frequency $ \omega_c$.


Examples

Examples of using the bilinear transform to ``digitize'' analog filters may be found in §I.2 and [64,5,6,103,86]. Bilinear transform design is also inherent in the construction of wave digital filters [25,86].


Filter Design by Minimizing the
L2 Equation-Error Norm

One of the simplest formulations of recursive digital filter design is based on minimizing the equation error. This method allows matching of both spectral phase and magnitude. Equation-error methods can be classified as variations of Prony's method [48]. Equation error minimization is used very often in the field of system identification [46,30,78].

The problem of fitting a digital filter to a given spectrum may be formulated as follows:

Given a continuous complex function $ H(e^{j\omega}),\,-\pi < \omega \leq \pi$, corresponding to a causalI.2 desired frequency-response, find a stable digital filter of the form

$\displaystyle \hat{H}(z) \isdef \frac{\hat{B}(z)}{\hat{A}(z)},
$

where

\begin{eqnarray*}
\hat{B}(z) &\isdef \hat{b}_0 + \hat{b}_1 z^{-1} + \cdots + \ha...
...1 + \hat{a}_1 z^{-1} + \cdots + \hat{a}_{{n}_a}z^{-{{n}_a}} ,\\
\end{eqnarray*}

with $ {{n}_b},{{n}_a}$ given, such that some norm of the error

$\displaystyle J(\hat{\theta}) \isdef \left\Vert\,H(e^{j\omega}) - \hat{H}(e^{j\omega})\,\right\Vert
$

is minimum with respect to the filter coefficients

$\displaystyle \hat{\theta}^T\isdef \left[\hat{b}_0,\hat{b}_1,\ldots\,,\hat{b}_{{n}_b},\hat{a}_1,\hat{a}_2,\ldots\,,\hat{a}_{{n}_a}\right]^T,
$

which are constrained to lie in a subset $ \hat{\Theta}\subset\Re ^{N}$, where $ N\isdef {{n}_a}+{{n}_b}+1$. When explicitly stated, the filter coefficients may be complex, in which case $ \hat{\Theta}\subset{\bf C}^{N}$.

The approximate filter $ \hat{H}$ is typically constrained to be stable, and since positive powers of $ z$ do not appear in $ \hat{B}(z)$, stability implies causality. Consequently, the impulse response of the filter $ \hat{h}(n)$ is zero for $ n < 0$. If $ H$ were noncausal, all impulse-response components $ h(n)$ for $ n < 0$ would be approximated by zero.

Equation Error Formulation

The equation error is defined (in the frequency domain) as

$\displaystyle E_{\mbox{ee}}(e^{j\omega}) \isdef \hat{A}(e^{j\omega})H(e^{j\omega}) - \hat{B}(e^{j\omega})
$

By comparison, the more natural frequency-domain error is the so-called output error:

$\displaystyle E_{\mbox{oe}}(e^{j\omega}) \isdef H(e^{j\omega}) - \frac{\hat{B}(e^{j\omega})}{\hat{A}(e^{j\omega})}
$

The names of these errors make the most sense in the time domain. Let $ x(n)$ and $ y(n)$ denote the filter input and output, respectively, at time $ n$. Then the equation error is the error in the difference equation:

\begin{eqnarray*}
e_{\mbox{ee}}(n) &=& y(n) + \hat{a}_1 y(n-1) + \cdots + \hat{a...
...0 x(n) - \hat{b}_1 x(n-1) - \cdots - \hat{b}_{{n}_b}x(n-{{n}_b})
\end{eqnarray*}

while the output error is the difference between the ideal and approximate filter outputs:

\begin{eqnarray*}
e_{\mbox{oe}}(n) &=& y(n) - \hat{y}(n) \\
\hat{y}(n) &=& \ha...
...}_1 \hat{y}(n-1) - \cdots - \hat{a}_{{{n}_a}} \hat{y}(n-{{n}_a})
\end{eqnarray*}

Denote the $ L2$ norm of the equation error by

$\displaystyle J_E(\hat{\theta}) \isdef \left\Vert\,\hat{A}(e^{j\omega})H(e^{j\omega}) - \hat{B}(e^{j\omega})\,\right\Vert _2,$ (I.11)

where $ \hat{\theta}^T = [\hat{b}_0,\hat{b}_1,\ldots,\hat{b}_{{n}_b}, \hat{a}_1,\ldots, \hat{a}_{{n}_a}]$ is the vector of unknown filter coefficients. Then the problem is to minimize this norm with respect to $ \hat{\theta}$. What makes the equation-error so easy to minimize is that it is linear in the parameters. In the time-domain form, it is clear that the equation error is linear in the unknowns $ \hat{a}_i,\hat{b}_i$. When the error is linear in the parameters, the sum of squared errors is a quadratic form which can be minimized using one iteration of Newton's method. In other words, minimizing the $ L2$ norm of any error which is linear in the parameters results in a set of linear equations to solve. In the case of the equation-error minimization at hand, we will obtain $ {{n}_b}+{{n}_a}+1$ linear equations in as many unknowns.

Note that (I.11) can be expressed as

$\displaystyle J_E(\hat{\theta}) = \left\Vert\,\left\vert\hat{A}(e^{j\omega})\ri...
...ot\left\vert H(e^{j\omega}) - \hat{H}(e^{j\omega})\right\vert\,\right\Vert _2.
$

Thus, the equation-error can be interpreted as a weighted output error in which the frequency weighting function on the unit circle is given by $ \vert\hat{A}(e^{j\omega})\vert$. Thus, the weighting function is determined by the filter poles, and the error is weighted less near the poles. Since the poles of a good filter-design tend toward regions of high spectral energy, or toward ``irregularities'' in the spectrum, it is evident that the equation-error criterion assigns less importance to the most prominent or structured spectral regions. On the other hand, far away from the roots of $ \hat{A}(z)$, good fits to both phase and magnitude can be expected. The weighting effect can be eliminated through use of the Steiglitz-McBride algorithm [45,78] which iteratively solves the weighted equation-error solution, using the canceling weight function from the previous iteration. When it converges (which is typical in practice), it must converge to the output error minimizer.


Error Weighting and Frequency Warping

Audio filter designs typically benefit from an error weighting function that weights frequencies according to their audibility. An oversimplified but useful weighting function is simply $ 1/\omega$, in which low frequencies are deemed generally more important than high frequencies. Audio filter designs also typically improve when using a frequency warping, such as described in [88,78] (and similar to that in §I.3.2). In principle, the effect of a frequency-warping can be achieved using a weighting function, but in practice, the numerical performance of a frequency warping is often much better.


Stability of Equation Error Designs

A problem with equation-error methods is that stability of the filter design is not guaranteed. When an unstable design is encountered, one common remedy is to reflect unstable poles inside the unit circle, leaving the magnitude response unchanged while modifying the phase of the approximation in an ad hoc manner. This requires polynomial factorization of $ \hat{A}(z)$ to find the filter poles, which is typically more work than the filter design itself.

A better way to address the instability problem is to repeat the filter design employing a bulk delay. This amounts to replacing $ H(e^{j\omega})$ by

$\displaystyle H_\tau(e^{j\omega}) \isdef e^{-\omega \tau} H(e^{j\omega}),\quad\tau>0,
$

and minimizing $ \vert\vert\,\hat{A}(e^{j\omega})H_\tau(e^{j\omega}) - \hat{B}(e^{j\omega})\,\vert\vert _2$. This effectively delays the desired impulse response, i.e., $ h_\tau(n)=h(n-\tau)$. As the bulk delay is increased, the likelihood of obtaining an unstable design decreases, for reasons discussed in the next paragraph.

Unstable equation-error designs are especially likely when $ H(e^{j\omega})$ is noncausal. Since there are no constraints on where the poles of $ \hat{H}$ can be, one can expect unstable designs for desired frequency-response functions having a linear phase trend with positive slope.

In the other direction, experience has shown that best results are obtained when $ H(z)$ is minimum phase, i.e., when all the zeros of $ H(z)$ are inside the unit circle. For a given magnitude, $ \vert H(e^{j\omega})\vert$, minimum phase gives the maximum concentration of impulse-response energy near the time origin $ n = 0$. Consequently, the impulse-response tends to start large and decay immediately. For non-minimum phase $ H$, the impulse-response $ h(n)$ may be small for the first $ {{n}_b}+1$ samples, and the equation error method can yield very poor filters in these cases. To see why this is so, consider a desired impulse-response $ h(n)$ which is zero for $ n\leq{{n}_b}$, and arbitrary thereafter. Transforming $ J_E^2$ into the time domain yields

\begin{eqnarray*}
J_E^2(\hat{\theta}) &=& \left\Vert\,\hat{a}\ast h(n) - \hat{b}...
... \sum_{n={{n}_b}+1}^\infty
\left(\hat{a}\ast h(n)\right)^2,\\
\end{eqnarray*}

where ``$ \ast $'' denotes convolution, and the additive decomposition is due the fact that $ \hat{a}\ast h(n)=0$ for $ n\leq{{n}_b}$. In this case the minimum occurs for $ \hat{B}(z)=0\,\,\Rightarrow\,\,\hat{H}(z)\equiv 0$! Clearly this is not a particularly good fit. Thus, the introduction of bulk-delay to guard against unstable designs is limited by this phenomenon.

It should be emphasized that for minimum-phase $ H(e^{j\omega})$, equation-error methods are very effective. It is simple to convert a desired magnitude response into a minimum-phase frequency-response by use of cepstral techniques [22,60] (see also the appendix below), and this is highly recommended when minimizing equation error. Finally, the error weighting by $ \vert\hat{A}(e^{j\omega_k})\vert$ can usually be removed by a few iterations of the Steiglitz-McBride algorithm.


An FFT-Based Equation-Error Method

The algorithm below minimizes the equation error in the frequency-domain. As a result, it can make use of the FFT for speed. This algorithm is implemented in Matlab's invfreqz() function when no iteration-count is specified. (The iteration count gives that many iterations of the Steiglitz-McBride algorithm, thus transforming equation error to output error after a few iterations. There is also a time-domain implementation of the Steiglitz-McBride algorithm called stmcb() in the Matlab Signal Processing Toolbox, which takes the desired impulse response as input.)

Given a desired spectrum $ H(e^{j\omega_k})$ at equally spaced frequencies $ \omega_k = 2\pi k/N, k=0,\ldots\,,N-1$, with $ N$ a power of $ 2$, it is desired to find a rational digital filter with $ {{n}_b}$ zeros and $ {{n}_a}$ poles,

$\displaystyle \hat{H}(z) \isdef \frac{\hat{B}(z)}{\hat{A}(z)}
\isdef \frac{\sum_{k=0}^{{n}_b}b_k z^{-k}}{\sum_{k=0}^{{n}_a}a_k z^{-k} },$

normalized by $ a_0 = 1$, such that

$\displaystyle J^2_E = \sum_{k=0}^{N-1} \left\vert\hat{A}(e^{j\omega_k})H(e^{j\omega_k})-\hat{B}(e^{j\omega_k})\right\vert^2
$

is minimized.

Since $ J^2_E$ is a quadratic form, the solution is readily obtained by equating the gradient to zero. An easier derivation follows from minimizing equation error variance in the time domain and making use of the orthogonality principle [36]. This may be viewed as a system identification problem where the known input signal is an impulse, and the known output is the desired impulse response. A formulation employing an arbitrary known input is valuable for introducing complex weighting across the frequency grid, and this general form is presented. A detailed derivation appears in [78, Chapter 2], and here only the final algorithm is given:

Given spectral output samples $ Y(e^{j\omega_k})$ and input samples $ U(e^{j\omega_k})$, we minimize

$\displaystyle J^2_E = \sum_{k=0}^{N-1} \left\vert\hat{A}(e^{j\omega_k})Y(e^{j\omega_k})-\hat{B}(e^{j\omega_k})U(e^{j\omega_k})\right\vert^2 .
$

If $ \vert U(e^{j\omega_k})\vert^2$ is to be used as a weighting function in the filter-design problem, then we set $ Y(e^{j\omega_k}) = H(e^{j\omega_k})U(e^{j\omega_k})$.

Let $ \underline{x}[n_1\,$:$ \,n_2]$ denote the column vector determined by $ x(n)$, for $ n=n_1,\ldots\,,n_2$ filled in from top to bottom, and let $ T(x[n_1\,$:$ \,n_2])$ denote the size $ n_2-n_1+1$ symmetric Toeplitz matrix consisting of $ \underline{x}[n_1\,$:$ \,n_2]$ in its first column. A nonsymmetric Toeplitz matrix may be specified by its first column and row, and we use the notation $ T(\underline{x}[n_1\,$:$ \,n_2],\underline{y}^T[m_1\,$:$ \,m_2])$ to denote the $ n_2-n_1+1$ by $ m_2-m_1+1$ Toeplitz matrix with left-most column $ \underline{x}[n_1\,$:$ \,n_2]$ and top row $ \underline{y}^T[m_1\,$:$ \,m_2]$. The inverse Fourier transform of $ X(e^{j\omega_k})$ is defined as

$\displaystyle x(n) =$   FFT$\displaystyle ^{-1}{X(e^{j\omega_k})} \isdef \frac{1}{N}\sum_{k=0}^{N-1} X(e^{j\omega_k})
e^{j\omega_k n} .
$

The scaling by $ 1/N$ is optional since it has no effect on the solution. We require three correlation functions involving $ U$ and $ Y$,

\begin{eqnarray*}
\underline{R}_{uu}(n) &\isdef & \mbox{FFT}^{-1}{\left\vert U(e...
...})\overline{U(e^{j\omega_k})}} \\
n & = & 0,1,\ldots\,,N-1,\\
\end{eqnarray*}

where the overbar denotes complex conjugation, and four corresponding Toeplitz matrices,

\begin{eqnarray*}
R_{yy} &\isdef & T(\underline{R}_{yy}[0\,\mbox{:}\,{{n}_a}-1])...
...u}^T[-1\,\mbox{:}\,-{{n}_a}])\\
R_{uy} &\isdef & R_{uy}^T , \\
\end{eqnarray*}

where negative indices are to be interpreted mod $ N$, e.g., $ R_{yu}(-1)=R_{yu}(N-1)$.

The solution is then

$\displaystyle \hat{\theta}^\ast = \left[\begin{array}{c} \underline{\hat{B}}^\a...
...,{{n}_b}] \\ [2pt] \underline{R}_{yy}[1\,\mbox{:}\,{{n}_a}] \end{array}\right]
$

where

$\displaystyle \underline{\hat{B}}^\ast \isdef \left[\begin{array}{c} \hat{b}^\a...
...t{a}^\ast _0 \\ [2pt] \vdots \\ [2pt] \hat{a}^\ast _{{n}_a}\end{array}\right],
$


Prony's Method

There are several variations on equation-error minimization, and some confusion in terminology exists. We use the definition of Prony's method given by Markel and Gray [48]. It is equivalent to ``Shank's method'' [9]. In this method, one first computes the denominator $ \hat{A}^\ast (z)$ by minimizing

\begin{eqnarray*}
J_S^2(\hat{\theta}) &= \sum_{n={{n}_b}+1}^\infty\left(\hat{a}\...
...= \sum_{n={{n}_b}+1}^\infty\left(\hat{a}\ast h(n) \right)^2. \\
\end{eqnarray*}

This step is equivalent to minimization of ratio error (as used in linear prediction) for the all-pole part $ \hat{A}(z)$, with the first $ {{n}_b}+1$ terms of the time-domain error sum discarded (to get past the influence of the zeros on the impulse response). When $ {{n}_b}={{n}_a}-1$, it coincides with the covariance method of linear prediction [48,47]. This idea for finding the poles by ``skipping'' the influence of the zeros on the impulse-response shows up in the stochastic case under the name of modified Yule-Walker equations [11].

Now, Prony's method consists of next minimizing $ L2$ output error with the pre-assigned poles given by $ \hat{A}^\ast (z)$. In other words, the numerator $ \hat{B}(z)$ is found by minimizing

$\displaystyle \left\Vert\,H(e^{j\omega}) - \frac{\hat{B}(e^{j\omega})}{\hat{A}^\ast (e^{j\omega})}\,\right\Vert _2,
$

where $ \hat{A}^\ast (e^{j\omega})$ is now known. This hybrid method is not as sensitive to the time distribution of $ h(n)$ as is the pure equation-error method. In particular, the degenerate equation-error example above (in which $ \hat{H}\equiv 0$ was obtained) does not fare so badly using Prony's method.


The Padé-Prony Method

Another variation of Prony's method, described by Burrus and Parks [9] consists of using Padé approximation to find the numerator $ \hat{B}^\ast $ after the denominator $ \hat{A}^\ast $ has been found as before. Thus, $ \hat{B}^\ast $ is found by matching the first $ {{n}_b}+1$ samples of $ h(n)$, viz., $ \hat{b}^\ast _n = \hat{a}^\ast \ast h (n),
n=0\ldots\,,{{n}_b}$. This method is faster, but does not generally give as good results as the previous version. In particular, the degenerate example $ h(n)=0, n\leq {{n}_b}$ gives $ \hat{H}^\ast (z)\equiv 0$ here as did pure equation error. This method has been applied also in the stochastic case [11].

On the whole, when $ H(e^{j\omega})$ is causal and minimum phase (the ideal situation for just about any stable filter-design method), the variants on equation-error minimization described in this section perform very similarly. They are all quite fast, relative to algorithms which iteratively minimize output error, and the equation-error method based on the FFT above is generally fastest.


Next Section:
Matlab Utilities
Previous Section:
A View of Linear Time Varying Digital Filters