Least Squares Sinusoidal Parameter Estimation

There are many ways to define ``optimal'' in signal modeling. Perhaps the most elementary case is least squares estimation. Every estimator tries to measure one or more parameters of some underlying signal model. In the case of sinusoidal parameter estimation, the simplest model consists of a single complex sinusoidal component in additive white noise:

$\displaystyle x(n) \isdef {\cal A}e^{j\omega_0 n} + v(n) \protect$ (6.32)

where $ {\cal A}= A e^{j\phi}$ is the complex amplitude of the sinusoid, and $ v(n)$ is white noise (defined in §C.3). Given measurements of $ x(n)$ , $ n=0,1,2,\ldots,N-1$ , we wish to estimate the parameters $ (A,\phi,\omega_0 )$ of this sinusoid. In the method of least squares, we minimize the sum of squared errors between the data and our model. That is, we minimize

$\displaystyle J(\underline{\theta}) \isdef \sum_{n=0}^{N-1}\left\vert x(n)-{\hat x}(n)\right\vert^2 \protect$ (6.33)

with respect to the parameter vector

$\displaystyle \underline{\theta}\isdef \left[\begin{array}{c} A \\ [2pt] \phi \\ [2pt] \omega_0 \end{array}\right],$ (6.34)

where $ {\hat x}(n)$ denotes our signal model:

$\displaystyle {\hat x}(n)\isdef \hat{{\cal A}}e^{j\hat{\omega}_0n}$ (6.35)

Note that the error signal $ x(n)-\hat{{\cal A}}e^{j\hat{\omega}_0n}$ is linear in $ \hat{{\cal A}}$ but nonlinear in the parameter $ \hat{\omega}_0$ . More significantly, $ J(\underline{\theta})$ is non-convex with respect to variations in $ \hat{\omega}_0$ . Non-convexity can make an optimization based on gradient descent very difficult, while convex optimization problems can generally be solved quite efficiently [22,86].

Sinusoidal Amplitude Estimation

If the sinusoidal frequency $ \omega_0$ and phase $ \phi$ happen to be known, we obtain a simple linear least squares problem for the amplitude $ A$ . That is, the error signal

$\displaystyle x(n)-\hat{{\cal A}}e^{j\hat{\omega}_0n} = x(n)-{\hat A}e^{j(\omega_0 n+\phi)}$ (6.36)

becomes linear in the unknown parameter $ {\hat A}$ . As a result, the sum of squared errors

$\displaystyle J({\hat A}) \isdef \sum_{n=0}^{N-1}\left\vert x(n)-{\hat A}e^{j(\omega_0 n+\phi)}\right\vert^2 \protect$ (6.37)

becomes a simple quadratic (parabola) over the real line.6.11 Quadratic forms in any number of dimensions are easy to minimize. For example, the ``bottom of the bowl'' can be reached in one step of Newton's method. From another point of view, the optimal parameter $ {\hat A}$ can be obtained as the coefficient of orthogonal projection of the data $ x(n)$ onto the space spanned by all values of $ {\hat A}$ in the linear model $ {\hat A}
e^{j(\omega_0 n+\phi)}$ .

Yet a third way to minimize (5.37) is the method taught in elementary calculus: differentiate $ J({\hat A})$ with respect to $ {\hat A}$ , equate it to zero, and solve for $ {\hat A}$ . In preparation for this, it is helpful to write (5.37) as

\begin{eqnarray*}
J({\hat A}) &\isdef & \sum_{n=0}^{N-1}
\left[x(n)-{\hat A}e^{j(\omega_0 n+\phi)}\right]
\left[\overline{x(n)}-\overline{{\hat A}} e^{-j(\omega_0 n+\phi)}\right]\\
&=&
\sum_{n=0}^{N-1}
\left[
\left\vert x(n)\right\vert^2
-
x(n)\overline{{\hat A}} e^{-j(\omega_0 n+\phi)}
-
\overline{x(n)}{\hat A}e^{j(\omega_0 n+\phi)}
+
{\hat A}^2
\right]
\\
&=& \left\Vert\,x\,\right\Vert _2^2 - 2\mbox{re}\left\{\sum_{n=0}^{N-1} x(n)\overline{{\hat A}}
e^{-j(\omega_0 n+\phi)}\right\}
+ N {\hat A}^2.
\end{eqnarray*}

Differentiating with respect to $ {\hat A}$ and equating to zero yields

$\displaystyle 0 = \frac{d J({\hat A})}{d{\hat A}} = - 2$re$\displaystyle \left\{\sum_{n=0}^{N-1} x(n) e^{-j(\omega_0 n+\phi)}\right\} + 2N{\hat A}.$ (6.38)

Solving this for $ {\hat A}$ gives the optimal least-squares amplitude estimate

$\displaystyle {\hat A}= \frac{1}{N}$re$\displaystyle \left\{\sum_{n=0}^{N-1} x(n) e^{-j(\omega_0 n+\phi)}\right\} = \frac{1}{N}$re$\displaystyle \left\{\hbox{\sc DTFT}_{\omega_0 }\left[e^{-j\phi} x\right]\right\}. \protect$ (6.39)

That is, the optimal least-squares amplitude estimate may be found by the following steps:
  1. Multiply the data $ x(n)$ by $ e^{-j\phi}$ to zero the known phase $ \phi$ .
  2. Take the DFT of the $ N$ samples of $ x$ , suitably zero padded to approximate the DTFT, and evaluate it at the known frequency $ \omega_0$ .
  3. Discard any imaginary part since it can only contain noise, by (5.39).
  4. Divide by $ N$ to obtain a properly normalized coefficient of projection [264] onto the sinusoid

    $\displaystyle s_{\omega_0 }(n)\isdef e^{j\omega_0 n}.$ (6.40)


Sinusoidal Amplitude and Phase Estimation

The form of the optimal estimator (5.39) immediately suggests the following generalization for the case of unknown amplitude and phase:

$\displaystyle \hat{{\cal A}}= {\hat A}e^{j\hat{\phi}} = \frac{1}{N}\sum_{n=0}^{N-1} x(n) e^{-j\omega_0 n} = \frac{1}{N}\hbox{\sc DTFT}_{\omega_0 }(x) \protect$ (6.41)

That is, $ \hat{{\cal A}}$ is given by the complex coefficient of projection [264] of $ x$ onto the complex sinusoid $ e^{j\omega_0 n}$ at the known frequency $ \omega_0$ . This can be shown by generalizing the previous derivation, but here we will derive it using the more enlightened orthogonality principle [114].

The orthogonality principle for linear least squares estimation states that the projection error must be orthogonal to the model. That is, if $ {\hat x}$ is our optimal signal model (viewed now as an $ N$ -vector in $ {\bf R}^N$ ), then we must have [264]

\begin{eqnarray*}
x-{\hat x}&\perp& {\hat x}\\
\Rightarrow\quad
\left<x-{\hat x},{\hat x}\right> &=& 0\\
\Rightarrow\quad \left<x,{\hat x}\right> &=& \left<{\hat x},{\hat x}\right>\\
\Rightarrow\quad \sum_{n=0}^{N-1}\overline{x(n)} {\hat A}e^{j(\omega_0 n+\hat{\phi})}&=& N {\hat A}^2 \\
\Rightarrow\quad \sum_{n=0}^{N-1}x(n) {\hat A}e^{-j(\omega_0 n+\hat{\phi})}&=& N {\hat A}^2 \\
\Rightarrow\quad \hbox{\sc DTFT}_{\omega_0 }(x)&=&
N \frac{{\hat A}^2}{{\hat A}e^{-j\hat{\phi}}} = N {\hat A}e^{j\hat{\phi}}
\end{eqnarray*}

Thus, the complex coefficient of projection of $ x$ onto $ e^{j{\hat \omega}n}$ is given by

$\displaystyle \hat{{\cal A}}= {\hat A}e^{j\hat{\phi}} = \frac{1}{N} \hbox{\sc DTFT}_{\omega_0 }(x).$ (6.42)

The optimality of $ \hat{{\cal A}}$ in the least squares sense follows from the least-squares optimality of orthogonal projection [114,121,252]. From a geometrical point of view, referring to Fig.5.16, we say that the minimum distance from a vector $ x\in{\bf R}^N$ to some lower-dimensional subspace $ {\bf R}^M$ , where $ M<N$ (here $ M=1$ for one complex sinusoid) may be found by ``dropping a perpendicular'' from $ x$ to the subspace. The point $ {\hat x}$ at the foot of the perpendicular is the point within the subspace closest to $ x$ in Euclidean distance.

Figure: Geometric interpretation of the orthogonal projection of a vector $ x$ in 3D space onto a 2D plane ($ N=3$ , $ M=2$ ). The orthogonal projection $ {\hat x}$ minimizes the Euclidean distance $ \left\Vert\,x-{\hat x}\,\right\Vert$ .
\includegraphics{eps/orthproj}


Sinusoidal Frequency Estimation

The form of the least-squares estimator (5.41) in the known-frequency case immediately suggests the following frequency estimator for the unknown-frequency case:

$\displaystyle \hat{\omega}_0^\ast = \arg\{\max_{\hat{\omega}_0} \left\vert\hbox{\sc DTFT}_{\hat{\omega}_0}(x)\right\vert\}. \protect$ (6.43)

That is, the sinusoidal frequency estimate is defined as that frequency which maximizes the DTFT magnitude. Given this frequency, the least-squares sinusoidal amplitude and phase estimates are given by (5.41) evaluated at that frequency.

It can be shown [121] that (5.43) is in fact the optimal least-squares estimator for a single sinusoid in white noise. It is also the maximum likelihood estimator for a single sinusoid in Gaussian white noise, as discussed in the next section.

In summary,

$\textstyle \parbox{0.8\textwidth}{the least squares estimate for the sinusoidal parameters
of amplitude $A$, phase $\phi$, and frequency $\omega_0 $\ are obtained from
the complex amplitude and location of the magnitude peak in the DTFT
of the zero-padded observation data $x(n)$.}$

In practice, of course, the DTFT is implemented as an interpolated FFT, as described in the previous sections (e.g., QIFFT method).


Next Section:
Maximum Likelihood Sinusoid Estimation
Previous Section:
Bias of Parabolic Peak Interpolation