Linear Prediction Spectral Envelope
Linear Prediction (LP) implicitly computes a spectral envelope that is well adapted for audio work, provided the order of the predictor is appropriately chosen. Due to the error minimized by LP, spectral peaks are emphasized in the envelope, as they are in the auditory system. (The peak-emphasis of LP is quantified in (10.10) below.)
The term ``linear prediction'' refers to the process of predicting a signal sample $y(n)$ based on $M$ past samples:
$$y(n) = a_1 y(n-1) + a_2 y(n-2) + \cdots + a_M y(n-M) + e(n) \qquad (10.4)$$
We call $M$ the order of the linear predictor, and $\{a_1,\ldots,a_M\}$ the prediction coefficients. The prediction error (or ``innovations sequence'') is denoted $e(n)$ in (10.4), and it represents all new information entering the signal $y$ at time $n$. Because the information is new, $e(n)$ is ``unpredictable.'' The predictable component of $y(n)$ contains no new information.
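As a minimal sketch of the prediction-error idea, the following assumes the sign convention in which the prediction is a plain weighted sum of past samples, $\hat{y}(n) = \sum_i a_i\, y(n-i)$, and treats samples before the start of the signal as zero. The function name `prediction_error` and the test signal are illustrative choices, not taken from the text.

```python
def prediction_error(y, a):
    """Return e(n) = y(n) - sum_i a[i-1] * y(n-i), taking y(n) = 0 for n < 0.

    `a` holds the (assumed) prediction coefficients a_1..a_M.
    """
    M = len(a)
    e = []
    for n in range(len(y)):
        # Weighted sum of the available past samples y(n-1), ..., y(n-M)
        pred = sum(a[i] * y[n - 1 - i] for i in range(M) if n - 1 - i >= 0)
        e.append(y[n] - pred)
    return e


# A noiseless first-order recursion y(n) = 0.5 * y(n-1), started by a unit impulse.
y = [1.0, 0.5, 0.25, 0.125]
e = prediction_error(y, [0.5])
print(e)  # [1.0, 0.0, 0.0, 0.0]
```

Only the first sample carries new information here; every later sample is exactly predictable from its predecessor, so the error is zero after $n=0$.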
Taking the z transform of (10.4) yields
$$Y(z) = \frac{E(z)}{A(z)}$$
where $A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_M z^{-M}$. In signal modeling by linear prediction, we are given the signal $y(n)$ but not the prediction coefficients $a_i$. We must therefore estimate them. Let $\hat{A}(z)$ denote the polynomial with estimated prediction coefficients $\hat{a}_i$. Then we have
$$\hat{E}(z) = \hat{A}(z)\,Y(z)$$
where $\hat{E}(z)$ denotes the estimated prediction-error z transform. By minimizing $\|\hat{E}\|_2$, we define a minimum-least-squares estimate $\hat{A}$. In other words, the linear prediction coefficients $\hat{a}_i$ are defined as those which minimize the sum of squared prediction errors
$$\|\hat{e}\|_2^2 = \sum_n \hat{e}^2(n)$$
over some range of $n$, typically an interval over which the signal is stationary (defined in Chapter 6). It turns out that this minimization results in maximally flattening the prediction-error spectrum $\hat{E}(e^{j\omega})$ [11,157,162]. That is, the optimal $\hat{A}(z)$ is a whitening filter (also called an inverse filter). This makes sense in terms of Chapter 6 when one considers that a flat power spectral density corresponds to white noise in the time domain, and only white noise is completely unpredictable from one sample to the next. A non-flat spectrum corresponds to a nonzero correlation between two signal samples separated by some nonzero time interval.
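As a small worked example of the least-squares criterion, assume a first-order predictor ($M=1$) with prediction $\hat{a}_1\, y(n-1)$, and let the sums run over whatever range of $n$ is being used. Setting the derivative of the summed squared error to zero gives the estimate in closed form:

```latex
\begin{align*}
J(\hat{a}_1) &= \sum_n \left[ y(n) - \hat{a}_1\, y(n-1) \right]^2 \\
\frac{dJ}{d\hat{a}_1} &= -2 \sum_n y(n-1)\left[ y(n) - \hat{a}_1\, y(n-1) \right] = 0 \\
\Rightarrow\quad \hat{a}_1 &= \frac{\sum_n y(n)\, y(n-1)}{\sum_n y^2(n-1)}
\end{align*}
```

The middle line states that the optimal prediction error is orthogonal to the past sample used to form the prediction, so nothing linearly predictable from $y(n-1)$ remains in it.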
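If the prediction error is successfully whitened, then the signal model can be expressed in the frequency domain as
$$S_y(\omega) = \frac{\hat{\sigma}_e^2}{\left|\hat{A}(e^{j\omega})\right|^2}$$
where $S_y(\omega)$ denotes the power spectral density of $y$ (defined in Chapter 6), and $\hat{\sigma}_e^2$ denotes the variance of the (white-noise) prediction error $e(n)$. Thus, the spectral magnitude envelope may be defined as
$$\hat{Y}(\omega) = \frac{\hat{\sigma}_e}{\left|\hat{A}(e^{j\omega})\right|}$$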
Linear Prediction is Peak Sensitive
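By Parseval's theorem, the sum of squared prediction errors can be written in the frequency domain as
$$\sum_{n=-\infty}^{\infty} \hat{e}^2(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left|\hat{A}(e^{j\omega})\,Y(e^{j\omega})\right|^2 d\omega = \frac{\hat{\sigma}_e^2}{2\pi}\int_{-\pi}^{\pi} \left|\frac{Y(e^{j\omega})}{\hat{Y}(\omega)}\right|^2 d\omega \qquad (10.10)$$
where $\hat{Y}(\omega) = \hat{\sigma}_e/|\hat{A}(e^{j\omega})|$ is the spectral envelope. From this ``ratio error'' expression in the frequency domain, we can see that contributions to the error are smallest when $\hat{Y}(\omega) > |Y(e^{j\omega})|$. Therefore, LP tends to overestimate peaks. LP cannot make $\hat{Y}(\omega)$ arbitrarily large because $\hat{A}(z)$ is constrained to be monic and minimum-phase. It can be shown that the log-magnitude frequency response of every minimum-phase monic polynomial $A(z)$ is zero-mean. Therefore, for each peak overestimation, there must be an equal-area ``valley underestimation'' (in a log-magnitude plot over the unit circle).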
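The zero-mean property of the log magnitude can be checked numerically. The sketch below averages $\log|A(e^{j\omega})|$ over uniformly spaced points on the unit circle for two hypothetical monic polynomials: one minimum phase (zeros at radius 0.9) and one with a zero outside the unit circle. The function name and the example polynomials are assumptions made for illustration.

```python
import cmath
import math


def mean_log_magnitude(a_poly, num_points=4096):
    """Average log|A(e^{jw})| over the unit circle, A(z) = sum_k a_poly[k] z^-k."""
    total = 0.0
    for k in range(num_points):
        w = 2.0 * math.pi * k / num_points
        z_inv = cmath.exp(-1j * w)
        A = sum(c * z_inv ** i for i, c in enumerate(a_poly))
        total += math.log(abs(A))
    return total / num_points


# Monic and minimum phase: zeros at 0.9 * exp(+/- j*pi/4)
r, theta = 0.9, math.pi / 4
a_min_phase = [1.0, -2.0 * r * math.cos(theta), r * r]

# Monic but NOT minimum phase: single zero at z = 2
a_non_min = [1.0, -2.0]

print(mean_log_magnitude(a_min_phase))  # close to 0
print(mean_log_magnitude(a_non_min))    # close to log(2), about 0.693
```

The nonzero mean in the second case shows why the minimum-phase constraint matters: moving a zero outside the unit circle raises the average log magnitude by the log of its radius.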
The two classic methods for linear prediction are called the autocorrelation method and the covariance method [162,157]. Both methods solve the linear normal equations (defined below) using different autocorrelation estimates.
In the autocorrelation method of linear prediction, the covariance matrix $R$ is constructed from the usual Bartlett-window-biased sample autocorrelation function (see Chapter 6), and it has the desirable property that $\hat{A}(z)$ is always minimum phase (i.e., $1/\hat{A}(z)$ is guaranteed to be stable). However, the autocorrelation method tends to overestimate formant bandwidths; in other words, the filter model is typically overdamped. This can be attributed to implicitly ``predicting zero'' outside of the signal frame, resulting in the Bartlett-window bias in the sample autocorrelation.
So-called covariance lattice methods and Burg's method were developed to maintain guaranteed stability while giving accuracy comparable to the covariance method of LP.
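The sample autocorrelation of a data frame is computed as
$$r_{y_m}(l) = \sum_{n} y_m(n)\, y_m(n+l) \qquad (10.11)$$
where $y_m$ denotes the $m$th data frame from the signal $y$. To obtain the $M$th-order linear predictor coefficients $\{\hat{a}_1,\ldots,\hat{a}_M\}$, we solve the following system of linear normal equations (also called Yule-Walker or Wiener-Hopf equations):
$$\sum_{i=1}^{M} \hat{a}_i\, r_{y_m}(|i-j|) = r_{y_m}(j), \qquad j = 1, 2, \ldots, M \qquad (10.12)$$
In matlab syntax, the solution is given by ``a = R\p'', where $R(i,j) = r_{y_m}(|i-j|)$, and $p(j) = r_{y_m}(j)$. Since the covariance matrix $R$ is symmetric and Toeplitz by construction, an $O(M^2)$ solution exists using the Durbin recursion.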
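The Durbin recursion mentioned above can be sketched in a few lines. This version assumes the normal equations in the form $\sum_i \hat{a}_i\, r(|i-j|) = r(j)$, that is, the sign convention in which the prediction is $\sum_i \hat{a}_i\, y(n-i)$, and it assumes $R$ is positive definite. The function name `levinson_durbin` and the example autocorrelation are illustrative.

```python
def levinson_durbin(r, M):
    """Solve sum_i a_i r(|i-j|) = r(j), j = 1..M, in O(M^2) operations.

    r : autocorrelation values r(0), ..., r(M) (assumed positive definite)
    Returns (a, err): a = [a_1, ..., a_M] and the final prediction-error power.
    """
    a = []        # predictor coefficients of the current order
    err = r[0]    # zeroth-order prediction-error power
    for m in range(1, M + 1):
        # Reflection coefficient for order m
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Order update: a_i <- a_i - k * a_{m-i}, then append a_m = k
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


# Autocorrelation of a first-order process, r(l) = 0.5**l (up to a scale factor)
a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, err)  # [0.5, 0.0] 0.75
```

The second coefficient comes out zero because a first-order predictor already whitens this autocorrelation; raising the order adds nothing.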
If the rank of the $M \times M$ autocorrelation matrix $R$ is $M$, then the solution to (10.12) is unique, and this solution is always minimum phase (i.e., all roots of $\hat{A}(z)$ are inside the unit circle in the $z$ plane, so that $1/\hat{A}(z)$ is always a stable all-pole filter). In practice, the rank of $R$ is $M$ (with probability 1) whenever $y(n)$ includes a noise component. In the noiseless case, if $y$ is a sum of sinusoids, each (real) sinusoid at a distinct frequency adds 2 to the rank. A dc component, or a component at half the sampling rate, adds 1 to the rank of $R$.
The choice of time window for forming a short-time sample autocorrelation and its weighting also affect the rank of $R$. Equation (10.11) applied to a finite-duration frame yields what is called the autocorrelation method of linear prediction. Dividing out the Bartlett-window bias in such a sample autocorrelation yields a result closer to the covariance method of LP. A matlab example is given in §10.3.3 below.
The classic covariance method computes an unbiased sample covariance matrix by limiting the summation in (10.11) to a range over which $y_m(n+l)$ stays within the frame--a so-called ``unwindowed'' method. The autocorrelation method sums over the whole frame and replaces $y_m(n+l)$ by zero when $n+l$ points outside the frame--a so-called ``windowed'' method (windowed by the rectangular window).
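The Bartlett-window bias and its removal can be illustrated with a short sketch. It computes the ``windowed'' sample autocorrelation of a frame (samples outside the frame treated as zero) and then divides each lag by the number of products that entered its sum. This only illustrates the bias itself; it is not an implementation of the covariance method, and the function names are hypothetical.

```python
def sample_autocorrelation(frame, max_lag):
    """Windowed sum r(l) = sum_n y(n) y(n+l), with y = 0 outside the frame."""
    N = len(frame)
    return [sum(frame[n] * frame[n + l] for n in range(N - l))
            for l in range(max_lag + 1)]


def remove_bartlett_bias(r, N):
    """Divide lag l by (N - l), the number of products in its sum."""
    return [r[l] / (N - l) for l in range(len(r))]


frame = [1.0] * 8                                  # constant (dc) frame, N = 8
r = sample_autocorrelation(frame, 3)
r_unbiased = remove_bartlett_bias(r, len(frame))
print(r)           # [8.0, 7.0, 6.0, 5.0]  (triangular taper with lag)
print(r_unbiased)  # [1.0, 1.0, 1.0, 1.0]
```

For a constant frame the true autocorrelation is flat, so the linear decay of the windowed estimate with lag is entirely the Bartlett-window bias.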
For computing spectral envelopes via linear prediction, the order $M$ of the predictor should be chosen large enough that the envelope can follow the contour of the spectrum, but not so large that it follows the spectral ``fine structure'' on a scale not considered to belong in the envelope. In particular, for voice, $M$ should be twice the number of spectral formants, and perhaps a little larger to allow more detailed modeling of spectral shape away from the formants. For a sum of quasi-sinusoids, the order $M$ should be significantly less than twice the number of sinusoids to inhibit modeling the sinusoids as spectral-envelope peaks. For filtered white noise, $M$ should be close to the order of the filter applied to the white noise, and so on.
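Once the linear prediction coefficients have been found, the spectral envelope of the $m$th frame is given by
$$\hat{Y}_m(\omega) = \frac{\hat{\sigma}_e}{\left|\hat{A}_m(e^{j\omega})\right|}$$
where $\hat{A}_m$ is computed from the solution of the Toeplitz normal equations, and $\hat{\sigma}_e$ is the estimated rms level of the prediction error in the $m$th frame.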
The filter $\hat{\sigma}_e/\hat{A}_m(z)$ can be driven by unit-variance white noise to produce a filtered-white-noise signal having spectral envelope $\hat{Y}_m(\omega)$. We may regard $\hat{\sigma}_e/\hat{A}_m(e^{j\omega})$ (no absolute value) as the frequency response of the filter in a source-filter decomposition of the signal $y_m(n)$, where the source is white noise.
It bears repeating that $\log|\hat{A}_m(e^{j\omega})|$ is zero mean when $\hat{A}_m(z)$ is monic and minimum phase (all zeros inside the unit circle). This means, for example, that $\log\hat{\sigma}_e$ can be simply estimated as the mean of the log spectral magnitude $\log|Y_m(e^{j\omega})|$.
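The envelope and the zero-mean property can be tied together in a short sketch. It evaluates $\hat{\sigma}_e/|\hat{A}(e^{j\omega})|$ on a uniform frequency grid, assuming $\hat{A}(z) = 1 - \sum_i \hat{a}_i z^{-i}$, and then recovers $\hat{\sigma}_e$ as the exponential of the mean log envelope. The coefficients and gain below are hypothetical values chosen so that the poles lie inside the unit circle.

```python
import cmath
import math


def lp_envelope(a, sigma_e, num_bins=512):
    """Return sigma_e / |A(e^{jw_k})| at w_k = 2*pi*k/num_bins.

    Assumes A(z) = 1 - a[0] z^-1 - a[1] z^-2 - ... (monic, minimum phase).
    """
    env = []
    for k in range(num_bins):
        z_inv = cmath.exp(-2j * math.pi * k / num_bins)
        A = 1.0 - sum(c * z_inv ** (i + 1) for i, c in enumerate(a))
        env.append(sigma_e / abs(A))
    return env


a = [1.2, -0.72]   # hypothetical predictor: poles at 0.6 +/- 0.6j (radius ~0.85)
sigma_e = 0.25     # hypothetical rms prediction error
env = lp_envelope(a, sigma_e)

# Because log|A| averages to zero, the mean log envelope equals log(sigma_e).
log_sigma_est = sum(math.log(v) for v in env) / len(env)
print(math.exp(log_sigma_est))  # close to 0.25
```

With measured data, the same average would be taken over the log magnitude spectrum of the frame rather than over the model envelope, so the recovered gain is then only an estimate.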
For best results, the frequency axis ``seen'' by linear prediction should be warped to an auditory frequency scale, as discussed in Appendix E. This has the effect of increasing the accuracy of low-frequency peaks in the extracted spectral envelope, in accordance with the nonuniform frequency resolution of the inner ear.
Spectral Envelope Examples