DSPRelated.com
Forums

AR Extrapolation

Started by Rune Allnor December 9, 2004
Hi Cris,

> I am doing some research regarding the High-Resolution methods, for
> applications in Automotive Radar, and before I compose my own C codes I am
> playing with the Matlab intrinsic functions.
That's a very wise way of working. Be aware, though, that some neat
functionality you would probably want in your own code, like order
estimation, is usually missing from the Matlab routines.
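For illustration, a minimal sketch of what such order estimation could look
like on top of arburg, assuming a data vector x is already defined; the AIC
criterion used here is one common choice among several, not something the
Matlab functions provide:

  % Minimal order-estimation sketch on top of arburg (assumes a data
  % vector x); uses Akaike's information criterion, one common choice.
  N    = length(x);
  maxp = 32;                      % highest candidate order, arbitrary
  aic  = zeros(maxp, 1);
  for p = 1:maxp
      [a, e] = arburg(x, p);      % e = final prediction error variance
      aic(p) = N*log(e) + 2*p;    % fit term plus complexity penalty
  end
  [dummy, p_best] = min(aic);     % order that minimizes the criterion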
> One of these functions is
> arburg (pburg) and I was curious how good the estimated AR coefficients
> are. So, I built a short example: I took a short data sequence (one cycle
> (period), for better analysis) of length = 64 samples
So you have 64 samples of one sinusoid with frequency n/64 for some
integer n? Do you include damping in your sinusoid? Or do you use
a completely different type of data?
> added with some noise
> (from a longer generated sequence),
The same goes for the noise. Exactly how did you generate it? Not that it
would matter very much in this particular case, but such details are of
immense help when you ask questions here. There could be a problem with
the way you generate the noise; even if there isn't, people would like to
have this type of information just to rule that out.
> and I applied pburg. The spectrum looks
> okay, I mean the estimated frequency is at the right bin. With the same
> short sequence I estimated the coefficients using arburg, and then having
> these coefficients and the original sequence of data I tried to estimate
> (through extrapolation) the following cycle, which should look maybe not
> like a twin brother of the main cycle, but close to it.
>
> I used the following equation for extrapolation:
>
>            p
>   x[n] = -sum a[k]x[n-k]
>           k=1
>
> To my surprise, I did not get the expected sequence of estimated data
> values. Is my thought not logical? Am I missing something?
>
> (Parameters: 64 data samples x[0:n-1], 16 estimated coefficients a[1:p],
> and then try to estimate the next 64 samples x[n:2n-1])
Well, you are using the method out of scope. As a basic premise, assume
that the data you look at are *not* from an AR(p) stochastic process.
This happens either because there is observation noise added to your
data, or because the AR(p) model is an arbitrarily chosen model (usually
both). So the regression equation becomes

                     p
   x'[n] + e[n] = - sum a[k]x[n-k]                       [1]
                    k=1

Look carefully at equation [1]. The 'x' here is not an arbitrary data
sample; it is a data sample that belongs to an ordered N-dimensional
random variable drawn from a stochastic process (N being the number of
data samples available, in your case N = 64).

Equation [1] says that given p ordered samples of this random variable,
it is possible to predict the next sample better than by guessing
wildly. An error will be made, but it can be shown that equation [1] is
optimal both with respect to minimizing the noise energy (as is shown
through the derivation of the Yule-Walker equations) and with respect
to maximizing the entropy of the error sequence (Burg's derivation of
1968).

Now, what happens if you extrapolate the sequence x? I'll demonstrate
by using equation [1] on the data sequence {x[0], x[1], ..., x[p-1]}
and extrapolating first to find x[p] and then x[p+1].

First, estimate x[p] based on the samples {x[0], ..., x[p-1]}:

              p
   x"[p] = - sum a[k]x[p-k]                              [2]
             k=1

where x"[n] is the estimate of x[n] by extrapolation; it misses the
true value by the prediction error e[p]. Why no x'[p] in equation [2]?
Because sample x[p] does not exist in the data stream. Because of that,
it is not possible for the prediction operator to correct the
prediction error, as is readily seen in the next step:

                             p
   x"[p+1] = -a[1]x"[p] -   sum a[k]x[p+1-k]             [3]
                            k=2

The difference between equation [3] and equation [1] is that where
equation [1] uses a measured data sample at every lag, equation [3] has
to use the estimate x"[p], which carries the error e[p] made in step
[2]. When a new data sample does arrive, as in ordinary prediction, any
error introduced when *predicting* sample x[p] is *not* carried on into
the prediction of sample x[p+1]. But since no new sample x[p] comes
from the data source between steps [2] and [3], it is no longer
possible for the predictor to correct for the error made in step [2].
So the extrapolation problem is ill-posed. After p steps, extrapolation
is probably no better than wild guessing.

So, in summary, your problems are caused by the fact that you are
trying to use the method out of scope. "Prediction" and "extrapolation"
are two very different things.

Rune
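To see the distinction numerically, here is a small Matlab sketch along
the lines of the thread; the test signal, the order p = 16, and all
variable names are illustrative choices, not taken from either post:

  % Illustrative sketch: one-step prediction vs free-running extrapolation
  % with a Burg AR model (all parameters here are arbitrary choices).
  fs = 256;
  t  = (0:127)/fs;                    % two 64-sample periods of a 4 Hz sine
  x  = sin(2*pi*4*t) + 0.05*randn(size(t));
  p  = 16;
  a  = arburg(x(1:64), p);            % a = [1 a(1) ... a(p)], Burg estimate

  % One-step prediction: every regression uses p *measured* samples, so
  % the error made at step m is not carried on to step m+1.
  xpred = zeros(1, 64);               % first p samples have no prediction
  for m = p+1:64
      xpred(m) = -a(2:end) * x(m-1:-1:m-p).';
  end

  % Free-running extrapolation: each estimate is fed back into the
  % regression, so the error of step [2] above propagates.
  xext = x(1:64);
  for m = 65:128
      xext(m) = -a(2:end) * xext(m-1:-1:m-p).';
  end

Comparing xext(65:128) against x(65:128) shows the effect described
above: the one-step predictions stay locked to the data, while how long
the extrapolation tracks before drifting depends on how close the
estimated poles lie to the unit circle.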
"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:f56893ae.0412082358.1c1b5519@posting.google.com...
[snip]
Thank you for your response, Rune. I'll try to give some more precise
information about what I am doing. It seems that I complicated my life
without intention; I was just curious to see if the theory is as simple
to apply as it looks, and it is not. Definitely prediction is not the
same thing as extrapolation, and now I am 'happy' that I tried this
example, because this way I am improving my knowledge of this side of
the AR models.

So, maybe it is better to show some small code for a better view:

  n = 255; fs = 256;
  t = (0:n)/fs;
  ampn = .05;
  x_original_signal = sin(2*pi*t*4) + ampn*randn(size(t));

I am taking one cycle,

  x_period = [x_original_signal(1:64) x_original_signal(1)];

I put it like this just to show that I consider it as one period of a
continuous-time signal; of course, in discrete time it is characterised
by only 64 sample values. I then apply the arburg intrinsic function to
x_period(1:64) to estimate the AR coefficients. The model order I
selected was p = 16.

So this sequence of data does not come from an AR stochastic process; I
consider my data to be a deterministic signal with added Gaussian noise.
Also, the model order is chosen arbitrarily.

It seems that this missing error correction is what makes the whole mess
in the extrapolation chain, and I will try to sort it out, because the
idea that I didn't really get it the first time really bothers me. Now a
question just came to my mind: is AR extrapolation of any use at all?

Regards
Cris
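For completeness, a sketch of the missing extrapolation step, continuing
Cris's code above; the loop and the plot are illustrative additions, and
note that arburg returns the coefficient vector as [1 a(1) ... a(p)], so
the recursion uses elements 2:end with a sign flip:

  % Continuing the script above: fit AR(16) to the first cycle and
  % extrapolate the next 64 samples (the loop is an illustrative sketch).
  p = 16;
  a = arburg(x_period(1:64), p);      % a(1) = 1, then the AR coefficients

  x_est = x_original_signal(1:64);    % start from the measured cycle
  for m = 65:128
      x_est(m) = -a(2:end) * x_est(m-1:-1:m-p).';  % x[m] = -sum a[k]x[m-k]
  end

  plot(t(1:128), x_original_signal(1:128), 'b', ...
       t(65:128), x_est(65:128), 'r--');
  legend('measured', 'AR extrapolation');

Seen against equations [2] and [3] above, every sample after m = 64
feeds its own error back into the regression, which is exactly why the
second cycle need not resemble the first.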