# AR Extrapolation

Thread started December 9, 2004
```
Hi Cris,

> I am doing some research regarding the High-Resolution methods, for
> applications in Automotive Radar, and before I compose my own C codes I am
> playing with the Matlab intrinsic functions.

That's a very wise way of working. Be aware, though, that some neat
functionality you will probably want in your own code, like order
estimation, is usually missing from the Matlab functions.

> One of these functions is
> arburg (pburg) and I was curious how good are the AR coefficients estimated.
> So, I build a short example, I took a short data sequence (one cycle
> (period) for better analysis) of length = 64 samples

So you have 64 samples and one sinusoid with period n/64 for some
integer n? Do you include damping in your sinusoid? Or do you use
a completely different type of data?

> added with some noise
> (from a longer generated sequence),

The same goes for the noise. Exactly how did you generate it?
Not that it would matter very much in this particular case
but such details are of immense help when you ask questions
here. There could be a problem with the way you generate the
noise. Even if it isn't, people would like to have this type
of information just to rule that out.

> and I applied pburg. The spectrum looks
> okay, I mean the estimated frequency is at the right bin. With the same
> short sequence I estimated the coefficients using arburg, and then having
> these coefficients and the original sequence of data I tried to estimate
> (through extrapolation) the following cycle, which it should look maybe not
> like a twin brother of the main cycle but close to it.
>
> I used the following eq. for extrapolation:
>
>               p
>    x[n] = - sum a[k]x[n-k]
>              k=1
>
> to my surprise, I did not obtain the expected sequence of estimated data
> values. Is my reasoning flawed? Am I missing something?
>
> (Parameters: 64 data samples x[0:n-1], 16 estimated coefficients a[1:p] and
> then try to estimate the next 64 samples x[n:2n-1])

Well, you are using the method out of scope. As a basic premise, assume that
the data you look at are *not* from an AR(p) stochastic process. This
happens either because there is observation noise added to your data,
or because the AR(p) model is an arbitrarily chosen model (usually because
of both these reasons).

So the regression equation becomes

                    P
   x'[n] + e[n] = -sum a[k]x[n-k]                           (1)
                   k=1

Look carefully at equation (1). The 'x' here is not an arbitrary
data sample; it is a data sample that belongs to an ordered N-dimensional
random variable drawn from a stochastic process (N being the
number of data samples available, in your case N = 64).

Equation (1) says that given P ordered samples of this random variable,
it is possible to predict the next sample better than by guessing
wildly. An error will be made, but it can be shown that equation (1)
is optimal both with respect to minimizing the noise energy (as is
shown through the derivation of the Yule-Walker equations), and
with respect to maximizing the entropy of the error
sequence (Burg's derivation of 1968).

Now, what happens if you extrapolate the sequence x? I'll demonstrate
by using equation (1) on the data sequence {x[0], x[1],..., x[p-1]}
and extrapolating first to find x[p] and then x[p+1].

First, estimate x[p] based on the samples {x[0],...,x[p-1]}:

              p
   x"[p] = - sum a[k]x[p-k] = e[p]                          (2)
             k=1

where x"[n] is the estimate of x[n] by extrapolation.

Why no x'[p] in equation (2)? Because sample x[p] does not exist
in the data stream. Because of that, it is not possible for the
prediction operator to correct the prediction error, as is readily
seen in the next step:

                p
   x"[p+1] = - sum a[k]x[p+1-k] + a[1]e[p]                  (3)
               k=2

The difference between equation (3) and equation (1) is that there
is a data sample x[p] where the e[p] step occurs in equation (1).
The occurrence of this new data sample means that any error introduced
in equation (2) when *predicting* sample x[p] is *not* carried
on in the prediction of sample x[p+1].

Since no new sample x[p] arrives from the data source between
steps (2) and (3), it is no longer possible for the predictor to
correct for the error made in step (2). So the extrapolation problem
is ill-posed. After p steps, extrapolation is probably no better than
wild guessing.

So in summary, your problems are caused by the fact that you
are trying to use the method outside its scope. "Prediction" and
"extrapolation" are two very different things.

Rune
```
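Rune's prediction equation and the Burg recursion he refers to can be sketched in a few lines of Python/numpy (a minimal stand-in, not Matlab's actual `arburg` implementation; the function names are mine):

```python
import numpy as np

def arburg(x, p):
    """Estimate AR(p) coefficients a[0..p] (a[0] = 1) with Burg's method."""
    f = np.asarray(x, dtype=float).copy()   # forward prediction errors
    b = f.copy()                            # backward prediction errors
    a = np.array([1.0])
    for m in range(p):
        fk = f[m + 1:].copy()
        bk = b[m:-1].copy()
        # reflection coefficient minimising forward + backward error energy
        k = -2.0 * np.dot(fk, bk) / (np.dot(fk, fk) + np.dot(bk, bk))
        f[m + 1:] = fk + k * bk             # update the error sequences
        b[m + 1:] = bk + k * fk
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]                 # Levinson-style coefficient update
    return a

def predict_next(x, a):
    """One-step prediction x'[n] = -sum_{k=1..p} a[k] x[n-k]."""
    p = len(a) - 1
    return -np.dot(a[1:], x[-1:-p - 1:-1])

# A noiseless sinusoid satisfies an exact AR(2) recursion, so p = 2 suffices
x = np.sin(0.3 * np.arange(200))
a = arburg(x, 2)
poles = np.roots(a)                         # pole angles recover the frequency
```

For a noiseless sinusoid the poles land very nearly on the unit circle at angles of +-0.3 rad and the one-step prediction is almost exact; add observation noise and both degrade, which is the "out of scope" issue discussed above.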
```
"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
> <snip>
Thank you for your response, Rune.

I'll try to give some more precise information about what I am doing.

It seems that I complicated my life without intending to; I was just curious
to see whether the theory is as simple to apply as it looks, and it is not.
Prediction is definitely not the same thing as extrapolation, and now I am
'happy' that I tried this example, because it is improving my knowledge of
this side of the AR models.

So maybe it is better to show some small code for a better view:

n = 255;
fs = 256;
t = (0:n)/fs;
ampn = .05;
x_original_signal = sin(2*pi*t*4) + ampn*randn(size(t));

I am taking one cycle,

x_period = [x_original_signal(1:64) x_original_signal(1)];

I wrote it like this just to show that I consider it a period of a
time-continuous signal; of course, in the discrete-time representation it is
characterised by only 64 sample values.

and then apply the arburg intrinsic function to x_period(1:64) to estimate
the AR coefficients.

The model order I selected was p = 16.

So this sequence of data does not come from an AR stochastic process. I
consider my data a deterministic signal with added Gaussian-distributed
noise. Also, the model order is chosen arbitrarily.

It seems that this missing error correction is what makes the whole mess in
the extrapolation chain, and I will try to work it out, because it really
bothers me that I didn't get it right the first time. A question just came
to my mind: is AR extrapolation of any use at all?

Regards

Cris

```
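Cris's experiment can be mirrored in Python to watch the error build up. This sketch substitutes the Yule-Walker (autocorrelation) equations for `arburg` (a simplification on my part, same AR model class; variable names follow the Matlab snippet), then extrapolates by feeding the estimates back into the regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same signal as the Matlab snippet: 4 Hz sinusoid plus Gaussian noise
n, fs, ampn = 255, 256, 0.05
t = np.arange(n + 1) / fs
x = np.sin(2 * np.pi * t * 4) + ampn * rng.standard_normal(t.size)

x_period = x[:64]        # one cycle (period = fs/4 = 64 samples)
p = 16

# Biased autocorrelation estimates r[0..p]
r = np.array([np.dot(x_period[:64 - k], x_period[k:]) for k in range(p + 1)]) / 64
# Solve the Yule-Walker equations R a = -r[1..p] for a[1..p]
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a = np.linalg.solve(R, -r[1:])

# Extrapolate the next 64 samples: x"[m] = -sum_{k=1..p} a[k] x"[m-k].
# Past index 63 there is no new data, so estimates are fed back on themselves.
buf = list(x_period)
for _ in range(64):
    buf.append(-np.dot(a, buf[-1:-p - 1:-1]))
x_extrap = np.array(buf[64:])

err = np.abs(x_extrap - x[64:128])   # grows as prediction errors compound
```

The first extrapolated sample is an honest one-step prediction; every later one reuses earlier estimates, so the errors compound and the extrapolated cycle drifts away from the true one, which is the distinction between prediction and extrapolation drawn in the thread.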