DSPRelated.com
Forums

Cross Correlating Road Grade information

Started by martini April 20, 2006
All:

I am working on a problem that involves correlating road grade information
to position a vehicle.  I have a template of the expected road grade
measurements and several samples taken along that path.  I want to look at
the 'sample' road grade measurements and find where they best fit on the
template.

Basically I will be shifting the template until I get the best match.  At
this point I have done it a couple of ways...all of which are hard on the
PC.  I have shown that this idea will work by doing the following.

First I cut the template (or window it) so that it has the same number of
data points as the sample data.
Then I use MATLAB's corr function to correlate the two.
Then I shift the window of the template and calculate this again.

This keeps going until I reach the end of the template.  I then back out
the best match from the tabulated correlation coefficients.
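The stepped-window search described above can be sketched in plain Python. This is only an illustrative stand-in for MATLAB's corr: it computes the Pearson correlation coefficient of the sample against every same-length window of the template (the data here is made up):

```python
import math

def sliding_corrcoef(template, sample):
    """Pearson correlation of `sample` against every length-n window of
    `template`; returns one coefficient per window start position."""
    n = len(sample)
    ms = sum(sample) / n
    ds = [x - ms for x in sample]            # sample deviations (computed once)
    ss = math.sqrt(sum(d * d for d in ds))
    out = []
    for start in range(len(template) - n + 1):
        win = template[start:start + n]
        mw = sum(win) / n                    # mean LOCAL to this window
        dw = [x - mw for x in win]
        sw = math.sqrt(sum(d * d for d in dw))
        num = sum(a * b for a, b in zip(ds, dw))
        out.append(num / (ss * sw) if ss * sw > 0 else 0.0)
    return out

template = [0, 1, 2, 3, 2, 1, 0, 0]
sample = [2, 3, 2]
r = sliding_corrcoef(template, sample)
best = max(range(len(r)), key=lambda i: r[i])   # window start of best match
```

Note the cost: roughly (template length - sample length) x (sample length) multiply-adds, which is what makes this brute-force scan hard on the PC.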

Although this works, it is obviously a poor way of doing it, so I tried
something else...


I tried using the FFT method:  FFT both the sample and the entire
template, multiply the sample FFT by the conjugate of the template FFT,
and then inverse transform the whole thing.

I have also tried this using MATLAB's built-in xcorr function.
[xcorrelation,lags] = xcorr(road_grade_track,road_grade_sample);

Neither of these FFT methods produces the same results as the poor man's
correlation coefficient method shown above.  I am getting my maximum
correlation at lag values which do not line up the curves the 'best.'

I have found that if I subtract out the mean of the sample data and the
mean of the template data, the process works for 2 of the sample sets.

This is probably because the template data is so much longer than the
sample data: when I subtract out the mean, I am subtracting out the mean
of the entire set instead of the mean local to the matching area (if that
makes any sense.)
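One cheap way to get the mean local to each candidate window, rather than the global mean, is a running (cumulative) sum over the template. A minimal Python sketch of that bookkeeping:

```python
def local_means(template, n):
    """Mean of every length-n window of `template`, computed via a
    cumulative sum so the whole pass costs O(len(template)) total
    instead of O(n) per window."""
    csum = [0.0]
    for x in template:
        csum.append(csum[-1] + x)
    # window starting at i covers samples i .. i+n-1
    return [(csum[i + n] - csum[i]) / n
            for i in range(len(template) - n + 1)]

means = local_means([1, 2, 3, 4], 2)   # -> [1.5, 2.5, 3.5]
```

Subtracting these per-window means (rather than one global mean) is what the windowed correlation-coefficient method effectively does, which may be why it behaves better than the plain FFT correlation here.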


What can I do to make this work?

thanks in advance,
martini




"martini" <rdm186@psu.edu> wrote in message 
news:WOKdndRlwOgIQdrZnZ2dnUVZ_vednZ2d@giganews.com...
> [original post quoted in full; snipped]
First, it would be good to clearly specify how the data sets are sampled,
because comments like "so much longer" may mean "more samples per unit
distance" or "more distance covered".  So, it's not clear how you can
multiply the sample FFT with the template FFT* unless the number of
elements in each sequence is the same.  Also, when you do this, you need
to have the time sequences zero-padded to avoid aliasing of the circular
convolution that you're doing with the FFTs.

Like this: distance L feet for the data, sampled at intervals of 1 foot
for L samples; distance L for the template, sampled at intervals of 1
foot for L samples.  If this isn't the case, then you need to make it so
by interpolation or decimation of one or the other set of samples.  You
need L+L-1 samples to do the circular convolution, so it's pretty easy to
simply double the number of samples of each by zero-padding them both.
Then you take the FFT, multiply, and IFFT.

I'm a little confused about your problem statement: you have L samples of
the real data and L samples of the template.  Why doesn't this tell you
where they coincide by definition?  I don't understand what you mean by
"cutting" the template.  Is it that the template is very long relative to
the data?  Is your objective to align the long template with the short
data set?

If so, one way would be to cross-correlate the entire long template with
the short data set - no chopping up.  They don't have to be the same
length until you go to do the FFT, and then, as above, you must have
zero-padded adequately - which will provide the equal lengths.  The
overlap for any output will still be L samples, so you will be generating
all possible correlator outputs in one pass.  It sounds like what you're
doing, but this will do it without any interim cutting.

Fred
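The wrap-around effect described above can be seen with a toy example. This Python sketch uses a naive O(N^2) DFT (standing in for MATLAB's fft/ifft, so it needs no libraries) to do circular cross-correlation with and without zero-padding; the sequences are made up for illustration:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def circ_xcorr(a, b):
    """Circular cross-correlation via transforms: IDFT(A * conj(B))."""
    A, B = dft(a), dft(b)
    return idft([x * y.conjugate() for x, y in zip(A, B)])

a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.0, 0.0]
# Without padding, negative lags alias onto the high positive lags:
raw = circ_xcorr(a, b)          # raw[3] picks up a[0] via wrap-around
# Padding both sequences to 2*L leaves a clean zero where the
# wrap-around artifact used to be:
pad = [0.0] * 4
padded = circ_xcorr(a + pad, b + pad)   # padded[3] is ~0
```

With padding, the aliased value reappears at the end of the longer output, where it can be correctly interpreted as a negative lag instead of contaminating the positive lags.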
martini skrev:
> I am working on a problem which is correlating some road grade
> information to position a vehicle. I have a template of the expected
> road grade measurements and then several samples along that path. I
> want to look at the 'sample' road grade measurements and find where
> they best fit on the template.
OK, so what do you measure? Vibrations somewhere on the vehicle? One sensor? Several?
> basically I will be shifting the template until I get the best match.
> At this point I have done it a couple ways...all of which are hard on
> the PC. I have shown that this idea will work by doing the following.
Ehm... "will work" is a very strong claim... "might work" may be a better
term. There is always the possibility that some of the positive results
are sheer luck.
> first i cut the template (or window it) so that it has the same
> number of data points as the sample data.
> then i use matlabs corr function to correlate the two.
> then i shift the window of the template, and calculate this again.
I am not sure I understand how your "template" looks. Is this a
collection of separate reference measurements? Or did you drive around
your track once and get a complete measurement?
> this keeps going until i reach the end of the template. I then back
> out the best match from the tabulated correlation coefficients.
>
> Although this works it is obviously a poor way of doing this, so I
> tried something else...
>
> I tried using the FFT method: FFT both the sample and the entire
> template, multiply the sample FFT with the conjugate of the template
> FFT, and then inverse transform the whole thing.
Seems more efficient. But again, I don't know what you mean by "template", so it is hard to comment further.
> I have also tried this using matlabs built in xcorr function.
> [xcorrelation,lags] = xcorr(road_grade_track,road_grade_sample);
>
> Neither of these FFT methods produce the same results as the poor
> man's correlation coefficient method shown above.
It could be any of a number of reasons. FFT methods might suffer from
wrap-around effects, while time-domain methods might suffer from bias,
unless you are very cautious. If you chopped up one large sequence, there
may be issues with splicing those smaller sequences.
> I am getting my maximum correlation at lag values which do not line
> up the curves the 'best.'
That's a part of statistical signal processing.
> I have found that if i subtract out the mean of the sample data, and
> the mean of the template data, the process works for 2 of the sample
> sets..
Of how many?
> This is probably due to the fact that the template data is so much
> longer than the sample data, so when i subtract out the mean, it is
> subtracting out the mean of the entire set instead of the mean local
> to the matching area (if that makes any sense.)
That made sense to me. And yes, I agree that this might be a factor.
> What can I do to make this work?
I have no idea. You can, however, remove some of the uncertainties in the
data. For instance, you can look very carefully over the various
numerical artifacts that are introduced by different estimators for the
autocorrelation or -covariance. I would also suggest that you try to
normalize the correlation coefficients so that they are kept in the range
[-1, 1]. That way you avoid two large-amplitude signals "falsely" giving
a high correlation.

Rune
martini wrote:
> [original post quoted in full; snipped]
Subtract the mean of the windowed portion of the template, instead of the
mean of the entire set, which you know doesn't work?

Jerry
--
Engineering is the art of making what you want from things you can get.
Ok, so I will be more specific.

My data is collected from 1 sensor at a 50 Hz rate.  I then use velocity
and time measurements to estimate the path distance covered.  I then
interpolated my road grade measurements onto 3 cm intervals.  So both the
template and sample data are road grade as a function of path distance at
3 cm intervals.

To get the template, I went around the track once and collected the
measurements.

Then I went around the same path multiple times and collected
measurements at random spots for random lengths of time.

In post-processing I am trying to find where my sample data was taken
(I have the actual position from GPS to check against after I run this
routine).


So the thing that I said "works" was the following:

I find the path distance covered by the sample data (since this has been
decimated/interpolated to 3 cm intervals, this is equivalent to just
specifying the number of points) - say it's N points at 3 cm intervals.

I then take the first N points of my template data and do a correlation
between the 'sectioned' template and the sample data.

I store the correlation.

Then I move my template window by 5 or so points (15 cm) and take points 5
to N+5.  I repeat the correlation.

At the end I have all the correlation values.  From there I can find the
maximum value and the 'section' of template data it corresponded to.

Therefore my sample data best matches at that spot of the template,
etc...

So for one sample, after searching through a mile of track, it came back
and matched the 160 meter data set like this:

http://www.personal.psu.edu/rdm186/random_images/Sample3_RoadGrade.jpg



Obviously this is computationally intensive, which is why I was working
with the FFT stuff.  I have not done much with FFTs, so I could be making
some obvious mistakes...

When I am using the xcorr function in MATLAB, I am NOT 'sectioning' the
template.  I am directly feeding in the entire template as well as the
sample.  The xcorr documentation says that it will automatically zero-pad
the smaller array.

From what I understand, when called as below, the function will shift the
template and calculate the correlation at each shift, in both the forward
and backward directions:

[xcorrelation,lags] = xcorr(road_grade_track,road_grade_sample);


It records the correlation as xcorrelation, and the lag each value
corresponds to as lags.

I then find the index of the maximum correlation by:
I = find(xcorrelation == max(xcorrelation));

and the lag which it corresponds to as:
Istart = lags(I);
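For reference, the shift-and-sum that xcorr performs (and the peak/lag bookkeeping in the MATLAB lines above) can be mimicked in plain Python. This is a simplified stand-in, not a byte-for-byte copy of MATLAB's behavior; here a positive lag means the sample sits later in the track, and the data is made up:

```python
def xcorr_lags(x, y):
    """Raw linear cross-correlation of x against y at every lag,
    with r[i] = sum over m of x[m + lag] * y[m] (out-of-range terms
    treated as zero), returned together with the lag values."""
    nx, ny = len(x), len(y)
    lags = list(range(-(ny - 1), nx))
    r = []
    for lag in lags:
        s = 0.0
        for m in range(ny):
            k = m + lag
            if 0 <= k < nx:
                s += x[k] * y[m]
        r.append(s)
    return r, lags

x = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]   # "track"
y = [1.0, 2.0, 1.0]                  # "sample", planted at offset 2
r, lags = xcorr_lags(x, y)
best_lag = lags[max(range(len(r)), key=lambda i: r[i])]
# best_lag is where the sample best lines up inside the track
```

Note this raw (unnormalized) correlation is what the FFT method computes too, which is why it can disagree with the windowed correlation-coefficient search: no per-window mean removal or scaling is applied.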

Because of the way the lag values are stored, I can determine whether my
template is lagging or leading the data.


So if the template needs to be shifted back to match the sample, I can
just start the template (and the path distance it covers) at the point of
highest correlation:

road_grade_track = road_grade_track(Istart:length(road_grade_track));
distance_track = distance_track(Istart:length(distance_track));
distance_track = distance_track-distance_track(1);

For plotting purposes I then make the sample and template data the same
length:

road_grade_track = road_grade_track(1:length(road_grade_sample));
distance_track = distance_track(1:length(distance_sample));




What I get is not the same as what I got the previous way.

Let me know what you think.  I will go back and make sure I have answered
most of the questions you two asked.

Thanks for the quick replies!
Just as a follow-up:

Sample data is   5447  data points
Track data is   56366  data points

So the question is: where in those 56366 data points do my 5447 data
points match best, and how can I calculate this in the most efficient
way?
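A rough operation count with those lengths shows why the FFT route is attractive. This Python back-of-the-envelope uses a crude ~5 N log2 N cost per transform, which is an assumption, not a measurement:

```python
import math

M, N = 56366, 5447                    # track and sample lengths from the post
direct_ops = (M - N + 1) * N          # multiply-adds for the sliding scan
# FFT route: pad both to the next power of two >= M + N - 1,
# then two forward transforms, one product, one inverse transform.
nfft = 1 << math.ceil(math.log2(M + N - 1))
fft_ops = int(3 * 5 * nfft * math.log2(nfft))
print(direct_ops, nfft, fft_ops, direct_ops // fft_ops)
```

Even with generous constants, the transform route comes out more than an order of magnitude cheaper than the brute-force scan; the remaining question is making its (unnormalized) output agree with the windowed correlation coefficient.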


martini skrev:
> Ok, so I will be more specific.
>
> My data is collected from 1 sensor at a 50 Hz rate. I then use
> velocity and time measurements to estimate path distance covered. I
> then interpolated my road grade measurements to be set at 3cm
> intervals. So both the template and sample data are road grade as a
> function of path distance of 3cm intervals.
OK, another numerical detail to keep track of. How do you interpolate the measurements? How many interpolated points between each measurement? Those kinds of things. No need to answer them now, but at some point one might get back to these questions.
> To get the template, I went around the track once, and collected the
> measurements.
Wearing my "devil's advocate" robe: what would happen if you made several
reference passes around the track? How consistent would the reference
data be between laps? Coming up with some sort of estimate for this
coherence could give a clue about what to expect in your random
measurements. Probably too late for a multiple-lap reference measurement
now?
> Then i went around the same path, multiple times, and collected
> measurements along random spots for random lengths of time.
OK.
> In post processing I am trying to find where my sample data was taken
> (because I have the actual position from GPS to check after I run
> this routine)
Good job! You don't see that too often: an estimate that can be checked
against an independently measured "truth".
> So the thing that i said "works" was the following:
>
> I find the path distance covered by the sample data (since this has
> been decimated/interpolated at 3cm intervals, this is equivalent to
> just specifying the number of points) - say it's N points at 3 cm
> intervals.
>
> I then take the first N points of my template data and do a
> correlation between the 'sectioned' template, and the sample data.
>
> I store the correlation.
OK. My only comment here is that you ought to make sure this correlation
coefficient is normalized to the [-1, 1] range, or possibly its absolute
value to the [0, 1] range.
> Then I move my template window by 5 or so points (15 cm) and take
> points 5 to N+5. I repeat the correlation.
>
> At the end I have all the correlation values. From there I can find
> the maximum value, and the 'section' of template data it corresponded
> to.
>
> Therefore my sample data best matches at that spot of the template,
> etc...
Sure, provided that normalization issue has been taken care of.
> so for one sample, after searching through a mile of track, it came
> back and matched the 160 meter data set like this:
>
> http://www.personal.psu.edu/rdm186/random_images/Sample3_RoadGrade.jpg
>
> obviously this is computationally intensive, which is why I was
> working with the FFT stuff. I have not done much with FFTs so I could
> be making some obvious mistakes...
As a general rule, getting things right is usually more important than getting things fast.
> [xcorr description and MATLAB code from the previous post; snipped]
I'll need some time to look this over.
> What i get is not the same as what I get the previous way.
>
> let me know what you think.
Well, I would suggest you spend a little bit of time getting to know the
computational routines. I don't know how much time you have left for this
project, but spending a day or two with the FFT and XCORR functions now
might save you lots of time later.

What I would do in your situation is focus on one of the examples that
work, and use it as the reference data set. Test the various techniques
on that data set, see the differences between the estimators, and get
some code up that runs fast. Only after all that is done would I get back
to the other random samples. Then you can set up a processing loop that
computes the correlation coefficients and matches them to the reference.

It seems there are two main questions that would decide how to proceed:

1) Is it possible to obtain a multiple-lap reference recording from the
track? (I assume the answer is 'no', but I have to ask.)
2) How much time do you have before you must submit a report?

Rune
You might be able to match the points faster if you start with a big
shift (say half your data size, sh = 2720, instead of 5) and find the
'section' (x_min, x_max) that gives the maximum correlation.

Define the new template as the winning 'section' plus neighboring
points (x_min - sh/2, x_max + sh/2).

Repeat the correlation on the smaller template, reducing the shift
value to sh/2.

Repeat the process until you reach a small enough shift that you are
satisfied with.
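The coarse-to-fine idea above might be sketched like this in Python (illustrative only; the score function and names are placeholders, and note the caveat that a coarse first pass can lock onto a local maximum and miss the true best offset, so it trades robustness for speed):

```python
def coarse_to_fine(template, sample, score, init_step=None):
    """Scan candidate offsets with a large step, then halve the step
    and re-scan only a neighborhood of the best offset so far."""
    n = len(sample)
    lo, hi = 0, len(template) - n
    step = init_step or max(1, (hi - lo) // 2)
    while True:
        cands = list(range(lo, hi + 1, step)) or [lo]
        best = max(cands, key=lambda s: score(template[s:s + n], sample))
        if step == 1:
            return best
        # shrink the search window around the current winner
        lo = max(0, best - step // 2)
        hi = min(len(template) - n, best + step // 2)
        step = max(1, step // 2)

# toy data: the sample [9, 9, 9] is planted at offset 50
template = list(range(50)) + [9, 9, 9] + list(range(50))
sample = [9, 9, 9]
score = lambda w, s: -sum(abs(a - b) for a, b in zip(w, s))
pos = coarse_to_fine(template, sample, score)
```

Each refinement pass evaluates only a handful of candidate offsets, so the total work grows roughly with the log of the template length rather than linearly; whether the coarse pass is trustworthy depends on how smooth the correlation surface of the real road-grade data is.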



martini wrote:
> I then take the first N points of my template data and do a
> correlation between the 'sectioned' template, and the sample data.
> I store the correlation.
> Then I move my template window by 5 or so points (15 cm) and take
> points 5 to N+5. I repeat the correlation.
> At the end I have all the correlation values. From there I can find
> the maximum value, and the 'section' of template data it corresponded
> to.
> just as a follow up:
>
> Sample data is 5447 data points
> track data is 56366 data points
>
> so the question is...where in the 56366 data points does my 5447 data
> points match the best...and how can i calculate this in the most
> efficient way