Reply by Rune Allnor June 29, 2006
Steve Underwood wrote:
> Jerry Avins wrote:
> > JF Mezei wrote:
> >
> > ...
> >
> >> So one must really understand the "event" as well as how the data was
> >> recorded for that event before starting to process such data and
> >> eliminate points judged to be "bad".
> >
> > One must always understand data in order to analyze and interpret it
> > meaningfully. Believing otherwise is like believing that someone can
> > manage a business without understanding its nature.
>
> Tell that to an MBA. :-)
>
> You are quite right. When I asked what kind of nav system this is, there
> was no response. All the discussion has been about hypothetical
> something or others, rather than real world improvement of specific
> problems in the data.
>
> Steve
My apologies for that, Steve. These *are* real-world data from real-world systems, but as I don't know exactly where the limits of corporate "hush-hush" go, I'd rather play it safe for now. And, of course, I can guess, but I don't necessarily know the important details. No offence to you or anyone else.

I saw one way of doing things and I wanted to know what the alternatives are, preferably commercially available ones. As there seems to be no canned solution available, I'll probably have to program some Kalman filters myself and play with them until I get a sense for how these things work and how to incorporate the various ideas.

Rune
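Rune does not say which model he has in mind. As a minimal starting point, a scalar Kalman filter can be sketched in a few lines of Python. This sketch assumes the simplest possible model, a random-walk state observed in additive noise; a real navigation filter would track position, velocity and heading together with matrix-valued covariances. `kalman_1d` and its parameter names are hypothetical.

```python
def kalman_1d(measurements, q, r, x0, p0):
    """Scalar Kalman filter for a random-walk state observed in noise.

    Assumed model (illustrative only):
        x[k] = x[k-1] + w,  w ~ N(0, q)   (process noise)
        z[k] = x[k]   + v,  v ~ N(0, r)   (measurement noise)
    x0, p0: initial state estimate and its variance.
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps x and inflates its variance by q.
        p = p + q
        # Update: weight the new measurement by the Kalman gain k in [0, 1).
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [1.2, 0.8, 1.1, 0.9, 5.0, 1.0, 1.05, 0.95]  # one wild point
    for z, est in zip(noisy, kalman_1d(noisy, q=1e-3, r=0.1, x0=1.0, p0=1.0)):
        print(f"z={z:5.2f}  estimate={est:6.3f}")
```

The ratio of `q` to `r` is the main tuning knob: a small `q` trusts the model and smooths hard, while a large `q` follows the measurements closely, wild points included.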
Reply by Steve Underwood June 28, 2006
Jerry Avins wrote:
> JF Mezei wrote:
>
> ...
>
>> So one must really understand the "event" as well as how the data was
>> recorded for that event before starting to process such data and
>> eliminate points judged to be "bad".
>
> One must always understand data in order to analyze and interpret it
> meaningfully. Believing otherwise is like believing that someone can
> manage a business without understanding its nature.
Tell that to an MBA. :-)

You are quite right. When I asked what kind of nav system this is, there was no response. All the discussion has been about hypothetical something or others, rather than real world improvement of specific problems in the data.

Steve
Reply by JF Mezei June 28, 2006
Mogens Beltoft wrote:
> If the new sampled track point n is outside the "road" defined by track
> points n-1 and n-2 plus a margin to each side of the line n-2 to n-1, or
> the unit has not recorded a track point for "this long", then record
> track point n.
A change in speed also causes a track point to be recorded on some Garmin units, and I think a change in heading does too. I don't think Garmin ever documented the algorithm.
Reply by Mogens Beltoft June 28, 2006
Ulrich Bangert wrote:
> Hello JF Mezei,
>
>> OK, fair enough. But that still leaves the requirement that the user
>> know about the type of data that he has to process, the types of
>> irregularities which must be retained, and those that can be removed,
>> because this will be needed to decide on the window size. And one also
>> needs to know how the data was collected.
>
> Agreed!
>
>> of stray points. With "auto" track recording, chances are very good that
>> the GPS would record a point at the turnoff, one point at the stop for
>> water, and again a point once the car gets back to the main road and turns
>> back into the normal direction.
>
> I am not sure if I interpret the term "auto track recording" the right
> way. Perhaps it is even a "standard" term in navigation that I am not aware
> of (I have approached the question of outlier detection purely from a
> mathematical point of view). But if it is some kind of "event-driven" track
> recording, you are of course right that the proposed algorithm cannot handle
> data acquired in this way, because some front-end entity has already decided
> what is an event and what is not, and has failed to acquire the
> "surrounding data" that are necessary for the algorithm.
I read somewhere that some GPS units use a boundary and time check when recording track points in auto mode. It went something like this: If the new sampled track point n is outside the "road" defined by track points n-1 and n-2 plus a margin to each side of the line n-2 to n-1, or the unit has not recorded a track point for "this long", then record track point n.

/Mogens
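The corridor rule Mogens describes can be sketched in Python. Two details are assumptions, since the post does not specify them: the "road" is taken as the infinite line through points n-2 and n-1 (so later points further along a straight road stay inside it), and the first two samples are always recorded. `auto_track`, `margin` and `max_interval` are hypothetical names, and no real unit's firmware is implied.

```python
import math


def point_to_line_distance(p, a, b):
    """Perpendicular distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide (e.g. the unit was standing still): use point distance.
        return math.hypot(px - ax, py - ay)
    # |cross(b - a, p - a)| / |b - a|
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def auto_track(samples, margin, max_interval):
    """Thin a time-ordered list of (t, x, y) samples with the corridor rule.

    A sample is recorded if it lies more than `margin` from the line through
    the last two recorded points, or if at least `max_interval` time units
    have passed since the last recorded point. (Hypothetical sketch.)
    """
    recorded = []
    for t, x, y in samples:
        if len(recorded) < 2:
            recorded.append((t, x, y))  # assumption: seed the corridor
            continue
        (_, xa, ya), (t_last, xb, yb) = recorded[-2], recorded[-1]
        off_road = point_to_line_distance((x, y), (xa, ya), (xb, yb)) > margin
        too_long = t - t_last >= max_interval
        if off_road or too_long:
            recorded.append((t, x, y))
    return recorded
```

On a straight road only the time check fires, so the recorded points thin out to one per `max_interval`; a turnoff like the one JF Mezei describes below leaves the corridor and is recorded immediately.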
Reply by Jerry Avins June 28, 2006
JF Mezei wrote:

   ...

> So one must really understand the "event" as well as how the data was
> recorded for that event before starting to process such data and
> eliminate points judged to be "bad".
One must always understand data in order to analyze and interpret it meaningfully. Believing otherwise is like believing that someone can manage a business without understanding its nature.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
Reply by Ulrich Bangert June 28, 2006
Hello JF Mezei,

> OK, fair enough. But that still leaves the requirement that the user
> know about the type of data that he has to process, the types of
> irregularities which must be retained, and those that can be removed,
> because this will be needed to decide on the window size. And one also
> needs to know how the data was collected.
Agreed!
> of stray points. With "auto" track recording, chances are very good that
> the GPS would record a point at the turnoff, one point at the stop for
> water, and again a point once the car gets back to the main road and turns
> back into the normal direction.
I am not sure if I interpret the term "auto track recording" the right way. Perhaps it is even a "standard" term in navigation that I am not aware of (I have approached the question of outlier detection purely from a mathematical point of view). But if it is some kind of "event-driven" track recording, you are of course right that the proposed algorithm cannot handle data acquired in this way, because some front-end entity has already decided what is an event and what is not, and has failed to acquire the "surrounding data" that are necessary for the algorithm.

Regards
Ulrich

"JF Mezei" <jfmezei.spamnot@teksavvy.com> wrote in message news:44A2289A.BF59B898@teksavvy.com...
> Ulrich Bangert wrote:
> > very significant. Note that the algorithm can fit BOTH kinds of views by
> > adapting the window length. If you make the window length greater than 2 X
> > the "hill length", then the hill will be completely removed from the data. If
> > you find that the hill is significant, then make the window length smaller
> > than 2 X the "hill length"; in this case the hill will not be filtered out.
>
> OK, fair enough. But that still leaves the requirement that the user
> know about the type of data that he has to process, the types of
> irregularities which must be retained, and those that can be removed,
> because this will be needed to decide on the window size. And one also
> needs to know how the data was collected.
>
> Say on a long straight road, a car turns off and drives 100 m to a water
> hole/pump. With periodic trackpoint recording, you could have a couple
> of stray points. With "auto" track recording, chances are very good that
> the GPS would record a point at the turnoff, one point at the stop for
> water, and again a point once the car gets back to the main road and turns
> back into the normal direction.
>
> Now, both would have a couple of stray points from a purely
> "mathematical" point of view. But in the second case, a human could
> more clearly see a path away from the road and back to the road at the same
> intersection to resume course.
>
> So one must really understand the "event" as well as how the data was
> recorded for that event before starting to process such data and
> eliminate points judged to be "bad".
Reply by Ulrich Bangert June 28, 2006
Rune,

as a dedicated follower of PASCAL I program in Borland DELPHI, which produces
native code that I do not suspect to be significantly slower than C/C++
generated code. But over the years I have found that the Matlab help system
gives me information about mathematical topics at exactly the level that
seems to match me; that's why I pointed to it. If Plotter does not read your
files, then (in case they are ASCII) send me a few lines of them. I am very
interested in making my file-reading routines as universal as possible, so every
file they cannot read is of interest to me.

Regards
Ulrich


"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1151475859.983140.83250@d56g2000cwd.googlegroups.com...
> Ulrich Bangert wrote:
> > To Rune:
> >
> > On a typical PC with a window width of 100 I process 600000 data points in
> > 1-2 minutes, so it is not as slow as my first mail may have indicated. I
> > use this algorithm, for example, in a freeware program named "Plotter". You
> > can download "Plotter" from my homepage
> >
> > www.ulrich-bangert.de
> >
> > If you manage to load your data files with that (chances are...), you can
> > immediately test the quality and the speed of the outlier detection.
>
> I'll definitely have a look into this. Did your first post indicate that
> you have programmed these things in Matlab? If so, there is potential for a
> speed-up here. I usually get a speed-up on the order of 10-50x when
> I port from Matlab to C or C++.
>
> Rune
Reply by JF Mezei June 28, 2006
Ulrich Bangert wrote:
> very significant. Note that the algorithm can fit BOTH kinds of views by
> adapting the window length. If you make the window length greater than 2 X
> the "hill length", then the hill will be completely removed from the data. If
> you find that the hill is significant, then make the window length smaller
> than 2 X the "hill length"; in this case the hill will not be filtered out.
OK, fair enough. But that still leaves the requirement that the user know about the type of data that he has to process, the types of irregularities which must be retained, and those that can be removed, because this will be needed to decide on the window size. And one also needs to know how the data was collected.

Say on a long straight road, a car turns off and drives 100 m to a water hole/pump. With periodic trackpoint recording, you could have a couple of stray points. With "auto" track recording, chances are very good that the GPS would record a point at the turnoff, one point at the stop for water, and again a point once the car gets back to the main road and turns back into the normal direction.

Now, both would have a couple of stray points from a purely "mathematical" point of view. But in the second case, a human could more clearly see a path away from the road and back to the road at the same intersection to resume course.

So one must really understand the "event" as well as how the data was recorded for that event before starting to process such data and eliminate points judged to be "bad".
Reply by Rune Allnor June 28, 2006
Ulrich Bangert wrote:
> To Rune:
>
> On a typical PC with a window width of 100 I process 600000 data points in
> 1-2 minutes, so it is not as slow as my first mail may have indicated. I
> use this algorithm, for example, in a freeware program named "Plotter". You
> can download "Plotter" from my homepage
>
> www.ulrich-bangert.de
>
> If you manage to load your data files with that (chances are...), you can
> immediately test the quality and the speed of the outlier detection.
I'll definitely have a look into this. Did your first post indicate that you have programmed these things in Matlab? If so, there is potential for a speed-up here. I usually get a speed-up on the order of 10-50x when I port from Matlab to C or C++.

Rune
Reply by Ulrich Bangert June 28, 2006
To Rune:

On a typical PC with a window width of 100 I process 600000 data points in
1-2 minutes, so it is not as slow as my first mail may have indicated. I
use this algorithm, for example, in a freeware program named "Plotter". You
can download "Plotter" from my homepage

www.ulrich-bangert.de

If you manage to load your data files with that (chances are...), you can
immediately test the quality and the speed of the outlier detection.
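Ulrich does not say how his implementation evaluates each window. If the per-window statistic is a median (an assumption, though a moving median is a common building block for windowed outlier tests), one standard way to keep the cost down at 600000 points with a window of 100 is to hold the window in sorted order and update it incrementally rather than re-sorting every window. A minimal Python sketch; `running_median` is a hypothetical name, not a function from Plotter.

```python
import bisect
from collections import deque


def running_median(values, width):
    """Median of every full window of `width` consecutive values.

    The window is kept twice: in arrival order (to know which value leaves)
    and in sorted order (to read the median directly). Each step is one
    sorted insertion and one sorted deletion instead of a full re-sort.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    window = deque()  # values in arrival order
    ordered = []      # the same values, kept sorted
    medians = []
    for v in values:
        window.append(v)
        bisect.insort(ordered, v)
        if len(window) > width:
            old = window.popleft()
            # bisect_left finds an occurrence of `old`, which is in `ordered`.
            del ordered[bisect.bisect_left(ordered, old)]
        if len(window) == width:
            mid = width // 2
            if width % 2:
                medians.append(ordered[mid])
            else:
                medians.append(0.5 * (ordered[mid - 1] + ordered[mid]))
    return medians
```

Each step costs a binary search plus a list shift of at most `width` elements, so the total work grows roughly as n·w with a small constant, compared with n·w·log w for sorting every window from scratch.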

To JF Mezei:

If you managed to figure out exactly what the algorithm does, you will have
noticed that, for detecting outliers, only what is INSIDE the window is
significant, nothing else. For that reason, if this algorithm is
applied to the scenario you present, the first thing to say is that it does
not matter at all whether you have been riding for 6, 12, 18 or any number of
hours before you meet the hill. The algorithm is completely insensitive to
that!

The window is something like "If you want to detect outliers, look only at
values in the neighbourhood and decide what is normal and what is not for
them". Please note also that your scenario raises the question of how to
define "outlier". Other people would perhaps think that the "hill
scenario" IS indeed an outlier that should be removed, while you think it is
very significant. Note that the algorithm can fit BOTH kinds of views by
adapting the window length. If you make the window length greater than 2 X
the "hill length", then the hill will be completely removed from the data. If
you find that the hill is significant, then make the window length smaller
than 2 X the "hill length"; in this case the hill will not be filtered out.
By applying the rule "an event shorter than n/2 may be an outlier" (n being
the window length), YOU decide what is an outlier, not the algorithm.
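Ulrich's exact algorithm is described earlier in the thread, not here, so the Python sketch below is a stand-in rather than his code: a Hampel-style filter that flags a point when it lies far from the median of its own window, measured in scaled median absolute deviations (MAD). It shares the two properties he describes: only values inside the window matter, and because a median ignores anything covering less than half the window, an event shorter than n/2 is treated as an outlier. `window_outliers`, `threshold` and `min_scale` are hypothetical names chosen for illustration.

```python
import statistics


def window_outliers(values, width, threshold=3.0, min_scale=1e-12):
    """Flag values that deviate strongly from the median of their window.

    For each point, only the `width` values centred on it are examined
    (fewer near the ends). The point is flagged if its distance from the
    window median exceeds `threshold` times the scaled MAD. `min_scale`
    keeps the test meaningful on perfectly flat data, where the MAD is 0.
    """
    half = width // 2
    flags = []
    for i, v in enumerate(values):
        window = values[max(0, i - half): i + half + 1]
        med = statistics.median(window)
        mad = statistics.median([abs(w - med) for w in window])
        # 1.4826 * MAD estimates the standard deviation for Gaussian noise.
        scale = max(1.4826 * mad, min_scale)
        flags.append(abs(v - med) > threshold * scale)
    return flags


if __name__ == "__main__":
    # Flat track with a 10-sample "hill" at indices 25..34.
    track = [0.0] * 25 + [10.0] * 10 + [0.0] * 25
    wide = window_outliers(track, 41)    # window > 2 X hill length
    narrow = window_outliers(track, 11)  # window < 2 X hill length
    print("wide window flags:", [i for i, f in enumerate(wide) if f])
    print("narrow window flags:", [i for i, f in enumerate(narrow) if f])
```

With the 41-sample window every hill sample is flagged, while the 11-sample window flags none, which is the 2 X "hill length" rule in action. This naive version recomputes both medians for every point, so it costs roughly O(n·w log w).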

I cannot accept your second objection; it is an outlier detection algorithm,
not a biker's rest detection algorithm. But if you want to put forward the
question of whether the rest will be detected as an outlier or not, the same
rules apply as above: if the window length is set to a value such that the
length of the braking action before the stop and the window length "match", then
the stop will be recognized as a "normal" change in the data.

Regards
Ulrich

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1151393854.224220.97860@p79g2000cwp.googlegroups.com...
> Ulrich Bangert wrote:
> lots of interesting stuff.
>
> Thanks. Sounds like something to look into. Processing speed is
> (as of yet) insignificant if it can release man-hours for other duties.
> Where I am right now, man-hours are expensive. If a computer
> needs 12 hours for this sort of job, then so be it, if it can be done
> in the human operator's time off watch.
>
> Rune