Steve Underwood wrote:
> Jerry Avins wrote:
> > JF Mezei wrote:
> >
> >   ...
> >
> >> So one must really understand the "event" as well as how the data was
> >> recorded for that event before starting to process such data and
> >> eliminate points judged to be "bad".
> >
> > One must always understand data in order to analyze and interpret it
> > meaningfully. Believing otherwise is like believing that someone can
> > manage a business without understanding its nature.
>
> Tell that to an MBA. :-)
>
> You are quite right. When I asked what kind of nav system this is, there
> was no response. All the discussion has been about hypothetical
> something or others, rather than real world improvement of specific
> problems in the data.
>
> Steve

My apologies for that, Steve. These *are* real-world data from real-world
systems, but as I don't know exactly where the limits go for corporate
"hush-hush", I'd rather play it safe for now. And, of course, I can guess,
but I don't necessarily know the important details.

No offence to you or anyone else. I saw a way of doing things and I wanted
to know what the alternatives, preferably commercially available, are. As
there seems to be no canned solution available, I'll probably have to
program some Kalman filters myself and play with them until I get a
sense for how these things work and how to incorporate the various
ideas.

Rune

Jerry Avins wrote:
> JF Mezei wrote:
> 
>   ...
> 
>> So one must really understand the "event" as well as how the data was
>> recorded for that event before starting to process such data and
>> eliminate points judged to be "bad".
> 
> One must always understand data in order to analyze and interpret it 
> meaningfully. Believing otherwise is like believing that someone can 
> manage a business without understanding its nature.

Tell that to an MBA. :-)

You are quite right. When I asked what kind of nav system this is, there 
was no response. All the discussion has been about hypothetical 
something or others, rather than real world improvement of specific 
problems in the data.

Steve

Mogens Beltoft wrote:
> If the new sampled track point n is outside the "road" defined by track
> points n-1 and n-2 plus a margin to each side of the line n-2 to n-1, or
> the unit has not recorded a track point for "this long", then record
> track point n.

Change in speed also causes a track point to be recorded on some Garmin
units. And I think that change in heading also does. I don't think
Garmin ever documented the algorithm.

Ulrich Bangert wrote:
> Hello JF Mezei,
> 
>> Ok. fair enough. But that still leaves the requirement that the user
>> know about the type of data that he has to process, the types of
>> irregularities which must be retained, and those that can be removed
>> because this will be needed to decide on the window size. And one also
> >> needs to know how the data was collected.
> 
>  Agreed!
> 
>> of stray points. With "auto" track recording, chances are very good that
>> the GPS would record a point at the turnoff, one point at the stop for
>> water, and again a point once the car gets back to main road and turns
>> back into the normal direction.
> 
> I am not sure if I interpret the term "auto track recording" in the right
> way. Perhaps it is even a "standard" term in navigation that I am not aware
> of (I have looked at the question of outlier detection purely from a
> mathematical point of view). But if it is some kind of "event driven" track
> recording you are of course right that the proposed algorithm cannot handle
> data acquired in this way, because some frontend entity has already made the
> decision what an event is and what not, and has failed to acquire the
> "surrounding data" that are necessary for the algorithm.

I read somewhere, that some GPS units use a boundary and time check when 
recording track points in auto mode.

It went something like this:

If the new sampled track point n is outside the "road" defined by track 
points n-1 and n-2 plus a margin to each side of the line n-2 to n-1, or 
the unit has not recorded a track point for "this long", then record 
track point n.
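
A minimal sketch of that rule in Python, for anyone who wants to experiment
with it. Everything here is an assumption made for illustration: the names
(cross_track_distance, should_record, auto_record, margin, max_gap) are made
up, positions are treated as planar (x, y) coordinates, n-1 and n-2 are taken
to be the last two *recorded* points, and the "road" is treated as the
infinite line through them. It is not a documented algorithm of any GPS unit.

```python
import math

def cross_track_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b (planar x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide: fall back to the plain distance from a
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def should_record(sample, recorded, margin, max_gap):
    """Decide whether a new (t, x, y) sample becomes a track point."""
    if len(recorded) < 2:
        return True  # need two points before a "road" exists
    t, x, y = sample
    (_, x2, y2), (t1, x1, y1) = recorded[-2], recorded[-1]
    if t - t1 >= max_gap:
        return True  # nothing recorded for "this long"
    # outside the road defined by points n-2 and n-1 plus the margin?
    return cross_track_distance((x, y), (x2, y2), (x1, y1)) > margin

def auto_record(samples, margin, max_gap):
    """Run the rule over a sequence of (t, x, y) samples."""
    recorded = []
    for s in samples:
        if should_record(s, recorded, margin, max_gap):
            recorded.append(s)
    return recorded
```

On a straight road only the first two samples survive until either the
time-out expires or the position leaves the margin, which is why data
recorded this way contains so few "surrounding" points.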

/Mogens

JF Mezei wrote:

   ...

> So one must really understand the "event" as well as how the data was
> recorded for that event before starting to process such data and
> eliminate points judged to be "bad".

One must always understand data in order to analyze and interpret it 
meaningfully. Believing otherwise is like believing that someone can 
manage a business without understanding its nature.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Hello JF Mezei,

> Ok. fair enough. But that still leaves the requirement that the user
> know about the type of data that he has to process, the types of
> irregularities which must be retained, and those that can be removed
> because this will be needed to decide on the window size. And one also
> needs to know how the data was collected.

 Agreed!

> of stray points. With "auto" track recording, chances are very good that
> the GPS would record a point at the turnoff, one point at the stop for
> water, and again a point once the car gets back to main road and turns
> back into the normal direction.

I am not sure if I interpret the term "auto track recording" in the right
way. Perhaps it is even a "standard" term in navigation that I am not aware
of (I have looked at the question of outlier detection purely from a
mathematical point of view). But if it is some kind of "event driven" track
recording you are of course right that the proposed algorithm cannot handle
data acquired in this way, because some frontend entity has already made the
decision what an event is and what not, and has failed to acquire the
"surrounding data" that are necessary for the algorithm.

Regards
Ulrich

"JF Mezei" <jfmezei.spamnot@teksavvy.com> wrote in message
news:44A2289A.BF59B898@teksavvy.com...
> Ulrich Bangert wrote:
> > very significant. Note that the algorithm can fit BOTH kinds of view by
> > adapting the window length. If you make the window length greater than 2X
> > the "hill length" then the hill will be completely removed from the data. If
> > you find that the hill is significant then make the window length smaller
> > than 2X the "hill length"; in this case the hill will not be filtered out.
>
>
> Ok. fair enough. But that still leaves the requirement that the user
> know about the type of data that he has to process, the types of
> irregularities which must be retained, and those that can be removed
> because this will be needed to decide on the window size. And one also
> needs to know how the data was collected.
>
> Say on a long straight road, a car turns off and drives 100m to a water
> hole/pump.  With periodic trackpoint recording, you could have a couple
> of stray points. With "auto" track recording, chances are very good that
> the GPS would record a point at the turnoff, one point at the stop for
> water, and again a point once the car gets back to main road and turns
> back into the normal direction.
>
> Now, both would have a couple of stray points from a purely
> "mathematical" point of view. But in the second case, a human  could
> more clearly see a path away from road and back to the road at the same
> intersection to resume course.
>
> So one must really understand the "event" as well as how the data was
> recorded for that event before starting to process such data and
> eliminate points judged to be "bad".

Rune,

as a dedicated follower of PASCAL I program in Borland DELPHI, which produces
native code that I do not suspect to be significantly slower than C/C++
generated code. But over the years I have found that the Matlab help system
gives me information about mathematical topics at exactly the level that
seems to suit me; that's why I pointed to it. If Plotter does not read your
files, then (in case they are ASCII) send me a few lines of them. I am very
interested in making my file read routines as universal as possible, so every
no-go is an object of interest.

Regards
Ulrich

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1151475859.983140.83250@d56g2000cwd.googlegroups.com...
>
> Ulrich Bangert wrote:
> > To Rune:
> >
> > On a typical pc with a window width of 100 I process 600000 data points in
> > 1-2 minutes, so it is not AS slow as my first mail may have indicated. I
> > use this algorithm for example in a freeware program named "Plotter". You
> > can download "Plotter" from my homepage
> >
> > www.ulrich-bangert.de
> >
> > If you manage to load your data files with that (chances are..) you can
> > immediately test the quality and the speed of the outlier detection.
>
> I'll definitely have a look into this. Your first post indicated you
> have programmed these things in matlab? If so, there is a speed-up
> potential here. I usually get a speed-up on the order of 10-50x when
> I port from matlab to C or C++.
>
> Rune
>

Ulrich Bangert wrote:
> very significant. Note that the algorithm can fit BOTH kinds of view by
> adapting the window length. If you make the window length greater than 2X
> the "hill length" then the hill will be completely removed from the data. If
> you find that the hill is significant then make the window length smaller
> than 2X the "hill length"; in this case the hill will not be filtered out.

Ok. fair enough. But that still leaves the requirement that the user
know about the type of data that he has to process, the types of
irregularities which must be retained, and those that can be removed
because this will be needed to decide on the window size. And one also
needs to know how the data was collected.

Say on a long straight road, a car turns off and drives 100m to a water
hole/pump.  With periodic trackpoint recording, you could have a couple
of stray points. With "auto" track recording, chances are very good that
the GPS would record a point at the turnoff, one point at the stop for
water, and again a point once the car gets back to main road and turns
back into the normal direction.

Now, both would have a couple of stray points from a purely
"mathematical" point of view. But in the second case, a human  could
more clearly see a path away from road and back to the road at the same
intersection to resume course.

So one must really understand the "event" as well as how the data was
recorded for that event before starting to process such data and
eliminate points judged to be "bad".

Ulrich Bangert wrote:
> To Rune:
>
> On a typical pc with a window width of 100 I process 600000 data points in
> 1-2 minutes, so it is not AS slow as my first mail may have indicated. I
> use this algorithm for example in a freeware program named "Plotter". You
> can download "Plotter" from my homepage
>
> www.ulrich-bangert.de
>
> If you manage to load your data files with that (chances are..) you can
> immediately test the quality and the speed of the outlier detection.

I'll definitely have a look into this. Your first post indicated you
have programmed these things in matlab? If so, there is a speed-up
potential here. I usually get a speed-up on the order of 10-50x when
I port from matlab to C or C++.

Rune

To Rune:

On a typical pc with a window width of 100 I process 600000 data points in
1-2 minutes, so it is not AS slow as my first mail may have indicated. I
use this algorithm for example in a freeware program named "Plotter". You
can download "Plotter" from my homepage

www.ulrich-bangert.de

If you manage to load your data files with that (chances are..) you can
immediately test the quality and the speed of the outlier detection.

To JF Mezei:

If you managed to figure out exactly what the algorithm does, you will have
noticed that for detecting outliers only what is INSIDE the window is
significant, nothing else. For that reason, if this algorithm is
applied to the scenario you present, the first thing to say is that it does
not matter at all whether you have been riding for 6, 12, 18 or any number of
hours before you meet the hill. The algorithm is completely insensitive to
that!

The window means something like "If you want to detect outliers, look only at
values in the neighbourhood and decide what is normal and what is not for
them". Please note also that your scenario raises the question of a
definition of "outlier". Other people would perhaps think that the "hill
scenario" IS indeed an outlier that should be removed while you think it is
very significant. Note that the algorithm can fit BOTH kinds of view by
adapting the window length. If you make the window length greater than 2X
the "hill length" then the hill will be completely removed from the data. If
you find that the hill is significant then make the window length smaller
than 2X the "hill length"; in this case the hill will not be filtered out.
By applying the rule "an event shorter than n/2 may be an outlier" (n being
the window length), YOU decide what is an outlier, not the algorithm.
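
To make the n/2 rule concrete, here is a minimal sketch in Python. It assumes
a median-based test inside the window (a Hampel-style filter), chosen only
because a running median has exactly the "shorter than half the window"
property described above. It is an illustration under that assumption, not
necessarily the algorithm used in Plotter, and the names hampel_outliers,
window and n_sigmas are made up.

```python
import statistics

def hampel_outliers(values, window, n_sigmas=3.0):
    """Return the indices flagged as outliers by a sliding-window median test.

    Each value is compared only with its neighbours inside the window;
    nothing outside the window influences the decision.
    """
    half = window // 2
    flagged = []
    for i, v in enumerate(values):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbourhood = values[lo:hi]
        med = statistics.median(neighbourhood)
        # median absolute deviation, scaled to approximate a standard deviation
        mad = statistics.median([abs(x - med) for x in neighbourhood])
        scale = 1.4826 * mad
        # on perfectly flat data scale is 0, so any deviation is flagged
        if abs(v - med) > n_sigmas * scale:
            flagged.append(i)
    return flagged

# A flat track with a "hill" of 4 samples in the middle.
track = [0.0] * 20 + [5.0] * 4 + [0.0] * 20

wide = hampel_outliers(track, window=21)   # window > 2 x hill length
narrow = hampel_outliers(track, window=5)  # window < 2 x hill length
```

With the wide window the four hill samples (indices 20 to 23) are flagged;
with the narrow window nothing is flagged, because inside a 5-sample window
the hill itself is the majority and therefore counts as "normal".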

I cannot accept your second objection: it is an outlier detection algorithm,
not a biker's rest detection algorithm. But if you want to put forward the
question whether the rest will be detected as an outlier or not, the same
rules apply as above: If the window length is set to a value so that the
length of the braking action before the stop and the window length "match", then
the stop will be recognized as a "normal" change in data.

Regards
Ulrich

"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1151393854.224220.97860@p79g2000cwp.googlegroups.com...
>
> Ulrich Bangert wrote:  lots of interesting stuff.
>
> Thanks. Sounds like something to look into. Processing speed is
> (as of yet) insignificant if it can release man-hours for other duties.
> Where I am right now, man-hours are expensive. If a computer
> needs 12 hours for this sort of job, then so be it, if it can be done
> in the human operator's time off watch.
>
> Rune
>