Reply by Richard Owlett April 15, 2005
Jerry Avins wrote:

> Richard Owlett wrote:
>
>> I've sortof been following several threads concerned with irregular
>> sampling of data and various noise effects.
>>
>> I think I might benefit from a *QUALITATIVE* discussion of how to
>> approach a problem which APPEARS to be outside realm of DSP.
>> [The problem is "real" but ... ]
>>
>> QUESTION:
>> What is fuel mileage of a class of vehicles.
>>
>> AVAILABLE DATA (and possible error sources)
>> [source is credit card purchase record]
>>   Date - 0 error
>>   Time - 0 error
>>   Gallons - 0 error
>>   Vehicle Odometer - entered by [ possibly careless ] human :{
>>      transposed digits - you don't have to be dyslectic to have problem
>>      careless entry - hopefully low frequency
>>
>> POSSIBLY AVAILABLE DATA:
>>   very few known good date/time/gallons/odometer data points
>>
>> I can see how to approach problem if dependent variable(gallons) is
>> error prone. But what to do if independent variable(odometer) is
>> unreliable?
>>
>> Secondary question.
>> Can you end up in this kind of mess in DSP?
>>
>> For any replies - thanks.
>
> Ignoring that different driving (and engine) conditions affect mileage,
> dividing total miles by total gallons gives the number you want. The
> intermediate data points have no effect at all. An erroneous record of
> miles at fill-up will make one leg longer and the other shorter. Two
> differences determine the final result unless you have cause to discard
> an end point from consideration.
>
> Jerry
Which goes to demonstrate once again why many regulars routinely caution newbies to state their question carefully or risk a correct but not useful answer.

I was trying to understand how to deal with two similar problems:
1. irregular sampling intervals.
2. sampling ordinate is noisy as well as sampled data being noisy.

I chose fuel mileage as a trivial physical problem that I understood. Fred's and Rune's replies in particular pointed out how to "think".
Reply by Jerry Avins April 13, 2005
Richard Owlett wrote:
> I've sortof been following several threads concerned with irregular
> sampling of data and various noise effects.
>
> I think I might benefit from a *QUALITATIVE* discussion of how to
> approach a problem which APPEARS to be outside realm of DSP.
> [The problem is "real" but ... ]
>
> QUESTION:
> What is fuel mileage of a class of vehicles.
>
> AVAILABLE DATA (and possible error sources)
> [source is credit card purchase record]
>   Date - 0 error
>   Time - 0 error
>   Gallons - 0 error
>   Vehicle Odometer - entered by [ possibly careless ] human :{
>      transposed digits - you don't have to be dyslectic to have problem
>      careless entry - hopefully low frequency
>
> POSSIBLY AVAILABLE DATA:
>   very few known good date/time/gallons/odometer data points
>
> I can see how to approach problem if dependent variable(gallons) is
> error prone. But what to do if independent variable(odometer) is
> unreliable?
>
> Secondary question.
> Can you end up in this kind of mess in DSP?
>
> For any replies - thanks.
Ignoring that different driving (and engine) conditions affect mileage, dividing total miles by total gallons gives the number you want. The intermediate data points have no effect at all. An erroneous record of miles at fill-up will make one leg longer and the other shorter. Two differences determine the final result unless you have cause to discard an end point from consideration.

Jerry
--
Engineering is the art of making what you want from things you can get.
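A minimal sketch of the cancellation described above, assuming every fill-up tops the tank off (that assumption is not stated in the thread). The function name `overall_mpg` and all readings are made up for illustration.

```python
def overall_mpg(odometer, gallons):
    """Total miles divided by total gallons.

    odometer[i] is the reading at fill-up i and gallons[i] the fuel bought
    at fill-up i. Assuming each fill-up refills the tank, the fuel bought at
    the first fill-up was burned before the first reading, so it is skipped.
    """
    total_miles = odometer[-1] - odometer[0]
    total_gallons = sum(gallons[1:])
    return total_miles / total_gallons


gallons = [10.0, 10.0, 10.0, 10.0, 10.0]
good = [10000, 10300, 10600, 10900, 11200]
bad = [10000, 10300, 16000, 10900, 11200]  # middle reading mistyped

# Leg lengths computed from the bad record: one leg far too long,
# the next one negative, but their sum is unchanged.
legs_bad = [b - a for a, b in zip(bad, bad[1:])]

print(overall_mpg(good, gallons))
print(overall_mpg(bad, gallons))
print(legs_bad, sum(legs_bad))
```

Only the first and last odometer readings enter the result, which is why an error in either end point is the one case that still matters.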
Reply by Richard Owlett April 13, 2005
Fred Marshall wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:115ogtrkdp7jj7f@corp.supernews.com...
>
>>I've sortof been following several threads concerned with irregular
>>sampling of data and various noise effects.
>>
>>I think I might benefit from a *QUALITATIVE* discussion of how to approach
>>a problem which APPEARS to be outside realm of DSP.
>>[The problem is "real" but ... ]
>>
>>QUESTION:
>>What is fuel mileage of a class of vehicles.
>>
>>AVAILABLE DATA (and possible error sources)
>>[source is credit card purchase record]
>>  Date - 0 error
>>  Time - 0 error
>>  Gallons - 0 error
>>  Vehicle Odometer - entered by [ possibly careless ] human :{
>>     transposed digits - you don't have to be dyslectic to have problem
>>     careless entry - hopefully low frequency
>>
>>POSSIBLY AVAILABLE DATA:
>>  very few known good date/time/gallons/odometer data points
>>
>>I can see how to approach problem if dependent variable(gallons) is error
>>prone. But what to do if independent variable(odometer) is unreliable?
>>
>>Secondary question.
>>Can you end up in this kind of mess in DSP?
>
> Sure. This suggests plotting the data in some form.
>
> Since you have time accurately, one notion would be to record calculated
> miles per gallon as a function of time. Then you can use a variety of
> methods to get rid of bad data points known as "outliers". Then, if you are
> willing to assume that gas mileage is a constant, you can fit a flat,
> straight line to the remaining data using a least squares fit.
>
> Fred
Thanks. Once I see the answer it becomes obvious ;)
Looks like my basic problem was making the problem more complicated than it was.

This kick starts me into thinking about other facts I know that allow me to appropriately treat bad data points.
1. it's unlikely that one day's MPG would vary more than x% from average.
2. for >90% of cases total mileage for a particular day WILL BE
      A + i*B + j*C + k*D
   where i and j can be 0|1|2
         k can be 0|1
         B|C << A
         D << B|C

In the simplest case a bad odometer/fuel record can be deleted and the fuel used can be added to the next day's record. If LMS is deemed necessary rather than a simple average, that point can be given 'double weight'.

So this is an exercise in THINKING not dsp :)
I'll quit babbling.
Thank you Fred, Peter, Rune
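A rough sketch of the "delete the bad record and carry its fuel forward" idea, assuming the bad record has already been flagged by some other rule (for example the x% test in item 1). The helper names and the data are hypothetical, and weighting each leg by its gallons is one possible reading of 'double weight', not something the thread specifies.

```python
def drop_bad_record(records, bad_index):
    """Remove a flagged (odometer, gallons) record and add its gallons
    to the following record, so no fuel is lost from the totals."""
    if not 0 < bad_index < len(records) - 1:
        raise ValueError("sketch only handles interior records")
    merged = list(records)
    _, gal_bad = merged[bad_index]
    odo_next, gal_next = merged[bad_index + 1]
    merged[bad_index + 1] = (odo_next, gal_next + gal_bad)
    del merged[bad_index]
    return merged


def weighted_mpg(records):
    """Average per-leg MPG, weighting each leg by the gallons it used."""
    legs = [
        ((odo - prev_odo) / gal, gal)
        for (prev_odo, _), (odo, gal) in zip(records, records[1:])
    ]
    return sum(mpg * gal for mpg, gal in legs) / sum(gal for _, gal in legs)


records = [
    (10000, 10.0),
    (10290, 10.0),
    (10600, 10.0),
    (10190, 10.0),  # mistyped reading (made-up example), flagged as bad
    (11200, 10.0),
    (11510, 10.0),
]

cleaned = drop_bad_record(records, bad_index=3)
print(cleaned)
print(weighted_mpg(cleaned))
```

With equal fill-ups, the merged leg carries 20 gallons against 10 for the others, so it counts twice in the average.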
Reply by Rune Allnor April 13, 2005
Richard Owlett wrote:
> I've sortof been following several threads concerned with irregular
> sampling of data and various noise effects.
>
> I think I might benefit from a *QUALITATIVE* discussion of how to
> approach a problem which APPEARS to be outside realm of DSP.
> [The problem is "real" but ... ]
>
> QUESTION:
> What is fuel mileage of a class of vehicles.
>
> AVAILABLE DATA (and possible error sources)
> [source is credit card purchase record]
>   Date - 0 error
>   Time - 0 error
>   Gallons - 0 error
>   Vehicle Odometer - entered by [ possibly careless ] human :{
>      transposed digits - you don't have to be dyslectic to have problem
>      careless entry - hopefully low frequency
>
> POSSIBLY AVAILABLE DATA:
>   very few known good date/time/gallons/odometer data points
>
> I can see how to approach problem if dependent variable(gallons) is
> error prone. But what to do if independent variable(odometer) is
> unreliable?
>
> Secondary question.
> Can you end up in this kind of mess in DSP?
>
> For any replies - thanks.
This looks to me like a Total Least Squares (TLS) problem. The difference between TLS and the "usual" Least Mean Squares (LMS) problem is that LMS inherently assumes there is uncertainty/error only in the dependent variable, and not in the parameter. The TLS method, on the other hand, allows for the parameter to be "noisy" as well.

The computations in TLS can be a bit tricky, though, and are based on concepts from linear algebra. Check out chapter 12.3 of Golub & van Loan: "Matrix Computations", 3rd ed.

Rune
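For a single straight line, TLS reduces to a closed form (orthogonal regression), so a small illustration does not need the general linear-algebra machinery. The sketch below assumes equal noise variance on both axes, which is not established for the odometer data in this thread; `tls_line` and the sample points are hypothetical.

```python
import math


def tls_line(x, y):
    """Fit y = slope*x + intercept by minimizing perpendicular distances
    (total least squares for one regressor, equal noise on both axes)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxy == 0:
        # Best line is horizontal, vertical, or undefined; not handled here.
        raise ValueError("sxy == 0 is not handled in this sketch")
    slope = (syy - sxx + math.hypot(syy - sxx, 2 * sxy)) / (2 * sxy)
    intercept = y_bar - slope * x_bar
    return slope, intercept


x_pts = [0.0, 1.0, 2.0, 3.0]
y_pts = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

slope, intercept = tls_line(x_pts, y_pts)
inv_slope, _ = tls_line(y_pts, x_pts)  # swap the roles of x and y

print(slope, intercept, inv_slope)
```

Swapping the two variables gives the reciprocal slope, which reflects the fact that TLS treats both axes the same way; an ordinary least squares fit on noisy data generally does not have that property. Note that gross errors such as transposed digits are outliers rather than small symmetric noise, so some outlier screening would likely still be needed first.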
Reply by Peter K. April 12, 2005
Richard Owlett wrote:

> I can see how to approach problem if dependent variable(gallons) is
> error prone. But what to do if independent variable(odometer) is
> unreliable?
If one is known to be reliable, just make it the independent variable.
> Secondary question.
> Can you end up in this kind of mess in DSP?
Sure! The image processing problem of trying to find straight lines in a noisy set of points has the same sort of problem, except that there is noise on both the "x" and "y" axes (independent and dependent variables).

Ciao,

Peter K.
Reply by Fred Marshall April 12, 2005
"Richard Owlett" <rowlett@atlascomm.net> wrote in message 
news:115ogtrkdp7jj7f@corp.supernews.com...
> I've sortof been following several threads concerned with irregular
> sampling of data and various noise effects.
>
> I think I might benefit from a *QUALITATIVE* discussion of how to approach
> a problem which APPEARS to be outside realm of DSP.
> [The problem is "real" but ... ]
>
> QUESTION:
> What is fuel mileage of a class of vehicles.
>
> AVAILABLE DATA (and possible error sources)
> [source is credit card purchase record]
>   Date - 0 error
>   Time - 0 error
>   Gallons - 0 error
>   Vehicle Odometer - entered by [ possibly careless ] human :{
>      transposed digits - you don't have to be dyslectic to have problem
>      careless entry - hopefully low frequency
>
> POSSIBLY AVAILABLE DATA:
>   very few known good date/time/gallons/odometer data points
>
> I can see how to approach problem if dependent variable(gallons) is error
> prone. But what to do if independent variable(odometer) is unreliable?
>
> Secondary question.
> Can you end up in this kind of mess in DSP?
Sure. This suggests plotting the data in some form.

Since you have time accurately, one notion would be to record calculated miles per gallon as a function of time. Then you can use a variety of methods to get rid of bad data points known as "outliers". Then, if you are willing to assume that gas mileage is a constant, you can fit a flat, straight line to the remaining data using a least squares fit.

Fred
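A minimal sketch of that recipe, with several assumptions: the tank is filled at every stop, outliers are rejected with a median/MAD rule (one of many possible methods, chosen here only for illustration), and the threshold `k=3.0` is arbitrary. Function names and readings are made up. The least squares fit of a flat line to the surviving points is simply their mean.

```python
from statistics import mean, median


def leg_mpg(odometer, gallons):
    """MPG for each leg between consecutive fill-ups."""
    return [
        (odometer[i] - odometer[i - 1]) / gallons[i]
        for i in range(1, len(odometer))
    ]


def reject_outliers(values, k=3.0):
    """Keep values within k median-absolute-deviations of the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; keep only those.
        return [v for v in values if v == med]
    return [v for v in values if abs(v - med) <= k * mad]


gallons = [10.0] * 7
odometer = [10000, 10290, 10600, 10190, 11200, 11510, 11800]
#                                 ^ mistyped reading (made-up example)

mpg = leg_mpg(odometer, gallons)   # one bad reading spoils two legs
kept = reject_outliers(mpg)
flat_fit = mean(kept)              # least squares fit of a constant

print(mpg)
print(kept, flat_fit)
```

A single bad odometer entry corrupts the two legs on either side of it, so both are discarded here; the remaining legs still give a usable estimate.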
Reply by Richard Owlett April 12, 2005
I've sortof been following several threads concerned with irregular 
sampling of data and various noise effects.

I think I might benefit from a *QUALITATIVE* discussion of how to 
approach a problem which APPEARS to be outside realm of DSP.
[The problem is "real" but ... ]

QUESTION:
What is fuel mileage of a class of vehicles.

AVAILABLE DATA (and possible error sources)
[source is credit card purchase record]
  Date - 0 error
  Time - 0 error
  Gallons - 0 error
  Vehicle Odometer - entered by [ possibly careless ] human :{
     transposed digits - you don't have to be dyslectic to have problem
     careless entry - hopefully low frequency

POSSIBLY AVAILABLE DATA:
  very few known good date/time/gallons/odometer data points

I can see how to approach problem if dependent variable(gallons) is 
error prone. But what to do if independent variable(odometer) is unreliable?

Secondary question.
Can you end up in this kind of mess in DSP?

For any replies - thanks.