DSPRelated.com
Forums

random ordinate...moving average?

Started by John O'Flaherty January 3, 2014
Hello all,
I have two data series of 20 years length. One is of random dates, and
my weight on those dates. The other is of other random dates in the
same range, and the number of quarts of yogurt I purchased on those
dates. I would like to get a series representing average yogurt
consumption, and average weight, graph them, and see the relationship
between the two.
Can anyone suggest how to deal with data that doesn't have a constant
sample frequency?

-- 
Jack
On Friday, January 3, 2014 9:54:29 PM UTC-5, John O'Flaherty wrote:
(snip)
Jack,

Interesting question! In terms of stats, you need a model. Lacking such a model, try something like the following. For each weight date:

weight(T) = G * [yogurt1*exp(a*(t1-T)) + yogurt2*exp(a*(t2-T)) + ...]

Here T is the date for a given weight, and yogurt1 is the amount of yogurt purchased on date t1; each other purchase of yogurt is treated similarly. "a" is the exponential decay factor, and "G" is an overall conversion factor. Set up a matrix of equations, one for each date you have a weight measurement on, using of course only yogurt amounts for dates prior to the date the weight is measured on. Solve the matrix and see how well this model fits.

You can also set up the model in other ways. Think about how fast you gain and lose weight and devise a model for that. With a finite set of data, set up a matrix and solve it to find the model's parameters.

IHTH (food for thought),
Clay
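[Editorial aside: Clay's model is linear in G once the decay factor "a" is fixed, so one common way to fit it is to solve for G by least squares at each trial value of "a" and keep the value giving the smallest residual. A minimal sketch in Python; all data here are made up for illustration, not from the thread.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, made-up data standing in for Jack's series (time in days).
t = np.sort(rng.uniform(0, 7300, 200))        # purchase dates over ~20 years
amounts = rng.uniform(0.5, 3.0, t.size)       # quarts bought on each date
T = np.sort(rng.uniform(100, 7300, 60))       # weigh-in dates

def model_feature(a, T, t, amounts):
    """sum_i amounts[i] * exp(a*(t[i] - T)) over purchases with t[i] <= T."""
    dt = t[None, :] - T[:, None]              # negative for past purchases
    decay = np.exp(a * np.minimum(dt, 0.0))   # clamp avoids overflow for future dates
    return ((dt <= 0.0) * amounts * decay).sum(axis=1)

# Make synthetic weights from the model with known G and a, plus noise.
true_G, true_a = 0.4, 0.05
w = true_G * model_feature(true_a, T, t, amounts) + rng.normal(0, 0.05, T.size)

# For each trial decay factor a, G enters linearly, so it has a closed-form
# least-squares solution; keep the a with the smallest residual sum of squares.
best = None
for a in np.linspace(0.01, 0.15, 29):
    x = model_feature(a, T, t, amounts)
    G = float(x @ w / (x @ x))                # one-parameter least squares
    rss = float(np.sum((w - G * x) ** 2))
    if best is None or rss < best[0]:
        best = (rss, a, G)

rss, a_hat, G_hat = best
print("estimated a =", a_hat, "estimated G =", G_hat)
```

The scan over "a" is the nonlinear part; everything else is the linear solve Clay describes, collapsed to one unknown.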
On Fri, 3 Jan 2014 19:31:26 -0800 (PST), clay@claysturner.com wrote:

(snip)
Thanks for the reply, Clay. I see your expression is already incorporating the two series. I was hoping to take the separate series and make a continuous expression for each, that is, to just resolve the problem of randomly spaced data. I could choose some arbitrary interval, a week or month say, and add up purchases in that range for quarts-per-week, but I wonder if there's a better way to smooth the individual series? Something like a resampling from random times to a fixed frequency. I see MATLAB has a moving average that takes a tsobj (time series object) argument, but one of its parameters is frequency.

-- 
Jack
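[Editorial aside: the arbitrary-interval idea Jack describes (adding up purchases per fixed interval to get a rate) is a one-liner with a weighted histogram. A minimal sketch in Python with made-up purchase data:]

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up purchase series: random dates (in days) and quarts per purchase.
days = np.sort(rng.uniform(0, 7300, 500))
quarts = rng.uniform(0.5, 2.0, days.size)

bin_days = 30.0                               # aggregate monthly, say
edges = np.arange(0.0, 7300.0 + bin_days, bin_days)
totals, _ = np.histogram(days, bins=edges, weights=quarts)
rate = totals / bin_days                      # quarts per day in each bin
centers = edges[:-1] + bin_days / 2           # bin midpoints for plotting

print("bins:", rate.size, "total quarts:", totals.sum())
```

Binning conserves the total amount purchased, which is one sanity check that no information about the totals (as opposed to the timing) has been lost.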
quiasmox@yahoo.com wrote:
(snip)
IIRC Scilab 4.? had a set of functions that would do a "best fit" of irregularly spaced data to a fixed interval. My machine with Scilab 4.? isn't available at the moment, so I can't check. I don't know whether the same set is available in version >= 5 (there were a bunch of changes).
On Sat, 04 Jan 2014 10:13:15 -0600, Richard Owlett wrote:

(snip)
I think they mostly replaced the various bells and whistles with better ones, but kept the underlying functionality. I suspect that if you have an old script it'll be compatible with anything but graphics and UI stuff.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
On Sat, 04 Jan 2014 09:42:14 -0600, quiasmox wrote:

(snip)
Smoothing the individual series and rationalizing the sampling will lose information. If you're lucky it will lose unimportant information, but you don't know that.

Maybe smooth and resample, make your hypothesis, then test it against the 'rough' data.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Tim Wescott <tim@seemywebsite.please> wrote:
> On Sat, 04 Jan 2014 09:42:14 -0600, quiasmox wrote: >> On Fri, 3 Jan 2014 19:31:26 -0800 (PST), clay@claysturner.com wrote:
(snip)
>>>> I have two data series of 20 years length. One is of random dates, and >>>> my weight on those dates. The other is of other random dates in the >>>> same range, and the number of quarts of yogurt I purchased on those >>>> dates. I would like to get a series representing average yogurt >>>> consumption, and average weight, graph them, and see the relationship >>>> between the two.
(snip)
>>>weight(T) = G* [yogurt1*exp(a*(t1-T)) + yogurt2*exp(a*(t2-T) + ....] >>>Here T is the date for a given weight, and yogurt1 is the amount of >>>yogurt purchased on date t1.
(snip)
> Smoothing the individual series and rationalizing the sampling > will lose information. If you're lucky it will lose unimportant > information, but you don't know that.
Yes. That is why I like Clay's reply. Note that there is an unknown in the time delay between purchasing yogurt and its effect.

Well, even more, there is a fundamental difference between the two quantities. Consider that if you made more measurements of the weight, in between the current sample points, you would expect the values to be close to the nearby values. Weight is, fundamentally, a continuous quantity that is being sampled.

But purchasing yogurt, at least in this problem, is fundamentally a discrete quantity. Or, considering it another way and with a consideration of the time intervals, it is a rate.

As I wrote above, the time delay of the yogurt is not known. After buying it, it might be some days before eating, and the effect on weight has an additional delay. So, following Clay's formula, you need to adjust the decay constant, which takes the above delay into account. Assuming that there is an effect, there should be a value of "a" that gives a better result than larger or smaller values.

Note that if the effect took place over minutes or hours, and the weights were measured days apart (and without the time of day), the information would already be lost.
> Maybe smooth and resample, make your hypothesis, then test it > against the 'rough' data.
There is the additional problem of noise. Unless you ate only yogurt over this time range, the other foods are noise in the data. You have to average that out. The ability to do that will depend on the significance, and time scale, of variations in yogurt purchases.

-- 
glen
On Sat, 04 Jan 2014 22:13:20 +0000, glen herrmannsfeldt wrote:

(snip)
No kidding! It's not just food either -- it's what you're eating, how you're exercising, whether you're sick or well, etc.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
You have gotten a lot of good advice already, but this seems like a
problem designed to be thought-provoking rather than solved.

I like Michael Unser's paper "Sampling--50 Years After Shannon" from
Proc. of the IEEE, Vol. 88, No. 4, April 2000 for an overview and
references. I would read that and track down the references on
non-uniform sampling and frames.

One other point that makes me think this is a thought experiment: your
two data series have another very important difference, that is, your
weight on random dates is a continuous function whose first and likely
second derivatives are very well-behaved. The amount of yogurt purchased
is not a continuous function and certain interpolation techniques
(such as splines that depend on continuous functions) do not apply.

As others have said, you have no information about what happened between
the sampling points. But you could have a physical model for weight that
implies the second derivative is bounded by some practical limits.
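[Editorial aside: because weight is a smooth function of time, resampling the irregular weigh-ins onto a uniform grid by simple interpolation is defensible in a way it is not for the purchase series. A minimal sketch in Python with hypothetical data; np.interp does piecewise-linear interpolation, not the splines mentioned above.]

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical weigh-ins at irregular dates; the underlying weight is smooth.
T = np.sort(rng.uniform(0, 7300, 80))
w = 70.0 + 0.0005 * T + np.sin(T / 500.0)     # slow drift plus a gentle wiggle (kg)

grid = np.arange(0.0, 7300.0, 7.0)            # uniform weekly grid
w_grid = np.interp(grid, T, w)                # piecewise-linear resampling

print(grid.size, "resampled points")
```

A useful property here: linear interpolation never overshoots, so the resampled series stays within the range of the measured weights, consistent with a bounded-derivative physical model.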

----------------------------------------
Tim Wescott <tim@seemywebsite.please> writes:
Subject: Re: random ordinate...moving average?
Newsgroups: comp.dsp
Date: Sat, 04 Jan 2014 11:38:35 -0600
>
On Sat, 04 Jan 2014 09:42:14 -0600, quiasmox wrote:
(snip)
Smoothing the individual series and rationalizing the sampling will lose information. If you're lucky it will lose unimportant information, but you don't know that.
>
Maybe smooth and resample, make your hypothesis, then test it against the 'rough' data.
On Saturday, January 4, 2014 10:42:14 AM UTC-5, John O'Flaherty wrote:
(snip)
Jack,

I forgot to mention that the obvious starting point would be to detrend the weight data. I.e., find the best-fit line through the weights (least squares, for one example), subtract the weight predicted by the line from the measured weights, and then work with the residuals. It may look like noise, but it gives you a starting point.

Clay
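[Editorial aside: Clay's detrending step (fit a least-squares line to the weights, then work with the residuals) can be sketched in a few lines of Python; the data are synthetic, for illustration only.]

```python
import numpy as np

rng = np.random.default_rng(3)
# Made-up weigh-in series: irregular dates (days), slow drift plus noise.
T = np.sort(rng.uniform(0, 7300, 100))
w = 68.0 + 0.001 * T + rng.normal(0, 0.5, T.size)

slope, intercept = np.polyfit(T, w, 1)        # least-squares best-fit line
residuals = w - (slope * T + intercept)       # work with these from here on

print("slope per day:", slope)
```

Note that detrending works fine on irregularly spaced dates; no resampling is needed for this step, since the line is fit directly to the (date, weight) pairs.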