An Empirical Nonlinear Filter
Started by ●October 12, 2007

I'm working with some daily plant loading data that is going to be used for a couple of purposes:

1) for determining if a particular level has been exceeded and by how much.

2) charging for excess use.

There are plenty of fluctuations and a daily excess isn't deemed "important" or "valid" in the grand scheme of things. But, a 30-day average is more representative of what is deemed a useful measure. A disadvantage of the averaged data is the delay in detection.

Here's how I'm processing it:
First, the data is 30-day averaged.
Next, it's thresholded to determine excesses.

With this done, the excesses in the 30-day average were summed and compared with the sum of the daily excesses. As one might expect with spiky data, the sum of the excesses in the averaged data was smaller - about sqrt(2) smaller.

I determined empirically that a 30-day look-back "peak hold" on the 30-day average excess data yielded a sum that was pretty close - based on a rather small set of simulations. The idea behind this was that the 30-day average is always less than the 30-day peak, but the peak of the 30-day average represents what happened in the preceding 30 days. By doing a look-back peak hold, it's as if each day's data was the same for 30 days and would result in the same average. Another way to look at it: if the 30-day average exceeds the threshold for only 1 day, then that represents 30 days of excess.

By 30-day look-back peak hold I mean:

    y(i) = max[ x(i), x(i+1), ..., x(i+30) ]

I figure that someone must have done something like this before but I don't know what it's called or where it might be treated.

Any pointers to such processing? or, comments even?

Thanks,

Fred
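[A minimal sketch of the pipeline Fred describes, in Python/NumPy. The warm-up convention at the start of the record, the threshold value, and the input file name below are placeholder assumptions, not anything Fred specified:]

import numpy as np

def moving_average(x, w=30):
    """Trailing w-day average. The first w-1 outputs use however many
    samples exist so far (an assumed warm-up convention)."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))
    i = np.arange(1, len(x) + 1)
    start = np.maximum(i - w, 0)
    return (c[i] - c[start]) / (i - start)

def excess(x, threshold):
    """Per-sample amount over the threshold; zero where below it."""
    return np.maximum(np.asarray(x, dtype=float) - threshold, 0.0)

def lookback_peak_hold(x, w=30):
    """Fred's y(i) = max[x(i), x(i+1), ..., x(i+30)], rewritten with
    index i increasing forward in time: each output is the max over the
    current sample and the w preceding ones."""
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, i - w):i + 1].max() for i in range(len(x))])

# Usage, with a made-up threshold and a hypothetical input file:
# daily  = np.loadtxt("daily_load.txt")
# T      = 110.0
# avg30  = moving_average(daily)
# s_day  = excess(daily, T).sum()                      # daily excess sum
# s_avg  = excess(avg30, T).sum()                      # averaged excess sum
# s_held = lookback_peak_hold(excess(avg30, T)).sum()  # Fred's peak-held sum

[Incidentally, a look-back peak hold of this kind is generally known as a running or sliding-window maximum - in morphological terms, a flat 1-D dilation - which may help when searching the literature.]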
Reply by ●October 13, 2007
On 13 Oct, 03:30, "Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote:

> I'm working with some daily plant loading data that is going to be used for
> a couple of purposes:
>
> 1) for determining if a particular level has been exceeded and by how much.
>
> 2) charging for excess use.
>
> There are plenty of fluctuations and a daily excess isn't deemed "important"
> or "valid" in the grand scheme of things.

How often is the load measured? Daily? More often?

> But, a 30-day average is more
> representative of what is deemed a useful measure. A disadvantage of the
> averaged data is the delay in detection.
>
> Here's how I'm processing it:
> First, the data is 30-day averaged.
> Next, it's thresholded to determine excesses.
>
> With this done, the excesses in the 30-day average were summed and compared
> with the sum of the daily excesses.

I'm a bit confused. You measure excess on *both* a daily basis *and* on a 30-day basis?

> As one might expect with spiky data,
> the sum of the excesses in the averaged data was smaller - about sqrt(2)
> smaller.

Seems reasonable - one day's excess may be cancelled by another day's lower-than-usual load.

> I determined empirically that a 30-day look-back "peak hold" on the 30-day
> average excess data yielded a sum that was pretty close - based on a rather
> small set of simulations.

I don't follow you. You are trying to use the 30-day data to estimate what?

> The idea behind this was that the 30-day average is always less than the
> 30-day peak but the peak of the 30-day average represents what happened in
> the preceding 30 days.

... uh ... I don't understand...? How can "the peak of the 30-day average" be different from "the 30-day average"? Or do you mean something like "the peak of the 30 daily loads in the 30-day period"?

> By doing a look-back peak hold it's as if each day's data was the same for
> 30-days and would result in the same average.
> Another way to look at it is if the 30-day average exceeds the threshold for
> only 1 day then that represents 30 days of excess.

I don't follow you in the details here, probably because of the details commented above.

> By 30-day look-back peak hold I mean:
> y(i)=max[x(i),x(i+1),...,x(i+30)]
>
> I figure that someone must have done something like this before but I don't
> know what it's called or where it might be treated.
>
> Any pointers to such processing? or, comments even?

Well, this is certainly something which is outside the realms of DSP. I would assume one might find something of interest in econometrics -- I have some recollection of your earlier posts where you described an application where users of a facility were to be charged for their use of the facility?

I can come up with no other suggestion than to contact the statistics department in your local (financial) university. If presented with a description of your application they ought to be able to come up with some pointers to more material.

Of course, I would be very interested in hearing what keywords and maybe even sources you manage to dig up on where to proceed.

Rune
Reply by ●October 13, 2007
Fred Marshall wrote:
> I'm working with some daily plant loading data that is going to be used for
> a couple of purposes:
>
> 1) for determining if a particular level has been exceeded and by how much.
>
> 2) charging for excess use.
>
> [...]
>
> I determined empirically that a 30-day look-back "peak hold" on the 30-day
> average excess data yielded a sum that was pretty close - based on a rather
> small set of simulations.

You seem to be implying that something gets used up and replenished by the system. You are looking at how it gets used up and calling that a load. One can only assume that the rate it gets replenished is a constant and represents a fixed cost. It sounds like there is something that works like storage capacity, so daily fluctuations are not very important, as they are already being accounted for adequately.

What it sounds like you really need to know about is how excess loading occurs at certain frequencies. That is to say, over very long periods (a year, maybe) the daily cost of replenishing whatever is getting used up is accounted for adequately, and over very short periods (daily) the cost is adequately accounted for. But if whatever is getting used up gets used up excessively, by sustained loading over certain intermediate periods (you have been looking at a month), that results in extra cost that is not adequately accounted for.

Or, put another way, it sounds like you are saying that certain frequencies of fluctuating loads cost more than other frequencies of load fluctuation. Obviously, finding those frequencies in the data would be easy if only you knew what you were looking for. Not knowing what any particular frequency of load fluctuation actually costs seems to be your real problem.

-jim
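[If one wanted to go hunting for the costly fluctuation frequencies jim describes, a first look could be a plain periodogram of the daily record. A sketch only, assuming uniformly sampled daily data; nothing in the thread prescribes this:]

import numpy as np

def periodogram(x):
    """Crude periodogram of a daily series: power versus frequency in
    cycles/day. The mean is removed so the DC term doesn't dominate."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0)  # d = 1 day between samples
    return f, np.abs(X) ** 2 / len(x)

# A band of excess power near f = 1/30 cycles/day would point at the
# month-scale sustained loading jim is talking about.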
Reply by ●October 13, 2007
Fred Marshall wrote:
> I'm working with some daily plant loading data that is going to be used for
> a couple of purposes:
> [...]
> With this done, the excesses in the 30-day average were summed and compared
> with the sum of the daily excesses. As one might expect with spiky data,
> the sum of the excesses in the averaged data was smaller - about sqrt(2)
> smaller.

If the difference is that small, the daily fluctuations probably aren't very spiky.

> I determined empirically that a 30-day look-back "peak hold" on the 30-day
> average excess data yielded a sum that was pretty close - based on a rather
> small set of simulations.

Pretty close to what?

> The idea behind this was that the 30-day average is always less than the
> 30-day peak but the peak of the 30-day average represents what happened in
> the preceding 30 days.

How does the peak of the average represent "what happened in the preceding 30 days"? What does "happened" mean here?

> By doing a look-back peak hold it's as if each day's data was the same for
> 30-days and would result in the same average.
> Another way to look at it is if the 30-day average exceeds the threshold for
> only 1 day then that represents 30 days of excess.

Why? Suppose the peak on one day were ten times the threshold while the other 29 days were near but below it. That's not far-fetched. A midnight dumper could have unloaded a tanker of septage into a manhole on that day.

> By 30-day look-back peak hold I mean:
> y(i)=max[x(i),x(i+1),...,x(i+30)]
>
> I figure that someone must have done something like this before but I don't
> know what it's called or where it might be treated.
>
> Any pointers to such processing? or, comments even?

Pointers, no. Comment, yes. Trying to deduce what could have been measured is always frustrating and often unfair to someone.

Jerry
--
Engineering is the art of making what you want from things you can get.
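[Jerry's midnight-dumper scenario is easy to put numbers on; all figures below are invented for illustration:]

import numpy as np

T = 1.0                        # threshold, normalized; an invented number
days = np.full(30, 0.95 * T)   # 29 days "near but below" the threshold
days[0] = 10.0 * T             # plus one day at ten times the threshold

print(days.mean())             # ~1.25*T: the single wild day alone drags
                               # the whole 30-day average 25% over the line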
Reply by ●October 13, 2007
Fri, 12 Oct 2007 18:30:24 -0700, Fred Marshall wrote:
> I'm working with some daily plant loading data that is going to be used for
> a couple of purposes:
>
> 1) for determining if a particular level has been exceeded and by how much.
>
> 2) charging for excess use.
> [...]
> Any pointers to such processing? or, comments even?

There have been enough excellent comments that I can only add this: you state up front that you are going to be using this for billing. I'm accounted to be a fairly bright guy, yet it took me two readings of the above description to get what you were doing.

For billing, sometimes "understandable" trumps "fair" or even "technically correct".

--
Tim Wescott
Control systems and communications consulting
http://www.wescottdesign.com

Need to learn how to apply control theory in your embedded system?
"Applied Control Theory for Embedded Systems" by Tim Wescott
Elsevier/Newnes, http://www.wescottdesign.com/actfes/actfes.html
Reply by ●October 14, 2007
Thanks, all, for the comments. I've often said that writing about something you're too close to is difficult! I proved it once more.

I'm going to respond to Jerry's questions after a couple of clarifications for others:

In this case, "capacity" means "throughput capacity" - not storage. There's no replenishment involved. Like bandwidth "capacity": it just exists and is either used or not. The pricing here is about using throughput that is owned by someone else. Kind of like charging your neighbor who is using your internet bandwidth via your wireless LAN.

>> I'm working with some daily plant loading data that is going to be used
>> for a couple of purposes:
>> [...]
>> With this done, the excesses in the 30-day average were summed and
>> compared with the sum of the daily excesses. As one might expect with
>> spiky data, the sum of the excesses in the averaged data was smaller -
>> about sqrt(2) smaller.
>
> If the difference is that small, the daily fluctuations probably aren't
> very spiky.

*** sqrt(2) [actually I should have said 1/sqrt(2)] means a 30% discount. That seems big enough to me. The daily values *are* spiky. Thus, the 30-day average is much lower than the spikes.

>> I determined empirically that a 30-day look-back "peak hold" on the 30-day
>> average excess data yielded a sum that was pretty close - based on a
>> rather small set of simulations.
>
> Pretty close to what?

*** The sum of excess values using the 30-day look-back hold was closer to the sum of the daily excess values.

>> The idea behind this was that the 30-day average is always less than the
>> 30-day peak but the peak of the 30-day average represents what happened in
>> the preceding 30 days.
>
> How does the peak of the average represent "what happened in the preceding
> 30 days"? What does "happened" mean here?

*** What I was trying to say is that the transient response to a step through a 30-day averager is 30 days long in total. Reaching any value in the 30-day average (and here I chose the peaks) means that there has been high use during the preceding 30 days, more or less. To your earlier point, the data is spiky but not *that* spiky, so it's very unlikely that a single, very high data point will kick the average.

>> By doing a look-back peak hold it's as if each day's data was the same for
>> 30-days and would result in the same average.
>> Another way to look at it is if the 30-day average exceeds the threshold
>> for only 1 day then that represents 30 days of excess.
>
> Why? Suppose the peak on one day were ten times the threshold while the
> other 29 days were near but below it. That's not far-fetched. A midnight
> dumper could have unloaded a tanker of septage into a manhole on that day.

*** That's correct if there were such peaks - but there aren't. The peaks are running about 30% over the threshold when they occur.

>> Any pointers to such processing? or, comments even?
>
> Pointers, no. Comment, yes. Trying to deduce what could have been measured
> is always frustrating and often unfair to someone.

*** Yep. Tim makes a similar point. Making any of this fair and understandable to the point of being acceptable is a likely challenge. Need to keep it simple.

Fred
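[Fred's empirical comparison is straightforward to re-run on synthetic data. Everything about the data and threshold below is invented, so the exact ratios will differ from the ~1/sqrt(2) he saw on his own records:]

import numpy as np

rng = np.random.default_rng(1)

# Invented "spiky but not *that* spiky" loading: a slow seasonal swell
# plus moderate positive spikes.
n = 720
t = np.arange(n)
daily = (1.0 + 0.25 * np.sin(2 * np.pi * t / 180.0)
             + 0.15 * rng.gamma(2.0, 1.0, size=n))
T = 1.3                                      # placeholder threshold

def trailing(stat, x, w=30):
    """Apply stat over a trailing w-sample window (shorter at the start)."""
    return np.array([stat(x[max(0, i - w + 1):i + 1]) for i in range(len(x))])

def exc(z):
    return np.maximum(z - T, 0.0)

avg30  = trailing(np.mean, daily)
s_day  = exc(daily).sum()                    # sum of daily excesses
s_avg  = exc(avg30).sum()                    # sum of excesses of the average
s_held = trailing(np.max, exc(avg30)).sum()  # peak hold on averaged excess

print(s_day, s_avg, s_held)  # expect s_avg < s_held; how close s_held comes
                             # to s_day depends entirely on the data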
Reply by ●October 14, 2007
Fred Marshall wrote:
...
> ***Yep. Tim makes a similar point. Making any of this fair and
> understandable to the point of being acceptable is a likely challenge.
> Need to keep it simple.

Been there! Maybe in the middle of haggling about who /probably/ owes what to whom, they'll rethink the cost/benefit tradeoff of more meters.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●October 15, 2007
"Fred Marshall" schrieb> I'm working with some daily plant loading data that is going > to be used for a couple of purposes: > > 1) for determining if a particular level has been exceeded > and by how much. > > 2) charging for excess use. > > There are plenty of fluctuations and a daily excess isn't deemed > "important" or "valid" in the grand scheme of things. > But, a 30-day average is more representative of what is deemed > a useful measure. A disadvantage of the averaged data is the > delay in detection. >Can you explain what behavior you want from your users? Low average? Low standard deviation? Apart from being >30 days late, what is the problem with a 30-day average? Do you have to take special measures to accomodate a higher average / higher peaks? Regards Martin
Reply by ●October 15, 2007
Martin Blume wrote:
> "Fred Marshall" wrote:
>> I'm working with some daily plant loading data that is going
>> to be used for a couple of purposes:
>> [...]
>
> Can you explain what behavior you want from your users?
> Low average? Low standard deviation?
>
> Apart from being >30 days late, what is the problem with a 30-day
> average? Do you have to take special measures to accommodate a higher
> average / higher peaks?

Fred is concerned with a sewage plant that serves more than one user. Such plants have a lot of money tied up in capital outlay and debt service. Apportioning that cost among the users according to their flows can, when the plant has been long in use, involve more money than apportioning the operating expenses. Fred's plant apparently allots each user a certain flow capacity for which it is committed to pay. If one user's flow is high and another's is low, the operation can be satisfactory, but one user is benefiting from the other's capacity, and presumably reimbursement is expected.

The issue is complicated by what I deem a deficient way of measuring flow. It is sampled periodically (with unequal periods?) and one user's flow is estimated by subtracting other measured flows from a (less accurate) total.

My plant operates on a different paradigm. When it began operation in 1978, the main plant had a rated capacity of 10 MGD (million gallons per day) of properly treated sewage. (It had a throughput capacity of 25 MGD before raw sewage would have to be bypassed to avoid washing out the biological process.) Subsequent operating refinements and plant improvements have raised the rated capacity over 13 MGD. This capacity is available to all users. Flows from each user (the main plant has 5) are measured continuously. Early on, the charts of flow were integrated by hand. Now we have totalizers. The Authority is committed to provide whatever capacity is needed, and to sell bonds to raise the capital to do that. Users "own" a portion of the plant in proportion to their flow, and are obligated to bear their proportionate share of total debt service to date.

Weather has a drastic influence on sewage flow. I&I (inflow and infiltration) can constitute as much as half the flow from old leaky systems during periods of heavy rain. A wet year can shift a large part of the "ownership" from a user with a leaky system (and illegal basement drains) to a user with a tight one. To alleviate the budgeting uncertainty, actual annual transfer payments are based on a 7-year running average.

I know that Fred is trying to do the best that can be done with what seems to me to be shoddy metering and a sharing plan that is poorly thought through. I wish him luck in making the proverbial silk purse from a sow's ear.

Jerry the Shit Commissioner
--
Engineering is the art of making what you want from things you can get.
Reply by ●October 15, 2007
"Martin Blume" <mblume@socha.net> wrote in message news:47139a7a$0$6565$5402220f@news.sunrise.ch...> "Fred Marshall" schrieb >> I'm working with some daily plant loading data that is going >> to be used for a couple of purposes: >> >> 1) for determining if a particular level has been exceeded >> and by how much. >> >> 2) charging for excess use. >> >> There are plenty of fluctuations and a daily excess isn't deemed >> "important" or "valid" in the grand scheme of things. >> But, a 30-day average is more representative of what is deemed >> a useful measure. A disadvantage of the averaged data is the >> delay in detection. >> > Can you explain what behavior you want from your users? > Low average? Low standard deviation? > > Apart from being >30 days late, what is the problem with a 30-day > average? Do you have to take special measures to accomodate a higher > average / higher peaks? >Martin, There is an owner/user (us) and there is a partner/user. So, since the rules might apply to us as well as our partner we want them to be reasonable. This isn't about controlling behavior although I can surely understand the reason for your question. There is no problem with the 30-day average that I know of - other than the delay. There are no special measures that can be taken except in the long term (investing in additional throughput capacity). The special measures in the shorter term focus on compensation ($) for over-use. Here are the typical questions: 1) Did you encroach on my throughput capacity? A YES/NO question. Short-term spikes *might* be ignored in answering this question by averaging first. An alternate is to use short-term data and be done with it. 2) If the answer to (1) is YES then how much do you owe me? This is murky because the existing agreement only says there will be compensation - no forumula, no methods, etc. So, there need to be measures and methods and, an option to using averages here would be to use short-term data. So, I'm working on measures and methods and, here, focusing on aspects of processing discrete time sequences. Fred