# Analyzing an "undersampled" sequence

Started by July 13, 2010
```Perhaps I should post this elsewhere but we speak the same language
here.  I may have asked a similar question some time ago but now I have
a new perspective and want to investigate.

I have a wastewater process that's being sampled periodically (uniform
sampling for what it's worth).
The sample rate is way too low to avoid aliasing but the samples are
real enough and the data is continuously available and very likely not
amenable to being sampled more often (economics).

It's a bit like sampling a random series except that I "know" there is
an underlying pattern that repeats each day with variable amplitude no
doubt.  That, plus transients, would be the highest frequency content
and seasonal things are the lowest frequency content which I'm not too
worried about.  And, while I'd like to know when transients happen and
how big they are, I'm afraid that's out of the question.

In fact, what's of value here is to estimate how much plant capacity is
being "used up".  By my reckoning, 6 months of data during our peak
months is a good averaging period - as it's the peak months that
determine our capacity "use" for regulatory purposes.
In the shorter term, the numbers are used for determining charges for
overly high concentrations, shared use, etc.

To make things a bit more complicated, the regulatory agency has us
report the weekly data on a monthly basis (actually here there are 2
samples per week) and average it for the month.
If there are 3 contiguous months with these averages exceeding our
"capacity" or some large fraction of it, then we are put on notice that
planning for future capacity must begin.  So, this is one "measure"
that's in concrete.  But, I digress a bit .....

Here is my question:

Instead of worrying about aliasing which is where I go to first of
course, is there a statistical measure that might help me better
understand the "quality" of our numbers or how much variation is
"expected" given those numbers?
For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
said about the data set in a statistical sense?  How might one best put the
answer to use in a case like this?

Where should I be looking?

Fred
```
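The regulatory rule Fred describes (average the samples for each month, then flag 3 contiguous months above capacity or some large fraction of it) can be sketched in a few lines of Python. This is only an illustration: the function names, the `fraction` parameter, and the numbers are assumptions, not the agency's actual procedure.

```python
from itertools import groupby
from statistics import mean


def monthly_averages(samples_by_month):
    """Average the samples reported for each month, in the order given."""
    return [mean(samples) for samples in samples_by_month]


def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values above threshold."""
    longest = 0
    for is_above, run in groupby(values, key=lambda v: v > threshold):
        if is_above:
            longest = max(longest, len(list(run)))
    return longest


def on_notice(samples_by_month, capacity, fraction=1.0, months_required=3):
    """True if enough contiguous monthly averages exceed fraction * capacity."""
    averages = monthly_averages(samples_by_month)
    return longest_run_above(averages, fraction * capacity) >= months_required


# Made-up data: four months of weekly samples, scaled so that capacity is 1.0.
example = [
    [0.70, 0.80, 0.75, 0.85],
    [0.90, 0.95, 1.00, 0.95],
    [0.95, 1.05, 0.90, 1.00],
    [1.00, 0.95, 0.90, 0.95],
]
print(on_notice(example, capacity=1.0, fraction=0.9))
```

With only 4 to 8 samples per month, each monthly average is itself noisy, so a run of three can begin or end on sampling luck alone.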
```I didn't see where you reveal what is being sampled.  Is it how full a tank
is? How much fluid is flowing in a pipe?

Fred Marshall wrote:

> Perhaps I should post this elsewhere but we speak the same language
> here.  I may have asked a similar question some time ago but now I have
> a new perspective and want to investigate.
>
> I have a wastewater process that's being sampled periodically (uniform
> sampling for what it's worth).
> The sample rate is way too low to avoid aliasing but the samples are
> real enough and the data is continuously available and very likely not
> amenable to being sampled more often (economics).
>
> It's a bit like sampling a random series except that I "know" there is
> an underlying pattern that repeats each day with variable amplitude no
> doubt.  That, plus transients, would be the highest frequency content
> and seasonal things are the lowest frequency content which I'm not too
> worried about.  And, while I'd like to know when transients happen and
> how big they are, I'm afraid that's out of the question.
>
> In fact, what's of value here is to estimate how much plant capacity is
> being "used up".  By my reckoning, 6 months of data during our peak
> months is a good averaging period - as it's the peak months that
> determine our capacity "use" for regulatory purposes.
> In the shorter term, the numbers are used for determining charges for
> overly high concentrations, shared use, etc.
>
> To make things a bit more complicated, the regulatory agency has us
> report the weekly data on a monthly basis (actually here there are 2
> samples per week) and average it for the month.

Depending on what is being sampled that could be a complete accounting or
an incomplete accounting of usage. If each sample records how much was used
since the last sample was taken, then when you add them together you have
complete accounting of the usage for the month.  If all that the sample is
measuring is the instantaneous usage at the instant the sample is taken
then you have a very incomplete accounting of usage and could make it mean
just about anything you want it to.

-jim

>
> If there are 3 contiguous months with these averages exceeding our
> "capacity" or some large fraction of it, then we are put on notice that
> planning for future capacity must begin.  So, this is one "measure"
> that's in concrete.  But, I digress a bit .....
>
> Here is my question:
>
> Instead of worrying about aliasing which is where I go to first of
> course, is there a statistical measure that might help me better
> understand the "quality" of our numbers or how much variation is
> "expected" given those numbers?
> For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
> said about the data set in a statistical sense?  How might one best put the
> answer to use in a case like this?
>
> Where should I be looking?
>
> Fred

```
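jim's distinction between an integrated measurement and an instantaneous one can be illustrated with a toy simulation. The signal below is hypothetical (a constant plus a 24-hour sinusoid; the level, amplitude and sample hours are all assumptions), but it shows how weekly grab samples taken at the same time of day lock onto one phase of the daily pattern, while a totalized measurement recovers the true mean.

```python
import math


def diurnal_signal(t_hours, mean_level=100.0, amplitude=30.0):
    """Hypothetical process value: a constant plus a 24-hour sinusoid."""
    return mean_level + amplitude * math.sin(2.0 * math.pi * t_hours / 24.0)


def grab_sample_mean(first_sample_hour, interval_hours, n_samples):
    """Mean of instantaneous samples taken at a fixed interval."""
    times = (first_sample_hour + k * interval_hours for k in range(n_samples))
    return sum(diurnal_signal(t) for t in times) / n_samples


def integrated_mean(n_days, steps_per_hour=60):
    """Approximate time average over whole days, as a totalizing meter would give."""
    n_steps = n_days * 24 * steps_per_hour
    total = sum(diurnal_signal(i / steps_per_hour) for i in range(n_steps))
    return total / n_steps


WEEK_HOURS = 7 * 24
print(grab_sample_mean(6, WEEK_HOURS, 8))    # every sample lands on the daily peak
print(grab_sample_mean(18, WEEK_HOURS, 8))   # every sample lands on the daily trough
print(integrated_mean(56))                   # time average over the same 8 weeks
```

In this toy case the two grab-sample schedules disagree by twice the daily amplitude, and neither sees any scatter between samples, so the sample variance alone would not reveal the bias.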
```Fred Marshall  <fmarshallx@remove_the_xacm.org> wrote:

>Instead of worrying about aliasing which is where I go to first of
>course, is there a statistical measure that might help me better
>understand the "quality" of our numbers or how much variation is
>"expected" given those numbers?
>For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
>said about the data set in a statistical sense?  How might one best put the
>answer to use in a case like this?
>
>Where should I be looking?

Something like a Student's T test can tell you if a sample
or group of samples is out-of-line.

(I think I may have said the same thing, the last time you
asked a similar question.)

Steve
```
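One way to act on Steve's Student's t suggestion for 4 to 8 samples is a confidence interval on the mean. The sketch below uses only the standard library and a short table of two-sided 95% critical values. It assumes the samples are independent and roughly normally distributed, which the known daily pattern may well violate, so treat the interval as a rough guide. The function name and the example values are made up.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365}


def mean_confidence_interval(samples):
    """95% confidence interval for the mean of 4 to 8 samples."""
    n = len(samples)
    dof = n - 1
    if dof not in T_CRIT_95:
        raise ValueError("this sketch only covers 4 to 8 samples")
    center = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(n)
    half_width = T_CRIT_95[dof] * std_err
    return center - half_width, center + half_width


# Made-up weekly values for illustration only.
low, high = mean_confidence_interval([10.0, 12.0, 11.0, 13.0])
print(round(low, 2), round(high, 2))
```

With four samples the critical value is 3.182 rather than the familiar 1.96, which is a direct measure of how little a single month of weekly data pins down the mean.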
```On 7/13/2010 12:31 PM, Fred Marshall wrote:
> Perhaps I should post this elsewhere but we speak the same language
> here.  I may have asked a similar question some time ago but now I have
> a new perspective and want to investigate.
>
> I have a wastewater process that's being sampled periodically (uniform
> sampling for what it's worth).
> The sample rate is way too low to avoid aliasing but the samples are
> real enough and the data is continuously available and very likely not
> amenable to being sampled more often (economics).
>
> It's a bit like sampling a random series except that I "know" there is
> an underlying pattern that repeats each day with variable amplitude no
> doubt. That, plus transients, would be the highest frequency content and
> seasonal things are the lowest frequency content which I'm not too
> worried about. And, while I'd like to know when transients happen and
> how big they are, I'm afraid that's out of the question.
>
> In fact, what's of value here is to estimate how much plant capacity is
> being "used up". By my reckoning, 6 months of data during our peak
> months is a good averaging period - as it's the peak months that
> determine our capacity "use" for regulatory purposes.
> In the shorter term, the numbers are used for determining charges for
> overly high concentrations, shared use, etc.
>
> To make things a bit more complicated, the regulatory agency has us
> report the weekly data on a monthly basis (actually here there are 2
> samples per week) and average it for the month.
> If there are 3 contiguous months with these averages exceeding our
> "capacity" or some large fraction of it, then we are put on notice that
> planning for future capacity must begin. So, this is one "measure"
> that's in concrete. But, I digress a bit .....
>
> Here is my question:
>
> Instead of worrying about aliasing which is where I go to first of
> course, is there a statistical measure that might help me better
> understand the "quality" of our numbers or how much variation is
> "expected" given those numbers?
> For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
> said about the data set in a statistical sense? How might one best put the
> answer to use in a case like this?
>
> Where should I be looking?

Other things being equal, clustering should follow a Poisson
distribution. If you measure flow -- a quantity that can be heavily
influenced by rainfall -- only twice a week, how do you bill equitably?

Jerry
--
Engineering is the art of making what you want from things you can get.
```
```Jerry Avins wrote:

>
> Other things being equal, clustering should follow a Poisson
> distribution. If you measure flow -- a quantity that can be heavily
> influenced by rainfall -- only twice a week, how do you bill equitably?
>
> Jerry

Jerry,

I don't imagine that we bill entirely "equitably" - more like "agreeably".

We measure flow continuously to get the volume and concentration once or
twice a week.

The concentration is assumed to apply for the entire measured volume
between concentration samples.  So, one may say that we sample loading
in that fashion.

I think I answered my own question to the point where I can deal with it:

We have the weekly or twice-weekly samples and have computed monthly
averages - as the latter have some regulatory importance.
You might consider these monthly averages to be lowpassed versions of
the samples.
Then, one can compute the distribution of outcomes and infer(?) the

My "backwards" sort of reasoning goes like this:
We take a set of samples.
We determine the distribution of those sample values over a suitably
long time such that daily and even annual variations are included in the
distribution.
The caution here is that trends get wiped out - so a suitable time frame
or set of them needs to be selected that has some meaning where gross
trends are concerned.
If we assume that the distribution represents a reasonable estimate of
ground truth, then we can infer in quantitative terms what's happening -
It's surely not "perfect" but it's better than nothing ... I think.

Fred
```
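Fred's description of loading (each concentration sample applied to the continuously metered volume for its interval) and of looking at the distribution of the resulting values can be sketched as follows. The units (mg/L and cubic metres), the function names, and the numbers are assumptions made for illustration; the thread does not say which units the plant uses.

```python
def interval_loads_kg(concentrations_mg_per_l, volumes_m3):
    """Load per interval: one concentration applied to the volume metered for it.

    1 mg/L equals 1 g/m^3, so mg/L * m^3 gives grams; divide by 1000 for kg.
    """
    if len(concentrations_mg_per_l) != len(volumes_m3):
        raise ValueError("need one volume per concentration sample")
    return [c * v / 1000.0 for c, v in zip(concentrations_mg_per_l, volumes_m3)]


def fraction_exceeding(values, threshold):
    """Empirical fraction of observed values above threshold."""
    return sum(v > threshold for v in values) / len(values)


# Made-up numbers for illustration only.
concentrations = [250.0, 300.0, 280.0, 400.0]   # mg/L at each sample
volumes = [1000.0, 2000.0, 1500.0, 1000.0]      # m^3 metered per interval
loads = interval_loads_kg(concentrations, volumes)
print(loads)
print(fraction_exceeding(loads, 410.0))
```

The empirical fraction is only as good as the period it is drawn from, which is the caution Fred raises about trends being wiped out.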
```On 7/13/2010 8:28 PM, Fred Marshall wrote:
> Jerry Avins wrote:
>
>>
>> Other things being equal, clustering should follow a Poisson
>> distribution. If you measure flow -- a quantity that can be heavily
>> influenced by rainfall -- only twice a week, how do you bill equitably?
>>
>> Jerry
>
> Jerry,
>
> I don't imagine that we bill entirely "equitably" - more like "agreeably".
>
> We measure flow continuously to get the volume and concentration once or
> twice a week.
>
> The concentration is assumed to apply for the entire measured volume
> between concentration samples. So, one may say that we sample loading in
> that fashion.
>
> I think I answered my own question to the point where I can deal with it:
>
> We have the weekly or twice-weekly samples and have computed monthly
> averages - as the latter have some regulatory importance.
> You might consider these monthly averages to be lowpassed versions of
> the samples.
> Then, one can compute the distribution of outcomes and infer(?) the
>
> My "backwards" sort of reasoning goes like this:
> We take a set of samples.
> We determine the distribution of those sample values over a suitably
> long time such that daily and even annual variations are included in the
> distribution.
> The caution here is that trends get wiped out - so a suitable time frame
> or set of them needs to be selected that has some meaning where gross
> trends are concerned.
> If we assume that the distribution represents a reasonable estimate of
> ground truth, then we can infer in quantitative terms what's happening -
> It's surely not "perfect" but it's better than nothing ... I think.

If your samples are taken at times of unusually high I&I, the dilution
can make the measured concentrations uncharacteristically low.

Jerry
--
Engineering is the art of making what you want from things you can get.
```
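Jerry's dilution point follows from a simple mass balance. The sketch below assumes the infiltration and inflow water carries a negligible amount of the measured constituent; the function name and the numbers are hypothetical.

```python
def diluted_concentration(sewage_mg_per_l, sewage_flow, infiltration_flow):
    """Mixed concentration, assuming I&I water adds flow but no constituent."""
    return sewage_mg_per_l * sewage_flow / (sewage_flow + infiltration_flow)


# Made-up example: I&I equal to the sewage flow halves the measured concentration.
mixed = diluted_concentration(300.0, sewage_flow=1.0, infiltration_flow=1.0)
print(mixed)

# The load (concentration * total flow) is unchanged by the dilution itself.
print(mixed * (1.0 + 1.0), 300.0 * 1.0)
```

Under that assumption the dilution matters when a concentration sampled during high I&I is applied to volume that passed under drier conditions, or the other way round.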
```Jerry Avins wrote:

> If your samples are taken at times of unusually high I&I, the dilution
> can make the measured concentrations uncharacteristically low.
>
> Jerry

Yes, I know but the sample times are set for a number of reasons.
Actually, our concern right now is why the concentrations are so darned
high!  So, in these parts where there's nearly 100 inches of rain each
year, we're used to seeing and fixing I&I.  Right now it's not a big
concern.

Fred

```
```On Jul 13, 9:29 pm, Jerry Avins <j...@ieee.org> wrote:
> On 7/13/2010 8:28 PM, Fred Marshall wrote:
>
> > Jerry Avins wrote:
>
> >> Other things being equal, clustering should follow a Poisson
> >> distribution. If you measure flow -- a quantity that can be heavily
> >> influenced by rainfall -- only twice a week, how do you bill equitably?
>
> >> Jerry
>
> > Jerry,
>
> > I don't imagine that we bill entirely "equitably" - more like "agreeably".
>
> > We measure flow continuously to get the volume and concentration once or
> > twice a week.
>
> > The concentration is assumed to apply for the entire measured volume
> > between concentration samples. So, one may say that we sample loading in
> > that fashion.
>
> > I think I answered my own question to the point where I can deal with it:
>
> > We have the weekly or twice-weekly samples and have computed monthly
> > averages - as the latter have some regulatory importance.
> > You might consider these monthly averages to be lowpassed versions of
> > the samples.
> > Then, one can compute the distribution of outcomes and infer(?) the
>
> > My "backwards" sort of reasoning goes like this:
> > We take a set of samples.
> > We determine the distribution of those sample values over a suitably
> > long time such that daily and even annual variations are included in the
> > distribution.
> > The caution here is that trends get wiped out - so a suitable time frame
> > or set of them needs to be selected that has some meaning where gross
> > trends are concerned.
> > If we assume that the distribution represents a reasonable estimate of
> > ground truth, then we can infer in quantitative terms what's happening -
> > It's surely not "perfect" but it's better than nothing ... I think.
>
> If your samples are taken at times of unusually high I&I, the dilution
> can make the measured concentrations uncharacteristically low.

Duh, what's, I & I?

Greg
```
```On 7/14/2010 5:55 AM, Greg Heath wrote:

> ... what's, I&  I?

Infiltration and inflow, which force sewage plants to process rainwater.
Infiltration occurs when leaky mains are lower than the water table.
Inflow is often illegal pump connections to the sanitary sewer. When
streets become submerged, rainwater can pour in through manhole covers.

Jerry
--
Engineering is the art of making what you want from things you can get.
```