Efficient Moving Average and Moving Variance Calculations

Started by John E. Hadstate May 18, 2008
Steven Smith in "Digital Signal Processing" describes an
efficient algorithm for computing a moving average.  This
algorithm is also mentioned in the Wikipedia article describing
Moving Average:

http://en.wikipedia.org/wiki/Moving_average

Rick Lyons once asked in this newsgroup about an efficient
algorithm for computing "moving variance":

http://groups.google.com/group/comp.dsp/browse_frm/thread/330ac90a92f8dfaf/02a3b89dcf21fdcc?hl=en&lnk=st&q=variance+group%3Acomp.dsp+author%3AHadstate#02a3b89dcf21fdcc

With minimal effort, one can modify the "Moving Average"
algorithm to efficiently compute a "Moving Variance" and a
"Moving Average" simultaneously at each time step.

Notice that one of the equations for computing a sample
variance over a window with N samples can be written as:


    N * Sum(X**2) - (Sum(X)**2)
V = ---------------------------
            N * (N - 1)


where X is the input.

To implement this efficiently, allocate two history buffers,
one for values of X and one for values of X**2, each containing
room for N points.  These buffers need to be initialized,
perhaps to the first sample of X and X**2 or perhaps to zero,
your choice.  Then initialize two variables, SX1 to be the
Sum(elements in X history buffer) and SX2 to be Sum(elements in
X**2 history buffer).

Then, at each time-step k, compute:

X1 = (new sample value)
X2 = X1 * X1

Y1 = (oldest X1 value from X1 history buffer)
Y2 = (oldest X2 value from X2 history buffer)

Overwrite oldest value in X1 history buffer with X1.
Overwrite oldest value in X2 history buffer with X2.

SX1(k) = SX1(k-1) + X1 - Y1
SX2(k) = SX2(k-1) + X2 - Y2

Calculate Moving Average:
M = SX1 / N

Calculate Moving Variance:
V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
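
For concreteness, here is a minimal C sketch of the recipe above. The names (mv_state, mv_init, mv_update) are mine, not from the post, and the buffers are initialized to zero, so the first N outputs reflect that start-up choice:

    #include <stdio.h>

    #define N 8  /* window length; any N >= 2 works */

    /* State for the simultaneous moving average / moving variance. */
    typedef struct {
        double x1[N];   /* history of X           */
        double x2[N];   /* history of X**2        */
        double sx1;     /* running Sum(X)         */
        double sx2;     /* running Sum(X**2)      */
        int    head;    /* index of oldest sample */
    } mv_state;

    /* Initialize both history buffers (here: to zero) and the sums. */
    static void mv_init(mv_state *s)
    {
        for (int i = 0; i < N; i++) { s->x1[i] = 0.0; s->x2[i] = 0.0; }
        s->sx1 = 0.0; s->sx2 = 0.0; s->head = 0;
    }

    /* One time step: push sample x, return mean in *m, variance in *v. */
    static void mv_update(mv_state *s, double x, double *m, double *v)
    {
        double x2 = x * x;
        double y1 = s->x1[s->head];      /* oldest X    */
        double y2 = s->x2[s->head];      /* oldest X**2 */

        s->x1[s->head] = x;              /* overwrite oldest with newest */
        s->x2[s->head] = x2;
        s->head = (s->head + 1) % N;

        s->sx1 += x - y1;                /* SX1(k) = SX1(k-1) + X1 - Y1 */
        s->sx2 += x2 - y2;               /* SX2(k) = SX2(k-1) + X2 - Y2 */

        *m = s->sx1 / N;
        *v = (N * s->sx2 - s->sx1 * s->sx1) / ((double)N * (N - 1));
    }

    int main(void)
    {
        mv_state s;
        double m, v;
        mv_init(&s);
        for (int k = 0; k < 20; k++) {
            mv_update(&s, (double)(k % 5), &m, &v);
            printf("k=%2d  mean=%6.3f  var=%6.3f\n", k, m, v);
        }
        return 0;
    }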



Cool!
On Sun, 18 May 2008 08:28:51 -0400, "John E. Hadstate"
<jh113355@hotmail.com> wrote:

> Notice that one of the equations for computing a sample
> variance over a window with N samples can be written as:
>
>     N * Sum(X**2) - (Sum(X)**2)
> V = ---------------------------
>             N * (N - 1)
>
> where X is the input.
(snipped by Lyons)

Hi John,

Your above equation for "V" looks correct to me.
(For whatever that's worth.)

Regards,
[-Rick-]
On May 19, 12:28 am, "John E. Hadstate" <jh113...@hotmail.com> wrote:
> (snip)
>
> Calculate Moving Average:
> M = SX1 / N
>
> Calculate Moving Variance:
> V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
A simpler way is to do this:

s(k+1) = beta*s(k) + (1-beta)*u^2(k)

where beta is a forgetting factor less than unity, u is the input and s is the variance.

The correct equation for recursive variance can be found by adding an extra sample to the batch variance:

s(k+1) = s(k) + (1/(k+1))*[u^2(k) - s(k)]

if I remember right, which converges for stationary u as k -> infinity. No use for tracking though - you need the forgetting factor version or a moving window, as has been explained. There is one more method which estimates the inverse of the variance, should you need that one!

K.
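
To make the two recursions concrete, here is a minimal C sketch of both updates. The value of beta and the toy input are arbitrary choices of mine; note also that s here is strictly an exponentially weighted mean square, which equals the variance only for zero-mean u (a point taken up later in the thread):

    #include <stdio.h>

    int main(void)
    {
        const double beta = 0.95;   /* forgetting factor, 0 < beta < 1 */
        double s_ff  = 0.0;         /* exponentially weighted mean square */
        double s_rec = 0.0;         /* exact recursive mean square        */

        for (int k = 0; k < 100; k++) {
            double u = (k % 2) ? 1.0 : -1.0;   /* toy zero-mean input */

            /* forgetting-factor version:
               s(k+1) = beta*s(k) + (1-beta)*u^2(k) */
            s_ff = beta * s_ff + (1.0 - beta) * u * u;

            /* exact recursive version: the gain 1/(k+1) shrinks to zero */
            s_rec += (u * u - s_rec) / (k + 1);
        }
        printf("forgetting-factor: %f   recursive: %f\n", s_ff, s_rec);
        return 0;
    }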
<kronecker@yahoo.co.uk> wrote in message 
news:3e0aadd8-6179-444b-a81d-ec7ab37f9986@d19g2000prm.googlegroups.com...
> On May 19, 12:28 am, "John E. Hadstate"
> <jh113...@hotmail.com> wrote:
>> (snip)
>
> A simpler way is to do this....
>
> s(k+1)=beta*s(k)+(1-beta)*u^2(k)
>
> where beta is a forgetting factor less than unity.
>
> u is the input and s is the variance.
Caveat Emptor!! This is not "a simpler way to do this". The method described by Hadstate computes what everyone in the world agrees is the sample variance for a population size of N, and correctly evaluates the sample variance in that population as new measurements replace old ones. The method of "kronecker@yahoo.co.uk" will only accidentally produce the same results.

The method described by Hadstate can be improved at the cost of one additional squaring per sample, but with the benefit of eliminating the need for the X**2 history buffer. If implemented in scaled fixed-point arithmetic, each word of the X**2 buffer should be twice as wide as the words of the X history buffer, so this could represent a significant savings in memory.
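
John doesn't spell out the improved variant, but a plausible reading is: keep only the X history, and when the oldest X is retired, square it to recover the outgoing X**2 term. A sketch under that assumption (names again mine), reusing the update structure from the first post:

    #include <stdio.h>

    #define N 8

    /* Single-buffer variant: only the X history is stored; the outgoing
       X**2 term is recovered as y1*y1, one extra squaring per sample. */
    typedef struct { double x1[N], sx1, sx2; int head; } mv1_state;

    static void mv1_update(mv1_state *s, double x, double *m, double *v)
    {
        double y1 = s->x1[s->head];        /* oldest X */
        s->x1[s->head] = x;                /* overwrite with newest */
        s->head = (s->head + 1) % N;
        s->sx1 += x - y1;
        s->sx2 += x * x - y1 * y1;         /* extra square replaces the X**2 buffer */
        *m = s->sx1 / N;
        *v = (N * s->sx2 - s->sx1 * s->sx1) / ((double)N * (N - 1));
    }

    int main(void)
    {
        mv1_state s = {{0}, 0.0, 0.0, 0};  /* zero-initialized history */
        double m, v;
        for (int k = 0; k < 20; k++) {
            mv1_update(&s, (double)(k % 5), &m, &v);
            printf("k=%2d  mean=%6.3f  var=%6.3f\n", k, m, v);
        }
        return 0;
    }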
On May 19, 3:10 am, "John E. Hadstate" <jh113...@hotmail.com> wrote:
> <kronec...@yahoo.co.uk> wrote in message
> news:3e0aadd8-6179-444b-a81d-ec7ab37f9986@d19g2000prm.googlegroups.com...
>
> Caveat Emptor!! This is not "a simpler way to do this". The
> method described by Hadstate computes what everyone in the
> world agrees is the sample variance for a population size of N,
> and correctly evaluates the sample variance in that population
> as new measurements replace old ones. The method of
> "kronec...@yahoo.co.uk" will only accidentally produce the same
> results.
What people usually want is the variance of the population, not the sample variance for a population of size N. The sample variance for a population of size N "will only accidentally produce the same results", since the sample variance is only an estimator of the variance. That is often a useful approximation. So is the exponentially averaged version.
> The method described by Hadstate can be improved at the cost of
> one additional squaring per sample, but with the benefit of
> eliminating the need for the X**2 history buffer. If
> implemented in scaled fixed-point arithmetic, each word of the
> X**2 buffer should be twice as wide as the words of the X
> history buffer, so this could represent a significant savings
> in memory.
Even this improvement still requires N memory elements instead of the 2 required for the exponentially averaged version, so exponential averaging could represent a significant savings in memory. In systems where hundreds of channels are decomposed into thousands of bins and each bin has this calculation performed, the savings is significant.

The use of exponential averaging to calculate signal and background parameters was common in signal analyzers in the last half of the last century, when memory prices were considerably greater than they are today.

Dale B. Dalrymple
http://dbdimages.com
On May 19, 10:10 pm, "John E. Hadstate" <jh113...@hotmail.com> wrote:
> <kronec...@yahoo.co.uk> wrote in message
> news:3e0aadd8-6179-444b-a81d-ec7ab37f9986@d19g2000prm.googlegroups.com...
>
> (snip)
>
> Caveat Emptor!! This is not "a simpler way to do this". The
> method described by Hadstate computes what everyone in the
> world agrees is the sample variance for a population size of N,
> and correctly evaluates the sample variance in that population
> as new measurements replace old ones. The method of
> "kronec...@yahoo.co.uk" will only accidentally produce the same
> results.
Which method? The second method (not exponential smoothing) IS exact! It is called recursive variance in the literature, just as we can get a recursive mean in a similar way. I assume here that the signal is stationary. If it isn't, then essentially it is no longer variance but time-varying variance. Take your pick of the many methods; then I agree the moving window approaches come into their own.

K.
On May 19, 7:22 pm, kronec...@yahoo.co.uk wrote:
> On May 19, 10:10 pm, "John E. Hadstate" <jh113...@hotmail.com> wrote:
> > > A simpler way is to do this....
> > >
> > > s(k+1)=beta*s(k)+(1-beta)*u^2(k)
> > >
> > > where beta is a forgetting factor less than unity.
> >
> > Caveat Emptor!! This is not "a simpler way to do this".
> > (snip)
> > The method of "kronec...@yahoo.co.uk" will only accidentally
> > produce the same results.
it won't produce the same results. it's a different formula. it is computing a *different* weighted mean and mean-square.
> Which method? The second method (not exponential smoothing) IS exact!
> It is called recursive variance in the literature, just as we can get
> a recursive mean in a similar way. I assume here that the signal is
> stationary. If it isn't, then essentially it is no longer variance but
> time-varying variance. Take your pick of the many methods; then I
> agree the moving window approaches come into their own.
i don't see how your "second method" (if i understand what you mean by the term) is "not exponential smoothing". i *do* see how

    y[n] = beta*y[n-1] + (1-beta)*(x[n])^2

gives you a form of a moving mean square (with exponential weighting). the averaging is done with an IIR filter with DC gain = 1, and what John H says is what "everyone in the world agrees is the sample variance for a population size of N" is an FIR filter with DC gain of *almost* 1 (but a little more, to compensate for the bias in the estimator of variance using that weighting method). both are weighted averages, but they apply different weightings to the averaging function. your "beta" has for a counterpart "N" in John H's thing.

the impulse responses of the two averagers are:

    yours:   h[n] = (1-beta) * beta^n * u[n]

    John's:  h[n] = 1/(N-1) * (u[n] - u[n-N])

where u[n] is the unit step function, 1 for non-negative n and 0 for negative n.

now both of your impulse responses have a mean length. for yours it is 1/(1-beta) and for John's it is (N-1)/2. if you were to equate those, you would have a rough comparison: define

    beta = 1 - 2/(N-1)    or    N = 2/(1-beta) + 1

now, to unbias the estimator, John has to take the naive mean of the variances and multiply it by N/(N-1) to get the correct estimate of variance. so my question to you, K, is do you have to also unbias your estimate of variance by multiplying by an equivalent factor, perhaps by 1 + (1-beta)/2 ? that's something to think about.

not to denigrate John or anything, but i had in grad school (3 decades ago) come up with the same basic method for moving estimates of variance and mean of real-time data, and i implemented it on a 6800 based system (lotsa "add-with-carry" instructions for quintuple precision arithmetic). i also considered a first-order IIR averager like you did, K, and i would have to dig out some very old school papers, but i think i came up with the need to scale the estimator of variance by a factor similar to the above. have you considered this issue?

like the FIR moving average, it doesn't make much difference if N is very large or if beta is very close to 1.

r b-j
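
A small C check of the correspondence proposed above. It is worth noting that with beta = 1 - 2/(N-1), the conjectured factor 1 + (1-beta)/2 works out to exactly N/(N-1), the same unbiasing factor as the FIR case:

    #include <stdio.h>

    int main(void)
    {
        /* match the mean impulse-response lengths:
           1/(1-beta) for the IIR averager, (N-1)/2 for the length-N FIR */
        for (int N = 4; N <= 64; N *= 2) {
            double beta = 1.0 - 2.0 / (N - 1);
            double bias = 1.0 + (1.0 - beta) / 2.0;  /* conjectured factor */
            printf("N=%3d  beta=%8.5f  1+(1-beta)/2=%8.5f  N/(N-1)=%8.5f\n",
                   N, beta, bias, (double)N / (N - 1));
        }
        return 0;
    }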
"robert bristow-johnson" <rbj@audioimagination.com> wrote in 
message 
news:e666e54b-6a71-49d0-9938-5fe5ed895dab@m44g2000hsc.googlegroups.com...

> not to denigrate John or anything, but i had in grad school (3 decades
> ago) come up with the same basic method for moving estimates of
> variance and mean of real-time data, and i implemented it on a 6800
> based system (lotsa "add-with-carry" instructions for quintuple
> precision arithmetic).
I just re-invented this method for the umpteenth time. I don't claim original sin; somebody else showed me the trick back when I was an undergrad. My main purpose in posting it here, aside from making it accessible to those who haven't seen it, is to put it someplace where I can find it the next time I need it. Maybe Rick will put it into the next edition of his book. By the time it comes out, I might be wealthy enough to buy my own copy ;-)
kronecker@yahoo.co.uk writes:

> Which method? The second method (not exponential smoothing) IS exact!
> It is called recursive variance in the literature, just as we can get
> a recursive mean in a similar way. I assume here that the signal is
> stationary. If it isn't, then essentially it is no longer variance but
> time-varying variance. Take your pick of the many methods; then I
> agree the moving window approaches come into their own.
You are correct, the second method is exact. The problem is that it effectively "turns off" updates as time proceeds (1/(k+1) -> 0). That can be bad if the system has some sort of start-up transient (or, as you say, the system is not stationary).

Ciao,

Peter K.

--
"And he sees the vision splendid of the sunlit plains extended
And at night the wondrous glory of the everlasting stars."
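
A small numerical illustration of this point (the input levels, beta, and change-point below are arbitrary choices of mine): when the input variance steps up partway through, the exact recursive estimate has essentially stopped adapting, while the forgetting-factor version tracks the change.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const double beta = 0.99;
        double s_rec = 0.0, s_ff = 0.0;

        srand(1);
        for (int k = 0; k < 20000; k++) {
            /* zero-mean input whose amplitude steps up at k = 10000 */
            double a = (k < 10000) ? 1.0 : 3.0;
            double u = a * (2.0 * rand() / RAND_MAX - 1.0);

            s_rec += (u * u - s_rec) / (k + 1);      /* gain -> 0: stops adapting */
            s_ff   = beta * s_ff + (1.0 - beta) * u * u;

            if (k == 9999 || k == 19999)
                printf("k=%5d  recursive=%6.3f  forgetting-factor=%6.3f\n",
                       k, s_rec, s_ff);
        }
        return 0;
    }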