
Question about computing a "moving variance"

Started by Rick Lyons February 27, 2005
"Rick Lyons" <r.lyons@_BOGUS_ieee.org> wrote in message
news:422cb9e2.446575421@news.sf.sbcglobal.net...
> On 1 Mar 2005 14:00:14 -0800, "John Hadstate" <jh113355@hotmail.com>
> wrote:
>
> >So, you're left with:
> >
> >   V = [1/(n-1)]*{[sum (x**2)] - n*xbar**2}
> >
> >   -or-
> >
> >   V = [1/(n-1)]*{[sum(x**2)] - [sum(x)**2]/n}
>
> Ah.  John, your expression is equal to that on the webpage:
>
>   http://www.mnstate.edu/wasson/ed602calcvarraws.htm
>   (URL supplied by Clay Turner)
>
> but you've supplied the "missing step" that justifies the expression.
> [I didn't see that step until you stuck my nose in it. :-) ]
> Thanks!
>
> John, your name has been officially entered on the list of guys to
> whom I owe a beer!
> (Unfortunately, to collect you have to fly to Sacramento CA, rent a
> car, and drive to my local bar to collect that beer.)
I may take you up on that offer someday. I have kin near Sacramento, in the Sheriff's dept. no less. ;-)
> > So, to build the FIR version, you need one delay line of n units.
>
> Oops.  Don't we need two delay lines?  One delay line to compute
> [sum (x**2)], and one delay line to compute xbar.  Or am I missing
> something here?
Actually, you can do it either way.  With two delay lines you can feed
x into one and x**2 into the other and simply sum the elements of each.
However, at considerable cost in computational cycles, you can instead
feed x into a single delay line, then sum the x's and sum the squares
of the x's.  In other words, it doesn't matter whether you sum the
delayed squares or square the delayed x's before summing.  In the olden
days, the cost of memory might have justified using one delay line.
Nowadays, you're probably better off with two.

By the way, did you figure out the IIR version?
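For concreteness, here's what the single-delay-line version might look
like in C (a minimal sketch built on the V formula quoted above; the
names and structure are illustrative, not code from the thread):

    #include <stdlib.h>

    typedef struct {
        double *buf;          /* the one delay line of n samples  */
        int     n, idx;       /* window length and write position */
        double  sum, sumsq;   /* running sum(x) and sum(x**2)     */
    } moving_var;

    static int mv_init(moving_var *mv, int n)
    {
        mv->buf = calloc(n, sizeof *mv->buf);
        if (mv->buf == NULL) return -1;
        mv->n = n;  mv->idx = 0;
        mv->sum = mv->sumsq = 0.0;
        return 0;
    }

    /* push one sample; return the unbiased variance of the last n */
    static double mv_update(moving_var *mv, double x)
    {
        double old = mv->buf[mv->idx];    /* sample leaving the window */
        mv->buf[mv->idx] = x;
        mv->idx = (mv->idx + 1) % mv->n;
        mv->sum   += x - old;
        mv->sumsq += x*x - old*old;       /* square the delayed x here */
        /* V = [1/(n-1)]*{[sum(x**2)] - [sum(x)**2]/n} */
        return (mv->sumsq - mv->sum*mv->sum/mv->n) / (mv->n - 1);
    }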
On Mon, 07 Mar 2005 13:43:41 -0800, glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:


(snip)

> >> Oops.  Don't we need two delay lines?  One delay line
> >> to compute [sum (x**2)], and one delay line to compute
> >> xbar.  Or am I missing something here?
>
> I thought that at first, too.  I think, though, that it is two
> delay lines and one multiplier, or one delay line and two
> multipliers.  Also, the delay line for x**2 could be twice
> as wide as for x.  I would say it depends on N and the width.
>
> You could also build a pipelined multiplier into the delay line.
>
> -- glen
Hi Glen,

darn, ... you're the 2nd guy who has said this variance processing
could be performed using only one delay line.  I've gotta figure this
out!

Thanks,
[-Rick-]
On 7 Mar 2005 16:37:50 -0800, "JS" <jshima@timing.com> wrote:

> Nice topic!  I have always just used the recursive mean and variance
> estimators given in "Optimal Estimation" by Frank Lewis, since they
> made sense to me...
>
> For sample mean using sample input x[n]:
>
>   m[n+1] = m[n] + 1/(N+1)*(x[n+1] - m[n])
>          = N/(N+1)*m[n] + 1/(N+1)*x[n+1]  =  a nice first-order IIR filter
>
> For zero-mean sample variance (biased):
>
>   s[n+1] = s[n] + 1/(N+1)*(x[n+1]^2 - s[n]),
>
> which is equal to the sample variance and another nice IIR filter
> (both look like averagers but don't have constant coefficients since
> N is changing).
>
> For non-zero mean you just use your estimate of m[n] above in the
> equation for sample variance.  You can also derive the same filter for
> the unbiased estimator.
>
> These probably reduce to what you are talking about; the derivation is
> a bit different.  You start with the block equation(s) for sample mean
> and variance, then do sort of an induction step where you add one
> sample to the block equation and then rewrite the whole deal.  You end
> up with a nice recursion...
>
> Jim
Hi Jim,

I didn't recognize the "JS" as Jim Shima.  Thanks for the equations.
I've collected many expressions from the guys here for nonrecursive
and recursive (biased and unbiased), real-time and nonreal-time, mean
& variance algorithms.  I'm gonna see how they all compare with each
other.

Thanks,
[-Rick-]
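In C, Jim's recursions above come out to something like this (a sketch
of my reading, not his code; the time-varying 1/(N+1) coefficient is
folded into a running sample count):

    typedef struct { long n; double m, s; } rec_est;

    static void rec_update(rec_est *e, double x)
    {
        double k = 1.0 / ++e->n;   /* the time-varying 1/(N+1)     */
        e->m += k * (x - e->m);    /* recursive sample mean        */
        e->s += k * (x*x - e->s);  /* recursive zero-mean variance */
    }

    /* biased sample variance when the mean is nonzero: s[n] - m[n]^2 */
    static double rec_variance(const rec_est *e)
    {
        return e->s - e->m * e->m;
    }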
Yessir, that is me!  Been away for some time.  Nice to see you still
around figuring out all this cool stuff.

I would love to see your conclusions regarding all the different
equations.  It would certainly be illuminating regarding different
filtering implementations of mean and variance.  And those are always
useful "tricks" to have in your DSP toolbox!

Heck I'm still catching up on all this.  If you figure it out let me
know too!!

Jim

robert bristow-johnson <rbj@audioimagination.com> writes:

> in article sm3h9yth.fsf@ieee.org, Randy Yates at yates@ieee.org wrote on
> 02/27/2005 21:56:
>
>> robert bristow-johnson <rbj@audioimagination.com> writes:
>>> [...]
>>> but
>>>
>>> E{ S^2 } = (N-1)/N * E{ (x[n]-m)^2 } = (N-1)/N * sigma^2
>>
>> How is this derived?
>
> oh, it's kinda a bitch.
>
> [...]
>
> E{S^2} = (1/N)*N*(sigma^2 + m^2) - (1/N^2)*((N^2-N)*m^2 + N*(sigma^2 + m^2))
>
>        = (sigma^2 + m^2) - (1/N^2)*( N^2*m^2 + N*sigma^2 )
>
>        = sigma^2 - (1/N^2)*( N*sigma^2 )
>
>        = (N-1)/N * sigma^2
THIS IS A THING OF BEAUTY!  THANKS ROBERT!!!

I'm sorry I didn't read it sooner.
-- 
%  Randy Yates                  % "Ticket to the moon, flight leaves here
%% Fuquay-Varina, NC            %  today from Satellite 2"
%%%  919-577-9882               %  'Ticket To The Moon'
%%%% <yates@ieee.org>           %  *Time*, Electric Light Orchestra
http://home.earthlink.net/~yatescr
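For anyone who wants the steps behind robert's first line, the standard
route (a reconstruction, not part of his post) runs:

  with  S^2 = (1/N)*sum{ (x[n] - xbar)^2 } = (1/N)*sum{ x[n]^2 } - xbar^2,

  E{S^2} = (1/N)*sum{ E{x[n]^2} } - E{xbar^2}

  where  E{x[n]^2} = sigma^2 + m^2,  and

  E{xbar^2} = (1/N^2)*E{ (sum x[n])^2 }
            = (1/N^2)*( (N^2-N)*m^2 + N*(sigma^2 + m^2) )

  because the (N^2-N) cross terms E{x[i]*x[j]}, i != j, are each m^2
  for independent samples.  Substituting gives robert's first line, and
  the rest is algebra.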
JS wrote:
> I would love to see your conclusions regarding all the different
> equations.  It would certainly be illuminating regarding different
> filtering implementations of mean and variance.  And those are always
> useful "tricks" to have in your DSP toolbox!
I've been messing about with this ever since Rick got a few responses
on the thread.  Here's how I boil it down.

  variance = mean(x^2) - (mean(x))^2

There are a lot of ways to calculate the mean.  The solution Clay
posted uses a CIC filter, and that produces a result identical to the
classical (clumsy, slow, bloated, and obvious) method, but a lot
faster.

Jim Shima proposed using an exponential averager to calculate the mean:

  y[n] = a*y[n-1] + (1-a)*x[n]

He set a = N/(N+1).  This has the advantage of requiring less memory -
in fact, it decouples the memory requirements from N, so that can be
very nice.

If we recognize that the mean is just the DC component of x, there are
a lot of ways to calculate it.  Maybe over the weekend I'll code it up
using a Goertzel tuned to DC and see how that works.

I coded up the CIC and exponential averager versions in C, and found
that the accuracy of the exponential averager varied wildly with N and
the data set.  For some data sets and some values of N, it was accurate
to five decimal places.  For other values of N, but with the same data,
it was off by 25%.  I'm sure there's room for improvement there.  My
analysis was neither rigorous nor exhaustive, and there may well be a
bug in the code.  My guess is that I need to set "a" to something other
than a = N/(N+1).

The code is available at http://users.erols.com/vathomas/stats.c if
anyone wants to take a look.  I ran it on a linux box, compiling it
like this:

  gcc -o stats -lm -Wall stats.c

To run it, you spec N and a file containing numbers - one to a line.

  ./stats <N> <filename>

If filename is omitted, the numbers are read from stdin.  I used the
sizes of the files in my home directory as one source of numbers:

  ls -l ~ | awk '{print $5}' | ./stats 50

Enjoy.  I know I have (so far).

-- 
Jim Thomas                        Principal Applications Engineer
jthomas@bittware.com              Bittware, Inc
http://www.bittware.com           (603) 226-0404 x536
The plural of anecdote is not data.
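One plausible culprit for those wild accuracy swings (a guess, not a
diagnosis of stats.c): variance = mean(x^2) - (mean(x))^2 subtracts two
nearly equal numbers whenever the mean is large compared to the spread,
so small errors in the averager get magnified.  Welford's recurrence is
a standard way to sidestep the cancellation; a minimal sketch (my code,
not from stats.c):

    /* Welford's online mean/variance: never forms mean(x^2), so there
       is no catastrophic cancellation */
    typedef struct { long n; double mean, M2; } welford;

    static void welford_update(welford *w, double x)
    {
        double d = x - w->mean;       /* uses the old mean...   */
        w->n++;
        w->mean += d / w->n;
        w->M2   += d * (x - w->mean); /* ...and the updated one */
    }

    static double welford_variance(const welford *w)   /* unbiased */
    {
        return (w->n > 1) ? w->M2 / (w->n - 1) : 0.0;
    }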
in article 1133r0o4n4fmmf7@corp.supernews.com, Jim Thomas at
jthomas@bittware.com wrote on 03/11/2005 14:05:

> I've been messing about with this ever since Rick got a few responses on
> the thread.  Here's how I boil it down.
>
> variance = mean(x^2) - (mean(x))^2
>
> There are a lot of ways to calculate the mean.  The solution Clay posted
> uses a CIC filter, and that produces a result identical to the
> classical (clumsy, slow, bloated, and obvious) method, but a lot faster.
>
> Jim Shima proposed using an exponential averager to calculate the mean:
>
> y[n] = p*y[n-1] + (1-p)*x[n]
>
> He set p = N/(N+1).  This has the advantage of requiring less memory -
> in fact, it decouples the memory requirements from N, so that can be
> very nice.
okay, now i'm feeling compelled to pay attention to this.  (i've also
done 1st order IIR filtering to smooth out audio parameters, such as
pitch, but hadn't liked it because the tail *never* goes away.)

in article 1110242270.308279.308320@g14g2000cwa.googlegroups.com, JS at
jshima@timing.com wrote on 03/07/2005 19:37:
> For sample mean using sample input x[n]:
>
> m[n+1] = m[n] + 1/(N+1)*(x[n+1] - m[n])
>        = N/(N+1)*m[n] + 1/(N+1)*x[n+1] = a nice first-order IIR filter
interesting... that's the same as saying:

  (N+1)*m[n+1] = (N+1)*m[n] + (x[n+1] - m[n])
               = N*m[n] + x[n+1]

at the moment that n = N (which is sample N+1) this would mean:

  sum_up_to[n+1] = sum_up_to[n] + x[n+1]

so, at the (N+1)th sample, Jim Shima's mean is based on the entire sum
from the moment you turned the box on when n = 1 (also assuming all
samples before m[1] are out of the picture).  that's for N+1 samples.
and now what happens after that?  to what extent are we forgetting the
old samples to give meaning to the concept of *moving* variance?  also,
at that moment, what is the mean delay?

when i look at the simple one-pole filter

  y[n] = p*y[n-1] + (1-p)*x[n]      ( "p" is the pole and it must be
                                      that 0 <= p < 1 )

i get a transfer function of:

  H(z) = (1-p)/(1 - p*z^-1)

frequency response:

  H(exp(j*w)) = (1-p)/(1 - p*exp(-j*w))
              = (1-p)/( 1 - p*cos(w) + j*p*sin(w) )

mag (squared):

  |H(exp(j*w))|^2 = (1-p)^2/( (1 - p*cos(w))^2 + (p*sin(w))^2 )
                  = (1-p)^2/( 1 + p^2 - 2*p*cos(w) )

and phase:

  arg{H(exp(j*w))} = -arctan( (p*sin(w))/(1 - p*cos(w)) )

and group delay:

  tau(w) = -(d/dw)arg{ H(exp(j*w)) }
         = p*(cos(w) - p) / ( 1 + p^2 - 2*p*cos(w) )

can anyone find fault with this?  (not entirely sure i did it right.)

now the CIC filter with N+1 taps always has a delay of N/2 samples (for
all frequencies), and the delay of this (at DC, where group and phase
delay are the same thing) is tau(0) = p/(1-p).  if i were to compare
the two, i would set

  tau(0) = p/(1-p) = N/2

so, to try to compare a comparable parameter of this apple to orange, i
would set

  p = N/(N+2)      (for an N+1 delay CIC filter)

so why is this different than Jim's number???  at sample N+1, they
should be precisely the same, but if he's using p = N/(N+1) and i'm
using p = N/(N+2), they ain't precisely the same.
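As a quick numerical sanity check on that group-delay algebra (an
illustration, not from the thread), one can difference the phase near
DC and compare it against the closed form tau(0) = p/(1-p), which
should come out to N/2 for p = N/(N+2):

    #include <math.h>
    #include <stdio.h>

    /* phase of H(exp(j*w)) = (1-p)/(1 - p*exp(-j*w)) */
    static double phase(double p, double w)
    {
        return -atan2(p*sin(w), 1.0 - p*cos(w));
    }

    int main(void)
    {
        for (int N = 2; N <= 32; N *= 2) {
            double p   = (double)N / (N + 2);
            double dw  = 1e-6;
            /* group delay: tau = -d(phase)/dw, estimated at w ~ 0 */
            double tau = -(phase(p, dw) - phase(p, 0.0)) / dw;
            printf("N = %2d: tau(0) = %8.4f, N/2 = %4.1f\n", N, tau, N/2.0);
        }
        return 0;
    }

Analytically, tau(0) = p/(1-p) = [N/(N+2)]/[2/(N+2)] = N/2 exactly, so
the columns should agree if the expression is right.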
> For zero-mean sample variance (biased):
>
> s[n+1] = s[n] + 1/(N+1)*(x[n+1]^2 - s[n]), which is equal to the sample
> variance and another nice IIR filter
that's the same as saying

  (N+1)*s[n+1] = (N+1)*s[n] + ((x[n+1])^2 - s[n])
  (N+1)*s[n+1] = N*s[n] + (x[n+1])^2

so this MUST be for the sum of squares, *not* (quite yet) for variance.
at the moment that n = N (the (N+1)th sample) this would mean:

  sum_of_squares_up_to[n+1] = sum_of_squares_up_to[n] + (x[n+1])^2

still N+1 delay elements, so it would be comparable to the same p.  for
the CIC with N+1 delay elements i would still toss in a gain of (N+1)/N
to unbias the sample variance.
> (both look like averagers but don't have constant coefficients since
> N is changing).
oh, i just read that!!  i'm trying to understand this with N _fixed_
and comparable to the N of the CIC filter.  i see now, you don't even
have an LTI system; yours is a time-variant system, so it's *not* a
"nice IIR filter".  your m[n] or s[n] states that feed back are
different than those of a couple of nice one-pole IIR filters.

so here's my painfully arrived at conclusion: if you want to use a
"nice IIR filter" instead of an N+1 delay element CIC to do your
averaging, i would suggest:

  for the estimated mean:

    m[n] = N/(N+2)*m[n-1] + 2/(N+2)*x[n]

  for the estimated mean square:

    s[n] = N/(N+2)*s[n-1] + 2/(N+2)*(x[n])^2

  and for the *unbiased* estimated variance:

    v[n] = (N+1)/N * (s[n] - (m[n])^2)

that's what i would recommend (but i would change every N+1 to N and
call it comparable to a CIC filter with N delay elements and a delay of
(N-1)/2 samples).

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
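In C, r b-j's recommendation boils down to something like this (a
minimal sketch of the three equations above with N fixed; the names are
mine):

    typedef struct { int N; double p, m, s; } iir_var;

    static void iir_var_init(iir_var *v, int N)
    {
        v->N = N;
        v->p = (double)N / (N + 2);   /* pole matched to the CIC's delay */
        v->m = v->s = 0.0;            /* note: 1-p = 2/(N+2)             */
    }

    /* push one sample; return the unbiased estimated variance v[n] */
    static double iir_var_update(iir_var *v, double x)
    {
        v->m = v->p * v->m + (1.0 - v->p) * x;    /* estimated mean        */
        v->s = v->p * v->s + (1.0 - v->p) * x*x;  /* estimated mean square */
        return (double)(v->N + 1) / v->N * (v->s - v->m * v->m);
    }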
Hi Robert,

Yeah, I tried to imply that the filter is not LTI; it is time-variant
since you don't have constant coefficients.  Basically I was trying to
give Rick another idea for a recursive mean and variance.  If you were
going to use an LTI exponential averager with coeffs of (A) and (1-A),
then yes, the analysis is different.

In the case I listed, the mean and variance equations will be equal to
their block equivalents at time "n".

That is:

for my m[n] at sample n:

  m[n] = 1/n * sum(i=1 to n){x[i]}

and for the variance (zero mean):

  s[n] = 1/n * sum(i=1 to n){x[i]^2}

So as N gets larger, the overall mean and variance estimates get
better, since you are effectively computing a larger block
mean/variance in a recursive fashion.  In fact, you will be computing
the block mean and variance for growing "n" at each sample point.

This is not a "moving variance" as defined, but it can mock one if you
want to reset the filter at a given number of points "M".  In this case
you have a variance that will be equal to the block variance at each
point.

Definition wise for the mean computation, start with:

xbar[n] = 1/n * sum(i=1 to n){x[i]}   // standard block mean for n points

let:

xbar[n+1] = 1/(n+1)* sum(i=1 to n+1){x[i]}

so that:

xbar[n+1] = 1/(n+1)* sum(i=1 to n){x[i]}  +  1/(n+1)*x[n+1]
xbar[n+1] = n/(n+1)*xbar[n] +  1/(n+1) * x[n+1]

And you get the equation I listed.  It is a time-variant filter that
"looks" like an alpha filter with a = n/(n+1) and (1-a) = 1/(n+1), but
certainly cannot be analyzed as such.

At any rate, I didn't mean to confuse the issue at hand...  :)

I will look at your coeffs above for the LTI alpha filter; it looks
interesting.  Thanks...

Regards,
Jim
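A tiny sketch of that reset trick (my interpretation, not Jim's code):
restart the growing-n recursion every M samples, so that at the end of
each block it has produced exactly that block's mean and biased
variance.

    typedef struct { long n, M; double m, s; } block_est;  /* set M first */

    static double block_update(block_est *e, double x)
    {
        if (e->n == e->M) {             /* reset the filter every M points */
            e->n = 0;
            e->m = e->s = 0.0;
        }
        e->n++;
        e->m += (x   - e->m) / e->n;    /* block mean so far     */
        e->s += (x*x - e->s) / e->n;    /* block mean square     */
        return e->s - e->m * e->m;      /* biased block variance */
    }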

"robert bristow-johnson" <rbj@audioimagination.com> wrote in message
news:BE57780C.52D9%rbj@audioimagination.com...
> in article 1133r0o4n4fmmf7@corp.supernews.com, Jim Thomas at
> jthomas@bittware.com wrote on 03/11/2005 14:05:
>
> (snip)
>
> > Jim Shima proposed using an exponential averager to calculate the mean:
> >
> > y[n] = p*y[n-1] + (1-p)*x[n]
>
> okay, now i'm feeling compelled to pay attention to this.  (i've also
> done 1st order IIR filtering to smooth out audio parameters, such as
> pitch, but hadn't liked it because the tail *never* goes away.)
A side note: sometimes I've followed the 1st order IIR smoothing filter
with a little routine that checks the result of the filter and, if it
is "close enough" to its target, makes it exactly so.  This
"post-check" routine is run infrequently so it doesn't take much
processing.  It's not elegant, but it is effective (and quite important
in the case of, say, smoothing the coefficients of a very narrow
low-frequency notch filter).
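The post-check might be as simple as this (a sketch; Jon didn't post
code, and the names here are made up):

    #include <math.h>

    /* run occasionally: if the smoother has come close enough to its
       target, snap it exactly there so the exponential tail dies */
    static void post_check(double *state, double target, double tol)
    {
        if (fabs(*state - target) < tol)
            *state = target;
    }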
"Jon Harris" <goldentully@hotmail.com> wrote in message 
news:39etu6F60u5gkU1@individual.net...
> "robert bristow-johnson" <rbj@audioimagination.com> wrote in message > news:BE57780C.52D9%rbj@audioimagination.com... >> in article 1133r0o4n4fmmf7@corp.supernews.com, Jim Thomas at >> jthomas@bittware.com wrote on 03/11/2005 14:05: >> >> > I've been messing about with this ever since Rick got a few responses >> > on >> > the thread. Here's how I boil it down. >> > >> > variance = mean(x^2) - (mean(x))^2 >> > >> > There are a lot of ways to calculate the mean. The solution Clay >> > posted >> > uses a CIC filter, and that produces a result identical to the >> > classical (clumsy, slow, bloated, and obvious) method, but a lot >> > faster. >> > >> > Jim Shima proposed using an exponential averager to calculate the mean: >> > >> > y[n] = p*y[n-1] + (1-p)*x[n] >> > >> > He set p = N/(N+1). This has the advantage of requiring less memory - >> > in fact, it decouples the memory requirements from N, so that can be >> > very nice. >> >> okay, now i'm feeling compelled to pay attention to this. (i've also >> done >> 1st order IIR filtering to smooth out audio parameters, such as pitch, >> but >> hadn't liked it because the tail *never* goes away.) > > A side note, sometimes I've followed the 1st order IIR smoothing filter > with a > little routine that checks the result of the filter and if it is "close > enough" > and if it is, makes it exactly so. This "post-check" routine is run > infrequently so it doesn't take much processing. It's not elegant, but it > is > effective (and quite important in the case of say smoothing the > coefficients of > a very narrow low-frequency notch filter). > >
A danger with these methods is that you are now using an estimate of
the sample mean, which is in turn being used to estimate the population
mean.  So now you have even more uncertainty in your variance
estimator.  Depending on how you are going to use your variance, you
will need to factor in the extra uncertainty.

Clay