Reply by robert bristow-johnson May 21, 2008
On May 20, 7:26 pm, Scott Seidman <namdiestt...@mindspring.com> wrote:
...
>>> function y=movingvar(X,N)
>>> % y=movingvar(X,N)
>>> % Calculates N-point moving variance of Vector X
>>> % Highly recommend that N be odd (no error checking)
>>> % Note: first and last N/2 points will be unreliable.
...
> "John E. Hadstate" <jh113...@hotmail.com> wrote in
> news:ttydna1tkKeOxa7VnZ2dnUVZ_gydnZ2d@supernews.com:
>
>> I guess that about covers all of them, doesn't it?
>
> X can be any length-- usually it's orders of magnitude bigger than N
>
now i'm confused.  vatdaheck is N?  ???

BTW, John, i wasn't trying to pick on you; more to say that i've been
pretty familiar with this moving variance estimator for a while since
my school daze and to point out to K that i agree with you that there
are differences between what he was doing and what you were doing.

r b-j
Reply by Scott Seidman May 20, 2008
"John E. Hadstate" <jh113355@hotmail.com> wrote in 
news:ttydna1tkKeOxa7VnZ2dnUVZ_gydnZ2d@supernews.com:

> I guess that about covers all of them, doesn't it?
X can be any length-- usually it's orders of magnitude bigger than N

--
Scott
Reverse name to reply
Reply by John E. Hadstate May 20, 2008
"Scott Seidman" <namdiesttocs@mindspring.com> wrote in message 
news:Xns9AA4A1C1320B7scottseidmanmindspri@130.133.1.4...
> "John E. Hadstate" <jh113355@hotmail.com> wrote in
> news:NpydneugMrGIvK3VnZ2dnUVZ_gednZ2d@supernews.com:
>
>> Calculate Moving Variance:
>> V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
>
> I wrote a moving var filter using convolution and simple math once.
> This is matlab code.
>
> function y=movingvar(X,N)
> % y=movingvar(X,N)
> % Calculates N-point moving variance of Vector X
> % Highly recommend that N be odd (no error checking)
> % Note: first and last N/2 points will be unreliable.
    ^^^^^^^^^^^^^^^^^^^^^^^^^

I guess that about covers all of them, doesn't it?

> % Output will be a column vector.
> % Please don't distribute without header info
> % Scott Seidman 1/23/99
>
> d
>
> X=X(:);
> XSQR=X.*X;
> convsig=ones(1,N);
> y=(conv(convsig,XSQR)-(conv(convsig,X).^2)/N)/(N-1);
>
> y=y(ceil(N/2):length(X)+floor(N/2));
>
> --
> Scott
> Reverse name to reply
Reply by Scott Seidman May 20, 2008
Scott Seidman <namdiesttocs@mindspring.com> wrote in
news:Xns9AA4A1C1320B7scottseidmanmindspri@130.133.1.4: 

> "John E. Hadstate" <jh113355@hotmail.com> wrote in
> news:NpydneugMrGIvK3VnZ2dnUVZ_gednZ2d@supernews.com:
>
>> Calculate Moving Variance:
>> V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
>
> I wrote a moving var filter using convolution and simple math once.
> This is matlab code.
>
> function y=movingvar(X,N)
> % y=movingvar(X,N)
> % Calculates N-point moving variance of Vector X
> % Highly recommend that N be odd (no error checking)
> % Note: first and last N/2 points will be unreliable.
> % Output will be a column vector.
> % Please don't distribute without header info
> % Scott Seidman 1/23/99
>
> d
>
> X=X(:);
> XSQR=X.*X;
> convsig=ones(1,N);
> y=(conv(convsig,XSQR)-(conv(convsig,X).^2)/N)/(N-1);
>
> y=y(ceil(N/2):length(X)+floor(N/2));
>
oops-- inadvertent "d" insertion in the above.  delete the line.

--
Scott
Reverse name to reply
Hak mir nisht ken tshaynik
Reply by Scott Seidman May 20, 2008
"John E. Hadstate" <jh113355@hotmail.com> wrote in 
news:NpydneugMrGIvK3VnZ2dnUVZ_gednZ2d@supernews.com:

> Calculate Moving Variance:
> V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
I wrote a moving var filter using convolution and simple math once.
This is matlab code.

function y=movingvar(X,N)
% y=movingvar(X,N)
% Calculates N-point moving variance of Vector X
% Highly recommend that N be odd (no error checking)
% Note: first and last N/2 points will be unreliable.
% Output will be a column vector.
% Please don't distribute without header info
% Scott Seidman 1/23/99

d

X=X(:);
XSQR=X.*X;
convsig=ones(1,N);
y=(conv(convsig,XSQR)-(conv(convsig,X).^2)/N)/(N-1);

y=y(ceil(N/2):length(X)+floor(N/2));

--
Scott
Reverse name to reply
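For readers outside MATLAB, here is a rough NumPy translation of the same convolution trick (the stray "d" line dropped; the function and variable names are mine, not Scott's):

```python
import numpy as np

def moving_var(x, n):
    # N-point moving variance via convolution with a length-n boxcar,
    # following the structure of the MATLAB movingvar above.
    # n should be odd; first and last n//2 points are unreliable.
    x = np.asarray(x, dtype=float).ravel()
    box = np.ones(n)
    s2 = np.convolve(box, x * x)        # running sums of squares
    s1 = np.convolve(box, x)            # running sums
    y = (s2 - s1 ** 2 / n) / (n - 1)    # unbiased (N-1) normalization
    # keep the centered portion, same length as x (MATLAB is 1-based)
    return y[int(np.ceil(n / 2)) - 1 : len(x) + n // 2]
```

As in the original, the edges mix in zero-padded samples, so only the middle len(x) - n + 1 points are trustworthy.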
Reply by robert bristow-johnson May 20, 2008
On May 20, 2:54 pm, kronec...@yahoo.co.uk wrote:
...
> I agree and that's why I said that it only applies for stationary
> signals otherwise it is useless (which is normally the case in fact!).
but it's not useless then.  that's why we create the term
"quasi-stationary".
> That is where the other methods come into their own but strictly
> speaking it is no longer variance when it is time-varying.
it is an estimation of population variance which is a slightly scaled
measure of the sample variance and the sample variance *may* be
time-varying since the sample (being that we're DSP folk here rather
than statistics folk, i should say "collection of samples") actually
varies as "k" increases.
> The forgetting factor method is an ad-hoc IIR filtering but does work.
sure, why shouldn't it. but because it's a weighted sample of the population, it needs to have the sample variance (computed the way we think it would be from the IIR averaging filters) bumped up slightly, depending on that forgetting factor.
> It is only first order and you can improve on this I am sure with
> higher order FIR filters to get smoother results with tracking as
> well. It is quite an important point really since this problem crops
> up all the time when you least expect it.
i'm not sure what is the important point or the problem that crops up
all the time.  do you mean the difference between sample variance and
population variance?  if so, i don't think it is *much* of a problem
if N is large or if beta is close to 1.

r b-j
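The sample-vs-population distinction r b-j mentions is just the N/(N-1) factor, which is easy to check with NumPy's ddof parameter (example mine, not from the thread):

```python
import numpy as np

x = np.arange(10.0)          # any batch of samples
n = len(x)
pop = np.var(x)              # "population" style: divide by N
samp = np.var(x, ddof=1)     # unbiased sample variance: divide by N-1
# the two differ by exactly N/(N-1), which tends to 1 as N grows
assert abs(samp - pop * n / (n - 1)) < 1e-12
```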
Reply by May 20, 2008
On May 21, 3:56 am, robert bristow-johnson <r...@audioimagination.com>
wrote:
> On May 20, 4:40 am, kronec...@yahoo.co.uk wrote:
>
>> It's hard to explain with just a text editor
>
> oh, you need to learn how to do "ASCII math" (a relative of "ASCII
> art").
>
> assume your reader is viewing it with a mono-spaced font, make liberal
> use of spaces and parenths.  don't use tabs.
>
> here, i'll show you.  since this is "comp.dsp" and DSP is still often
> considered a sub-discipline of electrical engineering, and since EE's
> use the symbol u(t) or u[n] to mean the "unit step
> function" (continuous or discrete time), and since i made reference to
> the unit step function in this thread, i am changing your "u" to a
> general input "x".  sticking with the convention of most DSP lit
> nowadaze, we use brackets "[]" for the discrete time or discrete
> frequency independent variable (e.g. "x[n]") and parenths for
> continuous independent variable, "x(t)".  some people might use
> brackets instead of parenths to group together multiple terms, but i
> don't.  often i use "*" for an explicit multiply.  i seldom need a
> symbol for convolution, but if i do, i use "(*)" to differentiate it
> from a simple multiply.
>
>> but if you take the variance with N samples you get S[N].
>> We then add one term to the summation to get S[N+1] and write S[N+1]
>> in terms of S(N) with the extra term (x[N+1])^2/(N+1). also use the fact
>> that N/(N+1) = 1-1/(N+1). It works out and you get recursive variance.
>> It will converge when N gets large for stationary signals. This is not
>> an approximation like exponential smoothing is. (I am not claiming the
>> expression with beta in it is exact - it is the other one I quoted).
>>
>> Ok so here we go - for N samples
>
>                  N
>   S[N] = (1/N) SUM (x[k])^2                    (1)
>                k=1
>
>> Add a term
>
>                      N+1
>   S[N+1] = (1/(N+1)) SUM (x[k])^2              (2)
>                      k=1
>
>                      N
>          = (1/(N+1)) SUM (x[k])^2  +  (1/(N+1))*(x[N+1])^2
>                      k=1
>
>          = (N/(N+1))*S[N]  +  (1/(N+1))*(x[N+1])^2
>
>          = S[N]  +  (1/(N+1))*( (x[N+1])^2 - S[N] )
>
>> QED
>> This is the right expression - .
>
> fine.  nice and easily read.  took me less than 4 minutes.
>
> now, an issue that you're not dealing with.  let's suppose, for the
> time being, that the mean of x[k] is zero (or it had the mean
> subtracted out) so that the mean square, S[N] represents the "sample
> variance".  it is a known fact in statistical analysis that the sample
> variance, as computed by the mean square of the samples, is a slightly
> biased estimator of the true population variance and has to be
> multiplied by N/(N-1) to unbias it.  this is in that thread that John
> referred to and, because i was asked about it, i proved this
> explicitly in this post:
> http://groups.google.com/group/comp.dsp/msg/1d89215c51200101?dmode=so...
>
> now let's suppose there is a mean and we estimate it with
>
>                  N
>   m[N] = (1/N) SUM x[k]
>                k=1
>
> and you compute the variance as:
>
>   V[N] = N/(N-1) * ( S[N] - (m[N])^2 )
>
> you'll get the same variance John does for all of x[k] such that
> 1 <= k <= N.  but it's not a *moving* variance (as is John's), if we
> use the recursive summation you do, since that summation *never*
> forgets and goes all the way back to x[1].  it is a comprehensive
> variance of all of the samples ever taken.  no forgetting factor.
> it's like an integrator that is never reset.  that's how it is
> different from what John put forth, which remembers only the most
> recent N samples.
>
> now that exponentially weighted calculation using a recursive IIR
> filter *does* have a forgetting factor, and, even though i do not have
> the notes on me to prove it (but if i feel like "recreational
> mathematics", i can re-derive it), i do believe that a compensating
> factor similar to N/(N-1) is needed to unbias the variance estimator,
> but i do not remember precisely what it is, but suspect that it is
> (1 + (1-beta)/2) because i seem to remember that it was related to the
> "mean length" of the impulse response of either filter (the
> exponentially weighted IIR or the constant weighted FIR).
>
> r b-j
I agree and that's why I said that it only applies for stationary
signals otherwise it is useless (which is normally the case in fact!).
That is where the other methods come into their own but strictly
speaking it is no longer variance when it is time-varying. The
forgetting factor method is an ad-hoc IIR filtering but does work. It
is only first order and you can improve on this I am sure with higher
order FIR filters to get smoother results with tracking as well. It is
quite an important point really since this problem crops up all the
time when you least expect it.

K.
Reply by robert bristow-johnson May 20, 2008
On May 20, 11:56 am, robert bristow-johnson
<r...@audioimagination.com> wrote:
...
> now that exponentially weighted calculation using a recursive IIR
> filter *does* have a forgetting factor, and, even though i do not have
> the notes on me to prove it (but if i feel like "recreational
> mathematics", i can re-derive it), i do believe that a compensating
> factor similar to N/(N-1) is needed to unbias the variance estimator,
looks like i *did* re-do this in 2005 in that same thread.

http://groups.google.com/group/comp.dsp/browse_frm/thread/330ac90a92f8dfaf/00879d7a66e43589#00879d7a66e43589

i can't remember shit.  (too many decades of cannabis.)  i think that
(1 + (1-beta)/2) is not the correct compensation factor.  it looks
like it should be (1 + beta)/(2*beta) and i can't tell that they are
ostensibly the same.
> but i do not remember precisely what it is, but suspect that it is
> (1 + (1-beta)/2) because i seem to remember that it was related to the
> "mean length" of the impulse response of either filter (the
> exponentially weighted IIR or the constant weighted FIR).
r b-j
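For what it's worth, the two candidate compensation factors above do agree to first order in (1-beta), which may be why they are hard to tell apart near beta = 1; a quick numeric check (mine, not from the thread):

```python
beta = 0.99                        # forgetting factor close to 1
guess = 1 + (1 - beta) / 2         # the factor r b-j remembered
derived = (1 + beta) / (2 * beta)  # the factor from the 2005 thread
# their difference is O((1-beta)^2), so they match for beta near 1
assert abs(guess - derived) < (1 - beta) ** 2
```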
Reply by robert bristow-johnson May 20, 2008
On May 20, 4:40 am, kronec...@yahoo.co.uk wrote:

> > It's hard to explain with just a text editor
oh, you need to learn how to do "ASCII math" (a relative of "ASCII
art").

assume your reader is viewing it with a mono-spaced font, make liberal
use of spaces and parenths.  don't use tabs.

here, i'll show you.  since this is "comp.dsp" and DSP is still often
considered a sub-discipline of electrical engineering, and since EE's
use the symbol u(t) or u[n] to mean the "unit step
function" (continuous or discrete time), and since i made reference to
the unit step function in this thread, i am changing your "u" to a
general input "x".  sticking with the convention of most DSP lit
nowadaze, we use brackets "[]" for the discrete time or discrete
frequency independent variable (e.g. "x[n]") and parenths for
continuous independent variable, "x(t)".  some people might use
brackets instead of parenths to group together multiple terms, but i
don't.  often i use "*" for an explicit multiply.  i seldom need a
symbol for convolution, but if i do, i use "(*)" to differentiate it
from a simple multiply.
> but if you take the variance with N samples you get S[N].
> We then add one term to the summation to get S[N+1] and write S[N+1]
> in terms of S(N) with the extra term (x[N+1])^2/(N+1). also use the fact
> that N/(N+1) = 1-1/(N+1). It works out and you get recursive variance.
> It will converge when N gets large for stationary signals. This is not
> an approximation like exponential smoothing is. (I am not claiming the
> expression with beta in it is exact - it is the other one I quoted).
>
> Ok so here we go - for N samples
>
                 N
  S[N] = (1/N) SUM (x[k])^2                    (1)
               k=1
>
> Add a term
>
                     N+1
  S[N+1] = (1/(N+1)) SUM (x[k])^2              (2)
                     k=1

                     N
         = (1/(N+1)) SUM (x[k])^2  +  (1/(N+1))*(x[N+1])^2
                     k=1

         = (N/(N+1))*S[N]  +  (1/(N+1))*(x[N+1])^2

         = S[N]  +  (1/(N+1))*( (x[N+1])^2 - S[N] )
> QED
> This is the right expression - .
fine.  nice and easily read.  took me less than 4 minutes.

now, an issue that you're not dealing with.  let's suppose, for the
time being, that the mean of x[k] is zero (or it had the mean
subtracted out) so that the mean square, S[N] represents the "sample
variance".  it is a known fact in statistical analysis that the sample
variance, as computed by the mean square of the samples, is a slightly
biased estimator of the true population variance and has to be
multiplied by N/(N-1) to unbias it.  this is in that thread that John
referred to and, because i was asked about it, i proved this
explicitly in this post:
http://groups.google.com/group/comp.dsp/msg/1d89215c51200101?dmode=source

now let's suppose there is a mean and we estimate it with

                 N
  m[N] = (1/N) SUM x[k]
               k=1

and you compute the variance as:

  V[N] = N/(N-1) * ( S[N] - (m[N])^2 )

you'll get the same variance John does for all of x[k] such that
1 <= k <= N.  but it's not a *moving* variance (as is John's), if we
use the recursive summation you do, since that summation *never*
forgets and goes all the way back to x[1].  it is a comprehensive
variance of all of the samples ever taken.  no forgetting factor.
it's like an integrator that is never reset.  that's how it is
different from what John put forth, which remembers only the most
recent N samples.

now that exponentially weighted calculation using a recursive IIR
filter *does* have a forgetting factor, and, even though i do not have
the notes on me to prove it (but if i feel like "recreational
mathematics", i can re-derive it), i do believe that a compensating
factor similar to N/(N-1) is needed to unbias the variance estimator,
but i do not remember precisely what it is, but suspect that it is
(1 + (1-beta)/2) because i seem to remember that it was related to the
"mean length" of the impulse response of either filter (the
exponentially weighted IIR or the constant weighted FIR).

r b-j
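K's recursion is easy to sanity-check numerically: iterating S[N+1] = S[N] + (1/(N+1))*((x[N+1])^2 - S[N]) reproduces the batch mean square exactly, and, as noted, it never forgets old samples. A minimal sketch (variable names mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)

s = 0.0                              # S[0]
for n, xn in enumerate(x, start=1):
    s += (xn * xn - s) / n           # S[n] = S[n-1] + (x[n]^2 - S[n-1])/n

# matches the batch mean square over ALL samples -- no forgetting factor
assert abs(s - np.mean(x ** 2)) < 1e-10
```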
Reply by May 20, 2008
On May 20, 12:19 pm, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On May 19, 7:22 pm, kronec...@yahoo.co.uk wrote:
>
>> On May 19, 10:10 pm, "John E. Hadstate" <jh113...@hotmail.com> wrote:
>>
>>> <kronec...@yahoo.co.uk> wrote in message
>>> news:3e0aadd8-6179-444b-a81d-ec7ab37f9986@d19g2000prm.googlegroups.com...
>>>
>>>> On May 19, 12:28 am, "John E. Hadstate"
>>>> <jh113...@hotmail.com> wrote:
>>>>> Then, at each time-step k, compute:
>>>>>
>>>>> X1 = (new sample value)
>>>>> X2 = X1 * X1
>>>>>
>>>>> Y1 = (oldest X1 value from X1 history buffer)
>>>>> Y2 = (oldest X2 value from X2 history buffer)
>>>>>
>>>>> Overwrite oldest value in X1 history buffer with X1.
>>>>> Overwrite oldest value in X2 history buffer with X2.
>>>>>
>>>>> SX1(k) = SX1(k-1) + X1 - Y1
>>>>> SX2(k) = SX2(k-1) + X2 - Y2
>>>>>
>>>>> Calculate Moving Average:
>>>>> M = SX1 / N
>>>>>
>>>>> Calculate Moving Variance:
>>>>> V = (N * SX2 - (SX1 * SX1)) / (N * (N - 1))
>>>>
>>>> A simpler way is to do this....
>>>>
>>>> s(k+1)=beta*s(k)+(1-beta)*u^2(k)
>>>>
>>>> where beta is a forgetting factor less than unity.
>>>>
>>>> u is the input and s is the variance.
>>>
>>> Caveat Emptor!!  This is not "a simpler way to do this".  The
>>> method described by Hadstate computes what everyone in the
>>> world agrees is the sample variance for a population size of N
>>> and correctly evaluates the sample variance in that population
>>> as new measurements replace old ones.  The method of
>>> "kronec...@yahoo.co.uk" will only accidentally produce the same
>>> results.
>
> it won't produce the same results.  it's a different formula.  it is
> computing a *different* weighted mean and mean-square.
>
>> Which method? The second method (not exponential smoothing) IS exact!
>> It is called recursive variance in the literature just as we can get
>> recursive mean in a similar way. I assume here that the signal is
>> stationary. If it isn't then essentially it is no longer variance but
>> time-varying variance. Take your pick of the many methods. Then I
>> agree the moving window approaches come into their own.
>
> i don't see how your "second method" (if i understand what you mean by
> the term) is "not exponential smoothing".  i *do* see how
>
>   y[n] = beta*y[n-1] + (1-beta)*(x[n])^2
>
> gives you a form of a moving mean square (with exponential
> weighting).  the averaging is done with an IIR filter with DC gain = 1
> and what John H says is what "everyone in the world agrees is the
> sample variance for a population size of N" is an FIR filter with DC
> gain of *almost* 1 (but is a little more to compensate for the bias in
> the estimator of variance using that weighting method).  both are
> weighted averages but apply different weightings to the averaging
> function.  your "beta" has for a counterpart "N" in John H's thing.
> the impulse responses of both averagers are:
>
>   yours:  h[n] = (1-beta) * beta^n * u[n]
>
>   John's: h[n] = 1/(N-1) * (u[n] - u[n-(N-1)])
>
> where u[n] is the unit step function, 1 for non-negative n and 0 for
> negative n.
>
> now both of your impulse responses have a mean length.  for yours it
> is 1/(1-beta) and for John's it is (N-1)/2.  if you were to equate
> those, you would have a rough comparison if you would define
>
>   beta = 1 - 2/(N-1)    or    N = 2/(1-beta) + 1
>
> now, to unbias the estimator, John has to take the naive mean of the
> variances and multiply it by N/(N-1) to get the correct estimate of
> variance.  so my question to you, K, is do you have to also unbias
> your estimate of variance by multiplying by an equivalent factor,
> perhaps by
>
>   1 + (1-beta)/2 ?
>
> that's something to think about.
>
> not to denigrate John or anything, but i had in grad school (3 decades
> ago) come up with the same basic method for moving estimates of
> variance and mean of real-time data and i implemented it on a 6800
> based system (lotsa "add-with-carry" instructions for quintuple
> precision arithmetic).  i also considered a first-order IIR averager
> like you did, K, and i would have to dig out some very old school
> papers, but i think i came up with the need to scale the estimator of
> variance by a factor similar to the above.  have you considered this
> issue?  like the FIR moving average, it doesn't make much difference
> if N is very large or if beta is very close to 1.
>
> r b-j
It's hard to explain with just a text editor but if you take the
variance with N samples you get S(N). We then add one term to the
summation to get S(N+1) and write S(N+1) in terms of S(N) with the
extra term u(N+1)^2/(N+1). also use the fact that N/(N+1) = 1-1/(N+1).
It works out and you get recursive variance. It will converge when N
gets large for stationary signals. This is not an approximation like
exponential smoothing is. (I am not claiming the expression with beta
in it is exact - it is the other one I quoted).

Ok so here we go - for N samples

S(N) = (1/N) SUM(k=1 to N) of u^2(k) ....................(1)

Add a term

S(N+1) = (1/(N+1)) SUM(k=1 to N+1) of u^2(k) ............(2)

       = (1/(N+1)) SUM(k=1 to N) of u^2(k) + (1/(N+1)) u^2(N+1)

Now N*S(N) = SUM(k=1 to N) of u^2(k) from (1)

Therefore substitute into (2)

S(N+1) = (N/(N+1)) S(N) + (1/(N+1)) u^2(N+1)

N/(N+1) = 1 - (1/(N+1))

S(N+1) = S(N) + (1/(N+1)) { u^2(N+1) - S(N) }

QED

This is the right expression - .