
simple perspective in using floats for recursive moving-average

Started by Randy Yates December 26, 2016
On Tue, 27 Dec 2016 07:22:53 -0500, Randy Yates wrote:

> Tim Wescott <seemywebsite@myfooter.really> writes:
>
>> On Mon, 26 Dec 2016 17:03:55 -0500, Randy Yates wrote:
>>
>>> Yup. And admittedly I don't understand floating-point truncation.
>>
>> I just tried for a nice analogy, and failed. But imagine that you want
>> to add 1/3 to 4/3, in a floating point notation that only allows an
>> 8-bit mantissa. 1/3 = 1r0101010x2^-2. 4/3 = 1r0101010x2^0. Note that
>> truncation has already happened, because 1/3 is a repeating "decimal"
>> in binary -- ignore that, and just go with the above values. To add
>> them, you first line them up, then add:
>>
>>       r010101010
>>   +  1r0101010
>>      ------------
>>      1r101010010
>>
>> This answer is exact in the stored numbers, but it is 10 bits. So, you
>> whack off the two trailing bits:
>>
>>   answer = 1r1010100x2^0
>>
>> This answer is off by (if I'm counting digits correctly) 2^-8. Every
>> time you do an operation, the answer is off.
>>
>> It gets more complicated in the real world, but this is a summation of
>> the effect.
>
> Right. Pretty straightforward. Although I wonder what complicates this?
There's more than one rounding method, and on some floating point hardware it's selectable. Also, on some floating point hardware (notably just about anything from Intel), the FPU registers are longer than the external word -- IIRC it's 80 bits for double; I don't know if they support longer words for single-precision. So not only is the truncation sometimes not so bad, but it's sensitive to how often the numbers are fetched from the FPU and put back in.

For at least one of my customers I'm doing some serious number crunching on code written in C++. I use Linux, my customer uses Windows. I get oh-so-slightly different answers running the exact same data through executables compiled with the exact same code, only with different tool chains.

Personally, when it gets down to that level of detail I figure that the most sensible thing to do is to run away screaming -- but sometimes you have to just tough it out.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
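[To make the quoted truncation example and the point about selectable rounding concrete, here is a minimal C sketch. It is my own illustration, not code from the thread: it adds 1/3 and 4/3 in single precision, compares against a wider sum of the same operands, and then repeats the add under a different IEEE-754 rounding mode via the standard <fenv.h> interface. Whether the mode change actually takes effect can depend on the compiler; with GCC, building with -frounding-math is the usual precaution.]

    #include <stdio.h>
    #include <fenv.h>

    /* volatile keeps the compiler from folding the arithmetic at
       compile time under the default rounding mode. */
    static volatile float a, b;

    int main(void)
    {
        a = 1.0f / 3.0f;   /* already rounded: 1/3 repeats forever in binary */
        b = 4.0f / 3.0f;

        /* The float sum is squeezed back into a 24-bit significand; the
           double sum of the same two operands is exact, so the difference
           is exactly the rounding error of the single-precision add. */
        float  s_near = a + b;
        double s_wide = (double)a + (double)b;
        printf("round to nearest : %.10f (error %g)\n",
               s_near, (double)s_near - s_wide);

        /* Same operation with truncation (round toward zero), as in the
           pencil-and-paper example quoted above. */
        fesetround(FE_TOWARDZERO);
        float s_trunc = a + b;
        printf("round toward zero: %.10f (error %g)\n",
               s_trunc, (double)s_trunc - s_wide);

        fesetround(FE_TONEAREST);  /* restore the default */
        return 0;
    }

[The two reported errors differ in sign and size, which is the "more than one rounding method" point in miniature.]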
Tim Wescott <seemywebsite@myfooter.really> writes:

> On Tue, 27 Dec 2016 07:22:53 -0500, Randy Yates wrote:
>
>> Tim Wescott <seemywebsite@myfooter.really> writes:
>>
>>> On Mon, 26 Dec 2016 17:03:55 -0500, Randy Yates wrote:
>>>
>>>> Yup. And admittedly I don't understand floating-point truncation.
>>>
>>> I just tried for a nice analogy, and failed. But imagine that you want
>>> to add 1/3 to 4/3, in a floating point notation that only allows an
>>> 8-bit mantissa. 1/3 = 1r0101010x2^-2. 4/3 = 1r0101010x2^0. Note that
>>> truncation has already happened, because 1/3 is a repeating "decimal"
>>> in binary -- ignore that, and just go with the above values. To add
>>> them, you first line them up, then add:
>>>
>>>       r010101010
>>>   +  1r0101010
>>>      ------------
>>>      1r101010010
>>>
>>> This answer is exact in the stored numbers, but it is 10 bits. So, you
>>> whack off the two trailing bits:
>>>
>>>   answer = 1r1010100x2^0
>>>
>>> This answer is off by (if I'm counting digits correctly) 2^-8. Every
>>> time you do an operation, the answer is off.
>>>
>>> It gets more complicated in the real world, but this is a summation of
>>> the effect.
>>
>> Right. Pretty straightforward. Although I wonder what complicates this?
>
> There's more than one rounding method, and on some floating point
> hardware it's selectable. Also, on some floating point hardware (notably
> just about anything from Intel), the FPU registers are longer than the
> external word -- IIRC it's 80 bits for double; I don't know if they
> support longer words for single-precision. So not only is the truncation
> sometimes not so bad, but it's sensitive to how often the numbers are
> fetched from the FPU and put back in.
It was 80 bits back in the '80s, when the 387 was a separate floating-point coprocessor chip, I believe.

I was thinking complications are also associated with NaNs, INFs, denormals, etc.
> For at least one of my customers I'm doing some serious number crunching
> on code written in C++. I use Linux, my customer uses Windows. I get
> oh-so-slightly different answers running the exact same data through
> executables compiled with the exact same code, only with different tool
> chains.
Are you dealing with data whose magnitudes are very near zero (i.e., at the minimum exponent)? If so, it may have something to do with the denormal modes used:

https://en.wikipedia.org/wiki/Denormal_number

but this is just a guess.
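[For what it's worth, a tiny sketch of my own, not anything from the thread, showing what that near-zero territory looks like in single precision. A build or platform that flushes denormals to zero would print 0 for the second value, which is one plausible way two tool chains give slightly different answers on near-zero data.]

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        float smallest_normal = FLT_MIN;                 /* ~1.18e-38, smallest normal float */
        float subnormal       = smallest_normal / 8.0f;  /* drops into the denormal range    */

        printf("FLT_MIN     = %g  (normal? %s)\n",
               smallest_normal, isnormal(smallest_normal) ? "yes" : "no");
        printf("FLT_MIN / 8 = %g  (normal? %s)\n",
               subnormal, isnormal(subnormal) ? "yes" : "no");
        return 0;
    }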
> Personally, when it gets down to that level of detail I figure that the
> most sensible thing to do is to run away screaming -- but sometimes you
> have to just tough it out.
Makes me want to run screaming to fixed-point.

--
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
I use these filters all the time, but there's this nagging worry that a single-cycle math error (caused by a glitch or an alpha particle or whatever) will produce a permanent DC offset in the output. If you want to get fancy you can detect this and reset your filter.

Bob
On Fri, 30 Dec 2016 05:56:16 -0800, radams2000 wrote:

> I use these filters all the time, but there's this nagging worry that a
> single-cycle math error (caused by a glitch or an alpha particle or
> whatever) will produce a permanent DC offset in the output. If you
> want to get fancy you can detect this and reset your filter.
>
> Bob
The technique used in CIC filters makes that a non-problem -- instead of saving the vector of past data and subtracting out the n-th oldest one from the sum (and thus preserving any errors to the sum forever), save a vector of past sums and subtract out the n-th oldest one. If you use wrap-on-overflow arithmetic, then your alpha-particle data-smash will only affect the filter output for at most n samples -- the same as if it happened to the input data.

In general it means you're saving data with a wider word length, but you're picking up some dependability.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
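[In code, Tim's suggestion might look something like the following minimal C sketch; the names and the length N are mine, not anything from the thread. Keep a history of the running sum in unsigned, wrap-on-overflow arithmetic and subtract the entry from N samples ago. An error injected into either the running sum or the history ages out of the output after at most N samples, exactly as described above.]

    #include <stdint.h>

    #define N 16                      /* moving-average length (example value) */

    static uint32_t sum_hist[N];      /* the last N values of the running sum  */
    static uint32_t running_sum;      /* x[n] = x[n-1] + u[n], modulo 2^32     */
    static unsigned idx;              /* circular index into sum_hist          */

    /* Returns the boxcar sum of the most recent N inputs. */
    uint32_t moving_sum(uint32_t input)
    {
        running_sum += input;                 /* wraps harmlessly on overflow   */

        uint32_t oldest_sum = sum_hist[idx];  /* running sum from N samples ago */
        sum_hist[idx] = running_sum;
        idx = (idx + 1) % N;

        /* (B - A) mod 2^32 recovers the sum of the last N inputs even if the
           running sum has wrapped in between, provided the true boxcar sum
           always fits in 32 bits. */
        return running_sum - oldest_sum;
    }

[The usual caveat applies: the accumulator word must be wide enough that the true N-sample sum never exceeds it, which is the "wider word length" Tim mentions.]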
Tim

That's true of CIC decimation filters. However, Randy's post made no mention of rate changes, so he was referring to a filter where you take the difference between the input and output of a delay line and integrate the result, in which case an error will cause a permanent DC offset.
When using CIC filters for changing the rate, decimation and interpolation behave differently. In decimation mode, the integrators come first (followed by differentiation after the decimation) and wrap around as you described. With interpolation, the differentiators come first, followed by integrators at the high rate, and those are the structures that will turn into white noise generators with a single math error. 

Bob
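[For reference, here is a rough single-stage C sketch of the decimation ordering Bob describes: integrator at the input rate, decimate by R, comb (differentiator) at the output rate. This is my own illustration with made-up names, not code from the thread; real CIC designs cascade several such stages and size the word width for the full filter gain.]

    #include <stdint.h>
    #include <stddef.h>

    #define R 8                       /* decimation ratio (example value) */

    /* Consumes R input samples and produces one output sample. */
    uint32_t cic1_decimate(const uint32_t *in, uint32_t *integrator,
                           uint32_t *comb_delay)
    {
        /* Integrator runs at the high (input) rate. */
        for (size_t k = 0; k < R; k++)
            *integrator += in[k];              /* wraps on overflow, by design */

        /* Comb runs at the low (output) rate: y = v[m] - v[m-1]. */
        uint32_t v   = *integrator;
        uint32_t out = v - *comb_delay;        /* modular subtraction */
        *comb_delay  = v;
        return out;
    }

[With wrap-on-overflow arithmetic, the comb's modular subtraction recovers the boxcar sum of the last R inputs even when the integrator wraps, which is the property the next post leans on.]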
On Sun, 01 Jan 2017 07:11:42 -0800, radams2000 wrote:

> Tim
>
> That's true of CIC decimation filters. However, Randy's post made no
> mention of rate changes, so he was referring to a filter where you take
> the difference between the input and output of a delay line and
> integrate the result, in which case an error will cause a permanent DC
> offset.
>
> When using CIC filters for changing the rate, decimation and
> interpolation behave differently. In decimation mode, the integrators
> come first (followed by differentiation after the decimation) and wrap
> around as you described. With interpolation, the differentiators come
> first, followed by integrators at the high rate, and those are the
> structures that will turn into white noise generators with a single
> math error.
Doesna matter -- do the math. Instead of saving past values of the input, save past values of the sum.

Yes, you have to save ALL past values of the sum, and not just the ones on the decimation boundary, but then, you don't have that niggling worry about persistent errors.

Because -- (B - A) mod C is still (B - A) mod C, even if you change the details of how you got there.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
Tim Wescott <tim@seemywebsite.com> writes:

> On Fri, 30 Dec 2016 05:56:16 -0800, radams2000 wrote:
>
>> I use these filters all the time, but there's this nagging worry that a
>> single-cycle math error (caused by a glitch or an alpha particle or
>> whatever) will produce a permanent DC offset in the output. If you
>> want to get fancy you can detect this and reset your filter.
>>
>> Bob
>
> The technique used in CIC filters makes that a non-problem -- instead of
> saving the vector of past data and subtracting out the n-th oldest one
> from the sum (and thus preserving any errors to the sum forever), save a
> vector of past sums and subtract out the n-th oldest one. If you use
> wrap-on-overflow arithmetic, then your alpha-particle data-smash will
> only affect the filter output for at most n samples -- the same as if it
> happened to the input data.
>
> In general it means you're saving data with a wider word length, but
> you're picking up some dependability.
Tim,

I don't understand. Are you saying form a filter with this difference equation:

   y[n] = y[n - 1] - y[n - N] + x[n] ?

If my math is correct, this works out to a z-transform of

   Y(z)/X(z) = 1/(1 - z^(-1) + z^(-N)),

and this is not the same as a boxcar sum, is it?

--
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
On Sun, 01 Jan 2017 22:09:42 -0500, Randy Yates wrote:

> Tim Wescott <tim@seemywebsite.com> writes:
>
>> On Fri, 30 Dec 2016 05:56:16 -0800, radams2000 wrote:
>>
>>> I use these filters all the time, but there's this nagging worry that
>>> a single-cycle math error (caused by a glitch or an alpha particle or
>>> whatever) will produce a permanent DC offset in the output. If you
>>> want to get fancy you can detect this and reset your filter.
>>>
>>> Bob
>>
>> The technique used in CIC filters makes that a non-problem -- instead
>> of saving the vector of past data and subtracting out the n-th oldest
>> one from the sum (and thus preserving any errors to the sum forever),
>> save a vector of past sums and subtract out the n-th oldest one. If
>> you use wrap-on-overflow arithmetic, then your alpha-particle
>> data-smash will only affect the filter output for at most n samples --
>> the same as if it happened to the input data.
>>
>> In general it means you're saving data with a wider word length, but
>> you're picking up some dependability.
>
> Tim,
>
> I don't understand. Are you saying form a filter with this difference
> equation:
>
>    y[n] = y[n - 1] - y[n - N] + x[n] ?
>
> If my math is correct, this works out to a z-transform of
>
>    Y(z)/X(z) = 1/(1 - z^(-1) + z^(-N)),
>
> and this is not the same as a boxcar sum, is it?
Given u[n] in and y[n] out:

   y[n] = y[n - 1] + u[n] - u[n - N]

vs.

   y[n] = sum_{k = n - N + 1}^{n} u[k]

vs.

   x[n] = x[n - 1] + u[n]
   y[n] = x[n] - x[n - N]

Unless I've done something tremendously stupid with my math, they all have the transfer function

   H(z) = (1 - z^-N)/(1 - z^-1) = 1 + z^-1 + ... + z^-(N-1).

The first is "traditional", the second is the canonical FIR way of doing it. The third looks like it'd have all sorts of overflow problems, and would if you were using floats, but if you're using fixed-point overflow arithmetic ("typical C implementation" fixed point), then it'll be just fine.

Looked at another way, the first and third cascade a difference that spans N taps and an integrator -- it's just that the third method switches the order of operations.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
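[A quick sanity check of that claim, as a C sketch. The test harness and names are my own, not from the thread: with bounded inputs and unsigned wrap-on-overflow arithmetic, the three formulations produce identical outputs sample for sample, including during the first N samples of start-up.]

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8        /* boxcar length */
    #define LEN 1000   /* test length   */

    int main(void)
    {
        uint32_t u[LEN];
        for (int n = 0; n < LEN; n++)
            u[n] = (uint32_t)rand() & 0xFFFF;   /* bounded so the true sum fits */

        uint32_t y1 = 0, y2, x = 0;
        uint32_t x_hist[N] = {0};               /* history of the running sum x */
        int idx = 0;

        for (int n = 0; n < LEN; n++) {
            /* Form 1: y[n] = y[n-1] + u[n] - u[n-N]             */
            y1 += u[n] - (n >= N ? u[n - N] : 0);

            /* Form 2: direct boxcar sum of the last N inputs     */
            y2 = 0;
            for (int k = (n >= N - 1 ? n - N + 1 : 0); k <= n; k++)
                y2 += u[k];

            /* Form 3: x[n] = x[n-1] + u[n]; y[n] = x[n] - x[n-N] */
            x += u[n];
            uint32_t y3 = x - x_hist[idx];
            x_hist[idx] = x;
            idx = (idx + 1) % N;

            assert(y1 == y2 && y2 == y3);
        }
        puts("all three forms agree");
        return 0;
    }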
On 12/25/2016 11:27 PM, Randy Yates wrote:
> A colleague was explaining that when he tried using floats in a
> recursive moving-average filter, there was some residual error which
> wasn't present in the standard (non-recursive) moving-average filter.
>
> Would it be correct and reasonable to frame the problem like this:
>
> In the recursive moving-average implementation, the output y[n - 1]
> needs to store perfect knowledge of the past N states, so that when you
> subtract the input N - 1 samples back from y[n - 1], you perfectly
> maintain the sum of inputs x[n - N + 2] to x[n - 1]. However, it does
> not.
>
> In general, every addition/subtraction to y[n - m], m < N, produces a
> quantization error; this quantization error is propagated into the new
> output y[n].
>
> So it is the fact that the output "state" of the filter (i.e., the
> output y[n - 1]) is not perfect, i.e., the state of the filter is
> tainted, that creates this residual error.
>
> This also illustrates why a fixed-point implementation (using the
> appropriate intermediate accumulator width) of the same filter
> works: the output state is known perfectly.
>
> Correct? Good way to view it? (Trivial???)
Am I correct in thinking the recursive version of the filter is one where the current value of the output is calculated from the previous value of the output, the current value of the input and an older value of the input? The idea being that the recursive calculation is fewer steps than the "standard" approach where each output is calculated from a series of values of the input.

--
Rick C
rickman <gnuarm@gmail.com> writes:

> On 12/25/2016 11:27 PM, Randy Yates wrote:
>> A colleague was explaining that when he tried using floats in a
>> recursive moving-average filter, there was some residual error which
>> wasn't present in the standard (non-recursive) moving-average filter.
>>
>> Would it be correct and reasonable to frame the problem like this:
>>
>> In the recursive moving-average implementation, the output y[n - 1]
>> needs to store perfect knowledge of the past N states, so that when you
>> subtract the input N - 1 samples back from y[n - 1], you perfectly
>> maintain the sum of inputs x[n - N + 2] to x[n - 1]. However, it does
>> not.
>>
>> In general, every addition/subtraction to y[n - m], m < N, produces a
>> quantization error; this quantization error is propagated into the new
>> output y[n].
>>
>> So it is the fact that the output "state" of the filter (i.e., the
>> output y[n - 1]) is not perfect, i.e., the state of the filter is
>> tainted, that creates this residual error.
>>
>> This also illustrates why a fixed-point implementation (using the
>> appropriate intermediate accumulator width) of the same filter
>> works: the output state is known perfectly.
>>
>> Correct? Good way to view it? (Trivial???)
>
> Am I correct in thinking the recursive version of the filter is one
> where the current value of the output is calculated from the previous
> value of the output, the current value of the input and an older value
> of the input? The idea being that the recursive calculation is fewer
> steps than the "standard" approach where each output is calculated
> from a series of values of the input.
Hi rickman,

Yes.

   y[n] = y[n - 1] - x[n - N] + x[n]

--
Randy Yates, DSP/Embedded Firmware Developer
Digital Signal Labs
http://www.digitalsignallabs.com
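[Closing the loop on the original question, here is a small experiment, my own sketch rather than anything posted in the thread, that shows the residual error directly: run that recursive form in single precision and compare it, sample by sample, against the boxcar sum recomputed from scratch in double precision. The recursive state accumulates rounding error that the direct sum never sees.]

    #include <stdio.h>
    #include <stdlib.h>

    #define N 64       /* moving-average length */
    #define LEN 100000 /* number of samples     */

    int main(void)
    {
        static float x[LEN];
        for (int n = 0; n < LEN; n++)
            x[n] = (float)rand() / (float)RAND_MAX;   /* arbitrary test signal */

        float  y_rec   = 0.0f;   /* recursive running sum, single precision */
        double max_err = 0.0;

        for (int n = 0; n < LEN; n++) {
            /* Recursive form: y[n] = y[n-1] + x[n] - x[n-N] */
            y_rec += x[n];
            if (n >= N)
                y_rec -= x[n - N];

            /* Reference: recompute the boxcar sum from scratch in double. */
            double y_ref = 0.0;
            for (int k = (n >= N - 1 ? n - N + 1 : 0); k <= n; k++)
                y_ref += (double)x[k];

            double err = (double)y_rec - y_ref;
            if (err < 0) err = -err;
            if (err > max_err) max_err = err;
        }

        /* The maximum deviation grows as more samples pass through the
           recursive state; the direct sum has no such accumulation. */
        printf("max |recursive - direct| = %g\n", max_err);
        return 0;
    }

[Dividing by N to get the average doesn't change the picture; the drift lives in the running sum, which is why the fixed-point version with an adequate accumulator width stays exact.]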