Rounding off problems in IIR filters

Started by kalki August 9, 2004
Hi,

I have a problem implementing a second-order high-pass IIR filter.
When I measured the step response of the filter, the response I got
differed from what I expected. When I traced it back, I found the
following.

At first I thought the problem was only the scaling of the IIR filter
coefficients. But I had not considered the rounding of the coefficients
imposed by the fixed-point DSP (TMS320F2812) used for our application.
Suppose you have the coefficients of a second-order high-pass IIR
filter. The coefficients are, of course, floating point numbers with
infinite precision. When we implement the filter on a fixed-point DSP,
we normally multiply the coefficients by some scaling factor (that is,
shift them left by, say, x bits), use them inside the DSP, and shift
the filter output right by the same x bits. So rounding takes place in
two steps: first when you multiply the coefficients by the scaling
factor and round them, and then when the DSP shifts the result back.
When I computed the frequency response of the filter using the
truncated coefficients, it differed significantly in amplitude, phase
and frequency from the response obtained with the unscaled
coefficients. I don't have any control over the rounding that the DSP
applies. The pole-zero plot also varies significantly with the rounding
of the filter coefficients. Is there any workaround for this?
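
One way to see how much coefficient rounding alone moves the poles is to quantize the denominator of a biquad and compare the roots. The following is only a sketch in Python, run off-target: the coefficient values and the helper names `quantize` and `poles` are illustrative assumptions, not the filter from this post.

```python
import cmath

def quantize(coeff, frac_bits):
    """Round a real coefficient to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(coeff * scale) / scale

def poles(a1, a2):
    """Roots of z^2 + a1*z + a2 (denominator of a biquad with a0 = 1)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

# Hypothetical denominator with poles close to z = 1, i.e. a cutoff that is
# low relative to the sample rate, which is the sensitive case.
a1, a2 = -1.9822, 0.9824

exact = poles(a1, a2)
for bits in (8, 14):
    q = poles(quantize(a1, bits), quantize(a2, bits))
    shift = max(abs(e - p) for e, p in zip(exact, q))
    print(f"{bits} fractional bits: largest pole radius "
          f"{max(abs(p) for p in q):.5f}, pole shift {shift:.2e}")
```

With poles this close to the unit circle, a coarse grid can move them a long way, so it is worth checking the quantized pole radius before blaming the run-time arithmetic.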

Any help in this regard is highly appreciated.

Thanks & Regards
KK
What is the resolution of your data?  16-bit?  Also, what is the format?
Q15?  There is an IIR library function provided by TI that implements a
16-bit IIR filter.

Also, why do you say you don't have control over your rounding?  Who does?

It sounds like you might just be making some mistakes in implementing the
fixed point math.  Most likely you're clobbering bits you shouldn't be and
that's destroying your result.  If that's not the case then you should just
keep more internal precision in your computation.  You can maintain 64-bit
precision using the 28xx devices.  There's a good example of this with the
IMACL instruction in the CPU and Instruction Set Guide.

Brad

kalki wrote:
> Hi,
>
> I have some problem in implementing a second order high pass IIR
> filter.The problem is that when I tried finding the step response of
> the filter, the response I got was varying from what was actually
> expected of it. When I traced back, I found the following problem.
>
> Earlier I had thought that the problem was only with respect to the
> scaling of the IIR filter coefficients. But I had not considered the
> effects of the inherent rounding off of the coefficients caused by the
> fixed point DSP (TMS320F2812)used for our application. Consider that u
> have the filter coefficients of a second order high pass IIR filter.
> The coefficients are ofcourse, floating point numbers with infinite
> precision.

"Floating point" refers to a binary representation of a real number, with
a fixed-length mantissa and exponent. It is _not_ infinite precision. Real
numbers have infinite precision.

> when we implement it using a fixed point DSP, we'll
> normally multiply the coefficients with some scaling factor ( that is,
> shift it left by say x no. of bits), use it inside the DSP, shift back
> the filter o/p right by the same x bits. So, rounding off takes place
> in 2 steps here, one is when u multiply the coefficients by the
> scaling factor and round it off, the later being the round off
> introduced by the DSP when u shift back the results. When I tried
> finding out the frequency response of the filter using the truncated
> coefficients, that varied significantly in amplitude, phase and the
> frequency when compared to that of the response obtained using the
> unscaled coefficients.

Welcome to the real world.

> I don't have any control over the round off which DSP gives.

Eh? Who's writing the software?

> The pole-zero plot also veries significantly based on
> the rounding off of filter coefficients. Any work around for this is
> available?
>
> Any help in this regard is highly appreciated.
>
> Thanks & Regards
> KK

If it's kinda working in 16 bit precision it will probably work just fine
in 32 bit, but as mentioned elsewhere you can go as high as 64 bit.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

kalyanik@myw.ltindia.com (kalki) wrote:

>Hi,
>
>I have some problem in implementing a second order high pass IIR
>filter.The problem is that when I tried finding the step response of
>the filter, the response I got was varying from what was actually
>expected of it. When I traced back, I found the following problem.
>

Typically there will be a point where the accumulator results are put
back into 16 bit form, at the output of the IIR. Be sure to use rounding
there. For the TI parts, you can do something like:

    mov rnd(hi(ac0)), *ar0

You mentioned you round when going from floating point to the integer
coefficient, so that's good.

If you do all that, and are sure you're using the maximum allowable
precision in the calculation, the next option is expanded word size, as
people have suggested. You can try initially going to 32 bit on either
the coefficients or the filter states. If that's not good enough, then 32
bit on both. I don't think there are many typical scenarios (IMO) where
64 bit is going to be needed.

Regards,
Robert

www.gldsp.com

( modify address for return email )

www.numbersusa.com
www.americanpatrol.com

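
For readers following along, the rounded store suggested in the reply above amounts to adding half of one 16-bit LSB to the 32-bit accumulator before keeping its upper word. A minimal Python model of that arithmetic follows; the function names are hypothetical, and the saturation step is an assumption about what one would usually want, not a statement about any specific instruction.

```python
def round_hi16(acc):
    """Upper 16 bits of a signed 32-bit accumulator, rounded to nearest.

    Adds 0x8000 (half of one 16-bit LSB) before the arithmetic right shift,
    then saturates to the signed 16-bit range.
    """
    rounded = (acc + 0x8000) >> 16
    return max(-32768, min(32767, rounded))

def trunc_hi16(acc):
    """Upper 16 bits with plain truncation (floor), for comparison."""
    return acc >> 16

for acc in (0x0001_7FFF, 0x0001_8000, -0x0001_8001, 0x7FFF_FFFF):
    print(f"{acc:>11d}: truncated {trunc_hi16(acc):>6d}, rounded {round_hi16(acc):>6d}")
```

The last test value shows why the saturation matters: adding the rounding constant to an accumulator near full scale would otherwise wrap.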
Hi,

I missed the start of this thread, but the subject is important to me
(although not exactly in the original context), so here goes...

I want to implement a 400-tap LPF IIR (say) in a fixed point DSP -
actually a Wavefront 1k device, whose MAC accumulator is only 28 bits.
Ignoring for the moment the even lower precision of the multiplier
factor, what concerns me is that the result of each multiply operation
MUST be truncated back to 28 bits before being added to the
accumulator. This would seem to introduce a pseudo-random error of up
to 1 LSB p-p into each tap.

If the final multiply result is better than 28 bits, then truncated
back to 28 bits before adding to the accumulator, the total
pseudo-random error from 400 taps is probably of the order of 20 LSBs
RMS, with a 200 LSB DC offset (easy to get rid of).

But if the truncation occurs at the unsigned multiply result, which is
then sign-restored, then the final result with a low frequency input
signal will be more like crossover distortion - the DC offset will
always be towards zero (impossible to get rid of).
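
The two truncation cases described above can be compared numerically. The sketch below is a Python model, not Wavefront code: it assumes the discarded bits of each product are uniformly distributed and independent, and the names `EXTRA_BITS`, `floor_trunc`, `zero_trunc` and `trial_errors` are made up for the illustration. Under that assumption flooring costs about 0.5 LSB per tap on average (roughly 200 LSB over 400 taps) regardless of sign, while truncating the magnitude flips the sign of the offset with the signal.

```python
import random
import statistics

TAPS = 400
EXTRA_BITS = 12  # hypothetical number of bits discarded from each product

def floor_trunc(p):
    """Discard EXTRA_BITS by arithmetic shift (rounds toward minus infinity)."""
    return p >> EXTRA_BITS

def zero_trunc(p):
    """Discard EXTRA_BITS from the magnitude, then restore the sign (rounds toward zero)."""
    return -((-p) >> EXTRA_BITS) if p < 0 else p >> EXTRA_BITS

def accumulated_error(products, trunc):
    """Error of the truncated sum relative to the exact sum, in accumulator LSBs."""
    exact = sum(products) / (1 << EXTRA_BITS)
    return sum(trunc(p) for p in products) - exact

def trial_errors(trunc, sign, trials=500, seed=1):
    """Errors over many trials; sign = +1 or -1 sets the polarity of every product."""
    rng = random.Random(seed)
    return [accumulated_error([sign * rng.randrange(1 << 24) for _ in range(TAPS)], trunc)
            for _ in range(trials)]

for name, trunc in (("floor", floor_trunc), ("toward zero", zero_trunc)):
    for sign in (+1, -1):
        errs = trial_errors(trunc, sign)
        print(f"{name:12s} sign {sign:+d}: mean {statistics.mean(errs):8.1f} LSB, "
              f"std {statistics.pstdev(errs):5.2f} LSB")
```

A constant offset (the floor case) can be subtracted afterwards; an offset that follows the sign of the signal cannot, which is the crossover-like behaviour the post is worried about.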

All discussion will be welcome.

Tony (remove the "_" to reply by email)
Hi,

I would be interested in knowing the application that demands a
400-tap IIR LPF. With just 28 bits of accumulator, implementing a
400-tap IIR filter might not be that easy. How many bits wide are your
input and the coefficients? And how did you decide your scaling factor?
The scaling factor is very important for avoiding overflow. The
rounding error you mention might alter your frequency response
significantly and shift your cutoff frequency from the expected one.
Even with 5 taps and a 32-bit accumulator it is very difficult to
implement an IIR filter without overflow and round-off errors, so
implementing a 400-tap filter is likely to be a problem. What about
your storage registers, I mean how many bits wide are they? You have to
store your intermediate results as well, right?

One more thing I would like to know: how will you determine the number
of output bits at the nth tap, given the number of bits at the input
and in the filter coefficients?

Tony wrote:

> Him
>
> I missed the start of this thread, but the subject is important to me
> (although not exactly in the original context), so here goes...
>
> I want to implement a 400-tap LPF IIR (say) in a fixed point DSP -
> actually a Wavefront 1k device, whose MAC accumulator is only 28 bits.
> Ignoring for the moment the even lower precision of the multiplier
> factor, what concerns me is that the result of each multiply operation
> MUST be truncated back to 28 bits before being added to the
> accumulator. This would seem to introduce a pseudo-random error into
> each of up to 1 LSB p-p per tap.
>
> If the final multiply result is better than 28 bits, then truncated
> back to 28 bits before adding to the accumulator, the total
> pseudo-random error from 400 taps is probably of the order of 20 LSBs
> RMS, with a 200 LSB DC offset (easy to get rid of).
>
> But if the truncation occurs at the unsigned multiply result, which is
> then sign-restored, then the final result with a low frequency input
> signal will be more like crossover distortion - the DC offset will
> always be towards zero (impossible to get rid of).
>
> All discussion will be welcome.
>
> Tony (remove the "_" to reply by email)

One doesn't usually characterize IIR filters in terms of taps, although
it's possible. Are you using a cascade of 200 2nd-order sections?

Jerry
--
... the worst possible design that just meets the specification - almost
a definition of practical engineering. .. Chris Bore

I can't believe I did that - I was thinking FIR, but for some reason
typed IIR (maybe because everything else I was doing was IIR, and my
brain wasn't fully functional?). My apologies to all!

Could you and Jerry please reconsider my questions in this light? I'm
hoping that most of your concerns go away now, but in any case, all
the data is 28 bits, although the input is only really good for 18
bits at best. The application is a SOS (speaker-on-stick) loudspeaker
crossover, probably in the 2-3kHz region (sampled at 48kHz), and I
expect to create the HPF by subtracting the LPF output from the
(mid-tap-delayed) input, plus additional delay to time-align the
acoustic outputs. There will also be a few biquad IIRs for speaker
response anomaly correction and for the sub crossover (at 125Hz - and
yes, that also has error-inducing potential).
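
The subtraction scheme described above relies on the low-pass being a symmetric (linear-phase) FIR, so that its delay is a fixed number of samples and the input can be delayed to match. Here is a small Python sketch of the idea; the 5-tap filter is only a stand-in for the real design, and the function names are hypothetical.

```python
def fir(taps, x):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with zero initial state."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]

def complementary_highpass(lp_taps, x):
    """High-pass output formed as (input delayed to the mid tap) minus (low-pass output)."""
    assert len(lp_taps) % 2 == 1, "needs an odd-length symmetric low-pass"
    delay = (len(lp_taps) - 1) // 2
    delayed = [0.0] * delay + list(x[:len(x) - delay])
    low = fir(lp_taps, x)
    return [d - l for d, l in zip(delayed, low)]

# Toy 5-tap symmetric low-pass with unity DC gain (stand-in for the long FIR).
lp = [0.1, 0.2, 0.4, 0.2, 0.1]

step = [1.0] * 12
print([round(v, 3) for v in complementary_highpass(lp, step)])
```

By construction the two outputs sum back to the delayed input. Note that an exact mid-tap delay needs an odd number of taps; with an even length such as 400 the delay of a symmetric filter falls halfway between two samples.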

Tony (remove the "_" to reply by email)
Hi,

I am relatively new to the DSP field. I've explained my problem below;
please bear with me in case the question or my observation is absurd.

Let me explain clearly what I am doing and what the problem is. The
filter design was done in Matlab. Once I had the coefficients, I had to
bring them into my DSP as fixed-point numbers, so I had to find a
scaling factor. I found it using the following relation,

max(coefft) * 2^s <= 2^(M-1)

where max(coefft) is the maximum value of the filter coefficients, s
is the scaling shift in bits (the scaling factor is 2^s) and M is the
number of bits I want my coefficients to have. Since I use signed
arithmetic, it becomes M-1. Currently I've designed for M = 16. I am
using the TMS320F2812; it has a 32-bit accumulator and every memory
location is 16 bits wide. My input to the filter is 12 bits and the
coefficients are 16 bits. The filter implemented is a second-order IIR
high-pass filter. The DAC that converts the filter output to analog is
a 16-bit DAC.
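
The relation above can be solved for the largest usable s. The following Python sketch makes two assumptions beyond the post: it uses the largest absolute coefficient value, and it limits the scaled result to 2^(M-1) - 1, since that is the largest positive value a signed M-bit word can hold. The coefficient values and the names `scale_shift` and `to_fixed` are hypothetical.

```python
import math

def scale_shift(coeffs, word_bits=16):
    """Largest s such that round(c * 2**s) fits a signed word_bits integer for every c."""
    limit = (1 << (word_bits - 1)) - 1          # e.g. 32767 for 16 bits
    peak = max(abs(c) for c in coeffs)
    s = math.floor(math.log2(limit / peak))
    # Guard against floating-point error in log2 pushing the peak over the limit.
    while round(peak * 2.0 ** s) > limit:
        s -= 1
    return s

def to_fixed(coeffs, s):
    """Scale by 2**s and round to the nearest integer."""
    return [round(c * 2.0 ** s) for c in coeffs]

# Hypothetical second-order high-pass coefficients (b0, b1, b2, a1, a2).
coeffs = [0.9911, -1.9822, 0.9911, -1.9822, 0.9824]
s = scale_shift(coeffs)
print(s, to_fixed(coeffs, s))
```

For a biquad whose largest coefficient magnitude is just under 2, this lands on a Q14-style format, which leaves 14 fractional bits for the coefficients.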

Inside the DSP, I use the scaled filter coefficients; that is, I load
them as, say, b0 = round(b00*2^s), where b00 is the coefficient
obtained from Matlab. The output is then computed using the equation,

y(n) = b0x(n) + b1x(n-1) + b2x(n-2) - a1y(n-1) - a2y(n-2)

Finally, once I have y(n), I right-shift it by s bits, since I
multiplied the coefficients by 2^s when I loaded them into the DSP.
This right shift always performs a "floor" operation. Flooring works
fine if the filter input is positive, in the sense that the difference
between the expected output and the floored output (say delta) is at
its minimum. For negative inputs, delta is too high. (I verified this
in Matlab as well: I gave positive and negative inputs to the filter
and compared the expected output with the floored output.)
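
The scheme described so far can be modelled off-target to see where the floored output departs from the ideal one. This is a Python sketch, not the assembly from this post: the coefficients, the shift S = 14 and the function names are assumptions. Python's `>>` on negative integers is an arithmetic shift that floors, which matches the behaviour described. The reference uses the same quantized coefficients in floating point, so the printed difference comes from the output shift only.

```python
def biquad_fixed(x, b, a, s):
    """Direct form I biquad on integers; b and a are coefficients pre-scaled by 2**s.

    y[n] = (b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]) >> s
    The final shift floors, i.e. rounds toward minus infinity.
    """
    x1 = x2 = y1 = y2 = 0
    out = []
    for xn in x:
        acc = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        yn = acc >> s
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def biquad_float(x, b, a):
    """Same difference equation in floating point, as the reference."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

S = 14
b_q = [round(c * 2**S) for c in (0.9911, -1.9822, 0.9911)]  # hypothetical high-pass
a_q = [round(c * 2**S) for c in (-1.9822, 0.9824)]
b_r = [c / 2**S for c in b_q]
a_r = [c / 2**S for c in a_q]

for level in (+1000, -1000):
    step = [level] * 200
    fixed = biquad_fixed(step, b_q, a_q, S)
    ref = biquad_float(step, b_r, a_r)
    worst = max(abs(f - r) for f, r in zip(fixed, ref))
    print(f"step {level:+d}: worst |fixed - float| = {worst:.1f} LSB")
```

Because each floored y(n) is fed back into the recursion, the error is not necessarily confined to one LSB of the output; running the model for both polarities shows how it accumulates for a given set of coefficients.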

On the other hand, taking the "ceil" of the final filter output (for
negative input values) seems to give a better response; that is, delta
is again at its minimum. But I don't know how to do a ceil operation on
my DSP. My code is written entirely in assembly language, so it would
be best if someone could show me how to do this in assembly.
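
Flooring positive values and taking the ceiling of negative ones is the same as truncating toward zero. A common alternative that treats both signs alike is to add half of an output LSB before the shift, which rounds to nearest. The Python sketch below compares the three options; the function names are hypothetical and it models the arithmetic only, not any F2812 instruction sequence.

```python
def shift_floor(v, s):
    """Plain arithmetic right shift: rounds toward minus infinity."""
    return v >> s

def shift_toward_zero(v, s):
    """Floor for positive values, ceiling for negative ones."""
    return -((-v) >> s) if v < 0 else v >> s

def shift_nearest(v, s):
    """Round to nearest (ties upward): add half an output LSB, then shift."""
    return (v + (1 << (s - 1))) >> s

S = 4
values = range(-1600, 1601)
for name, fn in (("floor", shift_floor),
                 ("toward zero", shift_toward_zero),
                 ("nearest", shift_nearest)):
    errs = [fn(v, S) - v / (1 << S) for v in values]
    mean = sum(errs) / len(errs)
    worst = max(abs(e) for e in errs)
    print(f"{name:12s} mean error {mean:+.3f} LSB, worst {worst:.3f} LSB")
```

In assembly the round-to-nearest variant is just one extra add of the constant 2^(s-1) to the accumulator before the existing right shift, so it needs no sign test.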

Any help in this regard is highly appreciated. Thanks.

Kalki