# saturation in iir filter

Started by April 1, 2008
Hi all,

i am currently implementing an iir filter on a arm processor that doesn't
have a fractionnal multiplication.
As i have read on this forum i divided my filter coefficient (bo b1 b2 a1
a2) by 2 if they are >1 to be able to perform a q1.15 by q1.15
multiplication
the results of multiplications are therefore q2.30.
I accumulate those multiplication in a 64 bit register , multiply the
result by 2 to compensate for coeffs normalization

ie: accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] +
a2\2*y[n-2]
accum += accum; (to compensate for dividing by 2 the coefficient)

then i round and saturate the result the following way:
if accum > 0x3fffffff
result = 0x7fff (going back to a 16 bit value)
else if accum < 0x80000000
result = 0x8000
else
result = (accu +(1<<15)) >>15 (rouding and going back from q2.30 to
q1.15)

i am then storing this result in the memory line as y[n-1]

my question is the following: my input signal has a very big dynamic range
(many samples above 30000 or below -30000) and when i am filtering it with
a high pass filter (fc = 300hz) i get an output signal with a lot of
saturation which produces hearable artifacts: i am doing something wrong
in my calculation or is the only solution to scale down the input signal
or to divided the b's coefficient by some constant to avoid saturation?

read on this forum on the subject :-) )

marianne wrote:
> Hi all, > > i am currently implementing an iir filter on a arm processor that doesn't > have a fractionnal multiplication. > As i have read on this forum i divided my filter coefficient (bo b1 b2 a1 > a2) by 2 if they are >1 to be able to perform a q1.15 by q1.15 > multiplication > the results of multiplications are therefore q2.30. > I accumulate those multiplication in a 64 bit register , multiply the > result by 2 to compensate for coeffs normalization > > ie: accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] + > a2\2*y[n-2]
So the accum is computed as the 32 bit number and then casted to 64 bits, huh.
> accum += accum; (to compensate for dividing by 2 the coefficient) > > then i round and saturate the result the following way: > if accum > 0x3fffffff
^^^^^^^^^^^^^^^^^^^^^^^^ ????? if(accum > 0x80000000)
> result = 0x7fff (going back to a 16 bit value) > else if accum < 0x80000000 > result = 0x8000
You need to try harder. Vladimir Vassilevsky DSP and Mixed Signal Design Consultant http://www.abvolt.com
Hello Marianne

Please give the filter coefficients also. So we
can see that the coefficients of the feedback
section is implemented correctly.

I suspect that there is a sign error while implementing
feedback coefficients, and that might be causing multiple
rollovers.

Regards
Bharat Pathak

Arithos Designs
www.Arithos.com

Sorry i did many typing errors:

my accumulation equation is
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Sorry i did many typing errors:

my accumulation equation is
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Sorry i did many typing errors:

my accumulation equation is
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Hi marianne,

"marianne" <mary1331@hotmail.com> writes:
> [...] > and finally i multiply the accum value by 2 (accum = accum + accum)
If your machine doesn't have a saturation mode, then this step is absolutely pointless and potentially harmful. You never want to shift a final result left (which is what multiplying by 2 does) since you risk overflowing and there is no reason to do it. Also, depending on the reason for the shift, you either a) change the scaling of the result, or b) change the gain of the overall filter. You almost certainly don't want to do either of these at this stage. Additionally, if you did want to perform b) for some reason, you can perform the function "virtually" by simply reinterpreting the scaling without risking an overflow.
> and then compare it to the max values of my q2.30 format ie 0x3fffffff > and 0xC0000000 to check if i have an overflow or not before rouding > and scaling down to q1.15: ie if (accum > 0x3fffffff) then i saturate > my 16bits result to 0x7fff if (accum < 0xC0000000) then i saturate my > 16bits result to 0x8000 else i round it and shift it by 15 to get back > to a q1.15
What I would do, instead, is first shift the accumulator right by 15 (since this changes the scaling to q17.15), then perform your saturation logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to round, add 0x00004000 to the accumulator before shifting right by 15. If you want to protect the round operation from overflow, shift the accumulator right 1 first, then add 0x00002000, and then shift right by 14 and perform the saturation logic. This is assuming your machine doesn't have built-in saturation abilities. If it did, then you would do things quite a bit differently in order to make use of them and save some cycles. -- % Randy Yates % "Midnight, on the water... %% Fuquay-Varina, NC % I saw... the ocean's daughter." %%% 919-577-9882 % 'Can't Get It Out Of My Head' %%%% <yates@ieee.org> % *El Dorado*, Electric Light Orchestra http://www.digitalsignallabs.com

>If your machine doesn't have a saturation mode, then this step is >absolutely pointless and potentially harmful. You never want to shift a >final result left (which is what multiplying by 2 does) since you risk >overflowing and there is no reason to do it.
my machine doesn't have a saturation mode. I was doing this multiplication by two of my accumulator because i read the following from rbj on this forum on this issue of having coeffs out of the range [-1 1[ """"""""""""""""""""""""""""""""""""""""""""""""""""" y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] you may not scale the a1 or a2 coefficients according to any other criteria. you *may* scale the numerator coefficients (b0, b1, b2) by a common factor and, except for quantizing or overflow issues, that will change only the level of the output. now since a1 has a range that exceeds what you can put in a 1.15 number format, then what is usually done is that all 5 coefficients are represented with 1/2 of their correct value and the resulting sum is doubled (sometimes with an ASL instruction) before it is outputed and saved as a state. that is: y[n] = 2*(b0/2*x[n] + b1/2*x[n-1] + b2/2*x[n-2] - a1/2*y[n-1] - a2/2*y[n-2]) so the coefs you put into the DSP are b0/2, b1/2, b2/2, a1/2, a2/2, and the result of the sum is 1/2 of what the output is until you multiply that by 2 (usually with an ASL instruction). this is not solely a scaling issue because if you don't double the resulting sum, the y[n-1] and y[n-2] states will not be correct and your filter will not have the shape that you designed for. """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" I do fully agree that multiplicating the accumulation result by 2 will cause some overflow but from what RBJ wrote i thought i had to this to keep the shape of my filter as i intended to be when calculating the coefficient s at the first place. One of my point was also not to loose any dynamic in my output signal compares to the dynamic of my input signal but then should i understand it is just not possible to save the dynamic range AND to avoid overflow? many thanks again for your help
"marianne" <mary1331@hotmail.com> writes: