saturation in iir filter

Started by marianne April 1, 2008
Hi all,

i am currently implementing an iir filter on a arm processor that doesn't
have a fractionnal multiplication.
As i have read on this forum i divided my filter coefficient (bo b1 b2 a1
a2) by 2 if they are >1 to be able to perform a q1.15 by q1.15
multiplication
the results of multiplications are therefore q2.30.
I accumulate those multiplication in a 64 bit register , multiply the
result by 2 to compensate for coeffs normalization

ie: accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] +
a2\2*y[n-2]
accum += accum; (to compensate for dividing by 2 the coefficient)

then i round and saturate the result the following way:
if accum > 0x3fffffff
 result = 0x7fff (going back to a 16 bit value)
else if accum < 0x80000000
 result = 0x8000
else 
 result = (accu +(1<<15)) >>15 (rouding and going back from q2.30 to
q1.15)

i am then storing this result in the memory line as y[n-1]

my question is the following: my input signal has a very big dynamic range
(many samples above 30000 or below -30000) and when i am filtering it with
a high pass filter (fc = 300hz) i get an output signal with a lot of
saturation which produces hearable artifacts: i am doing something wrong
in my calculation or is the only solution to scale down the input signal
or to divided the b's coefficient by some constant to avoid saturation?

thanks for your help ( and thanks for the many helpfull things i allready
read on this forum on the subject :-) )



marianne wrote:
> Hi all, > > i am currently implementing an iir filter on a arm processor that doesn't > have a fractionnal multiplication. > As i have read on this forum i divided my filter coefficient (bo b1 b2 a1 > a2) by 2 if they are >1 to be able to perform a q1.15 by q1.15 > multiplication > the results of multiplications are therefore q2.30. > I accumulate those multiplication in a 64 bit register , multiply the > result by 2 to compensate for coeffs normalization > > ie: accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] + > a2\2*y[n-2]
So the accum is computed as the 32 bit number and then casted to 64 bits, huh.
> accum += accum; (to compensate for dividing by 2 the coefficient) > > then i round and saturate the result the following way: > if accum > 0x3fffffff
^^^^^^^^^^^^^^^^^^^^^^^^ ????? if(accum > 0x80000000)
> result = 0x7fff (going back to a 16 bit value) > else if accum < 0x80000000 > result = 0x8000
You need to try harder. Vladimir Vassilevsky DSP and Mixed Signal Design Consultant http://www.abvolt.com
Hello Marianne

      Please give the filter coefficients also. So we
      can see that the coefficients of the feedback 
      section is implemented correctly.

      I suspect that there is a sign error while implementing
      feedback coefficients, and that might be causing multiple
      rollovers.

Regards
Bharat Pathak

Arithos Designs
www.Arithos.com

Sorry i did many typing errors:

my accumulation equation is 
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie 
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Sorry i did many typing errors:

my accumulation equation is 
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie 
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Sorry i did many typing errors:

my accumulation equation is 
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie 
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.
Hi marianne,

"marianne" <mary1331@hotmail.com> writes:
> [...] > and finally i multiply the accum value by 2 (accum = accum + accum)
If your machine doesn't have a saturation mode, then this step is absolutely pointless and potentially harmful. You never want to shift a final result left (which is what multiplying by 2 does) since you risk overflowing and there is no reason to do it. Also, depending on the reason for the shift, you either a) change the scaling of the result, or b) change the gain of the overall filter. You almost certainly don't want to do either of these at this stage. Additionally, if you did want to perform b) for some reason, you can perform the function "virtually" by simply reinterpreting the scaling without risking an overflow.
> and then compare it to the max values of my q2.30 format ie 0x3fffffff > and 0xC0000000 to check if i have an overflow or not before rouding > and scaling down to q1.15: ie if (accum > 0x3fffffff) then i saturate > my 16bits result to 0x7fff if (accum < 0xC0000000) then i saturate my > 16bits result to 0x8000 else i round it and shift it by 15 to get back > to a q1.15
What I would do, instead, is first shift the accumulator right by 15 (since this changes the scaling to q17.15), then perform your saturation logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to round, add 0x00004000 to the accumulator before shifting right by 15. If you want to protect the round operation from overflow, shift the accumulator right 1 first, then add 0x00002000, and then shift right by 14 and perform the saturation logic. This is assuming your machine doesn't have built-in saturation abilities. If it did, then you would do things quite a bit differently in order to make use of them and save some cycles. -- % Randy Yates % "Midnight, on the water... %% Fuquay-Varina, NC % I saw... the ocean's daughter." %%% 919-577-9882 % 'Can't Get It Out Of My Head' %%%% <yates@ieee.org> % *El Dorado*, Electric Light Orchestra http://www.digitalsignallabs.com
Thanks for your answer Randy

>If your machine doesn't have a saturation mode, then this step is >absolutely pointless and potentially harmful. You never want to shift a >final result left (which is what multiplying by 2 does) since you risk >overflowing and there is no reason to do it.
my machine doesn't have a saturation mode. I was doing this multiplication by two of my accumulator because i read the following from rbj on this forum on this issue of having coeffs out of the range [-1 1[ """"""""""""""""""""""""""""""""""""""""""""""""""""" y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] you may not scale the a1 or a2 coefficients according to any other criteria. you *may* scale the numerator coefficients (b0, b1, b2) by a common factor and, except for quantizing or overflow issues, that will change only the level of the output. now since a1 has a range that exceeds what you can put in a 1.15 number format, then what is usually done is that all 5 coefficients are represented with 1/2 of their correct value and the resulting sum is doubled (sometimes with an ASL instruction) before it is outputed and saved as a state. that is: y[n] = 2*(b0/2*x[n] + b1/2*x[n-1] + b2/2*x[n-2] - a1/2*y[n-1] - a2/2*y[n-2]) so the coefs you put into the DSP are b0/2, b1/2, b2/2, a1/2, a2/2, and the result of the sum is 1/2 of what the output is until you multiply that by 2 (usually with an ASL instruction). this is not solely a scaling issue because if you don't double the resulting sum, the y[n-1] and y[n-2] states will not be correct and your filter will not have the shape that you designed for. """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" I do fully agree that multiplicating the accumulation result by 2 will cause some overflow but from what RBJ wrote i thought i had to this to keep the shape of my filter as i intended to be when calculating the coefficient s at the first place. One of my point was also not to loose any dynamic in my output signal compares to the dynamic of my input signal but then should i understand it is just not possible to save the dynamic range AND to avoid overflow? many thanks again for your help
"marianne" <mary1331@hotmail.com> writes:

> Thanks for your answer Randy
You're very welcome, marianne.
>>If your machine doesn't have a saturation mode, then this step is >>absolutely pointless and potentially harmful. You never want to shift a >>final result left (which is what multiplying by 2 does) since you risk >>overflowing and there is no reason to do it. > > my machine doesn't have a saturation mode. > I was doing this multiplication by two of my accumulator because i read > the following from rbj on this forum on this issue of having coeffs out of > the range [-1 1[ > >
"""""""""""""""""""""""""""""""""""""""""""""""""""""
> y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] > > > you may not scale the a1 or a2 coefficients according to any other > criteria. > you *may* scale the numerator coefficients (b0, b1, b2) by a common > factor > and, except for quantizing or overflow issues, that will change only the > level of the output. > > now since a1 has a range that exceeds what you can put in a 1.15 number > format, then what is usually done is that all 5 coefficients are > represented > with 1/2 of their correct value and the resulting sum is doubled > (sometimes > with an ASL instruction) before it is outputed and saved as a state. that > is: > > > y[n] = 2*(b0/2*x[n] + b1/2*x[n-1] + b2/2*x[n-2] - a1/2*y[n-1] - > a2/2*y[n-2]) > > > so the coefs you put into the DSP are b0/2, b1/2, b2/2, a1/2, a2/2, and > the > result of the sum is 1/2 of what the output is until you multiply that by > 2 > (usually with an ASL instruction). this is not solely a scaling issue > because if you don't double the resulting sum, the y[n-1] and y[n-2] > states > will not be correct and your filter will not have the shape that you > designed for. >""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Aha!
> I do fully agree that multiplicating the accumulation result by 2 will > cause some overflow but from what RBJ wrote i thought i had to this to > keep the shape of my filter as i intended to be when calculating the > coefficient s at the first place.
OK, I see now. Then you do what I mentioned previously: multiply by two "virtually": If the accumulator result is q2.30, then to multiply by two, simply reinterpret the scaling as q3.29. you don't really do anything physically, it's just a reinterpretation. As a simple example, let's say the accumulator result was "1.0", or 0x40000000 in q(2,30). But 0x4000000 in q(3,29) is "2.0". Voila: we've multiplied by two, but we didn't overflow anything or lose any precision. What I would do, then, is first shift the accumulator right by 14 (since this changes the scaling to q17.15), then perform your saturation logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to round, add 0x00002000 to the accumulator before shifting right by 14. If you want to protect the round operation from overflow, shift the accumulator right 1 first, then add 0x00001000, and then shift right by 13 and perform the saturation logic.
> One of my point was also not to loose any dynamic in my output signal > compares to the dynamic of my input signal but then should i understand it > is just not possible to save the dynamic range AND to avoid overflow?
That may be true - I'd have to examine your filter coefficients more closely to say for sure. But, as a simple thought experiment, consider a very simple 1-tap FIR filter h[0] = 2.0 (i.e., a gain of two). Then if you want the output samples to be scaled the same as the input samples, you're going to necessarily lose dynamic range, since many samples in your input signal that were originally in range will saturate the "filter" output. A similar thing goes for your IIR. If, for example, your IIR has a gain of > 1 in some band, and you input a full-scale sine wave in that band, then it's going to clip (saturate) at the output, assuming you keep the scaling of the output samples the same as that of the input samples. -- % Randy Yates % "I met someone who looks alot like you, %% Fuquay-Varina, NC % she does the things you do, %%% 919-577-9882 % but she is an IBM." %%%% <yates@ieee.org> % 'Yours Truly, 2095', *Time*, ELO http://www.digitalsignallabs.com
>If the accumulator result is q2.30, then to multiply by two, simply >reinterpret the scaling as q3.29. you don't really do anything >physically, it's just a reinterpretation. As a simple example, let's >say the accumulator result was "1.0", or 0x40000000 in q(2,30). >But 0x4000000 in q(3,29) is "2.0". Voila: we've multiplied by two, >but we didn't overflow anything or lose any precision. > >What I would do, then, is first shift the accumulator right by 14 >(since this changes the scaling to q17.15), then perform your saturation >logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 >bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to >round, add 0x00002000 to the accumulator before shifting right by 14. >If you want to protect the round operation from overflow, shift the >accumulator right 1 first, then add 0x00001000, and then shift right by >13 and perform the saturation logic.
Let's take an example that really happen in my filter: my accumulator value (in q2.30 before considering it as q3.29) is 0x21ca49a7: it is not over the max of q2.30, fine. Now i do what you suggest above add 0x00002000 -> 0x21ca69a7 shift rigth by 14 -> 0x00008729 is over 0x00007fff so my q1.15 output result is 0x7fff => ok i am saturating
>That may be true - I'd have to examine your filter coefficients more >closely to say for sure.
the coefficients are b0 = 0.941266298 b1 = -1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957
> >But, as a simple thought experiment, consider a very simple 1-tap FIR >filter h[0] = 2.0 (i.e., a gain of two). Then if you want the output >samples to be scaled the same as the input samples, you're going to >necessarily lose dynamic range, since many samples in your input signal >that were originally in range will saturate the "filter" output. > >A similar thing goes for your IIR. If, for example, your IIR has a gain >of > 1 in some band, and you input a full-scale sine wave in that band, >then it's going to clip (saturate) at the output, assuming you keep the >scaling of the output samples the same as that of the input samples.
I agree with that idea but in my present case, my filter is a high pass filter and it doesn't seem to me that the gain is ever > 1 (i might be wrong on this) that's where my un-understanding lies: how can i saturate if my filter gain is never >1...
Thanks for your help and good advices Randy.
I think i managed to work out the pb, as you said, my scaling was not
good.
I finally scaled down my inputs by 2, did all the calculation and only in
the end of the process scaled up the output by 2 again and it works: i
don't hear artifacts any more and i kept my input dynamic.
I think the problem was not really coming from having those saturated
samples at the output but rather to put them back in the feedback part of
the filter and do calculation with them.
Anyway , thanks again Randy for the time you took to answer my question
and give precious advice.What is nice with signal processing and Dsp is
that it is a never ending learning topics!
merci beaucoup ;-)

Randy Yates <yates@ieee.org> writes:

> Hi marianne, > [...]
PS: I checked your original post and found out your accumulator is 64 bits wide. That means that, while the result of multipying two 16-bit q1.15 numbers is q2.30, the final result after being sign-extended into the accumulator is q34.30. I was assuming in my earlier post that the accumulator was 32 bits wide. -- % Randy Yates % "She's sweet on Wagner-I think she'd die for Beethoven. %% Fuquay-Varina, NC % She love the way Puccini lays down a tune, and %%% 919-577-9882 % Verdi's always creepin' from her room." %%%% <yates@ieee.org> % "Rockaria", *A New World Record*, ELO http://www.digitalsignallabs.com
Randy Yates <yates@ieee.org> writes:
> [...] > or q6.11.
Sorry! q6.10. -- % Randy Yates % "Rollin' and riding and slippin' and %% Fuquay-Varina, NC % sliding, it's magic." %%% 919-577-9882 % %%%% <yates@ieee.org> % 'Living' Thing', *A New World Record*, ELO http://www.digitalsignallabs.com
Hi marianne,

"marianne" <mary1331@hotmail.com> writes:

>>If the accumulator result is q2.30, then to multiply by two, simply >>reinterpret the scaling as q3.29. you don't really do anything >>physically, it's just a reinterpretation. As a simple example, let's >>say the accumulator result was "1.0", or 0x40000000 in q(2,30). >>But 0x4000000 in q(3,29) is "2.0". Voila: we've multiplied by
two,
>>but we didn't overflow anything or lose any precision. >> >>What I would do, then, is first shift the accumulator right by 14 >>(since this changes the scaling to q17.15), then perform your saturation >>logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 >>bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to >>round, add 0x00002000 to the accumulator before shifting right by 14. >>If you want to protect the round operation from overflow, shift the >>accumulator right 1 first, then add 0x00001000, and then shift right by >>13 and perform the saturation logic. > > Let's take an example that really happen in my filter: > my accumulator value (in q2.30 before considering it as q3.29) is > 0x21ca49a7: it is not over the max of q2.30, fine.
Right.
> Now i do what you suggest above add 0x00002000 -> 0x21ca69a7 > shift rigth by 14 -> 0x00008729 is over 0x00007fff so my q1.15 output > result is 0x7fff => ok i am saturating
That's right. That's because you're trying to stuff a number with 2 integer digits into an output with only 0 integer digits. NB: qa.b has a-1 integer digits. See section 4 of my paper near the end where it gives the range of an A(a,b) scaling. http://digitalsignallabs.com/fp.pdf I did not address whether your choices in scaling were appropriate. I simply showed the proper way to execute the filter given those scalings. I assumed that you had chosen scalings that satisfy the requirements of your system.
>>That may be true - I'd have to examine your filter coefficients more >>closely to say for sure. > > the coefficients are b0 = 0.941266298 b1 = -1.8825326 b2 = 0.941266298 > a1 = -1.879089 a2 = 0.885975957 > [...] > I agree with that idea but in my present case, my filter is a high pass > filter and it doesn't seem to me that the gain is ever > 1 (i might be > wrong on this) > that's where my un-understanding lies: how can i saturate if my filter > gain is never >1...
Because, while my example was correct, it wasn't all-inclusive. That is, you cannot state that "An IIR filter has potential overflow problems if and only if it has a gain greater than 1 at some frequency." The situation is more complex: there are other reasons it could overflow and you've run into one. Essentially, while steady-state sine waves won't overflow your filter (I checked your coefficients and its passband gain never exceeds 1) as scaled, transients will. If you want to guarantee that your output will never overflow in this filter, then you must scale the integer output value a in qa.b as a = 3 + ceil(log_2(0.951266298 + 1.8825326 + 0.941266298 + 1.879089 + 0.885975957)) = 6, or q6.11. The system designer has to determine what is more important: avoiding overflow at all costs or preserving precision. You can preserve precision at the cost of potentially saturating a few special case input signals. -- % Randy Yates % "Remember the good old 1980's, when %% Fuquay-Varina, NC % things were so uncomplicated?" %%% 919-577-9882 % 'Ticket To The Moon' %%%% <yates@ieee.org> % *Time*, Electric Light Orchestra http://www.digitalsignallabs.com
>I agree with that idea but in my present case, my filter is a high pass >filter and it doesn't seem to me that the gain is ever > 1 (i might be >wrong on this) >that's where my un-understanding lies: how can i saturate if my filter >gain is never >1...
You have an IIR which consists of two parts: * FIR -> foreward path, coefficients (matlab style) B=[0.941266298,-1.8825326, 0.941266298] * IIR -> backward path, coefficients A=[1, -1.879089, 0.885975957] Depending on your implementation you do foreward before feedback path (for example direct form 1 implementation) (or vice versa). If you own MATLAB or OCTAVE you get your filter response with: freqz(B,A) - then you see your filterresponse -- and you are right that it has no gain, only attenuation but if you have a closer look to the foreward path you have to type: freqz(B,1) then you will see that the foreward path has positive gain and for the feedback path type freqz(1,A) then you will also see positive gain (a lot!) here. Depending to your implementation (I didn&acute;t read all the posts so I don&acute;t know which one you choosed) you will clip the signal in the accumulation point if you do nothing and you are using that coefficents. For example you have direct form 1 implementation you have to take care that the foreward path has no gain. If the feedback path has gain it should be only there where the foreward path compensates this. Or you compensate that in the following way: dividing all coefficients by two and do a shift in left direction (=multiply by two) on the correct place in your structure (For direct form 1 this is before you split up y[n] and the delay line for the feedback path - if you have noise problems then you could do the feedback path in double precision). Check out http://en.wikipedia.org/wiki/Digital_biquad_filter if you are not sure what&acute;s direct form. I hope that helps, greetings, Markus www.two-pi.com
>If the accumulator result is q2.30, then to multiply by two, simply >reinterpret the scaling as q3.29. you don't really do anything >physically, it's just a reinterpretation. As a simple example, let's >say the accumulator result was "1.0", or 0x40000000 in q(2,30). >But 0x4000000 in q(3,29) is "2.0". Voila: we've multiplied by two, >but we didn't overflow anything or lose any precision. > >What I would do, then, is first shift the accumulator right by 14 >(since this changes the scaling to q17.15), then perform your saturation >logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 >bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to >round, add 0x00002000 to the accumulator before shifting right by 14. >If you want to protect the round operation from overflow, shift the >accumulator right 1 first, then add 0x00001000, and then shift right by >13 and perform the saturation logic.
Let's take an example that really happen in my filter: my accumulator value (in q2.30 before considering it as q3.29) is 0x21ca49a7: it is not over the max of q2.30, fine. Now i do what you suggest above add 0x00002000 -> 0x21ca69a7 shift rigth by 14 -> 0x00008729 is over 0x00007fff so my q1.15 output result is 0x7fff => ok i am saturating
>That may be true - I'd have to examine your filter coefficients more >closely to say for sure.
the coefficients are b0 = 0.941266298 b1 = -1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957
> >But, as a simple thought experiment, consider a very simple 1-tap FIR >filter h[0] = 2.0 (i.e., a gain of two). Then if you want the output >samples to be scaled the same as the input samples, you're going to >necessarily lose dynamic range, since many samples in your input signal >that were originally in range will saturate the "filter" output. > >A similar thing goes for your IIR. If, for example, your IIR has a gain >of > 1 in some band, and you input a full-scale sine wave in that band, >then it's going to clip (saturate) at the output, assuming you keep the >scaling of the output samples the same as that of the input samples.
I agree with that idea but in my present case, my filter is a high pass filter and it doesn't seem to me that the gain is ever > 1 (i might be wrong on this) that's where my un-understanding lies: how can i saturate if my filter gain is never >1...
"marianne" <mary1331@hotmail.com> writes:

> Thanks for your answer Randy
You're very welcome, marianne.
>>If your machine doesn't have a saturation mode, then this step is >>absolutely pointless and potentially harmful. You never want to shift a >>final result left (which is what multiplying by 2 does) since you risk >>overflowing and there is no reason to do it. > > my machine doesn't have a saturation mode. > I was doing this multiplication by two of my accumulator because i read > the following from rbj on this forum on this issue of having coeffs out of > the range [-1 1[ > >
"""""""""""""""""""""""""""""""""""""""""""""""""""""
> y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] > > > you may not scale the a1 or a2 coefficients according to any other > criteria. > you *may* scale the numerator coefficients (b0, b1, b2) by a common > factor > and, except for quantizing or overflow issues, that will change only the > level of the output. > > now since a1 has a range that exceeds what you can put in a 1.15 number > format, then what is usually done is that all 5 coefficients are > represented > with 1/2 of their correct value and the resulting sum is doubled > (sometimes > with an ASL instruction) before it is outputed and saved as a state. that > is: > > > y[n] = 2*(b0/2*x[n] + b1/2*x[n-1] + b2/2*x[n-2] - a1/2*y[n-1] - > a2/2*y[n-2]) > > > so the coefs you put into the DSP are b0/2, b1/2, b2/2, a1/2, a2/2, and > the > result of the sum is 1/2 of what the output is until you multiply that by > 2 > (usually with an ASL instruction). this is not solely a scaling issue > because if you don't double the resulting sum, the y[n-1] and y[n-2] > states > will not be correct and your filter will not have the shape that you > designed for. >""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Aha!
> I do fully agree that multiplicating the accumulation result by 2 will > cause some overflow but from what RBJ wrote i thought i had to this to > keep the shape of my filter as i intended to be when calculating the > coefficient s at the first place.
OK, I see now. Then you do what I mentioned previously: multiply by two "virtually": If the accumulator result is q2.30, then to multiply by two, simply reinterpret the scaling as q3.29. you don't really do anything physically, it's just a reinterpretation. As a simple example, let's say the accumulator result was "1.0", or 0x40000000 in q(2,30). But 0x4000000 in q(3,29) is "2.0". Voila: we've multiplied by two, but we didn't overflow anything or lose any precision. What I would do, then, is first shift the accumulator right by 14 (since this changes the scaling to q17.15), then perform your saturation logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to round, add 0x00002000 to the accumulator before shifting right by 14. If you want to protect the round operation from overflow, shift the accumulator right 1 first, then add 0x00001000, and then shift right by 13 and perform the saturation logic.
> One of my point was also not to loose any dynamic in my output signal > compares to the dynamic of my input signal but then should i understand it > is just not possible to save the dynamic range AND to avoid overflow?
That may be true - I'd have to examine your filter coefficients more closely to say for sure. But, as a simple thought experiment, consider a very simple 1-tap FIR filter h[0] = 2.0 (i.e., a gain of two). Then if you want the output samples to be scaled the same as the input samples, you're going to necessarily lose dynamic range, since many samples in your input signal that were originally in range will saturate the "filter" output. A similar thing goes for your IIR. If, for example, your IIR has a gain of > 1 in some band, and you input a full-scale sine wave in that band, then it's going to clip (saturate) at the output, assuming you keep the scaling of the output samples the same as that of the input samples. -- % Randy Yates % "I met someone who looks alot like you, %% Fuquay-Varina, NC % she does the things you do, %%% 919-577-9882 % but she is an IBM." %%%% <yates@ieee.org> % 'Yours Truly, 2095', *Time*, ELO http://www.digitalsignallabs.com
Thanks for your answer Randy

>If your machine doesn't have a saturation mode, then this step is >absolutely pointless and potentially harmful. You never want to shift a >final result left (which is what multiplying by 2 does) since you risk >overflowing and there is no reason to do it.
my machine doesn't have a saturation mode. I was doing this multiplication by two of my accumulator because i read the following from rbj on this forum on this issue of having coeffs out of the range [-1 1[ """"""""""""""""""""""""""""""""""""""""""""""""""""" y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] you may not scale the a1 or a2 coefficients according to any other criteria. you *may* scale the numerator coefficients (b0, b1, b2) by a common factor and, except for quantizing or overflow issues, that will change only the level of the output. now since a1 has a range that exceeds what you can put in a 1.15 number format, then what is usually done is that all 5 coefficients are represented with 1/2 of their correct value and the resulting sum is doubled (sometimes with an ASL instruction) before it is outputed and saved as a state. that is: y[n] = 2*(b0/2*x[n] + b1/2*x[n-1] + b2/2*x[n-2] - a1/2*y[n-1] - a2/2*y[n-2]) so the coefs you put into the DSP are b0/2, b1/2, b2/2, a1/2, a2/2, and the result of the sum is 1/2 of what the output is until you multiply that by 2 (usually with an ASL instruction). this is not solely a scaling issue because if you don't double the resulting sum, the y[n-1] and y[n-2] states will not be correct and your filter will not have the shape that you designed for. """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" I do fully agree that multiplicating the accumulation result by 2 will cause some overflow but from what RBJ wrote i thought i had to this to keep the shape of my filter as i intended to be when calculating the coefficient s at the first place. One of my point was also not to loose any dynamic in my output signal compares to the dynamic of my input signal but then should i understand it is just not possible to save the dynamic range AND to avoid overflow? many thanks again for your help
Hi marianne,

"marianne" <mary1331@hotmail.com> writes:
> [...] > and finally i multiply the accum value by 2 (accum = accum + accum)
If your machine doesn't have a saturation mode, then this step is absolutely pointless and potentially harmful. You never want to shift a final result left (which is what multiplying by 2 does) since you risk overflowing and there is no reason to do it. Also, depending on the reason for the shift, you either a) change the scaling of the result, or b) change the gain of the overall filter. You almost certainly don't want to do either of these at this stage. Additionally, if you did want to perform b) for some reason, you can perform the function "virtually" by simply reinterpreting the scaling without risking an overflow.
> and then compare it to the max values of my q2.30 format ie 0x3fffffff > and 0xC0000000 to check if i have an overflow or not before rouding > and scaling down to q1.15: ie if (accum > 0x3fffffff) then i saturate > my 16bits result to 0x7fff if (accum < 0xC0000000) then i saturate my > 16bits result to 0x8000 else i round it and shift it by 15 to get back > to a q1.15
What I would do, instead, is first shift the accumulator right by 15 (since this changes the scaling to q17.15), then perform your saturation logic using 0x00007FFF and 0xFFFF8000 and store the least-significant 16 bits of the accumulator, resulting in q(17-16).15 = q1.15. In order to round, add 0x00004000 to the accumulator before shifting right by 15. If you want to protect the round operation from overflow, shift the accumulator right 1 first, then add 0x00002000, and then shift right by 14 and perform the saturation logic. This is assuming your machine doesn't have built-in saturation abilities. If it did, then you would do things quite a bit differently in order to make use of them and save some cycles. -- % Randy Yates % "Midnight, on the water... %% Fuquay-Varina, NC % I saw... the ocean's daughter." %%% 919-577-9882 % 'Can't Get It Out Of My Head' %%%% <yates@ieee.org> % *El Dorado*, Electric Light Orchestra http://www.digitalsignallabs.com
Sorry i did many typing errors:

my accumulation equation is 
accum = b0\2*sample + b1\2*x[n-1] + b2\2*x[n-2]- a1\2*y[n-1] -
a2\2*y[n-2]

(instead of the +a2\2*y[n-2] i typed before)

and in my particular case coefficients are b0 = 0.941266298 b1 =
-1.8825326 b2 = 0.941266298 a1 = -1.879089 a2 = 0.885975957

another typing error is that i accumulate on a 32 bits register as the
q2.30 format is enough to take the temporary one bit growth that sometimes
happen during accumulation.

and finally i multiply the accum value by 2 (accum = accum + accum) and
then compare it to the max values of my q2.30 format ie 0x3fffffff and
0xC0000000 to check if i have an overflow or not before  rouding and
scaling down to q1.15: ie 
if (accum > 0x3fffffff) then i saturate my 16bits result to 0x7fff
if (accum < 0xC0000000) then i saturate my 16bits result to 0x8000
else i round it and shift it by 15 to get back to a q1.15

maybe Vladirmir that's where i got it wrong? for me the accumulation was
still in the q2.30 format.