DSPRelated.com
Forums

Fixed Point IIR implementation

Started by Fred Nach November 30, 2005
Jerry Avins wrote:
> robert bristow-johnson wrote: > > Jerry Avins wrote:
...
> >>I knew that too, but old habits sometimes bubble up. The important point > >>was that Fred N. has no y terms to the right of an equal sign, so I > >>don't see any recursion. > > > > > > there is recursion with the intermediate s[k] signal: > > OK. Back to the books with my thinking cap on.
no need. here's a simple way to look at it: s[k] = x[k] - a1*s[k-1] - a2*s[k-2] y[k] = b0*s[k] + b1*s[k-1] + b2*s[k-2] think of this as two filters in cascade. the first with input x[k] and output s[k]. the second with input s[k] and output y[k]. the first is a all-pole filter (actually there are 2 zeros at the origin, but don't worry about them) and the second is a all zero filter, an FIR (like an FIR, there are 2 poles at the origin, but the other 2 zeros kill 'em). the transfer function of the first: S(z)/X(z) = 1/(1 + a1*z^-1 + a2*z-2) the transfer function of the second Y(z)/S(z) = b0 + b1*z^-1 + b2*z^-2 and the transfer function of the whole sheee-bang is H(z) = Y(z)/X(z) = ( Y(z)/S(z) ) * ( S(z)/X(z) ) and the reason it has only two states instead of four is that the states for the first are identical to the states of the second (both delays on s[k]) so they can be combined into a single delay line. r b-j
Hi all !

Firstly thanks a lot for your fast and usefull help ;-)

It seems that my previous post was not clear enough so here are some 
extra details :

* The filter structure is indeed the From II that robert B-J has 
described very clearly.

* a1 and a2 are hence the DENOMINATORS coeff (I was born in the 80's !!)
and ... b1 and b2 are NUMERATOR (no b0 cause it is supposed scaled)

* The input signal is x(n) and it is a bounded signal : PCM 16 bits so 
it varies from -32768 to 32767

* The s(k) are the internal states that I save between each sample 
processing...

* The filter is indeed stable hence  |a1| <= 2 and |a2| <= 1

I think I understood that this implementation is not recommended because 
it compute first the feedback terms and leads to big variations...
But is there a way to retrieve the boudary of s(k) mathematicaly ?

I think that as long as a1 and a2 are given for a stable filter s(k) 
should be bounded...

For my program, this implementation is very interesting cause i code in 
Assembly and It save a lot of MPIS to only read 2 states variable (s1 
and s2) then 4 (x1 x2 y1 y2)...

Fred.


robert bristow-johnson a &#4294967295;crit :
> Fred Nach wrote: > >>I would like to implement a IIR Biquad filter using the fixed point >>arithmetics... >>Hence to reduced the intermediate states I plan to use the following trick: >>s(k) = x(k) -a1*s(k-1) -a2*s(k-2) >>y(n) = b0*s(k) + b1*s(k-1) + b2*s(k-2) >> >>The I can compute each y, saving only 2 states (s(k-1) a&#4294967295;nd s(k-2))... > > > this is the Direct 2 Form (sometimes called the "Direct II Canonical > Form"). unrecommeded for floating-point (if the Q is quite high, you > end up subtracting numbers very close to each other and losing > precision) and *highly* unrecommended for fixed-point (saturation > clipping will be your lot). > > try the Direct 1 Form: > > y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] > > every product on the right side of the = sign is a double precision > fixed point and all of those products should be added together in > double precision. this is trivial in the Mot 56K and the ADI SHArC and > maybe in the TI fixed-pointers. > > >>BUT ... in order to scale my coefficients or input, I need to know what >>are the boundaries of the s(k) serie... in order to find a proper Fixed >>Point format for my coefficients.. > > > your coefs are defined strictly in terms of the frequency response you > want. the gross range of a1 is -2 to +2 and a2 is -1 to +1 for any > stable biquad filter. the b0, b1, b2 coefs can be nearly anything but > you will have to choose a range and possibly do some scaling using the > arithmetic shift operation on the result before saving to y[n]. > > >>So my questions are : >> >>* for which conditions, s(k) is bounded >>* If the s(k) is bounded, what are the boundaries? > > > that is a purely artificial construct. are your fixed-point signal > values considered 16 bit signed integers? then it's -32768 <= x[n] < > +32768 . are they normalized? then it's -1 <= x[n]< +1. their range > is whatever you want them to be. > > >>Thanks in advance for your help (and sorry for my poor englsih) > > > as Jerry said, your English is fine. > > r b-j >
Me again ...

Ok, so i've just launch a Matlab plot of the filter derived from the 
s(k) sequence :

s(k) = x(k) -a1*s(k-1) -a2*s(k-2)
hence considering x(k) as input and s(k) as output
H(Z) = 1/(1 + a1*z^-1 + a2*z^-2)

the frequency response give a curve that decrease gradualy, *BUT* start 
witha gain of about 80 dB ... hence it means (as long as i 
understand...) that for almost constant x(k) (=low freq) the gain is 
very important so s(k) will take big values ... (gain of 80dB is like 
multiplying by 10 000 !) hence it for a Fixed Point architecture it 
requires ceil(log2(10000)) = 14 guard bits !!!

So yes... it's definitively not a good solution to implement a filter 
like that ... and I will do it using the direct form 1.

Salut !

Fred
Fred Nach wrote:
> > It seems that my previous post was not clear enough
it was clear enough for me. ...
> * The filter is indeed stable hence |a1| <= 2 and |a2| <= 1 > > I think I understood that this implementation is not recommended because > it compute first the feedback terms and leads to big variations...
it means that s[n] is too large to fit in you 16 bit words. if you represent them as 32 bit words, then you have to do double precision multiplies when you multiply be b0, b1, b2.
> But is there a way to retrieve the boudary of s[n] mathematicaly ? > > I think that as long as a1 and a2 are given for a stable filter s[n] > should be bounded...
it is. the worst case is that if a1 and a2 are known, you can obtain the impulse response for the all-pole filter H_s(z) = 1/(1 + a1*z^-1 + a2*z^-2) call that impulse response h_s[n]. the worst case for the maximum value for |s[n]| is what happens if x[n] shares the same sign as s[n] (for each value of n) and is at it at its maximum magnitude. from the convolution summation s[n] = SUM{ h_s[k] * x[n-k] } n max |s[n]| = max |x[n]| * SUM{ |h_s[n]| } n n n (max's and SUM over all n) you might find that max gain, SUM{ |h_s[n]| }, to be a very large value. much larger than 1. if you were to reduce you input to such a degree that you guarantee that s[n] never saturates, you will likely reduce
> For my program, this implementation is very interesting cause i code in > Assembly and It save a lot of MIPS to only read 2 states variable (s1 > and s2) then 4 (x1 x2 y1 y2)...
there is no reason for the DF1 to require more MIPS in the inner loop than the DF2. at least for cascaded biquads. if you're doing only one biquad, that is your overall filter order is 2, then it will cost a little more (but the cost is worth it). the DF1 requires 2 additional states but, if you cascade 2nd order biquads, the two output states can be shared with the 2 input states of the next section. so if you have N sections(an order 2N filter), the number of states you need for DF1 is 2N+2 only 2 more than DF2. not a very high price to pay. the DF2 is useless in your case (assuming that your filter will have some decent Q making the poles very close to the unit circle). but, if you like learning this yourself (we call this "learning the hard way"), feel free to implement it. but you will find that either you pre-scaling of the input will drive your signal into the noise floor or,if you do not pre-scale the input, your intermediate state signal, s[n], will saturate. only with double precision (that will cost more MIPS than DF1) will you avoid that. r b-j
Fred Nach wrote:
> Me again ... > > Ok, so i've just launch a Matlab plot of the filter derived from the > s(k) sequence : > > s(k) = x(k) -a1*s(k-1) -a2*s(k-2) > hence considering x(k) as input and s(k) as output > H(Z) = 1/(1 + a1*z^-1 + a2*z^-2) > > the frequency response give a curve that decrease gradualy, *BUT* start > witha gain of about 80 dB ... hence it means (as long as i > understand...) that for almost constant x(k) (=low freq) the gain is > very important so s(k) will take big values ... (gain of 80dB is like > multiplying by 10 000 !) hence it for a Fixed Point architecture it > requires ceil(log2(10000)) = 14 guard bits !!! > > So yes... it's definitively not a good solution to implement a filter > like that ... and I will do it using the direct form 1.
our posts "crossed in the mail". i'm glad that you "get it" now. :-) when using the DF1, make sure that all five terms (b0*x[n] b1*x[n-1], etc) are added together in a double precision accumulator before quantizing that back to a single precision output. also, consider "noise shaping" or "fraction saving" (Google Groups the latter term or look up the DC Blocking filter in fixed-point at dspguru.com). with only 16 bits, you'll probably need it. r b-j
On 1 Dec 2005 08:35:02 -0800, "robert bristow-johnson"
<rbj@audioimagination.com> wrote:
> >Fred Nach wrote: >> Me again ... >> >> Ok, so i've just launch a Matlab plot of the filter derived from the >> s(k) sequence : >> >> s(k) = x(k) -a1*s(k-1) -a2*s(k-2) >> hence considering x(k) as input and s(k) as output >> H(Z) = 1/(1 + a1*z^-1 + a2*z^-2) >> >> the frequency response give a curve that decrease gradualy, *BUT* start >> witha gain of about 80 dB ... hence it means (as long as i >> understand...) that for almost constant x(k) (=low freq) the gain is >> very important so s(k) will take big values ... (gain of 80dB is like >> multiplying by 10 000 !) hence it for a Fixed Point architecture it >> requires ceil(log2(10000)) = 14 guard bits !!! >> >> So yes... it's definitively not a good solution to implement a filter >> like that ... and I will do it using the direct form 1. > >our posts "crossed in the mail". i'm glad that you "get it" now. :-) > >when using the DF1, make sure that all five terms (b0*x[n] b1*x[n-1], >etc) are added together in a double precision accumulator before >quantizing that back to a single precision output. also, consider >"noise shaping" or "fraction saving" (Google Groups the latter term or >look up the DC Blocking filter in fixed-point at dspguru.com). with >only 16 bits, you'll probably need it. >
Isn't this a case where a lattice filter implementaion helps?
Just Cocky wrote:

> Isn't this a case where a lattice filter implementaion helps?
the "normalized ladder" architecture is supposed to address this problem. it preserves the mean power in those internal states but does not guarantee against saturation. in the DF1, you can scale down b0, b1, b2 to such a point that the summer (and output y[n]) does not saturate (and then scale it up later, if you can get away with it). r b-j
Just Cocky wrote:

> Isn't this a case where a lattice filter implementaion helps?
the "normalized ladder" architecture is supposed to address this problem. it preserves the mean power in those internal states but does not guarantee against saturation. in the DF1, you can scale down b0, b1, b2 to such a point that the summer (and output y[n]) does not saturate (and then scale it up later, if you can get away with it). r b-j
robert bristow-johnson wrote:
> Jerry Avins wrote: > >>robert bristow-johnson wrote: >> >>>Jerry Avins wrote:
...
>>OK. Back to the books with my thinking cap on. > > > no need. here's a simple way to look at it:
... Thanks much. Part of my problem was misreading the original equations, the ones with () instead of []. Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
"Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message 
news:pan.2005.12.01.00.10.23.587775@areilly.bpc-users.org...
> On Wed, 30 Nov 2005 11:01:31 -0800, Noway2 wrote: >> I have seen Matlab scripts that run an impulse response till it >> stabilizes and then sum up the impulse responses and use the maximum >> value as a scale factor, though I don't know if this is a correct >> solution. > > It's pretty close. This is the main reason why these "direct form 2" > implmentations are not preferred. A direct form 1 implmentation will use > more memory but (a) memory is relatively cheap and (b) the values stored > will be as constrained as the input and output values.
Even though memory is cheap, sometimes I find the overhead of the extra memory stores/fetches slows the filter down a bit (depending on the architecture).