DSPRelated.com
Forums

another fixed point coefficient scaling question

Started by Noway2 August 6, 2007
All,

For several weeks, I have been working on implementing IIR filters in a 
TI F2812.  I have several goals that I am trying to achieve, though 
these goals are not necessarily design objectives, but rather an attempt 
to see how good I can get things.  Keeping the limitations of the fixed 
point processor in mind, I am typically using a direct form 1 
implementation of second order sections, which seems to work quite well.

One thing that I have noticed is that as I increase the order of the 
filters (usually Butterworth), the overall gain constant gets smaller 
and smaller, and by the time you get to about an 8th-order filter the 
gain constant is so small that the filter becomes unresponsive as the 
input goes to about zero.  I have experimented with the idea of scaling 
the zeros of the sections by arbitrary factors (keeping the overall gain 
at 1.0) and this works, more or less, but I would like to do better.

I saw a post from a few years back that discussed infinity norm scaling 
and how it has the effect of helping to maximize the limited precision. 
I have been using the Matlab clone Octave to generate my filter 
functions.  I do not have Matlab, and I see hell freezing over before my 
employer buys it.  Unfortunately, Octave does not support this mode of 
coefficient scaling in its signal processing libraries.

From what I understand, the infinity norm scaling transforms the system 
into state space, uses the infinity norm to (somehow) scale the 
coefficients, and then transforms the equations back into the frequency 
domain.  The frequency domain <---> state space transform isn't a real 
problem, and neither is computing the infinity norm, but I would like to 
understand more about the scaling process using the infinity norm, and I 
am drawing a blank on this subject.

My question to the group is: could someone please shed some light on 
this form of scaling, provide an alternate method for scaling the 
coefficients that works better in fixed point, or even recommend a 
better filter structure?  I know the term 'better' is relative to design 
requirements, but at this stage I am experimenting for the sake of 
learning and am willing to try various things and see what happens.


Noway2 wrote:

> For several weeks, I have been working on implementing IIR filters in a
> TI F2812. [...] One thing that I have noticed is that as I increase the
> order of the filters (usually Butterworth), the overall gain constant
> gets smaller and smaller, and by the time you get to about an 8th order
> filter the gain constant gets so small that your filter becomes
> unresponsive as your input goes to about zero. I have experimented with
> the idea of scaling the zeros of the sections by arbitrary factors
> (keeping the overall gain at 1.0) and this works, more or less, but I
> would like to do better.
I've done IIR filters in 28xx assembly, cursing the darn CPU. That's
because the TI 28xx is not really a DSP and not really a 32-bit machine;
programming for it is quite nasty. The C implementation, however, is
roughly three times slower than the asm code.

To me, the best way to avoid the numeric problems in fixed-point IIR is
implementing a direct form 1 biquad section with 32-bit precision and
with first-order noise shaping. This will work for most of the practical
situations.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Noway2 <no_spam_me@triad.rr.com> writes:
> [...]
> My question to the group is, could someone please shed some light on
> this form of scaling,
You mean, like this?

http://www.mathworks.com/access/helpdesk/help/toolbox/signal/index.html?/access/helpdesk/help/toolbox/signal/tf2sos.html

-- 
%  Randy Yates            % "She's sweet on Wagner-I think she'd die for Beethoven.
%% Fuquay-Varina, NC      % She love the way Puccini lays down a tune, and
%%% 919-577-9882          % Verdi's always creepin' from her room."
%%%% <yates@ieee.org>     % "Rockaria", *A New World Record*, ELO
http://home.earthlink.net/~yatescr
On Aug 6, 5:17 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> To me, the best way to avoid the numeric problems in fixed-point IIR is
> implementing a direct form 1 biquad section with 32-bit precision and
> with first-order noise shaping. This will work for most of the
> practical situations.
Vladimir, there's an expression in English (dunno where it comes from):
you're a man after my own heart.

it works pretty well and it's still pretty cheap. i know it works for 24
bit, and might expect it to work pretty well for 16 bit.

one note: one might store the output as a double precision word (instead
of separately as the quantized single precision word along with the
chopped-off bits for the noise/error shaping) and then, for the next
sample, initialize the accumulator with that same double precision word.
that is the same as initializing it with the zero-extended error bits
(which is what we would normally do for 1st order noise shaping) and
then adding y[n-1] to it. so if one does that, they should realize that
there is already 1*y[n-1] in the accumulator and adjust the coef (-a1)
for that state to compensate (it becomes -a1-1).

r b-j
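For reference, here is one way the accumulator-preload variant described above could look in C. This is only a sketch under assumed formats (Q30 coefficients, 32-bit samples, a 64-bit accumulator); the struct and function names are mine, not from any poster's actual code. The key points are that y[n-1] is kept at full accumulator precision, the accumulator is preloaded with it, and the a1 coefficient is stored as (-a1 - 1) to compensate for the implicit 1*y[n-1] already present:

```c
#include <stdint.h>

#define QSHIFT 30  /* coefficients in Q30 (assumed format) */

typedef struct {
    int32_t b0, b1, b2;   /* feed-forward coefficients, Q30 */
    int32_t a1m1;         /* (-a1 - 1), compensating for the preload */
    int32_t a2;           /* a2, subtracted below */
    int32_t x1, x2;       /* input history */
    int64_t y1q, y2q;     /* output history kept at full precision */
} biquad_t;

static int32_t biquad_step(biquad_t *s, int32_t x)
{
    /* Preload with the full-precision y[n-1]: its low bits are the
       previous quantization error, giving 1st-order noise shaping,
       and its high bits already contribute 1*y[n-1]. */
    int64_t acc = s->y1q;

    acc += (int64_t)s->b0 * x;
    acc += (int64_t)s->b1 * s->x1;
    acc += (int64_t)s->b2 * s->x2;
    /* (-a1 - 1) times the quantized y[n-1] (arithmetic shift truncates) */
    acc += (int64_t)s->a1m1 * (s->y1q >> QSHIFT);
    acc -= (int64_t)s->a2   * (s->y2q >> QSHIFT);

    s->x2 = s->x1;
    s->x1 = x;
    s->y2q = s->y1q;
    s->y1q = acc;                     /* keep all the bits for next time */

    return (int32_t)(acc >> QSHIFT);  /* quantized single-precision output */
}
```

As a sanity check, a one-pole-style section y[n] = 0.5*x[n] + 0.5*y[n-1] (so a1 = -0.5 and the stored coefficient is a1m1 = -0.5) settles at unity DC gain when driven with a constant input.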
Randy Yates wrote:
> Noway2 <no_spam_me@triad.rr.com> writes:
>> [...]
>> My question to the group is, could someone please shed some light on
>> this form of scaling,
>
> You mean, like this?
>
> http://www.mathworks.com/access/helpdesk/help/toolbox/signal/index.html?/access/helpdesk/help/toolbox/signal/tf2sos.html
Yes, that is the function in Matlab. The Octave equivalent only takes
the (B,A) parameters and won't accept an ordering or scaling value.

I can almost get my head around what it is trying to say. I understand
that to compute the infinity norm you take the sum of the absolute
values of the elements in each row, and that the norm is then the max of
these sums. Given that value, I am wondering how to scale with it. Do
you divide each zero coefficient by this value, or is there a more
complex formula?

I will also admit that I am just starting to get comfortable with state
space representation. It was SO poorly taught when I was in school that
I developed a bit of an aversion to it. I am hoping that as I delve more
into representing filters in this format it will become more clear.
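The max-absolute-row-sum computation described above is easy to write down. A minimal sketch (function name mine, row-major storage assumed); how exactly tf2sos applies the resulting value to the section coefficients is a separate question this sketch does not answer:

```c
#include <math.h>
#include <stddef.h>

/* Matrix infinity norm: for each row, sum the absolute values of its
   entries; the norm is the largest such row sum. */
static double mat_inf_norm(const double *a, size_t rows, size_t cols)
{
    double norm = 0.0;
    for (size_t i = 0; i < rows; i++) {
        double rowsum = 0.0;
        for (size_t j = 0; j < cols; j++)
            rowsum += fabs(a[i * cols + j]);
        if (rowsum > norm)
            norm = rowsum;
    }
    return norm;
}
```

For example, the 2x2 matrix {{1, -2}, {3, 4}} has row sums 3 and 7, so its infinity norm is 7.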
robert bristow-johnson wrote:
> On Aug 6, 5:17 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>> To me, the best way to avoid the numeric problems in fixed-point IIR is
>> implementing a direct form 1 biquad section with 32-bit precision and
>> with first-order noise shaping. This will work for most of the
>> practical situations.
>
> Vladimir, there's an expression in English (dunno where it comes
> from): you're a man after my own heart.
>
> it works pretty well and it's still pretty cheap. i know it works for
> 24 bit, and might expect it to work pretty well for 16 bit.
>
> one note:
This is the conclusion I am drawing too. It seems to work well enough
for the practical situations that I intend to throw at it.

I can also relate to the comments about the TI assembly language being
abysmal. Of all the assembly languages that I have worked with, TI's has
so far been the worst. I did, however, manage to implement the direct
form 1 algorithm in assembly while maintaining 32-bit precision in the
input, output and delay states, and got it fairly well optimized, to
where it will compute a biquad in about 50 clock cycles (~400 ns at
120 MHz) including the function call and return.

From tests I have run, the resulting steady-state numeric precision
appears to be good to about five decimal places, which is much better
than the two that I require for the application. My objective at this
point is more academic than practical, hence my goal of seeing how good
I can make things with this hardware and my investigation of different
architectures.
> one might store the output as a double precision word (instead of
> separately as the quantized single precision word along with the
> chopped off bits for the noise/error shaping) and then, for the next
> sample, initialize the accumulator with that same double precision
> word. that is the same as initializing it with the zero-extended
> error bits (which is what we would normally do for 1st order noise
> shaping) and then adding to it y[n-1]. so if one does that, they
> should realize that there is already 1*y[n-1] in the accumulator and
> adjust the coef (-a1) for that state to compensate (it becomes -a1-1).
>
> r b-j
I had to reread this last part a few times, but I think I see what you
are saying: save the output in double precision (in my case that would
be 64 bit), save the lower bits, and add them to the y[n-1] (past
output) value... basically adding the 'noise' back into the feedback
path.

robert bristow-johnson wrote:

> On Aug 6, 5:17 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
> wrote:
>
>> To me, the best way to avoid the numeric problems in fixed-point IIR is
>> implementing a direct form 1 biquad section with 32-bit precision and
>> with first-order noise shaping. This will work for most of the
>> practical situations.
>
> Vladimir, there's an expression in English (dunno where it comes
> from): you're a man after my own heart.
>
> it works pretty well and it's still pretty cheap. i know it works for
> 24 bit, and might expect it to work pretty well for 16 bit.
As you certainly know, the gain of the recursive part of the filter is
on the order of Q*(Fs/Fc)^2. Because of this square law, not much can be
accomplished with 16-bit precision.
> one note:
>
> one might store the output as a double precision word (instead of
> separately as the quantized single precision word along with the
> chopped off bits for the noise/error shaping) and then, for the next
> sample, initialize the accumulator with that same double precision
> word. that is the same as initializing it with the zero-extended
> error bits (which is what we would normally do for 1st order noise
> shaping) and then adding to it y[n-1]. so if one does that, they
> should realize that there is already 1*y[n-1] in the accumulator and
> adjust the coef (-a1) for that state to compensate (it becomes -a1-1).
I see... This makes the DSP realization even more elegant, because a1 is
usually close to 2. So, taking the accumulator contents into account, a1
can be represented in 1.x arithmetic.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Vladimir Vassilevsky wrote:
> robert bristow-johnson wrote:
>
>> On Aug 6, 5:17 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
>> wrote:
...
>> one might store the output as a double precision word (instead of
>> separately as the quantized single precision word along with the
>> chopped off bits for the noise/error shaping) and then, for the next
>> sample, initialize the accumulator with that same double precision
>> word. [...] so if one does that, they
>> should realize that there is already 1*y[n-1] in the accumulator and
>> adjust the coef (-a1) for that state to compensate (it becomes -a1-1).
>
> I see... This makes the DSP realization even more elegant, because a1 is
> usually close to 2. So, taking the accumulator contents into account,
> a1 can be represented in 1.x arithmetic.
If a1 is close to 2, isn't -a1-1 close to -3?

Jerry
-- 
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
On 7 Aug., 17:08, Jerry Avins <j...@ieee.org> wrote:
> Vladimir Vassilevsky wrote:
> [...]
>> I see... This makes the DSP realization even more elegant, because a1 is
>> usually close to 2. So, taking the accumulator contents into account,
>> a1 can be represented in 1.x arithmetic.
>
> If a1 is close to 2, isn't -a1-1 close to -3?
Hey wait a minute, you're right! And, even better, (a1/2 e)^pi - pi is veery close to 20!!
Andor wrote:
> On 7 Aug., 17:08, Jerry Avins <j...@ieee.org> wrote:
>> [...]
>> If a1 is close to 2, isn't -a1-1 close to -3?
>
> Hey wait a minute, you're right! And, even better, (a1/2 e)^pi - pi is
> veery close to 20!!
Of course. Everybody knows e^3 = 20 (to better than 0.5%). :-)

Jerry
-- 
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯