# Quantization of fixed point filter coefficients.

Started by July 20, 2005
```This post is a retake on the 'quantization of filter coefficients'
issue.

Let me present the issue in the following.

I have designed a filter (notch, cut-off 55.7Hz) which is a bi-quad.

The transfer function is :

H(z) = 1 + (-1.992)*z^-1 + (0.994)*z^-2
--------------------------------
1 - (-1.836)*z^-1 - (0.852)*z^-2

Now I am implementing this filter on a 16-bit fixed point DSP (StarCore
1400) using a Q1.15 format. Thus I can represent numbers in the range
of +0.999969 to -1. This range cannot represent the coefficients b1
(-1.992) and a1 (-1.836) in the above equation.

Through the replies to my previous post, some months back, I was
instructed to split these coefficients into two by halving them and
adding another MAC operation in my filtering algorithm. So here is what
I did --

H(z) = 1 + (-0.996)*z^-1 + (-0.996)*z^-1 + (0.994)*z^-2
-------------------------------------------------
1 - (-0.918)*z^-1 - (-0.918)*z^-1 - (0.852)*z^-2

The output is now computed as -->

y(n) = x(n) + (-0.996)*x(n-1) + (-0.996)*x(n-1) + (0.994)*x(n-2)
- (-0.918)*y(n-1) - (-0.918)*y(n-1) - (0.852)*y(n-2)

Now the coefficients DO fit in the desired range. But for some values
of input (which are also in Q1.15 format), the above summation was
running into saturation giving me totally incorrect outputs.

I will take one such example -

y(n) = -0.551269 + (-0.996 * 0.67175) + (-0.996 * 0.67175) + (0.994 *
0.34906) - (-0.918 * 0.61731) - (-0.918 * 0.61731) - (0.852 * 0.34906)

= -0.551269 - 0.669063 - 0.669063 + 0.347
+ 0.5667 + 0.5667 - 0.2974

I am representing the numbers in floating point format for the ease of
understanding. In my algorithm, they are Q1.15 fixed point numbers.

Now if you notice, the sum of the first two terms exceeds -1. Further
adding the third term exceeds the range further. For the above example,
I should get an output of -0.706395 but with saturation setting in, I
get something like 0.1345. This kinda rolls over to the calculation of
the next sample of output and so on with the result that my output is
not close to what I expect it to be.

Now my question is: Is scaling the input signal the ONLY option I have?
If not what else can I try?

Thanks,
T.

```
```tanmay.zargar@gmail.com wrote:

> This post is a retake on the 'quantization of filter coefficients'
> issue.
>
> Let me present the issue in the following.
>
> I have designed a filter (notch, cut-off 55.7Hz) which is a bi-quad.
>
> The transfer function is :
>
> H(z) = 1 + (-1.992)*z^-1 + (0.994)*z^-2
>        --------------------------------
>        1 - (-1.836)*z^-1 - (0.852)*z^-2
>
> Now I am implementing this filter on a 16-bit fixed point DSP (StarCore
> 1400) using a Q1.15 format. Thus I can represent numbers in the range
> of +0.999969 to -1. This range cannot represent the coefficients b1
> (-1.992) and a1 (-1.836) in the above equation.
>
> Through the replies to my previous post, some months back, I was
> instructed to split these coefficients into two by halving them and
> adding another MAC operation in my filtering algorithm. So here is what
> I did --
>
> H(z) = 1 + (-0.996)*z^-1 + (-0.996)*z^-1 + (0.994)*z^-2
>        -------------------------------------------------
>        1 - (-0.918)*z^-1 - (-0.918)*z^-1 - (0.852)*z^-2
>
> The output is now computed as -->
>
> y(n) = x(n) + (-0.996)*x(n-1) + (-0.996)*x(n-1) + (0.994)*x(n-2)
>             - (-0.918)*y(n-1) - (-0.918)*y(n-1) - (0.852)*y(n-2)
>
> Now the coefficients DO fit in the desired range. But for some values
> of input (which are also in Q1.15 format), the above summation was
> running into saturation giving me totally incorrect outputs.
>
> I will take one such example -
>
> y(n) = -0.551269 + (-0.996 * 0.67175) + (-0.996 * 0.67175) + (0.994 *
> 0.34906) - (-0.918 * 0.61731) - (-0.918 * 0.61731) - (0.852 * 0.34906)
>
>      = -0.551269 - 0.669063 - 0.669063 + 0.347
>                  + 0.5667 + 0.5667 - 0.2974
>
> I am representing the numbers in floating point format for the ease of
> understanding. In my algorithm, they are Q1.15 fixed point numbers.
>
> Now if you notice, the sum of the first two terms exceeds -1. Further
> adding the third term exceeds the range further. For the above example,
> I should get an output of -0.706395 but with saturation setting in, I
> get something like 0.1345. This kinda rolls over to the calculation of
> the next sample of output and so on with the result that my output is
> not close to what I expect it to be.
>
> Now my question is: Is scaling the input signal the ONLY option I have?
> If not what else can I try?
>
> Thanks,
> T.
>

If you implement your difference equation in a MAC (yes, this will
require some assembly language coding) it will overflow in some of the
intermediate results but it will not overflow overall.  You should not
need to scale your input if this is the case.

If you insist on implementing the difference equation in C then you may
have issues; if you are very brave you can consider just letting things
overflow knowing in your bones that things will come out right in the
end -- just don't hire me to do any code reviews for you if this is the
case.  You could also experament with changing the order of execution so
that it goes +(something) - (something) + (something).  This will
certainly raise eyebrows, but you may escape the code review unscathed.

Whatever else you do you will end up having to scale the input signal.
Consider that your input coefficients are scaling it right now, anyway.
If your notch filter has a gain that is everywhere equal to or less
than 1 you shouldn't have to scale it too much, however.

I prefer to implement highly resonant filters as a state space:

[ a   w ]           [ b_1 ]
x_n = [       ] x_{n-1} + [     ] u_n,
[-w   a ]           [ b_2 ]

y_n = [ c_1 c_2 ] x_{n-1} + d u_n.

This will have poles at z = a +/- jw.  I leave it to the interested
reader to find good values for the input, output and feedforward
matricies -- I haven't found a nice algorithmic method of doing this
yet, so I always just fumble around manipulating the above equation
symbolically in MathCad until I'm happy.

Note that you _will_ have to pay attention to input scaling with your
b_1 and b_2 values -- since it's resonant this filter's state values
will easily grow quite large.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
```