# Fixed Point issues

Started by July 22, 2003
```Hello,
I am trying to implement an FIR and an IIR filter in fixed point
arithmetic.
The actual filter is of no importance (so I have implemented the
simplest filter possible), as long as it is in fixed point and
I can give the desired precision as input.
I wrote the following piece of code, but the results are totally bogus.
Can anybody suggest some help? I don't know whether to blame my
programming skills (I am a total newbie in C) or the fixed point
implementation scheme.
I implement fixed point arithmetic by: 1. Multiplying the coefficients by
2^precision, finding the 2 least significant bits of the output, and my
final result is floor[(output/2^precision) + lsb's of output]

Code follows :
//Fixed Point FIR filter
#include <stdio.h>
#include <iostream.h>
#include <math.h>

#define SAMPLE double  /* type used for data samples */

// Zero the delay line of the filter before beginning
void clear(int ntaps,SAMPLE d[])
{
int i;
for(i=0;i<ntaps;i++){
d[i]=0;
}

}

//FIR FILTER FUNCTION
SAMPLE fir(int input,int ntaps,const SAMPLE c[],SAMPLE d[])
{
int i=0;
SAMPLE acc;
//store input at the beginning of delay

d[0]=input;
//calculate filter
acc=0;
for (i=0;i<ntaps;i++){
acc+=c[i]*d[i];
}
//shift delay
for (i=ntaps-2;i>=0;i--){
d[i+1]=d[i];
}
return acc;
}

int main(void)
{
#define ntaps 7
//#define precision 8
static SAMPLE c[ntaps]={-0.06453888262870,  -0.04068941760916,
0.41809227322162,
0.78848561640558,   0.41809227322162,  -0.04068941760916,
-0.06453888262870};
SAMPLE true_c[ntaps];
static SAMPLE d[ntaps];
#define inp_size 3*ntaps
static SAMPLE inp[inp_size];
SAMPLE output,output1;
int precision;
int lsb,temp_output;
int bit_true_out;
int i,j;
FILE *fp;

cout<<"Give precision of bit true operations:\n";
cin>>precision;

//perform bit true operation on coefficient  matrix
for(j=0;j<ntaps;j++){
true_c[j]=pow(2,precision)*c[j];
}

//create impulse input signal

clear(inp_size,inp);
inp=1.0;
inp=1.0;
inp=1.0;

//apply filter to inputs and print results to file res.txt
fp=fopen("res.txt","a");
clear(ntaps,d);
for (i=0;i<inp_size;i++){
output1=fir(inp[i],ntaps,c,d);
output=fir(inp[i],ntaps,true_c,d);
temp_output=output;
lsb=3&temp_output;
bit_true_out=floor((temp_output/pow(2,precision))+lsb);
fprintf(fp,"---------------------------------------\n");
fprintf(fp,"Precision = %d \n",precision);
fprintf(fp,"Normal Output\n");
fprintf(fp,"%3.4f",(double)output1);
fprintf(fp,"\n");
fprintf(fp,"Temp_out=%d,lsb=%d\n",(int)temp_output,(int)lsb);
fprintf(fp,"\nBit true output\n");
fprintf(fp,"%d",(int)bit_true_out);
fprintf(fp,"\n");
fprintf(fp,"---------------------------------------\n");
}
fclose(fp);
}
```
```Mike Rosing <rosing@neurophys.wisc.edu> wrote in message

<snipped>
>
> I haven't looked at the code, but I think to implement fixed point
> you should either stick with integer math or use a simple limit test.
> After every computation, check if your double is > 1.0 or < 2^(-precision).
> if > 1.0, set the result to .99999999.... (1.0-2^(-precision) actually)
> and if < 2^(-precision) set the result to 0.0.  That simulates saturation
> and round off in a better way.
>
> Patience, persistence, truth,
> Dr. mike

Mike,

That gives saturation, but not rounding, except for the smallest numbers.

Dirk

Dirk A. Bell
DSP Consultant
```
```X,

Your code assumes the filter is even symmetric.  It is, but you need
to realize that if you violate the assumption after you have fixed

What is 'lsb' supposed to be? The way you are using it makes no sense.
Set it to 0 for now, and figure out what it is supposed to do later.

You are using small floats as inputs, scaling the filter coefficients
to be ints, removing the filter scaling from the result (adding 'lsb'
which you should not be doing the way you are doing) and taking the
integer part.  The outputs would be expected to be very small floats
that when converted to integer are severely quantized (a few small
integers). If you want to output int then scale the input floats up to
a corresponding int range before you filter.

Suggestion: Get this working for floating point computations first. Then
scale the inputs up to be large integers and verify it is still working.
Then change the filter over from float to fixed and verify it is still
working. Then quantize the output and verify you are getting what you
expect.

There are some other things to keep in mind when you start scaling
things up for int computation, such as the range of numbers you can
handle.

Dirk

Dirk A. Bell
DSP Consultant

> Hello,
> I am trying to implement an FIR and an IIR filter in fixed point
> arithmetic.
> The actual filter is of no importance(so I have implemented the
> simplest filter possible),as long as it is in fixed point,and moreover
> as I can give as input the desirable precision.
> I wrote the following piece of code,but the results are totaly bogus..
> Can anybody suggest some help?I dont know if I have to blame my
> programming skills(I am a total newbie in C) or the fixed point
> implementation scheme.
> I implement fixed point arithmetic by : 1.Multipying coefficients by
> 2^precision,findind the 2 least significant bits of the output,and my
> final result is the floor[(output/2^precision)+lsb's of output)]
>
> Code follows :
> //Fixed Point FIR filter
> #include <stdio.h>
> #include <iostream.h>
> #include <math.h>
>
> #define SAMPLE double  /* type used for data samples */
>
> // Zero the delay line of the filter before beginning
> void clear(int ntaps,SAMPLE d[])
> {
> 	int i;
> 		for(i=0;i<ntaps;i++){
> 			d[i]=0;
> 		}
>
> }
>
> //FIR FILTER FUNCTION
> SAMPLE fir(int input,int ntaps,const SAMPLE c[],SAMPLE d[])
> {
> 	int i=0;
> 	SAMPLE acc;
> 	//store input at the beginning of delay
>
> 		d[0]=input;
> 	//calculate filter
> 	acc=0;
> 	for (i=0;i<ntaps;i++){
> 		acc+=c[i]*d[i];
> 	}
> 	//shift delay
> 	for (i=ntaps-2;i>=0;i--){
> 		d[i+1]=d[i];
> 	}
>  return acc;
> }
>
> int main(void)
> {
>     #define ntaps 7
> //#define precision 8
> 	static SAMPLE c[ntaps]={-0.06453888262870,  -0.04068941760916,
> 0.41809227322162,
>    0.78848561640558,   0.41809227322162,  -0.04068941760916,
> -0.06453888262870};
> 	SAMPLE true_c[ntaps];
> 	static SAMPLE d[ntaps];
> 	#define inp_size 3*ntaps
> 	static SAMPLE inp[inp_size];
> 	SAMPLE output,output1;
> 	int precision;
>     int lsb,temp_output;
> 	int bit_true_out;
> 		int i,j;
> 		FILE *fp;
>
> 		cout<<"Give precision of bit true operations:\n";
> 		cin>>precision;
>
> 		//perform bit true operation on coefficient  matrix
> 		for(j=0;j<ntaps;j++){
> 			true_c[j]=pow(2,precision)*c[j];
> 		}
>
> 	//create impulse input signal
>
> 	clear(inp_size,inp);
> 		inp=1.0;
> 		inp=1.0;
> 		inp=1.0;
>
>
>     //apply filter to inputs and print results to file res.txt
> 	fp=fopen("res.txt","a");
> 	clear(ntaps,d);
> 	for (i=0;i<inp_size;i++){
> 		output1=fir(inp[i],ntaps,c,d);
> 	output=fir(inp[i],ntaps,true_c,d);
> 	temp_output=output;
> 	lsb=3&temp_output;
> 	bit_true_out=floor((temp_output/pow(2,precision))+lsb);
> 	fprintf(fp,"---------------------------------------\n");
> 	fprintf(fp,"Precision = %d \n",precision);
> 	fprintf(fp,"Normal Output\n");
> 	fprintf(fp,"%3.4f",(double)output1);
> 			fprintf(fp,"\n");
> 			fprintf(fp,"Temp_out=%d,lsb=%d\n",(int)temp_output,(int)lsb);
> 			fprintf(fp,"\nBit true output\n");
> 			fprintf(fp,"%d",(int)bit_true_out);
> 			fprintf(fp,"\n");
> 	fprintf(fp,"---------------------------------------\n");
> 	}
>  fclose(fp);
> }
```
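The end point of Dirk's step-by-step suggestion, an FIR filter that runs entirely on integers, could look roughly like the sketch below. It is an assumption-laden illustration rather than a fix of the posted program: the names `to_fixed` and `fir_fixed`, the 64-bit accumulator, and round-to-nearest on the output are choices made here, and the right shift of a negative value is assumed to be arithmetic, which is implementation-defined in C but holds on common compilers.

```c
#include <stdint.h>

#define NTAPS 7

/* Hypothetical helper: scale a real coefficient by 2^precision and
   round to the nearest integer. */
int32_t to_fixed(double x, int precision)
{
    double scaled = x * (double)((int64_t)1 << precision);
    return (int32_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* One filter step using only integer arithmetic.
   c[] holds coefficients with `precision` fractional bits (precision >= 1),
   d[] is the delay line of integer input samples. */
int32_t fir_fixed(int32_t input, const int32_t c[], int32_t d[], int precision)
{
    int64_t acc = 0;
    int i;

    /* shift the delay line, then store the new sample at the front */
    for (i = NTAPS - 1; i > 0; i--) {
        d[i] = d[i - 1];
    }
    d[0] = input;

    /* multiply-accumulate in a wide accumulator */
    for (i = 0; i < NTAPS; i++) {
        acc += (int64_t)c[i] * d[i];
    }

    /* remove the coefficient scaling: add half an LSB, then shift
       (assumes an arithmetic right shift for negative values) */
    acc += (int64_t)1 << (precision - 1);
    return (int32_t)(acc >> precision);
}
```

Here the inputs are expected to be integers already scaled to a useful range, for example an impulse of height 1000 rather than 1.0, which is the point Dirk makes about the output otherwise collapsing to a few small integers. The range question he raises shows up in the accumulator: its worst-case magnitude is the largest input magnitude times the sum of the absolute coefficient values, so the accumulator type has to be wide enough for that product.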
```Dear Dirk,
My "scheme" for the fixed point implementation is this:
a. Multiply the filter coefficients by 2^precision
c. Convolution
d. Divide the output by 2^precision
e. Mask the result with 3 to find its two least significant bits (lsb).
(Is this where I made the mistake? And how should I do it?)
f. The fixed point output is floor[output/2^add_bits + lsb]

Where is the mistake in this? I read about this fixed point implementation
in the following paper: "A VLSI Architecture for Lifting-Based
Forward and Inverse Wavelet Transform", by
Andra, Chakrabarti, Acharya, IEEE Transactions on Signal Processing, Vol.
50, No. 4, April 2002. The ultimate goal is to implement fixed point
wavelet transforms sometime...