DSPRelated.com
Forums

Re: filter accuracy

Started by Jeff Brower July 1, 2008
Christophe-
> I have a project with a downsampling filter on C6713.
> When using the normal routine, the result is pretty much what I have in Matlab
> (errors are 1e-10), but this non optimised routine is taking something like 2
> cycles per tap.
>
> When I am optimising it (still the same routine in C, but with optimisation
> directives for the compiler), the routine takes 1/2 cycle per tap, but there is a
> small error. The error is approx 1e-7, which is quite low, but which I cannot
> explain.
>
> Can somebody help me understand what the difference in calculations can be
> between the optimised and non-optimised versions?

Is your downsampling filter an FIR, so that only multiply and add calculations are going on?

Are your C code vars defined as floats or doubles? The C67xx has both
single-precision and double-precision floating-point instructions; maybe compiler
optimization is somehow re-arranging things so that you're seeing all or partially
single-precision results. Maybe if you "enforce" double-precision results using
typecasting or some other method, you could get back the more accurate results. You
might lose your speed increase, though.

-Jeff
Thanks for the answer.

Sorry, I didn't mention that: I am using normal FIR filtering, with single-precision float values.
Another thing is that if I use the assembly FIR filter routine from TI, I get exactly the same error of 1e-7 and the same speed. But with the standard C function (the one that appears as a comment in the TI assembly routine), the error drops to 1e-10 or so.

Chris

Christophe-

> Thanks for answer.
>
> Sorry didn't mention that, I am using a normal FIR filtering, with single precision
> float values.

Ok.
> Another thing is that if I am using the Assembly FIR filter routine from TI, I have
> the exact same error of 1e-7 and same speed.
> But with standard C function (the one described that is in comment in TI assembly
> routine), the error lowers to 1e-10 or so.

Well, the obvious next step is to compare the asm code generated by the C compiler
(using its non-optimized settings) with the asm filter routine. My guess is there
would be an obvious difference that reveals an assumption made by the C compiler
that we're not thinking of yet.

-Jeff
I assume you have both the data and the filter coeffs 8-byte aligned?

- Andrew E.

Hi Christophe, Jeff, Mike,


I do not know what has been modified in TI's original C code by Christophe;
anyway, the answer is simple: the error depends on the order of MAC evaluations,
and this fact is in the very nature of all floating-point calculations.

The original code evaluates MACs sequentially, one by one until the dot product
is computed.

The optimized code (TI's) calculates several MACs at once (or, better said, in a
single inner-loop iteration), hence updating several accumulators; I have seen
anywhere from two to eight in the DSPLib FIR routines. In the final phase the
accumulators are summed together to get the final dot-product value. The order of
the MACs has changed, therefore the final result is slightly off.

If you run both the original and the optimized routine on different data sets
(both filter coefficients and data), you will see that the errors vary; either
routine might produce the more or the less precise result.

Just a simple example. Let's set aside the multiplication part of the basic MAC
in the dot-product calculation and consider only its addition part. Then we need
to calculate a sum: x(0) + x(1) + ... + x(n-1). Now, let n = 4 and
x = (1, eps, -1, eps). Here eps (short for epsilon) is a number such that 1 + eps
rounds to 1 in floating-point arithmetic; for 32-bit single precision,
eps = 2**(-24), or about 5.9604645e-8. Let's calculate the sum sequentially:
S(0) = 1
S(1) = 1 + eps = 1
S(2) = 1 - 1 = 0
S(3) = 0 + eps = eps
and this final result is a factor of two off the exact math result of 2*eps.

An optimized routine may well calculate two running sums in parallel, in this
order:
S0 = sum of x(2*k+0), all the even-index entries
S1 = sum of x(2*k+1), all the odd-index entries

The final result is S0 + S1.

The running sums here would be S0 = 1 + (-1) = 0 and S1 = eps + eps = 2*eps.
Adding them together gives 2*eps, which is exactly the correct result. The
relative error between the two methods is 0.5, or 50%! There are numerous
examples of how finite precision can lead to incorrect results in floating-point
calculations...

I hope this makes things clearer,

Rgds,
Andrew
> Date: 02-Jul-2008 17:03:52 -0700
> From: Jeff Brower
> Subject: Re: [c6x] Re: filter accuracy
>
> So what this means is that if Christophe chose data where each coefficient is a
> number that exactly "fits" a discrete IEEE 32-bit floating-point representation (i.e.
> with no residual error), then he would see no difference between optimized and
> non-optimized calculations? Is that something he can try?

Hi Jeff,

Yes, this is absolutely correct. In that case the results obtained by the
different methods would be the same, with no discrepancy at all.

Two suggestions: first, any fp multiplication by a power of 2 produces no
roundoff, so good candidates for filter coefficients are small powers of 2.

Second, any integer of magnitude in [0, 2**24] is represented exactly in the
single-precision fp system, which hints that a good data set is a collection of
small integers.

If the final dot product is less than 2**24 in magnitude, it is also exactly
representable, provided every term (each equal to a power of 2 times an
integer) is also exactly representable.

Rgds,
Andrew
Andrew-


So what this means is that if Christophe chose data where each coefficient is a
number that exactly "fits" a discrete IEEE 32-bit floating-point representation (i.e.
with no residual error), then he would see no difference between optimized and
non-optimized calculations? Is that something he can try?

-Jeff
Thank you all.
Andrew, that was a brilliant demonstration; it is now loud and clear to me.
I will adapt my Matlab code to produce the correct coefficients, and check this.

Thanks a lot.