Comp.DSP | Extended Precision Floating vs Fixed Point - Sharc

Extended Precision Floating vs Fixed Point - Sharc - Al Clark - 2003-12-16 13:22:00

There have been several papers comparing filter topologies and filter 
noise performance. There have also been several papers comparing fixed 
point to floating point processing.

Has anyone examined Extended Floating Point (32 bit mantissa) versus 32 
bit fixed point? Both options are available with a Sharc.

FIR case: 

Fixed point should work very well since you have an 80 bit accumulator 
that is only rounded once after all the MACs. I'm not sure what the 
floating point tradeoff might be: you might have multiplying 
advantages with small coefficients, but results are always summed into a 
floating point result, which will effectively reduce the contribution 
of small numbers. 
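A minimal Python sketch of the fixed-point FIR path described above, assuming Q1.31 samples and coefficients, plain integer products (Q2.62), and a signed 80-bit accumulator modeled with Python's unbounded integers. The function name and the overflow check are illustrative only; the real SHARC multiplier's fractional-mode alignment may differ from this model.

```python
from fractions import Fraction

ACC_BITS = 80                      # modeled accumulator width (signed)
FRAC = 31                          # Q1.31 samples and coefficients
Q31_MIN, Q31_MAX = -(1 << 31), (1 << 31) - 1

def fir_fixed_wide_acc(x, h):
    """Dot product of Q1.31 ints, accumulated exactly and rounded once to Q1.31."""
    acc = 0
    for xk, hk in zip(x, h):
        acc += xk * hk                         # exact 64-bit product (Q2.62), no rounding
        # 80 - 64 = 16 guard bits: check the running sum still fits the accumulator
        if not -(1 << (ACC_BITS - 1)) <= acc < (1 << (ACC_BITS - 1)):
            raise OverflowError("accumulator overflow")
    y = (acc + (1 << (FRAC - 1))) >> FRAC      # the single rounding step, back to Q1.31
    return max(Q31_MIN, min(Q31_MAX, y))       # saturate to the 32-bit output range

x = [1 << 30, -(1 << 29), 12345]               # 0.5, -0.25 and a tiny value in Q1.31
h = [1 << 29, 1 << 28, 3]                      # 0.25, 0.125 and a tiny coefficient
y = fir_fixed_wide_acc(x, h)
exact = Fraction(sum(a * b for a, b in zip(x, h)), 1 << FRAC)
print(y, float(exact))
```

Because nothing is rounded until the end, the output is within half an LSB of the exact sum no matter how many taps there are, as long as the guard bits are not exhausted.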

IIR case:

One acid test is a high-Q, low-frequency bandpass. With a fixed point 
implementation I would use Direct Form I (perhaps with noise shaping).
What are the floating point tradeoffs?
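For the fixed-point side of the acid test, here is a rough Python model of a Direct Form I biquad with optional first-order error feedback, which is one simple form of noise shaping (higher-order shaping is also used). Assumptions: Q1.31 samples, Q2.30 coefficients so that a1 near -2 still fits, an exact wide accumulator, and bandpass coefficients from the RBJ Audio EQ Cookbook formulas with illustrative values f0 = 50 Hz, fs = 48 kHz, Q = 20. The reference uses the same quantized coefficients in double precision, so the measured error reflects only the arithmetic rounding.

```python
import math

FRAC_X = 31                         # samples in Q1.31
FRAC_C = 30                         # coefficients in Q2.30, since |a1| approaches 2
X_MIN, X_MAX = -(1 << 31), (1 << 31) - 1

def to_fixed(v, frac):
    return int(round(v * (1 << frac)))

def bandpass_coeffs(f0, fs, q):
    """RBJ cookbook bandpass (0 dB peak), normalized so that a0 = 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0), (-2 * math.cos(w0) / a0, (1 - alpha) / a0)

def df1_fixed(x, bq, aq, error_feedback):
    """Direct Form I biquad on Q1.31 ints; output rounded to Q1.31 once per sample."""
    x1 = x2 = y1 = y2 = e_prev = 0
    out = []
    for x0 in x:
        acc = bq[0] * x0 + bq[1] * x1 + bq[2] * x2 - aq[0] * y1 - aq[1] * y2  # exact
        if error_feedback:
            acc += e_prev                                # re-inject last rounding error
        y_round = (acc + (1 << (FRAC_C - 1))) >> FRAC_C  # round to Q1.31
        e_prev = acc - (y_round << FRAC_C)               # error left behind by rounding
        y0 = max(X_MIN, min(X_MAX, y_round))             # saturate
        x1, x2, y1, y2 = x0, x1, y0, y1
        out.append(y0)
    return out

def df1_reference(x, bq, aq):
    """Same quantized coefficients, double-precision math, output in Q1.31 LSBs."""
    b = [c / (1 << FRAC_C) for c in bq]
    a = [c / (1 << FRAC_C) for c in aq]
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xi in x:
        x0 = xi / (1 << FRAC_X)
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x1, x2, y1, y2 = x0, x1, y0, y1
        out.append(y0 * (1 << FRAC_X))
    return out

fs, f0 = 48000.0, 50.0
b, a = bandpass_coeffs(f0, fs, 20.0)
bq = [to_fixed(c, FRAC_C) for c in b]
aq = [to_fixed(c, FRAC_C) for c in a]
x = [to_fixed(0.5 * math.sin(2 * math.pi * f0 * n / fs), FRAC_X) for n in range(12000)]

ref = df1_reference(x, bq, aq)
err_plain = max(abs(y - r) for y, r in zip(df1_fixed(x, bq, aq, False), ref))
err_shaped = max(abs(y - r) for y, r in zip(df1_fixed(x, bq, aq, True), ref))
print(f"max error: plain {err_plain:.1f} LSB, error feedback {err_shaped:.1f} LSB")
```

Feeding the previous rounding error back puts a zero at DC in the noise transfer function, which suppresses much of the noise that the low-frequency poles would otherwise amplify.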


-- 
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com


Re: Extended Precision Floating vs Fixed Point - Sharc - Jon Harris - 2003-12-16 14:59:00

Hi, Al.  This is one of my favorite topics, actually!

First of all, keep in mind that the SHARC's extended precision
floating-point actually achieves a 33-bit mantissa because of the "hidden
bit" in the IEEE floating point format.  A small point, but one at least
worth mentioning.
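To make the hidden-bit arithmetic concrete, here is a small Python model of a 40-bit floating-point word, assuming a layout of 1 sign bit, an 8-bit exponent with IEEE-style bias 127, and 31 stored fraction bits (normal numbers only; zeros, denormals, infinities and NaNs are ignored). Under that assumption the implied leading 1 lets 31 stored bits carry 32 significant magnitude bits, and adding the sign bit gives the 33-bit sign-magnitude width. The function names are hypothetical.

```python
import math

BIAS = 127
FRAC_BITS = 31                              # stored fraction bits (assumed layout)

def encode40(v):
    """Pack a normal, nonzero float as sign | 8-bit exponent | 31-bit fraction."""
    sign = 1 if v < 0 else 0
    m, e = math.frexp(abs(v))               # abs(v) = m * 2**e with 0.5 <= m < 1
    sig = round(m * (1 << (FRAC_BITS + 1))) # 32 significant bits, round-half-even
    if sig == 1 << (FRAC_BITS + 1):         # rounding carried into the next binade
        sig >>= 1
        e += 1
    exp_field = (e - 1) + BIAS              # abs(v) = (sig / 2**31) * 2**(e - 1)
    if not 0 < exp_field < 255:
        raise ValueError("outside the normal range of this sketch")
    frac_field = sig - (1 << FRAC_BITS)     # the leading 1 is implied, not stored
    return (sign << 39) | (exp_field << FRAC_BITS) | frac_field

def decode40(word):
    """Unpack a normal 40-bit word, restoring the hidden leading 1."""
    sign = (word >> 39) & 1
    exp_field = (word >> FRAC_BITS) & 0xFF
    sig = (1 << FRAC_BITS) | (word & ((1 << FRAC_BITS) - 1))
    value = math.ldexp(sig, exp_field - BIAS - FRAC_BITS)
    return -value if sign else value

exact32 = 1 + 2**-31    # needs 32 significant bits: representable
too_fine = 1 + 2**-32   # needs 33 significant bits: rounds to 1.0
print(decode40(encode40(exact32)) == exact32, decode40(encode40(too_fine)))  # True 1.0
```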

In my application, I ended up using the floating-point mode.  But this was
primarily because the whole system needed to be floating point, not because
the filters are better or worse.  With that said, see specific comments
below.

"Al Clark" <d...@danvillesignal.com> wrote in message
news:Xns94537DF0991Daclarkdanvillesignal@66.133.130.30...
> There have been several papers comparing filter topologies and filter
> noise performance. There have also been several papers comparing fixed
> point to floating point processing.
>
> Has anyone examined Extended Floating Point (32 bit mantissa) versus 32
> bit fixed point? Both options are available with a Sharc.
>
> FIR case:
>
> Fixed point should work very well since you have an 80 bit accumulator
> that is only rounded once after all the MACs. I'm not sure what the
> floating point tradeoff might be: you might have multiplying
> advantages with small coefficients, but results are always summed into a
> floating point result, which will effectively reduce the contribution
> of small numbers.

What is the precision of your source data?  If your original source data is
24 bits or less, I would think the 40-bit floating point would be more than
adequate for an FIR.  Granted, you may be doing some rounding after every MAC,
but you still have plenty of extra bits (e.g. at least 9 in the worst case if
your source data is 24-bit), so I don't think there would be any significant loss.
I don't think it's really necessary to keep _all_ the extra bits, just
enough to ensure accuracy of the final rounded result.
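The extra-bits argument can be checked numerically with a rough Python model: `round_sig` (a hypothetical helper) mimics a float with a 32-bit significand (assumed 31 stored bits plus the hidden bit) by rounding after every multiply and every add, and the result is compared with an exact rational sum for 24-bit input data. The random coefficients, tap count and seed are arbitrary; double rounding through Python's 53-bit floats makes this an approximation, not a bit-exact SHARC model.

```python
import math
import random
from fractions import Fraction

def round_sig(v, bits):
    """Round v to `bits` significant bits (round-half-even), like a narrower float."""
    if v == 0.0 or not math.isfinite(v):
        return v
    m, e = math.frexp(v)                        # v = m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(m * (1 << bits)), e - bits)

def fir_float(x, h, bits=32):
    """FIR output with every multiply and every add rounded to `bits` bits."""
    acc = 0.0
    for xk, hk in zip(x, h):
        acc = round_sig(acc + round_sig(xk * hk, bits), bits)
    return acc

rng = random.Random(1)
taps = 64
raw = [rng.uniform(-1.0, 1.0) for _ in range(taps)]
scale = sum(abs(r) for r in raw)                   # keep sum(|h|) about 1: no overflow
h = [round_sig(r / scale, 32) for r in raw]        # coefficients stored at 32 bits
x = [rng.randint(-(1 << 23), (1 << 23) - 1) / (1 << 23) for _ in range(taps)]  # 24-bit

y = fir_float(x, h)
exact = sum(Fraction(xk) * Fraction(hk) for xk, hk in zip(x, h))
err_lsb = abs(Fraction(y) - exact) * (1 << 23)     # error in 24-bit LSBs
print(f"error = {float(err_lsb):.2e} of a 24-bit LSB")
```

With the per-MAC rounding errors bounded near 2^-33 each, even 128 of them stay well below half a 24-bit LSB, which is the sense in which the extra bits are "enough".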

The 80-bit accumulator with fixed point works excellently as well (I used
that in another job).  There is no loss of precision in the multiplies and
you have plenty of guard bits for overflow.

I think both methods would achieve essentially equivalent results, so the
decision may hinge more on other factors such as:
1) With floating-point, you don't get a true MAC, just a parallel
multiply/add which can be used to make a "pipelined MAC" with some (usually
small) overhead.
2) With the accumulator, there are usually some required instructions to set
up and get the data out in a usable form.
3) There are only two 80-bit fixed-point accumulators, but sixteen 40-bit
floating-point "accumulators" (32 if you count the background registers).
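Point 1 can be pictured with a software-pipelining sketch in Python: each loop iteration stands for one parallel instruction that starts the multiply for tap k while adding the product of tap k-1, and the extra prologue multiply and epilogue add are the small overhead mentioned. This models only the scheduling idea; it is not SHARC assembly, and the function name is hypothetical.

```python
def pipelined_mac(x, h):
    """Dot product where the multiply for tap k overlaps the add for tap k-1."""
    n = min(len(x), len(h))
    if n == 0:
        return 0.0
    acc = 0.0
    prod = x[0] * h[0]                       # prologue: first multiply, nothing to add yet
    for k in range(1, n):
        acc, prod = acc + prod, x[k] * h[k]  # one "parallel" multiply/add step
    return acc + prod                        # epilogue: last add, no multiply

print(pipelined_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # prints 32.0
```

The additions happen in the same order as in a plain sequential loop, so the result is identical; only the schedule changes.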

> IIR case:
>
> One acid test is a high-Q, low-frequency bandpass. With a fixed point
> implementation I would use Direct Form I (perhaps with noise shaping).
> What are the floating point tradeoffs?

In my experience, the filter form is at least as important as, if not more
important than, the issue of floating-point vs. fixed-point and precision.
I've had excellent results in floating-point with the 4-multiply normalized
Lattice/Ladder form, though there is obviously some cost in execution and
extra coefficient storage.  The Direct Form II Transposed is the best of the
Direct Forms in my experience.

One key with floating-point is to store the delay elements/state variables
(the "Z's") as 40-bit data (using 48-bit wide memory). Without this, you are
losing much of the benefit of the extended precision--for example in your
acid test case, the delay elements contribute strongly to the result.  This
may cost you in terms of memory usage, execution time, and/or programming
hassle factor.  In my particular application, it turned out to work better
to use the Lattice/Ladder form where I could get away with 32-bit delay
element storage rather than deal with 40-bit storage.  YMMV.
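The effect of delay-element storage width can be explored with a rough Python model of a Direct Form II Transposed biquad: arithmetic is rounded to a 32-bit significand (assumed 40-bit format) and the "Z's" are stored either at the same width or at 24 bits (IEEE single, i.e. 32-bit storage). The hypothetical `round_sig` helper ignores exponent range and SHARC rounding-mode details, and the high-Q, low-frequency bandpass uses the RBJ Audio EQ Cookbook formulas with illustrative values; the exact error figures depend on coefficients and input.

```python
import math

def round_sig(v, bits):
    """Round v to `bits` significant bits (round-half-even)."""
    if v == 0.0 or not math.isfinite(v):
        return v
    m, e = math.frexp(v)
    return math.ldexp(round(m * (1 << bits)), e - bits)

def df2t(x, b, a, math_bits, state_bits):
    """DF2T biquad (a0 = 1): arithmetic rounded to math_bits,
    delay elements (the "Z's") stored rounded to state_bits."""
    def r(v):
        return round_sig(v, math_bits)
    z1 = z2 = 0.0
    out = []
    for xn in x:
        y = r(r(b[0] * xn) + z1)
        z1_new = r(r(r(b[1] * xn) - r(a[0] * y)) + z2)
        z2_new = r(r(b[2] * xn) - r(a[1] * y))
        z1, z2 = round_sig(z1_new, state_bits), round_sig(z2_new, state_bits)
        out.append(y)
    return out

fs, f0, q = 48000.0, 50.0, 20.0
w0 = 2 * math.pi * f0 / fs
alpha = math.sin(w0) / (2 * q)
b = [round_sig(c / (1 + alpha), 32) for c in (alpha, 0.0, -alpha)]
a = [round_sig(c / (1 + alpha), 32) for c in (-2 * math.cos(w0), 1 - alpha)]
x = [0.5 * math.sin(2 * math.pi * f0 * n / fs) for n in range(12000)]

ref = df2t(x, b, a, 53, 53)                 # 53 bits = plain double, used as reference
err_40bit_z = max(abs(u - v) for u, v in zip(df2t(x, b, a, 32, 32), ref))
err_32bit_z = max(abs(u - v) for u, v in zip(df2t(x, b, a, 32, 24), ref))
print(f"40-bit Z's: {err_40bit_z:.2e}   32-bit Z's: {err_32bit_z:.2e}")
```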

Though I haven't tried 32-bit fixed-point IIRs on the SHARC, my hunch is
that there would be no advantage over "floating-point done right," i.e.
40-bit math _and_ storage.  The 80-bit accumulator provides little or no
benefit with an IIR biquad and the floating-point is always going to have
more resolution than the fixed (33 vs. 32 with signals close to full scale).
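A quick Python check of the resolution comparison, assuming a Q1.31 fixed-point format and a float significand of 32 bits (31 stored fraction bits plus the hidden bit): near full scale the float step is half the fixed-point LSB, a one-bit difference, and for smaller signals the float step keeps shrinking with the exponent while the fixed-point step stays at 2^-31. The function name is hypothetical.

```python
import math

Q31_STEP = 2.0 ** -31          # one LSB of a Q1.31 (32-bit signed fractional) word

def float_step(v, sig_bits=32):
    """Spacing between adjacent floats near v for a sig_bits-bit significand."""
    _, e = math.frexp(abs(v))  # abs(v) = m * 2**e with 0.5 <= m < 1
    return 2.0 ** (e - sig_bits)

for v in (0.75, 0.1, 1e-3, 1e-6):
    print(f"{v:>8g}: float step / fixed step = {float_step(v) / Q31_STEP:.3g}")
```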

IMHO, the extended precision mode on the SHARC is about as ideal a format
to work with as there is in today's DSPs.  You have all the flexibility
and ease of programming of floating-point (no scaling/overflow to deal
with!), a very wide mantissa, and plenty of "accumulators".  I only wish
there were a true floating-point MAC, though I certainly understand that this
instruction would probably be the slowest path through the hardware and
consequently the limiting factor in clock speed.
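To illustrate the "no scaling/overflow" point, a small Python sketch of two Q1.31 values summing past full scale: plain two's-complement arithmetic wraps, saturating arithmetic clips, and floating point simply represents the larger result. The helper names are hypothetical; whether a given fixed-point operation wraps or saturates depends on the processor's mode settings.

```python
Q31_MIN, Q31_MAX = -(1 << 31), (1 << 31) - 1

def to_q31(v):
    return int(round(v * (1 << 31)))

def add_wrap(a, b):
    """32-bit two's-complement add: overflow silently wraps around."""
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s & 0x80000000 else s

def add_saturate(a, b):
    """32-bit add that clamps to the largest/smallest representable value."""
    return max(Q31_MIN, min(Q31_MAX, a + b))

a, b = to_q31(0.75), to_q31(0.5)
print(add_wrap(a, b) / 2**31)      # -0.75: wrapped around to the wrong sign
print(add_saturate(a, b) / 2**31)  # clipped just below 1.0
print(0.75 + 0.5)                  # 1.25: floating point needs no headroom planning
```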

-Jon
