DSPRelated.com
Forums

fixed point & floating point

Started by ranjeet September 3, 2004
Hi all!

As we all know, in fixed point it is easier to multiply numbers of the
same Q format, but there are difficulties in adding them. Correct?

So my question is this. Going back to basics, we realise that
multiplication is nothing but repeated addition, right?

Now, if we go for multiplication, that means there are a lot of
additions compared to a simple addition of the same numbers.

So why is it said that multiplication is simple compared to addition?

I am sure I am missing something, which is why this doubt has come up,
so please let me know on this point.

One more thing: most audio signal coding, that is, audio codecs, deals
in fixed-point notation. As you all know, we go to floating point to
get more precision. But it has been said that we go for a fixed-point
implementation of the audio codec. I read this in a DSP book.

Why do we go for the fixed-point implementation? One reason I can
think of is that it avoids some larger calculations compared to
floating point.

But don't you all think that we are also on the verge of losing data,
since the precision level in fixed point is lower than in floating
point?

Where is floating point implemented? And what are the constraints in a
real-time scenario for floating point and fixed point?

Thanks in advance,
Ranjeet.
Hi Ranjeet,

A number of your questions are addressed in the FAQ:

http://www.bdti.com/faq/2.htm#211

--RY

-- 
Randy Yates
Fuquay-Varina, NC
919-577-9882
<yates@ieee.org>
"With time with what you've learned, they'll kiss the ground you walk upon."
'21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
Ranjeet,

When taking Randy's suggestion, you will do well to rid yourself of some 
misconceptions.

Overflow can always happen when adding and, except in the Q format that
represents all numbers as smaller than one, when multiplying as well.

Although multiplication can be accomplished by repeated addition, it 
isn't done that way in practice. Instead, it's done with an algorithm 
I'm sure you understand. (If you don't, ask and I will explain.)

Hardware for floating-point calculation is more complex and uses more 
power than fixed-point hardware of the same speed.

For a given number of bits, fixed point is more precise than floating 
point. With floating point, some of the bits specify the range. With 
fixed point, all the bits are available to specify value. Floating point 
trades away precision to get increased range. The range of a codec is 
small enough not to benefit from that increase.
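
The precision claim above can be checked numerically. The following is a minimal sketch, assuming a 32-bit word in both cases (Q31 fixed point versus IEEE 754 single precision) and a hypothetical set of sample values that stay near full scale, as a codec's loud samples would. The helper names `to_float32` and `to_q31` are made up for this illustration.

```python
import struct

def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_q31(x):
    """Round x in [-1, 1) to the nearest Q31 value (31 fractional bits)."""
    n = round(x * 2**31)
    n = max(-2**31, min(2**31 - 1, n))  # saturate to the signed 32-bit range
    return n / 2**31

# Hypothetical test signal: 400 sample values spread across [0.5, 1).
samples = [0.5 + k * 0.001234567 for k in range(400)]

max_err_float32 = max(abs(to_float32(x) - x) for x in samples)
max_err_q31 = max(abs(to_q31(x) - x) for x in samples)

print(f"float32 worst error: {max_err_float32:.3e}")
print(f"Q31     worst error: {max_err_q31:.3e}")
```

For values in [0.5, 1) the float32 grid spacing is 2**-24 while the Q31 spacing is 2**-31, so the fixed-point error bound is 128 times smaller. For values very close to zero the comparison reverses, which is the extra range that floating point buys.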

Jerry
-- 
Engineering is the art of making what you want from things you can get.
ranjeet wrote:

> Hi all!
>
> As we all know, in fixed point it is easier to multiply numbers of the
> same Q format, but there are difficulties in adding them. Correct?
Both fixed-point and floating-point arithmetic present problems in addition. With fixed point you need to worry about overflow. With floating point you need to align the numbers (shift the mantissas around according to the exponents) before the addition, and you have to renormalize the result when you're done (find where the leading '1' is in the mantissa). Avoiding overflow is much less complex than dealing with normalization.
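
To illustrate how little work the fixed-point side needs, here is a minimal sketch of a Q15 addition with and without overflow protection. It assumes 16-bit two's-complement words. The function names are hypothetical, and saturation is only one common way of dealing with overflow (scaling the operands beforehand is another).

```python
Q15_MIN, Q15_MAX = -32768, 32767  # raw 16-bit range, i.e. [-1, 1) in Q15

def q15_add_wrap(a, b):
    """Add two raw Q15 values the way a plain 16-bit two's-complement adder would."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s

def q15_add_sat(a, b):
    """Add two raw Q15 values, clamping to the representable range on overflow."""
    return max(Q15_MIN, min(Q15_MAX, a + b))

a = b = 24576  # 0.75 in Q15
print(q15_add_wrap(a, b) / 32768)  # wraps around to a negative value
print(q15_add_sat(a, b) / 32768)   # clamps just below +1.0
```

The saturating version is one compare-and-clamp after the add, with no shifting or searching for a leading bit.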
> So my question is this. Going back to basics, we realise that
> multiplication is nothing but repeated addition, right?
>
> Now, if we go for multiplication, that means there are a lot of
> additions compared to a simple addition of the same numbers.

That's been dealt with in another post -- and most practical DSP hardware has built-in hardware multiply these days.

> So why is it said that multiplication is simple compared to addition?

Who said that?

> I am sure I am missing something, which is why this doubt has come up,
> so please let me know on this point.
>
> One more thing: most audio signal coding, that is, audio codecs, deals
> in fixed-point notation. As you all know, we go to floating point to
> get more precision. But it has been said that we go for a fixed-point
> implementation of the audio codec. I read this in a DSP book.

For the same number of bits you _lose_ precision with floating point, because you have to carry the exponent.

> Why do we go for the fixed-point implementation? One reason I can
> think of is that it avoids some larger calculations compared to
> floating point.
>
> But don't you all think that we are also on the verge of losing data,
> since the precision level in fixed point is lower than in floating
> point?
>
> Where is floating point implemented? And what are the constraints in a
> real-time scenario for floating point and fixed point?

Floating point operations, particularly addition, are much more complex and require either more time or more hardware (which leads to more size and more power).

> Thanks in advance
> Ranjeet.
In general, if you're working on a project that requires fast prototyping and won't have a large sales volume, then floating point is a good way to go. If you have power, size or product cost constraints that exceed your engineering time constraints, then fixed point is the way to go.

The reason is that with any kind of signal processing application you can go through the algorithm, predict the dynamic range for each operation, and scale your numbers appropriately for the problem at hand to fit into a fixed-point machine. It costs some engineering time up front, but it means that for nearly all products that use signal processing you really don't need to pay for floating point.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
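
The scaling step described above can be sketched as follows. This is only an illustration under stated assumptions: a 16-bit signed word, a hypothetical five-tap FIR filter, and a worst-case bound in which every tap contributes with the same sign. The function `frac_bits_for` and the coefficient values are invented for the example.

```python
def frac_bits_for(max_abs, word_bits=16):
    """Fractional bits that let magnitudes up to max_abs fit in a signed word."""
    int_bits = 0
    while max_abs >= 2**int_bits:   # grow the integer part until max_abs fits
        int_bits += 1
    return word_bits - 1 - int_bits  # one bit is reserved for the sign

# Hypothetical FIR coefficients and an input that uses the full Q15 range.
coeffs = [0.25, 0.5, 1.0, 0.5, 0.25]
input_peak = 32767 / 32768

# Worst-case output magnitude: every tap contributes with the same sign.
output_peak = input_peak * sum(abs(c) for c in coeffs)

input_frac = frac_bits_for(input_peak)
output_frac = frac_bits_for(output_peak)
print(f"input:  Q{15 - input_frac}.{input_frac}")
print(f"output: Q{15 - output_frac}.{output_frac}")
```

Here the input fits with 15 fractional bits, while the output needs two integer bits and therefore keeps 13 fractional bits. Repeating this for each operation in an algorithm is the up-front engineering time mentioned in the post.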
Tim Wescott wrote:

 > ranjeet wrote:


   ...

 >> so why it has been said that multiplication is simple as compared to
 >> the addition.
 >>
 > Who said that?


This could be a dimly understood and poorly remembered echo of accurate
information.

Hardware for floating-point addition is more complex than for
floating-point multiplication. To multiply, one simply multiplies the
mantissas and adds the exponents, then renormalizes. To add, the number
with the smaller exponent must be denormalized so that the exponents are
equal, then addition performed, followed by renormalizing.
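
The two procedures just described can be sketched on a toy format. This is not IEEE 754: it assumes non-negative values only, an 8-bit mantissa, truncation instead of rounding, and an unbounded exponent. All names are hypothetical. A value is the pair `(mant, exp)` standing for `mant * 2**exp`.

```python
MANT_BITS = 8  # toy mantissa width, chosen only to keep the numbers readable

def normalize(mant, exp):
    """Shift mant so its leading 1 sits in bit MANT_BITS-1, adjusting exp to match."""
    if mant == 0:
        return 0, 0
    while mant >= (1 << MANT_BITS):        # too wide: drop low bits (truncation)
        mant >>= 1
        exp += 1
    while mant < (1 << (MANT_BITS - 1)):   # too narrow: shift the leading 1 up
        mant <<= 1
        exp -= 1
    return mant, exp

def fp_mul(a, b):
    """Multiply: multiply the mantissas, add the exponents, renormalize."""
    (ma, ea), (mb, eb) = a, b
    return normalize(ma * mb, ea + eb)

def fp_add(a, b):
    """Add: align the smaller-exponent operand, add the mantissas, renormalize."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                            # make a the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= ea - eb                         # denormalize b so both share exponent ea
    return normalize(ma + mb, ea)

def value(x):
    """Convert a toy float back to a Python float for display."""
    mant, exp = x
    return mant * 2.0**exp

three = normalize(3, 0)
half = normalize(1, -1)
print(value(fp_mul(three, half)))  # 1.5
print(value(fp_add(three, half)))  # 3.5
```

Even in this stripped-down form the addition needs a compare, a swap and a variable shift before the actual add, while the multiplication goes straight to the arithmetic.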

Floating-point addition in software takes longer than floating-point
multiplication in software on those processors that can multiply signed
integers as fast as they can add them.

Ignoring the conditions can lead one to Ranjeet's erroneous conclusion.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

"Jerry Avins" <jya@ieee.org> wrote in message
news:4138b434$0$19722$61fed72c@news.rcn.com...
> Tim Wescott wrote:
>
> > ranjeet wrote:
>
> ...
>
> >> so why it has been said that multiplication is simple as compared to
> >> the addition.
> >>
> > Who said that?
>
> This could be a dimly understood and poorly remembered echo of accurate
> information.
>
> Hardware for floating-point addition is more complex than for
> floating-point multiplication. To multiply, one simply multiplies the
> mantissas and adds the exponents, then renormalizes. To add, the number
> with the smaller exponent must be denormalized so that the exponents are
> equal, then addition performed, followed by renormalizing.
>
> Floating-point addition in software takes longer than floating-point
> multiplication in software on those processors that can multiply signed
> integers as fast as they can add them.
>
> Ignoring the conditions can lead one to Ranjeet's erroneous conclusion.

When I was in school, one project we had was to write floating-point add and multiply routines in assembler on a Motorola micro (with fixed point only). I remember that the multiply was considerably easier than the add, for the reasons Jerry mentioned above, but that neither was trivial.

Jerry Avins wrote:


> Floating-point addition in software takes longer than floating-point
> multiplication in software on those processors that can multiply signed
> integers as fast as they can add them.
>
> Ignoring the conditions can lead one to Ranjeet's erroneous conclusion.
>
> Jerry

Unless the processor also has a way-fast normalization instruction. I was _very_ surprised at the speed of floating point on the TMS320F28xx processor with Code Composter. At the place I used to work I had a fixed-point arithmetic library. It has saturating adds and fractional multiply to slow it down, and it was just about as fast as floating point on the 'F28xx -- on anything else floating point was way slower, even on a Pentium with fast floating point (no, I never tried it on a FP DSP).

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
"ranjeet" <ranjeet.gupta@gmail.com> wrote in message 
news:77c88a3b.0409030108.19b9f2aa@posting.google.com...
> Hi all!
>
> As we all know, in fixed point it is easier to multiply numbers of the
> same Q format, but there are difficulties in adding them. Correct?
>
> So my question is this. Going back to basics, we realise that
> multiplication is nothing but repeated addition, right?
>
> Now, if we go for multiplication, that means there are a lot of
> additions compared to a simple addition of the same numbers.
>
> So why is it said that multiplication is simple compared to addition?

I believe that what you are thinking of here is that it is easier to multiply two 16-bit Q15 numbers together than it is to add them. The reason is that a Q15 number is bounded to [-1, 1). Therefore, when you multiply, you remain within those bounds (apart from the single case of -1 times -1), but if you add, you may overflow.

Brad
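
A quick check of that bound, as a minimal sketch: it assumes raw 16-bit Q15 integers, a truncating right shift after the multiply, and a hypothetical helper `q15_mul`. Since the product is largest in magnitude at the corners of the input range, testing the four corner cases is enough.

```python
def q15_mul(a, b):
    """Multiply two raw Q15 values: full-width product, then shift back to Q15."""
    return (a * b) >> 15

lo, hi = -32768, 32767  # raw Q15 range, representing [-1, 1)

# The extreme products occur at the corners of the input range.
corners = [q15_mul(a, b) for a in (lo, hi) for b in (lo, hi)]
print(corners)  # [32768, -32767, -32767, 32766]

# The extreme sums, by contrast, fall far outside the 16-bit range.
print(lo + lo, hi + hi)  # -65536 65534
```

Only the corner -1 times -1 produces a result (32768, that is +1.0) that does not fit, which is why many DSPs saturate that single case. Addition can leave the range for a large share of input pairs.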