DSPRelated.com
Forums

Fixed point arithmetic

Started by thunder October 26, 2009
Hi

I have a question regarding Fixed point Arithmetic addition.

For example, I have two fixed-point numbers:

a = unsigned Q7.8 format (7-bit integer, 8-bit fractional).
b = unsigned Q7.8 format (same as a).

Now a + b = c, where c is an unsigned Q8.8 result.

Qs: How do I transform c into d, where d is an unsigned Q7.9 result?

The way I have tried to approach it is as follows:

Integer part
----------------
The way I have thought about the integer part is to say that if bit
[15] of the result c is a '1', then bits [14:8] of d are b"111_1111",
otherwise d[14:8] = c[14:8].

Is this correct?

Fractional Part
----------------------

The way I have thought about the fractional part is that for d, I want
one extra fractional bit to increase the fractional precision.

The obvious way to me seems to be to add an extra bit at the LSB end,
i.e. d[8:0] = c[7:0] & 1'b0.

Is this correct?

Qs: Can anyone recommend a good book on fixed-point and floating-point
arithmetic?

Thanks in advance

J
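To make the bit counts in the question concrete, here is a minimal Python sketch of the ranges involved. The helper name `to_real` and the constants are illustrative assumptions, not something from the original post; raw values are treated as plain unsigned integers.

```python
def to_real(raw, frac_bits):
    """Interpret an unsigned raw integer as a fixed-point value."""
    return raw / (1 << frac_bits)

Q7_8_MAX = (1 << 15) - 1      # largest 15-bit raw value, 0x7FFF
c_max = Q7_8_MAX + Q7_8_MAX   # worst-case raw sum of a and b, 0xFFFE

print(to_real(Q7_8_MAX, 8))   # 127.99609375  (top of unsigned Q7.8)
print(c_max.bit_length())     # 16            (the sum needs Q8.8)
print(to_real(c_max, 8))      # 255.9921875   (top of the sum)
print(to_real(0xFFFF, 9))     # 127.998046875 (top of unsigned Q7.9)
```

The last two lines show the mismatch the replies discuss: the sum can reach almost 256, while a 16-bit unsigned Q7.9 value stops just short of 128.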
thunder wrote:
> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
> b = unsigned Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.

Then there is overflow, just as when adding two Q15.0 integers and getting a Q16.0 sum. (Remember the sign bit.)

> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
You can't. Count the bits. (Remember the sign bit.) ...

> QS; Can anyone recommend a good book on Fixed Point and Floating point
> arithmetic ?

http://www.digitalsignallabs.com/fp.pdf

Jerry
--
Engineering is the art of making what you want from things you can get.

On Mon, 26 Oct 2009 10:53:23 -0400, Jerry Avins wrote:

> thunder wrote:
>> Hi
>>
>> I have a question regarding Fixed point Arithmetic addition.
>>
>> For example, i have two fixed point numbers:
>>
>> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
>> b = unsigned Q7.8 format ( " " ).
>>
>> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Then there is overflow, just as two Q15.0 integers and getting a Q16.0
> sum. (Remember the sign bit.)
>
>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> You can't. Count the bits. (Remember the sign bit.)

You can't on a 16-bit machine, but if you're working in an FPGA or custom logic a 17-bit type is no problem.

> ...
>
>> QS; Can anyone recommend a good book on Fixed Point and Floating point
>> arithmetic ?
>
> http://www.digitalsignallabs.com/fp.pdf
>
> Jerry

-- www.wescottdesign.com
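
As a hedged illustration of the 17-bit remark, the Python sketch below widens a Q8.8 raw value to Q8.9 by appending one zero bit. The function name `q8_8_to_q8_9` is hypothetical; the point is only that an exact conversion needs one more bit than a 16-bit word offers.

```python
def q8_8_to_q8_9(c):
    """Exact widening: one more fractional bit, so one more bit overall."""
    assert 0 <= c <= 0xFFFF, "c must be a 16-bit unsigned raw value"
    return c << 1

d = q8_8_to_q8_9(0xFFFE)         # worst-case sum of two Q7.8 operands
print(d.bit_length())            # 17
print(d / 512 == 0xFFFE / 256)   # True: the represented value is unchanged
```

In custom logic this is just a wider bus; on a 16-bit machine the top bit has nowhere to go.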
On Mon, 26 Oct 2009 01:17:59 -0700, thunder wrote:

> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
> b = unsigned Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> The way i have tried to approach it is as follows:
>
> Integer part
> ----------------
> The way i have thought about the integer part is to say that if bit [15]
> of the result c is a '1', then bits[14:8] of d is b"111_1111",
> otherwise d[14:8] = c[14:8].
>
> Is this correct ??
>
> Fractional Part
> ----------------------
>
> The way i have thought about the fractional part is that for d, i want
> one extra fractional bit to increase the fractional precision.
>
> The obvious way to me seems to be to add an extra bit at the LSB end: ie
> d[8:0] = c[7:0] & 1'b0.
>
> Is this correct?

Rather than answer that, I'm just going to point out that there's not a 1:1 mapping between Q8.8 and Q7.9 types. So for a good part of the range of your Q8.8 type you can only approximate the value in Q7.9. So the question becomes not "is this correct?" but "is this right for my application?", and you know what your application is.

Me, I'd append a zero to the end and I'd saturate to +/- full range (or to +63.etc and -63.etc; allowing the b100000... into a signed two's complement type gives you a tiny corner case that attracts a huge amount of nasty bugs).

-- www.wescottdesign.com
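
The "append a zero and saturate" suggestion can be sketched as follows for the unsigned case the OP described. This is an illustrative Python sketch, not anyone's posted code: the function name `q8_8_to_q7_9_sat` is made up, and it assumes that on overflow the whole word should clamp to all ones (integer and fractional bits together) so that the output never decreases as the input increases.

```python
Q7_9_MAX = 0xFFFF  # largest unsigned 16-bit raw value, just under 128.0

def q8_8_to_q7_9_sat(c):
    """Convert an unsigned Q8.8 raw value to unsigned Q7.9, saturating."""
    if c & 0x8000:             # bit 15 set: the value is 128.0 or more
        return Q7_9_MAX        # clamp the whole word, not only the integer bits
    return (c << 1) & 0xFFFF   # otherwise append a zero LSB

print(hex(q8_8_to_q7_9_sat(0x0180)))  # 0x300   (1.5 stays 1.5)
print(hex(q8_8_to_q7_9_sat(0x8000)))  # 0xffff  (128.0 clamps to full scale)
```

Clamping only the integer bits while passing the fractional bits through would map 128.5 to 127.5 and 128.0 to 127.0, which may or may not matter for a given application.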
thunder <jao16@hotmail.com> writes:

> Hi
>
> I have a question regarding Fixed point Arithmetic addition.
>
> For example, i have two fixed point numbers:
>
> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
> b = unsigned Q7.8 format ( " " ).
>
> Now a + b = c, where c is an unsigned Q8.8 result.
>
> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??

There is no way in general to do this conversion and avoid some kind of nonlinear effect since the range of Q7.9 is smaller than Q8.8. The most obvious method would be to saturate the Q8.8 result to Q7.9.

It would be good to know the reason why you're trying to rescale in this manner - there may be a better way to do things from a higher level point-of-view.

--
Randy Yates                      % "The dreamer, the unwoken fool -
Digital Signal Labs              %  in dreams, no pain will kiss the brow..."
mailto://yates@ieee.org          %
http://www.digitalsignallabs.com % 'Eldorado Overture', *Eldorado*, ELO

Tim Wescott wrote:

   ...

> You can't on a 16-bit machine, but if you're working in an FPGA or custom
> logic a 17-bit type is no problem.

What do you suppose the OP's context is?

Jerry

Randy Yates wrote:
> thunder <jao16@hotmail.com> writes:
>
>> Hi
>>
>> I have a question regarding Fixed point Arithmetic addition.
>>
>> For example, i have two fixed point numbers:
>>
>> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
>> b = unsigned Q7.8 format ( " " ).
>>
>> Now a + b = c, where c is an unsigned Q8.8 result.
>>
>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>
> There is no way in general to do this conversion and avoid some kind of
> nonlinear effect since the range of Q7.9 is smaller than Q8.8. The most
> obvious method would be to saturate the Q8.8 result to Q7.9.
>
> It would be good to know the reason why you're trying to rescale in this
> manner - there may be a better way to do things from a higher level
> point-of-view.

I suggest that we give thunder time to glean an understanding from your monograph, then ask for whatever further clarification he still needs.

Jerry

On Mon, 26 Oct 2009 12:27:56 -0400, Jerry Avins wrote:

> Tim Wescott wrote:
>
> ...
>
>> You can't on a 16-bit machine, but if you're working in an FPGA or
>> custom logic a 17-bit type is no problem.
>
> What do you suppose the OP's context is?

Homework, but I'm trying not to be ruled by assumptions.

-- www.wescottdesign.com

> On Mon, 26 Oct 2009 10:53:23 -0400, Jerry Avins wrote:
>
>> thunder wrote:
>>> Hi
>>>
>>> I have a question regarding Fixed point Arithmetic addition.
>>>
>>> For example, i have two fixed point numbers:
>>>
>>> a = unsigned Q7.8 format (7-bit integer, 8 bit fractional).
>>> b = unsigned Q7.8 format ( " " ).
>>>
>>> Now a + b = c, where c is an unsigned Q8.8 result.
>>
>> Then there is overflow, just as two Q15.0 integers and getting a Q16.0
>> sum. (Remember the sign bit.)
>>
>>> Qs: How do I transform c into d, where d is a unsigned Q7.9 result ??
>>
>> You can't. Count the bits. (Remember the sign bit.)
>>
> You can't on a 16-bit machine, but if you're working in an FPGA or custom
> logic a 17-bit type is no problem.

Not entirely true (carry flag), but I'm splitting hairs, since the problem is misguided. BTW, the OP repeatedly said unsigned, though it may have been confusing with repeated references to the MSbit.

There's absolutely no point in switching to 7.9 midstream. 7.8 holds exactly the same information as 7.9 after *this* operation; the loss is in the integer part, not the fractional part. Adding two unsigned 15-bit numbers could probably be achieved with exactly the same opcode, because the result merely has to be interpreted correctly (the same is not true of multiplication, of course).

For this problem, I would use all 16 bits the whole time, not 15. You then have to choose whether to saturate or wrap around. There may be some processors that support saturation as an instruction, but I think you'd otherwise have to look at the carry flag; this should be trivial for unsigned addition. To wrap, do nothing (assuming other constraints don't prevent you from using all 16, else mask).

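
The saturate-or-wrap choice for a 16-bit unsigned add can be sketched like this. Python integers are unbounded, so the carry flag is modelled here as bit 16 of the full-width sum; the function names are hypothetical, and a real processor would read the carry from its status register instead.

```python
def add_wrap_u16(a, b):
    """16-bit unsigned add that wraps: the carry out is simply dropped."""
    return (a + b) & 0xFFFF

def add_sat_u16(a, b):
    """16-bit unsigned add that saturates when the carry out is set."""
    total = a + b
    carry = total >> 16          # stands in for the hardware carry flag
    return 0xFFFF if carry else total

print(hex(add_wrap_u16(0xC000, 0x8000)))  # 0x4000
print(hex(add_sat_u16(0xC000, 0x8000)))   # 0xffff
print(hex(add_sat_u16(0x1234, 0x0100)))   # 0x1334
```

The same raw addition works whatever Q format the operands share; only the interpretation of the result changes.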
Tim Wescott wrote:
> On Mon, 26 Oct 2009 12:27:56 -0400, Jerry Avins wrote:
>
>> Tim Wescott wrote:
>>
>> ...
>>
>>> You can't on a 16-bit machine, but if you're working in an FPGA or
>>> custom logic a 17-bit type is no problem.
>> What do you suppose the OP's context is?
>>
> Homework, but I'm trying not to be ruled by assumptions.

You're a better man than I am!

Jerry