Hi all,

I have a compute engine which generates an 8-bit floating-point number, fp_num[7:0]. The format of this 8-bit floating-point number is a 3-bit signed exponent and a 5-bit mantissa (1-bit mantissa sign and 4-bit mantissa magnitude), i.e.

bit:    7    6    5    4              3         2         1         0
field:  exp  exp  exp  mantissa_sign  mantissa  mantissa  mantissa  mantissa

I need to make sure that the final value of fp_num I send out to the next block does not fall outside the range [-num_clamp, +num_clamp], where num_clamp is also an 8-bit FP number with the same bit representation as above, except that the mantissa sign of num_clamp is always zero (i.e. the mantissa is always positive).

One way I thought of doing this is to translate both fp_num and num_clamp to fixed-point numbers and then determine whether fxd_num is less than or greater than fxd_num_clamp.

I wondered if anyone has any other suggestions for determining this floating-point saturation, which might be simpler and/or better.

Thanks in advance

J
Floating Point Saturation
Started by ●November 3, 2009
Reply by ●November 3, 2009
In comp.dsp thunder <jao16@hotmail.com> wrote:

> I have a compute engine which generates an 8-bit floating number,
> fp_num[7:0]. The format of this 8-bit floating point number is 3-bit
> signed exponent and 5-bit mantissa (1-bit mantissa sign and 4-bit
> mantissa magnitude) ie
>
> bit:    7    6    5    4              3         2         1         0
> field:  exp  exp  exp  mantissa_sign  mantissa  mantissa  mantissa  mantissa
>
> I need to make sure that the final value of fp_num i send out to the
> next block does not exceed the range [-num_clamp, +num_clamp], where
> num_clamp is also an 8-bit FP number with the same data bit
> representation as above, except the mantissa sign of num_clamp is
> always zero (ie the mantissa is always positive).
>
> One way i thought of doing this is to translate both fp_num and
> num_clamp to a fixed point number and then determine if fxd_num is
> less than or greater than fxd_num_clamp.

Without making comments on the usefulness of such a small floating point value...

It is usual to use a biased exponent instead of a signed exponent. Since I don't know what you mean by signed exponent, I will explain biased exponent.

In three bits, 000 would be the smallest (most negative) exponent and 111 the largest (most positive). You get from two's complement to biased by inverting the sign bit.

With a biased exponent on the left, normalized positive floating point numbers can be compared directly as unsigned values. In your case, all you need to do is set the sign bit to positive (usually 0), and compare.

If the exponent is two's complement, invert the sign bit before comparing.

-- glen
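The single-compare idea above could be sketched roughly as follows. This is only an illustration under the bit layout given in the original post (exponent in bits 7..5, mantissa sign in bit 4); the names magnitude_key and fp8_saturate are made up for the example, and it assumes both values are normalized so that ordering by (exponent, mantissa magnitude) matches ordering by absolute value.

```c
#include <assert.h>
#include <stdint.h>

#define EXP_SIGN_BIT 0x80u /* bit 7: MSB of the two's complement exponent */
#define MAN_SIGN_BIT 0x10u /* bit 4: mantissa sign */

/* Build a key whose unsigned order matches magnitude order:
 * invert the exponent sign bit (two's complement -> biased)
 * and force the mantissa sign to positive. */
static uint8_t magnitude_key(uint8_t x)
{
    return (uint8_t)((x ^ EXP_SIGN_BIT) & (uint8_t)~MAN_SIGN_BIT);
}

/* Clamp fp_num to [-num_clamp, +num_clamp], keeping its original sign.
 * Hypothetical helper, not taken from the thread. */
uint8_t fp8_saturate(uint8_t fp_num, uint8_t num_clamp)
{
    assert((num_clamp & MAN_SIGN_BIT) == 0); /* clamp is always positive */
    if (magnitude_key(fp_num) > magnitude_key(num_clamp))
        return (uint8_t)(num_clamp | (fp_num & MAN_SIGN_BIT));
    return fp_num;
}
```

In hardware terms this is two bit manipulations (one inverter, one forced-zero bit) followed by a single 8-bit unsigned comparator and a mux.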
Reply by ●November 4, 2009
Compare exponents
Compare mantissas
What's the problem?

VLV

thunder wrote:
> Hi all
>
> I have a compute engine which generates an 8-bit floating number,
> fp_num[7:0]. The format of this 8-bit floating point number is 3-bit
> signed exponent and 5-bit mantissa (1-bit mantissa sign and 4-bit
> mantissa magnitude) ie
>
> bit:    7    6    5    4              3         2         1         0
> field:  exp  exp  exp  mantissa_sign  mantissa  mantissa  mantissa  mantissa
>
> I need to make sure that the final value of fp_num i send out to the
> next block does not exceed the range [-num_clamp, +num_clamp], where
> num_clamp is also an 8-bit FP number with the same data bit
> representation as above, except the mantissa sign of num_clamp is
> always zero (ie the mantissa is always positive).
>
> One way i thought of doing this is to translate both fp_num and
> num_clamp to a fixed point number and then determine if fxd_num is
> less than or greater than fxd_num_clamp.
>
> I wondered if anyone has any other suggestions to determine this
> floating point saturation, which might be simpler and/or better.
>
> Thanks in advance
>
> J
Reply by ●November 4, 2009
In comp.dsp Vladimir Vassilevsky <nospam@nowhere.com> wrote:

> Compare exponents
> Compare mantissas
> What's the problem?

Twice as much work as you need to do.

Note that the PDP-10 has one compare instruction for both fixed and floating point numbers.

-- glen
Reply by ●November 4, 2009
thunder wrote:
> Hi all
>
> I have a compute engine which generates an 8-bit floating number,
> fp_num[7:0]. The format of this 8-bit floating point number is 3-bit
> signed exponent and 5-bit mantissa (1-bit mantissa sign and 4-bit
> mantissa magnitude) ie
>
> bit:    7    6    5    4              3         2         1         0
> field:  exp  exp  exp  mantissa_sign  mantissa  mantissa  mantissa  mantissa
>
> I need to make sure that the final value of fp_num i send out to the
> next block does not exceed the range [-num_clamp, +num_clamp], where
> num_clamp is also an 8-bit FP number with the same data bit
> representation as above, except the mantissa sign of num_clamp is
> always zero (ie the mantissa is always positive).
>
> One way i thought of doing this is to translate both fp_num and
> num_clamp to a fixed point number and then determine if fxd_num is
> less than or greater than fxd_num_clamp.
>
> I wondered if anyone has any other suggestions to determine this
> floating point saturation, which might be simpler and/or better.

You can generate a mere 256 numbers. How many are out of range? At worst, you could search a too-big list and a too-small list to see if your number is on it.

How would you compare ordinary signed integers? Mask off the mantissas with AND 0xE0 and compare exponents. If .GT. or .LT., you have the answer. If .EQ., shift both left 3 (so the mantissa sign lands in the top bit) and compare again.

Think about what happens if you XOR with 0x90 to make mantissa and exponent offset binary instead of two's complement before doing an unsigned integer compare.

Jerry
--
Engineering is the art of making what you want from things you can get.
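Since only 256 codes exist, the saturation could also be precomputed as a table, in the spirit of the "search a list" suggestion above. The sketch below is an assumption-laden illustration: it takes the value to be 0.1mmmm (binary, with an implied leading 1) times 2^exp, and the names clamp_lut, fp8_scaled_magnitude and build_clamp_lut are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: index with an 8-bit code, read the saturated code. */
static uint8_t clamp_lut[256];

/* Magnitude scaled by 2^9, assuming value = 0.1mmmm (binary) * 2^exp. */
static unsigned fp8_scaled_magnitude(uint8_t x)
{
    unsigned biased_exp = ((x >> 5) ^ 0x4u) & 0x7u; /* exp + 4, range 0..7 */
    return (0x10u | (x & 0x0Fu)) << biased_exp;
}

/* Fill the table once for a given clamp value. */
void build_clamp_lut(uint8_t num_clamp)
{
    assert((num_clamp & 0x10u) == 0); /* clamp mantissa sign is always 0 */
    unsigned limit = fp8_scaled_magnitude(num_clamp);
    for (unsigned code = 0; code < 256; ++code) {
        uint8_t x = (uint8_t)code;
        uint8_t sign = (uint8_t)(x & 0x10u);
        clamp_lut[code] = (fp8_scaled_magnitude(x) > limit)
                              ? (uint8_t)(num_clamp | sign) /* saturate, keep sign */
                              : x;                          /* already in range */
    }
}
```

A table like this only pays off if num_clamp changes rarely; if the clamp changes every sample, a direct compare is cheaper than rebuilding 256 entries.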
Reply by ●November 4, 2009
glen herrmannsfeldt wrote:
> In comp.dsp Vladimir Vassilevsky <nospam@nowhere.com> wrote:
>
>> Compare exponents
>> Compare mantissas
>> What's the problem?
>
> Twice as much work as you need to do.

Actually, less work. Both compare operations are narrow and can be done in parallel. The question is likely in the context of an FPGA.

> Note that the PDP-10 has one compare instruction for both
> fixed and floating point numbers.

This is possible if the floats are in an IEEE 754-like representation and handling of NaNs and denormals is not required.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
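The two narrow compares could be modelled in C along these lines. This is a software sketch of the idea only, not HDL: the three intermediate flags stand in for comparators that would run in parallel in an FPGA, and the function name fp8_magnitude_exceeds is made up for the example. It assumes the layout from the original post, with a two's complement exponent in bits 7..5 and the mantissa magnitude in bits 3..0.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if |fp_num| > num_clamp, else 0. Hypothetical helper. */
int fp8_magnitude_exceeds(uint8_t fp_num, uint8_t num_clamp)
{
    assert((num_clamp & 0x10u) == 0); /* clamp mantissa sign is always 0 */

    /* Flip the exponent sign bit so an unsigned compare orders exponents. */
    unsigned exp_a = ((fp_num >> 5) & 0x7u) ^ 0x4u;
    unsigned exp_b = ((num_clamp >> 5) & 0x7u) ^ 0x4u;
    unsigned man_a = fp_num & 0x0Fu;
    unsigned man_b = num_clamp & 0x0Fu;

    /* Independent narrow compares: one 3-bit pair, one 4-bit. */
    int exp_gt = exp_a > exp_b;
    int exp_eq = exp_a == exp_b;
    int man_gt = man_a > man_b;

    /* Combine: the mantissa only matters when the exponents tie. */
    return exp_gt || (exp_eq && man_gt);
}
```

The result is the same as a single 7-bit compare of the concatenated fields; which form is smaller or faster depends on the target technology.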
Reply by ●November 6, 2009
On 3 Nov, 18:33, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> In comp.dsp thunder <ja...@hotmail.com> wrote:
>
> > I have a compute engine which generates an 8-bit floating number,
> > fp_num[7:0]. The format of this 8-bit floating point number is 3-bit
> > signed exponent and 5-bit mantissa (1-bit mantissa sign and 4-bit
> > mantissa magnitude) ie
> >
> > bit:    7    6    5    4              3         2         1         0
> > field:  exp  exp  exp  mantissa_sign  mantissa  mantissa  mantissa  mantissa
> >
> > I need to make sure that the final value of fp_num i send out to the
> > next block does not exceed the range [-num_clamp, +num_clamp], where
> > num_clamp is also an 8-bit FP number with the same data bit
> > representation as above, except the mantissa sign of num_clamp is
> > always zero (ie the mantissa is always positive).
> >
> > One way i thought of doing this is to translate both fp_num and
> > num_clamp to a fixed point number and then determine if fxd_num is
> > less than or greater than fxd_num_clamp.
>
> Without making comments on the usefulness of such a small floating
> point value...
>
> It is usual to use a biased exponent instead of a signed exponent.
> Since I don't know what you mean by signed exponent, I will explain
> biased exponent.
>
> In three bits, 000 would be the smallest (most negative) exponent
> and 111 the largest (most positive). You get from twos complement
> to biased by inverting the sign bit.
>
> With a biased exponent on the left, normalized positive floating
> point numbers can be compared directly as unsigned values.
> In your case, all you need to do is set the sign bit to positive
> (usually 0), and compare.
>
> If the exponent is twos complement, invert the sign
> bit before comparing.
>
> -- glen

Thanks for the input.

Just to clarify a few things ...

a) By signed exponent, I mean the exponent is a 3-bit signed 2's complement number. This means that the exponent has a range of [-4, +3].

b) Exponent bias - An exponent bias has already been added. The exponent bias is also a signed 3-bit 2's complement number. Thus the addition of the exponent bias generates a signed 4-bit 2's complement number. However, after the addition of the exponent bias, the resultant exponent is saturated back to a 3-bit signed 2's complement number with a range of [-4, +3]. Therefore, the exponent part of fp_num is a 3-bit signed 2's complement number having a range of [-4, +3].

c) Also, the mantissa of fp_num is a normalised mantissa. The mantissa represents in this case a purely fractional part. Thus the normalisation of the mantissa means the MSB of the mantissa is set and implied. Thus for fp_num, we have fp_num[4] = mantissa sign and fp_num[3:0] = mantissa magnitude, with the leading one left out. Thus actual_mantissa_magnitude = '1' & fp_num[3:0] (where & is concatenation in this instance).

Thanks

J
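With points a) to c) pinned down, the fixed-point translation mentioned in the original post could look roughly like the sketch below. It assumes the value is 0.1mmmm (binary) times 2^exp, i.e. a base-2 exponent applied to the 5-bit fraction '1' & fp_num[3:0]; the 2^9 scale factor and the names fp8_exponent, fp8_to_fixed and fp8_out_of_range are choices made for this example, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 3-bit two's complement exponent held in bits 7..5. */
static int fp8_exponent(uint8_t x)
{
    int e = (x >> 5) & 0x7;
    return (e & 0x4) ? e - 8 : e; /* range [-4, +3] */
}

/* Convert to a signed fixed-point integer scaled by 2^9, so the
 * smallest magnitude (0.10000b * 2^-4) maps to 16 and the largest
 * (0.11111b * 2^3) maps to 3968. */
int16_t fp8_to_fixed(uint8_t x)
{
    int mag = 0x10 | (x & 0x0F);              /* '1' & fp_num[3:0] */
    int fixed = mag << (fp8_exponent(x) + 4); /* shift count 0..7 */
    return (int16_t)((x & 0x10) ? -fixed : fixed);
}

/* Returns 1 if fp_num lies outside [-num_clamp, +num_clamp]. */
int fp8_out_of_range(uint8_t fp_num, uint8_t num_clamp)
{
    assert((num_clamp & 0x10) == 0); /* clamp mantissa sign is always 0 */
    int16_t value = fp8_to_fixed(fp_num);
    int16_t limit = fp8_to_fixed(num_clamp);
    return value > limit || value < -limit;
}
```

Note that the fixed-point word needs 12 magnitude bits plus a sign to cover the full exponent range, which is part of why a direct compare on the 8-bit codes is attractive.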
Reply by ●November 6, 2009
On 4 Nov, 21:10, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> glen herrmannsfeldt wrote:
> > In comp.dsp Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> >
> >>Compare exponents
> >>Compare mantissas
> >>What's a problem?
> >
> > Twice as much work as you need to do.
>
> Actually less of work. Both compare operations are narrow and could be
> done in parallel. The question is likely in the context of FPGA.
>
> > Note that the PDP-10 has one compare instruction for both
> > fixed and floating point numbers.
>
> This is possible if the floats are in IEEE754-like representation and
> handling of NANs and denormals not required.
>
> Vladimir Vassilevsky
> DSP and Mixed Signal Design Consultant
> http://www.abvolt.com

The compares will be done in parallel. Furthermore, the floating-point numbers are an internal representation and do not conform to IEEE 754. Thus handling of NaNs and denormals is not required.

Thanks

J
Reply by ●November 6, 2009
In comp.dsp thunder <jao16@hotmail.com> wrote:

(snip on comparing floating point values)

> Thanks for the input.
>
> Just to clarify a few things ...
>
> a) By signed exponent, i mean the exponent is a 3-bit signed 2's
> complement number. This means that the exponent has a range of [-4,
> +3].

All that I know use a biased representation instead of two's complement. The actual difference is only in the sign bit. With a biased exponent instead of two's complement, you use an unsigned compare instead of a signed compare. It just makes things a little easier.

> b) Exponent bias - An exponent bias has already been added. The
> exponent bias is also a signed 3-bit 2's complement number. Thus the
> addition of the exponent bias generated a signed 4-bit 2's complement
> number. However, after the addition of the exponent bias, the
> resultant exponent is then saturated to be again a 3-bit signed 2's
> complement number with a range of [-4, +3]. Therefore, the exponent
> part of fp_num is a 3-bit signed 2's complement number having a range
> of [-4, +3].

With a biased exponent the range is still -4 to +3, but the bits used to represent the value are different. There should be no discussion of two's complement here at all.

> c) Also the mantissa of fp_num is a normalised mantissa. The mantissa
> represents in this case a purely fractional part. Thus the
> normalisation of the mantissa means the MSB of the mantissa is set and
> implied. Thus for fp_num, we have fp_num[4] = mantissa sign and fp_num
> [3:0] being the mantissa magnitude. However, the mantissa magnitude is
> normalised. Thus actual_mantissa_magnitude = '1' & fp_num[3:0] (where
> & is concatenation in this instance).

I prefer 'significand' to 'mantissa', but otherwise I think that sounds fine.

-- glen
Reply by ●November 6, 2009
thunder wrote:
> b) Exponent bias - An exponent bias has already been added. The
> exponent bias is also a signed 3-bit 2's complement number. Thus the
> addition of the exponent bias generated a signed 4-bit 2's complement
> number. However, after the addition of the exponent bias, the
> resultant exponent is then saturated to be again a 3-bit signed 2's
> complement number with a range of [-4, +3]. Therefore, the exponent
> part of fp_num is a 3-bit signed 2's complement number having a range
> of [-4, +3].
>
> c) Also the mantissa of fp_num is a normalised mantissa. The mantissa
> represents in this case a purely fractional part. Thus the
> normalisation of the mantissa means the MSB of the mantissa is set and
> implied. Thus for fp_num, we have fp_num[4] = mantissa sign and fp_num
> [3:0] being the mantissa magnitude. However, the mantissa magnitude is
> normalised. Thus actual_mantissa_magnitude = '1' & fp_num[3:0] (where
> & is concatenation in this instance).

This means that all positive numbers can be compared with an unsigned (or signed, since the sign bit is zero) comparison, right?

For negative numbers it is exactly the same, except the result must be inverted.

When the values have opposite signs, the result is the inverse of the sign bit:

    #include <stdint.h>

    /* Sign-magnitude compare; treats bit 7 as the sign bit.
     * Returns <0 if a < b, 0 if equal, >0 if a > b. */
    int cmp8(int8_t a, int8_t b)
    {
        if ((a ^ b) & 0x80) {       /* opposite signs */
            if (a < 0) return -1;   /* b is larger */
            return 1;               /* a is larger (positive) */
        }
        if (a < 0)                  /* both negative: order is inverted */
            return ((b & 0x7f) - (a & 0x7f));
        return ((a & 0x7f) - (b & 0x7f));
    }

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"






