comp.dsp | Porting LMS from floating-point to fixed-point processor| page 2

Reply by steve ●March 4, 2005

Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor (TI 6713) earlier, so I just took the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it works for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accurately?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel

I assume you took the LMS C code that was running on the 6713 and
recompiled it to run on the 6416 with an ANSI C compiler. The results
should be identical: the 6713 uses its hardware to perform the
floating-point calculations, while the 6416 uses a software-emulated
floating-point package to perform them. That should be transparent to
the user. Could it be that you were using double-precision math on the
6713 and single-precision math on the 6416?

Converting the algorithm to integer math is not easy; as others have
pointed out, you can't just change the types.
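
To make "you can't just change the types" concrete, here is a minimal sketch of one LMS iteration in Q15 fractional format. It is not the original poster's code: the names lms_q15_step, sat_q15 and LMS_TAPS are hypothetical, the 64-bit accumulator is simply a convenient stand-in for guard bits, and the update is the textbook w[i] += mu * e * x[i]. Every product has to be rescaled and every narrowing has to be saturated by hand, which is exactly the work a float-to-int type change skips.

```c
#include <stdint.h>

#define LMS_TAPS 8   /* hypothetical filter length, chosen only for illustration */

/* Saturate a wide value to the Q15 range [-32768, 32767]. */
static int16_t sat_q15(int64_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* One LMS iteration, all quantities in Q15 (value = integer / 32768).
 * w  : coefficients, updated in place
 * x  : delay line, newest sample in x[0] (assumed layout)
 * d  : desired sample, mu : step size
 * Returns the error e = d - y.
 * Assumes ">>" on negative values is an arithmetic shift, which is
 * implementation-defined in ISO C but is what mainstream compilers do. */
int16_t lms_q15_step(int16_t w[LMS_TAPS], const int16_t x[LMS_TAPS],
                     int16_t d, int16_t mu)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < LMS_TAPS; i++)
        acc += (int32_t)w[i] * x[i];            /* Q15 * Q15 = Q30 */

    int16_t y = sat_q15(acc >> 15);             /* filter output, back in Q15 */
    int16_t e = sat_q15((int64_t)d - y);        /* error, Q15 */
    int32_t mu_e = ((int32_t)mu * e) >> 15;     /* mu * e, Q15 */

    for (i = 0; i < LMS_TAPS; i++)
        w[i] = sat_q15(w[i] + (((int64_t)mu_e * x[i]) >> 15));

    return e;
}
```

With all coefficients at zero, x[0] = 0.5, d = 0.25 and mu = 0.5, the first call returns an error of 0.25 (8192) and moves w[0] to 0.0625 (2048). The truncating shifts lose a little precision on every tap, so rounding and step-size scaling deserve attention as the filter order grows.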

Reply by Randy Yates ●March 4, 2005

Jerry Avins <jya@ieee.org> writes:

> Randy Yates wrote:
> > Jerry Avins <jya@ieee.org> writes:
> >
> 
> >> Doesn't the usual fixed-point hardware do a shift after
> >> multiplying? "Redundant sign bit" and all that.
> 
> > Depends on the processor. The TI TMS3205xx series does not by default
> > (there is a register for setting this behavior). The Motorola does.
> 
> Default or not, it isn't behavior one gets automatically from int
> operations in C. That was my point.

If by "it" you mean "automatic left shift by one bit after multiply,"
I agree. That is, it is true that the integer multiply operations in C
do not automatically left shift the result by one.
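
As an illustration of doing that shift manually, here is a small sketch of a Q15 fractional multiply in C. The function name q15_mul_frac is made up for this example, and the 64-bit intermediate is only there so the doubling cannot overflow. It multiplies by 2 instead of writing "<< 1" because left-shifting a negative signed value is undefined in ISO C.

```c
#include <stdint.h>

/* Q15 fractional multiply spelled out the way a fractional-mode DSP does it:
 * full product, left shift by one to drop the redundant sign bit, then keep
 * the high word.  Assumes arithmetic right shift of negative values.
 * The single input pair -32768 * -32768 is not treated specially here; its
 * result does not fit in 16 bits. */
int16_t q15_mul_frac(int16_t a, int16_t b)
{
    int64_t prod    = (int64_t)a * b;     /* exact product, Q30 */
    int64_t shifted = prod * 2;           /* the "left shift by 1": Q31 */
    return (int16_t)(shifted >> 16);      /* high 16 bits of the 32-bit result */
}
```

For example, q15_mul_frac(16384, 16384) is 0.5 * 0.5 and returns 8192, which is 0.25 in Q15.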

However, that wasn't what you asked, so I'm not sure how my response
was off-point to your question.
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Randy Yates ●March 4, 2005

Tim Wescott <tim@wescottnospamdesign.com> writes:

> Randy Yates wrote:
> 
> > Tim Wescott <tim@wescottnospamdesign.com> writes:
> >
> 
> >>[...]
> >>Generally if you stick to pure C you are stuck with integer math.
> >>DSP's are designed to do fixed-radix math pretty quickly, ...
> > Tim, I think most of your points are helpful, but this one is
> > off-the-mark in my judgement. The typical fixed-point DSP operates
> > much the same as the C integer operations, performing integer math.
> > Whether the integers are reinterpreted to be fractional, fixed-point,
> > or integer is all in the interpretation and has little or nothing to
> > do with the implementation of the basic arithmetic operations (add,
> > subtract, multiply).
> > Of course there are differences between fixed-point DSP ALUs and the
> > "ALU" of a C compiler, the biggest of which are probably the wide
> > accumulators and the saturation options when performing various
> > operations. There is also the typical "left shift by 1" that a
> > fractional DSP does after a multiply to make the result fractional,
> > but that is certainly doable in C as well, albeit manually.
> 
> The difference in clock ticks between implementing a fixed-point
> arbitrary-radix vector dot-product in assembly on a DSP and trying to
> do the same thing to the same precision in C on the same processor is
> on the order of 100:1.

Who said anything about a vector operation? Your statement was

  Generally if you stick to pure C you are stuck with integer math.
  DSP's are designed to do fixed-radix math pretty quickly, ...

The term "math" does not mean "vector math" in my interpretation.

> Even on a MAC-less processor when you are in assembly and multiply two
> signed N-bit numbers you can choose to take the lower N bits
> of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> and do a shift, with way fewer clock cycles (10 or 20:1) than you
> could implement the same functionality in C.  I should know -- I've
> done it in C a couple of times and in assembly on three or four
> different processors.

Apparently they did not include the TI TMS320C54x, arguably one of
the most popular DSPs around, and on that processor, the following
code

#include "dsptypes.h"

/* definitions */

#define VECTOR_LENGTH 64

/* local variables */

/* local function prototypes */

/* function definitions */

int main(int margc, char **margv) 
{
   UINT16_T n;
   INT16_T  x[VECTOR_LENGTH];
   INT16_T  y[VECTOR_LENGTH];
   INT32_T  acc;
   INT16_T  result;

   acc = 0;
   for (n = 0; n < VECTOR_LENGTH; n++)
   {
      x[n] = n;
      y[n] = VECTOR_LENGTH - n - 1;
   }

   acc = 0;
   for (n = 0; n < VECTOR_LENGTH; n++)
   {
      acc += x[n] * y[n];
   }

   result = (INT16_T)(acc >> 16);

   return result;
}


produces the following assembly language

0000:0108      main
0000:0108 4A11      PSHM  11h
0000:0109 4A17      PSHM  17h
0000:010A EE80      FRAME -128
0000:010B E781      MVMM  SP,AR1
0000:010C 6DE9      MAR   *+AR1(64)
0000:010E E787      MVMM  SP,AR7
0000:010F E782      MVMM  SP,AR2
0000:0110 E800      LD    #0h,A
0000:0111 771A      STM   3fh,1ah
0000:0113 F072      RPTB  11ah
0000:0115      L1
0000:0115 8092      STL   A,*AR2+
0000:0116 E93F      LD    #3fh,B
0000:0117 F520      SUB   A,0,B
0000:0118 8191      STL   B,*AR1+
0000:0119 F000      ADD   #1h,0,A,A
0000:011B      L2
0000:011B E782      MVMM  SP,AR2
0000:011C 6DEA      MAR   *+AR2(64)
0000:011E E783      MVMM  SP,AR3
0000:011F E800      LD    #0h,A
0000:0120 EC3F      RPT   #3fh
0000:0121      L3
0000:0121 B089      MAC   *AR2+,*AR3+,A,A
0000:0122      L4
0000:0122 F0E0      SFTL  A,0,A
0000:0123 F0F0      SFTL  A,-16,A
0000:0124 6BF8      ADDM  80h,*(18h)
0000:0127 F495      NOP   
0000:0128 F495      NOP   
0000:0129 8A17      POPM  17h
0000:012A 8A11      POPM  11h
0000:012B F4E4      FRET  

Both the vector multiply and the end shift look pretty damn efficient
to me, Tim. 

Thus even if we agree to interpret your point differently, it's still
inaccurate for one of the most popular DSPs in the world.
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Randy Yates ●March 4, 2005

PS

Tim Wescott <tim@wescottnospamdesign.com> writes:

> Even on a MAC-less processor when you are in assembly and multiply two
> signed N-bit numbers you can choose to take the lower N bits
> of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> and do a shift, with way fewer clock cycles (10 or 20:1) than you
> could implement the same functionality in C. 

The product of two N-bit numbers using two's complement arithmetic
requires 2*N bits to represent all possible results, not 2*N - 1 bits.

We can agree that it is alright to use 2*N - 1 bits and saturate or
truncate the one input combination that requires 2*N bits, but it is
improper to merely assume this is to be done (often it is not!).
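
A short sketch of the saturating option, for N = 16. The function name q15_mul_sat is hypothetical. The only input pair whose product reaches 2^30 is -32768 * -32768, so that is the only value that needs special handling before the usual rescale.

```c
#include <stdint.h>

/* Q15 multiply that saturates the one product needing all 2*N bits.
 * Assumes arithmetic right shift of negative values. */
int16_t q15_mul_sat(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;    /* Q30 */

    /* 0x40000000 (2^30) is reached only when a == b == -32768, i.e. the
     * fractional product -1.0 * -1.0 = +1.0, which Q15 cannot represent. */
    if (prod == INT32_C(0x40000000))
        return INT16_MAX;                      /* clamp to +0.99997 */

    return (int16_t)(prod >> 15);              /* Q30 -> Q15 */
}
```

Without the clamp, the rescaled value would be 32768, one more than a 16-bit signed word can hold, which is the case that must not be silently assumed away.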
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Tim Wescott ●March 4, 2005

Randy Yates wrote:

> PS
> 
> Tim Wescott <tim@wescottnospamdesign.com> writes:
> 
> 
>>Even on a MAC-less processor when you are in assembly and multiply two
>>signed N-bit numbers you can choose to take the lower N bits
>>of the 2N-1-bit result as C does, or you can take the upper N-1 bits
>>and do a shift, with way fewer clock cycles (10 or 20:1) than you
>>could implement the same functionality in C. 
> 
> 
> The product of two N-bit numbers using two's complement arithmetic
> requires 2*N bits to represent all possible results, not 2*N - 1 bits.
> 
> We can agree that it is alright to use 2*N - 1 bits and saturate or
> truncate the one input combination that requires 2*N bits, but it is
> improper to merely assume this is to be done (often it is not!).

Oh.  heh heh.  Actually I was talking about the operation y = -(x1*x2) :).

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Tim Wescott ●March 4, 2005

Randy Yates wrote:

> Tim Wescott <tim@wescottnospamdesign.com> writes:
> 
> 
>>Randy Yates wrote:
>>
>>
>>>Tim Wescott <tim@wescottnospamdesign.com> writes:
>>>
>>
>>>>[...]
>>>>Generally if you stick to pure C you are stuck with integer math.
>>>>DSP's are designed to do fixed-radix math pretty quickly, ...
>>>
>>>Tim, I think most of your points are helpful, but this one is
>>>off-the-mark in my judgement. The typical fixed-point DSP operates
>>>much the same as the C integer operations, performing integer math.
>>>Whether the integers are reinterpreted to be fractional, fixed-point,
>>>or integer is all in the interpretation and has little or nothing to
>>>do with the implementation of the basic arithmetic operations (add,
>>>subtract, multiply).
>>>Of course there are differences between fixed-point DSP ALUs and the
>>>"ALU" of a C compiler, the biggest of which are probably the wide
>>>accumulators and the saturation options when performing various
>>>operations. There is also the typical "left shift by 1" that a
>>>fractional DSP does after a multiply to make the result fractional,
>>>but that is certainly doable in C as well, albeit manually.
>>
>>The difference in clock ticks between implementing a fixed-point
>>arbitrary-radix vector dot-product in assembly on a DSP and trying to
>>do the same thing to the same precision in C on the same processor is
>>on the order of 100:1.
> 
> 
> Who said anything about a vector operation? Your statement was
> 
>   Generally if you stick to pure C you are stuck with integer math.
>   DSP's are designed to do fixed-radix math pretty quickly, ...
> 
> The term "math" does not mean "vector math" in my interpretation.
> 
> 
>>Even on a MAC-less processor when you are in assembly and multiply two
>>signed N-bit numbers you can choose to take the lower N bits
>>of the 2N-1-bit result as C does, or you can take the upper N-1 bits
>>and do a shift, with way fewer clock cycles (10 or 20:1) than you
>>could implement the same functionality in C.  I should know -- I've
>>done it in C a couple of times and in assembly on three or four
>>different processors.
> 
> 
> Apparently they did not include the TI TMS320C54x, arguably one of
> the most popular DSPs around, and on that processor, the following
> code
> 
> #include "dsptypes.h"
> 
> /* definitions */
> 
> #define VECTOR_LENGTH 64
> 
> /* local variables */
> 
> /* local function prototypes */
> 
> /* function definitions */
> 
> int main(int margc, char **margv) 
> {
>    UINT16_T n;
>    INT16_T  x[VECTOR_LENGTH];
>    INT16_T  y[VECTOR_LENGTH];
>    INT32_T  acc;
>    INT16_T  result;
> 
>    acc = 0;
>    for (n = 0; n < VECTOR_LENGTH; n++)
>    {
>       x[n] = n;
>       y[n] = VECTOR_LENGTH - n - 1;
>    }
> 
>    acc = 0;
>    for (n = 0; n < VECTOR_LENGTH; n++)
>    {
>       acc += x[n] * y[n];
>    }
> 
>    result = (INT16_T)(acc >> 16);
> 
>    return result;
> }
> 
> 
> produces the following assembly language
> 
> 0000:0108      main
> 0000:0108 4A11      PSHM  11h
> 0000:0109 4A17      PSHM  17h
> 0000:010A EE80      FRAME -128
> 0000:010B E781      MVMM  SP,AR1
> 0000:010C 6DE9      MAR   *+AR1(64)
> 0000:010E E787      MVMM  SP,AR7
> 0000:010F E782      MVMM  SP,AR2
> 0000:0110 E800      LD    #0h,A
> 0000:0111 771A      STM   3fh,1ah
> 0000:0113 F072      RPTB  11ah
> 0000:0115      L1
> 0000:0115 8092      STL   A,*AR2+
> 0000:0116 E93F      LD    #3fh,B
> 0000:0117 F520      SUB   A,0,B
> 0000:0118 8191      STL   B,*AR1+
> 0000:0119 F000      ADD   #1h,0,A,A
> 0000:011B      L2
> 0000:011B E782      MVMM  SP,AR2
> 0000:011C 6DEA      MAR   *+AR2(64)
> 0000:011E E783      MVMM  SP,AR3
> 0000:011F E800      LD    #0h,A
> 0000:0120 EC3F      RPT   #3fh
> 0000:0121      L3
> 0000:0121 B089      MAC   *AR2+,*AR3+,A,A
> 0000:0122      L4
> 0000:0122 F0E0      SFTL  A,0,A
> 0000:0123 F0F0      SFTL  A,-16,A
> 0000:0124 6BF8      ADDM  80h,*(18h)
> 0000:0127 F495      NOP   
> 0000:0128 F495      NOP   
> 0000:0129 8A17      POPM  17h
> 0000:012A 8A11      POPM  11h
> 0000:012B F4E4      FRET  
> 
> Both the vector multiply and the end shift look pretty damn efficient
> to me, Tim. 
> 
> Thus even if we agree to interpret your point differently, it's still
> inaccurate for one of the most popular DSPs in the world.

(A) That is the _only_ case that I know of for sure that the compiler 
can figure out it needs to use a MAC and shift -- the version of Code 
Composter that comes with the '2812 certainly doesn't do this, or I 
couldn't find the magic finger-ring combination.

(B) As usual TI is playing fast and loose with the ANSI standard, and 
that isn't even close to ANSI-compatible C.  If it were, the x[n] * y[n] 
operation would be truncated to 16 bits before being added to acc, and 
the result would be meaningless.  Compile that on a machine that 
supports 16-bit and 32-bit integers, print out the results, and see what 
I mean.
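
For reference, a sketch of the dot product written so that it does not depend on the width of int. Where int is 16 bits wide, ISO C evaluates x[n] * y[n] in 16 bits; widening one operand first forces a 32-bit multiply everywhere. The function name dot_q15 is made up for this example, and whether a given compiler still maps the loop onto a MAC instruction is something to check in its listing, not something this sketch promises.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors with a 32-bit accumulator. */
int32_t dot_q15(const int16_t *x, const int16_t *y, size_t len)
{
    int32_t acc = 0;
    size_t n;

    for (n = 0; n < len; n++) {
        /* The cast widens x[n] before the multiply, so the product is
         * computed in at least 32 bits even where int is only 16 bits. */
        acc += (int32_t)x[n] * y[n];
    }
    return acc;
}
```

With x = {300, 300} and y = {300, 300}, each product is 90000, which already exceeds the 16-bit range, and the function returns 180000.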

Furthermore, you are actually making my point:  by starting with an 
awareness of the one thing that sets a DSP apart from the rest and 
warping your code to fit that one thing you can make the operation very 
fast.  But in production code you will have to be constantly on guard to 
make sure that the C code isn't "improved" in such a way that makes the 
compiler implement it as a bunch of "traditional" integer operations, 
thereby making it 10-100 times slower (more like 10; TI is good at 
making fast processors) and likely reintroducing the truncation.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Jerry Avins ●March 4, 2005

Randy Yates wrote:
> Jerry Avins <jya@ieee.org> writes:
> 
> 
>>Randy Yates wrote:
>>
>>>Jerry Avins <jya@ieee.org> writes:
>>>
>>
>>>>Doesn't the usual fixed-point hardware do a shift after
>>>>multiplying? "Redundant sign bit" and all that.
>>
>>>Depends on the processor. The TI TMS3205xx series does not by default
>>>(there is a register for setting this behavior). The Motorola does.
>>
>>Default or not, it isn't behavior one gets automatically from int
>>operations in C. That was my point.
> 
> 
> If by "it" you mean "automatic left shift by one bit after multiply,"
> I agree. That is, it is true that the integer multiply operations in C
> do not automatically left shift the result by one.
> 
> However, that wasn't what you asked, so I'm not sure how my response
> was off-point to your question.

Then, as all too often, I was too oblique to get my point across. 
Earlier, you wrote, "Whether the integers are reinterpreted to be 
fractional, fixed-point, or integer is all in the interpretation and 
has little or nothing to do with the implementation of the basic 
arithmetic operations (add, subtract, multiply)."

There is -- or can be -- a slight but significant difference with 
multiplication. Coding the shift in C can use a few extra cycles.

Since the choice between fixed-point and integer multiplication is made 
with a bit in a register, I doubt that any compiler that knows only 
integer would do fixed point efficiently. There are other differences 
that just occurred to me. In the fixed-point shift, what had been the 
MSB of the low-order word becomes the LSB of the returned result; that's 
not possible with a HLL integer multiply. Worse, an HLL integer multiply 
returns the low word of the product. A fixed-point multiply returns 
(mostly) the high word. I can remedy this with int in C by promoting to 
long, then multiplying, shifting, and truncating to int. That's painful 
(although it could be a macro) and time consuming. And it won't work 
with longs.

That's too stupid a scenario. What am I missing?

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by robert bristow-johnson ●March 4, 2005

in article xxp3bvb1mkl.fsf@usrts005.corpusers.net, Randy Yates at
randy.yates@sonyericsson.com wrote on 03/04/2005 10:02:

> Tim Wescott <tim@wescottnospamdesign.com> writes:
> 
>> Even on a MAC-less processor when you are in assembly and multiply two
>> signed N-bit numbers you can choose to take the lower N bits
>> of the 2N-1 bit result as C does, or you can take the upper N-1 bits
>> and do a shift, with way fewer clock cycles (10 or 20:1) than you
>> could implement the same functionality in C.
> 
> The product of two N-bit numbers using two's complement arithmetic
> requires 2*N bits to represent all possible results, not 2*N - 1 bits.
> 
> We can agree that it is alright to use 2*N - 1 bits and saturate or
> truncate the one input combination that requires 2*N bits, but it is
> improper to merely assume this is to be done (often it is not!).

it *is* only one case, the -1 x -1 = +1 (or in integer math,
 -2^(N-1) x -2^(N-1) = +2^(2N-2)), which requires 2N bits.  it is the only
case where the two MSBs are not identical.

nonetheless, because an N-bit "fractional" fixed-point number, F, is related
to its N-bit 2s-complement integer representation, I, (same bit pattern for
both) by

    I = 2^(N-1) * [ F ]    or    F = 2^(1-N) * [ I ]

when two fixed-point numbers are multiplied together

   F1*F2 = [2^(1-N) * I1] * [2^(1-N) * I2]

         = 2^(1-N) * [ 2^(1-N) * (I1*I2) ]

so after the hardware does the integer multiplication, you have to shift
this N-1 bits to the right to get the properly scaled result no matter what.
even in this special case where F1 = F2 = -1.0  .  in an integer machine,
you do your N bit x N bit signed integer multiply, shift left 1 bit, and
take your result from the most significant word of the 2N bit product
register (which is the same as shifting right N bits).  i know this is in
your treatise, Randy, but i wanna make or emphasize another point.
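
the scaling relations above can be checked numerically with a small sketch for N = 16. the names N_BITS, q_to_double, double_to_q and q_mul are made up for this illustration, and the conversions assume inputs that are exactly representable and inside [-1, 1).

```c
#include <stdint.h>

#define N_BITS 16   /* word length N; 16 is only an illustrative choice */

/* F = 2^(1-N) * I */
double q_to_double(int16_t i)
{
    return (double)i / (double)(1L << (N_BITS - 1));
}

/* I = 2^(N-1) * F, for F in [-1, 1) */
int16_t double_to_q(double f)
{
    return (int16_t)(f * (double)(1L << (N_BITS - 1)));
}

/* Integer multiply, then shift right N-1 bits to restore the scaling.
 * Assumes arithmetic right shift of negative values; the -1 x -1 case
 * is not handled here. */
int16_t q_mul(int16_t i1, int16_t i2)
{
    int32_t p = (int32_t)i1 * (int32_t)i2;   /* scaled by 2^(2N-2) */
    return (int16_t)(p >> (N_BITS - 1));     /* scaled by 2^(N-1) again */
}
```

for instance 0.5 maps to 16384 and -0.25 maps to -8192; q_mul of those two gives -4096, which converts back to -0.125.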

what struck me odd about Motorola DSP56K and DSP563xx series, is the LSB in
the 56 bit accumulator.  they scaled everything right, but failed to
recognize that the LSB was always zero (because it's a 47 bit result after
MPY, not a 48 bit result).  this was a big mistake for a lot of reasons.
one that comes to mind is the code you need in the 56K to do table lookup
and linear interpolation, which i used to do all the damn time.

    move                #>functable,r1
    move                #>(functable_size/2),n1 ; half size of table
    move                n1,y0                   ; this copies the bits
    move                x:function_input,x0     ; -1 <= function_input < 1
    mpy     y0,x0,a     (r1)+n1                 ; point to middle of table
    move                a1,n1
    move                a0,b1                   ; fractional bits go into b1
    lsr     b           (r1)+n1                 ; point to first value
    move                x:(r1)+,x0              ; get first point
    tfr     x0,a        b1,y0
    mac     -y0,x0,a    x:(r1),x0               ; get next point
    macr    y0,x0,a                             ; finish interpolation
                                                ; function result in a

this does one table lookup and linear interpolation.
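
for comparison, a rough C analogue of the same idea (not a translation of the 56K code above). it uses Q15 instead of 24-bit words, and the names TABLE_SIZE and lookup_interp are hypothetical. the table is assumed to carry one extra guard entry so that the second point always exists.

```c
#include <stdint.h>

#define TABLE_SIZE 8   /* hypothetical number of segments; must be even */

/* Table lookup with linear interpolation.
 * table : TABLE_SIZE + 1 entries covering inputs from -1.0 to +1.0
 * x     : input in Q15, -1 <= x < 1
 * Assumes two's complement and arithmetic right shift. */
int16_t lookup_interp(const int16_t table[TABLE_SIZE + 1], int16_t x)
{
    /* Scale the input by half the table size: the integer part picks the
     * segment relative to the middle of the table, the low 15 bits are
     * the fractional position inside that segment. */
    int32_t scaled = (int32_t)x * (TABLE_SIZE / 2);
    int32_t idx    = (scaled >> 15) + TABLE_SIZE / 2;   /* 0 .. TABLE_SIZE-1 */
    int32_t frac   = scaled & 0x7FFF;                   /* Q15, 0 .. 32767 */

    int32_t t0 = table[idx];        /* first point */
    int32_t t1 = table[idx + 1];    /* next point */

    /* t0 + frac * (t1 - t0), the same result as t0 - frac*t0 + frac*t1 */
    return (int16_t)(t0 + (((t1 - t0) * frac) >> 15));
}
```

with a straight-line table {0, 100, ..., 800}, an input of 0.125 (4096) lands halfway through the segment starting at 400 and returns 450.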

if, in this instruction

    move                a0,b1

                                     they moved the 23 MSBs of a0 into the
23 LSBs of the destination register (with a zero extension), then i could
move it directly to y0 and eliminate some instructions.  since, after an MPY
instruction, the LSB of a0 or b0 is always zero (because of that left shift
by one bit inherent to fixed-point arithmetic), they could get rid of that
bit and then the 23 bits that are left are precisely the fractional bits i
want, and if zero extended when moved to a 24 bit number signed fractional
register, are precisely in the fractional form that i want.

so, in general, an N bit x N bit signed fixed-point multiply gives a 2N-1 bit
result where you get the similarly scaled result out of the N most
significant bits.  sign extending that into guard bits will take care of the
one case of -1 x -1 (as well as take care of other problems), but there are
only N-1 "fractional bits" below those N bits to worry about and never N
bits.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by Randy Yates ●March 4, 2005

Jerry Avins <jya@ieee.org> writes:

> Randy Yates wrote:
> > Jerry Avins <jya@ieee.org> writes:
> >
> 
> >>Randy Yates wrote:
> >>
> >>>Jerry Avins <jya@ieee.org> writes:
> >>>
> >>
> >>>>Doesn't the usual fixed-point hardware do a shift after
> >>>>multiplying? "Redundant sign bit" and all that.
> >>
> >>>Depends on the processor. The TI TMS3205xx series does not by default
> >>>(there is a register for setting this behavior). The Motorola does.
> >>
> >>Default or not, it isn't behavior one gets automatically from int
> >>operations in C. That was my point.
> > If by "it" you mean "automatic left shift by one bit after multiply,"
> > I agree. That is, it is true that the integer multiply operations in C
> > do not automatically left shift the result by one.
> > However, that wasn't what you asked, so I'm not sure how my response
> > was off-point to your question.
> 
> Then, as all too often, I was too oblique to get my point
> across. Earlier, you wrote, "Whether the integers are reinterpreted to
> be fractional, fixed-point, or integer is all in the interpretation
> and has little or nothing to do with the implementation of the basic
> arithmetic operations (add, subtract, multiply)."
> 
> 
> There is -- or can be -- a slight but significant difference with
> multiplication. Coding the shift in C can use a few extra cycles.

Yes, in the case where you are doing the left shift, there may be
extra cycles required to do this in C. However, this isn't always the
case, and when it is the case, the number of cycles in the difference
will probably be between 0 and "not very many," depending on the
compiler, the specific types of operations being performed, and the
ingenuity of the programmer.

Thus the distinction does not warrant a generic caveat regarding
fixed-point arithmetic in C, in my opinion.

> Since the choice between fixed-point and integer multiplication is
> made with a bit in a register, 

It is this sort of thinking that is propagating the error in this
thread. This logic is tantamount to saying that "the difference
between a Ford and a Chevy is that a Ford has a better ride." It
attempts to generalize what may be true in a specific case. 

> I doubt that any compiler that knows
> only integer would do fixed point efficiently. 

Did you see my other post where I gave actual code that refutes this?

> There are other
> differences that just occurred to me. In the fixed-point shift, what
> had been the MSB of the low-order word becomes the LSB of the returned
> result; 

I'm not sure of the scenario you're trying to describe here. 

> that's not possible with a HLL integer multiply. Worse, an HLL
> integer multiply returns the low word of the product. A fixed-point
> multiply returns (mostly) the high word. 

Not necessarily. Depends on what you're doing. 

> I can remedy this with int in
> C by promoting to long, then multiplying, shifting, and truncating to
> int. That's painful (although it could be a macro) and time
> consuming. 

What's so time-consuming about it? It might cost you one cycle. Big
whoop.

> And it won't work with longs.

True unless the compiler supports an extended long, i.e., an integer
the size of the accumulator. Although double-precision multiplication
takes us pretty far afield from the topic. 

> That's too stupid a scenario. What am I missing?

I think you're right - you are just beginning to think about more of
the details.
-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Reply by Jerry Avins ●March 4, 2005

Randy Yates wrote:
> Jerry Avins <jya@ieee.org> writes:

    ...

>>There is -- or can be -- a slight but significant difference with
>>multiplication. Coding the shift in C can use a few extra cycles.
> 
> 
> Yes, in the case where you are doing the left shift, there may be
> extra cycles required to do this in C. However, this isn't always the
> case, and when it is the case, the number of cycles in the difference
> will probably be between 0 and "not very many," depending on the
> compiler, the specific types of operations being performed, and the
> ingenuity of the programmer.
> 
> Thus the distinction does not warrant a generic caveat regarding
> fixed-point arithmetic in C, in my opinion.

What general caveat, "Watch out for gotchas"?

>>Since the choice between fixed-point and integer multiplication is
>>made with a bit in a register, 
> 
> 
> It is this sort of thinking that is propagating the error in this
> thread. This logic is tantamount to saying that "the difference
> between a Ford and a Chevy is that a Ford has a better ride." It
> attempts to generalize what may be true in a specific case. 

I don't understand. Can I expect a C compiler to change the 
integer/~fixed bit as appropriate?

>>I doubt that any compiler that knows
>>only integer would do fixed point efficiently. 
> 
> 
> Did you see my other post where I gave actual code that refutes this?

Yes. Impressive but, I suspect, rare.

>>There are other
>>differences that just occurred to me. In the fixed-point shift, what
>>had been the MSB of the low-order word becomes the LSB of the returned
>>result; 
> 
> 
> I'm not sure of the scenario you're trying to describe here. 

The product of a 32-by-32 multiply has 64 bits. Numbering them zero to 
63 in increasing significance, an ordinary integer multiply returns zero 
to 31, or zero to 30 plus the sign bit, 63. A fixed-point multiply 
should return bits 31 to 62 of the product. Shifting the upper word 
alone forces bit 31 to zero. (I think I could be clearer. Should I try?)
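
A sketch of the bit selection described above, assuming a compiler that provides a 64-bit integer type. The names q31_mul and low_word_mul are made up for this illustration. The first function keeps bits 31 to 62 of the 64-bit product, so bit 31 becomes the LSB of the result; the second shows what a plain 32-bit integer multiply keeps instead.

```c
#include <stdint.h>

/* Fractional (Q31) multiply: form the full 64-bit product and keep
 * bits 31..62.  Assumes arithmetic right shift of negative values.
 * The case -1.0 * -1.0 does not fit and is not handled here. */
int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* bits 0..63 */
    return (int32_t)(prod >> 31);             /* bits 31..62 */
}

/* What an ordinary 32-bit integer multiply returns: bits 0..31.
 * Unsigned arithmetic is used so the wraparound is well defined. */
uint32_t low_word_mul(int32_t a, int32_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

For 0.5 * 0.5 (both inputs 0x40000000), q31_mul returns 0x20000000, which is 0.25 in Q31, while low_word_mul returns 0 because every significant bit of the product lies above bit 31.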

>>that's not possible with a HLL integer multiply. Worse, an HLL
>>integer multiply returns the low word of the product. A fixed-point
>>multiply returns (mostly) the high word. 
> 
> 
> Not necessarily. Depends on what you're doing. 

Please elaborate.

>>I can remedy this with int in
>>C by promoting to long, then multiplying, shifting, and truncating to
>>int. That's painful (although it could be a macro) and time
>>consuming. 
> 
> 
> What's so time-consuming about it? It might cost you one cycle. Big
> whoop.

Better than I supposed by far. I may give up on assembler after all!

>>And it won't work with longs.
> 
> 
> True unless the compiler supports an extended long, i.e., an integer
> the size of the accumulator. Although double-precision multiplication
> takes us pretty far afield from the topic. 
> 
> 
>>That's too stupid a scenario. What am I missing?
> 
> 
> I think you're right - you are just beginning to think about more of
> the details.

I hate psyching out a compiler when I know what I want in the end. "Just 
do it" never seemed more appropriate.

Thanks for the education.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
