DSPRelated.com
Forums

Porting LMS from floating-point to fixed-point processor

Started by Daniel March 3, 2005
Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor (TI 6713) earlier, so I just took the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it worked for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accurately?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel
I assume you took the LMS C code that was running on the 6713 and recompiled it for the 6416 with an ANSI C compiler. The results should be identical: the 6713 performs the floating-point calculations in hardware, while the 6416 uses a software-emulated floating-point package. That should be transparent to the user. Could it be that you were using double-precision math on the 6713 and single precision on the 6416? Converting the algorithm to integer math is not easy; you can't just change the type, as others have pointed out.
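To make "you can't just change the type" concrete, here is a minimal sketch of one LMS iteration in 16-bit fractional (Q15) arithmetic. It is not the original poster's code: the Q15 format, the names `lms_q15_t`, `lms_q15_step`, `sat16`, and the order `NTAPS` are all assumptions chosen for illustration. The points it tries to show are the explicit rescaling after every multiply, the wide accumulator, and the saturation that replaces silent overflow.

```c
#include <stdint.h>

#define NTAPS 4  /* hypothetical filter order, for illustration only */

typedef struct {
    int16_t w[NTAPS];  /* coefficients, Q15 */
    int16_t x[NTAPS];  /* delay line, Q15; x[0] is the newest sample */
} lms_q15_t;

/* Clamp a wide value into the int16_t range instead of letting it wrap. */
int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One LMS iteration; returns the error e = desired - y in Q15.
 * Assumes >> on a negative value is an arithmetic shift (the usual
 * behaviour, but implementation-defined in C). */
int16_t lms_q15_step(lms_q15_t *s, int16_t in, int16_t desired, int16_t mu)
{
    int64_t acc = 0;  /* wide accumulator: Q30 products plus guard bits */
    int16_t y, e, mu_e;
    int i;

    /* Shift the delay line and insert the new sample. */
    for (i = NTAPS - 1; i > 0; i--)
        s->x[i] = s->x[i - 1];
    s->x[0] = in;

    /* FIR output: each 16x16 product is widened to 32 bits first. */
    for (i = 0; i < NTAPS; i++)
        acc += (int32_t)s->w[i] * (int32_t)s->x[i];

    y = sat16(acc >> 15);                            /* Q30 -> Q15 */
    e = sat16((int64_t)desired - y);
    mu_e = sat16(((int32_t)mu * (int32_t)e) >> 15);  /* mu * e in Q15 */

    /* Coefficient update: w[i] += mu * e * x[i], rescaled and saturated. */
    for (i = 0; i < NTAPS; i++)
        s->w[i] = sat16((int64_t)s->w[i] +
                        (((int32_t)mu_e * (int32_t)s->x[i]) >> 15));

    return e;
}
```

Starting from zeroed state, a call such as `lms_q15_step(&s, 16384, 8192, 16384)` (input 0.5, desired 0.25, step size 0.5) returns the error 8192 and leaves `s.w[0]` at 2048. Every product is shifted back to Q15 by hand, which is exactly the bookkeeping that a plain float-to-int type change leaves out.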
Jerry Avins <jya@ieee.org> writes:

> Randy Yates wrote: > > Jerry Avins <jya@ieee.org> writes: > > > > >> Doesn't the usual fixed-point hardware do a shift after > >> multiplying? "Redundant sign bit" and all that. > > > Depends on the processor. The TI TMS3205xx series does not by default > > > (there is a register for setting this behavior). The Motorola does. > > Default or not, it isn't behavior one gets automatically from int > operations in C. That was my point.
If by "it" you mean "automatic left shift by one bit after multiply," I agree. That is, it is true that the integer multiply operations in C do not automatically left shift the result by one. However, that wasn't what you asked, so I'm not sure how my response was off-point to your question. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124
Tim Wescott <tim@wescottnospamdesign.com> writes:

> Randy Yates wrote: > > > Tim Wescott <tim@wescottnospamdesign.com> writes: > > > > >>[...] > >>Generally if you stick to pure C you are stuck with integer math. > >>DSP's are designed to do fixed-radix math pretty quickly, ... > > Tim, I think most of your points are helpful, but this one is > > off-the-mark > > > in my judgement. The typical fixed-point DSP operates much the same as > > the C integer operations, performing integer math. Whether the > > integers are reinterpreted to be fractional, fixed-point, or integer > > is all in the interpretation and has little or nothing to do with the > > implementation of the basic arithmetic operations (add, subtract, > > multiply). > > Of course there are differences between fixed-point DSP ALUs and the > > > "ALU" of a C compiler, the biggest of which are probably the wide > > accumulators and the saturation options when performing various > > operations. There is also the typical "left shift by 1" that a > > fractional DSP does after a multiply to make the result fractional, > > but that is certainly doable in C as well, albeit manually. > > The difference in clock ticks between implementing a fixed-point > arbitrary-radix vector dot-product in assembly on a DSP and trying to > do the same thing to the same precision in C on the same processor is > on the order of 100:1.
Who said anything about a vector operation? Your statement was

    Generally if you stick to pure C you are stuck with integer math.
    DSP's are designed to do fixed-radix math pretty quickly, ...

The term "math" does not mean "vector math" in my interpretation.
> Even on a MAC-less processor when you are in assembly and multiply two > signed numbers N-bit numbers you can choose to take the lower N bits > of the 2N-1-bit result as C does, or you can take the upper N-1 bits > and do a shift, with way fewer clock cycles (10 or 20:1) than you > could implement the same functionality in C. I should know -- I've > done it in C a couple of times and in assembly on three or four > different processors.
Apparently they did not include the TI TMS320C54x, arguably one of the most popular DSPs around, and on that processor, the following code

    #include "dsptypes.h"

    /* definitions */

    #define VECTOR_LENGTH 64

    /* local variables */

    /* local function prototypes */

    /* function definitions */

    int main(int margc, char **margv)
    {
      UINT16_T n;
      INT16_T x[VECTOR_LENGTH];
      INT16_T y[VECTOR_LENGTH];
      INT32_T acc;
      INT16_T result;

      acc = 0;
      for (n = 0; n < VECTOR_LENGTH; n++)
      {
        x[n] = n;
        y[n] = VECTOR_LENGTH - n - 1;
      }

      acc = 0;
      for (n = 0; n < VECTOR_LENGTH; n++)
      {
        acc += x[n] * y[n];
      }

      result = (INT16_T)(acc >> 16);

      return result;
    }

produces the following assembly language

    0000:0108        main
    0000:0108 4A11   PSHM 11h
    0000:0109 4A17   PSHM 17h
    0000:010A EE80   FRAME -128
    0000:010B E781   MVMM SP,AR1
    0000:010C 6DE9   MAR *+AR1(64)
    0000:010E E787   MVMM SP,AR7
    0000:010F E782   MVMM SP,AR2
    0000:0110 E800   LD #0h,A
    0000:0111 771A   STM 3fh,1ah
    0000:0113 F072   RPTB 11ah
    0000:0115        L1
    0000:0115 8092   STL A,*AR2+
    0000:0116 E93F   LD #3fh,B
    0000:0117 F520   SUB A,0,B
    0000:0118 8191   STL B,*AR1+
    0000:0119 F000   ADD #1h,0,A,A
    0000:011B        L2
    0000:011B E782   MVMM SP,AR2
    0000:011C 6DEA   MAR *+AR2(64)
    0000:011E E783   MVMM SP,AR3
    0000:011F E800   LD #0h,A
    0000:0120 EC3F   RPT #3fh
    0000:0121        L3
    0000:0121 B089   MAC *AR2+,*AR3+,A,A
    0000:0122        L4
    0000:0122 F0E0   SFTL A,0,A
    0000:0123 F0F0   SFTL A,-16,A
    0000:0124 6BF8   ADDM 80h,*(18h)
    0000:0127 F495   NOP
    0000:0128 F495   NOP
    0000:0129 8A17   POPM 17h
    0000:012A 8A11   POPM 11h
    0000:012B F4E4   FRET

Both the vector multiply and the end shift look pretty damn efficient to me, Tim.

Thus even if we agree to interpret your point differently, it's still inaccurate for one of the most popular DSPs in the world.
-- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124
PS

Tim Wescott <tim@wescottnospamdesign.com> writes:

> Even on a MAC-less processor when you are in assembly and multiply two > signed numbers N-bit numbers you can choose to take the lower N bits > of the 2N-1-bit result as C does, or you can take the upper N-1 bits > and do a shift, with way fewer clock cycles (10 or 20:1) than you > could implement the same functionality in C.
The product of two N-bit numbers using two's complement arithmetic requires 2*N bits to represent all possible results, not 2*N - 1 bits. We can agree that it is alright to use 2*N - 1 bits and saturate or truncate the one input combination that requires 2*N bits, but it is improper to merely assume this is to be done (often it is not!). -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124
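The bit-count claim can be checked by brute force for a small word size. The sketch below assumes N = 8 (so the search is only 65536 pairs) and counts how many signed products fall outside the 2N - 1 = 15-bit two's complement range; the function name is hypothetical.

```c
#include <stdint.h>

/* Count the N = 8 operand pairs whose signed product does not fit in
 * 2N - 1 = 15 bits, i.e. falls outside [-2^14, 2^14 - 1]. */
int count_products_needing_2n_bits(void)
{
    int count = 0;
    int a, b;

    for (a = INT8_MIN; a <= INT8_MAX; a++) {
        for (b = INT8_MIN; b <= INT8_MAX; b++) {
            int32_t p = (int32_t)a * (int32_t)b;
            if (p < -16384 || p > 16383)
                count++;
        }
    }
    return count;
}
```

The function returns 1: the only offending pair is (-128) x (-128) = 16384 = 2^14. That is the single input combination that has to be saturated, truncated, or given the extra bit.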
Randy Yates wrote:

> PS > > Tim Wescott <tim@wescottnospamdesign.com> writes: > > >>Even on a MAC-less processor when you are in assembly and multiply two >>signed numbers N-bit numbers you can choose to take the lower N bits >>of the 2N-1-bit result as C does, or you can take the upper N-1 bits >>and do a shift, with way fewer clock cycles (10 or 20:1) than you >>could implement the same functionality in C. > > > The product of two N-bit numbers using two's complement arithmetic > requires 2*N bits to represent all possible results, not 2*N - 1 bits. > > We can agree that it is alright to use 2*N - 1 bits and saturate or > truncate the one input combination that requires 2*N bits, but it is > improper to merely assume this is to be done (often it is not!).
Oh. heh heh. Actually I was talking about the operation y = -(x1*x2) :). -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
Randy Yates wrote:

> Tim Wescott <tim@wescottnospamdesign.com> writes: > > >>Randy Yates wrote: >> >> >>>Tim Wescott <tim@wescottnospamdesign.com> writes: >>> >> >>>>[...] >>>>Generally if you stick to pure C you are stuck with integer math. >>>>DSP's are designed to do fixed-radix math pretty quickly, ... >>> >>>Tim, I think most of your points are helpful, but this one is >>>off-the-mark >> >>>in my judgement. The typical fixed-point DSP operates much the same as >>>the C integer operations, performing integer math. Whether the >>>integers are reinterpreted to be fractional, fixed-point, or integer >>>is all in the interpretation and has little or nothing to do with the >>>implementation of the basic arithmetic operations (add, subtract, >>>multiply). >>>Of course there are differences between fixed-point DSP ALUs and the >> >>>"ALU" of a C compiler, the biggest of which are probably the wide >>>accumulators and the saturation options when performing various >>>operations. There is also the typical "left shift by 1" that a >>>fractional DSP does after a multiply to make the result fractional, >>>but that is certainly doable in C as well, albeit manually. >> >>The difference in clock ticks between implementing a fixed-point >>arbitrary-radix vector dot-product in assembly on a DSP and trying to >>do the same thing to the same precision in C on the same processor is >>on the order of 100:1. > > > Who said anything about a vector operation? Your statement was > > Generally if you stick to pure C you are stuck with integer math. > DSP's are designed to do fixed-radix math pretty quickly, ... > > The term "math" does not mean "vector math" in my interpretation. 
> > >>Even on a MAC-less processor when you are in assembly and multiply two >>signed numbers N-bit numbers you can choose to take the lower N bits >>of the 2N-1-bit result as C does, or you can take the upper N-1 bits >>and do a shift, with way fewer clock cycles (10 or 20:1) than you >>could implement the same functionality in C. I should know -- I've >>done it in C a couple of times and in assembly on three or four >>different processors. > > > Apparently they did not include the TI TMS320C54x, arguably one of > the most popular DSPs around, and on that processor, the following > code > > #include "dsptypes.h" > > /* definitions */ > > #define VECTOR_LENGTH 64 > > /* local variables */ > > /* local function prototypes */ > > /* function definitions */ > > int main(int margc, char **margv) > { > UINT16_T n; > INT16_T x[VECTOR_LENGTH]; > INT16_T y[VECTOR_LENGTH]; > INT32_T acc; > INT16_T result; > > acc = 0; > for (n = 0; n < VECTOR_LENGTH; n++) > { > x[n] = n; > y[n] = VECTOR_LENGTH - n - 1; > } > > acc = 0; > for (n = 0; n < VECTOR_LENGTH; n++) > { > acc += x[n] * y[n]; > } > > result = (INT16_T)(acc >> 16); > > return result; > } > > > produces the following assembly language > > 0000:0108 main > 0000:0108 4A11 PSHM 11h > 0000:0109 4A17 PSHM 17h > 0000:010A EE80 FRAME -128 > 0000:010B E781 MVMM SP,AR1 > 0000:010C 6DE9 MAR *+AR1(64) > 0000:010E E787 MVMM SP,AR7 > 0000:010F E782 MVMM SP,AR2 > 0000:0110 E800 LD #0h,A > 0000:0111 771A STM 3fh,1ah > 0000:0113 F072 RPTB 11ah > 0000:0115 L1 > 0000:0115 8092 STL A,*AR2+ > 0000:0116 E93F LD #3fh,B > 0000:0117 F520 SUB A,0,B > 0000:0118 8191 STL B,*AR1+ > 0000:0119 F000 ADD #1h,0,A,A > 0000:011B L2 > 0000:011B E782 MVMM SP,AR2 > 0000:011C 6DEA MAR *+AR2(64) > 0000:011E E783 MVMM SP,AR3 > 0000:011F E800 LD #0h,A > 0000:0120 EC3F RPT #3fh > 0000:0121 L3 > 0000:0121 B089 MAC *AR2+,*AR3+,A,A > 0000:0122 L4 > 0000:0122 F0E0 SFTL A,0,A > 0000:0123 F0F0 SFTL A,-16,A > 0000:0124 6BF8 ADDM 80h,*(18h) > 0000:0127 F495 NOP > 
0000:0128 F495 NOP > 0000:0129 8A17 POPM 17h > 0000:012A 8A11 POPM 11h > 0000:012B F4E4 FRET > > Both the vector multiply and the end shift look pretty damn efficient > to me, Tim. > > Thus even if we agree to interpret your point differently, it's still > inaccurate for one of the most popular DSPs in the world.
(A) That is the _only_ case that I know of for sure where the compiler can figure out that it needs to use a MAC and shift. The version of Code Composter that comes with the '2812 certainly doesn't do this, or else I couldn't find the magic finger-ring combination.

(B) As usual TI is playing fast and loose with the ANSI standard, and that isn't even close to ANSI-compatible C. If it were, the x[n] * y[n] operation would be truncated to 16 bits before being added to acc, and the result would be meaningless. Compile that on a machine that supports 16-bit and 32-bit integers, print out the results, and see what I mean.

Furthermore, you are actually making my point: by starting with an awareness of the one thing that sets a DSP apart from the rest and warping your code to fit that one thing, you can make the operation very fast. But in production code you will have to be constantly on guard to make sure that the C code isn't "improved" in such a way that the compiler implements it as a bunch of "traditional" integer operations, thereby making it 10-100 times slower (more like 10; TI is good at making fast processors) and likely reintroducing the truncation.
-- Tim Wescott Wescott Design Services http://www.wescottdesign.com
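For reference, the portable way to ask for a 16x16 -> 32 multiply in C is to widen the operands before multiplying. The sketch below uses the standard `<stdint.h>` types rather than the thread's `dsptypes.h`; the function names are hypothetical. Where `int` is 16 bits, an unwidened `x[n] * y[n]` is evaluated in 16 bits, and a product that does not fit is formally undefined behaviour, which in practice usually leaves only the low word.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product with operands widened before the multiply, so the full
 * 32-bit product is kept even where int is only 16 bits wide.
 * No guard against overflow of the 32-bit accumulator itself. */
int32_t dot_widened(const int16_t *x, const int16_t *y, size_t len)
{
    int32_t acc = 0;
    size_t n;

    for (n = 0; n < len; n++)
        acc += (int32_t)x[n] * (int32_t)y[n];
    return acc;
}

/* Low 16 bits of a product (as an unsigned bit pattern): what is left
 * if the multiply result is cut to one word before accumulation. */
uint16_t low16_of_product(int16_t a, int16_t b)
{
    return (uint16_t)((uint32_t)((int32_t)a * (int32_t)b) & 0xFFFFu);
}
```

With the test data from the posted program (x[n] = n, y[n] = 63 - n) the sum is 41664 and the largest single product is 31 x 32 = 992, so that particular input would not expose a truncation either way. An input such as 1000 x 1000 would: the full product is 1000000, while its low 16 bits are 16960.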
Randy Yates wrote:
> Jerry Avins <jya@ieee.org> writes: > > >>Randy Yates wrote: >> >>>Jerry Avins <jya@ieee.org> writes: >>> >> >>>>Doesn't the usual fixed-point hardware do a shift after >>>>multiplying? "Redundant sign bit" and all that. >> >>>Depends on the processor. The TI TMS3205xx series does not by default >> >>>(there is a register for setting this behavior). The Motorola does. >> >>Default or not, it isn't behavior one gets automatically from int >>operations in C. That was my point. > > > If by "it" you mean "automatic left shift by one bit after multiply," > I agree. That is, it is true that the integer multiply operations in C > do not automatically left shift the result by one. > > However, that wasn't what you asked, so I'm not sure how my response > was off-point to your question.
Then, as all too often, I was too oblique to get my point across. Earlier, you wrote, "Whether the integers are reinterpreted to be fractional, fixed-point, or integer is all in the interpretation and has little or nothing to do with the implementation of the basic arithmetic operations (add, subtract, multiply)."

There is -- or can be -- a slight but significant difference with multiplication. Coding the shift in C can use a few extra cycles. Since the choice between fixed-point and integer multiplication is made with a bit in a register, I doubt that any compiler that knows only integer would do fixed point efficiently.

There are other differences that just occurred to me. In the fixed-point shift, what had been the MSB of the low-order word becomes the LSB of the returned result; that's not possible with an HLL integer multiply. Worse, an HLL integer multiply returns the low word of the product. A fixed-point multiply returns (mostly) the high word. I can remedy this with int in C by promoting to long, then multiplying, shifting, and truncating to int. That's painful (although it could be a macro) and time consuming. And it won't work with longs. That's too stupid a scenario. What am I missing?

Jerry
-- Engineering is the art of making what you want from things you can get.
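The "promote, multiply, shift, truncate" remedy can be sketched as a macro for 16-bit fractional (Q15) operands. The names `Q15_MUL` and `q15_mul_sat` are hypothetical, and the code assumes 16-bit data held in `int16_t` with a 32-bit intermediate rather than any particular compiler's `int` and `long`.

```c
#include <stdint.h>

/* Fractional (Q15) multiply: promote, multiply, shift, truncate.
 * Assumes >> on a negative value is an arithmetic shift (the usual
 * behaviour, but implementation-defined in C). */
#define Q15_MUL(a, b) ((int16_t)(((int32_t)(a) * (int32_t)(b)) >> 15))

/* Same operation with the -1.0 x -1.0 corner case clamped to the
 * largest positive Q15 value instead of being left to wrap. */
int16_t q15_mul_sat(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    return (p > INT16_MAX) ? INT16_MAX : (int16_t)p;
}
```

For example, `Q15_MUL(16384, 16384)` is 8192 (0.5 x 0.5 = 0.25), and `q15_mul_sat(-32768, -32768)` gives 32767. Shifting right by 15 keeps the bit that a "shift the high word left by one" approach would lose. Whether a given compiler turns this into a single fractional multiply is a separate question that only its generated code can answer.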
in article xxp3bvb1mkl.fsf@usrts005.corpusers.net, Randy Yates at
randy.yates@sonyericsson.com wrote on 03/04/2005 10:02:

> Tim Wescott <tim@wescottnospamdesign.com> writes: > >> Even on a MAC-less processor when you are in assembly and multiply two >> signed numbers N-bit numbers you can choose to take the lower N bits >> of the 2N-1 bit result as C does, or you can take the upper N-1 bits >> and do a shift, with way fewer clock cycles (10 or 20:1) than you >> could implement the same functionality in C. > > The product of two N-bit numbers using two's complement arithmetic > requires 2*N bits to represent all possible results, not 2*N - 1 bits. > > We can agree that it is alright to use 2*N - 1 bits and saturate or > truncate the one input combination that requires 2*N bits, but it is > improper to merely assume this is to be done (often it is not!).
it *is* only one case, the -1 x -1 = +1 (or in integer math, -2^(N-1) x -2^(N-1) = +2^(2N-2)), which requires 2N bits. it is the only case where the two MSBs are not identical.

nonetheless, because an N-bit "fractional" fixed-point number, F, is related to its N-bit 2s-complement integer representation, I, (same bit pattern for both) by

    I = 2^(N-1) * [ F ]    or    F = 2^(1-N) * [ I ]

when two fixed-point numbers are multiplied together

    F1*F2 = [2^(1-N)*I1]*[2^(1-N)*I2] = 2^(1-N) * [ 2^(1-N) * (I1*I2) ]

so after the hardware does the integer multiplication, you have to shift this N-1 bits to the right to get the properly scaled result no matter what. even in this special case where F1 = F2 = -1.0 . in an integer machine, you do your N bit x N bit signed integer multiply, shift left 1 bit, and take your result from the most significant word of the 2N bit product register (which is the same as shifting right N bits).

i know this is in your treatise, Randy, but i wanna make or emphasize another point. what struck me odd about Motorola DSP56K and DSP563xx series, is the LSB in the 56 bit accumulator. they scaled everything right, but failed to recognize that the LSB was always zero (because it's a 47 bit result after MPY, not a 48 bit result). this was a big mistake for a lot of reasons. one that comes to mind is the code you need in the 56K to do table lookup and linear interpolation, which i used to do all the damn time.

    move    #>functable,r1
    move    #>(functable_size/2),n1     ; half size of table
    move    n1,y0                       ; this copies the bits
    move    x:function_input,x0         ; -1 <= function_input < 1
    mpy     y0,x0,a     (r1)+n1         ; point to middle of table
    move    a1,n1
    move    a0,b1                       ; fractional bits go into b1
    lsr     b           (r1)+n1         ; point to first value
    move    x:(r1)+,x0                  ; get first point
    tfr     x0,a        b1,y0
    mac     -y0,x0,a    x:(r1),x0       ; get next point
    macr    y0,x0,a                     ; finish interpolation
                                        ; function result in a

this does one table lookup and linear interpolation. if, in this instruction

    move    a0,b1

they moved the 23 MSBs of a0 into the 23 LSBs of the destination register (with a zero extension), then i could move it directly to y0 and eliminate some instructions. since, after an MPY instruction, the LSB of a0 or b0 is always zero (because of that left shift by one bit inherent to fixed-point arithmetic), they could get rid of that bit and then the 23 bits that are left are precisely the fractional bits i want, and if zero extended when moved to a 24 bit signed fractional register, are precisely in the fractional form that i want.

so, in general, an N bit x N bit signed fixed-point multiply gives a 2N-1 bit result where you get the similarly scaled result out of the N most significant bits. sign extending that into guard bits will take care of the one case of -1 x -1 (as well as take care of other problems), but there are only N-1 "fractional bits" below those N bits to worry about and never N bits.
-- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."
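For readers who do not read 56K assembly, the same table lookup with linear interpolation can be sketched in C. This is not a translation of the routine above: it assumes a 16-bit Q15 input instead of the 56K's 24-bit words, a table with one extra guard entry, and hypothetical names (`TABLE_SIZE`, `interp_lookup`). The idea is the same: scale the input so the integer part selects a table entry and the fractional part weights the step to the next entry.

```c
#include <stdint.h>

#define TABLE_SIZE 8  /* hypothetical number of intervals */

/* Look up f(x) for a Q15 input x in [-1, 1) with linear interpolation.
 * The table must hold TABLE_SIZE + 1 entries so table[i + 1] is valid. */
int16_t interp_lookup(const int16_t *table, int16_t x)
{
    uint32_t u    = (uint32_t)((int32_t)x + 32768);  /* 0 .. 65535 */
    uint32_t pos  = u * TABLE_SIZE;                  /* position, 16.16 format */
    uint32_t i    = pos >> 16;                       /* integer part: index */
    int64_t  frac = (int64_t)(pos & 0xFFFFu);        /* fraction, 16 bits */
    int64_t  diff = (int64_t)table[i + 1] - table[i];

    /* table[i] + frac * (table[i+1] - table[i]), truncated toward zero */
    return (int16_t)(table[i] + (diff * frac) / 65536);
}
```

With a ramp table `t[k] = 1000 * k` for k = 0..8, an input of 0 returns 4000 (the middle of the table) and an input of 4096 (0.125) returns 4500, halfway between entries 4 and 5.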
Jerry Avins <jya@ieee.org> writes:

> Randy Yates wrote: > > Jerry Avins <jya@ieee.org> writes: > > > > >>Randy Yates wrote: > >> > >>>Jerry Avins <jya@ieee.org> writes: > >>> > >> > >>>>Doesn't the usual fixed-point hardware do a shift after > >>>>multiplying? "Redundant sign bit" and all that. > >> > >>>Depends on the processor. The TI TMS3205xx series does not by default > >> > >>>(there is a register for setting this behavior). The Motorola does. > >> > >>Default or not, it isn't behavior one gets automatically from int > >>operations in C. That was my point. > > If by "it" you mean "automatic left shift by one bit after multiply," > > > I agree. That is, it is true that the integer multiply operations in C > > do not automatically left shift the result by one. > > However, that wasn't what you asked, so I'm not sure how my response > > > was off-point to your question. > > Then, as all too often, I was too oblique to get my point > across. Earlier, you wrote, "Whether the integers are reinterpreted to > be fractional, fixed- point, or integer is all in the interpretation > and has little or nothing to do with the implementation of the basic > arithmetic operations (add, subtract, multiply)." > > > There is -- or can be -- a slight but significant difference with > multiplication. Coding the shift in C can use a few extra cycles.
Yes, in the case where you are doing the left shift, there may be extra cycles required to do this in C. However, this isn't always the case, and when it is the case, the number of cycles in the difference will probably be between 0 and "not very many," depending on the compiler, the specific types of operations being performed, and the ingenuity of the programmer. Thus the distinction does not warrant a generic caveat regarding fixed-point arithmetic in C, in my opinion.
> Since the choice between fixed-point and integer multiplication is > made with a bit in a register,
It is this sort of thinking that is propagating the error in this thread. This logic is tantamount to saying that "the difference between a Ford and a Chevy is that a Ford has a better ride." It attempts to generalize what may be true in a specific case.
> I doubt that any compiler that knows > only integer would do fixed point efficiently.
Did you see my other post where I gave actual code that refutes this?
> There are other > differences that just occurred to me. In the fixed-point shift, what > had been the MSB of the low-order word becomes the LSB of the returned > result;
I'm not sure of the scenario you're trying to describe here.
> that's not possible with a HLL integer multiply. Worse, an HLL > integer multiply returns the low word of the product. A fixed-point > multiply returns (mostly) the high word.
Not necessarily. Depends on what you're doing.
> I can remedy this with int in > C by promoting to long, then multiplying, shifting, and truncating to > int. That's painful (although it could be a macro) and time > consuming.
What's so time-consuming about it? It might cost you one cycle. Big whoop.
> And it won't work with longs.
True unless the compiler supports an extended long, i.e., an integer the size of the accumulator. Although double-precision multiplication takes us pretty far afield from the topic.
> That's too stupid a scenario. What am I missing?
I think you're right - you are just beginning to think about more of the details. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124
Randy Yates wrote:
> Jerry Avins <jya@ieee.org> writes:
...
>>There is -- or can be -- a slight but significant difference with >>multiplication. Coding the shift in C can use a few extra cycles. > > > Yes, in the case where you are doing the left shift, there may be > extra cycles required to do this in C. However, this isn't always the > case, and when it is the case, the number of cycles in the difference > will probably be between 0 and "not very many," depending on the > compiler, the specific types of operations being performed, and the > ingenuity of the programmer. > > Thus the distinction does not warrant a generic caveat regarding > fixed-point arithmetic in C, in my opinion.
What general caveat, "Watch out for gotchas"?
>>Since the choice between fixed-point and integer multiplication is >>made with a bit in a register, > > > It is this sort of thinking that is propagating the error in this > thread. This logic is tantamount to saying that "the difference > between a Ford and a Chevy is that a Ford has a better ride." It > attempts to generalize what may be true in a specific case.
I don't understand. Can I expect a C compiler to change the integer/~fixed bit as appropriate?
>>I doubt that any compiler that knows >>only integer would do fixed point efficiently. > > > Did you see my other post where I gave actual code that refutes this?
Yes. Impressive but, I suspect, rare.
>>There are other >>differences that just occurred to me. In the fixed-point shift, what >>had been the MSB of the low-order word becomes the LSB of the returned >>result; > > > I'm not sure of the scenario you're trying to describe here.
The product of a 32-by-32 multiply has 64 bits. Numbering them zero to 63 in increasing significance, an ordinary integer multiply returns zero to 31, or zero to 30 plus the sign bit, 63. A fixed-point multiply should return bits 31 to 62 of the product. Shifting the upper word alone forces bit 31 to zero. (I think I could be clearer. Should I try?)
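The bit numbering can be made concrete with a 64-bit intermediate. The sketch below assumes a compiler that provides `int64_t`; the function names are hypothetical. One function returns bits 0 to 31 of the product, which is what an ordinary 32-bit integer multiply keeps, and the other returns bits 31 to 62, which is the fractional (Q31) result.

```c
#include <stdint.h>

/* Low word of the 64-bit product: bits 0..31, i.e. what a plain
 * 32-bit integer multiply keeps. */
uint32_t mul_low_word(int32_t a, int32_t b)
{
    return (uint32_t)((uint64_t)((int64_t)a * (int64_t)b) & 0xFFFFFFFFu);
}

/* Fractional (Q31) result: bits 31..62 of the 64-bit product.
 * Assumes >> on a negative value is an arithmetic shift. The one
 * overflowing case, -1.0 x -1.0, is clamped. */
int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * (int64_t)b) >> 31;
    return (p > INT32_MAX) ? INT32_MAX : (int32_t)p;
}
```

For 0.5 x 0.5 (both operands 0x40000000), `q31_mul` gives 0x20000000 while `mul_low_word` gives 0. For operands 0x00010000 and 0x00008000 the product is exactly 2^31, so the upper word is zero and shifting it alone would return 0, whereas `q31_mul` returns 1: that is bit 31 of the product arriving as the LSB of the result.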
>>that's not possible with a HLL integer multiply. Worse, an HLL >>integer multiply returns the low word of the product. A fixed-point >>multiply returns (mostly) the high word. > > > Not necessarily. Depends on what you're doing.
Please elaborate.
>>I can remedy this with int in >>C by promoting to long, then multiplying, shifting, and truncating to >>int. That's painful (although it could be a macro) and time >>consuming. > > > What's so time-consuming about it? It might cost you one cycle. Big > whoop.
Better than I supposed by far. I may give up on assembler after all!
>>And it won't work with longs. > > > True unless the compiler supports an extended long, i.e., an integer > the size of the accumulator. Although double-precision multiplication > takes us pretty far afield from the topic. > > >>That's too stupid a scenario. What am I missing? > > > I think you're right - you are just beginning to think about more of > the details.
I hate psyching out a compiler when I know what I want in the end. "Just do it" never seemed more appropriate. Thanks for the education.

Jerry
-- Engineering is the art of making what you want from things you can get.