(cross posted to comp.dsp and comp.lang.c.)
are compilers now-a-days smart enough to multiply two N-bit numbers to
a 2N-bit result without wasting instructions in the generated machine
code?
like, we know for C, a long times a long is a long. but if you cast
one of the multiplicands to (long long), and then multiply, you get a
long long without any loss of bits. but are the compilers smart
enough to avoid that? let
long *h, *x, *y; // coefs, input, output ptrs
int L; // FIR length
int n=0;
...
while(TRUE)
{
long long accumulator=0LL;
for(int i=0; i<L; i++)
{
accumulator += ( (long long)(h[i]) )*(x[n-i]);
}
long y[n++] = (long)(accumulator>>(8*sizeof(long)-1);
}
(sorry for what google does to the = sign.)
it's not complete code, but i hope you can get the drift. can anyone
comment on whether or not some particular compiler (gcc?) codes that
to an efficient mac instruction without actually sign extending the
value or h[i] and doing a long long multiply?
this had been an old irritant with C, i think the language definition
gets that wrong, but really smart compilers should know what to do,
no?
r b-j
integer multiplication
Started by ●May 20, 2013
Reply by ●May 20, 20132013-05-20
In comp.dsp robert bristow-johnson <rbj@audioimagination.com> wrote:> (cross posted to comp.dsp and comp.lang.c.)> are compilers now-a-days smart enough to multiply two N-bit numbers to > a 2N-bit result without wasting instructions in the generated machine > code?(snip)> long *h, *x, *y; // coefs, input, output ptrs > int L; // FIR length > int n=0; > > ... > > while(TRUE) > { > > long long accumulator=0LL; > > for(int i=0; i<L; i++) > { > accumulator += ( (long long)(h[i]) )*(x[n-i]); > }The only one I ever looked at this on was the IA32 gcc. Since IA32 doesn't have an instruction for multiplying 64 bit integers (with either 64 or 128 bit product) that would normally be done by subroutine call. But it does recognize this case. I would be slightly less sure for a 64 bit archtecture, which does have the appropriate multiply instruction. The way the gcc code generator works, I would expect it (but still check) for any 32 bit system with an appropriate multiply instruction. I never looked at any other compilers. -- glen
Reply by ●May 20, 20132013-05-20
On Sun, 19 May 2013 20:46:11 -0700 (PDT) robert bristow-johnson <rbj@audioimagination.com> wrote:> > (cross posted to comp.dsp and comp.lang.c.) > > are compilers now-a-days smart enough to multiply two N-bit numbers to > a 2N-bit result without wasting instructions in the generated machine > code? > > like, we know for C, a long times a long is a long. but if you cast > one of the multiplicands to (long long), and then multiply, you get a > long long without any loss of bits. but are the compilers smart > enough to avoid that? let > > > long *h, *x, *y; // coefs, input, output ptrs > int L; // FIR length > int n=0; > > ... > > while(TRUE) > { > > long long accumulator=0LL; > > for(int i=0; i<L; i++) > { > accumulator += ( (long long)(h[i]) )*(x[n-i]); > } > > long y[n++] = (long)(accumulator>>(8*sizeof(long)-1); > > } > > (sorry for what google does to the = sign.) > > it's not complete code, but i hope you can get the drift. can anyone > comment on whether or not some particular compiler (gcc?) codes that > to an efficient mac instruction without actually sign extending the > value or h[i] and doing a long long multiply? > > this had been an old irritant with C, i think the language definition > gets that wrong, but really smart compilers should know what to do, > no? > > > r b-jI've played with it on GCC for ARM. If you cast the input operands first, then at least with optimization on it gives you sensible results. For instance: inline int32_t frac_mult(int32_t a, int32_t b) { int64_t result = (int64_t)a * b; return (result >> 32); } performs a 32x32=64 multiply into a dual-register result, and simply ignores the register with the low half of the result. If you really care though, I'd go read disassembly. -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.
Reply by ●May 20, 20132013-05-20
robert bristow-johnson <rbj@audioimagination.com> writes:> (cross posted to comp.dsp and comp.lang.c.) > > are compilers now-a-days smart enough to multiply two N-bit numbers to > a 2N-bit result without wasting instructions in the generated machine > code? > > like, we know for C, a long times a long is a long. but if you cast > one of the multiplicands to (long long), and then multiply, you get a > long long without any loss of bits. but are the compilers smart > enough to avoid that? letKeep in mind that long long is not *necessarily* twice as wide as long. For example, on x86_64 Linux systems, both are typically 64 bits. Your example would work better with int32_t and int64_t.> long *h, *x, *y; // coefs, input, output ptrs > int L; // FIR length > int n=0; > > ... > > while(TRUE) > { > > long long accumulator=0LL; > > for(int i=0; i<L; i++) > { > accumulator += ( (long long)(h[i]) )*(x[n-i]); > } > > long y[n++] = (long)(accumulator>>(8*sizeof(long)-1); > > } > > (sorry for what google does to the = sign.)The "=" characters look fine.> it's not complete code, but i hope you can get the drift. can anyone > comment on whether or not some particular compiler (gcc?) codes that > to an efficient mac instruction without actually sign extending the > value or h[i] and doing a long long multiply? > > this had been an old irritant with C, i think the language definition > gets that wrong, but really smart compilers should know what to do, > no?Certainly a conforming compiler *can* perform this optimization. If it can confirm that both operands of "*" are small enough (which should be easy in this case), then it can determine that a 32-bit by 32-bit to 64-bit multiplication will yield the same result as a 64x64->64 multiplication. I don't know which actual compilers do this. It would be easy enough to find out by examining the generated assembl code (and being more familiar with the instruction set than I am). -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> Working, but not speaking, for JetHead Development, Inc. "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister"
Reply by ●May 20, 20132013-05-20
> >(cross posted to comp.dsp and comp.lang.c.) > >are compilers now-a-days smart enough to multiply two N-bit numbers to >a 2N-bit result without wasting instructions in the generated machine >code? > >like, we know for C, a long times a long is a long. but if you cast >one of the multiplicands to (long long), and then multiply, you get a >long long without any loss of bits. but are the compilers smart >enough to avoid that? let > > >long *h, *x, *y; // coefs, input, output ptrs >int L; // FIR length >int n=0; > >... > >while(TRUE) >{ > >long long accumulator=0LL; > >for(int i=0; i<L; i++) > { > accumulator += ( (long long)(h[i]) )*(x[n-i]); > } > >long y[n++] = (long)(accumulator>>(8*sizeof(long)-1); > >} > >(sorry for what google does to the = sign.) > >it's not complete code, but i hope you can get the drift. can anyone >comment on whether or not some particular compiler (gcc?) codes that >to an efficient mac instruction without actually sign extending the >value or h[i] and doing a long long multiply? > >this had been an old irritant with C, i think the language definition >gets that wrong, but really smart compilers should know what to do, >no? >long long is vague. int64_t would be a better way to express what you want. However, assuming long long is equivalent to int64_t, and long is equivalent to int32_t, then the syntax you give produces sane results with most modern compilers. However, you may need to turn optimisation on to see this. Most compilers follow a very literal interpretation of the C spec at their minimum level of optimisation, perhaps to maximise how well the debugger behaves. Debuggers often become funkier as you increase the optimisation level.
Reply by ●May 20, 20132013-05-20
robert bristow-johnson wrote:>(cross posted to comp.dsp and comp.lang.c.) > >are compilers now-a-days smart enough to multiply two N-bit numbers to >a 2N-bit result without wasting instructions in the generated machine >code? > > >I took a similar example: long x[200], y[200]; foo(int L) { int i; long long accumulator; accumulator = 0; for(i=0;i<L;i++) { accumulator = (long long)x[i] * y[i]; } bar(accumulator); } and compiled it with the Dignus C compiler (in C89 mode) with optimizations, it generated this code: * *** accumulator = (long long)x[i] * y[i]; L 3,@lit_9_3 L 5,0(2,3) M 4,800(2,3) STM 4,5,88(13) ; accumulator That used the z/Architecture M instruction (MULTIPLY) which multiplies two signed 32-bit values producing a signed 64-bit result. To walk thru this, the first instruction loads register #3 with a pointer to that data section, the array 'x' happens to be at offset 0 in that section, The second instruction loads register #5 with x[i] (the value 4*i has been strength-reduced and is in register #2). The next instruction is the multiply isntruction - M. That is multipliying the 32-bit value in register #3 with the 32-bit value in memory at 800(2,3) - which is y[i]. The 64-bit result goes into register #4 and #5. Those two registers are then stored into 'accumulator'. - Dave Rivers - -- rivers@dignus.com Work: (919) 676-0847 Get your mainframe programming tools at http://www.dignus.com
Reply by ●May 20, 20132013-05-20
On 5/19/2013 10:46 PM, robert bristow-johnson wrote:> > are compilers now-a-days smart enough to multiply two N-bit numbers to > a 2N-bit result without wasting instructions in the generated machine > code?Yes. Compilers do opimize that, as well as multiplication of different sizes. Vladimir Vassilevsky DSP and Mixed Signal Designs www.abvolt.com
Reply by ●May 21, 20132013-05-21
On May 20, 1:31�pm, Keith Thompson <ks...@mib.org> wrote:> robert bristow-johnson <r...@audioimagination.com> writes: > > > (sorry for what google does to the = sign.) > > The "=" characters look fine. >boy, i just don't get it with Google Groups. when do they toss in the "3D" after the "=" sign and when don't they? if you go to the other thread (about interpolation), all of my "=" symbols have a "3D" inserted afterward. why not here? thanks everyone for responding. it indeed does look like smart compilers do exist that do not do the sign extension and unnecessary multiword multiplication r b-j
Reply by ●May 21, 20132013-05-21
On 20/05/13 05:46, robert bristow-johnson wrote:> > (cross posted to comp.dsp and comp.lang.c.) > > are compilers now-a-days smart enough to multiply two N-bit numbers to > a 2N-bit result without wasting instructions in the generated machine > code? > > like, we know for C, a long times a long is a long. but if you cast > one of the multiplicands to (long long), and then multiply, you get a > long long without any loss of bits. but are the compilers smart > enough to avoid that? let > > > long *h, *x, *y; // coefs, input, output ptrs > int L; // FIR length > int n=0; > > ... > > while(TRUE) > { > > long long accumulator=0LL; > > for(int i=0; i<L; i++) > { > accumulator += ( (long long)(h[i]) )*(x[n-i]); > } > > long y[n++] = (long)(accumulator>>(8*sizeof(long)-1); > > } > > (sorry for what google does to the = sign.) > > it's not complete code, but i hope you can get the drift. can anyone > comment on whether or not some particular compiler (gcc?) codes that > to an efficient mac instruction without actually sign extending the > value or h[i] and doing a long long multiply? > > this had been an old irritant with C, i think the language definition > gets that wrong, but really smart compilers should know what to do, > no? > > > r b-j >In general, yes - compilers are smart enough to do this. But it will vary between compilers (including different targets for gcc), it may vary according to the widths of the operands, and it will often vary according to compiler flags (if you don't enable optimisation, expect to get poor code). Also note that if you are talking about a DSP and perhaps odd-sized data, such as a 40-bit accumulator, then you may need compiler/target specific types, intrinsic functions or keywords in order to get optimal code. Your only hope of getting this right is to try it and see, and look carefully at the generated assembly code. The compiler is probably smart enough, but you have to understand it well and work with it to get optimal code.
Reply by ●May 21, 20132013-05-21
On Mon, 20 May 2013 22:00:38 -0700, robert bristow-johnson wrote:>> > (sorry for what google does to the = sign.) >> >> The "=" characters look fine. >> > > boy, i just don't get it with Google Groups. when do they toss in the > "3D" after the "=" sign and when don't they? if you go to the other > thread (about interpolation), all of my "=" symbols have a "3D" > inserted afterward. why not here?"= 3D" is quoted-printable (QP) encoding (the equals sign has code 3D in hex). Some clients will always send messages using QP, others will use it if the message contains any non-ASCII characters. Because QP uses the equals sign as an escape character, the equals sign itself must always be encoded when QP is used. The relevant message headers are: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable If QP was used but any of these headers are missing, QP probably won't be decoded and you'll see "= 3D" instead. If the reader doesn't support MIME (it was originally designed for email, not usenet), you'll see the behaviour with any message encoded using QP. Also, whether or not the message uses QP may depend upon the route taken by the message. Nowadays, most news software (servers and clients) is 8-bit clean, so posting raw ISO-8859-* or UTF-8 will probably work. But some servers will automatically convert 8-bit data to QP (or even base-64) to be on the safe side.






