DSPRelated.com
Forums

Fast bit-reverse on an x86?

Started by Tim Wescott November 2, 2010
Doing some work on a PC.  Customer says it's not real time, but real 
fast is still better than real slow.  I need to bit-reverse a slew of 
bytes, so the faster the better.

AFAIK the Pentium instruction set doesn't have a bit-reverse 
instruction, and there certainly isn't one available from C.  My 
knee-jerk reaction to this problem is to make up a table of bit-reversed 
bytes, and do a lookup to retrieve them.

Anyone know a faster way?

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
On Nov 2, 8:07&#4294967295;pm, Tim Wescott <t...@seemywebsite.com> wrote:
> Doing some work on a PC. &#4294967295;Customer says it's not real time, but real > fast is still better than real slow. &#4294967295;I need to bit-reverse a slew of > bytes, so the faster the better. > > AFAIK the Pentium instruction set doesn't have a bit-reverse > instruction, and there certainly isn't one available from C. &#4294967295;My > knee-jerk reaction to this problem is to make up a table of bit-reversed > bytes, and do a lookup to retrieve them. > > Anyone know a faster way?
Don't know the x86 really but generally this should be doable the "usual" way: shift source once right (so LSB goes into carry, X or whatever), shift destination once left (so carry goes into LSB). Repeat 8 times. No memory accesses involved so perhaps not slower than table lookup at all, and if slower not that bad anyway. I would measure which is faster on the particular hardware if speed is really important. Dimiter ------------------------------------------------------ Dimiter Popoff Transgalactic Instruments http://www.tgi-sci.com ------------------------------------------------------ http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On Nov 2, 2:07&#4294967295;pm, Tim Wescott <t...@seemywebsite.com> wrote:
> Doing some work on a PC. &#4294967295;Customer says it's not real time, but real > fast is still better than real slow. &#4294967295;I need to bit-reverse a slew of > bytes, so the faster the better. > > AFAIK the Pentium instruction set doesn't have a bit-reverse > instruction, and there certainly isn't one available from C. &#4294967295;My > knee-jerk reaction to this problem is to make up a table of bit-reversed > bytes, and do a lookup to retrieve them. > > Anyone know a faster way?
faster than table lookup??? how big are the words? did you say bytes? if the words aren't that big, the table ain't that big, memory is cheap, why wouldn't LUT be the fast and simple way to do it? r b-j
On 2010-11-02, Tim Wescott <tim@seemywebsite.com> wrote:

> Doing some work on a PC. Customer says it's not real time, but real > fast is still better than real slow. I need to bit-reverse a slew of > bytes, so the faster the better. > > AFAIK the Pentium instruction set doesn't have a bit-reverse > instruction, and there certainly isn't one available from C. My > knee-jerk reaction to this problem is to make up a table of > bit-reversed bytes, and do a lookup to retrieve them.
That's how I've always done it.
> Anyone know a faster way?
If there's no bit-reverse instruction, I can't really imagine anything faster than doing a lookup in a 256-byte table. I do know somebody (a HW engineer) who once wired up two I/O ports so that he could do a bit-reversal by writing to one port and reading from the other. However, on most machines that's still not any faster than a lookup table (and probably not something you can easily do on a PC). -- Grant Edwards grant.b.edwards Yow! ! The land of the at rising SONY!! gmail.com
>Doing some work on a PC. Customer says it's not real time, but real >fast is still better than real slow. I need to bit-reverse a slew of >bytes, so the faster the better. > >AFAIK the Pentium instruction set doesn't have a bit-reverse >instruction, and there certainly isn't one available from C. My >knee-jerk reaction to this problem is to make up a table of bit-reversed >bytes, and do a lookup to retrieve them. > >Anyone know a faster way?
A table is OK for reversing a byte, but if you are reversing strings of bytes you can try approaches that work on multiple bytes in parallel. On a 32 bit machine this works well for strings of bytes, working 4 bytes at a time: uint32_t bit_reverse_4bytes(uint32_t x) { x = ((x & 0xF0F0F0F0) >> 4) | ((x & 0x0F0F0F0F) << 4); x = ((x & 0xCCCCCCCC) >> 2) | ((x & 0x33333333) << 2); return ((x & 0xAAAAAAAA) >> 1) | ((x & 0x55555555) << 1); } On a 64 bit machine you can up the chunks to 8 bytes with: uint64_t bit_reverse_8bytes(uint64_t x) { x = ((x & 0xF0F0F0F0F0F0F0F0LLU) >> 4) | ((x & 0x0F0F0F0F0F0F0F0FLLU) << 4); x = ((x & 0xCCCCCCCCCCCCCCCCLLU) >> 2) | ((x & 0x3333333333333333LLU) << 2); return ((x & 0xAAAAAAAAAAAAAAAALLU) >> 1) | ((x & 0x5555555555555555LLU) << 1); } If you want to reverse whole words, a table isn't very practical. This will reverse a 32 bit word fairly quickly: uint32_t bit_reverse32(uint32_t x) { x = (x >> 16) | (x << 16); x = ((x & 0xFF00FF00) >> 8) | ((x & 0x00FF00FF) << 8); x = ((x & 0xF0F0F0F0) >> 4) | ((x & 0x0F0F0F0F) << 4); x = ((x & 0xCCCCCCCC) >> 2) | ((x & 0x33333333) << 2); return ((x & 0xAAAAAAAA) >> 1) | ((x & 0x55555555) << 1); } There are tricks with crafted multiplies which will concurrently shunt different bits to different parts of the result word. It might be possible to do fairly fast reversal like that. Steve
On Tue, 2 Nov 2010 11:18:22 -0700 (PDT), robert bristow-johnson
<rbj@audioimagination.com> wrote:

>On Nov 2, 2:07=A0pm, Tim Wescott <t...@seemywebsite.com> wrote: >> Doing some work on a PC. =A0Customer says it's not real time, but real >> fast is still better than real slow. =A0I need to bit-reverse a slew of >> bytes, so the faster the better. >> >> AFAIK the Pentium instruction set doesn't have a bit-reverse >> instruction, and there certainly isn't one available from C. =A0My >> knee-jerk reaction to this problem is to make up a table of bit-reversed >> bytes, and do a lookup to retrieve them. >> >> Anyone know a faster way? > >faster than table lookup??? > >how big are the words? did you say bytes? > >if the words aren't that big, the table ain't that big, memory is >cheap, why wouldn't LUT be the fast and simple way to do it? > >r b-j
On a modern Intel CPU (if that's what he means, they haven't been x86 for a long time), it can make a big difference if the LUT isn't in the cache memory. A programmed solution (like the suggestion to shift in and out of the carry bit) may well be faster if it runs out of cache and the LUT accesses don't. I don't know how hard it is to make sure the LUT is in the cache if it's done that way. Maybe that's not hard with the right tools, but it would be an important point to verify if the difference matters. Eric Jacobsen Minister of Algorithms Abineau Communications http://www.abineau.com
steveu <steveu@n_o_s_p_a_m.coppice.org> wrote:
>>Doing some work on a PC. Customer says it's not real time, but real >>fast is still better than real slow. I need to bit-reverse a slew of >>bytes, so the faster the better.
(snip)
> A table is OK for reversing a byte, but if you are reversing strings of > bytes you can try approaches that work on multiple bytes in parallel.
For IA32 there is BSWAP to swap the bytes in a 32 bit register, and XCHG to swap bytes in a 16 bit register. -- glen
On Nov 2, 4:45&#4294967295;pm, Walter Banks <wal...@bytecraft.com> wrote:
> Tim Wescott wrote: > > > Doing some work on a PC. &#4294967295;Customer says it's not real time, but real > > fast is still better than real slow. &#4294967295;I need to bit-reverse a slew of > > bytes, so the faster the better. > > > AFAIK the Pentium instruction set doesn't have a bit-reverse > > instruction, and there certainly isn't one available from C. &#4294967295;My > > knee-jerk reaction to this problem is to make up a table of bit-reversed > > bytes, and do a lookup to retrieve them. > > > Anyone know a faster way? > > Table lookup is very fast but > this is quicker than doing it the hard way > > unsigned int > reverse(register unsigned int x) > { > &#4294967295; &#4294967295; &#4294967295; &#4294967295; x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); > &#4294967295; &#4294967295; &#4294967295; &#4294967295; x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); > &#4294967295; &#4294967295; &#4294967295; &#4294967295; x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); > &#4294967295; &#4294967295; &#4294967295; &#4294967295; x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); > &#4294967295; &#4294967295; &#4294967295; &#4294967295; return((x >> 16) | (x << 16)); > > }
And if you're doing this in x86 assembler, the last two lines can be replaced with a bswap. Plus in assembler you can eliminate roughly half the shifts by using rotates, and then compensating in the next step for the overall rotation of the partial result. Not tested, but roughly: ; input/output in eax mov ebx,eax and eax,0xaaaaaaaa ror eax,2 and ebx,0x55555555 or eax,ebx mov ebx,eax and eax,0x66666666 ror eax,4 and ebx,0x99999999 or eax,ebx mov ebx,eax and eax,0x1e1e1e1e ror eax,8 and ebx,0xe1e1e1e1 or eax,ebx rol eax,7 bswap eax That should work for 64 bit words on x86-64 by just changing the E?Xs with R?Xs.

Tim Wescott wrote:
> > Doing some work on a PC. Customer says it's not real time, but real > fast is still better than real slow. I need to bit-reverse a slew of > bytes, so the faster the better. > > AFAIK the Pentium instruction set doesn't have a bit-reverse > instruction, and there certainly isn't one available from C. My > knee-jerk reaction to this problem is to make up a table of bit-reversed > bytes, and do a lookup to retrieve them. > > Anyone know a faster way?
Table lookup is very fast but this is quicker than doing it the hard way unsigned int reverse(register unsigned int x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return((x >> 16) | (x << 16)); } http://aggregate.org/MAGIC/#Bit%20Reversal A number of bit reversal algorithms with source are benchmarked here http://stackoverflow.com/questions/746171/best-algorithm-for-bit-reversal-from-msb-lsb-to-lsb-msb-in-c Regards, w.. -- Walter Banks Byte Craft Limited http://www.bytecraft.com --- news://freenews.netfront.net/ - complaints: news@netfront.net ---

Tim Wescott wrote:
> Doing some work on a PC. Customer says it's not real time, but real > fast is still better than real slow. I need to bit-reverse a slew of > bytes, so the faster the better.
What do you do? Mirroring some bitmaps?
> AFAIK the Pentium instruction set doesn't have a bit-reverse > instruction, and there certainly isn't one available from C. My > knee-jerk reaction to this problem is to make up a table of bit-reversed > bytes, and do a lookup to retrieve them. > > Anyone know a faster way?
1. Take FFT. 2. Invert the imaginary part 3. Take inverse FFT. Show them that you are the DSP engineer, not some code monkey :-) Vladimir Vassilevsky DSP and Mixed Signal Design Consultant http://www.abvolt.com