DSPRelated.com
Forums

Pseudorandom Hashing

Started by Tim Wescott September 23, 2004
I am having trouble coming up with the right keywords to do a web 
search, so help me out here:

There is a technique where, to significantly reduce the probability of 
getting a long string of zeros, a message is run through a CRC 
generator, and the output bits are taken off.  The transmitted message 
is thoroughly hashed, yet it is a simple matter of a shift register and 
some XOR gates to decode the message on the other end.

I thought I knew how to do this, yet in trying to actually make it work 
I find that over half of my brain cells appear to be attending a 
management seminar.

So, know where I can find out how to do this right?  "pseudorandom" and 
"hash" get me tons of cryptography, but not what I'm looking for.

Thanks.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott
<tim@wescottnospamdesign.com> wrote:

>I am having trouble coming up with the right keywords to do a web >search, so help me out here: > >There is a technique where, to significantly reduce the probability of >getting a long string of zeros, a message is run through a CRC >generator, and the output bits are taken off. The transmitted message >is thoroughly hashed, yet it is a simple matter of a shift register and >some XOR gates to decode the message on the other end. > >I thought I knew how to do this, yet in trying to actually make it work >I find that over half of my brain cells appear to be attending a >management seminar.
"attending a management seminar".... I like that ;-)
> >So, know where I can find out how to do this right? "pseudorandom" and >"hash" get me tons of cryptography, but not what I'm looking for. > >Thanks.
My mind is slow today also, but isn't it the same in AND out... XMIT: 2-bit SR, XOR the two SR Q's together to get output RECVR: SAME ...Jim Thompson -- | James E.Thompson, P.E. | mens | | Analog Innovations, Inc. | et | | Analog/Mixed-Signal ASIC's and Discrete Systems | manus | | Phoenix, Arizona Voice:(480)460-2350 | | | E-mail Address at Website Fax:(480)460-2142 | Brass Rat | | http://www.analog-innovations.com | 1962 | I love to cook with wine. Sometimes I even put it in the food.
Jim Thompson wrote:
> On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott > <tim@wescottnospamdesign.com> wrote: > > >>I am having trouble coming up with the right keywords to do a web >>search, so help me out here: >> >>There is a technique where, to significantly reduce the probability of >>getting a long string of zeros, a message is run through a CRC >>generator, and the output bits are taken off. The transmitted message >>is thoroughly hashed, yet it is a simple matter of a shift register and >>some XOR gates to decode the message on the other end. >> >>I thought I knew how to do this, yet in trying to actually make it work >>I find that over half of my brain cells appear to be attending a >>management seminar. > > > "attending a management seminar".... I like that ;-) > > >>So, know where I can find out how to do this right? "pseudorandom" and >>"hash" get me tons of cryptography, but not what I'm looking for. >> >>Thanks. > > > My mind is slow today also, but isn't it the same in AND out... > > XMIT: 2-bit SR, XOR the two SR Q's together to get output > > RECVR: SAME > > ...Jim Thompson
IIRC you XOR and feed back on the input, and just XOR on the output. But IIRC then I wouldn't need to ask. -- Tim Wescott Wescott Design Services http://www.wescottdesign.com
On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott
<tim@wescottnospamdesign.com> wrote:

>I am having trouble coming up with the right keywords to do a web >search, so help me out here: > >There is a technique where, to significantly reduce the probability of >getting a long string of zeros, a message is run through a CRC >generator, and the output bits are taken off. The transmitted message >is thoroughly hashed, yet it is a simple matter of a shift register and >some XOR gates to decode the message on the other end. > >I thought I knew how to do this, yet in trying to actually make it work >I find that over half of my brain cells appear to be attending a >management seminar. > >So, know where I can find out how to do this right? "pseudorandom" and >"hash" get me tons of cryptography, but not what I'm looking for. > >Thanks.
It's called data scrambling. Modems sometimes use it to make sure there are enough data transitions to keep the receive end in sync. Try "modem data scrambling" or something. There are also n/m codes, where each group of n data bits is replaced with a mapped group of m bits, where m = n+1, often. This is also a variation on RLL (run length limiting) coding, I think. John
On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott
<tim@wescottnospamdesign.com> wrote:

>I am having trouble coming up with the right keywords to do a web >search, so help me out here: > >There is a technique where, to significantly reduce the probability of >getting a long string of zeros, a message is run through a CRC >generator, and the output bits are taken off. The transmitted message >is thoroughly hashed, yet it is a simple matter of a shift register and >some XOR gates to decode the message on the other end. > >I thought I knew how to do this, yet in trying to actually make it work >I find that over half of my brain cells appear to be attending a >management seminar. > >So, know where I can find out how to do this right? "pseudorandom" and >"hash" get me tons of cryptography, but not what I'm looking for.
--- Is this what you had in mind? A-----------+ +------------------------Y EXOR | | B--+ | +--A | | EXOR Y--[D Q]---[D Q]---[D Q]-+-[D Q]--+--->DATA OUT DATA IN>-- B Each [D Q] is a stage of a shift register, and as each new data bit is presented to the first EXOR, it's EXORed with the EXOR of older bits which have propagated down the chain and been EXORed with each other. The result is that the DATA OUT stream of bits is scrambled and bears no resemblance, on the surface, to DATA IN. In order to recover DATA IN, the DATA OUT stream is presented as DATA IN to an identical shift register / EXOR circuit and, if they're both synced up (the circuits) what comes out of the receiver will be exactly what went into of the transmitter. Depending on what goes into the transmitter input, there may be long strings of ones or zeroes which come out of the output. This can be a bad thing for synchronous data modems which use data transitions to keep their their 'dot clocks' in sync with incoming data, so what's done to circumvent that is that a counter is placed in the transmitter circuitry which counts ones and/or zeroes, and if a certain number of them occur, in a row, then a one or a zero is stuffed into the data stream (or a bit is flipped) to break up the sequence. -- John Fields
On Thu, 23 Sep 2004 18:39:10 -0500, John Fields
<jfields@austininstruments.com> wrote:

>On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott ><tim@wescottnospamdesign.com> wrote: > >>I am having trouble coming up with the right keywords to do a web >>search, so help me out here: >> >>There is a technique where, to significantly reduce the probability of >>getting a long string of zeros, a message is run through a CRC >>generator, and the output bits are taken off. The transmitted message >>is thoroughly hashed, yet it is a simple matter of a shift register and >>some XOR gates to decode the message on the other end. >> >>I thought I knew how to do this, yet in trying to actually make it work >>I find that over half of my brain cells appear to be attending a >>management seminar. >> >>So, know where I can find out how to do this right? "pseudorandom" and >>"hash" get me tons of cryptography, but not what I'm looking for. > > >--- >Is this what you had in mind? > > A-----------+ > +------------------------Y EXOR | > | B--+ | > +--A | | > EXOR Y--[D Q]---[D Q]---[D Q]-+-[D Q]--+--->DATA OUT >DATA IN>-- B > >Each [D Q] is a stage of a shift register, and as each new data bit is >presented to the first EXOR, it's EXORed with the EXOR of older bits >which have propagated down the chain and been EXORed with each other. > >The result is that the DATA OUT stream of bits is scrambled and bears >no resemblance, on the surface, to DATA IN. > >In order to recover DATA IN, the DATA OUT stream is presented as DATA >IN to an identical shift register / EXOR circuit and, if they're both >synced up (the circuits) what comes out of the receiver will be >exactly what went into of the transmitter. > >Depending on what goes into the transmitter input, there may be long >strings of ones or zeroes which come out of the output. This can be a >bad thing for synchronous data modems which use data transitions to >keep their their 'dot clocks' in sync with incoming data, so what's >done to circumvent that is that a counter is placed in the transmitter >circuitry which counts ones and/or zeroes, and if a certain number of >them occur, in a row, then a one or a zero is stuffed into the data >stream (or a bit is flipped) to break up the sequence.
I posted... Newsgroups: alt.binaries.schematics.electronic Subject: Pseudorandom Hashing (S.E.D) - NRZ-AddTransitions.pdf Message-ID: <9mo6l0tpq9jn3q07fsrfpkd1lbot33330i@4ax.com> which is INCORRECT. John has it right... that's what I was trying to remember from eons ago. ...Jim Thompson -- | James E.Thompson, P.E. | mens | | Analog Innovations, Inc. | et | | Analog/Mixed-Signal ASIC's and Discrete Systems | manus | | Phoenix, Arizona Voice:(480)460-2350 | | | E-mail Address at Website Fax:(480)460-2142 | Brass Rat | | http://www.analog-innovations.com | 1962 | I love to cook with wine. Sometimes I even put it in the food.
On Thursday 23 September 2004 04:39 pm, John Fields did deign to grace us
with the following:

> On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott
>>So, know where I can find out how to do this right? "pseudorandom" and >>"hash" get me tons of cryptography, but not what I'm looking for. > Is this what you had in mind? > > A-----------+ > +------------------------Y EXOR | > | B--+ | > +--A | | > EXOR Y--[D Q]---[D Q]---[D Q]-+-[D Q]--+--->DATA OUT > DATA IN>-- B > > Depending on what goes into the transmitter input, there may be long > strings of ones or zeroes which come out of the output. This can be a > bad thing for synchronous data modems which use data transitions to > keep their their 'dot clocks' in sync with incoming data, so what's > done to circumvent that is that a counter is placed in the transmitter > circuitry which counts ones and/or zeroes, and if a certain number of > them occur, in a row, then a one or a zero is stuffed into the data > stream (or a bit is flipped) to break up the sequence.
I did a bunch of these on protoboard, right out of Don Lancaster's TTL Cookbook, about a thousand years ago. ;-) Just want to mention a couple of things ... well, one is moot for a data stream (unless, as noted, there's a long string of 1s or 0s), but for a free-running PRNG, you have to account for the "dead" state. I think that if it somehow accidentally becomes all 0's or all 1's, I think which is which is determined by how many xors or something, it gets stuck. So I had like an 8-input AND or something to kick it out of that. And the other question is, how do you sync-up the TX and RX anyway? If I had to take a WAG, I'd think you'd maybe skip one clock pulse from time to time until sync characters start coming out, but I'd be surprised if someone hasn't already come up with something more sophisticated. Thanks! Rich
In article <10l6gmbq4c3ngff@corp.supernews.com>,
Tim Wescott  <tim@wescottnospamdesign.com> wrote:
[...]
>There is a technique where, to significantly reduce the probability of >getting a long string of zeros, a message is run through a CRC >generator, and the output bits are taken off. The transmitted message >is thoroughly hashed, yet it is a simple matter of a shift register and >some XOR gates to decode the message on the other end.
Another thing to look up is the GCR method that used to be used on tape decks. It worked to reduce the number of edges per byte and thus limit the high frequencies. IIRC: the math for developing the codes for GCR is very like the math for ensuring transitions happen every so often. -- -- kensmith@rahul.net forging knowledge
Tim Wescott wrote:

[...]

> There is a technique where, to significantly reduce the probability of > getting a long string of zeros, a message is run through a CRC > generator, and the output bits are taken off. The transmitted message > is thoroughly hashed, yet it is a simple matter of a shift register and > some XOR gates to decode the message on the other end. > > I thought I knew how to do this, yet in trying to actually make it work > I find that over half of my brain cells appear to be attending a > management seminar. > > So, know where I can find out how to do this right? "pseudorandom" and > "hash" get me tons of cryptography, but not what I'm looking for. > > Thanks. > > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com
Tim, It's called scrambling - but are you sure you want to do it? Recall the probability of a given bit pattern in a random sequence is 1/2^n, where n is the length of the pattern. So in any random group of 8 bits, you have a 1/256 chance of finding the bit pattern 0000000. If this matches or partially matches a similar pattern in the data, the output will be a long string of zeros. If this causes a system failure, it may occur fairly often. Many encoding schemes have no dc component and can tolerate long strings of ones or zeros. The penalty is low efficiency. Other methods limit the length of the runs, for example the RLL codes that were used in hard disk drives. Optical data transmission uses this technique, and I believe Ethernet. One method that achieves high bit efficiency and has no dc component is the old MFM, or Miller code that was used on hard disk drives prior to RLL. It is very simple to encode and decode, and has twice the efficiency of Manchester encoding. So if the data absolutely has got to get there, any of the RLL encoding methods may be a better choice than data scrambling. Mike
On Thu, 23 Sep 2004 15:26:18 -0700, Tim Wescott wrote:

> Jim Thompson wrote: >> On Thu, 23 Sep 2004 14:40:28 -0700, Tim Wescott >> <tim@wescottnospamdesign.com> wrote: >> >> >>>I am having trouble coming up with the right keywords to do a web >>>search, so help me out here: >>> >>>There is a technique where, to significantly reduce the probability of >>>getting a long string of zeros, a message is run through a CRC >>>generator, and the output bits are taken off. The transmitted message >>>is thoroughly hashed, yet it is a simple matter of a shift register and >>>some XOR gates to decode the message on the other end. >>> >>>I thought I knew how to do this, yet in trying to actually make it work >>>I find that over half of my brain cells appear to be attending a >>>management seminar. >> >> >> "attending a management seminar".... I like that ;-) >> >> >>>So, know where I can find out how to do this right? "pseudorandom" and >>>"hash" get me tons of cryptography, but not what I'm looking for. >>> >>>Thanks. >> >> >> My mind is slow today also, but isn't it the same in AND out... >> >> XMIT: 2-bit SR, XOR the two SR Q's together to get output >> >> RECVR: SAME >> >> ...Jim Thompson > IIRC you XOR and feed back on the input, and just XOR on the output. > > But IIRC then I wouldn't need to ask.
I think you are talking about two things. One is a linear feedback shift register pseudo random noise generator. (LFSR PRNG). These can be made with just a shift register (or d flip flops) and a few xor gates. The other is some kind of encoder, such as a 4-bit to 5-bit encoder. (Two of them together can make an 8-bit/10-bit encoder) Since there are only 16 4-bit sequences, but 32 5-bit sequences, you can map each 4-bit sequence to a 5-bit sequence with a balance of ones and zeroes. There is a standard way to do this. I believe ethernet does this (well, the 8b/10b version) to guarantee that there is no net DC in the ethernet signal. Fibre Channel also does this to guarantee that there is no DC, and also to ensure that there are enough edges for clock recovery. Let me return to the topic of the LFSR. There is a whole literature on these things, and the combinations that work are well known. I would advise you to find a list of the known good ones rather than trying to find one by trial and error. I think the A of E has a section on this. Could be wrong. AFAIK, the LFSR does not decrease the probability of having long runs of zeroes provided that the data going over the wire are random. On the other hand, if the data going over the wire are known to contain long runs of zeroes, then I guess that the LFSR would offer an improvement to that situation, but it couldn't guarantee it by any means. So if long runs of zeroes are an inconvenience, then the LFSR might solve your problem. But if the long runs of zeroes are impermissible, then you need to go with something more like the 8b/10b encoding. Good luck. --Mac