Error correction codes: good intuitive theory for FW developpers?

Forums

Error correction codes: good intuitive theory for FW developpers?

Started by ombz 8 years ago●11 replies●latest reply 8 years ago●191 views

Hello new forum,

It's been a while I haven't posted here and I was surprised by the need to re-verify the user account. Anyway, I appreciate the professionalism in here and the hundreds of years of industry and design experience that come together when accumulated among all users. This is a great environment for young engineers like me to learn from the experienced ones. Also I appreciate the idea of beer awards hehe, very thoughtful. :p

This post is a bit beyond the scope of classical DSP subjects like sampling, filtering, interpolation etc. The background of this post is explained in this paragraph, but not really relevant to the question. You may skip it and jump to the next paragraph directly that addresses my questions. I came accross a sub-project within my last project to be carried out within the industry: a custom low-cost low-power low-speed serial data link through a variable capacitive channel. The capacitance can vary from small (a few dozens of pF) to extremely small (a few dozens of fF) so the signals were likely to be very much corrupted by various types of noise (from various types of noise sources). The electronics design was not my task. And at that point, H/W debugging, noise source analysis and suppression etc.was already carried out to (IMO) a reasonnable level to say that with that kind of system we cannot expect to have any further improvements by H/W tweaking. However the system and firmware design was my task. We have been measuring bit error rates of about 20% in those channels at worst case (depending on the capacitor "geometry") for message lengths of 24 bits. Bit errors were rather of random type (no burst errors usually), but periods of long burst errors might have arrised (but not avoidable within the choice of design we've made). Those bursts usually corrupted huge parts of the message, but luckily they seem to occur for "only" 1 out of 100 messages which did not jeopardize the whole application even if such a corrupted message had to be trashed. So basically the task was to find a way to improve the bit error rate using some ways of error correction. There no exact BER target but with the whole application, a poor 5% would have done it. By means of pragmatism, lack of experience, project schedule pressure and being lucky to have not used all of the 24 bits for useful information at that time, we've simply ended up using the very basic (and poor) repetition code for the most important bits of information, a simple parity check per message, and additional message to message analysis and bit field filtering in time (using "majority"/median/average filter, which have all the same result for binary input/output signals). Other blocks of the communication link like clock and symbol timing recovery, as well as receiver-transmitter synchronization and optimum channel selection have been thoroughly analyzed, debugged and tweaked to a reasonnable maximum. For the project requirements back at the time, the solution was limit acceptable for the clients. But I wasn't truly happy with a basic, poor repetition code. So I kept on trying to grow my knowledge and skills.

Unfortunately I don't remember having had any kind of class about error detection and correction back in college (maybe too much beer? or simply I wasn't part of communication classes). So I googled. Of course I've found loads of information from various colleges and people all around the globe. They've addressed the following types of codes mostly:
- Hamming
- BCH
- Reed-Solomon
- Cyclic
- LDPC
- Turbo
- anything important missed here?
However, those were, in my opinion, just some specific codes which all have the same kind of basic theory (add redundancy in a meaningful way such that minimum Hamming distances would be maximized in order to have maximum bit error robustness) and sometimes quite different internal structures (algebraic vs. random vs. linear etc.). But most of all, I was missing was a good complete theory but without all the heavy maths, a tutorial for developpers or something like a designer's guide in the sense of:
- basic introduction to the theory
- intuitive design guide like: for this and that kind of error correcting capability choose that and that kind of code with those parameters, depending on which platform and kind of communication link etc.
- intuitive explanation of all the majorly used codes without diving into all the (really heavy?!?) algebra, as well as for code construction, encoding and decoding

My question is: is there any website, book or information out there which would fullfil pretty much my demands? If not, okay then it might be time to create a tutorial here hehe. Basically, I would love to have some book or reference that is easy understandable to the point where I can simply use it as a guide like: ok, I have this communication link/budget so I need to use that kind of code; this is how it works; so ok I will be able to implement it (from scratch, as is often needed for custom/embedded/low-cost/low-power electronics).

Thanks for all your suggestions and comments.

Andy

#ECC

[ - ]

Reply by neirober●July 22, 2016

Hi Andy,

I'm not an expert, but FEC can be a tricky subject to jump into. The math can be tedious. I have read parts of the Lin and Costello book below -- it is well written.

Neil

Lin and Costello

https://www.amazon.com/Error-Control-Coding-2nd-Sh...

[ - ]

Reply by ombz●July 22, 2016

Hi Neil,

Thank you for the reply. Yup that's what I discovered too: tricky subject, especially with all the heavy algebra that I am not familar with and that will for sure take some time and patience to get familiar with. :/

With respect to the initial project, I think the only reasonable possibility we could've had (way) better than repetition code would have been either a Hamming code or a pseudo-random code for a code length of 7 or 8 bits and a rate slightly below 0.5. Any other coding scheme would have not been possible on the transmitter side who's equipped with just an 8-bit PIC running at 4 MHz and pretty busy with other stuff already. :p

But the scope of my question goes beyond that project anyway.

I like the last sentence in the description for the book: "Appropriate for those with minimum mathematical background as a comprehensive reference for coding theory." I might give it a try...

Andy

[ - ]

Reply by wolf22●July 22, 2016

Hallo Andy,

as far as I understand you, you have:

1. a serial link, which can transport 24 bits in one transmission.

2. the link is slow, the hardware is already finished and you cannot change it

3. you are faced with the problem, to establish a trustworthy connection, while the hardware together with the serial link is corrupted heavily by noise and noise bursts.

This arises a lot of questions. How is the synchronisation done, to achieve the mentioned 24 bits? Or is the link rather continuous and the hardware already finds the startpoint by applying a pattern, a residuum-test or something like this? Or is also the startpoint uncertain and the positions inside the 24 bit units are not trusteable? Can you reduce the signal bandwidth to achieve a better SNR? Have you any influence, HOW the data is transmitted? (I have much more questions..)

In principle, if you really are faced with 24 bit numbers, the hardware gives you one after the other and if those 24 bit numbers are really already aligned, then my hint would be: learn to live with the errors. To do this, you need to think on blocks of such numbers, applying a fixed structure to those blocks and use checksums like Adler32 or so to the blocks. So you can detect the block boundaries as well as bit errors and you can repeat transmitting the block if necessary. If you have no chance to establish a backward channel to invoke a repeated transmitting, then you need to provide sending the blocks several times.

Sorry to say, but handling errors will always end in spending transmission time and additional time from receiving to presenting the well reconstructed data. If you try to handle it by Reed/Solomon or so, you always have the bits of interest near by each other - and so a burst makes it impossible to recover them. If they are sent a second ot third time after a reasonable time, you have much better chances to recover the bits, because the probability of long taking noise bursts is much lower - if I have understood your problem properly.

kind regards

wolfgang

[ - ]

Reply by ombz●July 22, 2016

Hello Wolfgang,

Actually the whole 24-bit story, the noise, transmission problems etc. is all not really my question, it's just its background. However, I felt like posting this background because I thought it was a neat problem that I'd still know what experts in here would say about.

The true question was basically: is there any really good book/slides/reference on error correcting codes that doesn't go into the whole maths (including theorems and proves and more theory) but that is rather intuitive to understand and apply?

Anyway, back to your questions for completeness:

1. yes
2. yes
3. yes, "heavily", hm... hard to say.

About the synchronization etc. : it's hard to explain in brief and kinda out of topic. But basically: the 24-bit bits are transmitted during a time of approx 300 us and then there's nothing anymore (0) for a few ms. I am not faced to 24-bit numbers. But basically I have 24 bits at free disposal to transfer 2 status bits, an ID and a 9-bit ADC value. As it was a prototype, the ID could be left out. So overall I've had at most 13 bits left for FEC. Most important were the 2 status bits, hence the basic repetition code did the job. I still hard-coded a fix number (ID) on 4 bits so I could check it on the other side if it were corrupted or not, taking it as an indicator if the message was corrupted by burst errors and hence ignored it, and otherwise pass the ADC value into a median filter (in time, hence for all valid messages), in order to have an additional protection (introducing additional delay, but that was acceptable for the ADC value, not for the status bits though) if ever the burst arrived just after the ID was sent, or for any remaining random bit errors (no bursts).

My problem in the project was rather on how to correct the random bit errors (which arrived approx every 10us) and not the (long and bad) burst errors (which was certainly a few ms long but only arrived maybe once a second).

Thank you. :)

[ - ]

Reply by napierm●July 22, 2016

Hello,

What you are describing are burst errors. The usual code for these is Reed-Solomon. For every two parity symbols added the code can correct one symbol. You are limited by the block length of the code word, say 255 bytes (symbols) for a typical 8-bit RS code. For instance, if you include four 8-bit symbols in the code word then a maximum of 2 symbols can be corrected. Since the error burst not likely to be byte aligned, the code can only correct a 9-bit contiguous error burst in the worst case.

The performance can be improved by increasing the number of parity symbols. Also, if the message can be interleaved to spread the error burst among multiple code words then the burst error correcting capability is increased.

There are several repositories of open source software for RS codes available.

Have fun,

Mark Napier

[ - ]

Reply by ombz●July 22, 2016

Hi Mark,

Thanks for your answer. Well, not sure if really the burst errors were the problem. But if you look at the 24-bit message as 3 distinct 8-bit symbols then maybe.

Thanks for the hint on Reed-Solomon codes. I am not sure though if I'd be able to understand them based on code only, hence my question if there was something like an easy understandable design guide on the topic.

Nice weekend ^

Andy

[ - ]

Reply by napierm●July 22, 2016

Hey Andy,

Yes, that is exactly right. Break your messages up into smaller symbols. There are 6,7,8 or other width code symbol sizes. There is a trade off of code size vs. block length. The smaller codes have a better burst error efficiency but a smaller total length. That means the data/(data+parity) rate is lower for a similar correction capability.

FWIW, the encoder is simple enough it can be implemented with very modest processing.

I don't know of a simple design guide. Wikipedia is not a bad intro and has some links. Maybe I should write one...

Cheers,

Mark

[ - ]

Reply by drmike●July 22, 2016

The one you missed is Golay code. It is a "perfect" code that uses 24 bits. It is ideal for your application. Start with Wikipedia and expand from there:

https://en.wikipedia.org/wiki/Binary_Golay_code

https://www.usna.edu/Users/math/wdj/_files/documen...

The math is easy for this particular code.

[ - ]

Reply by ombz●July 22, 2016

Hi Mike

Thanks for the valuable hints :) For sure this one would've been ideal.

Cheers

Andy

[ - ]

Reply by Tim Wescott●July 22, 2016

+1 to everything that's been said, and, if you're only ever transmitting from that itty bitty PIC, remember that for most codes, the transmitter side is much easier than the receiver side.

Are you locked into blocks of 24 bits? Remember that for any given coding rate, performance goes up with block size, and the increase is pretty dramatic going from 8-bit blocks to 24. If you're limited to 24 bits and you can squeeze the code into that PIC, using all 24 bits for coding would definitely be the way to go.

[ - ]

Reply by ombz●July 22, 2016

Hi Tim

Yup, for that project the transmitter side was simply that PIC. Might be an ASIC in a future/current project; but I am not working in that company anymore and hence I don't really know what happened in the mean time. :p The receiver side was a 16-bit DSP so much more power at disposal :)

I might do a home project with these kind of things, that's why I posted my question actually.

Cheers

Andy

Error correction codes: good intuitive theory for FW developpers?

Sign in

Top Contributors

Subscribe!

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Older Forums

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group