Forums

Reed-Solomon FEC necessary to have the same characteristic for symbols and words?

Started by lrq3000 June 29, 2015
Hi there,

I have implemented an almost "universal" (read: compatible with most other
decoders output) Reed-Solomon codec. I feel like I have a good intuitive
grasp on the whole codec, except for one thing: I just can't understand
why we are tied to the same characteristic at both the symbol level and at
the word level.

Let me explain: if for example you choose to work on GF(2^8), this means
that you will encode each symbol of a message using 256 possible values,
which is perfect for binary data. However, in all the implementations I
have seen, it seems this also ties the maximum size of the codeword being
256 symbols/polynomial terms.

Intuitively, it seems to me that we could use a different characteristic
for both levels, eg, GF(2^8) for the "symbol level" by effecting only on
the primitive polynomial, and GF(2^32) for the "word level" effecting only
on the generator polynomial.

Am I missing some causal relationship here between those two levels? If
not, then is it really possible to use a different GF for each level, and
do you know of any paper/implementation that did this?
---------------------------------------
Posted through http://www.DSPRelated.com
lrq3000 <106516@DSPRelated> wrote:

>I have implemented an almost "universal" (read: compatible with most other >decoders output) Reed-Solomon codec. I feel like I have a good intuitive >grasp on the whole codec, except for one thing: I just can't understand >why we are tied to the same characteristic at both the symbol level and at >the word level.
>Let me explain: if for example you choose to work on GF(2^8), this means >that you will encode each symbol of a message using 256 possible values, >which is perfect for binary data. However, in all the implementations I >have seen, it seems this also ties the maximum size of the codeword being >256 symbols/polynomial terms.
>Intuitively, it seems to me that we could use a different characteristic >for both levels, eg, GF(2^8) for the "symbol level" by effecting only on >the primitive polynomial, and GF(2^32) for the "word level" effecting only >on the generator polynomial.
>Am I missing some causal relationship here between those two levels?
Nope, you're not missing anything, the category of codes you are talking about have been called (with minor variations) Algebraic Geometry codes, Goppa codes, and/or nonbinary BCH codes. Steve
>Let me explain: if for example you choose to work on GF(2^8), this means >that you will encode each symbol of a message using 256 possible values, >which is perfect for binary data. However, in all the implementations I >have seen, it seems this also ties the maximum size of the codeword
being
>256 symbols/polynomial terms. > >Intuitively, it seems to me that we could use a different characteristic >for both levels, eg, GF(2^8) for the "symbol level" by effecting only on >the primitive polynomial, and GF(2^32) for the "word level" effecting
only
>on the generator polynomial. >
Hi, If you work with GF(2^8), then you have only 256 possible values in your field, and RS code is based on the evaluation of a degree k (k being the number of useful symbols) on n different symbols (n being the total number of symbols transmitted). So n <= 2^8, otherwise you send redundant informations (same evaluations sent more than once). Actually, it seems that often n is choosen to be 2^p-1 (255 in your case), instead of 2^p, but I don't kwow why. Julien --------------------------------------- Posted through http://www.DSPRelated.com
tsd82 <105802@DSPRelated> wrote:

>If you work with GF(2^8), then you have only 256 possible values in your >field, and RS code is based on the evaluation of a degree k (k being the >number of useful symbols) on n different symbols (n being the total number >of symbols transmitted). So n <= 2^8, otherwise you send redundant >informations (same evaluations sent more than once). > >Actually, it seems that often n is choosen to be 2^p-1 (255 in your >case), instead of 2^p, but I don't kwow why. > >Julien
Short explanation: the field is order 256, meaning there are 255 non-zero elements in the field. It is simplest to associate the loations in the codeword with the non-zero field elements; but if the zero element is included then it is called an extended RS code. A further construction allows including an extra location (at infinity) which results in a doubly-extended RS code, length 257 is this case. Steve
On Monday, June 29, 2015 at 8:11:48 PM UTC-5, lrq3000 wrote:
> Hi there, > > I have implemented an almost "universal" (read: compatible with most other > decoders output) Reed-Solomon codec. I feel like I have a good intuitive > grasp on the whole codec, except for one thing: I just can't understand > why we are tied to the same characteristic at both the symbol level and at > the word level. > > Let me explain: if for example you choose to work on GF(2^8), this means > that you will encode each symbol of a message using 256 possible values, > which is perfect for binary data. However, in all the implementations I > have seen, it seems this also ties the maximum size of the codeword being > 256 symbols/polynomial terms. > > Intuitively, it seems to me that we could use a different characteristic > for both levels, eg, GF(2^8) for the "symbol level" by effecting only on > the primitive polynomial, and GF(2^32) for the "word level" effecting only > on the generator polynomial. > > Am I missing some causal relationship here between those two levels? If > not, then is it really possible to use a different GF for each level, and > do you know of any paper/implementation that did this? > --------------------------------------- > Posted through http://www.DSPRelated.com
The _characteristic_ of both GF(2^8) and GF(2^32) is 2; what you mean to ask is whether the fields need to be of the same size (or cardinality if you like fancier words). If you want to use a generator polynomial whose coefficients are in GF(2^32), then the codeword symbols will also be in GF(2^32) which can be represented as 4-vectors of symbols (bytes) from GF(2^8). This is true even if you insist that the information symbols that you plan on using are restricted to GF(2^8): when you do the encoding (with a systematic code), the data symbols will be single bytes but the parity-check symbols will be from GF(2^32) and will need 4 bytes to represent. To get a code that is purely in GF(2^8) will require that you allow as data only those k-byte patterns that will give you (4-byte) parity-check symbols of the form (0,0,0,x) so that the leading zero-valued bytes need not be transmitted. This is called a **subfield subcode** since we are restricting _all_ the symbols (32 bits or 4 bytes) to be in the _subfield_ GF(2^8). They are not nearly as good as Reed-Solomon codes designed from the start to have byte-size symbols and to operate over GF(2^8) as both the symbol field as well as the coefficient field. The TL;DR summary is that Yes, it is possible to use fields of different size but not worth the bother. Dilip Sarwate
Thank you all very much, all the answers were very helpful.

I still tried to use longer codewords than what is normally permitted by
the field's size (eg, codeword of length 510 in field 2^8), and I adapted
the different algorithms to support that, but there's a theoretical wall
that I can't beat: values above 2^8 will just be modulo reduced, so that
any error will have two locations: one in range 0-255 and one in range
256-510. So now I better understand the intuition behind the fact that the
field's size must be the same for both at the codeword length level and at
the symbols values level.

Also thank you Dilip for your explanation, this is an alternative I have
thought of and that I would probably have spent time trying without your
clear explanation that it's not worthwile at all.

Steve Pope, I am interested in learning more about singly and doubly
extended RS code, what is a good resource you would advise me (if possible
with algorithms, not only mathematical equations as I intend to implement
them), and what are the advantages of extended RS codes compared to
standard RS?
---------------------------------------
Posted through http://www.DSPRelated.com
>Thank you all very much, all the answers were very helpful. > >I still tried to use longer codewords than what is normally permitted by >the field's size (eg, codeword of length 510 in field 2^8), and I
adapted
>the different algorithms to support that, but there's a theoretical wall >that I can't beat: values above 2^8 will just be modulo reduced, so that >any error will have two locations: one in range 0-255 and one in range >256-510. So now I better understand the intuition behind the fact that
the
>field's size must be the same for both at the codeword length level and
at
>the symbols values level. > >Also thank you Dilip for your explanation, this is an alternative I have >thought of and that I would probably have spent time trying without your >clear explanation that it's not worthwile at all. > >Steve Pope, I am interested in learning more about singly and doubly >extended RS code, what is a good resource you would advise me (if
possible
>with algorithms, not only mathematical equations as I intend to
implement
>them), and what are the advantages of extended RS codes compared to >standard RS? >--------------------------------------- >Posted through http://www.DSPRelated.com
PS: to extend my explanation about why the field's size needs to be the same, it's because the values of the generated ECC symbols and of any symbol in fact are in fine used to generate the positions of the errors: if a symbol's value is limited to 2^8, you cannot generate a position above, and thus a codeword cannot be of a longer size. So in reality, there is no field's size for the "codeword level" as I imagined: there is only the field's size at the symbol's level, the limit on codeword length is just a natural consequence. Anyway I wonder if some genius mathematician could devise an error correcting code that would not have any limit on the codeword length. I've heard about Cauchy-Reed-Solomon, but in fact it's a trick because it just moves the limit by aggregating several symbols into blocks, and the codeword limit is then on the number of blocks instead of the number of symbols. Have someone heard of any ecc without a limit on the codeword length? --------------------------------------- Posted through http://www.DSPRelated.com
On Thursday, July 16, 2015 at 6:53:51 AM UTC-5, lrq3000 asked:

> Anyway I wonder if some genius mathematician could devise an error > correcting code that would not have any limit on the codeword length.
About 45 years ago, "genius mathematicians" named Bose and Chaudhuri, and independently Hocquenghem, devised a method for constructing t-error-correcting codes that can be used to make binary codes (and more generally, codes over a specified finite field such as GF(256)) that are as long as one wants. Note that t is fixed but the block length n can be made as large as one wants. What exactly do you need? Dilip Sarwate
On Thu, 16 Jul 2015 06:53:48 -0500, "lrq3000" <106516@DSPRelated>
wrote:

>>Thank you all very much, all the answers were very helpful. >> >>I still tried to use longer codewords than what is normally permitted by >>the field's size (eg, codeword of length 510 in field 2^8), and I >adapted >>the different algorithms to support that, but there's a theoretical wall >>that I can't beat: values above 2^8 will just be modulo reduced, so that >>any error will have two locations: one in range 0-255 and one in range >>256-510. So now I better understand the intuition behind the fact that >the >>field's size must be the same for both at the codeword length level and >at >>the symbols values level. >> >>Also thank you Dilip for your explanation, this is an alternative I have >>thought of and that I would probably have spent time trying without your >>clear explanation that it's not worthwile at all. >> >>Steve Pope, I am interested in learning more about singly and doubly >>extended RS code, what is a good resource you would advise me (if >possible >>with algorithms, not only mathematical equations as I intend to >implement >>them), and what are the advantages of extended RS codes compared to >>standard RS? >>--------------------------------------- >>Posted through http://www.DSPRelated.com > >PS: to extend my explanation about why the field's size needs to be the >same, it's because the values of the generated ECC symbols and of any >symbol in fact are in fine used to generate the positions of the errors: >if a symbol's value is limited to 2^8, you cannot generate a position >above, and thus a codeword cannot be of a longer size. So in reality, >there is no field's size for the "codeword level" as I imagined: there is >only the field's size at the symbol's level, the limit on codeword length >is just a natural consequence. > >Anyway I wonder if some genius mathematician could devise an error >correcting code that would not have any limit on the codeword length. I've >heard about Cauchy-Reed-Solomon, but in fact it's a trick because it just >moves the limit by aggregating several symbols into blocks, and the >codeword limit is then on the number of blocks instead of the number of >symbols. > >Have someone heard of any ecc without a limit on the codeword length? >--------------------------------------- >Posted through http://www.DSPRelated.com
Use a FEC like a convolutional code? You can stream it as long as you like. Eric Jacobsen Anchor Hill Communications http://www.anchorhill.com
Eric Jacobsen <eric.jacobsen@ieee.org> wrote:

>On Thu, 16 Jul 2015 06:53:48 -0500, "lrq3000" <106516@DSPRelated>
>>Have someone heard of any ecc without a limit on the codeword length?
>Use a FEC like a convolutional code? You can stream it as long as you >like.
Yes, but the simplest such approaches have multiple coderwords in the stream. Which is fine. One approach I like is Berlekamp's helical interleaver, in which the codeword length N and the interleaving depth D are selected to be relatively prime. Steve