Forums

Question on high-speed decoding for Binary BCH codes

Started by mauroid 5 years ago14 replieslatest reply 5 years ago212 views
Hello,

I have been reading and following the excellent paper "High-Speed Architectures for Reed–Solomon
Decoders" by Dilip V. Sarwate, and Naresh R. Shanbhag. This paper presents among else an optimized, in terms of critical path and latency, process to evaluate the coefficients of the Error Locator Polynomial for any Reed-Solomon code - named the RiBM algorithm.

The paper builds upon the "Inversionless Decoding of Binary BCH Codes" by Herbert O. Burton, which also presents an algorithm (called iBM) that avoids the division in the Berlekamp-Massey iterative algorithm which produces the error locator polynomial for binary BCH codes.

I need to decode a binary BCH code, using something similar to the RiBM algorithm described in the first paper mentioned above . The problem with using the RiBM algorithm as described in the paper above as is is that I would need to spend 2*T cycles to calculate the ELP, wheras for binary BCH codes the expected latency (or number of algorithm "steps") should be reducible to just T.

The algorithm in the 2nd paper mentioned above ("Inversionless Decoding of Binary BCH Codes") does finish in just T steps, however it has a much longer critical path than the RiBM algorithm.

I understand that the RiBM is targeted to Reed-Solomon codes. My question is, is there a modified version of this algorithm, or any other algorithm presented elsewhere, that can be used to calculate the ELP of a binary BCH code in T cycles, with a minimal critical path in each cycle? The RiBM algorithm has a critical path of just one multiplication and one addition per cycle.

Thank you very much in advance.

Best regards,
Dimitrios Mavroeidis
[ - ]
Reply by Retired_ProfSeptember 19, 2017

Edited after response by the OP

The riBM algorithm is a riff on the Berlekamp-Massey (BM) algorithm for decoding BCH codes. Although considered to be a single algorithm, the algorithms described in 1968 by Berlekamp and separately and contemporaneously by Massey are slightly different. In Berlekamp's version of the BM algorithm, each of the \(2t\) iterations has two steps that are performed in succession. With initial values 

$$ \lambda^{(0)}(z) = B^{(0)}(z) = 1, \omega^{(0)}(z) = H^{(0)}(z) = 0, ~\text{and}~ S(z) = S_0 + S_1z + \cdots + S_{2t-1}z^{2t-1},$$ the \(r\)-th iteration (\(r = 0, 1, \ldots, 2t-1\)) does the following:

Step \(\mathbf r\).1 Calculate the \(r\)-th discrepancy \(\delta_r\) from the \( r\)-th error-locator polynomial \(\lambda^{(r)}(z)\) and the syndrome polynomial \(S(z)\).

Step \(\mathbf r\).2 Using the discrepancy \(\delta_r\), update the error-locator polynomial \(\lambda^{(r)}(z)\) to \(\lambda^{(r+1)}(z)\) and the error-locator polynomial \(\omega^{(r)}(z)\) to \(\omega^{(r+1)}(z)\). Also, in parallel, update the auxiliary or "scratch" polynomials \(B^{r)}(z)\) and \(H^{(r)}(z)\) to \(B^{r+1)}(z)\) and \(H^{(r+1)}(z)\).

In Massey's version, the updating of the error-evaluator polynomial is eliminated in favor of a single calculation that finds \(\omega^{(2t)}(z)\) via

$$ \omega^{(2t)}(z) \equiv \lambda^{(2t)}(z)S(z) \bmod z^{2t} \tag{1}$$

which requires an additional \(t\) clock cycles. Note that (1) is telling us that to get \(\omega^{(2t)}(z)\) one should to multiply the polynomials \(\lambda^{(2t)}(z)\) and \(S(z)\) and throw away all terms in the product that have degree \(2t\) or more, what's left is \(\omega^{(2t)}(z)\). Both versions of the BM algorithm use \(2t\) (static) storage locations to store coefficients of the syndrome polynomial as well as shift registers of length \(t+1\) to store the coefficients of \(\lambda^{(r)}(z)\) and \(B^{(r)}(z)\); Berlekamp's version needs registers of length \(t\) to store the coefficients of \(\omega^{(r)}(z)\) and \(H^{(r)}(z)\) while Massey's version does not need \(H^{(r)}(z)\) but does need extra hardware to compute \(\omega^{2t}(z)\) when the iterative part of the calculation is over.  Most modern discussions of the BM algorithm use the Massey version, but the Berlekamp version has much to recommend it, and indeed the riBM and the RiBM algorithm are based on the Berlekamp version.

Because Steps \(\mathbf r\).1 and \(\mathbf r\).2  are carried out in succession, the critical path includes a multiplier in Step \(\mathbf r\).1 and a division circuit (or inverter followed by a multiplier circuit) in Step \(\mathbf r\).2 and thus is quite long. The updating of the error-locator (and the error-evaluator polynomial) in Step \(\mathbf r\).2 can be simplified to use two multipliers operating in parallel for each update which shortens the total critical path to two multiplications (plus two additions) via ideas similar to those in Burton's paper that you cite. There is also, in parallel, a separate updating of the auxiliary polynomials which also uses two multiplications and two additions and thus has the same critical path length.  This is called the inversionless  Berlekamp-Massey (iBM) algorithm in the Sarwate-Shanbhag paper that you are reading.

The riBM riff is different from the inversionless riff: it uses the notion that the calculations in Steps \(\mathbf r\).2 and \(\mathbf{(r+1)}\).1 can be carried out simultaneously (in parallel) so that the discrepancy \(\delta_{r+1}\) needed during the next iteration is ready as soon as the current updating in Step \(\mathbf r\).2 is completed. Now, the discrepancy \(\delta_{r+1}\) that would have been computed in Step \(\mathbf{(r+1)}\).1 is just the coefficient of \(z^{r+1}\) in the polynomial product \(\lambda^{(r+1)}S(z)\) and so the riBM riff is we compute the entire product \(\delta^{(r+1)}(z) = \lambda^{(r+1)}S(z)\) instead of just one term -- the coefficient of \(z^{r+1}\) in the polynomial \(\delta^{(r+1)}(z)\) -- in a modified Step \(\mathbf r\).2. We do this by setting \(\delta^{(0)}(z) = H^{(0)}(z) = S(z)\) and updating \(\delta^{(r)}(z)\) to \(\delta^{(r+1)}(z)\) in exactly the same way as we update \(\lambda^{(r)}(z)\) to \(\lambda^{(r+1)}(z)\)! Now, Step 1.1 needs no calculation whatsoever since \(\delta_0\) is the constant term in \(\lambda^{(0)}(z)S(z) = S(z)\), and so the riBM algorithm proceeds to execute the modified Step 0.2, then modified Step 1.2, and then modified Step 2.2, and so on till the final modified Step \((2t-1)\).2 is executed. Each modified Step \(r\).2 calculates the updated polynomials \(\lambda^{(r+1)}(z)\), \(\delta^{(r+1)}(z)\), \(B^{(r+1)}(z)\), and \(H^{(r+1)}(z)\). The final Step \((2t-1)\).2 produces the desired \(\lambda^{(2t)}(z)\) and a modified error-evaluator polynomial in \(\delta^{(2t)}(z)\) (see the Sarwate-Shanbhag paper for details). Thus, the Steps \(\mathbf r\).1 are completely eliminated (as is the static storage for the syndrome coefficients \(S(z)\)) and the critical path of the riBM algorithm is half that of the iBM algorithm.

------

For binary BCH codes, there is no need to calculate the error-evaluator polynomial at all, and indeed as Berlekamp described in his 1968 book Algebraic Coding Theory (still in print, I believe), the Berlekamp algorithm can be modified to run in just \(t\) iterations instead of \(2t\) iterations. Berlekamp states the steps as 

\begin{align}\lambda^{(2r+2)}(z) &= \lambda^{(2r)}(z) + \delta_{2r}\cdot z \cdot B^{(2r)}(z)\\B^{(2r+2)}(z) &= \begin{cases} z^2 B^{(2r)}(z), & \delta^{(2r)} = 0,\\ \delta_{2r}^{-1}\cdot z \cdot \lambda^{(2r)}(z), & \delta_{2r}\neq 0, \end{cases}\end{align}

which in the inversionless form should work out to be something like

\begin{align}\lambda^{(2r+2)}(z) &= \gamma_{2r}\cdot \lambda^{(2r)}(z) + \delta_{2r}\cdot z \cdot B^{(2r)}(z)\\\left[B^{(2r+2)}(z), \gamma_{2r+2}\right] &= \begin{cases} \left[z^2 B^{(2r)}(z), \gamma_{2r}\right], & \delta^{(2r)} = 0,\\ \left[ z \cdot \lambda^{(2r)}(z), \delta_{2r}\right], & \delta_{2r}\neq 0, \end{cases}\end{align}

I believe that the riBM riff is directly applicable here but the details are a just a tad more complicated. I am not so sure about the RiBM algorithm; it appears that there is an issue there as you have pointed out, but I regret that I don't have any more time to devote to this.

Hope this helps.

[ - ]
Reply by mauroidSeptember 20, 2017

Thank you very much for your reply. This is very helpful, as it clears up the way RiBM works, and the idea behind it.

I am looking exactly for the algorithm that you describe at a high level in your last paragraph for binary BCH codes, which runs in t iterations, and has a minimal critical path - similar but slightly longer to the RiBM critical path, because of wiring delays. The algorithm would also use shorter registers (2t symbols per register instead of 3t?), since the error-evaluator polynomial is not needed. This is less critical though - what matters most is the critical path and the latency of the algorithm (2t cycles).

I was hoping that this algorithm would be ready in some paper or elsewhere, and I would not need to come up with it in detail myself based on the RiBM ideas. If this is not the case, I will try to figure it out myself - however if you happen to have it ready, that would certainly save me a lot of time.

I was able quite easily to modify the RiBM to use only steps for binary BCH, based on the fact that δ0(r) is zero for even r, but my modification uses a critical path of two multiplications and one addition, which is not acceptable. The modified algorithm simply multiplies δi(r+1) with γ(r+1) (which is known) at Step RiBM.1 for odd values of r, and skips even values of r. I tried it out in s/w and it works. I need to do this with a smaller critical path though.

At least now I know that such an algorithm exists! I would appreciate any help in figuring it out.

Thank you very much for your help.

[ - ]
Reply by Retired_ProfSeptember 21, 2017

Sorry, I didn't have the Sarwate-Shanbhag paper in from of me when I wrote my previous response (it was all from memory), and all my descriptions are for the riBM architecture (two separate cellular arrays of lengths \(2t\) and \(t+1\) respectively with different processors (PE1 and PE0) in the arrays) and not the RiBM architecture which has a cellular array of length \(3t+1\) of PE1 processors.  I will correct my previous response when I have the time. But I don't see why you get the critical path to pass through two multipliers. Perhaps if you can write a high-level explanation or maybe a polynomial formula for how the updates are being done?  I don't think the arrays are any shorter, but the number of iterations certainly should decrease from \{2t\) to \(t\).

[ - ]
Reply by mauroidSeptember 21, 2017

Thank you for your reply.

Here are the updates (only) of the The RiBM Algorithm. The original version is the one shown in the Sarwate-Shanbhag paper.

for r = 0 step 2 until 2t - 1 do (notice the step 2)

....

Step RiBM.1 γ(r+1) = (δ0(r)!=0 and k(r) >= 0)? δ0(r) : γ(r))                            δi(r+1) = γ(r+1).(γ(r).δi+1(r) - δ0(r).θ(r)) (i=0..3t)                      δi(r+1) = δi+1(r) (i=0..3t)

....

(The steps above are carried out in sequence, not simultaneously. In H/W the first step would be a simple mux, and the last two steps can be easily combined)

Also in RiBM.2 k(r+1) will now be increased by 1 more in each step, so k(r+1) becomes either equal to -k(r) or equal to k(r) + 2 (instead of -k(r) - 1 and k(r) + 1 respectively).

It's a very simple change. If we were to use the RiBM as it is for binary BCH, for odd values of r, the algorithm will just multiply δ(ρ) with γ(r) and perform a shift left. This is true because for odd values of r, δ0(r)=0. If we do this multiplication and shifting at the prior RiBM.1 step (when was even), we can completely skip the odd iterations of r.

However if we do this, Step RiBM.1 now has two multiplications in its critical path.

[ - ]
Reply by jms_nhSeptember 18, 2017

How timely... I've just been working on an article covering the Berlekamp-Massey algorithm.


Have you read Massey's original paper? http://crypto.stanford.edu/~mironov/cs359/massey.p...

It's fairly readable; the binary version doesn't have any division so I'm not sure what you're talking about.


[ - ]
Reply by mauroidSeptember 18, 2017

Thank you for your reply.

I believe that this paper presents the original Berlekamp-Massey algorithm, also described (somewhat improved) by Blahut.

It certainly involves division. Look how the algorithm in steps (4) and (5) (page 124) multiplies with b to the -1 power - this is a division by b. This was the whole point of the "Inversionless Decoding of Binary BCH Codes" paper, which presents a modified BM algorithm that avoids this finite field division. However this modified algorithm still has a long critical path, so I am looking for an improved version, similar to the RiBM algorithm.


[ - ]
Reply by jms_nhSeptember 18, 2017

p.s. suggest you ask on https://dsp.stackexchange.com/ as Dilip Sarwate is fairly active there.

[ - ]
Reply by mauroidSeptember 18, 2017

Thanks! I will definitely do that!

[ - ]
Reply by stephanebSeptember 18, 2017

Let me see if maybe Professor Sarwate would consider joining DSPRelated, that would be great.

[ - ]
Reply by mauroidSeptember 18, 2017

Dr Sarwate has already kindly answered questions on DSPRelated - here's hoping this one will grab his attention as well! :)

[ - ]
Reply by stephanebSeptember 18, 2017

No kidding?  Indeed, I just found Dr. Sarwate in the users' database. I had already contacted him to ask if he would join, so he will soon be aware of this thread and will maybe chime in.  

[ - ]
Reply by jms_nhSeptember 18, 2017
Look how the algorithm in steps (4) and (5) (page 124) multiplies with b to the -1 power - this is a division by b

Oh, that's a multiplicative inverse. I thought you said you were working on a binary code. In GF(2), the value of b is always 1. Which finite field are you working in? If GF(2^n) what is the value of n? For small n this could just use a lookup table.

(obligatory plug: see https://www.embeddedrelated.com/showarticle/1086.p... )

[ - ]
Reply by mauroidSeptember 18, 2017

The code I am working on is the BCH(4096, 3892), T=12. The codewords are sequences of 0's and 1's (1 bit per symbol) - so each symbol is binary. All calculations in the BM algorithm are done in the extension field GF(2^12).

[ - ]
Reply by jms_nhSeptember 18, 2017
ah -- ok, This is beyond my experience; I've implemented Reed-Solomon once from scratch in Java, but not optimized for speed.