comp.dsp | Question on Dual-Line RS decoder

Hello,

I have been reading the excellent paper titled "A HIGH-SPEED AND LOW-LATENCY REED-SOLOMON DECODER BASED ON A DUAL-LINE STRUCTURE" by Hyeong-Ju Kang and In-Cheol Park.

I have been able to understand the algorithm as it is described, and have also been able to reproduce it successfully in software in order to calculate the Error Locator Polynomial. However I have been struggling with the Error Evaluator Polynomial. In the paper the authors suggest that the same dual-line structure can be used to calculate the Error Evaluator Polynomial. I quote, from the paper:

"Since the calculation of the error-evaluator-polynomial is similar to that of error-locator-polynomial, this structure can be reused to produce the error-evaluator-polynomial. When the dual-line structure completes 2t iterations, C(2t)(k) registers are re-initialized properly and processed again. After (t +1) cycles, the error-evaluator-polynomial is in C registers."


I would be most grateful if someone could possibly elaborate a little more on this. Can you please explain how exactly the C (and D) registers are re-initialized in order to calculate the error evaluator polynomial?

Also, I am very interested in reducing the latency as much as possible, so if there is some way to modify the dual-line algorithm and reduce the (3t+1) cycle latency required to produce both polynomials, I would be very interested in that as well.

Thank you very much in advance.

Best regards,
Dimitrios Mavroeidis

Reply by dvsarwate ●December 20, 20132013-12-20

On Friday, December 20, 2013 11:04:54 AM UTC-6, Dimitris Mavroidis wrote:
> Hello,
> 
> 
> 
> I have been reading the excellent paper titled "A HIGH-SPEED AND LOW-LATENCY REED-SOLOMON DECODER BASED ON A DUAL-LINE STRUCTURE" by Hyeong-Ju Kang and In-Cheol Park.
> 

I have a different opinion about the Kang-Park approach,
which in my opinion misses the boat entirely.

For decoders, dual-line if you want to call them that,
which find both the error-evaluator and the error-locator
polynomial in just 2t clock cycles (instead of the Kang-Park
approach of first finding the error-locator in 2t cycles,
and then finding the error-evaluator using an additional 
t+1 cycles for a total latency of 3t+1 cycles), read my
paper

D. V. Sarwate and N, R. Shanbhag, "High-Speed Architectures
for Reed-Solomon Decoders, IEEE Transactions on VLSI Systems,
vol. 9, no. 5, Oct 2001.

You can find it on IEEExplore or look at this link:

http://www.ifp.illinois.edu/~sarwate/

--Dilip Sarwate

Reply by Dimitris Mavroidis ●December 23, 20132013-12-23

Dr. Sarwate,

Thank you very much for your suggestion. I downloaded and read your paper, and I am indeed inclined to use that approach since as you say it can calculate both polynomials in 2t clock cycles. This is excellent.

I do have one question for you though. The error evaluator polynomial (EEP) that you calculate is not exactly the same as the one calculated through the conventional algorithms. In order to use that EEP, one needs to calculate Xi to the power of (m0 + 2t), as shown in equation (12). Of course if one is free to choose m0, and chooses m0 = -2t, this is eliminated, but in my case I can't change m0 - it has to be equal to 0 or my circuit will be incompatible with other circuits that will use m0=0 and thus a different generator polynomial.

I am afraid that calculating Xi**(m0 - 2t) in a single clock cycle through combinational logic might be at the critical path of my design, which will force me to decrease the clock frequency accordingly. This as you understand can take a big bite out of the savings of calculating the EEP at the sane time as the ELP. 

Am I missing something here? Any suggestions would be very welcome.

Regards,
Dimitri Mavroidis

Reply by dvsarwate ●December 23, 20132013-12-23

On Monday, December 23, 2013 11:24:28 AM UTC-6, Dimitris Mavroidis wrote:

> I am afraid that calculating Xi**(m0 - 2t) in a single clock cycle through combinational logic might be at the critical path of my design, which will force me to decrease the clock frequency accordingly. This as you understand can take a big bite out of the savings of calculating the EEP at the sane time as the ELP. 
>

Lots of people, when asked about polynomial evaluation,
promptly remember Horner's rule as the way to go.  However,
in the calculations of error locations and error values,
one should NOT use Horner's rule because we need to evaluate
a polynomial not at one point but at MANY points, and in a
fixed and known order. Specifically, we need to evaluate
f(x) at 1, a, a^2, a^3, ....  IN THAT ORDER.  So, 

put the coefficients f0, f1, ... in separate registers.

Their sum is f(1).

In one clock cycle, multiply the contents of the registers 
by 1, a, a^2, a^3, ....  respectively. 

These are scalar multiplications: the quantities a, a^2, a^3,
etc are all known to the designer, and can be pre-computed 
and their values incorporated into the hardware. Multiplication
of the contents of a register by a known constant is a one-cycle
operation; all you have to do is XOR together various combinations
of the register contents and put the result back into the
register.

Thus, after one clock cycle, the registers contain f0, af1,
a^2f2, a^3f3, .... and their sum is f(a).

Lather, rinse, repeat...  The register contents in successive
clock cycles are

f0     f1      f2      f3     ......   sum = f(1)

f0    af1   a^2f2   a^3f3     ......   sum = f(a)

f0   a^2f1  a^4f2   a^6f3     ......   sum = f(a^2)

f0   a^3f1  a^6f2   a^9f3     .......  sum = f(a^3)

and so on and so forth.

--Dilip Sarwate

Reply by Dimitris Mavroidis ●December 24, 20132013-12-24

Dr Sarwate,

Thank you very much for your reply. I believe however that in this post you are referring to the Chien search. I am aware of the method that you describe, and this is what I am planning to use in my design.

However my question had to do with the Forney calculation, as described in your paper in equation (12), rather than the Chien search. The Forney calculation in your paper, because of the "high-order" EEP that is calculated through the riBM and RiBM algorithms, requires the calculation of a power - Xi^(m0 - 2t). That power may become the critical path. I was wondering if you had any suggestions on how to perform the Forney calculation on each root position efficiently.

Thank you,
Dimitri Mavroeidis

Reply by dvsarwate ●December 24, 20132013-12-24

On Tuesday, December 24, 2013 2:53:56 AM UTC-6, Dimitris Mavroidis wrote:
> Dr Sarwate,
> 
> 
> 
> Thank you very much for your reply. I believe however that in this post you are referring to the Chien search. I am aware of the method that you describe, and this is what I am planning to use in my design.
> 
> 
> 
> However my question had to do with the Forney calculation, as described in your paper in equation (12), rather than the Chien search. The Forney calculation in your paper, because of the "high-order" EEP that is calculated through the riBM and RiBM algorithms, requires the calculation of a power - Xi^(m0 - 2t). That power may become the critical path. I was wondering if you had any suggestions on how to perform the Forney calculation on each root position efficiently.
> 

It is the very same thing as I described above.  

You are evaluating a polynomial z^{m_0+2t}\Omega^{(h)}(z)
in which the low order coefficients are 0 (the lowest
degree term is m_0+2t) and these terms don't need to be 
saved in registers, multiplied by constants in one clock
cycle etc. So you are doing the same thing as you are
doing in the Chien search. Notice also that the denominator
in (12) is actually the sum of alternate terms of what you
will be computing in the Chien search, that is, both the
error evaluation and the Chien search should be done
synchronously. The Chien search produces two answers, one
telling you if this is an error location, and the other
(the sum of alternate terms) telling you what the denominator
is in (12). Thus, we always compute a putative error value,
discarding it if this is not an error location, and using it
if this is indeed an error location.

--Dilip Sarwate

Reply by Dimitris Mavroidis ●December 24, 20132013-12-24

Dr Sarwate,

Thank you very much for your reply. This makes a lot of sense, and it is indeed very elegant. I fully understand what you say.

I think that the only remaining item which will take a few extra cycles is the division itself in (12). We need to calculate the reciprocal of the denominator and then multiply with the numerator. As far as I know, the only way to calculate the reciprocal would be a lookup table in combinational logic, which can be implemented to operate in multiple pipelined cycles in order to reduce the critical path.

I would certainly value your input if you had some proposal about how to perform the division in a more efficient way than that.

Thank you very much for all your help so far.

Best regards,
Dimitri Mavroeidis

Reply by dvsarwate ●December 24, 20132013-12-24

On Tuesday, December 24, 2013 2:25:44 PM UTC-6, Dimitris Mavroidis wrote:
> Dr Sarwate,
> 
> 
> 
> Thank you very much for your reply. This makes a lot of sense, and it is indeed very elegant. I fully understand what you say.
> 
> 
> 
> I think that the only remaining item which will take a few extra cycles is the division itself in (12). We need to calculate the reciprocal of the denominator and then multiply with the numerator. As far as I know, the only way to calculate the reciprocal would be a lookup table in combinational logic, which can be implemented to operate in multiple pipelined cycles in order to reduce the critical path.
> 
> 
> 
> I would certainly value your input if you had some proposal about how to perform the division in a more efficient way than that.
> 
> 
> 
> Thank you very much for all your help so far.
> 
> 
> 
> Best regards,
> 
> Dimitri Mavroeidis

I have published several papers on inversion and division
in finite fields with my student (now ex-student) Zhiyuan 
Yan of the ECE Dept of Lehigh University.  Look at his web page
which has most of these available in pdf format.

--Dilip Sarwate

Reply by Dimitris Mavroidis ●December 25, 20132013-12-25

Thank you Dr Sarwate. I found the list of papers in your student's web page, but could not find any links to download them in pdf form. However, after searching online, I was able to find most of the relevant papers in pdf form.

I will study those papers and let you know if I have more questions.

Thank you so much for your kind help.

Best regards,
Dimitri Mavroeidis

Reply by Eric Jacobsen ●December 25, 20132013-12-25

On Wed, 25 Dec 2013 07:51:38 -0800 (PST), Dimitris Mavroidis
<dmavroid@gmail.com> wrote:

>
>Thank you Dr Sarwate. I found the list of papers in your student's web page, but could not find any links to download them in pdf form. However, after searching online, I was able to find most of the relevant papers in pdf form.
>
>I will study those papers and let you know if I have more questions.
>
>Thank you so much for your kind help.
>
>Best regards,
>Dimitri Mavroeidis

This sounds like an interesting project.   Is your implementation in
hardware or software?


Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com

Previous12 Next

Question on Dual-Line RS decoder

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group