Reply by Dimitris Mavroidis December 26, 20132013-12-26
Hi Eric, my implementation is going to be in hardware. Eventually anyway - for now I am writing a software emulator of the algorithm.

-Dimitri
Reply by Eric Jacobsen December 25, 20132013-12-25
On Wed, 25 Dec 2013 07:51:38 -0800 (PST), Dimitris Mavroidis
<dmavroid@gmail.com> wrote:

> >Thank you Dr Sarwate. I found the list of papers in your student's web page, but could not find any links to download them in pdf form. However, after searching online, I was able to find most of the relevant papers in pdf form. > >I will study those papers and let you know if I have more questions. > >Thank you so much for your kind help. > >Best regards, >Dimitri Mavroeidis
This sounds like an interesting project. Is your implementation in hardware or software? Eric Jacobsen Anchor Hill Communications http://www.anchorhill.com
Reply by Dimitris Mavroidis December 25, 20132013-12-25
Thank you Dr Sarwate. I found the list of papers in your student's web page, but could not find any links to download them in pdf form. However, after searching online, I was able to find most of the relevant papers in pdf form.

I will study those papers and let you know if I have more questions.

Thank you so much for your kind help.

Best regards,
Dimitri Mavroeidis
Reply by dvsarwate December 24, 20132013-12-24
On Tuesday, December 24, 2013 2:25:44 PM UTC-6, Dimitris Mavroidis wrote:
> Dr Sarwate, > > > > Thank you very much for your reply. This makes a lot of sense, and it is indeed very elegant. I fully understand what you say. > > > > I think that the only remaining item which will take a few extra cycles is the division itself in (12). We need to calculate the reciprocal of the denominator and then multiply with the numerator. As far as I know, the only way to calculate the reciprocal would be a lookup table in combinational logic, which can be implemented to operate in multiple pipelined cycles in order to reduce the critical path. > > > > I would certainly value your input if you had some proposal about how to perform the division in a more efficient way than that. > > > > Thank you very much for all your help so far. > > > > Best regards, > > Dimitri Mavroeidis
I have published several papers on inversion and division in finite fields with my student (now ex-student) Zhiyuan Yan of the ECE Dept of Lehigh University. Look at his web page which has most of these available in pdf format. --Dilip Sarwate
Reply by Dimitris Mavroidis December 24, 20132013-12-24
Dr Sarwate,

Thank you very much for your reply. This makes a lot of sense, and it is indeed very elegant. I fully understand what you say.

I think that the only remaining item which will take a few extra cycles is the division itself in (12). We need to calculate the reciprocal of the denominator and then multiply with the numerator. As far as I know, the only way to calculate the reciprocal would be a lookup table in combinational logic, which can be implemented to operate in multiple pipelined cycles in order to reduce the critical path.

I would certainly value your input if you had some proposal about how to perform the division in a more efficient way than that.

Thank you very much for all your help so far.

Best regards,
Dimitri Mavroeidis
Reply by dvsarwate December 24, 20132013-12-24
On Tuesday, December 24, 2013 2:53:56 AM UTC-6, Dimitris Mavroidis wrote:
> Dr Sarwate, > > > > Thank you very much for your reply. I believe however that in this post you are referring to the Chien search. I am aware of the method that you describe, and this is what I am planning to use in my design. > > > > However my question had to do with the Forney calculation, as described in your paper in equation (12), rather than the Chien search. The Forney calculation in your paper, because of the "high-order" EEP that is calculated through the riBM and RiBM algorithms, requires the calculation of a power - Xi^(m0 - 2t). That power may become the critical path. I was wondering if you had any suggestions on how to perform the Forney calculation on each root position efficiently. >
It is the very same thing as I described above. You are evaluating a polynomial z^{m_0+2t}\Omega^{(h)}(z) in which the low order coefficients are 0 (the lowest degree term is m_0+2t) and these terms don't need to be saved in registers, multiplied by constants in one clock cycle etc. So you are doing the same thing as you are doing in the Chien search. Notice also that the denominator in (12) is actually the sum of alternate terms of what you will be computing in the Chien search, that is, both the error evaluation and the Chien search should be done synchronously. The Chien search produces two answers, one telling you if this is an error location, and the other (the sum of alternate terms) telling you what the denominator is in (12). Thus, we always compute a putative error value, discarding it if this is not an error location, and using it if this is indeed an error location. --Dilip Sarwate
Reply by Dimitris Mavroidis December 24, 20132013-12-24
Dr Sarwate,

Thank you very much for your reply. I believe however that in this post you are referring to the Chien search. I am aware of the method that you describe, and this is what I am planning to use in my design.

However my question had to do with the Forney calculation, as described in your paper in equation (12), rather than the Chien search. The Forney calculation in your paper, because of the "high-order" EEP that is calculated through the riBM and RiBM algorithms, requires the calculation of a power - Xi^(m0 - 2t). That power may become the critical path. I was wondering if you had any suggestions on how to perform the Forney calculation on each root position efficiently.

Thank you,
Dimitri Mavroeidis
Reply by dvsarwate December 23, 20132013-12-23
On Monday, December 23, 2013 11:24:28 AM UTC-6, Dimitris Mavroidis wrote:

 
> I am afraid that calculating Xi**(m0 - 2t) in a single clock cycle through combinational logic might be at the critical path of my design, which will force me to decrease the clock frequency accordingly. This as you understand can take a big bite out of the savings of calculating the EEP at the sane time as the ELP. >
Lots of people, when asked about polynomial evaluation, promptly remember Horner's rule as the way to go. However, in the calculations of error locations and error values, one should NOT use Horner's rule because we need to evaluate a polynomial not at one point but at MANY points, and in a fixed and known order. Specifically, we need to evaluate f(x) at 1, a, a^2, a^3, .... IN THAT ORDER. So, put the coefficients f0, f1, ... in separate registers. Their sum is f(1). In one clock cycle, multiply the contents of the registers by 1, a, a^2, a^3, .... respectively. These are scalar multiplications: the quantities a, a^2, a^3, etc are all known to the designer, and can be pre-computed and their values incorporated into the hardware. Multiplication of the contents of a register by a known constant is a one-cycle operation; all you have to do is XOR together various combinations of the register contents and put the result back into the register. Thus, after one clock cycle, the registers contain f0, af1, a^2f2, a^3f3, .... and their sum is f(a). Lather, rinse, repeat... The register contents in successive clock cycles are f0 f1 f2 f3 ...... sum = f(1) f0 af1 a^2f2 a^3f3 ...... sum = f(a) f0 a^2f1 a^4f2 a^6f3 ...... sum = f(a^2) f0 a^3f1 a^6f2 a^9f3 ....... sum = f(a^3) and so on and so forth. --Dilip Sarwate
Reply by Dimitris Mavroidis December 23, 20132013-12-23
Dr. Sarwate,

Thank you very much for your suggestion. I downloaded and read your paper, and I am indeed inclined to use that approach since as you say it can calculate both polynomials in 2t clock cycles. This is excellent.

I do have one question for you though. The error evaluator polynomial (EEP) that you calculate is not exactly the same as the one calculated through the conventional algorithms. In order to use that EEP, one needs to calculate Xi to the power of (m0 + 2t), as shown in equation (12). Of course if one is free to choose m0, and chooses m0 = -2t, this is eliminated, but in my case I can't change m0 - it has to be equal to 0 or my circuit will be incompatible with other circuits that will use m0=0 and thus a different generator polynomial.

I am afraid that calculating Xi**(m0 - 2t) in a single clock cycle through combinational logic might be at the critical path of my design, which will force me to decrease the clock frequency accordingly. This as you understand can take a big bite out of the savings of calculating the EEP at the sane time as the ELP. 

Am I missing something here? Any suggestions would be very welcome.

Regards,
Dimitri Mavroidis

Reply by dvsarwate December 20, 20132013-12-20
On Friday, December 20, 2013 11:04:54 AM UTC-6, Dimitris Mavroidis wrote:
> Hello, > > > > I have been reading the excellent paper titled "A HIGH-SPEED AND LOW-LATENCY REED-SOLOMON DECODER BASED ON A DUAL-LINE STRUCTURE" by Hyeong-Ju Kang and In-Cheol Park. >
I have a different opinion about the Kang-Park approach, which in my opinion misses the boat entirely. For decoders, dual-line if you want to call them that, which find both the error-evaluator and the error-locator polynomial in just 2t clock cycles (instead of the Kang-Park approach of first finding the error-locator in 2t cycles, and then finding the error-evaluator using an additional t+1 cycles for a total latency of 3t+1 cycles), read my paper D. V. Sarwate and N, R. Shanbhag, "High-Speed Architectures for Reed-Solomon Decoders, IEEE Transactions on VLSI Systems, vol. 9, no. 5, Oct 2001. You can find it on IEEExplore or look at this link: http://www.ifp.illinois.edu/~sarwate/ --Dilip Sarwate