Reply by July 18, 2017
On Saturday, February 11, 2017 at 4:44:28 PM UTC-8, rickman wrote:
> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it was touted as a way to avoid the use of a multiplier.
I first knew CORDIC from the implementations in scientific calculators. Calculators commonly do digit-serial BCD arithmetic, and I believe the algorithm is a little different in that case, but not much. If you want to do a 10-decimal-digit, or maybe 36-bit binary, calculation, multiplication is a lot of work. With 36-bit binary, you do a shift and conditional add 36 times. Each add is one loop through the serial adder, though you might do it four bits at a time. In any case, with a serial adder (binary or BCD), addition is O(N) and multiplication is O(N**2). Now, say you do atan() using a 10-term polynomial: that's about 10 multiplies. But the whole CORDIC to compute sine, cosine, tangent, or arctan takes only about as much work as one multiply. Actually, I believe there is an additional multiply in the last or first step of CORDIC, but it is still much faster than the polynomials used in software on most computer systems. This leaves out the argument reduction needed before the polynomial, and I suspect before CORDIC as well. But given a bit- or digit-serial adder, it works very well.
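For concreteness, here is a minimal C sketch of the polynomial side of that comparison: a 10-term atan() series evaluated with Horner's rule, one multiply-add per term. The plain Taylor coefficients are an illustrative assumption (a real routine would use minimax coefficients), and it presumes the argument has already been reduced to |x| <= 1.

#include <stdio.h>

static double atan_poly(double x)
{
    /* coefficients of x^(2k+1) in atan(x) = x - x^3/3 + x^5/5 - ... */
    static const double c[10] = {
         1.0,      -1.0/3.0,   1.0/5.0,  -1.0/7.0,   1.0/9.0,
        -1.0/11.0,  1.0/13.0, -1.0/15.0,  1.0/17.0, -1.0/19.0
    };
    double x2  = x * x;
    double acc = c[9];
    for (int k = 8; k >= 0; k--)
        acc = acc * x2 + c[k];   /* one multiply + one add per term */
    return acc * x;              /* restore the odd factor of x */
}

int main(void)
{
    printf("atan_poly(0.5) = %.6f\n", atan_poly(0.5)); /* atan(0.5) ~ 0.463648 */
    return 0;
}

Counting the Horner loop plus the two extra multiplies, that is roughly the "10 multiplies" the post compares against one pass of CORDIC.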
Reply by February 13, 2017
On Saturday, February 11, 2017 at 10:19:29 PM UTC-5, Tim Wescott wrote:

> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>
> --
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com
>
> I'm looking for work -- see my website!
I used Cordic 10 years ago in an FPGA for the Atan2 function, as using a LUT for 16-bit X and 16-bit Y inputs would have required a huge (or yuge, if you prefer) external memory. AFAIK, for Atan2, Cordic is still the best way to go. Maybe a hybrid LUT with linear interpolation would yield good results too.
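For reference, a rough C model of the vectoring-mode CORDIC that computes Atan2 -- a sketch for illustration, not the poster's FPGA design. Doubles and a computed arctan(2^-i) table keep it readable; a hardware version would use fixed-point shifts and a small ROM.

#include <math.h>
#include <stdio.h>

static double cordic_atan2(double y, double x, int iters)
{
    const double half_pi = 2.0 * atan(1.0);
    double z = 0.0;

    /* Pre-rotate by +/-90 degrees into the right half-plane, since the
       basic iterations only converge over roughly +/-99 degrees. */
    if (x < 0.0) {
        double t = x;
        if (y >= 0.0) { x =  y; y = -t; z =  half_pi; }
        else          { x = -y; y =  t; z = -half_pi; }
    }

    for (int i = 0; i < iters; i++) {
        double xi   = x;
        double step = atan(ldexp(1.0, -i));  /* the ROM entry arctan(2^-i) */
        if (y >= 0.0) {        /* rotate clockwise, driving y toward 0 */
            x += ldexp(y,  -i);
            y -= ldexp(xi, -i);
            z += step;
        } else {               /* rotate counter-clockwise */
            x -= ldexp(y,  -i);
            y += ldexp(xi, -i);
            z -= step;
        }
    }
    /* As a bonus, x now holds K*sqrt(x0^2 + y0^2), the scaled magnitude. */
    return z;
}

int main(void)
{
    printf("cordic %.6f   libm %.6f\n",
           cordic_atan2(1.0, 2.0, 24), atan2(1.0, 2.0));
    return 0;
}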
Reply by Tim Wescott February 13, 2017
On Mon, 13 Feb 2017 01:16:40 +0000, eric.jacobsen wrote:

> On Sun, 12 Feb 2017 15:58:02 -0500, rickman <gnuarm@gmail.com> wrote:
>
>> On 2/12/2017 11:26 AM, Tim Wescott wrote:
>>> On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:
>>>>
>>>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>>>
>>> If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.
>>
>> I'm not sure how they would have done this in 1956. Wasn't this for a B-58 navigation computer? What kind of digital electronics did they have then? Looking at a book on "Pulse and Digital Circuits" I see the only mention of transistors is in a special chapter at the end of the book. Wasn't the transistor brand new then, invented in '51? Even though they were available they were expensive and an entire FF would occupy a small PCB.
>>
>> In high school (circa '69) we had been given an old computer from the weather service. It had PCBs about 3-4 inches square which would hold a single FF, or a gate. Even an 8 bit CORDIC would be tough to implement in this technology unless the adds were bit serial. I guess that could work in a box a foot on each side. It can be hard to think in terms of bit serial. Everything has to be a shift register.
>>
>> Here is a reference by Volder about the invention of CORDIC with a picture of the navigation computer.
>>
>> http://late-dpedago.urv.cat/site_media/papers/fulltext_2.pdf
>>
>> It's a lot bigger than a foot cube and I think I see some tubes.
>>
>> As an aside, the B-58 was a supersonic bomber. I don't think we have any of those anymore. The paper says it was scrapped in favor of ballistic missiles, which makes perfect sense.
>
> B-1B are supersonic bombers that are still in service. B-58s have been retired for more than fifty years.
I wonder if they're still flown supersonic, or if the wings just stay out while they fly high & drop smart bombs.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
Reply by Tim Wescott February 12, 2017
On Sun, 12 Feb 2017 15:58:02 -0500, rickman wrote:

> On 2/12/2017 11:26 AM, Tim Wescott wrote:
>> On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:
>>>
>>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>>
>> If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.
>
> I'm not sure how they would have done this in 1956. Wasn't this for a B-58 navigation computer? What kind of digital electronics did they have then? Looking at a book on "Pulse and Digital Circuits" I see the only mention of transistors is in a special chapter at the end of the book. Wasn't the transistor brand new then, invented in '51? Even though they were available they were expensive and an entire FF would occupy a small PCB.
Point-contact transistors were working in the lab in December of 1947. Junction transistors came along in 1950.

Your article below says the computer was implemented with DTL, so -- transistors. I think the cylindrical things you see are probably relays, but it could have had toobs in there, too.

Given that the thing is _called_ the CORDIC, I think the algorithm is a central part of the computer.

<snip>
> Here is a reference by Volder about the invention of CORDIC with a picture of the navigation computer.
>
> http://late-dpedago.urv.cat/site_media/papers/fulltext_2.pdf
>
> It's a lot bigger than a foot cube and I think I see some tubes.
Thanks for digging that up!

<more snip>

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
Reply by February 12, 2017
On Sun, 12 Feb 2017 15:58:02 -0500, rickman <gnuarm@gmail.com> wrote:

> On 2/12/2017 11:26 AM, Tim Wescott wrote:
>> On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:
>>>
>>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>>
>> If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.
>
> I'm not sure how they would have done this in 1956. Wasn't this for a B-58 navigation computer? What kind of digital electronics did they have then? Looking at a book on "Pulse and Digital Circuits" I see the only mention of transistors is in a special chapter at the end of the book. Wasn't the transistor brand new then, invented in '51? Even though they were available they were expensive and an entire FF would occupy a small PCB.
>
> In high school (circa '69) we had been given an old computer from the weather service. It had PCBs about 3-4 inches square which would hold a single FF, or a gate. Even an 8 bit CORDIC would be tough to implement in this technology unless the adds were bit serial. I guess that could work in a box a foot on each side. It can be hard to think in terms of bit serial. Everything has to be a shift register.
>
> Here is a reference by Volder about the invention of CORDIC with a picture of the navigation computer.
>
> http://late-dpedago.urv.cat/site_media/papers/fulltext_2.pdf
>
> It's a lot bigger than a foot cube and I think I see some tubes.
>
> As an aside, the B-58 was a supersonic bomber. I don't think we have any of those anymore. The paper says it was scrapped in favor of ballistic missiles, which makes perfect sense.
B-1B are supersonic bombers that are still in service. B-58s have been retired for more than fifty years.
Reply by rickman February 12, 2017
On 2/12/2017 11:26 AM, Tim Wescott wrote:
> On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:
>>
>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>
> If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.
I'm not sure how they would have done this in 1956. Wasn't this for a B-58 navigation computer? What kind of digital electronics did they have then? Looking at a book on "Pulse and Digital Circuits" I see the only mention of transistors is in a special chapter at the end of the book. Wasn't the transistor brand new then, invented in '51? Even though they were available, they were expensive and an entire FF would occupy a small PCB.

In high school (circa '69) we had been given an old computer from the weather service. It had PCBs about 3-4 inches square which would hold a single FF, or a gate. Even an 8 bit CORDIC would be tough to implement in this technology unless the adds were bit serial. I guess that could work in a box a foot on each side. It can be hard to think in terms of bit serial. Everything has to be a shift register.

Here is a reference by Volder about the invention of CORDIC with a picture of the navigation computer.

http://late-dpedago.urv.cat/site_media/papers/fulltext_2.pdf

It's a lot bigger than a foot cube and I think I see some tubes.

As an aside, the B-58 was a supersonic bomber. I don't think we have any of those anymore. The paper says it was scrapped in favor of ballistic missiles, which makes perfect sense.

--
Rick C
Reply by Tim Wescott February 12, 2017
On Sun, 12 Feb 2017 02:05:41 -0500, rickman wrote:

> On 2/12/2017 1:40 AM, eric.jacobsen@ieee.org wrote:
>> On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:
>>
>>> On 2/11/2017 10:19 PM, Tim Wescott wrote:
>>>> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>>>>
>>>>> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it was touted as a way to avoid the use of a multiplier.
>>>>>
>>>>> Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix equivalent to two add/subtracts. But to determine if the operations will be adds or subtracts the current rotation of the vector must be compared to the angle desired (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.
>>>>>
>>>>> So four add/subtracts for each iteration. That's actually like four multipliers, no?
>>>>>
>>>>> From what I've read the iterations can be shortened by starting the calculations using a LUT for some number of bits to be precomputed.
>>>>>
>>>>> Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values are accumulative in the usual way of (hopefully) uncorrelated noise. So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for the slightly wrong angle.
>>>>>
>>>>> Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?
>>>>
>>>> That matches my understanding, and I've been looking into it myself, lately.
>>>>
>>>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>>>
>>> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).
>>
>> There are many implementation methods for multipliers, and most don't actually use shifts and adds, well, not in the usual sense, anyway. Wallace trees and stuff like that. From that perspective "shifts and adds" applies better to the CORDIC than to typical multiplier implementations, although the concepts both work that way.
>>
>> And in some applications, like a complex mixer, you need the simultaneous sin and cos, so CORDIC was a good thing to have before multipliers and LUTs got cheaper.
>
> That's my point. However you are doing the CORDIC, the multiply is easier. A CORDIC uses *three* adds per iteration, a multiply only *one*. Using the trig method to calculate sine *and* cosine uses two multipliers. Do the math.
>
>>>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>>>
>>> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.
>>>
>>> There is nothing inherent in FPGA calculations that requires multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.
>>
>> Our CORDICs only ever had one clock as far as I recall.
>
> I don't know if you mean you can do the entire CORDIC in one clock or it is pipelined. It's the same hardware in either case, you just separate the stages with registers (typically free with the logic in FPGAs) to pipeline it. One way requires a slow clock to get a result, but you get it one clock after you set up the inputs. The pipeline requires you to wait N clocks, but you can get a result on every clock.
>
>>>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>>>
>>> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.
>>
>> Back when CORDICs were the answer to everything there were some app notes and white papers and stuff floating around that had CORDIC solutions for nearly anything, e.g., all the trig functions, etc., etc. So that stuff was sorted out and published from multiple sources and might still be found with a bit of sleuthing.
>>
>>>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>>>
>>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>>
>> That's been the conclusion that many have come to. As Tim alluded, once in a while something pops up where somebody still can't use a multiplier in silicon because of some extreme constraint on size or power or something, and they'll plop a CORDIC in instead. If multipliers and memory aren't constrained out, I don't know why you'd use a CORDIC.
>
> As I've said, the reason I got into this recently was because someone in another group claimed superhuman powers for the CORDIC of giving *exactly* correct results other than the lsb. I take that as ±1 lsb. Turns out this is not correct. It's a bit of a Douglas Adams thing: yes, the answer is perfectly correct, but the question (angle) is wrong by a little bit, more than ±1 lsb some of the time.
In 1956, with the hardware available at the time, the CORDIC probably _did_ look superhuman. I think what you were seeing was residual shine from that.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
Reply by Tim Wescott February 12, 2017
On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:

> On 2/11/2017 10:19 PM, Tim Wescott wrote:
>> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>>
>>> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it was touted as a way to avoid the use of a multiplier.
>>>
>>> Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix equivalent to two add/subtracts. But to determine if the operations will be adds or subtracts the current rotation of the vector must be compared to the angle desired (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.
>>>
>>> So four add/subtracts for each iteration. That's actually like four multipliers, no?
>>>
>>> From what I've read the iterations can be shortened by starting the calculations using a LUT for some number of bits to be precomputed.
>>>
>>> Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values are accumulative in the usual way of (hopefully) uncorrelated noise. So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for the slightly wrong angle.
>>>
>>> Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?
>>
>> That matches my understanding, and I've been looking into it myself, lately.
>>
>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>
> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).
Ah, I get your point now.
>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>
> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.
Yes, that's what I meant. I don't know either, but if I were doing the work I'd check.
> There is nothing inherent in FPGA calculations that requires multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.
I was using "clocks" as shorthand for "clocks or real estate".
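For the record, the "multiply by a guess and adjust" divide described above is Newton-Raphson reciprocal iteration. A minimal C sketch, with an assumed constant seed and the divisor pre-scaled into [1, 2): each pass is two multiplies and a subtract, and roughly doubles the number of correct bits, versus one bit per step for shift-and-subtract.

#include <stdio.h>

static double nr_divide(double a, double b, int iters)
{
    /* crude constant seed for b in [1, 2); hardware would instead use a
       small LUT indexed by the top bits of b */
    double r = 2.0 / 3.0;
    for (int i = 0; i < iters; i++)
        r = r * (2.0 - b * r);   /* r converges to 1/b quadratically */
    return a * r;
}

int main(void)
{
    /* assumes the divisor is pre-scaled into [1, 2), like a float mantissa */
    printf("7 / 1.25 = %.6f\n", nr_divide(7.0, 1.25, 5));
    return 0;
}

Five passes from that seed reach full double precision, which is why the "multiple multiplies" can still beat one-bit-per-clock subtraction.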
>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>
> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.
Yes, exactly.
>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>
> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
Reply by rickman February 12, 2017
On 2/12/2017 1:40 AM, eric.jacobsen@ieee.org wrote:
> On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:
>
>> On 2/11/2017 10:19 PM, Tim Wescott wrote:
>>> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>>>
>>>> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it was touted as a way to avoid the use of a multiplier.
>>>>
>>>> Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix equivalent to two add/subtracts. But to determine if the operations will be adds or subtracts the current rotation of the vector must be compared to the angle desired (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.
>>>>
>>>> So four add/subtracts for each iteration. That's actually like four multipliers, no?
>>>>
>>>> From what I've read the iterations can be shortened by starting the calculations using a LUT for some number of bits to be precomputed.
>>>>
>>>> Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values are accumulative in the usual way of (hopefully) uncorrelated noise. So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for the slightly wrong angle.
>>>>
>>>> Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?
>>>
>>> That matches my understanding, and I've been looking into it myself, lately.
>>>
>>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>>
>> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).
>
> There are many implementation methods for multipliers, and most don't actually use shifts and adds, well, not in the usual sense, anyway. Wallace trees and stuff like that. From that perspective "shifts and adds" applies better to the CORDIC than to typical multiplier implementations, although the concepts both work that way.
>
> And in some applications, like a complex mixer, you need the simultaneous sin and cos, so CORDIC was a good thing to have before multipliers and LUTs got cheaper.
That's my point. However you are doing the CORDIC, the multiply is easier. A CORDIC uses *three* adds per iteration, a multiply only *one*. Using the trig method to calculate sine *and* cosine uses two multipliers. Do the math.
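To make the operation count concrete, here is a minimal rotation-mode CORDIC sketch in C (an illustration, in floating point for clarity): each iteration is exactly three add/subtracts -- two on the (x, y) pair and one on the residual angle z -- plus a lookup of arctan(2^-i), and sine and cosine necessarily fall out together.

#include <math.h>
#include <stdio.h>

static void cordic_sincos(double angle, int iters, double *s, double *c)
{
    /* CORDIC gain: each step scales the vector by 1/cos(arctan(2^-i)),
       so start from K = product of the cosines (~0.607253 for many steps) */
    double k = 1.0;
    for (int i = 0; i < iters; i++)
        k *= cos(atan(ldexp(1.0, -i)));

    double x = k, y = 0.0, z = angle;   /* converges for |angle| up to ~1.74 rad */
    for (int i = 0; i < iters; i++) {
        double xi   = x;
        double step = atan(ldexp(1.0, -i));  /* table entry arctan(2^-i) */
        if (z >= 0.0) {                 /* rotate toward the target angle */
            x -= ldexp(y,  -i);         /* add/subtract #1 */
            y += ldexp(xi, -i);         /* add/subtract #2 */
            z -= step;                  /* add/subtract #3 */
        } else {
            x += ldexp(y,  -i);
            y -= ldexp(xi, -i);
            z += step;
        }
    }
    *c = x;
    *s = y;    /* both results fall out of the same iterations */
}

int main(void)
{
    double s, c;
    cordic_sincos(1.0, 32, &s, &c);
    printf("cordic %.6f %.6f   libm %.6f %.6f\n", s, c, sin(1.0), cos(1.0));
    return 0;
}

A fixed-point version replaces ldexp() with arithmetic shifts, and the arctan table is the "fairly small look up table" mentioned above.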
>>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>>
>> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.
>>
>> There is nothing inherent in FPGA calculations that requires multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.
>
> Our CORDICs only ever had one clock as far as I recall.
I don't know if you mean you can do the entire CORDIC in one clock or it is pipelined. It's the same hardware in either case, you just separate the stages with registers (typically free with the logic in FPGAs) to pipeline it. One way requires a slow clock to get a result, but you get it one clock after you set up the inputs. The pipeline requires you to wait N clocks, but you can get a result on every clock.
>>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>>
>> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.
>
> Back when CORDICs were the answer to everything there were some app notes and white papers and stuff floating around that had CORDIC solutions for nearly anything, e.g., all the trig functions, etc., etc. So that stuff was sorted out and published from multiple sources and might still be found with a bit of sleuthing.
>
>>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>>
>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>
> That's been the conclusion that many have come to. As Tim alluded, once in a while something pops up where somebody still can't use a multiplier in silicon because of some extreme constraint on size or power or something, and they'll plop a CORDIC in instead. If multipliers and memory aren't constrained out, I don't know why you'd use a CORDIC.
As I've said, the reason I got into this recently was because someone in another group claimed superhuman powers for the CORDIC of giving *exactly* correct results other than the lsb. I take that as ±1 lsb. Turns out this is not correct. It's a bit of a Douglas Adams thing: yes, the answer is perfectly correct, but the question (angle) is wrong by a little bit, more than ±1 lsb some of the time.

--
Rick C
Reply by February 12, 2017
On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:

> On 2/11/2017 10:19 PM, Tim Wescott wrote:
>> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>>
>>> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it was touted as a way to avoid the use of a multiplier.
>>>
>>> Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix equivalent to two add/subtracts. But to determine if the operations will be adds or subtracts the current rotation of the vector must be compared to the angle desired (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.
>>>
>>> So four add/subtracts for each iteration. That's actually like four multipliers, no?
>>>
>>> From what I've read the iterations can be shortened by starting the calculations using a LUT for some number of bits to be precomputed.
>>>
>>> Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values are accumulative in the usual way of (hopefully) uncorrelated noise. So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for the slightly wrong angle.
>>>
>>> Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?
>>
>> That matches my understanding, and I've been looking into it myself, lately.
>>
>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>
> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).
There are many implementation methods for multipliers, and most don't actually use shifts and adds, well, not in the usual sense, anyway. Wallace trees and stuff like that. From that perspective "shifts and adds" applies better to the CORDIC than to typical multiplier implementations, although the concepts both work that way.

And in some applications, like a complex mixer, you need the simultaneous sin and cos, so CORDIC was a good thing to have before multipliers and LUTs got cheaper.
>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>
> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.
>
> There is nothing inherent in FPGA calculations that requires multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.
Our CORDICs only ever had one clock as far as I recall.
>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>
> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.
Back when CORDICs were the answer to everything there were some app notes and white papers and stuff floating around that had CORDIC solutions for nearly anything, e.g., all the trig functions, etc., etc. So that stuff was sorted out and published from multiple sources and might be still found with a bit of sleuthing.
>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>
> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
That's been the conclusion that many have come to. As Tim alluded, once in a while something pops up where somebody still can't use a multiplier in silicon because of some extreme constraint on size or power or something, and they'll plop a CORDIC in instead. If multipliers and memory aren't constrained out, I don't know why you'd use a CORDIC.