I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why it's touted as a way to avoid the use of a multiplier.

Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix, equivalent to two add/subtracts. But to determine whether the operations will be adds or subtracts, the current rotation of the vector must be compared to the desired angle (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.

So four add/subtracts for each iteration. That's actually like four multipliers, no?

From what I've read the iterations can be shortened by using a LUT to precompute the first several bits of the result.

Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values accumulate in the usual way of (hopefully) uncorrelated noise. So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for a slightly wrong angle.

Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?

-- Rick C
CORDIC
Started by ●February 11, 2017
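The iteration described in the opening post (rotate by ±arctan(2^-i) each step) can be sketched in a few lines. This is an illustrative floating-point model, not a fixed-point hardware implementation; the function name is mine, and in hardware each `* 2.0 ** -i` would be a shift:

```python
import math

def cordic_sin_cos(theta, n_iters=32):
    """Rotation-mode CORDIC sketch: drive the residual angle z to zero.

    Each iteration costs three add/subtracts (the 2**-i factors are
    shifts in hardware): one each for x, y, and z.
    """
    # Precomputed arctan(2^-i) table and the aggregate CORDIC gain.
    atan_table = [math.atan(2.0 ** -i) for i in range(n_iters)]
    gain = 1.0
    for i in range(n_iters):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y = 1.0 / gain, 0.0  # pre-scale so the final vector is unit length
    z = theta               # residual angle still to rotate through
    for i in range(n_iters):
        d = 1.0 if z >= 0 else -1.0  # sign test decides add vs. subtract
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]
    return y, x  # (sin(theta), cos(theta)), valid for |theta| < ~1.74 rad
```

Note that the decision variable `d` comes from the sign of the residual angle `z` alone, so no separate compare step is needed beyond the one subtraction per iteration.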
Reply by ●February 11, 2017
On 2/11/2017 7:44 PM, rickman wrote:
> [original question quoted in full; snipped]

I found some block diagrams of the CORDIC and it is pretty much what I thought, except the angle computation does not require two adders for adding and then comparing. They start with the angle and subtract or add depending on the sign of the previous result.

So there are three adds/subtracts for each iteration of the CORDIC, which is three times as many as for a multiply. Do I have it right now?

-- Rick C
Reply by ●February 11, 2017
On 2/11/2017 8:12 PM, rickman wrote:
> On 2/11/2017 7:44 PM, rickman wrote:
>> [original question quoted in full; snipped]
>
> I found some block diagrams of the CORDIC and it is pretty much what I
> thought, except the angle computation does not require two adders for
> adding and then comparing. They start with the angle and subtract or
> add depending on the sign of the previous result.
>
> So there are three adds/subtracts for each iteration of the CORDIC,
> which is three times as many as for a multiply. Do I have it right now?

I was thinking about the errors in the CORDIC computation a bit more and realized something. In addition to the accumulated rounding errors in the arctan constants that are summed to approximate the angle, the last computation of the angle is subject to 100% error. If the (N-1)th step resulted in exactly the right angle, the final step is going to push you away from the right angle by the full amount of arctan(2^-N). I expect this to be pretty small, likely a value of 1 in the lsb.

Even if you don't have this limiting case, the error in CORDIC is not so much in the numerical result. That is, it computes the perfectly correct answer, but to a slightly wrong question! The angle is what is off rather than the calculated values. lol

-- Rick C
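The "perfect answer to a slightly wrong question" effect described above can be seen by tracking only the angle accumulator: after N iterations the residual angle is bounded by the size of the last rotation step, arctan(2^-(N-1)). A small sketch (the function name is mine):

```python
import math

def cordic_residual_angle(theta, n_iters=16):
    """Return the residual angle z left after rotation-mode CORDIC.

    The computed sin/cos pair is (up to rounding) exact for the angle
    (theta - z), so z is the 'slightly wrong question' error.
    """
    z = theta
    for i in range(n_iters):
        d = 1.0 if z >= 0 else -1.0
        z -= d * math.atan(2.0 ** -i)
    return z

z = cordic_residual_angle(1.0, n_iters=16)
# |z| is bounded by the last step size, atan(2**-15), about 3.05e-5
```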
Reply by ●February 11, 2017
On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
> [original question quoted in full; snipped]

That matches my understanding, and I've been looking into it myself, lately.
In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.

I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there. Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).

So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.

-- Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
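For the magnitude/angle case mentioned above, the usual approach is CORDIC "vectoring" mode: rotate the vector onto the positive x axis and accumulate the angle turned through, with no divide anywhere. A floating-point sketch, assuming x > 0 (the function name is mine; the single multiply by the gain constant at the end can be folded into surrounding scaling in hardware):

```python
import math

def cordic_vectoring(x, y, n_iters=32):
    """Vectoring-mode CORDIC sketch (assumes x > 0): drive y to zero.

    Returns (magnitude, angle). In hardware the 2**-i factors are
    shifts, so each iteration is again three add/subtracts.
    """
    gain = 1.0
    for i in range(n_iters):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = 0.0
    for i in range(n_iters):
        d = -1.0 if y >= 0 else 1.0  # steer y toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / gain, z  # x grew by the CORDIC gain; undo it
```

For example, `cordic_vectoring(3.0, 4.0)` converges toward magnitude 5 and angle atan2(4, 3).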
Reply by ●February 12, 2017
On 2/11/2017 10:19 PM, Tim Wescott wrote:
> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>> [original question quoted in full; snipped]
>
> That matches my understanding, and I've been looking into it myself,
> lately.
>
> In a day when multiplies were much more expensive than additions, CORDIC
> made oodles of sense, because you were just shifting and adding. Now we
> have block RAM and DSP blocks with single-cycle multiplies in our FPGAs
> and single-cycle multiply instructions in our DSP chips, and CORDIC
> doesn't make so much sense.

I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look-up table in the CORDIC, while the trig method has but one multiply along with a larger look-up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral to the algorithm; there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).

> I think it may still make sense for rotating a vector to be entirely
> (well, almost entirely) real, either for the purpose of finding its
> magnitude or to find its angle. But that's because taking the arctangent
> pretty much unavoidably requires a divide, which is still a multi-clock
> operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably
> saves you there.

I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.

There is nothing inherent in FPGA calculations that requires multiple clocks. It just depends on how fast the clock is running, whether you can wait for the result, and whether you can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.

> Ditto -- possibly -- for doing division, although I'm not so sure that
> shift-and-subtract isn't better (and certainly more direct).

I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choosing up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.

> So -- if you're designing an ASIC at the gate level, CORDIC may still be
> the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD.
> But in this degenerate age I don't think that you need to pick up the
> CORDIC algorithm to rotate a vector any more than you need a big book of
> log tables to multiply two numbers.

If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration, or even two, I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.

-- Rick C
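The "trig method" mentioned above, with a single multiply per output, presumably relies on the small-angle approximations sin(b) ≈ b and cos(b) ≈ 1, so that sin(a+b) ≈ sin(a) + b*cos(a). A sketch under that assumption (the table size of 1024 is an arbitrary choice, trading ROM for accuracy):

```python
import math

# Coarse table plus linear correction: one multiply per output, using
# sin(a+b) ~ sin(a) + b*cos(a) and cos(a+b) ~ cos(a) - b*sin(a).
N = 1024
STEP = 2 * math.pi / N
SIN_LUT = [math.sin(k * STEP) for k in range(N)]
COS_LUT = [math.cos(k * STEP) for k in range(N)]

def sin_cos_lut(theta):
    k = round(theta / STEP)          # nearest coarse angle a = k*STEP
    b = theta - k * STEP             # small residual, |b| <= STEP/2
    k %= N                           # wrap the table index
    s = SIN_LUT[k] + b * COS_LUT[k]  # one multiply for sine
    c = COS_LUT[k] - b * SIN_LUT[k]  # one multiply for cosine
    return s, c
```

With N = 1024 the worst-case error is roughly b^2/2, i.e. a few parts in 10^6, which illustrates the trade Rick describes: a larger table and one multiply per output versus the CORDIC's small table and three adds per bit.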
Reply by ●February 12, 2017
On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:
>On 2/11/2017 10:19 PM, Tim Wescott wrote:
>> [earlier discussion quoted in full; snipped]
>
>I HATE when people say, "CORDIC... [is] just shifting and adding".
>That's /exactly/ what multiplication is. [...] The only advantage is
>the CORDIC computes both sine and cosine simultaneously. [...]

There are many implementation methods for multipliers, and most don't actually use shifts and adds, well, not in the usual sense, anyway. Wallace trees and stuff like that. From that perspective "shifts and adds" applies better to the CORDIC than to typical multiplier implementations, although the concepts both work that way.

And in some applications, like a complex mixer, you need the simultaneous sin and cos, so CORDIC was a good thing to have before multipliers and LUTs got cheaper.

>There is nothing inherent in FPGA calculations that require multiple
>clocks. [...] Brute force will get 1 bit of the result per iteration
>(not the same thing as clocks). [...]

Our CORDICs only ever had one clock as far as I recall.

>I don't know exactly how the CORDIC would work to calculate arctan. I
>suppose it would be similar but starting with a zero angle, choose up or
>down by comparing the Y value to 0. In the end rotation of the vector
>will produce an angle and a magnitude.

Back when CORDICs were the answer to everything there were some app notes and white papers and stuff floating around that had CORDIC solutions for nearly anything, e.g., all the trig functions, etc., etc. So that stuff was sorted out and published from multiple sources and might still be found with a bit of sleuthing.

>If there is something CORDIC is better at, I'm not getting it. If it
>only used a single add per iteration or even two I would say yes, it has
>its uses. But with three adds per iteration it sounds more costly than
>having to do either a multiply or a divide.

That's been the conclusion that many have come to. As Tim alluded, once in a while something pops up where somebody still can't use a multiplier in silicon because of some extreme constraint on size or power or something, and they'll plop a CORDIC in instead. If multipliers and memory aren't constrained out, I don't know why you'd use a CORDIC.
Reply by ●February 12, 2017
On 2/12/2017 1:40 AM, eric.jacobsen@ieee.org wrote:
> On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:
>> [earlier discussion quoted in full; snipped]
>
> There are many implementation methods for multipliers, and most don't
> actually use shifts and adds, well, not in the usual sense, anyway.
> Wallace trees and stuff like that. [...]
>
> And in some applications, like a complex mixer, you need the
> simultaneous sin and cos, so CORDIC was a good thing to have before
> multipliers and LUTs got cheaper.

That's my point. However you are doing the CORDIC, the multiply is easier. A CORDIC uses *three* adds per iteration, a multiply only *one*. Using the trig method to calculate sine *and* cosine uses two multipliers. Do the math.

> Our CORDICs only ever had one clock as far as I recall.

I don't know if you mean you can do the entire CORDIC in one clock or it is pipelined. It's the same hardware in either case; you just separate the stages with registers (typically free with the logic in FPGAs) to pipeline it. One way requires a slow clock to get a result, but you get it one clock after you set up the inputs. The pipeline requires you to wait N clocks, but you can get a result on every clock.

> Back when CORDICs were the answer to everything there were some app
> notes and white papers and stuff floating around that had CORDIC
> solutions for nearly anything, e.g., all the trig functions, etc., etc.
> [...]
>
> That's been the conclusion that many have come to. As Tim alluded,
> once in a while something pops up where somebody still can't use a
> multiplier in silicon because of some extreme constraint on size or
> power or something, and they'll plop a CORDIC in instead. If
> multipliers and memory aren't constrained out, I don't know why you'd
> use a CORDIC.

As I've said, the reason I got into this recently was because someone in another group claimed superhuman powers for the CORDIC of giving *exactly* correct results other than the lsb. I take that as ±1 lsb. Turns out this is not correct. It's a bit of a Douglas Adams thing: yes, the answer is perfectly correct, but the question (angle) is wrong by a little bit, more than ±1 lsb some of the time.

-- Rick C
Reply by ●February 12, 2017
On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:> On 2/11/2017 10:19 PM, Tim Wescott wrote: >> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote: >> >>> I don't think I've ever really dug into the CORDIC algorithm enough to >>> appreciate the logic involved. I do recall coming to the realization >>> that the shift and add/subtract algorithm is not overly different from >>> a multiplication, so I had trouble understanding why touted as a way >>> to avoid the use of a multiplier. >>> >>> Looking at it harder, it would appear that the level of complexity is >>> that of *three* multipliers. I may not fully understand the >>> algorithm, but I believe it consists of repeated add/subtracts to >>> rotate the vector by successively smaller amounts corresponding to >>> arctan(2^-i). At each step there are three add/subtract operations. >>> The vector is multiplied by a matrix equivalent to two add/subtracts. >>> But to determine if the operations will be adds or subtracts the >>> current rotation of the vector must be compared to the angle desired >>> (a subtraction). I'm a bit fuzzy on this (I probably should push this >>> around in my head a bit longer) but I believe there is a fourth >>> operation of adding/subtracting arctan(2^-i) >>> to update the current rotation angle. >>> >>> So four add/subtracts for each iteration. That's actually like four >>> multipliers, no? >>> >>> From what I've read the iterations can be shortened by starting the >>> calculations using a LUT for some number of bits to be precomputed. >>> >>> Also, since the values of arctan(2^-i) are precomputed and >>> successively added, the errors in these values are accumulative in the >>> usual way of (hopefully) uncorrelated noise. So while the result of >>> the sine or cosine may be bit exact to within 1 lsb, the angle is not. >>> So you get a perfectly correct sine for the slightly wrong angle. >>> >>> Am I understanding this correctly? 
>>> Or am I making some part of this more complex than it really is in implementation?
>>
>> That matches my understanding, and I've been looking into it myself, lately.
>>
>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>
> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).

Ah, I get your point now.

>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>
> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.

Yes, that's what I meant.
I don't know either, but if I were doing the work I'd check.

> There is nothing inherent in FPGA calculations that require multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.

I was using "clocks" as shorthand for "clocks or real estate".

>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>
> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.

Yes, exactly.

>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>
> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.

If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort.
Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
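The "three adds per iteration" being debated above can be made concrete. This is the textbook rotation-mode recurrence sketched in Python -- not code from the thread, and the 2.0**-i multiplies stand in for what would be plain wire shifts in hardware:

```python
import math

def cordic_sincos(theta, n=16):
    """Rotation-mode CORDIC sketch for |theta| < ~1.74 rad: returns
    (~cos(theta), ~sin(theta)) plus the residual angle left in the
    z accumulator after n iterations."""
    atans = [math.atan(2.0 ** -i) for i in range(n)]
    # CORDIC gain: start the vector at 1/K so the result comes out unscaled
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / k, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0        # rotate toward the target angle
        x, y, z = (x - d * y * 2.0 ** -i,  # add/subtract 1: vector x
                   y + d * x * 2.0 ** -i,  # add/subtract 2: vector y
                   z - d * atans[i])       # add/subtract 3: angle accumulator
    return x, y, z
```

Note that both sine and cosine fall out of the same three add/subtracts -- as said above, there is no way to compute just one of them.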
Reply by ●February 12, 2017
On Sun, 12 Feb 2017 02:05:41 -0500, rickman wrote:

> On 2/12/2017 1:40 AM, eric.jacobsen@ieee.org wrote:
>> On Sun, 12 Feb 2017 00:36:52 -0500, rickman <gnuarm@gmail.com> wrote:
>>
>>> On 2/11/2017 10:19 PM, Tim Wescott wrote:
>>>> On Sat, 11 Feb 2017 19:44:26 -0500, rickman wrote:
>>>>
>>>>> I don't think I've ever really dug into the CORDIC algorithm enough to appreciate the logic involved. I do recall coming to the realization that the shift and add/subtract algorithm is not overly different from a multiplication, so I had trouble understanding why touted as a way to avoid the use of a multiplier.
>>>>>
>>>>> Looking at it harder, it would appear that the level of complexity is that of *three* multipliers. I may not fully understand the algorithm, but I believe it consists of repeated add/subtracts to rotate the vector by successively smaller amounts corresponding to arctan(2^-i). At each step there are three add/subtract operations. The vector is multiplied by a matrix equivalent to two add/subtracts. But to determine if the operations will be adds or subtracts the current rotation of the vector must be compared to the angle desired (a subtraction). I'm a bit fuzzy on this (I probably should push this around in my head a bit longer) but I believe there is a fourth operation of adding/subtracting arctan(2^-i) to update the current rotation angle.
>>>>>
>>>>> So four add/subtracts for each iteration. That's actually like four multipliers, no?
>>>>>
>>>>> From what I've read the iterations can be shortened by starting the calculations using a LUT for some number of bits to be precomputed.
>>>>>
>>>>> Also, since the values of arctan(2^-i) are precomputed and successively added, the errors in these values are accumulative in the usual way of (hopefully) uncorrelated noise.
>>>>> So while the result of the sine or cosine may be bit exact to within 1 lsb, the angle is not. So you get a perfectly correct sine for the slightly wrong angle.
>>>>>
>>>>> Am I understanding this correctly? Or am I making some part of this more complex than it really is in implementation?
>>>>
>>>> That matches my understanding, and I've been looking into it myself, lately.
>>>>
>>>> In a day when multiplies were much more expensive than additions, CORDIC made oodles of sense, because you were just shifting and adding. Now we have block RAM and DSP blocks with single-cycle multiplies in our FPGAs and single-cycle multiply instructions in our DSP chips, and CORDIC doesn't make so much sense.
>>>
>>> I HATE when people say, "CORDIC... [is] just shifting and adding". That's /exactly/ what multiplication is. I thought the trig method (sin(a)cos(b)+cos(a)sin(b)) was more complex because it has other operations in addition to the multiply, but it turns out there are the equivalent of *three* multiplies and a fairly small look up table in the CORDIC while the trig method has but one multiply along with a larger look up table. The only advantage is the CORDIC computes both sine and cosine simultaneously. It appears to be integral with the algorithm, there's no way to not compute both. Useful if you are generating a quadrature signal, but still more expensive with three adds per significant bit rather than just two with the trig method (for both sine and cosine).
>>
>> There are many implementation methods for multipliers, and most don't actually use shifts and adds, well, not in the usual sense, anyway. Wallace trees and stuff like that. From that perspective "shifts and adds" applies better to the CORDIC than to typical multiplier implementations, although the concepts both work that way.
>> And in some applications, like a complex mixer, you need the simultaneous sin and cos, so CORDIC was a good thing to have before multipliers and LUTs got cheaper.
>
> That's my point. However you are doing the CORDIC, the multiply is easier. A CORDIC uses *three* adds per iteration, a multiply only *one*. Using the trig method to calculate sine *and* cosine uses two multipliers. Do the math.
>
>>>> I think it may still make sense for rotating a vector to be entirely (well, almost entirely) real, either for the purpose of finding its magnitude or to find its angle. But that's because taking the arctangent pretty much unavoidably requires a divide, which is still a multi-clock operation: CORDIC (at least on an FPGA, and if I'm not mistaken) probably saves you there.
>>>
>>> I think you are talking about finding the phase angle of a complex pair. I'm not sure if CORDIC saves anything there either.
>>>
>>> There is nothing inherent in FPGA calculations that require multiple clocks. It just depends on how fast the clock is running, if you can wait for the result and can afford the hardware. Divide is just the inverse of multiply. Brute force will get 1 bit of the result per iteration (not the same thing as clocks). There are other ways involving iteration by multiplying by a guess and adjusting the result to get closer. This can converge faster than repeated subtractions, 1 bit at a time, but requires multiple multiplies.
>>
>> Our CORDICs only ever had one clock as far as I recall.
>
> I don't know if you mean you can do the entire CORDIC in one clock or it is pipelined. It's the same hardware in either case, you just separate the stages with registers (typically free with the logic in FPGAs) to pipeline it. One way requires a slow clock to get a result, but you get it one clock after you set up the inputs.
> The pipeline requires you to wait N clocks, but you can get a result on every clock.
>
>>>> Ditto -- possibly -- for doing division, although I'm not so sure that shift-and-subtract isn't better (and certainly more direct).
>>>
>>> I don't know exactly how the CORDIC would work to calculate arctan. I suppose it would be similar but starting with a zero angle, choose up or down by comparing the Y value to 0. In the end rotation of the vector will produce an angle and a magnitude.
>>
>> Back when CORDICs were the answer to everything there were some app notes and white papers and stuff floating around that had CORDIC solutions for nearly anything, e.g., all the trig functions, etc., etc. So that stuff was sorted out and published from multiple sources and might be still found with a bit of sleuthing.
>>
>>>> So -- if you're designing an ASIC at the gate level, CORDIC may still be the bee's knees. Or if you just have to do DSP with an 8051 or a CPLD. But in this degenerate age I don't think that you need to pick up the CORDIC algorithm to rotate a vector any more than you need a big book of log tables to multiply two numbers.
>>>
>>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>>
>> That's been the conclusion that many have come to. As Tim alluded, once in a while something pops up where somebody still can't use a multiplier in silicon because of some extreme constraint on size or power or something, and they'll plop a CORDIC in instead. If multipliers and memory aren't constrained out, I don't know why you'd use a CORDIC.
> As I've said, the reason I got into this recently was because someone in another group claimed superhuman powers for the CORDIC of giving *exactly* correct results other than the lsb. I take that as ±1 lsb. Turns out this is not correct. It's a bit of a Douglas Adams thing, yes the answer is perfectly correct, but the question (angle) is wrong by a little bit, more than ±1 lsb some of the time.

In 1956, with the hardware available at the time, the CORDIC probably _did_ look superhuman. I think what you were seeing was residual shine from that.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
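On the arctan question quoted above ("starting with a zero angle, choose up or down by comparing the Y value to 0"): that is the textbook vectoring mode -- the same recurrence with the decision driven by the sign of y instead of z. A Python sketch, not from the thread, covering the right half-plane only:

```python
import math

def cordic_vector(x, y, n=16):
    """Vectoring-mode sketch: drive y toward 0, accumulating the applied
    rotation in z.  Yields ~atan2(y, x) and ~hypot(x, y), with no divide
    and no square root."""
    assert x > 0, "this sketch covers the right half-plane only"
    z = 0.0
    k = 1.0                                   # running CORDIC gain
    for i in range(n):
        d = -1.0 if y >= 0 else 1.0           # choose up or down from sign of y
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return z, x / k                           # angle, gain-corrected magnitude
```

For example, cordic_vector(4.0, 3.0) converges on atan2(3, 4) and a magnitude of 5, which is why CORDIC was handy for phase and magnitude of a complex pair when divides were expensive.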
Reply by ●February 12, 2017
On 2/12/2017 11:26 AM, Tim Wescott wrote:

> On Sun, 12 Feb 2017 00:36:52 -0500, rickman wrote:
>>
>> If there is something CORDIC is better at, I'm not getting it. If it only used a single add per iteration or even two I would say yes, it has its uses. But with three adds per iteration it sounds more costly than having to do either a multiply or a divide.
>
> If you're thinking in terms of raw gates, or chip area, perhaps the memory savings is worth the extra effort. Certainly in 1956 or whatever it was the coolest of the cool -- but then, in 1956, memory was either a few transistors per bit, or it was a little toroid woven into a mess of wires by some soon-to-be nearsighted young lady.

I'm not sure how they would have done this in 1956. Wasn't this for a B-58 navigation computer? What kind of digital electronics did they have then? Looking at a book on "Pulse and Digital Circuits" I see the only mention of transistors is in a special chapter at the end of the book. Wasn't the transistor brand new then, invented in '51? Even though they were available they were expensive and an entire FF would occupy a small PCB.

In high school (circa '69) we had been given an old computer from the weather service. It had PCBs about 3-4 inches square which would hold a single FF, or a gate. Even an 8 bit CORDIC would be tough to implement in this technology unless the adds were bit serial. I guess that could work in a box a foot on each side. It can be hard to think in terms of bit serial. Everything has to be a shift register.

Here is a reference by Volder about the invention of CORDIC with a picture of the navigation computer.

http://late-dpedago.urv.cat/site_media/papers/fulltext_2.pdf

It's a lot bigger than a foot cube and I think I see some tubes.

As an aside, the B-58 was a supersonic bomber. I don't think we have any of those. The paper says it was scrapped in favor of ballistic missiles which makes perfect sense.

--
Rick C