Hi,

first off I'd like to apologize in case this question is/sounds extremely stupid, but I'm really stuck.

I need an efficient atan2 that doesn't take up as much real estate on an FPGA as a CORDIC would. As such, I've been looking at the 'Trick' mentioned at dspguru[1].

So according to that trick, given:

x = 0.1838
y = -0.1818

I would now do the following, since it is in the IV quadrant:

r = (x-y)/(x+y)

But here is the problem: the result of that division is around 177. If I now continue on and find the angle by doing:

theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)

it will obviously be horribly wrong. What am I missing? I'm testing all this in Matlab, but I also tried using the fixed point toolbox, and the result with the same number of bits and fraction length is simply '1'. How is this supposed to work? Where am I wrong?

Thank you in advance,
Alex Hornung

[1]: http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normalization

# fixed point atan2

Started by ●November 30, 2010

Reply by ●November 30, 2010

On Nov 30, 2:47 am, Alex Hornung <ahorn...@gmail.com> wrote:
> Hi,
>
> first off I'd like to apologize in case this question is/sounds
> extremely stupid, but I'm really stuck.
>
> I need an efficient atan2 that doesn't take up as much real estate on an
> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
> mentioned at dspguru[1].
>
> So according to that trick, given:
> x = 0.1838
> y = -0.1818
>
> I would now do the following, since it is in the IV quadrant:
> r = (x-y)/(x+y)
>
> But here is the problem: the result of that division is around 177. If I
> now continue on and find the angle by doing:
> theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)
>
> it will obviously be horribly wrong. What am I missing? I'm testing all
> this in Matlab, but I also tried using the fixed point toolbox, and the
> result with the same number of bits and fraction length is simply '1'.
> How is this supposed to work? Where am I wrong?
>
> Thank you in advance,
> Alex Hornung
>
> [1]:http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normali...

You are missing the use of abs(y). See the accompanying code.

Hope this helps.

Greg
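For readers without the dspguru page open, the following is a minimal Python sketch of the first-order self-normalizing approximation with abs(y) applied. It is an illustration, not the dspguru code itself: the function name `approx_atan2` and the handling of the (0, 0) input are assumptions.

```python
import math


def approx_atan2(y: float, x: float) -> float:
    """First-order self-normalizing atan2 approximation, in radians."""
    if x == 0.0 and y == 0.0:
        return 0.0  # assumption: treat the zero vector as angle 0
    abs_y = abs(y)  # fold into the upper half-plane so that |r| <= 1
    if x >= 0.0:
        # Quadrant I after folding: r goes from 1 (angle 0) to -1 (angle pi/2)
        r = (x - abs_y) / (x + abs_y)
        angle = math.pi / 4 - (math.pi / 4) * r
    else:
        # Quadrant II after folding: r goes from 1 (angle pi/2) to -1 (angle pi)
        r = (x + abs_y) / (abs_y - x)
        angle = 3 * math.pi / 4 - (math.pi / 4) * r
    # Undo the fold: a negative y mirrors the angle about the x axis
    return -angle if y < 0.0 else angle


# The values from the question
x, y = 0.1838, -0.1818
print("r     =", (x - abs(y)) / (x + abs(y)))
print("trick =", approx_atan2(y, x))
print("exact =", math.atan2(y, x))
```

With abs(y) in place, r for the values in the question is roughly 0.005 rather than a large number, and the angle comes out close to -pi/4. The worst-case error of this first-order form is about 0.07 rad.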

Reply by ●November 30, 2010

On 30/11/2010 10:58, Greg Heath wrote:
> On Nov 30, 2:47 am, Alex Hornung<ahorn...@gmail.com> wrote:
>> Hi,
>>
>> first off I'd like to apologize in case this question is/sounds
>> extremely stupid, but I'm really stuck.
>>
>> I need an efficient atan2 that doesn't take up as much real estate on an
>> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
>> mentioned at dspguru[1].
>>
>> So according to that trick, given:
>> x = 0.1838
>> y = -0.1818
>>
>> I would now do the following, since it is in the IV quadrant:
>> r = (x-y)/(x+y)
>>
>> But here is the problem: the result of that division is around 177. If I
>> now continue on and find the angle by doing:
>> theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)
>>
>> it will obviously be horribly wrong. What am I missing? I'm testing all
>> this in Matlab, but I also tried using the fixed point toolbox, and the
>> result with the same number of bits and fraction length is simply '1'.
>> How is this supposed to work? Where am I wrong?
>>
>> Thank you in advance,
>> Alex Hornung
>>
>> [1]:http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normali...
>
> You are missing the use of abs(y). See the accompanying code.
>
> Hope this helps.
>
> Greg

It sure does!

Thank you very much,
Alex

Reply by ●November 30, 2010

>I need an efficient atan2 that doesn't take up as much real estate on an
>FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
>mentioned at dspguru[1].
>

Why do you believe the CORDIC uses more FPGA "real estate" than this other approach?

Reply by ●November 30, 2010

On 30/11/2010 13:28, cfelton wrote:
>> I need an efficient atan2 that doesn't take up as much real estate on an
>> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
>> mentioned at dspguru[1].
>>
>
> Why do you believe the CORDIC uses more FPGA "real estate" than this other
> approach?

From the data provided by Xilinx, a CORDIC takes up anywhere between 1300 and 4000 LUT-FF pairs. Any multiplier I'd use would take up at most 4 xtremeDSP slices and any full adders shouldn't take up much either. As far as I can tell the divider would be the biggest block with this approach, and according to the Xilinx IP datasheet, it can be anywhere between 80 LUT-FF pairs and 500 for my purposes. This still seems quite a bit lower than what a CORDIC would require.

In terms of latency it should be almost the same as a CORDIC, mainly due to the divider, again.

I would of course welcome any solution that would allow me to make this even simpler (for example by removing the divider somehow). Considering that I don't require much accuracy, there might be even more efficient solutions that I don't know anything about.

As you might have guessed from my first post, I'm quite new to this (both FPGAs and DSP) and I'd greatly appreciate any further insight.

Kind Regards,
Alex Hornung

Reply by ●November 30, 2010

Alex Hornung <ahornung@gmail.com> wrote:

>I would of course welcome any solution that would allow me to make this
>even simpler (for example by removing the divider somehow). Considering
>that I don't require much accuracy, there might be even more efficient
>solutions that I don't know anything about.

Have you considered a one-octant LUT plus mirroring? How much accuracy do you need? These things are usually small.

Steve

Reply by ●November 30, 2010

On 30/11/2010 15:18, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com> wrote:
>
>> I would of course welcome any solution that would allow me to make this
>> even simpler (for example by removing the divider somehow). Considering
>> that I don't require much accuracy, there might be even more efficient
>> solutions that I don't know anything about.
>
> Have you considered a one-octant LUT plus mirroring? How much accuracy
> do you need? These things are usually small.
>
>
> Steve

No, and I have no idea how that would work, to get everything from that one octant. Remember I'm really new to all of this :) Do you happen to have any paper/website/etc. about it?

The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I could even use some lookup for all the values with this kind of accuracy? After all it would just be around 60 possibilities for the whole circle.

Cheers,
Alex

Reply by ●November 30, 2010

Alex Hornung <ahornung@gmail.com> wrote:

>On 30/11/2010 15:18, Steve Pope wrote:

>> Alex Hornung<ahornung@gmail.com> wrote:

>>> I would of course welcome any solution that would allow me to make this
>>> even simpler (for example by removing the divider somehow). Considering
>>> that I don't require much accuracy, there might be even more efficient
>>> solutions that I don't know anything about.

>> Have you considered a one-octant LUT plus mirroring? How much accuracy
>> do you need? These things are usually small.

>No, and I have no idea on how that would work, to get everything from
>that one octant. Remember I'm really new to all of this :) Do you happen
>to have any paper/website/etc about it?

>The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>could even use some lookup for the all the values with this kind of
>accuracy?

Yes, probably.

Suppose the input to your four quadrant arctan is two 5-bit signed values representing a complex number. That's 1024 possible arctan values, which is a pretty large lookup table, but by manipulating these so that the input of the table lies always in the first octant, you are now down to 9 * 9 = 81 values.

I think you will find the accuracy is better than 0.1 radians using such an approach.

I know of no paper, you just have to design it and try it out.

Steve
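The folding and mirroring described above could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the 5-bit signed input range, the names `OCTANT_LUT` and `lut_atan2`, and a table indexed directly by the two magnitudes. The table holds unquantized angles so that only the folding logic is on show; a hardware table would hold fixed-point angles and could be made smaller by dropping input bits.

```python
import math

BITS = 5                     # assumption: 5-bit signed inputs, -16..15
MAX_MAG = 1 << (BITS - 1)    # largest magnitude after abs(): 16

# First-octant table: one entry per (ax, ay) with 0 <= ay <= ax <= MAX_MAG,
# i.e. only angles between 0 and pi/4 are stored.
OCTANT_LUT = {
    (ax, ay): math.atan2(ay, ax)
    for ax in range(MAX_MAG + 1)
    for ay in range(ax + 1)
}


def lut_atan2(y: int, x: int) -> float:
    """Four-quadrant arctangent from a first-octant lookup table."""
    ax, ay = abs(x), abs(y)          # fold into the first quadrant
    swapped = ay > ax
    if swapped:
        ax, ay = ay, ax              # fold into the first octant
    angle = OCTANT_LUT[(ax, ay)]
    if swapped:
        angle = math.pi / 2 - angle  # mirror about the 45 degree line
    if x < 0:
        angle = math.pi - angle      # mirror about the y axis
    if y < 0:
        angle = -angle               # mirror about the x axis
    return angle


print(lut_atan2(-3, 4), math.atan2(-3, 4))
```

Indexing the table by both magnitudes avoids a divider, at the cost of more entries than a table indexed by a normalized ratio.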

Reply by ●November 30, 2010

> From the data provided by Xilinx, a CORDIC takes up anywhere between
>1300 and 4000 LUT-FF pairs. Any multiplier I'd use would take up at most
>4 xtremeDSP slices and any full adders shouldn't take up much either. As
>far as I can tell the divider would be the biggest block with this
>approach, and according to the Xilinx IP datasheet, it can be anywhere
>between 80 LUT-FF pairs and 500 for my purposes. This still seems quite
>a bit lower than what a CORDIC would require.
>

That is fairly large for the Xilinx "core". A CORDIC "core" will have more modes than you require. If you only implement what you need it will be much smaller. I would implement the CORDIC algorithm or use the look up table as mentioned. The CORDIC will only require shifts and adds and a small state-machine.
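To make the "shifts and adds" point concrete, here is a rough Python sketch of a vectoring-mode CORDIC that returns only the angle. It is a sketch under assumptions, not a description of any particular core: the iteration count, the angle scaling (`FRAC_BITS`), and the names are chosen for illustration, and integer inputs are assumed to be large enough that the truncating shifts do not dominate the error.

```python
import math

ITERATIONS = 12   # assumption: number of micro-rotations
FRAC_BITS = 14    # assumption: angles are radians scaled by 2**FRAC_BITS

# Small constant table of atan(2**-i); this is the only "lookup" needed.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]
HALF_PI = round(math.pi / 2 * (1 << FRAC_BITS))


def cordic_atan2(y: int, x: int) -> int:
    """Return atan2(y, x) as a fixed-point angle (radians * 2**FRAC_BITS)."""
    if x == 0 and y == 0:
        return 0  # assumption: treat the zero vector as angle 0
    angle = 0
    # Pre-rotate by +/-90 degrees so the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, HALF_PI
        else:
            x, y, angle = -y, x, -HALF_PI
    # Drive y toward zero using only shifts and adds, accumulating the angle.
    for i in range(ITERATIONS):
        if y > 0:
            x, y = x + (y >> i), y - (x >> i)   # rotate clockwise
            angle += ATAN_TABLE[i]
        else:
            x, y = x - (y >> i), y + (x >> i)   # rotate counter-clockwise
            angle -= ATAN_TABLE[i]
    return angle


print(cordic_atan2(-1818, 1838) / (1 << FRAC_BITS), math.atan2(-1818, 1838))
```

Because only the angle is used, the CORDIC gain on x never needs to be compensated, which removes the multiplier a magnitude output would require.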

Reply by ●November 30, 2010

On 11/30/2010 07:18 AM, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com> wrote:
>
>> I would of course welcome any solution that would allow me to make this
>> even simpler (for example by removing the divider somehow). Considering
>> that I don't require much accuracy, there might be even more efficient
>> solutions that I don't know anything about.
>
> Have you considered a one-octant LUT plus mirroring? How much accuracy
> do you need? These things are usually small.

Unless the incoming data is normalized he'd still have to do the divide.

But he should have a LUT on his list of things to try.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html