# fixed point atan2

Started by Alex Hornung, November 30, 2010
```
Hi,

first off I'd like to apologize in case this question is/sounds
extremely stupid, but I'm really stuck.

I need an efficient atan2 that doesn't take up as much real estate on an
FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
mentioned at dspguru[1].

So according to that trick, given:
x = 0.1838
y = -0.1818

I would now do the following, since it is in the IV quadrant:
r = (x-y)/(x+y)

But here is the problem: the result of that division is around 177. If I
now continue on and find the angle by doing:
theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)

it will obviously be horribly wrong. What am I missing? I'm testing all
this in Matlab, but I also tried using the fixed point toolbox, and the
result with the same number of bits and fraction length is simply '1'.
How is this supposed to work? Where am I wrong?

Alex Hornung

[1]:
http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normalization
```
```
On Nov 30, 2:47 am, Alex Hornung <ahorn...@gmail.com> wrote:
> Hi,
>
> first off I'd like to apologize in case this question is/sounds
> extremely stupid, but I'm really stuck.
>
> I need an efficient atan2 that doesn't take up as much real estate on an
> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
> mentioned at dspguru[1].
>
> So according to that trick, given:
> x = 0.1838
> y = -0.1818
>
> I would now do the following, since it is in the IV quadrant:
> r = (x-y)/(x+y)
>
> But here is the problem: the result of that division is around 177. If I
> now continue on and find the angle by doing:
> theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)
>
> it will obviously be horribly wrong. What am I missing? I'm testing all
> this in Matlab, but I also tried using the fixed point toolbox, and the
> result with the same number of bits and fraction length is simply '1'.
> How is this supposed to work? Where am I wrong?
>
> Alex Hornung
>
> [1]:http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normali...

You are missing the use of abs(y). See the accompanying code.

Hope this helps.

Greg
```
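To make the abs(y) point concrete, here is a minimal Python sketch of the self-normalizing approximation as I understand it from the dspguru trick. The function name `atan2_approx` and the `1e-10` guard are illustrative choices, not something taken from the posts above. It also contrasts the ratio computed with the signed `y` against the one computed with `abs(y)` for the values from the question.

```python
import math

def atan2_approx(y, x):
    """First-order atan2 approximation with self-normalization.

    The ratio r always uses abs(y), so it stays within [-1, 1];
    the sign of y is applied only at the very end.
    """
    abs_y = abs(y) + 1e-10  # small offset avoids 0/0 when x == y == 0
    if x >= 0:
        # Quadrants I and IV
        r = (x - abs_y) / (x + abs_y)
        angle = math.pi / 4 - (math.pi / 4) * r
    else:
        # Quadrants II and III
        r = (x + abs_y) / (abs_y - x)
        angle = 3 * math.pi / 4 - (math.pi / 4) * r
    return -angle if y < 0 else angle

# Values from the original question
x, y = 0.1838, -0.1818
r_signed = (x - y) / (x + y)            # signed y: denominator is nearly zero
r_abs = (x - abs(y)) / (x + abs(y))     # abs(y): ratio stays inside [-1, 1]

print("r with signed y:", r_signed)
print("r with abs(y):  ", r_abs)
print("approximation:  ", atan2_approx(y, x))
print("math.atan2:     ", math.atan2(y, x))
```

In a fixed-point implementation the same structure should carry over: take the magnitude of y first, form the ratio, and negate the resulting angle when y was negative.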
```
On 30/11/2010 10:58, Greg Heath wrote:
> You are missing the use of abs(y). See the accompanying code.
>
> Hope this helps.
>
> Greg

It sure does!

Thank you very much,
Alex
```
```
>I need an efficient atan2 that doesn't take up as much real estate on an
>FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
>mentioned at dspguru[1].
>

Why do you believe the CORDIC uses more FPGA "real estate" than this other
approach?
```
```
On 30/11/2010 13:28, cfelton wrote:
>> I need an efficient atan2 that doesn't take up as much real estate on an
>> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
>> mentioned at dspguru[1].
>>
>
> Why do you believe the CORDIC uses more FPGA "real estate" than this other
> approach?

From the data provided by Xilinx, a CORDIC takes up anywhere between
1300 and 4000 LUT-FF pairs. Any multiplier I'd use would take up at most
4 xtremeDSP slices and any full adders shouldn't take up much either. As
far as I can tell the divider would be the biggest block with this
approach, and according to the Xilinx IP datasheet, it can be anywhere
between 80 LUT-FF pairs and 500 for my purposes. This still seems quite
a bit lower than what a CORDIC would require.

In terms of latency it should be almost the same as a CORDIC, mainly due
to the divider, again.

I would of course welcome any solution that would allow me to make this
even simpler (for example by removing the divider somehow). Considering
that I don't require much accuracy, there might be even more efficient
solutions that I don't know anything about.

As you might have guessed from my first post, I'm quite new to this
(both FPGAs and DSP) and I'd greatly appreciate any further insight.

Kind Regards,
Alex Hornung
```
```
Alex Hornung  <ahornung@gmail.com> wrote:

>I would of course welcome any solution that would allow me to make this
>even simpler (for example by removing the divider somehow). Considering
>that I don't require much accuracy, there might be even more efficient
>solutions that I don't know anything about.

Have you considered a one-octant LUT plus mirroring?  How much accuracy
do you need?  These things are usually small.

Steve
```
```
On 30/11/2010 15:18, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com>  wrote:
>
>> I would of course welcome any solution that would allow me to make this
>> even simpler (for example by removing the divider somehow). Considering
>> that I don't require much accuracy, there might be even more efficient
>> solutions that I don't know anything about.
>
> Have you considered a one-octant LUT plus mirroring?  How much accuracy
> do you need?  These things are usually small.
>
>
> Steve

No, and I have no idea on how that would work, to get everything from
that one octant. Remember I'm really new to all of this :) Do you happen
to have any paper/website/etc about it?

The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
could even use some lookup for all the values with this kind of
accuracy? After all it would just be around 60 possibilities for the
whole circle.

Cheers,
Alex
```
```
Alex Hornung  <ahornung@gmail.com> wrote:

>On 30/11/2010 15:18, Steve Pope wrote:

>> Alex Hornung<ahornung@gmail.com>  wrote:

>>> I would of course welcome any solution that would allow me to make this
>>> even simpler (for example by removing the divider somehow). Considering
>>> that I don't require much accuracy, there might be even more efficient
>>> solutions that I don't know anything about.

>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>> do you need?  These things are usually small.

>No, and I have no idea on how that would work, to get everything from
>that one octant. Remember I'm really new to all of this :) Do you happen
>to have any paper/website/etc about it?

>The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>could even use some lookup for all the values with this kind of
>accuracy?

Yes, probably.

Suppose the input to your four quadrant arctan is two 5-bit signed values
representing a complex number.  That's 1024 possible arctan values,
which is a pretty large lookup table, but by manipulating these so
that the input of the table lies always in the first octant, you are
now down to 9 * 9 = 81 values.  I think you will find the accuracy
is better than 0.1 radians using such an approach.

I know of no paper, you just have to design it and try it out.

Steve
```
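The one-octant idea can be sketched roughly as follows, assuming 5-bit two's-complement inputs in the range -16..15. The names `OCTANT_LUT` and `atan2_lut` are made up for this illustration, and the table stores floating-point angles rather than the quantized values a real design would hold. It is indexed directly by the folded magnitudes, so its size does not try to reproduce the 81-entry figure quoted above.

```python
import math

MAX_MAG = 16  # assumption: 5-bit signed inputs, so |x|, |y| <= 16

# First-octant table: OCTANT_LUT[a][b] = atan(b / a) for 0 <= b <= a.
# Row 0 only holds the origin, whose angle is taken as 0 here.
OCTANT_LUT = [
    [math.atan2(b, a) for b in range(a + 1)]
    for a in range(MAX_MAG + 1)
]

def atan2_lut(y, x):
    """Four-quadrant arctangent from a first-octant table plus mirroring."""
    ax, ay = abs(x), abs(y)
    swapped = ay > ax                      # fold across the 45-degree line
    a, b = (ay, ax) if swapped else (ax, ay)
    angle = OCTANT_LUT[a][b]               # angle in [0, pi/4]
    if swapped:
        angle = math.pi / 2 - angle        # undo the 45-degree fold
    if x < 0:
        angle = math.pi - angle            # mirror into the left half-plane
    if y < 0:
        angle = -angle                     # mirror into the lower half-plane
    return angle

print(atan2_lut(-3, 7), math.atan2(-3, 7))
```

Because the table is addressed by the pair of magnitudes rather than by their ratio, this variant needs no divider; the three mirroring steps map onto comparisons, a swap, and sign changes.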
```
> From the data provided by Xilinx, a CORDIC takes up anywhere between
>1300 and 4000 LUT-FF pairs. Any multiplier I'd use would take up at most
>4 xtremeDSP slices and any full adders shouldn't take up much either. As
>far as I can tell the divider would be the biggest block with this
>approach, and according to the Xilinx IP datasheet, it can be anywhere
>between 80 LUT-FF pairs and 500 for my purposes. This still seems quite
>a bit lower than what a CORDIC would require.
>

That is fairly large for the Xilinx "core".  A CORDIC "core" will have more
modes than you require.  If you only implement what you need it will be
much smaller.  I would implement the CORDIC algorithm or use the look up
table as mentioned.  The CORDIC will only require shifts and adds and a
small state-machine.
```
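As a rough illustration of the "shifts and adds" remark, here is a small vectoring-mode CORDIC model in Python using integer arithmetic only inside the loop. The iteration count, the angle format, and the guard-bit pre-scaling are assumptions chosen for the sketch, not values from any post or vendor core; `cordic_atan2` is a hypothetical name.

```python
import math

ITERATIONS = 12        # assumed; each iteration adds roughly one bit of angle
ANGLE_FRAC_BITS = 16   # assumed fixed-point format of the angle accumulator
GUARD_BITS = 16        # assumed input pre-scaling so right shifts keep precision

# Precomputed atan(2^-i) constants, the only table the algorithm needs
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << ANGLE_FRAC_BITS))
              for i in range(ITERATIONS)]
HALF_PI = round((math.pi / 2) * (1 << ANGLE_FRAC_BITS))

def cordic_atan2(y, x):
    """Vectoring-mode CORDIC: drive y toward 0 and accumulate the rotation."""
    if x == 0 and y == 0:
        return 0.0
    x <<= GUARD_BITS
    y <<= GUARD_BITS
    angle = 0
    if x < 0:
        # Pre-rotate by 90 degrees so the vector lies in the right half-plane
        if y >= 0:
            x, y = y, -x
            angle = HALF_PI
        else:
            x, y = -y, x
            angle = -HALF_PI
    for i in range(ITERATIONS):
        if y > 0:
            # Rotate clockwise by atan(2^-i)
            x, y = x + (y >> i), y - (x >> i)
            angle += ATAN_TABLE[i]
        else:
            # Rotate counter-clockwise by atan(2^-i)
            x, y = x - (y >> i), y + (x >> i)
            angle -= ATAN_TABLE[i]
    return angle / (1 << ANGLE_FRAC_BITS)

print(cordic_atan2(-3, 7), math.atan2(-3, 7))
```

For an accuracy target of 0.1 to 0.2 radians, far fewer iterations than the 12 assumed here would likely suffice, which is where most of the area saving over a general-purpose core would come from.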
```
On 11/30/2010 07:18 AM, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com>  wrote:
>
>> I would of course welcome any solution that would allow me to make this
>> even simpler (for example by removing the divider somehow). Considering
>> that I don't require much accuracy, there might be even more efficient
>> solutions that I don't know anything about.
>
> Have you considered a one-octant LUT plus mirroring?  How much accuracy
> do you need?  These things are usually small.

Unless the incoming data is normalized he'd still have to do the divide.

But he should have a LUT on his list of things to try.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

```