fixed point atan2

Started by November 30, 2010
```On Nov 30, 10:30 am, Alex Hornung <ahorn...@gmail.com> wrote:
>
> The accuracy I need is somewhere around 0.1 to 0.2 radians.

that's not so strict.  an error of 5 degrees is okay?

> Maybe I
> could even use some lookup for the all the values with this kind of
> accuracy? After all it would just be around 60 possibilities for the
> whole circle.

if it's atan2(x,y) you have 2 independent variables for your LUT.

you might want to do this with a finite power series (it wouldn't have to be
very high order for just 5 degrees).  in fact, maybe that
approximation in Rick's 2nd edition would be good enough.

r b-j

```
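As a rough illustration of the low-order approximation idea above, here is a minimal Python sketch. It assumes the commonly quoted rational approximation atan(z) ~ z / (1 + 0.28125 z^2) for |z| <= 1, which may or may not be the one in Rick's book. The function names are made up for this example, and it uses the conventional atan2(y, x) argument order.

```python
import math

def atan_approx(z):
    """Approximate atan(z) for |z| <= 1; worst-case error is roughly 0.005 rad."""
    return z / (1.0 + 0.28125 * z * z)

def atan2_approx(y, x):
    """Four-quadrant angle in radians; atan_approx always sees a ratio with |z| <= 1."""
    if x == 0 and y == 0:
        return 0.0                        # angle is undefined at the origin; pick 0
    if abs(y) <= abs(x):
        a = atan_approx(y / x)            # within 45 degrees of the x axis
        if x < 0:
            a += math.pi if y >= 0 else -math.pi
        return a
    a = atan_approx(x / y)                # within 45 degrees of the y axis
    return (math.pi / 2 if y > 0 else -math.pi / 2) - a
```

This still needs one divide per call, so it addresses the accuracy question rather than the wish to remove the divider.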
```Tim Wescott  <tim@seemywebsite.com> wrote:

>On 11/30/2010 07:18 AM, Steve Pope wrote:

>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>> do you need?  These things are usually small.

>Unless the incoming data is normalized he'd still have to do the divide.

Well, that's a little sweeping.

If the incoming data comprises too many fixed point bits, then he
has to massage it, either by normalizing (generally involving shifting)
or by dividing.  I went on to give an example where there are 10 bits
total of incoming data.  If there are too many more than this, then
yes something will have to be done in this regard.  Usually it would
not be a divide though.

Steve
```
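The "one-octant LUT plus mirroring" idea can be sketched in Python as follows. This is only an illustration under assumptions: inputs are small integers with magnitude at most N = 8 (which gives the 9 * 9 = 81 entry table mentioned elsewhere in the thread), the table holds floating-point radians rather than quantized fixed-point codes, and the names `TABLE` and `atan2_lut` are invented for the example.

```python
import math

N = 8  # largest input magnitude; gives a 9 x 9 table (assumed size)

# TABLE[hi][lo] holds atan(lo / hi) for 0 <= lo <= hi <= N (first octant only).
# Entries with lo > hi are never read; a real design would pack them away.
TABLE = [[math.atan2(lo, hi) if lo <= hi else 0.0 for lo in range(N + 1)]
         for hi in range(N + 1)]

def atan2_lut(y, x):
    """Angle of (x, y) in radians for integers with |x|, |y| <= N."""
    ax, ay = abs(x), abs(y)
    swapped = ay > ax
    hi, lo = (ay, ax) if swapped else (ax, ay)
    a = TABLE[hi][lo]              # first-octant angle, 0 .. pi/4
    if swapped:
        a = math.pi / 2 - a        # mirror about the 45-degree line
    if x < 0:
        a = math.pi - a            # mirror about the y axis
    if y < 0:
        a = -a                     # mirror about the x axis
    return a
```

The folding is three comparisons and a swap, so the only arithmetic left after the lookup is the fixed set of subtractions from pi/2 and pi.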
```
Alex Hornung wrote:

> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
> could even use some lookup for the all the values with this kind of
> accuracy? After all it would just be around 60 possibilities for the
> whole circle.

If the accuracy is as coarse as 0.1 radian, then atan2(x,y) ~ x/y within
an octant of the circle. Fold the angle to a proper octant and set the
signs accordingly. Normalize x and y before division, so you can get by
with some small additions and corrections without any division at all.

DSP and Mixed Signal Design Consultant
http://www.abvolt.com

```
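To put numbers on the "angle is roughly the ratio within an octant" claim, the short Python check below compares two linear fits against atan(z) for z in [0, 1]. It is a sketch, not anything from the original post: `worst_error` is a hypothetical helper, and the pi/4 slope is one common choice that makes the fit exact at both ends of the octant.

```python
import math

def worst_error(scale, steps=1000):
    """Largest |scale*z - atan(z)| over z in [0, 1], sampled on a uniform grid."""
    return max(abs(scale * (i / steps) - math.atan(i / steps))
               for i in range(steps + 1))

err_plain = worst_error(1.0)            # atan(z) ~ z
err_scaled = worst_error(math.pi / 4)   # atan(z) ~ (pi/4) * z
```

The plain ratio is off by about 0.21 rad at the 45-degree edge of the octant, while the pi/4-scaled version stays within about 0.07 rad, which fits the 0.1 rad target.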
```On 30/11/2010 17:23, Steve Pope wrote:
> Tim Wescott<tim@seemywebsite.com>  wrote:
>
>> On 11/30/2010 07:18 AM, Steve Pope wrote:
>
>>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>>> do you need?  These things are usually small.
>
>> Unless the incoming data is normalized he'd still have to do the divide.
>
> Well, that's a little sweeping.
>
> If the incoming data comprises too many fixed point bits, then he
> has to massage it, either by normalizing (generally involving shifting)
> or by dividing.  I went on to give an example where there are 10 bits
> total of incoming data.  If there are too many more than this, then
> yes something will have to be done in this regard.  Usually it would
> not be a divide though.
>
> Steve

Actually I'm seeing that as little as 2 bits of precision are enough. So
erring a bit on the safe side, 3 bits, would give me a 2^6 lookup table
(64 entries) which should be very reasonable?

Regarding normalization; I basically read 14-bit complex samples off an
ADC (14 bit real, 14 bit imag), so I was thinking I can just shift them
so all the non-zero bits are behind the decimal point, which should be
normalized (?).

Am I making some wrong assumption here? Is there anything wrong with
just normalizing by shifting the hell out of it?

Regards,
Alex
```
```Alex Hornung  <ahornung@gmail.com> wrote:

>On 30/11/2010 17:23, Steve Pope wrote:

>> If the incoming data comprises too many fixed point bits, then he
>> has to massage it, either by normalizing (generally involving shifting)
>> or by dividing.  I went on to give an example where there are 10 bits
>> total of incoming data.  If there are too many more than this, then
>> yes something will have to be done in this regard.  Usually it would
>> not be a divide though.

>Actually I'm seeing that as little as 2 bits of precision are enough. So
>erring a bit on the safe side, 3 bits, would give me a 2^6 lookup table
>(64 entries) which should be very reasonable?

Well, I came up with 81 entries (9 * 9, rather than 8 * 8) but that
assumes it's convenient to include both end-points of the octant.
Due to the necessary mirroring that can be convenient.

>Regarding normalization; I basically read 14-bit complex samples off an
>ADC (14 bit real, 14 bit imag), so I was thinking I can just shift them
>so all the non-zero bits are behind the decimal point, which should be
>normalized (?).

>Am I making some wrong assumption here? Is there anything wrong with
>just normalizing by shifting the hell out of it?

No, that's fine.  You need to count leading zeros and ones, not
just leading zeros, if the data is two's complement.  That's about it.
It's pretty straightforward that for a given number of bits in the result,
normalizing by a factor of two is no more than one bit worse than doing
a full divide.

Steve
```
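The normalization described above (count leading zeros for positive values and leading ones for negative values, then shift) can be sketched in Python like this. Assumptions: 14-bit two's complement samples, both samples shifted by the same amount so the angle is preserved, and 3 bits kept from each to form a 6-bit table address. All function names are hypothetical.

```python
def redundant_sign_bits(v, width):
    """Number of leading bits of a two's complement value that only repeat the sign bit."""
    magnitude_bits = (v if v >= 0 else ~v).bit_length()
    return width - 1 - magnitude_bits

def normalize_pair(x, y, width=14):
    """Shift both values left by the same amount until the larger one fills the word."""
    shift = min(redundant_sign_bits(x, width), redundant_sign_bits(y, width))
    return x << shift, y << shift, shift

def lut_address(x, y, width=14, keep=3):
    """Keep the top `keep` bits of each normalized value and pack them into one address."""
    xn, yn, _ = normalize_pair(x, y, width)
    mask = (1 << keep) - 1
    xt = (xn >> (width - keep)) & mask   # arithmetic shift, then wrap to `keep` bits
    yt = (yn >> (width - keep)) & mask
    return (xt << keep) | yt
```

For example, the pair (3, -2) has 11 and 12 redundant sign bits, so both values are shifted left by 11 to give (6144, -4096), which points in the same direction as the original pair.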
```On 11/30/2010 7:42 AM, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com>  wrote:
>
>> On 30/11/2010 15:18, Steve Pope wrote:
>
>>> Alex Hornung<ahornung@gmail.com>   wrote:
>
>>>> I would of course welcome any solution that would allow me to make this
>>>> even simpler (for example by removing the divider somehow). Considering
>>>> that I don't require much accuracy, there might be even more efficient
>>>> solutions that I don't know anything about.
>
>>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>>> do you need?  These things are usually small.
>
>> No, and I have no idea on how that would work, to get everything from
>> that one octant. Remember I'm really new to all of this :) Do you happen
>> to have any paper/website/etc about it?
>
>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>> could even use some lookup for the all the values with this kind of
>> accuracy?
>
> Yes, probably.
>
> Suppose the input to your four quadrant arctan is two 5-bit signed values
> representing a complex number.  That's 1024 possible arctan values,
> which is a pretty large lookup table, but by manipulating these so
> that the input of the table lies always in the first octant, you are
> now down to 9 * 9 = 81 values.  I think you will find the accuracy
> is better than 0.1 radians using such an approach.
>
> I know of no paper, you just have to design it and try it out.
>
> Steve

It's early in the morning, and I grant I'm under-caffeinated, but a 1024
element lookup table just doesn't strike me as a deal breaker.  A single
Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that
10-bit input, assuming you don't do any octant folding.  If you weren't
using that RAM for anything yet, then that's absolutely free, no fabric
required, and gives you a single cycle ATAN function.

Folding costs you a few cycles (though it can be pipelined if
throughput's an issue) and a small amount of fabric for a pretty hefty
resolution improvement, or you can just brute force it by throwing more
RAMs at the problem.

Never underestimate the ability of a (pseudo-)ROM to implement arbitrary
functions.

--
Email address is currently out of order
```
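A brute-force table of the kind described above could be generated offline with a few lines of Python. This is a sketch under assumed conventions, none of which come from the thread: the address is x in the upper 5 bits and y in the lower 5 bits, the 18-bit output is a signed code in which pi maps to 2^17 (so the code wraps naturally at +/- pi), and the origin is assigned angle 0.

```python
import math

IN_BITS = 5    # bits per input component
OUT_BITS = 18  # bits per table entry

def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

def build_atan2_rom():
    """Return 1024 signed angle codes, indexed by (x << IN_BITS) | y."""
    scale = 1 << (OUT_BITS - 1)            # pi maps to 2**17
    rom = []
    for addr in range(1 << (2 * IN_BITS)):
        x = to_signed(addr >> IN_BITS, IN_BITS)
        y = to_signed(addr, IN_BITS)
        code = round(math.atan2(y, x) / math.pi * scale)
        rom.append(to_signed(code, OUT_BITS))   # +pi wraps to the most negative code
    return rom
```

With this scaling the angle arithmetic downstream wraps modulo 2*pi for free in two's complement, which is one reason to prefer it over storing radians.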
```On 30/11/2010 17:51, Rob Gaddi wrote:
> On 11/30/2010 7:42 AM, Steve Pope wrote:
>> Alex Hornung<ahornung@gmail.com> wrote:
>>
>>> On 30/11/2010 15:18, Steve Pope wrote:
>>
>>>> Alex Hornung<ahornung@gmail.com> wrote:
>>
>>>>> I would of course welcome any solution that would allow me to make
>>>>> this
>>>>> even simpler (for example by removing the divider somehow).
>>>>> Considering
>>>>> that I don't require much accuracy, there might be even more efficient
>>>>> solutions that I don't know anything about.
>>
>>>> Have you considered a one-octant LUT plus mirroring? How much accuracy
>>>> do you need? These things are usually small.
>>
>>> No, and I have no idea on how that would work, to get everything from
>>> that one octant. Remember I'm really new to all of this :) Do you happen
>>> to have any paper/website/etc about it?
>>
>>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>>> could even use some lookup for the all the values with this kind of
>>> accuracy?
>>
>> Yes, probably.
>>
>> Suppose the input to your four quadrant arctan is two 5-bit signed values
>> representing a complex number. That's 1024 possible arctan values,
>> which is a pretty large lookup table, but by manipulating these so
>> that the input of the table lies always in the first octant, you are
>> now down to 9 * 9 = 81 values. I think you will find the accuracy
>> is better than 0.1 radians using such an approach.
>>
>> I know of no paper, you just have to design it and try it out.
>>
>> Steve
>
> It's early in the morning, and I grant I'm under-caffeinated, but a 1024
> element lookup table just doesn't strike me as a deal breaker. A single
> Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that
> 10-bit input, assuming you don't do any octant folding. If you weren't
> using that RAM for anything yet, then that's absolutely free, no fabric
> required, and gives you a single cycle ATAN function.
>
> Folding costs you a few cycles (though it can be pipelined if
> throughput's an issue) and a small amount of fabric for a pretty hefty
> resolution improvement, or you can just brute force it by throwing more
> RAMs at the problem.
>
> Never underestimate the ability of a (pseudo-)ROM to implement arbitrary
> functions.
>

That actually sounds pretty perfect. Didn't even think of using the BRAM
blocks for this, and I can definitely spare 1 out of 192. I'll implement
it with the Xilinx Block Memory Generator, but I was wondering, just out
of curiosity, if there is some way of using them directly from VHDL?

Regards,
Alex
```
```On 11/30/2010 10:13 AM, Alex Hornung wrote:
> On 30/11/2010 17:51, Rob Gaddi wrote:
>> On 11/30/2010 7:42 AM, Steve Pope wrote:
>>> Alex Hornung<ahornung@gmail.com> wrote:
>>>
>>>> On 30/11/2010 15:18, Steve Pope wrote:
>>>
>>>>> Alex Hornung<ahornung@gmail.com> wrote:
>>>
>>>>>> I would of course welcome any solution that would allow me to make
>>>>>> this
>>>>>> even simpler (for example by removing the divider somehow).
>>>>>> Considering
>>>>>> that I don't require much accuracy, there might be even more
>>>>>> efficient
>>>>>> solutions that I don't know anything about.
>>>
>>>>> Have you considered a one-octant LUT plus mirroring? How much accuracy
>>>>> do you need? These things are usually small.
>>>
>>>> No, and I have no idea on how that would work, to get everything from
>>>> that one octant. Remember I'm really new to all of this :) Do you
>>>> happen
>>>> to have any paper/website/etc about it?
>>>
>>>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>>>> could even use some lookup for the all the values with this kind of
>>>> accuracy?
>>>
>>> Yes, probably.
>>>
>>> Suppose the input to your four quadrant arctan is two 5-bit signed
>>> values
>>> representing a complex number. That's 1024 possible arctan values,
>>> which is a pretty large lookup table, but by manipulating these so
>>> that the input of the table lies always in the first octant, you are
>>> now down to 9 * 9 = 81 values. I think you will find the accuracy
>>> is better than 0.1 radians using such an approach.
>>>
>>> I know of no paper, you just have to design it and try it out.
>>>
>>> Steve
>>
>> It's early in the morning, and I grant I'm under-caffeinated, but a 1024
>> element lookup table just doesn't strike me as a deal breaker. A single
>> Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that
>> 10-bit input, assuming you don't do any octant folding. If you weren't
>> using that RAM for anything yet, then that's absolutely free, no fabric
>> required, and gives you a single cycle ATAN function.
>>
>> Folding costs you a few cycles (though it can be pipelined if
>> throughput's an issue) and a small amount of fabric for a pretty hefty
>> resolution improvement, or you can just brute force it by throwing more
>> RAMs at the problem.
>>
>> Never underestimate the ability of a (pseudo-)ROM to implement arbitrary
>> functions.
>>
>
> That actually sounds pretty perfect. Didn't even think of using the BRAM
> blocks for this, and I can definitely spare 1 out of 192. I'll implement
> it with the Xilinx Block Memory Generator, but I was wondering, just out
> of curiosity, if there is some way of using them directly from VHDL?
>
> Regards,
> Alex

You're looking to implement a single-port, synchronous read ROM.
There's a code template in some piece of documentation (xst.pdf I think)
that discusses the VHDL you have to write in order to implement one of
them.  Closely following their example code will yield the best
results, and you'll want to look through the XST output log in order to
make sure that it did in fact infer a ROM.  You'll know if it didn't;
synthesis will take forever as it tries to build logic trees out of it.

You can either declare all the table values inline in the VHDL or use
std.textio to read them in from an external file.  I prefer the external
file from an aesthetic standpoint, but it does make things a bit trickier.

--
Email address is currently out of order
```
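For the external-file route, the table contents have to be written out in some text format first. The Python helper below is one possible way to do that, assuming a format of one two's complement bit string per line, which a std.textio-based initialization function could read into a bit_vector. The format and the name `rom_lines` are assumptions for illustration, not something taken from the Xilinx documentation.

```python
def rom_lines(values, width):
    """Format signed integers as fixed-width two's complement bit strings, one per entry."""
    mask = (1 << width) - 1
    return [format(v & mask, "0{}b".format(width)) for v in values]
```

Writing `"\n".join(rom_lines(table, 18))` to a file (for example a hypothetical `atan2_rom.txt`) then gives one 18-character line per ROM address, in address order.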
```On Nov 30, 8:47 pm, Alex Hornung <ahorn...@gmail.com> wrote:
> Hi,
>
> first off I'd like to apologize in case this question is/sounds
> extremely stupid, but I'm really stuck.
>
> I need an efficient atan2 that doesn't take up as much real estate on an
> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
> mentioned at dspguru[1].
>
> So according to that trick, given:
> x = 0.1838
> y = -0.1818
>
> I would now do the following, since it is in the IV quadrant:
> r = (x-y)/(x+y)
>
> But here is the problem: the result of that division is around 177. If I
> now continue on and find the angle by doing:
> theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)
>
> it will obviously be horribly wrong. What am I missing? I'm testing all
> this in Matlab, but I also tried using the fixed point toolbox, and the
> result with the same number of bits and fraction length is simply '1'.
> How is this supposed to work? Where am I wrong?
>
> Alex Hornung
>
> [1]:http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normali...

I don't know the application, but if it's software radio then you
don't need atan at all for FM at least.

Hardy
```
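On the quoted question itself: the huge value of r comes from forming the ratio with the signed y, so that for y close to -x the denominator x + y nearly vanishes. The self-normalizing trick is usually written with |y| in the ratio and the sign of y restored at the end, which keeps r within [-1, 1]. The Python sketch below shows that form in floating point, as best I can reconstruct it. The function name and the small 1e-10 guard against 0/0 are choices made for this example.

```python
import math

def atan2_selfnorm(y, x):
    """Linear self-normalizing atan2 approximation; error stays below about 0.08 rad."""
    abs_y = abs(y) + 1e-10                 # small guard so (0, 0) does not divide by zero
    if x >= 0:
        r = (x - abs_y) / (x + abs_y)      # quadrants I and IV, r in [-1, 1]
        angle = math.pi / 4 - math.pi / 4 * r
    else:
        r = (x + abs_y) / (abs_y - x)      # quadrants II and III, r in [-1, 1]
        angle = 3 * math.pi / 4 - math.pi / 4 * r
    return -angle if y < 0 else angle      # restore the sign of y last
```

For x = 0.1838 and y = -0.1818 this gives r of about 0.0055 and an angle close to -pi/4, as expected for a point just inside quadrant IV.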
```HardySpicer  <gyansorova@gmail.com> wrote:

>I don't know the application, but if it's software radio then you
>don't need atan at all for FM at least.

That would depend upon your choice of demodulator algorithm, it
seems to me.   Which is going to depend upon other factors, some
of which might add up to "need".

Steve
```