On Nov 30, 8:47 pm, Alex Hornung <ahorn...@gmail.com> wrote:
> Hi,
>
> first off I'd like to apologize in case this question is/sounds
> extremely stupid, but I'm really stuck.
>
> I need an efficient atan2 that doesn't take up as much real estate on an
> FPGA as a CORDIC would. As such, I've been looking at the 'Trick'
> mentioned at dspguru[1].
>
> So according to that trick, given:
> x = 0.1838
> y = -0.1818
>
> I would now do the following, since it is in the IV quadrant:
> r = (x-y)/(x+y)
>
> But here is the problem: the result of that division is around 177. If I
> now continue on and find the angle by doing:
> theta = pi/4 - pi/4*r (or rather pi/4*r - pi/4)
>
> it will obviously be horribly wrong. What am I missing? I'm testing all
> this in Matlab, but I also tried using the fixed point toolbox, and the
> result with the same number of bits and fraction length is simply '1'.
> How is this supposed to work? Where am I wrong?
>
> Thank you in advance,
> Alex Hornung
>
> [1]:http://www.dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normali=
...

I don't know the application, but if it's software radio then you
don't need atan at all for FM at least.
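
One common atan-free approach (an assumption here, the post does not
say which method is meant) is a quadrature discriminator: the phase
step between consecutive I/Q samples comes from a cross product rather
than from an arctangent. A minimal Python sketch, with the function
name fm_discriminator made up for illustration:

```python
import cmath
import math


def fm_discriminator(samples):
    """Estimate the per-sample phase step of a complex baseband signal.

    Uses Im(cur * conj(prev)) / |prev|^2, which equals sin(dphi) for a
    constant-envelope signal, so no atan is needed.
    """
    out = []
    for prev, cur in zip(samples, samples[1:]):
        # Imaginary part of cur * conj(prev)
        cross = cur.imag * prev.real - cur.real * prev.imag
        power = prev.real ** 2 + prev.imag ** 2
        out.append(cross / power if power else 0.0)
    return out


# Hypothetical test signal: constant phase step of 0.1 rad per sample.
tone = [0.5 * cmath.exp(1j * 0.1 * n) for n in range(16)]
steps = fm_discriminator(tone)
```

For small phase steps sin(dphi) is close to dphi. If the signal
amplitude is held constant, the division by the power reduces to a
fixed scale factor.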

Hardy

On 11/30/2010 10:13 AM, Alex Hornung wrote:
> On 30/11/2010 17:51, Rob Gaddi wrote:
>> On 11/30/2010 7:42 AM, Steve Pope wrote:
>>> Alex Hornung<ahornung@gmail.com> wrote:
>>>
>>>> On 30/11/2010 15:18, Steve Pope wrote:
>>>
>>>>> Alex Hornung<ahornung@gmail.com> wrote:
>>>
>>>>>> I would of course welcome any solution that would allow me to make
>>>>>> this
>>>>>> even simpler (for example by removing the divider somehow).
>>>>>> Considering
>>>>>> that I don't require much accuracy, there might be even more
>>>>>> efficient
>>>>>> solutions that I don't know anything about.
>>>
>>>>> Have you considered a one-octant LUT plus mirroring? How much accuracy
>>>>> do you need? These things are usually small.
>>>
>>>> No, and I have no idea on how that would work, to get everything from
>>>> that one octant. Remember I'm really new to all of this :) Do you
>>>> happen
>>>> to have any paper/website/etc about it?
>>>
>>>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>>>> could even use some lookup for all the values with this kind of
>>>> accuracy?
>>>
>>> Yes, probably.
>>>
>>> Suppose the input to your four quadrant arctan is two 5-bit signed
>>> values
>>> representing a complex number. That's 1024 possible arctan values,
>>> which is a pretty large lookup table, but by manipulating these so
>>> that the input of the table lies always in the first octant, you are
>>> now down to 9 * 9 = 81 values. I think you will find the accuracy
>>> is better than 0.1 radians using such an approach.
>>>
>>> I know of no paper, you just have to design it and try it out.
>>>
>>> Steve
>>
>> It's early in the morning, and I grant I'm under-caffeinated, but a 1024
>> element lookup table just doesn't strike me as a deal breaker. A single
>> Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that
>> 10-bit input, assuming you don't do any octant folding. If you weren't
>> using that RAM for anything yet then that's absolutely free, no fabric
>> required, and gives you a single cycle ATAN function.
>>
>> Folding costs you a few cycles (though it can be pipelined if
>> throughput's an issue) and a small amount of fabric for a pretty hefty
>> resolution improvement, or you can just brute force it by throwing more
>> RAMs at the problem.
>>
>> Never underestimate the ability of a (pseudo-)ROM to implement arbitrary
>> functions.
>>
>
> That actually sounds pretty perfect. Didn't even think of using the BRAM
> blocks for this, and I can definitely spare 1 out of 192. I'll implement
> it with the Xilinx Block Memory Generator, but I was wondering, just out
> of curiosity, if there is some way of using them directly from VHDL?
>
> Regards,
> Alex

You're looking to implement a single-port, synchronous read ROM. 
There's a code template in some piece of documentation (xst.pdf I think) 
that discusses the VHDL you have to write in order to implement one of
them.  Closely following their example code will yield the best
results, and you'll want to look through the XST output log in order to 
make sure that it did in fact infer a ROM.  You'll know if it didn't; 
synthesis will take forever as it tries to build logic trees out of it 
instead.

You can either declare all the table values inline in the VHDL or use 
std.textio to read them in from an external file.  I prefer the external 
file from an aesthetic standpoint, but it does make things a bit trickier.
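
For the external-file route, the table contents can be generated
offline. A minimal Python sketch, assuming the VHDL side reads one
binary word per line; that file layout and the names rom_init_lines and
write_rom_file are illustrative and not taken from the XST
documentation:

```python
def rom_init_lines(words, width):
    """Format unsigned table words as fixed-width binary strings."""
    lines = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError("word %d does not fit in %d bits" % (word, width))
        lines.append(format(word, "0{}b".format(width)))
    return lines


def write_rom_file(path, words, width):
    """Write one binary word per line to a text file."""
    with open(path, "w") as handle:
        handle.write("\n".join(rom_init_lines(words, width)) + "\n")


# Small demonstration with made-up 8-bit contents.
demo = rom_init_lines([0, 1, 255], 8)
```

Signed table values would need converting to two's complement before
formatting, since the helper only accepts unsigned words.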

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

On 30/11/2010 17:51, Rob Gaddi wrote:
> On 11/30/2010 7:42 AM, Steve Pope wrote:
>> Alex Hornung<ahornung@gmail.com> wrote:
>>
>>> On 30/11/2010 15:18, Steve Pope wrote:
>>
>>>> Alex Hornung<ahornung@gmail.com> wrote:
>>
>>>>> I would of course welcome any solution that would allow me to make
>>>>> this
>>>>> even simpler (for example by removing the divider somehow).
>>>>> Considering
>>>>> that I don't require much accuracy, there might be even more efficient
>>>>> solutions that I don't know anything about.
>>
>>>> Have you considered a one-octant LUT plus mirroring? How much accuracy
>>>> do you need? These things are usually small.
>>
>>> No, and I have no idea on how that would work, to get everything from
>>> that one octant. Remember I'm really new to all of this :) Do you happen
>>> to have any paper/website/etc about it?
>>
>>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>>> could even use some lookup for all the values with this kind of
>>> accuracy?
>>
>> Yes, probably.
>>
>> Suppose the input to your four quadrant arctan is two 5-bit signed values
>> representing a complex number. That's 1024 possible arctan values,
>> which is a pretty large lookup table, but by manipulating these so
>> that the input of the table lies always in the first octant, you are
>> now down to 9 * 9 = 81 values. I think you will find the accuracy
>> is better than 0.1 radians using such an approach.
>>
>> I know of no paper, you just have to design it and try it out.
>>
>> Steve
>
> It's early in the morning, and I grant I'm under-caffeinated, but a 1024
> element lookup table just doesn't strike me as a deal breaker. A single
> Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that
> 10-bit input, assuming you don't do any octant folding. If you weren't
> using that RAM for anything yet then that's absolutely free, no fabric
> required, and gives you a single cycle ATAN function.
>
> Folding costs you a few cycles (though it can be pipelined if
> throughput's an issue) and a small amount of fabric for a pretty hefty
> resolution improvement, or you can just brute force it by throwing more
> RAMs at the problem.
>
> Never underestimate the ability of a (pseudo-)ROM to implement arbitrary
> functions.
>

That actually sounds pretty perfect. Didn't even think of using the BRAM 
blocks for this, and I can definitely spare 1 out of 192. I'll implement 
it with the Xilinx Block Memory Generator, but I was wondering, just out 
of curiosity, if there is some way of using them directly from VHDL?

Regards,
Alex

On 11/30/2010 7:42 AM, Steve Pope wrote:
> Alex Hornung<ahornung@gmail.com>  wrote:
>
>> On 30/11/2010 15:18, Steve Pope wrote:
>
>>> Alex Hornung<ahornung@gmail.com>   wrote:
>
>>>> I would of course welcome any solution that would allow me to make this
>>>> even simpler (for example by removing the divider somehow). Considering
>>>> that I don't require much accuracy, there might be even more efficient
>>>> solutions that I don't know anything about.
>
>>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>>> do you need?  These things are usually small.
>
>> No, and I have no idea on how that would work, to get everything from
>> that one octant. Remember I'm really new to all of this :) Do you happen
>> to have any paper/website/etc about it?
>
>> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I
>> could even use some lookup for all the values with this kind of
>> accuracy?
>
> Yes, probably.
>
> Suppose the input to your four quadrant arctan is two 5-bit signed values
> representing a complex number.  That's 1024 possible arctan values,
> which is a pretty large lookup table, but by manipulating these so
> that the input of the table lies always in the first octant, you are
> now down to 9 * 9 = 81 values.  I think you will find the accuracy
> is better than 0.1 radians using such an approach.
>
> I know of no paper, you just have to design it and try it out.
>
> Steve

It's early in the morning, and I grant I'm under-caffeinated, but a 1024 
element lookup table just doesn't strike me as a deal breaker.  A single 
Xilinx BRAM or two Altera M9Ks gives you an 18-bit output for that 
10-bit input, assuming you don't do any octant folding.  If you weren't 
using that RAM for anything yet then that's absolutely free, no fabric
required, and gives you a single cycle ATAN function.

Folding costs you a few cycles (though it can be pipelined if 
throughput's an issue) and a small amount of fabric for a pretty hefty 
resolution improvement, or you can just brute force it by throwing more 
RAMs at the problem.

Never underestimate the ability of a (pseudo-)ROM to implement arbitrary 
functions.
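
To illustrate the unfolded table, here is a Python model of the
1024-entry lookup. The address packing (x in the upper five bits, y in
the lower five) and the output scaling (pi mapped to 2^17) are
assumptions chosen for this sketch, not something fixed by the
hardware:

```python
import math

IN_BITS = 5                       # width of each signed input
OUT_BITS = 18                     # width of the stored angle
IN_MASK = (1 << IN_BITS) - 1
OUT_MASK = (1 << OUT_BITS) - 1
SCALE = (1 << (OUT_BITS - 1)) / math.pi   # angle codes per radian


def to_signed(value, bits):
    """Interpret an unsigned field as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def build_table():
    """One entry per 10-bit address, stored as 18-bit two's complement."""
    table = []
    for addr in range(1 << (2 * IN_BITS)):
        x = to_signed(addr >> IN_BITS, IN_BITS)
        y = to_signed(addr & IN_MASK, IN_BITS)
        code = round(math.atan2(y, x) * SCALE)
        # +pi would overflow the signed range, so clamp it.
        code = min(code, (1 << (OUT_BITS - 1)) - 1)
        table.append(code & OUT_MASK)
    return table


TABLE = build_table()


def lookup(x, y):
    """Return the stored angle in radians for signed 5-bit x and y."""
    addr = ((x & IN_MASK) << IN_BITS) | (y & IN_MASK)
    return to_signed(TABLE[addr], OUT_BITS) / SCALE
```

With these choices the stored angle is quantized to steps of pi/2^17,
so the table itself adds very little error. The accuracy is set almost
entirely by the 5-bit inputs.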

-- 
Rob Gaddi, Highland Technology
Email address is currently out of order

Alex Hornung  <ahornung@gmail.com> wrote:

>On 30/11/2010 17:23, Steve Pope wrote:

>> If the incoming data comprises too many fixed point bits, then he
>> has to massage it, either by normalizing (generally involving shifting)
>> or by dividing.  I went on to give an example where there are 10 bits
>> total of incoming data.  If there are too many more than this, then
>> yes something will have to be done in this regard.  Usually it would
>> not be a divide though.

>Actually I'm seeing that as little as 2 bits of precision are enough. So 
>erring a bit on the safe side, 3 bits, would give me a 2^6 lookup table 
>(64 entries) which should be very reasonable?

Well, I came up with 81 entries (9 * 9, rather than 8 * 8) but that
assumes it's convenient to include both end-points of the octant.
Due to the necessary mirroring that can be convenient.
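
The folding and mirroring can be sketched in Python as follows. This is
an illustration under stated assumptions: inputs are already reduced to
integers in -8..8, the table is a plain 9 x 9 array, and the name
atan2_folded is made up. Only the entries with lo <= hi are ever read:

```python
import math

N = 8  # largest magnitude, so each table axis has N + 1 = 9 entries

# TABLE[hi][lo] holds the first-octant angle for 0 <= lo <= hi <= N.
TABLE = [[math.atan2(lo, hi) for lo in range(N + 1)] for hi in range(N + 1)]


def atan2_folded(y, x):
    """Four-quadrant arctangent using only a first-octant table."""
    ax, ay = abs(x), abs(y)
    swapped = ay > ax                 # fold across the 45 degree line
    hi, lo = (ay, ax) if swapped else (ax, ay)
    angle = TABLE[hi][lo]             # always in 0 .. pi/4
    if swapped:
        angle = math.pi / 2 - angle   # undo the 45 degree fold
    if x < 0:
        angle = math.pi - angle       # mirror into the left half plane
    if y < 0:
        angle = -angle                # mirror into the lower half plane
    return angle
```

Because both octant end-points (lo = 0 and lo = hi) are in the table,
the mirroring steps need no special cases on the axes or diagonals.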

>Regarding normalization; I basically read 14-bit complex samples off an 
>ADC (14 bit real, 14 bit imag), so I was thinking I can just shift them 
>so all the non-zero bits are behind the decimal point, which should be 
>normalized (?).

>Am I making some wrong assumption here? Is there anything wrong with 
>just normalizing by shifting the hell out of it?

No, that's fine.  You need to count leading zeros and ones, not
just leading zeros, if the data is two's complement.  That's about it.
It's pretty straightforward that for a given number of bits in the result,
normalizing by a factor of two is no more than one bit worse than doing
a full divide.
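
A minimal Python sketch of that normalization, assuming 14-bit two's
complement inputs and keeping the top 5 bits of each component; the
function names and the choice of 5 bits are illustrative. Both
components get the same shift so that their ratio is preserved:

```python
WIDTH = 14  # ADC word length per component


def redundant_sign_bits(value, width=WIDTH):
    """Count bits below the sign bit that merely repeat the sign bit.

    For positive numbers these are leading zeros, for negative numbers
    leading ones.
    """
    bits = value & ((1 << width) - 1)
    sign = (bits >> (width - 1)) & 1
    count = 0
    for pos in range(width - 2, -1, -1):
        if (bits >> pos) & 1 != sign:
            break
        count += 1
    return count


def normalize_pair(x, y, width=WIDTH, keep=5):
    """Shift x and y left by a common amount, then keep the top bits."""
    shift = min(redundant_sign_bits(x, width), redundant_sign_bits(y, width))
    drop = width - keep
    # Python's >> on negative ints is an arithmetic (sign-preserving) shift.
    return (x << shift) >> drop, (y << shift) >> drop


small = normalize_pair(100, -37)
```

Here the pair (100, -37) becomes (12, -5), which changes the angle by
roughly 0.04 radians.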

Steve

On 30/11/2010 17:23, Steve Pope wrote:
> Tim Wescott<tim@seemywebsite.com>  wrote:
>
>> On 11/30/2010 07:18 AM, Steve Pope wrote:
>
>>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>>> do you need?  These things are usually small.
>
>> Unless the incoming data is normalized he'd still have to do the divide.
>
> Well, that's a little sweeping.
>
> If the incoming data comprises too many fixed point bits, then he
> has to massage it, either by normalizing (generally involving shifting)
> or by dividing.  I went on to give an example where there are 10 bits
> total of incoming data.  If there are too many more than this, then
> yes something will have to be done in this regard.  Usually it would
> not be a divide though.
>
> Steve

Actually I'm seeing that as little as 2 bits of precision are enough. So 
erring a bit on the safe side, 3 bits, would give me a 2^6 lookup table 
(64 entries) which should be very reasonable?

Regarding normalization; I basically read 14-bit complex samples off an 
ADC (14 bit real, 14 bit imag), so I was thinking I can just shift them 
so all the non-zero bits are behind the decimal point, which should be 
normalized (?).

Am I making some wrong assumption here? Is there anything wrong with 
just normalizing by shifting the hell out of it?

Regards,
Alex

Alex Hornung wrote:

> The accuracy I need is somewhere around 0.1 to 0.2 radians. Maybe I 
> could even use some lookup for all the values with this kind of
> accuracy? After all it would just be around 60 possibilities for the 
> whole circle.

If the accuracy is as coarse as 0.1 radian, then atan2(x,y) ~ x/y within 
an octant of the circle. Fold the angle to a proper octant and set the 
signs accordingly. Normalize x and y before division, so you can get by
with some small additions and corrections without any division at all.
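
A quick numerical check of how far the ratio alone is from the true
angle inside one octant, written in Python. Here r stands for the ratio
of the smaller magnitude to the larger one (0 <= r <= 1); the pi/4
scaling is one possible correction, picked for illustration:

```python
import math


def max_error(approx, steps=1000):
    """Largest |atan(r) - approx(r)| over a grid of r in [0, 1]."""
    return max(abs(math.atan(i / steps) - approx(i / steps))
               for i in range(steps + 1))


plain = max_error(lambda r: r)                 # angle taken as r itself
scaled = max_error(lambda r: math.pi / 4 * r)  # angle taken as (pi/4) * r
```

The bare ratio is worst at the octant edge, where it is off by
1 - pi/4, about 0.21 radians. The scaled version stays within about
0.07 radians over the whole octant.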

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Tim Wescott  <tim@seemywebsite.com> wrote:

>On 11/30/2010 07:18 AM, Steve Pope wrote:

>> Have you considered a one-octant LUT plus mirroring?  How much accuracy
>> do you need?  These things are usually small.

>Unless the incoming data is normalized he'd still have to do the divide.

Well, that's a little sweeping.

If the incoming data comprises too many fixed point bits, then he
has to massage it, either by normalizing (generally involving shifting)
or by dividing.  I went on to give an example where there are 10 bits
total of incoming data.  If there are too many more than this, then
yes something will have to be done in this regard.  Usually it would
not be a divide though.

Steve