# Implementation of Arctan(x)

Started by October 1, 2007
```Hi,
I am  implementing arctan(x) in hardware can any one suggest me any
of the algorithms other than cordic because it takes lots of iterations to
reach the final result.

Thanks and Regards
Karthik W

```
```karthikw wrote:
> Hi,
>      I am  implementing arctan(x) in hardware can any one suggest me any
> of the algorithms other than cordic because it takes lots of iterations to
> reach the final result.
>
Some sort of a table lookup + interpolation will probably be the best as
far as speed/size.  No matter what you do, you'll have some interesting
effects as the slope goes to infinity -- you'll either be taking
reciprocals (not speedy), or _really_ juggling cost/speed/precision issues.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" gives you just what it
says.
See details at http://www.wescottdesign.com/actfes/actfes.html
```
```"karthikw" <karthikwali@gmail.com> writes:

> Hi,
>      I am  implementing arctan(x) in hardware can any one suggest me any
> of the algorithms other than cordic because it takes lots of iterations to
> reach the final result.
>
> Thanks and Regards
> Karthik W

Hi Karthik,

Try:

Efficient approximations for the arctangent function
Rajan, S.; Sichun Wang; Inkol, R.; Joyal, A.;
Signal Processing Magazine, IEEE
Volume 23,  Issue 3,  May 2006 Page(s):108 - 111
Digital Object Identifier 10.1109/MSP.2006.1628884
Summary: This paper provides several efficient approximations for the arctangent
function using Lagrange interpolation and minimax optimization techniques. These
approximations are particularly useful when processing power, memory, and power
consumption are i.....
--
%  Randy Yates                  % "With time with what you've learned,
%% Fuquay-Varina, NC            %  they'll kiss the ground you walk
%%% 919-577-9882                %  upon."
%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
http://www.digitalsignallabs.com
```
```karthikw wrote:
> Hi,
>      I am  implementing arctan(x) in hardware can any one suggest me any
> of the algorithms other than cordic because it takes lots of iterations to
> reach the final result.

What accuracy do you need? Over what range?

Jerry
--
Engineering is the art of making what you want from things you can get.
&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;
```
```On Oct 2, 5:16 am, "karthikw" <karthikw...@gmail.com> wrote:
> Hi,
>      I am  implementing arctan(x) in hardware can any one suggest me any
> of the algorithms other than cordic because it takes lots of iterations to
> reach the final result.
>
> Thanks and Regards
> Karthik W

If you are demodulating FM then forget it. You don't need arctan(x).

Hardy

```
```Tim Wescott wrote:
> karthikw wrote:
> > Hi,
> >      I am  implementing arctan(x) in hardware can any one suggest me any
> > of the algorithms other than cordic because it takes lots of iterations
to
> > reach the final result.
>
> Some sort of a table lookup + interpolation will probably be the best as
> far as speed/size.  No matter what you do, you'll have some interesting
> effects as the slope goes to infinity

The slope of the arctan goes to zero, not infinity. Perhaps this is
interesting:

http://dspguru.com/comp.dsp/tricks/alg/fxdatan2.htm

Regards,
Andor

```
```Andor wrote:
> Tim Wescott wrote:
>> karthikw wrote:
>>> Hi,
>>>      I am  implementing arctan(x) in hardware can any one suggest me
any
>>> of the algorithms other than cordic because it takes lots of iterations
to
>>> reach the final result.
>> Some sort of a table lookup + interpolation will probably be the best as
>> far as speed/size.  No matter what you do, you'll have some interesting
>> effects as the slope goes to infinity
>
> The slope of the arctan goes to zero, not infinity. Perhaps this is
> interesting:
>
arctan(x) = sin(x)/cos(x).  As x goes to pi/2 the arctan (and it's
slope) goes to infinity.

Granted, for pi/4 < x < pi/2 you can use 1/(arctan(pi/2-x)) -- but then
you're calculating a reciprocal, of a number, so you're back to some
computationally intensive calculation.

Not that I'm an expert.  I've always sliced and diced the ordinate down
to 0 <= x <= pi/4, then negated and inverted as necessary -- but I've
never needed the absolute fastest speed, either.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" gives you just what it
says.
See details at http://www.wescottdesign.com/actfes/actfes.html
```
```Tim Wescott <t...@seemywebsite.com> wrote:
> Andor wrote:
> > Tim Wescott wrote:
> >> karthikw wrote:
> >>> Hi,
> >>>      I am  implementing arctan(x) in hardware can any one suggest
me any
> >>> of the algorithms other than cordic because it takes lots of
iterations to
> >>> reach the final result.
> >> Some sort of a table lookup + interpolation will probably be the best
as
> >> far as speed/size.  No matter what you do, you'll have some
interesting
> >> effects as the slope goes to infinity
>
> > The slope of the arctan goes to zero, not infinity. Perhaps this is
> > interesting:
>
> arctan(x) = sin(x)/cos(x).  As x goes to pi/2 the arctan (and it's
> slope) goes to infinity.

You are mixing up tan with arctan.

Regards,
andor

```
```Andor wrote:
> Tim Wescott <t...@seemywebsite.com> wrote:
>> Andor wrote:
>>> Tim Wescott wrote:
>>>> karthikw wrote:
>>>>> Hi,
>>>>>      I am  implementing arctan(x) in hardware can any one
suggest me any
>>>>> of the algorithms other than cordic because it takes lots of
iterations to
>>>>> reach the final result.
>>>> Some sort of a table lookup + interpolation will probably be the
best as
>>>> far as speed/size.  No matter what you do, you'll have some
interesting
>>>> effects as the slope goes to infinity
>>> The slope of the arctan goes to zero, not infinity. Perhaps this is
>>> interesting:
>> arctan(x) = sin(x)/cos(x).  As x goes to pi/2 the arctan (and it's
>> slope) goes to infinity.
>
> You are mixing up tan with arctan.
>
> Regards,
> andor
>
Argh.  So I am.

In which case a look-up table would work nice, except for the minor
problem of the infinite ordinate -- then one may want a lookup table
with ever-increasing intervals, which actually wouldn't be too bad to
implement.

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" gives you just what it
says.
See details at http://www.wescottdesign.com/actfes/actfes.html
```
```On Oct 1, 5:43 pm, Tim Wescott <t...@seemywebsite.com> wrote:
>
...
>
> In which case a look-up table would work nice, except for the minor
> problem of the infinite ordinate -- then one may want a lookup table
> with ever-increasing intervals, which actually wouldn't be too bad to
> implement.

so how would one determine the index into the table without some
repeated search and compare operations?

also, i think something like

arctan(1/x) = pi/2 - arctan(x)

can be used for the the nearly infinite ordinates, no?  (a division is
required.)

dunno how this would work for hardware, but if a single division is
tolerable and
for -1 <= x <= 1, a very accurate approximation is:

arctan(x) ~= x/f(x^2)

where

f(u)  =     1.0
+  0.33288950512027 * u
+ -0.08467922817644 * u^2
+  0.03252232640125 * u^3
+ -0.00749305860992 * u^4

maybe too many terms, but it's not the finite power series that's bad,
but the division by it that's costly in many different contexts.

r b-j

```
```On Oct 23, 11:46 pm, robert bristow-johnson
<r...@audioimagination.com> wrote:
> we're gonna have to keep track of the fact that there are two
> different Karthiks hanging around here.
>
> r b-j

Sigh... unfortunately so :)

We do have different initials/last names depending on how you look at
it.

K.

```
```On Oct 23, 1:18 am, Karthik <karthik301...@gmail.com> wrote:
> > Hi,
> >     Thanks for the suggestion.Do you know any papers which, gives the
> > above concept of mixing both lut and cordic, and also good papers on fast
> > hardware implementation of cordic. I wanted some information on radix4
> > cordic implementation, can you tell me a good paper on radix4 cordic
> > implementation.
>
> > Thanks and Regards
> > Karthik W
>
> 1) G O O G L E. Ray's writeup(s) on CORDIC is/are probably the most
>
> 2) Spend more time thinking than googling.
>
> Karthik S.

we're gonna have to keep track of the fact that there are two
different Karthiks hanging around here.

r b-j

```
```> Hi,
>     Thanks for the suggestion.Do you know any papers which, gives the
> above concept of mixing both lut and cordic, and also good papers on fast
> hardware implementation of cordic. I wanted some information on radix4
> cordic implementation, can you tell me a good paper on radix4 cordic
> implementation.
>
> Thanks and Regards
> Karthik W

1) G O O G L E. Ray's writeup(s) on CORDIC is/are probably the most

2) Spend more time thinking than googling.

Karthik S.

```
```>karthikw wrote:
>
>>>Hi,
>>>    I am  implementing arctan(x) in hardware can any one suggest me
any
>>>of the algorithms other than cordic because it takes lots of
iterations
>>
>> to
>>
>>>reach the final result.
>>>
>>>Thanks and Regards
>>>Karthik W
>>>
>>>
>>
>> Hi,
>>     Thanks for the response. But actually I am implementing an
rectangular
>> to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x).
So
>> I need arctan(y/x) not arctan(x) because a divider costs a lot with
>> respect to area. I intend a accuracy of 4 decimal place which is q15
fixed
>> point format.If go to lookup table, I need a huge memory of 2^16.
>> Interpolation technique is good but I think for hardware
implementation
>> its quite complex ,I some cases I may need a divider or a multiplier.
I
>> also tried looking into the variants (formulas) of arctan but all need
a
>> divider in some or the other way .So if I can have methood which costs
>> less on area and with faster speed than cordic then it will be good.
>>
>> Thanks and Regards
>> Karthik W
>>
>>
>
>
>Unfortunately, most of the algorithms for arctan involve division by a
>variable, so I don't think you are going to find any that give much of a

>net savings in latency over CORDIC for similar performance.  The best
>performance will be from a look-up table, but you are obviously limited
>by to a relatively small number of angles by the size of the memory.
>Your best bet might be to use a cross between CORDIC and a look-up where

>instead of having a binary rotation decision at each step, you have a
>small finite number of rotation angles.  That result is then passed to
>subsequent stages that have finer angular resolutions.  I think in the
>end though, you'll wind up with about the same latency as the CORDIC.
>The other advantage to using CORDIC is that you obtain not only the
>arctan, but you also get the magnitude, and you don't need to compute a
>square root for it.  The square root and the divide are both hardware
>intensive and don't lend themselves well to parallelizing because the
>intermediate results depend on the previous intermediate results.  Same
>is true for CORDIC, only CORDIC gives you both functions at once.
>
>Perhaps you can look at ways to speed up the CORDIC, maybe using a
>multiplied clock or not pipelining at every stage.
>

Hi,
Thanks for the suggestion.Do you know any papers which, gives the
above concept of mixing both lut and cordic, and also good papers on fast
hardware implementation of cordic. I wanted some information on radix4
cordic implementation, can you tell me a good paper on radix4 cordic
implementation.

Thanks and Regards
Karthik W

```
```karthikw wrote:

>>Hi,
>>    I am  implementing arctan(x) in hardware can any one suggest me any
>>of the algorithms other than cordic because it takes lots of iterations
>
> to
>
>>reach the final result.
>>
>>Thanks and Regards
>>Karthik W
>>
>>
>
> Hi,
>     Thanks for the response. But actually I am implementing an rectangular
> to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x). So
> I need arctan(y/x) not arctan(x) because a divider costs a lot with
> respect to area. I intend a accuracy of 4 decimal place which is q15 fixed
> point format.If go to lookup table, I need a huge memory of 2^16.
> Interpolation technique is good but I think for hardware implementation
> its quite complex ,I some cases I may need a divider or a multiplier. I
> also tried looking into the variants (formulas) of arctan but all need a
> divider in some or the other way .So if I can have methood which costs
> less on area and with faster speed than cordic then it will be good.
>
> Thanks and Regards
> Karthik W
>
>

Unfortunately, most of the algorithms for arctan involve division by a
variable, so I don't think you are going to find any that give much of a
net savings in latency over CORDIC for similar performance.  The best
performance will be from a look-up table, but you are obviously limited
by to a relatively small number of angles by the size of the memory.
Your best bet might be to use a cross between CORDIC and a look-up where
instead of having a binary rotation decision at each step, you have a
small finite number of rotation angles.  That result is then passed to
subsequent stages that have finer angular resolutions.  I think in the
end though, you'll wind up with about the same latency as the CORDIC.
The other advantage to using CORDIC is that you obtain not only the
arctan, but you also get the magnitude, and you don't need to compute a
square root for it.  The square root and the divide are both hardware
intensive and don't lend themselves well to parallelizing because the
intermediate results depend on the previous intermediate results.  Same
is true for CORDIC, only CORDIC gives you both functions at once.

Perhaps you can look at ways to speed up the CORDIC, maybe using a
multiplied clock or not pipelining at every stage.
```
```On Tue, 02 Oct 2007 01:35:10 -0500, "karthikw"
<karthikwali@gmail.com>
wrote:

>>Hi,
>>     I am  implementing arctan(x) in hardware can any one suggest me any
>>of the algorithms other than cordic because it takes lots of iterations
>to
>>reach the final result.
>>
>>Thanks and Regards
>>Karthik W
>>
>>
>Hi,
>    Thanks for the response. But actually I am implementing an rectangular
>to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x). So
>I need arctan(y/x) not arctan(x) because a divider costs a lot with
>respect to area. I intend a accuracy of 4 decimal place which is q15 fixed
>point format.If go to lookup table, I need a huge memory of 2^16.
>Interpolation technique is good but I think for hardware implementation
>its quite complex ,I some cases I may need a divider or a multiplier. I
>also tried looking into the variants (formulas) of arctan but all need a
>divider in some or the other way .So if I can have methood which costs
>less on area and with faster speed than cordic then it will be good.
>
>Thanks and Regards
>Karthik W

Hi Karthik,
ya' might take a look at
"Another Contender in The Arctangent Race",
by R. Lyons, IEEE Signal Processing Magazine,
DSP Tips & Tricks column, Vol. 21, No. 1,
Jan. 2004, page 109.

The algorithm there yields an arctan accuracy
of roughly one quarter of a degree, and does
not use a lookup table.  The algo is fairly
efficient except, darn it, it requires a
divide operation.

I don't know of a fast, accurate, table-free,
divide-free arctan algorithm.  If you find
one, Karthik, you'll become famous.

Good Luck,
[-Rick-]

```
```>On Oct 2, 2:35 am, "karthikw" <karthikw...@gmail.com> wrote:
>...
>> >     I am  implementing arctan(x) in hardware can any one suggest me
any
>> >of the algorithms other than cordic because it takes lots of
iterations to
>> >reach the final result.
>>
>...
>> ... I am implementing an rectangular
>> to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x).
So
>> I need arctan(y/x) not arctan(x) because a divider costs a lot with
>> respect to area. I intend a accuracy of 4 decimal place which is q15
fixed
>> point format.If go to lookup table, I need a huge memory of 2^16.
>> Interpolation technique is good but I think for hardware
implementation
>> its quite complex ,I some cases I may need a divider or a multiplier.
I
>> also tried looking into the variants (formulas) of arctan but all need
a
>> divider in some or the other way .So if I can have methood which costs
>> less on area and with faster speed than cordic then it will be good.
>
>i don't think you'll avoid division.  also you will have to break up
>your (x,y) coordinate pair into 4 quadrants (maybe 5, if you want your
>answer to alway be the Principal Angle which has magnitude less than
>pi) as is done in the atan2(x,y) function.  is this for FM
>demodulation?  ultimately, is what you want the *difference* in angles
>from the previous (complex) sample to the current:
>
>     arg{ (x[n] + j*y[n]) }  -  arg{ x[n-1] + j*y[n-1] }   ?
>
>previously on comp.dsp, but i do not know the links.
>
>is that where i can email you?
>
>r b-j
>

Hi,
Thanks for the reply. My mail id is karthikwali@gmail.com. I am
implementing  arctan(x,y) for FFT. The general formula for

atan(x,y) = i log((x + iy)/sqrt(x^2 + y^2)).

And you can see how complex it is to implement in hardware.So
I thought for going to cordic, but it is requiring 13 cycle latency if I
pipeline it.But if I want to calculate it at different times always there
will be a latency of 13 cycles.

Thanks and Regards
Karthik W

```
```On Oct 2, 2:35 am, "karthikw" <karthikw...@gmail.com> wrote:
...
> >     I am  implementing arctan(x) in hardware can any one suggest me any
> >of the algorithms other than cordic because it takes lots of iterations to
> >reach the final result.
>
...
> ... I am implementing an rectangular
> to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x). So
> I need arctan(y/x) not arctan(x) because a divider costs a lot with
> respect to area. I intend a accuracy of 4 decimal place which is q15 fixed
> point format.If go to lookup table, I need a huge memory of 2^16.
> Interpolation technique is good but I think for hardware implementation
> its quite complex ,I some cases I may need a divider or a multiplier. I
> also tried looking into the variants (formulas) of arctan but all need a
> divider in some or the other way .So if I can have methood which costs
> less on area and with faster speed than cordic then it will be good.

i don't think you'll avoid division.  also you will have to break up
your (x,y) coordinate pair into 4 quadrants (maybe 5, if you want your
answer to alway be the Principal Angle which has magnitude less than
pi) as is done in the atan2(x,y) function.  is this for FM
demodulation?  ultimately, is what you want the *difference* in angles
from the previous (complex) sample to the current:

arg{ (x[n] + j*y[n]) }  -  arg{ x[n-1] + j*y[n-1] }   ?

previously on comp.dsp, but i do not know the links.

is that where i can email you?

r b-j

```
```>Hi,
>     I am  implementing arctan(x) in hardware can any one suggest me any
>of the algorithms other than cordic because it takes lots of iterations
to
>reach the final result.
>
>Thanks and Regards
>Karthik W
>
>
Hi,
Thanks for the response. But actually I am implementing an rectangular
to polar cordinate conversion ie x + i y to sqrt(x^2 + y^2)arctan(y/x). So
I need arctan(y/x) not arctan(x) because a divider costs a lot with
respect to area. I intend a accuracy of 4 decimal place which is q15 fixed
point format.If go to lookup table, I need a huge memory of 2^16.
Interpolation technique is good but I think for hardware implementation
its quite complex ,I some cases I may need a divider or a multiplier. I
also tried looking into the variants (formulas) of arctan but all need a
divider in some or the other way .So if I can have methood which costs
less on area and with faster speed than cordic then it will be good.

Thanks and Regards
Karthik W

```
```robert bristow-johnson wrote:
> On Oct 1, 5:43 pm, Tim Wescott <t...@seemywebsite.com> wrote:
> ...
>> In which case a look-up table would work nice, except for the minor
>> problem of the infinite ordinate -- then one may want a lookup table
>> with ever-increasing intervals, which actually wouldn't be too bad to
>> implement.
>
> so how would one determine the index into the table without some
> repeated search and compare operations?

For floating point, look at the mantissa and do it sorta
logarithmically.  For fixed-point, count the number of zeros (preferably
with fast, pipelined logic).
>
> also, i think something like
>
>     arctan(1/x) = pi/2 - arctan(x)
>
> can be used for the the nearly infinite ordinates, no?  (a division is
> required.)
>
> dunno how this would work for hardware, but if a single division is
> tolerable and
> for -1 <= x <= 1, a very accurate approximation is:
>
>     arctan(x) ~= x/f(x^2)
>
> where
>
>     f(u)  =     1.0
>              +  0.33288950512027 * u
>              + -0.08467922817644 * u^2
>              +  0.03252232640125 * u^3
>              + -0.00749305860992 * u^4
>
> maybe too many terms, but it's not the finite power series that's bad,
> but the division by it that's costly in many different contexts.
>
> r b-j
>
>

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" gives you just what it
says.
See details at http://www.wescottdesign.com/actfes/actfes.html
```