Praveen,

See my last post for more comments, questions.

Dirk

"praveen" <praveenkumar_11@yahoo.com> wrote in message
news:d8daf655.0312150501.5ff9134e@posting.google.com...
> Hello,
>
> > Are you using single precision (16 bits) or double precision (32 bits)
> > each to store 'x' and 'y'?
>
> I am using double precision
>
>
> > Are you using integer or fractional math?
>
> integer
>
> > Are the values of 'x' and 'y' using the entire range of the number of
> > bits they are stored in?
>
> The range of x and y is a maximum of 2 and a minimum of -2, but I am
> using 32 bits to represent them.
>
>
> > How many bits represent 'ang'?
> i am using 32 bit
>
> > Why does i go from 0 to 25?
>
> Because my estimation accuracy should be of the order of 1
> microradian.
>
> > Describe the contents of your LUT.
> It contains values from 45 degrees to 0, with a step size of 45/26.
>
> static long LUT[26]={23592960,13927738,7359034,3735561,1875029,938429,469329,234679,117341,58671,29335,14668,7334,3667,1833,917,458,229,115,57,29,14,7,4,2,1};
>
>
> >
> > Have you verified that the result at each iteration of the loop is
> > what you expected? How about the final results? For what range of
> > input angles?
>
> Yes, the result is as expected. It is also 32 bits.
>
> waiting for reply
> with regards
> praveen

Hello, 

> Are you using single precision (16 bits) or double precision (32 bits)
> each to store 'x' and 'y'?

I am using double precision 


> Are you using integer or fractional math?

integer

> Are the values of 'x' and 'y' using the entire range of the number of
> bits they are stored in?

The range of x and y is a maximum of 2 and a minimum of -2, but I am
using 32 bits to represent them.


> How many bits represent 'ang'?
i am using 32 bit

> Why does i go from 0 to 25?

Because my estimation accuracy should be of the order of 1
microradian.

> Describe the contents of your LUT.
It contains values from 45 degrees to 0, with a step size of 45/26.

static long LUT[26]={23592960,13927738,7359034,3735561,1875029,938429,469329,234679,117341,58671,29335,14668,7334,3667,1833,917,458,229,115,57,29,14,7,4,2,1};
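
Read as degrees scaled by 2^19, the first entry of this table is exactly
45 and each later entry is roughly half the one before, which is the
pattern of atan(2^-i) rather than uniform 45/26 steps. That scaling is
inferred from the numbers; it is not stated in the post. Under that
assumption 1 microradian is about 30 table units, and a small sketch
(helper names are hypothetical) can report where the table step drops
below a given size:

```c
#include <stdint.h>

/* Table copied from the post. ASSUMPTION: units are degrees * 2^19. */
static const int32_t LUT[26] = {
    23592960, 13927738, 7359034, 3735561, 1875029, 938429, 469329,
    234679, 117341, 58671, 29335, 14668, 7334, 3667, 1833, 917, 458,
    229, 115, 57, 29, 14, 7, 4, 2, 1
};

/* Hypothetical helper: decode entry i to degrees under the assumed scale. */
double lut_degrees(int i)
{
    return (double)LUT[i] / 524288.0;   /* 524288 = 2^19 */
}

/* Hypothetical helper: number of iterations (i = 0 .. n-1) after which
 * the table step is at or below limit_lsb table units.
 * Returns -1 if no entry is that small. */
int iterations_for_step(int32_t limit_lsb)
{
    for (int i = 0; i < 26; i++) {
        if (LUT[i] <= limit_lsb)
            return i + 1;
    }
    return -1;
}
```

With these values iterations_for_step(30) reports 21, because LUT[20] is
29. This only looks at the angle step; rounding error in x and y is
ignored.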


> 
> Have you verified that the result at each iteration of the loop is
> what you expected? How about the final results? For what range of
> input angles?

Yes, the result is as expected. It is also 32 bits.

waiting for reply
with regards
praveen

Praveen,

A few comments:

1) The code as presented assumes that x>=0. Max total shift possible in your
code is a little more than 90 degrees.
2) From your comments you are using double precision variables and math,
which is expensive computationally. Depending on your application single
precision might work adequately.
3) The accuracy you say you need does not require the loop to iterate
26 times.
4) Your cordic code is short enough that you should be able to determine the
assembly code generated and present that to the group for suggestions of
what to change. The question of how many shifts the C compiler is using to
implement '>>i' would be answered by this.  If the answer is 'i' shifts then
there are simple alternatives to save processing. Other potential problems
may also be apparent.
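
For reference while reading the comments above: Praveen's routine is not
reproduced in this post, so the following is only a sketch of a
vectoring-mode CORDIC loop of the kind under discussion, not his code.
The function name is hypothetical. It assumes x >= 0 on entry, inputs
small enough that the CORDIC gain (about 1.65) cannot overflow 32 bits,
table units of degrees * 2^19, and that >> on a negative value is an
arithmetic shift (implementation-defined in C). The two '>> i'
operations are the shifts whose generated assembly is in question.

```c
#include <stdint.h>

/* Table from the thread. ASSUMPTION: units are degrees * 2^19. */
static const int32_t LUT[26] = {
    23592960, 13927738, 7359034, 3735561, 1875029, 938429, 469329,
    234679, 117341, 58671, 29335, 14668, 7334, 3667, 1833, 917, 458,
    229, 115, 57, 29, 14, 7, 4, 2, 1
};

/* Hypothetical sketch: returns the angle of (x, y) in table units.
 * Requires x >= 0 and |x|, |y| <= 2^28 or so to leave room for gain. */
int32_t cordic_atan2_q19(int32_t x, int32_t y)
{
    int32_t ang = 0;

    for (int i = 0; i < 26; i++) {
        int32_t xs = x >> i;    /* variable shift 1 */
        int32_t ys = y >> i;    /* variable shift 2 */

        if (y >= 0) {
            /* Rotate clockwise by atan(2^-i), driving y toward 0. */
            x += ys;
            y -= xs;
            ang += LUT[i];
        } else {
            /* Rotate counter-clockwise by atan(2^-i). */
            x -= ys;
            y += xs;
            ang -= LUT[i];
        }
    }
    return ang;
}
```

Calling it with x = y = 1 << 28 should return a value close to LUT[0]
(45 degrees), and with y = 0 a value close to 0. If the compiler turns
each '>> i' into i single-bit shifts, those two lines are where the
cycles would go.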

A few more questions:

1) The original values loaded into x and y have how many bits each?
2) Where are they placed in the 32 bits of the x and y variables prior to
starting the routine?

Dirk A. Bell
DSP Consultant


"praveen" <praveenkumar_11@yahoo.com> wrote in message
news:d8daf655.0312122059.1de8f781@posting.google.com...
> Hello,
>
> I need 25 iteration,so my look up table is 4 byte size each,
> Size of x,y,atan is 4 byte each. My accuracy of estimation is of the
> order 1 microradian.
>
> waiting for reply
> with regards
> praveen

Randy Yates wrote:

   ...

> Jerry,
> 
> Here's an intrinsic right out of TI's documentation for the C54x C
> compiler:
> 
> long _smac(long src, int op1, int op2);
>    Instruction: MAC
>    Multiplies op1 and op2, shifts the result left by 1, and adds it
>    to src. Produces a saturated 32-bit result. (OVM and FRCT set)
> 
> Now that's a pretty special-purpose intrinsic that is essentially
> tied to the architecture of the machine. This is precisely what
> I mean. It's not the implementations of intrinsics, it's their very
> definitions.

Right, but a mac needs special code to set up and finish. _smac() is
good for the middle only. On long filters, that's most of it.

> At some point you just *cannot* abstract or generalize operations
> since it's those very operations that give you the performance
> improvement. Sort of a physical law.

No HLL I know gives access to flags like carry and overflow. It's hard
to write efficient code without them. For example, dividing by 2^n with 
rounding is best done by right_shift n, add_immediate 0 with carry. The
rounding operation is one instruction in assembler. How many in C?
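
C has no carry flag to add, so one way to get the same result is to pick
out the last bit shifted off and add it back by hand. A minimal sketch
with a hypothetical function name, assuming n is between 1 and 31 and
that >> on a negative value is an arithmetic shift (implementation-
defined in C, though common):

```c
#include <stdint.h>

/* Hypothetical helper: x / 2^n, rounded to nearest (ties toward +inf).
 * ASSUMPTIONS: 1 <= n <= 31; >> of a negative int32_t is arithmetic. */
int32_t shift_right_round(int32_t x, unsigned n)
{
    /* The bit that a right shift by n would have moved into carry. */
    int32_t carry = (x >> (n - 1)) & 1;

    /* Shift, then "add 0 with carry". */
    return (x >> n) + carry;
}
```

For example, shift_right_round(5, 1) gives 3 and shift_right_round(5, 2)
gives 1. How many instructions a given compiler emits for this would
have to be checked in its assembly output.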

> Here's another thing I don't like about being tied to C. Many times
> a hard-core assembly optimization requires organizing the data in
> a very specific way. Now sure, you could organize it that way in
> C too, but if you were just thinking in C you wouldn't think to
> do the organization in the first place because you wouldn't be
> doing the low-level instructions.
> 
> Like I said, assembly: learn it, live it, love it.

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Jerry Avins wrote:
> Randy Yates wrote:
> 
>> Matt Timmermans wrote:
>>
>>> "Randy Yates" <yates@ieee.org> wrote in message
>>> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
>>>
>>>> You can. Write your time-critical code in assembly
>>>> and make it C-callable. Intrinsics have the same
>>>> problem as the one Jim addressed - in the time it
>>>> takes to learn and apply them, you could've written
>>>> in assembly, and the code is still less readable and
>>>> unportable - two big reasons for writing in C to begin
>>>> with.
>>>
>>>
>>>
>>>
>>> No fair, Randy -- you didn't count the time it takes to learn and apply
>>> assembly.  And if you have one intrinsics library across platforms, 
>>> you only
>>> have to learn that once.
>>
>>
>>
>> What if different processors require different intrinsics?
> 
> 
> The point is that the compiler vendor should be writing the processor-
> specific intrinsics packages. Just as compilers can optimize the same
> code for a pentium or a PPC depending on switch settings.

Jerry,

Here's an intrinsic right out of TI's documentation for the C54x C
compiler:

long _smac(long src, int op1, int op2);
   Instruction: MAC
   Multiplies op1 and op2, shifts the result left by 1, and adds it
   to src. Produces a saturated 32-bit result. (OVM and FRCT set)

Now that's a pretty special-purpose intrinsic that is essentially
tied to the architecture of the machine. This is precisely what
I mean. It's not the implementations of intrinsics, it's their very
definitions.
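
To make the point concrete, here is roughly what that one-line
description costs when spelled out in portable C. This is a sketch of
the behavior as quoted, not TI's implementation and not guaranteed
bit-exact with the hardware: the name smac_model is hypothetical, the
fixed-width types stand in for the C54x's 16-bit int and 32-bit long,
and saturating once on the final sum (rather than also on the shifted
product) is an assumption.

```c
#include <stdint.h>

/* Hypothetical portable model of the quoted _smac description.
 * ASSUMPTION: a single saturation is applied to the final sum. */
int32_t smac_model(int32_t src, int16_t op1, int16_t op2)
{
    /* Multiply, then "shift left by 1" (done as *2 to avoid shifting
     * a negative value), all in 64 bits so nothing wraps. */
    int64_t prod = (int64_t)op1 * (int64_t)op2 * 2;
    int64_t sum  = (int64_t)src + prod;

    /* Saturate to the 32-bit range. */
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

For example, smac_model(0, 16384, 16384) gives 536870912, which is 0.25
when the operands are read as 16-bit fractions and the result as a
32-bit fraction.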

At some point you just *cannot* abstract or generalize operations
since it's those very operations that give you the performance
improvement. Sort of a physical law.

Here's another thing I don't like about being tied to C. Many times
a hard-core assembly optimization requires organizing the data in
a very specific way. Now sure, you could organize it that way in
C too, but if you were just thinking in C you wouldn't think to
do the organization in the first place because you wouldn't be
doing the low-level instructions.

Like I said, assembly: learn it, live it, love it.

> Given the small customer base, I don't see that happening. Until it
> does, people like you, who can write in assembler, will be in demand.

Isn't it funny how one's hearing improves a hundred-fold when praises
are being said? Thank you, Jerry. I sure hope you're right.
-- 
%  Randy Yates                  % "...the answer lies within your soul
%% Fuquay-Varina, NC            %       'cause no one knows which side
%%% 919-577-9882                %                   the coin will fall."
%%%% <yates@ieee.org>           %  'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr

Randy Yates wrote:

> Matt Timmermans wrote:
> 
>> "Randy Yates" <yates@ieee.org> wrote in message
>> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
>>
>>> You can. Write your time-critical code in assembly
>>> and make it C-callable. Intrinsics have the same
>>> problem as the one Jim addressed - in the time it
>>> takes to learn and apply them, you could've written
>>> in assembly, and the code is still less readable and
>>> unportable - two big reasons for writing in C to begin
>>> with.
>>
>>
>>
>> No fair, Randy -- you didn't count the time it takes to learn and apply
>> assembly.  And if you have one intrinsics library across platforms, 
>> you only
>> have to learn that once.
> 
> 
> What if different processors require different intrinsics?

The point is that the compiler vendor should be writing the processor-
specific intrinsics packages. Just as compilers can optimize the same
code for a pentium or a PPC depending on switch settings.

Given the small customer base, I don't see that happening. Until it
does, people like you, who can write in assembler, will be in demand.

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Jerry Avins wrote:

   ...
>   ... There's a Forth 
> with extensions for the 'C31, but the author isn't proud enough of it to 
> release it. He's accustomed to deep optimizing, but this one is better 
> than C (for that machine), but not close enough to assembly.

But, but, but ....  That guy ought to learn how to write.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Matt Timmermans wrote:

> "Randy Yates" <yates@ieee.org> wrote in message
> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
> 
>>You can. Write your time-critical code in assembly
>>and make it C-callable. Intrinsics have the same
>>problem as the one Jim addressed - in the time it
>>takes to learn and apply them, you could've written
>>in assembly, and the code is still less readable and
>>unportable - two big reasons for writing in C to begin
>>with.
> 
> 
> No fair, Randy -- you didn't count the time it takes to learn and apply
> assembly.  And if you have one intrinsics library across platforms, you only
> have to learn that once.

What if different processors require different intrinsics?

> Once you've learned to program in a couple assembly languages, learning
> another one is a waste of neurons. 

It may be a pain in the ass, but concluding it's a waste of neurons
is a bit presumptuous. It depends on what your situation is.

One scenario is where the optimizations gained provide
real performance improvements to the end-user and/or enable you
to cost-reduce a mass-marketed product. I did just that - if you
buy a Sony Ericsson T226, T230, or T237, you'll be buying just
such a solution. And I'm here to tell you, it feels damn good to
be able to do this for your company. (I really don't want to go
into it since it may be IP-sensitive.)

> It won't teach you any great truths or
> help you think differently -- it's just work that becomes worthless when you
> want to switch processor families.

If it saved a crapload of money, then I wouldn't call that worthless.

>>I think folks who cling to C are in denial - you're
>>just gonna have to break down and code in assembly if
>>you want optimum performance. Learn it. Live it. Love it.
> 
> 
> Well, yeah, that's certainly true as things stand, but that's mostly because
> C sucks for DSP, and that's mostly because C was optimized for different
> kinds of CPUs.  Poor DSP performance is not an inescapable characteristic of
> all higher level languages.  There's no reason a DSP-optimized C-level
> language couldn't get more than half the performance in a similar space for
> less than half the effort.  In the current scheme of things, that is almost
> always a trade-off I'd jump at.

It may be that we really agree with each other, Matt. I certainly agree that
it isn't worth spending a month optimizing some code if it's just for a test
fixture that will be used for a few weeks. But like I said above, whether the
extra time and effort are really worth it or not depends on the situation, and
some situations DEFINITELY warrant the descent into hard-core assembly.

> And an intrinsics library can be just like having a new language, except
> without the nifty syntactic sugar and stricter type checking that you'd get
> if you'd designed a really new language instead.

I dunno, it'd have to serve me breakfast before I'd agree we need YANL (yet
another new language). I really don't care for C#, and I'd even challenge
Perl and such. Just use C, man (or C++).
-- 
%  Randy Yates                  % "...the answer lies within your soul
%% Fuquay-Varina, NC            %       'cause no one knows which side
%%% 919-577-9882                %                   the coin will fall."
%%%% <yates@ieee.org>           %  'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr

John Monro wrote:

   ...

> Praveen,
> That number of cycles seems high. The DSP chip is capable of doing a 
> multi-bit shift in a single cycle, using its barrel shifter.
> The compiler should compile each 'C'  multi-bit shift into a single 
> assembly-language shift. If, for some reason  the compiler is producing 
> a number of single-bit shifts for each multi-bit shift
> then that would account for the high cycle count.
> 
> Regards,
> John

Here we go, psyching out a stupid compiler again. (Well, pretty smart 
actually, but stupid compared to Praveen.) And when you modify the .obj 
file to remove the extra instructions, all subsequent labels need 
address fix-ups. (Until you slow the program down a bit by forcing the 
compiler to do the fix-up, or insert no-ops where their only harm is 
taking up room.)

Jerry
-- 
Engineering is the art of making what you want from things you can get.