comp.dsp | how to optimize c code of Cordic algorithm| page 2

Reply by praveen ●December 13, 2003

Hello,

I need 25 iterations, so each entry of my lookup table is 4 bytes,
and x, y and atan are 4 bytes each. My accuracy of estimation is of the
order of 1 microradian.

waiting for reply
with regards
praveen

Reply by Matt Timmermans ●December 13, 2003

"Randy Yates" <yates@ieee.org> wrote in message
news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
> You can. Write your time-critical code in assembly
> and make it C-callable. Intrinsics have the same
> problem as the one Jim addressed - in the time it
> takes to learn and apply them, you could've written
> in assembly, and the code is still less readable and
> unportable - two big reasons for writing in C to begin
> with.

No fair, Randy -- you didn't count the time it takes to learn and apply
assembly.  And if you have one intrinsics library across platforms, you only
have to learn that once.
Once you've learned to program in a couple assembly languages, learning
another one is a waste of neurons.  It won't teach you any great truths or
help you think differently -- it's just work that becomes worthless when you
want to switch processor families.

> I think folks who cling to C are in denial - you're
> just gonna have to break down and code in assembly if
> you want optimum performance. Learn it. Live it. Love it.

Well, yeah, that's certainly true as things stand, but that's mostly because
C sucks for DSP, and that's mostly because C was optimized for different
kinds of CPUs.  Poor DSP performance is not an inescapable characteristic of
all higher level languages.  There's no reason a DSP-optimized C-level
language couldn't get more than half the performance in a similar space for
less than half the effort.  In the current scheme of things, that is almost
always a trade-off I'd jump at.

And an intrinsics library can be just like having a new language, except
without the nifty syntactic sugar and stricter type checking that you'd get
if you'd designed a really new language instead.

Reply by John Monro ●December 13, 2003

praveen wrote:

>Hello,
>I have implemented CORDIC for finding the atan on the ADSP-2191, but it
>takes 3714 cycles to execute. I have implemented it in C. Can
>someone tell me how I can optimize the code so that it takes fewer
>than 500 cycles?
>My code is below.
>LUT is the lookup table;
>x and y are the two inputs whose atan is to be determined.
>
>for(i=0;i<=25;i++)
>	{
>		x1=x;	/* keep the old x for the y update */
>		if (y>0)
>		{
>			x=x+(y>>i);
>			y=y-(x1>>i);
>			ang=ang+LUT[i];
>		}
>		else
>		{
>			x=x-(y>>i);
>			y=y+(x1>>i);
>			ang=ang-LUT[i];
>		}
>	}
>
>Please suggest a technique by which I can reduce the number of cycles.
>
>Waiting for reply
>With regards
>praveen
>  
>
Praveen,
That number of cycles seems high.
The DSP chip is capable of doing a multi-bit shift in a single cycle,
using its barrel shifter.
The compiler should compile each C multi-bit shift into a single
assembly-language shift.
If, for some reason, the compiler is producing a number of single-bit
shifts for each multi-bit shift, then that would account for the high
cycle count.

Regards,
John
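
For reference, the loop praveen posted can be written as a self-contained routine along the following lines. This is a sketch under stated assumptions, not code from the thread: the names cordic_init, cordic_atan and atan_series are hypothetical, the angle scaling (radians times 2^29) is an assumed choice, and int32_t stands in for the 4-byte variables praveen describes. The ADSP-2191 is a 16-bit part, so each 32-bit shift or add here presumably expands to a multi-word sequence on the target, which may matter as much as how the shift itself is compiled.

```c
#include <stdint.h>

#define CORDIC_ITERS    26   /* the posted loop runs i = 0..25 */
#define ANGLE_FRAC_BITS 29   /* assumed scaling: radians * 2^29 */

static int32_t lut[CORDIC_ITERS];   /* lut[i] = atan(2^-i), scaled */

/* atan(x) for 0 < x <= 0.5 via the series x - x^3/3 + x^5/5 - ... */
static double atan_series(double x)
{
    double sum = 0.0, term = x, x2 = x * x;
    for (int k = 0; k < 40; k++) {
        sum += term / (2 * k + 1);
        term = -term * x2;
    }
    return sum;
}

/* Fill the table once at start-up (or precompute it offline). */
void cordic_init(void)
{
    const double scale = (double)(INT32_C(1) << ANGLE_FRAC_BITS);
    double x = 1.0;                            /* x = 2^-i */
    for (int i = 0; i < CORDIC_ITERS; i++) {
        double a = (i == 0) ? 0.78539816339744831 : atan_series(x);
        lut[i] = (int32_t)(a * scale + 0.5);   /* round to nearest, a > 0 */
        x *= 0.5;
    }
}

/*
 * Vectoring-mode CORDIC: returns atan(y/x) in the scaled format above.
 * Assumes x > 0, |x| and |y| below 2^29 (headroom for the CORDIC gain
 * of about 1.647), and an arithmetic >> on negative values, which C
 * leaves implementation-defined.
 */
int32_t cordic_atan(int32_t x, int32_t y)
{
    int32_t ang = 0;
    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t xs = x >> i;   /* both shifts use the pre-update values */
        int32_t ys = y >> i;
        if (y > 0) { x += ys; y -= xs; ang += lut[i]; }
        else       { x -= ys; y += xs; ang -= lut[i]; }
    }
    return ang;
}
```

After one call to cordic_init(), cordic_atan(x, y) with x equal to y should return a value close to pi/4 in the assumed scaling. Taking both shifts before the updates replaces the x1 temporary of the posted loop without changing the arithmetic.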

Reply by Jerry Avins ●December 13, 2003

Matt Timmermans wrote:

   ...

> Because the translation to assembly on any given platform is obvious, you
> can hand-optimize C code for that platform in a predictable way, and your
> code would remain portable to the extent that your program would have the
> same semantic meaning across all platforms, even though you might want to
> re-optimize it for processors that were significantly different.  You also
> get to let the compiler manage register allocation, stack shuffling,
> instruction scheduling, type checking, and all that tedious stuff that
> compilers are better at than people these days.

More easily said than done, but that's the general idea. There's a Forth 
with extensions for the 'C31, but the author isn't proud enough of it to 
release it. He's accustomed to deep optimizing, but this one is better 
than C (for that machine), but not close enough to assembly.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Jerry Avins ●December 13, 2003

John Monro wrote:

   ...

> Praveen,
> That number of cycles seems high. The DSP chip is capable of doing a 
> multi-bit shift in a single cycle, using its barrel shifter.
> The compiler should compile each 'C'  multi-bit shift into a single 
> assembly-language shift. If, for some reason  the compiler is producing 
> a number of single-bit shifts for each multi-bit shift
> then that would account for the high cycle count.
> 
> Regards,
> John

Here we go, psyching out a stupid compiler again. (Well, pretty smart 
actually, but stupid compared to Praveen.) And when you modify the .obj 
file to remove the extra instructions, all subsequent labels need 
address fix-ups. (Until you slow the program down a bit by forcing the 
compiler to do the fix-up, or insert no-ops where their only harm is 
taking up room.)

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Randy Yates ●December 13, 2003

Matt Timmermans wrote:

> "Randy Yates" <yates@ieee.org> wrote in message
> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
> 
>>You can. Write your time-critical code in assembly
>>and make it C-callable. Intrinsics have the same
>>problem as the one Jim addressed - in the time it
>>takes to learn and apply them, you could've written
>>in assembly, and the code is still less readable and
>>unportable - two big reasons for writing in C to begin
>>with.
> 
> 
> No fair, Randy -- you didn't count the time it takes to learn and apply
> assembly.  And if you have one intrinsics library across platforms, you only
> have to learn that once.

What if different processors require different intrinsics?

> Once you've learned to program in a couple assembly languages, learning
> another one is a waste of neurons. 

It may be a pain in the ass, but concluding it's a waste of neurons
is a bit presumptuous. It depends on what your situation is.

One scenario is where the optimizations gained provide
real performance improvements to the end-user and/or enable you
to cost-reduce a mass-marketed product. I did just that - if you
buy a Sony Ericsson T226, T230, or T237, you'll be buying just
such a solution. And I'm here to tell you, it feels damn good to
be able to do this for your company. (I really don't want to go
into it since it may be IP-sensitive.)

> It won't teach you any great truths or
> help you think differently -- it's just work that becomes worthless when you
> want to switch processor families.

If it saved a crapload of money, then I wouldn't call that worthless.

>>I think folks who cling to C are in denial - you're
>>just gonna have to break down and code in assembly if
>>you want optimum performance. Learn it. Live it. Love it.
> 
> 
> Well, yeah, that's certainly true as things stand, but that's mostly because
> C sucks for DSP, and that's mostly because C was optimized for different
> kinds of CPUs.  Poor DSP performance is not an inescapable characteristic of
> all higher level languages.  There's no reason a DSP-optimized C-level
> language couldn't get more than half the performance in a similar space for
> less than half the effort.  In the current scheme of things, that is almost
> always a trade-off I'd jump at.

It may be that we really agree with each other, Matt. I certainly agree that
it isn't worth spending a month optimizing some code if it's just for a test
fixture that will be used for a few weeks. But like I said above, whether the
extra time and effort are really worth it or not depends on the situation, and
some situations DEFINITELY warrant the descent into hard-core assembly.

> And an intrinsics library can be just like having a new language, except
> without the nifty syntactic sugar and stricter type checking that you'd get
> if you'd designed a really new language instead.

I dunno, it'd have to serve me breakfast before I'd agree we need YANL (yet
another new language). I really don't care for C#, and I'd even challenge
Perl and such. Just use C, man (or C++).
-- 
%  Randy Yates                  % "...the answer lies within your soul
%% Fuquay-Varina, NC            %       'cause no one knows which side
%%% 919-577-9882                %                   the coin will fall."
%%%% <yates@ieee.org>           %  'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr

Reply by Jerry Avins ●December 13, 2003

Jerry Avins wrote:

   ...
>   ... There's a Forth 
> with extensions for the 'C31, but the author isn't proud enough of it to 
> release it. He's accustomed to deep optimizing, but this one is better 
> than C (for that machine), but not close enough to assembly.

But, but, but ....  That guy ought to learn how to write.

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Jerry Avins ●December 13, 2003

Randy Yates wrote:

> Matt Timmermans wrote:
> 
>> "Randy Yates" <yates@ieee.org> wrote in message
>> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
>>
>>> You can. Write your time-critical code in assembly
>>> and make it C-callable. Intrinsics have the same
>>> problem as the one Jim addressed - in the time it
>>> takes to learn and apply them, you could've written
>>> in assembly, and the code is still less readable and
>>> unportable - two big reasons for writing in C to begin
>>> with.
>>
>>
>>
>> No fair, Randy -- you didn't count the time it takes to learn and apply
>> assembly.  And if you have one intrinsics library across platforms, 
>> you only
>> have to learn that once.
> 
> 
> What if different processors require different intrinsics?

The point is that the compiler vendor should be writing the
processor-specific intrinsics packages, just as compilers can optimize
the same code for a Pentium or a PPC depending on switch settings.

Given the small customer base, I don't see that happening. Until it
does, people like you, who can write in assembler, will be in demand.

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.

Reply by Randy Yates ●December 13, 2003

Jerry Avins wrote:
> Randy Yates wrote:
> 
>> Matt Timmermans wrote:
>>
>>> "Randy Yates" <yates@ieee.org> wrote in message
>>> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
>>>
>>>> You can. Write your time-critical code in assembly
>>>> and make it C-callable. Intrinsics have the same
>>>> problem as the one Jim addressed - in the time it
>>>> takes to learn and apply them, you could've written
>>>> in assembly, and the code is still less readable and
>>>> unportable - two big reasons for writing in C to begin
>>>> with.
>>>
>>>
>>>
>>>
>>> No fair, Randy -- you didn't count the time it takes to learn and apply
>>> assembly.  And if you have one intrinsics library across platforms, 
>>> you only
>>> have to learn that once.
>>
>>
>>
>> What if different processors require different intrinsics?
> 
> 
> The point is that the compiler vendor should be writing the
> processor-specific intrinsics packages, just as compilers can optimize
> the same code for a Pentium or a PPC depending on switch settings.

Jerry,

Here's an intrinsic right out of TI's documentation for the C54x C
compiler:

long _smac(long src, int op1, int op2);
    Assembly instruction: MAC
    Description: Multiplies op1 and op2, shifts the result left by 1, and
    adds it to src. Produces a saturated 32-bit result. (OVM and FRCT set)

Now that's a pretty special-purpose intrinsic that is essentially
tied to the architecture of the machine. This is precisely what
I mean. It's not the implementations of intrinsics, it's their very
definitions.
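
To make the quoted description concrete, here is a rough portable C model of what it says. It is a sketch, not TI's implementation: the name smac_model is hypothetical, int16_t and int32_t are assumed to match the C54x int and long, and only the final sum is saturated, so corner cases that depend on the accumulator width or on how OVM and FRCT act on intermediate values may differ on the real part.

```c
#include <stdint.h>

/*
 * Model of the quoted description: src + ((op1 * op2) << 1),
 * saturated to 32 bits. The arithmetic is done in 64 bits so the
 * intermediate values cannot overflow in C.
 */
int32_t smac_model(int32_t src, int16_t op1, int16_t op2)
{
    int64_t prod = (int64_t)op1 * op2;      /* exact 16 x 16 product */
    int64_t sum  = (int64_t)src + prod * 2; /* "shifts the result left by 1" */

    if (sum > INT32_MAX) sum = INT32_MAX;   /* saturate instead of wrapping */
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (int32_t)sum;
}
```

With Q15 operands, smac_model(0, 16384, 16384) multiplies 0.5 by 0.5 and yields 0.25 in Q31. The extra left shift is what makes the fractional formats line up, and it is that pairing of shift and saturation with the multiply that ties the definition to the machine.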

At some point you just *cannot* abstract or generalize operations
since it's those very operations that give you the performance
improvement. Sort of a physical law.

Here's another thing I don't like about being tied to C. Many times
a hard-core assembly optimization requires organizing the data in
a very specific way. Now sure, you could organize it that way in
C too, but if you were just thinking in C you wouldn't think to
do the organization in the first place because you wouldn't be
doing the low-level instructions.

Like I said, assembly: learn it, live it, love it.

> Given the small customer base, I don't see that happening. Until it
> does, people like you, who can write in assembler, will be in demand.

Isn't it funny how one's hearing improves a hundred-fold when praises
are being said? Thank you, Jerry. I sure hope you're right.
-- 
%  Randy Yates                  % "...the answer lies within your soul
%% Fuquay-Varina, NC            %       'cause no one knows which side
%%% 919-577-9882                %                   the coin will fall."
%%%% <yates@ieee.org>           %  'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr

Reply by Jerry Avins ●December 13, 2003

Randy Yates wrote:

   ...

> Jerry,
> 
> Here's an intrinsic right out of TI's documentation for the C54x C
> compiler:
> 
> long _smac(long src, int op1, int op2);
>     Assembly instruction: MAC
>     Description: Multiplies op1 and op2, shifts the result left by 1, and
>     adds it to src. Produces a saturated 32-bit result. (OVM and FRCT set)
> 
> Now that's a pretty special-purpose intrinsic that is essentially
> tied to the architecture of the machine. This is precisely what
> I mean. It's not the implementations of intrinsics, it's their very
> definitions.

Right, but a mac needs special code to set up and finish. _smac() is
good for the middle only. On long filters, that's most of it.

> At some point you just *cannot* abstract or generalize operations
> since it's those very operations that give you the performance
> improvement. Sort of a physical law.

No HLL I know gives access to flags like carry and overflow. It's hard
to write efficient code without them. For example, dividing by 2^n with 
rounding is best done by right_shift n, add_immediate 0 with carry. The
rounding operation is one instruction in assembler. How many in C?
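
One way to express the same rounding in C, as a sketch only: the carry flag after the shift holds the last bit shifted out, and C can recover that bit with a second shift and a mask. The name shift_round is hypothetical, n is assumed to be between 1 and 31, and >> on a negative value is assumed to be arithmetic, which C leaves implementation-defined. Whether a given compiler folds this back into a shift plus add-with-carry is not something the source states.

```c
#include <stdint.h>

/*
 * Divide x by 2^n, rounding to nearest with ties toward +infinity.
 * Bit n-1 of x is the last bit shifted out, i.e. what the carry flag
 * would hold after the right shift on the DSP.
 */
int32_t shift_round(int32_t x, unsigned n)
{
    int32_t carry = (x >> (n - 1)) & 1;   /* recover the shifted-out bit */
    return (x >> n) + carry;              /* "add 0 with carry" */
}
```

For example, shift_round(5, 1) gives 3 and shift_round(-5, 2) gives -1. Unlike the common (x + (1 << (n - 1))) >> n form, this version cannot overflow for x near INT32_MAX, at the cost of one more shift and a mask.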

> Here's another thing I don't like about being tied to C. Many times
> a hard-core assembly optimization requires organizing the data in
> a very specific way. Now sure, you could organize it that way in
> C too, but if you were just thinking in C you wouldn't think to
> do the organization in the first place because you wouldn't be
> doing the low-level instructions.
> 
> Like I said, assembly: learn it, live it, love it.

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.
