DSPRelated.com
Forums

How to optimize C code for the CORDIC algorithm

Started by praveen December 11, 2003
Hello,

I need 25 iterations, so my lookup table has 25 entries of 4 bytes each.
x, y and the atan result are 4 bytes each. The accuracy I need is on the
order of 1 microradian.

waiting for reply
with regards
praveen
"Randy Yates" <yates@ieee.org> wrote in message
news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
> You can. Write your time-critical code in assembly and make it C-callable. Intrinsics have the same problem as the one Jim addressed - in the time it takes to learn and apply them, you could've written in assembly, and the code is still less readable and unportable - two big reasons for writing in C to begin with.
No fair, Randy -- you didn't count the time it takes to learn and apply assembly. And if you have one intrinsics library across platforms, you only have to learn that once. Once you've learned to program in a couple of assembly languages, learning another one is a waste of neurons. It won't teach you any great truths or help you think differently -- it's just work that becomes worthless when you want to switch processor families.
> I think folks who cling to C are in denial - you're just gonna have to break down and code in assembly if you want optimum performance. Learn it. Live it. Love it.
Well, yeah, that's certainly true as things stand, but that's mostly because C sucks for DSP, and that's mostly because C was optimized for different kinds of CPUs. Poor DSP performance is not an inescapable characteristic of all higher-level languages. There's no reason a DSP-optimized C-level language couldn't get more than half the performance in a similar space for less than half the effort. In the current scheme of things, that is almost always a trade-off I'd jump at. And an intrinsics library can be just like having a new language, except without the nifty syntactic sugar and stricter type checking that you'd get if you'd designed a really new language instead.
praveen wrote:

> Hello,
> I have implemented CORDIC for finding the atan on the ADSP-2191, but it
> takes 3714 cycles to execute. I have implemented it in C. Can someone
> tell me how I can optimize the code so that it takes less than 500
> cycles?
> My code is below. LUT is the lookup table; x and y are the two inputs
> whose atan is to be determined.
>
> for(i=0;i<=25;i++)
>  {
>   x1=x;
>   if (y>0)
>    {
>     x=x+(y>>i);
>     y=y-(x1>>i);
>     ang=ang+LUT[i];
>    }
>   else
>    {
>     x=x-(y>>i);
>     y=y+(x1>>i);
>     ang=ang-LUT[i];
>    }
>
> Please suggest techniques by which I can reduce the number of cycles.
>
> Waiting for reply
> With regards
> praveen
Praveen,

That number of cycles seems high. The DSP chip is capable of doing a multi-bit shift in a single cycle, using its barrel shifter. The compiler should compile each 'C' multi-bit shift into a single assembly-language shift. If, for some reason, the compiler is producing a number of single-bit shifts for each multi-bit shift, then that would account for the high cycle count.

Regards,
John
Matt Timmermans wrote:

   ...

> Because the translation to assembly on any given platform is obvious, you can hand-optimize C code for that platform in a predictable way, and your code would remain portable to the extent that your program would have the same semantic meaning across all platforms, even though you might want to re-optimize it for processors that were significantly different. You also get to let the compiler manage register allocation, stack shuffling, instruction scheduling, type checking, and all that tedious stuff that compilers are better at than people these days.
More easily said than done, but that's the general idea. There's a Forth with extensions for the 'C31, but the author isn't proud enough of it to release it. He's accustomed to deep optimizing, but this one is better than C (for that machine), but not close enough to assembly.

Jerry
--
Engineering is the art of making what you want from things you can get.
John Monro wrote:

   ...

> Praveen,
>
> That number of cycles seems high. The DSP chip is capable of doing a multi-bit shift in a single cycle, using its barrel shifter. The compiler should compile each 'C' multi-bit shift into a single assembly-language shift. If, for some reason the compiler is producing a number of single-bit shifts for each multi-bit shift then that would account for the high cycle count.
>
> Regards,
> John
Here we go, psyching out a stupid compiler again. (Well, pretty smart actually, but stupid compared to Praveen.) And when you modify the .obj file to remove the extra instructions, all subsequent labels need address fix-ups. (That is, until you slow the program down a bit by forcing the compiler to do the fix-ups, or insert no-ops where their only harm is taking up room.)

Jerry
Matt Timmermans wrote:

> "Randy Yates" <yates@ieee.org> wrote in message
> news:tTvCb.64$im.58@newsread2.news.atl.earthlink.net...
>
>> You can. Write your time-critical code in assembly and make it C-callable. Intrinsics have the same problem as the one Jim addressed - in the time it takes to learn and apply them, you could've written in assembly, and the code is still less readable and unportable - two big reasons for writing in C to begin with.
>
> No fair, Randy -- you didn't count the time it takes to learn and apply assembly. And if you have one intrinsics library across platforms, you only have to learn that once.
What if different processors require different intrinsics?
> Once you've learned to program in a couple of assembly languages, learning another one is a waste of neurons.
It may be a pain in the ass, but concluding it's a waste of neurons is a bit presumptuous. It depends on what your situation is. One scenario is where the optimizations gained provide real performance improvements to the end user and/or enable you to cost-reduce a mass-marketed product. I did just that - if you buy a Sony Ericsson T226, T230, or T237, you'll be buying just such a solution. And I'm here to tell you, it feels damn good to be able to do this for your company. (I really don't want to go into it since it may be IP-sensitive.)
> It won't teach you any great truths or help you think differently -- it's just work that becomes worthless when you want to switch processor families.
If it saved a crapload of money, then I wouldn't call that worthless.
>> I think folks who cling to C are in denial - you're just gonna have to break down and code in assembly if you want optimum performance. Learn it. Live it. Love it.
>
> Well, yeah, that's certainly true as things stand, but that's mostly because C sucks for DSP, and that's mostly because C was optimized for different kinds of CPUs. Poor DSP performance is not an inescapable characteristic of all higher level languages. There's no reason a DSP-optimized C-level language couldn't get more than half the performance in a similar space for less than half the effort. In the current scheme of things, that is almost always a trade-off I'd jump at.
It may be that we really agree with each other, Matt. I certainly agree that it isn't worth spending a month optimizing some code if it's just for a test fixture that will be used for a few weeks. But like I said above, whether the extra time and effort are really worth it depends on the situation, and some situations DEFINITELY warrant the descent into hard-core assembly.
> And an intrinsics library can be just like having a new language, except without the nifty syntactic sugar and stricter type checking that you'd get if you'd designed a really new language instead.
I dunno, it'd have to serve me breakfast before I'd agree we need YANL (yet another new language). I really don't care for C#, and I'd even challenge Perl and such. Just use C, man (or C++).

--
%  Randy Yates                  % "...the answer lies within your soul
%% Fuquay-Varina, NC            %       'cause no one knows which side
%%% 919-577-9882                %                   the coin will fall."
%%%% <yates@ieee.org>           %  'Big Wheels', *Out of the Blue*, ELO
http://home.earthlink.net/~yatescr
Jerry Avins wrote:

   ...
> ... There's a Forth with extensions for the 'C31, but the author isn't proud enough of it to release it. He's accustomed to deep optimizing, but this one is better than C (for that machine), but not close enough to assembly.
But, but, but .... That guy ought to learn how to write.

Jerry
Randy Yates wrote:

> Matt Timmermans wrote:
>
>> ... And if you have one intrinsics library across platforms, you only have to learn that once.
>
> What if different processors require different intrinsics?
The point is that the compiler vendor should be writing the processor-specific intrinsics packages, just as compilers can optimize the same code for a Pentium or a PPC depending on switch settings. Given the small customer base, I don't see that happening. Until it does, people like you, who can write in assembler, will be in demand. ...

Jerry
Jerry Avins wrote:
> Randy Yates wrote:
>
>> What if different processors require different intrinsics?
>
> The point is that the compiler vendor should be writing the processor-specific intrinsics packages. Just as compilers can optimize the same code for a pentium or a PPC depending on switch settings.
Jerry,

Here's an intrinsic right out of TI's documentation for the C54x C compiler:

    long _smac(long src, int op1, int op2);   MAC
    Multiplies op1 and op2, shifts the result left by 1, and adds it to
    src. Produces a saturated 32-bit result. (OVM and FRCT set)

Now that's a pretty special-purpose intrinsic that is essentially tied to the architecture of the machine. This is precisely what I mean. It's not the implementations of intrinsics, it's their very definitions. At some point you just *cannot* abstract or generalize operations since it's those very operations that give you the performance improvement. Sort of a physical law.

Here's another thing I don't like about being tied to C. Many times a hard-core assembly optimization requires organizing the data in a very specific way. Now sure, you could organize it that way in C too, but if you were just thinking in C you wouldn't think to do the organization in the first place because you wouldn't be doing the low-level instructions.

Like I said, assembly: learn it, live it, love it.
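To make the quoted semantics concrete, here is a portable C model of what that documentation describes: multiply, shift left by 1, add, saturate to 32 bits. `smac_ref` is a hypothetical name, not TI's implementation; it assumes 16-bit operands and a 32-bit result as on the C54x, and the hardware's exact flag behavior may differ. It also illustrates Randy's point: one instruction on the C54x turns into a widening multiply, a shift, an add and two range checks in portable C.

```c
#include <stdint.h>

/* Hypothetical portable model of the C54x _smac intrinsic as described in the
   quoted documentation: src + ((op1 * op2) << 1), saturated to 32 bits.
   64-bit intermediates keep the sum exact before saturating. */
int32_t smac_ref(int32_t src, int16_t op1, int16_t op2)
{
    int64_t prod = (int64_t)op1 * op2 * 2;   /* fractional (FRCT) shift by 1 */
    int64_t sum = (int64_t)src + prod;
    if (sum > INT32_MAX) return INT32_MAX;   /* saturate (OVM) instead of wrapping */
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```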
> Given the small customer base, I don't see that happening. Until it does, people like you, who can write in assembler, will be in demand.
Isn't it funny how one's hearing improves a hundred-fold when praises are being said? Thank you, Jerry. I sure hope you're right.

--
Randy Yates
Randy Yates wrote:

   ...

> Jerry,
>
> Here's an intrinsic right out of TI's documentation for the C54x C compiler:
>
> long _smac(long src, int op1, int op2); MAC Multiplies op1 and op2, shifts the result left by 1, and adds it to src. Produces a saturated 32-bit result. (OVM and FRCT set)
>
> Now that's a pretty special-purpose intrinsic that is essentially tied to the architecture of the machine. This is precisely what I mean. It's not the implementations of intrinsics, it's their very definitions.
Right, but a MAC needs special code to set up and finish. _smac() is good for the middle only. On long filters, that's most of it.
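Jerry's setup/middle/finish split can be sketched for one output sample of a Q15 FIR filter in portable C. Assumptions not from the thread: Q15 coefficients and samples, a 32-bit accumulator that saturates on every step in the spirit of the `_smac` description quoted above, rounding by preloading half an output LSB, and the hypothetical function name `fir_q15`. Only the loop body corresponds to what `_smac()` covers; the accumulator preload and the final extraction are the extra code around it.

```c
#include <stdint.h>

/* One output sample of a Q15 FIR filter: y = sum of h[k] * x[k], k = 0..n-1.
   Hypothetical helper modelling a saturating fractional MAC in plain C.
   Assumes >> on a negative int64_t is an arithmetic shift. */
int16_t fir_q15(const int16_t *h, const int16_t *x, int n)
{
    /* Setup: preload the rounding constant (half a Q15 LSB, expressed in Q31). */
    int64_t acc = 1 << 15;

    /* Middle: fractional multiply-accumulate, saturated to 32 bits each step. */
    for (int k = 0; k < n; k++) {
        acc += (int64_t)h[k] * x[k] * 2;   /* Q15 * Q15, shifted left 1 = Q31 */
        if (acc > INT32_MAX) acc = INT32_MAX;
        if (acc < INT32_MIN) acc = INT32_MIN;
    }

    /* Finish: keep the high 16 bits of the Q31 sum as the Q15 result. */
    return (int16_t)(acc >> 16);
}
```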
> At some point you just *cannot* abstract or generalize operations since it's those very operations that give you the performance improvement. Sort of a physical law.
No HLL I know gives access to flags like carry and overflow. It's hard to write efficient code without them. For example, dividing by 2^n with rounding is best done by right_shift n, add_immediate 0 with carry. The rounding operation is one instruction in assembler. How many in C?
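One way to answer that in C: below are two sketches of a rounding divide by 2^n. The first mirrors Jerry's shift-then-add-carry idea by recovering the last bit shifted out (the bit the carry flag would hold) with an extra shift and mask; the second is the common add-half-then-shift form, which is shorter but can overflow for x near INT32_MAX. Both round halves upward and assume >> on a negative int32_t is an arithmetic shift (implementation-defined in C, though common compilers do this); the function names are hypothetical. Whether a given compiler turns either into the two-instruction sequence depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^n rounded to nearest, halves rounded up. Requires 1 <= n <= 31.
   Bit n-1 of x is what a shift-right instruction would leave in the carry
   flag; adding it back reproduces "shift right n, add 0 with carry". */
int32_t round_shift_carry(int32_t x, int n)
{
    assert(n >= 1 && n <= 31);
    return (x >> n) + ((x >> (n - 1)) & 1);
}

/* Same result for in-range inputs, but x + 2^(n-1) overflows near INT32_MAX. */
int32_t round_shift_bias(int32_t x, int n)
{
    assert(n >= 1 && n <= 31);
    return (x + (INT32_C(1) << (n - 1))) >> n;
}
```

Counting operations, the carry form needs two shifts, an AND and an add in C, which is the gap Jerry is pointing at.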
> Here's another thing I don't like about being tied to C. Many times a hard-core assembly optimization requires organizing the data in a very specific way. Now sure, you could organize it that way in C too, but if you were just thinking in C you wouldn't think to do the organization in the first place because you wouldn't be doing the low-level instructions.
>
> Like I said, assembly: learn it, live it, love it.
...

Jerry