DSPRelated.com

Any suggestions on further optimization?

Started by Vict...@itri.org.tw August 15, 2006
Hello,

I am trying to optimize my RGB-to-gray conversion code on the DM642.
I wrote the conversion as a function and call it from the main function.
The main function handles image capture, YUV-to-RGB conversion, and output to the VGA port.
Without the inserted function, everything works fine.

So the performance is clearly being hurt by the inserted function.
I've tried almost every optimization approach I know, except packed-data
access.
The approaches I have used are:
1. the restrict keyword
2. the pragma directives MUST_ITERATE and UNROLL
3. the intrinsics _nassert and _extu
4. manually unrolling the for loop instead of using UNROLL

But these approaches don't seem to help much.

However, according to the compiler feedback, the bit shifting done by
_extu is a major performance burden, and word alignment is not achieved.
Besides, only a few registers are used.
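
For readers unfamiliar with the intrinsic: _extu(src, csta, cstb) extracts an unsigned bit field by shifting src left by csta and then logically right by cstb. The sketch below emulates that in portable C so the three field extractions in my loop can be checked off-target. The names extu_emul, red5, green6 and blue5 are hypothetical helpers made up for illustration, not TI APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the C6000 _extu intrinsic:
   shift left by csta to discard the bits above the field, then
   logical-shift right by cstb to right-align the field. */
static uint32_t extu_emul(uint32_t src, unsigned csta, unsigned cstb)
{
    assert(csta < 32 && cstb < 32);   /* shift counts must stay below the word size */
    return (src << csta) >> cstb;
}

/* The three extractions used in the loop, for one RGB565 pixel
   held in the low 16 bits of a 32-bit register. */
static uint32_t red5(uint16_t p)   { return extu_emul(p, 16, 27); } /* bits 15..11 */
static uint32_t green6(uint16_t p) { return extu_emul(p, 21, 26); } /* bits 10..5  */
static uint32_t blue5(uint16_t p)  { return extu_emul(p, 27, 27); } /* bits  4..0  */
```

Each extraction is equivalent to a plain shift and mask, for example (p >> 11) & 0x1F for red, so the same fields can also be pulled out of a packed word with one shift and one mask.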

So, any advice on further optimization?

What I have come up with so far:
1. packed-data access
2. DATA_ALIGN
3. another way to unroll the for loop
4. other possible intrinsics
5. task scheduling at the DSP/BIOS level
6. cache usage

Victor

PS.
My code is as follows:
==========================================================================
void RGB565_RGB(unsigned short* restrict BMP, unsigned int ByteUsed)
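{
    unsigned int i;

    unsigned int R=0;
    unsigned int G=0;
    unsigned int B=0;

    /* Tell the compiler that BMP is word (4-byte) aligned. */
    _nassert(((int) BMP & 0x3) == 0);

    #pragma MUST_ITERATE(640,, 4);
    /*#pragma UNROLL(8); */

    /* NOTE: the loop bound was lost from the posted text; "i<ByteUsed; i++"
       is a reconstruction that assumes ByteUsed counts 16-bit pixels. */
    for(i=0; i<ByteUsed; i++)
    {
        R= _extu(BMP[i], 16, 27);       /* bits 15..11: 5-bit red   */
        G= _extu(BMP[i], 21, 26);       /* bits 10..5:  6-bit green */
        B= _extu(BMP[i], 27, 27);       /* bits  4..0:  5-bit blue  */
        R=G=B=((R+(G>>1)+B)*341)>>10;   /* gray = (R + G/2 + B)/3, using 341/1024 for 1/3 */
        BMP[i]=((R<<11)|(G*2<<5)|B);    /* repack the gray level as RGB565 */
    }
}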
==========================================================================
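
As a rough illustration of idea 1 (packed-data access), here is a portable-C sketch that handles two RGB565 pixels per 32-bit word using the same arithmetic as the loop above. It is not tuned DM642 code: the function names are hypothetical, memcpy stands in for whatever aligned word access the target compiler offers (on the C6000 tools that would typically be an intrinsic such as _amem4), and the pixel count is assumed to be even. Because both 16-bit lanes get identical treatment, the result does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: same arithmetic as the posted loop, one pixel at a time. */
static uint16_t gray565_scalar(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1Fu;
    uint32_t g = (p >> 5) & 0x3Fu;
    uint32_t b = p & 0x1Fu;
    uint32_t y = ((r + (g >> 1) + b) * 341u) >> 10;   /* about (r + g/2 + b) / 3 */
    return (uint16_t)((y << 11) | (y << 6) | y);      /* y<<6 equals (y*2)<<5 */
}

/* Two pixels at once: each 16-bit lane of w holds one RGB565 pixel.
   The per-lane sum is at most 93 and 93*341 = 31713 < 65536, so the
   multiply never carries from the low lane into the high lane. */
static uint32_t gray565_pair(uint32_t w)
{
    const uint32_t m5 = 0x001F001Fu;                  /* 5-bit mask in both lanes */
    uint32_t sum = ((w >> 11) & m5)                   /* R         */
                 + ((w >> 6) & m5)                    /* G >> 1    */
                 + (w & m5);                          /* B         */
    uint32_t y = ((sum * 341u) >> 10) & m5;           /* gray level per lane */
    return (y << 11) | (y << 6) | y;
}

/* In-place conversion of n_pixels pixels, two per iteration.
   Assumption: n_pixels is even. */
static void rgb565_to_gray_packed(uint16_t *bmp, size_t n_pixels)
{
    size_t i;
    assert(n_pixels % 2 == 0);
    for (i = 0; i < n_pixels; i += 2) {
        uint32_t w;
        memcpy(&w, &bmp[i], sizeof w);                /* one 32-bit load  */
        w = gray565_pair(w);
        memcpy(&bmp[i], &w, sizeof w);                /* one 32-bit store */
    }
}
```

The scalar function is only there as a check: running both over the same buffer and comparing the results is a cheap way to confirm the packed version before moving on to target-specific intrinsics.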