My program runs too slowly in the DSP simulator of CCS 3.1. From my profile report, I can see that it spends most of its time in the following loop:

    for(i=0 ; i<length ; i++){ //vertical
        ty=y0+i;
        tx=x0;
        tt=(short)ty*(short)hor;
        if(ty<0){
            tmp1=frame[0];
            tmp2=frame[hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tx]);
                temp[i] += value * FIRx[j];
            }
        }
        else if(ty>ver_1){
            tmp1=frame[ver_pos];
            tmp2=frame[ver_pos+hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[ver_pos+tx]);
                temp[i] += value * FIRx[j];
            }
        }
        else{
            tmp1=frame[tt];
            tmp2=frame[tt+hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tt+tx]);
                temp[i] += value * FIRx[j];
            }
        }
        pixel += temp[i] * FIRy[i];
    }

I have studied this code, but I don't know how to modify it. Could the switches embedded in the loops be the reason?
My program runs 10000 times slower on the DM642 than in Visual Studio! Help!
Started by ●January 7, 2007
Reply by ●January 9, 2007
Juliana wrote:

> My program runs too slowly in the DSP simulator of CCS 3.1. From my profile
> report, I can see that it spends most of its time in the following
> loop:
>
> for(i=0 ; i<length ; i++){ //vertical
>     ty=y0+i;
>     tx=x0;
>     tt=(short)ty*(short)hor;
>     if(ty<0){
>         tmp1=frame[0];
>         tmp2=frame[hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     else if(ty>ver_1){
>         tmp1=frame[ver_pos];
>         tmp2=frame[ver_pos+hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[ver_pos+tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     else{
>         tmp1=frame[tt];
>         tmp2=frame[tt+hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tt+tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     pixel += temp[i] * FIRy[i];
> }
>
> I have studied this code, but I don't know how to modify it.
>
> Could the switches embedded in the loops be the reason?

The many switches (in "value=...") are the reason. Look at such a loop:

    for (j=0 ; j<length ; j++,tx++) {
        value = ((tx < 0) ? tmp1 : ((tx > hor_1) ? tmp2 : frame[tx]));
        temp[i] += value * FIRx[j];
    }

This makes two decisions per iteration, and if 'length' is large then these decisions almost always have the same result. Therefore you should move these decisions out of the loop.
This can be done by splitting the loop into three loops (code not tested):

    int todo = length;

    // get number of iterations to do in first for-loop
    int num1 = -tx;
    if (num1 > todo) num1 = todo;
    else if (num1 < 0) num1 = 0;
    todo -= num1;

    // first loop: does not need 'tx' inside so we can calculate it once
    for (j = 0; j < num1; ++j)
        temp[i] += tmp1 * FIRx[j];
    tx += num1;

    // get number of iterations to do in second for-loop:
    // the pixels that remain between tx and hor_1
    int num2 = hor_1 + 1 - tx;
    if (num2 > todo) num2 = todo;
    else if (num2 < 0) num2 = 0;
    todo -= num2;

    // second loop:
    for (int n = 0; n < num2; ++n, ++j, ++tx)
        temp[i] += frame[tx] * FIRx[j];

    // third loop: does not need 'tx' inside so we can calculate it once
    for (int n = 0; n < todo; ++n, ++j)
        temp[i] += tmp2 * FIRx[j];
    tx += todo;

bye
Andreas
--
Andreas Hünnebeck | email: acmh@gmx.de
----- privat ---- | www : http://www.huennebeck-online.de
Fax/Anrufbeantworter: 0721/151-284301
GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
PGP-Key: http://www.huennebeck-online.de/public_keys/pgp_andreas.asc
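Since the split above is marked as untested, here is a minimal self-contained sketch that can be checked against the original per-tap clamping. The function names fir_row_branchy and fir_row_split, the row/width parameters, and the element types (unsigned char pixels, short coefficients) are assumptions made for illustration; they do not come from the original program.

```c
#include <assert.h>

/* Reference: clamps the column index on every tap (two tests per iteration). */
int fir_row_branchy(const unsigned char *row, int width,
                    const short *fir, int length, int x0)
{
    int hor_1 = width - 1;              /* last valid column, as in the thread */
    int sum = 0, tx = x0, j;
    assert(width > 0);
    for (j = 0; j < length; j++, tx++) {
        int value = (tx < 0) ? row[0] : (tx > hor_1) ? row[hor_1] : row[tx];
        sum += value * fir[j];
    }
    return sum;
}

/* Split version: the same sum computed with three branch-free loops. */
int fir_row_split(const unsigned char *row, int width,
                  const short *fir, int length, int x0)
{
    int hor_1 = width - 1;
    int sum = 0, tx = x0, j = 0;
    int n, num1, num2, todo = length;
    assert(width > 0);

    /* taps left of the row: all of them read row[0] */
    num1 = -tx;
    if (num1 > todo) num1 = todo; else if (num1 < 0) num1 = 0;
    todo -= num1;
    for (n = 0; n < num1; n++, j++)
        sum += row[0] * fir[j];
    tx += num1;

    /* taps inside the row: columns tx .. hor_1, limited by the taps left */
    num2 = hor_1 + 1 - tx;
    if (num2 > todo) num2 = todo; else if (num2 < 0) num2 = 0;
    todo -= num2;
    for (n = 0; n < num2; n++, j++, tx++)
        sum += row[tx] * fir[j];

    /* taps right of the row: all of them read row[hor_1] */
    for (n = 0; n < todo; n++, j++)
        sum += row[hor_1] * fir[j];
    return sum;
}
```

Passing a pointer to the start of the selected row (frame, frame+ver_pos or frame+tt in the original code) would let one such function cover all three ty cases. Whether the compiler can then software-pipeline the three inner loops still has to be checked in the profiler.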