Reply by Andreas Huennebeck, January 9, 2007
Juliana wrote:

> My program runs too slowly in the DSP simulator of CCS 3.1. From my
> profile report, I can see that it spends most of its time in the
> following loop:
>
> for(i=0 ; i<length ; i++){ //vertical
>     ty=y0+i;
>     tx=x0;
>     tt=(short)ty*(short)hor;
>     if(ty<0){
>         tmp1=frame[0];
>         tmp2=frame[hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     else if(ty>ver_1){
>         tmp1=frame[ver_pos];
>         tmp2=frame[ver_pos+hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[ver_pos+tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     else{
>         tmp1=frame[tt];
>         tmp2=frame[tt+hor_1];
>         for(j=0 ; j<length ; j++,tx++){
>             value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tt+tx]);
>             temp[i] += value * FIRx[j];
>         }
>     }
>     pixel += temp[i] * FIRy[i];
> }
>
> I have studied this code, but I don't know how to modify it.
>
> Could the switches embedded in the loops be the reason?

The many switches (in "value=...") are the reason. Look at a loop like this one:

    for (j = 0; j < length; j++, tx++) {
        value = ((tx < 0) ? tmp1 : ((tx > hor_1) ? tmp2 : frame[tx]));
        temp[i] += value * FIRx[j];
    }

This makes two decisions per iteration, and if 'length' is large, these
decisions almost always have the same result. Therefore you should move
the decisions out of the loop. This can be done by splitting the loop
into three loops (code not tested):

    int todo = length;

    // get number of iterations to do in the first for-loop (tx < 0)
    int num1 = -tx;
    if (num1 > todo) num1 = todo;
    else if (num1 < 0) num1 = 0;
    todo -= num1;

    // first loop: does not need 'tx' inside, so we can update it once
    for (j = 0; j < num1; ++j)
        temp[i] += tmp1 * FIRx[j];
    tx += num1;

    // get number of iterations to do in the second for-loop (tx .. hor_1)
    int num2 = hor_1 + 1 - tx;
    if (num2 > todo) num2 = todo;
    else if (num2 < 0) num2 = 0;
    todo -= num2;

    // second loop: tx is inside the row
    for (int n = 0; n < num2; ++n, ++j, ++tx)
        temp[i] += frame[tx] * FIRx[j];

    // third loop: does not need 'tx' inside, so we can update it once
    for (int n = 0; n < todo; ++n, ++j)
        temp[i] += tmp2 * FIRx[j];
    tx += todo;

In the other two branches, the second loop reads frame[ver_pos+tx] or
frame[tt+tx] instead of frame[tx].

bye
Andreas
--
Andreas Hünnebeck | email: acmh@gmx.de
----- privat ---- | www : http://www.huennebeck-online.de
Fax/Anrufbeantworter: 0721/151-284301
GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
PGP-Key: http://www.huennebeck-online.de/public_keys/pgp_andreas.asc

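As a way to check the loop splitting suggested in the reply before putting it into the real filter, here is a minimal sketch that computes one row sum both ways. The function names fir_row_branchy and fir_row_split, the 'row' pointer (which would be frame, frame + ver_pos, or frame + tt depending on the branch) and the long accumulator are assumptions made for this illustration; they are not taken from the original program.

```c
/* Reference version: two comparisons per tap, as in the original inner loop. */
long fir_row_branchy(const short *row, int hor_1, int x0,
                     const short *fir, int length)
{
    long acc = 0;
    int tx = x0;
    int j;
    for (j = 0; j < length; j++, tx++) {
        short value = (tx < 0) ? row[0] : (tx > hor_1) ? row[hor_1] : row[tx];
        acc += (long)value * fir[j];
    }
    return acc;
}

/* Split version: the same sum computed by three loops without comparisons. */
long fir_row_split(const short *row, int hor_1, int x0,
                   const short *fir, int length)
{
    long acc = 0;
    int tx = x0;
    int j = 0;
    int todo = length;
    int n, num1, num2;

    /* taps that fall left of the row (tx < 0) use row[0] */
    num1 = -tx;
    if (num1 > todo) num1 = todo;
    else if (num1 < 0) num1 = 0;
    todo -= num1;
    for (n = 0; n < num1; ++n, ++j)
        acc += (long)row[0] * fir[j];
    tx += num1;

    /* taps inside the row (tx .. hor_1) read the pixel itself */
    num2 = hor_1 + 1 - tx;
    if (num2 > todo) num2 = todo;
    else if (num2 < 0) num2 = 0;
    todo -= num2;
    for (n = 0; n < num2; ++n, ++j, ++tx)
        acc += (long)row[tx] * fir[j];

    /* whatever is left falls right of the row and uses row[hor_1] */
    for (n = 0; n < todo; ++n, ++j)
        acc += (long)row[hor_1] * fir[j];

    return acc;
}
```

Comparing the two functions over a range of x0 values (fully left of the row, overlapping either edge, fully right of it) is a cheap way to confirm the iteration counts before profiling again.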
Reply by Juliana, January 7, 2007
My program runs too slowly in the DSP simulator of CCS 3.1. From my
profile report, I can see that it spends most of its time in the
following loop:

    for(i=0 ; i<length ; i++){ //vertical
        ty=y0+i;
        tx=x0;
        tt=(short)ty*(short)hor;
        if(ty<0){
            tmp1=frame[0];
            tmp2=frame[hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tx]);
                temp[i] += value * FIRx[j];
            }
        }
        else if(ty>ver_1){
            tmp1=frame[ver_pos];
            tmp2=frame[ver_pos+hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[ver_pos+tx]);
                temp[i] += value * FIRx[j];
            }
        }
        else{
            tmp1=frame[tt];
            tmp2=frame[tt+hor_1];
            for(j=0 ; j<length ; j++,tx++){
                value=((tx<0)?tmp1:(tx>hor_1)?tmp2:frame[tt+tx]);
                temp[i] += value * FIRx[j];
            }
        }
        pixel += temp[i] * FIRy[i];
    }

I have studied this code, but I don't know how to modify it.

Could the switches embedded in the loops be the reason?