Jay,
Here are some ideas on how you might get better code performance:
1) Consider the for loop:
for (j=15; j>0; j--)
{
FIR1[j]=FIR1[j-1];
}
I suspect that this for loop can be replaced with a call to memmove.
Use memmove rather than memcpy here. The source and destination
overlap, and memcpy is not guaranteed to work on overlapping buffers.
The memmove supplied with your compiler may well be written in
assembly language. If so, it takes full advantage of any special
instructions provided by the hardware. I am thinking of a zero
overhead repeat loop.
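Below is a minimal sketch of the shift from point 1, done with a single library call. The names NTAPS and shift_in are hypothetical. The sketch uses memmove rather than memcpy because the regions overlap. It is not certain that the library routine beats the plain loop on the C6416, so time both.

```c
#include <string.h>

#define NTAPS 16  /* hypothetical name; matches the 16-tap FIR1 delay line */

/* Shift the delay line one place toward the end and put the new sample in
 * slot 0. memmove (not memcpy) is used because source and destination overlap. */
static void shift_in(double *delay, double sample)
{
    /* delay[0..NTAPS-2] moves to delay[1..NTAPS-1]; old delay[NTAPS-1] is dropped */
    memmove(&delay[1], &delay[0], (NTAPS - 1) * sizeof delay[0]);
    delay[0] = sample;
}
```

In your loop, the three lines that shift FIR1 and then set FIR1[0] = LCnorm would become shift_in(FIR1, LCnorm).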
2) You have these two for loops:
for (j =0; j<16; j++)
{
Z+= FIR1[j]*weight_array[j];
}
outp= RCnorm - Z;
for (j=0; j<16; j++)
{
weight_array[j] += 2*0.01*outp*FIR1[j];
}
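I suspect that you can merge some of these loops. Because outp needs
the complete sum Z, the update loop cannot simply be folded into the
first loop. It can, however, be merged with the shift loop from point
1: update the weights and shift the delay line in one pass. I am not
sure how much this is going to help. You can also consider using
pointer arithmetic rather than indexing. On some machines this can be
a big win. Some compilers will do this for you automatically, but many
do not. You can also consider turning each of the above for loops into
16 assignment statements (unrolling the loop). I am not sure it is
worth the extra code space.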
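Here is a sketch of point 2, assuming the 16-tap, 2*0.01 step-size setup above. Because outp depends on the complete sum Z, a sample's weight update cannot share a loop with its own dot product. This sketch instead merges the update with the delay-line shift that the next sample needs. It also uses pointer arithmetic and computes the loop-invariant factor 2*0.01*outp once. The names NTAPS, MU2, fir_output and update_and_shift are hypothetical. Whether this is faster on your target is something you would have to measure.

```c
#define NTAPS 16          /* hypothetical name for the filter length */
#define MU2   (2 * 0.01)  /* the 2*0.01 factor from the weight update */

/* Dot product of delay line x and weights w, using pointer arithmetic. */
static double fir_output(const double *x, const double *w)
{
    const double *end = x + NTAPS;
    double z = 0.0;
    while (x < end)
        z += *x++ * *w++;
    return z;
}

/* Update the weights with error e and, in the same pass, shift the delay
 * line one place so x[0] is ready for the next input sample. Walking from
 * the last tap down means x[j-1] still holds its old value when it is read
 * for the update and copied into x[j]. */
static void update_and_shift(double *x, double *w, double e)
{
    const double step = MU2 * e;      /* loop-invariant, computed once */
    double *xp = x + NTAPS - 1;
    double *wp = w + NTAPS - 1;
    while (xp > x) {
        *wp-- += step * *xp;          /* update tap j with the old x[j] */
        *xp = *(xp - 1);              /* then shift: x[j] = x[j-1] */
        xp--;
    }
    *wp += step * *xp;                /* tap 0: update only */
}
```

Per sample, the order would be: store the new input in x[0], call fir_output, form the error, then call update_and_shift. Starting from an all-zero delay line, this gives the same weights and outputs as the separate shift, dot-product and update loops. The only difference is that the shift now happens at the end of each sample instead of at the start of the next.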
3) If a variable is heavily used in a program, a compiler should
allocate it to a fast register. However, some compilers fail to
allocate the right variables to registers, and as a result your
program runs slowly. By using the keyword register, you can suggest to
the compiler that a variable should be kept in a fast register. It is
only a hint, and many compilers ignore it.
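As a small illustration of point 3, here is the first loop from point 2 with the accumulator and loop index marked register. The names dot16 and NTAPS are hypothetical. Whether the compiler honours the hint is implementation-defined, so check the generated assembly or profile before and after.

```c
#define NTAPS 16  /* hypothetical name for the filter length */

/* Dot product with the accumulator and index suggested as register candidates. */
static double dot16(const double *x, const double *w)
{
    register double z = 0.0;   /* heavily used accumulator */
    register int j;            /* loop index */
    for (j = 0; j < NTAPS; j++)
        z += x[j] * w[j];
    return z;
}
```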
4) You wrote:
double norm = pow (2,15);
If the above statement is inside a loop, then I would compute 2^15
at compile time. By the way, the statement:
norm = 1.0;
might run faster than the statement:
norm = 1;
because some compilers will do the conversion from int to double at
run time. Most are not that bad.
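A sketch of point 4 follows. It replaces pow(2,15) with a literal constant, and it also replaces the per-sample division by norm with a multiplication by its reciprocal. The names NORM, INV_NORM, to_unit and from_unit are hypothetical. The sketch assumes Sample is a 16-bit short. Because 2^15 is a power of two, 1.0/32768.0 is exact, so multiplying by INV_NORM gives the same result as dividing by NORM.

```c
#define NORM     32768.0        /* 2^15, written out instead of calling pow() */
#define INV_NORM (1.0 / NORM)   /* constant-folded; exact because NORM is a power of two */

/* Scale a 16-bit sample (assumed to be a short) to roughly [-1, 1). */
static double to_unit(short sample) { return sample * INV_NORM; }

/* Scale back up before writing to the output buffer. */
static double from_unit(double v) { return v * NORM; }
```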
I hope this helps.
Bob Sherry
"Jay" <cdragon@cogeco.ca> wrote in message
news:Xns9530CBE8594D2cdragoncogecoca@216.221.81.119...
> We are fourth year electrical engineering students involved in a Final
> Design Project course. (Approaching deadline date) We are using a
> TMS320C6416 DSK DSP by Texas Instruments to perform some adaptive noise
> cancellation. Unfortunately we have run into some serious unexpected
> CPU Usage problems. We think that our C code is relatively simple.
>
> Our code consists of a simple LMS algorithm, which we have implemented
> using Reference Framework 3 (provided by TI eXpressDSP tutorial).
>
> The code of our noise cancellation algorithm is as follows:
>
> //Variable Declaration (Global)
>
> static double FIR1[16]={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
> static double FIR2[16]={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
>
> static double weight_array [16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
> static double weight_array2 [16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
> static double tempweights [5000][16];
>
> Sample *srcLeft, *dst, *srcRight, *dst2, ttemp, ttemp2;
> Int size; /* in samples */
> Int chan;
> Int i,j;
> double RCnorm, LCnorm;
> Int icount = 0;
>
>
> double outp;
> double norm = pow (2,15);
> double Z = 0.0;
> double peakpower = 0.0;
>
>
> //Assign variables to input buffers
> srcRight = (Sample *)PIP_getReaderAddr( thrAudioproc[chan].pipIn );
> srcLeft = (Sample *)PIP_getReaderAddr( thrAudioproc[chan].pipIn2 );
> /* get the size in samples (the function below returns it in words) */
> size = sizeInSamples( PIP_getReaderSize( thrAudioproc[chan].pipIn ) );
>
> /* get the empty buffer from the out-pipe */
> PIP_alloc( thrAudioproc[chan].pipOut );
> PIP_alloc( thrAudioproc[chan].pipOut2 );
> //Declare output buffers
> dst = (Sample *)PIP_getWriterAddr( thrAudioproc[chan].pipOut );
> dst2 = (Sample *)PIP_getWriterAddr( thrAudioproc[chan].pipOut2 );
>
>
>
>
> // ***********BASIC LMS (NOISE CANCELLATION) ALGORITHM STARTS HERE *************************************
>
> for ( i= 0; i < FRAMELEN; i++)
> {
>
>
> RCnorm = srcRight[i]/norm;
> LCnorm =srcLeft[i]/norm;
>
> for (j=15; j>0; j--)
> {
> FIR1[j]=FIR1[j-1];
> }
>
> FIR1[0] = LCnorm;
>
> Z=0.0;
>
> for (j =0; j<16; j++)
> {
> Z+= FIR1[j]*weight_array[j];
> }
> outp= RCnorm - Z;
>
>
> for (j=0; j<16; j++)
> {
> weight_array[j] += 2*0.01*outp*FIR1[j];
> }
>
>
> ttemp=outp*norm;
>
> //dst[i] = srcRight[i];//(Short)(norm*RCnorm[i]); /* real stereo/N-ch.
> //dst2[i] = srcRight[i];//(Short)(norm*LCnorm[i]);
> dst[i] = ttemp;
> dst2[i] = ttemp;
> }
>
>
> If anyone has any ideas as to whether the format of our code could be
> contributing to a high DSP CPU Usage (92.8%!!) it would be appreciated
> if they could post some suggestions. Some of our ideas thus far
> include:
>
> - inefficient variable declarations/definitions
> - inefficient code structure
> - 'for loop' problems????
>
>
> Any help would be greatly appreciated. Thank you very much.