Hi everyone!

I am using a DM642 EVM. I need to perform an FFT very quickly. For this purpose I use
code from TI DSPLIB. My program is below:

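```c
#include <std.h>          /* DSP/BIOS base types, must come first */
#include <log.h>          /* LOG_printf */
#include <mem.h>          /* MEM_alloc, MEM_ILLEGAL */
#include <math.h>         /* sin */
#include "evmdm642.h"     /* EVMDM642_init */
#include "evmdm642_led.h" /* EVMDM642_LED_init */

/* trace is a LOG object created in the DSP/BIOS configuration; BUFALIGN,
 * DSPF_dp_cfftr2(), gen_w_r2() and bit_rev() are declared elsewhere in
 * the project (not shown). */
extern LOG_Obj trace;

void main()
{
    int i;
    double *x;
    double *w;
    int N = 32768;

    /* Initialize the board support library, must be first BSL call */
    EVMDM642_init();

    /* Initialize the LED modules of the BSL */
    EVMDM642_LED_init();

    /* memory alloc: x holds interleaved complex data (re, im) */
    x = MEM_alloc(0, sizeof(double) * N * 2 * 2, BUFALIGN);
    if (x == MEM_ILLEGAL) { LOG_printf(&trace, "benchmark err!: MEM_ILLEGAL"); return; }

    w = MEM_alloc(0, sizeof(double) * N * 2, BUFALIGN);
    if (w == MEM_ILLEGAL) { LOG_printf(&trace, "benchmark err!: MEM_ILLEGAL"); return; }

    for (i = 0; i < N; i++)
    {
        /* real part */
        x[2 * i] = 10 * sin(2 * 3.14 * 10 * i / N) + sin(2 * 3.14 * 40 * i / N);
        /* imaginary part */
        x[2 * i + 1] = 0.0;
    }

    /*[!!!BREAKPOINT!!!]*/
    gen_w_r2(w, N);             /* generate coefficient table in normal order
                                   (function given in the C-CODE section) */
    DSPF_dp_cfftr2(N, x, w, 1); /* input in normal order, output in
                                   bit-reversed order */
    bit_rev(x, N);              /* bit-reverse the output if normal-order
                                   output is needed (function given in the
                                   C-CODE section) */
    /*[!!!BREAKPOINT!!!]*/
}
```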

This program takes about 1.5 s to run!!! It returns correct data, but it is too
slow (I need one FFT per ms)! When I benchmark the DSPF_dp_cfftr2 function with
the RUNB command (see the /*[!!!BREAKPOINT!!!]*/ comments), it reports
1 097 007 061 clocks!!!

I am new to CCStudio. Maybe I made an error in the code? Maybe the DSP/BIOS
settings are not correct? Why is it so slow? Please HELP!

The document http://focus.ti.com/lit/ml/sprt379a/sprt379a.pdf says the radix-4
FFT takes:

0.75*nx*log4(nx) + 38

For nx = 1024: cycles = 3878

Are cycles and clocks the same? If not, what is a cycle?

Thanks!

# DM642 EVM TOO SLOW!!!?

Thread started April 17, 2008

Reply posted April 18, 2008

Hi Alexey,

It seems that the cycle count you've obtained is correct: N = 2^15, and you are
using non-optimized double-precision C code from the C6700 library (SPRC121) to
calculate a radix-2 complex FFT. On a DM642, double-precision math is emulated
in software by calls to the _adddp(), _subdp() and _mpydp() functions, which
take a lot of CPU cycles.

You might consider either using a CPU with hardware floating point (C6701,
C671x, C672x) or using optimized fixed-point FFT routines, if the length of your
desired FFT allows for that (that is, if it is not too large). The cycle count
from the sprt379b document relates to a C64x+ function (the DM642 is not a C64x+
CPU), which I was unable to identify. There are a few FFT functions in the
source archive, none of which has a benchmark of 0.75*nx*log4(nx) + 38.

Rgds,

Andrew

> Subject: DM642 EVM TOO SLOW!!!?

> Posted by: "A...@cognitivevision.com" A...@cognitivevision.com alexeymavrin

> Date: Thu Apr 17, 2008 5:18 am (PDT)

