DSPRelated.com

Testing the speed of the FFT on the TMS320C6711 DSP (with the CCS 3.1 device cycle-accurate simulator): are the times reasonable?

Started by Dukke February 3, 2006
I'm testing the speed of the DSPLIB FFT functions on the TMS320C6711 DSP
(with the CCS 3.1 device cycle-accurate simulator).

For the single pass implementation the code is the following:


#define CHIP_6711C

#include <stdio.h>
#include <stdlib.h>
#include <csl.h>
#include <csl_timer.h>
#include <math.h>

#include "twiddlesp8192.h" /*twiddle factors*/
#include "brev_table.h"
#include "DSPF_sp_fftSPxSP.h"
#include "DSPF_sp_ifftSPxSP.h"


#define PI  (3.141592654)
#define NN  (8192)

#pragma DATA_ALIGN(x, 8);
#pragma DATA_ALIGN(y, 8);

float x[2 * NN];
float y[2 * NN];

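void main(void)
{
    int i;
    short F1 = 10, F2 = 40;   /* test tone frequencies, in FFT bins */

    /* ==================================================================== */
    /* Generate single-precision input data: two sinusoids, zero imaginary  */
    /* ==================================================================== */
    for (i = 0; i < NN; i++)
    {
        /* real part */
        x[2 * i] = (float)(sin(2 * PI * F1 * i / NN)
                         + sin(2 * PI * F2 * i / NN));
        /* imaginary part */
        x[2 * i + 1] = 0;
    }

    /* Forward FFT, x -> y. w (twiddle factors) and brev (bit-reversal   */
    /* table) are expected to come from the headers included above.      */
    DSPF_sp_fftSPxSP(NN, x, w, y, brev, 2, 0, NN);

    /* Inverse FFT, y -> x */
    DSPF_sp_ifftSPxSP(NN, y, w, x, brev, 2, 0, NN);
}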


Using the CCS profiler (with only the DSPF_sp_fftSPxSP function enabled
under "range types"), for the 8192-point complex FFT I get these
results:

cycle.cpu=164.004
cycle.total=867.020
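
As a rough cross-check of these figures, the sketch below computes how much of cycle.total is not accounted for by cycle.cpu. It assumes the dot in the profiler output is a thousands separator (so 164.004 means 164,004 cycles) and that the difference between the two counters is stall time, which is the reading raised in the questions and in the reply further down. The function names are illustrative only and are not part of CCS or DSPLIB.

```c
#include <stdio.h>

/* Share of the total cycle count that is not CPU execution cycles.
   Assumption: stall cycles = cycle.total - cycle.cpu. */
double stall_fraction(unsigned long cpu_cycles, unsigned long total_cycles)
{
    if (total_cycles == 0 || cpu_cycles > total_cycles)
        return 0.0;   /* inconsistent input: report no stalls */
    return (double)(total_cycles - cpu_cycles) / (double)total_cycles;
}

/* Prints the stall share for the 8192-point figures quoted in the post. */
void print_stall_report(void)
{
    const unsigned long cpu   = 164004UL;  /* cycle.cpu, dot read as thousands separator */
    const unsigned long total = 867020UL;  /* cycle.total */

    printf("stall cycles: %lu (%.1f%% of total)\n",
           total - cpu, 100.0 * stall_fraction(cpu, total));
}
```

With these inputs the stall share comes out at roughly 81% of the total, which would suggest that memory stalls, not arithmetic, dominate the 8192-point run.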

Using the multipass implementation I get an unexpected slight worsening!
:-(

For the 2048-point FFT the results are:

cycle.cpu=34.959
cycle.total=78.563   (0.523 msec assuming a clock of 150MHz)

with a 4-pass implementation:

cycle.cpu=35.531    (a slight increase)
cycle.total=67.129  (this time the multipass version improves the total
cycle count)
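
The conversion from cycles to time used above can be sketched as follows. This is only a worked version of the arithmetic, assuming the 150 MHz clock mentioned in the post and reading the dots in the profiler output as thousands separators. The helper names are made up for this example.

```c
#include <stdio.h>

/* Execution time in milliseconds for a cycle count at a given clock rate. */
double cycles_to_ms(unsigned long cycles, double clock_hz)
{
    return 1000.0 * (double)cycles / clock_hz;
}

/* Relative change from a baseline in percent (negative means fewer cycles). */
double percent_change(unsigned long baseline, unsigned long value)
{
    return 100.0 * ((double)value - (double)baseline) / (double)baseline;
}

/* Prints a summary of the 2048-point figures quoted in the post. */
void print_fft2048_summary(void)
{
    const double clock_hz = 150e6;              /* 150 MHz, as assumed in the post */
    const unsigned long single_total = 78563UL; /* single pass, cycle.total */
    const unsigned long multi_total  = 67129UL; /* 4-pass, cycle.total */
    const unsigned long single_cpu   = 34959UL; /* single pass, cycle.cpu */
    const unsigned long multi_cpu    = 35531UL; /* 4-pass, cycle.cpu */

    printf("single pass: %.3f ms\n", cycles_to_ms(single_total, clock_hz));
    printf("4-pass:      %.3f ms\n", cycles_to_ms(multi_total, clock_hz));
    printf("total cycles change: %.1f%%\n", percent_change(single_total, multi_total));
    printf("cpu cycles change:   %.1f%%\n", percent_change(single_cpu, multi_cpu));
}
```

For the single-pass run this gives about 0.524 ms at 150 MHz. The 4-pass run shows about 14.6% fewer total cycles for about 1.6% more CPU cycles.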

My questions are:

1) To get an idea of the time needed to compute the FFT, is it right
to multiply the cycle.total result by the clock period of the
simulated device?

2) Is cycle.total the best value from which to deduce the speed?
cycle.cpu does not account for stalls etc., right?

3) Are the values obtained, for example 0.523 msec for a 2048-point
complex FFT, reasonable for the simulated device (TMS320C6711)?

Thank you very very much!!!

>My questions are:
>
>1) To get an idea of the time needed to compute the FFT, is it right
>to multiply the cycle.total result by the clock period of the
>simulated device?
>
>2) Is cycle.total the best value from which to deduce the speed?
>cycle.cpu does not account for stalls etc., right?
>
>3) Are the values obtained, for example 0.523 msec for a 2048-point
>complex FFT, reasonable for the simulated device (TMS320C6711)?
>
>Thank you very very much!!!
>
A few suggestions:

1) Switch the cache on! I'm not sure whether the 6711 has an L2 cache,
but it probably does. Caching will really speed up your system: 10 times
or more in my image-processing applications.

2) Do not include the filling of the input array in the measurement.
Read the CPU cycle count (A) just before you run the FFT and again just
after (B), using the clock() function. The clock is configurable in the
simulator software under Main menu->Profile->Clock (this is correct at
least for CCS 3.0 and above).

3) cycle.cpu does not include stalls; cycle.total does. Just divide
(B-A) by the CPU frequency to get the time in seconds.
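
The measurement described in suggestions 2 and 3 might look roughly like the sketch below. It uses only the standard clock() function from <time.h>. The workload() routine is a hypothetical stand-in for the FFT call, and the function names are invented for this example. On the simulator, with the profile clock enabled as described above, the tick rate would be the CPU frequency (for example 150e6). On a desktop compiler it would be CLOCKS_PER_SEC.

```c
#include <stdio.h>
#include <time.h>

volatile float sink;  /* keeps the compiler from discarding the workload */

/* Hypothetical stand-in for the routine under test. On the target this
   would be the DSPF_sp_fftSPxSP call from the original post. */
void workload(void)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < 100000; i++)
        acc += (float)i * 0.5f;
    sink = acc;
}

/* Returns the clock() ticks spent inside fn only. Any input generation
   must be done by the caller beforehand, so it is not counted. */
clock_t measure_ticks(void (*fn)(void))
{
    clock_t a = clock();   /* count A, just before the call */
    fn();
    return clock() - a;    /* B - A */
}

/* Converts a tick count to seconds for a given tick rate. */
double ticks_to_seconds(clock_t ticks, double ticks_per_second)
{
    return (double)ticks / ticks_per_second;
}
```

A caller would fill the input array first, then call measure_ticks(workload) and pass the result to ticks_to_seconds() with the appropriate tick rate.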