Results of tests on the fft on DSP TMS320c6711, are the results reasonable?

Started by dukke3d January 31, 2006
I am testing the FFT on the TMS320C6711 DSP using the CCS 3.1 device
cycle-accurate simulator.
For the single-pass implementation the code is the following:

#define CHIP_6711C

#include <stdio.h>
#include <stdlib.h>
#include <csl.h>
#include <csl_timer.h>
#include <math.h>

#include "twiddlesp8192.h" /*twiddle factors*/
#include "brev_table.h"
#include "DSPF_sp_fftSPxSP.h"
#include "DSPF_sp_ifftSPxSP.h"

#define PI (3.141592654)
#define NN (8192)

#pragma DATA_ALIGN(x, 8);
#pragma DATA_ALIGN(y, 8);

float x[2 * NN];
float y[2 * NN];

void main(void)
{
    int i;
    int k;
    int j;
    short F1 = 10, F2 = 40;

    /* ==================================================================== */
    /* Generate single-precision floating-point input data                  */
    /* ==================================================================== */
    for (i = 0; i < NN; i++)
    {
        /* real part: sum of two sinusoids at bins F1 and F2 */
        x[2 * i] = (float)(sin(2 * PI * F1 * i / NN) + sin(2 * PI * F2 * i / NN));
        /* imaginary part */
        x[2 * i + 1] = 0;
    }

    /* forward FFT: x -> y */
    DSPF_sp_fftSPxSP(NN, x, w, y, brev, 2, 0, NN);
    /* inverse FFT: y -> x */
    DSPF_sp_ifftSPxSP(NN, y, w, x, brev, 2, 0, NN);
}

Using the profiling function of CCS (with only the DSPF_sp_fftSPxSP
function enabled under "range types"), I get the following results
for the 8192-point complex FFT:

cycle.cpu = 164,004
cycle.total = 2,133,073

(cycle.cpu:Excl.total = cycle.cpu:Incl:total; there are no
subroutines because I profile only the FFT function.)

Using the multipass implementation I get a slight worsening! :-(

For the 2048-point FFT the results are:

cycle.cpu = 34,959
cycle.total = 122,984 (0.614 msec assuming a clock of 200 MHz)

with a 16-pass implementation:

cycle.cpu = 36,500 (a slight increase)
cycle.total = 115,706 (this time the multipass version improves
cycle.total)

For a 1024-point FFT with a 4-pass implementation:

cycle.cpu = 14,787
cycle.total = 54,091
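
For reference, the msec figure quoted for the 2048-point case is just
cycle.total divided by the clock rate. A minimal sketch of that
conversion in plain C (nothing TI-specific; cycles_to_ms is a
hypothetical helper name, and the 200 MHz clock is the assumption
stated above, not something the simulator reports):

```c
#include <assert.h>

/* Convert a profiler cycle count to milliseconds.
   clock_hz is supplied by the caller; 200e6 is the assumed clock
   from the post, not a value read from the device. */
double cycles_to_ms(unsigned long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles / clock_hz * 1000.0;
}
```

Calling cycles_to_ms(122984UL, 200e6) gives roughly 0.615 msec, in
line with the figure above; the same call on the 8192-point
cycle.total gives roughly 10.7 msec.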

My questions are:

1) Doubling the FFT size (1024 to 2048, for example) increases the
cycles by a factor of 2.8, but between 4096 and 8192 they increase
by a factor of 7.5! Is that normal, or even possible? Could it be
caused by the memory filling up?

2) Are the cycle counts above reasonable (at least for the 2048-point
complex FFT, 0.614 msec) for the device that I am simulating
(C6711)?

3) The best value from which to deduce the speed is cycle.total, is
that right? cycle.cpu does not account for stalls etc., right?
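
To frame question 1, two quick numbers can be worked out by hand or
with the sketch below. It assumes an idealized N*log2(N) cost model
(real kernels and memory systems deviate from it) and counts only the
x and y buffers declared above; the twiddle table w is left out
because its size is not shown in the post. The function names are
hypothetical helpers, not part of any TI library.

```c
#include <assert.h>
#include <stddef.h>

/* Integer log2 of a power-of-two n. */
unsigned ilog2(unsigned long n)
{
    unsigned r = 0;
    assert(n != 0 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        ++r;
    }
    return r;
}

/* Ideal cycle growth when going from n to 2n points,
   assuming cost proportional to N * log2(N). */
double ideal_doubling_ratio(unsigned long n)
{
    assert(n >= 2);
    return 2.0 * (double)ilog2(2 * n) / (double)ilog2(n);
}

/* Bytes occupied by x and y together: two arrays of 2*n floats
   (interleaved real/imaginary), as declared in the post. */
size_t io_buffer_bytes(unsigned long n)
{
    return 2u * (2u * (size_t)n) * sizeof(float);
}
```

Under this model the ideal ratio is about 2.2 for 1024 to 2048 and
about 2.17 for 4096 to 8192, so a jump of 7.5 is well beyond what the
arithmetic alone predicts. With 4-byte floats, x and y together take
64 KB at N = 4096 and 128 KB at N = 8192. The C6711 is usually
described as having 64 KB of on-chip L2; treat that as an assumption
to check against the data sheet and the simulator memory map before
drawing conclusions.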

Finally, I repeat the most important question:
are the obtained values reasonable for the simulated device?

Thank you very very much!!!