F2812 MIPS for C code appears low

Started by perf...@yahoo.com February 16, 2005
ok (Tim? :-), now I've got the PLL set to 150 MHz, all code and data
located in internal RAM, and I can verify actual 150 MIPS speed using RPT
NOP instructions. But when I time an actual C function such as the
PID example TI supplies, it appears to run at ~1/6 of 150 MIPS.

e.g. timing the following C function, I get about 5 us per call.
Single stepping in assembler mode, I count about 120 mouse clicks to go
through one iteration.  But 5 us equates to about 750 instructions at
150 MIPS, so what is going on here?
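
The arithmetic behind the 750 figure can be sketched as below. The helper names cycles_in_window and cycles_per_step_x100 are made up for illustration; the only inputs taken from the post are 5 us, 150 MHz and ~120 single-steps, and one instruction per cycle is assumed.

```c
#include <stdint.h>

/* Cycles that fit in a time window: time_ns * clock_mhz / 1000.
 * Assumes one instruction per cycle, as the "150 MIPS" figure does. */
uint32_t cycles_in_window(uint32_t time_ns, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)time_ns * clock_mhz) / 1000u);
}

/* Average cycles per counted single-step, scaled by 100 so the
 * result stays in integer arithmetic (625 means 6.25 cycles). */
uint32_t cycles_per_step_x100(uint32_t cycles, uint32_t steps)
{
    return (cycles * 100u) / steps;
}
```

cycles_in_window(5000, 150) gives 750, and cycles_per_step_x100(750, 120) gives 625, i.e. about 6.25 cycles per counted step, which matches the "~1/6 speed" observation.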

Is this due to pipeline inefficiency effects, e.g. stalls or
address/data-bus collisions, etc.?  All data and code appear to be
located in internal RAM afaics; I watch the AR registers and don't see
any external fetches.  If I relocate to external RAM then I see a
definite slowdown, e.g. 500 times slower (!), so I am pretty confident
about being in internal RAM.

tia for any clues!

------------------------

TI PID example (~120 assembler instruction steps per call):


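void pid_reg3_calc(PIDREG3 *v)
{
    /* Error = reference - feedback */
    v->e_reg3 = v->pid_ref_reg3 - v->pid_fdb_reg3;

    /* Proportional term */
    v->up_reg3 = v->Kp_reg3*v->e_reg3;

    /* Output before saturation: P + I + D */
    v->uprsat_reg3 = v->up_reg3 + v->ui_reg3 + v->ud_reg3;

    /* Clamp the output to [pid_out_min, pid_out_max] */
    if (v->uprsat_reg3 > v->pid_out_max)
      v->pid_out_reg3 =  v->pid_out_max;
    else if (v->uprsat_reg3 < v->pid_out_min)
      v->pid_out_reg3 =  v->pid_out_min;
    else
      v->pid_out_reg3 = v->uprsat_reg3;

    /* Saturation error (zero when the output was not clamped) */
    v->saterr_reg3 = v->pid_out_reg3 - v->uprsat_reg3;

    /* Integral term, corrected by the saturation error */
    v->ui_reg3 = v->ui_reg3 + v->Ki_reg3*v->up_reg3 +
                 v->Kc_reg3*v->saterr_reg3;

    /* Derivative term */
    v->ud_reg3 = v->Kd_reg3*(v->up_reg3 - v->up1_reg3);

    /* Save the proportional term for the next call */
    v->up1_reg3 = v->up_reg3;
}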
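
The post does not include the PIDREG3 definition, so here is a minimal host-side sketch for trying the function out. It assumes every field is a float; the real TI header may use another type or field order. The helper pid_example_state and its values are hypothetical, picked so that each result is exact in binary. If the fields are floats, each *, +, - and comparison in the function is a floating-point operation, and the F2812 has no floating-point hardware to execute them directly.

```c
/* ASSUMPTION: the thread never shows PIDREG3. All fields are taken to be
 * float here; the real TI header may differ in type and field order. */
typedef struct {
    float pid_ref_reg3;   /* reference input */
    float pid_fdb_reg3;   /* feedback input */
    float e_reg3;         /* error */
    float Kp_reg3;        /* proportional gain */
    float up_reg3;        /* proportional term */
    float ui_reg3;        /* integral term */
    float ud_reg3;        /* derivative term */
    float uprsat_reg3;    /* output before saturation */
    float pid_out_max;    /* upper output limit */
    float pid_out_min;    /* lower output limit */
    float pid_out_reg3;   /* saturated output */
    float saterr_reg3;    /* saturation error */
    float Ki_reg3;        /* integral gain */
    float Kc_reg3;        /* saturation-correction gain */
    float Kd_reg3;        /* derivative gain */
    float up1_reg3;       /* previous proportional term */
} PIDREG3;

/* Same statements as the function in the post. */
void pid_reg3_calc(PIDREG3 *v)
{
    v->e_reg3 = v->pid_ref_reg3 - v->pid_fdb_reg3;
    v->up_reg3 = v->Kp_reg3 * v->e_reg3;
    v->uprsat_reg3 = v->up_reg3 + v->ui_reg3 + v->ud_reg3;

    if (v->uprsat_reg3 > v->pid_out_max)
        v->pid_out_reg3 = v->pid_out_max;
    else if (v->uprsat_reg3 < v->pid_out_min)
        v->pid_out_reg3 = v->pid_out_min;
    else
        v->pid_out_reg3 = v->uprsat_reg3;

    v->saterr_reg3 = v->pid_out_reg3 - v->uprsat_reg3;
    v->ui_reg3 = v->ui_reg3 + v->Ki_reg3 * v->up_reg3
               + v->Kc_reg3 * v->saterr_reg3;
    v->ud_reg3 = v->Kd_reg3 * (v->up_reg3 - v->up1_reg3);
    v->up1_reg3 = v->up_reg3;
}

/* Hypothetical starting state; all other fields start at zero. */
PIDREG3 pid_example_state(void)
{
    PIDREG3 v = {0};
    v.pid_ref_reg3 = 1.0f;
    v.pid_fdb_reg3 = 0.5f;
    v.Kp_reg3 = 2.0f;
    v.Ki_reg3 = 0.5f;
    v.Kc_reg3 = 0.5f;
    v.Kd_reg3 = 0.25f;
    v.pid_out_max = 0.75f;
    v.pid_out_min = -0.75f;
    return v;
}
```

With this state, one call gives e = 0.5, up = 1.0, an output clamped to 0.75, saterr = -0.25, ui = 0.375 and ud = 0.25.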


linker cmd file:

MEMORY
{
PAGE 0 :
   /* For this example, H0 is split between PAGE 0 and PAGE 1 */
   /* BEGIN is used for the "boot to H0" bootloader mode      */
   /* RESET is loaded with the reset vector only if           */
   /* the boot is from XINTF Zone 7.  Otherwise reset vector  */
   /* is fetched from boot ROM. See .reset section below      */


   RAMM0      : origin = 0x000000, length = 0x000400
   BEGIN      : origin = 0x3F8000, length = 0x000002
  /* PRAMH0     : origin = 0x3F8002, length = 0x000FFE internal RAM */
 /*  PRAMH0     : origin = 0x100000, length = 0x03E800 external RAM */
   PRAMH0     : origin = 0x3F8002, length = 0x000FFE
   RESET      : origin = 0x3FFFC0, length = 0x000002


PAGE 1 :

   /* For this example, H0 is split between PAGE 0 and PAGE 1 */

   RAMM1    : origin = 0x000400, length = 0x000400
   DRAMH0   : origin = 0x3f9000, length = 0x001000
}


SECTIONS
{
   /* Setup for "boot to H0" mode:
      The codestart section (found in DSP28_CodeStartBranch.asm)
      re-directs execution to the start of user code.
      Place this section at the start of H0  */

   codestart        : > BEGIN,       PAGE = 0
   ramfuncs         : > PRAMH0,      PAGE = 0
   .text            : > PRAMH0,      PAGE = 0
   .cinit           : > PRAMH0,      PAGE = 0
   .pinit           : > PRAMH0,      PAGE = 0
   .switch          : > RAMM0,       PAGE = 0
   .reset           : > RESET,       PAGE = 0, TYPE = DSECT /* not used, */

   .stack           : > RAMM1,       PAGE = 1
   .ebss            : > DRAMH0,      PAGE = 1
   .econst          : > DRAMH0,      PAGE = 1
   .esysmem         : > DRAMH0,      PAGE = 1
}

You don't show the structure definition, but it appears to be floating
point -- is this so?  Did you trace down into each floating point call
and count instructions there, too?  Did you start from _outside_ the
function and count the clocks to get in, then back out (which should
take somewhat less than 600 clocks, to be sure!)?

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

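
A back-of-envelope sketch of why stepping over the floating-point calls hides most of the time. The operation counts are tallied by hand from the PID source in the original post. It assumes every field is a float, that each operation becomes one call, and that each visible single-step costs about one cycle; none of these is confirmed in the thread, and the type and function names are made up for illustration.

```c
/* Floating-point operations in one call of pid_reg3_calc, counted by
 * hand from the source in the original post. */
typedef struct {
    unsigned mul;     /* Kp*e, Ki*up, Kc*saterr, Kd*(up - up1) */
    unsigned addsub;  /* e, uprsat (2), saterr, ui (2), up - up1 */
    unsigned cmp;     /* worst case: both limit checks execute */
} FloatOps;

FloatOps pid_reg3_float_ops(void)
{
    FloatOps ops = {4u, 7u, 2u};
    return ops;
}

unsigned total_float_ops(FloatOps ops)
{
    return ops.mul + ops.addsub + ops.cmp;
}

/* ASSUMPTION: each visible single-step costs roughly one cycle, so the
 * remaining cycles were spent inside the stepped-over calls. */
unsigned avg_cycles_per_call(unsigned total_cycles,
                             unsigned visible_steps,
                             unsigned calls)
{
    return (total_cycles - visible_steps) / calls;
}
```

With 750 cycles measured, ~120 visible steps and up to 13 floating-point operations, avg_cycles_per_call(750, 120, 13) comes to 48 cycles per call. That is only a rough average; tracing into each call, as suggested, gives the real numbers.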
aha, yes, it is due to floating point. I hadn't realized, as I only
stepped over, not through, and the branch instruction didn't disassemble
correctly in Code Composer for some reason: the branch showed as a
'.word', not as an instruction,

anyway that explains it, thanks again, Tim!

IIRC you have to set up your 'GEL' file for the memory mode you're in --
the 28xx wakes up emulating a 24x or a 27xx and you have to tell it what
it is, but then the debugger needs to know, too.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com