[F2812 EZDSP] Inaccurate timing executing code relocated from Flash

Started by seankuay January 12, 2006
Hi Experts,

I am using the eZdsp board to drive the GPIO pins of a 
TMS320F2812 DSP controller running at 150 MHz.


PROBLEM SCENARIO:
===================
1. When I run from the JTAG emulator (executing completely in RAM), I 
can control the output timing of a GPIO pin very accurately, at 
6.7 ns per NOP operation (1 instruction). This means I get the full 
150 MHz as specified.
HOWEVER, when I burn my code into flash using the flash 
programmer tool in CCS v2, I no longer get the same 6.7 ns per 
cycle. I have copied the code from flash to H0 SARAM, as below:

//****in my C file..
#pragma CODE_SECTION(dataMgmt,"ramfuncs");
..
MemCopy(&RamfuncsLoadStart, &RamfuncsLoadEnd, &RamfuncsRunStart);
..

//****in the linker.cmd file,
ramfuncs : LOAD = FLASHA, 
RUN = RAMH0, 
LOAD_START(_RamfuncsLoadStart),
LOAD_END(_RamfuncsLoadEnd),
RUN_START(_RamfuncsRunStart),
PAGE = 0

What I get now is only 8.3 ns per NOP operation, which is roughly 120 
MIPS. I understand that running from flash is slower, but I have 
already copied the code from flash to H0 SARAM. I have confirmed 
this by checking the linker output map file.

// the linker output MAP file... 0x3f66xx is still in flash, 
// 0x3f8000 is the beginning of the DSP's internal RAM
003f664b _RamfuncsLoadStart
003f685e _RamfuncsLoadEnd
003f685e _PieVectTableInit
003f7ff8 _CsmPwl
003f8000 _RamfuncsRunStart
003f8000 _InitFlash
003f8016 _pieCDelay
003f801d _dataMgmt

FYI, I have enabled flash pipeline mode and also enabled the 
PLL.

//enable flash pipeline mode
FlashRegs.FOPT.bit.ENPIPE=1; 

//xclkout, sysclkout initialization 
SysCtrlRegs.PLLCR.all=0xa;
XintfRegs.XINTCNF2.bit.XTIMCLK=0x0; //if xtimclk=0, xtimclk=sysclk/1
XintfRegs.XINTCNF2.bit.CLKMODE=0x0; //if clkmode=0, xclkout is NOT divided


QUESTIONS:
===========
1. Is my way of relocating the code wrong?
2. What could cause this? Is this the expected behaviour? Why 
do I get the exact timing in JTAG emulation mode, but not when running 
from flash, even though I have relocated the code?
3. BTW, is there an easier way to relocate code? I want to 
relocate whole sections (e.g. .cinit, .text, .pinit) 
straight to RAM, rather than relocating function by 
function using #pragma.


Thank you very much.

On Wed, 11 Jan 2006 23:19:46 -0600, "seankuay" <seankuay@yahoo.com>
wrote in comp.dsp:

> //****in my C file..
> #pragma CODE_SECTION(dataMgmt,"ramfuncs");
> ..
> MemCopy(&RamfuncsLoadStart, &RamfuncsLoadEnd, &RamfuncsRunStart);

Surely this is not your real code... What is "MemCopy"??? It is not a standard C function, and as far as I know, it is not in the TI library either. It is certainly not in their application note. Go to TI's web site and download this application note: "Running an Application from Internal Flash Memory on the TMS320F28xx DSP (Rev. E)"

> QUESTIONS:
> ===========
> 1. Is my way of relocating the code wrong??

Yes, your way of relocating the code is very, very wrong, although I do not know if that is what is causing your problem or not. Your code should look like this (make sure to include <string.h>):

memcpy(&RamfuncsRunStart,
       &RamfuncsLoadStart,
       (&RamfuncsLoadEnd - &RamfuncsLoadStart) + 1);

Note that there is a mistake in some of the sample code in TI's application note: it does not show the final "+ 1" on the number of words to copy.

> 2. What can possibly cause this? Is it what it's supposed to be? why
> i can get the exact timing in jtag emulation mode, but not running
> from flash even though i have relocated the code?

If your copy operation is faulty, you could be overwriting anything, including the PLL control register, changing it from 10/2 to 8/2, which would change your actual execution speed from 150 MHz to 120 MHz.
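
The arithmetic behind the 150 MHz and 120 MHz figures can be checked with a small host-side sketch. It assumes a 30 MHz oscillator input (common on the eZdsp F2812, but not stated in this thread) and the F281x rule that SYSCLKOUT = OSCCLK x DIV / 2 for a non-zero PLLCR DIV value. The names OSCCLK_HZ, sysclkout_hz and cycle_ns are hypothetical helpers, not TI API.

```c
#include <assert.h>

/* Assumption: the board feeds a 30 MHz clock to the PLL (not stated in the thread). */
#define OSCCLK_HZ 30000000UL

/* F281x rule for a non-zero PLLCR DIV value (1..10): SYSCLKOUT = OSCCLK * DIV / 2. */
static unsigned long sysclkout_hz(unsigned long oscclk_hz, unsigned div)
{
    assert(div >= 1 && div <= 10);
    return oscclk_hz * div / 2;
}

/* Duration of one CPU cycle in nanoseconds. */
static double cycle_ns(unsigned long sysclk_hz)
{
    assert(sysclk_hz > 0);
    return 1e9 / (double)sysclk_hz;
}
```

Under these assumptions, DIV = 10 gives 150 MHz (about 6.67 ns per cycle) and DIV = 8 gives 120 MHz (about 8.33 ns per cycle), which matches the two NOP timings reported in the original post.
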

> 3. BTW, is there any other easier way to relocate code? I want to
> relocate the whole section, for eg. the .cinit, .text, .pinit or
> etc. straight away to RAM, rather than relocating function by
> function by using #pragma

I don't think you really want to allocate whole sections to internal RAM, and especially not sections like .cinit and .pinit that are only used once at start up and then are done forever. There is much less internal RAM than there is flash, so you generally only put interrupt service routines and other speed critical routines there.

But another way to do it is to put everything that you want to copy to internal RAM in one or more files that contain no code you don't want copied to RAM. Then you can use the linker .cmd file to place all code (or constants or bss or data) from those files into a section to be copied:

ramfuncs : LOAD = FLASHA,
RUN = RAMH0,
LOAD_START(_RamfuncsLoadStart),
LOAD_END(_RamfuncsLoadEnd),
RUN_START(_RamfuncsRunStart),
PAGE = 0

Code_To_Copy : LOAD = FLASH_SECTION,
RUN = RAMH0,
LOAD_START(_Code_To_Copy_loadstart),
LOAD_END(_Code_To_Copy_loadend),
RUN_START(_Code_To_Copy_runstart),
PAGE = 0
{
file1.obj (.text)
file2.obj (.text)
}

Again, download the TI application note I mentioned, from this page:

http://focus.ti.com/dsp/docs/dspsupporttechdocsc.tsp?sectionId=3&tabId=409&familyId=510&abstractName=spra958e

There is also a link on that page to download sample code and projects that use the features of the app note.

-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

>QUESTIONS:
>===========
>1. Is my way of relocating the code wrong??

No, your way of relocating code is perfectly fine. In response to Jack Klein, the function MemCopy is defined as:

void MemCopy(Uint16 *SourceAddr, Uint16* SourceEndAddr, Uint16* DestAddr)
{
    while(SourceAddr < SourceEndAddr)
    {
        *DestAddr++ = *SourceAddr++;
    }
    return;
}

You also correctly verified that the function was running from RAM by looking at the .map file:

003f8000 _RamfuncsRunStart
003f8000 _InitFlash
003f8016 _pieCDelay
003f801d _dataMgmt

As you can see, the RAM functions are indeed located in RAM (as InitFlash must be, by the way). As an aside, I found that the #pragma declaration for each function must appear in the file in which the function is actually defined.

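
As a rough host-side check of the MemCopy loop quoted above, the sketch below runs the same loop over ordinary arrays. In the original poster's map file, _RamfuncsLoadEnd sits at the same address as the next symbol (_PieVectTableInit), which suggests that, in that build at least, the end symbol points one word past the section, so the `<` comparison copies every word without an extra "+ 1". The Uint16 typedef, words_to_copy and copy_leaves_guard_intact are stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Uint16;  /* stand-in for the TI typedef; 16-bit words as on the C28x */

/* Same loop as the MemCopy quoted above: copies [src, src_end), end exclusive. */
static void MemCopy(const Uint16 *src, const Uint16 *src_end, Uint16 *dst)
{
    assert(src <= src_end);
    while (src < src_end) {
        *dst++ = *src++;
    }
}

/* Number of words MemCopy moves for a given pair of load addresses. */
static size_t words_to_copy(uint32_t load_start, uint32_t load_end)
{
    assert(load_start <= load_end);
    return (size_t)(load_end - load_start);
}

/* Hypothetical check: fake "flash" and "RAM" arrays, with a guard word after the RAM copy. */
static int copy_leaves_guard_intact(void)
{
    Uint16 flash[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    Uint16 ram[5]   = { 0, 0, 0, 0, 0xBEEF };   /* ram[4] is the guard word */

    MemCopy(flash, flash + 4, ram);
    return ram[0] == 0x1111 && ram[3] == 0x4444 && ram[4] == 0xBEEF;
}
```

With the addresses from the map file, words_to_copy(0x3f664b, 0x3f685e) is 0x213 words, and the guard word after the destination stays untouched, so the loop does not run past the end of the copied region.
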

>2. What can possibly cause this? Is it what it's supposed to be? why
>i can get the exact timing in jtag emulation mode, but not running
>from flash even though i have relocated the code?

The most likely cause of your timing difference is that the place you are calling your RAM function from is still running from FLASH. If you really want to test out speed differences, you should put this within your RAM function dataMgmt:

for(;;){
    <toggle DSP output>
}

Hooking a scope to the DSP output should give you a square wave of a certain frequency when running from the debugger. If you are correctly running from RAM, then when your program is burned to the FLASH (and your function copied to RAM) you should see the exact same frequency of the DSP output toggling.
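
To compare the two scope readings, it can help to turn them into numbers. The sketch below is host-side arithmetic only: it assumes each pass through the loop takes a fixed number of CPU cycles and toggles the pin once, so one square-wave period spans two passes. The cycle count used in the usage note is a made-up value, since the real count depends on the compiled loop, and both function names are hypothetical.

```c
#include <assert.h>

/* One full square-wave period needs two toggles, so
 * f_out = f_sysclk / (2 * cycles_per_toggle). */
static double toggle_freq_hz(double sysclk_hz, unsigned cycles_per_toggle)
{
    assert(cycles_per_toggle > 0);
    return sysclk_hz / (2.0 * cycles_per_toggle);
}

/* Ratio above 1 means the flash build toggles slower than the debugger (RAM) build. */
static double slowdown_ratio(double freq_ram_build_hz, double freq_flash_build_hz)
{
    assert(freq_flash_build_hz > 0.0);
    return freq_ram_build_hz / freq_flash_build_hz;
}
```

For example, with a hypothetical 5 cycles per toggle at 150 MHz the pin would show 15 MHz. A flash build that measured 12 MHz instead would give a ratio of 1.25, the same factor as 8.3 ns versus 6.7 ns per NOP.
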