Reply by Nils, October 22, 2008
johnstokes wrote:
> When I write a simple linear assembler function it takes up 32
> cycles. I know this from viewing the mixed asm output; I also know that, for
> example, nop takes up 5 cycles, add takes up 1, etc.
>
> But when I profile the program it takes up 81 cycles on the highest
> compiler optimisation setting (-o03). I am clearly only profiling the
> function being called, so why are there so many more CPU cycles?
Caching. If you measure the performance on real hardware, memory latency can make a major difference. There are multiple ways around this:

1. Put the code and the data into internal memory.
2. Run your test twice with the same parameters. The first run moves the code into the code cache and possibly the data into the data cache. The data has to fit into the cache (e.g. don't run your test on hundreds of kilobytes).

I have no experience with the 6713, but the C64x+ does not look that different from an architectural point of view. On that machine a cache miss can take up to 66 cycles, even with very fast RAM. It may or may not apply to your platform, but I just wanted to give you a number so you can see how much of a difference it can make in practice.

Btw, regarding the cache: check the allocation strategy of your first-level cache. If it's set to read-allocate and you only write to a chunk of memory, you're bypassing the cache and performance will suffer a lot. I've seen speed-ups of 30% and more just by doing dummy loads on the memory before writing to it.
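The two suggestions above (time the same call twice, and do dummy loads before writing) could look roughly like the sketch below. It is plain C meant only as an illustration: `read_ticks`, `sum_buffer`, `time_twice`, `pretouch` and the `LINE_BYTES` value are all made-up names and numbers, and `clock()` is just a portable stand-in for whatever hardware timer or profiler your board offers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Assumed cache line size in bytes; check your device's data sheet. */
#define LINE_BYTES 32

/* Hypothetical stand-in for a platform cycle counter. On a real board you
   would read a hardware timer here; clock() only keeps the sketch portable. */
static uint32_t read_ticks(void)
{
    return (uint32_t)clock();
}

/* Example kernel under test: sums a small buffer that fits in the cache. */
static int32_t sum_buffer(const int16_t *buf, size_t n)
{
    int32_t acc = 0;
    size_t i;
    for (i = 0; i < n; i++)
        acc += buf[i];
    return acc;
}

typedef struct {
    uint32_t cold_ticks; /* first run: code and data may miss the cache */
    uint32_t warm_ticks; /* second run: same parameters, caches warmed  */
    int32_t  result;     /* value returned by the second run            */
} timing_t;

/* Times the same call twice with identical parameters. */
static timing_t time_twice(const int16_t *buf, size_t n)
{
    timing_t t;
    uint32_t start;
    volatile int32_t sink; /* keeps the compiler from dropping the calls */

    assert(buf != NULL);

    start = read_ticks();
    sink = sum_buffer(buf, n);
    t.cold_ticks = read_ticks() - start;

    start = read_ticks();
    sink = sum_buffer(buf, n);
    t.warm_ticks = read_ticks() - start;

    t.result = sink;
    return t;
}

/* Dummy loads: read one byte per assumed cache line so that a
   read-allocate cache brings the lines in before they are written. */
static void pretouch(const volatile uint8_t *p, size_t bytes)
{
    size_t i;

    assert(p != NULL);
    for (i = 0; i < bytes; i += LINE_BYTES)
        (void)p[i]; /* volatile read; the value itself is discarded */
}
```

You would call `time_twice(buf, n)` and compare `cold_ticks` with `warm_ticks`, and call `pretouch((const volatile uint8_t *)dst, size)` just before a write-only loop over `dst`. Whether the pre-touch helps depends on the cache's allocation policy, so measure it rather than assume.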
Reply by Jerry Avins, October 21, 2008
johnstokes wrote:
> hi
> I am new to DSP so please be patient :))
>
> I am using the TI C6713 DSK board. I am looking at cycle comparison in C
> and linear assembler. I am using the simulator and also real-time
> implementation. I write main in C and call C and linear assembler
> functions through it.
>
> When I write a simple linear assembler function it takes up 32
> cycles. I know this from viewing the mixed asm output; I also know that, for
> example, nop takes up 5 cycles, add takes up 1, etc.
>
> But when I profile the program it takes up 81 cycles on the highest
> compiler optimisation setting (-o03). I am clearly only profiling the
> function being called, so why are there so many more CPU cycles?
Can't you see the compiler's assembly output?
> Is it due to the process of calling the function? Putting variables on
> the stack, etc.? I get similar results on the real-time implementation and
> on the simulator.
How do you suppose that someone learns the answer?

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by johnstokes, October 21, 2008
hi
I am new to DSP so please be patient :))

I am using the TI C6713 DSK board. I am looking at cycle comparison in C
and linear assembler. I am using the simulator and also real-time
implementation. I write main in C and call C and linear assembler
functions through it.

When I write a simple linear assembler function it takes up 32
cycles. I know this from viewing the mixed asm output; I also know that, for
example, nop takes up 5 cycles, add takes up 1, etc.

But when I profile the program it takes up 81 cycles on the highest
compiler optimisation setting (-o03). I am clearly only profiling the
function being called, so why are there so many more CPU cycles?

Is it due to the process of calling the function? Putting variables on
the stack, etc.? I get similar results on the real-time implementation and
on the simulator.

thanks
john