----- Original Message -----
From: Yong Yang
Sent: Monday, April 26, 2004 3:03 PM
Subject: Re: [c6x] How to speed up profiling
Hi Ganesh,
Please see the answers below.

What are the functions that you are trying to profile?
All major functions in the encoder.

[Ganesh] If you profile all functions, are you profiling on a simulator or on the board? I guess the board, from your answer below. In that case, you can profile only one function at a time. You need to use a timing function, either clock() or the CSL (chip support library) TIMER_getCount(). If you want an estimate of the breakdown across your individual functions, then profile a small-resolution image on a simulator.

[yong] I am profiling on the board. Since the profiler tool already provides a cycle count, why do I need to use clock() or TIMER_getCount()? Do you mean I should not use the profiler tool, and should instead use hand-made code to count the time consumed by each function?
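
The "hand-made" timing that the question refers to can be sketched in standard C as below. This is only an illustration under stated assumptions: encode_block_stub() and profile_ticks() are hypothetical names, not part of the encoder or of any TI library, and what one clock() tick means on the target depends on the toolchain's run-time support, so that needs checking before the numbers are trusted.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Written through a volatile so the compiler cannot drop the work. */
static volatile unsigned long sink;

/* Hypothetical stand-in for one encoder function under test. */
static void encode_block_stub(void)
{
    unsigned long acc = 0;
    int i;
    for (i = 0; i < 1000; i++) {
        acc += (unsigned long)i * 3u;
    }
    sink = acc;
}

/* Time `iterations` calls of fn and return the elapsed clock() ticks,
 * minus the measured cost of the two clock() calls themselves. */
static clock_t profile_ticks(void (*fn)(void), int iterations)
{
    clock_t t0, t1, overhead, elapsed;
    int i;

    assert(fn != NULL && iterations > 0);

    /* Back-to-back calls estimate the measurement overhead. */
    t0 = clock();
    t1 = clock();
    overhead = t1 - t0;

    t0 = clock();
    for (i = 0; i < iterations; i++) {
        fn();
    }
    t1 = clock();

    elapsed = t1 - t0;
    return (elapsed > overhead) ? (elapsed - overhead) : 0;
}
```

Calling profile_ticks(encode_block_stub, 100) and dividing the result by 100 gives an average cost per call. Only one function is wrapped at a time, which matches the "one function at a time" advice above.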

What are your project settings?
Function Profile Debug, Speed Most Critical, Opt Level: File, Program Level Opt: No External Var Refs, RTS Modifications: Defns No Funcs, Memory Models: Far Calls & Data, RTS Calls: Use Memory Model

[Ganesh] Ideally you should run your code with file-level optimization, -o3. When you are profiling, you should ideally run in release mode with no debug information whatsoever.

[yong] Yes, I run my code with file-level optimization, -o3. However, the profiler tool needs the Function Profile Debug information, so I don't think I can profile in release mode with no debug information.

What is your memory allocation pattern?
ISRAM base: 0x0, length: 40000, heap size: 0x20000
SDRAM base: 0x80000000, length: 0x5000000, heap size: 0x3000000
All code and data are loaded to SDRAM, with 256 KB of L2 cache enabled.

[Ganesh] You are using the DM642, which has only 256 KB of internal memory. From your statement, I guess you aren't using L2 ISRAM, or are you? In any case, you should think about using ISRAM judiciously.

[yong] What's your recommendation for achieving the highest performance (speed)? 256 KB of L2 cache plus 0 ISRAM, or another combination, such as 192 KB of ISRAM plus 64 KB of cache?

What is the frequency of your DSP?
DM642, 600 MHz.

[Ganesh] Are you using the C6416 TEB or the DM642? I ask because you have specified an ISRAM address while also claiming 256 KB of cache, which isn't possible on the DM642.

[yong] I'm using the DM642 EVM. I specified the ISRAM address in the DSP/BIOS config file, while setting 256 KB of cache in code with CACHE_setL2Mode(CACHE_256KCACHE). Maybe that's a mistake and I should make ISRAM + L2 cache = 256 KB. Is that right?
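
The constraint being discussed is that cache and ISRAM come out of the same 256 KB of internal memory. A small host-side sketch of that arithmetic follows. The helper l2_sram_kb() is hypothetical, and the list of selectable cache sizes (0, 32, 64, 128, 256 KB) is an assumption to verify against the device documentation; only the 256 KB total comes from the thread.

```c
#include <assert.h>
#include <stdio.h>

#define L2_TOTAL_KB 256  /* DM642 internal memory size quoted in the thread */

/* Return the ISRAM left over for a given L2 cache size in KB, or -1 if
 * the cache size is not one of the assumed selectable modes. */
static int l2_sram_kb(int cache_kb)
{
    /* Assumed mode list; check it against the device data sheet. */
    static const int modes_kb[] = { 0, 32, 64, 128, 256 };
    size_t i;

    for (i = 0; i < sizeof modes_kb / sizeof modes_kb[0]; i++) {
        if (modes_kb[i] == cache_kb) {
            /* Cache and ISRAM share one pool, so they sum to the total. */
            assert(L2_TOTAL_KB - cache_kb >= 0);
            return L2_TOTAL_KB - cache_kb;
        }
    }
    return -1;
}
```

Under these assumptions, l2_sram_kb(256) leaves 0 KB of ISRAM, which is why a 256 KB cache setting cannot coexist with an ISRAM section that holds a heap, while l2_sram_kb(64) gives the 192 KB plus 64 KB split mentioned earlier.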

Have you optimized your code or are you trying to cross-compile the code?
Optimized on a Pentium 3.2 GHz PC, speed around 80 fps. Now on the DSP it runs at only 2 fps; I need real-time 15 fps.

[Ganesh] You need to go through the "optimizing C code for C6416" document, as well as hand-coding some low-level functions.

[yong] How much do you think code optimization, such as linear assembly, software pipelining, etc., can improve the speed? Is it possible to move from 2 fps to 15 fps? Otherwise, should I do some algorithm optimization before code optimization?
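
To put the 2 fps versus 15 fps gap in numbers, the per-frame cycle budget can be worked out from the figures in the thread. This is a rough sketch that assumes the DSP runs at the full 600 MHz and that the encoder is the only load; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles available per frame at a given clock rate and frame rate. */
static double cycles_per_frame(double clock_hz, double fps)
{
    assert(clock_hz > 0.0 && fps > 0.0);
    return clock_hz / fps;
}

/* Factor by which the per-frame cost must shrink to reach target_fps. */
static double required_speedup(double current_fps, double target_fps)
{
    assert(current_fps > 0.0 && target_fps > 0.0);
    return target_fps / current_fps;
}
```

With these inputs, cycles_per_frame(600e6, 2.0) is 300 million cycles per frame today, cycles_per_frame(600e6, 15.0) is a budget of 40 million, and required_speedup(2.0, 15.0) is 7.5. Whether code-level optimization alone can deliver a 7.5x reduction is exactly the open question in the message.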

How are you profiling? Are you using clock() functions or the TIMER module?
Using the profiler tool: under menu->Start New Session, then select the profile area. No clock() functions or TIMER module.

[Ganesh] Answered above.