Reply by Yong Yang, April 26, 2004
Hi Ganesh,
 
Thanks for your answer.  I need some clarification. Pls see them embedded below (in red).
 
Thanks
 
Yong


Ganesh Vijayan <g...@emuzed.com> wrote:
Hi Yong,
Find my answers embedded in your mail below:
----- Original Message -----
From: Yong Yang
Sent: Monday, April 26, 2004 3:03 PM
Subject: Re: [c6x] How to speed up profiling

Hi, Ganesh

Pls see the answers below,

What are the functions that you are trying to profile ?

All major functions in the encoder.
[Ganesh] If you are profiling all functions, are you profiling on the simulator or on the board? I guess the board, from your answer below. In that case, you can profile only one function at a time. You need to use either clock() or the CSL (Chip Support Library) function TIMER_getCount(). If you want an estimate of the per-function breakdown, profile a small-resolution image on the simulator.
[yong] I am profiling on the board. Since the profiler tool already provides cycle counts, why do I need to use clock() or TIMER_getCount()? Do you mean I should not use the profiler tool, but instead use hand-written code to measure the time consumed by each function?
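As an illustration of the hand-instrumented approach discussed here, the following is a minimal sketch of timing a single function with the standard C clock() call. The function encode_block_stub() and the helper time_function() are hypothetical names made up for this example, not part of the encoder or of any TI library. How clock() ticks relate to CPU cycles depends on the target and tool setup, so treat the result as relative unless that mapping has been confirmed.

```c
#include <time.h>

/* Hypothetical stand-in for one encoder function under test. */
static volatile unsigned long sink;

static void encode_block_stub(void)
{
    unsigned long i, acc = 0;

    for (i = 0; i < 100000UL; i++) {
        acc += i * i;
    }
    sink = acc; /* store the result so the loop is not optimized away */
}

/*
 * Hypothetical helper: returns the clock() ticks spent in `runs` calls
 * of fn, minus the measured cost of two back-to-back clock() calls.
 */
static clock_t time_function(void (*fn)(void), int runs)
{
    clock_t t0, t1, overhead, elapsed;
    int i;

    /* Measure the overhead of the timing calls themselves. */
    t0 = clock();
    t1 = clock();
    overhead = t1 - t0;

    /* Time the function under test. */
    t0 = clock();
    for (i = 0; i < runs; i++) {
        fn();
    }
    t1 = clock();

    elapsed = t1 - t0;
    return (elapsed > overhead) ? (elapsed - overhead) : 0;
}
```

A call such as time_function(encode_block_stub, 10) gives the total ticks for ten runs; dividing by the run count gives a per-call average. Only one function is instrumented at a time, which matches the one-function-at-a-time limitation mentioned for on-board profiling.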

What are your project settings ?

Function Profile Debug, Speed Most Critical, Opt Level: File, Program Level Opt: No External Var Refs, RTS Modifications: Defns No Funcs, Memory Models: Far Calls & Data, RTS Calls: Use Memory Model
[Ganesh] Ideally you should build your code with file-level optimization, -o3. When you are profiling, you should ideally run in release mode with no debug information whatsoever.
[yong] Yes, I build my code with file-level optimization, -o3. However, the profiler tool needs Function Profile Debug information, so I don't think I can profile in release mode with no debug information.

What is your memory allocation pattern ?

ISDRAM base:0x0, length:40000, heap size :0x20000

SDRAM base:0x80000000, length:0x5000000, heap size: 0x3000000

All code and data are loaded into SDRAM; L2 cache 256k enabled.
[Ganesh] You are using the DM642, which has only 256 KB of internal memory. From your statement, I guess you aren't using L2 ISRAM, or are you? In any case, you should think about using ISRAM judiciously.
[yong] What is your recommendation for achieving the highest performance (speed)? 256k L2 cache plus 0 ISRAM, or another combination, such as 192k ISRAM plus 64k cache?

What is the frequency of your DSP ?

DM642, 600 MHz.
[Ganesh] Are you using the C6416 TEB or the DM642? I ask because you have specified an ISRAM address while also claiming 256 KB of cache, which isn't possible on the DM642.
[yong] I'm using the DM642 EVM. I specified the ISRAM address in the DSP/BIOS config file, while claiming 256 KB of cache in code with CACHE_setL2Mode(CACHE_256KCACHE). Maybe that is a mistake and I should make ISRAM + L2 cache = 256k, is that right?
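The ISRAM + L2 cache = 256k relationship raised here can be sketched as a small consistency check. This is only an illustration: the helper l2_sram_kb() is a hypothetical name, and the list of cache sizes (0, 32, 64, 128, 256 KB) is an assumption that should be checked against the device data manual and the CSL CACHE module documentation.

```c
#define L2_TOTAL_KB 256 /* DM642 internal memory size, as stated in the thread */

/*
 * Hypothetical helper: given the amount of L2 configured as cache,
 * return how many KB remain as mapped internal SRAM (ISRAM).
 * Returns -1 if cache_kb is not one of the assumed cache sizes.
 */
static int l2_sram_kb(int cache_kb)
{
    /* Assumed set of selectable L2 cache sizes, in KB. */
    static const int modes[] = { 0, 32, 64, 128, 256 };
    unsigned int i;

    for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (modes[i] == cache_kb) {
            return L2_TOTAL_KB - cache_kb;
        }
    }
    return -1;
}
```

Under these assumptions, l2_sram_kb(64) gives 192, the "192k ISRAM plus 64k cache" split mentioned earlier, and l2_sram_kb(256) gives 0, meaning no internal memory is left for an ISRAM section when all of L2 is cache.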

Have you optimized your code or are you trying to cross-compile the code ?

Optimized on a Pentium 3.2 GHz PC, speed around 80 fps. Now on the DSP only 2 fps; I need real-time 15 fps.
[Ganesh] You need to go through the document on optimizing C code for the C6416, as well as hand-code some low-level functions.
[yong] How much do you think code optimization such as linear assembly, software pipelining, etc. can improve the speed? Is it possible to move from 2 fps to 15 fps? Otherwise, should I do some algorithm optimization before code optimization?
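To put the 2 fps versus 15 fps gap in numbers, here is a minimal arithmetic sketch using only the figures quoted in this thread (600 MHz clock, 2 fps measured, 15 fps target). The function names are hypothetical, and the calculation assumes every CPU cycle is available to the encoder, which a real system will not quite achieve.

```c
/* Cycles available (or consumed) per frame at a given clock rate and frame rate. */
static unsigned long cycles_per_frame(unsigned long cpu_hz, unsigned long fps)
{
    return cpu_hz / fps;
}

/* Speedup factor needed to move from the current frame rate to the target. */
static double required_speedup(double current_fps, double target_fps)
{
    return target_fps / current_fps;
}
```

With these figures, cycles_per_frame(600000000UL, 15UL) is 40,000,000 cycles per frame as the budget, cycles_per_frame(600000000UL, 2UL) is 300,000,000 cycles per frame as the current cost, and required_speedup(2.0, 15.0) is 7.5. Comparing per-function cycle counts against the 40-million-cycle budget shows which functions have to shrink the most.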

How are you profiling ? Are you using clock() functions or TIMER module ?

Using the profiler tool: under the menu, Start New Session, then select the profile area. No clock() functions or TIMER module.
[Ganesh] Answered above.