DSPRelated.com
Forums

How to speed up profiling

Started by Yong Yang April 23, 2004
Hi, all
 
I am profiling my video encoder program on the TI DM642 EVM. It's extremely slow: 17 hours have passed and it has not encoded even one frame! Is there any way to speed it up?
 
Thank you
Yong



Hello Yong,
Are you building the project with the "Full Symbolic Debug" option?
Switching the project option to "Function Profile Debug" will make it a bit faster, and the compiler also optimizes better with the symbols stripped.
 
Amrut




Yong,
 
Beware of what you profile.  For each 'item' [normally a function] that you profile, multiple breakpoints are set, the target is run to a breakpoint, data is collected and the process is repeated.
 
If you profile several 'inner loop' functions, you will slow the actual execution down to "almost stopped", especially if you try to profile 'everything'. Instead, try to break things down and profile a few items at a time, or use the simulator if you can: this is one area where the simulator "blows away" the hardware in execution speed.
 
mikedunn
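Mike's advice to profile only a few items at a time also applies when the timing code is compiled into the program rather than set up in the CCS profiler. Below is a minimal C sketch that assumes hypothetical function IDs and a compile-time mask, none of which come from the thread. Only the functions whose bits are set in PROF_MASK get instrumented in a given build, so inner loops that are not selected run at full speed. A plain call counter stands in for the real timer measurement.

```c
/* Hypothetical function IDs for an encoder; the names are illustrative only. */
enum prof_id { PROF_MOTION_EST, PROF_DCT, PROF_QUANT, PROF_VLC, PROF_COUNT };

/* Only the functions whose bits are set here are measured in this build.
   Rebuild with a different mask to cover the next few functions. */
#ifndef PROF_MASK
#define PROF_MASK ((1u << PROF_MOTION_EST) | (1u << PROF_DCT))
#endif

#define PROF_ENABLED(id) (((PROF_MASK) >> (id)) & 1u)

/* Per-function call counters; a real build would attach a timer read here. */
static unsigned long prof_calls[PROF_COUNT];

/* PROF_MASK is a compile-time constant, so the compiler can drop the
   body entirely for functions that are not selected. */
#define PROF_HIT(id)                                  \
    do {                                              \
        if (PROF_ENABLED(id)) prof_calls[(id)]++;     \
    } while (0)

/* Stand-ins for encoder functions, each instrumented with one PROF_HIT. */
void motion_est(void) { PROF_HIT(PROF_MOTION_EST); }
void dct(void)        { PROF_HIT(PROF_DCT); }
void quant(void)      { PROF_HIT(PROF_QUANT); }

/* Read back the count for one function after the run. */
unsigned long prof_count(enum prof_id id) { return prof_calls[id]; }
```

With the default mask, calls to quant() are not counted at all. Building with, for example, -DPROF_MASK=4u would select quant() alone for the next run.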




Note: If you do a simple "reply" with your email client, only the author of this message will receive your answer.  You need to do a "reply all" if you want your answer to be distributed to the entire group.

_____________________________________
About this discussion group:

To Join:  Send an email to c...@yahoogroups.com

To Post:  Send an email to c...@yahoogroups.com

To Leave: Send an email to c...@yahoogroups.com

Archives: http://www.yahoogroups.com/group/c6x

Other Groups: http://www.dsprelated.com





Yong Yang-

> I am profiling my video encoder program on TI DM642 EVM. It's extremely slow. 17
> hours has passed but it even has not encoded one frame! Any way to speed it up?

I'd be happy to help you with some advice, since we are also working with the DM642 EVM board.

It sounds like you are stuck on basic things that need to be fixed before you can even think about video encoder performance.

Could you answer my previous questions first? I need to know whether you could get a simple loopback working without your video codec. I cannot help you unless you answer my questions.

-Jeff




Hi Yong,

One method (suggested by TI people) that I have
started using for my video decoder is to read the timer
register and do intrusive instrumentation of the major
functions of the code. The timer adds a slight overhead,
but overall the code runs at about the same speed as
without profiling. You can also run the code without
any debug option (e.g. Function Profile Debug), so it
is as optimized as the final version.
Regards
Piyush


=====
**************************************
And---"A blind Understanding!" Heav'n replied.

Piyush Kaul
http://www.geocities.com/piyushkaul

__________________________________
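Piyush's timer-based instrumentation can be sketched roughly as follows. This is a minimal sketch under stated assumptions. read_timer() is a hypothetical hook. On the DM642 it would read a free-running hardware timer, for example through the CSL TIMER module, whose TIMER_getCount() Ganesh mentions later in the thread. On a PC it falls back to the standard clock() so the code compiles and runs. Each begin/end pair costs only two timer reads and a few adds, so, unlike breakpoint-based profiling, it does not halt the target. The section IDs and MAX_SECTIONS are illustrative.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical timer hook. On the board this would read a free-running
   hardware timer; on a host PC we fall back to clock() so the sketch runs. */
static uint32_t read_timer(void)
{
    return (uint32_t)clock();
}

#define MAX_SECTIONS 16

typedef struct {
    uint32_t start;   /* timer value at the most recent prof_begin() */
    uint32_t total;   /* accumulated ticks across all calls */
    uint32_t calls;   /* number of completed begin/end pairs */
} prof_section;

static prof_section sections[MAX_SECTIONS];

/* Unsigned 32-bit subtraction gives the right elapsed count even if the
   counter wrapped once, assuming it counts through the full 32-bit range. */
static inline uint32_t prof_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Record the start time of section id (0 <= id < MAX_SECTIONS). */
static inline void prof_begin(int id)
{
    sections[id].start = read_timer();
}

/* Accumulate the time since the matching prof_begin(id). */
static inline void prof_end(int id)
{
    uint32_t now = read_timer();
    sections[id].total += prof_elapsed(sections[id].start, now);
    sections[id].calls++;
}
```

Call prof_begin(id) at the top of a function and prof_end(id) before each return. Then dump sections[] once, at the end of the run, rather than writing anything per call. A timer with a period shorter than 2^32 needs different wrap handling.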



Hi, Jeff
 
The simple loopback is working. Actually I took it from a TI sample program for the DM642 EVM called "scaling", which captures video and outputs it to a TV. I simply redirected the output to the internal buffer of my encoder. Should I remove all the display code?

You can find this sample in C:\ti\boards\evmdm642\examples\video\driver\ if you installed CCS in C:\ti\.
 
Thanks
Yong




Hi Yong,
As Mike Dunn rightly wrote, beware of what you profile. To help you profile your code better, please tell me the following:
  • What functions are you trying to profile?
  • What are your project settings?
  • What is your memory allocation pattern?
  • What is the frequency of your DSP?
  • Have you optimized your code, or are you only cross-compiling it?
  • How are you profiling? Are you using the clock() function or the TIMER module? If so, can you share your settings with us? I guess the 3rd answer should be a true indicator of your encoder's performance.
If you can shed some light on the above, it will help pinpoint your problem.
Hope this helps.
Ganesh
----- Original Message -----
From: Yong Yang
To: Jeff Brower
Cc: c...@yahoogroups.com ; C...@yahoogroups.com
Sent: Monday, April 26, 2004 8:58 AM
Subject: Re: [c6x] How to speed up profiling









Hi Yong,
Find my answers embedded in your mail below:
----- Original Message -----
From: Yong Yang
Sent: Monday, April 26, 2004 3:03 PM
Subject: Re: [c6x] How to speed up profiling

Hi, Ganesh

Please see the answers below:

What functions are you trying to profile?

All major functions in the encoder. [Ganesh] If you profile all functions, are you profiling on the simulator or on the board? I guess the board, from your answer below, in which case you can profile only one function at a time. You need to use timing functions: either clock() (from the run-time support library) or the CSL (Chip Support Library) function TIMER_getCount(). If you want an estimate of how the time breaks down across individual functions, then profile a small-resolution image on the simulator.

What are your project settings?

Function Profile Debug, Speed Most Critical, Opt Level: File, Program Level Opt: No External Var Refs, RTS Modifications: Defns no Funcs, Memory Models: Far Calls & Data, RTS Calls: Use Memory Model [Ganesh] Ideally you should build your code with file-level optimization, -o3. When you are profiling, you should ideally run a release build with no debug information whatsoever.

What is your memory allocation pattern?

ISRAM base: 0x0, length: 40000, heap size: 0x20000

SDRAM base:0x80000000, length:0x5000000, heap size: 0x3000000

All code and data are loaded into SDRAM, with 256K of L2 cache enabled. [Ganesh] You are using the DM642, which has only 256 KB of internal memory. From your statement, I guess you aren't using L2 ISRAM, or are you? In any case, you should use ISRAM judiciously.

What is the frequency of your DSP?

DM642, 600 MHz [Ganesh] Are you using a C6416 TEB or a DM642? I ask because you have specified an ISRAM address while also claiming a 256 KB cache, which isn't possible on the DM642.

Have you optimized your code, or are you only cross-compiling it?

Optimized on a 3.2 GHz Pentium PC, the speed is around 80 fps. On the DSP it is only 2 fps, and I need real-time 15 fps. [Ganesh] You need to go through the documentation on optimizing C code for the C6416, as well as hand-code some low-level functions.

How are you profiling? Are you using the clock() function or the TIMER module?

Using the profiler tool: menu -> Start New Session, then select the profile area. No clock() function or TIMER module. [Ganesh] Answered above.




Hi Yong,
Find my answers below, marked [Ganesh].
Hope that helps.
Ganesh
----- Original Message -----
From: Yong Yang
Subject: Re: [c6x] How to speed up profiling

Hi Ganesh,
 
Thanks for your answer. I need some clarification. Please see my questions embedded below, marked [yong].
 
Thanks
 
Yong


Ganesh Vijayan <g...@emuzed.com> wrote:
Hi Yong,
Find my answers embedded in your mail below:
----- Original Message -----
From: Yong Yang
Sent: Monday, April 26, 2004 3:03 PM
Subject: Re: [c6x] How to speed up profiling

Hi, Ganesh

Please see the answers below:

What functions are you trying to profile?

All major functions in the encoder. [Ganesh] If you profile all functions, are you profiling on the simulator or on the board? I guess the board, from your answer below, in which case you can profile only one function at a time. You need to use timing functions: either clock() (from the run-time support library) or the CSL (Chip Support Library) function TIMER_getCount(). If you want an estimate of how the time breaks down across individual functions, then profile a small-resolution image on the simulator. [yong] I am profiling on the board. Since the profiler tool already provides a cycle count, why do I need to use clock() or TIMER_getCount()? Do you mean I should not use the profiler tool, but instead use my own code to measure the time consumed by each function? [Ganesh] If you are profiling on the board, my experience is that the data it reports may not be correct. To get correct values, I would suggest using the CSL functions. Ideally, both the CCS profiler and clock() use the same APIs, but I have one observation: if you profile short functions, clock() from CCS and the TIMER will give you the same numbers, but if you profile huge functions, e.g. the entire application, then the TIMER is a better bet.

What are your project settings?

Function Profile Debug, Speed Most Critical, Opt Level: File, Program Level Opt: No External Var Refs, RTS Modifications: Defns no Funcs, Memory Models: Far Calls & Data, RTS Calls: Use Memory Model [Ganesh] Ideally you should build your code with file-level optimization, -o3. When you are profiling, you should ideally run a release build with no debug information whatsoever. [yong] Yes, I build my code with file-level optimization, -o3. However, the profiler tool needs Function Profile Debug information, so I don't think I can profile a release build with no debug information. [Ganesh] Yes, you are correct: when you profile individual functions, you need the Function Profile Debug setting.

What is your memory allocation pattern?

ISRAM base: 0x0, length: 40000, heap size: 0x20000

SDRAM base:0x80000000, length:0x5000000, heap size: 0x3000000

All code and data are loaded into SDRAM, with 256K of L2 cache enabled. [Ganesh] You are using the DM642, which has only 256 KB of internal memory. From your statement, I guess you aren't using L2 ISRAM, or are you? In any case, you should use ISRAM judiciously. [yong] What do you recommend to achieve the highest performance (speed)? 256K of L2 cache plus 0 ISRAM, or another combination such as 192K of ISRAM plus 64K of cache? [Ganesh] Read my earlier statement. You need to use your L2 "judiciously". I can only suggest a combination of ISRAM and cache for better performance, but you are the ultimate judge.

What is the frequency of your DSP?

DM642, 600 MHz [Ganesh] Are you using a C6416 TEB or a DM642? I ask because you have specified an ISRAM address while also claiming a 256 KB cache, which isn't possible on the DM642. [yong] I'm using the DM642 EVM. I specified the ISRAM address in the DSP/BIOS config file, while claiming a 256 KB cache in code via CACHE_setL2Mode(CACHE_256KCACHE). Maybe that's a mistake and I should make ISRAM + L2 cache = 256K, is that right? [Ganesh] Yes, on the DM642 your total L2 is 256K, which must be shared between cache and ISRAM.
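Ganesh's rule that ISRAM and L2 cache share the DM642's 256 KB can be checked mechanically. This is a minimal sketch with the following assumptions. The selectable C64x L2 cache sizes are taken to be 0, 32, 64, 128 or 256 KB, presumably the sizes behind CSL mode names such as CACHE_256KCACHE; confirm them against the device documentation. The ISRAM segment length is expressed in KB. The function names are hypothetical. If the internal-memory length of 40000 quoted above is hexadecimal (0x40000 = 256 KB), then that segment combined with CACHE_256KCACHE fails the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Total L2 on the DM642, shared between L2 SRAM (ISRAM) and L2 cache. */
#define DM642_L2_TOTAL_KB 256u

/* Assumed C64x L2 cache sizes in KB; verify against the device docs. */
static const unsigned l2_cache_modes_kb[] = { 0u, 32u, 64u, 128u, 256u };

/* True if cache_kb is one of the selectable L2 cache sizes. */
bool l2_cache_size_valid(unsigned cache_kb)
{
    size_t n = sizeof l2_cache_modes_kb / sizeof l2_cache_modes_kb[0];
    for (size_t i = 0; i < n; i++) {
        if (l2_cache_modes_kb[i] == cache_kb)
            return true;
    }
    return false;
}

/* Largest ISRAM segment that can coexist with the chosen cache size. */
unsigned l2_max_isram_kb(unsigned cache_kb)
{
    return cache_kb <= DM642_L2_TOTAL_KB ? DM642_L2_TOTAL_KB - cache_kb : 0u;
}

/* True if the BIOS ISRAM segment and the L2 cache size fit together. */
bool l2_config_fits(unsigned isram_kb, unsigned cache_kb)
{
    return l2_cache_size_valid(cache_kb) && isram_kb <= l2_max_isram_kb(cache_kb);
}
```

Yong's suggested split of 192K ISRAM plus 64K cache passes this check. A full 256K ISRAM segment together with a 256K cache does not.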

Have you optimized your code, or are you only cross-compiling it?

Optimized on a 3.2 GHz Pentium PC, the speed is around 80 fps. On the DSP it is only 2 fps, and I need real-time 15 fps. [Ganesh] You need to go through the documentation on optimizing C code for the C6416, as well as hand-code some low-level functions. [yong] How much do you think the speed can be improved by code optimization such as linear assembly and software pipelining? Is it possible to go from 2 fps to 15 fps? Or should I do some algorithm optimization before code optimization? [Ganesh] I can only suggest that you go through the TMS320C6000 Optimizing C Compiler User's Guide (spru187i.pdf). It will answer all your questions.
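Yong's 2 fps to 15 fps question can be framed as a cycle budget. The C sketch below is a back-of-the-envelope calculation that uses only the figures given in the thread: 600 MHz, 2 fps measured, and 15 fps required. It assumes the whole clock is available to the encoder, with no capture, display or other load. The function names are hypothetical.

```c
#include <stdint.h>

/* Cycles available per frame when the whole CPU clock goes to the encoder. */
uint64_t cycles_per_frame(uint64_t clock_hz, uint64_t fps)
{
    return clock_hz / fps;
}

/* Overall speed-up needed to go from the current to the target frame rate. */
double speedup_needed(double current_fps, double target_fps)
{
    return target_fps / current_fps;
}
```

At 600 MHz this gives 40 million cycles per frame, compared with roughly 300 million per frame today. That is a 7.5x overall speed-up.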

How are you profiling? Are you using the clock() function or the TIMER module?

Using the profiler tool: menu -> Start New Session, then select the profile area. No clock() function or TIMER module. [Ganesh] Answered above.
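The compiler guide covers techniques like the one sketched below: a C loop written so the TI compiler can software-pipeline it. The 16x16 sum of absolute differences (SAD), a typical motion-estimation kernel, is a hypothetical hot spot chosen for illustration, not code from Yong's encoder. The restrict qualifiers promise that cur and ref do not alias. The MUST_ITERATE pragma, described in the compiler user's guide, gives the exact trip count. The pragma is guarded by __TI_COMPILER_VERSION__, so a host compiler simply skips it. Hand-tuned versions could go further, for example with C64x packed-byte intrinsics.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 16x16 block of 8-bit pixels.
   stride is the distance in bytes between vertically adjacent pixels. */
unsigned sad16x16(const uint8_t *restrict cur, const uint8_t *restrict ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; y++) {
        /* Tell the TI compiler the exact inner trip count so it can
           unroll and software-pipeline without a runtime check. */
#ifdef __TI_COMPILER_VERSION__
#pragma MUST_ITERATE(16, 16, 16)
#endif
        for (int x = 0; x < 16; x++) {
            sad += (unsigned)abs(cur[x] - ref[x]);
        }
        cur += stride;
        ref += stride;
    }
    return sad;
}
```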



Hi, Piyush, Ganesh and all
 
I used the clock() function to profile my encoder.
At the start of every function, I put:
//yy
clock_t start = clock();
clock_t stop;
static clock_t sum = 0;     // total time spent in this function
static int enter_c = 0;     // number of times this function is called
//yy end
At the end of every function, I put:
//yy added; record is a global array of struct record_type
record[0].n = ++enter_c;    // use a separate record index for each profiled function
stop = clock();
sum += stop - start - overhead_c;   // overhead_c: cost of the clock() calls, measured separately
record[0].t = sum;
//yy

struct record_type
{
    clock_t t;  // accumulated time
    int n;      // number of times the function is called
};
 
 
At the end of main(), I write record to a file.

I find that it runs very slowly, even if I profile only a few functions. Another problem is that the total time is almost 0, although the function has been called a few hundred times.

Can you tell me how you write efficient profiling code? Thank you.
 
Yong
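The overhead_c subtraction in Yong's code needs a guard. Suppose clock_t is an unsigned type, which is an assumption to check for the target's run-time library. Then a measurement shorter than the overhead makes stop - start - overhead_c wrap around to a huge value instead of going negative. The minimal sketch below calibrates the overhead once and then subtracts it safely. timer_fn, host_timer and step_timer are hypothetical names. step_timer is a deterministic fake, used only to check the arithmetic on a PC. On the DSP you would pass a hardware timer read instead.

```c
#include <stdint.h>
#include <time.h>

/* Any function that returns a free-running tick count. */
typedef uint32_t (*timer_fn)(void);

/* Host stand-in built on the standard clock(). */
uint32_t host_timer(void) { return (uint32_t)clock(); }

/* Deterministic fake timer that advances 3 ticks per read. */
static uint32_t step_now = 0;
uint32_t step_timer(void) { step_now += 3; return step_now; }

/* Estimate the cost of the measurement itself: read the timer twice back
   to back, repeat, and keep the smallest difference. trials must be > 0. */
uint32_t calibrate_overhead(timer_fn read, int trials)
{
    uint32_t best = UINT32_MAX;
    for (int i = 0; i < trials; i++) {
        uint32_t a = read();
        uint32_t b = read();
        uint32_t d = b - a;
        if (d < best)
            best = d;
    }
    return best;
}

/* Elapsed ticks minus overhead, clamped at zero so a short measurement
   can never wrap around to a huge unsigned value. */
uint32_t net_ticks(uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = stop - start;
    return raw > overhead ? raw - overhead : 0;
}
```

Calibrate once at startup, for example overhead = calibrate_overhead(host_timer, 100). Then accumulate net_ticks(start, stop, overhead) in place of the raw subtraction.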
