Different profile results with load6x and CCS

Started by Sachin Gupta October 16, 2003
Hi,

I am profiling a function using both CCS and load6x, and I am getting
different results from the two.

1. load6x shows a lower count than CCS.
2. load6x shows identical counts for max, min and avg. CCS shows
different counts for max, min and avg.

Is there any reason why this might be happening? Do I need to use a
different compiler option with load6x? (I tried the -mg option and CCS
gives a warning: "-mg is deprecated use -gp".)

Which results should I rely on?

Sachin
__________________________________



Hi Sachin,
The IDE is an overhead. What you are observing is consistent with the
understanding that a shell program runs lighter than the IDE. The IDE
gives you many debugging features, and all of that increases your code
size as well as your cycle count summary. So, as I said, what you are
observing is to be expected; it is only a matter of by how much. I am
sure the load6x results are closer to reality than the profiler
results. Somebody could verify that, but I think that is the case.
Anyway, the max, min and avg cycle counts being consistent beats me. I
don't know why that is the case.

Bhooshan


To all CCS profiling users,
 
First of all, let me dispel the myth that the user interface affects profiling accuracy.  When you are using a simulator, it is instrumented with counters to take measurements.  By the same token, when you are profiling on hardware, counters inside the DSP are used to take measurements.  After these measurements are accumulated, they are supplied to the user interface.
 
The comments below refer to profiling DSP programs without DMA or IO.
 
TI has several different c6x simulators.  load6x appears to be a basic instruction set simulator: it seems to always report the exact same count for the same set of instructions.  These results appear to match the instruction cycle counts in TI's documentation.  There appears to be no consideration given to memory bank conflicts or other real-world events.  This simulator is quick and gives a good overview for evaluating software performance, but it has some limitations in accuracy and in what you can profile.
 
CCS has at least two simulators per device [keep in mind that 620x/6701 devices have very different real world performance characteristics than 621x/671x devices].  One of these is functionally accurate [faster] and one is cycle accurate [better accuracy].  When profiling functions or groups of functions, the cycle accurate simulator is very close to the hardware measurements.
 
I always try to check my numbers on hardware and a cycle accurate simulator - keeping in mind that even hardware measurements can be slightly different than the real world.
NOTE:
Because of their cache architecture, 621x/671x devices will normally show greater "numbers" variation than 620x/6701 devices.
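To make the max/min/avg point concrete, here is a minimal C sketch of how repeated cycle counts for one function reduce to the three numbers a profiler reports. The names `cycle_stats`, `summarize_cycles` and `is_deterministic` are made up for this illustration and are not part of CCS or load6x. If every run costs the same (as load6x appears to report), min, max and avg collapse to one value; if some runs pay extra cycles (for example cache misses on a 621x/671x), they spread apart.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated cycle-count measurements of one function. */
typedef struct {
    unsigned long min;
    unsigned long max;
    double avg;
} cycle_stats;

/* Reduce n cycle-count samples (n > 0) to min, max and average. */
cycle_stats summarize_cycles(const unsigned long *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    cycle_stats s = { samples[0], samples[0], 0.0 };
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        total += (double)samples[i];
    }
    s.avg = total / (double)n;
    return s;
}

/* Nonzero when every run took the same number of cycles,
 * i.e. min and max are equal (and so is the average). */
int is_deterministic(const cycle_stats *s)
{
    return s->min == s->max;
}
```

Feeding it the per-call counts from each tool would show at a glance whether the variation comes from the measurements themselves rather than from the user interface.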
 
Call me old-fashioned, but the numbers that I use come from a working program.  I set a latch to start the measurement and clear it to stop, and I monitor it with a scope or logic analyzer [yes, this adds a tiny bit of overhead, but when I remove it, that is my margin].
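The latch approach could look roughly like the C sketch below. Everything named here is an assumption for illustration: `fake_latch_reg` is an ordinary variable standing in for a board-specific memory-mapped output latch, `LATCH_BIT` is an arbitrary bit choice, and `work_under_test` is a placeholder for the function being timed. On real hardware the pulse width seen on the scope or logic analyzer is the execution time plus the small set/clear overhead.

```c
#include <stdint.h>

/* ASSUMPTION: a plain variable stands in for a memory-mapped output
 * latch. On a real board LATCH_REG would point at a board-specific
 * address wired to a pin you can probe. */
static volatile uint32_t fake_latch_reg;
#define LATCH_REG (&fake_latch_reg)
#define LATCH_BIT (1u << 0)   /* hypothetical bit driving the probe pin */

static inline void latch_set(void)   { *LATCH_REG |= LATCH_BIT; }
static inline void latch_clear(void) { *LATCH_REG &= ~LATCH_BIT; }

/* Hypothetical placeholder for the function being measured. */
static int work_under_test(int x)
{
    return x * x + 1;
}

/* Raise the latch, run the code, lower the latch. The time the pin
 * stays high is what the scope or logic analyzer measures. */
int measured_call(int x)
{
    latch_set();
    int result = work_under_test(x);
    latch_clear();
    return result;
}
```

Removing the two latch calls from the shipping build gives back the small overhead they added, which is the margin mentioned above.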
 
I guess that I will get off of my soapbox for now,
mikedunn
 


_____________________________________
Note: If you do a simple "reply" with your email client, only the author of this message will receive your answer. You need to do a "reply all" if you want your answer to be distributed to the entire group.

_____________________________________
About this discussion group:

To Join: Send an email to c...@yahoogroups.com

To Post: Send an email to c...@yahoogroups.com

To Leave: Send an email to c...@yahoogroups.com

Archives: http://www.yahoogroups.com/group/c6x

Other Groups: http://www.dsprelated.com




Mike and Bhooshan,

Thanks for your replies. They have been most helpful.

Mike, which are the two simulators you are talking about? I know of
only two: CCS itself and load6x.

Best Regards,
Sachin



__________________________________