Stall Cycles?

Started by salaria January 29, 2007
Hi,
I have just started using CCS for DM642 simulation. During profiling I see
that 99% of the cycles are stall cycles and under 1% are CPU cycles.
Could someone explain what stall cycles are and suggest some ways to reduce them?

regards
APS

This sounds like one of two things:

1) You're seeing lots of NOPs, in which case you need to turn on the optimizer in the compiler options.

2) You're seeing CPU stalls due to cache misses, in which case you need to enable the L2 cache and the corresponding MAR bits, or alternatively copy data into internal memory, operate on it, and then copy it back to external memory.

Brad
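
For the cache route, "enabling the MAR bits" comes down to a little address arithmetic and a few register writes. The sketch below is written against plain pointers so it can be tried on a host before touching hardware. It assumes the C64x memory map: MAR registers are 32-bit words starting at 0x01848000, each covering one 16 MB region (so MAR128 covers 0x80000000 to 0x80FFFFFF, the start of EMIFA CE0), bit 0 (PC) of a MAR permits caching of its region, and the L2 mode register (CCFG, assumed at 0x01840000) holds the L2 cache size in its low three bits. The helper names are hypothetical, and every address and field position should be checked against the DM642 data sheet and the 64x Two-Level Memory Reference Guide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C64x addresses and fields; verify against the DM642 data sheet. */
#define MAR_BASE_ADDR    0x01848000u /* assumed address of MAR0 */
#define MAR_REGION_SHIFT 24          /* each MAR covers one 16 MB region */
#define MAR_PC_BIT       0x1u        /* bit 0 (PC): permit caching of region */
#define CCFG_ADDR        0x01840000u /* assumed address of CCFG */
#define CCFG_L2MODE_MASK 0x7u        /* assumed L2MODE field, bits 2:0 */

/* Index of the MAR register that covers a given address. */
static unsigned mar_index(uint32_t addr)
{
    return (unsigned)(addr >> MAR_REGION_SHIFT);
}

/* Memory-mapped address of MAR n (one 32-bit word per register). */
static uint32_t mar_reg_addr(unsigned n)
{
    return MAR_BASE_ADDR + 4u * n;
}

/* Set the PC bit in every MAR that covers [start, start + len). */
static void enable_caching(volatile uint32_t *mar, uint32_t start, uint32_t len)
{
    if (len == 0)
        return;
    assert(start + len - 1u >= start && "range must not wrap past 4 GB");
    unsigned first = mar_index(start);
    unsigned last = mar_index(start + len - 1u);
    for (unsigned n = first; n <= last; n++)
        mar[n] |= MAR_PC_BIT;
}

/* Replace the L2MODE field of CCFG, leaving the other bits as read.
 * The encoding for each cache size is given in the reference guide. */
static void set_l2mode(volatile uint32_t *ccfg, uint32_t mode)
{
    *ccfg = (*ccfg & ~CCFG_L2MODE_MASK) | (mode & CCFG_L2MODE_MASK);
}
```

On the target you would pass `(volatile uint32_t *)MAR_BASE_ADDR` and `(volatile uint32_t *)CCFG_ADDR` instead of host arrays; for example, caching 32 MB of SDRAM at 0x80000000 sets MAR128 and MAR129. The L2MODE value to write is deliberately not hard-coded here, since it should come from the reference guide.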

Hi Brad,

I think it is the second reason. Could you help me with the following:

1. How do I enable the L2 cache and the corresponding MAR bits?
2. How do I tell the compiler to copy packets of data to internal memory and put them back in external memory when processing is done?

Thanks
Regards
APS

To increase system performance, you want to minimize the number of accesses to external memory. The easy way to do this is to use the two-level cache. For example, if you have a frame buffer in external memory, you just operate on that buffer and the cache will boost your performance.

The CCFG register configures the L2 cache mode. You can find its address in the data sheet and a description of its bitfields in the 64x Two-Level Memory Reference Guide. The same goes for the MAR bits.

The more difficult, but sometimes faster, way is to do the copies to internal memory manually. For example, you might first copy a frame buffer from external memory to internal memory, do all the processing on it there, and then copy it back out to external memory. This is not something the compiler will do for you. You would most likely want to use DAT_copy from the CSL to do the copy for you (using QDMA) while your code is doing something else.

I don't typically use the simulator, so I'm not sure how much of this is supported. It sounds like the cache modeling is supported, since it's apparently showing you hits and misses. I'm not sure about the QDMA, though. If it isn't supported, you could temporarily substitute a memcpy and then replace it with DAT_copy on real hardware.

Brad
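
The manual copy-in, process, copy-out approach can be sketched in C as below, using memcpy as the temporary stand-in for DAT_copy. The chunk size, the `process_block` step (a simple pixel inversion) and the buffer name are hypothetical. On the DM642 the working buffer would have to be placed in internal memory through the linker command file, and each memcpy would become a DAT_copy followed by DAT_wait (check the CSL API reference for the exact signatures), ideally started early enough to overlap with processing of the previous block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 4096u /* hypothetical chunk size; must fit in internal RAM */

/* Working buffer. On the DM642 it must be placed in internal memory via
 * the linker command file; here it is an ordinary static array. */
static uint8_t internal_buf[BLOCK_BYTES];

/* Hypothetical processing step: invert 8-bit pixels in place. */
static void process_block(uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(255u - p[i]);
}

/* Copy each block of the external frame into internal memory, process it
 * there, then copy the result back out. memcpy stands in for DAT_copy;
 * on hardware each copy would become a DAT_copy followed by DAT_wait. */
static void process_frame(uint8_t *ext_frame, size_t frame_bytes)
{
    assert(ext_frame != NULL || frame_bytes == 0);
    for (size_t off = 0; off < frame_bytes; off += BLOCK_BYTES) {
        size_t n = frame_bytes - off;
        if (n > BLOCK_BYTES)
            n = BLOCK_BYTES;
        memcpy(internal_buf, ext_frame + off, n); /* external -> internal */
        process_block(internal_buf, n);           /* work on the fast copy */
        memcpy(ext_frame + off, internal_buf, n); /* internal -> external */
    }
}
```

Working in fixed-size blocks keeps the internal buffer small, since a full video frame is usually larger than the on-chip memory; the last block is simply shorter when the frame size is not a multiple of `BLOCK_BYTES`.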