
Issue with MIPS measurements on TMS320C6416

Started by Hamid May 24, 2004
Hi all

I have optimized code running on the DSK6416. Counting the instructions
in my code gives 74, and since the sampling rate is 8000 Hz that comes
to 0.592 MIPS.

I used the profiler and also the DSP's timers to measure the MIPS, and
I get almost double that number (1.1 MIPS).

I have checked my code to make sure that no part of it causes latency.

I'm puzzled by the difference. Could it be because of the L2 => L1
data/program transfers, as my code size is bigger than L1P?

I appreciate any comments.

Regards, H. Sepehr
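
For readers following the arithmetic in the post, here is a minimal sketch of the calculation. The three constants come from the post; the function names are hypothetical, and whether the counted unit is an instruction or a cycle is exactly what the rest of the thread goes on to question.

```python
# Back-of-the-envelope check of the figures quoted in the post.
SAMPLE_RATE_HZ = 8000      # sampling rate stated in the post
COUNTED_PER_SAMPLE = 74    # hand count stated in the post
MEASURED_MIPS = 1.1        # profiler/timer figure stated in the post


def mips(per_sample: float, sample_rate_hz: float) -> float:
    """Millions of counted units executed per second."""
    return per_sample * sample_rate_hz / 1e6


def per_sample_from_mips(mips_value: float, sample_rate_hz: float) -> float:
    """Invert the formula: counted units spent on each sample."""
    return mips_value * 1e6 / sample_rate_hz


expected = mips(COUNTED_PER_SAMPLE, SAMPLE_RATE_HZ)
implied = per_sample_from_mips(MEASURED_MIPS, SAMPLE_RATE_HZ)
print(f"expected from the hand count: {expected:.3f} MIPS")
print(f"1.1 MIPS corresponds to about {implied:.1f} units per sample")
```

Read this way, the measured figure corresponds to roughly 137 units per sample instead of 74, which is the gap the thread is trying to explain.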
Hamid wrote:

> Hi all
>
> I have optimized code running on the DSK6416. Counting the instructions
> in my code gives 74, and since the sampling rate is 8000 Hz that comes
> to 0.592 MIPS.
>
> I used the profiler and also the DSP's timers to measure the MIPS, and
> I get almost double that number (1.1 MIPS).

Is the algorithm sample rate limited?
Otherwise you might try to process twice the amount of data
(maybe the same data twice...).
Do you poll (and wait) for samples?
Then make sure you do not include the busy-wait loop in the measurement.
How do you count instructions, especially when they are issued concurrently?
If you count each one, it would be rather easy to get more than
twice the expected number.

> I have checked my code to make sure that no part of it causes latency.
>
> I'm puzzled by the difference. Could it be because of the L2 => L1
> data/program transfers, as my code size is bigger than L1P?

Don't 74 instructions fit in L1P?
How many times does it loop? The first time through, it will need to
fetch the instructions from L2 or worse (SDRAM).

/RogerL
--
Roger Larsson
Skellefteå
Sweden

I think your guess may be right. I suggest that you run multiple
iterations of the same code. The second and later iterations should
perform better.

Regards
Piyush
PS: I really couldn't follow your MIPS calculation method, but I
hope it is right.
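
The suggestion above (later iterations should run faster once the code is cached) can be illustrated with a toy model. This is only a sketch: it assumes a direct-mapped program cache of 16 KB with 32-byte lines, which is my understanding of the C6416 L1P and should be checked against the TI documentation, and it treats the code as one straight-line block. The 2 KB and 20 KB code sizes are made up for illustration.

```python
def misses_per_iteration(code_bytes, cache_bytes=16 * 1024,
                         line_bytes=32, iterations=3):
    """Count cache misses for each pass over straight-line code.

    Models a direct-mapped cache: each line of code can live in exactly
    one set, chosen by (line index mod number of sets).
    """
    n_sets = cache_bytes // line_bytes
    n_lines = -(-code_bytes // line_bytes)  # ceiling division
    tags = [None] * n_sets                  # which code line each set holds
    result = []
    for _ in range(iterations):
        misses = 0
        for line in range(n_lines):
            s = line % n_sets
            if tags[s] != line:             # miss: fetch from L2 and replace
                misses += 1
                tags[s] = line
        result.append(misses)
    return result


small = misses_per_iteration(2 * 1024)    # hypothetical code that fits
large = misses_per_iteration(20 * 1024)   # hypothetical code bigger than the cache
print("fits in cache:      ", small)
print("larger than cache:  ", large)
```

In this model the small block misses only on the first pass, while the oversized block keeps missing on every pass because its head and tail evict each other. That would be consistent with a penalty that does not disappear after warm-up, although the model says nothing about how many cycles each miss costs.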
hsepehr@yahoo.com (Hamid) wrote in message
news:<aa01b0cc.0405240757.4ee8d926@posting.google.com>...
> Hi all
>
> I have optimized code running on the DSK6416. Counting the instructions
> in my code gives 74, and since the sampling rate is 8000 Hz that comes
> to 0.592 MIPS.
>
> I used the profiler and also the DSP's timers to measure the MIPS, and
> I get almost double that number (1.1 MIPS).
>
> I have checked my code to make sure that no part of it causes latency.
>
> I'm puzzled by the difference. Could it be because of the L2 => L1
> data/program transfers, as my code size is bigger than L1P?
>
> I appreciate any comments.
>
> Regards, H. Sepehr
Roger Larsson <roger.larsson@skelleftea.mail.telia.com> wrote in message
news:<onvsc.93377$dP1.297674@newsc.telia.net>...
> Hamid wrote:
>
> > Hi all
> >
> > I have optimized code running on the DSK6416. Counting the
> > instructions in my code gives 74, and since the sampling rate is
> > 8000 Hz that comes to 0.592 MIPS.
> >
> > I used the profiler and also the DSP's timers to measure the MIPS,
> > and I get almost double that number (1.1 MIPS).
>
> Is the algorithm sample rate limited?
> Otherwise you might try to process twice the amount of data
> (maybe the same data twice...).
> Do you poll (and wait) for samples?
> Then make sure you do not include the busy-wait loop in the measurement.
> How do you count instructions, especially when they are issued concurrently?
> If you count each one, it would be rather easy to get more than
> twice the expected number.
>
> > I have checked my code to make sure that no part of it causes latency.
> >
> > I'm puzzled by the difference. Could it be because of the L2 => L1
> > data/program transfers, as my code size is bigger than L1P?
>
> Don't 74 instructions fit in L1P?
> How many times does it loop? The first time through, it will need to
> fetch the instructions from L2 or worse (SDRAM).
>
> /RogerL

Dear Roger,

Thanks for your reply.

The algorithm is sample based, but I pass the data in manually myself and
measure the MIPS with the timer, so I'm pretty sure that it runs once.

I have hand-optimized the code, so I count the number of instructions
that I have, and I expect to measure the same number of instructions.

By the way, this is only part of the code I'm running, and the same story
applies to the whole code, which is why it doesn't fit in L1P. As you
mentioned, the figure is a little worse the first time, but even after
that it is much higher than it should be.

Thanks, H. Sepehr
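
Since the measurement here is taken with the on-chip timer, one thing worth checking is the conversion from timer ticks to CPU cycles. The sketch below assumes the timer counts once every 8 CPU clocks (my understanding of the C6416 internal timer clock; please verify in the data sheet) and that the cost of reading the timer has been measured separately. The function name, its parameters, and the tick values in the example are all hypothetical.

```python
def ticks_to_cycles(start, stop, cpu_clocks_per_tick=8,
                    overhead_ticks=0, counter_bits=32):
    """Convert two raw timer readings into an elapsed CPU cycle count.

    start, stop         : raw counter values read before and after the code
    cpu_clocks_per_tick : CPU clocks per timer increment (assumed 8 here)
    overhead_ticks      : ticks spent reading the timer itself, measured by
                          timing an empty section
    """
    # Modulo arithmetic keeps the result correct if the counter wrapped once.
    elapsed = (stop - start) % (1 << counter_bits)
    return (elapsed - overhead_ticks) * cpu_clocks_per_tick


# Made-up readings, purely to show the call:
cycles = ticks_to_cycles(1000, 1019, overhead_ticks=2)
print(cycles, "cycles, with a resolution of 8 cycles per tick")
```

With a divider of 8, a section of about 74 cycles spans only nine or ten ticks, so a one-tick error is already more than 10 percent. Timing many back-to-back runs and dividing by the run count gives a steadier figure.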
Hamid wrote:

> The algorithm is sample based, but I pass the data in manually myself and
> measure the MIPS with the timer, so I'm pretty sure that it runs once.

Have you specified the correct cycle time? (I think that the environment
measures cycles only, then converts them to time or MIPS.)

The problem is that the 6416 can execute one or several instructions each
cycle, so how is the MIPS calculated? If you count cycles instead, will
that match what you expect?

> I have hand-optimized the code, so I count the number of instructions
> that I have, and I expect to measure the same number of instructions.

The assembler can optimize your assembly... (parallelize it)

/RogerL
--
Roger Larsson
Skellefteå
Sweden
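
To make the instructions-versus-cycles distinction concrete, here is a small sketch that counts both in a C6000-style listing, where a leading `||` marks an instruction that issues in parallel with the one before it. The listing is a made-up fragment that is not meant to be functionally correct, and the parser is a simplification that ignores labels and directives.

```python
LISTING = """
        LDW   .D1   *A4++, A5      ; hypothetical kernel fragment
||      LDW   .D2   *B4++, B5
||      MVK   .S1   8, A1
        MPY   .M1X  A5, B5, A6
||      SUB   .L1   A1, 1, A1
        ADD   .L1   A8, A7, A7
"""


def count_instructions_and_packets(listing):
    """Return (instructions, execute_packets) for an assembly listing.

    Every non-empty line is one instruction. A line that does NOT start
    with '||' opens a new execute packet, i.e. a new issue cycle.
    """
    instructions = packets = 0
    for raw in listing.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        instructions += 1
        if not line.startswith("||"):
            packets += 1
    return instructions, packets


n_instr, n_packets = count_instructions_and_packets(LISTING)
print(f"{n_instr} instructions in {n_packets} execute packets")
```

Six instructions issue in three cycles here, so a "MIPS" figure built from instruction counts and one built from cycle counts can legitimately differ by a factor of two or more.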
Roger Larsson <roger.larsson@skelleftea.mail.telia.com> wrote in message
news:<YqPsc.93436$dP1.298641@newsc.telia.net>...
> Hamid wrote:
>
> > The algorithm is sample based, but I pass the data in manually myself and
> > measure the MIPS with the timer, so I'm pretty sure that it runs once.
>
> Have you specified the correct cycle time? (I think that the environment
> measures cycles only, then converts them to time or MIPS.)
>
> The problem is that the 6416 can execute one or several instructions each
> cycle, so how is the MIPS calculated? If you count cycles instead, will
> that match what you expect?
>
> > I have hand-optimized the code, so I count the number of instructions
> > that I have, and I expect to measure the same number of instructions.
>
> The assembler can optimize your assembly... (parallelize it)
>
> /RogerL

Dear Roger,

Thanks for your response.
As you are aware, the C6x family has a pipelining feature, which I have
made good use of in my hand-optimized assembly code.

When I mentioned 74, I meant the number of cycles; the C6x family can run
up to 8 instructions in any cycle. I agree that it's not MIPS, but my
code takes 74 cycles (the number of optimized lines in my code), and yet
I get double that figure when I measure it with the DSP's timer.

Regards, Hamid
Hamid wrote:

> Dear Roger,
>
> Thanks for your response.
> As you are aware, the C6x family has a pipelining feature, which I have
> made good use of in my hand-optimized assembly code.
>
> When I mentioned 74, I meant the number of cycles; the C6x family can run
> up to 8 instructions in any cycle. I agree that it's not MIPS, but my
> code takes 74 cycles (the number of optimized lines in my code), and yet
> I get double that figure when I measure it with the DSP's timer.

And you do not have any "NOP 4" that you counted as one cycle?
Try single-stepping in assembly.

Other than that...
- cache misses (associativity might play a part in this... several
  data streams aligned in the same way in memory...)
- memory bank conflicts (~ hurting data loads from different arrays)
- DMA - the 6416 can prioritize DMA accesses higher than the CPU.
- more? probably...

/RogerL
--
Roger Larsson
Skellefteå
Sweden
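
The "NOP 4" point can be checked mechanically. The sketch below estimates issue cycles from a listing by counting one cycle per execute packet and n cycles for a multi-cycle `NOP n`. The listing is a made-up fragment, the parser is a simplification, and the estimate deliberately excludes the stall sources listed above (cache misses, bank conflicts, DMA contention), which only show up in a measurement.

```python
LISTING = """
        LDW   .D1   *A4, A5     ; hypothetical fragment
        NOP   4                 ; one source line, four cycles
        ADD   .L1   A5, A6, A6
        B     .S2   B3
        NOP   5                 ; one source line, five cycles
"""


def source_lines_and_cycles(listing):
    """Return (non-empty source lines, estimated stall-free issue cycles)."""
    lines = cycles = 0
    for raw in listing.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        lines += 1
        if line.startswith("||"):
            continue                          # same cycle as the previous line
        fields = line.split()
        if fields[0].upper() == "NOP" and len(fields) > 1:
            cycles += int(fields[1])          # NOP n occupies n cycles
        else:
            cycles += 1
    return lines, cycles


n_lines, n_cycles = source_lines_and_cycles(LISTING)
print(f"{n_lines} source lines, at least {n_cycles} cycles")
```

Five lines cost twelve cycles in this fragment, so counting lines of hand-written assembly can undercount cycles by a large factor even before any memory stalls are considered.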