comp.dsp | Issue with MIPS measurements in TMS320C6416

Hi all

I have optimized code running on the DSK6416. When I count the number of
instructions in my code, it is 74, and because the sampling rate is
8000 Hz, that comes to 0.592 MIPS.

I used the profiler and also the DSP's timers to measure the MIPS, and both
give me almost double that number (1.1 MIPS).

I have checked my code to make sure that no part of it causes latency.

I'm puzzled by the difference. Could it be because of L2 =>
L1 data/program transfers, as my code size is bigger than L1P?

I would appreciate any comments.

Regards , H.Sepehr
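
For reference, the arithmetic in the post can be written out. The sketch below only restates the numbers given above (a count of 74 per sample, 8000 Hz, 1.1 MIPS measured); the function names are made up for illustration, and it takes no position on whether the 74 is instructions or cycles.

```python
SAMPLE_RATE_HZ = 8000      # sampling rate stated in the post
COUNT_PER_SAMPLE = 74      # hand count stated in the post
MEASURED_MIPS = 1.1        # profiler/timer figure stated in the post


def load_in_millions(count_per_sample, sample_rate_hz):
    """Work per second, in millions, for a per-sample count."""
    return count_per_sample * sample_rate_hz / 1e6


def count_per_sample_from_load(load_millions, sample_rate_hz):
    """Invert the formula: per-sample count implied by a measured load."""
    return load_millions * 1e6 / sample_rate_hz


expected = load_in_millions(COUNT_PER_SAMPLE, SAMPLE_RATE_HZ)
implied = count_per_sample_from_load(MEASURED_MIPS, SAMPLE_RATE_HZ)
ratio = MEASURED_MIPS / expected

print(f"expected load:            {expected:.3f} million/s")
print(f"implied count per sample: {implied:.1f}")
print(f"measured / expected:      {ratio:.2f}")
```

So the measured figure corresponds to about 137.5 counts per sample, roughly 1.86 times the 74 that were counted by hand.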

Reply by Roger Larsson ● May 24, 2004

Hamid wrote:

> Hi all
> 
> I have optimized code running on the DSK6416. When I count the number of
> instructions in my code, it is 74, and because the sampling rate is
> 8000 Hz, that comes to 0.592 MIPS.
> 
> I used the profiler and also the DSP's timers to measure the MIPS, and both
> give me almost double that number (1.1 MIPS).

Is the algorithm sample-rate limited?
        Otherwise you might try to process twice the amount of data.
        (Maybe the same data twice...)
Do you poll (and wait) for samples?
        Then make sure you do not count the busy-wait loop.
How do you count instructions, especially when they are issued concurrently?
        If you count each one, it would be rather easy to get more than
        twice the expected number.
        
> 
> I have checked my code to make sure that no part of it causes latency.
> 
> I'm puzzled by the difference. Could it be because of L2 =>
> L1 data/program transfers, as my code size is bigger than L1P?

Don't 74 instructions fit in L1P?
How many times does it loop? The first time through, it will need to fetch the
instructions from L2 or worse (SDRAM).

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

Reply by Piyush Kaul ● May 25, 2004

I think your guess may be right. I suggest that you run multiple
iterations of the same code. The second and later iterations should
perform better.
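
One way to act on this is to record the cycle count of each call separately and report the first (cold-cache) call apart from the later ones. The sketch below assumes you already have a list of per-call cycle counts from the timer; the helper name and the numbers in `runs` are invented placeholders, not measurements from this thread.

```python
def split_cold_warm(cycle_counts):
    """Return (cycles of the first call, mean cycles of the later calls)."""
    if len(cycle_counts) < 2:
        raise ValueError("need at least two runs to separate cold from warm")
    cold = cycle_counts[0]
    warm = cycle_counts[1:]
    return cold, sum(warm) / len(warm)


# Placeholder values for illustration only, NOT real measurements.
runs = [310, 140, 138, 139, 139]

cold, warm_mean = split_cold_warm(runs)
print(f"first call: {cold} cycles, later calls: {warm_mean:.1f} cycles on average")
```

If the later calls are still well above the hand count, the first-pass fetch from L2 or SDRAM is not the whole explanation.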

Regards
Piyush
PS: I couldn't really follow your MIPS calculation method, but I
hope it is right.

Reply by Hamid ● May 25, 2004

Dear Roger,

Thanks for your reply.

The algorithm is sample-based, but I pass the data in manually myself and
measure the MIPS with the timer, so I'm pretty sure that it runs once.

I have hand-optimized the code, so I count the number of instructions
that I have, and I expect to measure the same number.

By the way, this is only part of the code I'm running, and the same
story holds for the whole code, which is why it doesn't fit in L1P. As you
mentioned, the MIPS figure is a little worse the first time, but even
afterwards it is much higher than it should be.

Thanks , H.Sepehr

Reply by Roger Larsson ● May 25, 2004

Hamid wrote:

> The algorithm is sample-based, but I pass the data in manually myself and
> measure the MIPS with the timer, so I'm pretty sure that it runs once.

Have you specified the correct cycle time? (I think that the environment
measures cycles only, then converts them to time or MIPS.)

The problem is that the 6416 can execute one or several instructions each cycle,
so how is the MIPS figure calculated? If you count cycles instead, will that
match what you expect?
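
A minimal sketch of the tick-to-cycle conversion this question is about, assuming a free-running 32-bit timer that advances once every `cpu_cycles_per_tick` CPU cycles. The function and parameter names are hypothetical. The C64x on-chip timers are commonly described as counting at the CPU clock divided by 8, but treat that value as an assumption to check against the device data sheet and your timer setup.

```python
def timer_ticks_to_cpu_cycles(start_tick, stop_tick, cpu_cycles_per_tick,
                              overhead_ticks=0, counter_bits=32):
    """Convert two timer readings into elapsed CPU cycles.

    The modulo handles a single wraparound of the counter.
    overhead_ticks is the cost of reading the timer itself, measured
    separately by timing an empty region.
    """
    elapsed = (stop_tick - start_tick) % (1 << counter_bits)
    return (elapsed - overhead_ticks) * cpu_cycles_per_tick


# Same raw readings, two candidate dividers (both values are assumptions).
cycles_div8 = timer_ticks_to_cpu_cycles(100, 110, cpu_cycles_per_tick=8)
cycles_div4 = timer_ticks_to_cpu_cycles(100, 110, cpu_cycles_per_tick=4)
print(cycles_div8, cycles_div4)
```

A wrong divider or a wrong assumed CPU clock scales every result by a constant factor, so it is worth ruling out before looking at caches. Note also that the resolution is one tick, which is several CPU cycles, so a region of about 74 cycles is best timed over many repetitions.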

> 
> I have hand-optimized the code, so I count the number of instructions
> that I have, and I expect to measure the same number.

The assembler can optimize your assembly... (parallelize it)

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

Reply by Hamid ● May 26, 2004

Dear Roger,

Thanks for your response.
As you know, the C6x family has a pipelining feature, which
I have made good use of in my hand-optimized assembly code.

When I mentioned 74, I meant the number of cycles, since the C6x family can run
up to 8 instructions in any cycle. I agree that it's not MIPS, but my
code takes 74 cycles (the number of optimized lines in my code), yet I get
double that figure when I measure it with the DSP's timer.

Regards, hamid

Reply by Roger Larsson ● May 26, 2004

And you do not have any "NOP 4" that you counted as one cycle?
Try single-stepping through the assembly.
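
To illustrate the "NOP 4" point: in C6000 assembly a line starting with `||` issues in the same cycle as the line before it, and `NOP n` occupies n cycles. The sketch below is a deliberately simplified counter for straight-line code written in that style; the function name and the sample listing are made up, and it ignores branches, loops, and all memory stalls.

```python
def count_packets_and_cycles(listing):
    """Return (execute packets, issue cycles) for a straight-line listing."""
    packets = 0
    cycles = 0
    for raw in listing.splitlines():
        line = raw.split(";", 1)[0].strip()       # drop comments
        if not line or line.startswith("||"):
            continue                              # parallel: shares previous cycle
        tokens = line.split()
        # Skip a leading label ("loop:") or condition ("[B0]") if present.
        while tokens and (tokens[0].endswith(":") or tokens[0].startswith("[")):
            tokens = tokens[1:]
        if not tokens:
            continue
        packets += 1
        if tokens[0].upper() == "NOP" and len(tokens) > 1:
            cycles += int(tokens[1])              # multi-cycle NOP
        else:
            cycles += 1
    return packets, cycles


# Made-up fragment for illustration, not code from this thread.
SAMPLE = """
        LDW   .D1  *A4++, A5
||      LDW   .D2  *B4++, B5
        NOP   4               ; wait for the loads
        MPY   .M1X A5, B5, A6
        NOP   1               ; wait for the multiply
        ADD   .L1  A6, A7, A7
"""

packets, cycles = count_packets_and_cycles(SAMPLE)
print(packets, cycles)
```

Here counting lines gives 5 while the issue cycles add up to 8, so a hand count that treats each line as one cycle can undercount noticeably.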

Other than that...
- cache misses (associativity might play a part in this...
   several data streams aligned in the same way in memory...)
- memory bank conflicts (~ hurting data load from different arrays)
- DMA - 6416 can prioritize DMA accesses higher than CPU.
- more? probably...
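
As a rough check for the bank-conflict item, the sketch below assumes the L1D layout usually given for the C64x: eight banks, each 32 bits wide, interleaved on consecutive word addresses. Both constants and the example addresses are assumptions for illustration; the exact conflict rules (for example, which same-bank accesses are exempt) should be taken from TI's cache documentation.

```python
L1D_BANKS = 8            # assumption: eight L1D banks
BANK_WIDTH_BYTES = 4     # assumption: each bank is one 32-bit word wide


def bank_of(addr):
    """Bank index selected by a byte address under the assumed layout."""
    return (addr // BANK_WIDTH_BYTES) % L1D_BANKS


def may_conflict(addr_a, addr_b):
    """True if two same-cycle accesses hit the same bank at different words.

    Simplification: accesses to the very same word are treated as harmless.
    """
    same_word = addr_a // BANK_WIDTH_BYTES == addr_b // BANK_WIDTH_BYTES
    return bank_of(addr_a) == bank_of(addr_b) and not same_word


# Hypothetical array base addresses, 32 bytes apart versus 36 bytes apart.
x_base = 0x80000000
print(may_conflict(x_base, x_base + 0x20))   # same bank, different word
print(may_conflict(x_base, x_base + 0x24))   # neighbouring bank
```

Under these assumptions, two arrays whose bases differ by a multiple of 32 bytes land in the same bank when indexed in step, and offsetting one of them by a word moves it to a different bank.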

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden
