c6x | Slow EMIF transfer| page 2

Reply by "d.stuartnl" ● July 13, 2009

Hi all,

I've examined my code and hardware; the FIFOs I'm accessing are configured as asynchronous.

The address space is configured as:
*(volatile unsigned int *)0x01800004 = 0x10914221; /* CE1 = async 32 */

If I understand it correctly, this should be 1 setup, 2 strobe and 1 hold cycle. The EMIF runs at 100 MHz, so one read should take 0.04 us.
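As a cross-check of that estimate, here is a minimal sketch that decodes the read-side fields of the CE space control value and adds up the read phases. The bit positions (RDSETUP in bits 19-16, RDSTRB in bits 13-8, MTYPE in bits 7-4, RDHLD in bits 2-0) are an assumption based on the C671x CE control register layout and should be verified against the EMIF reference guide; `decode_cectl` and `read_cycle_ns` are hypothetical helper names, not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Read-side fields of a CE space control word.
 * ASSUMPTION: bit positions follow the C671x CE control register layout;
 * verify them against the EMIF reference guide before relying on this. */
typedef struct {
    unsigned rd_setup;   /* bits 19-16: read setup, in EMIF clocks  */
    unsigned rd_strobe;  /* bits 13-8:  read strobe, in EMIF clocks */
    unsigned mtype;      /* bits 7-4:   memory type                 */
    unsigned rd_hold;    /* bits 2-0:   read hold, in EMIF clocks   */
} cectl_read_fields;

static cectl_read_fields decode_cectl(uint32_t value)
{
    cectl_read_fields f;
    f.rd_setup  = (value >> 16) & 0xFu;
    f.rd_strobe = (value >> 8)  & 0x3Fu;
    f.mtype     = (value >> 4)  & 0xFu;
    f.rd_hold   =  value        & 0x7u;
    return f;
}

/* Nominal length of one asynchronous read in nanoseconds:
 * (setup + strobe + hold) EMIF clocks at eclk_mhz. */
static unsigned read_cycle_ns(cectl_read_fields f, unsigned eclk_mhz)
{
    assert(eclk_mhz != 0);
    return (f.rd_setup + f.rd_strobe + f.rd_hold) * 1000u / eclk_mhz;
}
```

For 0x10914221 this decodes to 1 setup, 2 strobe and 1 hold clock, i.e. 4 EMIF clocks or 40 ns at 100 MHz, which agrees with the estimate above. It only covers the bus cycle itself, not any CPU-side latency.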

When I measure the performance 1 read takes 0.333 us. Where does this delay come from?

The code is as follows:

read1 = (int*) 0x90300004;
tmpRead1 = *read1; // this line takes 0.333 us.

tmpRead1 is defined as a volatile int and resides in IRAM (address: 0x0001d100 according to the .map file).

I saw that R. Williams suggested that there may be 10 wait states. What are these, and how can I verify this?

If I look at the assembly code for the tmpRead1 = *read1; line it states:

MV.L2X A5,B4
LDW.D2T2 *+B4[0],B4
MVK.S1 0xffffd100,A6
MVKH.S1 0x10000,A6
NOP 2
STW.D1T2 B4,*+A6[0]
NOP

I hope someone can shed some light on this or point me in the right direction to debug this problem.

With kind regards,

Dominic Stuart
--- In c..., "d.stuartnl" wrote:
>
> Thanks for all of your responses. I've checked the .gel file and found that the EMIF_CE registers for the FIFOs I'm reading from are configured as 32-bit asynchronous. The FIFOs are capable of synchronous data transfer, so I will check the SPRU document and program the registers correctly. I will post a final message if that fixes my problem.
>
> --- In c..., "Richard Williams" wrote:
> >
> > D.S,
> >
> > it looks, on first examination, like the memory at 0x90300000 has about 10 wait
> > states.
> >
> > Have you examined the actual source code?
> > I would expect a max of 4 instructions to perform the 'tmpRead1 = *read1'
> > fetch read1 (source address)
> > fetch @ read1 (contents)
> > fetch tmpRead1 (destination address)
> > store @ tmpRead1 (contents)
> >
> > R. Williams
> >
> >
> > ---------- Original Message -----------
> > From: d.stuartnl@
> > To: c...
> > Sent: Tue, 23 Jun 2009 09:17:13 -0400
> > Subject: [c6x] Slow EMIF transfer
> >
> > > Hi all,
> > >
> > > I am a fairly new embedded programmer and this is my first post on
> > > this forum. I am working with a 6713 DSP and in my current project I
> > > am reading data from some FIFOs connected to the EMIF bus. My problem
> > > is the performance of the EMIF. I have measured the time it takes to
> > > read from the EMIF and I have confirmed these findings with the simulator.
> > >
> > > I'm executing the following code:
> > >
> > > x++; // 10 clocks - 0.033 us
> > > read1 = (int*) 0x90300004; // 3 clocks - 0.010 us
> > > read2 = (int*) 0x90300008; // 3 clocks - 0.010 us
> > > tmpRead1 = *read1; // 177 clocks - 0.590 us
> > > tmpRead2 = *read2; // 176 clocks - 0.586 us
> > >
> > > I've commented the measured clocktimes according to the simulator. 177
> > > clocks for 1 read seems a bit much. Am I overlooking something? How
> > > can I achieve a higher transfer speed?
> > >
> > > With kind regards,
> > >
> > > Dominic Stuart
> > ------- End of Original Message -------
>

_____________________________________

Reply by Richard Williams ● July 13, 2009

d.stuartnl,

I see 10 instructions in the code:

MV.L2X A5,B4 <<-- copy source address to working register
LDW.D2T2 *+B4[0],B4 <<-- read value at source
MVK.S1 0xffffd100,A6 <<-- set up low portion of destination address in A6
MVKH.S1 0x10000,A6 <<-- set up high portion of destination address in A6
NOP 2 <<-- wait two NOP instruction times
STW.D1T2 B4,*+A6[0] <<-- write value to destination address
NOP <<-- wait 1 NOP instruction time

So the "tmpRead1 = *read1;" takes ~10 instructions.
( I do not have the H/W details at hand, so cannot supply the specifics on cycle
times for each instruction. I also do not know the specific processor well
enough to predict the amount of pipeline stalls, etc )
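A rough tally of issue cycles for the listing above can be sketched as follows. It assumes every instruction sits in its own execute packet, that `NOP n` occupies n cycles, and that there are no memory stalls; the 300 MHz CPU clock is the figure given elsewhere in this thread, and `issue_cycles` and `cycles_to_ps` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Issue cycles per line of the listing, assuming one execute packet per
 * instruction and no stalls: MV, LDW, MVK, MVKH, NOP 2, STW, NOP. */
static const unsigned listing_cycles[] = { 1, 1, 1, 1, 2, 1, 1 };

/* Sum the issue cycles of a listing. */
static unsigned issue_cycles(const unsigned *cycles, size_t count)
{
    unsigned total = 0;
    size_t i;
    assert(cycles != NULL);
    for (i = 0; i < count; ++i) {
        total += cycles[i];
    }
    return total;
}

/* Convert CPU cycles to picoseconds at a clock given in MHz
 * (integer arithmetic, truncating). */
static unsigned long cycles_to_ps(unsigned cycles, unsigned cpu_mhz)
{
    assert(cpu_mhz != 0);
    return (unsigned long)cycles * 1000000ul / cpu_mhz;
}
```

Under those assumptions the seven lines issue in 8 cycles, roughly 27 ns at 300 MHz. Instruction issue alone therefore cannot account for a measurement of several hundred nanoseconds; the remainder would have to come from stalls while the load waits on the EMIF, or from the measurement itself.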

In general, the operation tmpRead1 = *read1; will take much longer than the time
needed to read the value from the source address.

Finally, the time measuring tool (?JTAG?) probably imposes some delays for
setup/communication/etc.

R. Williams

_____________________________________

Reply by "d.stuartnl" ● July 13, 2009

Thanks for your reply R.Williams,

I am measuring the instruction with a hardware timer (which itself adds a 0.1665 us delay). This means that the read instruction still takes 0.333 - 0.1665 = 0.1665 us. This seems very slow (about 6 MHz) for a 300 MHz CPU connected to a 100 MHz bus. Is there any way to speed this up?
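The arithmetic behind those figures can be written down as a small sketch. Times are kept in picoseconds to stay in integer arithmetic; the helper names are hypothetical, and the 300 MHz clock is the one quoted in this post.

```c
#include <assert.h>

/* Net duration after subtracting the timer's own overhead (picoseconds). */
static unsigned long net_time_ps(unsigned long measured_ps,
                                 unsigned long overhead_ps)
{
    assert(measured_ps >= overhead_ps);
    return measured_ps - overhead_ps;
}

/* CPU cycles that fit in a duration, rounded to the nearest cycle. */
static unsigned long ps_to_cycles(unsigned long ps, unsigned cpu_mhz)
{
    return (ps * cpu_mhz + 500000ul) / 1000000ul;
}

/* Equivalent access rate in kHz for one access of the given duration. */
static unsigned long rate_khz(unsigned long ps)
{
    assert(ps != 0);
    return 1000000000ul / ps;
}
```

With 333000 ps measured and 166500 ps of timer overhead, the net read time is 166500 ps, which is about 50 CPU cycles at 300 MHz and an access rate of roughly 6 MHz.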
With kind regards,

Dominic

_____________________________________

Reply by Jeff Brower ● July 13, 2009

Dominic-

> Thanks for your reply R.Williams,
>
> I am measuring the instruction with a hardware timer (which creates
> 0.1665 us delay). This means that the read
> instruction still takes (0.333 - 0.1665) 0.1665 us. This seems very
> slow (+/-6MHz) for a 300 MHz CPU connected to a
> 100MHz bus. Is there any way to speed this up?

Any "internal" technique you use to measure the duration of something in the nsec range is not going to be accurate.
As you have found, reading hardware timer registers has some inherent delay, and as Richard mentions, a JTAG and/or
RTDX based method would take so much time the actual memory cycle would end up a tiny fraction.

The only way to accurately measure a single memory cycle time is externally (digital scope or logic analyzer). My suggestion to get a
worst-case figure would be to make three accesses: one to your mem, one to another mem (to force a change in CEn
lines), and a third one to your mem, and watch this on the scope. Then you would know both the cycle duration and
the amount of time the compiler is adding for your line of C code.

-Jeff

_____________________________________

Reply by Richard Williams ● July 13, 2009

D.stuartnl,

If I were coding it in ASM, I would do the following:

MVK.S1 0xffffd100,A6 <<-- set up low portion of destination address in A6
MVKH.S1 0x10000,A6 <<-- set up high portion of destination address in A6
LDW.D1T2 *+A5[0],B4 <<-- read value at source (A5 is the base, so the load goes through .D1)
NOP 4 <<-- wait out the 4 load delay slots before B4 is valid
STW.D1T2 B4,*+A6[0] <<-- write value to destination address
NOP <<-- wait 1 NOP instruction time

However, this may have some deficiencies in the use of the pipeline and assumes
A5 can be used for accessing the RAM.

R. Williams

_____________________________________

Reply by Michael Dunn ● July 13, 2009

Dominic,

On Mon, Jul 13, 2009 at 10:15 AM, Jeff Brower wrote:
> Dominic-
>
> > Thanks for your reply R.Williams,
> >
> > I am measuring the instruction with a hardware timer (which creates
> > 0.1665 us delay). This means that the read
> > instruction still takes (0.333 - 0.1665) 0.1665 us. This seems very
> > slow (+/-6MHz) for a 300 MHz CPU connected to a
> > 100MHz bus. Is there any way to speed this up?

Excuse me if I missed this in previous postings...

1. Is the code executing in IRAM or SDRAM?
2. Is ClkOut2 == 150 MHz?
3. Have you checked that the EMIF clock input is 100 MHz?

Now that we [or I :-) ] are calibrated....
>
> Any "internal" technique you use to measure the duration of something in the nsec range is not going to be accurate.
> As you have found, reading hardware timer registers has some inherent delay, and as Richard mentions, a JTAG and/or
> RTDX based method would take so much time the actual memory cycle would end up a tiny fraction.
>
> The only way to accurately measure a single memory cycle time is externally (dig scope or LA). My suggestion to get a
> worst-case figure would be to make three accesses: one to your mem, one to another mem (to force a change in CEn
> lines), and a third one to your mem. And watch this on the scope. Then you would know both the cycle duration and
> the amount of time the compiler is adding for your line of C code.

I pretty much agree with Jeff.
If you are writing your testcase in C, keep it very simple, keep
everything in main, and always check the asm code that was generated.
I prefer a loop because some days it takes me a few tries to get setup
correctly and "sync'd". When doing the reads, look at the CEx line and
AOE to determine the read cycle time.

Start:
Read from 0xA0000000 [CE2].
Read from 0x90300004 [CE1].
Read from 0xA0000000 [CE2].
Read from 0x90300004 [CE1].
goto Start.

Once you get a handle on the measurements, you can insert 2 CE2
accesses around your "code of interest" to measure the time in your
app. If you 'register' your CE2 address, you can get very accurate
numbers.
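The loop above might look like this in C. This is only a sketch: the pointer parameters are an assumption made so the same function can be exercised on a host, the loop is bounded rather than endless, and `alternate_reads` is a hypothetical name. On the target the arguments would be the two addresses listed above, cast to `volatile const uint32_t *`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alternate reads between two chip-enable spaces so that every access
 * toggles the CE lines and shows up as a separate cycle on a scope.
 * The values read are XORed into a result so the reads have a visible use;
 * 'volatile' keeps the compiler from removing or merging the accesses. */
static uint32_t alternate_reads(volatile const uint32_t *ce2,
                                volatile const uint32_t *ce1,
                                unsigned long iterations)
{
    uint32_t sink = 0;
    unsigned long i;

    assert(ce2 != NULL && ce1 != NULL);
    for (i = 0; i < iterations; ++i) {
        sink ^= *ce2;   /* read from the CE2 space */
        sink ^= *ce1;   /* read from the CE1 space (the FIFO) */
    }
    return sink;
}
```

As suggested above, the generated asm should still be checked, since the XOR and loop overhead add instructions between the two reads.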

mikedunn

--
www.dsprelated.com/blogs-1/nf/Mike_Dunn.php

_____________________________________

Reply by Adolf Klemenz ●July 13, 2009

Dear Dominic,

C6x CPU reads from the EMIF are always significantly slower than
expected. This is caused by pipeline and synchronization penalties: the
data has to cross different clock domains. Also, it takes 4 "delay slots"
from a read instruction until the data is available in the register file
and ready to be stored.

You can speed up performance by interleaving multiple read instructions,
but this requires low-level assembler programming.

I recommend using EDMA or QDMA to read your FIFO. With DMA you will get
the expected performance (40 ns read cycle time). Make sure the DMA
destination is in internal L2 RAM. If it is in external memory (SDRAM, for
example), the EMIF must switch from asynchronous to synchronous mode with
every transfer, which will dramatically slow down the transfer.

Best Regards,
Adolf Klemenz, D.SignT

At 14:31 13.07.2009 +0000, d.stuartnl wrote:
>Thanks for your reply R.Williams,
>
>I am measuring the instruction with a hardware timer (which creates 0.1665
>us delay). This means that the read instruction still takes (0.333 -
>0.1665) 0.1665 us. This seems very slow (+/-6MHz) for a 300 MHz CPU
>connected to a 100MHz bus. Is there any way to speed this up?
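
For reference, the 40 ns figure and the roughly 6 MHz rate quoted above can be reproduced from the CE1 setting given earlier in the thread (0x10914221). What follows is a minimal host-side sketch in C, not code from any of the posters. The bit positions are written from memory of the C671x CE space control register layout and should be checked against the EMIF reference guide; all function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read-side fields of a C671x EMIF CE space control register (CECTL).
   ASSUMPTION: bit positions as remembered from the EMIF reference guide. */
typedef struct {
    unsigned rd_setup;   /* bits 19-16, in ECLKOUT cycles */
    unsigned rd_strobe;  /* bits 13-8,  in ECLKOUT cycles */
    unsigned rd_hold;    /* bits 2-0,   in ECLKOUT cycles */
    unsigned mtype;      /* bits 7-4,   2 = 32-bit asynchronous */
} cectl_read_timing;

static cectl_read_timing decode_cectl(uint32_t cectl)
{
    cectl_read_timing t;
    t.rd_setup  = (unsigned)((cectl >> 16) & 0xFu);
    t.rd_strobe = (unsigned)((cectl >> 8)  & 0x3Fu);
    t.rd_hold   = (unsigned)(cectl         & 0x7u);
    t.mtype     = (unsigned)((cectl >> 4)  & 0xFu);
    return t;
}

/* Length of one asynchronous read on the bus: setup + strobe + hold. */
static unsigned read_cycle_clocks(cectl_read_timing t)
{
    return t.rd_setup + t.rd_strobe + t.rd_hold;
}

/* Convert a number of EMIF clocks to nanoseconds. */
static double clocks_to_ns(unsigned clocks, double emif_mhz)
{
    assert(emif_mhz > 0.0);
    return clocks * 1000.0 / emif_mhz;
}

/* Convert a per-access time in nanoseconds to an access rate in MHz. */
static double period_ns_to_mhz(double period_ns)
{
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}
```

Under these assumptions, 0x10914221 decodes to 1 setup, 2 strobe and 1 hold cycle, so `clocks_to_ns(read_cycle_clocks(decode_cectl(0x10914221u)), 100.0)` gives 40 ns, while `period_ns_to_mhz(166.5)` gives about 6.0 MHz for the measured 0.1665 us. The difference is time spent outside the bus cycle itself, which is the point of Adolf's reply.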

_____________________________________

Reply by "d.stuartnl" ●July 13, 2009

Hi Jeff,

thanks for your response. I think you're right that measuring such small times in a crude way leaves a huge error margin. But it's not about the measurements. I first noticed the problem when I collected 1000 bytes: when I divided the measured time by the number of bytes read, I concluded that the reads are very slow. I then measured single reads with the same (crude) technique (a hardware timer), subtracted the delay the timer imposes, and found that the earlier figure is fairly accurate: a read somehow takes 0.16 us. Is this normal behaviour, or am I overlooking something?
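
The measurement described above (time a run of reads, subtract what the timer itself costs, divide by the number of reads) can be sketched in C as follows. This is only an illustration under assumptions: `timer_fn` stands for whatever routine returns the free-running hardware timer count, and `fake_timer` is a host-side stand-in so the sketch can be compiled off-target. None of these names come from the thread. Loop overhead is still included in the result, so the figure is an upper bound on the read time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: returns the current count of a free-running timer. */
typedef uint32_t (*timer_fn)(void);

/* Host-side stand-in: every call advances the count by 5 ticks. */
static uint32_t fake_ticks;
static uint32_t fake_timer(void)
{
    fake_ticks += 5u;
    return fake_ticks;
}

/* Ticks consumed by two back-to-back timer reads with nothing between them. */
static uint32_t timer_overhead_ticks(timer_fn now)
{
    uint32_t t0 = now();
    uint32_t t1 = now();
    return t1 - t0;              /* unsigned subtraction survives wrap-around */
}

/* Average ticks per read over n reads of *src, timer overhead removed once. */
static double avg_ticks_per_read(timer_fn now, volatile const uint32_t *src,
                                 uint32_t n)
{
    uint32_t overhead, t0, t1, elapsed, i;
    volatile uint32_t sink = 0;

    assert(now != 0 && src != 0 && n > 0);
    overhead = timer_overhead_ticks(now);

    t0 = now();
    for (i = 0; i < n; i++) {
        sink = *src;             /* volatile: one bus read per iteration */
    }
    t1 = now();
    (void)sink;

    elapsed = t1 - t0;
    if (elapsed < overhead) {
        return 0.0;              /* measurement below timer resolution */
    }
    return (double)(elapsed - overhead) / (double)n;
}
```

Averaging over many reads spreads the fixed timer cost across the whole run, which is why the 1000-byte measurement is less sensitive to timer overhead than timing a single read.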

I am using a D.Module.C6713 in combination with a TI FIFO (SN74V215). There is some glue logic involved (programmed in the onboard CPLD).

I suspect it is a software/configuration problem because I am getting valid data, only slowly. I would think it was hardware related if I had no data or the data was corrupt, but since the data is fine I am at a loss as to why the transfer speed is so slow.

With kind regards,

Dominic

--- In c..., "Jeff Brower" wrote:
>
> Dominic-
>
> > Thanks for your reply R.Williams,
> >
> > I am measuring the instruction with a hardware timer (which creates
> > 0.1665 us delay). This means that the read
> > instruction still takes (0.333 - 0.1665) 0.1665 us. This seems very
> > slow (+/-6MHz) for a 300 MHz CPU connected to a
> > 100MHz bus. Is there any way to speed this up?
>
> Any "internal" technique you use to measure the duration of something in the nsec range is not going to be accurate.
> As you have found, reading hardware timer registers has some inherent delay, and as Richard mentions, a JTAG and/or
> RTDX based method would take so much time the actual memory cycle would end up a tiny fraction.
>
> The only way to accurately measure a single memory cycle time is externally (dig scope or LA). My suggestion to get a
> worst-case figure would be to make three accesses: one to your mem, one to another mem (to force a change in CEn
> lines), and a third one to your mem. And watch this on the scope. Then you would know both the cycle duration and
> the amount of time the compiler is adding for your line of C code.
>
> -Jeff
>
> > --- In c..., "Richard Williams" wrote:
> >>
> >> d.stuartnl,
> >>
> >> I see 10 instructions in the code:
> >>
> >> MV.L2X A5,B4 <<-- copy source address to working register
> >> LDW.D2T2 *+B4[0],B4 <<-- read value at source
> >> MVK.S1 0xffffd100,A6 <<-- set up low portion of destination address in A6
> >> MVKH.S1 0x10000,A6 <<-- set up high portion of destination address in A6
> >> NOP 2 <<-- wait two NOP instruction times
> >> STW.D1T2 B4,*+A6[0] <<-- write value to destination address
> >> NOP <<-- wait 1 NOP instruction time
> >>
> >> So the "tmpRead1 = *read1;" takes ~10 instructions.
> >> ( I do not have the H/W details at hand, so cannot supply the specifics on cycle
> >> times for each instruction. I also do not know the specific processor well
> >> enough to predict the amount of pipeline stalls, etc )
> >>
> >> In general, the operation tmpRead1 = *read1; will take much longer than the time
> >> needed to read the value from the source address.
> >>
> >> Finally, the time measuring tool (?JTAG?) probably imposes some delays for
> >> setup/communication/etc.
> >>
> >> R. Williams
>
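
Jeff's three-access suggestion quoted above (your memory, a different memory, your memory again, observed on a scope) could look roughly like this in C. It is a sketch under assumptions, not a tested routine: the function name is hypothetical, and the second address must be chosen in a different CE space on the actual board. Each FIFO read consumes one word, so the probe removes two words from the FIFO.

```c
#include <assert.h>
#include <stdint.h>

/* Read the FIFO, then a location in another CE space, then the FIFO again.
   The middle access forces the CEn lines to change, so the scope shows a
   worst-case cycle for the second FIFO read. */
static void three_access_probe(volatile const uint32_t *fifo,
                               volatile const uint32_t *other_ce,
                               uint32_t out[3])
{
    assert(fifo != 0 && other_ce != 0 && out != 0);
    out[0] = *fifo;       /* first access to the device under test */
    out[1] = *other_ce;   /* access elsewhere, changes the active CE line */
    out[2] = *fifo;       /* second access, the one to measure on the scope */
}
```

On the target this might be called as `three_access_probe((volatile const uint32_t *)0x90300004, (volatile const uint32_t *)0x80000000, buf)`, where 0x90300004 is the FIFO address from the thread and 0x80000000 is only an assumed address in another CE space. Triggering the scope on the FIFO's chip-enable line then shows both the bus cycle length and the gap the compiled code leaves between accesses.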

_____________________________________

Reply by "d.stuartnl" ●July 13, 2009

Dear Adolf,

as I understand DMA, I would need to work in "blocks" of data, but that would be very tricky in my application since I do not know how big the data stream is going to be. Or is it possible to use DMA for single-byte transfers?

With kind regards,

Dominic


_____________________________________

Reply by Jeff Brower ●July 13, 2009

Dominic-

> thanks for your response. I think you're right that measuring such small
> times in a crude way leaves a huge error margin. But it's not about the
> measurements. I first noticed the problem when I collected 1000 bytes:
> when I divided the measured time by the number of bytes read, I concluded
> that the reads are very slow. I then measured single reads with the same
> (crude) technique (a hardware timer), subtracted the delay the timer
> imposes, and found that the earlier figure is fairly accurate: a read
> somehow takes 0.16 us. Is this normal behaviour, or am I overlooking
> something?
>
> I am using a D.Module.C6713 in combination with a TI FIFO (SN74V215).
> There is some glue logic involved (programmed in the onboard CPLD).
>
> I suspect it is a software/configuration problem because I am getting
> valid data, only slowly. I would think it was hardware related if I had
> no data or the data was corrupt, but since the data is fine I am at a
> loss as to why the transfer speed is so slow.

I answered based on single-cycle access time since that's what you asked about. If it now turns out you're actually
concerned about block transfer rate (in your comments above, 1000 bytes), then I suggest following Adolf's advice
regarding DMA.

-Jeff


_____________________________________
