C6000 external memory interface speed

Started by Mike S. October 20, 2004
Hello All,

I am trying to access asynchronous memory-mapped devices on a C6713
DSK daughtercard. The fastest access rate the DSK can sustain is 7 to 8
ECLKOUT cycles per access, even if the program does nothing but write
continuously to a single CE2 or CE3 space address. According to
the C6000 EMIF specification the fastest access should take only 2
cycles. So my C6713 is too slow by a factor of about 4.

I have disabled interrupts and configured the EMIF registers for the
fastest operation. All the program code resides in the internal SRAM
of the DSP. By observing the read enable and write enable lines of the
DSP I have verified that no other EMIF activity, except for what the
program does, occurs. Yet the DSP de-asserts (is that a word?) the chip
enable signal (CE2, CE3) for 5 or 6 cycles between the writes and
nothing seems to happen during that time.

Are there any C6000 gurus out there who could point me in the right
direction with this problem? TIA.

Regards,

Mike S.
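
The kind of test described in the post above can be sketched roughly as follows. This is a minimal sketch, not the code that was actually run: the function name `hammer_writes` is hypothetical, and the only hardware fact assumed is that CE3 starts at 0xB0000000 in the C6713 memory map. The `volatile` qualifier keeps the compiler from collapsing the repeated stores into one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the CE3 space in the C6713 memory map (CE2 is 0xA0000000). */
#define CE3_BASE 0xB0000000u

/* Hypothetical helper: issue `count` back-to-back 32-bit stores to a
 * single address. Because `dst` is volatile, every store must be
 * emitted, so each iteration produces one write on the bus. */
static void hammer_writes(volatile uint32_t *dst, size_t count)
{
    size_t i;

    assert(dst != NULL);
    for (i = 0; i < count; i++) {
        *dst = (uint32_t)i;
    }
}
```

On the target one would call something like `hammer_writes((volatile uint32_t *)CE3_BASE, 1000)` with interrupts disabled and watch the CE3 and AWE lines on a scope or logic analyzer. On a host the same function can be pointed at an ordinary variable to check the logic.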
Hello Mike,

As no "C6000 guru" has responded yet, here are some suggestions from a non-C6000 guru.

You haven't mentioned the data width. If it is less than 64 bits, you could try
writing "long double" (64-bit) values and measuring the CE signals again.

I've seen something similar with SDRAM. The EMIF doesn't seem to know whether
back-to-back reads or writes are consecutive accesses. It does know, however,
when it splits a larger value into smaller pieces for the access (e.g. a 32-bit
write command with 8-bit RAM connected). As a result, a single 8-bit, 16-bit,
32-bit or 64-bit access to 8-bit RAM matches the data sheet timing, but between
two separate accesses there seems to be an additional delay (the 5 or 6 cycles
you mentioned).

Unfortunately I haven't found any timing diagrams that show more than one access,
and I don't yet know whether the EMIF performs better with EDMA accesses.

Maybe a real guru knows better.

                                            Wolfgang


Hello Wolfgang,

"Wolfgang" <never@nowhere.com> wrote in message news:<cl8695$2ab$03$1@news.t-online.com>...
>As no "C6000 guru" has yet responded some suggestions from a non-C6000 guru:
I welcome all suggestions/comments. Guruhood is a vague and subjective concept anyway. I was just trying to provoke people to answer.
> You've not written about the datawidth.
The data width is 32 bits, the alignment is 4 bytes (32 bits) and the memory I'm writing to is specified as ASYNC32.
> If it is <64bit.
> You can try to write "long double" (64bit) values and measure the CE signals again.
I thought 64-bit stores (STDW) were only available on C64x while I'm on C6713. How should I go about writing 64-bit values?
> I've discovered something similar with a SDRAM.
I think SDRAM writes are a bit different; I don't know how that applies here.
> And I don't yet know if the EMIF performs better when an EDMA access is done.
Yeah, I've been meaning to test DMA writes as well, but haven't gotten around to it yet.
> Maybe a real guru knows better.
>
> Wolfgang
Thanks for your suggestions/comments.

Regards,

Mike S.
Hello Mike,

> The data width is 32 bits, the alignment is 4 bytes (32 bits) and the memory I'm
> writing to is specified as ASYNC32.
O.k. I just thought if it had been less you could have played with it to gain more knowledge.
> I thought 64-bit stores (STDW) were only available on C64x while I'm on C6713.
There you see I'm not the guru. ;-)
> I think SDRAM writes are a bit different, don't know how it applies here.
Yes, basically you are right. Your description doesn't leave many possibilities. As you've described it, it should work better. The only other possibility I can think of is that there is some glue logic between the 67 and the daughtercard. Maybe this slows down performance?
> Yeah, I've been meaning to test DMA writes as well, but haven't gotten around to
> it yet.
... and I don't think that it will lead to better results.

Wolfgang
Hello Wolfgang,

"Wolfgang" <never@nowhere.com> wrote in message news:<cla9e6$ldb$02$1@news.t-online.com>...
> The only possibility I can additionally think of that there is some glue logic from the 67 to
> the daughter card. May this slows down performance ?
I measured the signals also directly from the DSP (on test points on the DSK board) so the glue logic could not have interfered. One thing that bothers me is that the JTAG and USB portion of the DSK is not documented at all. But I doubt that the JTAG is the problem here. My knowledge of the inner workings of the JTAG emulation is very limited, though.
> > Yeah, I've been meaning to test DMA writes as well, but haven't gotten around to
> > it yet.
> ... and I don't think that it will lead to better results.
Me neither, but maybe I'll gather some clues on how to proceed. Anyway, I'm going to ask Spectrum Digital and TI about this, but I thought I'd ask here first whether there are any obvious EMIF traps one could fall into when trying to get the full specified performance out of it.

Regards,

Mike S.
Mike S. wrote:
>> > Yeah, I've been meaning to test DMA writes as well, but haven't gotten
>> > around to it yet.
>> ... and I don't think that it will lead to better results.
>
> Me neither, but maybe I gather some clues on how to proceed. Anyways,
> I'm going to ask Spectrum Digital and TI about this, but thought I'd
> ask here first if there were any obvious EMIF traps one could fall
> into when trying to get the full specified performance out of it.
>
[I have only used the 6711 so far...]

Beginning with the basics: I am pretty sure that the cycle counts of async accesses can be programmed (the manuals are at work). Have you done that?

When accessing CACHED memory, like SDRAM, the EMIF is smart and loads the full cache line, so you will see the full overhead only once. (But of course you can't read from ONE location using that scheme...)

For the CPU, the EMIF can never really know when a new request will come. (The CPU is stalled some of the time: it assumes a four-cycle access, but it takes more...) But since the DMA knows exactly what it is supposed to do, it could work ahead.

I would recommend using DMA to move stuff to and from the SRAM. This way the processor can do what it is good at: number crunching...

Note: The cache is useful for code and data that are used in batches. But take care to prioritize what is more important: the DMA (which usually has to have moved data from an external port to memory before the next data arrives) or the DSP core. I think the default is that the core has the higher priority, and a high-frequency core together with cache can cause a high EMIF load. But on the 6713 you can configure this.

/RogerL
Hi Roger,

Roger Larsson <roger.larsson@skelleftea.mail.telia.com> wrote in message news:<NSfed.107074$dP1.402878@newsc.telia.net>...
> [I have only used 6711, yet...]
>
> Beginning with basics: I am pretty sure that the cycles of async. accesses
> can be programmed. (Manuals are at work) Have you done that?
[For definiteness I'll speak about writes to CE3 memory space below and that's also what I've been playing around with the most.] Yes, I've set the number of setup, hold and strobe cycles to 0 in CECTL3 for both reads and writes. (I know the minimum for setup and strobe is 1 but 0 is treated as 1 here). And I've set the minimum turn around time to 0 as well (also tried TA=1). From looking at the CE3n and AWEn signals it doesn't seem to me that any part of the write access takes too long, but that there is a long delay between the consecutive writes (perhaps as if TA was set to some large value). The CE3n signal comes up and stays up for many ECLKOUT cycles between the writes. From the documentation I would assume that CE3n stays low for consecutive asynchronous writes.
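
To make the register settings above concrete, here is a minimal sketch of how a CECTL value could be assembled from its timing fields. The bit positions (WRSETUP 31-28, WRSTRB 27-22, WRHLD 21-20, RDSETUP 19-16, TA 15-14, RDSTRB 13-8, MTYPE 7-4, RDHLD 2-0) and the MTYPE code 2 for 32-bit asynchronous memory are assumptions taken from memory of the C621x/C671x EMIF documentation and should be checked against SPRU266 before use. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MTYPE encoding for a 32-bit asynchronous interface. */
#define MTYPE_ASYNC32 2u

/* Hypothetical container for the CECTL timing fields. */
typedef struct {
    unsigned wr_setup;   /* 4 bits */
    unsigned wr_strobe;  /* 6 bits */
    unsigned wr_hold;    /* 2 bits */
    unsigned rd_setup;   /* 4 bits */
    unsigned ta;         /* 2 bits, turn-around time */
    unsigned rd_strobe;  /* 6 bits */
    unsigned mtype;      /* 4 bits, memory type */
    unsigned rd_hold;    /* 3 bits */
} cectl_fields;

/* Pack the fields into a 32-bit register value using the assumed
 * layout. Bit 3 (the write-hold MSB, if present) is left at 0. */
static uint32_t cectl_pack(const cectl_fields *f)
{
    assert(f->wr_setup < 16u && f->wr_strobe < 64u && f->wr_hold < 4u);
    assert(f->rd_setup < 16u && f->ta < 4u && f->rd_strobe < 64u);
    assert(f->mtype < 16u && f->rd_hold < 8u);

    return ((uint32_t)f->wr_setup  << 28) |
           ((uint32_t)f->wr_strobe << 22) |
           ((uint32_t)f->wr_hold   << 20) |
           ((uint32_t)f->rd_setup  << 16) |
           ((uint32_t)f->ta        << 14) |
           ((uint32_t)f->rd_strobe <<  8) |
           ((uint32_t)f->mtype     <<  4) |
            (uint32_t)f->rd_hold;
}
```

With every timing field set to 0 and `mtype` set to `MTYPE_ASYNC32`, as described above, this layout gives the value 0x00000020. Writing setup and strobe explicitly as 1 instead would give 0x10410120, which the post suggests the hardware treats the same way.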
> When accessing CACHED memory, like SDRAM, the EMIF is smart and loads the
> full cache line, and thus you will see only one full overhead.
> (But of course you can't read from ONE location using that scheme...)
The memory I'm writing to is programmed to be ASYNC32 and non-cacheable.
> For the CPU the EMIF can never really know when a new request will come.
> (CPU is stalled some of the time - it assumes four cycle access but it takes
> more...)
You mean four core clock cycles, not ECLKOUT cycles, right? From the docs I would assume that a sustained write speed of two ECLKOUT cycles per write is possible, no?

Actually, one thing I haven't mentioned in this thread is that when I start a program the first two writes always take two cycles each, with a one-cycle delay between them. For the next write the delay is three cycles, and after that it is either five or six cycles.

Now that I stare at the timing diagram (Figure 1-10 in SPRU266B) for the umpteenth time (thanks for making me go look at it again), I notice a "CE write hold" period after the two writes. What's up with that? I don't think it's mentioned in the text at all! I also cannot find any mention of it in SPRS186I. Where is the fine print I'm missing? Perhaps foolhardily, I assumed that the C671x EMIF would not force any such additional delays. If it's there, they certainly don't advertise it much. It wouldn't be a show stopper, but a disappointment nonetheless... well, live and learn. But I still don't think all my observations match the docs.
> But since the DMA knows exactly what it is supposed to do it could work
> ahead.
>
> I would recommend to use DMA to move stuff to and from of the SRAM. This way
> the processor can do what it is good at - number crunching...
Well, I'll be accessing multiple, multirate memory-mapped ADCs and DACs in real time. If an EMIF access takes, say, 3 ECLKOUT cycles, it is true that not much number crunching can be done during that time. I will most probably use the DMA in the actual application, at least to some extent, but for now I'm just trying to characterize the system and see whether it performs up to the specs. It seems there is something I don't understand about async EMIF accesses (if that's not too much of an understatement). Knowledge about the latency and delays will be important to me anyway.

Thanks for your input.

Regards,

Mike S.
Found some more app notes:

Under optimum conditions:

        TMS320C621x/671x EDMA Performance Data (spraa03.htm, 1 KB)
        05 Mar 2004  Abstract

        "Chapter 2.5 CPU Accesses and Caching

        The following sections examine the access time for various CPU accesses
        to external memory. It should be noted that using the CPU to access data
        is generally a bad use of resources and should be avoided."

        double word reads 225 MHz DSP, 100 MHz memory, non cached SBSRAM =>
        39 CPU cycles average (Fig. 5)

Limitations:

        TMS320C6000 EDMA IO Scheduling and Performance (spraa00.htm, 1 KB)
        05 Mar 2004  Abstract


Most data from 6713 :-)

/RogerL
Roger Larsson <roger.larsson@skelleftea.mail.telia.com> wrote in message news:<gTCed.107165$dP1.403704@newsc.telia.net>...
> Found another app notes:
>
> Under optimum conditions:
>
> TMS320C621x/671x EDMA Performance Data (spraa03.htm, 1 KB)
[snip]
> double word reads 225 MHz DSP, 100 MHz memory, non cached SBSRAM =>
> 39 CPU cycles average (Fig. 5)
Hey, Roger, thanks again for making me go through all these documents (spraa03, spraa00, spra996) again. I've read (or perhaps I should say browsed) them all before, but obviously something hadn't quite sunk in.

As far as writes go, Figure 7 in spraa03 shows approximately 18 CPU cycles per SBSRAM write, which is about 8 ECLKOUT cycles. This agrees well with my measurements of 7-8 ECLKOUT cycles per ASRAM write. So it seems that CPU writes to the EMIF really are quite slow, IMO.

However, with EDMA I should be able to do about 90 million writes per second to SBSRAM according to Table 3 in spraa03. I wonder how that translates to ASRAM writes. I would expect each write to take 2 ECLKOUT cycles, so the throughput would be close to 50 MSPS (with a 100 MHz EMIF). On the other hand, the SBSRAM burst length is 4, so maybe I'll only get 25 MSPS. I guess I'll just have to test and see. Anyway, it should be better than what I get now with CPU writes (about 10 MSPS).

Thanks again,

regards,

Mike S.
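
The cycle arithmetic in this last post can be checked with a few lines of C. This is only a sketch of the unit conversions, assuming the 225 MHz CPU clock and 100 MHz EMIF clock quoted from spraa03; the function names are hypothetical and nothing here models the EMIF itself.

```c
#include <assert.h>

/* Convert a duration in CPU cycles to ECLKOUT cycles, given both
 * clock rates in MHz. */
static double cpu_to_eclk_cycles(double cpu_cycles, double cpu_mhz,
                                 double emif_mhz)
{
    assert(cpu_mhz > 0.0);
    return cpu_cycles * emif_mhz / cpu_mhz;
}

/* Sustained write rate in millions of writes per second for a given
 * EMIF clock (MHz) and cost per write (ECLKOUT cycles). */
static double mwrites_per_second(double emif_mhz, double eclk_per_write)
{
    assert(eclk_per_write > 0.0);
    return emif_mhz / eclk_per_write;
}
```

Under those assumptions, 18 CPU cycles correspond to 18 * 100 / 225 = 8 ECLKOUT cycles, a 2-cycle write gives 50 million writes per second, and 7 to 8 cycles per write give roughly 12.5 to 14 million, which is in the same range as the measured CPU write rate of about 10 MSPS.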