DSPRelated.com

Sloooow Async EMIF accesses

Started by TKSOFT October 31, 2006
Hi,

My 8-bit wide CPU Async read accesses from my C6711B are taking much
longer than I expect.

There seems to be a mysterious "startup" delay of ~19 EMIF cycles.
Is this normal???

I've made a lot of tests, and everything makes perfect sense EXCEPT
the "startup" delay.

The memory is configured as MTYPE=0 (8-bit wide Async) with 0-7-0
wait states. My EMIF clock is running at CPUClk/2. I was expecting a
read to take ~10 EMIF clocks (20 CPU clocks), but instead I measure ~24
EMIF clocks (48 CPU clocks).

I'm testing with the following assembly code, executing from internal
RAM and repeated 50 times, and measuring CPU clocks with the CCS profile
clock counter. No DMAs enabled, interrupts off, no cache.

MVKL .S2 0xa0000008,B4
MVKH .S2 0xa0000008,B4
LDBU .D2T2 *B4,B4
NOP 4

This executes in 52 CPU clocks!! If I add one additional EMIF wait
state (0-8-0) I see an increase of exactly 1 EMIF clock (2 CPU
clocks) and the time goes to 54 CPU clocks.

Also, if I change from an 8-bit to a 32-bit read (change LDBU to LDW),
so that 4 EMIF sequences are required, the time only increases to
98 CPU clocks, which is not even double. That means the 3 additional
bus reads only took (98-52)/3/2 = ~8 EMIF cycles each, which seems
very reasonable.
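
The arithmetic above can be written out as a small sketch. It uses only the CPU-clock counts quoted in this post; the variable names are made up for illustration, and the split into "fixed overhead" and "per read" assumes that the first bus read costs the same as the three later ones, which the measurements do not prove. The 52-clock figure also includes the few clocks spent on the surrounding instructions, so the overhead it yields is an upper bound.

```python
# All timings are the CPU-clock counts quoted in the post above.
CPU_CLKS_PER_EMIF_CLK = 2      # EMIF clock runs at CPUClk/2

t_byte_strobe7 = 52            # LDBU with 0-7-0 wait states
t_byte_strobe8 = 54            # LDBU with 0-8-0 wait states
t_word_strobe7 = 98            # LDW on 8-bit memory: 4 bus reads


def to_emif_clks(cpu_clks):
    """Convert a CPU-clock count to EMIF clocks."""
    return cpu_clks / CPU_CLKS_PER_EMIF_CLK


# Cost of one extra strobe wait state.
per_wait_state = to_emif_clks(t_byte_strobe8 - t_byte_strobe7)

# Cost of each of the 3 additional bus reads that LDW needs.
per_extra_read = to_emif_clks(t_word_strobe7 - t_byte_strobe7) / 3

# What is left of the single-byte read once one bus read is subtracted.
# Assumption: the first bus read costs the same as the later ones.
fixed_overhead = to_emif_clks(t_byte_strobe7) - per_extra_read

print(f"per wait state : {per_wait_state:.1f} EMIF clocks")
print(f"per extra read : {per_extra_read:.1f} EMIF clocks")
print(f"fixed overhead : {fixed_overhead:.1f} EMIF clocks")
```

Under that assumption the unexplained part comes out at roughly 18 EMIF clocks per load, in line with the ~19 clock "startup" delay described above.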

Other things I've tried are:

Configuring MTYPE=2 (32-bit memory) and doing 32-bit accesses: the
results are exactly the same (52 CPU clocks).

Changing the read address from 0xa0000008 (CE2 space) to 0x00000008
(0 wait state internal RAM): the time drops to 7 CPU cycles, as
expected.

I suspect this is just the way the C6x works (although I can't find
that stated anywhere), but if not, and anyone knows what I'm doing
wrong or a way to speed this up (without using DMA), please, please,
please let me know.

Thanks!

TK

TK-

In my experience, 6x1x series devices stall for a significant time whenever user code
"moves a signal at the chip edge". This applies to GPIO, EMIF, HINT, etc. In other
words, moving a signal as a result of a software instruction involves
synchronization, queuing, or other logic on the device and is not straightforward.
Avoiding I/O stalls requires successive accesses made by on-chip circuitry, typically
DMA, McBSP, etc. SDRAM burst-mode access via EMIF is one example where efficiency is
high.
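
To illustrate why successive accesses help, here is a rough sketch of how a fixed per-request cost is amortized. The two constants are derived from the timings TK quoted (52 CPU clocks for LDBU, 98 for LDW, EMIF at CPUClk/2). The function name is hypothetical, and the model assumes that longer hardware-driven sequences behave like the 4-read LDW case, which has not been measured here.

```python
# Derived from the quoted timings: 52 CPU clocks for one byte read,
# 98 CPU clocks for four back-to-back byte reads, 2 CPU clocks per EMIF clock.
PER_READ = (98 - 52) / 2 / 3          # EMIF clocks per additional bus read
FIXED_OVERHEAD = 52 / 2 - PER_READ    # EMIF clocks left over per request


def emif_clks_per_byte(n_reads, fixed_overhead=FIXED_OVERHEAD, per_read=PER_READ):
    """Average EMIF clocks per byte when n_reads back-to-back bus reads
    share one fixed startup cost (an assumed, unverified model)."""
    if n_reads < 1:
        raise ValueError("n_reads must be at least 1")
    return (fixed_overhead + n_reads * per_read) / n_reads


for n in (1, 4, 16, 64):
    print(f"{n:3d} reads per request: {emif_clks_per_byte(n):5.2f} EMIF clocks/byte")
```

With one read per request the model reproduces the 26 EMIF clocks (52 CPU clocks) measured for LDBU; as the number of reads per request grows, the average tends toward the per-read cost alone.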

For C6x devices, I believe I've seen mention of documentation about this on the group
before -- hopefully someone else can point it out. I know for example that TI has
documented this on 55xx devices (which exhibit a significant difference in this area
between 5510 and 550x.)

-Jeff