c6x | EDMA data cache problem

Folks,

I have a real big problem with EDMA and cache coherency.
Board: 6416, Spectrum Digital
Here's what I am doing.
1) Transfer data from SDRAM to ISRAM.
2) Work with data in ISRAM
3) Transfer back to SDRAM

4) Repeat process for next block in SDRAM to same block in ISRAM
.
.
5) Finally use SDRAM data.

Blocks are 128-byte aligned in ISRAM and SDRAM and processed in chunks
that are multiples of 128 bytes (actually 96k).
L2 cache is 128k and enabled.

Now before the transfer from SDRAM to ISRAM in step 1, I always call
CACHE_wbInvL2(SDRAM block, block size, CACHE_WAIT). I think that
should be enough for cache coherency because the docs say L1D is
handled by EDMA. Also, the ISRAM block is always cache coherent. Correct?
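For reference, the sequence in steps 1-5 can be written down as a small host-runnable C sketch. Everything here is a stand-in, not the CSL or EDMA API: `cache_wb_inv`, `cache_inv` and `dma_copy_blocking` are hypothetical names that on the DSP would presumably wrap calls such as CACHE_wbInvL2 and a blocking EDMA transfer. The sketch assumes each transfer has fully completed before the ISRAM block is reused, and it adds an invalidate of the SDRAM output before step 5; whether that last call is needed on a given setup is an assumption worth checking against the cache guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L2_LINE_BYTES 128u   /* C64x L2 cache line size */

static unsigned cache_ops = 0;   /* counts stand-in cache calls */

/* Hypothetical stand-in for a write-back-invalidate range call. */
static void cache_wb_inv(void *ptr, size_t bytes)
{
    (void)ptr;
    assert(bytes % L2_LINE_BYTES == 0);   /* whole cache lines only */
    cache_ops++;
}

/* Hypothetical stand-in for an invalidate-only range call. */
static void cache_inv(void *ptr, size_t bytes)
{
    (void)ptr;
    assert(bytes % L2_LINE_BYTES == 0);
    cache_ops++;
}

/* Hypothetical stand-in for an EDMA copy that returns only when done. */
static void dma_copy_blocking(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Placeholder for the real algorithm: adds 1 to every byte. */
static void process_slice(uint8_t *buf, size_t bytes)
{
    for (size_t i = 0; i < bytes; i++)
        buf[i] = (uint8_t)(buf[i] + 1u);
}

/* Steps 1-4 for each slice; total must be a multiple of slice. */
static void process_all(uint8_t *sdram, size_t total,
                        uint8_t *isram, size_t slice)
{
    for (size_t off = 0; off < total; off += slice) {
        cache_wb_inv(sdram + off, slice);              /* before DMA reads SDRAM */
        dma_copy_blocking(isram, sdram + off, slice);  /* step 1 */
        process_slice(isram, slice);                   /* step 2 */
        dma_copy_blocking(sdram + off, isram, slice);  /* step 3, finished before reuse */
    }
    cache_inv(sdram, total);   /* before the CPU reads SDRAM in step 5 */
}

/* Runs four 256-byte slices and checks the result; returns 0 on success. */
static int demo_slices(void)
{
    static uint8_t sdram[4 * 256];
    static uint8_t isram[256];

    for (size_t i = 0; i < sizeof sdram; i++)
        sdram[i] = (uint8_t)i;
    process_all(sdram, sizeof sdram, isram, sizeof isram);
    for (size_t i = 0; i < sizeof sdram; i++)
        if (sdram[i] != (uint8_t)(i + 1u))
            return -1;
    return 0;
}
```

On the host the copies are synchronous, so the sketch only documents the ordering; on the target, a transfer that is still in flight when the next slice starts would look very much like a coherency problem.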

But the data gets all screwed up....

Logically, I think I am doing things right.

Is there a way to check cache coherency without the debugger, and
without comparing memory via the CPU, since that in itself pulls the
data into L2/L1D cache?
It looks like EDMA is doing its job but caching isn't.

What may be the problem here.... Appreciate some ideas.
Thanks,

C

Reply by carlferns ● August 4, 2006

Thanks for your response.
I can try increasing the cache size.
But I cannot understand the reason behind that.
Currently I am processing blocks sequentially and am looking for the
reasoning to justify the results. After all I'd like to control the
cache so that the data is good or at least know if there is a work
around for a cache bug.

I looked at spru610.pdf - the cache guide for C64x. Is there another
doc you are referring to?

Once again thanks for the response.
Any other comments will be very appreciated...

-C.

> Hi!
> Try to increase the L2 cache size from 128K to
> 256K (the maximum supported by the processor). It may solve your
> problem. Also refer to the cache optimization guide
> manual from TI.
>
> Bye.

Reply by Guy Eschemann ● August 4, 2006

This may be a stupid question, but why are you using L2 data cache if your
data is already in internal memory?

If your application permits it, I would suggest that you put everything (i.e.
code + data) in internal memory. If you have some large data structures that
won't fit in ISRAM, leave those outside in SDRAM and process them slice-wise
in internal memory. And turn off the L2 data cache. This will rid you of
the nasty coherency problems, and as a bonus you'll have more internal
memory for your code/data.

In case code + data won't fit into internal memory, leave the code outside
and enable L2 cache for the external code section. L2 caching works much
better for code than for data, because code is executed sequentially most of
the time. This produces fewer cache misses. And you don't have to worry about
cache coherency issues when using program cache.

In case the execution is too slow, consider moving single critical functions
to internal memory. You can do that by creating a section in your linker
command file, and then using the #pragma CODE_SECTION() directive for
pointing out those functions.
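A minimal sketch of that last suggestion follows. The function `scale_q15` and the section name ".fast_code" are made up for illustration; the linker command file would still have to map that section to internal memory, which is not shown. The pragma is guarded by the TI compiler's predefined macro so the file also builds with a host compiler.

```c
#include <stdint.h>

/* ".fast_code" is a hypothetical section name. The linker command file
 * must place it in internal memory for this to have any effect. */
#ifdef __TI_COMPILER_VERSION__
#pragma CODE_SECTION(scale_q15, ".fast_code")
#endif

/* Example of a small, time-critical routine: Q15 gain multiply. */
int16_t scale_q15(int16_t x, int16_t gain_q15)
{
    /* 16x16 -> 32-bit product, shifted back down to Q15 */
    return (int16_t)(((int32_t)x * gain_q15) >> 15);
}
```

Calling `scale_q15(1000, 16384)` applies a gain of 0.5 in Q15. Note that in C the pragma has to appear before the function is declared or defined.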

Hope this helps,

Guy Eschemann.
Vienna, Austria.

Reply by Jeff Brower ● August 7, 2006

Carl-

I think Guy is asking a good question. Won't 96k x 32 fit in internal
memory for 6416? So why use SDRAM and EDMA?

If the issue is that at some other time you need internal memory for
another reason, then I would first try with L2 data cache enabled and no
EDMA. The first time through your data loop you lose the speed advantage
of EDMA, but subsequent times your performance is just as good. And more
importantly, that mode forces you to make absolutely sure your data is
organized in the most efficient manner, and you have "thought through"
exactly the sequence that data moves and cache is used.

Then, enable EDMA as your last step. The performance gain it's going to
give you in this situation is minimal, so you should get it working
last.

-Jeff

Reply by carlferns ● August 7, 2006

Guy,

Thanks for your comments.
All my code/data is in ISRAM. My dynamically allocated data buffer is
the only thing in SDRAM, on a 128-byte cache-line boundary, and you are
correct - I am using the slice-wise approach to transfer from SDRAM to ISRAM.

I tried it without L2 cache enabled/set but that does not change the
data corruption.

Here's where I find an issue with the cache (both L1D and L2). If I do a
verify_data(), which effectively compares the data (src and dst) byte
by byte via the CPU, there is no data corruption thereafter. I think this
step in itself restores cache coherency.

Some thoughts
1) Do the L1D/L2 functions that take the parameters (data_block, size,
CACHE_WAIT) clear all the cache lines for the entire block of data, or
do they just look up the block address alone in cache and then clear
the range (size) from there? I am afraid that if the range is not
contiguous in cache, bad cache data still remains: the addresses within
the size to be cleared may not be contiguous but could be somewhere
else in cache.....
I guess I am asking about the inner workings of the cache controller.
I will have to try clearing every address in 128-byte increments
starting from the data_block address.

2) Is there a way I can dump cache data or look at cache data..... Maybe
a stupid question - via a logic analyser...JTAG - any other tools?
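On question 1, one thing that can be checked without knowing the controller internals is which 128-byte lines a given (address, size) pair actually covers. The sketch below is plain C and assumes only the 128-byte L2 line size; it further assumes that a range operation acts on every line whose address falls inside the range, wherever that line happens to sit in the cache. The names `line_span` and `covering_lines` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 128u   /* C64x L2 cache line size */

typedef struct {
    uintptr_t start;   /* address of the first line touched */
    size_t    bytes;   /* size of the whole-line span, in bytes */
} line_span;

/* Round a byte range outwards to whole cache lines. */
static line_span covering_lines(uintptr_t addr, size_t bytes)
{
    const uintptr_t mask = ~(uintptr_t)(L2_LINE_BYTES - 1u);
    uintptr_t end = addr + bytes;             /* one past the last byte */
    line_span s;

    s.start = addr & mask;                    /* round start down */
    end = (end + L2_LINE_BYTES - 1u) & mask;  /* round end up */
    s.bytes = (size_t)(end - s.start);
    return s;
}
```

For a 96k block that starts on a 128-byte boundary the span equals the block itself. If either end is not aligned, the span spills into a line shared with neighbouring data, which is where a write-back or invalidate can touch bytes that do not belong to the buffer.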

-C

Reply by carlferns ● August 7, 2006

Jeff,

Thanks for your comments.
I guess I was not very clear with what I am doing.
My issue is with DMA and the cache coherency.
I have to process a lot more data (16 MB) in SDRAM and am slicing it up
to be processed in ISRAM.
What I keep seeing is that despite invalidating L2, the output data in
SDRAM at the very end (having processed all the input data from SDRAM)
is corrupt. The only processing going on is, as I mentioned earlier,
steps 1-5.
If I put a breakpoint and view memory at any stage between steps 1-5,
the debugger seems to handle the cache correctly and the output data is good.

Conclusion - Cache controller is acting up or the API is not doing
what it is supposed to do.

>> If the issue is that at some other time you need internal memory for
>> another reason, then I would first try with L2 data cache enabled and
>> no EDMA.

Here's what I have tried thus far.
No EDMA and no cache: the algorithm works great.
No EDMA and cache enabled: no problem, since I do not use any of the
caching API.
EDMA enabled and cache enabled: the output data is bad.

What I have noticed is it is more of a cache issue rather than a DMA
problem since the data can be verified. It is just that without the
CPU intervening and using the CACHE API, the data gets distorted.

Thanks,
C

P.S : How do I get the posts to show up in this group as a continuous
thread and without the wait.... that would be really cool.

Reply by William C Bonner ● August 7, 2006

I have what may be a stupid question, but since I'm having some issues
programming a 6713 system that may be similar, it's better to ask the
question and get clarification.

If you enable the DSP's cache controller, do you really need to make any
other cache calls unless you want to free up the L2 RAM that the cache
is using? My limited understanding of caching is that once it is
enabled, you access memory using standard memory access commands, and
the cache controller optimizes what it thinks it needs to do, in the
chunk sizes it wants to. Issuing a command to invalidate and flush the
cache gives you a controlled state of knowing when the cache has been
flushed, but should not be necessary.

If you wanted complete control of what was in the L2 ram versus the
external ram, you'd be better off disabling cache altogether, freeing up
the cache ram for general purpose use, and paging the data in manually
to do your manipulations.

Is the 6416 processor significantly different in how it works with cache
from the 6713? Am I completely off base in how to deal with a cache
controller?

Reply by Andrew Elder ● August 8, 2006

I would try doing a cache clean and see if that makes a difference.

- Andrew E.

Reply by Jeff Brower ● August 8, 2006

Carl-

> I guess I was not very clear with what I am doing.
> My issue is with DMA and the cache coherency.
> I have to process a lot more data (16mb)in SDRAM and am slicing it up
> to be processed in ISRAM.
> What I keep seeing is that despite invalidating L2, the output data in
> SDRAM at the very end (having processed all the input data from SDRAM)
> is corrupt. The only processing going on is as I mentioned earlier
> STEPS 1-5.
> If I put a break point and view memory at any stage between Steps 1-5,
> the debugger seems to handle the cache correctly and output data is good.
>
> Conclusion - Cache controller is acting up or the API is not doing
> what it is supposed to do.

Or programmer error. I know, I know, not what you want to hear... but 1000s of
engineers have used C64x EDMA and cache over the last few years.

> What I have noticed is it is more of a cache issue rather than a DMA
> problem since the data can be verified. It is just that without the
> CPU intervening and using the CACHE API, the data gets distorted.

What do you mean by "output data in SDRAM at the very end"? End of what? Each block?
Or the end of a bunch of blocks that consume all of SDRAM? If it's just the last block
or so, then what happens if you reduce your data set to use only 1/2 of SDRAM? If
the situation still occurs, then I might say it's a "boundary condition" type of error,
which usually implies an application / programmer issue rather than something else.
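One boundary condition that can be checked on paper is whether the buffer is a whole number of slices. The sketch below assumes "16 MB" means 16 MiB and "96k" means 96 KiB, which the posts do not state; under that assumption the last slice is shorter than the others. `plan_slices` and `slice_plan` are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    size_t full_slices;   /* number of complete slices */
    size_t tail_bytes;    /* bytes left over for a final short slice */
} slice_plan;

/* Split a buffer into fixed-size slices plus an optional short tail. */
static slice_plan plan_slices(size_t total_bytes, size_t slice_bytes)
{
    slice_plan p;
    p.full_slices = total_bytes / slice_bytes;
    p.tail_bytes  = total_bytes % slice_bytes;
    return p;
}
```

With 16 MiB and 96 KiB this gives 170 full slices and a 64 KiB tail. A loop that always transfers and cache-maintains a full 96 KiB would then run past the end of the buffer on the last pass, which would fit a "last block or so" symptom.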

> P.S : How do I get the posts to show up in this group as a continuous
> thread and without the wait.... that would be really cool.

The group is moderated so posts can take a while to appear. That's a good thing or
the group would die to spam, but this group has been strong since 1999.

-Jeff

Reply by Jeff Brower ● August 8, 2006

William-

> I have what may be a stupid question, but since I'm having some issues
> programming a 6713 system that may be similar, it's better to ask the
> question and get clarification.
>
> If you enable the cache controller DSP, do you really need to make any
> other cache calls unless you want to free up the L2 ram that the cache
> is using? my limited understanding of caching would be that once it is
> enabled, you access memory using standard memory access commands, and
> the cache controller optimizes what it thinks it needs to do, in the
> chunk sizes it wants to. Issuing a command to invalidate and flush the
> cache gives you a controlled state of knowing when the cache has been
> flushed, but should not be necessary.
>
> If you wanted complete control of what was in the L2 ram versus the
> external ram, you'd be better off disabling cache altogether, freeing up
> the cache ram for general purpose use, and paging the data in manually
> to do your manipulations.
>
> Is the 6416 processor significantly different in how it works with cache
> from the 6713? Am I completely off base in how to deal with a cache
> controller?

When you use EDMA to move data between external memory and internal SRAM (not cache),
the CPU doesn't "know" the internal memory has been changed; i.e. there is no
snooping. Code has to manually invalidate that area of cache. I think Carl wants to
do it this way because he's got a large amount of external SDRAM data (16 Mbyte) and
he's sort of "double buffering": moving large slices to internal memory while the
CPU is chugging away at another slice. This method might better utilize CPU internal
memory bus bandwidth and keep both CPU and DMA units busy (hopefully).
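The "double buffering" idea can be sketched as a ping-pong loop. This is a host-runnable illustration, not EDMA code: `dma_start` and `dma_wait` are hypothetical stand-ins (here a plain memcpy and a no-op), and `dma_wait` is assumed to block until every transfer started so far has completed. The cache maintenance calls discussed elsewhere in the thread are left out to keep the buffer hand-off visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE 256u   /* slice size in bytes, kept small for the demo */

/* Hypothetical stand-in: start a transfer (synchronous on the host). */
static void dma_start(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Hypothetical stand-in: wait for all outstanding transfers. */
static void dma_wait(void)
{
    /* on the target: poll or pend on the transfer-complete flag */
}

/* Placeholder for the real algorithm: inverts every byte. */
static void work(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)~buf[i];
}

/* Process 'total' bytes of external memory, SLICE bytes at a time. */
static void process_pingpong(uint8_t *ext, size_t total)
{
    static uint8_t ping[SLICE], pong[SLICE];
    uint8_t *bufs[2] = { ping, pong };
    size_t nslices = total / SLICE;

    if (nslices == 0)
        return;

    dma_start(bufs[0], ext, SLICE);              /* prefetch slice 0 */
    for (size_t k = 0; k < nslices; k++) {
        uint8_t *cur = bufs[k & 1u];
        uint8_t *nxt = bufs[(k + 1u) & 1u];

        dma_wait();   /* slice k has arrived; slice k-1 has left nxt */
        if (k + 1u < nslices)
            dma_start(nxt, ext + (k + 1u) * SLICE, SLICE);  /* fetch ahead */
        work(cur, SLICE);                        /* CPU works meanwhile */
        dma_start(ext + k * SLICE, cur, SLICE);  /* write slice k back */
    }
    dma_wait();       /* last write-back must finish before returning */
}

/* Runs four slices and checks the result; returns 0 on success. */
static int demo_pingpong(void)
{
    static uint8_t ext[4 * SLICE];

    for (size_t i = 0; i < sizeof ext; i++)
        ext[i] = (uint8_t)i;
    process_pingpong(ext, sizeof ext);
    for (size_t i = 0; i < sizeof ext; i++)
        if (ext[i] != (uint8_t)~(uint8_t)i)
            return -1;
    return 0;
}
```

The wait at the top of each pass is what keeps the CPU from touching a buffer the DMA is still filling or draining; dropping it is harmless on the host but not on real hardware.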

Although I still don't fully understand why Carl doesn't DMA directly into cache space...

Otherwise you're right -- don't use EDMA between SDRAM and internal memory, let the
CPU do the work, and keep cache enabled.
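
If EDMA is kept, the "double buffering" scheme described above could be structured roughly as follows. This is a sketch under stated assumptions, not Carl's code: `fake_dma` is a hypothetical stand-in that uses a synchronous `memcpy` (a real EDMA transfer is asynchronous and would need completion checks plus whatever cache maintenance the device requires), `process_slice` is a placeholder operation, and `SLICE_BYTES` is an arbitrary multiple of 128.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_BYTES 128   /* hypothetical slice size (a multiple of 128) */

/* Two internal working buffers: one is processed while the other is filled. */
static uint8_t ping[SLICE_BYTES];
static uint8_t pong[SLICE_BYTES];

/* Stand-in for an EDMA transfer. Synchronous here, so there is no real overlap. */
static void fake_dma(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Placeholder processing step: invert every byte in place. */
static void process_slice(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)~buf[i];
}

/* Process `total` bytes of external data in place, one slice at a time.
 * `total` is assumed to be a multiple of SLICE_BYTES. */
void process_all(uint8_t *ext, size_t total)
{
    uint8_t *bufs[2] = { ping, pong };
    size_t nslices = total / SLICE_BYTES;

    if (nslices == 0)
        return;

    fake_dma(bufs[0], ext, SLICE_BYTES);            /* prefetch slice 0 */

    for (size_t s = 0; s < nslices; s++) {
        uint8_t *cur = bufs[s & 1];

        /* Start fetching the next slice into the other buffer. With a real
         * asynchronous DMA, the write-back of slice s-1 from that buffer
         * must have completed before this point. */
        if (s + 1 < nslices)
            fake_dma(bufs[(s + 1) & 1], ext + (s + 1) * SLICE_BYTES, SLICE_BYTES);

        process_slice(cur, SLICE_BYTES);                    /* CPU work */
        fake_dma(ext + s * SLICE_BYTES, cur, SLICE_BYTES);  /* write result back */
    }
}
```

Calling `process_all` on an external buffer of three slices leaves every byte inverted in place. Getting this working with plain copies first, and only then swapping in the DMA, keeps the data-movement logic separate from the coherency question.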

-Jeff

> carlferns wrote:
> >
> > Jeff,
> >
> > Thanks for your comments.
> > I guess I was not very clear with what I am doing.
> > My issue is with DMA and the cache coherency.
> > I have to process a lot more data (16 MB) in SDRAM and am slicing it up
> > to be processed in ISRAM.
> > What I keep seeing is that despite invalidating L2, the output data in
> > SDRAM at the very end (having processed all the input data from SDRAM)
> > is corrupt. The only processing going on is as I mentioned earlier
> > STEPS 1-5.
> > If I put a break point and view memory at any stage between Steps 1-5,
> > the debugger seems to handle the cache correctly and output data is good.
> >
> > Conclusion - Cache controller is acting up or the API is not doing
> > what it is supposed to do.
> >
> > >>
> > >> If the issue is that at some other time you need internal memory for
> > >> another reason, then I would first try with L2 data cache enabled and no
> > >> EDMA.
> > >>
> > Here's what I have tried thus far.
> > No EDMA and no cache: the algorithm works great.
> > No EDMA and cache enabled: no problem, since I do not use any of the
> > caching API.
> > EDMA enabled and cache enabled: the output data is bad.
> >
> > What I have noticed is it is more of a cache issue rather than a DMA
> > problem since the data can be verified. It is just that without the
> > CPU intervening and using the CACHE API, the data gets distorted.
> >
> > Thanks,
> > C
> >
> > P.S : How do I get the posts to show up in this group as a continuous
> > thread and without the wait.... that would be really cool.
> >
> > > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > > memory for 6416? So why use SDRAM and EDMA?
> > >
> > > If the issue is that at some other time you need internal memory for
> > > another reason, then I would first try with L2 data cache enabled and no
> > > EDMA. The first time through your data loop you lose the speed advantage
> > > of EDMA, but subsequent times your performance is just as good. And more
> > > importantly, that mode forces you to make absolutely sure your data is
> > > organized in the most efficient manner, and you have "thought through"
> > > exactly the sequence that data moves and cache is used.
> > >
> > > Then, enable EDMA as your last step. The performance gain it's going to
> > > give you in this situation is minimal, so you should get it working
> > > last.
> > >
> > > -Jeff
> >
> > --- In c... , "Jeff
> > Brower" wrote:
> > >
> > > Carl-
> > >
> > > > I have a real big problem with EDMA and cache coherency.
> > > > Board :6416 Spectrum digital
> > > > Here's what I am doing.
> > > > 1) Transfer data from SDRAM to ISRAM.
> > > > 2) Work with data in ISRAM
> > > > 3) Transfer back to SDRAM
> > > >
> > > > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > > > .
> > > > .
> > > > 5) Finally use SDRAM data.
> > > >
> > > > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > > > > of multiples of 128 (actually 96k).
> > > > L2 cache is 128k and enabled.
> > > >
> > > > Now before transfer from SDRAM to ISRAM in step 1, I always
> > > > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > > > should be enough for cache coherency because the docs say L1D is
> > > > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> > > >
> > > > But the data gets all screwed up....
> > > >
> > > > Logically, I think I am doing things right.
> > > >
> > > > Is there a way to check cache coherency without the debugger or
> > > > comparing memory via cpu and checking - that in itself pulls it into
> > > > L2/L1D cache?
> > > > > It looks like EDMA is doing its job but caching isn't.
> > > >
> > > > What may be the problem here.... Appreciate some ideas.
> > >
> > > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > > memory for 6416? So why use SDRAM and EDMA?
> > >
> > > If the issue is that at some other time you need internal memory for
> > > another reason, then I would first try with L2 data cache enabled and no
> > > > EDMA. The first time through your data loop you lose the speed advantage
> > > > of EDMA, but subsequent times your performance is just as good. And more
> > > > importantly, that mode forces you to make absolutely sure your data is
> > > > organized in the most efficient manner, and you have "thought through"
> > > > exactly the sequence that data moves and cache is used.
> > > >
> > > > Then, enable EDMA as your last step. The performance gain it's going to
> > > > give you in this situation is minimal, so you should get it working
> > > > last.
> > >
> > > -Jeff
> > >
> >
> >
