Reply by Jeff Brower August 9, 2006
Carl-

> > Although I still don't fully understand why Carl doesn't DMA directly
> > into cache space...
>
> Not sure I understood. DMA to cache? you mean SDRAM (EMIF) right?

Sorry, I was unclear -- I just meant let the CPU and cache controller do the work.
If your code *always, without exception* accesses only L2 SRAM, and you DMA *only* between
SDRAM and L2 SRAM (ping-pong buffers located in L2 SRAM), then the CPU should handle
all cache coherency issues for you. You should not need to make any cache API calls
in CSL.
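
The ping-pong arrangement described above can be sketched roughly as follows. This is a host-side illustration only, not working EDMA code: dma_copy_blocking() is a hypothetical stand-in that uses memcpy where a real program would submit an EDMA transfer and wait for its completion, BLOCK_BYTES is deliberately tiny, and the buffer and function names are made up for the example. On the DSP, the two buffers would be placed in L2 SRAM and the prefetch would run while the CPU works on the other buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 1024u   /* hypothetical; the thread uses 96k to 128k blocks */

/* Two working buffers. On the DSP these would live in L2 SRAM. */
static unsigned char ping[BLOCK_BYTES];
static unsigned char pong[BLOCK_BYTES];

/* Hypothetical stand-in for "submit an EDMA transfer and wait until done". */
static void dma_copy_blocking(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Example processing step: set every pixel to white (255). */
static void process_block(unsigned char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        buf[i] = 255;
}

/* Process `total` bytes of external memory in place, one block at a time. */
void process_external(unsigned char *ext, size_t total)
{
    unsigned char *work = ping, *next = pong, *tmp;
    size_t off;

    assert(total % BLOCK_BYTES == 0);   /* whole blocks only in this sketch */
    if (total == 0)
        return;

    dma_copy_blocking(work, ext, BLOCK_BYTES);            /* prime first buffer */
    for (off = 0; off < total; off += BLOCK_BYTES) {
        if (off + BLOCK_BYTES < total)                    /* fetch the next block */
            dma_copy_blocking(next, ext + off + BLOCK_BYTES, BLOCK_BYTES);
        process_block(work, BLOCK_BYTES);                 /* CPU touches only `work` */
        dma_copy_blocking(ext + off, work, BLOCK_BYTES);  /* write results back */
        tmp = work; work = next; next = tmp;              /* swap ping and pong */
    }
}
```

The point of the structure is that the CPU only ever reads and writes the two internal buffers, and never touches a buffer until the transfer into it has completed.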

Probably you have looked this over already, but just in case:

http://focus.ti.com/lit/ug/spru656a/spru656a.pdf

-Jeff

Reply by Jeff Brower August 9, 2006
Carl-

> If it is programmer error, I will certainly change the code, but I'd
> like to understand the reason. I also noticed there was a bug with the
> cache API wherein it needed to be called twice to work correctly.
> Unfortunately, after poring through the forum, the general consensus is
> to use a form of CACHE_CLEAN_ALL_L2, which seems like overkill.
>
> Here's the pseudocode for this test exercise, using first principles.
> 1) Read a 16 MB file into SDRAM (a simple image that is all zeros).
> 2) Loop for each 128k chunk in the 16 MB of SDRAM:
> Invalidate the SDRAM 128k chunk address in L2 cache, size 128k.
> Synchronous transfer of 128k to IRAM from SDRAM using EDMA.

The invalidate step is not needed on C64x, since the EDMA destination is L2 SRAM. The
cache controller will snoop the transfer and invalidate for you.

> //Verify IRAM and SDRAM block data by calling a routine similar to
> what one would see in the sample DAT examples
> Change IRAM data - in this case just change all the pixels to white
> (255).

Ok as long as code *only* accesses IRAM (L2 SRAM).

> Synchronous Transfer of the 128k back to same address in SDRAM from
> IRAM using EDMA.
> //Verify IRAM and SDRAM block data by calling a routine similar to
> what one would see in the sample DAT examples
> 3) Write out 16mb file from SDRAM to file
>
> The exact locations of the resulting bad data in SDRAM are random, but
> consistently within the first 128 bytes of every block.

This sounds like a synchronization issue. What if the cache line gets invalidated
once DMA "touches" its associated address in L2 SRAM, but your code somehow accesses
that line before DMA is finished? Only the first line would be affected; after that, DMA is
faster than your code, so the rest of the block looks normal.
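
One way to check that hypothesis is to tally where the bad bytes fall within each block. The following is only a sketch under assumptions: the names classify_mismatches and mismatch_stats are hypothetical, LINE_BYTES is set to 128 to match the "first 128 bytes" observation, and the expected value would be 255 for the all-white test image.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 128u  /* assumed line size, matching the 128-byte figure above */

typedef struct {
    size_t first_line;  /* mismatches at block offsets 0 .. LINE_BYTES-1 */
    size_t elsewhere;   /* mismatches anywhere else inside a block */
} mismatch_stats;

/* Compare every byte against `expected` and classify each mismatch by
 * its offset within its block. */
mismatch_stats classify_mismatches(const unsigned char *data, size_t total,
                                   size_t block_bytes, unsigned char expected)
{
    mismatch_stats s = {0, 0};
    size_t i;

    assert(block_bytes >= LINE_BYTES);
    for (i = 0; i < total; ++i) {
        if (data[i] != expected) {
            if (i % block_bytes < LINE_BYTES)
                ++s.first_line;
            else
                ++s.elsewhere;
        }
    }
    return s;
}
```

If elsewhere stays at zero while first_line does not, that would be consistent with code touching the start of a buffer before its transfer has finished, though it would not prove it.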

-Jeff
Reply by carlferns August 9, 2006
William/Jeff,

> I think Carl wants to do it this way because he's got a large amount of
> external SDRAM data (16 Mbyte) and he's sort of "double buffering": moving
> large slices to internal memory while the CPU is chugging away at another
> slice. This method might better utilize CPU internal memory bus bandwidth
> and keep both CPU and DMA units busy (hopefully).

That is correct. I intend to implement the ping/pong scheme, the good
practice suggested in the TI documentation, but still haven't got to
it.

> Although I still don't fully understand why Carl doesn't DMA directly
> into cache space...

Not sure I understood. DMA to cache? you mean SDRAM (EMIF) right?

> Otherwise you're right -- don't use EDMA between SDRAM and internal
> memory, let the CPU do the work, and keep cache enabled.

William, if the DSP alone accesses memory, you will never have a
cache coherency problem.

Regards,
C

Reply by Jeff Brower August 8, 2006
William-

> I have what may be a stupid question, but since I'm having some issues
> programming a 6713 system that may be similar, it's better to ask the
> question and get clarification.
>
> If you enable the cache controller on the DSP, do you really need to make any
> other cache calls unless you want to free up the L2 RAM that the cache
> is using? My limited understanding of caching would be that once it is
> enabled, you access memory using standard memory access commands, and
> the cache controller optimizes what it thinks it needs to do, in the
> chunk sizes it wants to. Issuing a command to invalidate and flush the
> cache gives you a controlled state of knowing when the cache has been
> flushed, but should not be necessary.
>
> If you wanted complete control of what was in the L2 ram versus the
> external ram, you'd be better off disabling cache altogether, freeing up
> the cache ram for general purpose use, and paging the data in manually
> to do your manipulations.
>
> Is the 6416 processor significantly different in how it works with cache
> from the 6713? Am I completely off base in how to deal with a cache
> controller?

When you use EDMA to move data between external memory and internal SRAM (not cache),
the CPU doesn't "know" the internal memory has been changed; i.e. there is no
snooping. Code has to manually invalidate that area of cache. I think Carl wants to
do it this way because he's got a large amount of external SDRAM data (16 Mbyte) and
he's sort of "double buffering": moving large slices to internal memory while the
CPU is chugging away at another slice. This method might better utilize CPU internal
memory bus bandwidth and keep both CPU and DMA units busy (hopefully).
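
Cache coherency operations act on whole cache lines, so a range handed to a call such as the CACHE_wbInvL2 mentioned in this thread effectively covers every line it touches. As a rough sketch of working out that line-aligned range, assuming a 128-byte L2 line (consistent with the 128-byte alignment Carl describes); align_to_lines and line_range are hypothetical names and not part of CSL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE 128u   /* assumed L2 line size in bytes */

typedef struct {
    uintptr_t start;   /* address rounded down to a line boundary */
    size_t    bytes;   /* length covering every touched line */
} line_range;

/* Expand [addr, addr + bytes) outward to whole cache lines. */
line_range align_to_lines(uintptr_t addr, size_t bytes)
{
    const uintptr_t mask = (uintptr_t)L2_LINE - 1u;
    line_range r;
    uintptr_t end;

    assert((L2_LINE & (L2_LINE - 1u)) == 0);  /* line size must be a power of two */
    r.start = addr & ~mask;                   /* round start down */
    end = (addr + bytes + mask) & ~mask;      /* round end up */
    r.bytes = (size_t)(end - r.start);
    return r;
}
```

A buffer that does not start and end on line boundaries shares its first or last line with neighbouring data, which is one reason the buffers in this kind of scheme are usually aligned and sized in whole lines.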

Although I still don't fully understand why Carl doesn't DMA directly into cache space...

Otherwise you're right -- don't use EDMA between SDRAM and internal memory, let the
CPU do the work, and keep cache enabled.

-Jeff

> carlferns wrote:
> >
> > Jeff,
> >
> > Thanks for your comments.
> > I guess I was not very clear with what I am doing.
> > My issue is with DMA and the cache coherency.
> > I have to process a lot more data (16 MB) in SDRAM and am slicing it up
> > to be processed in ISRAM.
> > What I keep seeing is that despite invalidating L2, the output data in
> > SDRAM at the very end (having processed all the input data from SDRAM)
> > is corrupt. The only processing going on is, as I mentioned earlier,
> > STEPS 1-5.
> > If I put a breakpoint and view memory at any stage between Steps 1-5,
> > the debugger seems to handle the cache correctly and output data is good.
> >
> > Conclusion - Cache controller is acting up or the API is not doing
> > what it is supposed to do.
> >
> > >>
> > >> If the issue is that at some other time you need internal memory for
> > >> another reason, then I would first try with L2 data cache enabled
> > >> and no EDMA.
> > >>
> > Here's what I have tried thus far:
> > No EDMA, no cache: the algorithm works great.
> > No EDMA, cache enabled: no problem, since I do not use any of the
> > caching API.
> > EDMA enabled, cache enabled: the output data is bad.
> >
> > What I have noticed is it is more of a cache issue rather than a DMA
> > problem since the data can be verified. It is just that without the
> > CPU intervening and using the CACHE API, the data gets distorted.
> >
> > Thanks,
> > C
> >
> > P.S : How do I get the posts to show up in this group as a continuous
> > thread and without the wait.... that would be really cool.
> >
> > --- In c..., "Jeff Brower" wrote:
> > >
> > > Carl-
> > >
> > > > I have a real big problem with EDMA and cache coherency.
> > > > Board :6416 Spectrum digital
> > > > Here's what I am doing.
> > > > 1) Transfer data from SDRAM to ISRAM.
> > > > 2) Work with data in ISRAM
> > > > 3) Transfer back to SDRAM
> > > >
> > > > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > > > .
> > > > .
> > > > 5) Finally use SDRAM data.
> > > >
> > > > Blocks are 128-byte aligned in ISRAM and SDRAM and processed in chunks
> > > > that are multiples of 128 bytes (actually 96k).
> > > > L2 cache is 128k and enabled.
> > > >
> > > > Now before transfer from SDRAM to ISRAM in step 1, I always
> > > > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > > > should be enough for cache coherency because the docs say L1D is
> > > > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> > > >
> > > > But the data gets all screwed up....
> > > >
> > > > Logically, I think I am doing things right.
> > > >
> > > > Is there a way to check cache coherency without the debugger or
> > > > comparing memory via cpu and checking - that in itself pulls it into
> > > > L2/L1D cache?
> > > > It looks like EDMA is doing its job but caching isn't.
> > > >
> > > > What may be the problem here.... Appreciate some ideas.
> > >
> > > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > > memory for 6416? So why use SDRAM and EDMA?
> > >
> > > If the issue is that at some other time you need internal memory for
> > > another reason, then I would first try with L2 data cache enabled and no
> > > EDMA. The first time through your data loop you lose the speed advantage
> > > of EDMA, but subsequent times your performance is just as good. And more
> > > importantly, that mode forces you to make absolutely sure your data is
> > > organized in the most efficient manner, and you have "thought through"
> > > exactly the sequence that data moves and cache is used.
> > >
> > > Then, enable EDMA as your last step. The performance gain it's going to
> > > give you in this situation is minimal, so you should get it working
> > > last.
> > >
> > > -Jeff
Reply by Jeff Brower August 8, 2006
Carl-

> I guess I was not very clear with what I am doing.
> My issue is with DMA and the cache coherency.
> I have to process a lot more data (16 MB) in SDRAM and am slicing it up
> to be processed in ISRAM.
> What I keep seeing is that despite invalidating L2, the output data in
> SDRAM at the very end (having processed all the input data from SDRAM)
> is corrupt. The only processing going on is, as I mentioned earlier,
> STEPS 1-5.
> If I put a breakpoint and view memory at any stage between Steps 1-5,
> the debugger seems to handle the cache correctly and output data is good.
>
> Conclusion - Cache controller is acting up or the API is not doing
> what it is supposed to do.

Or programmer error. I know, I know, not what you want to hear... but thousands of
engineers have used C64x EDMA and cache over the last few years.

> What I have noticed is it is more of a cache issue rather than a DMA
> problem since the data can be verified. It is just that without the
> CPU intervening and using the CACHE API, the data gets distorted.

What do you mean by "output data in SDRAM at the very end"? End of what? Each block?
Or the end of a bunch of blocks that consume all of SDRAM? If it's just the last block
or so, then what happens if you reduce your data set to use only 1/2 of SDRAM? If
the situation still occurs, then I might say it's a "boundary condition" type of error,
which usually implies an application / programmer issue rather than something else.
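
One quick boundary check is whether the data set divides evenly into blocks. A minimal sketch, assuming "16 MB" means 16 x 1024 x 1024 bytes and "k" means 1024 bytes; plan_blocks and block_plan are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t full_blocks;  /* number of complete blocks */
    size_t tail_bytes;   /* leftover bytes after the last complete block */
} block_plan;

/* Split a data set into whole blocks plus a possibly empty tail. */
block_plan plan_blocks(size_t total_bytes, size_t block_bytes)
{
    block_plan p;

    assert(block_bytes > 0);
    p.full_blocks = total_bytes / block_bytes;
    p.tail_bytes  = total_bytes % block_bytes;
    return p;
}
```

With those assumptions, 16 MB in 128k blocks gives exactly 128 blocks with no tail, while 16 MB in 96k blocks gives 170 full blocks plus a 64k tail. A loop that assumes whole blocks would mishandle the end of the data in the 96k case.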

> P.S : How do I get the posts to show up in this group as a continuous
> thread and without the wait.... that would be really cool.

The group is moderated, so posts can take a while to appear. That's a good thing, or
the group would die from spam; as it is, this group has been strong since 1999.

-Jeff

> --- In c..., "Jeff Brower" wrote:
> >
> > Carl-
> >
> > > I have a real big problem with EDMA and cache coherency.
> > > Board :6416 Spectrum digital
> > > Here's what I am doing.
> > > 1) Transfer data from SDRAM to ISRAM.
> > > 2) Work with data in ISRAM
> > > 3) Transfer back to SDRAM
> > >
> > > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > > .
> > > .
> > > 5) Finally use SDRAM data.
> > >
> > > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > > of multiples of 128 (actually (96k).
> > > L2 cache is 128k and enabled.
> > >
> > > Now before transfer from SDRAM to ISRAM in step 1, I always
> > > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > > should be enough for cache coherency because the docs say L1D is
> > > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> > >
> > > But the data gets all screwed up....
> > >
> > > Logically, I think I am doing things right.
> > >
> > > Is there a way to check cache coherency without the debugger or
> > > comparing memory via cpu and checking - that in itself pulls it into
> > > L2/L1D cache?
> > > It looks like EDMA is doing it's job but caching isn't.
> > >
> > > What may be the problem here.... Appreciate some ideas.
> >
> > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > memory for 6416? So why use SDRAM and EDMA?
> >
> > If the issue is that at some other time you need internal memory for
> > another reason, then I would first try with L2 data cache enabled and no
> > EDMA. The first time through your data loop you lose the speed
> advantage
> > of EDMA, but subsequent times your performance is just as good. And
> more
> > importantly, that mode forces you to make absolutely sure your data is
> > organized in the most efficient manner, and you have "thought through"
> > exactly the sequence that data moves and cache is used.
> >
> > Then, enable EDMA as your last step. The performance gain its going to
> > give you in this situation is minimal, should you should get it working
> > last.
> >
> > -Jeff
> >
Reply by Andrew Elder August 8, 2006
I would try doing a cache clean and see if that makes a difference.

- Andrew E.

Reply by William C Bonner August 7, 2006
I have what may be a stupid question, but since I'm having some issues
programming a 6713 system that may be similar, it's better to ask the
question and get clarification.

If you enable the DSP's cache controller, do you really need to make any
other cache calls unless you want to free up the L2 RAM that the cache
is using? My limited understanding of caching is that once it is
enabled, you access memory using standard memory access commands, and
the cache controller optimizes what it thinks it needs to do, in the
chunk sizes it wants to. Issuing a command to invalidate and flush the
cache gives you a controlled state of knowing when the cache has been
flushed, but should not be necessary.

If you wanted complete control of what was in the L2 ram versus the
external ram, you'd be better off disabling cache altogether, freeing up
the cache ram for general purpose use, and paging the data in manually
to do your manipulations.

Is the 6416 processor significantly different in how it works with cache
from the 6713? Am I completely off base in how to deal with a cache
controller?

Reply by carlferns August 7, 2006
Jeff,

Thanks for your comments.
I guess I was not very clear with what I am doing.
My issue is with DMA and the cache coherency.
I have to process a lot more data (16 MB) in SDRAM and am slicing it up
to be processed in ISRAM.
What I keep seeing is that despite invalidating L2, the output data in
SDRAM at the very end (having processed all the input data from SDRAM)
is corrupt. The only processing going on is as I mentioned earlier
STEPS 1-5.
If I put a break point and view memory at any stage between Steps 1-5,
the debugger seems to handle the cache correctly and output data is good.

Conclusion - Cache controller is acting up or the API is not doing
what it is supposed to do.
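
For reference, steps 1-5 from the original post can be sketched as a loop. This is a
host-side model only, not the code under discussion: `dma_copy`, `cache_wb_inv` and
`cache_inv` are hypothetical stand-ins (a `memcpy` and two no-ops that log the call).
On the target they would correspond to an EDMA transfer plus a wait for completion,
and to CSL calls such as `CACHE_wbInvL2(ptr, bytes, CACHE_WAIT)` and `CACHE_invL2()`.
The write-back/invalidate before each SDRAM-to-ISRAM transfer follows the
description in the post; the extra invalidate before step 5 is an assumption about
what a conservative ordering might look like, not a confirmed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 128u              /* small block size so the model is easy to run */
#define NBLOCKS 4u

static unsigned char sdram[BLOCK * NBLOCKS];  /* stands in for external SDRAM */
static unsigned char isram[BLOCK];            /* stands in for the L2 SRAM work buffer */
static char call_log[64];                     /* records the order of operations */
static size_t log_len;

static void note(char c) { if (log_len < sizeof call_log - 1) call_log[log_len++] = c; }

/* Hypothetical stand-ins. On the target: CACHE_wbInvL2(), CACHE_invL2(), EDMA. */
static void cache_wb_inv(void *p, size_t n) { (void)p; (void)n; note('W'); }
static void cache_inv(void *p, size_t n)    { (void)p; (void)n; note('I'); }
static void dma_copy(void *dst, const void *src, size_t n) { memcpy(dst, src, n); note('D'); }

/* Placeholder for step 2: the real algorithm is not shown in the thread. */
static void process(unsigned char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) buf[i] = (unsigned char)(buf[i] + 1u);
    note('P');
}

static void run_all_blocks(void)
{
    size_t b;
    for (b = 0; b < NBLOCKS; b++) {
        unsigned char *ext = sdram + b * BLOCK;
        cache_wb_inv(ext, BLOCK);      /* write back CPU data before DMA reads SDRAM */
        dma_copy(isram, ext, BLOCK);   /* step 1: SDRAM -> ISRAM */
        process(isram, BLOCK);         /* step 2: work in ISRAM */
        dma_copy(ext, isram, BLOCK);   /* step 3: ISRAM -> SDRAM */
    }
    cache_inv(sdram, sizeof sdram);    /* drop stale lines before step 5 reads SDRAM */
}
```

The model cannot reproduce a coherency fault, since a host `memcpy` has no separate
cache view; its only purpose is to make the order of cache calls and transfers
explicit so it can be compared against the real loop.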

>>
>> If the issue is that at some other time you need internal memory for
>> another reason, then I would first try with L2 data cache enabled
>> and no EDMA.
>>
Here's what I have tried thus far.
NO EDMA and NO cache - the algorithm works great.
NO EDMA and ENABLED cache - no problem, since I do not use any of the
caching API.
ENABLED EDMA and ENABLED cache - the output data is bad.

What I have noticed is it is more of a cache issue rather than a DMA
problem since the data can be verified. It is just that without the
CPU intervening and using the CACHE API, the data gets distorted.

Thanks,
C

P.S : How do I get the posts to show up in this group as a continuous
thread and without the wait.... that would be really cool.

Reply by carlferns August 7, 2006
Guy,

Thanks for your comments.
All my code/data is in ISRAM. My dynamically allocated data buffer is
the only thing in SDRAM, on a 128-byte cache line boundary, and you are
correct - I am using the slice-wise approach to transfer from SDRAM to
ISRAM.

I tried it without L2 cache enabled/set but that does not change the
data corruption.

Here's where I find an issue with the cache (both L1D/L2). If I do a
verify_data(), which effectively compares the data (src and dst) byte
by byte via the CPU, there is no data corruption thereafter. I think this
step in itself restores cache coherency.
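
A byte-by-byte check of this kind might look like the sketch below. The name
`verify_data` comes from this post, but its actual signature and behaviour are not
shown, so the version here (returning the offset of the first mismatch) is an
assumption. On the target, every CPU read in such a loop goes through L1D/L2, so the
check changes the cache state it is trying to observe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature: returns the offset of the first differing byte,
 * or n if the two buffers match over all n bytes. */
static size_t verify_data(const unsigned char *src, const unsigned char *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (src[i] != dst[i]) {
            return i;   /* first mismatch */
        }
    }
    return n;           /* no mismatch found */
}
```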

Some thoughts
1) Do the L1D/L2 functions that take the parameters (data_block, size,
CACHE_WAIT) clear all the cache lines for the entire block of data, or
do they just look up the block address in cache and then clear the
range (size) from there? I am afraid that if the range is not
contiguous in cache, some bad cache data still remains.
The addresses within the size to be cleared may not be contiguous but
could be somewhere else in cache.....
I guess I am asking about the inner workings of the cache controller.
I will have to try clearing every address in 128-byte increments
starting from the data_block address.

2) Is there a way I can dump cache data or look at cache data? Maybe
a stupid question - via a logic analyser... JTAG - any other tools?
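
On question 1, as far as the cache documentation describes it, a block operation
works on addresses rather than on positions inside the cache: the controller looks
up each line-sized piece of the given address range and acts on the lines it holds,
so where those lines happen to sit in the cache should not matter. What does matter
is that the range covers whole lines. The sketch below shows that arithmetic,
assuming the 128-byte L2 line size used for alignment in this thread; the function
names are hypothetical and this is not the CSL implementation.

```c
#include <assert.h>
#include <stdint.h>

#define L2_LINE 128u   /* assumed L2 line size in bytes */

/* Address of the first cache line touched by a range starting at addr. */
static uint32_t first_line(uint32_t addr)
{
    return addr & ~(uint32_t)(L2_LINE - 1u);
}

/* Number of cache lines overlapped by the byte range [addr, addr + bytes). */
static uint32_t lines_covered(uint32_t addr, uint32_t bytes)
{
    uint32_t start, end;
    if (bytes == 0u) {
        return 0u;
    }
    start = first_line(addr);
    end = first_line(addr + bytes - 1u);   /* line holding the last byte */
    return (end - start) / L2_LINE + 1u;
}
```

An aligned 96 KB block maps to 768 lines, while a 128-byte range that starts at a
64-byte offset straddles two lines. That is the reason to keep both the start
address and the length at multiples of 128: otherwise neighbouring data that shares
a line is written back or invalidated along with the block.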

-C
--- In c..., "Guy Eschemann" wrote:
>
> This may be a stupid question, but why are you using L2 data cache if
> your data is already in internal memory?
>
> If your application permits it, I would suggest that you put everything
> (ie. code + data) in internal memory. If you have some large data
> structures that won't fit in ISRAM, leave those outside in SDRAM and
> process them slice-wise in internal memory. And turn off the L2 data
> cache. This will get you rid of the nasty coherency problems, and as a
> bonus you'll have more internal memory for your code/data.
>
> In case code + data won't fit into internal memory, leave the code
> outside and enable L2 cache for the external code section. L2 caching
> works much better for code than for data, because code is executed
> sequentially most of the time. This produces less cache misses. And you
> don't have to worry about cache coherency issues when using program
> cache.
>
> In case the execution is too slow, consider moving single critical
> functions to internal memory. You can do that by creating a section in
> your linker command file, and then using the #pragma CODE_SECTION()
> directive for pointing out those functions.
>
> Hope this helps,
>
> Guy Eschemann.
> Vienna, Austria.
>
> On 8/2/06, carlferns wrote:
> >
> > Folks,
> >
> > I have a real big problem with EDMA and cache coherency.
> > Board :6416 Spectrum digital
> > Here's what I am doing.
> > 1) Transfer data from SDRAM to ISRAM.
> > 2) Work with data in ISRAM
> > 3) Transfer back to SDRAM
> >
> > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > .
> > .
> > 5) Finally use SDRAM data.
> >
> > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > of multiples of 128 (actually (96k).
> > L2 cache is 128k and enabled.
> >
> > Now before transfer from SDRAM to ISRAM in step 1, I always
> > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > should be enough for cache coherency because the docs say L1D is
> > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> >
> > But the data gets all screwed up....
> >
> > Logically, I think I am doing things right.
> >
> > Is there a way to check cache coherency without the debugger or
> > comparing memory via cpu and checking - that in itself pulls it into
> > L2/L1D cache?
> > It looks like EDMA is doing it's job but caching isn't.
> >
> > What may be the problem here.... Appreciate some ideas.
> >
> > Thanks,
> >
> > C
> >
>
Reply by Jeff Brower August 7, 2006
Carl-

> I have a real big problem with EDMA and cache coherency.
> Board :6416 Spectrum digital
> Here's what I am doing.
> 1) Transfer data from SDRAM to ISRAM.
> 2) Work with data in ISRAM
> 3) Transfer back to SDRAM
>
> 4) Repeat process for next block in SDRAM to same block in ISRAM
> .
> .
> 5) Finally use SDRAM data.
>
> Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> of multiples of 128 (actually (96k).
> L2 cache is 128k and enabled.
>
> Now before transfer from SDRAM to ISRAM in step 1, I always
> CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> should be enough for cache coherency because the docs say L1D is
> handled by EDMA. Also ISRAM block is always cache coherent. Correct?
>
> But the data gets all screwed up....
>
> Logically, I think I am doing things right.
>
> Is there a way to check cache coherency without the debugger or
> comparing memory via cpu and checking - that in itself pulls it into
> L2/L1D cache?
> It looks like EDMA is doing it's job but caching isn't.
>
> What may be the problem here.... Appreciate some ideas.

I think Guy is asking a good question. Won't 96k x 32 fit in internal
memory for 6416? So why use SDRAM and EDMA?

If the issue is that at some other time you need internal memory for
another reason, then I would first try with L2 data cache enabled and no
EDMA. The first time through your data loop you lose the speed advantage
of EDMA, but subsequent times your performance is just as good. And more
importantly, that mode forces you to make absolutely sure your data is
organized in the most efficient manner, and you have "thought through"
exactly the sequence that data moves and cache is used.

Then, enable EDMA as your last step. The performance gain it's going to
give you in this situation is minimal, so you should get it working
last.

-Jeff