> > Although I still don't fully understand why Carl doesn't DMA directly
> > into cache space...
>
> Not sure I understood. DMA to cache? You mean SDRAM (EMIF), right?
Sorry I was confusing -- I just mean let the CPU and cache controller do the work.
If your code *always, without exception* accesses L2 SRAM, and you DMA *only* between
SDRAM and L2 SRAM (ping-pong buffers located in L2 SRAM), then the CPU should handle
all cache coherency issues for you. You should not need to make any cache API calls
in CSL.
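A minimal host-side sketch of the ping-pong arrangement described above, assuming a blocking copy routine stands in for the EDMA transfer. The names `dma_copy`, `process_block`, `process_external` and the 1 KB block size are illustrative and not part of CSL; on the target the two buffers would be placed in L2 SRAM and the copy would be an EDMA submit followed by a wait for completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 1024u  /* illustrative; the thread uses 96k-128k blocks */

/* Hypothetical stand-in for a blocking EDMA transfer (here just a memcpy). */
static void dma_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* CPU work: touches only the internal buffer, never the external memory. */
static void process_block(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 255;  /* "change all the pixels to white" */
}

/* The two internal buffers; on the target these would live in L2 SRAM. */
static unsigned char ping_pong[2][BLOCK_BYTES];

void process_external(unsigned char *ext, size_t total_bytes)
{
    assert(total_bytes % BLOCK_BYTES == 0);
    size_t nblocks = total_bytes / BLOCK_BYTES;
    if (nblocks == 0)
        return;

    /* Prime the first buffer before the loop starts. */
    dma_copy(ping_pong[0], ext, BLOCK_BYTES);

    for (size_t b = 0; b < nblocks; b++) {
        unsigned char *cur = ping_pong[b & 1];
        unsigned char *nxt = ping_pong[(b + 1) & 1];

        /* Fetch the next block into the other buffer. */
        if (b + 1 < nblocks)
            dma_copy(nxt, ext + (b + 1) * BLOCK_BYTES, BLOCK_BYTES);

        process_block(cur, BLOCK_BYTES);

        /* Write the processed block back to external memory. */
        dma_copy(ext + b * BLOCK_BYTES, cur, BLOCK_BYTES);
    }
}
```

Because `dma_copy` blocks here, nothing actually overlaps. On real hardware the fetch would run while `process_block` executes, and the code would have to wait for both the fetch and the previous write-back to complete before reusing a buffer.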
Probably you have looked this over already, but just in case:
Reply by Jeff Brower●August 9, 2006
Carl-
> If it is programmer error, I sure enough will change the code but I'd
> like to understand the reason. I also noticed there was a bug with the
> cache API wherein it needed to be called twice to work correctly.
> Unfortunately, after poring through the forum, the general consensus is
> to use a form of CACHE_CLEAN_ALL_L2, which seems like overkill.
>
> Here's the pseudo code for this test exercise using first principles.
> 1) Read a 16mb file into SDRAM - (a simple image that has all zeros).
> 2) Loop for each 128k chunk in the 16mb SDRAM
>    Invalidate the SDRAM 128k chunk address in L2 cache, size 128k.
>    Synchronous Transfer 128k to IRAM from SDRAM using EDMA.
Invalidate step not needed for C64x, since EDMA destination is L2 SRAM. Cache
controller will snoop this and invalidate for you.
>    //Verify IRAM and SDRAM block data by calling a routine similar to
>    what one would see in the sample DAT examples
>    Change IRAM data - in this case just change all the pixels to white
>    (255).
Ok as long as code *only* accesses IRAM (L2 SRAM).
>    Synchronous Transfer of the 128k back to same address in SDRAM from
>    IRAM using EDMA.
>    //Verify IRAM and SDRAM block data by calling a routine similar to
>    what one would see in the sample DAT examples
> 3) Write out 16mb file from SDRAM to file
>
> The exact locations of the resulting bad data in SDRAM are random but
> consistently within the first 128 bytes of every block.
This sounds like a synchronization issue. What if the cache line gets invalidated
once DMA "touches" its associated address in L2 SRAM, but your code somehow accesses
that line before DMA is finished? Only the first line is affected; after that DMA is
faster than your code, so the rest of the block looks normal.
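The race suspected above can be illustrated with a toy model of an in-flight transfer. This is a host-side sketch under assumptions: `xfer_t`, `xfer_start`, `xfer_step`, `xfer_done` and `xfer_wait` are hypothetical names, not CSL calls, and `xfer_step` merely imitates the DMA engine making progress in bursts. The only point is that the CPU should not touch the destination until the completion check passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an in-flight DMA transfer (hypothetical; not a CSL type). */
typedef struct {
    unsigned char *dst;
    const unsigned char *src;
    size_t total;   /* bytes requested */
    size_t copied;  /* bytes moved so far */
} xfer_t;

static void xfer_start(xfer_t *x, void *dst, const void *src, size_t n)
{
    x->dst = dst;
    x->src = src;
    x->total = n;
    x->copied = 0;
}

/* Move up to `burst` more bytes, imitating background progress. */
static void xfer_step(xfer_t *x, size_t burst)
{
    size_t left = x->total - x->copied;
    size_t n = burst < left ? burst : left;
    memcpy(x->dst + x->copied, x->src + x->copied, n);
    x->copied += n;
}

static int xfer_done(const xfer_t *x)
{
    return x->copied == x->total;
}

/* Block until the whole transfer has landed. On the target this would be a
   wait on the transfer-complete event rather than a loop that does the copy. */
static void xfer_wait(xfer_t *x, size_t burst)
{
    assert(burst > 0);
    while (!xfer_done(x))
        xfer_step(x, burst);
}
```

In this model, code that reads the destination after only one `xfer_step` sees a mix of new and stale bytes, while code that calls `xfer_wait` first sees the complete block.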
-Jeff
Reply by carlferns●August 9, 2006
William/Jeff,
> I think Carl wants to
> do it this way because he's got a large amount of external SDRAM data (16 Mbyte) and
> he's sort of "double buffering": moving large slices to internal memory while the
> CPU is chugging away at another slice. This method might better utilize CPU internal
> memory bus bandwidth and keep both CPU and DMA units busy (hopefully).
That is correct - I intend to implement the ping/pong good practices
suggested in the TI documentation but still haven't got to it....
> Although I still don't fully understand why Carl doesn't DMA directly into cache space...
Not sure I understood. DMA to cache? You mean SDRAM (EMIF), right?
> Otherwise you're right -- don't use EDMA between SRAM and internal memory, let the
> CPU do the work, and keep cache enabled.
William, if the DSP alone accesses memory, you will never have a
cache coherency problem.
Regards,
C
Reply by Jeff Brower●August 8, 2006
William-
> I have what may be a stupid question, but since I'm having some issues
> programming a 6713 system that may be similar, it's better to ask the
> question and get clarification.
>
> If you enable the cache controller DSP, do you really need to make any
> other cache calls unless you want to free up the L2 ram that the cache
> is using? My limited understanding of caching would be that once it is
> enabled, you access memory using standard memory access commands, and
> the cache controller optimizes what it thinks it needs to do, in the
> chunk sizes it wants to. Issuing a command to invalidate and flush the
> cache gives you a controlled state of knowing when the cache has been
> flushed, but should not be necessary.
>
> If you wanted complete control of what was in the L2 ram versus the
> external ram, you'd be better off disabling cache altogether, freeing up
> the cache ram for general purpose use, and paging the data in manually
> to do your manipulations.
>
> Is the 6416 processor significantly different in how it works with cache
> from the 6713? Am I completely off base in how to deal with a cache
> controller?
When you use EDMA to move data between external memory and internal SRAM (not cache),
the CPU doesn't "know" the internal memory has been changed; i.e. there is no
snooping. Code has to manually invalidate that area of cache. I think Carl wants to
do it this way because he's got a large amount of external SDRAM data (16 Mbyte) and
he's sort of "double buffering": moving large slices to internal memory while the
CPU is chugging away at another slice. This method might better utilize CPU internal
memory bus bandwidth and keep both CPU and DMA units busy (hopefully).
Although I still don't fully understand why Carl doesn't DMA directly into
cache space...
Otherwise you're right -- don't use EDMA between SRAM and internal memory, let the
CPU do the work, and keep cache enabled.
-Jeff
> carlferns wrote:
> >
> > Jeff,
> >
> > Thanks for your comments.
> > I guess I was not very clear with what I am doing.
> > My issue is with DMA and the cache coherency.
> > I have to process a lot more data (16mb)in SDRAM and am slicing it up
> > to be processed in ISRAM.
> > What I keep seeing is that despite invalidating L2, the output data in
> > SDRAM at the very end (having processed all the input data from SDRAM)
> > is corrupt. The only processing going on is as I mentioned earlier
> > STEPS 1-5.
> > If I put a break point and view memory at any stage between Steps 1-5,
> > the debugger seems to handle the cache correctly and output data is good.
> >
> > Conclusion - Cache controller is acting up or the API is not doing
> > what it is supposed to do.
> >
> > >>
> > >> If the issue is that at some other time you need internal memory for
> > >> another reason, then I would first try with L2 data cache enabled and no
> > >> EDMA.
> > >>
> > Here's what I have tried thus far.
> > NO EDMA and No cache - The algorithm works great.
> > NO EDMA and ENABLED Cache - No problem since I do not use any of the
> > caching API.
> > ENABLED EDMA and ENABLED Cache - the output data is bad.
> >
> > What I have noticed is it is more of a cache issue rather than a DMA
> > problem since the data can be verified. It is just that without the
> > CPU intervening and using the CACHE API, the data gets distorted.
> >
> > Thanks,
> > C
> >
> > P.S : How do I get the posts to show up in this group as a continuous
> > thread and without the wait.... that would be really cool.
Reply by Jeff Brower●August 8, 2006
Carl-
> I guess I was not very clear with what I am doing.
> My issue is with DMA and the cache coherency.
> I have to process a lot more data (16mb)in SDRAM and am slicing it up
> to be processed in ISRAM.
> What I keep seeing is that despite invalidating L2, the output data in
> SDRAM at the very end (having processed all the input data from SDRAM)
> is corrupt. The only processing going on is as I mentioned earlier
> STEPS 1-5.
> If I put a break point and view memory at any stage between Steps 1-5,
> the debugger seems to handle the cache correctly and output data is good.
>
> Conclusion - Cache controller is acting up or the API is not doing
> what it is supposed to do.
Or programmer error. I know, I know, not what you want to hear... but 1000s of
engineers have used C64x EDMA and cache over the last few years.
> What I have noticed is it is more of a cache issue rather than a DMA
> problem since the data can be verified. It is just that without the
> CPU intervening and using the CACHE API, the data gets distorted.
What do you mean "output data in SDRAM at the very end"? End of what? Each block?
Or end of a bunch of blocks that consume all of SDRAM? If it's just the last block
or so, then what happens if you reduce your data set to use only 1/2 of SDRAM? If
the situation still occurs, then I might say it's a "boundary condition" type of error,
which usually implies an application / programmer issue rather than something else.
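One way to answer the "end of what?" question is to scan the output and record where in each block the bad bytes fall. The sketch below is a host-side check under assumptions: `first_bad_offset` and `count_bad_blocks` are hypothetical helper names, and the expected value (255 in the all-white test) is passed in by the caller. Reading the data with the CPU on the target pulls it through the cache, so this is meant for the written-out file rather than for live SDRAM.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the first byte that differs from `expected`,
   or `block_bytes` if the whole block is clean. */
size_t first_bad_offset(const unsigned char *block, size_t block_bytes,
                        unsigned char expected)
{
    for (size_t i = 0; i < block_bytes; i++)
        if (block[i] != expected)
            return i;
    return block_bytes;
}

/* Number of blocks containing at least one bad byte. `*worst_offset` receives
   the largest first-bad offset seen, which shows whether corruption always
   starts near the beginning of a block. */
size_t count_bad_blocks(const unsigned char *data, size_t total_bytes,
                        size_t block_bytes, unsigned char expected,
                        size_t *worst_offset)
{
    assert(block_bytes > 0 && total_bytes % block_bytes == 0);
    size_t bad = 0;
    *worst_offset = 0;
    for (size_t off = 0; off < total_bytes; off += block_bytes) {
        size_t first = first_bad_offset(data + off, block_bytes, expected);
        if (first < block_bytes) {
            bad++;
            if (first > *worst_offset)
                *worst_offset = first;
        }
    }
    return bad;
}
```

If every block is reported bad with a small `worst_offset`, the problem is per-block rather than confined to the end of the data set, which would point away from a boundary condition at the top of SDRAM.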
> P.S : How do I get the posts to show up in this group as a continuous
> thread and without the wait.... that would be really cool.
The group is moderated so posts can take a while to appear. That's a good thing or
the group would die to spam, but this group has been strong since 1999.
-Jeff
>
> > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > memory for 6416? So why use SDRAM and EDMA?
> >
> > If the issue is that at some other time you need internal memory for
> > another reason, then I would first try with L2 data cache enabled and no
> > EDMA. The first time through your data loop you lose the speed advantage
> > of EDMA, but subsequent times your performance is just as good. And more
> > importantly, that mode forces you to make absolutely sure your data is
> > organized in the most efficient manner, and you have "thought through"
> > exactly the sequence that data moves and cache is used.
> >
> > Then, enable EDMA as your last step. The performance gain it's going to
> > give you in this situation is minimal, so you should get it working
> > last.
> >
> > -Jeff
>
> --- In c..., "Jeff Brower" wrote:
> >
> > Carl-
> >
> > > I have a real big problem with EDMA and cache coherency.
> > > Board :6416 Spectrum digital
> > > Here's what I am doing.
> > > 1) Transfer data from SDRAM to ISRAM.
> > > 2) Work with data in ISRAM
> > > 3) Transfer back to SDRAM
> > >
> > > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > > .
> > > .
> > > 5) Finally use SDRAM data.
> > >
> > > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > > of multiples of 128 (actually 96k).
> > > L2 cache is 128k and enabled.
> > >
> > > Now before transfer from SDRAM to ISRAM in step 1, I always
> > > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > > should be enough for cache coherency because the docs say L1D is
> > > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> > >
> > > But the data gets all screwed up....
> > >
> > > Logically, I think I am doing things right.
> > >
> > > Is there a way to check cache coherency without the debugger or
> > > comparing memory via cpu and checking - that in itself pulls it into
> > > L2/L1D cache?
> > > It looks like EDMA is doing its job but caching isn't.
> > >
> > > What may be the problem here.... Appreciate some ideas.
> >
> > I think Guy is asking a good question. Won't 96k x 32 fit in internal
> > memory for 6416? So why use SDRAM and EDMA?
> >
> > If the issue is that at some other time you need internal memory for
> > another reason, then I would first try with L2 data cache enabled and no
> > EDMA. The first time through your data loop you lose the speed advantage
> > of EDMA, but subsequent times your performance is just as good. And more
> > importantly, that mode forces you to make absolutely sure your data is
> > organized in the most efficient manner, and you have "thought through"
> > exactly the sequence that data moves and cache is used.
> >
> > Then, enable EDMA as your last step. The performance gain it's going to
> > give you in this situation is minimal, so you should get it working
> > last.
> >
> > -Jeff
> >
Reply by Andrew Elder●August 8, 2006
I would try doing a cache clean and see if that makes a difference.
- Andrew E.
carlferns wrote:
>Jeff,
>
>Thanks for your comments.
>I guess I was not very clear with what I am doing.
>My issue is with DMA and the cache coherency.
>I have to process a lot more data (16 MB) in SDRAM and am slicing it up
>to be processed in ISRAM.
>What I keep seeing is that despite invalidating L2, the output data in
>SDRAM at the very end (having processed all the input data from SDRAM)
>is corrupt. The only processing going on is as I mentioned earlier
>STEPS 1-5.
>If I put a break point and view memory at any stage between Steps 1-5,
>the debugger seems to handle the cache correctly and output data is good.
>
>Conclusion - Cache controller is acting up or the API is not doing
>what it is supposed to do.
>
>>>If the issue is that at some other time you need internal memory for
>>>another reason, then I would first try with L2 data cache enabled and no
>>>EDMA.
>
>Here's what I have tried thus far.
>NO EDMA and No cache - The algorithm works great .
>NO EDMA and ENABLED Cache , No problem since I do not use any of the
>caching API.
>ENABLED EDMA and ENABLED Cache , the output data is bad.
>
>What I have noticed is it is more of a cache issue rather than a DMA
>problem since the data can be verified. It is just that without the
>CPU intervening and using the CACHE API, the data gets distorted.
>
>Thanks,
>C
>
>P.S : How do I get the posts to show up in this group as a continuous
>thread and without the wait.... that would be really cool.
Reply by William C Bonner●August 7, 2006
I have what may be a stupid question, but since I'm having some issues
programming a 6713 system that may be similar, it's better to ask the
question and get clarification.
If you enable the cache controller on the DSP, do you really need to make any
other cache calls unless you want to free up the L2 RAM that the cache
is using? My limited understanding of caching would be that once it is
enabled, you access memory using standard memory access commands, and
the cache controller optimizes what it thinks it needs to do, in the
chunk sizes it wants to. Issuing a command to invalidate and flush the
cache gives you a controlled state of knowing when the cache has been
flushed, but should not be necessary.
If you wanted complete control of what was in the L2 ram versus the
external ram, you'd be better off disabling cache altogether, freeing up
the cache ram for general purpose use, and paging the data in manually
to do your manipulations.
Is the 6416 processor significantly different in how it works with cache
from the 6713? Am I completely off base in how to deal with a cache
controller?
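To make the question concrete, here is a toy model (not the C6000 hardware, and not any real API) of what can go wrong when something other than the CPU writes memory that the cache is not kept coherent with. It assumes a single cache line, and a DMA engine that writes memory without telling the cache. All names are invented for the illustration.

```c
#include <string.h>

#define LINE_BYTES 8  /* toy line size, far smaller than a real cache line */

/* A one-line toy cache in front of a small "external" memory. */
struct toy_system {
    unsigned char memory[LINE_BYTES];
    unsigned char line[LINE_BYTES];
    int valid;
};

/* CPU read: fill the line on a miss, then always serve from the line. */
static unsigned char cpu_read(struct toy_system *s, int i)
{
    if (!s->valid) {
        memcpy(s->line, s->memory, LINE_BYTES);
        s->valid = 1;
    }
    return s->line[i];
}

/* DMA write: goes straight to memory; the toy cache is not told about it. */
static void dma_write(struct toy_system *s, int i, unsigned char v)
{
    s->memory[i] = v;
}

/* Invalidate: drop the cached copy so the next read refetches from memory. */
static void cache_invalidate(struct toy_system *s)
{
    s->valid = 0;
}
```

In this model a cpu_read() after a dma_write() keeps returning the old value until cache_invalidate() is called. Ordinary loads and stores by the CPU alone never hit this case, which is why the explicit calls only matter once a second bus master is involved.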
carlferns wrote:
>
> Jeff,
>
> Thanks for your comments.
> I guess I was not very clear with what I am doing.
> My issue is with DMA and the cache coherency.
> I have to process a lot more data (16 MB) in SDRAM and am slicing it up
> to be processed in ISRAM.
> What I keep seeing is that despite invalidating L2, the output data in
> SDRAM at the very end (having processed all the input data from SDRAM)
> is corrupt. The only processing going on is as I mentioned earlier
> STEPS 1-5.
> If I put a break point and view memory at any stage between Steps 1-5,
> the debugger seems to handle the cache correctly and output data is good.
>
> Conclusion - Cache controller is acting up or the API is not doing
> what it is supposed to do.
>
> >> If the issue is that at some other time you need internal memory for
> >> another reason, then I would first try with L2 data cache enabled and no
> >> EDMA.
> >>
> Here's what I have tried thus far.
> NO EDMA and No cache - The algorithm works great .
> NO EDMA and ENABLED Cache , No problem since I do not use any of the
> caching API.
> ENABLED EDMA and ENABLED Cache , the output data is bad.
>
> What I have noticed is it is more of a cache issue rather than a DMA
> problem since the data can be verified. It is just that without the
> CPU intervening and using the CACHE API, the data gets distorted.
>
> Thanks,
> C
>
> P.S : How do I get the posts to show up in this group as a continuous
> thread and without the wait.... that would be really cool.
Reply by carlferns●August 7, 2006
Jeff,
Thanks for your comments.
I guess I was not very clear with what I am doing.
My issue is with DMA and the cache coherency.
I have to process a lot more data (16 MB) in SDRAM and am slicing it up
to be processed in ISRAM.
What I keep seeing is that despite invalidating L2, the output data in
SDRAM at the very end (having processed all the input data from SDRAM)
is corrupt. The only processing going on is as I mentioned earlier
STEPS 1-5.
If I put a break point and view memory at any stage between Steps 1-5,
the debugger seems to handle the cache correctly and output data is good.
Conclusion - Cache controller is acting up or the API is not doing
what it is supposed to do.
>> If the issue is that at some other time you need internal memory for
>> another reason, then I would first try with L2 data cache enabled and no
>> EDMA.
Here's what I have tried thus far.
NO EDMA and No cache - The algorithm works great .
NO EDMA and ENABLED Cache , No problem since I do not use any of the
caching API.
ENABLED EDMA and ENABLED Cache , the output data is bad.
What I have noticed is it is more of a cache issue rather than a DMA
problem since the data can be verified. It is just that without the
CPU intervening and using the CACHE API, the data gets distorted.
Thanks,
C
P.S : How do I get the posts to show up in this group as a continuous
thread and without the wait.... that would be really cool.
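The three configurations listed above can also be compared mechanically: run the same input through the in-place path and through the copy-in / process / copy-out path, then check that the outputs are byte-identical. The sketch below is a host-side stand-in only. memcpy plays the role of the EDMA transfers, the staging buffer is shrunk to 96 bytes, and process() is a hypothetical placeholder because the real algorithm is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

#define STAGE_BYTES 96  /* stand-in for the 96 KB ISRAM buffer, shrunk for a host test */

/* Hypothetical per-byte algorithm; the real processing is not shown in the thread. */
static void process(unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        buf[i] = (unsigned char)(buf[i] ^ 0x5A);
}

/* Reference path: operate on the big buffer in place (no staging, no EDMA). */
static void run_direct(unsigned char *data, size_t total)
{
    process(data, total);
}

/* Staged path: copy in, process, copy out, one slice at a time.
   memcpy stands in for the EDMA transfers of steps 1 and 3. */
static void run_staged(unsigned char *data, size_t total)
{
    unsigned char stage[STAGE_BYTES];
    size_t off = 0;
    while (off < total) {
        size_t len = total - off < STAGE_BYTES ? total - off : STAGE_BYTES;
        memcpy(stage, data + off, len);   /* step 1 */
        process(stage, len);              /* step 2 */
        memcpy(data + off, stage, len);   /* step 3 */
        off += len;
    }
}

/* Returns 1 when both paths give byte-identical output for the same input. */
static int paths_agree(size_t total)
{
    unsigned char a[1024], b[1024];
    size_t i;
    if (total > sizeof a)
        return 0;
    for (i = 0; i < total; i++)
        a[i] = b[i] = (unsigned char)(i * 7u);
    run_direct(a, total);
    run_staged(b, total);
    return memcmp(a, b, total) == 0;
}
```

If the staged loop agrees with the direct path on the host but not on the target, the slicing logic itself is probably sound and the remaining suspects are the transfer setup and the cache calls around it.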
--- In c..., "Jeff Brower" wrote:
>
> Carl-
>
> > I have a real big problem with EDMA and cache coherency.
> > Board :6416 Spectrum digital
> > Here's what I am doing.
> > 1) Transfer data from SDRAM to ISRAM.
> > 2) Work with data in ISRAM
> > 3) Transfer back to SDRAM
> >
> > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > .
> > .
> > 5) Finally use SDRAM data.
> >
> > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > of multiples of 128 (actually 96k).
> > L2 cache is 128k and enabled.
> >
> > Now before transfer from SDRAM to ISRAM in step 1, I always
> > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > should be enough for cache coherency because the docs say L1D is
> > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> >
> > But the data gets all screwed up....
> >
> > Logically, I think I am doing things right.
> >
> > Is there a way to check cache coherency without the debugger or
> > comparing memory via cpu and checking - that in itself pulls it into
> > L2/L1D cache?
> > It looks like EDMA is doing its job but caching isn't.
> >
> > What may be the problem here.... Appreciate some ideas.
>
> I think Guy is asking a good question. Won't 96k x 32 fit in internal
> memory for 6416? So why use SDRAM and EDMA?
>
> If the issue is that at some other time you need internal memory for
> another reason, then I would first try with L2 data cache enabled and no
> EDMA. The first time through your data loop you lose the speed advantage
> of EDMA, but subsequent times your performance is just as good. And more
> importantly, that mode forces you to make absolutely sure your data is
> organized in the most efficient manner, and you have "thought through"
> exactly the sequence that data moves and cache is used.
>
> Then, enable EDMA as your last step. The performance gain it's going to
> give you in this situation is minimal, so you should get it working
> last.
>
> -Jeff
>
Reply by carlferns●August 7, 2006
Guy,
Thanks for your comments.
All my code/data is in ISRAM. My dynamically allocated data buffer is
the only thing in SDRAM, on a 128-byte cache boundary, and you are correct:
I am using the slice-wise approach to transfer from SDRAM to ISRAM.
I tried it without L2 cache enabled/set but that does not change the
data corruption.
Here's where I find an issue with the cache (both L1D/L2). If I do a
verify_data(), which effectively compares the data (src and dst) byte
by byte via the CPU, there is no data corruption thereafter. I think this
step in itself rectifies the cache for coherency.
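The body of verify_data() is not shown in this thread. A minimal sketch of such a byte-wise check might look like the following; the signature and return convention are assumptions made for illustration.

```c
#include <stddef.h>

/* Compare src and dst byte by byte through the CPU.
   Returns the index of the first mismatch, or len if the buffers match. */
static size_t verify_data(const volatile unsigned char *src,
                          const volatile unsigned char *dst,
                          size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (src[i] != dst[i])
            return i;
    }
    return len;
}
```

Every one of these reads goes through the cache on the target, so a check like this is not a neutral observer: it changes which lines are resident, and that side effect may be what alters the behaviour rather than the comparison itself.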
Some thoughts
1) Do the L1D/L2 functions that take the parameters (data_block, size,
CACHE_WAIT) clear all the cache lines for the entire block of data,
or do they just look for the block address alone in cache
and then clear the range (size) specified? I am afraid that if the range is
not contiguous, bad cache data still remains.
The addresses within the size to be cleared may not be contiguous but
could be somewhere else in cache.....
I guess I am looking at the inner workings of the cache controller.
I will have to try and clear every address in 128-byte increments
starting from the data_block address.
2) Is there a way I can dump cache data or look at cache data..... Maybe
a stupid question - via a logic analyser...JTAG - any other tools?
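On question 1, block cache operations are, as far as I understand them, defined over memory addresses rather than over positions inside the cache, so it should not matter where in the cache a given line happens to sit. The sketch below only shows the address arithmetic for a walk in 128-byte increments, assuming the 128-byte L2 line size that matches the alignment used in this thread. The helper names are invented and no CSL call is made.

```c
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_BYTES 128u  /* assumed L2 line size; matches the 128-byte alignment in the thread */

/* Round an address down to the start of its cache line. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(L2_LINE_BYTES - 1u);
}

/* Number of cache lines touched by the byte range [addr, addr + size). */
static size_t lines_spanned(uintptr_t addr, size_t size)
{
    uintptr_t first, last;
    if (size == 0)
        return 0;
    first = line_floor(addr);
    last = line_floor(addr + size - 1u);
    return (size_t)((last - first) / L2_LINE_BYTES) + 1u;
}
```

For a 128-byte-aligned 96 KB block this gives 768 lines. A block that starts mid-line touches one extra line, and that extra line is shared with whatever data sits next to the buffer.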
-C
--- In c..., "Guy Eschemann" wrote:
>
> This may be a stupid question, but why are you using L2 data cache if your
> data is already in internal memory?
>
> If your application permits it, I would suggest that you put everything (i.e.
> code + data) in internal memory. If you have some large data structures that
> won't fit in ISRAM, leave those outside in SDRAM and process them slice-wise
> in internal memory. And turn off the L2 data cache. This will rid you of
> the nasty coherency problems, and as a bonus you'll have more internal
> memory for your code/data.
>
> In case code + data won't fit into internal memory, leave the code outside
> and enable L2 cache for the external code section. L2 caching works much
> better for code than for data, because code is executed sequentially most of
> the time. This produces fewer cache misses. And you don't have to worry about
> cache coherency issues when using program cache.
>
> In case the execution is too slow, consider moving single critical functions
> to internal memory. You can do that by creating a section in your linker
> command file, and then using the #pragma CODE_SECTION() directive for
> pointing out those functions.
>
> Hope this helps,
>
> Guy Eschemann.
> Vienna, Austria.
>
> On 8/2/06, carlferns wrote:
> >
> > Folks,
> >
> > I have a real big problem with EDMA and cache coherency.
> > Board :6416 Spectrum digital
> > Here's what I am doing.
> > 1) Transfer data from SDRAM to ISRAM.
> > 2) Work with data in ISRAM
> > 3) Transfer back to SDRAM
> >
> > 4) Repeat process for next block in SDRAM to same block in ISRAM
> > .
> > .
> > 5) Finally use SDRAM data.
> >
> > Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> > of multiples of 128 (actually 96k).
> > L2 cache is 128k and enabled.
> >
> > Now before transfer from SDRAM to ISRAM in step 1, I always
> > CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> > should be enough for cache coherency because the docs say L1D is
> > handled by EDMA. Also ISRAM block is always cache coherent. Correct?
> >
> > But the data gets all screwed up....
> >
> > Logically, I think I am doing things right.
> >
> > Is there a way to check cache coherency without the debugger or
> > comparing memory via cpu and checking - that in itself pulls it into
> > L2/L1D cache?
> > It looks like EDMA is doing its job but caching isn't.
> >
> > What may be the problem here.... Appreciate some ideas.
> >
> > Thanks,
> >
> > C
> >
>
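For reference, the #pragma CODE_SECTION() approach Guy mentions in the quoted message looks roughly like this in C with the TI compiler. The function scale_block, the section name ".critical" and the memory range name ISRAM are all hypothetical choices for this sketch; the range has to match whatever your own linker command file defines.

```c
/* TI C6000 compiler syntax for C: #pragma CODE_SECTION(symbol, "section name").
   The section name ".critical" is a hypothetical choice; compilers that do not
   know this pragma normally just ignore it. */
#pragma CODE_SECTION(scale_block, ".critical")
void scale_block(short *data, int count, short gain)
{
    int i;
    for (i = 0; i < count; i++)
        data[i] = (short)(data[i] * gain);
}

/* Matching linker command file entry (not C, shown for context; "ISRAM" must be
   a memory range defined in your own MEMORY directive):

       SECTIONS
       {
           .critical > ISRAM
       }
*/
```

The pragma has to appear before the function is declared or defined, and the link map is the place to confirm that the function really ended up in internal memory.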
Reply by Jeff Brower●August 7, 2006
Carl-
> I have a real big problem with EDMA and cache coherency.
> Board :6416 Spectrum digital
> Here's what I am doing.
> 1) Transfer data from SDRAM to ISRAM.
> 2) Work with data in ISRAM
> 3) Transfer back to SDRAM
>
> 4) Repeat process for next block in SDRAM to same block in ISRAM
> .
> .
> 5) Finally use SDRAM data.
>
> Blocks are 128 byte aligned in ISRAM and SDRAM and processed in chunks
> of multiples of 128 (actually 96k).
> L2 cache is 128k and enabled.
>
> Now before transfer from SDRAM to ISRAM in step 1, I always
> CACHE_wbInvL2 (SDRAM block , block size , CACHE_WAIT). I think that
> should be enough for cache coherency because the docs say L1D is
> handled by EDMA. Also ISRAM block is always cache coherent. Correct?
>
> But the data gets all screwed up....
>
> Logically, I think I am doing things right.
>
> Is there a way to check cache coherency without the debugger or
> comparing memory via cpu and checking - that in itself pulls it into
> L2/L1D cache?
> It looks like EDMA is doing its job but caching isn't.
>
> What may be the problem here.... Appreciate some ideas.
I think Guy is asking a good question. Won't 96k x 32 fit in internal
memory for 6416? So why use SDRAM and EDMA?
If the issue is that at some other time you need internal memory for
another reason, then I would first try with L2 data cache enabled and no
EDMA. The first time through your data loop you lose the speed advantage
of EDMA, but subsequent times your performance is just as good. And more
importantly, that mode forces you to make absolutely sure your data is
organized in the most efficient manner, and you have "thought through"
exactly the sequence that data moves and cache is used.
Then, enable EDMA as your last step. The performance gain it's going to
give you in this situation is minimal, so you should get it working
last.