Hi,

I was working on an implementation of the H.264 algorithm on Blackfin a couple of years back. I used the processor's elegantly designed DMA to move data in and out of internal memory (especially during de-blocking), and I was competing with the cache in terms of cycles. So I had a chance to experiment with the cache.

I observed a very strange phenomenon with the cache. I had written two different versions of code for the same de-blocking algorithm (de-blocking is a part of H.264). One was supposedly optimized, but actually wasn't.

The order in which I accessed the pixels and other data was the same in both cases. This point is very important: I hadn't changed the order in which the data was accessed.

With the cache disabled, both versions consumed the same number of cycles. Only when I enabled the cache was there a huge difference (almost 40%; I don't remember exactly, but it was considerable).

Now tell me, how does the cache behave differently with two versions of code for the same algorithm? Remember, I hadn't changed the order in which I accessed the data. The code resided in internal memory on both occasions. There wasn't much difference between the two versions: an 'if' statement was moved from inside a 'for' loop to outside it. How can one explain a difference in cycles that occurs only when the cache is enabled, when there is no change in the order in which the data is accessed?

No, I hadn't changed the cache mapping option; it was kept constant. I observed the same phenomenon in another situation (in the same H.264 code) on Blackfin (BF533).
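For readers trying to picture the change described above, here is a minimal C sketch of moving a loop-invariant 'if' from inside a 'for' loop to outside it. The function names and the averaging filter are hypothetical stand-ins chosen for illustration; they are not the de-blocking code from the post, which was never shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a de-blocking step: when `strong` is set,
 * average each pixel with its (already filtered) left neighbour;
 * otherwise leave the row untouched. */

/* Version A: the loop-invariant test sits inside the loop. */
void filter_row_if_inside(uint8_t *row, size_t n, int strong)
{
    assert(row != NULL);
    for (size_t i = 1; i < n; i++) {
        if (strong) {
            row[i] = (uint8_t)((row[i - 1] + row[i] + 1) >> 1);
        }
    }
}

/* Version B: the same test hoisted out of the loop. */
void filter_row_if_outside(uint8_t *row, size_t n, int strong)
{
    assert(row != NULL);
    if (strong) {
        for (size_t i = 1; i < n; i++) {
            row[i] = (uint8_t)((row[i - 1] + row[i] + 1) >> 1);
        }
    }
}
```

Both versions read and write row[] in exactly the same order and produce the same result. The sketch only illustrates the transformation; it does not by itself explain the cycle difference reported in the post.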
Weird Behaviour of Blackfin BF533 cache !!!!!!!!!!!!!!
Started by ●May 5, 2008
Reply by ●May 5, 2008
rajesh wrote:
> Hi,
>
> I was working on an implementation of the H.264 algorithm on Blackfin a
> couple of years back. I used the processor's elegantly designed DMA to
> move data in and out of internal memory (especially during de-blocking),
> and I was competing with the cache in terms of cycles. So I had a chance
> to experiment with the cache.

BlackFin doesn't have any means for providing cache and DMA coherency. Hence you generally can't DMA to the memory areas which are covered by cache.

> I observed a very strange phenomenon with the cache. I had written two
> different versions of code for the same de-blocking algorithm
> (de-blocking is a part of H.264). One was supposedly optimized, but
> actually wasn't.
>
> The order in which I accessed the pixels and other data was the same in
> both cases. This point is very important: I hadn't changed the order in
> which the data was accessed.
>
> With the cache disabled, both versions consumed the same number of
> cycles. Only when I enabled the cache was there a huge difference
> (almost 40%; I don't remember exactly, but it was considerable).

I can't understand what you did. BTW, I compared the efficiency of the data cache vs. L1 data memory on my tasks. The cache appears to be about 10% slower, which is what is expected.

> Now tell me, how does the cache behave differently with two versions of
> code for the same algorithm? Remember, I hadn't changed the order in
> which I accessed the data. The code resided in internal memory on both
> occasions. There wasn't much difference between the two versions: an
> 'if' statement was moved from inside a 'for' loop to outside it. How can
> one explain a difference in cycles that occurs only when the cache is
> enabled, when there is no change in the order in which the data is
> accessed?
>
> No, I hadn't changed the cache mapping option; it was kept constant. I
> observed the same phenomenon in another situation (in the same H.264
> code) on Blackfin (BF533).

You are muddle-headed. Learn hardware.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply by ●May 6, 2008
On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> You are a muddle headed. Learn hardware.

I didn't get it. Should I learn hardware because I am muddled, or should I learn hardware to unmuddle myself? In any case, a strange phenomenon like the one above can cause anyone to be muddled.

"None of us knew much about staging a variety show, so we just had to muddle through."
Reply by ●May 6, 2008
On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> BlackFin doesn't have any means for providing cache and DMA coherency.
> Hence you generally can't DMA to the memory areas which are covered by
> cache.

There are. I have used an instruction to invalidate the cache after a DMA transfer.
Reply by ●May 6, 2008
On May 6, 9:21 am, rajesh <getrajes...@gmail.com> wrote:
> On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> > BlackFin doesn't have any means for providing cache and DMA coherency.
> > Hence you generally can't DMA to the memory areas which are covered by
> > cache.
>
> There are. I have used an instruction to invalidate the cache after a
> DMA transfer.

FYI:

    iflush [ p2 ] ; /* Invalidate cache line containing address that P2 points to */
Reply by ●May 6, 2008
rajesh wrote:
> On May 6, 9:21 am, rajesh <getrajes...@gmail.com> wrote:
>> On May 6, 2:13 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
>>> rajesh wrote:
>>>> I was working on an implementation of the H.264 algorithm on Blackfin
>>>> a couple of years back. I used the processor's elegantly designed DMA
>>>> to move data in and out of internal memory (especially during
>>>> de-blocking), and I was competing with the cache in terms of cycles.
>>>> So I had a chance to experiment with the cache.
>>>
>>> BlackFin doesn't have any means for providing cache and DMA coherency.
>>> Hence you generally can't DMA to the memory areas which are covered by
>>> cache.
>>> You are a muddle headed. Learn hardware.
>>
>> I have used an instruction to invalidate cache after dma transfer.
>
> FYI
>
> iflush [ p2 ] ; /* Invalidate cache line containing address that
> P2 points to */

FYI, muddle head:

1. iflush should be used before the DMA transfer, not after.

2. iflush is useless, since it is faster to DMA to/from a buffer in L1 and copy the data between external memory (cached) and that buffer.

3. Learn hardware.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
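The coherency problem being argued about here can be pictured with a toy host-side model. This is only a sketch under loud assumptions: it is plain C that runs on a PC, the names (toy_system, cpu_read, dma_write, cache_invalidate) are invented for illustration, the cache holds a single read-only line, and the 32-byte line size is an assumed value. It does not use any Blackfin instruction or register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u   /* assumed line size for the toy model */
#define MEM_BYTES  128u  /* size of the simulated "external" memory */

typedef struct {
    uint8_t mem[MEM_BYTES];    /* simulated external memory */
    uint8_t line[LINE_BYTES];  /* the single cached line */
    int     tag;               /* which line is cached, -1 = invalid */
} toy_system;

void toy_init(toy_system *s)
{
    memset(s->mem, 0, sizeof s->mem);
    memset(s->line, 0, sizeof s->line);
    s->tag = -1;
}

/* CPU read: goes through the cache, filling the line on a miss. */
uint8_t cpu_read(toy_system *s, unsigned addr)
{
    assert(addr < MEM_BYTES);
    int tag = (int)(addr / LINE_BYTES);
    if (s->tag != tag) {
        memcpy(s->line, &s->mem[(unsigned)tag * LINE_BYTES], LINE_BYTES);
        s->tag = tag;
    }
    return s->line[addr % LINE_BYTES];
}

/* DMA write: goes straight to memory; the cache is not told (no snooping). */
void dma_write(toy_system *s, unsigned addr, const uint8_t *src, unsigned len)
{
    assert(addr + len <= MEM_BYTES);
    memcpy(&s->mem[addr], src, len);
}

/* Software invalidate: the next CPU read must refill from memory. */
void cache_invalidate(toy_system *s)
{
    s->tag = -1;
}
```

In this model, a cpu_read of an address, followed by a dma_write to the same address, followed by another cpu_read, returns the old value until cache_invalidate is called. The model has no dirty lines or write-back, so it says nothing about whether the invalidate belongs before or after the transfer on real hardware.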
Reply by ●May 6, 2008
On May 6, 3:14 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> FYI, muddle head:
>
> 1. iflush should be used before DMA transfer, not after.
>
> 2. iflush is useless, since it is faster to DMA to/from buffer in L1 and
> copy the data between external memory (cached) and that buffer.
>
> 3. Learn hardware.

I did not say that I used the above instruction; there are a few other instructions which I am not able to recall. You can invalidate the entire cache in one shot.

I have used DMA and cache simultaneously and demonstrated the gain of DMA over cache, especially when the data access is sequential in memory. This happens when one is accessing the pixels of a 2-D image.

And lastly, I didn't get what you mean by 'hardware'.
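As a side note on accessing pixels of a 2-D image, the following C sketch counts how many cache lines a rectangular block of pixels touches, given the image stride. It is an illustration only: the function name lines_touched is hypothetical, and the line size, stride, and block size used in the example are assumed values, not figures from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct cache lines covered by a width x height block of
 * bytes that starts at byte offset `base` in an image whose rows are
 * `stride` bytes apart. Requires stride >= width so rows do not overlap. */
size_t lines_touched(size_t base, size_t stride, size_t width,
                     size_t height, size_t line_bytes)
{
    assert(width > 0 && height > 0 && line_bytes > 0);
    assert(stride >= width);

    size_t count = 0;
    size_t prev_last = 0;
    int have_prev = 0;

    for (size_t r = 0; r < height; r++) {
        size_t start = base + r * stride;
        size_t first = start / line_bytes;
        size_t last = (start + width - 1) / line_bytes;

        /* Row addresses only increase, so a row can share at most its
         * first line with the last line of the previous row. */
        if (have_prev && first == prev_last) {
            count += last - first;
        } else {
            count += last - first + 1;
        }
        prev_last = last;
        have_prev = 1;
    }
    return count;
}
```

For example, with an assumed 32-byte line, a 16x16 block in an image with a 720-byte stride touches 16 lines, while the same 256 bytes packed contiguously (stride equal to width) touch 8. Whether that difference matters for a given workload depends on the hardware and the access pattern, which this sketch does not model.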
Reply by ●May 6, 2008
On May 6, 3:14 pm, Vladimir Vassilevsky <antispam_bo...@hotmail.com> wrote:
> 1. iflush should be used before DMA transfer, not after.
>
> 2. iflush is useless, since it is faster to DMA to/from buffer in L1 and
> copy the data between external memory (cached) and that buffer.

"BlackFin doesn't have any means for providing cache and DMA coherency. Hence you generally can't DMA to the memory areas which are covered by cache."

Who said this?
Reply by ●May 6, 2008
rajesh wrote:
> "BlackFin doesn't have any means for providing cache and DMA
> coherency.
> Hence you generally can't DMA to the memory areas which are covered by
> cache. "
>
> who said this?

Hint: cache snooping

Go learn hardware, bad pupil.

VLV
Reply by ●May 6, 2008
> I did not say that I used the above instruction; there are a few other
> instructions which I am not able to recall.

Excuses, excuses, always excuses.

> You can invalidate the entire cache in one shot.

Bad idea anyway.

> I have used DMA and cache simultaneously and demonstrated the gain of
> DMA over cache, especially when the data access is sequential in memory.

Nonsense.

> This happens when one is accessing the pixels of a 2-D image.

This happens when one has a muddle head.

> And lastly, I didn't get what you mean by 'hardware'.

LOL

VLV