Dear all,

I just ported my code from the DSK6711 to the DSK6416. The code computes optical flow. On the DSK6711, computing one optical flow field takes 1.3 sec (i.e. slower than 1 Hz). Because I want to build a real-time application, I need the optical flow at 1 Hz or 2 Hz at least, so I bought a new DSK6416 to speed it up. But after moving to the DSK6416 I get almost the same speed as on the DSK6711: 1.2 sec. So I think the limit is not the CPU speed, and the bottleneck must be on the peripheral side, such as reading and writing data from/to SDRAM.

For additional information, the image capture from the CMOS sensor already uses EDMA, so there should not be any problem in that part. For reading and writing data during processing I am not using EDMA transfers. I am writing hand assembly with load (LD) / store (ST) instructions when processing the optical flow.

I am thinking of using QDMA to transfer a small block into L2. My program always uses a 3x3 window or a 7x7 window when processing the image. But I am not sure this would solve my problem. Could anyone give me a hint about this problem?

Thank you

henrry
Optimization problem
Started by ●April 16, 2004
Reply by ●April 16, 2004
Henrry,

Am I correct in assuming that you are collecting an image of some size? How big (in pixels and bytes)? You are then doing some processing and then outputting the result...

Do you have L2 cache turned on?

Are you doing fixed or floating point processing?

What happens if you disable all the processing - does it complete instantly?

Do you know what part of your processing is taking most of the time?

You say you use EDMA for image capture, but don't use EDMA for reading data. What data are we talking about here, how much, where does it come from and what bus interface is it attached to?

- Andrew E.
Reply by ●April 17, 2004
I am sorry that I didn't provide enough information. I will describe it in more detail.

--- In , Andrew Elder <andrew_elder@b...> wrote:
> Am I correct in assuming that you are collecting an image of some size ?
> How big (in pixels and bytes) ?

Yes, my system acquires the image from a CMOS sensor. The 640x480 image data is stored in a FIFO first and then moved to SDRAM on the DSK board.

> You are then doing some processing and then outputting the result.....

After the image is in SDRAM, my program starts processing it there to compute the optical flow.

> Do you have L2 cache turned on ?

No, I don't use the L2 cache. I use the whole L2 as SRAM. Could that affect the speed of the program?

> Are you doing fixed or floating point processing ?

I am not using floating point processing.

> What happens if you disable all the processing - does it complete instantly ?
>
> Do you know what part of your processing is taking most of the time ?

Yes, I think it is the several loops that run a small block, such as a 3x3 or 7x7 window, over the whole image. This operation doesn't use EDMA or QDMA; I just use LD / ST instructions. So this is my main suspect for why the program runs so slowly, but I want to make sure.

> You say you use EDMA for image capture, but don't use EDMA for reading data. What data are we talking about here, how much, where does it come from and what bus interface is it attached to ?

I am using EDMA to capture the 640x480 image (the image data) from the FIFO and move it to SDRAM on the DSK board. The FIFO is on a daughter board, and I use CE2 to read from the FIFO. The EDMA works very well and fast. So I think my biggest problem is when my program begins to process the image data in SDRAM.

So, back to my question: what should I do to speed up my program? Must I turn the L2 cache on? Must I use QDMA, or must I use DAT? I just want to start from the right starting point.

Thank you
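For a sense of scale, here is a rough count of how many pixel loads the window loops described above issue per pass. It is only arithmetic on the figures already given (640x480 frame, 3x3 and 7x7 windows). It assumes one load per window tap and one pass over every pixel with borders ignored, which the post does not state, and the helper names `window_loads` and `print_load_counts` are made up for this illustration. It says nothing about what each load costs, since that depends on where the data sits and on the cache setup.

```c
#include <assert.h>
#include <stdio.h>

/* Loads issued when a window x window neighbourhood is read once for
 * every pixel of a width x height frame (borders ignored).
 * Assumption: one load per window tap, no reuse between neighbours. */
static unsigned long window_loads(unsigned long width,
                                  unsigned long height,
                                  unsigned long window)
{
    assert(window % 2 == 1);   /* windows here are odd-sized: 3x3, 7x7 */
    return width * height * window * window;
}

/* Print the counts for the frame size and windows named in the thread. */
void print_load_counts(void)
{
    printf("3x3 window: %lu loads per pass\n", window_loads(640, 480, 3));
    printf("7x7 window: %lu loads per pass\n", window_loads(640, 480, 7));
}
```

Under those assumptions a single 7x7 pass is about 15 million loads, so where the pixels live while they are being read plausibly matters a great deal.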
Reply by ●April 17, 2004
Here's what I would suggest:

- Don't use hand assembly. First optimize the flow in C. Then use intrinsics or linear (serial) assembly to optimize the computationally intensive loops of your code. The C6000 compiler is very good and does an excellent job on most high-computation functions.
- Turn off the L2 cache only if you can ascertain that all of your code and data fit in SRAM. In any other case it is always better to have some amount of L2 cache. The exact amount will depend on the code and the data to be processed. You may need some experimentation to get the L2 cache vs. SRAM ratio right.
- The DAT module is the API for QDMA. It's always a good idea to use the API if you do not have a good knowledge of QDMA.
- Always ensure that the data you are processing is in SRAM. If you cannot allocate the full frame in SRAM, allocate part of the frame in SRAM. Use the double buffering method to process data in the foreground and perform the data transfers in the background.

You can get further info from:

- C6000 instruction set: spru189.
- C6000 cache user's guide: spru656. Here you may find "sec 4.3: Processing Chain With DMA Buffering" particularly useful.
- C6000 programmer's guide: spru198.
- C6000 optimizing compiler user's guide: spru187.

Regards,
SS
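The double buffering suggested above can be sketched in plain C roughly as follows. This is a host-side illustration under stated assumptions, not DSP code: `dma_start_copy` and `dma_wait` are hypothetical stand-ins implemented with `memcpy` (on the DSP they would be replaced by a real background transfer, for example through the DAT module; check the CSL documentation for the actual calls). The strip height of 16 rows is an arbitrary choice, and the 3x3 sum is only a placeholder kernel, not the optical flow computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_WIDTH = 640, STRIP_ROWS = 16, HALO = 1 /* 3x3 window */ };

/* Hypothetical stand-ins for a background DMA copy (assumption). */
static void dma_start_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
}
static void dma_wait(void) { /* would block until the copy is done */ }

/* Two strip buffers; on the DSP these would be placed in on-chip SRAM. */
static uint8_t strip_buf[2][(STRIP_ROWS + 2 * HALO) * MAX_WIDTH];

/* Start fetching the input rows needed for output rows starting at row0.
 * Returns the index of the first frame row held in the buffer. */
static int fetch_strip(int which, const uint8_t *frame,
                       int width, int height, int row0)
{
    int first = row0 - HALO;
    int last = row0 + STRIP_ROWS + HALO;      /* exclusive */
    if (first < 0) first = 0;
    if (last > height) last = height;
    dma_start_copy(strip_buf[which], frame + (size_t)first * width,
                   (size_t)(last - first) * width);
    return first;
}

/* Placeholder kernel: 3x3 sum for interior pixels of one strip,
 * reading only from the local buffer. */
static void process_strip(const uint8_t *local, int first, int width,
                          int height, int row0, uint16_t *out)
{
    int row_end = row0 + STRIP_ROWS;
    if (row_end > height) row_end = height;
    for (int y = row0; y < row_end; y++) {
        if (y == 0 || y == height - 1) continue;   /* skip border rows */
        for (int x = 1; x < width - 1; x++) {
            unsigned sum = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += local[(size_t)(y + dy - first) * width + x + dx];
            out[(size_t)y * width + x] = (uint16_t)sum;
        }
    }
}

/* Process the frame strip by strip: while strip i is processed from one
 * buffer, the rows for strip i+1 are fetched into the other. */
void box3x3_double_buffered(const uint8_t *frame, int width, int height,
                            uint16_t *out)
{
    assert(width <= MAX_WIDTH);
    int cur = 0;
    int first = fetch_strip(cur, frame, width, height, 0);
    for (int row0 = 0; row0 < height; row0 += STRIP_ROWS) {
        int next_first = 0;
        dma_wait();                               /* current strip ready */
        if (row0 + STRIP_ROWS < height)           /* prefetch next strip */
            next_first = fetch_strip(cur ^ 1, frame, width, height,
                                     row0 + STRIP_ROWS);
        process_strip(strip_buf[cur], first, width, height, row0, out);
        cur ^= 1;
        first = next_first;
    }
}
```

Border pixels of `out` are left untouched. With a real background transfer the fetch of the next strip overlaps the processing of the current one, which is the point of the technique; with the `memcpy` stand-in the two simply run back to back.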