
Optimization problem

Started by Henrry Andrian April 16, 2004
Dear all,

I just ported my code from the DSK6711 to the DSK6416. The code
computes optical flow. On the DSK6711, computing one optical flow
field takes 1.3 s (that is, a rate below 1 Hz). Because I want a
real-time application, I need the optical flow at 1 Hz or 2 Hz at
least, so I bought a new DSK6416 to speed it up. But after moving to
the DSK6416 I get almost the same speed as on the DSK6711: 1.2 s. So
the limit does not seem to be CPU speed; the bottleneck must be on
the peripheral side, such as reading and writing data from/to SDRAM.

For additional information: the image capture from the CMOS sensor
already uses EDMA, so there should be no problem in that part. For
reading and writing data during processing I am not using EDMA
transfers. I wrote hand assembly that uses load (LD) / store (ST)
instructions to process the optical flow.

I am thinking of using QDMA to transfer small blocks into L2. My
program always uses a 3x3 or 7x7 window when processing the image.
But I am not sure this would solve my problem. Could anyone give me
a hint about this?
Thank you

henrry




Henrry,

Am I correct in assuming that you are collecting an image of some size?
How big (in pixels and bytes)?

You are then doing some processing and then outputting the result.....

Do you have L2 cache turned on?

Are you doing fixed or floating point processing?

What happens if you disable all the processing - does it complete instantly?

Do you know what part of your processing is taking most of the time?

You say you use EDMA for image capture, but don't use EDMA for reading data.
What data are we talking about here, how much, where does it come from and what
bus interface is it attached to?

- Andrew E.






I am sorry that I didn't provide enough information. I will describe
it in more detail.

--- In , Andrew Elder <andrew_elder@b...> wrote:
>
> Henrry,
>
> Am I correct in assuming that you are collecting an image of some size?
> How big (in pixels and bytes)?

Yes, my system acquires the image from a CMOS sensor. The image data
(640x480 resolution) is stored in a FIFO first and then moved to
SDRAM on the DSK board.

>
> You are then doing some processing and then outputting the result.....

After the image is in SDRAM, my program starts processing it there
to compute the optical flow.

>
> Do you have L2 cache turned on ?

No, I don't use L2 cache. I use the whole L2 as SRAM. Could that
affect the speed of the program?

>
> Are you doing fixed or floating point processing?

I am not using floating-point processing.

>
> What happens if you disable all the processing - does it complete instantly?
>
> Do you know what part of your processing is taking most of the time?
>
Yes, I think it is the several loops that walk a small block, such
as a 3x3 or 7x7 window, across the whole image. This operation
doesn't use EDMA or QDMA; I just use LD / ST instructions. So I
suspect this is the main reason my program runs so slowly, but I
want to make sure about this.

> You say you use EDMA for image capture, but don't use EDMA for
> reading data. What data are we talking about here, how much, where
> does it come from and what bus interface is it attached to?

I am using EDMA to capture the 640x480 image data from the FIFO and
move it to SDRAM on the DSK board. The FIFO is on a daughter board,
and I use CE2 to read from the FIFO. The EDMA works very well and is
fast. So I think my biggest problem begins when my program starts to
process the image data in SDRAM.

So, back to my question: what should I do to speed up my program?
Must I turn the L2 cache on? Must I use QDMA, or must I use DAT? I
just want to start from the right point. Thank you.
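
To put a rough number on the window loops described in this message, here
is a small sketch in C. It assumes one byte per pixel, that every window
element is loaded separately with no reuse between neighbouring windows,
and that only full windows are counted (no border handling). The helper
name `window_loads` is made up for illustration and is not from the
original code.

```c
#include <stdint.h>

/* Number of single-pixel loads needed to visit every full k x k window
 * in a width x height image, assuming no reuse between windows.
 * Returns 0 if the window does not fit in the image. */
uint64_t window_loads(uint32_t width, uint32_t height, uint32_t k)
{
    if (k == 0 || k > width || k > height)
        return 0;
    /* (positions across) * (positions down) * (pixels per window) */
    return (uint64_t)(width - k + 1) * (height - k + 1) * k * k;
}
```

For a 640x480 frame this gives about 2.7 million loads for a 3x3 window
and about 14.7 million for a 7x7 window. If each of those loads goes to
external SDRAM with no cache in between, memory stalls rather than CPU
clock could plausibly dominate the run time.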





Here's what I would suggest:

- Don't use hand assembly. First optimize the flow in C. Then use
intrinsics or linear assembly to optimize the computationally
intensive loops of your code. The C6000 compiler is very good and
does an excellent job on most of the computation-heavy functions.

- Only if you can ascertain that all your code and data fit in SRAM
should you turn off L2 cache. In any other case it is always
better to have some amount of L2 cache. The exact amount will depend on
the code and data to be processed. You may need some experimentation
to get the L2 cache vs. SRAM ratio right.

- The DAT module is the API for QDMA. It's always a good idea to use the
API if you do not have a good knowledge of QDMA.

- Always ensure that the data you are processing is in SRAM.
If you cannot allocate the full frame in SRAM, allocate part of the
frame there. Use double buffering to process data in the
foreground while the data transfers run in the background.
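
As a rough illustration of the last two points, here is a minimal sketch
of double-buffered strip processing in plain C. Everything in it is an
assumption made for illustration: the 3x3 box sum stands in for the real
optical flow kernel, `strip_buf` stands in for buffers placed in on-chip
SRAM, and `start_fetch` / `wait_fetch` are hypothetical hooks that use
`memcpy` here. On the DSK they would presumably be replaced by DAT or
QDMA calls from the CSL; check the CSL documentation for the exact API.

```c
#include <stdint.h>
#include <string.h>

#define MAX_WIDTH  640   /* widest frame the strip buffers can hold */
#define STRIP_ROWS 8     /* output rows per strip (assumed value, tune it) */

/* Two strip buffers; on the DSP these would be placed in on-chip L2 SRAM.
 * Each holds STRIP_ROWS rows plus one halo row above and one below. */
static uint8_t strip_buf[2][(STRIP_ROWS + 2) * MAX_WIDTH];

/* Hypothetical transfer hooks. memcpy is synchronous, so wait_fetch() has
 * nothing to do here; on hardware they would start a DMA and wait for it. */
static void start_fetch(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}
static void wait_fetch(void) { }

/* Rows in strip i when `interior` output rows are split into strips. */
static int strip_rows(int i, int interior)
{
    int left = interior - i * STRIP_ROWS;
    return left < STRIP_ROWS ? left : STRIP_ROWS;
}

/* 3x3 box sum for `rows` output rows; `in` holds rows + 2 input rows. */
static void box3x3_strip(const uint8_t *in, uint16_t *out, int width, int rows)
{
    for (int r = 0; r < rows; r++) {
        const uint8_t *a = in + (size_t)r * width;  /* row above */
        const uint8_t *m = a + width;               /* centre row */
        const uint8_t *b = m + width;               /* row below */
        uint16_t *o = out + (size_t)r * width;
        o[0] = 0;
        o[width - 1] = 0;
        for (int x = 1; x < width - 1; x++)
            o[x] = (uint16_t)(a[x - 1] + a[x] + a[x + 1] +
                              m[x - 1] + m[x] + m[x + 1] +
                              b[x - 1] + b[x] + b[x + 1]);
    }
}

/* Box-filter a whole frame strip by strip; border pixels are set to 0.
 * Returns 0 on success, -1 if the frame does not fit the buffers. */
int box3x3_frame(const uint8_t *src, uint16_t *dst, int width, int height)
{
    if (width < 3 || width > MAX_WIDTH || height < 3)
        return -1;

    int interior = height - 2;                       /* rows with full windows */
    int nstrips = (interior + STRIP_ROWS - 1) / STRIP_ROWS;

    memset(dst, 0, (size_t)width * sizeof *dst);     /* top border row */
    memset(dst + (size_t)(height - 1) * width, 0, (size_t)width * sizeof *dst);

    /* Prime the pipeline: fetch strip 0 together with its halo rows. */
    start_fetch(strip_buf[0], src,
                (size_t)(strip_rows(0, interior) + 2) * width);

    for (int i = 0; i < nstrips; i++) {
        int y0 = 1 + i * STRIP_ROWS;     /* first output row of strip i */
        wait_fetch();                    /* strip i is now in its buffer */
        if (i + 1 < nstrips)             /* fetch strip i+1 in the background */
            start_fetch(strip_buf[(i + 1) & 1],
                        src + (size_t)(y0 + STRIP_ROWS - 1) * width,
                        (size_t)(strip_rows(i + 1, interior) + 2) * width);
        box3x3_strip(strip_buf[i & 1], dst + (size_t)y0 * width, width,
                     strip_rows(i, interior));
    }
    return 0;
}
```

With a real asynchronous transfer, the fetch of strip i+1 overlaps with
the processing of strip i, which is the whole point of the two buffers.
This sketch still writes results straight to `dst`; in a real system the
output would probably be buffered in SRAM and transferred out the same way.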

You can get further info in:
C6000 instruction set: spru189.
C6000 cache user's guide: spru656. Here you may find
"sec 4.3: Processing Chain With DMA Buffering" particularly useful.
C6000 programmer's guide: spru198.
C6000 optimizing compiler user's guide: spru187.

Regards,
SS