comp.dsp | TMS320DM642 8bit QDMA transfers and subsampling

Hi all.

I'm using a DM642 to capture a PAL frame from a video port as 8 bit
YCbCr. I want to subsample this frame by half, but I can't use the
video port scaler because I need the full frame too. I tried this
using 8 bit QDMA transfers set up as indexed source/incremented dest.
The problem is that this is probably the most inefficient use of
the EDMA engine possible, and the whole thing grinds to a halt
(bearing in mind that the EDMA is already loaded by servicing the
video ports).

I wondered if it might be better to do 32 bit QDMA transfers into
internal memory, do the subsampling, and transfer back to external
heap memory. Is this likely to be better, or has anyone any other
suggestions?

Cheers

mark-r

-- 
"Let's meet the panel. You couldn't ask for four finer comedians -
so that answers your next question..."
 -- Humphrey Lyttleton

Reply by mbelge ● May 18, 2005


I am assuming that you are trying to subsample the frame in the horizontal
direction, i.e. 640x480 comes out as 320x480. You are right that the EDMA
engine alone can do this, but it is so inefficient as to be infeasible.
The reason is that the EDMA (and the QDMA, by the way) is optimized for
32-bit transfers and for contiguous data streams. If you put gaps between
the data elements to be transferred, the EDMA engine gets much slower (it
actually submits a transfer request for each element). Your best bet is to
bring large chunks of the image into internal memory, subsample locally in
the register file, and transfer the subsampled image back out to external
memory. You can do this with a double-buffering scheme, using two QDMAs
with one of them always in flight (i.e. the DMA transfers are interleaved
with the CPU operations):

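/* pseudocode: most parameters omitted */
hinCurrent  = DAT_copy(currentSlice);     /* prefetch the first slice */
houtCurrent = none;                       /* no output transfer pending yet */
while ( there are more slices )
{
    if ( more slices needed )
    {
        hinNext = DAT_copy(nextSlice);    /* fetch next slice while the CPU works */
    }

    DAT_wait(hinCurrent);                 /* current slice is now in internal memory */
    slice = subsample(currentSlice);
    if ( houtCurrent != none )
        DAT_wait(houtCurrent);            /* previous output transfer has finished */
    houtCurrent = DAT_copy(slice);        /* write the subsampled slice back out */

    /* rotate handles and buffers */
    hinCurrent = hinNext;
    currentSlice = nextSlice;
}
DAT_wait(houtCurrent);                    /* drain the last output transfer */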


(I omitted most of the parameters.) This will give you much better
efficiency.
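
To make the scheme above concrete, here is a minimal host-side sketch in C. It is not DM642 code: dma_start() and dma_wait() are hypothetical stand-ins that use memcpy where the DSP version would call DAT_copy() and DAT_wait(), SLICE_BYTES is an arbitrary illustrative size, and the kernel simply keeps every second byte of a single 8-bit plane without any low-pass filtering. Treat the buffer layout as an assumption too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLICE_BYTES 64u  /* hypothetical slice size; must be even */

/* Host stand-ins for the DMA calls (assumption: on the DSP these would be
   DAT_copy()/DAT_wait(); here the copy completes immediately). */
static uint32_t dma_start(const void *src, void *dst, size_t n)
{
    memcpy(dst, src, n);
    return 1u;                      /* dummy transfer ID */
}
static void dma_wait(uint32_t id) { (void)id; }

/* Keep every second byte of an 8-bit plane (plain decimation). */
static void subsample_2to1(const uint8_t *in, uint8_t *out, size_t in_bytes)
{
    size_t i;
    for (i = 0; i < in_bytes / 2; i++)
        out[i] = in[2 * i];
}

/* Halve `total` bytes from src into dst, one slice at a time, using
   ping-pong input and output buffers that stand in for internal memory. */
void subsample_plane(const uint8_t *src, uint8_t *dst, size_t total)
{
    static uint8_t in_buf[2][SLICE_BYTES];
    static uint8_t out_buf[2][SLICE_BYTES / 2];
    uint32_t in_id[2] = {0, 0}, out_id[2] = {0, 0};
    size_t nslices = total / SLICE_BYTES, s;
    int cur = 0;

    assert(total % SLICE_BYTES == 0);
    if (nslices == 0)
        return;

    in_id[0] = dma_start(src, in_buf[0], SLICE_BYTES);  /* prefetch slice 0 */
    for (s = 0; s < nslices; s++) {
        int nxt = cur ^ 1;
        if (s + 1 < nslices)        /* start fetching the next slice */
            in_id[nxt] = dma_start(src + (s + 1) * SLICE_BYTES,
                                   in_buf[nxt], SLICE_BYTES);

        dma_wait(in_id[cur]);       /* current input has arrived */
        dma_wait(out_id[cur]);      /* earlier write-out of this buffer is done */
        subsample_2to1(in_buf[cur], out_buf[cur], SLICE_BYTES);
        out_id[cur] = dma_start(out_buf[cur], dst + s * (SLICE_BYTES / 2),
                                SLICE_BYTES / 2);

        cur = nxt;                  /* rotate buffers */
    }
    dma_wait(out_id[0]);            /* drain outstanding write-outs */
    dma_wait(out_id[1]);
}
```

With real asynchronous transfers, the initial zero transfer IDs would have to be replaced by whatever "nothing pending" value the DMA API provides, and the inner loop would want to read and write whole 32-bit words rather than single bytes.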

		

Reply by Michael Schoeberl ● May 18, 2005

 > [DAT_copy() function]

be careful with the caching ...


I tried to change my processing routine to fetch some data with EDMA and 
did not touch the writing back (one step at a time) ... ehh - gave 
strange results ...

sometimes you need to manually _wb-invalidate some cache_

after figuring out the problems I finally found the right document:
for my DSP it's the document spru610: "TMS320C64x DSP Two-Level Internal 
Memory Reference Guide"


bye,
Michael
PS: the original posting did not reach me :-/

Reply by Mark Robinson ● May 19, 2005

[ Thanks to mbelge for a good suggestion ]

Michael Schoeberl wrote:
> 
> be careful with the caching ...

If you do a DMA from external to internal memory, or vice versa, the
cache is not touched at all. So, if any part of your external memory
is cached, the cache copy may become corrupt. I guess the way to avoid
it is to avoid caching your buffer, by not accessing it with the CPU
at all (you also need to align buffers on cache line boundaries).
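
The alignment point matters because a cache line that holds both part of a DMA buffer and some unrelated CPU data can be pulled into the cache when the CPU touches the unrelated data. A quick way to check a buffer is sketched below in C. The 128-byte CACHE_LINE value is my assumption for the C64x L2 line size, and buffer_owns_its_lines() is a made-up helper name, not a library call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u  /* assumption: C64x L2 line size in bytes */

/* Round an address down or up to a cache-line boundary. */
static uintptr_t line_floor(uintptr_t a)
{
    return a & ~(uintptr_t)(CACHE_LINE - 1u);
}
static uintptr_t line_ceil(uintptr_t a)
{
    return line_floor(a + CACHE_LINE - 1u);
}

/* Returns 1 if [addr, addr + len) starts and ends on line boundaries,
   i.e. the buffer shares no cache line with neighbouring data. */
int buffer_owns_its_lines(uintptr_t addr, size_t len)
{
    return line_floor(addr) == addr &&
           line_ceil(addr + len) == addr + len;
}
```

A buffer that fails this check would need its start aligned and its size padded up to a multiple of the line size.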

> PS: the original posting did not reach me :-/

I sent it over a month ago, so I guess it will have expired from
some places!

Cheers

mark-r


Reply by Michael Schoeberl ● May 19, 2005

> I guess the way to avoid
> it is to avoid caching your buffer, by not accessing it with the CPU
> at all (you also need to align buffers on cache line boundaries).

that's the easy solution - but it is not necessary ... I'll just 
describe what I'm doing on my C6416 (I'm sure someone else out there is 
looking for this like I was ;-)


My data is in external SDRAM (and was previously used and might still be 
in L2) and I want to put it to ISRAM for fast processing.


- call CACHE_wbInvL2 on your data; this writes back L1D and L2 and 
invalidates both (!) for the data ...
- call CACHE_wbInvL1d on your destination ...
- call DAT_copy to transfer the data



the function CACHE_wbInvL2 did not work for invalidating huge 
data arrays (600 kByte) in one call, but it works in small chunks ...
I guess the problem is the limited size of the register that passes the 
size to the cache controller - there is a limit of 256 kBytes ...
(the API ref guide spru401f does not mention this - it's just in the 
spru610 document)
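
the chunking can be sketched like this in C (a host-side sketch, not tested on the DSP): cache_op_chunked() is a made-up helper that walks a large block and calls a cache operation on pieces no bigger than MAX_CHUNK_BYTES. The 64 kByte chunk size is an arbitrary choice below the 256 kByte limit, and fake_wbinv() is only a stand-in so the loop can run on a PC; on the DSP the callback would wrap the real write-back/invalidate call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHUNK_BYTES (64u * 1024u)  /* assumption: safely below 256 kByte */

/* Signature of the per-chunk cache operation (hypothetical wrapper type). */
typedef void (*cache_op_fn)(void *block, size_t bytes);

/* Host stand-in: just counts the bytes it was asked to handle. */
static size_t bytes_seen;
static void fake_wbinv(void *block, size_t bytes)
{
    (void)block;
    bytes_seen += bytes;
}

/* Apply `op` to [block, block + bytes) in chunks of at most
   MAX_CHUNK_BYTES. Returns the number of calls made. */
size_t cache_op_chunked(void *block, size_t bytes, cache_op_fn op)
{
    uint8_t *p = (uint8_t *)block;
    size_t calls = 0;

    assert(op != NULL);
    while (bytes > 0) {
        size_t n = bytes < MAX_CHUNK_BYTES ? bytes : MAX_CHUNK_BYTES;
        op(p, n);       /* one cache operation per chunk */
        p += n;
        bytes -= n;
        calls++;
    }
    return calls;
}
```

for a 600 kByte array and 64 kByte chunks this makes ten calls, the last one covering the remaining 24 kByte.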



bye,
Michael

Reply by Mark Robinson ● May 20, 2005

Michael Schoeberl wrote:
> 
> > it is to avoid caching your buffer, by not accessing it with the CPU

> thats the easy solution - but this is not necessary ... I'll just

It is necessary within an XDAIS algorithm, since you're not allowed to
fiddle with the cache. One mistake I have made is to use memcpy (because
I couldn't be bothered to implement IDMA2) on a buffer that a previous
algorithm in the channel had DMAed. Disastrous!

> describe what I'm doing on my C6416
[snip]

All filed away in my "things that are bound to come in useful" folder,
thanks.

Cheers

mark-r

