changing word to short for DCT

Started by jrf...@gatv.ssr.upm.es June 20, 2007
Hi all,
I'm trying to perform DCT8x8 (part of imglib), which works on data in short format. The trouble is that the image data is acquired and stored as word (since each pixel uses 8 bits, it is normal to use this format), so I have to convert my array to short. The only way I've managed to do this is by copying my array into another short array[] with a for loop. Since I have to do this for the whole image, it is far too slow for my purposes. Does anyone know of an optimized way to do this?

Thanks!
JFR-

You are saying that your pixel data is acquired as "packed" -- four (4) 8-bit pixels
per 32-bit word? And you want to split the pixels out into 16-bit short values,
zero-extended in the upper 8 bits?

-Jeff
Oops! I explained it the wrong way. The DSP acquires the data as char (1
byte) and I convert it to short (2 bytes) in order to use imglib,
which in most cases needs to work on short data. In my post I wrote
'word' when I meant 'char'. Sorry for the confusion.

Thank you,
Best Regards,
juanan

Juanan-

> Oops! I explained it the wrong way. The DSP acquires the data as char (1
> byte)

How is the char data stored in memory? Packed with no separation between the chars?
Or otherwise?

-Jeff

Juanan-

> That's right, Jeff. The data is packed contiguously; each byte holds one
> pixel. That's the problem: I can't use a function such as DAT_copy or
> memcpy, because there's no space between pixels. I would just end up
> with the same layout in another place.

OK, your pixel data storage situation is clear now. I think you can program a DMA
copy from one memory location to another using 'offset' (or index) modification on the
destination location, for example:

- source -- element size is 1 byte, increment is 1

- destination -- element size is 1 byte, increment is 2

Then you would just need to make sure the destination area is zero-filled first. DMA
will be much faster than a CPU loop.

Here is the C6000 ref guide for DMA:

http://focus.ti.com/lit/ug/spru234c/spru234c.pdf

I'm not sure this will work, as I've not done it myself, but it seems like a good
path to investigate.

-Jeff
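
For reference, the transfer Jeff describes (1-byte elements, source increment 1, destination increment 2, destination zero-filled first) can be modelled in plain C. This is only a software sketch of the addressing pattern, not EDMA or CSL code; the names `strided_widen` and `is_little_endian` are made up for illustration. It also shows one detail the description leaves implicit: which byte of each short the pixel lands in depends on the endianness of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Software model of the proposed transfer (hypothetical helper):
 * 1-byte elements, source increment 1, destination increment 2,
 * written into a destination that has been zero-filled first. */
static void strided_widen(const uint8_t *src, uint16_t *dst, size_t count)
{
    unsigned char *out = (unsigned char *)dst;
    /* Byte offset, inside each 16-bit slot, of the low-order byte. */
    size_t low = is_little_endian() ? 0 : 1;
    size_t i;

    assert(src != NULL && dst != NULL);

    memset(dst, 0, count * sizeof *dst);   /* the zero-fill step */
    for (i = 0; i < count; i++) {
        out[2 * i + low] = src[i];         /* one 1-byte element, stride 2 */
    }
}
```

Calling `strided_widen(pixels, shorts, n)` leaves each `shorts[i]` equal to `pixels[i]` zero-extended to 16 bits, which is the layout the short-based imglib routines expect. On real hardware the zero-fill and the strided copy would be two separate transfers.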

jrf,

Are you saying that you need to save 32-bit words into an array
and then process them 16 bits at a time?

If so, then something similar to the following may solve your problem...

// one 32-bit word and two 16-bit halves sharing the same storage
typedef union dualArray
{
    Uint32 wordArray;      // 32-bit view
    Uint16 shortArray[2];  // 16-bit view of the same 4 bytes
} dualArray_t;

// declare the array
dualArray_t myArrays[500];

// set the 32-bit data
myArrays[1].wordArray = data32;

// retrieve the two 16-bit halves
// (index 0 is the low half on a little-endian device; swap for big-endian)
low16Data  = myArrays[1].shortArray[0];
high16Data = myArrays[1].shortArray[1];

I have probably left out and/or clobbered some details, but the above should get you pointed in the
right direction.

R. Williams

OK, I've tried doing it with DAT_copy2d, which uses DMA transfers, and it's
even slower. I don't know how to implement a DMA transfer myself, so I
used that function. I suppose it is slow because it is copying byte by byte.
I've read the guide, and I suppose what I should do is use QDMA, but I
don't know how to do it.
Do you think using QDMA instead of DAT_copy would be very different? Since
I don't know how to use it, it would be very disappointing to learn it
and then get no advantage from doing so.
Thanks for everything,

Juanan.

Juanan-

> OK, I've tried doing it with DAT_copy2d, which uses DMA transfers, and it's
> even slower. I don't know how to implement a DMA transfer myself, so I
> used that function. I suppose it is slow because it is copying byte by byte.
> I've read the guide, and I suppose what I should do is use QDMA, but I
> don't know how to do it.

Memory-to-memory DMA should not be slower than a CPU loop... how are you measuring the
timing? Hopefully not using printf() or a JTAG/RTDX-based method. Is either the
source or the destination located in external memory? Is any other DMA or HPI external
access going on at the same time, for example the pixel acquisition process?

> Do you think using QDMA instead of DAT_copy would be very different? Since
> I don't know how to use it, it would be very disappointing to learn it
> and then get no advantage from doing so.

Yes, I would try QDMA. It's like a special DMA channel that is synchronized with the
CPU, so it should make efficient use of the internal memory bus. I'm not sure whether
DAT_copy() figures out when to use QDMA on its own.

-Jeff
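
On the question of how to measure the timing: a minimal sketch, assuming the standard C `clock()` function is available and fine-grained enough on the target. On the DSP itself a hardware timer or cycle counter would be the more natural choice; nothing here is specific to the C6000. The helper names `widen_loop` and `time_widen` are hypothetical. The point is to time many repetitions of the conversion directly, with no printf() or debugger traffic inside the timed region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The plain CPU loop from the original question:
 * widen 8-bit pixels to 16-bit values. */
static void widen_loop(const uint8_t *src, int16_t *dst, size_t count)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < count; i++) {
        dst[i] = (int16_t)src[i];
    }
}

/* Run the conversion `repeats` times and return the elapsed CPU time
 * in seconds. Nothing is printed inside the timed region. */
static double time_widen(const uint8_t *src, int16_t *dst,
                         size_t count, int repeats)
{
    clock_t start, stop;
    int r;

    assert(repeats > 0);

    start = clock();
    for (r = 0; r < repeats; r++) {
        widen_loop(src, dst, count);
    }
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Dividing the result by `repeats` gives a per-frame figure that can be compared against the same measurement wrapped around the DMA-based version.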

Jeff-

> Memory-to-memory DMA should not be slower than a CPU loop... how are you
> measuring the timing? Hopefully not using printf() or a JTAG/RTDX-based
> method. Is either the source or the destination located in external memory?
> Is any other DMA or HPI external access going on at the same time, for
> example the pixel acquisition process?

I capture the image with a video camera, process it (at least I try to), and
send it to a TV, so I don't measure the time; I just see that the image I
get is really slow. Both source and destination are located in external
memory, but no other DMA is running at the same time. First I used a
normal DAT_copy and it worked fine, but with DAT_copy2d, which I've tried to
use to spread my bytes out into shorts, it is significantly
slower.
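
If the byte-by-byte copy is the bottleneck, one CPU-side alternative worth trying is to read four pixels per 32-bit load and unpack them with shifts and masks. The following is a sketch under stated assumptions, not a tuned implementation: it assumes a little-endian device and a pixel count that is a multiple of 4, and `widen_by_words` is a hypothetical name. As far as I know, the two unpack expressions correspond to what the C64x `_unpklu4` and `_unpkhu4` compiler intrinsics do in one instruction each, but check the compiler guide for your device before relying on that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen 8-bit pixels to 16-bit values, four pixels per 32-bit load.
 * ASSUMPTIONS: little-endian byte order; count is a multiple of 4. */
static void widen_by_words(const uint8_t *src, uint16_t *dst, size_t count)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    assert(count % 4 == 0);

    for (i = 0; i < count; i += 4) {
        uint32_t packed, lo, hi;

        /* One 32-bit load holding pixels i..i+3. */
        memcpy(&packed, src + i, sizeof packed);

        /* Pixels 0 and 1 spread into two zero-extended 16-bit lanes. */
        lo = (packed & 0xFFu) | ((packed & 0xFF00u) << 8);
        /* Pixels 2 and 3 spread into two zero-extended 16-bit lanes. */
        hi = ((packed >> 16) & 0xFFu) | ((packed >> 8) & 0xFF0000u);

        /* Two 32-bit stores produce four 16-bit outputs. */
        memcpy(dst + i, &lo, sizeof lo);
        memcpy(dst + i + 2, &hi, sizeof hi);
    }
}
```

This replaces four byte loads and four halfword stores with one word load and two word stores per group of pixels. With both buffers in external memory, as described above, memory bandwidth may still dominate, so it is worth measuring before and after.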