Corner bender

Started by Bernhard 'Gustl' Bauer October 15, 2009
Hi,

I need a 32 bit x 32 bit corner bender. I'm working with a C6713
now, but I will switch to other C67xx devices in the future. I haven't
found any HW support for this. Does anyone know a SW solution? How much
time would it need?

TIA

Gustl

_____________________________________
Gustl,

On Thu, Oct 15, 2009 at 6:54 AM, Bernhard 'Gustl' Bauer
wrote:
> Hi,
>
> I'm in need of a 32 bit x 32 bit corner bender. I'm working with C6713
> now, but I will switch to other C67xx in future. I haven't found any HW
> support for this. Does any one know a SW solution? How much time does it
> need?

Just to make sure that I have the correct context/definition: you have
an image [you did not mention the size] that is 32 bits per pixel, and
you want to convert it to 32 bit planes. Correct?

It is definitely easier using an FPGA. I haven't thought too hard
about it, but I would think that you could do it with 1 read per pixel
+ 32 writes per pixel + "a few CPU cycles per pixel".
You will need a bunch of memory, and the external memory cycles will
determine the speed.

mikedunn

--
www.dsprelated.com/blogs-1/nf/Mike_Dunn.php

_____________________________________
Gustl,

Please enlighten me.
What is a 32bit x 32 bit corner bender?

R. Williams

---------- Original Message -----------
From: "Bernhard 'Gustl' Bauer"
To: C6x
Sent: Thu, 15 Oct 2009 13:54:24 +0200
Subject: [c6x] Corner bender

> Hi,
>
> I'm in need of a 32 bit x 32 bit corner bender. I'm working with C6713
> now, but I will switch to other C67xx in future. I haven't found any HW
> support for this. Does any one know a SW solution? How much time does it
> need?
>
> TIA
>
> Gustl

_____________________________________
I have 32 words of 32-bit data (numbered w0 to w31). I write all 32
words into the corner bender, then I read 32 words out of the corner
bender (numbered r0 to r31). Now r0 contains the LSBs of w0 to w31, r1
contains the next bits, and so on up to r31, which contains the MSBs of
w0 to w31.

I can imagine a HW/SW solution like this:
Connect 32 McASP output pins to 32 GPIO pins (configured as inputs) and
use a clock line of the McASP as the DMA trigger to read all GPIO inputs.
But I don't think there is a C67xx with 32 McASP lines :-(

Richard Williams wrote:
>
>
> Gustl,
>
> Please enlighten me.
> What is a 32bit x 32 bit corner bender?
>
> R. Williams

_____________________________________
> Re: Corner bender
> Posted by: "Bernhard 'Gustl' Bauer" gustl@q...
> Date: Fri Oct 16, 2009 3:10 am (PDT)
>
> I have 32 words of 32 bit data (numbered from w0 to w31). I write all 32
> words into the corner bender, then I read 32 words out of the corner
> bender (numbered from r0 to r31). Now r0 contains LSBs of w0 to w31, and this
> goes up to r31, which contains the MSBs of w0 to w31.
>
> I can imagine a HW/SW solution like this:
> Connect 32 McASP out pins to 32 GPIO pins (configured as input) use a
> clock line of McASP as DMA trigger to read all GPIO inputs. But I don't
> think there is a C67xx with 32 McASP lines :-(

Hi Gustl,

The algorithm you described is similar to a square matrix transpose where
the matrix entries are one bit wide, if I read it correctly. The difference is
that a transpose interchanges entries symmetrically about the main diagonal,
so that e.g. the rightmost column is copied to the bottommost row, while
the corner bender interchanges bits symmetrically about the anti-diagonal, so
the rightmost column becomes the topmost row.

I know nothing about a possible h/w method; I guess some VHDL code in an
FPGA would do. As far as possible s/w options go, I am afraid there are not
many. It is known that there is no fast matrix transposition algorithm:
the operation count is proportional to n^2. There is an article by J.-O.Eklund
(NDRI - FOA, Sweden) in Two-Dimensional Digital Signal Processing II -
Transforms and Median Filters, ed. T.Huang, Springer, 1981 on matrix
transpositions, titled Effective Methods of Matrix Transposing, where
it is shown that no fast algorithms exist, only suboptimal ones with
respect to the number of loads/stores from/to slow storage.

Thus, the direct brute force method should do the work. I've come up with
this one:

uint32 w[32], r[32];
register uint32 temp;
int i, k;
...

for (i = 0; i < 32; i++)            // i-th bit
{
    temp = 0;                       // clean up

    for (k = 0; k < 32; k++)        // k-th input
    {
        // bit i of w[k] goes to bit k of r[i]
        temp |= (_extu (w[k], 31-i, 31) << k);  // "anti"-transpose
    }

    r[i] = temp;                    // store output
}
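
For checking results on a host PC, the same mapping can be sketched in portable C without the TI `_extu` intrinsic. This is only a reference, under the assumption (taken from the brute-force loop) that bit k of r[i] receives bit i of w[k]; the function name `corner_bend_ref` is made up for illustration.

```c
#include <stdint.h>

/* Reference corner bender (hypothetical helper name):
 * bit k of r[i] is set to bit i of w[k], for i, k in 0..31. */
void corner_bend_ref(const uint32_t w[32], uint32_t r[32])
{
    for (int i = 0; i < 32; i++) {          /* i-th bit of every input   */
        uint32_t temp = 0;
        for (int k = 0; k < 32; k++) {      /* k-th input word           */
            temp |= ((w[k] >> i) & 1u) << k;
        }
        r[i] = temp;                        /* one store per output word */
    }
}
```

For example, if only w[3] is non-zero and holds 0x5 (bits 0 and 2 set), then r[0] and r[2] each come out with bit 3 set and every other output word is zero.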

This brute-force loop takes about (3 * 32^2) bit ops and (32^2) loads, plus
32 stores. One way to improve it (if that works) could be to completely
unroll the inner loop:

temp  = _extu (w[0], 31-i, 31) << 0;
temp |= _extu (w[1], 31-i, 31) << 1;
...

temp |= _extu (w[31], 31-i, 31) << 31;
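
If the per-bit approach turns out to be too slow, a word-parallel block swap is another option. It still moves all 32^2 bits, but 32 at a time, so the whole array is handled in 5 passes of 16 swaps each. The sketch below is the well-known recursive bit-matrix transpose (a variant appears in Henry Warren's Hacker's Delight), adapted on the assumption that bit k of r[i] should equal bit i of w[k]. It is not the method proposed in this thread, it works in place, and the name `corner_bend_inplace` is hypothetical. Cycle counts on a C67xx have not been measured here.

```c
#include <stdint.h>

/* In-place 32x32 bit transpose by recursive block swaps
 * (hypothetical helper name). On return, bit k of a[i]
 * holds what was bit i of a[k] on entry. */
void corner_bend_inplace(uint32_t a[32])
{
    uint32_t m = 0x0000FFFFu;   /* selects the low j bits of each 2j-bit group */

    /* j is the block size: 16, 8, 4, 2, 1; the mask is refined after j is halved */
    for (int j = 16; j != 0; j >>= 1, m ^= m << j) {
        /* visit every word index k that has bit j clear */
        for (int k = 0; k < 32; k = (k + j + 1) & ~j) {
            /* swap the upper j-bit blocks of a[k] with the lower ones of a[k+j] */
            uint32_t t = ((a[k] >> j) ^ a[k + j]) & m;
            a[k]     ^= t << j;
            a[k + j] ^= t;
        }
    }
}
```

Applying the function twice restores the original array, which makes for a quick sanity check next to a bit-by-bit comparison against the brute-force loop.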

Hope this helps,

Andrew

_____________________________________