Reply by Vladimir Vassilevsky July 26, 2007
"Edison" <ebell@senscient.com> wrote in message
news:hqOdnU38D7xO_jXbnZ2dnUVZ_g2dnZ2d@giganews.com...
> Hi
>
> I'm having some issues with a SHARC ADSP 21369. I'm performing a number of
> FFTs on 8k sample blocks using the ADSP21369 EZKIT LITE and have
> encountered a show stopper.
>
> If I run the FFT using the SDRAM to store the buffers (too big for
> internal) then it takes 10 times as long. 2-3 times I would accept. I have
> timed reads and writes to the SDRAM and they are within 3 times the number
> of cycles.
>
> Has anyone else encountered / fixed this problem?
SDRAM read access by the DSP core is VERY SLOW. A single access can take on the order of 10 bus clock cycles, depending on the access pattern and the SDRAM configuration.

A better approach is to run the FFT in the internal memory of the DSP and copy the data to and from SDRAM by DMA. While one block is being processed, the DMA stores the previous block and fetches the next one, so the DSP runs at full speed with almost zero overhead.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
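
A minimal sketch of the buffer rotation behind the "process one block while DMA stores the previous and fetches the next" idea. It is a sequential Python simulation, not SHARC code: `dma_fetch` and `dma_store` are hypothetical stand-ins for DMA transfers that real hardware would run in the background, and the use of three rotating internal buffers is an assumption made here for clarity.

```python
def process_stream(external, block_len, process):
    """Apply `process` to `external` block by block via rotating internal buffers.

    `external` models SDRAM; `buffers` models internal memory. The dma_* helpers
    are hypothetical placeholders: on real hardware they would be started as
    background DMA transfers and overlap with the call to `process`.
    """
    num_blocks = len(external) // block_len
    buffers = [None, None, None]  # incoming, active, outgoing (roles rotate)

    def dma_fetch(block_idx, buf_idx):
        start = block_idx * block_len
        buffers[buf_idx] = external[start:start + block_len]

    def dma_store(block_idx, buf_idx):
        start = block_idx * block_len
        external[start:start + block_len] = buffers[buf_idx]

    if num_blocks == 0:
        return

    dma_fetch(0, 0)  # prime the pipeline with the first block
    for i in range(num_blocks):
        cur = i % 3
        if i + 1 < num_blocks:
            dma_fetch(i + 1, (i + 1) % 3)  # next block in (would overlap)
        if i >= 1:
            dma_store(i - 1, (i - 1) % 3)  # previous result out (would overlap)
        buffers[cur] = process(buffers[cur])  # core works on internal memory only
    dma_store(num_blocks - 1, (num_blocks - 1) % 3)  # flush the last result


sdram = list(range(12))
process_stream(sdram, 4, lambda block: [2 * v for v in block])
```

At every step the fetch target, the active buffer, and the store source are three different buffers, so the core never touches memory that a transfer is using.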
Reply by Andor July 26, 2007
On 26 Jul., 12:55, "Edison" <eb...@senscient.com> wrote:
> >I fixed the problem by writing a special FFT routine that breaks a
> >large FFT into a number of smaller FFTs that fit into internal memory,
> >and uses DMA to transfer in- and output of the smaller FFTs to SDRAM
> >during the FFT processing. It turned out that computing the twiddle
> >factors on the fly was faster than pre-computing and storing in SDRAM.
> >This was on the 21161, with two 21161 connected as cluster and sharing
> >the SDRAM. It was a bitch.
>
> How do you recombine the FFTs to make the larger one? does this not
> require a large FFT running from the SDRAM?
No, it just means that you have to implement the input data detangling (bit-reversed addressing) for the first couple of stages of the DIT FFT by hand. Because you can load a whole block of data into internal memory at a time, this is faster than an FFT that runs directly on SDRAM. See also out-of-core FFT algorithms.
>
> Is there a library available for this sort of thing? I can handle complex
> DMA stuff but writing FFT code is not something I have tried and would
> like to avoid.
None that I know of.

Regards,
Andor
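
To illustrate how smaller FFTs can be recombined without ever running one large FFT on external memory, here is a minimal sketch of the generic Cooley-Tukey split N = n1 * n2. This is the textbook decomposition, not necessarily the routine described above; the names `fft_small`, `fft_split`, and `dft_direct` are made up for this example, and plain Python lists stand in for both SDRAM and internal memory.

```python
import cmath


def fft_small(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_small(x[0::2])
    odd = fft_small(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_split(x, n1, n2):
    """Length n1*n2 FFT built only from length-n2 and length-n1 FFTs."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 small FFTs over the strided sub-sequences x[r], x[r+n1], ...
    # (each one is a block that could be fetched into internal memory).
    rows = [fft_small(x[r::n1]) for r in range(n1)]
    # Step 2: twiddle multiply by W_n^(r*k2).
    for r in range(n1):
        for k2 in range(n2):
            rows[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)
    # Step 3: n2 small FFTs across the rows; result index is n2*k1 + k2.
    out = [0j] * n
    for k2 in range(n2):
        col = fft_small([rows[r][k2] for r in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


def dft_direct(x):
    """O(n^2) reference DFT used only to check the result."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


x = [complex(i % 7, -(i % 5)) for i in range(64)]
max_err = max(abs(a - b) for a, b in zip(fft_split(x, 8, 8), dft_direct(x)))
```

For an 8k-point transform one possible split is 64 x 128, so that no single FFT touches more than 128 points at a time. Both factors must be powers of two here only because `fft_small` is radix-2.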
Reply by Edison July 26, 2007

>I fixed the problem by writing a special FFT routine that breaks a
>large FFT into a number of smaller FFTs that fit into internal memory,
>and uses DMA to transfer in- and output of the smaller FFTs to SDRAM
>during the FFT processing. It turned out that computing the twiddle
>factors on the fly was faster than pre-computing and storing in SDRAM.
>This was on the 21161, with two 21161 connected as cluster and sharing
>the SDRAM. It was a bitch.
How do you recombine the FFTs to make the larger one? Does this not require a large FFT running from the SDRAM?

Is there a library available for this sort of thing? I can handle complex DMA stuff, but writing FFT code is not something I have tried and I would like to avoid it.
Reply by Andor July 26, 2007
On 26 Jul., 11:05, "Edison" <eb...@senscient.com> wrote:
> Hi
>
> I'm having some issues with a SHARC ADSP 21369. I'm performing a number of
> FFTs on 8k sample blocks using the ADSP21369 EZKIT LITE and have
> encountered a show stopper.
>
> If I run the FFT using the SDRAM to store the buffers (too big for
> internal) then it takes 10 times as long. 2-3 times I would accept. I have
> timed reads and writes to the SDRAM and they are within 3 times the number
> of cycles.
SDRAM timing is somewhat opaque. Depending on the setup, a 10-times slowdown isn't unrealistic, especially if the SDRAM cycle time is longer than the processor cycle time; I think this is a factor of 3 on the 21369. It may well be that you have to wait 7 SDRAM cycles for a read access (which translates to 21 DSP cycles of wait state, compared to a 1-cycle read access for internal memory).
> Has anyone else encountered / fixed this problem?
I fixed the problem by writing a special FFT routine that breaks a large FFT into a number of smaller FFTs that fit into internal memory, and uses DMA to transfer in- and output of the smaller FFTs to SDRAM during the FFT processing. It turned out that computing the twiddle factors on the fly was faster than pre-computing them and storing them in SDRAM. This was on the 21161, with two 21161 connected as a cluster and sharing the SDRAM. It was a bitch.

I think the 21369 needs some special setup to achieve an (almost) single-SDRAM-cycle block transfer rate.

Regards,
Andor
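
One common way to compute twiddle factors on the fly is a simple recurrence that costs one complex multiply per factor instead of one table read from slow memory. The post does not say which method was actually used, so the sketch below is only an assumption for illustration, and `twiddles_on_the_fly` is a made-up name.

```python
import cmath


def twiddles_on_the_fly(n, count):
    """Yield W_n^k = exp(-2*pi*j*k/n) for k = 0 .. count-1.

    Each factor is obtained from the previous one with a single complex
    multiply, so no table has to be read from memory. Rounding error
    accumulates with k, more noticeably in single precision than in the
    double precision Python uses here.
    """
    step = cmath.exp(-2j * cmath.pi / n)
    w = 1 + 0j
    for _ in range(count):
        yield w
        w *= step


# Compare against directly evaluated factors for an 8k-point FFT.
n = 8192
worst = max(abs(w - cmath.exp(-2j * cmath.pi * k / n))
            for k, w in enumerate(twiddles_on_the_fly(n, n // 2)))
```

If the accumulated error matters, a usual remedy is to restart the recurrence from an exactly computed value every so often.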
Reply by Edison July 26, 2007
Hi

I'm having some issues with a SHARC ADSP 21369. I'm performing a number of
FFTs on 8k sample blocks using the ADSP21369 EZKIT LITE and have
encountered a show stopper. 

If I run the FFT using the SDRAM to store the buffers (too big for
internal) then it takes 10 times as long. 2-3 times I would accept. I have
timed reads and writes to the SDRAM and they are within 3 times the number
of cycles.

Has anyone else encountered / fixed this problem?

Ed