Reply by bhaskar_thiagarajan January 16, 20022002-01-16
You can read the ADSP 21k Family Application Handbook for more detail
on a lot of their examples and libraries. (I'm not sure if this is
available in electronic format on the web site).

Anyways, the last but one stage is
computed separately to optimize the butterfly operations. This stage
happens to have only 2 butterflies for each group. So they combine
the 2 butterflies into a single loop to reduce overhead, thus gaining
some extra speed.

The last stage has only 1 butterfly per group and this is again done
as a special case to gain speed (at the cost of writing a little
extra code)

Cheers
Bhaskar --- In adsp@y..., Prashanth Bhushan <pb@s...> wrote:
> Hi,
>
> I was going through the FFT code hosted by Analog Devices where they
> have benchmarked a 128 point real FFT to take just 1160 cycles on
ADSP
> 21062 !
>
> The implementation is split into 4 parts. Out of N stages first part
> computes the first 2 stages . Second part computes for N-4 stages ,
> third part computes the last but one stage and the 4th part
computes the
> last stage.
>
> I needed to know why the N -1th stage is computed separately.
>
> Thanks in advance
> Regds
> Prashanth


Reply by Andrew V. Nesterov January 16, 20022002-01-16

> Date: Tue, 15 Jan 2002 18:12:09 +0530
> From: Prashanth Bhushan <>
>
> Hi,
>
> I was going through the FFT code hosted by Analog Devices where they
> have benchmarked a 128 point real FFT to take just 1160 cycles on ADSP
> 21062 !
>
> The implementation is split into 4 parts. Out of N stages first part
> computes the first 2 stages . Second part computes for N-4 stages ,
> third part computes the last but one stage and the 4th part computes the
> last stage.

Sounds like they use Meyer-Schwartz algorithm. And if this is the case,

> I needed to know why the N -1th stage is computed separately.

on this stage it is possible to compute 4 radix-2 butterflies in a
single loop iteration, which is not possible on any other stages
out of M, where N=2**M.

Regards,
Andrew

> Thanks in advance
> Regds
> Prashanth
>


Reply by Prashanth Bhushan January 15, 20022002-01-15
Hi,

I was going through the FFT code hosted by Analog Devices where they
have benchmarked a 128 point real FFT to take just 1160 cycles on ADSP
21062 !

The implementation is split into 4 parts. Out of N stages first part
computes the first 2 stages . Second part computes for N-4 stages ,
third part computes the last but one stage and the 4th part computes the
last stage.

I needed to know why the N -1th stage is computed separately.

Thanks in advance
Regds
Prashanth