> Hi All,
>
> Is there a big saving (computational complexity) of Radix-4FFT over
> Radix-2FFT.
>
> Given: 512 Points FFT.
> Processor: Lucent 16410(C/speed -> 195MHz).
I can tell you for 2106x SHARC DSPs:
1024 point complex Radix-2 FFT: 20052 cycles
1024 point complex Radix-4 FFT: 18220 cycles
256 point complex Radix-2 FFT: 4185 cycles
256 point complex Radix-4 FFT: 3787 cycles
(512 = 2^9 cannot be expressed as integer power of 4 ...)
That makes a speedup factor of around 1.10. You can assume the Lucent
code scales approximately equal.
> There is a reference FFT assembly library(with Radix 2) which gives an
> estimate of 13592 cylces. It looks too high for my application. My
> opinion is that if we
> can manage to compute one butterfly within 10 cycles - the cycles
> should be within 3000 ( 512/2*10 + overhead) cycles!
Take a smaller FFT? I find that FFT code is usually quite optimized by
the vendor and takes full advantage of the DSPs architecture (to make
it look good on DSP benchmarks, where FFTs always show up). You are
going to have a hard time trying to improve its efficiency.
Perhaps another approch for your problem (bandpass filter, sliding
DFT, Goertzel) would be more adequate than trying to improve your FFT
code.
Regards,
Andor
Reply by santosh nath●February 17, 20042004-02-17
Hi All,
Is there a big saving (computational complexity) of Radix-4FFT over
Radix-2FFT.
Given: 512 Points FFT.
Processor: Lucent 16410(C/speed -> 195MHz).
There is a reference FFT assembly library(with Radix 2) which gives an
estimate of 13592 cylces. It looks too high for my application. My
opinion is that if we
can manage to compute one butterfly within 10 cycles - the cycles
should be within 3000 ( 512/2*10 + overhead) cycles! and hopefully
more savings on Radix-4. The sequence is complex. A single complex
multiplication can be done within 2-3 cycles using MACs in 16410(Can
explain for future comm. ...).
Any ref./pointer is appriciated.
Thanks,
Santosh