Reply by Andor February 17, 20042004-02-17
santosh nath wrote:
> Hi All, > > Is there a big saving (computational complexity) of Radix-4FFT over > Radix-2FFT. > > Given: 512 Points FFT. > Processor: Lucent 16410(C/speed -> 195MHz).
I can tell you for 2106x SHARC DSPs: 1024 point complex Radix-2 FFT: 20052 cycles 1024 point complex Radix-4 FFT: 18220 cycles 256 point complex Radix-2 FFT: 4185 cycles 256 point complex Radix-4 FFT: 3787 cycles (512 = 2^9 cannot be expressed as integer power of 4 ...) That makes a speedup factor of around 1.10. You can assume the Lucent code scales approximately equal.
> There is a reference FFT assembly library(with Radix 2) which gives an > estimate of 13592 cylces. It looks too high for my application. My > opinion is that if we > can manage to compute one butterfly within 10 cycles - the cycles > should be within 3000 ( 512/2*10 + overhead) cycles!
Take a smaller FFT? I find that FFT code is usually quite optimized by the vendor and takes full advantage of the DSPs architecture (to make it look good on DSP benchmarks, where FFTs always show up). You are going to have a hard time trying to improve its efficiency. Perhaps another approch for your problem (bandpass filter, sliding DFT, Goertzel) would be more adequate than trying to improve your FFT code. Regards, Andor
Reply by santosh nath February 17, 20042004-02-17
Hi All,

Is there a big saving (computational complexity) of Radix-4FFT over
Radix-2FFT.

Given: 512 Points FFT.
Processor: Lucent 16410(C/speed -> 195MHz).

There is a reference FFT assembly library(with Radix 2) which gives an
estimate of 13592 cylces. It looks too high for my application. My
opinion is that if we
can manage to compute one butterfly within 10 cycles - the cycles
should be within 3000 ( 512/2*10 + overhead) cycles! and hopefully
more savings on Radix-4. The sequence is complex. A single complex
multiplication can be done within 2-3 cycles using MACs in 16410(Can
explain for future comm. ...).

Any ref./pointer is appriciated.

Thanks,
Santosh