DSPRelated.com
Forums

Re: Execution times of FFT in MATLAB vs complex FFT in C

Started by Unknown May 30, 2004
>I've written a relatively efficient radix 2 fft engine for a commercial >voice quality measurement product (while I was working) which largely >modelled the one in Numerical Recipe.
I've never seen a radix-2 FFT code that was efficient...
>I did some research. Apparently, most people find integrating the fft in >Intel's library in general does indeed give 5 times of speed enhancement.
It's not just Intel's library. I've been benchmarking FFTs for seven years now, and the Numerical Recipes code (along with other textbook radix-2 codes) has always been on the order of 5 times slower than the fastest implementations, on essentially every modern CPU. (See www.fftw.org/speed ...look at the single-precision graphs to find Numerical Recipes.) In order to fully exploit the pipelines, register sets, and caches of CPUs from the last decade, you need to have a larger kernel (among other things) than the simple 2-point butterfly in a classic radix-2 algorithm. Unfortunately, Numerical Recipes' discussion of the benefits of larger radices, etc., is 20 years out of date (they claim that you can only get a 20% improvement, which is true only if the speed were determined entirely by the flop count ... that would make too much sense for machines these days). Architecture-specific hacks like SSE2 help Intel's library as well, of course (as well as FFTW, which also uses SIMD in the latest release), but this is only another factor of ~2 (in double precision).
>I wonder if anyone see the same amount of speed performance using Intel's >library with AMD's processor (which is also MMX compatible).
See the above URL for benchmarks on AMD chips. Cordiall, Steven G. Johnson