george.bush@whitehouse.com (George W. Bush) wrote:
> I benchmarked FFTW 1.3 and a couple of others, one derived from the 87 papers 
> of Henrick Sorensen and the other by Ron Mayer based on the FHT.  FFTW was the 
> fastest by far based, I believe, on improved use of cache.  FFTW is what I use 
> now for everything.  I haven't benched the new version of FFTW though and use 
> 2.1.5.

(Our newly updated benchmarks at http://www.fftw.org/benchfft might
interest you.)

The origin of speed differences between FFT codes depends a lot on
what size transform you consider, and even then it is complicated (you
get 10% here and 10% there, and after a while it adds up).  The memory
hierarchy is critical, especially for very large transforms, but it
isn't the whole story.  It ranges from better use of the CPU pipeline
and register set via long blocks of unrolled code (generated and
optimized by a special-purpose program, in FFTW's case), to hacks to
get around the fact that Pentia integer multiplication nixes the
floating-point unit (grrr), to many tricks for improved cache
utilization (explicit recursive divide & conquer, ordering of
precomputed trig. factors, strategic mixing of in-place and
out-of-place algorithms...), and so on...and, in FFTW's case,
self-optimization to put them all together in the "best" way.  These,
days, of course, SIMD instructions throw another big wrench into the
mix.

Andrzej wrote:
> I want to built fast routine of FFT in C/C++, eg.
> void  fft(double  *xr, double   *xi, double   *yr, double   *yi, int  n )
> n = 2^m;
> however I don't have a lot of experience in this.

A caution: textbook FFT implementations, while pedagogical, are almost
always several times slower than the fastest codes, and it's unlikely
that you can remedy this without significant structural changes. 
Writing your own is a good learning experience, but there are many
excellent free codes (not just FFTW) on the web that you would likely
be better off with if you care about performance.

(Also, by the way, using separate arrays for the real and imaginary
parts is hard on the cache and typically slower than interleaved
formats, ceteris paribus.)

Steven

I benchmarked FFTW 1.3 and a couple of others, one derived from the 87 papers 
of Henrick Sorensen and the other by Ron Mayer based on the FHT.  FFTW was the 
fastest by far based, I believe, on improved use of cache.  FFTW is what I use 
now for everything.  I haven't benched the new version of FFTW though and use 
2.1.5.

In article <bhnhc5$scd$1@news.onet.pl>, "Andrzej" <nospam@nospam.pl> wrote:
>Hello All!
>I want to built fast routine of FFT in C/C++, eg.
>
>void  fft(double  *xr, double   *xi, double   *yr, double   *yi, int  n )
>
>n = 2^m;
>
>however I don't have a lot of experience in this. Should I use FFTW (Fastest
>Fourier Transform in the West)? Can it be really the fastest in double
>precision floating point? And Why?
>
>Any comments are apprecialted.
>Thanks,
>Andrzej
>
>

Andrzej wrote:
> 
> Hello All!
> I want to built fast routine of FFT in C/C++, eg.
> 
> void  fft(double  *xr, double   *xi, double   *yr, double   *yi, int  n )
> 
> n = 2^m;
> 
> however I don't have a lot of experience in this. 

> Should I use FFTW (Fastest Fourier Transform in the West)? 

Yes, it works and is well tested by thousands of people throughout
the world.

> Can it be really the fastest in double
> precision floating point? 

The benchmarks are there on the FFTW web site.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"In my opinion, shareware tends to combine the worst of 
commercial software (no sources) with the worst of free 
software (no finishing touches). I simply do not believe 
in the shareware market at all."  -- Linus Torvalds

Hello All!
I want to built fast routine of FFT in C/C++, eg.

void  fft(double  *xr, double   *xi, double   *yr, double   *yi, int  n )

n = 2^m;

however I don't have a lot of experience in this. Should I use FFTW (Fastest
Fourier Transform in the West)? Can it be really the fastest in double
precision floating point? And Why?

Any comments are apprecialted.
Thanks,
Andrzej