Reply by ## February 17, 2006
stevenj@alum.mit.edu wrote in news:1140208214.201862.187120@g14g2000cwa.googlegroups.com:

> We have benchmarks comparing many different FFT codes at
> www.fftw.org/speed
I did have a look at it.
> Be sure you are using FFTW properly (if you re-create the plan for each
> transform you will slow it down tremendously).  See the FAQ:
> http://www.fftw.org/faq/section3.html#slow
My plans are created once per type of array, before the start of the simulation proper. So this is not the problem.
> (Note that our DCT code currently doesn't exploit SSE, unfortunately,
> so for that particular problem Intel may beat us by a large margin if
> they do.  For such small DCTs you can speed FFTW up tremendously,
> however, by generating size-specific DCT codelets; see the "generating
> your own code" section of the manual.)
>
> Note also that for such small transforms working out-of-place might be
> faster.  You should also definitely use FFTW's configure script to
> compile FFTW with gcc (e.g. via MinGW), since it turns out that the
> proper choice of compiler flags makes a big difference and the best
> choice is somewhat counter-intuitive.
This I will try. Thank you very much. FFTW is still a great set of transforms. I am amazed that, even as the code stands, I can do DNS simulations on a home PC that I used to run on a Cray X-MP in the early 90s.
--
> Best of luck,
> Steven G. Johnson
Reply by February 17, 2006
We have benchmarks comparing many different FFT codes at
www.fftw.org/speed

Be sure you are using FFTW properly (if you re-create the plan for each
transform you will slow it down tremendously).  See the FAQ:
http://www.fftw.org/faq/section3.html#slow

(Note that our DCT code currently doesn't exploit SSE, unfortunately,
so for that particular problem Intel may beat us by a large margin if
they do.  For such small DCTs you can speed FFTW up tremendously,
however, by generating size-specific DCT codelets; see the "generating
your own code" section of the manual.)

Note also that for such small transforms working out-of-place might be
faster.  You should also definitely use FFTW's configure script to
compile FFTW with gcc (e.g. via MinGW), since it turns out that the
proper choice of compiler flags makes a big difference and the best
choice is somewhat counter-intuitive.

Best of luck,
Steven G. Johnson

Reply by ## February 17, 2006

Hello,
     I am using FFTW in a fluid simulation code for 1D and 2D FFTs and
cosine transforms associated with Chebyshev differentiation. All this
is done on a P4 PC running Windows XP. The code is written in C in the
Dev-C++ IDE, i.e. the compiler is gcc.
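
For readers unfamiliar with where the cosine transform enters: a common way to do Chebyshev differentiation is to transform grid values to Chebyshev coefficients with a cosine transform, differentiate in coefficient space, and transform back. The post does not say which variant is used, so the sketch below is an assumption. It shows only the coefficient-space step, using the standard backward recurrence c_k b_k = b_{k+2} + 2(k+1) a_{k+1} with c_0 = 2 and c_k = 1 otherwise. The function name is hypothetical.

```c
#include <stddef.h>

/* Given Chebyshev coefficients a[0..n] of u(x) = sum_k a_k T_k(x),
 * write the coefficients b[0..n] of du/dx into b.
 * Both arrays must hold n + 1 doubles. */
void cheb_deriv_coeffs(const double *a, double *b, int n)
{
    b[n] = 0.0;                            /* derivative has degree n - 1 */
    if (n >= 1)
        b[n - 1] = 2.0 * n * a[n];         /* recurrence with b_{n+1} = 0 */
    for (int k = n - 2; k >= 0; --k)
        b[k] = b[k + 2] + 2.0 * (k + 1) * a[k + 1];
    b[0] *= 0.5;                           /* c_0 = 2 */
}
```

As a check, T_3(x) = 4x^3 - 3x has derivative 12x^2 - 3 = 3 T_0 + 6 T_2, so a = {0, 0, 0, 1} gives b = {3, 0, 6, 0}.
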

To my surprise, the simulation run times are dominated by the FFTs.
(From past experience I expected the matrix inversion in the elliptic
part of the code to dominate CPU usage.) Standard sizes of the
transforms: 128/192/256/384/512 for the FFTs in 1D/2D and 129/257 for
the cosine transform, in double precision and in place.

So I am on the lookout for FFTs that run faster on the P4 than FFTW. I
am aware of the Intel offering, so it is clearly possible, but since I
am doing this on my home PC in my spare time I would rather use
freeware. Has anyone heard of or used freeware FFTs that run faster
than FFTW, specifically on a P4 home PC? Any suggestions would be
appreciated.

Thanks.
--