DSPRelated.com
Forums

FFT and IFFT in C/C++

Started by Unknown August 17, 2003
Hello All!
I want to built fast routine of FFT in C/C++, eg.

void  fft(double  *xr, double   *xi, double   *yr, double   *yi, int  n )

n = 2^m;

however I don't have a lot of experience in this. Should I use FFTW (Fastest
Fourier Transform in the West)? Can it be really the fastest in double
precision floating point? And Why?

Any comments are apprecialted.
Thanks,
Andrzej


Andrzej wrote:
> > Hello All! > I want to built fast routine of FFT in C/C++, eg. > > void fft(double *xr, double *xi, double *yr, double *yi, int n ) > > n = 2^m; > > however I don't have a lot of experience in this.
> Should I use FFTW (Fastest Fourier Transform in the West)?
Yes, it works and is well tested by thousands of people throughout the world.
> Can it be really the fastest in double > precision floating point?
The benchmarks are there on the FFTW web site. Erik -- +-----------------------------------------------------------+ Erik de Castro Lopo nospam@mega-nerd.com (Yes it's valid) +-----------------------------------------------------------+ "In my opinion, shareware tends to combine the worst of commercial software (no sources) with the worst of free software (no finishing touches). I simply do not believe in the shareware market at all." -- Linus Torvalds
I benchmarked FFTW 1.3 and a couple of others, one derived from the 87 papers 
of Henrick Sorensen and the other by Ron Mayer based on the FHT.  FFTW was the 
fastest by far based, I believe, on improved use of cache.  FFTW is what I use 
now for everything.  I haven't benched the new version of FFTW though and use 
2.1.5.

In article <bhnhc5$scd$1@news.onet.pl>, "Andrzej" <nospam@nospam.pl> wrote:
>Hello All! >I want to built fast routine of FFT in C/C++, eg. > >void fft(double *xr, double *xi, double *yr, double *yi, int n ) > >n = 2^m; > >however I don't have a lot of experience in this. Should I use FFTW (Fastest >Fourier Transform in the West)? Can it be really the fastest in double >precision floating point? And Why? > >Any comments are apprecialted. >Thanks, >Andrzej > >
george.bush@whitehouse.com (George W. Bush) wrote:
> I benchmarked FFTW 1.3 and a couple of others, one derived from the 87 papers > of Henrick Sorensen and the other by Ron Mayer based on the FHT. FFTW was the > fastest by far based, I believe, on improved use of cache. FFTW is what I use > now for everything. I haven't benched the new version of FFTW though and use > 2.1.5.
(Our newly updated benchmarks at http://www.fftw.org/benchfft might interest you.) The origin of speed differences between FFT codes depends a lot on what size transform you consider, and even then it is complicated (you get 10% here and 10% there, and after a while it adds up). The memory hierarchy is critical, especially for very large transforms, but it isn't the whole story. It ranges from better use of the CPU pipeline and register set via long blocks of unrolled code (generated and optimized by a special-purpose program, in FFTW's case), to hacks to get around the fact that Pentia integer multiplication nixes the floating-point unit (grrr), to many tricks for improved cache utilization (explicit recursive divide & conquer, ordering of precomputed trig. factors, strategic mixing of in-place and out-of-place algorithms...), and so on...and, in FFTW's case, self-optimization to put them all together in the "best" way. These, days, of course, SIMD instructions throw another big wrench into the mix. Andrzej wrote:
> I want to built fast routine of FFT in C/C++, eg. > void fft(double *xr, double *xi, double *yr, double *yi, int n ) > n = 2^m; > however I don't have a lot of experience in this.
A caution: textbook FFT implementations, while pedagogical, are almost always several times slower than the fastest codes, and it's unlikely that you can remedy this without significant structural changes. Writing your own is a good learning experience, but there are many excellent free codes (not just FFTW) on the web that you would likely be better off with if you care about performance. (Also, by the way, using separate arrays for the real and imaginary parts is hard on the cache and typically slower than interleaved formats, ceteris paribus.) Steven