faster FFTW?

Started by AL April 19, 2004
Hello All,
I haven't been able to find a similar thread, so pardon me if this has
already been addressed. But I was wondering if anyone has attempted to
convert the latest FFTW release to work with integers (taking into
account the necessary trig. modifications, later float conversion for
proper numerical results, etc.) in order to get faster results when
dealing with a real-time data stream. Since the application will be
used on a P-IV XEON, I found their 'threads enabling' option to be
somewhat beneficial, but I wanted to see if one could also take
advantage of the two ALUs on the P-IVs, by working with integers. Now
I've looked into the source, and since I'm not a compiler guru, the
part that's written in OCaml is heady, to say the least! I would
greatly appreciate any comments on taking this route, or pointers
towards a better approach.
Regards,
AL
fortranchick@hotmail.com (AL) wrote in message news:<f844b0ef.0404191815.1a0e75f7@posting.google.com>...
> Hello All,
> I haven't been able to find a similar thread, so pardon me if this has
> already been addressed. But I was wondering if anyone has attempted to
> convert the latest FFTW release to work with integers (taking into
> account the necessary trig. modifications, later float conversion for
> proper numerical results, etc.) in order to get faster results when
> dealing with a real-time data stream. Since the application will be
> used on a P-IV XEON, I found their 'threads enabling' option to be
> somewhat beneficial, but I wanted to see if one could also take
> advantage of the two ALUs on the P-IVs, by working with integers. Now
> I've looked into the source, and since I'm not a compiler guru, the
> part that's written in OCAML is heady, to say the least! I would
> greatly appreciate any comments on taking this route, or point me
> towards a better approach.

I'm not aware of anyone who has converted the whole of FFTW to fixed point. FFTW is designed for general-purpose CPUs, like the Pentium, that have a floating-point unit, and for these CPUs integer FFTs are usually slower in our experience (e.g. the fp unit can operate in parallel with integer address arithmetic).

Our next version of FFTW, not yet released, includes an undocumented hack that allows you to generate codelets (hard-coded DFTs of small sizes) that use macros for things like multiplying two numbers, so you can replace these with a simple type of fixed-point operation (multiply then shift). Just as a test, I tried this on my 2.2GHz Pentium-IV and compared the speed of a hard-coded size-64 DFT in fixed-point (32-bit) and floating-point (single precision) arithmetic, compiled with gcc, and the fixed-point version was 20-30% slower.

Of course, if you are willing to sacrifice precision and go to 16-bit types, you might get some additional speedup on a Pentium by using the 8-way MMX instructions. For 32-bit types, however, both integer and floating-point have 4-way SIMD.

Cordially,
Steven G. Johnson
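
For readers unfamiliar with the "multiply then shift" idea mentioned in this reply, here is a minimal sketch in C of what such a fixed-point operation might look like inside a radix-2 butterfly. All names here (FX_MUL, FRAC_BITS, fx_t, fx_complex, fx_cmul, fx_butterfly and the conversion helpers) are hypothetical and chosen only for illustration; they are not FFTW's actual macros or codelet interface. The sketch also assumes 15 fractional bits in a 32-bit word and an arithmetic right shift for negative values, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not FFTW's codelet interface. */
#define FRAC_BITS 15            /* assumed format: 15 fractional bits in 32 bits */

typedef int32_t fx_t;
typedef struct { fx_t re, im; } fx_complex;

/* Fixed-point multiply: widen to 64 bits, multiply, then shift back down.
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
#define FX_MUL(a, b) ((fx_t)(((int64_t)(a) * (int64_t)(b)) >> FRAC_BITS))

/* Convert a double to fixed point, rounding half away from zero. */
fx_t fx_from_double(double x)
{
    assert(x > -65536.0 && x < 65536.0);   /* representable range for this format */
    return (fx_t)(x * (double)(1 << FRAC_BITS) + (x >= 0 ? 0.5 : -0.5));
}

/* Convert back to double, e.g. for the final floating-point results. */
double fx_to_double(fx_t x)
{
    return (double)x / (double)(1 << FRAC_BITS);
}

/* Complex multiply (a * w) using four fixed-point multiplies. */
fx_complex fx_cmul(fx_complex a, fx_complex w)
{
    fx_complex r;
    r.re = FX_MUL(a.re, w.re) - FX_MUL(a.im, w.im);
    r.im = FX_MUL(a.re, w.im) + FX_MUL(a.im, w.re);
    return r;
}

/* Radix-2 butterfly: out0 = a + w*b, out1 = a - w*b.
 * No overflow handling or scaling is done here. */
void fx_butterfly(fx_complex a, fx_complex b, fx_complex w,
                  fx_complex *out0, fx_complex *out1)
{
    fx_complex t = fx_cmul(b, w);
    out0->re = a.re + t.re;  out0->im = a.im + t.im;
    out1->re = a.re - t.re;  out1->im = a.im - t.im;
}
```

In a floating-point build the same macro would simply expand to `(a) * (b)`, which is why swapping the macro is enough to switch arithmetic. Note that a real fixed-point FFT also has to manage overflow (for example by scaling at each stage), which this sketch leaves out.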
fortranchick@hotmail.com (AL) wrote in message news:<f844b0ef.0404191815.1a0e75f7@posting.google.com>...

P.S. I should also mention that I've tried enabling all available
compiler optimizations on Intel's icc compiler, in hopes of better
performance, but came to realize that FFTW's inherent optimizations
are quite clever: on said machine, the code built without compiler
optimization is only slightly slower.
Much thanks in advance. -AL