Hello All,

I haven't been able to find a similar thread, so pardon me if this has already been addressed. I was wondering if anyone has attempted to convert the latest FFTW release to work with integers (taking into account the necessary trig. modifications, later float conversion for proper numerical results, etc.) in order to get faster results when dealing with a real-time data stream. Since the application will run on a P-IV Xeon, I found FFTW's 'threads enabling' option to be somewhat beneficial, but I wanted to see if one could also take advantage of the P-IV's two ALUs by working with integers. I've looked into the source, and since I'm not a compiler guru, the part that's written in OCaml is heady, to say the least! I would greatly appreciate any comments on taking this route, or pointers towards a better approach.

Regards,
AL
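A minimal sketch of the float conversion step AL mentions, assuming a 16-bit Q15 fixed-point format (an integer v stands for v/32768). The helper names `float_to_q15`, `q15_to_float` and `block_to_q15` are hypothetical and not part of FFTW:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not FFTW API). Q15: an int16_t v represents v / 32768,
   so the representable range is [-1, 1 - 2^-15]. Inputs are assumed finite. */
static int16_t float_to_q15(float x) {
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) return INT16_MAX;   /* saturate rather than wrap */
    if (scaled <= -32768.0f) return INT16_MIN;
    /* round to nearest, ties away from zero (the cast truncates toward zero) */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

static float q15_to_float(int16_t v) {
    return (float)v / 32768.0f;
}

/* Convert one block of the real-time stream before handing it to an integer FFT. */
static void block_to_q15(const float *in, int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = float_to_q15(in[i]);
}
```

The twiddle factors (the "trig. modifications") would be quantized the same way. Because an unscaled size-N DFT can grow values by up to a factor of N, a fixed-point kernel typically has to scale down as it goes (for example, halving after each radix-2 stage) before its output is converted back with `q15_to_float`.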
faster FFTW?
Started by AL, April 19, 2004
Reply by Steven G. Johnson, April 20, 2004
fortranchick@hotmail.com (AL) wrote in message news:<f844b0ef.0404191815.1a0e75f7@posting.google.com>...
> I was wondering if anyone has attempted to
> convert the latest FFTW release to work with integers (taking into
> account the necessary trig. modifications, later float conversion for
> proper numerical results, etc.) in order to get faster results when
> dealing with a real-time data stream.

I'm not aware of anyone who has converted the whole of FFTW to fixed point. FFTW is designed for general-purpose CPUs, like the Pentium, that have a floating-point unit, and on these CPUs integer FFTs are usually slower in our experience (e.g. the fp unit can operate in parallel with the integer address arithmetic).

Our next version of FFTW, not yet released, includes an undocumented hack that lets you generate codelets (hard-coded DFTs of small sizes) that use macros for operations like multiplying two numbers, so you can replace these with a simple type of fixed-point operation (multiply, then shift). Just as a test, I tried this on my 2.2GHz Pentium-IV and compared the speed of a hard-coded size-64 DFT in fixed-point (32-bit) and floating-point (single-precision) arithmetic, compiled with gcc; the fixed-point version was 20-30% slower.

Of course, if you are willing to sacrifice precision and go to 16-bit types, you might get some additional speedup on a Pentium by using 8-way SIMD integer instructions (the MMX operations, extended to 128-bit registers by SSE2).
For 32-bit types, however, both integer and floating-point arithmetic have 4-way SIMD.

Cordially,
Steven G. Johnson
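A minimal sketch of the multiply-then-shift idea, applied to a hand-written size-8 radix-2 FFT rather than to an actual FFTW codelet. The macro `FIX_MUL`, the Q2.30 twiddle format, the `cfix` type and the per-stage halving are illustrative assumptions, not FFTW's generated code:

```c
#include <stdint.h>

/* Hypothetical names (not FFTW's macros): every real multiplication in the
   kernel goes through FIX_MUL, so switching arithmetic means changing one
   definition. Twiddles are stored in Q2.30 (1.0 == 2^30). */
#define FRAC_BITS 30
/* Widen to 64 bits, multiply, then shift back down. Right-shifting a negative
   value is implementation-defined in C; gcc and icc use an arithmetic shift. */
#define FIX_MUL(a, b) ((int32_t)(((int64_t)(a) * (int64_t)(b)) >> FRAC_BITS))

typedef struct { int32_t re, im; } cfix;

/* exp(-2*pi*i*k/8) for k = 0..3, scaled by 2^30 and rounded;
   759250125 = round(2^30 * sqrt(2)/2). */
static const cfix TW8[4] = {
    { 1073741824,           0 },
    {  759250125,  -759250125 },
    {          0, -1073741824 },
    { -759250125,  -759250125 },
};

/* In-place size-8 radix-2 decimation-in-time FFT in 32-bit fixed point.
   Each of the 3 stages halves its outputs, so on return x holds DFT(x)/8
   up to small truncation errors. Keep |re|, |im| of the input below 2^29
   so the butterfly sums cannot overflow. */
static void fft8_fixed(cfix x[8]) {
    static const int bitrev[8] = { 0, 4, 2, 6, 1, 5, 3, 7 };
    for (int i = 0; i < 8; i++) {               /* bit-reversal permutation */
        int j = bitrev[i];
        if (i < j) { cfix t = x[i]; x[i] = x[j]; x[j] = t; }
    }
    for (int len = 2; len <= 8; len <<= 1) {
        int stride = 8 / len;                   /* W_len^k == W_8^(k*stride) */
        for (int s = 0; s < 8; s += len) {
            for (int k = 0; k < len / 2; k++) {
                cfix w = TW8[k * stride];
                cfix *a = &x[s + k], *b = &x[s + k + len / 2];
                /* t = b * w: four fixed-point multiplies */
                int32_t tr = FIX_MUL(b->re, w.re) - FIX_MUL(b->im, w.im);
                int32_t ti = FIX_MUL(b->re, w.im) + FIX_MUL(b->im, w.re);
                int32_t ar = a->re, ai = a->im;
                /* butterfly, scaled by 1/2 to keep values in range */
                a->re = (ar + tr) >> 1;  a->im = (ai + ti) >> 1;
                b->re = (ar - tr) >> 1;  b->im = (ai - ti) >> 1;
            }
        }
    }
}
```

A floating-point build of the same kernel would define the multiply as a plain `a * b` and drop the shifts. Timing both versions on the target CPU, as in the size-64 test described above, is what settles which one is faster there.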
Reply by AL, April 20, 2004
fortranchick@hotmail.com (AL) wrote in message news:<f844b0ef.0404191815.1a0e75f7@posting.google.com>...

P.S. I should also mention that I've tried enabling all available compiler optimizations in Intel's icc compiler, in hopes of better performance, but came to realize that FFTW's inherent optimizations are quite clever: on this machine, building without compiler optimization produces only slightly slower code.

Many thanks in advance.
-AL