DSPRelated.com
Forums

FFTW speed !!

Started by m.baldasseroni February 6, 2007
First of all, thank you for your always "positive" attitude toward my
comments on FFTW, toward my work, and toward a person who is trying to
learn from his mistakes!!!
However, since October 2006 I have spent my time studying and working
on different aspects of my work, and I have improved my knowledge of
more than just the compile process!
I'm guessing that you have not read what I have written during the last
few days!

>> I'm guessing that you screwed up the compilation process. (From the
>> sound of things, you haven't even checked whether the Altivec code is
>> actually being used.)
In any case, I got past the step you are writing about three months ago and, I repeat, I'm sure that the Altivec optimization is active! To support my hypothesis, I built an fftw library with Altivec support and started my application with Altivec coprocessor support disabled through the taskSpawn() command of vxWorks. In this case the expected error occurs (Altivec unavailable). In particular, the error occurs on the first assembler instruction, "stvx". This shows that the Altivec optimization is active!
>> We routinely cross-compile FFTW for other platforms using the
>> configure script, and a Google search reveals numerous people using
>> autoconf configure script with vxWorks cross-compilers
In the fftw guide (par 8.3, line 32) I believe you suggest setting the various options and compiler characteristics in the config.h file! Do you think I have read it correctly or not?? Three months ago I followed your instructions and, for example, I enabled the Altivec optimization (config.h, lines 69-70 /* Define to enable Altivec optimizations. */ #define HAVE_ALTIVEC). Then, following other instructions that I think it is useless to repeat here, I generated code with fftw. When I used the script you suggested to me (using cygwin etc.), I obtained the same config.h file that I had previously configured by hand.

I stress that I'm spending my time on FFTW because I have to obtain the same results reported on the fftw.org site, because my work group needs them, and so I think I'm making a mistake that I'm not able to find.

I hope I have been explicit enough. In any case, I stress that the first purpose of a forum should not be to voice personal criticism of how other people approach problems, but to help colleagues and to discuss the main topic of the forum constructively!! I hope this can be the first step toward a good dialogue and not toward personal attacks!

Regards,
Massimo Baldasseroni
On Feb 13, 10:52 am, "m.baldasseroni" <mbaldasser...@progesi.it>
wrote:
> In any case, I have overtake the step you are writing about 3 months
> ago and, I repeat again, I'm sure that Altivec optimization is
> active!
Try running the fftwf_print_plan command to see what algorithm FFTW is using (the codelet names have a "v" in them for vectorized versions). You can also pass FFTW_NO_SIMD, which disables the SIMD code---if FFTW_NO_SIMD does not slow things down considerably, then the Altivec code is not being used.

Note that you are using an *ancient* version of gcc (2.95). If you read our FAQ, you'll find that some versions of gcc 2.95 don't compile FFTW's altivec code correctly. I would strongly recommend upgrading to gcc 3.4.4 at least. (I believe I recommended upgrading your gcc a year ago, too.)

Note also that the configure script picks different compiler flags than you are using, which probably make a 20% difference in speed. And with older buggy gcc versions, the choice of optimization flags can also affect correctness.
> In the fftw guide (par 8.3, line 32) I suppose you suggest to set the
> various options and compiler characteristics in config.h file!
This is as a last resort, for people who know what they are doing, in cases where the configure script is not applicable. It is vastly preferable to use our configure scripts and Makefiles, which we know compile the correct files, enable the correct options, and use carefully selected compiler optimizations.
> I underline that I'm spending my time on FFTW because I have to obtain
> the same results reported on the fftw.org site,
I'm not sure how you expect to get the "same" results as in the graphs on the FFTW web site. Those PowerPC benchmarks were performed on a G5 (and other people have performed similar benchmarks with older FFTW versions, see e.g. http://findsabrina.org/altivec/). You are apparently using a G4, with an unspecified clock speed.

Many things can go wrong with benchmarks. For example, are you sure it's even producing the correct results? Our provided "bench" program can perform correctness and speed tests. As another example, if you are benchmarking by repeatedly forwards FFTing and then backwards FFTing the same array, this is a diverging process (because FFTW is unnormalized) and will lead to floating-point exceptions that will slow things down dramatically (the easy solution is to initialize the array to zero). Or maybe there is some other bug. There's a limit to how much other people can debug your code for you remotely, however.

In general, I would suggest compiling our provided self-test program, checking for correctness first, and then using the same program to check performance. e.g.

    ./bench -v2 -y 2048    # check correctness
    ./bench -v2 2048       # check speed

The "-v2" option will cause it to call fftw_print_plan so you can see what algorithm it chooses.

In general, the less you diverge from our standard configure/compilation process, the less you will have to debug and the easier it is to help you.

I'm sorry you feel insulted, but if your basic approach is founded on ignorance, you need to be told (and others trying to help you need to be aware too that they are dealing with someone who doesn't know how to install "make" but is trying to hand-configure, compile, and benchmark a large and complex piece of software ... it affects the basic assumptions we make in offering you advice). See also: http://www.catb.org/~esr/faqs/smart-questions.html

Regards,
Steven G. Johnson
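The divergence described above for repeated forward/backward transforms can be reproduced without FFTW. The following is only a minimal sketch: a hard-coded, unnormalized 4-point DFT stands in for an FFTW plan (for n = 4 the twiddle factors are exactly 1, -i, -1, i, so no trig calls are needed), and `dft4` and `round_trip_peak` are hypothetical helper names, not FFTW functions. Because neither direction applies a 1/n scale, each forward+backward round trip multiplies the data by n.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4 };  /* transform size; fixed so the twiddle table below is exact */

/* cos(2*pi*m/4) and sin(2*pi*m/4) for m = 0..3 */
static const float COS4[N] = { 1.0f, 0.0f, -1.0f,  0.0f };
static const float SIN4[N] = { 0.0f, 1.0f,  0.0f, -1.0f };

/* Unnormalized 4-point DFT (illustrative stand-in for an FFTW plan).
   sign = -1 is the forward transform, +1 the backward one; no 1/N scaling. */
static void dft4(const float *in_re, const float *in_im,
                 float *out_re, float *out_im, int sign)
{
    assert(sign == 1 || sign == -1);
    for (size_t k = 0; k < N; ++k) {
        float sum_re = 0.0f, sum_im = 0.0f;
        for (size_t j = 0; j < N; ++j) {
            float c = COS4[(j * k) % N];
            float s = (float)sign * SIN4[(j * k) % N];
            /* accumulate (in_re + i*in_im) * (c + i*s) */
            sum_re += in_re[j] * c - in_im[j] * s;
            sum_im += in_re[j] * s + in_im[j] * c;
        }
        out_re[k] = sum_re;
        out_im[k] = sum_im;
    }
}

/* Runs `trips` forward+backward round trips in place and returns the largest
   |real part| left in the buffer. If `rescale` is nonzero, the data is divided
   by N after each round trip, which undoes the growth. */
static float round_trip_peak(float *re, float *im, int trips, int rescale)
{
    float tmp_re[N], tmp_im[N];
    float peak = 0.0f;

    for (int t = 0; t < trips; ++t) {
        dft4(re, im, tmp_re, tmp_im, -1);  /* forward */
        dft4(tmp_re, tmp_im, re, im, +1);  /* backward, unscaled */
        if (rescale) {
            for (size_t j = 0; j < N; ++j) {
                re[j] /= (float)N;
                im[j] /= (float)N;
            }
        }
    }
    for (size_t j = 0; j < N; ++j) {
        float mag = re[j] < 0.0f ? -re[j] : re[j];
        if (mag > peak)
            peak = mag;
    }
    return peak;
}
```

With the input {1, 2, 3, 4}, three unscaled round trips leave a peak of 256 (4 times 4^3), while the rescaled variant stays at 4 and an all-zero input stays at zero. At n = 2048 the same growth is a factor of 2^11 per round trip, so single-precision data of unit amplitude would overflow after roughly a dozen round trips.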
On 2007-02-13, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> particular the error occurs on the first assembler instruction
> "stvx". This fact shows that Altivec optimization is active !
That particular instruction (IIRC) would only prove that you have EABI support (i.e. your compiler supports OTHER people using Altivec by doing full 64-bit register saves). You need to ensure that the actual altivec instructions are being used.

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
"m.baldasseroni" <mbaldasseroni@progesi.it> writes:

> Good morning,
> I'm using FFTW library for my project.
> I'm working on a PowerPC 7447 processor with vxWorks operating system.
> I have generated fftw library through gcc 2.95 compiler using the
> sequent options for obj files:
> -O3 -fomit-frame-pointer -fstrict-aliasing -fvec-eabi -mcpu=7450
> I'm working in single precision.
[Sorry for the off-topic post]

Try these CFLAGS:

    -O3 -fno-schedule-insns -fvec-eabi -mcpu=7450

FFTW's codelets are generated in a way that minimizes the number of register spills independently of the number of registers. This remarkable property depends upon the compiler preserving the order in which the code is written, however. In gcc, the -O3 flag destroys the order. The suggested CFLAGS instruct gcc to preserve the original order.

Regards,
Matteo Frigo