Reply by Matteo Frigo February 13, 2007
"m.baldasseroni" <mbaldasseroni@progesi.it> writes:

> Good morning,
> I'm using FFTW library for my project.
> I'm working on a PowerPC 7447 processor with vxWorks operating system.
> I have generated fftw library through gcc 2.95 compiler using the
> sequent options for obj files:
> -O3 -fomit-frame-pointer -fstrict-aliasing -fvec-eabi -mcpu=7450
> I'm working in single precision.

[Sorry for the off-topic post]

Try these CFLAGS:

    -O3 -fno-schedule-insns -fvec-eabi -mcpu=7450

FFTW's codelets are generated in a way that minimizes the number of register spills independently of the number of registers. This remarkable property depends upon the compiler preserving the order in which the code is written, however. In gcc, the -O3 flag destroys the order. The suggested CFLAGS instruct gcc to preserve the original order.

Regards,
Matteo Frigo
Reply by Ben Jackson February 13, 2007
On 2007-02-13, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> particular the error occurs on the first assembler instruction
> "stvx". This fact shows that Altivec optimization is active !

That particular instruction (IIRC) would only prove that you have EABI support (i.e., your compiler supports OTHER people using Altivec by doing full 64-bit register saves). You need to ensure that the actual Altivec instructions are being used.

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Reply by Steven G. Johnson February 13, 2007
On Feb 13, 10:52 am, "m.baldasseroni" <mbaldasser...@progesi.it>
wrote:
> In any case, I have overtake the step you are writing about 3 months
> ago and, I repeat again, I'm sure that Altivec optimization is
> active!

Try calling the fftwf_print_plan function to see what algorithm FFTW is using (the codelet names have a "v" in them for vectorized versions). You can also pass the FFTW_NO_SIMD flag, which disables the SIMD code---if FFTW_NO_SIMD does not slow things down considerably, then the Altivec code is not being used.

Note that you are using an *ancient* version of gcc (2.95). If you read our FAQ, you'll find that some versions of gcc 2.95 don't compile FFTW's Altivec code correctly. I would strongly recommend upgrading to gcc 3.4.4 at least. (I believe I recommended upgrading your gcc a year ago, too.)

Note also that the configure script picks different compiler flags than you are using, which probably make a 20% difference in speed. And with older buggy gcc versions, the choice of optimization flags can also affect correctness.

> In the fftw guide (par 8.3, line 32) I suppose you suggest to set the
> various options and compiler characteristics in config.h file!
This is as a last resort, for people who know what they are doing, in cases where the configure script is not applicable. It is vastly preferable to use our configure scripts and Makefiles, which we know compile the correct files, enable the correct options, and use carefully selected compiler optimizations.
> I underline that I'm spending my time on FFTW because I have to obtain
> the same results reported on the fftw.org site,

I'm not sure how you expect to get the "same" results as in the graphs on the FFTW web site. Those PowerPC benchmarks were performed on a G5 (and other people have performed similar benchmarks with older FFTW versions, see e.g. http://findsabrina.org/altivec/). You are apparently using a G4, with an unspecified clock speed.

Many things can go wrong with benchmarks. For example, are you sure it's even producing the correct results? Our provided "bench" program can perform correctness and speed tests. As another example, if you are benchmarking by repeatedly forwards FFTing and then backwards FFTing the same array, this is a diverging process (because FFTW is unnormalized) and will lead to floating-point exceptions that will slow things down dramatically (the easy solution is to initialize the array to zero). Or maybe there is some other bug. There's a limit to how much other people can debug your code for you remotely, however.

In general, I would suggest compiling our provided self-test program, checking for correctness first, and then using the same program to check performance. e.g.

    ./bench -v2 -y 2048    # check correctness
    ./bench -v2 2048       # check speed

The "-v2" option will cause it to call fftw_print_plan so you can see what algorithm it chooses.

In general, the less you diverge from our standard configure/compilation process, the less you will have to debug and the easier it is to help you.

I'm sorry you feel insulted, but if your basic approach is founded on ignorance, you need to be told (and others trying to help you need to be aware too that they are dealing with someone who doesn't know how to install "make" but is trying to hand-configure, compile, and benchmark a large and complex piece of software ... it affects the basic assumptions we make in offering you advice).

See also: http://www.catb.org/~esr/faqs/smart-questions.html

Regards,
Steven G. Johnson
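The remark above about repeated forward and backward transforms can be illustrated without FFTW at all. The following is a minimal sketch, not FFTW code: `dft4` and `peak_after_round_trips` are hypothetical names, and a naive length-4 DFT stands in for an unnormalized plan (length 4 is assumed only because its twiddle factors are exactly +-1 and +-i, which keeps the arithmetic exact). Each forward-plus-backward pass multiplies the data by the transform length, so a nonzero array grows geometrically until it overflows.

```c
#include <assert.h>
#include <stddef.h>

#define DFT_LEN 4  /* illustrative length, not taken from the thread */

typedef struct { double re, im; } cplx;

/* Unnormalized length-4 DFT: out[k] = sum_j in[j] * w^(j*k),
   with w = -i for the forward transform (sign = -1)
   and  w = +i for the backward transform (sign = +1). */
static void dft4(const cplx in[DFT_LEN], cplx out[DFT_LEN], int sign)
{
    /* Powers w^0 .. w^3 of w = sign * i. */
    const cplx w[DFT_LEN] = {
        { 1.0, 0.0 }, { 0.0, (double)sign },
        { -1.0, 0.0 }, { 0.0, -(double)sign }
    };
    assert(sign == 1 || sign == -1);
    for (size_t k = 0; k < DFT_LEN; ++k) {
        cplx acc = { 0.0, 0.0 };
        for (size_t j = 0; j < DFT_LEN; ++j) {
            cplx t = w[(j * k) % DFT_LEN];
            acc.re += in[j].re * t.re - in[j].im * t.im;
            acc.im += in[j].re * t.im + in[j].im * t.re;
        }
        out[k] = acc;
    }
}

/* Start from a unit impulse, apply `rounds` forward+backward passes
   with no rescaling, and return the largest |real part| in the array. */
double peak_after_round_trips(int rounds)
{
    cplx a[DFT_LEN] = { { 1.0, 0.0 } };  /* remaining elements are zero */
    cplx b[DFT_LEN];
    double peak = 0.0;
    assert(rounds >= 0);
    for (int r = 0; r < rounds; ++r) {
        dft4(a, b, -1);  /* forward  */
        dft4(b, a, +1);  /* backward */
    }
    for (size_t i = 0; i < DFT_LEN; ++i) {
        double m = a[i].re < 0.0 ? -a[i].re : a[i].re;
        if (m > peak)
            peak = m;
    }
    return peak;
}
```

In this sketch the peak grows by a factor of 4 per round trip; with a length of 2048 the factor would be 2048 per round trip, so single-precision data of unit size would be expected to overflow after roughly a dozen round trips. An all-zero array stays zero, which is why zero initialization avoids the problem; rescaling by 1/n after each round trip would be another way to keep a benchmark loop bounded.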
Reply by m.baldasseroni February 13, 2007
First of all thanks for your always "positive" behaviour about my
comment on FFTW, about my work and about a person who is trying to
learn from his mistakes !!!
However, since October 2006 I have spent my time studying and working
on different aspects of my work and I have improved my knowledge
not only of the trite compile process!
I'm guessing that you have not read what I have written during the last
few days!

>>I'm guessing that you screwed up the compilation process. (From the
>>sound of things, you haven't even checked whether the Altivec code is
>>actually being used.)
In any case, I have overtake the step you are writing about 3 months ago and, I repeat again, I'm sure that Altivec optimization is active! To support my hypothesis I have generated an fftw library with Altivec support and I have started my application disabling the altivec coprocessor support through the taskSpawn() command of vxWorks. In this case the foreseen error occurs (Altivec unavailable). In particular the error occurs on the first assembler instruction "stvx". This fact shows that Altivec optimization is active !
>>We routinely cross-compile FFTW for other platforms using the
>>configure script, and a Google search reveals numerous people using
>>autoconf configure script with vxWorks cross-compilers

In the fftw guide (par 8.3, line 32) I suppose you suggest to set the various options and compiler characteristics in config.h file! Do you think that I have read in a correct way or not??

Three months ago I followed your indications and, for example, I have set the Altivec optimization (config.h, lines 69-70 /* Define to enable Altivec optimizations. */ #define HAVE_ALTIVEC). However, following other indications that I think it is useless to repeat again, now I have generated a code with fftw. When I used the script you suggested to me (using cygwin etc..), I obtained the same config.h file that I had previously hand-configured.

I underline that I'm spending my time on FFTW because I have to obtain the same results reported on the fftw.org site, because my work group needs it, and so I think that I'm making a mistake that I'm not able to find.

I hope I have been explicit enough and, in any case, I underline that the first purpose of using a forum should not be to air personal criticism of another person's approach to problems, but to help colleagues and to discuss the main topic of the forum in a positive spirit!! I hope this can be the first step towards a good dialogue and not towards personal attacks !

Regards,
Massimo Baldasseroni
Reply by Steven G. Johnson February 12, 2007
On Feb 12, 10:33 am, "m.baldasseroni" <mbaldasser...@progesi.it>
wrote:
> I'm working under vxWorks operating system, so I have manually
> configured the config.h file. I have enabled the altivec optimization
> and single precision mode.

I'm guessing that you screwed up the compilation process. (From the sound of things, you haven't even checked whether the Altivec code is actually being used.)

As I repeatedly urged you the last time you asked about this on comp.dsp, in October 2006, you should use FFTW's built-in configure script, its compiler flags, and its Makefiles. There is no reason why this should not be possible with cross-compilers for vxWorks; we routinely cross-compile FFTW for other platforms using the configure script, and a Google search reveals numerous people using autoconf configure script with vxWorks cross-compilers. (If I recall correctly, you are using Cygwin with a cross-compiler, using some weird/old variant of gcc.)

At the time, your objection was that you didn't know how to run or install "make", which is a rather basic question in Unix-style software compilation and reveals that you had/have a lot of catching-up to do (and moreover were asking on the wrong newsgroup). There are plenty of books out there on compiling Unix software, using Autoconf-style configure scripts, cross-compiling, and so on.

Rather than spend a little time learning how to use the tools properly, however, you've instead spent over a year trying to do it your own way based on minimal understanding, and apparently failing. I'm sorry to be so negative, but your responses to my explanations so far have made helping you rather frustrating and unrewarding.

Regards,
Steven G. Johnson
Reply by m.baldasseroni February 12, 2007
Mark Borgerding wrote:

> m.baldasseroni wrote:
> > First of all thank you for your answer!
> > All caches and branch predictions are enabled (I have verified it on
> > marvel register).
> > I'm sure of Altivec elaboration and caches use because I've tried the
> > vBigDSP elaboration and the results I've obtained are compliant with
> > the theory and with the graphics show on the fft.org site.
> > I'm using the same test program for measuring fft speed in both cases
> > (FFTW and vBigDsp), so I'm quite sure that there is a mistake in FFTW
> > using, configuration or compile process.
> > Could you give me any indications about your configure file (config.h)
> > and the options of compile process ?
> > If you prefer I can post my config.h file...
> >
> > I thank you again for your help
> > Massimo
> >
>
> Are you calling the fftw configure script with the --enable-single and
> --enable-altivec arguments?
>
> Are you certain you have required alignment for SIMD? I'm not sure if
> this is important for Altivec, but for SSE, this is crucial for speed.
>
> --
> Mark Borgerding
> 3dB Labs, Inc
> Innovate. Develop. Deliver.

Thanks for your answer.

I'm working under vxWorks operating system, so I have manually configured the config.h file. I have enabled the altivec optimization and single precision mode. I remind you that my target is a PowerPC 7447, so I have enabled ONLY altivec optimization and not SSE and SSE2.

About SIMD alignment, I have used the fftwf_malloc() function, which guarantees the correct 16-byte alignment.

Have you used FFTW algorithms? Which compile options have you used? Do you have any other suggestions?

Thank you in advance
Massimo
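The alignment question raised above can also be checked directly on a buffer address. The following is a minimal sketch: `is_aligned` is a hypothetical helper, not part of FFTW, and it only inspects the low bits of a pointer value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFTW function): returns nonzero if the
   address p is a multiple of `alignment`, which must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

A call such as `is_aligned(buf, 16)` on a pointer returned by fftwf_malloc() would be expected to return nonzero. A zero result for a buffer obtained some other way (for example, an offset into a larger array) would suggest that the SIMD code may not be usable for that buffer.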
Reply by Mark Borgerding February 9, 2007
m.baldasseroni wrote:
> First of all thank you for your answer!
> All caches and branch predictions are enabled (I have verified it on
> marvel register).
> I'm sure of Altivec elaboration and caches use because I've tried the
> vBigDSP elaboration and the results I've obtained are compliant with
> the theory and with the graphics show on the fft.org site.
> I'm using the same test program for measuring fft speed in both cases
> (FFTW and vBigDsp), so I'm quite sure that there is a mistake in FFTW
> using, configuration or compile process.
> Could you give me any indications about your configure file (config.h)
> and the options of compile process ?
> If you prefer I can post my config.h file...
>
> I thank you again for your help
> Massimo
>

Are you calling the fftw configure script with the --enable-single and --enable-altivec arguments?

Are you certain you have required alignment for SIMD? I'm not sure if this is important for Altivec, but for SSE, this is crucial for speed.

--
Mark Borgerding
3dB Labs, Inc
Innovate. Develop. Deliver.
Reply by m.baldasseroni February 8, 2007
First of all thank you for your answer!
All caches and branch predictions are enabled (I have verified it on
marvel register).
I'm sure of Altivec elaboration and caches use because I've tried the
vBigDSP elaboration and the results I've obtained are compliant with
the theory and with the graphics show on the fft.org site.
I'm using the same test program for measuring fft speed in both cases
(FFTW and vBigDsp), so I'm quite sure that there is a mistake in FFTW
using, configuration or compile process.
Could you give me any indications about your configure file (config.h)
and the options of compile process ?
If you prefer I can post my config.h file...

I thank you again for your help
Massimo


Ben Jackson wrote:

> On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> > Good morning,
> > I'm using FFTW library for my project.
> > I'm working on a PowerPC 7447 processor with vxWorks operating system.
>
> Are you sure you have all the caches and branch prediction enabled?
>
> --
> Ben Jackson AD7GD
> <ben@ben.com>
> http://www.ben.com/
Reply by Ben Jackson February 7, 2007
On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> Good morning,
> I'm using FFTW library for my project.
> I'm working on a PowerPC 7447 processor with vxWorks operating system.

Are you sure you have all the caches and branch prediction enabled?

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Reply by m.baldasseroni February 7, 2007
I think the speed of FFTW is highly CPU dependent but I'm using a
PowerPC 7447 with SSE and Altivec optimization, too !

Have you obtained the results shown on the fft.org site?

Regards
Massimo

> Probably no mistake. The speed of FFTW is probably highly CPU dependent.
> For instance, to get the speed quoted may require a CPU with SSE or
> Altivec SIMD (single instruction multiple data) instructions.