FFTW speed !!

Started by m.baldasseroni February 6, 2007
Good morning,
I'm using the FFTW library for my project.
I'm working on a PowerPC 7447 processor with the vxWorks operating system.
I built the FFTW library with the gcc 2.95 compiler, using the
following options for the object files:
-O3 -fomit-frame-pointer -fstrict-aliasing -fvec-eabi -mcpu=7450
I'm working in single precision.

My problem is the speed of the FFT computation.
I'm using the basic interface and I have tried two planner calls
(fftwf_plan_dft_1d() and fftwf_plan_dft(1, ...)) for complex data:

plan_1 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex,
FFTW_FORWARD, FFTW_MEASURE);
plan_2 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex,
FFTW_BACKWARD, FFTW_MEASURE);

and

plan_1 = fftwf_plan_dft(1, array_length, in_complex, out_complex,
FFTW_FORWARD, FFTW_MEASURE);
plan_2 = fftwf_plan_dft(1, array_length, in_complex, out_complex,
FFTW_BACKWARD, FFTW_MEASURE);

but the best results I've obtained are as follows:

Algorithm                Length  Interface  Flag      ns/sample
fftwf_plan_dft(1, ...)        8  Basic      ESTIMATE  37,85
fftwf_plan_dft(1, ...)       16  Basic      ESTIMATE  37,47
fftwf_plan_dft(1, ...)       32  Basic      ESTIMATE  22,77
fftwf_plan_dft(1, ...)       64  Basic      ESTIMATE  16,06
fftwf_plan_dft(1, ...)      100  Basic      ESTIMATE  15,86
fftwf_plan_dft(1, ...)      128  Basic      ESTIMATE  16,17
fftwf_plan_dft(1, ...)      256  Basic      ESTIMATE  21,65
fftwf_plan_dft(1, ...)      500  Basic      ESTIMATE  34,94
fftwf_plan_dft(1, ...)      512  Basic      ESTIMATE  27,15
fftwf_plan_dft(1, ...)     1000  Basic      ESTIMATE  25
fftwf_plan_dft(1, ...)     1024  Basic      ESTIMATE  34,22
fftwf_plan_dft(1, ...)     2048  Basic      ESTIMATE  43,98

I ran each test for 10000 loops in out-of-place mode. I've also tried
the other flags (MEASURE, PATIENT and EXHAUSTIVE), but the results are
worse!
For an FFT length of 1024, for example, I measured 34,22 ns/sample,
which corresponds to 1461 mflops, while the graph on the fftw.org site
shows a speed of about 4700 mflops!
Do you have any idea what mistake I have made?

Thanks in advance
Massimo
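
The 1461 mflops figure above is consistent with the convention used for the fftw.org benchmark graphs, where a complex transform of size N is credited with 5 N log2(N) floating-point operations and mflops is that count divided by the time for one FFT in microseconds. A minimal sketch of the conversion follows; the function name fft_mflops is hypothetical, and the integer log2 loop assumes N is a power of two (sizes such as 100, 500 or 1000 would need a real-valued log2).

```c
/* Convert a measured time per sample into the "mflops" figure used on the
 * FFTW benchmark graphs: 5 * N * log2(N) / (time for one FFT in microseconds).
 * Hypothetical helper; assumes n is a power of two. */
double fft_mflops(double ns_per_sample, int n)
{
    int log2n = 0;
    double us_per_fft;

    /* Integer log2, exact only for power-of-two n. */
    while ((1 << log2n) < n)
        log2n++;

    /* Time for one whole transform, in microseconds. */
    us_per_fft = ns_per_sample * n / 1000.0;

    return 5.0 * n * log2n / us_per_fft;
}
```

For N = 1024 and 34.22 ns/sample this gives about 1461, matching the number in the post, so the gap to the published graphs comes from the measured time rather than from the arithmetic.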

On Tue, 06 Feb 2007 06:04:54 -0800, m.baldasseroni wrote:

> For a fft length of 1024, for example, I've measured a speed of 34,22
> ns/sample that corresponds to 1461 mflops while the graphic on
> fftw.org site shows a speed of about 4700 mflops !!
> Have you got any idea about the mistake I have done?

Have you excluded the time taken to produce the plan?
That should be done once at startup, and then reused for all the data with
that length and buffer alignment.

In particular producing a new plan for every data block will totally kill
performance.

Regards, Dan.
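
Dan's point, that the plan is created once and only the repeated execution is timed, could be sketched as the small harness below. This is an illustration under assumptions, not the original poster's test program: transform_fn, ns_per_sample and count_calls are hypothetical names, and clock() from the standard library stands in for whatever timer the target provides.

```c
#include <time.h>

/* Hypothetical callback type: performs exactly one transform. With FFTW it
 * would wrap fftwf_execute() on a plan that was created before timing. */
typedef void (*transform_fn)(void *ctx);

/* Stand-in transform used for illustration: just counts its calls. */
void count_calls(void *ctx)
{
    (*(int *)ctx)++;
}

/* Average time per sample in nanoseconds over `loops` executions of a
 * size-n transform. Planning/setup is deliberately outside this function. */
double ns_per_sample(transform_fn run, void *ctx, int n, int loops)
{
    clock_t start, stop;
    int i;

    run(ctx);                 /* warm-up call, not timed */

    start = clock();
    for (i = 0; i < loops; i++)
        run(ctx);             /* only the execution is measured */
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC * 1e9
           / ((double)loops * (double)n);
}
```

With FFTW, ctx would point at an fftwf_plan created beforehand with fftwf_plan_dft_1d(), and the callback would simply call fftwf_execute() on it, so that planner time never enters the measurement.
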
m.baldasseroni wrote:

> For a fft length of 1024, for example, I've measured a speed of 34,22
> ns/sample that corresponds to 1461 mflops while the graphic on
> fftw.org site shows a speed of about 4700 mflops !!
> Have you got any idea about the mistake I have done?

Probably no mistake. The speed of FFTW is probably highly CPU dependent.
For instance, getting the quoted speed may require a CPU with SSE or
AltiVec SIMD (single instruction, multiple data) instructions.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"If POSIX threads are a good thing, perhaps I don't want to know what
they're better than." -- Rob Pike

Thanks, but I have excluded the time taken to produce the plan, and I
have reused the same plan for all the data.

Regards
Massimo

> Have you excluded the time taken to produce the plan?
> That should be done once at startup, and then reused for all the data with
> that length and buffer alignment.
>
> In particular producing a new plan for every data block will totally kill
> performance.
>
> Regards, Dan.

I agree that the speed of FFTW is highly CPU dependent, but I'm using a
PowerPC 7447 with SSE and AltiVec optimization, too!

Have you obtained the results shown on the fftw.org site?

Regards
Massimo

> Probably no mistake. The speed of FFTW is probably highly CPU dependant.
> For instance, to get the speed quoted my require a CPU with SSE or
> Altivec SIMD (single instruction multiple data) instructions.

On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> Good morning,
> I'm using FFTW library for my project.
> I'm working on a PowerPC 7447 processor with vxWorks operating system.

Are you sure you have all the caches and branch prediction enabled?

-- 
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/

First of all, thank you for your answer!
All caches and branch prediction are enabled (I have verified this in
the marvel register).
I'm sure that AltiVec and the caches are in use, because I've tried
vBigDSP and the results I obtained are consistent with theory and with
the graphs shown on the fftw.org site.
I'm using the same test program to measure FFT speed in both cases
(FFTW and vBigDSP), so I'm quite sure that there is a mistake in how I
use, configure, or compile FFTW.
Could you tell me about your configuration file (config.h) and your
compile options?
If you prefer, I can post my config.h file...

Thank you again for your help
Massimo


Ben Jackson wrote:

> On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> > Good morning,
> > I'm using FFTW library for my project.
> > I'm working on a PowerPC 7447 processor with vxWorks operating system.
>
> Are you sure you have all the caches and branch prediction enabled?
>
> --
> Ben Jackson AD7GD
> <ben@ben.com>
> http://www.ben.com/

m.baldasseroni wrote:
> First of all thanks you for your answer!
> All caches and branch predictions are enabled (I have verified it on
> marvel register).
> I'm sure of Altivec elaboration and caches use because I've tried the
> vBigDSP elaboration and the results I've obtained are compliant with
> the theory and with the graphics show on the fft.org site.
> I'm using the same test program for measuring fft speed in both cases
> (FFTW and vBigDsp), so I'm quite sure that there is a mistake in FFTW
> using, configuration or compile process.
> Could you give me any indications about your configure file (config.h)
> and the options of compile process ?
> If you prefere I can post my config.h file...
>
> I thank you again for your help
> Massimo

Are you calling the fftw configure script with the --enable-single and
--enable-altivec arguments?

Are you certain you have the required alignment for SIMD? I'm not sure if
this is important for AltiVec, but for SSE, this is crucial for speed.

-- 
Mark Borgerding
3dB Labs, Inc
Innovate. Develop. Deliver.
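
Mark's alignment question can be checked directly on the buffers handed to the planner. The helper below is a minimal sketch, assuming a C99 toolchain that provides <stdint.h>; is_aligned is a hypothetical name and not part of FFTW.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise.
 * `alignment` must be a power of two (16 for SSE and AltiVec vectors). */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```

Calling is_aligned(in_complex, 16) and is_aligned(out_complex, 16) just before creating the plans would confirm that both arrays start on a 16-byte boundary, which is what buffers obtained from fftwf_malloc() are meant to provide.
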
Mark Borgerding wrote:

> m.baldasseroni wrote:
> > First of all thanks you for your answer!
> > All caches and branch predictions are enabled (I have verified it on
> > marvel register).
> > I'm sure of Altivec elaboration and caches use because I've tried the
> > vBigDSP elaboration and the results I've obtained are compliant with
> > the theory and with the graphics show on the fft.org site.
> > I'm using the same test program for measuring fft speed in both cases
> > (FFTW and vBigDsp), so I'm quite sure that there is a mistake in FFTW
> > using, configuration or compile process.
> > Could you give me any indications about your configure file (config.h)
> > and the options of compile process ?
> > If you prefere I can post my config.h file...
> >
> > I thank you again for your help
> > Massimo
>
> Are you calling the fftw configure script with the --enable-single and
> --enable-altivec arguments?
>
> Are you certain you have required alignment for SIMD? I'm not sure if
> this is important for AlitVec, but for SSE, this is crucial for speed.
>
> --
> Mark Borgerding
> 3dB Labs, Inc
> Innovate. Develop. Deliver.

Thanks for your answer.

I'm working under the vxWorks operating system, so I configured the
config.h file manually. I have enabled the AltiVec optimization and
single-precision mode. Remember that my target is a PowerPC 7447, so I
have enabled ONLY the AltiVec optimization, not SSE or SSE2.

As for SIMD alignment, I have used the fftwf_malloc() function, which
guarantees correct 16-byte alignment.

Have you used the FFTW algorithms? Which compile options did you use?
Do you have any other suggestions?

Thank you in advance
Massimo

On Feb 12, 10:33 am, "m.baldasseroni" <mbaldasser...@progesi.it>
wrote:
> I'm working under vxWorks operating system, so I have manually
> configured the config.h file. I have enabled the altivec optimization
> and single precision mode.

I'm guessing that you screwed up the compilation process. (From the
sound of things, you haven't even checked whether the Altivec code is
actually being used.)

As I repeatedly urged you the last time you asked about this on
comp.dsp, in October 2006, you should use FFTW's built-in configure
script, its compiler flags, and its Makefiles. There is no reason why
this should not be possible with cross-compilers for vxWorks; we
routinely cross-compile FFTW for other platforms using the configure
script, and a Google search reveals numerous people using autoconf
configure script with vxWorks cross-compilers. (If I recall correctly,
you are using Cygwin with a cross-compiler, using some weird/old variant
of gcc.)

At the time, your objection was that you didn't know how to run or
install "make", which is a rather basic question in Unix-style software
compilation and reveals that you had/have a lot of catching-up to do
(and moreover were asking on the wrong newsgroup). There are plenty of
books out there on compiling Unix software, using Autoconf-style
configure scripts, cross-compiling, and so on.

Rather than spend a little time learning how to use the tools properly,
however, you've instead spent over a year trying to do it your own way
based on minimal understanding, and apparently failing.

I'm sorry to be so negative, but your responses to my explanations so
far have made helping you rather frustrating and unrewarding.

Regards,
Steven G. Johnson