comp.dsp | FFTW speed !!

Good morning,
I'm using the FFTW library for my project.
I'm working on a PowerPC 7447 processor with the vxWorks operating system.
I built the FFTW library with the gcc 2.95 compiler, using the
following options for the object files:
-O3 -fomit-frame-pointer -fstrict-aliasing -fvec-eabi -mcpu=7450
I'm working in single precision.

My problem is the speed of the FFT computation.
I'm using the basic interface and I have tried two planner calls
(fftwf_plan_dft_1d() and fftwf_plan_dft(1, ...)) for complex data:

plan_1 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex,
FFTW_FORWARD, FFTW_MEASURE);
plan_2 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex,
FFTW_BACKWARD, FFTW_MEASURE);

and

plan_1 = fftwf_plan_dft(1, array_length, in_complex, out_complex,
FFTW_FORWARD, FFTW_MEASURE);
plan_2 = fftwf_plan_dft(1, array_length, in_complex, out_complex,
FFTW_BACKWARD, FFTW_MEASURE);

but the best results I have obtained are as follows:

Algorithm               Length  Interface  Flag      Time (ns/sample)
fftwf_plan_dft(1, ...)       8  Basic      ESTIMATE  37.85
fftwf_plan_dft(1, ...)      16  Basic      ESTIMATE  37.47
fftwf_plan_dft(1, ...)      32  Basic      ESTIMATE  22.77
fftwf_plan_dft(1, ...)      64  Basic      ESTIMATE  16.06
fftwf_plan_dft(1, ...)     100  Basic      ESTIMATE  15.86
fftwf_plan_dft(1, ...)     128  Basic      ESTIMATE  16.17
fftwf_plan_dft(1, ...)     256  Basic      ESTIMATE  21.65
fftwf_plan_dft(1, ...)     500  Basic      ESTIMATE  34.94
fftwf_plan_dft(1, ...)     512  Basic      ESTIMATE  27.15
fftwf_plan_dft(1, ...)    1000  Basic      ESTIMATE  25
fftwf_plan_dft(1, ...)    1024  Basic      ESTIMATE  34.22
fftwf_plan_dft(1, ...)    2048  Basic      ESTIMATE  43.98

I ran each test for 10000 loops in out-of-place mode. I've also tried
other flags (MEASURE, PATIENT and EXHAUSTIVE), but the results are
worse!
For an FFT length of 1024, for example, I've measured 34.22
ns/sample, which corresponds to 1461 mflops, while the graph on the
fftw.org site shows a speed of about 4700 mflops!
Do you have any idea what mistake I have made?

Thanks in advance
Massimo
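
The conversion from ns/sample to mflops in the post above can be checked with a few lines of C. The fftw.org benchmark pages define "mflops" for a complex transform of length N as 5 N log2(N) divided by the time of one transform in microseconds. The sketch below applies that definition. The function names are hypothetical, and the helper accepts only power-of-two lengths so that it needs no math library; for lengths such as 100 or 1000, log2() from <math.h> would be needed instead.

```c
#include <assert.h>

/* Integer log2 of a power-of-two length (hypothetical helper). */
static int ilog2_pow2(unsigned n)
{
    int k = 0;
    assert(n != 0 && (n & (n - 1)) == 0);   /* power of two only */
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Benchmark "mflops" for a complex FFT of length n:
   5 * n * log2(n) / (time of one transform in microseconds). */
double fft_mflops(unsigned n, double ns_per_sample)
{
    double time_us;
    assert(ns_per_sample > 0.0);
    time_us = n * ns_per_sample / 1000.0;   /* one whole transform */
    return 5.0 * n * ilog2_pow2(n) / time_us;
}
```

For N = 1024 and 34.22 ns/sample, fft_mflops(1024, 34.22) gives about 1461, which matches the figure in the post. By the same formula, 4700 mflops at that length would correspond to roughly 10.6 ns/sample.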

Reply by Dan Mills ● February 6, 2007

On Tue, 06 Feb 2007 06:04:54 -0800, m.baldasseroni wrote:

> For an FFT length of 1024, for example, I've measured 34.22
> ns/sample, which corresponds to 1461 mflops, while the graph on the
> fftw.org site shows a speed of about 4700 mflops!
> Do you have any idea what mistake I have made?

Have you excluded the time taken to produce the plan?
That should be done once at startup, and the plan then reused for all the
data with that length and buffer alignment.

In particular, producing a new plan for every data block will totally kill
performance.

Regards, Dan.
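
The measurement pattern Dan describes (set up once, then time only the repeated executions) might look like the following minimal sketch. It is not taken from the thread: transform_fn, time_ns_per_sample and count_calls are hypothetical names, and the standard C clock() is only a stand-in timer; on a vxWorks target a platform-specific timer would probably be needed. With FFTW, the callback would wrap fftwf_execute() on a plan created before timing starts.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: runs one transform on prepared data.
   With FFTW it would call fftwf_execute() on an existing plan. */
typedef void (*transform_fn)(void *ctx);

/* Average cost in ns per sample over `loops` timed calls of run(ctx).
   Any setup (planning, allocation) must happen before this call. */
double time_ns_per_sample(transform_fn run, void *ctx,
                          unsigned n, unsigned loops)
{
    clock_t t0, t1;
    unsigned i;

    assert(run != NULL && n > 0 && loops > 0);

    run(ctx);                     /* warm-up call, not timed */

    t0 = clock();
    for (i = 0; i < loops; ++i)
        run(ctx);
    t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9
           / ((double)loops * n);
}

/* Stand-in "transform" for demonstration: counts its invocations. */
void count_calls(void *ctx)
{
    ++*(unsigned *)ctx;
}
```

Calling time_ns_per_sample(count_calls, &calls, 1024, 10000) runs the callback 10001 times (one warm-up plus 10000 timed calls) and returns the average time per sample for the timed calls only.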

Reply by Erik de Castro Lopo ● February 6, 2007

m.baldasseroni wrote:

> For an FFT length of 1024, for example, I've measured 34.22
> ns/sample, which corresponds to 1461 mflops, while the graph on the
> fftw.org site shows a speed of about 4700 mflops!
> Do you have any idea what mistake I have made?

Probably no mistake. The speed of FFTW is probably highly CPU dependent.
For instance, getting the quoted speed may require a CPU with SSE or
Altivec SIMD (single instruction multiple data) instructions.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"If POSIX threads are a good thing, perhaps I don't want to know what
they're better than."                                   -- Rob Pike

Reply by m.baldasseroni ● February 7, 2007

Thanks, but I have excluded the time taken to produce the plan, and I
have reused the same plan for all the data.

Regards
Massimo

Reply by m.baldasseroni ● February 7, 2007

I think the speed of FFTW is highly CPU dependent, but I'm using a
PowerPC 7447 with SSE and Altivec optimization, too!

Have you obtained the results shown on the fftw.org site?

Regards
Massimo

Reply by Ben Jackson ● February 7, 2007

On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> Good morning,
> I'm using FFTW library for my project.
> I'm working on a PowerPC 7447 processor with vxWorks operating system.

Are you sure you have all the caches and branch prediction enabled?

-- 
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/

Reply by m.baldasseroni ● February 8, 2007

First of all, thank you for your answer!
All caches and branch prediction are enabled (I have verified it in the
marvel register).
I'm sure that Altivec and the caches are in use because I've tried the
vBigDSP routines, and the results I've obtained are consistent with
theory and with the graphs shown on the fftw.org site.
I'm using the same test program to measure FFT speed in both cases
(FFTW and vBigDSP), so I'm quite sure that there is a mistake in how I
use, configure or compile FFTW.
Could you tell me about your configuration file (config.h) and your
compile options?
If you prefer, I can post my config.h file.

Thank you again for your help
Massimo

Reply by Mark Borgerding ● February 9, 2007

Are you calling the FFTW configure script with the --enable-single and
--enable-altivec arguments?

Are you certain you have the required alignment for SIMD? I'm not sure if
this is important for AltiVec, but for SSE it is crucial for speed.



-- 
Mark Borgerding
3dB Labs, Inc
Innovate.  Develop.  Deliver.
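
One way to confirm the alignment Mark asks about is to test the buffer addresses directly before planning. The following is a minimal sketch, assuming the 16-byte figure mentioned later in the thread; is_aligned is a hypothetical helper, not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is aligned to `alignment` bytes, 0 otherwise.
   `alignment` must be a power of two (for example 16 for SIMD). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A check such as is_aligned(in_complex, 16) && is_aligned(out_complex, 16), using the buffer names from the first post, could be placed just before the plans are created.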

Reply by m.baldasseroni ● February 12, 2007

Thanks for your answer.
I'm working under the vxWorks operating system, so I have manually
configured the config.h file. I have enabled the Altivec optimization
and single precision mode. I remind you that my target is a PowerPC
7447, so I have enabled ONLY the Altivec optimization, not SSE or
SSE2.
As for SIMD alignment, I have used the fftwf_malloc() function, which
guarantees correct 16-byte alignment.
Have you used the FFTW algorithms? Which compile options did you use?
Do you have any other suggestions?
Thank you in advance
Massimo

Reply by ● February 12, 2007

On Feb 12, 10:33 am, "m.baldasseroni" <mbaldasser...@progesi.it>
wrote:
> I'm working under vxWorks operating system, so I have manually
> configured the config.h file. I have enabled the altivec optimization
> and single precision mode.

I'm guessing that you screwed up the compilation process.  (From the
sound of things, you haven't even checked whether the Altivec code is
actually being used.)

As I repeatedly urged you the last time you asked about this on
comp.dsp, in October 2006, you should use FFTW's built-in configure
script, its compiler flags, and its Makefiles.  There is no reason why
this should not be possible with cross-compilers for vxWorks; we
routinely cross-compile FFTW for other platforms using the configure
script, and a Google search reveals numerous people using autoconf
configure script with vxWorks cross-compilers.  (If I recall
correctly, you are using Cygwin with a cross-compiler, using some
weird/old variant of gcc.)

At the time, your objection was that you didn't know how to run or
install "make", which is a rather basic question in Unix-style
software compilation and reveals that you had/have a lot of catching-
up to do (and moreover were asking on the wrong newsgroup).  There are
plenty of books out there on compiling Unix software, using Autoconf-
style configure scripts, cross-compiling, and so on.  Rather than
spend a little time learning how to use the tools properly, however,
you've instead spent over a year trying to do it your own way based on
minimal understanding, and apparently failing.

I'm sorry to be so negative, but your responses to my explanations so
far have made helping you rather frustrating and unrewarding.

Regards,
Steven G. Johnson
