Good morning,

I'm using the FFTW library for my project. I'm working on a PowerPC 7447 processor with the vxWorks operating system. I built the FFTW library with the gcc 2.95 compiler, using the following options for the object files:

    -O3 -fomit-frame-pointer -fstrict-aliasing -fvec-eabi -mcpu=7450

I'm working in single precision. My problem is the speed of the FFT computation. I'm using the BASIC interface and I have tried two ways of creating plans for complex data, fftwf_plan_dft_1d() and fftwf_plan_dft(1, ...):

    plan_1 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex, FFTW_FORWARD, FFTW_MEASURE);
    plan_2 = fftwf_plan_dft_1d(length_FFTW, in_complex, out_complex, FFTW_BACKWARD, FFTW_MEASURE);

and

    plan_1 = fftwf_plan_dft(1, array_length, in_complex, out_complex, FFTW_FORWARD, FFTW_MEASURE);
    plan_2 = fftwf_plan_dft(1, array_length, in_complex, out_complex, FFTW_BACKWARD, FFTW_MEASURE);

The best results I've obtained are the following:

    Algorithm               Length  Interface  Flag      ns/sample
    fftwf_plan_dft(1, ...)       8  Basic      ESTIMATE      37.85
    fftwf_plan_dft(1, ...)      16  Basic      ESTIMATE      37.47
    fftwf_plan_dft(1, ...)      32  Basic      ESTIMATE      22.77
    fftwf_plan_dft(1, ...)      64  Basic      ESTIMATE      16.06
    fftwf_plan_dft(1, ...)     100  Basic      ESTIMATE      15.86
    fftwf_plan_dft(1, ...)     128  Basic      ESTIMATE      16.17
    fftwf_plan_dft(1, ...)     256  Basic      ESTIMATE      21.65
    fftwf_plan_dft(1, ...)     500  Basic      ESTIMATE      34.94
    fftwf_plan_dft(1, ...)     512  Basic      ESTIMATE      27.15
    fftwf_plan_dft(1, ...)    1000  Basic      ESTIMATE      25
    fftwf_plan_dft(1, ...)    1024  Basic      ESTIMATE      34.22
    fftwf_plan_dft(1, ...)    2048  Basic      ESTIMATE      43.98

I ran each test for 10000 loops in out-of-place mode. I've tried other flags such as MEASURE, PATIENT and EXHAUSTIVE, but the results are worse!

For an FFT length of 1024, for example, I've measured a speed of 34.22 ns/sample, which corresponds to 1461 mflops, while the graph on the fftw.org site shows a speed of about 4700 mflops!!

Have you got any idea what mistake I have made?

Thanks in advance
Massimo
FFTW speed !!
Started by ●February 6, 2007
Reply by ●February 6, 2007
On Tue, 06 Feb 2007 06:04:54 -0800, m.baldasseroni wrote:

> For an FFT length of 1024, for example, I've measured a speed of 34.22
> ns/sample, which corresponds to 1461 mflops, while the graph on the
> fftw.org site shows a speed of about 4700 mflops!!
> Have you got any idea what mistake I have made?

Have you excluded the time taken to produce the plan? That should be done once at startup, and then reused for all the data with that length and buffer alignment.

In particular, producing a new plan for every data block will totally kill performance.

Regards, Dan.
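The "plan once, time only the execution" advice can be sketched as a small timing harness. This is only an illustration under stated assumptions: the names `timed_fn`, `average_ns_per_call` and `count_call` are hypothetical, and the standard `clock()` is assumed to have adequate resolution, which may not hold on every target (a board-specific timer could be needed instead).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Callback type for the operation being timed. For FFTW this would be a
 * thin wrapper whose body is a single fftwf_execute(plan) call on a plan
 * that was created once, before timing starts. */
typedef void (*timed_fn)(void *ctx);

/* Run fn(ctx) `loops` times and return the average CPU time per call in
 * nanoseconds. Only the loop is timed; any setup (planning, buffer
 * allocation) must already have happened. */
double average_ns_per_call(timed_fn fn, void *ctx, long loops)
{
    clock_t start, stop;
    long i;

    assert(fn != NULL && loops > 0);

    fn(ctx);                    /* warm-up call, outside the timed region */

    start = clock();
    for (i = 0; i < loops; ++i)
        fn(ctx);
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC * 1e9 / (double)loops;
}

/* Stand-in workload for trying the harness without FFTW: counts calls. */
void count_call(void *ctx)
{
    ++*(long *)ctx;
}
```

Dividing the returned value by the transform length gives a ns/sample figure comparable to the table in the original post.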
Reply by ●February 6, 2007
m.baldasseroni wrote:

> For an FFT length of 1024, for example, I've measured a speed of 34.22
> ns/sample, which corresponds to 1461 mflops, while the graph on the
> fftw.org site shows a speed of about 4700 mflops!!
> Have you got any idea what mistake I have made?

Probably no mistake. The speed of FFTW is probably highly CPU dependent. For instance, getting the quoted speed may require a CPU with SSE or Altivec SIMD (single instruction, multiple data) instructions.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo
+-----------------------------------------------------------+
"If POSIX threads are a good thing, perhaps I don't want to know what they're better than." -- Rob Pike
Reply by ●February 7, 2007
Thanks, but I have excluded the time taken to produce the plan, and I have reused the same plan for all the data.

Regards
Massimo

> Have you excluded the time taken to produce the plan?
> That should be done once at startup, and then reused for all the data with
> that length and buffer alignment.
>
> In particular producing a new plan for every data block will totally kill
> performance.
>
> Regards, Dan.
Reply by ●February 7, 2007
I think the speed of FFTW is highly CPU dependent, but I'm using a PowerPC 7447 with SSE and Altivec optimization, too! Have you obtained the results shown on the fftw.org site?

Regards
Massimo

> Probably no mistake. The speed of FFTW is probably highly CPU dependent.
> For instance, getting the quoted speed may require a CPU with SSE or
> Altivec SIMD (single instruction, multiple data) instructions.
Reply by ●February 7, 2007
On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:

> Good morning,
> I'm using the FFTW library for my project.
> I'm working on a PowerPC 7447 processor with the vxWorks operating system.

Are you sure you have all the caches and branch prediction enabled?

--
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Reply by ●February 8, 2007
First of all, thank you for your answer!

All caches and branch prediction are enabled (I have verified it on the marvel register). I'm sure that Altivec processing and the caches are in use because I've run the same computation with vBigDSP, and the results I've obtained are consistent with the theory and with the graphs shown on the fftw.org site. I'm using the same test program to measure FFT speed in both cases (FFTW and vBigDSP), so I'm quite sure that there is a mistake in my use, configuration or compilation of FFTW.

Could you give me any indication about your configuration file (config.h) and your compile options? If you prefer, I can post my config.h file...

Thank you again for your help
Massimo

Ben Jackson wrote:

> On 2007-02-06, m.baldasseroni <mbaldasseroni@progesi.it> wrote:
> > Good morning,
> > I'm using the FFTW library for my project.
> > I'm working on a PowerPC 7447 processor with the vxWorks operating system.
>
> Are you sure you have all the caches and branch prediction enabled?
>
> --
> Ben Jackson AD7GD
> <ben@ben.com>
> http://www.ben.com/
Reply by ●February 9, 2007
m.baldasseroni wrote:

> First of all, thank you for your answer!
> All caches and branch prediction are enabled (I have verified it on
> the marvel register).
> I'm sure that Altivec processing and the caches are in use because I've
> run the same computation with vBigDSP, and the results I've obtained are
> consistent with the theory and with the graphs shown on the fftw.org site.
> I'm using the same test program to measure FFT speed in both cases
> (FFTW and vBigDSP), so I'm quite sure that there is a mistake in my
> use, configuration or compilation of FFTW.
> Could you give me any indication about your configuration file (config.h)
> and your compile options?
> If you prefer, I can post my config.h file...
>
> Thank you again for your help
> Massimo

Are you calling the fftw configure script with the --enable-single and --enable-altivec arguments?

Are you certain you have the required alignment for SIMD? I'm not sure if this is important for AltiVec, but for SSE, this is crucial for speed.

--
Mark Borgerding
3dB Labs, Inc
Innovate. Develop. Deliver.
Reply by ●February 12, 2007
Mark Borgerding wrote:

> m.baldasseroni wrote:
> > First of all, thank you for your answer!
> > All caches and branch prediction are enabled (I have verified it on
> > the marvel register).
> > I'm sure that Altivec processing and the caches are in use because I've
> > run the same computation with vBigDSP, and the results I've obtained are
> > consistent with the theory and with the graphs shown on the fftw.org site.
> > I'm using the same test program to measure FFT speed in both cases
> > (FFTW and vBigDSP), so I'm quite sure that there is a mistake in my
> > use, configuration or compilation of FFTW.
> > Could you give me any indication about your configuration file (config.h)
> > and your compile options?
> > If you prefer, I can post my config.h file...
> >
> > Thank you again for your help
> > Massimo
>
> Are you calling the fftw configure script with the --enable-single and
> --enable-altivec arguments?
>
> Are you certain you have the required alignment for SIMD? I'm not sure if
> this is important for AltiVec, but for SSE, this is crucial for speed.
>
> --
> Mark Borgerding
> 3dB Labs, Inc
> Innovate. Develop. Deliver.

Thanks for your answer.

I'm working under the vxWorks operating system, so I have configured the config.h file manually. I have enabled the Altivec optimization and single-precision mode. I remind you that my target is a PowerPC 7447, so I have enabled ONLY the Altivec optimization, not SSE or SSE2.

Regarding SIMD alignment, I have used the fftwf_malloc() function, which guarantees correct 16-byte alignment.

Have you used the FFTW algorithms? Which compile options did you use? Do you have any other suggestions?

Thank you in advance
Massimo
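The 16-byte alignment mentioned here can be checked directly at run time rather than assumed. A minimal sketch, using a hypothetical helper `is_aligned` (not an FFTW function) and the C99 `uintptr_t` type, which an older compiler may not provide:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if p is aligned to `alignment` bytes (which must be a power of
 * two), else 0. AltiVec vector loads and stores work on 16-byte-aligned
 * addresses, so 16 is the value of interest in this thread. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

Calling `is_aligned(in_complex, 16)` and `is_aligned(out_complex, 16)` on the arrays actually passed to the planner would confirm that both buffers meet the requirement.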
Reply by ●February 12, 2007
On Feb 12, 10:33 am, "m.baldasseroni" <mbaldasser...@progesi.it> wrote:

> I'm working under vxWorks operating system, so I have manually
> configured the config.h file. I have enabled the altivec optimization
> and single precision mode.

I'm guessing that you screwed up the compilation process. (From the sound of things, you haven't even checked whether the Altivec code is actually being used.)

As I repeatedly urged you the last time you asked about this on comp.dsp, in October 2006, you should use FFTW's built-in configure script, its compiler flags, and its Makefiles. There is no reason why this should not be possible with cross-compilers for vxWorks; we routinely cross-compile FFTW for other platforms using the configure script, and a Google search reveals numerous people using autoconf configure scripts with vxWorks cross-compilers. (If I recall correctly, you are using Cygwin with a cross-compiler, using some weird/old variant of gcc.)

At the time, your objection was that you didn't know how to run or install "make", which is a rather basic question in Unix-style software compilation and reveals that you had/have a lot of catching-up to do (and moreover were asking on the wrong newsgroup). There are plenty of books out there on compiling Unix software, using Autoconf-style configure scripts, cross-compiling, and so on. Rather than spend a little time learning how to use the tools properly, however, you've instead spent over a year trying to do it your own way based on minimal understanding, and apparently failing.

I'm sorry to be so negative, but your responses to my explanations so far have made helping you rather frustrating and unrewarding.

Regards,
Steven G. Johnson