
Maximize speed when using canned fft routines

Started by Richard Owlett April 23, 2007
I'm looking at the periodic components of GPS position errors.

I have chosen to use fft as my _initial_ tool.
   I have it.
   I'm comfortable with it.
   I think I understand the possible results.
   I believe my "problem" meets the criteria discussed in the thread
"When is FFT of experimental numerical data logically valid/legitimate"

I expect the periods of interest to range from one hour to one month.
I intend to decimate my data to one sample per 0.1 hour and to use an
observation window of 10 months.

First question: Have I chosen the rate and window width appropriately?
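A quick back-of-the-envelope check in Scilab (treating a month as 30
days, which is only a rough simplifying assumption on my part):

    // Sanity check of the proposed rate and window.
    // Assumption: a month is taken as 30 days.
    Ts_hours = 0.1;                  // sample interval after decimation, hours
    T_window = 10 * 30 * 24;         // 10-month observation window, hours
    f_nyq = 1 / (2 * Ts_hours);      // highest resolvable frequency, cycles/hour
    f_res = 1 / T_window;            // frequency resolution, cycles/hour
    mprintf("Nyquist:    %g cycles/hour\n", f_nyq);
    mprintf("Resolution: %g cycles/hour (about 0.1 cycles/month)\n", f_res);
    // Both 1 cycle/hour and 1 cycle/month lie between f_res and f_nyq.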


The above implies I'll be doing an ~200k-point FFT. Since I'll be doing 
many of them, that will take a lot of time. [I'll be using Scilab 4.1 
under Windows XP.]

Second set of questions:
1. Would I gain significantly by using a power-of-two number of samples 
(i.e., 256k rather than 201k)?
2. Are there other lengths that might be beneficial? I seem to recall a 
discussion a while back suggesting that a number of points that is a 
product of small primes is advantageous.
3. Other options?
On Apr 23, 2:11 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> [snip]
> The above implies I'll be doing an ~200k-point FFT. Since I'll be doing
> many of them, that will take a lot of time. [I'll be using Scilab 4.1
> under Windows XP.]
>
> Second set of questions:
> 1. Would I gain significantly by using a power-of-two number of samples
> (i.e., 256k rather than 201k)?
> 2. Are there other lengths that might be beneficial? I seem to recall a
> discussion a while back suggesting that a number of points that is a
> product of small primes is advantageous.
> 3. Other options?
It is likely that powers of two will be faster. If you use FFTW for the
FFTs (see http://www.scilab.org/contrib/displayContribution.php?fileID=869),
the benchmarks on www.fftw.org are a useful guide.

John
On Apr 23, 2:11 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> Second set of questions:
> 1. Would I gain significantly by using a power-of-two number of samples
> (i.e., 256k rather than 201k)?
No, in general these days it is often significantly *slower* to pad up
to the next power of two, as long as you are starting with a size that
is factorizable into small primes (no prime factors greater than 7,
say), and assuming you use an FFT implementation that is well optimized
for non-power-of-two sizes.

(There is actually a speed *penalty* associated with power-of-two sizes
these days, thanks to limited cache associativity based on low-order
address bits, which makes power-of-two strides in memory the worst
case. On the other hand, FFT routines tend to be much more heavily
optimized for the power-of-two cases, since that's what most people use
by force of habit.)

For example, using our FFTW implementation (www.fftw.org) on a 3GHz
Core Duo in double precision (with Linux/gcc), some timing numbers for
(complex-data) FFTs of various sizes are:

N = 256k = 262144 --- 10.78 ms
N = 200k = 204800 --- 6.2 ms
N = 200000        --- 6.4 ms
N = 201k = 205824 --- 16.9 ms

Note that 201k is not such a good size because it has a relatively
large prime factor (67), but it is still not *that* much worse than
padding up to N = 256k! Note also that for purely real input data, it
will be faster by roughly a factor of two.
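As an aside, the "no prime factors greater than 7" rule is easy to
automate. A rough Scilab sketch (a hypothetical helper, not something
shipped with Scilab or FFTW) that finds the smallest such size >= nmin:

    // Hypothetical helper: smallest n >= nmin with no prime factor above 7.
    function n = next_smooth(nmin)
        n = nmin;
        done = %f;
        while ~done
            m = n;
            for p = [2 3 5 7]
                while modulo(m, p) == 0
                    m = m / p;       // divide out the small prime factors
                end
            end
            if m == 1 then
                done = %t;           // nothing but small factors remained
            else
                n = n + 1;           // try the next candidate size
            end
        end
    endfunction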
> 2. Are there other lengths that might be beneficial? I seem to recall a
> discussion a while back suggesting that a number of points that is a
> product of small primes is advantageous.
Usually, I would just recommend the smallest highly composite size
(prime factors <= 7) that your data fits into. If you want, and if you
have lots of transforms to do, then you could perform some benchmarking
to find the best size.

Regards,
Steven G. Johnson
On Apr 23, 9:07 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> No, in general these days it is often significantly *slower* to pad up
> to the next power of two, as long as you are starting with a size that
> is factorizable into small primes (no prime factors greater than 7,
> say), and assuming you use an FFT implementation that is well optimized
> for non-power-of-two sizes.
> [snip]
I stand corrected. It appears that for older machines one could
extrapolate a different conclusion, but the graphs don't go out to
200k.

John
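For sizes in this range, one could simply time the candidates directly
in Scilab. A rough sketch (this times the stock fft(), so the absolute
numbers will differ from the FFTW figures quoted above, but the
comparison between sizes is made the same way):

    // Rough benchmarking sketch: time a few candidate FFT sizes.
    sizes = [262144 204800 200000 205824];   // 256k, 200k, 200000, 201k
    for N = sizes
        x = rand(1, N);                      // real-valued test data
        timer();                             // reset the CPU-time counter
        for k = 1:5
            X = fft(x);                      // forward transform (result unused)
        end
        t = timer() / 5;                     // average CPU seconds per transform
        mprintf("N = %6d : %.4f s\n", N, t);
    end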