
Maximize speed when using canned fft routines

Started by Richard Owlett April 23, 2007
I'm looking at the periodic components of GPS position errors.

I have chosen to use fft as my _initial_ tool.
   I have it.
   I'm comfortable with it.
   I think I understand the possible results.
   I believe my "problem" meets the criteria discussed in the thread
"When is FFT of experimental numerical data logically valid/legitimate"

I expect the periods of interest to range from one hour to one month.
I intend to decimate my data to one sample per 0.1 hour and to use an
observation window of 10 months.

First question: Have I chosen the rate and window width appropriately?
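A quick back-of-the-envelope check in Scilab (treating a month as 30
days, which is only a rough simplifying assumption on my part):

    // Sanity check of the proposed rate and window.
    // Assumption: a month is taken as 30 days.
    Ts_hours = 0.1;                  // sample interval after decimation, hours
    T_window = 10 * 30 * 24;         // 10-month observation window, hours
    f_nyq = 1 / (2 * Ts_hours);      // highest resolvable frequency, cycles/hour
    f_res = 1 / T_window;            // frequency resolution, cycles/hour
    mprintf("Nyquist:    %g cycles/hour\n", f_nyq);
    mprintf("Resolution: %g cycles/hour (about 0.1 cycles/month)\n", f_res);
    // Both 1 cycle/hour and 1 cycle/month lie between f_res and f_nyq.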


The above implies I'll be doing an ~200k-point FFT. Since I'll be doing 
many of them, that will take a lot of time. [I'll be using Scilab 4.1 
under Windows XP.]

Second set of questions:
1. Would I gain significantly by using a power-of-two number of samples 
(i.e., 256k rather than 201k)?
2. Are there other lengths that might be beneficial? I seem to recall a 
discussion a while back suggesting that a number of points that is a 
product of small primes is advantageous.
3. Other options?
On Apr 23, 2:11 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> [snip]
> The above implies I'll be doing an ~200k-point FFT. Since I'll be doing
> many of them, that will take a lot of time. [I'll be using Scilab 4.1
> under Windows XP.]
>
> Second set of questions:
> 1. Would I gain significantly by using a power-of-two number of samples
> (i.e., 256k rather than 201k)?
> 2. Are there other lengths that might be beneficial? I seem to recall a
> discussion a while back suggesting that a number of points that is a
> product of small primes is advantageous.
> 3. Other options?
It is likely that powers of two will be faster. If you use FFTW for the
FFTs (see http://www.scilab.org/contrib/displayContribution.php?fileID=869),
the benchmarks on www.fftw.org are a useful guide.

John
On Apr 23, 2:11 pm, Richard Owlett <rowl...@atlascomm.net> wrote:
> Second set of questions:
> 1. Would I gain significantly by using a power-of-two number of samples
> (i.e., 256k rather than 201k)?
No, in general these days it is often significantly *slower* to pad up
to the next power of two, as long as you are starting with a size that
is factorizable into small primes (no prime factors greater than 7,
say), and assuming you use an FFT implementation that is well optimized
for non-power-of-two sizes.

(There is actually a speed *penalty* associated with power-of-two sizes
these days, thanks to limited cache associativity based on low-order
address bits, which makes power-of-two strides in memory the worst
case. On the other hand, FFT routines tend to be much more heavily
optimized for the power-of-two cases, since that's what most people use
by force of habit.)

For example, using our FFTW implementation (www.fftw.org) on a 3GHz
Core Duo in double precision (with Linux/gcc), some timing numbers for
(complex-data) FFTs of various sizes are:

N = 256k = 262144 --- 10.78 ms
N = 200k = 204800 --- 6.2 ms
N = 200000        --- 6.4 ms
N = 201k = 205824 --- 16.9 ms

Note that 201k is not such a good size because it has a relatively
large prime factor (67), but it is still not *that* much worse than
padding up to N = 256k! Note also that for purely real input data, it
will be faster by roughly a factor of two.
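As an aside, the "no prime factors greater than 7" rule is easy to
automate. A rough Scilab sketch (a hypothetical helper, not something
shipped with Scilab or FFTW) that finds the smallest such size >= nmin:

    // Hypothetical helper: smallest n >= nmin with no prime factor above 7.
    function n = next_smooth(nmin)
        n = nmin;
        done = %f;
        while ~done
            m = n;
            for p = [2 3 5 7]
                while modulo(m, p) == 0
                    m = m / p;       // divide out the small prime factors
                end
            end
            if m == 1 then
                done = %t;           // nothing but small factors remained
            else
                n = n + 1;           // try the next candidate size
            end
        end
    endfunction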
> 2. Are there other lengths that might be beneficial? I seem to recall a
> discussion a while back suggesting that a number of points that is a
> product of small primes is advantageous.
Usually, I would just recommend the smallest highly composite size
(prime factors <= 7) that your data fits into. If you want, and if you
have lots of transforms to do, then you could perform some benchmarking
to find the best size.

Regards,
Steven G. Johnson
On Apr 23, 9:07 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> No, in general these days it is often significantly *slower* to pad up
> to the next power of two, as long as you are starting with a size that
> is factorizable into small primes (no prime factors greater than 7,
> say), and assuming you use an FFT implementation that is well optimized
> for non-power-of-two sizes.
> [snip]
I stand corrected. It appears that for older machines one could
extrapolate a different conclusion, but the graphs don't go out to
200k.

John
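For sizes in this range, one could simply time the candidates directly
in Scilab. A rough sketch (this times the stock fft(), so the absolute
numbers will differ from the FFTW figures quoted above, but the
comparison between sizes is made the same way):

    // Rough benchmarking sketch: time a few candidate FFT sizes.
    sizes = [262144 204800 200000 205824];   // 256k, 200k, 200000, 201k
    for N = sizes
        x = rand(1, N);                      // real-valued test data
        timer();                             // reset the CPU-time counter
        for k = 1:5
            X = fft(x);                      // forward transform (result unused)
        end
        t = timer() / 5;                     // average CPU seconds per transform
        mprintf("N = %6d : %.4f s\n", N, t);
    end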