DSPRelated.com
Forums

Large FFT --- NOT radix-2

Started by Richard Owlett January 1, 2004
Any pointers to doing non-radix 2 FFT's.
Specifically I wish to do 44100 pt FFT's on 16 bit data.
Simplicity of coding OUTWEIGHS speed!
[ Scilab is slow but adequate ]

On Thu, 01 Jan 2004 18:39:31 -0600, Richard Owlett
<rowlett@atlascomm.net> wrote:

>Any pointers to doing non-radix 2 FFT's. >Specifically I wish to do 44100 pt FFT's on 16 bit data. >Simplicity of coding OUTWEIGHS speed! >[ Scilab is slow but adequate ]
Hi Richard, I'm ignorant of these schemes, but ya' might do an Internet search on "Prime Factor" and "Split-radix" FFTs. Good Luck, [-Rick-]
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vv9fams78p7438@corp.supernews.com...
> Any pointers to doing non-radix 2 FFT's. > Specifically I wish to do 44100 pt FFT's on 16 bit data. > Simplicity of coding OUTWEIGHS speed! > [ Scilab is slow but adequate ] >
Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421 seconds on a dual processor Pentium 200MHz. Fred
Fred Marshall wrote:

> "Richard Owlett" <rowlett@atlascomm.net> wrote in message > news:vv9fams78p7438@corp.supernews.com... > >>Any pointers to doing non-radix 2 FFT's. >>Specifically I wish to do 44100 pt FFT's on 16 bit data. >>Simplicity of coding OUTWEIGHS speed! >>[ Scilab is slow but adequate ] >> > > > Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421 > seconds on a dual processor Pentium 200MHz. > > Fred > >
I'm going to have a second look at the framework I was using. My Pentium 4 1.6 GHz seems to do it no faster than your machine. Well, I did do ~1800 of them in a batch ;} BTW, it's already been pointed out to me in another forum that I could make my life simpler by padding to 64k ;/
Richard Owlett wrote:
> Fred Marshall wrote: > >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message >> news:vv9fams78p7438@corp.supernews.com... >> >>> Any pointers to doing non-radix 2 FFT's. >>> Specifically I wish to do 44100 pt FFT's on 16 bit data. >>> Simplicity of coding OUTWEIGHS speed! >>> [ Scilab is slow but adequate ] >>> >> >> >> Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421 >> seconds on a dual processor Pentium 200MHz. >> >> Fred >> >> > > I'm going to have a second look at the framework I was using. My Pentium > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do > ~1800 of them in a batch ;} > > BTW, it's already been pointed out to me in another forum that I could > make my life simpler by padding to 64k ;/ > >
Just did a timing check using Scilab's timer(). My FFT's usually take .14 seconds with a few as fast as .125 seconds. Not the speed improvement I would expect. What's your OS? I'm using Winxp Pro SP1. Doing the timing pointed out another problem with my setup. It takes .94 seconds to move the 44k samples from the input data array to the array on which the FFT is performed.
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote: > > Fred Marshall wrote: > > > >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message > >> news:vv9fams78p7438@corp.supernews.com... > >> > >>> Any pointers to doing non-radix 2 FFT's. > >>> Specifically I wish to do 44100 pt FFT's on 16 bit data. > >>> Simplicity of coding OUTWEIGHS speed! > >>> [ Scilab is slow but adequate ] > >>> > >> > >> > >> Doesn't SCILAB just do them? I tried it and it seems to. It took
0.421
> >> seconds on a dual processor Pentium 200MHz. > >> > >> Fred > >> > >> > > > > I'm going to have a second look at the framework I was using. My Pentium > > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do > > ~1800 of them in a batch ;} > > > > BTW, it's already been pointed out to me in another forum that I could > > make my life simpler by padding to 64k ;/ > > > > > > Just did a timing check using Scilab's timer(). My FFT's usually take > .14 seconds with a few as fast as .125 seconds. Not the speed > improvement I would expect. What's your OS? I'm using Winxp Pro SP1. > > Doing the timing pointed out another problem with my setup. It takes > .94 seconds to move the 44k samples from the input data array to the > array on which the FFT is performed. >
Well early p4's (most p4's) aren't very good at floating point unless your code is recompiled for sse2. Athlon's do a lot better with x87 floating point. (athlon64 + 1 or 2 GB ram are looking like a real nice upgrade for a lot of things not just gaming) A p3 at 1.3GHz will beat a p4 1.6GHz on somethings about equal on others. A p4 should beat a dual pentium 200 on most things (should even beat dual pentium pro 200) How much memory do you have installed ? With xp of any sort 256MB is the absolute minimum for any serious sort of work. Especially as xp itself usually grabs around 128MB. Also what else is running in the background ? Did you kill off fast.exe which is the background loader for internet explorer. Been a while since I last used / ran scilab. Does scilab use atlas ? If so is it configured for your machine ? http://math-atlas.sourceforge.net/ I known matlab uses it for some things. Or have your recompiled scilab to tune it for the machine your on ? Shouldn't be necessary for most machines. Alex Gibson
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote: > > Fred Marshall wrote: > > > >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message > >> news:vv9fams78p7438@corp.supernews.com... > >> > >>> Any pointers to doing non-radix 2 FFT's. > >>> Specifically I wish to do 44100 pt FFT's on 16 bit data. > >>> Simplicity of coding OUTWEIGHS speed! > >>> [ Scilab is slow but adequate ] > >>> > >> > >> > >> Doesn't SCILAB just do them? I tried it and it seems to. It took
0.421
> >> seconds on a dual processor Pentium 200MHz. > >> > >> Fred > >> > >> > > > > I'm going to have a second look at the framework I was using. My Pentium > > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do > > ~1800 of them in a batch ;} > > > > BTW, it's already been pointed out to me in another forum that I could > > make my life simpler by padding to 64k ;/ > > > > > > Just did a timing check using Scilab's timer(). My FFT's usually take > .14 seconds with a few as fast as .125 seconds. Not the speed > improvement I would expect. What's your OS? I'm using Winxp Pro SP1. > > Doing the timing pointed out another problem with my setup. It takes > .94 seconds to move the 44k samples from the input data array to the > array on which the FFT is performed.
Richard, First, the test I did was structured like this: time=0 i=1 p(1:44100)=1 while i<=100 timer(); pf=fft(p,-1); time=time+timer() i=i+1 end So, there's no moving of arrrays in this code unless it's being done in the fft - it's just a simple timing loop. The machine I ran it on runs NT4sp6a. Well it *is* only a 200MHz machine! But it may be able to use both processors in this case - I don't know and would frankly be a little surprised if it did. Oh, but it appears that it is using both processors! So, either the speed difference would be 4 assuming you have a single processor machine and we're seeing 3.5 which is close enough to 4 for me. Maybe XP is using more resources than NT. I do note that 44100 breaks into a series of twice repeated primes 2,3,5,7 - so if you want to do something fancier, maybe you can take advantage of this - but I'm not at all sure it will be worth it. You *did* say that speed isn't the first issue. Just for fun, I tried decreasing the array size to 32768 -> 0.235 secs. 44100 -> 0.437 secs. 65536 -> 0.781 secs. I don't know, it looks like there must be some swapping going on for the largest array: 44100/32768 = 1.346 .437/.235 = 1.859 1.859/1.346 = 1.381 factor over linear increase 65536/44100 = 1.486 .781/.437 = 1.787 1.787/1.486 = 1.2 factor over linear increase 65536/32768 = 2 .781/.235 = 3.32 3.32/2 = 1.66 factor over linear increase. N*log2(N) @ 32768 = 32768*15 = 491,520 @ 44100 = 44100*15.43 = 680,463 (just for reference even though it's not a power of 2) @ 65536 = 65536*16 = 1,048,576 44/32 > 680,463/491,520 = 1.384 predicted vs. 1.859 measured; or 1.34 higher than predicted increase 65/44 > 1,048,576/680,463 = 1.54 predicted vs. 1.787 measured; or 1.16 higher than predicted increase 65/32 > 1,048,576/491,520 = 2.133 predicted vs. 3.32 measured; or 1.56 higher than predicted increase If we assume that 44100 would be less efficient, then it makes sense that 44/32 is higher than 65/44 doesn't it? (That is, if 44100 is less efficient it should be closer to 65K and further from 32K). What I don't figure yet is why 65/32 would be the worst situation. Perhaps it's because there's swapping going on with such a large array - I don't know, this system has 192MB of memory. Fred
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote: > > Fred Marshall wrote: > > > >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message > >> news:vv9fams78p7438@corp.supernews.com... > >> > >>> Any pointers to doing non-radix 2 FFT's. > >>> Specifically I wish to do 44100 pt FFT's on 16 bit data. > >>> Simplicity of coding OUTWEIGHS speed! > >>> [ Scilab is slow but adequate ] > >>> > >> > >> > >> Doesn't SCILAB just do them? I tried it and it seems to. It took
0.421
> >> seconds on a dual processor Pentium 200MHz. > >> > >> Fred > >> > >> > > > > I'm going to have a second look at the framework I was using. My Pentium > > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do > > ~1800 of them in a batch ;} > > > > BTW, it's already been pointed out to me in another forum that I could > > make my life simpler by padding to 64k ;/ > > > >
Richard, I forgot to address the padding to 64K. I don't see why you'd want to do that unless there performance was greatly affected or unless you need to zero pad for other reasons - like for circular convolution considerations. 44100 is a product of repeat numbers of primes: 2x2x3x3x5x5x7x7 so its performance might be expected to be OK if the implemenation is good. It appears it is in Scilab. On the other hand, try 4091 or 4093!! Here's the recent code I used. The runs times come out a little bit longer: function [rowtime,coltime]=fftloop(N,loops) i=1; rowtime=0; coltime=0; while i<=loops p=rand(1,N); timer() pf=fft(p,-1); rowtime=rowtime+timer(); i=i+1; end rowtime=rowtime/loops i=1; while i<=loops q=rand(N,1); timer(); pf=fft(q,-1); coltime=coltime+timer(); i=i+1; end coltime=coltime/loops Fred
r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
> Richard Owlett <rowlett@atlascomm.net> wrote: > >Any pointers to doing non-radix 2 FFT's. > >Specifically I wish to do 44100 pt FFT's on 16 bit data. > >Simplicity of coding OUTWEIGHS speed! > > I'm ignorant of these schemes, > but ya' might do an Internet search > on "Prime Factor" and "Split-radix" > FFTs.
The split-radix algorithm is essentially a combination of radices 2 and 4, and only works for power-of-two sizes. The Prime-Factor Algorithm (PFA) only decomposes a DFT into relatively prime factors (so e.g. it cannot further decompose the factor of 7^2 in 44100) and is fairly complicated to implement. What the original poster wants is a "mixed-radix" FFT, which is simply the generalized version of the standard Cooley-Tukey algorithm to arbitrary factorizations. Many implementations can be found online (see www.fftw.org/links.html) Of course, for spectral estimation you should also be doing windowing, etcetera, in order to reduce edge artifacts. Ditto if you are doing zero-padding. See e.g. Oppenheim and Schafer, "Digital Signal Processing". Cordially, Steven G. Johnson
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vv9fams78p7438@corp.supernews.com>...
> Any pointers to doing non-radix 2 FFT's. > Specifically I wish to do 44100 pt FFT's on 16 bit data. > Simplicity of coding OUTWEIGHS speed! > [ Scilab is slow but adequate ]
I'm not sure what form of solution you seek. A script? A program? A manual method? If you want to make a C program to accomplish the task, you could use my KISS FFT library ( http://sourceforge.net/projects/kissfft/ ), specifically designed for simplicity. If you know Python (great lang imho), you could use the FFT module, bundled with NumPy/Numeric. Another scripting alternative is octave, which mostly works like Matlab. Good Luck, Mark Borgerding