DSPRelated.com
Forums

Large FFT --- NOT radix-2

Started by Richard Owlett January 1, 2004
Many helpful people wrote:
> Richard Owlett <rowlett@atlascomm.net> wrote in message
> news:<vv9fams78p7438@corp.supernews.com>...
>
>> Any pointers to doing non-radix 2 FFT's.
>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>> Simplicity of coding OUTWEIGHS speed!
>> [ Scilab is slow but adequate ]
>
> [Snip many educational replies]
My problem was how I was getting the data on which the FFT would be done,
NOT how fast Scilab could do an FFT.  Mind if I don't admit how bad my
code was? ;)
Fred Marshall <fmarshallx@remove_the_x.acm.org> wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vv9fams78p7438@corp.supernews.com...
>
>> Any pointers to doing non-radix 2 FFT's.
>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>> Simplicity of coding OUTWEIGHS speed!
>> [ Scilab is slow but adequate ]
>
> Doesn't SCILAB just do them?  I tried it and it seems to.  It took 0.421
> seconds on a dual processor Pentium 200MHz.
P200:
octave:4> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.18392

A1600:
octave:5> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.022966
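(For readers following along today: the same quick timing experiment can be sketched in Python with NumPy, whose FFT likewise handles arbitrary, non-power-of-two lengths such as 44100.  Timings will of course vary with the machine.)

```python
import time
import numpy as np

# Same experiment as the Octave one-liner: a 44100-point FFT on random data.
x = np.random.randn(44100)

t0 = time.perf_counter()
w = np.fft.fft(x)          # works for any length, not just powers of two
elapsed = time.perf_counter() - t0

print(f"44100-point FFT: {elapsed:.6f} s, output length {len(w)}")
```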
Alex Gibson wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vvar7epmfqmuee@corp.supernews.com...
>
>> Richard Owlett wrote:
>>
>>> Fred Marshall wrote:
>>>
>>>> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
>>>> news:vv9fams78p7438@corp.supernews.com...
>>>>
>>>>> Any pointers to doing non-radix 2 FFT's.
>>>>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>>>> Simplicity of coding OUTWEIGHS speed!
>>>>> [ Scilab is slow but adequate ]
>>>>
>>>> Doesn't SCILAB just do them?  I tried it and it seems to.  It took
>>>> 0.421 seconds on a dual processor Pentium 200MHz.
>>>>
>>>> Fred
>>>
>>> I'm going to have a second look at the framework I was using.  My
>>> Pentium 4 1.6 GHz seems to do it no faster than your machine.  Well, I
>>> did do ~1800 of them in a batch ;}
>>>
>>> BTW, it's already been pointed out to me in another forum that I could
>>> make my life simpler by padding to 64k ;/
>>
>> Just did a timing check using Scilab's timer().  My FFT's usually take
>> .14 seconds, with a few as fast as .125 seconds.  Not the speed
>> improvement I would expect.  What's your OS?  I'm using WinXP Pro SP1.
>>
>> Doing the timing pointed out another problem with my setup.  It takes
>> .94 seconds to move the 44k samples from the input data array to the
>> array on which the FFT is performed.
>
> Well, early P4's (most P4's) aren't very good at floating point
> unless your code is recompiled for SSE2.
>
> Athlons do a lot better with x87 floating point.
>
> (Athlon64 + 1 or 2 GB RAM are looking like a real nice upgrade
> for a lot of things, not just gaming.)
>
> A P3 at 1.3 GHz will beat a P4 1.6 GHz on some things, and they're
> about equal on others.
>
> A P4 should beat a dual Pentium 200 on most things
> (should even beat a dual Pentium Pro 200).
>
> How much memory do you have installed?  With XP of any sort, 256 MB is
> the absolute minimum for any serious sort of work, especially as XP
> itself usually grabs around 128 MB.
1.25 Gig.  Knew I'd be wanting lots of memory when I bought the machine :}
> Also, what else is running in the background?
> Did you kill off fast.exe, which is the background loader for Internet
> Explorer?
It's not shown in Task Manager, and I use Mozilla in any case.  Timing
didn't change when I disabled the firewall and anti-virus software.
> Been a while since I last used / ran Scilab.
>
> Does Scilab use ATLAS?
I don't know.
> If so, is it configured for your machine?
> http://math-atlas.sourceforge.net/
> I know MATLAB uses it for some things.
>
> Or have you recompiled Scilab to tune it for the machine you're on?
> Shouldn't be necessary for most machines.
No. I don't have the tools.
> Alex Gibson
Well, 44100 is 3^2 * 7^2 * 10^2 (i.e., 2^2 * 3^2 * 5^2 * 7^2).  You
could use six FFT stages matching those factors (Winograd, Singleton,
and Rader give 3-, 5-, and 7-point FFTs); however, the code is likely
to be less simple than you want.  If you really don't care about speed,
you could just do a 44100-point DFT.  The code would be dirt simple,
although it may also be dog slow.
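The "dirt simple" direct DFT can be sketched in a few lines of Python/NumPy (a hypothetical helper, not anyone's posted code).  It is O(N^2), so at N = 44100 it is indeed dog slow, and the full N-by-N matrix below would not even fit in memory at that size; there you would compute it row by row instead:

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape((N, 1))
    W = np.exp(-2j * np.pi * k * n / N)   # the N x N DFT matrix
    return W @ x

# Any length works, powers of two or not; keep N small for the demo.
x = np.random.randn(210)
X = naive_dft(x)
print(np.allclose(X, np.fft.fft(x)))   # matches the library FFT
```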

Richard Owlett wrote:

> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]
--
Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930  Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety
deserve neither liberty nor safety."  -Benjamin Franklin, 1759
He could do a pair of prime-factor 210-point FFTs cascaded using the
mixed-radix algorithm, though (44100 = 210^2).  210 = 2*3*5*7, so for
each 210-point transform you could use the PFA.  The PFA is nice because
it eliminates the phase rotations between the smaller FFT kernels.  The
2-, 3-, 5-, and 7-point FFTs could be any of Winograd, Singleton, or
Rader.  A decent reference is Smith and Smith, "Handbook of Real-Time
Fast Fourier Transforms", IEEE Press.
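The twiddle-free property of the PFA can be illustrated with a small Good-Thomas sketch in Python (hypothetical demo code, using slow direct DFTs in place of Winograd/Singleton/Rader kernels): for N = N1*N2 with gcd(N1, N2) = 1, the index maps turn the length-N DFT into an N1-by-N2 two-dimensional DFT with no phase rotations between the stages.

```python
import cmath
import math

def direct_dft(x):
    """Slow O(N^2) DFT, standing in for the small optimized kernels."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def pfa_dft(x, N1, N2):
    """Good-Thomas prime-factor DFT for N = N1*N2, gcd(N1, N2) = 1."""
    N = N1 * N2
    assert len(x) == N and math.gcd(N1, N2) == 1
    K1 = pow(N2, -1, N1)                 # N2^{-1} mod N1 (Python 3.8+)
    K2 = pow(N1, -1, N2)                 # N1^{-1} mod N2
    # Input remap: n = (N2*n1 + N1*n2) mod N
    grid = [[x[(N2 * n1 + N1 * n2) % N] for n2 in range(N2)]
            for n1 in range(N1)]
    # Row DFTs (length N2), then column DFTs (length N1) -- no twiddles.
    rows = [direct_dft(row) for row in grid]
    cols = [direct_dft([rows[n1][k2] for n1 in range(N1)])
            for k2 in range(N2)]
    # Output remap (Chinese remainder theorem)
    X = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            X[(N2 * K1 * k1 + N1 * K2 * k2) % N] = cols[k2][k1]
    return X

# 441 = 9 * 49: the 7^2 factor cannot be split further by the PFA,
# which is why 44100 = 441 * 100 still needs a mixed-radix cascade.
x = [complex(i % 11, (3 * i) % 7) for i in range(441)]
assert max(abs(a - b) for a, b in zip(pfa_dft(x, 9, 49), direct_dft(x))) < 1e-6
```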

"Steven G. Johnson" wrote:

> r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
>
>> Richard Owlett <rowlett@atlascomm.net> wrote:
>>
>>> Any pointers to doing non-radix 2 FFT's.
>>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>> Simplicity of coding OUTWEIGHS speed!
>>
>> I'm ignorant of these schemes,
>> but ya' might do an Internet search
>> on "Prime Factor" and "Split-radix"
>> FFTs.
>
> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes.  The Prime-Factor
> Algorithm (PFA) only decomposes a DFT into relatively prime factors
> (so e.g. it cannot further decompose the factor of 7^2 in 44100) and
> is fairly complicated to implement.
>
> What the original poster wants is a "mixed-radix" FFT, which is simply
> the generalized version of the standard Cooley-Tukey algorithm to
> arbitrary factorizations.  Many implementations can be found online
> (see www.fftw.org/links.html).
>
> Of course, for spectral estimation you should also be doing windowing,
> etcetera, in order to reduce edge artifacts.  Ditto if you are doing
> zero-padding.  See e.g. Oppenheim and Schafer, "Digital Signal
> Processing".
>
> Cordially,
> Steven G. Johnson
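The mixed-radix (general Cooley-Tukey) recursion described above can be sketched compactly in Python -- a hypothetical illustration, not FFTW's implementation: split N = p*m on the smallest prime p, take p decimated sub-DFTs of length m, apply twiddle factors, and combine; prime lengths fall back to a direct DFT.

```python
import cmath

def smallest_prime_factor(n):
    i = 2
    while i * i <= n:
        if n % i == 0:
            return i
        i += 1
    return n

def fft_mixed_radix(x):
    """Mixed-radix Cooley-Tukey DFT for an arbitrary length N."""
    N = len(x)
    if N == 1:
        return list(x)
    p = smallest_prime_factor(N)
    if p == N:   # prime length: direct O(N^2) DFT (Rader would be smarter)
        return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)) for k in range(N)]
    m = N // p
    # p sub-DFTs of length m on the decimated sequences x[r], x[r+p], ...
    sub = [fft_mixed_radix(x[r::p]) for r in range(p)]
    X = [0j] * N
    for q in range(m):
        for s in range(p):
            k = q + m * s
            # Twiddle factors exp(-2j*pi*r*k/N) glue the stages together;
            # these are the phase rotations the PFA manages to avoid.
            X[k] = sum(sub[r][q] * cmath.exp(-2j * cmath.pi * r * k / N)
                       for r in range(p))
    return X

x = [complex(i % 5, i % 3) for i in range(210)]   # 210 = 2*3*5*7
ref = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 210)
           for n in range(210)) for k in range(210)]
assert max(abs(a - b) for a, b in zip(fft_mixed_radix(x), ref)) < 1e-6
```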
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:27cfb406.0401021313.4196aa02@posting.google.com...

> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes.
The following question is only loosely related to this comment.  Last
year you wrote:
> Actually, the minimal operation count (for powers of two) is achieved by
> something called "split-radix" that is a mixture of radix 2 and 4.  At
> each step, it breaks your size-n DFT into three pieces, one of size n/2
> and two of size n/4.
But it seems to me that split-radix requires 912 additions and 248
multiplications for n = 64, and 1160 isn't the minimal number of
operations.  Is there some paper you are thinking of that discusses a
modification of the standard split-radix algorithm that always attains
the minimal number of arithmetic operations (real adds + real
multiplies)?

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
James Van Buskirk wrote:
> But it seems to me that split-radix requires 912 additions and 248
> multiplications for n = 64, and 1160 isn't the minimal number of
> operations.
What do you think is the lowest known number of operations for an n=64 complex DFT, using what algorithm?
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:bunj9a$hts$1@news.fas.harvard.edu...

> James Van Buskirk wrote:
>
>> But it seems to me that split-radix requires 912 additions and 248
>> multiplications for n = 64, and 1160 isn't the minimal number of
>> operations.
>
> What do you think is the lowest known number of operations for an n=64
> complex DFT, using what algorithm?
The best I know of is 912 additions and 240 multiplications; I just
figured out how to achieve this result yesterday.  The algorithm is the
same as used for all my computational kernels (heavy use of
real-half-complex DFTs); I just became aware of another possible trick.
The lowest known by anyone is a different question.  The more you look
at DFTs, the more uncertain you become that what you or anyone else does
is optimal, even if you get to write the criteria of optimality
yourself.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
James Van Buskirk wrote:
>> James Van Buskirk wrote:
>>
>>> But it seems to me that split-radix requires 912 additions and 248
>>> multiplications for n = 64, and 1160 isn't the minimal number of
>>> operations.
>
> The best I know of is 912 additions and 240 multiplications; just
> figured out how to achieve this result yesterday.
Please do let me know when you publish your results, as I try to keep
abreast of the latest FFT developments.

The arithmetic complexity of FFT algorithms was extensively studied in
the 1980's, when it was still considered to be a good proxy for speed
(sadly no longer the case).  As of 1991 [1], the lowest-known operation
count for an n=64 complex DFT was 1160 flops, a record that had been set
in 1968 by Yavne [2] and was later matched, by a simpler algorithm, with
the split-radix technique.  For N=2^m, this record is:

    flops = 4 N log2(N) - 6 N + 8

(This is asymptotically 20% fewer flops than for vanilla radix 2.  Note
that you can trade off some multiplies for additions by using the
3-adds + 3-multiplies complex-multiply trick.)

Actually, it has supposedly been rigorously proved [1] that split-radix
achieves the minimal number of complex additions.  There is also a
rigorous lower bound [1] on the number of non-trivial complex (or real)
multiplications, which split-radix can achieve up to N=16 (and it is
close for larger N).  For larger N, there are known algorithms that
achieve the minimal number of multiplies, but they do not have fewer
total flops than split-radix, and have more flops for N > 64.  So, the
minimal achievable total flop count has not yet been rigorously proven.

> The algorithm is
> the same as used for all my computational kernels (heavy use of
> real-half-complex DFTs); I just became aware of another possible trick.
(There is the known technique of taking the real-input FFTs of the real
and imaginary parts, and then combining them with (2N-4) additions.
However, if you use the lowest-known operation counts for real-input
FFTs [3], this gives exactly the same known flop count for the complex
DFT: e.g., 1160 flops for n=64 from two 518-flop n=64 real-input FFTs.)

Cordially,
Steven G. Johnson

[1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial
review and a state of the art," Signal Processing 19, 259-299 (1990).

[2] R. Yavne, "An economical method for calculating the discrete Fourier
transform," Proc. AFIPS 33, 115-125 (1968).

[3] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus,
"Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust.
Speech Sig. Processing ASSP-35, 849-863 (1987).
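As a quick sanity check of the record discussed in this post, the split-radix flop count flops = 4 N log2(N) - 6 N + 8 can be tabulated in a few lines of Python (it gives 1160 at N = 64, matching Yavne's 1968 figure):

```python
import math

def split_radix_flops(N):
    """Real-arithmetic count for a size-N complex DFT (N a power of two),
    per the split-radix/Yavne record quoted from Duhamel & Vetterli."""
    m = int(round(math.log2(N)))
    assert 2 ** m == N, "N must be a power of two"
    return 4 * N * m - 6 * N + 8

for N in (16, 32, 64, 128, 1024):
    print(N, split_radix_flops(N))   # N = 64 gives 1160
```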
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:4010424d$0$564$b45e6eb0@senator-bedfellow.mit.edu...

> Please do let me know when you publish your results, as I try to keep
> abreast of the latest FFT developments.
Well, I would just post the code for n = 64, but it's a little long --
35937 bytes.  Is this objectionably long for this group?

The trick that made the difference for n = 64 would also improve the
number of multiplies for some of my smaller modules.  I'm not sure at
this point whether I want to rewrite them all (many take a day each) or
try to create a code generator that can do what I do, in the short time
available to me just now to ponder this problem again.
> Actually, it has supposedly been rigorously proved [1] that split-radix
> achieves the minimal number of complex additions.
My code is structured for minimum-add so I am not challenging the above assertion.
> [1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial
> review and a state of the art," Signal Processing 19, 259-299 (1990).
> [2] R. Yavne, "An economical method for calculating the discrete Fourier
> transform," Proc. AFIPS 33, 115-125 (1968).
> [3] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus,
> "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust.
> Speech Sig. Processing ASSP-35, 849-863 (1987).
Thanks for the comments and the references.  Let's see... I've read [1],
and it is highly recommended.  [2] just happened to be sitting on my
desk as I read this, but I'm not sure if I've seen [3].

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end