comp.dsp | use of radix-2 ffts| page 2

Reply by Andrew Reilly ●October 10, 20032003-10-10

On Fri, 10 Oct 2003 21:33:08 +0000, Greg Berchin wrote:

> On Fri, 10 Oct 2003 19:05:42 GMT, "James Van Buskirk"
> <not_valid@comcast.net> wrote:
> 
>>>The increase in resource usage for radix-2 FFTs is not always
>>>small.  The number of cache or memory reads and writes goes
>>>up significantly compared to, for example, radix-8 FFTs.  
> 
> Well, the original question had to do with radix-4 vs. radix-2.
> Nonetheless, though a radix-8 FFT may provide better resource
> utilization in a particular implementation, if the user needs a
> 16384-point FFT then they are out of luck.
> 
>>>On a processor where these are the limiting resource radix-2 
>>>FFTs are a bad idea, e.g. Alpha 21164.
> 
> As I said, "The answer to your question depends upon the 
> specific circumstances."

I agree entirely.  If you consider the other extreme: a classical DSP with
zero-wait-state, multiple access on-chip memory and a MAC unit, the
"fancy" algorithms that avoid multiplies don't help at all.  Radix 2 can
be the best approach.

-- 
Andrew

Reply by James Van Buskirk ●October 11, 20032003-10-11

"Greg Berchin" <Bank-to-Turn@mchsi.com> wrote in message
news:ot8eovgv6tan2rs6t0i90k1f8skjn55jft@4ax.com...

> Well, the original question had to do with radix-4 vs. radix-2.
> Nonetheless, though a radix-8 FFT may provide better resource
> utilization in a particular implementation, if the user needs a
> 16384-point FFT then they are out of luck.

I don't understand the comment about 16384-point FFTs.  Does using
a radix-8 FFT to carry out most of the work for such an FFT mean
that one is honor-bound never to break up the 14 stages into 4
stages of radix-8 and one stage of radix-4, or something?  In my
power of 2 full complex FFT code, the program tries to do as much
work as possible via the radix-8 workhorse but obviously to handle
all integer powers of 2 it also has radix-4 and radix-2 stages
available to it that it utilizes as necessary when the command
generator component decides.  Works OK for what it was designed
for and is more efficient than the CXML FFT for all powers of 2
through 4194304 except for 8192 IIRC on my 21164.

-- 
write(*,*) transfer((/17.392111325966148d0,3.6351694777236872d228, &
6.0134700169991705d-154/),(/'x'/)); end

Reply by Greg Berchin ●October 11, 20032003-10-11

On Sat, 11 Oct 2003 07:09:14 GMT, "James Van Buskirk"
<not_valid@comcast.net> wrote:

>>I don't understand the comment about 16384-point FFTs.  Does using
>>a radix-8 FFT to carry out most of the work for such an FFT mean
>>that one is honor-bound never to break up the 14 stages into 4
>>stages of radix-8 and one stage of radix-4, or something?  

No, not at all.  That is one way to perform a 16384 FFT using
radix-8.  My point was addressed to the situation in which only
ONE radix is implemented.  If you have nothing but radix-8
butterflies in your set of primitives, then 16384 is unreachable.

Greg Berchin

Reply by Steven G. Johnson ●October 12, 20032003-10-12

James Van Buskirk wrote:
> The increase in resource usage for radix-2 FFTs is not always
> small.  The number of cache or memory reads and writes goes
> up significantly compared to, for example, radix-8 FFTs.  On
> a processor where these are the limiting resource radix-2 FFTs
> are a bad idea, e.g. Alpha 21164.

I would go further and say that this is true for essentially every 
modern general-purpose CPU (not including things like DSP chips, as 
another poster pointed out).  I don't recall seeing *any* pure radix-2 
implementation that wasn't typically several times slower than the 
fastest available FFT code for such a CPU (see benchmarks at 
www.fftw.org/benchfft).

Although if you really care about cache (as you should, especially for 
transforms of a few thousand points or more), there are a lot of 
additional tricks besides just using higher radices.  [For one thing, we 
(fftw.org) have argued that the traditional 1965 non-recursive style of 
implementation is a bad idea.]

(And for small sizes where cache isn't as important, there are other 
equally compelling reasons not to do radix-2, having to do with the CPU 
pipeline etc.  It's not just a matter of finding a radix that "fits" in 
the registers.  Modern x86 processors have a lot more registers 
internally than are exposed by the instruction set, and moreover it is 
possible to some extent to simulate a larger register set with a smaller 
one thanks to the way most CPU pipelines work.)

Things have changed a lot since the 1980's, when the authors of 
Numerical Recipes could write that higher radices etc. "can be faster 
than simpler FFTs by some significant, but not overwhelming, factor, 
e.g., 20 or 30 percent" (this is only true if speed == # of 
floating-point operations).  Unfortunately, they and many other authors 
have not updated their books to reflect this.

Cordially,
Steven G. Johnson

Previous 12Next

use of radix-2 ffts

Sign in

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group