Reply by Rune Allnor September 8, 2005
James Van Buskirk wrote:
> "Rune Allnor" <allnor@tele.ntnu.no> wrote in message > news:1126089385.367928.304160@g14g2000cwa.googlegroups.com... > > > I don't know what you mean by comp.dsp being "orthodox" > > Well, start off with the fact that in a thread about fast > convolution, your post refers almost exclusively to radix- > 2 FFTs.
Hmmm.... OK, I could agree it might be due to orthodoxy. But then, Occam's Razor is a very good principle in explanation and teaching: don't use a more complex example than necessary to communicate the essence of whatever question is under debate. Most of the time the radix-2 FFT is discussed because that's what people have heard of, and that's what they read about in the textbooks. Most of the time the elaborate variations obfuscate the issue rather than clarify it. And sometimes those elaborate details are exactly what is crucial to understand in order to get on.

Rune
Reply by James Van Buskirk September 7, 2005
"Rune Allnor" <allnor@tele.ntnu.no> wrote in message
news:1126089385.367928.304160@g14g2000cwa.googlegroups.com...

> I don't know what you mean by comp.dsp being "orthodox"
Well, start off with the fact that in a thread about fast
convolution, your post refers almost exclusively to radix-2 FFTs.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
Reply by Rune Allnor September 7, 2005
stevenj@alum.mit.edu wrote:
> Rune Allnor wrote:
> > The radix-2 FFT is probably the fastest FFT [where] run-time is
> > concerned.
>
> Um, this is not even close to true, in my experience. Unless by
> "radix-2" you just mean "any FFT for 2^n", which is not the standard
> terminology.
>
> > There are prime-factor FFTs around that are more
> > general, but probably not quite as fast as the radix-2 version.
>
> I think you mean mixed-radix FFTs (which don't require prime factors,
> by the way). The Prime-Factor Algorithm is *less* general, because it
> requires its factors to be relatively prime (e.g. it doesn't work for
> 2^n.)
>
> Keeping your data within factors of 2/3/5 is generally faster, given a
> good mixed-radix code, than restricting yourself to 2^n, if the latter
> would require you to significantly increase the length of your array
> beyond what you need (e.g. you may need to almost double the length).
> The situation is even more dramatic in 2d and 3d, where adjacent
> power-of-two sizes differ by factors of 4 and 8, respectively.
>
> (Note that the comparison is muddied by the fact that there is a cycle:
> most people are ingrained to use 2^n, so 2^n FFTs get most of the
> optimization effort, so most people use 2^n, so...)
>
> Steven
OK, I got the details wrong. Still, most users of an FFT package would not be concerned about the possibility of the FFT going a few milliseconds faster if the FFT lengths comprised certain factors. Only the really demanding users would care, and I don't think there are very many such users.

Rune
Reply by September 7, 2005
Rune Allnor wrote:
> The radix-2 FFT is probably the fastest FFT [where] run-time is
> concerned.
Um, this is not even close to true, in my experience. Unless by "radix-2" you just mean "any FFT for 2^n", which is not the standard terminology.
> There are prime-factor FFTs around that are more
> general, but probably not quite as fast as the radix-2 version.
I think you mean mixed-radix FFTs (which don't require prime factors, by the way). The Prime-Factor Algorithm is *less* general, because it requires its factors to be relatively prime (e.g. it doesn't work for 2^n.)

Keeping your data within factors of 2/3/5 is generally faster, given a good mixed-radix code, than restricting yourself to 2^n, if the latter would require you to significantly increase the length of your array beyond what you need (e.g. you may need to almost double the length). The situation is even more dramatic in 2d and 3d, where adjacent power-of-two sizes differ by factors of 4 and 8, respectively.

(Note that the comparison is muddied by the fact that there is a cycle: most people are ingrained to use 2^n, so 2^n FFTs get most of the optimization effort, so most people use 2^n, so...)

Steven
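To make the 2/3/5 point concrete: with a modern library you can simply ask for the next "fast" composite length instead of the next power of two when padding for fast convolution. A small, purely illustrative Python sketch follows; scipy and the lengths used here are my own assumptions, not anything from this thread.

  import numpy as np
  from scipy import fft                       # scipy.fft provides next_fast_len

  x = np.random.randn(70000)                  # hypothetical signal
  h = np.random.randn(4097)                   # hypothetical FIR

  n_lin = len(x) + len(h) - 1                 # length needed for linear convolution
  n_smooth = fft.next_fast_len(n_lin)         # next small-prime-factor length >= n_lin
  n_pow2 = 1 << (n_lin - 1).bit_length()      # next power of two >= n_lin

  # n_smooth is usually much closer to n_lin than n_pow2 is, so the padded
  # transforms do noticeably less work for the same result:
  y = fft.irfft(fft.rfft(x, n_smooth) * fft.rfft(h, n_smooth), n_smooth)[:n_lin]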
Reply by Rune Allnor September 7, 2005
James Van Buskirk wrote:
> <stevenj@alum.mit.edu> wrote in message
> news:1125943989.790065.231790@o13g2000cwo.googlegroups.com...
>
> > Of course. This is why, for computational purposes, I've always felt
> > that the best normalization for FFTs is no normalization at all---any
> > overall scale factor can, in most practical applications, be absorbed
> > elsewhere at little or no cost (e.g. in the convolution kernel).
>
> But continuing along these lines is dangerous because it can lead
> to computing convolutions in fewer operations than would be the case
> if the FFT and its inverse (or transpose) were used, as in
>
> http://home.comcast.net/~kmbtib/conv2c.f90
>
> and this would violate the orthodoxy constraints of the current
> forum...
I don't know what you mean by comp.dsp being "orthodox", but as a general rule, an improvement of a general algorithm takes advantage of some particular property of some subset of the measurements, and the improvement comes at the expense of the general usefulness of the algorithm.

The radix-2 FFT is probably the fastest FFT as far as run-time is concerned. There are prime-factor FFTs around that are more general, but probably not quite as fast as the radix-2 version. Does that mean that the radix-2 FFT is a better tool than the general FFT? It depends on the cost of restricting the data to be of length 2^N for some integer N. Sometimes that restriction is more expensive, in some way other than run-time, than using a general-length FFT.

Are people who prefer the general FFT "orthodox" because they use a tool with more flexibility? You tell me.

Rune
Reply by robert bristow-johnson September 6, 2005
stev...@alum.mit.edu wrote:
> James Van Buskirk wrote:
> > But continuing along these lines is dangerous because it can lead
> > to computing convolutions in fewer operations than would be the case
> > if the FFT and its inverse (or transpose) were used [...]
> > and this would violate the orthodoxy constraints of the current
> > forum...
>
> James, you know that I respect your work on reducing op counts, but the
> general idea that a specialized convolution algorithm can be more
> efficient (either in arithmetic counts or by other measures) than a
> general-purpose FFT/IFFT pair has been known for decades and is not
> especially controversial. However, this kind of specialization usually
> comes at the cost of generality.
>
> The classic example, of course, is the idea that one can eliminate
> bit-reversal passes if you are just doing a convolution (or, even if
> your FFT doesn't have a separate bit-reversal "pass", a lack of
> re-ordering can have other benefits). In principle, such a permuted
> FFT can be used for just about any other FFT application as well...you
> just have to absorb the permutation into other parts of the
> computation. Requiring the user to deal with a permuted ordering in
> general, however, imposes nontrivial additional coding complexity as
> well as (depending on how the user implements it) a possibly
> non-negligible time as well (changing memory access patterns is not
> free).
i dunno what James's issue is (you might have had some discussion that i wasn't watching), but one idea i used (in the olden days when i actually did some work in fast convolution) was to have two different routines for the FFT and iFFT, one based on decimation-in-frequency (that required normal ordering in and yielded bit-reversed ordering going out) for the FFT and decimation-in-time (bit-reversed input, normal output) for the iFFT. (the transfer function had to be pre-bit-reversed.) the extra code was very, very small and they could use the same twiddle factor table. that saved in the bit reverse process (by avoiding it) since memory accesses were pretty slow. the only other trick was that it was written so that multiplication by 1 or by j was avoided. other than that, it was a pretty vanilla flavored radix-2 FFT.

--
r b-j
rbj@audioimagination.com

"Imagination is more important than knowledge."
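For anyone who hasn't seen the trick r b-j describes, here is a rough reconstruction of the idea, purely for illustration (my own sketch in Python/numpy, assuming power-of-two lengths; it is not r b-j's code): a radix-2 decimation-in-frequency forward pass (natural order in, bit-reversed order out) paired with a decimation-in-time inverse pass (bit-reversed in, natural out), with the pre-bit-reversed H[k] carrying all of the scaling, so no bit-reversal pass and no separate 1/N ever appears.

  import numpy as np

  def fft_dif(x):
      # radix-2 decimation-in-frequency: natural-order in, bit-reversed-order out, unscaled
      x = np.asarray(x, dtype=complex).copy()
      N = len(x)                              # N must be a power of two
      span = N
      while span > 1:
          half = span // 2
          w = np.exp(-2j * np.pi * np.arange(half) / span)
          for s in range(0, N, span):
              a = x[s:s + half].copy()
              b = x[s + half:s + span]
              x[s:s + half] = a + b
              x[s + half:s + span] = (a - b) * w
          span //= 2
      return x

  def ifft_dit(X):
      # radix-2 decimation-in-time: bit-reversed-order in, natural-order out, no 1/N
      X = np.asarray(X, dtype=complex).copy()
      N = len(X)
      span = 2
      while span <= N:
          half = span // 2
          w = np.exp(+2j * np.pi * np.arange(half) / span)
          for s in range(0, N, span):
              a = X[s:s + half].copy()
              b = X[s + half:s + span] * w
              X[s:s + half] = a + b
              X[s + half:s + span] = a - b
          span *= 2
      return X

  N = 16
  h = np.random.randn(N)                      # hypothetical FIR block
  x = np.random.randn(N)                      # hypothetical input block
  H = fft_dif(h) / N                          # pre-bit-reversed transfer function, 1/N folded in
  y = ifft_dit(fft_dif(x) * H).real           # circular convolution, no bit-reversal pass anywhere
  assert np.allclose(y, np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real)

Because both spectra come out in the same bit-reversed order, the pointwise product is consistent, and the DIT inverse undoes the permutation for free; both passes can also share one twiddle table, as r b-j notes.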
Reply by September 6, 2005
James Van Buskirk wrote:
> But continuing along these lines is dangerous because it can lead
> to computing convolutions in fewer operations than would be the case
> if the FFT and its inverse (or transpose) were used [...]
> and this would violate the orthodoxy constraints of the current
> forum...
James, you know that I respect your work on reducing op counts, but the general idea that a specialized convolution algorithm can be more efficient (either in arithmetic counts or by other measures) than a general-purpose FFT/IFFT pair has been known for decades and is not especially controversial. However, this kind of specialization usually comes at the cost of generality.

The classic example, of course, is the idea that one can eliminate bit-reversal passes if you are just doing a convolution (or, even if your FFT doesn't have a separate bit-reversal "pass", a lack of re-ordering can have other benefits). In principle, such a permuted FFT can be used for just about any other FFT application as well...you just have to absorb the permutation into other parts of the computation. Requiring the user to deal with a permuted ordering in general, however, imposes nontrivial additional coding complexity as well as (depending on how the user implements it) a possibly non-negligible time as well (changing memory access patterns is not free).

For a *general-purpose* FFT routine, the most I feel justified in pushing onto the user is a trivial overall scale factor, largely because (unlike the output ordering or a per-output scaling) there is no general agreement on what the "right" scale factor should be. (Even so, for things like Matlab a normalized fft/ifft pair is more appropriate.)

Steven
Reply by James Van Buskirk September 6, 2005
<stevenj@alum.mit.edu> wrote in message
news:1125943989.790065.231790@o13g2000cwo.googlegroups.com...

> Of course. This is why, for computational purposes, I've always felt
> that the best normalization for FFTs is no normalization at all---any
> overall scale factor can, in most practical applications, be absorbed
> elsewhere at little or no cost (e.g. in the convolution kernel).
But continuing along these lines is dangerous because it can lead to computing convolutions in fewer operations than would be the case if the FFT and its inverse (or transpose) were used, as in

http://home.comcast.net/~kmbtib/conv2c.f90

and this would violate the orthodoxy constraints of the current forum...

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
Reply by September 5, 2005
> strictly speaking, i guess you're right, Steven, but since
> the transfer function, H[k] = DFT{ h[n] } (where h[n] is the
> FIR) is calculated in advance and is constant for fast
> convolution, you can deal with all of the scaling in H[k]
> and you don't need any 1/N anywhere, in either the
> forward or inverse FFT.
Of course. This is why, for computational purposes, I've always felt that the best normalization for FFTs is no normalization at all---any overall scale factor can, in most practical applications, be absorbed elsewhere at little or no cost (e.g. in the convolution kernel).

Steven
Reply by robert bristow-johnson September 5, 2005
in article 1125910850.954658.133700@g14g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/05/2005 05:00:

> stevenj@alum.mit.edu wrote:
>> Using 1/sqrt(N) will *not* give you the
>> correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
>> and it *does* matter whether you put the 1/N in the FFT or the IFFT (it
>> should go in the IFFT).
strictly speaking, i guess you're right, Steven, but since the transfer function, H[k] = DFT{ h[n] } (where h[n] is the FIR) is calculated in advance and is constant for fast convolution, you can deal with all of the scaling in H[k] and you don't need any 1/N anywhere, in either the forward or inverse FFT.
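A tiny numpy sketch of that bookkeeping, purely for illustration (the sizes are made up): fold the 1/N into the precomputed H[k] and use unnormalized transforms both ways; since np.fft.ifft would quietly reintroduce a 1/N, the unnormalized inverse below is obtained by conjugation.

  import numpy as np

  N = 8
  h = np.random.randn(N)                      # hypothetical FIR
  x = np.random.randn(N)                      # hypothetical input block

  H = np.fft.fft(h) / N                       # all of the scaling lives in H[k]
  X = np.fft.fft(x)                           # unnormalized forward FFT
  y = np.conj(np.fft.fft(np.conj(X * H)))     # unnormalized inverse DFT via conjugation

  # same answer as the conventionally scaled FFT/IFFT pair
  assert np.allclose(y, np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))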
>> The reason is that the normalization factor of
>> the FFT is squared in this expression while that of the IFFT is not.
>
> I don't understand what you mean by "squared FFT"?
>
>> This is a good example of why the idea that there is only one "proper"
>> normalization of the DFT is misguided, I think.
>
> Agreed. I have checked my books, and all DSP books define the
> N-point DFT/IDFT pair as
>
>         N-1
>  X[k] = sum x[n] exp(-j2*pi*n*k/N)                 [1.a]
>         n=0
>
>         1 N-1
>  x[n] = - sum X[k] exp( j2*pi*n*k/N)               [1.b]
>         N k=0
as some others have recently pointed out, this seemed bass-ackwards to me, too. at least at first. ostensibly, the most natural definition of the DFT would start with:

         N-1
  x[n] = sum X[k] exp(+j*2*pi*n*k/N)
         k=0

resulting in:

         1 N-1
  X[k] = - sum x[n] exp(-j*2*pi*n*k/N)
         N n=0

this definition is nice to compare the DFT to Fourier Series but the problem with it is that, for circular convolution (in the "time domain"), it results in:

  DFT{ h[n] (*) x[n] } = N * H[k] * X[k]

(*) means "discrete convolution"

so you gotta remember dragging that N into this (usually by scaling H[k]) whereas for the "conventional" definition

  DFT{ h[n] (*) x[n] } = H[k] * X[k]

no "N" factor, at least for "time domain" circular convolution. you'll get it for

  iDFT{ H[k] (*) X[k] } = N * h[n] * x[n]

when using the "conventional" definition of DFT and iDFT. all this is just housekeeping.
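A quick numerical check of the bookkeeping above, as a purely illustrative numpy sketch (nothing here beyond the two conventions just described):

  import numpy as np

  N = 8
  h = np.random.randn(N)
  x = np.random.randn(N)

  # circular ("time domain") convolution, computed directly from the definition
  circ = np.array([sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)])

  # conventional definition (no 1/N in the forward DFT): no N in the convolution theorem
  assert np.allclose(np.fft.fft(circ), np.fft.fft(h) * np.fft.fft(x))

  # "natural"-looking definition with the 1/N on the forward transform: the N shows up
  F = lambda v: np.fft.fft(v) / N
  assert np.allclose(F(circ), N * F(h) * F(x))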
> With this convention, Parseval's identity is just about the only
> formula that needs to take the number of points into account,
>
>  N-1              1 N-1
>  sum |x[n]|^2  =  - sum |X[k]|^2                   [2]
>  n=0              N k=0
>
> I saw somewhere that Gauss himself had developed something similar
> to the FFT some 200 years ago, so it makes sense to minimize the
> use of scale factors etc, in these formulas.
the only way to do that for the DFT/iDFT is to use the 1/sqrt(N) factor as you say so below.
> If you look at the Fourier transform, however, the use of different
> scaling factors is more easily observed. There is a classic book
> on the FT,
>
>   Papoulis: "The Fourier Integral and its Applications"
>             McGraw-Hill, 1962
>
> where the FT/IFT pair is defined as
>
>         +inf
>  X(w) = integral x(t) exp(-jwt) dt                 [2.a]
>         -inf
>
>         1    +inf
>  x(t) = ---- integral X(w) exp(jwt) dw             [2.b]
>         2*pi -inf
with a slight mod of this definition:

          +inf
  X(jw) = integral{ x(t) exp(-jwt) dt}
          -inf

          1    +inf
   x(t) = ---- integral{ X(jw) exp(jwt) dw}
          2*pi -inf

you get one that is identical to the two-sided Laplace Transform with the substitution of s -> jw. sometimes that's nice.
> In a mathematics textbook,
>
>   Kreyzig: "Advanced Engineering Mathematics", 6th ed.,
>            Wiley, 1988, p 632
i gotta earlier version of that one. 1975 i think.
> the definitions are
>
>         1         +inf
>  X(w) = --------- integral x(t) exp(-jwt) dt       [3.a]
>         sqrt(2pi) -inf
>
>         1         +inf
>  x(t) = --------- integral X(w) exp(jwt) dw        [3.b]
>         sqrt(2pi) -inf
here is a nice, virtually isometric definition that you see commonly in EE communications textbooks (e.g. Carlson):

         +inf
  X(f) = integral{ x(t) * exp(-j*2*pi*f*t) dt}
         -inf

         +inf
  x(t) = integral{ X(f) * exp(+j*2*pi*f*t) df}
         -inf

like the Kreyzig def, no funky 2*pi or sqrt(2*pi) factors to worry about in either use of Parseval's Theorem or the Duality Theorem, but this has better normalization properties to get the DC component or power of the x(t) (the Kreyzig def has ugly sqrt(2*pi) factors). in addition it's Hz and not rad/sec. this is IMO the cleanest definition.
> Other texts still, switch the signs in the exp() terms of the
> FT and IFT.
i've only seen that in the context of the so-called "Characteristic Function" of a p.d.f. of a random variable. i wish we could get everyone to agree on the definition of both the continuous and discrete Fourier Transform, and i want to dictate the universal definition. :-)
> As for why I prefer to think in terms of the 1/sqrt(N)
> normalization, I like to think of the DFT/IDFT as a unitary
> transform pair, defined as
>
>  X = Wx                                            [4.a]
>  x = W^H X                                         [4.b]
>
> where x and X are Nx1 vectors and W is an NxN unitary matrix,
>
>  W[n,k] = 1/sqrt(N) exp(-j2*pi*n*k/N)              [5]
>
> With this convention, we find
>
>  W^H W = I                                         [6.a]
>  W W^H = I                                         [6.b]
>
> and so the DFT preserves the norm and all that. The problem with
> the definition [5] is that one needs to drag that 1/sqrt(N)
> around everywhere, which is cumbersome analytically, as well as
> numerically expensive.
depends on what you're doing. in a MATLAB-like environment, perhaps you wanna drag it around and pay for the scaling expense and not have to think about it. in a real-time fast convolution application, i would leave the 1/N out of *both* forward and inverse FFT and put all of the necessary scaling into H[k]. why not?
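For what it's worth, the unitary convention quoted above is easy to sanity-check numerically; a tiny, purely illustrative numpy sketch of [5], [6.a] and [6.b]:

  import numpy as np

  N = 8
  n = np.arange(N)
  W = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # W[n,k] from [5]

  assert np.allclose(W.conj().T @ W, np.eye(N))               # [6.a]
  assert np.allclose(W @ W.conj().T, np.eye(N))               # [6.b]

  x = np.random.randn(N)
  assert np.isclose(np.linalg.norm(W @ x), np.linalg.norm(x)) # norm is preserved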
> So there are several very good practical reasons for doing things
> the way they are done now, and have been for at least four decades,
namely to reduce confusion. but i don't agree with that for everything in general. Mathworks still should fix their indexing deficit in MATLAB. even if MATLAB has been around for a couple of decades and people are getting used to finding their DC component in X(1). there is still no excuse for it.

--
r b-j
rbj@audioimagination.com

"Imagination is more important than knowledge."