DSPRelated.com

Fast Convolution: does it matter whether I normalize to block length (N) in the FFT versus in the iFFT?

Started by Ron Gerhards August 31, 2005
Ron Gerhards wrote:
>>>>> More importantly, wouldn't the result you get in doing fast
>>>>> convolution differ depending on where the FFT/IFFT 1/N
>>>>> normalization was done?
>>>>
>>>> Formally, no, it does not matter where the division is made. Both
>>>> the FFT and IFFT are linear operators, so any constant scaling
>>>> factor can be factored out from the operation, ending in an
>>>> overall 1/N factor for the FFT/IFFT combo.
>>>
>>> Right, but when I'm doing fast convolution, I'm multiplying the
>>> result of 2 FFT'd signals (input and filter). If my 1/N
>>> normalization was in the FFT, I would get a factor of (1/N) x (1/N)
>>> in the result (and the iFFT in this case would do no normalizing).
>>> If my 1/N normalization was in the iFFT, I would get only a factor
>>> of (1/N) due to the iFFT. Does that make sense? I realize it would
>>> only be a scale factor, but if true, the end result of the fast
>>> convolution would be a much smaller signal in the case where my
>>> normalization is in the iFFT. I think.
>>
>> What you FFT is a convolution of x[n] and h[n]:
>>
>>   Y[k] = FFT{x[n](*)h[n]} = 1/sqrt(N) FFT'{x[n](*)h[n]}
>>
>> where FFT'{} is the "unproperly scaled" version of the FFT.
>> The 1/sqrt(N) factor appears only once. I think...
>>
>> Rune
>
> I'm not sure what you mean by the "unproperly scaled" version of the
> FFT. Perhaps let me put the question this way:
>
> Let FFT' be the FFT without ANY normalization to N.
This is what I called the "unproperly scaled FFT".
> Let iFFT' be the iFFT without ANY normalization to N.
>
> In the case where the normalization to N occurs in the iFFT, I have:
>
>   y1[n] = i[n] * h[n]
>         = (1/N) x iFFT'( FFT'(i) x FFT'(h) )
>
> In the case where the normalization to N occurs in the FFT, I have:
>
>   y2[n] = i[n] * h[n]
>         = iFFT'( (1/N) x FFT'(i) x (1/N) x FFT'(h) )
>         = (1/N) x (1/N) x iFFT'( FFT'(i) x FFT'(h) )
>
> and therefore y2[n] is not equal to y1[n].
>
> Have I done something wrong here, or are they really not equal??
Hmm... I thought of that right after I hit the "post" button yesterday.
It seems you are right; at least I am not able to find a flaw.

Let's see what happens with the 1/sqrt(N) terms:

  y3[n] = 1/sqrt(N) x IFFT'{ 1/sqrt(N) x FFT'(i) x 1/sqrt(N) x FFT'(h) }
        = (1/N)^{3/2} x IFFT'{ FFT'(i) x FFT'(h) }

It is interesting to see that |y2[n]| <= |y3[n]| <= |y1[n]| for all n.

An interesting question, this. Usually, people don't pay much attention
to the scaling factors. I did not realize before now that the
normalization factor behaved like this. You may actually be right in
that the placement of the normalization factor makes a significant
difference.

Rune
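The three scalings Ron and Rune are comparing are easy to check numerically. Below is a minimal sketch (not from the thread) that uses a direct O(N^2) DFT in place of a real FFT, so the scale factors are completely explicit; the signals i and h are arbitrary made-up examples:

```python
import cmath

def dft_raw(x):
    """Unnormalized DFT (Ron's FFT'): no 1/N anywhere."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft_raw(X):
    """Unnormalized inverse DFT (Ron's iFFT'): conjugate kernel, no 1/N."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 8
i = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # arbitrary input block
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # arbitrary "filter"

raw = idft_raw([a * b for a, b in zip(dft_raw(i), dft_raw(h))])

y1 = [v / N for v in raw]        # 1/N in the iFFT only
y2 = [v / N**2 for v in raw]     # 1/N in each forward FFT, none in the iFFT
y3 = [v / N**1.5 for v in raw]   # 1/sqrt(N) in all three transforms

# Only y1 matches the direct circular convolution; the others come out
# smaller by factors of N and sqrt(N), respectively.
direct = [sum(i[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]
for n in range(N):
    assert abs(y1[n] - direct[n]) < 1e-9
    assert abs(y2[n] - direct[n] / N) < 1e-9
    assert abs(y3[n] - direct[n] / N**0.5) < 1e-9
```

This also confirms Rune's ordering |y2[n]| <= |y3[n]| <= |y1[n]| for all n.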
>Hi Ron,
>
>As you've discovered, for FFT convolution it makes a big difference
>where the normalization goes, because the multiplication in the
>frequency domain squares any normalization that has been applied up
>to that point.
>
>I always remember the right place for normalization like this:
>convolution with the unit impulse is a no-op, so if my FFT and iFFT
>are inverses, then multiplication by the FFT of the unit impulse must
>be a no-op, i.e., FFT([1,0,0,0,...]) = [1,1,1,1,...]. That is an
>unnormalized FFT.
>
>--
>Matt
Hi Matt,

Thanks very much for validating my thoughts on this. I REALLY like your
explanation! I guess it confirms that in the case of fast convolution,
the 1/N normalization must happen in the iFFT. It's good to have a
simple, succinct example that "proves" it.

Ron

This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
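Matt's unit-impulse test is easy to verify in a few lines. This is a minimal sketch (not from the thread), with a direct O(N^2) DFT standing in for an FFT:

```python
import cmath

def dft_raw(x):
    """Unnormalized forward DFT: no 1/N factor."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# The transform of the unit impulse is all ones only when the forward
# transform carries no normalization -- so multiplying by it in the
# frequency domain is a no-op, as convolution with an impulse must be.
X = dft_raw([1.0, 0.0, 0.0, 0.0])
assert all(abs(Xk - 1.0) < 1e-12 for Xk in X)

# With a 1/N in the forward transform, the impulse would map to
# [1/N, 1/N, ...], wrongly shrinking every convolution by N.
X_scaled = [Xk / 4 for Xk in X]
assert all(abs(Xk - 0.25) < 1e-12 for Xk in X_scaled)
```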
Hi Rune,

I'm afraid you're mistaken.  Using 1/sqrt(N) will *not* give you the
correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
and it *does* matter whether you put the 1/N in the FFT or the IFFT (it
should go in the IFFT).  The reason is that the normalization factor of
the FFT is squared in this expression while that of the IFFT is not.

This is a good example of why the idea that there is only one "proper"
normalization of the DFT is misguided, I think.

Cordially,
Steven G. Johnson

Hi Steven,

>This is a good example of why the idea that there is only one "proper"
>normalization of the DFT is misguided, I think.

Yes, that seems to be the case. It hadn't occurred to me until I was
faced with this quandary.

BTW, I don't believe Rune was saying the 1/sqrt(N) normalization is
right. He was just pointing out that all three normalization approaches
give different answers.

Thanks,
Ron
stevenj@alum.mit.edu wrote:
> Hi Rune,
>
> I'm afraid you're mistaken.  Using 1/sqrt(N) will *not* give you the
> correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
> and it *does* matter whether you put the 1/N in the FFT or the IFFT
> (it should go in the IFFT).  The reason is that the normalization
> factor of the FFT is squared in this expression while that of the
> IFFT is not.
I don't understand what you mean by "squared FFT"?
> This is a good example of why the idea that there is only one
> "proper" normalization of the DFT is misguided, I think.
Agreed. I have checked my books, and all DSP books define the N-point
DFT/IDFT pair as

         N-1
  X[k] = sum x[n] exp(-j2*pi*n*k/N)                   [1.a]
         n=0

         1 N-1
  x[n] = - sum X[k] exp( j2*pi*n*k/N)                 [1.b]
         N k=0

With this convention, Parseval's identity is just about the only
formula that needs to take the number of points into account,

  N-1              1 N-1
  sum |x[n]|^2  =  - sum |X[k]|^2                     [2]
  n=0              N k=0

I saw somewhere that Gauss himself had developed something similar
to the FFT some 200 years ago, so it makes sense to minimize the
use of scale factors etc. in these formulas.

If you look at the Fourier transform, however, the use of different
scaling factors is more easily observed. There is a classic book
on the FT,

  Papoulis: "The Fourier Integral and its Applications",
  McGraw-Hill, 1962

where the FT/IFT pair is defined as

          +inf
  F(w) = integral f(t) exp(-jwt) dt                   [2.a]
          -inf

          1    +inf
  f(t) = ---- integral F(w) exp(jwt) dw               [2.b]
         2*pi  -inf

In a mathematics textbook,

  Kreyzig: "Advanced Engineering Mathematics", 6th ed.,
  Wiley, 1988, p 632

the definitions are

             1       +inf
  F(w) = --------- integral f(t) exp(-jwt) dt         [3.a]
         sqrt(2pi)   -inf

             1       +inf
  f(t) = --------- integral F(w) exp(jwt) dw          [3.b]
         sqrt(2pi)   -inf

Other texts still switch the signs in the exp() terms of the FT
and IFT.

As for why I prefer to think in terms of the 1/sqrt(N)
normalization: I like to think of the DFT/IDFT as a unitary
transform pair, defined as

  X = Wx                                              [4.a]
  x = W^H X                                           [4.b]

where x and X are Nx1 vectors and W is an NxN unitary matrix,

  W[n,k] = 1/sqrt(N) exp(-j2*pi*n*k/N)                [5]

With this convention, we find

  W^H W = I                                           [6.a]
  W W^H = I                                           [6.b]

and so the DFT preserves the norm and all that. The problem with
the definition [5] is that one needs to drag that 1/sqrt(N) factor
around everywhere, which is cumbersome analytically, as well as
numerically expensive.
So there are several very good practical reasons for doing things
the way they are done now, and have been for at least four decades,
perhaps since the time of Gauss himself.

Rune
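Rune's conventional-scaling Parseval identity [2] is easy to confirm numerically. A minimal sketch (not from the thread), using a direct O(N^2) DFT and an arbitrary made-up signal:

```python
import cmath

def dft_raw(x):
    """Conventional unnormalized forward DFT, as in eq. [1.a]."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, 0.4]  # arbitrary real signal
N = len(x)
X = dft_raw(x)

# Parseval with the conventional scaling, eq. [2]:
#   sum |x[n]|^2 = (1/N) sum |X[k]|^2
lhs = sum(abs(v)**2 for v in x)
rhs = sum(abs(v)**2 for v in X) / N
assert abs(lhs - rhs) < 1e-9

# With the unitary 1/sqrt(N) scaling of eq. [5], the norm is
# preserved with no leftover factor at all:
Xu = [v / N**0.5 for v in X]
assert abs(lhs - sum(abs(v)**2 for v in Xu)) < 1e-9
```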
in article 1125910850.954658.133700@g14g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/05/2005 05:00:

> stevenj@alum.mit.edu wrote:
>> Using 1/sqrt(N) will *not* give you the
>> correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
>> and it *does* matter whether you put the 1/N in the FFT or the IFFT
>> (it should go in the IFFT).
strictly speaking, i guess you're right, Steven, but since the transfer function, H[k] = DFT{ h[n] } (where h[n] is the FIR) is calculated in advance and is constant for fast convolution, you can deal with all of the scaling in H[k] and you don't need any 1/N anywhere, in either the forward or inverse FFT.
>> The reason is that the normalization factor of
>> the FFT is squared in this expression while that of the IFFT is not.
>
> I don't understand what you mean by "squared FFT"?
>
>> This is a good example of why the idea that there is only one
>> "proper" normalization of the DFT is misguided, I think.
>
> Agreed. I have checked my books, and all DSP books define the
> N-point DFT/IDFT pair as
>
>          N-1
>   X[k] = sum x[n] exp(-j2*pi*n*k/N)            [1.a]
>          n=0
>
>          1 N-1
>   x[n] = - sum X[k] exp( j2*pi*n*k/N)          [1.b]
>          N k=0
as some others have recently pointed out, this seemed bass-ackwards to
me, too, at least at first. ostensibly, the most natural definition of
the DFT would start with:

         N-1
  x[n] = sum X[k] exp(+j*2*pi*n*k/N)
         k=0

resulting in:

         1 N-1
  X[k] = - sum x[n] exp(-j*2*pi*n*k/N)
         N n=0

this definition is nice for comparing the DFT to the Fourier Series,
but the problem with it is that, for circular convolution (in the
"time domain"), it results in:

  DFT{ h[n] (*) x[n] } = N * H[k] * X[k]

((*) means "circular convolution")

so you gotta remember to drag that N into this (usually by scaling
H[k]), whereas for the "conventional" definition

  DFT{ h[n] (*) x[n] } = H[k] * X[k]

with no "N" factor, at least for "time domain" circular convolution.
you'll get it for

  iDFT{ H[k] (*) X[k] } = N * h[n] * x[n]

when using the "conventional" definition of the DFT and iDFT. all
this is just housekeeping.
> With this convention, Parseval's identity is just about the only
> formula that needs to take the number of points into account,
>
>   N-1              1 N-1
>   sum |x[n]|^2  =  - sum |X[k]|^2              [2]
>   n=0              N k=0
>
> I saw somewhere that Gauss himself had developed something similar
> to the FFT some 200 years ago, so it makes sense to minimize the
> use of scale factors etc. in these formulas.
the only way to do that for the DFT/iDFT is to use the 1/sqrt(N)
factor, as you say below.
> If you look at the Fourier transform, however, the use of different
> scaling factors is more easily observed. There is a classic book
> on the FT,
>
>   Papoulis: "The Fourier Integral and its Applications",
>   McGraw-Hill, 1962
>
> where the FT/IFT pair is defined as
>
>          +inf
>   X(w) = integral x(t) exp(-jwt) dt            [2.a]
>          -inf
>
>          1    +inf
>   x(t) = ---- integral X(w) exp(jwt) dw        [2.b]
>          2*pi  -inf
with a slight mod of this definition:

           +inf
  X(jw) = integral{ x(t) exp(-jwt) dt }
           -inf

          1    +inf
  x(t) = ---- integral{ X(jw) exp(jwt) dw }
         2*pi  -inf

you get one that is identical to the two-sided Laplace Transform with
the substitution s -> jw. sometimes that's nice.
> In a mathematics textbook,
>
>   Kreyzig: "Advanced Engineering Mathematics", 6th ed.,
>   Wiley, 1988, p 632
i gotta earlier version of that one. 1975 i think.
> the definitions are
>
>              1       +inf
>   X(w) = --------- integral x(t) exp(-jwt) dt     [3.a]
>          sqrt(2pi)   -inf
>
>              1       +inf
>   x(t) = --------- integral X(w) exp(jwt) dw      [3.b]
>          sqrt(2pi)   -inf
here is a nice, virtually isometric definition that you commonly see
in EE communications textbooks (e.g. Carlson):

           +inf
  X(f) = integral{ x(t) * exp(-j*2*pi*f*t) dt }
           -inf

           +inf
  x(t) = integral{ X(f) * exp(+j*2*pi*f*t) df }
           -inf

like the Kreyzig def, there are no funky 2*pi or sqrt(2*pi) factors to
worry about in either Parseval's Theorem or the Duality Theorem, but
this one has better normalization properties for getting the DC
component or power of x(t) (the Kreyzig def has ugly sqrt(2*pi)
factors). in addition, it's in Hz and not rad/sec. this is IMO the
cleanest definition.
> Other texts still switch the signs in the exp() terms of the
> FT and IFT.
i've only seen that in the context of the so-called "Characteristic Function" of a p.d.f. of a random variable. i wish we could get everyone to agree on the definition of both the continuous and discrete Fourier Transform, and i want to dictate the universal definition. :-)
> As for why I prefer to think in terms of the 1/sqrt(N)
> normalization, I like to think of the DFT/IDFT as a unitary
> transform pair, defined as
>
>   X = Wx                                       [4.a]
>   x = W^H X                                    [4.b]
>
> where x and X are Nx1 vectors and W is an NxN unitary matrix,
>
>   W[n,k] = 1/sqrt(N) exp(-j2*pi*n*k/N)         [5]
>
> With this convention, we find
>
>   W^H W = I                                    [6.a]
>   W W^H = I                                    [6.b]
>
> and so the DFT preserves the norm and all that. The problem with
> the definition [5] is that one needs to drag that 1/sqrt(N)
> around everywhere, which is cumbersome analytically, as well as
> numerically expensive.
depends on what you're doing. in a MATLAB-like environment, perhaps you wanna drag it around and pay for the scaling expense and not have to think about it. in a real-time fast convolution application, i would leave the 1/N out of *both* forward and inverse FFT and put all of the necessary scaling into H[k]. why not?
> So there are several very good practical reasons for doing things
> the way they are done now, and have been for at least four decades,
namely, to reduce confusion. but i don't agree with that for
everything in general. Mathworks still should fix their indexing
deficit in MATLAB. even if MATLAB has been around for a couple of
decades and people are getting used to finding their DC component in
X(1), there is still no excuse for it.

--
r b-j  rbj@audioimagination.com

"Imagination is more important than knowledge."
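the N-factor bookkeeping for the "bass-ackwards" convention r b-j describes above can be checked numerically. a minimal sketch (not from the thread) with a direct O(N^2) transform and made-up signals:

```python
import cmath

def dft_fwd_scaled(x):
    """The "backwards" convention: the 1/N goes in the forward DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                for n in range(N)) / N
            for k in range(N)]

N = 8
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # made-up FIR
x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # made-up input

# direct circular convolution in the "time domain"
conv = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]

# with this convention, DFT{ h (*) x } = N * H[k] * X[k]:
lhs = dft_fwd_scaled(conv)
H = dft_fwd_scaled(h)
X = dft_fwd_scaled(x)
rhs = [N * a * b for a, b in zip(H, X)]

assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

so the stray N really does show up in the frequency-domain product, exactly as described.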
> strictly speaking, i guess you're right, Steven, but since
> the transfer function, H[k] = DFT{ h[n] } (where h[n] is the
> FIR) is calculated in advance and is constant for fast
> convolution, you can deal with all of the scaling in H[k]
> and you don't need any 1/N anywhere, in either the
> forward or inverse FFT.
Of course. This is why, for computational purposes, I've always felt
that the best normalization for FFTs is no normalization at all: any
overall scale factor can, in most practical applications, be absorbed
elsewhere at little or no cost (e.g. in the convolution kernel).

Steven
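As a concrete illustration of absorbing the scale factor into the convolution kernel: a minimal sketch (not from the thread) with a direct O(N^2) DFT standing in for an unnormalized FFT, and hypothetical h and x chosen just for illustration. The single 1/N is folded into the precomputed H[k], so the per-block transforms carry no scaling at all:

```python
import cmath

def dft_raw(x):
    """Unnormalized forward DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft_raw(X):
    """Unnormalized inverse DFT (conjugate kernel, no 1/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 8
h = [0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical FIR, zero-padded
x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]     # hypothetical input block

# Precompute the transfer function once, with the 1/N baked in:
H = [Hk / N for Hk in dft_raw(h)]

# Per-block work is just two unnormalized transforms and a pointwise
# multiply -- no runtime scaling anywhere.
y = idft_raw([a * b for a, b in zip(dft_raw(x), H)])

direct = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]
assert all(abs(y[n] - direct[n]) < 1e-9 for n in range(N))
```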
<stevenj@alum.mit.edu> wrote in message
news:1125943989.790065.231790@o13g2000cwo.googlegroups.com...

> Of course. This is why, for computational purposes, I've always felt
> that the best normalization for FFTs is no normalization at all: any
> overall scale factor can, in most practical applications, be absorbed
> elsewhere at little or no cost (e.g. in the convolution kernel).
But continuing along these lines is dangerous, because it can lead to
computing convolutions in fewer operations than would be the case if
the FFT and its inverse (or transpose) were used, as in

  http://home.comcast.net/~kmbtib/conv2c.f90

and this would violate the orthodoxy constraints of the current
forum...

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
James Van Buskirk wrote:
> But continuing along these lines is dangerous because it can lead
> to computing convolutions in fewer operations than would be the case
> if the FFT and its inverse (or transpose) were used [...]
> and this would violate the orthodoxy constraints of the current
> forum...
James, you know that I respect your work on reducing op counts, but the
general idea that a specialized convolution algorithm can be more
efficient (either in arithmetic counts or by other measures) than a
general-purpose FFT/IFFT pair has been known for decades and is not
especially controversial. However, this kind of specialization usually
comes at the cost of generality.

The classic example, of course, is the idea that one can eliminate
bit-reversal passes if you are just doing a convolution (or, even if
your FFT doesn't have a separate bit-reversal "pass", a lack of
re-ordering can have other benefits). In principle, such a permuted
FFT can be used for just about any other FFT application as well; you
just have to absorb the permutation into other parts of the
computation. Requiring the user to deal with a permuted ordering in
general, however, imposes nontrivial additional coding complexity as
well as (depending on how the user implements it) possibly
non-negligible time (changing memory access patterns is not free).

For a *general-purpose* FFT routine, the most I feel justified in
pushing onto the user is a trivial overall scale factor, largely
because (unlike the output ordering or a per-output scaling) there is
no general agreement on what the "right" scale factor should be. (Even
so, for things like Matlab a normalized fft/ifft pair is more
appropriate.)

Steven
stev...@alum.mit.edu wrote:
> The classic example, of course, is the idea that one can eliminate
> bit-reversal passes if you are just doing a convolution (or, even if
> your FFT doesn't have a separate bit-reversal "pass", a lack of
> re-ordering can have other benefits). [...]
i dunno what James's issue is (you might have had some discussion that
i wasn't watching), but one idea i used (in the olden days when i
actually did some work in fast convolution) was to have two different
routines for the FFT and iFFT: one based on decimation-in-frequency
(that required normal ordering in and yielded bit-reversed ordering
going out) for the FFT, and one based on decimation-in-time
(bit-reversed input, normal output) for the iFFT. (the transfer
function had to be pre-bit-reversed.)

the extra code was very, very small and they could use the same
twiddle factor table. that saved in the bit-reverse process (by
avoiding it), since memory accesses were pretty slow. the only other
trick was that it was written so that multiplication by 1 or by j was
avoided. other than that, it was a pretty vanilla-flavored radix-2
FFT.

--
r b-j  rbj@audioimagination.com

"Imagination is more important than knowledge."
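the DIF-forward / DIT-inverse scheme r b-j describes can be sketched in a few lines. this is a minimal radix-2 illustration (not his original code, and without his multiply-by-1/j optimization): the forward routine takes natural order in and leaves bit-reversed order out, the inverse takes bit-reversed in and natural out, H[k] is precomputed by the same DIF routine (so it is already "pre-bit-reversed") with the 1/N folded in, and no bit-reversal pass ever runs. h and x are made-up examples.

```python
import cmath

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT: natural-order input,
    bit-reversed-order output, no normalization."""
    x = list(map(complex, x))
    n = len(x)
    span = n
    while span > 1:
        half = span // 2
        w = [cmath.exp(-2j * cmath.pi * j / span) for j in range(half)]
        for start in range(0, n, span):
            for j in range(half):
                a = x[start + j]
                b = x[start + half + j]
                x[start + j] = a + b                 # Gentleman-Sande
                x[start + half + j] = (a - b) * w[j]  # butterfly
        span = half
    return x

def ifft_dit(X):
    """Radix-2 decimation-in-time inverse FFT: bit-reversed-order
    input, natural-order output, no 1/N normalization."""
    X = list(map(complex, X))
    n = len(X)
    span = 1
    while span < n:
        w = [cmath.exp(2j * cmath.pi * j / (2 * span)) for j in range(span)]
        for start in range(0, n, 2 * span):
            for j in range(span):
                a = X[start + j]
                b = X[start + span + j] * w[j]  # Cooley-Tukey butterfly
                X[start + j] = a + b
                X[start + span + j] = a - b
        span *= 2
    return X

N = 8
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # made-up FIR
x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]   # made-up input block

# H comes out of the DIF routine already in bit-reversed order, with
# the single 1/N folded in once, up front.
H = [Hk / N for Hk in fft_dif(h)]

# both spectra share the same bit-reversed ordering, so the pointwise
# multiply needs no reordering -- no bit-reversal pass anywhere.
y = ifft_dit([a * b for a, b in zip(fft_dif(x), H)])

direct = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]
assert all(abs(y[n] - direct[n]) < 1e-9 for n in range(N))
```

note that ifft_dit(fft_dif(x)) returns N*x, which is exactly why the single 1/N folded into H[k] is all the scaling the whole pipeline needs.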