DSPRelated.com

Fast Convolution: does it matter whether I normalize to block length (N) in the FFT versus in the iFFT?

Started by Ron Gerhards August 31, 2005
Ron Gerhards wrote:
>>>>> More importantly, wouldn't the result you get in doing fast
>>>>> convolution differ depending on where the FFT/IFFT 1/N
>>>>> normalization was done?
>>>>
>>>> Formally, no, it does not matter where the division is made. Both
>>>> the FFT and IFFT are linear operators, so any constant scaling
>>>> factor can be factored out from the operation, ending in an
>>>> overall 1/N factor for the FFT/IFFT combo.
>>>
>>> Right, but when I'm doing fast convolution, I'm multiplying the
>>> result of 2 FFT'd signals (input and filter). If my 1/N
>>> normalization was in the FFT, I would get a factor of (1/N) x (1/N)
>>> in the result (and the iFFT in this case would do no normalizing).
>>> If my 1/N normalization was in the iFFT, I would get only a factor
>>> of (1/N) due to the iFFT. Does that make sense? I realize it would
>>> only be a scale factor, but if true, the end result of the fast
>>> convolution would be a much smaller signal in the case where my
>>> normalization is in the iFFT. I think.
>>
>> What you FFT is a convolution of x[n] and h[n]:
>>
>>   Y[k] = FFT{x[n](*)h[n]} = 1/sqrt(N) FFT'{x[n](*)h[n]}
>>
>> where FFT'{} is the "unproperly scaled" version of the FFT.
>> The 1/sqrt(N) factor appears only once. I think...
>>
>> Rune
>
> I'm not sure what you mean by the "unproperly scaled" version of the
> FFT. Perhaps let me put the question this way:
>
> Let FFT' be the FFT without ANY normalization to N.
This is what I called the "unproperly scaled FFT".
> Let iFFT' be the iFFT without ANY normalization to N.
>
> In the case where the normalization to N occurs in the iFFT, I have:
>
>   y1[n] = i[n] * h[n]
>         = (1/N) x iFFT'( FFT'(i) x FFT'(h) )
>
> In the case where the normalization to N occurs in the FFT, I have:
>
>   y2[n] = i[n] * h[n]
>         = iFFT'( (1/N) x FFT'(i) x (1/N) x FFT'(h) )
>         = (1/N) x (1/N) x iFFT'( FFT'(i) x FFT'(h) )
>
> and therefore y2[n] is not equal to y1[n].
>
> Have I done something wrong here, or are they really not equal??
Hmm... I thought of that right after I hit the "post" button yesterday.
It seems you are right; at least I am not able to find a flaw.

Let's see what happens with the 1/sqrt(N) terms:

  y3[n] = 1/sqrt(N) x IFFT'{ 1/sqrt(N) x FFT'(i) x 1/sqrt(N) x FFT'(h) }
        = (1/N)^{3/2} x IFFT'{ FFT'(i) x FFT'(h) }

It is interesting to see that |y2[n]| <= |y3[n]| <= |y1[n]| for all n.

An interesting question, this. Usually, people don't pay much attention
to the scaling factors. I did not realize before now that the
normalization factor behaved like this. You may actually be right in
that the placement of the normalization factor makes a significant
difference.

Rune
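The three scalings Ron and Rune are comparing are easy to check numerically. Below is a minimal sketch (not from the thread) that uses a direct O(N^2) DFT in place of a real FFT, so the scale factors are completely explicit; the signals i and h are arbitrary made-up examples:

```python
import cmath

def dft_raw(x):
    """Unnormalized DFT (Ron's FFT'): no 1/N anywhere."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft_raw(X):
    """Unnormalized inverse DFT (Ron's iFFT'): conjugate kernel, no 1/N."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 8
i = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # arbitrary input block
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # arbitrary "filter"

raw = idft_raw([a * b for a, b in zip(dft_raw(i), dft_raw(h))])

y1 = [v / N for v in raw]        # 1/N in the iFFT only
y2 = [v / N**2 for v in raw]     # 1/N in each forward FFT, none in the iFFT
y3 = [v / N**1.5 for v in raw]   # 1/sqrt(N) in all three transforms

# Only y1 matches the direct circular convolution; the others come out
# smaller by factors of N and sqrt(N), respectively.
direct = [sum(i[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]
for n in range(N):
    assert abs(y1[n] - direct[n]) < 1e-9
    assert abs(y2[n] - direct[n] / N) < 1e-9
    assert abs(y3[n] - direct[n] / N**0.5) < 1e-9
```

This also confirms Rune's ordering |y2[n]| <= |y3[n]| <= |y1[n]| for all n.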
>Hi Ron,
>
>As you've discovered, for FFT convolution it makes a big difference
>where the normalization goes, because the multiplication in the
>frequency domain squares any normalization that has been applied up
>to that point.
>
>I always remember the right place for normalization like this:
>convolution with the unit impulse is a no-op, so if my FFT and iFFT
>are inverses, then multiplication by the FFT of the unit impulse must
>be a no-op, i.e., FFT([1,0,0,0,...]) = [1,1,1,1,...]. That is an
>unnormalized FFT.
>
>--
>Matt
Hi Matt,

Thanks very much for validating my thoughts on this. I REALLY like your
explanation! I guess it confirms that in the case of fast convolution,
the 1/N normalization must happen in the iFFT. It's good to have a
simple, succinct example that "proves" it.

Ron

This message was sent using the Comp.DSP web interface on
www.DSPRelated.com
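Matt's unit-impulse test is easy to verify in a few lines. This is a minimal sketch (not from the thread), with a direct O(N^2) DFT standing in for an FFT:

```python
import cmath

def dft_raw(x):
    """Unnormalized forward DFT: no 1/N factor."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# The transform of the unit impulse is all ones only when the forward
# transform carries no normalization -- so multiplying by it in the
# frequency domain is a no-op, as convolution with an impulse must be.
X = dft_raw([1.0, 0.0, 0.0, 0.0])
assert all(abs(Xk - 1.0) < 1e-12 for Xk in X)

# With a 1/N in the forward transform, the impulse would map to
# [1/N, 1/N, ...], wrongly shrinking every convolution by N.
X_scaled = [Xk / 4 for Xk in X]
assert all(abs(Xk - 0.25) < 1e-12 for Xk in X_scaled)
```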
Hi Rune,

I'm afraid you're mistaken.  Using 1/sqrt(N) will *not* give you the
correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
and it *does* matter whether you put the 1/N in the FFT or the IFFT (it
should go in the IFFT).  The reason is that the normalization factor of
the FFT is squared in this expression while that of the IFFT is not.

This is a good example of why the idea that there is only one "proper"
normalization of the DFT is misguided, I think.

Cordially,
Steven G. Johnson

Hi Steven,

>This is a good example of why the idea that there is only one "proper"
>normalization of the DFT is misguided, I think.

Yes, that seems to be the case. It hadn't occurred to me until I was
faced with this quandary.

BTW, I don't believe Rune was saying the 1/sqrt(N) normalization is
right. He was just pointing out that all three normalization approaches
give different answers.

Thanks,
Ron
stevenj@alum.mit.edu wrote:
> Hi Rune,
>
> I'm afraid you're mistaken.  Using 1/sqrt(N) will *not* give you the
> correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
> and it *does* matter whether you put the 1/N in the FFT or the IFFT
> (it should go in the IFFT).  The reason is that the normalization
> factor of the FFT is squared in this expression while that of the
> IFFT is not.
I don't understand what you mean by "squared FFT"?
> This is a good example of why the idea that there is only one
> "proper" normalization of the DFT is misguided, I think.
Agreed. I have checked my books, and all DSP books define the N-point
DFT/IDFT pair as

         N-1
  X[k] = sum x[n] exp(-j2*pi*n*k/N)                   [1.a]
         n=0

         1 N-1
  x[n] = - sum X[k] exp( j2*pi*n*k/N)                 [1.b]
         N k=0

With this convention, Parseval's identity is just about the only
formula that needs to take the number of points into account,

  N-1              1 N-1
  sum |x[n]|^2  =  - sum |X[k]|^2                     [2]
  n=0              N k=0

I saw somewhere that Gauss himself had developed something similar
to the FFT some 200 years ago, so it makes sense to minimize the
use of scale factors etc. in these formulas.

If you look at the Fourier transform, however, the use of different
scaling factors is more easily observed. There is a classic book
on the FT,

  Papoulis: "The Fourier Integral and its Applications",
  McGraw-Hill, 1962

where the FT/IFT pair is defined as

          +inf
  F(w) = integral f(t) exp(-jwt) dt                   [2.a]
          -inf

          1    +inf
  f(t) = ---- integral F(w) exp(jwt) dw               [2.b]
         2*pi  -inf

In a mathematics textbook,

  Kreyzig: "Advanced Engineering Mathematics", 6th ed.,
  Wiley, 1988, p 632

the definitions are

             1       +inf
  F(w) = --------- integral f(t) exp(-jwt) dt         [3.a]
         sqrt(2pi)   -inf

             1       +inf
  f(t) = --------- integral F(w) exp(jwt) dw          [3.b]
         sqrt(2pi)   -inf

Other texts still switch the signs in the exp() terms of the FT
and IFT.

As for why I prefer to think in terms of the 1/sqrt(N)
normalization: I like to think of the DFT/IDFT as a unitary
transform pair, defined as

  X = Wx                                              [4.a]
  x = W^H X                                           [4.b]

where x and X are Nx1 vectors and W is an NxN unitary matrix,

  W[n,k] = 1/sqrt(N) exp(-j2*pi*n*k/N)                [5]

With this convention, we find

  W^H W = I                                           [6.a]
  W W^H = I                                           [6.b]

and so the DFT preserves the norm and all that. The problem with
the definition [5] is that one needs to drag that 1/sqrt(N) factor
around everywhere, which is cumbersome analytically, as well as
numerically expensive.
So there are several very good practical reasons for doing things
the way they are done now, and have been for at least four decades,
perhaps since the time of Gauss himself.

Rune
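Rune's conventional-scaling Parseval identity [2] is easy to confirm numerically. A minimal sketch (not from the thread), using a direct O(N^2) DFT and an arbitrary made-up signal:

```python
import cmath

def dft_raw(x):
    """Conventional unnormalized forward DFT, as in eq. [1.a]."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, 0.4]  # arbitrary real signal
N = len(x)
X = dft_raw(x)

# Parseval with the conventional scaling, eq. [2]:
#   sum |x[n]|^2 = (1/N) sum |X[k]|^2
lhs = sum(abs(v)**2 for v in x)
rhs = sum(abs(v)**2 for v in X) / N
assert abs(lhs - rhs) < 1e-9

# With the unitary 1/sqrt(N) scaling of eq. [5], the norm is
# preserved with no leftover factor at all:
Xu = [v / N**0.5 for v in X]
assert abs(lhs - sum(abs(v)**2 for v in Xu)) < 1e-9
```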
in article 1125910850.954658.133700@g14g2000cwa.googlegroups.com, Rune
Allnor at allnor@tele.ntnu.no wrote on 09/05/2005 05:00:

> stevenj@alum.mit.edu wrote:
>> Using 1/sqrt(N) will *not* give you the
>> correct normalization for a convolution a * b = IFFT(FFT(a) FFT(b)),
>> and it *does* matter whether you put the 1/N in the FFT or the IFFT
>> (it should go in the IFFT).
strictly speaking, i guess you're right, Steven, but since the transfer function, H[k] = DFT{ h[n] } (where h[n] is the FIR) is calculated in advance and is constant for fast convolution, you can deal with all of the scaling in H[k] and you don't need any 1/N anywhere, in either the forward or inverse FFT.
>> The reason is that the normalization factor of
>> the FFT is squared in this expression while that of the IFFT is not.
>
> I don't understand what you mean by "squared FFT"?
>
>> This is a good example of why the idea that there is only one
>> "proper" normalization of the DFT is misguided, I think.
>
> Agreed. I have checked my books, and all DSP books define the
> N-point DFT/IDFT pair as
>
>          N-1
>   X[k] = sum x[n] exp(-j2*pi*n*k/N)            [1.a]
>          n=0
>
>          1 N-1
>   x[n] = - sum X[k] exp( j2*pi*n*k/N)          [1.b]
>          N k=0
as some others have recently pointed out, this seemed bass-ackwards to
me, too, at least at first. ostensibly, the most natural definition of
the DFT would start with:

         N-1
  x[n] = sum X[k] exp(+j*2*pi*n*k/N)
         k=0

resulting in:

         1 N-1
  X[k] = - sum x[n] exp(-j*2*pi*n*k/N)
         N n=0

this definition is nice for comparing the DFT to the Fourier Series,
but the problem with it is that, for circular convolution (in the
"time domain"), it results in:

  DFT{ h[n] (*) x[n] } = N * H[k] * X[k]

((*) means "circular convolution")

so you gotta remember to drag that N into this (usually by scaling
H[k]), whereas for the "conventional" definition

  DFT{ h[n] (*) x[n] } = H[k] * X[k]

with no "N" factor, at least for "time domain" circular convolution.
you'll get it for

  iDFT{ H[k] (*) X[k] } = N * h[n] * x[n]

when using the "conventional" definition of the DFT and iDFT. all
this is just housekeeping.
> With this convention, Parseval's identity is just about the only
> formula that needs to take the number of points into account,
>
>   N-1              1 N-1
>   sum |x[n]|^2  =  - sum |X[k]|^2              [2]
>   n=0              N k=0
>
> I saw somewhere that Gauss himself had developed something similar
> to the FFT some 200 years ago, so it makes sense to minimize the
> use of scale factors etc. in these formulas.
the only way to do that for the DFT/iDFT is to use the 1/sqrt(N)
factor, as you say below.
> If you look at the Fourier transform, however, the use of different
> scaling factors is more easily observed. There is a classic book
> on the FT,
>
>   Papoulis: "The Fourier Integral and its Applications",
>   McGraw-Hill, 1962
>
> where the FT/IFT pair is defined as
>
>          +inf
>   X(w) = integral x(t) exp(-jwt) dt            [2.a]
>          -inf
>
>          1    +inf
>   x(t) = ---- integral X(w) exp(jwt) dw        [2.b]
>          2*pi  -inf
with a slight mod of this definition:

           +inf
  X(jw) = integral{ x(t) exp(-jwt) dt }
           -inf

          1    +inf
  x(t) = ---- integral{ X(jw) exp(jwt) dw }
         2*pi  -inf

you get one that is identical to the two-sided Laplace Transform with
the substitution s -> jw. sometimes that's nice.
> In a mathematics textbook,
>
>   Kreyzig: "Advanced Engineering Mathematics", 6th ed.,
>   Wiley, 1988, p 632
i gotta earlier version of that one. 1975 i think.
> the definitions are
>
>              1       +inf
>   X(w) = --------- integral x(t) exp(-jwt) dt     [3.a]
>          sqrt(2pi)   -inf
>
>              1       +inf
>   x(t) = --------- integral X(w) exp(jwt) dw      [3.b]
>          sqrt(2pi)   -inf
here is a nice, virtually isometric definition that you commonly see
in EE communications textbooks (e.g. Carlson):

           +inf
  X(f) = integral{ x(t) * exp(-j*2*pi*f*t) dt }
           -inf

           +inf
  x(t) = integral{ X(f) * exp(+j*2*pi*f*t) df }
           -inf

like the Kreyzig def, there are no funky 2*pi or sqrt(2*pi) factors to
worry about in either Parseval's Theorem or the Duality Theorem, but
this one has better normalization properties for getting the DC
component or power of x(t) (the Kreyzig def has ugly sqrt(2*pi)
factors). in addition, it's in Hz and not rad/sec. this is IMO the
cleanest definition.
> Other texts still switch the signs in the exp() terms of the
> FT and IFT.
i've only seen that in the context of the so-called "Characteristic Function" of a p.d.f. of a random variable. i wish we could get everyone to agree on the definition of both the continuous and discrete Fourier Transform, and i want to dictate the universal definition. :-)
> As for why I prefer to think in terms of the 1/sqrt(N)
> normalization, I like to think of the DFT/IDFT as a unitary
> transform pair, defined as
>
>   X = Wx                                       [4.a]
>   x = W^H X                                    [4.b]
>
> where x and X are Nx1 vectors and W is an NxN unitary matrix,
>
>   W[n,k] = 1/sqrt(N) exp(-j2*pi*n*k/N)         [5]
>
> With this convention, we find
>
>   W^H W = I                                    [6.a]
>   W W^H = I                                    [6.b]
>
> and so the DFT preserves the norm and all that. The problem with
> the definition [5] is that one needs to drag that 1/sqrt(N)
> around everywhere, which is cumbersome analytically, as well as
> numerically expensive.
depends on what you're doing. in a MATLAB-like environment, perhaps you wanna drag it around and pay for the scaling expense and not have to think about it. in a real-time fast convolution application, i would leave the 1/N out of *both* forward and inverse FFT and put all of the necessary scaling into H[k]. why not?
> So there are several very good practical reasons for doing things
> the way they are done now, and have been for at least four decades,
namely, to reduce confusion. but i don't agree with that for
everything in general. Mathworks still should fix their indexing
deficit in MATLAB. even if MATLAB has been around for a couple of
decades and people are getting used to finding their DC component in
X(1), there is still no excuse for it.

--
r b-j  rbj@audioimagination.com

"Imagination is more important than knowledge."
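the N-factor bookkeeping for the "bass-ackwards" convention r b-j describes above can be checked numerically. a minimal sketch (not from the thread) with a direct O(N^2) transform and made-up signals:

```python
import cmath

def dft_fwd_scaled(x):
    """The "backwards" convention: the 1/N goes in the forward DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                for n in range(N)) / N
            for k in range(N)]

N = 8
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # made-up FIR
x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # made-up input

# direct circular convolution in the "time domain"
conv = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]

# with this convention, DFT{ h (*) x } = N * H[k] * X[k]:
lhs = dft_fwd_scaled(conv)
H = dft_fwd_scaled(h)
X = dft_fwd_scaled(x)
rhs = [N * a * b for a, b in zip(H, X)]

assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

so the stray N really does show up in the frequency-domain product, exactly as described.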
> strictly speaking, i guess you're right, Steven, but since
> the transfer function, H[k] = DFT{ h[n] } (where h[n] is the
> FIR) is calculated in advance and is constant for fast
> convolution, you can deal with all of the scaling in H[k]
> and you don't need any 1/N anywhere, in either the
> forward or inverse FFT.
Of course. This is why, for computational purposes, I've always felt
that the best normalization for FFTs is no normalization at all: any
overall scale factor can, in most practical applications, be absorbed
elsewhere at little or no cost (e.g. in the convolution kernel).

Steven
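As a concrete illustration of absorbing the scale factor into the convolution kernel: a minimal sketch (not from the thread) with a direct O(N^2) DFT standing in for an unnormalized FFT, and hypothetical h and x chosen just for illustration. The single 1/N is folded into the precomputed H[k], so the per-block transforms carry no scaling at all:

```python
import cmath

def dft_raw(x):
    """Unnormalized forward DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft_raw(X):
    """Unnormalized inverse DFT (conjugate kernel, no 1/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]

N = 8
h = [0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical FIR, zero-padded
x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]     # hypothetical input block

# Precompute the transfer function once, with the 1/N baked in:
H = [Hk / N for Hk in dft_raw(h)]

# Per-block work is just two unnormalized transforms and a pointwise
# multiply -- no runtime scaling anywhere.
y = idft_raw([a * b for a, b in zip(dft_raw(x), H)])

direct = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]
assert all(abs(y[n] - direct[n]) < 1e-9 for n in range(N))
```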
<stevenj@alum.mit.edu> wrote in message
news:1125943989.790065.231790@o13g2000cwo.googlegroups.com...

> Of course. This is why, for computational purposes, I've always felt
> that the best normalization for FFTs is no normalization at all: any
> overall scale factor can, in most practical applications, be absorbed
> elsewhere at little or no cost (e.g. in the convolution kernel).
But continuing along these lines is dangerous, because it can lead to
computing convolutions in fewer operations than would be the case if
the FFT and its inverse (or transpose) were used, as in

  http://home.comcast.net/~kmbtib/conv2c.f90

and this would violate the orthodoxy constraints of the current
forum...

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
James Van Buskirk wrote:
> But continuing along these lines is dangerous because it can lead
> to computing convolutions in fewer operations than would be the case
> if the FFT and its inverse (or transpose) were used [...]
> and this would violate the orthodoxy constraints of the current
> forum...
James, you know that I respect your work on reducing op counts, but the
general idea that a specialized convolution algorithm can be more
efficient (either in arithmetic counts or by other measures) than a
general-purpose FFT/IFFT pair has been known for decades and is not
especially controversial. However, this kind of specialization usually
comes at the cost of generality.

The classic example, of course, is the idea that one can eliminate
bit-reversal passes if you are just doing a convolution (or, even if
your FFT doesn't have a separate bit-reversal "pass", a lack of
re-ordering can have other benefits). In principle, such a permuted
FFT can be used for just about any other FFT application as well; you
just have to absorb the permutation into other parts of the
computation. Requiring the user to deal with a permuted ordering in
general, however, imposes nontrivial additional coding complexity as
well as (depending on how the user implements it) possibly
non-negligible time (changing memory access patterns is not free).

For a *general-purpose* FFT routine, the most I feel justified in
pushing onto the user is a trivial overall scale factor, largely
because (unlike the output ordering or a per-output scaling) there is
no general agreement on what the "right" scale factor should be. (Even
so, for things like Matlab a normalized fft/ifft pair is more
appropriate.)

Steven
stev...@alum.mit.edu wrote:
> The classic example, of course, is the idea that one can eliminate
> bit-reversal passes if you are just doing a convolution (or, even if
> your FFT doesn't have a separate bit-reversal "pass", a lack of
> re-ordering can have other benefits). [...]
i dunno what James's issue is (you might have had some discussion that
i wasn't watching), but one idea i used (in the olden days when i
actually did some work in fast convolution) was to have two different
routines for the FFT and iFFT: one based on decimation-in-frequency
(that required normal ordering in and yielded bit-reversed ordering
going out) for the FFT, and one based on decimation-in-time
(bit-reversed input, normal output) for the iFFT. (the transfer
function had to be pre-bit-reversed.)

the extra code was very, very small and they could use the same
twiddle factor table. that saved in the bit-reverse process (by
avoiding it), since memory accesses were pretty slow. the only other
trick was that it was written so that multiplication by 1 or by j was
avoided. other than that, it was a pretty vanilla-flavored radix-2
FFT.

--
r b-j  rbj@audioimagination.com

"Imagination is more important than knowledge."
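the DIF-forward / DIT-inverse scheme r b-j describes can be sketched in a few lines. this is a minimal radix-2 illustration (not his original code, and without his multiply-by-1/j optimization): the forward routine takes natural order in and leaves bit-reversed order out, the inverse takes bit-reversed in and natural out, H[k] is precomputed by the same DIF routine (so it is already "pre-bit-reversed") with the 1/N folded in, and no bit-reversal pass ever runs. h and x are made-up examples.

```python
import cmath

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT: natural-order input,
    bit-reversed-order output, no normalization."""
    x = list(map(complex, x))
    n = len(x)
    span = n
    while span > 1:
        half = span // 2
        w = [cmath.exp(-2j * cmath.pi * j / span) for j in range(half)]
        for start in range(0, n, span):
            for j in range(half):
                a = x[start + j]
                b = x[start + half + j]
                x[start + j] = a + b                 # Gentleman-Sande
                x[start + half + j] = (a - b) * w[j]  # butterfly
        span = half
    return x

def ifft_dit(X):
    """Radix-2 decimation-in-time inverse FFT: bit-reversed-order
    input, natural-order output, no 1/N normalization."""
    X = list(map(complex, X))
    n = len(X)
    span = 1
    while span < n:
        w = [cmath.exp(2j * cmath.pi * j / (2 * span)) for j in range(span)]
        for start in range(0, n, 2 * span):
            for j in range(span):
                a = X[start + j]
                b = X[start + span + j] * w[j]  # Cooley-Tukey butterfly
                X[start + j] = a + b
                X[start + span + j] = a - b
        span *= 2
    return X

N = 8
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # made-up FIR
x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]   # made-up input block

# H comes out of the DIF routine already in bit-reversed order, with
# the single 1/N folded in once, up front.
H = [Hk / N for Hk in fft_dif(h)]

# both spectra share the same bit-reversed ordering, so the pointwise
# multiply needs no reordering -- no bit-reversal pass anywhere.
y = ifft_dit([a * b for a, b in zip(fft_dif(x), H)])

direct = [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]
assert all(abs(y[n] - direct[n]) < 1e-9 for n in range(N))
```

note that ifft_dit(fft_dif(x)) returns N*x, which is exactly why the single 1/N folded into H[k] is all the scaling the whole pipeline needs.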