
N-pt DFT where n != power of 2

Started by dspdummy August 12, 2009
Steven G. Johnson <stevenj@alum.mit.edu> wrote:

>On Aug 13, 7:12 pm, Oli Charlesworth <ca...@olifilth.co.uk> wrote:
>> Isn't there only one rounding point in a DFT?  (At the end of the
>> multiply-accumulate for each output bin.)  This is versus log(N)
>> rounding points in an FFT.
>No.
One can in theory have as few or as many "rounding points" as one
chooses with either algorithm, but I believe the sense of Oli's
statement for the fixed point scenario is as follows:

For DFT:

(1) round the coefficients;
(2) compute each output via multiply/accumulate, keeping all
    overflow/underflow bits;
(3) round the outputs

For FFT:

(1) round the coefficients;
(2) after each butterfly, round the intermediate result
(3) round the final outputs

In the FFT case, if you try to omit the rounding in step (2), you have
an enormous number of underflow bits to keep around, since at each
stage the word width grows by the number of bits in the coefficient.
So for example, consider a radix-4, size-1024 FFT. If the coefficients
are 12 bits and the input is 16 bits, then with no intermediate
rounding the word width grows to something like
16 + log4(1024) * 12 = 76 bits. In the DFT case, it only grows to
about 16 + 12 + log2(1024) = 38 bits.

Viewed this way, Oli's claim is correct -- you're probably forced to
do more intermediate rounding in an FFT. But there are some other
issues, most immediately that the required precision of the
coefficients will be greater for the DFT, and more of the FFT
coefficients are trivial numbers like +/- 1.

Ultimately, any such algorithm is a black box, and you could choose to
have no rounding within that black box. You only absolutely need to
impose precision requirements on the data inputs and coefficient
inputs to make it a realizable computation; you can keep all the
overflow and underflow bits. Even if all the inputs were
double-precision floating point, you could choose to do no rounding
internally, creating an output in some gargantuan number format.
However, that is never how it's done, and in reality doing unbiased
rounding/saturation does amount to some cost.

Steve
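To make the word-width arithmetic in the post above concrete, here is a
short Python sketch of the two estimates (a back-of-the-envelope model
using the example widths from the post, not a bit-exact analysis):

    import math

    def fft_width_unrounded(input_bits, coeff_bits, n, radix):
        """Worst-case word width for an FFT with no intermediate rounding:
        each of the log_radix(n) stages adds a full coefficient's worth
        of fraction bits."""
        stages = round(math.log(n, radix))
        return input_bits + stages * coeff_bits

    def dft_width_unrounded(input_bits, coeff_bits, n):
        """Direct DFT: one multiply (input + coefficient bits), then the
        n-term accumulation adds about log2(n) carry bits."""
        return input_bits + coeff_bits + round(math.log2(n))

    print(fft_width_unrounded(16, 12, 1024, 4))  # 76 bits
    print(dft_width_unrounded(16, 12, 1024))     # 38 bits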
Steve Pope <spope33@speedymail.org> wrote:
(snip)
 
< One can in theory have as few or as many "rounding points" as one 
< chooses with either algorithm, but I believe the sense of 
< Oli's statement for the fixed point scenario is as follows:
(snip)
 
< For FFT:
 
< (1) round the coefficients;
< (2) after each butterfly, round the intermediate result
< (3) round the final outputs
 
< In the FFT case, if you try to omit the rounding in step (2),
< you have an enormous number of underflow bits to keep around,
< since at each stage the word width grows by the number of
< bits in the coefficient.

I suppose so.  But it seems more reasonable to me to add
one bit for each radix two butterfly.  That is, keep a constant
number of bits after the binary point, but an increasing number
before as the result of addition.   In some cases, that will
still be too many bits, but for 120 or 960 maybe not so bad.

-- glen 
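A sketch of glen's scheme in Python (assumed example format: 15 fraction
bits, twiddles of magnitude one; Python's unbounded integers stand in for
a fixed-point word whose integer part is allowed to widen):

    import math

    FRAC = 15  # fraction bits kept after every multiply (example value)

    def rnd(v):
        """Round a product with 2*FRAC fraction bits back to FRAC bits."""
        return (v + (1 << (FRAC - 1))) >> FRAC

    def butterfly(a, b, wr, wi):
        """Radix-2 DIT butterfly on (re, im) pairs scaled by 2**FRAC.
        The fraction stays FRAC bits wide after the twiddle multiply;
        the integer part may grow one bit from the add/subtract."""
        tr = rnd(b[0] * wr - b[1] * wi)
        ti = rnd(b[0] * wi + b[1] * wr)
        return (a[0] + tr, a[1] + ti), (a[0] - tr, a[1] - ti)

    # Example: full-scale inputs and a twiddle of exp(-j*pi/4)
    wr = round(math.cos(math.pi / 4) * 2**FRAC)
    wi = round(-math.sin(math.pi / 4) * 2**FRAC)
    top, bot = butterfly((1 << FRAC, 0), (1 << FRAC, 0), wr, wi)
    print(top, bot)  # outputs can reach twice the input magnitude:
                     # one extra integer bit per radix-2 stage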
glen herrmannsfeldt wrote:
> Steve Pope <spope33@speedymail.org> wrote:
> (snip)
>
> < One can in theory have as few or as many "rounding points" as one
> < chooses with either algorithm, but I believe the sense of
> < Oli's statement for the fixed point scenario is as follows:
> (snip)
I should have clarified; yes, I was assuming fixed-point maths. (Although I'm curious to know how the situation is significantly altered by the use of floating point.)
>
> < For FFT:
>
> < (1) round the coefficients;
> < (2) after each butterfly, round the intermediate result
> < (3) round the final outputs
>
> < In the FFT case, if you try to omit the rounding in step (2),
> < you have an enormous number of underflow bits to keep around,
> < since at each stage the word width grows by the number of
> < bits in the coefficient.
>
> I suppose so.  But it seems more reasonable to me to add
> one bit for each radix two butterfly.  That is, keep a constant
> number of bits after the binary point, but an increasing number
> before as the result of addition.  In some cases, that will
> still be too many bits, but for 120 or 960 maybe not so bad.
One bit of growth in each stage will account for the addition, but it
will not account for the increased number of fractional bits that exist
from the multiply by the twiddle factor.

Perhaps, in practice, this is less important than the "integer" growth,
although it's not immediately obvious to me why this would be!

-- Oli
Oli Charlesworth <catch@olifilth.co.uk> wrote:
< glen herrmannsfeldt wrote:
<> Steve Pope <spope33@speedymail.org> wrote:

(snip) 
< I should have clarified; yes, I was assuming fixed-point maths.
< (Although I'm curious to know how the situation is significantly altered
< by the use of floating point.)
(snip)
 
<> I suppose so.  But it seems more reasonable to me to add
<> one bit for each radix two butterfly.  That is, keep a constant
<> number of bits after the binary point, but an increasing number
<> before as the result of addition.   In some cases, that will
<> still be too many bits, but for 120 or 960 maybe not so bad.
 
< One bit of growth in each stage will account for the addition, but it
< will not account for the increased number of fractional bits that exist
< from the multiply by the twiddle factor.

If the input data to the transform has an uncertainty of about
1 in the LSB, and the twiddle factors are of order 1, then the
uncertainty of the product is about 1 in the LSB if the product is
rounded to the same number of places after the binary point.

Not that there is never reason to keep more, but you can see
why you wouldn't want any fewer.   In most cases, except for the
zero frequency (DC) term, there will be some cancellation.

If the input is a full-scale sinusoid, then it will take about
log2(N) extra bits on the left of the binary point.

It might be that, due to averaging, you gain log2(sqrt(N))
additional bits in the result, but I don't think it is too
far off if the result has the same scale factor as the input.
 
< Perhaps, in practice, this is less important than the "integer" growth,
< although it's not immediately obvious to me why this would be!

If you get the integer growth wrong, it overflows, likely with
very strange results.

-- glen
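Glen's growth estimates are easy to check numerically; a quick sketch
using numpy's double-precision FFT (unnormalized forward transform, as
in most FFT libraries):

    import numpy as np

    n = 1024
    t = np.arange(n)

    tone = np.exp(2j * np.pi * 37 * t / n)   # full-scale complex sinusoid
    print(np.abs(np.fft.fft(tone)).max())    # ~ n = 1024, i.e. log2(n) = 10
                                             # extra bits left of the point

    noise = np.random.default_rng(0).uniform(-1, 1, n)  # full-scale, incoherent
    print(np.abs(np.fft.fft(noise)).max())   # typically far below n --
                                             # most bins see cancellation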
glen herrmannsfeldt wrote:
> Oli Charlesworth <catch@olifilth.co.uk> wrote:
(snip)
> < One bit of growth in each stage will account for the addition, but it
> < will not account for the increased number of fractional bits that exist
> < from the multiply by the twiddle factor.
>
> If the input data to the transform has an uncertainty of about
> 1 in the LSB, and the twiddle factors are of order 1, then the
> uncertainty of the product is about 1 in the LSB if the product is
> rounded to the same number of places after the binary point.
(snip)
> If you get the integer growth wrong, it overflows, likely with
> very strange results.
Oh, absolutely agreed. However, a typical strategy to combat this is
simply to right-shift by 1 and truncate back to the original word
length (with rounding) after every radix-2 stage.

What I was questioning is why 1 bit of growth per stage (beyond the
strategy just mentioned) should be sufficient, whereas for
bit-exactness one would require (e.g.) 15. I suspect the answer is in
the first part of your response, to which I shall give some thought...

-- Oli
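The scale-by-half strategy Oli describes is simple to demonstrate; here
is a minimal fixed-point radix-2 DIT FFT sketch in Python (assumed
example format: 15 fraction bits, (re, im) integer pairs; with the right
shift after every stage the result comes out as FFT(x)/N):

    import cmath

    FRAC = 15  # example fraction width

    def fx(c):
        """Quantize a complex float to a fixed-point (re, im) pair."""
        return (round(c.real * 2**FRAC), round(c.imag * 2**FRAC))

    def rnd(v):
        """Round away the extra FRAC fraction bits of a product."""
        return (v + (1 << (FRAC - 1))) >> FRAC

    def fft_fixed(x):
        """Radix-2 DIT FFT on fixed-point pairs; each stage ends with a
        right-shift-by-1 (with rounding) so the word length stays put.
        The log2(N) shifts make the output FFT(x) / N."""
        n = len(x)
        if n == 1:
            return x
        even = fft_fixed(x[0::2])
        odd = fft_fixed(x[1::2])
        out = [None] * n
        for k in range(n // 2):
            w = cmath.exp(-2j * cmath.pi * k / n)
            wr, wi = round(w.real * 2**FRAC), round(w.imag * 2**FRAC)
            br, bi = odd[k]
            tr, ti = rnd(br * wr - bi * wi), rnd(br * wi + bi * wr)
            ar, ai = even[k]
            out[k] = ((ar + tr + 1) >> 1, (ai + ti + 1) >> 1)
            out[k + n // 2] = ((ar - tr + 1) >> 1, (ai - ti + 1) >> 1)
        return out

    # Full-scale tone in bin 3: FFT(x)[3]/8 ~ 1.0, i.e. ~2**FRAC here
    x = [fx(cmath.exp(2j * cmath.pi * 3 * i / 8)) for i in range(8)]
    print(fft_fixed(x))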
Steven G. Johnson wrote:

> ... There is some discussion of this in the
> penultimate section of our chapter of Sid Burrus' book (http://cnx.org/
> content/m16336/latest/), with references.
... Make that

  http://cnx.org/content/m16336/latest/

Jerry

-- 
Engineering is the art of making what you want from things you can get.
On Aug 14, 1:33 pm, "dspdummy" <chris.gl...@orange.fr> wrote:
> >On 2009-08-14 10:49:41 -0300, Jerry Avins <j...@ieee.org> said:
> >
> >> dspdummy wrote:
> >>
> >>    ...
> >>
> >>> Hello Jerry,
> >>> No pb. I was talking fft and not ifft at the beginning because of their
> >>> mathematical similarities but I DO need an IFFT. I'll check out that FFTW
> >>> but who is Gordon Sande?
> >>
> >> Gordon Sande and Steven Johnson maintain and improve FFTW. Both have
> >> contributed to this thread, so you can probably see their email
> >> addresses. *Netiquette: don't bug them privately unless invited to.*
> >>
> >> Jerry
> >
> >Give credit for FFTW to Steven Johnson. My code is much older as it
> >dates to 1966.
> >It is used in places like Xray crystallography on heavy iron if I understand
> >what I am told by my Xtalgraphy friends. FFTW is for things like workstations
> >like Suns. It is heavy on the software engineering of dealing with
> >limited caches.
>
> Hello again again
> Still trying to understand FFTW but not quite there yet.
> My final question concerning my problem is:
> How can I efficiently (low MIPS) perform a 15-pt fft instead of a dft?
> Prime factor or other algorithm?
> It would be great to see the algorithm coded in Matlab or C, but only for
> this specific case. Any suggestions or examples?
> Thanks
> Bill
I believe that FFTW does this sort of thing implicitly for you.

If you want to do it yourself, look at the Rice University Connexions
collateral, for example:

  http://cnx.org/content/m12033/latest/

There is also a collection of resources at:

  http://faculty.prairiestate.edu/skifowit/fft/

At OptNgn (http://www.optngn.com), we have developed Winograd-based
kernels for FPGAs.

alan
On Aug 15, 5:26 am, Oli Charlesworth <ca...@olifilth.co.uk> wrote:
> I should have clarified; yes, I was assuming fixed-point maths.
> (Although I'm curious to know how the situation is significantly altered
> by the use of floating point.)
The error growth is completely different in fixed and floating point.
In fixed point, a Cooley-Tukey FFT has rounding errors that grow as
O(sqrt(N)), leading to an often-quoted rule of "half a bit of error
per radix-2 stage". In floating point, the same algorithm gives
rounding errors that are at worst O(log N) and on average
O(sqrt(log N)).

Steven
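The floating-point behavior is easy to observe; a quick experiment using
numpy's FFT (a Cooley-Tukey implementation), measuring single-precision
results against a double-precision reference:

    import numpy as np

    rng = np.random.default_rng(0)
    for p in (4, 8, 12, 16, 20):
        n = 2 ** p
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        ref = np.fft.fft(x)                       # double-precision reference
        est = np.fft.fft(x.astype(np.complex64))  # single-precision transform
        err = np.linalg.norm(est - ref) / np.linalg.norm(ref)
        print(f"N = 2**{p:2d}: relative RMS error = {err:.2e}")

    # The error stays near single-precision epsilon and grows only very
    # slowly with N, consistent with the O(sqrt(log N)) average behavior.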
>On Aug 14, 1:33 pm, "dspdummy" <chris.gl...@orange.fr> wrote:
(snip)
>> Hello again again
>> Still trying to understand FFTW but not quite there yet.
>> My final question concerning my problem is:
>> How can I efficiently (low MIPS) perform a 15-pt fft instead of a dft?
>> Prime factor or other algorithm?
>> It would be great to see the algorithm coded in Matlab or C, but only for
>> this specific case. Any suggestions or examples?
>> Thanks
>> Bill
>
>I believe that FFTW does this sort of thing implicitly for you.
>
>If you want to do it yourself, look at the Rice University Connexions
>collateral, for example:
>http://cnx.org/content/m12033/latest/
>
>There is also a collection of resources at:
>http://faculty.prairiestate.edu/skifowit/fft/
ok. I got it, thanks mostly to the cnx reference above and that wikipedia
reference. Thanks a lot for the help! If anyone is interested, I can send
you the Matlab M-file for the 15-point dft using the prime factor algorithm.
Now I can also extend it to 960 or 120 pts.

Bill
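For reference, the prime-factor (Good-Thomas) structure for N = 15 is
compact enough to sketch directly; here is a Python version (numpy's fft
stands in for hand-coded 3- and 5-point modules, which is where a real
implementation would earn its speed):

    import numpy as np

    def pfa_dft_15(x):
        """15-point DFT via the prime factor algorithm, N = 3 * 5.
        The index maps are built on coprime factors, so no twiddle
        factors are needed between the two short-DFT passes."""
        N1, N2, N = 3, 5, 15
        x = np.asarray(x, dtype=complex)
        n1 = np.arange(N1).reshape(N1, 1)
        n2 = np.arange(N2).reshape(1, N2)
        A = x[(N2 * n1 + N1 * n2) % N]   # input map: n = (5*n1 + 3*n2) mod 15
        A = np.fft.fft(A, axis=0)        # five 3-point DFTs
        A = np.fft.fft(A, axis=1)        # three 5-point DFTs
        k1 = np.arange(N1).reshape(N1, 1)
        k2 = np.arange(N2).reshape(1, N2)
        X = np.empty(N, dtype=complex)
        X[(10 * k1 + 6 * k2) % N] = A    # CRT output map: k = (10*k1 + 6*k2) mod 15
        return X

    # Check against a direct FFT
    x = np.random.default_rng(1).standard_normal(15) + 0j
    print(np.max(np.abs(pfa_dft_15(x) - np.fft.fft(x))))  # ~ 1e-15

The same 3 x 5 decomposition nests: 120 = 8 * 3 * 5 and 960 = 64 * 3 * 5,
with the power-of-two factor handled by an ordinary Cooley-Tukey stage.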
dspdummy wrote:

   ...

> ok. I got it, thanks mostly to the cnx reference above and that wikipedia
> reference. Thanks a lot for the help! If anyone is interested, I can send
> you the Matlab M-file for the 15-point dft using the prime factor algorithm.
> Now I can also extend it to 960 or 120 pts.
My hat is off to you for your enthusiasm and comprehension.
Nevertheless, I doubt that your finished code will be as fast as FFTW.
Why not use the tools that are available for the cost of downloading?

Jerry

-- 
Engineering is the art of making what you want from things you can get.