DSPRelated.com
Forums

FFT algorithms

Started by Richard Owlett November 24, 2007
Richard Owlett wrote:

(I wrote)
>> Square waves are not orthogonal, which makes the transform
>> harder, but not impossible.
> Why would they be any less orthogonal than sets of sines and cosines?
> They should also span the space as well as sines and cosines.
They span the space fine, but aren't orthogonal.
>> Still, you might look at the ICT. It is a transform specifically
>> designed to make the forward transform fast on a machine with no
>> multiply instructions. You might find from there some of the
>> properties of transforms on non-orthogonal basis functions.
>> The ICT assumes that the inverse transform is done on a much
>> faster processor (on earth). The forward case can be done
>> fast on a CDP1802.
> By ICT do you mean Inverse Cosine Transform? That was the only relevant
> Google hit I found. My signal is real but not necessarily even (or odd).
> I'd be interested in anything that eliminated multiplications.
Integer Cosine Transform.

http://tda.jpl.nasa.gov/progress_report/42-119/119M.pdf

Consider that you have to do the forward transform on a CDP1802, maybe
similar in power to a 6502. You have a limited time in which to do the
transform, and limited memory for intermediate data.

The inverse can be done using the fastest hardware you can find, and
you are much less restricted in the time available. Such restrictions
led to an interesting new transform. (Well, not so new now.)

-- glen
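[To give a feel for what "no multiply instructions" buys, here is a
hedged sketch in the spirit of integer-cosine-style transforms. The
coefficients are the well-known 4-point integer DCT approximation used
by H.264, not necessarily the ones in the JPL report above; the point
is that the forward pass needs only adds, subtracts, and shifts.]

/* Sketch: 4-point integer approximation to the DCT (H.264-style
   coefficients, NOT necessarily the JPL ICT).  The forward transform
   uses only adds, subtracts, and one shift, so it runs on a CPU
   without a multiply instruction. */
void int_dct4(const int x[4], int y[4])
{
    int s03 = x[0] + x[3], d03 = x[0] - x[3];
    int s12 = x[1] + x[2], d12 = x[1] - x[2];
    y[0] = s03 + s12;          /* row [ 1  1  1  1 ] */
    y[1] = (d03 << 1) + d12;   /* row [ 2  1 -1 -2 ] */
    y[2] = s03 - s12;          /* row [ 1 -1 -1  1 ] */
    y[3] = d03 - (d12 << 1);   /* row [ 1 -2  2 -1 ] */
}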
Ron N. wrote:

> On Nov 25, 4:30 am, Richard Owlett <rowl...@atlascomm.net> wrote:
>> glen herrmannsfeldt wrote:
>>> Richard Owlett wrote:
>>> (snip)
>>>> I'm not interested in the computation time but in whether or not I
>>>> can follow what's going on. Eventually I want to experiment with
>>>> using square waves instead of sinusoids as my basis functions. I
>>>> suspect it will be a better model for my situation.
>>> Square waves are not orthogonal, which makes the transform
>>> harder, but not impossible.
>> Why would they be any less orthogonal than sets of sines and cosines?
>> They should also span the space as well as sines and cosines.
>
> It depends on which waves. Even waves are orthogonal to odd
> waves (sinusoidal or square), periodic waves are orthogonal
> to waves at 2^N times the frequency (sinusoidal or square).
> But square waves are not orthogonal to square waves at three
> times the frequency (or other odd multiples), whereas
> sinusoids are.
Oh, the trouble one gets into when reasoning by analogy without
checking ;)
> There are Walsh functions which do span a space and are
> orthogonal, but they are not purely "square" waves, if square
> means periodic with a constant 50% duty cycle. Since they
> are not completely constant in periodicity or duty cycle,
> using them to determine "frequency content", for more common
> usages of that term, might be problematic.
I can't see an EXPLICIT requirement for a 50% duty cycle. However, I'm
looking for simplicity, and a ZERO mean value can easily be shown to be
a requirement. So a 50% duty cycle probably is IMPLICIT. Thanks.
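[A quick numerical check of Ron's claim above; this is a sketch added
for illustration, not code from the thread. It correlates a square wave
against the square wave at three times its frequency, and the same pair
of sines for comparison.]

#include <math.h>
#include <stdio.h>

#define TWOPI 6.2831853071795864769
#define NSAMP 1200   /* divisible by 6, so both periods fit exactly */

static double square(double t) { return sin(t) >= 0 ? 1.0 : -1.0; }

int main(void)
{
    double dot_sq = 0.0, dot_sin = 0.0;
    for (int n = 0; n < NSAMP; ++n) {
        double t = TWOPI * n / NSAMP;
        dot_sq  += square(t) * square(3*t);   /* square vs. 3x square */
        dot_sin += sin(t)    * sin(3*t);      /* sine vs. 3x sine     */
    }
    /* prints roughly 0.333 for the squares (not orthogonal!) and
       ~0 for the sines */
    printf("squares: %g   sines: %g\n", dot_sq / NSAMP, dot_sin / NSAMP);
    return 0;
}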
glen herrmannsfeldt wrote:

> (snip)
>
> Integer Cosine Transform.
>
> http://tda.jpl.nasa.gov/progress_report/42-119/119M.pdf
Written by and FOR mathematicians ;<
> Consider that you have to do the forward transform on a CDP1802,
When Googling for CDP1802 to see if it was what I recalled as an 1802,
I found: 'A CDP1802 is much like "486" or Pentium is to Windows PCs
today'

http://www.cdp1802.org/cdp1802.html

OK, so what's a little context between friends ;)
> (snip)
Richard Owlett wrote:
> In "Chapter 12 - The Fast Fourier Transform / FFT Programs" of _The > Scientist and Engineer's Guide to Digital Signal Processing_ > Smith says "There are also FFT routines that completely eliminate the > bit reversal sorting." > > Could someone point me to a description of one, preferably with BASIC or > FORTRAN sample code. > > TIA
KissFFT does not use bit-reversed addressing; I think bit-reversed
addressing is not a good match for a mixed-radix fft.

If you are interested in seeing the derivation, I put together a pdf
that contains the derivation used in kissfft.

http://sourceforge.net/projects/kissfft
(go to download, browse all files)

-- Mark
Mark Borgerding wrote:
> (snip regarding kissfft and its derivation pdf)
Downloaded. This group seems to insist I understand the math I took 40 years ago ;) Thanks
On Nov 24, 9:45 am, Richard Owlett <rowl...@atlascomm.net> wrote:
> In "Chapter 12 - The Fast Fourier Transform / FFT Programs" of _The > Scientist and Engineer's Guide to Digital Signal Processing_ > Smith says "There are also FFT routines that completely eliminate the > bit reversal sorting." > > Could someone point me to a description of one, preferably with BASIC or > FORTRAN sample code.
A number of references are given in the Wikipedia article on the
Cooley-Tukey FFT algorithm:

http://en.wikipedia.org/wiki/Cooley-Tukey_FFT_algorithm

The Cooley-Tukey FFT (the most common algorithm), along with other FFT
algorithms for that matter, intrinsically requires some sort of
re-ordering, in the sense that its operations map consecutive inputs to
non-consecutive outputs and vice versa.

The most well-known approach is to do the re-ordering all at once, with
a single bit-reversal pass (or, more generally, a mixed-base digit
reversal). Another approach that still works in-place is to arrange the
sequence of radices so that one can do a sequence of small
transpositions interleaved with the transform (alternating DIT/DIF
stages work well here). The simplest approaches work out-of-place, e.g.
the Stockham algorithm goes back and forth between two arrays with each
pass, transposing one digit at a time.

The easiest approach, in my mind, is simply to do the algorithm
recursively out-of-place; if you directly transcribe the Cooley-Tukey
algorithm from the mathematical description, working recursively, all
of the outputs are "automatically" put in the right place. (This was
the approach we used in the original FFTW; the current version of FFTW
implements a variety of algorithms as described in our Proc. IEEE
paper.)

For example, here is a very simple 15-line radix-2 FFT, working
recursively and out-of-place, that does not require any separate
bit-reversal pass. (To make it reasonably fast, one would want to use a
larger base case than the size-1 base case used here. It also uses a
relatively inaccurate trigonometric recurrence. But of course, if you
were actually interested in such practical issues you should be using a
pre-existing FFT library.)

#include <complex.h>

#define TWOPI 6.2831853071795864769252867665590057683943388

/* call with rec_fft(+1 or -1, N, input, output, 1, 1), where input and
   output are arrays of length N ... requires C99 complex-number
   support */
void rec_fft(int sign, int N, complex double *x, complex double *y,
             int is, int os)
{
    if (N == 1)
        *y = *x;
    else {
        int N2 = N/2;
        rec_fft(sign, N2, x, y, is*2, os);
        rec_fft(sign, N2, x + is, y + os*N2, is*2, os);
        complex double omega = 1.0,
                       omega1 = cexp(sign * I * TWOPI / N);
        for (int i = 0; i < N2; ++i) {
            complex double a = y[os*i], b = omega * y[os*(i + N2)];
            y[os*i] = a + b;
            y[os*(i + N2)] = a - b;
            omega *= omega1;
        }
    }
}

Regards,
Steven G. Johnson

PS. Numerical Recipes, which several other posters have suggested, is
not the best reference for information beyond the textbook radix-2
algorithm. Its advice on more sophisticated FFT algorithms is at least
20 years out of date, to the extent that it was ever true.
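[A minimal driver for the listing above, added here for illustration
(not part of the original post). Compiled together with rec_fft, it
forward-transforms an 8-point impulse, whose DFT should be a flat
spectrum of ones.]

#include <stdio.h>

int main(void)
{
    complex double x[8] = { 1 }, y[8];  /* impulse: x[0] = 1, rest 0 */
    rec_fft(-1, 8, x, y, 1, 1);         /* sign = -1: forward FFT */
    for (int i = 0; i < 8; ++i)         /* expect 1+0i in every bin */
        printf("%g %+gi\n", creal(y[i]), cimag(y[i]));
    return 0;
}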
On Nov 24, 11:30 pm, robert bristow-johnson
<r...@audioimagination.com> wrote:
> there are drawings of it in Oppenheim and Schafer (any era of O&S) for
> the case of N=8. i think i can see a pattern in them but, they do not
> appear to me to be as efficient, for most instruction sets, as the
> simple FFT where you have to bit reverse before or after. you need to
> maintain many more pointers and i'm not completely sure how you
> manipulate them all for the different FFT passes.
To the contrary, a separate bit-reversal pass means that you have an
additional "pass" over your data, which can have a non-negligible
impact on performance (if the rest of your FFT is fast... if you are
using vanilla radix-2 it doesn't make much difference). For large N,
the problem is the cache impact. For small N, e.g. N=8, you should just
read everything into registers, compute the transform, and write it
back out in the correct order (or whatever order you want). Even for N
larger than the cache (or registers, which can be viewed as a kind of
cache), the goal is to do as much as possible with the data while it is
in cache, and avoid going in and out of the cache as much as possible.

One point of potential confusion: the FFT variants that do not require
bit-reversal are not really different "algorithms". All of the ones
being referred to here are still variants of the same Cooley-Tukey
factorization of the DFT, with the same number of arithmetic
operations. The only difference lies in where/when data are
stored/accessed in memory.

Regards,
Steven G. Johnson
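[For reference, the separate "pass" under discussion is just a
permutation of the data. A common in-place form looks like this; a
sketch added for illustration, for power-of-two N.]

#include <complex.h>

/* In-place bit-reversal permutation: swap x[i] with x[rev(i)], once
   per pair.  j tracks the bit-reversed value of i: each iteration
   "adds 1" to j with the carry propagating from the high bit down. */
void bit_reverse(complex double *x, int N)
{
    for (int i = 0, j = 0; i < N; ++i) {
        if (i < j) {                 /* swap each pair exactly once */
            complex double t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
        int m = N >> 1;              /* reversed increment of j */
        while (m >= 1 && j >= m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
}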
On Nov 27, 12:35 pm, "Steven G. Johnson" <stev...@alum.mit.edu> wrote:
> (snip: Steven's points above about the cost of a separate bit-reversal
> pass, cache behavior, and no-bit-reversal variants being the same
> Cooley-Tukey factorization)
Steven, what's different (in terms of code) is the pointer arithmetic.
doing all sorts of goofy pointer arithmetic might be more costly, in
the aggregate throughout all of the FFT passes, than a single
bit-reversing pass at the beginning or end. the normal in-place FFT
should do better cache-wise than these normal-in, normal-out routines
because the top (or bottom) of the butterfly will share pages of memory
with the top (or bottom) of the next butterfly. and in-place should do
better cache-wise than an indexing scheme where two input nodes get
read in and crunched and then written to two totally different
locations for the output (and this cannot be in place, you would have
to have two buffers, each of size N, and ping-pong back and forth
between the two for each pass).

in addition, as i tried to point out earlier, if you're doing something
like fast convolution using the FFT, then if your forward FFT leaves
its results in bit-reversed order and your inverse FFT takes its input
in bit-reversed order (and leaves the results in normal order), then no
bit reversing will be necessary. (the transfer function, besides being
pre-computed, will also be pre-bit-reversed.)

r b-j
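[A sketch of the fast-convolution arrangement r b-j describes, added
for illustration. fft_dif and ifft_dit are hypothetical placeholders,
not any real library's API: a DIF forward FFT that takes natural order
in and leaves bit-reversed order out, and a DIT inverse FFT that does
the opposite. The intermediate ordering cancels, so no reversal pass is
ever run.]

#include <complex.h>

/* Hypothetical placeholder routines (see note above). */
void fft_dif(complex double *x, int n);   /* natural in, bit-rev out */
void ifft_dit(complex double *x, int n);  /* bit-rev in, natural out */

/* Fast convolution with no bit-reversal pass: H_br is the transfer
   function, pre-computed AND pre-bit-reversed, so the pointwise
   multiply happens entirely in bit-reversed order. */
void fast_convolve(complex double *x, const complex double *H_br, int n)
{
    fft_dif(x, n);                 /* natural in -> bit-reversed out */
    for (int i = 0; i < n; ++i)
        x[i] *= H_br[i];           /* both factors in the same order */
    ifft_dit(x, n);                /* bit-reversed in -> natural out */
}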
Hi Richard,
I haven't seen these alternate FFTs in years, but I can look around if
you have a need. However, they are just as cryptic and difficult to
understand as the conventional ones. Also, I doubt that you could
modify one to use square waves. It sounds like you should be using the
"DFT by Correlation" method. See page 157 in Chapter 8 of my on-line
book (www.DSPguide.com). This can directly be modified to use square
waves. Good luck and let me know if I can help.
Regards,
Steve

Steve.Smith1@SpectrumSDI.com    
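[For readers without the book at hand, a minimal sketch of the "DFT by
Correlation" idea, written from the general description above rather
than copied from page 157: each output bin is the input correlated with
a basis wave, so swapping sinusoids for square waves is a one-line
change.]

#include <math.h>

#define TWOPI 6.2831853071795864769

/* Correlate the input with each basis wave to get the real and
   imaginary parts of bins 0..N/2.  O(N^2), but trivially adaptable:
   e.g. replace c and s with copysign(1.0, c) and copysign(1.0, s) to
   correlate against square waves instead (a hypothetical variant,
   subject to the non-orthogonality caveats discussed earlier). */
void dft_by_correlation(const double *x, double *re, double *im, int N)
{
    for (int k = 0; k <= N/2; ++k) {
        re[k] = 0.0;
        im[k] = 0.0;
        for (int n = 0; n < N; ++n) {
            double c = cos(TWOPI * k * n / N);
            double s = sin(TWOPI * k * n / N);
            re[k] += x[n] * c;
            im[k] -= x[n] * s;
        }
    }
}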
Steven G. Johnson wrote:
(snip regarding bit-reversal in FFTs)

> The most well-known approach is to do the re-ordering all at once,
> with a single bit-reversal pass (or, more generally, a mixed-base
> digit reversal). Another approach that still works in-place is to
> arrange the sequence of radices so that one can do a sequence of small
> transpositions interleaved with the transform (alternating DIT/DIF
> stages work well here). The simplest approaches work out-of-place,
> e.g. the Stockham algorithm goes back and forth between two arrays
> with each pass, transposing one digit at a time.
I still remember a Fortran FFT subroutine from long ago that did bit
reversal using nested DO loops. Depending on N (the number of points)
it would (illegally) GOTO the appropriate nested DO and (maybe) GOTO
out at the appropriate point. Not very portable, but it must have
worked on at least one machine.

-- glen
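[Since the Stockham algorithm came up more than once: here is one
recursive formulation of it, a sketch written from the description
quoted above rather than from any particular library. Each pass
ping-pongs between the two arrays and moves one "digit" into place as
it goes, so the output comes out in natural order with no bit-reversal
pass. Call it as stockham(sign, N, 1, 0, x, work); the result lands
back in x.]

#include <complex.h>

#define TWOPI 6.2831853071795864769

/* Radix-2 Stockham autosort FFT (sketch): n is the current length,
   s the stride, and eo flips each pass to track which array currently
   holds the data.  Each pass writes from x into y with the butterfly
   outputs interleaved, which performs the re-ordering incrementally. */
void stockham(int sign, int n, int s, int eo,
              complex double *x, complex double *y)
{
    if (n == 1) {
        if (eo)                       /* odd number of passes taken:  */
            for (int q = 0; q < s; q++)
                y[q] = x[q];          /* copy so result ends up in x  */
        return;
    }
    int m = n / 2;
    for (int p = 0; p < m; p++) {
        complex double wp = cexp(sign * I * TWOPI * p / n);
        for (int q = 0; q < s; q++) {
            complex double a = x[q + s*p];
            complex double b = x[q + s*(p + m)];
            y[q + s*(2*p)]     = a + b;          /* butterfly outputs */
            y[q + s*(2*p + 1)] = (a - b) * wp;   /* written interleaved */
        }
    }
    stockham(sign, m, 2*s, !eo, y, x);  /* ping-pong: swap the arrays */
}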