DSPRelated.com
Forums

non-recursive fft

Started by MatthewA November 22, 2019
In the past I have coded an FFT where you calculate one column per function call (there are log2(fftsize) columns).  There is a variant of the standard FFT where the arrows that connect one column to the next are always the same. It’s in Oppenheim and Schaefer if you have it. This would allow you to call the same function (with different twiddle factors passed to the function) on each call.  The drawback is that the input and output vectors are in a strange order.
 Alternatively you could call a different function for each column calculation.  
Typically in an audio application you are using overlapping time-domain windows separated by a “hop” factor. If the hop amount is greater than the number of FFT columns then you might be able to execute one FFT column per input sample, and still keep up with the input data rate.  

Bob
On Fri, 22 Nov 2019 08:37:32 -0800 (PST), MatthewA
<matthewaudio@gmail.com> wrote:

>Is there any info on how to implement fft and ifft algorithms that aren't recursive? I'd like to possibly calculate a really long on inside a signal chain so attempting to compute a 2 second fft in one function call would probably be disastrous. > >-Matt
Multiple FFTs can be pipelined, since independent processors can work on each stage (there are log2(N) stages in an N-point FFT), so that a new FFT output is produced after every stage-length computation rather than waiting for an entire FFT computation. A single, large-N FFT could be computed in parallel by using a DFT instead of an FFT. There would be more computational load overall, but you can exploit parallelism to potentially reduce latency. For an N-point DFT you could have N separate processors compute each output, since there is no interdepence between outputs. If you only have M processors, then each processor computes N/M outputs. Not sure if that's what you're looking for, but thought I'd throw it out there.
On Friday, November 22, 2019 at 12:04:20 PM UTC-6, MatthewA wrote:
> > The point of the original question is, how can I spread the computation of an FFT out over multiple vector-based performance routine calls. (not sure the exact term for that.)
If I understand your question correctly, you want to spread the computation of an N-point DFT across P processors, with each processor computing a unique portion of the transform, such that they can all be combined afterward. Going back to the defining equation: N-1 X(k) = SUM {x(n)&middot;exp(-j2PIkn/N)}, k = 0 ... N n=0 You could have one processor compute, for example, X(k) for k = 0 ... N/P-1, another compute for k = N/P ... 2N/P-1, etc. Search for "pruned FFT". Alternatively, you could have one processor compute N/P-1 X0(k) = SUM {x(n)&middot;exp(-j2PIkn/N)}, k = 0 ... N, n=0 another processor compute 2N/P-1 X1(k) = SUM {x(n)&middot;exp(-j2PIkn/N)}, k = 0 ... N, n=N/P and so on. It should be fairly easy to devise a fast algorithm for this. You could even combine both techniques. - Greg
On Friday, November 22, 2019 at 8:37:37 AM UTC-8, MatthewA wrote:
> Is there any info on how to implement fft and ifft algorithms > that aren't recursive? I'd like to possibly calculate a really > long on inside a signal chain so attempting to compute > a 2 second fft in one function call would probably be disastrous.
The FFT is defined through recursion, but rarely implemented that way. The recursive definition makes it easy to see where the log(N) comes from, though. It is rarely most efficient to write recursive algorithms using actual recursion, the recursive descent compiler possibly being an exception. As for FFT, it isn't hard to describe in parallel terms, but the array element access pattern complicates the implementation on some hardware.