comp.dsp | non-recursive fft

Is there any info on how to implement fft and ifft algorithms that aren't recursive?  I'd like to possibly calculate a really long on inside a signal chain so attempting to compute a 2 second fft in one function call would probably be disastrous.  

-Matt

Reply by Steve Pope ●November 22, 20192019-11-22

MatthewA  <matthewaudio@gmail.com> wrote:

>Is there any info on how to implement fft and ifft algorithms that
>aren't recursive?  I'd like to possibly calculate a really long on
>inside a signal chain so attempting to compute a 2 second fft in one
>function call would probably be disastrous.  

They usually are not recursive.

I think what you're saying though is you want a lower-latency FFT/IFFT.
There can be a complexity/tradeoff issue but you can by throwing 
compute resources at it make the latency as close to zero as you wish.

Steve

Reply by MatthewA ●November 22, 20192019-11-22

Gosh, Steve...  apologies for my naivety.   The only implementation I've seen has been recursive.  I'm a bit of a noob in this regard.    I'm definitely looking to compute these frames with a high degree of latency.  The idea here is that we trade immediacy for spectral accuracy because, in my use case, latency isn't a huge problem as long as it stays under a second or two.    In my current implementation I use a low priority thread and do it on the graphics processor since it could be up to a few seconds of 44.1k audio. 

The point of the original question is, how can I spread the computation of an FFT out over multiple vector-based performance routine calls.  (not sure the exact term for that.)

&#8499;

Reply by Steve Pope ●November 22, 20192019-11-22

MatthewA  <matthewaudio@gmail.com> wrote:

>Gosh, Steve...  apologies for my naivety.   The only implementation I've
>seen has been recursive.  I'm a bit of a noob in this regard.    

No problem.

In terms of DSP, an FFT is not recursive, it is a FIR (finite impulse
response) filter, not an IIR (infinite impulse response, recursive) filter.

In terms of being sometimes a piece of software, sure an FFT would likely
be "recursive", or at least re-entrant, by building up bigger FFT's
from smaller ones in a recursion.

I could have guessed this is what you meant.

Any, "unrolling" loops and recursion in software can speed things up,
but not maybe to the extent you need.

Steve

Reply by MatthewA ●November 22, 20192019-11-22

Ouch!

Reply by Piergiorgio Sartor ●November 22, 20192019-11-22

On 22/11/2019 19.33, MatthewA wrote:
> Ouch!

Ops, sorry, I did not meant to be
rude or offend, or I misunderstood...

bye,

-- 

piergiorgio

Reply by Christian Gollwitzer ●November 22, 20192019-11-22

Am 22.11.19 um 19:04 schrieb MatthewA:
> Gosh, Steve...  apologies for my naivety.   The only implementation I've seen has been recursive.  I'm a bit of a noob in this regard.    I'm definitely looking to compute these frames with a high degree of latency.  The idea here is that we trade immediacy for spectral accuracy because, in my use case, latency isn't a huge problem as long as it stays under a second or two.    In my current implementation I use a low priority thread and do it on the graphics processor since it could be up to a few seconds of 44.1k audio.
> 
> The point of the original question is, how can I spread the computation of an FFT out over multiple vector-based performance routine calls.  (not sure the exact term for that.)
> 

Ah, so the "real question" is: How can I efficiently compute an FFT on a 
parallel vector processor like a GPU?

Unfortunately, that question is nontrivial, because, as you have 
discovered, there is a lot of dopendency between the input and output 
values. Each input value in a FFT has an influence onto each output 
value, which makes parallel processing hard.

You have several options, depending on your input length, that might 
speed up things.


1) Do you really need the full FFT, or just a few single bins? If only a 
few bins are required, a direct convolution can be quite efficient on 
the GPU. You can precompute the sine/cosine for these frequency bins and 
compute the direct dot product.

"True" FFT only pays off if you want the full transform. There is even a 
special algorithm for the computation of single FFT bins, the Goertzel 
algorithm.

2) Do you have maybe multiple FFTs which you can process in parallel? In 
particular, if you compute 2D FFTs, then the FFT runs on each row and 
then on each column, which can all be processed in parallel

3) There are open source libraries for FFTs on GPUs, e.g. 
https://github.com/clMathLibraries/clFFT for OpenCL, based on some early 
examples by Apple. The code is not that straight forward and it requires 
multiple passes on the host compiler, so your mileage may vary.

	Christian

Reply by MatthewA ●November 22, 20192019-11-22

Gosh I had no clue I could be interpreted so many ways.  I am very sorry.  Let me put it in pseudocode:

cnt = 0
maxCnt= 2000

audioPerform(double *in, double *out){

    //this will interrupt audio!!!!  
    if(cnt==0)
          Compute an entire 16777216 bin fft 

      cnt =(cnt+1)%maxCnt
}


audioPerform(double *in, double *out){

    //this is presumable will not stop audio!!!!  
    Compute (16777216/maxCnt) bins of a  16777216 Bin fft.

      
}

Reply by Tauno Voipio ●November 23, 20192019-11-23

On 23.11.19 02:06, MatthewA wrote:
> Gosh I had no clue I could be interpreted so many ways.  I am very sorry.  Let me put it in pseudocode:
> 
> cnt = 0
> maxCnt= 2000
> 
> audioPerform(double *in, double *out){
> 
>      //this will interrupt audio!!!!
>      if(cnt==0)
>            Compute an entire 16777216 bin fft
> 
>        cnt =(cnt+1)%maxCnt
> }
> 
> 
> audioPerform(double *in, double *out){
> 
>      //this is presumable will not stop audio!!!!
>      Compute (16777216/maxCnt) bins of a  16777216 Bin fft.
> 
>        
> }


The FFT cannot be partitioned in the way you ask,
but you can compute one butterfly at a time. It does
not, however, produce any useful results until the
last butterfly is computed.

-- 

-TV

Reply by Marcel Mueller ●November 23, 20192019-11-23

Am 23.11.19 um 00:05 schrieb Christian Gollwitzer:
> Am 22.11.19 um 19:04 schrieb MatthewA:
>> The point of the original question is, how can I spread the 
>> computation of an FFT out over multiple vector-based performance 
>> routine calls.&nbsp; (not sure the exact term for that.)
> 
> Ah, so the "real question" is: How can I efficiently compute an FFT on a 
> parallel vector processor like a GPU?
> 
> Unfortunately, that question is nontrivial, because, as you have 
> discovered, there is a lot of dopendency between the input and output 
> values. Each input value in a FFT has an influence onto each output 
> value, which makes parallel processing hard.

In fact it is possible and it is implemented e.g. by the hello_fft 
example for the Raspberry Pi. (Even Pi Model 1 is by far fast enough to 
do real time audio processing with FFT on the GPU, e.g a 64 k FIR filter.)

But you are right, you have to synchronize several times depending on 
the capacity of your single stages. The basic idea is that any FFT can 
be divided into cascaded smaller FFTs of arbitrary size.
It depends on the actual architecture how many synchronizations are 
required. If for instance your hardware can process a 4 bit FFT (16 
values) in a single shader unit and you want to do a 16 bit FFT (64k) 
then you need 4 stages. After each stage the result is written back to 
memory and each stage processes different tuples of 16 values in a 
shader unit. In the example you need 4 synchronizations to do an 64 k 
FFT, 3 between the stages an the last one when everything is done.

You should take care of caching issues during processing because the 
memory accesses of some stages aligned at higher powers of 2 might 
impact cache efficiency largely. So basically FFT is mostly memory bound 
rather than CPU/GPU bound on modern hardware.
It may simplify things significantly if you can pass either the input or 
the output array in bit reversed order, e.g. store the frequency domain 
values always bit reversed.


Marcel

Previous12 Next

non-recursive fft

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group