DSPRelated.com
Forums

FFT stitching

Started by SManeesha 4 years ago6 replieslatest reply 4 years ago516 views

Post deleted by author

[ - ]
Reply by KnutIngeMarch 17, 2020

The basic power of two factorization that describes the FFT in the first place shows how to use two N/2 FFTs (+ pre or post processing) to do one N fft. By doing this recursively you can see how large FFTs can be broken into small ones

[ - ]
Reply by kazMarch 17, 2020

True, The butterfly is just fft of 2 samples (N = 2) the rest of the design of any N fft is built on this unit.

Any modifications such as using 64 N for 1024 N etc. is playing around with specific cases but any advantage is implementation dependent between given small fft and the larger fft built around it. 

[ - ]
Reply by rbjMarch 17, 2020

kaz, even doing a butterfly is implementation dependent, i think, because how you gotta handle the indexing of the two in-place bins and then doing the add/subtract math and the scaling with a complex number.

but, i think how the stitching is done is by "manually" implementing the butterflies that include 64-point FFT as a single super-butterfly.

considering the illustration that i stole from https://www.slideshare.net/op205/fast-fourier-transform-presentation , the rectangular 2-point DFT is the same as one of the circular nodes. if you pick the four x0, x4, x2, x6 and go to A0, A1, A2, A3 (two passes) that would be a 4-point DFT node.  put it in a box with four in and four out.  stitch four of those together and you have a 16-point DFT.  at some point you are stitching together 64-point nodes (that would have 6 passes of 2-point DFTs which are otherwise known as "butterflies").




dft_61328.jpg

[ - ]
Reply by kazMarch 17, 2020

I was at some point excited about doing Altera (Intel) 2k fft from their 2x1k fft as described here(figure 2):

https://www.researchgate.net/publication/279753021_Implementing_super-efficient_FFTs_in_Altera_FPGAs

However relative to a 2k fft with serial stream input and serial stream output, what the authors have missed is that to parallise inputs as two substreams for 1k case (or alternatively get one output stream from two output streams) extra memory is needed. Fortunately in my design there were already memory blocks and so I could do parallel substreams at no extra cost and took advantage. Such advantage is purely because how the 2k core was designed in the first place. 

[ - ]
Reply by lamabrewMarch 17, 2020
I read this as sounding like extending the "trick" of reducing FFT complexity by 2 when the input sequence is real (i.e complex ={0}) since you know the output will be symmetric, etc.  In that case it is like getting two for the price of one.  But I don't think you can extend that to M orthogonal datasets, I have a vague memory (and can't find a reference...) that the computations to separate the output are on the same order of complexity as just doing the desired FFTs.  To me that seems to also make intuitive sense, maybe some of the math experts could formally answer that, as I'm curious now too.
[ - ]
Reply by angel0dmosMarch 17, 2020

Do you mean how to perfor a 1024 point FFT with a 64 points core?? This topic is shown in R. Lyons Book in the tips and tricks chapter.