DSPRelated.com
Forums

assemble 2K FFT by 2 times 1K FFT?

Started by RobR July 29, 2008
Hi folks,

I heard about the possibility to 
obtain higher order FFTs by utilizing lower order FFTs.
In fact, this IS the basic idea of FFT as far as i know.
But since I am not that deep into the theory of FFT, I have a question to
the experts.

I want to perform a 2K FFT within FPGA.
Now i know, that 2K is just a power of 2 but not of 4, what would enable
to use radix-4 and so a complexity of Nlog4N.
Now my complexity calculation is:

2K FFT:
2048*log2(2048)=22528

2x 1K FFT:
2*1024*log4(1024)=10240

So, if this really works i would come out with half the complexity.

If it's possible, which input samples (time domain) should I feed in wich
1K FFT and how do I merge the output? Will the output merging process will
equal the complexity to the one of the conventional 2K FFT and so will
bring no gain?

Best regards,
Robert
See http://www.dsprelated.com/showarticle/63.php


On Jul 29, 9:40 am, "RobR" <mas...@gmx.de> wrote:
> I heard about the possibility to > obtain higher order FFTs by utilizing lower order FFTs. > In fact, this IS the basic idea of FFTas far as i know. > But since I am not that deep into the theory ofFFT, I have a question to > the experts. > > I want to perform a 2KFFTwithin FPGA. > Now i know, that 2K is just a power of 2 but not of 4, what would enable > to use radix-4 and so a complexity of Nlog4N.
You're confused about the "complexity", and about radix 2 vs. 4. First of all, "complexity" generally refers to asymptotic "big-O" analyses. O(N log2 N) and O(N log4 N) are exactly the same thing, because log2 N and log4 N only differ by a constant factor. Second, if you want an exact arithmetic count, it is not N log2 N and N log4 N. There is a constant coefficient in front that you are leaving off, and there is also an O(N) term that you can't ignore until N is really large. Third, the savings from radix 2 vs. radix 4 do not come from the log2 vs. log4 per se, and the differences in arithmetic counts are most certainly *not* a factor of 2, because the radix-4 algorithm has a bigger constant factor in front of the N log4 N (a single radix-4 butterfly takes more computation than a single radix-2 butterfly). When you work it out, the arithmetic (adds+multiples) count of the radix-2 algorithm goes asymptotically as 5 N log2 N, while the adds +multiples count of the radix-4 algorithm goes asymptotically as 17/4 N log2 N (see http://cnx.org/content/m12027/latest/). The radix-4 algorithm is slightly better (by 15%) because one radix-4 butterfly costs less than 2 radix-2 butterflies, due to some multiplications that can be simplified in the radix-4 case. Fourth, you shouldn't think of radix-2 and radix-4 as algorithms where you *have* to use only one or the other. Think of them as steps in an FFT: each radix-2 step divides N by 2, and each radix-4 step divides N by 4. So, it is perfectly fine to do N=2048 as one radix-2 step followed by 5 radix-4 steps. If you understand the derivation of the radix-2 and radix-4 algorithms, this should be obvious. Fifth, if you really care about minimizing arithmetic counts, there are more algorithms out there than radix-2 and radix-4. e.g. the classic split-radix algorithm (count ~ 4 N log2 N) and recent variations thereof (count ~ 34/9 N log2 N .... see e.g. www.fftw.org/newsplit.pdf) Sixth, depending on your hardware, simply minimizing arithmetic counts is probably not the right thing to do. e.g. for an FPGA I'm guessing multiplies are more expensive (although I have no particular expertise in FPGAs), and there are various things you can do to trade off multiplications for additions. Regards, Steven G. Johnson
>Sixth, depending on your hardware, simply minimizing arithmetic counts >is probably not the right thing to do. e.g. for an FPGA I'm guessing >multiplies are more expensive (although I have no particular expertise >in FPGAs), and there are various things you can do to trade off >multiplications for additions. > >Regards, >Steven G. Johnson >
Newer FPGAs (the last several years or so) have dedicated multipliers in them. Typical size is only 18 bits, but they often come with logic of some sort plus an accumulator (such as with the Xilinx SX parts). Complex multiplies obviously take several multipliers depending upon the actual implementation. Doing larger bit-widths will increase the number of required multiplies, too. Ultimately, given the way the FPGA software works, it is difficult to trade multiplies and additions directly since it will often figure out the best implementation w.r.t. whichever parameter you choose to optimize against, e.g., you can optimize for speed or size. Mark