comp.dsp | assemble 2K FFT by 2 times 1K FFT?

Hi folks,

I heard about the possibility to 
obtain higher order FFTs by utilizing lower order FFTs.
In fact, this IS the basic idea of FFT as far as i know.
But since I am not that deep into the theory of FFT, I have a question to
the experts.

I want to perform a 2K FFT within FPGA.
Now i know, that 2K is just a power of 2 but not of 4, what would enable
to use radix-4 and so a complexity of Nlog4N.
Now my complexity calculation is:

2K FFT:
2048*log2(2048)=22528

2x 1K FFT:
2*1024*log4(1024)=10240

So, if this really works i would come out with half the complexity.

If it's possible, which input samples (time domain) should I feed in wich
1K FFT and how do I merge the output? Will the output merging process will
equal the complexity to the one of the conventional 2K FFT and so will
bring no gain?

Best regards,
Robert

Reply by Chris Felton ●July 29, 20082008-07-29

See http://www.dsprelated.com/showarticle/63.php

Reply by Steven G. Johnson ●August 12, 20082008-08-12

On Jul 29, 9:40 am, "RobR" <mas...@gmx.de> wrote:
> I heard about the possibility to
> obtain higher order FFTs by utilizing lower order FFTs.
> In fact, this IS the basic idea of FFTas far as i know.
> But since I am not that deep into the theory ofFFT, I have a question to
> the experts.
>
> I want to perform a 2KFFTwithin FPGA.
> Now i know, that 2K is just a power of 2 but not of 4, what would enable
> to use radix-4 and so a complexity of Nlog4N.

You're confused about the "complexity", and about radix 2 vs. 4.

First of all, "complexity" generally refers to asymptotic "big-O"
analyses.  O(N log2 N) and O(N log4 N) are exactly the same thing,
because log2 N and log4 N only differ by a constant factor.

Second, if you want an exact arithmetic count, it is not N log2 N and
N log4 N.  There is a constant coefficient in front that you are
leaving off, and there is also an O(N) term that you can't ignore
until N is really large.

Third, the savings from radix 2 vs. radix 4 do not come from the log2
vs. log4 per se, and the differences in arithmetic counts are most
certainly *not* a factor of 2, because the radix-4 algorithm has a
bigger constant factor in front of the N log4 N (a single radix-4
butterfly takes more computation than a single radix-2 butterfly).
When you work it out, the arithmetic (adds+multiples) count of the
radix-2 algorithm goes asymptotically as 5 N log2 N, while the adds
+multiples count of the radix-4 algorithm goes asymptotically as 17/4
N log2 N (see http://cnx.org/content/m12027/latest/).  The radix-4
algorithm is slightly better (by 15%) because one radix-4 butterfly
costs less than 2 radix-2 butterflies, due to some multiplications
that can be simplified in the radix-4 case.

Fourth, you shouldn't think of radix-2 and radix-4 as algorithms where
you *have* to use only one or the other.  Think of them as steps in an
FFT: each radix-2 step divides N by 2, and each radix-4 step divides N
by 4.  So, it is perfectly fine to do N=2048 as one radix-2 step
followed by 5 radix-4 steps.  If you understand the derivation of the
radix-2 and radix-4 algorithms, this should be obvious.

Fifth, if you really care about minimizing arithmetic counts, there
are more algorithms out there than radix-2 and radix-4.  e.g. the
classic split-radix algorithm (count ~ 4 N log2 N) and recent
variations thereof (count ~ 34/9 N log2 N .... see e.g. www.fftw.org/newsplit.pdf)

Sixth, depending on your hardware, simply minimizing arithmetic counts
is probably not the right thing to do.  e.g. for an FPGA I'm guessing
multiplies are more expensive (although I have no particular expertise
in FPGAs), and there are various things you can do to trade off
multiplications for additions.

Regards,
Steven G. Johnson

Reply by markt ●August 13, 20082008-08-13

>Sixth, depending on your hardware, simply minimizing arithmetic counts
>is probably not the right thing to do.  e.g. for an FPGA I'm guessing
>multiplies are more expensive (although I have no particular expertise
>in FPGAs), and there are various things you can do to trade off
>multiplications for additions.
>
>Regards,
>Steven G. Johnson
>
Newer FPGAs (the last several years or so) have dedicated multipliers in
them.  Typical size is only 18 bits, but they often come with logic of some
sort plus an accumulator (such as with the Xilinx SX parts).  Complex
multiplies obviously take several multipliers depending upon the actual
implementation.  Doing larger bit-widths will increase the number of
required multiplies, too.

Ultimately, given the way the FPGA software works, it is difficult to
trade multiplies and additions directly since it will often figure out the
best implementation w.r.t. whichever parameter you choose to optimize
against, e.g., you can optimize for speed or size.

Mark

assemble 2K FFT by 2 times 1K FFT?

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group