comp.dsp | Million-point FFT

Hi all,

I am interested in performing million-point FFT in FPGA. I've read some
papers and concluded that the best way of performing million-point FFT
would be having multiple smaller size FFT in parallel, for example 8x
16K-point FFT, with twiddle factor multipliers (I think it will be
radix-2^2 streaming FFT with multiple inputs)
But, since I am a newbie in DSP, I want to check if there is any easier
way. I am especially interested in a way where I can avoid parallel input.

Could you give me your idea/comment?
Thanks!

Thomas

Reply by glen herrmannsfeldt ●December 11, 20122012-12-11

gongdori <16548@dsprelated> wrote:

> I am interested in performing million-point FFT in FPGA. 

Do you want to do it in an FPGA just to see that it can be done?

The whole idea behind FFT is that it is O(N logN) and so can be
done fast in software. A million points isn't all that many
for modern processors.

> I've read some
> papers and concluded that the best way of performing million-point FFT
> would be having multiple smaller size FFT in parallel, for example 8x
> 16K-point FFT, with twiddle factor multipliers (I think it will be
> radix-2^2 streaming FFT with multiple inputs)
> But, since I am a newbie in DSP, I want to check if there is any easier
> way. 

As far as I know, parallell inputs don't help the FPGA much, though
it might help data get in faster. 

What is the data source? If you can get the data in bit reversed
order, then the FFT is very easy to do. 

If you write it all to RAM, then read it out in bit reversed order, 
it should work pretty well. 

The only reason to move the data in parallel is to get it in and
out faster. How fast do you need it to run?

> I am especially interested in a way where I can avoid parallel input.

-- glen

Reply by Tim Wescott ●December 11, 20122012-12-11

On Tue, 11 Dec 2012 19:28:34 +0000, glen herrmannsfeldt wrote:

> gongdori <16548@dsprelated> wrote:
>  
>> I am interested in performing million-point FFT in FPGA.
> 
> Do you want to do it in an FPGA just to see that it can be done?
> 
> The whole idea behind FFT is that it is O(N logN) and so can be done
> fast in software. A million points isn't all that many for modern
> processors.

For certain problems -- and this one 'feels' like a candidate -- doing 
math on an FPGA can be less expensive in terms of BOM cost, space 
consumed by hardware, and/or power consumption.

Assuming, of course, that one is building custom hardware anyway.  If you 
just want to do it once, use a PC!

-- 
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com

Reply by glen herrmannsfeldt ●December 11, 20122012-12-11

>> gongdori <16548@dsprelated> wrote:
>>> I am interested in performing million-point FFT in FPGA.

(snip, then I wrote)
>> Do you want to do it in an FPGA just to see that it can be done?

>> The whole idea behind FFT is that it is O(N logN) and so can be done
>> fast in software. A million points isn't all that many for modern
>> processors.

Then Tim Wescott <tim@seemywebsite.com> wrote:
> For certain problems -- and this one 'feels' like a candidate -- doing 
> math on an FPGA can be less expensive in terms of BOM cost, space 
> consumed by hardware, and/or power consumption.

Certainly that is true. But then again, you can always use a soft
processor on the FPGA. 

The OP didn't say fixed or floating point, which also might matter.

It could also be a school assignment to do something on an FPGA, and
possibly with the requirement that it not be in a soft processor.

> Assuming, of course, that one is building custom hardware anyway.  
> If you just want to do it once, use a PC!

That is true, too. But how fast and how often are important in deciding
on the optimal design, especially regarding I/O bus width.

I will guess that the OP isn't a native english speaker:

>>> I am interested in performing million-point FFT in FPGA.

That should say either "a million point" (only one) 
or "million-point FFTs" (and maybe how many or how fast).

If you can get the data already in bit-reversed order, it should 
be easy to make it really fast.

-- glen

Reply by Tim Wescott ●December 11, 20122012-12-11

On Wed, 12 Dec 2012 00:07:32 +0000, glen herrmannsfeldt wrote:

>>> gongdori <16548@dsprelated> wrote:
>>>> I am interested in performing million-point FFT in FPGA.
> 
> (snip, then I wrote)
>>> Do you want to do it in an FPGA just to see that it can be done?
>  
>>> The whole idea behind FFT is that it is O(N logN) and so can be done
>>> fast in software. A million points isn't all that many for modern
>>> processors.
>  
> Then Tim Wescott <tim@seemywebsite.com> wrote:
>> For certain problems -- and this one 'feels' like a candidate -- doing
>> math on an FPGA can be less expensive in terms of BOM cost, space
>> consumed by hardware, and/or power consumption.
> 
> Certainly that is true. But then again, you can always use a soft
> processor on the FPGA.
> 
> The OP didn't say fixed or floating point, which also might matter.
> 
> It could also be a school assignment to do something on an FPGA, and
> possibly with the requirement that it not be in a soft processor.
>  
>> Assuming, of course, that one is building custom hardware anyway.
>> If you just want to do it once, use a PC!
> 
> That is true, too. But how fast and how often are important in deciding
> on the optimal design, especially regarding I/O bus width.
> 
> I will guess that the OP isn't a native english speaker:
> 
>>>> I am interested in performing million-point FFT in FPGA.
> 
> That should say either "a million point" (only one)
> or "million-point FFTs" (and maybe how many or how fast).
> 
> If you can get the data already in bit-reversed order, it should be easy
> to make it really fast.
> 
> -- glen

On the whole, I think a soft processor would be slower, bigger, hotter, 
and more expensive than a real one.

I was thinking more on the lines that if the task entailed slurping in 
10^6 data points, doing an FFT, then spewing out 10^6 results as fast as 
possible, that an FPGA might (most certainly might, not "would") be the 
answer.

Of course, if the assignment is coming from a prof, then that's all she 
wrote.

-- 
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

Reply by Les Cargill ●December 11, 20122012-12-11

gongdori wrote:
> Hi all,
>
> I am interested in performing million-point FFT in FPGA. I've read some
> papers and concluded that the best way of performing million-point FFT
> would be having multiple smaller size FFT in parallel, for example 8x
> 16K-point FFT, with twiddle factor multipliers (I think it will be
> radix-2^2 streaming FFT with multiple inputs)
> But, since I am a newbie in DSP, I want to check if there is any easier
> way. I am especially interested in a way where I can avoid parallel input.
>
> Could you give me your idea/comment?
> Thanks!
>
> Thomas
>
>


A million points is 22+ seconds of audio @ CD rate. This happens *all 
the time* in software, on a GPP. Takes...seconds, perhaps a fraction
of a second.

--
Les Cargill

Reply by glen herrmannsfeldt ●December 12, 20122012-12-12

Tim Wescott <tim@seemywebsite.please> wrote:
(snip)
>> Then Tim Wescott <tim@seemywebsite.com> wrote:
>>> For certain problems -- and this one 'feels' like a candidate -- doing
>>> math on an FPGA can be less expensive in terms of BOM cost, space
>>> consumed by hardware, and/or power consumption.

(snip, then I wrote)
>> Certainly that is true. But then again, you can always use a soft
>> processor on the FPGA.

>> The OP didn't say fixed or floating point, which also might matter.

>> It could also be a school assignment to do something on an FPGA, and
>> possibly with the requirement that it not be in a soft processor.

(snip)

> On the whole, I think a soft processor would be slower, bigger, hotter, 
> and more expensive than a real one.

Yes, but the FPGA has the advantage that you can put much of the other
logic that you need inside, also.

> I was thinking more on the lines that if the task entailed slurping in 
> 10^6 data points, doing an FFT, then spewing out 10^6 results as fast as 
> possible, that an FPGA might (most certainly might, not "would") be the 
> answer.

The "as fast as possible" was not in the original post. For fixed point,
I would be pretty sure that the FPGA would be a good choice. I would
be a lot less sure for floating point.

Many FPGA families include an 18x18 block multiplier. If 18 bits
is enough, then the limit is mostly how fast you can stuff numbers
through the multipliers, though a little bit moving them around.

I haven't gone through the bit-reversed order algorithm recently, but
I believe that if you can get them in that order it works pretty well.
If not, then you have to store them somewhere, and though FPGAs do have
a lot of memory now, maybe not that much.

> Of course, if the assignment is coming from a prof, then that's all she 
> wrote.

Yes. Maybe the OP will explain more.

-- glen

Reply by kevin ●December 12, 20122012-12-12

gongdori <16548@dsprelated> wrote:
> I am interested in performing million-point FFT in FPGA.

You don't mention the word size you want to use, fixed or floating
point, input speed, latency requirements, complex or real-only inputs,
etc., so it's difficult to say: this is what you need to look at.
However, FFT architectures can be broadly categorized into 4 basic
types: 1) single processor, 2) multi-processor, 3) pipeline (one
processor for each butterfly column), or 4) systolic (one processor
per butterfly).  Within the 4 categories, there are a large number of
variations.  Rabiner and Gold's book: Theory and Application of
Digital Signal Processing (Chapter 10), would be a good place to
start.

Pipeline architectures can be very high-speed, while minimizing the
input/output requirements.  Data streams into one end, and, after some
latency, streams out the other.  Most architectures operate
continuously, with no breaks on the input or output (ie: the first
million data points to be transformed are followed by the second
million to be transformed, etc.).  Depending on the algorithm/
architecture, bit-reversal can be done on the input, on the output, or
within the processor by proper buffering.

A relatively straightforward design would be a 1024 point pipeline,
followed by a million word buffer, followed by another 1024 point
pipeline.  Each 1024 point pipeline could be configured as 10 radix-2
stages, but there are a great many variations possible, depending on
the requirements.

There are companies that produce FPGA designs for FFT pipelines, such
as Dillon Engineering, and Annapolis Microsystems:

http://ecat.edacafe.com/corpprofile.php?vendor_id=9001925

http://www.annapmicro.com/aboutus.html

But I think you'll have to be more specific about your requirements
before you proceed.

Kevin

Reply by Martin Thompson ●December 12, 20122012-12-12

"gongdori" <16548@dsprelated> writes:

> Hi all,
>
> I am interested in performing million-point FFT in FPGA. 

Why in an FPGA specifically?  

* For interest?
* or because you have analysis that says an FPGA is optimal for your
  particular set of constraints?

The first is good enough reason in itself.  Otherwise until you *need*
an FPGA to meet some constraint or other, steer clear.  (I speak as one
who "does" FPGAs for a living)

> I've read some papers and concluded that the best way of performing
> million-point FFT would be having multiple smaller size FFT in
> parallel, for example 8x 16K-point FFT, with twiddle factor
> multipliers (I think it will be radix-2^2 streaming FFT with multiple
> inputs) But, since I am a newbie in DSP, I want to check if there is
> any easier way. I am especially interested in a way where I can avoid
> parallel input.

2 calculations might be instructive:

* How big an FPGA do you need to do a 16K point FFT - you'll probably
  find yourself multiplier or RAM limited rather than logic limited, so
  the calculations are realtively easy

* For you chosen FFT setup: how long will you spend shuffling data in
  and out of external memory?

FFTs in FPGA often turn into a memory-bandwidth-bound problem, as the
memory/logic density of FPGAs doesn't match up to the cache/processing
density of a high-end processor.

Cheers,
Martin

-- 
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.co.uk/capabilities/39-electronic-hardware

Reply by Allan Herriman ●December 12, 20122012-12-12

On Tue, 11 Dec 2012 07:20:07 -0600, gongdori wrote:

> Hi all,
> 
> I am interested in performing million-point FFT in FPGA. I've read some
> papers and concluded that the best way of performing million-point FFT
> would be having multiple smaller size FFT in parallel, for example 8x
> 16K-point FFT, with twiddle factor multipliers (I think it will be
> radix-2^2 streaming FFT with multiple inputs)
> But, since I am a newbie in DSP, I want to check if there is any easier
> way. I am especially interested in a way where I can avoid parallel
> input.
> 
> Could you give me your idea/comment?
> Thanks!
> 
> Thomas

1M points would require 128Mb of storage, if using 128 bits for each 
complex point.  You didn't specify a SFDR or other metric which would 
allow the number of bits to be established.  I guessed at 128.

The largest FPGAs in the newest families from the two leading FPGA 
vendors have onboard storage for about 50-60Mb.  These largest parts cost 
thousands of dollars each.  They will not fit all your points in onboard 
RAM.

Your 16k point suggestion would require 2Mb, again assuming 128 bits per 
complex point.  Suitably sized FPGAs for 16k points are much cheaper, but 
you would need an external memory to hold all 1M points.

The disadvantage of external memory is that it becomes the limiting 
factor in the throughput.  In that sense, an FPGA won't do any better 
than a (cheaper and easier-to-program) DSP chip which can use exactly the 
same type of memory.

A single 16 bit wide DDR3 RAM chip requires about 40 FPGA pins and can 
transfer about 10-15Gb/s.  (The RAMs can go much faster, but it's hard to 
get the soft FPGA controllers to work as well as the hard controllers in 
other devices.)
OTOH, you can perhaps fit a dozen such RAM interfaces into a (large) 
FPGA, leading to up to 180Gb/s of data transfer.
That's a peak rate though.  It can be sustained only with linear 
addressing of large blocks (for some value of "large").  I'm guessing 
your method of using multiple 16k FFTs requires the opposite of linear 
addressing, so this may not work out too well unless you're very careful 
with how you (re)order you RAM accesses.

> I am especially interested in a way where I can avoid parallel input.

Use serial input then.  Many modern FPGAs have multiple multi-megabit 
transceivers.  A protocol like 10Gb/s Ethernet could give you a "fill 
time" of less than 20ms.

Nice homework problem, BTW.

Regards,
Allan

Previous12 Next

Million-point FFT

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group