DSPRelated.com
Forums

Million-point FFT

Started by gongdori December 11, 2012
Hi all,

I am interested in performing million-point FFT in FPGA. I've read some
papers and concluded that the best way of performing million-point FFT
would be having multiple smaller size FFT in parallel, for example 8x
16K-point FFT, with twiddle factor multipliers (I think it will be
radix-2^2 streaming FFT with multiple inputs)
But, since I am a newbie in DSP, I want to check if there is any easier
way. I am especially interested in a way where I can avoid parallel input.

Could you give me your idea/comment?
Thanks!

Thomas


gongdori <16548@dsprelated> wrote:
 
> I am interested in performing million-point FFT in FPGA.
Do you want to do it in an FPGA just to see that it can be done? The whole idea behind FFT is that it is O(N logN) and so can be done fast in software. A million points isn't all that many for modern processors.
> I've read some > papers and concluded that the best way of performing million-point FFT > would be having multiple smaller size FFT in parallel, for example 8x > 16K-point FFT, with twiddle factor multipliers (I think it will be > radix-2^2 streaming FFT with multiple inputs) > But, since I am a newbie in DSP, I want to check if there is any easier > way.
As far as I know, parallell inputs don't help the FPGA much, though it might help data get in faster. What is the data source? If you can get the data in bit reversed order, then the FFT is very easy to do. If you write it all to RAM, then read it out in bit reversed order, it should work pretty well. The only reason to move the data in parallel is to get it in and out faster. How fast do you need it to run?
> I am especially interested in a way where I can avoid parallel input.
-- glen
On Tue, 11 Dec 2012 19:28:34 +0000, glen herrmannsfeldt wrote:

> gongdori <16548@dsprelated> wrote: > >> I am interested in performing million-point FFT in FPGA. > > Do you want to do it in an FPGA just to see that it can be done? > > The whole idea behind FFT is that it is O(N logN) and so can be done > fast in software. A million points isn't all that many for modern > processors.
For certain problems -- and this one 'feels' like a candidate -- doing math on an FPGA can be less expensive in terms of BOM cost, space consumed by hardware, and/or power consumption. Assuming, of course, that one is building custom hardware anyway. If you just want to do it once, use a PC! -- My liberal friends think I'm a conservative kook. My conservative friends think I'm a liberal kook. Why am I not happy that they have found common ground? Tim Wescott, Communications, Control, Circuits & Software http://www.wescottdesign.com
 
>> gongdori <16548@dsprelated> wrote: >>> I am interested in performing million-point FFT in FPGA.
(snip, then I wrote)
>> Do you want to do it in an FPGA just to see that it can be done?
>> The whole idea behind FFT is that it is O(N logN) and so can be done >> fast in software. A million points isn't all that many for modern >> processors.
Then Tim Wescott <tim@seemywebsite.com> wrote:
> For certain problems -- and this one 'feels' like a candidate -- doing > math on an FPGA can be less expensive in terms of BOM cost, space > consumed by hardware, and/or power consumption.
Certainly that is true. But then again, you can always use a soft processor on the FPGA. The OP didn't say fixed or floating point, which also might matter. It could also be a school assignment to do something on an FPGA, and possibly with the requirement that it not be in a soft processor.
> Assuming, of course, that one is building custom hardware anyway. > If you just want to do it once, use a PC!
That is true, too. But how fast and how often are important in deciding on the optimal design, especially regarding I/O bus width. I will guess that the OP isn't a native english speaker:
>>> I am interested in performing million-point FFT in FPGA.
That should say either "a million point" (only one) or "million-point FFTs" (and maybe how many or how fast). If you can get the data already in bit-reversed order, it should be easy to make it really fast. -- glen
On Wed, 12 Dec 2012 00:07:32 +0000, glen herrmannsfeldt wrote:

>>> gongdori <16548@dsprelated> wrote: >>>> I am interested in performing million-point FFT in FPGA. > > (snip, then I wrote) >>> Do you want to do it in an FPGA just to see that it can be done? > >>> The whole idea behind FFT is that it is O(N logN) and so can be done >>> fast in software. A million points isn't all that many for modern >>> processors. > > Then Tim Wescott <tim@seemywebsite.com> wrote: >> For certain problems -- and this one 'feels' like a candidate -- doing >> math on an FPGA can be less expensive in terms of BOM cost, space >> consumed by hardware, and/or power consumption. > > Certainly that is true. But then again, you can always use a soft > processor on the FPGA. > > The OP didn't say fixed or floating point, which also might matter. > > It could also be a school assignment to do something on an FPGA, and > possibly with the requirement that it not be in a soft processor. > >> Assuming, of course, that one is building custom hardware anyway. >> If you just want to do it once, use a PC! > > That is true, too. But how fast and how often are important in deciding > on the optimal design, especially regarding I/O bus width. > > I will guess that the OP isn't a native english speaker: > >>>> I am interested in performing million-point FFT in FPGA. > > That should say either "a million point" (only one) > or "million-point FFTs" (and maybe how many or how fast). > > If you can get the data already in bit-reversed order, it should be easy > to make it really fast. > > -- glen
On the whole, I think a soft processor would be slower, bigger, hotter, and more expensive than a real one. I was thinking more on the lines that if the task entailed slurping in 10^6 data points, doing an FFT, then spewing out 10^6 results as fast as possible, that an FPGA might (most certainly might, not "would") be the answer. Of course, if the assignment is coming from a prof, then that's all she wrote. -- Tim Wescott Control system and signal processing consulting www.wescottdesign.com
gongdori wrote:
> Hi all, > > I am interested in performing million-point FFT in FPGA. I've read some > papers and concluded that the best way of performing million-point FFT > would be having multiple smaller size FFT in parallel, for example 8x > 16K-point FFT, with twiddle factor multipliers (I think it will be > radix-2^2 streaming FFT with multiple inputs) > But, since I am a newbie in DSP, I want to check if there is any easier > way. I am especially interested in a way where I can avoid parallel input. > > Could you give me your idea/comment? > Thanks! > > Thomas > >
A million points is 22+ seconds of audio @ CD rate. This happens *all the time* in software, on a GPP. Takes...seconds, perhaps a fraction of a second. -- Les Cargill
Tim Wescott <tim@seemywebsite.please> wrote:
(snip)
>> Then Tim Wescott <tim@seemywebsite.com> wrote: >>> For certain problems -- and this one 'feels' like a candidate -- doing >>> math on an FPGA can be less expensive in terms of BOM cost, space >>> consumed by hardware, and/or power consumption.
(snip, then I wrote)
>> Certainly that is true. But then again, you can always use a soft >> processor on the FPGA.
>> The OP didn't say fixed or floating point, which also might matter.
>> It could also be a school assignment to do something on an FPGA, and >> possibly with the requirement that it not be in a soft processor.
(snip)
> On the whole, I think a soft processor would be slower, bigger, hotter, > and more expensive than a real one.
Yes, but the FPGA has the advantage that you can put much of the other logic that you need inside, also.
> I was thinking more on the lines that if the task entailed slurping in > 10^6 data points, doing an FFT, then spewing out 10^6 results as fast as > possible, that an FPGA might (most certainly might, not "would") be the > answer.
The "as fast as possible" was not in the original post. For fixed point, I would be pretty sure that the FPGA would be a good choice. I would be a lot less sure for floating point. Many FPGA families include an 18x18 block multiplier. If 18 bits is enough, then the limit is mostly how fast you can stuff numbers through the multipliers, though a little bit moving them around. I haven't gone through the bit-reversed order algorithm recently, but I believe that if you can get them in that order it works pretty well. If not, then you have to store them somewhere, and though FPGAs do have a lot of memory now, maybe not that much.
> Of course, if the assignment is coming from a prof, then that's all she > wrote.
Yes. Maybe the OP will explain more. -- glen
gongdori <16548@dsprelated> wrote:
> I am interested in performing million-point FFT in FPGA.
You don't mention the word size you want to use, fixed or floating point, input speed, latency requirements, complex or real-only inputs, etc., so it's difficult to say: this is what you need to look at. However, FFT architectures can be broadly categorized into 4 basic types: 1) single processor, 2) multi-processor, 3) pipeline (one processor for each butterfly column), or 4) systolic (one processor per butterfly). Within the 4 categories, there are a large number of variations. Rabiner and Gold's book: Theory and Application of Digital Signal Processing (Chapter 10), would be a good place to start. Pipeline architectures can be very high-speed, while minimizing the input/output requirements. Data streams into one end, and, after some latency, streams out the other. Most architectures operate continuously, with no breaks on the input or output (ie: the first million data points to be transformed are followed by the second million to be transformed, etc.). Depending on the algorithm/ architecture, bit-reversal can be done on the input, on the output, or within the processor by proper buffering. A relatively straightforward design would be a 1024 point pipeline, followed by a million word buffer, followed by another 1024 point pipeline. Each 1024 point pipeline could be configured as 10 radix-2 stages, but there are a great many variations possible, depending on the requirements. There are companies that produce FPGA designs for FFT pipelines, such as Dillon Engineering, and Annapolis Microsystems: http://ecat.edacafe.com/corpprofile.php?vendor_id=9001925 http://www.annapmicro.com/aboutus.html But I think you'll have to be more specific about your requirements before you proceed. Kevin
"gongdori" <16548@dsprelated> writes:

> Hi all, > > I am interested in performing million-point FFT in FPGA.
Why in an FPGA specifically? * For interest? * or because you have analysis that says an FPGA is optimal for your particular set of constraints? The first is good enough reason in itself. Otherwise until you *need* an FPGA to meet some constraint or other, steer clear. (I speak as one who "does" FPGAs for a living)
> I've read some papers and concluded that the best way of performing > million-point FFT would be having multiple smaller size FFT in > parallel, for example 8x 16K-point FFT, with twiddle factor > multipliers (I think it will be radix-2^2 streaming FFT with multiple > inputs) But, since I am a newbie in DSP, I want to check if there is > any easier way. I am especially interested in a way where I can avoid > parallel input.
2 calculations might be instructive: * How big an FPGA do you need to do a 16K point FFT - you'll probably find yourself multiplier or RAM limited rather than logic limited, so the calculations are realtively easy * For you chosen FFT setup: how long will you spend shuffling data in and out of external memory? FFTs in FPGA often turn into a memory-bandwidth-bound problem, as the memory/logic density of FPGAs doesn't match up to the cache/processing density of a high-end processor. Cheers, Martin -- martin.j.thompson@trw.com TRW Conekt - Consultancy in Engineering, Knowledge and Technology http://www.conekt.co.uk/capabilities/39-electronic-hardware
On Tue, 11 Dec 2012 07:20:07 -0600, gongdori wrote:

> Hi all, > > I am interested in performing million-point FFT in FPGA. I've read some > papers and concluded that the best way of performing million-point FFT > would be having multiple smaller size FFT in parallel, for example 8x > 16K-point FFT, with twiddle factor multipliers (I think it will be > radix-2^2 streaming FFT with multiple inputs) > But, since I am a newbie in DSP, I want to check if there is any easier > way. I am especially interested in a way where I can avoid parallel > input. > > Could you give me your idea/comment? > Thanks! > > Thomas
1M points would require 128Mb of storage, if using 128 bits for each complex point. You didn't specify a SFDR or other metric which would allow the number of bits to be established. I guessed at 128. The largest FPGAs in the newest families from the two leading FPGA vendors have onboard storage for about 50-60Mb. These largest parts cost thousands of dollars each. They will not fit all your points in onboard RAM. Your 16k point suggestion would require 2Mb, again assuming 128 bits per complex point. Suitably sized FPGAs for 16k points are much cheaper, but you would need an external memory to hold all 1M points. The disadvantage of external memory is that it becomes the limiting factor in the throughput. In that sense, an FPGA won't do any better than a (cheaper and easier-to-program) DSP chip which can use exactly the same type of memory. A single 16 bit wide DDR3 RAM chip requires about 40 FPGA pins and can transfer about 10-15Gb/s. (The RAMs can go much faster, but it's hard to get the soft FPGA controllers to work as well as the hard controllers in other devices.) OTOH, you can perhaps fit a dozen such RAM interfaces into a (large) FPGA, leading to up to 180Gb/s of data transfer. That's a peak rate though. It can be sustained only with linear addressing of large blocks (for some value of "large"). I'm guessing your method of using multiple 16k FFTs requires the opposite of linear addressing, so this may not work out too well unless you're very careful with how you (re)order you RAM accesses.
> I am especially interested in a way where I can avoid parallel input.
Use serial input then. Many modern FPGAs have multiple multi-megabit transceivers. A protocol like 10Gb/s Ethernet could give you a "fill time" of less than 20ms. Nice homework problem, BTW. Regards, Allan