DSPRelated.com
Forums

All-real FFT for FPGA

Started by Tim Wescott February 12, 2017
On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:

> On 2/13/2017 9:39 PM, Tim Wescott wrote:
>> On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:
>>
>>> On 2/13/2017 12:56 PM, Tim Wescott wrote:
>>>> On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:
>>>>
>>>>> On 2/13/2017 12:03 AM, Tim Wescott wrote:
>>>>>> On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:
>>>>>>
>>>>>>> On 2/12/2017 1:05 PM, Tim Wescott wrote:
>>>>>>>> So, there are algorithms out there to perform an FFT on real
>>>>>>>> data, that save (I think) roughly 2x the calculations of FFTs
>>>>>>>> for complex data.
>>>>>>>>
>>>>>>>> I did a quick search, but didn't find any that are made
>>>>>>>> specifically for FPGAs. Was my search too quick, or are there
>>>>>>>> no IP sources to do this?
>>>>>>>>
>>>>>>>> It would seem like a slam-dunk for Xilinx and Intel/Altera to
>>>>>>>> include these algorithms in their FFT libraries.
>>>>>>>
>>>>>>> I thought I replied to this, but my computer has been crashing a
>>>>>>> bit so maybe I lost that one.
>>>>>>>
>>>>>>> An FFT is inherently a complex function. Real data can be
>>>>>>> processed by taking advantage of some of the symmetries in the
>>>>>>> FFT. I don't recall the details as it has been over a decade
>>>>>>> since I worked with this, but you can fold half of the real data
>>>>>>> into the imaginary portion, run a size N/2 FFT and then unfold
>>>>>>> the results. I believe you have to run a final N sized complex
>>>>>>> pass on these results to get your final answer. I do recall it
>>>>>>> saved a *lot* when performing FFTs, nearly 50%.
>>>>>>
>>>>>> My understanding is that there were some software packages that
>>>>>> baked that into the algorithm, for what savings I don't know. I
>>>>>> was wondering if it was done for FFTs as well.
>>>>>
>>>>> I'm not sure what you mean. When you say "baking" it into the
>>>>> algorithm, it would do pretty much what I described. That *is* the
>>>>> algorithm. I haven't heard of any other optimizations. The savings
>>>>> is in time/size. Essentially it does an N/2 size FFT with an extra
>>>>> pass, so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.
>>>>
>>>> There's a distinct symmetry to the Fourier transform that you could
>>>> use at each step of the way instead of doing the whole thing and
>>>> fixing it up at the end.
>>>>
>>>> I don't know if it would save steps, but it would certainly be
>>>> easier on someone who just wants to apply an algorithm.
>>>
>>> Are you referring to the nature of the FFT on real signals (i.e.
>>> data sets that have zeros for the imaginary data)? That would only
>>> help with the first pass of butterflies. Once you perform those,
>>> the imaginary part is not zero anymore.
>>
>> Yes, but the imaginary and real parts are still symmetrical around
>> f = 0, so you should neither have to compute nor use them.
>>
>>> That's why you get very little optimization by plugging in the zero
>>> data and letting the tools work it out.
>>
>> Yes, unless the optimizers get _really_ smart.
>
> We are talking two different things. The HDL synthesis optimizers are
> optimizing *logic*. They don't know diddly about what you are using
> the logic for.

Well, yes, but these computers are getting smarter all the time. When
you tell Synopsys to synthesize and it says "your fly is unzipped",
you'll know that the optimizers are too damned smart.

-- 
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

I'm looking for work -- see my website!
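[Editor's note: the fold/unfold trick rickman describes above — pack the even samples into the real part and the odd samples into the imaginary part of an N/2-point complex vector, take one complex N/2 FFT, then run one extra pass of butterflies — can be sketched in a few lines of numpy. This is an illustrative sketch, not any vendor's IP core:]

```python
import numpy as np

def real_fft_via_half_size(x):
    """FFT of a real length-N sequence (N even) via one complex N/2 FFT.

    Sketch of the packing trick discussed above: even-indexed samples
    become the real part and odd-indexed samples the imaginary part of
    an N/2-point complex vector; one extra pass of butterflies then
    recovers the full N-point spectrum.
    """
    N = len(x)
    k = np.arange(N // 2)
    z = x[0::2] + 1j * x[1::2]            # pack into N/2 complex points
    Z = np.fft.fft(z)                     # one complex FFT of size N/2
    Zc = np.conj(Z[(-k) % (N // 2)])      # conj(Z[(N/2 - k) mod N/2])
    Xe = 0.5 * (Z + Zc)                   # spectrum of the even samples
    Xo = -0.5j * (Z - Zc)                 # spectrum of the odd samples
    W = np.exp(-2j * np.pi * k / N)       # twiddle factors, final pass
    return np.concatenate([Xe + W * Xo, Xe - W * Xo])
```

The savings match the thread's estimate: one size-N/2 complex FFT plus one O(N) unfolding pass, rather than a full size-N complex FFT, i.e. roughly half the butterflies.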
Tim Wescott  <seemywebsite@myfooter.really> wrote:

>On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:
>> On 2/13/2017 9:39 PM, Tim Wescott wrote:
>>>> Are you referring to the nature of the FFT on real signals (i.e.
>>>> data sets that have zeros for the imaginary data)? That would only
>>>> help with the first pass of butterflies. Once you perform those,
>>>> the imaginary part is not zero anymore.

>>> Yes, but the imaginary and real parts are still symmetrical around
>>> f = 0, so you should neither have to compute nor use them.

>>>> That's why you get very little optimization by plugging in the zero
>>>> data and letting the tools work it out.

>>> Yes, unless the optimizers get _really_ smart.

>> We are talking two different things. The HDL synthesis optimizers
>> are optimizing *logic*. They don't know diddly about what you are
>> using the logic for.

>Well, yes, but these computers are getting smarter all the time. When
>you tell Synopsys to synthesize and it says "your fly is unzipped",
>you'll know that the optimizers are too damned smart.
If for example you compute (a * b) and also compute (a * -b), the
synthesizer is smart enough to know there are not two full multipliers
needed.

I'm very unclear that it would not optimize past the first butterfly.
They can also optimize across pipeline stages, and also modify the
number of bits of state and their logic values within a pipeline
register.

But, to know for sure, try running it and see what happens. (And you
want to use Synopsys or the equivalent, and probably not Xilinx/Altera
native synthesizers...)

If faced with this design I would certainly give it a shot and see if
the synthesis is good enough, rather than assuming it isn't and
embarking on a major bit of perhaps unneeded design work.

Steve
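[Editor's note: the symmetry the posters are debating is easy to verify numerically — for a real input the spectrum is Hermitian, X[N-k] = conj(X[k]), so the upper half of the bins carries no new information. A two-line numpy check:]

```python
import numpy as np

# For real input the FFT is Hermitian-symmetric: X[N - k] == conj(X[k]).
# This is why roughly half the output (and half the work) is redundant.
rng = np.random.default_rng(1)
x = rng.standard_normal(8)      # arbitrary real test vector
X = np.fft.fft(x)
N = len(x)
for k in range(1, N):
    assert np.isclose(X[N - k], np.conj(X[k]))
```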
Am 14.02.17 um 03:39 schrieb Tim Wescott:
> The Xilinx logic generators also seem to only support 2^n vector
> sizes. I don't know how convoluted things get, but I know that with a
> general-purpose processor, a prime-value-sized FFT is nearly as fast
> as a 2^n one. Don't ask me how -- it's just a box with magic inside
> for me at this point.
:) It is very "convoluted" :))))) Sorry for the pun. Prime-sized FFTs
(of long length*) are computed using the Bluestein algorithm. Bluestein
rewrites the FFT of any size as a convolution. This convolution is then
computed using a padded FFT of longer size...

A typical FFT library first tries to factor the length into prime
numbers, and contains optimized FFTs for small prime factors. If that
fails, it uses the Bluestein algorithm.

I'm totally ignorant of FPGAs, but somehow it is beyond my imagination
how to express such a "convoluted" thing as gates.

	Christian
* there is also the Winograd transform which works for a handful of selected prime numbers, e.g. 11 and 13, but not in the general case
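[Editor's note: the Bluestein chirp-z trick Christian describes can be sketched compactly in numpy — the prime-length DFT is rewritten as a convolution with a chirp, which is then evaluated with zero-padded power-of-two FFTs. An illustrative sketch, not a tuned library implementation:]

```python
import numpy as np

def bluestein_fft(x):
    """DFT of arbitrary (e.g. prime) length via Bluestein's algorithm.

    Uses nk = (n^2 + k^2 - (k - n)^2) / 2 to rewrite the DFT as a
    convolution with a chirp, then evaluates that convolution with
    zero-padded power-of-two FFTs.
    """
    N = len(x)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n * n / N)     # e^{-i pi n^2 / N}
    a = x * chirp
    # Kernel is the conjugate chirp, laid out for circular convolution
    # in a length M >= 2N-1 so linear convolution is not aliased.
    M = 1 << (2 * N - 1).bit_length()           # next power of two >= 2N-1
    b = np.zeros(M, dtype=complex)
    b[:N] = np.conj(chirp)
    b[M - N + 1:] = np.conj(chirp[1:][::-1])    # wrap-around: b[-m] = b[m]
    conv = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(b))[:N]
    return conv * chirp
```

Only power-of-two FFTs and elementwise multiplies appear inside, which is what makes a prime-size FFT nearly as fast as a 2^n one on a general-purpose processor.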
On 2/14/2017 1:52 AM, Steve Pope wrote:
> Tim Wescott <seemywebsite@myfooter.really> wrote:
>
>> On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:
>
>>> On 2/13/2017 9:39 PM, Tim Wescott wrote:
>
>>>>> Are you referring to the nature of the FFT on real signals (i.e.
>>>>> data sets that have zeros for the imaginary data)? That would
>>>>> only help with the first pass of butterflies. Once you perform
>>>>> those, the imaginary part is not zero anymore.
>
>>>> Yes, but the imaginary and real parts are still symmetrical around
>>>> f = 0, so you should neither have to compute nor use them.
>
>>>>> That's why you get very little optimization by plugging in the
>>>>> zero data and letting the tools work it out.
>
>>>> Yes, unless the optimizers get _really_ smart.
>
>>> We are talking two different things. The HDL synthesis optimizers
>>> are optimizing *logic*. They don't know diddly about what you are
>>> using the logic for.
>
>> Well, yes, but these computers are getting smarter all the time.
>> When you tell Synopsys to synthesize and it says "your fly is
>> unzipped", you'll know that the optimizers are too damned smart.
>
> If for example you compute (a * b) and also compute (a * -b),
> the synthesizer is smart enough to know there are not two full
> multipliers needed.
>
> I'm very unclear that it would not optimize past the first butterfly.
> They can also optimize across pipeline stages and also modify
> the number of bits of state and their logic values within a pipeline
> register.

What other optimizations do you see?

> But, to know for sure, try running it and see what happens.
> (And you want to use Synopsys or the equivalent, and probably not
> Xilinx/Altera native synthesizers...)
>
> If faced with this design I would certainly give it a shot and
> see if the synthesis is good enough, rather than assuming it isn't
> and embarking on a major bit of perhaps unneeded design work.
Ok, who will bell the cat? I'm not in the mood for designing and then
analyzing FFTs at the moment.

-- 
Rick C
On 2/14/2017 1:56 AM, Christian Gollwitzer wrote:
> Am 14.02.17 um 03:39 schrieb Tim Wescott:
>> The Xilinx logic generators also seem to only support 2^n vector
>> sizes. I don't know how convoluted things get, but I know that with
>> a general-purpose processor, a prime-value-sized FFT is nearly as
>> fast as a 2^n one. Don't ask me how -- it's just a box with magic
>> inside for me at this point.
>
> :) It is very "convoluted" :))))) Sorry for the pun. Prime-sized FFTs
> (of long length*) are computed using the Bluestein algorithm.
> Bluestein rewrites the FFT of any size as a convolution. This
> convolution is then computed using a padded FFT of longer size...
>
> A typical FFT library first tries to factor the length into prime
> numbers, and contains optimized FFTs for small prime factors. If that
> fails, it uses the Bluestein algorithm.
>
> I'm totally ignorant of FPGAs, but somehow it is beyond my imagination
> how to express such a "convoluted" thing as gates.

People think sequential code is easier because they started with that
and it is now second nature to them. I remember some of the
misconceptions students would have because they didn't get that the
computer is a sequential beast.

HDLs are written to support parallel processes as second nature because
that is the way hardware works. You describe each piece and it runs
100% of the time. Things only happen when the inputs change, but the
hardware is sitting there waiting for that to happen. In reality it is
*easier* to express most things in an HDL, I think.

But if you only think in sequential logic, then just write your HDL as
a process. The code inside a process is purely sequential.

-- 
Rick C
On Tue, 14 Feb 2017 06:52:32 +0000, Steve Pope wrote:

> Tim Wescott <seemywebsite@myfooter.really> wrote:
>
>> On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:
>
>>> On 2/13/2017 9:39 PM, Tim Wescott wrote:
>
>>>>> Are you referring to the nature of the FFT on real signals (i.e.
>>>>> data sets that have zeros for the imaginary data)? That would
>>>>> only help with the first pass of butterflies. Once you perform
>>>>> those, the imaginary part is not zero anymore.
>
>>>> Yes, but the imaginary and real parts are still symmetrical around
>>>> f = 0, so you should neither have to compute nor use them.
>
>>>>> That's why you get very little optimization by plugging in the
>>>>> zero data and letting the tools work it out.
>
>>>> Yes, unless the optimizers get _really_ smart.
>
>>> We are talking two different things. The HDL synthesis optimizers
>>> are optimizing *logic*. They don't know diddly about what you are
>>> using the logic for.
>
>> Well, yes, but these computers are getting smarter all the time.
>> When you tell Synopsys to synthesize and it says "your fly is
>> unzipped", you'll know that the optimizers are too damned smart.
>
> If for example you compute (a * b) and also compute (a * -b),
> the synthesizer is smart enough to know there are not two full
> multipliers needed.

I would be very leery of an optimizer that felt free to optimize things
so that they are no longer bit-exact -- and for some combinations of
bits, I'm pretty sure that -(a * b) is not necessarily (a * -b).

-- 
Tim Wescott
Control systems, embedded software and circuit design

I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
Tim Wescott  <tim@seemywebsite.com> wrote:

>On Tue, 14 Feb 2017 06:52:32 +0000, Steve Pope wrote:
>> If for example you compute (a * b) and also compute (a * -b),
>> the synthesizer is smart enough to know there are not two full
>> multipliers needed.

>I would be very leery of an optimizer that felt free to optimize
>things so that they are no longer bit-exact --

Of course, synthesizers need to be bit-exact and conform to the HDL
language spec.

> and for some combinations of bits, I'm pretty sure that -(a * b) is
> not necessarily (a * -b).

That is not the example I gave, but in either example you still would
not need two multipliers, just one multiplier and some small amount of
logic; any reasonable synthesizer would not use two multipliers' worth
of gates.

Steve
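[Editor's note: Tim's bit-exactness caution can be made concrete with a small Python model of two's-complement wraparound. The 8-bit width is arbitrary, chosen for illustration: the most negative value has no positive counterpart, so negating the operand before a full-width multiply need not match negating the full-width product.]

```python
def wrap8(v):
    """Interpret v modulo 2**8 as a signed 8-bit two's-complement value."""
    return (v + 128) % 256 - 128

# In two's complement, -(-128) does not fit in 8 bits and wraps back to
# -128, so a wrapping negate of the operand followed by a full-width
# multiply disagrees with negating the full-width product.
a, b = 3, -128                 # 8-bit signed operands
full = -(a * b)                # negate the full-width product: 384
narrow = a * wrap8(-b)         # -b wraps back to -128, so this is -384
assert full == 384
assert narrow == -384
assert full != narrow          # the two orderings are not bit-exact
```

Note this models only the wrapping-negate corner case; as Steve says, a synthesizer that preserves the HDL semantics can still share one multiplier between the two products and add a small amount of negation logic.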
On Monday, February 13, 2017 at 7:05:24 AM UTC+13, Tim Wescott wrote:
> So, there are algorithms out there to perform an FFT on real data,
> that save (I think) roughly 2x the calculations of FFTs for complex
> data.
>
> I did a quick search, but didn't find any that are made specifically
> for FPGAs. Was my search too quick, or are there no IP sources to do
> this?
>
> It would seem like a slam-dunk for Xilinx and Intel/Altera to include
> these algorithms in their FFT libraries.
>
> -- 
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com
>
> I'm looking for work -- see my website!

CompactRIO or myRIO has built-in FFT algorithms using the FPGA.
On Mon, 13 Feb 2017 22:36:41 -0500, rickman <gnuarm@gmail.com> wrote:

>On 2/13/2017 9:39 PM, Tim Wescott wrote:
>> On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:
>>
>>> On 2/13/2017 12:56 PM, Tim Wescott wrote:
>>>> On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:
>>>>
>>>>> On 2/13/2017 12:03 AM, Tim Wescott wrote:
>>>>>> On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:
>>>>>>
>>>>>>> On 2/12/2017 1:05 PM, Tim Wescott wrote:
>>>>>>>> So, there are algorithms out there to perform an FFT on real
>>>>>>>> data, that save (I think) roughly 2x the calculations of FFTs
>>>>>>>> for complex data.
>>>>>>>>
>>>>>>>> I did a quick search, but didn't find any that are made
>>>>>>>> specifically for FPGAs. Was my search too quick, or are there
>>>>>>>> no IP sources to do this?
>>>>>>>>
>>>>>>>> It would seem like a slam-dunk for Xilinx and Intel/Altera to
>>>>>>>> include these algorithms in their FFT libraries.
>>>>>>>
>>>>>>> I thought I replied to this, but my computer has been crashing
>>>>>>> a bit so maybe I lost that one.
>>>>>>>
>>>>>>> An FFT is inherently a complex function. Real data can be
>>>>>>> processed by taking advantage of some of the symmetries in the
>>>>>>> FFT. I don't recall the details as it has been over a decade
>>>>>>> since I worked with this, but you can fold half of the real
>>>>>>> data into the imaginary portion, run a size N/2 FFT and then
>>>>>>> unfold the results. I believe you have to run a final N sized
>>>>>>> complex pass on these results to get your final answer. I do
>>>>>>> recall it saved a *lot* when performing FFTs, nearly 50%.
>>>>>>
>>>>>> My understanding is that there were some software packages that
>>>>>> baked that into the algorithm, for what savings I don't know. I
>>>>>> was wondering if it was done for FFTs as well.
>>>>>
>>>>> I'm not sure what you mean. When you say "baking" it into the
>>>>> algorithm, it would do pretty much what I described. That *is*
>>>>> the algorithm. I haven't heard of any other optimizations. The
>>>>> savings is in time/size. Essentially it does an N/2 size FFT with
>>>>> an extra pass, so instead of N*log(N) steps it takes
>>>>> (N+1)*log(N/2) steps.
>>>>
>>>> There's a distinct symmetry to the Fourier transform that you
>>>> could use at each step of the way instead of doing the whole thing
>>>> and fixing it up at the end.
>>>>
>>>> I don't know if it would save steps, but it would certainly be
>>>> easier on someone who just wants to apply an algorithm.
>>>
>>> Are you referring to the nature of the FFT on real signals (i.e.
>>> data sets that have zeros for the imaginary data)? That would only
>>> help with the first pass of butterflies. Once you perform those,
>>> the imaginary part is not zero anymore.
>>
>> Yes, but the imaginary and real parts are still symmetrical around
>> f = 0, so you should neither have to compute nor use them.
>>
>>> That's why you get very little optimization by plugging in the zero
>>> data and letting the tools work it out.
>>
>> Yes, unless the optimizers get _really_ smart.
>
>We are talking two different things. The HDL synthesis optimizers are
>optimizing *logic*. They don't know diddly about what you are using
>the logic for.
>
>>> On the other hand, the vendor's FFT logic generator should support
>>> real data inputs. It is a common enough need.
>>
>> I agree, but when I looked I didn't see it.
>>
>> The Xilinx logic generators also seem to only support 2^n vector
>> sizes. I don't know how convoluted things get, but I know that with
>> a general-purpose processor, a prime-value-sized FFT is nearly as
>> fast as a 2^n one. Don't ask me how -- it's just a box with magic
>> inside for me at this point.
>
>When I first learned FFTs, I did it by looking at the equations in
>terms of the twiddle factors each input was multiplied by as it
>contributed to the final value. It works out to be the same
>calculation as the DFT. It is taking advantage of the cyclic nature
>of the twiddle factors to avoid redundant calculations. The same
>basis provides the various symmetries.
Another way to understand it is that the DFT is a linear-algebra
multiplication of the length-N input vector by the NxN transform
matrix. Before electronics were prevalent, astronomers would
hand-calculate DFTs (for orbit calculations, I think), and after doing
this a lot Gauss observed that many computations were repeated in a
pattern which allowed the transform matrix to be factored to save
redundant calculations. This is why Gauss is often credited with
"inventing" the FFT, which others much later rediscovered.
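[Editor's note: the matrix factoring described above can be shown in a few lines of numpy. One radix-2 Cooley-Tukey step factors the dense 4x4 DFT matrix into a permutation, two half-size DFTs, and a sparse butterfly stage; the variable names are illustrative:]

```python
import numpy as np

N = 4
n = np.arange(N)
F4 = np.exp(-2j * np.pi * np.outer(n, n) / N)   # dense 4x4 DFT matrix

# One radix-2 step: permute input to evens-then-odds, apply two
# half-size DFTs, then a sparse butterfly stage with twiddle factors.
F2 = np.exp(-2j * np.pi * np.outer(np.arange(2), np.arange(2)) / 2)
P = np.eye(N)[[0, 2, 1, 3]]                     # even-odd permutation
T = np.diag(np.exp(-2j * np.pi * np.arange(2) / N))  # twiddle factors
I2 = np.eye(2)
B = np.block([[I2, T], [I2, -T]])               # butterfly stage (sparse)
F_factored = B @ np.kron(np.eye(2), F2) @ P
assert np.allclose(F4, F_factored)              # same transform, fewer flops
```

The dense matrix costs N^2 multiplies per transform; each factor here is sparse or block-diagonal, and applying the step recursively gives the familiar N*log(N) count.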
>I can't say how prime sized data sets work out. I'm very surprised
>they even exist. I can't picture how the butterflies would work.
There are multiple ways to do non-power-of-two-sized FFTs, some more
efficient than others. Alternative-radix or even mixed-radix
transforms are sometimes used.
On Tue, 14 Feb 2017 06:52:32 +0000 (UTC), spope33@speedymail.org
(Steve Pope) wrote:

>Tim Wescott <seemywebsite@myfooter.really> wrote:
>
>>On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:
>
>>> On 2/13/2017 9:39 PM, Tim Wescott wrote:
>
>>>>> Are you referring to the nature of the FFT on real signals (i.e.
>>>>> data sets that have zeros for the imaginary data)? That would
>>>>> only help with the first pass of butterflies. Once you perform
>>>>> those, the imaginary part is not zero anymore.
>
>>>> Yes, but the imaginary and real parts are still symmetrical around
>>>> f = 0, so you should neither have to compute nor use them.
>
>>>>> That's why you get very little optimization by plugging in the
>>>>> zero data and letting the tools work it out.
>
>>>> Yes, unless the optimizers get _really_ smart.
>
>>> We are talking two different things. The HDL synthesis optimizers
>>> are optimizing *logic*. They don't know diddly about what you are
>>> using the logic for.
>
>>Well, yes, but these computers are getting smarter all the time.
>>When you tell Synopsys to synthesize and it says "your fly is
>>unzipped", you'll know that the optimizers are too damned smart.
>
>If for example you compute (a * b) and also compute (a * -b),
>the synthesizer is smart enough to know there are not two full
>multipliers needed.
>
>I'm very unclear that it would not optimize past the first butterfly.
>They can also optimize across pipeline stages and also modify
>the number of bits of state and their logic values within a pipeline
>register.
The results of synthesizer optimization on something like an FFT will
depend a LOT on the architecture used. A highly parallel architecture,
or an architecture where the FFT stages are independently pipelined,
would likely benefit more from optimization when the imaginary inputs
are static zeroes than a serial architecture would.
>But, to know for sure, try running it and see what happens.
>(And you want to use Synopsys or the equivalent, and probably not
>Xilinx/Altera native synthesizers...)
>
>If faced with this design I would certainly give it a shot and
>see if the synthesis is good enough, rather than assuming it isn't
>and embarking on a major bit of perhaps unneeded design work.
>
>Steve