
All-real FFT for FPGA

Started by Tim Wescott February 12, 2017
On Monday, February 13, 2017 at 3:11:04 PM UTC-5, robert bristow-johnson wrote:
> On Monday, February 13, 2017 at 2:21:14 PM UTC-5, eric.j...@ieee.org wrote:
> > On Sun, 12 Feb 2017 12:05:17 -0600, Tim Wescott <seemywebsite@myfooter.really> wrote:
> > >
> > > So, there are algorithms out there to perform an FFT on real data, that save (I think) roughly 2x the calculations of FFTs for complex data.
> > >
> > > I did a quick search, but didn't find any that are made specifically for FPGAs. Was my search too quick, or are there no IP sources to do this?
> > >
> > > It would seem like a slam-dunk for Xilinx and Intel/Altera to include these algorithms in their FFT libraries.
> >
> > As has been mentioned, there are a number of algorithms that exploit symmetries to prune the computations down near minimal, but I don't know of any canned FPGA libraries that do it. I don't know of any canned FPGA libraries that include a FHT (Fast Hartley Transform) either, which would also serve essentially the same purpose.
> >
> > As Steve suggested, you could just force the imaginary part to static zeroes and let the synthesizer optimizations clean it up, but it probably wouldn't be quite as good as using a well-designed RFFT if one was available.
> >
> > All that said, many of the existing FPGA FFT libraries are pretty efficient, so it may be worth just brute-forcing it and see whether it is good enough.
>
> Tim, i dunno diddley-shit about how algorithms get programmed into FPGAs. but assuming you do (or your client), i can send you the math necessary to use a regular-old complex FFT of N/2 points to perform a real FFT for N points. put the N/2 even-numbered reals into the real part and the N/2 odd-numbered reals into the imaginary part, do the size-N/2 complex FFT and sort it out on the output of the FFT. you must, of course, use the Hermitian symmetry of the FT of a real signal to detangle the results. and you will need the twiddle factors of the size-N FFT that are not needed in the size-N/2 FFT.
>
> lemme know, i'll send you something (unless you figure it out).
there appears to be an answer at http://dsp.stackexchange.com/questions/30185/fft-of-a-n-length-real-sequence-via-fft-of-a-n-2-length-complex-sequence
> L8r,
>
> r b-j
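A minimal numpy sketch of the approach r b-j describes, as I read it (function and variable names are mine, not from the thread): pack the even-indexed samples into the real part and the odd-indexed samples into the imaginary part, run one size-N/2 complex FFT, then untangle the output using the Hermitian symmetry and the size-N twiddle factors.

import numpy as np

def rfft_via_half_size_cfft(x):
    """Return bins 0..N/2 of the DFT of a real, even-length sequence x."""
    N = len(x)
    assert N % 2 == 0
    # Pack even-indexed samples into the real part, odd-indexed into the imaginary part.
    z = x[0::2] + 1j * x[1::2]
    Z = np.fft.fft(z)                                     # one complex FFT of size N/2
    Zc = np.conj(Z[(-np.arange(N // 2)) % (N // 2)])      # conj(Z[(N/2 - k) mod N/2])
    Xe = 0.5 * (Z + Zc)                                   # DFT of the even-indexed samples
    Xo = -0.5j * (Z - Zc)                                 # DFT of the odd-indexed samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)       # size-N twiddles not in the size-N/2 FFT
    X = np.empty(N // 2 + 1, dtype=complex)
    X[:N // 2] = Xe + W * Xo                              # bins 0 .. N/2-1
    X[N // 2] = Xe[0] - Xo[0]                             # Nyquist bin
    return X

x = np.random.randn(16)
print(np.allclose(rfft_via_half_size_cfft(x), np.fft.rfft(x)))   # True

The np.allclose comparison against numpy's own rfft is only a sanity check of the untangling; the remaining bins follow from the Hermitian symmetry X[N-k] = conj(X[k]).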
On 2/13/2017 2:34 AM, rickman wrote:
> On 2/13/2017 12:24 AM, Steve Pope wrote:
>> On 2/12/2017 1:05 PM, Tim Wescott wrote:
>>
>>> So, there are algorithms out there to perform an FFT on real data, that save (I think) roughly 2x the calculations of FFTs for complex data.
>>>
>>> I did a quick search, but didn't find any that are made specifically for FPGAs. Was my search too quick, or are there no IP sources to do this?
>>>
>>> It would seem like a slam-dunk for Xilinx and Intel/Altera to include these algorithms in their FFT libraries.
>>
>> Protip: the synthesizer should trim out the unneeded logic, so you don't need an optimized library macro.
>
> I don't think the synthesizer is capable of getting the same savings. The optimizations would see the zero imaginary inputs and optimize the first pass of the FFT. All passes would be N/2 butterflies while the optimized approach would use half that many at the expense of an extra pass. This is a big savings that the synthesis tools aren't likely to figure out unless they recognize you are performing an FFT.
>
> Someone refresh my memory. If you do an FFT with zeros in the imaginary part of the inputs, the output has a symmetry that can be used to process two real streams at once. I can't recall how it works exactly, but that symmetry is the basis for separating the results of the two halves of the original sequence before completing the last pass. One portion is pulled out because of the even symmetry and the other portion is pulled out because of the odd symmetry.
>
> I found this page that appears to explain it, but I haven't taken the time to dig into the math. I think I'd have to start all over again, it's just been too long.
Oops, forgot the link. http://www.engineeringproductivitytools.com/stuff/T0001/PT10.HTM

--
Rick C
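The symmetry rickman is trying to recall is the conjugate (Hermitian) symmetry of a real signal's spectrum. A small numpy illustration of the resulting "two real FFTs for the price of one complex FFT" trick, assuming that is what the linked page covers (the names are mine):

import numpy as np

N = 32
a = np.random.randn(N)               # first real stream
b = np.random.randn(N)               # second real stream

X = np.fft.fft(a + 1j * b)           # one complex FFT carries both streams
Xc = np.conj(X[(-np.arange(N)) % N]) # conj(X[(N - k) mod N])

A = 0.5 * (X + Xc)                   # conjugate-symmetric part  -> FFT of a
B = -0.5j * (X - Xc)                 # conjugate-antisymmetric part -> FFT of b

print(np.allclose(A, np.fft.fft(a)), np.allclose(B, np.fft.fft(b)))   # True True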
On 2/13/2017 12:56 PM, Tim Wescott wrote:
> On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:
>
>> On 2/13/2017 12:03 AM, Tim Wescott wrote:
>>> On Sun, 12 Feb 2017 20:32:59 -0500, rickman wrote:
>>>
>>>> On 2/12/2017 1:05 PM, Tim Wescott wrote:
>>>>> [original question snipped]
>>>>
>>>> I thought I replied to this, but my computer has been crashing a bit so maybe I lost that one.
>>>>
>>>> An FFT is inherently a complex function. Real data can be processed by taking advantage of some of the symmetries in the FFT. I don't recall the details as it has been over a decade since I worked with this, but you can fold half of the real data into the imaginary portion, run a size N/2 FFT and then unfold the results. I believe you have to run a final N sized complex pass on these results to get your final answer. I do recall it saved a *lot* when performing FFTs, nearly 50%.
>>>
>>> My understanding is that there were some software packages that baked that into the algorithm, for what savings I don't know. I was wondering if it was done for FFTs as well.
>>
>> I'm not sure what you mean. When you say "baking" it into the algorithm, it would do pretty much what I described. That *is* the algorithm. I haven't heard of any other optimizations. The savings is in time/size. Essentially it does an N/2 size FFT with an extra pass, so instead of N*log(N) steps it takes (N+1)*log(N/2) steps.
>
> There's a distinct symmetry to the Fourier transform that you could use at each step of the way instead of doing the whole thing and fixing it up at the end.
>
> I don't know if it would save steps, but it would certainly be easier on someone who just wants to apply an algorithm.
Are you referring to the nature of the FFT on real signals (i.e. data sets that have zeros for the imaginary data)? That would only help with the first pass of butterflies. Once you perform those, the imaginary part is not zero anymore. That's why you get very little optimization by plugging in the zero data and letting the tools work it out.

On the other hand, the vendor's FFT logic generator should support real data inputs. It is a common enough need.

--
Rick C
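A quick numpy check of that point, using one radix-2 decimation-in-frequency pass as the "first pass of butterflies" (names and the DIF arrangement are my assumption, not from the thread):

import numpy as np

N = 16
x = np.random.randn(N) + 0j          # real input: imaginary part is all zeros

# One DIF radix-2 pass: sums on the top half, twiddled differences on the bottom half.
top = x[:N//2] + x[N//2:]                                            # still purely real
bot = (x[:N//2] - x[N//2:]) * np.exp(-2j * np.pi * np.arange(N//2) / N)

print(np.abs(top.imag).max())        # 0.0  -- zeros survive the add/subtract
print(np.abs(bot.imag).max())        # > 0  -- the twiddle multiplies make it complex

After this first pass the data is genuinely complex, so a logic optimizer that only propagates constant zeros has nothing left to trim in the later passes.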
In article <3c112e82-e848-4e96-9171-c073b466eee3@googlegroups.com>,
robert bristow-johnson  <rbj@audioimagination.com> wrote:
>On Monday, February 13, 2017 at 3:11:04 PM UTC-5, robert bristow-johnson wrote:
>> On Monday, February 13, 2017 at 2:21:14 PM UTC-5, eric.j...@ieee.org wrote:
>> > [earlier quoted discussion snipped]
>>
>> Tim, i dunno diddley-shit about how algorithms get programmed into FPGAs. but assuming you do (or your client), i can send you the math necessary to use a regular-old complex FFT of N/2 points to perform a real FFT for N points. put the N/2 even-numbered reals into the real part and the N/2 odd-numbered reals into the imaginary part, do the size-N/2 complex FFT and sort it out on the output of the FFT. you must, of course, use the Hermitian symmetry of the FT of a real signal to detangle the results. and you will need the twiddle factors of the size-N FFT that are not needed in the size-N/2 FFT.
>>
>> lemme know, i'll send you something (unless you figure it out).
Richard Lyons' book "Understanding Digital Signal Processing" has a great writeup on this technique. I used his book as a reference to design one on an ASIC over a decade ago. We had an optimized complex<=>complex FFT design but only needed a real input. If I recall, the only real work (for the FFT direction) was at the input side. On the output side things were just shuffled. If your algorithm could tolerate the final output being shuffled in memory, then you were done.

Regards,

Mark
On Monday, February 13, 2017 at 4:49:25 PM UTC-5, Mark Curry wrote:
>
> Richard Lyons' book "Understanding Digital Signal Processing" has a great writeup on this technique. I used his book as a reference to design one on an ASIC over a decade ago. We had an optimized complex<=>complex FFT design but only needed a real input. If I recall, the only real work (for the FFT direction) was at the input side. On the output side things were just shuffled. If your algorithm could tolerate the final output being shuffled in memory, then you were done.
in the stack exchange answer

http://dsp.stackexchange.com/questions/30185/fft-of-a-n-length-real-sequence-via-fft-of-a-n-2-length-complex-sequence

which is the way i was thinking of doing it, there isn't much work done on the input side (the even-numbered real samples go into the real part and the odd-numbered real samples go into the imaginary part), but the major work is done untangling the output into N/2 complex pairs.

r b-j
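For reference, the untangling step works out as follows (notation mine, consistent with my reading of that answer: z[n] = x[2n] + j*x[2n+1] and Z is the size-N/2 FFT of z, X the size-N FFT of x):

$$X_e[k] = \tfrac{1}{2}\bigl(Z[k] + Z^*[(N/2 - k) \bmod N/2]\bigr), \qquad X_o[k] = \tfrac{1}{2j}\bigl(Z[k] - Z^*[(N/2 - k) \bmod N/2]\bigr),$$

$$X[k] = X_e[k] + e^{-j 2\pi k / N}\, X_o[k], \quad k = 0, \dots, N/2 - 1, \qquad X[N/2] = X_e[0] - X_o[0],$$

with the remaining bins given by the Hermitian symmetry $X[N-k] = X^*[k]$. The $e^{-j 2\pi k/N}$ factors are exactly the size-N twiddles that the size-N/2 FFT does not already contain.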
On Mon, 13 Feb 2017 21:47:34 -0000 (UTC), gtwrek@sonic.net (Mark
Curry) wrote:

>In article <3c112e82-e848-4e96-9171-c073b466eee3@googlegroups.com>,
>robert bristow-johnson <rbj@audioimagination.com> wrote:
>>On Monday, February 13, 2017 at 3:11:04 PM UTC-5, robert bristow-johnson wrote:
>>> [earlier quoted discussion snipped]
>
>Richard Lyons' book "Understanding Digital Signal Processing" has a great writeup on this technique. I used his book as a reference to design one on an ASIC over a decade ago. We had an optimized complex<=>complex FFT design but only needed a real input. If I recall, the only real work (for the FFT direction) was at the input side. On the output side things were just shuffled. If your algorithm could tolerate the final output being shuffled in memory, then you were done.
>
>Regards,
>
>Mark
Three excellent texts for FFT-related stuff like this, i.e., FFT tricks, configurations, optimizations and such, are:

1. Rick's book, as mentioned: Understanding Digital Signal Processing, Richard G. Lyons, Addison Wesley
2. Discrete-Time Signal Processing, Oppenheim and Schafer, Prentice-Hall
3. The Fast Fourier Transform, E. Oran Brigham, Prentice-Hall

In 2, the chapter on Computation of the Discrete Fourier Transform is where the good stuff is. I think 3, Brigham's book, should be in the library of anybody who does much with FFTs beyond ordinary applications or wants a deeper understanding of how the FFT works and how it can be effectively applied. Rick's book should be in the library of anybody who does much DSP in general.
On 2/13/2017 6:14 PM, robert bristow-johnson wrote:
> On Monday, February 13, 2017 at 4:49:25 PM UTC-5, Mark Curry wrote:
>>
>> [...] If I recall, the only real work (for the FFT direction) was at the input side. On the output side things were just shuffled. [...]
>
> in the stack exchange answer
>
> http://dsp.stackexchange.com/questions/30185/fft-of-a-n-length-real-sequence-via-fft-of-a-n-2-length-complex-sequence
>
> which is the way i was thinking of doing it, there isn't much work done on the input side (the even-numbered real samples go into the real part and the odd-numbered real samples go into the imaginary part), but the major work is done untangling the output into N/2 complex pairs.
There are symmetries in the FFT, a lot of them. I am sure you can either fold the data into the inputs and do messy stuff with the twiddle factors on the output... or you can do messy stuff with the twiddle factors on the input and do a simple unfolding on the output. One way or the other the algorithm consists of a size N/2 FFT and a row of further butterflies of size N data. Pay me now or pay me later.

--
Rick C
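A back-of-the-envelope count of that trade-off, taking a radix-2 butterfly as the unit of work (the exact numbers depend on the implementation; this is only illustrative, and the names are mine):

import math

N = 1024
full_cfft   = (N // 2) * int(math.log2(N))                 # N/2 butterflies per pass, log2(N) passes
half_plus_1 = (N // 4) * int(math.log2(N // 2)) + N // 2   # size-N/2 FFT plus one recombination row

print(full_cfft, half_plus_1)    # 5120 vs 2816 -- roughly the 2x savings Tim mentioned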
On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:

> On 2/13/2017 12:56 PM, Tim Wescott wrote:
>> On Mon, 13 Feb 2017 01:48:49 -0500, rickman wrote:
>>> [earlier quoted discussion snipped]
>>
>> There's a distinct symmetry to the Fourier transform that you could use at each step of the way instead of doing the whole thing and fixing it up at the end.
>>
>> I don't know if it would save steps, but it would certainly be easier on someone who just wants to apply an algorithm.
>
> Are you referring to the nature of the FFT on real signals (i.e. data sets that have zeros for the imaginary data)? That would only help with the first pass of butterflies. Once you perform those the imaginary part is not zero anymore.
Yes, but the imaginary and real parts are still symmetrical around f = 0, so you should neither have to compute nor use them.
> That's why you get very little optimization by plugging in the zero data and letting the tools work it out.
Yes, unless the optimizers get _really_ smart.
> On the other hand, the vendor's FFT logic generator should support real data inputs. It is a common enough need.
I agree, but when I looked I didn't see it.

The Xilinx logic generators also seem to only support 2^n vector sizes. I don't know how convoluted things get, but I know that with a general-purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n one. Don't ask me how -- it's just a box with magic inside for me at this point.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
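The "magic" for non-power-of-two sizes is usually Rader's or Bluestein's algorithm. A minimal numpy sketch of Bluestein's chirp-z method, which turns a DFT of any length, prime included, into a power-of-two-length convolution (function name and structure are mine, not from any vendor library):

import numpy as np

def bluestein_dft(x):
    """DFT of arbitrary length via a power-of-two circular convolution."""
    N = len(x)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n * n / N)           # W_N^(n^2/2)
    M = 1 << int(np.ceil(np.log2(2 * N - 1)))         # convolution length, power of two
    a = np.zeros(M, dtype=complex)
    a[:N] = x * chirp                                  # pre-multiplied input
    b = np.zeros(M, dtype=complex)
    b[:N] = np.conj(chirp)                             # chirp filter, positive lags
    b[M - N + 1:] = np.conj(chirp[1:][::-1])           # wrapped copy for negative lags
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))  # fast circular convolution
    return chirp * conv[:N]                            # post-multiply to finish

x = np.random.randn(17)                                # prime length
print(np.allclose(bluestein_dft(x), np.fft.fft(x)))    # True

So the prime-size transform costs a few power-of-two FFTs plus some pointwise multiplies, which is why it ends up "nearly as fast" on a general-purpose processor.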
On Mon, 13 Feb 2017 12:11:02 -0800, robert bristow-johnson wrote:

> On Monday, February 13, 2017 at 2:21:14 PM UTC-5, eric.j...@ieee.org wrote:
>> [earlier quoted discussion snipped]
>
> Tim, i dunno diddley-shit about how algorithms get programmed into FPGAs. but assuming you do (or your client), i can send you the math necessary to use a regular-old complex FFT of N/2 points to perform a real FFT for N points. [...]
>
> lemme know, i'll send you something (unless you figure it out).
I'm preparing a talk, and was trying to find out if I need to include it. I think I'll just mention it in passing and not worry about it.

--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
On 2/13/2017 9:39 PM, Tim Wescott wrote:
> On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:
>
>> On 2/13/2017 12:56 PM, Tim Wescott wrote:
>>> [earlier quoted discussion snipped]
>>
>> Are you referring to the nature of the FFT on real signals (i.e. data sets that have zeros for the imaginary data)? That would only help with the first pass of butterflies. Once you perform those the imaginary part is not zero anymore.
>
> Yes, but the imaginary and real parts are still symmetrical around f = 0, so you should neither have to compute nor use them.
>
>> That's why you get very little optimization by plugging in the zero data and letting the tools work it out.
>
> Yes, unless the optimizers get _really_ smart.
We are talking about two different things. The HDL synthesis optimizers are optimizing *logic*. They don't know diddly about what you are using the logic for.
>> On the other hand, the vendor's FFT logic generator should support real data inputs. It is a common enough need.
>
> I agree, but when I looked I didn't see it.
>
> The Xilinx logic generators also seem to only support 2^n vector sizes. I don't know how convoluted things get, but I know that with a general-purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n one. Don't ask me how -- it's just a box with magic inside for me at this point.
When I first learned FFTs, I did it by looking at the equations in terms of the twiddle factors each input was multiplied by as it contributed to the final value. It works out to be the same calculation as the DFT. It is taking advantage of the cyclic nature of the twiddle factors to avoid redundant calculations. The same basis provides the various symmetries.

I can't say how prime sized data sets work out. I'm very surprised they even exist. I can't picture how the butterflies would work.

--
Rick C
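A small numpy illustration of that view: the DFT matrix is built entirely from the cyclic twiddle factors W_N^(nk), and the FFT computes exactly the same result while reusing the work those repeated factors make redundant (names are mine):

import numpy as np

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix of twiddle factors
x = np.random.randn(N)

print(np.allclose(W @ x, np.fft.fft(x)))       # True: same result, far fewer operations in the FFT

# The redundancy the FFT exploits: W_N^(nk) depends only on n*k mod N,
# so the N*N matrix contains only N distinct twiddle values.
print(len(np.unique(np.outer(n, n) % N)))      # 8 distinct exponents for N = 8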