I have invented a new FFT algorithm I'm calling the Lifted Fourier Transform (LFT) ("lifted" because the guts of the method operate in a higher dimension than is necessary; c.f. https://mathworld.wolfram.com/Lift.html). The algorithm has many interesting properties and I'm wondering where it might have the greatest impact in research and/or industry.
Here are some of the most intriguing features:
- Memory requirements are proportional to output size. That is, if only K output bins are desired (for signal length N > K), then the algorithm only needs O(K) bytes of RAM. For fun, I computed a pruned LFT of length N=2^30 on my laptop; in this case, FFTW runs out of memory. For more practical N, I imagine an FPGA or ASIC that could benefit from lower memory usage, but y'all are certainly more knowledgeable than me about that stuff.
- It is a streaming algorithm. Samples need not be buffered before given to the LFT. Therefore, the LFT could consume samples from a sensor, and the entire sensor/LFT system would only require O(K) memory for any transform length N >> K.
- It is fast; O(N log N). It is not as fast as FFTW for small N or large outputs K ~ N, but can be faster for K << N where N is large. As K decreases, performance approaches O(N).
- There are no twiddle factors! Just 48 floating point numbers will give any power-of-2 transform. The LFT therefore does not require a planning phase.
- The calculation is stable. I've tested against FFTW, and the output agrees to machine precision, even for N=2^28.
- All intermediate stages are short-time LFTs. The algorithm combines adjacent frames to get the next stage. This means the stages are useful by themselves; one can design adaptive algorithms with custom logic in the stages. For example, output pruning based on spectral power.
- Zero-padded transforms are trivial to compute and implicit. There is no need to actually allocate zeros in memory. (There must be FFT implementations that have this property; so maybe not a big win)
- Related to 7., input pruning is easy and speeds up the algorithm. For example, consider Fourier interpolation. The LFT is given a list of (n_i, s_i) pairs where n_i is a sample index and s_i the sample value. K low-frequency bins are kept to smooth the sparse signal. If the sparse input is size L, and we interpolate to N new samples, the computation scales like the larger of O(L), O(K), O(N).
Thanks everybody. I look forward to hearing all your application ideas given the wealth of knowledge in these forums.
A common problem in audio spectrum analyzers is that due to the log frequency scale you need a very fine resolution for the lower frequencies. Therefore you have to compute a large FFT, but really only need the first few bins. For higher frequencies you could use a much shorter FFT, so I could imagine using a short FFT for the overall picture and a heavily pruned long FFT to add precision in low frequencies.
This is in fact how the LFT came to be. I was using it to generate STFTs of many different power-of-2 window lengths (from property 6.) to find the best sparse approximations of audio signals. Basically, the LFT is a good way to generate a large dictionary for problems in https://en.wikipedia.org/wiki/Sparse_approximation.
I am interested in the user-interaction aspect of audio spectrum analyzers. The ability to zoom in on (increase frequency resolution of) arbitrary frequency bands without redoing the FFT seems nice.
I'm interested in using LFT for radioastronomy under noisy conditions. Instead of running a complete FFT and throwing away +50% of the spectrum due to noise, only calculate the known clean segments around TV and/or FM stations.
50% is still a large fraction of the output. The LFT becomes helpful when you want to keep < 10% of the output (smaller the better). I think in your case computing the FFT and throwing away the noise is the way to go.
You may be interested in traditional FFT pruning, which can speed things up, but still consumes the memory of a full FFT: http://www.fftw.org/pruned.html
One application area could be LTE (4G/5G radio radio stations) usually on FPGA platforms.
The required bins are as follows:
uplink/Downlink: 1200 or so from 2k DFT
random access preamble: 839 from 24K DFT
NBIoT: 48 bins from 8k DFT
However I tried the 48 bins case using direct accumulators and mixing with 24 mixers, no memory was needed but ended up flooded with registers and so gave up due mostly to extensive number of channels.
Preamble and NBIoT are interesting. Does "flooded with registers" mean your FPGA wasn't big enough to hold all the accumulators/mixers?
I ran a test on 48 bins from 8k DFT. LFT was not faster than FFTW, but FFTW uses > 8x the memory. Also, FFTW is highly optimized for CPU.
I should see what an FPGA implementation of the LFT looks like. Any tips on getting started?
For NBIoT we had 16 channels each needed 48 bins to be extracted out of 8K DFT (if we use LTE stream directly).
option1: We can use 8k full FFT to be shared by all 16 channels (as system clock is fast enough) but the main problem then is memory as we have to store inputs of 8k per channel of 32 bit width each for IQ data in addition to FFT requirements and output buffering
option2: one set of 24 or 48 mixers (for 48 bins), mix and accumulate for 8K samples. This is allowed full bit growth to keep quality (32 bits at input => 34 bits after mixers => 47 bits after full accumulation). The result is 48 bins X 47 bits X 16 channels = 36096 registers for mixers and accumulators only. Plus some memory for mixers plus further pipe registers for speed. Our FPGA contains well over 1200k registers but we are dealing with massive project.
option3: decimate down from lte speed and use 0.5K fft shared by all 16 channels serially. At the cost of design time, filters and minor quality issue. Such fft and even smaller is enough for NBIoT tone spacing.
option3 is the most realistic in such a case. If we are dealing with few bins over few channels then 8k direct DFT(option2) is no problem.
Interesting, thanks for sharing. I am curious, is there some reason I don't see larger FFTs used in RF communication?
For example, it seems like encoding data in a bin of a N=2^20 length DFT would be more robust against noise than N=1k. I know latency/compute/memory all increase with N, but it seems many applications would care more about noise than these barriers. The LFT could be a huge help here.
Yeah! New fast algorithm! And no twiddle factors! Great! I can't imagine doing a DFT without some sines and cosines, I'm very interested in that. Is it an integral transform? Good work and congrats heycosmo!
Thanks! sin and cos are often implemented on a computer with polynomial approximations (to machine precision). The LFT is closely related to this approach.
It definitely sounds interesting. If all of your claims are correct it could be very interesting in some applications.
It sounds a little like a sparse (pruned) sliding-DFT, but some of what you say may be an improvement to that as well.
I've been looking into predictive maintenance, where accelerometers listen for e.g. bearing wear on various spinning machinery. It seems like a good application: long DFTs to detect weak sinusoids in noise; looking within a few narrow frequency bands (harmonics of the operating frequency); and low memory requirements mean computation could be done on sensor.
Sliding DFT is certainly good for a window moving a few samples at a time. LFT would be more useful for larger hops.
This sounds really cool. Do you have any code that we could play with?
I am keeping my implementation private while I look for commercialization opportunities, but I hope to open-source it later whether I'm successful or not. I will let everyone know when I do.
Vaguely sounds like the Lifted wavelet transforms, with the wavelet transform being seen as cascaded short ffts or FIR filters.
Still sounds interesting. Definately be interested in anything that's published..
LFT is also known as an acronym for Logarithmic Fourier Transformation
Not to worry; a friend suggested LiFT, an objectively better acronym for this ;)