DSPRelated.com
Forums

need recommendation(s) for FFT algorithm to use

Started by ggk September 4, 2010
Hi Experts,

My application requires taking FFT and IFFT on a REAL array of points,
where the array size is fairly large (number of points in array ranges
between 2^20 and 2^27, which is roughly 1M to 134M). Once in the frequency domain, I
zero out all bins except for those bins associated with spurs, then IFFT
the array back. 
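A minimal sketch of the pipeline described above (real-input FFT, zero all non-spur bins, inverse FFT), using numpy as a stand-in for whatever library ends up being chosen. The sample rate, tone frequency, and one-bin "spur detector" are invented for illustration; a real detector would keep every bin above some noise threshold:

```python
import numpy as np

# Hypothetical sketch: real FFT -> keep only "spur" bins -> IFFT back.
fs = 1024.0                 # assumed sample rate (Hz)
n = 4096                    # small stand-in for the 2^20..2^27 sizes
t = np.arange(n) / fs
rng = np.random.default_rng(0)
# One deterministic tone buried in random noise:
x = 0.5 * np.sin(2 * np.pi * 100.0 * t) + 0.01 * rng.standard_normal(n)

spectrum = np.fft.rfft(x)                       # n/2 + 1 bins for real input
spur_bins = [int(np.argmax(np.abs(spectrum)))]  # crude: strongest bin only

mask = np.zeros_like(spectrum)                  # zero everything...
mask[spur_bins] = 1.0                           # ...except the spur bins
spurs_only = np.fft.irfft(spectrum * mask, n=n)  # spur-only time waveform
```

With `rfft`/`irfft` the conjugate-symmetric half is handled implicitly, so only the positive-frequency spur bin needs to be kept.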

I'm looking for a recommendation for an algorithm suited for this purpose
(I'm not that fluent in understanding which flavor of FFT is better than
another here). I currently use Matlab (based on FFTW), but I've been
looking at what's available on the internet and there's quite a variety
(realFFT, kissFFT, FFTW, etc.). Don't know why one might be better than
another for this application, nor which one(s) have better reputations for
accuracy and speed, etc.

I will eventually place whatever FFT solution is used on a server and
access the FFT solution with the back-end of my custom application
software. The front-end of this application software is a GUI observed by a
client computer over the internet. While the end user isn't aware of what
FFT algorithm is applied (only the back-end of my software application
interacts with the FFT algorithm), I'm worried that if I use a GPL licensed
FFT solution, that I would need to "own" the server that the FFT algorithm
resides on in order to consider it internal use. 

That is, as in most situations I guess, I would outsource the web hosting
of my application to a third party (like Amazon cloud computing) and
therefore the third party would "own" the server, and so I worry that if I
install FFTW or other GPL licensed code on this server that it might be
considered "distribution" because I'm giving it to the web hosting company
(even though I'm their customer and not the other way around), and then I
worry my proprietary software is no longer proprietary... Could anyone
comment on whether this is a real concern or not? If it is, then a non-GPL
solution would be required unless the commercial license is not too
expensive (e.g. <$2K USD). 

thanks in advance, ggk


ggk <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
 
> My application requires taking FFT and IFFT on a REAL array of points,
> where the array size is fairly large [...]. Once in the frequency domain,
> I zero out all bins except for those bins associated with spurs, then
> IFFT the array back.
I believe in that case it is faster to use one that leaves the elements in
bit-reversed order, and then IFFT them from that order. It saves a lot of
moving around, though it will (slightly) complicate your bin zeroing.

-- glen
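For the bookkeeping glen describes, the only extra work is knowing where a natural-order bin lands in bit-reversed output. numpy's FFT always returns natural order, so this sketch only illustrates the index mapping that a library with a "leave output bit-reversed" option (some DIT/DIF kernels offer one) would imply:

```python
# Natural-order bin k of a 2**nbits-point transform sits at bitrev(k)
# in bit-reversed output, so that is where a spur bin is kept or zeroed.
def bitrev(k: int, nbits: int) -> int:
    """Reverse the low `nbits` bits of index k."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

# Where natural bins 0..7 of an 8-point transform land:
order = [bitrev(k, 3) for k in range(8)]
```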

ggk wrote:

> My application requires taking FFT and IFFT on a REAL array of points,
> where the array size is fairly large [...]. Once in the frequency domain,
> I zero out all bins except for those bins associated with spurs, then
> IFFT the array back.
What is the purpose of this?
> I'm looking for a recommendation for an algorithm suited for this purpose
> [...]. Don't know why one might be better than another for this
> application, nor which one(s) have better reputations for accuracy and
> speed, etc.
http://www.fftw.org/benchfft/

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
> What is the purpose of this?
The purpose of zeroing bins and taking IFFT is to observe the time-domain contribution due to the spurs only (and not the random noise). For the processing that I'm doing, this is the key piece of information.
> Could anyone comment on whether this is a real concern or not? If it is,
> then a non-GPL solution would be required unless the commercial license
> is not too expensive (e.g. <$2K USD).
OK, I've learned that GPL licensing shouldn't be a problem for this
application. So without any licensing restrictions, my remaining question
is: is one flavor of FFT better able to tackle (from a speed and accuracy
point of view) a large real FFT (and IFFT) than another flavor of FFT
algorithm?
On Sep 4, 9:43 am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> ...
>
> The purpose of zeroing bins and taking IFFT is to observe the time-domain
> contribution due to the spurs only (and not the random noise). For the
> processing that I'm doing, this is the key piece of information.
Abrupt transitions between weightings of 0 and 1 in regions of significant
response in the frequency domain produce significant artifacts in the time
domain. One of the reasons to ask about the purpose of your processing
scheme is to understand why you think that your application would be
immune to the artifacts. So, what is the application?

Dale B. Dalrymple
> Abrupt transitions between weightings of 0 and 1 in regions of
> significant response in the frequency domain produce significant
> artifacts in the time domain. [...] So, what is the application?
Sure Dale, let me describe the application more... I have a time-domain
waveform containing deterministic and random sources of noise. This
waveform is derived from a clock output of an electronic device using a
high-speed oscilloscope (40GS/s). Sources of deterministic noise might be
a switch-mode power supply or crosstalk from neighboring signal lines that
couple onto the signal output of interest. Random noise may include
thermal noise from the oscilloscope and the device, for example.

When I take an FFT of this waveform, I'll see spurs due to the
deterministic noise popping up out of a background of random noise. For
example, if a switch-mode power supply switches at 100KHz, when I observe
a 100MHz clock signal and look at the timing noise, I'd see a 100KHz tone
in the frequency domain. By zeroing out the non-spur frequency bins and
taking the IFFT (of only the spur bins), the resulting time-domain
waveform will be due only to the deterministic sources of noise. If the
100KHz tone was the only spur observed, the IFFT would result in a 100KHz
sine wave whose amplitude is a measure of the strength of the noise
introduced by the switch-mode power supply. If multiple spurs are observed
in the frequency domain, the IFFT time-domain waveform would be a more
complex pattern as these sine waves combine with each other.

The end goal is to obtain a histogram (or probability density function) of
the y-axis (amplitude) values of this (IFFT) time-domain waveform. This
histogram will be used for further processing, but it's critical that it
only include deterministic sources of noise (i.e. spurs from the frequency
domain, not random background frequency bins).

Let me know if this helps or if you need more details somewhere. Could you
describe what type of artifact might get introduced in such a scheme?

Best regards, ggk
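A small sketch of the end goal described above: the histogram (empirical PDF) of the amplitude values of the spur-only IFFT waveform. All names and parameters here are invented, and a single tone stands in for the IFFT output; a lone sine of amplitude A has the classic "bathtub" PDF 1/(pi*sqrt(A^2 - x^2)), piling up near +/-A:

```python
import numpy as np

# Stand-in for the spur-only IFFT output: one tone of amplitude a.
fs, n, a = 1024.0, 4096, 0.5
t = np.arange(n) / fs
spur_wave = a * np.sin(2 * np.pi * 100.0 * t)

# Empirical PDF of the waveform's amplitude values.
# density=True normalizes so the histogram integrates to 1.
counts, edges = np.histogram(spur_wave, bins=64, range=(-a, a), density=True)
```

With several spurs present, the individual sine PDFs convolve into a more complex shape, which is presumably what the further processing operates on.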
On 9/4/2010 9:46 AM, all4dsp wrote:
> is one flavor of FFT better able to tackle (from speed and accuracy
> point of view) large real FFT (and IFFT) than another flavor of FFT
> algorithm?
I can't answer your direct question but seem to recall that FFTW is pretty
good, and you can check the link that Vlad provided.

You might be wanting to ask yourself another question though (and perhaps
you have): "Is zeroing samples in the frequency domain likely to do what
you expect?"

This is an implementation of filtering by frequency-domain multiplication
(multiplying by zeros and ones). If you conjure up such a filter and IFFT
just the filter, then you'll see that the temporal unit sample response
may (or may not) be exactly what you hoped for.

The issue is time-domain aliasing or "folding". It would be more
"acceptable" or "safer" practice to smooth the transitions in frequency in
going from passbands to stopbands so that the temporal output doesn't
"wrap/fold/overlap". Whether this matters is application specific. It does
have some impact on the compute load, but not much.

More technically, for a data sequence of length N: if you arbitrarily
apply 1's and 0's in the frequency domain you could conceptually do this:

- Form the sequence of 1's and 0's. And, just so things are at least
  reasonably correct:
  - the 1st sample (at f=0) is assumed to repeat at fs.
  - the 2nd sample = Nth sample
  - the 3rd sample = (N-1)th sample
  ...

At least then the sequence is symmetric about f=0 and the IFFT will be
real. Without any other consideration, this array will be of length N
also, right?

- IFFT just this sequence. The result will be the unit sample response of
  the filter and will have length N.

So, now we have a data sequence of length N and a filter of length N. If
you were to convolve them in time, the result would have length 2N-1 and,
by the way, the entire output sequence would be either the transient of
filling the filter or the transient of emptying the filter, and there'd be
no steady-state output at all!

But, since the plan is to multiply in frequency, you'll be doing a
*circular* convolution.
If you do a circular convolution with only N points, the output will be
totally overlapped upon itself. The proper way to do the circular
convolution is this:

Assume the data length of N as above. Assume the filter length is M and,
I'll suggest, M should be a fair bit shorter than N. I would have said
M=0.1*N, but let's use M=0.33*N. That way, 1/3 of the output will be the
initial transient, 1/3 of the output will be steady state, and 1/3 of the
output will be the trailing transient.

A straight convolution of data and filter here would result in M+N-1
samples in the sequence. So must the circular convolution. For simplicity,
let's just make them both of length M+N. Thus, we will append M zeros to
the end of the data sequence, and we will append N zeros to the end of the
filter sequence. Now both sequences are of length M+N, which just happens
to be 1.33*N. And the filter sequence will have 3/4 of its samples = 0 and
will (or could) have only 1 in 4 nonzero samples in frequency. This tells
you something about the frequency transition widths that are possible with
this filter.

Summary: if you are going to do the original process, then the number of
samples probably ought to be doubled with zero-valued samples before
computing the FFT. Ditto the conceptual filter that you will construct.
But perhaps it makes more sense to keep the filter shorter in time, as
above...

P.S. The number of samples you have suggests huge overkill for doing most
things. It implies that there must be spikes very close to fs/2. Is that
right? The resolution implied goes way beyond what's "normal". If so, then
there are likely any number of tricks to be played to reduce the compute
load.

Fred
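Fred's circular-vs-linear convolution point can be checked numerically. This sketch (arbitrary small N and M, random sequences) shows that the unpadded spectrum product wraps the trailing transient onto the first M-1 output samples, while padding both sequences to N+M-1 reproduces the linear convolution:

```python
import numpy as np

# Spectrum product = *circular* convolution; padding to >= N+M-1 makes it
# agree with linear convolution. N and M here are arbitrary small values.
rng = np.random.default_rng(1)
n, m = 64, 16
x = rng.standard_normal(n)            # data sequence
h = rng.standard_normal(m)            # filter unit sample response

# No padding: product of length-n spectra; the tail of the linear
# convolution wraps onto the first m-1 output samples.
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n)).real

# Padded to n+m-1: the spectrum product now equals linear convolution.
size = n + m - 1
lin = np.fft.ifft(np.fft.fft(x, size) * np.fft.fft(h, size)).real

full = np.convolve(x, h)              # reference linear convolution
```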
> P.S. The number of samples you have suggests huge overkill for doing most
> things. It implies that there must be spikes very close to fs/2. Is that
> right? The resolution implied goes way beyond what's "normal". If so,
> then there are likely any number of tricks to be played to reduce the
> compute load.
Thanks Fred, you raise several good points for me to think about.

There will be spikes very close to, or even at, Fs/2 due to the physics
involved. It turns out that the real signal often has spikes at or near
Fs/2 for various reasons due to real noise phenomena. However, there will
be no aliasing in the original signal (from the test environment in
acquiring the data).

The reason the array is so large is that the acquisition window is
inversely proportional to the lowest frequency that is measured. That is,
if the acquisition window is 1ms of time, I can view down to 1KHz in
frequency. Since my application requires looking at 1KHz or even lower
frequencies, I need to work with large arrays. Any filter (e.g. M points <
N points) that looks at a subset of this data would be removing these low
frequencies from the time-domain view, which sort of defeats the purpose
of acquiring a large array to look at this data to begin with.
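The record-size arithmetic here is just resolution = 1/T, with T = n/fs the acquisition window; a quick sanity check of the numbers quoted in the thread:

```python
# Resolving 1KHz at a 40GS/s sample rate fixes the record length at
# fs / resolution samples, i.e. a 1 ms acquisition window.
fs = 40e9                     # oscilloscope sample rate (S/s)
resolution = 1e3              # lowest frequency of interest (Hz)
n_needed = fs / resolution    # samples required
window = n_needed / fs        # acquisition window (s)
```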
On Sep 4, 11:52 am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> ...
> Let me know if this helps or if you need more details somewhere. Could
> you describe what type of artifact might get introduced in such a scheme?
> Best regards, ggk
The type of scheme you present has been around for a long time in the
acoustic world. Speech enhancement has been one application. Even there,
with great attention to thresholding schemes, there is a history of
audible "singing" artifacts introduced by the process.

The size of the transforms you propose has some consequences. Computer
memory architectures have long been multi-tiered. It used to be RAM and
moving magnetic media; now it is cache and bulk RAM. There are
"out-of-core" and "out-of-cache" (depending on their age) algorithms that
may make more difference in execution time than optimizing multiplies and
adds. Your FFTs are too big to fit in common cache sizes. Large transform
sizes also lead to large word widths.

The physical world has few systems stable enough to support your 1KHz
resolution at a 40GS/s Fs, let alone instrument sampling clocks stable
enough to sample to that resolution at the high end of the frequency
range. The dynamic range of such systems tends to drop with increasing
sample frequency. The power-supply-generated interference you mention can
be modulated by load changes. Some power supplies are designed to spread
their spectra to escape narrowband EMI specs. Will that help or hurt your
application?

Let us know how closely your measured data matches your expectations.
Good luck!

Dale B. Dalrymple