DSPRelated.com
Forums

need recommendation(s) for FFT algorithm to use

Started by ggk September 4, 2010
Hi Experts,

My application requires taking FFT and IFFT on a REAL array of points,
where the array size is fairly large (number of points in array ranges
between 2^20 and 2^27, which is roughly 1M to 134M). Once in the frequency domain, I
zero out all bins except for those bins associated with spurs, then IFFT
the array back. 
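A minimal sketch of the pipeline described above (real-input FFT, zero all non-spur bins, inverse FFT), using numpy as a stand-in for whatever library ends up being chosen. The sample rate, tone frequency, and one-bin "spur detector" are invented for illustration; a real detector would keep every bin above some noise threshold:

```python
import numpy as np

# Hypothetical sketch: real FFT -> keep only "spur" bins -> IFFT back.
fs = 1024.0                 # assumed sample rate (Hz)
n = 4096                    # small stand-in for the 2^20..2^27 sizes
t = np.arange(n) / fs
rng = np.random.default_rng(0)
# One deterministic tone buried in random noise:
x = 0.5 * np.sin(2 * np.pi * 100.0 * t) + 0.01 * rng.standard_normal(n)

spectrum = np.fft.rfft(x)                       # n/2 + 1 bins for real input
spur_bins = [int(np.argmax(np.abs(spectrum)))]  # crude: strongest bin only

mask = np.zeros_like(spectrum)                  # zero everything...
mask[spur_bins] = 1.0                           # ...except the spur bins
spurs_only = np.fft.irfft(spectrum * mask, n=n)  # spur-only time waveform
```

With `rfft`/`irfft` the conjugate-symmetric half is handled implicitly, so only the positive-frequency spur bin needs to be kept.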

I'm looking for a recommendation for an algorithm suited for this purpose
(I'm not that fluent in understanding which flavor of FFT is better than
another here). I currently use Matlab (based on FFTW), but I've been
looking at what's available on the internet and there's quite a variety
(realFFT, kissFFT, FFTW, etc.). Don't know why one might be better than
another for this application, nor which one(s) have better reputations for
accuracy and speed, etc.

I will eventually place whatever FFT solution is used on a server and
access the FFT solution with the back-end of my custom application
software. The front-end of this application software is a GUI observed by a
client computer over the internet. While the end user isn't aware of what
FFT algorithm is applied (only the back-end of my software application
interacts with the FFT algorithm), I'm worried that if I use a GPL licensed
FFT solution, that I would need to "own" the server that the FFT algorithm
resides on in order to consider it internal use. 

That is, as in most situations I guess, I would outsource the web hosting
of my application to a third party (like Amazon cloud computing) and
therefore the third party would "own" the server, and so I worry that if I
install FFTW or other GPL licensed code on this server that it might be
considered "distribution" because I'm giving it to the web hosting company
(even though I'm their customer and not the other way around), and then I
worry my proprietary software is no longer proprietary... Could anyone
comment on whether this is a real concern or not? If it is, then a non-GPL
solution would be required unless the commercial license is not too
expensive (e.g. <$2K USD). 

thanks in advance, ggk


ggk <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
 
> My application requires taking FFT and IFFT on a REAL array of points,
> where the array size is fairly large [...]. Once in the frequency domain,
> I zero out all bins except for those bins associated with spurs, then
> IFFT the array back.
I believe in that case it is faster to use one that leaves the elements in
bit-reversed order, and then IFFT them from that order. It saves a lot of
moving around, though it will (slightly) complicate your bin zeroing.

-- glen
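For the bookkeeping glen describes, the only extra work is knowing where a natural-order bin lands in bit-reversed output. numpy's FFT always returns natural order, so this sketch only illustrates the index mapping that a library with a "leave output bit-reversed" option (some DIT/DIF kernels offer one) would imply:

```python
# Natural-order bin k of a 2**nbits-point transform sits at bitrev(k)
# in bit-reversed output, so that is where a spur bin is kept or zeroed.
def bitrev(k: int, nbits: int) -> int:
    """Reverse the low `nbits` bits of index k."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

# Where natural bins 0..7 of an 8-point transform land:
order = [bitrev(k, 3) for k in range(8)]
```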

ggk wrote:

> My application requires taking FFT and IFFT on a REAL array of points,
> where the array size is fairly large [...]. Once in the frequency domain,
> I zero out all bins except for those bins associated with spurs, then
> IFFT the array back.
What is the purpose of this?
> I'm looking for a recommendation for an algorithm suited for this purpose
> [...]. Don't know why one might be better than another for this
> application, nor which one(s) have better reputations for accuracy and
> speed, etc.
http://www.fftw.org/benchfft/

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
> What is the purpose of this?
The purpose of zeroing bins and taking IFFT is to observe the time-domain contribution due to the spurs only (and not the random noise). For the processing that I'm doing, this is the key piece of information.
> Could anyone comment on whether this is a real concern or not? If it is,
> then a non-GPL solution would be required unless the commercial license
> is not too expensive (e.g. <$2K USD).
OK, I've learned that GPL licensing shouldn't be a problem for this
application. So without any licensing restrictions, my remaining question
is: is one flavor of FFT better able to tackle (from a speed and accuracy
point of view) a large real FFT (and IFFT) than another flavor of FFT
algorithm?
On Sep 4, 9:43 am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> ...
>
> The purpose of zeroing bins and taking IFFT is to observe the time-domain
> contribution due to the spurs only (and not the random noise). For the
> processing that I'm doing, this is the key piece of information.
Abrupt transitions between weightings of 0 and 1 in regions of significant
response in the frequency domain produce significant artifacts in the time
domain. One of the reasons to ask about the purpose of your processing
scheme is to understand why you think that your application would be
immune to the artifacts. So, what is the application?

Dale B. Dalrymple
> Abrupt transitions between weightings of 0 and 1 in regions of
> significant response in the frequency domain produce significant
> artifacts in the time domain. [...] So, what is the application?
Sure Dale, let me describe the application more... I have a time-domain
waveform containing deterministic and random sources of noise. This
waveform is derived from a clock output of an electronic device using a
high-speed oscilloscope (40GS/s). Sources of deterministic noise might be
a switch-mode power supply or crosstalk from neighboring signal lines that
couple onto the signal output of interest. Random noise may include
thermal noise from the oscilloscope and the device, for example.

When I take an FFT of this waveform, I'll see spurs due to the
deterministic noise popping up out of a background of random noise. For
example, if a switch-mode power supply switches at 100KHz, when I observe
a 100MHz clock signal and look at the timing noise, I'd see a 100KHz tone
in the frequency domain. By zeroing out the non-spur frequency bins and
taking the IFFT (of only the spur bins), the resulting time-domain
waveform will be due only to the deterministic sources of noise. If the
100KHz tone was the only spur observed, the IFFT would result in a 100KHz
sine wave whose amplitude is a measure of the strength of the noise
introduced by the switch-mode power supply. If multiple spurs are observed
in the frequency domain, the IFFT time-domain waveform would be a more
complex pattern as these sine waves combine with each other.

The end goal is to obtain a histogram (or probability density function) of
the y-axis (amplitude) values of this (IFFT) time-domain waveform. This
histogram will be used for further processing, but it's critical that it
only include deterministic sources of noise (i.e. spurs from the frequency
domain, not random background frequency bins).

Let me know if this helps or if you need more details somewhere. Could you
describe what type of artifact might get introduced in such a scheme?

Best regards, ggk
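A small sketch of the end goal described above: the histogram (empirical PDF) of the amplitude values of the spur-only IFFT waveform. All names and parameters here are invented, and a single tone stands in for the IFFT output; a lone sine of amplitude A has the classic "bathtub" PDF 1/(pi*sqrt(A^2 - x^2)), piling up near +/-A:

```python
import numpy as np

# Stand-in for the spur-only IFFT output: one tone of amplitude a.
fs, n, a = 1024.0, 4096, 0.5
t = np.arange(n) / fs
spur_wave = a * np.sin(2 * np.pi * 100.0 * t)

# Empirical PDF of the waveform's amplitude values.
# density=True normalizes so the histogram integrates to 1.
counts, edges = np.histogram(spur_wave, bins=64, range=(-a, a), density=True)
```

With several spurs present, the individual sine PDFs convolve into a more complex shape, which is presumably what the further processing operates on.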
On 9/4/2010 9:46 AM, all4dsp wrote:
> is one flavor of FFT better able to tackle (from speed and accuracy
> point of view) large real FFT (and IFFT) than another flavor of FFT
> algorithm?
I can't answer your direct question but seem to recall that FFTW is pretty
good, and you can check the link that Vlad provided.

You might be wanting to ask yourself another question though (and perhaps
you have): "Is zeroing samples in the frequency domain likely to do what
you expect?"

This is an implementation of filtering by frequency-domain multiplication
(multiplying by zeros and ones). If you conjure up such a filter and IFFT
just the filter, then you'll see that the temporal unit sample response
may (or may not) be exactly what you hoped for.

The issue is time-domain aliasing or "folding". It would be more
"acceptable" or "safer" practice to smooth the transitions in frequency in
going from passbands to stopbands so that the temporal output doesn't
"wrap/fold/overlap". Whether this matters is application specific. It does
have some impact on the compute load, but not much.

More technically, for a data sequence of length N: if you arbitrarily
apply 1's and 0's in the frequency domain you could conceptually do this:

- Form the sequence of 1's and 0's. And, just so things are at least
  reasonably correct:
  - the 1st sample (at f=0) is assumed to repeat at fs.
  - the 2nd sample = Nth sample
  - the 3rd sample = (N-1)th sample
  ...

At least then the sequence is symmetric about f=0 and the IFFT will be
real. Without any other consideration, this array will be of length N
also, right?

- IFFT just this sequence. The result will be the unit sample response of
  the filter and will have length N.

So, now we have a data sequence of length N and a filter of length N. If
you were to convolve them in time, the result would have length 2N-1 and,
by the way, the entire output sequence would be either the transient of
filling the filter or the transient of emptying the filter, and there'd be
no steady-state output at all!

But, since the plan is to multiply in frequency, you'll be doing a
*circular* convolution.
If you do a circular convolution with only N points, the output will be
totally overlapped upon itself. The proper way to do the circular
convolution is this:

Assume the data length of N as above. Assume the filter length is M and,
I'll suggest, M should be a fair bit shorter than N. I would have said
M=0.1*N, but let's use M=0.33*N. That way, 1/3 of the output will be the
initial transient, 1/3 of the output will be steady state, and 1/3 of the
output will be the trailing transient.

A straight convolution of data and filter here would result in M+N-1
samples in the sequence. So must the circular convolution. For simplicity,
let's just make them both of length M+N. Thus, we will append M zeros to
the end of the data sequence, and we will append N zeros to the end of the
filter sequence. Now both sequences are of length M+N, which just happens
to be 1.33*N. And the filter sequence will have 3/4 of its samples = 0 and
will (or could) have only 1 in 4 nonzero samples in frequency. This tells
you something about the frequency transition widths that are possible with
this filter.

Summary: if you are going to do the original process, then the number of
samples probably ought to be doubled with zero-valued samples before
computing the FFT. Ditto the conceptual filter that you will construct.
But perhaps it makes more sense to keep the filter shorter in time, as
above...

P.S. The number of samples you have suggests huge overkill for doing most
things. It implies that there must be spikes very close to fs/2. Is that
right? The resolution implied goes way beyond what's "normal". If so, then
there are likely any number of tricks to be played to reduce the compute
load.

Fred
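Fred's circular-vs-linear convolution point can be checked numerically. This sketch (arbitrary small N and M, random sequences) shows that the unpadded spectrum product wraps the trailing transient onto the first M-1 output samples, while padding both sequences to N+M-1 reproduces the linear convolution:

```python
import numpy as np

# Spectrum product = *circular* convolution; padding to >= N+M-1 makes it
# agree with linear convolution. N and M here are arbitrary small values.
rng = np.random.default_rng(1)
n, m = 64, 16
x = rng.standard_normal(n)            # data sequence
h = rng.standard_normal(m)            # filter unit sample response

# No padding: product of length-n spectra; the tail of the linear
# convolution wraps onto the first m-1 output samples.
circ = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n)).real

# Padded to n+m-1: the spectrum product now equals linear convolution.
size = n + m - 1
lin = np.fft.ifft(np.fft.fft(x, size) * np.fft.fft(h, size)).real

full = np.convolve(x, h)              # reference linear convolution
```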
> P.S. The number of samples you have suggests huge overkill for doing most
> things. It implies that there must be spikes very close to fs/2. Is that
> right? The resolution implied goes way beyond what's "normal". If so,
> then there are likely any number of tricks to be played to reduce the
> compute load.
Thanks Fred, you raise several good points for me to think about.

There will be spikes very close to, or even at, Fs/2 due to the physics
involved. It turns out that the real signal often has spikes at or near
Fs/2 for various reasons due to real noise phenomena. However, there will
be no aliasing in the original signal (from the test environment in
acquiring the data).

The reason the array is so large is that the acquisition window is
inversely proportional to the lowest frequency that is measured. That is,
if the acquisition window is 1ms of time, I can view down to 1KHz in
frequency. Since my application requires looking at 1KHz or even lower
frequencies, I need to work with large arrays. Any filter (e.g. M points <
N points) that looks at a subset of this data would be removing these low
frequencies from the time-domain view, which sort of defeats the purpose
of acquiring a large array to look at this data to begin with.
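The record-size arithmetic here is just resolution = 1/T, with T = n/fs the acquisition window; a quick sanity check of the numbers quoted in the thread:

```python
# Resolving 1KHz at a 40GS/s sample rate fixes the record length at
# fs / resolution samples, i.e. a 1 ms acquisition window.
fs = 40e9                     # oscilloscope sample rate (S/s)
resolution = 1e3              # lowest frequency of interest (Hz)
n_needed = fs / resolution    # samples required
window = n_needed / fs        # acquisition window (s)
```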
On Sep 4, 11:52 am, "all4dsp" <all4dsp@n_o_s_p_a_m.comcast.net> wrote:
> ...
> Let me know if this helps or if you need more details somewhere. Could
> you describe what type of artifact might get introduced in such a scheme?
> Best regards, ggk
The type of scheme you present has been around for a long time in the
acoustic world. Speech enhancement has been one application. Even there,
with great attention to thresholding schemes, there is a history of
audible "singing" artifacts introduced by the process.

The size of the transforms you propose has some consequences. Computer
memory architectures have long been multi-tiered. It used to be RAM and
moving magnetic media; now it is cache and bulk RAM. There are
"out-of-core" and "out-of-cache" (depending on their age) algorithms that
may make more difference in execution time than optimizing multiplies and
adds. Your FFTs are too big to fit in common cache sizes. Large transform
sizes also lead to large word widths.

The physical world has few systems stable enough to support your 1KHz
resolution at a 40GS/s Fs, let alone instrument sampling clocks stable
enough to sample to that resolution at the high end of the frequency
range. The dynamic range of such systems tends to drop with increasing
sample frequency. The power-supply-generated interference you mention can
be modulated by load changes. Some power supplies are designed to spread
their spectra to escape narrowband EMI specs. Will that help or hurt your
application?

Let us know how closely your measured data matches your expectations.
Good luck!

Dale B. Dalrymple