
How to fft huge data arrays?

Started by Andreas Besting September 26, 2005
Hi!

For my little project I need to FFT large data arrays that do not fit 
into memory. So far I know about overlap-add, and for some problems that 
works fine, but I wondered if it's possible to transform the whole data 
at once.
I tried to code a special memory management for the FFT, but due to bit 
reversed addressing I get so many page faults that it's really not 
efficient...  I also tried to reconstruct the large FFT from several 
windowed transforms, but it got me very strange results (is this even 
possible??? I'm really not an expert in DSP). So, any help would be nice!

Andreas
How large?

Try the author of "mammut", Øyvind Hammer:

http://www.notam02.no/notam02/prod-prg-mammuthelp.html

It does a single huge FFT on a soundfile. The link to the software itself seems 
to be broken or just not available.

One way or another, if your data is on disk rather than in memory you will be 
swapping pages (one assumes that the OS memory mapping support will be 
reasonably efficient!), so there must by definition be a limit on just how 
efficient it can be.
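
As a rough illustration (not Mammut's code - just a sketch assuming numpy, 
with a placeholder file name and raw float32 samples), letting the OS's 
memory mapping do the paging looks like this:

import numpy as np

# Sketch only: lean on the OS pager via a memory-mapped file.
# "signal.raw" and the float32 layout are placeholder assumptions.
x = np.memmap("signal.raw", dtype=np.float32, mode="r")

chunk = 1 << 20                    # touch ~1M samples (4 MB) at a time
energy = 0.0
for start in range(0, len(x), chunk):
    block = x[start:start + chunk]          # only these pages get faulted in
    energy += float(np.dot(block, block))   # e.g. accumulate signal energy

print("energy:", energy)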



Richard Dobson

Hi!

Thanks for the link!

Richard Dobson wrote:
> How large?
That depends - sometimes up to 100MB per signal, but often in my 
application I have opened many signals, and then it gets critical. But 
you're right, I have to swap pages. The main problem is that I can't do 
it during the transform, or after the transform when I'm restoring the 
original bit ordering: I would have to read single values almost randomly 
from the file, and that's too slow. The best way for me would be to use 
my current FFT implementation on small segments and put it all together 
after that using demand paging, but I don't know how this works. I first 
tried to take the real and imaginary samples from each transformed 
segment and append them one after another in a large complex array, but 
when I took the inverse transform it didn't look like the original 
signal :-(
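
For what it's worth, the reason plain concatenation fails can be seen in a 
tiny numpy sketch (N=8, purely illustrative): the small spectra have to be 
combined with twiddle factors, as in the textbook Cooley-Tukey split, not 
just appended.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Concatenating the FFTs of the two halves is NOT the FFT of the whole:
wrong = np.concatenate([np.fft.fft(x[:4]), np.fft.fft(x[4:])])
print(np.allclose(wrong, np.fft.fft(x)))        # False

# Combining small FFTs correctly needs the Cooley-Tukey split with
# twiddle factors (decimation in time: even/odd samples, not halves):
even, odd = np.fft.fft(x[0::2]), np.fft.fft(x[1::2])
tw = np.exp(-2j * np.pi * np.arange(4) / 8) * odd
right = np.concatenate([even + tw, even - tw])
print(np.allclose(right, np.fft.fft(x)))        # True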
Andreas Besting wrote:
> Hi!
>
> Thanks for the link!
>
> Richard Dobson wrote:
>> How large?
>
> That depends - sometimes up to 100MB per signal, but often in my
> application I have opened many signals, and then it gets critical.
What is the application? Why do you have to work with 100 MB segments?
If you are implementing a filter, could you make do with smaller
segments, say 5-10 MB, and use an overlap-add/overlap-save technique?

In my experience, the problems with handling sheer data volumes can
become overwhelming for even the most trivial operations, when the data
volumes become comparable with memory size. I find that it surprisingly
often pays off to use a smallish buffer and loop through the data file a
number of times. The data become easier to handle, and the performance
stays reasonable.

Rune
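
A minimal sketch of that kind of loop, assuming numpy (block size, names
and the power-of-two padding are only illustrative):

import numpy as np

def overlap_add_filter(x, h, block=4096):
    """FFT filtering with a small buffer: overlap-add, one block at a time."""
    nfft = 1
    while nfft < block + len(h) - 1:   # room for the linear convolution
        nfft *= 2
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        out = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        y[start:start + len(seg) + len(h) - 1] += out[:len(seg) + len(h) - 1]
    return y

# usage sketch: y = overlap_add_filter(x, h) for any 1-D signal x and FIR h;
# in practice each block of x would be read from disk rather than held in RAM.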
Andreas Besting wrote:
> For my little project I need to FFT large data arrays that do not fit
> into memory.
I think the way they did very long FFTs in the old days was to use a
multipass algorithm, writing out the array to tape in a different sort
order on each near-linear pass, so as not to thrash memory or any cache.
Might have required more than two 9-track tape drives, and/or more than
one pass per address bit in the reversed address. Did these algorithms
have a name?

IMHO. YMMV.
--
rhn A.T nicholson d.O.t C-o-M
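
They do have a name: this is essentially Bailey's "four-step" (or
"six-step") FFT for external/hierarchical memory, which replaces
bit-reversed access with small FFTs, twiddle multiplies and transposes -
all near-linear passes. An in-memory numpy sketch of the structure (the
disk I/O is omitted here):

import numpy as np

def four_step_fft(x, N1, N2):
    """Four-step FFT of length N = N1*N2 using only length-N1 and
    length-N2 FFTs plus twiddles and transposes (out-of-core friendly)."""
    assert len(x) == N1 * N2
    A = x.reshape(N2, N1).T                # A[n1, n2] = x[n1 + N1*n2]
    B = np.fft.fft(A, axis=1)              # step 1: N1 FFTs of length N2
    n1 = np.arange(N1).reshape(-1, 1)
    k2 = np.arange(N2).reshape(1, -1)
    C = B * np.exp(-2j * np.pi * n1 * k2 / (N1 * N2))   # step 2: twiddles
    D = np.fft.fft(C, axis=0)              # step 3: N2 FFTs of length N1
    return D.reshape(-1)                   # step 4: X[k2 + N2*k1] = D[k1, k2]

x = np.random.default_rng(1).standard_normal(32)
print(np.allclose(four_step_fft(x, 4, 8), np.fft.fft(x)))   # True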
Andreas Besting wrote:
> For my little project I need to FFT large data arrays that do not fit
> into memory.
I don't know for sure, but assume your "little project" involves
applying some sort of filter to a sound file.

The Fast Fourier Transform, FFT, is a fast way to implement the
Discrete Fourier Transform, DFT. The DFT can naively be implemented in
a way that requires O(N^2) FLOPs for transforming an N-point sequence,
while the FFT requires only O(N*log(N)) FLOPs for doing the same job.
So the FFT uses a fraction log(N)/N of the time of the naive algorithm.
For N>1 we see that it is a very good idea to use the FFT to compute
the DFT, when disregarding various forms of program run-time overhead.

Now, the operation of filtering a data sequence can be expressed in
terms of the DFTs of the input signal and filter transfer function, so
a filtering operation can be implemented in terms of FFTs. The question
is whether that is a smart thing to do. Implementing the filter in
terms of DFTs requires computing the FFT of the input signal, one or
two FFTs to find the filter transfer function, multiplying the spectra
of the input signal and the transfer function, followed by an IFFT to
find the output signal. All of this takes place in complex arithmetic,
so it turns out that quite a number of FLOPs are involved.

The net result is that if the number of coefficients in your filter is
comparable to log(N), N being the number of data points, you might want
to consider the straightforward time-domain implementation of your
filter. The run-time penalty will be small compared to the FFT/IFFT
algorithm (for very small filters the time-domain method may even be
faster), and it will be way simpler to implement. Depending on the
compiler you use, you might even be able to exploit certain fancy
MMX instructions in your CPU that can make a very fast implementation
of the algorithm.

Now, you need very precise knowledge of the number of FLOPs of an FFT
algorithm to evaluate where the limit is for switching to time domain,
but a very naive argument says that time domain is the place to work
with 100000000 samples and a filter with 25 or fewer coefficients. So
at first glance, this type of argument only works for small IIR filters
or very small FIR filters. However, the O(N*log(N)) notation does away
with numerical constants, as in O(N*log(N)) -> "the FFT takes
28*N*log(30*N) FLOPs", so depending on the numerical constants that go
into the exact expression, there may be room for somewhat larger
filters in time domain. But you need to consult somebody who knows the
FLOP counts in the FFT algorithms to find out the exact details.

Rune
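
As a back-of-envelope version of that argument (the constants below are
made-up placeholders, exactly the kind of numbers one would have to pin
down), together with what a plain time-domain FIR looks like:

import numpy as np

def fir_direct(x, h):
    """Plain time-domain FIR filter: y[n] = sum over m of h[m]*x[n-m]."""
    y = np.zeros(len(x) + len(h) - 1)
    for m, hm in enumerate(h):
        y[m:m + len(x)] += hm * x      # one multiply-add pass per tap
    return y

# Rough cost comparison for N samples and M taps (constants are guesses):
N, M = 100_000_000, 25
time_domain = 2.0 * N * M                      # ~2 FLOPs per tap per sample
fft_based = 3 * 5.0 * N * np.log2(N) + 6 * N   # ~3 big FFTs + spectral product
print(f"time domain {time_domain:.1e} vs FFT-based {fft_based:.1e} FLOPs")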
Rune Allnor wrote:

> I don't know for sure, but assume your "little project" involves
> applying some sort of filter to a sound file.
Currently I'm working with biomedical time series, and I wrote a small viewer/editor to process the data with some filters and other functions. It's also a project for me to learn a bit more about DSP.
> The net result is that if the number of coefficients in your filter is
> comparable to log(N), N being the number of data points, you might want
> to consider the straightforward time-domain implementation of your
> filter.
>
> The run-time penalty will be small compared to the FFT/IFFT algorithm
> (for very small filters the time-domain method may even be faster), and
> it will be way simpler to implement. Depending on the compiler you use,
> you might even be able to exploit certain fancy MMX instructions in
> your CPU that can make a very fast implementation of the algorithm.
That's interesting, I can't even imagine how filtering in the time
domain works. When I started on the project, I read some tutorials
about the FFT and after some time (months...) I managed to implement
it. I wrote a simple rectangular filter that worked like that: FFT,
zero frequency bins, IFFT. Then I realized that for large signals my
heap space was too small and the program crashed. I read about
overlap-add and coded a better filter. That worked fine, but due to the
overlapping the performance was much worse than before.

(BTW, I still have a question about filters: when I convolve the signal
with the filter, I understand that the resulting signal is longer than
the original one (+ filter size). Is it OK when I cut the beginning and
the end to get a filtered signal of equal length? I did it and the
result looks quite good.)

After that I implemented cross-correlation using the FFT and had the
same problem that my heap space was too small. But now my "filter" was
another large signal and I couldn't use overlap-add (or can I?). So I
thought, if I had an FFT that worked also on very large signals, many
other problems (such as cross-correlation) would be much easier to
realize. I tried to write an FFT using on-demand paging, but it was
terribly slow because the bit-reversed addressing caused too many page
faults. My last attempt was to code a kind of "buffered" FFT that could
somehow merge many small spectra into a large one, but as I said, this
didn't work at all.

I just found a paper that looks very interesting, it's called
"Performing Out-of-Core FFTs on Parallel Disk Systems",
http://historical.ncstrl.org/tr/pdf/icase/TR-96-70.pdf
I haven't read it but it seems to address my problem.

So far, thank you for your help!
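
On the cross-correlation question: the FFT trick itself still applies to
two signals of any length, as long as you zero-pad to at least the
combined length so the circular wrap-around does not fold into the
result. A small numpy sketch (in-memory; making it out-of-core is a
separate problem):

import numpy as np

def xcorr_fft(a, b):
    """Linear cross-correlation r[k] = sum over n of a[n+k]*b[n] via FFTs."""
    n = len(a) + len(b) - 1
    nfft = 1
    while nfft < n:                    # pad to kill circular wrap-around
        nfft *= 2
    spec = np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft))
    c = np.fft.irfft(spec, nfft)
    # reorder so index 0 is lag -(len(b)-1) and index n-1 is lag len(a)-1
    return np.roll(c, len(b) - 1)[:n]

a, b = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0])
print(np.allclose(xcorr_fft(a, b), np.correlate(a, b, mode="full")))  # True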
Another approach is to use a series of zoom FFTs on sub-bands of data,
building up the big FFT in chunks. This method takes N linear passes
through the data, where N is the number of sub-bands. Recently I chose
this technique to do a 524288-point real FFT on a '54XX DSP chip using
a 4096-point complex FFT routine and a decimation factor of 128
(4096*128 == 524288).

John
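
A sketch of one such pass, transcribed into numpy rather than DSP
assembly (the windowed-sinc decimating filter and the scaling are
illustrative; accuracy, especially near the band edges, depends entirely
on the real filter design):

import numpy as np

def zoom_fft_band(x, band, D, taps=257):
    """One linear pass of a zoom FFT: approximates bins band*K ... band*K+K-1
    of the length-N spectrum, where K = N // D (ideal-filter assumption)."""
    N = len(x)
    K = N // D
    n = np.arange(N)
    # 1) mix the centre of the wanted sub-band down to 0 Hz
    f0 = (band * K + K // 2) / N
    mixed = x * np.exp(-2j * np.pi * f0 * n)
    # 2) low-pass (windowed sinc, cutoff ~1/(2D)) and decimate by D;
    #    the filter quality limits accuracy, especially at the band edges
    m = np.arange(taps) - (taps - 1) / 2
    h = (np.sinc(m / D) / D) * np.hamming(taps)
    low = np.convolve(mixed, h, mode="same")[::D]
    # 3) small K-point FFT; fftshift restores ascending bin order, and the
    #    factor D roughly compensates the decimation (filter-gain dependent)
    return D * np.fft.fftshift(np.fft.fft(low[:K]))

# the big spectrum is then built chunk by chunk, one sub-band per pass:
# X = np.concatenate([zoom_fft_band(x, b, D) for b in range(D)])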

Andreas Besting wrote:

> That's interesting, I can't even imagine how filtering in the time
> domain works.
Then you definitely should read this book (free download!): http://www.dspguide.com/
> (BTW, I still have a question about filters: when I convolve the signal
> with the filter, I understand that the resulting signal is longer than
> the original one (+ filter size). Is it OK when I cut the beginning and
> the end to get a filtered signal of equal length? I did it and the
> result looks quite good.)
I don't think so - but it's all in the book.

bye
Andreas
--
Andreas Hünnebeck | email: ah@despammed.com
----- privat ---- | www  : http://www.huennebeck-online.de
Fax/Anrufbeantworter: 0721/151-284301
GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
PGP-Key: http://www.huennebeck-online.de/public_keys/pgp_andreas.asc
john wrote:
> Another approach is to use a series of zoom FFTs on sub-bands of data,
> building up the big FFT in chunks. This method takes N linear passes
> through the data, where N is the number of sub-bands.
Yes!! That's exactly what I need! So it does work! But how do I create
the large spectrum from the chunks?

Andreas