DSPRelated.com
Forums

How to fft huge data arrays?

Started by Andreas Besting September 26, 2005
Andreas Huennebeck wrote:
> Andreas Besting wrote:
>> That's interesting, i can't even imagine how filtering in the time
>> domain works.
>
> Then you definitely should read this book (free download!):
> http://www.dspguide.com/
I know it, it's really great!
>> (BTW, i still have a question about filters: When i convolve the signal
>> with the filter, i understand that the result signal's size is longer
>> than the original one (+ filter size). Is it ok when i cut the beginning
>> and the end to get a filtered signal with equal length? i did it and the
>> result looks quite good.)
> I don't think so - but it's all in the book.
Hm, in Ch. 18 (page 313 or 315) it looks like the filtered signal is always centered in the middle. I experimented a little bit, and the larger the filter (or the convolution size), the more the signals came out somehow "squeezed and centered". So I thought cutting half the filter size from the beginning and the end would be ok... I plotted both signals and they seem to match. But I still wonder if this is really correct. Sometimes I need a filtered signal with exactly the same length as the original one.
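The trimming described above can be checked numerically. A minimal sketch (the signal and filter here are made up for illustration): the full convolution has length N+M-1, and cutting (M-1)/2 samples off each end recovers a length-N result that is exactly NumPy's `mode="same"`:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)          # example signal, length N = 200
h = np.ones(11) / 11.0                  # example 11-tap moving-average filter

full = np.convolve(x, h)                # full convolution, length N + M - 1
# Trim half the filter length from each end to recover length N.
# Each remaining output sample is then aligned with the input sample
# under the middle tap of the filter.
start = (len(h) - 1) // 2
same = full[start : start + len(x)]

# NumPy's built-in 'same' mode does exactly this trimming:
assert np.allclose(same, np.convolve(x, h, mode="same"))
```

So yes, for an odd-length FIR filter the "cut half the filter size from each end" recipe is the standard way to get an output with the same length as the input.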
Andreas Besting wrote:
> john wrote:
>> Another approach is to use a series of zoom FFTs on sub-bands of data,
>> building up the big FFT in chunks. This method takes N linear passes
>> through the data, where N is the number of sub-bands. Recently I chose
>> this technique to do a 524288-point real FFT on a '54XX DSP chip using
>> a 4096-point complex FFT routine and a decimation factor of 128
>> (4096*128 == 524288).
>>
>> John
>
> Yes!! That's exactly what I need! So it does work! But how do I create
> the large spectrum from the chunks?
>
> Andreas
You're just using the zoom FFT to zoom in on one chunk of the spectrum at a time. The chunks are then placed side by side to form the final high-resolution spectrum. If you are only interested in pieces of the spectrum, this can be fairly time-efficient. It is not a particularly time-efficient way to compute the entire spectrum; however, it does not require much RAM, and it doesn't jump all over the place in the stored data (it could even run on a tape drive).

John
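This is not John's '54XX implementation (which mixes, low-pass filters and decimates), but a small exact sketch of the same idea, assuming made-up sizes: each sub-band of an N-point DFT can be computed with one modulation, one fold, and one small FFT, and the sub-bands tile the full spectrum. Each sub-band costs one linear pass through the data, matching the "N linear passes" above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4096, 8            # total FFT size, number of sub-bands
M = N // D                # small FFT size per sub-band

x = rng.standard_normal(N)
X = np.empty(N, dtype=complex)

n = np.arange(N)
for r in range(D):
    # Shift sub-band r down to DC, then fold the signal to length M.
    # The M-point FFT of the folded signal gives exactly the big-FFT
    # bins k = r, r+D, r+2D, ... (one linear pass per sub-band).
    y = x * np.exp(-2j * np.pi * r * n / N)
    z = y.reshape(D, M).sum(axis=0)       # z[p] = sum_q y[p + q*M]
    X[r::D] = np.fft.fft(z)

# Interleaving the sub-bands reproduces the full spectrum:
assert np.allclose(X, np.fft.fft(x))
```

The folding step is the exact (alias-free) counterpart of low-pass filtering and decimating by D; in a practical zoom FFT the fold is replaced by a decimating filter.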
Andreas Besting wrote:
> For my little project I need to FFT large data arrays that do not fit
> into memory. So far I know about overlap-add, and for some problems that
> works fine, but I wondered if it's possible to transform the whole data
> set at once.
Take a page from the BETA receiver used in the SETI project: decompose the huge FFT into smaller chunks that can be handled within your memory. Appendix C of this document goes into the details:
http://seti.harvard.edu/grad/dpdf/thesislt.pdf

-- Dave Tweed
Rune Allnor wrote:
> Andreas Besting wrote:
>> Hi!
>>
>> For my little project I need to FFT large data arrays that do not fit
>> into memory. So far I know about overlap-add, and for some problems that
>> works fine, but I wondered if it's possible to transform the whole data
>> set at once. I tried to code a special memory management for the FFT,
>> but due to bit-reversed addressing I get so many page faults that it's
>> really not efficient... I also tried to reconstruct the large FFT from
>> several windowed transforms, but it got me very strange results (is this
>> even possible??? I'm really not an expert in DSP). So, any help would
>> be nice!
>
> I don't know for sure, but I assume your "little project" involves
> applying some sort of filter to a sound file.
>
> The Fast Fourier Transform, FFT, is a fast way to implement the
> Discrete Fourier Transform, DFT. The DFT can naively be implemented
> in a way that requires O(N^2) FLOPs for transforming an N-point
> sequence, while the FFT requires only O(N*log(N)) FLOPs for
> doing the same job.
>
> So the FFT uses a fraction log(N)/N of the time of the naive
> algorithm. For N>1 we see that it is a very good idea to use
> the FFT to compute the DFT, when disregarding various forms
> of program run-time overhead.
>
> Now, the operation of filtering a data sequence can be expressed
> in terms of the DFTs of the input signal and filter transfer function,
> so a filtering operation can be implemented in terms of FFTs.
>
> The question is whether that is a smart thing to do.
>
> Implementing the filter in terms of DFTs requires computing the
> FFT of the input signal, one or two FFTs to find the filter
> transfer function, multiplying the spectra of the input signal
> and the transfer function, followed by an IFFT to find the output
> signal. All of this takes place in complex arithmetic, so it
> turns out there are quite a number of FLOPs involved.
> The net result is that if the number of coefficients in your
> filter is comparable to log(N), N being the number of data
> points, you might want to consider the straightforward
> time-domain implementation of your filter.
>
> The run-time penalty will be small compared to the FFT/IFFT
> algorithm (for very small filters the time-domain method may
> even be faster), and it will be way simpler to implement.
> Depending on the compiler you use, you might even be able
> to exploit certain fancy MMX instructions in your CPU that
> can make for a very fast implementation of the algorithm.
>
> Now, you need very precise knowledge of the number of FLOPs
> of an FFT algorithm to evaluate where the limit is for
> switching to the time domain, but a very naive argument says that
> the time domain is the place to work with 100000000 samples and a
> filter with 25 or fewer coefficients.
>
> So at first glance, this type of argument only works for small
> IIR filters or very small FIR filters. However, the O(NlogN)
> notation does away with numerical constants, as in
>
> O(NlogN) -> "The FFT takes 28*N*log(30*N) FLOPs"
>
> so depending on the numerical constants that go into the
> exact expression, there may be room for somewhat larger
> filters in the time domain. But you need to consult somebody
> who knows the FLOP counts in the FFT algorithms to find out
> the exact details.
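Rune's crossover argument can be made concrete with a back-of-envelope calculation. The constants below are assumptions for illustration only (a common rule of thumb of ~5*N*log2(N) FLOPs per complex FFT), not measured counts for any real library:

```python
import math

def fft_filter_flops(n):
    # forward FFT + inverse FFT (assumed ~5*n*log2(n) FLOPs each)
    # plus a complex multiply of the two spectra (~6 FLOPs per bin)
    return 2 * 5 * n * math.log2(n) + 6 * n

def time_domain_flops(n, taps):
    # direct FIR convolution: one multiply and one add per tap per sample
    return 2.0 * n * taps

n = 100_000_000
# Tap count at which the two approaches cost the same under these
# assumptions; below it the time domain wins, above it the FFT wins.
crossover = fft_filter_flops(n) / (2.0 * n)
```

With these assumed constants the crossover lands at roughly 135 taps for 10^8 samples, which is consistent with Rune's "25 or fewer coefficients" being comfortably in time-domain territory, with room for somewhat larger filters depending on the real constants.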
There's another point in favor of working in the time domain: it's easier to avoid the kind of paging inefficiencies that are plaguing Andreas. But there are uses for finding the spectrum aside from filtering. We don't know the application here.

Jerry
--
Engineering is the art of making what you want from things you can get.
Andreas Besting wrote:
> Rune Allnor wrote:
>> I don't know for sure, but I assume your "little project" involves
>> applying some sort of filter to a sound file.
>
> Currently I'm working with biomedical time series, and I wrote a small
> viewer/editor to process the data with some filters and other functions.
> It's also a project for me to learn a bit more about DSP.
I like that. It's very hands-on, and you will, even after what you have done so far, have a way better basis for working with DSP than somebody who only uses the canned routines from the Matlab Signal Processing Toolbox. Even if you happen to use Matlab in your little project.
>> The net result is that if the number of coefficients in your
>> filter is comparable to log(N), N being the number of data
>> points, you might want to consider the straightforward
>> time-domain implementation of your filter.
>>
>> The run-time penalty will be small compared to the FFT/IFFT
>> algorithm (for very small filters the time-domain method may
>> even be faster), and it will be way simpler to implement.
>> Depending on the compiler you use, you might even be able
>> to exploit certain fancy MMX instructions in your CPU that
>> can make for a very fast implementation of the algorithm.
>
> That's interesting, I can't even imagine how filtering in the time
> domain works.
It is very simple, at least in principle:

  y[n] = a1*y[n-1] + a2*y[n-2] + ... + aN*y[n-N]
       + b0*x[n]   + b1*x[n-1] + ... + bM*x[n-M]

where the an and bm are filter coefficients, x[n] is the input signal and y[n] is the output signal.
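A minimal sketch of that difference equation (using the same sign convention as the formula above, with the feedback terms added; many textbooks and libraries negate the feedback coefficients instead):

```python
import numpy as np

def iir_filter(x, a, b):
    # y[n] = a[0]*y[n-1] + ... + a[N-1]*y[n-N]
    #      + b[0]*x[n]   + ... + b[M]*x[n-M]
    # Direct, unoptimised loop: O(len(x) * (len(a) + len(b))) operations.
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[m] * x[n - m] for m in range(len(b)) if n - m >= 0)
        acc += sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y[n] = acc
    return y

# Example: one-pole smoother y[n] = 0.5*y[n-1] + 0.5*x[n].
impulse = np.zeros(8)
impulse[0] = 1.0
y = iir_filter(impulse, a=[0.5], b=[0.5])
# The impulse response decays geometrically: 0.5, 0.25, 0.125, ...
```

Note how cheap this is per sample: one multiply-add per coefficient, which is the whole point of Rune's time-domain argument for short filters.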
> When I started on the project, I read some tutorials about the FFT, and
> after some time (months...) I managed to implement it.
You did that yourself? Not following a recipe/pseudo code? Brilliant!
> I wrote a simple rectangular filter that worked like that: FFT, zero
> frequency bins, IFFT.
The problem with this approach is that you have no control over what happens in the bins between the coefficients. Read a basic text on "FIR filter design by the window method" to find a better way. The implementation is OK, though.
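The window method mentioned above fits in a few lines. A hedged sketch (cutoff and filter order are made-up example values): truncate the ideal low-pass impulse response (a sinc) and taper it with a window, instead of zeroing FFT bins:

```python
import numpy as np

def windowed_sinc_lowpass(fc, M):
    # Window-method low-pass FIR design. fc is the cutoff as a fraction
    # of the sampling rate (0 < fc < 0.5); M is the (even) filter order,
    # giving M+1 taps.
    n = np.arange(M + 1) - M / 2
    h = 2 * fc * np.sinc(2 * fc * n)      # ideal low-pass impulse response
    h *= np.hamming(M + 1)                # taper to control the sidelobes
    return h / h.sum()                    # normalise to unity DC gain

h = windowed_sinc_lowpass(fc=0.1, M=50)
# The frequency response now rolls off smoothly, with controlled stopband
# ripple, instead of the ringing you get from zeroing FFT bins directly.
H = np.abs(np.fft.rfft(h, 1024))
```

The window (Hamming here, but Blackman etc. work the same way) is what controls the behaviour "between the bins" that the rectangular FFT-zeroing filter leaves undefined.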
> Then I realized that for large signals my heap space was too small and
> the program crashed. I read about overlap-add and coded a better filter.
> That worked fine, but due to the overlapping the performance was much
> worse than before.
Well, large amounts of data present their own problems. Generally speaking, I would say "stay away from the huge amounts of data until you have to." It just takes too much detailed knowledge to find good algorithms and good implementations.
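For reference, the overlap-add filter discussed above is short to write down. A minimal in-memory sketch (block size and test signals are made up; a real out-of-core version would read each block from disk instead of slicing an array):

```python
import numpy as np

def overlap_add_filter(x, h, block=4096):
    # FFT-based FIR filtering that only needs one block at a time.
    # Returns the full convolution of x and h (length len(x)+len(h)-1).
    M = len(h)
    nfft = block + M - 1                  # linear-convolution length per block
    H = np.fft.rfft(h, nfft)              # filter spectrum, computed once
    y = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        # The tails of consecutive blocks overlap by M-1 samples and add up.
        y[start:start + len(seg) + M - 1] += yseg[:len(seg) + M - 1]
    return y

# Matches direct convolution:
rng = np.random.default_rng(1)
x = rng.standard_normal(10000)
h = rng.standard_normal(64)
assert np.allclose(overlap_add_filter(x, h), np.convolve(x, h))
```

The overlap overhead Andreas noticed is the M-1 extra samples per block; making the block much longer than the filter keeps that overhead small.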
> (BTW, I still have a question about filters: When I convolve the signal
> with the filter, I understand that the result signal's size is longer
> than the original one (+ filter size). Is it ok when I cut the beginning
> and the end to get a filtered signal with equal length? I did it and the
> result looks quite good.)
It depends on the application, but yes, this is one way of doing things. It is straightforward for FIR filters, where the length of the impulse response is known, but less simple when IIR filters are involved. Again, get a basic text on DSP to learn the details.
> After that I implemented cross-correlation using the FFT and had the
> same problem that my heap space was too small. But now my "filter" was
> another large signal and I couldn't use overlap-add (or can I?). So I
> thought, if I had an FFT that also worked on very large signals, many
> other problems (such as cross-correlation) would be much easier to
> realize. I tried to write an FFT using on-demand paging, but it was
> terribly slow because the bit-reversed addressing caused too many page
> faults.
The devil is in the (implementation) details... With large amounts of (stationary) data, it might be better to compute a shorter autocorrelation sequence, and use an average of many estimates of the sequence. I am not sure how useful the autocorrelation coefficient at lag 50000000 would be; a good estimate for the coefficient at lag 5000 might be more interesting...
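Rune's segment-averaging idea can be sketched in a few lines. This is one possible reading of it, with made-up segment and lag sizes: estimate the autocorrelation per segment via the FFT and average the estimates, so the whole signal is never transformed at once:

```python
import numpy as np

def averaged_autocorr(x, maxlag, seglen=8192):
    # Average FFT-based autocorrelation estimates over consecutive
    # segments; memory use is bounded by the segment size, and the
    # data is read in one linear pass.
    nfft = 2 * seglen                     # zero-pad to avoid circular wrap
    acc = np.zeros(maxlag + 1)
    nseg = 0
    for start in range(0, len(x) - seglen + 1, seglen):
        seg = x[start:start + seglen]
        S = np.abs(np.fft.rfft(seg, nfft)) ** 2   # segment power spectrum
        acc += np.fft.irfft(S, nfft)[:maxlag + 1] # back to autocorrelation
        nseg += 1
    return acc / (nseg * seglen)          # averaged (biased) estimate

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
r = averaged_autocorr(x, maxlag=50)
# For white noise, r[0] is close to the variance (1) and other lags near 0.
```

Averaging over segments also reduces the variance of each coefficient estimate, which is exactly why a well-averaged lag-5000 estimate can be more useful than a single noisy lag-50000000 one.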
> My last attempt was to code a kind of "buffered" FFT that could somehow
> merge many small spectra into a large one, but as I said, this didn't
> work at all.
>
> I just found a paper that looks very interesting, it's called
> "Performing Out-of-Core FFTs on Parallel Disk Systems":
> http://historical.ncstrl.org/tr/pdf/icase/TR-96-70.pdf
> I haven't read it but it seems to address my problem.
>
> So far, thank you for your help!
Well, you certainly have your work cut out for you. I am impressed by the guts you have shown by implementing these things yourself more or less from scratch. Again, the lessons learned by doing something like this are invaluable in a future job dealing with DSP. Rune
rhnlogic@yahoo.com wrote:
> Did these algorithms have a name?
Out-of-core FFT?

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
Today is the last day of your life so far.
Jim Thomas wrote:
> rhnlogic@yahoo.com wrote:
>> Did these algorithms have a name?
>
> Out-of-core FFT?
I think Don Knuth treated the problem of processing a data set that is much larger than the available RAM in Volume 2 of his "The Art of Computer Programming".
> Today is the last day of your life so far.
This is an experienced engineer talking... Rune
"Jim Thomas" <jthomas@bittware.com> wrote in message
news:11jirn01co2om85@corp.supernews.com...

> rhnlogic@yahoo.com wrote:
>> Did these algorithms have a name?
> Out-of-core FFT?
Four-step (or was it five-step?) FFT; in http://www.jjj.de/fxt/fxtbook.pdf it's referred to as the matrix Fourier algorithm (MFA). None of these names seems very descriptive, as the number of steps is in the eye of the beholder, and any DFT algorithm is just trying to compute the action of a matrix on a vector.

The idea is to factor the problem size into two roughly equal pieces and apply a Cooley-Tukey step to these two pieces. The data are originally badly placed in memory to execute the transforms implied by the first factor, so the matrix is effectively transposed so as to make this set of transforms operate on contiguous data. After these transforms the data are transposed back to make the data sets for the second course of transforms each contiguous in memory. After the second set of transforms a third and final transpose is necessary to put the outputs in standard order.

The three transposes seem like gratuitous extra work, but these are the only stages that touch memory out of order, and the memory operations aren't necessarily as horrible as those requested by the FFT. With due attention to the sizes of the things you are touching, such as cache lines and pages, the memory-operation phases can be fairly efficient. I once coded an algorithm like this and it beat the CXML FFT algorithm on a 21164 in speed for large data sizes for this reason (it won at small data sizes for a different reason).

BTW, I don't recommend using an LUT to hold the phase (or, some would say, twiddle) factors between the two halves of the procedure. If your data don't fit in memory, this one set of phase factors won't, either. LUTs are fine for the phase factors in the substages of the small FFTs, though.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
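A small in-memory illustration of the four-step/MFA decomposition described above, assuming a made-up size of 64 = 8*8 (a real out-of-core version would do the transposes as explicit passes over the file instead of NumPy reshapes):

```python
import numpy as np

def four_step_fft(x, n1, n2):
    # Compute an (n1*n2)-point DFT via n2 FFTs of size n1, twiddle
    # factors, and n1 FFTs of size n2 (the matrix Fourier algorithm).
    assert len(x) == n1 * n2
    a = x.reshape(n1, n2)                 # view the data as an n1 x n2 matrix
    a = np.fft.fft(a, axis=0)             # n2 transforms of length n1 (columns)
    k1 = np.arange(n1).reshape(-1, 1)
    j2 = np.arange(n2).reshape(1, -1)
    a = a * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))  # twiddle factors
    a = np.fft.fft(a, axis=1)             # n1 transforms of length n2 (rows)
    return a.T.reshape(-1)                # final transpose to standard order

rng = np.random.default_rng(3)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
assert np.allclose(four_step_fft(x, 8, 8), np.fft.fft(x))
```

The column FFTs here are the transforms that touch memory with a large stride, which is why the out-of-core variant transposes first so that every small FFT works on contiguous data. Note also that the twiddle factors are computed on the fly, per the LUT caveat above.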
Wow, that's one hell of a book. Looks like a treasure trove!

Thanks for posting that.

John