DSPRelated.com
Forums

How to fft huge data arrays?

Started by Andreas Besting September 26, 2005
Andreas Huennebeck wrote:
> Andreas Besting wrote:
>> That's interesting, i can't even imagine how filtering in the time
>> domain works.
>
> Then you definitely should read this book (free download!):
> http://www.dspguide.com/
I know it, it's really great!
>> (BTW, i still have a question about filters: When i convolve the signal
>> with the filter, i understand that the result signal's size is longer
>> than the original one (+ filter size). Is it ok when i cut the beginning
>> and the end to get a filtered signal with equal length? i did it and the
>> result looks quite good.)
> I don't think so - but it's all in the book.
Hm, in Ch. 18 (page 313 or 315) it looks like the filtered signal is always centered in the middle. I experimented a little bit, and the larger the filter (or the convolution size), the more the signals came out somehow "squeezed and centered". So I thought cutting half the filter size from the beginning and the end would be ok... I plotted both signals and they seem to match. But I still wonder if this is really correct. Sometimes I need a filtered signal with exactly the same length as the original one.
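The trimming described above can be checked numerically. A minimal sketch (the signal and filter here are made up for illustration): the full convolution has length N+M-1, and cutting (M-1)/2 samples off each end recovers a length-N result that is exactly NumPy's `mode="same"`:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)          # example signal, length N = 200
h = np.ones(11) / 11.0                  # example 11-tap moving-average filter

full = np.convolve(x, h)                # full convolution, length N + M - 1
# Trim half the filter length from each end to recover length N.
# Each remaining output sample is then aligned with the input sample
# under the middle tap of the filter.
start = (len(h) - 1) // 2
same = full[start : start + len(x)]

# NumPy's built-in 'same' mode does exactly this trimming:
assert np.allclose(same, np.convolve(x, h, mode="same"))
```

So yes, for an odd-length FIR filter the "cut half the filter size from each end" recipe is the standard way to get an output with the same length as the input.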
Andreas Besting wrote:
> john wrote:
>> Another approach is to use a series of zoom FFTs on sub-bands of data,
>> building up the big FFT in chunks. This method takes N linear passes
>> through the data, where N is the number of sub-bands. Recently I chose
>> this technique to do a 524288-point real FFT on a '54XX DSP chip using
>> a 4096-point complex FFT routine and a decimation factor of 128
>> (4096*128 == 524288).
>>
>> John
>
> Yes!! That's exactly what I need! So it does work! But how do I create
> the large spectrum from the chunks?
>
> Andreas
You're just using the zoom FFT to zoom in on one chunk of the spectrum at a time. The chunks are then placed side by side to form the final high-resolution spectrum. If you are only interested in pieces of the spectrum, this can be fairly time-efficient. It is not a particularly time-efficient way to compute the entire spectrum; however, it does not require much RAM, and it doesn't jump all over the place in the stored data (it could even run on a tape drive).

John
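This is not John's '54XX implementation (which mixes, low-pass filters and decimates), but a small exact sketch of the same idea, assuming made-up sizes: each sub-band of an N-point DFT can be computed with one modulation, one fold, and one small FFT, and the sub-bands tile the full spectrum. Each sub-band costs one linear pass through the data, matching the "N linear passes" above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4096, 8            # total FFT size, number of sub-bands
M = N // D                # small FFT size per sub-band

x = rng.standard_normal(N)
X = np.empty(N, dtype=complex)

n = np.arange(N)
for r in range(D):
    # Shift sub-band r down to DC, then fold the signal to length M.
    # The M-point FFT of the folded signal gives exactly the big-FFT
    # bins k = r, r+D, r+2D, ... (one linear pass per sub-band).
    y = x * np.exp(-2j * np.pi * r * n / N)
    z = y.reshape(D, M).sum(axis=0)       # z[p] = sum_q y[p + q*M]
    X[r::D] = np.fft.fft(z)

# Interleaving the sub-bands reproduces the full spectrum:
assert np.allclose(X, np.fft.fft(x))
```

The folding step is the exact (alias-free) counterpart of low-pass filtering and decimating by D; in a practical zoom FFT the fold is replaced by a decimating filter.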
Andreas Besting wrote:
> For my little project I need to FFT large data arrays that do not fit
> into memory. So far I know about overlap-add, and for some problems that
> works fine, but I wondered if it's possible to transform the whole data
> set at once.
Take a page from the BETA receiver used in the SETI project: decompose the huge FFT into smaller chunks that can be handled within your memory. Appendix C of this document goes into the details:
http://seti.harvard.edu/grad/dpdf/thesislt.pdf

-- Dave Tweed
Rune Allnor wrote:
> Andreas Besting wrote:
>> Hi!
>>
>> For my little project I need to FFT large data arrays that do not fit
>> into memory. So far I know about overlap-add, and for some problems that
>> works fine, but I wondered if it's possible to transform the whole data
>> set at once. I tried to code a special memory management for the FFT,
>> but due to bit-reversed addressing I get so many page faults that it's
>> really not efficient... I also tried to reconstruct the large FFT from
>> several windowed transforms, but it got me very strange results (is this
>> even possible??? I'm really not an expert in DSP). So, any help would
>> be nice!
>
> I don't know for sure, but I assume your "little project" involves
> applying some sort of filter to a sound file.
>
> The Fast Fourier Transform, FFT, is a fast way to implement the
> Discrete Fourier Transform, DFT. The DFT can naively be implemented
> in a way that requires O(N^2) FLOPs for transforming an N-point
> sequence, while the FFT requires only O(N*log(N)) FLOPs for
> doing the same job.
>
> So the FFT uses a fraction log(N)/N of the time of the naive
> algorithm. For N>1 we see that it is a very good idea to use
> the FFT to compute the DFT, when disregarding various forms
> of program run-time overhead.
>
> Now, the operation of filtering a data sequence can be expressed
> in terms of the DFTs of the input signal and filter transfer function,
> so a filtering operation can be implemented in terms of FFTs.
>
> The question is whether that is a smart thing to do.
>
> Implementing the filter in terms of DFTs requires computing the
> FFT of the input signal, one or two FFTs to find the filter
> transfer function, multiplying the spectra of the input signal
> and the transfer function, followed by an IFFT to find the output
> signal. All of this takes place in complex arithmetic, so it
> turns out there are quite a number of FLOPs involved.
> The net result is that if the number of coefficients in your
> filter is comparable to log(N), N being the number of data
> points, you might want to consider the straightforward
> time-domain implementation of your filter.
>
> The run-time penalty will be small compared to the FFT/IFFT
> algorithm (for very small filters the time-domain method may
> even be faster), and it will be way simpler to implement.
> Depending on the compiler you use, you might even be able
> to exploit certain fancy MMX instructions in your CPU that
> can make for a very fast implementation of the algorithm.
>
> Now, you need very precise knowledge of the number of FLOPs
> of an FFT algorithm to evaluate where the limit is for
> switching to the time domain, but a very naive argument says that
> the time domain is the place to work with 100000000 samples and a
> filter with 25 or fewer coefficients.
>
> So at first glance, this type of argument only works for small
> IIR filters or very small FIR filters. However, the O(NlogN)
> notation does away with numerical constants, as in
>
> O(NlogN) -> "The FFT takes 28*N*log(30*N) FLOPs"
>
> so depending on the numerical constants that go into the
> exact expression, there may be room for somewhat larger
> filters in the time domain. But you need to consult somebody
> who knows the FLOP counts in the FFT algorithms to find out
> the exact details.
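Rune's crossover argument can be made concrete with a back-of-envelope calculation. The constants below are assumptions for illustration only (a common rule of thumb of ~5*N*log2(N) FLOPs per complex FFT), not measured counts for any real library:

```python
import math

def fft_filter_flops(n):
    # forward FFT + inverse FFT (assumed ~5*n*log2(n) FLOPs each)
    # plus a complex multiply of the two spectra (~6 FLOPs per bin)
    return 2 * 5 * n * math.log2(n) + 6 * n

def time_domain_flops(n, taps):
    # direct FIR convolution: one multiply and one add per tap per sample
    return 2.0 * n * taps

n = 100_000_000
# Tap count at which the two approaches cost the same under these
# assumptions; below it the time domain wins, above it the FFT wins.
crossover = fft_filter_flops(n) / (2.0 * n)
```

With these assumed constants the crossover lands at roughly 135 taps for 10^8 samples, which is consistent with Rune's "25 or fewer coefficients" being comfortably in time-domain territory, with room for somewhat larger filters depending on the real constants.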
There's another point in favor of working in the time domain: it's easier to avoid the kind of paging inefficiencies that are plaguing Andreas. But there are uses for finding the spectrum aside from filtering. We don't know the application here.

Jerry
--
Engineering is the art of making what you want from things you can get.
Andreas Besting wrote:
> Rune Allnor wrote:
>> I don't know for sure, but I assume your "little project" involves
>> applying some sort of filter to a sound file.
>
> Currently I'm working with biomedical time series, and I wrote a small
> viewer/editor to process the data with some filters and other functions.
> It's also a project for me to learn a bit more about DSP.
I like that. It's very hands-on, and you will, even after what you have done so far, have a way better basis for working with DSP than somebody who only uses the canned routines from the Matlab Signal Processing Toolbox. Even if you happen to use Matlab in your little project.
>> The net result is that if the number of coefficients in your
>> filter is comparable to log(N), N being the number of data
>> points, you might want to consider the straightforward
>> time-domain implementation of your filter.
>>
>> The run-time penalty will be small compared to the FFT/IFFT
>> algorithm (for very small filters the time-domain method may
>> even be faster), and it will be way simpler to implement.
>> Depending on the compiler you use, you might even be able
>> to exploit certain fancy MMX instructions in your CPU that
>> can make for a very fast implementation of the algorithm.
>
> That's interesting, I can't even imagine how filtering in the time
> domain works.
It is very simple, at least in principle:

  y[n] = a1*y[n-1] + a2*y[n-2] + ... + aN*y[n-N]
       + b0*x[n]   + b1*x[n-1] + ... + bM*x[n-M]

where the an and bm are filter coefficients, x[n] is the input signal and y[n] is the output signal.
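A minimal sketch of that difference equation (using the same sign convention as the formula above, with the feedback terms added; many textbooks and libraries negate the feedback coefficients instead):

```python
import numpy as np

def iir_filter(x, a, b):
    # y[n] = a[0]*y[n-1] + ... + a[N-1]*y[n-N]
    #      + b[0]*x[n]   + ... + b[M]*x[n-M]
    # Direct, unoptimised loop: O(len(x) * (len(a) + len(b))) operations.
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[m] * x[n - m] for m in range(len(b)) if n - m >= 0)
        acc += sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y[n] = acc
    return y

# Example: one-pole smoother y[n] = 0.5*y[n-1] + 0.5*x[n].
impulse = np.zeros(8)
impulse[0] = 1.0
y = iir_filter(impulse, a=[0.5], b=[0.5])
# The impulse response decays geometrically: 0.5, 0.25, 0.125, ...
```

Note how cheap this is per sample: one multiply-add per coefficient, which is the whole point of Rune's time-domain argument for short filters.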
> When I started on the project, I read some tutorials about the FFT, and
> after some time (months...) I managed to implement it.
You did that yourself? Not following a recipe/pseudo code? Brilliant!
> I wrote a simple rectangular filter that worked like that: FFT, zero
> frequency bins, IFFT.
The problem with this approach is that you have no control over what happens in the bins between the coefficients. Read a basic text on "FIR filter design by the window method" to find a better way. The implementation is OK, though.
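The window method mentioned above fits in a few lines. A hedged sketch (cutoff and filter order are made-up example values): truncate the ideal low-pass impulse response (a sinc) and taper it with a window, instead of zeroing FFT bins:

```python
import numpy as np

def windowed_sinc_lowpass(fc, M):
    # Window-method low-pass FIR design. fc is the cutoff as a fraction
    # of the sampling rate (0 < fc < 0.5); M is the (even) filter order,
    # giving M+1 taps.
    n = np.arange(M + 1) - M / 2
    h = 2 * fc * np.sinc(2 * fc * n)      # ideal low-pass impulse response
    h *= np.hamming(M + 1)                # taper to control the sidelobes
    return h / h.sum()                    # normalise to unity DC gain

h = windowed_sinc_lowpass(fc=0.1, M=50)
# The frequency response now rolls off smoothly, with controlled stopband
# ripple, instead of the ringing you get from zeroing FFT bins directly.
H = np.abs(np.fft.rfft(h, 1024))
```

The window (Hamming here, but Blackman etc. work the same way) is what controls the behaviour "between the bins" that the rectangular FFT-zeroing filter leaves undefined.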
> Then I realized that for large signals my heap space was too small and
> the program crashed. I read about overlap-add and coded a better filter.
> That worked fine, but due to the overlapping the performance was much
> worse than before.
Well, large amounts of data present their own problems. Generally speaking, I would say "stay away from the huge amounts of data until you have to." It just takes too much detailed knowledge to find good algorithms and good implementations.
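For reference, the overlap-add filter discussed above is short to write down. A minimal in-memory sketch (block size and test signals are made up; a real out-of-core version would read each block from disk instead of slicing an array):

```python
import numpy as np

def overlap_add_filter(x, h, block=4096):
    # FFT-based FIR filtering that only needs one block at a time.
    # Returns the full convolution of x and h (length len(x)+len(h)-1).
    M = len(h)
    nfft = block + M - 1                  # linear-convolution length per block
    H = np.fft.rfft(h, nfft)              # filter spectrum, computed once
    y = np.zeros(len(x) + M - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        # The tails of consecutive blocks overlap by M-1 samples and add up.
        y[start:start + len(seg) + M - 1] += yseg[:len(seg) + M - 1]
    return y

# Matches direct convolution:
rng = np.random.default_rng(1)
x = rng.standard_normal(10000)
h = rng.standard_normal(64)
assert np.allclose(overlap_add_filter(x, h), np.convolve(x, h))
```

The overlap overhead Andreas noticed is the M-1 extra samples per block; making the block much longer than the filter keeps that overhead small.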
> (BTW, I still have a question about filters: When I convolve the signal
> with the filter, I understand that the result signal's size is longer
> than the original one (+ filter size). Is it ok when I cut the beginning
> and the end to get a filtered signal with equal length? I did it and the
> result looks quite good.)
It depends on the application, but yes, this is one way of doing things. It is straightforward for FIR filters, where the length of the impulse response is known, but less simple when IIR filters are involved. Again, get a basic text on DSP to learn the details.
> After that I implemented cross-correlation using the FFT and had the
> same problem that my heap space was too small. But now my "filter" was
> another large signal and I couldn't use overlap-add (or can I?). So I
> thought, if I had an FFT that also worked on very large signals, many
> other problems (such as cross-correlation) would be much easier to
> realize. I tried to write an FFT using on-demand paging, but it was
> terribly slow because the bit-reversed addressing caused too many page
> faults.
The devil is in the (implementation) details... With large amounts of (stationary) data, it might be better to compute a shorter autocorrelation sequence, and use an average of many estimates of the sequence. I am not sure how useful the autocorrelation coefficient at lag 50000000 would be; a good estimate for the coefficient at lag 5000 might be more interesting...
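Rune's segment-averaging idea can be sketched in a few lines. This is one possible reading of it, with made-up segment and lag sizes: estimate the autocorrelation per segment via the FFT and average the estimates, so the whole signal is never transformed at once:

```python
import numpy as np

def averaged_autocorr(x, maxlag, seglen=8192):
    # Average FFT-based autocorrelation estimates over consecutive
    # segments; memory use is bounded by the segment size, and the
    # data is read in one linear pass.
    nfft = 2 * seglen                     # zero-pad to avoid circular wrap
    acc = np.zeros(maxlag + 1)
    nseg = 0
    for start in range(0, len(x) - seglen + 1, seglen):
        seg = x[start:start + seglen]
        S = np.abs(np.fft.rfft(seg, nfft)) ** 2   # segment power spectrum
        acc += np.fft.irfft(S, nfft)[:maxlag + 1] # back to autocorrelation
        nseg += 1
    return acc / (nseg * seglen)          # averaged (biased) estimate

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
r = averaged_autocorr(x, maxlag=50)
# For white noise, r[0] is close to the variance (1) and other lags near 0.
```

Averaging over segments also reduces the variance of each coefficient estimate, which is exactly why a well-averaged lag-5000 estimate can be more useful than a single noisy lag-50000000 one.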
> My last attempt was to code a kind of "buffered" FFT that could somehow
> merge many small spectra into a large one, but as I said, this didn't
> work at all.
>
> I just found a paper that looks very interesting, it's called
> "Performing Out-of-Core FFTs on Parallel Disk Systems":
> http://historical.ncstrl.org/tr/pdf/icase/TR-96-70.pdf
> I haven't read it but it seems to address my problem.
>
> So far, thank you for your help!
Well, you certainly have your work cut out for you. I am impressed by the guts you have shown by implementing these things yourself more or less from scratch. Again, the lessons learned by doing something like this are invaluable in a future job dealing with DSP. Rune
rhnlogic@yahoo.com wrote:
> Did these algorithms have a name?
Out-of-core FFT?

--
Jim Thomas
Principal Applications Engineer
Bittware, Inc
jthomas@bittware.com
http://www.bittware.com
(603) 226-0404 x536
Today is the last day of your life so far.
Jim Thomas wrote:
> rhnlogic@yahoo.com wrote:
>> Did these algorithms have a name?
>
> Out-of-core FFT?
I think Don Knuth treated the problem of processing a data set that is much larger than the available RAM in Volume 2 of his "The Art of Computer Programming".
> Today is the last day of your life so far.
This is an experienced engineer talking... Rune
"Jim Thomas" <jthomas@bittware.com> wrote in message
news:11jirn01co2om85@corp.supernews.com...

> rhnlogic@yahoo.com wrote:
>> Did these algorithms have a name?
> Out-of-core FFT?
Four-step (or was it five-step?) FFT; in http://www.jjj.de/fxt/fxtbook.pdf it's referred to as the matrix Fourier algorithm (MFA). None of these names seems very descriptive, as the number of steps is in the eye of the beholder, and any DFT algorithm is just trying to compute the action of a matrix on a vector.

The idea is to factor the problem size into two roughly equal pieces and apply a Cooley-Tukey step to these two pieces. The data are originally badly placed in memory to execute the transforms implied by the first factor, so the matrix is effectively transposed so as to make this set of transforms operate on contiguous data. After these transforms the data are transposed back to make the data sets for the second course of transforms each contiguous in memory. After the second set of transforms a third and final transpose is necessary to put the outputs in standard order.

The three transposes seem like gratuitous extra work, but these are the only stages that touch memory out of order, and the memory operations aren't necessarily as horrible as those requested by the FFT. With due attention to the sizes of the things you are touching, such as cache lines and pages, the memory-operation phases can be fairly efficient. I once coded an algorithm like this and it beat the CXML FFT algorithm on a 21164 in speed for large data sizes for this reason (it won at small data sizes for a different reason).

BTW, I don't recommend using an LUT to hold the phase (or, some would say, twiddle) factors between the two halves of the procedure. If your data don't fit in memory, this one set of phase factors won't, either. LUTs are fine for the phase factors in the substages of the small FFTs, though.

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
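A small in-memory illustration of the four-step/MFA decomposition described above, assuming a made-up size of 64 = 8*8 (a real out-of-core version would do the transposes as explicit passes over the file instead of NumPy reshapes):

```python
import numpy as np

def four_step_fft(x, n1, n2):
    # Compute an (n1*n2)-point DFT via n2 FFTs of size n1, twiddle
    # factors, and n1 FFTs of size n2 (the matrix Fourier algorithm).
    assert len(x) == n1 * n2
    a = x.reshape(n1, n2)                 # view the data as an n1 x n2 matrix
    a = np.fft.fft(a, axis=0)             # n2 transforms of length n1 (columns)
    k1 = np.arange(n1).reshape(-1, 1)
    j2 = np.arange(n2).reshape(1, -1)
    a = a * np.exp(-2j * np.pi * k1 * j2 / (n1 * n2))  # twiddle factors
    a = np.fft.fft(a, axis=1)             # n1 transforms of length n2 (rows)
    return a.T.reshape(-1)                # final transpose to standard order

rng = np.random.default_rng(3)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
assert np.allclose(four_step_fft(x, 8, 8), np.fft.fft(x))
```

The column FFTs here are the transforms that touch memory with a large stride, which is why the out-of-core variant transposes first so that every small FFT works on contiguous data. Note also that the twiddle factors are computed on the fly, per the LUT caveat above.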
Wow, that's one hell of a book. Looks like a treasure trove!

Thanks for posting that.

John