decimation and anti-aliasing

Started by June 7, 2007
Dear all,
Thanks for all your replies but I am still a bit confused. Let me try
rephrasing my question so that everyone gets a better picture of what I am
trying to do. I have a large number (order of 10^13) of discretized data
samples. I have to obtain the frequency spectrum of this data so that I
can predict minima in the discretized data. I am using the FFTW C library
to obtain the Frequency spectrum. But the maximum size of my data array is
limited to 2^32 in C. So I am picking a smaller subset of 1M samples from
the original set of 10^13 data points.

Jerry is right about the fact that what I am looking for is subsampling
or decimation without running into aliasing problems. But due to the large
amount of data, I cannot perform decimation on the entire data set. Also,
I'd like to point out that my data is inherently discretized and so using
an analog anti-aliasing filter to remove all the high frequency components
that I am not interested in is NOT an option for me. So some scheme that
resembles what John mentioned is of great interest to me. But I am not
sure if that will work. An ideal scheme for me would be like the
following:

1. Collect N=1M equidistant samples out of the n=10^13 data points.
2. Filter out the high frequency components present in the samples
using some kind of LPF, i.e.
For each sample "s_i" in the 1M samples
i. Collect some constant number (say 10) samples before and
after the sample "s_i"
ii. Perform a Digital Low Pass Filtering operation on these
21 samples.
iii. Pick one of these 21 samples (say the median of the
21 samples) to replace "s_i"

Will the above procedure eliminate or atleast drastically reduce
aliasing (assuming of course that the original 10^13 data points were
sampled without aliasing)? Can I somehow tell which is the lowest
frequency that was significantly filtered out (so that I know which of the
lower frequencies contain aliases)?

Some other basic questions regarding interpretation of the N-pt FFT
results :

1. The frequency bin "i" corresponds to frequency fi = (i*fs/N),
right?
So bin N/2 contains frequency fs/2 while bin 1 contains
frequency fs/N .
2. Assuming that the phase angles lie in [0, 2pi], if the phase angle

of a frequency is 0, does that mean that the first positive peak
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?)

Thanks,

news:BfOdnXcCDqBTsfXbnZ2dnUVZ_revnZ2d@giganews.com:

> But due to the large > amount of data, I cannot perform decimation on the entire data set.
Make believe you are doing a DSP in real time. Read your data one point at a time, do the appropriate convolution for filtering. Maintain enough of a data buffer to do the convolution, letting the older stuff fall off the end, but save every Nth point to a results array. -- Scott Reverse name to reply

> Thanks for all your replies but I am still a bit confused. Let me try > rephrasing my question so that everyone gets a better picture of what I am > trying to do. I have a large number (order of 10^13) of discretized data > samples. I have to obtain the frequency spectrum of this data so that I > can predict minima in the discretized data. I am using the FFTW C library > to obtain the Frequency spectrum. But the maximum size of my data array is > limited to 2^32 in C. So I am picking a smaller subset of 1M samples from > the original set of 10^13 data points.
If you only need part of the FFT, such as the low frequency points, there are ways to speed it up. But 1e13 is a lot of points. I have wondered in the past about an FFT on an entire CD, which is much less than 1e13 samples. There are many tricks that can be done with large data sets to process them faster. -- glen
glen herrmannsfeldt wrote:
> prad wrote: > >> Thanks for all your replies but I am still a bit confused. Let me try >> rephrasing my question so that everyone gets a better picture of what >> I am >> trying to do. I have a large number (order of 10^13) of discretized data >> samples. I have to obtain the frequency spectrum of this data so that I >> can predict minima in the discretized data. I am using the FFTW C library >> to obtain the Frequency spectrum. But the maximum size of my data >> array is >> limited to 2^32 in C. So I am picking a smaller subset of 1M samples from >> the original set of 10^13 data points. > > If you only need part of the FFT, such as the low frequency points, > there are ways to speed it up. But 1e13 is a lot of points. > I have wondered in the past about an FFT on an entire CD, which is > much less than 1e13 samples. There are many tricks that can be > done with large data sets to process them faster.
Do we know that speed is an issue? Multirate decimation from and to files on disk is surely a possibility for a one-off calculation. The numbers don't add up for me. 2^32 = 4,294,967,296. Why mess around with 2^20 (1,048,576)? Decimate by 2^12 and go. (I don't know the highest original frequency, but the highest one left after decimation will be f/4096. Unless those minima are very broad, the processing will fill them in.) Jerry -- Engineering is the art of making what you want from things you can get. &macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;&macr;
If you are interested in the whole spectrum, you are stuck with random
sampling with significantly more than 1E6 points.  If you are only interested
in a "small" of the spectrum, you might try filtering out only the band of
interest and transforming that.

As an aside, there are 64 bit computers available for reasonable amounts of
money.  Maybe a lease ...

In article <koOdnbpCo5g3A_XbnZ2dnUVZ_qupnZ2d@rcn.net>, jya@ieee.org wrote:
>glen herrmannsfeldt wrote: >> prad wrote: >> >>> Thanks for all your replies but I am still a bit confused. Let me try >>> rephrasing my question so that everyone gets a better picture of what >>> I am >>> trying to do. I have a large number (order of 10^13) of discretized data >>> samples. I have to obtain the frequency spectrum of this data so that I >>> can predict minima in the discretized data. I am using the FFTW C library >>> to obtain the Frequency spectrum. But the maximum size of my data >>> array is >>> limited to 2^32 in C. So I am picking a smaller subset of 1M samples from >>> the original set of 10^13 data points. >> >> If you only need part of the FFT, such as the low frequency points, >> there are ways to speed it up. But 1e13 is a lot of points. >> I have wondered in the past about an FFT on an entire CD, which is >> much less than 1e13 samples. There are many tricks that can be >> done with large data sets to process them faster. > >Do we know that speed is an issue? Multirate decimation from and to >files on disk is surely a possibility for a one-off calculation. The >numbers don't add up for me. 2^32 = 4,294,967,296. Why mess around with >2^20 (1,048,576)? Decimate by 2^12 and go. (I don't know the highest >original frequency, but the highest one left after decimation will be >f/4096. Unless those minima are very broad, the processing will fill >them in.) > >Jerry