Dear all,

Thanks for all your replies, but I am still a bit confused. Let me try rephrasing my question so that everyone gets a better picture of what I am trying to do.

I have a large number (on the order of 10^13) of discretized data samples. I have to obtain the frequency spectrum of this data so that I can predict minima in it. I am using the FFTW C library to obtain the frequency spectrum, but the maximum size of my data array in C is limited to 2^32, so I am picking a smaller subset of 1M samples from the original set of 10^13 data points.

Jerry is right that what I am looking for is subsampling, or decimation without running into aliasing problems. But due to the large amount of data, I cannot perform decimation on the entire data set. I'd also like to point out that my data is inherently discretized, so using an analog anti-aliasing filter to remove all the high-frequency components I am not interested in is NOT an option for me. So some scheme that resembles what John mentioned is of great interest to me, but I am not sure it will work. An ideal scheme for me would be like the following:

1. Collect N = 1M equidistant samples out of the n = 10^13 data points.
2. Filter out the high-frequency components present in the samples using some kind of LPF, i.e. for each sample s_i in the 1M samples:
   i. Collect some constant number (say 10) of samples before and after the sample s_i.
   ii. Perform a digital low-pass filtering operation on these 21 samples.
   iii. Pick one of these 21 samples (say the median of the 21 samples) to replace s_i.

Will the above procedure eliminate, or at least drastically reduce, aliasing (assuming of course that the original 10^13 data points were sampled without aliasing)? Can I somehow tell which is the lowest frequency that was significantly filtered out (so that I know which of the lower frequencies contain aliases)?

Some other basic questions regarding interpretation of the N-point FFT results:

1. The frequency bin i corresponds to frequency f_i = i*fs/N, right? So bin N/2 contains frequency fs/2, while bin 1 contains frequency fs/N.
2. Assuming that the phase angles lie in [0, 2pi], if the phase angle of a frequency is 0, does that mean that the first positive peak of that frequency is at t = 0? (Is this the case for FFTW's FFT results?)

Thanks,
Pradeep.

# decimation and anti-aliasing

Started by ●June 7, 2007

Reply by ●June 7, 2007

"prad" <pradeep.fernando@gmail.com> wrote in news:BfOdnXcCDqBTsfXbnZ2dnUVZ_revnZ2d@giganews.com:> But due to the large > amount of data, I cannot perform decimation on the entire data set.Make believe you are doing a DSP in real time. Read your data one point at a time, do the appropriate convolution for filtering. Maintain enough of a data buffer to do the convolution, letting the older stuff fall off the end, but save every Nth point to a results array. -- Scott Reverse name to reply

Reply by ●June 7, 2007

prad wrote:

> I have a large number (order of 10^13) of discretized data samples. I have to obtain the frequency spectrum of this data so that I can predict minima in the discretized data. [snip]

If you only need part of the FFT, such as the low-frequency points, there are ways to speed it up. But 1e13 is a lot of points. I have wondered in the past about an FFT of an entire CD, which is much less than 1e13 samples. There are many tricks that can be done with large data sets to process them faster.

-- glen

Reply by ●June 7, 2007

glen herrmannsfeldt wrote:

> If you only need part of the FFT, such as the low frequency points, there are ways to speed it up. But 1e13 is a lot of points. [snip] There are many tricks that can be done with large data sets to process them faster.

Do we know that speed is an issue? Multirate decimation from and to files on disk is surely a possibility for a one-off calculation. The numbers don't add up for me: 2^32 = 4,294,967,296, so why mess around with 2^20 (1,048,576)? Decimate by 2^12 and go. (I don't know the highest original frequency, but the highest one left after decimation will be f/4096. Unless those minima are very broad, the processing will fill them in.)

Jerry
-- 
Engineering is the art of making what you want from things you can get.
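Multirate decimation by 2^12 is normally built as a cascade of small stages rather than one enormous filter. A toy sketch of the idea, assuming in-memory double samples; the 3-tap {0.25, 0.5, 0.25} smoother is a deliberately crude stand-in for a properly designed halfband FIR, and twelve decimate-by-2 passes give the 4096:1 ratio:

```c
#include <stddef.h>

/* One stage: smooth with {0.25, 0.5, 0.25} (a crude anti-alias
 * filter -- a real design would use a proper halfband FIR), then
 * keep every other sample.  Works in place; returns new length. */
size_t decimate2(double *x, size_t n)
{
    size_t m = 0;
    for (size_t i = 0; i + 2 < n; i += 2)
        x[m++] = 0.25 * x[i] + 0.5 * x[i + 1] + 0.25 * x[i + 2];
    return m;
}

/* Cascade: `stages` decimate-by-2 passes.  stages = 12 gives the
 * overall 2^12 = 4096 reduction suggested above. */
size_t decimate_pow2(double *x, size_t n, int stages)
{
    while (stages-- > 0)
        n = decimate2(x, n);
    return n;
}
```

Because each stage halves the rate, later stages run on ever-shorter data, so the total work is only about twice the cost of the first pass; for 10^13 input points each stage would stream through disk files rather than one array.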

Reply by ●June 8, 2007

If you are interested in the whole spectrum, you are stuck with random sampling with significantly more than 1E6 points. If you are only interested in a "small" part of the spectrum, you might try filtering out only the band of interest and transforming that. As an aside, there are 64-bit computers available for reasonable amounts of money. Maybe a lease ...

In article <koOdnbpCo5g3A_XbnZ2dnUVZ_qupnZ2d@rcn.net>, jya@ieee.org wrote:

> Do we know that speed is an issue? Multirate decimation from and to files on disk is surely a possibility for a one-off calculation. The numbers don't add up for me. 2^32 = 4,294,967,296. Why mess around with 2^20 (1,048,576)? Decimate by 2^12 and go. [snip]
>
> Jerry