DSPRelated.com
Forums

Beginner's question on undersampling and aliasing

Started by prad June 6, 2007
>prad wrote:
>
>> Are you implying that I first pick 1M samples, then LPF the samples and
>> use them? Will that work? Or are you saying that I have to LPF the entire
>> digitized data and then do decimation? The second method is not feasible
>> for me as the data set is too large and will require an enormous amount of
>> time. Please advise.
>
>The exact answer depends much on the ratio of sampling rate to the
>frequency you are interested in. Most likely it is possible to come
>up with a relatively fast filter to get what you want.
>
>Consider the moving average filter. You will find that DSP people
>don't like it very much because its properties are not very good,
>but it is easy to describe and fast to implement.
>
>You might, for example, do a 1000 point moving average where the
>first output value is the mean of the first 1000 input samples,
>the second is the mean of 2 through 1001, etc. This can be done
>very fast with a circular buffer of length 1000 to update the
>sum appropriately. It might be that a moving average filter
>followed by a decimation by 10 (output only every 10th point)
>would be a good initial filter, and reduce your problem by
>a factor of ten. Then another filter could finish off the job.
>
>I have not done the calculations to show that the above is
>actually a good way to proceed, but only offer it to show
>that there are fast ways to filter large amounts of data,
>and that in some cases they might be useful. Also, that
>cascaded filters might be faster than doing the whole
>thing in one step.
>
>-- glen
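For concreteness, a rough C sketch of the running-sum decimator glen
describes might look like the following. This is a sketch only: it assumes
the samples are already in memory as a double array, the function name is
made up, and the 1000-point window and decimate-by-10 factor are just the
numbers from the post, not a recommendation.

    #include <stddef.h>

    /* Moving average of length avg_len followed by decimation by dec,
     * as glen sketches: keep a running sum so each step costs one add
     * and one subtract, and emit only every dec-th average.  The caller
     * must size y[] for roughly (n - avg_len) / dec + 1 outputs.
     * Returns the number of outputs written. */
    size_t moving_avg_decimate(const double *x, size_t n,
                               double *y, size_t avg_len, size_t dec)
    {
        if (n < avg_len || avg_len == 0 || dec == 0)
            return 0;

        double sum = 0.0;
        for (size_t i = 0; i < avg_len; i++)      /* first window */
            sum += x[i];

        size_t nout = 0, count = 0;
        for (size_t i = avg_len; ; i++) {
            if (count++ % dec == 0)               /* keep every dec-th mean */
                y[nout++] = sum / (double)avg_len;
            if (i >= n)
                break;
            sum += x[i] - x[i - avg_len];         /* slide the window */
        }
        return nout;
    }

Called as moving_avg_decimate(x, n, y, 1000, 10), this is the "reduce your
problem by a factor of ten" first stage; a second, sharper filter would then
finish the job on the much smaller output.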
Hi Glen,

One of my major problems is that the data sets I have are huge; in fact, in
terms of memory complexity they are O(n!). 10^13 is one of the smallest data
sets I have. So I'd like to know if there are ways of removing the high
frequency components using only the 1M samples.

Thanks,
Pradeep.
prad wrote:

(snip)

> One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

What is the sampling rate, and what frequency range do you need? How many
processors do you have, and how is the data stored?

As popular disk drives are about 5e11 bytes, and your data might be in 16 bit
samples, it would take 40 such drives to hold the data. If those were on 40
machines, each could process its part, with just a little extra work at the
boundaries.

How long can you wait for results? How much money do you have to buy more
machines?

-- glen
"prad" <pradeep.fernando@gmail.com> writes:
> [...]
> Hi Glen,
>
> One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

Prad,

If you're asking if you can extract 1M samples from 10^13 samples without
filtering and without aliasing, then the answer is NO.

Denying reality won't help you get your problem solved.
-- 
%  Randy Yates                  % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC            %  you still wander the fields of your
%%% 919-577-9882                %  sorrow."
%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
Randy : 

      Yes, I am wondering if I can extract 1M samples out of 10^13 data
points without aliasing. I am not saying that I do not want to perform
filtering. But I cannot afford to perform filtering on the entire dataset
and I am wondering if I can perform the filtering on a much smaller subset
of the data points. Something like 20*1M of the data points rather than the
10^13 points. 

Glen :
     Actually the data samples are produced by a code. But the code can
exactly produce the ith data sample that I need out of the entire 10^13
data points. So storage is not an issue. My limitation is that the C code
for the FFT takes the data input as an array of samples. And the largest
array size is 2^32. Since the data is not periodic, I need equidistant
samples picked out from the entire 10^13 data points. You can assume that
the sampling rate for the entire data set is 2 GHz. Then the maximum
frequency in the data is 1 GHz (assuming that LPF was already done). But
now to get down to 1M samples from the total of 10^13 data points (i.e.
decimate) I have to LPF again. But I cannot afford to LPF the entire data
set and would like to apply the LPF to the 1M samples or maybe a constant
number of neighboring datapoints of the selected 1M samples. Is this
possible? 
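To make the question concrete, here is a rough sketch of the kind of local
filtering being asked about, using a hypothetical get_sample(i) that stands
in for the code that can produce the ith data point; the function names and
window size are made up for illustration. As the replies below explain, a
short local window like this only avoids aliasing if the data is already
known to be bandlimited well below the new, much lower sample rate.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical generator standing in for "the code can exactly
     * produce the ith data sample"; not a real API. */
    extern double get_sample(uint64_t i);

    /* For each of nout kept samples, spaced 'step' points apart in the
     * original sequence, average 2*half + 1 neighboring points as a
     * crude local low-pass.  With step ~ 10^7 and half ~ 10 this only
     * touches about 20 points per kept sample, which is the proposal
     * above -- but it cannot reject energy lying between the windows. */
    void local_average_decimate(double *y, size_t nout,
                                uint64_t step, unsigned half)
    {
        for (size_t k = 0; k < nout; k++) {
            uint64_t left = (uint64_t)k * step;   /* window start */
            double sum = 0.0;
            for (unsigned j = 0; j <= 2 * half; j++)
                sum += get_sample(left + j);
            y[k] = sum / (double)(2 * half + 1);
        }
    }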

Please advise.


Thanks,
Pradeep.





On Jun 7, 3:39 pm, "prad" <pradeep.ferna...@gmail.com> wrote:
> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
>      Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2 GHz. Then the maximum
> frequency in the data is 1 GHz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Not unless you know something about the entire data set, such as that it is
already bandlimited or filtered below some reasonable limit. If you only
look at 1M samples + 20 neighboring points per sample, how do you know
whether or not some of your samples are sitting on top of narrow 41 point
wide spikes (alias noise compared to the 10^7 points of surrounding data
between samples)?

You could try a randomized statistical sampling of data points with varying
sample spacings, and perhaps fail to falsify that your data set has no
energy in selected frequency bands above some limit to some degree of
probability. Would that do?

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
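One possible reading of that suggestion, as a very rough sketch: place short
contiguous blocks at random offsets, measure the energy at a probe frequency
in each block with the Goertzel recurrence, and see whether the averaged
power stays near the noise floor. Here get_sample() is the same hypothetical
point generator as above; the block length, probe frequency, and number of
trials are arbitrary choices; and a consistently small result is evidence,
not proof, that a band is empty.

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    extern double get_sample(uint64_t i);  /* hypothetical point generator */

    /* Power at normalized frequency f (cycles per sample) in one block
     * of len contiguous points starting at 'start', via the Goertzel
     * recurrence. */
    static double block_power(uint64_t start, int len, double f)
    {
        double coeff = 2.0 * cos(2.0 * M_PI * f);
        double s1 = 0.0, s2 = 0.0;
        for (int n = 0; n < len; n++) {
            double s0 = get_sample(start + (uint64_t)n) + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;   /* |X(f)|^2 */
    }

    /* Average that power over 'trials' randomly placed blocks drawn
     * from the first 'total' samples (total must exceed len). */
    double random_probe(uint64_t total, int len, double f, int trials)
    {
        double acc = 0.0;
        for (int t = 0; t < trials; t++) {
            uint64_t start = (((uint64_t)rand() << 31) ^ (uint64_t)rand())
                             % (total - (uint64_t)len);
            acc += block_power(start, len, f);
        }
        return acc / (double)trials;
    }

Repeating this over several probe frequencies above the intended cutoff, and
over different random placements, builds up the kind of statistical evidence
being described.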
prad wrote:
> Hi
>
> Are you implying that I first pick 1M samples, then LPF the samples and
> use them? Will that work? Or are you saying that I have to LPF the entire
> digitized data and then do decimation? The second method is not feasible
> for me as the data set is too large and will require an enormous amount of
> time. Please advise.

You are playing with very big numbers, but I presume you don't need to do
one of these every day. Filtering can be part of the decimating process. You
need to calculate only those samples you will eventually use. The more you
decimate, the lower you will have to set the filter's cutoff, and the more
you will fill in the minima you're looking for.

Jerry
--
Engineering is the art of making what you want from things you can get.
prad wrote:
> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
>      Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2 GHz. Then the maximum
> frequency in the data is 1 GHz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Why not change the producing code to generate a decimated set in the first
place?

Jerry
--
Engineering is the art of making what you want from things you can get.
Hi all,

Jerry: 
   I was just thinking about something like that. I was thinking that I might be
able to modify my code to produce the sum of a subset of data points. Yes,
if I could produce the decimated data points directly then I guess my
problem is solved. 

Ron: 
   Randomized Statistical Sampling is another good idea. Initially I was
thinking along another line involving random sampling. I was thinking
about producing random data samples and then performing FFT on these
samples. In fact, I did it. But since I am not that familiar with DSP and
FFT, I could not really figure out how to interpret the FFT results. In
fact, most of the links I found on FFT with non-uniformly spaced samples
were interpolating to find the equally spaced samples and then performing
FFT. Is this the standard technique for FFT with non-uniformly spaced
samples? Thanks Ron for this new idea. I will investigate it further. 

BTW, some other basic questions regarding interpretation of the N-pt FFT
results : 

1. The frequency bin "i" corresponds to frequency fi = (i*fs/N),
right? So bin N/2 contains frequency fs/2 while bin 1 contains 
frequency fs/N .

2. Assuming that the phase angles lie in [0, 2pi], if the phase angle
of a frequency is 0, does that mean that the first positive peak 
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?) 
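A quick way to check both points is to feed FFTW a signal whose answer is
known. For a real input cos(2*pi*k*n/N), the r2c transform should put
essentially all the energy in bin k (i.e., frequency k*fs/N) with phase
approximately 0, consistent with a phase of 0 meaning the cosine peaks at
t = 0. A minimal sketch, assuming FFTW3 is installed (link with -lfftw3 -lm):

    #include <stdio.h>
    #include <math.h>
    #include <fftw3.h>

    int main(void)
    {
        const int N = 1024;          /* FFT length */
        const int k = 37;            /* test bin: frequency k*fs/N */
        double *in = fftw_malloc(sizeof(double) * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (N / 2 + 1));
        fftw_plan p = fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);

        for (int n = 0; n < N; n++)              /* cosine peaking at t = 0 */
            in[n] = cos(2.0 * M_PI * k * n / N);

        fftw_execute(p);

        /* Expect magnitude ~N/2 in bin k, ~0 elsewhere, and phase ~0. */
        for (int i = 0; i <= N / 2; i++) {
            double mag = hypot(out[i][0], out[i][1]);
            if (mag > 1e-6)
                printf("bin %d (freq i*fs/N): |X| = %.1f, phase = %.3f rad\n",
                       i, mag, atan2(out[i][1], out[i][0]));
        }

        fftw_destroy_plan(p);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }

Feeding in sin instead of cos shifts the reported phase at bin k to about
-pi/2, which is the usual convention FFTW and most FFT libraries follow.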



Thanks,
Pradeep.



"prad" <pradeep.fernando@gmail.com> writes:

> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.

If you required the entire dataset to get the low-frequency accuracy you
need, then you can't just filter/decimate a small subset and get that same
accuracy.
-- 
%  Randy Yates                  % "She's sweet on Wagner-I think she'd die for Beethoven.
%% Fuquay-Varina, NC            %  She love the way Puccini lays down a tune, and
%%% 919-577-9882                %  Verdi's always creepin' from her room."
%%%% <yates@ieee.org>           % "Rockaria", *A New World Record*, ELO
http://home.earthlink.net/~yatescr
On Thu, 07 Jun 2007 17:39:05 -0500, "prad"
<pradeep.fernando@gmail.com> wrote:

>Randy :
>
> Yes, I am wondering if I can extract 1M samples out of 10^13 data
>points without aliasing.

If you take 1M contiguous data points out of the set as a hopefully
representative sample (which could be good if you think/know that the
statistics are stationary or close to it), then they won't be any more
aliased than the entire set. If you want to just take 1M points by taking
every Nth sample, that's problematic unless you *know* that the signal is at
least N times oversampled. It does sound, however, like that's not the case.

Taking a chunk of 1M contiguous samples could tell you useful things about
the signal, especially if you can show that the metrics that you're looking
for are reasonably stationary/stable across the rest of the sample. Or you
could space a few of these 1M-sample sets across the bigger set to get an
idea of how stationary things are or not.
>I am not saying that I do not want to perform
>filtering. But I cannot afford to perform filtering on the entire dataset
>and I am wondering if I can perform the filtering on a much smaller subset
>of the data points. Something like 20*1M of the data points rather than the
>10^13 points.

As long as they're taken in a contiguous fashion it's going to have the same
effect as if you did the filtering on the entire set. There may be some edge
effects depending on how you go about this, but they can be made manageable.
>Glen :
> Actually the data samples are produced by a code. But the code can
>exactly produce the ith data sample that I need out of the entire 10^13
>data points. So storage is not an issue. My limitation is that the C code
>for the FFT takes the data input as an array of samples. And the largest
>array size is 2^32. Since the data is not periodic, I need equidistant
>samples picked out from the entire 10^13 data points.
Bleah. Sounds like you want every Nth sample. Not good.
> You can assume that
>the sampling rate for the entire data set is 2 GHz. Then the maximum
>frequency in the data is 1 GHz (assuming that LPF was already done).

No need to assume: if it's sampled at 2 GHz, the highest frequency in the
data is 1 GHz. Whether there was any aliasing in the sampling process may be
a question, but once it's sampled the highest supported frequency is 1 GHz.
> But
>now to get down to 1M samples from the total of 10^13 data points (i.e.
>decimate) I have to LPF again. But I cannot afford to LPF the entire data
>set and would like to apply the LPF to the 1M samples or maybe a constant
>number of neighboring datapoints of the selected 1M samples. Is this
>possible?

There are definitely ways to decimate sets without wasting operations. If
you want every Nth sample, filtered appropriately, there's no need to
compute anything related to the 1st through (N-1)th output samples in
between. This can cut computation down tremendously, but in your case you've
still got a whole lotta numbers to crunch.

Might you have access to any hardware accelerators? Like a board with an
FPGA on it connected to the computer? At a 400 MHz clock rate, which is
practical these days, you could run the entire 10^13-sample set through a
hardware filter in an FPGA in about seven hours (2.5e4 seconds), assuming
one clock per sample. Keeping the buffers fed and getting the data out in a
timely fashion might be a challenge, but this sort of thing might be worth
looking into.

Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.ericjacobsen.org
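A hedged sketch of that "compute only the outputs you keep" point, reusing
the hypothetical get_sample(i) generator from earlier in the thread; the
coefficient array h[] would have to be designed separately, with its cutoff
below the new, decimated Nyquist frequency, and none is given here.

    #include <stddef.h>
    #include <stdint.h>

    extern double get_sample(uint64_t i);  /* hypothetical point generator */

    /* Decimating FIR: produce nout outputs, one every 'dec' input
     * samples, evaluating the convolution only at those positions.
     * Cost is nout * ntaps sample fetches and multiplies, rather than
     * filtering all of the input and throwing most of it away. */
    void fir_decimate(const double *h, size_t ntaps,
                      uint64_t dec, double *y, size_t nout)
    {
        for (size_t k = 0; k < nout; k++) {
            /* newest input sample feeding output k; offset by ntaps so
             * indices never go negative */
            uint64_t pos = (uint64_t)k * dec + (uint64_t)ntaps;
            double acc = 0.0;
            for (size_t t = 0; t < ntaps; t++)
                acc += h[t] * get_sample(pos - (uint64_t)t);
            y[k] = acc;
        }
    }

For a 10^7:1 overall ratio a single anti-alias filter would still need a
very long h[], so in practice this would be done in several cascaded stages
of modest decimation (as glen suggested early in the thread), or in hardware
as described above.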