# Beginner's question on undersampling and aliasing

Started June 6, 2007
```>prad wrote:
>
>>   Are you implying that I first pick 1M samples, then LPF the samples
>> and use them? Will that work? Or are you saying that I have to LPF the
>> entire digitized data and then do decimation? The second method is not
>> feasible for me as the data set is too large and will require an
>> enormous amount of
>
>The exact answer depends much on the ratio of sampling rate to the
>frequency you are interested in.  Most likely it is possible to come
>up with a relatively fast filter to get what you want.
>
>Consider the moving average filter.  You will find that DSP people
>don't like it very much because its properties are not very good,
>but it is easy to describe and fast to implement.
>
>You might, for example, do a 1000 point moving average where the
>first output value is the mean of the first 1000 input samples,
>the second is the mean of 2 through 1001, etc.  This can be done
>very fast with a circular buffer of length 1000 to update the
>sum appropriately.  It might be that a moving average filter
>followed by a decimation by 10, (output only every 10th point)
>would be a good initial filter, and reduce your problem by
>a factor of ten.  Then another filter could finish off the job.
>
>I have not done the calculations to show that the above is
>actually a good way to proceed, but only offer it to show
>that there are fast ways to filter large amounts of data,
>and that in some cases they might be useful.  Also, that
>cascaded filters might be faster than doing the whole
>thing in one step.
>
>-- glen
>
>

Hi Glen,

One of my major problems is that the data sets I have are huge; in fact,
in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
data sets I have. So I'd like to know if there are ways of removing the
high frequency components using only the 1M samples.

Thanks,

```
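Glen's moving-average-plus-decimate suggestion can be sketched in a few lines. This is a hypothetical illustration in Python (not the thread's actual code, which is C); the window length and decimation factor are the ones he names, and a running sum stands in for his circular buffer:

```python
def moving_average_decimate(x, window=1000, decim=10):
    """1000-point moving average kept as a running sum, then decimation
    by 10: only every 10th averaged output is emitted."""
    if len(x) < window:
        return []
    s = sum(x[:window])              # sum over the initial window
    out = []
    i = 0                            # start index of the current window
    while i + window <= len(x):
        if i % decim == 0:           # keep only every decim-th output
            out.append(s / window)
        if i + window < len(x):
            s += x[i + window] - x[i]  # slide window: add new, drop old
        i += 1
    return out
```

Each output costs O(1) regardless of the window length, which is why the moving average is attractive here despite its mediocre frequency response.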
```prad wrote:

(snip)

>    One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

What is the sampling rate, and what frequency range do you need?

How many processors do you have, and how is the data stored?

As popular disk drives are about 5e11 bytes, and your data might be
in 16 bit samples, it would take 40 such drives to hold the data.
If those were on 40 machines, each could process its part, with
just a little extra work at the boundaries.

How long can you wait for results?

How much money do you have to buy more machines?

-- glen

```
```"prad" <pradeep.fernando@gmail.com> writes:
> [...]
> Hi Glen,
>
>    One of my major problems is that the data sets I have are huge, in fact
> in terms of memory complexity, they are O(n!). 10^13 is one of the smallest
> data sets I have. So I'd like to know if there are ways of removing the
> high frequency components using only the 1M samples.

If you're asking if you can extract 1M samples from 10^13 samples without
filtering and without aliasing, then the answer is NO.

--
%  Randy Yates                  % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC            %  you still wander the fields of your
%%% 919-577-9882                %  sorrow."
%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
```
```Randy :

Yes, I am wondering if I can extract 1M samples out of 10^13 data
points without aliasing. I am not saying that I do not want to perform
filtering. But I cannot afford to perform filtering on the entire dataset
and I am wondering if I can perform the filtering on a much smaller subset
of the data points. Something like 20*1M of the data points rather than the
10^13 points.

Glen :
Actually the data samples are produced by a code. But the code can
exactly produce the ith data sample that I need out of the entire 10^13
data points. So storage is not an issue. My limitation is that the C code
for the FFT takes the data input as an array of samples. And the largest
array size is 2^32. Since the data is not periodic, I need equidistant
samples picked out from the entire 10^13 data points. You can assume that
the sampling rate for the entire data set is 2 GHz. Then the maximum
frequency in the data is 1 GHz (assuming that LPF was already done). But
now to get down to 1M samples from the total of 10^13 data points (i.e.
decimate) I have to LPF again. But I cannot afford to LPF the entire data
set and would like to apply the LPF to the 1M samples or maybe a constant
number of neighboring datapoints of the selected 1M samples. Is this
possible?

Thanks,

>> [...]
>> Hi Glen,
>>
>>    One of my major problems is that the data sets I have are huge, in
>> fact in terms of memory complexity, they are O(n!). 10^13 is one of
>> the smallest data sets I have. So I'd like to know if there are ways
>> of removing the high frequency components using only the 1M samples.
>
>
>If you're asking if you can extract 1M samples from 10^13 samples
>without filtering and without aliasing, then the answer is NO.
>
>--
>%  Randy Yates                  % "Though you ride on the wheels of tomorrow,
>%% Fuquay-Varina, NC            %  you still wander the fields of your
>%%% 919-577-9882                %  sorrow."
>%%%% <yates@ieee.org>           % '21st Century Man', *Time*, ELO
>
```
```On Jun 7, 3:39 pm, "prad" <pradeep.ferna...@gmail.com> wrote:
> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
>      Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2Ghz. Then the maximum
> frequency in the data is 1Ghz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Not unless you know something about the entire data set,
such as that it is already bandlimited or filtered below some
reasonable limit.

If you only look at 1 M samples + 20 neighboring points
per sample, how do you know whether or not some of your
samples are sitting on top of narrow 41 point wide spikes
(alias noise compared to the 10^7 points of surrounding
data between samples) ?

You could try a randomized statistical sampling of
data points with varying sample spacings, and perhaps
fail to falsify that your data set has no energy in
selected frequency bands above some limit to some
degree of probability.  Would that do?

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M

```
```prad wrote:
> Hi
>
>   Are you implying that I first pick 1M samples, then LPF the samples and
> use them? Will that work? Or are you saying that I have to LPF the entire
> digitized data and then do decimation? The second method is not feasible
> for me as the data set is too large and will require an enormous amount of

You are playing with very big numbers, but I presume you don't need to
do one of these every day. Filtering can be part of the decimating
process. You need to calculate only those samples you will eventually
use. The more you decimate, the lower you will have to set the filter's
cutoff, and the more you will fill in the minima you're looking for.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
```
```prad wrote:
> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.
>
> Glen :
>      Actually the data samples are produced by a code. But the code can
> exactly produce the ith data sample that I need out of the entire 10^13
> data points. So storage is not an issue. My limitation is that the C code
> for the FFT takes the data input as an array of samples. And the largest
> array size is 2^32. Since the data is not periodic, I need equidistant
> samples picked out from the entire 10^13 data points. You can assume that
> the sampling rate for the entire data set is 2Ghz. Then the maximum
> frequency in the data is 1Ghz (assuming that LPF was already done). But
> now to get down to 1M samples from the total of 10^13 data points (i.e.
> decimate) I have to LPF again. But I cannot afford to LPF the entire data
> set and would like to apply the LPF to the 1M samples or maybe a constant
> number of neighboring datapoints of the selected 1M samples. Is this
> possible?

Why not change the producing code to generate a decimated set in the
first place?

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
```
```Hi all,

Jerry:
I was just thinking about something like that. I was thinking that I might
be able to modify my code to produce the sum of a subset of data points. Yes,
if I could produce the decimated data points directly then I guess my
problem is solved.

Ron:
Randomized Statistical Sampling is another good idea. Initially I was
thinking along another line involving random sampling. I was thinking
about producing random data samples and then performing FFT on these
samples. In fact, I did it. But since I am not that familiar with DSP and
FFT, I could not really figure out how to interpret the FFT results. In
fact, most of the links I found on FFT with non-uniformly spaced samples
were interpolating to find the equally spaced samples and then performing
FFT. Is this the standard technique for FFT with non-uniformly spaced
samples? Thanks Ron for this new idea. I will investigate it further.

BTW, some other basic questions regarding interpretation of the N-pt FFT
results :

1. The frequency bin "i" corresponds to frequency fi = (i*fs/N),
right? So bin N/2 contains frequency fs/2 while bin 1 contains
frequency fs/N .

2. Assuming that the phase angles lie in [0, 2pi], if the phase angle
of a frequency is 0, does that mean that the first positive peak
of that frequency is at t=0?
(Is this the case for FFTW's FFT results?)

Thanks,

>> Randy :
>>
>>       Yes, I am wondering if I can extract 1M samples out of 10^13
>> data points without aliasing. I am not saying that I do not want to
>> perform filtering. But I cannot afford to perform filtering on the
>> entire dataset and I am wondering if I can perform the filtering on a
>> much smaller subset of the data points. Something like 20*1M of the
>> data points rather than the 10^13 points.
>>
>> Glen :
>>      Actually the data samples are produced by a code. But the code
>> can exactly produce the ith data sample that I need out of the entire
>> 10^13 data points. So storage is not an issue. My limitation is that
>> the C code for the FFT takes the data input as an array of samples.
>> And the largest array size is 2^32. Since the data is not periodic, I
>> need equidistant samples picked out from the entire 10^13 data points.
>> You can assume that the sampling rate for the entire data set is 2GHz.
>> Then the maximum frequency in the data is 1GHz (assuming that LPF was
>> already done). But now to get down to 1M samples from the total of
>> 10^13 data points (i.e. decimate) I have to LPF again. But I cannot
>> afford to LPF the entire data set and would like to apply the LPF to
>> the 1M samples or maybe a constant number of neighboring datapoints of
>> the selected 1M samples. Is this possible?
>
>Why not change the producing code to generate a decimated set in the
>first place?
>
>Jerry
>--
>Engineering is the art of making what you want from things you can get.
>¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
>
```
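On the two FFT questions in the post above: yes, bin i of an N-point FFT corresponds to fi = i*fs/N, and a zero phase angle means that component is cosine-aligned, i.e. it peaks at the first sample (t=0). FFTW's forward transform uses the standard exp(-2*pi*i*k*n/N) convention, so this holds there too (though library phase functions typically return angles in (-pi, pi], not [0, 2pi)). A small Python check, using a direct DFT and a hypothetical N, fs, and bin index:

```python
import cmath
import math

def dft(x):
    """Direct DFT with the same sign convention as FFTW's forward
    transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

N, fs, k0 = 64, 2.0e9, 5                  # fs = 2 GHz as assumed in the thread
x = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]  # peak at n = 0
X = dft(x)
f_k0 = k0 * fs / N                        # bin k0 maps to k0*fs/N Hz
```

For this input, bin k0 carries magnitude N/2 and phase 0, confirming both the bin-to-frequency mapping and the cosine-at-t=0 reading of zero phase.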
```"prad" <pradeep.fernando@gmail.com> writes:

> Randy :
>
>       Yes, I am wondering if I can extract 1M samples out of 10^13 data
> points without aliasing. I am not saying that I do not want to perform
> filtering. But I cannot afford to perform filtering on the entire dataset
> and I am wondering if I can perform the filtering on a much smaller subset
> of the data points. Something like 20*1M of the data points rather than the
> 10^13 points.

If you required the entire dataset to get the low-frequency accuracy
you need, then you can't just filter/decimate a small subset and get
that same accuracy.
--
%  Randy Yates                  % "She's sweet on Wagner-I think she'd die for Beethoven.
%% Fuquay-Varina, NC            %  She love the way Puccini lays down a tune, and
%%% 919-577-9882                %  Verdi's always creepin' from her room."
%%%% <yates@ieee.org>           % "Rockaria", *A New World Record*, ELO
```
```On Thu, 07 Jun 2007 17:39:05 -0500, "prad" wrote:

>Randy :
>
>      Yes, I am wondering if I can extract 1M samples out of 10^13 data
>points without aliasing.

If you take 1M contiguous data points out of the set as a hopefully
representative sample (which could be good if you think/know that the
statistics are stationary or close to it), then they won't be any more
aliased than the entire set.

If you want to just take 1M points by taking every Nth sample, that's
problematic unless you *know* that the signal is at least N times
oversampled.   It does sound, however, like that's not the case.

Taking a chunk of 1M contiguous samples could tell you useful things
about the signal, especially if you can show that the metrics that
you're looking for are reasonably stationary/stable across the rest of
the sample.   Or you could space a few of these 1M-sample sets across
the bigger set to get an idea of how stationary things are or not.

>I am not saying that I do not want to perform
>filtering. But I cannot afford to perform filtering on the entire dataset
>and I am wondering if I can perform the filtering on a much smaller subset
>of the data points. Something like 20*1M of the data points rather than the
>10^13 points.

As long as they're taken in a contiguous fashion it's going to have
the same effect as if you did the filtering on the entire set.

>Glen :
>     Actually the data samples are produced by a code. But the code can
>exactly produce the ith data sample that I need out of the entire 10^13
>data points. So storage is not an issue. My limitation is that the C code
>for the FFT takes the data input as an array of samples. And the largest
>array size is 2^32. Since the data is not periodic, I need equidistant
>samples picked out from the entire 10^13 data points.

Bleah. Sounds like you want every Nth sample.   Not good.

> You can assume that
>the sampling rate for the entire data set is 2Ghz. Then the maximum
>frequency in the data is 1Ghz (assuming that LPF was already done).

No need to assume: if it's sampled at 2 GHz, the highest frequency in
the data is 1 GHz.   Whether there was any aliasing in the sampling
process may be a question, but once it's sampled the highest supported
frequency is 1 GHz.

> But
>now to get down to 1M samples from the total of 10^13 data points (i.e.
>decimate) I have to LPF again. But I cannot afford to LPF the entire data
>set and would like to apply the LPF to the 1M samples or maybe a constant
>number of neighboring datapoints of the selected 1M samples. Is this
>possible?

There are definitely ways to decimate sets without wasting operations.
If you want every Nth sample, filtered appropriately, there's no need
to compute anything related to the 1st through (N-1)th output samples in
between.   This can cut computation down tremendously, but in your
case you've still got a whole lotta numbers to crunch.
```
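This "compute only the outputs you keep" point is the crux of the thread: an FIR decimator needs only len(taps) input samples per retained output, so with a generator that can produce the ith sample on demand (as the poster describes), the full 10^13-point set never has to be filtered end to end. A hypothetical Python sketch (the function name is mine, and the taps in the usage example are a crude placeholder, not a properly designed low-pass):

```python
def fir_decimate(gen, taps, M, n_out):
    """Apply an FIR filter and keep every M-th output, computing ONLY
    those outputs.  `gen(i)` returns the i-th input sample; `taps` are
    the FIR coefficients.  Inputs before index 0 are treated as zero."""
    L = len(taps)
    out = []
    for m in range(n_out):
        pos = m * M                  # input index aligned with this output
        acc = 0.0
        for j in range(L):
            i = pos - j
            if i >= 0:               # zero-pad before the start of the data
                acc += taps[j] * gen(i)
        out.append(acc)
    return out
```

Per retained output this touches only len(taps) input samples, so the total work scales with the 1M outputs (times the filter length), not with the 10^13 inputs.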