Reply by Ron N. June 11, 2007
On Jun 10, 1:38 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:
> So, I might take 1024 or 4096 or .... you choose the number ... and compute
> an FFT on just those contiguous samples. You might do this for various such
> epochs along the total sequence. While the resolution will be limited, the
> entire frequency range will be covered each time.
An FFT of 1024 contiguous samples won't tell you much about the
spectral content of frequencies on the rough order of one-hundredth to
one-billionth of the FFT's first bin, which seems to be where the OP is
looking (the first 10^6 DFT bins out of circa 10^13 possible?).

It still seems to me that the way to look at such huge potential data
sets (the weight of every rodent in North America by longitude, or some
such) is to start with some statistical sampling. What I'm wondering is
if there is a name for the procedure of taking a bunch of randomly
spaced samples and doing a regression fit of those samples against a
set of orthogonal sinusoidal basis vectors.
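
In code terms, the kind of thing I mean (a minimal numpy sketch; the
tone frequencies, candidate grid, and sample times are all made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical record: two tones observed at randomly spaced times.
    t = np.sort(rng.uniform(0.0, 100.0, size=500))
    x = np.sin(2*np.pi*0.13*t) + 0.5*np.sin(2*np.pi*0.02*t)

    # Regression-fit each candidate frequency: least squares against a
    # cosine/sine pair evaluated at the irregular sample times.
    freqs = np.linspace(0.005, 0.25, 50)
    amp = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        A = np.column_stack([np.cos(2*np.pi*f*t), np.sin(2*np.pi*f*t)])
        c, *_ = np.linalg.lstsq(A, x, rcond=None)
        amp[i] = np.hypot(c[0], c[1])

    # The fitted amplitude peaks near the true tones, 0.02 and 0.13.
    print(freqs[amp.argmax()])

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M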
Reply by Fred Marshall June 11, 2007
"John E. Hadstate" <jh113355@hotmail.com> wrote in message 
news:DL8bi.26179$dy1.22507@bigfe9...
> > "Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in message > news:aqCdnboPO6bS_PHbnZ2dnUVZ_q2pnZ2d@centurytel.net... > >> >> So, I might take 1024 or 4096 or .... you choose the number ... and >> compute an FFT on just those contiguous samples. You might do this for >> various such epochs along the total sequence. While the resolution will >> be limited, the entire frequency range will be covered each time. >> If the results are quite different then you know that the spectral >> character of the samples is varying from segment to segment. >> If the results are rather similar then the opposite. > > Consider what you would see if you computed a short DFT on a > low-baud-rate, highly-oversampled FSK signal. If the DFT is short enough > relative to the baud rate, you will see the Mark and Space frequencies in > separate DFT windows. Would you then conclude that the spectral character > of the signal is varying or would you conclude that the varying spectrum > characterizes the signal?
John,

Yes, of course I would. :-)

Fred
Reply by John E. Hadstate June 11, 2007
"Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in 
message 
news:aqCdnboPO6bS_PHbnZ2dnUVZ_q2pnZ2d@centurytel.net...

> So, I might take 1024 or 4096 or .... you choose the number ... and
> compute an FFT on just those contiguous samples. You might do this for
> various such epochs along the total sequence. While the resolution will
> be limited, the entire frequency range will be covered each time.
> If the results are quite different then you know that the spectral
> character of the samples is varying from segment to segment.
> If the results are rather similar then the opposite.
Consider what you would see if you computed a short DFT on a low-baud-rate, highly-oversampled FSK signal. If the DFT is short enough relative to the baud rate, you will see the Mark and Space frequencies in separate DFT windows. Would you then conclude that the spectral character of the signal is varying or would you conclude that the varying spectrum characterizes the signal?
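
For instance (a minimal numpy sketch; the sample rate, baud rate, and
Mark/Space tones are made up), short DFTs landing inside individual
symbols each report a single tone, even though the signal as a whole
occupies both:

    import numpy as np

    fs = 100_000                        # sample rate, Hz (made up)
    baud = 50                           # 2000 samples per symbol
    f_mark, f_space = 1200.0, 2200.0    # FSK tones (made up)
    bits = np.array([1, 0, 1, 1, 0, 1])

    # Highly oversampled, phase-continuous FSK: one tone per symbol.
    sps = fs // baud
    inst_f = np.where(np.repeat(bits, sps) == 1, f_mark, f_space)
    x = np.sin(2*np.pi*np.cumsum(inst_f)/fs)

    # A 256-point DFT sits well inside one 2000-sample symbol, so
    # each window sees only that symbol's tone.
    for start in (0, sps, 2*sps):
        seg = x[start:start+256] * np.hanning(256)
        peak_bin = np.argmax(np.abs(np.fft.rfft(seg)))
        print(f"window at sample {start}: peak near {peak_bin*fs/256:.0f} Hz")
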
> You should be asking yourself this question:
> Even though I have a huge number of samples, what is the frequency
> resolution that I require? The frequency resolution is the reciprocal
> of the temporal epoch that you choose to analyze.
A huge number of samples might be the result of oversampling. It might also be the result of long-term observation of a phenomenon that is undersampled. It sounds like the OP is not sure which case applies to his data.
Reply by prad June 10, 2007
Fred and Ron,

      Thanks for your input. I'll try your suggestions. 


Prad.




>prad,
>
>If someone suggested it, then I've missed it.... here's what I would do:
>
>Because you have so many data points the frequency resolution could be quite
>a bit better than you need. The fewer contiguous points you use, the
>coarser the frequency resolution.
>
>So, I might take 1024 or 4096 or .... you choose the number ... and compute
>an FFT on just those contiguous samples. You might do this for various such
>epochs along the total sequence. While the resolution will be limited, the
>entire frequency range will be covered each time.
>If the results are quite different then you know that the spectral character
>of the samples is varying from segment to segment.
>If the results are rather similar then the opposite.
>
>Also, you'll be able to see the actual important bandwidth of the
>information - so you might be able to decide that some decimation is OK to
>do without aliasing.
>
>You should be asking yourself this question:
>Even though I have a huge number of samples, what is the frequency
>resolution that I require? The frequency resolution is the reciprocal of
>the temporal epoch that you choose to analyze.
>
>Example:
>
>If you have 1 second worth of samples and the sample rate is 1MHz, then you
>have 10^6 samples. If you FFT the whole sequence, you will have 1Hz
>resolution over a range 0.5MHz (fs/2). Maybe 1Hz resolution is overkill for
>your application.
>
>So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so
>forth.
>
>Pick the temporal length that gives suitable resolution.
>
>I hope this helps.
>
>Fred
Reply by Fred Marshall June 10, 2007
prad,

If someone suggested it, then I've missed it.... here's what I would do:

Because you have so many data points the frequency resolution could be quite 
a bit better than you need.  The fewer contiguous points you use, the 
coarser the frequency resolution.

So, I might take 1024 or 4096 or .... you choose the number ... and compute 
an FFT on just those contiguous samples.  You might do this for various such 
epochs along the total sequence.  While the resolution will be limited, the 
entire frequency range will be covered each time.
If the results are quite different then you know that the spectral character 
of the samples is varying from segment to segment.
If the results are rather similar then the opposite.
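
In rough code, that might look like this (a numpy sketch; the file
name, dtype, sample rate, and number of epochs are placeholders):

    import numpy as np

    N = 4096                  # segment length; resolution is fs/N
    fs = 1.0e6                # placeholder sample rate, Hz
    data = np.memmap("huge_record.dat", dtype=np.float32, mode="r")

    # A handful of epochs spread across the whole record.
    starts = np.linspace(0, len(data) - N, 8, dtype=np.int64)

    # Windowed FFT magnitude for each epoch.
    spectra = np.array([np.abs(np.fft.rfft(np.asarray(data[s:s+N])
                        * np.hanning(N))) for s in starts])

    # Large segment-to-segment deviation suggests the spectral
    # character is varying; small deviation suggests it is not.
    ref = spectra.mean(axis=0)
    for s, spec in zip(starts, spectra):
        print(s, np.linalg.norm(spec - ref) / np.linalg.norm(ref))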

Also, you'll be able to see the actual important bandwidth of the 
information - so you might be able to decide that some decimation is OK to 
do without aliasing.

You should be asking yourself this question:
Even though I have a huge number of samples, what is the frequency 
resolution that I require?  The frequency resolution is the reciprocal of 
the temporal epoch that you choose to analyze.

Example:

If you have 1 second worth of samples and the sample rate is 1MHz, then you 
have 10^6 samples.  If you FFT the whole sequence, you will have 1Hz 
resolution over a range 0.5MHz (fs/2).  Maybe 1Hz resolution is overkill for 
your application.

So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so 
forth.

Pick the temporal length that gives suitable resolution.
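
In code, the same bookkeeping (using the numbers from the example
above):

    fs = 1.0e6         # sample rate, Hz
    df = 10.0          # desired frequency resolution, Hz
    T = 1.0 / df       # epoch to analyze: 0.1 s
    N = int(fs * T)    # samples per FFT: 100,000
    print(T, N)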

I hope this helps.

Fred 


Reply by Ron N. June 9, 2007
On Jun 7, 7:11 pm, "prad" <pradeep.ferna...@gmail.com> wrote:
> Ron:
> Randomized Statistical Sampling is another good idea. Initially I was
> thinking along another line involving random sampling. I was thinking
> about producing random data samples and then performing FFT on these
> samples. In fact, I did it. But since I am not that familiar with DSP and
> FFT, could not really figure out how to interpret the FFT results. In
> fact, most of the links I found on FFT with non-uniformly spaced samples
> were interpolating to find the equally spaced samples and then performing
> FFT. Is this the standard technique for FFT with non-uniformly spaced
> samples? Thanks Ron for this new idea. I will investigate it further.
Actually, this might be a place where trying to use a randomly sampled
low pass filter might be better than nothing. Essentially, create your
low pass filter waveform (say a windowed sinc of some period and
width), and then use that filter waveform in a weighted random number
generator. Use those weighted random numbers to select sub-samples
centered around the neighborhood of a sample point of interest. If,
after a sufficient number of sub-samples, the mean and variance seem
to be converging, then the mean might approximate the value of a
decimated sample of the bandlimited signal, perhaps within some
statistical confidence interval.

Does this type of procedure have a name?
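
Concretely, something like this is what I have in mind (a numpy
sketch; the decimation factor, filter width, and test signal are all
made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Windowed-sinc low pass sized for decimation by M.
    M, half = 1000, 4000
    n = np.arange(-half, half + 1)
    h = np.sinc(n / M) * np.hamming(len(n)) / M

    # Draw filter taps with probability proportional to |h|; the
    # estimate then only needs the sign of each drawn tap.
    p = np.abs(h) / np.abs(h).sum()

    def filtered_sample(x, center, draws=2000):
        """Monte Carlo estimate of (x * h)[center] from sub-samples."""
        idx = rng.choice(len(h), size=draws, p=p)
        vals = x[center + n[idx]] * np.sign(h[idx]) * np.abs(h).sum()
        return vals.mean(), vals.std(ddof=1) / np.sqrt(draws)

    x = np.sin(2*np.pi*1e-4*np.arange(100_000))   # slow made-up tone
    est, stderr = filtered_sample(x, 50_000)
    print(est, "+/-", stderr, "exact:", np.dot(x[50_000 + n], h))

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M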
Reply by Vladimir Vassilevsky June 9, 2007
Now I am sure that what you are doing is sheer nonsense. Besides the
dumb brute forcing, the obvious sign is the cluelessness; the other
obvious sign is the secrecy. The sooner you dismiss your priceless
ideas, the better.

When people do serious research, they don't ask stupid questions in
the newsgroups. Instead, they learn the subject themselves and/or
seek professional assistance.

VLV


prad wrote:
> I am sorry that you think that. This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do. Thanks to all those who gave useful
> information and helped me.
>
> VLV: Please think before you send a comment like that.
>
> Pradeep.
>>Don't you know? This is homework. A stupident is generating a huge data
>>by software and then trying to make use of that data. Numerical nonsense
>>instead of thinking of the better approaches, as usual.
>>
>>VLV
Reply by glen herrmannsfeldt June 9, 2007
Rune Allnor wrote:

(snip)

> Why don't you get into details about how these data > come about and what you try to do with them?
He seems to want to keep the data source proprietary, in which case I
believe our answers also should be. Now, if he wants to pay for
answers, that is different.

-- glen
Reply by glen herrmannsfeldt June 9, 2007
jim wrote:
(snip)

> What good is a 1000 point moving average filter going to be? If he is
> downsampling from 10^13 samples to 10^6 samples to prevent aliasing he's
> going to need a low pass filter that has millions and millions of
> coefficients.
Maybe it should be more than 1000, maybe less. I haven't seen any numbers related to the frequency range of interest.
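
For scale, the moving average itself is cheap to run even on a huge
record (a numpy sketch; the signal is made up); jim's objection is to
its boxcar frequency response at such an extreme decimation ratio, not
to its cost:

    import numpy as np

    def boxcar_decimate(x, M=1000):
        # Running mean via cumulative sum: O(1) work per input sample.
        csum = np.cumsum(np.concatenate(([0.0], x)))
        avg = (csum[M:] - csum[:-M]) / M
        return avg[::M]            # keep every Mth averaged value

    x = np.sin(2*np.pi*1e-5*np.arange(1_000_000))  # slow made-up tone
    y = boxcar_decimate(x)
    print(len(x), "->", len(y))
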
> It's hard to imagine why, if the data contains only important info at
> such a low frequency, it is being sampled at such a high sample rate in
> the first place.
It seems that it is generated, and not from a natural source. It might
be the output of a linear congruential or linear feedback shift
register, for example. It might be a 43-bit LFSR, and one-bit samples.
Note that the number of bits per sample still hasn't been mentioned.

I believe with either linear congruential or LFSR you can rewrite it
to generate every Nth point of the original series.
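
For the linear congruential case the rewrite is mechanical: N steps of
x -> (a*x + c) mod m compose into a single affine step. A sketch (the
constants are common 32-bit ones, chosen only for illustration):

    # N steps of an LCG compose into a single affine map, so every
    # Nth output can be generated directly.
    def lcg_skip(a, c, m, N):
        aN, cN = 1, 0                 # identity map
        while N:                      # square-and-multiply on the map
            if N & 1:
                aN, cN = (aN * a) % m, (cN * a + c) % m
            a, c = (a * a) % m, (c * a + c) % m
            N >>= 1
        return aN, cN

    a, c, m = 1664525, 1013904223, 2**32   # common 32-bit constants
    aN, cN = lcg_skip(a, c, m, 1000)       # map for every 1000th output

    x = 12345                              # arbitrary seed
    for _ in range(5):
        x = (aN * x + cN) % m              # jumps 1000 steps at a time
        print(x)

-- glen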
Reply by Rune Allnor June 9, 2007
On 9 Jun, 02:39, "prad" <pradeep.ferna...@gmail.com> wrote:
> I am sorry that you think that.
Don't top post when commenting on previous posts.
> This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do.
Those who have followed my posts here for a couple of years would know
that I sometimes make a point of saying that I am an engineer, not a
researcher, despite my misfortune of having obtained a PhD degree. The
difference is that the engineer knows what he is doing whereas the
researcher [*] does not.

Your project is flawed. There is no need whatsoever to come up with
those sorts of data in any but the biggest survey projects. There is
the vanishing (though still non-zero) chance that you really are onto
something, but even so, your timing is wrong. These days, the sheer
logistics of what you try to do is out of reach for anyone but the
largest computer centers. That will change with time.

When I started playing with computers 20 years ago, it was specialist's
work to handle more than 64 kilobytes of data at any one time. When I
first played with certain sonar data processing ideas some 10 years
ago, anything beyond 1 MByte was, for practical purposes, outside my
reach. These days my computer's RAM is the limitation (it has only
512 MBytes) and I plan my programs for use with, say, 10 GByte of data
once I can get my hands on that sort of computer. It will be affordable
by the time I finish my program.

So times change, and maybe you or I will look up this thread in five or
ten years' time and smile at the old days when a mere 20 TB of data
represented an insurmountable obstacle. However, at the time of
writing, June 2007, 20 TB of data *is* an insurmountable obstacle. If
you end up with the need to process that sort of data, there is such a
huge discrepancy between what you want and what is within your
abilities to do that you might as well do something else in the mean
time, waiting for the required technology to become available to you.

If you do this as part of your employment, circulate your resume.
Whoever assigned you to this task has no idea what he or she is doing.
> VLV: Please think before you send a comment like that.
Vladimir is, as is his habit, perfectly on the mark.

Rune

[*] In Norwegian, "researcher" is translated to "forsker", "one who
does research", whereas "scientist" is translated to "vitenskapsmann",
which means "one who knows the sciences." The difference might be
subtle, but anyone can embark on some sort of research, while insight
into the sciences requires more, usually obtained through decades of
dedicated studies. Needless to say, the world is full of researchers,
with hardly one scientist alive, world wide.