Reply by Ron N. June 11, 2007
On Jun 10, 1:38 pm, "Fred Marshall" <fmarshallx@remove_the_x.acm.org>
wrote:
> So, I might take 1024 or 4096 or .... you choose the number ... and compute
> an FFT on just those contiguous samples. You might do this for various such
> epochs along the total sequence. While the resolution will be limited, the
> entire frequency range will be covered each time.
An FFT of 1024 contiguous samples won't tell you much about the
spectral content of frequencies on the rough order of one-hundredth to
one-billionth of the FFT's first bin, which seems to be where the OP is
looking (the first 10^6 DFT bins out of circa 10^13 possible?).

It still seems to me that the way to look at such huge potential data
sets (the weight of every rodent in North America by longitude, or some
such) is to start with some statistical sampling. What I'm wondering is
if there is a name for the procedure of taking a bunch of randomly
spaced samples and doing a regression fit of those samples against a
set of orthogonal sinusoidal basis vectors.
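
In code terms, the kind of thing I mean (a minimal numpy sketch; the
tone frequencies, candidate grid, and sample times are all made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical record: two tones observed at randomly spaced times.
    t = np.sort(rng.uniform(0.0, 100.0, size=500))
    x = np.sin(2*np.pi*0.13*t) + 0.5*np.sin(2*np.pi*0.02*t)

    # Regression-fit each candidate frequency: least squares against a
    # cosine/sine pair evaluated at the irregular sample times.
    freqs = np.linspace(0.005, 0.25, 50)
    amp = np.empty_like(freqs)
    for i, f in enumerate(freqs):
        A = np.column_stack([np.cos(2*np.pi*f*t), np.sin(2*np.pi*f*t)])
        c, *_ = np.linalg.lstsq(A, x, rcond=None)
        amp[i] = np.hypot(c[0], c[1])

    # The fitted amplitude peaks near the true tones, 0.02 and 0.13.
    print(freqs[amp.argmax()])

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M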
Reply by Fred Marshall June 11, 2007
"John E. Hadstate" <jh113355@hotmail.com> wrote in message 
news:DL8bi.26179$dy1.22507@bigfe9...
> > "Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in message > news:aqCdnboPO6bS_PHbnZ2dnUVZ_q2pnZ2d@centurytel.net... > >> >> So, I might take 1024 or 4096 or .... you choose the number ... and >> compute an FFT on just those contiguous samples. You might do this for >> various such epochs along the total sequence. While the resolution will >> be limited, the entire frequency range will be covered each time. >> If the results are quite different then you know that the spectral >> character of the samples is varying from segment to segment. >> If the results are rather similar then the opposite. > > Consider what you would see if you computed a short DFT on a > low-baud-rate, highly-oversampled FSK signal. If the DFT is short enough > relative to the baud rate, you will see the Mark and Space frequencies in > separate DFT windows. Would you then conclude that the spectral character > of the signal is varying or would you conclude that the varying spectrum > characterizes the signal?
John,

Yes, of course I would. :-)

Fred
Reply by John E. Hadstate June 11, 2007
"Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in 
message 
news:aqCdnboPO6bS_PHbnZ2dnUVZ_q2pnZ2d@centurytel.net...

> So, I might take 1024 or 4096 or .... you choose the number ... and
> compute an FFT on just those contiguous samples. You might do this for
> various such epochs along the total sequence. While the resolution will
> be limited, the entire frequency range will be covered each time.
> If the results are quite different then you know that the spectral
> character of the samples is varying from segment to segment.
> If the results are rather similar then the opposite.
Consider what you would see if you computed a short DFT on a low-baud-rate, highly-oversampled FSK signal. If the DFT is short enough relative to the baud rate, you will see the Mark and Space frequencies in separate DFT windows. Would you then conclude that the spectral character of the signal is varying or would you conclude that the varying spectrum characterizes the signal?
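
For instance (a minimal numpy sketch; the sample rate, baud rate, and
Mark/Space tones are made up), short DFTs landing inside individual
symbols each report a single tone, even though the signal as a whole
occupies both:

    import numpy as np

    fs = 100_000                        # sample rate, Hz (made up)
    baud = 50                           # 2000 samples per symbol
    f_mark, f_space = 1200.0, 2200.0    # FSK tones (made up)
    bits = np.array([1, 0, 1, 1, 0, 1])

    # Highly oversampled, phase-continuous FSK: one tone per symbol.
    sps = fs // baud
    inst_f = np.where(np.repeat(bits, sps) == 1, f_mark, f_space)
    x = np.sin(2*np.pi*np.cumsum(inst_f)/fs)

    # A 256-point DFT sits well inside one 2000-sample symbol, so
    # each window sees only that symbol's tone.
    for start in (0, sps, 2*sps):
        seg = x[start:start+256] * np.hanning(256)
        peak_bin = np.argmax(np.abs(np.fft.rfft(seg)))
        print(f"window at sample {start}: peak near {peak_bin*fs/256:.0f} Hz")
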
> You should be asking yourself this question:
> Even though I have a huge number of samples, what is the frequency
> resolution that I require? The frequency resolution is the reciprocal
> of the temporal epoch that you choose to analyze.
A huge number of samples might be the result of oversampling. It might also be the result of long-term observation of a phenomenon that is undersampled. It sounds like the OP is not sure which case applies to his data.
Reply by prad June 10, 2007
Fred and Ron,

      Thanks for your input. I'll try your suggestions. 


Prad.




>prad,
>
>If someone suggested it, then I've missed it.... here's what I would do:
>
>Because you have so many data points the frequency resolution could be quite
>a bit better than you need. The fewer contiguous points you use, the
>coarser the frequency resolution.
>
>So, I might take 1024 or 4096 or .... you choose the number ... and compute
>an FFT on just those contiguous samples. You might do this for various such
>epochs along the total sequence. While the resolution will be limited, the
>entire frequency range will be covered each time.
>If the results are quite different then you know that the spectral character
>of the samples is varying from segment to segment.
>If the results are rather similar then the opposite.
>
>Also, you'll be able to see the actual important bandwidth of the
>information - so you might be able to decide that some decimation is OK to
>do without aliasing.
>
>You should be asking yourself this question:
>Even though I have a huge number of samples, what is the frequency
>resolution that I require? The frequency resolution is the reciprocal of
>the temporal epoch that you choose to analyze.
>
>Example:
>
>If you have 1 second worth of samples and the sample rate is 1MHz, then you
>have 10^6 samples. If you FFT the whole sequence, you will have 1Hz
>resolution over a range 0.5MHz (fs/2). Maybe 1Hz resolution is overkill for
>your application.
>
>So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so
>forth.
>
>Pick the temporal length that gives suitable resolution.
>
>I hope this helps.
>
>Fred
Reply by Fred Marshall June 10, 2007
prad,

If someone suggested it, then I've missed it.... here's what I would do:

Because you have so many data points the frequency resolution could be quite 
a bit better than you need.  The fewer contiguous points you use, the 
coarser the frequency resolution.

So, I might take 1024 or 4096 or .... you choose the number ... and compute 
an FFT on just those contiguous samples.  You might do this for various such 
epochs along the total sequence.  While the resolution will be limited, the 
entire frequency range will be covered each time.
If the results are quite different then you know that the spectral character 
of the samples is varying from segment to segment.
If the results are rather similar then the opposite.
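
In rough code, that might look like this (a numpy sketch; the file
name, dtype, sample rate, and number of epochs are placeholders):

    import numpy as np

    N = 4096                  # segment length; resolution is fs/N
    fs = 1.0e6                # placeholder sample rate, Hz
    data = np.memmap("huge_record.dat", dtype=np.float32, mode="r")

    # A handful of epochs spread across the whole record.
    starts = np.linspace(0, len(data) - N, 8, dtype=np.int64)

    # Windowed FFT magnitude for each epoch.
    spectra = np.array([np.abs(np.fft.rfft(np.asarray(data[s:s+N])
                        * np.hanning(N))) for s in starts])

    # Large segment-to-segment deviation suggests the spectral
    # character is varying; small deviation suggests it is not.
    ref = spectra.mean(axis=0)
    for s, spec in zip(starts, spectra):
        print(s, np.linalg.norm(spec - ref) / np.linalg.norm(ref))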

Also, you'll be able to see the actual important bandwidth of the 
information - so you might be able to decide that some decimation is OK to 
do without aliasing.

You should be asking yourself this question:
Even though I have a huge number of samples, what is the frequency 
resolution that I require?  The frequency resolution is the reciprocal of 
the temporal epoch that you choose to analyze.

Example:

If you have 1 second worth of samples and the sample rate is 1MHz, then you 
have 10^6 samples.  If you FFT the whole sequence, you will have 1Hz 
resolution over a range 0.5MHz (fs/2).  Maybe 1Hz resolution is overkill for 
your application.

So, 0.1secs of data would be 100,000 samples with 10Hz resolution... and so 
forth.

Pick the temporal length that gives suitable resolution.
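
In code, the same bookkeeping (using the numbers from the example
above):

    fs = 1.0e6         # sample rate, Hz
    df = 10.0          # desired frequency resolution, Hz
    T = 1.0 / df       # epoch to analyze: 0.1 s
    N = int(fs * T)    # samples per FFT: 100,000
    print(T, N)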

I hope this helps.

Fred 


Reply by Ron N. June 9, 2007
On Jun 7, 7:11 pm, "prad" <pradeep.ferna...@gmail.com> wrote:
> Ron:
> Randomized Statistical Sampling is another good idea. Initially I was
> thinking along another line involving random sampling. I was thinking
> about producing random data samples and then performing FFT on these
> samples. In fact, I did it. But since I am not that familiar with DSP and
> FFT, could not really figure out how to interpret the FFT results. In
> fact, most of the links I found on FFT with non-uniformly spaced samples
> were interpolating to find the equally spaced samples and then performing
> FFT. Is this the standard technique for FFT with non-uniformly spaced
> samples? Thanks Ron for this new idea. I will investigate it further.
Actually, this might be a place where trying to use a randomly sampled
low pass filter might be better than nothing. Essentially, create your
low pass filter waveform (say a windowed sinc of some period and
width), and then use that filter waveform in a weighted random number
generator. Use those weighted random numbers to select sub-samples
centered around the neighborhood of a sample point of interest. If,
after a sufficient number of sub-samples, the mean and variance seem
to be converging, then the mean might approximate the value of a
decimated sample of the bandlimited signal, perhaps within some
statistical confidence interval.

Does this type of procedure have a name?
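
Concretely, something like this is what I have in mind (a numpy
sketch; the decimation factor, filter width, and test signal are all
made up):

    import numpy as np

    rng = np.random.default_rng(0)

    # Windowed-sinc low pass sized for decimation by M.
    M, half = 1000, 4000
    n = np.arange(-half, half + 1)
    h = np.sinc(n / M) * np.hamming(len(n)) / M

    # Draw filter taps with probability proportional to |h|; the
    # estimate then only needs the sign of each drawn tap.
    p = np.abs(h) / np.abs(h).sum()

    def filtered_sample(x, center, draws=2000):
        """Monte Carlo estimate of (x * h)[center] from sub-samples."""
        idx = rng.choice(len(h), size=draws, p=p)
        vals = x[center + n[idx]] * np.sign(h[idx]) * np.abs(h).sum()
        return vals.mean(), vals.std(ddof=1) / np.sqrt(draws)

    x = np.sin(2*np.pi*1e-4*np.arange(100_000))   # slow made-up tone
    est, stderr = filtered_sample(x, 50_000)
    print(est, "+/-", stderr, "exact:", np.dot(x[50_000 + n], h))

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M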
Reply by Vladimir Vassilevsky June 9, 2007
Now I am sure that what you are doing is sheer nonsense. Besides the
dumb brute forcing, the obvious sign is the cluelessness; the other
obvious sign is the secrecy. The sooner you dismiss your priceless
ideas, the better.

When people do serious research, they don't ask stupid questions in
the newsgroups. Instead, they learn the subject themselves and/or
seek professional assistance.

VLV


prad wrote:
> I am sorry that you think that. This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do. Thanks to all those who gave useful
> information and helped me.
>
> VLV: Please think before you send a comment like that.
>
> Pradeep.
>>Don't you know? This is homework. A stupident is generating a huge data
>>by software and then trying to make use of that data. Numerical nonsense
>>instead of thinking of the better approaches, as usual.
>>
>>VLV
Reply by glen herrmannsfeldt June 9, 2007
Rune Allnor wrote:

(snip)

> Why don't you get into details about how these data > come about and what you try to do with them?
He seems to want to keep the data source proprietary, in which case I
believe our answers also should be. Now, if he wants to pay for
answers, that is different.

-- glen
Reply by glen herrmannsfeldt June 9, 2007
jim wrote:
(snip)

> What good is a 1000 point moving average filter going to be? If he is
> downsampling from 10^13 samples to 10^6 samples to prevent aliasing he's
> going to need a low pass filter that has millions and millions of
> coefficients.
Maybe it should be more than 1000, maybe less. I haven't seen any numbers related to the frequency range of interest.
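
For scale, the moving average itself is cheap to run even on a huge
record (a numpy sketch; the signal is made up); jim's objection is to
its boxcar frequency response at such an extreme decimation ratio, not
to its cost:

    import numpy as np

    def boxcar_decimate(x, M=1000):
        # Running mean via cumulative sum: O(1) work per input sample.
        csum = np.cumsum(np.concatenate(([0.0], x)))
        avg = (csum[M:] - csum[:-M]) / M
        return avg[::M]            # keep every Mth averaged value

    x = np.sin(2*np.pi*1e-5*np.arange(1_000_000))  # slow made-up tone
    y = boxcar_decimate(x)
    print(len(x), "->", len(y))
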
> It's hard to imagine why, if the data contains only important info at
> such a low frequency, it is being sampled at such a high sample rate in
> the first place.
It seems that it is generated, and not from a natural source. It might
be the output of a linear congruential or linear feedback shift
register, for example. It might be a 43-bit LFSR, and one-bit samples.
Note that the number of bits per sample still hasn't been mentioned.

I believe with either linear congruential or LFSR you can rewrite it
to generate every Nth point of the original series.
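
For the linear congruential case the rewrite is mechanical: N steps of
x -> (a*x + c) mod m compose into a single affine step. A sketch (the
constants are common 32-bit ones, chosen only for illustration):

    # N steps of an LCG compose into a single affine map, so every
    # Nth output can be generated directly.
    def lcg_skip(a, c, m, N):
        aN, cN = 1, 0                 # identity map
        while N:                      # square-and-multiply on the map
            if N & 1:
                aN, cN = (aN * a) % m, (cN * a + c) % m
            a, c = (a * a) % m, (c * a + c) % m
            N >>= 1
        return aN, cN

    a, c, m = 1664525, 1013904223, 2**32   # common 32-bit constants
    aN, cN = lcg_skip(a, c, m, 1000)       # map for every 1000th output

    x = 12345                              # arbitrary seed
    for _ in range(5):
        x = (aN * x + cN) % m              # jumps 1000 steps at a time
        print(x)

-- glen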
Reply by Rune Allnor June 9, 2007
On 9 Jun, 02:39, "prad" <pradeep.ferna...@gmail.com> wrote:
> I am sorry that you think that.
Don't top post when commenting on previous posts.
> This is for a research subject and not
> homework. I am not sharing details about the data as it would give away
> the novel modeling I am trying to do.
Those who have followed my posts here for a couple of years would know
that I sometimes make a point of saying that I am an engineer, not a
researcher, despite my misfortune of having obtained a PhD degree. The
difference is that the engineer knows what he is doing whereas the
researcher [*] does not.

Your project is flawed. There is no need whatsoever to come up with
those sorts of data in any but the biggest survey projects. There is
the vanishing (though still non-zero) chance that you really are onto
something, but even so, your timing is wrong. These days, the sheer
logistics of what you try to do is out of reach for anyone but the
largest computer centers. That will change with time.

When I started playing with computers 20 years ago, it was specialist's
work to handle more than 64 kilobytes of data at any one time. When I
first played with certain sonar data processing ideas some 10 years
ago, anything beyond 1 MByte was, for practical purposes, outside my
reach. These days my computer's RAM is the limitation (it has only
512 MBytes) and I plan my programs for use with, say, 10 GByte of data
once I can get my hands on that sort of computer. It will be affordable
by the time I finish my program.

So times change, and maybe you or I will look up this thread in five or
ten years' time and smile at the old days when a mere 20 TB of data
represented an insurmountable obstacle. However, at the time of
writing, June 2007, 20 TB of data *is* an insurmountable obstacle. If
you end up with the need to process that sort of data, there is such a
huge discrepancy between what you want and what is within your
abilities to do that you might as well do something else in the mean
time, waiting for the required technology to become available to you.

If you do this as part of your employment, circulate your resume.
Whoever assigned you to this task has no idea what he or she is doing.
> VLV: Please think before you send a comment like that.
Vladimir is, as is his habit, perfectly on the mark.

Rune

[*] In Norwegian, "researcher" is translated to "forsker", "one who
does research", whereas "scientist" is translated to "vitenskapsmann",
which means "one who knows the sciences." The difference might be
subtle, but anyone can embark on some sort of research, while insight
into the sciences requires more, usually obtained through decades of
dedicated studies. Needless to say, the world is full of researchers,
with hardly one scientist alive, world wide.