
power spectrum via autocorrelation of broken data time series

Started by pelusa September 23, 2008
Hi,
I think I understand the basic ideas of spectral analysis, but I still have
no good feel for how to improve my estimates.
My problem is that my data is interrupted every 3 minutes for about
20 seconds. To find the best way to generate a power spectrum, I am trying
several approaches on simulated data. I know that the power spectrum should
follow a power law, so I generate a random time series by drawing from such
a power law, and I simulate the broken record by chopping out sections.
As a sanity check, I first chose zero width for the chopped-out sections,
FFTed the series and squared it: the power law comes back. OK, so far so
good. Now I have three methods:

1) Linearly interpolate across the non-data sections, then FFT and square
to get the PSD.

2) Use only the unbroken pieces and average their power spectra
afterwards. But then I lose all low-frequency information.

3) Calculate the autocorrelation function of the broken time series, and
calculate the autocorrelation function of the rectangular window function,
i.e. the ones-and-zeros mask that produces the broken time series when
multiplied with the full time series. Then divide the first autocorrelation
function by the second, and FFT the quotient to get an estimate of the
power spectrum of the true, unbroken time series.
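In C, the core of method 3 is roughly the following (just a sketch of the
idea; the function name and the guard against lags with no valid pairs are
illustrative, not anything standard):

#include <stddef.h>

/* x[] holds the samples with zeros in the gaps, m[] is the 0/1 mask that
 * created the gaps.  Because x = (full series) * m, pairs touching a gap
 * contribute nothing to sxx, and smm counts the valid pairs at each lag.
 * sxx/smm is then the gap-corrected autocorrelation estimate; the PSD is
 * the FFT of that ratio (FFT routine not shown). */
void gapped_acf(const double *x, const double *m, size_t n,
                size_t maxlag, double *acf_out)
{
    for (size_t k = 0; k <= maxlag; k++) {
        double sxx = 0.0, smm = 0.0;
        for (size_t i = 0; i + k < n; i++) {
            sxx += x[i] * x[i + k];  /* autocorrelation of the broken data */
            smm += m[i] * m[i + k];  /* autocorrelation of the mask        */
        }
        /* lags where no valid pairs survive cannot be estimated */
        acf_out[k] = (smm > 0.0) ? sxx / smm : 0.0;
    }
}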

All PSDs are smoothed with a boxcar filter, which helps a lot to reduce
variance; as I have lots of data points, this is also OK.
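The smoothing itself is just a moving average over the PSD bins, something
like this (a sketch; shrinking the window at the edges is one arbitrary
choice):

#include <stddef.h>

/* Boxcar-smooth psd[0..n-1] into out[0..n-1]; width should be odd. */
void boxcar_smooth(const double *psd, size_t n, size_t width, double *out)
{
    size_t half = width / 2;
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= half) ? i - half : 0;  /* clip at the edges */
        size_t hi = i + half;
        if (hi >= n)
            hi = n - 1;
        double sum = 0.0;
        for (size_t j = lo; j <= hi; j++)
            sum += psd[j];
        out[i] = sum / (double)(hi - lo + 1);
    }
}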

results:
1) works fairly well, but I don't like it much, as I am adding data points
that I did not measure, even though they do not contribute much to any
frequency content

2) works really well, but the low-frequency information is lost

3) I like this one the most, as it seems to reflect reality best, but the
result is not really good. It is quite noisy at higher frequencies,
starting at the frequency corresponding to the duration of the broken
sequences. Even after the boxcar filter there are still noticeable
features. (Sinc leakage? I have no real intuition for what this means for
autocorrelations. I know that the FT of a rectangular window produces a
sinc function and hence severe leakage, but does the same argument hold
for time lags?)

My question is: what is the best way to deal with such a broken time
series? I could use the high-frequency part from method (2) and the
low-frequency part from method (3), but that seems very odd! I could also
try non-rectangular windows, but those do not reflect the real time
series. Since I assume my problem is leakage, perhaps a Hamming window for
each sequence would help? But what about normalization? Is there a
standard way to deal with such problems?
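By a Hamming window per sequence I mean something like the sketch below;
for the normalization I would scale each segment's periodogram by
1/sum(w^2), which I believe is the usual convention, so that spectra stay
comparable across window choices:

#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Apply a Hamming window in place to one unbroken segment of length n
 * (n > 1) and return sum(w^2), to be used as a 1/sum(w^2) scaling of the
 * segment's periodogram. */
double hamming_window(double *seg, size_t n)
{
    double wpow = 0.0;
    for (size_t i = 0; i < n; i++) {
        double w = 0.54 - 0.46 * cos(2.0 * M_PI * (double)i
                                     / (double)(n - 1));
        seg[i] *= w;
        wpow += w * w;
    }
    return wpow;
}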
I also don't have a good feel for what I should do with the
autocorrelation functions. Should I try to window them? Dividing by the
window's autocorrelation function still leaves some small peaks behind,
which screw up my PSD in the end.
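Concretely, by "windowing them" I mean tapering the estimated
autocorrelation with a lag window before the final FFT, Blackman-Tukey
style; a Hann lag window would look like this (illustrative sketch):

#include <math.h>
#include <stddef.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Taper acf[0..maxlag] with a Hann lag window: full weight at lag 0,
 * tapering smoothly to zero at the maximum lag. */
void apply_lag_window(double *acf, size_t maxlag)
{
    for (size_t k = 0; k <= maxlag; k++)
        acf[k] *= 0.5 * (1.0 + cos(M_PI * (double)k / (double)maxlag));
}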

I am coding everything in C, so standard Matlab functions won't help.

Thanks for your help and sorry for the long post!
Pelusa


On 23 Sep, 14:05, "pelusa" <young_f...@gmx.de> wrote:
> 1) works fairly well, but I don't like it much, as I am adding data
> points that I did not measure, even though they do not contribute much
> to any frequency content
Two factors, both of which weigh against this method.
> 2) works really well, but the low-frequency information is lost
Depending on what you mean by 'low frequency content', this concern might
be moot: the high-frequency parts of the data are represented by the good
data sequences anyway.
> My question is: what is the best way to deal with such a broken time
> series? [...] Is there a standard way to deal with such problems?
You didn't mention the sampling frequency, you didn't quantify relative
terms like 'high frequency' and 'low frequency', and you didn't mention
why you do the analysis.

Anyway, I would start by estimating the autocorrelation function from the
data available and not do too many manipulations. That is, treat each
2 min 40 sec sequence as an independent data sequence, compute the
autocorrelation and PSD of each, and then average the PSDs to find the
'total' PSD.

Only if it is demonstrated that some required [*] information is lost by
this method would I start evaluating other approaches.

Rune

[*] Note the difference between 'required' and 'desired' information. One
would always like (desire) to get everything all the time, but in a
less-than-perfect world one needs to prioritize what is actually required.
If one can get the required stuff with an imperfect solution, that's OK,
and any additional effort to get a perfect result is wasted.
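In C, the segment averaging Rune describes might look roughly like this
(a sketch assuming the FFTW3 library for the transforms; the function name
and the equal-length, contiguously stored segments are assumptions made
for brevity, and the constant scale factors are omitted):

#include <stddef.h>
#include <fftw3.h>

/* Average the periodograms of nseg unbroken segments of length seglen,
 * stored back to back in segs[]; psd[] receives seglen/2+1 bins. */
void averaged_psd(const double *segs, size_t nseg, size_t seglen,
                  double *psd)
{
    size_t nbins = seglen / 2 + 1;
    double *buf = fftw_malloc(sizeof(double) * seglen);
    fftw_complex *spec = fftw_malloc(sizeof(fftw_complex) * nbins);
    fftw_plan plan = fftw_plan_dft_r2c_1d((int)seglen, buf, spec,
                                          FFTW_ESTIMATE);

    for (size_t b = 0; b < nbins; b++)
        psd[b] = 0.0;

    for (size_t s = 0; s < nseg; s++) {
        for (size_t i = 0; i < seglen; i++)
            buf[i] = segs[s * seglen + i];
        fftw_execute(plan);                /* FFT of this segment  */
        for (size_t b = 0; b < nbins; b++) /* accumulate |X|^2     */
            psd[b] += (spec[b][0] * spec[b][0]
                       + spec[b][1] * spec[b][1]) / (double)seglen;
    }
    for (size_t b = 0; b < nbins; b++)
        psd[b] /= (double)nseg;            /* average over segments */

    fftw_destroy_plan(plan);
    fftw_free(spec);
    fftw_free(buf);
}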
> Depending on what you mean by 'low frequency content', this concern
> might be moot: the high-frequency parts of the data are represented by
> the good data sequences anyway.
I am sampling at 20 ms intervals and need the information all the way up
to Nyquist, hence 25 Hz. In addition, I need the low-frequency information
down to 10^-4 Hz, hence periods of hours. I do have the data for 8 hours,
just broken into pieces.
> Anyway, I would start by estimating the autocorrelation function from
> the data available and not do too many manipulations. That is, treat
> each 2 min 40 sec sequence as an independent data sequence, compute the
> autocorrelation and PSD of each, and then average the PSDs to find the
> 'total' PSD.
>
> Only if it is demonstrated that some required information is lost by
> this method would I start evaluating other approaches.
Thanks for your reply; see my response above regarding the sampling
frequency and what I mean by low/high frequency. I am doing this analysis
in order to get the complete power spectrum, which should be a power law
but might deviate here and there a bit for physical reasons. Hence it is
essential to have the cleanest possible spectrum on all frequency scales.
(OK, I know this might not be possible, but at least I try to understand
and search for the best analysis method available.)

The autocorrelation of simply the broken data looks awful, by no means
like a power law (I am talking about my simulation, where I know what I
should get in the end). So I definitely have to account for the breaks via
the window function. The question is whether I have to further apply a
different window to avoid introducing leakage. However, this process of
dividing the autocorrelation functions should actually give an estimate
of the true, un-windowed time series...

Thomas