Forums

Purity > 1?

Started by Aaron45 2 weeks ago20 replieslatest reply 6 days ago112 views
I have an application where I am computing a purity value, because I'm interested in how much noise is present.  Since I don't have a full spectrum measurement (I'm using the generalized Goertzel https://doi.org/10.1186/1687-6180-2012-56 to compute a single DTFT), I use Parseval's energy theorem to compute full signal energy in the time domain.  This normally works very well, but occasionally I get a purity slightly greater than 1.  Is there some good reason for this, or should I start looking for an error in the computation?

I have compared my Goertzel implementation to FFTW so I know there is minimal error there: https://bitbucket.org/hexamer/testgoertzelresonato...

[ - ]
Reply by PlatybelNovember 18, 2020

Please give some description  and expression for calculation of the purity.

[ - ]
Reply by Aaron45November 18, 2020

"Total energy" from time domain is the summation of all the squares of samples Σ(x[n])^2

"DTFT energy" is (2/N)*(DTFT magnitude)^2

Purity = "DTFT energy" / "Total energy"

[ - ]
Reply by SlartibartfastNovember 19, 2020

How much is the difference?   If it is a factor of two or a factor of sqrt(2) or their inverses, there is an arbitrary scaling factor that can be applied.   If it is something very close to 1.0, it is probably just a scale factor error (maybe due to quantization) in the transform coefficients.

[ - ]
Reply by Aaron45November 19, 2020

Its normally very close to 1 (e.g in a low noise scenario, it's typical for the purity to be > 0.999).

Goertzel is known to have numerical errors with large input sequences, near theta=0 (up to N^2 error growth), but I've applied the Reinsch modifications to mitigate this.  With the size of sequences I use and these mitigations, these errors are < 1e-10, as confirmed in this project:https://bitbucket.org/hexamer/testgoertzelresonato...


[ - ]
Reply by PlatybelNovember 18, 2020

I think DTFT energy should be (1/N)*(DTFT)^2  .

It all depends as to how DTFT is calculated.  Please compare the DTFT with the FFT output.  To obtain the Fourier coefficients from FFT output, the DC and Nyquist terms have to be divided by N/2.  The other terms by N.  Parseval's theorem holds for the Fourier coefficients.

[ - ]
Reply by Aaron45November 19, 2020

I had been studying this which also notes the special treatment of the 0 and nyquist bins: http://www.dspguide.com/ch10/7.htm

I can definitely see similar effects in my application - as the DTFT is approaching Nyquist, the purity can be near 2.

In the dspguide the factor for the adjusted DFT is 2/N.  In my application, for low noise signals the purity is very near (and occasionally more than) 1, so I assume 2/N is right.

[ - ]
Reply by PlatybelNovember 19, 2020

If you have access to the input time series, you can apply FFT and get the specific Fourier component you calculate with Goertzel.  Do they differ by a factor?  

Does most of the energy in your time series reside in the DTFT component?  Normally the purity for a single frequency should be well below 1, and as you have noted, not exceed 1.

I suspect you may missing another factor of (1/N),

[ - ]
Reply by kazNovember 19, 2020

DFT is defined in math as:

{\displaystyle X_{k}=\sum _{n=0}^{N-1}x_{n}e^{-i2\pi kn/N}\qquad k=0,\ldots ,N-1,}

Thus there is no down scaling of accumulated sum (hence output is scaled up by N relative to input). For iDFT the accumulator result is scaled back by 1/N. Apparently the purpose is to make sure if you apply DFT followed by iDFT you get back to your original signal.

This scaling is arbitrary and is not respected by most hardware fft ip vendors (Intel,xilinx...etc). Instead they output(in both cases of fft/ifft) scaling up by sqrt(N). 

Either way it is correct. You only need to take scaling effect out of the analysis. accumulation of N samples scales result N times and you might conclude that you need to divide back by N but this approach causes trouble to power results. Matlab fft follows math rule and outputs scaling by N (i.e. just accumulates) and it outputs ifft as 1/N of accumulated result. So if you want to see fft power in matlab you need divide it back by sqrt(N). Tools, books & platforms are not that in sync.

For the case of purity of single tone, that depends on DFT smearing and noise but it can't be > 1 based on the posted definition because you get out what you have put in.

[ - ]
Reply by Aaron45November 20, 2020

Thank you, Kaz.  This is kind of what I suspected.  Purity > 1 makes no sense.  Wondering if this has anything to do with the use of DTFT vs DFT.

[ - ]
Reply by PlatybelNovember 20, 2020

The factor of 2 in the purity expression is not correct. The outside factor should be (1/N)

Please see the simple derivation of Parseval's theorem in  

https://www.gaussianwaves.com/2020/09/parsevals-th...


[ - ]
Reply by dszaboNovember 24, 2020

It’s pretty correct. OP is only calculating a single bin value. There would be an identical magnitude value at the negative bin, hence the factor of two

[ - ]
Reply by dszaboNovember 24, 2020

Maybe a dumb question, but are you using the same N for the Goerztel and signal energies?  If they were mismatched, it would mostly work on average, but it could for sure cause excursions above 1

[ - ]
Reply by Aaron45November 24, 2020

Not a dumb question at all, in fact it's been on my mind too.  As you probably know, to compute a DFT coefficient with Goertzel, you iterate one final time with a 0 input.  So, in my case to compute the signal energy, the factor I use is actually 2/(N+1), not 2/N.  That last sample does not affect the time domain energy since it's just a summation of the samples squared.  However, if I were to use N instead of N+1, it would only make the purity greater.  And, I don't know is skipping the last iteration with 0 input is valid for my applicaiton.

[ - ]
Reply by Rick LyonsNovember 19, 2020

Aaron45, are you using 64-bit numerical values in your computations? Also, what is the nature of your input test signal?


[ - ]
Reply by Aaron45November 19, 2020

Hi Rick,

The inputs are all 16 bit samples, and all subsequent computations are done with doubles.  Input signals are very often pure sinusoids with some low background noise, besides quantization.

I'm starting to think that I should start generating some simulated ideal inputs to see if I can characterize when purity goes greater than 1.

I've also started to ponder how using a DTFT is affecting things.  With an FFT with discrete bins, and the 0 and fs/2 bins requiring special treatment for for Parseval to hold true, what might that mean for DTFT where there aren't necessarily discrete bins?

Aaron.

[ - ]
Reply by Rick LyonsNovember 20, 2020

Hi Aaron45. I see what's happening here!

The version of Parseval's Theorem that applies to digital signal processing is:

Σ(x[n])^2 = 1/N*Σ(|X[m]|)^2         (1)

where |X[m]| is the N spectral magnitude samples computed using an N-point DFT (or FFT). The above Equation (1) says that the total time-domain energy will be equal to the total spectral magnitude energy divided by N.

Your computation of the value you call "Purity" will only be correct if: (1) the single input sinusoid's frequency is exactly at a DFT bin center (no spectral leakage), and (2) the input sinusoid is noise free.

So, ...for a noise-contaminated signal you must compute an N-point DFT, compute the |X[m]| magnitude sequence, and then use the quantities in my above Equation (1). And then your Purity value will always be unity.

SIDE NOTE: Aaron45, the Wiki page for the Goertzel algorithm states that poles on the z-plane's unit circle make the Goertzel algorithm "marginally stable and vulnerable to numerical-error accumulation". That widely-believed statement is NOT true. The Goertzel algorithm is always guaranteed stable!

[ - ]
Reply by Aaron45November 20, 2020

Thank you for this insight, Rick.  One reason I use the generalized Goertzel (a DTFT and not DFT) is so I don't have to worry about bin centering.  But I think you're hinting that the use of the DTFT is a reason purity does not seem to work.  I have some ideas for an experiment to see how the DTFT affects purity - stay tuned :)


On Goertzel, I agree with you that the marginally stable part is wrong .. I've read your excellent article on that matter.  The part on numerical-error accumulation correct.  Not that I really needed to given the amount of literature on the matter, but I independently confirmed it: https://www.dsprelated.com/thread/767/error-growth...

[ - ]
Reply by Rick LyonsNovember 21, 2020

Hi Aaron45. I think of the numerical-error accumulation as being caused by the fact that we cannot place the Goertzel network's poles *exactly* at the desired angles on the z-plane's unit circle due to quantized coefficients. With quantized network coefficients the poles will always be located on the unit circle, but at angles that are very slightly different than what we desire. (By the way, when the desired pole angles are 0, ±pi/2, or pi radians then no coefficient quantization error occurs.)

Aaron45, my previous November 20, 2020 "Reply" was based on my assumption that in your expression:

"DTFT energy" is (2/N)*(DTFT magnitude)^2

the term "DTFT magnitude" was a single number. Was my assumption correct?

[ - ]
Reply by Aaron45November 21, 2020

Rick, Yes, that is a correct assumption.  It's a single frequency point DTFT calculation, most often not at a bin center.

[ - ]
Reply by Rick LyonsNovember 22, 2020
Hi Aaron45. OK. So when a single input sinusoid's frequency is not at an N-point DFT's bin center then all of the DFT's N magnitude samples will be nonzero  Then you can now see that using a single DFT magnitude sample will not compute the correct total spectral energy (your "DTFT energy" value).