## Purity > 1? Started by 3 years ago26 replieslatest reply 3 years ago221 views
I have an application where I am computing a purity value, because I'm interested in how much noise is present.  Since I don't have a full spectrum measurement (I'm using the generalized Goertzel https://doi.org/10.1186/1687-6180-2012-56 to compute a single DTFT), I use Parseval's energy theorem to compute full signal energy in the time domain.  This normally works very well, but occasionally I get a purity slightly greater than 1.  Is there some good reason for this, or should I start looking for an error in the computation?

I have compared my Goertzel implementation to FFTW so I know there is minimal error there: https://bitbucket.org/hexamer/testgoertzelresonato...

[ - ] Please give some description  and expression for calculation of the purity.

[ - ] "Total energy" from time domain is the summation of all the squares of samples Σ(x[n])^2

"DTFT energy" is (2/N)*(DTFT magnitude)^2

Purity = "DTFT energy" / "Total energy"

[ - ] How much is the difference?   If it is a factor of two or a factor of sqrt(2) or their inverses, there is an arbitrary scaling factor that can be applied.   If it is something very close to 1.0, it is probably just a scale factor error (maybe due to quantization) in the transform coefficients.

[ - ] Its normally very close to 1 (e.g in a low noise scenario, it's typical for the purity to be > 0.999).

Goertzel is known to have numerical errors with large input sequences, near theta=0 (up to N^2 error growth), but I've applied the Reinsch modifications to mitigate this.  With the size of sequences I use and these mitigations, these errors are < 1e-10, as confirmed in this project:https://bitbucket.org/hexamer/testgoertzelresonato...

[ - ] I think DTFT energy should be (1/N)*(DTFT)^2  .

It all depends as to how DTFT is calculated.  Please compare the DTFT with the FFT output.  To obtain the Fourier coefficients from FFT output, the DC and Nyquist terms have to be divided by N/2.  The other terms by N.  Parseval's theorem holds for the Fourier coefficients.

[ - ] I had been studying this which also notes the special treatment of the 0 and nyquist bins: http://www.dspguide.com/ch10/7.htm

I can definitely see similar effects in my application - as the DTFT is approaching Nyquist, the purity can be near 2.

In the dspguide the factor for the adjusted DFT is 2/N.  In my application, for low noise signals the purity is very near (and occasionally more than) 1, so I assume 2/N is right.

[ - ] If you have access to the input time series, you can apply FFT and get the specific Fourier component you calculate with Goertzel.  Do they differ by a factor?

Does most of the energy in your time series reside in the DTFT component?  Normally the purity for a single frequency should be well below 1, and as you have noted, not exceed 1.

I suspect you may missing another factor of (1/N),

[ - ] DFT is defined in math as: Thus there is no down scaling of accumulated sum (hence output is scaled up by N relative to input). For iDFT the accumulator result is scaled back by 1/N. Apparently the purpose is to make sure if you apply DFT followed by iDFT you get back to your original signal.

This scaling is arbitrary and is not respected by most hardware fft ip vendors (Intel,xilinx...etc). Instead they output(in both cases of fft/ifft) scaling up by sqrt(N).

Either way it is correct. You only need to take scaling effect out of the analysis. accumulation of N samples scales result N times and you might conclude that you need to divide back by N but this approach causes trouble to power results. Matlab fft follows math rule and outputs scaling by N (i.e. just accumulates) and it outputs ifft as 1/N of accumulated result. So if you want to see fft power in matlab you need divide it back by sqrt(N). Tools, books & platforms are not that in sync.

For the case of purity of single tone, that depends on DFT smearing and noise but it can't be > 1 based on the posted definition because you get out what you have put in.

[ - ] Thank you, Kaz.  This is kind of what I suspected.  Purity > 1 makes no sense.  Wondering if this has anything to do with the use of DTFT vs DFT.

[ - ] The factor of 2 in the purity expression is not correct. The outside factor should be (1/N)

Please see the simple derivation of Parseval's theorem in

[ - ] It’s pretty correct. OP is only calculating a single bin value. There would be an identical magnitude value at the negative bin, hence the factor of two

[ - ] Maybe a dumb question, but are you using the same N for the Goerztel and signal energies?  If they were mismatched, it would mostly work on average, but it could for sure cause excursions above 1

[ - ] Not a dumb question at all, in fact it's been on my mind too.  As you probably know, to compute a DFT coefficient with Goertzel, you iterate one final time with a 0 input.  So, in my case to compute the signal energy, the factor I use is actually 2/(N+1), not 2/N.  That last sample does not affect the time domain energy since it's just a summation of the samples squared.  However, if I were to use N instead of N+1, it would only make the purity greater.  And, I don't know is skipping the last iteration with 0 input is valid for my applicaiton.

[ - ] Hi again. My apologies for not providing a more rigorous mathematical analysis, but I have to clock in here in a few minutes. I ran some tests in Python and I think I was able to reproduce what you were seeing. Given an input sine wave, get the RMS, square, sum, divide by length, root. Easy enough.  I then did a ‘manual’ DFT bin calculation by modulating the data with a complex sinusoid of the same frequency and summing all the data.  The corresponding energy/rms/whatever can be calculated as root of two times the square of the magnitude.  For frequencies where an integer number of periods were contained in the input, the values matched exactly. For non integer values, the DFT value was slightly higher, which would correspond to your purity > 1.  If I were off the cuff justifying it, I’d say that the window on the Complex modulation results in non-zero side lobes at the corresponding negative frequency, which means something something so it doesn’t work. I’ll try and follow up when I have more time, but at least I can back up that you aren’t crazy

[ - ] Nice to see that a completely different approach confirms similar results!

[ - ] Aaron45, are you using 64-bit numerical values in your computations? Also, what is the nature of your input test signal?

[ - ] Hi Rick,

The inputs are all 16 bit samples, and all subsequent computations are done with doubles.  Input signals are very often pure sinusoids with some low background noise, besides quantization.

I'm starting to think that I should start generating some simulated ideal inputs to see if I can characterize when purity goes greater than 1.

I've also started to ponder how using a DTFT is affecting things.  With an FFT with discrete bins, and the 0 and fs/2 bins requiring special treatment for for Parseval to hold true, what might that mean for DTFT where there aren't necessarily discrete bins?

Aaron.

[ - ] Hi Aaron45. I see what's happening here!

The version of Parseval's Theorem that applies to digital signal processing is:

Σ(x[n])^2 = 1/N*Σ(|X[m]|)^2         (1)

where |X[m]| is the N spectral magnitude samples computed using an N-point DFT (or FFT). The above Equation (1) says that the total time-domain energy will be equal to the total spectral magnitude energy divided by N.

Your computation of the value you call "Purity" will only be correct if: (1) the single input sinusoid's frequency is exactly at a DFT bin center (no spectral leakage), and (2) the input sinusoid is noise free.

So, ...for a noise-contaminated signal you must compute an N-point DFT, compute the |X[m]| magnitude sequence, and then use the quantities in my above Equation (1). And then your Purity value will always be unity.

SIDE NOTE: Aaron45, the Wiki page for the Goertzel algorithm states that poles on the z-plane's unit circle make the Goertzel algorithm "marginally stable and vulnerable to numerical-error accumulation". That widely-believed statement is NOT true. The Goertzel algorithm is always guaranteed stable!

[ - ] Thank you for this insight, Rick.  One reason I use the generalized Goertzel (a DTFT and not DFT) is so I don't have to worry about bin centering.  But I think you're hinting that the use of the DTFT is a reason purity does not seem to work.  I have some ideas for an experiment to see how the DTFT affects purity - stay tuned :)

On Goertzel, I agree with you that the marginally stable part is wrong .. I've read your excellent article on that matter.  The part on numerical-error accumulation correct.  Not that I really needed to given the amount of literature on the matter, but I independently confirmed it: https://www.dsprelated.com/thread/767/error-growth...

[ - ] Hi Aaron45. I think of the numerical-error accumulation as being caused by the fact that we cannot place the Goertzel network's poles *exactly* at the desired angles on the z-plane's unit circle due to quantized coefficients. With quantized network coefficients the poles will always be located on the unit circle, but at angles that are very slightly different than what we desire. (By the way, when the desired pole angles are 0, ±pi/2, or pi radians then no coefficient quantization error occurs.)

Aaron45, my previous November 20, 2020 "Reply" was based on my assumption that in your expression:

"DTFT energy" is (2/N)*(DTFT magnitude)^2

the term "DTFT magnitude" was a single number. Was my assumption correct?

[ - ] Rick, Yes, that is a correct assumption.  It's a single frequency point DTFT calculation, most often not at a bin center.

[ - ] Hi Aaron45. OK. So when a single input sinusoid's frequency is not at an N-point DFT's bin center then all of the DFT's N magnitude samples will be nonzero  Then you can now see that using a single DFT magnitude sample will not compute the correct total spectral energy (your "DTFT energy" value).
[ - ] Purity plot (last 1000).png

Purity plot (first 200).png

Thanks to everyone who took the time to chime in here. Rick's comment got me thinking about how leakage affects the purity calculation in my application, so I tried a couple experiments.

The first experiment I tried was hacking up my application to send in simulated (ideal) data in place of the real data. I noticed some definite trends (see graphs):  Purity plot (last 1000), Purity Plot (first 200)

In my second experiment, I made an Octave script (see attached) to visualize where the DTFT is sampled and how the leakage occurs. The results are consistent with my application (see graph): Purity vs Bin Offset N is 16, f is 0.25 fs

1) There is a sinusoidal oscillation in the purity value with a frequency of 2x the DFT bin width.
2) When well away from 0 or Nyquist, regardless of bin center offset, the purity is less than or equal to 1
3) As expected, processing gain minimizes leakage and makes purity higher, all else equal
4) The phase of the purity oscillation within the frequency spectrum depends on whether N is even or odd.
5) When approaching 0 or Nyquist, the purity can become greater than 1. For the near-fs/2 case, It approaches 3/2 then jumps to 2 right at Nyquist. Overall it looks very sinc-like. For the near-0 case, it looks a little more strange.
6) If I force all samples to be at bin center by enforcing an integer number of cycles, the purity at the fs/2 sample is still 2. Consistent with corrections described here?: https://www.dspguide.com/ch10/7.htm

One naive way to address this would be to place extra emphasis on capturing integer cycles to get close to a bin center. I can do this with rational approximation of fs/f and using the denominator as the number of cycles. Unfortunately this is inefficient as it leads to higher samples and loss of consistent bandwidth control.

That has me wondering whether I can construct a correction factor based on inputs like N, bin offset, f/fs.

I initially though about sampling the ideal DTFT (Dirichlet pair) to compute the theoretical purity error and correct for that. But that could be computationally expensive for large N.

Does anyone know of a a way to compute such a "Leakage Correction Factor".  I've been searching and found a few articles.  The most promising may be this article which attempts to do something similar: http://www.nicholson.com/rhn/dsp.html#4, but it seems that it's finding the sum of the DFT bin magnitudes whereas I need to compute an energy term? Unfortunately there is no derivation there so I can't see how to adapt it.

12/8/2020: Edited Observation 4 above.  Being at a bin center always creates purity = 1.  Here's what it looks like across half spectrum for even and odd N:  Purity Simulation Half Spectrum N is 16.pdf Purity Simulation Half Spectrum N is 17.pdf

[ - ] Hi, I am sorry to say but I am totally puzzled by your analysis.

If I use Octave FFT (basically floating point) I always get 1 or very close irrespective of frequency. So are you checking your Goertzel algorithm or am I wrong using FFT?

%%%%%%%%%%%%%%%%%%%%%

clear;clc;

N = 100;

ind = 0;

for f = 0:.005:.5

x = cos(2*pi*(0:N-1)*f);

y = fft(x);

ind = ind+1;

pwr_ratio(ind) = mean(abs(y).^2)/sum(x.^2);

end

plot(pwr_ratio,'.-');

%%%%%%%%%%%%%%%%%%%%%%%

[ - ] Kaz,

Your results make sense to me.  Your power ratio is always based on time domain samples or DFT frequency samples.  So this script is really demonstrating Parseval's Theorem.  What Rick has pointed out is that my application is expecting that my single DTFT sample at the pure frequency is just an ordinary DFT sample.  And of course when it's off bin center it's not a DFT sample.  For Parseval's to hold, the Frequency domain samples need to be DFT samples.  You can visualize what's happening in my case if you execute purity.m.  When my pure signal is off bin center, all the DFT samples have some non-0 values (leakage).  Since the energy in those non-0 DFT samples is equivalent to the time domain energy (Parseval), it affects the time domain energy (the basis for my total energy denominator), and makes purity != 1.

Aaron

[ - ] Sorry for dragging this further.

firstly, we agree that we need to force total power of time domain & frequency domain to be same, not just to keep to Parseval's but for this task's common sense.

secondly, we agree that one or few bins of dft should have less power than the sum of all bins.

thirdly we agree that a single tone may end up in one bin or leak.

and finally we agree that if there is additional noise at input then it will go to its bins.

So no matter what resolution you use or what frequency you observe or what leak occurs the power (in one or more bins) as ratio to all bins should be either 1 or less than 1.

If you get >> 1 you are doing something wrong or your dft is going wrong but I checked your gga function (Goertzel) and it is doing well.