comp.dsp | Show me some more numbers | page 3

Reply by Cedron ● June 8, 2015

>
>There is indeed a risk that, say, one particular noise pattern
>might be pathological with respect to a particular data pattern.
>
>In simulation, you select a runsize (i.e. a large set of noise patterns)
>large enough such that these pathological combinations appear in their 
>expected incidence, such that you have created a large enough ensemble
>such that the results are not overly-dominated by outliers, but
>include them with close to their correct statistics.
>
>In a sense, you are right that varying the noise patterns with SNR
>may give you a larger ensemble effect.  

Call it the portfolio diversification effect.

>That is, if you have six SNR's
>each with 10,000 different noise patterns, you may be 
>able to look at the resulting curve or column of data and get
>a gist of what it would have looked like had you chosen a
>runsize of 60,000 in the first place.
>
>But it's not going to be as good a result as running the same 60,000 
>noise patterns at each SNR.  It is ... sort of a half-measure,
>if that makes any sense.
>
I'm not sure what you are saying.  The point I made was that if you wanted
to use the same noise patterns on all the SNR levels (or if you were just
studying one SNR level), you would want to do it multiple times so you had
multiple sets of results.  This would allow you to compare the result
sets for common patterns vs unique patterns.  If you just ran an
equivalent number of runs in one result set you would not have any way of
determining if an observed pattern was due to the formulas or the noise.

>Also you are saying runtime is not a problem for this experiment.
>

No, it just takes a few seconds to run.  Upping that some is not a
problem.  However, the precision limitation effect does seem to be a
problem.  That's one of the reasons I posted my last set of data.  When I
dropped from 10,000 runs to 1,000 runs, the differences between my 3 bin
complex formula and Candan's 2013 disappeared.

I appreciate the commentary on improving my testing.  I am going to center
and rescale the noise sets so each one is zero centered with the correct
target RMS.

The other thing that is nuts about my last data run is that the formula
for real signals outperformed the formula for complex signals on complex
signals.  This test was not meant to be comprehensive, it was meant to be
a rough side by side comparison.  The improvements you are talking about
won't materially affect the conclusions drawn.

1) The two bin outperforms the three bin in the interior region

2) The standard deviations are roughly proportional to the noise level

3) The complex formulas don't work very well on real signals.

The second one is important because it means the point at which the two
bin is better than the three bin is independent of noise level.

I am going to improve the program for the real signal case.  Martin
Vicanek's approach is very impressive (see earlier post in this thread by
him).  There is the very real possibility that he will be able to improve
on my results significantly.  Not in the noiseless case, of course, but
when noise is present.

As with the complex case, I expect a two bin solution to be better when
the frequency is in the interior region between bins.

>
>Steve

Ced
---------------------------------------
Posted through http://www.DSPRelated.com

Reply by ● June 8, 2015

How does it perform for input frequencies below bin 2 (with bin 1 being the DC bin)? I work in a field where we simulate for days and may not be able to produce one entire cycle of a sine-wave, and yet we would still like to get some estimate of distortion. If we had a high-accuracy frequency/phase estimate then we can subtract a synthesized waveform and look at the residual. 


Bob

Reply by Cedron ● June 8, 2015

>How does it perform for input frequencies below bin 2 (with bin 1 being
>the DC bin)? I work in a field where we simulate for days and may not be
>able to produce one entire cycle of a sine-wave, and yet we would still
>like to get some estimate of distortion. If we had a high-accuracy
>frequency/phase estimate then we can subtract a synthesized waveform and
>look at the residual.
>
>
>Bob

If there is only one "tone" in the signal you may be better off doing a
best fit in the time domain.

Here is some output for the range you are talking about.  I changed the
first column from being the error to the actual frequency.  This is based
on 100 sample points with 1000 runs per row.

Target Noise Level = 0.000

Freq   Dawg 3 Bin
----   -------------------
0.1    0.100000   0.000000
0.2    0.200000   0.000000
0.3    0.300000   0.000000
0.4    0.400000   0.000000
0.5    0.500000   0.000000
0.6    0.600000   0.000000
0.7    0.700000   0.000000
0.8    0.800000   0.000000
0.9    0.900000   0.000000


Target Noise Level = 0.001

Freq   Dawg 3 Bin
----   -------------------
0.1    0.099868   0.003325
0.2    0.200043   0.002297
0.3    0.299992   0.001122
0.4    0.399983   0.000561
0.5    0.500010   0.000325
0.6    0.599990   0.000242
0.7    0.699990   0.000172
0.8    0.800006   0.000150
0.9    0.899999   0.000125


Target Noise Level = 0.010

Freq   Dawg 3 Bin
----   -------------------
0.1    0.094057   0.038866
0.2    0.199179   0.023595
0.3    0.300037   0.011098
0.4    0.399983   0.005504
0.5    0.499975   0.003358
0.6    0.599938   0.002333
0.7    0.699981   0.001908
0.8    0.800035   0.001511
0.9    0.900070   0.001244


Target Noise Level = 0.100

Freq   Dawg 3 Bin
----   -------------------
0.1    0.122839   0.127990
0.2    0.190757   0.157891
0.3    0.276236   0.132423
0.4    0.395439   0.058808
0.5    0.497421   0.033544
0.6    0.597931   0.024322
0.7    0.700397   0.017851
0.8    0.799699   0.015278
0.9    0.899727   0.012647

For how to solve for the amplitude and phase check out my blog article
titled "Phase and Amplitude Calculation for a Pure Real Tone in a DFT:
Method 1" at dsprelated.com.

I also have some unpublished noise mitigation techniques that may be
helpful.

Ced


Reply by Cedron ● June 8, 2015

>Target Noise Level = 0.100
>
>Freq   Dawg 3 Bin
>----   -------------------
>0.1    0.122839   0.127990
>0.2    0.190757   0.157891
>0.3    0.276236   0.132423
>0.4    0.395439   0.058808
>0.5    0.497421   0.033544
>0.6    0.597931   0.024322
>0.7    0.700397   0.017851
>0.8    0.799699   0.015278
>0.9    0.899727   0.012647
>
My program is not designed to wrap around below zero.  The results on the
lower half would be better if it were.  It can be done.  The standard
deviations would be expected to be the mirror image of the upper values.

Ced


Reply by ● June 9, 2015

Thanks. Time-domain best-fits are famous for local minima problems, but maybe your technique gives a good enough initial guess that a simple gradient descent will get me the rest of the way. 

Bob

Reply by Cedron ● June 9, 2015

>Thanks. Time-domain best-fits are famous for local minima problems, but
>maybe your technique gives a good enough initial guess that a simple
>gradient descent will get me the rest of the way. 
>
>Bob

You're welcome.  How noisy is your data?  Are the points evenly spaced in
time?

I would make it a one dimensional gradient search on the frequency only.
For each frequency guess, mimic in the time domain the technique I use in
the frequency domain in my blog article.  That is, generate two basis
vectors and solve for the best fit.

Signal(n) = a cos( Guess * n ) + b sin( Guess * n )

Dot the equation with each basis vector and solve the resulting two
equation/two unknown linear system for a and b.
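
A minimal sketch of the two-basis-vector fit described above, assuming NumPy and assuming the frequency guess is expressed in radians per sample. The function name fit_tone and the test signal are made up for illustration and are not taken from the blog article.

```python
import numpy as np

def fit_tone(signal, guess):
    """Best-fit a, b for signal[n] ~ a*cos(guess*n) + b*sin(guess*n).

    guess is assumed to be in radians per sample.
    Returns a, b and the sum of squared differences of the fit.
    """
    n = np.arange(len(signal))
    c = np.cos(guess * n)  # first basis vector
    s = np.sin(guess * n)  # second basis vector
    # Dotting the model equation with each basis vector gives a
    # two equation / two unknown linear system in a and b.
    gram = np.array([[c @ c, c @ s],
                     [s @ c, s @ s]])
    rhs = np.array([c @ signal, s @ signal])
    a, b = np.linalg.solve(gram, rhs)
    residual = signal - (a * c + b * s)
    return a, b, float(residual @ residual)

# Illustrative noiseless signal: 0.3 cycles across 100 samples.
n = np.arange(100)
w = 2.0 * np.pi * 0.3 / 100.0
x = 1.5 * np.cos(w * n) - 0.7 * np.sin(w * n)
a, b, err = fit_tone(x, w)
print(a, b, err)
```

When the guess equals the true frequency and there is no noise, the recovered a and b should match the ones used to build the signal, up to rounding.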

Since you already know where your frequency is expected you only have to
calculate three bins of the DFT to calculate your initial guess.

Hope this helps.

Ced

Reply by Cedron ● June 9, 2015

I said:

>I would make it a one dimensional gradient search on the frequency only.


Perhaps a better idea is to measure your error (sum of the differences
squared) at the guess frequency (with its a,b solution set) and two nearby
frequencies, then do a parabolic fit to the values and find the minimum.
You could iterate with a smaller spacing to get a more accurate answer,
but I doubt that would be necessary.
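
The parabolic step could be sketched roughly as follows, assuming NumPy, a frequency in radians per sample, and an evenly spaced pair of neighbours at guess - step and guess + step. The names fit_error and refine_frequency, the step size, and the test signal are illustrative assumptions.

```python
import numpy as np

def fit_error(signal, freq):
    """Sum of squared differences after the best-fit a, b at this frequency."""
    n = np.arange(len(signal))
    basis = np.column_stack((np.cos(freq * n), np.sin(freq * n)))
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    resid = signal - basis @ coeffs
    return float(resid @ resid)

def refine_frequency(signal, guess, step):
    """One parabolic step through the errors at guess - step, guess, guess + step."""
    e_lo = fit_error(signal, guess - step)
    e_mid = fit_error(signal, guess)
    e_hi = fit_error(signal, guess + step)
    curvature = e_lo - 2.0 * e_mid + e_hi
    if curvature <= 0.0:
        return guess  # the three points are not convex; keep the guess
    # Vertex of the parabola through the three equally spaced points.
    return guess + 0.5 * step * (e_lo - e_hi) / curvature

# Illustrative noiseless signal: 0.3 cycles across 100 samples.
n = np.arange(100)
true_w = 2.0 * np.pi * 0.3 / 100.0
x = 1.5 * np.cos(true_w * n) - 0.7 * np.sin(true_w * n)

guess = 1.02 * true_w          # deliberately off by 2 percent
refined = refine_frequency(x, guess, step=0.04 * true_w)
print(guess - true_w, refined - true_w)
```

Calling refine_frequency again with the refined value and a smaller step is the iteration mentioned above.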

Ced

Reply by dbd ● June 9, 2015

On Sunday, June 7, 2015 at 7:46:59 AM UTC-7, Cedron wrote:
...
> 
> But the noise is not known beyond its expected average and its expected
> RMS.  If I would want to use "canned" noise patterns for all rows and all
> noise levels, I would at least want to recenter and rescale them so their
> average was known to be zero and their RMS known to be my target value. 
> Even so, the particular distribution of values may present as some pattern
> in the data that would be indistinguishable from a pattern formed by the
> formulas. 
> 
...
> Ced

That first sentence makes no sense. You can test your generator and your data however you want. That's any algorithm tester's responsibility. That's why it is common to use well tested and documented generators. That's also why seeded generators are used so that anyone else can regenerate the data to test if they wish.

When generating independent finite sequences of AWGN there is an expected variance in mean known as the 'expected error in mean'. There is an expected error in variance known as the 'expected error in variance'. A generator that produces independent sequences with a consistently larger or consistently smaller mean than the expected error in mean is broken and should be fixed or replaced and retested. A generator that produces independent sequences with consistently larger or consistently smaller error in variance than the expected error in variance is broken and should be fixed or replaced and retested.
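
As a rough illustration of such a check, the sketch below compares the observed spread of the per-sequence mean and variance against the textbook values for N independent Gaussian samples: sigma/sqrt(N) for the mean and sigma^2*sqrt(2/(N-1)) for the unbiased variance. It assumes NumPy's default_rng as the generator under test; the seed, the sequence length, and the function name expected_errors are arbitrary choices for the example.

```python
import numpy as np

def expected_errors(n, sigma=1.0):
    """Theoretical spread of the sample mean and the unbiased sample
    variance for n independent N(0, sigma^2) values."""
    se_mean = sigma / np.sqrt(n)
    se_var = sigma**2 * np.sqrt(2.0 / (n - 1))
    return se_mean, se_var

# Seeded so that anyone can regenerate exactly the same sequences.
rng = np.random.default_rng(12345)
n, trials = 100, 10000
noise = rng.standard_normal((trials, n))  # one sequence per row

se_mean, se_var = expected_errors(n)
observed_mean_spread = noise.mean(axis=1).std()
observed_var_spread = noise.var(axis=1, ddof=1).std()

print("mean spread:", observed_mean_spread, "expected:", se_mean)
print("var spread: ", observed_var_spread, "expected:", se_var)
```

A generator whose observed spreads sit consistently well above or below the expected values would be suspect in the sense described above.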

Mean and variance are not the only statistics of a generator that can be used to evaluate the correctness of a generator of AWGN. There are similar expected errors in skew and kurtosis for example.

Real implementations of algorithms operate on finite data sets that do not have an expected mean of zero or expected error in variance of zero and are not properly tested by cooked data.

If you think smaller expected error is needed, get it by fixing your broken generator, if that is the problem, or by increasing the size of the data set, not by cooking the data.

Dale B. Dalrymple

Reply by Steve Pope ● June 9, 2015

On Sunday, June 7, 2015 at 7:46:59 AM UTC-7, Cedron wrote:
...
> 
> If I would want to use "canned" noise patterns for all rows and all
> noise levels, I would at least want to recenter and rescale them so their
> average was known to be zero and their RMS known to be my target value. 
> Even so, the particular distribution of values may present as some pattern
> in the data that would be indistinguishable from a pattern formed by the
> formulas. 

You don't want to "recenter" the noise, and you don't want to
"scale" the noise other than by a factor defined by your
SNR.

The mean value of any given N values of normal (0,1) white gaussian noise,
where N is finite, is going to be something other than zero, and the RMS 
of these values is going to be something other than one.  You
do not want to center/rescale these to be 0 and 1.  That will give
you an incorrect, and usually optimistic, simulation result.
You want to use the values "as is" as emitted by a known
good N(0,1) generator, "good" meaning with good statistics including,
but not limited to, a mean of zero and RMS of 1.

(On this latter point, there are various trustworthy methods of
constructing such a generator, and I have used several of these over 
time.  I personally prefer to use Knuth's polar method to convert a 
uniformly-distributed random number to a gaussian, and to choose/
evaluate the underlying uniform random generator carefully.)
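
For readers unfamiliar with it, the polar method can be sketched as below. This is a generic textbook version, not the poster's own code. Python's random.Random stands in for the carefully chosen uniform generator, and the seed and sample count are arbitrary.

```python
import math
import random

def polar_gaussian_pair(uniform):
    """Return two independent N(0,1) values from a uniform [0,1) source
    using the polar method (rejection sampling inside the unit circle)."""
    while True:
        v1 = 2.0 * uniform() - 1.0
        v2 = 2.0 * uniform() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:  # reject points outside the circle, and the origin
            scale = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * scale, v2 * scale

rng = random.Random(2015)  # seeded so the run can be regenerated
samples = []
for _ in range(50000):
    samples.extend(polar_gaussian_pair(rng.random))

mean = sum(samples) / len(samples)
rms = math.sqrt(sum(x * x for x in samples) / len(samples))
print(mean, rms)
```

The values are used as emitted, with no recentering or rescaling. Over the whole run the mean and RMS land near 0 and 1, while any short block of N values will deviate by the expected amount.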

Steve

Reply by Cedron ● June 10, 2015

[...snip...]
>
>You don't want to "recenter" the noise, and you don't want to
>"scale" the noise other than by a factor defined by your
>SNR.
>
For a side by side test it doesn't matter, as long as the formulas get the
same patterns.  For a "real world applicability" test, this is true.  To
see numbers that are similar to the answer if you solved this
analytically, this is definitely what you would want to do.  Likewise, to
try to insulate the results from effects of the noise patterns, as we were
discussing, this would also do the job.


>The mean value of any given N values of normal (0,1) white gaussian noise,
>where N is finite, is going to be something other than zero, and the RMS
>of these values is going to be something other than one.  You
>do not want to center/rescale these to be 0 and 1.  That will give
>you an incorrect, and usually optimistic, simulation result.

Yes, but the expected value of the mean is zero, and the expected value of
the RMS is 1.  As above, incorrect for applicability to usage, undoubtedly
optimistic in that regard.

>You want to use the values "as is" as emitted by a known
>good N(0,1) generator, "good" meaning with good statistics including,
>but not limited to, a mean of zero and RMS of 1.
>

I made a quick and dirty implementation that was adequate for the job at
hand.  The only claim I made was that it was near Gaussian, I never
claimed it was AWGN.  In the 10 sample point case, there is hardly a
difference.  For larger run sizes, you may notice a difference.  I did not
measure the means, only the RMS values.  I printed those results in the
"Show me the numbers" thread.  Clearly the standard deviations of the RMS
values show the noise did not hit the target value for every run.

>(On this latter point, there are various trustworthy methods of
>constructing such a generator, and I have used several of these over 
>time.  I personally prefer to use Knuth's polar method to convert a 
>uniformly-distributed random number to a gaussian, and to choose/
>evaluate the underlying uniform random generator carefully.)
>
>Steve

In this test, for what I am trying to show, I don't really think it
matters.  Our discussion on whether to use a single set of noise patterns
for all runs or give each run a fresh pattern is independent of the
particular noise model used, agreed?

Ced
