
Show me some more numbers

Started by Cedron June 4, 2015
>There is indeed a risk that, say, one particular noise pattern
>might be pathological with respect to a particular data pattern.
>
>In simulation, you select a runsize (i.e. a large set of noise patterns)
>large enough such that these pathological combinations appear in their
>expected incidence, such that you have created a large enough ensemble
>such that the results are not overly-dominated by outliers, but
>include them with close to their correct statistics.
>
>In a sense, you are right that varying the noise patterns with SNR
>may give you a larger ensemble effect.
Call it the portfolio diversification effect.
>That is, if you have six SNR's
>each with 10,000 different noise patterns, you may be
>able to look at the resulting curve or column of data and get
>a gist of what it would have looked like had you chosen a
>runsize of 60,000 in the first place.
>
>But it's not going to be as good a result as running the same 60,000
>noise patterns at each SNR. It is ... sort of a half-measure,
>if that makes any sense.
>
I'm not sure what you are saying. The point I made was that if you wanted to use the same noise patterns on all the SNR levels (or if you were just studying one SNR level), you would want to do it multiple times so you had multiple sets of results. This would allow you to compare the result sets for common patterns vs unique patterns. If you just ran an equivalent number of runs in one result set, you would not have any way of determining whether an observed pattern was due to the formulas or the noise.
>Also you are saying runtime is not a problem for this experiment.
>

No, it just takes a few seconds to run. Upping that some is not a problem. However, the precision limitation effect does seem to be a problem. That's one of the reasons I posted my last set of data. When I dropped from 10,000 runs to 1,000 runs, the differences between my 3 bin complex formula and Candan's 2013 disappeared.

I appreciate the commentary on improving my testing. I am going to center and rescale the noise sets so each one is zero centered with the correct target RMS.

The other thing that is nuts about my last data run is that the formula for real signals outperformed the formula for complex signals on complex signals.

This test was not meant to be comprehensive, it was meant to be a rough side by side comparison. The improvements you are talking about won't materially affect the conclusions drawn.

1) The two bin outperforms the three bin in the interior region
2) The standard deviations are roughly proportional to the noise level
3) The complex formulas don't work very well on real signals.

The second one is important because it means the point at which the two bin is better than the three bin is independent of noise level.

I am going to improve the program for the real signal case. Martin Vicanek's approach is very impressive (see earlier post in this thread by him). There is the very real possibility that he will be able to improve on my results significantly. Not in the noiseless case, of course, but when noise is present. As with the complex case, I expect a two bin solution to be better when the frequency is in the interior region between bins.
>
>Steve
Ced --------------------------------------- Posted through http://www.DSPRelated.com
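The centering and rescaling step mentioned in the post above can be sketched as follows. This is only a minimal illustration, not Ced's actual program: the function names, the seed, and the use of Python's `random.gauss` as the noise source are all assumptions made for the example.

```python
import math
import random

def mean_and_rms(values):
    """Return the sample mean and the RMS of a list of numbers."""
    mean = sum(values) / len(values)
    rms = math.sqrt(sum(x * x for x in values) / len(values))
    return mean, rms

def center_and_rescale(noise, target_rms):
    """Shift a noise list to zero mean, then scale it to the target RMS."""
    mean = sum(noise) / len(noise)
    centered = [x - mean for x in noise]
    rms = math.sqrt(sum(x * x for x in centered) / len(centered))
    return [x * target_rms / rms for x in centered]

rng = random.Random(12345)  # arbitrary seed so the run can be repeated
raw = [rng.gauss(0.0, 0.01) for _ in range(100)]
fixed = center_and_rescale(raw, target_rms=0.01)

print(mean_and_rms(raw))    # near, but not exactly, zero mean and 0.01 RMS
print(mean_and_rms(fixed))  # zero mean and 0.01 RMS up to rounding
```

Comparing the two printed pairs shows how far a single 100-point noise set typically sits from its nominal mean and RMS before any adjustment.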
How does it perform for input frequencies below bin 2 (with bin 1 being the DC bin)? I work in a field where we simulate for days and may not be able to produce one entire cycle of a sine-wave, and yet we would still like to get some estimate of distortion. If we had a high-accuracy frequency/phase estimate then we can subtract a synthesized waveform and look at the residual. 


Bob
>How does it perform for input frequencies below bin 2 (with bin 1 being
>the DC bin)? I work in a field where we simulate for days and may not be
>able to produce one entire cycle of a sine-wave, and yet we would still
>like to get some estimate of distortion. If we had a high-accuracy
>frequency/phase estimate then we can subtract a synthesized waveform and
>look at the residual.
>
>Bob
If there is only one "tone" in the signal you may be better off doing a best fit in the time domain.

Here is some output for the range you are talking about. I changed the first column from being the error to the actual frequency. This is based on 100 sample points with 1000 runs per row.

Target Noise Level = 0.000

Freq   Dawg 3 Bin
----   -------------------
0.1    0.100000   0.000000
0.2    0.200000   0.000000
0.3    0.300000   0.000000
0.4    0.400000   0.000000
0.5    0.500000   0.000000
0.6    0.600000   0.000000
0.7    0.700000   0.000000
0.8    0.800000   0.000000
0.9    0.900000   0.000000

Target Noise Level = 0.001

Freq   Dawg 3 Bin
----   -------------------
0.1    0.099868   0.003325
0.2    0.200043   0.002297
0.3    0.299992   0.001122
0.4    0.399983   0.000561
0.5    0.500010   0.000325
0.6    0.599990   0.000242
0.7    0.699990   0.000172
0.8    0.800006   0.000150
0.9    0.899999   0.000125

Target Noise Level = 0.010

Freq   Dawg 3 Bin
----   -------------------
0.1    0.094057   0.038866
0.2    0.199179   0.023595
0.3    0.300037   0.011098
0.4    0.399983   0.005504
0.5    0.499975   0.003358
0.6    0.599938   0.002333
0.7    0.699981   0.001908
0.8    0.800035   0.001511
0.9    0.900070   0.001244

Target Noise Level = 0.100

Freq   Dawg 3 Bin
----   -------------------
0.1    0.122839   0.127990
0.2    0.190757   0.157891
0.3    0.276236   0.132423
0.4    0.395439   0.058808
0.5    0.497421   0.033544
0.6    0.597931   0.024322
0.7    0.700397   0.017851
0.8    0.799699   0.015278
0.9    0.899727   0.012647

For how to solve for the amplitude and phase check out my blog article titled "Phase and Amplitude Calculation for a Pure Real Tone in a DFT: Method 1" at dsprelated.com. I also have some unpublished noise mitigation techniques that may be helpful.

Ced
>Target Noise Level = 0.100
>
>Freq   Dawg 3 Bin
>----   -------------------
>0.1    0.122839   0.127990
>0.2    0.190757   0.157891
>0.3    0.276236   0.132423
>0.4    0.395439   0.058808
>0.5    0.497421   0.033544
>0.6    0.597931   0.024322
>0.7    0.700397   0.017851
>0.8    0.799699   0.015278
>0.9    0.899727   0.012647
>
My program is not designed to wrap around below zero. The results on the lower half would be better if it were. It can be done. The standard deviations would be expected to be the mirror image of the upper values.

Ced
Thanks. Time-domain best-fits are famous for local minima problems, but maybe your technique gives a good enough initial guess that a simple gradient descent will get me the rest of the way. 

Bob
>Thanks. Time-domain best-fits are famous for local minima problems, but
>maybe your technique gives a good enough initial guess that a simple
>gradient descent will get me the rest of the way.
>
>Bob
You're welcome. How noisy is your data? Are the points evenly spaced in time?

I would make it a one dimensional gradient search on the frequency only. For each frequency guess, pretty much mimic in the time domain the technique I use in the frequency domain in my blog article. That is, generate two basis vectors and solve for the best fit.

Signal(n) = a cos( Guess * n ) + b sin( Guess * n )

Dot the equation with each basis vector and solve the resulting two equation/two unknown linear system for a and b.

Since you already know where your frequency is expected, you only have to calculate three bins of the DFT to calculate your initial guess.

Hope this helps.

Ced
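The two-basis-vector fit described above can be sketched as below. This is a minimal illustration under stated assumptions, not the code from the blog article: the name `fit_amplitudes` is made up for the example, and `guess` is taken to be in radians per sample so that it matches Signal(n) = a cos( Guess * n ) + b sin( Guess * n ).

```python
import math

def fit_amplitudes(signal, guess):
    """Least-squares a, b for signal[n] ~ a*cos(guess*n) + b*sin(guess*n).

    `guess` is the trial frequency in radians per sample (an assumption).
    """
    c = [math.cos(guess * n) for n in range(len(signal))]  # cosine basis
    s = [math.sin(guess * n) for n in range(len(signal))]  # sine basis

    # Dot products of the basis vectors with each other and with the signal.
    cc = sum(u * u for u in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(x * u for x, u in zip(signal, c))
    xs = sum(x * v for x, v in zip(signal, s))

    # Solve the 2x2 system  [cc cs; cs ss] [a; b] = [xc; xs]  by Cramer's rule.
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    return a, b

# Half a cycle in 100 samples, i.e. a tone at "0.5 bins".
w = 2.0 * math.pi * 0.5 / 100
samples = [0.7 * math.cos(w * n) - 0.3 * math.sin(w * n) for n in range(100)]
print(fit_amplitudes(samples, w))
```

Because the basis vectors are not orthogonal when the record holds less than a whole number of cycles, the cross term `cs` is kept in the system rather than assumed to be zero.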
I said:

>I would make it a one dimensional gradient search on the frequency only.
Perhaps a better idea is to measure your error (sum of the differences squared) at the guess frequency (with its a,b solution set) and at two nearby frequencies, then do a parabolic fit to the values and find the minimum. You could iterate with a smaller spacing to get a more accurate answer, but I doubt that would be necessary.

Ced
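One way to read the parabolic-fit suggestion above is sketched here. It is an illustration only: the function names, the step size, and the choice of radians per sample for the frequency are assumptions, and the fallback of returning the original guess when the three points do not bracket a minimum is a design choice made for the example.

```python
import math

def fit_error(signal, w):
    """Sum of squared residuals after the best a, b fit at frequency w."""
    c = [math.cos(w * n) for n in range(len(signal))]
    s = [math.sin(w * n) for n in range(len(signal))]
    cc = sum(u * u for u in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(x * u for x, u in zip(signal, c))
    xs = sum(x * v for x, v in zip(signal, s))
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    return sum((x - a * u - b * v) ** 2 for x, u, v in zip(signal, c, s))

def refine_frequency(signal, guess, step):
    """Fit a parabola to the error at guess-step, guess, guess+step
    and return the frequency at its vertex."""
    e_lo = fit_error(signal, guess - step)
    e_mid = fit_error(signal, guess)
    e_hi = fit_error(signal, guess + step)
    curvature = e_lo - 2.0 * e_mid + e_hi
    if curvature <= 0.0:
        return guess  # the three points do not bracket a minimum
    return guess + step * (e_lo - e_hi) / (2.0 * curvature)

true_w = 0.2  # radians per sample, chosen for the demo
samples = [math.cos(true_w * n) + 0.5 * math.sin(true_w * n) for n in range(100)]
print(refine_frequency(samples, guess=0.201, step=0.002))
```

Calling `refine_frequency` again with the returned value and a smaller `step` corresponds to the iteration mentioned in the post.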
On Sunday, June 7, 2015 at 7:46:59 AM UTC-7, Cedron wrote:
...
>
> But the noise is not known beyond its expected average and its expected
> RMS. If I would want to use "canned" noise patterns for all rows and all
> noise levels, I would at least want to recenter and rescale them so their
> average was known to be zero and their RMS known to be my target value.
> Even so, the particular distribution of values may present as some pattern
> in the data that would be indistinguishable from a pattern formed by the
> formulas.
>
...
> Ced
That first sentence makes no sense. You can test your generator and your data however you want. That's any algorithm tester's responsibility. That's why it is common to use well tested and documented generators. That's also why seeded generators are used so that anyone else can regenerate the data to test if they wish.

When generating independent finite sequences of AWGN there is an expected variance in mean known as the 'expected error in mean'. There is an expected error in variance known as the 'expected error in variance'.

A generator that produces independent sequences with a consistently larger or consistently smaller mean than the expected error in mean is broken and should be fixed or replaced and retested. A generator that produces independent sequences with consistently larger or consistently smaller error in variance than the expected error in variance is broken and should be fixed or replaced and retested.

Mean and variance are not the only statistics of a generator that can be used to evaluate the correctness of a generator of AWGN. There are similar expected errors in skew and kurtosis, for example.

Real implementations of algorithms operate on finite data sets that do not have an expected mean of zero or expected error in variance of zero and are not properly tested by cooked data. If you think smaller expected error is needed, get it by fixing your broken generator, if that is the problem, or by increasing the size of the data set, not by cooking the data.

Dale B. Dalrymple
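The generator check described above can be sketched as follows. For N independent Gaussian samples with standard deviation sigma, the standard results are that the sample mean has standard deviation sigma/sqrt(N) and the unbiased sample variance has standard deviation sigma^2 * sqrt(2/(N-1)). The function names, the seed, and the use of Python's `random.gauss` as the generator under test are assumptions made for this example.

```python
import math
import random
import statistics

def expected_errors(sigma, n):
    """Expected spread of the sample mean and of the sample variance
    for n independent N(0, sigma^2) values."""
    err_mean = sigma / math.sqrt(n)
    err_var = sigma ** 2 * math.sqrt(2.0 / (n - 1))
    return err_mean, err_var

def observed_errors(rng, sigma, n, runs):
    """Measure the spread of the sample mean and sample variance
    over `runs` independent sequences of length n."""
    means, variances = [], []
    for _ in range(runs):
        seq = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(seq))
        variances.append(statistics.variance(seq))
    return statistics.pstdev(means), statistics.pstdev(variances)

rng = random.Random(2015)  # seeded so anyone can regenerate the same data
print(expected_errors(1.0, 100))
print(observed_errors(rng, 1.0, 100, 2000))
```

If the observed spreads sit consistently above or below the expected ones across many seeds, that points at the generator rather than at the algorithm under test.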
On Sunday, June 7, 2015 at 7:46:59 AM UTC-7, Cedron wrote:
...
>
> If I would want to use "canned" noise patterns for all rows and all
> noise levels, I would at least want to recenter and rescale them so their
> average was known to be zero and their RMS known to be my target value.
> Even so, the particular distribution of values may present as some pattern
> in the data that would be indistinguishable from a pattern formed by the
> formulas.
You don't want to "recenter" the noise, and you don't want to "scale" the noise other than by a factor defined by your SNR.

The mean value of any given N values of normal (0,1) white gaussian noise, where N is finite, is going to be something other than zero, and the RMS of these values is going to be something other than one. You do not want to center/rescale these to be 0 and 1. That will give you an incorrect, and usually optimistic, simulation result.

You want to use the values "as is" as emitted by a known good N(0,1) generator, "good" meaning with good statistics including, but not limited to, a mean of zero and RMS of 1.

(On this latter point, there are various trustworthy methods of constructing such a generator, and I have used several of these over time. I personally prefer to use Knuth's polar method to convert a uniformly-distributed random number to a gaussian, and to choose/evaluate the underlying uniform random generator carefully.)

Steve
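For readers unfamiliar with it, the polar method mentioned above (commonly attributed to Marsaglia and presented in Knuth's TAOCP Vol. 2) can be sketched like this. It is not Steve's implementation: the function names are made up, and Python's built-in `random.Random` stands in for the carefully chosen uniform generator he recommends.

```python
import math
import random

def polar_gaussian_pair(uniform):
    """Return two independent N(0,1) values from a uniform [0,1) source."""
    while True:
        v1 = 2.0 * uniform() - 1.0  # uniform on [-1, 1)
        v2 = 2.0 * uniform() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:  # keep only points strictly inside the unit circle
            scale = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * scale, v2 * scale

def awgn(n, noise_rms, seed):
    """Generate n noise samples, scaled only by the target noise level."""
    rng = random.Random(seed)  # stand-in uniform generator (an assumption)
    out = []
    while len(out) < n:
        out.extend(polar_gaussian_pair(rng.random))
    return [noise_rms * x for x in out[:n]]

noise = awgn(100, noise_rms=0.01, seed=2015)
mean = sum(noise) / len(noise)
rms = math.sqrt(sum(x * x for x in noise) / len(noise))
print(mean, rms)  # close to, but not exactly, 0 and 0.01
```

Note that the samples are scaled only by the factor set by the noise level and are otherwise used as emitted, which is the practice argued for in the post.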
[...snip...]
>
>You don't want to "recenter" the noise, and you don't want to
>"scale" the noise other than by a factor defined by your
>SNR.
>
For a side by side test it doesn't matter, as long as the formulas get the same patterns. For a "real world applicability" test, this is true. To see numbers that are similar to the answer if you solved this analytically, this is definitely what you would want to do. Likewise, to try to insulate the results from effects of the noise patterns, as we were discussing, this would also do the job.
>The mean value of any given N values of normal (0,1) white gaussian
>noise, where N is finite, is going to be something other than zero, and
>the RMS of these values is going to be something other than one. You
>do not want to center/rescale these to be 0 and 1. That will give
>you an incorrect, and usually optimistic, simulation result.
Yes, but the expected value of the mean is zero, and the expected value of the RMS is 1. As above, incorrect for applicability to usage, undoubtedly optimistic in that regard.
>You want to use the values "as is" as emitted by a known
>good N(0,1) generator, "good" meaning with good statistics including,
>but not limited to, a mean of zero and RMS of 1.
>
I made a quick and dirty implementation that was adequate for the job at hand. The only claim I made was that it was near Gaussian; I never claimed it was AWGN. In the 10 sample point case, there is hardly a difference. For larger run sizes, you may notice a difference. I did not measure the means, only the RMS values. I printed those results in the "Show me the numbers" thread. Clearly the standard deviations of the RMS values show the noise did not hit the target value for every run.
>(On this latter point, there are various trustworthy methods of
>constructing such a generator, and I have used several of these over
>time. I personally prefer to use Knuth's polar method to convert a
>uniformly-distributed random number to a gaussian, and to choose/
>evaluate the underlying uniform random generator carefully.
>
>Steve
In this test, for what I am trying to show, I don't really think it matters. Our discussion on whether to use a single set of noise patterns for all runs or give each run a fresh pattern is independent of the particular noise model used, agreed?

Ced