Show me some more numbers

Started by Cedron June 4, 2015
>>There are many tricks to using computer state variables to reseed PRGs on
>>the fly so any encryption based on them can be cracked.
That should be "can't be cracked". Sorry.

Ced
---------------------------------------
Posted through http://www.DSPRelated.com
On Thu, 11 Jun 2015 20:08:43 -0500, "Cedron" <103185@DSPRelated>
wrote:

>>>There are many tricks to using computer state variables to reseed PRGs on
>>>the fly so any encryption based on them can be cracked.
>
>That should be "can't be cracked". Sorry.
>
>Ced
No, "can be cracked" was correct. This is why there is a lot of effort to build truly stochastic (i.e., natural, not deterministic or reproducible) sources for seeds and similar parameters.

Even the ring oscillators used in many applications are not considered sufficient for others, so things like transistor/diode noise embedded in the device are sometimes exploited, and even then a lot of specific design considerations are made to assure the entropy and distribution characteristics of the results are sufficient, and the standards are very, very high.

Intel has done a lot of work in this area since they embed a lot of encryption and security hardware acceleration instructions these days. Their new on-chip RNGs are very good.

Eric Jacobsen
Anchor Hill Communications
http://www.anchorhill.com
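A minimal sketch of why a deterministic PRG with a guessable seed can be cracked. It is an illustration only, not anything taken from the posts: the toy `keystream` and `recover_seed` helpers and the timestamp-style seed are hypothetical. Python's `random` module is a deterministic generator (Mersenne Twister), so the same seed always reproduces the same stream, while the `secrets` module draws from the operating system's entropy source.

```python
import random
import secrets

def keystream(seed: int, n: int) -> bytes:
    """Toy keystream from a deterministic PRG; illustration only, not real crypto."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

def recover_seed(observed: bytes, candidates):
    """Brute-force the seed by regenerating the stream for each candidate."""
    for seed in candidates:
        if keystream(seed, len(observed)) == observed:
            return seed
    return None

# Hypothetical scenario: the seed is a timestamp in seconds, so an attacker
# only has to try a small window of values around the suspected time.
true_seed = 1_434_070_123
observed = keystream(true_seed, 16)
guessed = recover_seed(observed, range(true_seed - 600, true_seed + 600))

# A seed drawn from the operating system's entropy source has no such small window.
os_seed = secrets.randbits(128)
```

The search succeeds because the candidate space is tiny; reseeding from other computer state variables only helps to the extent that those variables are themselves unpredictable.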
>Sorry, I don't have the program recoded yet for pattern reuse.
>
>Ced
It's done now.

Here are some results using canned uncooked noise. I went with 1,000 and 5,000 runs instead of the 10,000 and 50,000 because of the differences in my 3 Bin Complex with Candan 2013. Since these are mathematically equivalent, the results should be identical, as they are with smaller run sizes. The 1,000 are the first part of the 5,000. The program has been adjusted to reanchor the three bin sets above 3.5.

You can clearly see that there appears to be a bias pattern in each average column that repeats at the higher noise level and appears similar in the higher run size set. This is in contrast to the original set that started this thread, where there is no discernible pattern and no repeat of values at the next level.

However, the standard deviation columns do show the same pattern. That is, for the three bin formulas the peak is in the center, and for the two bin formula the minimum is at the center. This is explained by the SNR values of the bins. Without that explanation, it would not be clear from these data sets that it was due to the formula and not the noise.

There is no way to tell by looking at the data if the apparent bias pattern in the average column is due to the noise rather than the formulas. By using fresh noise for every row, no pattern emerges, so the noise is the logical explanation for the variation from zero.
Ced

=========================================

All values x1000     Sample Count = 10

Target Noise Level = 0.010     Run Count = 1000

Freq    Dawg Real     Dawg 2 Bin     Dawg 3 Bin     Candan 2013
         avg    std     avg    std     avg    std     avg    std
----  -------------  -------------  -------------  -------------
3.00   0.019  1.568   0.063  2.241   0.017  1.658   0.017  1.658
3.10   0.003  1.671   0.045  1.858   0.005  1.702   0.005  1.702
3.20  -0.012  1.859   0.030  1.586  -0.006  1.815  -0.006  1.815
3.30  -0.024  2.133   0.018  1.404  -0.013  1.994  -0.013  1.994
3.40  -0.029  2.500   0.010  1.298  -0.013  2.236  -0.013  2.236
3.50  -0.026  2.982   0.009  1.266  -0.003  2.552  -0.004  2.552
3.60   0.053  1.601   0.012  1.310   0.097  2.168   0.097  2.168
3.70   0.049  1.527   0.022  1.434   0.090  1.971   0.090  1.971
3.80   0.050  1.542   0.037  1.642   0.082  1.843   0.082  1.843
3.90   0.055  1.642   0.057  1.937   0.073  1.771   0.074  1.771

Target Noise Level = 0.100     Run Count = 1000

Freq    Dawg Real     Dawg 2 Bin     Dawg 3 Bin     Candan 2013
         avg    std     avg    std     avg    std     avg    std
----  -------------  -------------  -------------  -------------
3.00   0.209 15.673   0.634 22.400   0.161 16.565   0.161 16.565
3.10   0.064 16.676   0.471 18.538   0.051 16.987   0.048 16.987
3.20  -0.070 18.564   0.315 15.828  -0.052 18.137  -0.059 18.135
3.30  -0.164 21.344   0.187 14.035  -0.119 19.963  -0.131 19.961
3.40  -0.178 25.091   0.105 13.005  -0.120 22.437  -0.139 22.436
3.50  -0.067 30.002   0.081 12.687  -0.026 25.621  -0.059 25.618
3.60   0.584 16.049   0.120 13.115   0.965 21.747   0.985 21.746
3.70   0.548 15.274   0.222 14.354   0.895 19.728   0.907 19.727
3.80   0.568 15.415   0.382 16.456   0.819 18.417   0.825 18.417
3.90   0.638 16.445   0.585 19.469   0.733 17.689   0.736 17.688

Target Noise Level = 0.010     Run Count = 5000

Freq    Dawg Real     Dawg 2 Bin     Dawg 3 Bin     Candan 2013
         avg    std     avg    std     avg    std     avg    std
----  -------------  -------------  -------------  -------------
3.00   0.011  1.598   0.012  2.267   0.025  1.683   0.025  1.683
3.10   0.006  1.691  -0.007  1.901   0.017  1.721   0.017  1.721
3.20   0.002  1.857  -0.021  1.631   0.009  1.819   0.009  1.819
3.30   0.002  2.105  -0.029  1.440   0.003  1.978   0.003  1.978
3.40   0.004  2.452  -0.031  1.319   0.000  2.207   0.000  2.207
3.50   0.011  2.923  -0.029  1.271   0.001  2.520   0.000  2.520
3.60  -0.010  1.597  -0.023  1.301   0.004  2.202   0.004  2.202
3.70  -0.002  1.494  -0.013  1.411   0.012  1.959   0.012  1.959
3.80   0.006  1.485   0.000  1.605   0.019  1.793   0.019  1.793
3.90   0.015  1.570   0.014  1.885   0.025  1.699   0.025  1.699

Target Noise Level = 0.100     Run Count = 5000

Freq    Dawg Real     Dawg 2 Bin     Dawg 3 Bin     Candan 2013
         avg    std     avg    std     avg    std     avg    std
----  -------------  -------------  -------------  -------------
3.00   0.131 15.985   0.133 22.678   0.252 16.828   0.252 16.827
3.10   0.077 16.927  -0.074 19.030   0.161 17.225   0.158 17.224
3.20   0.048 18.611  -0.216 16.345   0.077 18.218   0.071 18.217
3.30   0.059 21.113  -0.298 14.422   0.018 19.816   0.007 19.815
3.40   0.129 24.603  -0.323 13.204  -0.004 22.104  -0.024 22.102
3.50   0.278 29.342  -0.298 12.717   0.015 25.220  -0.016 25.217
3.60  -0.037 16.002  -0.228 13.016   0.032 22.066   0.052 22.065
3.70   0.041 14.957  -0.124 14.132   0.124 19.626   0.135 19.625
3.80   0.136 14.866   0.007 16.076   0.202 17.948   0.208 17.947
3.90   0.242 15.717   0.152 18.895   0.254 16.981   0.257 16.981
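One way to see why the average columns repeat at roughly ten times the size when the same canned patterns are reused at ten times the noise level: to first order, each pattern's error is proportional to the noise scale. The sketch below is not the program that produced the tables. It assumes a complex tone, complex Gaussian noise, and a Jacobsen-style three-bin interpolation as a stand-in estimator; `three_bin_estimate`, `noise_errors`, the seed, and the pattern count are all hypothetical choices.

```python
import cmath
import math
import random

N = 10  # samples per frame, matching "Sample Count = 10"

def dft_bin(x, k):
    """Single DFT bin of a length-N sequence."""
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def three_bin_estimate(x, k):
    """Jacobsen-style three-bin interpolation around bin k (stand-in estimator)."""
    xm, x0, xp = dft_bin(x, k - 1), dft_bin(x, k), dft_bin(x, k + 1)
    return k - ((xp - xm) / (2 * x0 - xm - xp)).real

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

freq = 3.3
tone = [cmath.exp(2j * math.pi * freq * n / N) for n in range(N)]
clean = three_bin_estimate(tone, 3)  # noiseless estimate, used as the reference

rng = random.Random(2015)  # arbitrary fixed seed so the patterns are "canned"
patterns = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
            for _ in range(200)]

def noise_errors(level):
    """Shift in the estimate caused by each canned pattern at a given noise level."""
    return [three_bin_estimate([s + level * e for s, e in zip(tone, p)], 3) - clean
            for p in patterns]

err_lo = noise_errors(0.010)
err_hi = noise_errors(0.100)

# Close to 1 when the first-order (linear in the noise) term dominates.
print(pearson(err_lo, err_hi))
```

With fresh noise for each level the two error lists would be independent, so no such correlation between noise levels would be expected.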
>Cedron <103185@DSPRelated> wrote:
>
>>>>Centering and rescaling would help with that at the cost of
>>>>being less realistic.
>
>>>I think that's a pretty bad direction to go in.
>
>>I wouldn't call it good or bad. What you are in essence doing is
>>shortcutting using a much larger runsize. The purpose of a larger runsize
>>is to get the distributions closer to the ideal.
>
>But it's then no longer N(0,1) noise. That to me is a very big deal.
>
[...snip...]
>
>Steve
I think your distaste for centering and rescaling the noise is unfounded. I have no problem with the term "cooked" to describe it. If you think of the situation in terms of the analytical equation:

WB(Z+E) / W(Z+E) = WBZ / WZ + (Misc terms)E + H.O.T.

(I used V by mistake last time.)

E represents the DFT of the noise. By shifting the noise, only the DC bin of E will be affected. As long as your two or three bin set doesn't cover the DC bin there will be no effect. Rescaling the noise will rescale each bin of E the same. This will not alter the relative values.

If you are fine with rescaling the noise for different noise levels, then you should be comfortable with rescaling it to make the numbers more "well behaved" in terms of magnitude, so the results reflect more consistent values. By doing so, the standard deviations become a little smaller, but if your interest is building a model to quantify them, then rescaling them will give you better results. I am thinking particularly about the problem of figuring out where the cutoff between the three and two bin formulas is.

Ced
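The two DFT properties used in this argument can be checked numerically. This is a small sketch under assumed values: the frame length of 10, the offset of 0.5, the gain of 2.0, and the seed are arbitrary, and `dft` is a plain textbook implementation written here for the purpose.

```python
import cmath
import math
import random

def dft(x):
    """Direct O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

rng = random.Random(1)  # arbitrary seed, for repeatability only
noise = [rng.gauss(0.0, 1.0) for _ in range(10)]

E = dft(noise)
E_shifted = dft([v + 0.5 for v in noise])  # add a constant offset
E_scaled = dft([2.0 * v for v in noise])   # multiply by a constant gain

# Offset: only bin 0 moves, by offset * frame length = 0.5 * 10.
shift_change = [abs(a - b) for a, b in zip(E_shifted, E)]
# Gain: every bin is multiplied by the same factor.
scale_ratio = [abs(a) / abs(b) for a, b in zip(E_scaled, E)]

print([round(c, 12) for c in shift_change])
print([round(r, 12) for r in scale_ratio])
```

This covers a single pattern with one offset and one gain. Whether applying a different gain to each pattern in an ensemble changes the statistics is a separate question.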
Cedron <103185@DSPRelated> wrote:

> [Pope wrote]
>>But it's then no longer N(0,1) noise. That to me is a very big deal.
>I think your distaste for centering and rescaling the noise is unfounded.
>I have no problem with the term "cooked" to describe it. If you think of
>the situation in terms of the analytical equation:
>
>WB(Z+E) / W(Z+E) = WBZ / WZ + (Misc terms)E + H.O.T.
>
>(I used V by mistake last time.)
>
>E represents the DFT of the noise. By shifting the noise, only the DC bin
>of E will be affected. As long as your two or three bin set doesn't cover
>the DC bin there will be no effect.

I agree so far.

>Rescaling the noise will rescale each
>bin of E the same. This will not alter the relative values. If you are
>fine with rescaling the noise for different noise levels, then you should
>be comfortable with rescaling it to make the numbers more "well behaved"
>in terms of magnitude, so the results reflect more consistent values.

That's completely wrong. Those noise patterns that happen (due to the nature of the Gaussian distribution) to have large individual components are the ones dominating the error rate; and if you scale these back (because, you notice they are large) then you are screwing with your error rate.

What could be more fundamental than applying AWGN to a signal, and leaving it at that?

>doing so, the standard deviations become a little smaller, but if your
>interest is building a model to quantify them, then rescaling them will
>give you better results.

Wronger results.

Steve
Cedron <103185@DSPRelated> wrote:

>Here are some results using canned uncooked noise.
Thanks for running these.
>I went with 1,000 and
>5,000 runs instead of the 10,000 and 50,000 because of the differences in
>my 3 Bin Complex with Candan 2013. Since these are mathematically the
>results should be identical as they are with smaller run sizes. The 1,000
>are the first part of the 5,000. The program has been adjusted to
>reanchor the three bin sets above 3.5.

>You can clearly see that there appears to be a bias pattern in each
>average column that repeats at the higher noise level and appears similar
>in the higher run size set. This is in contrast to the original set that
>started this thread where there is no discernible pattern and no repeat of
>values at the next level.

This is what I would expect.

>There is no way to tell by looking at the data if the apparent bias
>pattern in the average column is due to the noise rather than the
>formulas.

I am quoting a subset of your results:

>All values x1000     Sample Count = 10

>Target Noise Level = 0.010     Run Count = 1000
>
>Freq    Dawg Real
>----  -------------
>3.00   0.019  1.568
>3.10   0.003  1.671
>3.20  -0.012  1.859
>3.30  -0.024  2.133
>3.40  -0.029  2.500
>3.50  -0.026  2.982
>3.60   0.053  1.601
>3.70   0.049  1.527
>3.80   0.050  1.542
>3.90   0.055  1.642

>Target Noise Level = 0.010     Run Count = 5000
>
>Freq    Dawg Real
>----  -------------
>3.00   0.011  1.598
>3.10   0.006  1.691
>3.20   0.002  1.857
>3.30   0.002  2.105
>3.40   0.004  2.452
>3.50   0.011  2.923
>3.60  -0.010  1.597
>3.70  -0.002  1.494
>3.80   0.006  1.485
>3.90   0.015  1.570

My conclusion looking at the above is that the simulation has not converged. That is, the run with the first 1,000 noise patterns exhibits an apparent negative bias in the "average" column for frequencies 3.2 through 3.5, but after running this out to 5,000 patterns total (including the first 1,000 as a subset) this apparent bias disappears, which means it was an artifact peculiar to the first 1,000 noise patterns and is not a real result portraying algorithm behavior.

You'd have to run it out past 5,000 to see if the averages in the run of 5000 are accurate or are also an artifact.

Do you have a different conclusion?

Steve
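A rough way to put a number on "has not converged" is to compare each average with its standard error, std / sqrt(run count). The sketch below uses the "Dawg Real" values quoted above for frequencies 3.2 through 3.5. It assumes the second number in each pair is the per-run standard deviation of the error and that the runs are independent; the `z_scores` helper is hypothetical.

```python
import math

# (average, standard deviation) pairs from the "Dawg Real" column,
# all values x1000, Target Noise Level = 0.010.
runs_1000 = {3.2: (-0.012, 1.859), 3.3: (-0.024, 2.133),
             3.4: (-0.029, 2.500), 3.5: (-0.026, 2.982)}
runs_5000 = {3.2: (0.002, 1.857), 3.3: (0.002, 2.105),
             3.4: (0.004, 2.452), 3.5: (0.011, 2.923)}

def z_scores(table, run_count):
    """Average divided by its standard error, std / sqrt(run_count)."""
    return {f: avg / (std / math.sqrt(run_count)) for f, (avg, std) in table.items()}

z_1000 = z_scores(runs_1000, 1000)
z_5000 = z_scores(runs_5000, 5000)

for f in runs_1000:
    print(f"{f:.2f}  {z_1000[f]:+.2f}  {z_5000[f]:+.2f}")
```

Under those assumptions every one of these averages lies within one standard error of zero at both run counts, so neither the negative run at 1,000 patterns nor the small positive values at 5,000 can be distinguished from sampling noise.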
[...snip...]
>
>>Rescaling the noise will rescale each
>>bin of E the same. This will not alter the relative values. If you are
>>fine with rescaling the noise for different noise levels, then you should
>>be comfortable with rescaling it to make the numbers more "well behaved"
>>in terms of magnitude, so the results reflect more consistent values.
>
>That's completely wrong. Those noise patterns that happen
>(due to the nature of the Gaussian distribution) to have large
>individual components are the ones dominating the error rate;
>and if you scale these back (because, you notice they are large) then
>you are screwing with your error rate.
>
There is no error rate because there is no tolerance specification. So it is just a matter of tightening up the variance. There are just as many expected cases that need to be increased as those that need to be decreased. So by "cooking" the numbers you can expect values that will be more similar to making a larger set of runs. However, it is not a true short cut in that the sizes of the E values are more likely to be varied than if you ran longer runs.
>What could be more fundamental than applying AWGN to a signal,
>and leaving it at that?
>

That's one way of testing; it shouldn't exclude others.

>>doing so, the standard deviations become a little smaller, but if your
>>interest is building a model to quantify them, then rescaling them will
>>give you better results.
>
>Wronger results.
>
>
>Steve

This isn't a matter of right and wrong. When I said "better" I meant in regard to getting numbers that more closely represent what an analytical solution would provide with fewer runs. Unfortunately, upping the run count has introduced a clear precision error.

As long as you understand the conditions of the test you shouldn't have trouble interpreting the results. And as we have previously agreed, the most important aspect when doing a side by side comparison is that all the formulas face the same test cases.

Ced
>Cedron <103185@DSPRelated> wrote:
>
>>Here are some results using canned uncooked noise.
>
>Thanks for running these.
>

You're welcome. It was worth it to have a clear set of numbers that reflected the statements I have made.

[...snip...]

>
>I am quoting a subset of your results:
>
>>All values x1000     Sample Count = 10
>
>>Target Noise Level = 0.010     Run Count = 1000
>>
>>Freq    Dawg Real
>>----  -------------
>>3.00   0.019  1.568
>>3.10   0.003  1.671
>>3.20  -0.012  1.859
>>3.30  -0.024  2.133
>>3.40  -0.029  2.500
>>3.50  -0.026  2.982
>>3.60   0.053  1.601
>>3.70   0.049  1.527
>>3.80   0.050  1.542
>>3.90   0.055  1.642
>
>>Target Noise Level = 0.010     Run Count = 5000
>>
>>Freq    Dawg Real
>>----  -------------
>>3.00   0.011  1.598
>>3.10   0.006  1.691
>>3.20   0.002  1.857
>>3.30   0.002  2.105
>>3.40   0.004  2.452
>>3.50   0.011  2.923
>>3.60  -0.010  1.597
>>3.70  -0.002  1.494
>>3.80   0.006  1.485
>>3.90   0.015  1.570
>
>My conclusion looking at the above is that the simulation has not
>converged. That is, the run with the first 1,000 noise patterns
>exhibits an apparent negative bias in the "average" column for frequencies
>3.2 through 3.5, but after running this out to 5,000 patterns total
>(including the first 1,000 as a subset) this apparent bias disappears,
>which means it was an artifact peculiar to the first 1,000 noise patterns
>and is not a real result portraying algorithm behavior.
>
>You'd have to run it out past 5,000 to see if the averages in the run
>of 5000 are accurate or are also an artifact.
>
>Do you have a different conclusion?
>
>
>Steve
The other cases weren't as clear cut. I still maintain that it is much easier to discern what was caused by the noise and what was caused by the formula by using fresh noise for each row. In those cases, there was no pattern in the averages, and the average values in the increased noise case were not nearly proportional to the next noise level. However, the fact that the standard deviation columns did present a clear pattern that survived with different noise cases means (most likely) that the pattern was due to the formulas.

So, in conclusion, fresh noise for each row does a better job of distinguishing which effects are due to the noise and which are due to the formula. I'm not saying your approach can't do it; it just doesn't do it as well.

Ced
Cedron <103185@DSPRelated> wrote:

> Pope says,
>> Rescaling the noise will rescale each bin of E the same.
>> This will not alter the relative values. If you are fine with
>> rescaling the noise for different noise levels, then you should
>> be comfortable with rescaling it to make the numbers more
>> "well behaved" in terms of magnitude, so the results reflect
>> more consistent values.

>>That's completely wrong. Those noise patterns that happen
>>(due to the nature of the Gaussian distribution) to have large
>>individual components are the ones dominating the error rate;
>>and if you scale these back (because, you notice they are large) then
>>you are screwing with your error rate.

>There is no error rate because there is no tolerance specification.

Then replace "error rate" by "average error" (i.e., your first column of data).

> So it is just a matter of tightening up the variance. There
> are just as many expected cases that need to be increased as
> those that need to be decreased. So by "cooking" the numbers
> you can expect values that will be more similar to making a
> larger set of runs.

I don't think so.

>>What could be more fundamental than applying AWGN to a signal,
>>and leaving it at that?

> That's one way of testing, it shouldn't exclude others.
> [..] This isn't a matter of right and wrong. When I said "better"
> I meant in regard to getting numbers that closer represent what an
> analytical solution would provide with fewer runs.

By rescaling each individual noise pattern to have the same sigma, you are destroying your results.

The purpose of creating an ensemble of 1,000 (or 10,000, or as many as is necessary) noise patterns is so that you can see how the statistics of added white Gaussian noise affect system performance.

As to whether doing this "excludes" other tests, of course it does not, but if you want to discuss the performance of a system in AWGN, and compare results with those of other investigations / investigators, you really must not do this rescaling business.

Steve
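A small sketch of what per-pattern rescaling does to the ensemble, assuming "cooking" means centering each 10-sample pattern to zero mean and dividing by its own (population) standard deviation. The pattern count, the seed, and the `cook` helper are hypothetical choices for illustration.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed, for repeatability only
patterns = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(1000)]

def cook(pattern):
    """Center a pattern to zero mean and rescale it to unit standard deviation."""
    mean = statistics.fmean(pattern)
    sigma = statistics.pstdev(pattern)
    return [(v - mean) / sigma for v in pattern]

raw_sigmas = [statistics.pstdev(p) for p in patterns]
cooked_sigmas = [statistics.pstdev(cook(p)) for p in patterns]

print(min(raw_sigmas), max(raw_sigmas))        # spread across the raw ensemble
print(min(cooked_sigmas), max(cooked_sigmas))  # all equal to 1 up to rounding
```

With only 10 samples per pattern, the raw per-pattern sigma varies widely from one draw to the next. Cooking removes that variation, which is the part of the Gaussian statistics at issue here.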
>
>Then replace "error rate" by "average error" (i.e your first column
>of data).
>
>> So it is just a matter of tightening up the variance. There
>> are just as many expected cases that need to be increased as
>> those that need to be decreased. So by "cooking" the numbers
>> you can expect values that will be more similar to making a
>> larger set of runs.
>
>I don't think so.
>

By variance here I meant the variance of the average error values. In your own words, cooking the noise will make the average error values more optimistic, meaning closer to zero. This is what a larger set of runs is expected to do as well.

[...snip...]

>
>By rescaling each individual noise pattern to have the same sigma, you
>are destroying your results.
>

"Modifying" is not "destroying".

>The purpose of creating an ensemble of 1,000 (or 10,000, or as
>many as is necessary) noise patterns is so that you can see how the
>statistics of added white Gaussian noise affect system performance.
>

Unfortunately, the precision issue messes with this.

>As to whether doing this "excludes" other tests, of course it does
>not, but if you want to discuss the performance of a system
>in AWGN, and compare results with those of other investigations /
>investigators, you really must not do this rescaling business.
>
>Steve

Well, I never claimed AWGN, all I claimed was "near Gaussian". My purpose was to back my assertion that my formula would react to noise in a similar manner to Jacobsen's estimator because his estimator is an approximation of my formula. I also said quite clearly in the Matlab Beginner thread that I was not the best person to do standard testing.

My tests did back my assertions, and they seemed to prompt both Julien and Jacobsen to run independent tests. I am more impressed by Julien's tests because he tested all the formulas against both real and complex signals whereas Jacobsen, though he mentions my formula was derived for real signals, tested only against complex signals. I am still surprised and pleased at how well it does in the complex signal case.

The bottom line is that my formula is a significant advance at the theoretical level and quite the contender at the pragmatic level. Martin Vicanek's approach still has the possibility of bettering it in noisy cases and I am working on an improvement which is much more calculation intensive, but may offer better results even yet. Stay tuned.

Ced