DSPRelated.com
Forums

convolving noise with noise will get Gaussian? what does that imply?

Started by kiki November 2, 2004
Suppose I have a segment of data, which is basically random numbers between 
0 and 1. I call it "f".

Doing "f" -- the noise -- self convolution several times, the resultant 
data, if plotted, is of Gaussian shape.

I know this is somewhat related to CLT...

But what does this mean in practice? What does this imply? Does this CLT 
fact have any implication or application in practice? Any deep thoughts, 
intuitions?

Thanks a lot!
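A minimal sketch of the experiment described above, in Python with NumPy. The segment length (64), the number of self-convolutions (5) and the uniform generator are assumptions; the post does not specify them. The sketch compares the normalized result with a Gaussian that has the same centroid and spread.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the run is repeatable
f = rng.uniform(0.0, 1.0, size=64)    # the "noise" segment: values in [0, 1)

# Convolve f with itself several times (linear convolution, full output).
g = f.copy()
for _ in range(5):
    g = np.convolve(g, f)

# Compare the shape of g with a Gaussian of matching centroid and spread.
k = np.arange(g.size)
w = g / g.sum()                       # normalize so the samples sum to 1
mean = np.sum(k * w)
var = np.sum((k - mean) ** 2 * w)
gauss = np.exp(-(k - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rel_gap = np.max(np.abs(w - gauss)) / gauss.max()
print("length of g:", g.size)
print("largest gap to the Gaussian, relative to its peak:", rel_gap)
```

Setting the loop count to 1 instead of 5 gives a roughly triangular shape, which is one way to check the "Gaussian, not triangular" question raised later in the thread.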





"kiki" <lunaliu3@yahoo.com> wrote in message news:<cm8mci$sjo$1@news.Stanford.EDU>...
> Suppose I have a segment of data, which is basically random numbers between
> 0 and 1. I call it "f".
>
> Doing "f" -- the noise -- self convolution several times, the resultant
> data, if plotted, is of Gaussian shape.
Exactly what do you do, and how have you verified that your output is of Gaussian shape, and not, say, triangular?
> I know this is somewhat related to CLT...
It need not be at all. I'd bet that what you see is a "stretching effect" due to accumulating end effects from doing the multiple convolutions. I'd expect something like this to happen even if you happen to perform a circular convolution. Actually (but I may be wrong here!), I believe the CLT applies to convolving PDFs, not the random data themselves...
> But what does this mean in practice? What does this imply? Does this CLT
> fact have any implication or application in practice? Any deep thoughts,
> intuitions?
I'd expect it to be a nice learning experience with respect to practical analysis of data (including documentation of your algorithm), as well as in approaching results with a hint of critical thought. Other than that, I'm quite shallow these days.
> Thanks a lot!
Y' welcome. Rune
On Tue, 2 Nov 2004 11:15:30 -0800, "kiki" <lunaliu3@yahoo.com> wrote:

>Suppose I have a segment of data, which is basically random numbers between
>0 and 1. I call it "f".
>
>Doing "f" -- the noise -- self convolution several times, the resultant
>data, if plotted, is of Gaussian shape.
>
>I know this is somewhat related to CLT...
>
>But what does this mean in practice? What does this imply? Does this CLT
>fact have any implication or application in practice? Any deep thoughts,
>intuitions?
I don't have any deep thoughts or intuitions about what this means, but since someone else has said you're simply wrong about all this, I'll just say that yes, convolving the distribution of the noise with itself several times does lead to something roughly Gaussian. This has a lot to do with the CLT; in fact, this is exactly what the CLT _says_!
>Thanks a lot!

************************

David C. Ullrich
In article <f56893ae.0411030026.1385213f@posting.google.com>,
Rune Allnor <allnor@tele.ntnu.no> wrote:
>"kiki" <lunaliu3@yahoo.com> wrote in message news:<cm8mci$sjo$1@news.Stanford.EDU>...
>> Suppose I have a segment of data, which is basically random numbers between
>> 0 and 1. I call it "f".

>> Doing "f" -- the noise -- self convolution several times, the resultant
>> data, if plotted, is of Gaussian shape.

The convolution of identical distributions with second
moments is APPROXIMATELY normal, the approximation becoming
better with the number convolved.

The convolution of two distributions is never normal
unless both are normal.

These two theorems may seem paradoxical, but they are
both true.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
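The first statement can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original post: it samples the uniform density on [0, 1) on a grid of step 0.01 (an arbitrary choice), convolves it with itself, and measures the largest gap to the normal density with the same mean and variance, relative to the normal peak.

```python
import numpy as np

dx = 0.01                        # grid step (assumed)
p = np.ones(100)                 # uniform density on [0, 1), sampled at step dx
p /= p.sum() * dx                # normalize so that sum(p) * dx == 1

def relative_gap_to_normal(density, dx):
    """Largest |density - normal| divided by the normal peak, where the
    normal density has the same mean and variance as `density`."""
    x = np.arange(density.size) * dx
    mean = np.sum(x * density) * dx
    var = np.sum((x - mean) ** 2 * density) * dx
    normal = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.max(np.abs(density - normal)) / normal.max()

q = p.copy()
gaps = {}
for n in range(1, 9):
    if n > 1:
        q = np.convolve(q, p) * dx   # density of the sum of n uniform variables
    gaps[n] = relative_gap_to_normal(q, dx)
    print(n, gaps[n])
```

The gap shrinks as n grows but never reaches zero for any finite n, which is how the two theorems coexist.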
hrubin@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<cmbd3d$5vsm@odds.stat.purdue.edu>...
> In article <f56893ae.0411030026.1385213f@posting.google.com>,
> Rune Allnor <allnor@tele.ntnu.no> wrote:
> >"kiki" <lunaliu3@yahoo.com> wrote in message news:<cm8mci$sjo$1@news.Stanford.EDU>...
> >> Suppose I have a segment of data, which is basically random numbers between
> >> 0 and 1. I call it "f".
>
> >> Doing "f" -- the noise -- self convolution several times, the resultant
> >> data, if plotted, is of Gaussian shape.
>
> The convolution of identical distributions with second
> moments is APPROXIMATELY normal, the approximation becoming
> better with the number convolved.
>
> The convolution of two distributions is never normal
> unless both are normal.
>
> These two theorems may seem paradoxical, but they are
> both true.
I only know the CLT from statistics, where Papoulis says [1, p 214]:

"The [CLT] states that under _certain_general_conditions_..."

and

"The CLT can be expressed as a property of convolution: The
convolution of a large number of _positive_functions_
is approximately a normal function."

(My emphasis in both quotes.)

So there are "conditions" for the CLT to hold, and "positive functions"
are mentioned explicitly. My personal observation is that the "CLT" is
always mentioned in the context of "distributions", never with "data".

Now, the OP mentioned that she(?) worked with random *data* with a
probability distribution defined on the interval [0,1]. So my argument
is that the CLT cannot be invoked in this case, since the convolution
operates on the *data*, not the PDF.

I could stretch as far, though, as accepting the CLT as not entirely
irrelevant for this particular data set, which is necessarily
non-negative because of the distribution it is generated from. In
general, however, I think the CLT has no place in a discussion that
regards the *data* directly.

Rune

[1] Papoulis: "Probability, Random Variables and Stochastic Processes",
3rd ed., McGraw-Hill, 1991.
Randy Yates <randy.yates@sonyericsson.com> wrote in message news:<xxppt2tofos.fsf@usrts005.corpusers.net>...
> allnor@tele.ntnu.no (Rune Allnor) writes:
>
> > hrubin@odds.stat.purdue.edu (Herman Rubin) wrote in message news:<cmbd3d$5vsm@odds.stat.purdue.edu>...
> > > In article <f56893ae.0411030026.1385213f@posting.google.com>,
> > > Rune Allnor <allnor@tele.ntnu.no> wrote:
> > > >"kiki" <lunaliu3@yahoo.com> wrote in message news:<cm8mci$sjo$1@news.Stanford.EDU>...
> > > >> Suppose I have a segment of data, which is basically random numbers between
> > > >> 0 and 1. I call it "f".
> > >
> > > >> Doing "f" -- the noise -- self convolution several times, the resultant
> > > >> data, if plotted, is of Gaussian shape.
> > >
> > > The convolution of identical distributions with second
> > > moments is APPROXIMATELY normal, the approximation becoming
> > > better with the number convolved.
> > >
> > > The convolution of two distributions is never normal
> > > unless both are normal.
> > >
> > > These two theorems may seem paradoxical, but they are
> > > both true.
> >
> > I only know the CLT from statistics, where Papoulis says [1, p 214]:
> >
> > "The [CLT] states that under _certain_general_conditions_..."
> >
> > and
> >
> > "The CLT can be expressed as a property of convolution: The
> > convolution of a large number of _positive_functions_
> > is approximately a normal function."
> >
> > (My emphasis in both quotes.)
> >
> > So there are "conditions" for the CLT to hold, and "positive functions"
> > are mentioned explicitly. My personal observation is that the "CLT" is
> > always mentioned in the context of "distributions", never with "data".
>
> Rune,
>
> In my estimation, you're either confused or blinding yourself by being
> overly pedantic.
In the past, I've been known to be both... ;)
> First of all, a PDF is *ALWAYS* a "positive function,"
Agreed.
> so your caveat
> on this point is empty.
It is pretty obvious from the original post that what is being convolved
here is the (positive) random data that comply with a (positive) PDF, not
the PDF itself. Or, to rephrase the original question a bit more
pedantically:

[ Note that I have changed the notation to comply with "standard"
conventions. The fact that a random variable by chance is referred
to by the letter "f" does *not* transform the variable into a PDF! ]

A random vector x = [x_0, x_1, ..., x_N] is drawn from a random process X.
The process X is characterized by an (unspecified) probability density
function f such that each coefficient x_n of x obeys

0 < x_n < 1, n = 0, 1, ..., N.

The discussion relates to repeated convolutions of the random vector x,
not the PDF f. In fact, the PDF f has never been explicitly defined in
this thread.

I am sure you agree with me that the Gaussian N(0,1) probability
distribution function is positive for all arguments. I am also sure you
agree with me that the random data generated by a stochastic process
characterized by the N(0,1) PDF will *not* be positive.

So where does the CLT apply? With positive PDFs? With not necessarily
positive random data? That's the whole point here.
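The distinction between positive and signed data can be illustrated with a short sketch. It is an assumption-laden illustration, not the OP's code: the vector length (64), the number of self-convolutions (5) and the two generators are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(1)        # fixed seed, for repeatability

def self_convolve(x, times):
    """Linear convolution of x with itself, repeated `times` times."""
    y = x.copy()
    for _ in range(times):
        y = np.convolve(y, x)
    return y

positive = rng.uniform(0.0, 1.0, size=64)   # data from a PDF supported on [0, 1)
zero_mean = rng.standard_normal(64)         # data from N(0, 1): takes both signs

g_pos = self_convolve(positive, 5)
g_zm = self_convolve(zero_mean, 5)

print("positive data : negative output samples =", int(np.sum(g_pos < 0)))
print("zero-mean data: negative output samples =", int(np.sum(g_zm < 0)),
      "of", g_zm.size)
```

A non-negative vector behaves like an unnormalized "positive function", so its repeated self-convolution stays non-negative and can take on a bell shape. The signed vector gives an output that changes sign, so it cannot settle into a bell shape in the same way.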
> Secondly, when you add independent random variables, the distribution
> of the result is the convolution of the input variables' PDFs.
Exactly. But the OP talked about "data", not "PDFs". One of my severe shortcomings in life is that I am not clairvoyant. I can't look into the minds of other people and see what they actually mean to ask. I have to relate to what they express either orally or in writing. The OP explicitly used the term "data". So I try to discuss "data". If the OP meant to discuss PDFs, well, so be it. If so, she(?) phrased the question very poorly. And did the wrong experiment as well.
> What's
> the difference between adding "data" that is from two random variables
> and adding the random variables themselves? I say "nothing."
I have no problems with that. But what you say here has nothing to do with PDFs. The CLT as I know it, applies to PDFs, not random variables.
> Both Papoulis and Vinniotis [1] state that in practice, adding about 30
> variables results in a Gaussian distribution.
I am sure they are right. But I can't see what this has to do with *convolving* the random variables, which was what the OP did. The last time I checked, "addition" and "convolution" were two different operations. Again, the CLT as I know it applies to PDFs. Please show me references to the CLT being extended to arbitrary non-positive functions.
> [1] Yannis Vinniotis, "Probability and Random Process for Electrical
> Engineers," C1998, McGraw-Hill.
Rune
allnor@tele.ntnu.no (Rune Allnor) writes:
> [...]
> The CLT as I know it, applies to PDFs, not random variables.
Do you mean that the convolution it speaks of is of the PDFs, not the random variables themselves? I agree with that. However, to say that the CLT does not apply to random variables is pretty much false in my book - it's all about random variables.
> [...] But I can't see what this has to do with *convolving* the
> random variables, which was what the OP did.
I didn't read the original post clearly enough. It does appear that is what he is saying. Of course that is a different operation than adding the R.V.s.
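The difference between the two operations can be sketched as follows, assuming two independent uniform sample vectors (the sample count and bin count are arbitrary choices for the illustration). Adding the samples element by element yields new samples whose histogram follows the convolution of the two PDFs, which for two uniform densities on [0, 1] is the triangle on [0, 2]. Convolving the sample vectors yields a sequence of a different length with no such interpretation.

```python
import numpy as np

rng = np.random.default_rng(2)        # fixed seed, for repeatability
n_samples = 200_000
a = rng.uniform(0.0, 1.0, n_samples)
b = rng.uniform(0.0, 1.0, n_samples)

# Adding the random variables: one new sample per pair.
s = a + b
hist, edges = np.histogram(s, bins=40, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Convolving the two uniform PDFs gives the triangular density on [0, 2].
triangle = 1.0 - np.abs(centers - 1.0)
hist_gap = np.max(np.abs(hist - triangle))
print("max |histogram - triangle| =", hist_gap)

# Convolving the data vectors is a different operation: the output has
# length 2*n - 1 and is not a set of samples of a + b.
c = np.convolve(a[:1000], b[:1000])
print("len(a + b) =", s[:1000].size, " len(conv(a, b)) =", c.size)
```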
> The last time I checked, "addition" and "convolution" were two
> different operations.
That comes across as extremely smart-assed, Rune. Maybe that's my misinterpretation, but I thought I'd let you know.
> Again, the CLT as I know it applies to PDFs. Please show me references
> to the CLT being extended to arbitrary non-positive functions.
Again, as in the first paragraph above, the CLT does "apply" to random variables. The proper thing to do is supply a version of the theorem and show this from the language of the theorem, but I'm not motivated enough at the moment.

To make a long story short, I thought the OP was talking about adding the noise data. If that's not the operation being performed, then I agree that the CLT doesn't apply.
--
%  Randy Yates                  % "Rollin' and riding and slippin' and
%% Fuquay-Varina, NC            %  sliding, it's magic."
%%% 919-577-9882                %
%%%% <yates@ieee.org>           % 'Living' Thing', *A New World Record*, ELO
http://home.earthlink.net/~yatescr
Randy Yates <yates@ieee.org> wrote in message news:<7jp0bmy9.fsf@ieee.org>...
> allnor@tele.ntnu.no (Rune Allnor) writes:
> > [...]
> > The CLT as I know it, applies to PDFs, not random variables.
>
> Do you mean that the convolution it speaks of is of the PDFs, not
> the random variables themselves?
That's exactly what I have been saying during this whole thread!
> I agree with that.
Good.
> However, to
> say that the CLT does not apply to random variables is pretty
> much false in my book - it's all about random variables.
OK, if we have to go nit-picking, here's my 2c: A "random process" generates "random variables" (or "random data"), RVs, that in some way are characterized by a "Probability Density Function", PDF. In that sense, the RV and the PDF are interconnected in that both are associated with a random process. The "CLT operator" takes multiple PDFs as input and produces one PDF as output. When I look at the inner workings of the CLT, I see PDFs, not RVs. I could have agreed with you if you said "it's all about random _processes_". You didn't.
> > [...] But I can't see what this has to do with *convolving* the
> > random variables, which was what the OP did.
>
> I didn't read the original post clearly enough. It does appear that
> is what he is saying. Of course that is a different operation than
> adding the R.V.s.
Good. We agree.
> > The last time I checked, "addition" and "convolution" were two
> > different operations.
>
> That comes across as extremely smart-assed, Rune. Maybe that's my
> misinterpretation, but I thought I'd let you know.
Too bad. Still, one ought to be aware that different words more often than not mean different things. More than that, it depends to a large extent on the context whether any particular word makes sense or not. My point is that mentioning the CLT only makes sense when studying PDFs. The OP tried to link the CLT directly to the random variable. A "PDF" and a "random variable" are, like it or not, two different things just as "addition" and "convolution" are two different things. If stating this makes me a smart-ass, well, so be it. Like it or not, but such subtle points are essential to this discussion.
> > Again, the CLT as I know it applies to PDFs. Please show me references
> > to the CLT being extended to arbitrary non-positive functions.
>
> Again, as in the first paragraph above, the CLT does "apply" to random
> variables. The proper thing to do is supply a version of the theorem
> and show this from the language of the theorem, but I'm not motivated
> enough at the moment.
I haven't seen that done, and based on the text I quoted a couple of posts ago, I doubt the CLT is as general as that. I would prefer to see a proof that the "CLT operator" works as well with RVs as it does with PDFs.
> To make a long story short, I thought the OP was talking about adding
> the noise data. If that's not the operation being performed, then I
> agree that the CLT doesn't apply.
Good. We agree. Rune
allnor@tele.ntnu.no (Rune Allnor) writes:

> Randy Yates <yates@ieee.org> wrote in message news:<7jp0bmy9.fsf@ieee.org>...
> > allnor@tele.ntnu.no (Rune Allnor) writes:
> > > [...]
> > > The CLT as I know it, applies to PDFs, not random variables.
> >
> > Do you mean that the convolution it speaks of is of the PDFs, not
> > the random variables themselves?
>
> That's exactly what I have been saying during this whole thread!
No, that's not "exactly" what you've been saying, and that is part of my issue with you, Rune. You said, exactly,

  The CLT as I know it, applies to PDFs, not random variables.

This statement does not mention convolution.
> > However, to
> > say that the CLT does not apply to random variables is pretty
> > much false in my book - it's all about random variables.
>
> OK, if we have to go nit-picking,
If I'm nit-picking, then so is Papoulis. His section on the CLT begins like this:

  Given n independent *RVs* x_i, we form their sum

    x = x_1 + ... + x_n

  This is an *RV* with mean ... and variance ... . ... Furthermore, if
  the *RVs* x_i are of continuous type, ... the density f(x) of x
  approaches a normal density ... . This important theorem ... .

[emphases mine]. He CLEARLY associates the CLT with RVs.

Now it is true that he also goes on to say "The CLT can be expressed as a property of convolutions ...", but it seems pretty clear that the main interpretation and utility of the CLT is in association with RVs. To divorce it from RVs and speak only of convolving "positive functions," while theoretically accurate, robs it of its real value: explaining why randomness in nature is often Gaussian.
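The random-variable form of the theorem quoted above can be sketched directly. The choices below are assumptions made for illustration: uniform terms on [0, 1), n = 30 terms (the figure cited earlier in the thread) and 100,000 trials.

```python
import numpy as np

rng = np.random.default_rng(3)      # fixed seed, for repeatability
n, trials = 30, 100_000             # terms per sum and number of sums (assumed)

# Each row holds n independent uniform RVs; x = x_1 + ... + x_n per row.
x = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)

mean, var = n / 2.0, n / 12.0       # mean and variance of a sum of n U(0,1) RVs
z = (x - mean) / np.sqrt(var)       # standardized sum

# Fraction of standardized sums within k standard deviations of the mean.
# For an exactly normal RV these are about 0.683, 0.954 and 0.997.
within = {k: float(np.mean(np.abs(z) < k)) for k in (1, 2, 3)}
print(within)
```

Here the sum runs over independent variables, one sum per trial. No data vector is convolved anywhere; the convolution appears only implicitly, in the density of x.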
> here's my 2c: A "random process"
> generates "random variables" (or "random data"), RVs, that in some
> way are characterized by a "Probability Density Function", PDF.
> In that sense, the RV and the PDF are interconnected in that both
> are associated with a random process.
Wow. Now that's rich, Rune. After two courses in Random Processes and another two in basic probability theory, I've never heard anyone condition the association of a RV and its PDF on an association with a random process. I don't know where you've come up with that idea, but it is completely unorthodox in my experience.
> The "CLT operator"
Huh? Since when was anyone talking about a "CLT operator"? You've just now introduced new language. The topic of discussion thus far has been about a theorem, the "Central Limit Theorem," NOT an operator!
> takes multiple PDFs as input and produces one
> PDF as output. When I look at the inner workings of the CLT, I see
> PDFs, not RVs. I could have agreed with you if you said "it's all
> about random _processes_". You didn't.
No, I certainly did not, because the CLT (reverting to the terminology that we've been using) at least as presented by Papoulis, is not about a random process. It has NOTHING to do with random processes.
> My point is that mentioning the CLT only makes sense when studying
> PDFs.
I heartily disagree, for the reasons I've already explained above.
> The OP tried to link the CLT directly to the random variable.
As well he should. The only problem is, he apparently did so improperly (i.e., via convolution of the RVs rather than the sum of the RVs).
--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Randy Yates <randy.yates@sonyericsson.com> wrote in message news:<xxpzn1u9xzh.fsf@usrts005.corpusers.net>...
> allnor@tele.ntnu.no (Rune Allnor) writes:
>
> > Randy Yates <yates@ieee.org> wrote in message news:<7jp0bmy9.fsf@ieee.org>...
> > > allnor@tele.ntnu.no (Rune Allnor) writes:
> > > > [...]
> > > > The CLT as I know it, applies to PDFs, not random variables.
> > >
> > > Do you mean that the convolution it speaks of is of the PDFs, not
> > > the random variables themselves?
> >
> > That's exactly what I have been saying during this whole thread!
>
> No, that's not "exactly" what you've been saying, and that is part
> of my issue with you, Rune. You said, exactly,
>
> The CLT as I know it, applies to PDFs, not random variables.
>
> This statement does not mention convolution.
C'mon, Randy. If you read the whole thread (including your own posts), you will find that the OP convolved the data in the first place. In fact, you yourself wrote

"Secondly, when you add independent random variables, the distribution of the result is the convolution of the input variables' PDFs."

in the post of November 4th (your first post in this thread). We agree on the basic properties of the CLT; why do you make such a fuss about disagreeing with me now?
> > > However, to
> > > say that the CLT does not apply to random variables is pretty
> > > much false in my book - it's all about random variables.
> >
> > OK, if we have to go nit-picking,
>
> If I'm nit-picking, then so is Papoulis. His section on the CLT
> begins like this:
>
> Given n independent *RVs* x_i, we form their sum
>
> x = x_1 + ... + x_n
>
> This is an *RV* with mean ... and variance ... . ... Furthermore, if
> the *RVs* x_i are of continuous type, ... the density f(x) of x
> approaches a normal density ... . This important theorem ... .
>
> [emphases mine]. He CLEARLY associates the CLT with RVs.
I have never contested that. But if you want the CLT to work and produce Gaussian distributions, you need to work on the PDFs.
> Now it is true that he also goes on to say "The CLT can be expressed
> as a property of convolutions ...", but it seems pretty clear that the
> main interpretation and utility of the CLT is in association with RVs.
> To divorce it from RVs and speak only of convolving "positive
> functions," while theoretically accurate,
Make up your mind. Do you agree that what is convolved to produce results according to the CLT are PDFs, or do you not agree?
> robs it of its real value:
> explaining why randomness in nature is often Gaussian.
No. The CLT is an ad hoc excuse for the analyst to stay with the nice and easily tractable Gaussian distributions instead of diving into the more tricky ones. The CLT does not "make a non-Gaussian process Gaussian", it only provides some comfort in stating that one does not make a very big mistake if one chooses to work under the Gaussian hypothesis.
> > here's my 2c: A "random process"
> > generates "random variables" (or "random data"), RVs, that in some
> > way are characterized by a "Probability Density Function", PDF.
> > In that sense, the RV and the PDF are interconnected in that both
> > are associated with a random process.
>
> Wow. Now that's rich, Rune. After two courses in Random Processes
> and another two in basic probability theory, I've never heard anyone
> condition the association of a RV and its PDF on an association
> with a random process. I don't know where you've come up with that
> idea, but it is completely unorthodox in my experience.
Is a "random process" unorthodox to you? (OK, I should perhaps have used the term "stochastic process", but I didn't want to go pedantic on you...) Hey, Randy, this is a joke, right?
> > The "CLT operator"
>
> Huh? Since when was anyone talking about a "CLT operator"? You've just
> now introduced new language. The topic of discussion thus far has been
> about a theorem, the "Central Limit Theorem," NOT an operator!
I'm not introducing new language. If you take a course on linear systems in maths, you'll find the term "operator" used all over the place. Particularly in the context of convolution integrals. If you express the CLT as a property of the expression

y_CLT = y_1 (*) y_2 (*) ... (*) y_N

where (*) means convolution and y_n are PDFs, the term "CLT operator" makes perfect sense.
> > takes multiple PDFs as input and produces one
> > PDF as output. When I look at the inner workings of the CLT, I see
> > PDFs, not RVs. I could have agreed with you if you said "it's all
> > about random _processes_". You didn't.
>
> No, I certainly did not, because the CLT (reverting to the terminology
> that we've been using) at least as presented by Papoulis, is not about
> a random process. It has NOTHING to do with random processes.
Well, you may disagree with my approach to these matters and the exact way I interpret the problem and phrase my opinions. You should be very careful about how you state your objections, though. You might find yourself in a position you cannot defend.
> > My point is that mentioning the CLT only makes sense when studying
> > PDFs.
>
> I heartily disagree, for the reasons I've already explained above.
Please, Randy, I know you don't mean this. Yes, the effect of adding several random variables is the reason why the CLT is interesting. Arguing *why* the CLT works, and *how*, requires studying stochastic processes and the convolution of their PDFs. Not the random variables. For the simple reason that given a random vector, you don't know anything about its PDF. You can form an opinion, based on a histogram, but you don't know. The concept of a PDF only makes sense in the context of a stochastic process.
> > The OP tried to link the CLT directly to the random variable.
>
> As well he should. The only problem is, he apparently did so
> improperly (i.e., via convolution of the RVs rather than the
> sum of the RVs).
The OP used random data (a single realization of a random variable) where a PDF should have been used. The exact nature of the PDF was never specified (not even an estimate through a histogram), and no histogram of the resulting data was used. The important difference between a stochastic process generating random variables, and the random data as a realization of such a random variable, was never grasped. The question was phrased in a way that disagreed just enough with standard terminology to cause confusion (denoting the random variable by the symbol "f", which usually is reserved for PDFs). Apart from that, the OP did an excellent job in verifying the CLT.

Rune
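A version of the experiment that works on an estimated PDF rather than on the data could look like the sketch below. Everything in it is an assumption made for illustration: uniform data, 10,000 samples, 20 histogram bins and five self-convolutions. The histogram is only an estimate of the PDF, as noted above.

```python
import numpy as np

rng = np.random.default_rng(4)                # fixed seed, for repeatability
data = rng.uniform(0.0, 1.0, size=10_000)     # one realization of the noise

# Histogram estimate of the PDF (density=True makes it integrate to 1).
pdf_est, edges = np.histogram(data, bins=20, range=(0.0, 1.0), density=True)
dx = edges[1] - edges[0]

# Convolve the *estimated PDF* with itself, not the data.
y = pdf_est.copy()
for _ in range(5):
    y = np.convolve(y, pdf_est) * dx          # estimated density of a 6-term sum

# Grid for the result: six bin centers add up to (k + 3) * dx.
x = (np.arange(y.size) + 3.0) * dx
mean = np.sum(x * y) * dx
var = np.sum((x - mean) ** 2 * y) * dx
normal = np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

rel_gap = np.max(np.abs(y - normal)) / normal.max()
print("largest gap to the normal density, relative to its peak:", rel_gap)
```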