Dear members,

For two real random processes X(t) and Y(t), the cross-covariance is the mean of (X(t) - MeanX(t))(Y(t) - MeanY(t)). I have learned that people calculate the correlation of two random processes to determine how similar the two processes are. I am wondering what the purpose of calculating the covariance is. If we want to find out the similarity of two processes, isn't correlation enough? Why do we need covariance?

One more thing: if correlation is for determining the similarity of two processes, what is the purpose of autocorrelation?

Thanks a lot.

Regards
A question about Covariance??
Started by ●May 29, 2006
Reply by ●May 29, 2006
VijaKhara wrote:
> [...] I am wondering what is the target of calculating the covariance? [...]
> what is the target of Auto-correlation?

I will speak as a non-expert, this not being my field. Imagine you had a noise source and you wanted to know what its decorrelation time is. Do an autocorrelation and measure the width of the "pulse" centered at zero lag. The pulse might be a simple spike, or the envelope of a high-frequency oscillation. Of course, there are other uses for autocorrelation in other fields, but you are asking about random processes.
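The decorrelation-time idea above can be sketched numerically. This is a hypothetical example (the filter length, seed, and threshold are arbitrary choices, not from the thread): white noise passed through a length-8 moving average should decorrelate after roughly 8 samples, and the width of the autocorrelation "pulse" at zero lag shows it.

```python
import numpy as np

rng = np.random.default_rng(0)

# White noise smoothed by a length-8 moving average: samples closer
# than 8 lags apart share filter taps, so they are correlated.
filt_len = 8
white = rng.standard_normal(100_000)
x = np.convolve(white, np.ones(filt_len) / filt_len, mode="valid")
x -= x.mean()
m = len(x)

# Sample autocorrelation for small lags, normalised so rho[0] = 1.
max_lag = 20
acf = np.array([np.dot(x[: m - k], x[k:]) / m for k in range(max_lag)])
rho = acf / acf[0]

# The "pulse" width: first lag where the correlation has died away.
decorr_lag = int(np.argmax(rho < 0.1))
print(decorr_lag)   # ~8, i.e. about the moving-average length
```

The 0.1 threshold is arbitrary; in practice one often quotes the lag where the autocorrelation first crosses 1/e or falls inside the sampling-noise band.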
Reply by ●May 30, 2006
VijaKhara,

I'm by no means an expert on this material, just a grad student, but hopefully the following comments will help further your knowledge. It's difficult to do justice to these concepts quickly, so your best bet is to Google for detailed explanations with illustrations.

Basic terminology and assumptions: each index of a random process is a separate random variable. For an ideal wide-sense stationary (WSS) process, all of the random variables have the same mean and variance, and they have a constant covariance for a given lag (by 'lag' I mean delaying one sequence relative to itself, or to another sequence, by a fixed number of samples). Also, if the WSS process is ergodic, we can approximate ensemble statistics (derived from many instances of a process) with means and correlations computed from a single sufficiently long instance of the process.

The autocorrelation of a random process is its correlation with itself over a range of lags. (This correlation is not to be confused with the 'correlation' of random variables, which divides out the respective standard deviations to normalize a covariance to the range -1 to 1.) If the process is zero mean, lag 0 of the autocorrelation is the process variance. The DFT of the autocorrelation is the Power Spectral Density (PSD), so for real-valued signals the autocorrelation can be efficiently calculated as the IFFT of |FFT|^2 of the signal (make sure to account for circular convolution by zero padding -- ref. overlap-add and overlap-save).

Similarly, the cross-correlation computes correlations between separate processes over a range of lags. One common use for autocorrelations and cross-correlations is fitting one or more random processes to an autoregressive model (i.e., a recursion plus a random stimulus) using the Yule-Walker equations.
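The FFT route described above is easy to check against the direct definition. A minimal sketch (the array size and seed are arbitrary): zero-padding to at least 2N-1 points turns the DFT's circular correlation into the linear one.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
n = len(x)

# FFT-based autocorrelation: pad to >= 2n - 1 samples so the circular
# convolution implied by the DFT matches the linear correlation.
nfft = 2 * n
X = np.fft.rfft(x, nfft)
acf_fft = np.fft.irfft(np.abs(X) ** 2, nfft)[:n]

# Direct (unnormalised) autocorrelation for comparison.
acf_direct = np.array([np.dot(x[: n - k], x[k:]) for k in range(n)])

print(np.allclose(acf_fft, acf_direct))  # True
```

For long signals the FFT version is O(N log N) versus O(N^2) for the direct sum, which is the whole point of the trick.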
Reply by ●May 31, 2006
VijaKhara wrote:
> [...] I am wondering what is the target of calculating the covariance? [...]

VijaKhara,

When dealing with more than one random process, it would clearly be nice to have a single number that quickly gives us an idea of how similar the processes are. For this we use the covariance, which is analogous to the variance of a single variable: a measure of how much the deviations of two or more variables or processes match. For two processes X and Y, if they are not closely related then the covariance will be small, and if they are similar then the covariance will be large. Two processes are "closely related" if their distribution spreads are almost equal and they are around the same, or very slightly different, means.

Correlation of two variables provides a measure of how the two variables affect one another: a measure of how much one random variable depends upon the other. This measure of association gives us a clue as to how well the value of one variable can be predicted from the value of the other. The correlation is equal to the average of the product of the two random variables.

With the autocorrelation Rxx(t) we can find the energy of the signal: energy of the signal = Rxx(0).
Reply by ●May 31, 2006
> One more thing, correlation is for determine the similarity of two
> processes, what is the target of Auto-correlation?

Some other useful applications include:
- prediction (i.e., forecasting)
- data compression (i.e., LPC coding of speech signals)
- de-noising
etc.
Reply by ●May 31, 2006
> [...] with Autocorrelation {Rxx(t)} we can find the energy of the signal.
> Energy of the signal = Rxx(0);

Srikar,

With DFTs we're dealing with periodic signals, which are power signals, so Rxx is defined as the limit of the mean as L -> inf. So the Fourier transform of Rxx is a PSD, not an ESD. By the same token, Rxx(0) is the average signal power, not the energy.

The cross-covariance of two random processes is close to the cross-correlation: for the cross-covariance you subtract away the mean of each process. Both are functions of lag. When you refer to the distribution spread, are you referring to lag 0 (assuming we're dealing with a WSS process)?

Also, the raw cross-correlation isn't a good indicator of how two variables relate. At the very least, you need to use the normalized cross-correlation function, which divides by the standard deviation of each process. (This is akin to the definition of correlation for random variables.) In the frequency domain there's something similar called 'coherence' (or squared coherency) that normalizes the cross spectrum by the square roots of the processes' spectra.

Bear in mind, also, that highly correlated variables may have no causal link (there's the classic example of high crime rates correlating with the number of churches/temples in a given area, where the underlying cause is actually population density). So take care with phrases like "[a] measure of how much one random variable depends upon the other". There's a great mathematical tool called Granger causality that allows a researcher to more clearly understand how processes relate, but it's not perfect either... As the old saying goes, there are three kinds of lies: lies, damned lies, and statistics.

Regards
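The normalisation point above (dividing the cross-covariance by the two standard deviations) can be illustrated with a short sketch; the mixing coefficients and seed are arbitrary choices for the example, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)   # partially correlated with x

def cross_cov(a, b):
    # Sample cross-covariance at lag 0: mean of the centred product.
    return np.mean((a - a.mean()) * (b - b.mean()))

def corr_coeff(a, b):
    # Dividing by the standard deviations gives a pure number in [-1, 1].
    return cross_cov(a, b) / (a.std() * b.std())

# Rescaling y (say, volts -> millivolts) scales the covariance by 1000
# but leaves the correlation coefficient unchanged.
print(cross_cov(x, 1000 * y) / cross_cov(x, y))    # ~1000
print(corr_coeff(x, 1000 * y) - corr_coeff(x, y))  # ~0
```

This is why a covariance by itself is hard to interpret: its size depends on the units, while the normalized coefficient does not.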
Reply by ●June 2, 2006
VijaKhara skrev:
> [...] If we want to find out the similarity of two process, correlation
> is enough, why we need covariance?

I think you are wrong. If we want to find the "similarity" between two general signals, we need to use covariance. Correlation only works if we want to test zero-mean signals.

Take an example:

x(n) = cos(pi*n/2) + 10
y(n) = sin(pi*n/2) + 3

The cos and sin terms (with zero mean) are orthogonal. Adding the mean terms destroys this orthogonality, as is easily seen.

One period, cos and sin terms only, zero mean:

x1(n) = 1 0 -1 0
y1(n) = 0 1 0 -1

<x1,y1> = 0

Adding the means:

x2(n) = 11 10 9 10
y2(n) = 3 4 3 2

<x2,y2> = 33 + 40 + 27 + 20 = 120

So two signal shapes that ought to be orthogonal end up with a non-zero correlation when a non-zero mean is added. The "spurious" correlation <x2,y2> above can easily mask the "true" match <y2,y2>, as can be seen:

<y2,y2> = 9 + 16 + 9 + 4 = 38 << 120.

Subtracting the means is crucial to finding the correct match.

> One more thing, correlation is for determine the similarity of two
> processes, what is the target of Auto-correlation?

One use is to test for periodic features in the signal.

Rune
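Rune's four-sample example is easy to check numerically; subtracting the means restores the orthogonality that the added offsets destroyed:

```python
import numpy as np

# One period of cos(pi*n/2) + 10 and sin(pi*n/2) + 3, n = 0..3.
x2 = np.array([11, 10, 9, 10])
y2 = np.array([3, 4, 3, 2])

# Raw inner product: the "spurious" correlation driven by the means.
raw = np.dot(x2, y2)                                # 120

# Covariance-style inner product: subtract each mean first.
centered = np.dot(x2 - x2.mean(), y2 - y2.mean())   # exactly zero
```

The centred vectors are just the original 1, 0, -1, 0 and 0, 1, 0, -1 patterns again, so their inner product vanishes as it should.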
Reply by ●June 2, 2006
On 2006-06-02 07:24:55 -0300, "Rune Allnor" <allnor@tele.ntnu.no> said:
> [...] Subtracting the means is crucial to find the correct match. [...]

The difference between correlation and covariance is scaling, not centering. Correlation is scaled to lie between -1 and +1. The scale factor is the product of the two standard deviations, which has the same physical dimensions as the covariance (e.g. volts squared), so the correlation is a pure number. Both correlation and covariance are taken with respect to the mean; in other words, both are centered.

One suspects that what the original poster missed was the extra word "auto", as in autocorrelation and autocovariance. There is a terminology problem here: the autocovariance is between time offsets of the same time series, while the scalar version is between two different statistical events. So a cross-covariance function is needed for time offsets of differing time series. In your example, the sine time series has zero correlation with the cosine at offset zero, but is well correlated (value 1) at an offset of 1/4 period.

The answer to the original question is that the correlation needs to also specify a timing offset. That involves the extra adjective "auto", as in autocovariance, to see whether a time series resembles a delayed version of itself, or the cross-covariance function, to see whether one time series resembles the other at some particular delay. Consider how one detects that one series is a delayed version of the other, as a simple case where the offset is important.
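The quarter-period point above can be sketched directly (the series length is an arbitrary choice): the sine and cosine series have zero correlation at lag 0 but correlation 1 at a lag of one quarter period.

```python
import numpy as np

n = np.arange(1000)
x = np.cos(np.pi * n / 2)   # period-4 series: 1, 0, -1, 0, ...
y = np.sin(np.pi * n / 2)   # period-4 series: 0, 1, 0, -1, ...

def lagged_corr(a, b, lag):
    # Correlation coefficient between a(t) and b(t + lag).
    a = a[: len(a) - lag] if lag else a
    b = b[lag:]
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Lag 0: orthogonal. Lag 1 sample = 1/4 period: sin delayed is cos.
print(round(lagged_corr(x, y, 0), 6))  # 0.0
print(round(lagged_corr(x, y, 1), 6))  # 1.0
```

This is exactly why the offset must be part of the question: asking "are these series correlated?" without a lag would miss the perfect match one sample away.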
Reply by ●June 3, 2006
Gordon Sande skrev:
> The difference between correlation and covariance is scaling and not
> centering. Correlation is scaled to lie between -1 and +1. [...] Both
> correlation and covariance are with respect to the mean, or in other
> words they are centered.

Hmmm....

What you say makes sense to me; the normalized and the non-normalized centered second-order moments both seem to be useful. However, it seems to be at odds with most texts on statistical DSP I have seen... I think. I don't have my books easily available right now, but as far as I remember, the definitions for stationary x(t) and y(t) are

Covariance:  Cxy(tau) = <x(t) - m_x, y(t+tau) - m_y>
Correlation: Rxy(tau) = <x(t), y(t+tau)>

where m_x and m_y are the means of x(t) and y(t). I have seen the normalized version named "coherence" in some texts.

The one text I remember doing things this way is

Therrien: Discrete Random Signals and Statistical Signal Processing, Prentice-Hall, 1992,

and I *think*

Bendat and Piersol: Random Data, Wiley, 2000

used the same convention.

Rune
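Rune's two definitions can be compared on a sample basis (the means, lag, and seed are arbitrary choices, and time averages stand in for the ensemble operator assuming ergodicity). For independent processes with non-zero means, Rxy(tau) is dominated by the product of the means, while Cxy(tau) stays near zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x = rng.standard_normal(n) + 10.0   # m_x = 10
y = rng.standard_normal(n) + 3.0    # m_y = 3

def Rxy(tau):
    # Correlation in the statistical-DSP convention: <x(t), y(t+tau)>
    return np.mean(x[: n - tau] * y[tau:])

def Cxy(tau):
    # Covariance: <x(t) - m_x, y(t+tau) - m_y>
    return np.mean((x[: n - tau] - x.mean()) * (y[tau:] - y.mean()))

# x and y are independent, so Cxy(tau) ~ 0 while Rxy(tau) ~ m_x * m_y = 30,
# illustrating the spurious contribution of the means that Rune described.
print(Rxy(5), Cxy(5))
```

At lag 0 the identity Rxy(0) = Cxy(0) + m_x * m_y holds exactly for the sample quantities, which makes the relationship between the two conventions explicit.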
Reply by ●June 3, 2006
On 2006-06-03 10:34:14 -0300, "Rune Allnor" <allnor@tele.ntnu.no> said:
> What you say makes sense to me; the normalized and non-normalized
> centered second-order moments seem to be useful. However, it seems
> to be at odds with most texts on statistical DSP I have seen... I
> think.

More than a little sloppiness in many of the books. The zero-mean assumption is often made several chapters earlier, so opening the book directly to the definition is a good way to miss it.

> I don't have my books easily available right now, but as far as I
> remember, the definitions for stationary x(t) and y(t) are

For scalars, getting the divisors right is easy. For time series the issue is more awkward, so it is likely to be skipped.

> Covariance:  Cxy(tau) = <x(t) - m_x, y(t+tau) - m_y>
> Correlation: Rxy(tau) = <x(t), y(t+tau)>
>
> where m_x and m_y are the means of x(t) and y(t). I have seen the
> normalized version named "coherence" in some texts.

Coherence is usually measured in the frequency domain; it is a scaled cross spectrum. It requires a bit of care in setting up the windows, as overly narrow windows lead to spuriously high coherence.

> The one text I remember doing things this way is Therrien [...] and I
> *think* Bendat and Piersol [...] used the same convention.

*If* they did, then there are many reputable books, not at the introductory level, that do not. The only excuse for that type of sloppiness is trying to avoid a highly technical presentation.

The pervasive statistical usage is that the correlation is the covariance scaled to lie in [-1, 1]. It is sloppy, but all too common, to say "correlation" when "covariance" is the correct term. "Correlation analysis" is often said when the analysis is of linear structure as measured by covariances. A correlation of 0.80 is a meaningful statement, but a covariance of 1.5 conveys no immediate meaning, since the units are unknown.






