Forums

Correlation of 2 Sequences using FFT (W/W.O FFTW)

Started by Ramoj Paruchuri July 14, 2004
Hi All,

I am very much excited to see that this group answers almost all the
doubts. But I have some trivial yet confusing doubts.

Situation/Objective:

Correlating 2 Sequences -- S = {s1,s2,s3,...sN} and C =
{c1,c2,c3,....cM} where S and C are 2 discrete set of points one of
length N and one of length M (say, M>N). We do not know anything about
their periodicity or any other properties...etc.

All we know is that these sequences are 2 sets of points. To make the
case simple,  we can assume that these values are noted/taken/observed
at constant time interval that is same for both the sequences. i.e.,
say deltat= (tK-tJ)=(tJ-tI) for all i, j, k and the sequences are
noted as below...or some thing like that:

Time   S     C
to     --    c0
t1     --    c1
t2     s1    c2
.......
........
tK    sN     cK
.....
........
tM    --     cM


Some concers that I have: 
1. I could physically correlate S to C (say), and found {s1....sN}
fits/correlates well with some section of C say {cE.....cL} how can I
find the index cE on the sequence C when I try to do this correlation
by FFT.

2. Correlation through FFT needs: a. Finding FFT(S) and FFT(C) b.
ComplexArray=FFT(S) * Conj{FFT(C)} C. IFFT{ComplexArray}

So, I need to pad (say zero padding) S and C individually to some
number 2^N (Say if I want to do a radix2 type) ---Is this correct?

3. If yes, what difference does it make to pad some zero's on top of S
and some on the bottom of S to padding all the zeros on the bottom.
-- When I am using FFTW for calculating r2c transform, do I need to
necessarily pad below the actual sequence?

4. After padding how can I say that ok, this sequence S fits with that
part of sequence C staring with some index = cE (say)?

I know certain things here and there on FFTs Applied to Correlation,
but I wanted to confirm what I understand is write. I will be very
happy and thankful if you can answer the above questions...Thanks in
advance for your patience and help.

--Ramoj Paruchuri
ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407141106.7a9ba8f2@posting.google.com>...
> Hi All, > > I am very much excited to see that this group answers almost all the > doubts. But I have some trivial yet confusing doubts.
Ah, the trivial and yet confusing questions are the fun ones; those are the ones one generally learn the most from...
> Situation/Objective: > > Correlating 2 Sequences -- S = {s1,s2,s3,...sN} and C = > {c1,c2,c3,....cM} where S and C are 2 discrete set of points one of > length N and one of length M (say, M>N). We do not know anything about > their periodicity or any other properties...etc. > > All we know is that these sequences are 2 sets of points. To make the > case simple, we can assume that these values are noted/taken/observed > at constant time interval that is same for both the sequences. i.e., > say deltat= (tK-tJ)=(tJ-tI) for all i, j, k and the sequences are > noted as below...or some thing like that: > > Time S C > to -- c0 > t1 -- c1 > t2 s1 c2 > ....... > ........ > tK sN cK > ..... > ........ > tM -- cM > > > Some concers that I have: > 1. I could physically correlate S to C (say), and found {s1....sN} > fits/correlates well with some section of C say {cE.....cL} how can I > find the index cE on the sequence C when I try to do this correlation > by FFT.
The time delay should be available in the time series after applying the IFFT to your spectrum. If IFFT is not an option, investigate the phase of your spectrum. Unwrap the phase and apply a weighted LMS procedure to find the slope of the phase, and thus the time delay, of your data. Please consult equation (5.101b) in Bendat & Piersol: Random Data, 3rd ed. Wiley Interscience, 2000 for details.
> 2. Correlation through FFT needs: a. Finding FFT(S) and FFT(C) b. > ComplexArray=FFT(S) * Conj{FFT(C)} C. IFFT{ComplexArray} > > So, I need to pad (say zero padding) S and C individually to some > number 2^N (Say if I want to do a radix2 type) ---Is this correct?
Sounds reasonable. You need to ensure that the frequency sampling is equal for the two spectra.
> 3. If yes, what difference does it make to pad some zero's on top of S > and some on the bottom of S to padding all the zeros on the bottom. > -- When I am using FFTW for calculating r2c transform, do I need to > necessarily pad below the actual sequence?
Well, that would have severe impact on the phase of the spectrum of s. If you implement a delay estimator based on phase as I suggested above, you will find that the time reference shifts according to how you zero pad the data.
> 4. After padding how can I say that ok, this sequence S fits with that > part of sequence C staring with some index = cE (say)?
I don't understand your question here?
> I know certain things here and there on FFTs Applied to Correlation, > but I wanted to confirm what I understand is write. I will be very > happy and thankful if you can answer the above questions...Thanks in > advance for your patience and help. > > --Ramoj Paruchuri
Except for question 4, where I did not understand what you mean, I think what you says makes sense and sounds reasonable. For practical use (i.e. interpretation and analysis of results) the books by Bendat, and Bendat & Piersol (there are three such books) are excellent. A good library will have a copy available of at least "Random Data" mentioned above. Rune
> Ah, the trivial and yet confusing questions are the fun ones; those > are the ones one generally learn the most from...
:) You are right!
> The time delay should be available in the time series after applying > the IFFT to your spectrum. If IFFT is not an option, investigate the > phase of your spectrum. Unwrap the phase and apply a weighted LMS > procedure to find the slope of the phase, and thus the time delay, > of your data. Please consult equation (5.101b) in > > Bendat & Piersol: Random Data, 3rd ed. > Wiley Interscience, 2000 > > for details.
This seems reasonable..Eithercase, I will try looking into that book.
> Sounds reasonable. You need to ensure that the frequency sampling is > equal for the two spectra.
Frequency sampling? or is it time sampling? -- I thought if one can ensure that time sampling is equal with the total no. of samples equal, frequency sampling of the 2 spectra are automatically equal?
> Well, that would have severe impact on the phase of the spectrum of s. > If you implement a delay estimator based on phase as I suggested above, > you will find that the time reference shifts according to how you zero > pad the data.
You are correct. But my problem was a simple correlation of 2 real no. sequences -- so my main objective would be to get a correlation coeff with the correlation index, how does it effect these? Also, in using FFTW (r2c) for FFT, zero padding at the end of the sequence is a *MUST* or does this optimize it more than padding on top and bottom?
> I don't understand your question here?
I think, I understand now...
> Except for question 4, where I did not understand what you mean, I think > what you says makes sense and sounds reasonable. For practical use (i.e. > interpretation and analysis of results) the books by Bendat, and Bendat & > Piersol (there are three such books) are excellent. A good library will > have a copy available of at least "Random Data" mentioned above. > > Rune
Thank you very much for the effort in explain the above. Help from others is very welcome..I would like to know more how this FFTW (r2c) thing works! Thanks in advance, Ramoj
ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote...
> You are correct. But my problem was a simple correlation of 2 real > no. sequences -- so my main objective would be to get a correlation > coeff with the correlation index, how does it effect these? > > Also, in using FFTW (r2c) for FFT, zero padding at the end of the > sequence is a *MUST* or does this optimize it more than padding on > top and bottom?
This has nothing in particular to do with FFTW or with its r2c transform (which is just an optimization for real data and does not affect signal-processing questions like whether the input is zero-padded etc.). Whether you zero-pad depends entirely upon how you want to treat the boundaries in the correlation function. If you don't zero pad, then you are computing the "cyclic" correlation function with periodic boundaries. If you zero pad to at least twice the length, then you are effectively computing the correlation with non-periodic, zero-extended boundaries. There are other possibilities as well (mirrored boundaries, etc). Neither of these choices is really correct if you want to estimate the correlation function of two infinite sequences via short segments thereof. Like the related problem of spectral estimation, you probably want to apply some sort of window function to the data before transforming, with or without zero-padding. Cordially, Steven G. Johnson
ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote...
> So, I need to pad (say zero padding) S and C individually to some > number 2^N (Say if I want to do a radix2 type) ---Is this correct?
If you are using FFTW, there is no need to pad to a power-of-two size. (Highly-composite sizes, say products of 2/3/5, *will* be more efficient, but any size should be handled in a reasonable amount of time.) The main reason for padding is to choose between cyclic and linear correlations, as I explained in my other message. Cordially, Steven G. Johnson
ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407150703.4a39393c@posting.google.com>...

> > Sounds reasonable. You need to ensure that the frequency sampling is > > equal for the two spectra. > > Frequency sampling? or is it time sampling? -- I thought if one can > ensure that time sampling is equal with the total no. of samples > equal, frequency sampling of the 2 spectra are automatically equal?
The discretization of the frequency axis must be equal for both spectra. They must have the same sampling frequency (which is ensured by default in all but the weirdest sequences) and they need to have the same number of frequency bins. In the case that you have two sequences of lengths N and M, where M<N, you need to zero-pad the shortest sequence so that it has a total of N samples, before computing the spectrum.
> > Well, that would have severe impact on the phase of the spectrum of s. > > If you implement a delay estimator based on phase as I suggested above, > > you will find that the time reference shifts according to how you zero > > pad the data. > > You are correct. But my problem was a simple correlation of 2 real > no. sequences -- so my main objective would be to get a correlation > coeff with the correlation index, how does it effect these?
I haven't worked through the details, but I would expect that everything is referenced to the first sample in both sequences. So if you zero-pad after the signal, you should get a cross correlation time delay that corresponds to the matching sequence directly. If you pad zeros before the reference signal, you would have to correct for that extra delay in somme way. I'm not sure how bad that would be, but it would at least introduce some extra book keeping that would otherwise be unnecessary. Padding zeros at the front could, in the worst case, also introduce wrap-around effects that might obfuscate your analysis.
> Also, in using FFTW (r2c) for FFT, zero padding at the end of the > sequence is a *MUST* or does this optimize it more than padding on > top and bottom?
I don't know what FFTW does, but I can't see any obvious reasons why anyone would zero-pad before the signal. Rune
stevenj@alum.mit.edu (Steven G. Johnson) wrote in message news:<27cfb406.0407151527.f4f825a@posting.google.com>...
> ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote... > > So, I need to pad (say zero padding) S and C individually to some > > number 2^N (Say if I want to do a radix2 type) ---Is this correct? > > If you are using FFTW, there is no need to pad to a power-of-two size. > (Highly-composite sizes, say products of 2/3/5, *will* be more > efficient, but any size should be handled in a reasonable amount of > time.) The main reason for padding is to choose between cyclic and > linear correlations, as I explained in my other message. > > Cordially, > Steven G. Johnson
Thanks to Rune and Steven! Things are quite clear now and I am happy that I am in the right direction. Rune: I tried for that book, but surprisingly Barnes & Noble doesnt have a copy in any of their stores. Thanks again! --Ramoj
ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407160625.5c522f56@posting.google.com>...

> Rune: I tried for that book, but surprisingly Barnes & Noble doesnt > have a copy in any of their stores.
I'm not all that surprised that the bookstores don't have that book. I don't know many people who actually own (or have available) a copy, but all four of them work with real-world data analysis as opposed to academic research or teaching. If a book is not used in a class, it will not be available in the bookstores. I am quite sure the book is available through bookstores on special order, as well as from www.amazon.com. Rune
Rune Allnor wrote:
> ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407160625.5c522f56@posting.google.com>... > > >>Rune: I tried for that book, but surprisingly Barnes & Noble doesnt >>have a copy in any of their stores. > > > I'm not all that surprised that the bookstores don't have that book. > I don't know many people who actually own (or have available) a copy, > but all four of them work with real-world data analysis as opposed > to academic research or teaching. If a book is not used in a class, > it will not be available in the bookstores.
I saw Bendat and Piersol's book at the Fairlakes Barnes and Noble in Fairfax, VA , USA about a month ago. I didn't have a chance to browse through it. I own the 2nd edition, does anyone have any comment on the 3rd edition? worth updating?
> > I am quite sure the book is available through bookstores on special > order, as well as from www.amazon.com. > > Rune
Hi All, 
I have some weird results....Ok! Let me explain what I did

inA[2N]; inB[2N] are my 2 input arrays which has real no.s in it.
<for now lets say, 2N is either -- 2, 4, 16, 32, ....>

I had used FFTWs to calculate FFTs.
outA[N+1][2] = FFTW_r2c (real_to_complex) { inA[2N] }
outB[N+1][2] = FFTW_r2c (real_to_complex) { inB[2N] }

Where [][0] = real part; [][1] = img. part;

Then, outB'[N+1][2] = conjugate of { outB[N+1][2] }
   (a+ib = conjugate of a-ib) -- if my math is not wrong.

Then I found outC[N+1][2] = outA[N+1][2] * outB'[N+1][2];
Then used the regular 1d complex BACKWARD Tranform function in FFTW to
find outD[N+1][2] = FFTW_1d_ (complex to complex, backward)
(outC[N+1][2])

Then does the Values in the real. part of that outD mean correlation
coefficients?
if yes, the highest correlation coeff Index is different from that
obtained from a normal/regular cross-correlation.

I am not very sure what I am doing wrong above...Any help is highly
appreciated.
Thanks,
Ramoj