comp.dsp | Correlation of 2 Sequences using FFT (W/W.O FFTW)

Hi All,

I am very much excited to see that this group answers almost all the
doubts. But I have some trivial yet confusing doubts.

Situation/Objective:

Correlating 2 Sequences -- S = {s1,s2,s3,...sN} and C =
{c1,c2,c3,....cM} where S and C are 2 discrete set of points one of
length N and one of length M (say, M>N). We do not know anything about
their periodicity or any other properties...etc.

All we know is that these sequences are 2 sets of points. To make the
case simple,  we can assume that these values are noted/taken/observed
at constant time interval that is same for both the sequences. i.e.,
say deltat= (tK-tJ)=(tJ-tI) for all i, j, k and the sequences are
noted as below...or some thing like that:

Time   S     C
to     --    c0
t1     --    c1
t2     s1    c2
.......
........
tK    sN     cK
.....
........
tM    --     cM


Some concers that I have: 
1. I could physically correlate S to C (say), and found {s1....sN}
fits/correlates well with some section of C say {cE.....cL} how can I
find the index cE on the sequence C when I try to do this correlation
by FFT.

2. Correlation through FFT needs: a. Finding FFT(S) and FFT(C) b.
ComplexArray=FFT(S) * Conj{FFT(C)} C. IFFT{ComplexArray}

So, I need to pad (say zero padding) S and C individually to some
number 2^N (Say if I want to do a radix2 type) ---Is this correct?

3. If yes, what difference does it make to pad some zero's on top of S
and some on the bottom of S to padding all the zeros on the bottom.
-- When I am using FFTW for calculating r2c transform, do I need to
necessarily pad below the actual sequence?

4. After padding how can I say that ok, this sequence S fits with that
part of sequence C staring with some index = cE (say)?

I know certain things here and there on FFTs Applied to Correlation,
but I wanted to confirm what I understand is write. I will be very
happy and thankful if you can answer the above questions...Thanks in
advance for your patience and help.

--Ramoj Paruchuri

Reply by Rune Allnor ●July 15, 20042004-07-15

ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407141106.7a9ba8f2@posting.google.com>...
> Hi All,
> 
> I am very much excited to see that this group answers almost all the
> doubts. But I have some trivial yet confusing doubts.

Ah, the trivial and yet confusing questions are the fun ones; those 
are the ones one generally learn the most from...

> Situation/Objective:
> 
> Correlating 2 Sequences -- S = {s1,s2,s3,...sN} and C =
> {c1,c2,c3,....cM} where S and C are 2 discrete set of points one of
> length N and one of length M (say, M>N). We do not know anything about
> their periodicity or any other properties...etc.
> 
> All we know is that these sequences are 2 sets of points. To make the
> case simple,  we can assume that these values are noted/taken/observed
> at constant time interval that is same for both the sequences. i.e.,
> say deltat= (tK-tJ)=(tJ-tI) for all i, j, k and the sequences are
> noted as below...or some thing like that:
> 
> Time   S     C
> to     --    c0
> t1     --    c1
> t2     s1    c2
> .......
> ........
> tK    sN     cK
> .....
> ........
> tM    --     cM
> 
> 
> Some concers that I have: 
> 1. I could physically correlate S to C (say), and found {s1....sN}
> fits/correlates well with some section of C say {cE.....cL} how can I
> find the index cE on the sequence C when I try to do this correlation
> by FFT.

The time delay should be available in the time series after applying 
the IFFT to your spectrum. If IFFT is not an option, investigate the 
phase of your spectrum. Unwrap the phase and apply a weighted LMS 
procedure to find the slope of the phase, and thus the time delay, 
of your data. Please consult equation (5.101b) in

Bendat & Piersol: Random Data, 3rd ed.
      Wiley Interscience, 2000

for details.

> 2. Correlation through FFT needs: a. Finding FFT(S) and FFT(C) b.
> ComplexArray=FFT(S) * Conj{FFT(C)} C. IFFT{ComplexArray}
> 
> So, I need to pad (say zero padding) S and C individually to some
> number 2^N (Say if I want to do a radix2 type) ---Is this correct?

Sounds reasonable. You need to ensure that the frequency sampling is 
equal for the two spectra.

> 3. If yes, what difference does it make to pad some zero's on top of S
> and some on the bottom of S to padding all the zeros on the bottom.
> -- When I am using FFTW for calculating r2c transform, do I need to
> necessarily pad below the actual sequence?

Well, that would have severe impact on the phase of the spectrum of s. 
If you implement a delay estimator based on phase as I suggested above, 
you will find that the time reference shifts according to how you zero
pad the data. 

> 4. After padding how can I say that ok, this sequence S fits with that
> part of sequence C staring with some index = cE (say)?

I don't understand your question here?

> I know certain things here and there on FFTs Applied to Correlation,
> but I wanted to confirm what I understand is write. I will be very
> happy and thankful if you can answer the above questions...Thanks in
> advance for your patience and help.
> 
> --Ramoj Paruchuri

Except for question 4, where I did not understand what you mean, I think 
what you says makes sense and sounds reasonable. For practical use (i.e.
interpretation and analysis of results) the books by Bendat, and Bendat &
Piersol (there are three such books) are excellent. A good library will 
have a copy available of at least "Random Data" mentioned above.

Rune

Reply by Ramoj Paruchuri ●July 15, 20042004-07-15

> Ah, the trivial and yet confusing questions are the fun ones; those 
> are the ones one generally learn the most from...

 :) You are right!

> The time delay should be available in the time series after applying 
> the IFFT to your spectrum. If IFFT is not an option, investigate the 
> phase of your spectrum. Unwrap the phase and apply a weighted LMS 
> procedure to find the slope of the phase, and thus the time delay, 
> of your data. Please consult equation (5.101b) in
> 
> Bendat & Piersol: Random Data, 3rd ed.
>       Wiley Interscience, 2000
> 
> for details.

  This seems reasonable..Eithercase, I will try looking into that
book.

> Sounds reasonable. You need to ensure that the frequency sampling is 
> equal for the two spectra.

  Frequency sampling? or is it time sampling? -- I thought if one can
ensure that time sampling is equal with the total no. of samples
equal, frequency sampling of the 2 spectra are automatically equal?

> Well, that would have severe impact on the phase of the spectrum of s. 
> If you implement a delay estimator based on phase as I suggested above, 
> you will find that the time reference shifts according to how you zero
> pad the data. 

 You are correct. But my problem was a simple correlation of 2 real
no. sequences -- so my main objective would be to get a correlation
coeff with the correlation index, how does it effect these?

 Also, in using FFTW (r2c) for FFT, zero padding at the end of the
sequence is a  *MUST* or does this optimize it more than padding on
top and bottom?

> I don't understand your question here?

I think, I understand now...

> Except for question 4, where I did not understand what you mean, I think 
> what you says makes sense and sounds reasonable. For practical use (i.e.
> interpretation and analysis of results) the books by Bendat, and Bendat &
> Piersol (there are three such books) are excellent. A good library will 
> have a copy available of at least "Random Data" mentioned above.
> 
> Rune

Thank you very much for the effort in explain the above. Help from
others is very welcome..I would like to know more how this FFTW (r2c)
thing works!

Thanks in advance,
Ramoj

Reply by Steven G. Johnson ●July 15, 20042004-07-15

ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote...
>  You are correct. But my problem was a simple correlation of 2 real
> no. sequences -- so my main objective would be to get a correlation
> coeff with the correlation index, how does it effect these?
> 
>  Also, in using FFTW (r2c) for FFT, zero padding at the end of the
> sequence is a  *MUST* or does this optimize it more than padding on
> top and bottom?

This has nothing in particular to do with FFTW or with its r2c
transform (which is just an optimization for real data and does not
affect signal-processing questions like whether the input is
zero-padded etc.).

Whether you zero-pad depends entirely upon how you want to treat the
boundaries in the correlation function.  If you don't zero pad, then
you are computing the "cyclic" correlation function with periodic
boundaries.  If you zero pad to at least twice the length, then you
are effectively computing the correlation with non-periodic,
zero-extended boundaries.  There are other possibilities as well
(mirrored boundaries, etc).

Neither of these choices is really correct if you want to estimate the
correlation function of two infinite sequences via short segments
thereof.  Like the related problem of spectral estimation, you
probably want to apply some sort of window function to the data before
transforming, with or without zero-padding.

Cordially,
Steven G. Johnson

Reply by Steven G. Johnson ●July 15, 20042004-07-15

ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote...
> So, I need to pad (say zero padding) S and C individually to some
> number 2^N (Say if I want to do a radix2 type) ---Is this correct?

If you are using FFTW, there is no need to pad to a power-of-two size.
 (Highly-composite sizes, say products of 2/3/5, *will* be more
efficient, but any size should be handled in a reasonable amount of
time.)  The main reason for padding is to choose between cyclic and
linear correlations, as I explained in my other message.

Cordially,
Steven G. Johnson

Reply by Rune Allnor ●July 16, 20042004-07-16

ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407150703.4a39393c@posting.google.com>...

> > Sounds reasonable. You need to ensure that the frequency sampling is 
> > equal for the two spectra.
> 
>   Frequency sampling? or is it time sampling? -- I thought if one can
> ensure that time sampling is equal with the total no. of samples
> equal, frequency sampling of the 2 spectra are automatically equal?

The discretization of the frequency axis must be equal for both spectra.
They must have the same sampling frequency (which is ensured by default
in all but the weirdest sequences) and they need to have the same number 
of frequency bins. 

In the case that you have two sequences of lengths N and M, where M<N,
you need to zero-pad the shortest sequence so that it has a total of 
N samples, before computing the spectrum. 

> > Well, that would have severe impact on the phase of the spectrum of s. 
> > If you implement a delay estimator based on phase as I suggested above, 
> > you will find that the time reference shifts according to how you zero
> > pad the data. 
> 
>  You are correct. But my problem was a simple correlation of 2 real
> no. sequences -- so my main objective would be to get a correlation
> coeff with the correlation index, how does it effect these?

I haven't worked through the details, but I would expect that everything 
is referenced to the first sample in both sequences. So if you zero-pad 
after the signal, you should get a cross correlation time delay that
corresponds to the matching sequence directly. If you pad zeros before 
the reference signal, you would have to correct for that extra delay 
in somme way. I'm not sure how bad that would be, but it would at least 
introduce some extra book keeping that would otherwise be unnecessary.

Padding zeros at the front could, in the worst case, also introduce 
wrap-around effects that might obfuscate your analysis.  

>  Also, in using FFTW (r2c) for FFT, zero padding at the end of the
> sequence is a  *MUST* or does this optimize it more than padding on
> top and bottom?

I don't know what FFTW does, but I can't see any obvious reasons why 
anyone would zero-pad before the signal.

Rune

Reply by Ramoj Paruchuri ●July 16, 20042004-07-16

stevenj@alum.mit.edu (Steven G. Johnson) wrote in message news:<27cfb406.0407151527.f4f825a@posting.google.com>...
> ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote...
> > So, I need to pad (say zero padding) S and C individually to some
> > number 2^N (Say if I want to do a radix2 type) ---Is this correct?
> 
> If you are using FFTW, there is no need to pad to a power-of-two size.
>  (Highly-composite sizes, say products of 2/3/5, *will* be more
> efficient, but any size should be handled in a reasonable amount of
> time.)  The main reason for padding is to choose between cyclic and
> linear correlations, as I explained in my other message.
> 
> Cordially,
> Steven G. Johnson

Thanks to Rune and Steven! Things are quite clear now and I am happy
that I am in the right direction.

Rune: I tried for that book, but surprisingly Barnes & Noble doesnt
have a copy in any of their stores.

Thanks again!
--Ramoj

Reply by Rune Allnor ●July 17, 20042004-07-17

ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407160625.5c522f56@posting.google.com>...

> Rune: I tried for that book, but surprisingly Barnes & Noble doesnt
> have a copy in any of their stores.

I'm not all that surprised that the bookstores don't have that book. 
I don't know many people who actually own (or have available) a copy, 
but all four of them work with real-world data analysis as opposed 
to academic research or teaching. If a book is not used in a class, 
it will not be available in the bookstores. 

I am quite sure the book is available through bookstores on special 
order, as well as from www.amazon.com.

Rune

Reply by Stan Pawlukiewicz ●July 19, 20042004-07-19

Rune Allnor wrote:
> ramojparuchuri@gmail.com (Ramoj Paruchuri) wrote in message news:<30d8ae8b.0407160625.5c522f56@posting.google.com>...
> 
> 
>>Rune: I tried for that book, but surprisingly Barnes & Noble doesnt
>>have a copy in any of their stores.
> 
> 
> I'm not all that surprised that the bookstores don't have that book. 
> I don't know many people who actually own (or have available) a copy, 
> but all four of them work with real-world data analysis as opposed 
> to academic research or teaching. If a book is not used in a class, 
> it will not be available in the bookstores. 

I saw Bendat and Piersol's book at the Fairlakes Barnes and Noble in 
Fairfax, VA , USA about a month ago.  I didn't have a chance to browse 
through it.  I own the 2nd edition, does anyone have any comment on the 
3rd edition? worth updating?


> 
> I am quite sure the book is available through bookstores on special 
> order, as well as from www.amazon.com.
> 
> Rune

Reply by Ramoj Paruchuri ●July 19, 20042004-07-19

Hi All, 
I have some weird results....Ok! Let me explain what I did

inA[2N]; inB[2N] are my 2 input arrays which has real no.s in it.
<for now lets say, 2N is either -- 2, 4, 16, 32, ....>

I had used FFTWs to calculate FFTs.
outA[N+1][2] = FFTW_r2c (real_to_complex) { inA[2N] }
outB[N+1][2] = FFTW_r2c (real_to_complex) { inB[2N] }

Where [][0] = real part; [][1] = img. part;

Then, outB'[N+1][2] = conjugate of { outB[N+1][2] }
   (a+ib = conjugate of a-ib) -- if my math is not wrong.

Then I found outC[N+1][2] = outA[N+1][2] * outB'[N+1][2];
Then used the regular 1d complex BACKWARD Tranform function in FFTW to
find outD[N+1][2] = FFTW_1d_ (complex to complex, backward)
(outC[N+1][2])

Then does the Values in the real. part of that outD mean correlation
coefficients?
if yes, the highest correlation coeff Index is different from that
obtained from a normal/regular cross-correlation.

I am not very sure what I am doing wrong above...Any help is highly
appreciated.
Thanks,
Ramoj

Previous12 Next

Correlation of 2 Sequences using FFT (W/W.O FFTW)

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group