comp.dsp | Sub-sequence indices for computing FFT cross-correlation between two long sequences

Hello,

I need to compute the cross-correlation between two signals x[n] and 
y[n].  For example, signal x[n] has a length of 3441534 and signal y[n] 
has a length of 3441528.  I only need the cross-correlation at a 
specific number of lags.

In Matlab, I've been doing something similar to the following.  This 
takes up a lot of memory:

M = 10000;
corrLength = length(x)+length(y)-1;
y4 = fftshift(ifft(fft(x,corrLength).*conj(fft(y,corrLength))));
N = length(y);
y5 = y4(N:-1:N-M+1);

I suppose that M is the number of "lags" that I want, if this is the 
proper use of the term.  I've found an excellent example of 
cross-correlation using sub-sequences posted to this newsgroup a little 
while ago: http://www.dsprelated.com/showmessage/3341/1.php.

However, looking at the Matlab code in this previous post, I am 
uncertain with respect to how I should cut the signals x[n] and y[n] 
into sub-sequences.  I had a look at Blahut's book mentioned in the 
previous post, but as far as I can discern, Blahut deals mostly with 
cross-correlation using a series of large matrices for very optimized 
applications.

This is what I understand from the previous post:  The FFT is being 
applied to zero-padded sections of the signals x[n] and y[n].  An 
addition is being made in the frequency domain of the zero-padded 
sections.  The IFFT is applied to the resulting sum only when an answer 
is required.  The buffer length = nfft - (2 * lags).  This is a very 
neat algorithm.

Question:

How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2, 
Y3, Y4,...}?  What are the indices of the sub-sequences, and how many 
zeros do I need to pad these sub-sequences?  Is there some general 
equation/relationship that can be used to calculate this?  To me, this 
is not immediately apparent from the Matlab code in the previous post 
(http://www.dsprelated.com/showmessage/3341/1.php).

Nicholas

Reply by robert bristow-johnson ●April 1, 20122012-04-01

On 4/1/12 12:27 AM, Nicholas Kinar wrote:
>
> I need to compute the cross-correlation between two signals x[n] and
> y[n]. For example, signal x[n] has a length of 3441534 and signal y[n]
> has a length of 3441528. I only need the cross-correlation at a specific
> number of lags.

i wonder then, if you only need the result for a number of lags that is 
much smaller than either x[n] or y[n], that you might do just as well 
cranking this out in the time domain, for each of the fewer lags. 
assume either x[n] or y[n] are zero padded which will make their unequal 
lengths a moot issue.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."

Reply by HardySpicer ●April 1, 20122012-04-01

On Apr 1, 4:27&#4294967295;pm, Nicholas Kinar <n.ki...@usask.ca> wrote:
> Hello,
>
> I need to compute the cross-correlation between two signals x[n] and
> y[n]. &#4294967295;For example, signal x[n] has a length of 3441534 and signal y[n]
> has a length of 3441528. &#4294967295;I only need the cross-correlation at a
> specific number of lags.
>
> In Matlab, I've been doing something similar to the following. &#4294967295;This
> takes up a lot of memory:
>
> M = 10000;
> corrLength = length(x)+length(y)-1;
> y4 = fftshift(ifft(fft(x,corrLength).*conj(fft(y,corrLength))));
> N = length(y);
> y5 = y4(N:-1:N-M+1);
>
> I suppose that M is the number of "lags" that I want, if this is the
> proper use of the term. &#4294967295;I've found an excellent example of
> cross-correlation using sub-sequences posted to this newsgroup a little
> while ago:http://www.dsprelated.com/showmessage/3341/1.php.
>
> However, looking at the Matlab code in this previous post, I am
> uncertain with respect to how I should cut the signals x[n] and y[n]
> into sub-sequences. &#4294967295;I had a look at Blahut's book mentioned in the
> previous post, but as far as I can discern, Blahut deals mostly with
> cross-correlation using a series of large matrices for very optimized
> applications.
>
> This is what I understand from the previous post: &#4294967295;The FFT is being
> applied to zero-padded sections of the signals x[n] and y[n]. &#4294967295;An
> addition is being made in the frequency domain of the zero-padded
> sections. &#4294967295;The IFFT is applied to the resulting sum only when an answer
> is required. &#4294967295;The buffer length = nfft - (2 * lags). &#4294967295;This is a very
> neat algorithm.
>
> Question:
>
> How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2,
> Y3, Y4,...}? &#4294967295;What are the indices of the sub-sequences, and how many
> zeros do I need to pad these sub-sequences? &#4294967295;Is there some general
> equation/relationship that can be used to calculate this? &#4294967295;To me, this
> is not immediately apparent from the Matlab code in the previous post
> (http://www.dsprelated.com/showmessage/3341/1.php).
>
> Nicholas

Warning, ordinary cc does not perform well unless the signals are
white! Google Gereralized cross correlation
for a better methos (or two!)


Hardy

Reply by Nicholas Kinar ●April 1, 20122012-04-01

On 01/04/2012 12:10 AM, robert bristow-johnson wrote:
>
> i wonder then, if you only need the result for a number of lags that is
> much smaller than either x[n] or y[n], that you might do just as well
> cranking this out in the time domain, for each of the fewer lags. assume
> either x[n] or y[n] are zero padded which will make their unequal
> lengths a moot issue.

Thanks for your response, Robert.  Let me see if I understand this: 
perhaps the following code demonstrates cross-correlation in the time 
domain for a certain number of lags?  In my code snippet below, the j 
index is computed from 0 to M-1, where M is the total length of the y 
vector. To speed up the calculation, is there a way to avoid looping 
from 0 to M-1, or is this simply part of the correlation operation in 
the time domain?

x =  1:10;      % test vector
y = 11:20;      % test vector
M = length(x);
N = length(y);
lag = 5;
ytest = round(xcorr(x,y));
yout_test = ytest(N:-1:N-lag+1);

% Correlation of x and y in time domain
% yout = correlation(x,y)
cnt = 1;
yout = zeros(1, lag);
for i = N-1:-1:N-lag
     for j = 0:M-1
         if(i-j < 0 || i-j > N-1)
             continue;
         end
         yout(cnt) = yout(cnt) + x(j+1) * y(N-1-(i-j)+1);
     end
     cnt = cnt + 1;
end
fprintf('comparison = %d\n', yout_test == yout);

Reply by Nicholas Kinar ●April 1, 20122012-04-01

On 01/04/2012 10:36 AM, Nicholas Kinar wrote:
> In my code snippet below, the j
> index is computed from 0 to M-1, where M is the total length of the y
> vector.

This should read "the total length of the x vector."

Reply by Nicholas Kinar ●April 2, 20122012-04-02

On 01/04/2012 12:47 AM, HardySpicer wrote:
> Warning, ordinary cc does not perform well unless the signals are
> white! Google Gereralized cross correlation
> for a better methos (or two!)
>
>
> Hardy

That's a great tip; thanks for posting this, Hardy.

Reply by Nicholas Kinar ●April 2, 20122012-04-02

On 31/03/2012 10:27 PM, Nicholas Kinar wrote:
> Question:
>
> How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2,
> Y3, Y4,...}? What are the indices of the sub-sequences, and how many
> zeros do I need to pad these sub-sequences? Is there some general
> equation/relationship that can be used to calculate this? To me, this is
> not immediately apparent from the Matlab code in the previous post
> (http://www.dsprelated.com/showmessage/3341/1.php).
>

Okay, I think that I know what is happening here.  An algorithm for fast 
cross-correlation using sub-sequences is given on pp. 177-178 of the 
monograph by Nielsen [1], but it is taken from pg. 560 of another more 
widely-known reference [2].

[1] R. O. Nielsen, Sonar Signal Processing. Boston: Artech House, 1991.
[2] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. 
Englewood Cliffs, N.J., Prentice-Hall: , 1975.

To cross-correlate two large sequences x[k] and y[k], two buffers are 
created, x_i[n] and y_i[n].  If the number of lags is given by L, then 
the length of each buffer is 2*L.  The sequences x[k] and y[k] need to 
have the same length, or must be zero-padded to be the same length.

The x_i[n] is half-filled with elements from x[n] and half-padded with 
zeros.  Thus, if 0 <= n <= (2*L-1), elements of the buffer x_i[n] from 0 
<= n <= (L/2-1) are filled with a section of x[n], and elements of the 
buffer x_i[n] from (L/2-1) < n <= (2*L-1) are filled with zeros.

The y_i[n] is completely filled with elements from y[n], such that 0 <= 
n <= (2*L-1).  The y_i[n] is not zero-padded at all.

The FFT is applied to both buffers. Each sub-sequence is then added 
together and stored in buffS_i[n].  The addition can be done in the 
frequency domain:

  buffS_i[n] = x_i[n] + conj(fft(x_i[n])) .* fft(y_i[n]);

Each sub-sequence i is generated using the following:

x_i[n]= x[n + (i-1)*L], for n = 0,...,L-1
x_i[n] = 0, for n = L,...,2*L-1
y_i[n] = y[n+(i-1)*L], for n = 0,...,2*L-1

The final output is found by the inverse FFT once all of the 
sub-sequences have been added:

yout[n] = IFFT(buffS_i[n]);

The neat thing about the algorithm is that the full FFT does not have to 
be taken of both x[n] and y[n].

Nicholas

Sub-sequence indices for computing FFT cross-correlation between two long sequences

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group