DSPRelated.com
Forums

Sub-sequence indices for computing FFT cross-correlation between two long sequences

Started by Nicholas Kinar April 1, 2012
Hello,

I need to compute the cross-correlation between two signals x[n] and 
y[n].  For example, signal x[n] has a length of 3441534 and signal y[n] 
has a length of 3441528.  I only need the cross-correlation at a 
specific number of lags.

In Matlab, I've been doing something similar to the following.  This 
takes up a lot of memory:

M = 10000;
corrLength = length(x)+length(y)-1;
y4 = fftshift(ifft(fft(x,corrLength).*conj(fft(y,corrLength))));
N = length(y);
y5 = y4(N:-1:N-M+1);

I suppose that M is the number of "lags" that I want, if this is the 
proper use of the term.  I've found an excellent example of 
cross-correlation using sub-sequences posted to this newsgroup a little 
while ago: http://www.dsprelated.com/showmessage/3341/1.php.

However, looking at the Matlab code in this previous post, I am 
uncertain with respect to how I should cut the signals x[n] and y[n] 
into sub-sequences.  I had a look at Blahut's book mentioned in the 
previous post, but as far as I can discern, Blahut deals mostly with 
cross-correlation using a series of large matrices for very optimized 
applications.

This is what I understand from the previous post:  The FFT is being 
applied to zero-padded sections of the signals x[n] and y[n].  An 
addition is being made in the frequency domain of the zero-padded 
sections.  The IFFT is applied to the resulting sum only when an answer 
is required.  The buffer length = nfft - (2 * lags).  This is a very 
neat algorithm.

Question:

How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2, 
Y3, Y4,...}?  What are the indices of the sub-sequences, and how many 
zeros do I need to pad these sub-sequences?  Is there some general 
equation/relationship that can be used to calculate this?  To me, this 
is not immediately apparent from the Matlab code in the previous post 
(http://www.dsprelated.com/showmessage/3341/1.php).

Nicholas









On 4/1/12 12:27 AM, Nicholas Kinar wrote:
> > I need to compute the cross-correlation between two signals x[n] and > y[n]. For example, signal x[n] has a length of 3441534 and signal y[n] > has a length of 3441528. I only need the cross-correlation at a specific > number of lags.
i wonder then, if you only need the result for a number of lags that is much smaller than either x[n] or y[n], that you might do just as well cranking this out in the time domain, for each of the fewer lags. assume either x[n] or y[n] are zero padded which will make their unequal lengths a moot issue. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."
On Apr 1, 4:27&#4294967295;pm, Nicholas Kinar <n.ki...@usask.ca> wrote:
> Hello, > > I need to compute the cross-correlation between two signals x[n] and > y[n]. &#4294967295;For example, signal x[n] has a length of 3441534 and signal y[n] > has a length of 3441528. &#4294967295;I only need the cross-correlation at a > specific number of lags. > > In Matlab, I've been doing something similar to the following. &#4294967295;This > takes up a lot of memory: > > M = 10000; > corrLength = length(x)+length(y)-1; > y4 = fftshift(ifft(fft(x,corrLength).*conj(fft(y,corrLength)))); > N = length(y); > y5 = y4(N:-1:N-M+1); > > I suppose that M is the number of "lags" that I want, if this is the > proper use of the term. &#4294967295;I've found an excellent example of > cross-correlation using sub-sequences posted to this newsgroup a little > while ago:http://www.dsprelated.com/showmessage/3341/1.php. > > However, looking at the Matlab code in this previous post, I am > uncertain with respect to how I should cut the signals x[n] and y[n] > into sub-sequences. &#4294967295;I had a look at Blahut's book mentioned in the > previous post, but as far as I can discern, Blahut deals mostly with > cross-correlation using a series of large matrices for very optimized > applications. > > This is what I understand from the previous post: &#4294967295;The FFT is being > applied to zero-padded sections of the signals x[n] and y[n]. &#4294967295;An > addition is being made in the frequency domain of the zero-padded > sections. &#4294967295;The IFFT is applied to the resulting sum only when an answer > is required. &#4294967295;The buffer length = nfft - (2 * lags). &#4294967295;This is a very > neat algorithm. > > Question: > > How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2, > Y3, Y4,...}? &#4294967295;What are the indices of the sub-sequences, and how many > zeros do I need to pad these sub-sequences? &#4294967295;Is there some general > equation/relationship that can be used to calculate this? &#4294967295;To me, this > is not immediately apparent from the Matlab code in the previous post > (http://www.dsprelated.com/showmessage/3341/1.php). > > Nicholas
Warning, ordinary cc does not perform well unless the signals are white! Google Gereralized cross correlation for a better methos (or two!) Hardy
On 01/04/2012 12:10 AM, robert bristow-johnson wrote:
> > i wonder then, if you only need the result for a number of lags that is > much smaller than either x[n] or y[n], that you might do just as well > cranking this out in the time domain, for each of the fewer lags. assume > either x[n] or y[n] are zero padded which will make their unequal > lengths a moot issue.
Thanks for your response, Robert. Let me see if I understand this: perhaps the following code demonstrates cross-correlation in the time domain for a certain number of lags? In my code snippet below, the j index is computed from 0 to M-1, where M is the total length of the y vector. To speed up the calculation, is there a way to avoid looping from 0 to M-1, or is this simply part of the correlation operation in the time domain? x = 1:10; % test vector y = 11:20; % test vector M = length(x); N = length(y); lag = 5; ytest = round(xcorr(x,y)); yout_test = ytest(N:-1:N-lag+1); % Correlation of x and y in time domain % yout = correlation(x,y) cnt = 1; yout = zeros(1, lag); for i = N-1:-1:N-lag for j = 0:M-1 if(i-j < 0 || i-j > N-1) continue; end yout(cnt) = yout(cnt) + x(j+1) * y(N-1-(i-j)+1); end cnt = cnt + 1; end fprintf('comparison = %d\n', yout_test == yout);
On 01/04/2012 10:36 AM, Nicholas Kinar wrote:
> In my code snippet below, the j > index is computed from 0 to M-1, where M is the total length of the y > vector.
This should read "the total length of the x vector."
On 01/04/2012 12:47 AM, HardySpicer wrote:
> Warning, ordinary cc does not perform well unless the signals are > white! Google Gereralized cross correlation > for a better methos (or two!) > > > Hardy
That's a great tip; thanks for posting this, Hardy.
On 31/03/2012 10:27 PM, Nicholas Kinar wrote:
> Question: > > How do I construct the sub-sequences {X1, X2, X3, X4,...} and {Y1, Y2, > Y3, Y4,...}? What are the indices of the sub-sequences, and how many > zeros do I need to pad these sub-sequences? Is there some general > equation/relationship that can be used to calculate this? To me, this is > not immediately apparent from the Matlab code in the previous post > (http://www.dsprelated.com/showmessage/3341/1.php). >
Okay, I think that I know what is happening here. An algorithm for fast cross-correlation using sub-sequences is given on pp. 177-178 of the monograph by Nielsen [1], but it is taken from pg. 560 of another more widely-known reference [2]. [1] R. O. Nielsen, Sonar Signal Processing. Boston: Artech House, 1991. [2] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, N.J., Prentice-Hall: , 1975. To cross-correlate two large sequences x[k] and y[k], two buffers are created, x_i[n] and y_i[n]. If the number of lags is given by L, then the length of each buffer is 2*L. The sequences x[k] and y[k] need to have the same length, or must be zero-padded to be the same length. The x_i[n] is half-filled with elements from x[n] and half-padded with zeros. Thus, if 0 <= n <= (2*L-1), elements of the buffer x_i[n] from 0 <= n <= (L/2-1) are filled with a section of x[n], and elements of the buffer x_i[n] from (L/2-1) < n <= (2*L-1) are filled with zeros. The y_i[n] is completely filled with elements from y[n], such that 0 <= n <= (2*L-1). The y_i[n] is not zero-padded at all. The FFT is applied to both buffers. Each sub-sequence is then added together and stored in buffS_i[n]. The addition can be done in the frequency domain: buffS_i[n] = x_i[n] + conj(fft(x_i[n])) .* fft(y_i[n]); Each sub-sequence i is generated using the following: x_i[n]= x[n + (i-1)*L], for n = 0,...,L-1 x_i[n] = 0, for n = L,...,2*L-1 y_i[n] = y[n+(i-1)*L], for n = 0,...,2*L-1 The final output is found by the inverse FFT once all of the sub-sequences have been added: yout[n] = IFFT(buffS_i[n]); The neat thing about the algorithm is that the full FFT does not have to be taken of both x[n] and y[n]. Nicholas