# Calculating a cross correlation (to find a delay) on irregularly sampled data

Started September 4, 2010
```
Hi there,

I have two sets of data which are irregularly sampled. These two sets are
two measurements of the same product, taken at different places.

E.g.:
Sensor 1: time 1, 3, 7, 8, 10, 15 ...
Sensor 2: time 1, 2, 3, 5, 9, 10, 11, 14 ...

The measurements are irregular because they are done by people, not by an
automatically sampled system.

I would like to calculate the cross correlation between these signals to
obtain an estimate for a time delay.

The question, however, is how to do this given the missing data. The
only thing I can come up with is interpolating the missing data, but I
doubt that this is the best solution.

```
```
On Sep 4, 8:03 am, "webinn" <da_junk2@n_o_s_p_a_m.telenet.be> wrote:
> I would like to calculate the cross correlation between these signals to
> obtain an estimate for a time delay.

Seems like a difficult problem.  Chatfield's book "The Analysis of
Time Series: An Introduction", which I had for a graduate course in
the early '80s, only mentions unequally spaced data in passing.  He
suggests using splines to interpolate in the time domain.

I'm not an expert at non-uniform sampling, but I briefly looked into
it many years ago.  I seem to recall that it was possible to compute
uniformly spaced frequency domain outputs, even if your time data was
not uniformly sampled.

Consider a simple 4x4 DFT matrix for uniform samples:

| X0 |   | 0  0  0  0 |  | x0 |
| X1 | = | 0  1  2  3 |  | x1 |
| X2 |   | 0  2  4  6 |  | x2 |
| X3 |   | 0  3  6  9 |  | x3 |

The 4x4 matrix holds your twiddle exponents (the k*n in
exp(-j*twopi*k*n/N)), the Xk column on the left holds the (complex)
DFT results, and the xn column holds the uniformly spaced inputs.

I find it helpful to put a visual interpretation to the above.  The
topmost row of twiddles is a constant, zero-frequency waveform
(1 - j0), the next row down is a 1-cycle (1f) complex waveform, the
next is 2 cycles, and the bottom is 3 cycles.

Now suppose your xn are not uniformly sampled in time.  x0 is still
your zero point, but let's say x1 was actually x1.1, x2 is really
x1.9, and x3 is x2.9 (samples at times 0, 1.1, 1.9 and 2.9).

We'd have to modify the DFT twiddles to accommodate the unequal
spacing by changing the 'k*n' in the above, because 'n' is no longer
an integer.  The 'k' will remain the same (k = 0, 1, 2 or 3), but 'n'
will now be 0, 1.1, 1.9 and 2.9.  As I recall, it can be quite
tricky to figure out exactly how to do this.  What I did years ago was
to generate 16 points of a single cycle sine wave (no noise), sample
it unequally (e.g.: at 0, 1.2, 1.8, 2.1, ... 14.9, 15.1), and then
figure out what twiddles to use in the DFT matrix.  It helps to
visualize it by drawing the twiddle matrix as waveforms, and then
drawing vertical lines down on them to represent the time points
corresponding to your sample times.  Then I programmed it to make sure
that I was doing things correctly.

The results were pretty good, but my samples were only spaced a little
bit off-center from the uniform case (e.g., +/- .1 to .3).  That makes
sense, because the DFT is a least-squares estimator.  If you inverse
transform your equally spaced frequency domain estimates using a
'normal' inverse DFT, you should see what your (interpolated) uniform
time samples look like.

If you go that route, at least you'd have uniformly spaced frequency
domain points (based on non-uniform time samples), and you could
cross-correlate in the usual way (multiply one spectrum by the complex
conjugate of the other and inverse transform).

But I also seem to recall that the specifics of the non-uniform
sampling matter (e.g., sparse samples over some regions and dense
samples over others).

So I don't really know if the above would be useful to you.  Perhaps
you could elaborate on the kind of data you're dealing with.  Maybe
some others here have dealt with similar problems and can suggest
better solutions.

Kevin McGee
```
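Kevin's first pointer, Chatfield's suggestion of spline interpolation in the time domain, can be sketched as follows. This assumes NumPy, uses a natural cubic spline (one spline type among several), and runs on a hypothetical Gaussian test pulse instead of real sensor data: both records are resampled onto a common uniform grid, and the lag of the cross-correlation peak is taken as the delay. The function names, the 0.1 grid step and the test data are illustrative choices, not anything from the thread.

```python
import numpy as np


def natural_cubic_spline(t, y, t_new):
    """Evaluate a natural cubic spline through the points (t, y) at t_new."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    h = np.diff(t)
    # Linear system for the second derivatives M (natural ends: M[0] = M[-1] = 0).
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Locate the interval holding each query point and evaluate that cubic piece.
    idx = np.clip(np.searchsorted(t, t_new) - 1, 0, n - 2)
    width = t[idx + 1] - t[idx]
    a = (t[idx + 1] - t_new) / width
    b = (t_new - t[idx]) / width
    return (a * y[idx] + b * y[idx + 1]
            + ((a**3 - a) * M[idx] + (b**3 - b) * M[idx + 1]) * width**2 / 6.0)


def estimate_delay(t1, y1, t2, y2, dt=0.1):
    """Delay of record 2 relative to record 1 (positive when record 2 lags)."""
    # Common uniform grid over the time span both records cover.
    grid = np.arange(max(t1[0], t2[0]), min(t1[-1], t2[-1]), dt)
    u1 = natural_cubic_spline(t1, y1, grid)
    u2 = natural_cubic_spline(t2, y2, grid)
    u1 = u1 - u1.mean()
    u2 = u2 - u2.mean()
    xc = np.correlate(u2, u1, mode="full")  # entry i <-> lag i - (len(grid) - 1)
    return (np.argmax(xc) - (len(grid) - 1)) * dt


def pulse(t):
    # Hypothetical smooth test signal standing in for the real measurements.
    return np.exp(-0.5 * ((t - 150.0) / 6.0) ** 2)


rng = np.random.default_rng(0)
t1 = np.arange(0.0, 300.0, 1.5)
t1 = t1 + rng.uniform(0.0, 1.0, t1.size)     # sensor 1: irregular times
t2 = np.arange(0.0, 300.0, 1.3)
t2 = t2 + rng.uniform(0.0, 0.8, t2.size)     # sensor 2: different irregular times
delay = estimate_delay(t1, pulse(t1), t2, pulse(t2 - 3.0))  # sensor 2 lags by 3.0
print(f"estimated delay: {delay:.1f}")
```

The grid step sets the resolution of the delay estimate; a finer step (or fitting a parabola around the peak) gives finer resolution at more computational cost.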
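The 4x4 table in Kevin's post lists only the exponents k*n; the DFT matrix entries themselves are exp(-j*2*pi*k*n/N). A small NumPy check (NumPy is an assumption; the thread contains no code) that builds the matrix from that table and compares the result with np.fft.fft on an arbitrary input:

```python
import numpy as np

N = 4
k = np.arange(N).reshape(-1, 1)      # frequency index (rows)
n = np.arange(N).reshape(1, -1)      # sample index (columns)
exponents = k * n                    # the k*n table from the post
W = np.exp(-2j * np.pi * exponents / N)

x = np.array([1.0, 2.0, 0.0, -1.0])  # arbitrary uniformly spaced inputs x0..x3
X = W @ x                            # DFT outputs X0..X3

print(exponents)
print(np.allclose(X, np.fft.fft(x)))  # True
```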
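For the non-uniform case Kevin describes (samples at times 0, 1.1, 1.9 and 2.9), the modified exponent table uses the actual sample times in place of n. A NumPy sketch of that table, keeping k = 0..3 as he does:

```python
import numpy as np

N = 4
k = np.arange(N).reshape(-1, 1)           # k stays 0, 1, 2, 3
t = np.array([0.0, 1.1, 1.9, 2.9])        # actual sample times replace n = 0, 1, 2, 3
exponents = k * t                         # the modified k*n table
W_nonuniform = np.exp(-2j * np.pi * exponents / N)

print(np.round(exponents, 2))
# With non-integer times, row k = 3 is no longer the same as row k = -1:
print(np.allclose(np.exp(2j * np.pi * 3 * t / N), np.exp(-2j * np.pi * t / N)))  # False
```

The last line shows one concrete reason the choice of twiddles is tricky: with integer n, k = 3 and k = -1 give identical rows, but with non-integer sample times they differ, so the set of k values becomes part of the model. A range centred on zero (-N/2 .. N/2-1) is one option, but that choice is an assumption, not something stated in the thread.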
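Kevin's least-squares remark can be made concrete: treat the N uniformly spaced spectral values as unknowns, fit them to the non-uniform samples by least squares, then apply an ordinary inverse DFT to read off uniform time samples. This sketch assumes NumPy and uses his test case of one noiseless sine cycle over 16 points with jitter of up to +/-0.3. The zero-centred frequency set -8..7 is an assumption rather than something from the post:

```python
import numpy as np

N = 16
rng = np.random.default_rng(1)
t = np.arange(N) + rng.uniform(-0.3, 0.3, N)    # hypothetical jittered sample times
x = np.sin(2 * np.pi * t / N)                    # one noiseless cycle over 16 samples

# Integer frequencies in np.fft order: 0..7, then -8..-1 (centred on zero).
k = np.fft.fftfreq(N, d=1.0 / N)

# Synthesis model: x(t_m) = (1/N) * sum_k X_k * exp(+j*2*pi*k*t_m/N).
A = np.exp(2j * np.pi * np.outer(t, k) / N) / N
X, *_ = np.linalg.lstsq(A, x, rcond=None)        # least-squares spectrum estimate

x_uniform = np.fft.ifft(X)                       # ordinary inverse DFT -> samples at t = 0..15
print(np.allclose(x_uniform.real, np.sin(2 * np.pi * np.arange(N) / N)))  # True
```

With 16 samples and 16 unknowns the system is square, so the fit is exact here; with more samples than unknowns the same lstsq call gives a genuine least-squares fit.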
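Kevin's final step, cross-correlating once both records have uniformly spaced spectra, can be sketched by estimating each spectrum by least squares from its own non-uniform samples and then inverse transforming conj(X1) * X2. Everything here is hypothetical: NumPy, the helper name ls_spectrum, the jittered sample times, and the band-limited test signal, which is deliberately periodic with period 16:

```python
import numpy as np

N = 16
k = np.fft.fftfreq(N, d=1.0 / N)   # integer frequencies 0..7, -8..-1


def ls_spectrum(t, x):
    """Least-squares estimate of N uniformly spaced DFT bins from samples x at times t."""
    A = np.exp(2j * np.pi * np.outer(t, k) / N) / N
    X, *_ = np.linalg.lstsq(A, x, rcond=None)
    return X


def signal(t):
    # Hypothetical band-limited test signal with period N.
    return (np.cos(2 * np.pi * t / N) + 0.5 * np.sin(2 * np.pi * 3 * t / N)
            + 0.3 * np.cos(2 * np.pi * 5 * t / N + 0.4))


rng = np.random.default_rng(2)
t1 = np.arange(N) + rng.uniform(-0.3, 0.3, N)   # sensor 1 sample times
t2 = np.arange(N) + rng.uniform(-0.3, 0.3, N)   # sensor 2 sample times
true_delay = 3
X1 = ls_spectrum(t1, signal(t1))
X2 = ls_spectrum(t2, signal(t2 - true_delay))   # sensor 2 sees the signal 3 units later

# Circular cross-correlation from the spectra: r[m] = sum_n x1[n] * x2[n + m].
r = np.fft.ifft(np.conj(X1) * X2).real
delay = int(np.argmax(r))
print(delay)  # 3
```

Because the DFT treats each record as one period of a periodic signal, this is a circular cross-correlation; for real, non-periodic sensor records the wrap-around effect would need separate handling.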
```
webinn wrote:
> I would like to calculate the cross correlation between these signals to
> obtain an estimate for a time delay.

If the data source can be described by a model, you can approach this as
a system identification problem. If nothing is known about the system,
interpolation is the only solution.

DSP and Mixed Signal Design Consultant
http://www.abvolt.com
```
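Vladimir's model-based route can take many forms. Here is a much-simplified illustration. It assumes the pulse shape is known (a Gaussian of known width, purely hypothetical) and that only its amplitude and arrival time are unknown. The model is fitted directly to each sensor's irregular samples, with no interpolation, and the difference of the fitted arrival times is taken as the delay. This is plain least-squares model fitting in NumPy, a stand-in for proper system identification rather than a full treatment:

```python
import numpy as np

WIDTH = 6.0  # hypothetical: pulse width assumed known from a model of the process


def shape(t, center):
    """Assumed model of the measured quantity: a Gaussian pulse arriving at `center`."""
    return np.exp(-0.5 * ((t - center) / WIDTH) ** 2)


def fit_arrival(t, y, centers):
    """Grid search over arrival times; the amplitude is fitted by linear least squares."""
    G = shape(t[:, None], centers[None, :])               # one model column per candidate
    amp = (G * y[:, None]).sum(axis=0) / (G * G).sum(axis=0)
    err = ((y[:, None] - amp * G) ** 2).sum(axis=0)       # residual for each candidate
    return centers[np.argmin(err)]


rng = np.random.default_rng(3)
t1 = np.arange(0.0, 100.0, 2.5)
t1 = t1 + rng.uniform(0.0, 2.0, t1.size)                  # irregular sensor 1 times
t2 = np.arange(0.0, 100.0, 2.2)
t2 = t2 + rng.uniform(0.0, 1.5, t2.size)                  # irregular sensor 2 times
y1 = 2.0 * shape(t1, 40.0) + 0.01 * rng.standard_normal(t1.size)
y2 = 1.5 * shape(t2, 43.0) + 0.01 * rng.standard_normal(t2.size)

centers = np.arange(0.0, 100.0, 0.01)
delay = fit_arrival(t2, y2, centers) - fit_arrival(t1, y1, centers)
print(f"estimated delay: {delay:.2f}")
```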
```
On Sep 6, 6:05 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> webinn wrote:
> > I would like to calculate the cross correlation between these signals to
> > obtain an estimate for a time delay.
>
> If the data source can be described by a model, you can approach this as
> a system identification problem. If nothing is known about the system,
> interpolation is the only solution.

I suggest using several different interpolation schemes
and comparing the results.

If you use DFT interpolation, be aware that for nonuniform
spacing the reconstruction is based on least squares,
not on the IDFT formula.

Search comp.soft-sys.matlab using

greg heath dftgh6

to find MATLAB code containing the relevant pseudoinverse and QR
reconstruction formulae. High-frequency zero padding
will yield the interpolation.

Hope this helps.

Greg
```
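Greg's suggestion to try several interpolation schemes and compare them can be sketched by running the same cross-correlation delay estimate after linear and after nearest-neighbour interpolation. The schemes, helper names, grid step and Gaussian test pulse are all hypothetical choices (NumPy assumed); np.interp provides the linear scheme:

```python
import numpy as np


def nearest_interp(t, y, grid):
    """Nearest-neighbour interpolation; assumes t is sorted and grid lies inside [t[0], t[-1]]."""
    idx = np.clip(np.searchsorted(t, grid), 1, len(t) - 1)
    step_back = (grid - t[idx - 1]) < (t[idx] - grid)   # left neighbour is closer
    return y[idx - step_back.astype(int)]


def delay_from(u1, u2, dt):
    """Lag (in time units) of the cross-correlation peak; positive when u2 lags u1."""
    u1 = u1 - u1.mean()
    u2 = u2 - u2.mean()
    xc = np.correlate(u2, u1, mode="full")
    return (np.argmax(xc) - (len(u1) - 1)) * dt


def pulse(t):
    # Hypothetical test signal standing in for the real measurements.
    return np.exp(-0.5 * ((t - 150.0) / 6.0) ** 2)


rng = np.random.default_rng(0)
t1 = np.arange(0.0, 300.0, 1.5)
t1 = t1 + rng.uniform(0.0, 1.0, t1.size)
t2 = np.arange(0.0, 300.0, 1.3)
t2 = t2 + rng.uniform(0.0, 0.8, t2.size)
y1, y2 = pulse(t1), pulse(t2 - 3.0)                  # sensor 2 lags by 3.0

dt = 0.1
grid = np.arange(max(t1[0], t2[0]), min(t1[-1], t2[-1]), dt)
schemes = {
    "linear": lambda t, y: np.interp(grid, t, y),
    "nearest": lambda t, y: nearest_interp(t, y, grid),
}
estimates = {name: delay_from(f(t1, y1), f(t2, y2), dt) for name, f in schemes.items()}
for name, d in estimates.items():
    print(f"{name:8s} delay estimate: {d:.1f}")
```

If the schemes disagree noticeably on real data, that is a hint that the sampling may be too sparse for interpolation alone to be trusted.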
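One way to see Greg's point about least squares: with jittered sample times, evaluating the usual transform sum with twiddles at the actual times no longer inverts the synthesis model, whereas a least-squares fit of the same model recovers the spectrum. The sketch assumes NumPy, a single noiseless sine cycle over 16 jittered samples, and the zero-centred frequency set -8..7:

```python
import numpy as np

N = 16
rng = np.random.default_rng(4)
t = np.arange(N) + rng.uniform(-0.3, 0.3, N)          # hypothetical jittered times
x = np.sin(2 * np.pi * t / N)                          # one cycle: X_1 = N/(2j), X_-1 = -N/(2j)

k = np.fft.fftfreq(N, d=1.0 / N)                       # 0..7, -8..-1
A = np.exp(2j * np.pi * np.outer(t, k) / N) / N        # synthesis (IDFT-style) matrix

X_true = np.zeros(N, dtype=complex)
X_true[1] = N / 2j
X_true[-1] = -N / 2j

# Transform sum with twiddles evaluated at the actual times (exact only for integer t).
X_formula = N * A.conj().T @ x
# Least-squares fit of the same model to the samples.
X_ls, *_ = np.linalg.lstsq(A, x, rcond=None)

print("formula error:", np.max(np.abs(X_formula - X_true)))
print("least-squares error:", np.max(np.abs(X_ls - X_true)))
```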
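Greg's pseudoinverse and QR formulae and the zero-padding step can be illustrated in NumPy. This is not the dftgh6 code he mentions, just an independent sketch under stated assumptions: 24 jittered samples of a hypothetical band-limited periodic signal, 16 spectral unknowns on the zero-centred set -8..7, and a 4x finer output grid. Both solvers give the same least-squares spectrum. Inserting zeros between the positive and negative frequency bins before the inverse FFT then produces the interpolated samples:

```python
import numpy as np

N, M, P = 16, 24, 4        # N spectral unknowns, M samples, P-times finer output grid
rng = np.random.default_rng(5)
t = np.arange(M) * (N / M) + rng.uniform(-0.2, 0.2, M)   # hypothetical jittered times


def signal(t):
    # Hypothetical band-limited test signal with period N.
    return np.cos(2 * np.pi * t / N) + 0.5 * np.sin(2 * np.pi * 3 * t / N)


x = signal(t)
k = np.fft.fftfreq(N, d=1.0 / N)                        # 0..7, -8..-1
A = np.exp(2j * np.pi * np.outer(t, k) / N) / N         # M x N synthesis matrix

X_pinv = np.linalg.pinv(A) @ x                          # pseudoinverse solution
Q, R = np.linalg.qr(A)                                  # reduced QR: A = Q @ R
X_qr = np.linalg.solve(R, Q.conj().T @ x)               # same least-squares solution via QR

# High-frequency zero padding: zeros go between the positive and negative bins.
X_pad = np.concatenate([X_qr[: N // 2], np.zeros((P - 1) * N), X_qr[N // 2 :]])
x_fine = np.fft.ifft(X_pad).real * P                    # samples at t = 0, 1/P, 2/P, ...
t_fine = np.arange(P * N) / P

print(np.allclose(X_pinv, X_qr))                        # True
print(np.allclose(x_fine, signal(t_fine)))              # True
```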