# choosing correct window size for audio cross-correlation

Started by October 11, 2005
```
Hello,

I am looking for advice, or a push in the right direction for the
appropriate selection of window size applied to an audio digital
signal.

My intention is to cross-correlate two non-stationary audio signals
obtained outdoors by a pair of adjacent microphones for the purpose of
determining the time delay of arrival between them.  Both have a
sampling frequency of 44.1 kHz.  However, from what I've read, the
cross-correlation technique assumes the signals to be stationary.  So
my window needs to be short enough for the audio signal to be
quasi-stationary, yet long enough to give a reasonably confident
delay estimate.  How can I decide on a particular window size, prove that the
resulting audio segments are stationary, and hence justify my decision?

From what I understand, the window size depends on the type of data and
what is considered 'stationary' for that type of data, e.g. a speech
signal is considered stationary over windows of 20 to 30 ms.
However, my signal is any general outdoor noise that I can pick up and
I don't want to specify one type of sound.

I can make an ad-hoc decision, but really want to back up and prove
that my choice in window size is the most appropriate.  Any advice is
much appreciated, thanks

OJ

```
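As a baseline for the framing described above, here is a minimal sketch, assuming numpy and a whole-sample delay: both channels are cut into Hann-windowed frames of `win_len` samples and one delay is estimated per frame by plain cross-correlation. The helper names, the Hann window and the hop size are illustrative choices, not something taken from the thread; `max_lag` would normally be set from the microphone spacing (spacing × 44100 / speed of sound).

```python
import numpy as np

def xcorr_delay(x, y, max_lag):
    """Delay of y relative to x in whole samples (positive = y lags x).

    Plain (unweighted) cross-correlation via FFT, searched over +/-max_lag.
    """
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()          # power of two >= n, so no circular wrap
    cc = np.fft.irfft(np.conj(np.fft.rfft(x, nfft)) * np.fft.rfft(y, nfft), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(cc[lags])])     # negative indices pick the negative lags

def framewise_delays(x, y, win_len, hop, max_lag):
    """One delay estimate per Hann-windowed frame of win_len samples."""
    w = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    return np.array([xcorr_delay(x[i:i + win_len] * w, y[i:i + win_len] * w, max_lag)
                     for i in starts])

# Usage on synthetic data: 1 s of white noise, second channel 7 samples late.
fs = 44100
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)
x = s
y = np.concatenate((np.zeros(7), s[:-7]))
delays = framewise_delays(x, y, win_len=1024, hop=512, max_lag=20)
```

With a stationary white-noise source every frame recovers the same delay; the open question in the thread is how short `win_len` can get on real outdoor sound before that stops being true.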
```
oj wrote:
> Hello,
>
> I am looking for advice, or a push in the right direction for the
> appropriate selection of window size applied to an audio digital
> signal.
>
> My intention is to cross-correlate two non-stationary audio signals
> obtained outdoors by a pair of adjacent microphones for the purpose of
> determining the time delay of arrival between them.  Both have a
> sampling frequency of 44.1kHz.  However from what I've read the
> cross-correlation technique assumes the signals to be stationary.  So
> my window needs to be small enough for the audio signal to be
> quasi-stationary but yet big enough to result in a reasonably confident
> result.  How can I decide on a particular window size, prove that the
> resulting audio segments are stationary and hence justify my decision?
>
>
> From what I understand, the window size depends on the type of data and
> what is considered 'stationary' for that type of data, i.e. a speech
> signal is considered stationary for windows of size 20 to 30 ms.
> However, my signal is any general outdoor noise that I can pick up and
> I don't want to specify one type of sound.
>
> I can make an ad-hoc decision, but really want to back up and prove
> that my choice in window size is the most appropriate.  Any advice is
> much appreciated, thanks

How fast does the signal direction change?

General outdoor sound comes simultaneously from many directions. What
are you trying to find?

You can usually do better by restricting the analysis to higher
frequencies where a given delay causes a greater phase difference. Then
a shorter window can be effective.

Jerry
--
Engineering is the art of making what you want from things you can get.

```
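Jerry's point can be made concrete: a delay τ shifts the phase at frequency f by 2πfτ, so a one-sample delay at 44.1 kHz is only about 4° at 500 Hz but about 65° at 8 kHz. The sketch below (numpy assumed; the function names and the 4 to 16 kHz band are illustrative, not from the thread) computes that phase shift and a cross-correlation restricted to one frequency band by zeroing the other FFT bins.

```python
import numpy as np

def phase_shift_deg(f_hz, delay_s):
    """Phase difference (degrees) that a pure delay of delay_s produces at f_hz."""
    return 360.0 * f_hz * delay_s

def bandlimited_delay(x, y, fs, f_lo, f_hi, max_lag):
    """Delay of y relative to x (samples, positive = y lags x), using only the
    spectral content between f_lo and f_hi Hz. Zeroing FFT bins is a crude
    brick-wall band-pass; a real system might prefer a proper filter."""
    nfft = 1 << (len(x) + len(y) - 2).bit_length()     # >= len(x)+len(y)-1, no wrap
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    cc = np.fft.irfft(np.where(band, np.conj(X) * Y, 0.0), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(cc[lags])])

fs = 44100
one_sample = 1.0 / fs
print(round(phase_shift_deg(500, one_sample), 1),
      round(phase_shift_deg(8000, one_sample), 1))      # prints 4.1 65.3

# Synthetic check: white noise, second channel 7 samples late, 4-16 kHz band only.
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
x, y = s, np.concatenate((np.zeros(7), s[:-7]))
est = bandlimited_delay(x, y, fs, f_lo=4000, f_hi=16000, max_lag=20)
```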
```
I'm trying to track moving objects such as vehicles on a road, between
3 and 10 meters from the microphones.

The sound source would be moving perpendicular to the mics at anything
from 30 km/h up to 150 km/h, so as it passes the array its direction
changes very rapidly.  I can do the maths and get

That's a good idea about restricting the analysis to higher frequencies
to make the window more effective, thanks.

```
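The speeds and ranges above give an upper bound on the window: if the delay should not drift by more than about one sample within a window, the window cannot be longer than the time the TDOA takes to change by 1/fs, and that change is fastest at closest approach. A minimal sketch, assuming a point source on a straight path parallel to the two-microphone baseline (one reading of "moving perpendicular to the mics"), c = 343 m/s, and a hypothetical 0.5 m spacing, since the thread never states the spacing; all names are illustrative.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s (roughly 20 degC)

def peak_tdoa_rate(v_ms, r_m, d_m):
    """Largest |d(tau)/dt| for a point source passing at speed v_ms on a straight
    line r_m from a two-mic baseline of length d_m (line parallel to the baseline).

    tau(x) = (sqrt((x + d/2)^2 + r^2) - sqrt((x - d/2)^2 + r^2)) / C with x = v*t;
    its slope is largest at closest approach (x = 0), where it equals
    v * d / (C * sqrt(r^2 + d^2 / 4)).
    """
    return v_ms * d_m / (C * math.sqrt(r_m ** 2 + d_m ** 2 / 4))

def max_window_s(v_kmh, r_m, d_m, fs):
    """Window length (s) over which the TDOA drifts by one sample at its fastest."""
    return (1.0 / fs) / peak_tdoa_rate(v_kmh / 3.6, r_m, d_m)

fs = 44100
d = 0.5  # hypothetical mic spacing in metres
worst = max_window_s(150, 3.0, d, fs)   # fast vehicle, close to the array
best = max_window_s(30, 10.0, d, fs)    # slow vehicle, far from the array
print(f"{worst * 1e3:.2f} ms, {best * 1e3:.1f} ms")   # prints: 1.12 ms, 18.7 ms
```

Under these assumptions the fastest, nearest vehicles allow only about a millisecond (roughly 50 samples) near closest approach, far shorter than the 20 to 30 ms speech rule of thumb; away from closest approach the limit relaxes quickly.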
```
oj wrote:
> I'm trying to track moving objects such as vehicles on a road, between
> 3 and 10 meters from the microphones.
>
> The sound source would be moving perpendicular to the mics at anything
> from 30kmph up to 150kmph, so when they are passing the array they
> would be changing direction very rapidly.  I can do the maths and get
>
> That's a good idea about restricting the analysis to higher frequencies
> to make the window more effective, thanks.

You get better resolution with wider microphone separation, but side
lobes develop when the separation is too wide (more than about half a
wavelength at the highest frequency analysed). Three (or more)
microphones help to eliminate that effect. There's been a lot of work on
side-looking passive sonar that might bear on your project. I'm not up
on it, but I think the papers are out "there".

Jerry
--
Engineering is the art of making what you want from things you can get.

```
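Two back-of-envelope numbers behind that trade-off, assuming c = 343 m/s: the largest delay a pair can produce is d/c (sound arriving along the baseline), so the number of distinct whole-sample delays grows with the spacing d; but for narrowband or periodic components, once d exceeds half a wavelength at the highest frequency analysed, the phase wraps and several directions fit equally well (spatial aliasing). The function names are illustrative.

```python
import math

C = 343.0  # assumed speed of sound, m/s

def max_lag_samples(d_m, fs):
    """Largest possible |TDOA| in whole samples for mic spacing d_m (end-fire arrival)."""
    return math.floor(d_m * fs / C)

def max_unaliased_spacing(f_max_hz):
    """Spacing (m) above which components at f_max_hz are spatially aliased (d > lambda/2)."""
    return C / (2.0 * f_max_hz)

fs = 44100
for d in (0.05, 0.5, 2.0):
    print(f"d = {d} m: delays within +/-{max_lag_samples(d, fs)} samples")
print(f"no aliasing up to 8 kHz needs d <= {max_unaliased_spacing(8000) * 100:.1f} cm")
```

At 44.1 kHz a 5 cm pair spans only about ±6 samples while 0.5 m spans ±64, yet avoiding aliasing up to 8 kHz needs roughly 2 cm, which is the tension Jerry describes.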
```
Ordinary cross-correlation for estimating time delays only works well
with white noise as the signals! You will need what is called in the
literature 'Generalized Cross Correlation'. There are many methods:
Hannan-Thomson, PHAT and SCOT, to mention just three. Some work
better in reverberant environments and others in noisy environments.

```
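A compact sketch of the Generalized Cross-Correlation family mentioned above, assuming numpy: the cross-spectrum is multiplied by a weighting before the inverse FFT. PHAT divides by |Gxy|, SCOT by sqrt(Gxx·Gyy), and the Hannan-Thomson (maximum-likelihood) form uses |γ|²/(|Gxy|(1 − |γ|²)), with γ the coherence. The spectra are averaged over segments because with a single frame |X||Y| = |X*Y|, so SCOT would collapse to PHAT and the coherence would be identically 1. The segment length, names and `eps` regulariser are illustrative choices.

```python
import numpy as np

def averaged_spectra(x, y, seg_len):
    """Segment-averaged cross-spectrum Gxy and auto-spectra Gxx, Gyy (Hann, no overlap)."""
    w = np.hanning(seg_len)
    nfft = 2 * seg_len                                  # zero-pad so lags do not wrap
    n_seg = len(x) // seg_len
    Gxy = np.zeros(nfft // 2 + 1, dtype=complex)
    Gxx = np.zeros(nfft // 2 + 1)
    Gyy = np.zeros(nfft // 2 + 1)
    for i in range(n_seg):
        sl = slice(i * seg_len, (i + 1) * seg_len)
        X = np.fft.rfft(x[sl] * w, nfft)
        Y = np.fft.rfft(y[sl] * w, nfft)
        Gxy += np.conj(X) * Y
        Gxx += np.abs(X) ** 2
        Gyy += np.abs(Y) ** 2
    return Gxy / n_seg, Gxx / n_seg, Gyy / n_seg, nfft

def gcc_delay(x, y, seg_len, max_lag, weighting="phat", eps=1e-12):
    """Delay of y relative to x in samples (positive = y lags x) using a GCC weighting."""
    Gxy, Gxx, Gyy, nfft = averaged_spectra(x, y, seg_len)
    if weighting == "none":                             # plain cross-correlation
        psi = np.ones_like(Gxx)
    elif weighting == "phat":                           # keep phase only
        psi = 1.0 / (np.abs(Gxy) + eps)
    elif weighting == "scot":                           # normalise by auto-spectra
        psi = 1.0 / (np.sqrt(Gxx * Gyy) + eps)
    elif weighting == "ht":                             # Hannan-Thomson / ML
        coh2 = np.abs(Gxy) ** 2 / (Gxx * Gyy + eps)     # magnitude-squared coherence
        psi = coh2 / ((np.abs(Gxy) + eps) * (1.0 - coh2 + eps))
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    cc = np.fft.irfft(psi * Gxy, nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(cc[lags])])

# Synthetic check: common white noise, 7-sample delay, independent sensor noise.
rng = np.random.default_rng(2)
s = rng.standard_normal(16384)
x = s + 0.3 * rng.standard_normal(s.size)
y = np.concatenate((np.zeros(7), s[:-7])) + 0.3 * rng.standard_normal(s.size)
estimates = {wt: gcc_delay(x, y, 1024, 20, wt) for wt in ("none", "phat", "scot", "ht")}
```

On this white-noise toy all four weightings agree; they only start to differ on coloured, reverberant or noisy recordings, which is where the advice about choosing between them applies.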
```
I agree and have found that ordinary cross-correlation is insufficient;
I've been working with a GCC-PHAT algorithm and found it to be better
than the other ones for my application, thanks for the tip.  Also, I've
looked a bit at the impact of microphone separation.

What I'm really trying to pin down is how to select and justify my
window size prior to applying the GCC-PHAT cross correlation method to
(Gaussian) signal' or something like that.  Fine, so that has to be the
case and I understand why.  But I'm looking for some method to test a
series of window sizes (or mathematical proof) to come to a particular
value or range (for a given sampling frequency and bandwidth).

Hopefully I'm making sense and thank you so much Jerry and Naebad for

```
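The thread offers no closed-form answer, but one ad-hoc way to test a series of window sizes can be sketched under stated assumptions: record (or simulate) a source at a fixed position with realistic background noise, run GCC-PHAT frame by frame for each candidate length, and measure how often the per-frame estimate lands within ±1 sample of the median. The shortest window with an acceptable hit rate gives a lower bound; an upper bound comes from how far a moving source's delay drifts within one window. This is an empirical check, not a proof of stationarity; the synthetic data, the ±1-sample tolerance and the names below are assumptions for illustration (numpy assumed).

```python
import numpy as np

def gcc_phat_lag(a, b, max_lag, eps=1e-12):
    """GCC-PHAT delay of b relative to a, in whole samples, searched over +/-max_lag."""
    nfft = 2 * len(a)                                   # zero-pad to avoid circular wrap
    G = np.conj(np.fft.rfft(a, nfft)) * np.fft.rfft(b, nfft)
    cc = np.fft.irfft(G / (np.abs(G) + eps), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(cc[lags])])

def window_sweep(x, y, win_lengths, max_lag, tol=1):
    """For each window length, the fraction of non-overlapping Hann frames whose
    delay estimate lies within +/-tol samples of the median estimate."""
    hit_rate = {}
    for n in win_lengths:
        w = np.hanning(n)
        est = np.array([gcc_phat_lag(x[i:i + n] * w, y[i:i + n] * w, max_lag)
                        for i in range(0, len(x) - n + 1, n)])
        hit_rate[n] = float(np.mean(np.abs(est - np.median(est)) <= tol))
    return hit_rate

# Synthetic stand-in for a calibration recording: fixed 7-sample delay, noisy mics.
rng = np.random.default_rng(3)
s = rng.standard_normal(65536)
x = s + 2.0 * rng.standard_normal(s.size)
y = np.concatenate((np.zeros(7), s[:-7])) + 2.0 * rng.standard_normal(s.size)
rates = window_sweep(x, y, [64, 256, 1024, 4096], max_lag=20)
print(rates)
```

Plotting the hit rate against window length for recordings of the actual outdoor sounds would show where estimates become consistent; comparing that length with the drift-based upper bound shows whether any window satisfies both.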
```
I don't see how you can make this work.  At the ranges and speeds you are
looking at, a vehicle is a distributed sound source so that there is no single
distance that can be measured.  You might help yourself by using more
you knew your source perfectly, it couldn't be tracked using the techniques
you are proposing.

<duffnero@gmail.com> wrote:
>I'm trying to track moving objects such as vehicles on a road, between
>3 and 10 meters from the microphones.
>
>The sound source would be moving perpendicular to the mics at anything
>from 30kmph up to 150kmph, so when they are passing the array they
>would be changing direction very rapidly.  I can do the maths and get
>
>That's a good idea about restricting the analysis to higher frequencies
>to make the window more effective, thanks.
>
```
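The distributed-source objection can be illustrated with a deliberately crude, hypothetical model: treat the vehicle as two independent noise sources (say front and rear) that reach the microphone pair with different whole-sample delays. The cross-correlation then shows one peak per source rather than a single "vehicle" delay. The source model, the delays and the names are assumptions made only for this illustration; numpy assumed.

```python
import numpy as np

def xcorr_over_lags(x, y, max_lag):
    """Plain cross-correlation of x and y at lags -max_lag..max_lag (positive = y lags x)."""
    n = len(x) + len(y)                                  # zero-padded, no circular wrap
    cc = np.fft.irfft(np.conj(np.fft.rfft(x, n)) * np.fft.rfft(y, n), n)
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc[lags]

def delayed(s, k):
    """s delayed by k >= 0 whole samples, zeros shifted in at the start."""
    return np.concatenate((np.zeros(k), s[:len(s) - k]))

rng = np.random.default_rng(4)
fs = 44100
front = rng.standard_normal(fs)      # hypothetical source 1
rear = rng.standard_normal(fs)       # hypothetical source 2, independent of source 1
x = front + rear
y = delayed(front, 5) + delayed(rear, 12)
lags, cc = xcorr_over_lags(x, y, max_lag=30)
top_two = sorted(int(lag) for lag in lags[np.argsort(cc)[-2:]])
print(top_two)                       # prints [5, 12], one peak per source
```

A real vehicle is closer to a continuum of such sources whose relative delays change as it passes, so the single sharp peak assumed earlier in the thread may smear or split.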