Hello,

I am looking for advice, or a push in the right direction, on choosing the window size to apply to a digital audio signal.

My intention is to cross-correlate two non-stationary audio signals, recorded outdoors by a pair of adjacent microphones, to determine the time delay of arrival between them. Both are sampled at 44.1 kHz. However, from what I've read, the cross-correlation technique assumes the signals are stationary. So my window needs to be short enough for the audio signal to be quasi-stationary, yet long enough to give a reasonably confident result. How can I decide on a particular window size, prove that the resulting audio segments are stationary, and hence justify my decision?

From what I understand, the window size depends on the type of data and on what is considered 'stationary' for that type of data; e.g. a speech signal is considered stationary over windows of 20 to 30 ms. However, my signal is any general outdoor noise that I can pick up, and I don't want to restrict it to one type of sound.

I could make an ad-hoc decision, but I really want to back up, and prove, that my choice of window size is the most appropriate.

Any advice is much appreciated, thanks.
OJ

# choosing correct window size for audio cross-correlation

Started October 11, 2005

Reply on October 11, 2005

How fast does the signal direction change? General outdoor sound comes from many directions simultaneously. What are you trying to find?

You can usually do better by restricting the analysis to higher frequencies, where a given delay causes a greater phase difference. Then a shorter window can be effective.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
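Jerry's point about higher frequencies can be made concrete: a pure delay tau appears as a phase difference of 2*pi*f*tau between the two channels, so the same delay becomes a larger, easier-to-measure phase offset as f rises. A minimal sketch; the 0.1 ms delay is an arbitrary value chosen only for illustration:

```python
import math

def phase_difference_deg(freq_hz, delay_s):
    """Phase offset, in degrees, that a pure delay produces at a single frequency."""
    return math.degrees(2 * math.pi * freq_hz * delay_s)

delay = 1e-4  # hypothetical 0.1 ms inter-microphone delay, for illustration only
for f in (500, 2000, 5000):
    print(f"{f:5d} Hz: {phase_difference_deg(f, delay):6.1f} deg")
```

The same 0.1 ms shows up as 18 degrees at 500 Hz but 180 degrees at 5 kHz. Beyond 180 degrees the phase wraps, so the usable band still has an upper limit set by the largest possible delay.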

Reply on October 11, 2005

I'm trying to track moving objects, such as vehicles on a road, between 3 and 10 meters from the microphones.

The sound source would be moving perpendicular to the mics at anything from 30 km/h up to 150 km/h, so when they are passing the array they would be changing direction very rapidly. I can do the maths and get back to you if you want more info.

That's a good idea about restricting the analysis to higher frequencies to make the window more effective, thanks.
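Doing the maths offered here gives one rough upper bound on the window length: the window should not be so long that the delay itself drifts noticeably inside it. This is a sketch under assumptions that are not in the thread: the vehicle is treated as a point source on a straight road, the microphone pair's axis is parallel to the road, and the far-field model tau = (d/c)*sin(theta) applies, with theta the bearing from broadside, d the microphone spacing and c = 343 m/s. With closest range r and speed v, d(theta)/dt = (v/r)*cos^2(theta), so d(tau)/dt = (d*v/(c*r))*cos^3(theta), which peaks at closest approach. Requiring the delay to drift by at most one sample within a window gives T <= (1/fs)*c*r/(d*v). The spacing d = 0.3 m and the one-sample tolerance are hypothetical choices.

```python
def max_delay_rate(speed_mps, closest_range_m, mic_spacing_m, c=343.0):
    """Peak |d(tau)/dt| (seconds of delay per second), reached at closest approach,
    for a point source on a straight line with the mic axis parallel to that line."""
    return mic_spacing_m * speed_mps / (c * closest_range_m)

def max_window_s(speed_mps, closest_range_m, mic_spacing_m, fs, tol_samples=1.0, c=343.0):
    """Longest window (seconds) over which the delay drifts by at most tol_samples."""
    rate = max_delay_rate(speed_mps, closest_range_m, mic_spacing_m, c)
    return (tol_samples / fs) / rate

fs = 44100
d = 0.3  # hypothetical microphone spacing in metres
for kmh, r in ((150, 3.0), (30, 10.0)):
    v = kmh / 3.6  # km/h -> m/s
    T = max_window_s(v, r, d, fs)
    print(f"{kmh} km/h at {r} m: window <= {1e3 * T:.1f} ms ({T * fs:.0f} samples)")
```

Under these assumptions the 150 km/h, 3 m case allows only about 1.9 ms (roughly 82 samples), while 30 km/h at 10 m allows about 31 ms, so near closest approach the geometry, rather than the statistics of the sound, may be what limits the window. A looser tolerance, for example with sub-sample peak interpolation, relaxes the bound proportionally.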

Reply on October 11, 2005

You get better resolution with wider microphone separation, but side lobes develop when the separation is too wide. Three (or more) microphones help to eliminate that effect.

There's been a lot of work on side-looking passive sonar that might bear on your project. I'm not up on it, but I think the papers are out "there".

Jerry
-- 
Engineering is the art of making what you want from things you can get.
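The spacing trade-off can be put into numbers for a single pair. The delay can only range over plus or minus d/c, so the number of integer-sample lags available, 2*floor(d*fs/c) + 1, grows with the spacing d, which is where the better resolution comes from. On the other side, a narrowband (tonal) component above f = c/(2d) spans more than half a wavelength across the pair, so its phase difference wraps and the cross-correlation can show several comparable peaks; this is one common explanation for the extra lobes. A minimal sketch, assuming c = 343 m/s and a few hypothetical spacings:

```python
def pair_limits(mic_spacing_m, fs, c=343.0):
    """Return (max |delay| in samples, number of integer lags, spatial-aliasing frequency in Hz)
    for a two-microphone pair."""
    max_lag = mic_spacing_m * fs / c        # largest physically possible delay, in samples
    n_lags = 2 * int(max_lag) + 1           # integer lags from -max to +max
    f_alias = c / (2 * mic_spacing_m)       # above this, a pure tone's phase difference can wrap
    return max_lag, n_lags, f_alias

for d in (0.1, 0.3, 1.0):  # hypothetical spacings in metres
    max_lag, n_lags, f_alias = pair_limits(d, 44100)
    print(f"d = {d} m: +/-{max_lag:.1f} samples, {n_lags} lags, aliasing above {f_alias:.0f} Hz")
```

For d = 0.3 m this gives about 38.6 samples of delay range but a wrap-around frequency near 570 Hz, which is why wide spacing and high-frequency analysis pull against each other for tonal sound, and why a third microphone at a different spacing can help resolve the ambiguity.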

Reply on October 11, 2005

Ordinary cross-correlation for estimating time delays only works well when the signals are white noise! With non-white signals the correlation peak is broadened by the signals' own autocorrelation. You will need what the literature calls 'Generalized Cross Correlation' (GCC). There are many methods, the Hannan-Thomson, PHAT and SCOT algorithms to mention just three. Some work better in reverberant environments and others in noisy environments.

Naebad
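GCC weights the cross-spectrum before transforming it back to the lag domain. The sketch below, assuming NumPy, implements only the unweighted case and the PHAT weighting (divide the cross-spectrum by its magnitude, keeping only phase). SCOT divides by the square root of the product of the two auto-spectra, and the Hannan-Thomson (maximum-likelihood) weighting depends on the coherence; both need auto-spectra averaged over several sub-frames (with a single-frame estimate SCOT collapses to PHAT), so they are left out. The function name `gcc` and its parameters are hypothetical.

```python
import numpy as np

def gcc(x, y, fs, weighting="phat", max_tau=None, eps=1e-12):
    """Estimate the delay of y relative to x in seconds (positive when y lags x)
    from the peak of a generalized cross-correlation of one frame."""
    n = len(x) + len(y)                     # zero-pad so circular correlation == linear
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)                      # single-frame cross-spectrum estimate
    if weighting == "phat":
        R = R / (np.abs(R) + eps)           # PHAT: keep phase, discard magnitude
    elif weighting != "none":
        raise ValueError("weighting must be 'phat' or 'none'")
    cc = np.fft.irfft(R, n=n)               # cc[k] ~ sum_n x[n] * y[n + k] (circular)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

# Usage: white noise delayed by 7 samples between the two channels.
rng = np.random.default_rng(0)
fs, true_lag = 44100, 7
s = rng.standard_normal(4096 + true_lag)
x, y = s[true_lag:], s[:-true_lag]          # y[n] = x[n - true_lag]
print(round(gcc(x, y, fs) * fs))
```

Passing `max_tau = d / c` for a microphone spacing `d` would restrict the peak search to physically possible lags, which also reduces the chance of locking onto a spurious peak.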

Reply on October 12, 2005

I agree, and have found that ordinary cross-correlation is insufficient; I've been working with a GCC-PHAT algorithm and found it better than the others for my application. Thanks for the tip. I've also looked a bit at the impact of microphone separation.

What I'm really trying to pin down is how to select and justify my window size before applying the GCC-PHAT cross-correlation method to it. All the papers start with 'assuming a wide-sense-stationary (Gaussian) signal' or something like that. Fine, so that has to be the case, and I understand why. But I'm looking for some method of testing a series of window sizes (or a mathematical proof) to arrive at a particular value or range (for a given sampling frequency and bandwidth).

Hopefully I'm making sense, and thank you so much, Jerry and Naebad, for your feedback.
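No single test proves that a window is stationary, but a window-size sweep can at least support a choice empirically. One heuristic, sketched below with NumPy and not taken from the thread: split each candidate window into two halves, compare their band-averaged power spectra, and treat the window as quasi-stationary if the halves differ by less than a chosen number of dB. Since wide-sense stationarity is a statement about second-order statistics, comparing power spectra at least targets the right property. The function names, the 16-band averaging, the 3 dB threshold and the 90% pass fraction are all hypothetical choices.

```python
import numpy as np

def half_split_distance_db(x, win_len, n_bands=16, eps=1e-20):
    """For each non-overlapping window of win_len samples, return the RMS difference (dB)
    between the band-averaged power spectra of its first and second halves."""
    half = win_len // 2
    n_bins = half // 2 + 1
    if n_bins < n_bands:
        raise ValueError("win_len too short for the requested number of bands")
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)  # every band gets >= 1 bin
    taper = np.hanning(half)
    out = []
    for start in range(0, len(x) - win_len + 1, win_len):
        frame = x[start:start + win_len]
        bands = []
        for h in (frame[:half], frame[half:2 * half]):
            p = np.abs(np.fft.rfft(h * taper)) ** 2
            # coarse band averages tame the large variance of a raw periodogram
            bands.append(np.array([p[lo:hi].mean() for lo, hi in zip(edges[:-1], edges[1:])]))
        diff_db = 10 * np.log10((bands[0] + eps) / (bands[1] + eps))
        out.append(np.sqrt(np.mean(diff_db ** 2)))
    return np.array(out)

def longest_quasi_stationary(x, candidates, threshold_db=3.0, fraction=0.9):
    """Largest candidate window length for which at least `fraction` of its windows
    have a half-split distance below threshold_db (None if none qualify)."""
    best = None
    for n in sorted(candidates):
        d = half_split_distance_db(x, n)
        if len(d) and np.mean(d < threshold_db) >= fraction:
            best = n
    return best
```

Run on representative recordings from each microphone, `longest_quasi_stationary` returns the longest candidate that passes; a vehicle pass should start failing once a window spans a noticeable change in level or spectrum. Very short windows can also fail for purely statistical reasons, because each band average then contains only a few FFT bins, and the outcome is only as convincing as the threshold, so it is worth reporting the threshold alongside the chosen window.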

Reply on October 18, 2005

I don't see how you can make this work. At the ranges and speeds you are looking at, a vehicle is a distributed sound source, so there is no single distance that can be measured. You might help yourself by using more information about your source in your problem. It appears to me that even if you knew your source perfectly, it couldn't be tracked using the techniques you are proposing.