DSPRelated.com
Forums

comparing audio signals to determine delay

Started by F.B. Uijtdewilligen March 19, 2007
Hello people,

I'm kinda new to all this, but I think I'm in the right place for the issue 
I'm trying to solve:

For an assignment in my studies, I have to implement a simulation of a
large sensor network consisting of nodes equipped with microphones. The
idea is to let the nodes record the audio from their surroundings, probably
transform (simplify) it in some way, and communicate it to each other. A
node receives information from its neighbours and compares it with its
own information (for the purpose of the simulation, there will only be one
main sound source). If it finds that a neighbouring node is "hearing the
same audio", it calculates the delay between its own signal and the
neighbour's signal to establish their distance. After some time, hopefully,
the nodes will be able to establish their locations with respect to their
neighbours.
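
Converting a measured delay into a distance is a single multiplication by the speed of sound. A minimal sketch, assuming sound travels at about 343 m/s (dry air near 20 °C; the thread does not specify this) and using hypothetical helper names. One caveat is made explicit in the docstring: with a single source at an unknown position, the delay between two nodes gives the difference of their distances to the source. That equals the node spacing only when the source lies on the line through both nodes, beyond one of them; otherwise it is smaller.

```python
# Sketch: turn a measured inter-node delay into a path-length difference.
# SPEED_OF_SOUND is an assumption (dry air at roughly 20 degC).
SPEED_OF_SOUND = 343.0  # m/s


def delay_from_samples(lag_samples: int, fs_hz: float) -> float:
    """Convert a lag measured in samples at sample rate fs_hz into seconds."""
    return lag_samples / fs_hz


def path_difference_m(delay_s: float, c: float = SPEED_OF_SOUND) -> float:
    """Extra distance (m) the sound travelled to reach the later node.

    With one source at an unknown position this is |d_B - d_A|, the
    difference of the source-to-node distances. By the triangle inequality
    it never exceeds the node spacing, and it equals the spacing only when
    the source lies on the line through both nodes, beyond one of them.
    """
    return abs(delay_s) * c


if __name__ == "__main__":
    tau = delay_from_samples(70, 8000.0)  # hypothetical lag and sample rate
    print(f"delay {tau * 1e3:.2f} ms -> path difference {path_difference_m(tau):.2f} m")
```

So a node pair that measures the largest delay over many source positions gets the best lower bound on its spacing.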

The issue I'm currently dealing with is how to compare the signals.
I've googled around and found some material about FFTs, but is that
suitable for a node with somewhat limited memory and/or CPU capabilities?
Other ideas were using some sort of peak detection, or dividing the audio
into parts of, say, 20 ms and calculating averages for those parts. Any
ideas are welcome!

Thanks for your time,

With kind regards,

Freek Uijtdewilligen
University of Twente, the Netherlands 


On Mar 19, 8:54 am, "F.B. Uijtdewilligen"
<f.b.uijtdewilli...@student.utwente.nl> wrote:

>
> The issue I'm currently dealing with is which way to compare the signals?
> I've googled around, found some about FFTs, but is this suitable in a node
> with somewhat limited memory and/or CPU-capabilities? Other ideas where
> using some sort of peak-detection or dividing the audio in parts of say
> 20ms, calculating averages for those parts.. Any ideas are welcome!
look into cross-correlation.

suppose y(t) has some delayed component of x(t) in it, in addition to some other signal, v(t), completely unrelated to x(t):

   y(t) = A*x(t-T) + v(t)

then the cross-correlation

   R_xy(tau) = integral{ x(t)*y(t+tau)*w(t) dt }

will be maximum when tau = T. w(t) is just a nice window function to reduce edge effects and make your integral finite in domain.

r b-j
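
rb-j's formula carries over directly to sampled signals. The sketch below is a brute-force discrete version, assuming equal-length sample lists, a Hann window standing in for w(t), and a lag search limited to ±max_lag samples; `xcorr_delay` and `hann` are hypothetical names. Limiting the search matters on a small node: the delay between two nodes can never exceed their spacing divided by the speed of sound, so max_lag can be set from the largest spacing you expect, and the cost is then (2*max_lag + 1) multiply-accumulates per windowed sample.

```python
import math
import random


def hann(m: int) -> list:
    """Hann window of length m (plays the role of w(t))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (m - 1)) for k in range(m)]


def xcorr_delay(x: list, y: list, max_lag: int) -> int:
    """Return the lag tau in [-max_lag, max_lag] that maximises the
    discrete windowed cross-correlation
        R_xy(tau) = sum_k x[i] * y[i + tau] * w[k],   i = k + max_lag.
    x and y must have equal length; the window covers the middle of x so
    that y[i + tau] stays in range for every tau. Positive tau: y lags x."""
    n = len(x)
    m = n - 2 * max_lag
    if len(y) != n or m < 2:
        raise ValueError("need equal-length signals longer than 2*max_lag + 1")
    w = hann(m)
    best_tau, best_r = 0, -math.inf
    for tau in range(-max_lag, max_lag + 1):
        r = 0.0
        for k in range(m):
            i = k + max_lag
            r += x[i] * y[i + tau] * w[k]
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau


if __name__ == "__main__":
    rng = random.Random(1)
    n, true_t, a = 400, 7, 0.5
    s = [rng.gauss(0.0, 1.0) for _ in range(n + 40)]   # broadband source
    x = s[20:20 + n]                                    # node A
    # node B: y(t) = A*x(t - T) + v(t), with a little independent noise v
    y = [a * s[20 + k - true_t] + 0.1 * rng.gauss(0.0, 1.0) for k in range(n)]
    print(xcorr_delay(x, y, max_lag=20))
```

With this synthetic broadband test signal (true delay 7 samples, gain 0.5), the function should report 7.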
F.B. Uijtdewilligen wrote:

   ...

> The issue I'm currently dealing with is which way to compare the signals?
> I've googled around, found some about FFTs, but is this suitable in a node
> with somewhat limited memory and/or CPU-capabilities?

   ...

You haven't stated what communication bandwidth is available, or whether the sensor nodes are battery powered. It's not likely that you can compare the time-domain audio signals (20 Hz-20 kHz) unless you have the bandwidth and grid-powered sensors. Even then, you'll need GPS-level timing to synchronize the signals (as in acoustic arrays and beamforming, for example).

FFT is doable, but it depends on the desired frequency resolution and the required bandwidth. For example, if this is a security application and you are trying to detect vehicles, then signal detection from 500-1000 Hz with a resolution of 50 Hz might be acceptable for tracking the vehicle with the sensor network.

So it depends on the details of the application, or on the assumptions that you want to make.
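
The resolution/bandwidth trade-off can be put in numbers: an N-point FFT at sample rate fs has bins spaced fs/N apart and needs N/fs seconds of audio per frame. A small sketch of that arithmetic, assuming an 8 kHz sample rate (the thread never states one) and a hypothetical `fft_plan` helper. At 50 Hz resolution this gives N = 160, i.e. 20 ms frames (which happens to match the 20 ms parts in the original post), and the 500-1000 Hz band then spans 11 bins.

```python
import math


def fft_plan(fs_hz: float, resolution_hz: float, f_lo_hz: float, f_hi_hz: float):
    """Size an FFT so that its bin spacing fs/N is at most resolution_hz.

    Returns (N, frame length in seconds, number of bins between f_lo and
    f_hi inclusive). Only those bins would need to be transmitted if the
    nodes exchanged band-limited spectra instead of raw samples."""
    n = math.ceil(fs_hz / resolution_hz)   # bins are fs/N apart
    df = fs_hz / n
    frame_s = n / fs_hz                    # audio needed per FFT frame
    first_bin = math.ceil(f_lo_hz / df)
    last_bin = math.floor(f_hi_hz / df)
    return n, frame_s, last_bin - first_bin + 1


if __name__ == "__main__":
    n, frame_s, bins = fft_plan(8000.0, 50.0, 500.0, 1000.0)  # fs is an assumption
    print(f"N = {n}, frame = {frame_s * 1e3:.0f} ms, {bins} bins instead of {n} samples")
```

The frame length N/fs is also the time granularity of each spectrum, so finer frequency resolution costs timing resolution. A radix-2 FFT would round N up to the next power of two.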
"Freelance Embedded Systems Engineer" <g9u5dd43@yahoo.com> wrote in
message news:45fec2fb$0$1413$4c368faf@roadrunner.com...
   ...

> You haven't stated what communication bandwidth is available or if the
> sensor nodes are battery powered. It's not likely that you can compare
> the time signals of audio signal (20-20kHz), unless you have bandwidth and
> grid-powered sensors. Even then you'll need GPS level timing in order to
> synchronize the signals, (example acoustic arrays and beamforming).

   ...

There are some prototype sensor nodes available, which are battery powered and equipped with mics, and I know they already do some filtering of the audio signal they capture. I've mailed to ask for more details about the nodes, especially their bandwidth capabilities. Would timing still be a problem, assuming the nodes all have their clocks synchronized before deployment? I don't suppose they would drift out of sync once they are properly synchronized...

In this first stage, I am trying to simulate the nodes finding the distance to their neighbours; locating or tracking a sound source is a later stage. Some of the logic will probably be the same, but it is a different thing altogether and therefore not part of my research.

Still, thanks for the advice. I'll post when I have more information about the nodes.
F.B. Uijtdewilligen wrote:

   ...

> There are some prototype sensor-nodes available, which are battery-powered
> and equiped with mics, and I know they already do some filtering to the
> audio signal they capture, I've mailed to inquire some more details about
> the nodes, especially the bandwith capabilities. Should the timing be a
> problem, assumed the nodes all have the time synchronized before deployment?
> I don't suppose they would run out of sync once they are properly
> synchronized...
Many general-purpose oscillator crystals have frequency tolerances of one part in 10^7, although crystals can be had that are better. How long do you need to run before resynching? How much drift can you tolerate? What procedure will you use to synch them all?

Jerry
-- 
Engineering is the art of making what you want from things you can get.
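
Jerry's questions can be answered with back-of-the-envelope arithmetic. A sketch, assuming each clock runs with a constant fractional frequency error of at most `tolerance` (temperature and aging ignored), that two nodes may err in opposite directions, and that sound travels at 343 m/s; all names are hypothetical. With the one-part-in-10^7 figure above, the worst-case offset after an hour is 0.72 ms, roughly 0.25 m of range error, so keeping the error under 0.1 m would need a resync about every 24 minutes.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degC)


def worst_case_offset_s(tolerance: float, elapsed_s: float) -> float:
    """Timing offset after elapsed_s seconds between two clocks synchronized
    at t = 0, each off by up to `tolerance` (fractional, e.g. 1e-7 for one
    part in 10^7) in opposite directions."""
    return 2.0 * tolerance * elapsed_s


def range_error_m(offset_s: float, c: float = SPEED_OF_SOUND) -> float:
    """Distance error that a timing offset turns into."""
    return offset_s * c


def max_time_between_resyncs_s(tolerance: float, max_error_m: float,
                               c: float = SPEED_OF_SOUND) -> float:
    """Longest run before the worst-case range error reaches max_error_m."""
    return max_error_m / (2.0 * tolerance * c)


if __name__ == "__main__":
    tol = 1e-7  # figure quoted above; use the actual crystal's datasheet value
    hour = worst_case_offset_s(tol, 3600.0)
    print(f"after 1 h: {hour * 1e3:.2f} ms -> {range_error_m(hour):.3f} m")
    print(f"resync every {max_time_between_resyncs_s(tol, 0.1):.0f} s for 0.1 m")
```

Plugging in the real crystal tolerance is the point of the exercise; the result scales linearly with it.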
On Mar 20, 4:24 am, "robert bristow-johnson"
<r...@audioimagination.com> wrote:
> look into cross-correlation.
>
> suppose y(t) has some delayed component of x(t) in it in addition to
> some other signal, v(t), completely unrelated to x(t):
>
> y(t) = A*x(t-T) + v(t)
>
> then the crosscorrelation:
>
> R_xy(tau) = integral{ x(t)*y(t+tau)*w(t) dt}
>
> will be maximum when tau = T. w(t) is just a nice window function to
> reduce edge effects and make your integral finite in domain.
>
> r b-j
Cross-correlation does not work well for estimating delays (unless the signals are white). You need generalized cross-correlation.

N.
On Mar 19, 4:32 pm, "naebad" <minnae...@yahoo.co.uk> wrote:
>
> Cross correlation does not work well when estimating delays (unless
> the signals are white). You need generalized cross correlation.
the signals need not be white, but what they need to be is broadbanded and *not* periodic, and then cross-correlation works okay. given those conditions, the tau that maximizes R_xy(tau) in

   R_xy(tau) = integral{ x(t)*y(t+tau)*w(t) dt }

will be the same tau that you get from minimizing this difference function:

    min     integral{ (x(t) - B*y(t+tau))^2 * w(t) dt }
   B,tau

the value of B will be about 1/A and tau will be about T if y(t) is expressed as

   y(t) = A*x(t-T) + v(t)

and v(t) is completely uncorrelated to x(t). if x(t) is periodic or quasi-periodic, then the problem is that there are several values of T (and therefore several values of tau) where the above is equally true, so your measured delay will be ambiguous. but it doesn't have to be white, just nonperiodic and reasonably broadbanded.

dunno what "generalized" cross-correlation is.

r b-j
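
rb-j's equivalence can be checked numerically. For a fixed tau, the B that minimizes the windowed squared difference is B = sum(x*y*w) / sum(y^2*w), and the minimum itself is sum(x^2*w) - sum(x*y*w)^2 / sum(y^2*w). So whenever sum(y^2*w) barely changes with tau, minimizing the difference and maximizing R_xy pick the same lag. The sketch below, with hypothetical names and synthetic test signals, scans both quantities. For broadband noise delayed by 4 samples with A = 2, the best lag should be 4 with B near 0.5; for a sine with a 10-sample period, lags 10 apart give the same R_xy, which is the ambiguity described above.

```python
import math
import random


def lag_scan(x, y, max_lag):
    """For tau in [-max_lag, max_lag] return (tau, R, B, misfit) where, with
    i = k + max_lag and a Hann window w[k]:
        R      = sum_k x[i] * y[i+tau] * w[k]
        B      = R / sum_k y[i+tau]^2 * w[k]      (least-squares gain)
        misfit = sum_k (x[i] - B*y[i+tau])^2 * w[k] at that B
               = sum_k x[i]^2 * w[k] - R^2 / sum_k y[i+tau]^2 * w[k]"""
    m = len(x) - 2 * max_lag
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / (m - 1)) for k in range(m)]
    rows = []
    for tau in range(-max_lag, max_lag + 1):
        sxx = sxy = syy = 0.0
        for k in range(m):
            i = k + max_lag
            sxx += x[i] * x[i] * w[k]
            sxy += x[i] * y[i + tau] * w[k]
            syy += y[i + tau] * y[i + tau] * w[k]
        b = sxy / syy
        rows.append((tau, sxy, b, sxx - sxy * sxy / syy))
    return rows


def make_pair(source, true_t, a, noise, rng, n, pad=20):
    """Synthetic test pair: x = source segment, y(t) = a*x(t - true_t) + noise.
    Needs len(source) >= n + 2*pad and |true_t| <= pad."""
    x = source[pad:pad + n]
    y = [a * source[pad + k - true_t] + noise * rng.gauss(0.0, 1.0) for k in range(n)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(2)
    n = 400
    # Broadband, non-periodic source: the R peak and the misfit minimum coincide.
    noise_src = [rng.gauss(0.0, 1.0) for _ in range(n + 40)]
    x, y = make_pair(noise_src, 4, 2.0, 0.1, rng, n)
    rows = lag_scan(x, y, 20)
    best_r = max(rows, key=lambda r: r[1])
    best_fit = min(rows, key=lambda r: r[3])
    print("argmax R:", best_r[0], " argmin misfit:", best_fit[0], " B:", round(best_fit[2], 3))
    # Periodic source (period 10 samples): lags 10 apart fit equally well.
    sine_src = [math.sin(2 * math.pi * k / 10) for k in range(n + 40)]
    x, y = make_pair(sine_src, 3, 1.0, 0.0, rng, n)
    r_by_tau = {row[0]: row[1] for row in lag_scan(x, y, 20)}
    print("R(3) =", round(r_by_tau[3], 3), " R(13) =", round(r_by_tau[13], 3))
```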
robert bristow-johnson wrote:

> On Mar 19, 4:32 pm, "naebad" <minnae...@yahoo.co.uk> wrote:
>>
>> Cross correlation does not work well when estimating delays (unless
>> the signals are white). You need generalized cross correlation.
>
> the signals need not be white, but what they need to be is broadbanded
> and *not* periodic, and then cross correlation works okay.
Since audio is often periodic, I would reduce the AC signal to a series of RMS or peak values with much coarser time resolution (depending on the required precision of the location). This also dramatically reduces the CPU workload of the cross-correlation.

bye
Andreas
-- 
Andreas Hünnebeck
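
Andreas's suggestion amounts to cross-correlating envelopes instead of waveforms. A minimal sketch, assuming 8 kHz audio, 20 ms blocks, and hypothetical helper names: each signal is reduced to one RMS value per block, the block means are removed, and the lag that maximizes the envelope cross-correlation is reported in blocks. The test signal is a strongly periodic 500 Hz tone whose loudness changes from block to block, so the waveform alone would give ambiguous peaks, but the envelope does not.

```python
import math
import random


def block_rms(signal, block_len):
    """One RMS value per non-overlapping block of block_len samples."""
    out = []
    for start in range(0, len(signal) - block_len + 1, block_len):
        blk = signal[start:start + block_len]
        out.append(math.sqrt(sum(v * v for v in blk) / block_len))
    return out


def envelope_delay_blocks(env_x, env_y, max_lag):
    """Lag (in blocks, positive = y later) maximising the cross-correlation
    of the mean-removed envelopes over their overlapping part."""
    mx = sum(env_x) / len(env_x)
    my = sum(env_y) / len(env_y)
    ex = [v - mx for v in env_x]
    ey = [v - my for v in env_y]
    best_tau, best_r = 0, -math.inf
    for tau in range(-max_lag, max_lag + 1):
        r = sum(ex[i] * ey[i + tau] for i in range(len(ex)) if 0 <= i + tau < len(ey))
        if r > best_r:
            best_tau, best_r = tau, r
    return best_tau


if __name__ == "__main__":
    fs, block = 8000, 160                       # assumed: 8 kHz, 20 ms blocks
    rng = random.Random(3)
    amps = [rng.uniform(0.0, 1.0) for _ in range(53)]
    # Strongly periodic 500 Hz tone whose loudness changes from block to block.
    src = [amps[k // block] * math.sin(2 * math.pi * 500 * k / fs) for k in range(53 * block)]
    x = src[3 * block:]                         # node A: blocks 3..52
    y = src[:50 * block]                        # node B hears the same audio 3 blocks later
    lag = envelope_delay_blocks(block_rms(x, block), block_rms(y, block), 10)
    print(f"{lag} blocks = {lag * block / fs * 1e3:.0f} ms")
```

Removing the mean matters because RMS values are all positive; without it the sum mostly measures how many blocks overlap, which favours lag 0. The price of the reduction is resolution: one 20 ms block corresponds to about 6.9 m of path at 343 m/s.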
On Mar 20, 9:52 am, "robert bristow-johnson"
<r...@audioimagination.com> wrote:
> the signals need not be white, but what they need to be is broadbanded
> and *not* periodic, and then cross correlation works okay.
>
>    ...
>
> dunno what "generalized" cross correlation is.
>
> r b-j
No, they need to be white for a good estimate of delay. There is not space to discuss it all here. You need to look at Knapp and Carter's paper:

Knapp and G.C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. ASSP, vol. 24, no. 4, pp. 320-326, 1976.

...

F.
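
Generalized cross-correlation weights the cross spectrum before transforming back to the lag domain. The sketch below uses the PHAT (phase transform) weighting, one of the weightings treated in the Knapp and Carter paper, chosen here only as a common example; it is not necessarily what was meant above. Dividing conj(X)*Y by its magnitude whitens the cross spectrum, which keeps the peak sharp even for non-white signals. A plain DFT keeps it dependency-free (a node would use an FFT); the names, the zero-padding choice, and the 1e-12 guard are assumptions.

```python
import cmath
import math
import random


def dft(seq, inverse=False):
    """Plain O(N^2) DFT to keep the sketch dependency-free; a node would use an FFT."""
    n = len(seq)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += seq[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x, y, max_lag):
    """Lag of y relative to x (positive = y later), estimated by GCC with
    PHAT weighting: the cross spectrum conj(X)*Y is divided by its own
    magnitude, leaving only phase, before transforming back to lags."""
    n = 2 * max(len(x), len(y))   # zero-pad so circular lags match linear ones
    big_x = dft(list(x) + [0.0] * (n - len(x)))
    big_y = dft(list(y) + [0.0] * (n - len(y)))
    weighted = []
    for a, b in zip(big_x, big_y):
        g = a.conjugate() * b
        weighted.append(g / abs(g) if abs(g) > 1e-12 else 0j)
    r = dft(weighted, inverse=True)
    best_tau, best_val = 0, -math.inf
    for tau in range(-max_lag, max_lag + 1):
        val = r[tau % n].real       # negative lags wrap to the end of the array
        if val > best_val:
            best_tau, best_val = tau, val
    return best_tau


if __name__ == "__main__":
    rng = random.Random(4)
    s = [rng.gauss(0.0, 1.0) for _ in range(69)]
    x = s[5:69]                     # node A
    y = s[0:64]                     # node B: same audio, 5 samples later
    print(gcc_phat_delay(x, y, 10))
```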
minfitlike@yahoo.co.uk wrote:

   ...

> No, they need to be white for a good estimate of delay. There is not
> space to discuss it all here. You need to look at Knapp and Carters
> paper.
>
> Knapp, G.C. Carter, The generalized correlation method for estimation
> of time delay, IEEE Trans. ASSP. 24 (4) (1976) 320-326. ...
You must be looking at one specific approach and assuming that the conditions to make it work apply generally. Suppose the signal consisted of a clean narrow spike. That would make timing the differential delay easy, and there's nothing noisy about it, let alone white.

Jerry
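
The clean-spike case, like the peak-detection idea in the original post, needs no correlation at all: timestamp the first sample above a threshold at each node and subtract. A minimal sketch with hypothetical names and a threshold you would have to choose above the noise floor.

```python
def first_crossing(signal, threshold):
    """Index of the first sample whose magnitude exceeds threshold, else None."""
    for i, v in enumerate(signal):
        if abs(v) > threshold:
            return i
    return None


def onset_delay(x, y, threshold):
    """Delay in samples of y relative to x for an impulsive sound, taken
    between first threshold crossings (None if either node missed it)."""
    ix = first_crossing(x, threshold)
    iy = first_crossing(y, threshold)
    if ix is None or iy is None:
        return None
    return iy - ix


if __name__ == "__main__":
    x = [0.0] * 100
    y = [0.0] * 100
    x[40], x[41] = 1.0, 0.5          # clean spike at node A
    y[47], y[48] = 0.6, 0.3          # same spike, weaker and 7 samples later at node B
    print(onset_delay(x, y, threshold=0.3))
```

Because the two nodes see different amplitudes, a fixed threshold can shift the detected onset slightly when the spike has a finite rise time, so this works best for sharp, loud events.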