DSPRelated.com
Forums

cross correlation sound

Started by maz_p5 August 24, 2008
Yes, you are definitely using too many points, so cut that number down
by a significant amount.  Given the physical parameters of your set-
up, you really don't need nearly a million points to compute
something.

I know you're under deadline, and you're probably feeling a bit
stressed at this point, so let me suggest something that could make
things easier for you.  It has to do with how you're computing your
solution once you get the time delays.

You don't really need to solve simultaneous equations or anything like
that.  You can simply use brute force computer power to get an answer
to "what is the x,y location of the sound source, given the following
delays between microphones 1, 2 and 3?"

Let's presume we have the time delays between the microphones.  Now
the problem becomes: "given the physical arrangement of the set-up,
might we use a look-up table to determine the x,y location of the
sound source, given that we have the following delays between
microphones?"

You know your microphone spacing.  With your given sound speed, you
know exactly how long it takes for a sound to travel from one point to
another.  Let's say that your microphones are placed at positions M0
(x=0, y=0), MX (x=1, y= 0), and MY (x=0, y=1).  OK.  I know that your
spacing is different, but follow along.  Your sound source is located
somewhere in the upper right quadrant bounded by M0 on the lower left,
MY on the upper left, and MX on the lower right.  The sound source is
somewhere in the upper right quadrant of a Cartesian coordinate
system, and you're trying to find its x,y location.  Now place an x,y
grid on top of the Cartesian quadrant.  The spacing of the grid points
is up to you, but your precision is limited by the physical parameters
of your problem.  Then compute the time delay from every grid point to
the known microphone locations.

For instance, presuming that the sound source is exactly at the M0
location, then, based on your spacing, you know precisely how long it
will take the sound to reach M0, MX and MY (the time to M0 is 0, and
the travel time to the others can be precisely computed).

Now sequence through every single one of the grid points and calculate
the delay from that point to M0, MX and MY.  This can be used to
generate a look-up table for any x,y grid point.  Each x,y point thus
contains the delays to each microphone.  All of this can be pre-
computed.
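
The pre-computation described above can be sketched as follows. This is only a minimal illustration, not the poster's code: it assumes the example positions M0, MX and MY are in metres, a sound speed of 340 m/s, a 44100 Hz sample rate, and a hypothetical grid spacing of 0.05. The names `delay_in_samples` and `build_table` are made up for this sketch.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s (assumed)
SAMPLE_RATE = 44100.0    # Hz (assumed)
# Example microphone positions from the post; metres are an assumed unit.
MICS = {"M0": (0.0, 0.0), "MX": (1.0, 0.0), "MY": (0.0, 1.0)}

def delay_in_samples(point, mic):
    """Travel time from point to mic, expressed in sample periods."""
    distance = math.hypot(point[0] - mic[0], point[1] - mic[1])
    return distance / SPEED_OF_SOUND * SAMPLE_RATE

def build_table(step=0.05, size=1.0):
    """Map each grid index (i, j) to its delays to M0, MX and MY."""
    n = int(round(size / step))
    table = {}
    for i in range(n + 1):
        for j in range(n + 1):
            point = (i * step, j * step)
            table[(i, j)] = tuple(delay_in_samples(point, m) for m in MICS.values())
    return table

table = build_table()
print(len(table))      # number of grid points (21 x 21 with the defaults)
print(table[(0, 0)])   # source at M0: zero delay to M0, equal delays to MX and MY
```

Integer grid indices are used as keys so that look-ups do not depend on floating-point equality; the physical position of entry (i, j) is (i * step, j * step).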

Now obtain the delays by using max( ) or xcorr( ) on your recorded
data.  They are your measured results.  Compare the measured delays
with your previously generated grid data by going through the entire
list of grid points and comparing each entry with the measured values.
Start with grid point 0,0, and run through all the rest.  Find the
best match between the measured delays and your pre-computed ones.

You'll have a problem in that you need to account for positive or
negative delays (e.g.: m1 might be ahead of m2, or m2 might be ahead
of m1).  You'll also actually need multiple tables, because you have
multiple microphones.
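
One way to handle the sign issue is to store signed delay differences relative to a reference microphone, since a cross correlation between two channels measures a difference rather than an absolute travel time. The sketch below is an illustration under assumptions, not the poster's method: M0 is taken as the reference, the positions are assumed to be in metres, and `tdoa` and `locate` are hypothetical names. The "measured" values are simulated from a known point instead of coming from xcorr( ).

```python
import math

C = 340.0      # speed of sound in m/s (assumed)
FS = 44100.0   # sample rate in Hz (assumed)
MICS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # M0, MX, MY; metres assumed

def tdoa(point):
    """Signed delay differences (MX - M0, MY - M0) in samples for a source at point."""
    t = [math.hypot(point[0] - mx, point[1] - my) / C * FS for mx, my in MICS]
    return (t[1] - t[0], t[2] - t[0])

def locate(measured, step=0.05, size=1.0):
    """Brute-force search for the grid point whose differences best match measured."""
    n = int(round(size / step))
    best, best_err = None, float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            d = tdoa(p)
            # Squared error keeps the sign information of each difference.
            err = (d[0] - measured[0]) ** 2 + (d[1] - measured[1]) ** 2
            if err < best_err:
                best, best_err = p, err
    return best

measured = tdoa((0.30, 0.70))   # simulated stand-in for measured delays
best = locate(measured)
print(best)                     # grid point closest to (0.30, 0.70)
```

A negative first entry means the sound reached MX before M0, so no separate handling of "who is ahead" is needed. With real, noisy measurements the smallest error will not be zero, and extra microphones add further differences to the error sum.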

Given the physical parameters of your problem, it shouldn't be too
difficult to compute look-up table(s) with pre-computed time delays
from any x,y point in the grid to the known locations of M0, MY and
MX.  Then, using your measured delays ( from max( ) or xcorr( ) ),
you'd search through the look-up table(s) to find the closest match.
The physical parameters of your problem indicate that the maximum
travel time will be 155 sample time delays, so you don't need very big
table(s) to cover all the possible time delays.
Hi,

Thank you. This seems OK because it will also give me an idea of whether the way we were calculating the delay was right or wrong.

I will try it out and get back to you.

But one more thing. I have the formula and the functions to calculate the coordinates of the sound source using non-linear equations. I get perfect results for the center location (i.e. when I generate a sound in the center of the table). BUT for any location other than the center, I still get the center as the result, which is weird.

I figured out where the problem is: it is in the delay itself, which is what we are trying to solve. The delay is converted into a relative distance as (delay * 340) / 44100, where 340 is the speed of sound and 44100 Hz is the sampling frequency fz. This answer is very small, so when it is applied to the function there is not a big enough change in the value, and the location always comes out as the center.
Hi,

Thank you. This seems OK because it will also give me an idea of whether the way we were calculating the delay was right or wrong.

I have captured the sound from different coordinates on the board in sets of 4 (i.e. from 4 mics).

But one more thing. I have the formula and the functions to calculate the coordinates of the sound source using non-linear equations. I get perfect results for the center location (i.e. when I generate a sound in the center of the table). BUT for any location other than the center, I still get the center as the result, which is weird.

I figured out where the problem is: it is in the delay itself, which is what we are trying to solve. The delay is converted into a relative distance as (delay * 340) / 44100, where 340 is the speed of sound and 44100 Hz is the sampling frequency fz. This answer is very small, so when it is applied to the function there is not a big enough change in the value, and the location always comes out as the center.
Also I have read about ways such as frequency cross-correlation using FFT and generalized cross correlation. Will this help? How do I do this?
In my post of Aug. 24, 10:59 PM, I described how a simple cross
correlation is done with FFT, and mentioned 3 references for
generalized cross correlation.  Basically, a generalized cross
correlator requires adding a filter in the frequency domain.  For
instance, one particular filter function is known as the 'smoothed
coherence transform' or SCOT:

G. C. Carter, A. H. Nuttall, P. G. Cable, "The smoothed coherence
transform," Proc. IEEE (Lett.), vol. 61, pp. 1497-1498, Oct. 1973

There are many other filters (see previous 3 references).  But those
filter functions are not easy for a beginner to figure out, and it may
be very difficult to do it on your own.  Many years ago, I took a
graduate course on time delay estimation, and we spent a great deal of
time computing and comparing various filters and running computer
simulations.  I just don't think that it's something that anyone can
learn on a weekend.

You might want to try the grid method first.  Just superimpose a grid
on your data space and compute the time delays from any given point x,y
to the known microphone locations.  Then, after getting the measured
time delays from your data (from xcorr( ) or max( ) ), sift through
all x,y points of the grid until you find the best match between
measured and pre-computed delays.  It'd be a lot easier.



On Sep 11, 12:07 am, kevinjmc...@netscape.net wrote:
> In my post of Aug. 24, 10:59 PM, I described how a simple cross
> correlation is done with FFT, and mentioned 3 references for
> generalized cross correlation.  Basically, a generalized cross
> correlator requires adding a filter in the frequency domain.

so, is this in lieu of windowing in the time domain?  or to undo some
windowing in the time domain?

just curious.

r b-j
Hi,

OK, let's leave the generalized cross-correlation for later (if I have time). You told me you can help me with the localization as well once I get the delays. I think the delays we calculate using xcorr are right.

I am working on the grid technique, but it's obviously not the best if the area is large. Can you suggest any other method? I have papers that discuss over-determined systems solved with least squares or least mean squares techniques.
Hi,

I did try it for the first location, i.e. (30,10).

The coordinates of the mics are: m1(0,0), m2(0,60), m3(120,60), m4(120,0).
The location at which the sound is made is: (30,10) (all in centimeters).

Now, the distance between mic 1 and the source is 31.62, therefore time1 = 0.00093 s (t = d/s, i.e. t = 31.62/34000).
Similarly,
for mic 2, time2 = 0.001715
for mic 3, time3 = 0.00303
for mic 4, time4 = 0.00266

Therefore, the time difference between mic 1 and mic 2 = time2 - time1 = 0.000785 (do we have to divide by the fz?)

Therefore, the difference between the mic 1 to source distance and the mic 2 to source distance is: d = 26.69 (d = t*s, i.e. 0.000785 * 34000).

Now, using xcorr, the delay between mic 1 and 2 is 30 sample units. Converting it to relative distance: d1 = (delay * 34000)/fz = 23.1293.

Thus, d and d1 are not the same. Is this what you are trying to say? Are the method and values correct?
To robert bristow-johnson: Actually, it's not really in lieu of
windowing or undoing some windowing in the time domain.  The two
signals are appropriately FFT'd, then one result is conjugated.  This
is equivalent to time reversing one of the waveforms in the time domain.
Then the two results are multiplied in the frequency domain, the
filtering function multiplication is applied, and the result is
inverse transformed.  The output is the cross correlation.  If there
were no conjugation, then you'd get the convolution of the two inputs.
When doing a generalized cross correlation, the filter is applied
AFTER the conjugation/multiplication in the frequency domain, and just
before the inverse.  I suppose you could move the filter past the
inverse transform and then convolve it with your unfiltered result
(i.e.: do a simple cross correlator and then convolve the output with
the inverse transform of your frequency domain filter), but that seems
more difficult.  I suspect it would be even more difficult if you were
to try to move the filter function to the front end of the problem,
because you'd have to move it past the multiplication, conjugation and
FFT parts.  I suppose it could be done, and in that case, you'd window
both inputs in the time domain before doing the simple cross
correlation.  But those windows will be different for each time domain
input, and perhaps very difficult to compute (they're difficult enough
as it is).  It just seems easier to use the filter in the frequency
domain.

The main purpose of the frequency domain filter is to overcome the
problems with a simple cross correlator by taking into account the
characteristics of the signals and noise.  With a simple cross
correlator, there are two major problems: 1) any noise added to the
signals may cause the output to indicate a false time delay peak, and
2) sinusoidal inputs can give you time delay outputs that oscillate.
In a generalized cross correlator, the frequency domain filter is used
to get a good time delay estimate based on the characteristics of the
signal and noise.
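
The sequence described above (FFT both signals, conjugate one result, multiply, apply the filter, inverse transform) can be sketched as below. This is a hedged illustration only: `fft`, `gcc_delay` and `phat` are hypothetical names, the small radix-2 FFT is included just to keep the example self-contained, and the example filter is the phase transform (PHAT) weighting, chosen here because it is short to write. It is not the SCOT filter cited earlier in the thread, which needs averaged spectral estimates. With `weight=None` the function reduces to a simple cross correlator.

```python
import cmath
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT (length must be a power of two); inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def gcc_delay(x, y, weight=None):
    """Delay of y relative to x in samples (positive when y lags x)."""
    n = 1
    while n < len(x) + len(y):          # zero-pad so circular lags act as linear lags
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    Y = fft([complex(v) for v in y] + [0j] * (n - len(y)))
    cross = [Y[k] * X[k].conjugate() for k in range(n)]   # conjugate one, multiply
    if weight is not None:                                 # frequency-domain filter
        cross = [weight(c) * c for c in cross]
    r = [v.real / n for v in fft(cross, inverse=True)]     # inverse transform
    peak = max(range(n), key=lambda k: r[k])
    return peak if peak < n // 2 else peak - n             # upper half = negative lags

def phat(c):
    """Phase-transform weighting: keep the phase, flatten the magnitude."""
    return 1.0 / (abs(c) + 1e-12)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(59)] + [0.0] * 5
y = [0.0] * 5 + x[:59]                  # y is x delayed by 5 samples
print(gcc_delay(x, y), gcc_delay(x, y, weight=phat), gcc_delay(y, x))
```

For this noise-free test signal all three calls recover the 5-sample shift, with the sign flipping when the inputs are swapped. The filter only starts to matter once noise or narrowband content is present.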

I hope this answers your question - please let me know if it doesn't.

To Maz:  Your problem has two distinct parts - 1) estimate the time
delays based on the measured data, and 2) using the estimates,
determine the x,y location of the source.  Suppose you could send your
measured data off to a lab somewhere and tell them: "determine the
exact time delays."  Now suppose they handed an answer back to you:
"We've used some of the latest techniques in physics, math and
engineering to determine that the delays are precisely: (insert some
numbers here, good to a million decimal places)."  Now the
question is: "What do you do with those estimates?"

I was hoping that you'd try the grid estimation method first because
that gives you a good idea of how your time delays physically relate
to your problem space.

And I've just noticed that you've posted again (2:59 AM) - I'll look
at the new post later today.

For one half of your problem, all you need to do is fill in a look-up
table.  You pick a grid point x,y and determine the distance between
that point and your microphones (simple high school geometry). Then
you calculate the delay between that point and the microphones.  As
per your example, for location (30,10), the time delay between that
location and microphone 1 is .00093 sec.  You compute the delays to
all the other microphones.  Then you should multiply the absolute
delays by 44100 (the sample rate) to get the delay expressed in terms
of the 'number of sample times.'  Those are the values you put into
the look-up table (e.g.: for grid location 30,10 - the time delays to
the microphones are: t1, t2, t3, t4).  I have no idea why you're
computing distances after computing the time delays.
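
As a worked instance of one table entry, the sketch below computes the four delays for grid location (30,10) directly in sample times, using the microphone coordinates, the 34000 cm/s sound speed and the 44100 Hz sample rate quoted in the thread. The variable names are illustrative only.

```python
import math

C_CM_PER_S = 34000.0   # speed of sound in cm/s, as used in the thread
FS = 44100.0           # sample rate in Hz, as stated in the thread
MICS = [(0, 0), (0, 60), (120, 60), (120, 0)]   # m1..m4 in cm, from the thread
SOURCE = (30, 10)                                # grid location in cm

# Distance to each microphone, converted to time and then to sample periods.
delays = [math.hypot(SOURCE[0] - x, SOURCE[1] - y) / C_CM_PER_S * FS
          for x, y in MICS]

for k, d in enumerate(delays, start=1):
    print(f"t{k} = {d:.1f} samples")
print(f"t2 - t1 = {delays[1] - delays[0]:.1f} samples")
```

This gives roughly 41.0, 75.6, 133.5 and 117.5 samples, and a mic 1 to mic 2 difference of about 34.6 samples. That difference is the number to compare directly against the lag reported by xcorr( ) (30 samples in the example above), with no conversion back to distance.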

Once you've populated the look-up table, you turn to the other half of
your problem - computing the actual time delays using your
measurements (do so with xcorr( ), max( ), or some other method to be
determined later).  Your measured results will be a delay expressed as
'number of sample times.'  Now, using your measured results, search
through the look-up table to find the best match between the measured
and pre-computed time delays.

As an aside, I also can't help but notice that you frequently start
other threads under a different title for the same problem, and you
ask for MATLAB code examples.  I strongly suspect that you are not
very experienced with programming.  Most people posting here, while
they can be very generous with their time and knowledge, won't do
something like "Here's how to solve your problem analytically, and,
oh, by the way, here's some MATLAB code to do it."  They rightfully
expect that the person who poses the problem has some basic analytical
and programming skills.  So I think you might have to reconsider your
deadline and include some unscheduled 'learn how to program in MATLAB'
time (or C, C++, FORTRAN, etc.).

It takes time to learn DSP techniques, and it can be very
frustrating.  But you should at least have some kind of programming
experience.