
How to calculate the time delay between two signals

Started by padma.kancharla October 26, 2011
On Oct 28, 3:55 am, maury <maury...@core.com> wrote:
> On Oct 26, 9:38 pm, fatalist <simfid...@gmail.com> wrote:
> > On Oct 26, 7:48 pm, brent <buleg...@columbus.rr.com> wrote:
> > > On Oct 26, 4:24 pm, maury <maury...@core.com> wrote:
> > > > On Oct 26, 9:51 am, "padma.kancharla" <pkanchar@n_o_s_p_a_m.mit.edu> wrote:
> > > > > Hi,
> > > > > I am a CS student and new to this field. I am working on a source
> > > > > localisation problem. I do not know MATLAB. Could you please throw
> > > > > some light on how to implement the following steps in the C language?
> > > > >
> > > > > 1. Find the cross-correlation between two signals received at two
> > > > > mics simultaneously. Here the signals are speech signals.
> > > > >
> > > > > 2. Determine the accurate time delay between them.
> > > > >
> > > > > Please give me ideas or sources that would help me deal with this
> > > > > problem and teach me how to implement it from scratch, as I am very
> > > > > new to this field. I really need it ASAP.
> > > > >
> > > > > Thanks in advance,
> > > > > Padma
> > > >
> > > > Look at U.S. patent 6947551. It shows how to use the AMDF as a
> > > > correlator to do time delay for speech signals. Of course, if you use
> > > > the AMDF (even for research), you will need to pay the invention
> > > > assignee. But there are other ways to do correlation. This, at least,
> > > > shows you how. And it's very accurate.
> > > >
> > > > Maurice Givens
> > >
> > > That seems to contradict what the non-"IDIOT" is saying. I guess you
> > > are an anti-non-IDIOT.
> >
> > Severe PMS can cloud a person's mind and turn anybody into a stupident :)
> >
> > Everybody and his uncle knows that the standard text-book answer to
> > the OP's question is *generalized cross-correlation* (GCC), which turns
> > into classical cross-correlation when the weighting function is chosen
> > to be equal to one.
> >
> > Springer Handbook of Speech Processing, Chapter 51, "Time Delay
> > Estimation and Source Localization" (page 1045).
> >
> > This works OK if there is no room reverberation.
> >
> > With room reverberation.......
>
> Doing a correlation on a speech signal will, in general, give you the
> pitch period of the voiced portion, not the delay of the speech. If
> you're going to use correlation, you need the envelope (or something
> similar).
>
> Maurice Givens
no it won't! Cross-correlation between two speech signals gives you the delay. This is well documented in the literature.
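Since the OP asked for C, here is a minimal time-domain sketch of what Hardy describes: find the lag that maximizes the cross-correlation. The function name, the rectangular window, and the lag search range are illustrative assumptions of this sketch, not anything specified in the thread.

    #include <math.h>

    /* Return the lag (in samples) at which the cross-correlation of x[]
       and y[] peaks, searched over -max_lag..+max_lag.  A positive result
       means y is delayed relative to x by that many samples. */
    int estimate_delay(const double *x, const double *y, int n, int max_lag)
    {
        int best_lag = 0;
        double best_val = -HUGE_VAL;

        for (int lag = -max_lag; lag <= max_lag; lag++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                int j = i + lag;
                if (j >= 0 && j < n)
                    sum += x[i] * y[j];   /* r_xy(lag), rectangular window */
            }
            if (sum > best_val) {
                best_val = sum;
                best_lag = lag;
            }
        }
        return best_lag;
    }

The direct form above costs O(n * max_lag) multiplies; for long frames it is usually done instead by taking FFTs, multiplying one spectrum by the conjugate of the other, and inverse-transforming.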
On Oct 27, 2:14 pm, HardySpicer <gyansor...@gmail.com> wrote:
> > Doing a correlation on a speech signal will, in general, give you the
> > pitch period of the voiced portion, not the delay of the speech. If
> > you're going to use correlation, you need the envelope (or something
> > similar).
>
> no it won't! Cross-correlation between two speech signals gives you
> the delay. This is well documented in the literature.
Of course. I still have the AMDF in mind. It's a lot cheaper, computationally, but you must filter the signal first.
On Oct 28, 9:36 am, maury <maury...@core.com> wrote:
> > no it won't! Cross-correlation between two speech signals gives you
> > the delay. This is well documented in the literature.
>
> Of course. I still have the AMDF in mind. It's a lot cheaper,
> computationally, but you must filter the signal first.
well, if you use the generalized cross-correlation you need not filter first, since this is done by the algorithm itself and depends on coherence etc.

Hardy
On Oct 27, 4:36 pm, maury <maury...@core.com> wrote:
> On Oct 27, 2:14 pm, HardySpicer <gyansor...@gmail.com> wrote:
> > On Oct 28, 3:55 am, maury <maury...@core.com> wrote:
> > > On Oct 26, 9:38 pm, fatalist <simfid...@gmail.com> wrote:
...
> > > > Everybody and his uncle knows that the standard text-book answer to
> > > > the OP's question is *generalized cross-correlation* (GCC), which
> > > > turns into classical cross-correlation when the weighting function is
> > > > chosen to be equal to one.
> > > >
> > > > Springer Handbook of Speech Processing, Chapter 51, "Time Delay
> > > > Estimation and Source Localization" (page 1045).
> > > >
> > > > This works OK if there is no room reverberation.
> > > >
> > > > With room reverberation.......
simple reflections and maybe reverb can add correlation "components", i.e. peaks in the magnitude of the cross-correlation that may be taken as candidates for the delay you're trying to estimate. usually, the direct path is also the shortest path, and as such you might expect it to be the loudest peak in the cross-correlation. but i can imagine a situation where it isn't (like source and microphone spaced apart by 2 meters in some room, with a thick piece of sonex foam placed in between - it might be that the reflection offa the wall is stronger than the direct hit through the sound insulation).

i dunno what you would do with a definite peak at the shortest lag that isn't the strongest peak. that might be a little confusing, but i think that as long as you can differentiate that peak from noise, then it would have to be the delay for the shortest path, even if it isn't the strongest path.
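A sketch of that last idea in C, assuming the cross-correlation has already been computed: take the earliest local peak that clears a noise threshold, rather than the single strongest peak. The threshold-as-a-fraction-of-the-maximum rule (and a value like 0.5 for it) is a made-up illustration, not a recommendation from the thread.

    #include <math.h>

    /* r[] holds cross-correlation values indexed by increasing lag.
       Return the index of the earliest local maximum whose magnitude
       exceeds `ratio` times the global maximum magnitude, or -1 if
       none qualifies. */
    int earliest_significant_peak(const double *r, int n, double ratio)
    {
        double global_max = 0.0;
        for (int i = 0; i < n; i++)
            if (fabs(r[i]) > global_max)
                global_max = fabs(r[i]);

        double thresh = ratio * global_max;      /* e.g. ratio = 0.5 */
        for (int i = 1; i < n - 1; i++)
            if (fabs(r[i]) > thresh &&
                fabs(r[i]) >= fabs(r[i - 1]) &&
                fabs(r[i]) >= fabs(r[i + 1]))
                return i;                        /* shortest-path candidate */
        return -1;
    }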
> > > Doing a correlation on a speech signal will, in general, give you the
> > > pitch period of the voiced portion, not the delay of the speech.
i think you mean auto-correlation. autocorrelation is cross-correlating a signal with itself. with a lag of zero, of course you will get very good correlation no matter what the signal is. then, if the signal is periodic (or quasi-periodic), a delayed copy of that signal (delayed by an integer multiple of the period) also correlates well with the original signal.
> > > If you're going to use correlation, you need the envelope (or
> > > something similar).
that i don't completely understand.
> > no it won't! Cross-correlation between two speech signals gives you
> > the delay. This is well documented in the literature.
i think the hardy soul is right. (but the two speech signals really oughta be the same speech signal with different delays and some kinda noise or error. they must have some common source or they won't correlate at any lag.)
> Of course. I still have the AMDF in mind. It's a lot cheaper,
> computationally, but you must filter the signal first.
i still don't get what you're saying, maury. to the best of my understanding, this is how all of these relate:

1. AMDF and ASDF (Average Squared Difference Function) have a lot in common. both are always non-negative and, given a periodic or quasiperiodic input, will go to zero (or close to zero in the quasi case) at the same lags. and, of course, both will have a value of zero at a lag of zero.

2. auto-correlation is essentially the ASDF turned upside down with a bias (the auto-correlation at lag zero, or the energy of the signal) added.

3. so, for a common single input signal (not a pair), we expect the auto-correlation to peak at the same lags where both ASDF and AMDF have minima.

but AM[D]F does not work so well getting the relative delay of two correlated signals. even if the [D]ifference was that of the two signals (not of the same signal at different delays), the AMDF ain't gonna minimize too well if the amplitudes (or attached gains) of the one signal and its delayed version are significantly different. but, for cross-correlation, different gains don't change anything except for the scaling of the whole thang. the relative peaks stay at the same relative values and at exactly the same lags.

r b-j
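The three measures side by side in C, so points 1-3 can be checked numerically on a frame of speech. The per-lag normalization by the overlap length is an assumption of this sketch, and it needs max_lag < n.

    #include <math.h>

    /* For lags 0..max_lag, fill in the AMDF, ASDF, and autocorrelation
       of the frame x[0..n-1], each normalized by the number of
       overlapping samples at that lag.  Per points 1-3 above, amdf[]
       and asdf[] should dip at the lags where acorr[] peaks. */
    void compare_measures(const double *x, int n, int max_lag,
                          double *amdf, double *asdf, double *acorr)
    {
        for (int lag = 0; lag <= max_lag; lag++) {
            double a = 0.0, s = 0.0, r = 0.0;
            int m = n - lag;                /* overlap at this lag */
            for (int i = 0; i < m; i++) {
                double d = x[i] - x[i + lag];
                a += fabs(d);               /* average magnitude difference */
                s += d * d;                 /* average squared difference   */
                r += x[i] * x[i + lag];     /* autocorrelation              */
            }
            amdf[lag]  = a / m;
            asdf[lag]  = s / m;             /* = energy terms - 2*acorr[lag],
                                               i.e. the autocorrelation
                                               flipped plus a bias (point 2) */
            acorr[lag] = r / m;
        }
    }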
On Oct 27, 12:15 am, "steveu" <steveu@n_o_s_p_a_m.coppice.org> wrote:
> Isn't that only going to work well for clean signals from the source? The
> OP said these are signals from two mics, so they are going to have a lot of
> reverb mixed in, and the reverb will be quite different at each mic. As
> Vlad said, the cross correlation might look near to random.
so you'll get noisy peaks in the cross-correlation (and there will also be other sources from other angles that will have other path-length differences - so when you see a peak, that will be a legitimate candidate for the time delay, but of a different source). but (alternatively to what Vlad is saying) i would expect that, for a source that is significantly louder than the competing sources, and if there isn't anything too weird going on (like the example in the post i just previously posted), then the loudest peak in the cross-correlation will have a lag value that is the path-length difference of the two direct paths between the loudest source and the two microphones.

just like our brains do in the Blumlein stereo model, one can calculate (from the lag of the peak, the spacing of the mics, and the speed of propagation) the angle of the source offa the axis of the line passing through the two mics. it's just a little geometry and trig.

r b-j
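That geometry in C, under the usual far-field assumption (path-length difference = spacing * sin(angle)). The 343 m/s speed of sound and the parameter names are illustrative choices of this sketch.

    #include <math.h>

    /* Convert a cross-correlation peak lag (in samples) into the source
       angle, in radians, off broadside of the line through the two mics.
       Far-field assumption: path difference = spacing * sin(angle). */
    double angle_from_lag(int lag, double sample_rate_hz, double mic_spacing_m)
    {
        const double c = 343.0;             /* speed of sound, m/s */
        double path_diff = (double)lag * c / sample_rate_hz;
        double s = path_diff / mic_spacing_m;
        if (s >  1.0) s =  1.0;             /* clamp: noisy peaks can */
        if (s < -1.0) s = -1.0;             /* overshoot +-90 degrees */
        return asin(s);
    }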
On Oct 27, 4:12 am, HardySpicer <gyansor...@gmail.com> wrote:
> You need the Generalized Cross Correlation method (there are many
> variants - e.g. Hannan-Thompson, SCOT, PHAT). This works up to a point, as
> I have tried it in a real environment.
Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
> Ordinary cross correlation is no good.
'smatter with ordinary, vanilla-flavored cross-correlation? (i assume it's windowed, because the summation is finite.)

r b-j
On Oct 28, 12:12 am, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On Oct 27, 4:12 am, HardySpicer <gyansor...@gmail.com> wrote:
> > You need the Generalized Cross Correlation method (there are many
> > variants - e.g. Hannan-Thompson, SCOT, PHAT). This works up to a point,
> > as I have tried it in a real environment.
>
> Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
okay, i found some definition (for GCC-PHAT) at:

http://www.xavieranguera.com/phdthesis/node92.html

we know that the (Discrete-Time) Fourier Transform of the (ordinary) cross-correlation of x[n] and y[n] is

    X(w) * conj{Y(w)}        ("*" means multiply)

where X(w) and Y(w) are the DTFTs of x[n] and y[n]. so you inverse DTFT that expression and you have the regular, vanilla cross-correlation, and you look for the lag with the maximum magnitude value.

this GCC-PHAT thingie is the same, except the expression above is normalized by dividing by its magnitude:

    | X(w) * conj{Y(w)} |

so what does that do for you? all it seems to do is relatively amplify mutually weak frequency components (or relatively attenuate mutually strong frequency components). how does that help?

r b-j
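For concreteness, a self-contained C sketch of that recipe: cross-spectrum, divide by its own magnitude, inverse transform back to the lag domain; the peak of out[] then gives the delay estimate. The naive O(N^2) DFT keeps it dependency-free (a real implementation would use an FFT), and the function name and the 1e-12 floor against division by zero are assumptions of this sketch.

    #include <complex.h>
    #include <math.h>
    #include <stdlib.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* GCC-PHAT via naive DFTs.  out[m] receives the generalized cross-
       correlation at circular lag m (m > n/2 wraps to negative lag m - n).
       Returns 0 on success, -1 on allocation failure. */
    int gcc_phat(const double *x, const double *y, int n, double *out)
    {
        double complex *G = malloc(n * sizeof *G);
        if (!G) return -1;

        /* forward DFTs and PHAT-weighted cross-spectrum, bin by bin */
        for (int k = 0; k < n; k++) {
            double complex X = 0.0, Y = 0.0;
            for (int i = 0; i < n; i++) {
                double complex w = cexp(-2.0 * M_PI * I * (double)k * i / n);
                X += x[i] * w;
                Y += y[i] * w;
            }
            double complex c = X * conj(Y);    /* cross-spectrum          */
            G[k] = c / (cabs(c) + 1e-12);      /* divide by its magnitude */
        }

        /* inverse DFT back to the lag domain */
        for (int m = 0; m < n; m++) {
            double complex acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += G[k] * cexp(2.0 * M_PI * I * (double)k * m / n);
            out[m] = creal(acc) / n;
        }

        free(G);
        return 0;
    }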
On Oct 28, 12:42 am, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On Oct 28, 12:12 am, robert bristow-johnson <r...@audioimagination.com> wrote:
> > Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
>
> this GCC-PHAT thingie is the same, except the expression above is
> normalized by dividing by its magnitude:
>
>    | X(w) * conj{Y(w)} |
>
> so what does that do for you? all it seems to do is relatively
> amplify mutually weak frequency components (or relatively attenuate
> mutually strong frequency components). how does that help?
The 'generalized' cross correlator uses a filter in the frequency domain. So you transform the outputs of 2 (zero-padded) sensors, conjugate one of them, multiply, then apply a frequency-domain filter, after which you inverse transform.

The filter depends very much on what kind of signal and noise you're dealing with. Three references for the above are:

J. C. Hassab, R. E. Boucher, "Optimum Estimation of Time Delay by a Generalized Correlator," IEEE T-ASSP, vol. 27, no. 4, Aug. 1979, pp. 373-380.

J. C. Hassab, R. E. Boucher, "Performance of the Generalized Cross Correlator in the Presence of a Strong Spectral Peak in the Signal," IEEE T-ASSP, vol. 29, no. 3, June 1981, pp. 549-555.

J. C. Hassab, R. E. Boucher, "An Experimental Comparison of Optimum and Sub-Optimum Filters' Effectiveness in the Generalized Correlator," J. Sound and Vibration, 1981, pp. 4+ (12 pages total).

The 3 references were handouts from the first author for a graduate course in time delay estimation. He taught the course in the early 1980's.

As I recall, there were at least 6 different filters we looked at: 1) W(subscript HBII), 2) W(subscript HBI), 3) W(subscript E, for Eckart), 4) W(subscript ML, for maximum likelihood), 5) SCOT (for smoothed coherence transform), and 6) W(subscript LS, for least squares).

Suffice it to say that they're all different, and they work well (or poorly) depending on the signal and noise you're dealing with. So it's not just about normalization.

Kevin McGee
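To connect kevin's description with the gcc_phat() sketch above: the only thing that changes between GCC variants is the weighting applied to the cross-spectrum before the inverse transform. Here are two classic weightings from the GCC literature (they go back at least to Knapp and Carter's 1976 paper), written as C functions; note that Roth is not one of the six on kevin's list, and that SCOT/ML-style weightings need smoothed spectral estimates rather than the raw per-frame spectra used here (with raw per-frame spectra, SCOT collapses into PHAT).

    #include <complex.h>
    #include <math.h>

    /* PHAT: whiten the cross-spectrum completely.  Plugs into the sketch
       above as:  G[k] = X * conj(Y) * w_phat(X, Y);                     */
    double w_phat(double complex X, double complex Y)
    {
        return 1.0 / (cabs(X * conj(Y)) + 1e-12);
    }

    /* Roth: normalize by the power spectrum of the reference channel,
       de-emphasizing bands where that channel is noisy.                 */
    double w_roth(double complex X, double complex Y)
    {
        (void)Y;                /* Roth uses only the x-channel spectrum */
        return 1.0 / (cabs(X) * cabs(X) + 1e-12);
    }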

robert bristow-johnson wrote:

> On Oct 27, 12:15 am, "steveu" <steveu@n_o_s_p_a_m.coppice.org> wrote:
>
>>Isn't that only going to work well for clean signals from the source? The
>>OP said these are signals from two mics, so they are going to have a lot of
>>reverb mixed in, and the reverb will be quite different at each mic. As
>>Vlad said, the cross correlation might look near to random.
>
> so you'll get noisy peaks in the cross-correlation (and there will also
> be other sources from other angles that will have other path-length
> differences - so when you see a peak, that will be a legitimate
> candidate for the time delay, but of a different source). but
> (alternatively to what Vlad is saying) i would expect that, for a
> source that is significantly louder than the competing sources
The reality is that if you hear the same audio source from two different positions, those are going to be two very different signals, and their correlation function is a mess. That happens due to multipath and reverberation, as well as because the same audio source behaves quite differently at different view angles. It is naive to expect any accurate result from a trivial AMDF or correlation approach. DOA is no simple problem; tons of books have been written about it. IIRC, doctor Rune was specializing in that; perhaps he could clarify.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Oct 28, 5:57 am, kevin <kevinjmc...@netscape.net> wrote:
> The 'generalized' cross correlator uses a filter in the frequency
> domain. So you transform the outputs of 2 (zero-padded) sensors,
> conjugate one of them, multiply, then apply a frequency-domain filter,
> after which you inverse transform.
>
> The filter depends very much on what kind of signal and noise you're
> dealing with.
well, the filter for PHAT is pretty clearly

    H(w) = 1 / | X(w) * conj{Y(w)} |

i mean, it's pretty clear that regular correlation is like a data-dependent filtering too, when you convolve one input with the other input time-reversed. ...
> As I recall, there were at least 6 different filters we looked at: 1)
> W(subscript HBII), 2) W(subscript HBI), 3) W(subscript E, for Eckart),
> 4) W(subscript ML, for maximum likelihood), 5) SCOT (for smoothed
> coherence transform), and 6) W(subscript LS, for least squares).
>
> Suffice it to say that they're all different, and they work well (or
> poorly) depending on the signal and noise you're dealing with. So
> it's not just about normalization.
thanks for the references. so it's all a matter of filtering in the frequency domain. i'll investigate more.

at one time (way more than a decade ago) i did delay estimation (actually source-direction estimation) using "ordinary" cross-correlation in the time domain. it seemed to work okay, except when there was an interfering source from a different direction. i was not aware of these "generalized" variants.

r b-j