
How to calculate the time delay between two signals

Started by padma.kancharla October 26, 2011
On Oct 28, 3:55 am, maury <maury...@core.com> wrote:
> On Oct 26, 9:38 pm, fatalist <simfid...@gmail.com> wrote:
> > On Oct 26, 7:48 pm, brent <buleg...@columbus.rr.com> wrote:
> > > On Oct 26, 4:24 pm, maury <maury...@core.com> wrote:
> > > > On Oct 26, 9:51 am, "padma.kancharla" <pkanchar@n_o_s_p_a_m.mit.edu> wrote:
> > > > > Hi,
> > > > > I am a CS student and new to this field. I am working on a source
> > > > > localisation problem. I do not know MATLAB. Could you please throw
> > > > > some light on how to implement the following steps in the C language?
> > > > >
> > > > > 1. Find the cross-correlation between two signals received at two
> > > > > mics simultaneously. Here the signals are speech signals.
> > > > >
> > > > > 2. Determine the accurate time delay between them.
> > > > >
> > > > > Please give me ideas or sources that would help me deal with this
> > > > > problem and teach me how to implement it from scratch, as I am very
> > > > > new to this field. I really need it ASAP.
> > > > >
> > > > > Thanks in advance,
> > > > > Padma
> > > >
> > > > Look at U.S. patent 6947551. It shows how to use the AMDF as a
> > > > correlator to do time delay for speech signals. Of course, if you use
> > > > the AMDF (even for research), you will need to pay the invention
> > > > assignee. But there are other ways to do correlation. This, at least,
> > > > shows you how. And it's very accurate.
> > > >
> > > > Maurice Givens
> > >
> > > That seems to contradict what the non-"IDIOT" is saying. I guess you
> > > are an anti-non-IDIOT.
> >
> > Severe PMS can cloud a person's mind and turn anybody into a stupident :)
> >
> > Everybody and his uncle knows that the standard text-book answer to
> > the OP's question is *generalized cross-correlation* (GCC), which turns
> > into classical cross-correlation when the weighting function is chosen
> > to be equal to one.
> >
> > Springer Handbook of Speech Processing, Chapter 51, "Time Delay
> > Estimation and Source Localization" (page 1045).
> >
> > This works OK if there is no room reverberation.
> >
> > With room reverberation.......
>
> Doing a correlation on a speech signal will, in general, give you the
> pitch period of the voiced portion, not the delay of the speech. If
> you're going to use correlation, you need the envelope (or something
> similar).
>
> Maurice Givens
no it won't! Cross-correlation between two speech signals gives you the delay. This is well documented in the literature.
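Since the OP asked for C, here is a minimal time-domain sketch of what Hardy describes: find the lag that maximizes the cross-correlation. The function name, the rectangular window, and the lag search range are illustrative assumptions of this sketch, not anything specified in the thread.

    #include <math.h>

    /* Return the lag (in samples) at which the cross-correlation of x[]
       and y[] peaks, searched over -max_lag..+max_lag.  A positive result
       means y is delayed relative to x by that many samples. */
    int estimate_delay(const double *x, const double *y, int n, int max_lag)
    {
        int best_lag = 0;
        double best_val = -HUGE_VAL;

        for (int lag = -max_lag; lag <= max_lag; lag++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                int j = i + lag;
                if (j >= 0 && j < n)
                    sum += x[i] * y[j];   /* r_xy(lag), rectangular window */
            }
            if (sum > best_val) {
                best_val = sum;
                best_lag = lag;
            }
        }
        return best_lag;
    }

The direct form above costs O(n * max_lag) multiplies; for long frames it is usually done instead by taking FFTs, multiplying one spectrum by the conjugate of the other, and inverse-transforming.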
On Oct 27, 2:14 pm, HardySpicer <gyansor...@gmail.com> wrote:
> > Doing a correlation on a speech signal will, in general, give you the
> > pitch period of the voiced portion, not the delay of the speech. If
> > you're going to use correlation, you need the envelope (or something
> > similar).
>
> no it won't! Cross-correlation between two speech signals gives you
> the delay. This is well documented in the literature.
Of course. I still have the AMDF in mind. It's a lot cheaper, computationally, but you must filter the signal first.
On Oct 28, 9:36 am, maury <maury...@core.com> wrote:
> > no it won't! Cross-correlation between two speech signals gives you
> > the delay. This is well documented in the literature.
>
> Of course. I still have the AMDF in mind. It's a lot cheaper,
> computationally, but you must filter the signal first.
well, if you use the generalized cross-correlation you need not filter first, since this is done by the algorithm itself and depends on coherence etc.

Hardy
On Oct 27, 4:36 pm, maury <maury...@core.com> wrote:
> On Oct 27, 2:14 pm, HardySpicer <gyansor...@gmail.com> wrote:
> > On Oct 28, 3:55 am, maury <maury...@core.com> wrote:
> > > On Oct 26, 9:38 pm, fatalist <simfid...@gmail.com> wrote:
...
> > > > Everybody and his uncle knows that the standard text-book answer to
> > > > the OP's question is *generalized cross-correlation* (GCC), which
> > > > turns into classical cross-correlation when the weighting function is
> > > > chosen to be equal to one.
> > > >
> > > > Springer Handbook of Speech Processing, Chapter 51, "Time Delay
> > > > Estimation and Source Localization" (page 1045).
> > > >
> > > > This works OK if there is no room reverberation.
> > > >
> > > > With room reverberation.......
simple reflections and maybe reverb can add correlation "components", i.e. peaks in the magnitude of the cross-correlation that may be taken as candidates for the delay you're trying to estimate. usually, the direct path is also the shortest path, and as such you might expect it to be the loudest peak in the cross-correlation. but i can imagine a situation where it isn't (like source and microphone spaced apart by 2 meters in some room, with a thick piece of sonex foam placed in between - it might be that the reflection offa the wall is stronger than the direct hit through the sound insulation).

i dunno what you would do with a definite peak at the shortest lag that isn't the strongest peak. that might be a little confusing, but i think that as long as you can differentiate that peak from noise, then it would have to be the delay for the shortest path, even if it isn't the strongest path.
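A sketch of that last idea in C, assuming the cross-correlation has already been computed: take the earliest local peak that clears a noise threshold, rather than the single strongest peak. The threshold-as-a-fraction-of-the-maximum rule (and a value like 0.5 for it) is a made-up illustration, not a recommendation from the thread.

    #include <math.h>

    /* r[] holds cross-correlation values indexed by increasing lag.
       Return the index of the earliest local maximum whose magnitude
       exceeds `ratio` times the global maximum magnitude, or -1 if
       none qualifies. */
    int earliest_significant_peak(const double *r, int n, double ratio)
    {
        double global_max = 0.0;
        for (int i = 0; i < n; i++)
            if (fabs(r[i]) > global_max)
                global_max = fabs(r[i]);

        double thresh = ratio * global_max;      /* e.g. ratio = 0.5 */
        for (int i = 1; i < n - 1; i++)
            if (fabs(r[i]) > thresh &&
                fabs(r[i]) >= fabs(r[i - 1]) &&
                fabs(r[i]) >= fabs(r[i + 1]))
                return i;                        /* shortest-path candidate */
        return -1;
    }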
> > > Doing a correlation on a speech signal will, in general, give you the
> > > pitch period of the voiced portion, not the delay of the speech.
i think you mean auto-correlation. autocorrelation is cross-correlating a signal with itself. with a lag of zero, of course you will get very good correlation no matter what the signal is. then, if the signal is periodic (or quasi-periodic), a delayed copy of that signal (delayed by an integer multiple of the period) also correlates well with the original signal.
> > > If you're going to use correlation, you need the envelope (or
> > > something similar).
that i don't completely understand.
> > no it won't! Cross-correlation between two speech signals gives you
> > the delay. This is well documented in the literature.
i think the hardy soul is right. (but the two speech signals really oughta be the same speech signal with different delays and some kinda noise or error. they must have some common source or they won't correlate at any lag.)
> Of course. I still have the AMDF in mind. It's a lot cheaper,
> computationally, but you must filter the signal first.
i still don't get what you're saying, maury. to the best of my understanding, this is how all of these relate:

1. AMDF and ASDF (Average Squared Difference Function) have a lot in common. both are always non-negative and, given a periodic or quasiperiodic input, will go to zero (or close to zero in the quasi case) at the same lags. and, of course, both will have a value of zero at a lag of zero.

2. auto-correlation is essentially the ASDF turned upside down with a bias (the auto-correlation at lag zero, or the energy of the signal) added.

3. so, for a common single input signal (not a pair), we expect the auto-correlation to peak at the same lags where both ASDF and AMDF have minima.

but AM[D]F does not work so well getting the relative delay of two correlated signals. even if the [D]ifference was that of the two signals (not of the same signal at different delays), the AMDF ain't gonna minimize too well if the amplitudes (or attached gains) of the one signal and its delayed version are significantly different. but, for cross-correlation, different gains don't change anything except for the scaling of the whole thang. the relative peaks stay at the same relative values and at exactly the same lags.

r b-j
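The three measures side by side in C, so points 1-3 can be checked numerically on a frame of speech. The per-lag normalization by the overlap length is an assumption of this sketch, and it needs max_lag < n.

    #include <math.h>

    /* For lags 0..max_lag, fill in the AMDF, ASDF, and autocorrelation
       of the frame x[0..n-1], each normalized by the number of
       overlapping samples at that lag.  Per points 1-3 above, amdf[]
       and asdf[] should dip at the lags where acorr[] peaks. */
    void compare_measures(const double *x, int n, int max_lag,
                          double *amdf, double *asdf, double *acorr)
    {
        for (int lag = 0; lag <= max_lag; lag++) {
            double a = 0.0, s = 0.0, r = 0.0;
            int m = n - lag;                /* overlap at this lag */
            for (int i = 0; i < m; i++) {
                double d = x[i] - x[i + lag];
                a += fabs(d);               /* average magnitude difference */
                s += d * d;                 /* average squared difference   */
                r += x[i] * x[i + lag];     /* autocorrelation              */
            }
            amdf[lag]  = a / m;
            asdf[lag]  = s / m;             /* = energy terms - 2*acorr[lag],
                                               i.e. the autocorrelation
                                               flipped plus a bias (point 2) */
            acorr[lag] = r / m;
        }
    }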
On Oct 27, 12:15 am, "steveu" <steveu@n_o_s_p_a_m.coppice.org> wrote:
> Isn't that only going to work well for clean signals from the source? The
> OP said these are signals from two mics, so they are going to have a lot of
> reverb mixed in, and the reverb will be quite different at each mic. As
> Vlad said, the cross correlation might look near to random.
so you'll get noisy peaks in the cross-correlation (and there will also be other sources from other angles that will have other path-length differences - so when you see a peak, that will be a legitimate candidate for the time delay, but of a different source). but (alternatively to what Vlad is saying) i would expect that, for a source that is significantly louder than the competing sources, and if there isn't anything too weird going on (like the example in the post i just previously posted), then the loudest peak in the cross-correlation will have a lag value that is the path-length difference of the two direct paths between the loudest source and the two microphones.

just like our brains do in the Blumlein stereo model, one can calculate (from the lag of the peak, the spacing of the mics, and the speed of propagation) the angle of the source offa the axis of the line passing through the two mics. it's just a little geometry and trig.

r b-j
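That geometry in C, under the usual far-field assumption (path-length difference = spacing * sin(angle)). The 343 m/s speed of sound and the parameter names are illustrative choices of this sketch.

    #include <math.h>

    /* Convert a cross-correlation peak lag (in samples) into the source
       angle, in radians, off broadside of the line through the two mics.
       Far-field assumption: path difference = spacing * sin(angle). */
    double angle_from_lag(int lag, double sample_rate_hz, double mic_spacing_m)
    {
        const double c = 343.0;             /* speed of sound, m/s */
        double path_diff = (double)lag * c / sample_rate_hz;
        double s = path_diff / mic_spacing_m;
        if (s >  1.0) s =  1.0;             /* clamp: noisy peaks can */
        if (s < -1.0) s = -1.0;             /* overshoot +-90 degrees */
        return asin(s);
    }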
On Oct 27, 4:12 am, HardySpicer <gyansor...@gmail.com> wrote:
> You need the Generalized Cross Correlation method (there are many
> variants - e.g. Hannan-Thompson, SCOT, PHAT). This works up to a point, as
> I have tried it in a real environment.
Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
> Ordinary cross correlation is no good.
'smatter with ordinary, vanilla-flavored cross-correlation? (i assume it's windowed, because the summation is finite.)

r b-j
On Oct 28, 12:12 am, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On Oct 27, 4:12 am, HardySpicer <gyansor...@gmail.com> wrote:
> > You need the Generalized Cross Correlation method (there are many
> > variants - e.g. Hannan-Thompson, SCOT, PHAT). This works up to a point,
> > as I have tried it in a real environment.
>
> Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
okay, i found some definition (for GCC-PHAT) at:

http://www.xavieranguera.com/phdthesis/node92.html

we know that the (Discrete-Time) Fourier Transform of the (ordinary) cross-correlation of x[n] and y[n] is

    X(w) * conj{Y(w)}        ("*" means multiply)

where X(w) and Y(w) are the DTFTs of x[n] and y[n]. so you inverse DTFT that expression and you have the regular, vanilla cross-correlation, and you look for the lag with the maximum magnitude value.

this GCC-PHAT thingie is the same, except the expression above is normalized by dividing by its magnitude:

    | X(w) * conj{Y(w)} |

so what does that do for you? all it seems to do is relatively amplify mutually weak frequency components (or relatively attenuate mutually strong frequency components). how does that help?

r b-j
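For concreteness, a self-contained C sketch of that recipe: cross-spectrum, divide by its own magnitude, inverse transform back to the lag domain; the peak of out[] then gives the delay estimate. The naive O(N^2) DFT keeps it dependency-free (a real implementation would use an FFT), and the function name and the 1e-12 floor against division by zero are assumptions of this sketch.

    #include <complex.h>
    #include <math.h>
    #include <stdlib.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* GCC-PHAT via naive DFTs.  out[m] receives the generalized cross-
       correlation at circular lag m (m > n/2 wraps to negative lag m - n).
       Returns 0 on success, -1 on allocation failure. */
    int gcc_phat(const double *x, const double *y, int n, double *out)
    {
        double complex *G = malloc(n * sizeof *G);
        if (!G) return -1;

        /* forward DFTs and PHAT-weighted cross-spectrum, bin by bin */
        for (int k = 0; k < n; k++) {
            double complex X = 0.0, Y = 0.0;
            for (int i = 0; i < n; i++) {
                double complex w = cexp(-2.0 * M_PI * I * (double)k * i / n);
                X += x[i] * w;
                Y += y[i] * w;
            }
            double complex c = X * conj(Y);    /* cross-spectrum          */
            G[k] = c / (cabs(c) + 1e-12);      /* divide by its magnitude */
        }

        /* inverse DFT back to the lag domain */
        for (int m = 0; m < n; m++) {
            double complex acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += G[k] * cexp(2.0 * M_PI * I * (double)k * m / n);
            out[m] = creal(acc) / n;
        }

        free(G);
        return 0;
    }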
On Oct 28, 12:42 am, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On Oct 28, 12:12 am, robert bristow-johnson <r...@audioimagination.com> wrote:
> > Hardy, i dunno what exactly is meant by *generalized* cross-correlation.
>
> this GCC-PHAT thingie is the same, except the expression above is
> normalized by dividing by its magnitude:
>
>    | X(w) * conj{Y(w)} |
>
> so what does that do for you? all it seems to do is relatively
> amplify mutually weak frequency components (or relatively attenuate
> mutually strong frequency components). how does that help?
The 'generalized' cross correlator uses a filter in the frequency domain. So you transform the outputs of 2 (zero-padded) sensors, conjugate one of them, multiply, then apply a frequency-domain filter, after which you inverse transform.

The filter depends very much on what kind of signal and noise you're dealing with. Three references for the above are:

J. C. Hassab, R. E. Boucher, "Optimum Estimation of Time Delay by a Generalized Correlator," IEEE T-ASSP, vol. 27, no. 4, Aug. 1979, pp. 373-380.

J. C. Hassab, R. E. Boucher, "Performance of the Generalized Cross Correlator in the Presence of a Strong Spectral Peak in the Signal," IEEE T-ASSP, vol. 29, no. 3, June 1981, pp. 549-555.

J. C. Hassab, R. E. Boucher, "An Experimental Comparison of Optimum and Sub-Optimum Filters' Effectiveness in the Generalized Correlator," J. Sound and Vibration, 1981, pp. 4+ (12 pages total).

The 3 references were handouts from the first author for a graduate course in time delay estimation. He taught the course in the early 1980's.

As I recall, there were at least 6 different filters we looked at: 1) W(subscript HBII), 2) W(subscript HBI), 3) W(subscript E, for Eckart), 4) W(subscript ML, for maximum likelihood), 5) SCOT (for smoothed coherence transform), and 6) W(subscript LS, for least squares).

Suffice it to say that they're all different, and they work well (or poorly) depending on the signal and noise you're dealing with. So it's not just about normalization.

Kevin McGee
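To connect kevin's description with the gcc_phat() sketch above: the only thing that changes between GCC variants is the weighting applied to the cross-spectrum before the inverse transform. Here are two classic weightings from the GCC literature (they go back at least to Knapp and Carter's 1976 paper), written as C functions; note that Roth is not one of the six on kevin's list, and that SCOT/ML-style weightings need smoothed spectral estimates rather than the raw per-frame spectra used here (with raw per-frame spectra, SCOT collapses into PHAT).

    #include <complex.h>
    #include <math.h>

    /* PHAT: whiten the cross-spectrum completely.  Plugs into the sketch
       above as:  G[k] = X * conj(Y) * w_phat(X, Y);                     */
    double w_phat(double complex X, double complex Y)
    {
        return 1.0 / (cabs(X * conj(Y)) + 1e-12);
    }

    /* Roth: normalize by the power spectrum of the reference channel,
       de-emphasizing bands where that channel is noisy.                 */
    double w_roth(double complex X, double complex Y)
    {
        (void)Y;                /* Roth uses only the x-channel spectrum */
        return 1.0 / (cabs(X) * cabs(X) + 1e-12);
    }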

robert bristow-johnson wrote:

> On Oct 27, 12:15 am, "steveu" <steveu@n_o_s_p_a_m.coppice.org> wrote:
>
>>Isn't that only going to work well for clean signals from the source? The
>>OP said these are signals from two mics, so they are going to have a lot of
>>reverb mixed in, and the reverb will be quite different at each mic. As
>>Vlad said, the cross correlation might look near to random.
>
> so you'll get noisy peaks in the cross-correlation (and there will also
> be other sources from other angles that will have other path-length
> differences - so when you see a peak, that will be a legitimate
> candidate for the time delay, but of a different source). but
> (alternatively to what Vlad is saying) i would expect that, for a
> source that is significantly louder than the competing sources
The reality is that if you hear the same audio source from two different positions, those are going to be two very different signals, and their correlation function is a mess. That happens due to multipath and reverberation, as well as because the same audio source behaves quite differently at different view angles. It is naive to expect any accurate result from a trivial AMDF or correlation approach. DOA is no simple problem; tons of books have been written about it. IIRC, doctor Rune was specializing in that; perhaps he could clarify.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Oct 28, 5:57 am, kevin <kevinjmc...@netscape.net> wrote:
> The 'generalized' cross correlator uses a filter in the frequency
> domain. So you transform the outputs of 2 (zero-padded) sensors,
> conjugate one of them, multiply, then apply a frequency-domain filter,
> after which you inverse transform.
>
> The filter depends very much on what kind of signal and noise you're
> dealing with.
well, the filter for PHAT is pretty clearly

    H(w) = 1 / | X(w) * conj{Y(w)} |

i mean, it's pretty clear that regular correlation is like a data-dependent filtering too, when you convolve one input with the other input time-reversed. ...
> As I recall, there were at least 6 different filters we looked at: 1)
> W(subscript HBII), 2) W(subscript HBI), 3) W(subscript E, for Eckart),
> 4) W(subscript ML, for maximum likelihood), 5) SCOT (for smoothed
> coherence transform), and 6) W(subscript LS, for least squares).
>
> Suffice it to say that they're all different, and they work well (or
> poorly) depending on the signal and noise you're dealing with. So
> it's not just about normalization.
thanks for the references. so it's all a matter of filtering in the frequency domain. i'll investigate more.

at one time (way more than a decade ago) i did delay estimation (actually source-direction estimation) using "ordinary" cross-correlation in the time domain. it seemed to work okay, except when there was an interfering source from a different direction. i was not aware of these "generalized" variants.

r b-j