DSPRelated.com
Forums

Estimating time offset between two audio signals.

Started by Mauritz Jameson April 18, 2013
I'm looking for some recommendations on real-time algorithms which are
able to estimate the time offset (delay) between two signals. One
signal is the source signal (speaker signal). The other signal is a
filtered version of the speaker signal (echo, microphone signal).
Delay might be as large as 500ms.

Running NLMS on downsampled signals to estimate the time offset works
great if the delay is "stationary", but in this case the time offset
might change suddenly over time. For example, it might be 250ms for
some time and then suddenly (within 10-20 milliseconds) "jump" to
300ms and shortly after jump back to 250ms. This is, of course, a
symptom of another problem (with the audio stream), but right now I
have to come up with a way of dealing with this problem.
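
For reference, the idea is roughly this (a simplified sketch, not my
actual code; the function name, step size and filter length are just
illustrative):

import numpy as np

def nlms_delay_estimate(x, d, num_taps, mu=0.5, eps=1e-6):
    # x: reference (speaker) signal, d: microphone signal, same length.
    # num_taps must cover the largest expected delay (in samples).
    w = np.zeros(num_taps)              # adaptive filter coefficients
    x_buf = np.zeros(num_taps)          # most recent reference samples
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]                 # x_buf[k] == x[n - k]
        y = np.dot(w, x_buf)            # filter output (echo estimate)
        e = d[n] - y                    # error signal
        norm = np.dot(x_buf, x_buf) + eps
        w += (mu / norm) * e * x_buf    # NLMS coefficient update
    return int(np.argmax(np.abs(w)))    # dominant tap index ~ delay in samples

With signals downsampled to 8kHz, num_taps = 4000 covers a 500ms delay,
and the returned index divided by the sample rate gives the delay in
seconds. The "jump" problem above shows up as the old dominant tap
decaying only slowly while a new tap builds up at the new delay.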

Another issue is the convergence time. If the delay is 300ms and the
delay changes, it takes 300ms before the NLMS algorithm adapts to the
new delay. But if the delay changes again during those 300ms, the
algorithm has a hard time tracking it; it doesn't "see" the delay
change. How do you deal with that?

I'm not sure if there are any fast and robust real-time algorithms for
this type of problem, but if there are, I'm sure folks here on comp.dsp
will enlighten me.


On Apr 19, 2:46 pm, Mauritz Jameson <mjames2...@gmail.com> wrote:
> I'm looking for some recommendations on real-time algorithms which are
> able to estimate the time offset (delay) between two signals. [...]
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1164314&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1164314
@HardySpicer

Will that work for situations where the delay "jumps"?

Are you measuring the delay of the signal as it traverses acoustic space, i.e., the time it takes for the signal to get from the speaker to the microphone? Or is it some kind of loop latency measurement from the microphone to the speaker and back to the microphone? I recall you posting about this before; it's great that you were able to get the LMS implementation working, if only in some capacity. I still can't get my head around what exactly you are trying to measure.
@dszabo :

It's not that complicated :)

I have an audio stream coming in from the network (RTP). I'm pushing
the digital audio to an audio driver so that the audio is played out
through some loudspeakers. I'm also pulling audio from my microphone
via the audio driver interface. At some point, an "echo" of the audio
which was played out through the loudspeakers will show up in the
digital microphone data.

Let's say that the speaker audio at time t = 0 contains the utterance
'A' and at time t = 256ms, I see the echo of that utterance in the
microphone signal. Then the delay is 256ms.

So I'm looking for a real-time algorithm which:

- can estimate that delay
- can adapt quickly to sudden changes (±50ms) in the delay
- is robust in the sense that it also works if the audio signals are a
bit noisy (light office noise)
- is suitable for audio block processing (10ms blocks)
Right on. So you are measuring the time it takes to output the data, play it through the loudspeaker, let the sound travel from the loudspeaker to the microphone, and read the data back in. I think I get it now. Is the microphone signal being fed back to the loudspeaker, which would create multiple echoes?

I believe I had suggested this previously, but you might look at some papers on this site: http://miracle.otago.ac.nz/tartini/papers.html Tartini is used for pitch detection and is based on a combination of autocorrelation and difference algorithms. It was designed to run in real time to provide feedback to musicians. I bet you could easily adapt it, or some of the concepts described in these papers, to what you are trying to do. While some of it should be fairly obvious, it does talk a bit about optimising the algorithms for performance, which would at least be worth a read.

Something to think about: implement some kind of peak detection. Then set up a detection algorithm to find transient moments in the audio. When a transient is detected, perform a correlation of the input data with the output over a window containing the transient, say 100ms, and over a period you believe to contain the echo, say 250ms to 750ms. Rather than a brute-force correlation, you could use an FFT-based algorithm, or the SNAC algorithm described in the papers mentioned earlier. To further optimize things, you can narrow the period once you find a lock, so that successive measurements take less time. This will make a loss of detection respond faster, and the period can then be opened up again to relock.
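
Something like the following sketch (untested; the energy-ratio
detector, window length and search range below are just the
illustrative numbers from above, and it uses a plain FFT correlation
rather than SNAC):

import numpy as np

def detect_transient(x, fs, block_ms=10, ratio=4.0):
    # Return the first sample index where a block's energy jumps by
    # 'ratio' over the previous block, or None if no transient is found.
    blk = int(fs * block_ms / 1000)
    prev = None
    for start in range(0, len(x) - blk, blk):
        e = np.sum(x[start:start + blk] ** 2)
        if prev is not None and prev > 0 and e / prev >= ratio:
            return start
        prev = e
    return None

def delay_by_xcorr(spk, mic, fs, t0, win_ms=100, min_lag_ms=250, max_lag_ms=750):
    # Correlate a window of the speaker signal starting at sample t0
    # against the microphone signal over lags in [min_lag, max_lag];
    # return the best lag in milliseconds.  (Assumes the mic buffer
    # extends at least max_lag + win samples past t0.)
    win = int(fs * win_ms / 1000)
    min_lag = int(fs * min_lag_ms / 1000)
    max_lag = int(fs * max_lag_ms / 1000)
    ref = spk[t0:t0 + win]
    seg = mic[t0 + min_lag:t0 + max_lag + win]
    n = len(seg) + len(ref) - 1
    # FFT-based cross-correlation, same result as np.correlate(seg, ref, 'valid')
    corr = np.fft.irfft(np.fft.rfft(seg, n) * np.conj(np.fft.rfft(ref, n)), n)
    corr = corr[:len(seg) - len(ref) + 1]
    best = int(np.argmax(corr))          # offset within the search range
    return 1000.0 * (min_lag + best) / fs

Usage would be along the lines of: t0 = detect_transient(spk, fs); if t0
is not None, delay_ms = delay_by_xcorr(spk, mic, fs, t0). Once you have
a stable estimate you can narrow min_lag_ms/max_lag_ms around it so each
re-estimate is cheaper, and open them up again if the lock is lost.
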
I should probably point out that your capacity to calculate a delay is
dependent on the presence of transient sound.  For example, if you have a
sine wave going through, the best you can do is measure the phase
difference between the input and output, but there would be an ambiguity in
the number of whole cycles that have passed.  This example can be
extrapolated to any periodic signal.

Suppose your delay is 200ms, and you have a signal that repeats every
150ms.  You would start the signal every n*150ms, and receive it every 200
+ m*150ms.  At 300 ms, you will have just sent out a signal, and at 350ms
you will receive it, which would imply a 50ms delay.

What all this means is that trying to calculate a delay of >100ms during a
tonal aspect of a sound is a fool's errand, because the sound is likely (for
some sounds) to have a period of less than 100 ms.  Your best bet is to
wait for a transient that you can look for.
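
To make the ambiguity concrete, here is a tiny numerical sketch using
the numbers from the example above (illustrative only):

import numpy as np

fs = 8000                                      # assumed sample rate, Hz
period = int(0.150 * fs)                       # 150ms period, as above
true_delay = int(0.200 * fs)                   # 200ms true delay

t = np.arange(fs) / fs                         # 1 second of reference signal
ref = np.sin(2 * np.pi * t / 0.150)            # periodic reference, 150ms period
mic = np.concatenate([np.zeros(true_delay), ref])   # echo: ref delayed by 200ms

a, win = 4000, period                          # mic segment well after the onset
seg = mic[a:a + win]
def score(d):                                  # how well ref shifted by d explains seg
    return float(np.dot(seg, ref[a - d:a - d + win]))

print(score(true_delay))                       # strong match at the true 200ms
print(score(true_delay - period))              # equally strong match at 50ms

Both printed scores come out essentially identical, so nothing in the
data lets a correlator prefer 200ms over 50ms until something
non-periodic, like a transient, comes along.
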
On 4/19/2013 11:13 AM, Mauritz Jameson wrote:

> I have an audio stream coming in from the network (RTP). [...] At some
> point, an "echo" of the audio which was played out through the
> loudspeakers will show up in the digital microphone data.
Now you can see why it is difficult to do EC at the far end. No wonder all systems do EC at the near end or over synchronous transport. Why can't you do it that way?
> So I'm looking for a real-time algorithm which: [...]
It depends. Before jumping into the hell of difficulties, fix your system first.

Vladimir Vassilevsky
DSP and Mixed Signal Designs
www.abvolt.com
On 4/19/2013 3:32 PM, dszabo wrote:
> I should probably point out that your capacity to calculate a delay is
> dependent on the presence of transient sound. [...]
Q: Why is it impossible to have sex in Red Square in Moscow?
A: Because every bystander idiot would be trying to give his invaluable advice.

Vladimir Vassilevsky
DSP and Mixed Signal Designs
www.abvolt.com
I love this guy! Can we hang out some time? Grab a drink and talk about the finer points of Kalman filters?