
Estimating time offset between two audio signals.

Started by Mauritz Jameson April 18, 2013
On 4/18/2013 9:46 PM, Mauritz Jameson wrote:
> I'm looking for some recommendations on real-time algorithms which are
> able to estimate the time offset (delay) between two signals. One
> signal is the source signal (speaker signal). The other signal is a
> filtered version of the speaker signal (echo, microphone signal).
> Delay might be as large as 500ms.
>
> Running NLMS on downsampled signals to estimate the time offset works
> great if the delay is "stationary", but in this case the time offset
> might change suddenly over time. For example, it might be 250ms for
> some time and then suddenly (within 10-20 milliseconds) "jump" to
> 300ms and shortly after jump back to 250ms. This is, of course, a
> symptom of another problem (with the audio stream), but right now I
> have to come up with a way of dealing with this problem.
Your system is broken. Make a buffer deep enough to absorb delay quirks; apply an intelligent algorithm for synchronization.
> Another issue is the convergence time. If the delay is 300ms and the
> delay changes, it takes 300ms before the NLMS algorithm adapts to that
> new delay... BUT... if the delay during those 300ms changes to
> something else, the algorithm has a hard time tracking that. It
> doesn't "see" the delay change. How do you deal with that?
>
> I'm not sure if there are any fast and robust real-time algorithms for
> this type of problem, but if there are, I'm sure folks here on comp.dsp
> will enlighten me.
This could be done in several ways, but there is no point in getting into great complications before the root cause is fixed.

Vladimir Vassilevsky
DSP and Mixed Signal Designs
www.abvolt.com
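[A minimal sketch of one standard alternative to the NLMS tracker described above: blockwise GCC-PHAT cross-correlation, which re-estimates the delay independently on every block and therefore follows a sudden 250ms -> 300ms jump within a single block. This is illustrative Python (numpy assumed), not anything posted in the thread:

import numpy as np

def gcc_phat_delay(spk_block, mic_block, fs, max_delay_s=0.5):
    """Delay of mic relative to speaker, in seconds (0..max_delay_s)."""
    n = 2 * len(spk_block)                  # zero-pad to avoid circular wrap
    S = np.fft.rfft(spk_block, n)
    M = np.fft.rfft(mic_block, n)
    cross = M * np.conj(S)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_lag = int(max_delay_s * fs)         # search non-negative lags only
    return np.argmax(cc[:max_lag]) / fs

Each call on a fresh pair of aligned blocks (e.g. 1 second long, so the block exceeds the 500ms maximum delay) yields an independent estimate, so a jump shows up in the very next block instead of after a slow re-convergence.]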
> This could be done in several ways, but there is no point in getting
> into great complications before the root cause is fixed.
Vlad, I totally agree with you. However, for the time being I can't do anything about it, and I have to mask the problem with some sort of workaround/fix.
Vlad,

Can you elaborate on your comment:

"Now you can see why it is difficult to do EC at far end. No wonder
all
systems work EC at near end or over synchronous transport"

Your far-end is my near-end and vice versa. So I'm not sure I understand
what you mean by "difficult to do EC at far end"?

And what do you mean by "synchronous transport" ?

The AEC processes the microphone signal, which during far-end talk is
composed only of background noise and a capture of the acoustic echo
from the loudspeaker. From the near-end speaker's point of view, the AEC
is done on the near-end. From the far-end speaker's point of view, that
AEC is done at the far-end.

Which intelligent algorithm would you suggest for synchronization?




On Apr 19, 6:12 pm, "dszabo" <62466@dsprelated> wrote:
>On 4/19/2013 3:32 PM, dszabo wrote:
>> I should probably point out that your capacity to calculate a delay is
>> dependent on the presence of transient sound. For example, if you have
>> a sine wave going through, the best you can do is measure the phase
>> difference between the input and output, but there would be an
>> ambiguity in the number of whole cycles that have passed. This example
>> can be extrapolated to any periodic signal.
>>
>> Suppose your delay is 200ms, and you have a signal that repeats every
>> 150ms. You would start the signal every n*150ms, and receive it every
>> 200 + m*150ms. At 300ms, you will have just sent out a signal, and at
>> 350ms you will receive it, which would imply a 50ms delay.
>>
>> What all this means is that trying to calculate a delay of >100ms
>> during a tonal aspect of a sound is a fool's errand, because the sound
>> is likely (for some sounds) to have a period of less than 100ms. Your
>> best bet is to wait for a transient that you can look for.
>
>Q: Why is it impossible to have sex in Red Square in Moscow?
>A: Because every bystander idiot would be trying to give his invaluable
>advice.
>
>Vladimir Vassilevsky
>DSP and Mixed Signal Designs
>www.abvolt.com

I love this guy! Can we hang out some time? Grab a drink and talk about
the finer points of Kalman filters?
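[dszabo's 200ms/150ms example can be reproduced numerically. An illustrative Python sketch (numpy and scipy assumed; the numbers are the hypothetical ones from his post, not measurements): cross-correlating a 150ms-periodic tone against a copy of itself delayed by 200ms produces equally plausible peaks at 50ms, 200ms and 350ms.

import numpy as np
from scipy.signal import find_peaks

fs = 8000                                     # assumed sample rate, Hz
period = int(0.150 * fs)                      # signal repeats every 150ms
delay = int(0.200 * fs)                       # true delay: 200ms
n = 2 * fs                                    # 2 s of signal

x = np.sin(2 * np.pi * np.arange(n) / period)          # periodic source
y = np.concatenate([np.zeros(delay), x[:n - delay]])   # delayed copy

corr = np.array([np.dot(x[:n - lag], y[lag:])   # correlation vs. lag,
                 for lag in range(int(0.5 * fs))])  # searched up to 500ms
peaks, _ = find_peaks(corr, height=0.5 * corr.max())
print(1000.0 * peaks / fs)                    # -> roughly [ 50. 200. 350.]

All three lags fit the data equally well, so during a purely tonal passage an estimator has no way to pick out the true 200ms, which is exactly why waiting for a transient helps.]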
might have to wait until the next comp.dsp conference. i missed the first two, but will endeavour to make it to the next one, whenever it is.

r b-j
Mauritz Jameson wrote:
> Vlad,
>
> Can you elaborate on your comment:
>
> "Now you can see why it is difficult to do EC at far end. No wonder all
> systems work EC at near end or over synchronous transport"
>
> Your far-end is my near-end and vice versa. So I'm not sure I understand
> what you mean by "difficult to do EC at far end"?
Far end is Tx; near end is Rx from the frame of reference of the listener.
> And what do you mean by "synchronous transport"?
Synchronous means not asynchronous: any buffering is purely deterministic, with ideally constant delay. A T1 line is synchronous; the clock runs from end to end. The McDyson-Spohn book is worth ten times what it costs (and weighs) if you have an interest in that sort of thing.
> The AEC processes the microphone signal, which during far-end talk is
> composed only of background noise and a capture of the acoustic echo
> from the loudspeaker. From the near-end speaker's point of view, the AEC
> is done on the near-end. From the far-end speaker's point of view, that
> AEC is done at the far-end.
>
> Which intelligent algorithm would you suggest for synchronization?
RTP (and by extension VoIP) uses jitter buffers. They are quite strange.

<snip>

--
Les Cargill
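[The jitter-buffer idea Les mentions is simple to sketch, even though production RTP stacks are much more elaborate. A toy Python illustration (hypothetical class and field names, not from any real VoIP stack): packets are reordered by timestamp and released at a fixed playout delay, so downstream code sees a constant latency instead of network jitter.

import heapq

class JitterBuffer:
    def __init__(self, playout_delay_ms):
        self.playout_delay_ms = playout_delay_ms
        self.heap = []                              # (timestamp_ms, payload)

    def push(self, timestamp_ms, payload):
        heapq.heappush(self.heap, (timestamp_ms, payload))

    def pop_due(self, now_ms):
        """Release, in order, every packet whose playout time has arrived."""
        out = []
        while self.heap and self.heap[0][0] + self.playout_delay_ms <= now_ms:
            out.append(heapq.heappop(self.heap)[1])
        return out

Real implementations additionally adapt the playout delay to network conditions on the fly, which is part of what makes them "quite strange".]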
On 4/19/2013 8:21 PM, Mauritz Jameson wrote:
> Vlad,
>
> Can you elaborate on your comment:
>
> "Now you can see why it is difficult to do EC at far end. No wonder all
> systems work EC at near end or over synchronous transport"
>
> Your far-end is my near-end and vice versa. So I'm not sure I understand
> what you mean by "difficult to do EC at far end"?
Near end is whatever is local to speaker and mike. Far end is on the other side of the communication link (wrt this speaker and mike).
> And what do you mean by "synchronous transport"?
All parts of the system sitting on the same clock. No cycle slips.
> The AEC processes the microphone signal, which during far-end talk is
> composed only of background noise and a capture of the acoustic echo
> from the loudspeaker. From the near-end speaker's point of view, the AEC
> is done on the near-end. From the far-end speaker's point of view, that
> AEC is done at the far-end.
You can close the AEC loop at the near end, which is typical. Or you can try to close it at the far end; that is more difficult.
> Which intelligent algorithm would you suggest for synchronization?
Estimate the rate of incoming/outgoing data. Resample the data so everything works as if it is on the same sample clock.

Vladimir Vassilevsky
DSP and Mixed Signal Designs
www.abvolt.com
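[One way to read Vlad's suggestion (my interpretation, not his code, sketched under the assumption that the audio driver callbacks report sample counts): measure each stream's actual rate against a common clock, then drive a fractional resampler from the ratio.

import time

class RateEstimator:
    """Estimate a stream's true sample rate from counts vs. wall clock."""
    def __init__(self):
        self.t0 = time.monotonic()
        self.count = 0

    def add(self, num_samples):                 # call from the driver callback
        self.count += num_samples

    def rate_hz(self):
        elapsed = time.monotonic() - self.t0
        return self.count / elapsed if elapsed > 0 else 0.0

With one estimator per stream, the ratio spk.rate_hz() / mic.rate_hz() tells you how far the two clocks have drifted apart, and a fractional resampler driven by that ratio puts both streams on one nominal sample clock.]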
On Friday, April 19, 2013 11:04:05 PM UTC+12, Mauritz Jameson wrote:
> @HardySpicer
>
> Will that work for situations where the delay "jumps"?
Well, it worked for me. It depends what you mean by jumping, but it tracks a varying delay with no problem; e.g. as I walk about a room it will get the TDOA between two mics. The biggest problem is reverberation.
I assume you're using the same type of algorithm for the delay estimation?

If so, try this experiment where there's no near-end speech and let me know how that works out:

1) Generate a digital speaker signal which lasts 60 seconds
2) Let the delay toggle between 250ms and 300ms every 7 seconds. So like this:

time = 0s to 7s : Delay = 250ms
time = 7s to 14s : Delay = 300ms
time = 14s to 21s : Delay = 250ms
time = 21s to 28s : Delay = 300ms

...etc.

3) Generate a digital microphone signal which meets the requirements in [2]. So like this:

time = 0s to 7s : Time offset between mic and spk signal is 250ms
time = 7s to 14s : Time offset between mic and spk signal is 300ms
time = 14s to 21s : Time offset between mic and spk signal is 250ms
time = 21s to 28s : Time offset between mic and spk signal is 300ms

4) Let your delay estimator process the mic and spk signal. 

Tell me if your delay estimator was able to accurately track the delay. I'm not talking about TDOA. The delay on the acoustic path is constant (why wouldn't it be?) since the mic and spk stay in fixed positions. The delay jumps are happening because the audio subsystem (audio driver, etc.) is not working optimally.
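[For concreteness, steps 1-3 of this experiment can be sketched in Python like so; the 16kHz rate and white-noise source are my assumptions, since the post specifies neither:

import numpy as np

fs = 16000
spk = np.random.randn(60 * fs)                  # 1) 60 s speaker signal

mic = np.zeros_like(spk)                        # 2)+3) delay toggles every 7 s
seg_len = 7 * fs
for i, start in enumerate(range(0, len(spk), seg_len)):
    d = int(0.250 * fs) if i % 2 == 0 else int(0.300 * fs)
    idx = np.arange(start, min(start + seg_len, len(spk)))
    src = idx - d
    mic[idx[src >= 0]] = spk[src[src >= 0]]     # mic = delayed speaker signal

# 4) run the estimator on (spk, mic) and check that its output steps
#    between 250ms and 300ms at each 7 s boundary.]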



Vlad,

You wrote:

"Estimate the rate of upcomming/outgoing data. Resample the data so 
everything would work as if it is on the same sample clock."

I think that's something you would do if you have sample rate drift? Meaning: You get more or less far-end data per second than near-end data, right? This is not the problem in this case. The problem is the audio subsystem. The delay on the transmission path between the digital speaker buffer (which stores incoming audio from RTP) and the digital microphone buffer (which stores audio delivered to the application by the audio driver) varies too much (sudden jumps by more than 30ms). By transmission path I mean:

digital spk buffer -> audio driver (spk) -> acoustic path -> audio driver (mic) -> digital mic buffer

The delay on the acoustic path is naturally constant.

Vlad,

I guess you are suggesting that I measure how many speaker samples I send to the audio driver versus how many microphone samples I receive from the audio driver per time unit? If I receive 'M' samples per second and I send 'N' samples per second, then I resample the speaker sample buffer from 'N' Hz to Fs_common and I resample the mic sample buffer from 'M' Hz to Fs_common. Data from the resampled buffers is used as input to the AEC. The output of the AEC is resampled back from Fs_common to 'M' Hz.

Am I understanding you correctly?
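[That reading of it can be sketched as follows (illustrative Python; scipy's resample_poly is a real function, but aec_process is a hypothetical stand-in for the actual echo canceller, and Fs_common is whatever common rate is chosen):

from fractions import Fraction
from scipy.signal import resample_poly

def to_rate(x, from_hz, to_hz):
    """Rational-ratio resampling between two measured rates."""
    frac = Fraction(to_hz / from_hz).limit_denominator(1000)
    return resample_poly(x, frac.numerator, frac.denominator)

def aec_frame(spk_buf, mic_buf, n_hz, m_hz, fs_common, aec_process):
    spk = to_rate(spk_buf, n_hz, fs_common)     # 'N' Hz -> Fs_common
    mic = to_rate(mic_buf, m_hz, fs_common)     # 'M' Hz -> Fs_common
    out = aec_process(spk, mic)                 # AEC runs at Fs_common
    return to_rate(out, fs_common, m_hz)        # Fs_common -> 'M' Hz

In practice the measured rates drift, so the ratio has to be re-estimated and the resampler run continuously rather than per buffer, but the structure is the one described above.]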