Hi I'm playing around with a traditional NLMS algorithm for acoustic echo cancellation. It seems like the NLMS algorithm have a hard time cancelling the echo as soon as the ERL level creeps above 0dB. Is that to be expected? The speaker signal is pure speech (no noise) and I have linearly filtered the speaker signal to have a simulated microphone signal. The adaptive filter order is 512 taps. I have made a test script which tests the NLMS algo with various step sizes ranging from very small (0.00001) to 1. For low step sizes nothing happens (the echo is not cancelled) and then at some point it just breaks (the error has a huge output ...like an impulse)... Is there a way to calculate the optimal step size given no interference (i.e. no near-end signal) and also given that there is some interference ?

# NLMS and ERL above 0dB

Started by ●June 7, 2012

Reply by ●June 7, 20122012-06-07

On 6/7/12 8:58 PM, Mauritz Jameson wrote:> Hi > > I'm playing around with a traditional NLMS algorithm for acoustic echo > cancellation. > > It seems like the NLMS algorithm have a hard time cancelling the echo > as soon as the ERL level creeps above 0dB.what's an ERL? echo <something> level?> Is that to be expected?if you are using an LMS filter to cancel an echo that is bigger than the direct sound, i wonder if the adaptation mechanism in the LMS filter might go unstable. dunno if the same happens for normalized LMS. the adaptation can be sorta modeled like a Markov process based on the cross-correlation of the input against uncorrected feedback path. it's been a while since i saw this set up. but it's kinda like a discrete-time state-variable control system with the LMS filter taps as the system states. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."

Reply by ●June 8, 20122012-06-08

ERL is echo return loss... It's calculated in dB as 20*log10(var(echo signal) / var(speaker signal)) and it tells you how strong the echo is relative to its source. The algorithm I am using is described here: http://en.wikipedia.org/wiki/Least_mean_squares_filter#Normalised_least_mean_squares_filter_.28NLMS.29

Reply by ●June 8, 20122012-06-08

On 6/7/12 10:58 PM, robert bristow-johnson wrote:> On 6/7/12 8:58 PM, Mauritz Jameson wrote: >> >> I'm playing around with a traditional NLMS algorithm for acoustic echo >> cancellation. >> >> It seems like the NLMS algorithm have a hard time cancelling the echo >> as soon as the ERL level creeps above 0dB....>> Is that to be expected? > > if you are using an LMS filter to cancel an echo that is bigger than the > direct sound, i wonder if the adaptation mechanism in the LMS filter > might go unstable. dunno if the same happens for normalized LMS. > > the adaptation can be sorta modeled like a Markov process based on the > cross-correlation of the input against uncorrected feedback path. it's > been a while since i saw this set up. but it's kinda like a > discrete-time state-variable control system with the LMS filter taps as > the system states.so let's see if this is how it works. anyone is invited to point out any misconceptions here. L tap FIR coefficient vector at time n: H[n] = { h[0,n] h[1,n] h[2,n] ... h[L-1,n] } FIR states at time n: X[n] = { x[n] x[n-1] x[n-2] ... x[n-L+1] } "desired signal" that LMS is s'posed to converge to. LMS: L-1 y[n] = SUM{ h[k,n]*x[n-k] } k=0 e[n] = y[n] - d[n] h[k,n+1] = h[k,n] - mu*e[n]*x[n-k] where NLMS is the same except L-1 h[k,n+1] = h[k,n] - (mu*e[n]*x[n-k]) / (1/L SUM{ (x[n-i])^2 }) i=0 L-1 = h[k,n] - (mu*(y[n]-d[n])*x[n-k]) / (1/L SUM{ (x[n-i])^2 }) i=0 looking at the adaptation term, if you apply expectation values on the whole thing and be real sloppy with the math, you might get something like h[k,n+1] = h[k,n] - ( mu*(Ryx[k]-Rdx[k]) ) / (Rxx[0]) L-1 Rxx[k] = (1/L) * SUM{ x[n-i]*x[n-i-k] } i=0 L-1 Ryx[k] = (1/L) * SUM{ y[n-i]*x[n-i-k] } i=0 L-1 Rdx[k] = (1/L) * SUM{ d[n-i]*x[n-i-k] } i=0 furthermore, i think Ryx[k] can look something like L-1 L-1 Ryx[k] = (1/L) * SUM{ SUM{ h[j,n-i]*x[n-i-j]*x[n-i-k] } } i=0 j=0 L-1 Ryx[k] = SUM{ h[j,n]*Rxx[j-k] } j=0 h[k,n+1] = h[k,n] - SUM{ h[j,n]*mu*(Rxx[j-k]/Rxx[0]) } j minus something to do with Rdx[k]. so the state transition matrix (applied to the time-varying *coefficients*, h[k,n]) looks sorta like [ 1-mu mu*Rxx[1]/Rxx[0] mu*Rxx[2]/Rxx[0] ... ] [ mu*Rxx[1]/Rxx[0] 1-mu mu*Rxx[1]/Rxx[0] ... ] [ mu*Rxx[2]/Rxx[0] mu*Rxx[1]/Rxx[0] 1-mu ... ] [ ... ... ... ... ] and then you gotta ask yourself when that represents a stable system. and i don't remember how to do that anymore. if there is an Rxx[k] that represents a high correlation (which is what happens when the echo is very loud and delayed by k samples), i can see some elements of this state transition matrix get large and you might have a problem. if you slow down the adaptation by reducing the adaptation gain, mu, there should always be a point where the updating, although slow, should be stable. i know i was sloppy tossing around the expectation values in the summations. i don't mind if anyone gets more anal about this and fixes it. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."

Reply by ●June 8, 20122012-06-08

On 6/8/12 12:57 AM, robert bristow-johnson wrote:> so the state transition matrix (applied to the time-varying > *coefficients*, h[k,n]) looks sorta like > > [ 1-mu mu*Rxx[1]/Rxx[0] mu*Rxx[2]/Rxx[0] ... ] > [ mu*Rxx[1]/Rxx[0] 1-mu mu*Rxx[1]/Rxx[0] ... ] > [ mu*Rxx[2]/Rxx[0] mu*Rxx[1]/Rxx[0] 1-mu ... ] > [ ... ... ... ... ]right away dropped a minus sign... [ 1-mu -mu*Rxx[1]/Rxx[0] -mu*Rxx[2]/Rxx[0] ... ] [ -mu*Rxx[1]/Rxx[0] 1-mu -mu*Rxx[1]/Rxx[0] ... ] [ -mu*Rxx[2]/Rxx[0] -mu*Rxx[1]/Rxx[0] 1-mu ... ] [ ... ... ... ... ] looks like I - mu/Rxx[0]*[ Rxx[i-j] ] where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix" with all this diagonals identical and with i the row and j the column number of each element. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."

Reply by ●June 8, 20122012-06-08

not related to the problem, but>> It's calculated in dB as 20*log10(var(echo signal) / var(speakershouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude (where I'd use 20 log 10) was already squared. Might be a misunderstanding on my side, though.

Reply by ●June 8, 20122012-06-08

> shouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude > (where I'd use 20 log 10) was already squared. Might be a misunderstanding > on my side, though.I think you're right about that.

Reply by ●June 8, 20122012-06-08

> looks like > > I - mu/Rxx[0]*[ Rxx[i-j] ] > > where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix" > with all this diagonals identical and with i the row and j the column > number of each element.The equations you posted look very similar to the equations you stumble into when you derive the levinson-durbin algorithm for calculating LPC?

Reply by ●June 8, 20122012-06-08

On Thursday, June 7, 2012 7:58:59 PM UTC-5, Mauritz Jameson wrote:> Hi > > I'm playing around with a traditional NLMS algorithm for acoustic echo > cancellation. > > It seems like the NLMS algorithm have a hard time cancelling the echo > as soon as the ERL level creeps above 0dB. > > Is that to be expected?Mauritz, I'm out of town at the moment, and won't be back until next Monday. But, the NLMS should cancel an echo with >0dB ERL, IF ALL conditions are favorable. I hope you're simulating the algorithm before trying to implement it. I need the answer to the following questions. 1. Are you using fixed-point or floating-point simulation? 2. What is your simulated unknown impulse response (including length of delay)? To test your basic algorithm you should be using a white-noise source, not speech. Once you are satisfied with the basic algorithm, then start looking at specialized signals. The "ideal" update gain depends on the input signal. With the NLMS, a gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS estimates the Wiener filter, which requires the inverse of the input autocorrelation matrix,if the input is sinusoid type, the inverse matrix does not exist, and there is no unique solution. This can hurt fixed-point implementations if you are not careful with the update gain. I ran a quick floating-point simulation with white noise where the magnitude of the echo was 1.5 times the input, and the algorithm ran with no problems.

Reply by ●June 9, 20122012-06-09

> I hope you're simulating the algorithm before trying to implement it.Yes, I'm running simulations in MATLAB because I don't want to spend too much time chasing errors when I finally implement it.> 1. Are you using fixed-point or floating-point simulation?Floating point. Once I have the algorithm working in MATLAB I will make a floating point implementation in C and then once that has been verified to work I will translate the math to fixed point.> 2. What is your simulated unknown impulse response (including length of delay)?The impulse response I am using for the acoustic path has 512 filter coefficients. The delay is 18 samples (I filtered an impulse and the largest peak in the output happens after 18 samples).> To test your basic algorithm you should be using a white-noise source, not speech. Once you are satisfied with the basic algorithm, then start looking at specialized signals.Ok. I didn't know that. I thought it was better to use it on actual speech signals instead of signals which the algorithm won't encounter in "the real world".> The "ideal" update gain depends on the input signal. With the NLMS, a gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS estimates the Wiener filter, which requires the inverse of the input autocorrelation matrix,if the input is sinusoid type, the inverse matrix does not exist, and there is no unique solution. This can hurt fixed-point implementations if you are not careful with the update gain.How should I be careful ? Should the update gain be lower than "normal" ? I have noticed that the update gain highly depends on the power of the speaker signal. The lower the power the higher the update gain can be. Is that a correct observation?> I ran a quick floating-point simulation with white noise where the magnitude of the echo was 1.5 times the input, and the algorithm ran with no problems.That's nice to know. I will do the same then. I look forward to your feedback. Thank you!!