
NLMS and ERL above 0dB

Started by Mauritz Jameson June 7, 2012
Hi

I'm playing around with a traditional NLMS algorithm for acoustic echo
cancellation.

It seems like the NLMS algorithm has a hard time cancelling the echo
as soon as the ERL level creeps above 0dB.

Is that to be expected?

The speaker signal is pure speech (no noise) and I have linearly
filtered the speaker signal to have a simulated microphone signal. The
adaptive filter order is 512 taps.
I have made a test script which tests the NLMS algorithm with various
step sizes ranging from very small (0.00001) to 1. For low step sizes
nothing happens (the echo is not cancelled), and then at some point it
just breaks (the error output is huge, like an impulse)...

Is there a way to calculate the optimal step size, both given no
interference (i.e. no near-end signal) and given that there is some
interference?

On 6/7/12 8:58 PM, Mauritz Jameson wrote:
> Hi
>
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
>
> It seems like the NLMS algorithm has a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.
what's an ERL? echo <something> level?
> Is that to be expected?
if you are using an LMS filter to cancel an echo that is bigger than the
direct sound, i wonder if the adaptation mechanism in the LMS filter
might go unstable.  dunno if the same happens for normalized LMS.

the adaptation can be sorta modeled like a Markov process based on the
cross-correlation of the input against the uncorrected feedback path.
it's been a while since i saw this set up.  but it's kinda like a
discrete-time state-variable control system with the LMS filter taps as
the system states.

--
r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
ERL is echo return loss...

It's calculated in dB as 20*log10(var(echo signal) / var(speaker
signal)) and it tells you how strong the echo is relative to its
source.

The algorithm I am using is described here:

http://en.wikipedia.org/wiki/Least_mean_squares_filter#Normalised_least_mean_squares_filter_.28NLMS.29
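For reference, here is a minimal NumPy sketch of the NLMS recursion described on that page (the signal names, default step size, and small regularization constant `eps` are illustrative choices of mine, not part of the reference):

```python
import numpy as np

def nlms(x, d, L=512, mu=0.5, eps=1e-8):
    """Normalized LMS echo canceller.

    x   : far-end (speaker) signal
    d   : microphone signal (echo + near-end)
    L   : number of adaptive filter taps
    mu  : step size, 0 < mu < 2 for a stationary input
    eps : small constant to avoid division by zero
    Returns the error (echo-cancelled) signal and the final taps.
    """
    h = np.zeros(L)          # adaptive filter taps
    e = np.zeros(len(x))     # error output
    for n in range(L, len(x)):
        xn = x[n:n - L:-1]                        # most recent L input samples
        y = h @ xn                                # filter output (echo estimate)
        e[n] = d[n] - y                           # residual echo
        h += mu * e[n] * xn / (xn @ xn + eps)     # normalized tap update
    return e, h
```

Note the error here is written e[n] = d[n] - y[n]; some texts use the opposite sign, which just flips the sign of the update and is otherwise equivalent.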


On 6/7/12 10:58 PM, robert bristow-johnson wrote:
> On 6/7/12 8:58 PM, Mauritz Jameson wrote:
>>
>> I'm playing around with a traditional NLMS algorithm for acoustic echo
>> cancellation.
>>
>> It seems like the NLMS algorithm has a hard time cancelling the echo
>> as soon as the ERL level creeps above 0dB.
...
>> Is that to be expected?
>
> if you are using an LMS filter to cancel an echo that is bigger than the
> direct sound, i wonder if the adaptation mechanism in the LMS filter
> might go unstable.  dunno if the same happens for normalized LMS.
>
> the adaptation can be sorta modeled like a Markov process based on the
> cross-correlation of the input against the uncorrected feedback path.
> it's been a while since i saw this set up.  but it's kinda like a
> discrete-time state-variable control system with the LMS filter taps as
> the system states.
so let's see if this is how it works.  anyone is invited to point out any
misconceptions here.

L tap FIR coefficient vector at time n:

   H[n] = { h[0,n]  h[1,n]  h[2,n]  ...  h[L-1,n] }

FIR states at time n:

   X[n] = { x[n]  x[n-1]  x[n-2]  ...  x[n-L+1] }

d[n] = "desired signal" that LMS is s'posed to converge to.

LMS:
          L-1
   y[n] = SUM{ h[k,n]*x[n-k] }
          k=0

   e[n] = y[n] - d[n]

   h[k,n+1] = h[k,n] - mu*e[n]*x[n-k]

where NLMS is the same except

                                                   L-1
   h[k,n+1] = h[k,n] - (mu*e[n]*x[n-k]) / ( (1/L)* SUM{ (x[n-i])^2 } )
                                                   i=0

                                                          L-1
            = h[k,n] - (mu*(y[n]-d[n])*x[n-k]) / ( (1/L)* SUM{ (x[n-i])^2 } )
                                                          i=0

looking at the adaptation term, if you apply expectation values on the
whole thing and be real sloppy with the math, you might get something like

   h[k,n+1] = h[k,n] - ( mu*(Ryx[k] - Rdx[k]) ) / Rxx[0]

                     L-1
   Rxx[k] = (1/L) *  SUM{ x[n-i]*x[n-i-k] }
                     i=0

                     L-1
   Ryx[k] = (1/L) *  SUM{ y[n-i]*x[n-i-k] }
                     i=0

                     L-1
   Rdx[k] = (1/L) *  SUM{ d[n-i]*x[n-i-k] }
                     i=0

furthermore, i think Ryx[k] can look something like

                     L-1  L-1
   Ryx[k] = (1/L) *  SUM{ SUM{ h[j,n-i]*x[n-i-j]*x[n-i-k] } }
                     i=0  j=0

            L-1
   Ryx[k] = SUM{ h[j,n]*Rxx[j-k] }
            j=0

   h[k,n+1] = h[k,n] - SUM{ h[j,n]*mu*(Rxx[j-k]/Rxx[0]) }
                        j

minus something to do with Rdx[k].

so the state transition matrix (applied to the time-varying
*coefficients*, h[k,n]) looks sorta like

   [ 1-mu              mu*Rxx[1]/Rxx[0]  mu*Rxx[2]/Rxx[0]  ... ]
   [ mu*Rxx[1]/Rxx[0]  1-mu              mu*Rxx[1]/Rxx[0]  ... ]
   [ mu*Rxx[2]/Rxx[0]  mu*Rxx[1]/Rxx[0]  1-mu              ... ]
   [ ...               ...               ...               ... ]

and then you gotta ask yourself when that represents a stable system.
and i don't remember how to do that anymore.  if there is an Rxx[k] that
represents a high correlation (which is what happens when the echo is
very loud and delayed by k samples), i can see some elements of this
state transition matrix get large and you might have a problem.

if you slow down the adaptation by reducing the adaptation gain, mu,
there should always be a point where the updating, although slow, should
be stable.
i know i was sloppy tossing around the expectation values in the
summations.  i don't mind if anyone gets more anal about this and fixes
it.

--
r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
On 6/8/12 12:57 AM, robert bristow-johnson wrote:

> so the state transition matrix (applied to the time-varying
> *coefficients*, h[k,n]) looks sorta like
>
>    [ 1-mu              mu*Rxx[1]/Rxx[0]  mu*Rxx[2]/Rxx[0]  ... ]
>    [ mu*Rxx[1]/Rxx[0]  1-mu              mu*Rxx[1]/Rxx[0]  ... ]
>    [ mu*Rxx[2]/Rxx[0]  mu*Rxx[1]/Rxx[0]  1-mu              ... ]
>    [ ...               ...               ...               ... ]
right away dropped a minus sign...

   [ 1-mu               -mu*Rxx[1]/Rxx[0]  -mu*Rxx[2]/Rxx[0]  ... ]
   [ -mu*Rxx[1]/Rxx[0]  1-mu               -mu*Rxx[1]/Rxx[0]  ... ]
   [ -mu*Rxx[2]/Rxx[0]  -mu*Rxx[1]/Rxx[0]  1-mu               ... ]
   [ ...                ...                ...                ... ]

looks like

   I - mu/Rxx[0] * [ Rxx[i-j] ]

where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix"
with all its diagonals identical and with i the row and j the column
number of each element.

--
r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
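That stability question can be poked at numerically under the same rough mean-value assumptions: estimate Rxx from data, form I - (mu/Rxx[0])*[Rxx[i-j]], and look at its spectral radius, which has to stay below 1 for the modeled mean update to be stable. A sketch (the colored-noise example, lag count, and step size are made-up illustrations):

```python
import numpy as np

def spectral_radius(rxx, mu):
    """Spectral radius of I - (mu/Rxx[0]) * [Rxx[i-j]] (Toeplitz)."""
    L = len(rxx)
    R = np.array([[rxx[abs(i - j)] for j in range(L)] for i in range(L)])
    T = np.eye(L) - (mu / rxx[0]) * R
    return np.max(np.abs(np.linalg.eigvals(T)))

def sample_acf(x, L):
    """Biased sample autocorrelation at lags 0..L-1."""
    return np.array([np.mean(x[k:] * x[:len(x) - k]) for k in range(L)])

rng = np.random.default_rng(1)
white = rng.standard_normal(100000)
# strongly correlated input: white noise through an 8-tap moving average
colored = np.convolve(white, np.ones(8) / 8.0, mode="same")

mu = 1.0
print("white  :", spectral_radius(sample_acf(white, 16), mu))    # small
print("colored:", spectral_radius(sample_acf(colored, 16), mu))  # large
```

For white noise the matrix is close to (1-mu)*I, so the modeled update behaves; for strongly correlated input the eigenvalue spread of the Toeplitz matrix blows the radius up, which is exactly the kind of trouble described above (the crude mean model is more pessimistic than the full NLMS analysis, though).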
not related to the problem, but 

>> It's calculated in dB as 20*log10(var(echo signal) / var(speaker
shouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude (where I'd use 20 log 10) was already squared. Might be a misunderstanding on my side, though.
> shouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude
> (where I'd use 20 log 10) was already squared. Might be a misunderstanding
> on my side, though.
I think you're right about that.
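A quick numeric check of the 10-vs-20 point: 10*log10 of a variance (power) ratio equals 20*log10 of the corresponding standard-deviation (amplitude) ratio, so the factor in front has to match the quantity inside (the variances here are made-up numbers):

```python
import math

var_echo, var_speaker = 4.0, 1.0     # made-up variances
db_power = 10 * math.log10(var_echo / var_speaker)
db_amplitude = 20 * math.log10(math.sqrt(var_echo) / math.sqrt(var_speaker))
print(db_power, db_amplitude)        # both ~6.02 dB
```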
> looks like
>
>    I - mu/Rxx[0] * [ Rxx[i-j] ]
>
> where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix"
> with all its diagonals identical and with i the row and j the column
> number of each element.
The equations you posted look very similar to the equations you stumble
into when you derive the Levinson-Durbin algorithm for calculating LPC
coefficients, don't they?
On Thursday, June 7, 2012 7:58:59 PM UTC-5, Mauritz Jameson wrote:
> Hi
>
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
>
> It seems like the NLMS algorithm has a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.
>
> Is that to be expected?
Mauritz,

I'm out of town at the moment, and won't be back until next Monday. But
the NLMS should cancel an echo with >0dB ERL, IF ALL conditions are
favorable. I hope you're simulating the algorithm before trying to
implement it.

I need the answers to the following questions:

1. Are you using fixed-point or floating-point simulation?
2. What is your simulated unknown impulse response (including length of
delay)?

To test your basic algorithm you should be using a white-noise source,
not speech. Once you are satisfied with the basic algorithm, then start
looking at specialized signals.

The "ideal" update gain depends on the input signal. With the NLMS, a
gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS
estimate the Wiener filter, which requires the inverse of the input
autocorrelation matrix, if the input is sinusoid-like, the inverse
matrix does not exist, and there is no unique solution. This can hurt
fixed-point implementations if you are not careful with the update gain.

I ran a quick floating-point simulation with white noise where the
magnitude of the echo was 1.5 times the input, and the algorithm ran
with no problems.
> I hope you're simulating the algorithm before trying to implement it.
Yes, I'm running simulations in MATLAB because I don't want to spend too much time chasing errors when I finally implement it.
> 1. Are you using fixed-point or floating-point simulation?
Floating point. Once I have the algorithm working in MATLAB I will make a floating point implementation in C and then once that has been verified to work I will translate the math to fixed point.
> 2. What is your simulated unknown impulse response (including length of delay)?
The impulse response I am using for the acoustic path has 512 filter coefficients. The delay is 18 samples (I filtered an impulse and the largest peak in the output happens after 18 samples).
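The delay measurement described here (filter an impulse, find the largest peak) can be sketched like this, with a made-up 512-tap path whose main reflection is placed 18 samples in:

```python
import numpy as np

# hypothetical 512-tap acoustic echo path: main reflection at sample 18,
# followed by a smaller decaying tail
g = np.zeros(512)
g[18] = 1.0
tail = np.arange(1, 494)
g[19:] = 0.5 * np.exp(-tail / 60.0) * np.cos(0.3 * tail)

# filter an impulse through the path and locate the biggest peak
impulse = np.zeros(1024)
impulse[0] = 1.0
response = np.convolve(impulse, g)[:1024]
delay = int(np.argmax(np.abs(response)))
print(delay)  # -> 18
```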
> To test your basic algorithm you should be using a white-noise source, not speech. Once you are satisfied with the basic algorithm, then start looking at specialized signals.
Ok. I didn't know that. I thought it was better to use it on actual speech signals instead of signals which the algorithm won't encounter in "the real world".
> The "ideal" update gain depends on the input signal. With the NLMS, a gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS estimates the Wiener filter, which requires the inverse of the input autocorrelation matrix,if the input is sinusoid type, the inverse matrix does not exist, and there is no unique solution. This can hurt fixed-point implementations if you are not careful with the update gain.
How should I be careful? Should the update gain be lower than "normal"?
I have noticed that the update gain depends strongly on the power of the
speaker signal: the lower the power, the higher the update gain can be.
Is that a correct observation?
> I ran a quick floating-point simulation with white noise where the magnitude of the echo was 1.5 times the input, and the algorithm ran with no problems.
That's nice to know. I will do the same then. I look forward to your feedback. Thank you!!
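For anyone who wants to reproduce that white-noise experiment, here is a rough sketch of it (the tap counts, step size, and echo-path shape are my own guesses, not the exact simulation described above):

```python
import numpy as np

rng = np.random.default_rng(42)
N, L, mu = 50000, 128, 1.0           # samples, adaptive taps, NLMS step size

x = rng.standard_normal(N)           # far-end white-noise source
g = rng.standard_normal(64)
g *= 1.5 / np.linalg.norm(g)         # echo path: echo 1.5x the input (>0dB ERL)
d = np.convolve(x, g)[:N]            # simulated microphone signal

h = np.zeros(L)                      # adaptive filter taps
e = np.zeros(N)                      # residual echo
for n in range(L, N):
    xn = x[n:n - L:-1]
    e[n] = d[n] - h @ xn
    h += mu * e[n] * xn / (xn @ xn + 1e-8)

# echo return loss enhancement over the last 5000 samples
erle = 10 * np.log10(np.mean(d[-5000:] ** 2) / np.mean(e[-5000:] ** 2))
print(f"steady-state ERLE ~ {erle:.1f} dB")
```

Scaling the path to norm 1.5 makes the echo 1.5 times stronger than the far-end signal, i.e. the above-0dB-ERL case under discussion, and with white noise and no near-end interference the canceller converges without trouble.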