Reply by maury June 15, 2012
On Thursday, June 14, 2012 10:35:16 AM UTC-5, Mauritz Jameson wrote:
> Thank you Maurice. I understand the experiment that you suggest. However, I am trying to understand why I see "bursts" of strong echo slipping through the NLMS algorithm when I use real-world signals. From what you're saying, the explanation is that these microphone segments can't be modeled as the output of a linear operation on the speaker signal? You are saying that mic.wav contains segments which are the result of a non-linear operation on the speaker signal? My question is: How can I improve my algorithm so it handles these non-linear segments better? What do you do if you want to stick with a linear model? Do you increase the number of filter coefficients to "cover" for these possible non-linearities?
>
> I will run the experiment you suggested and post my "theory" when I'm done.
You now have everything you need. Study my replies, and the two references I gave you. You should be able to take it from here.
Reply by Mauritz Jameson June 14, 2012
Thank you Maurice. I understand the experiment that you suggest.
However, I am trying to understand why I see "bursts" of strong echo
slipping through the NLMS algorithm when I use real-world signals.
From what you're saying the explanation is that these microphone
segments can't be modeled as the output of a linear operation on the
speaker signal? You are saying that mic.wav contains segments which
are the result of a non-linear operation on the speaker signal? My
question is: How can I improve my algorithm so it handles these
non-linear segments better? What do you do if you want to stick with a
linear model? Do you increase the number of filter coefficients to
"cover" for these possible non-linearities?

I will run the experiment you suggested and post my "theory" when I'm
done.

Reply by maury June 14, 2012
On Wednesday, June 13, 2012 2:57:39 PM UTC-5, mjame...@gmail.com wrote:
> Maurice,
>
> Thank you for your valuable feedback. I listened to the audio files in the zip file and the mic file only contains echo from the loudspeaker (if you disregard background noise). During the first 45 seconds of the mic file I hadn't put the loudspeaker on. After 45 seconds I switched on the loudspeaker, and you will hear that the audio in the mic file is the echo from the loudspeaker. That echo is time-aligned with the audio in the speaker signal.
>
> I tested the algorithm with speech and I fail to understand why the filter doesn't converge. Any ideas/suggestions?
>
> The test can be downloaded from this link:
>
> https://www.dropbox.com/s/14q5f8x1wbbzsjq/nlmstest.zip
>
> The audio files in the zip file are not identical to the previous audio files, but the procedure is the same...I enable the loudspeakers after 45 seconds.
Mauritz,

First of all, your algorithm is getting about 20 dB of cancellation, so I don't know why you say it isn't working.

Secondly, the adaptive filter can only adapt to things that are consistent with the model. In this case y = x'h. Your algorithm has this as a linear model. Therefore, if there is anything in the return path (echo) that is non-linear, or longer than the model (filter) length, or otherwise not consistent with the model, then the adaptive filter cannot adapt to it. Remember, the NLMS only says that it will get the least mean square error. It doesn't make promises about filter coefficient error.

Now try this. In your main.m, for testIndex = 2, replace

microphoneSignal = wavread('mic.wav');

with

microphoneSignal = filter(b,1,speakerSignal);

Then replace

outputSignal = nlmsAlgo(speakerSignal, microphoneSignal, 1);

with

[outputSignal,h] = nlmsAlgo(speakerSignal, microphoneSignal, 1);

Then in nlmsAlgo.m change the function declaration to

function [outputSignal,h] = nlmsAlgo(speakerSignal, microphoneSignal, mu)

This does two things. First, your simulated impulse response, b, is linear, so your microphone signal is now a linear convolution of the unknown impulse response and the speaker signal. Second, you will now have the adaptive filter, h, as a variable.

Now run main, and your code will show you the cancellation possible if the impulse response is linear. Then plot b and h together to compare the actual response with the learned one. Since your variables are row vectors, you will need to use plot([b' h']).

You now have two experiments: one with b as the unknown impulse response, and the other with your microphone wave file. Compare the results, think about the assumptions made when using the NLMS, then formulate a theory that is consistent with your observations.

Maurice
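[Editor's note] Maurice's three code changes can be gathered into one place. This is a sketch only — b, speakerSignal, and nlmsAlgo are the poster's own variables and function from the zip file, which has not been re-run here:

```matlab
% Sketch of the suggested experiment: replace the real mic recording
% with a purely linear simulated echo path, then compare the learned
% filter with the true one.

microphoneSignal = filter(b, 1, speakerSignal);   % linear echo path

% nlmsAlgo modified to also return the adaptive filter h
[outputSignal, h] = nlmsAlgo(speakerSignal, microphoneSignal, 1);

% b and h are row vectors, so transpose to plot them as columns
plot([b' h']);
legend('true impulse response b', 'learned filter h');
```

If h lands on top of b, the canceller is doing all a linear model can do; whatever echo remains with mic.wav is then outside the linear model.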
Reply by June 13, 2012
Maurice,

Thank you for your valuable feedback. I listened to the audio files in the zip file and the mic file only contains echo from the loudspeaker (if you disregard background noise). During the first 45 seconds of the mic file I hadn't put the loudspeaker on. After 45 seconds I switched on the loudspeaker, and you will hear that the audio in the mic file is the echo from the loudspeaker. That echo is time-aligned with the audio in the speaker signal.


I tested the algorithm with speech and I fail to understand why the filter doesn't converge. Any ideas/suggestions?

The test can be downloaded from this link:

https://www.dropbox.com/s/14q5f8x1wbbzsjq/nlmstest.zip

The audio files in the zip file are not identical to the previous audio files, but the procedure is the same...I enable the loudspeakers after 45 seconds.


Reply by maury June 13, 2012
On Tuesday, June 12, 2012 10:46:16 AM UTC-5, mjame...@gmail.com wrote:
> Maurice,
>
> Thank you for your valuable feedback so far. I appreciate it.
>
> I'm using the algorithm in a real-time setup where the audio is delivered in blocks. So that's why I process the audio in blocks.
Understood.
> I don't understand why you say that the mic and spk files are different? In the beginning of the mic file I hadn't enabled the loudspeaker. If you "scroll" forward in the mic file you will hear the echo of the speaker signal in the microphone signal. There is some background noise in the mic file, but that's to be expected in a real setup, right?
When I listen to the two, they are readings of two different passages from The Hobbit. If one is the near-end and the other is the far-end (speaker and microphone), make sure you have some time at the beginning with just the far-end so the filter can converge.
> I noticed that I see ripples through the estimated filter coefficients when double-talk occurs. The peak in the filter coefficient vector usually stays in the same location. Any suggestions as to how I detect these ripples? It looks like a wave propagating through the filter coefficient vector. I was thinking about measuring the variance of the leading "zeros" and use the change in variance as an indicator of double-talk. On the other hand, I also noticed that even though the filter diverges during double-talk, it also reconverges pretty fast...so I'm wondering how much I will benefit from not updating the filter during double-talk? Will the double-talk segment sound much better?
How you detect double-talk and what you do when you detect it is for you to determine.
> If the echo is stronger than its source, the peak in the estimated filter coefficients will probably be larger than 1 (this is just a guess) if there's no near-end speech present.
>
> The error signal should stay within a valid range from -1 to 1. The estimated microphone sample should ideally stay pretty close to the actual microphone sample given that there is no near-end speech. If there is near-end speech, the microphone sample could (for example) be -1 while the estimated is also 1. That means that the error has a range from -2 to 2. I placed a limiter on it in case the error signal is out of bounds, but maybe I should revise the boundaries from -1 to 1 to -2 to 2. I haven't investigated if the limiter introduces some undesired behavior of the algorithm or if it lowers the performance of the NLMS algorithm....
Ask yourself what you are trying to prevent by limiting the error. Then remove the limiter to see if you were successful. Then ask yourself if your fix is needed.
> If the echo canceller always sees double-talk and never just sees the far-end signal, I don't think it will converge. Most likely it will adapt to a false optimum. I'm not sure what that will look like.
>
> I will see if I can find the publications you recommended. Thank you!!!
You're welcome.

Maurice
Reply by June 12, 2012
Maurice,

Thank you for your valuable feedback so far. I appreciate it.

I'm using the algorithm in a real-time setup where the audio is delivered in blocks. So that's why I process the audio in blocks.

I don't understand why you say that the mic and spk files are different? In the beginning of the mic file I hadn't enabled the loudspeaker. If you "scroll" forward in the mic file you will hear the echo of the speaker signal in the microphone signal. There is some background noise in the mic file, but that's to be expected in a real setup, right?

I noticed that I see ripples through the estimated filter coefficients when double-talk occurs. The peak in the filter coefficient vector usually stays in the same location. Any suggestions as to how I detect these ripples? It looks like a wave propagating through the filter coefficient vector. I was thinking about measuring the variance of the leading "zeros" and use the change in variance as an indicator of double-talk. On the other hand, I also noticed that even though the filter diverges during double-talk, it also reconverges pretty fast...so I'm wondering how much I will benefit from not updating the filter during double-talk? Will the double-talk segment sound much better?

If the echo is stronger than its source, the peak in the estimated filter coefficients will probably be larger than 1 (this is just a guess) if there's no near-end speech present. 

The error signal should stay within a valid range from -1 to 1. The estimated microphone sample should ideally stay pretty close to the actual microphone sample given that there is no near-end speech. If there is near-end speech, the microphone sample could (for example) be -1 while the estimated is also 1. That means that the error has a range from -2 to 2. I placed a limiter on it in case the error signal is out of bounds, but maybe I should revise the boundaries from -1 to 1 to -2 to 2. I haven't investigated if the limiter introduces some undesired behavior of the algorithm or if it lowers the performance of the NLMS algorithm....

If the echo canceller always sees double-talk and never just sees the far-end signal, I don't think it will converge. Most likely it will adapt to a false optimum. I'm not sure what that will look like. 

I will see if I can find the publications you recommended. Thank you!!!



Reply by maury June 12, 2012
On Monday, June 11, 2012 8:37:27 PM UTC-5, Mauritz Jameson wrote:
> Maury,
>
> Here is a link to the code:
>
> https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip
One other thing; a couple of questions for you.

Assume a simple impulse response such as an impulse (all coefficients = 0 except one, i.e. a pure delay). If the echo is greater than the input, will the coefficient in the adaptive filter that represents that impulse have a value less than or greater than 1? Will the error signal sometimes be greater than 1? (Take another look at your whole algorithm.)

Second question: If the echo canceller always sees double-talk, and never sees just the far-end signal, will it have a chance to converge?

Sometimes the way you test an algorithm is as critical as the algorithm itself, if not more so.

Maurice
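[Editor's note] Maurice's first question can be checked numerically with a toy NLMS loop. This sketch is not the poster's nlmsAlgo; the delay, gain, filter length, and step size are arbitrary choices for illustration:

```matlab
% Toy check: echo path is a pure 3-sample delay with gain 2 (echo > input).
N = 5000; L = 8; mu = 1.0;
x = randn(1, N);                    % far-end (speaker) signal
d = 2 * [zeros(1,3), x(1:N-3)];     % echo: pure delay, twice as strong
h = zeros(1, L);                    % adaptive filter
for n = L:N
    xv = x(n:-1:n-L+1);             % input vector, newest sample first
    e  = d(n) - h * xv';            % a-priori error
    h  = h + mu * e * xv / (xv * xv' + 1e-6);  % NLMS update
end
% h(4) converges toward 2: when the echo is stronger than the input,
% the corresponding coefficient exceeds 1, and the early errors can
% exceed 1 as well -- exactly what a [-1, 1] limiter would clip.
```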
Reply by maury June 12, 2012
On Monday, June 11, 2012 8:37:27 PM UTC-5, Mauritz Jameson wrote:
> Maury,
>
> Here is a link to the code:
>
> https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip
Mauritz,

I have one wee little question, and a couple of comments. First the little question: why are you using the block NLMS algorithm? I know why I have used it in the past; just curious to know why you are using it.

Now the comments. You will be happy to know that your algorithm works. I changed your testIndex to 1 in main.m (testIndex = 1;), then added

microphoneSignal = 10*microphoneSignal;

after

microphoneSignal = filter(b,1,speakerSignal);

It worked fine, no problems. So your basic algorithm will cancel echoes that are greater than the input (in this case 10 times bigger).

Instead of using randi to generate noise, I would suggest using din = randn(col,row), then din = din/max(abs(din)) to normalize (randn is a white Gaussian noise generator).

Your spk.wav and mic.wav signals are different speakers, and different speech. This is NOT a test of the NLMS algorithm, but a test of your algorithm's performance with correlated interference. This is called "double-talk" (as I think you know). Correlated double-talk WILL cause the NLMS algorithm to diverge; there is no (simple) way around it. Your task is NOT to find out what is wrong with the NLMS algorithm, but to find out how to detect when double-talk occurs, and what to do when you detect it.

If you can still find them, I would suggest you look at two publications. The first is an echo canceller tutorial from the T1A1 committee of ANSI, "ATIS T1.TR.27: Echo Cancelling", published Nov. 1993. The second is ITU-T G.168. The appendix (I believe Appendix I) has "Considerations regarding echo canceller performance during double talk". If you Google "echo canceller ITU", then click "G.168 : New Appendix VII on guidance on echo canceller ... - ITU", you can get G.168(04.97) for free. I don't know about the ATIS T1.TR.27 cost.

Good luck,
Maurice
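[Editor's note] One classic answer to the "detect double-talk" task, not spelled out in this thread, is the Geigel detector: declare double-talk whenever the mic sample magnitude exceeds some fraction of the recent far-end peak, and freeze adaptation while the condition (plus a short hangover) holds. A sketch, where the function name, the history length, and the 0.5 threshold are all illustrative assumptions:

```matlab
% Geigel double-talk detector (sketch). Returns true when near-end
% speech is suspected, i.e. when the mic level cannot be explained
% by the echo of the far-end signal alone.
function dt = geigelDetect(micSample, spkHistory, threshold)
    % spkHistory: the most recent far-end samples (a few hundred)
    % threshold:  ~0.5 assumes roughly 6 dB of echo path loss
    dt = abs(micSample) > threshold * max(abs(spkHistory));
end
```

When the detector fires, keep computing the output (error) signal but skip the coefficient update.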
Reply by Mauritz Jameson June 11, 2012
Maury,

Here is a link to the code:

https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip

Reply by robert bristow-johnson June 11, 2012
On 6/11/12 3:49 PM, Mauritz Jameson wrote:
> Just to make sure I understand.
>
> Are you saying that I should remove DC from the error signal?
it wasn't what i meant to say. i meant x[n] which is the speaker signal, if you are doing speakerphone feedback cancellation. the "error signal", e[n], is actually your net microphone signal after your NLMS attempts to remove (via subtraction in the digital domain) the acoustic coupling into the microphone from the loudspeaker. i guess it's d[n], the "desired signal" is the raw data that comes in from your mic.
> Right now I am just pre-filtering the microphone and speaker signal. The DC-suppressed mic and speaker signal are then handed over to the NLMS algorithm.
that might be okay. i might do it to the speaker and not the mic, but because of the DC offset that might occur with the mic preamp and A/D conversion, it might be a good idea to DC block the mic input also. if you DC block both x[n] and e[n] and if your NLMS is behaving well, the mean value (or sum) of the FIR coefficients *should* tend to add to zero on average.

--
r b-j rbj@audioimagination.com

"Imagination is more important than knowledge."
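[Editor's note] The DC blocking discussed above is commonly done with a one-zero/one-pole highpass. A sketch in MATLAB, where the 0.995 pole is an assumed, typical value (closer to 1 gives a lower cutoff but slower settling):

```matlab
% First-order DC blocker: H(z) = (1 - z^-1) / (1 - p*z^-1)
p = 0.995;                                 % pole near DC (assumed value)
dcBlock = @(s) filter([1 -1], [1 -p], s);

speakerSignal    = dcBlock(speakerSignal);     % DC block x[n]
microphoneSignal = dcBlock(microphoneSignal);  % and the raw mic input d[n]
```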