Hi,

We are currently working on an acoustic echo canceller based on the well-known NLMS principle. The echo canceller works fine as long as we feed it an echo signal generated by an audio-processing program. When using this ideal echo signal, freezing the FIR coefficients works as it should: the echo is still cancelled, because the FIR taps contain a representation of the impulse response of the (virtual) room.

Things are different in real life: when working with a real room, the echo is cancelled as long as the taps are not frozen. Echo attenuation (ERLE?) is as much as 40 dB. However, as soon as the taps are frozen, the echo attenuation drops to 10-15 dB, even with no near-end speech!

This raises some questions:

- Maybe our code is wrong. We've tested the algorithm in C++ and Matlab, and both behave the same. The Matlab code is included below as a reference, so if anyone sees a bug, please let me know. In this piece of code you can see that we stop adapting the weights halfway through the microphone and speaker files.

- If this bad behaviour is due to the non-linear impulse response of the room, and is therefore inherent to AEC, why does everybody talk about freezing the taps when double talk is active?
With kind regards,

Johan Kleuskens

% read the speaker file
[x,fs] = wavread('c:\testspeaker.wav');   % read speaker file

% read the microphone file
d = wavread('c:\testmic.wav');            % read microphone file

L = 1500;                                 % number of taps

wn = zeros(L,1);                          % weight values
xn = zeros(L,1);                          % input delay line
n = length(x);                            % number of samples in the wave file
wavout = zeros(n,2);                      % storage for wave output

% read sound data one sample at a time, and process each sample
for i = 1:n
    xn(2:L) = xn(1:L-1);                  % shift the delay line
    xn(1) = x(i);                         % get one new sample
    yn = wn' * xn;                        % estimated echo signal
    en = d(i) - yn;                       % error signal
    wavout(i,1) = en;                     % store error in output array
    p = xn' * xn;                         % input power
    if (i/n) < 0.5
        wn = wn + 0.5/(p + 0.001) * xn .* en;   % update weights
        wavout(i,2) = 1;
    else
        wavout(i,2) = 0;
    end
end
wavwrite(wavout, fs, 'c:\fdaf.wav');      % write result to output file
Problems after freezing FIR coefficients of Acoustic Echo Canceller
Started by ●April 11, 2005
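[Editorial aside: for readers without MATLAB, here is a minimal NumPy sketch of the same NLMS loop. It is a translation, not the poster's code: the wave files are replaced by synthetic signals (white noise through a made-up, exponentially decaying room impulse response), and the tap count is reduced so the demo runs quickly.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the wave files: no noise, no near-end speech.
n = 20000
L = 64                                  # taps (1500 in the original post)
x = rng.standard_normal(n)              # speaker (far-end) signal
h = rng.standard_normal(L) * np.exp(-0.1 * np.arange(L))  # fake room IR
d = np.convolve(x, h)[:n]               # microphone signal = echo only

w = np.zeros(L)                         # adaptive FIR weights
xn = np.zeros(L)                        # delay line, newest sample first
e = np.zeros(n)                         # residual (echo-cancelled) output
mu, eps = 0.5, 1e-3                     # step size and regulariser, as posted

for i in range(n):
    xn = np.roll(xn, 1)                 # shift the delay line
    xn[0] = x[i]
    y = w @ xn                          # estimated echo
    e[i] = d[i] - y
    if i < n // 2:                      # adapt for the first half, then freeze
        w += mu / (xn @ xn + eps) * xn * e[i]

# ERLE over the frozen half: echo power over residual power, in dB.
erle = 10 * np.log10(np.sum(d[n//2:]**2) / np.sum(e[n//2:]**2))
print(f"ERLE after freezing: {erle:.1f} dB")
```

With a fixed, linear, noiseless synthetic room the frozen filter keeps cancelling and ERLE stays high, matching the poster's experience with the ideal echo signal; the real-room collapse must come from something this model leaves out.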
Reply by ●April 11, 2005
In the simulation you are assuming that the (virtual) room characteristics do not change; in real life, fluctuations/changes in your "room" might cause the problem you are describing.

"johan kleuskens" <j.kleuskens@opentsp.com> wrote in message news:425a50e5$0$147$e4fe514c@news.xs4all.nl...

> [original post and Matlab code quoted in full; snipped]
Reply by ●April 11, 2005
johan kleuskens wrote:

> [original post quoted in full; snipped]

-- code snipped --

As Mr. Mughal suggested, your room characteristics may be constantly changing. You can find this out by monitoring how the echo parameters change with time -- if they don't settle for the real room, then that's probably it.

You should also verify that the delay in your code doesn't change when you freeze the acquisition. Sudden changes of delay in the code may not be modelled with your "virtual" room, yet would certainly come into play with the real room.

Does the effect actually change _immediately_ on freezing the acquisition, or does it take a little bit of time? If it's absolutely immediate, that would point toward your delay changing (depending on your definition of "immediate", and how quickly you acquire echo information).

--

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
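[Editorial aside: the suggestion above, to watch whether the echo parameters settle, is cheap to implement — snapshot the weight vector periodically and look at the norm of the change between snapshots. A sketch of the idea; the helper name `coefficient_drift` and the toy data are made up for illustration.]

```python
import numpy as np

def coefficient_drift(weight_history):
    """Norm of the change between consecutive weight snapshots.

    For a stable room (and correct code) this should decay toward a small
    floor; if it stays large, the filter is chasing a moving target and
    freezing it will immediately cost ERLE."""
    w = np.asarray(weight_history, dtype=float)
    return np.linalg.norm(np.diff(w, axis=0), axis=1)

# Toy example: snapshots of a 4-tap filter converging geometrically.
target = np.array([1.0, -0.5, 0.25, 0.1])
snapshots = [target * (1 - 0.5 ** k) for k in range(10)]
drift = coefficient_drift(snapshots)
print(drift)  # steadily shrinking steps: the filter is settling
```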
Reply by ●April 11, 2005
in article d3e1b4$7gu$1@newsg2.svr.pol.co.uk, Bobby Mughal at bmughal@dspcreations.com wrote on 04/11/2005 10:26:

> In the simulation you are assuming that the (virtual) room characteristics
> do not change, in real life fluctuations/changes in your "room" might cause
> the problem you are describing.

heck, they change as you walk from the door to the chair behind your desk.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
Reply by ●April 11, 2005
The result changes "immediately" (within a few hundred samples). The room is a test room; nobody is present in it. I think there is no change of delay in the code, as you can see in the Matlab code I sent in the original posting. Could the non-linearity of the speaker and/or microphone cause this? They are common PC accessories.

"Tim Wescott" <tim@wescottnospamdesign.com> wrote in message news:115l3rgsp3lbv49@corp.supernews.com...

> [Tim Wescott's reply quoted in full; snipped]
Reply by ●April 11, 2005
Hi Johan,

What does "as soon as" really mean? The very second, or just fairly quickly after? Is it a step change, or a progressive degradation? When you switch off the adaption, what physical action do you perform? Are you just flicking a finger to press a key, or shuffling a bunch of people around the room? Is the window open? Do you have curtains flapping in the breeze?

If the scale of the degradation relates in some way to the scale of movement within the room, that is expected. If it is a step change, unprovoked by physical movement, you probably have a system problem. Software bugs are, of course, a possibility. So is some kind of sampling jitter. If you have quick adaption it is surprising how well things will work with unstable signal timing. Without the adaption they suddenly fall apart. When the adaption is at work, is it settling to a fairly steady state, or in a constant state of flux? If you are using a sufficiently whitened training signal it should settle to a pretty stable state, until someone moves.

For reference, with good quality converters, echo cancellation should do better than 40dB. You can get around 30dB even down a phone line with a-law/u-law distortion.

Regards,
Steve

johan kleuskens wrote:

> [original post and Matlab code quoted in full; snipped]
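[Editorial aside: the dB figures above are easy to check against your own recordings — ERLE is just the ratio of microphone power to residual power, measured over stretches with far-end speech only. A minimal sketch; the helper name `erle_db` is made up, and near-end speech would inflate the "residual" so it must be excluded from the measurement window.]

```python
import numpy as np

def erle_db(mic, residual, eps=1e-12):
    """Echo Return Loss Enhancement in dB: microphone (echo) power divided
    by residual power after cancellation.  Bigger is better; eps keeps
    silent frames from dividing by zero."""
    mic = np.asarray(mic, dtype=float)
    residual = np.asarray(residual, dtype=float)
    return 10.0 * np.log10((np.sum(mic**2) + eps) / (np.sum(residual**2) + eps))

# A canceller that leaves 1/1000 of the echo amplitude achieves 60 dB:
mic = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
residual = mic / 1000.0
print(round(erle_db(mic, residual)))  # 60
```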
Reply by ●April 11, 2005
Hi Steve,

The room is a closed office with no curtains or moving things in it, and "as soon as" means immediately. For a long time we thought it was caused by a difference in sample frequency between the speaker and microphone parts of the PC sound card we use. However, we did a test and concluded that the microphone and speaker sample frequencies are synchronous. We tested as follows: we played a 1000 Hz sound file through our speakers, and recorded it at the same time via our microphone. The phase of the sine in the recorded file was compared with the sine in the 1000 Hz speaker file. The phase should be constant if speaker and microphone are synchronous, and non-constant if they are not. The phase difference was constant, and therefore the speaker and microphone are synchronous.

Your idea of sample jitter is interesting. I will give that a thought, but I have no idea how to solve the problem if jitter is the cause of all this. The recording device is an ordinary sound card, and it is not possible to adjust the jitter behaviour of such a device.

With kind regards,

Johan Kleuskens
The Netherlands

"Steve Underwood" <steveu@dis.org> wrote in message news:d3ebdv$r20$1@nnews.pacific.net.hk...

> [Steve Underwood's reply quoted in full; snipped]
Reply by ●April 11, 2005
You might try switching to a high-end sound card/recording system to see if that makes a difference. Maybe one with digital I/O on the card and external ADCs/DACs. Also experiment with better speakers/microphones to see if that helps at all.

--

Jon Harris
SPAM blocked e-mail address in use. Replace the ANIMAL with 7 to reply.

"johan kleuskens" <johanenbernie@hotmail.com> wrote in message news:425ad215$0$141$e4fe514c@news.xs4all.nl...

> [previous replies quoted in full; snipped]
Reply by ●April 12, 2005
johan kleuskens wrote:

> [description of the 1000 Hz phase test; snipped]

What the microphone picks up will bear a constant phase relationship to what the speaker puts out even if the sample rates are very different, so long as the Nyquist criterion is met for both devices. I hope sample rate isn't the identity you thought to prove.

> [remarks on sample jitter; snipped]

It seems unlikely to me that sample jitter could be the cause of progressive deterioration.

Jerry

--

Engineering is the art of making what you want from things you can get.
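[Editorial aside: the caveat above suggests a sharper clock test than a single phase comparison — record a long stretch of wideband audio, then find the best-fit lag between reference and recording in an early window and again in a late window. If the lag drifts across the file, the two sample clocks differ. A self-contained sketch with synthetic white noise and a simulated 100 ppm rate offset; all names, the signal, and the offset are made up for illustration.]

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 8000
n = fs * 60                          # one minute of reference "speaker" audio
x = rng.standard_normal(n + 200)

# Simulate a capture clock off by 100 ppm: the recorded stream effectively
# reads the reference slightly too fast (modelled with linear interpolation).
ppm = 100e-6
rec = np.interp(np.arange(n) * (1 + ppm), np.arange(len(x)), x)

def best_lag(start, win=4000, max_lag=100):
    """Integer lag k at which rec[start+k : start+k+win] best matches
    x[start : start+win] (brute-force dot-product search)."""
    ref = x[start:start + win]
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: float(ref @ rec[start + k:start + k + win]))

early = best_lag(1000)               # near the beginning of the file
late = best_lag(n - 8000)            # near the end of the file
ppm_est = (early - late) / ((n - 8000) - 1000)
print(early, late)                   # a drifting lag betrays the rate offset
print(f"estimated clock offset: {ppm_est * 1e6:.0f} ppm")
```

Unlike a short phase check on a steady tone, the lag-drift view accumulates the timing error over the whole file, so even small rate offsets become obvious.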
Reply by ●April 12, 2005