DSPRelated.com
Forums

Problems after freezing FIR coefficients of Acoustic Echo Canceller

Started by johan kleuskens April 11, 2005
Hi,

We are currently working on an acoustic echo canceller based on the well-known 
NLMS principle. This echo canceller works fine as long as we feed it an 
echo signal that is generated by an audio-processing program. When using 
this ideal echo signal, freezing the FIR coefficients works as it should: 
the echo is still cancelled because the FIR taps contain a representation 
of the impulse response of the (virtual) room.

Things are different in real life: when working with a real room, the echo 
is cancelled as long as the taps are not frozen. Echo attenuation (ERLE?) 
is as much as 40 dB. However, as soon as the taps are frozen, the echo 
attenuation is reduced to 10-15 dB, even with no near-end speech!

This raises some questions:

    - Maybe our code is wrong. We've tested the algorithm in C++ and Matlab, 
and both behave the same. The Matlab code is included below as a reference, 
so if anyone sees a bug, please let me know. In this piece of code you can 
see that we stop adapting the weights halfway through the microphone and 
speaker files.

    - If this bad behaviour is due to the non-linear impulse response of the 
room, and therefore is inherent to AEC, why is everybody talking about 
freezing the taps when double talk is active?

With kind regards,

Johan Kleusens



% read the speaker file
[x,fs] = wavread('c:\testspeaker.wav');             % read speaker file

% read the microphone file
d = wavread('c:\testmic.wav');                      % read microphone file

L  = 1500;                                          % number of taps

wn = zeros(L,1);                                    % weight vector
xn = zeros(L,1);                                    % input delay line
n = length(x);                                      % number of samples in wave file
wavout = zeros(n,2);                                % storage for wave output

% process the sound data one sample at a time
for i = 1:n
    xn(2:L) = xn(1:L-1);                            % shift the delay line
    xn(1) = x(i);                                   % insert one new sample
    yn = wn' * xn;                                  % estimated echo signal
    en = d(i) - yn;                                 % error signal
    wavout(i,1) = en;                               % store error in output array
    p = xn' * xn;                                   % input power
    if (i/n) < 0.5
        wn = wn + 0.5/(p + 0.001) * xn .* en;       % NLMS weight update
        wavout(i,2) = 1;
    else
        wavout(i,2) = 0;
    end
end
wavwrite(wavout, fs, 'c:\fdaf.wav');                % write result to output file
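To quantify attenuation figures like the 40 dB quoted above, ERLE can be estimated as the ratio of microphone power to residual-error power. Here is a minimal Python sketch of that measurement (the function name and the toy signals are my own, purely for illustration):

```python
import numpy as np

def erle_db(mic, err, eps=1e-12):
    """Echo return loss enhancement: ratio of microphone power
    to residual-error power, in dB (illustrative helper)."""
    p_mic = np.mean(mic**2)
    p_err = np.mean(err**2)
    return 10.0 * np.log10((p_mic + eps) / (p_err + eps))

# toy check: if the residual is 1% of the microphone amplitude,
# the power ratio is 10^4, i.e. about 40 dB of ERLE
mic = np.sin(2 * np.pi * 0.01 * np.arange(4000))
err = 0.01 * mic
print(round(erle_db(mic, err)))   # prints 40
```

Computing this over short windows (rather than the whole file) shows how the attenuation evolves when the taps are frozen.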


In the simulation you are assuming that the (virtual) room characteristics
do not change; in real life, fluctuations/changes in your "room" might cause
the problem you are describing.


"johan kleuskens" <j.kleuskens@opentsp.com> wrote in message
news:425a50e5$0$147$e4fe514c@news.xs4all.nl...
johan kleuskens wrote:

-- code snipped --

As Mr. Mughal suggested, your room characteristics may be constantly 
changing. You can find this out by monitoring how the echo parameters 
change with time -- if they don't settle for the real room, then that's 
probably it.

You should also verify that the delay in your code doesn't change when you 
freeze the acquisition. Sudden changes of delay in the code may not be 
modeled with your "virtual" room, yet would certainly come into play with 
the real room.

Does the effect actually change _immediately_ on freezing the acquisition, 
or does it take a little bit of time? If it's absolutely immediate, that 
would point toward your delay changing (depending on your definition of 
"immediate", and how quickly you acquire echo information).

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
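The suggestion to monitor how the echo parameters change with time can be sketched as follows: run the NLMS loop and log the norm of the weight change per block of samples. For a stationary room the drift should die away; if it stays large, the room (or the timing) is changing. This is an illustrative Python translation with made-up signals and helper names, not the poster's actual code:

```python
import numpy as np

def nlms_drift_log(x, d, L=8, mu=0.5, eps=1e-3, block=1000):
    """Run NLMS and log how far the weight vector moves in each
    block of samples (illustrative sketch, not production code)."""
    w = np.zeros(L)
    xn = np.zeros(L)
    w_prev = w.copy()
    drift = []
    for i in range(len(x)):
        xn = np.roll(xn, 1)                      # shift the delay line
        xn[0] = x[i]                             # insert the new sample
        e = d[i] - w @ xn                        # error signal
        w = w + mu / (xn @ xn + eps) * xn * e    # NLMS update
        if (i + 1) % block == 0:
            drift.append(np.linalg.norm(w - w_prev))
            w_prev = w.copy()
    return w, drift

# synthetic stationary "room": the per-block drift dies away,
# which is what a real room should also show if nothing is changing
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
h = np.array([0.5, -0.3, 0.2])                   # toy echo path
d = np.convolve(x, h)[:len(x)]
w, drift = nlms_drift_log(x, d)
print(drift[0] > 10 * drift[-1])                 # prints True
```

If the real-room drift log never settles the way this synthetic one does, a changing room or unstable timing is the likely culprit.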
in article d3e1b4$7gu$1@newsg2.svr.pol.co.uk, Bobby Mughal at
bmughal@dspcreations.com wrote on 04/11/2005 10:26:

Heck, they change as you walk from the door to the chair behind your desk.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
The result changes "immediately" (within a few hundred samples). The room is 
a test room. Nobody is present in that room. I think there is no change of 
delay in the code, as you can see in the Matlab code I sent in the original 
posting. Could the non-linearity of the speaker and/or microphone cause 
this? They are common PC accessories.

"Tim Wescott" <tim@wescottnospamdesign.com> schreef in bericht 
news:115l3rgsp3lbv49@corp.supernews.com...
Hi Johan,

What does "as soon as" really mean? The very second, or just fairly 
quickly after? Is it a step change, or a progressive degradation? When 
you switch off the adaptation, what physical action do you perform? Are 
you just flicking a finger to press a key, or shuffling a bunch of 
people around the room? Is the window open? Do you have curtains 
flapping in the breeze?

If the scale of the degradation relates in some way to the scale of 
movement within the room, that is expected. If it is a step change, 
unprovoked by physical movement, you probably have a system problem. 
Software bugs are, of course, a possibility. So is some kind of sampling 
jitter. If you have quick adaptation, it is surprising how well things 
will work with unstable signal timing. Without the adaptation, they 
suddenly fall apart. When the adaptation is at work, is it settling to a 
fairly steady state, or in a constant state of flux? If you are using a 
sufficiently whitened training signal, it should settle to a pretty 
stable state, until someone moves.

For reference, with good-quality converters, echo cancellation should do 
better than 40 dB. You can get around 30 dB even down a phone line with 
a-law/u-law distortion.

Regards,
Steve
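Steve's point about timing sensitivity can be made concrete with a small simulation: with adaptation frozen, even a one-sample offset between the reference and the echo path destroys most of the cancellation, while the same taps cancel essentially perfectly when aligned. The room response, signals, and helper below are invented for illustration:

```python
import numpy as np

def atten_db(mic, err, eps=1e-20):
    """Echo attenuation in dB (toy helper; eps avoids log of zero)."""
    return 10 * np.log10(np.mean(mic**2) / (np.mean(err**2) + eps))

rng = np.random.default_rng(1)
x = rng.standard_normal(16000)                               # far-end signal
h = rng.standard_normal(64) * np.exp(-np.arange(64) / 10.0)  # toy room IR
echo = np.convolve(x, h)[:len(x)]                            # echo at the mic

# frozen taps that exactly match the room: essentially perfect cancellation
est = np.convolve(x, h)[:len(x)]
perfect = atten_db(echo, echo - est)

# the same frozen taps, but the reference arrives one sample late
x_late = np.concatenate(([0.0], x[:-1]))
est_late = np.convolve(x_late, h)[:len(x)]
misaligned = atten_db(echo, echo - est_late)

print(perfect > 60, misaligned < 15)                         # prints True True
```

While the filter is still adapting it tracks small timing shifts, which is why the problem only shows up once the taps are frozen.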


johan kleuskens wrote:
Hi Steve,

The room is a closed office with no curtains or moving things in it, and "as 
soon as" means immediately. For a long time we thought it was caused by a 
difference in sample frequency between the speaker and microphone parts of 
the PC sound card we use. However, we did a test and concluded that the 
microphone and speaker sample frequencies are synchronous. We tested as 
follows: we played a 1000 Hz sound file through our speakers and recorded it 
at the same time via our microphone. The phase of the sine in the recorded 
file was compared with the sine in the 1000 Hz speaker file. The phase 
should be constant when speaker and microphone are synchronous, and 
non-constant if they are not. The phase difference was constant, and 
therefore the speaker and microphone are synchronous.
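A synchronisation test of this kind can be sketched in Python: measure the phase of the known 1000 Hz tone block by block; with matched clocks the phase stays put, while a small clock skew makes it drift linearly. The simulated 0.1% skew and all names here are illustrative, not a claim about the poster's hardware:

```python
import numpy as np

fs = 8000
f0 = 1000.0
t = np.arange(4 * fs) / fs
ref = np.sin(2 * np.pi * f0 * t)

def block_phase(sig, fs, f0, block=800):
    """Phase of the f0 component in each block, read off the DFT bin
    (block length chosen so f0 falls exactly on a bin)."""
    bin_ = int(round(f0 * block / fs))
    ph = []
    for k in range(len(sig) // block):
        seg = sig[k * block:(k + 1) * block]
        ph.append(np.angle(np.fft.rfft(seg)[bin_]))
    return np.unwrap(np.array(ph))

# simulate a capture clock that runs 0.1% slow
t_slow = np.arange(4 * fs) / (fs * 1.001)
rec = np.sin(2 * np.pi * f0 * t_slow)

drift_synced = np.ptp(block_phase(ref, fs, f0))   # flat phase
drift_skewed = np.ptp(block_phase(rec, fs, f0))   # linear drift
print(drift_synced < 0.01, drift_skewed > 1.0)    # prints True True
```

Note that this test only catches a steady rate mismatch; it says little about jitter, i.e. sample-to-sample timing noise around a nominally correct rate.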

Your idea of sample jitter is interesting. I will give that some thought, but 
I have no idea how to solve this problem if jitter is the cause of all this. 
The recording device is an ordinary sound card, and it is not possible to 
adjust the jitter behaviour of such a device.

With kind regards,

Johan Kleuskens
The Netherlands

"Steve Underwood" <steveu@dis.org> schreef in bericht 
news:d3ebdv$r20$1@nnews.pacific.net.hk...
You might try switching to a high-end sound-card/recording system to see if that
makes a difference.  Maybe one with digital I/O on the card and external
ADC/DACs.  Also experiment with better speakers/microphones to see if that helps
at all.
-- 
Jon Harris
SPAM blocked e-mail address in use.  Replace the ANIMAL with 7 to reply.

"johan kleuskens" <johanenbernie@hotmail.com> wrote in message
news:425ad215$0$141$e4fe514c@news.xs4all.nl...
johan kleuskens wrote:
What the microphone picks up will bear a constant phase relationship to what the speaker puts out even if the sample rates are very different, so long as the Nyquist criterion is met for both devices. I hope sample rate isn't the identity you thought to prove.
It seems unlikely to me that sample jitter could be the cause of progressive deterioration.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
On Mon, 11 Apr 2005 18:58:25 +0200, "jkle" <johanenbernie@hotmail.com>
wrote:

> microphone
> common PC accessories.
Bingo?

Chris Hornbeck
6x9=42