DSPRelated.com
Forums

Did you know - LMS

Started by HardySpicer July 4, 2012
On 7/11/12 9:54 PM, HardySpicer wrote:
> On Jul 12, 11:20 am, robert bristow-johnson <r...@audioimagination.com> wrote:
>> On 7/11/12 3:10 PM, HardySpicer wrote:
>>> On Jul 11, 3:46 pm, robert bristow-johnson <r...@audioimagination.com> wrote:
>>>> On 7/10/12 10:51 PM, HardySpicer wrote:
>>>>> On Jul 11, 2:31 am, maury <maury...@core.com> wrote:
>>>> ...
>>>>>> Keep in mind that what I am addressing is your statement
>>>>>>
>>>>>> "That the LMS derivation and equation is only optimal for Gaussian
>>>>>> driving signals"
>>>>>>
>>>>>> not the merits of the LMS algorithm with different signal types.
>>>>>
>>>>> Well it depends how you derive it! I can derive it from maximum
>>>>> likelihood and get different answers for different noise
>>>>> distributions.
>>>>
>>>> i would like to see this.  do you mind deriving LMS here (using whatever
>>>> flavor of ASCII-math is to your liking), once with Gaussian and again
>>>> with some other distribution?
>>>>
>>>> what difference does it make?  a different adaptation gain, mu?  or do
>>>> you get something different from the standard LMS equations?
>>>>
>>>>> It is merely a matter of definition here.
>>>>
>>>> Hardy, i would like to see this.  can you show us?
>>>>
>>>> --
>>>> r b-j    r...@audioimagination.com
>>>> "Imagination is more important than knowledge."
>>>
>>> I am not very good at ASCII maths, but here goes in words:
>>>
>>> the likelihood function is the product of all the PDFs (up to say n).
>>> Each one in the Gaussian case has Gaussian PDF
>>>
>>>    L(y(k)) = product (i=1,n) Pi(k)
>>
>> i am not sure what it is you're saying, but if it is what i think it is,
>> i think you're mistaken.
>>
>> a way to get a Gaussian (or Normal) random variable, you add up a pile of
>> RVs that have finite mean and variance (so the "Cauchy RV" is not
>> eligible).  now, when you add RVs, you *convolve* their p.d.f.s
>> together.  or, if you compute the "characteristic functions", which are
>> the Fourier Transform of the p.d.f.s, you multiply *them* functions.
>>
>>> It is normal to minimise the -ve log of this since it makes the maths
>>> easier
>>
>> what's "-ve"?
>>
...
> Yes of course in its original form LMS made no prior assumptions about
> statistics of the driving noise.
> It is only later, with the advent of so-called "blind" equalization
> (deconvolution) problems, that assumptions had to be made.
>
> "-ve" means negative. Instead of maximising the likelihood we minimise
> the negative log likelihood.
> You can also make use of the Kullback-Leibler distance measure for
> some applications. This measures how far a distribution is from being
> Gaussian.
> You can make use of this fact: if you know what the actual PDF is, then
> minimise a criterion based on the K-L distance measure.
Clay or Eric or Dilip or somebody, can any of you make sense of this?

guess i better look up Kullback-Leibler distance measure.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
On Wednesday, July 11, 2012 2:11:13 PM UTC-5, HardySpicer wrote:
> On Jul 12, 4:08 am, maury <maury...@core.com> wrote:
>> On Tuesday, July 10, 2012 9:51:16 PM UTC-5, HardySpicer wrote:
>>> On Jul 11, 2:31 am, maury <maury...@core.com> wrote:
>>>> On Monday, July 9, 2012 6:49:06 PM UTC-5, HardySpicer wrote:
>>>>> On Jul 9, 3:20 am, maury <maury...@core.com> wrote:
>>>>>> On Thursday, July 5, 2012 8:03:04 PM UTC-5, HardySpicer wrote:
>>>>>>> On Jul 6, 11:53 am, HardySpicer <gyansor...@gmail.com> wrote:
>>>>>>>> On Jul 6, 2:44 am, maury <maury...@core.com> wrote:
>>>>>>>>> On Wednesday, July 4, 2012 1:15:04 AM UTC-5, HardySpicer wrote:
>>>>>>>>>> That the LMS derivation and equation is only optimal for Gaussian
>>>>>>>>>> driving signals. If the driving noise through an unknown system has
>>>>>>>>>> some other form of distribution (say Laplace, as is the case with
>>>>>>>>>> speech), then the best estimator of that FIR system is the simpler
>>>>>>>>>> sign() LMS. I found this quite a pleasant surprise.
>>>>>>>>>>
>>>>>>>>>> Hardy
>>>>>>>>>
>>>>>>>>> On what do you base this? It is my understanding that the only
>>>>>>>>> constraint on the LMS algorithm is that the input vector be
>>>>>>>>> independent of the weight vector (this in itself has huge
>>>>>>>>> implications). As far as optimality is concerned, the LMS only
>>>>>>>>> provides optimality in the sense that the filter will estimate the
>>>>>>>>> Wiener optimal solution if the input is stationary (in addition to
>>>>>>>>> the independence requirement).
>>>>>>>>>
>>>>>>>>> LMS requires neither gaussian nor white inputs.
>>>>>>>>
>>>>>>>> It's not a major difference that I can see, but it is an important
>>>>>>>> observation. LMS can be derived from maximum likelihood assuming a
>>>>>>>> Gaussian distribution. However, if you assume a different
>>>>>>>> distribution you get a different algorithm tailored to that
>>>>>>>> distribution. It should give better results. From my simulations
>>>>>>>> it's not that striking though.
>>>>>>>>
>>>>>>>> See this paper:
>>>>>>>>
>>>>>>>> OPTIMUM ERROR NONLINEARITIES FOR LMS ADAPTATION
>>>>>>>> S. C. Douglas and T. H.-Y. Meng
>>>>>>>>
>>>>>>>> Hardy
>>>>>>>
>>>>>>> and from that paper it says:
>>>>>>>
>>>>>>> "for Laplacian plant noise, we may achieve a 3 dB reduction in
>>>>>>> misadjustment using a sign error nonlinearity for a given convergence
>>>>>>> rate as compared to standard LMS adaptation"
>>>>>>
>>>>>> I just took a quick glance at the paper you mentioned. The authors are
>>>>>> looking at the effect and performance of nonlinear functions of the
>>>>>> error signal. These are variants of the LMS algorithm, but are not the
>>>>>> LMS algorithm. Understand that the LMS algorithm is defined as a
>>>>>> gradient descent with the error function being a linear difference of
>>>>>> the desired and calculated. I would like to see if they have any data
>>>>>> showing the comparison of their algorithm and the use of reduced
>>>>>> update gain to minimize coefficient misadjustment for stationary
>>>>>> Laplace-distributed inputs. Remember that the coefficient
>>>>>> misadjustment in the LMS algorithm is directly related to the value of
>>>>>> the update gain causing an overshoot of the optimum Wiener solution of
>>>>>> the normal equation.
>>>>>>
>>>>>> Without studying their paper nor having looked at the original paper
>>>>>> on which this is based (their reference [6]), one thing that bothered
>>>>>> me is their statement:
>>>>>>
>>>>>> "To formulate the optimization problem, we need only focus on the
>>>>>> second order behavior v(k) as reflected in n(k) = E[v(k)^T v(k)]"
>>>>>>
>>>>>> This may be true only for Gaussian-distributed inputs. One may need to
>>>>>> look at higher orders for non-gaussian signals.
>>>>>>
>>>>>> Around the 1990-1991 time-frame I used a variant of the LMS that
>>>>>> incorporated the median of the matrix of error vectors and input
>>>>>> vectors to mitigate the effects of impulse noise on the error signal.
>>>>>> That variant was there not to improve the LMS algorithm, but to
>>>>>> mitigate a different problem. It was called the median-LMS, but never
>>>>>> called the LMS.
>>>>>>
>>>>>> All of this doesn't mean that the LMS is optimal only for
>>>>>> Gaussian-white inputs.
>>>>>>
>>>>>> By-the-way, the sign-LMS was originally introduced around the 1960s or
>>>>>> 1970s to solve a different problem: the inability of slow processors
>>>>>> to fully implement the multiplications needed for the update
>>>>>> calculations. As I remember, it was called a sign-sign algorithm.
>>>>>
>>>>> Indeed, that is why I was pleasantly surprised that it appears to be
>>>>> better at the job than LMS for speech signals.
>>>>> The thing about LMS is that it is effectively deterministic in nature,
>>>>> i.e. a knowledge of the statistical properties of the driving noise is
>>>>> not required.
>>>>> Same for RLS of course. Nothing wrong with that if it works, but if you
>>>>> can get the same result from a simpler solution then why not.
>>>>>
>>>>> Hardy
>>>>
>>>> Hardy,
>>>> Keep in mind that what I am addressing is your statement
>>>>
>>>> "That the LMS derivation and equation is only optimal for Gaussian
>>>> driving signals"
>>>>
>>>> not the merits of the LMS algorithm with different signal types.
>>>
>>> Well it depends how you derive it! I can derive it from maximum
>>> likelihood and get different answers for different noise distributions.
>>> It is merely a matter of definition here.
>>>
>>> Hardy
>>
>> Hardy,
>> The LMS algorithm is a specific algorithm with a specific form. Others
>> are variants, NOT the LMS algorithm.
>
> Historically maybe.
Historically, my patootie. By definition
The LMS filter can't distinguish between a Gaussian input and a non-Gaussian
input with the same variance.

The maximum likelihood objective function is equivalent to the least-squares
objective function for Gaussian input.

The observation model is

y[n] = ww[n]^T xx[n] + e[n]

where e[n] is independent and identically distributed Gaussian and, for
simplicity, has zero mean and unit variance.  We consider N samples of y[n]
and stuff them into an array yy[n] = {y[n-N+1], ..., y[n]}. Then we have

p(yy[n];ww[n]) = \prod_{k=0}^{N-1} K exp( -0.5 (y[n-k] - ww[n]^T xx[n-k])^2 )

The principle (as I understand it) is then to fix the data, vary the
parameter vector (weights) and find the value of the parameter vector that
maximizes p(yy[n];ww[n]), or in other words, find the parameter vector that
maximizes the likelihood of the observed data. That is why, instead of
writing p(yy[n];ww[n]), it is sometimes written L(ww[n]) to emphasize that
the pdf is regarded as an objective function. Because the pdf is positive
and the natural logarithm is monotonic, and for convenience, the negative
log-likelihood is considered, that is

-log p(yy[n];ww[n]) = C + \sum_{k=0}^{N-1} 0.5 (y[n-k] - ww[n]^T xx[n-k])^2

Dropping the constant C, this is just a (scaled) squared-error objective, so
minimizing the negative log-likelihood is the same as minimizing the sum of
squared errors; differentiating it with respect to ww[n] gives the gradient
used in steepest-descent (LMS-type) updates. It's been a while since I
looked into this, so the description/argumentation may not be entirely clear.



Now that the topic of non-standard adaptive filtering is being discussed, I
have pasted code for a Bayesian adaptive FIR filtering algorithm I wrote
almost a decade ago. I never took it further than what the code below
illustrates; I had almost forgotten about it. It is based on sequential
Monte Carlo processing (aka particle filtering). One of the things I found
interesting is how easy it was to make an adaptive IIR filter that enforced
stability.



% assumes order (filter length), siglen (number of samples), the input
% signal u and the desired signal d are already defined in the workspace
N=30; %number of particles
w=cell(N,1);
v=zeros(N,1);
for idx=1:N %initialize
	w{idx}=zeros(order,siglen+1);
	w{idx}(:,1)=randn(order,1);
	v(idx)=rand(1);
end
sqrtDw=sqrt(1e-4); sqrtDv=sqrt(5e-5);
un=zeros(order,1);pweight=zeros(N,1);
e=zeros(N,1);pfv=zeros(1,siglen); %preallocate per-particle error and noise-variance estimate
pweightacc=(1/N)*ones(N,1);pfw=zeros(order,siglen);
for t=1:siglen
	un=[u(t); un(1:order-1)];
	for idx=1:N
		%sample filter taps
		w{idx}(:,t+1)=w{idx}(:,t)+sqrtDw*randn(order,1);
		%sample noise variance
		vprev=v(idx);
		v(idx)=vprev+sqrtDv*randn(1,1);
		if v(idx)<=0,
			v(idx)=vprev;
		end
		%compute error
		e(idx)=d(t)-un'*w{idx}(:,t+1);
		pweight(idx) = sqrt(2*pi*v(idx))^(-1)*exp(-0.5*e(idx)^2/v(idx));
	end
	pweight=pweight/sum(pweight);
	%resample particles in proportion to their weights (systematic resampling)
 	N_inv = 1/N;
 	g = N_inv*rand(1, 1); i = 1; j = 1; q = 0; clear idx;
 	while (g < 1),
 		if (q > g),
 			g      = g + N_inv;
 			idx(i) = pick;
 			i      = i + 1;
 		else,
 			pick = j;
 			q    = q + pweight(j);
 			j    = j + 1;
 		end;
 	end;
 	
   	% copying the particles
   	v     = v(idx);
 	w     = w(idx);
	m_w=zeros(order,1);
	for idx=1:N
		m_w  = m_w+w{idx}(:,t+1);
	end
	pfw(:,t)=m_w/N;
	pfv(t)=sum(v)/N;
end
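
[Editor's note: for anyone who wants to run the snippet, here is a
hypothetical driver, not part of the original post; the "true" system g, the
noise level and the signal length are made up. It defines the variables the
loop above assumes: order, siglen, u (input) and d (desired).]

order = 4; siglen = 2000;
u = randn(siglen,1);                          % input signal
g = [0.8; -0.3; 0.2; 0.05];                   % "unknown" FIR system, length = order
d = filter(g, 1, u) + 0.02*randn(siglen,1);   % desired = system output + observation noise
% ...then run the particle-filter loop above; pfw(:,end) should approach g
% and pfv(end) should approach the observation-noise variance (0.02^2).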
On 7/12/12 4:42 PM, maury wrote:
> On Wednesday, July 11, 2012 2:11:13 PM UTC-5, HardySpicer wrote:
>> On Jul 12, 4:08 am, maury <maury...@core.com> wrote:
>>
...
>>> The LMS algorithm is a specific algorithm with a specific form. Others
>>> are variants, NOT the LMS algorithm.
>>
>> Historically maybe.
>
> Historically, my patootie. By definition
i certainly don't wanna see maury's patootie, but Hardy, it's pretty
confusing what i am seeing here.  i mean, i thought i knew what the LMS
adaptive filter is, on its most fundamental level, and i thought i
understood how it's derived and what assumptions have to be made to get to
the equations that you actually implement in code.

"desired" signal: d[n], which we think has some component that is correlated
in some way to input: x[n].

there is a (possibly time-varying) FIR filter which we are going to use to
model what it is that correlates input x[n] to desired d[n], and that filter
has FIR output:

           N-1
    y[n] = SUM{ h[n,i] * x[n-i] }
           i=0

the error signal that tells us how well we're hitting our target d[n] is e[n]:

    y[n] + e[n] = d[n]      or      e[n] = d[n] - y[n]

so a least-square metric for the strength of the error signal e[n] is

    M-1
    SUM{ (e[n-i])^2 * w[i] }
    i=0

for some window function w[i].

first assumption is that rather than minimizing that metric shown, we might
get to the same place by working on minimizing (e[n])^2 (just the first term).

so given a desired signal d[n] and input x[n], how might we twist the FIR
coefficients to incrementally decrease (e[n])^2?  if you imagine (e[n])^2 as
a function of h[n,i] (where d[n] and x[n] are fixed and known), then you
compute the gradient of (e[n])^2 with respect to h[n,i]:

    (e[n])^2 = ( d[n] - y[n] )^2

    d(e[n]^2)/d(h[n,i]) = 2* e[n] * d(e[n])/d(h[n,i])
                        = 2*( d[n] - y[n] )*( -d(y[n])/d(h[n,i]) )
                        = 2*( d[n] - y[n] )*( -x[n-i] )

so with N coefficients, this is a vector that points "uphill" on (e[n])^2
vs. h[n,i].  we want to incrementally adjust (or "adapt") the FIR
coefficients in the direction opposite of that gradient vector:

    h[n+1,i] = h[n,i] - mu/2 * d(e[n]^2)/d(h[n,i])
             = h[n,i] + mu * ( d[n] - y[n] )*( x[n-i] )
             = h[n,i] + mu*e[n]*x[n-i]

this moves down the slope of e[n]^2 vs. h[n,i] (w.r.t. various i).  the
speed that it moves down that slope is proportional to mu, which is called
the "adaptation gain".  for the normalized LMS, you want to scale mu to be
proportional to 1/(e[n]*x[n-i]) which is sorta proportional to 1/(x[n]^2).
so normalized LMS looks like

    h[n+1,i] = h[n,i] + (mu/(x[n])^2)*e[n]*x[n-i]

now, Hardy, can you relate this to anything that you're talking about?

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
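[Editor's note: for concreteness, here is a minimal MATLAB sketch of the
update h[n+1,i] = h[n,i] + mu*e[n]*x[n-i] derived above; it is my own
illustration, not part of the post, and the signals x and d and the values
of N and mu are placeholders.]

N  = 16;                          % number of FIR coefficients
mu = 0.01;                        % adaptation gain
h  = zeros(N,1);                  % h[n,i], adapted in place
xbuf = zeros(N,1);                % [x(n); x(n-1); ...; x(n-N+1)]
for n = 1:length(x)
    xbuf = [x(n); xbuf(1:N-1)];   % shift in the newest input sample
    y = h.'*xbuf;                 % y[n] = SUM h[n,i]*x[n-i]
    e = d(n) - y;                 % e[n] = d[n] - y[n]
    h = h + mu*e*xbuf;            % h[n+1,i] = h[n,i] + mu*e[n]*x[n-i]
end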
>now, Hardy, can you relate this to anything that you're talking about?
Hi rbj, Let's do it in steps.

This is the 'FIR model':

           N-1
    y[n] = SUM{ h[n,i] * x[n-i] } + e[n]
           i=0

Do you accept this formulation as a formulation of adaptive FIR filtering:
"Given y[n] and x[n] and given some assumptions about e[n], then in some
sense estimate/compute h[n]."

I could also have written (but for simplicity only consider the above
formulation for now):
"Given y[n] and x[n] and given some assumptions about e[n] and h[n], then in
some sense estimate/compute h[n]."
niarn, it helps when doing "ASCII math", to set your font to a 
mono-spaced font, so you can line things up (like the limits to the 
summation) better for everyone to see.  otherwise i have to guess at 
which font you're seeing things with.

On 7/13/12 10:14 AM, niarn wrote:
>> now, Hardy, can you relate this to anything that you're talking about?
>
> Hi rbj, Let's do it in steps.
>
> This is the 'FIR model'
>
>            N-1
>     y[n] = SUM{ h[n,i] * x[n-i] } + e[n]
>            i=0
that is *not* the FIR model. there is no e[n] in the FIR model. but there *is* a "desired signal", d[n], that is based on the same x[n] going into the FIR and that we're trying to match the FIR to. very similar to "system identification".
> Do you accept this formulation as a formulation of adaptive FIR filtering:
> "Given y[n] and x[n] and given some assumptions about e[n], then in some
> sense estimate/compute h[n]."
a time-invariant FIR is, of course

           N-1
    y[n] = SUM{ h[i] * x[n-i] }
           i=0

this is LTI and h[i] can be thought of as the unchanging FIR coefficients or
h[n] can be thought of as the impulse response of the FIR.  (we'll define
"n" as discrete "time".)

a time-variant FIR would be

           N-1
    y[n] = SUM{ h[n,i] * x[n-i] }
           i=0

the impulse response is not well-defined in this case (because it's
different for impulses applied at different times), but h[n,i] can be
thought of as the FIR coefficients at time n, or h[m,n] is the impulse
response of an impulse applied at time m.

we are modeling the desired signal, d[n], as being some *other* filter
(could be FIR but doesn't have to be) acting on the same input, x[n].  so,
for shits and grins, let's say it *is* an FIR (same length, N):

           N-1
    d[n] = SUM{ g[i] * x[n-i] }
           i=0

but the coefficients, g[i], are hidden and considered fixed (or *very*
slowly varying) and we have to figger out what they are by looking at the
input, x[n], and output, d[n], and we have access to both signals.  this
would be easy (non-blind deconvolution: use the DFT and iDFT to get g[i])
but there is also interference, which i'll call v[n].  now this interference
*might* be exactly the signal you're looking for (it *is* in speakerphone
echo/feedback cancellation) and the above d[n] (the feedback signal from
speaker into microphone) is what you are trying to get rid of.  so we add
this interference signal:

           N-1
    d[n] = SUM{ g[i] * x[n-i] }  +  v[n]
           i=0

now, if

           N-1
    y[n] = SUM{ h[n,i] * x[n-i] }
           i=0

and if

    e[n] = d[n] - y[n]

and we can compute this e[n] in real time because we compute y[n] and d[n]
is given, then *only* if h[n,i] has converged to g[i] will this apparent
error e[n] be the same as v[n].

now, going back to the speakerphone application, when there is no
"double-talk", then v[n] should be close to zero, and the only reason that
e[n] is not zero is because h[n,i] is not (yet) the same as the target
coefficients g[i].  the idea of the simple LMS adaptive filter is to slowly
get h[n,i] to converge to g[i] (for 0 <= i < N), and then when v[n] is not
zero, *and* hopefully h[n,i] has already converged to near g[i], then
e[n] = v[n].  a good LMS application will have some other system that will
detect or estimate doubletalk and will reduce the adaptation gain (so h[n,i]
remains constant with n) during doubletalk so that it doesn't screw up the
adaptation.
> I could also have written (but for simplicity only consider the above
> formulation for now):
> "Given y[n] and x[n] and given some assumptions about e[n] and h[n], then
> in some sense estimate/compute h[n]."
you also need a separate desired signal d[n].  you're not given y[n], but
you can figure it out from x[n] and the *current* FIR coefficients h[n,i].
so you have inputs x[n] and d[n], and you're trying to get y[n] to match
d[n] and "then in some sense [you] estimate/compute h[i]" to be whatever
those coefs have to be to get y[n] to best match d[n].

a typical simple LMS filter has *two* inputs, x[n] and d[n], and two
outputs, y[n] and e[n], and often it's v[n] that we're really looking for,
which, when the adaptation settles down, is the same as e[n].  if the
application is system identification, then what we're looking for are the
coefficients, h_d[i], which should be converged upon by h[n,i].

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
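[Editor's note: a toy version of this two-input, system-identification
picture, sketched by the editor; the hidden coefficients g, the interference
level and the lengths are invented. It generates d[n] from a hidden FIR g[i]
plus interference v[n]; feeding x and d to the LMS loop sketched earlier
should drive h toward g, after which e[n] is essentially v[n].]

L = 5000;
x = randn(L,1);                       % input available to the adaptive filter
g = [1.0; 0.5; -0.25; 0.1];           % hidden target coefficients g[i]
v = 0.05*randn(L,1);                  % interference (the "near-end" signal)
d = filter(g, 1, x) + v;              % desired signal d[n]
% run the LMS loop from the earlier sketch on (x, d) with N >= length(g);
% h should approach g, and the computed e[n] should approach v[n].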
> niarn, it helps when doing "ASCII math", to set your font to a
> mono-spaced font, so you can line things up (like the limits to the
> summation) better for everyone to see.  otherwise i have to guess at
> which font you're seeing things with.
I'm not sure I can control the font, I'm using a simple text edit control on the dsprelated website. But I will do what I can.
>> This is the 'FIR model'
>>
>>            N-1
>>     y[n] = SUM{ h[n,i] * x[n-i] } + e[n]
>>            i=0
>
> that is *not* the FIR model.  there is no e[n] in the FIR model.  but
> there *is* a "desired signal", d[n], that is based on the same x[n]
> going into the FIR and that we're trying to match the FIR to.  very
> similar to "system identification".
There must be an e[n] in there. That is the error that accounts for the
model mismatch. Then let's call it an observation model or signal model. I
think we mean exactly the same thing; it is just a matter of aligning what we
call things. Using your equations, you write
> e[n] = d[n] - y[n]
Let us rearrange to

    d[n] = y[n] + e[n]

Then substitute your expression for y[n]:

           N-1
    d[n] = SUM{ h[n,i] * x[n-i] } + e[n]
           i=0

This is exactly what I before referred to as the 'FIR model', but now with
different symbols. Using your notation, can you then accept this formulation
as a formulation of adaptive FIR filtering:
"Given d[n] and x[n] and given some assumptions about e[n], then in some
sense estimate/compute h[n,m]."
On 7/13/12 3:59 PM, niarn wrote:
>> niarn, it helps when doing "ASCII math", to set your font to a
>> mono-spaced font, so you can line things up (like the limits to the
>> summation) better for everyone to see.  otherwise i have to guess at
>> which font you're seeing things with.
>
> I'm not sure I can control the font, I'm using a simple text edit control
> on the dsprelated website. But I will do what I can.
>
>>> This is the 'FIR model'
>>>
>>>            N-1
>>>     y[n] = SUM{ h[n,i] * x[n-i] } + e[n]
>>>            i=0
>>
>> that is *not* the FIR model.  there is no e[n] in the FIR model.  but
>> there *is* a "desired signal", d[n], that is based on the same x[n]
>> going into the FIR and that we're trying to match the FIR to.  very
>> similar to "system identification".
>
> There must be an e[n] in there. That is the error that accounts for the
> model mismatch. Then let's call it an observation model or signal model. I
> think we mean exactly the same thing; it is just a matter of aligning what
> we call things. Using your equations, you write
>
>> e[n] = d[n] - y[n]
>
> Let us rearrange to
>
>     d[n] = y[n] + e[n]
>
> Then substitute your expression for y[n]
>
>            N-1
>     d[n] = SUM{ h[n,i] * x[n-i] } + e[n]
>            i=0
>
> This is exactly what I before referred to as the 'FIR model' but now with
> different symbols.
right, i had y[n] for the actual FIR output.  i just wanted to show that the
so-called "desired" signal, d[n], has two very similar expressions, both:

           +inf
    d[n] = SUM{ g[i] * x[n-i] }  +  v[n]
           i=0

(notice i changed something...)  and

    d[n] = y[n] + e[n]

           N-1
         = SUM{ h[n,i] * x[n-i] } + e[n]
           i=0

forgive me for referring to the speaker feedback canceling app, but it's
handy for illustration.  the first d[n] is what arrives at the microphone of
the speakerphone; it is the remote signal x[n] (the other party) getting
processed by the loudspeaker and the acoustic transfer function (the room
and reflections) from loudspeaker to microphone.  v[n] is the present
speaker speaking.  the g[i] coefs are due to the physical situation, you
have no control of them, but you might like knowing what they are so you can
subtract the feedback signal and have only v[n] left.  *but* from the POV of
estimating that remote signal and transfer function, v[n] is considered
"interference".

now, let's say that you need not worry about g[i] for i >= N.  if
h[n,i] = g[i], then this error signal e[n] that you compute is identical to
v[n].  we're done, we need to do no more adapting, h[n+1,i] = h[n,i], and
the system is static.

but if the coefficients that you *can* adjust, h[n,i], do not match the
target coefs, g[i], then the error signal e[n] has two components:

           N-1
    e[n] = SUM{ (g[i] - h[n,i]) * x[n-i] }  +  v[n]
           i=0

it has a component that the adaptation algorithm should force to zero and it
has a component that is completely independent.  "doubletalk" is the problem
with this e[n] when *both* x[n] and v[n] are not zero (over a period of
time).  when both signals are hot, then trying to reduce e[n] to zero might
not be what you want.  you really just want that first term going to zero in
some sense.
> Using your notation can you then accept this formulation
> as a formulation of adaptive FIR filtering:
> "Given d[n] and x[n] and given some assumptions about e[n], then in some
> sense estimate/compute h[n,m]."
apart from the doubletalk issue, yes, the problem statement is to do
something about h[n,i] so that e[n] is reduced to a minimum in a mean-square
sense.  that is what the "LMS" is about.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
i made a mistake about the Normalized LMS ...

On 7/13/12 2:22 AM, robert bristow-johnson wrote:
...
>     h[n+1,i] = h[n,i] - mu/2 * d(e[n]^2)/d(h[n,i])
>              = h[n,i] + mu * ( d[n] - y[n] )*( x[n-i] )
>              = h[n,i] + mu*e[n]*x[n-i]
>
> this moves down the slope of e[n]^2 vs. h[n,i] (w.r.t. various i).
> the speed that it moves down that slope is proportional to mu, which is
> called the "adaptation gain".  for the normalized LMS, you want to scale
> mu to be proportional to 1/(e[n]*x[n-i]) which is sorta proportional to
> 1/(x[n]^2).  so normalized LMS looks like
>
>     h[n+1,i] = h[n,i] + (mu/(x[n])^2)*e[n]*x[n-i]
>
this should say:

for the normalized LMS, you want to scale mu to be proportional to
1/mean{ (x[n])^2 } where

                       N-1
    mean{ (x[n])^2 } = SUM{ (x[n-i])^2 * w[i] }
                       i=0

for some window function w[i].  so

    h[n+1,i] = h[n,i] + mu/mean{(x[n])^2} * e[n]*x[n-i]

there is a sorta adaptation trick to track 1/mean{ (x[n])^2 } without doing
division, if this were in a DSP where division is hard or expensive.

--
r b-j    rbj@audioimagination.com
"Imagination is more important than knowledge."
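[Editor's note: a small sketch of that normalized update, written by the
editor rather than taken from the post; the rectangular window over the tap
buffer and the small epsv guard against division by zero are the editor's
choices, and x, d and the constants are placeholders.]

N = 16; mu = 0.5; epsv = 1e-6;
h = zeros(N,1); xbuf = zeros(N,1);
for n = 1:length(x)
    xbuf = [x(n); xbuf(1:N-1)];
    pw = xbuf.'*xbuf;                  % rectangular-window version of mean{ (x[n])^2 }
    y  = h.'*xbuf;
    e  = d(n) - y;
    h  = h + (mu/(pw + epsv))*e*xbuf;  % h[n+1,i] = h[n,i] + mu/mean{x^2} * e[n]*x[n-i]
end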
>> Using your notation can you then accept this formulation
>> as a formulation of adaptive FIR filtering:
>> "Given d[n] and x[n] and given some assumptions about e[n], then in some
>> sense estimate/compute h[n,m]."
> apart from the doubletalk issue, yes, the problem statement is to do
> something about h[n,i] so that e[n] is reduced to a minimum in a
> mean-square sense.  that is what the "LMS" is about.
I assume then that you accept the formulation of the adaptive FIR filtering.
The next step is then to write the expression for the PDF of d[n], written
as p(d[n];h[n,m]). Note that this is not a conditional PDF; the semicolon is
there to note/express that the PDF of d[n] is parameterized by h[n,m]. In
our model

           N-1
    d[n] = SUM{ h[n,i] * x[n-i] } + e[n]  =  y[n] + e[n]
           i=0

we now take e[n] to be normally distributed with zero mean and variance
sigma. If I remember correctly, this is sometimes written in the following
way:

    e[n] ~ N(0, sigma)

Then we have

    d[n] ~ N(y[n], sigma)

or, if we want to write the detailed expression,

    p(d[n];h[n,m]) = sqrt(2 pi sigma)^{-1} exp( -0.5 (d[n] - y[n])^2 / sigma )

Does this make any sense?
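[Editor's note: for reference, a small MATLAB sketch of evaluating this
per-sample likelihood; it is the editor's illustration with placeholder
names (h_true, h, xn, dn, sigma), not part of the post. Maximizing the
log of p over the taps is equivalent to minimizing the squared error
(d[n] - y[n])^2, which is what the LMS gradient step descends.]

order = 4; sigma = 0.01;
h_true = randn(order,1);                  % taps that actually generated d[n]
h      = randn(order,1);                  % candidate taps h[n,m] being evaluated
xn = randn(order,1);                      % [x(n); x(n-1); ...; x(n-order+1)]
dn = h_true.'*xn + sqrt(sigma)*randn;     % one observation d[n]
yn = h.'*xn;                              % y[n] for the candidate taps
p  = (2*pi*sigma)^(-0.5)*exp(-0.5*(dn - yn)^2/sigma);  % p(d[n]; h[n,m])
% maximizing log(p) over h is the same as minimizing (d[n] - y[n])^2.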