DSPRelated.com
Forums

Did you know - LMS

Started by HardySpicer July 4, 2012
On Thursday, July 5, 2012 8:03:04 PM UTC-5, HardySpicer wrote:
> On Jul 6, 11:53 am, HardySpicer <gyansor...@gmail.com> wrote:
> > On Jul 6, 2:44 am, maury <maury...@core.com> wrote:
> > > On Wednesday, July 4, 2012 1:15:04 AM UTC-5, HardySpicer wrote:
> > > > That the LMS derivation and equation is only optimal for Gaussian
> > > > driving signals. If the driving noise through an unknown system has
> > > > some other form of distribution (say Laplace, as is the case with
> > > > speech), then the best estimator of that FIR system is the simpler
> > > > sign() LMS. I found this quite a pleasant surprise.
> > > >
> > > > Hardy
> > >
> > > On what do you base this? It is my understanding that the only constraint on the LMS algorithm is that the input vector be independent of the weight vector (this in itself has huge implications). As far as optimality is concerned, the LMS only provides optimality in the sense that the filter will estimate the Wiener optimal solution if the input is stationary (in addition to the independence requirement).
> > >
> > > LMS requires neither Gaussian nor white inputs.
> >
> > It's not a major difference that I can see, but it is an important
> > observation. LMS can be derived from maximum likelihood assuming a Gaussian
> > distribution. However, if you assume a different distribution you get a
> > different algorithm tailored to that distribution. It should give better
> > results. From my simulations it's not that striking though. See this paper:
> >
> > OPTIMUM ERROR NONLINEARITIES FOR LMS ADAPTATION
> > S. C. Douglas and T. H.-Y. Meng
> >
> > Hardy
>
> and from that paper it says
>
> "for Laplacian plant noise, we may achieve a 3 dB reduction in misadjustment
> using a sign error nonlinearity for a given convergence rate as compared to
> standard LMS adaptation"
I just took a quick glance at the paper you mentioned. The authors are looking at the effect and performance of nonlinear functions of the error signal. These are variants of the LMS algorithm, but they are not the LMS algorithm. Understand that the LMS algorithm is defined as a gradient descent in which the error is the linear difference between the desired and the calculated output. I would like to see whether they have any data comparing their algorithm with simply using a reduced update gain to minimize coefficient misadjustment for stationary Laplace-distributed inputs. Remember that the coefficient misadjustment in the LMS algorithm is directly related to the value of the update gain, which causes an overshoot of the optimum Wiener solution of the normal equation.

Without studying their paper, or having looked at the original paper on which it is based (their reference [6]), one thing that bothered me is their statement:

"To formulate the optimization problem, we need only focus on the second order behavior of v(k) as reflected in n(k) = E[v(k)^T v(k)]"

This may be true only for Gaussian-distributed inputs. One may need to look at higher orders for non-Gaussian signals.

Around the 1990-1991 time frame I used a variant of the LMS that incorporated the median of the matrix of error vectors and input vectors to mitigate the effects of impulse noise on the error signal. That variant was there not to improve the LMS algorithm, but to mitigate a different problem. It was called the median LMS, but never called the LMS.

All of this doesn't mean that the LMS is optimal only for Gaussian-white inputs.

By the way, the sign LMS was originally introduced around the 1960s or 1970s to solve a different problem: the inability of slow processors to implement all the multiplications needed for the update calculations. As I remember, it was called a sign-sign algorithm.
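To make the distinction concrete, here is a minimal sketch of the two update rules being contrasted (my own notation, NumPy assumed, not code from the paper): the standard LMS multiplies the input vector by the raw error, while the sign-error variant passes the error through sign() first.

import numpy as np

def lms_step(w, x, d, mu):
    """Standard LMS: gradient descent on the instantaneous squared error,
    so the raw (linear) error multiplies the input vector."""
    e = d - np.dot(w, x)
    return w + mu * e * x

def sign_error_lms_step(w, x, d, mu):
    """Sign-error variant: same structure, but only the sign of the error
    is used in the update (one of the error nonlinearities in question)."""
    e = d - np.dot(w, x)
    return w + mu * np.sign(e) * x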
On Jul 9, 3:20 am, maury <maury...@core.com> wrote:
> ...
>
> All of this doesn't mean that the LMS is optimal only for Gaussian-white inputs.
>
> By the way, the sign LMS was originally introduced around the 1960s or 1970s to solve a different problem: the inability of slow processors to implement all the multiplications needed for the update calculations. As I remember, it was called a sign-sign algorithm.
Indeed, that is why I was pleasantly surprised that it appears to be better at the job than LMS for speech signals. The thing about LMS is that it is effectively deterministic in nature, i.e. knowledge of the statistical properties of the driving noise is not required. The same goes for RLS, of course. Nothing wrong with that if it works, but if you can get the same result from a simpler solution, then why not?

Hardy
On Monday, July 9, 2012 6:49:06 PM UTC-5, HardySpicer wrote:
> ...
>
> Indeed, that is why I was pleasantly surprised that it appears to be better
> at the job than LMS for speech signals. The thing about LMS is that it is
> effectively deterministic in nature, i.e. knowledge of the statistical
> properties of the driving noise is not required. The same goes for RLS, of
> course. Nothing wrong with that if it works, but if you can get the same
> result from a simpler solution, then why not?
>
> Hardy
Hardy,

Keep in mind that what I am addressing is your statement

"That the LMS derivation and equation is only optimal for Gaussian driving signals"

not the merits of the LMS algorithm with different signal types.
On Jul 11, 2:31 am, maury <maury...@core.com> wrote:
> ...
>
> Hardy,
> Keep in mind that what I am addressing is your statement
>
> "That the LMS derivation and equation is only optimal for Gaussian
> driving signals"
>
> not the merits of the LMS algorithm with different signal types.
Well, it depends how you derive it! I can derive it from maximum likelihood and get different answers for different noise distributions. It is merely a matter of definition here.

Hardy
On 7/10/12 10:51 PM, HardySpicer wrote:
> On Jul 11, 2:31 am, maury <maury...@core.com> wrote:
>>
>> ...
>>
>> Keep in mind that what I am addressing is your statement
>>
>> "That the LMS derivation and equation is only optimal for Gaussian
>> driving signals"
>>
>> not the merits of the LMS algorithm with different signal types.
>
> Well, it depends how you derive it! I can derive it from maximum
> likelihood and get different answers for different noise
> distributions.
i would like to see this. do you mind deriving LMS here (using whatever flavor of ASCII-math is to your liking), once with Gaussian and again with some other distribution? what difference does it make? a different adaptation gain, mu? or do you get something different from the standard LMS equations?
> It is merely a matter of definition here.
Hardy, i would like to see this. can you show us?

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
On Tuesday, July 10, 2012 9:51:16 PM UTC-5, HardySpicer wrote:
> ...
>
> Well, it depends how you derive it! I can derive it from maximum
> likelihood and get different answers for different noise
> distributions.
> It is merely a matter of definition here.
>
> Hardy
Hardy,

The LMS algorithm is a specific algorithm with a specific form. Others are variants, NOT the LMS algorithm.
On Jul 11, 3:46 pm, robert bristow-johnson <r...@audioimagination.com> wrote:
> ...
>
> i would like to see this. do you mind deriving LMS here (using whatever
> flavor of ASCII-math is to your liking), once with Gaussian and again
> with some other distribution?
>
> what difference does it make? a different adaptation gain, mu? or do
> you get something different from the standard LMS equations?
I am not very good at ASCII maths, but here goes in words.

The likelihood function is the product of all the PDFs (up to, say, n); in the Gaussian case each one has a Gaussian PDF:

L(y(k)) = product (i=1,n) Pi(k)

It is normal to minimise the -ve log of this, since it makes the maths easier. When you take the -ve log (base e) of an exponential (Gaussian distribution) and of a product, you are left with a sum of squares plus a constant. What you do is make the PDFs the PDFs of the estimation errors, and then you end up with ordinary least squares.

Now, when you use a Laplacian distribution instead, you end up with the sign algorithm.

The point of all this is that ordinary least squares is not PDF dependent, i.e. you don't care what the PDFs are, whereas maximum likelihood is. In the Gaussian case the two methods are the same.

Hardy
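In symbols, a minimal sketch of that argument (my notation, assuming i.i.d. prediction errors $e(k) = y(k) - w^T x(k)$ and a fixed scale parameter):

For Gaussian errors, $p(e) \propto \exp(-e^2 / 2\sigma^2)$, so
$$-\log L(w) = \sum_{k=1}^{n} \frac{e(k)^2}{2\sigma^2} + \text{const},$$
and minimising it is ordinary least squares; a stochastic-gradient step on a single term gives the LMS-type update $w \leftarrow w + \mu\, e(k)\, x(k)$.

For Laplacian errors, $p(e) \propto \exp(-|e|/b)$, so
$$-\log L(w) = \sum_{k=1}^{n} \frac{|e(k)|}{b} + \text{const},$$
which is a least-absolute-error criterion; the corresponding stochastic-gradient step is $w \leftarrow w + \mu\, \operatorname{sign}(e(k))\, x(k)$, i.e. the sign-error algorithm.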
On Jul 12, 4:08 am, maury <maury...@core.com> wrote:
> ...
>
> Hardy,
> The LMS algorithm is a specific algorithm with a specific form. Others are variants, NOT the LMS algorithm.
Historically maybe.
On 7/11/12 3:10 PM, HardySpicer wrote:
> ...
>
> I am not very good at ASCII maths, but here goes in words.
>
> The likelihood function is the product of all the PDFs (up to, say, n);
> in the Gaussian case each one has a Gaussian PDF:
>
> L(y(k)) = product (i=1,n) Pi(k)
>
i am not sure what it is you're saying, but if it is what i think it is, i think you're mistaken.

one way to get a Gaussian (or Normal) random variable is to add up a pile of RVs that have finite mean and variance (so the "Cauchy RV" is not eligible). now, when you add RVs, you *convolve* their p.d.f.s together. or, if you compute the "characteristic functions", which are the Fourier Transforms of the p.d.f.s, you multiply *them* functions.
> It is normal to minimise the -ve log of this, since it makes the maths easier.
what's "-ve"?
> When you take the -ve log (base e) of an exponential (Gaussian distribution)
> and of a product, you are left with a sum of squares plus a constant. What
> you do is make the PDFs the PDFs of the estimation errors, and then you end
> up with ordinary least squares.
>
> Now, when you use a Laplacian distribution instead, you end up with the sign
> algorithm.
>
> The point of all this is that ordinary least squares is not PDF dependent,
> i.e. you don't care what the PDFs are, whereas maximum likelihood is. In the
> Gaussian case the two methods are the same.
this still needs clearer explanation.

Hardy, do you know the simplistic "derivation" of LMS adaptive filter? the way Widrow presented it to an AES convention some many years ago? it's pretty simple and it doesn't make any assumptions of the statistics of the error function. it *does* make an assumption (that might not be justified in some cases) that an incremental adjustment of the FIR coefficients that reduces some mean error metric (sum of (e[n])^2) is similar to the incremental adjustment that reduces the square-error of the current sample: (e[n])^2.

that's the simple way to see where the LMS adaptive filter alg comes from.

--

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."
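Written out, that simplistic derivation is just the following (a sketch in my own notation, nothing beyond what robert describes). With $e[n] = d[n] - w^T x[n]$, the quantity one would like to reduce is the mean squared error $J = E\{e[n]^2\}$. Widrow's trick is to use the instantaneous value $\hat{J} = e[n]^2$ as a one-sample estimate of $J$. Its gradient with respect to $w$ is
$$\nabla_w \hat{J} = -2\, e[n]\, x[n],$$
so a small step against this gradient gives
$$w[n+1] = w[n] + \mu\, e[n]\, x[n],$$
the standard LMS update, with no distributional assumption on $e[n]$ anywhere; the only leap is that the instantaneous gradient points, on average, in the same direction as the true one.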
On Jul 12, 11:20 am, robert bristow-johnson <r...@audioimagination.com> wrote:
> ...
>
> what's "-ve"?
>
> ...
>
> Hardy, do you know the simplistic "derivation" of LMS adaptive filter? the
> way Widrow presented it to an AES convention some many years ago? it's
> pretty simple and it doesn't make any assumptions of the statistics of the
> error function.
Yes, of course in its original form LMS made no prior assumptions about the statistics of the driving noise. It is only later, with the advent of so-called "blind" equalization (deconvolution) problems, that assumptions had to be made.

-ve means negative. Instead of maximising the likelihood we minimise the negative log likelihood.

You can also make use of the Kullback-Leibler distance measure for some applications. It can be used to measure how far a distribution is from being Gaussian. You can make use of this fact if you know what the actual PDF is, and then minimise a criterion based on the K-L distance measure.

Hardy
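As a rough way to check the original claim numerically, here is a small system-identification sketch under Laplacian plant noise (a sketch only: the plant, noise scale, and step sizes are arbitrary choices of mine, not taken from the Douglas/Meng paper, and the step sizes would need tuning to equalise convergence rates before the quoted 3 dB figure means anything):

import numpy as np

rng = np.random.default_rng(0)

h = np.array([0.3, -0.5, 0.8, 0.2])      # unknown FIR plant to identify
N, M = 100_000, len(h)

x = rng.standard_normal(N)               # driving input
v = rng.laplace(scale=0.1, size=N)       # Laplacian plant noise

def run(error_fn, mu):
    """Identify h with an LMS-type loop; error_fn is applied to the error
    before the update (identity = LMS, np.sign = sign-error LMS).
    Returns the average squared coefficient deviation after settling."""
    w = np.zeros(M)
    dev = []
    for n in range(M, N):
        xn = x[n - M:n][::-1]            # most recent input samples
        d = np.dot(h, xn) + v[n]         # plant output plus noise
        e = d - np.dot(w, xn)
        w = w + mu * error_fn(e) * xn
        if n > N // 2:                   # measure only after settling
            dev.append(np.sum((w - h) ** 2))
    return np.mean(dev)

print("LMS        :", run(lambda e: e, mu=0.002))
print("sign-error :", run(np.sign,     mu=0.0005))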