
Basic problem about cost function of LMS noise whitening filter

Started by samwo123 December 10, 2007
X(z)---> H(z)----->Y(z)

I want to implement an adaptive noise whitening filter h(n).
The desired property of y(n) is that E{|y(n)|^2} = SigmaD (constant).
H(z) is the transfer function of an all-pole IIR filter of order N:
H(z) = 1/(1+sum(h_i*z^-i)).

The adaptation rule for h(n) is
h_i(n+1) = h_i(n) - mu*dJ/dh_i, where J is the cost function.

In the papers I have, J = E{|y(n)|^2}.

The goal of h(n) is to whiten the noise so that its output has constant
variance = SigmaD.

Why is that the same as minimizing the variance of the output, i.e., why is
J = E{|y(n)|^2}?
Shouldn't J be the CMA cost, J = E{|y(n)|^2 - SigmaD}?
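
In rough Python form, the adaptation loop I have in mind looks like the sketch
below. This is only an illustration: the input is placeholder white noise, the
order and step size are arbitrary, the expectation in J is replaced by the
instantaneous value |y(n)|^2, and the recursive part of the IIR gradient is
ignored (dy(n)/dh_i is approximated by -y(n-i)).

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)   # placeholder input; in practice, the noise to be whitened

N = 4          # filter order (arbitrary)
mu = 1e-3      # step size (arbitrary)
h = np.zeros(N)        # coefficients h_1 ... h_N
y_hist = np.zeros(N)   # [y(n-1), ..., y(n-N)]
y = np.zeros_like(x)

for n in range(len(x)):
    # all-pole filter: y(n) = x(n) - sum_i h_i * y(n-i)
    y[n] = x[n] - np.dot(h, y_hist)
    # instantaneous cost |y(n)|^2; with dy(n)/dh_i ~ -y(n-i),
    # h_i <- h_i - mu*dJ/dh_i becomes h_i <- h_i + mu*y(n)*y(n-i)
    # (the factor of 2 is folded into mu)
    h = h + mu * y[n] * y_hist
    # shift the output history
    y_hist = np.roll(y_hist, 1)
    y_hist[0] = y[n]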


Regards,
Sam

On Dec 10, 4:41 pm, "samwo123" <sup...@gmail.com> wrote:
> X(z)---> H(z)----->Y(z)
>
> I want to implement an adaptive noise whitening filter h(n).
> The desired property of y(n) is that E{|y(n)|^2} = SigmaD (constant).
> H(z) is the transfer function of an all-pole IIR filter of order N:
> H(z) = 1/(1+sum(h_i*z^-i)).
>
> The adaptation rule for h(n) is
> h_i(n+1) = h_i(n) - mu*dJ/dh_i, where J is the cost function.
>
> In the papers I have, J = E{|y(n)|^2}.
>
> The goal of h(n) is to whiten the noise so that its output has constant
> variance = SigmaD.
>
> Why is that the same as minimizing the variance of the output, i.e., why is
> J = E{|y(n)|^2}?
> Shouldn't J be the CMA cost, J = E{|y(n)|^2 - SigmaD}?
>
> Regards,
> Sam
This isn't a direct answer to your question, but have you thought about
implementing the whitener in the frequency domain, using an overlap-add or
overlap-save algorithm with FFT bin modifications?

John
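
Very roughly, something along these lines (an untested sketch, not a drop-in
implementation: the block length, overlap, smoothing constant, and the helper
name whiten_ola are all just illustrative choices, and a proper overlap-save
design would need more care):

import numpy as np

def whiten_ola(x, nfft=256, alpha=0.9, eps=1e-8):
    hop = nfft // 2
    win = np.hanning(nfft)          # Hann windows at 50% overlap sum to ~1
    mag = np.ones(nfft // 2 + 1)    # running per-bin magnitude estimate
    y = np.zeros(len(x))
    for start in range(0, len(x) - nfft, hop):
        X = np.fft.rfft(win * x[start:start + nfft])
        mag = alpha * mag + (1 - alpha) * np.abs(X)   # smooth the spectrum estimate
        Y = X / (mag + eps)                           # flatten (whiten) the bins
        y[start:start + nfft] += np.fft.irfft(Y, nfft)
    return y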
> Why is that the same as minimizing the variance of the output, i.e., why is
> J = E{|y(n)|^2}?
> Shouldn't J be the CMA cost, J = E{|y(n)|^2 - SigmaD}?
>
> Regards,
> Sam
Note that the two approaches that you suggested are equivalent. If SigmaD is a
positive constant, then minimizing E(|y[n]|^2) is equivalent to minimizing
E(|y[n]|^2 - SigmaD).

However, whitening noise isn't equivalent to forcing its variance to a constant
value. A noise process could have a known constant variance, but that says
nothing of its color. Colors of noise are usually classified by the shape of the
process's PSD, which in turn points to the level of correlation between
different samples from the process. White noise has a flat power spectrum,
meaning that any two samples taken from the process are uncorrelated. So your
adaptive whitener is really trying to generate an output sequence where
successive samples are not correlated with each other.

I'm not a guru on this topic at all, but the one filter structure that I know of
that has this property is the forward linear prediction-error filter. The basic
idea behind it is that you're trying to predict future values of a process based
upon previous observations; the output of the filter is the error between your
prediction and the actual new value. If the filter is long enough, you can
assume that your vector of observations (the tap-input vector) is "large" enough
to span the entire space of values that the process will take on (and therefore
give you a perfect prediction if your taps are right). Any residual error, which
is the output of the filter, is uncorrelated with the tap-input vector, and
therefore is uncorrelated with all past inputs of the filter. Thus, the sequence
observed at the filter output is a stream of samples that are (almost)
uncorrelated with each other, which looks like white noise.

Or something like that. I probably got some of the terms wrong, but I hope you
get the idea.

Jason
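
As a small illustration of the idea (not your adaptive setup, just a
fixed-coefficient example; the AR(2) coefficients, order, and record length are
arbitrary): colour some white noise, solve the Yule-Walker normal equations for
the predictor, and check that the prediction error is close to white again.

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
w = rng.standard_normal(20000)
x = lfilter([1.0], [1.0, -1.5, 0.7], w)   # coloured (AR(2)) noise

p = 2
# sample autocorrelation r[0..p]
r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(p + 1)])

# Yule-Walker normal equations: R a = -r[1..p]
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a = np.linalg.solve(R, -r[1:])

# prediction-error filter A(z) = 1 + a1*z^-1 + a2*z^-2 applied to x
e = lfilter(np.concatenate(([1.0], a)), [1.0], x)

print("lag-1 correlation of x:", np.corrcoef(x[:-1], x[1:])[0, 1])
print("lag-1 correlation of e:", np.corrcoef(e[:-1], e[1:])[0, 1])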
>On Dec 10, 4:41 pm, "samwo123" <sup...@gmail.com> wrote:
>> X(z)---> H(z)----->Y(z)
>>
>> I want to implement an adaptive noise whitening filter h(n).
>> The desired property of y(n) is that E{|y(n)|^2} = SigmaD (constant).
>> H(z) is the transfer function of an all-pole IIR filter of order N:
>> H(z) = 1/(1+sum(h_i*z^-i)).
>>
>> The adaptation rule for h(n) is
>> h_i(n+1) = h_i(n) - mu*dJ/dh_i, where J is the cost function.
>>
>> In the papers I have, J = E{|y(n)|^2}.
>>
>> The goal of h(n) is to whiten the noise so that its output has constant
>> variance = SigmaD.
>>
>> Why is that the same as minimizing the variance of the output, i.e., why is
>> J = E{|y(n)|^2}?
>> Shouldn't J be the CMA cost, J = E{|y(n)|^2 - SigmaD}?
>>
>> Regards,
>> Sam
>
>This isn't a direct answer to your question, but have you thought
>about implementing the whitener in the frequency domain, using an
>overlap-add or overlap-save algorithm with FFT bin modifications?
>
>John
I haven't thought about that. I want to implement it as an adaptive component,
and I'm not very familiar with adaptive filtering in the frequency domain.

Regards,
Sam
>> Why is that the same as minimizing the variance of the output, i.e., why is
>> J = E{|y(n)|^2}?
>> Shouldn't J be the CMA cost, J = E{|y(n)|^2 - SigmaD}?
>>
>> Regards,
>> Sam
>
>Note that the two approaches that you suggested are equivalent. If
>SigmaD is a positive constant, then minimizing E(|y[n]|^2) is
>equivalent to minimizing E(|y[n]|^2 - SigmaD).
>
>However, whitening noise isn't equivalent to forcing its variance to a
>constant value. A noise process could have a known constant variance,
>but that says nothing of its color. Colors of noise are usually
>classified by the shape of the process's PSD, which in turn points to
>the level of correlation between different samples from the process.
>White noise has a flat power spectrum, meaning that any two samples
>taken from the process are uncorrelated. So your adaptive whitener is
>really trying to generate an output sequence where successive samples
>are not correlated with each other.
>
>I'm not a guru on this topic at all, but the one filter structure that
>I know of that has this property is the forward linear prediction-error
>filter. The basic idea behind it is that you're trying to predict
>future values of a process based upon previous observations; the
>output of the filter is the error between your prediction and the
>actual new value. If the filter is long enough, you can assume that
>your vector of observations (the tap-input vector) is "large" enough
>to span the entire space of values that the process will take on (and
>therefore give you a perfect prediction if your taps are right). Any
>residual error, which is the output of the filter, is uncorrelated
>with the tap-input vector, and therefore is uncorrelated with all past
>inputs of the filter. Thus, the sequence observed at the filter output
>is a stream of samples that are (almost) uncorrelated with each other,
>which looks like white noise.
>
>Or something like that. I probably got some of the terms wrong, but I
>hope you get the idea.
>
>Jason
Thank you for your explanation. I now have a better understanding of the
problem. I had a small typo in my original post, though: I should have used
SigmaD^2 instead of SigmaD.
>the output of the filter is the error between your prediction and the
>actual new value. If the filter is long enough,
So the cost function in that case would be J = E{|y-d|^2}, where y is the output
of the linear predictor and d is the known desired value. In my setup I have no
knowledge of d. I guess the best I can do is blind adaptation using
E{||y[n]|^2 - SigmaD^2|} or E{|y[n]|^2} as my cost function.
>However, whitening noise isn't equivalent to forcing its variance to a
>constant value. A noise process could have a known constant variance,
>but that says nothing of its color.
I'm still a little confused here. White noise, say w(k), has a constant PSD
across its spectrum. That translates to E{w(k)*conj(w(k-d))} = SigmaD^2 for
d = 0, and = 0 elsewhere. Aren't whitening the noise and forcing the process to
have constant variance therefore equivalent?

Regards,
Sam
On Dec 12, 11:10 am, "samwo123" <sup...@gmail.com> wrote:
> So the cost function in that case would be J = E{|y-d|^2}, where y is the
> output of the linear predictor and d is the known desired value. In my setup
> I have no knowledge of d. I guess the best I can do is blind adaptation using
> E{||y[n]|^2 - SigmaD^2|} or E{|y[n]|^2} as my cost function.
For an Nth-order forward linear prediction-error filter, you use a vector
containing the samples x[n-1], x[n-2], ..., x[n-N] to predict the value of x[n].
So, in effect, you do have a desired signal to use for adaptation: the current
value of the input signal x[n]. In the common LMS nomenclature:

input signal: x[n]
tap-input vector: u = [ x[n-1] x[n-2] ... x[n-N] ]
desired signal: x[n]

I hope this makes it clearer that this is not a blind adaptation algorithm. I
would recommend Haykin's Adaptive Filter Theory text if you have access to it;
there is a good chapter on linear prediction.
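
In code, that mapping looks roughly like this (a bare-bones sketch assuming
real-valued signals; the order and step size are arbitrary and untuned):

import numpy as np

def lms_prediction_error(x, N=8, mu=0.01):
    w = np.zeros(N)            # predictor taps
    e = np.zeros(len(x))
    for n in range(N, len(x)):
        u = x[n - N:n][::-1]   # tap-input vector [x[n-1], ..., x[n-N]]
        y = np.dot(w, u)       # prediction of x[n]
        e[n] = x[n] - y        # prediction error = (approximately) whitened sample
        w = w + mu * e[n] * u  # standard LMS update, desired signal = x[n]
    return e

Feeding it your coloured noise and looking at the autocorrelation of e should
show the whitening effect once the taps have converged.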
> >However, whitening noise isn't equivalent to forcing its variance to a
> >constant value. A noise process could have a known constant variance,
> >but that says nothing of its color.
>
> I'm still a little confused here. White noise, say w(k), has a constant
> PSD across its spectrum. That translates to E{w(k)*conj(w(k-d))} = SigmaD^2
> for d = 0, and = 0 elsewhere. Aren't whitening the noise and forcing the
> process to have constant variance therefore equivalent?
>
> Regards,
> Sam
They are not equivalent. Forcing a process to have constant variance would imply
that you are forcing it to be wide-sense stationary. Its stationarity has
nothing to do with the correlation between samples of the process, which is what
makes it "white." You can force the process's variance to be a constant, but
that makes no guarantee that the result will be white.

In the most ridiculous example, make a filter with all zero taps. The output
variance is a constant (zero), but it sure isn't white.

Jason
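
A quick numerical sanity check of that point (the smoothing filter here is an
arbitrary choice): both sequences below have unit variance, but only one of them
is white.

import numpy as np

rng = np.random.default_rng(2)
white = rng.standard_normal(50000)

# heavily smoothed (coloured) noise, rescaled back to unit variance
coloured = np.convolve(white, np.ones(8) / 8, mode="same")
coloured /= coloured.std()

for name, s in [("white", white), ("coloured", coloured)]:
    lag1 = np.corrcoef(s[:-1], s[1:])[0, 1]
    print(name, "variance:", round(s.var(), 3), "lag-1 correlation:", round(lag1, 3))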