
extended Kalman filter going berserk

Started by hagai_sela July 16, 2009
Hi,
I am using an extended Kalman filter to train a neural network. My
problem is that after a few thousand training samples it starts
producing very high values.
I read about square-root Kalman filtering, but I am not sure it will
help because I am using 64-bit values, which should be accurate enough
(I think). I have only found square-root algorithms for single-output
problems, and my neural network predicts multiple simultaneous outputs.
Any pointers on what I should look for? Does anybody know a square-root
algorithm that suits multiple outputs?

Thanks,
Hagai. 


On 16 Jul, 14:41, "hagai_sela" <hagai.s...@gmail.com> wrote:
> Hi,
> I am using an extended Kalman filter to train a neural network. My
> problem is that after a few thousand training samples it starts
> producing very high values.
> I read about square-root Kalman filtering, but I am not sure it will
> help because I am using 64-bit values, which should be accurate
> enough (I think).
64-bit representations of the wrong answer are no good. Try the
square-root algorithms.
> I have only found square-root algorithms for single-output problems,
> and my neural network predicts multiple simultaneous outputs.
> Any pointers on what I should look for? Does anybody know a
> square-root algorithm that suits multiple outputs?
Square-root algorithms are standard for EKFs. Look harder. Check the
book by Dan Simon.

Rune
On Thu, 16 Jul 2009 07:41:53 -0500, hagai_sela wrote:

> I am using an extended Kalman filter to train a neural network. My
> problem is that after a few thousand training samples it starts
> producing very high values. [...] Does anybody know a square-root
> algorithm that suits multiple outputs?
Unless you have a matrix that's very close to singular, I don't think
that going to a square-root algorithm is going to help (sorry, Rune).
I'd go over my model and my implementation again, to make sure that
I'm tracking things correctly.

--
www.wescottdesign.com
Anything in particular? I standardized the input variables to zero mean
and unit variance, and the output values are always between -1 and 1,
probably quite uniformly. The only other variables are:
- Initial value of the covariance matrix: I set its diagonal to 100 and
the other cells to 0.
- Measurement noise covariance matrix (R): I start with a diagonal of
100, like the previous one, and use exponential decay down to values as
low as 3 (I only change the diagonal, using the same value for every
item of the diagonal).
- Artificial process noise covariance matrix (Q): same as above, but
starting at 0.01 and decaying exponentially to a limiting value of
10^-6.
I based these on some papers I read.

Hagai.
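[For concreteness, a minimal numpy sketch of the tuning scheme
described in the post above. The state/output dimensions, the step
index, and the decay time constant `tau` are invented for
illustration; the post does not give them.]

    import numpy as np

    n_weights = 50   # number of network weights / EKF states (hypothetical)
    n_outputs = 3    # number of simultaneous network outputs (hypothetical)

    P = 100.0 * np.eye(n_weights)   # initial state covariance, diagonal of 100

    def decayed(start, floor, k, tau=1000.0):
        """Exponential decay from `start` toward the limiting value
        `floor` at training step k; `tau` is an assumed time constant."""
        return floor + (start - floor) * np.exp(-k / tau)

    k = 2500  # current training sample (illustrative)
    R = decayed(100.0, 3.0, k) * np.eye(n_outputs)   # measurement noise
    Q = decayed(0.01, 1e-6, k) * np.eye(n_weights)   # artificial process noise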
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:

> Anything in particular? I standardized the input variables to zero
> mean and unit variance, and the output values are always between -1
> and 1, probably quite uniformly. [...] I based these on some papers
> I read.
I'm not sure what you mean by "standardized the input variables with
zero mean" -- do you mean you're only linearizing around the operating
point where your state vector = 0? If so you're not building an EKF.

You should get yourself a copy of Dan Simon's book "Optimal State
Estimation". He goes into the EKF (and notes that it can go unstable,
at times).

One thing that _can_ help an unstable EKF is to increase the process
noise. Your filter will settle more slowly if you do, but slow and
stable is better than unstable! If you have a way of estimating the
error between your point linearization and the real model, you can use
that estimate for your process noise to good effect.

--
www.wescottdesign.com
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:

> Anything in particular? I standardized the input variables to zero
> mean and unit variance, and the output values are always between -1
> and 1, probably quite uniformly. [...]
You may also want to check the eigenvalues of your covariance matrix.
If they ever, ever go negative then you've got a problem with your
math.

--
www.wescottdesign.com
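[A minimal numpy sketch of that eigenvalue check, together with the
Joseph-form covariance update that is commonly used to keep P symmetric
and positive semidefinite. Function names and the tolerance are
illustrative, not from this thread.]

    import numpy as np

    def covariance_health_check(P, tol=1e-9):
        """Raise if the covariance matrix has lost symmetry or gained a
        negative eigenvalue."""
        asymmetry = np.max(np.abs(P - P.T))
        eigs = np.linalg.eigvalsh(0.5 * (P + P.T))  # eigvalsh expects symmetry
        if asymmetry > tol or eigs.min() < 0.0:
            raise RuntimeError("covariance unhealthy: asymmetry=%.2e, "
                               "min eigenvalue=%.2e" % (asymmetry, eigs.min()))
        return eigs

    def joseph_update(P, K, H, R):
        """Joseph-form measurement update of the covariance.
        Algebraically equal to (I - K H) P, but much better at
        preserving symmetry and positive semidefiniteness in
        floating point."""
        A = np.eye(P.shape[0]) - K @ H
        return A @ P @ A.T + K @ R @ K.T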
> I'm not sure what you mean by "standardized the input variables with
> zero mean" -- do you mean you're only linearizing around the
> operating point where your state vector = 0? If so you're not
> building an EKF.

Probably not, although I am not sure what you mean... :) I am not
really a DSP expert. I meant that I am standardizing the input: for
example, if it is uniformly spread between 1 and 100, I scale it to be
normally spread between -1 and 1.
>> I'm not sure what you mean by "standardized the input variables
>> with zero mean" -- do you mean you're only linearizing around the
>> operating point where your state vector = 0? If so you're not
>> building an EKF.
>
> Probably not, although I am not sure what you mean... :) I am not
> really a DSP expert. I meant that I am standardizing the input: for
> example, if it is uniformly spread between 1 and 100, I scale it to
> be normally spread between -1 and 1.

I meant I scale it to be normally spread with 0 mean and 1 variance
(standard score).
>> I'm not sure what you mean by "standardized the input variables
>> with zero mean" [...]
>
> I meant that I am standardizing the input: for example, if it is
> uniformly spread between 1 and 100, I scale it to be normally spread
> between -1 and 1.
When you say input, you are presumably referring to measurements? I
see you said you took a uniform pdf to normal; I assume you're just
taking the mean and variance and treating it as normal, right? My
understanding is that the filter will sometimes put up with
non-Gaussian inputs, in spite of the derivation.

Anyway, I've not read that it's important to rescale the measurements
(I don't see it hurting), but I have read that it helps to scale the
states.

As far as the square root goes, I frequently see scalar measurement
formulations in books, but if your sensor noise matrix is diagonal, or
can be diagonalized (by linear combinations of the true measurements
at a given instant), you may be able to treat it as several successive
scalar measurements. There may be a more elegant approach, but a
diagonal Q also gives computational savings (p. 221 of Grewal &
Andrews, ISBN 0-471-39254-5), if you can pull it off...
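[A sketch of that successive-scalar-measurement trick, assuming the
measurement noise covariance R is diagonal; variable names are
illustrative, not code from this thread. Potter's square-root
algorithm has the same one-scalar-at-a-time structure, which is why
the single-output formulations in books still apply.]

    import numpy as np

    def sequential_scalar_update(x, P, z, H, r_diag):
        """Fold an m-dimensional measurement z into the filter as m
        successive scalar updates. Valid when R is diagonal with
        entries r_diag; H is the m x n measurement Jacobian."""
        for i in range(len(z)):
            h = H[i:i+1, :]                      # 1 x n row of the Jacobian
            s = float(h @ P @ h.T) + r_diag[i]   # scalar innovation variance
            gain = (P @ h.T) / s                 # n x 1 Kalman gain
            x = x + (gain * (z[i] - float(h @ x))).ravel()
            P = P - gain @ (h @ P)               # scalar form of (I - K H) P
        return x, P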
Hi guys,
I ran some tests; the eigenvalues are positive at all times. It seems
that the problem is caused by a specific measurement which is 27
standard deviations above the mean (the network fails at this point).
I tried taking the natural log of the data instead, but the results
seem to be worse... any ideas?

Thanks,
Hagai.
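[For what it's worth, a common safeguard against a wild sample like
that 27-sigma measurement is to gate on the innovation before applying
the update: skip (or deweight) any measurement whose Mahalanobis
distance from the prediction is implausibly large. A generic sketch,
not something proposed in this thread; the threshold is only an
illustrative default.]

    import numpy as np

    def innovation_gate(z, z_pred, S, gate=9.21):
        """Chi-square gate on the innovation. S = H P H' + R is the
        innovation covariance; 9.21 is the 99% chi-square point for
        2 degrees of freedom, chosen here only as an example."""
        nu = z - z_pred                           # innovation
        d2 = float(nu @ np.linalg.solve(S, nu))   # squared Mahalanobis distance
        return d2 <= gate                         # False -> skip this update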