Hi,

I am using an extended Kalman filter to train a neural network. My problem is that after a few thousand samples of training it starts producing very high values.

I read about square root Kalman filtering, but I am not sure this will help, because I am using 64-bit values, which should be accurate enough (I think). I only found square root algorithms for single-output problems, and my neural network predicts multiple simultaneous outputs.

Any pointers regarding what I should look for? Does anybody know a square root algorithm which suits multiple outputs?

Thanks,
Hagai.
extended kalman filter going berserk
Started by ●July 16, 2009
Reply by ●July 16, 2009
On 16 Jul, 14:41, "hagai_sela" <hagai.s...@gmail.com> wrote:
> Hi,
> I am using an extended kalman filter to train a neural network. My problem
> is that after a few thousand samples of training it starts producing very
> high values.
> I read about square root kalman filtering, but I am not sure this will
> help because I am using 64 bit values which should be accurate enough (I
> think).

64-bit representations of the wrong answer are no good. Try the square root algorithms.

> I only found square root algorithms for single output problems, and
> my neural network predicts multiple simultaneous outputs.
> Any pointers regarding what I should look for? Does anybody know a square
> root algorithm which suits multiple outputs?

Square root algorithms are standard for EKFs. Look harder. Check the book by Dan Simon.

Rune
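For the multiple-output case, the "array" form of the square-root measurement update handles a whole measurement vector in one QR factorization, so no scalar-only restriction applies. A rough NumPy sketch (my own illustration, not from any particular reference; a linear measurement is shown for brevity, and for an EKF `H` would be the Jacobian at the current estimate):

```python
import numpy as np

def sqrt_measurement_update(S, H, sqrtR):
    """Array (QR-based) square-root measurement update for a vector
    measurement.  S is an n x n square root of the prior covariance
    (P = S @ S.T), H is the m x n measurement matrix (Jacobian for an
    EKF), and sqrtR is an m x m square root of the measurement noise R.
    Returns a square root of the posterior covariance."""
    n = S.shape[0]
    m = H.shape[0]
    # Pre-array: its Gram matrix A @ A.T equals
    # [[R + H P H', H P], [P H', P]].
    A = np.block([[sqrtR,            H @ S],
                  [np.zeros((n, m)), S    ]])
    # QR of A.T yields a triangular factor whose lower-right block is
    # a square root of P - P H' (H P H' + R)^{-1} H P, i.e. P_post.
    _, R_ = np.linalg.qr(A.T)
    return R_[m:, m:].T
```

The Kalman gain can be read out of the same factorization, and a matching square-root time update exists; a square-root filtering text has the full set of arrays.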
Reply by ●July 16, 2009
On Thu, 16 Jul 2009 07:41:53 -0500, hagai_sela wrote:
> Hi,
> I am using an extended kalman filter to train a neural network. My
> problem is that after a few thousand samples of training it starts
> producing very high values.
> I read about square root kalman filtering, but I am not sure this will
> help because I am using 64 bit values which should be accurate enough (I
> think). I only found square root algorithms for single output problems,
> and my neural network predicts multiple simultaneous outputs. Any
> pointers regarding what I should look for? Does anybody know a square
> root algorithm which suits multiple outputs?
>
> Thanks,
> Hagai.

Unless you have a matrix that's very close to singular, I don't think that going to a square root algorithm is going to help (sorry Rune). I'd go over my model and my implementation again, to make sure that I'm tracking things correctly.

--
www.wescottdesign.com
Reply by ●July 16, 2009
Anything in particular? I standardized the input variables to zero mean and unit variance, and the output values are always between -1 and 1, probably quite uniformly. The only other variables are:

- Initial value of the covariance matrix: I set its diagonal to 100 and the other cells to 0.
- Measurement noise covariance matrix (R): I start with a diagonal of 100 like the previous one, and I use exponential decay down to values as low as 3 (I only change the diagonal value, same value for every item of the diagonal).
- Artificial process noise covariance matrix (Q): Same as above, but starting with 0.01 and using exponential decay with a limiting value of 10^-6.

I based these on some papers I read.

Hagai.
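As a sketch, the initialization and decay schedule above look something like this in NumPy (the dimensions and the decay rate are placeholders, not my actual values):

```python
import numpy as np

n_weights = 50   # network weight count = state dimension (placeholder)
n_outputs = 3    # number of simultaneous outputs (placeholder)

# Initial state covariance: diagonal of 100, zeros elsewhere.
P = 100.0 * np.eye(n_weights)

def decayed_diag(start, floor, rate, k, size):
    """Diagonal matrix whose (identical) diagonal entries decay
    exponentially from `start` toward `floor` over training step k."""
    value = floor + (start - floor) * np.exp(-rate * k)
    return value * np.eye(size)

k = 1000  # current training step (placeholder)
R = decayed_diag(100.0, 3.0, 1e-3, k, n_outputs)   # measurement noise
Q = decayed_diag(0.01, 1e-6, 1e-3, k, n_weights)   # artificial process noise
```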
Reply by ●July 16, 2009
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:
> Anything in particular? I standardized the input variables with a zero
> mean and unit variance, and the output values are always between -1 and
> 1, probably quite uniformly. The only other variables are:
> - Initial value of the covariance matrix. I set its diagonal to 100 and
> the other cells to 0.
> - Measurement noise covariance matrix (R): I start with a diagonal 100
> like the previous one and I use exponential decay to values as low as 3
> (I only change the diagonal value, same value for every item of the
> diagonal).
> - Artificial process noise covariance matrix (Q): Same as above, but
> starting with 0.01, and using exponential decay with a limiting value of
> 10^-6.
> I based these on some papers I read.
>
> Hagai.

I'm not sure what you mean by "standardized the input variables with zero mean" -- do you mean you're only linearizing around the operating point where your state vector = 0? If so, you're not building an EKF.

You should get yourself a copy of Dan Simon's book "Optimal State Estimation". He goes into the EKF (and notes that it can go unstable, at times).

One thing that _can_ help an unstable EKF is to increase the process noise. Your filter will settle more slowly if you do, but slow and stable is better than unstable! If you have a way of estimating the error between your point linearization and the real model, you can use that estimate for your process noise to good effect.

--
www.wescottdesign.com
Reply by ●July 17, 2009
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:
> Anything in particular? I standardized the input variables with a zero
> mean and unit variance, and the output values are always between -1 and
> 1, probably quite uniformly. The only other variables are:
> - Initial value of the covariance matrix. I set its diagonal to 100 and
> the other cells to 0.
> - Measurement noise covariance matrix (R): I start with a diagonal 100
> like the previous one and I use exponential decay to values as low as 3
> (I only change the diagonal value, same value for every item of the
> diagonal).
> - Artificial process noise covariance matrix (Q): Same as above, but
> starting with 0.01, and using exponential decay with a limiting value of
> 10^-6.
> I based these on some papers I read.
>
> Hagai.

You may also want to check the eigenvalues of your covariance matrix. If they ever, ever go negative then you've got a problem with your math.

--
www.wescottdesign.com
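A quick way to run that check every iteration, as a sketch (the tolerance is arbitrary; I symmetrize first, since roundoff-induced asymmetry is a common cause of spurious negative eigenvalues):

```python
import numpy as np

def check_covariance(P, tol=1e-9):
    """Return the smallest eigenvalue of P; raise if it has gone
    meaningfully negative, which signals a broken filter update."""
    P_sym = 0.5 * (P + P.T)  # force symmetry before the eigen-decomposition
    lam_min = np.linalg.eigvalsh(P_sym).min()
    if lam_min < -tol:
        raise ValueError(f"covariance lost positive definiteness: {lam_min}")
    return lam_min
```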
Reply by ●July 17, 2009
> I'm not sure what you mean by "standardized the input variables with zero
> mean" -- do you mean you're only linearizing around the operating point
> where your state vector = 0? If so you're not building an EKF.

Probably not, although I am not sure what you mean... :) I am not really a DSP expert. I meant that I am standardizing the input -- for example, if it is uniformly spread between 1 and 100, I scale it to be normally spread between -1 and 1.
Reply by ●July 17, 2009
>> I'm not sure what you mean by "standardized the input variables with zero
>> mean" -- do you mean you're only linearizing around the operating point
>> where your state vector = 0? If so you're not building an EKF.
>
> Probably not, although I am not sure what you mean... :) I am not really a
> DSP expert.
> I meant that I am standardizing the input - for example if it is uniformly
> spread between 1 and 100 I scale it to be normally spread between -1 and 1.

I meant I scale it to be normally spread with 0 mean and 1 variance (standard score).
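Concretely, the scaling I mean is just this, per input variable (a minimal sketch):

```python
import numpy as np

def standardize(x):
    """Standard score: shift to zero mean and scale to unit variance."""
    return (x - x.mean()) / x.std()
```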
Reply by ●July 17, 2009
>> I'm not sure what you mean by "standardized the input variables with zero
>> mean" -- do you mean you're only linearizing around the operating point
>> where your state vector = 0? If so you're not building an EKF.
>
> Probably not, although I am not sure what you mean... :) I am not really a
> DSP expert.
> I meant that I am standardizing the input - for example if it is uniformly
> spread between 1 and 100 I scale it to be normally spread between -1 and 1.

When you say input, you are presumably referring to measurements? I see you said you took a uniform pdf to normal; I assume you're just taking the mean and variance and treating it as normal, right? My understanding is that the filter will sometimes put up with non-Gaussian inputs, in spite of the derivation. Anyway, I've not read that it's important to rescale the measurements (I don't see it hurting), but I have read that it helps to scale the states.

As for the square root, I frequently see scalar measurement formulations in books, but if your sensor noise matrix is diagonal, or can be diagonalized (by linear combinations of the true measurements at a given instant), you may be able to treat the vector measurement as several successive scalar measurements. There may be a more elegant approach, but a diagonal Q also gives computational savings (p. 221 of Grewal & Andrews, ISBN 0-471-39254-5), if you can pull it off...
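A sketch of that measurement-splitting idea, assuming a diagonal R (linear measurement shown for brevity; in an EKF each row of `H` would be the corresponding row of the Jacobian, re-evaluated if you relinearize between scalar updates):

```python
import numpy as np

def sequential_scalar_update(x, P, z, H, r_diag):
    """Process an m-vector measurement z with diagonal noise covariance
    diag(r_diag) as m successive scalar Kalman measurement updates."""
    for i in range(len(z)):
        h = H[i:i + 1, :]                      # 1 x n measurement row
        s = float(h @ P @ h.T) + r_diag[i]     # scalar innovation covariance
        K = (P @ h.T) / s                      # n x 1 gain
        x = x + (K * (z[i] - float(h @ x))).ravel()
        P = P - K @ h @ P                      # standard (non-Joseph) form
    return x, P
```

Each scalar step needs no matrix inversion at all, which is where the computational savings come from.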
Reply by ●August 4, 2009
Hi guys,

I ran some tests; the eigenvalues are positive at all times. It seems that the problem is caused by a specific measurement which is 27 standard deviations above the mean (the network fails at this point). I tried taking the natural log of the data instead, but the results seem to be worse... any ideas?

Thanks,
Hagai.
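One option I am considering is to gate each measurement on its normalized (Mahalanobis) innovation, so an outlier like that is skipped instead of being allowed to update the weights. A rough sketch (the threshold of 9 per component, roughly 3 sigma, is just a starting guess):

```python
import numpy as np

def innovation_gate(innovation, S, threshold=9.0):
    """Chi-square style gate: accept a measurement only if its squared
    Mahalanobis innovation d^2 = nu' S^-1 nu is plausibly small.
    `innovation` is the m-vector residual, S its m x m covariance."""
    d2 = float(innovation @ np.linalg.solve(S, innovation))
    return d2 <= threshold * len(innovation)
```

Gated measurements could also be down-weighted (inflated R) instead of dropped outright.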