
extended Kalman filter going berserk

Started by hagai_sela July 16, 2009
Hi,
I am using an extended Kalman filter to train a neural network. My
problem is that after a few thousand training samples it starts
producing very high values.
I read about square-root Kalman filtering, but I am not sure it will
help because I am using 64-bit values, which should be accurate enough
(I think). I have only found square-root algorithms for single-output
problems, and my neural network predicts multiple simultaneous outputs.
Any pointers on what I should look for? Does anybody know a square-root
algorithm that suits multiple outputs?

Thanks,
Hagai. 


On 16 Jul, 14:41, "hagai_sela" <hagai.s...@gmail.com> wrote:
> Hi,
> I am using an extended Kalman filter to train a neural network. My
> problem is that after a few thousand training samples it starts
> producing very high values.
> I read about square-root Kalman filtering, but I am not sure it will
> help because I am using 64-bit values, which should be accurate
> enough (I think).
64-bit representations of the wrong answer are no good. Try the
square-root algorithms.
> I have only found square-root algorithms for single-output problems,
> and my neural network predicts multiple simultaneous outputs.
> Any pointers on what I should look for? Does anybody know a
> square-root algorithm that suits multiple outputs?
Square-root algorithms are standard for EKFs. Look harder. Check the
book by Dan Simon.

Rune
On Thu, 16 Jul 2009 07:41:53 -0500, hagai_sela wrote:

> I am using an extended Kalman filter to train a neural network. My
> problem is that after a few thousand training samples it starts
> producing very high values. [...] Does anybody know a square-root
> algorithm that suits multiple outputs?
Unless you have a matrix that's very close to singular, I don't think
that going to a square-root algorithm is going to help (sorry, Rune).
I'd go over my model and my implementation again, to make sure that
I'm tracking things correctly.

--
www.wescottdesign.com
Anything in particular? I standardized the input variables to zero mean
and unit variance, and the output values are always between -1 and 1,
probably quite uniformly. The only other variables are:
- Initial value of the covariance matrix: I set its diagonal to 100 and
the other cells to 0.
- Measurement noise covariance matrix (R): I start with a diagonal of
100, like the previous one, and use exponential decay down to values as
low as 3 (I only change the diagonal, using the same value for every
item of the diagonal).
- Artificial process noise covariance matrix (Q): same as above, but
starting at 0.01 and decaying exponentially to a limiting value of
10^-6.
I based these on some papers I read.

Hagai.
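[For concreteness, a minimal numpy sketch of the tuning scheme
described in the post above. The state/output dimensions, the step
index, and the decay time constant `tau` are invented for
illustration; the post does not give them.]

    import numpy as np

    n_weights = 50   # number of network weights / EKF states (hypothetical)
    n_outputs = 3    # number of simultaneous network outputs (hypothetical)

    P = 100.0 * np.eye(n_weights)   # initial state covariance, diagonal of 100

    def decayed(start, floor, k, tau=1000.0):
        """Exponential decay from `start` toward the limiting value
        `floor` at training step k; `tau` is an assumed time constant."""
        return floor + (start - floor) * np.exp(-k / tau)

    k = 2500  # current training sample (illustrative)
    R = decayed(100.0, 3.0, k) * np.eye(n_outputs)   # measurement noise
    Q = decayed(0.01, 1e-6, k) * np.eye(n_weights)   # artificial process noise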
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:

> Anything in particular? I standardized the input variables to zero
> mean and unit variance, and the output values are always between -1
> and 1, probably quite uniformly. [...] I based these on some papers
> I read.
I'm not sure what you mean by "standardized the input variables with
zero mean" -- do you mean you're only linearizing around the operating
point where your state vector = 0? If so you're not building an EKF.

You should get yourself a copy of Dan Simon's book "Optimal State
Estimation". He goes into the EKF (and notes that it can go unstable,
at times).

One thing that _can_ help an unstable EKF is to increase the process
noise. Your filter will settle more slowly if you do, but slow and
stable is better than unstable! If you have a way of estimating the
error between your point linearization and the real model, you can use
that estimate for your process noise to good effect.

--
www.wescottdesign.com
On Thu, 16 Jul 2009 14:28:24 -0500, hagai_sela wrote:

> Anything in particular? I standardized the input variables to zero
> mean and unit variance, and the output values are always between -1
> and 1, probably quite uniformly. [...]
You may also want to check the eigenvalues of your covariance matrix.
If they ever, ever go negative then you've got a problem with your
math.

--
www.wescottdesign.com
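[A minimal numpy sketch of that eigenvalue check, together with the
Joseph-form covariance update that is commonly used to keep P symmetric
and positive semidefinite. Function names and the tolerance are
illustrative, not from this thread.]

    import numpy as np

    def covariance_health_check(P, tol=1e-9):
        """Raise if the covariance matrix has lost symmetry or gained a
        negative eigenvalue."""
        asymmetry = np.max(np.abs(P - P.T))
        eigs = np.linalg.eigvalsh(0.5 * (P + P.T))  # eigvalsh expects symmetry
        if asymmetry > tol or eigs.min() < 0.0:
            raise RuntimeError("covariance unhealthy: asymmetry=%.2e, "
                               "min eigenvalue=%.2e" % (asymmetry, eigs.min()))
        return eigs

    def joseph_update(P, K, H, R):
        """Joseph-form measurement update of the covariance.
        Algebraically equal to (I - K H) P, but much better at
        preserving symmetry and positive semidefiniteness in
        floating point."""
        A = np.eye(P.shape[0]) - K @ H
        return A @ P @ A.T + K @ R @ K.T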
> I'm not sure what you mean by "standardized the input variables with
> zero mean" -- do you mean you're only linearizing around the
> operating point where your state vector = 0? If so you're not
> building an EKF.

Probably not, although I am not sure what you mean... :) I am not
really a DSP expert. I meant that I am standardizing the input: for
example, if it is uniformly spread between 1 and 100, I scale it to be
normally spread between -1 and 1.
>> I'm not sure what you mean by "standardized the input variables
>> with zero mean" -- do you mean you're only linearizing around the
>> operating point where your state vector = 0? If so you're not
>> building an EKF.
>
> Probably not, although I am not sure what you mean... :) I am not
> really a DSP expert. I meant that I am standardizing the input: for
> example, if it is uniformly spread between 1 and 100, I scale it to
> be normally spread between -1 and 1.

I meant I scale it to be normally spread with 0 mean and 1 variance
(standard score).
>> I'm not sure what you mean by "standardized the input variables
>> with zero mean" [...]
>
> I meant that I am standardizing the input: for example, if it is
> uniformly spread between 1 and 100, I scale it to be normally spread
> between -1 and 1.
When you say input, you are presumably referring to measurements? I
see you said you took a uniform pdf to normal; I assume you're just
taking the mean and variance and treating it as normal, right? My
understanding is that the filter will sometimes put up with
non-Gaussian inputs, in spite of the derivation.

Anyway, I've not read that it's important to rescale the measurements
(I don't see it hurting), but I have read that it helps to scale the
states.

As far as the square root goes, I frequently see scalar measurement
formulations in books, but if your sensor noise matrix is diagonal, or
can be diagonalized (by linear combinations of the true measurements
at a given instant), you may be able to treat it as several successive
scalar measurements. There may be a more elegant approach, but a
diagonal Q also gives computational savings (p. 221 of Grewal &
Andrews, ISBN 0-471-39254-5), if you can pull it off...
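[A sketch of that successive-scalar-measurement trick, assuming the
measurement noise covariance R is diagonal; variable names are
illustrative, not code from this thread. Potter's square-root
algorithm has the same one-scalar-at-a-time structure, which is why
the single-output formulations in books still apply.]

    import numpy as np

    def sequential_scalar_update(x, P, z, H, r_diag):
        """Fold an m-dimensional measurement z into the filter as m
        successive scalar updates. Valid when R is diagonal with
        entries r_diag; H is the m x n measurement Jacobian."""
        for i in range(len(z)):
            h = H[i:i+1, :]                      # 1 x n row of the Jacobian
            s = float(h @ P @ h.T) + r_diag[i]   # scalar innovation variance
            gain = (P @ h.T) / s                 # n x 1 Kalman gain
            x = x + (gain * (z[i] - float(h @ x))).ravel()
            P = P - gain @ (h @ P)               # scalar form of (I - K H) P
        return x, P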
Hi guys,
I ran some tests; the eigenvalues are positive at all times. It seems
that the problem is caused by a specific measurement which is 27
standard deviations above the mean (the network fails at this point).
I tried taking the natural log of the data instead, but the results
seem to be worse... any ideas?

Thanks,
Hagai.
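[For what it's worth, a common safeguard against a wild sample like
that 27-sigma measurement is to gate on the innovation before applying
the update: skip (or deweight) any measurement whose Mahalanobis
distance from the prediction is implausibly large. A generic sketch,
not something proposed in this thread; the threshold is only an
illustrative default.]

    import numpy as np

    def innovation_gate(z, z_pred, S, gate=9.21):
        """Chi-square gate on the innovation. S = H P H' + R is the
        innovation covariance; 9.21 is the 99% chi-square point for
        2 degrees of freedom, chosen here only as an example."""
        nu = z - z_pred                           # innovation
        d2 = float(nu @ np.linalg.solve(S, nu))   # squared Mahalanobis distance
        return d2 <= gate                         # False -> skip this update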