DSPRelated.com
Forums

SOS! Any good curve fitting/data analysis technique for this problem?

Started by walala October 23, 2003
walala wrote:

> "Jerry Avins" <jya@ieee.org> wrote in message
> news:bn8v2j$ljt$1@bob.news.rcn.net...
>
>>walala wrote:
>>
>>>Dear all,
>>>
>>>I have big trouble on this curve fitting for a few days.
>>
>>[snip attempt to use interpolation tools for extrapolation]
>>
>>Given a series 1, 2, 3, 4, 5, can you predict the sixth term? Would you
>>believe 0? -80? 27? It is easy to construct formulas that will yield any
>>of those values. When an interpolator constructs the best match over a
>>set of points, it ignores all points not specified; internal unspecified
>>points as well as those outside the range. Run a cubic spline through
>>the points (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 0). You will see a
>>pretty good approximation of a sinc.
>
> Jerry,
>
> thanks for your answer. so you mean I'd choose cubic spline to fit my data,
> right? That's a better choice than polynomial, right?
>
> -Walala
No. I meant to illustrate that tools for interpolating and smoothing data don't serve well for extrapolation. If you had a year's worth of stock prices, could you accurately predict what a particular stock will sell for tomorrow? If so, you are likely to retire early from your day job.

Jerry
--
Engineering is the art of making what you want from things you can get.
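
The remark about the series 1, 2, 3, 4, 5 can be checked directly. What follows is a minimal sketch, not something posted in the thread: it uses exact Lagrange interpolation (the helper name `lagrange_eval` is made up for illustration) to build, for each candidate sixth term, a polynomial that reproduces 1 through 5 and then lands on that sixth term. The seventh term is printed only to show how far the three formulas drift apart once they leave the given points.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x):
    """Evaluate at x the lowest-degree polynomial through the points (xs, ys)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


positions = [1, 2, 3, 4, 5, 6]
for sixth in (0, -80, 27):
    values = [1, 2, 3, 4, 5, sixth]
    # Each polynomial agrees with 1..5 and differs only in what follows.
    fitted = [int(lagrange_eval(positions, values, n)) for n in positions]
    seventh = lagrange_eval(positions, values, 7)
    print(f"sixth term {sixth:>4}: fit gives {fitted}, seventh term = {seventh}")
```

All three polynomials are equally "correct" on the first five terms, so the data alone cannot choose among them.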
walala wrote:

> Fred, thanks a lot for your answer.
>
>>Jerry goes in a good direction in his first sentence. In my reading of what
>>you're trying to do it appears you're taking a number of data sets,
>>capturing certain parameters and trying to predict a new data set by
>>extracting the same parameter from new data, right?
>
> Yes, that's what I want to do...
>
>>I may be way off base......
>>The potential problem I see with this has not much to do with techniques
>>(polynomial, spline, etc) but rather that the data sets may have nothing
>>whatsoever in common - there may be no correlation between them. If that's
>>the case then there may be no sensible method you can use to predict.
>
> Please see my attached image with 5 curves, showing y=f(x) at different w
> and z positions.
>
> I see four of 5 curves have good trends to be together, the fifth one, which
> is far away from the other 4, I guess that should be able to distinguish
> using w and z.
>
> That's why I try to use curve fitting to capture the relationship between
> data. Yesterday I did an order-7 multinomial fit, then I use x=10 to test it,
> it gives a way-off wrong value, I guess that's because x=10 is outside of
> the training data set.
>
> Do you think the 5 curves are not correlated? What other techniques can I use
> to treat them?
>
> Thanks a lot,
>
> -Walala
The curves seem to be highly correlated. There are data for all at least up to x = 6, and for some up to x = 9. At x = 10, all reason to think your formula has meaning vanishes.

Think of a Hilbert transformer as a formula that approximates a flat response over as wide a bandwidth as possible given the number of terms. What happens when you exceed that bandwidth? Your polynomials can do no better. The greater the number of terms, the better the match to the training set and the more striking the departure outside it.

Jerry
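
The last point, that more terms improve the match inside the training range while worsening it outside, can be illustrated with a small sketch. Everything here is an assumption made for the example: the "true" curve is taken to be f(x) = 1/(1 + x), the training points are evenly spaced on [0, 8], and the polynomial is an exact interpolant rather than the least-squares fit used in the original post. For this particular curve the in-range error at x = 4.5 shrinks as the degree rises, while the error at x = 10 grows.

```python
from fractions import Fraction


def f(x):
    """Stand-in 'true' curve (an assumption, not the poster's data)."""
    return 1 / (1 + Fraction(x))


def interpolate(nodes, x):
    """Evaluate at x the polynomial that matches f exactly at every node."""
    total = Fraction(0)
    for i, xi in enumerate(nodes):
        term = f(xi)
        for j, xj in enumerate(nodes):
            if j != i:
                term *= Fraction(x - xj) / (xi - xj)
        total += term
    return total


inside, outside = Fraction(9, 2), Fraction(10)  # 4.5 is in range, 10 is not
results = []
for step in (8, 4, 2, 1):                       # coarser to finer sampling
    nodes = list(range(0, 9, step))             # all nodes lie in [0, 8]
    degree = len(nodes) - 1
    err_in = abs(interpolate(nodes, inside) - f(inside))
    err_out = abs(interpolate(nodes, outside) - f(outside))
    results.append((degree, err_in, err_out))
    print(f"degree {degree}: error at x=4.5 {float(err_in):.5f}, "
          f"error at x=10 {float(err_out):.5f}")
```

A different curve would give different numbers, but nothing in the fitting procedure constrains the polynomial beyond the last training point.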
The number of data points in your training set should be much higher than
the degree of the polynomial you're using to fit them.  Use more data, and
it will work just fine.


Matt Timmermans wrote:

> The number of data points in your training set should be much higher than
> the degree of the polynomial you're using to fit them. Use more data, and
> it will work just fine.
Don't get his hopes up. No matter how many points are established within a range, any relation to reality that the power series bears to points outside that range is purely accidental.

Jerry
"Jerry Avins" <jya@ieee.org> wrote in message
news:bna8qg$k7m$1@bob.news.rcn.net...
> Matt Timmermans wrote:
>
> > The number of data points in your training set should be much higher than
> > the degree of the polynomial you're using to fit them. Use more data, and
> > it will work just fine.
>
> Don't get his hopes up. No matter how many points are established within
> a range, any relation to reality that the power series bears to points
> outside that range is purely accidental.
Jerry,

I originally got my hopes up and have now been pulled down desperately by you. :=)

Can I just do the following: I try to make my training range as large as possible (not the number of data points as large as possible, since I need time to generate data points... I am slow...). Then the training range should include all possible test data, right? That is to say, originally I trained from 0 to 8 but tested with 10; now I train from 0 to 10, then test with 8... maybe that's better?

-Walala
walala wrote:

> I have big trouble on this curve fitting for a few days.
>
> The problem, I guess, is out-of-range test data. Suppose my training data
> set for this curve fitting has range: x from 0 to 8; but my test data has x
> to be 10, then it's out of training range, and my multi-nomial fit cannot
> handle this well.
>
> Here are the details:
>
> This is related to signal processing. I have a bunch of data, y, x, and w,
> where w is a 5-element vector. I want to find the relationship between y, x,
> and w. Here is how I obtained these data.
[...]
> But the result is way-off with over 100% error. I guess the problem is that
> poly-nomial fit using the least square method(the "\" in matlab) is good at
> "interpolate" data, but not good at "predict extended data". The test image
> has a x=10.8565, which is out of the range of the training data. Hence the
> predictor cannot correctly predict it.
>
> I am quite disappointed at this. There are two things I should do I guess,
> 1.)obtain more training set and do more training; 2) use a better fit model.
> A model that can extend... or a model can deal with little training
> information. Maybe neural network?
How do you know that the process you're measuring (or sampling) is not a completely random process? Maybe it's just noise.

You should have a model beforehand in order to say you can derive a deterministic law capable of predicting the data. Of course the model comes from the data, but I do not think you can do this automatically, unless you have a really simple situation.

bye,

--
Piergiorgio Sartor
"walala" <mizhael@yahoo.com> wrote in message news:<bnaag0$abu$1@mozo.cc.purdue.edu>...
> Can I just do the following, I try to get my training range as large as
> possible(not data points as large as possible, since I need time to generate
> data points...I am slow...). then the training range should include all
> possible test data, right. That's to say, originally I train from 0 to 8,
> but test with 10, but now I train from 0 to 10, then test with 8... maybe
> that's better?

"All possible test data"... eh... I'm not sure if I understand.

A common spec for images is 512 x 512 pixels with eight bits per pixel. That would be something like 256^(512^2) > 10^631305 possible images. While others may think little of such numbers (I have met that "search over all possible models and all possible parameters in those models" argument before, presented, to the best of my knowledge, with sincerity) I find those kinds of numbers just ridiculous.

PLEASE tell me you did not mean that you want to train your filter on all possible images...

Rune
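
The size of that number is easy to confirm without ever building it. The sketch below, with variable names chosen only for this example, takes the base-10 logarithm of 256^(512^2) and shows that the count of possible 512 x 512 eight-bit images exceeds 10^631305.

```python
import math

pixels = 512 * 512   # pixels per image
levels = 256         # eight bits per pixel

# log10 of levels**pixels, computed without constructing the huge integer
log10_count = pixels * math.log10(levels)
exponent = math.floor(log10_count)
print(f"256^(512^2) is about 10^{log10_count:.1f}, so it exceeds 10^{exponent}")
```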
It's best to fix the obvious problem first -- when you fit a 7th degree
polynomial to 5 data points, you get an (overspecified) interpolator, not an
estimator.

I don't know if a polynomial estimator is appropriate for his data, but I do
know that if you have enough data points it should usually fail due to
unanticipated trends, not imagined ones.

"Jerry Avins" <jya@ieee.org> wrote in message
news:bna8qg$k7m$1@bob.news.rcn.net...
> Matt Timmermans wrote:
>
> > The number of data points in your training set should be much higher than
> > the degree of the polynomial you're using to fit them. Use more data, and
> > it will work just fine.
>
> Don't get his hopes up. No matter how many points are established within
> a range, any relation to reality that the power series bears to points
> outside that range is purely accidental.
>
> Jerry
walala wrote:

   ...

> Can I just do the following, I try to get my training range as large as
> possible(not data points as large as possible, since I need time to generate
> data points...I am slow...). then the training range should include all
> possible test data, right. That's to say, originally I train from 0 to 8,
> but test with 10, but now I train from 0 to 10, then test with 8... maybe
> that's better?
>
> -Walala
Much better. I think that will work.

Jerry
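
The difference between the two setups comes down to whether the test point lies inside the span of the training inputs. A minimal sketch of that check follows; the function names and the sample positions are hypothetical, and for several inputs a per-variable range check like this is only a rough guard, since a point can sit inside every individual range and still be far from any training sample.

```python
def training_range(train_x):
    """Return the (low, high) span covered by the training inputs."""
    return min(train_x), max(train_x)


def is_interpolation(train_x, x):
    """True when x lies inside the span of the training inputs."""
    low, high = training_range(train_x)
    return low <= x <= high


old_training = [0, 2, 4, 6, 8]        # hypothetical sample positions
new_training = [0, 2, 4, 6, 8, 10]

print(is_interpolation(old_training, 10))  # False: x = 10 is extrapolation
print(is_interpolation(new_training, 8))   # True: x = 8 is interpolation
```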
"Matt Timmermans" <mt0000@sympatico.nospam-remove.ca> wrote in message
news:RJ9mb.10088$VQ3.522567@news20.bellglobal.com...
> It's best to fix the obvious problem first -- when you fit a 7th degree
> polynomial to 5 data points, you get an (overspecified) interpolator, not an
> estimator.
>
> I don't know if a polynomial estimator is appropriate for his data, but I do
> know that if you have enough data points it should usually fail due to
> unanticipated trends, not imagined ones.
>
> "Jerry Avins" <jya@ieee.org> wrote in message
> news:bna8qg$k7m$1@bob.news.rcn.net...
> > Matt Timmermans wrote:
> >
> > > The number of data points in your training set should be much higher than
> > > the degree of the polynomial you're using to fit them. Use more data, and
> > > it will work just fine.
> >
> > Don't get his hopes up. No matter how many points are established within
> > a range, any relation to reality that the power series bears to points
> > outside that range is purely accidental.
> >
> > Jerry
I heard that polynomial fitting works badly in the multi-input/multi-variable case. I am going to try neural networks... do you know which NN is best for such tasks?

Thanks a lot,

-Walala
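
One concrete reason polynomial fits get difficult with several inputs is the number of coefficients. The sketch below is an illustration under stated assumptions, not a description of the fit actually used in this thread: it assumes a full multinomial containing every cross-term up to a given total degree, and it reads the original post as having six inputs (x plus the 5-element vector w). The helper name `num_terms` is made up for the example.

```python
from math import comb


def num_terms(num_inputs, degree):
    """Count the monomials of total degree <= degree in num_inputs variables."""
    return comb(num_inputs + degree, degree)


for num_inputs in (1, 2, 6):
    print(num_inputs, "input(s), degree 7:",
          num_terms(num_inputs, 7), "coefficients")
```

A least-squares fit needs at least as many data points as coefficients before it is even determined, and many more before it averages out noise, so the data requirement climbs quickly with the number of inputs.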