Dear all,

I have been struggling with a curve-fitting problem for a few days.

The problem, I guess, is out-of-range test data. Suppose the training data for the fit covers x from 0 to 8, but the test data has x = 10; that is outside the training range, and my polynomial fit cannot handle it.

Here are the details. This is related to signal processing. I have a bunch of data y, x, and w, where w is a 5-element vector, and I want to find the relationship between y, x, and w. Here is how I obtained the data.

For image A, I measured w to be:

w = 1.75E+04  26.395  2.5283  0.75389  0.47193

Under this w, I sampled 7 data points for x and y:

x = 6.3795  1.8693  0.9923  0.6455  0.4612  0.3494  0.2753
y = 0.068  -0.0131  -0.0301  -0.0449  -0.0504  -0.0506  -0.0578

For image B:

w = 1.61E+04  54.845  41.778  9.7303  7.8343
x = 7.5179  2.1687  1.0814  0.6556  0.4505  0.3321  0.2586
y = 0.0521  -0.0119  -0.0387  -0.05  -0.0484  -0.0459  -0.0508

For image C:

w = 1.46E+04  250.53  128.03  50.119  23.137
x = 7.6963  2.0372  1.0318  0.6505  0.4576  0.3373  0.2606
y = -0.1174  -0.1379  -0.1191  -0.11  -0.0992  -0.0889  -0.0836

For image D:

w = 6.98E+03  40.405  28.292  18.178  10.712
x = 8.9289  3.8078  2.2907  1.5374  1.0899  0.8161  0.6354
y = 0.0715  0.0176  -0.0078  -0.0225  -0.035  -0.043  -0.0496

For image E:

w = 1.84E+04  145.86  96.09  33.04  26.242
x = 8.5197  2.9649  1.6112  1.0238  0.709  0.5226  0.3975
y = 0.0611  0.0025  -0.0262  -0.0365  -0.0479  -0.0521  -0.0557

----------------------------------------------------

I did an order-7 polynomial fit in 6 variables (x, w1, w2, w3, w4, w5), then used the fitted curve to predict y for the following test image:

w = 1.15E+04  117.95  108.05  85.947  70.185
x = 10.8565  4.9287  2.9304  1.9591  1.4218  1.0899  0.862

y should be:

y = 0.0193  -0.025  -0.0336  -0.0426  -0.0441  -0.042  -0.0428

--------------------------------------------------------

But the result is way off, with over 100% error.
I guess the problem is that a polynomial fit by least squares (the "\" operator in Matlab) is good at *interpolating* data but not at *extrapolating* beyond it. The test image has x = 10.8565, which is outside the range of the training data, so the predictor cannot handle it correctly. I am quite disappointed.

There are two things I could do, I guess: 1) obtain more training data and do more training; 2) use a better model -- one that extrapolates sensibly, or one that can cope with little training information. Maybe a neural network?

Can anybody give more ideas/suggestions/comments?

Thanks a lot,

-Walala
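[Editor's note: the suspicion above is easy to reproduce. Here is a minimal sketch in Python/NumPy -- an assumption, since the original fit used Matlab's "\"; NumPy's Polynomial.fit plays the same least-squares role -- using only the x/y pairs from image A. A degree-6 polynomial passes through all seven points exactly, yet explodes at the test point x = 10.8565:]

```python
import numpy as np

# x/y samples for image A, from the post above
x = np.array([6.3795, 1.8693, 0.9923, 0.6455, 0.4612, 0.3494, 0.2753])
y = np.array([0.068, -0.0131, -0.0301, -0.0449, -0.0504, -0.0506, -0.0578])

# Degree 6 through 7 points is an exact interpolant.  Polynomial.fit
# rescales x internally, which keeps the least-squares problem
# well conditioned.
p = np.polynomial.Polynomial.fit(x, y, deg=6)

print(np.max(np.abs(p(x) - y)))  # essentially zero inside the data range
print(p(10.8565))                # enormous -- nothing like the measured 0.0193
```

The fitted curve is faithful between x = 0.2753 and x = 6.3795 and meaningless beyond, which is exactly the out-of-range behavior described above.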
SOS! Any good curve fitting/data analysis technique for this problem?
Started by ●October 23, 2003
Reply by ●October 23, 2003
"walala" <mizhael@yahoo.com> wrote in message news:<bn7r80$4hq$1@mozo.cc.purdue.edu>...> I guess the problem is that > poly-nomial fit using the least square method(the "\" in matlab) is good at > "interpolate" data, but not good at "predict extended data".There is a demo somewhere deep inside matlab that demonstrates the difference between interpolation and data extension using polynomial fits. As far as I remember, there were nine or ten evenly spaced data points that generally described some linear trend. The polynomial nicely fitted the data points, with little deviation from the linear trend inside the data domain, but shot abruptly to plus and minus infinity just outside each terminal point. I don't remember exactly where that demo was, but try to type "demo" or "demos" at the matlab command prompt, and look among the demos on curve fitting. Rune
Reply by ●October 23, 2003
walala wrote:
> I have big trouble on this curve fitting for a few days.
>
> The problem, I guess, is out-of-range test data. Suppose my training
> data set for this curve fitting has range x from 0 to 8; but my test
> data has x of 10, then it's out of the training range, and my
> polynomial fit cannot handle this well.

Walala:

In my opinion, higher-order polynomials are not suited to fitting data. The problem you describe is intrinsic to polynomial fitting. You can always find a polynomial that fits your data points exactly, but the polynomial diverges wildly past the end points.

OUP
Reply by ●October 23, 2003
walala wrote:
> Dear all,
>
> I have big trouble on this curve fitting for a few days.

[snip attempt to use interpolation tools for extrapolation]

Given the series 1, 2, 3, 4, 5, can you predict the sixth term? Would you believe 0? -80? 27? It is easy to construct formulas that will yield any of those values. When an interpolator constructs the best match over a set of points, it ignores all points not specified: internal unspecified points as well as those outside the range. Run a cubic spline through the points (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 0). You will see a pretty good approximation of a sinc.

Jerry
--
Engineering is the art of making what you want from things you can get.
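[Editor's note: Jerry's spline example can be checked directly. A sketch in Python, with SciPy's CubicSpline standing in for whatever spline routine he had in mind:]

```python
import numpy as np
from scipy.interpolate import CubicSpline

# The five points Jerry suggests
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

cs = CubicSpline(x, y)  # default "not-a-knot" boundary conditions

# Over the main lobe the spline tracks sinc(x) = sin(pi*x)/(pi*x) closely
for t in (0.25, 0.5, 0.75):
    print(t, float(cs(t)), np.sinc(t))
```

Note that this does not rescue extrapolation: outside [-2, 2] the spline simply continues its end cubics off toward infinity, much as a polynomial would.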
Reply by ●October 23, 2003
"Jerry Avins" <jya@ieee.org> wrote in message news:bn8v2j$ljt$1@bob.news.rcn.net...> walala wrote: > > > Dear all, > > > > I have big trouble on this curve fitting for a few days. > > [snip attempt to use interpolation tools for extrapolation] > > Given a series 1, 2, 3, 4, 5, can you predict the sixth term? Would you > believe 0? -80? 27? It is easy to construct formulas that will yield any > of those values. When an interpolator constructs the best match over a > set of points, it ignore all points not specified; internal unspecified > points as well as those outside the range. Run a cubic spline through > the points (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 0). You will see a > pretty good approximation of a sinc. >Jerry, thanks for your answer. so you mean I'd choose cubic spline to fit my data, right? That's a better choice than polynomial, right? -Walala
Reply by ●October 23, 2003
"One Usenet Poster" <me@my.computer.org> wrote in message news:vpfv097bkpdb76@corp.supernews.com...> > > walala wrote: > > Dear all, > > > > I have big trouble on this curve fitting for a few days. > > > > The problem, I guess, is out-of-range test data. Suppose my trainingdata> > set for this curve fitting has range: x from 0 to 8; but my test datahas x> > to be 10, then it's out of training range, and my multi-nomial fitcannot> > handle this well. > > Walala: > > In my opinion, higher order polynomials are not suited to fitting data. > The problem you describe is intrinsic to polynomial fitting. You can > always find a polynomial to fit your data points exactly, but the > polynomial diverges wildly past the end points. > > OUP >Dear OUP, Thanks for your answer. That's exactly my problem, how to solve that? -Walala
Reply by ●October 23, 2003
"Rune Allnor" <allnor@tele.ntnu.no> wrote in message news:f56893ae.0310230112.77747faf@posting.google.com...> "walala" <mizhael@yahoo.com> wrote in messagenews:<bn7r80$4hq$1@mozo.cc.purdue.edu>...> > I guess the problem is that > > poly-nomial fit using the least square method(the "\" in matlab) is goodat> > "interpolate" data, but not good at "predict extended data". > > There is a demo somewhere deep inside matlab that demonstrates the > difference between interpolation and data extension using polynomial > fits. As far as I remember, there were nine or ten evenly spaced data > points that generally described some linear trend. The polynomial nicely > fitted the data points, with little deviation from the linear trend inside > the data domain, but shot abruptly to plus and minus infinity just outside > each terminal point. > > I don't remember exactly where that demo was, but try to type "demo" or > "demos" at the matlab command prompt, and look among the demos on curve > fitting. > > RuneDear Rune, Thanks for your answer. That's exactly my problem, how to solve that? -Walala
Reply by ●October 23, 2003
"walala" <mizhael@yahoo.com> wrote in message news:bn7r80$4hq$1@mozo.cc.purdue.edu...> Dear all, > > I have big trouble on this curve fitting for a few days. > > The problem, I guess, is out-of-range test data. Suppose my training data > set for this curve fitting has range: x from 0 to 8; but my test data hasx> to be 10, then it's out of training range, and my multi-nomial fit cannot > handle this well.Microsoft's Excel has a LINEST function that can be used to find the coefficients for a formula. It has some pretty good examples too. You first need to choose the form of the equation for which you which to find the coefficients. Peter Nachtwey
Reply by ●October 23, 2003
"walala" <mizhael@yahoo.com> wrote in message news:bn8vqt$kml$1@mozo.cc.purdue.edu...> > "Jerry Avins" <jya@ieee.org> wrote in message > news:bn8v2j$ljt$1@bob.news.rcn.net... > > walala wrote: > > > > > Dear all, > > > > > > I have big trouble on this curve fitting for a few days. > > > > [snip attempt to use interpolation tools for extrapolation] > > > > Given a series 1, 2, 3, 4, 5, can you predict the sixth term? Would you > > believe 0? -80? 27? It is easy to construct formulas that will yield any > > of those values. When an interpolator constructs the best match over a > > set of points, it ignore all points not specified; internal unspecified > > points as well as those outside the range. Run a cubic spline through > > the points (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 0). You will see a > > pretty good approximation of a sinc. > > > > Jerry, > > thanks for your answer. so you mean I'd choose cubic spline to fit mydata,> right? That's a better choice than polynomial, right?Walala, Jerry goes in a good direction in his first sentence. In my reading of what you're trying to do it appears you're taking a number of data sets, capturing certain parameters and trying to predict a new data set by extracting the same parameter from new data, right? I may be way off base...... The potential problem I see with this has not much to do with techniques (polynomial, spline, etc) but rather that the data sets may have nothing whatsoever in common - there may be no correlation between them. If that's the case then there may be no sensible method you can use to predict. For example: The local bakery purchases ingredients once a month that include various fruit, nuts, flour, etc. They made apple pies on Monday and collected $123. They made cherry pies on Tuesday and collect $220. They made blueberry turnovers on Wednesday and collected $150. They collected $175 on Thursday. What were they selling on Thursday? 
With a naive modeling technique one might conclude that they made a mixed fruit cobbler with a moderately thick crust using apples, blueberries and cherries. In fact, what they sold on Thursday was strawberry/rhubarb pie or maybe bran muffins or maybe just coffee. If you have no common ground amongst the data sets and if the parameter extraction isn't somehow related to that commonality then I don't think you can build a predictor that makes any sense. Are you reasonably sure you don't have such a situation here? Independent of wanting to predict, why do you think that prediction makes "physical" sense in this situation? Why is this not like predicting the next throw of a dice game where each trial is independent of all that precede it? You can't build a predictor for a throw of dice - only state the probabilities of the possible outcomes. If there is something common that suggests prediction is possible, then what is the justification for the extracted parameter(s) that you will be using? This is usually the creative part of building a predictor - finding parameters to extract that will allow you to predict other characteristics. First go to the physics of the situation. Fred
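[Editor's note: Fred's "is there any commonality at all?" question can at least be screened numerically before building a model. A hedged sketch in Python/NumPy -- using the mean of each image's y samples as the summary statistic is an arbitrary editorial choice, not something Fred prescribes -- that checks whether one of the w parameters tracks the y data across the five training images:]

```python
import numpy as np

# Third component of w for images A..E, from the original post
w3 = np.array([2.5283, 41.778, 128.03, 28.292, 96.09])

# Mean of the seven y samples for each image (an arbitrary summary)
y_mean = np.array([
    np.mean([0.068, -0.0131, -0.0301, -0.0449, -0.0504, -0.0506, -0.0578]),
    np.mean([0.0521, -0.0119, -0.0387, -0.05, -0.0484, -0.0459, -0.0508]),
    np.mean([-0.1174, -0.1379, -0.1191, -0.11, -0.0992, -0.0889, -0.0836]),
    np.mean([0.0715, 0.0176, -0.0078, -0.0225, -0.035, -0.043, -0.0496]),
    np.mean([0.0611, 0.0025, -0.0262, -0.0365, -0.0479, -0.0521, -0.0557]),
])

# Pearson correlation: a crude screen for whether w3 and y move together.
# Five samples is far too few to trust, but a near-zero value here would
# be a warning sign that predicting y from w may not make sense at all.
r = np.corrcoef(w3, y_mean)[0, 1]
print(r)
```

This only screens one parameter against one summary; it does not establish a predictive relationship, which is Fred's deeper point about going to the physics first.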
Reply by ●October 23, 2003
"Fred Marshall" <fmarshallx@remove_the_x.acm.org> wrote in message news:ofycncMvivR5gwWiU-KYkA@centurytel.net...> > "walala" <mizhael@yahoo.com> wrote in message > news:bn8vqt$kml$1@mozo.cc.purdue.edu... > > > > "Jerry Avins" <jya@ieee.org> wrote in message > > news:bn8v2j$ljt$1@bob.news.rcn.net... > > > walala wrote: > > > > > > > Dear all, > > > > > > > > I have big trouble on this curve fitting for a few days. > > > > > > [snip attempt to use interpolation tools for extrapolation] > > > > > > Given a series 1, 2, 3, 4, 5, can you predict the sixth term? Wouldyou> > > believe 0? -80? 27? It is easy to construct formulas that will yieldany> > > of those values. When an interpolator constructs the best match over a > > > set of points, it ignore all points not specified; internalunspecified> > > points as well as those outside the range. Run a cubic spline through > > > the points (-2, 0), (-1, 0), (0, 1), (1, 0), (2, 0). You will see a > > > pretty good approximation of a sinc. > > > > > > > Jerry, > > > > thanks for your answer. so you mean I'd choose cubic spline to fit my > data, > > right? That's a better choice than polynomial, right? > > Walala, > > Jerry goes in a good direction in his first sentence. In my reading ofwhat> you're trying to do it appears you're taking a number of data sets, > capturing certain parameters and trying to predict a new data set by > extracting the same parameter from new data, right? > > I may be way off base...... > The potential problem I see with this has not much to do with techniques > (polynomial, spline, etc) but rather that the data sets may have nothing > whatsoever in common - there may be no correlation between them. Ifthat's> the case then there may be no sensible method you can use to predict. > > For example: > The local bakery purchases ingredients once a month that include various > fruit, nuts, flour, etc. > They made apple pies on Monday and collected $123. > They made cherry pies on Tuesday and collect $220. 
> They made blueberry turnovers on Wednesday and collected $150. > > They collected $175 on Thursday. What were they selling on Thursday? > > With a naive modeling technique one might conclude that they made a mixed > fruit cobbler with a moderately thick crust using apples, blueberries and > cherries. > In fact, what they sold on Thursday was strawberry/rhubarb pie or maybebran> muffins or maybe just coffee. > > If you have no common ground amongst the data sets and if the parameter > extraction isn't somehow related to that commonality then I don't thinkyou> can build a predictor that makes any sense. Are you reasonably sure you > don't have such a situation here? Independent of wanting to predict, whydo> you think that prediction makes "physical" sense in this situation? > Why is this not like predicting the next throw of a dice game where each > trial is independent of all that precede it? You can't build a predictor > for a throw of dice - only state the probabilities of the possibleoutcomes.> > If there is something common that suggests prediction is possible, thenwhat> is the justification for the extracted parameter(s) that you will beusing?> This is usually the creative part of building a predictor - finding > parameters to extract that will allow you to predict othercharacteristics.> First go to the physics of the situation. > > FredI guess I might have said "you can't build a very *good* predictor for a roll of dice". You can always predict "7" for the sum of a pair of dice because that's the most likely outcome. However, are you satisfied with being right 1/6 of the time? I didn't think so when I wrote this...... Fred
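[Editor's note: Fred's 1/6 figure is the classic result for two dice, and it is easy to verify by enumerating all 36 equally likely outcomes; a quick Python check:]

```python
from itertools import product
from collections import Counter

# Tally the sum of every equally likely (die1, die2) outcome
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

most_likely, n = counts.most_common(1)[0]
print(most_likely, n, n / 36)  # 7 is the most likely sum: 6 of 36, i.e. 1/6
```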






