I'm working on a problem where there is a "system" (really an algorithm) that inputs a number of data values and outputs the position of a toggle switch. So, based on one state (or short history of states) of the inputs, the switch will toggle to ON. And, based on other states of the inputs, the switch will toggle to OFF. The objective is to find the states of the inputs to cause the toggles in sync with historical performance. We have a fair history of both the inputs and the corresponding outputs. So, it sounds like a system identification problem.
For now, I have an algorithm that works perfectly on many cases. But some cases generate more disagreements than are acceptable.
I'm looking for ideas that might get me thinking in new directions for this task. I'm not an expert in system ID and maybe this task isn't well-suited or is at least tough because of the decision-making and binary output. But I figured I would ask. I'm doing most of the work using Excel but also have Matlab available.
I would consider running away, screaming. Or perhaps strolling off quietly while acting as casually as I can manage, so as not to attract any attention.
Seriously, it's a rough problem that you've chosen. Most "system identification" problems assume a linear system, or perhaps a system that has tractable nonlinearities. You've got a system with a relay nonlinearity in it, and that makes life hard.
It sounds like you're trying to do this blind, which is, in my opinion, a non-starter. Can you tell us whether you have any idea of the internal workings of the system? Being able to make a parameterized model of the system, and then finding the parameter values through some sort of fitting to observed behavior, would probably be your best bet -- if you knew that you had the right structure to hang your parameters on.
As for doing the work entirely blind, consider the search for the nature of light. There's about 2400 years of blind system identification between Empedocles and Planck, and even now, 115 years after Planck, we're still discovering new things about how light works. So if you can afford a 2500 year span between deciding to start your blind system ID and actually getting results -- go for it!
It sounds like a statistical problem. You have to look for hysteresis from previous states as well as the short history. I would plot a history of states, coded against the present state, with a variable number of history lengths, and look for patterns. If you can get even 10 blocks of really long times down to 1000 blocks of short times, you may see something that "pops out". Then you can attempt to fit a curve to it.
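One cheap way to run that pattern hunt, sketched in Python for concreteness (the `codes` and `outputs` sequences and the history depth are placeholders for whatever coding scheme is settled on): tally, for each run of recent state codes, which output followed it. Histories that always map to the same output are the ones that "pop out".

```python
from collections import Counter, defaultdict

def pattern_table(codes, outputs, depth=3):
    """Count how each run of `depth` consecutive state codes lines up
    with the output that follows it."""
    table = defaultdict(Counter)
    for t in range(depth, len(codes)):
        history = tuple(codes[t - depth:t])   # the `depth` codes before epoch t
        table[history][outputs[t]] += 1       # tally the output they preceded
    return table
```

Sweeping `depth` from 1 upward is the "variable number of history times" part; a history that splits cleanly between ON and OFF is a candidate rule.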
It's kind of like an FFT - the longer time span you work with, the more resolution you have. That allows you to go to higher order curves for the estimate.
But if it really is non-linear then you'll have pathological cases you just can't guess correctly. An underlying chaotic system would be an example - no matter how well you model it you won't be able to predict the outcome for long times. That gets you to "what is good enough". If it agrees 90% of the time, or 95% of the time, it may simply be the best you can do.
Forgive my pedantry, as I tend to ramble. The best solutions can only be found from the best problem statements, and I find this problem statement too sparse and ambiguous for any solution, let alone an optimized one. The statement "inputs a number of data values" is too ambiguous. Are the data values binary (like the output) or multivalued? If the latter, how many bits? Floating point? If so, what precision? Does "number" imply multiple inputs (e.g. from multiple sensors) or a time-series sequence of values from one (or more) sensors, which may or may not be sampled at any specific interval? Does the data exhibit hysteresis or other nonlinear effects?
Is this a simple discrete and finite state automaton (i.e. designed using N-tuple specifications)? If so, it can be specified as various states for each N. Is this a continuous-time system? Time-invariant or not? Is it linear or nonlinear? If nonlinear, is the function fully specified so that it can be managed as one or more linear approximation segments? Is it subject to noise? If so, what kind? From a system-model perspective, even temperature dependence can be "noise".
What are the criteria for judging bad vs. good performance from the historical data? How is that verified? Is it simply a "confidence" that determines "correctness" (i.e. more or less subjective)? Maybe this is a good candidate for a purely discrete solution using one of the many AI "machine learning" techniques?
And how "bad" is bad? If the output is in the "wrong" state, what are the consequences? Is this time-critical and/or safety-critical? Is it OK for the output to glitch "momentarily" ("momentarily" obviously needs to be defined) like some of the input to an auto-pilot that causes control-motion-noise (i.e. shakes the rudders but doesn't affect the course/heading)? Note glitches can be of either polarity.
I accept that this may be "proprietary" information, but it should be simple enough to provide more engineering detail without compromising whatever is proprietary.
I recall a statement by P.J. Plauger of "C" fame (paraphrasing): "Keep all (raw) data for as long as possible and only when it is certain that the desired solution has been obtained, can any data be discarded." Clearly, it is better to over-specify than to under-specify a system. By definition, superfluous data can be discarded at any time.
Do you know all inputs and corresponding/desired outputs?
If so, how many inputs and how many outputs are there?
Given those, coming up with an algorithm should be manageable.
I will try to answer the questions but I'm not sure I'm doing us any favors....
The inputs are decimal values with 2 or 3 digits after the decimal point. I don't think this matters much, but in some cases the details could matter.
I have a large set of historical inputs and outputs - more than are likely needed but certainly enough to test consistency of a model.
There are 5 decimal inputs per sample epoch.
There is one binary output per sample epoch.
The system is (obviously?) nonlinear.
This is not a physical system in those terms - it's an algorithm - so a glitch in the output is not acceptable as a general rule. In the end, a few glitches may be OK but in terms of objectives, no. There is an overall performance measure over the entire history. So if there are glitches then the result will be measured by the overall performance. But that's a detail that's beyond this question. I must say that meeting the overall performance has proved to be elusive unless there are *no* glitches.
Simply stated, each output transition causes a result that is added to a summation of individual results. If the outputs are matched to the history then the aggregate result matches the historical result. If the outputs don't match the historical outputs then there is a difference in the aggregate. I've not found cases where a "better" model has a better resulting aggregate.
Is there hysteresis? There could be. I have no idea. That's part of the system identification problem. If so, I imagine it involving at most one sample epoch.
"Is this a simple discrete and finite state automaton (i.e. designed using N-tuple specifications)?" Darned if I know, I didn't design it. I'm trying to characterize it.
"What is the criteria for judging bad vs. good performance from the historical data?" Ideally, the derived model matches the historical outputs given the same historical inputs.
It seems to me that it would be more useful to consider this as a classification problem rather than a system identification one. Even when nonlinear, system identification assumes given inputs and outputs from a known system, and looks for the coefficients. Since you stress that this is an algorithm, I assume that you mean that it is not a feedforward calculation with some number of coefficients (technically a filter is an algorithm too!).
So look at the problem as a supervised machine learning one. You have a set of labeled data points, some belong to switch ON and some to switch OFF. The problem is that you say that there is history - i.e., the ON or OFF is not only determined from the inputs NOW, but from the inputs NOW and for some time in the past (if I understood you correctly). So, you would need to input to the learning algorithm the inputs for some time in the past as well.
The first thing I do in such cases is to judge how difficult the problem is by using visualization techniques (e.g., Sammon nonlinear mapping). Then you need to pick a classifier and estimate the number of free parameters it should have. Then you separate your data into training, validation, and test sets and perform the training and evaluation.
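To make the lagged-input idea concrete, here is a minimal Python/numpy sketch. The toy k-nearest-neighbour classifier is just a stand-in for whichever classifier is eventually picked, and `lags` is an assumption about how much history matters:

```python
import numpy as np

def make_lagged(X, y, lags=2):
    """Stack each epoch's inputs with the previous `lags` epochs' inputs,
    so a classifier can see the short history the switch may depend on."""
    rows = [X[t - lags:t + 1].ravel() for t in range(lags, len(X))]
    return np.array(rows), np.asarray(y)[lags:]

def knn_predict(Xtr, ytr, Xte, k=3):
    """Tiny k-nearest-neighbour classifier on the lagged feature vectors."""
    preds = []
    for x in Xte:
        d = np.linalg.norm(Xtr - x, axis=1)    # distance to every training row
        votes = ytr[np.argsort(d)[:k]]         # labels of the k closest rows
        preds.append(int(round(votes.mean()))) # majority vote
    return np.array(preds)
```

The training/validation/test separation is then just slicing the lagged rows (taking care not to shuffle samples across the lag boundary).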
Y(J)S, I think I understand what you're suggesting. Let me paraphrase it back: Given a series of states of the inputs, can those inputs be classified into the outputs: switch ON, don't care or do nothing, switch OFF.
Because it's a toggle switch, the current state of the switch, I suppose, might also be considered an "input". It makes little sense to generate "switch ON" when the switch is already ON, etc. And, in other cases there could be a rule for the *same* inputs to *change* the output state - making the output state an "input". So, in that sense there is feedback. That's inherent in my current analysis but I'd not thought of it in quite that way. I fully expect that the hidden algorithm could have rules such as:
1) IF Q THEN ON
2) IF P AND NOT Q THEN OFF
3) IF OFF AND xyz THEN OFF
4) IF ON AND xyz THEN ON
5) IF OFF AND abc THEN ON
6) IF ON AND abc THEN OFF.
1 and 2 are more or less independent and not output state dependent as stated.
3 and 4 define a "don't care / no change".
5 and 6 define a change that's input and output state dependent.
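Rule shapes like those six are easy to mock up and simulate against the history. A hypothetical Python sketch (P, Q, and abc here are stand-in boolean conditions computed from the inputs, not anything known about the real algorithm; rules 3 and 4 become the implicit "hold" branch):

```python
def step(state, q, p, abc):
    """One epoch of a hypothetical toggle algorithm with the rule shapes above.
    `state` is the current switch position (True = ON)."""
    if q:                # rule 1: IF Q THEN ON
        return True
    if p and not q:      # rule 2: IF P AND NOT Q THEN OFF
        return False
    if abc:              # rules 5/6: abc flips whichever state we're in
        return not state
    return state         # rules 3/4: otherwise hold the current state
```

Note that a precedence had to be assumed (rule 1 over rule 2, both over 5/6); the real algorithm may order them differently, and that ordering is itself something to test against the history.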
Otherwise, I really doubt that there is much history being carried forward. I have coded variable states in a gross classification sense (binary: A>=B or A>=k*B and A*b>) and weighted each with 1,2,4,8 ... so there is a unique code for each. The codes can be displayed for one output epoch so previous codes (history) are lined up with the current. I see correlation between codes and erroneous outputs but not much in the preceding inputs.
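That 1, 2, 4, 8 weighting amounts to packing each binary condition into one bit of an integer, so every combination of conditions gets a unique code. A sketch in Python, with made-up threshold conditions standing in for the actual ones (the multiplier k is likewise illustrative):

```python
def epoch_code(x, k=1.5):
    """Pack binary conditions on one epoch's five inputs into a unique
    integer code, weighting them 1, 2, 4, 8 (one bit per condition)."""
    a, b, c, d, e = x
    bits = [
        a >= b,       # condition 1 (illustrative)
        a >= k * b,   # condition 2: a thresholded comparison
        c >= d,       # condition 3
        d >= e,       # condition 4
    ]
    return sum(int(bit) << i for i, bit in enumerate(bits))
```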
I've had no familiarity with Sammon's Mapping. The mathematics appear clear enough, but defining the terms is a bit fuzzy to me right now. That is, how do I get started with the data I have to deal with?
It doesn't matter that you think of the switch position as an input - if it changes the observed behavior, you can consider it as separating the two possible behaviors into two classes. BTW, one way of inverting a given functionality for which you don't analytically know the inverse is to train a system, fed with the original system's output, to reproduce the original input.
Sammon is very simple to use, but I only meant it as a first step. All you need to do is to run Sammon's NLM (you can find plenty of open source code for it) on your whole data set (I believe you said that each point is 5 real numbers), and display the plot with two colors, one for ON and one for OFF. If you see a clear separation between the two colors, then training a classifier will be easy. You can also get a feeling as to whether a hyperplanar or hyperspherical classifier is suitable.
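For anyone who would rather see the whole thing than hunt for a package: a bare-bones 2-D Sammon mapping is just gradient descent on the Sammon stress, starting here from the PCA projection. A Python/numpy sketch rather than Matlab (the step size and iteration count are uncritical guesses):

```python
import numpy as np

def pairwise_dist(Z):
    """Euclidean distance matrix with a small floor to avoid divide-by-zero."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1))
    return np.maximum(D, 1e-9)

def sammon(X, n_iter=100, lr=0.05):
    """Minimal 2-D Sammon mapping by plain gradient descent on the stress."""
    n = X.shape[0]
    Dx = pairwise_dist(X)                      # input-space distances (fixed)
    c = Dx[np.triu_indices(n, 1)].sum()        # Sammon's normalising constant
    Xc = X - X.mean(0)                         # PCA initialisation of the map
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:2].T
    for _ in range(n_iter):
        Dy = pairwise_dist(Y)                  # current map-space distances
        W = (Dx - Dy) / (Dx * Dy)              # per-pair stress weighting
        np.fill_diagonal(W, 0.0)
        grad = (-2.0 / c) * (W[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(1)
        Y = Y - lr * grad
    return Y
```

Plot the two columns of the returned Y, colored by ON/OFF, exactly as described above.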
Sounds like a fun problem. You should be able to apply your solution to automatic buying/selling decisions in the stock market :-)
It sounds like simple characterizations did not pan out. You might consider a deep-learning approach: see if you can train up a neural net that will make the right decisions, and then try to interpret/idealize the weights. They have ways of pruning the net down to what is really working. I hope you have a lot of performance data to train on.
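A minimal (very shallow) stand-in for that suggestion, with nothing hidden behind a library: one hidden layer trained by full-batch gradient descent on cross-entropy, in plain numpy. The layer size, learning rate, and epoch count are arbitrary, and the lagged-input features discussed earlier would feed this the same way:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer net on binary labels; return a predictor."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations
        p = sigmoid(H @ W2 + b2).ravel()      # predicted P(switch ON)
        g = (p - y)[:, None] / n              # gradient of mean CE w.r.t. logit
        gW2 = H.T @ g; gb2 = g.sum(0)
        gH = (g @ W2.T) * (1.0 - H ** 2)      # back-propagate through tanh
        gW1 = X.T @ gH; gb1 = gH.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2).ravel() > 0.5
```

One crude way to start the "interpret/prune" step afterward is to look at which input columns of W1 shrink toward zero: those are inputs the learned rules apparently ignore.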
JOS: Yes I have plenty of historical inputs and outputs to work with and due to the short-memory that I anticipate, the results should be fairly well independent - maybe completely independent other than the feedback I mentioned.
I don't know about automatic buying/selling on the stock market as far as this problem is concerned, because here the results are known, the process is not, and the objective is to match the switches. But, since you mention it, if one were to postulate optimum stock market trades taken against price history and then match them, it would be similar, I suppose. I've already done something like that, but the objective function was to maximize the financial outcome. It worked on paper but not well in the real world... I'm still pondering that one!
In the meantime, I hope someone can point me toward Sammon's Mapping, as it sounds intriguing and I'm having a hard time getting started.
Well, I got the Sammon working in Matlab somewhat. I had to remove some of the less-than important data to get it to run. In the end I got some numbers which plot up to look sort of interesting but not anything earthshaking. The output is meaningless to me so far! For example, I'd have a hard time relating the output to anything I need or understand.
The notion of classification seems much more fruitful. So I'm going to deal with that for a few days.
Any suggestions how to interpret the Sammon mapping output? All I understand is that there are 2 dimensions (or 3) with numbers in each. What do these dimensions mean? Just pretty pictures don't help me. I know it's a mapping but what is the mapped space relative to the space mapped *from*?