I am trying to develop a data anomaly detector. I basically want to detect clipped data, spikes, and drifting data to begin with. Any suggestions on how to do it?
data anomaly detection
Started by February 15, 2007
Reply by February 15, 2007

lakshmanan.meyyappan@gmail.com wrote:
> I am trying to develop a data anomaly detector. I basically want to
> detect clipped data, spikes, and drifting data to begin with. Any
> suggestions on how to do it.

You want to make quality judgments. Defining the characteristics that flag poor quality in your application is the first task. It can't be reliably found if it can't be rigorously defined.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by February 15, 2007

On Feb 15, 2:37 pm, lakshmanan.meyyap...@gmail.com wrote:
> I am trying to develop a data anomaly detector. I basically want to
> detect clipped data, spikes, and drifting data to begin with. Any
> suggestions on how to do it.

It would be helpful if you could provide a description of the non-anomalous data. For example, if it is a sine wave, then the instantaneous frequency could be a useful metric.

John
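As a rough sketch of John's suggestion, assuming the clean signal is a single, noise-free sinusoid: the identity x[n-1] + x[n+1] = 2cos(w)x[n] gives a three-sample estimate of the instantaneous frequency w. This estimator is swapped in for illustration (John did not name one; the analytic signal from a Hilbert transform is the more general route), and the function names, `min_amp` and the tolerance are hypothetical choices.

```python
import math

def sine_inst_freq(x, min_amp=1e-3):
    """Estimate the instantaneous frequency (cycles/sample) of a sinusoid.

    For a noise-free sinusoid of angular frequency w (rad/sample),
    x[n-1] + x[n+1] = 2*cos(w)*x[n], so w = acos((x[n-1] + x[n+1]) / (2*x[n])).
    Samples with |x[n]| < min_amp are skipped to avoid dividing by ~0.
    Returns a list of (n, f) pairs with f in cycles/sample.
    """
    out = []
    for n in range(1, len(x) - 1):
        if abs(x[n]) < min_amp:
            continue
        c = (x[n - 1] + x[n + 1]) / (2.0 * x[n])
        c = max(-1.0, min(1.0, c))  # clamp: noise or anomalies can push |c| past 1
        out.append((n, math.acos(c) / (2.0 * math.pi)))
    return out


def flag_freq_anomalies(x, f_nominal, tol, min_amp=1e-3):
    """Indices whose estimated frequency differs from f_nominal by more than tol."""
    return [n for n, f in sine_inst_freq(x, min_amp) if abs(f - f_nominal) > tol]


# Usage: a 0.05 cycles/sample sine with one spiked sample
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
x[100] += 1.0
print(flag_freq_anomalies(x, f_nominal=0.05, tol=0.01))
```

The estimate is very sensitive to noise near zero crossings, where x[n] is small, so real data would need smoothing or a larger `min_amp`.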
Reply by February 16, 2007

lakshmanancom wrote:
> I am trying to develop a data anomaly detector. I basically want to
> detect clipped data, spikes, and drifting data to begin with. Any
> suggestions on how to do it.

You can try difference filters (e.g. [1/4 -1/2 1/4]). For clips, the output of the difference filter is close to zero for several samples. For spikes, the output varies wildly for several samples. To detect drift, subtract the output of the difference filter from the input (delayed by one sample). If that result grows or is very large (compared to the input), you have drift.

Regards,
Andor
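A minimal Python sketch of Andor's idea, assuming evenly sampled data in a plain list. `diff_filter`, `runs`, `andor_checks` and the thresholds `flat_eps`, `spike_thr` and `min_run` are hypothetical names and values to tune per channel. One interpretation note: with the coefficients exactly as written, x[n-1] - d[n] is not a low-pass; removing the high-pass part gives x[n-1] + d[n] = x[n]/4 + x[n-1]/2 + x[n-2]/4 (equivalently, subtracting the output of the sign-flipped filter [-1/4 1/2 -1/4]), and that is what the sketch returns as the drift trend.

```python
def diff_filter(x, h=(0.25, -0.5, 0.25)):
    """FIR output d[n] = h0*x[n] + h1*x[n-1] + h2*x[n-2] for n = 2 .. len(x)-1."""
    return [h[0] * x[n] + h[1] * x[n - 1] + h[2] * x[n - 2] for n in range(2, len(x))]


def runs(flags, min_len):
    """(start, end) pairs, end exclusive, of runs of True at least min_len long."""
    out, start = [], None
    for i, f in enumerate(list(flags) + [False]):  # trailing False closes a final run
        if f and start is None:
            start = i
        elif not f and start is not None:
            if i - start >= min_len:
                out.append((start, i))
            start = None
    return out


def andor_checks(x, flat_eps=1e-9, spike_thr=1.0, min_run=3):
    """Return (clip_runs, spike_runs, trend); runs are indexed in x's samples."""
    d = diff_filter(x)  # d[k] belongs to input sample n = k + 2
    clips = [(s + 2, e + 2) for s, e in runs([abs(v) < flat_eps for v in d], min_run)]
    spikes = [(s + 2, e + 2) for s, e in runs([abs(v) > spike_thr for v in d], 1)]
    # Low-pass trend x[n-1] + d[n] = x[n]/4 + x[n-1]/2 + x[n-2]/4 (high-pass part removed)
    trend = [x[k + 1] + v for k, v in enumerate(d)]
    return clips, spikes, trend
```

Because the filter spans three samples, a flagged clip run starts two samples into the flat plateau, and a single spike of height A shows up as the three outputs A/4, -A/2, A/4. Note also that [1/4 -1/2 1/4] is a second difference, so it is zero on any straight line: a steady ramp looks like a clip unless you additionally require the first difference x[n] - x[n-1] to be near zero.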
Reply by February 18, 2007

On Feb 15, 11:37 am, lakshmanan.meyyap...@gmail.com wrote:
> I am trying to develop a data anomaly detector. I basically want to
> detect clipped data, spikes, and drifting data to begin with. Any
> suggestions on how to do it.

What do you mean by drifting?

If you're trying to detect high-frequency anomalies, like a clip or spike, I would suggest using multirate analysis (i.e. filterbanks). This is very similar to what Andor wrote regarding difference filters. Basically, the signal is input to a multi-channel filterbank, where one or more of the channels are expected to catch anomalies within a particular frequency range. During the analysis stage you can monitor the magnitude of your output channels, and when you detect a large increase in magnitude you can flag that as an anomaly.

The theory behind multirate analysis is somewhat advanced, but the actual implementation is super easy with Haar wavelets, assuming you are dealing with discrete samples. Email me if you're interested in a better explanation.

-marc
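To make marc's description concrete, here is a sketch of a Haar analysis filterbank in plain Python, assuming discrete, evenly sampled data. Each level splits the current approximation into pairwise sums and differences scaled by 1/sqrt(2); the detail (difference) channels respond to fast events such as spikes. The thresholding rule (k times a median-absolute-deviation noise estimate per level) and the names `haar_details` and `flag_haar` are assumptions for illustration, not marc's exact scheme.

```python
import math
import statistics

def haar_details(x, levels):
    """Multi-level Haar analysis; returns one list of detail coefficients per level.

    At each level, pairs (a[2k], a[2k+1]) of the current approximation give
    detail (a[2k] - a[2k+1]) / sqrt(2) and approximation (a[2k] + a[2k+1]) / sqrt(2).
    An odd trailing sample is dropped at each level.
    """
    s = math.sqrt(2.0)
    details, approx = [], list(x)
    for _ in range(levels):
        half = len(approx) // 2
        details.append([(approx[2 * k] - approx[2 * k + 1]) / s for k in range(half)])
        approx = [(approx[2 * k] + approx[2 * k + 1]) / s for k in range(half)]
    return details


def flag_haar(x, levels=3, k=5.0):
    """Return (level, start, end) spans of x whose detail magnitude exceeds k * sigma.

    sigma is a robust per-level noise estimate: median(|d|) / 0.6745.
    A level-L coefficient i covers input samples [i * 2**L, (i + 1) * 2**L).
    """
    flags = []
    for lev, d in enumerate(haar_details(x, levels), start=1):
        if not d:
            break  # signal too short for this many levels
        sigma = statistics.median(abs(v) for v in d) / 0.6745
        width = 2 ** lev
        for i, v in enumerate(d):
            if abs(v) > k * sigma:
                flags.append((lev, i * width, (i + 1) * width))
    return flags
```

This decimated form is cheap but not shift-invariant: a single-sample spike always reaches the level-1 details, whereas a step edge (such as the start of a clip) that falls exactly between two pairs produces no level-1 detail at all. An undecimated (stationary) variant avoids that at extra cost.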
Reply by February 18, 2007

On 15 Feb, 20:37, lakshmanan.meyyap...@gmail.com wrote:
> I am trying to develop a data anomaly detector. I basically want to
> detect clipped data, spikes, and drifting data to begin with. Any
> suggestions on how to do it.

The easy stuff first:

Clipped data. In a fixed-point numerical format, you can check for the maximum and minimum integer values. In a system with a floating-point ADC you might have to check against some tabulated values. A bit more cumbersome, but not at all impossible.

For outliers, look into median filters. They were discussed here last summer:
http://groups.google.no/group/comp.dsp/msg/9f740be9bda608d4?hl=no&

For data drift, select a window frame length and map the mean or median values inside the frames.

Rune
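Sketches of Rune's three suggestions, assuming a 16-bit signed ADC for the fixed-point case (change `INT16_MIN`/`INT16_MAX` for other word lengths). The running median uses a shrunken window at the edges; `width`, `thr` and `frame_len` are illustrative parameters, not values from the thread.

```python
import statistics

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed ADC codes

def rail_hits(codes, lo=INT16_MIN, hi=INT16_MAX):
    """Indices where a fixed-point sample sits at (or past) the numeric limits."""
    return [i for i, c in enumerate(codes) if c <= lo or c >= hi]


def running_median(x, width=5):
    """Running median over an odd-width window; the window shrinks at the edges."""
    half = width // 2
    return [statistics.median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]


def median_outliers(x, width=5, thr=1.0):
    """Indices where a sample differs from the running median by more than thr."""
    med = running_median(x, width)
    return [i for i, (v, m) in enumerate(zip(x, med)) if abs(v - m) > thr]


def frame_medians(x, frame_len):
    """Median of each non-overlapping frame; a shorter final frame is kept."""
    return [statistics.median(x[i:i + frame_len]) for i in range(0, len(x), frame_len)]
```

A trend in the frame medians (for example, the last frame median minus the first, compared with the within-frame spread) then suggests drift. The median is attractive here because isolated spikes barely move it.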
Reply by February 20, 2007

Thanks a lot for all your replies. To answer some of your questions: I am actually trying to develop a generic data anomaly detection toolkit for my project. My group mainly deals with engineering data: temperatures, speed, torque, stress, pressure and so on.

I will summarize what I have done, as I think it will also be useful for someone else. Here's what I have done so far (if you think what I am doing is not correct, or if you think there is a better way to do it, please let me know):

For clipping:
I am getting the indices of the max/min values in the data. If the max or min values are consecutive, I raise a flag that the data could possibly be clipped. If a max-value flat line is followed by a min-value flat line, I conclude that it is possibly a digital on/off type signal. I also give the user the option to enter the range; if the user does that, the check is more accurate.

Drifting data:
I am breaking the entire plot area into 20 windows, so each window contains 5% of the data. I calculate the mean value within each window and store it in an array of size 20. Then I calculate the standard deviation among these 20 values. If the data is normal, the standard deviation should be reasonably small. If not, then the data is either drifting or it's a ramp signal.

Noise/spikes:
Method 1: Amplitude threshold detection. Take user input on max and min thresholds of the data. Anything beyond that is a spike.
Method 2: Amplitude threshold detection, no user input. Anything beyond the mean + 5 times the standard deviation is a spike.
Method 3: Differential threshold detection. Calculate the absolute value of the slope between each pair of consecutive points. If the slope increases dramatically, it's a spike.
Method 4: Running standard deviation.

I will be adding a few more data anomaly checks. Will keep you posted.

Thanks,
Laks
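The clipping and drift checks and spike Methods 2 to 4 that Laks describes can be sketched as follows, in plain Python on a list of floats. The function names, the `min_run` of 3, the two-sided reading of the 5-sigma test in Method 2, and the "k times the median slope" reading of "increases dramatically" in Method 3 are assumptions made for illustration.

```python
import statistics

def extreme_runs(x, min_run=3):
    """Clipping check: runs of >= min_run consecutive samples equal to max(x) or min(x).

    Returns (value, start, end) tuples with end exclusive. Exact equality
    suits quantized data; noisy floating-point data may need a tolerance.
    """
    hi, lo = max(x), min(x)
    out, i = [], 0
    while i < len(x):
        if x[i] in (hi, lo):
            j = i
            while j < len(x) and x[j] == x[i]:
                j += 1
            if j - i >= min_run:
                out.append((x[i], i, j))
            i = j
        else:
            i += 1
    return out


def window_mean_spread(x, n_windows=20):
    """Drift check: standard deviation of the means of n_windows equal windows.

    With 20 windows each holds 5% of the data; leftover samples are ignored.
    """
    size = len(x) // n_windows
    if size == 0:
        raise ValueError("need at least n_windows samples")
    means = [statistics.fmean(x[k * size:(k + 1) * size]) for k in range(n_windows)]
    return statistics.stdev(means)


def spikes_sigma(x, k=5.0):
    """Method 2: indices more than k population standard deviations from the mean."""
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    return [i for i, v in enumerate(x) if abs(v - mu) > k * sd]


def spikes_slope(x, k=5.0):
    """Method 3: indices i where |x[i] - x[i-1]| exceeds k times the median |slope|."""
    slopes = [abs(b - a) for a, b in zip(x, x[1:])]
    ref = statistics.median(slopes)
    return [i + 1 for i, s in enumerate(slopes) if s > k * ref]


def running_std(x, width=11):
    """Method 4: population standard deviation over each full sliding window."""
    return [statistics.pstdev(x[i:i + width]) for i in range(len(x) - width + 1)]
```

Two caveats. In Method 2 the spike inflates the standard deviation it is measured against: with the population standard deviation used here, no sample can lie more than sqrt(N-1) standard deviations from the mean, so a 5-sigma rule cannot fire on fewer than 27 samples. In Method 3 a single spike flags both its rising and its falling edge, and if the median slope is zero (flat or coarsely quantized data) every non-zero step is flagged.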
Reply by February 20, 2007

lakshmanan.meyyappan@gmail.com wrote:
> Thanks a lot for all your replies.
>
> To answer some of your questions: I am actually trying to develop a
> generic data anomaly detection toolkit for my project. My group mainly
> deals with engineering data: temperatures, speed, torque, stress,
> pressure and so on.
>
> I will summarize what I have done, as I think it will also be useful
> for someone else. Here's what I have done so far (if you think what I
> am doing is not correct, or if you think there is a better way to do
> it, please let me know):
>
> For clipping:
> I am getting the indices of the max/min values in the data. If the max
> or min values are consecutive, I raise a flag that the data could
> possibly be clipped. If a max-value flat line is followed by a
> min-value flat line, I conclude that it is possibly a digital on/off
> type signal. I also give the user the option to enter the range; if
> the user does that, the check is more accurate.

How does clipping happen in your environment? If it is internal to the computer and in integer format, there could be numerical wraparound. That can sometimes be detected as a change in sign between successive numbers of large magnitude. If it is in an analog sensor, there might be saturation, but with successive values that are only close to one another rather than identical.

> Drifting data:
> I am breaking the entire plot area into 20 windows, so each window
> contains 5% of the data. I calculate the mean value within each window
> and store it in an array of size 20. Then I calculate the standard
> deviation among these 20 values. If the data is normal, the standard
> deviation should be reasonably small. If not, then the data is either
> drifting or it's a ramp signal.

Or overlain with low-frequency AC.

> Noise/spikes:
> Method 1: Amplitude threshold detection. Take user input on max and
> min thresholds of the data. Anything beyond that is a spike.
> Method 2: Amplitude threshold detection, no user input. Anything
> beyond the mean + 5 times the standard deviation is a spike.
> Method 3: Differential threshold detection. Calculate the absolute
> value of the slope between each pair of consecutive points. If the
> slope increases dramatically, it's a spike.
> Method 4: Running standard deviation.
>
> I will be adding a few more data anomaly checks. Will keep you posted.

Thanks for that.

Jerry
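Jerry's two clipping mechanisms can each be checked roughly as below. The wraparound test flags a sign change between successive samples that are both large, where "large" is taken here as more than half of an assumed 16-bit full scale. The analog-saturation test looks for runs of nearly equal, not necessarily identical, values. The function names and `full_scale`, `frac`, `tol` and `min_run` are hypothetical choices.

```python
def wraparound_suspects(codes, full_scale=32767, frac=0.5):
    """Indices i where codes[i-1] and codes[i] have opposite signs and both
    exceed frac * full_scale in magnitude: the large jump that integer
    overflow produces when a value wraps from one rail to the other."""
    thr = frac * full_scale
    return [i for i in range(1, len(codes))
            if codes[i - 1] * codes[i] < 0
            and abs(codes[i - 1]) > thr and abs(codes[i]) > thr]


def near_flat_runs(x, tol, min_run=5):
    """(start, end) runs, end exclusive, where each step |x[i] - x[i-1]| < tol.

    Intended for analog saturation, where the clipped values are close
    together but not necessarily identical.
    """
    out, start = [], 0
    for i in range(1, len(x) + 1):
        if i == len(x) or abs(x[i] - x[i - 1]) >= tol:
            if i - start >= min_run:
                out.append((start, i))
            start = i
    return out
```

A legitimate full-scale signal near the Nyquist frequency would also trip the wraparound test, so its hits are best treated as candidates for review rather than confirmed errors.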
Reply by February 20, 2007

Clipped data is where the real data values exceed the full-scale limits of the calibrated acquisition unit. For example, if I configure my DAQ to measure strains in the range of -1000 µε to +1000 µε, any value outside this range is clipped.
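Given configured limits like these, a direct check is to flag every sample at or beyond full scale. In this sketch the defaults mirror the -1000 to +1000 µε example; `out_of_range` and its `margin` parameter are hypothetical, with `margin` also catching samples just inside the limit for front ends that saturate slightly short of nominal full scale.

```python
def out_of_range(x, lo=-1000.0, hi=1000.0, margin=0.0):
    """Indices of samples at or beyond full scale, i.e. <= lo + margin or >= hi - margin.

    Defaults follow the -1000 to +1000 microstrain example; a positive margin
    also flags samples sitting just inside the nominal limits.
    """
    return [i for i, v in enumerate(x) if v <= lo + margin or v >= hi - margin]
```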
Reply by February 20, 2007

lakshmanan.meyyappan@gmail.com wrote:
> Clipped data is where the real data values exceed the full-scale
> limits of the calibrated acquisition unit. For example, if I configure
> my DAQ to measure strains in the range of -1000 µε to +1000 µε, any
> value outside this range is clipped.

Sure, but what number is reported? Some analog circuits, op amps especially, fold back with large overloads. I bound those when I use them in instrumentation and set their max output just under full ADC scale. The principle is simple: "Once bitten, twice shy."

Jerry






