Hi.

I have a bunch of signal data: 4000 points. I plot these 4000 points on amplitude vs. time axes. After I plot them, some points are completely "out of line". That is, visually you can see that they are sticking out, way out compared to the rest of the data. The values are something like 291334000000.00, while the normal values are in the range of 36131.00. As a result, when I plot them to include all points, the data look flat, with just a few lines that correspond to the out-of-line signal.

How do I adjust my axes automatically at the point when I plot them, so that I can see the normal points and signals like a usual plot? I don't really care about the extreme values.

So far, I have tried using the mean, but because the extreme values skew the entire mean too much, it isn't helpful. Any advice, please?

Thank you.
Detecting out of line signal
Started by ●May 25, 2011
Reply by ●May 25, 2011
Marc2050 <maarcc@n_o_s_p_a_m.gmail.com> wrote:

> I've a bunch of signal data. 4000 points.
> I plot this 4000 points on an amplitude vs time axes.
> After I plot them, some points are completely "out of line".

Are they supposed to be that far out? If not, then remove them.

> That is, visually you can see that they are sticking out. Way out
> compare to the rest of the data. The values are something like
> 291334000000.00 while the normal values are in the range of 36131.00.
> As a result, when I plot them to include all points, the data are flat
> with just a few lines that correspond to the out of line signal.

> How do I adjust my axes automatically at the point when I plot them so that
> I could see the normal points and signals like a usual plot. I dont really
> care about the extreme values.

One popular way is to graph log(y) instead of y. That is usually done when there is a reason why the data should be exponential, but sometimes it is done even when there is no known reason.

-- glen
Reply by ●May 25, 2011
>Marc2050 <maarcc@n_o_s_p_a_m.gmail.com> wrote:
>
>> I've a bunch of signal data. 4000 points.
>> I plot this 4000 points on an amplitude vs time axes.
>> After I plot them, some points are completely "out of line".
>
>Are they supposed to be that far out? If not, then remove them.
>
>> That is, visually you can see that they are sticking out. Way out
>> compare to the rest of the data. The values are something like
>> 291334000000.00 while the normal values are in the range of 36131.00.
>> As a result, when I plot them to include all points, the data are flat
>> with just a few lines that correspond to the out of line signal.
>
>> How do I adjust my axes automatically at the point when I plot them so that
>> I could see the normal points and signals like a usual plot. I dont really
>> care about the extreme values.
>
>One popular way is to graph log(y) instead of y. That is usually
>done when there is a reason why the data should be exponential,
>but sometimes it is done even when there is no known reason.
>
>-- glen

The thing is, I need this to be automated. That is, no human looks at the data and decides whether the values are extreme. And I have different sets of data that come in with different ranges. The above example is just one set.

So, I cannot just pick a threshold and throw away anything above a certain value. Doing a log graph is possible. But then again, I would be faced with a similar problem of having to decide how to automatically "zoom" to the right log scale.

I'm thinking maybe some kind of peak detection algorithm can help? Any pointers, please?
Reply by ●May 25, 2011
On May 25, 9:11 am, "Marc2050" <maarcc@n_o_s_p_a_m.gmail.com> wrote:

> Hi.
>
> I've a bunch of signal data. 4000 points.
> I plot this 4000 points on an amplitude vs time axes.
> After I plot them, some points are completely "out of line".
> That is, visually you can see that they are sticking out. Way out
> compare to the rest of the data. The values are something like
> 291334000000.00 while the normal values are in the range of 36131.00.
> As a result, when I plot them to include all points, the data are flat
> with just a few lines that correspond to the out of line signal.
>
> How do I adjust my axes automatically at the point when I plot them so that
> I could see the normal points and signals like a usual plot. I dont really
> care about the extreme values.
>
> So far, I tried using mean but because the extreme values skewed the entire
> mean too much, it isnt helpful. Any advise please?

Google for 'outlier detection'. It is a common problem in statistical data analysis. Once you determine that a point is an outlier, you can remove it from the data set.

Rune
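As one concrete instance of what a search for 'outlier detection' turns up, here is a minimal sketch of a median/MAD test (the "modified z-score"). It is only an illustration, not something proposed in the thread: the function name `mad_outlier_flags`, the cutoff of 3.5 and the toy data are assumptions, and the right cutoff depends on the data. Unlike the mean, the median and the median absolute deviation are barely moved by a few huge values, so no per-dataset threshold has to be chosen by eye.

```python
import statistics

def mad_outlier_flags(samples, cutoff=3.5):
    """Return a list of booleans, True where a sample looks like an outlier.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median. The cutoff is a rule of thumb
    (an assumption here), not a universal constant.
    """
    med = statistics.median(samples)
    mad = statistics.median(abs(x - med) for x in samples)
    if mad == 0:
        # More than half the samples are identical; flag anything that differs.
        return [x != med for x in samples]
    return [0.6745 * abs(x - med) / mad > cutoff for x in samples]

# Toy data (hypothetical): normal values near 36131 plus one huge spike.
data = [36131.0, 36090.0, 36200.0, 291334000000.0, 36150.0, 36075.0, 36180.0]
flags = mad_outlier_flags(data)
cleaned = [x for x, bad in zip(data, flags) if not bad]
```

Here only the spike is flagged, and `cleaned` keeps the other six samples unchanged.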
Reply by ●May 25, 2011
Marc2050 <maarcc@n_o_s_p_a_m.gmail.com> wrote:

>>Marc2050 <maarcc@n_o_s_p_a_m.gmail.com> wrote:

>>> I've a bunch of signal data. 4000 points.
>>> I plot this 4000 points on an amplitude vs time axes.
>>> After I plot them, some points are completely "out of line".

(snip)

> The thing is, I need this to be automated. Meaning, it is not
> seen by human and then decide if they are extreme values.
> And I have different set of data that comes in with different
> range. The above example is just one set.

For one, see the other post. But often the answer depends on the source of the data. Also, it helps to know where the big values are coming from.

In the DSP world, a common source of data is digitized audio. If the usual range is within + or - 30,000, it would be really noticeable if a value of 3,000,000,000 came through. First, it can't normally come through an ADC, even at 24 bits. As an output from an audio amplifier, 30,000 might be 100 watts, but 3,000,000,000 would be 1,000,000,000,000 watts.

> So, I cannot just do a threshold and throw away anything
> above certain values. Doing a log graph is possible.
> But then again, I would be faced with similar problem of
> having to decide how to automatically "zoom" to the
> right log scale.

Well, with a log scale you can't usually get that far off. If you have 3,000,000,000 instead of 30,000 then it is about twice as big on a log scale. You will still see the features in the graph, though you have to look just a little more carefully.

> I'm thinking maybe some kind of peaks detection algorithm
> can help? Any pointers please?

At some point it is statistical. Yes, peak detection, but then figure out how likely it is for real data to get that big.

-- glen
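The numbers in the post above can be checked with a few lines of arithmetic. This is just a worked example using the values quoted there; the variable names are made up for illustration. Power is taken to scale with the square of amplitude, which is what the 100 W to 1,000,000,000,000 W figure implies.

```python
import math

normal = 30_000            # typical amplitude from the post
spike = 3_000_000_000      # the out-of-range value from the post

# On a linear axis the spike is 100000 times taller than the normal data.
ratio_linear = spike / normal

# On a log10 axis it is only about twice as far from zero.
ratio_log = math.log10(spike) / math.log10(normal)

# Power goes as amplitude squared, so 100 W scales by ratio_linear ** 2.
power_watts = 100 * ratio_linear ** 2
```

Note that `math.log10` is only defined for positive values, so a signal that swings negative would need something like its absolute value before taking the log.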
Reply by ●May 25, 2011
On May 25, 2:11 am, "Marc2050" <maarcc@n_o_s_p_a_m.gmail.com> wrote:

> Hi.
>
> I've a bunch of signal data. 4000 points.
> I plot this 4000 points on an amplitude vs time axes.
> After I plot them, some points are completely "out of line".
> That is, visually you can see that they are sticking out. Way out
> compare to the rest of the data. The values are something like
> 291334000000.00 while the normal values are in the range of 36131.00.
> As a result, when I plot them to include all points, the data are flat
> with just a few lines that correspond to the out of line signal.
>
> How do I adjust my axes automatically at the point when I plot them so that
> I could see the normal points and signals like a usual plot. I dont really
> care about the extreme values.
>
> So far, I tried using mean but because the extreme values skewed the entire
> mean too much, it isnt helpful. Any advise please?
>
> Thank you.

A lot of research has been done on outliers, but your case _appears_ to be fairly simple. Use a MEDIAN filter. I would recommend a 5th order or 7th order filter to start. For the simple case you have, it should clear it up automatically.

Maurice Givens
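A minimal sketch of a running median filter, to show what the suggestion above does to an isolated spike. It assumes "5th order" means a 5-sample window, which is a guess at the intended meaning; the function name `median_filter`, the edge handling (the window simply shrinks near the ends) and the toy data are likewise assumptions.

```python
import statistics

def median_filter(samples, width=5):
    """Running median with an odd window; the window shrinks at the edges."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    out = []
    for i in range(len(samples)):
        # Take up to `half` samples on each side of position i.
        window = samples[max(0, i - half): i + half + 1]
        out.append(statistics.median(window))
    return out

# Toy data (hypothetical): normal values near 36131 plus one huge spike.
data = [36131.0, 36090.0, 36200.0, 291334000000.0, 36150.0, 36075.0, 36180.0]
smoothed = median_filter(data, width=5)
```

The spike disappears from `smoothed`, but samples that were never outliers can change as well (here `smoothed[2]` differs from `data[2]`), so the output is a filtered version of the data, not the original with the spike cut out.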
Reply by ●May 25, 2011
On May 25, 12:02 pm, maury <maury...@core.com> wrote:

> On May 25, 2:11 am, "Marc2050" <maarcc@n_o_s_p_a_m.gmail.com> wrote:
>
> > Hi.
> >
> > I've a bunch of signal data. 4000 points.
> > I plot this 4000 points on an amplitude vs time axes.
> > After I plot them, some points are completely "out of line".
> > That is, visually you can see that they are sticking out. Way out
> > compare to the rest of the data. The values are something like
> > 291334000000.00 while the normal values are in the range of 36131.00.
> > As a result, when I plot them to include all points, the data are flat
> > with just a few lines that correspond to the out of line signal.
> >
> > How do I adjust my axes automatically at the point when I plot them so that
> > I could see the normal points and signals like a usual plot. I dont really
> > care about the extreme values.
> >
> > So far, I tried using mean but because the extreme values skewed the entire
> > mean too much, it isnt helpful. Any advise please?
> >
> > Thank you.
>
> There was a lot of research being done on outliers, but your case
> _appears_ to fairly simple. Use a MEDIAN filter. I would recommend a
> 5th order or 7th order filter to start. For the simple case you have,
> it should clear it up automatically.
>
> Maurice Givens

By the way, adaptive filters are _very_ susceptible to impulse noise in the error signal. The median LMS works great for this. Peter Clarkson (my initial research advisor) was one of the pioneers of this algorithm. If interested, look up median LMS by Clarkson. Also, in 1991 I did an ICASSP paper using the median LMS to cure impulse noise on ADPCM channels.
Reply by ●May 25, 2011
On 5/25/2011 1:41 AM, Rune Allnor wrote:

> On May 25, 9:11 am, "Marc2050"<maarcc@n_o_s_p_a_m.gmail.com> wrote:
>> Hi.
>>
>> I've a bunch of signal data. 4000 points.
>> I plot this 4000 points on an amplitude vs time axes.
>> After I plot them, some points are completely "out of line".
>> That is, visually you can see that they are sticking out. Way out
>> compare to the rest of the data. The values are something like
>> 291334000000.00 while the normal values are in the range of 36131.00.
>> As a result, when I plot them to include all points, the data are flat
>> with just a few lines that correspond to the out of line signal.
>>
>> How do I adjust my axes automatically at the point when I plot them so that
>> I could see the normal points and signals like a usual plot. I dont really
>> care about the extreme values.
>>
>> So far, I tried using mean but because the extreme values skewed the entire
>> mean too much, it isnt helpful. Any advise please?
>
> Google for 'outlier detection'. It is a common problem
> in statistical data analysis. Once you determine that a
> point is an outlier, you can remove it from the data set.
>
> Rune

Rune's advice is sound. Take a look.

A very simple approach (conceptually) is to pass a simple FIR filter over the data with these coefficients, whose sum is zero:

[0.5 -1.0 0.5]

If the data points are "close enough" then the result is close to zero. If the center data point is "far enough removed" from those around it, then the result is large and is suspect. You have to decide what "large" is.

The obvious problem with this implementation is that data points close to the outlier can come out looking "large" as well. So, some tuning is in order. You might find that some suitable variant, e.g.

[0.1 0.1 0.1 0.1 0.1 -1.0 0.1 0.1 0.1 0.1 0.1]

or

[0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 -1.0 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05]

is more suitable for your data set, because 0.1 X the max outlier or 0.05 X the max outlier is not going to be "detected" with your threshold.

Then, it's often the case that the optimum detection threshold is at 0.5 the distance between "normal" and "too large". So, you might find that the maximum "normal" test output is 4 and the minimum outlier test output is 24. So, you'd set the threshold at 14 to avoid both false detects AND false rests. If you make it too tight then you can expect more declarations of outliers.

Run this filter over the data or use it in a streaming mode. If the output is "large" then replace the "outlier" with some simple interpolated value like [0.5 0 0.5] (a linear interpolation), which can be an output of the same filter. It might look like this:

[Block diagram: a tapped delay line of four z^-1 elements. The center tap is the Normal Output and is also scaled by -1. The four outer taps are each scaled by .25 and summed to give the Alternate Output. The -1 branch and the Alternate Output feed a second SUM, whose result is tested against "Large".]

That is, compare

y(n) = sum_{k=0}^{N-1} x(n+k)*h(k)

with the selected threshold. If it's too large, replace x(n+(N-1)/2) with:

y'(n-(N-1)/2) = sum_{k=0, k!=(N-1)/2}^{N-1} x(n+k)*h(k)

Of course, you could get fancier and use a different Alternate Output than the one that's used as part of the detector. There's no need to tie the detector and the output interpolator together as I did here.

Note: I don't vouch for the statistical implications or even the advisability of removing outliers - that's up to you. And, I don't suggest that there isn't a better way.

But, at least this is reasonably *simple*.... even though it takes some tuning for your data sets.

Fred
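A minimal sketch of the detector-plus-interpolator idea above, using the 5-tap case [.25 .25 -1 .25 .25]. The function name `despike`, its parameters, the untouched edge samples and the toy data are assumptions made for illustration, and the threshold is hand-picked for this one data set, which is exactly the tuning step the post warns about.

```python
def despike(samples, threshold, side_taps=2):
    """Flag samples that differ from the mean of their neighbours by more
    than `threshold`, and replace each flagged sample with that mean.

    With side_taps == 2 the detector taps are [.25 .25 -1 .25 .25].
    The first and last `side_taps` samples are left untouched.
    """
    samples = list(samples)
    weight = 1.0 / (2 * side_taps)       # 0.25 when side_taps == 2
    out = list(samples)
    flagged = []
    for c in range(side_taps, len(samples) - side_taps):
        neighbours = samples[c - side_taps:c] + samples[c + 1:c + side_taps + 1]
        alternate = weight * sum(neighbours)    # the "Alternate Output"
        detector = alternate - samples[c]       # zero-sum FIR output
        if abs(detector) > threshold:
            out[c] = alternate                  # replace only the suspect sample
            flagged.append(c)
    return out, flagged

# Toy data (hypothetical): normal values near 36131 plus one huge spike.
data = [36131.0, 36090.0, 36200.0, 291334000000.0, 36150.0, 36075.0, 36180.0]
cleaned, flagged = despike(data, threshold=1.0e11)
```

With this threshold only the spike is flagged and replaced by the average of its four neighbours. With a much smaller threshold the samples next to the spike would be flagged too, since a quarter of the spike leaks into their detector output.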
Reply by ●May 25, 2011
On 05/25/2011 01:41 AM, Rune Allnor wrote:

> On May 25, 9:11 am, "Marc2050"<maarcc@n_o_s_p_a_m.gmail.com> wrote:
>> Hi.
>>
>> I've a bunch of signal data. 4000 points.
>> I plot this 4000 points on an amplitude vs time axes.
>> After I plot them, some points are completely "out of line".
>> That is, visually you can see that they are sticking out. Way out
>> compare to the rest of the data. The values are something like
>> 291334000000.00 while the normal values are in the range of 36131.00.
>> As a result, when I plot them to include all points, the data are flat
>> with just a few lines that correspond to the out of line signal.
>>
>> How do I adjust my axes automatically at the point when I plot them so that
>> I could see the normal points and signals like a usual plot. I dont really
>> care about the extreme values.
>>
>> So far, I tried using mean but because the extreme values skewed the entire
>> mean too much, it isnt helpful. Any advise please?
>
> Google for 'outlier detection'. It is a common problem
> in statistical data analysis. Once you determine that a
> point is an outlier, you can remove it from the data set.

Or remove it from the data set that you use to determine your display axes, but mark the point as an outlier (or show the spike going off-screen) on the display.

Depending on what's causing them, and what you're trying to show people, of course.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
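A minimal sketch of that idea: compute the display limits from the non-outlier samples only, and return the indices of the samples that fall outside so the plotting code can mark them. The outlier test used here (quartile fences at 1.5 times the interquartile range) is an assumption chosen for illustration, as are the function name `robust_ylim`, the 5% margin and the toy data.

```python
import statistics

def robust_ylim(samples, fence=1.5, margin=0.05):
    """Return (ymin, ymax, clipped), where the limits come from samples
    inside the quartile fences and `clipped` lists the indices outside."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low_fence = q1 - fence * iqr
    high_fence = q3 + fence * iqr
    inliers = [x for x in samples if low_fence <= x <= high_fence]
    clipped = [i for i, x in enumerate(samples)
               if not low_fence <= x <= high_fence]
    lo, hi = min(inliers), max(inliers)
    pad = margin * (hi - lo) or 1.0      # avoid a zero-height axis
    return lo - pad, hi + pad, clipped

# Toy data (hypothetical): normal values near 36131 plus one huge spike.
data = [36131.0, 36090.0, 36200.0, 291334000000.0, 36150.0, 36075.0, 36180.0]
ymin, ymax, clipped = robust_ylim(data)
```

The data itself is not modified. The limits could then be handed to whatever plotting library is in use (for example matplotlib's `Axes.set_ylim`), with the samples listed in `clipped` drawn as markers at the edge of the plot or left to run off-screen.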
Reply by ●May 25, 2011
>On May 25, 2:11 am, "Marc2050" <maarcc@n_o_s_p_a_m.gmail.com> wrote:
>> Hi.
>>
>> I've a bunch of signal data. 4000 points.
>> I plot this 4000 points on an amplitude vs time axes.
>> After I plot them, some points are completely "out of line".
>> That is, visually you can see that they are sticking out. Way out
>> compare to the rest of the data. The values are something like
>> 291334000000.00 while the normal values are in the range of 36131.00.
>> As a result, when I plot them to include all points, the data are flat
>> with just a few lines that correspond to the out of line signal.
>>
>> How do I adjust my axes automatically at the point when I plot them so that
>> I could see the normal points and signals like a usual plot. I dont really
>> care about the extreme values.
>>
>> So far, I tried using mean but because the extreme values skewed the entire
>> mean too much, it isnt helpful. Any advise please?
>>
>> Thank you.
>
>There was a lot of research being done on outliers, but your case
>_appears_ to fairly simple. Use a MEDIAN filter. I would recommend a
>5th order or 7th order filter to start. For the simple case you have,
>it should clear it up automatically.
>
>Maurice Givens

But won't a MEDIAN filter modify the data? If I only want to remove, ignore, or replace the outliers WITHOUT modifying the original values of the rest of the data, is a MEDIAN filter still suitable?






