DSPRelated.com
Forums

the "wavyness" question re-formulated

Started by lluu...@yahoo.com November 24, 2006
Guys, thank you very much for your enthusiastic responses.  Really
appreciate your help.

I guess I should have specified the origin of the signals first.

My data are biomedical signals from patients of different age groups
and genders.  They vary greatly due to medical conditions and age.  I
am trying to classify the signals into several categories and to see if
there is any causal relationship between signal classes and disease or
age groups.

I have a lot of these signals (about a million signals, each with a
frame length of 1024)

By inspecting some of the signals in the time domain, here is my
impression:

1.  Some signals have drift -- maybe a constant drift.  Some signals
may have zero drift

2.  Signals might be classified into the following categories (just by
inspection in the time domain)
Category (1) Almost a straight line with some noise.
Category (2) Brownian motion (random walk) overlaid with a drift.
Category (3) An overlay of harmonic waves of several frequencies with a
zero or nonzero drift.

I guess I should re-formulate the questions as:

Question 1.  Is there a systematic approach (or perspective) for
classifying signals?

(1A) For instance, if I use the Fourier perspective, what are the
characteristics in the Fourier domain for Category (1) and Category (2)?

(1B) If I use the random walk perspective, are there any characteristics
of a random walk that can differentiate Categories (1), (2) and (3)?

Question 2.  How do I fit a random walk signal to extract its drift and
the std deviation of the normally distributed increments?

Question 3.  How do I extract the drift if the signal is an overlay of
several harmonics with a constant drift?

In summary, the following cases would have an increasing degree of
"wavyness"
(4A) Straight line  -- "ZERO wavyness"
(4B) Random walk coupled with a zero or a constant drift -- "some
wavyness"
(4C) Overlay of several harmonics with a zero or a constant drift -- "a
lot of wavyness"

These are just from my "mental classifier".  I feel that using
"wavyness" appears to be a good measure.  My goal is to classify the
signals with a consistent measure -- using the measures raised in
Questions 1, 2 and 3

Thank you again for your valuable input.

Alex

Vladimir Vassilevsky wrote:
> lluum@yahoo.com wrote:
>
> > Thank you very much for your kind input.
> >
> > However, after looking into this, variance would NOT work.
> >
> > Reason:
> > variance squares the departure from mean, a straight line may well have
> > the same variance as that of a sine wave.
>
> You can measure the standard deviation of the n-th derivative, if you like.
>
> However, as it is usual in this newsgroup, you are asking the wrong
> questions and getting the wrong answers.
>
> First, can you tell what exactly are you trying to accomplish and how
> that "wavyness" parameter is intended to be used?
>
> Vladimir Vassilevsky
>
> DSP and Mixed Signal Design Consultant
>
> http://www.abvolt.com
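
The suggestion quoted above, the standard deviation of the n-th derivative, can be approximated on sampled data by the n-th discrete difference. Here is a minimal sketch in Python with NumPy. The function name `nth_difference_std` and the two test signals are made up for illustration and are not from the thread.

```python
import numpy as np

def nth_difference_std(x, n=2):
    """Standard deviation of the n-th discrete difference of x."""
    x = np.asarray(x, dtype=float)
    return float(np.std(np.diff(x, n=n)))

t = np.arange(1024)
line = 0.01 * t + 3.0                      # straight line: constant drift
sine = np.sin(2 * np.pi * 8 * t / 1024)    # 8 cycles per 1024-sample frame

print("line:", nth_difference_std(line))   # essentially zero
print("sine:", nth_difference_std(sine))   # clearly above zero
```

With n=2 a straight line scores essentially zero whatever its slope, which addresses the objection to plain variance. Differencing also amplifies high-frequency noise, so noisy frames may need smoothing first.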
lluum@yahoo.com wrote:
> Question 1.  Is there a systematic approach (or perspectives) to
> classify signals.
The one system or perspective that works (some of the time) is that
you, the user, decide what model to use for describing any one signal,
and work from that. You can not have the computer decide for you.

Choosing signal models is a veritable black art, as you have found out,
where any analyst's personal *opinion* comes into play. You decide to
analyze signal x under model A, I might prefer model B for the same data
set. The analysis results differ, maybe significantly, but no one is
definitely right or wrong. At least when reasonably competent analysts
are involved.

The best you can do is to work through a couple of models for data
analysis, say, some for random-walk type of data and some AR analysis,
and get to know the methods. Find out how and why each method works, and
what property of the data makes it work. Test the methods with
synthetic data "favorable" for each method. Once you get a feel for how
the methods work in the favorable case, break the presumptions behind
the methods one by one, and see what effect this has on the analysis
results.

Do you think this sounds like a lot of work? That's because it is. To
paraphrase some ancient wise man (Archimedes to Alexander the Great?):
There is no royal road to DSP.

Rune
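
As a starting point for the synthetic-data testing described above, here is a minimal sketch in Python with NumPy that generates one "favorable" frame for each of the three categories in the original post. The function name `make_synthetic`, the noise level, and the harmonic frequencies and amplitudes are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def make_synthetic(kind, n=1024, drift=0.0, seed=0):
    """Return one synthetic frame of length n with a constant drift per sample."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    trend = drift * t
    if kind == "line":
        # Category (1): straight line plus a little white noise
        return trend + 0.1 * rng.standard_normal(n)
    if kind == "random_walk":
        # Category (2): cumulative sum of normally distributed increments
        return trend + np.cumsum(rng.standard_normal(n))
    if kind == "harmonics":
        # Category (3): a few sinusoids, frequencies in cycles per frame
        freqs, amps = [5, 12, 31], [1.0, 0.5, 0.25]
        waves = sum(a * np.sin(2 * np.pi * f * t / n) for f, a in zip(freqs, amps))
        return trend + waves
    raise ValueError(f"unknown kind: {kind!r}")

frames = {k: make_synthetic(k, drift=0.01) for k in ("line", "random_walk", "harmonics")}
```

Breaking the presumptions one by one then amounts to changing a single ingredient at a time, for example adding noise to the harmonics or making the drift non-constant.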
You might try performing an FFT on each data block. After sorting the data
by disease and age you might inspect the groups of data for any
commonalities that appear evident in the frequency domain. It sounds like
you have already been able to, by visual inspection, classify the data
into 3 frequency groups: high, middle and low.

Also, you mention some of the data looks like a random walk. The
implication is that the data is bouncing back and forth nearly every
other sample. This suggests the possibility that the data may not even
be valid. Presumably your data represents measured voltage at regular
intervals of time. If the signal (whatever is generating the voltage)
is changing at a rate that is too fast for your sampling time then the
data may be corrupted by aliasing.
	
-jim

jim wrote:
> Presumably your data represents measured voltage at regular
> intervals of time. If the signal (whatever it is generating the voltage)
> is changing at a rate that is too fast for your sampling time then the
> data may be corrupted by aliasing.
Are you sure the low-frequency drift isn't noise from the sensor? Look
up "baseline wander" to see if that applies to your situation. There are
many ways to take it out, from a simple high-pass filter (which can
distort the frequencies of interest) to sophisticated prediction schemes.

John
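
One of the simplest high-pass approaches of the kind mentioned above is to estimate the baseline with a moving average and subtract it. Here is a minimal sketch, assuming NumPy. The name `remove_baseline` and the default window of 101 samples are illustrative choices. Like any simple high-pass filter, it can distort slow components that are actually of interest.

```python
import numpy as np

def remove_baseline(x, window=101):
    """Subtract a centred moving-average estimate of the baseline from x."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centred")
    x = np.asarray(x, dtype=float)
    half = window // 2
    # Repeat the edge values so the output has the same length as the input
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    baseline = np.convolve(padded, kernel, mode="valid")
    return x - baseline

t = np.arange(1024)
wander = 0.02 * t                              # slow linear baseline drift
signal = np.sin(2 * np.pi * t / 16) + wander   # fast component riding on it
cleaned = remove_baseline(signal)
```

Away from the frame edges a linear baseline is removed exactly. The first and last `window // 2` samples are affected by the edge padding and are best ignored.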

lluum@yahoo.com wrote:


> Question 1.  Is there a systematic approach (or perspectives) to
> classify signals.
As Rune already noted, the classification is something that you have to decide. I can help you with the clear representation of the data and with getting numbers.
> (1A) For instance, if I use the Fourier perspective, what are the
> characteristics in the fourier domain for Category(1) and Category(2)
>
> (1B) If I use the Random walk perspective, is there any characteristics
> of Random walk that can differentiate Category(1) (2) and (3)

The Fourier transform emphasises periodic events, i.e. if an event
happens with a certain period, there will be a peak in the spectrum at
the corresponding frequency. This also applies to an infinite period,
i.e. a non-changing signal, which shows up as a peak at zero frequency
(DC). The spectrum of a random sequence is itself random.
> Question 2.  How do I fit a random walk signal to extract its drift and
> the std deviation of the normally distributed increments.
>
> Question 3.  How do I extract drift if the signal is an overlay of
> several harmonics with a constant drift
>
> In summary, the following cases would have an increasing degree of
> "wavyness"
> (4A) Straight line  -- "ZERO wavyness"
> (4B) Random walk coupled with a zero or a constant drift -- "some
> wavyness"
> (4B) Overlay of several harmonics with a zero or a constant drift -- "a
> lot of wavyness"

The straightforward way to get some numbers is to find the peaks in the
Fourier spectrum and compare them with one another and with the average
level.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
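
The idea of comparing spectral peaks with one another and with the average level could be sketched as follows in Python with NumPy. Removing a straight line first, applying a Hann window and keeping only the three largest peaks are assumptions added here, as is the name `peak_to_average`.

```python
import numpy as np

def peak_to_average(x, n_peaks=3):
    """Return (FFT bin numbers, peak/average ratios) of the largest spectral peaks."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    x = x - np.polyval(np.polyfit(t, x, 1), t)               # remove offset and drift
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))[1:]    # drop the DC bin
    # A peak is a bin whose magnitude is larger than both of its neighbours
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    peak_idx = np.flatnonzero(is_peak) + 1                   # indices into mag
    strongest = peak_idx[np.argsort(mag[peak_idx])[::-1][:n_peaks]]
    # mag[i] holds FFT bin i + 1 because the DC bin was dropped
    return strongest + 1, mag[strongest] / mag.mean()

rng = np.random.default_rng(0)
t = np.arange(1024)
x = np.sin(2 * np.pi * 20 * t / 1024) + 0.1 * rng.standard_normal(1024) + 0.01 * t
bins, ratios = peak_to_average(x)
print(bins, ratios)
```

A frame dominated by a few harmonics should show a handful of large ratios, while a noise-like frame should show ratios that are all of similar, modest size.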
Something that has not been mentioned yet:  It is not uncommon to get
data that has some straight-line offset + drift just from the sensor.
With a finite chunk of data such as you have, often the best thing to do
is to fit a straight line to the data and subtract it out.  In your
case, the RMS value of the residue would be a direct measure of
"wigglieness", but wouldn't distinguish between a sine wave and a random
walk.

If you subtract out your straight line then window the data and perform
an FFT, periodic signals should have their energy concentrated in a few
bins, a true random walk would have the energy concentrated toward DC,
and white noise would spread the energy out over most of the spectrum.

Were I you, I would write a framework in my chosen analysis package (I
use Scilab; MATLAB or Octave are also quite suitable), then I would
choose several identification strategies, starting with just a measure
of the residual RMS energy after subtracting out the straight line.  I
would run the data through, and compare the resulting numbers against my
database of age/disease/etc. and look for correlations.  If I could, I
would automate this process, so that whenever someone said "Hey!  What
about using Murgatroyd's Gfligporb Transform!" I could just plug it in
and give it a whirl without disturbing the rest of my code.

I would also study the sensor used, to see if I had any opportunity to
distinguish sensor artifacts from real test data.  Then I would study
the test protocol to see if there are any common artifacts that one
could expect there (I understand that the sloping baseline thing is
common with biomedical measurements).  Then I would sit down and try to
think of what other effects may be happening to disturb the
measurements.  Then I would cook up ways to filter those effects out of
my data.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Posting from Google?  See http://cfaj.freeshell.org/google/

"Applied Control Theory for Embedded Systems" came out in April.
See details at http://www.wescottdesign.com/actfes/actfes.html
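
The framework idea above (fit and subtract a straight line, then run pluggable measures on the residue) might look roughly like this in Python with NumPy rather than Scilab. All names here (`detrend_linear`, `residual_rms`, `spectral_concentration`, `FEATURES`, `analyse`) are hypothetical, and the "energy in the 5 strongest bins" measure is just one possible way to quantify how concentrated the spectrum is.

```python
import numpy as np

def detrend_linear(x):
    """Fit a straight line by least squares; return (residue, slope)."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept), slope

def residual_rms(residue):
    """RMS of the residue: a direct measure of 'wigglieness'."""
    return float(np.sqrt(np.mean(residue ** 2)))

def spectral_concentration(residue, n_bins=5):
    """Fraction of windowed spectral energy held by the n_bins strongest bins."""
    power = np.abs(np.fft.rfft(residue * np.hanning(len(residue)))) ** 2
    total = power.sum()
    return float(np.sort(power)[-n_bins:].sum() / total) if total > 0 else 0.0

# Registry of identification strategies; a new one is plugged in by adding an entry
FEATURES = {"rms": residual_rms, "concentration": spectral_concentration}

def analyse(frame, features=None):
    """Detrend one frame and evaluate every registered feature on the residue."""
    features = FEATURES if features is None else features
    residue, slope = detrend_linear(np.asarray(frame, dtype=float))
    result = {"slope": float(slope)}
    result.update({name: func(residue) for name, func in features.items()})
    return result

t = np.arange(1024)
print(analyse(0.05 * t + 2.0 + np.sin(2 * np.pi * 10 * t / 1024)))
```

Trying another measure is then a one-line change such as `FEATURES["my_measure"] = my_function`, and the resulting dictionaries can be compared against the age/disease database.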
In article <CA%9h.215$Py2.79@newssvr27.news.prodigy.net>,
Vladimir Vassilevsky  <antispam_bogus@hotmail.com> wrote:
>The straightforward way to get some numbers is to find the peaks of the
>Fourier chart and compare them one to another and to the average level.
Another possible approach would be to sum the signal power between two
frequency limits (maybe expressing this as a proportion of total power).
It sounds like the lower frequency limit might be close to 0 (DC) in
this case.

Francis
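
The band-power proportion described above can be sketched as follows, assuming NumPy. The name `band_power_fraction`, the default sampling rate of 1 (so frequencies are in cycles per sample) and the example band limits are assumptions made for illustration.

```python
import numpy as np

def band_power_fraction(x, f_lo, f_hi, fs=1.0):
    """Fraction of total signal power between f_lo and f_hi (inclusive)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    power[1:] *= 2.0              # account for the negative-frequency partners
    if len(x) % 2 == 0:
        power[-1] /= 2.0          # the Nyquist bin has no partner
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0

t = np.arange(1024)
x = np.sin(2 * np.pi * 64 * t / 1024)          # 0.0625 cycles per sample
print(band_power_fraction(x, 0.05, 0.08))      # nearly all of the power
print(band_power_fraction(x, 0.20, 0.50))      # almost none
```

No window is applied here, so a tone that does not complete a whole number of cycles in the frame will leak some power outside a narrow band.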
Tim Wescott wrote:
> Something that has not been mentioned yet:  It is not uncommon to get
> data that has some straight-line offset + drift just from the sensor.
> With a finite chunk of data such as you have, often the best thing to do
> is to fit a straight line to the data and subtract it out.
Once offsets and drifts are removed via a linear fit, unsupervised
learning via clustering and PCA can be used to visualize the
separability of the classes. Then neural networks using NEWFF and NEWRB
(MATLAB Neural Network Toolbox functions for feed-forward and
radial-basis networks) can be used to create classifiers. If you decide
to use neural networks see my post on pretraining advice.

Hope this helps.

Greg
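
The PCA step mentioned above can be sketched with a plain SVD in NumPy; clustering and the neural-network classifiers are not shown. The function name `pca_project` and the toy feature matrix are assumptions. In practice each row would hold the features computed for one detrended frame.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of a (signals x features) matrix onto the leading principal axes."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                        # centre every feature column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s ** 2
    total = variance.sum()
    explained = variance / total if total > 0 else np.zeros_like(variance)
    scores = Xc @ Vt[:n_components].T              # coordinates for a scatter plot
    return scores, explained[:n_components]

# Toy data: 200 "signals" whose 3 features vary almost entirely along one direction
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
X = np.outer(a, [1.0, 2.0, 3.0]) + 0.01 * rng.standard_normal((200, 3))
scores, explained = pca_project(X)
print(explained)
```

Plotting the two columns of `scores` against each other, coloured by age or disease group, gives a first impression of whether the classes separate. Features on very different scales would normally be standardized first.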
<lluum@yahoo.com> wrote in message 
news:1164426337.856632.190440@14g2000cws.googlegroups.com...
> Question 1.  Is there a systematic approach (or perspectives) to
> classify signals.
***There are.  I believe all or most of them, including using neural
nets, depend on using a basis set or signal model or .... something like
that.  Then you look for a "fit" to a weighted version of the basis
functions.  This can be in the time domain or it can be in the frequency
domain.  For example, you can correlate in time with a sinusoid or you
can multiply in frequency by a narrow bandpass filter to yield the same
detection result if you're looking for a sinusoid.  Sums of sinusoids
can be treated similarly.  Of course, the Fourier Transform is rather
the ultimate determiner of sinusoidal coefficients - a tour de force in
that regard.
> (1A) For instance, if I use the Fourier perspective, what are the
> characteristics in the fourier domain for Category(1) and Category(2)

***You should be able to construct a test signal to see the result for
yourself.  In the magnitudes of the FFT there will be a large
zero-frequency component with low-frequency components tapering as
frequency increases - for a straight line or an "almost" straight line -
and maybe some fairly flat (on the average) values representing the
noise.
> (1B) If I use the Random walk perspective, is there any characteristics
> of Random walk that can differentiate Category(1) (2) and (3)
***I don't know what you mean by a "Random walk perspective". I would equate Random walk and white Gaussian noise so I don't know what a "white Gaussian noise perspective" would be. That said: (1) and (2) will look similar perhaps with more spread energy in (2) based on what you seem to be emphasizing in (2). (3) will show peaks of energy at distinct frequencies - if the noise isn't too great.
> Question 2.  How do I fit a random walk signal to extract its drift and
> the std deviation of the normally distributed increments.

***Like any other signal.  As I mentioned in an earlier post and others
have mentioned:  You can fit a straight line to the data if you imagine a
steady drift - and subtract it out.  You can compute a standard deviation
after having done this subtraction.

To go further:
- you can remove the mean from the model f(t) = Mean + zero1(t)
- you can remove a straight line from the model f(t) = mt + Mean + zero2(t)
- you can remove a 2nd order curve from the model
  f(t) = nt^2 + mt + Mean + zero3(t)
and so forth....
zerox(t) will go toward zero as the order gets high enough.
> Question 3.  How do I extract drift if the signal is an overlay of
> several harmonics with a constant drift

***As above.  You decide which order model is reasonable for the drift.
Or get wild and assume all orders up to some limit and compute them all.
You can compare the results to see what is helpful/useful.  What you
*can't* do is remove some "thing" you've decided to call "drift" without
knowing what it is.  So, this approach allows you to remove something
that may be *like* what you would call drift and give a useful result.
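
The polynomial drift models listed above can be compared with a short sketch in Python with NumPy. The helper `remove_polynomial` fits on a time axis rescaled to [-1, 1] for numerical conditioning, which is an added assumption, so its coefficients are in those rescaled units. The second helper, `random_walk_params`, is not from the reply: it is an alternative for Question 2 that takes the stated model (constant drift plus normally distributed increments) literally and estimates both parameters from the first differences.

```python
import numpy as np

def remove_polynomial(x, order):
    """Fit and subtract a polynomial of the given order; return (residue, coeffs)."""
    x = np.asarray(x, dtype=float)
    t = np.linspace(-1.0, 1.0, len(x))     # rescaled time axis (assumption)
    coeffs = np.polyfit(t, x, order)
    return x - np.polyval(coeffs, t), coeffs

def random_walk_params(x):
    """Estimate (drift per sample, std of increments) from first differences."""
    steps = np.diff(np.asarray(x, dtype=float))
    return float(steps.mean()), float(steps.std(ddof=1))

rng = np.random.default_rng(0)
walk = np.cumsum(0.3 + 2.0 * rng.standard_normal(20000))   # drift 0.3, sigma 2.0

for order in (0, 1, 2):
    residue, _ = remove_polynomial(walk, order)
    print(order, residue.std())            # residue shrinks as the order grows

print(random_walk_params(walk))
```

The mean of the increments equals (last sample - first sample) / (N - 1), so for a 1024-sample frame the drift estimate of a true random walk will be fairly noisy whichever method is used.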
> In summary, the following cases would have an increasing degree of
> "wavyness"
> (4A) Straight line  -- "ZERO wavyness"
***OK. This one makes sense. The energy will be at the low end in frequency.
> (4B) Random walk coupled with a zero or a constant drift -- "some
> wavyness"
***The energy will be evenly distributed in frequency on the average but not a straight line
> (4B) Overlay of several harmonics with a zero or a constant drift -- "a
> lot of wavyness"
***The energy will be clumped in frequency - as in perhaps spikes.
> These are just from my "mental classifier".  I feel that using
> "wavyness" appears to be a good measure.  My goal is to classify the
> signals with a consistent measure -- using the measures raised in
> Questions 1, 2 and 3

***OK, so your quest may be dealt with by computing an FFT and then
asking the question (testing for): "is the result characterized by
low-frequency energy only, uniformly distributed energy, or more peaky
energy?"

A simple classifier might subtract the mean of the frequency magnitudes,
set a threshold above zero and count the number of incidents (or the sum
of the magnitudes) where the remaining magnitudes exceed the threshold -
over all frequencies and just at low frequencies and/or just above the
low frequencies.

If the threshold is set appropriately and if the signal to noise ratio
is reasonable then:
- a flat line with drift will have a high score at low frequencies only.
- a noiselike signal will have a low score
- a sinusoidal composite will have a higher score above the low
  frequencies.

There are obviously a few parameters to set including:
- the length of the FFTs
- the threshold for detection/counting/summing
etc.

Fred
Fred Marshall wrote:
> <lluum@yahoo.com> wrote in message
>
> > Question 1.  Is there a systematic approach (or perspectives) to
> > classify signals.
>
> ***There are.  I believe all or most of them, including using neural nets,
> depend on using a basis set or signal model or .... something like that.

All signal analysis methods -- all -- depend on the user specifying in
advance what "basis set" or "signal model" to evaluate the data against,
as well as a strategy for the implementation (neural net, DFT, something
else).

The OP is asking whether these choices of signal models and analysis
strategies can be delegated to the computer. They can not.

Rune