DSPRelated.com
Forums

DFT with few samples

Started by MJ1978 February 4, 2007
Hello, comp.dsp friends,

I'm using DFT with few samples (5 or so) to determine how much
frequency information is contained. Based on my not-so-rigorous
defintion, I use area of the frequency domain magnitude to measure the
information. Therefore, [1 1 1 1 1] contains the least information,
and [0 0 1 0 0] contains the most.

My problem is that DFT with few samples can't give precise results. I
searched the web and comp.dsp, and I found that:

(1) Warped DFT might help in analyzing a short sequence. However, it
looks somewhat complex, and I don't want to spend the time coding it
before I'm sure that no other methods can be used.

(2) A thread in this group named "DFT and FFT as estimates of Fourier
Series Coefficients" seemed to cover the same problem. Specifically,
Eric Bujold mentioned that one can mirror the sequence to get better
results. However, in some cases, it still can't produce the most
correct results, which I can get (but not always) by using freqz( ) in
Matlab. I know that even Mr. Bujold didn't say that it will guarantee
something, but if I can't get correct results, I'd rather not use it.

freqz( ) in Matlab seems to do DFT with zero padding. But zero padding
will make [1 1 1 1 1] no more minimal in terms of area of the
frequency domain area. There are other 5-sample sequences whose area
are smaller.

I believe that there should be some method which can deal with my
problem, but I just can't find it. I will appreciate any comment or
reference. Thanks for your time :-)

Sincerely,
Merlin

"MJ1978" <merlin.jiang1978@gmail.com> wrote in message
news:1170590862.020782.318060@q2g2000cwa.googlegroups.com...
> Hello, comp.dsp friends,
> (2) A thread in this group named "DFT and FFT as estimates of Fourier > Series Coefficients"
Only if the signal is periodic. If it is a-periodic then there is no Fourier series (only the FT exists). F. -- Posted via a free Usenet account from http://www.teranews.com
Merlin wrote:
> Hello, comp.dsp friends, > > I'm using DFT with few samples (5 or so) to determine how much > frequency information is contained. Based on my not-so-rigorous > defintion, I use area of the frequency domain magnitude to measure the > information. Therefore, [1 1 1 1 1] contains the least information, > and [0 0 1 0 0] contains the most.
The smaller the area the higher the "information"?
> > My problem is that DFT with few samples can't give precise results.
I think your problem is elsewhere. Your measure for "information" content as a way to rank signals is essentially the power of a signal - ie. if a signal x1 has greater power than a signal x2, x1 will by your definition also contain more (or less?) "information".
> I believe that there should be some method which can deal with my > problem, but I just can't find it. I will appreciate any comment or > reference. Thanks for your time :-)
What exactly is the problem? What is this information measure supposed to accomplish? Regards, Andor

Merlin wrote:

> > > I'm using DFT with few samples (5 or so) to determine how much > > > frequency information is contained. Based on my not-so-rigorous > > > defintion, I use area of the frequency domain magnitude to measure the > > > information. Therefore, [1 1 1 1 1] contains the least information, > > > and [0 0 1 0 0] contains the most. > > > > The smaller the area the higher the "information"? > > By my definition, the smaller the area of the frequency domain magnitude, the > lower the information. [1 1 1 1 1] will result in a single non-zero term in the > frequency domain, while [0 0 1 0 0] will result in 5 non-zero terms, if I use > 5-point DFT. Therefore [1 1 1 1 1] contains less information than [0 0 1 0 0] > does.
Ok, I thought those samples represented magnitude in frequency domain.
> > > My problem is that DFT with few samples can't give precise results. > > > I think your problem is elsewhere. Your measure for "information" > > content as a way to rank signals is essentially the power of a signal > > - ie. if a signal x1 has greater power than a signal x2, x1 will by > > your definition also contain more (or less?) "information". > > The accuracy of my definition of "information" is not my problem here. > My issue is that I can't find good ways to do DFT with few samples. > Well, maybe my definition of "information" is bad, but it is another issue.
Ok, let's keep aside your definition of "information". A DFT is a well defined operation - there are no "good" or "bad" ways to calculate a DFT (well there are, but the results are the same nevertheless). You probably want to do frequency estimation, and there are other, more efficient (meaning you need less samples) methods than the DFT, but they are based on assuming that your signal contains a specific number of sinusoids in noise, or some other tightly specified scenario. These are called model based methods. Although with only 5 samples you can't expect much significance in your results.
> > > I believe that there should be some method which can deal with my > > > problem, but I just can't find it. I will appreciate any comment or > > > reference. Thanks for your time :-) > > > > What exactly is the problem? What is this information measure supposed > > to accomplish? > > Sorry for not expressing myself clearly, but as stated in my second paragraph, > my problem is how to get good results in DFT with few samples. On the other > hand, my information measure is supposed to provide a guide as how much > information except DC exists in the short sequence of signals. I don't want to > use the summation of squared magnitude in the frequency domain, because in > this way, the information is determined by the absolute values of the samples > (by Parseval's theorem). [0.5 0.5 0.5 0.5 0.5] should not provide more > information than [0.1 0.1 0.2 0.1 0.1]. So I choose to use the summation of the > magnitude in the frequency domain, i.e. the area of the frequency domain > magnitide.
I first thought you wanted to use the sum of magnitudes squared but you are interested in the sum of magnitudes as a measure. However, you still haven't specified what you want to do. What does "good result in DFT" mean to you? Good in what respect? What do you mean by "how much information except DC exists in the short sequence of signals"? What information? Why except DC (this is new, you didn't mention it before)? I'm sorry, but I can't help you unless you come out and say what you want directly (without using the terms "information" and "good", because they probably have completly different meaning for me and you). Regards, Andor
Hello,

Thanks for pointing it out. My main purpose is to mention that thread
such that others might want to take a look at it afterwards.

Sincerely,
Merlin

On 2=A4=EB5=A4=E9, =A4W=A4=C85=AE=C906=A4=C0, "Fitlike Min" <Fitl...@naeopt=
ion.com> wrote:
> "MJ1978" <merlin.jiang1...@gmail.com> wrote in message > > news:1170590862.020782.318060@q2g2000cwa.googlegroups.com... > > > Hello, comp.dsp friends, > > (2) A thread in this group named "DFT and FFT as estimates of Fourier > > Series Coefficients" > > Only if the signal is periodic. If it is a-periodic then there is no Four=
ier
> series (only the FT exists). > > F. > > -- > Posted via a free Usenet account fromhttp://www.teranews.com
On 2=A4=EB5=A4=E9, =A4U=A4=C84=AE=C956=A4=C0, "Andor" <andor.bari...@gmail.=
com> wrote:
> Ok, let's keep aside your definition of "information". A DFT is a well > defined operation - there are no "good" or "bad" ways to calculate a > DFT (well there are, but the results are the same nevertheless). You > probably want to do frequency estimation, and there are other, more > efficient (meaning you need less samples) methods than the DFT, but > they are based on assuming that your signal contains a specific number > of sinusoids in noise, or some other tightly specified scenario. These > are called model based methods. Although with only 5 samples you can't > expect much significance in your results.
Yes, what I'm trying to do is spectral analysis/estimation, and DFT is the simplest among the estimation methods I know.
> I first thought you wanted to use the sum of magnitudes squared but > you are interested in the sum of magnitudes as a measure. However, you > still haven't specified what you want to do. What does "good result in > DFT" mean to you? Good in what respect?
Good results in DFT are accurate ones. Well, since DFT is simply one of the methods for spectral analysis, I should say that good results of one method mean that they are closer to results of freqz( ) in Matlab than those of other methods, except in the case of [1 1 1 1 1]. As I've guessed, freqz( ) uses DFT with zero-padding to generate smooth magnitude and phase responses. In my application, the result of freqz( ) stands for the correct answer of the spectra for my 5-sample sequence, except [1 1 1 1 1]. In other words, I use freqz( ) to check if I get the correct answer from any spectral analysis method, which is DFT w/ sample mirroring but w/o zero padding so far. But since I only have 5 samples in each sequence, I will accept a method if it can provide me accurate "relative" results. An accurate relative result means that if A's area is larger than B's by using freqz( ), A's area is still larger than B's by using another method, where A and B are both 5-sample sequences. Of course, I still require that [1 1 1 1 1] has the smallest area as other DC-only sequences, no matter what the spectral analysis method is.
> > What do you mean by "how much information except DC exists in the > short sequence of signals"? What information? Why except DC (this is > new, you didn't mention it before)?
(1) The information is the area of the magnitude response. Before applying DFT, I do DC normalization such that every spectrum will have 1 in the DC term. Therefore, DC means nothing in the area summation. (2) In my application, DC doesn't provide any information. (Sorry but I have to use the term "information." It is an application-specific term in this post, and I define it by my own as I mentioned in (1).) So I will treat [1 1 1 1 1] the same as [0 0 0 0 0].
> > I'm sorry, but I can't help you unless you come out and say what you > want directly (without using the terms "information" and "good", > because they probably have completly different meaning for me and > you).
I didn't mean to mislead people like you who want to help, but I seemed to mix too many application-specific terms in my post. Maybe my problem should be restated as "How to do spectral analysis with only few samples?" I browsed the spectral analysis tools provided by Matlab and found that the modified covariance method (MCM) produces accurate "relative" results in most but not all cases for my application. Is MCM one of the model based methods, as you've mentioned? If you're interested in knowing exactly why I define "the information" in that way, I'm sorry to say that I'm also experimenting on it. I need some sort of measurement to let me know what a short sequence has in the frequency domain (or other domains, if I can figure them out). I choose the one I mention here and I'm still working on it. Thanks for your precious time :-) Sincerely, Merlin
On 2=A4=EB5=A4=E9, =A4U=A4=C84=AE=C956=A4=C0, "Andor" <andor.bari...@gmail.=
com> wrote:
> Ok, let's keep aside your definition of "information". A DFT is a well > defined operation - there are no "good" or "bad" ways to calculate a > DFT (well there are, but the results are the same nevertheless). You > probably want to do frequency estimation, and there are other, more > efficient (meaning you need less samples) methods than the DFT, but > they are based on assuming that your signal contains a specific number > of sinusoids in noise, or some other tightly specified scenario. These > are called model based methods. Although with only 5 samples you can't > expect much significance in your results.
Yes, what I'm trying to do is spectral analysis/estimation, and DFT is the simplest among the estimation methods I know.
> I first thought you wanted to use the sum of magnitudes squared but > you are interested in the sum of magnitudes as a measure. However, you > still haven't specified what you want to do. What does "good result in > DFT" mean to you? Good in what respect?
Good results in DFT are accurate ones. Well, since DFT is simply one of the methods for spectral analysis, I should say that good results of one method mean that they are closer to results of freqz( ) in Matlab than those of other methods, except in the case of [1 1 1 1 1]. As I've guessed, freqz( ) uses DFT with zero-padding to generate smooth magnitude and phase responses. In my application, the result of freqz( ) stands for the correct answer of the spectra for my 5-sample sequence, except [1 1 1 1 1]. In other words, I use freqz( ) to check if I get the correct answer from any spectral analysis method, which is DFT w/ sample mirroring but w/o zero padding so far. But since I only have 5 samples in each sequence, I will accept a method if it can provide me accurate "relative" results. An accurate relative result means that if A's area is larger than B's by using freqz( ), A's area is still larger than B's by using another method, where A and B are both 5-sample sequences. Of course, I still require that [1 1 1 1 1] has the smallest area as other DC-only sequences, no matter what the spectral analysis method is.
> > What do you mean by "how much information except DC exists in the > short sequence of signals"? What information? Why except DC (this is > new, you didn't mention it before)?
(1) The information is the area of the magnitude response. Before applying DFT, I do DC normalization such that every spectrum will have 1 in the DC term. Therefore, DC means nothing in the area summation. (2) In my application, DC doesn't provide any information. (Sorry but I have to use the term "information." It is an application-specific term in this post, and I define it by my own as I mentioned in (1).) So I will treat [1 1 1 1 1] the same as [0 0 0 0 0].
> > I'm sorry, but I can't help you unless you come out and say what you > want directly (without using the terms "information" and "good", > because they probably have completly different meaning for me and > you).
I didn't mean to mislead people like you who want to help, but I seemed to mix too many application-specific terms in my post. Maybe my problem should be restated as "How to do spectral analysis with only few samples?" I browsed the spectral analysis tools provided by Matlab and found that the modified covariance method (MCM) produces accurate "relative" results in most but not all cases for my application. Is MCM one of the model based methods, as you've mentioned? If you're interested in knowing exactly why I define "the information" in that way, I'm sorry to say that I'm also experimenting on it. I need some sort of measurement to let me know what a short sequence has in the frequency domain (or other domains, if I can figure them out). I choose the one I mention here and I'm still working on it. Thanks for your precious time :-) Sincerely, Merlin
This is a typical problem that AR model tries to solve. With few
samples, DFT lacks capability to extract frequency information. AR
Model (Auto-Regressive) model will be able to do better job. One of
the methods to get the estimation to an AR model is called Maximum
Entropy method.

Try google search "maximum entropy spectrum analysis"

http://mathworld.wolfram.com/MaximumEntropyMethod.html


Merlin wrote:
> On 2=E6=9C=885=E6=97=A5, =E4=B8=8B=E5=8D=884=E6=99=8256=E5=88=86, "Andor"=
wrote:
> > > Ok, let's keep aside your definition of "information". A DFT is a well > > defined operation - there are no "good" or "bad" ways to calculate a > > DFT (well there are, but the results are the same nevertheless). You > > probably want to do frequency estimation, and there are other, more > > efficient (meaning you need less samples) methods than the DFT, but > > they are based on assuming that your signal contains a specific number > > of sinusoids in noise, or some other tightly specified scenario. These > > are called model based methods. Although with only 5 samples you can't > > expect much significance in your results. > > Yes, what I'm trying to do is spectral analysis/estimation, and DFT is > the simplest among the estimation methods I know. > > > I first thought you wanted to use the sum of magnitudes squared but > > you are interested in the sum of magnitudes as a measure. However, you > > still haven't specified what you want to do. What does "good result in > > DFT" mean to you? Good in what respect? > > Good results in DFT are accurate ones. Well, since DFT is simply one > of the methods for spectral analysis, I should say that good results > of one method mean that they are closer to results of freqz( ) in > Matlab than those of other methods, except in the case of [1 1 1 1 1].
Ok, that's a fair definition of good - as close to the result of freqz() as possible. This begs the question (sorry Jerry): if you like the results of freqz() (ie. zero-padded DFT), why don't you use it?
> As I've guessed, freqz( ) uses DFT with zero-padding to generate > smooth magnitude and phase responses. In my application, the result of > freqz( ) stands for the correct answer of the spectra for my 5-sample > sequence, except [1 1 1 1 1]. In other words, I use freqz( ) to check > if I get the correct answer from any spectral analysis method, which > is DFT w/ sample mirroring but w/o zero padding so far.
DFT with sample mirroring is DCT. I'm not sure what this is supposed to buy you.
> But since I only have 5 samples in each sequence, I will accept a > method if it can provide me accurate "relative" results. An accurate > relative result means that if A's area is larger than B's by using > freqz( ), A's area is still larger than B's by using another method, > where A and B are both 5-sample sequences. Of course, I still require > that [1 1 1 1 1] has the smallest area as other DC-only sequences, no > matter what the spectral analysis method is.
I still don't understand this sentence. [1 1 1 1 1] has "area" equal to 5 (the DC value). Obviously, 5*[1 1 1 1 1] has 5 times this "area". This is not good?
> > > > > What do you mean by "how much =C2=A0information except DC exists in the > > short sequence of signals"? What information? Why except DC (this is > > new, you didn't mention it before)? > > (1) The information is the area of the magnitude response. Before > applying DFT, I do DC normalization such that every spectrum will have > 1 in the DC term. Therefore, DC means nothing in the area summation.
Well no, it means 1 in the summation. Depending on what else you have in the spectrum, this could well be dominant. Have you tried subtracting the DC value (ie. just ignoring the DC bin in the calculation of the area)? What happens if the DC is zero (or very small, ie. some low-cut spectrum)? Normalizing by this value would blow the "area" of the spectrum up disproportionally.
> > (2) In my application, DC doesn't provide any information. (Sorry but > I have to use the term "information." It is an application-specific > term in this post, and I define it by my own as I mentioned in (1).) > So I will treat [1 1 1 1 1] the same as [0 0 0 0 0].
You lost me. I really haven't the faintest clue what you are trying to achieve. Regards, Andor
Hello Andor and DigitalSignal,

Let me try to explain my purpose for asking this question in an
example:

I have five 5-sample sequences. Each sample has its value in the range
from 0 to 31.

s1 = [20 20 20 20 20];
s2 = [30 30 30 30 30];
s3 = [10 20 10 20 10];
s4 = [15  5 15  5 15];
s5 = [5  15  5 15  5];

My purpose is to determine which contains the least frequency
information. And I define frequency information as the magnitude area
in the frequency domain.

There's no good or bad information. All I want to know is which one
contains the least information. The exact amount of the information
doesn't matter that much.

Now I need a method to estimate the spectra of the five sequences, in
order to know their area of magnitude responses.

Since the sequences are too short, I can't rely on 5-point DFT. The
other choices I know are:
(1) DFT with zero padding: It interpolates the magnitude responses.
But it transforms s1 and s2 to sinc responses. Because s1 and s2 are
DC-only sequences, I don't want non-zero terms except on DC.
(2) DFT with sample mirroring: It also has the effect of interpolating
the magnitude response, while s1 and s2 won't be transformed to sinc
responses. But the frequency resolution is still too low to give
accurate results or accurate relative results.
(3) Modified covariance method and Burg method (w/ order 2): As
suggested by Matlab, they are good estimation methods for short
sequences. However,
         Modified covariance       Burg
 X      max(pmcov(X,2,512))     max(pburg(X,2,512)
----------------------------------------------------------------------------------------------------------
s1         3.6699E+17              NaN (divided by zero error
internally)
s2         4.8932E+17              NaN (divided by zero error
internally)
s3                0                           0
s4         1.2037E+17                  0
s5         1.3832E+18                  0

(pmcov( ) and pburg( ) are the functions for modified covariance
method and Burg method in Matlab, respectively.)

The results for the five sequences make me wonder if I should use any
of the two methods for spectral analysis.

So I raise my question: How to do spectral analysis with few samples?
Is there a method which can overcome all the disadvantages that I
mention above?

Assume that I find such a method. And I'll expect my answer for the
five sequences as:
s1 and s2 contain the least information. Their information amounts are
the same.
s4 and s5 contain the most information. Their information amounts are
the same.

> > As I've guessed, freqz( ) uses DFT with zero-padding to generate > > smooth magnitude and phase responses. In my application, the result of > > freqz( ) stands for the correct answer of the spectra for my 5-sample > > sequence, except [1 1 1 1 1]. In other words, I use freqz( ) to check > > if I get the correct answer from any spectral analysis method, which > > is DFT w/ sample mirroring but w/o zero padding so far. > > DFT with sample mirroring is DCT. I'm not sure what this is supposed > to buy you.
It is better than zero-padding, since it transforms s1 and s2 as non- zero on DC and zero elsewhere.
> > > But since I only have 5 samples in each sequence, I will accept a > > method if it can provide me accurate "relative" results. An accurate > > relative result means that if A's area is larger than B's by using > > freqz( ), A's area is still larger than B's by using another method, > > where A and B are both 5-sample sequences. Of course, I still require > > that [1 1 1 1 1] has the smallest area as other DC-only sequences, no > > matter what the spectral analysis method is. > > I still don't understand this sentence. [1 1 1 1 1] has "area" equal > to 5 (the DC value). Obviously, 5*[1 1 1 1 1] has 5 times this "area". > This is not good?
As for information, I don't determine whether it is good or bad. Although 5*[1 1 1 1 1] has a larger area than [1 1 1 1 1], I treat them as containing the same amount of information, because they both are DC-only.
> > (1) The information is the area of the magnitude response. Before > > applying DFT, I do DC normalization such that every spectrum will have > > 1 in the DC term. Therefore, DC means nothing in the area summation. > > Well no, it means 1 in the summation. Depending on what else you have > in the spectrum, this could well be dominant. Have you tried > subtracting the DC value (ie. just ignoring the DC bin in the > calculation of the area)? What happens if the DC is zero (or very > small, ie. some low-cut spectrum)? Normalizing by this value would > blow the "area" of the spectrum up disproportionally.
You are right. But I have to find a way to normalize the magnitude responses such that s1 and s2 are the same in the frequency domain. Maybe I should simply subtract the mean values for all sequences before I do any transform?
> > (2) In my application, DC doesn't provide any information. (Sorry but > > I have to use the term "information." It is an application-specific > > term in this post, and I define it by my own as I mentioned in (1).) > > So I will treat [1 1 1 1 1] the same as [0 0 0 0 0]. > > You lost me. I really haven't the faintest clue what you are trying to > achieve.
I'm not sure if you know what I want better now. Thank you for your precious time and kind help. At least I know that my communication skill is @#*$&^ and I'll try to improve it. Sincerely, Merlin