
vector quantization problem/question

Started by John December 1, 2005
Hi

I am working on a project about noise suppression. Part of the algorithm
that I am designing is based on vector quantization. I am wondering which
training algorithm I should use.

Here is an example:

If I have a 3D vector (x, y, z) and each coordinate of this vector is random
but limited to an interval from, say, 0 to C, where C is a constant, the
directional variance will not be even roughly the same for arbitrarily
chosen directions.

This is a problem if I know that the "true" centroid (codeword) is located
on the edge of the 3D box. If I use the k-means algorithm to calculate the
centroids, I won't get the optimal result, because the mass of points within
the box will "pull" on the centroids while there is nothing outside the box
to "pull" in the other direction. So how do I solve this problem?

Thank you.


"John" <joehatesspam@nospam.spamshit> writes:

> I am working on a project about noise suppression. Part of the algorithm
> that I am designing is based on vector quantization. I am wondering which
> training algorithm I should use?
>
> Here is an example:
>
> If I have a 3D-vector (x,y,z) and each coordinate of this vector is random
> but limited to an interval from say 0 to C where C is a constant, the
> directional variance will not be more or less the same when choosing
> arbitrary directions.

Careful! Unless you're choosing uniformly distributed _angles_, the directions you choose will not be uniform (arbitrary). The way you've described it, the directions of the vectors will not be arbitrary (for a start, they're all in the positive [x>0, y>0, z>0] octant).
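Peter's point about the directions can be checked numerically. This Python sketch (an illustration, not code from the thread) compares the directions of points drawn uniformly from the box [0, C]^3 with directions that are uniformly distributed over the sphere, obtained by normalising an isotropic Gaussian vector (one standard way to draw uniform directions). Every box direction falls in the positive octant, while uniform directions do so only about 1/8 of the time. N, C and the seed are arbitrary.

```python
import math
import random

rng = random.Random(0)
N = 20000
C = 1.0

def in_positive_octant(v):
    # boundary included, since a coordinate drawn from uniform(0, C) can in principle be 0
    return all(x >= 0 for x in v)

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Directions of points drawn uniformly from the box [0, C]^3
box_dirs = [unit([rng.uniform(0, C) for _ in range(3)]) for _ in range(N)]

# Uniformly distributed directions: normalise an isotropic Gaussian vector
sphere_dirs = [unit([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(N)]

frac_box = sum(map(in_positive_octant, box_dirs)) / N
frac_sphere = sum(map(in_positive_octant, sphere_dirs)) / N
print(frac_box)     # 1.0: every box direction is in the positive octant
print(frac_sphere)  # roughly 1/8 for uniformly distributed directions
```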

> This is a problem if I know that the "true" centroid (codeword) is located
> on the edge of the 3D-box. If I use the k-means algorithm to calculate the
> centroids I won't get the optimal result, because the mass of points within
> the box will "pull" on the centroids while there is nothing outside the box
> to "pull" in the other direction.....So I am wondering how do I solve this
> problem?

If you know something about the distribution of the way the points are pulled away from the true mean, then you might be able to use that information.

Ciao,

Peter K.

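As one illustration of "using that information", suppose (purely as a hypothetical model) that along one axis the true codeword coordinate sits on the face of the box and the observations spread into the box with one-sided exponential noise of known rate. The sample mean, which is what a k-means centroid computes, is then biased inward by about 1/rate, whereas the maximum-likelihood estimate of the location of a shifted exponential is the sample minimum. The names `c_true` and `rate` and their values are made up for this Python sketch.

```python
import random

rng = random.Random(2)
c_true = 0.0   # hypothetical codeword coordinate lying on the box face x = 0
rate = 5.0     # hypothetical, assumed-known rate of the one-sided spread into the box
xs = [c_true + rng.expovariate(rate) for _ in range(5000)]

mean_est = sum(xs) / len(xs)       # ordinary centroid: pulled into the box by about 1/rate
mle_est = min(xs)                  # ML estimate of the location of a shifted exponential
corrected = mean_est - 1.0 / rate  # uses the known rate to remove the bias of the mean

print(mean_est, mle_est, corrected)
```

Both alternatives depend entirely on the assumed noise model; with real LSF data the spread would first have to be characterised.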
Hi

Thanks for the answer...

I have uploaded the estimated probability density function for each
coefficient in the 10-dimensional observation vector. The coefficients are
LSF (line spectral frequency) coefficients, and as you can see the
distributions look neither Gaussian nor uniform... Is it possible to find the
most probable vectors based on the estimated distributions?

Based on the peaks in the 10 distributions, I would say that the most
probable vector is

[0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]

but the time information is lost when making these probability density
functions... so how would I know that the coefficients of the vector occur
at the same time?
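The worry is justified: the peak of each marginal distribution need not be a vector that ever occurs as a whole. The hand-made 2-D toy data in this Python sketch (not LSF data) shows it with exact counts: the per-coordinate modes combine into (1, 7), which never appears, while the vector that actually occurs most often is (1, 9).

```python
from collections import Counter

# Toy 2-D "frames" with hand-chosen counts: each tuple is one observation vector.
frames = [(1, 9)] * 4 + [(5, 7)] * 3 + [(6, 7)] * 3

# Mode of each coordinate taken on its own (what the marginal PDFs show)
marginal_modes = tuple(Counter(col).most_common(1)[0][0] for col in zip(*frames))

# Mode of the whole vector (what actually occurs together most often)
joint_mode = Counter(frames).most_common(1)[0][0]

print(marginal_modes)            # (1, 7)
print(joint_mode)                # (1, 9)
print(marginal_modes in frames)  # False: (1, 7) never occurs
```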

Lots of questions... :o) Hope you can help me.

Here is the MATLAB figure:

http://users.cybercity.dk/~dsl159353/pdfLSF.fig

Or the JPG version if you want to see that:

http://users.cybercity.dk/~dsl159353/pdfLSF.jpg

Thank you...



John wrote:
> thanks for the answer...

You're welcome.

> I have uploaded the estimated probability density function for each
> coefficient in the 10 dimensional observation vector. The coefficients are
> LSF-coefficients and as you can see the distribution neither looks like a
> gaussian or uniform distribution...Is it possible to find the most probable
> vectors based on the estimated distributions?

As you say below, the mode (or some other measure of central tendency) might be a useful measure. On its own, though, it generally doesn't contain much information. One way to proceed would be to subtract it from the coefficients you have and look at the variability that remains. However, this doesn't tackle the main problem: the LSFs will vary over time, and you probably want to figure out what that variability is.
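Subtracting the mode and looking at what is left might look like this in Python. The frames are synthetic rows of 10 coefficients scattered around the vector quoted in the thread, and the per-coefficient spreads in `spread` are made up; with real data you would use the measured LSF frames and your own mode estimate. Note that, as said above, this only summarises variability and ignores its time structure.

```python
import random
import statistics

# Hypothetical frames: rows of 10 coefficients (a synthetic stand-in for the LSF data).
rng = random.Random(4)
mode_vector = [0.29, 0.58, 0.86, 1.14, 1.43, 1.7, 2, 2.28, 2.57, 2.85]
spread = [0.02 * (k + 1) for k in range(10)]   # made-up per-coefficient spread
frames = [[m + rng.gauss(0, s) for m, s in zip(mode_vector, spread)] for _ in range(3000)]

# Remove the central value from every frame, then measure the remaining spread per coefficient.
residuals = [[x - m for x, m in zip(f, mode_vector)] for f in frames]
residual_std = [statistics.pstdev(col) for col in zip(*residuals)]
print([round(s, 3) for s in residual_std])
```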

> Based on the peaks in the 10 distributions I would say that the most
> probable vector is
>
> [0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]

That vector is just the mode (a form of average, though not the usual one; see http://en.wikipedia.org/wiki/Average).
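Reading that vector off the plots amounts to taking the mode of each coefficient separately. A minimal Python sketch of doing the same from a histogram, assuming the frames are stored as rows of 10 coefficients: the Gaussian scatter around the vector from the thread is synthetic placeholder data, and the range [0, pi) (LSFs lie between 0 and pi radians) and the 64 bins are arbitrary choices that stand in for whatever density estimate produced the MATLAB figure.

```python
import math
import random

def histogram_mode(values, lo, hi, bins):
    """Centre of the fullest bin of a fixed-width histogram over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = int((v - lo) // width)
        if 0 <= i < bins:                      # ignore values outside [lo, hi)
            counts[i] += 1
    best = max(range(bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width

# Synthetic stand-in for the real LSF frames: Gaussian scatter around the vector from the thread.
peaks = [0.29, 0.58, 0.86, 1.14, 1.43, 1.7, 2, 2.28, 2.57, 2.85]
rng = random.Random(3)
frames = [[p + rng.gauss(0, 0.05) for p in peaks] for _ in range(5000)]

# One mode per coefficient, i.e. per column of the frame matrix.
mode_vector = [histogram_mode(col, 0.0, math.pi, 64) for col in zip(*frames)]
print([round(m, 2) for m in mode_vector])
```

The result is only as precise as the bin width (about 0.05 rad here).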

> but the time-information is lost when making these probability density
> functions....so how would I know that the coefficients of the vector occur
> at the same time ?

Well, you wouldn't. :-) The only way to see this is to partition the data you have and only look at the LSFs in a particular time period (in the same way you are now; the matlab plots below) and see if that tells you anything.
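Partitioning by time could be sketched in Python as below: split the time-ordered frames into consecutive, non-overlapping blocks and summarise each coefficient within each block. The median is used as a stand-in for "some other measure of central tendency"; the 2-coefficient synthetic data, whose typical value changes halfway through, and the block length of 100 frames are hypothetical.

```python
import random
import statistics

def block_summaries(frames, block_len, summary=statistics.median):
    """Split time-ordered frames into consecutive, non-overlapping blocks and
    summarise every coefficient within each block (median by default)."""
    result = []
    for start in range(0, len(frames) - block_len + 1, block_len):
        block = frames[start:start + block_len]
        result.append([summary(col) for col in zip(*block)])
    return result

# Hypothetical 2-coefficient data whose typical value changes halfway through.
rng = random.Random(5)
frames = ([[0.3 + rng.gauss(0, 0.02), 0.6 + rng.gauss(0, 0.02)] for _ in range(200)] +
          [[0.5 + rng.gauss(0, 0.02), 0.9 + rng.gauss(0, 0.02)] for _ in range(200)])

blocks = block_summaries(frames, block_len=100)
for i, summary in enumerate(blocks):
    print(i, [round(x, 2) for x in summary])
```

A single histogram over all 400 frames would mix the two regimes; the per-block summaries keep them apart.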

> Lots of questions.... :o) hope you can help me.....
>
> Here is the matlab-figure:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.fig
>
> Or the JPG-equivalent if you want to see that:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.jpg
>
> Thank you...

No worries. Keep asking.

Ciao,

Peter K.

Hi again Peter...

I think I get what you are saying... you are saying that I should
"plot" a time-dependent mode vector, right?

If the time window is narrow enough, then statistics computed over
a sliding time window will reveal what the state vectors look like,
and will also show state changes, right?
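The sliding-window idea might be sketched in Python as follows. The window length (21 frames), the hop (5 frames), the use of the median and the rule "flag a state change when the summary vector moves by more than a threshold between consecutive windows" are all assumptions chosen for illustration, as is the synthetic 2-coefficient data that switches state at frame 200. An odd window length makes each median an actual frame value rather than an average of two.

```python
import math
import random
import statistics

def sliding_medians(frames, win, hop):
    """(start index, per-coefficient median) for windows of `win` frames, moved by `hop`."""
    return [(s, [statistics.median(col) for col in zip(*frames[s:s + win])])
            for s in range(0, len(frames) - win + 1, hop)]

def state_changes(track, threshold):
    """Start indices of windows whose median vector lies more than `threshold`
    (Euclidean distance) away from that of the previous window."""
    changes = []
    for (_, prev), (start, cur) in zip(track, track[1:]):
        if math.dist(prev, cur) > threshold:
            changes.append(start)
    return changes

# Hypothetical data: one "state" for frames 0-199, another for frames 200-399.
rng = random.Random(6)
frames = ([[0.3 + rng.gauss(0, 0.02), 0.6 + rng.gauss(0, 0.02)] for _ in range(200)] +
          [[0.5 + rng.gauss(0, 0.02), 0.9 + rng.gauss(0, 0.02)] for _ in range(200)])

track = sliding_medians(frames, win=21, hop=5)
changes = state_changes(track, threshold=0.1)
print(changes)
```

The window has to be short compared with how long a state lasts; otherwise each median straddles several states and the jumps are smeared out.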

Thanks again...
---------------------

> The only way to see this is to partition the data you have and only
> look at the LSFs in a particular time period (in the same way you are
> now; the matlab plots below) and see if that tells you anything.

Yup!

Ciao,

Peter K.