# vector quantization problem/question

Started December 1, 2005
```Hi

I am working on a project about noise suppression. Part of the algorithm
that I am designing is based on vector quantization. I am wondering which
training algorithm I should use?

Here is an example:

If I have a 3D-vector (x,y,z) and each coordinate of this vector is random
but limited to an interval from say 0 to C where C is a constant, the
directional variance will not be more or less the same when choosing
arbitrary directions.

This is a problem if I know that the "true" centroid (codeword) is located
on the edge of the 3D-box. If I use the k-means algorithm to calculate the
centroids I won't get the optimal result, because the mass of points within
the box will "pull" on the centroids while there is nothing outside the box
to "pull" in the other direction.....So I am wondering how do I solve this
problem?

Thank you.

```
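The "pull" effect described in the post above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the codeword position, the Gaussian noise model, `sigma`, and the function name `clipped_mean_demo` are all hypothetical and not taken from the thread. Points are scattered around a codeword lying on the boundary of the box, anything outside the box is discarded, and the plain average (what a k-means update computes for one cluster) is compared with the true codeword.

```python
import random

def clipped_mean_demo(n=5000, C=1.0, sigma=0.1, seed=0):
    """Sample noisy points around a codeword on the box face x = 0, keep only
    those inside [0, C]^3, and return (true codeword, sample mean)."""
    rng = random.Random(seed)
    true_c = (0.0, 0.5 * C, 0.5 * C)  # hypothetical codeword on the box boundary
    kept = []
    while len(kept) < n:
        p = tuple(rng.gauss(m, sigma) for m in true_c)
        if all(0.0 <= v <= C for v in p):  # observations cannot leave the box
            kept.append(p)
    mean = tuple(sum(p[i] for p in kept) / n for i in range(3))
    return true_c, mean

true_c, mean = clipped_mean_demo()
print("true codeword:", true_c)
print("sample mean  :", mean)
```

In this toy setup the x-coordinate of the mean ends up strictly inside the box although the true codeword has x = 0, while y and z are essentially unbiased because the boundary is far away in those directions.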
```"John" <joehatesspam@nospam.spamshit> writes:

> I am working on a project about noise suppression. Part of the algorithm
> that I am designing is based on vector quantization. I am wondering which
> training algorithm I should use?
>
> Here is an example:
>
> If I have a 3D-vector (x,y,z) and each coordinate of this vector is random
> but limited to an interval from say 0 to C where C is a constant, the
> directional variance will not be more or less the same when choosing
> arbitrary directions.

Careful!  Unless you're choosing uniformly distributed _angles_ the
directions you choose will not be uniform (arbitrary).  The way you've
described it, the directions of the vectors will not be arbitrary (for
a start, they're all in the positive [x>0, y>0, z>0] octant).

> This is a problem if I know that the "true" centroid (codeword) is located
> on the edge of the 3D-box. If I use the k-means algorithm to calculate the
> centroids I won't get the optimal result, because the mass of points within
> the box will "pull" on the centroids while there is nothing outside the box
> to "pull" in the other direction.....So I am wondering how do I solve this
> problem?

If you know something about the distribution of the way the points are
pulled away from the true mean, then you might be able to use that
information.

Ciao,

Peter K.

```
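The point about directions can be checked numerically. The sketch below is illustrative only: the box size `C`, the sample counts, and the helper names `unit` and `octants_hit` are assumptions. It compares the directions of points drawn uniformly from the box with directions that really are uniform on the sphere (obtained here by normalizing isotropic Gaussian samples, a standard construction) and counts how many octants each set reaches.

```python
import math
import random

def unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def octants_hit(dirs):
    """Number of distinct sign patterns (octants) among the directions."""
    return len({tuple(c >= 0 for c in d) for d in dirs})

rng = random.Random(1)
C = 1.0  # hypothetical box size

# Directions of points drawn uniformly from the box [0, C]^3.
box_dirs = [unit([rng.uniform(0.0, C) for _ in range(3)]) for _ in range(2000)]

# Directions uniform on the sphere: normalize isotropic Gaussian samples.
sphere_dirs = [unit([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(2000)]

print("octants covered by box directions   :", octants_hit(box_dirs))
print("octants covered by sphere directions:", octants_hit(sphere_dirs))
```

The box directions stay in a single octant, so any statement about "arbitrary directions" has to be made relative to some reference point inside the data rather than the origin.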
```Hi

I have uploaded the estimated probability density function for each
coefficient in the 10 dimensional observation vector. The coefficients are
LSF-coefficients and as you can see the distribution looks like neither a
Gaussian nor a uniform distribution...Is it possible to find the most probable
vectors based on the estimated distributions?

Based on the peaks in the 10 distributions I would say that the most
probable vector is

[0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]

but the time-information is lost when making these probability density
functions....so how would I know that the coefficients of the vector occur
at the same time ?

Lots of questions.... :o) hope you can help me.....

Here is the matlab-figure:

http://users.cybercity.dk/~dsl159353/pdfLSF.fig

Or the JPG-equivalent if you want to see that:

http://users.cybercity.dk/~dsl159353/pdfLSF.jpg

Thank you...

```
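The worry about losing the time information can be made concrete with a toy example. The frame values below are made up for illustration and use only two coefficients instead of ten. Taking the peak of each coefficient's distribution separately gives a vector that need not occur in any single frame, whereas counting whole vectors keeps the coefficients together.

```python
from collections import Counter

# Toy 2-coefficient "frames"; the values are invented for illustration.
frames = [(0.3, 0.6)] * 2 + [(0.3, 0.7)] * 2 + [(0.4, 0.5)] * 3

# Mode of each coefficient taken separately (what per-coefficient PDFs give).
marginal_modes = tuple(
    Counter(f[i] for f in frames).most_common(1)[0][0] for i in range(2)
)

# Most frequent vector when the coefficients are kept together.
joint_mode = Counter(frames).most_common(1)[0][0]

print("vector of marginal modes:", marginal_modes)
print("joint mode              :", joint_mode)
print("marginal-mode vector occurs in data:", marginal_modes in frames)
```

With real, continuous LSF data exact repeats are unlikely, so the joint count would have to be replaced by something like a multidimensional histogram or a trained codebook; the sketch only shows why the ten separate distributions are not enough by themselves.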
```John wrote:
>
>

You're welcome.

> I have uploaded the estimated probability density function for each
> coefficient in the 10 dimensional observation vector. The coefficients are
> LSF-coefficients and as you can see the distribution looks like neither a
> Gaussian nor a uniform distribution...Is it possible to find the most probable
> vectors based on the estimated distributions?

As you say below, the mode (or some other measure of central tendency)
might be a useful measure.  However, generally it doesn't contain much
information.  One way to proceed would be to remove (subtract) it from
the coefficients you have and look at the variability remaining.

However, this doesn't tackle the main problem: that the LSFs will vary
over time, and you probably want to figure out what that variability
is.

> Based on the peaks in the 10 distributions I would say that the most
> probable vector is
>
> [0.29,0.58,0.86,1.14,1.43,1.7,2,2.28,2.57,2.85]
>

That vector is just the mode (a form of average, though not the usual
one http://en.wikipedia.org/wiki/Average).

> but the time-information is lost when making these probability density
> functions....so how would I know that the coefficients of the vector occur
> at the same time ?

Well, you wouldn't. :-)

The only way to see this is to partition the data you have and only
look at the LSFs in a particular time period (in the same way you are
now; the matlab plots below) and see if that tells you anything.

> Lots of questions.... :o) hope you can help me.....
>
> Here is the matlab-figure:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.fig
>
> Or the JPG-equivalent if you want to see that:
>
> http://users.cybercity.dk/~dsl159353/pdfLSF.jpg
>
> Thank you...

Ciao,

Peter K.

```
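The suggestion to subtract a central vector and look at what remains could be sketched as follows. This is a minimal illustration, not the method used by anyone in the thread: the per-coefficient median stands in for "some measure of central tendency", the function name `residual_spread` is hypothetical, and the four two-coefficient frames are invented.

```python
import statistics

def residual_spread(frames):
    """Subtract the per-coefficient median from every frame and return
    (center, residuals, per-coefficient standard deviation of residuals)."""
    dims = range(len(frames[0]))
    center = [statistics.median([f[i] for f in frames]) for i in dims]
    residuals = [[f[i] - center[i] for i in dims] for f in frames]
    spread = [statistics.pstdev([r[i] for r in residuals]) for i in dims]
    return center, residuals, spread

# Invented frames: the second coefficient varies more than the first.
frames = [[0.29, 0.58], [0.31, 0.50], [0.27, 0.66], [0.29, 0.58]]
center, residuals, spread = residual_spread(frames)
print("center:", center)
print("spread:", spread)
```

Running the same function on separate time segments of the data, rather than on everything at once, is one way to see whether the central vector and the spread change over time.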
```Hi again Peter...

I think I get what you are saying....you are saying that I should
"plot" a time-dependent mode-vector, right??

If the time-window is narrow enough then statistics performed on
a sliding time-window will reveal what the state-vectors look like
and it will also show state-changes, right?

Thanks again...
---------------------

> The only way to see this is to partition the data you have and only
> look at the LSFs in a particular time period (in the same way you are
> now; the matlab plots below) and see if that tells you anything.

```
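The sliding-window idea discussed above might look like this for a single coefficient. It is a sketch under assumptions: the track values, the window width of 3 frames, the jump threshold of 0.1, and the name `sliding_median` are all made up, and the median is used in place of the mode because it is easier to compute on a short window.

```python
import statistics

def sliding_median(track, width):
    """Median of one coefficient over each window of `width` consecutive frames."""
    return [statistics.median(track[i:i + width])
            for i in range(len(track) - width + 1)]

# Hypothetical track of one LSF coefficient: one state, then another.
track = [0.29, 0.30, 0.28, 0.29, 0.30, 0.58, 0.57, 0.59, 0.58, 0.57]
med = sliding_median(track, width=3)

# Flag windows where the median jumps by more than a (made-up) threshold.
jumps = [i for i in range(1, len(med)) if abs(med[i] - med[i - 1]) > 0.1]
print("windowed medians:", med)
print("state change near window index:", jumps)
```

The window width is the trade-off mentioned in the post: a narrow window follows state changes quickly but gives noisier statistics, while a wide one smooths the estimates and blurs the transitions.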
```Yup!

Ciao,

Peter K.

```