# Vector Quantization: Problem with designing codebook

Started by Jake, December 30, 2005
```Hi

I am trying to do vector quantization for noise-free speech segments.

The algorithm I have programmed works like this:

1) Update 240-point frame F with pre-emphasized block x of 10 speech
samples.

2) Update estimate of power spectrum P based on updated frame

3) Calculate vector B=[b1 b2 b3 .....b17] where
b1=sum of amplitudes in 1st bark scale
b2=sum of amplitudes in 2nd bark scale
etc.
etc.

4) Divide B by sum of all amplitudes in P
5) Look up the closest neighbour B' to B in the codebook
6) Calculate estimate P' of P based on B' from codebook
7) Calculate autocorrelation estimate R' based on P'
8) Calculate autocorrelation R based on F
9) Calculate filter coefficients for synthesis filter (Levinson-Durbin
recursion with R' as input)
10) Calculate filter coefficients for whitening filter (Levinson-Durbin
recursion with R as input)
11) Send x thru whitening filter
12) Send output from whitening filter thru synthesis filter

Output from synthesis filter is the original speech if everything works
well.

Problem:

If I bypass step 5 and just let B' = B, then everything works fine.

The problem is designing a good codebook.

I have tried using the k-means algorithm to design a codebook based on
700,000 observations of B, but the result is far from impressive.

Maybe it is just impossible to design a good codebook when the dimension
of B is too high (17 in this case)?

I hope there are some experts out there who can help me with some tips,
hints and tricks? :o)

I have also tried to encode the spectrum using 10th-order LPC analysis such
that B is a 10-dimensional vector. That didn't work well. I then tried a
10th-order LSF vector, but that didn't work well either...

Thanks in advance.

```
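For readers who want to check the B' = B round trip (steps 8-12 above), here is a minimal pure-Python sketch; the function names and the toy test frame are illustrative, not the original poster's code:

```python
import math
import random

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of signal x (step 8)."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion (steps 9/10): autocorrelation -> predictor
    coefficients a, where x[n] is predicted as sum(a[k] * x[n-1-k])."""
    a = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        prev = a[:]
        a[i - 1] = k
        for j in range(i - 1):
            a[j] = prev[j] - k * prev[i - 2 - j]
        err *= 1.0 - k * k
    return a

def whiten(x, a):
    """Prediction-error (whitening) filter, step 11."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(p) if n - 1 - k >= 0)
            for n in range(len(x))]

def synthesize(e, a):
    """All-pole synthesis filter, step 12: the exact inverse of whiten()."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - 1 - k]
                            for k in range(len(a)) if n - 1 - k >= 0))
    return y

# Round trip on a 240-point toy frame: with matching coefficients
# (the B' = B case) reconstruction is exact up to rounding.
rng = random.Random(0)
x = [math.sin(0.3 * n) + 0.1 * rng.random() for n in range(240)]
a = levinson_durbin(autocorr(x, 10), 10)
y = synthesize(whiten(x, a), a)
```

When B' comes from a codebook instead, the synthesis filter no longer exactly inverts the whitening filter, and the reconstruction error is precisely the quantization error the codebook introduces.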
```Jake wrote:

> [...]
> I have also tried to encode the spectrum using 10th-order LPC analysis such
> that B is a 10-dimensional vector. That didn't work well. I then tried a
> 10th-order LSF vector, but that didn't work well either...

What are you doing for the residual?  If you don't encode the residual,
then your compression should be lossless, and thus lead to *perfect*
reconstruction of the original signal.  What exactly does "didn't work
well" mean?

BTW, this is the standard approach -- I learned the hard way that one
never encodes (VQ) the spectrum directly;  for speech, VQ is always
(yes, I believe with no single exception) combined with some form of
LPC processing.  Then there's an array of specific techniques leading
to various levels of quality at various compression ratios.

Perhaps the simplest technique could be:  encode the LPC or LSF (LSF
is in general better when used with VQ -- the "centroids" of the
codebook always lead to stable filters), then compute the residual,
split it into chunks of reasonable size (20 or 30 samples), and VQ
each of those chunks (yes, seeing them as 20- or 30-dimensional
vectors).  That should work reasonably well, provided that your
codebooks are sufficiently large -- you should expect more or less
good quality if your bitrate is around 8 or 10 kbps.  Below that,
the encoding has to be really sophisticated if you want a reasonable
quality.

HTH,

Carlos
--
```
```Jake wrote:

> Hi
>
> I am trying to do vector quantization for noise-free speech segments.
>

First question: what are you trying to accomplish? Why reinvent the
wheel? Why can't you just compute the LSFs and vector quantize them like
most codecs do?

> The algorithm I have programmed works like this:
>
> 1) Update 240-point frame F with pre-emphasized block x of 10 speech
> samples.
>
> 2) Update estimate of power spectrum P based on updated frame
>
> 3) Calculate vector B=[b1 b2 b3 .....b17] where
> b1=sum of amplitudes in 1st bark scale
> b2=sum of amplitudes in 2nd bark scale
> 4) Divide B by sum of all amplitudes in P
> 5) Look up the closest neighbour B' to B in the codebook
> 6) Calculate estimate P' of P based on B' from codebook
> 7) Calculate autocorrelation estimate R' based on P'
> 8) Calculate autocorrelation R based on F
> 9) Calculate filter coefficients for synthesis filter (Levinson-Durbin
> recursion with R' as input)
> 10) Calculate filter coefficients for whitening filter (Levinson-Durbin
> recursion with R as input)
> 11) Send x thru whitening filter
> 12) Send output from whitening filter thru synthesis filter
>
> Output from synthesis filter is the original speech if everything works
> well.
>
> Problem:
>
> If I replace step 5 and just let B'=B then everything works fine.
>
> The problem is designing a good codebook.
>
> I have tried using the k-means algorithm to design a codebook based on
> 700,000 observations of B, but the result is far from impressive.

Your codebook design does not seem to be optimized for either perceptual
importance or minimum LPC spectral distortion, so mediocre results are to
be expected.

>
> Maybe it is just impossible to design a good codebook when the dimension of
> B is too high (17 in this case). ???

BTW, how did you decide on the dimension?

>
> I hope there are some experts out there who can help me with some tips,
> hints and tricks ? :o)
> I have also tried to encode the spectrum using 10th-order LPC analysis such
> that B is a 10-dimensional vector. That didn't work well. I then tried a
> 10th-order LSF vector, but that didn't work well either...

Keep trying :)
Take a look at the way the LSFs are quantized in G.723.1, G.729, or any
other standard codec.

Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant

http://www.abvolt.com

```
```> Keep trying :)
> Take a look at the way the LSFs are quantized in G.723.1, G.729 or any
> other standard codecs.
>

Hi Vladimir

Thanks for the answer....I am re-inventing the wheel so I can learn :o)

Can I download G.723.1 and G.729 for free somewhere, or do I have to pay for them?

I have free access to IEEE-xplore.

Thanks again ...

And happy new year to you all...

```
```
Jake wrote:

>>Take a look at the way the LSFs are quantized in G.723.1, G.729 or any
>>other standard codecs.
>>
>
> Thanks for the answer....I am re-inventing the wheel so I can learn :o)
>

To my knowledge, the best results for LPC quantization are somewhere
around 20 bits per vector. They quantize the weighted difference vector
between the current and the previous LSFs. The 10 LSFs are split into
two subvectors of 4 + 6, and each subvector is quantized using a
separate 10-bit codebook.

> Can I download G723 and G729 for free somewhere or do I have to pay for it ?
>
> I have free access to IEEE-xplore.

They use 24 bits (three codebooks of 8 + 8 + 8 bits), and the 10-LSF
vector is split as 3 + 3 + 4.
I think the descriptions are available from the ITU web site. I have
also seen the source code somewhere.
Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant

http://www.abvolt.com

```
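The split VQ that Vladimir describes -- a 4 + 6 split of the 10 LSFs, with a separate 10-bit (1024-entry) codebook per subvector -- can be sketched as below. The random codebooks here merely stand in for trained ones, and the function names are illustrative:

```python
import random

def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def split_vq_encode(vec, split, codebooks):
    """Split VQ: partition `vec` per `split` and find the nearest entry of
    each piece in its own codebook; returns one index per subvector."""
    out, pos = [], 0
    for size, cb in zip(split, codebooks):
        sub = vec[pos:pos + size]
        out.append(min(range(len(cb)), key=lambda i: dist2(sub, cb[i])))
        pos += size
    return out

def split_vq_decode(indices, codebooks):
    """Concatenate the selected codebook entries back into one vector."""
    rec = []
    for i, cb in zip(indices, codebooks):
        rec.extend(cb[i])
    return rec

# Toy 4 + 6 split with two 10-bit (1024-entry) random codebooks.
rng = random.Random(0)
split = (4, 6)
codebooks = [[[rng.random() for _ in range(size)] for _ in range(1024)]
             for size in split]
lsf = [rng.random() for _ in range(10)]
idx = split_vq_encode(lsf, split, codebooks)  # two indices, 10 bits each
rec = split_vq_decode(idx, codebooks)         # 20 bits total per vector
```

The 3 + 3 + 4 split with three 8-bit codebooks that Vladimir mentions works the same way, just with three subvectors and 24 bits per vector.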
```> To my knowledge, the best results for the LPC quantization are somewhere
> around 20 bits per vector.

Thanks for the quick answer, Vladimir

However... I am quite new to this field of quantizing vectors, so I am a
bit puzzled by the "20 bits per vector". Can you explain that in simple
words? What do you mean by "20 bits per vector"?

Thank you.

```
```>
> To my knowledge, the best results for the LPC quantization are somewhere
> around 20 bits per vector.

This is just a guess:

By 20 bits per 10-dimensional vector, do you mean that you assign 2 bits
per coefficient of the 10-dimensional vector?

This would mean assigning a range from 0 to 3 to represent the dynamic
range of each observed coefficient, right?

The encoded vector would then be a 20-bit stream looking like this:

xx|xx|xx|xx|xx|xx|xx|xx|xx|xx

where "xx" is either 00, 01, 10 or 11.

...Am I right?

```
```
Jake wrote:

>>To my knowledge, the best results for the LPC quantization are somewhere
>>around 20 bits per vector.
>
>
> This is just a guess:
>
> By 20 bits per 10-dimensional vector do you mean that you assign 2 bits per
> coefficient in the 10-dimensional vector?
>

No.

It is vector quantization of the 10-dimensional LSF vector using a
total of 20 bits to index the codebooks.

Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant

http://www.abvolt.com
```
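To unpack the bit accounting Vladimir is using: 20 bits does not mean one codebook with 2^20 entries; with the 4 + 6 split it is two 10-bit indices, each addressing its own 1024-entry codebook. A quick sanity check:

```python
import math

# Two 10-bit codebooks (the 4 + 6 split): 1024 entries each.
entries_per_codebook = 2 ** 10   # 1024 vectors per codebook
bits_per_vector = 10 + 10        # one index per subvector -> 20 bits total

# A single 500-entry codebook would indeed need only 9 bits:
bits_for_500 = math.ceil(math.log2(500))  # 9
```

So the 20 bits buy two codebooks of 1024 vectors each, which covers the LSF vector far more finely than a single 500-entry book ever could.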
```> It is the vector quantization of 10-dimensional LSF vector using the total
> of 20 bits for indexing the codebooks.

Thanks for a quick answer again :o)

But I am sorry to say that I don't understand why you would need 20 bits
to index the codebooks... 20 bits is a pretty high number, and the
codebook wouldn't be that large, would it? So there is something I don't
understand...

I read that a codebook usually contains around 500 vectors. Wouldn't you
then need only 9 bits to represent an index range from 1 to 500?

Thanks again...

```
```
Jake wrote:

GO TO THE LIBRARY

VLV

>>It is the vector quantization of 10-dimensional LSF vector using the total
>>of 20 bits for indexing the codebooks.
>
>
> Thanks for a quick answer again :o)
>
> But I am sorry to say that I don't understand why you would need 20 bits for
> indexing the codebooks....20 bits is a pretty high number and the codebook
> wouldn't be that large??....So there is something I don't understand.....
>
> I read that a codebook usually contains 500 vectors. So wouldn't you need
> 9 bits to represent an index range from 1 to 500???
>
> Thanks again...
```