
Vector Quantization: Problem with designing codebook

Started by Jake December 30, 2005
Hi

I am trying to do vector quantization for noise-free speech segments.

The algorithm I have programmed works like this:

1) Update the 240-point frame F with a pre-emphasized block x of 10 speech
samples.

2) Update the estimate of the power spectrum P based on the updated frame.

3) Calculate the vector B=[b1 b2 b3 ... b17], where
b1 = sum of amplitudes in the 1st Bark band
b2 = sum of amplitudes in the 2nd Bark band
and so on, up to b17.

4) Divide B by the sum of all amplitudes in P.
5) Look up the nearest neighbour B' to B in the codebook.
6) Calculate an estimate P' of P based on B' from the codebook.
7) Calculate the autocorrelation estimate R' based on P'.
8) Calculate the autocorrelation R based on F.
9) Calculate the filter coefficients for the synthesis filter (Levinson-Durbin
recursion with R' as input).
10) Calculate the filter coefficients for the whitening filter (Levinson-Durbin
recursion with R as input).
11) Send x through the whitening filter.
12) Send the output from the whitening filter through the synthesis filter.

The output from the synthesis filter is the original speech if everything
works well.
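For readers following along, steps 3 to 5 might look roughly like this in Python. It is a minimal sketch under assumptions the post does not state: 8 kHz sampling, a one-sided spectrum of N/2 + 1 bins, and Zwicker's critical-band edges with the top band stretched to 4 kHz so that there are 17 bands. `BAND_EDGES_HZ`, `bark_vector` and `nearest_codevector` are hypothetical names.

```python
# Assumed critical-band (Bark) edges in Hz for 8 kHz sampling: Zwicker's
# table up to 3150 Hz, with the last band stretched to the 4 kHz Nyquist
# limit so that there are 17 bands. The post does not give its band edges.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 4000]


def bark_vector(amplitudes, fs=8000.0, edges=BAND_EDGES_HZ):
    """Steps 3-4: sum the amplitudes in each band, then normalise.

    `amplitudes` holds bins 0..N/2 of an N-point spectrum (one-sided).
    """
    n_fft = 2 * (len(amplitudes) - 1)
    n_bands = len(edges) - 1
    bands = [0.0] * n_bands
    for k, a in enumerate(amplitudes):
        f = k * fs / n_fft  # frequency of bin k in Hz
        b = 0
        # bin k belongs to band b when edges[b] <= f < edges[b + 1];
        # anything at or above the last inner edge goes to the top band
        while b < n_bands - 1 and f >= edges[b + 1]:
            b += 1
        bands[b] += a
    total = sum(amplitudes)
    return [x / total for x in bands]


def nearest_codevector(b, codebook):
    """Step 5: full search for the codevector with the smallest squared error."""
    def sq_err(c):
        return sum((x - y) ** 2 for x, y in zip(b, c))
    best = min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    return best, codebook[best]


# Example: a flat spectrum from a 240-point FFT (121 one-sided bins).
B = bark_vector([1.0] * 121)
index, B_hat = nearest_codevector(B, [[0.0] * 17, B])
```

The lookup uses plain unweighted squared error; any perceptual weighting would have to go into `sq_err`.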

Problem:

If I skip step 5 and simply set B' = B, everything works fine.
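Why B' = B works can be checked directly: when the whitening filter A(z) and the synthesis filter 1/A(z) come from the same autocorrelation, their cascade is an identity, so the output equals x up to rounding. When R' differs from R, the cascade becomes A(z)/A'(z), so any codebook error shows up directly as an error in the spectral envelope. The sketch below assumes one fixed coefficient set over a single 240-sample frame with zero initial filter state (updating the filters every 10 samples, as in the post, would also require carrying the filter state across blocks). The test signal and all function names are made up for illustration.

```python
import math
import random


def autocorrelation(x, order):
    """Step 8: biased autocorrelation r[0..order] of the frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Steps 9-10: a = [1, a1, ..., ap] with A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k              # prediction error power
    return a


def whiten(x, a):
    """Step 11: FIR whitening filter A(z), e[n] = x[n] + sum_j a[j] x[n-j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def synthesize(e, a):
    """Step 12: all-pole filter 1/A(z), y[n] = e[n] - sum_{j>=1} a[j] y[n-j]."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


# One 240-sample test frame: two sinusoids plus a little noise (made-up data).
rng = random.Random(0)
x = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.1 * rng.gauss(0.0, 1.0)
     for n in range(240)]
a = levinson_durbin(autocorrelation(x, 10), 10)
y = synthesize(whiten(x, a), a)         # same coefficients in both filters (B' = B)
max_err = max(abs(u - v) for u, v in zip(x, y))
```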

The problem is designing a good codebook.

I have tried using the k-means algorithm to design a codebook from 700000
observations of B, but the result is far from impressive.
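A codebook design of this kind can be sketched with the LBG splitting procedure wrapped around k-means (generalized Lloyd) iterations. This is an illustration, not necessarily what was actually run here: the per-dimension weight vector `w` is a hypothetical hook for the perceptual weighting discussed later in the thread, `size` is assumed to be a power of two, and pure Python will be slow on 700000 training vectors.

```python
def sq_dist(x, c, w):
    """Weighted squared error; w = all ones gives plain k-means."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def lloyd(train, codebook, w, iters):
    """Generalized Lloyd (k-means) iterations starting from `codebook`."""
    dim = len(train[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in codebook]
        counts = [0] * len(codebook)
        for x in train:
            i = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j], w))
            counts[i] += 1
            for d in range(dim):
                sums[i][d] += x[d]
        # With fixed per-dimension weights the centroid is still the mean;
        # an empty cell keeps its previous codevector.
        codebook = [[s / counts[i] for s in sums[i]] if counts[i] else codebook[i]
                    for i in range(len(codebook))]
    return codebook


def lbg(train, size, w=None, iters=10, eps=1e-3):
    """LBG: start from the global mean, split every codevector, refine, repeat."""
    dim = len(train[0])
    w = w or [1.0] * dim
    codebook = [[sum(x[d] for x in train) / len(train) for d in range(dim)]]
    while len(codebook) < size:
        codebook = [[c + s for c in cv] for cv in codebook for s in (eps, -eps)]
        codebook = lloyd(train, codebook, w, iters)
    return codebook


def avg_distortion(train, codebook, w=None):
    """Mean distortion of the training set against its nearest codevectors."""
    w = w or [1.0] * len(train[0])
    return sum(min(sq_dist(x, c, w) for c in codebook) for x in train) / len(train)
```

Comparing `avg_distortion` on held-out data, rather than on the training set, gives a fairer picture of how the codebook will behave on new speech.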

Maybe it is just impossible to design a good codebook when the dimension of
B is too high (17 in this case). ???

I hope there are some experts out there who can help me with some tips,
hints and tricks? :o)

I have also tried to encode the spectrum using 10th-order LPC analysis, so
that B is a 10-dimensional vector. That didn't work well. I then tried a
10th-order LSF vector, but that didn't work well either....
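For reference, an LSF vector can be obtained from the LPC polynomial A(z) as below. This sketch assumes an even order p (10 here) and a minimum-phase A(z), such as Levinson-Durbin produces. It finds the unit-circle roots of the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z) by a simple grid scan with bisection, a stand-in for the cheaper root searches used in production codecs; roots closer together than pi/grid could be missed. `lpc_to_lsf` is a hypothetical name.

```python
import math


def lpc_to_lsf(a, grid=4096):
    """LSFs in radians, ascending, of A(z) = 1 + a1 z^-1 + ... + ap z^-p (p even)."""
    p = len(a) - 1
    ext = list(a) + [0.0]
    pc = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]  # symmetric P(z)
    qc = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]  # antisymmetric Q(z)
    half = (p + 1) / 2.0

    # After removing the linear phase e^{-jw(p+1)/2}, P and Q become real
    # functions of w; their trivial roots sit at w = pi (P) and w = 0 (Q).
    def fp(w):
        return sum(c * math.cos(w * (k - half)) for k, c in enumerate(pc))

    def fq(w):
        return sum(c * math.sin(w * (k - half)) for k, c in enumerate(qc))

    roots = []
    for f in (fp, fq):
        prev_w, prev_v = None, None
        for i in range(1, grid):        # open interval (0, pi) skips trivial roots
            w = math.pi * i / grid
            v = f(w)
            if prev_v is not None and prev_v * v < 0:
                lo, hi = prev_w, w
                for _ in range(50):     # bisection on the bracketing interval
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
            prev_w, prev_v = w, v
    return sorted(roots)
```

For example, `lpc_to_lsf([1.0, -0.9, 0.81])` should return approximately [acos(0.545), acos(0.355)], which bracket the resonance at pi/3. A(z) is minimum phase exactly when its LSFs are strictly increasing in (0, pi) with the P and Q roots interleaved, and an average of ordered vectors is still ordered.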

Thanks in advance.




Jake wrote:

> [...]
> I have also tried to encode the spectrum using 10th order LPC-analysis such
> that B is a 10-dimensional vector. That didn't work well. I then tried with
> a 10th order LSF-vector, but that didn't work well either....
What are you doing with the residual? If you don't encode the residual, your compression should be lossless, and thus lead to *perfect* reconstruction of the original signal. What exactly does "didn't work well" mean??

BTW, this is the standard approach -- I learned the hard way that one never encodes (VQ) the spectrum directly; for speech, VQ is always (yes, I believe without a single exception) combined with some form of LPC processing. Beyond that, there is an array of specific techniques offering various levels of quality at various compression ratios.

Perhaps the simplest technique is this: encode the LPC or LSF coefficients (LSFs are in general better suited to VQ -- the "centroids" of the codebook always lead to stable filters), then compute the residual, split it into chunks of reasonable size (20 or 30 samples), and VQ each of those chunks (yes, treating them as 20- or 30-dimensional vectors). That should work reasonably well, provided that your codebooks are sufficiently large; you should expect more or less good quality if your bitrate is around 8 or 10 kbps. Below that, the encoding has to be really sophisticated if you want reasonable quality.

HTH,

Carlos
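The residual scheme outlined above, plus the bitrate arithmetic, might look like this minimal sketch. Assumptions: the residual comes from the whitening filter, one codebook is shared by all chunks, a trailing partial chunk is dropped, and there is no gain normalisation, which a practical coder would add. The function names and the figures in the usage note are illustrative only.

```python
import math


def vq_chunks(residual, codebook):
    """Split the residual into len(codebook[0])-sample chunks and VQ each one.

    Returns the codebook indices and the quantized residual; a trailing
    partial chunk is dropped for simplicity.
    """
    dim = len(codebook[0])
    indices, quantized = [], []
    for start in range(0, len(residual) - dim + 1, dim):
        chunk = residual[start:start + dim]
        best = min(range(len(codebook)),
                   key=lambda i: sum((u - v) ** 2 for u, v in zip(chunk, codebook[i])))
        indices.append(best)
        quantized.extend(codebook[best])
    return indices, quantized


def bitrate_bps(fs, chunk_len, codebook_size, lsf_bits, frame_len):
    """Residual index bits plus one LSF codeword per frame, in bit/s."""
    residual_rate = fs / chunk_len * math.log2(codebook_size)
    lsf_rate = fs / frame_len * lsf_bits
    return residual_rate + lsf_rate
```

With 8 kHz sampling, 20-sample chunks, a 1024-entry codebook and 20 LSF bits per 160-sample frame, `bitrate_bps(8000, 20, 1024, 20, 160)` gives 5000.0 bit/s: 400 chunks per second at 10 bits each, plus 50 LSF codewords at 20 bits each.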
Jake wrote:

 > Hi
 >
 > I am trying to do vector quantization for noise-free speech segments.
 >



First question: what are you trying to accomplish? Why reinvent the
wheel? Why can't you just compute the LSFs and vector quantize them, like
most codecs do?


> The algorithm I have programmed works like this:
> [...]
> I have tried using k-means algorithm to design a codebook based on 700000
> observations of B, but the result is far from impressing.
Your codebook design does not seem to be optimized either for perceptual importance or for minimum LPC spectral distortion, so only fair results are to be expected.
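One common way to measure LPC spectral distortion is the RMS difference, in dB, between the two LPC envelopes 1/|A(e^jw)|^2. The sketch below compares the envelopes on a uniform frequency grid and ignores the gain term; the grid size and the function names are assumptions. An average of about 1 dB is a figure often quoted as the threshold for transparent LPC quantization.

```python
import cmath
import math


def lpc_envelope_db(a, n_points=256):
    """10*log10(1/|A(e^jw)|^2) on a uniform grid over [0, pi)."""
    out = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        out.append(-10.0 * math.log10(abs(A) ** 2))
    return out


def spectral_distortion_db(a, a_hat, n_points=256):
    """RMS difference in dB between the LPC envelopes of a and a_hat."""
    e1 = lpc_envelope_db(a, n_points)
    e2 = lpc_envelope_db(a_hat, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(e1, e2)) / n_points)
```

Averaging this measure over many frames, and using it (or a perceptually weighted variant) in codebook training and search, is one way to act on the advice above.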
>
> Maybe it is just impossible to design a good codebook when the dimension of
> B is too high (17 in this case). ???
BTW, how did you decide on the dimension?
>
> I hope there are some experts out there who can help me with some tips,
> hints and tricks ? :o)
>
> I have also tried to encode the spectrum using 10th order LPC-analysis such
> that B is a 10-dimensional vector. That didn't work well. I then tried with
> a 10th order LSF-vector, but that didn't work well either....
Keep trying :)
Take a look at the way the LSFs are quantized in G.723.1, G.729 or any other standard codec.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
> Keep trying :)
> Take a look at the way the LSFs are quantized in G.723.1, G.729 or any
> other standard codecs.
>
Hi Vladimir

Thanks for the answer.... I am reinventing the wheel so I can learn :o)

Can I download G723 and G729 for free somewhere, or do I have to pay for them? I have free access to IEEE Xplore.

Thanks again... And happy new year to you all...

Jake wrote:


>>Take a look at the way the LSFs are quantized in G.723.1, G.729 or any
>>other standard codecs.
>>
>
> Thanks for the answer....I am re-inventing the wheel so I can learn :o)
>
To my knowledge, the best results for LPC quantization are somewhere around 20 bits per vector. They quantize the weighted difference vector between the current and the previous LSFs. The 10 LSFs are split into two subvectors of 4 + 6, and each subvector is quantized using a separate 10-bit codebook.
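Read one way, the scheme described here can be sketched as split VQ of a difference vector. Assumptions, since the post gives few details: the difference is taken against the previous quantized LSF vector with no prediction coefficient, the weights `w` are supplied by the caller (how they are computed is left open), and both codebooks hold 1024 entries, so the two indices cost 10 + 10 = 20 bits. All names are hypothetical.

```python
def weighted_nn(x, codebook, w):
    """Index of the codevector with the smallest weighted squared error."""
    return min(range(len(codebook)),
               key=lambda i: sum(wi * (xi - ci) ** 2
                                 for xi, ci, wi in zip(x, codebook[i], w)))


def split_vq_encode(lsf, prev_lsf_q, cb_low, cb_high, w, split=4):
    """Quantize d = lsf - prev_lsf_q as two subvectors: d[:split] and d[split:]."""
    d = [x - y for x, y in zip(lsf, prev_lsf_q)]
    i1 = weighted_nn(d[:split], cb_low, w[:split])
    i2 = weighted_nn(d[split:], cb_high, w[split:])
    return i1, i2


def split_vq_decode(i1, i2, prev_lsf_q, cb_low, cb_high):
    """Rebuild the LSF vector from the two indices and the previous output."""
    d_q = list(cb_low[i1]) + list(cb_high[i2])
    return [y + dq for y, dq in zip(prev_lsf_q, d_q)]
```

The decoder must track the same previous quantized vector as the encoder; that is why the difference is formed against quantized values rather than the original LSFs.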
> Can I download G723 and G729 for free somewhere or do I have to pay for it ?
>
> I have free access to IEEE-xplore.
They use 24 bits (three codebooks of 8 + 8 + 8 bits), and the 10-dimensional LSF vector is split as 3 + 3 + 4. I think the descriptions are available from the ITU website. I have also seen the source code somewhere.
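The bit counts above translate into codebook storage and search effort as follows. The sketch only does arithmetic: the 3 + 3 + 4 split with three 8-bit codebooks described here, compared with a hypothetical single full-search codebook using the same 24 bits. The function names are made up for illustration.

```python
def split_vq_cost(split_dims, bits_per_codebook):
    """(total bits, stored numbers, distance evaluations per search) for split VQ."""
    total_bits = sum(bits_per_codebook)
    stored = sum((2 ** b) * d for d, b in zip(split_dims, bits_per_codebook))
    searches = sum(2 ** b for b in bits_per_codebook)
    return total_bits, stored, searches


def full_vq_cost(dim, bits):
    """Same figures for one unsplit codebook searched exhaustively."""
    return bits, (2 ** bits) * dim, 2 ** bits
```

`split_vq_cost([3, 3, 4], [8, 8, 8])` gives (24, 2560, 768): 768 codevectors to store and search, versus 16777216 codevectors for `full_vq_cost(10, 24)`.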
> To my knowledge, the best results for the LPC quantization are somewhere
> around 20 bits per vector.
Thanks for the quick answer, Vladimir.

However... I am quite new to this field of vector quantization... I am a bit puzzled by the "20 bits per vector"... Can you explain that in simple words? What do you mean by "20 bits per vector"?

Thank you.
>
> To my knowledge, the best results for the LPC quantization are somewhere
> around 20 bits per vector.
This is just a guess:

By 20 bits per 10-dimensional vector, do you mean that you assign 2 bits to each coefficient in the 10-dimensional vector? That is, you use the range 0 to 3 to represent the dynamic range of each observed coefficient? The encoded vector would then be a 20-bit stream looking like this:

xx|xx|xx|xx|xx|xx|xx|xx|xx|xx|

where each "xx" is 00, 01, 10 or 11... Am I right?

Jake wrote:

>>To my knowledge, the best results for the LPC quantization are somewhere
>>around 20 bits per vector.
>
>
> This is just a guess:
>
> By 20 bits per 10-dimensional vector do you mean that you assign 2 bits per
> coefficient in the 10-dimensional vector?
>
No. It is vector quantization of the 10-dimensional LSF vector, using a total of 20 bits to index the codebooks.
> It is the vector quantization of 10-dimensional LSF vector using the total
> of 20 bits for indexing the codebooks.
Thanks for a quick answer again :o)

But I am sorry to say that I don't understand why you would need 20 bits to index the codebooks... 20 bits is a pretty high number, and surely the codebook wouldn't be that large?? So there is something I don't understand...

I read that a codebook usually contains 500 vectors. So wouldn't you need 9 bits to represent an index range from 1 to 500???

Thanks again...
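To make the arithmetic in this exchange concrete: a codebook with N entries needs ceil(log2 N) index bits, so 500 entries need 9 bits. "20 bits per vector" refers to the total index bits sent for one LSF vector. With the 4 + 6 split described earlier, that means two 10-bit indices into two 1024-entry codebooks, which avoids the 2^20 (about one million) entries a single 20-bit codebook would need. Here is a minimal sketch of computing and packing the indices; the function names are hypothetical.

```python
def index_bits(codebook_size):
    """Bits needed to address codebook_size entries, i.e. ceil(log2(N))."""
    return (codebook_size - 1).bit_length()


def pack(indices, bits):
    """Concatenate indices into one integer, first index in the high bits."""
    word = 0
    for idx, b in zip(indices, bits):
        if not 0 <= idx < (1 << b):
            raise ValueError("index does not fit in its bit field")
        word = (word << b) | idx
    return word


def unpack(word, bits):
    """Inverse of pack: recover the indices from the packed integer."""
    out = []
    for b in reversed(bits):
        out.append(word & ((1 << b) - 1))
        word >>= b
    return out[::-1]
```

For example, `pack([5, 1000], [10, 10])` yields 6120, a 20-bit word, and `unpack(6120, [10, 10])` returns [5, 1000].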

Jake wrote:

GO TO THE LIBRARY

VLV

