
lpc10 vs celp

Started by dunjie17 January 18, 2005


hi all,

I'm new to speech coding and trying to understand the
principles. I stumbled upon this page:
http://svr-www.eng.cam.ac.uk/~ajr/SA95/node87.html
which lists the areas to be addressed in LPC10. I
was wondering how CELP (the basic version) improves on
those areas in its algorithm, if at all. The areas for
improvement in LPC10 are:

a) Glottal pulse shaping
tk: the codebook consists of a wide, relatively constant spectrum

b) Pitch-synchronous parameter updating
tk: ???

c) Fine tuning of the voicing decision
tk: CELP has an adaptive codebook for the
voiced (periodic) part and a fixed codebook for the noise-like part;
the actual excitation is the sum of these two contributions, and it
is passed through the synthesis filter. So there's no
"hard" voicing decision made; rather, the synthesized output
is made to be perceptually as close to the input
as possible.

d) Separation of speech and noise
tk: I don't take this to mean deciding whether a segment of signal is
speech or noise, since LPC already makes voiced/unvoiced decisions.
I take it to mean a segment of signal that consists
of both speech and noise. If that's the case, then
the synthesized outputs for the adaptive and fixed codebooks
give the speech and noise parts respectively.

e) Exploitation of temporal correlations of acoustic vectors
tk: ???
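The excitation construction described in (c) can be sketched in a few lines. This is a toy illustration only, not any standard's bit-exact scheme: the codebook contents, the gains, and the 1st-order filter coefficient are all made-up values.

```python
def synthesis_filter(excitation, lpc):
    """All-pole synthesis filter 1/A(z): s[n] = e[n] - sum_k a_k * s[n-k]."""
    order = len(lpc)
    hist = [0.0] * order            # past output samples s[n-1], s[n-2], ...
    out = []
    for e in excitation:
        s = e - sum(a * h for a, h in zip(lpc, hist))
        out.append(s)
        hist = [s] + hist[:-1]
    return out

# Toy subframe: the adaptive-codebook (periodic, "voiced") contribution and
# the fixed-codebook ("noise-like") contribution are scaled and summed.
# There is no hard voiced/unvoiced flag -- only the two gains g_a and g_f.
adaptive = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]    # periodic, pitch lag of 3 samples
fixed = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1]   # noise-like pulses
g_a, g_f = 0.8, 0.5

excitation = [g_a * a + g_f * f for a, f in zip(adaptive, fixed)]
speech = synthesis_filter(excitation, lpc=[-0.9])  # 1st-order toy filter
```

Varying g_a versus g_f moves the output smoothly between "mostly periodic" and "mostly noisy", which is exactly the soft voicing behavior described above.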

What would be a good book to pick up CELP from?
Thanks for your time, and TIA.

cheers,
tk





LPC10 and CELP belong to different coding schemes, so a direct
feature-by-feature comparison is not quite fair: their
problems are different. What they have in common is that both
estimate LPC coefficients and then code the residual left after
LPC-analysis filtering.
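That shared front end - estimate the LPC coefficients, then run the analysis filter A(z) to obtain the residual - can be sketched as below. The frame (a pure sinusoid), the predictor order, and the frame length are toy choices for illustration, not values from either standard.

```python
import math

def autocorr(x, lag):
    """Autocorrelation at the given lag (no windowing -- toy version)."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for A(z) = 1 + a_1 z^-1 + ..."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a

def analysis_residual(x, a):
    """Analysis filter A(z): e[n] = x[n] + sum_k a_k * x[n-k]."""
    return [x[n] + sum(a[k] * x[n - 1 - k] for k in range(min(len(a), n)))
            for n in range(len(x))]

frame = [math.sin(0.3 * n) for n in range(80)]   # toy "speech" frame
r = [autocorr(frame, lag) for lag in range(3)]
a = levinson_durbin(r, order=2)
e = analysis_residual(frame, a)
# a sinusoid is (almost) perfectly predicted by a 2nd-order model,
# so the residual energy is far below the frame energy
```

It is this residual `e` that the two coders then treat differently: LPC10 replaces it with a parametric model, CELP codes an approximation of it directly.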

LPC10 is a parametric coder, i.e. it tries to estimate physical
speech parameters (pitch, voicing measure, etc.) and transmit them.
This approach allows very strong compression, but it suffers from
errors in parameter estimation. For example, an error in pitch
detection leads to wrong periodicity in the synthetic speech, and a
wrong estimate of the voicing measure may cause buzziness. The
suggested tips (glottal pulse shaping, fine tuning of the voicing
decision, etc.) are intended to correct the voicing behavior so the
synthetic speech sounds less buzzy.

CELP is based on an analysis-by-synthesis scheme, i.e. it does not
even try to estimate objective parameters but says "I don't know what
it is, but I need something like that". In other words, CELP just
perceptually "copies" and transmits the residual. This approach does
not depend strongly on errors in pitch estimation (since there is no
explicit pitch in CELP, only LTP - long-term prediction, i.e. the
adaptive codebook) nor on a hard voicing decision (the adaptive and
fixed/algebraic codebooks provide only a SOFT separation into
voiced/unvoiced), and it is therefore free of buzziness. Thus CELP
gives higher quality than LPC10, but it needs a higher bit-rate and is
more vulnerable to network errors (packet losses); the latter is due
to its strong dependence on past excitation (the adaptive codebook).
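The analysis-by-synthesis idea can be sketched as a brute-force codebook search: synthesize each candidate excitation and keep the one whose output is closest to the target. Real CELP coders add gain quantization and a perceptual weighting filter; the tiny codebook and the 1st-order filter below are made up purely for illustration.

```python
def synth(excitation, a1=-0.9):
    """1st-order all-pole synthesis filter, a toy stand-in for 1/A(z)."""
    out, prev = [], 0.0
    for e in excitation:
        prev = e - a1 * prev
        out.append(prev)
    return out

def search(codebook, target):
    """Return the index whose synthesized output has minimum squared error."""
    def err(entry):
        return sum((x - t) ** 2 for x, t in zip(synth(entry), target))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
target = synth([0.0, 1.0, 0.0, 0.0])  # target built from entry 1
best = search(codebook, target)       # the search recovers index 1
```

Note that the search never asks "is this frame voiced?"; it only asks "which excitation makes the synthesized output closest to the input?", which is exactly the soft behavior described above.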

Hope it will help you a little.
Ilya Druker
