hi all, am new to speech coding and trying to understand the principles. I stumbled upon this website: http://svr-www.eng.cam.ac.uk/~ajr/SA95/node87.html and it lists the areas to be addressed in LPC10. I was wondering how CELP (the basic one) has improved on those areas in its algorithm, if at all. The areas for improvement in LPC10 are:

a) Glottal pulse shaping
tk: the codebook consists of a wide, relatively flat spectrum

b) Pitch-synchronous parameter updating
tk: ???

c) Fine tuning of the voicing decision
tk: CELP has an adaptive codebook for voiced excitation and a fixed codebook for noise-like excitation; the actual excitation is the sum of the two selected codewords, and it is passed through the synthesis filter. So there's no "hard" voicing decision made; rather, the synthesized output is made to be "perceptually" as close to the input as possible.

d) Separation of speech and noise
tk: I don't take this to mean deciding whether a segment of signal is speech or noise, since LPC already makes voiced/unvoiced decisions. I take it to mean a segment of signal which consists of both speech and noise. If that is the case, then the adaptive and fixed codebook contributions would give the speech and the noise respectively.

e) Exploitation of temporal correlations of acoustic vectors
tk: ???

What would be a good book to pick up on CELP? Thanks for your time, and TIA.

cheers,
tk
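To make the "sum of the two codebooks" in (c) concrete, here is a toy numpy sketch of one CELP subframe. Everything here is illustrative, not from any standard: the function name, the first-order filter in the example, and the assumption that the pitch lag is at least one subframe long are all mine.

```python
import numpy as np

def celp_subframe(a, past_exc, lag, g_adapt, fixed, g_fixed):
    """Synthesize one CELP subframe (toy sketch).

    a        : LPC coefficients [1, a1, ..., ap] of A(z), assumed given
    past_exc : previous excitation samples (the adaptive-codebook memory)
    lag      : pitch lag in samples (assumed >= subframe length here)
    fixed    : fixed-codebook vector (e.g. a sparse pulse pattern)

    The excitation is the gain-scaled sum of the adaptive contribution
    (past excitation delayed by the pitch lag) and the fixed-codebook
    vector -- there is no hard voiced/unvoiced switch.
    """
    n = len(fixed)
    # adaptive contribution: past excitation delayed by the pitch lag
    adapt = past_exc[-lag:len(past_exc) - lag + n]
    exc = g_adapt * adapt + g_fixed * np.asarray(fixed, dtype=float)
    # all-pole synthesis filter 1/A(z), direct-form recursion
    out = np.zeros(n)
    for i in range(n):
        out[i] = exc[i] - sum(a[k] * out[i - k]
                              for k in range(1, len(a)) if i - k >= 0)
    return exc, out

# toy usage: first-order LPC, sparse "algebraic-style" fixed codeword
rng = np.random.default_rng(0)
past = rng.standard_normal(100)
fixed = np.zeros(20); fixed[3] = 1.0
exc, out = celp_subframe([1.0, -0.9], past, lag=40, g_adapt=0.8,
                         fixed=fixed, g_fixed=0.5)
```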
lpc10 vs celp
Started by ●January 18, 2005
Reply by ●January 19, 2005
LPC10 and CELP belong to different coding schemes, so a direct feature-by-feature comparison is not quite right - their problems are different. What they have in common is that both estimate LPC coefficients and then code the residual after LPC-analysis filtering.

LPC10 is a parametric coder, i.e. it tries to estimate the physical parameters of the speech (pitch, voicing measure, etc.) and transmit them. This approach allows very strong compression, but it suffers from errors in parameter estimation. For example, an error in pitch detection causes wrong periodicity in the synthetic speech; a wrong estimate of the voicing measure may cause buzziness, etc. The suggested tips (glottal pulse shaping, fine tuning of the voicing decision, etc.) are intended to correct the voicing measure in order to make the synthetic speech less buzzy.

CELP is based on an analysis-by-synthesis scheme, i.e. it does not even try to estimate objective parameters but says "I don't know what this is, but I need something like it". In other words, CELP just perceptually "copies" and transmits the residual. This approach does not depend strongly on errors in pitch estimation (since there is no pitch in CELP, only LTP - long-term prediction, i.e. the adaptive codebook), nor on a hard voicing decision (the adaptive/algebraic codebooks are responsible only for a SOFT separation into voiced/unvoiced), and therefore it is free of buzziness.

Thus, CELP is of higher quality than LPC10, but it needs more bit-rate and is more vulnerable to network errors (packet losses); the latter is due to the strong dependence on past excitation (the adaptive codebook).

Hope it will help you a little.
Ilya Druker

--- In , "dunjie17" <tunkeat@g...> wrote:
> [original question quoted above, trimmed]
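To illustrate the analysis-by-synthesis idea described above - pick the codebook entry whose synthesized output best matches the target, rather than estimating physical parameters - here is a toy exhaustive search in numpy. It is only a sketch: it minimizes a plain squared error over a random codebook, whereas a real CELP coder searches with a perceptually weighted error and structured (adaptive/algebraic) codebooks.

```python
import numpy as np

def synth(a, exc):
    """All-pole synthesis 1/A(z) with zero initial state (toy version)."""
    out = np.zeros(len(exc))
    for i in range(len(exc)):
        out[i] = exc[i] - sum(a[k] * out[i - k]
                              for k in range(1, len(a)) if i - k >= 0)
    return out

def search_codebook(a, target, codebook):
    """Analysis-by-synthesis search: synthesize every candidate codeword,
    fit an optimal gain by least squares, and keep the (index, gain) pair
    whose synthesized output is closest to the target."""
    best_idx, best_gain, best_err = None, 0.0, np.inf
    for idx, c in enumerate(codebook):
        y = synth(a, c)
        denom = np.dot(y, y)
        g = np.dot(target, y) / denom if denom > 0 else 0.0
        e = target - g * y
        err = np.dot(e, e)
        if err < best_err:
            best_idx, best_gain, best_err = idx, g, err
    return best_idx, best_gain, best_err

# toy usage: the target was generated from codeword 2, so the
# search should recover index 2 and the gain 1.5
rng = np.random.default_rng(1)
codebook = [rng.standard_normal(16) for _ in range(8)]
a = [1.0, -0.5]
target = 1.5 * synth(a, codebook[2])
idx, gain, err = search_codebook(a, target, codebook)
```

Note that nothing here tries to measure "voicedness" of the target: the coder just keeps whatever codeword happens to synthesize the closest match, which is the soft behavior the reply describes.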