Technical discussions related to speech coding (all ITU and other vocoders, ACELP, CELP, AMR, etc.)

Speech reconstruction from spectrum. - Author Unknown - Oct 22 8:42:00 1999



Hello.

I know that speech can be reconstructed from its spectrum, but what quality
should be expected? How exactly should the phases of the harmonics be
reconstructed? Is it possible to give every harmonic a phase different from
the one in the original spectrum (which yields a different waveform) while
the sound keeps its original quality? (I don't mean the trivial cases, e.g.
signal inversion.)
How much freedom do we have over the phases in this case? Is there some
special, permanent relation between the phases that we can use when we look
at the coefficients of a source-synchronized DFT?

Thank you in advance,
Stefan.





(You need to be a member of speechcoding -- send a blank email to speechcoding-subscribe@yahoogroups.com )

Re: Speech reconstruction from spectrum. - Author Unknown - Oct 22 9:36:00 1999

It is a well-known fact that the human ear is sensitive to frequency and
not to phase!
However, phase distortion results in unequal propagation delays at
different frequencies, and this tends to produce distortion. If the phase
distortion is within certain limits (which depend on the application:
hi-fi systems require it to be as small as possible, while
communication-quality systems can take considerable liberty), it is not a
problem. This can also be observed as follows:
1. Most vocoders (LPC etc.) do not encode phase, and they tend to produce
a mechanical sound.
2. Hybrid coders (CELP etc.) encode some amount of phase, so they perform
relatively better.
3. Waveform coders give even better quality.
(Well, it's not just phase, but phase plays an important role in making
speech sound natural or mechanical.)
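The "unequal delays at different frequencies" can be quantified with group delay, tau(w) = -d(phase)/dw. The sketch below is an illustration, not something from the thread: it assumes a first-order all-pass filter H(z) = (-a + z^-1)/(1 - a z^-1) with a hypothetical coefficient a = 0.5. Its magnitude is 1 at every frequency, yet its group delay varies strongly with frequency, which is exactly a phase distortion that leaves the magnitude spectrum untouched.

```python
import cmath
import math

def allpass_response(a, w):
    """Frequency response of the first-order all-pass H(z) = (-a + z^-1) / (1 - a z^-1)."""
    z_inv = cmath.exp(-1j * w)
    return (-a + z_inv) / (1 - a * z_inv)

def group_delay(a, w, dw=1e-5):
    """Numerical group delay tau(w) = -d(phase)/dw in samples (central difference)."""
    # Taking the phase of the ratio H(w+dw)/H(w-dw) avoids 2*pi wrapping problems.
    ratio = allpass_response(a, w + dw) / allpass_response(a, w - dw)
    return -cmath.phase(ratio) / (2 * dw)

a = 0.5  # hypothetical all-pass coefficient
for w in (0.1, math.pi / 2, 3.0):
    mag = abs(allpass_response(a, w))
    # Closed form for comparison: tau(w) = (1 - a^2) / (1 - 2a cos w + a^2)
    exact = (1 - a * a) / (1 - 2 * a * math.cos(w) + a * a)
    print(f"w={w:.2f}  |H|={mag:.4f}  tau={group_delay(a, w):.4f}  closed form={exact:.4f}")
```

With a = 0.5, low frequencies are delayed by close to 3 samples and frequencies near pi by about 1/3 of a sample, although |H| = 1 throughout.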

-Bajwa wrote:








Re: Speech reconstruction from spectrum. - Khaled El-Maleh - Oct 22 14:44:00 1999


This is an interesting issue, so let me tell you about some of
my own experience (but be patient and read to the end).

Read these references about signal reconstruction from magnitude-only or
phase-only information:

1) A. Oppenheim and J. Lim, "The importance of phase in signals",
Proceedings of the IEEE, Vol. 69, No. 5, May 1981.

2) B. Yegnanarayana et al., "Significance of group delay functions in
signal reconstruction from spectral magnitude or phase", IEEE Trans. ASSP,
Vol. ASSP-32, No. 3, June 1984.

(We can reconstruct a signal from its magnitude spectrum alone if it
satisfies certain conditions, for example if it is a minimum-phase signal.
Similarly, we can reconstruct a signal from its phase spectrum or group
delay alone under certain conditions.)
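For the minimum-phase case, one standard way to recover the phase from the magnitude alone is the homomorphic (real-cepstrum) method. The sketch below is an illustration, not the procedure of the cited papers, and it rests on stated assumptions: a naive O(N^2) DFT so that only the standard library is needed, an even DFT length N, a magnitude spectrum with no zeros, and a length N large enough that cepstral aliasing is negligible.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def min_phase_from_magnitude(mag):
    """Rebuild a minimum-phase sequence from its N-point DFT magnitude (N even, no zeros)."""
    n = len(mag)
    # Real cepstrum: inverse DFT of the log magnitude.
    c = [v.real for v in idft([math.log(m) for m in mag])]
    # Fold the cepstrum: keep c[0] and c[N/2], double the causal part, drop the rest.
    folded = [0.0] * n
    folded[0] = c[0]
    folded[n // 2] = c[n // 2]
    for k in range(1, n // 2):
        folded[k] = 2.0 * c[k]
    # The DFT of the folded cepstrum is the log spectrum of the minimum-phase signal.
    return [v.real for v in idft([cmath.exp(v) for v in dft(folded)])]

N = 64
x = [1.0, 0.5] + [0.0] * (N - 2)  # zero at z = -0.5 (inside the unit circle): minimum phase
y = [0.5, 1.0] + [0.0] * (N - 2)  # zero at z = -2 (outside): same magnitude, not minimum phase
x_hat = min_phase_from_magnitude([abs(v) for v in dft(y)])
print([round(v, 4) for v in x_hat[:4]])
```

Because x and y share the same magnitude spectrum, the magnitude of y alone returns x, the minimum-phase member of the pair. Recovering y itself would need the extra phase information, which is why the condition matters.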

Many researchers have studied the importance of phase for the
quality of the reconstructed signal.

My own experiments show that the Fourier phase is important for retaining
natural-quality speech. One simple experiment is to keep only the
magnitude spectrum, replace the true phase with some other phase, and
study the quality of the reconstructed signal. I have done this using the
LP residual, and I have noticed that for speech the phase of the LP
residual is important for preserving naturalness. For unvoiced speech,
random phase is a good model with almost no effect on the quality.
However, for voiced and mixed-voiced speech, random phase is not
sufficient and is not a good model.
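The magnitude-only experiment described above can be sketched as follows. This is an assumption-laden illustration rather than the author's setup: it operates on one arbitrary real frame (for example a frame of LP residual; the frame here is hypothetical), uses a naive standard-library DFT, and draws uniformly distributed random phases from a fixed seed. Conjugate symmetry is enforced so that the output stays real.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def random_phase_resynthesis(frame, seed=0):
    """Keep the DFT magnitude of a real frame and replace its phase with random phase.

    The DC bin (and the Nyquist bin for even lengths) is left untouched because it
    must stay real; every other bin k gets a random phase and bin N-k its conjugate.
    """
    n = len(frame)
    X = dft(frame)
    rng = random.Random(seed)
    Y = list(X)
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(-math.pi, math.pi)
        Y[k] = abs(X[k]) * cmath.exp(1j * phi)
        Y[n - k] = Y[k].conjugate()
    return [v.real for v in idft(Y)]

# Hypothetical frame: one pitch-like pulse followed by a small negative sample.
frame = [0.0] * 32
frame[5], frame[6] = 1.0, -0.4
out = random_phase_resynthesis(frame)
mag_in = [abs(v) for v in dft(frame)]
mag_out = [abs(v) for v in dft(out)]
print("max magnitude difference:", max(abs(a - b) for a, b in zip(mag_in, mag_out)))
print("max waveform difference :", round(max(abs(a - b) for a, b in zip(frame, out)), 3))
```

The magnitude spectrum is preserved to rounding error while the single pulse is smeared across the frame. Per the observation above, that is acceptable for unvoiced frames but not for voiced ones.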

(See my paper for explanations regarding non-speech sounds such as
background acoustic noise:)

K. El-Maleh and P. Kabal, "Natural-quality background noise coding using
residual substitution", EuroSpeech 99. (www.tsp.ece.mcgill.ca)

The main reason is that for voiced speech, and for any structured sound,
the sequence of consecutive acoustic events produces the phase pattern.
For example, in waveform interpolation it is important to preserve the
spacing between consecutive pitch pulses.

(Kang and Sen, "Phase adjustment in waveform interpolation", ICASSP 99).
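The claim that the timing of acoustic events produces the phase pattern follows from the DFT shift property: a unit pulse at sample t0 of an N-point frame has phase -2*pi*k*t0/N at bin k. Moving a pitch pulse by a few samples therefore shifts the phase of every harmonic by an amount proportional to that harmonic's frequency, so pulse spacing and phase cannot be treated separately. The check below is only an illustration; the frame length and pulse positions are arbitrary assumptions.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (standard library only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def wrap(phi):
    """Wrap a phase to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))

def pulse_phase(k, t0, n):
    """Predicted phase at DFT bin k of a unit pulse at sample t0 (shift property)."""
    return wrap(-2 * math.pi * k * t0 / n)

n = 40
for t0 in (3, 11):  # two hypothetical pitch-pulse positions
    frame = [0.0] * n
    frame[t0] = 1.0
    X = dft(frame)
    measured = [cmath.phase(X[k]) for k in range(1, 5)]
    predicted = [pulse_phase(k, t0, n) for k in range(1, 5)]
    print(t0, [round(m, 3) for m in measured], [round(p, 3) for p in predicted])
```

The phase at bin k grows linearly with both k and t0, so keeping consecutive pulses correctly spaced is the same as keeping these phase increments consistent from frame to frame.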

Remark:
The long-term phase (sequence of short-time phase spectra) is
important perceptually.

To read more about this see:

1) C. Ma and D. O'Shaughnessy, "A perceptual study of source coding of
Fourier phase and amplitude of the linear predictive coding residual of
vowel sounds", J. Acoust. Soc. America, 95 (4), April 1994, pp. 2231-2239.

2) P. Hedelin, "Phase compensation in all-pole speech analysis", ICASSP
88, pp. 339-342.

3) O. Gautherot et al., "LPC residual phase investigation", Proc. of
EuroSpeech 89, pp. 35-38.

For the CELP family of speech coders, we have to remember that they use
a closed-loop (analysis-by-synthesis) search to find the excitation (i.e.
to model the LP residual). This waveform matching amounts to preserving
(modeling) both the magnitude and the phase of the LP residual.
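The waveform-matching point can be illustrated with a toy closed-loop search. This is a simplification stated as an assumption: a real CELP coder filters every codevector through a (perceptually weighted) LP synthesis filter before comparing, which is omitted here, and the target and codebook are hypothetical. For each codevector c the optimal gain is g = <t, c>/<c, c>, and the entry with the smallest squared error ||t - g c||^2 wins.

```python
def search_codebook(target, codebook):
    """Toy analysis-by-synthesis search: pick (index, gain, error) minimising ||target - g*c||^2."""
    best = None
    for idx, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        if energy == 0.0:
            continue  # an all-zero codevector cannot be scaled to match anything
        corr = sum(t * v for t, v in zip(target, c))
        gain = corr / energy  # optimal gain for this codevector
        err = sum((t - gain * v) ** 2 for t, v in zip(target, c))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best

target = [0.0, 1.0, 0.6, -0.2, 0.1, 0.0]
codebook = [
    target[::-1],                    # time-reversed: same DFT magnitude, different phase
    [0.5 * v for v in target],       # same waveform shape at a different scale
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # a single pulse
]
idx, gain, err = search_codebook(target, codebook)
print(idx, round(gain, 3), round(err, 6))
```

The time-reversed entry has exactly the same magnitude spectrum as the target, yet the squared-error criterion rejects it because its phase is wrong; matching the waveform forces both magnitude and phase to match.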

Read this paper for more details:

T. Ramabadran and C. Lueck, "Complexity reduction of CELP speech coders
through the use of phase information", IEEE Trans. Communications, Vol.
42, No. 2/3/4, Feb./March/April 1994, pp. 248-251.

For CELP coders at and below 4 kbps, many researchers have recently
proposed using an extra all-pass filter (a phase addition filter) to
compensate for the insufficient modeling of phase in CELP with small
codebooks. Read these papers:
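As a rough illustration of what a phase addition (all-pass) filter does to an excitation, the sketch below passes a single pulse through a first-order all-pass section. The filter order and the coefficient value are assumptions chosen for illustration only; the papers listed here use their own designs, which are not reproduced. The output energy equals the input energy (flat magnitude response), but the pulse is dispersed in time, so only the phase of the excitation changes.

```python
def allpass_filter(x, a):
    """First-order all-pass y[n] = -a*x[n] + x[n-1] + a*y[n-1], i.e. H(z) = (-a + z^-1)/(1 - a z^-1)."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for v in x:
        out = -a * v + x_prev + a * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y

pulse = [1.0] + [0.0] * 199        # one excitation pulse, 200 samples
h = allpass_filter(pulse, a=0.6)   # hypothetical coefficient
print("input energy :", sum(v * v for v in pulse))
print("output energy:", round(sum(v * v for v in h), 6))
print("first samples:", [round(v, 3) for v in h[:6]])
```

The impulse response is -a, then (1 - a^2) a^(n-1) for n >= 1, whose energy sums to exactly 1; the 200-sample truncation leaves a negligible remainder.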

1) Y. Yamaura et al., "CELP coding below 3 kbps using LPC residual
phase coding", Speech Coding Workshop 97, pp. 103-104.

2) B. Cheetham et al., "All-pass excitation phase modelling for low
bit-rate speech coding", 1997 IEEE Int. Symposium on Circuits and Systems,
June 1997, pp. 2633-2636.

Recently, many arguments have appeared that do not support the well-known
statement "the human ear is not sensitive to phase". That statement is
not always true.

Read the paper "On the Phase Perception of Speech" by W. Kleijn, which
appeared in ICASSP 99.

For sinusoidal coders, see this paper:

S. Ahmadi and A. Spanias, "A new phase model for sinusoidal
transform coding of speech", IEEE Trans. on Speech and Audio
Processing, Vol. 6, No. 5, Sept. 1998, pp. 495-501.
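In a sinusoidal coder the decoder must choose a phase for every harmonic, which is the freedom the original question asks about. The sketch below uses a generic harmonic model, not the phase model of the paper above, and synthesizes one frame with identical harmonic amplitudes but two phase sets: zero phase and a quadratic, Schroeder-style phase. The frame length, fundamental bin, harmonic count, flat amplitudes and the exact quadratic rule are all hypothetical choices.

```python
import cmath
import math

def synthesize(amps, phases, f0_bin, n):
    """Harmonic sum s[t] = sum_k A_k cos(2*pi*k*f0_bin*t/n + phi_k) for k = 1..K."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * f0_bin * t / n + p)
            for k, (a, p) in enumerate(zip(amps, phases)))
        for t in range(n)
    ]

def harmonic_magnitudes(s, f0_bin, count):
    """DFT magnitudes at the harmonic bins k*f0_bin (exact when harmonics fall on bins)."""
    n = len(s)
    return [abs(sum(s[t] * cmath.exp(-2j * math.pi * k * f0_bin * t / n) for t in range(n)))
            for k in range(1, count + 1)]

n, f0_bin, K = 160, 4, 10  # hypothetical frame length, fundamental bin, harmonic count
amps = [1.0] * K
zero_phase = [0.0] * K
quadratic = [math.pi * k * k / K for k in range(1, K + 1)]  # Schroeder-style phase (assumed rule)

s1 = synthesize(amps, zero_phase, f0_bin, n)
s2 = synthesize(amps, quadratic, f0_bin, n)
m1 = harmonic_magnitudes(s1, f0_bin, K)
m2 = harmonic_magnitudes(s2, f0_bin, K)
print("peak, zero phase     :", round(max(abs(v) for v in s1), 3))
print("peak, quadratic phase:", round(max(abs(v) for v in s2), 3))
print("same harmonic magnitudes:", all(abs(a - b) < 1e-6 for a, b in zip(m1, m2)))
```

Both frames have the same magnitude at every harmonic, but the zero-phase frame concentrates its energy in one sharp peak per period while the quadratic-phase frame is spread out. Whether such a difference is audible is precisely the phase-perception question discussed in this thread.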

Any comments are welcome.

----------------------------------------------------------------
Khaled El-Maleh
Department of Electrical & Computer Engineering
McGill University
3480 University St.
Montreal Quebec
H3A 2A7 Canada

Telephone: (514) 398-5233 (O)
Fax : (514) 398-4470




