Dudley's Channel Vocoder

The first major effort to encode speech electronically was Homer Dudley's channel vocoder (``voice coder'') [68], developed starting in October of 1928 at AT&T Bell Laboratories [245]. An overall schematic of the channel vocoder is shown in Fig. G.4.

[Figure G.4: Channel or phase vocoder block diagram.]

On analysis, the outputs of ten analog bandpass filters (spanning 250-3000 Hz) were rectified and lowpass-filtered to obtain an amplitude envelope for each band. In parallel, the fundamental frequency $ F_0$ was measured, and a voiced/unvoiced decision was made (unvoiced segments were indicated by $ F_0=0$). On synthesis, a ``buzz source'' (relaxation oscillator) at pitch $ F_0$ (for voiced speech) or a ``hiss source'' (for unvoiced speech) was used to drive a set of ten matching bandpass filters, whose outputs were summed to produce the reconstructed voice. While the voice quality had a quite noticeable ``unpleasant electrical accent'' [245], the bandwidth required to transmit $ F_0(t)$ and the bandpass-filter gain envelopes was much less than that required to transmit the original speech signal.
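The analysis and synthesis steps described above can be sketched in a few lines of Python with NumPy. This is only an illustrative digital stand-in, not Dudley's analog circuit. The function names (`bandpass_fft`, `envelope`, `channel_vocoder`), the equal-width band edges, the brick-wall FFT bandpass, the moving-average lowpass with a 25 Hz cutoff, and the use of a single constant $ F_0$ per call are all assumptions made for the example.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Brick-wall bandpass by zeroing FFT bins outside [lo, hi).
    Assumption: a crude digital stand-in for one analog bandpass filter."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def envelope(band, fs, cutoff=25.0):
    """Rectify, then lowpass with a moving average to get the amplitude envelope.
    Assumption: the 25 Hz cutoff is illustrative only."""
    n = max(1, int(fs / cutoff))
    kernel = np.ones(n) / n
    return np.convolve(np.abs(band), kernel, mode="same")

def channel_vocoder(x, fs, f0, n_bands=10, f_lo=250.0, f_hi=3000.0, seed=0):
    """Analyze x into band envelopes, then resynthesize from buzz or hiss.
    f0 > 0 selects the voiced (buzz) source; f0 == 0 selects the unvoiced (hiss) source."""
    # Assumption: equal-width bands across 250-3000 Hz.
    edges = np.linspace(f_lo, f_hi, n_bands + 1)
    if f0 > 0:
        # Buzz source, idealized here as an impulse train at pitch f0.
        period = int(round(fs / f0))
        excitation = np.zeros(len(x))
        excitation[::period] = 1.0
    else:
        # Hiss source, idealized here as white Gaussian noise.
        excitation = np.random.default_rng(seed).standard_normal(len(x))
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(x, fs, lo, hi), fs)  # analysis side
        y += env * bandpass_fft(excitation, fs, lo, hi)  # synthesis side
    return y
```

In this sketch only the ten envelopes and the value of `f0` carry information from the analysis side to the synthesis side. That reduction is the source of the bandwidth saving: the envelopes vary slowly compared with the speech waveform itself.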

The vocoder synthesis model can be considered a source-filter model for speech which uses a nonparametric spectral model of the vocal tract given by the output of a fixed bandpass-filter-bank over time. Related efforts included the formant vocoder [190], a type of parametric spectral model, which encoded $ F_0$ and the amplitudes and center frequencies of the first three spectral formants. See [168, pp. 2452-3] for an overview and references.

The original vocoder used a ``buzz source'' (implemented using a relaxation oscillator) driving the filter bank during voiced speech, and a ``hiss source'' (implemented using the noise from a resistor) driving the filter bank during unvoiced speech. In later speech modeling by linear prediction [162], the buzz source evolved to the more mathematically pure impulse train, and the hiss source became white noise.
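As a rough illustration of the two idealized excitations, the following NumPy sketch generates an impulse train and white Gaussian noise. The function names, the sampling rate of 8000 Hz, and the pitch of 100 Hz are assumptions chosen so that the pitch period is an integer number of samples.

```python
import numpy as np

def impulse_train(n_samples, fs, f0):
    """Idealized buzz source: one unit impulse per pitch period.
    Assumption: the period is rounded to a whole number of samples."""
    period = int(round(fs / f0))
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

def white_noise(n_samples, seed=0):
    """Idealized hiss source: zero-mean, unit-variance Gaussian noise."""
    return np.random.default_rng(seed).standard_normal(n_samples)

fs = 8000                             # assumed sampling rate in Hz
buzz = impulse_train(fs, fs, 100.0)   # one second of buzz at 100 Hz
hiss = white_noise(fs)                # one second of hiss

# With one second of signal, each DFT bin is 1 Hz wide, so
# bins 100, 200, ... hold the harmonics of the 100 Hz pitch.
spectrum = np.abs(np.fft.rfft(buzz))
harmonics = spectrum[100::100]
```

In this setup all entries of `harmonics` have the same magnitude and the bins between them are zero. That flat harmonic spectrum is what makes the impulse train a convenient excitation: the spectral shape of the output is then set entirely by the filter.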

The vocoder used an analog bandpass filter bank, and only the amplitude envelope was retained for each bandpass channel. When the vocoder was later reimplemented using the discrete Fourier transform on a digital computer (§G.7 below), it became simple to record both the instantaneous amplitude and phase for each channel. As a result, the name was updated to phase vocoder. Section G.7 summarizes the history of the phase vocoder, and §G.10 describes an example implementation using the STFT.
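To make the amplitude-versus-phase distinction concrete, here is a minimal NumPy sketch that computes both quantities for every DFT channel of one windowed frame. The function name `stft_frame`, the Hann window, the frame length of 256, and the test signal are assumptions for illustration. This is not the implementation described in the sections cited above.

```python
import numpy as np

def stft_frame(x, start, n_fft=256):
    """Return (amplitude, phase) per DFT channel for one windowed frame.
    Assumption: Hann window and frame length are illustrative choices."""
    frame = x[start:start + n_fft] * np.hanning(n_fft)
    X = np.fft.rfft(frame)
    return np.abs(X), np.angle(X)

fs = 8000                             # assumed sampling rate in Hz
t = np.arange(1024) / fs
x = np.cos(2 * np.pi * 1000.0 * t)    # 1 kHz test tone

amp, phase = stft_frame(x, 0)
k = int(np.argmax(amp))               # channel with the largest amplitude
peak_hz = k * fs / 256                # center frequency of that channel
```

A channel vocoder would keep only `amp`. Keeping `phase` as well allows the windowed frame to be recovered from `amp * np.exp(1j * phase)` by an inverse DFT.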

Speech Synthesis Examples

The original goal of the vocoder was speech synthesis from a sparse, parametric model. A large collection of sound examples spanning the history of speech synthesis can be heard on the CD-ROM accompanying a JASA-87 review article by Dennis Klatt [129].
