DSPRelated.com
Free Books

Singing Kelly-Lochbaum Vocal Tract

In 1962, John L. Kelly and Carol C. Lochbaum published a software version of a digitized vocal-tract analog model [245,246]. This may be the first instance of a sampled traveling-wave model of the vocal tract, as opposed to a lumped-parameter transmission-line model. In other words, Kelly and Lochbaum apparently returned to the original acoustic tube model (a sequence of cylinders), obtained d'Alembert's traveling-wave solution in each section, and applied Nyquist's sampling theorem to digitize the system. This sampled, bandlimited approach to digitization contrasts with the use of bilinear transforms as in wave digital filters; an advantage is that the frequency axis is not warped, but it is prone to aliasing when the parameters vary over time (or if nonlinearities are present).

At the junction of two cylindrical tube sections, i.e., at area discontinuities, lossless scattering occurs.A.14As mentioned in §A.5.4, reflection/transmission at impedance discontinuities was well formulated in classical network theory [34,35], and in transmission-line theory.

The Kelly-Lochbaum model can be regarded as a kind of ladder filter [297] or, more precisely, using later terminology, a digital waveguide filter [433]. Ladder and lattice digital filters can be used to realize arbitrary transfer functions [297], and they enjoy low sensitivity to round-off error, guaranteed stability under coefficient interpolation, and freedom from overflow oscillations and limit cycles under general conditions. Ladder/lattice filters remain important options when designing fixed-point implementations of digital filters, e.g., in VLSI. In the context of wave digital filters, the Kelly-Lochbaum model may be viewed as a digitized unit element filter [136], reminiscent of waveguide filters used in microwave engineering. In more recent terminology, it may be called a digital waveguide model of the vocal tract in which the digital waveguides are degenerated to single-sample width [433,442,452].

In 1961, Kelly and Lochbaum collaborated with Max Mathews to create what was most likely the first digital physical-modeling synthesis example by any method.A.15 The voice was computed on an IBM 704 computer using speech-vowel data from Gunnar Fant's recent book [132]. Interestingly, Fant's vocal-tract shape data were obtained (via x-rays) for Russian vowels, not English, but they were close enough to be understandable. Arthur C. Clarke, visiting John Pierce at Bell Labs, heard this demo, and he later used it in ``2001: A Space Odyssey,''--the HAL9000 computer slowly sang its ``first song'' (``Bicycle Built for Two'') as its disassembly by astronaut Dave Bowman neared completion.A.16

Perhaps due in part to J. L. Kelly's untimely death afterward, research on vocal-tract analog models tapered off thereafter, although there was some additional work [306]. Perhaps the main reason for the demise of this research thread was that spectral models (both nonparametric models such as the vocoder, and parametric source-filter models such as linear predictive coding (discussed below)) proved to be more effective when the application was simply speech coding at acceptably low bit rates and high fidelity levels. In telephone speech-coding applications, there was no requirement that a physical voice model be retained for purposes of expressive musical performance. In fact, it was desired to automate and minimize the ``performance expertise'' required to operate the voice production model. One could go so far as to say that the musical expressivity of voice synthesis models reached their peak in the 1939 Voder and related (manual) systems (§A.6.1).

In computer music, the Kelly-Lochbaum vocal tract model was revived for singing-voice synthesis in the thesis research of Perry Cook [87].A.17 In addition to the basic vocal tract model with side branch for the nasal tract, Cook included neck radiation (e.g., for `b'), and damping extensions. Additional work on incorporating damping within the tube sections was carried out by Amir et al. [15]. Other extensions include sparse acoustic tube modeling [149] and extension to piecewise conical acoustic tubes [507]. The digital waveguide modeling framework [430,431,433] can be viewed as an adaptation of extremely sparse acoustic-tube models for artificial reverberation, vibrating strings, and wind-instrument bores.


Next Section:
Linear Predictive Coding of Speech
Previous Section:
Vocal Tract Analog Models