Free Books

A History of Enabling Ideas Leading to Virtual Musical Instruments

This appendix summarizes some milestones toward the development of virtual musical instruments. Beginning with early musical acoustics, key developments in the literature deemed most enabling for virtual acoustic instruments are traced up to the mid 1980s, with an emphasis on string models (as in the text overall). Selected follow-on efforts are mentioned. A condensed version of this appendix was published in [446], with associated presentation overheads [448].

Early Musical Acoustics

All things which can be known have number; for it is not possible that without number anything can be either conceived or known.
-- Philolaus (ca. 400 BC)
Vibrating strings were studied by the Pythagoreans (6th-5th century BC). Pythagorus noticed that harmonics were produced by dividing the string length by whole numbers, and he was interested in understanding consonant pitch intervals in terms of simple ratios of string lengths. ``Harmony theory'' from the Pythagoreans was taught throughout the Middle Ages as one of the seven liberal arts: the quadrivium, consisting of arithmetic, geometry, astronomy, and music (harmony theory); and the trivium, consisting of grammar, logic, and rhetoric [411]. The correspondence between musical pitch and frequency of physical vibration was not discovered until the seventeenth century [113]. It took until Galileo (1564-1642) to be free of the formulation of Aristotle (384-322 BC) that all motion required an ongoing applied force, thereby opening the way for modern differential equations of motion. The ideas of Galileo were formalized and extended by Newton (1642-1727), whose famous second law of motion ``$ f=ma$'' lies at the foundation of essentially all classical mechanics and acoustics. Newton's Principia (1686) describes sound as traveling pressure pulses, and single-frequency sound waves were analyzed. The first to publish a one-dimensional wave equation for the vibrating string was the applied mathematician Jean Le Rond d'Alembert (1717-1783) [100,103].A.1The 1D wave equation can be written as

$\displaystyle K\frac{\partial^2 y}{\partial x^2} = \epsilon\frac{\partial^2 y}{\partial t^2}$ (A.1)


\begin{displaymath}\begin{array}{rclrcl} K & \isdef & \mbox{string tension} & \q...
...r mass density} & & = & \mbox{string displacement}, \end{array}\end{displaymath}    

$ t$ is time in seconds, and $ x$ denotes position along the string. (See discussion in §C.1.) It can be derived directly from Newton's second law applied to a differential string element by equating the string restoring force about a small element (first described by Brook Taylor in 1713) to Newton's mass times acceleration of that element, which is how d'Alembert in fact proceeded in his derivation [103]. In addition to introducing the 1D wave equation as an equation among differentials, d'Alembert derived its solution in terms of traveling waves:

$\displaystyle y(t,x) = y_r\left(t-\frac{x}{c}\right) + y_l\left(t+\frac{x}{c}\right)$ (A.2)

where $ c=\sqrt{K/\epsilon }$ denotes the wave propagation speed. Earlier, in 1717, Brook Taylor (1685-1731)A.2published what was apparently the first mathematical paper on vibrating strings, but without introducing differential equations. D'Alembert's ideas were developed into essentially modern form primarily by Euler (1707-1783)A.3 [65]. Incidentally, d'Alembert also invented the theory of construction of eyeglasses.A.4

History of Modal Expansion

While the traveling-wave solution for the ideal vibrating string is fully general, and remains the basis for the most efficient known string synthesis algorithms, an important alternate formulation is in terms of modal expansions, that is, a superposition sum of orthogonal basis functions which solve the differential equation and obey the boundary conditions. Daniel Bernoulli (1700-1782) developed the notion that string vibrations can be expressed as the superposition of an infinite number of harmonic vibrations [103].A.5This approach ultimately evolved to Hilbert spaces of orthogonal basis functions that are solutions of Hermitian linear operators--a formulation at the heart of quantum mechanics describing what can be observed in nature [112,539]. In computational acoustic modeling, Sturm-Liouville theory has been used to give improved models of nonuniform acoustic tubes such as horns [50], and to provide spatial transforms analogous to the Laplace transform [360]. In the field of computer music, the introduction of modal synthesis is credited to Adrien [5]. More generally, a modal synthesis model is perhaps best formulated via a diagonalized state-space formulation [558,220,107].A.6 A more recent technique, which has also been used to derive modal synthesis models, is the so-called functional transformation method [360,499,498]. In this approach, physically meaningful transfer functions are mapped from continuous to discrete time in a way which preserves desired physical parametric controls.

Mass-Spring Resonators

Since a harmonic oscillator is produced by a simple mass-spring system, a mechanical generator for the harmonic basis functions of Bernoulli is readily obtained by equating Newton's second law $ f=m\ddot{x}$ for the reaction force of an ideal mass $ m$, with Robert Hooke's spring force law $ f=-kx$ (published in 1676), where $ k$ is an empirical spring constant [65]. Hooke (1635-1703) was a contemporary of Newton's who carried out extensive experiments with springs in search of a spring-regulated clock [259, pp. 274-288]. Hooke's law was generalized to 3D by Cauchy (1789-1857) as the familiar linear relationship between six components of stress and strain.A.7 Elementary mass-spring models have found much use in computational physical models for purposes of sound synthesis [69,92]. For example, a mass-spring oscillator is typically used to model a brass-player's lips [4], piano hammers [44], and is sometimes included in woodwind-reed models [406].

Sampling Theory

Nowadays, audio processing is typically carried out in discrete time. As a result, sampling theory is fundamental to digital audio signal processing. The sampling theorem is credited to Harold Nyquist (1928), extending an earlier result by Cauchy (1831) based on series expansions. Claude Shannon is credited with reviving interest in the sampling theorem after World War II when computers became public.A.8 As a result, the sampling theorem is often called ``Nyquist's sampling theorem,'' ``Shannon's sampling theorem,'' or the like. The sampling rate has been called the ``Nyquist rate'' in honor of Nyquist's contributions [333]. Often in common usage, however, the term ``Nyquist rate'' is used to refer instead to half the sampling rate. To preserve the historically correct meaning, we might encourage use of the term Nyquist limit to mean half the sampling rate, and simply say ``sampling rate'' instead of ``Nyquist rate'', so as to minimize confusion.

Physical Digital Filters

Analog Computers

Analog filters are inherently ``physical'' since voltages and currents are direct ``analogs'' for physical variables such as force and velocity. So-called ``analog computers'' were often used for physical system simulation before digital computers took over.

Finite Difference Methods

One of the first applications of digital computers to numerical simulation of physical systems was the so-called finite difference approach [481]. The general procedure is to replace derivatives by finite differences, and there are many variations on how this can be done. For example, the first partial derivative with respect to time in Eq.$ \,$(A.1) may be approximated by

$\displaystyle {\dot y}(t,x) \isdefs \frac{\partial}{\partial t} y(t,x) \;\approx\; \frac{y(t,x) - y(t-T,x)}{T} \approx \frac{y(t+T,x) - y(t-T,x)}{2T}.$ (A.3)

The second form, called a centered finite difference, has the advantage of not introducing a time delay, but at the expense of requiring an extra factor of two oversampling for a given accuracy in its magnitude response (when viewed as a digital filter). Finite differences are at least as old as Taylor series, since,

$\displaystyle y(t-T)= y(t) - T{\dot y}(t) + \cdots \;\Rightarrow\; {\dot y}(t) \approx \frac{y(t)-y(t-T)}{T}.$ (A.4)

Interestingly, the general Taylor series, published in 1715 by Taylor in his book Methodus Incrementorum Directa et Inversa, was known more than forty years earlier to James Gregory (1668), somewhat earlier to Jean Bernoulli, and to some extent even before 1550 in India [65, pp. 422,469]. Finite differences were used to construct the earliest known digital models of vibrating strings by Pierre Ruiz and Lejaren Hiller ca. 1971 [194]. Finite-difference methods have not historically been aimed at real-time simulation, and they are generally used with very large sampling rates compared with the ``band of interest''. On the order of ten-times oversampling is needed to obtain reasonably accurate simulation across the entire audio band when using classical finite difference methods. In the finite-difference method literature, accuracy is usually only considered at dc, which is inaudible. Since finite difference models are usually linear, time-invariant digital filters, it is straightforward to improve them by filter-design methods, optimizing perceived audio error over the entire audio band. Such improved (and optionally extended) coefficients can then be used to construct a refined, indirectly estimated, partial differential equation. Other offline (slower than real time) computational physical modeling methods include finite element [480] and boundary element methods. Such offline simulations can be valuable as a source of ``virtual experiments'' which enable the testing and calibration of faster algorithms. As an example, a detailed simulation of guitar acoustics, employing both finite difference and finite element modeling approaches, is reported in [109].

Transfer Function Models

As indicated in the previous section, instead of digitizing a differential equation by finite differences, one can often formulate a filter design problem. This is ideal when all that matters is the input-output response of the physical system, and the physical system is linear and time-invariant (LTI). When the desired transfer function spans more than one system element, non-physical models are usually obtained, so we will not consider such models further. However, digital filter design methods optimizing perceptually motivated error criteria are extremely effective in spectral modeling and audio compression applications [337]. They are also good choices for subsystems which are to remain fixed over time, such as cello bodies, piano soundboards, and the like.

Wave Digital Filter Models

Perhaps the best known physics-based approach to digital filter design is wave digital filters, developed principally by Alfred Fettweis [136].A.9Wave digital filters may be constructed by applying the bilinear transform [343] to the scattering-theoretic formulation of lumped RLC networks introduced in circuit theory by Vitold Belevitch [34]. Fettweis in fact worked with Belevitch.A.10Scattering theory had been in use for many years prior in quantum mechanics. A key, driving property of wave digital filters is low sensitivity to coefficient round-off error. This follows from the correspondence to passive circuit networks. Wave digital filters also have the nice property of preserving order of the original (analog) system. For example, a ``wave digital spring'' is simply a unit delay, and a ``wave digital mass'' is a unit delay with a sign flip. The only approximation aspect is the frequency-warping caused by the bilinear transform. It is interesting to note that when it is possible to frequency-warp input/output signals exactly, a wave digital filter can implement a continuous-time LTI system exactly! See [55] for a discussion of wave digital filters and their relation to finite differences et al. In computer music, various ``wave digital elements'' have been proposed, including wave digital toneholes [527], piano hammers [56], and woodwind bores [525].

Voice Synthesis

Unquestionably, the most extensive prior work in the 20th century relevant to virtual acoustic musical instruments occurred within the field of speech synthesis [139,142,363,408,335,106,243].A.11 This research was driven by both academic interest and the potential practical benefits of speech compression to conserve telephone bandwidth. It was clear at an early point that the bandwidth of a telephone channel (nominally 200-3200 Hz) was far greater than the ``information rate'' of speech. It was reasoned, therefore, that instead of encoding the speech waveform, it should be possible to encode instead more slowly varying parameters of a good synthesis model for speech. Before the 20th century, there were several efforts to simulate the voice mechanically, going back at least until 1779 [140].

Dudley's Vocoder

The first major effort to encode speech electronically was Homer Dudley's vocoder (``voice coder'') [119] developed starting in October of 1928 at AT&T Bell Laboratories [414]. A manually controlled version of the vocoder synthesis engine, called the Voder (Voice Operation Demonstrator [140]), was constructed and demonstrated at the 1939 World's Fairs in New York and San Francisco [119]. Pitch was controlled by a foot pedal, and ten fingers controlled the bandpass gains. Buzz/hiss selection was by means of a wrist bar. Three additional keys controlled transient excitation of selected filters to achieve stop-consonant sounds [140]. ``Performing speech'' on the Voder required on the order of a year's training before intelligible speech could reliably be produced. The Voder was a very interesting performable instrument! The vocoder and Voder can be considered based on a source-filter model for speech which includes a non-parametric spectral model of the vocal tract given by the output of a fixed bandpass-filter-bank over time. Later efforts included the formant vocoder (Munson and Montgomery 1950)--a type of parametric spectral model--which encoded $ F_0$ and the amplitude and center-frequency of the first three spectral formants. See [306, p. 2452-3] for an overview and references. While we have now digressed to into the realm of spectral models, as opposed to physical models, it seems worthwhile to point out that the early efforts toward speech synthesis were involved with essentially all of the mainstream sound modeling methods in use today (both spectral and physical domains).

Vocal Tract Analog Models

There is one speech-synthesis thread that clearly classifies under computational physical modeling, and that is the topic of vocal tract analog models. In these models, the vocal tract is regarded as a piecewise cylindrical acoustic tube. The first mechanical analogue of an acoustic-tube model appears to be a hand-manipulated leather tube built by Wolfgang von Kempelen in 1791, reproduced with improvements by Sir Charles Wheatstone [140]. In electrical vocal-tract analog models, the piecewise cylindrical acoustic tube is modeled as a cascade of electrical transmission line segments, with each cylindrical segment being modeled as a transmission line at some fixed characteristic impedance. An early model employing four cylindrical sections was developed by Hugh K. Dunn in the late 1940s [120]. An even earlier model based on two cylinders joined by a conical section was published by T. Chiba and M. Kajiyama in 1941 [120]. Cylinder cross-sectional areas $ A_i$ were determined based on X-ray images of the vocal tract, and the corresponding characteristic impedances were proportional to $ 1/A_i$. An impedance-based, lumped-parameter approximation to the transmission-line sections was used in order that analog LC ladders could be used to implement the model electronically. By the 1950s, LC vocal-tract analog models included a side-branch for nasal simulation [131]. The theory of transmission lines is credited to applied mathematician Oliver Heaviside (1850-1925), who worked out the telegrapher's equations (sometime after 1874) as an application of Maxwell's equations, which he simplified (sometime after 1880) from the original 20 equations of Maxwell to the modern vector formulation.A.12 Additionally, Heaviside is credited with introducing complex numbers into circuit analysis, inventing essentially Laplace-transform methods for solving circuits (sometime between 1880 and 1887), and coining the terms `impedance' (1886), `admittance' (1887), `electret', `conductance' (1885), and `permeability' (1885). A little later, Lord Rayleigh worked out the theory of waveguides (1897), including multiple propagating modes and the cut-off phenomenon.A.13

Singing Kelly-Lochbaum Vocal Tract

In 1962, John L. Kelly and Carol C. Lochbaum published a software version of a digitized vocal-tract analog model [245,246]. This may be the first instance of a sampled traveling-wave model of the vocal tract, as opposed to a lumped-parameter transmission-line model. In other words, Kelly and Lochbaum apparently returned to the original acoustic tube model (a sequence of cylinders), obtained d'Alembert's traveling-wave solution in each section, and applied Nyquist's sampling theorem to digitize the system. This sampled, bandlimited approach to digitization contrasts with the use of bilinear transforms as in wave digital filters; an advantage is that the frequency axis is not warped, but it is prone to aliasing when the parameters vary over time (or if nonlinearities are present). At the junction of two cylindrical tube sections, i.e., at area discontinuities, lossless scattering occurs.A.14As mentioned in §A.5.4, reflection/transmission at impedance discontinuities was well formulated in classical network theory [34,35], and in transmission-line theory. The Kelly-Lochbaum model can be regarded as a kind of ladder filter [297] or, more precisely, using later terminology, a digital waveguide filter [433]. Ladder and lattice digital filters can be used to realize arbitrary transfer functions [297], and they enjoy low sensitivity to round-off error, guaranteed stability under coefficient interpolation, and freedom from overflow oscillations and limit cycles under general conditions. Ladder/lattice filters remain important options when designing fixed-point implementations of digital filters, e.g., in VLSI. In the context of wave digital filters, the Kelly-Lochbaum model may be viewed as a digitized unit element filter [136], reminiscent of waveguide filters used in microwave engineering. In more recent terminology, it may be called a digital waveguide model of the vocal tract in which the digital waveguides are degenerated to single-sample width [433,442,452]. In 1961, Kelly and Lochbaum collaborated with Max Mathews to create what was most likely the first digital physical-modeling synthesis example by any method.A.15 The voice was computed on an IBM 704 computer using speech-vowel data from Gunnar Fant's recent book [132]. Interestingly, Fant's vocal-tract shape data were obtained (via x-rays) for Russian vowels, not English, but they were close enough to be understandable. Arthur C. Clarke, visiting John Pierce at Bell Labs, heard this demo, and he later used it in ``2001: A Space Odyssey,''--the HAL9000 computer slowly sang its ``first song'' (``Bicycle Built for Two'') as its disassembly by astronaut Dave Bowman neared completion.A.16 Perhaps due in part to J. L. Kelly's untimely death afterward, research on vocal-tract analog models tapered off thereafter, although there was some additional work [306]. Perhaps the main reason for the demise of this research thread was that spectral models (both nonparametric models such as the vocoder, and parametric source-filter models such as linear predictive coding (discussed below)) proved to be more effective when the application was simply speech coding at acceptably low bit rates and high fidelity levels. In telephone speech-coding applications, there was no requirement that a physical voice model be retained for purposes of expressive musical performance. In fact, it was desired to automate and minimize the ``performance expertise'' required to operate the voice production model. One could go so far as to say that the musical expressivity of voice synthesis models reached their peak in the 1939 Voder and related (manual) systems (§A.6.1). In computer music, the Kelly-Lochbaum vocal tract model was revived for singing-voice synthesis in the thesis research of Perry Cook [87].A.17 In addition to the basic vocal tract model with side branch for the nasal tract, Cook included neck radiation (e.g., for `b'), and damping extensions. Additional work on incorporating damping within the tube sections was carried out by Amir et al. [15]. Other extensions include sparse acoustic tube modeling [149] and extension to piecewise conical acoustic tubes [507]. The digital waveguide modeling framework [430,431,433] can be viewed as an adaptation of extremely sparse acoustic-tube models for artificial reverberation, vibrating strings, and wind-instrument bores.

Linear Predictive Coding of Speech

Approximately a decade after the Kelly-Lochbaum voice model was developed, Linear Predictive Coding (LPC) of speech began [20,296,297]. The linear-prediction voice model is best classified as a parametric, spectral, source-filter model, in which the short-time spectrum is decomposed into a flat excitation spectrum multiplied by a smooth spectral envelope capturing primarily vocal formants (resonances). LPC has been used quite often as a spectral transformation technique in computer music, as well as for general-purpose audio spectral envelopes [381], and it remains much used for low-bit-rate speech coding in the variant known as Codebook Excited Linear Prediction (CELP) [337].A.18When applying LPC to audio at high sampling rates, it is important to carry out some kind of auditory frequency warping, such as according to mel, Bark, or ERB frequency scales [182,459,482]. Interestingly, it was recognized from the beginning that the all-pole LPC vocal-tract model could be interpreted as a modified piecewise-cylindrical acoustic-tube model [20,297], and this interpretation was most explicit when the vocal-tract filters (computed by LPC in direct form) were realized as ladder filters [297]. The physical interpretation is not really valid, however, unless the vocal-tract filter parameters are estimated jointly with a realistic glottal pulse shape. LPC demands that the vocal tract be driven by a flat spectrum--either an impulse (or low-pitched impulse train) or white noise--which is not physically accurate. When the glottal pulse shape (and lip radiation characteristic) are ``factored out'', it becomes possible to convert LPC coefficients into vocal-tract shape parameters (area ratios). Approximate results can be obtained by assuming a simple roll-off characteristic for the glottal pulse spectrum (e.g., -12 dB/octave) and lip-radiation frequency response (nominally +6dB /octave), and compensating with a simple preemphasis characteristic (e.g., $ 12 - 6 = 6$ dB/octave) [297]. More accurate glottal pulse estimation in terms of parameters of the derivative-glottal-wave models by Liljencrants, Fant, and Klatt [133,257] (still assuming +6dB/octave for lip radiation) was carried out in the thesis research of Vicky Lu [290], and further extension of that work appears in [250,213,251].

Formant Synthesis Models

A formant synthesizer is a source-filter model in which the source models the glottal pulse train and the filter models the formant resonances of the vocal tract. Constrained linear prediction can be used to estimate the parameters of formant synthesis models, but more generally, formant peak parameters may be estimated directly from the short-time spectrum (e.g., [255]). The filter in a formant synthesizer is typically implemented using cascade or parallel second-order filter sections, one per formant. Most modern rule-based text-to-speech systems descended from software based on this type of synthesis model [255,256,257]. Another type of formant-synthesis method, developed specifically for singing-voice synthesis is called the FOF method [386]. It can be considered an extension of the VOSIM voice synthesis algorithm [219]. In the FOF method, the formant filters are implemented in the time domain as parallel second-order sections; thus, the vocal-tract impulse response is modeled a sum of three or so exponentially decaying sinusoids. Instead of driving this filter with a glottal pulse wave, a simple impulse is used, thereby greatly reducing computational cost. A convolution of an impulse response with an impulse train is simply a periodic superposition of the impulse response. In the VOSIM algorithm, the impulse response was trimmed to one period in length, thereby avoiding overlap and further reducing computations. The FOF method also tapers the beginning of the impulse-response using a rising half-cycle of a sinusoid. This qualitatively reduces the ``buzziness'' of the sound, and compensates for having replaced the glottal pulse with an impulse. In practice, however, the synthetic signal is matched to the desired signal in the frequency domain, and the details of the onset taper are adjusted to optimize audio quality more generally, including to broaden the formant resonances. One of the difficulties of formant synthesis methods is that formant parameter estimation is not always easy [408]. The problem is particularly difficult when the fundamental frequency $ F_0$ is so high that the formants are not adequately ``sampled'' by the harmonic frequencies, such as in high-pitched female voice samples. Formant ambiguities due to insufficient spectral sampling can often be resolved by incorporating additional physical constraints to the extent they are known. Formant synthesis is an effective combination of physical and spectral modeling approaches. It is a physical model in that there is an explicit division between glottal-flow wave generation and the formant-resonance filter, despite the fact that a physical model is rarely used for either the glottal waveform or the formant resonator. On the other hand, it is a spectral modeling method in that its parameters are estimated by explicitly matching short-time audio spectra of desired sounds. It is usually most effective for any synthesis model, physical or otherwise, to be optimized in the ``audio perception'' domain to the extent it is known how to do this [312,165]. For an illustrative example, see, e.g., [201].

Further Reading in Speech Synthesis

More on the history and practice of speech synthesis can be found in, e.g., [139,140,142,408,335,106]. A surprisingly complete collection of sound examples spanning the history of speech synthesis can be heard on the CD-ROM accompanying a JASA-87 review article by Dennis Klatt [256]. Sound examples of these and related models in music applications may be found online as well [448].

String Models

As mentioned in §A.5.2, the first known computational models of vibrating strings were based on finite difference modeling by Ruiz and Hiller [194]. In the field of musical acoustics, software simulations of bowed strings were being carried out by Michael E. McIntyre and James Woodhouse as early as the mid 1970s [307]. In these simulations, the string is represented by the ``Green's function'' (impulse response) seen by a traveling wave traversing the string once in both directions (i.e., one round trip). This model was evidently based on investigations in the 1960s by John Schelleng, Lothar Cremer, and Cremer's co-worker Hans Lazarus into the behavior of the so-called ``corner rounding function'' (round-trip impulse response) in the context of bowed-string dynamics [409,410,95].A.19 The bow-string junction was based on theory worked out by Friedlander [151] and Keller [244], both published in 1953. These mathematical models of bowed-string dynamics were in turn preceded by influential investigations by Raman, published in 1918 [365], and Hermann von Helmholtz, published in 1863 [538]. An important enabling scientific measurement was that of the friction curve describing the bow-string contact, and many models, such as in [307,308] used a hyperbolic friction curve, approximating measurements by Lazarus [95]. Interestingly, while researchers in musical acoustics would often use their models to produce example waveforms illustrating the successful modeling of various visible physical effects, they apparently never listened to them as sound. Robert T. Schumacher, who had been interested primarily in woodwind musical acoustics, collaborated with McIntyre and Woodhouse, and the result was a generalization of the Friedlander-Keller bowed-string model to wind instruments [308]. Incidentally, electrical equivalent circuits for bowed-string instruments appeared as early as 1952 [332, p. 122], with perhaps the most cited model being Schelleng's published in 1963 [409].

Karplus-Strong Algorithms

In 1983 the Karplus-Strong [236] and Extended Karplus-Strong (EKS) [207,428] algorithms were published (in the same issue of the Computer Music Journal). Kevin Karplus and Alex Strong were computer science students at Stanford trying out wavetable synthesis algorithms on an 8-bit microcomputer. The sounds were rather boring, as any precisely periodic sound tends to be, so Strong had the idea of filtering the wavetable by a two-point average, $ H(z)=0.5+0.5\,z^{-1}$, on each reading pass (which requires no multiplies), in order to change the timbre dynamically. The result sounded curiously like a string, almost no matter what initial table was used. After some experimentation, they settled on random numbers as the standard excitation. Strong played in a string quartet with David Jaffe, a CCRMA Composition PhD student, and Jaffe decided to use the algorithm in his piece ``Silicon Valley Breakdown'' (1982) which became a tour de force demonstration of the EKS algorithm. In preparation for the piece, Jaffe approached the author regarding how to obtain better musical control over tuning, brightness, dynamic level, and the like, and the resulting extensions were described in [207,428]. Having studied the McIntyre-Woodhouse papers, the author recognized the filtered-delay-loop in the Karplus-Strong algorithm as having the same transfer function as that of an idealized vibrating string [428]. This equivalence later led to deriving the Karplus-Strong algorithm as a special case of a digital waveguide string model (see Chapter 6 up to §9.1.4 for the derivation). When viewed as a digital waveguide model, the musical quality and expressivity of the EKS algorithm suggested (ca. 1985) that extremely simplified traveling-wave models with sparsely lumped losses and dispersion could capture the necessary dimensions of musical expressivity at very low computational cost.

Digital Waveguide Models

The digital waveguide modeling paradigm was developed in 1985-86 as a guaranteed-stable ``construction kit'' for lossless reverberator prototypes supporting general feedback topologies [430]. The following year, digital waveguide building blocks were extended to vibrating strings and single-cylinder acoustic tubes (principally for the clarinet family) [431]. A modular synthesis architecture was developed in which various ``nonlinear junctions'' could be used to excite digital waveguide networks of general design. The iterative Friedlander-Keller solver of [308] was replaced by a look-up table plus a couple of additions and a multiply, facilitating sound synthesis in real time. This computational model was presented, with sound examples, at the 1986 ICMC, and it so happened that Yamaha's chief engineer was in the audience. Perhaps significantly, the FM synthesis patent was nearing the end of its life. Yamaha soon hired some consultants to evaluate waveguide synthesis, and in 1989 they began a strenuous development effort culminating in the VL1 (``Virtual Lead'') synthesizer family, introduced in the U.S. at the 1994 NAMM show in LA. The jazz trio demonstrating the VL1 at NAMM created quite a ``buzz'', and the cover of Keyboard Magazine soon proclaimed it as ``The Next Big Thing'' in synthesis technology [375]. As it happened, physical modeling synthesis did not immediately meet with large commercial success, perhaps because wavetable synthesis required less computation and was so much easier to ``voice'', and because memory was fairly inexpensive. The market at large simply has not yet demanded more expressive sound synthesis algorithms than what wavetable synthesis can provide. However, the quality of the VL1 in the hands of skilled players constituted proof-of-concept for many of us, and helped stimulate further academic interest in the approach. Trained performing musicians generally prefer the expressive richness of a computational physical model, and find wavetable synthesis to be comparatively limiting, as typically implemented.


This appendix has attempted to trace some of the early technical developments which enabled and shaped virtual acoustic musical instrument technology as addressed in this book, particularly for the case of virtual stringed instruments. For the early period covered (roughly up to the mid 1980s), many lines of developments specifically in the fields of acoustics and musical acoustics per se have been omitted. Some recent review articles include [445,515]. For sound examples and few other recent developments, see the companion overheads for this appendix [448].
Next Section:
Elementary Physics, Mechanics, and Acoustics
Previous Section: