Free Books

Digital Waveguide Models

In this chapter, we summarize the basic principles of digital waveguide models. Such models are used for efficient synthesis of string and wind musical instruments (and tonal percussion, etc.), as well as for artificial reverberation. They can be further used in modal synthesis by efficiently implementing a quasi harmonic series of modes in a single ``filtered delay loop''.

We begin with the simplest case of the infinitely long ideal vibrating string, and the model is unified with that of acoustic tubes. The resulting computational model turns out to be a simple bidirectional delay line. Next we consider what happens when a finite length of ideal string (or acoustic tube) is rigidly terminated on both ends, obtaining a delay-line loop. The delay-line loop provides a basic digital-waveguide synthesis model for (highly idealized) stringed and wind musical instruments. Next we study the simplest possible excitation for a digital waveguide string model, which is to move one of its (otherwise rigid) terminations. Excitation from ``initial conditions'' is then discussed, including the ideal plucked and struck string. Next we introduce damping into the digital waveguide, which is necessary to model realistic losses during vibration. This much modeling yields musically useful results. Another linear phenomenon we need to model, especially for piano strings, is dispersion, so that is taken up next. Following that, we consider general excitation of a string or tube model at any point along its length. Methods for calibrating models from recorded data are outlined, followed by modeling of coupled waveguides, and simple memoryless nonlinearities are introduced and analyzed.

Ideal Vibrating String

Figure 6.1: The ideal vibrating string.

The ideal vibrating string is depicted in Fig.6.1. It is assumed to be perfectly flexible and elastic. Once ``plucked,'' it will vibrate forever in one plane as an energy conserving system. The mathematical theory of string vibration is considered in §B.6 and Appendix C. For present purposes, we only need some basic definitions and results.

Wave Equation

The wave equation for the ideal vibrating string may be written as

$\displaystyle Ky''= \epsilon {\ddot y}$ (7.1)

where we define the following notation:

\begin{displaymath}\begin{array}{rclrcl} K& \isdef & \mbox{string tension} & \qq...
...isdef & \frac{\partial}{\partial x}y(t,x) \nonumber \end{array}\end{displaymath}    

As discussed in Chapter 1, the wave equation in this form can be interpreted as a statement of Newton's second law,

$\displaystyle \textit{force} = \textit{mass} \times \textit{acceleration},

on a microscopic scale. Since we are concerned with transverse vibrations on the string, the relevant restoring force (per unit length) is given by the string tension (force along the string axis) times the curvature of the string, or $ Ky''(t,x)$; the restoring force is balanced at all times by the inertial force per unit length of the string which is equal to mass density (mass per unit length) times transverse acceleration, i.e., $ \epsilon {\ddot y}(t,x)$. See Appendix B for a review of basic physical concepts. The wave equation is derived in some detail in §B.6.

Wave Equation Applications

The ideal-string wave equation applies to any perfectly elastic medium which is displaced along one dimension. For example, the air column of a clarinet or organ pipe can be modeled using the one-dimensional wave equation by substituting air-pressure deviation for string displacement, and longitudinal volume velocity for transverse string velocity. We refer to the general class of such media as one-dimensional waveguides. Extensions to two and three dimensions (and more, for the mathematically curious), are also possible (see §C.14) [518,520,55].

For a physical string model, at least three coupled waveguide models should be considered. Two correspond to transverse-wave vibrations in the horizontal and vertical planes (two polarizations of planar vibration); the third corresponds to longitudinal waves. For bowed strings, torsional waves should also be considered, since they affect bow-string dynamics [308,421]. In the piano, for key ranges in which the hammer strikes three strings simultaneously, nine coupled waveguides are required per key for a complete simulation (not including torsional waves); however, in a practical, high-quality, virtual piano, one waveguide per coupled string (modeling only the vertical, transverse plane) suffices quite well [42,43]. It is difficult to get by with fewer than the correct number of strings, however, because their detuning determines the entire amplitude envelope as well as beating and aftersound effects [543].

Traveling-Wave Solution

It can be readily checked (see §C.3 for details) that the lossless 1D wave equation

$\displaystyle Ky''= \epsilon {\ddot y}$

(where all terms are defined in Eq.$ \,$(6.1)) is solved by any string shape which travels to the left or right with speed

$\displaystyle c \isdeftext \sqrt{\frac{K}{\epsilon }}.

If we denote right-going traveling waves in general by $ y_r(t-x/c)$ and left-going traveling waves by $ y_l(t+x/c)$, where $ y_r$ and $ y_l$ are arbitrary twice-differentiable functions, then the general class of solutions to the lossless, one-dimensional, second-order wave equation can be expressed as

$\displaystyle y(t,x) = y_r(t-x/c) + y_l(t+x/c). \protect$ (7.2)

Note that we have $ {\ddot y}_r= c^2y''_r$ and $ {\ddot y}_l= c^2y''_l$ (derived in §C.3.1) showing that the wave equation is satisfied for all traveling wave shapes $ y_r$ and $ y_l$. However, the derivation of the wave equation itself assumes the string slope $ \vert y^\prime\vert$ is much less than $ 1$ at all times and positions (see §B.6). An important point to note is that a function of two variables $ y(t,x)$ is replaced by two functions of a single (time) variable. This leads to great reductions in computational complexity, as we will see. The traveling-wave solution of the wave equation was first published by d'Alembert in 1747 [100]7.1

Sampled Traveling-Wave Solution

As discussed in more detail in Appendix C, the continuous traveling-wave solution to the wave equation given in Eq.$ \,$(6.2) can be sampled to yield

$\displaystyle y(nT,mX)$ $\displaystyle =$ $\displaystyle y_r(nT-mX/c) + y_l(nT+mX/c)$   $\displaystyle \mbox{(set $X=cT$)}$  
  $\displaystyle =$ $\displaystyle y_r(nT-mT) + y_l(nT+mT)$  
  $\displaystyle \isdef$ $\displaystyle y^{+}(n-m) + y^{-}(n+m)
\protect,$ (7.3)

where $ T$ denotes the time sampling interval in seconds, $ X=cT$ denotes the spatial sampling interval in meters, and $ y^{+}$ and $ y^{-}$ are defined for notational convenience.

Wave Impedance

A concept of high practical utility is that of wave impedance, defined for vibrating strings as force divided by velocity. As derived in §C.7.2, the relevant force quantity in this case is minus the string tension times the string slope:

$\displaystyle f(t,x) \isdef -Ky'(t,x)$ (7.4)

Physically, this can be regarded as the transverse force acting to the right on the string in the vertical direction. (Only transverse vibration is being considered.) In other words, the vertical component of a negative string slope pulls ``up'' on the segment of string to the right, and ``up'' is the positive direction for displacement, velocity, and now force. The traveling-wave decomposition of the force into force waves is thus given by (see §C.7.2 for a more detailed derivation)7.2

f(t,x) &=& f_r(t-x/c) + f_l(t+x/c)\\
&=& -Ky'_r(t-x/c) - Ky'...
...}{c}{\dot y}_l(t+x/c)\\
&\isdef & R{v_r}(t-x/c) - Rv_l(t+x/c),

where we have defined the new notation $ v={\dot y}$ for transverse velocity, and

$\displaystyle R\isdefs \frac{K}{c} \isdefs \frac{K}{\sqrt{K/\epsilon }} \eqsp \sqrt{K\epsilon },

where $ K$ is the string tension and $ \epsilon $ is mass density. The newly defined positive constant $ R=\sqrt{K\epsilon }$ is called the wave impedance of the string for transverse waves. It is always real and positive for the ideal string. Three expressions for the wave impedance are

$\displaystyle \zbox {R\isdefs \sqrt{K\epsilon } \eqsp \frac{K}{c} \eqsp \epsilon c.} \protect$ (7.5)

The wave impedance simply relates force and velocity traveling waves:

\begin{displaymath}\begin{array}{rcr@{\,}l} f^{{+}}(n)&=&&\!R\,v^{+}(n) \\ f^{{-...
...\end{array} \protect% FIXHTML: Causes a strange button in HTML
\end{displaymath} (7.6)

These relations may be called Ohm's law for traveling waves. Thus, in a traveling wave, force is always in phase with velocity (considering the minus sign in the left-going case to be associated with the direction of travel rather than a $ 180$ degrees phase shift between force and velocity).

The results of this section are derived in more detail in Appendix C. However, all we need in practice for now are the important Ohm's law relations for traveling-wave components given in Eq.$ \,$(6.6).

Ideal Acoustic Tube

As discussed in §C.7.3, the most commonly used digital waveguide variables (``wave variables'') for acoustic tube simulation are traveling pressure and volume-velocity samples. These variables are exactly analogous to the traveling force and transverse-velocity waves used for vibrating string models.

The Ohm's law relations for acoustic-tube wave variables may be written as follows (cf. Eq.$ \,$(6.6)):

\begin{displaymath}\begin{array}{rcrl} p^+(n) &=& &R_{\hbox{\tiny T}}\, u^{+}(n)...
...p^-(n) &=& -&R_{\hbox{\tiny T}}\, u^{-}(n) \end{array} \protect\end{displaymath} (7.7)

Here $ p^+(n)$ is the right-going traveling longitudinal pressure wave component, $ p^-(n)$ is the left-going pressure wave, and $ u^\pm (n)$ are the left- and right-going volume velocity waves. For acoustic tubes, the wave impedance $ R_{\hbox{\tiny T}}$ is given by

$\displaystyle R_{\hbox{\tiny T}}= \frac{\rho c}{A}$   (Acoustic-Tube Wave Impedance) (7.8)

where $ \rho$ denotes the density (mass per unit volume) of air, $ c$ is sound speed in air, and $ A$ is the cross-sectional area of the tube.

In this formulation, the acoustic tube is assumed to contain only traveling plane waves to the left and right. This is a reasonable assumption for wavelengths $ \lambda$ much larger than the tube diameter ( $ \approx \sqrt{A}$). In this case, a change in the tube cross-sectional area $ A$ along the tube axis will cause lossless scattering of incident plane waves. That is, the plane wave splits into a transmitted and reflected component such that wave energy is conserved (see Appendix C for a detailed derivation).

Figure 6.2: Kelly-Lochbaum piecewise cylindrical tube model of the vocal tract (top) and sampled traveling-wave formulation (bottom).

Figure 6.2 shows a piecewise cylindrical tube model of the vocal tract and a corresponding digital simulation [245,297]. In the figure, $ k_1$ denotes the reflection coefficient associated with the first tube junction (where the cross-sectional area changes), and $ 1+k_1$ is the corresponding transmission coefficient for traveling pressure plane waves. The corresponding reflection and transmission coefficients for volume velocity are $ -k_1$ and $ 1-k_1$, respectively. Again, see Appendix C for a complete derivation.

At higher frequencies, those for which $ \lambda\ll\sqrt{A}$, changes in the tube cross-sectional area $ A$ give rise to mode conversion (which we will neglect in this chapter). Mode conversion means that an incident plane wave (the simplest mode of propagation in the tube) generally scatters into waves traveling in many directions, not just the two directions along the tube axis. Furthermore, even along the tube axis, there are higher orders of mode propagation associated with ``node lines'' in the transverse plane (such as Bessel functions of integer order [541]). When mode conversion occurs, it is necessary to keep track of many components in a more general modal expansion of the acoustic field [336,13,50]. We may say that when a plane wave encounters a change in the cross-sectional tube area, it is ``converted'' into a sum of propagation modes. The coefficients (amplitude and phase) of the new modes are typically found by matching boundary conditions. (Pressure and volume-velocity must be continuous throughout the tube.)

As mentioned above, in acoustic tubes we work with volume velocity, because it is volume velocity that is conserved when a wave propagates from one tube section to another. For plane waves in open air, on the other hand, we use particle velocity, and in this case, the wave impedance of open air is $ R_0=\rho c$ instead. That is, the appropriate wave impedance in open air (not inside an acoustic tube) is pressure divided by particle-velocity for any traveling plane wave. If $ u^{+}(n)$ denotes a sample of the volume-velocity plane-wave traveling to the right in an acoustic tube of cross-sectional area $ A$, and if $ v^{+}(n)$ denotes the corresponding particle velocity, then we have

$\displaystyle u^{+}(n) \eqsp v^{+}(n)\, A.

Note that particle velocity is in units of meters per second, while volume velocity is in units of meters-cubed per second (literally a volume of flow per unit time--hence the name). In summary, particle velocity is the appropriate velocity for simulations of waves in open air, while volume velocity is the right choice for acoustic tubes, or ``ducts,'' of varying cross-sectional area. See §B.7.2 and Appendix C for further discussion.

Rigid Terminations

A rigid termination is the simplest case of a string (or tube) termination. It imposes the constraint that the string (or air) cannot move at the termination. (We'll look at the more practical case of a yielding termination in §9.2.1.) If we terminate a length $ L$ ideal string at $ x=0$ and $ x=L$, we then have the ``boundary conditions''

$\displaystyle y(t,0) \equiv 0 \qquad y(t,L) \equiv 0 \protect$ (7.9)

where ``$ \equiv$'' means ``identically equal to,'' i.e., equal for all $ t$. Let $ N\isdef 2L/X$ denote the time in samples to propagate from one end of the string to the other and back, or the total ``string loop'' delay. The loop delay $ N$ is also equal to twice the number of spatial samples along the string.

Applying the traveling-wave decomposition from Eq.$ \,$(6.2), we have

y(nT,0) &=& y^{+}(n) + y^{-}(n) \;\equiv\; 0\\
y(nT,NX/2) &=& y^{+}(n-N/2) + y^{-}(n+N/2) \;\equiv\; 0.

Therefore, solving for the reflected waves gives

$\displaystyle y^{+}(n)$ $\displaystyle =$ $\displaystyle -y^{-}(n)$ (7.10)
$\displaystyle y^{-}(n+N/2)$ $\displaystyle =$ $\displaystyle -y^{+}(n-N/2).$ (7.11)

A digital simulation diagram for the rigidly terminated ideal string is shown in Fig.6.3. A ``virtual pickup'' is shown at the arbitrary location $ x=\xi $.

Figure 6.3: The rigidly terminated ideal string, with a displacement output indicated at position $ x=\xi $. Rigid terminations reflect traveling displacement, velocity, or acceleration waves with a sign inversion. Slope or force waves reflect with no sign inversion.

Velocity Waves at a Rigid Termination

Since the displacement is always zero at a rigid termination, the velocity is also zero there:

$\displaystyle v(t,0) \equiv 0 \qquad v(t,L) \equiv 0

Therefore, velocity waves reflect from a rigid termination with a sign flip, just like displacement waves:
$\displaystyle v^{+}(n)$ $\displaystyle =$ $\displaystyle -v^{-}(n)$  
$\displaystyle v^{-}(n+N/2)$ $\displaystyle =$ $\displaystyle -v^{+}(n-N/2)
\protect$ (7.12)

Such inverting reflections for velocity waves at a rigid termination are identical for models of vibrating strings and acoustic tubes.

Force or Pressure Waves at a Rigid Termination

To find out how force or pressure waves recoil from a rigid termination, we may convert velocity waves to force or velocity waves by means of the Ohm's law relations of Eq.$ \,$(6.6) for strings (or Eq.$ \,$(6.7) for acoustic tubes), and then use Eq.$ \,$(6.12), and then Eq.$ \,$(6.6) again:

f^{{+}}(n) &=&Rv^{+}(n) \eqsp -Rv^{-}(n) \eqsp f^{{-}}(n) \\
...+N/2) &=&-Rv^{-}(n+N/2) \eqsp Rv^{+}(n-N/2) \eqsp f^{{+}}(n-N/2)

Thus, force (and pressure) waves reflect from a rigid termination with no sign inversion:7.3

f^{{+}}(n) &=& f^{{-}}(n) \\
f^{{-}}(n+N/2) &=& f^{{+}}(n-N/2)

The reflections from a rigid termination in a digital-waveguide acoustic-tube simulation are exactly analogous:

p^+(n) &=& p^-(n) \\
p^-(n+N/2) &=& p^+(n-N/2)

Waveguide terminations in acoustic stringed and wind instruments are never perfectly rigid. However, they are typically passive, which means that waves at each frequency see a reflection coefficient not exceeding 1 in magnitude. Aspects of passive ``yielding'' terminations are discussed in §C.11.

Moving Rigid Termination

It is instructive to study the ``waveguide equivalent circuit'' of the simple case of a rigidly terminated ideal string with its left endpoint being moved by an external force, as shown in Fig.6.4. This case is relevant to bowed strings9.6) since, during time intervals in which the bow and string are stuck together, the bow provides a termination that divides the string into two largely isolated segments. The bow can therefore be regarded as a moving termination during ``sticking''.

Figure 6.4: Moving rigid termination for an ideal string at time $ t_0$.

Referring to Fig.6.4, the left termination of the rigidly terminated ideal string is set in motion at time $ t=0$ with a constant velocity $ v_0$. From Eq.$ \,$(6.5), the wave impedance of the ideal string is $ R=\sqrt{K\epsilon }$, where $ K$ is tension and $ \epsilon $ is mass density. Therefore, the upward force applied by the moving termination is initially $ f_0=Rv_0$. At time $ t_0<L/c$, the traveling disturbance reaches a distance $ c t_0$ from $ x=0$ along the string. Note that the string slope at the moving termination is given by $ -v_0 t_0/(c t_0) = -v_0/c = -(f_0/R)/c = -f_0/K$, which derives the fact that force waves are minus tension times slope waves. (See §C.7.2 for a fuller discussion.)

Digital Waveguide Equivalent Circuits

Figure 6.5: Digital waveguide ``equivalent circuits'' for the rigidly terminated ideal string with a moving termination. a) Velocity-wave simulation. b) Force-wave simulation.

Two digital waveguide ``equivalent circuits'' are shown in Fig.6.5. In the velocity-wave case of Fig.6.5a, the termination motion appears as an additive injection of a constant velocity $ v_0$ at the far left of the digital waveguide. At time 0, this initiates a velocity step from 0 to $ v_0$ traveling to the right. When the traveling step-wave reaches the right termination, it reflects with a sign inversion, thus sending back a ``canceling wave'' to the left. Behind the canceling wave, the velocity is zero, and the string is not moving. When the canceling step-wave reaches the left termination, it is inverted again and added to the externally injected dc signal, thereby sending an amplitude $ 2v_0$ positive step-wave to the right, overwriting the amplitude $ v_0$ signal in the upper rail. This can be added to the amplitude $ -v_0$ signal in the lower rail to produce a net traveling velocity step of amplitude $ v_0$ traveling to the right. This process repeats forever, resulting in traveling wave components which grow without bound, but whose sum is always either 0 or $ v_0$. Thus, at all times the string can be divided into two segments, where the segment to the left is moving upward with speed $ v_0$, and the segment to the right is motionless.

At this point, it is a good exercise to try to mentally picture the string shape during this process: Initially, since both the left end support and the right-going velocity step are moving with constant velocity $ c$, it is clear that the string shape is piece-wise linear, with a negative-slope segment on the left adjoined to a zero-slope segment on the right. When the velocity step reaches the right termination and reflects to produce a canceling wave, everything to the left of this wave remains a straight line which continues to move upward at speed $ v_0$, while all points to the right of the canceling wave's leading edge are not moving. What is the shape of this part of the string? (The answer is given in the next paragraph, but try to ``see'' it first.)

Animation of Moving String Termination and Digital Waveguide Models

In the force wave simulation of Fig.6.5b,7.4 the termination motion appears as an additive injection of a constant force $ f_0=Rv_0$ at the far left. At time 0, this initiates a force step from 0 to $ f_0$ traveling to the right. Since force waves are negated slope waves multiplied by tension, i.e., $ f^{{+}}=-Ky'^{+}$, the slope of the string behind the traveling force step is $ y'=-f_0/K$. When the traveling step-wave reaches the right termination, it reflects with no sign inversion, thus sending back a doubling-wave to the left which elevates the string force from $ f_0$ to $ 2f_0$. Behind this wave, the slope is then $ y'=-2f_0/K$. This answers the question of the previous paragraph: The string is in fact piecewise linear during the first return reflection, consisting of two line segments with slope $ -f_0/K$ on the left, and twice that on the right. When the return step-wave reaches the left termination, it is reflected again and added to the externally injected dc force signal, sending an amplitude $ 2f_0$ positive step-wave to the right (overwriting the amplitude $ f_0$ signal in the upper rail). This can be added to the amplitude $ f_0$ samples in the lower rail to produce a net traveling force step in the string of amplitude $ 3f_0$ traveling to the right. The slope of the string behind this wave is $ y'= -3f_0/K$, and the slope in front of this wave is still $ -2f_0/K$. The force applied to the string by the termination has risen to $ 3f_0$ in order to keep the velocity steady at $ v_0$. (We may interpret the $ f_0$ input as the additional force needed each period to keep the termination moving at speed $ v_0$--see the next paragraph below.) This process repeats forever, resulting in traveling wave components which grow without bound, and whose sum (which is proportional to minus the physical string slope) also grows without bound.7.5The string is always piecewise linear, consisting of at most two linear segments having negative slopes which differ by $ -f_0/K$. A sequence of string displacement snapshots is shown in Fig.6.6.

Figure 6.6: Successive snapshots of the rigidly terminated ideal string with a moving termination. For clarity, the string is plotted higher on each successive snapshot. (One can consider both endpoints to be moving at the same speed up to time 0, after which the left termination begins moving faster by a constant velocity offset.)

Terminated String Impedance

Note that the impedance of the terminated string, seen from one of its endpoints, is not the same thing as the wave impedance $ R=\sqrt{K\epsilon }$ of the string itself. If the string is infinitely long, they are the same. However, when there are reflections, they must be included in the impedance calculation, giving it an imaginary part. We may say that the impedance has a ``reactive'' component. The driving-point impedance of a rigidly terminated string is ``purely reactive,'' and may be called a reactance7.1). If $ f(t)$ denotes the force at the driving-point of the string and $ v(t)$ denotes its velocity, then the driving-point impedance is given by (§7.1)

$\displaystyle R(j\omega) \isdefs \left.\frac{F(s)}{V(s)}\right\vert _{s=j\omega},

where $ F(s)$ and $ V(s)$ denote the Laplace transforms of $ f(t)$ and $ v(t)$. In the case of a rigidly terminated string above, as well as in any system in which all energy delivered to the system is ultimately reflected back to the input, the impedance is purely imaginary at every frequency (a ``pure reactance''), as is easy to show:

$\displaystyle R(s) \isdefs \frac{F(s)}{V(s)}
\eqsp \frac{F^{+}+F^{-}}{V^{+}+V^...
\eqsp R\frac{1+e^{-s2L/c}}{1-e^{-s2L/c}}

where $ L$ denotes the string length. Let $ P=2L/c$ denote the period of string vibration. Then on the frequency axis $ s=j\omega$ we have

$\displaystyle R(j\omega)
\eqsp R\frac{1+e^{-j\omega P}}{1-e^{-j\omega P}}
\eqsp R\frac{2\cos(\omega P/2)}{2j\sin(\omega P/2)}
\eqsp -jR\,\cot(\omega P/2).

Thus, the driving-point impedance of a rigidly terminated string is purely reactive (imaginary), with alternating poles and zeros along the $ j\omega $ axis. Impedance will be discussed further in §7.1 below.

The Ideal Plucked String

The ideal plucked string is defined as an initial string displacement and a zero initial velocity distribution [317]. More generally, the initial displacement along the string $ y(0,x)$ and the initial velocity distribution $ {\dot y}(0,x)$, for all $ x$, fully determine the resulting motion in the absence of further excitation.

An example of the appearance of the traveling-wave components and the resulting string shape shortly after plucking a doubly terminated string at a point one fourth along its length is shown in Fig.6.7. The negative traveling-wave portions can be thought of as inverted reflections of the incident waves, or as doubly flipped ``images'' which are coming from the other side of the terminations.

Figure 6.7: A doubly terminated string, ``plucked'' at 1/4 its length.

An example of an initial ``pluck'' excitation in a digital waveguide string model is shown in Fig.6.8. The circulating triangular components in Fig.6.8 are equivalent to the infinite train of initial images coming in from the left and right in Fig.6.7.

There is one fine-point to note for the discrete-time case: We cannot admit a sharp corner in the string since that would have infinite bandwidth which would alias when sampled. Therefore, for the discrete-time case, we define the ideal pluck to consist of an arbitrary shape as in Fig.6.8 lowpass filtered to less than half the sampling rate. Alternatively, we can simply require the initial displacement shape to be bandlimited to spatial frequencies less than $ f_s/2c$. Since all real strings have some degree of stiffness which prevents the formation of perfectly sharp corners, and since real plectra are never in contact with the string at only one point, and since the frequencies we do allow span the full range of human hearing, the bandlimited restriction is not limiting in any practical sense.

Figure 6.8: Initial conditions for the ideal plucked string. The initial contents of the sampled, traveling-wave delay lines are in effect plotted inside the delay-line boxes. The amplitude of each traveling-wave delay line is half the amplitude of the initial string displacement. The sum of the upper and lower delay lines gives the physical initial string displacement.

Note that acceleration (or curvature) waves are a simple choice for plucked string simulation, since the ideal pluck corresponds to an initial impulse in the delay lines at the pluck point. Of course, since we require a bandlimited excitation, the initial acceleration distribution will be replaced by the impulse response of the anti-aliasing filter chosen. If the anti-aliasing filter chosen is the ideal lowpass filter cutting off at $ f_s/2$, the initial acceleration $ a(0,x)\isdeftext {\ddot y}(0,x)$ for the ideal pluck becomes

$\displaystyle a(0,x) = \frac{A}{X}$sinc$\displaystyle \left(\frac{x-x_p}{X}\right)$ (7.13)

where $ A$ is amplitude, $ x_p$ is the pick position, and, as we know from §4.4.1, sinc$ [(x-x_p)/X]$ is the ideal, bandlimited impulse, centered at $ x_p$ and having a rectangular spatial frequency response extending from $ -\pi/X$ to $ \pi/X$. (Recall that sinc$ (\xi)\isdeftext
\sin(\pi\xi)/(\pi\xi)$). Division by $ X$ normalizes the area under the initial shape curve. If $ x_p$ is chosen to lie exactly on a spatial sample $ x_m = mX$, the initial conditions for the ideal plucked string are as shown in Fig.6.9 for the case of acceleration or curvature waves. All initial samples are zero except one in each delay line.

Aside from its obvious simplicity, there are two important benefits of obtaining an impulse-excited model: (1) an extremely efficient ``commuted synthesis'' algorithm can be readily defined (§8.7), and (2) linear prediction (and its relatives) can be readily used to calibrate the model to recordings of normally played tones on the modeled instrument. Linear Predictive Coding (LPC) has been used extensively in speech modeling [296,297,20]. LPC estimates the model filter coefficients under the assumption that the driving signal is spectrally flat. This assumption is valid when the input signal is (1) an impulse, or (2) white noise. In the basic LPC model for voiced speech, a periodic impulse train excites the model filter (which functions as the vocal tract), and for unvoiced speech, white noise is used as input.

In addition to plucked and struck strings, simplified bowed strings can be calibrated to recorded data as well using LPC [428,439]. In this simplified model, the bowed string is approximated as a periodically plucked string.

Figure 6.9: Initial conditions for the ideal plucked string when the wave variables are chosen to be proportional to acceleration or curvature. If the bandlimited ideal pluck position is centered on a spatial sample, there is only a single nonzero sample in each of the initial delay lines.

The Ideal Struck String

The ideal struck string [317] can be modeled by a zero initial string displacement and a nonzero initial velocity distribution. In concept, an inelastic ``hammer strike'' transfers an ``impulse'' of momentum to the string at time 0 along the striking face of the hammer. (A more realistic model of a struck string will be discussed in §9.3.1.) An example of ``struck'' initial conditions is shown in Fig.6.10 for a striking hammer having a rectangular shape. Since $ v^\pm = \pm f^\pm /R= \mp c y'^\pm $,7.6the initial velocity distribution can be integrated with respect to $ x$ from $ x=0$, divided by $ c$, and negated in the upper rail to obtain equivalent initial displacement waves [317]. Interestingly, the initial displacement waves are not local (see also Appendix E).

Figure 6.10: Initial conditions for the ideal struck string in a velocity wave simulation.

The hammer strike itself may be considered to take zero time in the ideal case. A finite spatial width must be admitted for the hammer, however, even in the ideal case, because a zero width and a nonzero momentum transfer sends one (massless) point of the string immediately to infinity under infinite acceleration. In a discrete-time simulation, one sample represents an entire sampling interval, so a one-sample hammer width is well defined.

If the hammer velocity is $ v_h$, then the force against the hammer due to pushing against the string wave impedance is $ -2Rv_h$. The factor of $ 2$ arises because driving a point in the string's interior is equivalent to driving two string endpoints in ``series,'' i.e., the reaction forces sum. If the hammer is itself a dynamic system which has been ``thrown'' into the string, as discussed in §9.3.1 below, the reaction force slows the hammer over time, and the interaction is not impulsive, but rather the momentum transfer takes place over a period of time.

The hammer-string collision is ideally inelastic since the string provides a reaction force that is equivalent to that of a dashpot. In the case of a pure mass striking a single point on the ideal string, the mass velocity decays exponentially, and an exponential wavefront emanates in both directions. In the musical acoustics literature for the piano, the hammer is often taken to be a nonlinear spring in series with a mass, as discuss further in §9.3.2. A commuted waveguide piano model including a linearized piano hammer is described in § ``Wave digital hammer'' models, which employ a traveling-wave formulation of a lumped model and therefore analogous to a wave digital filter [136], are described in [523,56,42]. The ``wave digital'' modeling approach is introduced in §F.1.

The Damped Plucked String

Without damping, the ideal plucked string sounds more like a cheap electronic organ than a string because the sound is perfectly periodic and never decays. Static spectra are very boring to the ear. The discrete Fourier transform (DFT) of the initial ``string loop'' contents gives the Fourier series coefficients for the periodic tone produced.

The simplest change to the ideal wave equation of Eq.$ \,$(6.1) that provides damping is to add a term proportional to velocity:

$\displaystyle Ky''= \epsilon {\ddot y}+ \mu{\dot y} \protect$ (7.14)

Here, $ \mu>0$ can be thought of as a very simple friction coefficient, or resistance. As derived in §C.5, solutions to this wave equation can be expressed as sums of left- and right-going exponentially decaying traveling waves. When $ \mu=0$, we get non-decaying traveling waves as before. As discussed in §2.2.2, propagation losses may be introduced by the substitution

$\displaystyle z^{-1}\leftarrow gz^{-1}, \quad \left\vert g\right\vert\leq 1

in each delay element (or wherever one sample of delay models one spatial sample of wave propagation). By commutativity of LTI systems, making the above substitution in a delay line of length $ N$ is equivalent to simply scaling the output of the delay line by $ g^N$. This lumping of propagation loss at one point along the waveguide serves to minimize both computational cost and round-off error. In general finite difference schemes, such a simplification is usually either not possible or nonobvious.

Computational Savings

To illustrate how significant the computational savings can be, consider the simulation of a ``damped guitar string'' model in Fig.6.11. For simplicity, the length $ L$ string is rigidly terminated on both ends. Let the string be ``plucked'' by initial conditions so that we need not couple an input mechanism to the string. Also, let the output be simply the signal passing through a particular delay element rather than the more realistic summation of opposite elements in the bidirectional delay line. (A comb filter corresponding to pluck position can be added in series later.)

Figure 6.11: Discrete simulation of the rigidly terminated string with distributed resistive losses. The $ N$ loss factors $ g$ are embedded between the delay-line elements.

In this string simulator, there is a loop of delay containing $ N = 2L/X=
f_s/f_1$ samples where $ f_1$ is the desired pitch of the string. Because there is no input/output coupling, we may lump all of the losses at a single point in the delay loop. Furthermore, the two reflecting terminations (gain factors of $ -1$) may be commuted so as to cancel them. Finally, the right-going delay may be combined with the left-going delay to give a single, length $ N$, delay line. The result of these inaudible simplifications is shown in Fig. 6.12.

Figure 6.12: Discrete simulation of the rigidly terminated string with consolidated losses (frequency-independent). All $ N$ loss factors $ g$ have been ``pushed'' through delay elements and combined at a single point.

If the sampling rate is $ f_s=50$ kHz and the desired pitch is $ f_1=100$ Hz, the loop delay equals $ N=500$ samples. Since delay lines are efficiently implemented as circular buffers, the cost of implementation is normally dominated by the loss factors, each one requiring a multiply every sample, in general. (Losses of the form $ 1-2^{-k}$, $ 1-2^{-k}-2^{-l}$, etc., can be efficiently implemented using shifts and adds.) Thus, the consolidation of loss factors has reduced computational complexity by three orders of magnitude, i.e., by a factor of $ 500$ in this case. However, the physical accuracy of the simulation has not been compromised. In fact, the accuracy is improved because the $ N$ round-off errors per period arising from repeated multiplication by $ g$ have been replaced by a single round-off error per period in the multiplication by $ g^N$.

Frequency-Dependent Damping

In real vibrating strings, damping typically increases with frequency for a variety of physical reasons [73,77]. A simple modification [392] to Eq.$ \,$(6.14) yielding frequency-dependent damping is

$\displaystyle Ky''= \epsilon {\ddot y}+ \mu{\dot y}+ \mu_2{\dot y''}.

See §C.5.2 for some analysis of damped PDEs of this nature. A book-chapter by Vallette containing a thorough analysis of damped vibrating strings is given in [517].

The result of adding such damping terms to the wave equation is that traveling waves on the string decay at frequency-dependent rates. This means the loss factors $ g$ of the previous section should really be digital filters having gains which decrease with frequency (and never exceed $ 1$ for stability of the loop). These filters commute with delay elements because they are linear and time invariant (LTI) [449]. Thus, following the reasoning of the previous section, they can be lumped at a single point in the digital waveguide. Let $ {\hat G}(z)$ denote the resulting string loop filter (replacing $ g^N$ in Fig.6.127.7). We have the stability (passivity) constraint $ \vert{\hat G}(e^{j\omega T})\vert\leq1$, and making the filter linear phase (constant delay at all frequencies) will restrict consideration to symmetric FIR filters only.

Restriction to FIR filters yields the important advantage of keeping the approximation problem convex in the weighted least-squares norm. Convexity of a norm means that gradient-based search techniques can be used to find a global miminizer of the error norm without exhaustive search [64],[428, Appendix A].

The linear-phase requirement halves the degrees of freedom in the filter coefficients. That is, given $ {\hat g}(n)$ for $ n\geq0$, the coefficients $ {\hat g}(-n)={\hat g}(n)$ are also determined. The loss-filter frequency response can be written in terms of its (impulse response) coefficients as

$\displaystyle {\hat G}(e^{j\omega T}) = {\hat g}(0) + 2\sum_{n=1}^{(N_{\hat g}-1)/2} {\hat g}(n) \cos(\omega n T)

where $ \omega T=2\pi f T=2\pi f/f_s$, and the impulse-response length $ N_{\hat g}$ is assumed odd.

A further degree of freedom is eliminated from the loss-filter approximation by assuming all losses are insignificant at 0 Hz so that $ G(0)=1\,\,\Rightarrow\,\,{\hat G}(1)=1$ (taking the approximation error to be zero at $ \omega=0$). This means the coefficients of the FIR filter must sum to $ 1$. If the length of the filter is $ N_{\hat g}$, we have

$\displaystyle 1 = \sum_n {\hat g}(n) = {\hat g}(0) + 2\sum_{n=1}^{(N_{\hat g}-1)/2} {\hat g}(n)

The Stiff String

Stiffness in a vibrating string introduces a restoring force proportional to the bending angle of the string. As discussed further in §C.6, the usual stiffness term added to the wave equation for the ideal string yields

$\displaystyle \epsilon {\ddot y}= Ky''- \kappa y''''.

When this wave equation is solved in terms of traveling wavesC.6), it emerges that high-frequency wave components travel faster than low-frequency components. In other words, wave propagation in stiff strings is dispersive. (See §C.6 for details.)

Stiff-string models are commonly used in piano synthesis. In §9.4, further details of string models used in piano synthesis are described (§9.4.1).

Experiments with modified recordings of acoustic classical guitars indicate that overtone inharmonicity due to string-stiffness is generally not audible in nylon-string guitars, although just-noticeable-differences are possible for the 6th (lowest) string [225]. Such experiments may be carried out by retuning the partial overtones in a recorded sound sample so that they become exact harmonics. Such retuning is straightforward using sinusoidal modeling techniques [359,456].

Stiff String Synthesis Models

Figure 6.13: Commuted simulation of a rigidly terminated, stiff string (no damping).

An ideal stiff-string synthesis model is drawn in Fig. 6.13 [10]. See §C.6 for a detailed derivation. The delay-line length $ N$ is the number of samples in $ K$ periods at frequency $ f_K$, where $ K$ is the number of the highest partial supported (normally the last one before $ f_s/2$). This is the counterpart of Fig. 6.12 which depicted ideal-string damping which was lumped at a single point in the delay-line loop. For the ideal stiff string, however, (no damping), it is dispersion filtering that is lumped at a single point of the loop. Dispersion can be lumped like damping because it, too, is a linear, time-invariant (LTI) filtering of a propagating wave. Because it is LTI, dispersion-filtering commutes with other LTI systems in series, such as delay elements. The allpass filter in Fig.C.9 corresponds to filter $ H_s(z)$ in Fig.9.2 for the Extended Karplus-Strong algorithm. In practice, losses are also included for realistic string behavior (filter $ H_d(z)$ in Fig.9.2).

Allpass filters were introduced in §2.8, and a fairly comprehensive summary is given in Book II of this series [449, Appendix C].7.8The general transfer function for an allpass filter is given (in the real, single-input, single-output case) by

$\displaystyle H_s(z) = z^{-K} \frac{\tilde{A}(z)}{A(z)}

where $ K\geq 0$ is an integer pure-delay in samples (all delay lines are allpass filters),

$\displaystyle A(z) \isdef 1 + a_1 z^{-1}+ a_2 z^{-2} + \cdots + a_Nz^{-N},


$\displaystyle \tilde{A}(z)\isdef z^{-N}\overline{A}(z^{-1}) =
\overline{a_N} +...
...^{-1}+ \overline{a_{N-2}} z^{-2} + \cdots
+ \overline{a_1} z^{-N+1} + z^{-N}.

We may think of $ \tilde{A}(z)$ as the flip of $ A(z)$. For example, if $ A(z)=1+1.4z^{-1}+0.49z^{-2}$, we have $ \tilde{A}(z)=0.49+1.4z^{-1}+z^{-2}$. Thus, $ \tilde{A}(z)$ is obtained from $ A(z)$ by simply reversing the order of the coefficients (and conjugating them if they are complex, but normally they are real in practice). For an allpass filter $ H_s(z)$ simulating stiffness, we would normally have $ K=0$, since the filter is already in series with a delay line.

Section 6.11 below discusses some methods for designing stiffness allpass filters $ H_s(z)$ from measurements of stiff vibrating strings, and §9.4.1 gives further details for the case of piano string modeling.

The Externally Excited String

Sections 6.5 and 6.6 illustrated plucking or striking the string by means of initial conditions: an initial displacement for plucking and an initial velocity distribution for striking. Such a description parallels that found in many textbooks on acoustics, such as [317]. However, if the string is already in motion, as it often is in normal usage, it is more natural to excite the string externally by the equivalent of a ``pick'' or ``hammer'' as is done in the real world instrument.

Figure 6.14 depicts a rigidly terminated string with an external excitation input. The wave variable $ w$ can be set to acceleration, velocity, or displacement, as appropriate. (Choosing force waves would require eliminating the sign inversions at the terminations.) The external input is denoted $ {\Delta w}$ to indicate that it is an additive incremental input, superimposing with the existing string state.

Figure 6.14: Discrete simulation of the rigidly terminated string with an external excitation.

For idealized plucked strings, we may take $ w=a$ (acceleration), and $ {\Delta w}$ can be a single nonzero sample, or impulse, at the plucking instant. As always, bandlimited interpolation can be used to provide a non-integer time or position. In the latter case, there would be two or more summers along both the upper and lower rails, separated by unit delays. More generally, the string may be plucked by a force distribution $ f_p(t_n,x_m)$. The applied force at a point can be translated to the corresponding velocity increment via the wave impedance $ R$:

$\displaystyle \Delta v = \frac{f_p}{2R}$ (7.15)

where $ R=\sqrt{K\epsilon }$ as before. The factor of two comes from the fact that two string endpoints are being driven physically in parallel. (Physically, they are in parallel, but as impedances, they are formally in series. See §7.2 for the theory.)

Note that the force applied by a rigid, stationary pick or hammer varies with the state of a vibrating string. Also, when a pick or hammer makes contact with the string, it partially terminates the string, resulting in reflected waves in each direction. A simple model for the termination would be a mass affixed to the string at the excitation point. (This model is pursued in §9.3.1.) A more general model would be an arbitrary impedance and force source affixed to the string at the excitation point during the excitation event. This is a special case of the ``loaded waveguide junction,'' discussed in §C.12. In the waveguide model for bowed strings9.6), the bow-string interface is modeled as a nonlinear scattering junction.

Equivalent Forms

In a physical piano string, as a specific example, the hammer strikes the string between its two inputs, some distance from the agraffe and far from the bridge. This corresponds to the diagram in Fig.6.15, where the delay lines are again arranged for clarity of physical interpretation. Figure 6.15 is almost identical to Fig.6.14, except that the delay lines now contain samples of traveling force waves, and the bridge is allowed to vibrate, resulting in a filtered reflection at the bridge (see §9.2.1 for a derivation of the bridge filter). The hammer-string interaction force-pulse is summed into both the left- and right-going delay lines, corresponding to sending the same pulse toward both ends of the string from the hammer. Force waves are discussed further in §C.7.2.

Figure 6.15: Model of a piano string struck in its interior by a hammer.

Figure: Diagram equivalent to Fig.6.15, obtained by combining upper- and lower-rail delay lines.

By commutativity of linear, time-invariant elements, Figure 6.15 can be immediately simplified to the form shown in Fig.6.16, in which each delay line corresponds to the travel time in both directions on each string segment. From a structural point of view, we have a conventional filtered delay loop plus a second input which sums into the loop somewhere inside the delay line. The output is shown coming from the middle of the larger delay line, which gives physically correct timing, but in practice, the output can be taken from anywhere in the feedback loop. It is probably preferable in practice to take the output from the loop-delay-line input. That way, other response latencies in the overall system can be compensated.

Figure: Diagram equivalent to Fig.6.15, obtained by replacing the second string input by a separate comb-filter applied to a single input.

An alternate structure equivalent to Fig.6.16 is shown in Fig.6.17, in which the second input injection is factored out into a separate comb-filtering of the input. The comb-filter delay equals the delay between the two inputs in Fig.6.16, and the delay in the feedback loop equals the sum of both delays in Fig.6.16. In this case, the string is modeled using a simple filtered delay loop, and the striking-force signal is separately filtered by a comb filter corresponding to the striking-point along the string.

Algebraic derivation

The above equivalent forms are readily verified by deriving the transfer function from the striking-force input $ f_i(n)$ to the output force signal $ f_o(n)$

Referring to Fig.6.15, denote the input hammer-strike $ z$ transform by $ F_i(z)$ and the output signal $ z$ transform by $ F_o(z)$. Also denote the loop-filter transfer function by $ H_l(z)$. By inspection of the figure, we can write

$\displaystyle F_o(z) = z^{-N} \left\{ F_i(z) + z^{-2M}\left[F_i(z) + z^{-N} H_l(z)F_o(z)\right]\right\}.

Solving for the input-output transfer function yields

H(z) \isdef \frac{F_o(z)}{F_i(z)}
&=& z^{-N} \frac{1+z^{-2M}}...
...& \left(1+z^{-2M}\right)\frac{z^{-N}}{1-H_l(z)\,z^{-(2M+2N)}}\\

The final factored form above corresponds to the final equivalent form shown in Fig.6.17.

Related Forms

We see from the preceding example that a filtered-delay loop (a feedback comb-filter using filtered feedback, with delay-line length $ 2M+2N$ in the above example) simulates a vibrating string in a manner that is independent of where the excitation is applied. To simulate the effect of a particular excitation point, a feedforward comb-filter may be placed in series with the filtered delay loop. Such a ``pluck position'' illusion may be applied to any basic string synthesis algorithm, such as the EKS [428,207].

By an exactly analogous derivation, a single feedforward comb filter can be used to simulate the location of a linearized magnetic pickup [200] on a simulated electric guitar string. An ideal pickup is formally the transpose of an excitation. For a discussion of filter transposition (using Mason's gain theorem [301,302]), see, e.g., [333,449].7.9

The comb filtering can of course also be implemented after the filtered delay loop, again by commutativity. This may be desirable in situations in which comb filtering is one of many options provided for in the ``effects section'' of a synthesizer. Post-processing comb filters are often used in reverberator design and in virtual pickup simulation.

Figure 6.18: Use of a second delay-line tap to implement comb filtering corresponding to hammer-strike echoes returning from the far end of the string.

The comb-filtering can also be conveniently implemented using a second tap from the appropriate delay element in the filtered delay loop simulation of the string, as depicted in Fig.6.18. The new tap output is simply summed (or differenced, depending on loop implementation) with the filtered delay loop output. Note that making the new tap a moving, interpolating tap (e.g., using linear interpolation), a flanging effect is available. The tap-gain $ c$ can be brought out as a musically useful timbre control that goes beyond precise physical simulation (e.g., it can be made negative). Adding more moving taps and summing/differencing their outputs, with optional scale factors, provides an economical chorus or Leslie effect. These extra delay effects cost no extra memory since they utilize the memory that's already needed for the string simulation. While such effects are not traditionally applied to piano sounds, they are applied to electric piano sounds which can also be simulated using the same basic technique.


In summary, two feedforward comb filters and one feedback comb filter arranged in series (in any order) can be interpreted as a physical model of a vibrating string driven at one point along the string and observed at a different point along the string. The two feedforward comb-filter delays correspond to the excitation and pickup locations along the string, while the amount of feedback-loop delay controls the fundamental frequency of vibration. A filter in the feedback loop determines the decay rate and fine tuning of the partial overtones.

Loop Filter Identification

In §6.7 we discussed damping filters for vibrating string models, and in §6.9 we discussed dispersion filters. For vibrating strings which are well described by a linear, time-invariant (LTI) partial differential equation, damping and dispersion filtering are the only deviations possible from the ideal string string discussed in §6.1.

The ideal damping filter is ``zero phase'' (or linear phase) [449],7.10while the ideal dispersion filter is ``allpass'' (as described in §6.9.1). Since every desired frequency response can be decomposed into a zero-phase frequency-response in series with an allpass frequency-response, we may design a single loop filter whose amplitude response gives the desired damping as a function of frequency, and whose phase response gives the desired dispersion vs. frequency. The next subsection summarizes some methods based on this approach. The following two subsections discuss methods for the design of damping and dispersion filters separately.

General Loop-Filter Design

For general loop-filter design in vibrating string models (as well as in woodwind and brass instrument bore models), we wish to minimize [428, pp. 182-184]:

  1. Errors in partial overtone decay times
  2. Errors in partial overtone tuning
The partial decay times are controlled by the filter's amplitude response, while the partial tunings are determined by the filter's phase response. There also may be other filters in the loop (such as a delay-line interpolation filter) which need to be considered when designing the main loop filter.

There are numerous methods for designing the string loop filter $ {\hat G}(z)$ based on measurements of real string behavior. In [428], a variety of methods for system identification [288] were explored for this purpose, including ``periodic linear prediction'' in which a linear combination of a small group of samples is used to predict a sample one period away from the midpoint of the group. An approach based on a genetic algorithm is described in [378]; in that work, the error measure used with the genetic algorithm is based on properties of human perception of short-time spectra, as is now standard practice in digital audio coding [62]. Overviews of other approaches appear in [29] and [508].

Below is an outline of a simple and effective method used (ca. 1995) to design loop filters for some of the Sondius sound examples:

  1. Estimate the fundamental frequency (see §6.11.4 below)
  2. Set a Hamming FFT-window length to approximately four periods
  3. Compute the short-time Fourier transform (STFT)
  4. Perform a sinusoidal modeling analysis [466] to
    detect peaks in each spectral frame, and
    connect peaks through time to form amplitude envelopes
  5. Fit an exponential to each amplitude envelope
  6. Prepare the desired frequency-response, sampled at the harmonic frequencies of the delay-line loop without the loop filter. At each harmonic frequency,
    the nearest-partial decay-rate gives the desired loop-filter gains,
    the nearest-partial peak-frequency give the desired loop-filter phase delay.
  7. Use a phase-sensitive filter-design method such as invfreqz in matlab to design the desired loop filter from its frequency-response samples (further details below).

Physically, amplitude envelopes are expected to decay exponentially, although coupling phenomena may obscure the overall exponential trend. On a dB scale, exponential decay is a straight line. Therefore, a simple method of estimating the exponential decay time-constant for each overtone frequency is to fit a straight line to its amplitude envelope and use the slope of the fitted line to compute the decay time-constant. For example, the matlab function polyfit can be used for this purpose (where the requested polynomial order is set to 1). Since ``beating'' is typical in the amplitude envelopes, a refinement is to replace the raw amplitude envelope by a piecewise linear envelope that connects the upper local maxima in the raw amplitude envelope. The estimated decay-rate for each overtone determines a sample of the loop-filter amplitude response at the overtone frequency. Similarly, the overtone frequency determines a sample of the loop-filter phase response.

Taken together, the measured overtone decay rates and tunings determine samples of the complex frequency response of the desired loop filter. The matlab function invfreqz7.11 can be used to convert these complex samples into recursive filter coefficients (see §8.6.4 for a related example). A weighting function inversely proportional to frequency is recommended. Additionally, Steiglitz-McBride iterations can improve the results [287], [428, pp. 101-103]. Matlab's version of invfreqz has an iteration-count argument for specifying the number of Steiglitz-McBride iterations. The maximum filter-gain versus frequency should be computed, and the filter should be renormalized, if necessary, to ensure that its gain does not exceed 1 at any frequency; one good setting is that which matches the overall decay rate of the original recording.

Damping Filter Design

When dispersion can be neglected (as it typically can in many cases, such as for guitar strings), we may use linear-phase filter design methods, such as provided by the functions remez, firls, and fir2 in matlab. Since strings are usually very lightly damped, such linear-phase filter designs give high quality at very low orders. Another approach is to fit a low-order IIR filter, such as by applying invfreqz to a minimum-phase version of the desired amplitude response, and then subtract the phase-response of the resulting filter from the desired phase-response used in a subsequent allpass design. (Alternatively, the phase response of the loop filter can simply be neglected, as long as tuning is unaffected. If tuning is affected, the tuning allpass can be adjusted to compensate.)

Dispersion Filter Design

A pure dispersion filter is an ideal allpass filter. That is, it has a gain of 1 at all frequencies and only delays a signal in a frequency-dependent manner. The need for such filtering in piano string models is discussed in §9.4.1.

There is a good amount of literature on the topic of allpass filter design. Generally, they fall into the categories of optimized parametric, closed-form parametric, and nonparametric methods. Optimized parametric methods can produce allpass filters with optimal group-delay characteristics in some sense [272,271]. Closed-form parametric methods provide coefficient formulas as a function of a desired parameter such as ``inharmonicity'' [368]. Nonparametric methods are generally based on measured signals and/or spectra, and while they are suboptimal, they can be used to design very large-order allpass filters, and the errors can usually be made arbitrarily small by increasing the order [551,369,42,41,1], [428, pp. 60,172]. In music applications, it is usually the case that the ``optimality'' criterion is unknown because it depends on aspects of sound perception (see, for example, [211,384]). As a result, perceptually weighted nonparametric methods can often outperform optimal parametric methods in terms of cost/performance [2].

In historical order, some of the allpass filter-design methods are as follows: A modification of the method in [551] was suggested for designing allpass filters having a phase delay corresponding to the delay profile needed for a stiff string simulation [428, pp. 60,172]. The method of [551] was streamlined in [369]. In [77], piano strings were modeled using finite-difference techniques. An update on this approach appears in [45]. In [340], high quality stiff-string sounds were demonstrated using high-order allpass filters in a digital waveguide model. In [384], this work was extended by applying a least-squares allpass-design method [272] and a spectral Bark-warping technique [459] to the problem of calibrating an allpass filter of arbitrary order to recorded piano strings. They were able to correctly tune the first several tens of partials for any natural piano string with a total allpass order of 20 or less. Additionally, minimization of the $ L^\infty$ norm [271] has been used to calibrate a series of allpass-filter sections [42,41], and a dynamically tunable method, based on Thiran's closed-form, maximally flat group-delay allpass filter design formulas (§4.3), was proposed in [368]. An improved closed-form solution appears in [1] based on an elementary method for the robust design of very high-order allpass filters.

Fundamental Frequency Estimation

As mentioned in §6.11.2 above, it is advisable to estimate the fundamental frequency of vibration (often called ``F0'') in order that the partial overtones are well resolved while maintaining maximum time resolution for estimating the decay time-constant.

Below is a summary of the F0 estimation method used in calibrating loop filters with good results [471]:

  • Take an FFT of the middle third of a recorded plucked string tone.
  • Find the frequencies and amplitudes of the largest $ K$ peaks, where $ K$ is chosen so that the $ K$ retained peaks all have a reasonable signal-to-noise ratio.
  • Form a histogram of peak spacing $ \Delta f_i$
  • The pitch estimate $ {\hat f}_0$ is defined as the most common spacing $ \Delta f_i$ in the histogram.
The F0 estimate so formed is sufficiently accurate for purposes of calibrating a peak tracker for decay-rate estimation. (The F0 value is only used to set the FFT window length in the STFT to four nominal periods under the window, so it is not critical.)

Approximate Maximum Likelihood F0 Estimation

In applications for which the fundamental frequency F0 must be measured very accurately in a periodic signal, the estimate $ {\hat f}_0$ obtained by the above algorithm can be refined using a gradient search which matches a so-called ``harmonic comb'' to the magnitude spectrum of an interpolated FFT $ X(\omega)$:

$\displaystyle {\hat f}_0 \isdefs \arg\max_{{\hat f}_0} \sum_{k=1}^K \log\left[\...
...f}_0} \prod_{k=1}^K \left[\left\vert X(k{\hat f}_0)\right\vert+\epsilon\right]


K &=& \mbox{number of peaks}\\
k &=& \mbox{harmonic number of...
...lue on the order of spectral magnitude \emph{noise floor level}}

Note that freely vibrating strings are not exactly periodic due to exponenential decay, coupling effects, and stiffness (which stretches harmonics into quasiharmonic overtones, as explained in §6.9). However, non-stiff strings can often be analyzed as having approximately harmonic spectra ( $ \leftrightarrow$ periodic time waveform) over a limited time frame.

Since string spectra typically exhibit harmonically spaced nulls associated with the excitation and/or observation points, as well as from other phenomena such as recording multipath and/or reverberation, it is advisable to restrict $ K$ to a range that does not include any spectral nulls (or simply omit index $ k$ when $ k{\hat f}_0$ is too close to a spectral null), since even one spectral null can push the product of peak amplitudes to a very small value. As a practical matter, it is important to inspect the magnitude spectra of the data manually to ensure that a robust row of peaks is being matched by the harmonic comb. For example, a display of the frame magnitude spectrum overlaid with vertical lines at the optimized harmonic-comb frequencies yields an effective picture of the F0 estimate in which typical problems (such as octave errors) are readily seen.

References on F0 Estimation

An often-cited book on classical methods for pitch detection, particularly for voice, is that by Hess [192]. The harmonic comb method can be considered an approximate maximum-likelihood pitch estimator, and more accurate maximum-likelihood methods have been worked out [114,547,376,377]. More recently, Klapuri has been developing some promising methods for multiple pitch estimation [254,253,252].7.12A comparison of real-time pitch-tracking algorithms applied to guitar is given in [260], with consideration of latency (time delay).

Extension to Stiff Strings

An advantage of the harmonic-comb method, as well as other frequency-domain maximum-likelihood pitch-estimation methods, is that it is easily extended to accommodate stiff strings. For this, the stretch-factor in the spectral-peak center-frequencies can be estimated--the so-called coefficient of inharmonicity, and then the harmonic-comb (or other maximum-likelihood spectral-matching template) can be stretched by the same amount, so that when set to the correct pitch, the template matches the data spectrum more accurately than if harmonicity is assumed.

EDR-Based Loop-Filter Design

This section discusses use of the Energy Decay Relief (EDR) (§3.2.2) to measure the decay times of the partial overtones in a recorded vibrating string.

First we derive what to expect in the case of a simplified string model along the lines discussed in §6.7 above. Assume we have the synthesis model of Fig.6.12, where now the loss factor $ g^N$ is replaced by the digital filter $ H_l(z)$ that we wish to design. Let $ \underline{x}(n)$ denote the contents of the delay line as a vector at time $ n$, with $ \underline{x}(0)$ denoting the initial contents of the delay line.

For simplicity, we define the EDR based on a length $ N$ DFT of the delay-line vector $ \underline{x}$, and use a rectangular window with a ``hop size'' of $ N$ samples, i.e.,

$\displaystyle \underline{X}_m(\omega_k) \isdef \dft _{N,\omega_k}\{\underline{x}_m\}, \quad m=0,1,2,\ldots,

where $ \underline{x}_m(n)\isdef \underline{x}(mN)$. That is $ \underline{x}_m$ is simply the $ m$th successive snapshot of the delay-line contents, where the snapshots are taken once every $ N$ samples. We may interpret $ \underline{X}_m$ as $ m$th short-time spectrum of the output signal $ y^{+}(n)$ shown in Fig.6.12. Due to the special structure of our synthesis model, we have

$\displaystyle \underline{X}_m(\omega_k) = H_l^m(\omega_k) \underline{X}_0(\omega_k)

for each DFT bin number $ k\in[0,N-1]$.

Applying the definition of the EDR (§3.2.2) to this short-time spectrum gives

&\isdef & \sum_{\nu=m}^\infty \left\vert\underl...
...omega_k)\right\vert^2} \left\vert H_l(\omega_k)\right\vert^{2m}.

We therefore have the following recursion for successive EDR time-slices:7.13

$\displaystyle E_{m+1}(\omega_k) = \left\vert H_l(\omega_k)\right\vert^2 E_m(\omega_k)

Since we normally try to fit straight-line decays to the EDR on a log scale (typically a decibel scale), we will see the relation

$\displaystyle \log(E_{m+1}) = \log(E_m) + \log(\vert H_l\vert^2),

where the common argument $ \omega_k$ is dropped for notational simplicity. Since we require $ \vert H_l(\omega_k)\vert<1$ for stability of the filtered-delay loop, the EDR decays monotonically in this example. Thus, the measured slope of the partial overtone decays will be found to be proportional to $ \log(\vert H_l\vert)$.

This analysis can be generalized to a time-varying model in which the loop filter $ H_l$ is allowed to change once per ``period'' $ N$.7.14

An online laboratory exercise covering the practical details of measuring overtone decay-times and designing a corresponding loop filter is given in [280].

String Coupling Effects

It turns out that a single digital waveguide provides a relatively static-sounding string synthesizer. This is because several coupling mechanisms exist in natural string instruments. The overtones in coupled strings exhibit more interesting amplitude envelopes over time. Coupling between different strings is not the only important coupling phenomenon. In a real string, there are two orthogonal planes of transverse vibration which are intrinsically coupled to each other [181]. There is also intrinsic coupling between transverse vibrational waves and longitudinal waves (see §B.6).

Horizontal and Vertical Transverse Waves

The transverse waves considered up to now represent string vibration only in a single two-dimensional plane. One such plane can be chosen as being perpendicular to the top plate of a stringed musical instrument. We will call this the $ xy$ plane and refer to it as the vertical plane of polarization for transverse waves on a string (or simply the vertical component of the transverse vibration). To more fully model a real vibrating string, we also need to include transverse waves in the $ xz$ plane, i.e., a horizontal plane of polarization (or horizontal component of vibration). Any polarization for transverse traveling waves can be represented as a linear combination of horizontal and vertical polarizations, and general transverse string vibration in 3D can be expressed as a linear superposition of vibration in any two distinct polarizations.

Figure 6.19: Digital waveguide model of a rigidly terminated string vibrating transversally in three-dimensional space (two uncoupled planes of vibration).

If string terminations were perfectly rigid, the horizontal polarization would be largely independent of the vertical polarization, and an accurate model would consist of two identical, uncoupled, filtered delay loops (FDL), as depicted in Fig.6.19. One FDL models vertical force waves while the other models horizontal force waves. This model neglects the small degree of nonlinear coupling between horizontal and vertical traveling waves along the length of the string--valid when the string slope is much less than unity (see §B.6).

Note that the model for two orthogonal planes of vibration on a single string is identical to that for a single plane of vibration on two different strings.

Coupled Horizontal and Vertical Waves

No vibrating string in musical acoustics is truly rigidly terminated, because such a string would produce no sound through the body of the instrument.7.15Yielding terminations result in coupling of the horizontal and vertical planes of vibration. In typical acoustic stringed instruments, nearly all of this coupling takes place at the bridge of the instrument.

Figure 6.20: Digital waveguide model of a string in which vertical and horizontal planes of vibration are coupled linearly at the bridge.

Figure 6.20 illustrates the more realistic case of two planes of vibration which are linearly coupled at one end of the string (the ``bridge''). Denoting the traveling force waves entering the bridge from the vertical and horizontal vibration components by $ F_v^+(z)$ and $ F_h^+(z)$, respectively, the outgoing waves in each plane are given by

$\displaystyle \left[\begin{array}{c} F_v^-(z) \\ [2pt] F_h^-(z) \end{array}\rig...
...] \left[\begin{array}{c} F_v^+(z) \\ [2pt] F_h^+(z) \end{array}\right] \protect$ (7.16)

as shown in the figure.

In physically symmetric situations, we expect $ H_{vh}(z) = H_{hv}(z)$. That is, the transfer function from horizontal to vertical waves is normally the same as the transfer function from vertical to horizontal waves.

If we consider a single frequency $ \omega $, then the coupling matrix with $ z = e^{j\omega T}$ is a constant (generally complex) matrix (where $ T$ denotes the sampling interval as usual). An eigenanalysis of this matrix gives information about the modes of the coupled system and the damping and tuning of these modes [543].

As a simple example, suppose the coupling matrix $ \mathbf{H}(e^{j\omega T})$ at some frequency has the form

$\displaystyle \mathbf{H}(e^{j\omega T}) = \left[\begin{array}{cc} A & B \\ [2pt] B & A \end{array}\right]

where $ A$ and $ B$ are any complex numbers. This means both string terminations are identical, and the coupling is symmetric (the simplest case in practice). The eigenvectors are easily calculated to be7.16

$\displaystyle \underline{e}_1 = \left[\begin{array}{c} 1 \\ [2pt] 1 \end{array}...
\underline{e}_2 = \left[\begin{array}{c} 1 \\ [2pt] -1 \end{array}\right],

and the eigenvalues are $ A+B$ and $ A-B$, respectively.

The eigenvector $ \underline{e}_T=[1, 1]$ corresponds to ``in phase'' vibration of the two string endpoints, i.e., $ F_v(e^{j\omega T}) = F_h(e^{j\omega T})$, while $ \underline{e}_T=[1, -1]$ corresponds to ``opposite phase'' vibration, for which $ F_v(e^{j\omega T}) = -F_h(e^{j\omega T})$. If it happens to be the case that

$\displaystyle \vert A+B\vert<\vert A-B\vert

then the in-phase vibration component will decay faster than the opposite-phase vibration. This situation applies to coupled piano strings [543], as discussed further below.

More generally, the two eigenvectors of the coupling frequency-response matrix

$\displaystyle \mathbf{H}(e^{j\omega}) \isdef \left[\begin{array}{cc} H_{vv}(e^{...
...\omega}) \\ [2pt] H_{hv}(e^{j\omega}) & H_{hh}(e^{j\omega}) \end{array}\right]

correspond to two decoupled polarization planes. That is, at each frequency there are two eigenpolarizations in which incident vibration reflects in the same plane. In general, the eigenplanes change with frequency. A related analysis is given in [543].

By definition of the eigenvectors of $ \mathbf{H}(e^{j\omega T})$, we have

$\displaystyle \mathbf{H}(e^{j\omega T}) \underline{e}_i(e^{j\omega T}) = \lambda_i(e^{j\omega T})\underline{e}_i(e^{j\omega T})

where $ \lambda_i(e^{j\omega T})$ denotes the $ i$th eigenvalue of the coupling-matrix $ \mathbf{H}(e^{j\omega T})$ at frequency $ \omega $, where $ i=1,2$. Since the eigenvector $ \underline{e}_i(e^{j\omega T})$ holds the Fourier transform of the incoming waves for mode $ i$ of the coupled-string system, we see that the eigenvalues have the following interpretation:
The $ i$th eigenvalue of the coupling matrix equals the frequency response seen by the $ i$th eigenpolarization.
In particular, the modulus of the eigenvalue gives the reflectance magnitude (affecting mode damping), and the angle of the eigenvalue is the phase shift of the reflection, for that mode (affecting tuning of the mode). Use of coupling matrix eigenanalysis to determine mode damping and tuning is explored further in §C.13.

Asymmetry of Horizontal/Vertical Terminations

It is common in real stringed instruments that horizontal and vertical transverse waves are transduced differently at the bridge. For example, the bridge on a guitar is typically easier to ``push'' into the top plate than it is to ``shear'' sidewise along the top plate. In terms of Eq.$ \,$(6.16), we have $ \left\vert H_{hh}(e^{j\omega})\right\vert\gg\left\vert H_{vv}(e^{j\omega})\right\vert$ (at most frequencies). This unequal terminating impedance causes the horizontal component of string vibration to decay slower than the vertical component of vibration. We can say that this happens because the vertical bridge admittance is much greater than the horizontal admittance, giving rise to a faster rate of energy transfer from the vertical string polarization into the bridge--in other words, the bridge is more ``yielding'' in the vertical direction. The audible consequence of this unequal rate of decay is a two-stage amplitude envelope. The initial fast decay gives a strong onset to the note, while the slower late decay provides a long-lasting sustain--two normally opposing but desirable features.

Coupled Strings

We have just discussed the coupling between vertical and horizontal planes of vibration along a single string. There is also important coupling among different strings on the same instrument. For example, modern pianos are constructed having up to three physical strings associated with each key. These strings are slightly mistuned in order to sculpt the shape of the decay envelope, including its beating characteristics and two-stage decay. A two-stage decay is desired in piano strings in order to provide a strong initial attack followed by a long-sustaining ``aftersound'' [543], [18, Weinreich chapter].

A simple approximation to the effect of coupled strings is obtained by simply summing two or more slightly detuned strings. While this can provide a realistic beating effect in the amplitude envelope, it does not provide a true two-stage decay. A more realistic simulation of coupling requires signal to flow from each coupled string into all others.

When the bridge moves in response to string vibrations, traveling waves are generated along all other strings attached to the bridge. In the simplest case of a bridge modeled as a rigid body, the generated wave is identical on all strings. In §C.13, an efficient scattering formulation of string coupling at a bridge is derived for this case [439]. It can be seen as a simplification of the general coupling matrix shown in Fig.6.20 for the two-string (or two-polarization) case. Additionally, an eigenanalysis of the coupling matrix is performed, thereby extending the analysis of §6.12.2 above.

Longitudinal Waves

In addition to transverse waves on a string, there are always longitudinal waves present as well. In fact, longitudinal waves hold all of the potential energy associated with the transverse waves, and they carry the forward momentum in the direction of propagation associated with transverse traveling waves [122,391]. Longitudinal waves in a string typically travel an order of magnitude faster than transverse waves on the same string and are only weakly affected by changes in string tension.

Longitudinal waves are often neglected, e.g., in violin acoustics, because they couple inefficiently to the body through the bridge, and because they are ``out of tune'' anyway. However, there exist stringed instruments, such as the Finnish Kantele [231], in which longitudinal waves are too important to neglect. In the piano, longitudinal waves are quite audible; to bring this out in a striking way, sound example 5 provided in [18, Conklin chapter] plays Yankee Doodle on the longitudinal modes of three piano strings all tuned to the same (transversal) pitch. The nonlinear nature of the coupling from transverse to longitudinal waves has been demonstrated in [163]. Longitudinal waves have been included in some piano synthesis models [30,28,24,23].

Nonlinear Elements

Many musical instrument models require nonlinear elements, such as

Since a nonlinear element generally expands signal bandwidth, it can cause aliasing in a discrete-time implementation. In the above examples, the nonlinearity also appears inside a feedback loop. This means the bandwidth expansion compounds over time, causing more and more aliasing.

The topic of nonlinear systems analysis is vast, in part because there is no single analytical approach which is comprehensive. The situation is somewhat analogous to an attempt to characterize ``all non-bacterial life''. As a result, the only practical approach is to identify useful classes of nonlinear systems which are amenable to certain kinds of analysis and characterization. In this chapter, we will look at certain classes of memoryless and passive nonlinear elements which are often used in digital waveguide modeling.

It is important to keep in mind that a nonlinear element may not be characterized by its impulse response, frequency response, transfer function, or the like. These concepts are only defined, in general, for linear time-invariant systems. However, it is possible to generalize these notions for nonlinear systems using constructs such as Volterra series expansions [52]. However, rather than getting involved with fairly general analysis tools, we will focus instead on approaching each class of nonlinear elements in the manner that best fits that class, with the main goal being to understand its audible effects on discrete-time signals.

Memoryless Nonlinearities

Memoryless or instantaneous nonlinearities form the simplest and most commonly implemented form of nonlinear element. Furthermore, many complex nonlinear systems can be broken down into a linear system containing a memoryless nonlinearity.

Given a sampled input signal $ x(n)$, the output of any memoryless nonlinearity can be written as

$\displaystyle y(n) = f(x(n))

where $ f(\cdot)$ is ``some function'' which maps numbers to real numbers. We exclude the special case $ f(x)=\alpha x$ which defines a simple linear gain of $ \alpha$.

The fact that a function may be used to describe the nonlinearity implies that each input value is mapped to a unique output value. If it is also true that each output value is mapped to a unique input value, then the function is said to be one-to-one, and the mapping is invertible. If the function is instead ``many-to-one,'' then the inverse is ambiguous, with more than one input value corresponding to the same output value.

Clipping Nonlinearity

A simple example of a noninvertible (many-to-one) memoryless nonlinearity is the clipping nonlinearity, well known to anyone who records or synthesizes audio signals. In normalized form, the clipping nonlinearity is defined by

$\displaystyle f(x) = \left\{\begin{array}{ll} -1, & x\leq -1 \\ [5pt] x, & -1 \leq x \leq 1 \\ [5pt] 1, & x\geq 1 \\ \end{array} \right. \protect$ (7.17)

Since the clipping nonlinearity abruptly transitions from linear to hard-clipped in a non-invertible, heavily aliasing manner, it is usually desirable to use some form of soft-clipping before entering the hard-clipping range.

Arctangent Nonlinearity

A simple example of an invertible (one-to-one) memoryless nonlinearity is the arctangent mapping:

$\displaystyle f(x) = \frac{2}{\pi}\arctan(\alpha x), \quad x\in[-1,1]

where normally $ \alpha\gg 1$. This function is graphed for $ \alpha=10$ in Fig.6.21. (Recall that $ \arctan(x)$ is defined as the angle whose tangent is $ x$. Only angles between $ -\pi /2$ and $ \pi /2$ are needed to cover all real values of $ x$.)

Figure 6.21: Arctangent nonlinearity.

Cubic Soft Clipper

In §9.1.6, we used the cubic soft-clipper to simulate amplifier distortion:

$\displaystyle f(x) = \left\{\begin{array}{ll}
-\frac{2}{3}, & x\leq -1 \\ [5pt]...
...{3}, & -1 \leq x \leq 1 \\ [5pt]
\frac{2}{3}, & x\geq 1 \\

This function is diagrammed in Fig.6.22. It is non-invertible when driven into ``hard clipping''.

Figure 6.22: Cubic nonlinearity.

Series Expansions

Any ``smooth'' function $ f(x)$ can be expanded as a Taylor series expansion:

$\displaystyle f(x) = f(0) + \frac{f^\prime(0)}{1}(x) + \frac{f^{\prime\prime}(0...
...cdot 2}(x)^2 + \frac{f^{\prime\prime\prime}(0)}{1\cdot 2\cdot 3}(x)^3 + \cdots,$ (7.18)

where ``smooth'' means that derivatives of all orders must exist over the range of validity. Derivatives of all orders are obviously needed at $ x=0$ by the above expansion, and for the expansion to be valid everywhere, the function $ f(x)$ must be smooth for all $ x$.

Arctangent Series Expansion

For example, the arctangent function used above can be expanded as

$\displaystyle \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots

Note that all even-order terms are zero. This is always the case for odd functions, i.e., functions satisfying $ f(-x)=-f(x)$. For any smooth function, the odd-order terms of its Taylor expansion comprise the odd part of the function, while the even-order terms comprise the even part. The original function is clearly given by the sum of its odd and even parts.7.17

The clipping nonlinearity in Eq.$ \,$(6.17) is not so amenable to a series expansion. In fact, it is its own series expansion! Since it is not differentiable at $ x=\pm1$, it must be represented as three separate series over the intervals $ (-\infty,1]$, $ [-1,1]$, and $ [1,\infty)$, and the result obtained over these intervals is precisely the definition of $ f(x)$ in Eq.$ \,$(6.17).

Spectrum of a Memoryless Nonlinearities

The series expansion of a memoryless nonlinearity is a useful tool for quantifying the aliasing caused by that nonlinear mapping when introduced into the signal path of a discrete-time system.

Square Law Series Expansion

When viewed as a Taylor series expansion such as Eq.$ \,$(6.18), the simplest nonlinearity is clearly the square law nonlinearity:

$\displaystyle f(x) = x + \alpha x^2

where $ \alpha$ is a parameter of the mapping.7.18

Consider a simple signal processing system consisting only of the square-law nonlinearity:

$\displaystyle y(n) = x(n) + \alpha x^2(n)

The Fourier transform of the output signal is easily found using the dual of the convolution theorem:7.19

$\displaystyle Y(\omega) = X(\omega) + \alpha (X\ast X)(\omega)

where ``$ \ast $'' denotes convolution. In general, the bandwidth of $ X\ast X$ is double that of $ X$.

Power Law Spectrum

More generally,

$\displaystyle x^k(n) \;\longleftrightarrow\; \underbrace{(X\ast X \ast \cdots \ast X)}_{\mbox{$k$\ times}}(\omega)

so that the spectral bandwidth of $ x^k(n)$ is $ k$ times that of $ x(n)$, in general.

In summary, the spectrum at the output of the square-law nonlinearity can be written as

$\displaystyle Y(\omega) = X(\omega) + \alpha (X\ast X)(\omega)

This illustrates the great value of series expansions for nonlinearities. In audio signal processing systems, it is essential to keep track of the spectral representation of all output signals, since we hear directly in terms of signal spectra.7.20

Arctangent Spectrum

Since the series expansion of the arctangent nonlinearity is

$\displaystyle \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots

bandwidth expansion is infinite (in continuous time). Thus, we expect problems with aliasing.

Cubic Soft-Clipper Spectrum

The cubic soft-clipper, like any polynomial nonlinearity, is defined directly by its series expansion:

$\displaystyle f(x) = \left\{\begin{array}{ll}
-\frac{2}{3}, & x\leq -1 \\ [5pt]...
...{3}, & -1 \leq x \leq 1 \\ [5pt]
\frac{2}{3}, & x\geq 1 \\
$ (7.19)

In the absence of hard-clipping ( $ \left\vert x\right\vert\leq1$), bandwidth expansion is limited to a factor of three. This is the slowest aliasing rate obtainable for an odd nonlinearity. Note that smoothing the ``corner'' in the clipping nonlinearity can reduce the severe bandwidth expansion associated with hard-clipping.

Stability of Nonlinear Feedback Loops

In general, placing a memoryless nonlinearity $ f(x)$ in a stable feedback loop preserves stability provided the gain of the nonlinearity is less than one, i.e., $ \vert f(x)\vert\le \vert x\vert$. A simple proof for the case of a loop consisting of a continuous-time delay-line and memoryless-nonlinearity is as follows.

The delay line can be interpreted as a waveguide model of an ideal string or acoustic pipe having wave impedance $ R>0$ and a noninverting reflection at its midpoint. A memoryless nonlinearity is a special case of an arbitrary time-varying gain [449]. By hypothesis, this gain has magnitude less than one. By routing the output of the delay line back to its input, the gain plays the role of a reflectance $ g$ at the ``other end'' of the ideal string or acoustic pipe. We can imagine, for example, a terminating dashpot with randomly varying positive resistance $ \mu(t)>0$. The set of all $ \mu>0$ corresponds to the set of real reflection coefficients $ g=(\mu-R)/(\mu+R)$ in the open interval $ (-1,1)$. Thus, each instantaneous nonlinearity-gain $ -1<g<1$ corresponds to some instantaneously positive resistance $ \mu>0$. The whole system is therefore passive, even as $ \mu(t)$ changes arbitrarily (while remaining positive). (It is perhaps easier to ponder a charged capacitor $ C$ terminated on a randomly varying resistor $ R(t)$.) This proof method immediately extends to nonlinear feedback around any transfer function that can be interpreted as the reflectance of a passive physical system, i.e., any transfer function $ H(s)$ for which the gain is bounded by 1 at each frequency, viz., $ \vert H(j\omega)\vert\le 1$.

The finite-sampling-rate case can be embedded in a passive infinite-sampling-rate case by replacing each sample with a constant pulse lasting $ T$ seconds (in the delay line). The continuous-time memoryless nonlinearity $ g[x(t)]$ is similarly a held version of the discrete-time case $ g[x(nT)]$. Since the discrete-time case is a simple sampling of the (passive) continuous-time case, we are done.

Practical Advice

In summary, the following pointers can be offered regarding nonlinear elements in a digital waveguide model:

  • Verify that aliasing can be heard and sounds bad before working to get rid of it.

  • Aliasing (bandwidth expansion) is reduced by smoothing ``corners'' in the nonlinearity.

  • Consider an oversampling factor for nonlinear subsystems sufficient to accommodate the bandwidth expansion caused by the nonlinearity.

  • Make sure there is adequate lowpass filtering in a feedback loop containing a nonlinearity.

As a specific example, consider the cubic nonlinearity used in a feedback loop (as in §9.1.6). This can be done with no aliasing at low levels (i.e., at levels below hard clipping) provided we use To avoid $ 3\times$ oversampling in the entire feedback loop, we may downsample by 3 after the lowpass filter and upsample by 3 just before the nonlinearity. If the lowpass filter is good, the downsampling by 3 is trivially accomplished by throwing away every 2 out of 3 samples. For upsampling, however, an additional third-band lowpass-filter is needed for the interpolation (§4.4).

Another variation is to oversample by two, in which case there is aliasing, but that aliasing does not reach the ``base band.'' Therefore, a half-band lowpass filter rejects both the second spectral image and the third, which is aliased onto the second.

Next Section:
Lumped Models
Previous Section:
Time-Varying Delay Effects